Handbook of the Mathematics of the Arts and Sciences [1 ed.] 3319570714, 9783319570716

https://www.springer.com/gp/book/9783319570716 The goal of this Handbook is to become an authoritative source with chapt

English Pages 2853 [2794] Year 2021

Table of contents :
Foreword
Contents
About the Editor
Editorial Board
Section Editors
Consulting Editors
Contributors
Part I Mathematics, Art, and Aesthetics
1 Mathematics, Art, and Aesthetics: An Introduction
References
2 The Art of Modern Homo Habilis Mathematicus, or: What Would Jon Borwein Do?
Contents
Introduction: Who Is Modern Homo Habilis Mathematicus?
What Would Jon Do?
Phase Portraits: A Motivating Example
Reinvention by Bridging Between Contexts
Repurposing Phase Portraits for Dynamical Systems
Dynamical Geometry and Asymptotic Destination Plotting
Experimentally Checking Numerical Error
Completing the Circle: The Line from Specific to General
Sometimes All You Need Is a Good Walk
Walking on a Dynamical System
When the Computer Knows More Than You Do
Symbolic Answers from Numerical Approximations
Conic Programming and Mystery Geometry
Conclusion
References
3 The Beauty of Blaschke Products
Contents
Introduction
Complex Arithmetic and Geometry
Seeing Complex Functions
Hyperbolic Geometry
Blaschke Products
Blaschke Products and Ellipses in the Euclidean Plane
Blaschke Products and Ellipses in the Poincaré Disk Model
Compositions of Blaschke Products
Conclusion
References
4 Looking Through the Glass
Contents
Introduction
A Brief History
A New Mathematical Object: The Point of Projective Geometry
Ideal Points
Vanishing Points
Where Was the Camera?
A Consequence of Viewing Distances: Illusion, Distortion, and Anamorphism
Dolly Zoom
Anamorphic Art
Impossible Figures
Going Backward from Pictures to 3D
Homogeneous Coordinates
Multiple View Geometry
The Ames Room
Reconstructing Objects from Images
Conclusion
Cross-References
References
5 Designing Binary Trees
Contents
Introduction
Creating Binary Trees
Mathematical Approach
A First Example: L~LR
A Second Example: LR~RL
A Third Example: LR∞ ~ (RL)∞
Other Issues
Artistic Considerations
Conclusion
References
6 Homeomorphisms Between the Circular Disc and the Square
Contents
Introduction
Canonical Mapping Space
Mapping Diagram with Equations
Some Mathematical Details
Fernandez-Guasti Squircle
Tapered2 Squircular Mapping
Lamé Squircle
Elliptical Grid Mapping
Conformal Square Mapping via Schwarz-Christoffel
Legendre Elliptic Integrals
A Fundamental Conformal Map
Canonical Alignment
Software Implementation
A Complex Class of Squircles
Application: Squaring the Poincaré Disk
Hyperbolic Tilings
Application: Elliptification of Rectangular Imagery
Size Versus Shape Distortions
Conclusion
Cross-References
References
7 A Visual Overview of Coprime Numbers
Contents
Introduction
Coprime Numbers and Skew Sturmian Sequences
Bézout Coefficients
Ford Circles and Farey Sequences
Bézout Graphs
Conclusion
References
8 Almost All Surfaces Are Made Out of Hexagons
Contents
Introduction
Closed Surfaces
Pants Decomposition
Hyperbolic Plane and Negative Curvature
Each Surface Admits More Than One Geometric Shape
References
9 Anamorphosis Reformed: From Optical Illusions to Immersive Perspectives
Contents
Introduction
Anamorphosis Formed Again
The Empirical Principle: Radial Occlusion
Anamorphosis Formed Fast
Some Considerations on Anamorphosis
The Point of Observation
Multiple Points of Observation
“Impossible” Objects
On Color
Binocular Anamorphoses
Anamorphosis Formally Reformed
Mathematical Preliminaries
Anamorphosis as a Mathematical Object
More General Surfaces
Simplifications: Talking to Artists
On Compactness
Descriptive Geometry Construction of Anamorphoses
Handmade vs Digital Anamorphoses
Dürer Machines Running Back and Forth
Perspectives
Spherical Perspectives
The Problem with Perspective
Euclid and Psychophysics
Leonardo's Axiom and Paradox
Effects on the Development of Spherical Perspective
Conclusion
Cross-References
References
10 Anamorphosis: Between Perspective and Catoptrics
Contents
Introduction
Anamorphosis Between Paris and Rome: A Catoptric Relationship
The Project for a Scientific Villa in Baroque Rome as a Mirror of Time
Conclusion
References
11 Geometric and Aesthetic Concepts Based on Pentagonal Structures
Contents
Introduction
Tessellations and Their Dualizations
Tiling with Regular Pentagons
Pentagrid as Art Repertoire
From the Pentagrid to the Kite-Dart-Grid
Spatial Structures with Dodecahedra
Spatial Structures with Rhombohedra: Golden Diamonds
Geometry and Art: Reflections on Aesthetics
Conclusion
Cross-References
References
12 Mathematics and Origami: The Art and Science of Folds
Contents
Introduction
Modern Origami and Mathematical Axiomatization
Origami and the Delian Problem
Modular Origami
Origami and Technology
Art of Origami
Conclusion
Cross-References
References
13 Geometric Strategies in Creating Origami Paper Lampshades: Folding Miura-ori, Yoshimura, and Waterbomb Tessellations
Contents
Introduction
Background on Paper Lanterns
Contemporary Origami-Inspired Paper Lampshades
Light, Origami Design, and Material
Design Parameters and Considerations for Origami Lampshade Design
Flat-Foldable Origami Tessellations: Miura, Yoshimura, and Waterbomb Patterns
Mathematical Theorems Governing Flat-Foldable Origami Tessellations
Miura-ori Tessellation
Miura-ori and the Bird's-Foot Vertex
Folding Miura-ori into Cylindrical Lampshade with Translation Symmetry
Folding Miura-ori into a Lampshade with Rotational Symmetry
Yoshimura Tessellation
Yoshimura Tessellation and Its Double Bird's Foot Vertex
Folding Yoshimura into Cylindrical Lampshade with Translational Symmetry
Folding Yoshimura into a Lampshade with Rotational Symmetry
Waterbomb Tessellation
Waterbomb Tessellation and Its Vertices
Folding Waterbomb Tessellation into Cylindrical Lampshade with Translational Symmetry
Folding Waterbomb Tessellation into a Lampshade with Rotational Symmetry
Conclusion
Cross-References
References
14 Mathematical Design for Knotted Textiles
Contents
Introduction: Mathematics and Textiles
Textile Knot Practice to Be Analyzed
What is a Knot? Knot Theory and Its Diagrammatic Method
Comparison Between Textile Knot Practice and Mathematical Knot Theory
Analysis of Textile Knot Practice Using Knot Theory
New Knot Pattern Designs Based on Knot Diagrams
Use of New Materials Inspired by Knot Theory
Analysis of Textile Knot Practice Using Braid Theory
Definition of Tilings
Analysis of Textile Knot Practice Using Tilings
New Pattern and Structure Designs Based on Tiling Concepts
Conclusion
References
15 Art and Science of Rope
Contents
Introduction
Terminology
Archaeological and Historical Aspects
Pottery
Mosaic
Materials
Natural Fibers
Plants
Animalia
Minerals
Man-Made Fibers
Methods of Construction
Laid Rope
Hand-Operated Equipment and Tools
Machines
Braided Rope
Hand-Operated Equipment and Tools
Machines
Rope Properties
Mathematical Properties
Cross Section
Rope Diameter
Core Diameter
Mechanical and Physical Properties
Degree of Twist
Linear Density
Breaking Force
Elongation
Rope Length
Conclusion
References
16 A Survey of Cellular Automata in Fiber Arts
Contents
Introduction
Cellular Automata
Representations of Cellular Automata in Fiber Arts
Sierpiński Triangles and Related Cellular Automata
Other Designs from Well-Known Cellular Automata Rules
Cellular Automata Designs Created for Fiber Arts
Conclusion
Cross-References
References
17 Mathematics and Art: Connecting Mathematicians and Artists
Contents
Introduction
Mathematical Tools for Artists
Symmetry
Asymmetry
Mathematical Artists and Artist Mathematicians
Geometrical Art
Polyhedra, Tilings, and Dissections
Origami
Bridging the World of Art and Mathematics
End Notes
References
18 Mathematics and Art: Unifying Perspectives
Contents
Introduction
Mathematics in Art
Mathematics as an Artistic Inspiration
Mathematics as an Artistic Tool and Medium
The Interplay of Art, Culture, and Mathematics
Artistic Ideas in Mathematics
Graphs and Their Visualizations
Examples of Graphs
Knots and Graphs
Reconfiguration Systems
Unifying Perspectives
Conclusion
Cross-References
References
19 Spherical Perspective
Contents
Introduction
History
Spherical Anamorphosis
Radial Occlusion and Mimesis
Spherical Anamorphs and Their Vanishing Points
Spherical Perspective as Cartography of the Visual Sphere
Referentials
Azimuthal Coordinate System
Horizontal Coordinate System
Angular Measurements
Azimuthal Equidistant Spherical Perspective (360-degree Fisheye)
The Azimuthal Equidistant Flattening
Solving the Azimuthal Equidistant Spherical Perspective
Fixed Grids for the Azimuthal Equidistant Perspective
A Ruler and Compass Construction of the Azimuthal Equidistant Spherical Perspective
Perspective Constructions
Tiled Floor (Central)
Inside a Cube
Arbitrary Square
Tiled Floor
Dynamic Grids
Drawing from Nature
Equirectangular Perspective
VR Panoramas as Immersive Anamorphoses
Construction of the Equirectangular Flattening
Images of Geodesics
Ruler and Compass Approximations
Drawing Lines
Sliding Grids
Spherical Straightedges in Digital Drawing Programs
Conclusion: What Is (Not) a Spherical Perspective
Cross-References
References
20 A Hidden Order: Revealing the Bonds Between Music and Geometric Art – Part One
Contents
Introduction
Harmony
Harmony of Time
Harmony of Space
The Pentagon
The Hexagon
The Octagon
A Mapping between Music and Geometric Art
Color to Pitch Relationship
Loudness and Brightness
Hue and Pitch
Brightness, Loudness, and Pitch
Timbre and Saturation
A Relationship Between Rhythm and Pattern
A Unit of Time and a Unit of Space
Binary Counting Grid
An Alternative Square Tiling
Hilbert Curve Tiling
The Dragon Curve
Hexagons
More Hexagons
Rhythmic Motifs
Rotations
Grid Symmetry, Time Signature, and Structure of the Composition
Pentagonal Symmetry
Fibonacci, Bar Length, and Structure of Composition
Indexing the Penrose Tiling
Octagonal Symmetry
Summary
References
21 A Hidden Order: Revealing the Bonds Between Music and Geometric Art – Part Two
Contents
Structure of Final Design
Indefinite Growth
Linear Layout
Change of Time Signature
Compound Grids
Performance Dynamics and Accents
Timbre and Texture
Creative Implications of a Translation Between Music and Art
Creative Approaches Explored in A Hidden Order
Applying Musical Composition Techniques to Geometric Artwork
Introduction/Contrasting Sections
Aperiodic Rhythms
Conclusion
Some Final Thoughts on the Research
Pain and Gain Through Restrictions
A Multidimensional Artistic Object
Time as Space
Looking Ahead
Cross-References
References
22 Korean Traditional Patterns: Frieze and Wallpaper
Contents
Introduction
Frieze Patterns
Wallpaper Patterns
Some Designs
Conclusion
References
23 Projections of Knots and Links
Contents
Introduction
Terminology
Mathematical Concepts
Geometry
Knot Theory
Knotwork Concepts
Rectangular Diagonal Knotwork
Circular Knotworks
Turk's Head
Archaeological and Historical Aspects
Contemporary and Traditional Art
Knotwork Analysis
The Number of Components
The Number of Crossings
Braiding Pattern
Symmetry
Coloring
Construction of Knotworks
Discussion
References
24 Comparative Temple Geometries
Contents
Introduction
Islamic Region and Religion
Trading Mathematics and Art
Islamic Mathematics
Islamic Geometric Patterns and Art
Japanese Mathematics
Japanese Temple Geometry
Conclusion
Cross-References
References
25 Wasan Geometry
Contents
Introduction
Wasan
Wasan Geometry
Problems Involving Congruent Circles
Congruent Circles on a Line and a Circle
Congruent Circles on a Line with Two Congruent Circles on a Line
Congruent Circles on a Line and Congruent Squares
Two Congruent Circles on a Line
Congruent Circles on a Line with Two Intersecting Congruent Circles
Two Sets of Congruent Circles on a Line and Two Circles
A Square and Three Congruent Circles in an Isosceles Triangle
Congruent Circles in a Rectangle
The Arbelos in Wasan Geometry
Two Sangaku Problems Involving a Circle of the Same Radius
Two Congruent Circles Touching a Perpendicular to AB
Two Circles Touching a Perpendicular to AB at the Same Point
Two Congruent Circles Touching an Inclined Line to AB
Congruent Circles Touching a Circle Passing Through the Center of α
Reflection in the Axis
Golden Arbelos
Arbelos with Overhang
Arbeloi Determined by a Chord
A Sangaku Problem Involving an Archimedean Circle
A Sangaku Problem Involving Two Archimedean Circles
Wasan Geometry and Division by Zero
The Configuration A(1)
A Three-Circle Problem
Practical Side
Study of Wasan Geometry: Past and Present
References
26 Geometries of Light and Shadows, from Piero della Francesca to James Turrell
Contents
Introduction
Piero della Francesca's Darkness
James Turrell's Darkness
Conclusion
References
27 TOND to TOND: Self-Similarity of Persian TOND Patterns, Through the Logic of the X-Tiles
Contents
Introduction: The Two Traditional Persian Families of Pentagonal Patterns
The Kond + Sholl Family
The Tond Family
Multilevel Patterns. Reminders, and a New Case
Two Kond Self-Similar Systems
A Third Type of Kond Self-Similar System
Transitions Between Different Families
[A] ==> [A] (from Kond + Sholl to Kond + Sholl)
[A] ==> [B]. From Kond + Sholl to Tond or, More Often, from [A1] to [B]
[B] ==> [A1]. From Tond to Kond
Generalization of the First Example
Generalization of the Second Example
[B] ==> [B]. From Tond to Tond
X-Tiles
Definition
The X-Tiles and the Tond Traditional Family of Pentagonal Patterns
Transition from Kond to Tond with the X-Tiles
Tond to Tond Transition Through the X-Tiles
Self-Similarity of TOND Patterns Through the X-Tiles
Principle
First Inflation Rule: System V1
The Inflation Rule
Order of Appearance of the Tiles
The Two-Level Tiles
Second Inflation Rule: System V2
The Inflation Rule
The Set of All the Tond Tiles that Can Emerge from the V3 System
Order of Appearance of the Tiles
The Two-Level Tiles
Remark: Other Valid Orientation Options in the V2 System
Third Inflation Rule: System V3
The Inflation Rule
The Set of All the Tond Tiles that Can Emerge from the V3 System
Option V3.1
The Two-Level Tiles and the Interlacings
Option V3.4
Fourth Inflation Rule: System V4
The Inflation Rule
Working with Decorated Rhombuses
To Go Further
Conclusion
Cross-References
References
28 Artistic Manifestations of Topics in String Theory
Contents
Introduction
Glimpses into String Theory
Genesis
First Superstring Revolution
Second Superstring Revolution
AdS/CFT Correspondence
The Imagery of String Theory
A Piece of String
Pants Diagram
Calabi-Yau
M.C. Escher
Music
Film and Television
Ceramics Inspired by String Theory
Circle
Cusp
Sewing
Threehalves
Cut
Anomaly
Subsurface
Conclusions
References
29 Cutting, Gluing, Squeezing, and Twisting: Visual Design of Real Algebraic Surfaces
Contents
From Algebraic Formulas to Geometric Forms: Real Algebraic Surfaces
Standard Constructions: Union, Intersection, and Smoothing
Morphing
Symmetry
Cutting and Gluing
Squeezing, Shifting, and Twisting
References
30 Double Layered Polyhedra
Contents
Elevation
Vertex Figure
Knots
Holes and Compounds
Connected Holes
Connecting the Knots
Odd or Even, Grünbaum's Double Polyhedra Versus Jitterbug
Face-Doubling
Jitterbug Transformation Applied to Infinite Uniform Polyhedra
Unfolding Multilayer Polyhedra
Unfolding the Double Layered Cube
Double Layered Tetrahedron
Double Layered Cuboctahedron
Double Layered Dodecahedron
Double Layered Icosahedron
Elevation: Combinations of Polyhedra
Strips and Rings
Zonohedra
Polar Zonohedra
Conclusion
Cross-References
References
Part II Mathematics, Humanities, and the Language Arts
31 Mathematics, Humanities, and the Language Arts: An Introduction
Contents
Cross-References
32 Mathematics and Poetry: Arts of the Heart
Contents
Introduction
Mathematics of Poetry
Syllabic Verse
Rhyme
Visual Form
Other Mathematical Concerns About Poetry
Poetry of Mathematics
Poetic Mathematics
Mathematical Poetry
Educational Possibilities
Further Reading and Making Connections
References
33 “Elegance in Design”: Mathematics and the Works of Ted Chiang
Contents
Introduction
Direction
Decryption
Division
Determination
Writing Like a Heptapod: Nonlinear Semasiography
Thinking Like a Heptapod: Variational Principles
Premembering: Nonlinear Orthography and Nonlinear Time
Story of Her Life
Conclusion
References
34 Running in Shackles: The Information-Theoretic Paradoxes of Poetry
Contents
Introduction
The Form Paradox
The Nonsense Paradox
The Curious Case of Missing Synonyms
A Word in Its Place
Beyond Entropy
Conclusion
References
35 Metaphor: A Key Element of Beauty in Poetry and Mathematics
Contents
Introduction
Beauty in Poetry and Math
Metaphors in Mathematics
A Taxonomy of Mathematical Metaphors
Explicative or Homey Metaphors
Discovery or Eureka Metaphors
Creative or Special Metaphors
Mathematical and Poetic Metaphors: Differences and Similarities
Seven Differences Between Mathematical and Poetic Metaphors
Seven Reasons Why Metaphor Creates Beauty (Emotion) in Poetry and Mathematics
Cross-References
References
36 Poems Structured by Mathematics
Contents
Introduction
Early Examples of Mathematical Form
The Oulipo and Raymond Queneau
Sestinas
Poetic Enumeration
Syllables per Line
Words per Line and Latin Squares
Lines per Stanza and Pi
Letters per Line
Pantoums and Platonic Solids
Fundamental Theorem of Arithmetic Poetry
Incidence Geometry Poetics
Summary and Concluding Remarks
Cross-References
References
37 Lewis Carroll's Defense of Euclid: Parallels or Contrariwise
Contents
Introduction
Euclid and His Controversial Elements
Emergence of Non-Euclidean Geometries
Non-Euclidean Geometries and the Education System
Charles Dodgson: The Oxford Mathematician
Lewis Carroll's New Approach to the Euclidean Debate
Geometric “Straight” Analogies
Defense of the Parallel Postulate
Carroll and Mathematics Examinations
Euclid and His Modern Rivals
Carroll's Misunderstandings of Non-Euclidean Geometries
Conclusion: The Real Reason Carroll Fought for Euclid
References
Part III Mathematics and Architecture
38 Architecture and Mathematics: An Ancient Symbiosis
Contents
Introduction
Relationships and Epistemology
Mathematics in Architecture
Mathematics for Architecture
Mathematics of Architecture
Conclusion
Cross-References
References
39 Egyptian Architecture and Mathematics
Contents
Introduction
Definitions
Accurate Reckoning for Enquiring into Things
Scribes and Builders
Mathematics and Architecture
Practical Operations
Meanings Beyond Numbers?
Conclusions
References
40 Labyrinth
Contents
Introduction
Topology of Labyrinths
Definitions
Definition
Mnemonic Devices
Conclusion
References
41 Classical Greek and Roman Architecture: Mathematical Theories and Concepts
Contents
Introduction
The Figurate Representation of Quantities
Arithmetic
Geometry
The Visual Comparison of Quantities
The Theory of Proportion and Means
Musical Proportions
The Duplication of the Cube
Art and Architecture
Conclusion
Cross-References
References
42 Classical Greek and Roman Architecture: Examples and Typologies
Contents
Introduction
Vitruvius
Symmetry: Numbers and Ratios in Greek Temples
Ionic Temples
Doric Temples
Arithmetization of Geometry
Roman Innovation: Amphitheaters
Conclusion
Cross-References
References
43 Mathematics and the Art and Science of Building Medieval Cathedrals
Contents
Abbreviations
Introduction. The Cathedral and the Gothic Order
Gothic Apses and Sacred Geometry
The Theorica of the Canons of Tortosa Cathedral
Commentary on Euclid's Elements by Al-Haijaj (c.325–c.265 BC)
Saint Augustine's De Civitate Dei
Translation of Plato's Timaeus by Calcidius, with Part of a Commentary
Part of the Commentary on Plato's Timaeus by Calcidius
Commentary on Somnium Scipionis by Macrobius
Part of Geometria from Martianus Capella's Marriage of Philology and Mercury
Geometria Incerti Auctoris by Gerbert (Silvester II)
The Positional Number System of Adelard of Bath
Practica Versus Theorica of Tortosa Cathedral
The Construction of Heptagons
The Construction of Octagons
The Geometria Fabrorum
Mathematics and the Art and Science of Building Medieval Cathedrals
References
44 Renaissance Architecture
Contents
Introduction
The Heritage from Classical Antiquity
Mathematical Beauty in the Renaissance
Beauty in Renaissance Architecture
Perspective
Conclusion
Cross-References
References
45 Baroque Architecture
Contents
Introduction
Baroque Architecture and Architects
Church Design: The Elongated Centrality
Odd Polygons and Complex Curves
Literary Sources and Onsite Studies
Perspective and Anamorphosis
Baroque Polymathy
Conclusion
Cross-References
References
46 Temple of Solomon
Contents
Introduction
Villalpando's Flawless System
Ezechielem Explanationes' Influence
Conclusion
References
47 Utopian Cities
Contents
Introduction
The Search for the Ideal City
Conclusion
References
48 Tessellated, Tiled, and Woven Surfaces in Architecture
Contents
Introduction
Background to Tiling
Tiling in Architecture
Conclusion
Cross-References
References
49 Stereotomy: Architecture and Mathematics
Contents
Introduction
Geometric Knowledge for the Rationalization of Structural Form Constructed with Small Elements
Stereotomic Architecture Is Historically Based on Geometrical and Cutting Technique Knowledge
The Application of Stereotomy Using Innovative Technology: “Stereotomy 2.0”
Research About “Stereotomy 2.0”
Stereotomy with 3D Printing in the Age of Industry 4.0
Conclusion
Cross-References
References
50 Fractal Geometry in Architecture
Contents
Introduction
Background
Fractal Geometry
Fractal Geometry in Architecture
Examples of Fractal Geometry in Architecture and Design
Conclusion
Cross-References
References
51 Parametric Design: Theoretical Development and Algorithmic Foundation for Design Generation in Architecture
Contents
Introduction
Generative Design
Common Characteristics of Generative Design
Main Generative Design Systems
Generative Grammars
Evolutionary Systems
Emergent and Self-Organized Systems
Associative Generation
Parametric Design
Historical Review of Parametric Design
Origin of Parametric Design
Development of Parametric Design
Parametricism
Parametric Design
Reshaping Architectural Design
Impact on Architectural Design
Limitations of Parametric Design
Conclusion
Cross-References
References
52 Shape Grammars: A Key Generative Design Algorithm
Contents
Introduction
Background
Basic Shape Grammars
Main Components of a Shape Grammar
Shape Grammar Application
Designing a Shape Grammar
Corpus Selection
Shape Grammar Development
Shape Grammar Evaluation
Extensions of Basic Shape Grammars
Parallel Grammars
Parametric Grammars
Graph Grammars
Further Discussion on the Extensions
Applications of Shape Grammars
Description and Analysis
Reproduction and Generation
Optimization and Customization
Combination with Other Methods
Implementation of Shape Grammars
Shape Grammar and Other Generative Design Algorithms
Discussion and Conclusion
References
53 Space Syntax: Mathematics and the Social Logic of Architecture
Contents
Introduction
Space Syntax and Mathematics
Spaces, Lines, and Points
Application
Conclusion
Cross-References
References
54 Isovists: Spatio-visual Mathematics in Architecture
Contents
Introduction
Background
Isovist Measures and Mathematics
Application
Conclusion
Cross-References
References
55 Fractal Dimensions in Architecture: Measuring the Characteristic Complexity of Buildings
Contents
Introduction
Background
The Box-Counting Method in Architecture
Stage 1: Data Preparation
Stage 2: Data Representation
Stage 3: Data Preprocessing
Stage 4: Data Processing
Application
Conclusion
Cross-References
References
Part IV Mathematics in Society
56 Mathematics in Society: An Introduction
57 Probabilistic Thinking from Elementary Grades to Graduate School
Contents
Introduction
Interpretations of Probability
Probability in US Schools
Probability in Grades K-12
Probability in Undergraduate Mathematics
Measure-Theoretic Probability in Graduate Mathematics
Subjective Probability in Graduate Mathematics
Probabilistic Connections to the Sciences
Conclusion
Cross-References
References
58 Risk and Decision Making: Modeling and Statistics in Medicine – Fundamental…
Contents
Introduction
Rationality in Decisions in Health Issues
Kinds of Thinking and Learning: Consequences of the Goal of Rationality
Constituents of Risky Situations
Nature and Definition of Risk Involved in Decisions
Type of the Decision Situation
People or Stakeholders Involved in the Decision
The Quality of Information
Risk Management in Health Issues
The Difficulty to Assess Information
Informed Consent Versus Shared Decisions
Understanding Risk
Statistical Methods in Medicine
Significance Tests
An Example
Concerns with the P Value
A Medical Diagnosis Based on Cut Points to Separate the Groups of Healthy and Ill
An Analogy of the Medical Situation to Statistical Tests
Sample Size Needed for Ensuring Good Quality of Information from Studies
Conclusions
Cross-References
References
59 Risk and Decision Making: Modeling and Statistics in Medicine – Case Studies
Contents
Introduction
Case Study 1: Risk Communication
The Case of Lipitor: Absolute and Relative Risks
Background Information
The Advertising Campaign Is a Mixture of Objective Information and a Play with Emotions
The Flaws of the Advertisement Campaign
Absolute and Relative Risk and the Interpretation of Reducing Risks
Empirical Evidence for the Claim of Superiority of Lipitor and the Risk Reduction
Last But Not Least: The Missing Discussion About the Side Effects of Long-Term Medication
Understanding the Statistical Information and Other Criteria for Judging the Risk
Simplifying the Methods for Easier Communication and Understanding of Risks
Case Study of Prostate Cancer
Case Study of Breast Cancer
Simplifying Supports the Communication But Introduces a Shift of Data Toward Facts
Case Study 2: Dialogues on a Medical Diagnosis
To Screen or Not to Screen
A First Attempt to Compare Alternatives, Find Data, and Interpret the Risk Numbers
First Investigations
A Preliminary Evaluation of the Risk
Further Data for a More Profound Evaluation of the Risk
Prevalence: The Incidence of Breast Cancer is Dependent on Age
An Interpretation of Correct-Negative: The Correct-Negative Rate
Case Study 3: Benefits and Drawbacks of Screening
Measuring the Success of Screening Programs
Stakeholders Involved in the Introduction of Screening Programs
Meta-Analyses: The Attempt of an Evaluation of Screening for Breast Cancer
Increase in Lifetime and Number of Lives Saved
Rate of False Positives
Rate of False Negatives
Evaluation of Potential Harm
An Evaluation of the Impact of Screening as Compared to No Screening
Success of Other Screening Programs
Does the Evidence Support the Recommendations?
Crucial questions for an informed decision are:
Gigerenzer's Fact Box on Screening for Breast Cancer
Gasche's Public-Health Discussion in Switzerland
The US Discussion on Screening
Conclusions
Cross-References
References
60 To Justice Through Statistics
References
61 Actuarial (Mathematical) Modeling of Mortality and Survival Curves
Contents
Introduction to the Development and History of Mathematical Models of Mortality
Life Insurance Before the Invention of the Mortality Table
Importance of Having a Mortality Table
The Innovation of Mortality Model
De Moivre and the First Creation of a Mathematical Law of Mortality
Gompertz and Makeham Laws of Mortality
Other Parametric Mortality Models
Stochastic Mortality Model for Individual Mortality Rate
Joint Life Mortality Models
Why Do We Need Joint Life Mortality Models?
Copula Model
A New Stochastic Mortality Model for Joint Lives
Nonparametric Estimation of the Mortality Function
One-Sample Estimation
Joint Mortality Estimation
Mortality Modeling with Cohort Effect
Increasing in Human's Life Expectancy and Longevity Risk
Lee-Carter Model
Extensions of Lee-Carter Model
Mitchell et al. (2013)’s Extension of the Mortality Model
References
62 Mathematics in the Maritime
Contents
Introduction
Calculating Latitude
Calculating Longitude
Map Making
Global Positioning Systems
The Least Squares Method
The Advent of Insurance and Actuarial Science
Conclusion
Cross-References
References
63 Mathematics and Economics, with Special Attention to Social Choice Theory
Contents
Introduction
Mathematics in Economics, Game Theory, and Social Choice Theory
General Equilibrium Theory
Social Choice Theory
Game Theory
The Use of Mathematics in Economics Questioned
Conclusion: The Indispensability of Mathematics
References
64 Social Algorithms and Optimization
Contents
Introduction
A Brief History
Essence of Algorithms
Optimization Algorithms
Optimization
Search for Optimality
Advantages of Social Algorithms
Social Algorithms
Algorithms as Descriptive Systems
Ant Colony Optimization
Bees-Inspired Algorithms
Algorithms as Linear Systems
Particle Swarm Optimization
Artificial Bee Colony
Firefly Algorithm as a Nonlinear System
Algorithms as Quasi-linear Systems
Bat Algorithm
Cuckoo Search
Algorithm Analysis and Open Problems
Algorithms and Self-Organization
Balance of Exploitation and Exploration
Open Problems
Conclusions
References
65 Applications of the Gini Index Beyond Economics and Statistics
Contents
Introduction
Gini's Measures and the Lorenz Curve
The Standard Deviation and Coefficient of Variation
Applications of the Gini Index and GMD
Society and Household Income Inequity
Contrast in Grayscale Images
Other Lorenz-Inspired Measures of Spread and Inequality
Further Modeling with the Lorenz Curve and Gini Index
Equalization and the Gini Index
The Golden Equity
Golden Academia
Summary of Desirable Properties of Measures of Inequality and Spread
Conclusions
References
66 A Computational Music Theory of Everything: Dream or Project?
Contents
The World Formula: A Physical Theory of Everything (ToE)
The ToE in Contemporary Physics
Are Physicists Dreaming?
Is ToE Essentially a Mathematical Problem?
A Computational Music Theory of Everything (ComMute), a Mathematical Nightmare?
Arguments Against a ComMute
Individual Creativity
Colonialist Universalism
Uncontrollable Complexity
What Does “Computational” Mean in ComMute?
Some Directions Toward ComMute
Two Dimensions, Same Idea: Harmony and Rhythm
Understanding Harmony and Counterpoint via Gestures
Counterpoint Worlds for Different Musical Cultures
Unification of Mental and Physical Realities in Music: Introducing Complex Time
Unifying Note Performance and Gestural Performance: Lie Operators
Unifying Composition and Improvisation?
Conclusions
References
67 Groovy Mathematics: Toward a Theoretical Model of Rhythm
Contents
Introduction
Order in Movement
A Natural Attraction to Rhythmic Behavior and Experience of Rhythm
Expressive Timing in Music
Modeling Music Performance
RFM: A Continuous Model of Rhythm Performance
Oscillations and Rhythmic Structure
Synthesis of Expressive Timing by Frequency Modulation
Computer Implementation
Simulating Movements in Rhythmic Behavior
Synthesis of Asymmetric Movement Trajectories
Illustration: RFM Simulation of fON
Conclusion
References
68 Music, Dance, and Differential Equations
Contents
Introduction
Music
Sound Generation
Musical Composition
Dance
Dance Movement
Choreography
Three-Body Problem
Influenced by Chaos
Choreography Using Waveforms
Fluid Dynamics
Movement of a Pendulum
Summary
Cross-References
References
69 Breaking the Ice: Figure Skating
Contents
Introduction
History and Equipment
Mathematics Within Skaters' Blade Tracings
Quantitative Ways to Describe Pattern Dances
Geometric Transformations
Rotations
Reflections
Translations
Biomechanical Principles Within Skating
Angular Momentum in Spins
Moment of Inertia in Camel Spin
Moment of Inertia in Upright Spin
Conservation of Angular Momentum from Camel Spin to Upright Spin
Is There Potential for More Record-Breaking Spins?
Angular Momentum in Jumps
Projectile Motion in Jumps
Quintuple Jumps?
Training Tools for Jumps
Pole Harness
Hinged Figure Skating Boot
Weighted Gloves
International Judging System Scoring
Judging Biases
Figure Skating Team Event
Entrants' Contributions to Their Team Scores
Team Event Compared to Hypothetical Team Event
Application of Hypothetical Team Event to Past Olympic Winter Games
Summary
References
70 The Mathematical Foundations of the Science of Cities
Contents
Introduction
Ebenezer Howard's Perspective on Cities
Jane Jacobs' Perspective on Cities
Graph Theory
Network Science
Space Syntax
The Axial Map
Measures Using the Axial Map
Criticisms of Space Syntax
Road Network Analysis
Named-Street Construction
Intersection Continuity Negotiation
Measures of Road Network Analysis
Social Network Analysis
Urban Scaling Theory
Conclusions
References
71 Gilles Deleuze's The Fold: Calculus and Curvilinear Design
Contents
Introduction
Deleuze's The Fold
The Fold and Architecture
Greg Lynn on Folded Architecture, Blobs, and Animate Form
Summary
Cross-References
References
72 Mathematics and Oenology: Exploring an Unlikely Pairing
Contents
Introduction
Maths and Wine-Related Problems
Barrel Volume Calculations
The Mathematics of Wine Aging: Arrhenius and Eyring Equations
Optimal Wine Storage Conditions
Optimal Average Temperature
Temperature Fluctuation
Humidity
Light
Vibrations
The Influence of the Heat Flow in the Temperature Equation
The Optimal Depth for a Wine Cellar
The Temperature Equation at the Optimal Depth
A Qualitative Study of the Depth of a Wine Cellar Based on the Chosen Reference Period and Soil Conditions While the Temperature Is Changing
What's Food and Wine Pairing?
The Graph
Geometrical Issues
Matching Algorithm (MA)
Implementation Details and Examples
More Recent Investigations
Conclusion
References
73 CombinArtorial Games
Contents
Introduction
Rulesets
Normal Play Games
Computational Complexity
Overview
Heap Games
Compounds of Games
Aesthetics of Games
Combinatorial Number Theory
Play Games and Math Games
The mex-Rule: a Minimal EXclusive Algorithm
Three Games
Fibonacci Nim
Euclid's Game
Wythoff Nim
A Fibonacci Numeration System, ZOL
Game Solutions
Fibonacci Nim
Euclid's Game
Wythoff Nim
Wythoff Properties
Mex-Rule
Floor-Function
Fibonacci Morphism
ZOL-Numeration
Proofs of Solutions
Proof for Fibonacci Nim
Proof for Euclid's Game
Proofs for Wythoff Nim
Proof by Wythoff-Properties
More on the Mex-Rule
Proof of Floor-Function
Proof of Fibonacci Morphism
Proof of ZOL-Numeration
When Sprague and Grundy Mex Bouton's Nim
Sprague and Grundy Theory
Conway's Theory of the Full Class of Normal Play
Positional Games with Nonnegative Incentive
Patterns of Generalized Games
Epilogue
References
74 Combinatorial Artists: Counting, Permutations, and Other Discrete Structures in Art
Contents
Introduction
Combinatorics in Music
Dodecaphonic Music
Iannis Xenakis
Tom Johnson
Elliott Carter
Further Examples
Combinatorics in Literature
The Oulipo
Raymond Queneau
Georges Perec
Italo Calvino
Juan Eduardo Cirlot
Digital Poetry
Brion Gysin
Combinatorics in Visual Art
Sol LeWitt
Vera Molnar
Manfred Mohr
Vladimir Bonačić
Anders Hoff Aka Inconvergent
Other Combinatorial Visual Artists
Dance, Theatre, and Cinema
Dance
Theatre
Cinema
Closing Time
References
Part V Mathematics, Science, and Dynamical Systems
75 Mathematics, Science, and Dynamical Systems: An Introduction
76 Modern Ergodic Theory: From a Physics Hypothesis to a Mathematical Theory with Transformative Interdisciplinary Impact
Contents
Prelude
Origins
Consequence of the Ergodic Theorem and Other Significant Results
Interdisciplinary Aspects of Ergodic Theory in Mathematics
Number Theory
Combinatorics
Functional Analysis and Harmonic Analysis
Fractal Geometry
Interdisciplinary Aspects of Ergodic Theory with Other Disciplines
References
77 Two-Way Thermodynamics
Contents
Introduction
Some Mathematics
Opposite Arrows
A Paradox
Further Issues
Conclusions
Appendix: Precise Definition of the Modified “Cat”
Notes
References
78 Visualizing Four Dimensions in Special and General Relativity
Contents
Introduction
Mathematics of Space and Time
Four-Dimensional Spacetime and the Special Theory of Relativity
Gravity, Geometry, and the General Theory of Relativity
Black Holes and Numerical Relativity
Revealing Spacetime Through Technology
Imagination and Artistry
Analogies and Metaphors
Spacetime Diagrams
Relativistic Ray Tracing and First-Person Visualizations
Gravitational Lensing and Astrophysical Observations
Numerical Simulations of Gravitational Waves
Virtual, Augmented, and Mixed Reality
Conclusion
Cross-References
References
79 Coevolution of Mathematics, Statistics, and Genetics
Contents
Introduction
Early Contributions
Mendel and His Inheritance Models
Hardy-Weinberg Equilibrium
Wright-Fisher Model
Study of Family History and Pedigrees
Twin Studies
Genetic Linkage Mapping
Exploring Big Genetic Data
Genome-Wide Association Studies
Whole Genome Sequencing
Network-Based Analysis for Genetic Data
Discussion
References
80 Topology in Biology
Contents
Introduction
What and Why Topology?
Finding Topological Cavities: Persistent Homology
Data Systems and Solutions: Sheaves
Lead-Lag Relationships: Path Signatures
Where Are We Going?
Citation Diversity Statement
References
81 Dynamical Systems and Fitness Maximization in Evolutionary Biology
Contents
Introduction
Historical Development of Natural Selection and Genetics
Charles Darwin and Survival of the Fittest
Gregor Mendel and Experimental Genetics
The Eclipse of Darwinism
Population Genetics
Fitness Maximization and the Neo-Darwinian Theory of Evolution
The Decline of Fisher's Fundamental Theorem
Fisher's Fundamental Theorem of Natural Selection
Fisher's Setting for His Fundamental Theorem
Fisher's Mathematical Model for His Fundamental Theorem
Mutations and Fisher's First Corollary
Genetic Variance and Fisher's Second Corollary
Review of Fisher's Biological Setting for His Theorem
The Problem of Genetic Mutation
Muller and Muller's Ratchet
Models of Selection and Mutation
Mutation-Selection Models with More Realistic Factors
Numerical Simulations from the FTNSWM Mutation-Selection Equations
Conclusions from Mathematical Mutation-Selection Models
Comprehensive Simulations and Comprehensive Fitness
The Necessity of Comprehensive Numerical Simulations
Other Challenges to Net Fitness Maximalization
Why Have We Not Died 100 Times Over?
Lewontin's Lamentations
Reductive Evolution
Evolutionary Models, Dynamical Systems, and Maximization Principles
Stable Equilibria in Mutation-Limited, Infinite Population, Perfect Selection Scenarios
Conley's Fundamental Theorem of Dynamical Systems
Are There Laws in Biology?
A Biological Experiment, Individual Mutations, Adaptation, and Fitness
The Long-Term Evolutionary Experiment
Mutation-Selection-Reproduction Experimental Results
LTEE Experiment and Mathematical Modeling Conclusion
Maximization of Net Biological Function
Conclusion
Skepticism of Fitness Maximization
References
82 Damped Dynamical Systems for Solving Equations and Optimization Problems
Contents
Introduction
Linear Problems
Linear Equations
Linear Eigenvalue Problems
Linear Least Squares
Ill-Posed Problems
Numerical Simulations
From Linear to Nonlinear Problems
Local Linearization Using Optimal Damping and Time Step
Total Energy as a Lyapunov Function
Numerical Experiments
Applications
Image Analysis
Inverse Problems for Partial Differential Equations
Numerical Simulations
Applications in Quantum Physics
Excited States to the Schrödinger Equation
The Yrast Spectrum for Atoms Rotating in a Ring
Phase Separation of Bosonic- and Fermionic-Densities in an Ultracold Atomic Mixture
Conclusions and Future Work
References
83 Mathematics and Climate Change
Contents
Introduction
Climate: A Fluid Dynamical System
Mathematical Equations
Nondimensional Parameters: The Reynolds Number
Convection in the Rayleigh-Bénard System
Reduction of Dimensions and the Lorenz System
Scaling in the Climate System
Projection Methods: Coarse Graining and Stable Manifold Theory
Brownian Motion, Weather, and Climate
Climate Variability and Sensitivity
Non-normal Growth of the Climate System
Predictability
Boltzmann Dynamics
Conclusions
Cross-References
References
84 Mathematical Models Can Predict the Spread of an Invasive Species
Contents
Introduction
Population Growth Models
Dispersal by Diffusion
Conclusion
Cross-References
References
85 Mathematics and Recurrent Population Outbreaks
Contents
Introduction
The Lotka–Volterra Model
Advantages of the Lotka–Volterra Model
Criticism Against the Lotka–Volterra Model
Gause-Type Models for Population Interaction
What About Real Chemostat Conditions?
References
86 Limit Cycles in Planar Systems of Ordinary Differential Equations
Contents
Introduction
Planar Linear and Linearized Systems
First Integral Systems and Gradient Systems
Monotone Dynamics
Index Theory
The Complex Plane
The Existence of Limit Cycles
The Liénard Equation
Theorems for Absence of Limit Cycles
Uniqueness of Limit Cycles
Summary
References
87 Mathematical Models in Neuroscience: Approaches to Experimental Design and Reliable Parameter Determination
Contents
Introduction
Chemical Kinetics Schemes and the Law of Mass Action
Characteristic Scales and Model Non-dimensionalization
Brief Review of Asymptotic Analysis and Asymptotic Algorithm for Model Reduction
Quasi-Steady-State Approximation and Michaelis–Menten–Henri Kinetics
NMDAR Desensitization: Background Information and General Model
Kinetic Model of NMDAR and Experiment Design
Initial Conditions for NMDAR Experiments
Reduction of the NMDAR Model in Case of Experiments with High Concentration of D-Serine
Reduction of the NMDAR Model in Experiments with High Concentration of L-Glutamate
Reduction of the NMDAR Model in Experiments with High Concentrations of D-Serine and L-Glutamate
Reduction of the NMDAR Model After the Pulse
Reliable NMDAR Model Parameter Estimation
Model Fitting to Data
Conclusion
References
88 Interdisciplinary Mathematics and Sciences in Schematic Ocean Current Maps in the Seas Around Korea
Contents
Introduction
Direct Measurement and Indirect Estimation of Ocean Current
In situ Measurement Using Instruments
Surface Current from Satellite Altimeter Data
Surface Current from Surface Drifters
Maximum Cross Correlation Method from Sequential Satellite Images
Navigation and Registration of Ocean Current Maps
Unified Geographical Mapping Procedure
Digitized Current Maps of Textbooks and Scientific Articles
Strategy for Unified Current Map
List of Topics and Issued Contents
Working Flow for Finalized Schematic Map
Schematic Map of Ocean Current
Case I: East Sea (Japan Sea)
Case II: Yellow Sea and East China Sea
Other Issues
Name of Current
Use of Colors
Use of Lines
Strength of Current
Quantitative Information on Digital Ocean Current Map
Implications to Other Countries
Conclusion
References
Part VI Mathematics, History, and Philosophy
89 Mathematics, History, and Philosophy: An Introduction
References
90 Writing the History of Mathematics: Interpretations of the Mathematics of the Past and Its Relation to the Mathematics of Today
Contents
Introduction
Traces of Mathematics of the First Humans
History of Ancient Mathematics: The First Written Sources
History of Mathematics or Heritage of Mathematics?
Further Views of the Past and Its Relation to the Present
Can History Be Recapitulated or Does Culture Matter?
Concluding Remarks
Cross-References
References
91 Mathematics and Cultures Across the Chessboard: The Wheat and Chessboard Problem
Contents
Introduction
Mathematics and the Invention of Chess
Mathematics and the Origins of Chess
Geometric Progressions and Chess
Arabic Sources on the Computation 2^64 − 1
Greek Sources on the Computation 2^64 − 1
Western Sources on the Computation 2^64 − 1
Number Theory
Summary
Cross-References
References
92 Ancient Greek Methods of Measuring Astronomical Sizes
Contents
Introduction
Conclusion
Cross-References
References
93 Space and Time in the Foundations of Mathematics, or Some Challenges in the Interactions with Other Sciences
Contents
The Geometric Intelligibility of Space, an Introduction
Euclid
B. Riemann
A. Connes
Some Epistemological Remarks on the Geometry of Physical Space
Codings
Geometry in Computing
Living in Space and Time
Multiscale Phenomena and the Mathematical Complexity of the Neural System
Theories Versus Models
Conclusion: Epistemological and Mathematical Projects
Epistemology
Geometry in Information
Geometric Forms and Meaning
References
94 Baroquian Folds: Leibniz on Folded Fabrics and the Disruption of Geometry
Contents
Introduction
Folded Drapery: Between Geometry and Its Subversion
Before the Baroque: The Geometrization of Folded Drapery
Folds of the Baroque: Disruption of and Deviation from the Geometrical Space
Leibniz on Folding
Conclusion
References
95 Nyaya Methodology and Western Mathematical Logic: Origins and Implications
Contents
Introduction: Debate Over the Importance of Nyaya Philosophy
Comparisons Between the Aristotelian Syllogism and Nyaya Syllogism
Valid Knowledge and Logical Methods in the Nyaya System
Flaws in the Law of Contrapositive
Navya-Nyaya Theory of Number
Aristotle v. Nyaya: Final Word
The Nyaya Syllogism's Conceptual Origins and Implications
Origins of Logic
The Original Debate: Milinda-Panha
Logical Objects
Four-Cornered Negation
Kathavatthu and the Vadayutti
The Nyayasutra
The Nyaya Syllogism and the Problem of Jati
Summary/Conclusion
References
96 Reception and Contestation: Mathematics and Esoteric Spirituality, 1875–1915
Contents
Introduction
Hyperspace Theorizing and Early Theosophical Interventions
Contesting the Fourth Dimension
Hyperspace in Ouspensky's Tertium Organum
Making Sense of an Erratic Discourse
Concluding Remarks
References
97 Islamic Design and Its Relation to Mathematics
Contents
The Geometric Mode in Islamic Art
Theories, Problems, and Evidence
Symbolic Meaning
Early Islamic Art: The Emergence of an Islamic Aesthetic Sensibility
Islam's Greek Inheritance: Mathematics, Science, and Philosophy
Theoretical Geometry and Artisanal Practice in the Islamic World
Mathematics in the Islamic World and Its Involvement in Geometric Ornament
Conclusion of Historical Perspective
Modern Mathematical Analysis
Computer Usage
References
98 Mathematical Explanations and Mathematical Applications
Contents
Introduction
Catastrophes and Games
Curious Cicadas and Simple Strawberries
What Are Mathematical Explanations Like?
Philosophical Significance
Conclusion
References
Part VII Mathematical Influences and New Directions
99 Introduction to Mathematical Influences and New Directions
References
100 Ethnomodelling as the Translation of Diverse Cultural Mathematical Practices
Contents
Introduction
Ethnomathematics and Modelling
Exploring Ethnomodelling
Ethnomodelling and its Three Approaches of Viewing Cultures
Etic: The Global/Outsider Approach
Emic: The Local/Insider Approach
Dialogic: The Glocal/Emic-Etic Approach
Characterizing Ethnomodels
Emic and Etic Ethnomodels of the Mangbetu Ivory Sculpture
An Etic Ethnomodel of Brazilian Roller Carts
A Dialogic Ethnomodel of a Local Farmer-Vendor
Relevance of Ethnomodelling in a Mathematics Curriculum
Conclusion
References
101 Cognition, Interdisciplinarity, and Equity
Contents
Introduction
Focus and Criteria for the Review
Selected Works Influenced by Cultural Anthropology and Ethnography
Ethnomathematics Research
Funds of Knowledge Research
Summary and Future Research: The Importance of Community Engagement
Individual Cognition of Academic Mathematics
Malloy and Jones (1998)
Morton (2014)
Adiredja (2019) and Adiredja and Zandieh (2017, in press)
Lewis (2014) and Lewis and Lynn (2018a, b)
Fuson, Smith, and Lo Cicero (1997)
Summary and Future Research: Diversity in Engaging the Politics of Mathematical Learning
The Use of Existing Literature
The Recruitment of Participants and Emancipatory Approach
Managing Generalization and Essentialization of Findings
Conclusion
Cross-References
References
102 Mathematics and Rhetoric
Contents
Introduction
Why Study Math from a Rhetorical Perspective?
What Do Mathematicians Have to Gain?
Summary: How Is a Rhetorical Approach Different from Other Interdisciplinary Approaches?
References
103 Modes and Modalities of Mathematical Authority: Disseminating the “New Infinite,” 1870–1920
Contents
Introduction
Mathematical Considerations
In Advance of the New Infinite, 1870–1890
Josiah Royce (1855–1916): The New Infinite and the Absolute
Cassius J Keyser (1862–1947): Policing and Promoting the New Infinite
Responses and Other Commentaries, 1900–1920
Concluding Remarks
Cross-References
References
104 “Bok Bok”: Exploring the Game of Chicken in Film
Contents
Introduction
Theoretical Chicken: Analyzing the Game
Payoff Matrices and Non-zero-sum Games
Rationality
Cooperation Versus Defection
Equilibrium Points
Communication
Applied Chicken: Winning Friends and Influencing People
Why Play the Game?
“Nobody Here but Us Chickens”
“Don't You Play Chicken with Me, Boy”
“Chickens Are Bitches, Dude”
Conclusion
References
105 Moral Mathematics
Contents
Introduction
Dollar Auction Vignette
History of Moral Math
Limitations, Resistance, and Cautions
Ten Examples of Math Used as a Tool to Impact Social Behavior
Experiential Presentations
Five (of Many) Areas to Target for Continued Moral Math Development
Potential for Spiritual Healing
Closing Vignette
Conclusion
References
106 Feminist Theories Informing Mathematical Practice
Contents
Introduction
Mathematics and the Shadow of Gender Essentialism
Mathematics, Feminist Perspectives, and Connections to Science and Technology Studies
Mathematics, Issues of Power, and Pedagogical Practice
Mathematics, Popular Culture, and Representation
Conclusion
References
107 Queer(y)ing Mathematical Knowledge and Practices
Contents
Introduction
Appreciating Queer in Context
Queering Visibility, Support, and Resources in Mathematics and STEM
Queering Curricula
Queer(y)ing Perspectives on Disciplinary Knowledge and Practices
Alan Turing
Reuben Hersh: What Is Mathematics, Really?
Imre Lakatos: Proofs and Refutations
Concluding Remarks
References
Index

Bharath Sriraman Editor

Handbook of the Mathematics of the Arts and Sciences

Volume 1

With 1473 Figures and 74 Tables

Editor Bharath Sriraman Department of Mathematical Sciences, University of Montana, Missoula, MT, USA

ISBN 978-3-319-57071-6 ISBN 978-3-319-57072-3 (eBook) ISBN 978-3-319-57073-0 (print and electronic bundle) https://doi.org/10.1007/978-3-319-57072-3 © Springer Nature Switzerland AG 2021 All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

This work is dedicated to: My mother Swarna – initium Auream & Claire – Lux vitae meae

Foreword

The first edition of the Handbook of the Mathematics of the Arts and Sciences is a 5-year undertaking that has culminated in a scholarly product consisting of 100+ chapters, which address the ubiquity of mathematics in the arts and sciences. The handbook consists of seven parts with chapters that address (1) mathematics, art and aesthetics; (2) mathematics, humanities and the language arts; (3) mathematics and architecture; (4) mathematics in society; (5) mathematics, science and dynamical systems; (6) mathematics, history and philosophy; and (7) mathematical influences and new directions. The parts include contributions from researchers who have both an appreciation and an understanding of the mathematics that interacts with their particular topic. Each part also has a separate introduction written by the section editors that summarizes the contents and scope of their respective sections. The section editors/coeditors of this handbook are Michael Ostwald (mathematics and architecture); Kyeonghwa Lee (mathematics, art and aesthetics); Torsten Lindström (mathematics, science and dynamical systems); Gizem Karaali (mathematics, humanities and language arts); and Ken Valente (mathematical influences and new directions). In addition, three consulting editors, Alexandre Borovik, Daina Taimina, and Nathalie Sinclair, were part of the editorial team.

The goal of this handbook was to become an authoritative source with chapters that show the origins, unification, and points of similarity between different disciplines and mathematics. Some chapters also show bifurcations and the development of disciplines which grow to take on a life of their own. Science and art are used as umbrella terms to encompass the physical and natural sciences, as well as the visual and performing arts. Numerous chapters in the book explore these connections. Some of the questions that provoked this handbook were:

• What are the origins of interdisciplinarity in mathematics?
• What are cross-cultural components of interdisciplinarity linked to mathematics?
• What are contemporary interdisciplinary trends?

The chapters in this book reveal that mathematics is both a concrete human activity, present in numerous artistic, building, exploratory, and rhetorical endeavors, and an abstract activity, as evidenced by its presence in uncanny situations and contexts both micro and macro. The origins of the intentional or unintentional use of mathematics abound in the artistic and architectural splendors of the world since time immemorial, whereas modern artistic, computational, and scientific forays into the digital world reveal this ubiquity all over again. In the natural and physical sciences, newer connections are continuously being made. For instance, at the microscopic level, notions from topology, a very abstract mathematical discipline, find relevance in cellular biology, whereas at the macroscopic level, ergodic theory, another abstract mathematical discipline, forms the theoretical backbone for modeling dynamical systems all around us. Rhetorical endeavors are understood here in their Aristotelian sense, namely ethos, logos, and pathos. Indeed, mathematics is a very persuasive language that has stood the test of time.

Numerous chapters also reveal cross-cultural aspects of interdisciplinarity linked to mathematics. This is especially evident in the parts Mathematics, History, and Philosophy and Mathematical Influences and New Directions. In these two parts, cross-cultural aspects of interdisciplinarity, both ancient and modern, are explored in familiar and unfamiliar contexts. Finally, contemporary interdisciplinary trends manifest themselves in the sheer range of topics covered in this handbook, which more or less run the entire gamut of the alphabet. The interested reader can uncover this when perusing the table of contents.

No preface is complete without acknowledging those who were integral to this project. Clemens Heine (Birkhäuser) was the initial sounding board for this idea who helped me crystalize the impossible into something workable. Thomas Hempfling and Michael Hermann gave their logistical expertise and publishing experience. Luca Sidler (Birkhäuser), Ruth Lefevre, and Eleanor Gaffney provided invaluable production support during different phases of this book project, for which I am most grateful. Last but not least, my family, with their enthusiastic questions and noise levels, played a major role in motivating me to complete this book.

Bharath Sriraman

Contents

Volume 1 Part I Mathematics, Art, and Aesthetics . . . . . . . . . . . . . . . . . . . . . . . . .

1

1

Mathematics, Art, and Aesthetics: An Introduction . . . . . . . . . . . . Bharath Sriraman and Kyeonghwa Lee

3

2

The Art of Modern Homo Habilis Mathematicus, or: What Would Jon Borwein Do? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Scott B. Lindstrom

7

3

The Beauty of Blaschke Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ulrich Daepp, Pamela Gorkin, Gunter Semmler, and Elias Wegert

45

4

Looking Through the Glass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Annalisa Crannell

79

5

Designing Binary Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vincent J. Matsko

105

6

Homeomorphisms Between the Circular Disc and the Square . . . . Chamberlain Fong

123

7

A Visual Overview of Coprime Numbers . . . . . . . . . . . . . . . . . . . . . . Benjamín A. Itzá-Ortiz, Roberto López-Hernández, and Pedro Miramontes

149

8

Almost All Surfaces Are Made Out of Hexagons . . . . . . . . . . . . . . . Hyungryul Baik

169

9

Anamorphosis Reformed: From Optical Illusions to Immersive Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . António B. Araújo

10

Anamorphosis: Between Perspective and Catoptrics . . . . . . . . . . . . Agostino De Rosa and Alessio Bortot

175 243

ix

x

11

Contents

Geometric and Aesthetic Concepts Based on Pentagonal Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cornelie Leopold

12

Mathematics and Origami: The Art and Science of Folds . . . . . . . Natalija Budinski

13

Geometric Strategies in Creating Origami Paper Lampshades: Folding Miura-ori, Yoshimura, and Waterbomb Tessellations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jiangmei Wu

291 317

349

14

Mathematical Design for Knotted Textiles . . . . . . . . . . . . . . . . . . . . Nithikul Nimkulrat and Tuomas Nurmi

381

15

Art and Science of Rope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alexander Åström and Christoffer Åström

409

16

A Survey of Cellular Automata in Fiber Arts . . . . . . . . . . . . . . . . . . Joshua Holden and Lana Holden

443

17

Mathematics and Art: Connecting Mathematicians and Artists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joseph Malkevitch

467

18

Mathematics and Art: Unifying Perspectives . . . . . . . . . . . . . . . . . . Heather M. Russell and Radmila Sazdanovic

497

19

Spherical Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . António B. Araújo

527

20

A Hidden Order: Revealing the Bonds Between Music and Geometric Art – Part One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sama Mara and Lee Westwood

589

A Hidden Order: Revealing the Bonds Between Music and Geometric Art – Part Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lee Westwood and Sama Mara

627

21

22

Korean Traditional Patterns: Frieze and Wallpaper . . . . . . . . . . . . Hyunyong Shin, Shilla Sheen, Hyeyoun Kwon, and Taeseon Mun

649

23

Projections of Knots and Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alexander Åström and Christoffer Åström

665

24

Comparative Temple Geometries . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kelly McGonigal

697

25

Wasan Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hiroshi Okumura

711

Contents

26

27

xi

Geometries of Light and Shadows, from Piero della Francesca to James Turrell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Agostino De Rosa and Francesco Bergamo

763

TOND to TOND: Self-Similarity of Persian TOND Patterns, Through the Logic of the X-Tiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jean-Marc Castera

801

28

Artistic Manifestations of Topics in String Theory . . . . . . . . . . . . . Nadav Drukker

29

Cutting, Gluing, Squeezing, and Twisting: Visual Design of Real Algebraic Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stephan Klaus

30

Double Layered Polyhedra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rinus Roelofs

841

863 879

Volume 2 Part II 31

Mathematics, Humanities, and the Language Arts . . . . . . . . . .

961

Mathematics, Humanities, and the Language Arts: An Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gizem Karaali and Bharath Sriraman

963

32

Mathematics and Poetry: Arts of the Heart . . . . . . . . . . . . . . . . . . . Gizem Karaali and Lawrence M. Lesser

33

“Elegance in Design”: Mathematics and the Works of Ted Chiang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jessica K. Sklar

967

981

34

Running in Shackles: The Information-Theoretic Paradoxes of Poetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1001 Dmitri Manin

35

Metaphor: A Key Element of Beauty in Poetry and Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1015 Sânziana Caraman and Lorelei Caraman

36

Poems Structured by Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . 1045 Daniel May

37

Lewis Carroll’s Defense of Euclid: Parallels or Contrariwise . . . . 1093 Natalie Schuler Evers

xii

Contents

Part III

Mathematics and Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 1115

38

Architecture and Mathematics: An Ancient Symbiosis . . . . . . . . . . 1117 Michael J. Ostwald

39

Egyptian Architecture and Mathematics . . . . . . . . . . . . . . . . . . . . . . 1135 Corinna Rossi

40

Labyrinth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1147 Tessa Morrison

41

Classical Greek and Roman Architecture: Mathematical Theories and Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1163 Sylvie Duvernoy

42

Classical Greek and Roman Architecture: Examples and Typologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1181 Sylvie Duvernoy

43

Mathematics and the Art and Science of Building Medieval Cathedrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1203 Josep Lluis i Ginovart

44

Renaissance Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1245 Sylvie Duvernoy

45

Baroque Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1261 Sylvie Duvernoy

46

Temple of Solomon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1277 Tessa Morrison

47

Utopian Cities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1293 Tessa Morrison

48

Tessellated, Tiled, and Woven Surfaces in Architecture . . . . . . . . . 1309 Michael J. Ostwald

49

Stereotomy: Architecture and Mathematics . . . . . . . . . . . . . . . . . . . 1325 Giuseppe Fallacara and Roberta Gadaleta

50

Fractal Geometry in Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1345 Josephine Vaughan and Michael J. Ostwald

51

Parametric Design: Theoretical Development and Algorithmic Foundation for Design Generation in Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1361 Ning Gu, Rongrong Yu, and Peiman Amini Behbahani

52

Shape Grammars: A Key Generative Design Algorithm . . . . . . . . 1385 Ning Gu and Peiman Amini Behbahani

53

Space Syntax: Mathematics and the Social Logic of Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1407 Michael J. Dawes and Michael J. Ostwald

54

Isovists: Spatio-visual Mathematics in Architecture . . . . . . . . . . . . 1419 Michael J. Dawes and Michael J. Ostwald

55

Fractal Dimensions in Architecture: Measuring the Characteristic Complexity of Buildings . . . . . . . . . . . . . . . . . . . . . . . 1433 Michael J. Ostwald and Josephine Vaughan

Part IV

Mathematics in Society . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1451

56

Mathematics in Society: An Introduction . . . . . . . . . . . . . . . . . . . . . 1453 Bharath Sriraman

57

Probabilistic Thinking from Elementary Grades to Graduate School . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1455 John Beam

58

Risk and Decision Making: Modeling and Statistics in Medicine – Fundamental Aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1471 Manfred Borovcnik

59

Risk and Decision Making: Modeling and Statistics in Medicine – Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1507 Manfred Borovcnik

60

To Justice Through Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1543 Mary W. Gray

61

Actuarial (Mathematical) Modeling of Mortality and Survival Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1559 Patrick L. Brockett and Yuxin Zhang

62

Mathematics in the Maritime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1593 Kyra Mycroft and Bharath Sriraman

63

Mathematics and Economics, with Special Attention to Social Choice Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1613 Maurice Salles

64

Social Algorithms and Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 1637 Xin-She Yang

65

Applications of the Gini Index Beyond Economics and Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1661 Roberta La Haye and Petr Zizler

66

A Computational Music Theory of Everything: Dream or Project? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1691 Guerino Mazzola

67

Groovy Mathematics: Toward a Theoretical Model of Rhythm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1711 Carl Haakon Waadeland

68

Music, Dance, and Differential Equations . . . . . . . . . . . . . . . . . . . . . 1731 Lorelei Koss

69

Breaking the Ice: Figure Skating . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1749 Diana Cheng

70

The Mathematical Foundations of the Science of Cities . . . . . . . . . 1795 Christa Brelsford and Taylor Martin

71

Gilles Deleuze’s The Fold: Calculus and Curvilinear Design . . . . . 1819 Menno Hubregtse

72

Mathematics and Oenology: Exploring an Unlikely Pairing . . . . . 1835 Lucio Cadeddu, Alessandra Cauli, and Stefano De Marchi

73

CombinArtorial Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1867 Urban Larsson

Volume 3

74

Combinatorial Artists: Counting, Permutations, and Other Discrete Structures in Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1925 Lali Barrière

Part V

Mathematics, Science, and Dynamical Systems . . . . . . . . . . . . 1965

75

Mathematics, Science, and Dynamical Systems: An Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1967 Torsten Lindström and Bharath Sriraman

76

Modern Ergodic Theory: From a Physics Hypothesis to a Mathematical Theory with Transformative Interdisciplinary Impact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1969 Doğan Çömez

77

Two-Way Thermodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1993 L. S. Schulman

78

Visualizing Four Dimensions in Special and General Relativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2003 Magdalena Kersting

79

Coevolution of Mathematics, Statistics, and Genetics . . . . . . . . . . . 2039 Yun Joo Yoo

80

Topology in Biology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2073 Ann Sizemore Blevins and Danielle S. Bassett

81

Dynamical Systems and Fitness Maximization in Evolutionary Biology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2097 William Basener, Salvador Cordova, Ola Hössjer, and John Sanford

82

Damped Dynamical Systems for Solving Equations and Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2171 Mårten Gulliksson, Magnus Ögren, Anna Oleynik, and Ye Zhang

83

Mathematics and Climate Change . . . . . . . . . . . . . . . . . . . . . . . . . . . 2217 Gerrit Lohmann

84

Mathematical Models Can Predict the Spread of an Invasive Species . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2249 John G. Alford

85

Mathematics and Recurrent Population Outbreaks . . . . . . . . . . . . 2275 Torsten Lindström

86

Limit Cycles in Planar Systems of Ordinary Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2291 Torsten Lindström

87

Mathematical Models in Neuroscience: Approaches to Experimental Design and Reliable Parameter Determination . . . . 2319 Denis Shchepakin, Leonid Kalachev and Michael Kavanaugh

88

Interdisciplinary Mathematics and Sciences in Schematic Ocean Current Maps in the Seas Around Korea . . . . . . . . . . . . . . . 2359 Kyung-Ae Park, Jae-Jin Park, Ji-Eun Park, Byoung-Ju Choi, Sang-Ho Lee, Do-Seong Byun, Eun-Il Lee, Boon-Soon Kang, Hong-Ryeol Shin, and Sang-Ryong Lee

Part VI

Mathematics, History, and Philosophy . . . . . . . . . . . . . . . . . . . 2389

89

Mathematics, History, and Philosophy: An Introduction . . . . . . . . 2391 Bharath Sriraman

90

Writing the History of Mathematics: Interpretations of the Mathematics of the Past and Its Relation to the Mathematics of Today . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2395 Johanna Pejlare and Kajsa Bråting

91

Mathematics and Cultures Across the Chessboard: The Wheat and Chessboard Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2421 Alberto Bardi

92

Ancient Greek Methods of Measuring Astronomical Sizes . . . . . . . 2445 Adam Clinch

93

Space and Time in the Foundations of Mathematics, or Some Challenges in the Interactions with Other Sciences . . . . . . . . . . . . . 2459 Giuseppe Longo

94

Baroquian Folds: Leibniz on Folded Fabrics and the Disruption of Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2487 Michael Friedman

95

Nyaya Methodology and Western Mathematical Logic: Origins and Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2515 Martin Schmidt and Bharath Sriraman

96

Reception and Contestation: Mathematics and Esoteric Spirituality, 1875–1915 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2539 K. G. Valente

97

Islamic Design and Its Relation to Mathematics . . . . . . . . . . . . . . . . 2561 Brian Wichmann and David Wade

98

Mathematical Explanations and Mathematical Applications . . . . . 2587 Markus Pantsar

Part VII

Mathematical Influences and New Directions . . . . . . . . . . . . . 2603

99

Introduction to Mathematical Influences and New Directions . . . . 2605 K. G. Valente

100

Ethnomodelling as the Translation of Diverse Cultural Mathematical Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2607 Milton Rosa and Daniel Clark Orey

101

Cognition, Interdisciplinarity, and Equity . . . . . . . . . . . . . . . . . . . . . 2637 Aditya P. Adiredja

102

Mathematics and Rhetoric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2663 G. Mitchell Reyes

103

Modes and Modalities of Mathematical Authority: Disseminating the “New Infinite,” 1870–1920 . . . . . . . . . . . . . . . . . . 2685 K. G. Valente

104

“Bok Bok”: Exploring the Game of Chicken in Film . . . . . . . . . . . . 2707 Jennifer Firkins Nordstrom and Jessica K. Sklar

105

Moral Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2729 Sarah Voss

106

Feminist Theories Informing Mathematical Practice . . . . . . . . . . . 2753 Linda McGuire

107

Queer(y)ing Mathematical Knowledge and Practices . . . . . . . . . . . 2777 K. G. Valente

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2795

About the Editor

Bharath Sriraman is a professor of mathematics at the University of Montana, Missoula, known internationally for his research in the interdisciplinary aspects of mathematics with the arts and sciences, cognition, creativity, history and philosophy of mathematics, and mathematics education. To date, Professor Sriraman has published 300+ journal articles, book chapters, proceedings papers, and reference work entries in his areas of interest, which include 31 edited books. In 2016, he was named the University of Montana Distinguished Scholar. He is the founder and editor-in-chief of The Mathematics Enthusiast, an independent, peer-reviewed, open access international journal now in its 18th year of existence. He is the co-founder and co-series editor of Advances in Mathematics Education and Creativity Theory and Action in Education, both published by Springer. Professor Sriraman has held more than 30 visiting professorships at institutions in Norway, Iceland, Sweden, Germany, Turkey, Iran, Malaysia, Canada, South Africa, Colombia, and Argentina, which include two US Fulbright awards. The Handbook of the Mathematics of the Arts and Sciences has been his most ambitious editorial project to date. He is presently curating and editing The Handbook of the History and Philosophy of Mathematical Practice, another Springer Major Reference Works project. In his spare time, he is an amateur arborist.

Editorial Board

Section Editors
Gizem Karaali Pomona College, Claremont, USA
Kyeong-Hwa Lee Seoul National University, Seoul, South Korea
Torsten Lindström Linnaeus University, Växjö, Sweden
Michael J. Ostwald University of New South Wales, Sydney, Australia
Ken Valente Colgate University, Hamilton, USA

Consulting Editors
Alexandre Borovik Manchester University, Manchester, UK
Nathalie Sinclair Simon Fraser University, Burnaby, Canada
Daina Taimina Cornell University, Ithaca, USA

Contributors

Aditya P. Adiredja The University of Arizona, Tucson, AZ, USA
John G. Alford Department of Mathematics and Statistics, Sam Houston State University, Huntsville, TX, USA
Peiman Amini Behbahani School of Art, Architecture and Design, University of South Australia, Adelaide, South Australia, Australia
António B. Araújo CIAC-UAb, Center for Research in Arts and Communication, Universidade Aberta, Lisbon, Portugal
Alexander Åström Gothenburg, Sweden
Christoffer Åström Ucklum, Sweden
Hyungryul Baik Department of Mathematical Sciences, KAIST, Yuseong-gu, Daejeon, South Korea
Alberto Bardi Polonsky Academy for Advanced Study in the Humanities and Social Sciences, Van Leer Jerusalem Institute, Jerusalem, Israel
Lali Barrière Departament de Matemàtiques, Universitat Politècnica de Catalunya, Barcelona, Spain
William Basener University of Virginia, Rochester, NY, USA
Danielle S. Bassett Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
John Beam Department of Mathematics, University of Wisconsin Oshkosh, Oshkosh, WI, USA
Francesco Bergamo Dipartimento di Culture del Progetto, Università Iuav di Venezia, Venezia, Italy
Ann Sizemore Blevins Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
Manfred Borovcnik Department of Statistics, Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria

Alessio Bortot University Iuav in Venice/dCP, Venice, Italy
Kajsa Bråting Department of Education, Uppsala University, Uppsala, Sweden
Christa Brelsford Oak Ridge National Laboratory, Oak Ridge, TN, USA
Patrick L. Brockett Department of Information, Risk and Operations Management, University of Texas at Austin, Austin, TX, USA
Natalija Budinski Petro Kuzmjak School, Ruski Krstur, Serbia
Do-Seong Byun Korea Hydrographic and Oceanographic Agency, Busan, Korea
Lucio Cadeddu University of Cagliari, Cagliari, Italy
Lorelei Caraman Department of English, Alexandru Ioan Cuza University, Iasi, Romania
Sânziana Caraman Department of Mathematics, Gheorghe Asachi Technical University, Iasi, Romania
Jean-Marc Castera Independent Artist, Paris, France
Alessandra Cauli Politecnico di Torino, Turin, Italy
Diana Cheng Towson University, Towson, MD, USA
Byoung-Ju Choi Chonnam National University, Gwangju, Korea
Adam Clinch Capital High School, Helena, MT, USA
Doğan Çömez Department of Mathematics, North Dakota State University, Fargo, ND, USA
Salvador Cordova FMS Foundation, Canandaigua, NY, USA
Annalisa Crannell Franklin & Marshall College, Lancaster, PA, USA
Ulrich Daepp Bucknell University, Lewisburg, PA, USA
Michael J. Dawes University of New South Wales, Sydney, NSW, Australia
Stefano De Marchi University of Padova, Padova, Italy
Agostino De Rosa Dipartimento di Culture del Progetto, Università Iuav di Venezia, Venezia, Italy
Nadav Drukker Department of Mathematics, King’s College London, London, UK
Sylvie Duvernoy Politecnico di Milano, Milan, Italy
Natalie Schuler Evers University of South Alabama, Mobile, AL, USA
Giuseppe Fallacara DICAR, Politecnico di Bari, Bari, Italy
Chamberlain Fong exile.org, San Francisco, CA, USA

Michael Friedman Excellence Cluster Matters of Activity, Humboldt University, Berlin, Germany
Roberta Gadaleta DICAR, Politecnico di Bari, Bari, Italy
Pamela Gorkin Bucknell University, Lewisburg, PA, USA
Mary W. Gray American University, Washington, DC, USA
Ning Gu School of Art, Architecture and Design, University of South Australia, Adelaide, South Australia, Australia
Mårten Gulliksson Mathematics, School of Engineering and Technology, Örebro, Sweden
Joshua Holden Rose-Hulman Institute of Technology, Terre Haute, IN, USA
Lana Holden SkewLoose, LLC, Terre Haute, IN, USA
Ola Hössjer Stockholm University, Stockholm, Sweden
Menno Hubregtse Art History and Visual Studies, University of Victoria, Victoria, BC, Canada
Benjamín A. Itzá-Ortiz Universidad Autónoma del Estado de Hidalgo, Pachuca, Mexico
Leonid Kalachev University of Montana, Missoula, MT, USA
Boon-Soon Kang Korea Hydrographic and Oceanographic Agency, Busan, Korea
Gizem Karaali Department of Mathematics, Pomona College, Pomona, CA, USA
Michael Kavanaugh University of Montana, Missoula, MT, USA
Magdalena Kersting Department of Physics, University of Oslo, Oslo, Norway; ARC Centre of Excellence OzGrav, Swinburne University of Technology, Hawthorn, Australia
Stephan Klaus Oberwolfach Research Institute for Mathematics, Oberwolfach, Germany
Lorelei Koss Department of Mathematics and Computer Science, Dickinson College, Carlisle, PA, USA
Hyeyoun Kwon Gyeongsangnam-do Office of Education, Changwon, South Korea
Roberta La Haye Mount Royal University, Calgary, AB, Canada
Urban Larsson School of Computing, National University of Singapore, Singapore, Singapore
Eun-Il Lee Korea Hydrographic and Oceanographic Agency, Busan, Korea

Kyeonghwa Lee Department of Mathematics Education, College of Education, Seoul National University, Seoul, South Korea
Sang-Ho Lee Kunsan National University, Gunsan, Korea
Sang-Ryong Lee Pusan National University, Busan, Korea
Cornelie Leopold FATUK – Faculty of Architecture, TUK Kaiserslautern, Kaiserslautern, Rheinland-Pfalz, Germany
Lawrence M. Lesser Department of Mathematical Sciences, The University of Texas at El Paso, El Paso, TX, USA
Scott B. Lindstrom Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
Torsten Lindström Department of Mathematics, Linnaeus University, Växjö, Sweden
Josep Lluis i Ginovart Universitat Internacional de Catalunya, Barcelona, Spain
Gerrit Lohmann Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany; University of Bremen, Bremen, Germany
Giuseppe Longo Centre Cavaillès, CNRS and Ecole Normale Supérieure, Paris, France; School of Medicine, Tufts University, Boston, MA, USA
Roberto López-Hernández Universidad Autónoma del Estado de Hidalgo, Pachuca, Mexico
Joseph Malkevitch York College (CUNY), New York, NY, USA
Dmitri Manin Independent researcher, Menlo Park, CA, USA
Sama Mara Musical Forms, London, UK
Taylor Martin Sam Houston State University, Huntsville, TX, USA
Vincent J. Matsko Independent Scholar, St. Petersburg, FL, USA
Daniel May Black Hills State University, Spearfish, SD, USA
Guerino Mazzola School of Music, University of Minnesota, Minneapolis, MN, USA
Kelly McGonigal Independent Scholar, Anaconda, MT, USA
Linda McGuire Department of Mathematics and Computer Science, Muhlenberg College, Allentown, PA, USA

Pedro Miramontes Universidad Nacional Autónoma de México, Mexico City, Mexico
Tessa Morrison The School of Architecture and Built Environment, The University of Newcastle, Newcastle, NSW, Australia
Taeseon Mun Seoul Metropolitan Office of Education, Seoul, South Korea
Kyra Mycroft University of Montana, Missoula, MT, USA
Nithikul Nimkulrat OCAD University, Toronto, Canada
Jennifer Firkins Nordstrom Mathematics Department, Linfield College, McMinnville, OR, USA
Tuomas Nurmi Turku, Finland
Magnus Ögren Mathematics, School of Engineering and Technology, Örebro, Sweden
Hiroshi Okumura Faculty of Engineering, Department of Life science and Informatics, Graduate School of Engineering, Division of Life Science and Informatics, Maebashi, Gunma, Japan
Anna Oleynik Department of Mathematics, University of Bergen, Bergen, Norway
Daniel Clark Orey Departamento de Educação Matemática, Universidade Federal de Ouro Preto, Ouro Preto, Minas Gerais, Brazil
Michael J. Ostwald UNSW Built Environment, University of New South Wales, Sydney, NSW, Australia
Markus Pantsar University of Helsinki, Helsinki, Finland
Jae-Jin Park Seoul National University, Seoul, Korea
Ji-Eun Park Seoul National University, Seoul, Korea
Kyung-Ae Park Seoul National University, Seoul, Korea
Johanna Pejlare Department of Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg, Göteborg, Sweden
G. Mitchell Reyes Lewis & Clark College, Portland, OR, USA
Rinus Roelofs Independent Sculptor, Hengelo, The Netherlands
Milton Rosa Departamento de Educação Matemática, Universidade Federal de Ouro Preto, Ouro Preto, Minas Gerais, Brazil
Corinna Rossi Politecnico di Milano, Milan, Italy
Heather M. Russell Department of Mathematics and Computer Science, University of Richmond, Richmond, VA, USA

Maurice Salles CREM (UMR-CNRS 6211), University of Caen-Normandy, Caen Cedex, France; CPNSS, London School of Economics, London, UK; Murat Sertel Center for Advanced Economic Studies, Bilgi University, Istanbul, Turkey
John Sanford FMS Foundation, Canandaigua, NY, USA
Radmila Sazdanovic Department of Mathematics, North Carolina State University, Raleigh, NC, USA
Martin Schmidt Western Nevada College, Carson City, NV, USA
L. S. Schulman Physics Department, Clarkson University, Potsdam, NY, USA
Gunter Semmler Technische Universität Bergakademie Freiberg, Freiberg, Germany
Denis Shchepakin University of Montana, Missoula, MT, USA
Shilla Sheen Graduate School, Korea National University of Education, Cheongju-si, South Korea
Hong-Ryeol Shin Kongju National University, Gongju, Korea
Hyunyong Shin Korea National University of Education, Cheongju-si, South Korea
Jessica K. Sklar Mathematics Department, Pacific Lutheran University, Tacoma, WA, USA
Bharath Sriraman Department of Mathematical Sciences, University of Montana, Missoula, MT, USA
K. G. Valente Colgate University, Hamilton, NY, USA
Josephine Vaughan The University of Newcastle, Newcastle, NSW, Australia
Sarah Voss Independent Scholar, Omaha, NE, USA
Carl Haakon Waadeland Department of Music, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
David Wade Independent Scholar, Llanidloes, UK
Elias Wegert Technische Universität Bergakademie Freiberg, Freiberg, Germany
Lee Westwood University of Sussex, Brighton, UK
Brian Wichmann Independent Scholar, Woking, UK
Jiangmei Wu Eskenazi School of Art, Architecture and Design, Indiana University, Bloomington, IN, USA
Xin-She Yang Department of Design Engineering and Maths, School of Science and Technology, Middlesex University, London, UK

Yun Joo Yoo Department of Mathematics Education, Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
Rongrong Yu School of Engineering and Built Environment, Griffith University, Southport, Australia
Ye Zhang Faculty of Mathematics, Chemnitz University of Technology, Chemnitz, Germany
Yuxin Zhang Department of Management and Information Systems, Wayne State University, Detroit, MI, USA
Petr Zizler Mount Royal University, Calgary, AB, Canada

Part I Mathematics, Art, and Aesthetics

1

Mathematics, Art, and Aesthetics: An Introduction Bharath Sriraman and Kyeonghwa Lee

Abstract In this short introduction, the section “Mathematics, Art, and Aesthetics” of the Handbook of the Mathematics of the Arts and Sciences is discussed. In particular, the interplay of mathematics, art, and aesthetics is examined in relation to the connections between these disciplines.

Keywords Aesthetics · Art · Mathematics and art

B. Sriraman Department of Mathematical Sciences, University of Montana, Missoula, MT, USA
K. Lee Department of Mathematics Education, College of Education, Seoul National University, Seoul, South Korea
© Springer Nature Switzerland AG 2021 B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_140

Aesthetics, from the Greek aisthetikos, deals not only with the perception of beauty; it is also a branch of philosophy whose concerns extend beyond the realm of the sensory. The term can also refer to the theoretical principles underlying art for the purpose of historical comparison, e.g., art movements from different periods or changes in musical composition. Art seems inseparable from aesthetics, but the connection of mathematics with aesthetics is less obvious and perhaps even unusual. In other words, can one discuss mathematics per se and aesthetics without using art as a mediator? This section of the Handbook is substantially longer than any other section, which suggests that the relationship between mathematics and aesthetics is more pervasive than one imagines. The aim of this introduction is to unravel some strands of this relationship.

An issue that immediately arises in any discussion of mathematics and art is how these two disciplines are defined. One could view such questions as rhetorical: doesn’t everyone know what mathematics is? Or what art is? Let us start with the latter. The unabridged version of the Random House Dictionary of the English Language has 16 meanings for the word “art,” starting with “the quality, production, or expression, according to aesthetic principles of what is beautiful, appealing, or of more than ordinary significance” and ending with an archaic meaning that is “science, learning, or scholarship.” In this definition, clearly “art” is being referred to in terms of quality, production, or expression. If we substitute “mathematics” in place of “art,” then we need to determine the criteria of “aesthetic principles of what is beautiful” and “more than ordinary significance.” Who or what determines these criteria?

There are numerous anecdotal accounts from eminent mathematicians and physicists which we will not refer to here, but needless to say these accounts attest to the important role of aesthetics in the development of mathematics, its theorems, and proofs. These accounts typically appeal to those who understand the mathematics. In other words, these descriptions are subjective. There can never be anything that universally appeals to everyone. In order to be of “more than ordinary significance,” the burden of selection or curation by experts in a discipline has to be met. In some rare cases, the “object” simply has stood the test of time (e.g., the Taj Mahal).

Olsson (2019) narrates a portion of the letters during WWII between André Weil, a mathematician, and Simone Weil, a philosopher, in which Simone asked her brother to explain to her what he did as a mathematician. The answers that André gave Simone were vague.
Olsson writes: That math is an art, that one of its signature qualities is its beauty—these are ideas that continue to be articulated by mathematicians, even as non-mathematicians may wonder, as Simone did, what that could possibly mean. I myself become wary when a mathematician or scientist speaks about the beauty of her discipline, since it can seem vague and high-handed, if not wrong.

Others, like Paul Erdős, were more dismissive about questions of beauty in mathematics: “If you don’t see why, someone can’t tell you.” It is difficult to “see” the beauty in numbers, abstract mathematics, or proofs; however, visual artistic expression offers glimpses into why questions of beauty or appeal can be posed in relation to mathematics. To wit, the chapters in this section of the book can be loosely divided into three strands, namely:

1. Mathematics present in artistic expression. In particular, the interplay of geometry and art both in visual and musical form.
2. Aesthetic experiences in mathematics that mediate artistic possibilities. In other words, art that is the result of mathematics.
3. Art through a mathematical lens that suggests ideas or concepts in a sensual form.

One can also refer to Hegel (1835), who proposed that art should make ideas intuitive in a sensual sense, and even transcend thought. This suggestion is realized in some chapters in this section, which propose ways in which music can be transformed into geometric art (rhythm to pattern) and vice versa. Another chapter demonstrates how traditional Korean musical instruments can play the seven frieze patterns. Other chapters of this section give a more intra-mathematical perspective of beauty by using modern methods of computer graphics and programming to transform abstract mathematical concepts like co-prime numbers, Möbius transformations, Blaschke products, and the Riemann zeta function into stunning visual forms. Caveat emptor: an individual simply looking at the visual forms may not necessarily realize or understand the mathematics conveyed through these visualizations. However, this is by no means a prerequisite for appreciating the striking visual patterns that emerge. Finally, a tactile perspective of mathematics through art in the form of origami, braiding, and knots is also given.

It has become clichéd, or a form of mathematical kitsch, to refer to the ubiquity of the golden ratio in different visual representations in art; the chapters in this section more or less debunk this cliché. Mathematics offers much more than a study of ratios to be able to appreciate art, just as art continually presents possibilities for mathematics to develop a lens through which we might enlarge our aesthetic understanding.

References
Hegel GW (1835) Vorlesungen über die Aesthetik. https://www.lernhelfer.de/sites/default/files/lexicon/pdf/BWS-DEU2-0170-04.pdf
Olsson K (2019) The artistic beauty of math. The Paris Review. https://www.theparisreview.org/blog/2019/07/22/the-aesthetic-beauty-of-math/

2

The Art of Modern Homo Habilis Mathematicus, or: What Would Jon Borwein Do? Scott B. Lindstrom

Contents
Introduction: Who Is Modern Homo Habilis Mathematicus?
What Would Jon Do?
Phase Portraits: A Motivating Example
Reinvention by Bridging Between Contexts
Repurposing Phase Portraits for Dynamical Systems
Dynamical Geometry and Asymptotic Destination Plotting
Experimentally Checking Numerical Error
Completing the Circle: The Line from Specific to General
Sometimes All You Need Is a Good Walk
Walking on a Dynamical System
When the Computer Knows More Than You Do
Symbolic Answers from Numerical Approximations
Conic Programming and Mystery Geometry
Conclusion
References

Abstract Jonathan Borwein was a founder and early champion of the field of experimental mathematics. His high-profile accomplishments and extensive writing on the role of computational discovery have served to establish experimental mathematics as a field in its own right. In the wake of his passing, the question “What would Jon do?” has served as a frequent catalyst in the investigations of the present author. The present work draws the inspiration for its name from Borwein’s first posthumous book chapter, and it is intended as a spiritual progeny to all of

S. B. Lindstrom () Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong e-mail: [email protected] © Springer Nature Switzerland AG 2021 B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_133

7

8

S. B. Lindstrom

Borwein’s expositions on experimental mathematics. Borwein was well known as an ardent supporter of the use of visualization. This work samples the myriad of ways in which artistic methods are instrumental to experimental mathematics, along with the ways that experimental mathematics gives birth to unexpected art. The examples are problems encountered by the present author and solved with techniques motivated by the strategies and interests of Borwein. These motivations are highlighted. Particular emphasis is placed on tools, strategies, and the broader arc that research follows: from low-dimensional, specific, and visible, to high-dimensional and general. Topics include dynamical systems, geometry, optimization, error bounds, random walks, special functions, and number theory.

Keywords Experimental mathematics · Visualization · Jonathan Borwein · Dynamical systems · Phase portraits · Random walks · Computational discovery

Introduction: Who Is Modern Homo Habilis Mathematicus?

Sometimes it is easier to see than to say. – Jonathan Borwein

Jonathan Borwein passed this simple wisdom along to all those he worked with, and it underpinned his approach to computational discovery. In his first posthumous book chapter, The Life of Modern Homo Habilis Mathematicus: Experimental Computation and Visual Theorems (Borwein 2016b), he included the following quote of Littlewood, along with his own contextualization thereof:

Long before current graphic, visualization and geometric tools were available, John E. Littlewood, 1885–1977, wrote in his delightful Miscellany:

A heavy warning used to be given [by lecturers] that pictures are not rigorous; this has never had its bluff called and has permanently frightened its victims into playing for safety. Some pictures, of course, are not rigorous, but I should say most are (and I use them whenever possible myself). (Littlewood 1953, p. 53)

This exposition is intended as a spiritual progeny to Borwein (2016b), from whence it draws the inspiration for its name. The question “What Would Jon Do?” has been catalytic for many challenges encountered by the present author. This work samples a collection of such investigations and describes how the techniques used to advance them are inspired by the broader strategies and interests of Borwein. Emphasis is given to the myriad of ways in which artistic methods are instrumental to computational discovery, along with the ways that experimental mathematics gives birth to unexpected art. As a first example, Fig. 1 shows the use of dynamical geometry software to find stable sets of periodic points of a dynamical system. Once the discovery is made, minimal modification of the view produces the artistic rendering in Fig. 2.


Fig. 1 Experimental discovery of periodic tendencies for a dynamical system

By experimental mathematics, we mean, as discussed by Borwein and Bailey (2008) and recalled by Borwein (2016b):

1. Gaining insight and intuition
2. Discovering new relationships
3. Visualizing math principles
4. Testing and especially falsifying conjectures
5. Exploring a possible result to see if it merits formal proof
6. Suggesting approaches for formal proof
7. Computing replacing lengthy hand derivations
8. Confirming analytically derived results

The problems sampled in this exposition will showcase each of these.

Fig. 2 Periodic tendencies of the Douglas–Rachford dynamical system, as revealed by a selection of sequences

What Would Jon Do?

For inspiration in answering the eponymous question, the reader is well served by Borwein’s writings. A brief list should at least contain his aforementioned chapter (Borwein 2016b), Mathematics by experiment: Plausible reasoning in the 21st century (Borwein and Bailey 2008), The computer as crucible: An introduction to experimental mathematics (Borwein and Devlin 2008), Experimentation in mathematics: Computational paths to discovery (Borwein et al. 2006), coauthored with David Bailey and Roland Girgensohn, and Experimental and computational mathematics: Selected writings (Borwein and Borwein 2010), coauthored with Jon’s brother Peter Borwein. A list of such brevity is an unavoidable sin; at the time of his death, Borwein’s curriculum vitae showed 388 refereed journal articles, 103 refereed or invited proceedings items, and many books (Borwein 2016a), along with far too many outstanding metrics and accomplishments to begin listing them. Over 80 of the articles and 5 books were coauthored with David H. Bailey, his long-time friend and cofounder of the experimental mathematics field (Bailey 2016). These tallies have grown in the years since his passing; Bailey and Beebe (2020) curate a living catalog.

As for the man himself, the present author’s personal remembrance of him is found in (Lindstrom 2019) and online at the Jonathan Borwein Memorial Website (Lindstrom 2016), along with many others. Bailey et al. (2020) provided a brief biography and overview of the main areas to which Borwein made major contributions. These were:

1. Analysis and optimization
2. Education
3. Financial mathematics
4. Number theory, special functions, and pi (for his contributions to both the subject and the popularization thereof, Borwein received the nickname Doctor Pi)
5. Experimental mathematics and visualization

Devlin (2020) chronicles the evolution of experimental mathematics as a discipline in its own right. Borwein’s pioneering advances therein spanned his investigations in the other four. For example, Naomi Borwein explains:

When Jon wrote about Homo Habilis Mathematicus, he was advocating a new way of thinking about math education. Indeed, Jon labels the experimental mathematician Modern Homo Habilis Mathematicus where Australopithecus meets Homo Aestheticus with digital literacy and math tools. The term was playing on the extinct human ancestor, but it was also embodying the modern experimental mathematician as a math educator caught between two cultures (Borwein 2016b). For Jon, those cultures were old and new: Enlightenment era versus modern. But for me, and from conversations I had with him while he was writing (the referenced chapter), he was also playing with the idea of how experimental math sits between the mathematical research culture and the culture of math education. That’s because experimental math is an education tool. (Borwein and Osborn 2020)

Many of the tools exemplified by the present work (e.g., phase portraits, Cinderella, GeoGebra, computer algebra systems) are amenable to such classroom use. Borwein and Osborn (2020) also do an excellent job of summarizing succinctly where experimental mathematics lives:

Jon was a vocal advocate and disciplinary father of modern (computational and) Experimental Mathematics; the field is in fact both an educational and heuristic tool and a philosophical way of doing and experiencing mathematics. It sits on the boundary of pedagogy and research, and between proof and artefact. (Borwein and Osborn 2020)

On the matter of a succinct summary of what experimental mathematics is, perhaps the final word is best left to Borwein and Devlin (2008): United States Supreme Court justice Potter Stewart famously observed in 1964 that, although he was unable to provide a precise definition of pornography, “I know it when I see it.” We would say the same is true for experimental mathematics. Nevertheless, we realize that we owe our readers at least an approximate initial definition (of experimental mathematics, that is; you’re on your own for pornography) to get started with, and here it is. Experimental mathematics is the use of a computer to run computations—sometimes no more than trial-and-error tests—to look for patterns, to identify particular numbers and sequences, to gather evidence in support of specific mathematical assertions that may themselves arise by computational means, including search. Like contemporary chemists—and before them the alchemists of old—who mix various substances together in a crucible and heat them to a high temperature to see what happens, today’s experimental mathematician puts a hopefully potent mix of numbers, formulas, and algorithms into a computer in the hope that something of interest emerges.

The remainder of this exposition is devoted to examples that illustrate visual approaches to this kind of pattern-seeking exploration. The examples described in detail are drawn from the present author’s works, some examples of Borwein’s are mentioned in order to highlight meaningful connections, and some examples lie conveniently in the intersection of these two sets. (One such example involves the study of two-set feasibility problems, so Jon would approve of – if not insist upon – this wordplay.) Borwein’s affinity for such visual approaches is captured in his oft-repeated aforementioned proverb; Fig. 3 (right) depicts him sharing it with Veselin Jungic. This cartoon appears in Jungic’s (2016) remembrance of Borwein. Figure 3 (left) shows Borwein upon his election to the Australian Academy of Science.

Fig. 3 Laureate Professor Jonathan Borwein FRSC FAAAS FBAS FAustMS FAA FAMS FRSNSW (20 May 1951–2 August 2016) was a founder and early champion of experimental mathematics. The cartoon of Borwein and Veselin Jungic was drawn by cartoonist Simon Roy and is courtesy of Veselin Jungic

Phase Portraits: A Motivating Example

In The Computer as Crucible: An Introduction to Experimental Mathematics, Borwein and Keith Devlin illustrated experimental mathematics with the following story about the Riemann hypothesis, which states that the roots of the Riemann zeta function are either negative even integers or have real part 1/2.

Riemann’s 1859 memoir did not contain any clues as to how he was led to make this conjecture. For many years mathematicians believed that Riemann had come to this conclusion on the basis of some profound intuition. Indeed, the Riemann hypothesis was held up as a premier example of the heights one could attain by sheer intellect alone. In 1929, however, long after Riemann’s death, the renowned number theorist Carl Ludwig Siegel (1896–1981) learned that Riemann’s widow had donated his working papers to the Göttingen University library. Among these papers, Siegel found several pages of dense numerical calculations, with a number of the lowest-order zeroes of the zeta function calculated to several decimal places each. One can only imagine that, were computers available in Riemann’s time, the great German mathematician would have calculated several hundred zeroes. As it was, it is remarkable that he was able to formulate his conjecture on the basis of relatively little numerical evidence, but it seems clear that his method was one of experimental mathematics! (Borwein and Devlin 2008)

Had they written a few years later, they might instead have remarked that if computers were available in Riemann’s time, he could simply have plotted the function that now bears his name on the sphere that now bears his name and, with one 3D image, convinced himself of the hypothesis that now bears his name. In 2020, this image – in Fig. 4 – could be generated overnight on an unexceptional computer. The method of plotting employed here is called a phase portrait. Such images were popularized in a Notices article by Wegert and Semmler (2011), by Elias Wegert’s book Visual Complex Functions (Wegert 2012), and by the related Complex Beauties calendar series (Borwein and Straub 2016).

Fig. 4 A most Riemannian situation: a plot of the Riemann zeta function on the Riemann sphere immediately suggests the Riemann hypothesis


Fig. 5 Construction of a complex phase portrait

Fig. 6 Stereographic projection is a conformal map between plane and sphere

Phase plotting is a method of revealing the characteristics of functions by coloring domain points according to their images. In the complex plane, a phase portrait illustrates a complex function f by assigning the color wheel to the unit circle and associating each point $z \in \mathbb{C}$ with its image $f(z)/|f(z)|$ thereon. The basic construction is shown in Fig. 5 (left), together with a plot of the identity function (center). This color wheel assignment determines the hue (H) part of an HSV (hue, saturation, value) assignment, leaving additional freedom in the other two values. The additional freedom in V is often used to capture changes in the modulus of the image value. For example, Fig. 5 (right) is a plot of $z \mapsto z^3$ where a logarithmic rule has been used to incorporate the modulus of the image into the parameter V. Figure 4 uses a similar logarithmic rule for the plot of the Riemann zeta function.

The representation of the entire plane on the sphere also makes use of stereographic projection, shown in Fig. 6. The construction is illustrated at left; at right is a plot of $z \mapsto z^3$ that is similar to the one in Fig. 5 (right). In the first image in Fig. 4, the real axis is a great circle running from left to right. Roots may be identified as points where all of the different hues meet. Note that the hues also meet at z = 1, where a pole exists. Another circle wrapping around the sphere “vertically” represents the line defined by $\Re(z) = 1/2$. Other roots are also visible along this curve as points where all the hues meet. In the second image, this circle continues wrapping around the back side of the sphere, where it eventually runs through a point “at infinity” on the left side. Upon observing that all of the visible roots lie on one of these two circles about the sphere, one quickly suspects the Riemann hypothesis.
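The coloring rule described above can be sketched in a few lines. The following is a minimal illustration, not the code behind Figs. 4 and 5: the function names, the brightness range 0.6 to 1.0, and the choice to paint zeros black are all assumptions made here for concreteness. The hue comes from the position of f(z)/|f(z)| on the unit circle, and the value V follows a sawtooth in the logarithm of the modulus.

```python
import cmath
import colorsys
import math


def phase_color(w, modulus_bands=True):
    """Return an (r, g, b) triple in [0, 1]^3 for the image value w = f(z)."""
    if w == 0:
        # Assumption of this sketch: zeros, where every hue meets, are painted black.
        return (0.0, 0.0, 0.0)
    # Hue: the argument of w, rescaled from (-pi, pi] to [0, 1).
    hue = (cmath.phase(w) / (2.0 * math.pi)) % 1.0
    if modulus_bands:
        # Logarithmic rule (one possible choice): brightness ramps from 0.6 to 1.0
        # and restarts each time |w| doubles.
        value = 0.6 + 0.4 * (math.log2(abs(w)) % 1.0)
    else:
        value = 1.0
    return colorsys.hsv_to_rgb(hue, 1.0, value)


def phase_portrait(f, re_range, im_range, width, height):
    """Sample f on a width-by-height grid and return rows of RGB triples."""
    (x0, x1), (y0, y1) = re_range, im_range
    rows = []
    for j in range(height):
        y = y1 - (y1 - y0) * j / (height - 1)  # top row has the largest imaginary part
        row = []
        for i in range(width):
            x = x0 + (x1 - x0) * i / (width - 1)
            row.append(phase_color(f(complex(x, y))))
        rows.append(row)
    return rows


# The cube map of Fig. 5 (right), sampled on a coarse grid around the origin.
image = phase_portrait(lambda z: z ** 3, (-2.0, 2.0), (-2.0, 2.0), 65, 65)
```

The rows of `image` can be handed to any plotting or image library. Swapping the lambda for another function changes the portrait without touching the coloring rule.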

Reinvention by Bridging Between Contexts

It is a truism that the method of visualization should be chosen so that the phenomenon of study is clearly readable. In practice, one rarely knows a priori how best to represent the phenomenon being studied. The process is most often one of trial and error, even when no invention (or reinvention) is required. However, the process is easier if one can identify the properties that make established methods useful. Figure 4 is an informative case study.

In the plane, the hue assignment of a point corresponds to the unique half-line emanating from the origin on which its image lies. On the other hand, the V assignment reveals information about the modulus of its image; the modulus essentially describes the origin-centered circle of unique radius on which the image point lies. The two pieces of information combine to give a unique description of the map. The map of the plane to the sphere preserves this utility, because stereographic projection sends both plane circles and plane lines to circles on the sphere, all while preserving the angles at which they meet. A map that preserves the angles and orientations between curves is said to be conformal.

In Visual Complex Analysis, Needham (1997) used conformal maps to illustrate the connections between non-Euclidean geometry and the complex plane. Figure 7 shows two such conformal maps that connect three representations of hyperbolic space: the Poincaré disc, the upper half of the complex plane, and the pseudosphere.

Fig. 7 Conformal maps between the upper half plane and the Poincaré disc (left) and pseudosphere (right)

Armed with the conformal maps described by Needham (1997), Lindstrom and Vrbik (2019) reinvented phase portraits for illustrating the behavior of functions on hyperbolic space. The reinventions are nontrivial. However, they are useful because they possess many of these same properties. The constructions of Lindstrom and Vrbik (2019) assign unique colors to the preimages of unique geodesic curves in hyperbolic space. A geodesic curve is simply a curve that serves as the shortest path between its points. Examples of geodesics include the lines in Euclidean space and the great circles on a sphere. In the upper half complex plane representation of hyperbolic space, the geodesic curves are represented by semicircles centered on the real axis and vertical half-lines emanating from the real axis. Figure 10 shows a rotation plotted on this representation of hyperbolic space. Unique colors correspond to the preimages of the geodesics that are represented by vertical half-lines with real components between 0 and 2π. The preimages are themselves geodesics, the semicircles centered on the real axis and clearly visible in the rainbow.

Figure 8 shows maps introduced by Lindstrom and Vrbik (2019) that assign unique colors to the preimages of both of these sets of geodesics. Both assignments make use of inversion in a circle of radius $\sqrt{2}$ about the point $-i$, the conformal map in Fig. 7 (left) that sends the real axis to the unit circle. The half-line geodesics are said to be asymptotically parallel because they meet at a point at infinity, while the origin-centered semicircle geodesics are said to be ultra-parallel because they do not meet at all. These differences enable the two plotting methods to reveal different characteristics. These may be seen in Fig. 9, where rotations are plotted on the pseudosphere and Poincaré disc.

Fig. 8 Two different methods of assigning unique colors to the preimages of geodesics in hyperbolic space

Fig. 9 Phase portraits of rotations on hyperbolic space on the pseudosphere (left) and Poincaré disc (right)

Fig. 10 A rotation of hyperbolic space in the upper half complex plane representation illustrates geodesics

Unlike the Poincaré disc, the pseudosphere has extrinsic hyperbolic curvature, meaning that the geodesics of hyperbolic space really are the shortest paths along the pseudosphere. This advantage comes at a cost; the pseudosphere affords only a limited local window into the space. If one allows the pseudosphere to bend slightly and unroll like paper, the size of this window may be augmented. The resulting “twisted” pseudosphere is also known as Dini’s surface. Figure 11 shows three rotations of hyperbolic space illustrated thereon. Under the rotations, the points at infinity where the colors meet are each mapped to the point at infinity at the “bottom” of the surfaces.

Fig. 11 Phase portraits of rotations of hyperbolic space on twisted pseudospheres (aka Dini’s surfaces)

The approach of Lindstrom and Vrbik (2019) provides a template for reinventing an existing visualization method for a new context. What must first be determined are the properties that make the existing visualization useful. In the case of phase portraits, these are as follows.

1. The assignment makes use of curves that are meaningful in the space under consideration.
   a. In the case of complex functions, the plane curves used are the circles centered at the origin and the half-lines emanating from the origin. These correspond to modulus and argument, respectively.
   b. In the case of hyperbolic space, the plane curves are half-lines and semicircles. These correspond to asymptotically parallel and ultra-parallel geodesics, respectively.
2. The curves chosen must still be distinguishable after the application of any conformal maps involved.
   a. In the case of complex functions, both lines and circles are sent to circles by the conformal map of stereographic projection onto the sphere.
   b. In hyperbolic space:
      i. In the case of the pseudosphere, the chosen curves are sent to geodesics that may be recognized as the shortest paths along the surface.
      ii. In the case of the Poincaré disc, the chosen curves are sent to lines that divide the disc in half and semicircles that meet the edges of the disc at right angles.

In both cases, the images of the plane curves are easily recognizable. Once one understands these basic principles, reinventing existing methods in new contexts is much easier. One can also repurpose old methods to illustrate other phenomena without further modification of the methods themselves. An example of such a repurposing of phase portraits is their application to illustrating invertible dynamical systems.
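Two of the geometric facts used in this section are easy to check numerically: stereographic projection sends plane circles to circles on the sphere, and inversion in the circle of radius √2 about −i sends the real axis to the unit circle. The sketch below is only a sanity check under stated assumptions: the function names are hypothetical, and the projection formula is the standard one onto the unit sphere with the north pole playing the role of infinity, which may differ from the normalization used for the figures.

```python
import cmath
import math


def to_sphere(z):
    """Stereographic projection of a plane point z onto the unit sphere."""
    d = abs(z) ** 2 + 1.0
    return (2.0 * z.real / d, 2.0 * z.imag / d, (abs(z) ** 2 - 1.0) / d)


def invert(z, center=-1j, radius=math.sqrt(2.0)):
    """Inversion in the circle |w - center| = radius."""
    return center + radius ** 2 / (z - center).conjugate()


def triple_product(p0, p1, p2, p3):
    """Signed volume spanned by p1-p0, p2-p0, p3-p0; zero when the points are coplanar."""
    a = [p1[k] - p0[k] for k in range(3)]
    b = [p2[k] - p0[k] for k in range(3)]
    c = [p3[k] - p0[k] for k in range(3)]
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


# Four points on an arbitrary plane circle. Their images lie on the sphere and in a
# common plane, which is exactly what it means to lie on a circle of the sphere.
circle = [complex(1.0, 0.5) + 0.75 * cmath.exp(1j * t) for t in (0.0, 1.0, 2.5, 4.0)]
on_sphere = [to_sphere(z) for z in circle]
flatness = abs(triple_product(*on_sphere))

# Real points are sent to the unit circle, and i is sent to the center of the disc.
moduli = [abs(invert(complex(x, 0.0))) for x in (-3.0, -0.5, 0.0, 2.0)]
center_image = invert(1j)
```

Here `flatness` and `abs(center_image)` should vanish up to rounding, and every entry of `moduli` should equal 1 up to rounding.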


Repurposing Phase Portraits for Dynamical Systems

The iterated application of a map from a space into itself induces a dynamical system. For example, repeated rotation about a point induces a dynamical system. Lindstrom (2019) showed that if the map is invertible, it is then amenable to illustration by phase portrait. One need only invert the operator and then generate phase portraits of the inverse raised to increasing powers. Figure 12 shows the example of repeated rotation of hyperbolic space about a point. Such an illustration is easy to interpret, because it may be read like a filmstrip where colored points are in motion. The dynamical system plotting method in Fig. 12 differs appreciably from the approach of beginning with a fixed finite set of points in a bounded region and thereafter coloring their images. Figure 14 in the next section shows an example of the latter for a different dynamical system, one admitted by repeated application of a projection method for solving a feasibility problem.

Fig. 12 Inverting a dynamical system in order to illustrate it with phase portraits
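The filmstrip reading can be made concrete with a small sketch. Frame n colors each point z by the color of its preimage under n steps, so a color that sits at w in frame 0 sits at the nth image of w in frame n. Everything named below is an assumption for illustration: the rotation is written as a Möbius map of the upper half plane fixing i, and the plain argument of a complex number stands in for the geodesic-based color assignments of Lindstrom and Vrbik (2019), which are not reproduced here.

```python
import cmath
import math


def hyperbolic_rotation(theta):
    """Mobius map of the upper half plane that fixes i and rotates about it by theta."""
    c, s = math.cos(theta / 2.0), math.sin(theta / 2.0)
    return lambda z: (c * z + s) / (-s * z + c)


def hue(w):
    """Stand-in color rule: the argument of w rescaled to [0, 1)."""
    return (cmath.phase(w) / (2.0 * math.pi)) % 1.0


def frame_hue(z, n, theta):
    """Hue of z in frame n: the stand-in color of the n-th inverse iterate of z."""
    inverse = hyperbolic_rotation(-theta)  # the inverse of a rotation is the opposite rotation
    for _ in range(n):
        z = inverse(z)
    return hue(z)


# A color that sits at w in frame 0 is found at T^3(w) in frame 3.
T = hyperbolic_rotation(0.4)
w = complex(0.3, 1.2)
moved = T(T(T(w)))
drift = abs(frame_hue(moved, 3, 0.4) - frame_hue(w, 0, 0.4))
```

The quantity `drift` should vanish up to rounding, which is the sense in which the colored points appear to move with the dynamics from frame to frame.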

Dynamical Geometry and Asymptotic Destination Plotting

The feasibility problem of finding a point in the intersection of two closed sets A and B is prototypical of a more general class of constrained optimization problems. Such problems are often tackled by splitting methods, which separate components of the problem that are difficult to simultaneously address into subproblems that may be solved individually. For example, the Douglas–Rachford method seeks to solve the feasibility problem by reflecting across a closest point projection on the set A before doing the same for set B, and finally updating the governing sequence by averaging with the starting point. The construction of a single step is shown in Fig. 13 (left). This approach separates the consideration of the constraint sets A and B into different substeps. In the convex case, repeated application generates a dynamical system that generically converges to a fixed point; the fixed point admits recovery of a solution to the problem (Lions and Mercier 1979, special case). For more details, see the recent survey (Lindstrom and Sims 2018).

The algorithm solves many nonconvex problems as well. In the first example studied in full detail, A is a sphere and B a line, a case prototypical of other problems of interest. Borwein and Sims (2011) used the dynamical geometry package Cinderella to visualize the long-term behavior of the dynamical system for various start points; the behavior from four start points is shown in Fig. 13 (right). For points started on the axis of symmetry, the sequence generated by the dynamical system remains therein, as may be seen for the blue sequence. Continuously moving the start point of iteration around in the plane, one quickly suspects global convergence of the sequence whenever the start point is outside of the axis of symmetry.

Fig. 13 The Douglas–Rachford method for the feasibility problem (diagram labels: x_n, x_{n+1}, A, B, reflection across a closest point of A, reflection across a closest point of B)
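A single step of the method, as described above, composes two reflections and averages the result with the current point. The sketch below writes this out for a unit circle A and a horizontal line B in the plane. The helper names and the particular circle, line height, and start point are assumptions chosen here for illustration; they are not taken from Borwein and Sims (2011).

```python
import math


def project_circle(p):
    """A closest point of the unit circle to p (undefined at the origin)."""
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)


def project_line(p, height):
    """The closest point of the horizontal line y = height to p."""
    return (p[0], height)


def reflect(project, p):
    """Reflection of p across its closest point: 2 P(p) - p."""
    q = project(p)
    return (2.0 * q[0] - p[0], 2.0 * q[1] - p[1])


def douglas_rachford_step(p, height):
    """Reflect across A, then across B, then average with the starting point."""
    ra = reflect(project_circle, p)
    rb = reflect(lambda q: project_line(q, height), ra)
    return (0.5 * (p[0] + rb[0]), 0.5 * (p[1] + rb[1]))


def iterate(p, height, steps):
    """Apply the step repeatedly and return the final point."""
    for _ in range(steps):
        p = douglas_rachford_step(p, height)
    return p


# Started off the axis of symmetry (the vertical line x = 0), the sequence
# settles at an intersection point of the circle and the line y = 0.5.
limit = iterate((1.0, 2.0), 0.5, 200)
feasible = (math.sqrt(0.75), 0.5)
```

Starting instead at a point with first coordinate zero keeps every iterate on the axis of symmetry, matching the behavior of the blue sequence described above. The sketch does not guard against an iterate landing exactly on the origin, where the projection onto the circle is not unique.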


Fig. 14 Visualizing a dynamical system for a grid of starting points

Borwein and Sims (2011) further exploited Cinderella’s scripting to illuminate the behavior of the dynamical system for many start points at once. An example of the visualization method is shown in Fig. 14. First, one applies a rainbow tiling to a fine grid of points in a region of the plane. Thereafter, one takes each grid point and applies the same color to its nth image under the Douglas–Rachford mapping. In Fig. 14, from upper left to lower right: n = 0, 5, 10, 15. One need only perform a handful of such experiments to convince oneself that a more general global convergence result holds outside of the axis of symmetry.

Notwithstanding the elegance of the early experimental evidence, the proof took much longer. Borwein and Sims (2011) could provide only a local convergence result. Aragón Artacho and Borwein (2013) later extended it with an approach based on explicit regions. Of the gulf between this latter proof and the evidence from Figs. 13 (right) and 14, Borwein (2016b) remarked that:

. . . what we can prove [reference to image of proof] is frequently less than what we can see [reference to image from Cinderella]. There is nothing new here. The French academy stopped looking at attempts to solve the three classical ruler-and-compass construction problems of antiquity – trisection of an angle, doubling the cube, and squaring the circle – centuries before they were proven impossible during the nineteenth century.

Borwein then added, as footnote, the following comment: Indeed, changing the tools slightly makes all three constructions possible.
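The grid experiment behind Fig. 14 can be imitated outside of Cinderella. The sketch below is an assumption-laden stand-in rather than the CindyScript used by Borwein and Sims (2011): it takes A to be the unit circle and B the line y = 0.5, for which one Douglas–Rachford step reduces to the closed form (x, y) ↦ (x/ρ, h + y(1 − 1/ρ)) with ρ the norm of (x, y) and h the height of the line, and it uses a hypothetical rainbow rule based on the starting column and row.

```python
import colorsys
import math


def dr_step(p, height):
    """Douglas-Rachford step for the unit circle and the line y = height, in closed form."""
    x, y = p
    rho = math.hypot(x, y)
    return (x / rho, height + y * (1.0 - 1.0 / rho))


def rainbow_grid(xs, ys):
    """Pair every grid point with a fixed color determined by its starting position."""
    colored = []
    for j, y in enumerate(ys):
        for i, x in enumerate(xs):
            hue = i / len(xs)
            value = 0.5 + 0.5 * j / max(1, len(ys) - 1)
            colored.append(((x, y), colorsys.hsv_to_rgb(hue, 1.0, value)))
    return colored


def snapshot(colored, height, n):
    """Move every colored point to its n-th image; the colors travel with the points."""
    moved = []
    for p, rgb in colored:
        for _ in range(n):
            p = dr_step(p, height)
        moved.append((p, rgb))
    return moved


# A 20-by-20 grid that avoids the axis of symmetry x = 0, viewed at n = 0, 5, 10, 15.
xs = [-1.9 + 0.2 * i for i in range(20)]
ys = [-1.9 + 0.2 * j for j in range(20)]
grid = rainbow_grid(xs, ys)
frames = {n: snapshot(grid, 0.5, n) for n in (0, 5, 10, 15)}
```

Scatter-plotting each entry of `frames` with its stored colors gives four panels in the spirit of Fig. 14. In this setting the colors from the right half of the grid gather at one intersection point of the circle and line, and those from the left half at the other.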

Benoist (2015) finally proved the global convergence by means of a very different tool: the Lyapunov function whose level curves are shown in Fig. 15. While this is clearly not what Borwein meant by changing the tools slightly, it is a poetic and fitting conclusion to the saga. It is also a potent illustration of the power of experiment; motivated by the Cinderella images, Benoist constructed the Lyapunov function to have level curves that are tangent to the trajectories of the dynamical system.

Fig. 15 Benoist’s Lyapunov function for the circle and line problem

Once the case of a sphere and line was resolved, Borwein, Lindstrom, Sims, Skerritt, and Schneider investigated the next simplest generalization of a sphere: the ellipse and the p-sphere (Borwein et al. 2018). Once one leaves the simplicity of the sphere behind, the computation of the projection becomes more complicated than simply taking a point and scaling it by the reciprocal of its norm. Instead, one must obtain the solution by solving the Lagrangian system illustrated in Fig. 16 (left). The point of projection should lie both on the ellipse (black) and on the curve of points at which the gradients of the elliptical function (purple) and the quadratic function (blue) are colinear. The x coordinate of the solution to this system is obtained by finding a root of the Lagrangian function (red). Cinderella’s dynamical interface allows one to drag the starting point of iteration or rescale the ellipse, while continuously plotting the associated Lagrangian functions, as shown in Fig. 16 (right). As one stretches the ellipse and steepens the line, the Lagrangian functions behave more dramatically, and the Newton–Raphson method ultimately becomes unstable. Cinderella’s dynamical interface also allows one to plot the Newton–Raphson iterates and determine the source of the instability. The visual interface, together with the similarity of its scripting language – CindyScript – to C, makes Cinderella a natural learning environment for basic numerical methods.

Fig. 16 The Lagrangian system associated to the projection of a point onto an ellipse

Moreover, the iterated Douglas–Rachford method for an ellipse and line is a worthy example of a problem for which a stable and fast numerical solver is instrumental to building a useful tool. This is true because the most interesting algorithmic behavior only occurs when the ellipse is sufficiently stretched. By stretching the ellipse a little, one is able to disprove the hypothesis that the global convergence outside a set of zero measure extends; unstable periodic points appear as at top left in Fig. 17. Upon connecting every second iterate as at top right in Fig. 17 and then rotating the line dynamically as below in Fig. 17, one sees the unstable periodic points transform into stable ones.
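The projection onto an ellipse can be sketched with a safeguarded Newton–Raphson iteration. The formulation below is a standard one and is an assumption of this sketch rather than the system plotted in Fig. 16: it solves for the Lagrange multiplier λ instead of the x coordinate. For the ellipse x²/a² + y²/b² = 1 and the point (u, v), stationarity gives x = a²u/(a² + λ) and y = b²v/(b² + λ), so λ must be a root of g(λ) = (au/(a² + λ))² + (bv/(b² + λ))² − 1. A bisection bracket guards the Newton steps, which is one simple remedy for the kind of instability described above.

```python
import math


def project_onto_ellipse(u, v, a, b, tol=1e-14, max_iter=100):
    """Closest point to (u, v) on x^2/a^2 + y^2/b^2 = 1.

    Assumptions of this sketch: a >= b > 0 and v != 0 (generic points only).
    """
    if not (a >= b > 0) or v == 0:
        raise ValueError("this sketch assumes a >= b > 0 and v != 0")

    def g(lam):
        return (a * u / (a * a + lam)) ** 2 + (b * v / (b * b + lam)) ** 2 - 1.0

    def dg(lam):
        return (-2.0 * (a * u) ** 2 / (a * a + lam) ** 3
                - 2.0 * (b * v) ** 2 / (b * b + lam) ** 3)

    lo = -b * b + b * abs(v)                # g(lo) >= 0
    hi = -b * b + math.hypot(a * u, b * v)  # g(hi) <= 0
    lam = 0.5 * (lo + hi)
    for _ in range(max_iter):
        value = g(lam)
        if value == 0.0:
            break
        if value > 0.0:
            lo = lam
        else:
            hi = lam
        newton = lam - value / dg(lam)
        # Fall back to bisection whenever the Newton step leaves the bracket.
        nxt = newton if lo <= newton <= hi else 0.5 * (lo + hi)
        if abs(nxt - lam) <= tol * max(1.0, abs(lam)):
            lam = nxt
            break
        lam = nxt
    return (a * a * u / (a * a + lam), b * b * v / (b * b + lam))


# Example: project the point (3, 2) onto the ellipse with semi-axes 2 and 1.
x, y = project_onto_ellipse(3.0, 2.0, 2.0, 1.0)
residual = x * x / 4.0 + y * y - 1.0
```

The returned point should satisfy the ellipse equation up to rounding, so `residual` is expected to be negligible. When a = b the routine reduces to scaling the point by the reciprocal of its norm, as in the circle case.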


Fig. 17 Using the dynamical geometry package Cinderella to study changes in a dynamical system as a repelling basin collapses into an attractive one

Playing the same game and further stretching the ellipse, one finds points of higher periodicity, as in Fig. 1. By assigning different colors to distinct sequences emanating from distinct starting points, one can see the different periodic points together with some local approximations of their basins of attraction. One can also zoom in to examine the individual spirals formed by the periodic convergent subsequences as they approach their limits, as in Fig. 20. Figure 18 (left) shows the full ellipse from Fig. 1, including two sequences that locally converge to the solution points in the intersection of the ellipse and line. For a better understanding of the basins of attraction, at right, we assign each pixel in the plane to a starting point and color it according to its eventual destination set under the dynamical system.

Fig. 18 Two methods of visualizing the behavior of a dynamical system: viewing individual sequences (left) and coloring points according to their eventual destination sets (right)

In contrast with phase plotting, which colors domain points according to their image under a map, this is a kind of asymptotic destination plotting. Rather than assigning colors to preimages of curves, colors are assigned to sets that are identified from Cinderella plots to be eventually periodic under the method (including period one for feasible points).

The best-known experimental renderings of dynamical systems are probably those of Julia sets. However, Fig. 1 reveals how the dynamical system admitted by repeated application of the Douglas–Rachford operator admits a striking geometry of its own. This is even more visible in the wider version in Fig. 2. Blowing up and enhancing the destination coloring plot in Fig. 18 (right), one gleans a better sense of the extraordinary complexity of the dynamical system. Borwein et al. (2018) rendered such an enhanced image, in Fig. 19, in colorings based on Australian Aboriginal art. This is the poster image for the Australian Mathematical Society special interest group Mathematics of Computation and Optimization, which Borwein co-founded. Matthew P. Skerritt also featured it at Bridges 2018, Stockholm, Sweden.

Fig. 19 Destination plotting reveals a dynamical system with coloring based on Aboriginal Australian art
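Asymptotic destination plotting needs two ingredients: a way to decide that an orbit has become (numerically) periodic, and a lookup from the limiting point to a known destination set. The sketch below shows both under loudly stated assumptions. The burn-in length, tolerances, and function names are hypothetical; the destination sets are supplied by hand, much as the text describes identifying them from Cinderella plots; and the demonstration uses the simpler unit circle and line y = 0.5 rather than the stretched ellipse of Fig. 18, together with a toy map that has an attracting two-cycle.

```python
import math


def eventual_period(step, start, burn_in=500, max_period=12, tol=1e-8):
    """Return (period, point on the limiting cycle), or (None, last point) if none is seen."""
    p = start
    for _ in range(burn_in):
        p = step(p)
    q = p
    for period in range(1, max_period + 1):
        q = step(q)
        if math.hypot(q[0] - p[0], q[1] - p[1]) < tol:
            return period, p
    return None, p


def destination_label(step, start, destinations, tol=1e-6):
    """Index of the known destination set that the orbit of start ends up in, else None."""
    period, p = eventual_period(step, start)
    if period is None:
        return None
    for label, points in enumerate(destinations):
        if any(math.hypot(p[0] - a, p[1] - b) < tol for a, b in points):
            return label
    return None


def dr_circle_line(p, height=0.5):
    """Douglas-Rachford step for the unit circle and the line y = height (closed form)."""
    x, y = p
    rho = math.hypot(x, y)
    return (x / rho, height + y * (1.0 - 1.0 / rho))


def flip(p):
    """Toy map, not from the source: normalize and negate, giving an attracting 2-cycle."""
    r = math.hypot(p[0], p[1])
    return (-p[0] / r, -p[1] / r)


# Two destination sets, each a single feasible point (period one).
a = math.sqrt(0.75)
destinations = [[(a, 0.5)], [(-a, 0.5)]]
row = [destination_label(dr_circle_line, (x, 2.0), destinations) for x in (-1.5, -0.5, 0.5, 1.5)]
```

Mapping each label in `row` to a palette entry, and repeating over all pixel rows, produces a destination plot. For a stretched ellipse one would swap in a step built on an ellipse projection and list the periodic sets as additional destinations.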


Fig. 20 Using dynamical geometry tools to visualize convergent subsequences of a dynamical system

Experimentally Checking Numerical Error

Images like those in Fig. 18 merit some skepticism, for reasons that may be intuited from Fig. 16. Each new update of the discrete dynamical system entails solving a Lagrangian system numerically. For this reason, plots that show the long-term evolution of the dynamical system may be sensitive to the effects of compounding numerical error. Without a good way of checking this sensitivity theoretically, Lindstrom et al. (2017) turned to an experimental method. Their approach was to completely replace the solving of the Lagrangian system with a surrogate approximate approach to computing the projection onto the ellipse.

This approximate approach makes use of the notion of Schwarzian reflection, a mapping that relies upon the construction of the Schwarz function. Let f be a generalization of a real function with domain P that shares one side of its boundary with the real axis. Letting $\tilde{f}(z) = \overline{f(\bar{z})}$, we have that $\tilde{f}$ is an analytic continuation of f on $\overline{P}$. In this case, $\tilde{f}$ is a Schwarz function for the curve

Fig. 21 Analytic continuation and Schwarzian reflection

K := {x + iy| y = 0}. Figure 21 (top left) shows this construction. More generally, the Schwarz function of a curve K is analytic and sends K to its complex conjugate (reflection across the real axis). The case when K = {x + iy| y2 = x} is shown at right in Fig. 21. The Schwarz function SK sends K (blue solid) to its conjugate (blue dashed). It sends the other solid curves to their same-colored dashed counterparts, and the dotted curves to their same-colored dot-dashed counterparts. Given a point z, the conjugate of its image under the Schwarz function for K is its Schwarzian reflection about K. The solid red curve is the conjugate of the image of the solid purple curve under the Schwarz function. Infinitesimally close to the curve K, the Schwarzian reflection and Euclidean reflection coincide; Fig. 21 (bottom left) shows how the difference between these two notions of reflection becomes more pronounced as distance to K grows. In this case, K is a circle. Schwarzian reflection is shown in red, while Euclidean reflection is shown in blue. Schwarzian reflection in a circle is simply inversion; the map at left in Fig. 7 is another example. Interestingly, the direction vectors SK (z) − z and PK z – z, where PK is the Euclidean projection onto K, remain approximately colinear. In fact, when K is a circle as at bottom left in Fig. 21, they are colinear. Because of this near colinearity, one expects the point p, where the line connecting z and SK (z) intersects K, to be a reasonable approximation for the true Euclidean projection onto K. Figure 21 (right) shows this construction. This approximate method is a completely different approach to computing the projection. If the observed behavior of the dynamical system were to remain relatively unchanged when substituting this approximate approach for the more rigorous Lagrange multiplier method, this would suggest that the observed behavior is not especially sensitive to compounding numerical error. 
This gives the practitioner greater confidence in the results. Indeed, this is what Lindstrom et al. (2017) observed. One would expect that the exact boundaries of the basins would change. However, started within the basins, the approximate approach generates sequences that are practically indistinguishable from the Lagrangian approach. Lindstrom et al. (2017) also investigated the behavior of the dynamical system when the Euclidean projection across a closest

2 The Art of Modern Homo Habilis Mathematicus, or: What Would Jon Borwein Do?


point is replaced by the Schwarzian reflection itself. As Fig. 21 (bottom left) suggests, the differences in this case are more pronounced. Surprisingly, minimally perturbed versions of the same periodic points and basins still appear under this approach, providing even more experimental evidence that the rich complexity is no numerical artifact. In this way, one may combine visualization and approximate methods of computation in order to obtain experimental evidence for the trustworthiness of other experimental results. This is also a useful example of the power of borrowing: seemingly disparate mathematical areas may be fertile ground for useful numerical techniques. In this case, the Schwarzian reflection had never before been used as a step of an iterated projection and reflection algorithm. Lindstrom et al. (2017) included an abbreviated description of the computation of the Schwarz function for the ellipse, based on the more detailed and motivated description of Needham (1997), whose book inspired them to use it. Texts like Needham’s Visual Complex Analysis make such fruits accessible by placing the geometry front and center.
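The statement that Schwarzian reflection in a circle is simply inversion can be checked numerically. The sketch below assumes the standard Schwarz function of the circle $|z - c| = r$, namely $S(z) = \bar{c} + r^2/(z - c)$; the function names are illustrative and do not come from Lindstrom et al. (2017). It also compares the result with the Euclidean reflection through the nearest point and tests the collinearity noted for the circle.

```python
def schwarz_reflection_circle(z, center, radius):
    """Conjugate of the Schwarz function of the circle |z - center| = radius at z."""
    # Assumed Schwarz function of the circle: S(z) = conj(center) + radius^2 / (z - center).
    s = center.conjugate() + radius**2 / (z - center)
    return s.conjugate()


def inversion(z, center, radius):
    """Classical inversion of z in the circle |z - center| = radius."""
    d = z - center
    return center + radius**2 * d / abs(d) ** 2


def projection_circle(z, center, radius):
    """Nearest point to z on the circle (z must differ from the center)."""
    d = z - center
    return center + radius * d / abs(d)


def euclidean_reflection_circle(z, center, radius):
    """Reflection of z through its nearest point on the circle."""
    return 2 * projection_circle(z, center, radius) - z


def cross(a, b):
    """Planar cross product of two complex numbers viewed as vectors."""
    return (a.conjugate() * b).imag


center, radius = 0.5 + 0.25j, 1.5
z_far = 4.0 + 1.0j
# A point at distance radius + 0.001 from the center, along a unit direction.
z_near = center + (radius + 1e-3) * (0.6 + 0.8j)

gap_far = abs(schwarz_reflection_circle(z_far, center, radius)
              - euclidean_reflection_circle(z_far, center, radius))
gap_near = abs(schwarz_reflection_circle(z_near, center, radius)
               - euclidean_reflection_circle(z_near, center, radius))
```

Under these assumptions `gap_near` is tiny while `gap_far` is not, which mirrors the growing discrepancy between the two reflections described for Fig. 21 (bottom left).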

Completing the Circle: The Line from Specific to General

The investigation of the Douglas–Rachford dynamical system for plane curves might seem a journey deep into the weeds. However, when one ventures into the weeds, one often returns with a hen. Such was true of this excursion. The investigations of Borwein and his coauthors yielded valuable insights about the intriguing success of the Douglas–Rachford method for nonconvex feasibility problems. Aragón Artacho et al. (2019) have provided an excellent survey of many of the feasibility problems for which the Douglas–Rachford method has been successfully adapted. Moreover, among the legacy of those early investigations are many ongoing projects that extend well beyond the context of feasibility problems and into the more general optimization setting. Lindstrom and Sims (2018) reviewed the history of the method, including a discussion of such problems. Notably, Aragón Artacho and Campoy (2019) have bootstrapped from feasibility problems back up to more general optimization problems by introducing an algorithm that finds the image of a point under the resolvent of a sum of maximally monotone operators. The latter generalizes the closest point projection onto the intersection of sets. More recently, Lindstrom (2020) has described a relationship between the circumcentering reflections method of Behling et al. (2018a, b, 2019) and the underlying Lyapunov functions for the Douglas–Rachford operator introduced by Benoist (2015) and further developed by Dao and Tam (2019) and Giladi and Rüffer (2019). Lindstrom (2020) showed that, for certain feasibility problems, the former iteratively returns minimizers of quadratic surrogates for the latter. Figure 22 (left) shows this relationship. Having discovered this characterization, he designed a new method, shown in Fig. 22 (right), that retains similar properties with fewer structural assumptions, making it adaptable to primal/dual implementation for optimization problems beyond the feasibility setting. Dizon et al. (2020) have also used a variant


S. B. Lindstrom

Fig. 22 Centering methods that minimize surrogates for Lyapunov functions

of this method to find wavelets. In all these ways and more to come, Borwein’s experimental foray into the weeds, and the philosophy that underpinned it, continue to advance the understanding far beyond them.

Sometimes All You Need Is a Good Walk

On the power of naming, Borwein wrote that

In the current mathematical world, it matters less what you know about a given function than whether your computer package of choice (say Maple, Mathematica or SAGE) or online source . . . does. (Borwein and Lindstrom 2016)

He made this broader point in the context of explaining how computer algebra systems may illuminate functional relationships wherein the named functions are those of which the practitioner is not even aware. Borwein and Lindstrom (2016) illustrated this point with the Meijer G-function, musing as follows: The Meijer-G function is very useful, if a bit difficult for a human to remember the exact definition for. Often one’s computer can help . . . While researchers who have prior experience with these special functions may come to the same conclusions by hand, Mathematica and Maple will figure it out as well . . . [A] web search, or Maple’s Function Advisor then inform the scientist. This is another measure of the changing environment: naiveté need no longer impair to the same extent when the computer may aid in the discovery. It is usually a good idea—and not at all immoral—to data mine.

They went on to describe how such mining led Borwein et al. (2012) to the discovery of a Meijer-G closed form for the Moment function of a four-step planar random walk for ℜ(z) ≥ 2. Borwein et al. (2011) provided a recursive definition


of this moment function that may be used to evaluate it numerically in the case ℜ(z) < –2. Armin Straub generated an early rendering of its phase portrait in Borwein et al. (2013) and Borwein and Straub (2013). Borwein and Borwein (2010) featured this version on the cover of their book, Experimental and Computational Mathematics: Selected Writings, a deep dive on their shared interests. Straub and Zudilin (2017) provided a review of this literature, dedicated: To the memory of Jon Borwein, who convinced us that a short walk can be adventurous[.]

Borwein and Armin Straub contributed a larger version of this phase portrait, computed by Elias Wegert with their data and shown in Fig. 23, to the 2016 Complex Beauties Calendar. Borwein and Lindstrom (2016) included this version in their discussion of computer-assisted discovery. Borwein’s interest in phase plotting is also what originally inspired Lindstrom and Vrbik (2019) to extend phase portraits to the setting of hyperbolic geometry as described earlier in this chapter. What may be the most iconic image from Borwein’s works, A Walk on Pi (Fig. 24), is also born out of random walks. Aragón Artacho, Borwein, and Bailey created this rendering as part of their work, Walking on Real Numbers, which appeared in Mathematical Intelligencer (Aragón Artacho et al. 2013). The image shows 100 billion steps of a walk on the base-4 digits of π. Walks on numbers are useful, because they may provide experimental evidence for or against the base-b normality of the number under examination. A number α is said to be base-b normal if the expected limiting frequency of every base-b string of length m is exactly $1/b^m$. While normality and randomness

Fig. 23 The Moment function of a four-step planar random walk was discovered by Borwein et al. (2012) and has been featured in many places


Fig. 24 A walk on the first 100 billion base-4 digits of π, courtesy of Aragón Artacho et al. (2013)

are distinct concepts, one might expect that a clear pattern in the walk would be instrumental to falsifying the conjecture of either. If such a pattern is identifiable and describable, it may even lead to the discovery of new descriptions for the phenomenon being “walked” on. Such was the case for the next example, in which the walk in question was used to illuminate the behavior of a dynamical system admitted by the Douglas–Rachford operator.
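A walk on the base-4 digits of π, as described above, can be sketched in a few lines. This is a minimal illustration rather than the code behind Fig. 24: it computes a modest number of digits with Machin's formula in exact integer arithmetic, and the digit-to-direction mapping (0 east, 1 north, 2 west, 3 south) is an assumption made here for illustration. All function names are hypothetical.

```python
def arctan_inverse(x, unity):
    """Integer approximation of unity * arctan(1/x) from the alternating Taylor series."""
    power = unity // x          # unity / x^(2k+1), starting with k = 0
    total = power
    x_squared = x * x
    k = 3
    sign = -1
    while power:
        power //= x_squared
        total += sign * (power // k)
        k += 2
        sign = -sign
    return total


def pi_base4_digits(count, guard=16):
    """First `count` base-4 digits of the fractional part of pi.

    Uses Machin's formula pi = 16 arctan(1/5) - 4 arctan(1/239) with `guard`
    extra base-4 digits to absorb truncation error (adequate for modest counts).
    """
    unity = 4 ** (count + guard)
    pi_scaled = 4 * (4 * arctan_inverse(5, unity) - arctan_inverse(239, unity))
    fractional = (pi_scaled - 3 * unity) >> (2 * guard)
    return [(fractional >> (2 * (count - 1 - i))) & 3 for i in range(count)]


# Assumed convention: digit 0 steps east, 1 north, 2 west, 3 south.
STEPS = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}


def walk(digits):
    """Return the list of lattice points visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for d in digits:
        dx, dy = STEPS[d]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


digits = pi_base4_digits(16)
path = walk(digits)
```

Plotting `path` for a much larger digit count, colored by step index, gives a picture in the spirit of Fig. 24; tallying the digits gives a crude frequency check of the kind relevant to normality.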

Walking on a Dynamical System

Bauschke et al. (2019) studied the dynamical system at left in Fig. 25, which is prototypical of the behavior of the Douglas–Rachford dynamical system for a circle and line on the axis of symmetry, as illustrated by the blue sequence in Fig. 13. The sequence is clearly chaotic in the sense that it lies on the shared boundary between attractive sets for two different points, but it also exhibits a pattern. They rewrote a closely related dynamical system as the binary sequence

1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, . . . , (1)

which they quickly determined was aperiodic. When confronted with mysterious sequences, numbers, or expansions, the following tools may be useful.

1. The On-Line Encyclopedia of Integer Sequences (OEIS) is useful for sequences of numbers or sequences of digits for numbers (2016).


Fig. 25 Walking on a dynamical system

2. The Inverse Symbolic Calculator (ISC) developed by Jonathan Borwein, Peter Borwein, and Simon Plouffe may be useful for identifying constants (Borwein et al. 1995).
3. The parameterized integer relation construction algorithm PSLQ was introduced by Ferguson and Bailey (1992).
4. David Stoutemyer’s AskConstants program for Mathematica.

Borwein and Corless (1999) provided an early guide to forming conjectures with the ISC and PSLQ, including a description of how PSLQ led to the discovery by Bailey et al. (1997) of the celebrated Bailey–Borwein–Plouffe formula for independently calculating the kth hexadecimal (base-16) digit of π. An initial search of OEIS for Eq. (1) did not return any matches. (In retrospect, this was because too many digits were entered.) Plotting the binary sequence as a walk on the vertical axis over horizontal discrete time at right in Fig. 25, one immediately observes the hidden sequence

2, 2, 3, 2, 3, 2, 2, 3, 2, 3, 2, 2, 3, 2, 3, 2, 3 . . . ,

which is listed in the OEIS as the first differences of the Beatty sequence A003151 for $1 + \sqrt{2}$ (Kimberling 2011). This led to the discovery that Eq. (1) is the sequence $\lfloor n\sqrt{2} \rfloor - 1 - \lfloor (n-1)\sqrt{2} \rfloor$ (Kimberling 2016). Lindstrom (2019) furnished a diagram that describes these relationships. This led to a complete description of the behavior on the axis of symmetry in terms of generalized Beatty sequences. The walk was instrumental to the discovery.
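Both identifications can be checked with exact integer arithmetic. The sketch below reads the formula for Eq. (1) as $\lfloor n\sqrt{2} \rfloor - 1 - \lfloor (n-1)\sqrt{2} \rfloor$ and uses the fact that $\lfloor n\sqrt{2} \rfloor$ equals the integer square root of $2n^2$. The starting index $n = 3$ used to align with the printed prefix of Eq. (1) was found here by inspection and is an assumption of this sketch, not a statement from the source.

```python
from math import isqrt


def floor_n_sqrt2(n):
    """Exact value of floor(n * sqrt(2)) for a nonnegative integer n."""
    return isqrt(2 * n * n)


def beatty_first_differences(count):
    """First differences of the Beatty sequence floor(n * (1 + sqrt(2))), n = 0, 1, ..."""
    # floor(n * (1 + sqrt 2)) = n + floor(n * sqrt 2) because n is an integer.
    beatty = [n + floor_n_sqrt2(n) for n in range(count + 1)]
    return [beatty[n + 1] - beatty[n] for n in range(count)]


def binary_term(n):
    """floor(n sqrt 2) - 1 - floor((n - 1) sqrt 2), which is always 0 or 1 for n >= 1."""
    return floor_n_sqrt2(n) - 1 - floor_n_sqrt2(n - 1)


# First 30 terms of Eq. (1) as printed in the text.
EQ1_PREFIX = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0,
              1, 0, 1, 0, 1, 0, 0, 1, 0, 1,
              0, 0, 1, 0, 1, 0, 1, 0, 0, 1]

hidden = beatty_first_differences(17)
# Alignment offset n = 3 is an assumption found by inspection.
candidate = [binary_term(n) for n in range(3, 3 + len(EQ1_PREFIX))]
```

Comparing `hidden` with the sequence 2, 2, 3, 2, 3, . . . and `candidate` with `EQ1_PREFIX` reproduces, on a short prefix, the kind of match that a search of the OEIS reports.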


When the Computer Knows More Than You Do

Borwein and Lindstrom (2016) continued their discussion of the power of naming and the importance of what one’s computer algebra system knows by illustrating with the Lambert W function. Informally, Lambert W is the function whose two real branches satisfy $W(x)e^{W(x)} = x$. Lambert W admits the discovery of a closed form solution for an object of interest in optimization: the Fenchel–Moreau–Rockafellar conjugate of the ($t \in\, ]0, 1[$)-weighted average of the Boltzmann–Shannon entropy and energy functions:

\[ f_t : [0, \infty[\, \to \mathbb{R} : x \mapsto t\,(x \log(x) - x) + (1 - t)\,x^2/2. \]

In a follow-up article, Bauschke and Lindstrom (2020) sought the ($\lambda \in\, ]0, 1[$)-proximal average of these two functions, which informally provides a continuous transformation of one graph into the other (continuous in the epi-topology). For this purpose, one must solve for the variable $y$ in the equation

\[ x - (1 - \lambda)\,W(e^{y}) - \frac{\lambda}{2}\,y = 0. \]

This task is nontrivial; the straightforward inquiry stumps Maple. When one further specifies $\lambda = 1/p$ for a specified prime, say $p = 2$, this situation at first appears to persist, with Maple returning a RootOf expression. This usually indicates that Maple cannot solve the equation. However, a further request for allvalues returns infinitely many solutions that, upon further scrutiny, differ only in the specified branch of the Lambert W function. Repeating the procedure with varying primes for $p$ and analyzing the pattern, one may guess the closed form solution:

\[ y = \frac{\left(2 - \frac{2}{\lambda}\right) W\!\left(\left(\frac{2}{\lambda} - 1\right) e^{2x/\lambda}\right)}{\frac{2}{\lambda} - 1} + \frac{2x}{\lambda}. \qquad (2) \]

The proof is quite involved, and so it behooves the practitioner to first test the solution obtained with this identity against what one would expect the true solution to look like. The heat plot in Fig. 26 clearly shows the graph of the energy (dark blue) transforming into the graph of the Boltzmann–Shannon entropy (dark red). “Knowing” that the solution is correct gives one the confidence to proceed with the lengthy proof, for which Maple is also instrumental. The transformation is striking, as the energy has full domain while the Boltzmann–Shannon entropy has restricted domain.
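Before attempting a proof, a conjectured closed form of this kind can also be tested by substituting it back into the equation it is meant to solve. The sketch below assumes that the equation reads $x - (1-\lambda)W(e^{y}) - (\lambda/2)\,y = 0$ and that Eq. (2) reads $y = (2 - 2/\lambda)\,W\big((2/\lambda - 1)e^{2x/\lambda}\big)/(2/\lambda - 1) + 2x/\lambda$, with $W$ the principal branch; both readings are assumptions about the typeset formulas. The Newton iteration for $W_0$ is a simple stand-in written for this illustration, not Maple's routine.

```python
import math


def lambert_w0(a, tol=1e-14):
    """Principal branch W_0(a) for a > 0, computed by Newton's method on w * exp(w) = a."""
    w = math.log1p(a)  # starts to the right of the root, so Newton decreases monotonically
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - a) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def closed_form_y(x, lam):
    """Conjectured solution y (assumed reading of Eq. (2)) for 0 < lam < 1."""
    c = 2.0 / lam - 1.0
    return (2.0 - 2.0 / lam) * lambert_w0(c * math.exp(2.0 * x / lam)) / c + 2.0 * x / lam


def residual(x, y, lam):
    """Left-hand side of the assumed equation x - (1 - lam) W(e^y) - (lam / 2) y = 0."""
    return x - (1.0 - lam) * lambert_w0(math.exp(y)) - 0.5 * lam * y


# Largest residual over a small grid of test points.
worst = max(
    abs(residual(x, closed_form_y(x, lam), lam))
    for lam in (0.2, 0.5, 0.9)
    for x in (-1.0, 0.0, 0.5, 2.0)
)
```

A residual at the level of rounding error over such a grid is experimental evidence in the same spirit as the heat plot: it does not replace the proof, but it makes a typographical or algebraic slip in the conjecture unlikely.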


Fig. 26 A heat plot confirms the hypothesized closed form in Eq. (2)

Symbolic Answers from Numerical Approximations

In another illuminating case study, Burachik et al. (2019a) sought the elusive closed form for the Fenchel–Moreau–Rockafellar conjugate of the Fitzpatrick function of the logarithm. This object is of interest, because it admits the harshest generalized Bregman distance in the family of the Kullback–Leibler divergence. Deriving this conjugate via the typical analytic route would require the solution of a complicated equation, which proved prohibitive, even with the aid of computation. Fortunately, the problem may be reformulated as that of finding the largest convex function that lower bounds

\[ g : \mathbb{R}^2 \to \mathbb{R} : (x, y) \mapsto \begin{cases} xy & \text{if } y = \log x; \\ \infty & \text{otherwise.} \end{cases} \]

The graph of this function is shown at left in Fig. 27. This geometry problem admits a different equation, though one that is also unsolvable. Throwing in the towel on a symbolic solution and pursuing a numerical one, the natural approach is to take a point $z \in \mathbb{R}^2$ and compute a set of candidate values for $f(z)$, returning the lowest as the solution to plot. This may be accomplished by taking various lines $L_i$ passing over $z$ in the domain space and striking the graph of $g$ twice in the lifted space; the lowest height of all these lines at $z$ is the approximate numerical solution. This construction is shown in Fig. 27 (center). Using Maple, one has that the points $x_i, y_i$ at which the line $L_i$ strikes the graph of $g$ are actually given explicitly in the


Fig. 27 Experimental discovery of the conjugate of the Fitzpatrick function of the logarithm

Fig. 28 Left envelopes for representative functions of the logarithm

two real branches of the Lambert W function. This facilitates the computation by admitting the fast routines for evaluation of W implemented in Maple. However, upon careful examination of the full numerical procedure, one notices that the best candidate value is always obtained for the line closest to vertical. This observation leads not only to a plausible conjecture for the closed form, shown at right in Fig. 27, but also to a method for proving its correctness. One can also first test its correctness by plotting the generalized Bregman distance admitted thereby with other distances from the same family, such as that of the Kullback–Leibler divergence. In the sequel, Burachik et al. (2021) provided the so-called γ-parameterized generalized Bregman envelopes that regularize the function $|\cdot - 1/2|$. They contrasted the new members of the family with the known case of the Kullback–Leibler divergence, which was originally provided in Bauschke et al. (2017). The results make heavy use of the Lambert W function, and the heat plots in Fig. 28 allow for an easy comparison. Through the use of color, the reader may observe that for any fixed parameter γ, the envelope functions are increasing from left to right. Such plots provide visual evidence for the correctness of the conjectured forms, a valuable service because the arithmetic is so complicated. Interestingly, the meetings with Lambert W do not end there.


Conic Programming and Mystery Geometry

Lindstrom et al. (2020) discovered error bounds for exponential cone feasibility problems by computing the facial residual functions for the exponential cone. Figure 29 shows this cone in purple, together with its dual in green. Taking any point $z$ in the green dual cone, its complementary subspace $\{z\}^\perp$ intersects the pink exponential cone boundary in a set $F$, called an exposed face. With the exception of the purple two-dimensional exposed face, the other exposed faces are all one-dimensional halflines emanating from the origin. One of them is highlighted in blue. The computation of the error bound reduces to the following problem. Take a bounded net of points $(w_m)_m \subset \{z\}^\perp$ that satisfies $d(w_m, F) \to 0$ as $m \to \infty$. Take also a corresponding net $(v_m)_m$ in the boundary of the cone, where each $v_m$ is the nearest point to $w_m$ that is sent to $w_m$ under the map of projection onto $\{z\}^\perp$. Then show that

\[ \lim_{m \to \infty} d(w_m, v_m)/d(w_m, F)^2 > 0. \qquad (3) \]

Using an xyz coordinate system, this problem reduces to finding the limit as $t := (1 - \beta)w_y - w_x \to 0$ of the expression $(w_x - v_x)/((\beta - 1)w_y + w_x)^2$. Working with a computer algebra system, one can reduce this trivariate expression to a bivariate one by replacing $(w_x - v_x)$ with the complicated expression of $(w_x, w_y)$ in Fig. 30. The parameter $\beta = z_y/z_x$ encodes the face being considered. Having reduced to an expression of $w_x$ and $w_y$, one can attack the limit as $t := (1 - \beta)w_y - w_x \to 0$ with the tools of calculus. Great arithmetic difficulty ensues, but again the computer algebra system is instrumental. A Taylor series centered at $t = 0$, which might have taken weeks to compute by hand alone, was built in mere hours with computer assistance. Its correctness was then quickly confirmed by comparison with limits obtained numerically.

Fig. 29 The exponential cone and its dual


Fig. 30 A complicated expression found with the aid of a computer algebra system

Having found the solution, all that remained was to put it on rigorous footing. This was more challenging than retracing the path of computational discovery, because the involved terms fail to be analytic at $t = 0$ when $\beta = -\tfrac{1}{2} W_0(2e^{-2}) \approx -0.1088575528$, a constant that will henceforth be referred to as evil beta. For evil beta, the Lambert W term goes to $\pm\infty$ as $t \to 0$. It is easy to observe, and even to prove, that this growth is counteracted by other terms in the expression tending to zero. This success notwithstanding, the difficulty in proving the limit persisted. Figure 29 does not immediately suggest any special geometric properties of the exposed face corresponding to evil beta; this face is just to the right of, and very close to, the highlighted blue one-dimensional face. If a concrete geometrical characterization of evil beta exists, it remains unknown. Figure 31 illuminates an analytic characterization. At left, fixing $w_y = 2$ and plotting the bivariate function of $(t, \beta)$, one sees that the branch change runs through the point where $t = 0$ and $\beta$ equals evil beta. At right, fixing $\beta$ at evil beta and plotting the bivariate function of $(t, y)$, one sees the branch change living on the line given by $t = 0$. This suggests that the analytic challenges may go away if the employment of the Lambert W function may be avoided. Analyzing the geometry in Fig. 29, it is apparent that for one-dimensional faces sufficiently near to the purple two-dimensional face, preimages of $w_m$ for the projection onto the hyperplane $\{z\}^\perp$ may contain two points, one corresponding to each real branch of the Lambert W function. A possible remedy presents itself. Since the correct Lambert W branch is related to the inversion of the projection, perhaps the branch change is an avoidable artifact of the inversion. If one can find an equivalent expression through a process that avoids this inversion, perhaps the branch change may be removed. In fact, this is the case. Reformulating the problem from the perspective of the points $(v_m)_m$ that lie on


Fig. 31 Visualizing the branch change on a function from $\mathbb{R}^3$ to $\mathbb{R}$

the cone, one obtains an expression that is exactly equivalent but avoids the step of inverting the projection. This expression does not involve the Lambert W function. Perhaps the importance of evil beta is wholly summarized by the fact that some value of β had to lie at the branch change. If this is the case, then evil beta has no geometrical significance beyond its impediment as an artifact of the choice of variables. This is an unsatisfying answer for the geometer. Whatever the case, the change of variables removed the impediment, and the proof followed. Yet again, the visualization suggested a path to the solution.
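The decimal value quoted for evil beta can be reproduced directly, taking the constant to be $-\tfrac{1}{2} W_0(2e^{-2})$ with $W_0$ the principal real branch (an assumed reading of the typeset definition). The Newton solver below is a small illustrative stand-in for a library implementation of Lambert W.

```python
import math


def lambert_w0(a, tol=1e-15):
    """Principal branch W_0(a) for a > 0, computed by Newton's method on w * exp(w) = a."""
    w = math.log1p(a)  # initial guess to the right of the root
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - a) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


# Assumed definition: evil beta = -(1/2) * W_0(2 * exp(-2)).
evil_beta = -0.5 * lambert_w0(2.0 * math.exp(-2.0))
print(f"{evil_beta:.10f}")
```

The same kind of check, comparing a symbolic constant against a numerically obtained one, is what allowed the Taylor series in this section to be confirmed quickly.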

Conclusion

With the aforementioned story about the Riemann hypothesis, among other examples, Borwein and Devlin (2008) highlighted the fact that mathematics by experiment is hardly new. In the sense that the word computer referred to a human prior to the latter half of the twentieth century, neither is computer-assisted mathematics by experiment. What has changed are the tools available. One might ask: how remarkable is this? When Keith Devlin took over the short-lived “Computers and Mathematics” section of the Notices (Devlin and Wilson 1995), he wrote

This column is surely just a passing fad that will die away before long. Not because mathematics will cease to have much connection with computers, but rather, quite the reverse: the use of computers by mathematicians will become so commonplace that no one thinks to mention it any more. (Devlin 2020)

When Devlin’s prophecy came true and he concluded the column 4 years later in 1994, he opined:


The disappearance of this column does not mean that the Notices will stop publishing articles on the use of computers in mathematics. Rather, recognizing that the use of computer technology is now just one more aspect of mathematics, the new Notices will no longer single out computer use for special attention. I will drink to that. The child has come of age. (Devlin 2020)

With the use of computers and experiment essentially ubiquitous in the contemporary mathematical landscape, in what way then is experimental mathematics a discipline in its own right? Fourteen years later, Borwein and Devlin set out to succinctly answer this question in Computer as Crucible:

What makes experimental mathematics different (as an enterprise) from the classical conception and practice of mathematics is that the experimental process is regarded not as a precursor to a proof, to be relegated to private notebooks and perhaps studied for historical purposes only after a proof has been obtained. Rather, experimentation is viewed as a significant part of mathematics in its own right, to be published, to be considered by others, and (of particular importance) to contribute to our overall mathematical knowledge. (Borwein and Devlin 2008)

Placing useful visualization front and center, then, is clearly a hallmark of the genre. This is true in both the pedagogical and investigatory aspects of the discipline, recalling the characterization by Borwein and Osborn (2020) of experimental mathematics as living on this boundary. Experimental mathematics, then, is both a scientific and artistic pursuit.

Image Sources

A small version of Fig. 1 appeared in Borwein et al. (2018) and a large version in its online appendix (Borwein et al. 2017). Figure 2 is original to this work, but is implicitly contained in Fig. 18, which appeared in Borwein et al. (2017). The photo of Jonathan Borwein in Fig. 3 was used in the presentation for Borwein and Lindstrom (2016); the cartoon was drawn by Simon Roy and included in the remembrance of Borwein by Jungic (2016) on the Jon Borwein memorial webpage managed by David Bailey. The cartoon appears courtesy of Veselin Jungic. Figure 4 is original to this work. Figure 5 (left) and (center) are from Lindstrom and Vrbik (2019), while the image at right is original to this work. Figure 6 (left) is from Lindstrom and Vrbik (2019) while the right image is original to this work. The images in Figs. 7 and 8 are from Lindstrom and Vrbik (2019). The images in Figs. 9, 10, and 11 are original to this work and are closely related mathematically to images in Lindstrom and Vrbik (2019). The images in Fig. 12 are from Lindstrom (2019). Figure 13 (right) is original to this work; a slightly different version of the left image appeared in Lindstrom (2019) and originally in Díaz Millán et al. (2020). Figure 14 is original to this work. Figure 15 is original to this work, though different images of this Lyapunov function have appeared in the various related references. Figure 16 is original to this work. Figure 17 is from Borwein et al. (2018). Figure 18 is from Borwein et al. (2017), and cropped versions of these images appeared in Borwein et al. (2018). Figure 19 is from Borwein et al. (2017) and Lindstrom (2019), and is also the official artwork of the Australian Mathematical Society special interest group Mathematics of Computation and Optimization (MoCaO).


Figure 20 is from Borwein et al. (2018), and a larger version is in Borwein et al. (2017). Figure 21 is original to this work and is similar to images in Lindstrom et al. (2017). Figure 22 is original to this work and based on a similar image in Lindstrom (2020). Figure 23 first appeared in Borwein and Straub (2016) and has also been featured in Borwein and Lindstrom (2016). Related and cropped variants have appeared in other places, as described in the text above. Figure 24 is originally from Aragón Artacho et al. (2013) and has been featured in many articles in the popular press. It is also featured online in Walking on Real Numbers: A Multiple Media Mathematics Project (Aragón Artacho et al. 2014). Figure 25 is original to this work. Figure 26 is from Bauschke and Lindstrom (2020). Figure 27 is from Burachik et al. (2019a). Figure 28 is from Burachik et al. (2021). Figure 29 is from Lindstrom et al. (2020). Figures 30 and 31 are original to this work, because that analysis was omitted from Lindstrom et al. (2020) in favor of the shortest path to the solution.

References

Aragón Artacho FJ, Borwein JM (2013) Global convergence of a non-convex Douglas–Rachford iteration. J Glob Optim 57(3):753–769
Aragón Artacho FJ, Campoy R (2019) Computing the resolvent of the sum of maximally monotone operators with the averaged alternating modified reflections algorithm. J Optim Theory Appl 181(3):709–726
Aragón Artacho FJ, Bailey DH, Borwein JM, Borwein PB (2013) Walking on real numbers. Math Intell 35(1):42–60
Aragón Artacho FJ, Bailey DH, Borwein JM, Borwein PB, with the assistance of Fountain J, Skerritt MP (2014) Walking on real numbers: a multiple media mathematics project. https://walks.carma.newcastle.edu.au/
Aragón Artacho FJ, Campoy R, Tam MK (2019) The Douglas–Rachford algorithm for convex and nonconvex feasibility problems. Math Meth Oper Res 91(2):201–240. (arXiv preprint arXiv:190409148)
Bailey DH (2016) Jonathan Borwein dies at 65. Jonathan Borwein Memorial Website. https://jonborwein.org/2016/08/jonathan-borwein-dies-at-65/
Bailey DH, Beebe NH (2020) Publications and talks by (and about) Jonathan M. Borwein. https://www.jonborwein.org/jmbpapers/
Bailey D, Borwein P, Plouffe S (1997) On the rapid computation of various polylogarithmic constants. Math Comput 66(218):903–913
Bailey DH, Borwein NS, Brent RP, Burachik RS, Osborn JH, Sims B, Zhu QJ (2020) Jonathan Borwein: mathematician extraordinaire. In: Bailey DH, Borwein N, Brent RP, Burachik RS, Osborn JA, Sims B, Zhu Q (eds) From analysis to visualization: a celebration of the life and legacy of Jonathan M. Borwein, Callaghan, Australia, September 2017, Springer proceedings in mathematics and statistics. Springer, Cham, pp ix–xx
Bauschke HH, Dao MN, Lindstrom SB (2017) Regularizing with Bregman–Moreau envelopes. SIAM J Optim 28(4):3208–3228. (arXiv preprint arXiv:170506019)
Bauschke HH, Dao MN, Lindstrom SB (2019) The Douglas–Rachford algorithm for a hyperplane and a doubleton. J Glob Optim 74(1):79–93


Bauschke HH, Lindstrom S (2020) Proximal averages for minimization of entropy functionals. Pending publication in Pure and Applied Functional Analysis. (arXiv preprint arXiv:1807.08878)
Behling R, Bello-Cruz JY, Santos LR (2018a) On the linear convergence of the circumcentered-reflection method. Oper Res Lett 46(2):159–162
Behling R, Cruz JYB, Santos LR (2018b) Circumcentering the Douglas–Rachford method. Numer Algorithms 78:759–776
Behling R, Bello-Cruz JY, Santos LR (2019) On the circumcentered-reflection method for the convex feasibility problem. (arXiv preprint arXiv:200101773)
Benoist J (2015) The Douglas–Rachford algorithm for the case of the sphere and the line. J Glob Optim 63:363–380
Borwein JM (2016a) Jonathan Borwein: curriculum vitae. https://carma.newcastle.edu.au/resources/jon/CV.pdf
Borwein JM (2016b) The life of modern Homo Habilis Mathematicus: experimental computation and visual theorems. In: Tools and mathematics, mathematics education library, vol 347. Springer, Berlin, pp 23–90
Borwein JM, Bailey DH (2008) Mathematics by experiment: plausible reasoning in the 21st century. A.K. Peters Ltd, Wellesley
Borwein JM, Borwein PB (2010) Experimental and computational mathematics: selected writings. PSI Press, Portland
Borwein JM, Corless RM (1999) Emerging tools for experimental mathematics. Am Math Mon 106(10):889–909
Borwein JM, Devlin K (2008) The computer as crucible: an introduction to experimental mathematics. A.K. Peters Ltd/CRC Press, Wellesley
Borwein JM, Lindstrom SB (2016) Meetings with Lambert W and other special functions in optimization and analysis. Pure Appl Funct Anal 1(3):361–396
Borwein NS, Osborn JH (2020) On the educational legacies of Jonathan M. Borwein. In: Bailey DH, Borwein N, Brent RP, Burachik RS, Osborn JA, Sims B, Zhu Q (eds) From analysis to visualization: a celebration of the life and legacy of Jonathan M. Borwein, Callaghan, Australia, September 2017, Springer proceedings in mathematics and statistics. Springer, Cham, pp 103–131
Borwein JM, Sims B (2011) The Douglas–Rachford algorithm in the absence of convexity. In: Bauschke HH, Burachik RS, Combettes PL, Elser V, Luke DR, Wolkowicz H (eds) Fixed point algorithms for inverse problems in science and engineering, Springer optimization and its applications, vol 49. Springer, New York, pp 93–109
Borwein JM, Straub A (2013) Mahler measures, short walks and log-sine integrals. Theor Comput Sci 479:4–21
Borwein JM, Straub A (2016) Moment function of a 4-step planar random walk. Complex Beauties (2016 calendar). http://www.mathe.tu-freiberg.de/files/information/calendar2016eng.pdf
Borwein JM, Borwein P, Plouffe S (1995) Inverse symbolic calculator. http://wayback.cecm.sfu.ca/projects/ISC/ISCmain.html
Borwein JM, Bailey DH, Girgensohn R (2006) Experimentation in mathematics: computational paths to discovery (combined interactive CD version edition). A.K. Peters Ltd, Natick
Borwein JM, Nuyens D, Straub A, Wan J (2011) Some arithmetic properties of short random walk integrals. Ramanujan J 26(1):109
Borwein JM, Straub A, Wan J, Zudilin W, with appendix by Zagier D (2012) Densities of short uniform random walks. Can J Math 64:961–990. http://arxiv.org/abs/1103.2995
Borwein JM, Straub A, Wan J (2013) Three-step and four-step random walk integrals. Exp Math 22(1):1–14
Borwein JM, Lindstrom SB, Sims B, Skerritt M, Schneider A (2017) Appendix to dynamics of the Douglas–Rachford method for ellipses and p-spheres. http://hdl.handle.net/1959.13/1330341
Borwein JM, Lindstrom SB, Sims B, Skerritt M, Schneider A (2018) Dynamics of the Douglas–Rachford method for ellipses and p-spheres. Set-Valued Var Anal 26(2):385–403


Burachik RS, Dao MN, Lindstrom SB (2019a) The generalized Bregman distance. To appear in SIAM J Optim (arXiv preprint arXiv:190908206)
Burachik RS, Dao MN, Lindstrom SB (2021) Generalized Bregman Envelopes and Proximity Operators (arXiv preprint arXiv:2102.10730)
Dao MN, Tam MK (2019) A Lyapunov-type approach to convergence of the Douglas–Rachford algorithm. J Glob Optim 73(1):83–112
Devlin K (2020) How mathematicians learned to stop worrying and love the computer. In: Bailey DH, Borwein N, Brent RP, Burachik RS, Osborn JA, Sims B, Zhu Q (eds) From analysis to visualization: a celebration of the life and legacy of Jonathan M. Borwein, Callaghan, Australia, September 2017, Springer proceedings in mathematics and statistics. Springer, Cham, pp 133–139
Devlin K, Wilson N (1995) Six-year index of “computers and mathematics”. Not Am Math Soc 42:248–254
Díaz Millán R, Lindstrom SB, Roshchina V (2020) Comparing averaged relaxed cutters and projection methods: theory and examples. In: Bailey DH, Borwein N, Brent RP, Burachik RS, Osborn JA, Sims B, Zhu Q (eds) From analysis to visualization: a celebration of the life and legacy of Jonathan M. Borwein, Callaghan, Australia, September 2017, Springer proceedings in mathematics and statistics. Springer, Cham, pp 75–98
Dizon N, Hogan J, Lindstrom SB (2020) Centering projection methods for wavelet feasibility problems. (arXiv preprint arXiv:200505687)
Ferguson HR, Bailey DH (1992) A polynomial time, numerically stable integer relation algorithm. RNR technical report, RNR-91-032, 14 July 1992
Giladi O, Rüffer BS (2019) A Lyapunov function construction for a non-convex Douglas–Rachford iteration. J Optim Theory Appl 180(3):729–750
Jungic V (2016) Jon Borwein: a friend and a mentor. Jonathan Borwein Memorial Website. https://jonborwein.org/2016/08/jon-borwein-a-friend-and-a-mentor/
Kimberling C (2011) The On-Line Encyclopedia of Integer Sequences (entry a188037). https://oeis.org/A188037
Kimberling C (2016) The On-Line Encyclopedia of Integer Sequences (entry a276862). https://oeis.org/A188037
Lindstrom SB (2016) Jon made us big. Jonathan Borwein Memorial Website. http://jonborwein.org/2016/09/jon-made-us-big/
Lindstrom SB (2019) Proximal point algorithms, dynamical systems, and associated operators: modern perspectives from experimental mathematics. PhD thesis, University of Newcastle, Newcastle upon Tyne
Lindstrom SB (2020) Computable centering methods for spiraling algorithms and their duals, with motivations from the theory of Lyapunov functions. (arXiv preprint arXiv:200110784)
Lindstrom SB, Sims B (2018) Survey: sixty years of Douglas–Rachford. J AustMS (to appear). (arXiv preprint arXiv:180907181)
Lindstrom SB, Vrbik P (2019) Phase portraits of hyperbolic geometry. Mathematical Intelligencer 41(3):1–9
Lindstrom SB, Sims B, Skerritt MP (2017) Computing intersections of implicitly specified plane curves. Nonlinear Conv Anal 18(3):347–359
Lindstrom SB, Lourenço B, Pong TK (2020) Error bounds, facial residual functions and applications to the exponential cone. arXiv:2010.16391
Lions PL, Mercier B (1979) Splitting algorithms for the sum of two nonlinear operators. SIAM J Numer Anal 16(6):964–979. https://doi.org/10.1137/0716071
Littlewood JE (1953) A mathematician’s miscellany. Methuen, London
Needham T (1997) Visual complex analysis. Clarendon Press, Oxford
Straub A, Zudilin W (2017) Short walk adventures. In: Jonathan M. Borwein commemorative conference. Springer, Newcastle, pp 423–439
Wegert E (2012) Visual complex functions: an introduction with phase portraits. Springer, Basel
Wegert E, Semmler G (2011) Phase plots of complex functions: a journey in illustration. Not AMS 58(6):768–780

3 The Beauty of Blaschke Products

Ulrich Daepp, Pamela Gorkin, Gunter Semmler, and Elias Wegert

Contents

Introduction
Complex Arithmetic and Geometry
Seeing Complex Functions
Hyperbolic Geometry
Blaschke Products
Blaschke Products and Ellipses in the Euclidean Plane
Blaschke Products and Ellipses in the Poincaré Disk Model
Compositions of Blaschke Products
Conclusion
References

Abstract

This chapter is dedicated to showing how visual tools, using geometry and color, can be used to enhance the understanding of statements about complex functions. In particular, the focus will be on the class of finite Blaschke products, and the relevant geometries are Euclidean as well as hyperbolic. Some of the geometric tools that will be used include symmetry, tilings, and curves generated by lines constructed using particular properties of Blaschke products. The focus then turns to the possibility of visualizing when a Blaschke product is the composition of two (nontrivial) Blaschke products. Color appears in the phase portraits that are constructed, and the main results are then validated with these visual tools. This chapter begins with an introduction to finite Blaschke products, the Poincaré disk model, and phase portraits of complex functions.

Keywords

Blaschke product · Phase portrait · Ellipse · Poincaré disk model · Hyperbolic geometry · Symmetry · Composition of functions · Complex function

U. Daepp · P. Gorkin
Bucknell University, Lewisburg, PA, USA

G. Semmler · E. Wegert
Technische Universität Bergakademie Freiberg, Freiberg, Germany

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_88

Introduction

Geometry is one of the first mathematical disciplines in history, and this is not just happenstance. Geometric ideas are abstractions of objects commonly found in the human visual experience, and the theorems of geometry were first discovered by looking at simple sketches. In this chapter, the focus is on the geometry of so-called hyperbolic polynomials, also known as (finite) Blaschke products. Though there are many geometries, functions will be considered in one of two: the Euclidean plane and the Poincaré disk model.

Euclidean geometry may be thought of as a deductive structure of thoughts; this was first realized in the 13 books of the Elements by Euclid. Although Euclid’s texts have shaped the way modern mathematics is developed and taught, he remains an obscure figure in Greek mathematics. The Elements start from definitions, axioms, and postulates, from which all propositions are derived through logical reasoning and (though not always perfectly) without recourse to visual perception. Here, axioms and postulates are basic statements that are accepted without proof. The postulates are stated below as they appear in Euclid’s Elements; Heath’s translation, without the parenthetical statements, is from Euclid (1956). The postulates are stated in terms of construction:

(1) To draw a straight line from any point to any point.
(2) To produce (extend) a finite straight line continuously in a straight line.
(3) To describe a circle with any center and distance (radius).
(4) That all right angles are equal to one another.
(5) (The parallel postulate): That, if a straight line falling on two straight lines makes the interior angles on the same side less than two right angles, the two straight lines, if produced indefinitely, meet on that side on which the angles are less than two right angles.

The fifth postulate is different from the others: for one thing, it is difficult to understand. Playfair’s postulate, which is equivalent to the parallel postulate, is easier to understand and states that

(P) In a plane, given a line and a point not on it, at most one line parallel to the given line can be drawn through the point.


In fact, many mathematicians felt the fifth postulate could be derived from the other four. Failure to show that this postulate could be deduced from the others led to other geometries. One of these, called hyperbolic geometry, is based on the following “fifth postulate”:

(5′) Through any point not on a given line more than one straight line can be drawn parallel to the given line.

These geometries are quite different; for example, in Euclidean geometry, the sum of the angles in a triangle is the sum of two right angles, and in hyperbolic geometry, the sum is less than two right angles. (There is also a third geometry, called spherical geometry, in which the sum is more than two right angles.) This chapter will consider familiar functions and symmetries in the Euclidean and hyperbolic geometries. Informally speaking, symmetry means that an object does not change when you move it in some way. For example, in Euclidean geometry, symmetries come from translations, reflections, and rotations; see Fig. 1. More details on symmetries, as well as similarities, will be given later.

A geometric model is specified by its points and lines. While there are other hyperbolic models, the focus here is on the Poincaré disk model in which the points are the points of the open unit disk (inside the bounding circle) and the lines are circles (or diameters) that are orthogonal to the bounding circle. Thus, points close to the bounding circle may be “far” from each other in this model, but close to each other in the Euclidean geometry. This model of hyperbolic geometry was attractive to the graphic artist M. C. Escher, for example, because it is restricted to a disk and thus lies inside a bounded region. It also preserves angles; more precisely, the angles between intersecting lines are equal to the Euclidean angles between the tangents to the hyperbolic lines at the points of intersection. This model is said to be conformal, and therefore objects retain their shape in a repeating pattern, roughly speaking.
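To make the description of lines in the Poincaré disk model concrete, the following Python sketch computes the Euclidean circle that carries the hyperbolic line through two given points of the disk. It is an illustration only: the function name `hyperbolic_line` and the tolerance `tol` are choices made here, not notation from the chapter. The sketch relies on the standard fact that a circle with center c and radius r is orthogonal to the unit circle exactly when |c|² = 1 + r².

```python
import math

def hyperbolic_line(z1, z2, tol=1e-12):
    """Circle through z1 and z2 (points of the open unit disk) that is
    orthogonal to the unit circle. Returns (center, radius), or None when
    the hyperbolic line is a diameter (z1, z2 and 0 are collinear)."""
    # Orthogonality means |c|^2 = 1 + r^2, so every point z on the circle
    # satisfies 2 Re(c * conj(z)) = |z|^2 + 1: a linear equation for c.
    det = z1.real * z2.imag - z1.imag * z2.real
    if abs(det) < tol:
        return None
    r1 = (abs(z1) ** 2 + 1) / 2
    r2 = (abs(z2) ** 2 + 1) / 2
    # Solve the 2x2 linear system by Cramer's rule.
    cx = (r1 * z2.imag - r2 * z1.imag) / det
    cy = (z1.real * r2 - z2.real * r1) / det
    center = complex(cx, cy)
    return center, math.sqrt(abs(center) ** 2 - 1)

print(hyperbolic_line(0.5 + 0j, 0.5j))        # a circular arc
print(hyperbolic_line(0.2 + 0j, -0.7 + 0j))   # None: the line is a diameter
```

For the points 0.5 and 0.5i the center comes out as 1.25 + 1.25i; only the arc of this circle inside the unit disk belongs to the hyperbolic line.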

Fig. 1 Translation, reflection, and rotation


Functions that behave well in hyperbolic space are the Möbius maps (or linear fractional transformations). While these will be discussed further in the section entitled “Complex Arithmetic and Geometry”, at this point it is only important to mention that the Möbius transformations that map the open unit disk D to itself and the unit circle ∂D to itself have a special form:

\[ M(z) = c\,\frac{z - z_0}{1 - \overline{z_0}\,z}, \]

where $z, z_0 \in \mathbb{D}$, and c is a point on the unit circle. These maps and their properties are relatively easy to investigate. A next natural step would be to investigate properties of products of such maps. One goal of this chapter is to consider these products, known as Blaschke products, not only from the point of view of formal mathematics but also from a visual point of view. This will be done in the section entitled “Blaschke Products”.

While real-valued functions of real variables are easy to visualize using a two-dimensional graph, Blaschke products are complex-valued functions of complex variables that map the unit disk to the unit disk and the unit circle to itself. Therefore, visualization requires four dimensions. While this might sound impossible, there is a method for doing precisely this. This method relies on a particular coloring of the complex plane and consideration of lines of constant modulus and argument. This procedure will be explained in detail in the section “Seeing Complex Functions”. For the moment, we only illustrate this with a particular example in Fig. 2. The number of Möbius transformations in the product is called the degree of the Blaschke product, and it is known that the Blaschke product winds the circle around itself the same number of times as its degree. This can be seen in Fig. 2 by looking at the unit circle and noting that each color shows up the same number of times as the degree: Each color appears 5 times and the degree of the Blaschke product is 5.

Fig. 2 Representation of a degree-5 Blaschke product in the plane and on the Riemann sphere

What happens when points of the same color on the unit circle are connected with lines in the Euclidean plane? What happens in the Poincaré disk model? The answer will appear in the sections entitled “Blaschke Products and Ellipses in the Euclidean Plane” and “Blaschke Products and Ellipses in the Poincaré Disk Model”, respectively.

While these results are certainly interesting from a geometric point of view, they also have a curious function-theoretic application. For this, a little more background is required: Given two polynomials, it is easy to compose them. But given a polynomial p, it is usually very difficult to decide whether or not there are two polynomials (of degree greater than one) such that when they are composed, the resulting function is p. It would seem that the same is true for Blaschke products, yet it turns out that it is possible to “see” when a Blaschke product is a composition of two other Blaschke products of degree greater than one. The technique for deciding whether or not a Blaschke product is a composition will be discussed in the section “Compositions of Blaschke Products”.

The geometry of Blaschke products provides not only a deep and rich mathematical theory but also a theory that can be understood through visualization. The authors of this chapter hope that this serves to convince the reader of the beauty of Blaschke products.
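The winding property mentioned above can be checked numerically. The following Python sketch multiplies Möbius factors of the special form given earlier and counts how often the image of the unit circle winds around the origin. The five zeros are arbitrary values chosen for illustration (they are not the zeros used in Fig. 2), and the helper names `blaschke` and `winding_number` are assumptions of this sketch.

```python
import cmath
import math

def blaschke(z, zeros, c=1.0):
    """Evaluate c * prod (z - a) / (1 - conj(a) z) over the zeros a, |a| < 1."""
    value = complex(c)
    for a in zeros:
        value *= (z - a) / (1 - a.conjugate() * z)
    return value

def winding_number(f, samples=4000):
    """Number of times f(e^{it}) winds around 0 as t runs from 0 to 2*pi."""
    total = 0.0
    previous = f(1 + 0j)
    for k in range(1, samples + 1):
        current = f(cmath.exp(2j * math.pi * k / samples))
        total += cmath.phase(current / previous)  # small phase increment
        previous = current
    return round(total / (2 * math.pi))

# Hypothetical zeros in the open unit disk, chosen only for this example.
zeros = [0.5 + 0.2j, -0.3 + 0.6j, -0.4 - 0.5j, 0.1 - 0.7j, 0j]

def B(z):
    return blaschke(z, zeros)

print(abs(B(cmath.exp(0.7j))))   # modulus on the unit circle
print(abs(B(0.3 + 0.1j)))        # modulus at a point inside the disk
print(winding_number(B))         # winding number of the boundary curve
```

With these five zeros the winding number comes out as 5, the degree of the product, while the modulus is 1 (up to rounding) on the unit circle and smaller than 1 inside the disk.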

Complex Arithmetic and Geometry

Complex numbers $z = x + iy$ can be represented as points z in the complex plane C, often associated with the names of Wessel, Argand, or Gauss. In this context, the elementary arithmetic operations admit geometric interpretations. In the following, it is convenient to consider the point z not as being fixed, but as a representative of any point in the complex plane.

Adding a (fixed) complex number b to z moves z to z + b, which induces a parallel shift of the complex plane that moves the origin 0 to b. Similarly, multiplication of z by a positive number r amounts to dilating (or contracting) the plane by a stretching (contracting) factor of r, keeping the origin fixed. Multiplication by complex numbers of the form $e^{i\phi}$, where $\phi$ is a real number (these are called unimodular and are represented by points on the unit circle ∂D), is another special transformation; the mapping $z \mapsto e^{i\phi} z$ rotates the plane by an angle of $\phi$ about the origin. So complex addition and multiplication represent three similarity transformations: translation, dilation/contraction, and rotation.

These operations can be combined freely with each other. For instance, multiplication with an arbitrary complex number $a = re^{i\phi}$ (with $r > 0$) is the combination of a rotation and a dilation (in either order) that both keep the origin fixed. This is called a spiral similarity. Another example is the mapping $z \mapsto e^{i\phi}(z - c) + c$, which describes a rotation by an angle $\phi$ about the point c. Expressed in mathematical jargon, the complex operations $z \mapsto z + b$ (with $b \in \mathbb{C}$) and $z \mapsto a \cdot z$ (with $a \in \mathbb{C} \setminus \{0\}$) generate a group with respect to composition. An arbitrary element of this group is the mapping $z \mapsto az + b$, which is also the most general form of a bijective holomorphic self-map of the complex plane. This group is an arithmetic model of the group of orientation preserving similarity transformations of the Euclidean plane.

The fourth of the elementary similarity transformations, reflection, can also be expressed as an operation on complex numbers: the mapping $z \mapsto \overline{z}$ sends $z = x + iy$ to its complex conjugate $\overline{z} = x - iy$, which represents reflection in the real line R. It is a similarity transformation, but in contrast to the other three it reverses orientation, which is the reason why it will be excluded from the game.

It is also possible to reflect an object in a curved surface. In the plane, the simplest such transformation is reflection in the unit circle, also called inversion. This transformation sends a point z to a point $z^*$ on the same ray emanating from the origin in such a way that the product of the absolute values $|z|$ and $|z^*|$ is 1. This may be viewed using complex arithmetic via the mapping $z \mapsto 1/\overline{z}$. As an orientation reversing transformation, inversion plays against the rules. However, when combined with reflection in the real line, the mapping $z \mapsto 1/z$ is obtained, and this can be admitted. The action of this transformation is illustrated in Fig. 3.

In fact, the new player $z \mapsto 1/z$ paves the way for consideration of two non-Euclidean geometries, called hyperbolic and spherical, respectively. To enter the spherical world, the complex plane must be complemented by the point at infinity, z = ∞. A model for this extended complex plane, C ∪ {∞}, is the Riemann sphere S. With appropriate definitions of the arithmetic operations, like 1/0 = ∞ and 1/∞ = 0, the mappings $z \mapsto z + a$, $z \mapsto a \cdot z$, $z \mapsto 1/z$ can be extended to the Riemann sphere. These mappings can then be combined arbitrarily (composed), and any such composition is a Möbius transformation,

\[ f(z) = \frac{az + b}{cz + d}, \qquad a, b, c, d \in \mathbb{C}, \quad ad - bc \neq 0. \]

The Möbius transformations form a group that is an arithmetic model of the group of all orientation preserving conformal (angle preserving) automorphisms (bijective self-mappings) of the Riemann sphere.
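The way the elementary mappings combine can be sketched in a few lines of Python. Here a Möbius transformation is stored as a tuple of coefficients (a, b, c, d), and composition is carried out as a 2×2 matrix product, which is a standard bookkeeping device. The helper names (`translation`, `multiplication`, `compose`, `evaluate`) are chosen for this illustration only, and the point at infinity is deliberately left out, so the sketch should be evaluated away from poles.

```python
import cmath

def translation(b):
    return (1, b, 0, 1)          # z -> z + b

def multiplication(a):
    return (a, 0, 0, 1)          # z -> a * z, with a != 0

INVERSION = (0, 1, 1, 0)         # z -> 1/z

def compose(m, n):
    """Coefficients of z -> m(n(z)), computed as a 2x2 matrix product."""
    a1, b1, c1, d1 = m
    a2, b2, c2, d2 = n
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2)

def evaluate(m, z):
    """Value (a z + b)/(c z + d) at a finite z that is not a pole."""
    a, b, c, d = m
    if a * d - b * c == 0:
        raise ValueError("ad - bc must be nonzero")
    return (a * z + b) / (c * z + d)

# Rotation by phi about the point c: shift c to 0, rotate, shift back.
phi, c = cmath.pi / 3, 2 + 1j
rotation_about_c = compose(
    translation(c),
    compose(multiplication(cmath.exp(1j * phi)), translation(-c)))

z = 0.3 - 0.8j
print(evaluate(rotation_about_c, z))
print(cmath.exp(1j * phi) * (z - c) + c)   # same value from the closed formula
print(evaluate(rotation_about_c, c))       # the center c stays fixed
```

Composing with `INVERSION` in the same way produces coefficients with c ≠ 0, that is, Möbius transformations that are no longer similarity transformations of the plane.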

Fig. 3 The transformation z → 1/z acting on a flower bed: original, transformed, zoomed out


Seeing Complex Functions

Visualization is an important tool for understanding mathematical objects. In two dimensions, it is possible to visualize a real-valued function of a real variable via its graph: If x is a real number in the domain of f, the value f(x) is associated with x and the point (x, f(x)) is plotted. If the domain and the range of f are subsets of the complex numbers, then a point z in the domain can be written as $z = x + iy$. Therefore, the domain already requires two dimensions, as does the range. Visualization in four dimensions is tricky, and it is a central focus of this chapter.

Traditionally, complex functions are visualized by so-called analytic landscapes that depict the graph of the modulus, $|f|$, of f. An example appears on the left of Fig. 4, which is an illustration of Euler’s gamma function from the famous book of Jahnke and Emde (1909). A careful look at this picture shows that the surface carries additional lines. There are two families of such lines: along lines of the first type f has constant modulus, while on the other lines the argument (phase) of f is constant. Now that computer graphics are available, it is more convenient to use colors to code the phase information; on the right, this method is applied to the same function.

Fig. 4 A historical and a contemporary analytic landscape of the gamma function

Another common method of visualization considers complex functions f : D → G as transformations that map the domain D onto the range G. When D is endowed with some additional structure, S, that can be transplanted to the range plane via f, it may be possible to recover mapping properties of the function f from the shape of the image f(S). A typical application is visualization of (univalent) conformal mappings, where S is a mesh formed by two families of orthogonal grid lines (see Arnold and Rogness 2008, e.g.). Conformality of the mapping, or preservation of angles, is reflected in the orthogonality of the image mesh. However, when f is not univalent, this technique of pushing S forward is limited, because the image f(S) is self-overlapping (see Fig. 5).

Fig. 5 Transplantation of a square mesh via complex functions

Fortunately, there is an easy way to overcome this dilemma: Instead of pushing structures forward from the z-plane to the w-plane via w = f(z), it is possible to do this the other way around; that is, they can be pulled back. An unconventional realization of this idea is depicted in Fig. 6. Here the periodic “flower bed” on the right is pulled back by the functions $w = z^5$ (left) and $w = \sin z$ (middle), revealing symmetries of these mappings.

Fig. 6 Pullback of a flower bed (right) via $f(z) = z^5$ (left) and $f(z) = \sin z$ (middle)

For serious investigations, it is better to use standardized structures. This is illustrated in Fig. 7 for the rational function $f(z) = (z - 1)/(z^2 + z + 1)$. The image in the middle of the figure is a standard polar mesh in the w-plane. Its pullback via f is shown on the left, and a colored analytic landscape of f is depicted on the right.

Fig. 7 A grid mapping induced by $f(z) = (z - 1)/(z^2 + z + 1)$ and its colored analytic landscape

The colored analytic landscape uses only saturated colors to represent the phase. So why not incorporate the modulus of functions by brightness and color-code the values f(z) completely? This is the basic idea of domain coloring. Early examples appeared in the 1980s, but the method was popularized by Frank Farris in Farris (1998). The examples in Fig. 8 show domain coloring representations of $f(z) = (z - 1)/(z^2 + z + 1)$, $f(z) = \sin z$, and $f(z) = e^{1/z}$.

Fig. 8 Domain coloring for $f(z) = (z - 1)/(z^2 + z + 1)$, $f(z) = \sin z$, and $f(z) = e^{1/z}$

Since domain coloring encodes the values f(z) uniquely, these images make it possible to recover the depicted function, at least theoretically. In practice, however, it is often difficult to see the details clearly. In some cases it is even better to discard the modulus of a function completely. This leads to the concept of phase portraits (or phase plots), which are color-coded representations of the phase $f/|f|$ on the domain of f. (For more detailed information, see Wegert (2012) and Wegert and Semmler (2011).) There is also an annual calendar, Complex Beauties, that provides a phase portrait for each month and is available from its homepage http://www.mathcalendar.net/.

It can be verified easily that holomorphic (and, more generally, meromorphic) functions are uniquely determined by their phase up to a positive constant factor, so that (in principle) all essential information can be reconstructed from phase plots. In order to reveal hidden structures, additional information can be incorporated. Figure 9 illustrates the construction of such enhanced phase plots. In the first step, a standard color scheme highlighting special features is defined in the w-plane (lower row). In the second step, every point z in the domain of f is assigned the same color as the value f(z) in the w-plane (upper row). The leftmost column corresponds to a plain phase plot; the second has shaded contour lines of $|f|$; in the third, some lines of constant phase are emphasized; and the rightmost phase plot is equipped with a conformal tiling. The last color scheme combines the advantages of traditional domain coloring with mesh representations.

The location of zeros and poles can easily be seen in a phase plot: these are the points where all colors meet, and the number of times each color appears indicates their multiplicity. Zeros and poles can be distinguished by the orientation of colors in their neighborhood. Figure 10 shows some examples.
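The two ingredients of a plain phase plot, the color assigned to a value and the behavior of the colors around zeros and poles, can be sketched as follows. The color convention used here (hue proportional to the argument, with positive real values in red) is an assumption of this sketch and need not match the exact scheme of the figures; the function names `phase_color` and `color_cycles` are likewise hypothetical. The example function is the rational function from Figs. 7 and 8.

```python
import cmath
import colorsys
import math

def phase_color(w):
    """RGB color of the phase w/|w|: hue = arg(w)/(2*pi), fully saturated.
    Assumption of this sketch: positive real values appear in red."""
    hue = (cmath.phase(w) / (2 * math.pi)) % 1.0
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)

def color_cycles(f, center, radius=0.01, samples=720):
    """Net number of times the phase of f runs through the color wheel
    along a small counterclockwise circle around `center`."""
    total = 0.0
    previous = f(center + radius)
    for k in range(1, samples + 1):
        z = center + radius * cmath.exp(2j * math.pi * k / samples)
        current = f(z)
        total += cmath.phase(current / previous)
        previous = current
    return round(total / (2 * math.pi))

def f(z):
    return (z - 1) / (z * z + z + 1)

pole = complex(-0.5, math.sqrt(3) / 2)   # a root of z^2 + z + 1

print(phase_color(1 + 0j))        # color of a positive real value
print(color_cycles(f, 1 + 0j))    # around the simple zero at z = 1
print(color_cycles(f, pole))      # around the simple pole
```

Around the simple zero the colors cycle once in one direction (the count is 1), and around the simple pole once in the opposite direction (the count is −1); this is the orientation difference referred to above, and a zero or pole of higher multiplicity would give a count of larger absolute value.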


Fig. 9 Different versions of phase plots (upper row) as pullback from the images below

Fig. 10 Appearance of zeros (upper row) and poles (lower row) in a phase plot

Aside from zeros and poles, there is a third class of points that is at least of the same importance for the structural analysis of functions: the critical points, that is, the zeros of the derivative $f'$. Critical points that are not zeros of f are said to be saddle points, a name that is motivated by their appearance in the analytic landscape. In a plain phase plot, it is difficult to locate saddle points. Since the value of the function is nearly constant in a neighborhood of such points, the phase plot is almost monochromatic there and does not show much detail (see Fig. 18). One way to depict them more clearly is to enhance lines with constant phase. In fact, if $z_0$ is a saddle point of order k (i.e., $f'$ has a zero of multiplicity k at $z_0$), then $z_0$ is the crossing point of exactly k + 1 such isochromatic lines. The functions depicted in Fig. 11 from left to right have saddle points of orders 1, 2, 3, and 8, respectively. In these images some highlighted isochromatic lines run through $z_0$ (which is the exception rather than the rule).

Fig. 11 Saddle points of orders 1, 2, 3, and 8 as crossings of isochromatic lines

The four images in Fig. 12 show the typical appearance of saddle points of orders 1, 2, 3, and 8, respectively, in enhanced phase plots with conformal tiling. Though it is nearly impossible to count how many isochromatic lines cross in these pictures, the orders of the saddle points can be determined from the number of vertices of the large exceptional tile: A tile with 4k + 4 vertices contains a saddle point of order k (or several saddle points with orders summing up to k).

Fig. 12 Saddle points of orders 1, 2, 3, and 8 located in an exceptional tile

Let T be a tile in an enhanced phase plot of a meromorphic function f that is constructed from a (standard) polar conformal tiling of the w-plane (see Fig. 9, right). Then T is said to be regular if its closure does not contain a zero, a pole, or a saddle point of f; all other tiles are exceptional. Each regular tile is the (univalent) conformal image of a square, and its vertices correspond to the corners of the square. While all regular tiles are conformally equivalent (to a unit square), this is not so for the exceptional tiles. Moreover, these tiles have some intrinsic symmetry that depends on the location of the saddle point $z_0$. This is visualized in Fig. 13. The squares in the lower row are embedded in a (Cartesian) chessboard tiling. The upper row shows some prototypes of exceptional tiles T that are generated by holomorphic functions with a single critical point $z_0$ inside T or on its boundary. The red dots indicate the locations of the critical points $z_0$ (upper row) and the critical value $f(z_0)$ (lower row). Note that several (two or four) squares are involved in the construction in the third and fourth column.

Fig. 13 Some prototypes of exceptional tiles (shown in black)

More subtle symmetries are revealed when the squares of the range tiling are subdivided into four triangles, which are colored red, yellow, green, and blue, respectively, in Fig. 14. As in Fig. 13, the images in the upper row are constructed by pulling back the images in the lower row via w = f(z).

Fig. 14 Symmetries near critical points arising from subdivision of tiles
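The statement that a saddle point of order k is the crossing point of exactly k + 1 isochromatic lines can be tested on a simple model function. The sketch below uses f(z) = 1 + z^(k+1), which is a choice made here for illustration (it is not claimed to be one of the functions shown in Figs. 11 and 12): its derivative has a zero of multiplicity k at 0, and f(0) = 1 is not zero, so 0 is a saddle point of order k. The function name `isochromatic_lines` and the sampling parameters are assumptions of this sketch.

```python
import cmath
import math

def isochromatic_lines(f, z0, radius=0.1, samples=3600):
    """Count the lines through z0 along which f has the same phase as f(z0).

    The quotient f(z)/f(z0) is sampled on a small circle around z0. Each sign
    change of its imaginary part (with positive real part) marks one ray
    leaving z0, and two rays together form one line through z0."""
    w0 = f(z0)
    values = []
    for j in range(samples):
        # The half-step offset keeps the samples off the rays in the example below.
        theta = 2 * math.pi * (j + 0.5) / samples
        values.append(f(z0 + radius * cmath.exp(1j * theta)) / w0)
    rays = 0
    for j in range(samples):
        a, b = values[j - 1], values[j]      # j = 0 wraps around the circle
        if a.imag * b.imag < 0 and a.real > 0 and b.real > 0:
            rays += 1
    return rays // 2

for k in (1, 2, 3, 8):
    # Model function with a saddle point of order k at the origin.
    print(k, isochromatic_lines(lambda z, k=k: 1 + z ** (k + 1), 0j))
```

For the orders 1, 2, 3, and 8 the count is 2, 3, 4, and 9, that is, k + 1 in each case. For a general function the radius has to be small enough that no zero, pole, or other critical point lies inside the sampling circle.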

Hyperbolic Geometry One strategy used to prove the parallel postulate was to replace it by its negation and then to search for a contradiction. Carl Friedrich Gauss first realized that this search would be in vain and that, in fact, there is a geometry that satisfies all Euclidean axioms but with the fifth postulate negated. Although he had developed a fairly advanced non-Euclidean geometry in the year 1816, Gauss did not publish

3 The Beauty of Blaschke Products

57

it and confined himself to occasional remarks in private letters (see Stäckel 1933 for details). The first publications about what is now called hyperbolic geometry are by Nikolai Ivanovich Lobachevsky (1829) and János Bolyai (1832). Yet, even later, whether or not this new and counterintuitive geometry might contain a contradiction remained an open question. These doubts could be dispelled by providing a Euclidean model for hyperbolic geometry; that is, as discussed in the introduction, the terms line and point, among others, are associated with certain objects from Euclidean geometry that are then shown to satisfy the axioms. It follows that nonEuclidean geometry is consistent if Euclidean geometry is. The first model described by Eugenio Beltrami in 1868 was the pseudosphere model that represents only a portion of the hyperbolic plane and will therefore not be discussed further. That same year Beltrami first described the Poincaré and Klein models, even in n dimensions (see Milnor 1982). In two dimensions these models can be introduced most conveniently using projections from a sphere, as is visualized in Fig. 15. (1) The points of the hemisphere model are the points lying strictly in the upper half of a unit sphere in R3 ; the lines are half-circles orthogonal to the equator. In Fig. 15, one such line is depicted in yellow. It can be verified easily that each pair of distinct points determines exactly one line that meets them both. Moreover, to a given line L and a point P not on L, there exist an infinite number of lines through P that do not intersect L, so that this geometry does not satisfy Playfair’s axiom. (2) The Poincaré disk model is obtained from the hemisphere model by stereographic projection from the south pole to the equator plane. Here points are identified with points in the open unit disk D, and lines are either open arcs orthogonal to the unit circle or open diameters. In Fig. 15, points and lines in the Poincaré model are depicted in red. 
Figure 16 on the left shows a few hyperbolic lines in the Poincaré model. Again, through two distinct points in D, there is exactly one (hyperbolic) line. The endpoints on the unit circle, which are not part of the line, are referred to as ideal points. Hence, two different lines terminating at the same ideal point are

Fig. 15 Isomorphisms between the hemisphere model, the Klein model, and the Poincaré models

58

U. Daepp et al.

Fig. 16 Lines in the Poincaré disk model and the half-plane model

parallel because they have no common point in D. This distinguishes them from ultraparallel lines, which do not even tend to common ideal points. Figure 16 also illustrates the fact that Playfair’s axiom is violated. (3) Projecting the hemisphere orthogonally onto the equator plane yields the Klein model, Cayley-Klein model, or Beltrami-Klein model. Its points are the points in the open unit disk, and its lines are open line segments connecting ideal points on the boundary of the unit disk. A (hyperbolic) line is depicted in blue in Fig. 15. Composing the two mappings just introduced yields an isomorphism between the Poincaré and the Klein model. Its explicit form in complex coordinates is F (z) =

2z . 1 + |z|2

(1)

For more details about the Klein model, the reader is referred to Baldus (1944) and Greenberg (2007).

(4) If, as in Fig. 15, the projection is performed from a point on the equator of the Riemann sphere onto a plane tangent to the sphere at the opposite point, the resulting model lives in a half plane, usually identified with the upper half plane $\mathbb{H} := \{x + iy : y > 0\}$ of the complex plane. This model is the Poincaré half-plane model, in which lines are either open semicircles or open half-lines orthogonal to the real line; see Fig. 16, right. Elementary considerations also yield a mapping that directly relates the two Poincaré models. Such a mapping turns out to be the Cayley transform, a special Möbius transformation that maps D onto H:

\[ F(z) = i\,\frac{1 - z}{1 + z}. \tag{2} \]
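The two transition maps (1) and (2) can be tried out numerically. The following is a minimal sketch, not part of the original exposition: the function names, the sample circle (centre 1.25, radius 0.75) and the sample angles are illustrative assumptions. It takes three points on one hyperbolic line of the Poincaré disk and checks that map (1) sends them to a straight Euclidean chord, while map (2) sends them to a semicircle centred on the real axis in the upper half plane.

```python
import numpy as np

def klein_from_poincare(z):
    """Map (1): Poincaré disk -> Klein disk, z -> 2z / (1 + |z|^2)."""
    return 2 * z / (1 + np.abs(z) ** 2)

def half_plane_from_disk(z):
    """Map (2): unit disk -> upper half plane, z -> i (1 - z) / (1 + z)."""
    return 1j * (1 - z) / (1 + z)

# A hyperbolic line of the Poincaré disk: an arc of a circle with centre c and
# radius r that meets the unit circle at right angles, i.e. |c|^2 = 1 + r^2.
c = 1.25
r = np.sqrt(c ** 2 - 1)                 # = 0.75
angles = np.array([2.6, 3.0, 3.4])      # sample values that keep the points inside D
points = c + r * np.exp(1j * angles)

klein_points = klein_from_poincare(points)
half_plane_points = half_plane_from_disk(points)

print(np.abs(points))              # all moduli are below 1
print(klein_points.real)           # constant real part: a straight chord
print(half_plane_points.imag)      # positive: points of the upper half plane
print(np.abs(half_plane_points))   # constant modulus: a semicircle centred at 0
```

For this particular line the Klein images all have real part 1/c = 0.8, which follows directly from the orthogonality condition $|z|^2 + 1 = 2\,\mathrm{Re}(\bar c z)$ on the arc.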

See, for example, the exposition in Anderson (2005), which is based on the Poincaré half-plane model. Hilbert’s axioms for plane geometry introduce only five undefined or primitive terms by the properties they are supposed to possess. These are “point,” “line,”

3 The Beauty of Blaschke Products


“incidence,” “betweenness,” and “congruence.” So, in the description of the models, all these terms must be addressed, not only points and lines. Incidence and betweenness have their usual Euclidean meaning. It is therefore possible to define a line segment AB as the collection of all points on the line through A and B that are located between A and B or coincide with one of the points. Such a line segment is colored in green in the two Poincaré models shown in Fig. 16. It is also possible to define (hyperbolic) convexity: A set is hyperbolically convex if it contains all line segments between any two of its points. As in Euclidean geometry, the intersection of all convex sets containing a set M is again convex and will be called the hyperbolic convex hull of M. The notion of congruence and length of line segments in the Poincaré model is more subtle, since it must be compatible with Hilbert’s axioms. To this end, we define the hyperbolic distance between z and w in D by

\[ d(z, w) = \log \frac{1 + \rho(z, w)}{1 - \rho(z, w)}, \quad \text{where } \rho(z, w) = \left| \frac{z - w}{1 - \bar{w} z} \right|. \tag{3} \]

The function ρ is called the pseudo-hyperbolic distance. This formula has interesting consequences: If z is fixed and w tends to the boundary of the unit disk, their distance will go to infinity. In other words, the Poincaré disk is unbounded and ideal points have infinite distance from points inside the disk. If both z and w tend to ∂D while their Euclidean distance remains fixed, their hyperbolic distance will go to infinity. The other way around also works: An object of fixed size in the hyperbolic world that moves toward the boundary of the unit disk (which is not a boundary of the hyperbolic plane) becomes tiny for an observer in the Euclidean plane. The conformality of the Poincaré disk model and the Möbius map (2) imply that the Poincaré half-plane model is also conformal. On the other hand, the transition map (1) is not conformal. Therefore, the hyperbolic size of an angle between two lines in the Klein model is not equal to its Euclidean size. The final relevant notion is that of isometries, that is, self-mappings of the hyperbolic plane that leave the length of line segments and, therefore, angles invariant. Isometries form a group under composition that acts on the hyperbolic plane. In the Poincaré disk model, there are two types of isometries: conformal self-maps of the unit disk, which are the special Möbius transformations that can be written in the form

\[ b(z) = c\,\frac{z - z_0}{1 - \bar{z}_0 z} \tag{4} \]

with $|c| = 1$ and $z_0 \in \mathbb{D}$, and anti-conformal self-maps of D, which can be written as the composition $b(\bar{z})$. Recall that $z \mapsto \bar{z}$ is the reflection in the real axis and it reverses orientation. A function of the form (4) is called a Blaschke factor. When considered in the complex plane, b has a pole at $1/\bar{z}_0$, as can be seen on the left of Fig. 17. The image in the middle of the figure shows a Blaschke factor.
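The consequences of formula (3) listed above, and the claim that maps of the form (4) are isometries, can be checked numerically. The sketch below is only an illustration: the function names and all sample points are assumptions chosen for this example.

```python
import math

def rho(z, w):
    """Pseudo-hyperbolic distance from (3)."""
    return abs((z - w) / (1 - w.conjugate() * z))

def hyperbolic_distance(z, w):
    """Hyperbolic distance d(z, w) from (3)."""
    p = rho(z, w)
    return math.log((1 + p) / (1 - p))

def blaschke_factor(z, z0, c=1.0):
    """Conformal self-map (4) of the unit disk, with |c| = 1 and |z0| < 1."""
    return c * (z - z0) / (1 - z0.conjugate() * z)

# 1) The distance from the origin grows without bound as w approaches the unit circle.
radial = [hyperbolic_distance(0j, complex(1 - 10.0 ** -k)) for k in range(1, 6)]

# 2) Two points at fixed Euclidean distance 0.1 drift toward the boundary.
drift = [hyperbolic_distance(complex(x, -0.05), complex(x, 0.05))
         for x in (0.0, 0.9, 0.99, 0.998)]

# 3) A Blaschke factor preserves hyperbolic distance (sample values are arbitrary).
z, w = 0.3 + 0.2j, -0.5 + 0.1j
z0, c = 0.4 - 0.3j, complex(math.cos(1.0), math.sin(1.0))
before = hyperbolic_distance(z, w)
after = hyperbolic_distance(blaschke_factor(z, z0, c), blaschke_factor(w, z0, c))

print(radial)
print(drift)
print(before, after)
```

Both lists are increasing, and the two distances printed last agree up to rounding error.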


Fig. 17 A conformal self-map of the disk (domain on the left and in the middle, image at right)

This self-map of D is a hyperbolic translation; that is, there is a line that is invariant under b (which is the black line with the white dots in the middle and right images of the figure).

Blaschke Products

A complex polynomial p of degree n has the form $p(z) = a_n z^n + \cdots + a_1 z + a_0$ with complex coefficients $a_0, \ldots, a_n$, where $a_n \neq 0$ is assumed. It follows from the fundamental theorem of algebra that, for any $w \in \mathbb{C}$, the equation p(z) = w has exactly n solutions, if they are counted according to multiplicities. Since p is also holomorphic, we can say that a polynomial is a self-map of C that maps C onto an n-fold covering of itself. In view of the fact that $\lim_{|z| \to \infty} |p(z)| = \infty$, polynomials have a continuous extension onto the Riemann sphere by defining $p(\infty) := \infty$. Recalling that $z \mapsto 1/z$ interchanges 0 and ∞, it is even possible to speak of the multiplicity of ∞, which is then defined to be the multiplicity of the zero at 0 of the analytic function $z \mapsto 1/p(1/z)$, and this multiplicity is easily seen to be equal to n. Hence, polynomials are also self-maps of the Riemann sphere of valency n. (It should be noted that they are not the only ones.) What are the self-maps of the unit disk of valency n? The answer is that these are exactly the functions

\[ B(z) = c \prod_{k=1}^{n} \frac{z - z_k}{1 - \bar{z}_k z} \tag{5} \]

with $|c| = 1$ and $z_1, \ldots, z_n \in \mathbb{D}$. These functions are called Blaschke products or hyperbolic polynomials of degree n. Blaschke products share many properties with polynomials. Again by the fundamental theorem of algebra, a polynomial of degree n can be factored as $p(z) = c\,(z - z_1) \cdots (z - z_n)$


where $z_1, \ldots, z_n \in \mathbb{C}$ are its zeros, each repeated according to its multiplicity. Recalling that conformal self-maps of C are of the form $z \mapsto az + b$, it follows that polynomials of degree n are exactly those functions that can be written as a product of n conformal self-maps of C. Comparing the conformal self-maps of D appearing in (4) with the general form of a Blaschke product of degree n as in (5), it follows that Blaschke products are exactly those functions that can be written as a product of n conformal self-maps of D. The derivative of a polynomial p of degree n is a polynomial of degree n − 1 and therefore has n − 1 zeros, called critical points of p. The derivative of a Blaschke product B of degree n is not a Blaschke product, but B still has n − 1 critical points in D (counting, of course, possible multiplicities). If the points $\zeta_1, \ldots, \zeta_{n-1} \in \mathbb{C}$ are prescribed, and a polynomial p with these points as critical points is to be constructed, it is apparent that $p'(z) = a(z - \zeta_1) \cdots (z - \zeta_{n-1})$ with $a \neq 0$ arbitrary, and thus

\[ p(z) = \int p'(z)\,dz = a \int (z - \zeta_1) \cdots (z - \zeta_{n-1})\,dz + b; \]

that is, p exists and is uniquely determined up to post-composition with a conformal self-map z → az + b of the plane C. The analogous property for hyperbolic polynomials is a nontrivial result that was first proved by Heins: Given n − 1 points in D, there is a Blaschke product B of degree n with these points as critical points, Heins (1962, 1986). The function B is unique up to post-composition with conformal self-maps of D. Semmler and Wegert (2019) gave a simple proof using equilibriums of charge configurations. For further references on this topic, see Kraus and Roth (2008, 2013) and Garcia et al. (2018, Chapter 6). Blaschke products live most naturally in the Poincaré disk model because of their conformal character. This is demonstrated by the following theorem due to Walsh, which is the hyperbolic counterpart of the well-known Gauss-Lucas theorem, Walsh (1950, 1952). The Gauss-Lucas theorem states that the critical points of any complex polynomial lie in the convex hull of its zeros. Walsh’s theorem says that the critical points (in D) of a hyperbolic polynomial (i.e., a Blaschke product) lie in the hyperbolic convex hull of its zeros (for an alternative proof, see Wegert 2011). In the picture on the left of Fig. 18, the 10 zeros of a Blaschke product are given (black points), along with their hyperbolic convex hull and the critical points in D of the Blaschke product (gray points). In fact, from Formula (5) it is possible to define the Blaschke product B not only in the unit disk but also on the Riemann sphere. In this case, B is a rational function that maps the Riemann sphere onto itself. More precisely, each of the upper hemisphere, the lower hemisphere, and the equator is mapped onto an n-fold covering of itself, respectively. In particular, a Blaschke product of degree n maps D onto itself n times, and it wraps the unit circle around itself n times with strictly monotone argument. 
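The count of n − 1 critical points in D, and a simple consequence of Walsh’s theorem, can be illustrated numerically. The sketch below is an assumption-laden illustration rather than a method from the cited papers: it writes B = P/Q with polynomials P and Q, finds the zeros of the numerator of (P/Q)' with `numpy.roots`, and keeps those inside the unit disk. The zeros are arbitrary sample values and the function names are hypothetical. Since a disk centred at 0 is hyperbolically convex, Walsh’s theorem implies that no critical point in D can have larger modulus than the largest zero, which the last line lets one compare.

```python
import numpy as np

def blaschke_polys(zeros, c=1.0):
    """Coefficients (highest power first) of P, Q with B = P / Q as in (5)."""
    P = c * np.poly(np.asarray(zeros, dtype=complex))
    Q = np.array([1.0 + 0j])
    for zk in zeros:
        Q = np.polymul(Q, [-np.conj(zk), 1.0])   # factor 1 - conj(z_k) z
    return P, Q

def critical_points_in_disk(zeros):
    """Zeros of B' inside the unit disk, from the numerator P'Q - PQ' of (P/Q)'."""
    P, Q = blaschke_polys(zeros)
    N = np.polysub(np.polymul(np.polyder(P), Q), np.polymul(P, np.polyder(Q)))
    N = N[-(2 * len(zeros) - 1):]    # the z^(2n-1) terms cancel; drop any roundoff residue
    roots = np.roots(N)
    return roots[np.abs(roots) < 1]

zeros = [0.0, 0.5, -0.3 + 0.4j, 0.2 - 0.6j]     # arbitrary sample zeros, n = 4
crit = critical_points_in_disk(zeros)

print(len(crit))                                         # n - 1 critical points in D
print(np.abs(crit).max(), max(abs(z) for z in zeros))    # largest moduli, for comparison
```

Testing membership in the hyperbolic convex hull itself would require constructing the bounding geodesics and is not attempted here.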
The Blaschke product B also shows some nice symmetry: The value $B(1/\bar{z})$ at a point $1/\bar{z}$, which is the reflection of z in the unit circle, is the reflection of B(z) in


Fig. 18 Walsh’s theorem on the location of critical points of a Blaschke product

Fig. 19 Blaschke products of degrees 3, 20, and 70 in the square |Re z| ≤ 2, |Im z| ≤ 2

the unit circle,

\[ B(1/\bar{z}) = 1/\overline{B(z)}, \quad z \in \mathbb{C}. \tag{6} \]

This has the consequence that the zeros $z_k$ of B inside the unit circle correspond to poles of B at the reflected points $1/\bar{z}_k$ outside the unit disk. This is illustrated in Fig. 19. (Not all poles can be seen because z is restricted to the square |Re z| ≤ 2, |Im z| ≤ 2.) The perfect symmetry induced by relation (6) can be seen more readily by viewing the Blaschke product on the Riemann sphere, as shown in Fig. 20. Since reflection reverses orientation, the orientation of the color spectrum in the upper and lower hemisphere is reversed, and the zeros in the upper hemisphere appear as poles in the lower hemisphere. This brief discussion demonstrates the surprising analogy between Blaschke products and polynomials, justifying the name hyperbolic polynomials. More substantial results in this direction can be found in Ng and Tsang (2013).
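Two of the properties discussed in this section lend themselves to a quick numerical check: the n-fold wrapping of the unit circle with strictly increasing argument, and the reflection symmetry (6). The following sketch uses arbitrary sample zeros and an arbitrary unimodular constant; none of these values come from the figures.

```python
import numpy as np

def blaschke(z, zeros, c=1.0):
    """Evaluate the Blaschke product (5) at z (scalar or array)."""
    z = np.asarray(z, dtype=complex)
    value = np.full(z.shape, c, dtype=complex)
    for zk in zeros:
        value = value * (z - zk) / (1 - np.conj(zk) * z)
    return value

zeros = [0.5, -0.3 + 0.4j, 0.2 - 0.6j]          # arbitrary sample zeros, n = 3
c = np.exp(0.7j)                                 # arbitrary constant with |c| = 1

# On the unit circle |B| = 1 and the argument increases, winding n times.
t = np.linspace(0.0, 2.0 * np.pi, 4001)
on_circle = blaschke(np.exp(1j * t), zeros, c)
phase = np.unwrap(np.angle(on_circle))
winding = (phase[-1] - phase[0]) / (2.0 * np.pi)

# Relation (6): B(1 / conj(z)) = 1 / conj(B(z)).
z = 0.3 + 0.2j
lhs = blaschke(1 / np.conj(z), zeros, c)
rhs = 1 / np.conj(blaschke(z, zeros, c))

print(winding)      # close to n = 3
print(lhs, rhs)     # equal up to rounding
```

The winding number is read off from the unwrapped phase, so the sampling of the circle has to be fine enough that consecutive phase increments stay below π; 4001 samples are ample for zeros this far from the boundary.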


Fig. 20 Blaschke products of degrees 3, 20, and 70 on the Riemann sphere

Fig. 21 A Blaschke product of degree 5 with some inscribed polygons

Blaschke Products and Ellipses in the Euclidean Plane

Recall from the previous section that a Blaschke product of degree n wraps the unit circle, ∂D, around itself n times. Therefore, given any color, there will be exactly n points on the unit circle in the phase plot that have this color; in other words, these points are mapped to the same λ ∈ ∂D. Now draw the convex n-gon with these points as vertices. In Fig. 21, this process is applied once in the figure on the left and then repeated on the right, moving λ around the unit circle. As this happens, the envelope of the n-gons obtained produces a closed curve in the unit disk. The phase portrait of the Blaschke product appears to have no obvious symmetries, but what about the curve? Can the curve be described in a nice way? Does it possess symmetries? And, if so, do these symmetries say anything about the Blaschke product?


To form a Blaschke curve, points of the same color on the unit circle are joined to their nearest neighbors using line segments. The envelope of these line segments yields the curve. The curve appearing in Fig. 21 is obtained from a Blaschke product of degree 5, and, perhaps as expected, it does not have any obvious symmetry. Here it is assumed that B(0) = 0. If this is not the case, it is possible to choose a Blaschke factor M mapping B(0) to 0 and then to consider the Blaschke product defined by $B_1 := M \circ B$. Now $B_1$ and B are of the same degree, $B_1(0) = 0$, and the curve obtained from $B_1$ is the same as the one obtained from B. One thing is surprising though: The four nonzero zeros of the Blaschke product (the white dots not in the center) lie inside the curve. It is not difficult to show that if B(0) = B(a) = 0 and $a \neq 0$, then for each λ ∈ ∂D the point a lies in the convex hull of the points z for which B(z) = λ (Gorkin et al. 1994, Proposition 4.8).

Consider the Blaschke product $B(z) = i z (z - 1/2)/(1 - z/2)$. In the picture of Fig. 22, there are two red points on the unit circle and these are the two points $z_1$ and $z_2$ for which $B(z_1) = B(z_2) = 1$. The result of connecting these two points is the picture on the left. Continue connecting the two points on the circle of the same color and drawing the line segments; something unexpected happens as a result of this process. Looking at the picture on the right in Fig. 22, all line segments seem to intersect in one point, namely, the point 1/2, the nonzero zero of B. This is always true: Let B be a Blaschke product of degree 2 with one zero at 0 and one zero at a point a in D, and consider the line segment joining the two points $z_{1,\lambda}$ and $z_{2,\lambda}$ for which $B(z_{1,\lambda}) = B(z_{2,\lambda}) = \lambda$. Then for all λ on the unit circle, these line segments will pass through the point a. In this simplest case, the enveloping curve will degenerate to a point. While this might be considered a symmetric object, it is not a very interesting one.
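The degree-2 statement can be checked numerically for the example $B(z) = i z (z - 1/2)/(1 - z/2)$. The sketch below is an illustration only: the helper names and the choice of 24 sample values of λ are assumptions. For each λ it solves B(z) = λ as a quadratic equation and tests whether the point a = 1/2 lies on the segment joining the two solutions.

```python
import numpy as np

a = 0.5          # the nonzero zero of B
c = 1j           # unimodular constant of the example B(z) = i z (z - 1/2) / (1 - z/2)

def preimages(lam):
    """The two points z on the unit circle with B(z) = lam."""
    # B(z) = lam  <=>  c z^2 + (lam conj(a) - c a) z - lam = 0
    return np.roots([c, lam * np.conj(a) - c * a, -lam])

lams = np.exp(1j * np.linspace(0.0, 2.0 * np.pi, 24, endpoint=False))
cross_terms, dot_terms = [], []
for lam in lams:
    z1, z2 = preimages(lam)
    product = (z1 - a) * np.conj(z2 - a)
    cross_terms.append(product.imag)   # zero exactly when z1, a, z2 are collinear
    dot_terms.append(product.real)     # negative when a lies between z1 and z2

print(np.max(np.abs(cross_terms)))     # vanishes up to rounding
print(np.max(dot_terms))               # negative
```

The collinearity is no accident of the sample values: the chord through two unit-circle points $z_1, z_2$ consists of the points z with $z + z_1 z_2 \bar{z} = z_1 + z_2$, and Vieta’s formulas for the quadratic show that z = a satisfies this equation for every λ.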
The obvious question now is: What happens if B has zeros 0, a, and b and the points B identifies on the unit circle are connected?

Fig. 22 Line segments with points on the unit circle identified by a Blaschke product of degree 2


Fig. 23 The circles associated to $B_1(z) = z^3$ and $B_2(z) = z^6$

Start with the simplest degree-3 Blaschke product, namely, $B(z) = z^3$. As in Fig. 23, the points B identifies are equally spaced on the unit circle, and it follows from this that the curve obtained from this process is a circle, a perfectly symmetric curve. In fact, any Blaschke product of the form $c z^n$ for $|c| = 1$ and a positive integer n ≥ 3 will be associated with a circle centered at the origin. The larger n is, the larger the radius of the circle: the radius is $\cos(\pi/n)$, the inradius of a regular n-gon inscribed in the unit circle. This is illustrated by the picture on the right of Fig. 23. Now consider the case in which the degree-3 Blaschke product has a nonzero zero. For example, suppose the two nonzero zeros are a = 0.8 − 0.1i and b = 0.5 + 0.5i. In Fig. 24, the three points B maps to the point 1 are connected to obtain the single triangle appearing in the picture on the left. For the picture on the right, 20 such triangles are formed. As before, a and b lie inside all the triangles. But more seems to be true here. In fact, it seems that every triangle circumscribes an ellipse and that the foci of the ellipse are located at these two zeros of B. A classical theorem explains why such a result might be expected: Siebeck’s theorem says that given three noncollinear points $z_1$, $z_2$, and $z_3$, the zeros of the function

\[ F(z) = \frac{m_1}{z - z_1} + \frac{m_2}{z - z_2} + \frac{m_3}{z - z_3}, \quad \text{where } m_1, m_2, m_3 > 0, \]

are the foci of an ellipse inscribed in the triangle formed by $z_1$, $z_2$, $z_3$. Applying Siebeck’s theorem to the function

\[ F_\lambda(z) = \frac{B(z)/z}{B(z) - \lambda}, \quad \text{for } \lambda \in \partial\mathbb{D}, \]


Fig. 24 A Blaschke product of degree 3

one sees that for each λ ∈ ∂D, the triangle with vertices at the three points B maps to λ circumscribes an ellipse with foci at the zeros of B(z)/z. But these ellipses may change as λ changes. So the question is: Why do these triangles always circumscribe the same ellipse? For the answer, see Daepp et al. (2002, 2018). For Blaschke products of degree 3, then, the result is a well-known curve with two axes of symmetry, and the two foci of the curve are the zeros of B(z)/z. The next case to consider is, of course, degree 4. Here, when the convex quadrilaterals with vertices at the points the Blaschke product identifies are formed, the result is a curve – but not a familiar one, in general. However, there is a very special case in which some symmetry appears: By a theorem of Fujimura, the associated curve of a degree-4 Blaschke product is an ellipse if and only if the Blaschke product is the composition of two degree-2 Blaschke products, Fujimura (2013) (see also Gorkin and Wagner 2017). For example, the picture on the left of Fig. 25 is associated with a Blaschke product that is a composition, B = D ◦ C, where

\[ C(z) = z\,\frac{z - (0.2 - 0.5i)}{1 - (0.2 + 0.5i) z} \quad \text{and} \quad D(z) = z\,\frac{z - (0.5 + 0.3i)}{1 - (0.5 - 0.3i) z}. \]

On the right, an asymmetrical picture associated with a degree-4 Blaschke product outlining a clearly non-elliptical form appears, and Fujimura’s result says that the Blaschke product cannot be written as a composition of two nontrivial Blaschke products.
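Returning to the degree-3 case, the claim that all triangles circumscribe one and the same ellipse can be tested numerically for the zeros a = 0.8 − 0.1i and b = 0.5 + 0.5i used in Fig. 24. The sketch below relies on an assumption that is not stated in the text above: following the literature cited there (Daepp et al. 2002), the ellipse is taken to be $\{w : |w - a| + |w - b| = |1 - \bar{a} b|\}$. A line is tangent to this ellipse exactly when the smallest value of $|w - a| + |w - b|$ along the line equals $|1 - \bar{a} b|$, which is what the code measures on each side of each triangle. Function names and sampling resolution are illustrative choices.

```python
import numpy as np

a, b = 0.8 - 0.1j, 0.5 + 0.5j        # the nonzero zeros used for Fig. 24

def preimages(lam):
    """The three points z on the unit circle with B(z) = lam, where
    B(z) = z (z - a)(z - b) / ((1 - conj(a) z)(1 - conj(b) z))."""
    numerator = np.poly([0.0, a, b])
    denominator = np.polymul([-np.conj(a), 1.0], [-np.conj(b), 1.0])
    return np.roots(np.polysub(numerator, lam * denominator))

def focal_sum_minimum(z1, z2, samples=20001):
    """Smallest value of |w - a| + |w - b| over the segment from z1 to z2 (grid search)."""
    t = np.linspace(0.0, 1.0, samples)
    w = (1 - t) * z1 + t * z2
    return np.min(np.abs(w - a) + np.abs(w - b))

target = abs(1 - np.conj(a) * b)      # assumed major-axis length of the ellipse
worst = 0.0
for lam in np.exp(1j * np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)):
    z1, z2, z3 = preimages(lam)
    for p, q in ((z1, z2), (z2, z3), (z3, z1)):
        worst = max(worst, abs(focal_sum_minimum(p, q) - target))

print(target, worst)   # worst deviation from tangency over all 36 sides
```

The grid search is deliberately simple; a one-dimensional minimizer would give the same answer with fewer evaluations. As a consistency check of the assumed formula, a = b = 0 gives the circle of radius 1/2 associated with $z^3$.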


Fig. 25 Curves associated to degree-4 Blaschke products

Blaschke Products and Ellipses in the Poincaré Disk Model

So far, the focus has been on the envelope of the line segments joining the points that the Blaschke product B identifies. But Blaschke products are hyperbolic polynomials, so it is natural to view this in terms of hyperbolic geometry. In particular, what happens when the Euclidean lines of the previous section are replaced with hyperbolic lines in the Poincaré disk model? We again start with the simplest case, a Blaschke product B of degree 2. The ideal points on the unit circle that are identified by B are now joined by the hyperbolic line of the Poincaré disk model. These lines all intersect in one point, but this time they intersect at the critical point of B; see Fig. 26. In the Euclidean setting, when $B(0) \neq 0$, a Blaschke factor could be used to move points so that zero mapped to zero. In the hyperbolic case, there is no need to require that B(0) = 0, because the critical points are invariant under post-composition by a Blaschke factor. Formula (3) for distance in the Poincaré disk model was introduced in the section entitled Hyperbolic Geometry. In view of this formula, it is reasonable to ask what a hyperbolic circle with center a ∈ D is. The answer is that it is the set of points z for which d(z, a) = r, for a constant r with r > 0, the same as in the Euclidean setting but using a different distance function. Since d(z, a) = r if and only if ρ(z, a) = s for some s with 0 < s < 1, a hyperbolic circle with center a and ρ-radius s is $C_\rho(a, s) := \{z : \rho(z, a) = s\}$. If a = 0, then (as the reader should check) the hyperbolic circle centered at zero and of radius r is also a Euclidean circle with center zero and Euclidean radius s, where $s = (e^r - 1)/(e^r + 1)$. Something like this is true in general: Consider $C_\rho(a, s)$ and the Blaschke factor defined by


Fig. 26 Hyperbolic lines with ideal points identified by a degree-2 Blaschke product

\[ M_a(z) = \frac{z - a}{1 - \bar{a} z} \]

that maps a to 0. Since $M_a$ preserves hyperbolic circles and distances, $M_a$ maps the circle $C_\rho(a, s)$ to the circle $C_\rho(0, s)$. But $M_a$ also preserves Euclidean circles and so does $M_a^{-1}$. Therefore $C_\rho(a, s) = M_a^{-1}(C_\rho(0, s))$ is also a Euclidean circle. Thus, a hyperbolic circle is also a Euclidean circle (and conversely), though the center and radius are not usually the same. Formulas for the center and the radius can be found in Garnett (1981). The Poincaré disk model bounded by the black unit circle with five blue circles of hyperbolic radius 1 appears in Fig. 27. The blue hyperbolic centers are located at five distinct points, namely

\[ \left(1 - \frac{1}{2^n}\right) e^{(2n-1)\pi i/4} \quad \text{for } n = 0, \ldots, 4. \]

The black dots are the Euclidean centers of the circles, and it should be noted that the Euclidean and hyperbolic centers of the circle centered at the origin coincide. The curve in the figure on the right is also a hyperbolic circle produced by the Blaschke product

\[ B(z) = \left( \frac{z - (1 + i)/5}{1 - z(1 - i)/5} \right)^3. \]
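The statement that a hyperbolic circle is a Euclidean circle with a shifted center can be made concrete. The sketch below assumes the standard closed form for the Euclidean data of $C_\rho(a, s)$, namely center $a(1 - s^2)/(1 - s^2|a|^2)$ and radius $s(1 - |a|^2)/(1 - s^2|a|^2)$; the text refers to Garnett (1981) for such formulas without quoting them, so these expressions are an assumption of this example. It then checks them against points generated by pulling back the circle $|w| = s$ with the inverse of a Blaschke factor, using hyperbolic radius 1 and five centers on a spiral toward the boundary as in the description of Fig. 27.

```python
import numpy as np

def euclidean_data(a, s):
    """Assumed Euclidean centre and radius of the hyperbolic circle {z : rho(z, a) = s}."""
    denom = 1 - s ** 2 * abs(a) ** 2
    return a * (1 - s ** 2) / denom, s * (1 - abs(a) ** 2) / denom

def inverse_blaschke_factor(w, a):
    """Inverse of M_a(z) = (z - a) / (1 - conj(a) z)."""
    return (w + a) / (1 + np.conj(a) * w)

r = 1.0                                   # hyperbolic radius
s = (np.exp(r) - 1) / (np.exp(r) + 1)     # corresponding rho-radius
centres = [(1 - 0.5 ** n) * np.exp((2 * n - 1) * np.pi * 1j / 4) for n in range(5)]

t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
deviations = []
for a in centres:
    circle = inverse_blaschke_factor(s * np.exp(1j * t), a)   # points with rho(z, a) = s
    centre, radius = euclidean_data(a, s)
    deviations.append(np.max(np.abs(np.abs(circle - centre) - radius)))
    print(abs(a), abs(centre), radius)

print(max(deviations))    # zero up to rounding
```

The printed rows show the Euclidean center moving away from the hyperbolic center and the Euclidean radius shrinking as the hyperbolic center approaches the unit circle.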

In the previous section, the focus was on ellipses (Euclidean ellipses). It is time to consider ellipses again, but this time in the hyperbolic setting. A hyperbolic ellipse is the set of points such that the sum of the distances to two fixed points a1 and a2


Fig. 27 Circles in the Poincaré disk model

is a constant. While this may sound familiar, the point is that the distance is now the hyperbolic distance. Recall that lines in this model (the Poincaré disk model) are represented by circular arcs (geodesics) inside the unit circle that are orthogonal to the circle at their points of intersection. In the Euclidean model, starting with a Blaschke product of degree 3 with a zero at zero and connecting the points it identified, the resulting triangle circumscribed an ellipse. Consider now the hyperbolic version of this: Connect the three points a Blaschke product of degree 3 (not necessarily with a zero at 0) identifies with geodesics, forming three parallel lines. Is the result interesting? Indeed it is. On the left of Fig. 28, the ideal points that are identified by a Blaschke product of degree 3 are connected by three parallel lines. On the right, these lines envelope a curve when the procedure is repeated with other points on the unit circle. In fact, Singer showed that if γ denotes the curve in D that is the envelope of the non-Euclidean geodesics joining points a Blaschke product B of degree n identifies, then γ is part of an algebraic curve for which the real foci in D are the critical points of B in D (together with their inverses with respect to D), Singer (2006). If the Blaschke product is of degree 3, then the curve γ is a non-Euclidean ellipse, and the geometric foci are the two algebraic foci in D, which are the two critical points of B in D. The curve on the right in Fig. 28 is thus a hyperbolic ellipse with foci marked in the figure. Our eye, trained by Euclidean geometry, is no longer able to recognize the symmetry of this curve because the distance function seems “distorted”; thus, to understand this, it is necessary to expand the traditional vision (i.e., Euclidean vision) of an ellipse. We now also have a better understanding of the circle on the right of Fig. 27. 
The curve associated with this Blaschke product is a hyperbolic ellipse, but the two critical points coincide, and thus the ellipse becomes a hyperbolic circle. This, however, is also a Euclidean circle, as discussed above. It may be difficult to see in


the picture, but the zero of the Blaschke product, which is also the critical point, is not in the Euclidean center of this circle. It is the circle’s center with respect to the Poincaré disk model. Figure 29 shows curves that correspond to Blaschke products of degrees 4 and 7, respectively. These are no longer ellipses, but the critical points of the Blaschke products that lie in the Poincaré disk are real foci of these curves. There is another piece of the curve and other critical points outside the disk; the relation between the points is that those outside are the symmetric points, with respect to the unit circle, of those inside.
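The degree-2 observation made at the start of this section, that the hyperbolic lines joining identified ideal points all pass through the critical point of B, also admits a short numerical check. The sketch below is illustrative only and uses an arbitrary sample zero. It relies on two standard facts: four points lie on a common circle or line exactly when their cross-ratio is real, and a circle through a point ζ and its mirror image $1/\bar{\zeta}$ is automatically orthogonal to the unit circle.

```python
import numpy as np

a = 0.5 + 0.3j       # arbitrary nonzero zero; B(z) = z (z - a) / (1 - conj(a) z)

# B'(z) = 0  <=>  conj(a) z^2 - 2 z + a = 0; keep the root inside the unit disk.
crit_roots = np.roots([np.conj(a), -2.0, a])
zeta = crit_roots[np.abs(crit_roots) < 1][0]
zeta_reflected = 1 / np.conj(zeta)   # mirror image of zeta in the unit circle

def cross_ratio(p, q, u, v):
    return (p - u) * (q - v) / ((p - v) * (q - u))

imag_parts = []
for lam in np.exp(1j * np.linspace(0.0, 2.0 * np.pi, 24, endpoint=False)):
    # B(z) = lam  <=>  z^2 + (lam conj(a) - a) z - lam = 0
    u, v = np.roots([1.0, lam * np.conj(a) - a, -lam])
    imag_parts.append(cross_ratio(u, v, zeta, zeta_reflected).imag)

print(zeta)
print(np.max(np.abs(imag_parts)))    # vanishes up to rounding
```

Using the cross-ratio avoids special treatment of the one value of λ for which the hyperbolic line is a diameter rather than a circular arc.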

Fig. 28 A Blaschke product of degree 3 in the Poincaré disk model

Fig. 29 Curves associated to Blaschke products of degree 4 and 7 in the Poincaré model


Compositions of Blaschke Products

This section demonstrates how visualization can provide new insight into the algebraic structure of mathematical objects. Such structures arise, for instance, from the factorization of natural numbers, which is the source of the rich field of number theory. In contrast to this, factorization of Blaschke products in the usual sense is trivial. However, there is another algebraic operation that gives rise to a number of challenging questions: the composition $g \circ f$ of two functions f and g, defined by $(g \circ f)(z) := g(f(z))$. It can easily be verified that the composition $B = g \circ f$ of Blaschke products f and g of degree m and n, respectively, is a Blaschke product of degree mn, so composition is an operation on the class of (finite) Blaschke products. In analogy to multiplication of integers, it is easy to build a composition, but to decide whether or not a given function is composite, and to decompose it into its “factors,” is a nontrivial task. There has been a great deal of interest in this problem: For polynomials a breakthrough appears in Ritt (1922). For a brief account of the history of the composition problem for Blaschke products, the reader is referred to Cowen (2012). One approach to the problem will be described in detail below. For a different, but related, approach, see Chalendar et al. (2018). Figure 30 shows enhanced phase plots of two Blaschke products. Knowing that one is a composition and the other one is not, it is not too difficult to guess which is which: A composition should have more structure than an arbitrary function, and looking at the phase plots with some patience, one finds that the candidate on the left is the better choice. In fact, the relevant structure is encoded in the exceptional tiles that have more than 4 corners (see the section Seeing Complex Functions). The reason for this is

Fig. 30 Which of these phase plots represents a composition of Blaschke products?


Fig. 31 Construction of the phase plot of a composition g ◦ f , to be read from right to left

Fig. 32 Generation of critical tiles in the phase plots of g ◦ f and g from tiles in the z plane

explained in Fig. 31, which illustrates the composition of two Blaschke products. Starting with a standard polar pattern on the right-hand side, the phase plot of g in the middle is its pullback via the mapping g. The n − 1 critical points of g are located in the exceptional tiles. The phase plot of $g \circ f$ on the left is the pullback of the image in the middle via f. Its exceptional tiles are of different origin: Tiles of the first type contain the critical points of f (they are pre-images of regular tiles of the plot in the middle), while tiles of the second type are generated as pullback of exceptional tiles in the phase plot of g. Since the image of f is an m-fold covering of the disk, each of the latter tiles is replicated m times. As a consequence, the exceptional tiles in the phase plot of $g \circ f$ form n − 1 groups, each consisting of m conformally equivalent tiles, and the remaining m − 1 tiles contain the critical points of f (see Fig. 32). The arithmetic background of this observation is the chain rule

\[ (g \circ f)' = (g' \circ f) \cdot f'. \]
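The bookkeeping behind the two types of exceptional tiles can be reproduced numerically in the generic case. The sketch below is an illustration with arbitrarily chosen sample zeros, m = 2 and n = 3, and hypothetical helper names. It computes the m − 1 critical points of f, pulls back each of the n − 1 critical points of g under f to obtain n − 1 groups of m points, and confirms with a central difference quotient that the derivative of $B = g \circ f$ vanishes at all mn − 1 candidates.

```python
import numpy as np

def blaschke_polys(zeros):
    """Coefficients (highest power first) of P, Q with B = P / Q and c = 1."""
    P = np.poly(np.asarray(zeros, dtype=complex))
    Q = np.array([1.0 + 0j])
    for zk in zeros:
        Q = np.polymul(Q, [-np.conj(zk), 1.0])   # factor 1 - conj(z_k) z
    return P, Q

def evaluate(zeros, z):
    P, Q = blaschke_polys(zeros)
    return np.polyval(P, z) / np.polyval(Q, z)

def critical_points_in_disk(zeros):
    P, Q = blaschke_polys(zeros)
    N = np.polysub(np.polymul(np.polyder(P), Q), np.polymul(P, np.polyder(Q)))
    roots = np.roots(N[-(2 * len(zeros) - 1):])  # the z^(2n-1) terms cancel
    return roots[np.abs(roots) < 1]

def preimages_in_disk(zeros, w):
    """Solutions z in D of f(z) = w for the Blaschke product f with the given zeros."""
    P, Q = blaschke_polys(zeros)
    roots = np.roots(np.polysub(P, w * Q))
    return roots[np.abs(roots) < 1]

f_zeros = [0.0, 0.5]                         # f has degree m = 2 (sample values)
g_zeros = [0.0, 0.3 + 0.4j, -0.4 + 0.2j]     # g has degree n = 3 (sample values)
m, n = len(f_zeros), len(g_zeros)

first_type = critical_points_in_disk(f_zeros)                  # critical points of f
groups = [preimages_in_disk(f_zeros, w) for w in critical_points_in_disk(g_zeros)]

def B(z):
    return evaluate(g_zeros, evaluate(f_zeros, z))

def derivative(func, z, h=1e-5):
    return (func(z + h) - func(z - h)) / (2 * h)   # central difference

candidates = np.concatenate([first_type] + groups)
residuals = np.abs([derivative(B, z) for z in candidates])

print(len(first_type), [len(grp) for grp in groups], len(candidates))
print(residuals.max())     # small: every candidate is a critical point of B
```

By construction f is constant on each group, which is the numerical counterpart of the m-fold replication of the exceptional tiles of g.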


The explanations above are somewhat simplified and have to be modified if f or g have multiple critical points or if their critical points and critical values are in special positions. Working out the details yields the following decomposition result. A finite Blaschke product B is decomposable as $B = g \circ f$ with Blaschke products f and g of degree m ≥ 2 and n ≥ 2, respectively, if and only if the critical points of B can be partitioned into multisets¹ $A_0, A_1, \ldots, A_{n-1}$ such that:

(i) The set $A_0$ contains m − 1 elements, and each set $A_1, \ldots, A_{n-1}$ contains m elements.
(ii) Two critical points of B have the same multiplicity whenever they belong to the same set $A_k$ for some k = 1, . . . , n − 1.
(iii) Let $f_0$ be (one and then any) Blaschke product of degree m with $A_0$ as set of critical points. Then $f_0$ is constant on each $A_k$ for k = 1, . . . , n − 1.

If these conditions are satisfied, then B can be decomposed as $B = g_0 \circ f_0$, and the general form of such decompositions is $B = (g_0 \circ h^{-1}) \circ (h \circ f_0)$ with a conformal disk automorphism h (Daepp et al. 2015). In many cases the determination of candidates for the sets $A_1, \ldots, A_{n-1}$ is rather easy: If $B = g_0 \circ f_0$ with $f_0$ satisfying condition (iii), then B must also be constant on each set, say $B(z) = c_k$ for all $z \in A_k$. Typically, these n − 1 critical values $c_k$ of B are distinct, and they are also different from the critical values of B attained at the critical points of $f_0$. Thus, the splitting of critical points into the sets $A_k$ corresponds to the values of B at these points. However, one should be aware that the existence of such a partitioning is a necessary condition that does not by itself guarantee that B is decomposable. Moreover, there are (exceptional) cases in which B also attains the critical value $c_k$ at critical points not belonging to $A_k$, for instance, at points in $A_j$ with $j \neq k$ or at critical points of $f_0$. This happens, for example, if g is itself a composition.
Verification of the crucial third condition relies on Heins’ theorem on the existence and uniqueness of a Blaschke product f0 of degree m with prescribed set A0 of m − 1 (not necessarily distinct) critical points in D (see the section Blaschke products). The algorithmic aspects of this result are quite challenging, but in the case at hand, this can be circumvented in the following way: Since the set of Blaschke products (of any fixed degree) is invariant with respect to pre- and post-composition with conformal automorphisms h of the disk, one can assume that B(0) = 0 and then replace a (possible) composition B = g◦f by B = (g◦h−1 )◦(h◦f ) = g0 ◦f0 , such that also f0 (0) = 0. This implies g0 (0) = 0. Hence, if f0 (z0 ) = 0 at a point z0 ∈ D, then B(z0 ) = 0. Consequently f0 must be a sub-product of B, and it is only necessary to test the third criterion (iii) against all such f0 .

1 In contrast to sets, multisets may contain elements repeatedly.


Fig. 33 Some hidden symmetries in the phase plot of a composition

Fig. 34 Discrete conformal analysis of Blaschke products revealing symmetries

The governing principle behind these observations is a kind of hidden symmetry that is inherent in the compositions of Blaschke products and can be read off from their enhanced phase plots. Such symmetries are visualized in Fig. 33, which illustrates the fact that the critical points of B = g ◦ f belonging to the same set Ak (see the decomposition result) are, in a sense, symmetric with respect to the critical points of f . The reason for the appearance of these symmetries becomes plausible in Fig. 34, which shows “polar chessboard plots” of g ◦ f (left) and g (right). The yellow and the violet exceptional tiles contain the m − 1 = 2 critical points of f . Each of these belongs to a colored chain that connects opposite edges of that tile with the boundary of the disk. These two chains split the disk into m = 3 subdomains S1 , S2 , S3 , and f maps each subdomain onto (a copy of) D (shown on the right). These mappings are injective, except for the (colored) tiles of the bounding chains, in which two symmetric tiles have the same image. The exceptional (yellow and violet) tiles are mapped onto (double coverings) of the corresponding terminal tiles of the (yellow and violet) chains in the image on the right.


Fig. 35 Symmetric paths connecting tiles associated with the sets A1 (left) and A2 (right)

Each of the subdomains Sj contains n − 1 = 2 exceptional tiles (red, green, blue) that are the pre-images of the (correspondingly colored) critical tiles in the chessboard plot of g with respect to the mapping f . Finally, each group A1 , . . . , An−1 of critical points in the decomposition result is represented by exactly one member in each subdomain Sj . The color of the corresponding exceptional tile indicates the group to which its critical point belongs. Polar chessboard plots are also convenient for finding the relevant symmetries, as illustrated in Fig. 35. This figure shows several chains of tiles that are symmetric with respect to the critical points of f (contained in the yellow and violet exceptional tile) and connect exceptional tiles (red, green, and blue) that are associated with the same set A1 , A2 , and A3 , respectively. If such symmetries are absent at some level of discretization, the Blaschke product is indecomposable. Theoretically, it would also work the other way around, provided that one could verify these symmetries exactly. This requires two things: (1) finding appropriate symmetric paths connecting the exceptional tiles and (2) verifying that the connected critical tiles are conformally equivalent. The first step is basically a counting exercise, but the second is impracticable. On the other hand, if a Blaschke product is not a composition, this can always be detected by choosing a sufficiently fine discretized chessboard plot.

Conclusion

The goal of this chapter has been to demonstrate how visualization can help to understand concepts and results of a mathematical theory without recourse to formulas. In fact, many mathematicians consider their research to be one of discovery, and seeing the objects of their investigations can be a source of


Fig. 36 Carsten Nicolai’s stellar grating MI-RG-090 and a plot of f (z) = exp(1/z)

inspiration. Beauty and symmetry are often guiding principles in these explorations. While beauty remains a matter of individual taste, experience, and convention, the concept of symmetry has a precise mathematical meaning, formally expressed in the language of group theory and reaching far beyond the common interpretation of this notion. For centuries in the visual arts, beauty was a dominating criterion for the reception of artwork. In particular, during the Renaissance, strong composition rules were often based on symmetry and mathematical principles, and ingenious artists like Leonardo da Vinci or Albrecht Dürer made important contributions to mathematics. During the twentieth century, the fundamental paradigm that art must be beautiful was abandoned, and art was assigned a new role as part of the human project of exploring nature, reality, and society. In the preface to the catalogue of Carsten Nicolai’s exhibition “unidisplay,” Eva Huttenlauch writes: “Art has become a category of thinking rather than seeing . . .” (Huttenlauch 2013, p. 13). As the aims of artists expanded, the border between mathematics and arts began to blur. Artists like Victor Vasarely, Sol LeWitt, and Carsten Nicolai and scientists like Heinrich Heesch, Anatoly Fomenko, and Isaac Amidror are building bridges between arts and mathematics. Approaching a topic from opposite sides, emphasizing different aspects, and using tools specific to each discipline, mathematicians and artists sometimes seem to explore one and the same general phenomenon, indicating the presence of universal principles. For example, it is not an accident that the “stellar grating MI-RG-090” from the work “moiré index” (Nicolai, 2010b) resembles a black-and-white phase plot of the function $f(z) = e^{1/z}$ (Fig. 36).
In the same vein, it would be interesting to do a comparative study of Nicolai’s systematic listing of grids and tilings, Nicolai (2010a), (especially the semiregular and irregular ones), with Frank Farris’ wallpaper patterns, Farris (2015),

3 The Beauty of Blaschke Products


and Isaac Amidror’s theoretical foundation of moiré patterns, Amidror (2007, 2009).

Visualization is important for both the arts and mathematics. For the former, this has a long tradition. For the latter, tradition has focused on the subfield of geometry. However, advances in computer graphics opened up new possibilities for other branches of mathematics as well. In this chapter, it is the field of complex analysis that benefits from visual representations of its objects. In many mathematical subjects, the visual tradition has only just begun.

Acknowledgments

Since August 2018, Pamela Gorkin has been serving as a Program Director in the Division of Mathematical Sciences at the National Science Foundation (NSF), USA, and as a component of this position, she received support from NSF for research, which included work on this paper. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Amidror I (2007) The theory of the Moiré Phenomenon: Volume II: Aperiodic Layers. Springer, Dordrecht
Amidror I (2009) The theory of the Moiré Phenomenon: Volume I: Periodic Layers. Springer, London
Anderson JW (2005) Hyperbolic geometry. Springer undergraduate mathematics series, 2nd edn. Springer, London
Arnold DN, Rogness J (2008) Möbius transformations revealed. Notices Am Math Soc 55(10):1226–1231
Baldus R (1944) Nichteuklidische Geometrie: hyperbolische Geometrie der Ebene, 2nd edn. Sammlung Göschen, De Gruyter
Chalendar I, Gorkin P, Partington JR, Ross WT (2018) Clark measures and a theorem of Ritt. Math Scand 122(2):277–298. https://doi.org/10.7146/math.scand.a-104444
Cowen CC (2012) Finite Blaschke products as compositions of other finite Blaschke products. arXiv:1207.4010
Daepp U, Gorkin P, Mortini R (2002) Ellipses and finite Blaschke products. Am Math Mon 109(9):785–795. https://doi.org/10.2307/3072367
Daepp U, Gorkin P, Shaffer A, Sokolowsky B, Voss K (2015) Decomposing finite Blaschke products. J Math Anal Appl 426(2):1201–1216. https://doi.org/10.1016/j.jmaa.2015.01.039
Daepp U, Gorkin P, Shaffer A, Voss K (2018) Finding Ellipses: What Blaschke Products, Poncelet’s Theorem, and the Numerical Range Know about Each Other. The Carus mathematical monographs, No. 34. The Mathematical Association of America/American Mathematical Society, Providence
Euclid (1956) The thirteen books of Euclid’s Elements translated from the text of Heiberg. Vol. I: Introduction and Books I, II. Vol. II: Books III–IX. Vol. III: Books X–XIII and Appendix. Dover Publications, Inc., New York, translated with introduction and commentary by Thomas L. Heath, 2nd edn
Farris FA (1998) Reviews: Visual Complex Analysis. Am Math Mon 105(6):570–576. https://doi.org/10.2307/2589427
Farris FA (2015) Creating symmetry: The Artful Mathematics of Wallpaper Patterns. Princeton University Press, Princeton. https://doi.org/10.1515/9781400865673
Fujimura M (2013) Inscribed ellipses and Blaschke products. Comput Methods Funct Theory 13(4):557–573. https://doi.org/10.1007/s40315-013-0037-8


Garcia SR, Mashreghi J, Ross WT (2018) Finite Blaschke products and their connections. Springer, Cham. https://doi.org/10.1007/978-3-319-78247-8
Garnett JB (1981) Bounded analytic functions, Pure and Applied Mathematics, vol 96. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York/London
Gorkin P, Wagner N (2017) Ellipses and compositions of finite Blaschke products. J Math Anal Appl 445(2):1354–1366. https://doi.org/10.1016/j.jmaa.2016.01.067
Gorkin P, Laroco L, Mortini R, Rupp R (1994) Composition of inner functions. Results Math 25(3-4):252–269. https://doi.org/10.1007/BF03323410
Greenberg MJ (2007) Euclidean and non-euclidean geometries: development and history, 4th edn. W. H. Freeman, New York
Heins M (1962) On a class of conformal metrics. Nagoya Math J 21:1–60
Heins M (1986) Some characterizations of finite Blaschke products of positive degree. J Analyse Math 46:162–166. https://doi.org/10.1007/BF02796581
Huttenlauch E (2013) Art as an explanatory model of universal complexity: on Carsten Nicolai’s unidisplay, in: Carsten Nicolai, unidisplay, pp 9–13 (80), Gestalten, Berlin
Jahnke E, Emde F (1909) Funktionentafeln mit Formeln und Kurven. Mathematisch-physikalische Schriften für Ingenieure und Studierende, B.G. Teubner, Leipzig und Berlin
Kraus D, Roth O (2008) Critical points of inner functions, nonlinear partial differential equations, and an extension of Liouville’s theorem. J Lond Math Soc (2) 77(1):183–202. https://doi.org/10.1112/jlms/jdm095
Kraus D, Roth O (2013) Critical points, the Gauss curvature equation and Blaschke products. In: Blaschke products and their applications. Fields institute communications, vol 65, Springer, New York, pp 133–157. https://doi.org/10.1007/978-1-4614-5341-3_7
Milnor J (1982) Hyperbolic geometry: the first 150 years. Bull Am Math Soc 6(1):9–24
Ng TW, Tsang CY (2013) Polynomials versus finite Blaschke products. In: Blaschke products and their applications. Fields institute communications, vol 65. Springer, New York, pp 249–273. https://doi.org/10.1007/978-1-4614-5341-3_14
Nicolai C (2010a) Grid Index. Die Gestalten Verlag, Berlin
Nicolai C (2010b) Moiré Index. Die Gestalten Verlag, Berlin
Ritt JF (1922) Prime and composite polynomials. Trans Am Math Soc 23(1):51–66. https://doi.org/10.2307/1988911
Semmler G, Wegert E (2019) Finite Blaschke products with prescribed critical points, Stieltjes polynomials, and moment problems. Anal Math Phys 9(1):221–249. https://doi.org/10.1007/s13324-017-0193-5
Singer DA (2006) The location of critical points of finite Blaschke products. Conform Geom Dyn 10:117–124. https://doi.org/10.1090/S1088-4173-06-00145-7
Stäckel P (1933) Gauß als Geometer. In: Carl Friedrich Gauß Werke Band X.2 Abhandlungen über Gauss’ wissenschaftliche Tätigkeit auf den Gebieten der reinen Mathematik und Mechanik. Springer Berlin
Walsh JL (1950) The location of critical points of analytic and harmonic functions. Colloquium Publications, vol 34. American Mathematical Society
Walsh JL (1952) Note on the location of zeros of extremal polynomials in the non-euclidean plane. Acad Serbe Sci, Publ Inst Math 4:157–160
Wegert E (2011) Phase diagrams of meromorphic functions. Comput Methods Funct Theory 10(2):639–661
Wegert E (2012) Visual complex functions: An Introduction with Phase Portraits. Birkhäuser/Springer Basel AG, Basel. https://doi.org/10.1007/978-3-0348-0180-5
Wegert E, Semmler G (2011) Phase plots of complex functions: a journey in illustration. Notices Am Math Soc 58(6):768–780

4

Looking Through the Glass Annalisa Crannell

Contents
Introduction
A Brief History
A New Mathematical Object: The Point of Projective Geometry
Ideal Points
Vanishing Points
Where Was the Camera?
A Consequence of Viewing Distances: Illusion, Distortion, and Anamorphism
Dolly Zoom
Anamorphic Art
Impossible Figures
Going Backward from Pictures to 3D
Homogeneous Coordinates
Multiple View Geometry
The Ames Room
Reconstructing Objects from Images
Conclusion
Cross-References
References

The description in section “Going Backward from Pictures to 3D” of the three steps for reconstructing three-dimensional objects from a collection of photographs was influenced by a talk by Joe Kileel at the Algebraic Vision session at the SIAM conference on Applied Algebraic Geometry, July 31, 2017, in Atlanta, GA. Joe had just finished his PhD from UC Berkeley and was headed for Princeton.

A. Crannell
Franklin & Marshall College, Lancaster, PA, USA
e-mail: [email protected]

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_41


Abstract

Projective geometry allows us, as its name suggests, to project a three-dimensional world onto a two-dimensional canvas. A perspective projection often includes objects called vanishing points, which are the images of projective ideal points; the geometry of these points frequently allows us to either create images or to reconstruct scenes from existing images. We give a particular example of using a pair of vanishing points to locate the position of the artist Canaletto as he painted the Clock Tower in the Piazza San Marco. However, because mappings from three-dimensional space to a two-dimensional plane are not invertible, we can also use perspective and projective techniques to create and analyze illusions (e.g., anamorphic art, impossible figures, the dolly zoom, and the Ames room). Moving beyond constructive (e.g., ruler and compass) projective geometry into analytical projective geometry via homogeneous coordinates allows us to create and analyze digital perspective images. The ubiquity of digital images in the present day allows us to ask whether we can use two (or many) images of the same object to reconstruct that object in part or in entirety. Such a question leads us into the emerging field of multiple view geometry, straddling projective geometry, algebraic geometry, and computer vision.

Keywords Linear perspective · Multiple view geometry · Projective geometry · Anamorphism

Introduction

This chapter is about perspective art, and in particular about the role that projective geometry plays in perspective art. Most people are aware that perspective techniques began to flourish during the Renaissance, and as a result drawings and paintings of that era became demonstrably more “realistic” or “lifelike” than art in previous eras. Now we are living through a similar Renaissance, especially in the technological realm (which includes our animated movies, video games, medical imaging, and more). The mathematics that transformed our world several centuries ago still flourishes around us; it continues to have relevance and power in the way we look at the world today.

The word perspective comes from the Medieval Latin roots per (“through”) and specere (“look at” – the same root that gives us “spectacles”). So the title of this chapter is a deliberate pun: like the book written in 1871 by the mathematician Charles Dodgson under his pen name, Lewis Carroll (1871), perspective art literally intends us to look through a window to see the objects it portrays lying on the other side. And as Carroll’s book suggests, sometimes the view that we get by looking through the glass will give us glimpses of the world that are surprising – even wonderful – feats of illusion and magic.


A Brief History

There is a lore that projective geometry has been a subject intimately connected with, and arising from, the development of perspective art. That lore is not entirely in accordance with historical fact. (For a much more comprehensive description of the history of perspective art than this chapter can provide, see Andersen’s excellent volume, Andersen 2006.)

The formal introduction of linear perspective is generally credited to Filippo Brunelleschi, an Italian designer, architect, and engineer who lived 1377–1446 (see also Chap. 44, “Renaissance Architecture”). His perspective demonstrations relied extensively on geometry but also on physical apparatuses – he interposed mirrors between his canvas and the pictured scenes to validate the accuracy of his images. Brunelleschi’s work had an almost immediate influence on Leon Battista Alberti (1404–1472), an Italian polymath (architect, priest, artist, and author). In 1435, Alberti published Della pittura, his seminal work on perspective, whose influence reached far and wide.

For two centuries, perspective art remained largely in the arena where Brunelleschi and Alberti had placed it: as an exercise in Euclidean geometry and engineering. The German mathematician and astronomer Johannes Kepler (1571–1630) may have been the first person to introduce the projective notion of “points at infinity.” However, Kepler’s motivation arose not from perspective art, but rather from developing a unified theory of conics (e.g., “closing up” the parabola).

In the early-to-mid 1600s, Girard Desargues (1591–1661) published a series of short works, some in perspective art (notably, Desargues 1987) and others in projective geometry. Like Brunelleschi and Alberti, Desargues was a mathematician and engineer. The theorem that bears his name to this day appears in a work of homage by his contemporary, Abraham Bosse (1648).
Desargues’s theorem states that “A pair of triangles perspective from a point is also perspective from a line.” This theorem does indeed have perspective art interpretations: see Fig. 1, which depicts a lamp casting a shadow. In this figure, each pair of corresponding vertices of the two triangles is collinear with the bulb of the lamp (a point), and each pair of corresponding edges meets on the line where the glass meets the ground. But it is not clear that the theorem was directly motivated by a similar situation; in Bosse’s manuscript, the formulation of Desargues’s theorem is separated from his description of Desargues’s work in perspective; the diagram and proof are both highly abstract.

Desargues’s work seems to have been lost or neglected in the period that follows, possibly because the algebraic approach to geometry put forward by his contemporary, René Descartes, proved more versatile. A century later, for example, the artist Canaletto (whom we will return to in section “Where Was the Camera?”) was creating his paintings with the camera obscura rather than with geometry. Across the channel, the English mathematician Brook Taylor (of Taylor’s series fame) would publish his highly celebrated “New Principles of Linear Perspective: or the Art of Designing on a Plane the Representations of all sorts of Objects, in a more General and Simple Method than has been done before” Taylor (1719). But in


Fig. 1 A perspective interpretation of Desargues’s theorem

spite of the promise of the first word of this title, the book contained very little that was “new”; it relied almost exclusively on Euclidean geometry (moreover, it was often described as far from “simple” to read). Two centuries after Desargues introduced projective geometry, another French engineer and mathematician – Jean-Victor Poncelet (1788–1867) – resurrected it. Famously, Poncelet wrote much of what would become his “Traité des propriétés projectives des figures” during a two-year imprisonment; he had been captured during Napoleon’s campaign against the Russian Empire. Poncelet’s geometry was axiomatic and theoretical, and was not explicitly motivated by, nor applied to, perspective art. The centuries that followed have seen projective geometry take a variety of forms. Perhaps farthest from perspective art is the subfield of finite projective geometry, with points and lines abstracted (as in the Fano plane, Fig. 2). But fittingly, given the coincident geometric contributions of Desargues and Descartes, it is in the realm of analytical projective geometry where we see recent, exciting applications to perspective images, as well as to reconstructing the objects that make those images. In the sections that follow, we build from perspective applications of “traditional” (ruler and compass) projective geometry toward these analytical applications.


Fig. 2 The Fano plane contains seven points, each incident with three “lines”, and seven “lines”, each incident with three points
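The incidence counts stated for the Fano plane can be checked mechanically. The following is a minimal sketch, assuming one common labelling of the plane (points 1 through 7, with each "line" written as a set of three points); the labelling and the names FANO_LINES and check_fano are illustrative choices and are not taken from the figure.

```python
from itertools import combinations

# Hypothetical labelling: points are 1..7, each "line" is a set of three points.
FANO_LINES = [
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7},
    {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
]

def check_fano(lines):
    """Return True if the incidence structure has the properties of the Fano plane."""
    points = set().union(*lines)
    # Seven points and seven lines.
    if len(points) != 7 or len(lines) != 7:
        return False
    # Each line is incident with three points.
    if any(len(line) != 3 for line in lines):
        return False
    # Each point is incident with three lines.
    if any(sum(p in line for line in lines) != 3 for p in points):
        return False
    # Any two distinct lines meet in exactly one point (there are no parallels).
    return all(len(a & b) == 1 for a, b in combinations(lines, 2))

print(check_fano(FANO_LINES))
```

The last condition is the finite counterpart of the projective property that any two coplanar lines intersect.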

A New Mathematical Object: The Point of Projective Geometry

Traditional perspective art assumes that there is an artist looking with one eye through a window or canvas at the world. We call the location of the viewer’s eye the center of the projection and denote it by the point O; we’ll denote the picture plane by the Greek letter ρ, and the image of a real-world point X on the canvas ρ we’ll denote by the symbol X′. There are other physical setups that give us similar projections on planes. For example, a camera might have a lens or pin-hole that projects objects in the real world onto a sheet of film or a set of pixels; again, we call the lens the center O of the projection, with the film lying in a plane ρ and the object and its image similarly denoted by X and X′, respectively. Or we might have a light source casting a shadow on the ground; the light source in this case would play the role of the center O; the ground becomes the image plane ρ, and the object and its shadow are X and X′. What all these situations have in common is that the points O, X, and X′ are collinear and that X′ is the intersection of the line through O and X with the plane ρ. (In shorthand mathematical notation, we write X′ = (OX) · ρ.)

This simple notion runs into difficulties, however, if the point X lies in an “awkward” place: if the line OX is parallel to the plane ρ, then the intersection (OX) · ρ is empty (at least in the usual realm of Euclidean geometry). Fortunately for artists, this situation does not seem to arise often; if an artist wanted to draw her feet (which presumably are directly below her eye), she would tilt the picture plane rather than leaving the canvas vertical. A much more frequent artistic conundrum is that sometimes the image X′ appears to exist even though the object X does not: this situation arises in the case of the well-known vanishing point.
The vanishing point where the two railroad tracks appear to meet together on the horizon plays an extremely important role in a perspective picture, even though there is no such point in the real world.
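The projection rule described above, in which the image of X is the point where the line through O and X meets the plane ρ, can be sketched in a few lines. This is a minimal illustration in Euclidean coordinates only; the function name `project` and the representation of ρ by a point and a normal vector are assumptions made for the example, not notation from the chapter.

```python
def project(O, X, plane_point, plane_normal, tol=1e-12):
    """Image of X under projection from center O onto the plane rho.

    rho is given by a point on it and a normal vector (an assumed representation).
    Returns None when the line OX is parallel to rho, the "awkward" case in
    which the Euclidean intersection is empty.
    """
    direction = tuple(x - o for x, o in zip(X, O))
    denom = sum(n * d for n, d in zip(plane_normal, direction))
    if abs(denom) < tol:
        return None
    # Solve n . (O + t * direction - P) = 0 for the parameter t.
    t = sum(n * (p - o) for n, p, o in zip(plane_normal, plane_point, O)) / denom
    return tuple(o + t * d for o, d in zip(O, direction))

O = (0.0, 0.0, 0.0)                               # the eye, lens, or light source
rho = ((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))          # the plane z = 1

print(project(O, (2.0, 4.0, 2.0), *rho))          # object beyond the window
print(project(O, (-1.0, -2.0, -1.0), *rho))       # object behind O, as in a camera
print(project(O, (1.0, 0.0, 0.0), *rho))          # OX parallel to rho
```

The same function covers the window, camera, and shadow setups: only the position of X relative to O and ρ changes, which shows up as the sign and size of the parameter t.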


Ideal Points

To counteract both of the above difficulties with a single solution, mathematicians expanded the notion of Euclidean space to a larger space; if we use analytic properties such as coordinates in this space, we call it “projective space” (P³(ℝ)), or if we use purely geometric properties, we call it “Extended Euclidean space” (E³). This larger space includes not only all the familiar points in ℝ³, but also an additional set of points called ideal points (or sometimes points at infinity). In the spaces P³(ℝ) and E³, we must alter our conception of parallel lines; in particular, lines in ℝ³ that are parallel meet in E³ at an ideal point. We will delve further into P³(ℝ) in section “Homogeneous Coordinates”; until then, this text will only need the geometric properties of E³.

Ideal points are created by what we call a formal definition, meaning that the definition itself “forms” the object. This kind of definition is different than one that merely identifies an existing object: we could define √2 to be “the positive real number x with the property that x² = 2.” The definition of √2 is not a formal definition, because such a number already exists in ℝ. But the definition of ideal points creates something new, in the same way that defining the imaginary number i to be “a number z with the property that z² = −1” creates something that does not exist in ℝ, leading to the formation of the complex plane ℂ.

In the same way, the space E³ is larger than and has different properties from ℝ³. In particular, in E³, every line and plane intersect in a point (unless the line is a subset of the plane, in which case their intersection is a line). This means that if the center O does not lie in the image plane ρ and if O ≠ X, the image point X′ = (OX) · ρ is always well defined. Similarly, two lines in E³ are coplanar if and only if they intersect in exactly one point.
In this sense, as we noted above, “parallel” lines are coplanar and intersect in an ideal point. Artistically speaking, the existence of ideal points as the intersection of parallel lines allows us to say that if X′ is a vanishing point in our picture, then the object X that it portrays exists and is a point “at infinity.” Because vanishing points play such a crucial role in understanding perspective pictures, it is worth looking at these objects more carefully.

Vanishing Points

In the same way we say someone is a “parent” when that person is the parent of some child or group of children, a “vanishing point” is always a vanishing point of some line or collection of lines. An examination of Fig. 3 shows that the line ℓ appears to vanish when the artist at point O is looking parallel to ℓ; it follows that a point V ∈ ρ is the vanishing point for the line ℓ if and only if OV ∥ ℓ.


Fig. 3 Points on the line ℓ project to the plane ρ from the center O. Points A and B project to A′ and B′ like a camera with O like a lens; points C and D project to C′ and D′ like shadows with O like a light source; points E and F project to E′ and F′ like drawing on a window, with O like the artist’s eye. The line OV is parallel to ℓ; we say V is the vanishing point of ℓ

As we noted above, the vanishing point V is the image of the ideal point (the point at infinity) on ℓ. It follows that if several lines ℓ₁, ℓ₂, ℓ₃, . . . are parallel to one another, then the line OV is parallel to all of them if and only if OV is parallel to any one of them, so the lines ℓ₁, ℓ₂, ℓ₃, etc. all have the same vanishing point. If the lines ℓ₁, ℓ₂, ℓ₃, etc. are parallel to one another but not parallel to the picture plane, it follows that V is a real (rather than ideal) point, so their images ℓ₁′, ℓ₂′, ℓ₃′, etc. are not parallel but rather all intersect at that point V (giving us, e.g., the drawing of the railroad tracks that converge at a point in the horizon). If the lines are parallel to one another and also parallel to the picture plane, then OV is likewise parallel to the picture plane, implying V is an ideal point, and so the images ℓ₁′, ℓ₂′, ℓ₃′, etc. will all be parallel to each other (as well as to the original lines).

Note that this definition of vanishing point implies something significant about interpreting a piece of art. If we know something about a set of lines (say, we can infer that the lines in the road were running perpendicularly to the canvas), and we can locate the vanishing points of those lines on the canvas, then this means we know something about the location O of the artist, and this location is something we explore further in the next section.
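The claim that parallel lines share a vanishing point can be tried out numerically. The sketch below assumes a convenient coordinate setup that is not part of the chapter: the center O sits at the origin and the picture plane is z = f, so that the image of (x, y, z) is (f·x/z, f·y/z). The function names are hypothetical.

```python
def image(X, f=1.0):
    """Image on the picture plane z = f of the point X, seen from the origin."""
    x, y, z = X
    return (f * x / z, f * y / z)

def vanishing_point(direction, f=1.0):
    """Point V on the picture plane with OV parallel to the given direction.

    Returns None when the direction is parallel to the picture plane; in that
    case V is an ideal point and the images of the lines stay parallel.
    """
    dx, dy, dz = direction
    if dz == 0:
        return None
    return (f * dx / dz, f * dy / dz)

def far_image(start, direction, t, f=1.0):
    """Image of the point start + t * direction, for a large parameter t."""
    return image(tuple(s + t * d for s, d in zip(start, direction)), f)

direction = (1.0, 0.0, 2.0)                        # shared direction of two "rails"
rails = [(-1.0, -1.0, 3.0), (2.0, -1.0, 3.0)]      # a starting point on each rail

print(vanishing_point(direction))
for start in rails:
    print(far_image(start, direction, 1e6))        # both approach the same V
print(vanishing_point((1.0, 1.0, 0.0)))            # parallel to the canvas
```

Note that the vanishing point depends only on the direction of the lines, not on where they start, which is the computational content of the statement that OV is parallel to all of them if it is parallel to one.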


Where Was the Camera?

In the previous section, we claimed that the location of vanishing points helps us determine the location of the artist or camera that made the picture. In this section, we explore the implications of this claim. Determining the location of an artist or of a camera is the source of a good amount of mathematical inquiry (see, e.g., Byers and Henle 2004, Crannell 2006, Futamura and Lehr 2017, Robin 1978, and Tripp 1987). Moreover, the methods for solving this question lead to multiple applications, as we will see in the sections that follow (see also Chap. 26, “Geometries of Light and Shadows, from Piero della Francesca to James Turrell”). Here we give one simple example of using geometry to locate the original position of an artist: a standard, back-of-the-envelope calculation that uses two vanishing points to determine the original viewing location.

Figure 4 shows a painting from circa 1730 of the Clock Tower in the Piazza San Marco, a noted tourist attraction. The artist, Giovanni Antonio Canal (better known as “Canaletto”), was noted for his realistic cityscapes; he often used a camera obscura to project images onto canvas where he would capture them in paint. As such, his works give us excellent examples of perspective projections. In Fig. 4 we can see that images of vertical lines in the Piazza have vertical images on Canaletto’s canvas. Likewise, the horizontal lines in the front face of the clock tower building also have horizontal images. This tells us that Canaletto’s canvas was

Fig. 4 “The Clock Tower in the Piazza San Marco”, Canaletto (circa 1730)


set up parallel to that face of the building. We can deduce that a third set of lines depicted in the picture run perpendicularly to the canvas. Figure 5 shows that these lines have images that converge at a point V in the second floor of the building, near the main doorway and below the clock in the painting. Because this third set of real-world lines are perpendicular to the canvas, it follows that Canaletto was perpendicularly across from the point V depicted in the picture – in other words, he was not standing on the ground, but was stationed on the second floor of another building.

But we can be even more specific. The picture also contains clues that help us deduce his horizontal distance from the clock tower. On the right side of the plaza is a building with semicircular arches. Around one of these arches, we can draw the image of a rectangle that is twice as long as it is high. We draw the image of the diagonal line through this rectangle, which has vanishing point D directly above V (see Figs. 5 and 6). Because the slope of the real-world line is 1/2, the geometry of similar triangles allows us to deduce that the viewing distance (the distance from Canaletto to the canvas) is twice the length of the segment VD. Assuming the clock tower to be approximately 70 ft tall (based on its height relative to the people in the picture), we get that the height of the clock tower appears to be 35% of the length of VD, so Canaletto was approximately 200 ft from the clock tower.

Fig. 5 The vanishing point V of those lines that are perpendicular to the canvas shows that Canaletto painted this canvas from a second-story location. The vanishing point D of the diagonal line of a vertical rectangle lies directly above V


Fig. 6 A side view showing the location O of the artist and the picture plane ρ. The point V ∈ ρ is both the image of a point C on the second floor of the clock tower and also the vanishing point for lines perpendicular to the picture plane ρ. The line d has vanishing point D ∈ ρ, so OD ∥ d; therefore, the slope of OD is 1/2

In conclusion, a few standard assumptions about Canaletto’s world (buildings were constructed with right angles, the arches were semicircles, and people were approximately the same height they are today) allow us to reconstruct the location of that artist as he painted this picture 300 years ago.
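The two similar-triangle steps in this estimate can be written out as a short calculation. This is a sketch under stated assumptions rather than a reconstruction of the author's measurements: the function names are hypothetical, canvas lengths are measured in units of |VD|, and the clock tower face is taken to be parallel to the canvas. The result is sensitive to how the 35% ratio is read. Measured against |VD| it gives roughly 400 ft, while measured against the viewing distance |OV| = 2|VD| it gives roughly 200 ft, the figure quoted in the text.

```python
def viewing_distance(vd_length, slope):
    """Eye-to-canvas distance |OV|, from |VD| and the slope of the real-world line."""
    # OD is parallel to the real line d, so |VD| / |OV| equals the slope of d.
    return vd_length / slope

def distance_to_scene(real_height, image_height, eye_to_canvas):
    """Distance from the eye to an object lying in a plane parallel to the canvas."""
    # Similar triangles: real_height / distance = image_height / eye_to_canvas.
    return real_height * eye_to_canvas / image_height

VD = 1.0                          # measure canvas lengths in units of |VD|
OV = viewing_distance(VD, 0.5)    # slope 1/2 gives |OV| = 2 |VD|

# Two readings of the 35% ratio (an assumption about how it was measured):
print(round(distance_to_scene(70.0, 0.35 * VD, OV)))   # tower image = 35% of |VD|
print(round(distance_to_scene(70.0, 0.35 * OV, OV)))   # tower image = 35% of |OV|
```

Either way, the structure of the argument is the same: one vanishing point fixes the direction to the viewer, a second fixes the viewing distance, and a known real-world height converts canvas proportions into feet.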

A Consequence of Viewing Distances: Illusion, Distortion, and Anamorphism

Understanding where the artist stood is more than a historical exercise; it also has the power to affect how we view photographs and the apparent distortion within them. Almost every person has had the experience of seeing a breathtaking vista and trying to capture it on camera, only later to lament that the photograph didn’t do justice to the power of the original view. Often, the problem is not with the mechanics of the photograph or the photographer, but with the small size of the image coupled with the too-far distance of the person looking at the photograph. If the photograph were larger, or if its viewer were closer, the sense of awe for the vista might return.

Figure 7 gives an example of why the size of a photograph, a movie screen, or a reproduction of a perspective painting matters. Good perspective artists often place their vanishing points far off the picture because doing so “reduces distortion.” In Fig. 7, we have instead sized the drawing in such a way that the vanishing points are readily apparent on the page (like consolidating a magnificent scenic view into a photo that is only as wide as a phone or a laptop). Notice that the word “LIFE” appears to be highly distorted. In particular, the bottom corner of the “L” has an


Fig. 7 “LIFE” in two-point perspective, with vanishing points indicated on the horizon. The near bottom corner of the “L” has an angle of 48°, but if you look at the picture with one eye from very close to the × on the horizon line, the angle appears to be “correct”; that is, it appears to be 90°

angle in the drawing of 48°, even though this vertex is supposed to represent a right-angled corner. We could have made the corner appear more like a right angle by placing the vanishing points further apart. But surprisingly, we can also make the corner appear more like a right angle by moving ourselves closer to the drawing. If a viewer moves uncomfortably close to this picture (in particular, if a person looks with one eye from a location very close to the × on the horizon), the angles in the word appear to be correct, 90° angles.

Figure 8 explains why moving our eye close to the image helps the picture appear more realistic. The viewer at O1 is far from the image – just as most readers of this chapter will view “LIFE” in Fig. 7 from a comfortable distance. The lines of sight to the two vanishing points for the viewer at O1 form an acute angle θ. Recall that when an artist draws a scene through a window, the vanishing points in the picture plane will lie on those lines of sight that are parallel to the lines she is drawing in the “real world.” Therefore, for the viewer at O1, the drawing appears to depict an object that is likewise formed by the acute angle θ. On the other hand, the viewer at O2 is closer to the picture plane, at a place where the lines of sight from O2 to the vanishing points are perpendicular. Therefore,


Fig. 8 A top view showing two viewers looking at the picture plane ρ. The lines of sight from the viewers to the two vanishing points are parallel to the lines of the objects they appear to see in the “real world”; hence the viewer at O1 sees a diamond-shaped object, while the viewer at O2 sees a rectangle

for this viewer, the drawing appears to depict an object in the real world formed by lines that are likewise perpendicular to one another. In other words, if the drawing is supposed to depict an object with right angles, the closer viewer sees an “undistorted” picture, whereas the further viewer sees a distorted image. The reason our photographs don’t capture what we remember seeing is not because the camera messed up; it is because we view the small photographs from too far away. Enlarging the photos or moving closer to the photos will restore the illusion of depth.
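The dependence of the apparent angle on the viewing distance can be made quantitative. The sketch below assumes a simplified top view that the chapter does not spell out: the viewer stands at distance d directly in front of a point × on the horizon line that lies between the two vanishing points, at distance a from one and b from the other. The function names are hypothetical. Under these assumptions the lines of sight are perpendicular exactly when d² = a·b.

```python
import math

def apparent_angle(a, b, d):
    """Angle in degrees between the lines of sight to the two vanishing points.

    The viewer is at distance d in front of a horizon point lying at distance a
    from one vanishing point and b from the other (assumed top-view setup).
    """
    return math.degrees(math.atan2(a, d) + math.atan2(b, d))

def right_angle_distance(a, b):
    """Viewing distance at which the lines of sight are perpendicular."""
    # tan(alpha) * tan(beta) = (a / d) * (b / d) = 1 exactly when d * d = a * b.
    return math.sqrt(a * b)

a, b = 4.0, 9.0                      # hypothetical offsets on the page, in inches
d = right_angle_distance(a, b)       # 6 inches for these values

print(apparent_angle(a, b, d))       # close viewer: a right angle
print(apparent_angle(a, b, 30.0))    # distant viewer: a much sharper angle
```

Placing the vanishing points further apart increases a·b and therefore pushes the "correct" viewing distance further from the page, which is consistent with the artists' practice of putting vanishing points far off the picture.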

Dolly Zoom

Cinematographers make effective use of altered viewing distances to create a distinctive mood. Figure 9 shows one of the most effective and common of these techniques: a movie camera technique called the dolly zoom. (The dolly zoom has many other names, including the Hitchcock zoom, because it first appeared in that director’s film Vertigo, where it was pioneered by cameraman Irmin Roberts.) In this zoom, the camera is placed along a track and pulled backward while simultaneously zooming in on the figure in the foreground. The effect of this technique appears in Fig. 10. When the camera is close, even though the house and tree are large objects, they are far from the camera, so the nearby person seems relatively large compared to the background objects. But as the camera zooms in on the person’s face while simultaneously drawing backward,


Fig. 9 Side views showing the camera up close, and then drawn back while zoomed in. Note that the image of the house in the distant background grows larger relative to the image of the person

Fig. 10 In the first figure, the camera is close, so the nearby person seems relatively large compared to the background objects; in the second figure, the camera zooms in on the person’s face while simultaneously drawing back, so that the background objects seem to swell ominously

the background objects seem to swell in size. The effect is to make the world appear to loom large, giving that short scene a disturbingly ominous feeling. If the camera pulls back slowly (as in a diner scene in Goodfellas), the psychological effect is one of creeping unease. The audience is aware of something being not quite right, but can’t quite place the source of trouble. Often, however, the camera pulls back quickly: in The Lord of the Rings: The Fellowship of the Ring, as Frodo stands on a road, the accompanying dolly zooms last a fraction of a second, evoking a feeling of terror. It’s no surprise, then, that Michael Jackson’s Thriller video ends with a similar, speedy zoom! These sudden zooms are technically difficult and costly, but clearly they are worth the expense and effort to the directors


of these films. See Boing Boing (2015), for example, for a video clip purporting to be “23 of the best dolly zooms in cinematic history.” Of course, the effect can be reversed (even with a virtual camera); at the end of Fiona’s battle with Robin Hood’s men in the animated movie Shrek, there is a split-second reverse dolly zoom, giving the sudden impression that the battle is over and all is right with the world.
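The swelling of the background in a dolly zoom can be illustrated with a few lines of arithmetic. This is a rough sketch under an idealized pinhole-camera assumption, in which an object of height h at distance D from the camera has image height f·h/D for focal length f; the function names and all numbers are hypothetical.

```python
def image_size(focal_length, object_size, distance):
    """Image height of an object under an idealized pinhole model."""
    return focal_length * object_size / distance


def dolly_zoom_focal_length(focal_length, subject_distance, new_subject_distance):
    """Focal length that keeps the subject's image size unchanged
    after the camera moves from subject_distance to new_subject_distance."""
    return focal_length * new_subject_distance / subject_distance


if __name__ == "__main__":
    person, house = 1.8, 8.0      # heights (hypothetical units)
    gap = 30.0                    # distance from the person to the house
    f0, d0, d1 = 1.0, 2.0, 6.0    # start close, then pull the camera back
    f1 = dolly_zoom_focal_length(f0, d0, d1)
    # The person's image is held fixed while the house's image grows.
    print("person:", image_size(f0, person, d0), image_size(f1, person, d1))
    print("house: ", image_size(f0, house, d0 + gap), image_size(f1, house, d1 + gap))
```

Under these assumptions the foreground figure keeps its size on screen while the distant house grows, which is the looming effect described above; reversing the direction of travel reverses the effect.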

Anamorphic Art

The word “LIFE” in Fig. 7 looks moderately distorted because of the unusually close viewing distance, but the word is still recognizable because the viewing target (at the “×”) is centered on the horizon. That is, if we hold this picture in front of us, we’ll be centered on the viewing target; the distortion comes solely from the distance between our eye and that target. Perspective techniques allow artists to create even more significant illusions by locating the viewing target close to an edge of the canvas – or even off the edge of the canvas. One of the most famous examples of this technique, called anamorphism, appears in a 1533 painting by the German-born artist Hans Holbein the Younger. The Ambassadors (Fig. 11) appears to show a wealthy landowner and a bishop surrounded by objects both secular and religious. Toward the bottom of the painting is an odd gray-and-black smear; this smear is in fact meant to be viewed from the extreme right edge of the painting. A viewer standing at this extreme angle would not be able to see the men and their possessions clearly, but would readily be able to see a skull hidden in plain view within the painting (Fig. 12). Anamorphic art is hardly confined to the sixteenth century; it abounds today in curated museum shows, in public spheres (e.g., in the New York subway system), and in art-gone-viral (just perform an Internet search for the sidewalk chalk art of Julian Beever (Beever 2019)). See Chap. 10, “Anamorphosis: Between Perspective and Catoptrics,” for a fuller treatment of the topic. Anamorphism has its practical aspects, too: turn arrows painted on roadways look highly distorted when seen from directly above but appear correct to the drivers approaching along the road. There are parking garages that paint anamorphic exit signs, which make sense to the drivers needing to leave the building but appear to be a jumble otherwise.

Impossible Figures

The above examples show how perspective art can “hide” or distort the image of a real-world, three-dimensional object within a two-dimensional canvas. But perspective art can also make unreal objects appear to exist. One famous example of such an object is the Penrose triangle (Fig. 13), popularized in the 1950s by the father-son team of Lionel and Sir Roger Penrose, a psychologist and a mathematician, respectively.


Fig. 11 The Ambassadors by Hans Holbein the Younger (Holbein, 1533)

This triangle is one of the simplest and most iconic examples of what we call “impossible figures.” Locally, at each corner of the object, this appears to be the image of a solid three-dimensional object made of flat surfaces with linear edges. But the object as a whole contradicts the local analysis. For example, as we travel around the object counter-clockwise, each subsequent corner appears to be closer to the viewer than the previous one – an impossibility in a closed loop! Many artists include “impossible figures” in their work, including Swedish artist Oscar Reutersvärd – who is credited with the 1930s discovery of the triangle that would later bear the Penrose name – and M.C. Escher, whose Waterfall, Ascending and Descending, and Belvedere (among many others) have captivated and perplexed generations of curious viewers. Because these figures cannot exist as solid objects, it’s especially interesting that artists have created three-dimensional statues depicting impossible figures – see, for example, Fig. 14.


Fig. 12 Viewed from the extreme right and close to the canvas, the smear on The Ambassadors appears to be a skull

Fig. 13 A Penrose triangle is an “impossible figure”

These statues, even more than their two-dimensional counterparts, require a strict alignment with a particular viewing position for the illusion to be effective. The observation that the same object (such as the Penrose triangle sculpture in Fig. 14) can have very different appearances when viewed from two different locations is one of the reasons that the reconstruction of three-dimensional objects from two-dimensional photographs is so challenging. This challenge is the focus of the next section.

Going Backward from Pictures to 3D

In the centuries that saw Desargues, Canaletto, and Poncelet, the task of drawing accurate images and maps was a significant technological challenge. But in today’s


Fig. 14 Two views of a Penrose Triangle sculpture at the Deutsches Technikmuseum, Berlin, February 2008 (Deutsches-Technikmuseum, 2008)

Fig. 15 A reconstruction of the Colosseum from photographs uploaded to Flickr (Agarwal et al., 2011)

world – where cameras are built into cell phones – accurate images surround us. The ubiquity of digital images has allowed us to attempt a new technological challenge of our day: to recreate a three-dimensional world from a collection of photographs. See, for example, Fig. 15, the lead figure from a highly cited paper entitled “Building Rome in a Day” by Agarwal et al. (2011). In this rendering of the Colosseum, each triangle in the picture marks the location of one of the more than 2000 cameras whose photos had been uploaded to Flickr. The authors describe their work in this way:

“Entering the search term ‘Rome’ on Flickr returns more than two million photographs. This collection represents an increasingly complete photographic record of the city, capturing every popular site, facade, interior, fountain, sculpture, painting, cafe, and so forth. It also offers us an unprecedented opportunity to richly capture, explore and study the three dimensional shape of the city.”

We are all familiar with computer games that allow us to move through a virtual 3D world, and also with online sites (such as Google Maps) that allow us to virtually “move” through city streets while seated at our computers. These newly familiar experiences rely on already knowing the structure of space. Virtual gaming worlds


have a three-dimensional structure already encoded into the software; Google Maps takes images from satellites or from roving, calibrated cameras with GPS coordinates encoded into the image. What makes the work of Agarwal et al. a geometrical challenge is the almost complete absence of a priori geographic or spatial information. Piecing the world back together from a collection of random photographs is like fitting together a jigsaw puzzle with a million pieces, some of which are missing and many of which are redundant. (Almost no one takes pictures of the dumpster behind the grocery store; millions of people take photographs of a famous statue.) Reconstructing three-dimensional objects from a collection of photographs requires going through these three steps:

1. identifying feature points or lines that match across images;
2. doing a reconstruction from pairs or possibly triplets of images; and
3. piecing together and refining these many reconstructions using optimization.

The first step requires careful use of a cluster of computers, one of which is designated as the “master node” that distributes images to individual computers (nodes) in a balanced manner. The nodes each toil away at pre-processing images by verifying they are readable and extracting available camera information (if any is attached). The process of matching images is not entirely random; in the same way most people begin solving a jigsaw puzzle by looking for edge pieces, the matching algorithm uses a library of SIFT (Scale Invariant Feature Transform) features. Likewise, the third (final) step makes intensive use of computational algorithms that are outside the scope of this chapter. Step two is where projective geometry comes in; this step requires “undoing” the kind of perspective map that Desargues and Canaletto mastered long ago. This step is the basis for the field of multiple view geometry, an increasingly fertile area of research for theoretical and applied mathematicians alike.
Indeed, the author was introduced to this subject at an energetic week-long gathering of university professors and Google engineers at a conference on Algebraic Vision hosted by the American Institute of Mathematics in summer 2016. To describe the locations of real-world points and their photographic images in a way that is amenable to computer algorithms, we will need to understand homogeneous coordinates for space; that is the subject of the next section.

Homogeneous Coordinates

To motivate the use of homogeneous coordinates (as contrasted with Cartesian coordinates), we return to the notion of an observer positioned at the origin $\mathbf{0} = (0, 0, 0) \in \mathbb{R}^3$, gazing at the world through a picture plane. To this observer, every point along a given line of sight will map to the same point on the picture plane. In


particular, points $(x, y, z)$ and $(\lambda x, \lambda y, \lambda z)$ have the same image whenever $\lambda \neq 0$. If the picture plane is $z = 1$, for example, then both of these points map to $\left(\frac{x}{z}, \frac{y}{z}, 1\right)$. In this vein, we form $P^2(\mathbb{R})$, the projective plane, as equivalence classes of points in $\mathbb{R}^3 \setminus \{0\}$. A point in $P^2(\mathbb{R})$ can be written in homogeneous coordinates in the form $[x : y : z]^T$ for $(x, y, z) \in \mathbb{R}^3 \setminus \{0\}$; we say

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \lambda x \\ \lambda y \\ \lambda z \end{bmatrix}
\]

whenever $\lambda \neq 0$. Just as points in $P^2(\mathbb{R})$ correspond to real lines through the origin, lines in $P^2(\mathbb{R})$ correspond to real planes through the origin. Said another way, projective points $[x_1 : y_1 : z_1]^T$, $[x_2 : y_2 : z_2]^T$, and $[x_3 : y_3 : z_3]^T$ are collinear in $P^2(\mathbb{R})$ precisely when real points $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, and $(x_3, y_3, z_3)$ are coplanar in $\mathbb{R}^3$. The projective plane $P^2(\mathbb{R})$ and the Extended Euclidean plane $E^2$ (see section “A New Mathematical Object: The Point of Projective Geometry”) have a natural correspondence. If we think of $E^2$ as the extension of the particular plane $z = 1$, then we can identify the projective point $[a : b : c]^T$ with the ordinary point $\left(\frac{a}{c}, \frac{b}{c}, 1\right)$ whenever $c \neq 0$; projective points of the form $[a : b : 0]^T$ correspond to ideal points in $E^2$. This makes some intuitive sense, as these correspond to the observer’s lines of sight that are parallel to the picture plane, and so “intersect” the plane $z = 1$ “at infinity”. We define $P^3(\mathbb{R})$ analogously: projective points take the form $[x : y : z : w]^T = [\lambda x : \lambda y : \lambda z : \lambda w]^T \in P^3(\mathbb{R})$ for $(x, y, z, w) \in \mathbb{R}^4 \setminus \{0\}$ and $\lambda \neq 0$. As before, we can find a natural correspondence between $P^3(\mathbb{R})$ and $E^3$ (say, via the identification using $w = 1$). These homogeneous coordinates underlie much of the field of analytical projective geometry. To understand how the use of homogeneous coordinates helps us understand camera projections, consider the case of an observer standing at $[0 : 0 : 0 : 1]^T$, looking through a planar window located at $z = d$, $w \neq 0$, which we think of as an embedding of $P^2(\mathbb{R}) \subset P^3(\mathbb{R})$. To such an observer, the point $[x : y : z : w]^T$ would have an image on the window located at

\[
\left[ \frac{dx}{z} : \frac{dy}{z} : d \right]^T = [dx : dy : z]^T.
\]

That is, we can compute the transformation $P^3(\mathbb{R}) \to P^2(\mathbb{R})$ above via the matrix multiplication


Fig. 16 Using a spreadsheet to draw the perspective image of a cube with viewing distance 4 and viewing target (2, 7)

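\[
P \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}
= \begin{pmatrix} d & 0 & 0 & 0 \\ 0 & d & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}
= \begin{bmatrix} dx \\ dy \\ z \end{bmatrix}.
\]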

The computation above shows why algebraic geometers define a camera to be a 3 × 4 matrix. Moving the viewer, shifting the film, rotating the image plane, or using a camera with non-square pixels has the effect of changing the entries of the camera matrix P. (See Hartley and Zisserman (2003, Chapter 6) for a fuller description.) Figure 16 demonstrates putting this into practice in a rather simple spreadsheet. In this sheet, we draw the one-point perspective image of a cube; the viewing distance is 4 and the viewing target is (2, 7).
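The same computation is easy to carry out outside a spreadsheet. The sketch below applies the 3 × 4 camera matrix for an observer at [0 : 0 : 0 : 1] and a window at z = d to homogeneous points and then converts the result back to ordinary window coordinates. The function names are illustrative assumptions, the viewing target is left at the window's origin (it is not shifted to (2, 7) as in Fig. 16), and the sample cube is hypothetical.

```python
from fractions import Fraction


def camera_matrix(d):
    """3 x 4 camera matrix for viewing distance d (window at z = d)."""
    return [[d, 0, 0, 0],
            [0, d, 0, 0],
            [0, 0, 1, 0]]


def project(P, X):
    """Apply camera matrix P to a homogeneous point X = [x, y, z, w]."""
    return [sum(p * x for p, x in zip(row, X)) for row in P]


def to_window(image):
    """Convert [a : b : c] with c != 0 to the ordinary point (a/c, b/c)."""
    a, b, c = image
    if c == 0:
        raise ValueError("ideal point: the line of sight is parallel to the window")
    return (Fraction(a, c), Fraction(b, c))


if __name__ == "__main__":
    P = camera_matrix(4)
    # Vertices of a hypothetical cube sitting behind the window.
    cube = [[x, y, z, 1] for x in (-1, 1) for y in (-1, 1) for z in (6, 8)]
    for vertex in cube:
        print(vertex, "->", to_window(project(P, vertex)))
```

Because the coordinates are homogeneous, rescaling the input (for example, replacing [1, 1, 8, 1] by [2, 2, 16, 2]) leaves the point on the window unchanged.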

Multiple View Geometry

How do we recover information about a three-dimensional world from two-dimensional images? Suppose we have two images of the same real-world object. Usually, one of the first steps in reconstruction of the 3D scene is to determine what is called the fundamental mapping taking points in the first image α to a certain set of lines in the second image β. The description below explains how and why this mapping emerges.


We say points $x_\alpha \in \alpha$ and $x_\beta \in \beta$ are corresponding points if they are images via the appropriate maps of a common point $X \in P^3(\mathbb{R})$. That is, $X$ projects onto $x_\alpha \in \alpha$ from the point $O_\alpha$, and $X$ projects onto $x_\beta \in \beta$ from the point $O_\beta$. Then the five points $X$, $x_\alpha$, $O_\alpha$, $x_\beta$, and $O_\beta$ are necessarily coplanar. Note that the line $(O_\alpha O_\beta)$, called the baseline, lies in every such plane constructed from corresponding points. Of particular interest along this line are the epipoles $e_\alpha = \alpha \cdot (O_\alpha O_\beta)$ and $e_\beta = \beta \cdot (O_\alpha O_\beta)$. We can think of $e_\alpha$ as the image in α of the camera at $O_\beta$, and $e_\beta$ as the image in β of the camera at $O_\alpha$. The point $x_\alpha$ might correspond to several different points in the plane β. For example, the camera at $O_\alpha$ might appear to show a tree growing out of a person’s head: the point $x_\alpha$ could come from both the person’s hat and the trunk of the tree. The images of the hat and trunk in another photograph β might not coincide with each other, but because of the coplanar relationship described in the previous paragraph and illustrated in Fig. 17, they must be collinear with the epipole $e_\beta$. Accordingly, a pair of photographs of the same scene, taken from two different camera locations, describes a function from points $x_\alpha$ in α to lines $(e_\beta x_\beta)$ in β. This function is called the fundamental mapping. Because $x_\alpha$ and $x_\beta$ can be thought of as points in $P^2(\mathbb{R})$, we can represent the fundamental mapping with a 3 × 3 matrix $F$, called the fundamental matrix. In general, we can determine $F$ from 7 pairs of corresponding points in general position (the matrix is defined only up to scale and has rank 2, and therefore has 7 degrees of freedom). For each of these corresponding pairs of points, the mapping satisfies $F x_\alpha = (e_\beta x_\beta)$; that is to say,


Fig. 17 The point $X$ and its images $x_\alpha$ and $x_\beta$ lie in a plane with the line containing the centers ($O_\alpha$ and $O_\beta$) and the epipoles ($e_\alpha$ and $e_\beta$)


$x_\beta^T F x_\alpha = 0$. In the previous section, we described a camera as a 3 × 4 matrix. If we have two images α and β, then the fundamental matrix allows us to describe a relationship between the two cameras $P_\alpha$ and $P_\beta$ which created the two images. Why is this? For any point $X \in P^3(\mathbb{R})$, we have

\[
(X^T P_\beta^T) F (P_\alpha X) = x_\beta^T F x_\alpha = x_\beta^T (e_\beta x_\beta) = 0.
\]

Because this quadratic form vanishes for every $X$, it follows that $P_\beta^T F P_\alpha$ is a skew-symmetric matrix. This fact is a foot-in-the-door for developing reconstruction algorithms. How, then, do we use the fundamental matrix to reconstruct the real-world scene? The answer is not simple, as the figure of the Ames room below shows.
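Both identities can be checked on a small example. The sketch below assumes a convenient pair of cameras, $P_\alpha = [I \mid 0]$ and $P_\beta = [R \mid t]$, for which the standard formula $F = [t]_\times R$ gives a fundamental matrix (see Hartley and Zisserman 2003); the particular rotation R, translation t, helper names, and test points are all illustrative assumptions chosen so that the arithmetic stays in exact integers.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_vec(M, v):
    """Product of a matrix with a column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def skew(t):
    """Matrix [t]_x with skew(t) v equal to the cross product t x v."""
    return [[0, -t[2], t[1]],
            [t[2], 0, -t[0]],
            [-t[1], t[0], 0]]


# Hypothetical cameras: alpha at the origin, beta rotated and translated.
R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # quarter turn about the z-axis
t = [1, 2, 3]
P_ALPHA = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
P_BETA = [R[i] + [t[i]] for i in range(3)]
F = mat_mul(skew(t), R)


def epipolar_residual(X):
    """Value of x_beta^T F x_alpha for the two images of the point X."""
    x_alpha = mat_vec(P_ALPHA, X)
    x_beta = mat_vec(P_BETA, X)
    return sum(b * l for b, l in zip(x_beta, mat_vec(F, x_alpha)))


def is_skew_symmetric(M):
    return all(M[i][j] == -M[j][i]
               for i in range(len(M)) for j in range(len(M)))


if __name__ == "__main__":
    print([epipolar_residual(X) for X in ([1, 2, 5, 1], [-3, 4, 7, 2])])
    print(is_skew_symmetric(mat_mul(transpose(P_BETA), mat_mul(F, P_ALPHA))))
```

In practice the direction is reversed: F is estimated from matched image points, and the cameras are then recovered from it, which is where the ambiguity discussed next enters.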

The Ames Room

The Ames room, designed by perceptual psychologist Adelbert Ames, Jr., is an illusion room. Viewers who peer into the room from a peephole in the wall seem to see objects that grow and shrink as the objects move from one side to the other. The illusion works because from the correct vantage point, the room appears to be a “normal,” rectangular room. But in fact, the walls, ceiling, and floors are trapezoids, with the short edges close to the vantage point and the long edges far from the vantage point. The illusion that the room is rectangular, and not trapezoidal, can be hard to overcome, even when viewers have been inside the room or see people they know walking through it, appearing to shrink or grow as they walk (Fig. 18). Said another way, the Ames room is projectively equivalent to a normal room; there is a collineation $P^3(\mathbb{R}) \to P^3(\mathbb{R})$ (a function that takes points to points and lines to lines) that maps the Ames room onto a normal, rectangular room. For this reason, the methods described above can determine the relationship between two cameras – and thereby the reconstruction of the three-dimensional scene – only up to projective equivalence. The fundamental mapping by itself can help us distinguish between an Ames room and an A-frame house, but it can’t tell an Ames room from a regular rectangular room. We can’t extract distance or angle measurements of real-world objects without a priori information about the scene or the cameras.
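The projective ambiguity can be made concrete with a small computation. In the sketch below, a collineation is represented by an invertible 4 × 4 matrix H; replacing every scene point X by HX and every camera P by PH⁻¹ leaves all photographs unchanged, since (PH⁻¹)(HX) = PX. The matrix H, the two cameras, and the box-shaped room are hypothetical choices made so that the arithmetic is exact; they are not a model of any actual Ames room.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_vec(M, v):
    """Product of a matrix with a column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# A hypothetical collineation of P^3 that is not an affine map: it sends
# [x : y : z : w] to [x : y : z : z + w], so depth changes the scale.
H = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 1]]
H_INV = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -1, 1]]

# Two hypothetical cameras looking at the same scene.
P_ALPHA = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
P_BETA = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]

# Corners of a rectangular room, in homogeneous coordinates.
ROOM = [[x, y, z, 1] for x in (-1, 1) for y in (-1, 1) for z in (1, 3)]


def photograph(camera, points):
    """Homogeneous image coordinates of each point under the camera."""
    return [mat_vec(camera, X) for X in points]


# The distorted scene and the correspondingly adjusted cameras.
DISTORTED_ROOM = [mat_vec(H, X) for X in ROOM]
DISTORTED_ALPHA = mat_mul(P_ALPHA, H_INV)
DISTORTED_BETA = mat_mul(P_BETA, H_INV)

if __name__ == "__main__":
    print(photograph(P_ALPHA, ROOM) == photograph(DISTORTED_ALPHA, DISTORTED_ROOM))
    print(photograph(P_BETA, ROOM) == photograph(DISTORTED_BETA, DISTORTED_ROOM))
```

Dividing each distorted corner by its last coordinate shows that the near and far walls of the box are scaled by different factors, so the rectangular room becomes a room with trapezoidal walls, yet both pairs of photographs agree point for point.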

Reconstructing Objects from Images

Knowing real-world information vastly increases the ease with which we can reconstruct objects from images. A “calibrated camera” makes the reconstruction process much simpler. For instance, many modern digital cameras come available


Fig. 18 Ames room: “Room constructed to make a person appear large or small depending on perspective, in the city of Rio de Janeiro, Brazil.” (Courtesy of Andrevruas) (Andrevruas, 2011)

with GPS information encoded into the image. For even more accuracy, many 3D scanners use a known camera that is a fixed distance from a turntable rotating at known angles. Knowing the focal length of the camera allows us to account for phenomena such as the dolly zoom; knowing the viewing target allows us to account for anamorphic effects (see section “A Consequence of Viewing Distances: Illusion, Distortion, and Anamorphism”). Information about the scene itself is useful as well. Note that in analyzing Canaletto’s painting in section “Where Was the Camera?,” we used standard observations about real-world parallel lines, and also about real-world perpendicular lines, to gain information about Canaletto’s viewing position. In general (meaning, if the scene is not an Ames-room-like scene), this kind of assumption means that reconstructing scenes with architectural features is simpler than, say, reconstructing landscapes. We can see the importance of knowing such geometric information for understanding drawings like “LIFE” (Fig. 7) or the Penrose sculpture (Fig. 14). In the analysis of Canaletto’s painting, we also used information about proportions (by assuming the arch was a semicircle) and about actual size (e.g., the heights of the people pictured). This kind of detective work is another part of reconstruction; without it, we can’t distinguish between photographs of, for example, a single-family home and a doll’s house.


In practice, the task of reconstruction is further complicated by “noise” and error: points are infinitesimal, but pixels are discrete and finite. So optimization and error analysis also enter into reconstruction algorithms. Nonetheless, at the heart of any reconstruction lies the language of homogeneous coordinates and analytical projective geometry.

Conclusion

The long and storied history of projective geometry weaves itself through the last half-millennium of mathematics; it is a subject that has been discovered and rediscovered by mathematicians searching for answers beyond Euclidean geometry. Its reemergence under Poncelet points to the aesthetic elegance of its axiomatic structure; the subject has also led to deeper understandings of conics (e.g., under the influence of Steiner) and of topology (e.g., under Möbius). But its utility in perspective drawings and photographs is where the subject of projective geometry becomes most applied and touches our lived experiences most directly. With the passing of time, this tool is becoming even more relevant and powerful than when Desargues first introduced it. We live in a world that is increasingly visual, a world in which technology creates, reproduces, and alters images constantly; analytic projective geometry is the machinery that allows us to create, explain, and analyze these digitized images. Beyond the technical aspect of analyzing digital images, constructive projective geometry gives us all a way to see our surroundings and the objects in them: to better understand how to look at paintings or our vacation photographs, to create or to dispel illusions, and to interpret the way we look at our wonderful, three-dimensional world.

Cross-References

Anamorphosis: Between Perspective and Catoptrics
Geometries of Light and Shadows, from Piero della Francesca to James Turrell
Renaissance Architecture

References

Agarwal S, Furukawa Y, Snavely N, Simon I, Curless B, Seitz SM, Szeliski R (2011) Building Rome in a day. Commun ACM 54(10):105–112. With a Technical Perspective by Prof. Carlo Tomasi
Andersen K (2006) The geometry of art: the history of the mathematical theory of perspective from Alberti to Monge. Springer, New York
Andrevruas (2011) Português: Casa construída de forma a fazer a pessoa parecer grande ou pequena dependendo da perspectiva, na cidade do Rio de Janeiro, 24 Jan 2011. https://commons.wikimedia.org/wiki/File:Casaperspectiva.jpg, from Wikimedia Commons
Beever J (2019) Julian Beever’s website. http://www.julianbeever.net/
Boing Boing (2015) Watch 23 of the best dolly zooms in cinematic history, 26 Jan 2015. https://boingboing.net/2015/01/26/watch-23-of-the-best-dolly-zoo.html
Bosse A (1648) Manière universelle de Mr. Desargues, pour pratiquer la perspective par petit-pied, comme le Geometral, Paris
Byers K, Henle J (2004) Where the camera was. Math Mag 77(4):251–259
Canaletto GA (circa 1730) The Clock Tower in the Piazza San Marco. https://commons.wikimedia.org, oil on canvas, 69.22 × 86.36 cm, current location at the Nelson-Atkins Museum of Art
Carroll L (1871) Through the looking-glass. Macmillan & Co, London
Crannell A (2006) Where the camera was, take two. Math Mag 79(4):306–308
Desargues G (1987) Exemple de l’une des manieres universelles du s.g.d.l. touchant la pratique de la perspective sans emploier aucun tiers point, de distance ny d’autre nature, qui soit hors du champ de l’ouvrage. In: The geometrical work of Girard Desargues. Springer, New York, p 1636
Deutsches-Technikmuseum (2008) Penrose triangle sculpture. https://commons.wikimedia.org/w/index.php?curid=3597501, images from Wikimedia Commons
Futamura F, Lehr R (2017) A new perspective on finding the viewpoint. Math Mag 90(4):267–277
Hartley R, Zisserman A (2003) Multiple view geometry in computer vision, 2nd edn. Cambridge University Press, New York
Holbein H (1533) The Ambassadors. https://commons.wikimedia.org, oil on oak, 209.5 cm × 207 cm
Robin AC (1978) Photomeasurement. Math Gaz 62:77–85
Taylor B (1719) New principles of linear perspective: or the art of designing on a plane the representations of all sorts of objects, in a more general and simple method than has been done before, London
Tripp C (1987) Where is the camera? The use of a theorem in projective geometry to find from a photograph the location of a camera. Math Gaz 71:8–14

5 Designing Binary Trees

Vincent J. Matsko

Contents

Introduction
Creating Binary Trees
Mathematical Approach
A First Example: L ∼ LR
A Second Example: LR ∼ RL
A Third Example: LR∞ ∼ (RL)∞
Other Issues
Artistic Considerations
Conclusion
References

Abstract

Binary trees are usually defined so that left and right branches are determined by scaled rotations. However, when arbitrary affine transformations are allowed, a wide variety of trees may be produced. By varying parameters in the transformations, it is possible to produce trees with interesting geometrical properties. This paper explores the inverse problem: if it is desired that a tree possess a specific geometrical property, find out which pairs of left/right branching transformations produce trees with this property.

Keywords

Binary trees · Fractal binary trees · Applications of linear transformations

V. J. Matsko
Independent Scholar, St. Petersburg, FL, USA

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_131


Introduction

When encountering a new geometrical form to use in creating digital art – such as the binary tree, discussed in this chapter – one typically writes a program which allows the use of several different parameters in the creation of the form. By experimenting with the parameters, intriguing images may be stumbled upon. Once one is found whose geometrical structure will make for interesting artwork, color, line width, texture, etc. may be incrementally adjusted until a finished piece emerges. This process may be repeated several times, resulting in a wide variety of images. Each image has its own particular geometry. A mathematician might wonder why a particular image looks the way it does and therefore dive more deeply into the geometry in an attempt to understand it more fully. However, once a particularly appealing behavior is observed, it may be desirable to create additional pieces exhibiting this same behavior. In other words, instead of the behavior being an accident of the creative process, the process is designed so that the particular behavior is evident. Thus, interesting features of a digital work are not left to chance, but instead are guaranteed to be present. As an example, consider the image in Fig. 1. Note the overlapping disks. What is happening is that as the binary tree is generated and nodes are added at increasing depths, sometimes nodes are revisited – and another smaller disk is overlaid on top of an existing larger disk or disks. The interior triangles result from the fact that it takes three iterations to revisit the same node when this behavior occurs. It is possible to generate arbitrarily many trees with this property – the revisiting of nodes after three iterations – and choose those that are aesthetically pleasing. This paper is about specifying a particular property of a binary tree in advance and looking at the mathematics behind designing families of trees with this chosen

Fig. 1 Sierpinski


property. This creates a sense of intentionality, not merely a reliance upon chance in randomly assigning values to parameters.

Creating Binary Trees

The simplest way to define a binary tree is to specify left and right branching operations, L and R. In the classic paper of Mandelbrot and Frame (1999), left and right branching operations were scaled rotations, with the rotations being through the same angle when branching left or right and the scale factor being the same on each side. Taylor (2007) defined left and right branching similarly, and Espigulé (2013) considered a similar symmetry, although allowing more than one branch on each side. O’Hanlon et al. (2004) considered asymmetrical trees, but only so far as to allow the scale factor to be different on each side. Here, L and R may be any affine transformations, allowing for a wide variety of trees to be produced. This greatly increases the dimension of the parameter space for generating binary trees. In addition to specifying the branching transformations, a trunk t of the tree must also be specified. As will be seen later, allowing more general transformations means that the choice of trunk will affect the geometry of the tree. Nodes of the tree are specified by strings of instructions. It is not necessary to provide a formal definition here, as this would involve an abundance of notation. The goal is not to prove theorems, but to illustrate interesting phenomena which arise when studying binary trees. To illustrate, consider the string LLR, shown in Fig. 2. Begin by drawing a trunk t. From the tip of t, add the vector Lt to create a new node. Then successively add LLt and RLLt, as shown. This produces the node

\[
P = t + Lt + LLt + RLLt = (I + L + LL + RLL)t, \tag{1}
\]

where I is the identity transformation; this node is assigned the label LLR. Note that a different font is used for the string of instructions so as not to confuse the string with matrix products such as those seen in (1). Other nodes are similarly produced. For completeness, the usual convention is that the empty string produces the node t. Finally, the cases when L = R, or when one of the transformations is the zero transformation, are not considered. In such cases, the tree is one-dimensional.
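The rule for locating a node from its instruction string translates directly into a short loop. The sketch below treats L and R as 2 × 2 matrices acting on the trunk vector, as in the sum t + Lt + LLt + RLLt; the function names and the sample quarter-turn matrices are illustrative assumptions and are not the transformations used for any figure in this chapter.

```python
def apply(M, v):
    """Apply the 2 x 2 matrix M (a list of two rows) to the vector v."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])


def node(instructions, L, R, t):
    """Position of the node labelled by a string such as "LLR".

    The instructions are read left to right; each one transforms the
    most recent branch, and the new branch is added to the position.
    The empty string gives the tip of the trunk, t.
    """
    position = t
    branch = t
    for step in instructions:
        if step == "L":
            branch = apply(L, branch)
        elif step == "R":
            branch = apply(R, branch)
        else:
            raise ValueError("instructions may contain only 'L' and 'R'")
        position = (position[0] + branch[0], position[1] + branch[1])
    return position


if __name__ == "__main__":
    # Hypothetical branching maps: quarter turns to the left and right.
    L = [[0, -1], [1, 0]]
    R = [[0, 1], [-1, 0]]
    t = (0, 1)
    for label in ("", "L", "LL", "LLR"):
        print(repr(label), node(label, L, R, t))
```

Enumerating all strings up to a chosen length with this function yields the nodes of the tree to that depth, which is one simple way to experiment with different choices of L, R, and t.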

Mathematical Approach

Several examples will be examined at various levels of detail. This is because there appears to be no typical mathematical behavior – there is no “one size fits all” technique for solving matrix equations. The goal will be to illustrate different techniques which give the reader a sense of the mathematics involved.


V. J. Matsko

Fig. 2 Creating a binary tree (the trunk t followed by the branches Lt, LLt, and RLLt, ending at the node P)

Extensive work in solving matrix equations reveals that representing matrices in the form

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

is not particularly useful. One reason is that the geometry of a linear transformation is heavily disguised in this form. But the geometry of the transformations is crucial in determining the shape of a binary tree. A second reason is the fact that the geometry of the transformations does not in itself determine the shape of the binary tree. Look ahead for a moment at Fig. 3, where the fact that a trunk t = (0, 1) is used is evident. But if a trunk t is chosen such that Lt = 0, then the tree actually lies along a straight line.

Such behavior suggests that a more useful way to represent matrices is in terms of their eigenvalues and eigenvectors, which is referred to as eigendecomposition or spectral decomposition. Enumerating the types of matrices this way in two dimensions is not difficult; it is just a matter of looking at the Jordan normal form for a 2 × 2 matrix. Recall that a matrix is said to be defective if it has an eigenvalue with geometric multiplicity less than its algebraic multiplicity. Here, the kernel of A will be denoted by Ker A, and the range of A will be denoted by Rng A. Nonzero matrices can be either singular or nonsingular, and either defective or nondefective, giving the following four possibilities:

1. Invertible and nondefective. In this case, the matrix is diagonalizable and has the form

\[ [v_1 : v_2] \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} [v_1 : v_2]^{-1}, \qquad \lambda_1 \lambda_2 \neq 0, \tag{2} \]

5 Designing Binary Trees


Fig. 3 A binary tree with L ∼ LR

where the eigenvalue λ1 has eigenvector v1 and λ2 has eigenvector v2. The notation “[v1 : v2]” denotes the matrix whose columns are the vectors v1 and v2. In this (and all subsequent cases), v1 and v2 will be assumed to be linearly independent so that the matrix [v1 : v2] is invertible. Note that it is possible that λ1 = λ2; in this case, the resulting transformation is λ1 I. It is also possible that λ1 and λ2 are complex conjugates; these cases will not be considered here as they have been extensively studied in the literature (Espigulé, 2013; Mandelbrot and Frame, 1999; Taylor, 2007).

2. Invertible and defective. In this case, the matrix has the form

\[ [v_1 : v_2] \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix} [v_1 : v_2]^{-1}, \qquad \lambda \neq 0. \tag{3} \]

This is an example of a shear. Most matrices representing shears are not defective; this is a special case.

3. Singular and nondefective. This is simply (2) with λ2 = 0:

\[ [v_1 : v_2] \begin{bmatrix} \lambda & 0 \\ 0 & 0 \end{bmatrix} [v_1 : v_2]^{-1}, \qquad \lambda \neq 0. \tag{4} \]

The reason this case is singled out is that it is often necessary to solve matrix equations that have the form

\[ AB = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = [0], \]

where the notation [0] denotes the matrix of all 0’s. If one of A or B is invertible, then the other must be [0], which produces a one-dimensional tree and is therefore avoided. So in solving matrix equations, it is usually necessary to consider singular cases separately.

4. Singular and defective. In this case, the matrix has the form (3) with λ = 0:

\[ [v_1 : v_2] \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} [v_1 : v_2]^{-1}. \tag{5} \]

This is a particularly interesting case. Such a matrix M satisfies M² = [0], but M ≠ [0]. These matrices may also be classified by the condition that the kernel of the transformation is the same as its range; both are ℝv1. They are significant because they “prune” trees in the sense that if such a transformation is applied twice in a row, the branch ends (since M² = [0]).
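The four forms (2) through (5) can be built numerically from a choice of v1, v2 and a 2 × 2 middle matrix. The following is a minimal sketch under stated assumptions: the helper names (`matmul`, `inverse`, `det`, `from_basis`) and the sample eigenvalues 2 and 3 are hypothetical, and the eigenvectors v1 = (−1, 1), v2 = (0, 1) are chosen only for illustration.

```python
# Hypothetical helpers for 2x2 matrices stored as nested tuples.

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inverse(m):
    """Inverse of an invertible 2x2 matrix."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular")
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))


def from_basis(v1, v2, middle):
    """Return [v1 : v2] * middle * [v1 : v2]^(-1)."""
    p = ((v1[0], v2[0]), (v1[1], v2[1]))   # columns are v1 and v2
    return matmul(matmul(p, middle), inverse(p))


v1, v2 = (-1.0, 1.0), (0.0, 1.0)           # assumed, linearly independent
ZERO = ((0.0, 0.0), (0.0, 0.0))

middles = {
    "invertible, nondefective (2)": ((2.0, 0.0), (0.0, 3.0)),
    "invertible, defective (3)": ((2.0, 1.0), (0.0, 2.0)),
    "singular, nondefective (4)": ((2.0, 0.0), (0.0, 0.0)),
    "singular, defective (5)": ((0.0, 1.0), (0.0, 0.0)),
}
matrices = {name: from_basis(v1, v2, mid) for name, mid in middles.items()}

for name, m in matrices.items():
    print(name, m, "det =", det(m))

M = matrices["singular, defective (5)"]
print(matmul(M, M) == ZERO)   # applying M twice gives the zero matrix
```

Writing each matrix through `from_basis` keeps the eigenvectors and eigenvalues visible as separate inputs, which is the point of the representation advocated above.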

A First Example: L ∼ LR

For the first example, the case L ∼ LR is examined, where this notation means that the instruction L results in the same node as the string of instructions LR. An example of a tree with this property is shown in Fig. 3. This tree does not look like a typical binary tree, as there seems to be little branching going on. To see why, look at the first few iterations of the creation of this tree, as shown in Fig. 4. Here, nodes get slightly smaller with each iteration to emphasize the overlap. The transformation L is the leftmost, going to the upper right from the trunk, while R initially goes down to the right. Note that the node labeled RL may also be reached by RLR, since L ∼ LR. In fact, any set of instructions beginning with RLR will end up at this same node. It is not difficult to show this algebraically. Referring to how (1) was obtained, L ∼ LR may be written as

\[ (I + L)t = (I + L + RL)t. \]

Fig. 4 The first five iterations of the tree shown in Fig. 3 (branch labels: L+, L, R+, RLL, RL ∼ RLR ∼ RLR+, R, RR+)

Note that this equality depends upon the trunk t. If L ∼ LR for any choice of trunk, then L and R satisfy I + L = I + L + RL, giving

\[ RL = [0]. \tag{6} \]

Now consider a set of instructions beginning with RLR. The node reached is

\[ (I + R + LR + RLR + \cdots)t = (I + R + LR)t. \]

This is because the term “RLR” and all terms after include an “RL,” and RL = [0] (see (6)). This calculation illustrates that a geometric property of a binary tree may be verified algebraically. Such verifications are, in general, extremely helpful in studying properties of families of binary trees.

The notations in Fig. 4 also explain various features of the binary tree produced. The branch labeled R+ means that the nodes in this branch are reached by strings of instructions beginning with exactly one R, similarly for RR+. The branch labeled L+ indicates that all strings beginning with L result in nodes on this branch. Note how the nodes obtained by R, RR, RRR, etc. seem to zigzag along the line going down to the right. This is because the choice of R here is

\[ R = \begin{bmatrix} 2 & 3 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} -0.9 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 2 & 3 \\ -1 & 2 \end{bmatrix}^{-1}, \tag{7} \]

and so R has an eigenvalue of −0.9 with corresponding eigenvector (2, −1). The eigenvector describes the direction of the line going down to the right, and the negative eigenvalue causes the zigzag. This illustrates that writing R in the form (7) is geometrically quite relevant.

To produce other trees with similar properties, it is necessary to solve (6). Here and in the other examples, only solutions where L and R are linear will be considered; allowing affine solutions adds an additional layer of complexity to the solutions and will not be considered. This equation looks deceptively simple, but there are subtleties. Clearly, neither L nor R may be invertible, since that would force the other transformation to be [0], which is avoided. So both L and R must be singular. In addition to being singular, one or both of L and R may be defective. This gives four cases to consider – and all must be considered, since the different cases generate trees with different geometrical properties.

CASE 1: Both L and R are nondefective. Since L is singular, begin by selecting u with Lu = 0. Because L is nondefective, λ1 ≠ 0 and v may be found with Lv = λ1 v. Now

\[ 0 = RLv = R(\lambda_1 v) = \lambda_1 Rv. \]

This means that v corresponds to the eigenvalue 0 for R, and since R is nondefective, there is also λ2 ≠ 0 and w with Rw = λ2 w. This gives the following representations for L and R:

\[ L = [v : u] \begin{bmatrix} \lambda_1 & 0 \\ 0 & 0 \end{bmatrix} [v : u]^{-1}, \qquad R = [w : v] \begin{bmatrix} \lambda_2 & 0 \\ 0 & 0 \end{bmatrix} [w : v]^{-1}. \]

The tree in Fig. 3 is an example of this case; the R used was given in (7), and the L is

\[ L = \begin{bmatrix} 3 & -1.2 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} 0.8 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 3 & -1.2 \\ 2 & 2 \end{bmatrix}^{-1}. \]

CASE 2: L is defective and R is nondefective. Because L² = [0] (see the discussion accompanying (5)), it is easy to see that L ∼ LL. The effect is to prune the tree so that there are never two successive left branches, but successive right branches are possible. This results in the tree shown in Fig. 5. Here, the transformation L takes the trunk up to the left, while R takes the trunk to the right. The transformations which generate this tree are given by

\[ L = \begin{bmatrix} -1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 1 & 1 \end{bmatrix}^{-1}, \qquad R = \begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 0.75 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}^{-1}. \]

Fig. 5 Binary tree with L defective and R nondefective

Again, representing the transformations in this form offers geometrical insight into the nature of the tree. The range of L is ℝ(−1, 1); the vector (−1, 1) is evidently the direction of the left branches. The eigenvector corresponding to the nonzero eigenvalue of R is (2, 1), which is the direction of the right branches. As a result, the shape of this tree may be specified by the artist rather than be left to chance. Finally, a direct (but unenlightening) calculation shows that the nodes at the ends of the left branches lie on the same horizontal line and that 3/4 is the only choice of eigenvalue (given the eigenvectors and trunk t = (0, 1)) for which these nodes do lie on a horizontal line.

CASE 3: L is nondefective and R is defective. Since L is singular and nondefective, λ ≠ 0 and v may be found such that Lv = λv, so that the range of L is ℝv. But since RL = [0], then 0 = RLv = R(λv) = λRv, so that the kernel of R is ℝv. But since R is defective, the range of R is also ℝv. Since the range of both L and R is ℝv, all the nodes of the tree must lie on the line t + ℝv.

CASE 4: Both L and R are defective. In this case, determine u and v so that Ker L = Rng L = ℝu and Ker R = Rng R = ℝv. Because RL = [0], it follows that RLw = 0 for all w. This means that R acting on Rng L is always 0, implying that Ker R = Rng L. Thus, u is parallel to v. But note that this also means that Ker L = Rng R, so that LR = [0] as well. Since RL = LR = [0], any string of instructions beginning with L generates the same node as the string L, and any string of instructions beginning with R generates the same node as R – any instruction subsequent to the first L or R effectively prunes the tree. This implies that the tree only has two nodes in addition to the trunk, generated by the strings L and R.

Now the last two cases do not produce trees with an interesting structure and, as such, would not lend themselves to artwork. The point of looking at these four cases is to illustrate how singular defective transformations influence the geometry of binary trees. In most treatments of eigenvalues and eigenvectors, defective transformations are barely mentioned.
However, they play an important role in the study of the geometry of binary trees.
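The claims made for Cases 1 and 2 can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, the matrices are the ones quoted above for Figs. 3 and 5 with trunk t = (0, 1), and equality is tested up to a floating-point tolerance that is an assumption of the example.

```python
# Hypothetical helpers; 2x2 matrices are nested tuples, vectors are pairs.

def matmul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def inverse(m):
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))


def from_basis(v1, v2, middle):
    """Return [v1 : v2] * middle * [v1 : v2]^(-1)."""
    p = ((v1[0], v2[0]), (v1[1], v2[1]))
    return matmul(matmul(p, middle), inverse(p))


def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def node(instructions, L, R, t):
    """Node reached from trunk t, built term by term as in (1)."""
    position = branch = t
    for step in instructions:
        branch = apply(L if step == "L" else R, branch)
        position = (position[0] + branch[0], position[1] + branch[1])
    return position


def close(p, q, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(p, q))


t = (0.0, 1.0)

# CASE 1: the transformations quoted for Fig. 3.
L1 = from_basis((3.0, 2.0), (-1.2, 2.0), ((0.8, 0.0), (0.0, 0.0)))
R1 = from_basis((2.0, -1.0), (3.0, 2.0), ((-0.9, 0.0), (0.0, 0.0)))
RL = matmul(R1, L1)
rl_is_zero = all(abs(x) < 1e-12 for row in RL for x in row)
l_same_as_lr = close(node("L", L1, R1, t), node("LR", L1, R1, t))
rlr_prefix = close(node("RLR", L1, R1, t), node("RLRLLR", L1, R1, t))
print(rl_is_zero, l_same_as_lr, rlr_prefix)

# CASE 2: the transformations quoted for Fig. 5.
L2 = from_basis((-1.0, 1.0), (0.0, 1.0), ((0.0, 1.0), (0.0, 0.0)))
R2 = from_basis((2.0, 1.0), (-1.0, 1.0), ((0.75, 0.0), (0.0, 0.0)))
# Ends of the left branches: some number of R's followed by one L.
heights = [node("R" * k + "L", L2, R2, t)[1] for k in range(6)]
print(heights)   # the y-coordinates agree up to rounding
```

Only the first few left-branch ends are sampled here; the sketch supports the horizontal-line claim for those nodes rather than proving it in general.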

A Second Example: LR ∼ RL

The next example involves instructions LR and RL generating the same node. This has the effect of creating quadrilaterals, as shown in the tree in Fig. 6. Note how going left then right results in the same node as going right then left. Again looking at (1), this gives the matrix equation I + L + RL = I + R + LR, which is equivalent to

\[ (I + R)L = (I + L)R. \tag{8} \]

Fig. 6 Binary tree with LR ∼ RL

Now each of I + R, L, I + L, and R may be singular or invertible, and defective or nondefective. This apparently gives 256 different cases to consider – a daunting number. But several cases are impossible; for example, it is not possible for three of these four to be invertible, while the fourth is singular. Or if exactly two of these four are singular, then they must be one of I + R and L, as well as one of I + L and R. In such situations, it is usually enough to find infinite families of solutions in order to create interesting artwork. This still gives a very broad range of trees which may be produced even if not all solutions to (8) are found.

In particular, the case when L and R are nondefective is considered here, so that they are of the form (2) or (4). Since L is nondefective, λ1, λ2, v1, and v2 may be found with

\[ Lv_1 = \lambda_1 v_1, \qquad Lv_2 = \lambda_2 v_2. \tag{9} \]

Since v1 and v2 are independent, α, β, γ, and δ may be found such that

\[ Rv_1 = \alpha v_1 + \beta v_2, \qquad Rv_2 = \gamma v_1 + \delta v_2. \tag{10} \]

Now applying (8) to v1 and substituting repeatedly from (9) and (10) results in

\[ (\lambda_1 - \alpha)v_1 + \beta(\lambda_1 - \lambda_2 - 1)v_2 = 0. \tag{11} \]

Similarly, applying (8) to v2 results in

\[ \gamma(\lambda_2 - \lambda_1 - 1)v_1 + (\lambda_2 - \delta)v_2 = 0. \tag{12} \]

Because v1 and v2 are independent, all coefficients in the above equations must be 0, resulting in

\[ \lambda_1 = \alpha, \qquad \lambda_2 = \delta, \qquad \beta(\lambda_1 - \lambda_2 - 1) = 0, \qquad \gamma(\lambda_2 - \lambda_1 - 1) = 0. \]

It is worthwhile to remark that this is a general strategy for solving matrix equations. Regardless of the form of (8), this procedure will always result in a set of four equations in λ1, λ2, α, β, γ, and δ. In general, the more complex the form of (8), the more difficult these equations will be to solve. This is typically a “last resort” strategy; the examples in the previous section show that there may be more direct methods.

The equations resulting from (11) and (12) are straightforward to solve. It must be that either β = 0 or λ1 = λ2 + 1 (or both) and either γ = 0 or λ2 = λ1 + 1 (or both). Note that having simultaneously both β = 0 and γ = 0 would be impossible, since this would imply that L = R, which is avoided. Of course having simultaneously λ1 = λ2 + 1 and λ2 = λ1 + 1 is impossible as well. The remaining two cases are symmetric in form, so without loss of generality, assume that β = 0 and λ2 = λ1 + 1. Thus, since α = λ1 and β = 0, L and R have an eigenvalue/eigenvector in common, λ1 and v1 (see (10)). Also, there is now only one free parameter, γ, since α, β, and δ are determined. It is straightforward to show that R(γv1 + v2) = λ2(γv1 + v2), so that the other eigenvalue/eigenvector pair for R is now known. Note that γ ≠ 0 since L = R is avoided. Writing both L and R with respect to v1 and v2, it follows that

\[ L = [v_1 : v_2] \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} [v_1 : v_2]^{-1}, \qquad R = [v_1 : v_2] \begin{bmatrix} \lambda_1 & \gamma \\ 0 & \lambda_2 \end{bmatrix} [v_1 : v_2]^{-1}, \tag{13} \]

where λ2 = λ1 + 1 and γ ≠ 0. Note that this is a four-dimensional parameter space: two dimensions for the directions of v1 and v2, one for λ1, and one dimension for γ. This allows for a wide array of trees to be produced. Recall that this is just one of many infinite families which are solutions to (8); making different assumptions about the invertibility/defectiveness of the matrices in consideration will result in a different family of solutions. The tree shown in Fig. 6 is given by

\[ L = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0.6 & 0 \\ 0 & 1.6 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}^{-1}, \qquad R = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0.6 & 2 \\ 0 & 1.6 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}^{-1}. \]

Observe that the form of the solutions given in (13) depended only on the fact that L and R were nondefective. In particular, consider the case where λ1 = 0 (and hence λ2 = 1), shown in Fig. 7, so that both L and R are singular and therefore have the form

\[ L = [v_1 : v_2] \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} [v_1 : v_2]^{-1}, \qquad R = [v_1 : v_2] \begin{bmatrix} 0 & \gamma \\ 0 & 1 \end{bmatrix} [v_1 : v_2]^{-1}. \]

Fig. 7 Binary tree with LR ∼ RL, and L and R singular

Simple matrix multiplication reveals that L = LL = LR and R = RL = RR. This implies that any product of L’s and R’s may be simplified; for example, RLRR = (RL)RR = RRR = (RR)R = RR = R. In general, any product of L’s and R’s is equal to the leftmost matrix in the product. So the string of instructions RRLR results in the node (I + R + RR + LRR + RLRR)t = (I + R + R + L + R)t. It is evident, then, that the node produced by a string depends only on the number of L’s and R’s in the string, not on their order. Thus, for strings of instructions of length n, just n + 1 new nodes are produced. This is readily apparent in Fig. 7.
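Both observations in this example lend themselves to a quick numerical check. The sketch below is hedged in several ways: the helper names are hypothetical, the matrices for Fig. 6 are those quoted above, and for the singular case the eigenvectors and the value γ = 2 are simply reused from Fig. 6 as an assumption, since the text does not state the values used for Fig. 7.

```python
from itertools import product

# Hypothetical helpers; 2x2 matrices are nested tuples, vectors are pairs.

def matmul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def inverse(m):
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))


def from_basis(v1, v2, middle):
    """Return [v1 : v2] * middle * [v1 : v2]^(-1)."""
    p = ((v1[0], v2[0]), (v1[1], v2[1]))
    return matmul(matmul(p, middle), inverse(p))


def add_identity(m):
    return ((m[0][0] + 1.0, m[0][1]), (m[1][0], m[1][1] + 1.0))


def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def node(instructions, L, R, t):
    position = branch = t
    for step in instructions:
        branch = apply(L if step == "L" else R, branch)
        position = (position[0] + branch[0], position[1] + branch[1])
    return position


v1, v2 = (1.0, 0.0), (-1.0, 1.0)
lam1, gamma = 0.6, 2.0
lam2 = lam1 + 1.0

# The family (13) with the values quoted for Fig. 6.
L = from_basis(v1, v2, ((lam1, 0.0), (0.0, lam2)))
R = from_basis(v1, v2, ((lam1, gamma), (0.0, lam2)))
lhs = matmul(add_identity(R), L)     # (I + R)L
rhs = matmul(add_identity(L), R)     # (I + L)R
satisfies_8 = all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
                  for i in range(2) for j in range(2))
print(satisfies_8)

# Singular case lam1 = 0, lam2 = 1 (gamma and eigenvectors are assumed).
L0 = from_basis(v1, v2, ((0.0, 0.0), (0.0, 1.0)))
R0 = from_basis(v1, v2, ((0.0, gamma), (0.0, 1.0)))
t = (0.0, 1.0)
counts = {}
for n in range(1, 6):
    nodes = {node("".join(s), L0, R0, t) for s in product("LR", repeat=n)}
    counts[n] = len(nodes)           # distinct nodes among 2**n strings
print(counts)
```

With these particular values every entry involved in the singular case is an integer, so the distinct nodes can be counted with exact comparisons rather than a tolerance.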

A Third Example: LR^∞ ∼ (RL)^∞

The third example addresses the issue of equivalence of infinite strings of instructions. Such strings were considered in Mandelbrot and Frame (1999) when investigating “just-touching” binary trees. They looked at just a few particular infinite strings of instructions, but of course it is possible to generalize to any infinite strings. In Fig. 8, observe that the instructions LR^∞ (i.e., LRRRR . . .) and (RL)^∞ (that is, RLRLRL . . .) lead to the same node in the tree (these two paths are drawn in thicker lines). Consider a process for finding an infinite family of pairs of transformations which will produce binary trees with this particular property. The difficulty lies in that the matrix equation involves infinite sums:

\[ I + L + RL + R^2 L + R^3 L + \cdots = I + R + LR + RLR + LRLR + RLRLR + \cdots \tag{14} \]

Fig. 8 Binary tree with LR^∞ ∼ (RL)^∞

To work with this equation, assume that both sums are absolutely convergent. Not only will this guarantee that the sums actually exist, but it will also allow the rearranging of terms on the right-hand side to obtain

\[ I + \left( \sum_{j=0}^{\infty} R^j \right) L = (I + R) \sum_{j=0}^{\infty} (LR)^j. \]

The sums in these equations are geometric series and can therefore be added using the usual formula adapted for matrices:

\[ I + (I - R)^{-1} L = (I + R)(I - LR)^{-1}. \]

Now multiply on the left by I − R and on the right by I − LR, expand, and rearrange terms to give

\[ R^2 - LR - R + L + RLR - L^2 R = [0]. \]

Since L = R must be a solution to the equation (since then both infinite strings are actually the same set of instructions), it is plausible that a factor of (R − L) may be extracted from the left-hand side. This results in

\[ (R - L)(R - I + LR) = [0]. \]

Assuming that R − L is invertible, it follows that R − I + LR = (I + L)R − I = [0]. If I + L is invertible, this results in

\[ R = (I + L)^{-1}. \tag{15} \]

Several assumptions have been made along the way. Nonetheless, there is still a four-dimensional infinite family of pairs of transformations to choose from. Simply choose L, and define R as in (15), assuming that the choices guarantee the existence of inverses and convergence of sums as indicated above. Again, it is not necessary to find all solutions to (14), just a large enough family to allow for a broad investigation of binary trees with a particular convergence property. The choices of L and R which generate the tree shown in Fig. 8 are given by

\[ L = \tfrac{1}{2} R_{70^\circ}, \qquad R = (I + L)^{-1}, \tag{16} \]

where R_{70°} denotes the counterclockwise rotation through 70°. In this case, because of (15), it is not necessary to specify L and R in terms of eigenvalues and eigenvectors.
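The choice (16) can be tested by following both paths to a large finite depth. This is a sketch under assumptions: the helper names, the trunk t = (0, 1), the depth of 200, and the tolerance are all hypothetical choices. The closed form (2I + L)t used as a cross-check is not stated in the text; it follows from substituting (15) into either side of (14), since then (I − R)⁻¹L = I + L and (I − LR)⁻¹ = I + L.

```python
import math

# Hypothetical helpers; 2x2 matrices are nested tuples, vectors are pairs.

def inverse(m):
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))


def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def node(instructions, L, R, t):
    position = branch = t
    for step in instructions:
        branch = apply(L if step == "L" else R, branch)
        position = (position[0] + branch[0], position[1] + branch[1])
    return position


def close(p, q, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(p, q))


theta = math.radians(70.0)
c, s = math.cos(theta), math.sin(theta)
L = ((0.5 * c, -0.5 * s), (0.5 * s, 0.5 * c))     # (1/2) * rotation by 70 degrees
R = inverse(((1.0 + L[0][0], L[0][1]),
             (L[1][0], 1.0 + L[1][1])))           # (I + L)^(-1), as in (15)

t = (0.0, 1.0)      # assumed trunk
depth = 200         # finite stand-in for the infinite strings

left_then_rights = node("L" + "R" * depth, L, R, t)    # approximates LR^infinity
alternating = node("RL" * depth, L, R, t)              # approximates (RL)^infinity
Lt = apply(L, t)
closed_form = (2.0 * t[0] + Lt[0], 2.0 * t[1] + Lt[1])  # (2I + L)t

print(left_then_rights)
print(alternating)
print(close(left_then_rights, alternating), close(alternating, closed_form))
```

Truncating at a finite depth is reasonable here because, for this L, the terms along both paths shrink geometrically, so the neglected tails are far below the tolerance.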

Other Issues

There are several other issues at play here which were not addressed in the examples above so as not to get in the way of the essential linear algebra.

1. The terms “left” and “right” are arbitrary; examples were chosen so that they were visually easy to understand. In general, for arbitrary transformations L and R, the branches will not visually appear to go consistently to the left and right.
2. The choice of trunk also influences the geometry of the binary tree. For example, if the trunk of the tree shown in Fig. 7 is chosen to be v1 – the eigenvector corresponding to an eigenvalue of 0 – the tree will be degenerate and only have one node. This is not artistically very interesting, but is an important issue when completely describing binary trees with a particular property.
3. In many instances, there exist solutions to matrix equations where L and R are affine but not linear. Considering such solutions adds dimensions to the parameter space, resulting in an even wider range of trees. However, the solution procedure is more involved in these cases.
4. Left and right branchings are always contractive in the literature. When convergence of infinite strings of instructions is desired, this is usually necessary. But when looking for trees with the property that LR ∼ RL, for example, and iterating to some finite depth, convergence is not a concern. This opens up additional possibilities.

Artistic Considerations

As seen above, a complete analysis of an apparently simple constraint like L ∼ LR may be quite involved. Yet the trees produced using this constraint, seen in Figs. 4 and 5, cannot match the intricacy of the tree illustrated in Fig. 1. As a result, the artistic possibilities with trees satisfying L ∼ LR are rather limited. In this section, several examples of artistically more interesting trees are presented. They will all exhibit one feature, seen in Fig. 1: L and R will be chosen so that nodes in the tree overlap in a non-trivial way. In this figure, L is chosen to be a counterclockwise 120° rotation, and R is a scaled rotation. As nodes and branches are traversed, the size of the nodes and the width of the branches become successively smaller, which helps create texture. But because L is an unscaled rotation, successive applications of L create equilateral triangles within the figure. Since L³ = I, the equilateral triangles of nodes will be created regardless of the choice of R. It is necessary to experiment with various choices of R to see which result in an interesting geometry. There is no one heuristic which may be used as a failsafe guide – simply lots of work, trial and error, and the rendering of thousands of trees provide an artist some intuition about what might make a good choice.

Figure 9 was created in a similar way. Here, L is a 90° counterclockwise rotation, so that L⁴ = I; squares are clearly evident. The origin is at the lower right corner of the square, which is repeatedly drawn as successive L transformations are applied. The transformation R is also a counterclockwise rotation by 90°, but scaled by 1/2. This creates the smaller squares within and the diagonal patterns as well.
Other scale factors do not produce as interesting a result; with scale factors less than 1/2, there is no interaction between the smaller squares, and with scale factors larger than 1/2, there is considerable overlap, but not in any patterned way.

Fig. 9 Untitled

In the next example, Galaxy (shown in Fig. 10), transformations were chosen so that for any string σ, σLLR ∼ σ. This was accomplished by choosing L and R such that I + L + RL = [0]. Assuming L is invertible, it is not difficult to see that given L, putting R = −I − L⁻¹ accomplishes this task. Consider σ = R. Referring to (1) and using the property that I + L + RL = [0], the string RLLR takes the trunk t to

\[ (I + R + LR + LLR + RLLR)t = (I + R + (I + L + RL)LR)t = (I + R + [0] \cdot LR)t = (I + R)t, \]

which is precisely where the string R takes t. This calculation is easily generalized for any string σ.

Fig. 10 Galaxy

It is worth noting that in this example, the transformations used were affine transformations. Using affine transformations allows a larger parameter space, but solving equations like (6) becomes more involved. It is often necessary to be satisfied with a family of solutions to a particular equation involving affine transformations rather than a complete solution. Note that if L is invertible, R is easily determined – but if L is singular, a complete solution requires considerably more work.
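The property σLLR ∼ σ that follows from I + L + RL = [0] can be illustrated with linear transformations. This sketch does not reproduce the affine transformations used for Galaxy, which are not given in the text; the matrix L below is an arbitrary invertible matrix chosen for the example, and the helper names, trunk, and tolerance are hypothetical.

```python
# Hypothetical helpers; 2x2 matrices are nested tuples, vectors are pairs.

def inverse(m):
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))


def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def node(instructions, L, R, t):
    position = branch = t
    for step in instructions:
        branch = apply(L if step == "L" else R, branch)
        position = (position[0] + branch[0], position[1] + branch[1])
    return position


def close(p, q, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(p, q))


L = ((0.5, -0.3), (0.2, 0.4))            # assumed: any invertible L would do
Linv = inverse(L)
R = ((-1.0 - Linv[0][0], -Linv[0][1]),
     (-Linv[1][0], -1.0 - Linv[1][1]))   # R = -I - L^(-1)

t = (0.0, 1.0)                            # assumed trunk
results = {}
for sigma in ["", "R", "LR", "RRL"]:
    # Appending LLR to any string should return to the node of that string.
    results[sigma] = close(node(sigma + "LLR", L, R, t), node(sigma, L, R, t))
print(results)
```

Note that this R is not contractive, which is harmless here because only finite strings are followed.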

Fig. 11 S

Sometimes the desired coincidences result in complicated matrix equations. The binary tree S shown in Fig. 11 has RR ∼ LRL, but one of the transformations necessary to produce this binary tree was generated by a computer algebra system (CAS). L was chosen to be invertible and defective, and so v1, v2, and λ were chosen as in (3). A CAS was then used to find R so that RR ∼ LRL for all choices of trunk t; that is,

\[ I + R + R^2 = I + L + RL + LRL. \]

Polynomial matrix equations are more difficult to solve than their algebraic counterparts, since in general AB = [0] may occur even though A ≠ [0] and B ≠ [0], as seen when solving (6). Although solving such equations exactly gives the artist more control over the parameter space, interesting results may still be produced by using a CAS to find specific solutions to involved matrix equations.

Conclusion

When left and right branchings in the usual algorithm for creating binary trees are allowed to be arbitrary affine transformations, the parameter space for binary trees becomes quite large. As a result, an extremely rich variety of trees with diverse geometrical properties may be produced. But rather than just randomly wander through this forest by arbitrarily varying parameters, it is possible to be quite intentional about the design of the structure of binary trees with the help of some elementary linear algebra.

References

Espigulé B (2013) Generalized self-contacting symmetric fractal trees. Symmetry Cult Sci 21:333–351
Mandelbrot B, Frame M (1999) The canopy and shortest path in a self-contacting fractal tree. Math Intell 21:18–27
O’Hanlon A, Howard A, Brown D (2004) Path length and height in asymmetric binary branching trees. Missouri J Math Sci 16:88–103
Taylor TD (2007) Golden fractal trees. In: Sarhangi R, Barrallo J (eds) Bridges Donostia: mathematics, music, art, architecture, culture. Tarquin Publications, London, pp 181–188

6 Homeomorphisms Between the Circular Disc and the Square

Chamberlain Fong

Contents
Introduction
Canonical Mapping Space
Mapping Diagram with Equations
Some Mathematical Details
Fernandez-Guasti Squircle
Tapered2 Squircular Mapping
Lamé Squircle
Elliptical Grid Mapping
Conformal Square Mapping via Schwarz-Christoffel
A Complex Class of Squircles
Application: Squaring the Poincaré Disk
Hyperbolic Tilings
Application: Elliptification of Rectangular Imagery
Size Versus Shape Distortions
Conclusion
Cross-References
References

Abstract

The circle and the square are among the most common shapes used by mankind. Consequently, it is worthwhile to study the mathematical correspondence between the two. This chapter discusses three different ways of mapping a circular region to a square region and vice versa. Each of these mappings has nice closed-form invertible equations and different interesting properties. In addition, this chapter will present artistic applications of these mappings such as converting the Poincaré disk to a square as well as molding rectangular artworks into oval-shaped ones.

C. Fong, exile.org, San Francisco, CA, USA. e-mail: [email protected]
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_27

Keywords Conformal square · Escheresque artworks · Invertible mappings · Non-Euclidean geometry · Poincaré disk · squircles

Introduction

In topology, homeomorphism is an equivalence relation between geometric objects that can be continuously deformed into one another. Two objects are homeomorphic if there is a continuous invertible mapping between them. An oft-repeated humorous example of this is the homeomorphism between the donut and the coffee cup. Joking aside, a coffee cup is not at all easy to define mathematically. Instead, this chapter will focus on two objects that are simple and well-defined mathematically – namely, the circular disc and the square. We shall study ways to map the circular disc to a square region and vice versa (Fig. 1). Needless to say, there are infinitely many ways to map a circular disc to a square. Of particular interest to us are mappings with nice closed-form invertible equations. This chapter will discuss three such mappings. In addition, this chapter will also introduce and discuss different curious properties of these mappings. These properties include concepts such as uniform grids, radial constraints, and conformality.

But before we proceed, there is an important lingering issue that needs to be addressed. This mathematical problem sounds all too familiar. Are we dealing with a classic problem in mathematics? Is this problem just “squaring the circle” under a modern guise? Indeed, there is a famous classic problem in mathematics called “squaring the circle.” This well-known geometric construction problem involves using a straightedge and a compass to produce a circle and square with equal area. At first glance, our mapping problem seems similar to “squaring the circle.” However, the two problems are only superficially alike and ultimately quite different. Our mapping problem involves finding two-dimensional mapping equations that a computer can calculate, whereas the classic problem has to do with geometric construction using a draftsman’s tools.

Fig. 1 A chessboard mapped to a circular disc

Fig. 2 Canonical mapping space for the circular disc and the square

Canonical Mapping Space

In order to mathematically describe the mappings, we need to first cover notation and introduce some variables in the canonical mapping space. For the mappings, the domain is the unit disc centered at the origin, and the range is the square with corners at (±1, ±1). This is shown in Fig. 2. We shall denote (u, v) as a point contained in the unit disc and (x, y) as the corresponding point contained in the square after the mapping. Mathematically speaking, we want to find a function f that maps every point (u, v) in the circular disc to a point (x, y) in the square region and vice versa. In other words, we want to derive equations for f such that (x, y) = f(u, v) and (u, v) = f⁻¹(x, y).

Mapping Diagram with Equations

Figure 3 shows the three mappings that are at the core of this chapter. These are the Tapered2 Squircular, Elliptical Grid, and the Conformal Square mapping. In order to illustrate the visual properties of these mappings, there are diagrams for a disc with a radial grid converted to a square. These appear on the left side of Fig. 3. Similarly, on the right side, there are diagrams for a rectilinear square grid converted to a circular disc. Each of these mappings has corresponding forward and inverse equations accompanying the diagrams. From here, one can observe that these equations come in varying degrees of mathematical sophistication. For the Tapered2 Squircular mapping, it is convenient to use vector notation. In contrast, for the Conformal Square mapping, it is most appropriate to use complex numbers to come up with relatively compact equations. On the other hand, the Elliptical Grid mapping can be expressed simply by using plain algebraic equations. Each of these mappings will be discussed in more detail in the next section.

Fig. 3 Some mappings to convert a disc to a square and vice versa

It is important to review the signum function at this point. This function is abbreviated as sgn(x) and is defined as

\[ \operatorname{sgn}(x) = \begin{cases} -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases} \]

Some Mathematical Details

Fernandez-Guasti Squircle

In 1992, Manuel Fernandez-Guasti discovered a plane algebraic curve that is an intermediate shape between the circle and the square (Fernandez-Guasti, 1992). His curve is represented by a quartic equation.

\[ x^2 + y^2 - \frac{s^2}{r^2} x^2 y^2 = r^2 \tag{1} \]

His equation includes a parameter s that can be used to blend between the circle and the square smoothly. Figure 4 illustrates the Fernandez-Guasti squircle at varying values of s. The squareness parameter s can have any value between 0 and 1. When s = 0, the equation produces a circle with radius r. When s = 1, the equation produces a square with side length 2r. In between, the equation produces a smooth curve that is a geometric hybrid of the circle and the square.
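The limiting behavior of Eq. (1) can be checked numerically. The following is a minimal sketch, not code from the chapter: the function names `fg_squircle_residual` and `fg_squircle_y` are hypothetical, and the second function simply solves Eq. (1) for y, which gives y² = (r² − x²) / (1 − s²x²/r²).

```python
import math

def fg_squircle_residual(x, y, s, r=1.0):
    """Left-hand side minus right-hand side of Eq. (1); zero for points on the curve."""
    return x * x + y * y - (s * s) / (r * r) * x * x * y * y - r * r

def fg_squircle_y(x, s, r=1.0):
    """Non-negative y on the squircle at a given x, obtained by solving Eq. (1) for y."""
    denom = 1.0 - (s * s) / (r * r) * x * x
    if denom == 0.0:
        # Happens only for s = 1 and |x| = r, i.e. on a vertical side of the square.
        raise ValueError("y is not unique at this x")
    return math.sqrt((r * r - x * x) / denom)

# s = 0 gives a circle, s = 1 gives the flat top side of the square.
y_circle = fg_squircle_y(0.5, s=0.0)   # equals sqrt(1 - 0.25)
y_square = fg_squircle_y(0.5, s=1.0)   # equals 1
y_hybrid = fg_squircle_y(0.5, s=0.8)   # lies between the two
```

For intermediate values of s the returned height lies strictly between the circle and the square, which matches the blending behavior described above.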

Tapered2 Squircular Mapping The Fernandez-Guasti squircle can be used as groundwork for creating mappings between the circular disc and the square. The key idea is to match circular contours inside the circular disc to squircular contours inside the square. This is shown in Fig. 5.

Fig. 4 The Fernandez-Guasti squircle at varying squareness


Fig. 5 Mapping based on the squircular continuum

Fig. 6 Radial constraint on the mapping

A circular disc can be considered as a continuum of concentric circles. Likewise, a square region can be considered as a continuum of concentric squircles with increasing squareness. These concentric squircles become increasingly squarelike as they approach the bounding rim. We shall denote this phenomenon as the squircular continuum of the square. In addition to matching contours inside the circular disc and the square, the Tapered2 Squircular mapping has a restriction called the radial constraint. In a nutshell, this means that points are only allowed to move radially during the mapping process. This is illustrated in Fig. 6. By coupling the radial constraint with a nonlinear squircular continuum, it is possible to derive a set of equations for mapping the circular disc to a square. This is a geometric overview of how the Tapered2 Squircular mapping was derived. For more details, see Fong (2019).


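The mapping equations provided in Fig. 3 are not exactly precise. In the interest of brevity, some important details were intentionally left out. For example, in the square-to-disc mapping equation, there are degenerate points that cause a division by zero. These degenerate points are located at the center and the four corners of the square. At these locations, the mapping equation results in the indeterminate form 0/0. However, the limiting value of the mapping for these points is well-defined. In view of these and other missing details, here are the amended mapping equations for the Tapered2 Squircular mapping.

$$
\begin{bmatrix} x \\ y \end{bmatrix} =
\begin{cases}
\operatorname{sgn}(uv)\,
\sqrt{\dfrac{-u^2 - v^2 + \sqrt{(u^2+v^2)\left[u^2+v^2+4u^2v^2(u^2+v^2-2)\right]}}{2(u^2+v^2-2)}}
\begin{bmatrix} \dfrac{1}{v} \\[2mm] \dfrac{1}{u} \end{bmatrix}
& \text{if } u \neq 0 \text{ and } v \neq 0 \\[6mm]
\begin{bmatrix} u \\ v \end{bmatrix} & \text{otherwise}
\end{cases}
$$

$$
\begin{bmatrix} u \\ v \end{bmatrix} =
\begin{cases}
\sqrt{\dfrac{x^2+y^2-2x^2y^2}{(x^2+y^2)(1-x^2y^2)}}
\begin{bmatrix} x \\ y \end{bmatrix}
& \text{if } (x, y) \neq (0, 0) \text{ and } (x, y) \neq (\pm 1, \pm 1) \\[6mm]
\begin{bmatrix} \operatorname{sgn}(x)\,\tfrac{1}{2}\sqrt{2} \\[1mm] \operatorname{sgn}(y)\,\tfrac{1}{2}\sqrt{2} \end{bmatrix}
& \text{if } (x, y) = (\pm 1, \pm 1) \\[6mm]
\begin{bmatrix} 0 \\ 0 \end{bmatrix} & \text{if } (x, y) = (0, 0)
\end{cases}
$$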
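The amended equations translate fairly directly into code. The following is a minimal sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, and the two `max(0.0, ...)` clamps are an added safeguard against floating-point rounding that is not part of the equations.

```python
import math

def sgn(t):
    """Signum function as defined earlier in the chapter."""
    return (t > 0) - (t < 0)

def tapered2_disc_to_square(u, v):
    """Map a point (u, v) of the closed unit disc to the square [-1, 1] x [-1, 1]."""
    if u == 0.0 or v == 0.0:
        return (u, v)  # "otherwise" branch: points on the axes are unchanged
    r2 = u * u + v * v
    # Inner radicand, clamped at zero to absorb rounding near the rim.
    inner = max(0.0, r2 * (r2 + 4.0 * u * u * v * v * (r2 - 2.0)))
    ratio = max(0.0, (-r2 + math.sqrt(inner)) / (2.0 * (r2 - 2.0)))
    common = sgn(u * v) * math.sqrt(ratio)
    return (common / v, common / u)

def tapered2_square_to_disc(x, y):
    """Map a point (x, y) of the square [-1, 1] x [-1, 1] to the closed unit disc."""
    if x == 0.0 and y == 0.0:
        return (0.0, 0.0)                      # degenerate point: center
    x2y2 = x * x * y * y
    if x2y2 == 1.0:                            # degenerate points: the four corners
        half_sqrt2 = math.sqrt(2.0) / 2.0
        return (sgn(x) * half_sqrt2, sgn(y) * half_sqrt2)
    s = x * x + y * y
    k = math.sqrt((s - 2.0 * x2y2) / (s * (1.0 - x2y2)))
    return (k * x, k * y)                      # radial constraint: (u, v) is parallel to (x, y)

# Round trip: disc -> square -> disc should return the starting point.
x, y = tapered2_disc_to_square(0.3, 0.4)
u, v = tapered2_square_to_disc(x, y)
```

Because of the radial constraint, the ratio y/x of the mapped point equals v/u, which offers a quick sanity check on any implementation.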

Lamé Squircle The Fernandez-Guasti squircle is by no means the only shape that is a parameterized hybrid of the circle and square. In fact, there is a much more famous curve known as the superellipse that also has this property. The Lamé squircle is a special case of the superellipse with no eccentricity. This plane algebraic curve was originally studied by Gabriel Lamé in 1818. Qualitatively, the curve appears very similar to the Fernandez-Guasti squircle, but there are several important differences which will be discussed later. The Lamé squircle has the equation

$$
|x|^p + |y|^p = r^p \tag{2}
$$

There are two parameters in this equation: p and r. The power parameter p allows the shape to interpolate between the circle and the square. When p = 2, the equation produces a circle with radius r. As p → ∞, the equation produces a square with a side length of 2r. In between, the equation produces a smooth planar curve that resembles both the circle and the square. Figure 7 shows the Lamé curve at increasing powers.


Fig. 7 Lamé squircle at varying polynomial powers

Fig. 8 Disc-to-square mapping based on the Lamé squircle

Although the Lamé squircle exhibits qualitative similarities with the Fernandez-Guasti squircle, there is a major difference between the two. The Lamé squircle can only approximate the square. It requires an infinite exponent in order to fully realize the square. Moreover, since the polynomial equation for the Lamé curve has unbounded exponents, it is unwieldy and difficult to manipulate algebraically. Nevertheless, the Lamé squircle can be used as groundwork for devising another disc-to-square mapping. The Lamé squircle has a parametric form in which x and y coordinates of points on the curve can be specified in terms of parameter t. The parametric equations are

$$
\begin{aligned}
x(t) &= \operatorname{sgn}(\cos t)\, |\cos t|^{2/p} \\
y(t) &= \operatorname{sgn}(\sin t)\, |\sin t|^{2/p}
\end{aligned}
\tag{3}
$$

Using these parametric equations of the Lamé squircle, it is possible to devise another disc-to-square mapping. The key idea is to make the substitutions u = cos t and v = sin t to represent points in the circular disc. This mapping is shown in Fig. 8 with corresponding equations as

$$
\begin{aligned}
x &= \operatorname{sgn}(u)\, |u|^{1 - u^2 - v^2} \\
y &= \operatorname{sgn}(v)\, |v|^{1 - u^2 - v^2}
\end{aligned}
\tag{4}
$$


This mapping based on the Lamé squircle has several undesirable properties that make it less useful than the other mappings shown in Fig. 3. For one thing, explicit inverse equations are not provided. It is quite likely that there are no closed-form equations for expressing the inverse mapping. This means that the inverse mapping can only be computed using iterative root-solving techniques from numerical analysis. Another serious drawback of this mapping is that it only works on the open circular disc and the open square region. This means that the mapping does not include the circular boundary of the disc or the rim of the square region. Mathematically, the domain of the mapping is the open circular disc {(u, v) | u² + v² < 1}, and the range of the mapping is the open square region (−1, 1) × (−1, 1).
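To make the remark about iterative root-solving concrete, here is a minimal sketch of Eq. (4) together with one possible numerical inverse. The function names are hypothetical, and bisection is just one choice of root solver that the chapter does not prescribe. The inverse rests on the observation that Eq. (4) gives |u| = |x|^(1/(1−R)) with R = u² + v², so R must satisfy |x|^(2/(1−R)) + |y|^(2/(1−R)) − R = 0, and the left-hand side is strictly decreasing in R on [0, 1).

```python
import math

def sgn(t):
    """Signum function as defined earlier in the chapter."""
    return (t > 0) - (t < 0)

def lame_disc_to_square(u, v):
    """Eq. (4); only defined on the open unit disc u^2 + v^2 < 1."""
    r2 = u * u + v * v
    if r2 >= 1.0:
        raise ValueError("Eq. (4) is only defined on the open unit disc")
    e = 1.0 - r2
    return (sgn(u) * abs(u) ** e, sgn(v) * abs(v) ** e)

def lame_square_to_disc(x, y, iterations=100):
    """Numerical inverse of Eq. (4) by bisection on R = u^2 + v^2 (an assumed approach)."""
    if abs(x) >= 1.0 or abs(y) >= 1.0:
        raise ValueError("the inverse is only defined on the open square")

    def g(R):
        p = 2.0 / (1.0 - R)
        return abs(x) ** p + abs(y) ** p - R

    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid < 1.0 and g(mid) >= 0.0:
            lo = mid      # root lies at or above mid
        else:
            hi = mid      # root lies below mid
    q = 1.0 / (1.0 - lo)  # lo is always strictly less than 1
    return (sgn(x) * abs(x) ** q, sgn(y) * abs(y) ** q)

x, y = lame_disc_to_square(0.3, -0.4)
u, v = lame_square_to_disc(x, y)
```

As a side check, the mapped point satisfies |x|^p + |y|^p = u² + v² with p = 2/(1 − u² − v²), so each circle of the disc lands on a Lamé curve whose exponent grows without bound toward the rim.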

Elliptical Grid Mapping The square can easily be subdivided into a grid of smaller squares. In stark contrast, there is no easy way to subdivide the circle into a grid of smaller nonoverlapping circles. Nevertheless, the circular disc can be subdivided into a uniform grid based on elliptical arcs. This is shown on the right of Fig. 9. Using this observation, it is possible to devise a square-to-disc mapping. The key idea is to match cells inside the square grid with cells inside the circular grid. In 2005, Philip Nowell (2005) came up with very simple equations for mapping a square to a circular disc. The derivation basically boils down to two mathematical constraints needed to match square cells with elliptical grid cells. The first constraint comes from matching vertical line segments inside the square to vertically oriented elliptical arcs inside the circular disc. This is illustrated in Fig. 10 where a vertical line and its corresponding elliptical arc are shown in red. This mapping relationship can be summarized by the equation:

Fig. 9 Overview of the Elliptical Grid mapping


Fig. 10 Vertical constraint of the Elliptical Grid mapping

Fig. 11 Horizontal constraint of the Elliptical Grid mapping

$$
1 = \frac{u^2}{x^2} + \frac{v^2}{2 - x^2} \tag{5}
$$

Essentially, for each vertical line segment of constant x inside the square, there is a corresponding equation of an ellipse centered in the circle. The bounds, eccentricity, and semi-major and semi-minor axial lengths of this ellipse vary depending on the value of x. Specifically, the left and right vertex tips of the ellipse follow the value of x. Meanwhile, the top and bottom vertex tips vary as ±√(2 − x²). The second constraint of the mapping comes from matching horizontal line segments inside the square to sideways-oriented elliptical arcs inside the circular disc. This is illustrated in Fig. 11 where a horizontal line and its corresponding elliptical arc are shown in red. This mapping relationship can also be summarized by the equation:

$$
1 = \frac{u^2}{2 - y^2} + \frac{v^2}{y^2} \tag{6}
$$

Essentially, for each horizontal line segment of constant y in the square, there is a corresponding equation of an ellipse centered in the circle. The bounds, eccentricity, and semi-major and semi-minor axial lengths of this ellipse vary depending on the value of y. Specifically, the top and bottom vertex tips of the ellipse follow the value of y. Meanwhile, the left and right vertex tips vary as ±√(2 − y²). Basically, the mapping assigns the grid of perpendicular vertical and horizontal lines inside the square to a grid of elliptical arcs inside the circular disc. As shown in Fig. 9, this is essentially matching square grid cells with elliptical grid cells. A curvilinear grid of elliptical arcs can be formed by superimposing the vertically


Fig. 12 Superimposing the vertical and horizontal arcs to create a curvilinear elliptical grid

oriented and horizontally oriented elliptical arcs inside the circular disc. Figure 12 shows the resulting curvilinear grid resulting from the superimposition. Note that the ellipses get more and more eccentric as x or y approach zero. Also, the ellipses get more circular as x or y approach ±1. Mathematically, we can mix the vertical constraint equation with the horizontal constraint equation to get algebraic expressions for u and v in terms of x and y. This can be done by starting with the vertical constraint equation and doing some algebraic manipulations to isolate u².

$$
1 = \frac{u^2}{x^2} + \frac{v^2}{2 - x^2}
\;\Longrightarrow\;
1 - \frac{v^2}{2 - x^2} = \frac{u^2}{x^2}
\;\Longrightarrow\;
u^2 = x^2 \left(1 - \frac{v^2}{2 - x^2}\right)
$$

We can then plug this u² value into the horizontal constraint equation

$$
1 = \frac{u^2}{2 - y^2} + \frac{v^2}{y^2}
\;\Longrightarrow\;
1 = \frac{x^2 \left(1 - \dfrac{v^2}{2 - x^2}\right)}{2 - y^2} + \frac{v^2}{y^2}
$$

Multiply both sides of the equation by (2 − x²)(2 − y²)y² to remove fractions and get

$$
(2 - x^2)(2 - y^2)\,y^2 = x^2 y^2 (2 - x^2 - v^2) + v^2 (2 - x^2)(2 - y^2)
$$

After which, one can isolate v² into one side of the equation

$$
\begin{aligned}
& (2 - x^2)(2 - y^2)\,y^2 - x^2 y^2 (2 - x^2) = -x^2 y^2 v^2 + (2 - x^2)(2 - y^2)\,v^2 \\
\Longrightarrow\;& v^2 \left[(2 - x^2)(2 - y^2) - x^2 y^2\right] = (2 - x^2)\,y^2\,(2 - y^2 - x^2) \\
\Longrightarrow\;& v^2 = y^2\, \frac{(2 - x^2)(2 - y^2 - x^2)}{(2 - x^2)(2 - y^2) - x^2 y^2} \\
\Longrightarrow\;& v^2 = y^2\, \frac{(2 - x^2)(2 - y^2 - x^2)}{4 - 2x^2 - 2y^2} = y^2\, \frac{2 - x^2}{2} \\
\Longrightarrow\;& v = y \sqrt{\frac{2 - x^2}{2}} \\
\Longrightarrow\;& v = y \sqrt{1 - \frac{x^2}{2}}
\end{aligned}
$$

Similarly, in a symmetric fashion, one can solve for u as

$$
u = x \sqrt{1 - \frac{y^2}{2}}
$$
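The result of this derivation is short enough to verify numerically. The sketch below, with the hypothetical function name `elliptical_grid_square_to_disc`, implements the two closed-form expressions for u and v and checks that a mapped point satisfies both constraint equations (5) and (6). It covers only the square-to-disc direction derived here.

```python
import math

def elliptical_grid_square_to_disc(x, y):
    """Square-to-disc Elliptical Grid mapping: u = x*sqrt(1 - y^2/2), v = y*sqrt(1 - x^2/2)."""
    u = x * math.sqrt(1.0 - y * y / 2.0)
    v = y * math.sqrt(1.0 - x * x / 2.0)
    return (u, v)

x, y = 0.6, -0.8
u, v = elliptical_grid_square_to_disc(x, y)

# Right-hand sides of the vertical constraint (5) and horizontal constraint (6).
vertical = u * u / (x * x) + v * v / (2.0 - x * x)
horizontal = u * u / (2.0 - y * y) + v * v / (y * y)

# A point on the rim of the square (here x = 1) should land on the unit circle.
rim_u, rim_v = elliptical_grid_square_to_disc(1.0, 0.37)
rim_radius = math.hypot(rim_u, rim_v)
```

Both `vertical` and `horizontal` evaluate to 1 up to rounding, and `rim_radius` evaluates to 1, which confirms that the rim of the square is sent to the bounding circle.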

It is also possible to derive the inverse equations for this mapping, but the algebraic manipulation needed is much more elaborate. Instead, we refer the reader to Fong (2014) for a lengthy derivation of denested inverse equations.

Conformal Square Mapping via Schwarz-Christoffel A conformal map is a mapping that preserves angles between geometric features after the mapping operation is performed. In other words, a conformal mapping does not distort angles. This section will discuss a disc-to-square mapping that is conformal. However, in order to do this, we first need to take a small detour through the complex plane and discuss some theory. One of the most celebrated results in nineteenth-century complex analysis is the Riemann mapping theorem (Frederick and Schwarz, 1990). It states that there exists a conformal map between the open unit disc and every nonempty, simply connected, open proper subset of the complex plane. Moreover, it states that this conformal map is unique if we fix a point and the orientation of the mapping. In theory, the Riemann mapping theorem is nice, but it is only an existence theorem. It does not specify how to find the conformal mapping. The next important breakthrough came with the works of Hermann Schwarz and Elwin Christoffel. In the 1860s, Schwarz and Christoffel independently developed a formula for a conformal mapping between the unit disc and simple polygonal regions in the complex plane. The formula is complicated and involves an integral in the complex plane.


In order to show how complicated this formula is, it is listed below in its full glory. We will not explain what each of the variables in the complex integral means. Suffice it to say that the general Schwarz-Christoffel formula is quite complicated.

$$
f(z) = \int^{z} \prod_{k=1}^{n} \left(1 - \frac{\zeta}{z_k}\right)^{\alpha_k - 1} d\zeta \tag{7}
$$

Furthermore, for most polygons, the integral can only be approximated numerically. Fortunately, for the special case of the square, the Schwarz-Christoffel formula can be reduced to an explicit analytical expression involving complex-valued elliptic integrals and elliptic functions (Langer and Singer, 2011).

Legendre Elliptic Integrals It is appropriate to make a segue here and briefly discuss elliptic integrals and elliptic functions. Of particular interest to us are Legendre elliptic integrals and Jacobi elliptic functions. Specifically, we are interested in the incomplete Legendre elliptic integral of the first kind called F and its closely related inverse function, the Jacobi elliptic function cn. Mathematically, the incomplete Legendre elliptic integral of the first kind is a two-parameter function defined as

$$
F(\varphi, k) = \int_{0}^{\varphi} \frac{1}{\sqrt{1 - k^2 \sin^2(t)}}\, dt \tag{8}
$$

Note that this integral cannot be simplified using any of the standard techniques covered in freshman calculus classes. This integral was originally studied in the context of measuring the arc length of an ellipse. That is the reason why it is called an elliptic integral (Rice and Brown, 2012). The two arguments provided with this function are also intimately tied to the ellipse. The first parameter φ is some sort of angular parameter. It originates from the angle subtended by an arc of the ellipse. The second parameter k is closely related to the eccentricity of the ellipse. Meanwhile, the Jacobi elliptic function cn can be considered as some sort of inverse to F. Mathematically, it is related to F by the following equation:

$$
\operatorname{cn}(\eta, k) = \cos\!\left(F^{-1}(\eta, k)\right) \tag{9}
$$
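Equations (8) and (9) can be illustrated for real arguments with nothing more than numerical quadrature and root bracketing. The following is a rough sketch, assuming 0 ≤ k < 1 and real inputs of moderate size; the function names `ellipf` and `jacobi_cn` are hypothetical, and Simpson's rule with bisection is chosen here for transparency rather than speed. It does not cover the complex-valued variants that the conformal mapping ultimately needs.

```python
import math

def ellipf(phi, k, n=1000):
    """Incomplete elliptic integral of the first kind, Eq. (8), by composite Simpson's rule.

    n must be even; accuracy degrades if phi is very large relative to n.
    """
    def integrand(t):
        return 1.0 / math.sqrt(1.0 - (k * math.sin(t)) ** 2)

    h = phi / n
    total = integrand(0.0) + integrand(phi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

def jacobi_cn(eta, k, iterations=60):
    """Real-valued cn via Eq. (9): solve F(phi, k) = |eta| for phi, then return cos(phi)."""
    target = abs(eta)  # cn is an even function of its first argument
    # The integrand lies between 1 and 1/sqrt(1 - k^2), which brackets the solution.
    lo, hi = target * math.sqrt(1.0 - k * k), target
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if ellipf(mid, k) < target:
            lo = mid
        else:
            hi = mid
    return math.cos(0.5 * (lo + hi))

k = 1.0 / math.sqrt(2.0)
quarter_period = ellipf(math.pi / 2.0, k)   # complete integral for modulus 1/sqrt(2)
cn_at_quarter_period = jacobi_cn(quarter_period, k)
```

When k = 0 the integrand is identically 1, so F(φ, 0) = φ and cn reduces to the ordinary cosine. For k = 1/√2, cn vanishes at the complete integral F(π/2, k), which is consistent with Eq. (9) because cos(π/2) = 0.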

A Fundamental Conformal Map Without getting into the nitty-gritty details of the Schwarz-Christoffel mapping, Fig. 13 shows a fundamental conformal map between the circular disc and the square in the complex plane. This mapping can be derived by simplifying the Schwarz-Christoffel integral for the square and using the doubly periodic nature of the Jacobi elliptic function cn on the complex plane. In essence, one could map every point


Fig. 13 A conformal map between the disc and square in the complex plane

inside the unit disc to a square region conformally by just an evaluation of the complex-valued Jacobi elliptic function cn. Furthermore, the inverse of the mapping can be calculated using the incomplete Legendre elliptic integral of the first kind F .

Canonical Alignment The main drawback of the diagram on the complex plane is that the x and y coordinates are not in the canonical mapping space. Figure 13 shows a square with corner coordinates in terms of a constant K_e instead of the ±1 that is desired. Moreover, the square is tilted by 45° and off-center from the origin. In order to get this mapping into the canonical mapping space, one needs to perform a series of affine transformations on the square. These include centering the square to the origin and scaling it down to have a side length value of 2. In order to do this, one has to introduce a rotational factor for the 45° tilt as well as K_e offsets and scale factors. This is exactly what happens in the explicit equations for the Conformal Square mapping. Basically, this is the canonized mapping equation in the complex plane

$$
z = 1 - i - \frac{\sqrt{-2i}}{K_e}\, F\!\left(\cos^{-1}\!\left(w \sqrt{i}\right),\ \tfrac{1}{\sqrt{2}}\right) \tag{10}
$$

and its inverse is

$$
w = \sqrt{-i}\; \operatorname{cn}\!\left(K_e\, z\, \frac{\sqrt{2i}}{2} - K_e,\ \tfrac{1}{\sqrt{2}}\right) \tag{11}
$$

where z is the complex number x + y i and w is the complex number u + v i.


It is probably appropriate to explain the √(±i) factors that appear throughout the equations. These multiplicative constants are just a compact way of representing the ±45° rotational adjustments needed to align the equations to the canonical mapping space in Fig. 2. In the complex plane, rotation can be done simply by multiplication with the complex number e^{iθ}. For 45° rotation, the multiplicative factor is

$$
e^{i\pi/4} = \cos\frac{\pi}{4} + i \sin\frac{\pi}{4} = \frac{1}{\sqrt{2}}(1 + i) = \sqrt{i} \tag{12}
$$

For −45° rotation, the multiplicative factor is

$$
e^{-i\pi/4} = \cos\frac{\pi}{4} - i \sin\frac{\pi}{4} = \frac{1}{\sqrt{2}}(1 - i) = \sqrt{-i} \tag{13}
$$
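These identities are easy to confirm with Python's standard `cmath` module, whose `sqrt` returns the principal square root. The short check below is only an illustration; the variable names are arbitrary.

```python
import cmath
import math

rot_plus_45 = cmath.exp(1j * math.pi / 4.0)    # e^{i*pi/4}
rot_minus_45 = cmath.exp(-1j * math.pi / 4.0)  # e^{-i*pi/4}

sqrt_i = cmath.sqrt(1j)         # principal square root of i
sqrt_minus_i = cmath.sqrt(-1j)  # principal square root of -i

# Multiplying by sqrt(i) rotates a point counterclockwise by 45 degrees.
rotated = sqrt_i * (1.0 + 0.0j)
rotated_angle_degrees = math.degrees(cmath.phase(rotated))

# The factor sqrt(-2i) is a -45 degree rotation combined with a scaling by sqrt(2).
sqrt_minus_2i = cmath.sqrt(-2j)  # equals 1 - i
```

The last line shows why a factor such as √(−2i) can stand in for both a rotation and a scale change in a single constant.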

Software Implementation In order to implement this mapping on a computer, one needs to be able to calculate special functions such as cn and F. Source code for numerical computation of these functions is readily available in open-source libraries such as Boost and the GNU Scientific Library. Also, there is a reference implementation appearing in the popular Numerical Recipes book (Press et al., 1992). One possible pitfall in software implementation is that the mapping requires complex-valued versions of the special functions. These complex variants are typically not included in open-source libraries. Nonetheless, these complex variants are well-defined mathematically. The formulas for computing complex-valued cn and F are given by L.M. Milne-Thomson in the classic AMS-55 reference book (Abramowitz and Stegun, 1972). It is worth mentioning here that there are well-established fast and robust algorithms for computing the F and cn special functions (Carlson, 1977). As a matter of fact, Gauss has shown that elliptic integrals can be calculated quickly using the arithmetic-geometric mean (Hancock, 1958). Suffice it to say that these special functions can be calculated about as fast as standard trigonometric functions.
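As an illustration of the arithmetic-geometric mean (AGM) approach, the sketch below computes the complete elliptic integral of the first kind, K(k) = F(π/2, k), from the classical identity K(k) = π / (2 · AGM(1, √(1 − k²))). The function names are hypothetical. The sketch also assumes that the constant K_e used in the mapping equations is this complete integral evaluated at the modulus 1/√2, which is an assumption made here for illustration.

```python
import math

def agm(a, b, max_iterations=40):
    """Arithmetic-geometric mean of two positive numbers."""
    for _ in range(max_iterations):
        if abs(a - b) <= 1e-15 * abs(a):
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a

def complete_elliptic_k(k):
    """Complete elliptic integral of the first kind for modulus 0 <= k < 1."""
    return math.pi / (2.0 * agm(1.0, math.sqrt(1.0 - k * k)))

# Assumed value of the constant K_e: the complete integral at modulus 1/sqrt(2).
K_e = complete_elliptic_k(1.0 / math.sqrt(2.0))
```

The AGM iteration converges quadratically, so only a handful of square roots are needed to reach double precision, which supports the remark that these functions cost about as much as standard trigonometric functions.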

A Complex Class of Squircles The Fernandez-Guasti squircle was used as groundwork in the development of the Tapered2 Squircular mapping. This section will discuss using a similar idea but acting in reverse to derive another type of squircle. Specifically, one can start with the Conformal Square mapping and come up with a different type of squircle called the complex squircle. Let us revisit the squircular continuum in Fig. 5 which was originally discussed in the derivation of the Tapered2 Squircular mapping. By working backward this time and starting from the Conformal Square mapping, it is possible to derive equations for a curve that is an intermediate shape between the circle and the square.


Fig. 14 The squircular continuum revisited

Figure 14 shows a circular disc subdivided into concentric rings of different colors. This circular disc is then mapped to a square via the Conformal Square mapping. The resulting square is still subdivided by concentric rings, but the enclosing shapes are not quite circular. Consequently, it is logical to ask what sort of shape encompasses the boundaries of the concentric rings inside the square. One can then intuitively surmise from Fig. 14 that the concentric shapes inside the square are some sort of squircular curve. Indeed, these concentric shapes exhibit all of the defining characteristics of a hybrid curve between the circle and the square. Therefore, it makes sense to classify them as yet another type of squircle: the complex squircle. The complex squircle can be written in parametric form by applying the Conformal Square mapping to the parametric equation of the circle. In the complex plane, the parametric form of the unit circle is e^{it}, with 0 ≤ t ≤ 2π. By applying the Conformal Square mapping on the circle, one can come up with a complex-valued function ψ that serves as an auxiliary representation of the complex squircle.

$$
\psi(t, q) = 1 - i - \frac{\sqrt{-2i}}{K_e}\, F\!\left(\cos^{-1}\!\left(q\, e^{it} \sqrt{i}\right),\ \tfrac{1}{\sqrt{2}}\right) \tag{14}
$$

ψ is a two-parameter complex function that is based on the disc-to-square equation of the Conformal Square mapping. The first argument t is used for curve parametrization. The second argument q is just a squareness parameter analogous to the s parameter of the Fernandez-Guasti squircle. Using this ψ function, the complex squircle can be defined as the curve arising from these parametric equations:

$$
\begin{aligned}
x(t) &= \frac{r}{\psi(0, q)}\, \Re\left[\psi(t, q)\right] \\
y(t) &= \frac{r}{\psi(0, q)}\, \Im\left[\psi(t, q)\right]
\end{aligned}
\tag{15}
$$


Fig. 15 The complex squircle at varying squareness values

Table 1 Three different types of squircles discussed in this chapter

| Name | Squareness | Equation | Key property |
|---|---|---|---|
| Fernandez-Guasti squircle | s ∈ [0, 1] | $x^2 + y^2 - \frac{s^2}{r^2} x^2 y^2 = r^2$ | Quartic polynomial equation |
| Lamé squircle | p ∈ [2, ∞) | $\lvert x \rvert^p + \lvert y \rvert^p = r^p$ | Unbounded polynomial power |
| Complex squircle | q ∈ [0, 1] | $\psi(t, q) = 1 - i - \frac{\sqrt{-2i}}{K_e} F\left(\cos^{-1}(q e^{it} \sqrt{i}), \frac{1}{\sqrt{2}}\right)$; $x(t) = \frac{r}{\psi(0,q)} \Re[\psi(t, q)]$, $y(t) = \frac{r}{\psi(0,q)} \Im[\psi(t, q)]$ | Complex parametric equations with parameter t ∈ [0, 2π] |

As with the other types of squircles previously discussed, there are two parameters for the complex squircle: q and r. The squareness parameter q allows this shape to interpolate between the circle and the square. When q = 0, the equation produces a circle with radius r. When q = 1, the equation produces a square with a side length of 2r. In between, the equation produces a smooth planar curve that resembles both the circle and the square. This is shown in Fig. 15. In summary, mathematical shapes known as squircles play an important role in the development of disc-to-square mappings. This chapter discussed three different types of squircles along with associated mappings. These squircles are summarized in Table 1.

Application: Squaring the Poincaré Disk This section will discuss how to convert circular Escheresque artwork into squares. Figure 16 shows an example of a circular tiling with interlocking angels and devils. The pattern on the right was inspired by M.C. Escher’s famous Circle Limit IV (1960). The pattern on the left is its conversion to a square. In order to delve more into Escheresque artworks (Dunham, 2009), some mathematical background in non-Euclidean geometry is necessary. In particular, we need to discuss a type of non-Euclidean geometry called hyperbolic geometry. To make things simple, we shall restrict ourselves to a two-dimensional construct of hyperbolic geometry called the hyperbolic plane. One can intuitively think of the hyperbolic plane as a surface with negative curvature everywhere (Taimina, 2009). In contrast, the Euclidean plane is a flat surface with zero curvature everywhere.


Fig. 16 A circular angels & devils pattern converted to a square

Fig. 17 Playfair’s axiom on the Euclidean plane (left) and the hyperbolic plane (right)

There is a long and storied history of non-Euclidean geometry that this chapter will only gloss over. Non-Euclidean geometry arises from the negation of Euclid’s fifth postulate, also known as the parallel postulate. A modern formulation of this postulate states that

In a plane, given a line and a point not on it, at most one line parallel to the given line can be drawn through the point.

This formulation is known as Playfair’s axiom. It is illustrated in the left diagram of Fig. 17. Meanwhile, hyperbolic geometry arises when Playfair’s axiom is overturned with the following statement:

In a hyperbolic plane, given a line and a point not on it, there are several lines parallel to the given line that can be drawn through the point.


In order to qualify this statement in hyperbolic geometry, one needs to have well-defined and consistent notions of points, lines, and parallelism in the hyperbolic plane. First, let us introduce the Poincaré disk. Mathematicians have come up with many different models of the hyperbolic plane in order to study hyperbolic geometry. The most popular model is probably the Poincaré disk. The Poincaré disk can be considered as some sort of projection of the hyperbolic plane onto a unit circular disc in the Euclidean plane. It has several interesting properties that make it desirable for representing the hyperbolic plane. One important property of the Poincaré disk is that it presents the entire hyperbolic plane within the confines of a finite circular disc in ℝ². Every point in the hyperbolic plane is represented by a unique point inside the Poincaré disk. This makes it easy to visualize the entire hyperbolic plane within the model.

Another important property of the Poincaré disk is conformality. In a conformal model, the hyperbolic measure of angle is the same as the Euclidean measure of angle. In other words, this hyperbolic model does not distort angles. One can measure hyperbolic angles between geometric entities inside the Poincaré disk by simply making Euclidean angular measurements. Unfortunately, the same does not hold true for hyperbolic distance. Hyperbolic distances are greatly distorted in the Poincaré disk. In the Poincaré disk model, the origin at (0, 0) lies at the center of the circular disc. The bounding circle at the rim of the disc is infinitely far away from the origin. The hyperbolic distance of a point away from the origin gets larger and larger as the point gets closer to the bounding circle. As a consequence of this, geometric entities appear smaller as they approach the bounding circle.

One can think of the Poincaré disk in analogy to the Cartesian coordinate system. Just like the Cartesian coordinate system faithfully models the Euclidean plane, the Poincaré disk is a meaningful representation of the hyperbolic plane. The Cartesian coordinate system allows one to convert geometric statements about the Euclidean plane into algebraic equations. In a similar manner, the Poincaré disk allows one to do the same for the hyperbolic plane. Table 2 lists various hyperbolic entities along with their Euclidean counterparts from within the Poincaré disk model. Using this table, one can make sense of Playfair’s axiom in the hyperbolic plane.

Table 2 Hyperbolic entities in the Poincaré disk

| Geometric entity | Corresponding feature inside the Poincaré disk |
|---|---|
| Hyperbolic point | This corresponds to a Euclidean point in the interior of the bounding circle of the Poincaré disk model |
| Hyperbolic line | This corresponds to a circular arc orthogonal to the bounding circle. Circular arcs can be part of circles of arbitrary radius |
| Hyperbolic angle | The angle between two hyperbolic lines is the same as the Euclidean angle between their corresponding circular arcs inside the Poincaré disk model |
| Hyperbolic parallelism | Hyperbolic lines are parallel if and only if they do not intersect in the interior of the Poincaré disk model |
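The distortion of hyperbolic distance described earlier can be made quantitative. The sketch below uses the standard distance formula for the Poincaré disk under the common normalization of constant curvature −1, a normalization the chapter does not state and which is assumed here. The function name `poincare_distance` is hypothetical.

```python
import math

def poincare_distance(p, q):
    """Hyperbolic distance between two points strictly inside the unit disc (curvature -1)."""
    (px, py), (qx, qy) = p, q
    pp = px * px + py * py
    qq = qx * qx + qy * qy
    if pp >= 1.0 or qq >= 1.0:
        raise ValueError("both points must lie strictly inside the unit disc")
    euclid_sq = (px - qx) ** 2 + (py - qy) ** 2
    return math.acosh(1.0 + 2.0 * euclid_sq / ((1.0 - pp) * (1.0 - qq)))

origin = (0.0, 0.0)
# Distance from the origin grows without bound as the Euclidean radius approaches 1.
distances = [poincare_distance(origin, (r, 0.0)) for r in (0.5, 0.9, 0.99, 0.999)]
```

For a point at Euclidean radius r, this formula reduces to 2·artanh(r) = ln((1 + r)/(1 − r)), which diverges as r approaches 1 and so agrees with the statement that the bounding circle is infinitely far from the origin.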


Fig. 18 The three regular Euclidean tilings

Fig. 19 Possible pentagonal, heptagonal, and octagonal tilings of the Poincaré disk

Hyperbolic Tilings It is well-known that only three types of regular polygons can tile the Euclidean plane: namely, the triangle, the square, and the hexagon. These tilings are shown in Fig. 18. It is less well known that all regular polygons can tile the hyperbolic plane! To illustrate this, Fig. 19 shows the Poincaré disk tiled with hyperbolic pentagons, heptagons, and octagons. The stop sign is probably the most common octagonal shape that people encounter in their daily lives. It was ratified as an international standard by the United Nations in 1968. Being a regular octagon, the stop sign can tile the entire hyperbolic plane. This is shown in Fig. 20 as a Poincaré disk pattern. Furthermore,


Fig. 20 Hyperbolic stop signs

it is possible to convert this pattern into squares by using any of the three mappings discussed in this chapter. The most relevant and appropriate way to convert the Poincaré disk to a square is via the Conformal Square mapping. There is a very simple reason for this: conformality. Recall that the Poincaré disk model is a conformal model of the hyperbolic plane. It follows that the conformal square is a natural extension of the Poincaré disk to a square region. In essence, undistorted Euclidean angles in the Poincaré disk remain intact in the conformal square. The two other mappings can also be used to convert the Poincaré disk to a square, but the results are less than stellar. For the Elliptical Grid mapping, the corners


Fig. 21 Hyperbolic tilings mapped to the conformal square

of the square appear as a muddled mess. Meanwhile, the Tapered2 Squircular mapping produces fairly decent results, but there is significant shape distortion of the octagons. To summarize, the conformal square is the natural extension of the Poincaré disk to a square. In order to further illustrate this, Fig. 21 shows the conformal square versions of the hyperbolic tilings from Fig. 19.
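The earlier claim about which regular polygons tile which plane can be checked with a short calculation. A regular tiling with q regular p-gons meeting at every vertex is Euclidean exactly when (p − 2)(q − 2) = 4, hyperbolic when the product exceeds 4, and spherical when it is smaller. This is a standard criterion that follows from comparing q interior angles with a full turn, and it is not taken from the chapter. The function name below is hypothetical.

```python
def classify_regular_tiling(p, q):
    """Classify the regular tiling with q regular p-gons meeting at each vertex."""
    if p < 3 or q < 3:
        raise ValueError("need p >= 3 sides and q >= 3 polygons per vertex")
    product = (p - 2) * (q - 2)
    if product == 4:
        return "Euclidean"
    return "hyperbolic" if product > 4 else "spherical"

# Search a small range of (p, q) pairs for Euclidean tilings.
euclidean_tilings = [
    (p, q)
    for p in range(3, 13)
    for q in range(3, 13)
    if classify_regular_tiling(p, q) == "Euclidean"
]

# Every regular p-gon admits a hyperbolic tiling, for example with q = 7 per vertex.
all_polygons_tile_hyperbolically = all(
    classify_regular_tiling(p, 7) == "hyperbolic" for p in range(3, 101)
)
```

The search returns only the triangle, square, and hexagon cases, while pentagons, heptagons, and octagons (the stop sign included) fall on the hyperbolic side for suitable q.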

Application: Elliptification of Rectangular Imagery Most of the world’s photographs are rectangular. However, one might want to convert them into circular or elliptical images for artistic reasons. This section will extend the previously discussed square-to-disc mappings to handle ellipses and rectangles. After which, one can apply these mappings to rectangular artworks to produce elliptical ones. Henceforth, this process of molding a rectangle into an ellipse will be referred to as elliptification. Elliptification is very simple if one already has a square-to-disc mapping. Elliptification can be done by simply removing the eccentricity and then reintroducing it back after the square-to-disc mapping is performed. This procedure is shown in the top diagram in Fig. 22 along with an illustrative example showing the elliptification of the United States flag at the bottom. The artistic process of elliptification is not new. For centuries, artists would paint on oval-shaped canvases, and photographers would cut their pictures into oval regions. Creating oval-shaped artworks is an important form of artistic expression and stylization. This has traditionally been done by just cropping or cutting out the corner regions of the artwork. This chapter promotes the idea of using explicit mathematical mappings to create oval-shaped artworks. This is possible using any of the square-to-disc mappings covered in this chapter and related papers (Fong, 2014, 2019; Shirley and Chiu, 1997). To demonstrate this effect, Figs. 23 and 24 show Edouard Manet’s last major work “A Bar at the Folies-Bergere” (1882) converted into an oval region using


Fig. 22 Elliptification: converting rectangular imagery to oval regions

Fig. 23 Manet’s “A Bar at the Folies-Bergere” (1882) and its cropped version


Fig. 24 Elliptification of Manet’s famous impressionist painting

the different mappings discussed in this chapter. The traditional process of simply cropping the picture to produce an oval region is quite unacceptable because it removes many noteworthy features near the corners of the painting. For example, observe that in Manet’s cropped painting, the bar patron with the top hat is gone. Also, the dangling legs of the trapeze artist at the top left corner of the painting are nowhere to be found. In contrast, the mathematically mapped paintings keep all these features intact, albeit distorted.
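The elliptification procedure described in this section (remove the eccentricity, apply a square-to-disc mapping, then reintroduce the eccentricity) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, coordinates are assumed to be measured from the center of the rectangle, and the Elliptical Grid mapping is used simply because its forward equations were derived above. Any other square-to-disc mapping could be passed in instead.

```python
import math

def square_to_disc(x, y):
    """Elliptical Grid square-to-disc mapping, used here as the default."""
    return (x * math.sqrt(1.0 - y * y / 2.0), y * math.sqrt(1.0 - x * x / 2.0))

def elliptify(px, py, half_width, half_height, mapping=square_to_disc):
    """Map a point of the rectangle [-a, a] x [-b, b] into the ellipse with semi-axes a, b."""
    # Step 1: remove the eccentricity by normalizing to the canonical square.
    x, y = px / half_width, py / half_height
    # Step 2: apply the square-to-disc mapping.
    u, v = mapping(x, y)
    # Step 3: reintroduce the eccentricity.
    return (u * half_width, v * half_height)

# The top-right corner of a 6 x 4 rectangle lands on the bounding ellipse.
ex, ey = elliptify(3.0, 2.0, half_width=3.0, half_height=2.0)
on_ellipse = (ex / 3.0) ** 2 + (ey / 2.0) ** 2
```

When resampling an actual image, one would typically loop over pixels of the output ellipse and apply the inverse (disc-to-square) mapping to find the source pixel, so an implementation for imagery would need the inverse equations as well.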


Size Versus Shape Distortions In differential geometry, two-dimensional distortion is generally categorized into two major types: size and shape distortions (Floater and Hormann, 2005). The elliptification of Manet’s painting highlights the difference between these two types of distortion. The stretched Schwarz-Christoffel mapping, which is based on the Conformal Square mapping, has significant size distortions near the corners. This is quite evident from the dangling legs of the trapeze artist, which have shrunk to the point of being barely visible. In contrast, the Elliptical Grid mapping has significant shape distortions near the corners. Specifically, the gentleman with a top hat appears considerably deformed. On the other hand, the Tapered2 Squircular mapping offers a good compromise between size and shape distortions. It provides the best result among the three mappings for Manet’s painting. In fact, the circularized chessboard shown in Fig. 1 also uses the Tapered2 Squircular mapping. One can observe that the size and shape fidelity of the corner rooks is reasonably intact.

Conclusion
This chapter discussed three explicit methods for mapping the circular disc to a square and vice versa. In addition, some artistic applications of these mappings were provided. For hyperbolic art, the Conformal Square mapping gives the best results. However, for other artistic applications such as elliptification, this is not necessarily the case because the mapping has sizeable distortions near the four corners.

Cross-References
• Geometric and Aesthetic Concepts based on Pentagonal Structures
• Mathematics and Art: Unifying Perspectives
• Tessellated, Tiled, and Woven Surfaces in Architecture
• The Beauty of Blaschke Products

References
Abramowitz M, Stegun I (1972) Handbook of mathematical functions. Dover Publications Inc., New York
Carlson B (1977) Special functions of applied mathematics. Academic, New York
Dunham D (2009) Hamiltonian paths and hyperbolic patterns. Contemp Math 479:51–65
Fernandez-Guasti M (1992) Analytic geometry of some rectilinear figures. Int J Math Educ Sci Technol 23:895–901
Floater M, Hormann K (2005) Surface parameterization: a tutorial and survey. In: Dodgson N, Floater M, Sabin M (eds) Advances in multiresolution for geometric modelling. Springer, New York, pp 157–186
Fong C (2014) Analytical methods for squaring the disc. In: Seoul ICM 2014
Fong C (2019) Elliptification of rectangular imagery. In: Joint mathematics meeting SIGMAA-ARTS
Frederick C, Schwarz E (1990) Conformal image warping. IEEE Comput Graph Appl 10(2):54–61
Hancock H (1958) Elliptic integrals. Dover Publications Inc., New York
Langer J, Singer D (2011) The lemniscatic chessboard. Forum Geometricorum 11:183–199
Nowell P (2005) Mapping a square to a circle (blog). http://mathproofs.blogspot.com/2005/07/mapping-square-to-circle.html
Press W, Flannery B, Teukolsky S, Vetterling W (1992) Numerical recipes in C. The art of scientific computing, 2nd edn. Cambridge University Press, Cambridge
Rice A, Brown E (2012) Why ellipses are not elliptic curves. Math Mag 85(3):163–176
Shirley P, Chiu K (1997) A low distortion map between disk and square. J Graph Tools 2:45–52
Taimina D (2009) Crocheting adventures with hyperbolic planes. AK Peters, Wellesley

7

A Visual Overview of Coprime Numbers Benjamín A. Itzá-Ortiz, Roberto López-Hernández, and Pedro Miramontes

Contents
Introduction
Coprime Numbers and Skew Sturmian Sequences
Bézout Coefficients
Ford Circles and Farey Sequences
Bézout Graphs
Conclusion
References

Abstract Some examples of geometric constructions based on prime numbers are presented. After that, a novel visualization of coprime numbers in the Cartesian plane based on Bézout coefficients is introduced. Using the classification of skew Sturmian sequences as a departing point, it becomes natural to select certain subsets of coprime numbers which contain the dynamical information of skew Sturmian sequences up to conjugacy. When plotting these sets, astonishing geometric structures emerge. Among them are some parabolic arcs; their parametric representations are given.

B. A. Itzá-Ortiz · R. López Hernández
Universidad Autónoma del Estado de Hidalgo, Pachuca, Mexico
e-mail: [email protected]
P. Miramontes
Universidad Nacional Autónoma de México, Mexico City, Mexico
e-mail: [email protected]
© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_139


Keywords Coprime numbers · Bézout coefficients · Skew Sturmian sequence · Quadratic arc

Introduction
Prime numbers have fascinated humanity since they were first systematically studied in ancient Greece around the sixth century before our era. The Pythagorean school was particularly attracted by the properties of prime numbers and, in accordance with its cosmogony, gave them several mystic interpretations. A couple of centuries later, Euclid studied them and wrote in his Elements the first proof of the infinitude of prime numbers, as well as many of their other interesting properties. Later on, Eratosthenes of Cyrene proposed what in modern language could be called an algorithm to find prime numbers: the celebrated sieve of Eratosthenes, which finds all prime numbers up to a given limit simply by discarding the composite numbers formed by multiples of each prime, beginning with the multiples of 2. Many centuries had to pass until, in the sixteenth and seventeenth centuries, prime numbers again captured mathematicians’ attention. Fermat, Girard, Euler, and in particular Mersenne are just a few of the scholars who devoted their efforts to unveiling the properties of prime numbers. Nonetheless, the connection between prime numbers and geometry was not systematically explored. A possible exception is the Gauss-Wantzel theorem about constructible polygons: a regular polygon with n sides can be constructed with compass and straightedge if and only if n is the product of a power of 2 and any number of distinct Fermat primes. A Fermat prime is a prime number of the form $2^{2^k} + 1$.

Unfortunately, the field relating prime numbers and geometry has been largely disregarded by scholars. However, there are some beautiful examples. For instance, in 1964 S.M. Ulam, M.L. Stein, and M.B. Wells (Stein et al, 1964) proposed an ingenious way to depict the prime numbers over the plane, known today as the Ulam spiral. In their words:

Suppose we number the lattice points in the plane (1), e.g., Fig. 1 by starting at (0,0) and proceeding counterclockwise in a spiral so that (0, 0) → 1, (1, 0) → 2, (1, 1) → 3, (0, 1) → 4, (−1, 1) → 5, (−1, 0) → 6, (−1, −1) → 7, (0, −1) → 8, (1, −1) → 9, (2, −1) → 10, (2, 0) → 11, (2, 1) → 12, (2, 2) → 13, etc.
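The numbering quoted above can be reproduced with a short sketch. It walks the lattice in runs of length 1, 1, 2, 2, 3, 3, ..., turning counterclockwise after each run, which matches the thirteen assignments listed in the quotation. The function names are illustrative, and trial division is used only to keep the example self-contained; it is not claimed to be how the original figure was computed.

```python
def is_prime(n):
    """Trial-division primality test (adequate for small indices)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def ulam_coordinates(n_max):
    """Return {index: (x, y)} for the spiral numbering 1..n_max."""
    coords = {1: (0, 0)}
    x, y = 0, 0
    dx, dy = 1, 0          # first move goes from (0, 0) to (1, 0)
    n, run = 1, 1
    while n < n_max:
        for _ in range(2):              # two runs share each run length
            for _ in range(run):
                if n == n_max:
                    return coords
                x, y = x + dx, y + dy
                n += 1
                coords[n] = (x, y)
            dx, dy = -dy, dx            # counterclockwise quarter turn
        run += 1
    return coords

def prime_points(n_max):
    """Lattice points whose spiral index is prime."""
    return [xy for n, xy in ulam_coordinates(n_max).items() if is_prime(n)]
```

Plotting `prime_points(n_max)` for a large `n_max` with any plotting library should give a picture of the kind discussed next.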

After this indexing of the points with integer coordinates in the plane, Ulam and coworkers marked the points whose index was a prime number and called this set P. They noticed that P looked anything but random: the presence of apparent straight diagonal lines is noticeable (Fig. 1). Ulam and coworkers demonstrated a series of astonishing results derived from the figure; for instance, that the points over a


Fig. 1 Ulam spiral. The white dots along the diagonal straight lines correspond to points whose index is a prime number

Fig. 2 Klauber triangles with 5 rows (left) and 400 rows (right)

line are images of a quadratic form. Incidentally, the spiral shown in their paper was created by a MANIAC II electronic computing machine at Los Alamos Scientific Laboratory displaying up to 65,000 points. Before the Ulam spiral, a herpetologist had already proposed a graphical visualization of prime numbers: Laurence Monroe Klauber, an amateur mathematician, constructed the Klauber triangle. It is built by listing the natural numbers in rows, with 1 in the first row, the numbers 2, 3, and 4 in the second row, 5, 6, 7, 8, 9 in the third row, and so on. Next, highlight the positions where the prime numbers appear and delete (leave blank) the places for non-prime numbers. Figure 2 shows two Klauber triangles. As in the case of the Ulam spiral, certain patterns emerge in the Klauber triangle; straight lines seem to arise. And again, some quadratic formulas can be found to describe such lines in the triangle. Another interesting aspect of the triangle is that Klauber did not publish his result in a professional mathematics journal. Instead, he communicated his results in a presentation at the March Meeting of the Southern California Section of the Mathematical Association of America (Daus, 1932) and through letters addressed to the mathematician Eric T. Bell (Klauber, 1931). Another simple, but appealing, way of visualizing prime numbers is given by plotting the points (p, p), with p prime, in polar coordinates. Recall


Fig. 3 Prime numbers in a polar plot visualization; 10,000 points with (p, p) coordinates are plotted. See text

that a point (x, y) in the usual Cartesian coordinates may be represented in polar coordinates as (r, θ), where $r^2 = x^2 + y^2$, $\tan\theta = y/x$, and θ is measured in radians. The set of all points (p, p), with p a prime number, is then plotted in polar coordinates. A picture of this set is shown in Fig. 3. So far, some interesting links between prime numbers and geometry have been shown. Now, the focus of this essay will switch exclusively to coprime numbers. To our knowledge, perhaps the only example of a relation between coprime numbers and geometry is the set of Ford circles. Given two relatively prime integers p and q, the circle of radius $\frac{1}{2p^2}$ centered at $\left(\frac{q}{p}, \pm\frac{1}{2p^2}\right)$ is a Ford circle. Ford circles are tangent to the abscissa axis, and their interiors do not intersect (Fig. 4). The relation between Ford circles and Farey sequences will be explored below.
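The polar plot of the primes mentioned above is easy to reproduce. The sketch below computes the primes with a sieve of Eratosthenes and converts each polar point (r, θ) = (p, p) to Cartesian coordinates (p cos p, p sin p). The function names are illustrative assumptions; the plotting step itself is left out.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for n in range(2, math.isqrt(limit) + 1):
        if sieve[n]:
            # Discard the multiples of n, starting at n*n.
            for m in range(n * n, limit + 1, n):
                sieve[m] = False
    return [n for n, flag in enumerate(sieve) if flag]

def polar_prime_points(limit):
    """Cartesian coordinates of the polar points (r, theta) = (p, p)."""
    return [(p * math.cos(p), p * math.sin(p)) for p in primes_up_to(limit)]
```

Feeding the output of `polar_prime_points` to a scatter plot should give a picture of the kind shown in Fig. 3.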

Coprime Numbers and Skew Sturmian Sequences
In this section a way to relate coprime numbers with symbolic dynamics is developed. For this purpose a brief introduction to symbolic dynamics is given.


Fig. 4 Ford circles. In the following sections, they will be treated extensively

As a result of this relation, an outstanding subset of coprime numbers will exhibit intriguing quadratic arcs. Recall that two integers p and q are said to be coprime or relatively prime if their only common positive divisor is 1; this is often denoted gcd(p, q) = 1. From now on, strings of zeros and ones will be the basic material of this study. Define a word w to be a finite string of zeros and ones. The size or length of a word w, denoted |w|, is the number of symbols it contains. A word w of length n will be called an n-word. The concatenation of two words $w_1$ and $w_2$ is the new word $w_1 w_2$ obtained by writing the symbols of $w_1$ followed by those of $w_2$. A word w which begins with a 1 followed by a finite number of zeros will be called a cell. If w is a cell with n zeros, then w is said to be the n-cell. For example, 1 is the 0-cell, 10000 is the 4-cell, and so on. Notice that an n-cell is an (n + 1)-word. Let p and q be positive relatively prime numbers. A q/p-chain is a (p + q)-word defined as a concatenation of q cells $w_1 w_2 \cdots w_q$, and thus it contains a total of p zeros. For example, consider the cells $w_1 = w_2 = 100$ and $w_3 = 10$. Then the concatenation $w_1 w_2 w_3 = 10010010$ is a 3/5-chain. A useful way of thinking of a q/p-chain consists of starting with a single p-cell and then inserting q − 1 ones. For example, the 3/5-chain 10010010 shown before may be obtained by beginning with the 5-cell 100000 and then inserting a symbol 1 after its third and fifth symbols. The terminology just introduced is now going to be translated into geometrical information. Every coprime pair (p, q) is associated with the straight line of slope $\frac{q}{p}$ through the origin, and a (p + q)-word will be associated to it. For this purpose, start with the line $y = \frac{q}{p}x$ in the Cartesian plane, and consider the integer lattice superimposed on it.
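The vocabulary of words, cells, and chains translates directly into string operations. The following sketch builds cells and chains and reproduces the 3/5-chain of the example in both ways described above: as a concatenation of cells and as a 5-cell with ones inserted. All function names are illustrative.

```python
def cell(n):
    """The n-cell: a 1 followed by n zeros (an (n + 1)-word)."""
    return "1" + "0" * n

def chain(zero_counts):
    """Concatenate the cells with the given numbers of zeros."""
    return "".join(cell(n) for n in zero_counts)

def insert_ones(p, positions):
    """Start from the p-cell and insert a 1 after each listed position.

    Positions are 1-based and refer to symbols of the original p-cell.
    """
    out = []
    for i, symbol in enumerate(cell(p), start=1):
        out.append(symbol)
        if i in positions:
            out.append("1")
    return "".join(out)

# The 3/5-chain of the text: q = 3 cells holding p = 5 zeros in total.
example = chain([2, 2, 1])
```

Here `example` has length p + q = 8, and the insertion view `insert_ones(5, {3, 5})` produces the same word.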
In the p × q rectangle with vertices (0, 0), (p, 0), (p, q), (0, q), the line $y = \frac{q}{p}x$ intersects the lines of the grid in


p + q − 1 points $Q_1 = (x_1, y_1)$, $Q_2 = (x_2, y_2)$, ..., $Q_{p+q-1} = (x_{p+q-1}, y_{p+q-1})$. Notice that the last point of intersection, the vertex (p, q), is discarded, while the first point $Q_1$ is the origin. A point $Q_i$ will be called horizontal if $y_i$ is an integer or vertical if $x_i$ is an integer. The origin $Q_1 = (0, 0)$, where both coordinates are integers, is considered a double point $Q_1 = Q_0 Q_1$, that is, a horizontal point $Q_0$ followed by a vertical point $Q_1$. The other possible choice is to double it backwards as $Q_1 = Q_1 Q_0$, that is, a vertical point followed by a horizontal point; however, this produces an equivalent presentation of the periodic Sturmian sequences which are going to be introduced below. In this setting, a word is formed out of the sequence $Q_1, Q_2, \ldots, Q_{p+q-1} = Q_0 Q_1, Q_2, \ldots, Q_{p+q-1}$ by writing a 0 for each vertical point and a 1 for each horizontal point. Thus a $\frac{q}{p}$-chain has been created, and it will be denoted $P\left(\frac{q}{p}\right)$. Figure 5 shows the chain $P\left(\frac{3}{5}\right)$.

The $\frac{q}{p}$-chain $P\left(\frac{q}{p}\right)$ is known as a Sturmian chain. A bi-infinite periodic sequence can be formed by taking $P\left(\frac{q}{p}\right)$ and repeating it over and over. This sequence is a special case of the so-called cutting sequence defined in (Lothaire, 1997, pg. 48). It turns out that this periodic bi-infinite sequence is a Sturmian sequence. Sturmian sequences are characterized by the Sturmian property: any two chains containing the same number of cells either have the same number of zeros or one has exactly one more zero than the other (Morse and Hedlund, 1940). Let s be a bi-infinite sequence in the metric space $\{0, 1\}^{\mathbb{Z}}$ of all bi-infinite sequences. Define the subshift X generated by s as the closure of the orbit of s, that is, the subshift X is the closure of the set $\{\Sigma^n(s) : n \in \mathbb{Z}\}$, where $\Sigma : \{0, 1\}^{\mathbb{Z}} \to \{0, 1\}^{\mathbb{Z}}$ is the so-called shift map, defined by $\Sigma(s)_n = s_{n+1}$. The notation $\Sigma^n$ denotes the composition of Σ with itself n times for $n \ge 0$, and of the inverse shift with itself $|n|$ times for $n < 0$.
Since the subshift is a dynamical system, it is interesting to ask for the conjugacy class of a given subshift generated by s, that is to say, which are the bi-infinite sequences t in

Fig. 5 The $\frac{3}{5}$-chain 10010010 is an example of the geometric $\frac{3}{5}$-chain $P\left(\frac{3}{5}\right)$, obtained from labeling the line $y = \frac{3}{5}x$ as it cuts the grid of the integer lattice

$\{0, 1\}^{\mathbb{Z}}$ so that the subshift generated by t is conjugate to the subshift generated by s. Please refer to (Lind and Marcus, 1995) for an introduction to symbolic dynamics.

The subshifts X generated by periodic Sturmian sequences with period $P\left(\frac{q}{p}\right)$ are in fact well understood: by (Morse and Hedlund, 1940, Theorem 3.5), the only bi-infinite Sturmian subshifts conjugate to the subshift X are the periodic Sturmian sequences in the orbit of s. It turns out that one can insert an anomaly word into a periodic Sturmian sequence; the sequence will then no longer be periodic, while still preserving its Sturmian property. For example, the periodic Sturmian sequence s with period $P\left(\frac{3}{5}\right)$, that is, s = ···10010010···, can be converted to ···1001001010010010010··· or to ···100100101001010010010···, which are still Sturmian but no longer periodic: the anomaly words 100 and 10010, respectively, were inserted into the periodic Sturmian sequence. The resulting eventually periodic sequence is known as a skew Sturmian sequence of frequency $\frac{q}{p}$ (Itzá-Ortiz et al, 2016; Morse and Hedlund, 1940).

Given a periodic Sturmian sequence with period $P\left(\frac{q}{p}\right)$, there are exactly two different types of skew Sturmian sequences associated to it. Consider the line $y = \frac{q}{p}x$ as before, together with the following rule: introduce a 0 or a 1 every time this line intersects a vertical or a horizontal lattice line, respectively.
• For a type A skew Sturmian sequence with frequency $\frac{q}{p}$, introduce the word 10 at the origin and every time the line crosses an integer lattice point below the x-axis. On the other hand, introduce the word 01 at the vertex (p, q) of the rectangle as well as at all the integer lattice points that the line crosses above the x-axis.
• For a type B skew Sturmian sequence with frequency $\frac{q}{p}$, the opposite approach to that of type A is followed; that is, introduce the word 01 at the origin and every time the line crosses an integer lattice point below the x-axis. On the other hand, introduce the word 10 at the vertex (p, q) of the rectangle as well as at all the integer lattice points that the line crosses above the x-axis.
The above description of skew Sturmian sequences with frequency $\frac{q}{p}$ is shown in Fig. 6. While it is a fairly straightforward description, it may not be evident how

Fig. 6 Skew Sturmian sequences of type A (left) and type B (right)

to detect the anomaly word inserted in the periodic sequence of period $P\left(\frac{q}{p}\right)$. A second, equivalent way to describe both types of skew Sturmian sequences, one which does exhibit the inserted anomaly word, involves the Bézout coefficients of the coprime pair (p, q), which will be defined next.
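The geometric construction of the Sturmian chain in this section can be sketched as follows. The code lists the crossings of the line y = (q/p)x with the vertical and horizontal grid lines inside the p × q rectangle, sorts them by abscissa, writes 0 for vertical and 1 for horizontal crossings, and prefixes the doubled origin as 10. It also checks the Sturmian property on the periodic repetition of a chain. The function names, and the use of exact fractions for the sorting, are choices made for this illustration.

```python
from fractions import Fraction
from math import gcd

def sturmian_chain(q, p):
    """Chain P(q/p) read off the line y = (q/p) x over 0 <= x < p."""
    if p < 1 or q < 1 or gcd(p, q) != 1:
        raise ValueError("p and q must be positive and coprime")
    # Vertical crossings (x integer) give 0, horizontal ones (y integer) give 1.
    crossings = [(Fraction(i), "0") for i in range(1, p)]
    crossings += [(Fraction(j * p, q), "1") for j in range(1, q)]
    crossings.sort()  # no ties occur, because gcd(p, q) = 1
    # The origin is a double point: horizontal (1) followed by vertical (0).
    return "10" + "".join(symbol for _, symbol in crossings)

def zeros_per_cell(chain):
    """Number of zeros in each cell of a chain that starts with 1."""
    return [len(block) for block in chain.split("1")[1:]]

def has_sturmian_property(chain):
    """Check the property on the periodic repetition of the chain: any two
    runs of k consecutive cells differ by at most one zero."""
    zeros = zeros_per_cell(chain)
    n = len(zeros)
    for k in range(1, n + 1):
        sums = [sum(zeros[(i + j) % n] for j in range(k)) for i in range(n)]
        if max(sums) - min(sums) > 1:
            return False
    return True
```

For the pair (p, q) = (5, 3), `sturmian_chain(3, 5)` yields the 3/5-chain used as the running example, and `has_sturmian_property` accepts it, while a word such as 10101000 is rejected.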

Bézout Coefficients
The Bézout coefficients are widely used in elementary number theory; their definition is quite simple but, as explored in this essay, their consequences can be surprising, ranging from applications in cryptography to the visualizations presented here.

Definition 1 (Bézout coefficients). Given the coprime pair (p, q) of positive numbers, a and b are the Bézout coefficients of p and q of type A if a and b are the unique integers such that 0 < a ≤ p, 0 ≤ b ≤ q and they satisfy the Bézout equation
\[ aq - bp = 1. \]
In this case the notation A(p, q) = (a, b) is introduced. Similarly, c and d are the Bézout coefficients of p and q of type B if c and d are the unique integers such that 0 ≤ c ≤ p, 0 < d ≤ q and they satisfy the Bézout equation
\[ cq - dp = -1. \]
In this case, denote B(p, q) = (c, d). Furthermore, the flip map F is defined by F(p, q) = (q, p).

The following is a straightforward observation. The proof is left to the reader.

Proposition 1. If (p, q) is a pair of positive coprime numbers, then F(A(p, q)) = B(q, p) = B(F(p, q)).

Suppose that A(p, q) = (a, b) and B(p, q) = (c, d). Then to obtain the type A skew Sturmian sequence with frequency q/p, apply the rule of inserting the word 10 each time a lattice point is crossed by the piecewise line
\[
y = \begin{cases}
\frac{q}{p}\,x & \text{if } x \le 0 \\
\frac{b}{a}\,x & \text{if } 0 \le x \le a \\
\frac{q}{p}\,(x - a) + b & \text{if } x \ge a
\end{cases}
\]
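Definition 1 and Proposition 1 lend themselves to a quick numerical check. In the sketch below, the type A coefficient a is obtained as the inverse of q modulo p, and the type B pair is computed as (p − a, q − b); this last identity is not stated in the text but follows from subtracting the two Bézout equations, and it is used here as a convenience. The function names `bezout_A`, `bezout_B`, and `flip` are illustrative.

```python
from math import gcd

def bezout_A(p, q):
    """Type A coefficients: 0 < a <= p, 0 <= b <= q and a*q - b*p = 1."""
    if p < 1 or q < 1 or gcd(p, q) != 1:
        raise ValueError("p and q must be positive and coprime")
    # a is the inverse of q modulo p (three-argument pow, Python 3.8+);
    # for p = 1 the only admissible value is a = 1.
    a = 1 if p == 1 else pow(q, -1, p)
    b = (a * q - 1) // p
    return a, b

def bezout_B(p, q):
    """Type B coefficients: 0 <= c <= p, 0 < d <= q and c*q - d*p = -1."""
    a, b = bezout_A(p, q)
    # (p - a)*q - (q - b)*p = -(a*q - b*p) = -1
    return p - a, q - b

def flip(pair):
    """The flip map F(p, q) = (q, p)."""
    return pair[1], pair[0]
```

For the running example, `bezout_A(5, 3)` and `bezout_B(5, 3)` give (2, 1) and (3, 2), and `flip(bezout_A(p, q)) == bezout_B(q, p)` can be verified over any range of coprime pairs.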


And to obtain the type B skew Sturmian sequence with frequency q/p, apply the rule of inserting the word 01 each time a lattice point is crossed by the piecewise line
\[
y = \begin{cases}
\frac{q}{p}\,x & \text{if } x \le 0 \\
\frac{d}{c}\,x & \text{if } 0 \le x \le c \\
\frac{q}{p}\,(x - c) + d & \text{if } x \ge c
\end{cases}
\]
Using this last representation, it is possible to see which b/a-chain or d/c-chain turns out to be the anomaly word for the resulting skew Sturmian sequence. In Fig. 7 the lines for (p, q) = (5, 3) are shown. In this case A(5, 3) = (2, 1) and B(5, 3) = (3, 2). In fact, these are the only essentially different skew Sturmian sequences associated to the rational number q/p. More precisely, according to (Itzá-Ortiz et al, 2016, Corollary 3.6), if s is a skew Sturmian sequence of type A, then there is only one skew Sturmian sequence which is not in its orbit and is conjugate to s, namely, the skew Sturmian sequence with inverse frequency and opposite type. Figure 8 shows the pairs of conjugate skew Sturmian sequences. One may check that the conjugacy between a pair of skew Sturmian sequences is given by the exchange of symbols. Hence, to a pair (p, q) of coprime numbers there corresponds a skew Sturmian sequence with frequency $\frac{q}{p}$. To distinguish type A from type B skew Sturmian sequences with the same frequency $\frac{q}{p}$, associate to (p, q) either the pair A(p, q) or B(p, q). Hence, the sets
\[ A_{q/p} = \{(p, q), A(p, q)\} \quad \text{and} \quad B_{q/p} = \{(p, q), B(p, q)\} \]
are introduced. Each contains the complete dynamical information of a particular skew Sturmian sequence with frequency $\frac{q}{p}$, up to conjugacy. It is natural to

Fig. 7 Type A (left) and type B (right) skew Sturmian sequences with the same frequency 3/5

Fig. 8 Pairs of conjugate skew Sturmian sequences with different types and reciprocal frequencies. In one panel, the black line is type A with frequency 3/5 and the red line is type B with frequency 5/3; in the other, the black line is type B with frequency 3/5 and the red line is type A with frequency 5/3

introduce an equivalence relation defined by $A_{q/p} \sim B_{p/q}$ and $B_{q/p} \sim A_{p/q}$. Given the geometric nature of prime numbers, it is natural to wonder about the geometrical behavior of these equivalence classes of a given skew Sturmian sequence. To this purpose, introduce the subset
\[ Z_{q/p} = A_{q/p} \cup B_{p/q} \cup B_{q/p} \cup A_{p/q} \]
containing the (skew Sturmian) dynamical information associated to (p, q). Since $Z_{q/p}$ has cardinality eight, it seemed more interesting to plot the points of $Z_{q/p}$ when the points (p, q) belong to a certain line.

Ford Circles and Farey Sequences
Ford circles (Ford, 1938) are “geometric picturizations” of fractions. For a given rational number $\frac{q}{p}$ with p and q coprime numbers, the Ford circle associated to $\frac{q}{p}$ is given by the equation
\[ \left(x - \frac{q}{p}\right)^2 + \left(y - \frac{1}{2p^2}\right)^2 = \left(\frac{1}{2p^2}\right)^2. \]
A remarkable property of Ford circles is the following: if $\frac{Q}{P}$ is an adjacent fraction to $\frac{q}{p}$, that is to say, if |qP − pQ| = 1, then the Ford circles associated to $\frac{q}{p}$ and $\frac{Q}{P}$ are tangent. For example, the fractions $\frac{1}{2}$, $\frac{2}{3}$, and $\frac{4}{7}$ are adjacent to $\frac{3}{5}$. The Ford circles corresponding to these fractions are illustrated in Fig. 9.
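The tangency property can be verified with exact rational arithmetic. The sketch below represents the Ford circle of q/p by its center (q/p, 1/(2p²)) and radius 1/(2p²), and declares two circles tangent when the squared distance between their centers equals the squared sum of their radii. The function names are illustrative assumptions.

```python
from fractions import Fraction

def ford_circle(q, p):
    """Center (x, y) and radius of the Ford circle of q/p (upper half-plane)."""
    r = Fraction(1, 2 * p * p)
    return Fraction(q, p), r, r

def squared_gap(f1, f2):
    """Squared center distance minus squared sum of radii.

    Zero means tangent, positive means disjoint, negative means overlapping.
    """
    x1, y1, r1 = ford_circle(*f1)
    x2, y2, r2 = ford_circle(*f2)
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 - (r1 + r2) ** 2

def are_adjacent(f1, f2):
    """Adjacency test |qP - pQ| = 1 for fractions given as (q, p) pairs."""
    (q, p), (Q, P) = f1, f2
    return abs(q * P - p * Q) == 1
```

For the fractions of the example, `squared_gap((3, 5), (1, 2))`, `squared_gap((3, 5), (2, 3))`, and `squared_gap((3, 5), (4, 7))` all vanish, in agreement with `are_adjacent`.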

Fig. 9 Ford circle for $\frac{3}{5}$, in red, and the Ford circles of its adjacent fractions $\frac{1}{2}$, $\frac{4}{7}$, and $\frac{2}{3}$. Notice that the Ford circles are an infinite family of circles tangent to each other

Ford proved (Ford, 1938, Theorem 4) that any such irreducible fraction $\frac{q}{p}$ with |p| > 1 has exactly two adjacent fractions with denominators smaller than p. It follows that A(p, q) = (a, b) and B(p, q) = (c, d) are the only two pairs with 0 < a, c < p such that both $\frac{b}{a}$ and $\frac{d}{c}$ are the closest adjacent fractions to $\frac{q}{p}$. Recall that the p-th Farey sequence consists of all proper reduced fractions with denominator up to p. Hence, A(p, q) and B(p, q) can be seen as the left and right neighbors, respectively, of $\frac{q}{p}$ in the p-th Farey sequence (Conway and Guy, 1996, Page 152). Recall that the mediant fraction of $\frac{b}{a}$ and $\frac{d}{c}$ is defined as $\frac{b+d}{a+c}$. Suppose that A(p, q) = (a, b) and B(p, q) = (c, d). Since $\frac{q}{p}$ is the mediant fraction of its neighbors, it follows that $\frac{q}{p} = \frac{b+d}{a+c}$ (in fact q = b + d and p = a + c), which implies 1 = aq − bp = a(b + d) − b(a + c) = ad − bc. Therefore the pairs (p, q), A(p, q), and B(p, q) have mutually adjacent associated fractions. Thus, the Ford circles for (p, q), A(p, q), and B(p, q) are mutually tangent, as shown in Fig. 9.
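The link between Bézout coefficients, Farey neighbors, and mediants can be tested by brute force for small denominators. The sketch below builds the p-th Farey sequence naively, looks up the neighbors of q/p, and compares them with b/a and d/c. The helper names are illustrative, the type B pair is computed as (p − a, q − b) (a consequence of the two Bézout equations), and the naive construction is only meant for small p.

```python
from fractions import Fraction
from math import gcd

def bezout_A(p, q):
    """Type A coefficients (a, b): 0 < a <= p, 0 <= b <= q, a*q - b*p = 1."""
    a = 1 if p == 1 else pow(q, -1, p)   # modular inverse, Python 3.8+
    return a, (a * q - 1) // p

def bezout_B(p, q):
    """Type B coefficients (c, d), obtained as (p - a, q - b)."""
    a, b = bezout_A(p, q)
    return p - a, q - b

def farey(n):
    """All reduced fractions in [0, 1] with denominator at most n, sorted."""
    return sorted({Fraction(k, m) for m in range(1, n + 1) for k in range(m + 1)})

def farey_neighbors(q, p):
    """Left and right neighbors of q/p (0 < q < p) in the p-th Farey sequence."""
    seq = farey(p)
    i = seq.index(Fraction(q, p))
    return seq[i - 1], seq[i + 1]
```

For q/p = 3/5 the neighbors are 1/2 and 2/3, which are b/a and d/c for A(5, 3) = (2, 1) and B(5, 3) = (3, 2), and the mediant (1 + 2)/(2 + 3) recovers 3/5.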

Bézout Graphs
Let p be a positive integer. In (Itzá-Ortiz et al, 2020), the Bézout coefficients of the coprime pair (p, q) were defined as $B_1(p, q) = A(p, q)$ and $B_{-1}(p, q) = B(p, q)$. Then the Bézout set $B_p$ for p is
\[ B_p = \bigcup Z_{q/p}. \]

0 bN > bq and so a > b, that is, 0 < a − b. It will be shown that a − b ≤ N − q or, equivalently, that q − b ≤ N − a. Since 0 < b < a, 0 ≤ q − b and 1 = aq − bN = a(q − b) − b(N − a):
\[ b(q - b) \le a(q - b) = 1 + b(N - a). \]
Since N > 1, the strict inequality holds in the last inequality. Hence q − b
2 be an integer. Let 0 < q < N be such that q and N are coprime. Let w be the smallest positive integer such that Nw is a perfect square. Let r and s be coprime such that $\frac{r}{s} = \frac{\sqrt{Nw}}{q-a-b}$. Let $u = r\sqrt{Nw}$. Then, as long as 0 < q + ut < N, A(N − q − ut, q + ut) belongs to the parametric parabola given by the expression:
\[ \gamma(t) = \left( a + \left(\frac{q-a-b}{N} - 1\right) ut + \frac{u^2}{N}\,t^2,\;\; b - \frac{q-a-b}{N}\,ut - \frac{u^2}{N}\,t^2 \right) \]

Proof. The product $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \gamma(t)$ is precisely the quadratic curve exhibited in the proof of (Itzá-Ortiz et al, 2020, Theorem 1).


As an example, let N = 1536 and q = 503. Then A(N − q, q) = A(1033, 503) = (306, 149) = (a, b). On the other hand, w = 6 and r = 2 since
\[ \frac{r}{s} = \frac{\sqrt{Nw}}{q-a-b} = \frac{96}{48} = 2. \]
Therefore u = 192 and so the parametric quadratic curve is
\[ \gamma(t) = (306 - 186t + 24t^2,\; 149 - 6t - 24t^2). \tag{1} \]
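The example can be checked numerically. The sketch below evaluates the general expression for γ(t), whose second component is b − ((q − a − b)/N)ut − (u²/N)t², with exact fractions for N = 1536, q = 503, u = 192, and compares it with the type A Bézout coefficients of (N − q − ut, q + ut) for t = −5/4, −1, ..., 5/4. The helper `bezout_A` is a hypothetical implementation of Definition 1 based on a modular inverse.

```python
from fractions import Fraction

def bezout_A(p, q):
    """Type A coefficients (a, b): 0 < a <= p, 0 <= b <= q, a*q - b*p = 1."""
    a = 1 if p == 1 else pow(q, -1, p)   # modular inverse, Python 3.8+
    return a, (a * q - 1) // p

N, q, u = 1536, 503, 192
a, b = bezout_A(N - q, q)                # the pair (a, b) of the example

def gamma(t):
    """Parametric parabola of the example, evaluated exactly."""
    k = Fraction(q - a - b, N)
    x = a + (k - 1) * u * t + Fraction(u * u, N) * t * t
    y = b - k * u * t - Fraction(u * u, N) * t * t
    return x, y

# Compare gamma(t) with A(N - q - ut, q + ut) for t in steps of 1/4.
matches = []
for j in range(-5, 6):
    t = Fraction(j, 4)
    shift = int(u * t)                   # u*t = 48*j is an integer here
    matches.append(bezout_A(N - q - shift, q + shift) == gamma(t))
```

Every entry of `matches` is true, so each of these Bézout pairs lies on the parabola.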

In Table 1, several values of t and the resulting Bézout coefficients of type A on the parabola γ(t) are listed; these points are illustrated in Fig. 16. It is worth noticing that although the parabola γ(t) in Equation (1) might be extended outside the first quadrant, only its values in the first quadrant are of interest

Table 1 For N = 1536, the points of the form A(N − q − ut, q + ut) in the right column belong to the parabola γ(t) of Equation 1

t       (N − q − ut, q + ut)    A(N − q − ut, q + ut) = γ(t)
5/4     (793, 743)              (111, 104)
1       (841, 695)              (144, 119)
3/4     (889, 647)              (180, 131)
1/2     (937, 599)              (219, 140)
1/4     (985, 551)              (261, 146)
0       (1033, 503)             (306, 149)
−1/4    (1081, 455)             (354, 149)
−1/2    (1129, 407)             (405, 146)
−3/4    (1177, 359)             (459, 140)
−1      (1225, 311)             (516, 131)
−5/4    (1273, 263)             (576, 119)

Fig. 16 The highlighted points are Bézout coefficients, some of which are listed in Table 1. They lie on the parabola γ(t) of Equation 1

within the framework of this chapter. For example, for the value t = 3 in the left column of Table 1 (not shown in the table for this reason), γ(3) = (−36, −85) is a point of the parabola (right column of Table 1) which corresponds to the coprime pair (N − q − 3u, q + 3u) = (457, 1079) (center column of Table 1). It follows that 1079(−36) − 457(−85) = 1, that is to say, they satisfy the Bézout equation for Bézout coefficients of type A (cf. Definition 1). However, A(457, 1079) = (−36, −85) + (457, 1079) = (421, 994) ≠ (−36, −85). A picture of a Bézout set together with several of these parabolas is shown in Fig. 17. An interesting outcome of Theorem 1 is that, since every pair of positive coprime numbers (m, n) may be seen as the Bézout coefficients (of type A) of a pair of positive coprime numbers (p, q), each pair of positive coprime numbers may be classified as belonging to a Bézout set $C_N$, for some N. A way of generating all pairs of positive coprime numbers using ternary trees was proposed and used to generate Pythagorean triplets in (Saunders and Randall, 1994). Another consequence of Theorem 1 can be stated in the context of the Bunyakovsky conjecture (Schinzel and Sierpinski, 1958, 1959), which provides conditions for a polynomial p(x) with integer coefficients to imply that the sequence p(1), p(2), . . ., p(n), . . . contains infinitely many primes. In this case, the parametrized parabola γ(t) = (p(t), q(t)) provides a pair of polynomials p(t) and q(t) with integer

Fig. 17 The solid color lines are the parabolas mentioned in Theorem 1. They are the loci of the points belonging to the Bézout set


coefficients producing infinitely many coprime pairs γ (ku) for all k ∈ Z and some rational number u.

Conclusion
This chapter contains a visual presentation of coprime numbers in the Cartesian plane. The motivation was given by the classification of skew Sturmian sequences up to conjugacy. Using this idea, a pair of coprime numbers has been associated with four pairs of Bézout coefficients. When the pair of coprime numbers varies along a given segment, the Bézout coefficients turn out to belong to certain quadratic arcs. Several pictures are presented to show the close relationship between mathematics and what could be called visual art. All the figures included in this essay were produced with Java and Python scripts. The code is available on request.

References
Conway JH, Guy RK (1996) The book of numbers. Springer, New York
Daus P (1932) The march meeting of the southern California section. Am Math Mon 39(7):373–374. https://doi.org/10.1080/00029890.1932.11987331
Fine B, Rosenberger G (2007) Number theory: an introduction via the distribution of primes. Birkhauser, Boston
Ford L (1938) Fractions. Am Math Mon 45:586–601
Itzá-Ortiz B, Malachi M, Marstaller A, Saied J, Underwook S (2016) Classification of eventually periodic subshifts. Indag Math 27:868–878
Itzá-Ortiz B, López-Hernández R, Miramontes P (2020) Digital images unveil geometric structures in pairs of relatively prime numbers. Math Intelligencer 42:30–35
Klauber LM (1931) Field notes, available via Archiv.org. https://archive.org/stream/1931fieldnotesla00klau#page/362/mode/1up. Accessed July 1st, 2020
Lind D, Marcus B (1995) An introduction to symbolic dynamics and coding. Cambridge University Press, Cambridge
Lothaire M (1997) Algebraic combinatorics on words. Cambridge University Press, Cambridge
Morse M, Hedlund G (1940) Symbolic dynamics II. Sturmian trajectories. Am J Math 62:1–42
Saunders R, Randall T (1994) The family tree of the Pythagorean triplets revisited. Math Gaz 78:190–193. https://doi.org/10.2307/3618576
Schinzel A, Sierpinski W (1958) Sur certaines hypotheses concernant les nombres premiers. Acta Arith 185–208
Schinzel A, Sierpinski W (1959) Corrigenda. Acta Arith 259
Stein ML, Ulam SM, Wells MB (1964) A visual display of some properties of the distribution of primes. Am Math Mon 71(5):516–520. http://www.jstor.org/stable/2312588

Almost All Surfaces Are Made Out of Hexagons Hyungryul Baik

Contents
Introduction
Closed Surfaces
Pants Decomposition
Hyperbolic Plane and Negative Curvature
Each Surface Admits More Than One Geometric Shape
References

Abstract In this chapter, we will see that almost all “finite” surfaces can be built out of hexagons whose sides are “straight lines” with respect to an appropriate geometry.

Keywords Surface · Hyperbolic geometry · Topology

Introduction
In this chapter, we will talk about surfaces. But what is a surface? Loosely speaking, a surface is a space which can be made by pasting together certain pieces of the 2-dimensional plane. In mathematics, especially in the subfield of mathematics called topology, the notions of a “piece” and the process of “pasting” are very precisely

H. Baik
Department of Mathematical Sciences, KAIST, Yuseong-gu, Daejeon, South Korea
e-mail: [email protected]
© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_111


8


defined, but one can imagine what a surface is pretty easily without a rigorous mathematical discussion, since in a very reasonable sense, everything we see in our daily lives is a surface (of some object). Imagine you are in front of the Eiffel Tower. We all know that the Eiffel Tower is not a 2-dimensional object (since it is not an infinitely thin layer), but unless you are Superman with X-ray vision, you cannot see the inside of each beam in the Eiffel Tower; you only see the outermost shell of the tower, which is the surface of the Eiffel Tower. To be able to talk about surfaces in a more systematic way, we will restrict ourselves to so-called closed surfaces and then discuss how almost all of them can be built out of hexagons. But even to begin to talk about what kinds of surfaces there are, we first need to understand which two surfaces are considered to be the same surface by mathematicians. For mathematicians, especially topologists, a mug and a donut are the same surface. Again, you may complain that a donut is not a 2-dimensional object, but remember that we are seeing only the surface of the donut, and that is all we consider. Since it is too long to say “the surface of a donut,” we will just say a donut to indicate its surface. A mug and a donut are the same in the sense that we can slowly change one into the other by stretching, compressing, and bending, without cutting or breaking it. Two such surfaces are said to be topologically equivalent, or homeomorphic. In the next section, we will consider surfaces up to topological equivalence and define a closed surface, and we will describe how to build a donut from a hexagon. Finally, we will show that all but one closed surface can be built from hexagons, not flat but negatively curved. We do not want to swamp readers with tons of references, so we minimized the list.
For various things mentioned in this article without a specific reference, one should be able to find details in either Farb and Margalit (2011) or Bredon (2013).

Closed Surfaces

A closed surface is a surface which can be obtained from the (2-dimensional) sphere by attaching so-called handles. Let me describe the process of attaching one handle more precisely. First, remove two disjoint small disks from the sphere. This creates a “boundary” of the surface, consisting of two circles. Now attach a cylinder to this surface by gluing each boundary circle of the cylinder to one of the boundary circles we just created by removing a disk. This attached cylinder is called a handle. One can easily convince oneself that, up to topological equivalence, the sphere with one handle is the donut. We can add more than one handle to the sphere. What would we get if we added an extra handle to the donut? This means we just remove four disjoint disks from the sphere and attach two cylinders. The resulting shape will have two holes, so if you make a figure 8 with some thickness, its surface is the surface we obtain by adding two handles to the sphere. If you go to a water park on a hot summer day and take a tube slide for two people, the tube is the surface we are talking about.

8 Almost All Surfaces Are Made Out of Hexagons


In fact, one can add an arbitrarily large number of handles. Say we want to add g handles. Then the surface obtained from the sphere by attaching g handles is called the closed surface of genus g, i.e., the genus is the number of handles. If we simply say a closed surface, we mean a surface homeomorphic to the closed surface of genus g for some g. To be more rigorous, we should also say that these are “orientable” surfaces. Intuitively speaking, being orientable means that the surface has two sides. A surface such as the Möbius band has only one side, since wherever on the surface you start, you can reach any other point by walking on the surface without crossing its edge. For simplicity, we will only consider orientable surfaces without further mentioning it. Our definition is based on the classical result on the classification of closed surfaces; see, for instance, Bredon (2013).

A donut is the closed surface of genus 1, also known as the torus. The easiest way of building a torus is by gluing opposite sides of a square. Consider a square and glue the top side to the bottom side and the left side to the right side. When one glues the top and the bottom of the square, it becomes a cylinder, and gluing the left and right sides corresponds to gluing the two boundaries of the cylinder together, which makes a donut.

We can build a closed surface of arbitrary genus in a similar fashion. To build the closed surface of genus g, first consider the regular 4g-gon in the plane, i.e., a regular polygon with 4g sides. By gluing opposite sides pairwise, one gets the desired surface. Seeing that this process works in general is not so easy at first. In order to get familiar with this picture, let’s consider the regular octagon together. First glue the top side to the bottom side and the left side to the right side. From our first exercise of building the torus from a square, one can see that this results in the torus with a square removed. Gluing the opposite diagonal sides pairwise is now equivalent to gluing the opposite sides of the removed square, which ends up creating another torus with a square removed on the “other side.” Said differently, we have built a surface which looks like two copies of the torus with a square removed, glued along their boundary. Once you try to draw a surface of genus 2, it is not hard to convince yourself that the surface we just created is indeed (homeomorphic to) the closed surface of genus 2.
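The gluing of opposite sides described above can also be checked by counting, as in the following minimal sketch. It is not part of the original argument: it assumes that opposite sides are glued by translation, and it uses the standard relation V − E + F = 2 − 2g between the Euler characteristic and the genus of an orientable closed surface, which the text does not state. The function name `glue_opposite_sides` is hypothetical.

```python
def glue_opposite_sides(num_sides):
    """Glue opposite sides of a regular polygon by translation.

    Returns (vertex_classes, euler_characteristic, genus).
    Illustrative sketch only; names are not from the source text.
    """
    if num_sides < 4 or num_sides % 2 != 0:
        raise ValueError("need an even number of sides, at least 4")
    half = num_sides // 2

    # Union-find over the polygon's corners 0, 1, ..., num_sides - 1.
    parent = list(range(num_sides))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        parent[find(a)] = find(b)

    # Side i runs from corner i to corner i + 1 (counterclockwise).
    # Translating it onto the opposite side i + half sends
    # corner i to corner i + half + 1 and corner i + 1 to corner i + half.
    for i in range(half):
        union(i, (i + half + 1) % num_sides)
        union((i + 1) % num_sides, (i + half) % num_sides)

    vertices = len({find(v) for v in range(num_sides)})
    edges = half   # each glued pair of sides becomes one edge
    faces = 1      # the polygon itself
    chi = vertices - edges + faces
    genus = (2 - chi) // 2  # assumes chi = 2 - 2g (orientable closed surface)
    return vertices, chi, genus


for sides in (4, 6, 8, 12):
    print(sides, glue_opposite_sides(sides))
```

Under these assumptions the square gives genus 1 and the octagon gives genus 2, in agreement with the constructions in the text; the regular hexagon also gives genus 1, since its corners fall into two classes instead of one.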

Pants Decomposition

Interestingly enough, there is another systematic way of building closed surfaces. The way we will describe in this section might be easier to visualize. We first introduce a new surface, a so-called pair of pants. A pair of pants is the surface obtained from the sphere by removing three disjoint disks. If you are wearing a pair of pants now, please compare it with our definition. With a bit of imagination, you will be able to see that they are the same up to topological equivalence, unless you have some unnecessary holes in your pants (OK, I ignored the “handles” attached along the waist for your belt).

Suppose you have two pairs of pants. Then there are six boundary components in total. Glue them pairwise to get a surface without boundary. No matter how you glue, you end up with the closed surface of genus 2. Figure 1 is a picture of two kinds of pants decompositions of the surface of genus 2.

Fig. 1 Two different pants decompositions of a surface of genus 2

If you use four pairs of pants, then you get the closed surface of genus 3. In fact, the surface of genus g can be obtained from 2g − 2 pairs of pants by gluing the 6g − 6 boundary components of those pants pairwise. It does not matter how you glue them – you always get the same surface up to topological equivalence. As an exercise, I invite the readers to try to decompose the closed surface of genus g into 2g − 2 pairs of pants by drawing 3g − 3 loops appropriately. Try with g = 2, 3, 4, . . . and see if you can generalize your idea to arbitrary genus. This is a fun mental exercise, and it is the basis of the idea of building closed surfaces out of hexagons.
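The bookkeeping behind these numbers can be sketched in a few lines. This is only an illustration of the counts quoted above; the function name and the dictionary keys are hypothetical. Each pair of pants has three boundary circles, and every loop of the decomposition is formed by gluing two of them, which is where 3g − 3 comes from.

```python
def pants_decomposition_counts(genus):
    """Counts for a pants decomposition of the closed surface of a given genus.

    Illustrative sketch; the names used here are not from the source text.
    """
    if genus < 2:
        raise ValueError("a pants decomposition needs genus at least 2")
    pants = 2 * genus - 2             # number of pairs of pants
    boundary_components = 3 * pants   # three boundary circles per pair of pants
    loops = boundary_components // 2  # boundary circles are glued in pairs
    return {
        "pants": pants,
        "boundary_components": boundary_components,
        "loops": loops,
    }


for g in (2, 3, 4):
    print(g, pants_decomposition_counts(g))
```

For g = 2 this gives two pairs of pants, six boundary components, and three loops, matching the first example in the text.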

Hyperbolic Plane and Negative Curvature

In this section, we will briefly talk about the hyperbolic plane H². This is like the usual plane in some ways but very different in others, since it is not flat but curved in a peculiar way. At every point, a small neighborhood looks like a saddle: it is curved in such a way that in one direction it looks convex, and in another direction it looks concave. In the world of the hyperbolic plane, there is more “room” out there. For a straight line L and a point p which is not on L, there are infinitely many straight lines passing through p that never meet the line L we started with! If you recall what you learned about the geometry of the usual plane, there should be a unique such line, but in the hyperbolic plane, there are infinitely many. To grasp what the hyperbolic plane actually looks like, I encourage the readers to check out the famous tessellations of the hyperbolic plane by Escher (1959).

The reason why I mention the hyperbolic plane in this section is that the “natural geometric shape” of an arbitrary closed surface of genus at least two is like the hyperbolic plane. So far, we have always considered surfaces up to topological equivalence, which ignores the precise shape, such as the distance between two points and the angle between two lines. But each surface can be made into a geometric object by specifying the shape, and it is a remarkable fact that any closed surface of genus at least two has a geometric shape such that every point has a small neighborhood where the world looks exactly like the hyperbolic plane. When the surface is in such a shape, we call it a hyperbolic surface. Once you have a decomposition of a hyperbolic surface of genus g into 2g − 2 pairs of pants via 3g − 3 loops on the surface, we can straighten the loops so that at every point on a loop, it locally looks like a piece of a straight line in the hyperbolic plane.
On each pair of pants, we have three straight-line boundary components. For each pair of boundary components, one can connect them via a straight line segment which meets both boundary components at a right angle. After connecting all three pairs, one sees that the pair of pants is decomposed into two hexagons. Each of these hexagons geometrically lives in the hyperbolic plane, in the sense that it is exactly the same as some hexagon in the hyperbolic plane each of whose sides is a straight line segment. This is already an interesting fact, because now we have a pair of pants made out of two hexagons where each side is a straight line segment and each internal angle is a right angle. We call such a hexagon a right-angled hexagon. A right-angled hexagon cannot live in our usual world, since this would mean the total sum of the internal angles is 3π, while the total sum of the internal angles of a hexagon in the Euclidean plane is 4π. This is another feature of negative curvature. In fact, in the hyperbolic plane, one can have a hexagon whose internal angles sum up to any number between 0 and 4π.

Above we expressed a hyperbolic surface of genus g as a collection of 4g − 4 hexagons in the hyperbolic plane glued along their boundary. Elementary hyperbolic geometry shows that the geometric shape of a right-angled hexagon in the hyperbolic plane is completely determined by the lengths of three alternating sides. Hence, the two hexagons forming one pair of pants have exactly the same shape. In particular, this means that, up to geometric equivalence, we only need 2g − 2 hexagons to build a closed surface of genus g.
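The angle count above can be written out as a short worked computation. The area formula in the second display is the Gauss–Bonnet formula for polygons in the hyperbolic plane of curvature −1; the text does not state it, so it is brought in here as an outside assumption, purely as a consistency check on the count of 4g − 4 hexagons.

```latex
% Angle sums of a hexagon (n = 6 sides).
\begin{align*}
\text{Euclidean hexagon:}\quad & (6-2)\,\pi = 4\pi, \\
\text{right-angled hexagon:}\quad & 6\cdot\frac{\pi}{2} = 3\pi .
\end{align*}

% Assumption not in the text: Gauss--Bonnet for a hyperbolic n-gon
% with interior angles \theta_1, \dots, \theta_n (curvature -1).
\[
\operatorname{Area} = (n-2)\,\pi - \sum_{i=1}^{n}\theta_i ,
\qquad\text{so}\qquad
\operatorname{Area}(\text{right-angled hexagon}) = 4\pi - 3\pi = \pi .
\]

% Total area of the 4g - 4 hexagons making up a surface of genus g.
\[
(4g-4)\cdot\pi \;=\; 4\pi\,(g-1) \;=\; -2\pi\,\chi ,
\qquad \chi = 2 - 2g .
\]
```

The missing angle π is exactly the area of each right-angled hexagon, and the total agrees with the area that Gauss–Bonnet assigns to a closed hyperbolic surface of genus g.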

Each Surface Admits More Than One Geometric Shape

We have so far shown that all but two closed surfaces are built out of hexagons – more precisely, any closed surface of genus at least 2 can be obtained by gluing hyperbolic hexagons along their boundary. Since the surface is built from pieces which live in the hyperbolic plane, it has a natural geometric shape as a hyperbolic surface. But the geometric shape of the surface is not unique, and by varying the shapes of the hexagons, we can build surfaces which are topologically the same but geometrically different. Perhaps this is best explained from the pants decomposition point of view. For each loop used in the pants decomposition of a surface, one can change the length of the loop (but not all loops can be really short or really long due to our geometric constraints), and even when the lengths of the loops are fixed, there is still room for further deformation of the shape. In fact, at each loop, one can think of two pairs of pants as being glued along the loop. This gluing can be altered by cutting the surface along the loop again, twisting by an arbitrary angle, and gluing back. Mathematicians have shown that each twist yields a surface of a different geometric shape (although this is totally nontrivial). We have 3g − 3 curves for a pants decomposition of the surface of genus g, and each curve has two degrees of freedom for the deformation – the length and the twist. This means we have 6g − 6 parameters to determine the geometric shape of the surface. In fact, if we collect all possible (hyperbolic) geometric shapes that a closed surface of genus g can admit, we get a huge (6g − 6)-dimensional space called the moduli space (Fig. 2).


Fig. 2 Moduli space for the surface of genus 2

Fig. 3 From a regular hexagon on the plane to the torus

To conclude our journey, we go back to the case of the torus and see that the torus can also be obtained from a hexagon. In this case, it is obtained from a flat hexagon (i.e., a hexagon in the usual plane) by gluing sides pairwise. Consider a regular hexagon in the plane, and glue the opposite sides pairwise. Then we get a torus – like magic! Seeing is believing, so I describe the procedure of this side-gluing with a picture; see Fig. 3. To learn more about the torus and its symmetry, the readers are encouraged to also see the first chapter of Thurston (1997).

References
Bredon G (2013) Topology and geometry. Springer, Berlin
Escher MC (1959) Circle Limit III. Website: https://mcescher.com/gallery/mathematical/
Farb B, Margalit D (2011) A primer on mapping class groups. Princeton University Press, Princeton/Oxford
Thurston W (1997) Three-dimensional geometry and topology. Princeton University Press, Princeton

9

Anamorphosis Reformed: From Optical Illusions to Immersive Perspectives António B. Araújo

Contents
Introduction
Anamorphosis Formed Again
The Empirical Principle: Radial Occlusion
Anamorphosis Formed Fast
Some Considerations on Anamorphosis
Anamorphosis Formally Reformed
Mathematical Preliminaries
Anamorphosis as a Mathematical Object
Simplifications: Talking to Artists
On Compactness
Descriptive Geometry Construction of Anamorphoses
Handmade vs Digital Anamorphoses
Dürer Machines Running Back and Forth
Perspectives
Spherical Perspectives
The Problem with Perspective
Conclusion
Cross-References
References

Abstract

We discuss a definition of conical anamorphosis that sets it at the foundation of both classical and curvilinear perspectives. In this view, anamorphosis is an equivalence relation between three-dimensional objects, which includes two-dimensional representatives, not necessarily flat. Vanishing points are defined in a canonical way that is maximally symmetric, with exactly two vanishing points for every line. The definition of the vanishing set works at the level of anamorphosis, before perspective is defined, with no need for a projection surface. Finally, perspective is defined as a flat representation of the visual data in the anamorphosis. This schema applies to both linear and curvilinear perspectives and is naturally adapted to immersive perspectives, such as the spherical perspectives. Mathematically, the view here presented is that the sphere and not the projective plane is the natural manifold of visual data up to anamorphic equivalence. We consider how this notion of anamorphosis may help to dispel some long-standing philosophical misconceptions regarding the nature of perspective.

A. B. Araújo () CIAC-UAb, Center for Research in Arts and Communication, Universidade Aberta, Lisbon, Portugal e-mail: [email protected] © Springer Nature Switzerland AG 2021 B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_101

Keywords Anamorphosis · Occlusion · Euclidean optics · Mimesis · Perspective · Spherical perspective · Curvilinear perspective · Immersive · Panoramas · Optical illusion

Introduction This chapter presents an unusual treatment of a familiar concept: anamorphosis. It is an elaboration on a set of recent works, especially (Araújo 2018c), that reconsider the notion of anamorphosis with a view toward a rigorous and useful definition of the conical spherical perspectives. The title of this chapter is an allusion to that foundational role. This view turns out to also have a clarifying role with regard to the philosophical debate on the mimetic nature of classical perspective. It can be seen as reconciling concepts often thought to be in opposition: Euclidean optics, linear perspective, and curvilinear (spherical) perspectives. The proposed definition of anamorphosis can be thought of as Euclid’s optics with vanishing points: it emphasizes topological aspects, providing a canonical definition for vanishing points that does not depend on a choice of projection surface. This gives anamorphosis a role that precedes that of perspective both logically and in importance. Perspective proper we define as a two-step process, the first and most important of which is anamorphosis and the second step being a representation step, a flattening of the visual data onto the plane, analogous to cartography. This is rather the opposite of the usual state of affairs. Expressing the common view on anamorphosis, Collins (1992) states that “any explanation of its mechanisms necessarily begins with a discussion of classical perspective.” We affirm here the antipodal view that perspective is the derived concept and that its most important features – namely, the definition of vanishing points – are more adequately defined at the level of anamorphosis.


Historically, anamorphosis has been seen as the trickster, the charming but ultimately fatuous sibling indulging in party tricks, while perspective handles the respectable work: the latter fit for the architectural firm, the former for the cabinet of curiosities. In historical analysis, as in perspective textbooks, it is portrayed as a derivative and degenerate special case of perspective, or even a perversion of it, a wonderful monstrosity, though a charming one, as the scares of a B-movie can be charming. We feel the fear is justified and telling; it is the trepidation of the human mind contemplating the palpable ambiguity of its senses. This view of anamorphosis is expertly treated in other places (Baltrušaitis 1983; Collins 1992) (see also an excellent chapter of the present book, Chap. 10, “Anamorphosis: Between Perspective and Catoptrics”). Here we take a very different view: we say that anamorphosis is a charmingly simple consequence of a single axiom, the principle of radial occlusion, and a perfect example of the rational study of perception, with its origins in Euclid’s optics, and that linear perspective is the slightly awkward, utilitarian derived concept.

Traditionally, also, anamorphosis is mostly seen as an inverse problem, a game of hide and seek, where the spectator is bent on finding the single observational point from which an image makes sense, from which it can be “formed again,” as the etymology of the word “anamorphosis” suggests. It is the territory of Niceron and of Holbein. But here we emphasize a parallel tradition that treats anamorphosis as a direct problem, a tradition that starts with Brunelleschi and at which Pozzo excelled. The distinction is one of emphasis and intention rather than principle. In this second approach, the observational point is proffered at the start, and the emphasis is on integration with the physical environment.
With Brunelleschi, standing at the doors of Florence with his tavoletta, anamorphosis became a seminal experiment in visual psychophysics, the study of the connection between stimuli and sensation. With Andrea Pozzo, with anamorphic ceilings such as the one of the church of Sant’Ignazio in Rome (Fig. 1), it becomes immersive, a discipline of architecture by other means. No peeking through peepholes here, no mirrors or contraptions. Indeed Pozzo made use of anamorphosis as part of a lived-in space and projected it both as true architecture and as a replacement for such (Fasolo and Mancini 2019; Kemp 1990). Although the view is not fully immersive, it is wide enough to make clear that there are no limitations of principle to the field of view, only practical ones. This view of anamorphosis as a living part of architecture is of course not exclusive to Pozzo (see, for instance, Rossi 2016), although he is a prime example of that tradition. Besides architecture proper, the use of such devices in scenography is well known (Čučaković and Paunović 2016).

Fig. 1 Architecture by other means. Right: Fisheye view of the ceiling of St. Ignazio’s Church in Rome. Illusionistic columns prolong the real ones seamlessly. A painting on a flat ceiling simulates a dome. Left: on the ground a golden disc marks the spot where the observer should stand. In a coincidental but quite charmingly appropriate manner, the disc and the surrounding reddish strips look like an eye when photographed with a fisheye camera. (Photograph by the author)

We note that some works here called anamorphoses (e.g., Pozzo’s ceilings, Brunelleschi’s panels) are often and even historically called perspectives. The term anamorphosis was a belated seventeenth-century invention, and in the meantime the term perspective became rather polymorphous, destined to denote both plane and curvilinear projections, 2D pictures and 3D objects. This semantic overloading can become a source of confusion, and Araújo (2016) claims it as important for the debate between the realist and conventionalist schools of the philosophy of perspective but also for the late development of spherical perspective. We will here
follow Araújo (2018c) in using the freedom of mathematicians to plunder old words for new purposes. Definitions are after all just syntactic sugar – they are good if they help us to think clearly. We will organize concepts by restricting the term “perspective” to plane pictures and using “anamorphosis” for the more general mimetic device. It is a choice justified by how natural it makes the terminology of spherical perspectives while remaining decently in accordance with common usage and historical precedent; Taylor (Andersen 1992), for instance, defined perspective as (emphasis mine) “the art of drawing on a plane the appearances of figures (. . . ).” The separation of perspective into two steps, first a conical projection onto the sphere, then a cartographic flattening of this projection, is practically inevitable when dealing with spherical perspectives and was done implicitly in Barre and Flocon’s seminal work “La Perspective Curviligne” (Barre and Flocon 1968; Barre et al 1964), although with a reluctance to consider the whole sphere in the perspective construction (hence this first spherical perspective was in fact only hemispherical). In artistic applications, Dick Termes made a career out of spherical anamorphoses, but also limited himself to the hemisphere in his plane projections (Termes 1998). It is hard to know how much of the limitation was due to technical difficulties, how much to philosophical reluctance, and how much to aesthetic preference. The matter of constructing total spherical perspectives is taken up in another chapter of this book (see Chap. 19, “Spherical Perspective,” in the present volume). Here we emphasize the role of anamorphosis as an equivalence between 3D objects and not just drawings: as a radial equivalence, without a preferred axis, hence with no field of view limitations. Mathematically, this can
be seen as replacing the projective plane by the sphere as the visual data manifold (on the projective formalization of perspective, see (Morehead 1955) and also the  Chap. 4, “Looking Through the Glass” in the present volume). In computer graphics the spherical view – free of the practical difficulties of handmade drawing on the sphere – is very natural. Barnard (1983) saw it as a natural setting for the algorithmic interpretation of perspective images. In (Correia et al 2013; Correia and Romão 2007) a two-step process of projection onto a curvilinear surface followed by a plane representation is used for free-form wideview visualization of architecture. As the surfaces in question can be deformed homeomorphically onto the sphere, this is analogous to the method considered here. The approach of Araújo (2018c) is of particular use to us here for explicitly relating the first step of this entailment to a general notion of anamorphosis and for setting it within a tradition of rational drawing as practiced by the draughtsman rather than in the field of computer rendering. This work used spherical anamorphosis as the first step of a construction for handmade spherical perspective drawings of the azimuthal equidistant (or fisheye) spherical perspective, and defined a general strategy for solving spherical perspectives, later applied to the equirectangular (Araújo 2017d, 2018b) and cubical (Araújo et al 2019b) spherical perspectives. The entailment in the definition proposed, separating the anamorphic aspect from the plane projection, has the side effect of dispelling the state of tension between the concepts of anamorphosis and perspective, automatically solving equivocations like the so-called paradox of Leonardo (Araújo 2016) that so bothered the philosophy of art. 
This view is especially important in the face of a growing awareness of the connection between the digital techniques of visual immersion and the sometimes misunderstood methods of the past (Araújo 2017b; Gay and Cazzaro 2019; Grau 1999; Tomilin 2001), a connection often made obscure by the fog of confusing and inadequate terminology incidental to their historical development. The details of the machinery apart, the geometry used in VR, AR, and MR, in immersive photography, video mapping, and full dome projections, is still the same as what was used to draw the anamorphoses of Niceron and Pozzo or Robert Barker’s immersive panoramas that so dazzled nineteenth-century crowds (Grau 1999; Huhtamo 2013). Hence these concepts, properly reformulated, have didactic, artistic, and technological possibilities still to be explored, and indeed several researchers and artists have lately investigated, through mathematical, technological, and artistic works, the possibilities of such connections between the digital and physical notions of anamorphosis and the hybrids in between (Araújo et al 2019a; Michel 2013; Rossi et al 2018). This chapter arose from such explorations, as well as from extensive experience of teaching these concepts to a varied audience, ranging from school children just starting out in their studies of perspective and descriptive geometry (Araújo 2017c), to working illustrators, to Ph.D. students of digital media art looking to better understand the underlying concepts hidden in the often opaque digital black boxes that serve as their tools (Araújo 2017b). It is hoped that both the artist and the geometer will find something of interest here.


Anamorphosis Formed Again We present a formulation of anamorphosis following Araújo (2018c) and Araújo (2017a). We intend this chapter to be readable both by mathematicians and artists. We therefore will alternate between the terse style of the mathematician and longer explanations for the artist and philosopher, flirting with that doubtful balance wherein all are fed and none is satisfied. To make the reading fluid, we present it uncritically and naively in a first section, then render it mathematically, and finally discuss the assumptions that were made in the process.

The Empirical Principle: Radial Occlusion To start with, let’s recall the usual definition of (conical) anamorphosis. It is usually some variation of “a distorted image that will look normal when viewed from a particular point.” We are unsatisfied with this. It will do as a dictionary definition but it won’t pass muster as a mathematical one. What is meant by distortion? What is an image? What does it mean to look normal? It is claimed in Araújo (2016) that this carelessness in definition is a piece of historical baggage that has caused all sorts of conceptual mischief. We will discard it entirely and try to recapture its spirit in a more rigorous fashion that encapsulates both the actual operational aspects of anamorphosis and exposes its mathematical structure. The study of conical anamorphosis is a subset of the study of mimesis, the imitative representation of the world. We find that there are classes of objects that although very different as three dimensional forms look the same to human observers under adequate experimental conditions. The expression look the same has here a very specific and operational meaning: a human observer, under stipulated experimental conditions, cannot tell the difference between the two objects. If one object was to be suddenly replaced by the other, the subject would not notice the switch. This mimicry, to really deceive the eye, depends on several factors, regarding shape, color, and lighting, among others. Conical anamorphosis, as defined here, studies only the most basic of these factors: the appearance of objects with regard to their apparent contour or the form of the region that they occupy in the visual field. This is by itself enough for certain effective illusions requiring only the matching of form and is a necessary condition for more complex illusions involving matters of color or more complex optical conditions. 
That visual mimesis of form occurs under certain conditions is not demonstrable from first principles of geometry but is rather an empirical fact about human vision that relates human visual sensations to measurable, geometrical properties of visual stimuli. It can be made demonstrable only when certain empirical principles are abstracted into geometrical assumptions. In the case of conical anamorphosis, the whole subject derives from a single empirical observation regarding how objects occlude (hide) each other. We may
imagine the following experiment: an observer, his eye at a fixed point O, is presented with two very small (“point-like”), black, featureless balls, set in various positions relative to each other. The observer is asked how many balls he sees. For most such positions, the observer reports seeing two balls. But when the balls are geometrically aligned with his eye, he will report seeing only one. From this we abstract a principle we call radial occlusion: Principle of radial occlusion Two points P and Q look the same to an observer at a point O if they are on the same ray with origin in O. By look the same we mean literally that they look like the same point. You could say that they occupy the same location in the visual field. Or you could state the principle as “optical alignment coincides with geometric alignment.” The choice of words here is not where precision lies. The meaning is operational, so precision comes from the specification of the conditions of the experiment: “Look the same” means that the subject of some specific experiment reports something like “I see only one ball” instead of “I see two balls.” We are dealing here with psychophysics, the study of human perceptions as a function of (controlled, measured) stimuli. The study of perception ultimately relies on the response of human subjects to experiments. These are very concrete experiments made under well-specified conditions, typically meant to elicit simple yes-or-no responses. In the study of color, for instance, a subject may be presented with two adjacent patches of color from different power spectra and asked if they are the same color or not (or, equivalently, if they look like one or two different patches). With such experiments, a principle of linear color addition may be tested. In the same way, the principle of radial occlusion is not demonstrable by mathematics. It is a mathematical abstraction of the results of experiments on human subjects. 
Brunelleschi’s demonstration of perspective may be seen as one such experiment, showing that the lines of a drawing made in linear perspective match the lines of the real object when one is superimposed on the other. Of course, we do have precise models of both optics and physiology that allow us to model and understand when and why the principle of radial occlusion is valid (such models were in turn tested against the empirical facts of perception). We know the principle depends on the way light rays move and how they are processed by the human visual system, and we therefore also know when to expect it to fail. Both refraction and reflection cause it to fail. In Fig. 2 we see that points A and B are aligned with both a light ray and a geometrical ray, but points C and D, which are in different optical media (say water and air), are on the same light ray but not on the same geometrical ray. Then C will occlude (hide) D to an observer at O, although the points are not geometrically aligned. This is what optics leads us to expect, and perception confirms it (when refractions or reflections are important we are led to a different sort of anamorphosis, which we will not treat here). Now, since a lens works through refraction and our very eye has a lens, we realize that radial occlusion can be only approximately true. This is common fare with statements regarding perception. A principle borne out by experiment is only strictly trusted to be valid under the conditions of the experiment – how far from that


A. B. Araújo

Fig. 2 Points A and B are both geometrically and visually aligned relative to the observer at O. Points C and D are visually but not geometrically aligned relative to O due to refraction on the transition between optical media (e.g., water and air)

it will extend is an open question. So, more than explicit statements of absolute truth, these principles are implicit definitions of the gamut of experimental conditions under which they are verified. And they are interesting concepts if that gamut is interesting. That is the case with radial occlusion. Although far from universally valid, it is the default supposition of our interaction with light and the world. We will no longer be concerned in this section with the intricacies of when the principle works, but will reason on its consequences. All that lies ahead follows geometrically from this single assumption of radial occlusion, which we will later abstract into our definition of anamorphosis (Definition 4).

Anamorphosis Formed Fast

We make a quick and informal summary here, before we proceed to the formal definitions and results. If we accept the principle of radial occlusion, then it follows that many different 3D objects will look the same from O. The points of an object define a cone of rays from the observer O, the visual cone. If two objects define the same cone of rays, they should look the same. Such objects we will call anamorphs of each other relative to O (in fact we will require a technical condition for this anamorphic equivalence or anamorphosis that is slightly looser than equality of visual cones, but that is the subject of the next section; here we speak loosely). The important thing to note is that, unlike in perspective, the equivalence here is between 3D objects, and not just between objects and drawings. There is no need here to mention projection surfaces. For instance, take the wireframe cube of Fig. 3 and an observer at O. If we slide each vertex along its ray from O, we get the object to the right of the cube. These


Fig. 3 Two very different 3D objects with the same visual cone. They will look the same when seen from O if the principle of radial occlusion is valid

Fig. 4 Two very different 3D objects with the same visual cone, one surrounding the other. Anamorphosis is radial and isotropic, with no preferential axis

two shapes are very different, but they define exactly the same cone of rays from O, so, if we accept the principle of radial occlusion, we must accept that the two objects will look the same from O. Of course, the segments joining the vertices don’t have to be straight. You could bend each segment freely within the cone of rays it generates, and it would preserve the equivalence. For instance, arc A′B′ in Fig. 3 is equivalent to segments AB and A′B′, as it lies within the cone of AB. This cone is just the angle ∠AOB. This equivalence is radial and isotropic in nature. Unlike classical perspective, where there is a preferential axis defined by the perpendicular to the plane of the picture, here there is no such axis, plane, or even picture, hence no limitation of field of view: this is just an equivalence between objects. So the example of Fig. 4 works


just as well as the previous one. This allows us to give equal footing to immersive anamorphoses, that is, anamorphs that completely surround the observer (in turn, this proves foundational for spherical perspectives (Araújo 2018c)).

Remark 1. The limitation of a preferred axis is often kept in the literature from attachment either to the habits of perspective or to the convenient availability of the mathematical framework of projective geometry (see, for instance, Sánchez-Reyes and Chacón 2016). But in projective geometry all points on a line are equivalent, and this is a limited framework for anamorphosis. In anamorphosis the classes of equivalence are rays, not lines, so the sphere and not the projective plane will represent the visual data. The projective plane ignores exactly half of this visual data.

The principle of radial occlusion implies the existence of 2D anamorphs of any 3D object. Suppose S is a surface, not containing O. Then the intersection of the visual cone of a 3D object X with S is a 2D object that subtends the same cone as X itself. Hence it is a “drawing” that looks exactly like X when seen from O. You could loosely say that the principle of radial occlusion is what makes drawing possible. The most obvious case is the intersection of an object’s cone with a plane. We see two examples in Fig. 5, the same object being projected onto a “vertical” and a “horizontal” plane. There is no fundamental difference between horizontal and vertical, of course, since these are just named relative to the gravity field, which is quite irrelevant for optics. And yet the former anamorph is usually called a classical perspective, while the latter is called an “oblique anamorphosis.” What is the difference? Any projection plane defines a preferred axis, the perpendicular to the plane through O. The difference is the angular distance of the object’s cone with regard to that axis.
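As an illustration of the plane case, the following is a minimal sketch, not taken from the chapter, of how one might compute a plane anamorph numerically. It assumes the plane is given by a point `Q0` on it and a normal vector `n`; the function names `project_to_plane` and `angle_from_axis` are hypothetical. The same function serves for a "vertical" and a "horizontal" plane, which is the point made above: only the angle between the visual ray and the plane's axis changes.

```python
import numpy as np

def project_to_plane(O, P, Q0, n):
    """Intersect the ray from O through P with the plane through Q0 with normal n.

    Returns the intersection point, or None if the ray is parallel to the
    plane or only the opposite ray (behind the observer) meets it.
    """
    O, P, Q0, n = (np.asarray(a, dtype=float) for a in (O, P, Q0, n))
    d = P - O                        # direction of the visual ray
    denom = np.dot(n, d)
    if abs(denom) < 1e-12:           # ray parallel to the plane
        return None
    t = np.dot(n, Q0 - O) / denom    # the ray is O + t*d with t > 0
    if t <= 0:                       # the plane is not met by this ray
        return None
    return O + t * d

def angle_from_axis(O, P, n):
    """Angle (radians) between the ray OP and the perpendicular to the plane through O."""
    O, P, n = (np.asarray(a, dtype=float) for a in (O, P, n))
    d = P - O
    cosang = abs(np.dot(d, n)) / (np.linalg.norm(d) * np.linalg.norm(n))
    return float(np.arccos(np.clip(cosang, -1.0, 1.0)))
```

Calling `project_to_plane` with the plane z = 1 or with the plane y = -1 uses exactly the same formula; `angle_from_axis` measures how far the ray is from the preferred axis of the chosen plane.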
When the object’s cone is closely aligned with the main axis, the drawing is called a perspective; when the angular separation is “large,” it is called an oblique anamorphosis. We see that the difference is one of degree, and both are

Fig. 5 Projection onto a plane. There is no absolute difference between the linear perspective and the oblique anamorphosis. Both are anamorphosis onto a plane; the latter merely projects the object to a region further away from the foot of the perpendicular


special cases of anamorphosis, the projection being just another anamorph that happens to be contained in a plane, whatever the plane’s orientation may be. The projection is operationally the same in both cases; what makes it a “perspective” or “oblique anamorphosis” is the relative position of the body being projected, not the functional form of the projection itself. It is also quite indefensible to say, as is usually naively put, that the anamorphosis is “deformed,” while the perspective isn’t. What deformation is meant? By the principle of radial occlusion, neither of them is optically deformed, being indistinguishable when seen from O, while metrically both of them are deformed; the elongation of the edges in the oblique anamorphosis has a counterpart in their compression in the perspective. Of course some type of metrical deformation is inevitable if you lose one dimension.

Let us move to more complex setups. Nothing stops us from using several planes at once, as in Fig. 6. The cone projects onto this union of planes as several images that should all join seamlessly when viewed from O. We will see that this changes the global properties (such as the number of vanishing points) in a fundamental manner while preserving the local properties of classical perspective. This multi-plane projection is historically used in the so-called perspective boxes (Čučaković and Paunović 2016; Gay and Cazzaro 2019; Spencer 2018; Verweij 2010). But of course we are not limited to planes. If we intersect the visual cone of an object with a curved surface, we still obtain a 2D anamorph. Now the line projections will themselves be curved in space, yet they will seem straight when seen from O. Ahead we will see how to investigate the shapes of these lines analytically and geometrically and how to draw them in practice. In Fig. 7 we see an example of

Fig. 6 Anamorphosis of a cube onto three planes. Locally these are just plane perspectives but globally, some lines in such multi-plane configurations may present up to two vanishing points. Temporary installation in Óbidos, Portugal, joint work by the author and Maria Bianchi Aguiar


Fig. 7 Anamorphosis of a cube onto a union of a plane and a cylinder. (Drawing by the author)

an anamorph onto a composite surface made up of a cylinder and a plane. We notice that the lines of the anamorph don’t even have to be connected. Further, if we wish to obtain an immersive anamorphosis, that is, an anamorphosis that surrounds the viewer, we may place O inside a surface, for example, on the axis of a cylinder. Then the observer may look in all directions around the vertical axis and have the illusion of being surrounded by an immersive landscape. This is the principle behind the famous panoramas of the nineteenth century, illusions so striking at the time that the word panorama, originally created to signify these illusory pictures, became in time associated with the physical landscape itself. In fact the word seems to appear in print for the first time in 1791 in an advertisement promoting Robert Barker’s large displays of immersive, cylindrical anamorphoses at his “panorama building” (Fig. 8), erected on Leicester Square, in London (Huhtamo 2013). From the platform at the center of this building, large paintings were displayed, to immerse the spectators in foreign landscapes or representations of famous battles, such as that of Waterloo, which was pictured in the Leicester Square panorama in 1816, having been fought less than a year before. Such panoramas became widespread worldwide in the nineteenth century, and even today a panorama of the battle of Waterloo is on display at the museum erected on the field of battle, and a few depictions of Civil War battles survive in the United States. It is important to note here that Barker’s panoramas, striking as they were, were not strict cylindrical anamorphoses. According to Kemp (1990, p. 214) they were made by joining a series of adjacent linear drawings, with some care to soften the transitions.
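For contrast with such stitched drawings, a strict cylindrical anamorphosis can be sketched in a few lines. The following is only an illustrative sketch under stated assumptions: the observer O is at the origin, the cylinder's axis is the z-axis, and the cylinder is unrolled into a flat strip whose coordinates are arc length and height. The function name `cylinder_anamorph` and its parameters are hypothetical, not the author's notation.

```python
import math

def cylinder_anamorph(point, radius=1.0):
    """Map a 3D point to unrolled panorama coordinates (arc_length, height).

    Assumes the observer is at the origin and the cylinder of the given
    radius has the z-axis as its axis. The point is slid along its ray
    from the origin until it meets the cylinder.
    """
    x, y, z = point
    rho = math.hypot(x, y)            # horizontal distance from the axis
    if rho == 0:
        raise ValueError("a ray along the axis never meets the cylinder")
    theta = math.atan2(y, x)          # direction around the axis
    scale = radius / rho              # factor that slides the point onto the cylinder
    return radius * theta, z * scale
```

Two points on the same ray from the origin, such as (2, 0, 1) and (4, 0, 2), land on the same spot of the strip, which is the radial occlusion principle at work on a curved surface.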
These are perfectly adequate as long as there are no exceedingly long architectural lines involved, which is not a problem for wide vistas where each building will occupy a small angle of view and the general landscape is one of natural forms that the eye cannot judge for exactness. Of course, the most isotropic anamorphic view would be a spherical one, and this too has been exploited in various ways. The visual illusions of the planetarium


Fig. 8 Cross section of the Rotunda in Leicester Square, showing the viewing platforms for the main and secondary panoramas. A dark passage between the two cleared the viewing palate. The building was designed by Robert Mitchell to exhibit the panoramas of Robert Barker. Etching/Colored Aquatint by Robert Mitchell, 1801 (Mitchell 1801a, b)

or of a modern, digital full dome allow for hemispherical anamorphoses, and no more than architectural convenience and economy stops us from applying the same principle to build a full sphere, to provide a truly immersive anamorphosis, completely surrounding the observer. In fact, fully spherical immersive illusions were tried architecturally. Charles Delangard proposed the concept of the Georama to the Geographical Society of Paris in 1822, and the first exemplar was erected in that city in 1826 (Fig. 9) – a sphere of approximately 12 meters in diameter, whose interior was painted with a representation of the Earth’s geography, seen from within (Belisle 2015; Oettermann and Schneider 1997). Delangard’s Georama, like the panorama before it, became widespread and was implemented elsewhere, the most famous example being James Wyld’s Great Globe in Leicester Square. The concept was also used to represent the celestial globe. There was one such globe in the Paris exhibition of 1900 that visitors could enter to see the firmament rotate around their station point. Some Georamas paired these geographic and astronomical views, as in the case of Wyld’s Great Globe, whose interior displayed the Earth’s surface while having the celestial firmament painted on the exterior dome. This was to be expected, since the mapmaking tradition has a long-standing pairing between the charts of the stars and those of the Earth (see, for instance, Chap. 19, “Spherical Perspective” in the present volume). Again, like the


Fig. 9 Illustration of Delangard’s Georama

panoramas before them, the Georamas were not strict anamorphoses, often having deformations to facilitate particular readings, or putting the viewers on observational platforms not necessarily located at the center of the sphere (Belisle 2015). Still, they demonstrate the architectural possibilities of a completely immersive anamorphosis. Of course in the present day, a type of immersive anamorphosis is achievable daily through digital means. First-person games or simulations work by creating a flat anamorphosis dynamically onto the screens of the 3D glasses, centered on a moving observational point O, the screen serving as a moving window into the virtual world that will be in anamorphosis if the distance to the screen is adequate. Of course, since the distances are small, the principle of radial occlusion will not really be verified, so the illusion is not really immersive. Better results are obtained with the help of 3D helmets, where lenses create a sense of size and distance and illusion pairs are created for each eye at a moving pair of observational points $O_L$ and $O_R$. This is but another step in the evolution of moving panoramas.

Some Considerations on Anamorphosis

The Point of Observation

It is important to stress that anamorphic equivalence is always related to an observer O. It makes no sense to ask if something is an anamorphosis without context. Everything is an anamorphosis of some other things relative to some


point. It only makes sense to ask if object X is an anamorphosis of object Y with relation to O. This is what is implied when, in the usual dictionary definition of anamorphosis, we say that the object “looks right” from a certain point. But this vagueness causes all sorts of misconceptions. The issue is not that it looks “correct” (meaning recognizable as “something”) but that it looks like something that has been prescribed in advance. The question is how much can be prescribed or, to put it more rigorously, how much a prescribed anamorph determines the set of all possible anamorphs of it with relation to given viewpoints.

Multiple Points of Observation

We can prescribe the different appearances for the same object from two different points within some rather forgiving bounds. When one prescribes that X be anamorphic to $Y_1$ from $O_1$, this determines a cone of rays from $O_1$ to X, but does not determine X itself. Each point of X on that cone may still be freely moved along its ray. That degree of freedom allows one to specify another object (under constraints) $Y_2$ such that X will be anamorphic to $Y_2$ from another point $O_2$. As beautifully demonstrated by the works of Kokichi Sugihara, the degrees of freedom of the conic projection allow quite some leeway. Sugihara wonderfully explored such double anamorphic constructions with ambiguous 3D objects that from one point may look cylindrical and from another box-like or star-shaped (Sugihara 2015a, b, 2016), and not only the geometry of such ambiguous cylinders but even their topology may be changed, when, for instance, a pair of cylinders may appear to be intersecting or not according to the choice of observation point (Sugihara 2018).

“Impossible” Objects

It may seem paradoxical, but a real object may be anamorphic to an impossible one, that is, it may look like an object that makes sense locally, but not globally, like Penrose’s triangle (Draper 1978; Penrose and Penrose 1958), M. C. Escher’s Belvedere cube (Escher 1958, 1972), or Huffman’s various specimens of wireframe constructions (Huffman 1968, 1971). This is achieved by visually mimicking incompatible self-connections and other impossible features through hidden discontinuities or deformations of a real object (see Sánchez-Reyes and Chacón (2020) for a modern software-based treatment of this problem).
Of course the apparent impossibility comes not so much from what we see as from what we assume, that is, not from the geometric constraints determined by the appearance of the object but from the psychological assumptions of how our brain makes sense of the limited visual information supplied by the view from a single point (Kulpa 1983; Sugihara 1982, 2000). Human visual processing relies on shortcuts and rules-of-thumb meant to work on the generic rather than the general case. For a simple example of an apparently impossible object, note that the rightmost object of Fig. 3 might be classified as one of Huffman’s impossible polyhedra, but only if we assume that the implied faces are planar. The reader will verify that there is in fact no polyhedron whose faces are delimited by the edges shown. However, if we triangulate across the vertices, as in Fig. 10, then we get a perfectly consistent polyhedron, only the number


Fig. 10 The rightmost object of Fig. 3 can be made into a consistent polyhedron by a choice of triangulation. Corresponding triangulations would result in the anamorphically equivalent edges in the cube and the right side object, but the faces themselves would be shaded differently under a light source. In a Lambert model of illumination, the value (gray level) of the two faces A′H′D′ and A′E′H′ will vary with the inner product of the faces’ normals with the incidence direction of the light. This could be compensated by painting the faces with different shades of gray so as to make them seem uniform when seen from O, achieving “color anamorphosis”

of planar faces is greater than the six originally assumed. Due to the ambiguity of radial occlusion, it is no wonder that even the simplest of wireframe objects, like Necker’s cube (Necker 1832), should make our brains oscillate between two readings indecisively; the wonder is that we may make any reading at all with confidence from among the infinite anamorphs that are mathematically consistent.

On Color

It is important to note that anamorphosis in our present sense refers only to mere equivalence of contours or outlines. This limited principle is necessary but not sufficient for proper anamorphoses in the usual sense, as most optical illusions going by that name – be it Pozzo’s dome at St. Ignazio’s or Holbein’s painting of the ambassadors – require also mimesis of color to be effective. This aspect is usually unmentioned in the literature, as we assume that color will take care of itself once we use conical anamorphoses to define the contours of the objects we wish to represent in painting. This is true with careful diffuse lighting, but in a general setup color will certainly not take care of itself. Color anamorphoses (so to speak) are a whole new layer of complexity for mimesis that we will not treat here in any detail, but would like to mention briefly. Color anamorphosis would require consideration of models of illumination, material properties, and, of course, consideration of color itself as a perceptual theory with its own principles. Color has its own mimetic principle, its own “anamorphosis” of sorts. The space of visual color stimuli may be identified with the set of all power distributions in the visible spectrum, which, as a vector space of functions, is of very high dimension (we could say infinite dimensional if we identify it with the set of all functions on the interval [0, 1], but


there are restrictions so the space is probably better seen as very high – but finite – dimensional). This space is reduced by our perception to only a convex set in a space of three dimensions, identifiable (for instance) as something like value, saturation, and hue. This reduction in dimension is analogous to the way anamorphosis reduces the three dimensions of the real space to a two-dimensional perceptual space (though the reduction is much more drastic in color). But all of this is beyond our scope. To consider the simplest of cases, take again the two anamorphic wireframe objects of Fig. 10. The situation is simple enough if we just consider the wireframe objects with ideal segments joining the points. If we consider a real object, then already we have some complications, as those lines will need some thickness to be visibly represented. We can naively conceive of them optically as painted matte black cylinders (with no specular reflections), thin, and featureless. If we now consider the polyhedron defined by a triangulation, then both objects would just look like a black mass defined by their identical outlines. Let us now go a bit further and suppose the objects are made of triangles painted in some neutral gray. Suppose also they reflect light equally in every direction and with an intensity proportional to the inner product of the surface normal with the incident light ray direction. This is a simple Lambert model of illumination, adequate for rendering matte surfaces (Foley et al. 1996). Within such a model, the gray level of each face will be determined by its angle to the light source. Hence, mimicry of color will require us to paint each face in such a way as to compensate for the difference in their normal vectors. For instance, in Fig. 10, face ADHE of the cube is anamorphic to the union of the two triangular faces A′D′H′ and A′H′E′. Notice the difference in the normal vectors on the faces.
If the faces were to be painted equally, and assuming light comes from front and high, the face that points downward would look darker than the cube’s face, and the face that points upward would look brighter. So to compensate for this and achieve apparent equality of value (“color anamorphosis”), we would have to paint the upward facing triangle darker and the downward one lighter than the square face. In Fig. 11 we see a simple hand-drawn example of color anamorphosis: a plane anamorph of a cube, side by side with a real cube of the same apparent dimensions. The physical cube is painted uniformly with white paint. The plane anamorph’s normal vector coincides with the normal vector of the top face, so these corresponding faces can be painted with the same value. But the other “faces” of the plane anamorph had to be painted darker in order to match the appearance of the faces of the 3D cube that are turned at a larger angle from the incident light. The rules of this color anamorphosis game depend on many factors and will vary with the type of light, its form, its location, the type of reflection we get from the materials, and so on. In most cases it will be impossible to match apparent color by simply changing the local color, as the ranges will be too large. Lambert reflection and a uniform light is just about the easiest situation for matching. Diffuse lighting helps, as cast shadows will create additional complications (or artistic opportunities).

Fig. 11 A 3D cube and a plane anamorph

If we were to attempt a formal treatment of color anamorphosis, we would start with something akin to the following definition: Two objects $X_1$ and $X_2$ are color-anamorphic relative to point O and light sources $L_1$, $L_2$ if $X_1$ seen from O under light source $L_1$ is visually indistinguishable from $X_2$ seen from point O under light source $L_2$. Hence we have a dependence on the light sources quite analogous to the dependence on O in conical anamorphosis. Such a concept allows us to speak, for instance, of a ball under an incandescent bulb being color-anamorphic to a disc under a halogen lamp, the disc being painted so as to simulate the color gradations of the ball. We will not, however, attempt here any formal development of this, but the concept is intuitively what any painter does when he paints a daylight scene to be seen under gallery lights and, more analytically, what a restoration expert has in mind when making sure that he uses not just the right color but the right pigment to restore a patch of color, so as to ensure color matching under change of light source spectrum. We can sum up the observations of this section by saying (unpardonably boldly and omitting many qualifiers) anamorphosis is what makes drawing possible; color anamorphosis is what makes painting possible.
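The Lambert compensation described in this section can be made concrete with a small numerical sketch. It is only an illustration under simplifying assumptions that are not the author's: a single distant light (so the light direction is the same on every face), gray levels on a 0 to 1 scale, and paint reflectance (albedo) limited to that same range. The names `lambert_value` and `compensating_albedo` are hypothetical.

```python
import numpy as np

def _unit(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def lambert_value(albedo, normal, to_light):
    """Apparent gray level of a matte face: albedo * max(0, n . l)."""
    cos_incidence = float(np.dot(_unit(normal), _unit(to_light)))
    return albedo * max(0.0, cos_incidence)

def compensating_albedo(target_value, normal, to_light):
    """Albedo a face needs in order to show target_value under the light.

    Returns None when no paint in the range [0, 1] can do it, either
    because the face is turned away from the light or because it would
    need to reflect more light than it receives.
    """
    cos_incidence = float(np.dot(_unit(normal), _unit(to_light)))
    if cos_incidence <= 0:
        return None
    albedo = target_value / cos_incidence
    return albedo if albedo <= 1.0 else None
```

With light coming from front and high, say `to_light = (0, 1, 1)`, a face tilted upward receives more light than a frontal face and needs a darker paint to match it, while a face tilted downward needs a lighter one. A face tilted too far down gets `None`, which mirrors the remark that matching is often impossible because the required range is too large.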

Binocular Anamorphoses

The abstract eye of anamorphosis is cyclopic, yet we can still easily work with binocular vision. Anamorphosis acts before the light hits the eye – it is a manipulation not so much of the eye, but of the light that reaches it. For this reason, arguments too concerned with what happens at the retina or the brain are missing the point. Whatever the complex workings of the visual system, feeding it equivalent inputs will result in equal perceptions. Hence, we don’t need to understand how the brain combines binocular images; we just need to fool each eye separately, by figuring out what light packets it would be receiving from its position if the imaginary object was present. Then the brain will take care of the mixing. The problem is reduced to the technical one of showing each eye its own anamorph, built in the standard way, tailored to its specific position. The biggest problem is a suppressive one: to make sure that each eye only gets its corresponding picture. This can be achieved simply (but with limitations) by using a physical wall (septum) to


Fig. 12 Anamorphic anaglyph of a cube, to be viewed with red-blue 3D glasses

constrain each eye to its own compartment. This simple method, used by Dutour (1760) in the eighteenth century, is still an element used in contemporary solutions. Wheatstone (1838, 1852) achieved the separation through his mirror stereoscope in 1832 (Wade and Ono 2012). From that idea comes a long line of evolutions and simplifications, through the Holmes stereoscope in the 1860s and then the View-Master in the late 1930s. These are not very different from modern VR viewers like Google Cardboard, except that today the illusions can move. All of these use in some measure either refraction or reflection, so strictly speaking they are not working according to conical anamorphosis (although the distinction is superficial). A more strictly conical device is the anaglyph. Perspective anaglyphs were first published by Vuibert (1912) in a book with drawings by Henri Richard (Cabezos Bernal 2015). We are all familiar with the red-blue or red-cyan 3D anaglyphic glasses. The filter over each eye eliminates the opposite image, therefore separating the two anamorphoses, resulting in pictures that pop out of their frames. Doing the same procedure with an “oblique anamorphosis” instead of a “perspective” creates a far more startling result (Araújo 2017a; Cabezos Bernal 2015). For instance, Fig. 12, seen from the correct point, at a sharp angle to the page, will look like a cube popping out of the page, and young students get surprisingly amazed – in spite or perhaps because of their familiarity with computer graphics – to see what resembles a wireframe cube floating midair and wobbling as they move their head slightly. Moving the head along a vertical line over the observation point gives the illusion of a bar chart animation, with a parallelepiped growing before one’s eyes. We will not remark further upon binocular vision, as it can be handled by this adaptation of monocular considerations. The reader can refer to Cabezos Bernal (2015) for more on this matter.
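The recipe of fooling each eye separately can be sketched as two ordinary plane anamorphs, one per eye. The code below is a minimal illustration, not the construction used for Fig. 12. It assumes that the page is a plane given by a point and a normal, that the two eyes are separated along the x-axis, and that no visual ray is parallel to the page. The names `project_from_eye` and `stereo_pair` and the parameter `eye_sep` are hypothetical.

```python
import numpy as np

def project_from_eye(eye, points, plane_point, plane_normal):
    """Plane anamorph of an array of 3D points as seen from one eye.

    Each point is slid along its ray from the eye onto the plane.
    Assumes no ray is parallel to the plane.
    """
    eye = np.asarray(eye, dtype=float)
    pts = np.asarray(points, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    d = pts - eye                                    # one ray direction per point
    t = np.dot(np.asarray(plane_point, dtype=float) - eye, n) / (d @ n)
    return eye + t[:, None] * d

def stereo_pair(center, eye_sep, points, plane_point, plane_normal):
    """Left and right anamorphs for eyes placed eye_sep apart along the x-axis."""
    center = np.asarray(center, dtype=float)
    offset = np.array([eye_sep / 2.0, 0.0, 0.0])
    left = project_from_eye(center - offset, points, plane_point, plane_normal)
    right = project_from_eye(center + offset, points, plane_point, plane_normal)
    return left, right
```

In an anaglyph, the left image would be drawn in the color passed by the left filter and the right image in the color passed by the right one. A point lying on the page gets the same image for both eyes, while a point floating above the page gets two images displaced in opposite directions, which is what makes it appear to pop out.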

Anamorphosis Formally Reformed

Mathematical Preliminaries

We now proceed to the mathematical development of what we loosely described above.


We need some geometric background. We will model our three-dimensional world abstractly as a Euclidean three-dimensional space, denoted E. You can think of E as $\mathbb{R}^3$. A point O of E will represent the location of the observer’s eye. Let $E_O = E \setminus \{O\}$ denote the three-dimensional space except for point O. We will want to model objects. We define an object to mean a closed set of $E_O$ and a scene to mean a finite union of objects. The term “closed” is a topological term. We recall some terminology briefly for the mathematician and then explain it at length. We beg the artist for bravery at this point. Hold the line, for these will all be made more concrete later.

Definition 1. Let S be a subset of a topological space. We say that P is a limit point of S if every neighborhood of P contains a point Q of S other than P. The closure of S is the union of S with its limit points and is denoted by cl(S). S is closed if S = cl(S). The residue of S is the closure of S minus the set itself, denoted Res(S) = cl(S) \ S.

Intuitively: a limit point of a set is a point that can be approached indefinitely close without leaving the set. Closed sets are those that contain all their limit points. A point, a segment, a line, a plane, a circle, and a sphere are all closed sets. For a counterexample, take the set S = AB \ {A, B}, a segment minus its endpoints. Both A and B can still be approached indefinitely close without leaving S, so they are limit points of S. But they are not in S; hence S is not closed. We will find ahead sets that naturally will be missing limit points. For various reasons we do not like that, so we will add them in. Mathematicians have a fancy name for adding in those missing points – they call it taking the closure. The closure of S is the smallest closed set that contains it.
So, going back to our example of the segment without endpoints S, the closure of S, denoted by cl(S), is the full segment AB, which contains all its limit points and is therefore a closed set. Finally, we call the residue of S the set of limit points that we have to add to a set S in order to close it. So, Res(S) = cl(S) \ S. In our example, the residue was the set of the missing endpoints of the segment, {A, B}. We cannot really speak about closed sets, compacts, and so on, without having a topology. A topology on a set is a specification of which subsets are open (open sets are the complements of closed sets). In what follows we will be interested in studying the set of visual rays from the eye O. We will see how to put a topology on the set of these rays.

First let us define what a ray is. We say that a set S is convex if for any two points A and B of S, the segment AB is in S. A point O on a line l divides l \ {O} into two disjoint convex sets $l_1$, $l_2$. We say that $l_1$ and $l_2$ are each a ray over the line l, with origin (or vertex) in O. We interpret rays as directed half-lines going from O to infinity (in the case of $l_1$ and $l_2$, going in opposite directions from O). Note that rays are missing the point of origin O. Vectorially, if $\vec{v}$ is a nonzero vector, a ray is the set $\{O + a\vec{v} : a \in \mathbb{R}^+\}$. For $P \neq O$, $\overrightarrow{OP}$ denotes the ray with origin in O that passes through P. It is the same notation as for

9 Anamorphosis Reformed: From Optical Illusions to Immersive Perspectives

a vector, but we will allow context to disambiguate. Note that the usual definition of ray includes the origin point; ours doesn’t because we don’t want different rays to intersect each other. So a ray with origin in $O$ is contained in $E_O$.

Let $R_O$ denote the set of all rays from $O$. Let $S_O^2$ denote the unit sphere with center $O$. There is a canonical bijection between the set of rays and the sphere: we identify each ray with the point where it intersects the sphere. This bijection endows the set of rays with the topology of the sphere, giving it a notion of closed sets, limit points, and so on. We will therefore refer freely to rays or to points on the sphere interchangeably.
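The identification of rays with points of the unit sphere can be made concrete with a small sketch. The function name `ray_to_sphere` and the sample coordinates are illustrative assumptions, not part of the text; the only mathematics used is normalization of $P - O$.

```python
import math

def ray_to_sphere(O, P):
    """Unit direction of the ray from O through P.

    This is the point where the ray meets the unit sphere centered at O,
    written relative to O (hypothetical helper, for illustration only).
    """
    d = tuple(p - o for p, o in zip(P, O))
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0.0:
        raise ValueError("P coincides with O, which lies on no ray")
    return tuple(c / norm for c in d)

O = (1.0, 2.0, 3.0)
P = (4.0, 6.0, 3.0)                                      # P - O = (3, 4, 0)
P_far = tuple(o + 10.0 * (p - o) for o, p in zip(O, P))  # same ray, ten times further

u = ray_to_sphere(O, P)
u_far = ray_to_sphere(O, P_far)
print(u, u_far)
```

Both points yield the same unit vector, which is exactly what it means for a whole ray to collapse to a single point of the sphere.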

Anamorphosis as a Mathematical Object

The principle of radial occlusion defined above expresses a notion of equivalence between points with regard to a reference point $O$, the eye of the observer. When the principle is valid, two points that are geometrically aligned along a ray from the eye are equivalent as far as visual perception goes. It follows that two objects should be equivalent if all their points are aligned or, equivalently, if they generate the same set of visual rays from $O$. We express this set as follows:

Definition 2. We say that the visual cone of an object $\Sigma$ relative to $O$ is the set of rays subtended by $\Sigma$ with origin in $O$, which we denote $C_O(\Sigma)$. Hence $C_O(\Sigma) = \{\overrightarrow{OP} : P \in \Sigma\}$.

Now, from radial occlusion, it is natural to define that two sets “look the same” if they have the same visual cone. Later we will show that a somewhat looser equivalence is more interesting, which we will call anamorphosis. For now, for the purpose of motivation, we will concentrate on the stricter notion of conical equivalence.

Definition 3. We say that two objects $\Sigma_1$ and $\Sigma_2$ are conically equivalent relative to $O$ if they have the same visual cone relative to $O$. We write this as $\Sigma_1 \overset{O}{=} \Sigma_2$.

It is clear that conical equivalence is indeed an equivalence relation between 3D objects, meaning a relation that is reflexive, symmetric, and transitive. We are interested in studying the classes of equivalence by this relation and how to construct the members of these classes.

Proposition 1. Let $P$, $Q$ be points of $E_O$. Then $P \overset{O}{=} Q$ if and only if $P$ and $Q$ are on the same ray from $O$.

Proposition 1 shows that the principle of radial equivalence for pairs of points is a particular case of Definition 3. This trivially follows from the fact that the cone of a point $P$ is the ray $\overrightarrow{OP}$.
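As an illustrative sketch of Proposition 1, the test “same ray from $O$” can be written without normalizing: the vectors $P - O$ and $Q - O$ must be parallel (zero cross product) and point the same way (positive dot product). The helper names and the tolerance `tol` are assumptions made for this example.

```python
def sub(A, B):
    return tuple(a - b for a, b in zip(A, B))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def on_same_ray(O, P, Q, tol=1e-9):
    """True when P and Q lie on the same ray from O (conical equivalence of points)."""
    p, q = sub(P, O), sub(Q, O)
    c = cross(p, q)
    # |p x q| <= tol * |p| * |q| means the two vectors are parallel up to tolerance.
    aligned = dot(c, c) <= (tol ** 2) * dot(p, p) * dot(q, q)
    return aligned and dot(p, q) > 0.0

O = (0.0, 0.0, 0.0)
P = (1.0, 2.0, 2.0)
same = on_same_ray(O, P, (3.0, 6.0, 6.0))         # further along the same ray
opposite = on_same_ray(O, P, (-1.0, -2.0, -2.0))  # on the opposite ray
off_line = on_same_ray(O, P, (1.0, 2.0, 3.0))     # not on the line OP at all
print(same, opposite, off_line)
```

The second case matters: a point on the opposite ray is collinear with $O$ and $P$ but is not conically equivalent to $P$, since rays are half-lines.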

A. B. Araújo

Proposition 2. If $P$, $Q$, $P'$, $Q'$ are points of $E_O$ such that $P' \overset{O}{=} P$ and $Q' \overset{O}{=} Q$, then $PQ \overset{O}{=} P'Q'$.

Proof. By Proposition 1, triangles $POQ$ and $P'OQ'$ define the same angle with vertex at $O$. Each ray inside the angle intersects sides $PQ$ and $P'Q'$ at a single point, which establishes a bijection between both the sides and the rays of the cone. We can state this as: the cone of rays of a segment $AB$ is the angle $\angle AOB$.

Remark 2. The proof does not work for lines as the triangles do not capture the full cone of the line. In fact the proposition is false for lines.

Corollary 1. If $\gamma$ is a continuous curve with parametrization $g : [0, 1] \to \angle AOB$ with $g(0) \overset{O}{=} A$ and $g(1) \overset{O}{=} B$, then $\gamma \overset{O}{=} AB$.

Remark 3. This is the case of curve $A'B'$ in Fig. 3.

Proposition 3. If $\Delta PQR$ and $\Delta P'Q'R'$ are triangles with vertices in $E_O$, such that $P' \overset{O}{=} P$, $Q' \overset{O}{=} Q$, and $R' \overset{O}{=} R$, then $\Delta PQR \overset{O}{=} \Delta P'Q'R'$.

Proof. For the sides of the triangles, the result follows from Proposition 2. Let $I$ be an interior point of one of the triangles, say triangle $\Delta PQR$. Then the ray from $P$ through $I$ finds a point $J$ on $QR$. Since $QR \overset{O}{=} Q'R'$, there is a point $J'$ on $Q'R'$ such that $J' \overset{O}{=} J$. By Proposition 2, $PJ \overset{O}{=} P'J'$, so the ray from $O$ through $I$ meets the segment $P'J'$, which lies in $\Delta P'Q'R'$.

As in the previous proposition, this does not generalize to the full plane defined by a triangle. Note that segments are convex combinations of two points and triangles are convex combinations of three. The result is valid for general convex combinations.

As a corollary of the propositions above, if $\Sigma$ is a graph composed of vertices and segments (say, the cube of Fig. 3), we can slide every vertex freely along its ray from $O$, and the graph obtained will have the same cone as the original graph. The same is true of the polyhedra obtained from the graph by triangulation (there are several choices of flat faces compatible with a graph). Moving the vertices will create a new polyhedron, equivalent to the first.

Corollary 2. If $S$ is a continuous surface with boundary on a continuous closed curve anamorphic to triangle $ABC$ and $S$ is contained in the solid angle defined by $ABC$, then $S \overset{O}{=} ABC$.

Remark 4.
When working in computer graphics, the propositions above can be used to obtain anamorphs very easily from a reference object. For instance, it is very easy, using the GeoGebra software (Hohenwarter 2002; Hohenwarter et al. 2013), to

slide points on their rays from $O$ to obtain anamorphic polyhedra as in Fig. 3 and even to deform these sides into simple curves. Sánchez-Reyes and Chacón (2016) do something analogous to obtain anamorphic deformations of a reference object using Bézier triangles and NURBS.

As we have seen, Proposition 2 does not generalize to full lines. A line $l$ on a plane $\pi$ divides the plane into a disjoint union of convex sets $\pi = \pi_1 \cup l \cup \pi_2$, where $\pi_1$ and $\pi_2$ are the half-planes on either side of $l$. The cone of $l$ is characterized by the following proposition.

Proposition 4. Let $l$ be a line in $E_O$. $O$ and $l$ define a plane $\pi_O(l)$. Let $l_O$ be the line parallel to $l$ passing through $O$. $l_O$ divides $\pi_O(l)$ into two disjoint half-planes. Then $C_O(l)$ is the set of rays through the half-plane of $\pi_O(l)$ that contains $l$.

We remark that although $l$ is closed in $E$, its cone is not closed in $R_O$. This will be important ahead. There are two rays missing to make the cone closed, these being the two diametrically opposite rays whose union is $l_O$. These mark the limits of the cone of $l$ but do not belong to it. Intuitively, the line of sight approaches parallelism to $l$ more and more as it follows it to infinity, but never reaches it. The reader probably recognizes line $l_O$ as the line that in classical perspective defines the vanishing point of $l$ by intersection with the plane of projection. Here we have no plane of projection, but this line is still important:

Proposition 5. Let $l$ and $l'$ be lines in $E_O$. Then $l \overset{O}{=} l'$ if and only if $l$ and $l'$ define the same plane with $O$ and $l_O = l'_O$.
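The sliding of vertices along their rays, as used in Remark 4 and justified by Proposition 2, can be checked numerically. The sketch below assumes a segment $PQ$ not aligned with $O$, slides its endpoints by arbitrary positive factors `a` and `b`, and matches each point of $PQ$ with the point of $P'Q'$ seen in the same direction. The closed form in `matching_parameter` is derived here from $sX = (1 - t')aP + t'bQ$ (coordinates relative to $O$) and is not taken from the text.

```python
def sub(A, B):
    return tuple(a - b for a, b in zip(A, B))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def slide(O, P, k):
    """Move P along its ray from O by the positive factor k."""
    return tuple(o + k * (p - o) for o, p in zip(O, P))

def lerp(A, B, t):
    """Point of parameter t on the segment AB."""
    return tuple((1.0 - t) * a + t * b for a, b in zip(A, B))

def on_same_ray(O, P, Q, tol=1e-9):
    p, q = sub(P, O), sub(Q, O)
    c = cross(p, q)
    return dot(c, c) <= (tol ** 2) * dot(p, p) * dot(q, q) and dot(p, q) > 0.0

def matching_parameter(t, a, b):
    """Parameter on P'Q' of the point seen from O in the direction of lerp(P, Q, t)."""
    return (t / b) / (t / b + (1.0 - t) / a)

O = (0.0, 0.0, 0.0)
P, Q = (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)
a, b = 3.0, 0.5                      # arbitrary positive sliding factors
P2, Q2 = slide(O, P, a), slide(O, Q, b)

checks = []
for i in range(11):
    t = i / 10.0
    X = lerp(P, Q, t)
    X2 = lerp(P2, Q2, matching_parameter(t, a, b))
    checks.append(on_same_ray(O, X, X2))
print(all(checks))
```

Since the matching parameter always lies in $[0, 1]$, every ray through $PQ$ meets $P'Q'$ and vice versa, so the two segments have the same visual cone.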

In linear perspective line $l_O$ is used to construct the projection of $l$. The construction is made by joining the intersections of $l$ and of $l_O$ with the projection plane. This is a fundamental theorem of linear perspective (Andersen 1992, page 12), but it cannot even be stated for anamorphoses since we have no projection plane. Further, we will see ahead that a general projection surface will not intersect $l$ anyway. But we have the following analogous construction that uses $l_O$ and $l$ to obtain an arc of circle that is related to $l$ in a canonical way.

Proposition 6. There is a single circle $C$ of center $O$ that is tangent to $l$. Let $\mathcal{C}_O(l)$ be the intersection of $C$ with the half-plane of $\pi_O(l)$ on the same side of $l_O$ as $l$. Then $\mathcal{C}_O(l)$ and $l$ define the same cone of rays from $O$.

Proof. The perpendicular to $l$ from $O$ intersects $l$ at the point $I$ of $l$ that minimizes the distance to $O$. Hence the circle through $I$ of center $O$ will be tangent to $l$ (Fig. 13). This circle intersects $l_O$ at two points $V$ and $V'$, diametrically opposite with respect to $O$. Consider the half-circle obtained from $VIV'$ by excluding its endpoints $V$ and $V'$. There is a one-to-one correspondence between points on this arc and rays of the cone $C_O(l)$, obtained by intersection of the arc with $\overrightarrow{OP}$, for each $P$ in $l$.
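The construction in the proof of Proposition 6 translates directly into coordinates. In this sketch the line is given by a point `A` and a direction `d` (an assumed parametrization), and the function names are hypothetical. It computes the foot $I$ of the perpendicular from $O$, the radius of the tangent circle, the excluded endpoints $V$ and $V'$, and the point of the arc seen in the direction of a given point of the line.

```python
import math

def sub(A, B):
    return tuple(a - b for a, b in zip(A, B))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def canonical_semicircle(O, A, d):
    """Return (I, r, V, V2) for the line through A with direction d, not through O.

    I is the point of the line closest to O, r = |OI| is the radius of the
    tangent circle, and V, V2 are its intersections with the parallel l_O.
    """
    dd = dot(d, d)
    t = dot(sub(O, A), d) / dd
    I = tuple(a + t * c for a, c in zip(A, d))
    r = math.sqrt(dot(sub(I, O), sub(I, O)))
    k = r / math.sqrt(dd)
    V = tuple(o + k * c for o, c in zip(O, d))
    V2 = tuple(o - k * c for o, c in zip(O, d))
    return I, r, V, V2

def arc_point(O, P, r):
    """Point where the ray from O through P crosses the circle of radius r."""
    v = sub(P, O)
    n = math.sqrt(dot(v, v))
    return tuple(o + r * c / n for o, c in zip(O, v))

O = (0.0, 0.0, 0.0)
A, d = (2.0, 5.0, 0.0), (0.0, 1.0, 0.0)      # the line x = 2 in the plane z = 0
I, r, V, V2 = canonical_semicircle(O, A, d)
X = arc_point(O, (2.0, 2.0, 0.0), r)          # image of the point (2, 2, 0) of the line
same_side = dot(sub(X, O), sub(I, O)) > 0.0   # X is on the side of l_O that contains l
print(I, r, V, V2, X, same_side)
```

For any point $P$ of the line, $(P - O) \cdot (I - O) = r^2 > 0$, which is why the image never reaches $V$ or $V'$.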

Fig. 13 The canonical meridian representative of a line’s visual cone

This is a nice result at first sight. There is a canonical arc of circle that has the same cone as the line. But there is a problem. The reader may have noticed that we didn’t say that these were equivalent objects. That is false both for the arc with the endpoints and for the arc without them. The arc without the endpoints has the same cone of rays as the line, but it is not a closed set; hence it is not what we defined as an object. The arc with the endpoints is closed, but it is not conically equivalent to the line, as the rays corresponding to the endpoints are not a part of the line’s cone. We could solve this in several ways, for instance, by changing the definition of object, but the most satisfying seems to be to change our notion of equivalence. Hence we define anamorphic equivalence by slightly relaxing conical equivalence:

Definition 4. We say that two objects $\Sigma_1$ and $\Sigma_2$ are equivalent by conical anamorphosis relative to $O$ if their visual cones are equal up to topological closure, that is, if $\mathrm{cl}(C_O(\Sigma_1)) = \mathrm{cl}(C_O(\Sigma_2))$. We write this as $\Sigma_1 \overset{O}{\sim} \Sigma_2$.

Some nomenclature: “equivalent by conical anamorphosis relative to $O$” is a mouthful, so for short we’ll just say that two objects are anamorphic, or that they are anamorphs (of each other), or that one is an anamorphosis of the other. This definition fulfills well both the common usage of these expressions and our technical needs. These concessions to brevity and custom should not make us forget that any anamorphic equivalence is always relative to a specific observation point, so each of these phrases should be mentally ended with “relative to $O$.”

Proposition 7. If $\Sigma_1$, $\Sigma_2$ are objects such that $\Sigma_1 \overset{O}{=} \Sigma_2$, then $\Sigma_1 \overset{O}{\sim} \Sigma_2$.

Obviously, conical equivalence implies anamorphic equivalence. Hence the results we obtained for conical equivalence above are still valid for anamorphosis. Now let us reconsider the canonical semicircle we defined above. If we add to it its endpoints, by taking its closure, it becomes an anamorph of the line $l$. These two points are very important. They are the intersection of $l_O$ with the sphere; hence

they are the same for every line parallel to $l$. Hence they will be meeting points for all lines parallel to $l$. The two rays of $l_O$ generalize the notion of vanishing points from classical perspective. We can generalize the construction we did for the line, so as to get a canonical anamorph of any object, as well as a vanishing set:

Definition 5. We say that the anamorphosis of an object $\Sigma$ relative to $O$ is $\Lambda_O(\Sigma) = \mathrm{cl}(C_O(\Sigma))$. We identify it with its projection onto the unit sphere, $\mathrm{cl}(C_O(\Sigma)) \cap S_O^2$, when context is clear. The latter we also call the canonical anamorph of $\Sigma$.

It follows trivially from Definition 4 that an object is anamorphic to its anamorphosis thus defined. Its spherical anamorph is called canonical because it is the most natural of its 2D anamorphs, arising from the identification of the space of rays with the sphere.

Definition 6. We say that $V_O(\Sigma) = \mathrm{Res}(C_O(\Sigma)) = \mathrm{cl}(C_O(\Sigma)) \setminus C_O(\Sigma)$ is the vanishing set of object $\Sigma$ relative to $O$. We call the points of the vanishing set vanishing points. We identify the rays of the vanishing set with the corresponding points on the unit sphere $S_O^2$.

The vanishing set is the residue of the visual cone of the object, that is, the points you must add to make it closed. This definition has the advantage that it works for any object and, when restricted to lines and planes, reduces to what you would expect from the classical definition, as we will see.

Remark 5. When teaching the concept of vanishing points to artists, it is hard to deal with topological concepts explicitly, especially on a short timeframe, such as a workshop.
Then it is more practical to generalize the tradition of perspective started by Taylor and simply define the vanishing points of a line as the points associated with the two opposite rays that make up $l_O$, and then notice that these are at the ends of the canonical semicircle; it then follows from the definition that parallel lines have the same vanishing points and that these are meeting points for those lines.

Proposition 8. The canonical anamorph of a line $l \subset E_O$ is a meridian of the unit sphere. The vanishing set of $l$ is a pair of antipodal points located at the ends of the meridian, where $l_O$, the parallel to $l$ through $O$, intersects the sphere.

This proposition turns perspective into spherical geometry, so we can apply the tools of that discipline. See Catalano (1986) and Araújo (2018c). In particular we will be talking a lot about antipodal points, so let us recall that two points are called antipodal if they are diametrically opposite on a sphere. Given a point $P$ on the sphere we denote its antipodal point by $P'$. The notion extends trivially to the corresponding rays through the center of the sphere.
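A minimal numerical sketch of Proposition 8, under the assumption that a line is given by a point `A` and a direction `d`: the two vanishing points are $\pm d/|d|$, they depend only on the direction, and the directions in which far-away points of the line are seen approach them without ever reaching them. The function names are illustrative.

```python
import math

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def vanishing_points(d):
    """Antipodal pair where the parallel l_O through the eye meets the unit sphere."""
    u = unit(d)
    return u, tuple(-c for c in u)

def seen_direction(O, A, d, t):
    """Point of the unit sphere (relative to O) on which A + t*d is seen from O."""
    return unit(tuple(a + t * c - o for a, c, o in zip(A, d, O)))

O = (0.0, 0.0, 0.0)
d = (1.0, 2.0, 2.0)
V, V2 = vanishing_points(d)

# Two distinct parallel lines: same direction d, different base points.
far_1 = seen_direction(O, (2.0, 5.0, 0.0), d, 1e9)
far_2 = seen_direction(O, (-3.0, 1.0, 4.0), d, 1e9)
back_1 = seen_direction(O, (2.0, 5.0, 0.0), d, -1e9)

gap_1 = max(abs(a - b) for a, b in zip(far_1, V))
gap_2 = max(abs(a - b) for a, b in zip(far_2, V))
gap_back = max(abs(a - b) for a, b in zip(back_1, V2))
print(V, V2, gap_1, gap_2, gap_back)
```

The gaps shrink as $|t|$ grows but stay nonzero for a line not through $O$, which is the numerical face of the statement that the vanishing points lie in the closure of the cone and not in the cone itself.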

This proposition has the obvious corollary that not only do parallel lines have the same vanishing points, but these are actual meeting points of their anamorphoses. This is a very concrete alternative to the abstract notion of parallel lines meeting at infinity: their spherical anamorphs actually meet on the sphere.

Corollary 3. If $l$ and $l'$ are parallel lines, their canonical anamorphs intersect at their vanishing points.

This is a beautifully unifying view of vanishing points, unlike the classical perspective view where a line has sometimes one and sometimes no vanishing points, or even the (hemi)spherical perspective of Barre and Flocon (1968) where lines have either one or two vanishing points. This search for constancy in the number of vanishing points can be compared to the insistence of mathematicians on redefining numbers to ensure that an $n$-degree equation always has exactly $n$ roots. The projective plane view of perspective answers the same desire for symmetry, but there we always have a single vanishing point. Our present notion of anamorphosis (and later of perspective) splits that point in two as we want no preferential axis and wish to capture the whole visual environment.

Now we consider the matter of construction of these anamorphs. Since the canonical anamorph of a line is an arc of circle, we might expect it to require three points for its construction. But since it is a meridian, and the two vanishing points are antipodal, two points are enough. Recall that two non-antipodal points on a sphere define a plane through its center. The intersection of that plane with the sphere is a circle with the same radius as the sphere, which is called a great circle of the sphere, or a geodesic. A meridian is a connected half of a great circle, so it can be specified by choosing two antipodal points and then picking one of the two possible halves of the geodesic. This picking of a meridian can be done by choosing a point in its interior.
But since the two endpoints are mutually antipodal (hence knowing one is enough), all we really need to know to determine a meridian is two points of it, as long as one of them is specified to be an endpoint. For the same reason, only two points are needed to determine the image of a line, as long as one of them is specified to be a vanishing point.

Proposition 9. The anamorphosis of a line $l$ of $E_O$ is determined by one vanishing point and by the projection on the sphere of one point of $l$.

Proof. The vanishing point $V$ and the measured point $P'$ projected from $l$ onto the sphere determine a plane through $O$, hence a geodesic. On this geodesic, the line image will be the meridian $VP'V'$.

Note that if the line $l$ passes through $O$ then its image is just a set of two points, these being located at the place of the vanishing points, which in this case are actual points of the projection. It is the line itself that vanishes. Taylor remarked on much the same situation in classical perspective, and that is one of the justifications for the terms

“vanishing point,” as the line “vanishes” into it in this degenerate case (Andersen 2007, page 175). There is a technical difference in our definitions here, since by Definition 6, these two points are not in our vanishing set at all, while in classical perspective the vanishing point exists and coincides with the single projected point.

Example 1. In Figs. 14 and 15 we see two examples of pairs of spatial lines $l$ and $j$ relating to their vanishing points and to their spherical anamorphs. In Fig. 14 we see two parallel lines. They have therefore the same translation to the origin, so $l_O \equiv j_O$; hence they have the same pair of antipodal vanishing points, $V$ and $V'$. Their canonical anamorphs on the sphere are meridians ending at these vanishing points. Hence the vanishing points are actual meeting points for the canonical anamorphs. As for construction of these spherical anamorphs, two projected points $P$ and $Q$ from $l$ and $j$, respectively, define each meridian, as in Proposition 9. These points may be measured anywhere, but one way would be at the intersection of the meridian with the geodesic plane orthogonal to $l_O$.

In Fig. 15 we see two perpendicular lines. Now lines $l_O$ and $j_O$ are themselves orthogonal, crossing at $O$, and we get two pairs of vanishing points. The meridians are obtained from these by measuring a single additional point, the common point where the two lines cross. Hence we see that only vanishing point $V_1$ and point $P$ need be measured to obtain the whole construction.

In Fig. 16 we see the spherical anamorph of a cube. The cube defines a set of six distinct vanishing points, which are meeting points for the lines that prolong the edges of the cube.
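The two-point construction of Proposition 9 can be sketched as a parametrization of the meridian. Given a vanishing point `V` and one measured point `P` on the unit sphere (both taken relative to $O$, with `P` different from `V` and from its antipode), the part of `P` orthogonal to `V` fixes which half of the great circle is meant. The function name `meridian` and the angle parameter are assumptions of this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def meridian(V, P):
    """Return (point, theta_P) for the meridian from V through P to the antipode of V.

    point(theta) runs over the meridian for theta in [0, pi], with point(0) = V
    and point(pi) = -V; theta_P is the parameter at which it passes through P.
    V and P must be unit vectors.
    """
    c = dot(P, V)
    w = tuple(p - c * v for p, v in zip(P, V))   # component of P orthogonal to V
    s = math.sqrt(dot(w, w))
    if s < 1e-12:
        raise ValueError("P must differ from V and from its antipode")
    N = tuple(x / s for x in w)

    def point(theta):
        return tuple(math.cos(theta) * v + math.sin(theta) * n for v, n in zip(V, N))

    return point, math.acos(max(-1.0, min(1.0, c)))

V = (0.0, 1.0, 0.0)
h = math.sqrt(0.5)
P = (h, h, 0.0)
point, theta_P = meridian(V, P)
start, through, end = point(0.0), point(theta_P), point(math.pi)
print(start, through, end)
```

Sampling `point` at a few angles between 0 and π gives the points needed to draw the line image on the sphere, or to project it afterwards onto another surface.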

Fig. 14 Anamorphosis onto the sphere of two parallel lines. The lines project as meridians that meet at two common vanishing points $V$ and $V'$, antipodal to each other

Fig. 15 Two lines $l$ and $j$ intersecting at a right angle. Their canonical anamorphs on the sphere go to pairs of vanishing points distributed at a regular separation of 90 degrees along the geodesic defined by the orthogonal lines $l_O$ and $j_O$

Fig. 16 Anamorph of a cube on a sphere. The cube defines a set of six distinct vanishing points, two for each set of parallel edges

Fig. 17 A plane $\sigma$ and its vanishing set $V_O(\sigma)$ on the unit sphere

Proposition 10. Let $\sigma$ be a plane not containing $O$. Let $\sigma_O$ be the plane parallel to $\sigma$ through $O$. Then $\sigma_O$ defines a great circle of the sphere, which is the vanishing set of $\sigma$ (Fig. 17). This great circle divides the sphere into two hemispheres. The anamorphosis of $\sigma$ is the hemisphere on the side of $\sigma_O$ that contains $\sigma$.

Proof. The construction is analogous to that of the canonical circle. Take a perpendicular from $O$ to $\sigma$, and this defines a sphere tangent to $\sigma$, half of which will be the image of the visual cone of $\sigma$. To get the anamorphosis, scale it down to unit radius.

The vanishing points of lines and planes are quite analogous to the linear perspective case, in that they are obtained by translation to the observation point followed by intersection with the projection surface. But our Definition 6 applies also to general objects that cannot be treated in this fashion, such as the following examples.

Example 2. Let $l$ be the curve $t \mapsto (t, t^2, 0)$ and let $O = (0, 1, 0)$. Then the cone of $l$ will be the great circle defined by $z = 0$, minus the point $(0, 1, 0)$, which is the limit of the direction of $(t, t^2 - 1, 0)$ as $t \to \pm\infty$. But the closure of this set on the sphere is the whole of the great circle. So the vanishing set is $\{(0, 1, 0)\}$ and the anamorphosis is the great circle in the plane $z = 0$.

Example 3. Let $S$ be the surface $z = x^2 + y^2$ and let $O = (0, 0, 1)$. Then the cone of $S$ will be the whole sphere, minus the point $(0, 0, 1)$. But the closure of this set on the sphere is the whole of the sphere, so the vanishing set is $\{(0, 0, 1)\}$ and the anamorphosis is the sphere itself.
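Example 2 can be checked numerically in the plane $z = 0$. The sketch below, whose function names are assumptions, does two things: it follows the parabola to large $|t|$ and records the direction in which the point is seen from $O = (0, 1)$, and it solves $c_x^2 s^2 - c_y s - 1 = 0$ for the distance $s$ at which a ray of direction $(c_x, c_y)$ from $O$ meets the parabola. The only direction with no solution is the upward one, along the axis of the parabola, which is therefore the single missing point of the cone.

```python
import math

O = (0.0, 1.0)   # the eye, in the plane z = 0

def seen_direction(t):
    """Unit direction from O to the point (t, t^2) of the parabola."""
    x, y = t - O[0], t * t - O[1]
    n = math.hypot(x, y)
    return (x / n, y / n)

def hit_distance(cx, cy, eps=1e-12):
    """Distance s > 0 at which the ray O + s*(cx, cy) meets y = x^2, or None."""
    if abs(cx) < eps:
        # Vertical ray: it meets the parabola only when going down, at the vertex.
        return -1.0 / cy if cy < 0.0 else None
    disc = cy * cy + 4.0 * cx * cx
    return (cy + math.sqrt(disc)) / (2.0 * cx * cx)   # the unique positive root

far_right = seen_direction(1e6)
far_left = seen_direction(-1e6)
sideways = hit_distance(1.0, 0.0)    # ray toward +x reaches the point (1, 1)
down = hit_distance(0.0, -1.0)       # ray toward -y reaches the vertex (0, 0)
up = hit_distance(0.0, 1.0)          # ray toward +y never reaches the parabola
print(far_right, far_left, sideways, down, up)
```

Both far-away directions tend to $(0, 1)$ from opposite sides, so the vanishing point is found here as a “missing point” of the image rather than by following a straight line to infinity.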

This is a different perspective from the more traditional notion of “following the line to infinity.” Although the vanishing set is the same, we can look for the vanishing points in the anamorphic projection itself, by looking for “missing points” or, if you will, for “vanished” points. This is also a better definition when we consider sets whose limit rays cannot be obtained by simple curve parametrizations, as in our second example above.

We would now like to define an anamorphosis onto an arbitrary surface and its vanishing points. In what follows we will define surface to mean a manifold of dimension two, that is, something that locally maps to a region of a plane. These will usually be manifolds with boundary, such as a sheet of paper or a half-sphere. Most of the time they will be smooth, but we only require them to be topological manifolds. Most of the time they will be connected, but not always.

We use the canonical anamorphosis onto the sphere as the blueprint from which all others are derived as projected images, usually partial ones. The idea is simple: the anamorphosis of an object onto a surface is just the closure of the conical projection of the object onto the surface. The vanishing points on the surface will be the subset of the canonical vanishing points that happen to fall on the surface. Technically, it takes some care not to count points multiple times if the surface has a complex shape that folds over itself or has several connected components. To avoid these problems, and because that work is centered on spherical perspectives, Araújo (2018c) defines the anamorphosis only for compact starred surfaces, which are just radial deformations of the sphere. These are quite enough for most purposes, so we start with them and then discuss possible generalizations briefly.

Definition 7. We say that a compact surface $S$ is starred relative to $O$, or that it is an $O$-starred surface, if every ray from $O$ touches $S$ at most once.
A compact surface is said to be locally $O$-starred at $P \in S$ if there is a neighborhood $B$ of $P$ such that every ray through $B$ intersects $S \cap B$ at most once.

You can think of an $O$-starred surface as being locally defined by $f(u) = h(u)u$, $u \in S_O^2$, where $h : U \subset S_O^2 \to \mathbb{R}^+$ on a region $U$ of the sphere $S_O^2$. Intuitively, $h(u)$ represents a ratio that pushes each point of the sphere closer to or further away from the center $O$. We can define anamorphosis onto an $O$-starred surface $S$ as follows.

Definition 8. The anamorphosis relative to $O$ of an object $\Sigma$ onto an $O$-starred surface $S$ is $\Lambda_{O,S}(\Sigma) = \mathrm{cl}(C_O(\Sigma) \cap S)$. The vanishing set of $\Sigma$ on $S$ is $V_{O,S}(\Sigma) = \Lambda_{O,S}(\Sigma) \setminus (C_O(\Sigma) \cap S)$.

Since the conical projection is a continuous map onto an $O$-starred surface, we have the following result.

Proposition 11. Let $S$ be a locally starred surface at $V$. Then $V$ is a meeting point for the anamorphic images of lines with vanishing point $\overrightarrow{OV} \in R_O$.

When $S$ is a starred surface, we can express these objects very simply as intersections.

Proposition 12. Let $S$ be an $O$-starred surface. Then $\Lambda_{O,S}(\Sigma) = \Lambda_O(\Sigma) \cap S$, and $V_{O,S}(\Sigma) = V_O(\Sigma) \cap S$. That is, the anamorphosis (resp. vanishing set) of $\Sigma$ on $S$ is just the intersection of the rays of the anamorphosis (resp. vanishing set) with $S$.

When considering the actual construction of an anamorphosis on a surface, we can construct one anamorph, the simplest one we can find, and then project it onto the required surface. In particular, the canonical anamorphosis onto the unit sphere is a good candidate as a prototype for others, since it is so symmetrical. So it is natural to solve the anamorphosis there and then project it conically onto the required surface, using the fact that $\Lambda_{O,S}(\Sigma) \overset{O}{\sim} C_{O,S}(\Lambda_{O,S_O^2}(\Sigma))$. For this reason, we are compelled to study the projections of lines and planes on the sphere with special interest, as well as their projections onto various kinds of surfaces. Lines, as we have discussed, project as meridians onto the sphere, each meridian ending at two antipodal vanishing points. We will now consider how these meridians in turn project onto various surfaces.

Example 4. Consider the anamorphosis of a line $l$ onto a compact region $S$ of a plane $\pi \subset E_O$: this can be identified with (a compact subset of) linear perspective or with so-called oblique anamorphosis. Let $\pi_O$ be the plane through $O$ such that $\pi_O \parallel \pi$. $l$ projects on the sphere as a meridian with vanishing set $\{V, V'\}$. Suppose the vanishing set is on $\pi_O$. Then if $l$ is on the side of $\pi_O$ that contains $\pi$, it projects onto a line; if on the other side, it projects as the empty set. In either case the rays corresponding to the vanishing points do not intersect $\pi$, so the line has no vanishing points on $\pi$.
If the vanishing points are not on $\pi_O$, then the line intersects the plane $\pi$ at a point $I$ and one and only one of the vanishing points projects onto $\pi$ as a point $V_\pi$. Then the line projects onto the ray with origin at $V_\pi$ that passes through $I$, and its vanishing point on $\pi$ is $V_\pi$. Since we assume $S$ to be a compact (hence bounded) subset of $\pi$, both the vanishing point and point $I$ (as well as an unbounded section of the line projection) may actually be outside the anamorphosis proper, but they can still be used for the construction.

Remark 6. We can contrast classical perspective (or plane anamorphosis) with spherical anamorphosis by saying that in the former a line projects either as a line or as a ray, while in the latter a line projects as a “segment” (i.e., a meridian ending at vanishing points).
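The generic case of Example 4 can be sketched in coordinates. The setup is an assumption made for illustration: $O$ is the origin, the plane $\pi$ is written as $n \cdot X = c$ with $c \neq 0$, and the line as `A + t*d` with $n \cdot d \neq 0$, so that its vanishing points are not on $\pi_O$. The function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(A, B):
    return tuple(a - b for a, b in zip(A, B))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def project_to_plane(P, n, c):
    """Conical projection from the origin of P onto the plane n.X = c.

    Returns None when the ray from the origin through P does not meet the plane.
    """
    k = dot(n, P)
    if k * c <= 0.0:
        return None
    return tuple((c / k) * x for x in P)

def plane_vanishing_point(d, n, c):
    """Point where the parallel to the line through the origin meets the plane."""
    k = dot(n, d)
    return None if k == 0.0 else tuple((c / k) * x for x in d)

def line_plane_intersection(A, d, n, c):
    """Point I where the line A + t*d crosses the plane."""
    k = dot(n, d)
    if k == 0.0:
        return None
    t = (c - dot(n, A)) / k
    return tuple(a + t * x for a, x in zip(A, d))

n, c = (0.0, 0.0, 1.0), 1.0                  # the picture plane z = 1
A, d = (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)
V_pi = plane_vanishing_point(d, n, c)
I = line_plane_intersection(A, d, n, c)

beyond = project_to_plane((1.0, 3.0, 3.0), n, c)    # point of l at t = 3
before = project_to_plane((1.0, 0.5, 0.5), n, c)    # point of l at t = 0.5
behind = project_to_plane((1.0, -1.0, -1.0), n, c)  # point of l at t = -1

def collinear(X):
    return all(abs(v) < 1e-12 for v in cross(sub(X, V_pi), sub(I, V_pi)))

print(V_pi, I, beyond, before, behind, collinear(beyond), collinear(before))
```

The projected points all fall on the half-line that starts at $V_\pi$ and passes through $I$, while the part of the line on the far side of $\pi_O$ has no image at all.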

Example 5. Suppose $S$ is a compact region of three intersecting planes disposed as the floor and walls in the corner of a room (Fig. 6). This is locally the same as a plane anamorphosis, but globally very different, as some lines may have two vanishing points. In rendering lines, we may consider the following strategy: in each plane plot two points of the line, thus finding two lines joining at the intersection of the projection planes. More efficiently, find only three points per line, one point on each plane and one on the intersection of the planes. But since two non-antipodal points define a geodesic, in fact two points anywhere on the planes must be enough to define the line across all projection planes. These constructions are maximally symmetric in the case of the anamorphosis onto a cube, which has been studied in Araújo et al. (2019b) as a special case of spherical perspective. Then the geodesic associated with a line $l$ projects as a set of six or of four segments on the cube, obtainable from any two points through simple descriptive geometry constructions.

Example 6. Let $S$ be a compact region of a cylinder, with $O$ outside the cylinder (Figs. 18 and 7). This is a good example of how we can make do with only starred surfaces. Although the cylinder is not starred from $O$, we can consider as projection

Fig. 18 Descriptive geometry construction of the anamorphosis of a cube onto a cylinder. Each point is found on the top view (bottom left), then lifted to the side view (upper left) and found on the developed (cut and unrolled) cylinder by transforming the angular coordinates from the top view and transporting the heights from the side view (a). The rectangle can be cut and rolled up to obtain the cylindrical anamorphosis (b), which will be correctly observed from the point $O$ determined by the two projections $O_S$ and $O_T$. From this point the edges of the cube appear straight (c)

surface only the proximal region of the cylinder, which is visible from O and exclude the distal, occluded part, thus obtaining a starred subset. A generic nonvertical line will define a plane that cuts the cylinder in an ellipse. A line will project as an arc of an ellipse, with at most one vanishing point. A large class of lines will have no vanishing points at all on the cylinder, as lO will not intersect it. If the cylinder were to be cut at a vertical line and opened isometrically (unrolled), the ellipses would transform into segments of sinusoidal curves. See Apostol and Mnatsakanian (2007). Example 7. Let S again be a cylinder as in the previous example, but now O is on its axis (Fig. 19). Now generic geodesics will project onto ellipses that are half above and half below the horizontal plane through O. The line will project as an arc of its ellipse, ended by two diametrically opposed vanishing points. Lines on the horizontal plane through O are special cases, projecting as horizontal half-circles. Lines with vanishing points along the axis of the cylinder will project as verticals.

Fig. 19 Descriptive geometry construction of the anamorphosis of a line onto a cylinder with O on the axis. The image of the line on the unrolled cylinder is one half of a sinusoid whose axis of symmetry is the horizon line, at the height of O

All geodesics, projected as sinusoids in the opened cylinder, will share the same horizontal axis of symmetry, as the planes of their geodesics all intersect the axis of the cylinder at O. This was not true in the previous case, where the plane of a generic line could intersect the cylinder’s axis at any point.
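The sinusoids of Example 7 can be written down explicitly. The sketch assumes $O$ at the origin, a cylinder of radius `r` around the $z$ axis, and a line given as `A + t*d`; the plane through $O$ and the line has normal $n = A \times d$, and a point $(r\cos\theta, r\sin\theta, z)$ of the cylinder lies on that plane when $z = -r\,(n_x\cos\theta + n_y\sin\theta)/n_z$. The function names and the sample line are illustrative.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unrolled_height(theta, n, r):
    """Height at angle theta of the ellipse cut on the cylinder by the plane
    through the origin with normal n (requires n[2] != 0, a non-vertical plane)."""
    return -r * (n[0] * math.cos(theta) + n[1] * math.sin(theta)) / n[2]

def project_to_cylinder(P, r):
    """Conical projection from the origin onto the cylinder, as (angle, height)."""
    s = r / math.hypot(P[0], P[1])
    return math.atan2(P[1], P[0]), s * P[2]

r = 1.0
A, d = (2.0, 0.0, 1.0), (0.0, 1.0, 0.0)     # a horizontal line at height 1
n = cross(A, d)                              # normal of the plane through O and the line

errors = []
for t in range(-5, 6):
    P = tuple(a + t * c for a, c in zip(A, d))
    theta, height = project_to_cylinder(P, r)
    errors.append(abs(height - unrolled_height(theta, n, r)))
print(n, max(errors))
```

The sinusoid has zero mean in the height coordinate, so its axis of symmetry is the horizon at the height of $O$. For this horizontal line the vanishing points sit at $\theta = \pm\pi/2$ on that horizon, where the computed height is zero.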

More General Surfaces

The following section is technical nitpicking that should be ignored by the artist (and probably by everyone on a first reading). Although starred surfaces are quite enough for most purposes and provide the most elegant theoretical results, they can hardly model the large variety of real surfaces on which artists have constructed anamorphoses. Often these can be rather complex surfaces that fold over themselves in such a way as to cut a ray from $O$ multiple times; hence they are non-starred. Often they will have several connected components, sometimes so many that they might be better modeled as a point cloud. It is in fact quite up for debate what the most desirable definition would be for a general anamorphic projection surface. We will not aim at closing that discussion here but just at exploring it a little.

First of all, we note that we could certainly extend our definitions to include all sorts of surfaces, even non-compact surfaces such as the whole Euclidean plane (in that case the formalism under consideration just becomes that of linear perspective). We would lose some of the elegance but things still work. It is interesting to consider what happens when we remain with compact sets but relax conditions just a little.

One way to deal with non-starred surfaces is by cropping or changing the surfaces into their minimal starred equivalents by considering only the subset that is visible from $O$. We did this informally with the cylinder in Example 6. We can formalize this procedure by generalizing conical projection to take occlusion into account, in the following way: given a point $P$ and a surface $S$, the proximal conical projection sends $P$ to the intersection with $S$ which is closest to $O$ (the reasoning is that the distal ones are occluded).

Definition 9. The proximal conical projection from $O$ to a surface $S \subset E_O$ is the map from $R_O$ to $S$ defined by $\varphi_{O,S}(\vec{r}) = \{P \in \vec{r} \cap S : \forall Q \in \vec{r} \cap S,\ |OQ| \geq |OP|\}$.

Remark 7.
When the surface is $O$-starred, the proximal conical map reduces to the ordinary conical projection $\vec{r} \mapsto \vec{r} \cap S$. We identify the map from $R_O$ with the corresponding map from $E_O$ defined by $P \mapsto \overrightarrow{OP} \cap S$ that maps each spatial point to its conical projection on the surface.

Identifying points of $S$ with their rays, we can use the proximal conical projection map to define the minimal starred equivalent of $S$ itself, i.e., the smallest locally starred surface that has the same anamorphosis as $S$. In fact $\varphi_{O,S}(S)$ has no double points, preserving only the proximal (closest to $O$) intersection of each ray going

Fig. 20 The object composed by the rectangles $A$, $B$, $C$, $D$ is not starred. Taking the proximal projection $\varphi_{O,S}(S)$ eliminates sections $B$ and $C$, leaving two connected components $A$ and $D$, minus segment $h_2$, which is recovered by taking the closure

through a point of S. But since we insist on compact sets, we define the minimal O-starred equivalent of S to be $\tilde{S}_O = \mathrm{cl}(\phi_{O,S}(S))$. Because of the possible existence of multiple connected components, this surface will generally have a nonempty set of repeated intersections, although one of measure zero. It will be locally starred almost everywhere, i.e., there will be a set δ of measure zero such that $\tilde{S}_O$ is locally starred in $\tilde{S}_O \setminus \delta$. We say that a surface with this property is quasi-starred.

Example 8. The object of Fig. 20, defined by the union of rectangles A, B, C, and D, is not O-starred. Taking the proximal projection $\phi_{O,S}(S)$ eliminates sections B and C, which are occluded by A, leaving two connected components A and D, minus the segment $h_2$ of D, which is occluded by segment $h_1$ of A. This segment is recovered by taking the closure, so that $\tilde{S}_O = A \cup D$ is the disjoint union of two compact rectangles. It is only quasi-starred, since $h_1 \sim_O h_2$ is a set of double points relative to O.

These double points have to be handled carefully. Continuing Example 8, consider Fig. 21. The vanishing points V and V′, which are aligned with O, both belong to the surface and correspond to the same vanishing point on $R_O$, but are meeting points for different sets of lines. We must take some care in defining vanishing points for general surfaces, to avoid duplication or the appearance of false vanishing points, as the equivalent of Proposition 12 is no longer valid. Let us consider our requirements: the vanishing points of a set Σ on S are determined by projection of $V_O(\Sigma)$, so clearly $V_{O,S}(\Sigma)$ is a subset of the intersection of $V_O(\Sigma) \subset R_O$ with S. But this cannot be an equality, otherwise we would have double vanishing points. Also, we cannot define them as $\phi_{O,S}(V_O(\Sigma))$, as sometimes the proximal intersection with S of the vanishing rays is not the meeting


A. B. Araújo

Fig. 21 Anamorphosis onto a quasi-starred set with two connected components. Points V and V′ are in the ray of the same canonical vanishing point, but Definition 10 ensures that V (resp. V′) is only a vanishing point for the family of parallel lines s1, s2 (resp. l1, l2)

point of the projected lines, which is located in a distal connected component. The following solves all these difficulties:

Definition 10. The anamorphosis of an object Σ onto a projection surface S relative to O is $\Lambda_{O,S}(\Sigma) = \mathrm{cl}(\phi_{O,S}(C_O(\Sigma) \cap \tilde{S}_O))$, where $\tilde{S}_O = \phi_{O,S}(S)$ is the minimal compact starred subset of S. The vanishing set of Σ on S is $V_{O,S}(\Sigma) = \Lambda_{O,S}(\Sigma) \setminus (C_O(\Sigma) \cap \tilde{S}_O)$.

Let us see how this works with the example of Fig. 21. $S = \tilde{S}_O$ is the union of two disjoint compact rectangles. Points V and V′ are on the same ray from O, on the edge of the proximal and distal rectangle, respectively. Let Σ be the set of lines l1, l2. By Definition 10, the vanishing set is V′, the meeting point of the lines projected on the distal rectangle, and V doesn’t figure in either the vanishing set or the anamorphic image. That is because neither V nor V′ is present in $\phi_{O,S}(C_O(\Sigma) \cap \tilde{S}_O)$, the latter only appearing when we take the closure. In the same way, V′ will be ignored when calculating the vanishing set of lines s1, s2 by Definition 10, and only V will remain. This is why we define the vanishing set through topological closure and not mere projection from the canonical vanishing set. In this way we both avoid duplication of points and ensure that the vanishing set preserves its meaning as a location for the meeting of parallel line images.

Proposition 13. If S is a surface, and its minimal quasi-starred equivalent $\tilde{S}_O$ is locally starred at a point V, then V is a meeting point for the anamorphic images of lines that have the ray $\overrightarrow{OV}$ as a vanishing point.

Note that if V is in $\delta \subset \tilde{S}_O$ then it will be a meeting point for only a subset of those lines (see Fig. 21).
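
As a computational aside, the proximal conical projection of Definition 9 can be sketched in a few lines of code. The fragment below is only an illustration under simplifying assumptions that are ours and not part of the formalism: O is placed at the origin, and the surface is a toy union of rectangles lying in planes x = const, loosely in the spirit of Fig. 20. The names Rect, ray_hits, and proximal_projection are hypothetical.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Rect:
    """Rectangle in the plane x = x0 (a toy piece of projection surface)."""
    x0: float
    y_min: float
    y_max: float
    z_min: float
    z_max: float

def ray_hits(direction, rects):
    """All intersections of the ray from O = (0, 0, 0) along `direction`
    with the given rectangles, sorted by distance to O."""
    dx, dy, dz = direction
    hits = []
    for r in rects:
        if dx == 0:
            continue  # ray parallel to the plane x = x0
        t = r.x0 / dx
        if t <= 0:
            continue  # the plane lies behind O
        p = (t * dx, t * dy, t * dz)
        if r.y_min <= p[1] <= r.y_max and r.z_min <= p[2] <= r.z_max:
            hits.append(p)
    hits.sort(key=lambda p: math.dist((0.0, 0.0, 0.0), p))
    return hits

def proximal_projection(direction, rects):
    """Closest intersection to O (distal hits are occluded), or None."""
    hits = ray_hits(direction, rects)
    return hits[0] if hits else None

A = Rect(x0=1.0, y_min=-1.0, y_max=1.0, z_min=-1.0, z_max=1.0)  # proximal piece
D = Rect(x0=2.0, y_min=-1.0, y_max=3.0, z_min=-1.0, z_max=1.0)  # distal piece
print(ray_hits((1.0, 0.5, 0.0), [A, D]))              # two hits, nearest first
print(proximal_projection((1.0, 0.5, 0.0), [A, D]))   # the hit on A
print(proximal_projection((1.0, 1.25, 0.0), [A, D]))  # passes beside A, lands on D
```

The sorted list returned by ray_hits also gives the repeated intersections of a ray ordered by distance to O; only the first one survives in the proximal projection.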


Simplifications: Talking to Artists

We would like to keep a balance between the concerns of artists and those of mathematicians. The technical details of Definition 10 are somewhat overwhelming for the former, so it behooves us to find some simplification. The trick is how to simplify without outright lying. The matter can be presented thus to an artist. We are interested in drawing an anamorphosis of an object Σ onto a surface S. The surface can be anything. The artist is instructed that for the purposes of drawing, only the part that is visible from O must be considered. Points of S in the conical shadow of other points of S which are closer to O are to be discarded. The exceptions are points at the edge (boundary) of the shadow. These are allowed to remain. From this cropping we get a working surface $\tilde{S}$, the minimal surface, on which we will actually draw. Each line l determines exactly two vanishing “points,” V and V′, which are the rays from O that are parallel to $l_O$ (you can also call them vanishing rays for clarity). These are always exactly two, and diametrically opposite to each other. These vanishing rays can be represented as actual points in a canonical way by intersecting them with the surface of an imaginary “visual sphere” around O. When drawing an anamorphosis onto the minimal surface $\tilde{S}$, each of the two vanishing rays may intersect it several times. The multiple intersections of each ray with $\tilde{S}$ should be seen as multiple manifestations of the same vanishing point. So, for instance, V will manifest as points $V_1, \ldots, V_k$, which we can order by distance to O. Then all lines parallel to l which meet at V on the sphere will meet at one of the $V_i$ on $\tilde{S}$. Which one can be found a posteriori by drawing the lines near their meeting points and seeing in which subsurface they are found (it will be the one that contains the projection of the germ of l around V – but here we’d be going all abstract again, so we won’t).
This statement avoids the most difficult terminology and requires minimal abstraction. It can be easily explained graphically to even very young students, and yet it carries the essentials, as far as drawing is concerned, of the discussion above. In a later section we will see how to implement these concepts graphically through descriptive geometry constructions.

On Compactness

Why have we insisted on compact projection surfaces? If we think about it, this is quite strange. This was introduced by Araújo (2018c) for the needs of immersive perspectives as it is a very natural setting for the most usual such perspectives, namely, the cylindrical and spherical cases. Rather shockingly, however, it excludes classical perspective in its usual formulation, as the Euclidean plane, infinite and unbounded, is not a compact set. So it reformulates linear perspective onto a bounded region of the plane, though this region may be as large as you want it. You can imagine this region as a rectangle, for instance, representing a sheet of


paper. Since we want it to be closed, it will technically be a manifold with boundary. Classical perspective will then become a degenerate limiting case as the region grows to infinity. This is analogous to the way in which orthogonal projection is usually considered a limiting case of conical projection as O becomes more and more distant from the projected object. As we have seen in Example 5, classical perspective still remains, in our view, a fundamental object, as our constructions of the compact plane anamorphosis are better done considering the full plane, but only as a construction device, with the final result being cropped to a region that is bounded, though as large as we want it (reminding us of the Aristotelian philosophical distinction between potential and actual infinity). Even so, this is a frightful demotion of the infinite plane of classical perspective, so we must consider what we get in return.

Compactness is a desirable property. Ask a mathematician what linear perspective is as a mathematical object, and he will insist it is the projective plane. Like the sphere, the projective plane is compact. Like the sphere, it gives you a definition of vanishing points that attributes a constant number of vanishing points to a line (this number being one in the projective case, two in our case). This uniformity is pleasing to the mathematician’s eye, like that of the number of roots of a polynomial equation of given degree. In the projective plane case, this is done by identifying the actual perspective drawing as a chart of the projective plane. Hence the drawing lives naturally in the infinite plane. On the other hand, this implies that the vanishing point of lines parallel to the chart’s plane actually lies nowhere in the chart. It lives only on the “point at infinity” of the chart, that is, on the one-dimensional projective line that lies outside of the chart’s domain. We find that this is a rather immaterial satisfaction.
The projective plane is not a surface that the draughtsman can “see”; the gained vanishing point is more than a little ethereal. By contrast, the sphere is not only an abstraction; it lives in the actual scene to be drawn. It is as concrete a surface as the plane itself – in fact more so, because it is bounded. It is the natural object that requires the least abstraction, just a bounded surface of ordinary Euclidean space, and yet affords a more pleasing result than either the infinite plane or the comparably abstract projective plane, affording a distinction between directions that encompasses a fully immersive view. What is formalized by the anamorphosis onto a compact surface is the notion of a drawing itself: the result is a real drawing that can be materially executed and yet captures two vanishing points for every line. Mathematicians, of course, have no problem with the abstract notion of the infinite plane, but sometimes we want to deliberately keep to the finite. Just as we can abstract a notion of an infinite proof and yet for the most part decide that finite, actually executable proofs are our main object of interest, so here too, although plane perspective is certainly not a problematic concept – we could certainly accept the full Euclidean plane as a projection surface and still do our work – we decide that compact surfaces are the object of most interest to abstract the notion of drawing. This in fact corresponds neatly to the way in which plane perspective, or plane anamorphosis, is actually used by draughtsmen. Every drawing is made onto a finite canvas, although the canvas may be extended at will with no absolute limitation. And the draughtsman knows the difference well


between the bounded plane of the drawing and the unbounded plane that contains it and may be useful as a construction device though not part of the drawing itself. Often the draughtsman will have to use methods that find a vanishing point outside the drawing sheet in order to construct the lines inside it. And it is nice to have an abstraction of perspective that relates directly to the practical distinction.

Let us be more general: in a way we can say that compact – bounded and closed – objects are all that we can draw. Can we really draw a line? No. It is unbounded. We cannot draw infinite things. We draw a segment and end it with little dots to signify continuation. This is representing infinity, not drawing it. But when we project the line conically onto the sphere around O, it becomes a meridian. Can we now draw it? Not yet. As we have seen, it is not a closed meridian – it is missing its endpoints, as the vanishing points do not belong to the visual cone. We cannot draw open-ended arcs. We can again only represent them, usually by drawing small circles to signify that the ends are missing, but we cannot really perform the feat of drawing an arc without its endpoints. How to solve this? We could just add on those endpoints. In mathematics we do that and call it by a fancy name: taking the closure. By taking the closure of the conical projection of the line, the cone becomes a closed half-plane, and the meridian gets its endpoints. We now have a closed meridian on the closed and bounded – compact – sphere. A compact arc of circle. And compact sets are the right mathematical abstraction for that which can actually be drawn.
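
A small numerical sketch may make the missing endpoints tangible. Under the assumption (ours) that O sits at the origin and the visual sphere has unit radius, the fragment below projects points ever farther along a line onto the sphere and measures their angular distance to the vanishing point. The distance shrinks toward zero but never reaches it, so the vanishing point belongs to the closure of the projected arc and not to the arc itself. All function names are hypothetical helpers.

```python
import math

def normalize(v):
    """Scale a nonzero vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def project_to_sphere(p):
    """Conical projection from O = (0, 0, 0) onto the unit visual sphere."""
    return normalize(p)

def vanishing_points(direction):
    """The two antipodal vanishing points of any line with this direction."""
    d = normalize(direction)
    return d, tuple(-c for c in d)

def angle_between(u, v):
    """Angle between unit vectors, computed via atan2 for accuracy near zero."""
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    cross = math.sqrt(cx * cx + cy * cy + cz * cz)
    dot = sum(a * b for a, b in zip(u, v))
    return math.atan2(cross, dot)

def angles_to_vanishing_point(a, d, ts):
    """Angular distance from the image of a + t*d to the vanishing point +d."""
    v_plus, _ = vanishing_points(d)
    return [
        angle_between(project_to_sphere(tuple(ai + t * di for ai, di in zip(a, d))), v_plus)
        for t in ts
    ]

# A line through (0, 1, 0) with direction (1, 0, 0); it does not pass through O.
for t, ang in zip((1.0, 10.0, 1000.0),
                  angles_to_vanishing_point((0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 10.0, 1000.0))):
    print(t, ang)  # the angle decreases with t but stays positive
```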

Descriptive Geometry Construction of Anamorphoses

In this section we will be concerned with constructing anamorphoses onto surfaces from the point of view of the draughtsman, using simple descriptive geometry. We focus on the mechanical device of a modified Dürer machine, both in physical form and abstracted through descriptive geometry constructions. This is a matter of intrinsic interest for our purposes, but we start with some lateral considerations of a didactic nature.

Handmade vs Digital Anamorphoses

It is undeniable that at present, digital means greatly simplify the practical construction of anamorphoses, both in 2D, with projection mapping (Inglis 2018; Monroe and Redmann 1994), and in sculpture, through 3D modeling and printing (we have already mentioned Sugihara (2015a), but see also Ballegooijen and Kuiper’s work in Dunham (2019)). These are the quickest and most efficient options for production purposes, and there is nothing wrong with that; the working artist knows that there is no such thing as cheating, and anyway a vast gamut of anamorphoses would be unreachable or at least untried without such means. However, premature or excessive reliance on technological shortcuts not only lacks charm and intellectual appeal, but can lead to limiting results, as the tools that happen to be on offer define a scope of action and thought, the terms of the language of creativity,


so to speak (Papert and Turkle 1991). This is always so, even if the tool is the brush, but digital tools, due to their complexity and opaqueness, tend to do so in a more insidious manner. The expressive scope thus delimited may be inadequate for some artists, who then have little recourse to developers if they lack the understanding of the basics that is required for communication. The tendency will be for artists to adapt to tools rather than the other way around and end up chasing ever-changing flawed interfaces (Coates et al 2010; Norman 1990), so it is hard to say how much effort is really saved if time economies obtained by neglecting fundamentals are spent chasing transient knowledge of tools that never really just work as advertised (Kim and Chin 2019). The tendency is for these interfaces to help artists work with implicit principles they do not understand deeply, but there are limits to what can be achieved before the confused artist hits the walls of his ignorance. For instance, Tran Luciani and Lundberg (2016) report both on the achievements of digital interfaces in helping artists create spherical panoramas and on the difficulties that always eventually arise from not knowing the essentials of the spherical perspective underlying the interface. Araújo (2019a) has argued that there is a need for more interfaces that force (and help) the artist to learn the fundamentals rather than helping them to avoid such learning, and that production software should ideally come in after the user has already mastered the discipline in its rudimentary form, so as to avoid the black box frame of mind. Further, the knowledge of the mathematical and geometrical fundamentals underlying their tools is better acquired, for most artists, not through formalism, but through manual practice in an embodied form.
For this purpose, Araújo (2017b) proposed anamorphosis, in the form here discussed, as the central concept behind technologies such as virtual and augmented reality, 360-degree photography, and projection mapping; further, it proposed the modified Dürer machine, both physical and “virtualized” through descriptive geometry constructions, as an adequate physical embodiment for artistic exploration of the principle. This resulted in a teaching process that has been tested with students of a Ph.D. program in digital media arts and also, reworked and simplified, as a new way of introducing young students to the subjects of linear perspective and descriptive geometry (Araújo 2017c). In the latter case, making anamorphosis the fundamental concept and linear perspective the derived concept, in a reversal of the usual practice, and expressing this concept with constructions that are handmade, yet display their anamorphic properties through digital means (such as photography and VR), resulted in a process that is enticing to students who have grown up with a view centered on digital media. This has shown promise in preliminary tests with Portuguese 9th grade students (approximately 15 years old). In what follows we present the descriptive geometry constructions of anamorphosis in a way that reflects the learning paths here mentioned.

Dürer Machines Running Back and Forth

Descriptive geometry constructions are adequate for constructing both planar and curvilinear anamorphoses. We will describe a path of growing generality


leading swiftly from plane anamorphosis to curvilinear perspectives. This is both a descriptive path for our subject and a learning path a student might follow with exercises of growing difficulty and generality. Using conical anamorphosis as the central concept integrates the process and simplifies it to the point where the whole apparatus of both anamorphosis and spherical perspectives can be learned by an artist in a few weeks, even if starting from no formal knowledge of perspective (Araújo 2017b).

A practical way to teach anamorphic constructions to artists is to start by implementing a simple Dürer machine with a thread fixed on a tripod or other fixed point that will serve as the observer’s point O (Fig. 22). We make the Dürer machine work forward to the ground plane rather than, as usual, backward to the perspective plane. For each reference point P on the object, we extend the thread from O through P and mark the image point R where the thread hits the ground plane. In this way Dürer’s perspective machine becomes a machine for making anamorphoses, which can be used to obtain an oblique anamorphosis such as those of Figs. 11 and 23. The same setup can also be used to obtain curvilinear anamorphoses such as that of Fig. 7, onto a cylinder, only requiring more points for drawing each line segment, which will now project as a curve. This mechanical implementation of the Dürer machine is important to establish in the mind of the artist a concrete, physical, and operational notion of anamorphic equivalence. This makes it easier to then abstract it diagrammatically, doing away with the thread. A simple diagram with a side view and top view – the most basic of descriptive geometry concepts – allows for construction of the oblique

Fig. 22 A Dürer perspective machine can be run back or forth, to make either a “perspective” onto the vertical plane or an “oblique anamorphosis” onto the horizontal one. Points P, R, and Q are all anamorphically equivalent from O, so should be indistinguishable to a viewer at O (assuming radial occlusion). (Reproduction of a print by Albrecht Dürer, defaced by the author)


Fig. 23 Making a cube anamorph with a thread fixed to a point

anamorphosis with diagrammatic “thread” (see Fig. 24). The process integrates the anamorph and its construction on the same piece of paper. Take the example of Fig. 24, a student’s exercise. We start by establishing orthographic top and side views (a “plan and elevation”) of the object to be projected, using the ground plane as a horizontal folding line. We establish the position of O by plotting its projection $O_T$ on the horizontal plane and fixing its height at $O_S$ in the side view. For the example presented, it is easier to start the object’s drawing with the top view of the vertices of the base and then lift them to the side view through the folding line: for instance, starting from the top view projection $P_T$ of point P in the figure, lift it through a vertical to get its side view $P_S$, by setting its height above the folding line. Then, to plot the anamorph of the object onto the ground plane, start with the side view, as rays in the side view have true intersections with the ground plane. An intersection is said to be true if the intersection of the projections is the projection of the intersection. For instance, taking the vertex P, $\overrightarrow{OP}$ projects as $\overrightarrow{OP}_S$, and since the ground plane is


Fig. 24 Top: Abstracting a Dürer machine with plan and elevation views of the threads going through a cube’s vertices. The anamorphosis is realized on the top view and should be observed with the eye above $O_T$ at the height of $O_S$. Bottom: The anamorphic view from O. (Student work by Manuel Flores)

perpendicular to the folding line on the side view, ray $\overrightarrow{OP}_S$ has a true intersection with the ground line, that is, the side view of the intersection of $\overrightarrow{OP}$ with the ground plane is the intersection of the projected ray $\overrightarrow{OP}_S$ with the ground line. By contrast, the intersection of that ray with a vertical edge of the cube, for instance, would not necessarily be true. Having obtained the intersection with the ground line, drop it on a vertical to intersect ray $\overrightarrow{OP}_T$. Since $\overrightarrow{OP}_T$ is the projection of the vertical plane through $\overrightarrow{OP}$, the intersection is true and equal to the top view of the anamorphic image of P onto the ground plane.
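
The thread of the Dürer machine run forward can also be mimicked numerically, as a check on a hand construction. The sketch below assumes coordinates of our own choosing, which are not in the text: the ground plane is z = 0, O is above it, and the function name anamorph_on_ground is hypothetical. It returns the point R where the thread from O through P meets the ground.

```python
def anamorph_on_ground(O, P):
    """Image R of P on the ground plane z = 0, as seen from O.

    The thread is the ray O + t * (P - O), t > 0; it reaches the ground
    where its height vanishes, i.e. at t = O_z / (O_z - P_z).
    """
    ox, oy, oz = O
    px, py, pz = P
    if oz <= 0:
        raise ValueError("O is assumed to lie above the ground plane")
    if pz >= oz:
        raise ValueError("the thread from O through P never reaches the ground")
    t = oz / (oz - pz)
    return (ox + t * (px - ox), oy + t * (py - oy), 0.0)

# Anamorph of a unit cube resting on the ground, seen from O at height 4.
O = (0.0, 0.0, 4.0)
cube = [(x, y, z) for x in (1.0, 2.0) for y in (1.0, 2.0) for z in (0.0, 1.0)]
for P in cube:
    print(P, "->", anamorph_on_ground(O, P))
```

Vertices already on the ground are their own images, while the top vertices are pushed away from the foot of O. The same function would give a cast shadow if a point light source were used as the apex in place of O.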


In this way we obtain the image of all the vertices on the top view of the ground plane, which we can connect with line segments to get the full anamorph of the cube. Then the anamorphic illusion will be obtained if we observe it with our eye over point $O_T$ at the height determined by $O_S$. Of course this effect only happens in our perception if the principle of radial occlusion is verified. This will not be verified by the naked eye if the anamorphosis is too small, and the typical classroom anamorphosis will generally be far too small for the principle to be valid. However, the camera – especially the small sensor camera of mobile phones, with their large depth of field and close focusing distance – is just the right tool to simulate the eye of the perfect monocular observer. Most anamorphoses we have displayed here, being drawn on small A4 to A3 sheets of paper, are, in this situation, small constructions that cannot really fool the naked eye, but which display their magic perfectly on camera. Students, especially younger students, seem to find this extra step not a mere concession to convenience, but actually an added charm, a mathematical trick that is inherently instagrammable.

We can extend this descriptive geometry construction to many other situations; in fact we could use it as a pretext to teach all the usual operations of descriptive geometry, projecting objects against planes slanted at arbitrary angles, or against cones, cylinders, spheres, or various other classes of surfaces. One can also consider the problem of shadows, both volumetric shadows as we have considered above and also projected shadows (see Fig. 25), which can be constructed in much the same way as we construct oblique anamorphoses, as the mechanism is exactly the same, requiring only that we treat a point light source as an additional conic apex in the diagram of Fig. 24. Anaglyphic anamorphoses (Fig. 12) are also easily constructed in this way.
Fig. 25 Anamorphic structure with projected shadows, using orthographic projection. (Student work by Maria Bianchi Aguiar)

So many are the achievable constructions and the descriptive geometry techniques put to use that Araújo (2017c) proposes anamorphoses as a path into descriptive geometry for young students, given its motivational possibilities, and the


fact that, unlike many other constructions in this discipline, anamorphoses provide exercises without a need for an oracle, meaning that a student can, for the most part, independently see if the exercise was well executed, without a teacher to tell him so. This solves a good deal of the difficulties of descriptive geometry, the main ones being the student’s inability to visualize the spatial construction that is being achieved and the lack of compelling incentives to achieve the result.

Let us consider, among the many constructions accessible through this scheme, a couple that will be of use in our transition to the study of perspective. We revisit, with the aid of descriptive geometry, the case of the cylinder. First, revisit Example 6 (Fig. 18), where a cube is projected onto a cylinder with O located outside of the cylinder. In this case, and in contrast to the case of the oblique anamorphosis onto the ground plane, it is the top view that provides the easier starting point, as rays from $O_T$ have true intersections with the cylinder wall, which projects as a circle. So, as we see in Fig. 18a, a vertex P of the cube, with top view $P_T$, may be projected from $O_T$ in the top view until the ray hits the cylinder’s circular boundary at a point, the top view of the image of P; raise this point on a vertical to meet the side view of the same ray, $\overrightarrow{O_S P_S}$, finding the side view of the image. Transport the height of this side-view point by sending a horizontal across to the flattened cylinder view, and find there the image of P where this horizontal intersects the vertical that marks the position of the radial plane through O defining angle $\alpha = \angle(P_T C_T F)$ in the top view, where point F is located at 180 degrees to the point where the cylinder is cut. Consider next the case of Example 7 (Figs. 19 and 26), in which O is on the cylinder’s axis. The projection is made in the same way, illustrated in the figure with the plot of a vertex P of a cube.
Consider, for instance, the horizontal edge of the cube going through point P and ending above point F, in the figure. The segment is contained in a line that projects as the curve V P V′. As mentioned in Example 7, this curve is a sinusoid. It is easy to see why using descriptive geometry, in the particular case of horizontals (and the general case is easily obtained through a little trigonometry). One can always rotate the horizontal to assume it projects as a point in side view, as in Fig. 19. Then any point Q along that line, whose orthogonal projection $Q_T$ subtends an angle $\alpha = \angle Q_T O_T F$ in the top view, will have its image project at a distance proportional to cos(α) on the horizontal line $O_T F$ in top view, and this measure will be transferred up to the line $O_L P_L$ on the side view, giving a height equal to m cos(α), with m determined by the slope of $O_L P_L$ and hence constant for all Q on the line. Once the cylinder is flattened, the vertical measurement is preserved, and the horizontal measurements are linear with the angle α, so the curve obtained as Q slides along the line is proportional to cos(α), hence a sinusoidal curve. Returning to Fig. 19 and to the line defined by the horizontal edge through P, its vanishing points V and V′ are obtained by translation to O and intersection with the cylinder. There are several ways to plot the sinusoid itself. In a variation of what was done in Fig. 19, one can simply plot points corresponding to regular angular intervals on the top view circle and project these onto the flat view, then interpolate. More interesting may be to note that the sinusoid is defined by any two of its (nonantipodal) points, for instance, V and P. A good choice is V and the apex of the


Fig. 26 Cylindrical anamorphosis of a cube onto a cylinder. The image on the flattened cylinder (upper right) is a cylindrical perspective. It transforms non-vertical lines into sinusoidal curves. Notice the blue sinusoidal curve defining the cube’s top edge through point P

curve, where it reaches its maximum. This apex projects in top view at the point where the top view of the line meets its orthogonal through $O_T$. Given V and the apex image, it is easy to plot the sinusoid in the flat view.
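
The sinusoidal shape of horizontals can also be checked numerically. The sketch below rests on assumptions of our own: O is at the origin on the cylinder’s axis (taken as the z axis), the cylinder has radius R, and the horizontal line is placed at x = d, z = z0, so that α is measured from the direction through the foot of the perpendicular from the axis to the line. Under these assumptions the height of the image on the flattened cylinder should be m cos(α) with m = z0 R / d. The function names are hypothetical.

```python
import math

def flattened_image(Q, R=1.0):
    """(alpha, h): angle and height of the image of Q on a cylinder of
    radius R around the z axis, seen from O = (0, 0, 0) on the axis."""
    x, y, z = Q
    rho = math.hypot(x, y)  # distance of Q from the axis in top view
    if rho == 0.0:
        raise ValueError("Q lies on the axis and has no image on the wall")
    return math.atan2(y, x), z * R / rho

def horizontal_line_samples(d, z0, offsets):
    """Points of the horizontal line x = d, z = z0 (an assumed position)."""
    return [(d, s, z0) for s in offsets]

R, d, z0 = 1.0, 2.0, 3.0
m = z0 * R / d  # amplitude: the height of the apex, reached at alpha = 0
for Q in horizontal_line_samples(d, z0, [-5.0, -1.0, 0.0, 0.5, 4.0]):
    alpha, h = flattened_image(Q, R)
    print(round(alpha, 4), round(h, 4), round(m * math.cos(alpha), 4))
```

In this setup the apex sits at α = 0 with height m, and the two vanishing points sit at α = ±90 degrees, where the curve meets the horizon h = 0; this is why the apex together with one vanishing point suffices to plot the curve.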

Perspectives

We have talked from the start of considering perspectives, both linear and curvilinear, as a concept derived from anamorphosis. We are now in a position to develop that idea. Let us begin once more intuitively and loosely: in our view, a perspective is just a flat representation of a surface anamorphosis, like a map is a flat representation of the Earth’s globe. That is all. We want these maps to have certain properties, so we will need some technical details, but this is the main idea to keep in mind. The notion of curvilinear perspective has a complicated past with a history that is often confusingly reported upon. The difficulties come not so much from the misconceptions of the past, but from those remaining in the present, under whose


burden the historian must work. It is hard to argue whether Leonardo or Jean Fouquet are precursors of curvilinear perspective (Andersen 2007) without a proper definition of what those perspectives are. Often both the original sources and the historian fail to distinguish the perspective from the corresponding anamorphosis. This distinction, that so clarifies the subject, is our focus here. The matter of a perspective being “curvilinear” is comparatively minor. We will define as curvilinear any perspective that projects spatial straight lines into plane curves. This of course includes the cylindrical perspective and the usual spherical ones. Note that “curves” cannot mean only smooth curves; otherwise we get the curious result that at least one perspective (the cubical case) manages to be spherical without being curvilinear (in this perspective spatial lines project as sets of line segments). But we will not concern ourselves much with this. Curves will arise from Definition 11 as side effects of the need to project compact anamorphoses onto compact plane sets while preserving their main topological characteristics, such as vanishing points. The perspectives we will define are not so much characterized by being curvilinear as they are by being central (a side effect of being derived from conical anamorphosis) and compact.

We will motivate our definition of perspective by considering the simplest of curvilinear perspectives: cylindrical perspective. It has a long history. Whether you consider it defined by Herdman’s 1853 treatise (Herdman 1853) or implicitly defined by Baldassare Lanci’s 1557 perspective machine (Kemp 1990, p. 175), curvilinear perspective was certainly textbook matter by 1900 (Ware 1900), and in the very early twentieth century was used for military balloon airmen spotting for artillery (Dept. of Military Aeronautics 1918). Its aspect is familiar to the modern reader, as it corresponds to that of the common panoramic photograph.
We have already silently met cylindrical perspective, in Figs. 19 and 26. We have seen that in order to construct the anamorphosis of an object (in this case a cube) onto the cylinder, with O at the axis, it is convenient to first draw the scene in plan and elevation, flatten the cylinder into a rectangle by cutting and unrolling it on the plane, and project the object onto this rectangle (upper right corner of Fig. 26). Then, once the projection is drawn, the rectangle can be rolled up again and glued at the vertical edges to form the compact cylinder in 3D space, and the resulting picture on its surface will thus become the cylindrical anamorphosis. That auxiliary plane drawing on the rectangle, from which the anamorphosis is rolled up, is what we call the cylindrical perspective. Note that this is purely an affair of convenience. By definition, the anamorphosis is obtained on the spatial cylinder, through conical projection. But in practice, we find it convenient to draw on planes. This is analogous to cartography, where the exact shape and metric of features on a globe are sacrificed to the convenience of a plane chart. So, to us, perspective is merely this: a plane chart – a flat representation – of the spatial anamorphosis. The flat representation is not only easier to draw; it allows the eye to capture the whole picture at a glance, at the price of introducing optical deformations. This is often the reason to use it, as in the case of Jean Fouquet’s Arrival of the Emperor Charles IV at the Basilica in Saint Denis, where a deformation quite similar to cylindrical perspective allows us to see at a glance further down the street than would be possible with linear perspective.


A. B. Araújo

Fig. 27 A View of Delft, with a Musical Instrument Seller’s Stall. Carel Fabritius, 1652. Currently at the National Gallery in London. Apparently a cylindrical perspective, although nothing is known regarding the method of its construction

Analogously to the case of cartography, we must pay for this convenience with deformations that result in perception inaccuracies. This is true even in the case of the cylinder that unrolls isometrically onto the plane. See, for instance, Fabritius’ View of Delft (1652) (Fig. 27). The flat cylindrical perspective drawing, although it contains all the information of the cylindrical anamorphosis, and is isometric to it, is no longer an optical illusion; there is no point O from which the points can recreate the effect of the original object – the cones no longer align; hence it is no longer an anamorphosis. So the straight lines in the picture all get deformed. The line image of Fig. 19 looks nothing like a line (as we have seen, it is a sinusoidal curve), and the cube on Fig. 26 looks nothing like a cube. The non-vertical lines are bent; that in itself is not the problem, as they are also bent in the anamorphosis, and still they will look straight when seen from O. The problem is that in the perspective they are not just metrically but fundamentally optically deformed; there is no point O from which they can all look straight. The perspective drawing is a representation, a storage of visual information – but it is not mimetic. Of course it can still be read up to a point, even without training. But I challenge anyone to distinguish with perfect accuracy – a task easy in classical perspective – which lines in a cylindrical perspective are really bent and which are only bent through projection. The task can hardly be done by the untrained eye except through an extreme reliance on context. The cylindrical perspective still evokes the scene, but no longer mimics it.

Let us be more precise. We view the perspective as a map from 3D space to a compact region of the plane, defined by a two-step process: the first step being the anamorphosis, from the 3D space to the surface S, the second step being the flattening of S.
The anamorphosis is completely defined by the choice of S and O;

9 Anamorphosis Reformed: From Optical Illusions to Immersive Perspectives


the flattening is a much more arbitrary map, just as in cartography. Because we will want to preserve the visual information, if not the visual aspect, we have some requirements. First, we would like it to be a bijection, so that no information is lost. Then, we would like it to be a homeomorphism (continuous with continuous inverse) so as to preserve the topology and hence the vanishing sets and their meeting properties. Finally, we would like it to be as smooth as possible. We have a second set of vaguer, yet crucial requirements: we would like it to preserve as much of the mimesis as possible; though it will not be an anamorphosis, it should be in some way recognizable as a picture; it should summon the visual presence – or at least the intellectual recognition – of what it represents. Finally, it should be solvable by elementary means. By this we mean that you should be able to draw it by hand, by ruler and compass, or some other simple means with adequate precision and in reasonable time, i.e., we would like it to be a perspective for humans and not just for computers.

We find that even the first three technical requirements are asking too much. We need to interpret them with some care. Look at the case of the cylinder in Fig. 26 and it will become apparent why. The cylinder is a developable surface; it can be unrolled onto the plane, preserving distances on the surface. Intuitively we would like to say that the flattening is the unrolling map that sends the cylinder to the rectangle. But this is not a bijection on the compact rectangle. Let c be the segment through B where the cylinder is cut, and let c1 and c2 be the corresponding vertical edges of the rectangle. Each point of c is sent to a point on c1 and another equivalent one on c2, so this “projection” is not even a map at all. We solve this by noting first that the flattening π is well defined almost everywhere, meaning, on an open dense subset of the cylinder S.
Namely, it is well defined on S minus the union of its boundary with the vertical segment c. This projects to an open rectangle that excludes c1 and c2, and is a homeomorphism onto that open rectangle. Further, the inverse $\pi^{-1}$ can be extended uniquely to a continuous map $\tilde{\pi}$ from the full compact rectangle to the complete cylinder. We just map both c1 and c2 to c, since in this direction the map has no problem of uniqueness. Apart from technicalities, we have the map we wanted. We define the cylindrical perspective to be the composition of π with the conical projection, and the perspective image/vanishing set to be the image of the anamorphosis/vanishing set, where we identify as the same point any points with equal images through $\tilde{\pi}$. We will use this example as a template for the general case and say that π is a flattening if it can be extended to a continuous map in the way we just exemplified. We formalize this by the following definition, adapted from Araújo (2015):

Definition 11. Let S be an O-starred surface. Let U be an open dense subset of S. We say that $\pi : U \to \mathbb{R}^2$ is a flattening of S if π is a homeomorphism onto $\pi(U)$, and there is a continuous map $\tilde{\pi} : \mathrm{cl}(\pi(U)) \to S$ such that $\tilde{\pi}|_{\pi(U)} = \pi^{-1}$. We say that $p = \pi \circ \varphi_{O,S}$ is the perspective associated to the flattening π, where $\varphi_{O,S}$ is the conical projection. Let $\tilde{p} = \varphi_{O,S}^{-1} \circ \tilde{\pi}$. Given an object Σ, we say that $\tilde{p}^{-1}(\Sigma)$ is the strict perspective image of Σ, that $\tilde{p}^{-1}(V_O(\Sigma))$ is the vanishing set of Σ, and


that the perspective image of Σ is the union of its strict perspective image with its vanishing set.

The following set is often useful in discussing perspectives. It is the minimal closed set we must take out of S for π to be well defined.

Definition 12. Given a flattening π of S, we call blowup the subset of points of $\mathrm{cl}(\pi(U))$ where $\tilde{\pi}$ is not injective.

The geometer may notice that we are here abusing the term “blowup,” common in other areas of geometry, where it means, roughly, the replacement of a point by the projective line. Here it is in fact just the set where gluing (identification) of edges takes place. We use the term blowup by analogy to the important case of fisheye perspective where a point will identify to a circle or more precisely to the set of rays or directions defined by that circle. This is analogous to the usual sense of blowup of a point, which roughly is the replacement of the point with a projective line.

Example 9. Take again the case of the cylinder in Fig. 26. Let O = (0, 0, 0); let the radius of the cylinder be r = 1 and its top and bottom boundaries lie at z = ±h. Suppose we cut the cylinder at the vertical segment c that passes through B = (−1, 0, 0). Then, in cylindrical coordinates, we define $\tilde{\pi}(\theta, z) = (\cos\theta, \sin\theta, z)$, so the blowup set is $\mathcal{B} = c$, and $\tilde{\pi}$ implicitly defines π on $S \setminus c$. Having defined π, the cylindrical perspective is defined by the composition $p = \pi \circ \varphi_{O,S}$. As seen above, spatial verticals are anamorphic to verticals on the cylinder and non-verticals to ellipses. Their perspective images by π are the curves we obtained above through the cylinder’s unfolding: arcs of sinusoidal curves. For instance, in Fig. 26, the line that goes through P and has vanishing points V and V′ defines a plane through O that intersects the cylinder on an ellipse, and that ellipse maps in perspective to the full sinusoid in blue. The line itself is the arc V P V′, one half of the sinusoid.
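Example 9 can be checked numerically. On the unit cylinder, a plane through O with normal (a, b, c), c ≠ 0, meets the surface where a cos θ + b sin θ + c z = 0, so its unrolled image is the sinusoid z = −(a cos θ + b sin θ)/c. The sketch below, under the conventions of Example 9 (O at the origin, r = 1, cut at θ = ±π), samples a line, applies the cylindrical perspective, and compares against that sinusoid. The particular line and all function names are hypothetical choices made for illustration.

```python
import math

def p(point):
    """Cylindrical perspective for r = 1: returns (theta, z) on the rectangle."""
    x, y, z = point
    rho = math.hypot(x, y)
    return (math.atan2(y, x), z / rho)

def pi_tilde(theta, z):
    """Roll the rectangle back onto the cylinder, as in Example 9."""
    return (math.cos(theta), math.sin(theta), z)

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def sinusoid(normal, theta):
    """Height at azimuth theta of the ellipse cut by the plane normal . x = 0."""
    a, b, c = normal
    return -(a * math.cos(theta) + b * math.sin(theta)) / c

# Hypothetical line through P0 with direction d (not vertical, not through O).
P0 = (2.0, -1.0, 1.0)
d = (0.0, 1.0, 0.5)
n = cross(P0, d)   # normal of the plane through O that contains the line

samples = [tuple(P0[i] + t * d[i] for i in range(3)) for t in range(-50, 51)]
max_err = max(abs(p(q)[1] - sinusoid(n, p(q)[0])) for q in samples)

# The two vanishing points are the images of the directions d and -d.
v_plus = p(d)
v_minus = p(tuple(-k for k in d))

# Both vertical edges of the rectangle roll up onto the same segment c.
left_edge = pi_tilde(-math.pi, 0.3)
right_edge = pi_tilde(math.pi, 0.3)
```

Here `max_err` stays at the level of rounding error, both vanishing points lie on the same sinusoid, and `left_edge` agrees with `right_edge` up to rounding, which is the non-injectivity of the rolled-up map along the cut.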
The full sinusoid and its arc can be seen as the images, through the cylindrical flattening, of the great circle (resp. meridian) of the canonical anamorph of the plane (resp. line) in question. Notice that although π, the flattening, is the more intuitive concept, $\tilde{\pi}$, the closure of its inverse, is the more natural map, defined everywhere and usually easier to work with analytically, as in Example 9. This is very common in geometry. Definition 11 is just a technical way of making sure that the perspective image of an object and its vanishing points are just the sets in the perspective image that correspond to the ones previously defined in the anamorphosis. As we stated before, everything important is already defined at the level of anamorphosis, perspective being relegated to a final step of convenient representation. This of course does not deny the importance, both technical and aesthetic, of perspective, nor the intricacies of its construction, but it clarifies its meaning. In particular it does away with long-standing conceptual mistakes expressed in questions such as “do we see in classical/spherical perspective” or “does linear perspective cause deformations.” It is


not so much that we answer these questions; instead we show they are badly posed. There is no meaning to the question because we simply do not “see in perspective,” whatever that perspective may be. Perspective is a map of the information in an anamorphosis and the anamorphosis is, when radial occlusion is verified, a mimetic object (a trompe l’oeil ). It is true that we can speak, more or less vaguely, of some perspectives preserving the aspect of the scene better than others, that is, of being “closer” to the mimesis than others. How close depends on the measure we use. (Araújo 2016) speaks of the reading mode of a perspective. For instance, a cylindrical perspective has a natural reading mode in the sense that when the cylinder of radius r is unrolled, we can think of the eye point O being transformed into a horizontal line floating parallel to the central axis of the picture, at a constant distance r. The natural reading mode is for the eye to travel horizontally, scanning the picture as it moves, one vertical thin strip at a time. At each point of the motion the vertical line in front of the viewer – and only that one – will be in anamorphosis. By contrast, in a fisheye perspective, only the central point in anamorphosis and a small neighborhood around it will show minimal differences if overlapped with the observation of the actual object. Notice that we take care to say “with the observation of the actual object” not with “its image,” as otherwise we would be in the difficult situation of trying to define what the “image” means. Mimesis is checked not by comparing “images” but by looking at pairs of objects (that may be volumes or drawings on surfaces) and being unable to tell the difference between them, at least in some well-specified experimental sense (e.g., do the vertices of a wireframe anamorph of a cube seem to occupy the same spot as the vertices of the real cube it seeks to mimic). 
In this sense we see that indeed classical perspective occupies a special place in the bestiary of perspectives, as it is the only one that is still an anamorphosis. This happens because the anamorphosis is already on a plane, so the flattening map is just the identity map. The reading mode of this perspective is the same as that of the anamorphosis: it consists of putting your eye in the point O and rotating it freely toward any spot of the plane of the picture. The so-called perspective deformations that happen for large angles of vision happen simply because the pictures in question are made in such a scale that they make the viewer unable to occupy point O without breaking the requirements of the principle of radial occlusion. Usually this happens because the perspective is drawn at such a scale that the eye must be too close to the picture plane. And of course, the illusion will not work if the viewer leaves point O to look closer at some detail, or looks at the picture from some arbitrary point in a crowded room. But anamorphosis only works from point O. To blame it for not working otherwise is to blame a fork for not being a spoon.

Spherical Perspectives

We have seen that among anamorphoses, the spherical one holds a special place, due to its natural identification with the space of rays $R_O$, or with the concept of


visual sphere. For the same reason, the spherical case also holds special importance among perspectives. Of course, while spherical anamorphosis is unique, spherical perspectives are innumerable. Every chart of the sphere defines a flattening and hence a spherical perspective according to Definition 11. In fact every central perspective (even classical perspective, if restricted to a compact subset of the plane) can be seen formally as a spherical perspective; hence the term is only as interesting as the qualifications we add to it: for instance, we can distinguish total spherical perspectives, i.e., those in which all points of the sphere have a perspective image. Each spherical perspective, although it holds the same visual information and would generate the same anamorphosis, holds its own visual characteristics and artistic possibilities as a drawing type. These visual characteristics hold all sorts of representational and expressive possibilities that have been explored in traditional media (Araújo 2019b; Barre and Flocon 1968; Casas 1983; Michel 2013; Moose 1986) as well as in purely digital visualizations (Correia et al 2013) and are more and more being investigated as hybrid immersive media (Araújo et al 2019a; Olivero and Sucurado 2019), a mode of artistic expression that allows handmade drawings to be visualized through immersive, digital means.

It is interesting to ask how we can solve a spherical perspective, where by solving we mean giving a systematic method to find and draw all vanishing sets of lines and planes through simple geometrical constructions (such as ruler and compass constructions). Notice that since by Proposition 10 the perspective images of lines are subsets of the vanishing sets of planes, and since plane images are delimited by their vanishing sets, obtaining all vanishing sets also obtains all line and plane images; hence we can say that solving vanishing sets is the true subject of perspective.
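At the level of the spherical anamorphosis, before any flattening, the relation between a line, its vanishing points, and a great circle is easy to verify numerically. The following is a minimal sketch assuming O at the origin and a unit sphere: the anamorph of a line lies on the great circle cut by the plane through O containing the line, and its two vanishing points are the antipodal pair given by plus and minus its unit direction. Function names and the sample line are illustrative assumptions.

```python
import math

def normalize(v):
    length = math.sqrt(sum(k * k for k in v))
    return tuple(k / length for k in v)

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def spherical_anamorph(point):
    """Conical projection from O = (0, 0, 0) onto the unit sphere."""
    return normalize(point)

def line_geodesic(P, d):
    """Vanishing points and great-circle normal of the line P + t d.

    Assumes the line does not pass through O (otherwise P x d = 0).
    """
    v = normalize(d)
    antipode = tuple(-k for k in v)
    normal = normalize(cross(P, d))   # plane through O containing the line
    return v, antipode, normal

def point_at(P, d, t):
    return tuple(P[i] + t * d[i] for i in range(3))

P, d = (1.0, 2.0, 3.0), (1.0, 0.0, 0.0)   # hypothetical line
v, v_antipode, normal = line_geodesic(P, d)

# Every anamorph of a point of the line is orthogonal to the normal,
# i.e. it lies on the great circle.
on_circle = [dot(spherical_anamorph(point_at(P, d, t)), normal)
             for t in (-1e6, -1.0, 0.0, 1.0, 1e6)]

# Far along the line, the anamorph approaches the vanishing point v.
far_point = spherical_anamorph(point_at(P, d, 1e9))
```

Solving a particular spherical perspective then amounts to plotting the flattened image of such great circles, which is where each flattening brings its own difficulties.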
We have seen that the vanishing sets of spherical perspective are all determined by the geodesics of the sphere. This suggests, if not an algorithm, at least a strategy for solving a spherical perspective: solving a given spherical perspective should be always attempted by focusing not on lines themselves, but on full geodesics, on classifying the geodesics according to the properties of their projection by the flattening and on finding an efficient method to plot the geodesics of each class. In this, we will find there are features common to all spherical perspectives, and features peculiar to each. What is common is the first step of the perspective projection, since it is just anamorphosis onto the sphere; what is particular is that each spherical perspective will have its own metric characteristics, which may vary widely. It will also have its own topological properties, but these are by design more limited in variation than the metric ones. Definition 11 ensures that π is continuous outside of the blowup set, so the topological variations will depend entirely on this set, which means that the blowup is a fundamental feature of perspective.

This view of perspective, first stated and applied to the case of the azimuthal equidistant perspective (Araújo 2018c), was later used to solve the equirectangular perspective case (Araújo 2018b) and then the case of cubical perspective (Araújo et al 2019b). The splitting of the anamorphosis from the flattening, and the focus on full geodesics instead of individual lines, proved an elegant solution for


these perspectives. Previously, the only spherical perspective that had been solved systematically was the azimuthal equidistant in the hemispherical case, meaning that in fact only half of it had been solved. This is the perspective of Barre and Flocon (1968), solved in the 1960s, which renders into images similar to those of a “fisheye” camera. The full azimuthal equidistant case had only been solved in either a qualitative way (Casas 1983) or through informal grid methods (Michel 2013) which lack the generality of Barre and Flocon’s solution. So there stood this curious situation in which the first spherical perspective was not (fully) spherical, and the subsequent ones were not (formal) perspectives. The approach described here and implemented in Araújo (2018c) not only extended Barre and Flocon’s perspective to the full sphere but also simplified their solution in the frontal hemisphere, reducing it to two distinct classes. The focus on geodesics instead of lines provides a clearer, more unified view of the problem, for even when plotting only the frontal part of a line, it is helpful to be able to call on the antipode of every point (and the pair of any vanishing point) for auxiliary constructions. As for the equirectangular case, it had previously been treated only computationally or through fixed grid constructions, the equirectangular grid being calculated at fixed intervals and used as a guide for drawing. The more general approach in Araújo (2018b) leads naturally to an exploitation of the translational symmetries of the perspective, which can be used with a system of dynamical moving grids that allow for the plotting of all lines in good approximation (Araújo 2018a, 2019a).
The cubical perspective case is an interesting one, since seeing it as a spherical perspective (Araújo et al 2019b) greatly simplifies previous approaches (Olivero et al 2019; Rossi et al 2018) that saw it as a conjunction of six related classical perspectives. It might not at all seem obvious that seeing a set of classical perspectives as a spherical one would be a simplification. In all these three cases, the approach passes through understanding the flattening and its blowup set, classifying the geodesics of the sphere according to how they are projected in the plane, and finding a method to render each class through simple constructions. For this, the understanding of antipodes and how to plot them is essential, as the duality of vanishing point pairs usually results in symmetries that can be exploited. As these are curvilinear perspectives, approximations will be required to draw line projections, and in these the number of operations should be kept to a minimum. This will require a careful exploitation of the symmetries arising from the flattening. A general philosophical principle is that not only the perspective drawings themselves but also the auxiliary drawings should happen in a compact subset of the plane; in short, everything should be bounded.

We will not explore in any detail here the actual constructions of spherical perspectives, leaving that for another chapter in the present volume (see Chap. 19, “Spherical Perspective”). We will merely relate the fundamental properties of two of these cases, to see how they fit into the general scheme described above. In the case of the azimuthal equidistant perspective (Fig. 28 (top)), the flattening works by choosing a reference point B (for “Back”) and loosening the meridians there, straightening them without stretching to obtain a disc centered on the point


Fig. 28 Azimuthal equidistant flattening of the sphere. Left: Blowing up point B maps the sphere to a disc centered on the image of F , the antipode of B. Geodesics through F map to diameters of the disc. Right: The green line is a geodesic through F . Geodesics not through F (blue and red lines) map to closed convex curves, well approximated by arcs of circles in the yellow inner disc that represents the space in front of the observer

F (for “Forward”) at the antipode of B. If we reference F and B as the “poles” of the sphere, then FB meridians flatten as rays of the disc and “parallels” flatten as circles, with the image of the equator separating an inner disc (the frontal view, rendered yellow on Fig. 28 (right)) from an outer ring, where the back view is rendered (see the examples in Fig. 29). The flattening is one-to-one at all points except at B. Taking the closure we get a compact disc, with the outer boundary circle (the blowup set) mapping entirely onto B. The flattening preserves the metric along each individual FB meridian, though of course not globally. This preservation of the metric is what allows measurements to be made, so drawing this perspective requires careful attention to these special meridians in the construction process. The natural measurements for this perspective are angle pairs, each point on the sphere being measured by one angle choosing the FB meridian where it lies and the other the length traversed from F along that chosen meridian. As for geodesics, these can be classified in two classes: geodesics through F render as diameters and all others as closed curves (see Fig. 28 (right)). For the latter, half of the geodesic will be rendered in the anterior disc (yellow in Fig. 28 (right)) as an arc of a circle c in good approximation, as shown by Barre and Flocon (1968); the other half can be rendered as the locus of a point P that moves so as to keep at a distance of half


Fig. 29 Two handmade drawings of urban scenes, obtained through the process described in Araújo (2018c) (left) and Araújo (2019b) (right). All lines are contained in geodesic images of the azimuthal equidistant flattening. (Drawings by the author)

a diameter from the point of c that is across from it along the line that joins it to F. This locus can be drawn using a mechanical process involving a ruler sliding on a fixed point, as explained in Araújo (2018c) and in Chap. 19, “Spherical Perspective,” in the present volume. This can be simplified by a moving grid system using the rotational symmetries of the perspective around F (Araújo 2019b).

In the case of equirectangular perspective, we blow up the sphere at two points U and D (representing for instance the directions “up” and “down”) and straighten the meridians, keeping them fixed at the circle halfway between U and D (the “equator”). In this way we obtain a cylinder, which we can now cut and unroll onto a rectangle. Unlike the case of cylindrical perspective, this rectangle will contain a record of the whole view around O. There are two classes of geodesics: geodesics through U, which render onto straight vertical lines, and all the others. The first (resp. second) type includes the image of vertical lines (resp. horizontals) but is not restricted to these. If we consider the set of all geodesics through two antipodal points on the equator (say, points L and R, corresponding to the observer’s Left and Right), we can generate all other geodesics from translations of elements of this set. This allows for a construction of complex scenes such as that of Fig. 30 without explicit use of ruler and compass and without the limitations of a fixed grid (see Araújo 2018a, 2019a). As shown in Araújo (2018b), these geodesics are very similar to sinusoidal curves when their apex (point of highest angular elevation) is low; hence this perspective looks like cylindrical perspective for low elevations (see Fig. 30). As the elevation rises, the geodesic images converge to a square wave, but even for high elevation they can still be rendered in good approximation with descriptive geometry constructions. In Fig. 30 we can see a drawing done on location of an equirectangular spherical perspective using the sliding grid method of Araújo (2018a). On top of it we see a spherical anamorph of the same scene; the drawing was


Fig. 30 A spherical anamorphosis and its equirectangular perspective. The spherical perspective drawing was done on location by the author. It was scanned, transformed on a computer, and then printed and glued on a ball to obtain the corresponding spherical anamorphosis. The author would like to thank Ph.D. student Lucas F. Olivero for making the 3D model on the top left of the picture

transformed by a computer into a sinusoidal projection, and then printed and glued onto a ball. This again shows that often the point of a perspective is to be a blueprint for the anamorphosis, instead of the other way around. We defined the perspective as the flattening of the anamorphosis because the anamorphosis is the uniquely defined object; but just as often the anamorphosis will be in practice treated as the folding of the flat perspective.
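The two flattenings discussed in this section can be sketched in a few lines. This is a minimal sketch under assumed conventions, not the ruler-and-compass constructions of the cited works: the unit sphere is centered at O, F = (1, 0, 0), B = −F, U = (0, 0, 1), D = −U, and the orientation of the picture axes is an arbitrary choice. The last function uses the fact that a great circle with normal (a, b, c), c ≠ 0, written in longitude λ and latitude φ, satisfies tan φ = −(a cos λ + b sin λ)/c. Its equirectangular image is therefore the arctangent of a sinusoid, close to the sinusoid itself when the amplitude is small and flattening toward ±π/2 when it is large, in line with the behavior described above.

```python
import math

def normalize(v):
    length = math.sqrt(sum(k * k for k in v))
    return tuple(k / length for k in v)

def azimuthal_equidistant(direction):
    """Flatten a direction onto the disc of radius pi centred on F's image."""
    x, y, z = normalize(direction)
    rho = math.acos(max(-1.0, min(1.0, x)))   # angular distance from F
    s = math.hypot(y, z)
    if s == 0.0:
        if x > 0.0:
            return (0.0, 0.0)                  # F itself
        raise ValueError("B is blown up to the whole boundary circle")
    return (rho * y / s, rho * z / s)          # length along the FB meridian kept

def equirectangular(direction):
    """Flatten a direction to (longitude, latitude); undefined at U and D."""
    x, y, z = normalize(direction)
    if math.hypot(x, y) == 0.0:
        raise ValueError("U and D are blown up to the top and bottom edges")
    return (math.atan2(y, x), math.asin(max(-1.0, min(1.0, z))))

def equirectangular_geodesic(normal, lon):
    """Latitude at longitude lon of the great circle normal . s = 0 (c != 0)."""
    a, b, c = normal
    return math.atan(-(a * math.cos(lon) + b * math.sin(lon)) / c)

# A geodesic through F flattens to a diameter, with arc length preserved.
ts = (0.5, 1.5, 2.5)
meridian = [azimuthal_equidistant((math.cos(t), 0.6 * math.sin(t), 0.8 * math.sin(t)))
            for t in ts]

# A point of a hypothetical great circle, rebuilt from its equirectangular image.
n = (0.3, -0.2, 1.0)
lon = 0.7
lat = equirectangular_geodesic(n, lon)
s = (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))
```

In `meridian`, each image lies at distance t from the center along the same direction (0.6, 0.8), which is the metric preservation along FB meridians; the point `s` satisfies the plane equation and flattens back to `(lon, lat)`.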

The Problem with Perspective

The view of anamorphosis we studied here was motivated mainly by the wish to solve conceptual problems with perspective. We will start by discussing what those problems are. One might be surprised to hear there are any conceptual problems with perspective at all. Linear perspective is universally used today to represent a 3D scene on a plane. Most working artists and architects take it for granted as an objective picture of an environment, identifying it with photographic representation. Mathematicians take it for a well-understood concept, made rigorous as an application of projective geometry (see Chap. 4, “Looking Through the Glass”). Yet the nature and meaning of perspective have been a point of contention since its inception. We can see this as a contention between the views of Euclid and those of Leonardo.


Euclid and Psychophysics

We could argue that our view of anamorphosis in the present work is a recovery of concepts first formulated in Euclid’s Optics with the addition of vanishing points. Or, put in another way, that the fourth axiom of Euclid’s Optics is the earliest known precedent for the concept of anamorphosis.

Axiom 1. (4th axiom of Euclid’s Optics) and that those things seen within a larger angle appear larger, and those seen within a smaller angle appear smaller, and those seen within equal angles appear to be of the same size (Burton 1945).

Euclid here is postulating that for visual perception, the visual angle subtended by an object and its perceived size are the same thing. This is clearly true for a simple object in isolation from other stimuli. That is, if I take a black ball on a white background and I make it subtend progressively larger angles, an observer will certainly report a larger apparent size. But it is too daring a proposition when related to complex visual environments. For instance, the Ebbinghaus illusion (Ebbinghaus 1902) shows that the perceived size of a colored circle can depend strongly on the disposition of other colored circles in its vicinity. What exactly is happening is still a matter for contention, with several theories extant (Roberts et al 2005). This is not uncommon for visual phenomena whenever global variables – rather than very well-controlled local experiments – are taken into account. Analogous problems occur in color theory, where the local phenomena are well understood and mathematically elegant, while the global theory is confusing and qualitative. So we must not take the term “apparent size” at face value. Astronomers use the term in the Euclidean sense to mean simply the size of the solid angle subtended, while in studies of perception, apparent size means the subjective impression of size as reported by human subjects. In the latter sense, Euclid asserts too much, ignoring contextual matters.
Still, a careful localized interpretation of the fourth axiom does hold water as empirical fact. Like we said before, it defines an implicit empirical scope wherein it is true, and therefore it is as interesting a concept as that scope happens to be relevant. Still, in our own work here, we took a different, although related, path. We are not concerned with size but with form; not with the size of the solid angle subtended by the cone of an object but with the cone itself. Our principle of radial occlusion neither implies nor is implied by Euclid’s 4th axiom of optics. But the emphasis on the visual angle, without reference to any projection surface, the isotropy of view, and the notion of visual cone as a function of the object rather than on some central axis is already present in Euclid, and a reading of his theorems shows that he has in mind something similar to us. The difference is his avoidance of infinity. Therefore we might say that the work here developed clarifies Euclid and adds to it the notion of points at infinity – of compactness, if you will. It is impressive to read Euclid in retrospect and notice the generality of his view at such an early time and how the late arrival of linear perspective seems by comparison a step back in generality, being a theory of representation of too particular a kind.


Leonardo’s Axiom and Paradox

Leonardo considered that there were many conceptual problems with perspective as formulated by Alberti. He distinguished between “natural” and “accidental” perspective. This is related (but not identical) to the more usual distinction between natural perspective (meaning optics, as a study of both light and visual perception) and artificial perspective (meaning linear perspective). Leonardo objected to the requirement in linear perspective of eye placement at a single point of observation and thought that a spherical canvas would make for a more natural perspective; and he seems to have thought that in the absence of this a very distant canvas might do. In this he seemed to be concerned with the inclination of the surface normal with regard to the angle of view (in the view of Kemp (1990), the foreshortening of the canvas just as much as the foreshortening of the picture). “Foreshortening of the canvas” is of course no more than a reference to the effects of oblique anamorphosis. Leonardo’s greatest difficulty, or at least the one that survived to our time, was based on the following axiom, which he considered a requirement of a natural perspective:

Axiom 2. Leonardo’s axiom: Among objects of equal size that which is most remote from the eye will look the smallest.

This in itself is not contentious. But he then confusingly identified this notion with the completely different and in fact incompatible one that they should therefore be represented by metrically smaller projections. This is in contrast with Euclid’s fourth axiom of optics. He gave the example still called Leonardo’s paradox (Fig. 31): the conical projection of a sphere that moves along the projection plane actually grows as the sphere moves away from the observer (see Araújo 2016). This “paradox” really isn’t one. True, the perspective picture grows, but so does the distance of this projection to the observer, in such a way that the apparent size of the projection diminishes.
By Euclid’s fourth axiom, both the distal sphere and its projection will look smaller than the proximal one. Leonardo’s real objection may be the disparity between the properties of the metric representation and the angular/visual properties. He would like a perspective that carried the proper visual properties of mimesis without the requirement of standing at a single point O. And he would like that the metric properties, along with the angular ones, should satisfy his axiom. Hence his preference for a spherical canvas. In this, there is a conflating of the two concepts that we separated in the present work: perspective and anamorphosis. This conflation would endure and persist as a philosophical difficulty until the twentieth century. It is in fact still a difficulty today in the way these affairs are discussed. This leads to asking the wrong questions such as whether a “perspective” has “deformations” or whether it is “natural” or whether it corresponds to the way we see. These questions vanish if we only separate the two concepts of mimesis and representation, anamorphosis, and its flattening into perspective. So that the point may be made that the paradox is no such thing – it does not point out to any failure of linear perspective in achieving what it sets out to do, but is merely a complaint


Fig. 31 Leonardo’s paradox: |AB| > |CD| even though the corresponding sphere is further away from O

that linear perspective does not do as much as Leonardo wishes from a perspective. But it is hard to see this as a flaw of perspective as much as a failure of Leonardo to settle on a consistent set of requirements. One cannot ask at the same time to preserve both angular and linear measurements. That linear perspective (as plane anamorphosis) does not preserve the latter is not a bug but a feature mathematically required to preserve the former. This enduring failure to distinguish the mimetic role of anamorphosis from the representational role of perspective may be due to the fact that linear perspective and plane anamorphosis are the same physical object (the “flattening” being trivial); this identification between the anamorphosis and the corresponding perspective only happens in the case of linear perspective, and this coincidence probably made it all the harder to understand the distinction between the two concepts. It is rather unfortunate that the most “obvious” of perspectives is in fact a quite peculiar one. The matter was still contentious in dealing with cylindrical perspective, much later, and even when the practical difficulties of the drawing itself were well handled, mechanically and theoretically (Kemp 1990, p. 175).
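The resolution of Leonardo’s paradox can be checked numerically. The following Python sketch is only an illustration under assumptions that the text does not fix: the observer O is placed at the origin, the picture plane lies at distance d = 1 in front of O, and each sphere is reduced to its circular cross-section in the horizontal plane through O. The function name and the sample positions are hypothetical.

```python
import math

def cross_section_projection(x, depth, radius, d=1.0):
    """Project the circle centered at (x, depth) onto the line z = d, seen from O = (0, 0).

    Returns a tuple (width, sphere_angle, projection_angle):
      width            -- metric length of the projected segment on the picture plane
      sphere_angle     -- angle subtended at O by the circle itself
      projection_angle -- angle subtended at O by the projected segment
    """
    rho = math.hypot(x, depth)            # distance from O to the center
    if radius >= rho:
        raise ValueError("the observer must lie outside the sphere")
    theta = math.atan2(x, depth)          # direction of the center, measured from the axis
    alpha = math.asin(radius / rho)       # angular half-width of the circle seen from O
    if theta + alpha >= math.pi / 2 or theta - alpha <= -math.pi / 2:
        raise ValueError("a tangent ray misses the picture plane")
    left = d * math.tan(theta - alpha)    # where the two tangent rays meet the plane
    right = d * math.tan(theta + alpha)
    projection_angle = math.atan2(right, d) - math.atan2(left, d)
    return right - left, 2.0 * alpha, projection_angle

# Two equal spheres on a line parallel to the picture plane (assumed sample values).
for x in (0.0, 6.0):
    width, sphere_angle, projection_angle = cross_section_projection(x, 4.0, 1.0)
    print(f"x = {x}: width = {width:.4f}, "
          f"sphere angle = {math.degrees(sphere_angle):.2f} deg, "
          f"projection angle = {math.degrees(projection_angle):.2f} deg")
```

With these sample values the more distant sphere has the wider projection, while the angle it subtends at O is smaller, and the angle subtended by each projection equals that of its sphere. This is the sense in which the metric growth of the picture and the visual shrinking of the object are compatible.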

Effects on the Development of Spherical Perspective

In the twentieth century, the discussion started by Leonardo took a particularly confusing turn and produced reams of papers in a polemic between the realist and conventionalist schools of thought. A simplified view of the matter is that the realist position, represented by Ernst Gombrich, attributes an objective quality to the ability of perspective pictures to mimic reality. This ability comes from the fact that a picture made according to the rules of perspective furnishes a properly located observer with the same bundle of light rays that he would receive from the original object. The conventionalist position, as championed by Goodman, argues instead that we see classical perspective pictures as realistic simply because we have been acculturated to them. As in most such debates, things are not as clear as in this rapid caricature. First of all, Gombrich’s “realist” position is hardly clear-cut. In Art and Illusion (Gombrich 1960) he considered that there is no reality without


interpretation and that seeing is inseparable from conceptualization: there is no innocent eye. Gilman (1992) argues that there is no naturalistic position on either side of the debate and sees it rather as a contention between conventionalist factions, Gombrich being merely a conventionalist with particular scruples where perspective is concerned, which Goodman saw as a puzzling contradiction within the context of Gombrich’s general position. Gombrich’s objection was not so much to the contention that perception requires interpretation as to the claim that this interpretation is just a cultural construct. According to Mitrović (2013b), it was this strong thesis of cultural relativism, that the totality of perceived reality is culturally constructed, as well as associated collectivist notions, that repulsed Gombrich. The positions are complex, and Verstegen (2011) argues that both Gombrich and Goodman are “guilty of letting extraneous issues color their discussion of the (non)conventionality of perspective.” Be all this as it may, the opposing positions have been argued confusingly and in so many skirmishes that scholars’ careers can be erected on the mere accountancy of blows. As in most such debates, the boxers have reached belatedly and by exhaustion the obvious conclusion that, while fresh, they refused to reach by good sense: that neither extreme position is tenable (Frigg and Hunter 2010, page xx) and that the interesting question is how exactly purely optical mimesis and convention interact. Fortunately, however, we are not here concerned with fleshing out such intricate business. Goodman argued famously that representation cannot be mere resemblance, since resemblance is symmetric (A resembles B implies B resembles A) while representation is not (a picture may represent a horse but a horse doesn’t represent its picture). We argue that in our present formulation we have cut this Gordian knot by separating the matter of resemblance (anamorphic equivalence, which is indeed symmetric) from that of representation (perspective, which is not; A being a perspective of B doesn’t make B a perspective of A). So we are not interested in dealing with this debate in the terms set by Goodman. We have answered it: linear perspective is not conventional – it is the only perspective that is also an anamorphosis, and anamorphosis is an optical principle that stands on top of an objective empirical fact: the principle of linear occlusion. We still care about this debate, however, because it influenced contemporary understanding of the purely mimetic possibilities of perspective and anamorphosis. This is especially unfortunate as Goodman was at his weakest when intruding into the field of mimesis proper and has since been refuted in detail – see, for instance, Mitrović (2013a). Consider the infamous passage from Languages of Art (Goodman 1968). Goodman concedes for the moment the mimetic properties of perspective, but decries what he calls the “remarkable” requirements for such an illusion to occur:

The picture must be viewed through a peephole, face on, from a distance, with one eye closed and the other motionless. The object also must be observed through a peephole, from a given (but not usually the same) angle and distance, and with a single unmoving eye. Otherwise, the light rays will not match (Goodman 1968, page 12).


The conditions of observation, as described, seem indeed remarkable, even reminiscent of an infamous scene from Kubrick’s A Clockwork Orange in which Alex is bound to a chair with eyes clamped wide open, his gaze forcibly fixed forward. But more remarkable is that the argument easily crumbles not only under the scrutiny of the empiricist but merely under that of the tourist. For all it takes is a casual stroll into the church of Sant’Ignazio di Loyola in Rome and a look at its illusionary ceiling, where Andrea Pozzo indeed used “perspective” (anamorphosis in our view) as an alternative to architecture proper (Fig. 1). The Jesuits wanted a grander building than they could afford, and Pozzo constructed it in virtuality, going through all the steps of an architectural project, with plan and elevation constructions, perspective renderings, and finally anamorphic constructions. Here, the tourist looks at painted columns that extend the real ones so expertly that the seams are hard to identify. So expertly are these illusions constructed that it is hard to discern the true form of the curved ceiling. There is no peephole here, no need to close one eye and to keep the other unmoving. Goodman is stuck on the experience of small-scale anamorphoses and confuses the practical limitations of size with theoretical limitations. With a high ceiling at his disposal Pozzo constructs an immersive anamorphosis. At that large scale, the observer’s two eyes work monocularly to a good approximation. Also, the head can, and is expected to, rotate freely to contemplate the surrounding illusion. It is rather misleading to say that the anamorphosis has to be seen “from a certain angle and distance.” It has to be seen from a certain point, but at that point the eye is allowed to rotate freely. In such a large-scale illusion as Pozzo’s ceiling at Sant’Ignazio (or as in the large-scale nineteenth-century panoramas), the observer can even break a little with the theoretical impositions and get away with it. A large disk marks the spot from which Pozzo’s illusion should be observed, but a couple of steps away from it (or a few heads too tall or too short) won’t break the illusion substantially. It is quite remarkable that the arguments of Goodman could survive scrutiny in the twentieth century when Pozzo not only understood in practice the full possibilities of anamorphosis but wrote on them so eloquently and clearly in his treatise. If the eye is not innocent, far less is the mind so. It was simply the right cultural moment for such an argument to be entertained, in spite of the evidence of the eye. In the present work we would hardly engage with these arguments, which in our view have aged badly, except that they were also used in much the same way by none other than Barre and Flocon as a justification for their development of spherical perspective (Barre and Flocon 1968) or at least in the paper they wrote with Bouligand aiming at its mathematical and conceptual justification (Barre et al. 1964). In the latter work the authors seek to justify their choice of spherical projection (the azimuthal equidistant projection, limited to one hemisphere) in the context of the search for a natural perspective, which in their view would satisfy the requirement of “ordering visible elements on a surface to form an image that causes on the spectator a sensation of volume and space” (“Ordonner sur une surface des éléments visibles, formant une image qui procure au spectateur des sensations de volumes et d’espace”). This desideratum (called “demande Δ”) being somewhat vague, the authors seek to approach it by the


constraint of an “axiome fondamental” A, a more geometrical axiom that should be, in their view, a necessary condition for such a perspective. This “axiome A” turns out to be just Leonardo’s axiom (Axiom 2). Watching what the authors proceed to do, one finds that the “surface” mentioned in the requirement Δ is in fact assumed to be a plane and that “seems smaller” means metrically smaller. This is the exact same conflation of concepts that was made by Leonardo. The authors raise much the same objections to linear perspective as Leonardo and Goodman did, in particular two objections usually attributed to Panofsky (Panofsky 1927, 1991) but actually much older, which purport to show that whatever the “natural” perspective is, it can’t be classical perspective. The first objection is that the projection of lines on the retina is curved, not straight; hence we must in fact not see lines as straight. This is a rather blunt misconception that implicitly assumes some sort of theater of the mind where some homunculus in the visual brain sits to see the retinal image (and then we might ask, in infinite regression, how does its visual system work?). This is in fact an old objection already aimed at Kepler’s model of vision, which Descartes answered by pointing out that the shape of the stimulus on the retina and the sensation caused by the stimulus are two different things. The retinal image is also inverted and yet we don’t see the world upside down (Kemp 1990, p. 234). A second objection is that the example of two parallel lines going to infinity goes against Leonardo’s axiom, as we know from experience that the lines seem to converge, yet in linear perspective the distance between their projections remains constant. This confuses the metric properties of the projection with the perception of these when seen from O. Araújo (2016) refutes this by pointing out that if we put the plane of projection on top of the lines then the lines will coincide with their perspective. Hence to claim that the perspective of the lines does not seem to converge would be to claim that the lines themselves don’t seem to converge. To put it more strikingly, if the perspective drawing of a long wall is not visually realistic then the real wall itself is not visually realistic. Of course the answer is that both the lines and their perspectives do appear to converge, because the angle subtended from O decreases with distance in exactly the right way, precisely because the drawn lines remain at a constant distance from each other. Much the same question had been raised by Schickhardt in 1623 and refuted by Kepler in a similar fashion (Kemp 1990, p. 477-8). We can add that Leonardo’s paradox of the spheres has exactly the same answer: it doesn’t matter that the distal sphere has the largest projection, since the projection is seen from further away, in exactly the right measure to make it mimic the real sphere. Being charitable and removing the layers of confusion and contradiction, one finds in the arguments of Barre, Flocon, and Bouligand one concrete and well-defined objective: to find a projection onto the plane that should cover a 180-degree angle of vision around an axis, project onto a bounded region of the plane, and be anamorphic in the whole representation. This would imply that the metric properties should be such as to save appearances (the angular properties). This is a concrete interpretation of desideratum Δ and Leonardo’s axiom. The only problem is that it can’t be done, which is also, in the end, the conclusion of Bouligand. So the


azimuthal equidistant projection is chosen as simply the better compromise among the available candidates. It is all about minimizing metric deformations as far as possible. Hence the vagueness of desideratum Δ – the stricter requirement of actual mimesis could not be satisfied. But of course, no matter what word juggling we use, nobody in the end mistakes a spherical perspective for the real thing. There is certainly a visual evocation of the object, but not mimesis in the strict sense. And this is not, as the authors seem to assume, a problem that comes merely from the sphere being non-developable. The cylinder is developable (isometric to the plane) and yet unrolling it destroys the anamorphic effect. In the case of Barre and Flocon’s spherical perspective, there is simply no point O from which the drawing has a mimetic effect in the sense of the cone of the representation being the same as that of the represented object. The authors have it exactly the wrong way around: spherical perspective, unlike classical perspective (and unlike spherical anamorphosis), is entirely conventional. It requires an agreed upon, rather sophisticated intellectual translation in order to be read. Its achievements are of a different order: it is a flat record of one half of a spherical anamorphosis, which has good metrical properties and – crucially – is easy to draw by hand, since its line projections are arcs of circles. The authors do an injustice to their own work by chasing a quixotic goal while underselling their true achievement – the rational description of a construction method for the first rigorous spherical perspective. These philosophical misconceptions are not merely academic. Araújo (2016) argues that looking exclusively at the minimization of distortion is probably the main reason why Barre and Flocon did not extend their perspective beyond the hemispheric view. Extending their perspective to a full 360-degree view would require acceptance of rather egregious metric deformations, which would defeat their standard objections against linear perspective. In the view we present in this work, these deformations are not a problem, as the spherical perspective does not intend to contend for the place of “natural perspective” and is by definition flawed in its mimetic properties. Mimesis is left to the anamorphic step, and we bind all the complicated phenomenological, optical, and physiological concepts purely within our axiom of radial occlusion. Rules of interpretation and empirical motivations are not allowed inside once the mathematical development begins. In this view there is no contradiction between anamorphic mimesis and metric deformations on the perspective, just as there is no paradox between surveying and map making. The view that perspective is an entailment of an anamorphosis and a flattening also clarifies the fundamental role of anamorphosis and the arbitrary nature of perspective, giving a dual role to spherical anamorphosis and linear perspective. Among anamorphoses, it gives the spherical one the canonical role, the points of the sphere being identified with rays, hence with the classes of anamorphic equivalence of spatial points. To linear perspective it gives a special place, justifying the realist’s view in its most naturalistic bent: linear perspective is indeed special, because it is the only perspective for which the flattening is trivial, that is, the only perspective that is still an anamorphosis. It therefore keeps a dual role of


representation of visual information and of mimetic object, objectively creating a visual illusion, independently of any convention.
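The size of the metric deformations at stake can be made concrete with a small computation. The Python sketch below is a minimal illustration, not the construction of Barre and Flocon: it assumes a viewing axis along +z and a unit visual sphere, and the function names are hypothetical. It maps a viewing direction to the plane by the azimuthal equidistant projection (distance from the center equal to the angle φ from the axis) and evaluates the standard scale factor φ / sin φ of that projection along circles centered on the axis; the scale along radial directions is 1.

```python
import math

def azimuthal_equidistant(direction):
    """Map a 3D viewing direction to the plane; the viewing axis is assumed to be +z.

    The image point lies at distance phi (angle from the axis, in radians)
    from the origin, in the same azimuth as the direction.
    """
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("the zero vector is not a direction")
    phi = math.acos(max(-1.0, min(1.0, z / norm)))  # angle from the axis
    lateral = math.hypot(x, y)
    if lateral == 0.0:
        if z > 0.0:
            return (0.0, 0.0)                       # the axis itself maps to the center
        raise ValueError("the direction opposite to the axis has no single image point")
    return (phi * x / lateral, phi * y / lateral)

def tangential_stretch(phi):
    """Scale factor along circles centered on the axis, at angle phi from the axis."""
    return 1.0 if phi == 0.0 else phi / math.sin(phi)

# Stretch at a few angles from the axis (sample values chosen for illustration).
for degrees in (0, 45, 90, 135, 170, 179):
    print(degrees, round(tangential_stretch(math.radians(degrees)), 3))
```

Within the hemisphere the stretch stays between 1 and π/2, which is consistent with the projection being a reasonable compromise there; beyond 90 degrees it grows without bound as the direction approaches the point opposite to the axis, which is the kind of deformation a 360-degree extension has to accept.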

Conclusion

We have presented a theoretical framework that separates and delimits the notions of perspective and anamorphosis in such a way as to render moot long-standing points of contention regarding whether or not curvilinear perspectives are “natural” or “correspond to the way we see.” The question is found to be not well posed, and the difficulties vanish upon reformulation. Our view of anamorphosis may be seen as an update of Euclid’s optics with a reworking of the assumptions involved and the addition of vanishing points, hence taming the notion of infinity, which Euclid is careful to avoid. We make a single empirical assumption – the principle of linear occlusion – abstracted into a simple geometrical notion, anamorphic equivalence, and proceed from there to obtain both practical geometrical constructions and an elegant, symmetric notion of vanishing set. We derive from this notion of anamorphosis a definition of compact central perspectives that includes all the usual curvilinear perspectives. This family, strictly speaking, excludes classical perspective, since the infinite plane is not compact, yet it includes the restrictions of linear perspective to any compact set, with classical perspective as a degenerate limiting case. Yet this notion – by viewing perspectives as an entailment of two separate steps, one being mimetic (anamorphosis) and the other representational (flattening) – clarifies the special role of linear perspective, giving it a distinguished position in the bestiary of perspectives, as the perspective that is simultaneously an anamorphosis. In this way we provide a language that naturally dissolves a philosophical misconception dating back at least to the Renaissance, simply by making the terms of discourse correspond to the operational practice of perspective.

Cross-References

Anamorphosis: Between Perspective and Catoptrics
Looking Through the Glass
Spherical Perspective

Acknowledgments A. B. Araújo was funded by FCT Portuguese national funds through project UIDB/Multi/04019/2020

References

Andersen K (1992) Brook Taylor’s Work on Linear Perspective: A Study of Taylor’s Role in the History of Perspective Geometry. Including Facsimiles of Taylor’s Two Books on Perspective. Springer, New York, vol 10, pp 1–67. https://doi.org/10.1007/978-1-4612-0935-5


Andersen K (2007) The geometry of an art: the history of the mathematical theory of perspective from Alberti to Monge. Springer Science & Business Media, New York
Apostol TM, Mnatsakanian MA (2007) Unwrapping curves from cylinders and cones. Am Math Mon 114(5):388–416
Araújo A (2015) A construction of the total spherical perspective in ruler, compass and Nail. https://arxiv.org/abs/1511.02969
Araújo A (2017a) Anamorphosis: optical games with perspective’s playful parent. In: Silva JN (ed) Proceedings of the Recreational Mathematics Colloquium V (2017) – G4G Europe. Associação Ludus, Lisbon, pp 71–86
Araújo A (2017b) Cardboarding mixed reality with Durer machines. In: Proceedings of the 5th Conference on Computation, Communication, Aesthetics and X, pp 102–113
Araújo A (2017c) A geometria (descritiva) da anamorfose e das perspectivas curvilíneas. In: Workshop “Matemática e Arte”, Sociedade Portuguesa de Matemática, pp 101–108
Araújo A (2017d) Guidelines for Drawing Immersive Panoramas in Equirectangular Perspective. In: Proceedings of the 8th International Conference on Digital Arts – ARTECH2017, ACM Press, Macau, China, pp 93–99. https://doi.org/10.1145/3106548.3106606
Araújo A (2018a) Let’s Sketch in 360º: Spherical Perspectives for Virtual Reality Panoramas. In: Bridges 2018 Conference Proceedings, Tessellations Publishing, pp 637–644
Araújo AB (2016) Topologia, anamorfose, e o bestiário das perspectivas curvilíneas. Convocarte–Revista de Ciências da Arte 2:51–69
Araújo AB (2018b) Drawing equirectangular VR panoramas with ruler, compass, and protractor. J Sci Technol Arts 10(1):2–15. https://doi.org/10.7559/citarj.v10i1.471
Araújo AB (2018c) Ruler, compass, and nail: constructing a total spherical perspective. J Math Arts 12(2-3):144–169. https://doi.org/10.1080/17513472.2018.1469378
Araújo AB (2019a) Eq A sketch 360, a serious toy for drawing Equirectangular spherical perspectives. In: Proceedings of the 9th International Conference on Digital and Interactive Arts, ACM, Braga Portugal, pp 1–8. https://doi.org/10.1145/3359852.3359893
Araújo AB (2019b) A fisheye gyrograph: taking spherical perspective for a spin. In: Goldstine S, McKenna D, Fenyvesi K (eds) Proceedings of Bridges 2019: Mathematics, Art, Music, Architecture, Education, Culture, Tessellations Publishing, Phoenix, Arizona, pp 659–664
Araújo AB, Olivero LF, Antinozzi S (2019a) HIMmaterial: exploring new hybrid media for immersive drawing and collage. In: Proceedings of the 9th International Conference on Digital and Interactive Arts, ACM, Braga Portugal, pp 1–4. https://doi.org/10.1145/3359852.3359950
Araújo AB, Rossi A, Olivero LF (2019b) Boxing the visual sphere: towards a systematic solution of the cubical perspective. UID per il disegno (2019):33–40. https://doi.org/10.36165/1004
Baltrušaitis J (1983) Les Perspectives Dépravées. Flammarion
Barnard ST (1983) Interpreting perspective images. Artif Intell 21(4):435–462
Barre A, Flocon A (1968) La Perspective Curviligne. Flammarion, Paris
Barre A, Flocon A, Bouligand G (1964) Étude comparée de différentes méthodes de perspective, une perspective curviligne. Bulletin de la Classe des Sciences de La Académie Royale de Belgique 5(L)
Belisle B (2015) Nature at a glance: Immersive maps from panoramic to digital. Early Popular Visual Culture 13(4):313–335
Burton HE (1945) Euclid’s optics. J Opt Soc 35(5):357–372
Cabezos Bernal PM (2015) Imágenes estereoscópicas aplicadas a la representación arquitectónica. PhD Thesis, Universitat Politècnica de València
Casas F (1983) Flat-sphere perspective. Leonardo 16(1):1–9
Catalano G (1986) Prospettiva Sferica. Università degli Studi di Palermo
Coates P, Arayici Y, Koskela LJ, Kagioglou M, Usher C, O’ Reilly K (2010) The limitations of BIM in the architectural process. In: First International Conference on Sustainable Urbanization (ICSU 2010), Hong Kong, China
Collins DL (1992) Anamorphosis and the eccentric observer: inverted perspective and construction of the gaze. Leonardo 25(1):73–82


Correia JV, Romão L, Ganhão SR, da Costa MC, Guerreiro AS, Henriques DP, Garcia S, Albuquerque C, Carmo MB, Cláudio AP, Chambel T, Burgess R, Marques C (2013) A New Extended Perspective System for Architectural Drawings. In: Zhang J, Sun C (eds) Global design and local materialization, vol 369. Springer, Berlin/Heidelberg, pp 63–75. https://doi.org/10.1007/978-3-642-38974-0_6
Correia V, Romão L (2007) Extended perspective system. In: Proceedings of the 25th eCAADe International Conference, pp 185–192
Čučaković A, Paunović M (2016) Perspective in stage design: an application of principles of anamorphosis in spatial visualisation. Nexus Netw J 18(3):743–758. https://doi.org/10.1007/s00004-016-0297-5
Dept of Military Aeronautics USMADA (1918) Panoramic drawing, one-point and cylindrical perspective. G.P.O.
Draper SW (1978) The Penrose triangle and a family of related figures. Perception 7(3):283–296. https://doi.org/10.1068/p070283
Dunham D (2019) The Bridges 2018 mathematical art exhibitions. J Math Arts 1–15. https://doi.org/10.1080/17513472.2019.1654330
Dutour É (1760) Discussion d’une question d’optique. l’Académie des Sciences. Memoires de Mathematique et de physique presentes par Divers Savants 3:514–530
Ebbinghaus H (1902) Grundzüge Der Psychologie. Verlag von Viet & Co., Leipzig
Escher MC (1958) Belvedere
Escher MC (1972) The Graphic work of M. C. Escher – Introduced And Explained By The Artist, New, Revised and Expanded Edition. Ballantine Books, New York
Fasolo M, Mancini MF (2019) The ‘Architectural’ Projects for the Church of St. Ignatius by Andrea Pozzo. diségno (4):79–90. https://doi.org/10.26375/disegno.4.2019.09
Foley JD, Van FD, Van Dam A, Feiner SK, Hughes JF, Angel E, Hughes J (1996) Computer graphics: principles and practice, vol 12110. Addison-Wesley Professional
Frigg R, Hunter M (2010) Beyond Mimesis and Convention: Representation in Art and Science, vol 262. Springer
Gay F, Cazzaro I (2019) Venetian perspective boxes: When the images become environments. In: Luigini A (ed) Proceedings of the 1st International and Interdisciplinary Conference on Digital Environments for Education, Arts and Heritage. EARTH 2018. Advances in Intelligent Systems and Computing, vol 919. Springer, Cham, pp 95–105. https://doi.org/10.1007/978-3-030-12240-9_11
Gilman D (1992) A new perspective on pictorial representation. Aust J Philos 70(2). https://doi.org/10.1080/00048409212345061
Gombrich EH (1960) Art and illusion; a study in the psychology of pictorial representation. Pantheon Books, New York
Goodman N (1968) Languages of art: an approach to a theory of symbols, 2nd edn. Hackett Publishing Company, Indianapolis
Grau O (1999) Into the belly of the image: historical aspects of virtual reality. Leonardo 32(5):365–371. https://doi.org/10.1162/002409499553587
Herdman WG (1853) A treatise on the curvilinear perspective of nature; and its applicability to art. John Weale & co., London
Hohenwarter M (2002) GeoGebra: Ein Softwaresystem für dynamische Geometrie und Algebra der Ebene. PhD thesis, Paris Lodron University, Salzburg, Austria
Hohenwarter M, Borcherds M, Ancsin G, Bencze B, Blossier M, Delobelle A, Denizet C, Éliás J, Fekete Á, Gál L, Konečný Z, Kovács Z, Lizelfelner S, Parisse B, Sturr G (2013) GeoGebra 4.4
Huffman DA (1968) Decision criteria for a class of ‘impossible objects’. In: Proceedings of the First Hawaii International Conference on System Sciences, Honolulu
Huffman DA (1971) Impossible objects as nonsense sentences. Machine Intelligence 6:295–323
Huhtamo E (2013) Illusions in motion – media archaeology of the moving panorama and related spectacles, 1st edn. Leonardo Book Series, The MIT Press


Inglis T (2018) Constructing 3D perspective anamorphosis via surface projection. In: Bridges 2018 Conference Proceedings, Tessellations Publishing, pp 91–98
Kemp M (1990) The science of art. Yale University Press, New Haven and London
Kim Y, Chin S (2019) An analysis of the problems of BIM-based drawings and implementation during the construction document phase. In: 36th International Symposium on Automation and Robotics in Construction, Banff. https://doi.org/10.22260/ISARC2019/0025
Kulpa Z (1983) Are impossible figures possible? Signal Process 5(3):201–220. https://doi.org/10.1016/0165-1684(83)90069-5
Michel G (2013) L’oeil, au Centre de la Sphere Visuelle. Boletim da Aproged (30)
Mitchell R (1801a) Plans, and views in perspective, with descriptions, of buildings erected in England and Scotland: and also an essay, to elucidate the Grecian, Roman and gothic architecture, accompanied with designs. Wilson & Company
Mitchell R (1801b) Section of the Rotunda, Leicester Square | British Library – Picturing Places. https://www.bl.uk/collection-items/section-of-the-rotunda-leicester-square
Mitrović B (2013a) Nelson Goodman’s arguments against perspective: a geometrical analysis. Nexus Netw J 15(1):51–62. https://doi.org/10.1007/s00004-012-0133-5
Mitrović B (2013b) Visuality after Gombrich: the innocence of the eye and modern research in the philosophy and psychology of perception. Zeitschrift für Kunstgeschichte 76(H. 1):71–89
Monroe MM, Redmann WG (1994) Apparatus and method for projection upon a three-dimensional object
Moose M (1986) Guidelines for constructing a fisheye perspective. Leonardo 19(1):61–64
Morehead JC Jr (1955) Perspective and projective geometries: a comparison. Rice Institute Pamphlet-Rice University Studies 42(1):1–25
Necker LA (1832) Observations on some remarkable optical phænomena seen in Switzerland; and on an optical phænomenon which occurs on viewing a figure of a crystal or geometrical solid. Lond Edinb Dublin Philos Mag J Sci 1(5):329–337. https://doi.org/10.1080/14786443208647909
Norman DA (1990) Why interfaces don’t work. The art of human-computer interface design 218
Oettermann S, Schneider DL (1997) The panorama: history of a mass medium, vol 2. Zone Books, New York
Olivero LF, Sucurado B (2019) Inmersividad analógica: Descubriendo el dibujo esférico entre subjetividad y objetividad. Estoa Revista de la Facultad de Arquitectura y Urbanismo de la Universidad de Cuenca 8(16):80–109
Olivero LF, Rossi A, Barba S (2019) A codification of the cubic projection to generate immersive models. diségno (4):53–63. https://doi.org/10.26375/disegno.4.2019.07
Panofsky E (1927) Die Perspektive als ‘symbolische Form’. Vorträge der Bibliothek Warburg 1924–1925, vol 320
Panofsky E (1991) Perspective as symbolic form. Zone Books, New York
Papert S, Turkle S (1991) Epistemological Pluralism. In: Idit, Papert, Harel S (eds) Constructionism, Ablex Publishing Co., pp 161–191
Penrose LS, Penrose R (1958) Impossible objects: a special type of visual illusion. Br J Psychol 49(1):31–33. https://doi.org/10.1111/j.2044-8295.1958.tb00634.x
Roberts B, Harris MG, Yates TA (2005) The Roles of Inducer Size and Distance in the Ebbinghaus Illusion (Titchener Circles). Perception 34(7):847–856. https://doi.org/10.1068/p5273
Rossi A, Olivero LF, Barba S (2018) “CubeME”, a variation for an immaterial rebuilding. In: Rappresentazione/Materiale/Immateriale Drawing as (in) Tangible Representation, Cangemi Editore, pp 31–36
Rossi M (2016) Architectural perspective between image and building. Nexus Netw J 18(3):577–583. https://doi.org/10.1007/s00004-016-0311-y
Sánchez-Reyes J, Chacón JM (2016) Anamorphic free-form deformation. Comput Aided Geom Des 46:30–42
Sánchez-Reyes J, Chacón JM (2020) How to make impossible objects possible: Anamorphic deformation of textured NURBS. Computer Aided Geom Des 78:101826. https://doi.org/10.1016/j.cagd.2020.101826


Spencer J (2018) Illusion as ingenuity: Dutch perspective boxes in the Royal Danish Kunstkammer’s ‘Perspective Chamber’. J Hist Collections 30(2):187–201
Sugihara K (1982) Classification of impossible objects. Perception 11(1):65–74. https://doi.org/10.1068/p110065
Sugihara K (2000) “Impossible objects” Are not necessarily impossible – mathematical study on optical illusion –. In: Goos G, Hartmanis J, van Leeuwen J, Akiyama J, Kano M, Urabe M (eds) Discrete and Computational Geometry, vol 1763, Springer, Berlin/Heidelberg, pp 305–316. https://doi.org/10.1007/978-3-540-46515-7_27
Sugihara K (2015a) Ambiguous cylinders: a new class of impossible objects. Comput Aided Drafting Des Manuf 25(4):19–25
Sugihara K (2015b) Height reversal generated by rotation around a vertical axis. J Math Psychol 68:7–12
Sugihara K (2016) Ambiguous Cylinder illusion. https://www.youtube.com/watch?v=oWfFco7K9v8
Sugihara K (2018) Topology-disturbing objects: a new class of 3D optical illusion. J Math Arts 12(1):2–18. https://doi.org/10.1080/17513472.2017.1368133
Termes D (1998) New perspective systems. self-published
Tomilin MG (2001) Anamorphoses-optical oddities of the Renaissance or sources of the science of image processing? J Opt Technol 68(9):723. https://doi.org/10.1364/JOT.68.000723
Tran Luciani D, Lundberg J (2016) Enabling designers to Sketch Immersive Fulldome presentations. In: Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems – CHI EA ’16, ACM Press, San Jose, California, USA, pp 1490–1496. https://doi.org/10.1145/2851581.2892343
Verstegen I (2011) Come dire oggettivamente che la prospettiva è relativa. Rivista di estetica (48):217–235
Verweij A (2010) Perspective in a box. In: Architecture, mathematics and perspective. Springer, pp 47–62
Vuibert H (1912) Les Anaglyphes Géométriques. Librairie Vuibert, Paris
Wade NJ, Ono H (2012) Early studies of binocular and stereoscopic vision 1. Jpn Psychol Res 54(1):54–70
Ware W (1900) Modern perspective: a treatise upon the principles and practice of plane and cylindrical perspective. The Macmillan company, New York; Macmillan & co., ltd., London
Wheatstone C (1838) Contributions to the physiology of vision. Part the first. on some remarkable, and hitherto unobserved, phenomena of binocular vision. Philos Trans R Soc Lond 128:371–394
Wheatstone C (1852) Contributions to the physiology of vision. Part the Second. On some remarkable, and hitherto unobserved, phenomena of binocular vision (continued). Philos Trans R Soc Lond 142:1–17

Anamorphosis: Between Perspective and Catoptrics

10

Agostino De Rosa and Alessio Bortot

Contents
Introduction
Anamorphosis Between Paris and Rome: A Catoptric Relationship
The Project for a Scientific Villa in Baroque Rome as a Mirror of Time
Conclusion
References
Further Recommended Readings

Abstract This essay is conceived in two parts. The first deals with the perspectival and artistic work of the Minim Father Jean François Niceron (1613–1646), the author of two volumes (the second published posthumously) that have become milestones in Seventeenth-century studies on perspective: La perspective curieuse (Paris, 1638) and the Thaumaturgus opticus (Paris, 1646). Niceron’s expressive world developed into acutely deceptive works at a very early stage of his life. The second part describes a digital interpretation of an unexecuted project for a scientific Villa, commissioned by Cardinal Camillo Pamphilj (1622–1666), which would have housed instruments of Wonder employing mirrors and lenses. The project was conceived by the architect Francesco Borromini (1599–1667) and Father Emmanuel Maignan (1601–1676) at the end of the first half of the Seventeenth century. Borromini, who is well known for his architectural work, drew both the building plan and the façade in two different symmetrical solutions. Maignan wrote a list of 21 scientific games, most of them at architectural scale, representing Baroque experimental research on optics, gnomonics, the void, acoustics, magnetism and so on. Optical ‘games’ adopting conical mirrors that create catoptric anamorphoses, and flat mirrors conceived to project sunrays into the building and form catoptric sundials, will also be examined. Niceron’s and Maignan’s epistemological research intersected Cartesian and Hobbesian thought. Their works often became a true reflection of contemporary philosophical positions, while nevertheless preserving their stylistic autonomy in both content and form.

A. De Rosa
Dipartimento di Culture del Progetto, Università Iuav di Venezia, Venezia, Italy

A. Bortot
University Iuav in Venice/dCP, Venice, Italy

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_38

Keywords Anamorphosis · Perspective · Optics · Catoptrics · Dioptric · Baroque

Introduction

The term ἀναμόρφωσις (anamorphosis) is etymologically derived from the Greek, from the prefix ἀνά (upwards, backwards, back to) and the root μορφή (form) (De Rosa and D’Acunto 2002). According to Jurgis Baltrušaitis (Baltrušaitis 1984) the word first appeared in a treatise by Gaspar Schott (1608–1666), Magia Universalis naturae et artis . . . (Paris 1657–1659) and is confirmed in the Oxford companion to Art. In the field of art, anamorphosis refers to a precise category of planar images, or to three-dimensional structures, that are represented in a strongly deformed perspective. These images are not immediately recognizable in their real, frontal configuration and can only be fully understood if observed from a precise point of view. In other cases anamorphoses become recognizable through their reflection in a convex, spherical, cylindrical, conical or pyramidal mirror called an anamorphoscope (Kuchel 1979). According to Kirsti Andersen, we can talk about a direct anamorphosis in the first case, while the adjective “catoptric” refers to the second (Andersen 1995). Anamorphic images are inextricably connected, both in their geometrical creation and in their fruition, to the rules of linear perspective (perspectiva artificialis). As affirmed by Baltrušaitis, they can be considered virtuoso expressions of perspective constituting a kind of projective ‘depravation’. These unconventional images, derived from rigorous and strict Euclidean construction, first of all assume that the point of view is unique, as if the observer had only a single eye (a monocular observer). With respect to the representational conventions set out in the treatises by Leon Battista Alberti (1435/1973) and Piero della Francesca (1472/1984), the anamorphic image exceeds the typical Renaissance need to combine the space of representation with that of natural optical experience. An anamorphic image, even if geometrically correct, appears to the observer like a graphic enigma in which specific representational choices are combined with strong mystic symbolism, even with magic and ritual. These characteristics are brought out by the strong deformations.

In the first half of the sixteenth century the upheaval of the rules of perspectiva artificialis was almost complete. The observer’s oblique position, compared to the rules of fifteenth-century perspective representation, is just one of the transgressive characteristics of anamorphosis. These variations on traditional Renaissance perspective are described in the treatises of the following century and become a reference for the artist, who starts to be a “ . . . showman whose spectacular displays strike sparks of wonder and mystery, intriguing concealments and sudden revelations” (Gilman 1978). It is not surprising that the most important authors or scientific disseminators of the anamorphosis phenomenon were opticians and students of perspective who belonged to religious orders (Cojannot-Le Blanc 2006) such as the Jesuits and the Minims. During the Baroque period monks used this technique of representation to convey messages or hidden allegories using the archaic code of perspectiva secreta. According to Baltrušaitis “ . . . the procedure is stated as a technical curiosity but contains abstraction, a powerful mechanism of optical illusion and a philosophy on artificial reality. Anamorphosis is a puzzle, a monster, a prodigy. Although it belongs to the world of oddities that have always had a cabinet and a shelter in manhood, it often goes beyond the hermetic frame” (Baltrušaitis 1984). During the Seventeenth century the study of the anamorphic genre reached levels of theoretical depth and graphic virtuosity that had only received a momentary glance in previous treatises or in artistic achievements. The theme of anamorphosis was taken into consideration at the beginning of the century by Salomon de Caus (1576–1626), and then by Pierre Hérigone (1580–1643), but the subject finds its most exhaustive exegesis in the scientific texts written by two religious authors living in Rome. The most precise is La Perspective Curieuse (1638) by the Minim Friar Jean François Niceron, which was published in Latin in 1646, after the Friar’s death, edited and supplemented by Father Marin Mersenne under the new title Thaumaturgus opticus. This last work must be considered a partial realization of the editorial project that Niceron had been pursuing for many years. Unfortunately, the commitments of the religious order and his early death prevented him from completing it. The story of direct and indirect anamorphosis develops precisely around the figure of Niceron and his friend Emmanuel Maignan, who both lived and worked in the Roman convent of Trinità dei Monti in the first half of the Seventeenth century (Martin late XVIII century).

Anamorphosis Between Paris and Rome: A Catoptric Relationship

The portrait (Fig. 1) depicts a young, gaunt-faced monk with a barely visible beard, wearing the tunic and typical cap of the religious Order of Minims (Withmore 1967). He is holding a planche (plate) from his latest treatise, on which he was still working just before his death on the 22nd of September 1646. Since Niceron passed away before his book was published, the engraving by Michel Lasne (1595–1667) appears as something of a space-time paradox. In fact, although he is represented holding the


Fig. 1 M. Lasne, R. P. Joannes Franciscus Niceron ex Ordine Minimorum, egregiis animi dotibus et singulari matheseos peritia celebris, obiit Aquis Sextiis 22 septembris an. Dni 1646, Aetat 33. Engraving. Paris, first half of the XVIIth century

book, his work was not published until after his death. On the planche that Niceron is holding upright we can read: “F. Iaon Franciscus Niceron/Delinea Romæ ano Sal. 1642/Ætatis Suæ 29”. These words give a hint of the period when the first draft was drawn up. At the time, which coincided with Niceron’s second stay in Rome (post January 1641–April 1642), he was writing the Latin edition (Niceron 1646), with the related plates, of his La Perspective Curieuse (1638). This treatise unlocked the aberrant secrets of perspective known as anamorphosis, but because it was written in French many other European scholars were unable to read it. The illustration selected by Niceron is number 13 and depicts the Propositio Trigesima, dedicated to the perspectival representation of a “spherically stellated solid with a square based pyramid” (Niceron 1646). The choice of subject was probably connected to the new theme, which symbolized graphically the expansion of the Latin edition as compared to the French. Niceron was born in Paris on July 5th 1613 (Withmore 1967; Roberti 1902–1908); he joined the Order of Minims at the convent of Nigeon-Chaillot (now Passy), where he served as a novice. On the 26th of January 1632, after he had completed his novitiate, Niceron was admitted to the profession and went to


Fig. 2 J.F. Nicéron, Anamorphic portrait of Jacques d’Auzolles de Lapeyere. Engraving. Archives Départementales du Cantal, Aurillac Cedex (Auvergne), 1631

the convent of Place Royale (Paris). He was given the second name Jean after his uncle, who was also a Minim monk. In 1631, at the age of 18, during his novitiate, Niceron created his first artistic work: an anamorphic portrait (Fig. 2) of Jacques d’Auzolles de Lapeyere (1571–1642), the well-known author of Mercure charitable (d’Auzoles de Lapeyre 1638). This work included chronological details on the back cover. It is an aberrant image, outlined and engraved by Jean Picard, that appears distorted if seen on a horizontal surface but becomes recognizable when reflected in a cylindrical mirror placed within it. The image presents the ‘rectified’, medal-shaped portrait of the writer, defined by Niceron as ‘princeps chronographorum’. At the time of Jacques d’Auzolles’ portrait, it is evident that Niceron could not have read some of the seminal works by René Descartes (1596–1650) (Rodis-Lewis 1997), such as Dioptrique (1637) or Géométrie (1637), which focused on the theme of perception by working out the inevitable errors of the senses. A later exchange between the two authors is certain, since both studied in the Convent’s Library of the Order of Minims in Place Royale (Paris) (Krakovitch 1981). This institution gathered many scholars who attended the intellectual circle of Father Marin Mersenne (1588–1648), secrétaire de l’Europe savante, besides the


precious volumes and incunabula. These meetings were organized every Saturday in his monastic cell. From his first artistic and scientific experience, Niceron’s work is an attempt to escape from the inexorable idea of mechanical laws, in order to identify “ . . . a strategy to avoid the reduction of appearances to the laws of inert matter, or rather, to find a way through which, in itself, the appearances of material bodies were recognizable and oriented themselves to the spirit, reflected their otherness and their principle, not to be reduced to the size of the res extensa, the strict mechanistic model and spatial partes extra partes” (Baitinger 2006). It is likely that Niceron had seen the first catoptric anamorphoses in 1627 in Paris. They were probably similar to exotic samples imported to France by the painter Simon Vouët (1590–1649) on his return trip from Constantinople, where he had acquired these techniques between 1611 and 1612. Four oil-on-canvas paintings (50 × 66.5 cm) made by Niceron in Paris around 1635 belong to this class of images and can be included in the category called catoptric regenerative devices; they are resolved through their reflection in cylindrical mirrors. Today they are kept at the National Gallery of Ancient Art in Palazzo Barberini (Rome). They portray: Louis XIII before a crucifix, Louis XIII, San Francesco di Paola, and a Nuptial scene. Niceron obtains the amazing effect of this reflective reconstruction from deformed images by applying the geometric constructions that were revised and edited after his death. These constructions are also present in his ‘vernacular’ treatise (1638) and were based on those developed for students by the French mathematician Jean-Louis Vaulezard (* – *) in his Perspectivae cilindrique et conique ou traite des apparences vues par le moyen des miroirs . . . (Paris 1630).
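The reflective reconstruction described above can be illustrated numerically. What follows is a minimal modern ray-tracing sketch, not Niceron’s or Vaulezard’s geometric construction. It assumes a vertical cylindrical mirror standing on a horizontal table, a single fixed eye, and a ‘rectified’ image imagined on a vertical plane through the mirror axis; the function name `cylinder_anamorphosis`, its parameters and the coordinate system are all hypothetical choices made for illustration.

```python
import math

def cylinder_anamorphosis(p_img, eye, radius):
    """Map a point of the 'rectified' image to the distorted drawing on the table.

    Assumed (hypothetical) setup: the mirror is a vertical cylinder of the given
    radius whose axis is the z-axis and which stands on the table z = 0; the eye
    is at eye = (ex, ey, ez), outside the cylinder and above the table; the
    rectified image is imagined on the vertical plane x = 0 inside the cylinder,
    so p_img = (y, z) with |y| < radius.
    """
    ex, ey, ez = eye
    py, pz = p_img
    # Direction of the sight line from the eye towards the virtual image point.
    dx, dy, dz = 0.0 - ex, py - ey, pz - ez
    # First intersection of the sight line with the cylinder x^2 + y^2 = r^2.
    a = dx * dx + dy * dy
    b = 2.0 * (ex * dx + ey * dy)
    c = ex * ex + ey * ey - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("sight line misses the mirror")
    t = (-b - math.sqrt(disc)) / (2.0 * a)  # smaller root: near side of the mirror
    qx, qy, qz = ex + t * dx, ey + t * dy, ez + t * dz
    # Outward unit normal of the cylinder at the hit point (it is horizontal).
    nx, ny = qx / radius, qy / radius
    # Mirror the direction about the normal: r = d - 2 (d . n) n.
    dot = dx * nx + dy * ny
    rx, ry, rz = dx - 2.0 * dot * nx, dy - 2.0 * dot * ny, dz
    if qz < 0 or rz >= 0:
        raise ValueError("reflected ray does not reach the table")
    s = -qz / rz  # parameter at which the reflected ray meets the table z = 0
    return (qx + s * rx, qy + s * ry)

# Example with arbitrary illustrative numbers (not taken from the treatises).
point_on_table = cylinder_anamorphosis((0.0, 1.0), (5.0, 0.0, 4.0), 1.0)
```

Applying the function to every node of a square grid drawn over the rectified portrait would yield the fan-shaped grid on the table within which the deformed image is drawn.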
The young Jean François showed a special inclination towards mathematical studies and a remarkable interest in catoptric and dioptric optics, without neglecting his previous studies in theological and philosophical disciplines (in his first treatise). In 1638 the 25-year-old Niceron published La Perspective Curieuse, ou magie artificielle des effets mervellieux . . . in Paris with Pierre Billaine (Fig. 3). It was influenced by the texts of Salomon de Caus and Jean-Louis Vaulezard mentioned previously, yet it was in fact far more original than most of its famous predecessors. The folio volume consists of 20 unnumbered pages, 120 numbered pages (including the Preludes geometriques, the Definitions nécessaires and the books I–IV), two further unnumbered pages and 25 plates (Vagnetti 1979). The illustrations were engraved by Joan Blanchin and based on drawings by Niceron, whose graphic abilities, as confirmed in his prior texts, seem to be unquestioned. Niceron, well aware of the level of sophistication that perspective technique had achieved between the Sixteenth and early Seventeenth century, approaches the problem of deformation in a way that nowadays could be defined as ‘projective’ avant la lettre. He actually abandoned the practical expedients widely exploited until then, “ . . . because it is a matter of small weight and for which it is not necessary to have any knowledge of perspective” (Niceron 1638). In his treatise Niceron shows as deep an understanding of perspective theories as his Italian, French and German predecessors. The Minim Father assumes a leading role in the development of the discipline. Rather than Platonic, he followed


Fig. 3 J.F. Nicéron, La Perspective Curieuse, ou magie artificielle des effets mervellieux . . . , chez Pierre Billaine, Paris 1638

an ‘Archimedean’ approach in his exposition, which is focused more on application than on abstract speculation. The primacy of optics is quickly established: Descartes asserted in his preface to La Dioptrique that sight dominates among all senses. Niceron proposes and solves many problems of linear perspective in clear and rigorous language. The theoretical explanation is reinforced by beautiful etchings on plates by Blanchin. Father Niceron had no intention of supplying a critical document representing a summary of the best


previous treatises, but rather of dealing with “ . . . kindness of the curious perspective, which, as they have amused him and distracted from the seriousness of theological studies, may not be disagreeable to the curious” (Niceron 1638). From a critical point of view, Niceron’s theoretical and practical work appears to be closely linked to the works of Father Emmanuel Maignan (1601–1676). The relationship between these two men obliges us to investigate the speculative and artistic activities of our thaumaturgus opticus at the Roman monastery. The extraordinary catoptric astrolabe (1637) (De Rosa 2013) can still be admired today in the convent’s north-facing corridor. Walking clockwise, this corridor is followed by a second long corridor hosting the ‘perspectival painting’ (1639–1640) made by Father Jean-François Niceron in tempera. Niceron admits that the dizzying effect of this anamorphosis inspired the subsequent anamorphic work by Father Maignan (1642), a grisaille painting depicting the founder of the Order, St. Francis of Paola (Fiorini Morosini 2000). The mural paintings were probably executed in a quiet atmosphere of collaboration between the two confrères in the Roman convent. At the age of 29, Niceron, Maignan’s fellow Brother, had also lived in the same College of Trinità dei Monti during his first Italian stay, from the 25th of May 1639 to the 28th of March 1640 (AGM n.d.). It was precisely here that Niceron carried out the large anamorphic colored mural between 1639 and early 1640, which was then replicated, although with significant differences, in Paris (the Minims’ Motherhouse in Place Royale) in 1644. It depicted St. John the Evangelist writing his Gospel in the island of Patmos (De Rosa 2013) (Fig. 4a, b). Father Maignan also created the anamorphic grisaille portrait representing the founder of the Order in this very same place in 1642: St. Francis of Paola in prayer (Ceñal 1952; Baltrušaitis 1984; De Rosa 2013) (Fig. 5). Although the work was restored (February 2009), and even with a few parts of Niceron’s painting missing, the artwork remains completely legible. At first the mural was considered a fresco-secco. This information is significant because the analysis of Niceron’s curriculum vitae et studiorum has not yet revealed any

Fig. 4 (a, b). J.F. Nicéron, St. John the Evangelist writing his Gospel in the island of Patmos, 1639–1640. Colored mural painting. Convent of Trinità dei Monti, Rome. Images of anamorphosis. (Photo by the Author)


Fig. 5 E. Maignan, St. Francis of Paola in prayer, 1642. Grisaille mural painting. Convent of Trinità dei Monti, Rome. Image of anamorphosis. (Photo by the Author)

particular reference to an advanced artistic apprenticeship. Since the anamorphosis of St. Francis of Paola and the catoptric sundial (Fig. 6), both painted by Maignan, were created with the same technique, we can deduce that this method was chosen by the two scholars because it required a lesser degree of expertise. Niceron, who had already demonstrated remarkable skills in the fields of decoration and design, probably chose fresco-secco painting for its immediacy, simplicity and speed. Moreover, it represented a sort of ideal technique capable of translating a complex optical-mathematical theorem quickly into painting. Niceron conceived the fresco as a whole and did not contemplate checking and correcting the image while in progress, as the strong optical deformation produced by the oblique position would probably have made it difficult to control each day’s work (“giornata”). Although we have no references regarding St. Francis of Paola, the transference of St. John to an oblique image, from a projective and not mechanical point of view, was already present in Proposition II, in particular in the three following corollaries and in plates 12 and 13 of Book II of La Perspective Curieuse (1638), where Niceron states: “Provide the method to describe any kind of figure, images and pictures, in the same way as the chairs of the previous statement, that is to say, in such a way that they appear confused in appearance, and from a certain point [of observation] they perfectly represent a proposed object” (Niceron 1638). As recalled by Niceron in his Thaumaturgus Opticus, a long ancient Greek inscription appears on the book spine in the mural painting and reads as follows: “The Apocalypse of Optics, the Eyewitness of the Apocalypse” (Niceron 1646; De Rosa


Fig. 6 E. Maignan, Catoptric sundial. Convent of Trinità dei Monti, Rome. Image of the corridor with sundial and the window sill where the gnomonic mirror is positioned. (Photo by the Author)

2006) (Fig. 7). The reference is clearly to the power of epiphany (apokalypsis meaning revelation), implying on the one hand the anamorphic magic of the work itself, which discloses its contents only when observed from a specific point of view (geometrically and spatially fixed), and on the other hand the theological role played by St. John (Bessot 2005; Fratini Moriconi 2010; De Rosa 2013). St. John was the only human being sharp-eyed enough to contemplate the True Light of the Word. In fact, as the Evangelist’s theriomorphic attribute recalls, eagles are the only living beings able to soar to the sky’s heights and look into direct sunlight without going blind. Niceron portrays the eagle next to and slightly forward of the Saint so that, when viewed from the front, its beaked head and one of its spread wings (the other, which fell on a part of the wall where the mural painting is seriously damaged, is lost) become part of the background. It is a Biblical landscape where a descending sun is darkened by thick smoke representing a prelude to the Parousia. Perhaps the reason for choosing such an atypical subject for Niceron’s mural painting is to be found in this critical connection. Since he was not the object of special veneration by the French Minims, St. John must have been chosen for his complex ethical-philosophical meaning. On the one hand he was able to apply the term Logos (Greek for ‘Word’) to God, thus reconciling Christian culture with a Jewish-Hellenistic vision of the world. On the


Fig. 7 J.F. Nicéron, St. John the Evangelist writing his Gospel in the island of Patmos, 1639–1640. Colored mural painting. Convent of Trinità dei Monti, Rome. Detail. (Photo by the Author)

other hand, since no deliberate abstraction characterized his testimony of faith, in his Gospel he referred only to “ . . . what we have heard, which we have seen with our eyes, what we beheld, and what our hands have touched . . . life was made manifest.” He was therefore the eyewitness of the living Logos, an epithet by which John loved to be defined (First Epistle of John). Even the Minim fathers (contemporaries of Niceron’s pictorial work) were the first witnesses of the scopic catastrophe of anamorphosis: they were first startled and then reassured by its ongoing optical composing and decomposing. In this act of witnessing, assigned by Niceron to the ocular nature of his painting, it is still possible to see an element attributable to Cartesian speculation. It might be this cultural environment that influenced the Latin inscription “CITRA DOLUM FALLIMUR” (“we are deceived without malice”) (Fig. 8) that adorns the cartouche hanging from one of the branches placed over St. John’s back. Fratini and Moriconi suggest (Fratini, Moriconi 2010) it is a quote from the motto that accompanied the title of the famous Perspectivae Libri Sex (Del Monte 1600) by Guidobaldo del Monte (1545–1607). Niceron often refers to this author in his two treatises, underlining his rigorous mathematical and proto-projective approach to perspective, sometimes excessively abstract and complex. The tribute is to one of the most authoritative sources of the science that the author most profitably cultivated. The depicted quotation, given its inclusion in a mural anamorphosis, is a critical reflection on the exercise of doubt. The presence of this and other pictorial and decorative works adorning both Roman and Parisian Minim monasteries on the one hand constituted a breeding ground in which to test experiments in optical and figurative painting theoretically elaborated and therefore performed in vitro in


Fig. 8 J.F. Nicéron, St. John the Evangelist writing his Gospel in the island of Patmos, 1639–1640. Colored mural painting. Convent of Trinità dei Monti, Rome. Detail. (Photo by the Author)

treatises and studies; on the other hand these were the subject of a powerful reflection on the Cartesian labyrinth, on what is visible and on the falsa credita which derived from it. On plate 33 of the Thaumaturgus opticus (1646) (Fig. 9), Niceron provides a graphical summary of the projective method probably employed to achieve the anamorphic work in the Roman monastery. Surely he also used this method for the twin anamorphosis carried out in Paris. The image can be easily recognized: it is drawn in black and white and placed obliquely with respect to the wall surface, with St. John’s recomposed portrait inserted within a network of orthogonal lines. In the text (Book II, XI proposition, III corollary) Niceron clarifies the nature of the portrait’s subject and the colors chosen for the robe (green) and for the Holy cloak (purple), but does not point to any source of inspiration: “Among the painters, it is accepted for common use and as usual the fact that, when they foreshadow the image of St. John the Evangelist, they represent the robe with the green and the cloak with purple” (Niceron 1646). It is a reconstruction from memory, obviously for didactic purposes, of the sinopia on which Niceron’s Parisian anamorphosis was based; the anamorphosis was executed circa 1645 in the galleries of the Place Royale Convent, in accordance with a precise decorative program, shortly before his death. Niceron describes his work in Paris with these words: “ . . . Instead, by that one drawn here in Paris we show directly in BCDE the prototype from which, by means of the exposed method, the projection was obliquely transposed on the wall. As we have already said, this is not to be seen as a naked projection with oblique rays, but in it are offered to a direct view many other objects not disagreeable or ugly: here, we have them listed and provided as an example, especially because, at a given circumstance, a similar reproduction


Fig. 9 J.F. Nicéron, Thaumaturgus opticus, Paris 1646. Liber secundus, Propositio Undecima, Plate 33

might be attempted, and even obtaining one more beautiful and elegant” (Niceron 1646). The size of the anamorphic painting had to be remarkable, as it was placed in an aisle: “ . . . one hundred and four feet long (33.78 m), where the aforementioned image projection covers fifty-four feet in length (17.54 m) on a wall of at least eight feet in height (2.60 m), and the ocular point which is perpendicularly at five feet (1.62 m) from the wall or delineation surface, it rises above the floor only four and a half feet (1.46 m). We could not delineate with these proportions the given figure, because of the restricted size of the panel where we located it” (Niceron 1646). Niceron’s pictorial invention had to surpass all of those previously carried out, both for the choice of a religious theme so relevant for Christianity (the revelation to the Saint of the end of Times and the beginning of the New Kingdom) and because the landscape hidden in the anamorphic portrait had to show a second and more dramatic exegetical level than that already seen in his Roman St. John and in St. Francis of Paola by E. Maignan. The picture is shadowed with soft colors and flimsy appearances, so that “ . . . we are no longer looked at from a far and oblique point of view” (Niceron 1646). Walking along the portico, one could describe it in this way: “ . . . in the dark and shadowy folds of green tunic, [ . . . ] intricate forests and dense woods of impenetrable trees. In tunic’s more enlightened parts or in foreground, [ . . . ] instead blond ears and already ripe harvest. In his candid belt, flowing water from rivers and spring; in white sheets of the open book, a large lake, and in it a harbor, beaches, ships, fishing etc. In the head, caves, caverns, steep cliffs, rocks,


buildings: rather, the ruins of the whole city of Babylon, next to which we place even angels playing the trumpet” (Niceron 1646). Only by moving away from what Niceron defined as the ocular point (perhaps placed at the threshold of the corridor that gives access to the library of the Convent in Paris) did the visions and mysteries narrated in the various chapters of the Apocalypse appear, reproduced figuratively in a growing vertigo of sight so compelling that the author himself, though disinclined to self-celebration except for purely rhetorical purposes, had to admit that all of this had been announced by the Greek words painted on the Evangelist’s book: “The Apocalypse of Optics, the Eyewitness of Apocalypse [ . . . ] Thus, in the pages of the open book, between the lines of written text or between the verses we represented land furrows, and in them that grazed flocks and shepherds that guarded them; so, in the purple cloak of our Evangelist I represented the harvest which, in the fourteenth chapter of the Apocalypse, is said to be whipped up horse bridles and to be flowed to one thousand six hundred stadia; in addition, we represented in heaven, sitting on a cloud, the one who sank the sickle into the earth and sent the angels to harvest the grapes of the vineyard. Rather, even from the face’s features, by applying suitable colors, we painted with care a barrel or a cask from which, crushed grapes, flowed the harvest” (Niceron 1646). From the comparison between Niceron’s Roman painting (finally visible today) and Niceron’s drawing of the Parisian anamorphosis (now disappeared), we can clearly see that the postures assumed by the Saint in the two images were totally different. Niceron’s drawing is the only evidence of the original anamorphosis. In the Roman image, St. John is bent forward towards the page where he is writing the Apocalypse.
His gaze is focused on the draft of the prophecy, which is reified in the biblical episodes anamorphically hidden in the surrounding landscape in which his body is formed. Above all, the eagle is a decisive element: his theriomorphic attribute appears in front of his body. In Paris, instead, the Saint is portrayed in a proxemically open posture, not leaning on the tome, with his legs tightly wrapped around the eagle’s neck. In Niceron’s drawing the gaze of the subject is directed precisely towards the library of the Convent in Paris. The landscape is barely traced (composed of a tree where ivy wraps around a branch, with rocks placed in the background) and only hints at its ‘narrative’ development in the anamorphic transformation. Therefore a logical correspondence between the figurative structure of the image and the biblical episodes which should have been concealed in it cannot be established, although these had already been described in the treatise by the author. In this work Niceron theorized how anamorphosis could be applied to extensive wall surfaces, allowing the creation of proper mural paintings like those he had already carried out in Rome and Paris (Figs. 10 and 11). In addition to the anamorphic representations depicting St. John the Evangelist in the two famous Minim Convents, the author painted another accelerated perspective (probably executed in the fresco-secco technique) in the Paris cenoby – ‘en perspective’ according to the Convent’s annals – which had as its subject The Magdalene contemplating in the Sainte-Baume cave (1645) (Fig. 12a–c), which

10 Anamorphosis: Between Perspective and Catoptrics


Fig. 10 Axonometrical view of the eastern corridor in Trinità dei Monti, Rome. Digital reconstruction of the projective process generating anamorphosis. Digital elaboration by Cosimo Monteleone/Imago rerum/Iuav

Fig. 11 Orthophoto of the wall surface of the western gallery on the first floor of the Convent of Trinità dei Monti. Digital elaboration of the point cloud showing the reflectance value in grayscale. Rendering by Cristian Boscaro/Imago rerum/Iuav

was finished after Niceron’s sudden death by Father Maignan during his visit to Place Royale in 1662. We know that, in creating the extensive Parisian paintings, Niceron adopted an approach similar to Maignan’s. It was taken directly from Dürer’s ‘sportello,’ which the author discusses further in the previously mentioned plate of his Thaumaturgus opticus. Niceron also uses a ‘gallows’ connected to a sliding plumb wire that identifies a point in the rectified portrait, allowing it to be projected anamorphically onto the wall of the mural. This work, like Maignan’s, which is still visible today, established that Alberti’s and Dürer’s perspective had been overcome. The work was imagined as an open window, a reality offered to the painter’s eyes, in which the anamorphic frame is already fitted with a perspective image drawn in true shape inside a square grid projected onto the wall surface. Therefore there no longer exists “ . . . the intersection of the visual pyramid that separates the subject from the object, or the simple projection of the object on the plane of intersection. Now, on the plane there are depicted images projected by the mind” (Ciucci 1982).


A. De Rosa and A. Bortot

Fig. 12 (a–d) J.F. Nicéron, The Magdalene contemplating in the Sainte-Baume cave, Paris 1645. (a–b) Digital model of the northern gallery of the Place Royale Minims Convent, Paris. Axonometric reconstruction of the relationship between oblique projection, image and right point of observation. Digital elaboration by E. Trevisan/Imago Rerum. (c–d) Digital model of the corridors of the Minims Convent in Place Royale, Paris. The progressive Christological intensification of the frescos itinerary ranging from the Aubervilliers corridor to St. John’s. Digital elaboration by E. Trevisan/Imago Rerum

The Project for a Scientific Villa in Baroque Rome as a Mirror of Time

Within the Roman context of the seventeenth century, scientific research is well summed up by an unbuilt project that emerged from the collaboration between an architect and a religious man: Francesco Borromini (1599–1667) and the Minim Friar Emmanuel Maignan (1601–1676) (Figs. 13 and 14). The fortified villa, belonging to Virgilio Spada (the secret almoner and superintendent of the papal properties), was designed by Borromini outside the Gate of St. Pancrazio in Rome as a residence for Cardinal Camillo Pamphilj. The material related to this project, attributed to Borromini by Paolo Portoghesi (Portoghesi 1964), consists of two technical drawings, a plan proposing a façade in two different styles, and a sketch plan made by Virgilio Spada, with some alterations compared to Borromini’s plan. Among Spada’s papers there are also a letter from Borromini addressed to Camillo Pamphilj and a manuscript including some pages entitled Mathematics to Adorn the Garden of His Eminence Mr. Cardinal Panfilio. The translation into vernacular language of a document


Fig. 13 Digital reconstruction of the project for Villa Pamphilj: isometric-axonometric projection from South-East

entitled Mathematica Pamphilianos hortos exornans has recently been found. The text is in Latin and uses a more accurate lexis than the quoted vernacular version. It gives a detailed and very accurate list of the scientific games that would have been installed in the villa. These ‘games’ (mirabilia) represent the main fields of research pursued by some eclectic Baroque personalities, who often belonged to religious orders and wove scientific, theological and philosophical speculation into their works. Paolo Portoghesi and Filippo Camerota (Camerota 2000) attributed the work to Emmanuel Maignan (1601–1676), a friar of the Order of Minims who lived and taught for some years in Rome at the Cloister of Trinità dei Monti.


Fig. 14 Digital reconstruction of the project for Villa Pamphilj: axonometric exploded view from North-West. In evidence the different coverage systems of the interiors

The manuscript Mathematica Pamphilianos hortos exornans, written by Emmanuel Maignan, consists of four pages in which, as stated above, some ‘mathematical’ Wonders designed to adorn the villa are described. These Wonders are divided into a list of 21 numbered points that seem to represent the whole of


Fig. 15 The cross vaults of northern loggia on which anamorphic catoptric images have been drawn

‘scientific-experimental’ knowledge of the time on optics, gnomonics, the existence of the void, acoustics, magnetism, and so on. At this point we would like to describe, through digital reconstructions, some ‘games’ that use mirrors or lenses and rely on the rules of reflection and refraction. Point number 5 of the manuscript concerns the loggia on the north side, where one could find some Wonders linked to catoptric science (Fig. 15). In particular, distorted figures were to be sketched on the cross vaults. These images would appear distorted when faced directly, but would become recognizable when observed through their reflection in mirrors of extraordinary size. The mirrors would be conical, and the observer would have to be aligned with the axis of the cone itself (Fig. 16). A reference can be found in Jean François Niceron’s La Perspective Curieuse, whose third book is explicitly dedicated to the “aspect of flat mirrors, cylindrical and conical; and the way of building figures, carrying and representing for reflection something entirely different than what appears to be directly seen” (Niceron 1638). The content of this part of the treatise would not, however, have been sufficient for the installation described by Maignan. In fact, the Mathematica Pamphilianos hortos exornans introduces an element that complicates the execution: the surface on which the distorted drawing is placed is not planar, but is composed of portions of cylinders (orthogonal to each other) forming the rib vaults of each span. The original figure is thus deformed not only so that it can be properly observed as a reflected image, but also by its projection onto the ribs of the vault (Fig. 17).
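The geometric core of such a conical-mirror anamorphosis on a vault can be illustrated with a minimal sketch. It is not the authors' reconstruction procedure: it assumes an idealized setting in which the eye sits on the cone axis below the apex, the cone widens upward, and the cross vault is modelled as the union of two infinite barrels of equal radius whose axes (x and y) cross at the origin. All function names, the coordinate frame and the parameters (`half_angle`, `eye_z`, `apex_z`, `radius`) are hypothetical and chosen only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def reflect(d, n):
    """Law of reflection: mirror the direction d about the unit normal n."""
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))

def cone_point_and_normal(half_angle, phi, s, apex_z):
    """Point at azimuth phi and height s above the apex of a cone whose axis
    is the z axis and which widens upward, with its outward unit normal."""
    rho = s * math.tan(half_angle)
    point = (rho * math.cos(phi), rho * math.sin(phi), apex_z + s)
    normal = (math.cos(half_angle) * math.cos(phi),
              math.cos(half_angle) * math.sin(phi),
              -math.sin(half_angle))
    return point, normal

def exit_barrel(origin, direction, radius, axis):
    """Ray parameter at which a ray starting inside an infinite cylinder
    (axis = 0 for the x axis, 1 for the y axis, through the origin) leaves it."""
    i, j = [k for k in range(3) if k != axis]
    a = direction[i] ** 2 + direction[j] ** 2
    if a == 0.0:
        return math.inf  # a ray parallel to the barrel axis never leaves it
    b = 2.0 * (origin[i] * direction[i] + origin[j] * direction[j])
    c = origin[i] ** 2 + origin[j] ** 2 - radius ** 2
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def vault_point(phi, s, half_angle, eye_z, apex_z, radius):
    """Trace eye -> mirror -> vault and return the point hit on the vault.
    Assumes the mirror point lies inside both barrels."""
    p, n = cone_point_and_normal(half_angle, phi, s, apex_z)
    d = normalize((p[0], p[1], p[2] - eye_z))  # the eye sits on the cone axis
    r = reflect(d, n)
    # Idealized cross vault = union of two barrels: the ray leaves it at the later exit.
    t = max(exit_barrel(p, r, radius, 0), exit_barrel(p, r, radius, 1))
    return tuple(pi + t * ri for pi, ri in zip(p, r))
```

Sampling `phi` and `s` along the outline of the figure as it should appear in the mirror, and collecting the resulting `vault_point` values, gives the distorted outline to be drawn on the vault under these assumptions. A real bay is finite and the mirror hangs at a specific height, so the numbers would need to come from a survey or from the project drawings.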


Fig. 16 Perspective view of Christ’s distorted image from the correct viewpoint, from which it is recognizable as a reflection in the conical mirror positioned on the cross vault

Within the digital environment the problem was overcome by using a function that can ‘bounce’ off the mirrored surface the generatrices of the visual cone passing through each point of the figure to be deformed. According to the rule of reflection, the angle between the incident ray and the normal to the surface at the point of incidence must be equal to the angle between the normal and the reflected ray (Fig. 18). To obtain the anamorphic image on the rib vault, it was then sufficient to extend the reflected vectors up to the four ribs. The subjects chosen for this ‘game’ were the portraits of four Popes used for the dioptric ‘game’ discussed next, plus the Christ who occupies the central position in plate 25r of La Perspective Curieuse by Niceron (Fig. 19). Another optical ‘game’ is described in the fourth point of Maignan’s manuscript. It is exactly the one proposed by Jean François Niceron in Book IV of La Perspective Curieuse. We have examined elsewhere, from a historical, scientific and geometric point of view, the apparatus described in this section of the treatise, which requires the use of prismatic lenses (polioptrum) (Bortot 2013). The device was to be placed in the southern loggia and consisted of a painting showing portraits of several important people. One man only can be seen if observed


Fig. 17 Perspective view of three conical mirrors on cross vaults

through a tube provided with a polyhedral lens, supported by a statue and positioned at a precise point in front of the drawing. This lens is accurately described by the Minim Friar (Fig. 20) in Plate 23 of his work. Plate 24, instead, shows an example of the sketched painting: some portraits of Ottoman sultans, among which Amurates IV stands out in the central position. A top view of the lens, whose parallel frontal projection is given in the upper register of Plate 23, appears at the bottom left. At the bottom right corner there is a depiction of one of the ‘champions of Christianity’ of the epoch, Louis XIII. It is in fact the portrait of the French King that would be seen when observing the plate with the effigies of the Turks through the monocle fitted with the glass lens. According to the interpretation given in the essay mentioned above, Niceron’s apparatus would have had deep political implications: the political and religious pressure exerted by the advance of the Ottoman Empire on the ports at Europe’s borders would have created a collective phobia of a possible supremacy of the infidels over Christianity. According to some scholars (Siguret


Fig. 18 Reconstruction of mirabilia 5: the visual rays reflected by the mirroring cone are evident, (perspective view)

1993) the ‘game’ would hide a subtle irony: “12 heads of Turkish kings are worth the one of the King of France”. A very similar device, attributed to Niceron, is kept in the Museo Galileo in Florence (Fig. 21). Its oil painting also shows some half-length portraits of Turkish sovereigns, but through the monocle it would have shown the image of the Grand Duke of Tuscany, Ferdinando II de’ Medici. Further testimony that this ‘game’ was actually built is given by Thomas Hobbes in his work De Homine, published in 1658. He probably saw it in the library of the Minims’ cloister in Paris during the exile forced on him by his pro-monarchic ideas. The effectiveness of the optical ‘game’ depends on several factors: the position of the viewing point, the length of the monocle and its distance from the painting, the size of the painting and, above all, the inclination of both sides of the lens in relation to the refractive index of the material. The results of the previous analysis, which used a digital three-dimensional reconstruction of the lens, demonstrated that the Plate of the Ottomans proposed by the Friar was not just an abstract scheme, but could really be used to obtain the described effect, with a certain compromise (Fig. 22). The complexity of this specific case derives from the precision of the calculation of the painting areas intended


Fig. 19 Reconstruction of mirabilia 5: at the top one can see the Popes’ images to be distorted through anamorphosis; in the middle and at the bottom, orthogonal projections of the distorted subjects are shown

to ‘detach’ engravings or paintings from the surface and then recompose them digitally, so that the lens forms a completely different and coherent image (Fig. 23). The rule of refraction was known in the seventeenth century thanks to the research carried out by René Descartes and Willebrord Snel van Royen (1580–1626); it is nowadays known as the Snell-Descartes law. In the last chapter of the Dioptrique, Descartes describes a practical method for measuring the angle of refraction with an apparatus equipped with a prism cut into the shape of a wedge (Descartes 1637). The interest of these measurements lies in anaclastics, the study of refraction, as used in the production of lenses for scientific instruments (telescopes and microscopes). Descartes was fascinated by these prisms, which were able to create the so-called “science of miracles” and could be used to create ‘games’ that surprised the observer, though not for that alone. We know that Isaac Beeckman (1588–1637) mentioned to Descartes a hypothesis formulated by Cornelius Agrippa, according to whom it


Fig. 20 J. F. Niceron, La Perspective curieuse (1638), Plate 23r


Fig. 21 J. F. Niceron, Optical game, 1642. Museo Galileo, Florence, Room I, inv. 3196

Fig. 22 On the left one can find the rendering while looking inside the monocle; on the right the portrait of Louis XIII which is shown by Niceron in his treatise


Fig. 23 The functioning scheme of Niceron’s game. The refracted portion of painting appears to be detached from the canvas surface (axonometric view)

would have been possible to ‘write letters on the moon’ and thus send messages to its hypothetical inhabitants. The philosopher replied that, according to Giovanni Battista Della Porta (1535–1615), the experiment could be achieved through the use of lenses. His conviction was so strong that in a letter of 13 November 1629 to Jean Ferrier, a maker of lenses, he stated: “I dare hope that with your help we will be able to ascertain the existence of living beings on the Moon” (Shea 2014). Returning to the adaptation of the ‘game’ to the ‘mathematics for Villa Pamphilj’, we should note that in this case nothing prevented us from installing the ‘game’ in the area suggested by the manuscript (Fig. 24), placing the two statues fitted with lenses on the left and right sides of the southern loggia (Fig. 25).
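The refraction rule on which the polyhedral-lens ‘game’ relies, the Snell-Descartes law, can be sketched numerically. The fragment below is only an illustration under simplifying assumptions that are not taken from Niceron’s plates: each facet is treated as a thin wedge, the ray enters the flat face of the lens at normal incidence, and the refractive index used in any call (for instance 1.5, a typical value for glass) is a guess. The function names are hypothetical.

```python
import math

def refraction_angle(theta_in, n_in, n_out):
    """Snell-Descartes law: n_in * sin(theta_in) = n_out * sin(theta_out).
    Angles are in radians, measured from the surface normal.
    Returns None when the ray undergoes total internal reflection."""
    s = n_in * math.sin(theta_in) / n_out
    if abs(s) > 1.0:
        return None
    return math.asin(s)

def facet_deviation(facet_angle, n_glass, n_air=1.0):
    """Deviation of a ray that enters the flat face of the lens at normal
    incidence and leaves through a facet inclined by facet_angle."""
    theta_out = refraction_angle(facet_angle, n_glass, n_air)
    if theta_out is None:
        return None
    return theta_out - facet_angle

def painting_offset(distance, facet_angle, n_glass):
    """Sideways shift, on a painting at the given distance from the lens,
    of the area seen through one facet (same length unit as distance)."""
    delta = facet_deviation(facet_angle, n_glass)
    return None if delta is None else distance * math.tan(delta)
```

Under these assumptions, `painting_offset` shows why the inclination of the facets and the refractive index together fix which area of the painting each facet brings to the eye: a steeper facet or a denser glass moves the sampled area farther from the axis of the tube.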


Fig. 24 The areas of the villa in southern loggia where mirabilia 4 has been positioned, (axonometric view)

For simplicity, Plates 24 and 25 of La Perspective Curieuse have been chosen to show the images framed by the tube fitted with the prismatic lens (Fig. 26). Maignan’s manuscript concerning Villa Pamphilj also describes a sundial that does not employ the traditional shadow cast by a gnomon, but a small mirror positioned on a window sill, which reflects the sunbeams onto the hemispherical ceilings that cover only the corner towers. The direct reference for this mirabilia is represented by the sundials made by Emmanuel Maignan in Rome (Fig. 27). One is in the cloister of Trinità dei Monti and the other in Palazzo Spada, to which must be added those that the Minim built before his stay in Rome, in Aubeterre (Dordogne), Toulouse and Bordeaux (no longer visible). According to Gianni Ferrari the most ancient catoptric clock, now only partially visible, was created by Nicolaus Copernicus in one of the towers of the Castle of Olsztyn in Poland in approximately 1520 (Ferrari 2005). Maignan is also the author of a book on solar clocks entitled Perspectiva Horaria (Maignan 1648). As for treatises, the oldest one dealing with this subject, although not as detailed as the Minim Father’s, is by the Jewish scholar Raffaele Mirami, written in the vernacular and entitled Compendiosa introduttione alla prima parte della specularia, published in Ferrara in 1582. Another, preceding the one by Kircher (1635), was published by the German Jesuit Georg Schönberger (1622). It is important to note that Mirami’s summary work was dedicated to catoptrics in general, explaining its physical-geometrical principles and its applications to perspective views as well as to gnomonics. In


Fig. 25 Perspective view of mirabilia 4 from the garden facing the villa

the first pages the author claims to draw on Euclid and other authorities: “ad Euclide, à Vitellone ad Alhazeno, et altri, che dottamente ne scrissero” (“to Euclid, to Witelo, to Alhazen, and others who wrote learnedly about it”), while, when he has to explain the practical ramifications of the matter, he refers to the use of “tali principi per illuminar luoghi oscuri, per voltare alcune sorti d’ombre [ . . . ] per fare Horaloggi e per trasportarli da un sito all’altro” (“such principles to light dark places, to turn certain kinds of shadows [ . . . ] to make clocks and to move them from one site to another”) by introducing the ‘specularia’ (Mirami 1582). The digital model was built taking as its inspiration the catoptric clock on the first floor of the Trinità dei Monti Cloister. Four time systems are used for this device: ‘temporal hours’, ‘Italic hours’, ‘Babylonian hours’ and finally ‘astronomical hours’. In the sumptuous astrolabe in Palazzo Spada (1644), created thanks to the experience Maignan had acquired at Trinità dei Monti, the clock can also be read at night through the projection of moonlight. This device employs a circular wheel conceived for both daytime and nighttime reading. Faithful to Maignan’s assumption that “any sundial is a certain projection of a sphere and its circle toward some flat surface or any other kind of it” (Maignan 1648), we have reconstructed an ideal celestial sphere related to the latitude of Rome (41.9°), identifying the various hour lines and references that lie on its surface thanks to the intersection with some fundamental geometric entities such as cones of light,


Fig. 26 Perspective view of mirabilia 4 from southern loggia

beams of light and planes of light (Fig. 28). The center of the sphere, with its fundamental entities, was then positioned at the center of the mirror located on the window sill (Figs. 29, 30, 31, and 32). At this point the geometric light entities were intersected one by one with the spherical vault that covers the circular room, and also with the parts of the walls above the ideal plane on which the mirror is located, generating a dense network of curves visible from both outside and inside the room (Figs. 33 and 34). Maignan suggests aligning the geographic locations of the world along the projection on the spherical vault of a perpetual rainbow obtained through a tool, the Iride Horariae Dioptricae, demonstrated in Book III of his treatise Perspectiva Horaria, making it possible to identify when midday occurs around the world (Fig. 35). In other words, the position of the arc indicates the parallel where the Sun is located. This varies during the course of the year, indicating day by day the places in which the Sun is at the zenith. In his Perspectiva Horaria Maignan explains how to make a rainbow using a glass cylinder crossed by a sunbeam reflected by the gnomonic mirror (Fig. 36). To obtain the desired effect it is not sufficient for the outer surface of the crystal cylinder to be smooth: the surface needs to be scored with many grooves in order to amplify the refractive phenomenon.
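The principle of a catoptric sundial of this kind can be illustrated with a short sketch that computes where the reflected sunbeam lands on a spherical vault. It is not the construction used for the digital model: it assumes a small, perfectly horizontal mirror at the origin of a local (east, north, up) frame, a vault idealized as a full sphere, and the standard astronomical formulas for the direction of the Sun from latitude, declination and hour angle. The function names and parameters are hypothetical.

```python
import math

def sun_direction(latitude, declination, hour_angle):
    """Unit vector toward the Sun in local (east, north, up) coordinates.
    Angles in radians; hour_angle is zero at local noon, negative in the morning."""
    east = -math.cos(declination) * math.sin(hour_angle)
    north = (math.cos(latitude) * math.sin(declination)
             - math.sin(latitude) * math.cos(declination) * math.cos(hour_angle))
    up = (math.sin(latitude) * math.sin(declination)
          + math.cos(latitude) * math.cos(declination) * math.cos(hour_angle))
    return (east, north, up)

def reflected_spot(latitude, declination, hour_angle, center, radius):
    """Point where the sunbeam reflected by a small horizontal mirror at the
    origin meets a spherical vault with the given center and radius.
    Returns None when the Sun is below the horizon or the vault is missed."""
    e, n, u = sun_direction(latitude, declination, hour_angle)
    if u <= 0.0:
        return None
    # The incoming ray travels along (-e, -n, -u); a horizontal mirror
    # flips only its vertical component.
    r = (-e, -n, u)
    rc = sum(ri * ci for ri, ci in zip(r, center))
    cc = sum(ci * ci for ci in center)
    disc = rc * rc - cc + radius * radius
    if disc < 0.0:
        return None
    t = rc + math.sqrt(disc)
    if t <= 0.0:
        return None
    return tuple(t * ri for ri in r)
```

Holding the declination fixed and sweeping the hour angle traces one of the date curves on the vault, while holding the hour angle fixed and sweeping the declination between the solstices traces an astronomical hour line. With a latitude of `math.radians(41.9)`, the value given above for Rome, the noon spot lies due north of the mirror, since the reflected beam mirrors the Sun’s southern position.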


Fig. 27 E. Maignan, catoptric quadrant on main floor of the Convent of Trinità dei Monti, Rome

A variation consists in the use of a cylindrical mirror (Fig. 37), which in this case projects an arc of light on the vault (Fig. 38). This device is called Iride Horariae Catoptricae. In points 10 and 15 of Maignan’s manuscript we can read: “[10] In one or the other of the two parts, or in both, or in any other room through the art of dioptric itinerant images, lying or erected, can be made according to the art catoptric. [15] Outlining these various living rooms ad hoc you can build other marvelous works in addition to those already mentioned under n. 4, 5 and 6 with many kinds of mirrors arranged variously”. These two points are analyzed together because of their similar concept and the vagueness concerning their location inside the villa. The scientific apparatuses described are again intended to exploit the rules of catoptrics and dioptrics and can be placed in any room, as we can read in the manuscript. Since the previous games occupied the entire ground floor, we thought that these could be located on the first floor, in the west-facing room (Fig. 39). These Wonders have not been reconstructed digitally, since any reconstruction would have been quite arbitrary given the vast repertoire of the time. We can, however, recall some possible references that were perhaps in Maignan’s mind. The tradition of studies and experiments on reflection has its roots in classical antiquity. Euclid (367–283 b.C.) is the author of the first treatise on optics that we know of, in which he sets out his assumptions in two fragments: Optic and Catoptric. Catoptric, in particular, is divided into seven postulates born from the union between experimental observations and subjective


Fig. 28 E. Maignan, Perspectiva horaria (1648), p. 334

experiences, as observed by Cristina Candito (Candito 2010). However, the astonishing aspects of these tools come from Hero of Alexandria (II century b.C.) and constitute a direct connection with what is described in point 10 of the manuscript. According to the studies by Nix and Schmidt (Nix, Schmidt 1900), a viewer would have believed he was seeing a flying person thanks to the double reflection obtained through flat mirrors positioned at an angle (Fig. 40). The same effect was later taken up and described by Vitellione, whose Perspectiva Libri X appears in the frontispiece of the Perspectiva horaria by Maignan. The treatise by Vitellione remained a reference throughout the centuries, as testified by the descriptions of


Fig. 29 Celestial meridians and parallels which are obtained from the intersection of the room surfaces with geometrical elements in order to contain the catoptric clock


Fig. 30 The Sun’s declinations which are obtained from the intersection of the room surfaces with geometrical elements in order to contain the catoptric clock


Fig. 31 Astronomic and unequal hour lines which are obtained from the intersection of the room surfaces with geometrical elements in order to contain the catoptric clock


Fig. 32 Italian and Babylonian hour lines which are obtained from the intersection of the room surfaces with geometrical elements in order to contain the catoptric clock


Fig. 33 The net of the hour lines on the solar catoptric clock positioned in the south-western room, (axonometric external view)

similar apparatuses in the works of Agrippa and Della Porta (Baltrušaitis 1981). Their books talk about flat mirrors capable of showing, by reflection, standing or flying figures that are in fact lying supine. Using concave mirrors, it is possible to create even more mysterious illusions: the mirrored image seems to be detached from the surface, according to rules explained by Euclid


Fig. 34 Inner perspective view of the south-western room from below; in evidence the projection of sunbeam

in Propositio XVIII, where the positions of the observer and of the mirror itself are geometrically defined. The combination of catoptric and dioptric rules prompts some observations beyond the phantasmagorical effects produced. These rules supported the construction of instruments to enhance vision (telescopes, microscopes), to reproduce images (magic lanterns) and to serve the science of perspective representation. As is well known, and according to the most accredited theories, Filippo Brunelleschi (1377–1446) himself used a mirroring surface for the realization of one of his two famous wooden panels in linear perspective, the one depicting the Florentine Baptistery (about 1413–1425). In the following centuries, mirrors and lenses assembled in a single instrument became capable of reproducing the imago rerum as perceived by our sight. The sixteenth century represents a crucial point in


Fig. 35 Inner perspective view of the catoptric solar clock and projection on the vault of the rainbow, which is defined as an Iride Horariae Dioptricae

the historical evolution of this device, thanks to the introduction of a lens in the hole made in the wall in order to improve the sharpness of the image projected inside a room. Among medieval authors, Alhazen had instead been one of its main proponents; he regarded the dark room as a metaphor for the mechanism of sight. According to Martin Kemp (2005) the first reference to this innovation dates back to the De Subtilitate (1550) by Girolamo Cardano (1501–1576), although its full description can be found in the Pratica della prospettiva (1569) by Daniele Barbaro (1514–1570).


Fig. 36 Cylindrical lens (with external grooves) which is able to break up sunbeams into their apparent spectrum

Fig. 37 Cylindrical mirror to project a light arc


Fig. 38 Inner perspective view of the catoptric solar clock and projection on the vault of the light arc described in E. Maignan’s scientific reminder (point 6)

The patriarch of Aquileia suggests applying an opaque filter to the convex lens, allowing light to pass through its central part only. In this way the marginal aberrations of the projection due to the curvature of the crystal would be reduced. Further innovations arrived later, prompted by the need to turn the upside-down image the right way up. For this purpose a diagonal mirror was introduced, which reflected the rays after they had passed through the lens. The same problem was also solved by introducing an additional lens. These dark rooms became portable boxes, as shown by Johannes Zahn in his treatise Oculus artificialis Teledioptricus: sive Telescopium, published in 1685 (Fig. 41).


Fig. 39 The Villa’s room on first floor where mirabilia 10 and 15 have been located

Fig. 40 L. Nix, W. Schmidt, reconstruction of a catoptric effect as it is described by Hero of Alexandria


Fig. 41 Johannes Zahn, Oculus artificialis . . . (1685), p. 181


Fig. 42 Perspective exploded cross section of the digital reconstruction for Villa Pamphilj by Francesco Borromini; in evidence the scientific games conceived by Emmanuel Maignan

In the games described, mirrors and lenses are employed in optics for anamorphic devices and in gnomonics for measuring the passing of time. From this point of view we can affirm that the project for Villa Pamphilj, conceived by the architect Francesco Borromini and the Minim friar Emmanuel Maignan and influenced by Jean François Niceron, represents a wunderkammer of scientific devices (Fig. 42). The project also bears witness to a new seventeenth-century research method that was able to explain natural laws and at the same time to create a sense of wonder in visitors.

Conclusion

The relationship between the two Minim Brothers, which began a few centuries ago and lasted only 33 years, is now about to end. It seems fitting to offer the reader a final image after having had to handle the figures, patterns, distortions, anamorphoses


Fig. 43 E. Maignan, Perspectiva horaria, sive de orographia gnomonica tum theorethica tum pratica libri quattuor, Rome 1648. Liber Tertius. Catoptrice horaria sive horographiae gnomonicae. Propositio XXXVI. Linea Meridianam, in superficie horologji Catoptrico-Gnomonici plana . . .

and reflections that dominated this essay. In an article (Gagnaire 2003), Gagnaire offered some suggestions, which we would like to share, for observing the plate that accompanies Propositio LVI (56) (Fig. 43) in the Perspectiva horaria published by Emmanuel Maignan in 1648. It shows a perspective view of the other famous catoptric sundial made in Rome by the Minim Friar from Toulouse, at Palazzo Spada, today the seat of the Italian Council of State. As is known, the work was carried out in 1644 on behalf of Cardinal Bernardino Spada, protector of the Order of Minims. In the image he accompanies three other visitors. He is the character on the far right, wrapped in a cape, with a hat in his hand and with the unmistakable goatee immortalized by Guido Reni in a famous portrait (1631). The work is now kept in Galleria Spada, Rome. The identification of the other three figures is more complex. The Cardinal is turning his gaze to an area of the frescoed vault towards which the index finger of the main character is pointing. A nobleman wearing a cloak is turning around, perhaps interested in exploring further the gnomonic themes developed in the astrolabe. A rose in the fabric attached to a flap on his breeches is a distinctive sign revealing his identity: a heraldic symbol of the Orsini family. Next to the Cardinal, on his left, there are two friars facing each other. They are wearing a rope which


represents the patience of the Minim Order. We suspect that one of the two is the inventor of the sundial, namely Father Maignan. The other man, in the distance, is looking directly upwards, showing interest in the gnomonic problem raised by the unknown gentleman. His uneven features, spirited look and shaved hair make us think he could be Jean-François Niceron, while the friar who is turning his back appears sturdy and bald, as befits his rank of priest, making us think he could be Father Maignan. To identify Maignan in the image we can refer to the non-hagiographic bust made by Marc Arcis (1655–1739) and kept in the Illustres célébrant les Grands de l’histoire de Toulouse gallery. He has “ . . . a broad and powerful face, strong features, sharp eyes, which incline slightly to the side, as if to observe, reflect. The tight lips, finely drawn, slightly ajar, suggest questions already posed by the luminous eyes, scrutinizing. This careful look is discreetly emphasized by thin folds of leather and this contained tension leaves a vein slightly protruding on the temples. On the broad front, some wrinkles are formed, while on the upper part and on the sides of the skull, the rebellious strands, hot carved, soften and enliven this watchful look. From the robust aspect of an overweight man, a formidable intellect emerges.” (Julien 2005). In this image, just as in the portrait by Lasne with which we opened our essay, a space-time paradox creeps in: this must be Niceron’s ghost, as he had died in 1646, 2 years before the work was made. Niceron had never been to this gallery, not even before that fateful date, since his brief last stay in Rome ended in October 1642. The image was made after Niceron’s death, giving a deeply human meaning to the relationship of discipleship shared by the two Minims. Maignan would probably have wanted his friar friend beside him once again, if only in an image, the friend with whom he had shared academic and theological reflections, decorative projects, and wonderful visions gathered in the Wonders of artificial magic.

References

AGM (XVII century). Livre des Conclusions Capitulaires de ce convent de la S.te Trinitè Du mont (5-X-1620–26-IX-1649)
Andersen A (1995) The mathematical treatment of anamorphoses from Piero della Francesca to Nicéron. In: History of mathematics: states of the art. Academic, Cambridge, MA
Baitinger F-C (2006) L’esprit du portrait ou le portrait de l’esprit/Etude d’un portrait en anamorphose de Jacques d’Auzoles par le père J-F Niceron. In: Lampe-tempête, n1, le silence de l’expérience, w.i.p.
Baltrušaitis J (1981) Lo specchio, rivelazioni, inganni e science-fiction (italian version). Adelphi, Milan
Baltrušaitis J (1984) Anamorfosi o Thaumaturgus Opticus (italian version). Adelphi, Milan
Bessot D (2005) Synthèse et développement de techniques d’anamorphoses au XVIIe siècle: les traités du père Jean-François Niceron. In: Mélanges de l’école française de Rome. 117–1. École française de Rome, Rome, pp 91–129
Bortot A (2013) Dove lo sguardo si ricompone e s’acquieta. Immaginario scientifico e contestualità storica nei giochi ottici di Jean François Niceron. In: De Rosa A (ed) Jean François Nicéron. Prospettiva, Catottrica e Magia Artificiale. Aracne edizioni, Rome, pp 124–151


A. De Rosa and A. Bortot

Camerota F (2000) Architecture and science in baroque Rome. The mathematical ornaments of villa Pamphilj. In: Nuncius, annali di storia della scienza, Year XV, Number 2, Leo L. Olschki Firenze, pp 611–638
Candito C (2010) Il disegno e la luce. Fondamenti e metodi, storia e nuove applicazioni delle ombre e dei riflessi nella rappresentazione. Alinea, Florence
Ceñal R (1952) Emmanuel Maignan su vida, su obra, su influencia. Revista de Estudios Polìticos XLVI:111–149
Ciucci G (1982) Rappresentazione dello spazio e spazio della rappresentazione. In: Ciucci G, Scolari (eds) Rassegna. Rappresentazioni, vol 9. Electa, Milan, p 11
Cojannot-Le Blanc M (2006) Les traités d’ecclésiastiques sur la perspective en France au XVIIe siècle: un regard de clercs sur la peinture? Dix-septième siècle 1(230):117–130
D’Auzoles de Lapeyre J (1638) Le Mercure charitable, ou Contre-Touche et souverain remède pour desempierrer le R. P. Petau, jésuite d’Orléans, depuis peu métamorphosé en fausse pierre-de-touche, par Jacques d’Auzoles Lapeyre. in-fol. G. Alliot, Paris, pp 72–73
Del Monte G (1600) Perspectiva Libri Sex, Girolamo Concordia, Pesaro
De Rosa A (2006) The Optik’s Apocalipse. The twin anamorphosis by Emmanuel Maignan and Jean-François Nicéron. Ikhnos. Lombardi, Siracusa
De Rosa A (ed) (2013) Jean François Nicéron. Perspective, Catoptric and Artificial Magic. Aracne edizioni, Rome
De Rosa A, D’Acunto G (2002) La vertigine dello sguardo. Saggi sulla rappresentazione anamorfica. Cafoscarina, Venice
Descartes R (1637) Discours de la méthode pour bien conduire sa raison, et chercher la vérité dans les sciences Plus la Dioptrique, les Météores, et la Géométrie qui sont des essais de cette Méthode, Leiden, Ian Maire
Ferrari G (2005) Copernico e la prima meridiana a riflessione. In: Proceedings of the Conference XIII Seminario Nazionale di Gnomonica. Unione astrofili italiani, Lignano, pp 88–95
Fiorini Morosini G (2000) The penitential charism of St. Francis of Paola and the Order of the Minims. History and spirituality. Bibliotheca Minimum 3, Rome
Fratini G, Moriconi F (2010) Datazione e attribuzione dell’anamorfosi di San Giovanni a Pathmos presso il Convento della Trinità dei Monti a Roma. In: MEFRIM: Mélanges de l’École française de Rome. Italie et mediterranée. T. 122/1: École française de Rome, Rome. pp 128–129
Gagnaire P (2003) Le cadran solaire à réflexion du Pére Maignan, à la Trinité des Monts. In: ANCAHA, n 97, Paris
Gilman EB (1978) Curious perspective. Literary and Pictorial Wit in the Seventeenth Century. Yale University Press, New Haven, p 41
Julien P (2005) Anamorphoses et vision miraculeuses du Père Maignan (1602–1676). In: MEFRIM: Mélanges de l’École française de Rome. Italie et mediterranée, t. 117, 1. École française de Rome, Rome 2005, pp 65–66
Kemp M (2005) La scienza dell’arte. Prospettiva e percezione visiva da Brunelleschi a Seurat, Italian edn. Giunti, Milan, pp 210–211
Kircher A (1635) Primitive Gnomonicae Catoptricae, hoc est horologiographiae novae specularis, ex typographia I. Piot, Avignone
Krakovitch O (1981) Le couvent des Minimes de la Place-Royale, in Paris et Ile-de-France Mémoires. Klincksieck, Paris
Kuchel P W (1979) Anamorphoscopes: a visual aid for circle inversion. In: The Mathematical Gazette, vol. 63, n 424. Mathematical Association, Leicester
Maignan E (1648) Perspectiva horaria, sive de orographia gnomonica tum theorethica tum pratica libri quattuor, Typis & Expensis Philippi Rubei, Rome
Martin RPC (late XVIII century) Histoire du couvent royal des Minimes français de la très sainte Trinité sur le mont Pincius à Rome. Manuscript of the convent of Trinita dei Monti (Ms. Trin.). p. 325
Mirami R (1582) Compendiosa introduttione alla prima parte della specularia . . . , care of Francesco Rossi & Paolo Tortorino’s successors. Ferrara

10 Anamorphosis: Between Perspective and Catoptrics


Niceron JF (1638) La Perspective curieuse ou magie artificielle des effets merveilleux . . . . Chez Pierre Billaine, Parigi
Niceron JF (1646) Thaumaturgus opticus, Langlois, Paris
Nix L, Schmidt W (1900) Herons von Alexandria Mechanik und Katoptrik. Teubner, Leipzig
Portoghesi P (1964) Borromini nella cultura europea. Officina edizioni, Rome
Roberti G M (1902–1908) Disegno storico dell’Ordine dei Minimi, 3 vll. Tip. Poliglotta, Rome
Rodis-Lewis G (1997) Cartesio. Una biografia. Editori Riuniti, Rome
Shea WR (2014) Cartesio. La magia dei numeri e del moto, René Descartes e la scienza del Seicento. Bollati Boringhieri, Turin, p 119
Shonberger G (1622) Demonstratio et Constructio Horologiorum novorum. Radio recto; refracto in water; reflexo in speculo; only magnet horas astronomicas, italicas, babylonicas indicatium, apud Ioannem Strasserum, Friburgi Brisgoiae
Siguret F (1993) L’oeil surpris. Perception et représentation dans la 1ère moitié du XVIIe siècle. Klincksieck, Paris, pp 189–217
Vagnetti L (1979) De et natural artificial perspectiva. Libreria Editrice Fiorentina, Florence, pp 392–393
Withmore PJS (1967) The order of Minims in seventeenth-century France. Springer, The Hague, pp 155–162
Zahn J (1685) Oculus artificialis Teledioptricus: sive Telescopium. Würzburg

Further Recommended Readings
Alberti L B/Grayson C (1973) De Pictura. Laterza, Bari
Amodeo F (1933) Lo sviluppo della Prospettiva in Francia nel secolo XVII. In: Atti dell’Accademia Pontaniana, vol LXIII. Naples, pp 24–25
Boyd ML (2005) Sundials: history, art, people, science. Frances Lincoln, London
Camerota F (2006) La Prospettiva del Rinascimento. Arte, architettura, scienza. Electa, Milan, pp 194–195
Ceñal R (1952) Emmanuel Maignan su vida, su obra, su influencia. Revista de Estudios Polìticos XLVI:111–149
della Francesca P/Nicco Fasola G (1984) De Prospectiva pingendi. Le Lettere, Florence
De Rosa A, Bortot A, Boscaro C, Monteleone C, Trevisan E (2012) Memory and oblivion. Discovery and digital survey of J.-F. Niceron’s mural anamorphosis. In: Acts of XVI ASITA National Conference. Vicenza
Maignan E (around 1650) Mathematica Pamphilianos hortos exornans: In: Archivio di Stato di Roma, Archivio Spada 235, “Miscellanea de negotijj passati per mani mie sub Innocentio PP X”. Roma, cc. 627–630
Malcolm N (2004) Aspects of Hobbes. Oxford University Press, Oxford
Massey L (2007) Picturing space, displacing bodies. Anamorphosis in early modern theories of perspective. Pennsylvania State University Press, Philadelphia
Portoghesi P (1967) Borromini: architettura come linguaggio. Electa, Milano
Rodis-Lewis G (2011) Marchingegni e prospettive curiose nel loro rapporto con il cartesianesimo. In: Lo Sguardo-Rivista di Filosofia, n 6, Il sapere Barocco: tra scienza e teologia
Osborne H (ed) (1970) Oxford companion to art. Clarendon Press, Oxford
Stafford BM, Terpak F (2001) Devices of wonder, from the world in a box to images on a screen. Getty Research Institute, Los Angeles
Tabarrini M (2008) Borromini e gli Spada, un palazzo e la committenza di una grande famiglia nelle Roma barocca. Gangemi, Roma

11 Geometric and Aesthetic Concepts Based on Pentagonal Structures

Cornelie Leopold

Contents
Introduction
Tessellations and Their Dualizations
Tiling with Regular Pentagons
Pentagrid as Art Repertoire
From the Pentagrid to the Kite-Dart-Grid
Spatial Structures with Dodecahedra
Spatial Structures with Rhombohedra: Golden Diamonds
Geometry and Art: Reflections on Aesthetics
Conclusion
Cross-References
References

Abstract
The relationship between geometry and art will be examined using the example of pentagonal structures. The work of contemporary Dutch artist Gerard Caris is based on those pentagonal structures. He calls his art work Pentagonism and questions how art creations and design processes can rely on strong, geometric, structural thinking. Pentagonal structures in plane as well as in space will be analyzed from a geometrical point of view and compared to corresponding art approaches. A review of geometric research on tessellations will be followed by a discussion on previous attempts to tile the Pentagrid with regular pentagons. The fundamental role of the Pentagrid and derivable Kite-Dart-Grid in Caris’ art design processes will also be explained. A step into the three-dimensional space leads to the dodecahedron and derived rhombohedra configurations for tessellations, or packings, in space. The geometric background refers to fundamental works by Plato, Euclid, Dürer, and Kepler as well as recent research results. The investigation will end with a discussion of the aesthetic categories of redundancy and innovation, their application to art evaluation and the differentiation of geometry and art. The example of Caris’ art, which concentrates on the regular pentagon and the spatial counterpart dodecahedron, points out the possibilities of aesthetic expressions on the basis of geometric structures. Art enables the exploration of those structures in a playful and self-explanatory way and often precedes scientific research.

C. Leopold, FATUK – Faculty of Architecture, TUK Kaiserslautern, Kaiserslautern, Rheinland-Pfalz, Germany, e-mail: [email protected]
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_20

Keywords Geometry · Tessellation · Pentagon · Pentagrid · Kite-dart-grid · Dodecahedron · Packing · Aesthetics · Gerard Caris

Introduction
The foundation of art can be seen in configurations of elements, on the image plane or in space as sculptures, in which the elements are brought into relation to each other. Mathematics as a structural science, as developed from the 1930s onward by the Bourbaki project (Bourbaki 1948), provides a new foundation of art. Mathematical structures of order in particular make it possible to explore and describe possible geometric arrangements and configurations of artistic elements. When such geometrically ideal structures of order are expressed in materialized configurations, these mere categories of thought become perceptible. We can refer to the fundamental work of Kant (1783), in which the forms of intuition, space and time, are necessary conditions for all sensual experiences and therefore also for aesthetics. Kant understood the notion of aesthetics in its original Greek meaning, “Aisthesis,” as the theory of sensual perception. After further development, aesthetics has been defined as the theory of beauty or art, though the sensual perceptibility of an idea remains essential to this modern definition of aesthetics. In his “Vorlesungen über die Aesthetik,” Hegel described the task of art as:

to present the idea for the immediate intuition in a sensual form and not in the form of thinking and pure spirituality in general. (Hegel 1835)

According to these characteristics of art proposed by Hegel, the task of art can be said to be making an idea perceptible to our senses. But how can an idea be captured and considered in the creation process? The philosopher Max Bense developed a new definition of aesthetics (Bense 1965) starting from Hegel’s description of art. In that definition, the aesthetic state of an object is related to distributions of elements or schemata of order, in the sense of arrangements. Elisabeth Walther described the role of this new definition of aesthetics:


Aesthetics, as Bense brings it into play, is the principle of order par excellence. Aesthetics is order, and order on the other hand is describable by mathematics. Therefore, aesthetics is important as structuring the world for techniques as well as architecture, literature, etc., for all what will be created. Whenever we take something out of the chaos of existing and assemble it new, we need an aesthetic foundation. (Walther 2004)

Such principles of order as an aesthetic foundation of art can be found in the geometry of tessellations, patterns, and their spatial variants. This corresponds to the statement of Paul Valéry (1895) that patterns and ornaments are fundamental to art. He compared the role of the ornamental drawing in art with the role of mathematics in the sciences. This fundamental role is also confirmed by the appearance of geometric patterns in various traditions of ethnic handicrafts found in transcultural studies. By studying the development of the art of Gerard Caris (Leopold 2016, 2018), it is possible to follow the process from an early stage of geometrically ordered structures to the later stage of aesthetic expressions. His concentration on the regular pentagon and its spatial counterpart, the dodecahedron, allows him to explore manifold pentagonal structures in the plane as well as in space. Caris calls his art “Pentagonism” (Jansen and Weibel 2007). The suffix “-ism” denotes a condition or system; thus, the word describes his art as a system based on the pentagon. The following sections discuss this pentagonal system, its geometric foundations, and the resulting art creations stimulated by those structural considerations.

Tessellations and Their Dualizations
Only three tessellations can be formed with a single type of regular polygon: with triangles, squares, or hexagons. These three regular, or Platonic, tessellations are followed by eight semiregular, or Archimedean, tessellations. Archimedean tessellations consist of two or more kinds of convex regular polygons, with the same polygons in the same order surrounding every vertex. With the help of the powerful tool of dualization, the center of each polygon is taken as a vertex and joined to the centers of the adjacent polygons in order to form the dual tessellation (Weisstein). The number of lines meeting in a vertex of the original tessellation gives the number of vertices of the resulting dual polygon (Fig. 1). The dualization of a semiregular tessellation results in a dual tessellation with one type of nonregular polygon instead of regular polygons. One example is the so-called “Cairo Tiling” (MacMahon 1921), which is the result of the dualization of the semiregular tiling with three regular triangles and two squares at each vertex. This tiling is characterized by 3-3-4-3-4, the sequence of triangles and squares around each vertex. Mathematical research on pentagonal tilings has continued since then. In 2015, Casey Mann, Jennifer McLoud, and David Von Derau discovered a 15th type of convex pentagon that tiles the plane monohedrally (Pöppe 2015). In July 2017, Michaël Rao (2017) completed a computer-assisted proof showing that there are no other types of convex pentagons that can tile the plane (Fig. 2).
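The restriction to triangles, squares, and hexagons follows from an angle condition: the interior angles meeting at a vertex must add up to exactly 360°. The following Python sketch illustrates this check. The function names are illustrative assumptions and do not come from the source, and the search is limited to polygons with at most 12 sides.

```python
from fractions import Fraction


def interior_angle(n):
    """Interior angle of a regular n-gon in degrees, as an exact fraction."""
    return Fraction((n - 2) * 180, n)


def regular_tilers(max_sides=12):
    """Regular n-gons whose interior angle divides 360 degrees exactly.

    For n > 6 the quotient 360 / angle lies strictly between 2 and 3,
    so no larger polygon can qualify.
    """
    return [n for n in range(3, max_sides + 1)
            if (Fraction(360) / interior_angle(n)).denominator == 1]


def closes_vertex(config):
    """True if the regular polygons listed around one vertex sum to 360 degrees."""
    return sum(interior_angle(n) for n in config) == 360


print(regular_tilers())                 # [3, 4, 6]
print(closes_vertex([3, 3, 4, 3, 4]))   # True: the semiregular tiling dual to the Cairo tiling
print(closes_vertex([5, 5, 5]))         # False: three regular pentagons leave a gap
```

Exact fractions are used instead of floating-point numbers so that the divisibility test is not affected by rounding.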


Fig. 1 Regular and semiregular tessellations and their duals (http://mathworld.wolfram.com/DualTessellation.html)

The artist Caris started with an intuitively perceived irregular pentagon in his works “Birth of Forms” (Fig. 3, left) and “Creation of the Pentagon.” For Caris, these works answered the question: “How to imagine something from nothing?” These spontaneous compositions were his key works. Pentagons are not obvious in these works, but they can be derived by connecting vertices and by extending and adding lines. The configuration in Fig. 3 is reminiscent of an overhand knot tied in a paper strip (Fig. 3, right). This astonishing fact shows the self-evident occurrence of the pentagon, which the artist did not consider at that time. After some explorations into tiling the plane with variations of irregular pentagons and hexagons (Fig. 4), Caris was captivated by the systematic consequences of the universe found in the regular pentagon. His art explorations were therefore later guided by the universe of the regular pentagon, or “Pentagonism.”

Fig. 2 The 15 possible monohedral tilings by convex pentagons (https://commons.wikimedia.org/wiki/File:PentagonTilings15.svg)

Fig. 3 Gerard Caris, Birth of Forms, 1968. Knot construction of a regular pentagon, 2017. (© VG Bild-Kunst, Bonn 2018)

Tiling with Regular Pentagons
Early attempts to tile the plane with regular pentagons were unsuccessful: rhomboidal gaps always remained between the pentagons, although these gaps could be placed at various positions. Figure 5 shows three examples of Caris’ work and the aesthetic results he discovered from those variations. The use of light and dark colors supports spatial interpretations of the works, which he continues in his later reliefs. Such studies on tiling with regular pentagons were already conducted by Albrecht Dürer (1525, pp. 66–69) and Johannes Kepler (1619, p. 77), as shown in Fig. 6. It is interesting to see that Kepler tried to achieve a subdivision of the pentagon, whereas Caris explores more thoroughly the complex structures that arise from the pentagonal tiling in their aesthetic variations. More recently, Greg Kuperberg and Włodzimierz Kuperberg (1990) studied pentagon packings mathematically in order to find the closest pentagon packing, based on double-lattice packings (Fig. 7). Caris, by contrast, is mainly interested in the pentagon as a systematic foundation for his creations.

Fig. 4 Gerard Caris, View of the universe 1, 1969. Cosmic motion series, 1971. (© VG Bild-Kunst, Bonn 2018)

Fig. 5 Gerard Caris, Structure 1C and 2C, 1974, and Structure 6 C 2, 1975. (© VG Bild-Kunst, Bonn 2018)
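The reason why regular pentagons cannot close up around a point can be stated as a short worked calculation. The symbol \alpha_5 for the interior angle is an illustrative notation and is not taken from the source.

```latex
\[
  \alpha_5 = \frac{(5-2)\cdot 180^\circ}{5} = 108^\circ ,
  \qquad
  360^\circ - 3\,\alpha_5 = 36^\circ < \alpha_5 .
\]
```

Three regular pentagons meeting at a point therefore leave a wedge of 36°, which is too narrow for a fourth pentagon, so gaps are unavoidable.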


Fig. 6 Configurations of regular pentagons by Dürer (left) and Kepler (right)

Fig. 7 Closest pentagon packing with double-lattice packings, according to Kuperberg

Pentagrid as Art Repertoire
Explorations in tiling with regular pentagons led Caris to develop the Pentagrid (Fig. 8), which became the structural basis for his art works in the plane. It contains manifold relationships and configurations derived from the pentagon and is thus the repertoire for all potentially derivable art works. The Pentagrid reflects the pentagonal system. Caris describes the Pentagrid as a grid with five degrees of freedom. The five lines through one node form its basic structure (Fig. 9, left). In the regular pentagon, the ratio of diagonal to side follows the golden section; the grid has angles of 36° and 72°, and their sum, 108°, is the interior angle of the pentagon. The two possible golden triangles are part of the pentagon. An infinite continuation of the subdivision according to the golden section would result in fractal structures of the Pentagrid. In this way, the Pentagrid explains the structures depicted in Fig. 9: the pentagram, the golden triangles, and two kinds of rhombuses, one of which is divided by the grid lines into kite and dart figures (Fig. 9, right).

Pentagonal structures in the Pentagrid are the basis for various painting series by Caris such as Structure, PC (Pentagon Complex), and ETX (Eutactic Star Series). During a visit to his atelier, it was interesting to see that he used the Pentagrid on his drawing table during the creation process. Figure 10 shows some diverse examples from Caris’ creation period. Possible spatial interpretations become increasingly obvious. In the ETX series, Caris refers to the concept of vectors forming a star, which is called well-arranged, or eutactic, if it arises as the orthogonal projection of a set of mutually perpendicular vectors of equal length in a higher-dimensional space onto a subspace (Coxeter 1951). In this case, the projection leads from three-dimensional space into two-dimensional space. This background can explain spatial interpretations of two-dimensional works.

Patterns have been developed especially in the tradition of Islamic ornaments. Patterns based on the golden section were already referred to in the ninth century, in addition to patterns based on root-two and root-three proportions. Figure 11 illustrates how an Islamic ornament can be constructed from the regular pentagon according to the elementary studies of El-Said and Parman (1976). By overlaying two pentagons rotated by 180° relative to each other, the decagon is created, and the ornament is further developed (Ghyka 1977, p. 34). Two works of Caris (Fig. 11, right) related to these ornaments were also developed out of the Pentagrid.

Fig. 8 Gerard Caris, Pentagrid, 1994. (© VG Bild-Kunst, Bonn 2018)

Fig. 9 Five-dimensional grid of the pentagon and figures in the Pentagrid (angles marked in the figure: 36°, 72°, 108°)

Fig. 10 Gerard Caris, PC 26, 1995. ETX 21, 1998. PC 34, 2016. (© VG Bild-Kunst, Bonn 2018)

Fig. 11 Constructing Islamic ornament from pentagon. Gerard Caris, ETX 36, 1999. ETX 145, 2014. (© VG Bild-Kunst, Bonn 2018)
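Two of the geometric statements in this section can be checked numerically: that diagonal and side of the regular pentagon are in the golden ratio, and that a pentagon overlaid with its copy rotated by 180° yields the vertices of a regular decagon. The following Python sketch is a minimal illustration under the assumption that the pentagon is inscribed in the unit circle; all function names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618


def regular_polygon(n, rotation_deg=0.0):
    """Vertices of a regular n-gon inscribed in the unit circle."""
    angles = [math.radians(rotation_deg + 360.0 * k / n) for k in range(n)]
    return [(math.cos(a), math.sin(a)) for a in angles]


def diagonal_to_side_ratio():
    """Ratio of diagonal to side in the regular pentagon."""
    p = regular_polygon(5)
    return math.dist(p[0], p[2]) / math.dist(p[0], p[1])


def overlay_angles():
    """Polar angles (degrees) of a pentagon overlaid with its 180-degree rotation."""
    points = regular_polygon(5) + regular_polygon(5, rotation_deg=180.0)
    return sorted(round(math.degrees(math.atan2(y, x))) % 360 for x, y in points)


print(abs(diagonal_to_side_ratio() - PHI) < 1e-12)   # True
print(overlay_angles() == list(range(0, 360, 36)))   # True: ten vertices, 36 degrees apart
```

The ten vertices lie 36° apart because a rotation by 180° differs from a symmetry rotation of the pentagon (a multiple of 72°) by exactly 36°.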

From the Pentagrid to the Kite-Dart-Grid
Kites and darts are present within the two rhombuses that form the structure of the Pentagrid (see Fig. 9, right). They form the basis for the quasiperiodic tilings, also known as Penrose tilings, found by Robert Ammann and Roger Penrose in the 1970s. In 1971, Caris showed that kites and darts are part of the Pentagrid, and he derived the Kite-Dart-Grid from it (Fig. 12). There are seven ways to arrange kites and darts around a node. Conway and Lagarias (1990) called them the Star, Ace, Sun, King, Jack, Queen, and Deuce (Fig. 13). Each kite and each dart is composed of two golden triangles, also called Robinson triangles (Grünbaum and Shephard 1987), which are marked in Fig. 13. Caris worked extensively on the arrangements of kites and darts, especially in his Kites and Darts series of 2015–2017 (Fig. 14). In these works, Caris creates figurative compositions from symmetrical or asymmetrical arrangements of the kites and darts and reveals the golden triangles by coloring them. In a new work of 2017, he returns to the original figure, the regular pentagon, with the help of the kite and dart configurations (Fig. 14, right). In this way, the circle of his substantial pentagon studies closes again.
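The decomposition of kite and dart into golden triangles can be illustrated with a small area calculation. The sketch below assumes the common normalization in which the short sides of kite and dart have length 1 and the long sides have length equal to the golden ratio; the function names and this normalization are assumptions for illustration and are not taken from the source.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

# Interior angles in degrees, listed around each figure.
KITE_ANGLES = (72, 72, 144, 72)
DART_ANGLES = (72, 36, 216, 36)


def triangle_area(a, b, included_deg):
    """Area of a triangle from two sides and the angle between them."""
    return 0.5 * a * b * math.sin(math.radians(included_deg))


def kite_area():
    # Two acute golden triangles (36-72-72): the long sides PHI enclose 36 degrees.
    return 2 * triangle_area(PHI, PHI, 36)


def dart_area():
    # Two obtuse golden triangles (36-36-108): the short sides 1 enclose 108 degrees.
    return 2 * triangle_area(1.0, 1.0, 108)


def rhombus_area():
    # Rhombus with side PHI and angles 72 and 108 degrees.
    return PHI * PHI * math.sin(math.radians(72))


print(sum(KITE_ANGLES), sum(DART_ANGLES))                   # 360 360
print(abs(kite_area() / dart_area() - PHI) < 1e-12)         # True
print(abs(kite_area() + dart_area() - rhombus_area()) < 1e-12)  # True
```

Under these assumptions the kite is larger than the dart by exactly the golden ratio, and one kite plus one dart have the same area as the rhombus with angles 72° and 108°, which is consistent with the division of that rhombus into kite and dart.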


Fig. 12 Gerard Caris, Pentagrid with kite and dart, 1971. Kite-Dart-Grid, 2015. (© VG Bild-Kunst, Bonn 2018)

Fig. 13 The seven possible arrangements of kites and darts around a node (https://commons.wikimedia.org/wiki/File:Penrose_vertex_figures.svg)

Spatial Structures with Dodecahedra
During his working process, Caris created spatial pentagonal structures in parallel to those in the plane. Twelve regular pentagons form the dodecahedron, which is one of the five Platonic solids. The Platonic solids are built from congruent regular polygons so that the same number of polygons meet at each vertex. This condition leads to five regular polyhedra. Plato mentioned these polyhedra in his dialogue Timaeus and assigned each a cosmological meaning: cube (hexahedron) – earth, tetrahedron – fire, octahedron – air, icosahedron – water, and dodecahedron – universe. About the dodecahedron, Plato remarked, “There was yet a fifth combination which God used in the delineation of the universe” (Plato 360 BC, Timaeus 55c). Kepler illustrated the Platonic solids with these assignments in his Harmonices Mundi (Kepler 1619, p. 80) (Fig. 15).

Fig. 14 Gerard Caris, Kites and Darts series #26, 2015. #29, 2015. #98, 2017. (© VG Bild-Kunst, Bonn 2018)

Fig. 15 Platonic solids with cosmic assignments by Kepler


Fig. 16 Dodecahedron grid and Gerard Caris, Monumental Polyhedral Net Structure, 1977. Polyhedral Net Structure #1, 1971. (© VG Bild-Kunst, Bonn 2018)

Finally, Euclid proved that there are only these five regular convex polyhedra. He described the properties of the Platonic solids and their construction. When the search for regular tessellations is carried over from the plane to space, a regular tessellation of space turns out to be possible only with cubes. This may be the reason for the dominance of orthogonal structures in the built environment. Caris looked to pentagonal structures as a new system of order to establish a pentagonal universe of art, in the hope of escaping this overwhelming orthogonality. He started to arrange spatial configurations with dodecahedra, for example, by extending the edges of the dodecahedron. The result was a spatial dodecahedra grid. These attempts, along with other criteria for the distances, led to sculptures such as Monumental Polyhedral Net Structure (Fig. 16). He also experimented with spatial packings of dodecahedra in analogy to the tessellation attempts with pentagons in the plane. The resulting sculptures and reliefs (Fig. 17) show various gaps between the dodecahedra, depending on the kind of packing or configuration. In his Elements, Book XIII, Proposition 17 (Euclid 300 BC), Euclid constructed the dodecahedron on the basis of a cube, which opens up possibilities for packing dodecahedra. Because packing is possible with cubes, the dodecahedron can be truncated so that a cube is left, or, conversely, hipped roofs placed on the cube faces make the dodecahedron arise. Kepler illustrated the idea (Fig. 18, left) in his Harmonices Mundi (Kepler 1619, p. 181). With this idea, a concave solid can be constructed by taking a cube and turning the six hipped roofs inwards. This concave solid can be used to fill the gaps between the dodecahedra for a space-filling arrangement. Caris used this idea for a series of reliefstructures (Fig. 19).
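The construction of the dodecahedron from a cube with six hipped roofs can be checked numerically. The following Python sketch assumes a cube with vertices (±1, ±1, ±1) and uses the standard coordinates for the twelve ridge vertices of the roofs; these coordinates and the function names are assumptions for illustration and are not taken from the source.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def dodecahedron_vertices():
    """Eight cube vertices plus the twelve ridge vertices of the six roofs."""
    cube = [tuple(float(c) for c in v)
            for v in itertools.product((-1, 1), repeat=3)]
    ridges = []
    for a, b in itertools.product((-1, 1), repeat=2):
        ridges.append((0.0, a / PHI, b * PHI))   # ridges above the faces z = +1 and z = -1
        ridges.append((a / PHI, b * PHI, 0.0))   # ridges above the faces y = +1 and y = -1
        ridges.append((b * PHI, 0.0, a / PHI))   # ridges above the faces x = +1 and x = -1
    return cube + ridges


def edges(vertices, tol=1e-9):
    """Pairs of vertices at the shortest occurring distance, and that distance."""
    pairs = list(itertools.combinations(vertices, 2))
    shortest = min(math.dist(p, q) for p, q in pairs)
    found = [(p, q) for p, q in pairs if abs(math.dist(p, q) - shortest) < tol]
    return found, shortest


verts = dodecahedron_vertices()
edge_list, edge_length = edges(verts)
print(len(verts), len(edge_list))            # 20 30
print(abs(edge_length - 2 / PHI) < 1e-12)    # True: all 30 edges have length 2 / PHI
print(abs((PHI - 1) - 1 / PHI) < 1e-12)      # True: roof height above the cube face is 1 / PHI
```

All 30 edges, both the roof ridges and the edges running from cube corners to ridge ends, have the same length, so the cube with its six roofs has the vertex and edge structure of the regular dodecahedron.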


Fig. 17 Gerard Caris, Reliefstructure 1K-1 Detail, 1988. Reliefstructure 1E-2 Detail, 1985. Helix 2–2 branching, 2002. (© VG Bild-Kunst, Bonn 2018)

Fig. 18 Truncating a dodecahedron or putting hipped roofs on the faces; a concave solid is created by turning them inside

Fig. 19 Gerard Caris, Reliefstructure 1D-3, 1985. Reliefstructure 2M-1, 1989. Reliefstructure 1O-2, 2002. (© VG Bild-Kunst, Bonn 2018)


Fig. 20 Gerard Caris, Tetracaidecahedron, 1975. (© VG Bild-Kunst, Bonn 2018)

Fig. 21 Gerard Caris, Model D House, 1985. Polyhedra sculpture 3 (truncated), 1979. (© VG Bild-Kunst, Bonn 2018)

Besides these more geometric sculptures and reliefs, Caris strove to apply these spatial packings to housing designs. Plates with conceptual drawings by Caris illustrate possible housing creations with appropriately truncated dodecahedra. Truncating the 12-sided dodecahedron results in the 14-sided tetracaidecahedron shown in Fig. 20. These conceptual ideas led him to experiment with models of such housing or sculptural designs (Fig. 21).


Fig. 22 Zvi Hecker, Ramot Polin housing project, 1971 ff. (© Zvi Hecker)

Fig. 23 Zvi Hecker, Dense space packing of cubes inscribed into dodecahedrons. Assembly System of the pre-fabricated elements. (© Zvi Hecker)

Few architects have used dodecahedra for housing structures. In the 1970s, Zvi Hecker created a modular structure composed of dodecahedra in his Ramot Polin housing project in Israel. The structures were constructed with prefabricated pentagonal concrete panels. Hecker was commissioned by the Israeli Ministry of Housing to plan a complex of 720 housing units northwest of Jerusalem. The project was also based on the truncated dodecahedron shown in Fig. 18. The concept is described by Zvi Hecker (Figs. 22 and 23):

Most researchers-architects-students focus on the polyhedral (dodecahedral) elevations of the buildings, but actually the geometric strategy of the project is much more complex. It is related to the form of the site and the phenomena of the Golden Section manifested in the arrangement of seeds in sunflowers and in the geometry of the pentagon. The general plan of the Ramot Housing is reminiscent of the palm of an open hand – its five fingers are retaining walls ‘supporting’ the slope of the hill. Each ‘finger’ is composed again of five boomerang-like buildings assembled in such a way as to create interior courtyards, kind of pedestrian paths reminiscent of the Old City of Jerusalem. The buildings’ inclination towards the interior of the courtyards provides a protected shadowy exterior. The geometry of the buildings follows a stereometric dense space packing of cubes inscribed into dodecahedrons (as dodecahedrons do not pack densely). In this project the elevation pentagons are the smallest in scale elements based on number five. (Hecker 2018).

Topological interlocking of convex regular polyhedra has been a subject of recent geometric and experimental research in structural design (Viana 2018a). Topological interlocking is defined as: “Elements (blocks) of special shape [arranged] in such a way that the whole structure can be held together by a global peripheral constraint, while locally the elements are kept in place by kinematic constraints” (Estrin et al. 2011). The possible topological interlockings of dodecahedra differ in how they relate to a tessellation at a half-section of the dodecahedron. The first possibility is deducible from the regular tessellation outlined by the hexagonal cross-sections, following Dyskin et al. (2003) (Fig. 24, left). The second is based on a regular decagonal cross-section, also devised by Dyskin et al. (Dyskin et al. 2003; Kanel-Belov et al. 2008) (Fig. 24, middle), and the third, described by Viana (2018b, p. 257), refers to the half-sections of dodecahedra as regular hexagons that tessellate the plane together with equilateral triangles (Fig. 24, right). It seems that Caris also worked on those arrangements of dodecahedra in reliefstructures. The reliefstructures shown in Fig. 25 are similar to TI 2 and TI 3.

Spatial Structures with Rhombohedra: Golden Diamonds The Kite-Dart-Grid that developed out of the Pentagrid led to the Penrose-tiling. The spatial counterpart can be found in the rhombohedra forming the so-called golden diamonds (Miyazaki 1986). There are two rhombohedra necessary for spatial tessellation that corresponds to the types of rhombuses in the plane (see Figs. 9 and 12). They play an important role for aperiodic space fillings in quasi-crystals (van de Craats 2007). In the rhombohedra, the diagonals cut each other in a golden ratio. The

Fig. 24 Three types of topological interlocking of dodecahedra, TI 1, TI 2, TI 3. (© Vera Viana)

308

C. Leopold

Fig. 25 Gerard Caris, Reliefstructure 1I-1, 1986. Reliefstructure 1E-2, 1985. Reliefstructure 1R-1, 1993. (© VG Bild-Kunst, Bonn 2018)

Fig. 26 Axes of the dodecahedron – eutactic star. Two types rhombohedra and compound rhombic polyhedral solids

rhombohedra can be developed out of the axes of the dodecahedron. The six possible fivefold rotational axes in the dodecahedron, through the faces and the center of the dodecahedron, build the spatial structural grid, or eutactic star (Fig. 26, left). Choosing three of the six axes generates the two different rhombohedra with vertices along these axes. Each rhombohedron consists of six congruent rhombi as faces. Two of each different rhombohedra together form a rhombic dodecahedron (Groß 2007), which can grow into larger rhombic polyhedral spatial structures (Fig. 26, right). These axes of the dodecahedron enable the dodecahedron to be used as a universal node for the node-edge-model of a related lattice structure (Fig. 27). A hole is placed in the middle of each face of the dodecahedron into which rods can be

11 Geometric and Aesthetic Concepts Based on Pentagonal Structures


Fig. 27 Dodecahedral node for the rhombohedral structural node system

Fig. 28 Gerard Caris, Reliefstructure 13V-1, 2003. Sculpture 2X-1, 1996. Rhombohedra sculpture # III, 2017. (© VG Bild-Kunst, Bonn 2018)

stuck. This forms a structural node system (Groß 2007). In this way, the relationship between dodecahedra and rhombohedra again becomes apparent. These spatial rhombohedra structures are applied by Caris in various reliefstructures and sculptures (Fig. 28).
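The construction of the two rhombohedra from the axes of the dodecahedron can be checked numerically. The following is a minimal Python sketch and not part of the original chapter. It assumes the six fivefold axes are given by the standard coordinates (0, ±1, φ) and their cyclic permutations, which are the face-center directions of a suitably oriented dodecahedron; all function names are illustrative. It spans a unit-edge rhombohedron on every triple of axes and compares volumes and face diagonals.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

def unit(v):
    """Scale a 3D vector to length 1."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

# Assumed coordinates: one direction per fivefold axis,
# cyclic permutations of (0, +-1, PHI).
AXES = [unit(v) for v in [
    (0, 1, PHI), (0, -1, PHI),
    (1, PHI, 0), (-1, PHI, 0),
    (PHI, 0, 1), (PHI, 0, -1),
]]

def det3(u, v, w):
    """Determinant of the 3x3 matrix with rows u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def face_diagonal_ratio(u, v):
    """Long diagonal / short diagonal of the rhombus spanned by u and v."""
    d1 = math.sqrt(sum((a + b) ** 2 for a, b in zip(u, v)))
    d2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(d1, d2) / min(d1, d2)

# Volume of the unit-edge rhombohedron for each of the 20 triples of axes.
volumes = [abs(det3(u, v, w)) for u, v, w in itertools.combinations(AXES, 3)]
distinct_volumes = sorted({round(v, 6) for v in volumes})

# Diagonal ratio of the rhombic face for each of the 15 pairs of axes.
ratios = [face_diagonal_ratio(u, v) for u, v in itertools.combinations(AXES, 2)]

print(distinct_volumes)                            # two values: flat and pointed type
print(distinct_volumes[1] / distinct_volumes[0])   # close to PHI
print(min(ratios), max(ratios))                    # both close to PHI
```

Under these assumptions, only two different volumes occur, which corresponds to the two types of rhombohedra, and every face is a rhombus whose diagonals are in the golden ratio.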

Geometry and Art: Reflections on Aesthetics

Are these examples by Caris just geometric models or artworks? This question leads to reflections about the relationship between geometry and art as well as aesthetic categories. As mentioned in the introduction, Paul Valéry compared the role of ornamental drawing in art to the role of mathematics in the sciences. Geometry is the groundwork for ornaments, and the found geometric structures drive explorations in art. However, art cannot be equated to geometry. Artwork gives the chance for aesthetically perceptible, materialized geometric structures. Max Bill (1949) explained in “The Mathematical Way of Thinking in the Visual Art of Our Time” that geometry is the primary element of every artwork, or the relationship of the components on the surface or in space. Nevertheless, geometry and art are not identical.


Now in every work of art the basis of its composition is geometry or in other words the means of determining mutual relationship of its component parts either on plane or in space. ( . . . ) It must not be supposed that an art based on the principles of mathematics, such as I have just adumbrated, is in a sense the same thing as a plastic or pictorial interpretation of the latter. (Bill 1949, pp. 7–8)

Bill studied the relationship between structures and art. In his opinion, rhythmical order is the creative act of the artist producing an artwork, starting with a general structure.

Let us start with the extreme case: a plane is covered with a uniform distribution in the sense in which this is understood in statistics; or a uniform network extends into space. This is an order which could be uniformly extended without end. Such an order we here call a structure. In a work of art, however, this structure has its limits, either in space or on the plane. Here we have the basis for an aesthetic argument in the sense that a choice has to be made: the possible, aesthetically feasible extension of the structure. Actually, it is only through this choice to limit the arbitrarily extensible structure on the basis of verifiable arguments that a discernible principle of order becomes comprehensible. ( . . . ) This means that art can originate only when and because individual expression and personal invention subsume themselves under the principle of order of the structure and derive from it a new lawfulness and new formal possibilities. ( . . . ) Such lawfulness and such inventions manifest themselves as rhythm in an individual case. Rhythm transforms the structure into form; i.e. the special form of a work of art grows out of the general structure by means of a rhythmic order. (Bill 1965)

How can aesthetics be substantiated in relation to order structures, rhythm, and, as Bill characterized it, the individual creative decisions that constitute the difference between art and mathematics? Some fundamental assumptions from Information Aesthetics can give criteria for aesthetic measures and evaluations. Information Aesthetics was developed in the 1960s, mainly in Germany and France, as an aesthetic theory on a rational mathematical foundation. The theory was developed by Max Bense (1965), a professor at the University of Stuttgart, and Abraham A. Moles (1966), a professor at the Université de Strasbourg. Today, the term “information aesthetics” is used with a different meaning: the display of huge quantities of data (Nake 2012). Frieder Nake, himself a protagonist of Information Aesthetics, profoundly summarized the theory, its applications, and its criticism in his article. Information is seen as the key concept for understanding aesthetic processes, and the aim of the theory was to create a position opposed to the classical tendencies of aesthetic theory by using formalizations (Gianetti 2004).

There are two roots of this new aesthetic theory: information and aesthetic measures. Information as a root was introduced by Claude E. Shannon (1948) during the rise of communication theory and communication technology. His mathematical information model integrated the stochastic nature of messages. The possible states of a system can be described in combination with a set of transition probabilities going from one state to the next. Bense applied Shannon’s information theory to aesthetics. Successive emergence of structures is achieved by stochastic selections of unstructured material as concrete, perceivable realizations. Aesthetic realizations are seen as part of a communication process. The second root is George David Birkhoff’s aesthetic measure (Birkhoff 1933). He defined the aesthetic measure as a function of the order and the complexity of the viewed configuration: M = O/C, where O is the measure of order relations, symmetries, and harmonies and C is the complexity. Birkhoff described an aesthetic experience as follows:

The typical aesthetic experience may be regarded as compounded of three successive phases: (1) a preliminary effort of attention, which is necessary for the act of perception, and which increases in proportion to what we shall call the complexity (C) of the object; (2) the feeling of value or aesthetic measure (M) which rewards this effort; and finally (3) a realization that the object is characterized by a certain harmony, symmetry, or order (O), more or less concealed, which seems necessary to the aesthetic effect. (Birkhoff 1933, p. 3)

The aesthetic measure of an artwork could thus be calculated from order and complexity as numeric quantities. Bense supplemented Birkhoff’s numeric aesthetics with Information Aesthetics. Bense defined the aesthetic state as the relation of an ordered to a not-ordered state. The artwork gives aesthetic information; it is a material carrier of the aesthetic state. Information is always transmitted by signs. Therefore, information theory is based on semiotics, the science of signs for information transmission. Moles (1966) explained the difference between semantic and aesthetic information. Semantic information refers to the meaning of what appears in the message, which is dependent on conventional signs. Aesthetic information, on the other hand, is how the message appears, the way it is expressed, and is thus bound to individual signs. The artworks of Caris show us this difference very clearly; they give us aesthetic information. Birkhoff’s aesthetic measure M is interpreted by Bense as aesthetic information. The order relations O correspond to redundancy in finding order relationships. Redundant features are necessary for innovations to become recognizable:

A perfect innovation in which there were only new states as in chaos, would not be recognizable. A chaos is finally unidentifiable. The recognizability of an aesthetic state requires not only the recognizability of its singular innovation, but also their identifiability based on their redundant order characteristics. (Bense 1965, p. 356, translated from German by the author)
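The notions of innovation and redundancy can be made concrete with Shannon's entropy. The following Python sketch is an illustration only: it uses the standard information-theoretic definitions (entropy H in bits per sign and relative redundancy 1 − H/H_max), not the specific formalization of Bense or Moles, and the function names and sample strings are assumptions introduced here.

```python
import math
from collections import Counter

def entropy(signs):
    """Shannon entropy in bits per sign, estimated from relative frequencies."""
    counts = Counter(signs)
    total = len(signs)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def redundancy(signs, repertoire_size):
    """Relative redundancy 1 - H/H_max for a repertoire of the given size."""
    h_max = math.log2(repertoire_size)
    return 1 - entropy(signs) / h_max

# Hypothetical sign sequences over a repertoire of four signs A, B, C, D.
ordered = "ABABABABABABABAB"   # only two of the four signs are used
mixed = "ABCDABDCBADCCDAB"     # all four signs occur equally often

print(redundancy(ordered, 4))  # 0.5
print(redundancy(mixed, 4))    # 0.0
```

This first-order measure looks only at how often each sign occurs and ignores the order of the signs, so it captures just one simple aspect of what the text calls redundant order characteristics.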

Therefore, the interplay of redundancy and innovation – order and chaos – has to be in an optimal relation to achieve an aesthetic state. Although the goal of information aesthetics, to offer totally objective methods for evaluating aesthetic objects, has not been reached, these categories of redundancy and innovation remain important for an aesthetic evaluation. Recent developments in art and architecture show increasingly complex, asymmetric, and chaotic trends with a tendency to arbitrariness. Often the artist or architect even states the intention to create complex works. Unfortunately, these aesthetic considerations make it obvious that perceptibility is then no longer achievable and an aesthetic state cannot be established. Only with the help of redundant order features, which can be achieved by geometric order structures, can perceptibility be reached. More details on the aesthetic theory and its relationship to geometry have been explained by the author in “Prolegomena zu einer geometrischen Ästhetik” (Leopold 2011).

By applying these considerations to the art of Caris, we are able to determine that aesthetic states are identifiable by their geometric order, without identifying geometry with art. The artwork presented here transmits the pentagonal system through compositions of material elements, in the form of aesthetic information: the way it is expressed in paintings and sculptures as well as the individual selections from the repertoire as it is formalized in the Pentagrid. Caris’ preference for the pentagon is not due to its form, nor is the form of the pentagon declared an aesthetic object. The preference is rather due to the manifold geometric structures derived from the pentagon. According to Bense (1965, p. 43), the geometric form, here the geometric element pentagon, is not identical to the aesthetic element; only in its materialization in a composition with the individual decisions of the artist does it become the aesthetic element, which is seen as the smallest aesthetic unit and aesthetic structure. Caris refers in his art to the geometry of pentagonal structures in the plane as well as in space. He uses the geometric structures to experience them in aesthetic compositional processes. In this way, he explores many geometric relationships in aesthetic expressions, without reducing the art to these geometric order structures. But structural thinking turns out to be an adequate method to consolidate design and creation processes (Leopold 2012).

Bense later used the term “generative aesthetics”; he defines this as “the summary of all operations, rules, and theorems, through whose application to a set of material elements that are able to function as signs, can deliberately and methodically generate in these aesthetic states (distributions and/or configurations)” (Bense 1965, p. 333, translated by the author). He compared generative aesthetics to the generative grammar for words. These attempts led to the first computer-generated artworks and texts in the 1960s by students of Bense such as Nake.
Although the works of Caris are not computer generated, we can interpret the investigated geometric-mathematical structures, as they are expressed in the Pentagrid, as the underlying generative aesthetics that he also calls “principle structuring element” or “leading principle.” Caris attached great importance to the role of these generative mathematical principles in the creative process for his artworks:

The grid and its three-dimensional version, known as a lattice, is considered as the principle structuring element of reality as well as a leading principle in my personal exploration of art in which an entirely new form of development, self-coined as ‘Pentagonism’, has come into existence. This reveals creations never envisioned before in art, in which an interlinking of art and mathematics becomes self-evident for me and everyone else to see, as well as bringing about an aesthetical appreciation dependent on the prior package of conditioned aspects of the individual.

Caris’ “Pentagonism” should be understood as a radical structural foundation for art and design. With this claim, he works out all the consequences of an alternative to orthogonal structures, showing a more complex, fascinating, and perceptible structural system. Exploring the pentagonal and dodecahedral structures through his artworks leads him back again to the fundamental starting element: the pentagon. In a recent work, he created a pentagon through the radiation of lines, and in a sculptural work of polyhedral net structures, through the arrangement of dodecahedra (Fig. 29). Aesthetics and mathematics merge together as one for him:

Geometric as well as non-algorithmic elements involved in the creation of these new works evoke a sense of Unity in the viewing process in which aesthetics and mathematical logic merge together as one. (Statement of Gerard Caris, 29 September 2018)


Fig. 29 Gerard Caris, Pentagonal Radiation # 2, 2017. Polyhedral Net Structure #2, 1972. (© VG Bild-Kunst, Bonn 2018)

Conclusion

The pentagon evokes its own manifold structures in the plane and in space. Compared to the most common orthogonal system, which is based on the square, the pentagonal system reveals many unique geometric relationships. Through the artworks presented, we can follow the development of the pentagonal system and the related creations of art. The derived geometric structures can serve as fundamental order characteristics in the described meaning of redundancy according to Information Aesthetics. The work of the artist Gerard Caris shows in an extraordinary way how the pentagonal and dodecahedral structures can be a repertoire for many different art expressions in drawings, paintings, and sculptures, each with its own aesthetic information. It is apparent that neither geometry nor mathematics can be directly identified with art, but aesthetics and mathematical logic merge together for Caris. The analysis of pentagonal and dodecahedral structures goes back to the early history of geometry (e.g., Plato, Euclid, Dürer, and Kepler), but new relationships are still being found in recent research. Monohedral tiling with convex pentagons and topological interlocking of dodecahedra, for example, are new research topics connected to advanced digital possibilities. Caris’ art expresses the stimulating role of geometric structures in creation processes. The viewer of his works can take part in his structural thinking processes, which are visualized in methodological tools such as the Pentagrid. The manifold surprising and fascinating patterns as well as the spatial configurations in Caris’ art juxtapose innovation and redundancy, a balance that guarantees perceptible aesthetics.


Cross-References

Tessellated, Tiled, and Woven Surfaces in Architecture

Acknowledgements

Many thanks to the artist Gerard Caris for the opportunity to visit him in his atelier, showing and explaining his work to me, and allowing me to get an inside view of his creation processes. The images of his works of art in the figures here are used with his kind permission, and they are managed and supported by VG Bild-Kunst, Bonn. I am grateful to Margriet Caris for helping me with all of my questions and requests. Thank you to Zvi Hecker, who agreed to allow the use of his drawings and photos of Ramot Polin housing project to explain his design background. Finally, many thanks to Vera Viana for her discussions on the relationship of recent topological interlocking research and Gerard Caris’ respective artworks, as well as for creating the drawings/renderings in Fig. 24 for this paper. Thank you also to Jasmine Segarra for proofreading this chapter.

References

Bense M (1965) Aesthetica. Einführung in die neue Ästhetik. Agis, Baden-Baden, 2nd expanded edn 1982
Bill M (1949) Die mathematische Denkweise in der Kunst unserer Zeit. Werk 36, 3, Winterthur. English version: The mathematical way of thinking in the visual art of our time. In: Emmer M (ed) (1993) The visual mind: art and mathematics. MIT Press, Cambridge, pp 5–9
Bill M (1965) Structure as art? Art as structure? In: Kepes G (ed) Structure in art and in science. Braziller, New York, pp 150–151
Birkhoff GD (1933) Aesthetic measure. Harvard University Press, Cambridge
Bourbaki N (1948) Éléments de mathématique. Paris 1939 ff, L’Architecture des Mathématiques
Caris G (2018) Pentagonism. http://www.gerardcaris.com. Accessed 10 Oct 2018
Conway JH, Lagarias JC (1990) Tiling with polyominoes and combinatorial group theory. J Combin Theory Ser A 53:183–208. Figures available https://commons.wikimedia.org/wiki/File:Penrose_vertex_figures.svg. Accessed 10 Oct 2018
Coxeter HSM (1951) Extreme forms. Can J Math 3:391–441. https://doi.org/10.4153/CJM-1951-045-8
Dürer A (1525) Underweysung der Messung, mit dem Zirckel und Richtscheyt, in Linien, Ebenen und gantzen corporen. Nürnberg, pp 66–69. Online Edition: digital.slub-dresden.de/werkansicht/dlf/17139. Accessed 10 Oct 2018
Dyskin A, Estrin Y, Kanel-Belov A, Pasternak E (2003) Topological interlocking of platonic solids: a way to new materials and structures. Philos Mag Lett 83(3):197–203
El-Said I, Parman A (1976) Geometric concepts in Islamic art. World of Islam Festival Publishing Company Ltd, London, p 82ff
Estrin Y, Dyskin A, Pasternak E (2011) Topological interlocking as a material design concept. Mater Sci Eng C 31(6):1189–1194
Euclid (300 BC) Elements Book XIII. English Version by David E. Joyce, 1996. https://mathcs.clarku.edu/~djoyce/java/elements/bookXIII/bookXIII.html. Accessed 11 Oct 2018
Ghyka MC (1977) The geometry of art and life, 2nd edn. Dover, New York
Gianetti C (2004) Cybernetic aesthetics and communication. Media Art Net. http://www.medienkunstnetz.de/themes/aesthetics_of_the_digital/cybernetic_aesthetics. Accessed 14 Nov 2018
Groß D (2007) Planet “Goldener Diamant”. In: Leopold C (ed) Geometrische Strukturen. Technische Universität Kaiserslautern, Kaiserslautern, pp 28–33


Grünbaum B, Shephard GC (1987) Tilings and patterns. W. H. Freeman, New York, pp 537–547
Hecker Z (2018) http://www.zvihecker.com/projects/ramot_housing-113-1.html and corrections, sent by email. Accessed 27 Oct 2018
Hegel GWF (1835) Vorlesungen über die Aesthetik. In: Hotho HG (ed) Duncker & Humblot, Berlin, p CXVIII
Jansen G, Weibel P (eds) (2007) Gerard Caris. Pentagonismus/Pentagonism. Walther König, Köln
Kanel-Belov A, Dyskin A, Estrin Y, Pasternak E, Ivanov-Pogodaev I (2008) Interlocking of convex polyhedra: towards a geometric theory of fragmented solids. Mosc Math J 10(2):337–342. (arXiv:0812.5089)
Kant I (1783) Prolegomena zu einer jeden künftigen Metaphysik, die als Wissenschaft wird auftreten können. Johann Friedrich Hartknoch, Riga. http://www.uni-potsdam.de/u/philosophie/texte/prolegom/!start.htm
Kepler J (1619) Harmonices Mundi. Lincii Austriae, Linz. Online Edition https://archive.org/details/ioanniskepplerih00kepl. Accessed 10 Oct 2018
Kuperberg G, Kuperberg W (1990) Double-lattice packings of convex bodies in the plane. J Discrete Comput Geom 5:389–397. https://doi.org/10.1007/BF02187800
Leopold C (2011) Prolegomena zu einer geometrischen Ästhetik. In: Kürpig F (ed) Ästhetische Geometrie – Geometrische Ästhetik. Shaker, Aachen, pp 61–65
Leopold C (2012) Strukturelles Denken als Methode. In: Warmburg J, Leopold C (eds) Strukturelle Architektur. Zur Aktualität eines Denkens zwischen Technik und Ästhetik. Transcript, Bielefeld, pp 9–29
Leopold C (2016) Geometry and aesthetics of pentagonal structures in the art of Gerard Caris. In: Torrence E et al (eds) Proceedings bridges Finland. Tessellations Publishing, Phoenix, pp 187–194
Leopold C (2018) Pentagonal structures as impulse for art. In: Emmer M, Abate M (eds) Imagine Math 6. Between culture and mathematics. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-93949-0
MacMahon MPA (1921) New mathematical pastimes. University Press, Cambridge, p 101
Miyazaki K (1986) An adventure in multidimensional space: the art and geometry of polygons, polyhedra, and polytopes. Wiley, New York
Moles AA (1966) Information theory and esthetic perception. University of Illinois Press, Urbana. French original 1958
Nake F (2012) Information aesthetics: an heroic experiment. J Math Arts 6(2–3):65–75. https://doi.org/10.1080/17513472.2012.679458
Plato (360 BC) Timaeus. Translated by Jowett B. Online Edition https://www.ellopos.net/elpenor/physis/plato-timaeus. Accessed 12 Oct 2018
Pöppe C (2015) Unordentliche Fünfeckspflasterungen. Spektrum der Wissenschaft 11/2015, pp 62–67. https://commons.wikimedia.org/wiki/File:PentagonTilings15.svg. Accessed 12 Oct 2018
Rao M (2017) Exhaustive search of convex pentagons which tile the plane. Manuscript: 16, Bibcode: 2017arXiv170800274R. https://perso.ens-lyon.fr/michael.rao/publi/penta.pdf. Accessed 10 Oct 2018
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423, 623–656
Valéry P (1895) Introduction à la méthode de Léonard de Vinci. La Nouvelle Revue Française, Paris
van de Craats J (2007) Rhombohedra in the work of Gerard Caris. In: Jansen G, Weibel P (eds) Gerard Caris. Pentagonismus/Pentagonism. Walther König, Köln, pp 44–48
Viana V (2018a) From solid to plane tessellations, and back. Nexus Netw J 20:741–768. https://doi.org/10.1007/s00004-018-0389-5
Viana V (2018b) Topological interlocking of convex regular polyhedra. In: Leopold C, Robeller C, Weber U (eds) RCA 2018. Research culture in architecture x international conference on cross-disciplinary collaboration. Conference book. Fatuk – Faculty of Architecture, Technische Universität Kaiserslautern, 2018, pp 254–257


Walther E (2004) Philosoph in technischer Zeit – Stuttgarter Engagement. Interview mit Elisabeth Walther, Teil 2. In: Büscher B, von Herrmann H-G, Hoffmann C (eds) Ästhetik als Programm. Max Bense/Daten und Streuungen. Diaphanes, Berlin, pp 62–73, translated by Cornelie Leopold
Weisstein EW “Dual Tessellation”. From MathWorld – a Wolfram web resource. http://mathworld.wolfram.com/DualTessellation.html. According to Williams R (1979) The geometrical foundation of natural structure: a source book of design. Dover, New York, p 37. Accessed 10 Oct 2018

12 Mathematics and Origami: The Art and Science of Folds

Natalija Budinski

Contents

Introduction
Modern Origami and Mathematical Axiomatization
Origami and the Delian Problem
Modular Origami
Origami and Technology
Art of Origami
Conclusion
Cross-References
References

Abstract

Origami is usually connected with fun and games, and the most common association with origami is a paper-folded crane, which has a special place in Japanese culture. The popularity of modern origami has grown in many aspects, including mathematical, scientific, artistic, or even as an enjoyable craft. This chapter describes the developmental path of origami from simple paper folding to a serious scientific discipline. It might be said that it all started with the discovery of the mathematical rules behind the folds, which led to the axiomatization of origami and its establishment as a mathematical discipline. Problems that cannot be solved with straightedge and compass, such as doubling the cube, became solvable thanks to origami. Interestingly, while developing as a scientific discipline, origami has also been establishing itself as a form of modern art and is a great inspiration to many contemporary artists in different creative disciplines, such as sculpture, fashion, or design. Examples of these contemporary art forms are shown in this chapter.

N. Budinski, Petro Kuzmjak School, Ruski Krstur, Serbia
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_13

Keywords Origami · Mathematical · Artistic · Crane · Scientific

Introduction

There is no consensus on where origami originated, but it is assumed that its roots are in China and are associated with the discovery of paper. However, the craft of origami blossomed in Japan, where it is treated as a national art. Origami is usually connected with fun and games, and the most common association with origami is a crane, which has a special place in Japanese culture. The crane, as shown in Fig. 1, can be folded from a square piece of paper.

Fig. 1 Origami crane

The popularity of modern origami has grown in many aspects, including mathematical, scientific, artistic, or even as an enjoyable craft. Today it is a complex discipline with a preference for simplicity, where less is more. By less, we mean the number of folds. The folding process is as important as the final result. Complicated and tiresome folding results in stiff, messy, and unappealing origami (Kasahara 1973). The two basic rules that provide simplicity were given by Robert Harbin (1956). The first is that models are obtained without scissors and glue, only by folding, and the second is that the shape of a model should be recognizable without additional colors or markings. These rules are mostly obeyed by enthusiastic origamists, but variations with magnificent results are also possible. The restriction to uncut and unglued paper induces sparks of creativity. Each step can be reversed and studied, changed, or improved, which contributes to the freshness and freedom of expression. Origami is an art that communicates and shares, based on ordinary paper folding, which makes it appealing to common people.

There are many kinds of origami with respect to folding, for example, flat origami, modular origami, wet origami, and origami tessellation. Flat origami produces models that can be pressed flat (Hull 2002; Schneider 2004) without additional creases, such as the well-known crane or similar models. Modular origami assembles folded pieces (modules) into a model. It is interesting that the modules are obtained by simple folds, while the finished models can be very complex. Different types of polyhedra are usually made with this technique. Wet origami is a type of origami where models are obtained from paper that is dampened, which allows easy molding and gentle curves. Natural-looking models, such as animals and plants, are often made by this technique. Origami tessellation is a technique where models are obtained by folding a repeating pattern (Verrill 1998).

Also, there is pureland origami, a branch of origami proposed by John Smith. He established a minimalistic aesthetic of design and made origami accessible and suitable for beginners in folding, as well as for disabled children and people with difficulties in hand manipulation. In Fig. 2, we can see an example of a pureland origami model, the samurai hat. These models exhibit their own elegance and harmony. The principles of pureland origami are few, exact, and simple. One is that only square paper can be used. The other is that only mountain and valley folds can be used in folding, with permission to unfold and turn the model over. Folds are created or manipulated one at a time. It is interesting to note that this set of strict constraints not only opens new possibilities for origami but also allows for new artistic approaches. Each folding result must provide a strong suggestion of a final form in which the essence of the structure is appreciated. This kind of simplicity provides a new perspective on the art of origami (Smith 1993).

Fig. 2 Pureland origami model of samurai hat

Modern Origami and Mathematical Axiomatization

When talking about origami, we need to mention the father of modern origami, Akira Yoshizawa (1911–2005), who invented around 50,000 origami models and diagrams and described some of them in 18 books. Yoshizawa’s intention was to make models based on simple folding lines that anyone could follow (Smith 2011). His modifications of traditional design made origami a creative art with vast potential and numerous followers. Through the work of Akira Yoshizawa, a folder can discover recommendations for successful paper folding. According to Yoshizawa, a folder should have a sophisticated respect for paper and a firm sense of shape. The results should express the inner characteristics of the subject and emphasize suggestion rather than explanation (Konjevod 2008). Artistic moments enable folders to feel the spirit of origami. To Yoshizawa, origami was more than diagrams and geometry, even though the diagrams of his models were origami’s introduction to the world of mathematics.

Yoshizawa was very systematic in his work and represented origami folding symbolically, which led to the development of a system of origami folds. For example, the fold shown in Fig. 3 (on the left) is called a valley fold or crease and is represented in diagrams by ———–, while the fold shown in Fig. 3 (on the right) is called a mountain fold or crease and is represented by lines of the kind -•-•-•-. Mathematically, it can be said that mountain folds are convex and valley folds are concave (Hull 2003). Turning the paper over interchanges valley and mountain folds, so the two sets of folds can be considered as dual (Dureisseix 2012). Origami models can be described with crease patterns to some extent. A crease pattern is a representation of the crease types on the unfolded paper. In Fig. 4 we can see the crease pattern of the crane from Fig. 1. The issue is that crease patterns do not contain all the information needed to describe the folded model, and determining whether a general crease pattern can be folded is an open question (Maehara 2010).

Fig. 3 Mountain and valley folding

Fig. 4 Crease pattern of crane

Flat origami is loaded with mathematical problems. Flat paper folding follows rules that can be described mathematically. The flatness of the paper enables us to observe an origami model in two dimensions, even though it is a three-dimensional model, without compromising any information about layer overlapping (Schneider 2004). There are established mathematical rules that govern the crease patterns of flat origami models. For example, take a piece of paper and mark a point somewhere in its interior. If we fold the paper flat with creases that pass through that point and then count the mountain and valley creases, the difference between the number of mountains and the number of valleys is always two. This claim is known as Maekawa’s theorem (Justin 1986a): |M − V| = 2, where M is the number of mountain folds and V is the number of valley folds at a vertex. It follows that the number of creases at each vertex is even, and if we regard the crease pattern of a flat origami figure as a graph, its faces are two-colorable: the regions bounded by the creases can always be colored with two colors in such a way that regions sharing a border receive different colors. Figure 5 shows the crane crease pattern colored in two colors (Hull 1994).

The second very important rule of folding is described by Kawasaki’s theorem. It says that a crease pattern with a single vertex is flat-foldable if and only if the alternating sum of the consecutive angles between the creases around the vertex is zero. An example is shown in Fig. 6. The alternating sum of the angles around the vertex (clockwise from the bottom) is 90° − 45° + 22°30′ − 22°30′ + 45° − 90° + 22°30′ − 22°30′ = 0°, so it can be concluded that this crease pattern is flat-foldable. This criterion cannot easily be extended to crease patterns with more than one vertex (Hull 1994). It is also important to state that the paper sheet can never penetrate a fold.
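The two single-vertex conditions, Maekawa's and Kawasaki's, are easy to check mechanically. The following Python sketch is an illustration under stated assumptions rather than a method from the chapter: the function names are hypothetical, creases at a vertex are encoded as a string of 'M' and 'V', and the angles are the consecutive sector angles in degrees around one interior vertex. The angle list reproduces the example of Fig. 6.

```python
from fractions import Fraction

def satisfies_maekawa(folds):
    """folds: string of 'M' (mountain) and 'V' (valley) creases at one vertex."""
    return abs(folds.count("M") - folds.count("V")) == 2

def satisfies_kawasaki(angles):
    """angles: consecutive sector angles in degrees around one interior vertex."""
    # An interior vertex needs an even number of sectors that add up to 360.
    if len(angles) % 2 != 0 or sum(angles) != 360:
        return False
    alternating = sum(a if i % 2 == 0 else -a for i, a in enumerate(angles))
    return alternating == 0

half = Fraction(45, 2)  # 22 degrees 30 minutes, kept exact
fig6_angles = [90, 45, half, half, 45, 90, half, half]

print(satisfies_kawasaki(fig6_angles))        # True
print(satisfies_kawasaki([100, 80, 90, 90]))  # False
print(satisfies_maekawa("MMMV"))              # True
print(satisfies_maekawa("MVMV"))              # False
```

Exact fractions are used so that the comparisons with 360 and 0 are not affected by floating-point rounding. Both checks apply to a single vertex only; passing them at every vertex does not by itself show that a crease pattern with several vertices folds flat.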
During folding, the lengths of curves drawn on the paper are preserved, because no cuts are made. If we neglect the paper thickness, this geometric transformation can be considered an isometric embedding (Lebee 2015). Besides theorems, there are origami axioms. Origami axioms are basic operations, each of which defines a single crease by aligning some combination of points and lines on a paper sheet. They are mostly known as the Huzita axioms because they were first formally proposed by Humiaki Huzita (1989, 1992), even though other mathematicians, such as Jacques Justin, also worked on the topic (Justin

Fig. 5 Two colored crane crease-pattern

N. Budinski

Fig. 6 Illustration of Kawasaki’s theorem

Fig. 7 Graphical representation of the first axiom

Fig. 8 Graphical representation of the second axiom

Fig. 9 Graphical representation of the third axiom

1986b). Koshiro Hatori added a seventh axiom in 2001 (Hatori 2001), and Robert Lang proved that the resulting system of axioms is complete (Lang 2003; Alperin and Lang 2006). This mathematically formal description explains which geometric constructions are possible with origami. The axioms are listed below, and their graphical representations are provided in Figs. 7, 8, 9, 10, 11, 12, and 13, respectively.

Fig. 10 Graphical representation of the fourth axiom

Fig. 11 Graphical representation of the fifth axiom

Fig. 12 Graphical representation of the sixth axiom

Fig. 13 Graphical representation of the seventh axiom

The first axiom: Given two different points A1 and A2, there is a unique fold passing through both (ruler operation).

The second axiom: Given two different points A1 and A2, there is a unique fold placing A1 onto A2 (perpendicular bisector).

The third axiom: Given two different lines p1 and p2, there is a fold placing p1 onto p2 (angle bisector).

The fourth axiom: Given a line p1 and a point A1, there is a unique fold perpendicular to p1 and passing through A1 (foot of the perpendicular).

The fifth axiom: Given two different points A1 and A2 and a line p1, there is a fold placing A1 onto p1 and passing through A2 (tangent to a parabola from a point).

The sixth axiom: Given two different points A1 and A2 and two different lines p1 and p2, there is a fold placing A1 onto p1 and A2 onto p2.

The seventh axiom: Given a point A and two different lines p1 and p2, there is a fold placing A onto p1 and perpendicular to p2.
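The first two axioms have direct coordinate descriptions, which can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, a crease is represented as a triple (a, b, c) meaning the line a·x + b·y = c, and folding is modeled as reflection across that line.

```python
def fold_through_points(p, q):
    """Axiom 1: the crease passing through two distinct points p and q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2          # normal vector, perpendicular to q - p
    return a, b, a * x1 + b * y1


def fold_point_onto_point(p, q):
    """Axiom 2: the crease placing p onto q (perpendicular bisector of pq)."""
    (x1, y1), (x2, y2) = p, q
    a, b = x2 - x1, y2 - y1          # normal vector points along q - p
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    return a, b, a * mx + b * my


def reflect(point, crease):
    """Image of a point when the paper is folded along the crease."""
    a, b, c = crease
    x, y = point
    t = (a * x + b * y - c) / (a * a + b * b)
    return x - 2 * a * t, y - 2 * b * t


# Folding (0, 0) onto (2, 0) uses the vertical crease x = 1.
crease = fold_point_onto_point((0, 0), (2, 0))
print(crease, reflect((0, 0), crease))
```

Points lying on a crease are left fixed by `reflect`, which is one quick way to confirm that the crease from the first axiom really passes through both given points.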

Origami and the Delian Problem

The idea of discovering the mathematics behind paper folds appeared before the Huzita–Hatori axioms. Sundara Row wrote a book titled “Geometrical Exercises in Paper Folding,” published in 1893 in India, which suggested explanations of the mathematical concepts underlying paper folding. In Fig. 14, we can see the book’s paper-folding explanation of the square of a binomial. In 1930, the Italian mathematician Margherita Beloch Piazzola proposed that paper folding could be used as a tool for solving geometrical problems. She analyzed algebraic aspects of origami in her work and proposed paper folding as a way to solve third-degree equations. In her honor, Axiom 6 is also called the Beloch fold (Liu 2017), which corresponds to solving third-degree equations. Today, it is well known that the mathematical basis of origami enables solutions of equations, such as quadratic, cubic, and quartic equations, with rational coefficients. The problems of doubling the cube and trisecting an angle can also be solved with origami, and constructing cube roots is possible due to the origami axioms.

Fig. 14 Part of “Geometrical exercises in paper folding” (Row 1893)

The problem of doubling the cube is also known as the Delian problem and is significant in the history of mathematics (Burton 2006; Zhmud 2006). It is based on calculating the volume of geometric solids. Doubling a unit cube depends upon constructing a line segment of length $\sqrt[3]{2}$, which is impossible with compass and straightedge, the tools of Euclidean geometry. The solution based on origami was found in 1986 by Peter Messer (Messer 1986). The instructions for the solution based on folding are shown in Fig. 15. There are five steps. First, a square sheet of paper is folded in half. Then it is folded to match the line segments AC and BE. The third step is to fold the sheet into three equal parts, and the fourth is to fold corner C onto the line AB while placing point I on line FG. The point C then divides AB into the segments CB and AC in the ratio $1 : \sqrt[3]{2}$. The paper folded solution is shown in Fig. 16.

Let BC = 1, x = AC, and y = BR, so that AB = x + 1 and CR = 1 + x − y. Applying the Pythagorean theorem, $CR^2 = BR^2 + BC^2$, we obtain $(1 + x - y)^2 = y^2 + 1$. Simplifying the expression, we obtain

$$y = \frac{x^2 + 2x}{2x + 2}. \qquad (1)$$

Triangles IFC and CBR are similar, which indicates that

$$BR : CR = FC : IC. \qquad (2)$$

The segment FC is part of segment AB, which means that AB = AF + FC + CB, that is, $1 + x = \frac{1}{3}(1 + x) + FC + 1$, so $FC = \frac{2x - 1}{3}$. Applying this to (2), with $IC = \frac{1 + x}{3}$, we obtain

$$\frac{y}{1 + x - y} = \frac{\frac{2x - 1}{3}}{\frac{1 + x}{3}} = \frac{2x - 1}{x + 1}. \qquad (3)$$

Substituting (1) into (3) and simplifying gives $x^3 + 3x^2 + 2x = 2x^3 + 3x^2 + 2x - 2$, and therefore $x^3 = 2$.
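The algebra of this derivation can be checked numerically. The sketch below assumes the labeling used in the text (BC = 1, x = AC, y = BR, FC = (2x − 1)/3, and IC equal to one third of the side AB); the function name is hypothetical. It measures how far a trial value of x is from satisfying the similar-triangle relation BR : CR = FC : IC.

```python
def similar_triangle_gap(x: float) -> float:
    """Return BR/CR - FC/IC for a trial length x = AC (with BC = 1).

    y = BR is taken from the Pythagorean relation
    (1 + x - y)**2 = y**2 + 1, so that relation holds for every x;
    only the similar-triangle relation singles out a specific x.
    """
    side = 1 + x                          # AB
    y = (x**2 + 2 * x) / (2 * x + 2)      # BR
    cr = side - y                         # CR
    fc = (2 * x - 1) / 3                  # FC
    ic = side / 3                         # IC
    return y / cr - fc / ic


cube_root_of_two = 2 ** (1 / 3)
print(similar_triangle_gap(cube_root_of_two))   # numerically zero
print(similar_triangle_gap(1.5))                # clearly nonzero
```

A gap of zero (up to rounding) at x = 2^(1/3) and a nonzero gap elsewhere is consistent with the conclusion that the fold forces x³ = 2.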

Fig. 15 Peter Messer’s instruction for the origami solution of the Delian problem (Fenyvesi et al. 2014)

Fig. 16 Paper folded solution of the Delian problem

Modular Origami

Besides flat origami, modular origami also attracts scientific attention due to its visually attractive models. It is popular in the world of mathematical art, and there are many books with instructions for folding shapes like polyhedrons (Kasahara 2003; Mukerji 2007). Unlike flat or traditional origami, models of modular origami are made by assembling units. Usually, all the units are the same and easy to fold, while the final models are complex and require certain skill to assemble. Figure 17 represents Platonic solids folded by the principles of modular origami. Platonic solids are regular geometrical solids, named after Plato. There are only five of them: tetrahedron, cube, octahedron, dodecahedron, and icosahedron. This is because at least three faces must meet at each vertex, and the sum of the face angles meeting at a vertex has to be less than 360°; otherwise the faces would lie flat. Because of their aesthetic beauty and interesting mathematical properties, they are an inspiration to mathematicians and artists. Modular origami can be used for assembling different types of polyhedrons. For example, Fig. 18 shows a stellated icosahedron, in which a triangular pyramid rises from each of the 20 triangular faces. This polyhedron was made from 30 units called sonobe. The instructions for sonobe folding are in Fig. 19. In Fig. 20, we can see objects called Epcot balls, each of which is made from 270 sonobe pieces. The origami polyhedrons shown in these figures were made by high school students from Petro Kuzmjak School in Serbia in their mathematics lessons dedicated to the geometrical properties of polyhedrons.
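The counting argument for the five Platonic solids can be reproduced in a few lines. This is a sketch under the assumptions stated in the text: every face is a regular p-gon, q ≥ 3 faces meet at each vertex, and the face angles at a vertex must sum to less than 360°. The function name and the search bounds are illustrative choices.

```python
from fractions import Fraction

NAMES = {
    (3, 3): "tetrahedron",
    (4, 3): "cube",
    (3, 4): "octahedron",
    (5, 3): "dodecahedron",
    (3, 5): "icosahedron",
}


def platonic_candidates(max_sides: int = 12, max_faces_at_vertex: int = 12):
    """List pairs (p, q): q regular p-gons fit around a vertex with a gap."""
    found = []
    for p in range(3, max_sides + 1):
        interior_angle = Fraction(180 * (p - 2), p)   # degrees, exact
        for q in range(3, max_faces_at_vertex + 1):
            if q * interior_angle < 360:
                found.append((p, q))
    return found


for p, q in platonic_candidates():
    print(f"{q} faces with {p} sides at each vertex: {NAMES[(p, q)]}")
```

The angle condition only rules candidates out; that each of the five surviving pairs is actually realized by a solid is the classical result quoted in the text.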

Origami and Technology

It is almost unbelievable that a simple transformation of paper can produce so many amazing properties, which inspire many mathematicians, artists, designers, scientists, and engineers. It is almost impossible to make a comprehensive list of all

Fig. 17 Platonic solids made by origami technique

Fig. 18 Origami made stellated icosahedron

Fig. 19 Instructions for folding sonobe unit

Fig. 20 Epcot balls, each made from 270 sonobe units

Fig. 21 Miura-ori folding pattern. (Wikimedia Commons, source: www.commons.wikimedia.org/wiki/File%3AMiura-Ori_CP.svg)

the applications and results influenced by origami. Anyone who has tried origami might notice that a simple arrangement of folds results in rich motion. Many advantages of paper folding, such as simple transformation of matter, the use of inexpensive material without cuts, and easy development of three-dimensional shapes, form a basis for investigating technological and architectural applications (Peraza-Hernandez et al. 2014; Sorguç et al. 2009; Dureisseix 2012; Lalloo 2014). The latest challenges are in the fields of robotics (Felton et al. 2014), technology, and space research. Miura folding (also called Miura-ori) has become synonymous with the connection between origami and science. This folding pattern, proposed by Koryo Miura, allows a sheet of paper or some other material to be folded flat. The paper version of the Miura-ori folding pattern is represented in Fig. 21. It can be noticed that the mountain and valley folds form congruent parallelograms. After the creases are made, the whole sheet can be folded or unfolded in one simultaneous motion (Mahadevan and Rica 2005). This gives the impression that the folds are “remembered,” so it is also called surface memory origami (Nishiyama 2012). Flat surfaces that are Miura folded are collapsible, transportable, and deployable objects. This makes them suitable for designing a range of objects, from robots to small surgical devices. The future of Miura-folded objects is bright, since the pattern can be applied to a wide range of materials, including graphene, which is practically a one-atom thick

Fig. 22 The space satellite with foldable solar panels. (Courtesy: National Science Foundation, source: www.nsf.gov)

material (Turner et al. 2016). The Miura-ori can also be applied to rigid materials, which is called rigid origami, and it has great practical importance for and influence on technology. It is used in astronautical engineering for space satellites and their solar panels. In 1995 a Japanese research vessel was launched into space with a solar array that was folded using the Miura-ori pattern. The space satellite with its foldable solar panel is shown in Fig. 22. Rigid origami, unlike paper origami, deals with flat rigid sheets, which cannot bend during the folding process. Rigid origami models are made from sheet metal or some other material, with hinges in place of the crease lines. Many patterns that can be folded in the conventional paper way cannot be folded rigidly (Abel et al. 2016). Besides space exploration, rigid origami has many applications in kinetic architecture (Tachi 2011) and robotics (Balkcom 2002). Origami and Miura-ori also find applications in the construction of metamaterials. Metamaterials are engineered materials with properties that cannot be found in nature. Their unusual properties arise from the design of their structure, which is built up from smaller units. Origami metamaterials consist of units that are tessellated folding patterns such as Miura-ori. Metamaterials based on Miura-ori exhibit many interesting properties with vast possibilities for applications. For example, the Miura-ori fold exhibits both positive and negative Poisson’s ratios, where Poisson’s ratio is the ratio of the transverse contraction strain to the longitudinal extension strain in a simple tension experiment (Lakes and Witt 2002; Lv et al. 2014). These unique features are a valuable starting point for innovative metamaterials design, which in general can give us a new perspective on materials.
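The sign convention behind a “negative Poisson’s ratio” can be made concrete with a small sketch. The function name is hypothetical and the strain values are invented purely for illustration; they are not measurements of any Miura-ori sample. Strains are signed, so a transverse contraction is negative.

```python
def poissons_ratio(transverse_strain: float, longitudinal_strain: float) -> float:
    """Poisson's ratio from signed strains in a simple tension test.

    With signed strains the ratio is -(transverse / longitudinal), so a
    material that narrows when stretched has a positive ratio and one
    that widens when stretched has a negative ratio.
    """
    if longitudinal_strain == 0:
        raise ValueError("longitudinal strain must be nonzero")
    return -transverse_strain / longitudinal_strain


# Hypothetical values: stretched by 1 % along the loading direction.
print(poissons_ratio(transverse_strain=-0.003, longitudinal_strain=0.01))
print(poissons_ratio(transverse_strain=0.002, longitudinal_strain=0.01))
```

The first call models a conventional material that contracts sideways; the second models a sheet that expands sideways while being stretched, which is the behavior a negative ratio describes.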

Art of Origami

Besides their role in science, mathematical concepts are very inspiring and lead to unique artistic expression outside of tradition. In the case of origami, mathematical concepts became crucial in origami design and allowed for better and more diverse forms of art (Lang 2009). It is not easy to describe the connections between mathematics and origami art. The computer scientist David Huffman said he finds it natural that elegant mathematical theorems joined with paper surfaces should lead to a certain visual elegance (Wertheim 2004). Artistic expression is noticeable from the very beginnings of origami. Figure 23 shows instructions from the first origami book, Hiden Senbazuru Orikata (English: Secret to Folding Thousand Cranes), published in Japan in 1797. The boundaries of origami are pushed not just by science but also by art. Contemporary artists combine traditional origami aesthetics with new concepts, modern materials, and innovative technology, producing marvelous designs. There are numerous ways of artistic expression through paper folding and geometric structures. The Demaines, father and son artists and mathematicians, suggest advantages of combining art and mathematics in their approach. First, the two disciplines complement each other. According to them, creating art allows different insights into mathematics, while mathematics inspires new art. Secondly, this approach enables more fluent work. On the one hand, when mathematics gets problematic,

Fig. 23 A page from the first origami book published in Japan in 1797. (Wikimedia Commons, source: commons.wikimedia.org/wiki/Category:Hiden_Senbazuru_Orikata)

one can switch to visual representation, and on the other hand, art can be analyzed in light of mathematical understanding (Demaine and Demaine 2009). Origami is a great medium for developing structural form, because its characteristics of development and folding are useful for the design of deployable structures. However, it has its restrictions. If folding is combined with bending, the result is called curved folding, which can be applied to materials that come in sheets (Demaine et al. 1999). Curved-crease art models are shown in Fig. 24. The work is called “Computational origami”; it was made by Erik and Martin Demaine and is on display at the Museum of Modern Art (MoMA) in New York as part of the permanent collection. Curved-crease sculptures were known at the beginning of the last century as a result of Josef Albers’ work at the famous Bauhaus art school in Germany and later at Black Mountain College in North Carolina. As both an artist and a professor, Albers encouraged experimenting with different materials, including paper, and held a preliminary course in “paper folding.” In Fig. 25 we can see the work of students of paper studies at Black Mountain College (Adler 2004). This course had great pedagogical value, since paper folding enabled students to explore constructions through hands-on activities. Materials such as paper have certain limitations, but according to Albers, such constraints should awaken students’ creativity. His approach greatly influenced modern architecture, art, and design. Mathematicians and engineers study his models and examine their geometrical and mechanical features (Magrone 2015). The family of pleated origami models, such as pleated hyperbolic paraboloids, comprises simple repeated patterns of mountain and valley folds, forming concentric shapes. It is fascinating that paper somehow finds a configuration of equilibrium, where flat parts remain flat, while creased parts

Fig. 24 “Computational origami” by Erik and Martin Demaine (Demaine and Demaine 2009, source: www.erikdemaine.org)

Fig. 25 Picture of students’ work in Black Mountain’s paper studies course, from the thesis “A New Unity, the Art and Pedagogy of Josef Albers” by Esther Dora Adler (2004)

Fig. 26 Paper folded hyperbolic paraboloid

remain curved (Demaine et al. 2011). A simple model of a paper folded hyperbolic paraboloid is shown in Fig. 26. Paul Jackson highlights paper folding as an excellent starting point for teaching design, since it is a simple and inexpensive way of transforming matter (Jackson 2011). He is considered a pioneer of the one crease style. His results come from exploring the possibilities of a sheet of paper with minimal folds, or even a single fold. Jackson can be considered an artistic minimalist and the creator of recognizable and appealing models obtained from very few folds (Smith 2011), in which the interplay of light and shade on the sheet is essential (Konjevod 2008). Figures 27, 28, and 29 show the work of Paul Jackson. Even though there is no solid definition of minimalism beyond using the smallest possible number of folds, Jackson’s challenge of minimalism was accepted by other origamists. Paola Versnic’s (2004) Santa Claus is a masterpiece of origami minimalism. The pose of the model is remarkable, sensitive, and impressive, and it is achieved in a simple way. Even though it is extremely plain, it is immediately recognizable and attracts instant interest and attention. It is shown in Fig. 30.

Fig. 27 Minimalism of Paul Jackson. (Used with author’s permission)

Fig. 28 Organic Abstract by Paul Jackson. (Used with author’s permission)

Origami might seem too simple to be important, but Robert Lang, one of the leading origamists in the world, has built bridges among mathematics, science, and nature. Lang holds a doctorate in physics, but he dedicated his work to origami. With his work he proved that origami is not just a playful activity but a potential problem solver in the fields of design, fashion, electronics, robotics, and space research (Orlean 2017). His book Origami Design Secrets has become an origami bible. Origami-related terminology suggested by Lang, such as circle packing or uniaxial base, has become very well accepted. Besides creating numerous origami diagrams, Lang produced two computer programs for implementing origami designs. The first is called TreeMaker. TreeMaker is a computer program that designs origami bases. It produces crease patterns for forms resembling a “tree,” such as people or bugs. The second computer program is called ReferenceFinder, which is also

Fig. 29 Paper bags by Paul Jackson. (Used with author’s permission)

Fig. 30 Folded Paola Versnic’s Santa Claus origami model. (Source: www.orihouse.com)

very useful since it gives folding instructions for the patterns. Robert Lang changed the very meaning of origami and increased its importance dramatically. Origami has become more complex and practical. According to Lang, origami has developed along three lines that are not strictly separated: mathematical, computational, and engineering. Besides developing these foundations of origami, Lang left his mark on the art of origami. According to

Fig. 31 Dogwood blossom, Opus 688 by Robert Lang. (Used with author’s permission, source: www.langorigami.com)

Fig. 32 White Rhinoceros, Opus 714 by Robert Lang. (Used with author’s permission, source: www.langorigami.com)

him, origami can be compared with music. Lang compares folding instructions to a performance guide that still allows the performer to express themselves. Figures 31, 32, and 33 show the artwork of Robert Lang. For the displayed artworks, Lang uses various mediums: Korean hanji, Japanese paper, and an American foil, respectively. Regardless of the medium, each piece is breathtaking.

Fig. 33 Elevated Icosahedron (gold) by Robert Lang. (Used with author’s permission, source: www.langorigami.com)

Fig. 34 Allomyrina dichotoma, Opus 655 by Robert Lang. (Used with author’s permission, source: www.langorigami.com)

Robert Lang is recognized, among his other famous paper figures, for origami insects that are very complex, colorful, and realistic. Before Lang, there were very few origami models of insects. He raised insect folding to an artistic level by finding inspiration in spiders, scorpions, and other arthropods, which can be both fascinating and disturbing (Lang 1995). Lang’s models are beautiful and elegant and somehow absorb negative emotions. Some of his extraordinary passion for art involving paper folded insects is shown in Figs. 34, 35, and 36.

Fig. 35 Allomyrina Yellow Jacket, Opus 624 by Robert Lang. (Used with author’s permission, source: www.langorigami.com)

Fig. 36 Allomyrina Stag Beetle, Opus 220 by Robert Lang. (Used with author’s permission, source: www.langorigami.com)

Wet folding is a widely accepted technique established by Yoshizawa. It gives origami models a more realistic appearance. In Figs. 37 and 38, we can see works by the origami artist Daniel Chang made with this technique. Animals made by this origamist take on a very life-like expression. His artistic expression is detached from any references: he creates his designs by himself, and his models look like clay models rather than folded paper. Paper puts many limitations on those who fold, but origami artists such as Chang overcome these limitations with their lavish imagination, resulting in astonishing artwork. Chang is distinguished for paper folded face sculptures. Catching a facial expression in paper is quite challenging, but his results bring inner emotions to the surface accurately. In Fig. 39, we can see his artwork called “Female Hair Style.”

Fig. 37 Protector by Daniel Chang. (Used with author’s permission, source: www.flickr.com/photos/mitanei)

Fig. 38 Prancing pony by Daniel Chang. (Used with author’s permission, source: www.flickr.com/photos/mitanei)

Fig. 39 Female hair style by Daniel Chang. (Used with author’s permission, source: www.flickr.com/photos/mitanei)

Origami tessellations are a branch of flat origami in which paper is folded into tessellated patterns. Tessellations are patterns consisting of shapes arranged side by side without gaps. The patterns can repeat for as long as the folding continues. Each origami tessellation consists of the following patterns: the crease, front, back, and light pattern. The light pattern can be seen when the origami is held up to the light (Verrill 1998). The most common types of origami tessellations are classic and corrugation. Classic tessellations are based on square or hexagonal grids that are folded into an odd number of layers forming patterns. Corrugation tessellations are based on a layer in which the tessellations are made into waves and wrinkles in the paper. The discovery of flat origami tessellations is attributed to Shuzo Fujimoto (Fujimoto 1976; Lister 1997). The world of origami tessellation is enriched by Joel Cooper’s striking three-dimensional sculptures of origami tessellations with exciting elements of tessellated nets. His artistic style is distinguished by folded masks inspired by bronze sculptures. Face shapes are made from a single paper sheet folded in tessellations. In Figs. 40 and 41, we can see Cooper’s female and male masks made with the origami tessellation technique. The work of David Huffman has been recently revealed to the public, and among the many things he did, he is recognized for exceptional tessellations that are very different not only from most modern paper folded tessellations but also from Fujimoto’s, which are considered historical. Those tessellations are three dimensional and unlocked, which means that they can be rigidly folded into their final form by bending the material only along the crease lines (Davis et al. 2013). Another important reference in origami tessellation folding is Chris Palmer, whose designs of Queen

Fig. 40 Eurydice 6 by Joel Cooper. (Used with author’s permission, source: www.flickr.com/photos/origamijoel)

Fig. 41 Hector 1 by Joel Cooper. (Used with author’s permission, source: www.flickr.com/photos/origamijoel)

Fig. 42 Queen Box Decoration design by Chris Palmer, folded by Jorge Jaramillo. (CC BY 2.0, source: www.flickr.com/photos/georigami)

Box Decoration and Five Points Flower Tower are shown folded in Figs. 42 and 43, respectively. When analyzing the connections between origami and mathematics, we need to mention Thomas Hull who has dealt with this topic in both mathematical and artistic ways. Hull is also known for using techniques that combine more than one

Fig. 43 Five Points Flower Tower design by Chris Palmer, folded by Jorge Jaramillo. (CC BY 2.0, source: www.flickr.com/photos/georigami)

Fig. 44 Torus 1 by Thomas Hull. (Used with author’s permission, source: www.flickr.com/photos/33761183@N00)

piece of paper and explore advanced mathematical concepts of origami. Thomas Hull’s field of research is the mathematics of origami, but he is also known for his models of polyhedrons and geometrical objects. These stunning geometrical objects are an excellent visual example of the bridge among mathematics, origami, and art. In Fig. 44 we can see a torus, which is a three-dimensional triangle twist tessellation, while in Fig. 45 we can see a 3D grid made from a square grid. Hull also uses wet-folding techniques, and a result of that work is in Fig. 46, where we can see a hyperbolic paraboloid. The boundary of the paper follows a Hamiltonian circuit on the cube graph. His model of Five Intersecting Tetrahedra (FIT) has been selected as one of the ten best origami models of all time by the British Origami Society. In Fig. 47, we can see one example of FIT. Hull is also known as the inventor of the PHiZZ unit that is used for modular origami (Hull 2006). In Fig. 48 we can see his work called “Phizz Variation 2: rhombicosidodecahedron,” made from 120 PHiZZ units.

Fig. 45 3D Grid by Thomas Hull. (Used with author’s permission, source: www.flickr.com/photos/33761183@N00)

Fig. 46 Cube 1 by Thomas Hull. (Used with author’s permission, source: www.flickr.com/photos/33761183@N00)

Besides mathematics, origami can be combined with robotics in artistic expression. Matthew Gardiner works in the fields of art and science, connecting origami and robotics. He coined the term oribotics to describe his work. He bases his research on nature, origami, and robots and connects aesthetics and biomechanics. In Fig. 49 we can see Matthew Gardiner’s interactive gardens, rich in color, form, and material. Jun Mitani, a Japanese computer scientist, is referred to as an origami magician due to his creations of extraordinarily complex and so-called organic origami

Fig. 47 Five Intersecting Tetrahedra, design by Thomas Hull. (CC BY 2.0, source: www.flickr.com/photos/fdecomite/)

Fig. 48 Phizz Variation 2: rhombicosidodecahedron by Thomas Hull. (Used with author’s permission, source: www.flickr.com/photos/33761183@N00)

forms. The fusion of his two passions, computer science and origami, resulted in the development of origami software called Ori-revo and Ori-revo-morph. The first one generates 3D origami shapes with rotational symmetry that can be folded from a sheet of paper. The second one animates origami folding and unfolding, so users can observe the process of generating 3D origami objects from a single sheet of paper. In Figs. 50 and 51, we can see models of Mitani’s “Triangle of whipped cream” and “A column embossed with circles” on the left, with their crease patterns on the right (Mitani 2016). Mitani recognizes origami technology as a tool for designing various products, for example, in the fashion industry. The dynamic collaboration with fashion

Fig. 49 Interactive Gardens by Matthew Gardiner. (Source: www.flickr.com/photos/arselectronica, CC-BY-NC-ND-2.0)

Fig. 50 “Triangle of Whipped Cream” and its crease pattern. (Source: www.mitani.cs.tsukuba.ac.jp, CC 4.0)

designer Issey Miyake resulted in garments created from a single sheet of fabric without cutting or sewing, using only permanently folded pleats and imperceptible snaps.

Conclusion

Each of these origamist–scientists–mathematicians–artists has their own specific approach, their own folding skills and techniques, and their own area of expertise. Each one is pushing the limits of the disciplines of their interest. From simple paper forms, like cranes and frogs, this traditional craft has become modern, global, mathematical,

Fig. 51 “A Column Embossed with Circles” and its crease pattern. (Source: www.mitani.cs.tsukuba.ac.jp, CC 4.0)

scientific, and artistic, suitable for creating unexpected results. Prominent art museums worldwide celebrate the works of masters of origami folds, and origami art can be found as part of the collection in the Museum of Modern Art in New York, Hangar 7 in Salzburg, the Tikotin Museum of Japanese Art in Haifa, and the Pendulum Museum in Vancouver. Exploration and experimentation with traditional and modern media give paper, one of the oldest materials, many possibilities for creative expression. The fusion of paper and computer, past and present, new and old, opens endless possibilities for mathematics and art as a unified concept. The dream of Akira Yoshizawa has come true, and origami has become a boundless creative art. Even more, it has become a mathematical and scientific discipline and has also found its place in mental therapies, education, media, and design. Origami’s richness comes from its purity. The requirements are small: a piece of paper, hands, and a little bit of imagination. The outcomes are astonishing: proofs of theorems, wings of spaceships, medical devices, clothes, furniture, sculptures, and many more that the future will bring.

Cross-References

TOND to TOND: Self-Similarity of Persian TOND Patterns, Through the Logic of the X-Tiles

References

Abel Z, Cantarella J, Demaine E, Eppstein D, Hull T, Ku J, Lang R, Tachi T (2016) Rigid origami vertices: conditions and forcing sets. J Comput Geom 7(1):171–184
Adler D (2004) A new unity, the art and pedagogy of Josef Albers. Ph.D. thesis, University of Maryland
Alperin R, Lang R (2006) One-, two-, and multi-fold origami axioms. In: Origami 4 fourth international meeting of origami science, mathematics and education. A K Peters Ltd, Natick
Balkcom D (2002) Robotic origami folding. Ph.D. thesis, Carnegie Mellon University
Burton D (2006) The history of mathematics: an introduction. McGraw-Hill, New York
Davis E, Demaine ED, Demaine ML, Ramseyer J (2013) Reconstructing David Huffman’s origami tessellations. J Mech Des 135(11):111010. https://doi.org/10.1115/1.4025428. ISSN 1050–0472. http://mechanicaldesign.asmedigitalcollection.asme.org/article.aspx?doi=10.111/1.4025428
Demaine E, Demaine M (2009) Mathematics is art. In: Proceedings of 12th annual conference of BRIDGES: mathematics, music, art, architecture, culture, Banff, pp 1–10
Demaine E, Demaine M, Lubiw A (1999) Polyhedral sculptures with hyperbolic paraboloids. In: Proceedings of the 2nd annual conference of BRIDGES: mathematical connections in art, music, and science (BRIDGES’99), pp 91–100
Demaine M, Hart V, Price G, Tachi T (2011) (Non)existence of pleated folds: how paper folds between creases. Graphs and Combinatorics 27(3):377–397
Dureisseix D (2012) An overview of mechanisms and patterns with origami. Int J Solids Struct 27(1):1–14
Felton S, Tolley M, Demaine E, Rus D, Wood R (2014) A method for building self-folding machines. Science 345(6197):644–646
Fenyvesi K, Budinski N, Lavicza Z (2014) Two solutions to an unsolvable problem: connecting origami and GeoGebra in a Serbian high school. In: Greenfield G, Hart G, Sarhangi R (eds) Proceedings of bridges 2014: mathematics, music, art, architecture, culture. Tessellations Publishing, Phoenix, pp 95–102
Fujimoto S (1976) Twist origami, Home print
Harbin R (1956) Paper magic. Oldbourne Press, London
Hatori K (2001) K’s Origami, Fractional Library. Retrieved from https://origami.ousaan.com/library/ on 20.06.2019
Hull T (1994) On the mathematics of flat origamis. Congr Numer 100:215–224
Hull T (2002) The combinatorics of flat folds: a survey. In: Proceedings of the third international meeting of origami science, mathematics, and education, pp 29–38
Hull T (2003) Counting mountain-valley assignments for flat folds. Ars Combin 67:175–188
Hull T (2006) Project origami: activities for exploring mathematics. A K Peters, Wellesley
Huzita H (1989) Axiomatic development of origami geometry. In: Proceedings of the 1st international meeting of origami science and technology, pp 143–158
Huzita H (1992) Understanding geometry through origami axioms. In: COET91: Proceedings of the first international conference on origami in education and therapy. British Origami Society, pp 37–70
Jackson P (2011) Folding techniques for designers, from sheet to form. Laurence King Publishing, London, p 118
Justin J (1986a) Mathematics of origami, Part 9. Br Origami Soc 118:28–30
Justin J (1986b) Résolution par le pliage de l’équation du troisième degré et applications géométriques. L’Ouvert - Journal de l’APMEP d’Alsace et de l’IREM de Strasbourg (in French) 42:9–19
Kasahara K (1973) Origami made easy. Japan Publications Inc, Tokyo
Kasahara K (2003) Extreme origami. Sterling, New York
Konjevod G (2008) Origami science, mathematics and technology, poetry in paper, the first origami exhibition in Croatia with international origami masters. Open University Krapina, Krapina City Gallery
Lakes RS, Witt R (2002) Making and characterizing negative Poisson’s ratio materials. Int J Mech Eng Educ 30(1):50–58
Lalloo M (2014) Applied origami. Ingenia 61:33–37
Lang R (1995) Origami insects and their kin. General Publishing Co., Toronto
Lang R (2003) Origami and geometric constructions. Retrieved from http://www.semanticscholar.org/paper/Origami-and-Geometric-Constructions-Lang/37098bb78bc40596f4019cca42020545d9da79c0 on 20.06.2019
Lang R (2009) Mathematical methods in origami design. Bridges: mathematics, music, art, architecture, culture, pp 13–20, Banff Centre, Banff, Alberta, Canada
Lebee A (2015) From folds to structures, a review. Int J Space Struct 30(2):55–74. Multi Science Publishing
Lister D (1997) Introduction to the third edition. In: Harbin R (ed) Secrets of origami: the Japanese art of paper folding. Dover Publications, Mineola, pp 1–3. (Originally published, 1964)
Liu K (2017) The magic and mathematics of paper-folding. Retrieved from http://www.tor.com/2017/06/29/the-magic-and-mathematics-of-paper-folding/ on 20.06.2019
Lv C, Krishnaraju D, Yu H, Jiang H (2014) Origami based mechanical metamaterials. Sci Report 4:5979
Maehara H (2010) Reversing a polyhedral surface by origami-deformation. Eur J Comb 31(4):1171–1180
Magrone P (2015) Form and art of closed crease origami. In: Proceedings of 14th conference of applied mathematics. Slovak University of Technology, Bratislava
Mahadevan L, Rica S (2005) Self organized origami. Science 307:1740
Messer P (1986) Problem No. 1054. Crux Math 12:284–285
Mitani J (2016) 3D origami art. A K Peters/CRC Press, Natick
Mukerji M (2007) Marvelous modular origami. A K Peters Ltd, Natick
Nishiyama Y (2012) Miura folding: applying origami to space exploration. 
Int J Pure Appl Math 79(2):269–279 Orlean S (2017) Origami lab: why a physicist dropped everything for paper folding: The New York Times retrieved from The Origami Lab > on 20.06.2019. Peraza-Hernandez E, Hartl D, Malak D Jr, Lagoudas D (2014) Origami-inspired active structures: a synthesis and review. Smart Mater Struct 23(9):1–50 Row S (1893) Geometrical exercises in paper folding. Addison & Co, Mountain Road Schneider J (2004) Flat-foldability of origami crease patterns. http://www.sccs.swarthmore.edu/ users/05/jschnei3/origami.pdf. http://www.britishorigami.info Smith J (1993) Some thoughts of minimal folding. British Origami. https://pdfs.semanticscholar. org Smith J (2011) Simplicity and realism in origami retrieved from https://urldefense.proofpoint.com/ v2/url?u=https-3A__pdfs.semanticscholar.org_6a00_3260b6455aa7a6c21ce42e44d1b10b2c1a 58.pdf&d=DwIF-g&c=vh6FgFnduejNhPPD0fl_yRaSfZy8CWbWnIf4XJhSqx8&r=WPwWG2 ce3JB3-UYO4Xo6BzA39JJ5GA-pgmDH9wKJgBw&m = 9gvCA4pN5s3RZH29MCD5c3mqv kG22CJV3voD2KnnKis&s = _D-kDbmSVOOTcKMBGfDRFIjzrUSbPVVU2z9l9VzjvaY&e= on 20.06.2019. Sorguç A, Hagiwara I, Selçuk S (2009) Origamics in architecture: a medium of inquiry or design in architecture. METU J Fac Archit 26(2):235–247 Tachi T (2011) Rigid-foldable thick origami. In: Origami5: fifth Int. meeting of origami science, mathematics, and education, pp 253–263 Turner N, Goodwine B, Sen M (2016) A review of origami application in mechanical engineering. Proc Inst Mech Eng C J Mech Eng Sci 230(14):2345–2362

348

N. Budinski

Verrill H (1998) Origami tessellation. In: Bridges: mathematical connections in art, music, and science. Winfield, Kansas, pp 55–68 Versnic P (2004) Folding for fun. Orihouse, p 36 Wertheim M (2004) Cones, curves, shells, towers: he made paper Jump to life. NYork Times http:// www.nytimes.com/2004/06/22/science/cones-curves-shells-towers-he-made-paper-jump-to-life. html Zhmud L (2006) The origin of the history of science in classical antiquity. Walter de Gruyter, Berlin

Geometric Strategies in Creating Origami Paper Lampshades: Folding Miura-ori, Yoshimura, and Waterbomb Tessellations

13

Jiangmei Wu

Contents

Introduction
Background on Paper Lanterns
Contemporary Origami-Inspired Paper Lampshades
Light, Origami Design, and Material
Design Parameters and Considerations for Origami Lampshade Design
Flat-Foldable Origami Tessellations: Miura, Yoshimura, and Waterbomb Patterns
Mathematical Theorems Governing Flat-Foldable Origami Tessellations
Miura-ori Tessellation
  Miura-ori and the Bird’s-Foot Vertex
  Folding Miura-ori into Cylindrical Lampshade with Translation Symmetry
  Folding Miura-ori into a Lampshade with Rotational Symmetry
Yoshimura Tessellation
  Yoshimura Tessellation and Its Double Bird’s Foot Vertex
  Folding Yoshimura into Cylindrical Lampshade with Translational Symmetry
  Folding Yoshimura into a Lampshade with Rotational Symmetry
Waterbomb Tessellation
  Waterbomb Tessellation and Its Vertices
  Folding Waterbomb Tessellation into Cylindrical Lampshade with Translational Symmetry
  Folding Waterbomb Tessellation into a Lampshade with Rotational Symmetry
Conclusion
Cross-References
References

J. Wu
Eskenazi School of Art, Architecture and Design, Indiana University, Bloomington, IN, USA

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_102


Abstract

This chapter describes geometric strategies in origami design for creating paper lanterns that gleam with luminous gradations of light. While the design of origami paper lampshades is largely based on origami design, it also presents new challenges because of the specific design constraints of this new genre of functional art. The chapter addresses these constraints through the underlying mathematics of origami design and provides a set of tools for constructing origami lampshades of high aesthetic quality. It begins with an introduction to origami design and its relationship to mathematics, together with the historical background of origami paper lanterns, and then discusses geometric strategies for creating origami paper lampshades based on the Miura-ori, Yoshimura, and Waterbomb tessellations. The emphasis is on the specific mathematical requirements for creating functional light art with dramatic perceptual effects of translucent light.

Keywords Origami · Lampshade · Geometry · Tessellation · Flat-foldable

Introduction

The original purpose of origami was to create various shapes, ranging from animal figures to objects, as decorative items for religious and ceremonial activities (pbs.org 2017). However, the craft techniques were also used to create geometrically abstract and nonrepresentational functional objects. Perhaps one of the earliest functional origami models was a folded gift box called Tamatebako, or “magic treasure chest.” It first appeared in Ranma Zushiki, a Japanese book published in 1743 that documented Edo-period design (Kasahara 2004). Outside the Japanese origami tradition, there are many disparate areas of functional folding, ranging from fifteenth-century European napkin folding (Sallas 2010) to early twentieth-century Bauhaus architecture (Wingler and Stein 1976). Today, constructing three-dimensional surfaces from two-dimensional sheet material has inspired artists, designers, architects, and engineers to create folded sculptural forms that react to kinetic movement (Wu et al. 2018) and that exploit the interplay of light and shadow in fashion (Morriseey 2019), products (Wu 2018), and architecture (Choma 2018). Many of these new designs rely on techniques from computational geometry and mathematics.

Unfolding a piece of origami reveals the intricate crease pattern that defines the geometric transformation of folding the material. During folding, the lines of the crease pattern keep their lengths constant, and the paper cannot intersect itself. Mathematicians call these two properties isometry and injectivity (Lang 2018). These remarkable properties of paper folding suggest deep connections between mathematics and origami. The geometry of origami was studied as early as the mid-to-late 1890s (Sundara Rao 1901). In the 1930s, Margherita P. Beloch proved that origami, unlike straightedge-and-compass construction, can be used to solve cubic equations (Hull 2011). While mathematical studies of origami thus go back at least to the 1890s and 1930s, a new research field, origami mathematics, has developed in recent decades to formalize the mathematics underlying paper folds. Beginning in the 1960s, computer scientist Ron Resch designed and folded paper forms using mathematical and computational algorithms (Resch 1968, 1973). Another computer scientist, David Huffman, developed many new mathematical concepts on folding based on Resch’s work in the 1970s (Huffman 1976). In 1997, mathematician Thomas Hull (Hull 1997) created F.I.T., a modular origami model made of five intersecting tetrahedra, and today many artists and mathematicians use modular origami techniques to create polyhedra (Gurkewitz and Arnstein 2003). Humiaki Huzita and Jacques Justin discovered seven axioms that are specific to origami. These axioms allow certain geometric constructions that are not possible with classical Euclidean (straightedge-and-compass) constructions, including trisecting an arbitrary angle (Huzita 1989) and constructing cube roots of integers (Lang 1996). Robert Lang, a physicist who became one of the foremost origami artists and theorists, has written widely on mathematical methods for both representational (figurative) origami and abstract origami (Lang 2012, 2018).

Folding a piece of paper can be simple and does not require any sophisticated tools. However, modeling the morphology and understanding the intrinsic properties of paper folding scientifically is very difficult and requires sophisticated tools from mathematics and computer science.
In mathematical models of origami, the paper is usually assumed to have no thickness (except in thick origami) and not to stretch, distort, or intersect itself. In reality, paper has thickness and can be bent and distorted in ways that are difficult to describe mathematically. The mathematical methods discussed in this chapter are aimed at creating origami-inspired lampshades that are functional and aesthetically pleasing, so many of the real-world physical models demonstrated might not have exact mathematical descriptions. Physical models can behave in ways that are difficult to understand mathematically, and mathematical models may not have exact analogues in the physical world. For example, in the mathematical study of rigid origami, the model treats the paper as stiff, like sheet metal, with creases that act like hinges. A physical paper model can bend in ways that such a rigid model does not predict.

Background on Paper Lanterns

Paper lampshades and lanterns have long been used both as functional items in daily life and as symbolic items on ceremonial occasions. People of many different cultures have long preferred the soft, diffused, warm light transmitted by translucent materials such as paper and fabric to other, more dazzling light sources (Klint 2018). In Western culture, paper or silk lanterns are still used in many traditional festivals. The traditional luminaria, which originates from Hispanic culture and is often displayed at Christmas to kindle the spirit of Christ, is made from a paper bag with a folded-down top, filled with a layer of sand that holds a lit candle (Ortega 1973). During the Festa della Rificolona in Florence, Italy, decorated paper lanterns carried on sticks are the signature of a festival dedicated to the Virgin Mary (Raison 1994).

In China, the tradition of making lanterns out of bamboo sticks and paper, silk, skin, and other translucent materials goes back some 2000 years. Lanterns or lampshades made of paper were considered especially elegant, as they often required high-quality craftsmanship. Today, Chinese paper lights can be found as functional objects in homes and as decorative and symbolic items in festivals such as the Lantern Festival, a tradition that started in the Eastern Han Dynasty when Emperor Ming ordered that lanterns be lit to honor the Buddhist spirit during the auspicious full-moon period of each new lunar year. Chinese paper lights for homes usually have a simple, plain spherical or oblong form, while paper lanterns for festivals are decorated with vibrant colors and elaborate figures from myths to enhance the holiday spirit (Song 2015).

The most well-known paper lanterns today are perhaps the Japanese paper lanterns. Lanterns were introduced to Japan from China by Buddhist priests in the fourteenth century (Hughes 1978). Japanese lanterns, like Chinese lanterns, are often made of paper or silk stretched over a frame of bamboo sticks or wire. Different types of paper or silk lanterns are used in different settings for various functions and symbolic meanings.
For example, the Chochin, often oblong in shape, is used at the entrances of Buddhist temples, in traditional festivals, and at the entrances of bars and restaurants. The Andon, often tetrahedral, cylindrical, or cubic in shape, is often used in the interiors of hotels and restaurants. Unlike the Chochin and Andon, the Toro is used only on special occasions, such as Toro Nagashi, the festival of the floating lanterns (García 2010). These traditional lanterns were often covered with thin kozo paper, or occasionally mitsumata paper, which is very strong yet appears light, luminous, and translucent. The centers of traditional lantern production were often closely linked to the production of traditional paper, called washi in Japanese. One well-known washi-making region is the Mino area of Gifu prefecture. The well-known contemporary Akari lights by Isamu Noguchi are still produced in Gifu prefecture using traditional lantern-making techniques (Kida 2003).

To create a Chochin, a temporary wooden form is used. This form comprises several panels, evenly spaced rotationally, so that its silhouette resembles the profile of the light. Thin strips of split bamboo are then wound around this wooden form in a spiral. To secure the bamboo strips in place, string is looped vertically around each revolution of the bamboo strip. Thin kozo-fiber paper, cut into small rectangles, is applied by brush with thick wheat-starch paste and then trimmed with a knife to follow the curve of the string line. This process is repeated several times until the whole bamboo and wire structure is completely covered with paper. After the paper is dry, the temporary wooden form is removed, while the bamboo strips and the wires remain as the permanent interior structure of the lantern. The paper exterior of the lantern is then adorned with paintings by artists and finished with lacquered wooden rings at the top and a wooden base at the bottom to hold the light source (Nichols et al. 2007). Although many great painting masters created artwork for Chochins in the past, nineteenth-century Chochins are rarely preserved today, as they were often damaged or destroyed during use. One exception is a pair of Chochins painted by one of the great ukiyo-e masters, Katsushika Hokusai (1760–1849), which are preserved at the Museum of Fine Arts, Boston (Nichols et al. 2007).

Contemporary Origami-Inspired Paper Lampshades

As the making of the Chochin lantern demonstrates, traditional lanterns made from paper or fabric require internal frames to support the otherwise flimsy and insubstantial material. The process of making a traditional paper lantern is lengthy and tedious and is not suitable for mass production. Today, as the tradition of paper lanterns continues, it is not uncommon to see contemporary designs that use origami-inspired folding techniques to create lanterns or lampshades without internal frames. Folding adds significant structural strength to the material. Contemporary origami-inspired lights, folded either from paper or from other foldable synthetic materials, have become very popular thanks to a few top-quality lighting designers and manufacturers such as Le Klint and Issey Miyake. Light passing through an origami paper surface creates a beautiful translucent gradation that works well with the modern aesthetic. While the design of origami paper lampshades is largely based on origami design, it also presents new challenges because of the specific design constraints of this new genre of functional art.

While traditional paper lanterns often use candles or oil as light sources, other light sources have been developed since the nineteenth century. Coal-gas lighting was patented in 1804 by the German inventor Friedrich Winzer (en.wikipedia.org 2018). Because the flame ignited by the gas was too intense, functional lampshades made of glass or light fabric were used to attenuate the light. In 1879, Joseph Swan and Thomas Edison invented the first electric light bulb. Again, lampshades were used to soften the intense electric light. The lampshades that Louis Comfort Tiffany designed in colored stained glass with elaborate patterns for the very first electric lights were handmade by skilled craftsmen (Quin and Sibthorp 2012).
These Art Nouveau style lamps are still very popular after more than 100 years, and the originals have been collected by museums and collectors around the world. Since Tiffany’s first lampshade, many notable designers, including Frank Lloyd Wright, Josef Hoffmann, Gerrit Rietveld, Eileen Gray, Poul Henningsen, George Nelson, Alvar Aalto, and Isamu Noguchi, to name just a few, have designed lampshades that have become icons of contemporary design.

It is unclear when the art of origami was first used in contemporary lampshade design. In Quin and Sibthorp’s Lighting: Twentieth Century Classics (Quin and Sibthorp 2012), a Le Klint paper shade by the acclaimed and influential Danish architect and industrial designer Kaare Klint, mass-produced in 1943, is listed as one of the twentieth-century classics. Kaare Klint’s folded paper light can be traced back to his father, P.V. Jensen-Klint, also a well-known architect. As early as 1907, P.V. Jensen-Klint designed a pleated paper lampshade for a paraffin lamp with the help of his friend, Captain Jeppe Hagedorn, who had travelled to Japan and learned about the art of origami (Klint 2018). In 1943, Tage Klint, another son of P.V. Jensen-Klint, modified his father’s original shade by adding collars so that the paper shade could fit tightly onto a metal stand, and founded Le Klint to sell the pleated lampshades commercially. Since then, a series of other leading designers have created pleated paper lampshades for Le Klint, including Peter Hvidt, Orla Molgaard-Nielsen, Erik Hansen, and Poul Christiansen. Today Le Klint still produces its lamps by hand-folding a single large piece of paper or plastic, with the aid of automatic machine creasing.

Another contemporary example of origami-inspired lampshade design is by the Japanese designer Issey Miyake. Miyake is best known for his origami-inspired fashion designs that can be folded flat and expanded into three-dimensional forms to be worn. Recently Miyake launched a collection of origami-inspired light sculptures called In-Ei, which is made of fiber recycled from PET. Miyake’s garment-making techniques are applied in the welding and folding of nonwoven plastic fibers to create smooth and seamless joints (www.isseymiyake.com 2017). In both the Le Klint and In-Ei designs, mathematical principles in two and three dimensions are explored, resulting in sculptural forms that manipulate the gradations of light and shadow in poetic ways.

Light, Origami Design, and Material

When light strikes the mountain and valley creases of a folded surface, it creates dramatic gradations of light and shadow. When a light source is placed behind an origami structure folded from translucent material, the light does not pass directly through the material. It diffuses through the material, much like dye diffusing through a liquid. The glaring light source on the other side of the translucent material appears fuzzy and soft when seen from the outside. It is precisely the perceptual quality of this warm fuzziness that has drawn people of different cultures since ancient times. A light source positioned in front of an opaque shade produces various lighting effects such as downlight, uplight, sidelight, and backlight. However, when a folded origami design of translucent material is lit from within, the distance between the light source and the material can essentially be neglected, as it has minimal effect on the perceptual quality of the light. Areas of the material that receive strong direct illumination tend to dissipate the light by transmitting it to other parts of the object.

To create more dramatic effects with an origami light, the dihedral angles of the folds need to be carefully considered. The dihedral angle of an origami fold is the angle measured between the two facets that meet at the fold. By contrast, the fold angle is defined as the deviation from the unfolded state. When an origami fold is flat and unfolded, its deviation from flatness is zero, so its fold angle is 0° and its dihedral angle is 180° (π radians). In general, the fold angle and the dihedral angle are simply related:

$$\text{fold angle} = 180^\circ - \text{dihedral angle}$$

When the dihedral angles of the folds are large and the fold angles are small, the origami mountains and valleys appear flatter and the surface is close to the unfolded state. Decreasing the dihedral angles and increasing the fold angles, so that the folds become sharper, brings out more dramatic gradations with greater contrast in illumination (Fig. 1).

Fig. 1 Comparison of an origami surface folded in paper, lit from behind. (a) Small fold angles reduce contrast on the illuminated surface. (b) Large fold angles increase contrast on the illuminated surface

Another strategy for increasing contrast on a folded origami form illuminated from behind is to vary the material’s thickness. Less light passes through a surface that is double or triple layered, so those areas appear darker than areas where there is only a single layer of material (Fig. 2). Artist Chris Palmer (Rutzky and Palmer 2011) has used this technique to create his beautiful illuminated Shadow Folds by folding origami tessellations in translucent textiles.

Fig. 2 Examples of using material thickness in illuminated design. (a) Illuminated layered paper showing dramatic contrast between areas that are triple-layered, double-layered and single-layered. (b) Folded layered paper can showcase dramatic design features under the illumination

The material considered in this chapter is, in general, paper or paper-like. Paper is an ideal and versatile choice, as it can be easily cut, creased, folded, and rolled, and it can reflect and deflect light evenly. In particular, all of the folded lampshades photographed in this chapter use a type of hi-tech kozo paper with a three-layer structure that is well suited to lampshades: a sheet of polymer sandwiched between two sheets of kozo paper. Besides paper, many other materials can be used for origami lampshade design, including plastic, Mylar, leather, and fabric, as long as they are nonstretchy, paper-like, and translucent. However, only paper will be used in this chapter to represent this broader category of material.
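The relation between fold angle and dihedral angle stated in this section can be written as a pair of small conversion helpers. This is only an illustrative sketch: the function names are hypothetical, angles are assumed to be in degrees, and both angles are assumed to be restricted to the range from 0° to 180°.

```python
def fold_angle_from_dihedral(dihedral_deg: float) -> float:
    """Return the fold angle (degrees) for a given dihedral angle (degrees).

    Assumption for this sketch: both angles lie in [0, 180].
    """
    if not 0.0 <= dihedral_deg <= 180.0:
        raise ValueError("dihedral angle must lie between 0 and 180 degrees")
    # fold angle = 180 degrees - dihedral angle
    return 180.0 - dihedral_deg


def dihedral_from_fold_angle(fold_deg: float) -> float:
    """Return the dihedral angle (degrees) for a given fold angle (degrees)."""
    if not 0.0 <= fold_deg <= 180.0:
        raise ValueError("fold angle must lie between 0 and 180 degrees")
    # The relation is symmetric, so the inverse has the same form.
    return 180.0 - fold_deg
```

For instance, an unfolded crease with a dihedral angle of 180° has a fold angle of 0°, while a sharp crease with a dihedral angle of 30° has a fold angle of 150°, which corresponds to the higher-contrast case described for Fig. 1b.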

Design Parameters and Considerations for Origami Lampshade Design

In this chapter, the origami lampshades discussed must meet the following five design requirements:

1. The lampshade must be folded from a single sheet of paper. In reality, folding a 1:1 scale lampshade sometimes requires a sheet of paper as wide as 10 feet and as long as 30 feet, which is impractical for sourcing and for digital fabrication. In practice, a lampshade is therefore often folded from several sheets of paper that are connected to form one large flat sheet.
2. The lampshade must be flat-foldable. Flat-foldable design allows compact and sustainable storage and shipping.
3. It must provide a continuously enclosed volumetric body for the light source. The enclosed lampshade should be big enough to allow 4 inches of space between the light source and the lampshade surface, and small enough to allow the light to bounce back and forth between the paper surfaces to produce a smooth, gentle, and even glow.
4. It must not need any additional internal structural support for the 3D volumetric shape other than the origami folds themselves.
5. There must be pleasing contrast in the illuminated origami design.

Since the target lampshades need to be volumetric, several possible shapes are categorized in Fig. 3.

Fig. 3 Various three-dimensional geometry of lampshade designs: (a) regular extruded cylindrical column, (b) regular helical twist cylindrical column, (c) irregular extruded cylindrical column, (d) regular rotational sphere, (e) irregular rotational surface, (f) irregular rotational surface

The first three examples in Fig. 3 each have translational symmetry based on a horizontal profile along a vertical axis. Figure 3a is a linear extrusion based on a regular geometric profile, Fig. 3b is a helical extrusion based on a regular polygonal profile, and Fig. 3c is a linear extrusion based on an irregular geometric profile. The last three examples in Fig. 3 each have rotational symmetry based on a vertical profile around a vertical axis. Figure 3d has rotational symmetry based on a symmetric arc, Fig. 3e has rotational symmetry based on an irregular geometric profile, and Fig. 3f has semi-rotational symmetry, as its profile changes as it rotates around the vertical axis. All the lampshade examples discussed in this chapter fall into one of the categories in Fig. 3. Note that polyhedral designs, including those based on the Platonic and Archimedean solids, are intentionally left out. Polyhedral origami lampshades produce aesthetically pleasing geometric designs, but they involve modular origami techniques, which are outside the design scope defined here.
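The two families of volumes in Fig. 3 can be illustrated by generating sample points on a target surface: one routine copies a horizontal profile along the vertical axis (optionally with a twist), and another rotates a vertical profile around that axis. The sketch below is not the chapter's construction method and does not produce crease patterns; the function names, the point-list representation, and the example dimensions are all assumptions made for illustration.

```python
import math


def regular_polygon(n_sides, radius):
    """Vertices (x, y) of a regular polygon centred on the vertical axis."""
    return [
        (radius * math.cos(2 * math.pi * k / n_sides),
         radius * math.sin(2 * math.pi * k / n_sides))
        for k in range(n_sides)
    ]


def extrude(profile_xy, height, levels, twist_deg=0.0):
    """Copy a horizontal profile along the vertical axis.

    twist_deg = 0 gives a linear extrusion (as in Fig. 3a and 3c);
    a nonzero total twist gives a helical column (as in Fig. 3b).
    Returns one list of (x, y, z) points per level.
    """
    if levels < 2:
        raise ValueError("need at least two levels")
    layers = []
    for i in range(levels):
        t = i / (levels - 1)                # 0 at the bottom, 1 at the top
        angle = math.radians(twist_deg) * t
        c, s = math.cos(angle), math.sin(angle)
        z = height * t
        layers.append([(c * x - s * y, s * x + c * y, z) for x, y in profile_xy])
    return layers


def revolve(profile_rz, segments):
    """Rotate a vertical profile of (radius, z) pairs around the vertical axis.

    Returns one list of (x, y, z) points per angular position (as in Fig. 3d and 3e).
    """
    rings = []
    for j in range(segments):
        angle = 2 * math.pi * j / segments
        rings.append([(r * math.cos(angle), r * math.sin(angle), z)
                      for r, z in profile_rz])
    return rings
```

As a usage example, `extrude(regular_polygon(6, 1.0), height=2.0, levels=3)` samples a hexagonal column, and passing `twist_deg=60.0` rotates the top hexagon by one sixth of a turn relative to the bottom. A semi-rotational form such as Fig. 3f would need a profile that varies with the rotation angle, which this sketch does not cover.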

Flat-Foldable Origami Tessellations: Miura, Yoshumura, and Waterbomb Patterns Due to the above design parameters and considerations, only flat-foldable threedimensional tessellations in origami are focused upon. There are many types of origami tessellations including twist tiling, flagstone tessellations, woven tessellation, etc., many of these tessellations and the mathematical methods have been discussed by Lang in his book tilted Twists, Tilings, and Tessellations (Lang 2018). In this chapter, only a specific set of tessellations, including the Miura-ori, Yoshimura, and Waterbomb tessellations, that can be folded from a flat sheet and can be flat foldable will be focused. When the fold angles are close to 0◦ , these three tessellations are unfolded and flat. When the fold angles are close to 180◦ , these three tessellations are folded and flat. When fold angles are between 0◦ and 180◦ , these three tessellations are not completely folded flat and can be formed into a 3D surface. These three tessellations, in many cases, can be used for lampshade designs. Figure 4 shows these three crease patterns and their folded forms in both their deployable and flat-folded states. In the crease patterns shown in Fig. 4 and in


J. Wu

Fig. 4 Crease patterns and the associated folded forms in deployable states and flat-foldable states: (a) Miura-ori tessellation, (b) Yoshimura tessellation, and (c) Waterbomb pattern

the entire chapter, the mountain folds are drawn as solid lines and the valley folds as dashed lines. These three tessellations are sometimes called origami corrugations, because each of them can be folded into a compact, corrugated form and each has corrugated crease lines that run parallel to each other. The yellow highlights in Fig. 4 mark a single corrugated region of each tessellation, or a strip, that can be modified and generalized. This strip can then be arrayed to form a tessellation that can be either extruded or rotated to form a three-dimensional volume. In each of these tessellations, generalization can only be done in the yellow highlighted region; therefore, this type of generalization is called semi-generalization. Lang studied the Miura-ori in detail and laid out a method to semi-generalize it to create various profiles, which he called a semi-generalized Miura-ori (SGMO) (Lang 2018). While the semi-generalizations of these three tessellations provide considerable design flexibility, some of the most interesting designs result from combining multiple types of tessellations, such as a tessellation that combines the Miura-ori and Yoshimura patterns.

Mathematical Theorems Governing Flat-Foldable Origami Tessellations

In order to semi-generalize Miura, Yoshimura, and Waterbomb tessellations, the mathematical theorems governing the flat-foldability of these tessellations need to be studied. Various conditions that satisfy the flat-foldability of a crease pattern are

13

Geometric Strategies in Creating Origami Paper Lampshades:. . .


discussed below. In general, a crease pattern refers to the set of mountain-fold lines (denoted by solid lines in this chapter) and valley-fold lines (denoted by dashed lines in this chapter) that appear when the folded structure is opened flat. A necessary condition for a crease pattern to be flat-foldable locally is given by the Kawasaki-Justin Theorem (Mitani 2011; Lang 2018):

Kawasaki-Justin Theorem. A crease pattern is flat-foldable locally if and only if
(i) at each vertex (a point where crease lines meet), the number of lines meeting at that vertex is even, and
(ii) the sum of alternating sector angles about that vertex is 180°.

For an entire origami tessellation to be flat-foldable, the Kawasaki conditions must be satisfied by all the inner vertices of the crease pattern. While Kawasaki’s Theorem deals with the number of crease lines and the sector angles at each vertex, it does not address the necessary conditions on the mountain or valley assignments of the folds. For a vertex to be flat-foldable, both the Maekawa-Justin Theorem and the Big-Little-Big Angle (BLBA) Theorem must also be satisfied.

Maekawa-Justin Theorem. For any flat-foldable vertex, let M be the number of mountain folds at the vertex and V be the number of valley folds. Then M − V = ±2. That is, for any vertex, the number of mountain folds and the number of valley folds connected to the vertex must differ by exactly 2.

Big-Little-Big Angle (BLBA) Theorem. At any vertex, the creases on either side of any sector angle that is smaller than its neighbors must have opposite crease assignments. In other words, at any vertex, of the two creases on either side of any smaller sector angle, one must be a mountain fold and the other a valley fold. Correspondingly, at any vertex, the creases on either side of the largest sector angle must have the same assignment, either both mountain or both valley.
In addition to the conditions on sector angles and crease assignments at each interior vertex of a crease pattern, no parts of the folded structure may collide during assembly. Lang discussed various scenarios, including the Miura-ori, Yoshimura, and Waterbomb tessellations, in his book on mathematical methods for geometric origami (Lang 2018). The following sections discuss geometric strategies for designing crease patterns based on these three tessellations, as well as alterations and combinations of them, for the purpose of specific lampshade designs.
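The three vertex conditions above lend themselves to a mechanical check. The following is a minimal sketch, not a method from the chapter: the function names and the data layout are assumptions made for illustration. Sector angles are listed in order around the vertex, sector `i` is assumed to lie between crease `i` and crease `i + 1`, and assignments are given as `"M"` or `"V"`. The checks cover only these local necessary conditions and say nothing about collisions between parts of the folded structure.

```python
import math

def kawasaki_ok(sector_angles_deg, tol=1e-9):
    """Kawasaki-Justin: even number of creases, alternating sector angles sum to 180 degrees."""
    n = len(sector_angles_deg)
    if n % 2 != 0:
        return False
    # Assumption: an interior vertex on flat paper, so all sectors add up to 360 degrees.
    if not math.isclose(sum(sector_angles_deg), 360.0, abs_tol=tol):
        return False
    return math.isclose(sum(sector_angles_deg[0::2]), 180.0, abs_tol=tol)

def maekawa_ok(assignments):
    """Maekawa-Justin: mountain and valley counts differ by exactly 2."""
    return abs(assignments.count("M") - assignments.count("V")) == 2

def big_little_big_ok(sector_angles_deg, assignments):
    """BLBA: the two creases bounding a strictly smallest-of-neighbors sector must differ."""
    n = len(sector_angles_deg)
    for i in range(n):
        here = sector_angles_deg[i]
        before = sector_angles_deg[(i - 1) % n]
        after = sector_angles_deg[(i + 1) % n]
        if here < before and here < after:
            # Sector i is bounded by crease i and crease i + 1 (layout assumed above).
            if assignments[i] == assignments[(i + 1) % n]:
                return False
    return True

# A symmetric 4-degree vertex with a 45 degree sector angle on each side of one crease.
birds_foot_angles = [45, 135, 135, 45]
birds_foot_creases = ["V", "M", "M", "M"]
print(kawasaki_ok(birds_foot_angles), maekawa_ok(birds_foot_creases))
```

Keeping each theorem in its own function makes it easy to see which condition a candidate vertex violates.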


Miura-ori Tessellation

The Miura-ori tessellation, credited to Japanese astrophysicist Koryo Miura, has become well known for its application in deployable structures, such as a solar array in a 1995 mission for JAXA, the Japanese space agency (Miura 2009). The Miura-ori tessellation is made of repeated parallelograms arranged in a zigzag formation and has only one type of vertex: a 4-degree vertex. A key feature of the Miura-ori is its ability to fold and unfold rigidly with a single degree of freedom and with no deformation of its parallelogram facets, as seen in Fig. 4a (Lang 2018).

Miura-ori and the Bird’s-Foot Vertex

To semi-generalize the Miura-ori so that various shapes of profile paths can be generated, the focus is placed on a single strip, or a single corrugated region, of the Miura-ori (the yellow highlighted row in Fig. 4a). (“Semi-generalized Miura-ori” is a term used by Lang to refer to an approach for modifying a Miura-ori to create targeted designs (Lang 2018).) The crease pattern in the semi-generalized strip is then repeated periodically in another direction to create an origami tessellation that can be folded into either rotationally stretched or vertically extruded three-dimensional surfaces. At the core of the single-strip Miura-ori is the 4-degree vertex, or bird’s-foot vertex, which has bilateral symmetry (the bilateral symmetry allows the semi-generalized Miura tessellation to satisfy Kawasaki’s Theorem). To semi-generalize the Miura-ori strip, three parameters of the bird’s-foot vertex shown in Fig. 5 can be adjusted: the sector angle α, the folding angle along the corrugation crease γ, and the bending angle along the corrugation crease β.

Fig. 5 Crease pattern and three-dimensional geometry of a bird’s-foot vertex. (a) crease pattern, (b) partially folded surface


The relationship among them, as proved by Robert Lang (Lang 2018), can be expressed as:

$$\alpha = \arctan\left(\frac{\tan\left(\frac{\beta}{2}\right)}{\sin\left(\frac{\gamma}{2}\right)}\right) \tag{1}$$

If γ is constant, decreasing the bending angle β will decrease the sector angle α. If a target profile requires a shallower bending profile (smaller β), as shown in Fig. 4a, it will result in a smaller sector angle α. If a target profile requires a sharper bending profile (larger β), as shown in Fig. 4c, it will result in a larger sector angle α.
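The dependence of α on β for a fixed γ can be tabulated with a few lines of code. This is a minimal sketch that reads Eq. 1 as α = arctan(tan(β/2) / sin(γ/2)); the function name and the sample values are illustrative assumptions, not taken from the chapter.

```python
import math

def sector_angle(beta_deg, gamma_deg):
    """Eq. 1 read as alpha = arctan(tan(beta/2) / sin(gamma/2)); all angles in degrees."""
    half_beta = math.radians(beta_deg) / 2
    half_gamma = math.radians(gamma_deg) / 2
    return math.degrees(math.atan(math.tan(half_beta) / math.sin(half_gamma)))

# Hypothetical sample values: hold the folding angle fixed and vary the bending angle.
gamma_deg = 120.0
for beta_deg in (30.0, 60.0, 90.0):
    alpha_deg = sector_angle(beta_deg, gamma_deg)
    print(f"beta = {beta_deg:5.1f}  ->  alpha = {alpha_deg:6.2f}")
```

For a fully folded strip (γ = 180°) the sine term equals 1, so the expression reduces to α = β/2.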

Folding Miura-ori into Cylindrical Lampshade with Translation Symmetry

A Miura-ori strip can be folded into either a seamless regular polygonal profile or a seamless irregular polygonal profile in which the edges of the folded paper align perfectly. Such a Miura-ori strip can be arrayed into a semi-generalized Miura-ori tessellation, which in turn can be folded into a seamless cylindrical lampshade with translation symmetry. To fold the Miura strip into a seamless profile outlined by regular polygons with n sides (Fig. 6) when γ = 180°, angles α and β must satisfy Eqs. 2 and 3 below, respectively:

$$\alpha = \frac{\pi}{n} \tag{2}$$

$$\beta = 2\alpha = \frac{2\pi}{n} \tag{3}$$

It is important to note that n must be an even number in this specific generalization, and therefore, the regular polygons must have an even number of sides. If the

Fig. 6 A Miura strip that folds into a square profile with α = 45° and β = 90°. (a) crease pattern, (b) folded geometry with γ = 180°


polygons have an odd number of sides, the mountain or valley crease assignments on the corrugation creases at both ends of a Miura strip won’t be able to match. In the above example, let w be half of the width of the strip, and let d1 and d2 be the distances between three consecutive vertex points on the Miura strip. To fold the strip into a seamless profile in the shape of a square when γ = 180°, the distances between the next three consecutive vertex points also need to be d1 and d2. In addition, because the side lengths of the square profile must be equal, d1 and d2 must satisfy the condition below:

$$d_1 = d_2 + \frac{2w}{\tan(\alpha)} \tag{4}$$
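Equations 2 to 4 can be collected in a short sketch. The function names are hypothetical, the even-n check mirrors the restriction stated in the text, and Eq. 4 is applied exactly as it is stated for the square example; whether it carries over unchanged to other values of n is not addressed here.

```python
import math

def regular_profile_angles(n):
    """Eqs. 2 and 3: sector angle alpha and bending angle beta (radians) for gamma = 180 degrees."""
    if n < 4 or n % 2 != 0:
        raise ValueError("n must be an even number of sides")
    alpha = math.pi / n
    beta = 2 * alpha
    return alpha, beta

def d1_from_d2(d2, w, alpha):
    """Eq. 4: d1 = d2 + 2w / tan(alpha), with w equal to half the strip width."""
    return d2 + 2 * w / math.tan(alpha)

# Square profile as in Fig. 6; d2 = 3 and w = 1 are made-up lengths in arbitrary units.
alpha, beta = regular_profile_angles(4)
print(math.degrees(alpha), math.degrees(beta), d1_from_d2(3.0, 1.0, alpha))
```

With α = 45° the tangent equals 1, so d1 exceeds d2 by exactly the full strip width 2w.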

To fold the Miura strip into a seamless profile outlined by irregular polygons with n sides (Fig. 7) when γ = 180°, let i index the bird’s-foot vertices in a Miura strip. The sector angles αi (α1, α2, α3, and α4 in Fig. 7a) and the bending angles βi (β1, β2, β3, and β4 in Fig. 7a) must satisfy Eqs. 5 and 6 below, respectively:

$$\sum_{i=1}^{n} \alpha_i = \pi \tag{5}$$

$$\sum_{i=1}^{n} \beta_i = (n - 2)\,\pi \tag{6}$$

To fold the above Miura strip into a flat seamless design with the profile of an irregular quad, the distances between the vertex points are all different; however, two of the distances cannot be arbitrary. In the example above, if d1 and d2 are predetermined, d3 and d4 must be derived from d1, d2, w, and αi. The Miura strips in both Figs. 6 and 7 can be arrayed in the vertical direction to create the Miura tessellation. In general, these tessellations can be physically folded in paper into seamless cylindrical columns that are suitable for lampshade applications.
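The angle sums in Eqs. 5 and 6 can be checked numerically for the quad of Fig. 7. The sketch below assumes that each bending angle follows from Eq. 1 with γ = 180°, which gives βi = 2αi; the function name is hypothetical. Under that assumption the two sums agree with each other only for n = 4, which is the case illustrated in Fig. 7.

```python
import math

def check_irregular_profile(alphas_deg, tol=1e-9):
    """Return the bending angles plus whether Eq. 5 and Eq. 6 hold (degrees)."""
    n = len(alphas_deg)
    # Assumption: fully folded strip (gamma = 180 degrees), so Eq. 1 gives beta_i = 2 * alpha_i.
    betas_deg = [2 * a for a in alphas_deg]
    eq5_holds = math.isclose(sum(alphas_deg), 180.0, abs_tol=tol)
    eq6_holds = math.isclose(sum(betas_deg), (n - 2) * 180.0, abs_tol=tol)
    return betas_deg, eq5_holds, eq6_holds

# Sector angles quoted in the caption of Fig. 7.
betas, eq5, eq6 = check_irregular_profile([45, 30, 50, 55])
print(betas, eq5, eq6)
```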

Fig. 7 A Miura strip that folds into an irregular quad profile when completely folded flat with α1 = 45°, α2 = 30°, α3 = 50°, and α4 = 55°: (a) crease pattern, (b) folded geometry


As γ increases, the folded surface will bring out more dramatic gradation changes with more contrast in illumination. However, paper lampshades designed using the above calculations are not mathematically accurate. From Eqs. 2 and 3, for a Miura strip to be folded into a profile in the shape of a regular hexagon when γ = 180°, α needs to be 30° and β needs to be 60°. However, from Eq. 1, for γ = 150° and α = 30°, β is about 53°, which is less than the 60° that is needed for the regular polygonal profile to be a hexagon. Instead, when γ = 150° and β = 60° (a necessary condition for folding a seamless regular hexagonal column), α needs to be 33.69° based on Eq. 1. Figure 8 shows a mathematically correct origami model that is folded into a cylindrical column based on a regular hexagonal profile. Note that the sector angle α = 33.69° while β = 60° and γ = 150°. For this mathematical model to stay valid, γ needs to be 150°, which means that the structure cannot be deployed or collapsed. In a paper model of similar shape, by contrast, γ can be changed because of the bending and distortion in the paper, allowing the structure to be collapsed and deployed in a way that is mathematically impossible.
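The hexagon example can be re-evaluated directly from Eq. 1. The sketch below reads Eq. 1 as tan(β/2) = tan(α) sin(γ/2), with γ = 180° meaning fully folded; the function names are hypothetical. Under that reading, the quoted values β ≈ 53° and α ≈ 33.69° are reproduced for γ = 120°, while γ = 150° gives about 58.3° and 30.9°, so the figures quoted in the text may rest on a different convention for γ than the one assumed here.

```python
import math

def bending_angle(alpha_deg, gamma_deg):
    """Eq. 1 solved for beta: beta = 2 * arctan(tan(alpha) * sin(gamma/2)); degrees."""
    half_gamma = math.radians(gamma_deg) / 2
    return 2 * math.degrees(math.atan(math.tan(math.radians(alpha_deg)) * math.sin(half_gamma)))

def sector_angle(beta_deg, gamma_deg):
    """Eq. 1 as stated: alpha = arctan(tan(beta/2) / sin(gamma/2)); degrees."""
    half_gamma = math.radians(gamma_deg) / 2
    return math.degrees(math.atan(math.tan(math.radians(beta_deg) / 2) / math.sin(half_gamma)))

# Hexagon targets from Eqs. 2 and 3: alpha = 30 degrees, beta = 60 degrees.
for gamma_deg in (120.0, 150.0, 180.0):
    beta = bending_angle(30.0, gamma_deg)
    alpha = sector_angle(60.0, gamma_deg)
    print(f"gamma = {gamma_deg:5.1f}: beta(alpha=30) = {beta:6.2f}, alpha(beta=60) = {alpha:6.2f}")
```

Either way, the qualitative point of the passage holds: for any γ below 180°, a sector angle of 30° yields a bending angle smaller than the 60° that a regular hexagon requires.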

Folding Miura-ori into a Lampshade with Rotational Symmetry

While a Miura-ori strip can be used to generate various seamless and continuous profiles, it can also be used to generate expressive open profiles. Such a Miura-ori strip can then be arrayed to create a Miura-ori tessellation, which can be folded and stretched to create three-dimensional surfaces with rotational symmetry. The original Le Klint origami lamp, modified by Tage Klint (Fig. 9), was folded with this method.

Fig. 8 A mathematically correct Miura-ori hexagon column and its crease pattern. (a) Crease pattern with sector angle α = 33.69°. (b) Folded cylindrical lampshade with β = 60°, γ = 150°


Fig. 9 An early mass-produced Le Klint light design by Tage Klint. (Photo Courtesy of Le Klint)

Fig. 10 Graphical construction of a Miura-ori strip to be folded into the profile of the Le Klint lampshade design by Tage Klint

In his book (Lang 2018), Lang detailed a construction process for using the Miura-ori to generate any target profile that can be used to create three-dimensional surfaces, which he called a semi-generalized Miura-ori (SGMO). Lang’s construction method is fairly straightforward and requires almost no mathematics. Below are the steps for constructing Tage Klint’s lamp using a method that is similar to Lang’s, with a small modification that uses divots at the bottom rim of the Klint lampshade (Fig. 10).

1. Draw a Le Klint lampshade profile (Fig. 10a).
2. Draw pairs of lines parallel to and equally spaced from the original profile. Make sure the lines are long enough to intersect with the next pair of lines. The offset distance between the original profile and its parallel lines is half of the pleating width w (Fig. 10b).


Fig. 11 Graphical construction of the crease pattern that can be folded into the Le Klint lampshade design by Tage Klint: (a) unfolded strip, (b) mirror patterns, (c) arrayed crease pattern

3. Draw diagonal lines connecting the turning points on the profile and the respective intersection points (Fig. 10c).
4. Trim the excess guidelines (Fig. 10d).
5. Draw the silhouette of the folded strip that follows the desired path (Fig. 10e).
6. Change each bird’s-foot vertex to a bird’s-foot vertex with a divot (Fig. 10f).

The folded strip in Fig. 10f can be rearranged onto a flat sheet in the form of polygons (Fig. 11a). This pattern can then be mirrored, and mountain and valley creases can be assigned as in Fig. 11b. The pattern in Fig. 11b can then be arrayed to generate the crease pattern in Fig. 11c, which can be folded into the Le Klint lamp with rotational symmetry designed by Tage Klint.

Yoshimura Tessellation

The Yoshimura tessellation was discovered by the scientist Y. Yoshimura (Yoshimura 1955) while he was researching the buckling patterns of thin-walled cylinders. One of the most important features of the Yoshimura pattern is that it allows the form to shrink in all directions when compressed or folded, facilitating easy transportation and storage. A regular deployment of the pattern produces an approximated arc form that is suitable for creating three-dimensional volumetric surfaces.


Yoshimura Tessellation and Its Double Bird’s-Foot Vertex

If two Miura-ori bird’s-foot vertices that have opposite sector angles come close together to form a single vertex, the result is a double bird’s-foot vertex. This double bird’s-foot vertex can be arrayed on a strip, and the strip can then be arrayed to form a Yoshimura tessellation (Fig. 4b). Since the double bird’s-foot vertex is a degree-6 vertex with three degrees of freedom, the deployment and compression of the Yoshimura pattern is somewhat flexible. If the boundary of the Yoshimura pattern is not fixed, its folded surface can be twisted and deformed in multiple directions (Wu 2017). To semi-generalize the Yoshimura pattern so that various shapes of profiles for either extrusion or rotation can be generated for use as a lampshade, the focus is placed on a single strip of the Yoshimura pattern. At the core of the single-strip Yoshimura is the double bird’s-foot vertex, which has bilateral symmetry (the bilateral symmetry allows the semi-generalized Yoshimura tessellation to satisfy Kawasaki’s Theorem). Since a double bird’s-foot vertex is a degree-6 vertex with three degrees of freedom, the folding angles γ on the left and right could work independently. However, to restrict the degrees of freedom in the mechanical behavior of a Yoshimura strip, only a consistent folding angle γ is considered here. To semi-generalize the Yoshimura, four parameters of the double bird’s-foot vertex shown in Fig. 12 can be adjusted: the two sector angles α1 and α2, the folding angle along the corrugation γ, and the bending angle along the corrugation β. From Eq. 1 for a bird’s-foot vertex, Eq. 7 can be derived, as proved by Robert Lang (Lang 2018):

$$\tan\left(\frac{\beta}{2}\right) = \tan(\alpha)\,\sin\left(\frac{\gamma}{2}\right) \tag{7}$$

In a double bird’s-foot vertex, the folding angle γ is restricted to be consistent. From Eq. 7, β in the double bird’s-foot vertex must therefore satisfy Eq. 8 below when the two sector angles α1 and α2 are different:

$$\beta = 2\arctan\left(\tan(\alpha_1)\,\sin\left(\frac{\gamma}{2}\right)\right) + 2\arctan\left(\tan(\alpha_2)\,\sin\left(\frac{\gamma}{2}\right)\right) \tag{8}$$
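Equation 8 can be evaluated for the parameter values given in the caption of Fig. 12. This is a minimal sketch with a hypothetical function name, reading Eq. 8 as the sum of two single bird's-foot contributions from Eq. 7. With α1 = 37.7°, α2 = 21.7°, and γ = 99.9° it returns about 95.1°, close to the β = 95.0° quoted in the caption.

```python
import math

def double_birds_foot_bending(alpha1_deg, alpha2_deg, gamma_deg):
    """Eq. 8: beta = 2*atan(tan(a1)*sin(g/2)) + 2*atan(tan(a2)*sin(g/2)); degrees."""
    s = math.sin(math.radians(gamma_deg) / 2)
    part1 = 2 * math.atan(math.tan(math.radians(alpha1_deg)) * s)
    part2 = 2 * math.atan(math.tan(math.radians(alpha2_deg)) * s)
    return math.degrees(part1 + part2)

# Parameter values from the caption of Fig. 12.
print(round(double_birds_foot_bending(37.7, 21.7, 99.9), 1))
```

When the two sector angles are equal and γ = 180°, the expression reduces to β = 4α.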

Folding Yoshimura into Cylindrical Lampshade with Translational Symmetry

A Yoshimura strip can be folded into either a seamless regular polygonal profile or a seamless irregular polygonal profile. Such a Yoshimura strip can be arrayed into a Yoshimura tessellation, which in turn can be folded into a seamless cylindrical lampshade with translational symmetry. The seamless condition results from the fact that the edges of the folded paper align perfectly without any gaps. To fold the Yoshimura strip into a seamless profile outlined by regular polygons with


Fig. 12 A Yoshimura double bird’s-foot vertex: (a) crease pattern, (b) partially folded form when α1 = 37.7°, α2 = 21.7°, β = 95.0°, and γ = 99.9°





Fig. 13 Two Yoshimura strips: (a) one folds into a square profile with α = 45°, β = 180°, and γ = 180°, (b) the other one folds into a hexagon profile with α = 30°, β = 120°, and γ = 180°

n sides (Fig. 13) when γ = 180°, angles α and β must satisfy Eqs. 9 and 10 below, respectively:

$$\alpha = \frac{\pi}{n} \tag{9}$$

$$\beta = 4\alpha = \frac{4\pi}{n} \tag{10}$$

It is important to note that n must be an even number that is bigger than 4. If the polygons have an odd number of sides, the mountain or valley crease assignments at the ends of a Yoshimura strip won’t be able to match. When n = 4 and α = 45°, the Yoshimura strip can be folded with no overlapping part, as in Fig. 13a. However, the folded strip does not leave any space for a light source if the strip is arrayed and folded into a lampshade. When n = 6, α = 30°, and w = (√3/6) d, the ratio between the area of the empty space and the overall area enclosed by the folds is 1/3. The lampshade


folded from the tessellation that is arrayed from such a strip is big enough to host a light source. In general, in a Yoshimura regular polygonal strip, let AL be the larger area enclosed by the folds, and let AS be the empty space. Since

$$A_L = \frac{d}{2\cos(\alpha)} \tag{i}$$

$$A_S = \frac{d\,\tan(\alpha)}{\tan(2\alpha)} \tag{ii}$$

the ratio between AS and AL is given by the following equation:

$$\frac{A_S}{A_L} = f(\alpha) = \frac{2\sin(\alpha)}{\tan(2\alpha)} \tag{11}$$
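The behavior of f(α) can be explored numerically. The sketch below reads Eq. 11 as f(α) = 2 sin(α) / tan(2α) with α in radians and restricts α to the open interval (0, π/4), where tan(2α) is finite and positive; the function name and that domain check are assumptions of the sketch. As printed, Eq. 11 evaluates to about 0.577 at α = 30°; the value 1/3 quoted for n = 6 equals the square of that number, so the quoted ratio may be based on squared quantities.

```python
import math

def empty_to_enclosed_ratio(alpha):
    """Eq. 11 read as f(alpha) = 2 sin(alpha) / tan(2 alpha); alpha in radians."""
    if not 0 < alpha < math.pi / 4:
        raise ValueError("alpha must lie strictly between 0 and pi/4")
    return 2 * math.sin(alpha) / math.tan(2 * alpha)

# alpha = pi / n for a regular n-sided profile; n = 6, 8, 12 are sample values.
for n in (6, 8, 12):
    alpha = math.pi / n
    print(n, round(empty_to_enclosed_ratio(alpha), 3))
```

The printed values grow as n increases, that is, as α shrinks, which matches the trend described for the sector angle and the empty space.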

Plotting the function f(α) gives the diagram in Fig. 14. As the sector angle α gets smaller, the ratio between AS and AL gets larger, which means that the empty volume enclosed by the paper folds gets bigger compared to the overall paper fold volume. An empty volume that is big enough to host a light source is a desired feature for a lampshade. However, as α gets smaller and the empty volume gets larger, the pleating width w also gets smaller. A smaller pleating width creates a less dramatic gradation of light and shadow. Therefore, the key is to balance these design considerations to arrive at a successful lampshade design. A Yoshimura strip can be folded into a seamless profile outlined by irregular polygons with n sides (Fig. 15) when γ = 180°. Let i index the double bird’s-foot vertices in a Yoshimura strip, and let di be the distance between two consecutive double bird’s-foot vertices. If the lines that divide sector angles of two consecutive

Fig. 14 Graph showing the ratio between AS and AL (vertical axis) as a function of the sector angle α in radians (horizontal axis)

For n > 4, a laid rope will naturally get more of a tubular surface, like a braided rope; compare the cross sections in Fig. 17. Defining how to measure the diameter of a braided rope or a laid rope with n > 4 is rather straightforward. However, in the case of laid rope with n = 2, 3, 4, it could be debated how to define the diameter, especially in the case of n = 2. A simple solution is to take the diameter of the circle that encloses the rope, which is somewhat dubious for n = 2 but acceptable for n ≥ 3. As mentioned above, it is rather difficult to calculate an accurate diameter based on the geometry of the theoretical cross section of a rope, given all the normally unknown physical parameters and factors. A useful equation is presented in Himmelfarb (1957), seemingly adapted to a common angle of lay and the factors mentioned above. The approximation of 1/π (as 0.32) is likely part of the practical adjustment:

$$d_r = 0.32\, d_s \cdot (n + 3) \tag{5}$$

where dr is the diameter of the rope, ds is the diameter of a strand, and n is the number of strands. If n ≥ 4, it is essential that the correct diameter of the core is used, as the effect of the size of the core is ignored in the equation. The size of the core is discussed in the subsequent section. Nowadays, rope is normally made using machines that spool the rope directly onto bobbins. These machines are often specialized for specific rope dimensions and need to be adapted (e.g., by changing gears) in order to produce other dimensions. Laying rope on a rope walk gives more flexibility in terms of rope dimension, which makes Eq. 5 rather useful.
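Equation 5 is simple enough to tabulate for the strand counts discussed in the text. The following is a minimal sketch; the function name and the 10 mm strand diameter are illustrative assumptions, and the result carries whatever length unit is used for the strand diameter.

```python
def rope_diameter(strand_diameter, n_strands):
    """Eq. 5 (Himmelfarb 1957): d_r = 0.32 * d_s * (n + 3)."""
    if n_strands < 2:
        raise ValueError("a laid rope needs at least two strands")
    return 0.32 * strand_diameter * (n_strands + 3)

# Hypothetical strand diameter of 10 mm for two to six strands.
for n in range(2, 7):
    print(n, round(rope_diameter(10.0, n), 1))
```

Because the equation ignores the size of the core, the values for n ≥ 4 are only meaningful when a correctly sized core is used, as noted above.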


A. Åström and C. Åström

Core Diameter

Studying the cross sections in Fig. 17, it is intuitive to see why a core is needed for n ≥ 4. A core could also be used for n = 3 but would need to be rather small in comparison to the strands. It would in addition make the rope diameter calculation less accurate if using an equation adapted for the deformation of the strands due to the space between the strands. A core is usually applied for rope with n ≥ 4. Normally, the core consists of an additional strand, made of yarns, which during the laying of the rope is twisted in the same direction as the strands. An alternative is to use another, smaller rope as the core. This method has the advantage that the core will not be deformed, which needs to be accounted for in the calculations. A rope like this, when used for applications other than decorative ones, may have unwanted properties such as decreased tensile strength, as the core rope will not have the same tensile properties as an additional strand used as core. Calculations of the size of the core do not suffer from all the uncertainties that affect the rope diameter. As the yarns are twisted in the core, they will form helices, but the core itself will not be shaped into a helix like the other strands. The calculation does, however, have some additional difficulties because the surrounding outer strands compress the core. This makes the cross section of the core resemble a polygon rather than a circle. For instance, in some specimens of four-stranded rope with a core, the cross section of the core is shaped as a square. A useful equation is presented by Lawrie (1948) which, just like the rope diameter equation, is seemingly adjusted based on field experience:

$$d_c = \frac{3}{32}\, d_s \cdot (n + 3) \tag{6}$$

where dc is the diameter of the core, ds is the diameter of a strand, and n is the number of strands. Lawrie (1948) writes that the core is usually made slightly larger than the calculated diameter in order to compensate for compression and strand deformation. Another interesting method for calculating the size of the core in four-stranded ropes is given by af Trolle (1841). He argues that the core shall consist of around 8–12% of the total number of yarns in the strands of the rope.
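The two core-sizing rules above can be compared side by side. The sketch below reads Eq. 6 as d_c = (3/32) · d_s · (n + 3), which is an assumption about the layout of the printed fraction, and expresses af Trolle's rule as a range of yarn counts; the function names and the sample numbers are illustrative only.

```python
def core_diameter(strand_diameter, n_strands):
    """Eq. 6 (Lawrie 1948), read here as d_c = (3/32) * d_s * (n + 3)."""
    return 3.0 / 32.0 * strand_diameter * (n_strands + 3)

def core_yarn_range(total_strand_yarns, low=0.08, high=0.12):
    """af Trolle (1841): the core holds around 8-12 % of the yarns in the strands."""
    return low * total_strand_yarns, high * total_strand_yarns

# Hypothetical four-stranded rope: 8 mm strands, 25 yarns per strand.
print(core_diameter(8.0, 4))
print(core_yarn_range(4 * 25))
```

The yarn counts are returned as unrounded numbers; how to round them to whole yarns is left open, since the text does not specify it.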

Mechanical and Physical Properties

An important physical property to consider when working with rope is the humidity, which affects the hardness of the rope and thus how pliable it will be. The humidity is significant both for the process of manufacturing the rope and for the activity of using the rope, which in this chapter means decorative purposes. This makes it important to know, already when producing the rope, what it will be used for. If the rope is manufactured in an environment with high humidity and later moved to an environment with low humidity, the rope will dry up and become looser (the helix angle will decrease). If the rope instead is manufactured in an environment with low humidity and the humidity later increases, it will become harder (the helix angle will increase). Thus, when using rope to create fancywork, the desired humidity depends on the field of application. Note also

15 Art and Science of Rope


that different materials are affected to different extents by the humidity. Further, there are clearly a number of important mechanical and physical properties of the rope itself to consider as well, such as linear density, elongation, and breaking force. These properties are mainly associated with rope applications such as securing a boat or rock climbing. When using rope for decorative purposes, it is not likely that the rope will snap when tying fancywork or equivalent. Nonetheless, there are some indirect correlations to rope made for decorative use as well. When producing rope to be used under physical stress, it is convenient to have estimated the characteristics of the rope beforehand. When using the rope for decorative purposes, this is not as critical, which is why these properties are not discussed in much detail. These properties may also be determined using the test methods defined in ISO 2307 (2010), which can be applied to a sample of the produced rope. Other properties of interest are the degree of twist and the yarn length in relation to the rope length.

Degree of Twist

When laying rope on a traditional rope walk, the degree of twist depends foremost on the twist speed of the strands and the velocity at which the top is moved forward. In rope machines where the rope is spooled onto a bobbin, the degree of twist naturally depends on gear ratios specific to the machine. Deciding the degree of twist when manufacturing rope is normally something that the rope maker does based on experience and the machine or technique at hand. In contrast, measuring the degree of twist in a rope is preferably done by following a standard, such as ISO 2307 (2010). The angle is measured while a sample of the rope is put under a specified tension. According to the standard, the tension is given by the reference number ascribed to the rope. In general, a greater helix angle means a harder rope structure and a less pliable rope. If the rope is twisted too hard, it will be difficult for a knot artist to work with and form the rope as intended. If the rope instead is twisted too loosely, it will on the one hand be easy to lay out and form according to one’s needs, but on the other hand the finished product can change shape, as it may be somewhat loose and floppy. Some authors, e.g., Barker and Midgley (1914) and Osborne and Osborne (1954), tend to refer to the resulting product as soft, medium, etc., and to refer to angle intervals when describing the effect of the degree of twist. It is, however, important to relate the degree of twist to the number of strands in the rope. For instance, a two-stranded rope has certain types of applications associated with it, which may allow a wide range in the appropriate degree of twist. For three-stranded rope, the associated applications may allow a narrower range in the appropriate degree of twist. Clearly, the entire span is not appropriate for every rope application.
For instance, when making a lasso, the rope should be twisted as hard as possible, but when tying a key ring, the degree of twist should be somewhat smaller. See Fig. 18 for examples of products made of rope with a low and a high degree of twist, respectively. Note also that the degree of twist depends on the humidity; that is, the humidity when the rope is used to manufacture the fancywork may differ from the humidity when the rope was laid. In addition, the humidity may also differ where the finished product is to be used.


Fig. 18 Decorative fancywork: (a) a belt made of flax rope with n = 3 and γ = 25° and (b) a rectangular diagonal knotwork realized in hemp rope with n = 3 and γ = 33°

Linear Density

The linear density of a rope refers to the net mass per unit length when the rope is under the specified tension described above. Recalling the cross section of a rope as a function of the rope’s degree of twist, it is intuitive to see that the linear density increases with the degree of twist. Thus, the more packed the rope is, the higher the linear density. Other parameters that affect the packing are the number of strands and whether a core is used. Mckennan et al. (2004) give the following equation for the linear density of a rope ρr in [kg/m]:

$$\rho_r = 10^{-6} \cdot n_y \cdot \rho_y \cdot f_c \tag{7}$$

where ny is the total number of yarns in the rope, ρy is the linear density of the yarn measured in tex, and fc is the contraction factor of the rope. The contraction factor is a consequence of the twisting or braiding of the rope, which gives the helical structures of the strands and the yarns. The twisting or braiding increases the linear density of the rope and in addition makes the rope contract in length. The contraction factor is given by the mean value of the yarns’ degree of twist θ. Recall the different layers of yarns in a strand as discussed in section “Machines”. The mean value is in turn given by γy, the helix angle of the yarn, measured at the surface of a strand (Hearle et al. 1969; Mckennan et al. 2004):

$$f_c = \sec(\theta) = \frac{1}{2}\left(\sec(\gamma_y) + 1\right) \tag{8}$$
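Equations 7 and 8 can be combined into a short calculation. This is a minimal sketch: the function names and the sample rope (30 yarns of 2000 tex with a 25° surface helix angle) are hypothetical, and the factor 10⁻⁶ converts tex (grams per kilometer) to kilograms per meter.

```python
import math

def contraction_factor(gamma_y_deg):
    """Eq. 8: f_c = (sec(gamma_y) + 1) / 2, with gamma_y in degrees."""
    return 0.5 * (1.0 / math.cos(math.radians(gamma_y_deg)) + 1.0)

def rope_linear_density(n_yarns, yarn_tex, gamma_y_deg):
    """Eq. 7: rho_r = 1e-6 * n_y * rho_y * f_c, in kg/m for rho_y given in tex."""
    return 1e-6 * n_yarns * yarn_tex * contraction_factor(gamma_y_deg)

# Hypothetical rope: 30 yarns of 2000 tex, yarn helix angle of 25 degrees at the strand surface.
print(round(contraction_factor(25.0), 4))
print(round(rope_linear_density(30, 2000.0, 25.0), 4))
```

With no twist (γy = 0°) the contraction factor equals 1 and the linear density is simply the sum of the yarn densities.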

Considering a rope application such as rock climbing, it is easy to understand why a rope with a very low linear density but a high breaking force may be preferred. For rope used for decorative purposes, there are no such requirements. However, historically, and sometimes still today, manufacturers of rope calculate the price of rope by its weight. The parameters of the rope could then be rope diameter, number of strands, and the total weight of the rope rather than its total length. Given this circumstance, the linear density is important since it will affect the length of the rope.


Breaking Force

There are several factors that affect the tensile strength of rope and thus its breaking force. Mckennan et al. (2004) describe the strength of rope, including tables of the breaking strength of rope of various materials. The breaking strength can also be determined by following a standard, such as ISO 2307 (2010). Pan and Brookstein (2001) review the physical properties of twisted structures, including rope, and describe different factors that affect the tensile behavior and strength of rope, such as lateral compression in the yarns. Another important factor for strong rope is the fiber length, which needs to be long in relation to the diameter of the yarn in order to give strong yarns. If the rope is intended for decorative purposes, the breaking force is not among the most important characteristics, which means that rope made of material with short fibers, such as coir and cotton, may be used; see Figs. 7 and 11, respectively. In fact, short fibers might be a desirable characteristic, as the rope will have a rather different appearance. The rope tends to be a bit fluffy, which may be desired for various decorative products. Yet, when using a material with shorter fibers, the yarn thickness will be limited, as the thickness depends on the fiber length.

Elongation

Elongation can be discussed mainly from three perspectives: firstly while laying the rope, secondly directly after the rope has been made, and thirdly in a finished and stabilized rope. The first and second are rather interlinked, as a rope is laid under a certain tension which causes the strands to extend. Thus, there is a certain elongation due to the tension used while laying the rope. When the tension is released, the rope will naturally be compressed. For instance, when laying a full-length rope in a rope walk of 220 m, the length will differ by a few percent until the rope has stabilized.
This also means that the degree of twist will at first have a lower angle and then when the rope is stabilized the angle will increase. However, according to Bohr and Olsen (2011) who discusses the zero twist of rope, the rope will have no flexibility at all if laid with zero twist. In practice though, laying a rope at the zero twist is, if at all possible, rather tough. If the rope instead is produced in a machine which continuously spools the product onto a bobbin, it is more difficult for the rope to compress. Further, considering the elongation from the third perspective, when the rope has stabilized, the same reasoning as with the breaking force can be applied. The conclusion is that it is not of much importance for rope made for decorative purposes. That is, the importance rather concerns those applications where the rope will be under a significant load. The elongation in a stabilized rope can be determined by following a standard, such as ISO 2307 (2010).

Rope Length An important activity when preparing for rope making, regardless of the rope's intended use, is to calculate the amount of material needed. This usually means calculating the total length of the yarn. Given all the parameters and factors that affect the resulting product and its length, this is rather complicated. These can roughly be divided into geometrical and physical parameters and factors. The geometrical parameters consist of the number of strands $n_s$, the diameter of a strand $d_s$, the angle of lay $\gamma_s$, the number of yarns $n_y$, the diameter of a yarn $d_y$, the degree of twist in the yarns $\gamma_y$, and the yarn length $L_y$. The physical parameters and factors consist of the tension in the strands and yarns during the laying of the rope, the humidity, and the water absorption in the yarns. Various approximations can be found in the literature. For instance, Himmelfarb (1957) presents the following set of equations for calculating the length of a strand and a rope, respectively:

$$L_s = L_y \cos(\gamma_y), \tag{9}$$

$$L_r = L_s \cos(\gamma_s). \tag{10}$$

The length of the laid rope is denoted $L_r$, the strand length $L_s$, the helix angle of the rope $\gamma_s$, and the helix angle of the outer yarns in a strand $\gamma_y$. These approximations can be derived from a right triangle, with the yarn or strand length as hypotenuse and catheti equal to $2\pi r_h$ and the resulting strand or rope length. Lawrie (1948) uses similar approximations but multiplies the strand length by a scalar computed from a common angle of lay.
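As a rough illustration, Eqs. (9) and (10) can be evaluated numerically. The following Python sketch is not taken from Himmelfarb (1957); the function names, the choice of degrees as the angle unit, and any sample values are assumptions made here for illustration only. It also inverts the two equations to estimate the yarn length needed for a desired rope length, ignoring the physical factors (tension, humidity, water absorption) mentioned above.

```python
import math


def strand_length(yarn_length: float, yarn_helix_angle_deg: float) -> float:
    """Eq. (9): L_s = L_y * cos(gamma_y), with gamma_y given in degrees."""
    return yarn_length * math.cos(math.radians(yarn_helix_angle_deg))


def rope_length(strand_len: float, rope_helix_angle_deg: float) -> float:
    """Eq. (10): L_r = L_s * cos(gamma_s), with gamma_s given in degrees."""
    return strand_len * math.cos(math.radians(rope_helix_angle_deg))


def yarn_length_needed(target_rope_length: float,
                       yarn_helix_angle_deg: float,
                       rope_helix_angle_deg: float) -> float:
    """Invert Eqs. (9) and (10): L_y = L_r / (cos(gamma_y) * cos(gamma_s)).

    This is only the geometrical approximation; it is undefined when
    either angle is 90 degrees, where the cosine vanishes.
    """
    factor = (math.cos(math.radians(yarn_helix_angle_deg))
              * math.cos(math.radians(rope_helix_angle_deg)))
    return target_rope_length / factor
```

For example, `yarn_length_needed(50.0, 20.0, 30.0)` would give a first estimate of the yarn length for a 50 m rope under the hypothetical angles of 20 and 30 degrees; a practical rope maker would still add a margin for the physical factors.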

Conclusion Due to the perishable nature of cordage, archaeological evidence of actual rope is rather scarce. Instead, the earliest evidence is indirect, coming from applications that must have required rope. In the case of rope made for decorative purposes, it is even more difficult to speculate about when it was first used. It may even be debatable what is meant by decorative rope. By comparison, of the cordage used in clothing, a considerable amount was likely for practical use, whereas some was for decorative purposes, which may be distinguishable by weaving patterns and the like. The characteristics separating practical and decorative rope may not be as distinct as that. What can be said, though, with support from archaeological findings, is that in addition to the use of rope for practical purposes, there has been a secondary use of rope for decorative purposes. A distinct example of such findings is the impressions of rope made on pottery discussed above.

Having distinguished rope for decorative purposes from rope for practical use, it is probably safe to say that most knot artists nowadays do not have the means to produce their own rope with the physical properties they prefer. Naturally, there is an overlap between the physical properties of rope for practical use and rope for decorative purposes, which makes it easy to acquire rope suitable for fancywork.

From a historical, geographical, and cultural perspective, the market has demanded different kinds of material and rope types. The material for rope has shifted over time, from materials available in a local context, to materials originating basically anywhere, later to man-made materials which in theory could be made locally, and then to some extent back to natural products. Just like the preferred material, the preferred type of rope has drifted between laid and braided. Nowadays, rope applications with physical demands tend to favor braided rope. Fancywork may yet be more diverse in the preferred rope type, which may vary with geographical location, intended use, and time, among other things. In Scandinavia, for applications such as doormats, laid rope made of natural materials seems to be preferred.

Studying rope from a mathematical point of view, there is a lot to be learned concerning its behavior and structure. For instance, the zero twist is a geometrical property rather than a material one, as shown by Bohr and Olsen (2011). However, as the zero twist is obtained at such a high angle of twist regardless of the number of strands, there is presumably a material limitation that prevents the rope from reaching the zero twist. One such limitation is tensile strength: the rope will break during laying due to the high tension being applied. Another property that in practice depends on the material is the size of the yarn, which is limited for material with short fibers. Unquestionably the most important mathematical property of laid rope is the reason why it will not unwind. According to archaeological findings, this has been known to mankind for several millennia, but as pointed out by Bohr and Olsen (2011), it has only recently been explained mathematically. The basic idea of rope is rather simple and its structure may at first look primitive, but the mathematical models needed to fully capture its properties and characteristics are indeed rather complex. However, for the modern rope maker, equations based on approximations for calculating, for instance, diameter and length are more than sufficient.
Note All figures and images are either made by or photographed by A. Åström if not stated otherwise. All ropes and fancywork are made by C. Åström if not stated otherwise.

References

Adovasio J, Lynch T (1973) Preceramic textiles and cordage from Guitarrero cave, Peru. Am Antiq 38(1):84–90
Adovasio J, Soffer O, Illingworth J, Hyland D (2014) Perishable fiber artifacts and paleoindians: new implications. North Am Archaeol 35(4):331–352
af Trolle AE (1841) Handbok I takling, med afseende på Handelsfartyg. Berlingska Boktryckeriet, Lund
Allen J, O’Connell JF (2008) Getting from sunda to sahul. In: Clark G, Leach F, O’Connor S (eds) Islands of inquiry: colonization, seafaring and the archaeology of maritime landscapes. ANU E Press/Australian National University, Canberra, pp 31–46
Amsden C (1930) What is clockwise? Am Anthropol 32(3):579–580
Andersen K (1984) Hårkullorna Fra Våmhus. Karen Andersens børn og Bangsbomuseet, Frederikshavn
Ascher M, Ascher R (1969) Code of ancient peruvian knotted cords (quipus). Nature 222(5193):529–533


A.S.T.M. D 123-03 (2003) Standard terminology relating to textiles. American Society for Testing Materials, Philadelphia
A.S.T.M. D 123-49 (1949) Standard definitions of terms relating to textile materials. In: 1949 book of A.S.T.M. standards, part 5. American Society for Testing Materials, Philadelphia, pp 1–32
Åström A, Åström C (2016) Mathematical and physical properties of rope made for decorative purposes. In: Torrence E, Torrence B, Séquin C, McKenna DKF, Sarhangi R (eds) Proceedings of bridges 2016: mathematics, music, art, architecture, education, culture, pp 681–688
Aujoulat N (2005) Lascaux: movement, space and time. Harry N. Abrams, Inc., New York
Backwell L, d’Errico F (2008) Early hominid bone tools from Drimolen, South Africa. J Archaeol Sci 35(11):2880–2894
Balmelle C, Blanchard-Lemée M, Darmon J, Gozlan S, Raynaud M (2002) Le Décor Géométrique De La Mosaïque Romaine: II. Répertoire graphique et descriptif décors centrés. Picard, Paris
Barber E (1991) Prehistoric textiles. Princeton University Press, Princeton
Barker A, Midgley E (1914) Analysis of woven fabrics. Scott, Greenwood and Sons, London
Becker L, Kondoleon C (2005) The arts of antioch – art historical and scientific approaches to roman mosaics and a catalogue of the worcester art museum antioch collection. Worcester Art Museum, Worcester
Bednarik RG (2005) The technology and use of beads in the pleistocene. In: Submitted to Paul Bouissac on 8-8-2005 for archaeology of gesture conference, Cork
Bendure Z, Pfeiffer G (1946) America’s fabrics: origin and history, manufacture, characteristics and uses. The Macmillan Company, New York
Bohr J, Olsen K (2011) The ancient art of laying rope. Europhys Lett 93(6):1–5
Brunnschweiler D (1953) Braids and braiding. J Text Inst Proc 44(9):666–686
Cahlander A (1980) Sling braiding of the andes. Colorado Fiber Center, Boulder
Çamurcuoğlu DS (2015) The wall paintings of Çatalhöyük (Turkey): materials, technologies and artists. PhD dissertation, University College London
Chapman W (1798) Making ropes of any number of yarns and strands, tarred or untarred; coiling up the same while making. British Patent Number: 2219
Chauvet J (1996) Dawn of art: The Chauvet cave (the oldest known paintings in the world). Harry N. Abrams, Inc., New York
Clark JGD (1936) The mesolithic settlement of northern Europe: a study of the food-gathering peoples of northern Europe during the early post-glacial period. Cambridge University Press, Cambridge
Collias NE, Collias EC (1962) An experimental study of the mechanisms of nest building in a weaverbird. Auk 79(4):568–595
Conard NJ, Malina M (2016) Außergewöhnliche neue funde aus den aurignacienzeitlichen schichten vom hohle fels bei schelklingen. In: Archäologische Ausgrabungen in Baden-Württemberg 2015. Konrad Theiss Verlag, Stuttgart, pp 60–66
Connolly T, Erlandson J, Norris S (1995) Early holocene basketry and cordage from Daisy Cave San Miguel Island, California. Am Antiq 60(2):309–318
David SK, Pailthorpe MT (1999) Classification of textile fibres: production, structure, and properties. In: Robertson J, Grieve M (eds) Forensic examination of fibres. Taylor and Francis, London, pp 1–31
d’Errico F, Backwell LR (2003) Possible evidence of bone tool shaping by swartkrans early hominids. J Archaeol Sci 30(12):1559–1576
Desroches-Noblecourt C (1963) Tutankhamen – life and death of a Pharaoh. New York Graphic Society, New York
Dixon K (1957) Systematic cordage structure analysis. Am Anthropol 59(1):134–136
Drooker PB (2000) Approaching fabrics through impressions on pottery. In: Approaching textiles, varying viewpoints: proceedings of the seventh biennial symposium of the textile society of America. Textile Society of America, Santa Fe, pp 59–68
Dunbabin MK (2006) Mosaics of the Greek and Roman world. Cambridge University Press, Cambridge


Dyer J, Daul GC (1998) Rayon fibers. In: Lewin M, Pearce EM (eds) Handbook of fiber chemistry. Volume 15 of international fiber science and technology series, 2nd edn. Marcel Dekker Inc., New York, pp 725–802
Emery I (1952) Naming the direction of the twist in yarn and cordage. El Palacio 59(8):251–262
Emery I (1966) The primary structures of fabrics – an illustrated classification. The Textile Museum/The Spiral Press, Washington, DC/New York
Engels H, Brabender K, Moeller P (1996) Flechttechnologie. Arbeitgeberkreis Gesamttextil, Eschborn
Ericsson NW (1739) En kårt berättelse, om rep och tråssars styrka eller sammanhängande kraft, som på almänt sät ihopwrides. In: Kongl. Swenska Wetenskaps Academiens Handlingar, för månaderna Julius, August. och September, 1739, vol 1. Kongl. Swenska Wetenskaps Academien, Stockholm, pp 52–64
Evans JJ, Ridge IML (2005) Rope and rope-like structures. In: Jenkins CHM (ed) Compliant structures in nature and engineering. Design and nature, vol 20. WIT Press, Boston, pp 133–169
Frimannslund R (1961) Rossmålreip. In: By og Bygd – Norsk Folkemuseums Årbok 1960, vol 14. Norsk folkemuseum, Oslo, pp 93–104
Goriely A, Neukirch S, Hausrath A (2012) Helices through 3 or 4 points? Note di Matematica 32(1):87–103
Harmand S, Lewis JE, Feibel CS, Lepre CJ, Prat S, Lenoble A, Boes X, Quinn RL, Brenet M, Arroyo A, Taylor N, Clement S, Daver G, Brugal J, Leakey L, Mortlock RA, Wright JD, Lokorodi S, Kirwa C, Kent DV, Roche H (2015) 3.3-million-year-old stone tools from lomekwi 3, West Turkana, Kenya. Nature 521(7552):310–315
Hearle J, Grosberg P, Backer S (1969) Structural mechanics of fibers, yarns, and fabrics, vol 1. Wiley, New York
Hill A, Ward S, Deino A, Curtis G, Drake R (1992) Earliest homo. Nature 355(6362):719–722
Himmelfarb D (1957) Cordage fibres and rope. Leonard Hill Limited, London
Holmes W (1884) Prehistoric textile fabrics of the united states, derived from impressions on pottery. Technical report, Government Printing Office, Washington, DC
Huddart J (1793) Making cables and other cordage. British Patent Number: 1952. Reprinted in the year 1856
Huddart J (1799) Registering or forming the strands in the machinery for manufacturing cordage. British Patent Number: 2339
Huddart J (1800) Tarring and manufacturing cordage. British Patent Number: 2421
Hurley W (1979) Prehistoric cordage – identification of impressions on pottery. Aldine manuals on archaeology, vol 3. Taraxacum Inc., Washington, DC
Hyland D, Adovasio J, Illingworth J (2003) The perishable artifacts. In: MacNeish R, Libby J (eds) Pendejo cave. University of New Mexico Press, Albuquerque, pp 297–416
Ingstad AS (1961) Rep av Furupert. Særtrykk av årbok for Norsk Skogsbruksmuseum - Skogsbruk, Jakt og Fiske 1958 - 1960. Elverum Trykk
ISO 1968 (2004) Fibre ropes and cordage – vocabulary. International Organization for Standardization, Geneva, Switzerland
ISO 2 (1973) Textiles – designation of the direction of twist in yarns and related products. International Organization for Standardization, Geneva, Switzerland
ISO 2307 (2010) Fibre ropes – determination of certain physical and mechanical properties. International Organization for Standardization, Geneva, Switzerland
Jirlow R (1931) Drag ur färöiskt arbetsliv. RIG 14(3–4):97–133
Johnson WC (1995) A new twist to an old tale: analysis of cordage impressions on late woodland ceramics from the potomac river valley. In: Petersen JB (ed) A most indispensable art. University of Tennessee Press, Knoxville, pp 144–159
Ko KH (2016) Origins of human intelligence: the chain of tool-making and brain evolution. Anthropol Noteb 22(1):5–22
Kvavadze E, Bar-Yosef O, Belfer-Cohen A, Boaretto E, Jakeli N, Matskevich Z, Meshveliani T (2009) 30,000-year-old wild flax fibers. Science 325(5946):1359


Kyosev Y (2015) Braiding technology for textile. Woodhead publishing series in textile: number 158. Woodhead Publishing, Cambridge
Larson HL (1929) Slagning av läderrep i dalarna. In: Lindblom A (ed) Fataburen, vol 3. Kulturhistorisk Tidskrift. Nordiska Museet, Stockholm, pp 153–161
Lawrie G (1948) The practical rope maker. H. R. Carter Publications Ltd., Belfast
Lepperhoff B (1914) Die Flechterei. Dr. Max Jänecke, Verlagsbuchhandlung, Leipzig
Lilley S (1948) Men, machines and history. Past and present: studies in the history of civilization, vol 7. Cobbett Press, London
Maclaren P (1955) Netting knots and needles. Man 55(105):85–89
March R (1784) Machine for manufacturing platted work, plain and figured lace, lines, ropes, and cables, nets, and net work. British Patent Number: 1445
Martin C (1991) Kumihimo – Japanese Silk Braiding Techniques. Lark Books, Asheville
McGee WJ (1897) Primitive rope-making in Mexico. Am Anthropol 10(4):114–119
Mckennan H, Hearle J, O’Hear N (2004) Handbook of fibre rope technology. Woodhead Publishing, Cambridge
McPherron SP, Alemseged Z, Marean CW, Wynn JG, Reed D, Geraads D, Bobe R, Bearat HA (2010) Evidence for stone-tool-assisted consumption of animal tissues before 3.39 million years ago at dikika, ethiopia. Nature 466(7308):857–860
Modéer I (1928) Öländskt tallrepslageri. In: Upmark G (ed) Fataburen. Kulturhistorisk Tidskrift, vol 1–2. Nordiska Museet, Stockholm, pp 27–70
Morgan DW (2004) Whips and whipmaking, 2nd edn. Cornell Maritime Press Inc., Centreville
Munro R (1888) The lake-dwellings of Europe: being the rhind lectures in archaeology for 1888. Nabu Press, LaVergne. Reprinted 2010
Myking T, Hertzberg A, Skrøppa T (2005) History, manufacture and properties of lime bast cordage in Northern Europe. Forestry 78(1):65–71
Nadel D, Danin A, Weker E, Schick T, Kislev M, Stewart K (1994) 19,000-year-old twisted fibers from Ohalo II. Curr Anthropol 35(4):451–458
Neukirch S, van der Heijden G (2002) Geometry and mechanics of uniform n-plies: from engineering ropes to biological filaments. J Elastic 69(1):41–72
Nomura K, Rinaldo C (2013) Tutankhamons Väverskor – Berättelsen om att Återskapa en Faros Textila Skatt. Bokförlaget Signum, Lund
O’Connell J, Allen J (2004) Dating the colonization of sahul (Pleistocene Australia-New Guinea): a review of recent research. J Archaeol Sci 31(6):835–853
O’Connell JF, Allen J, Hawkes K (2010) Pleistocene sahul and the origins of seafaring. In: Anderson A, Barrett J, Boyle K (eds) The global origins and development of seafaring. Cambridge University/The McDonald Institute for Archaeological Research, Cambridge, pp 57–68
Olofsson O (1936) Rep av trä och näver. In: Norrbotten. Norrbottens läns hembygdsförening. Särtryck, Luleå, pp 117–155
Olsen K, Bohr J (2010) The generic geometry of helices and their close-packed structures. Theor Chem Acc 125(3–6):207–215
Olsen K, Bohr J (2011) The geometrical origin of the strain-twist coupling in double helices. AIP Adv 1:1–7
O’Neill ME (1936) Police microanalysis: III cordage and cordage fibers. J Crim Law Criminol 27(1):108–115
Osborne D, Osborne C (1954) Twines and terminologies. Am Anthropol 56(6):1093–1101
Pan N, Brookstein D (2001) Physical properties of twisted structures. II. Industrial yarns, cords, and ropes. J Appl Polym Sci 83(3):610–630
Pike AWG, Hoffmann DL, García-Diez M, Pettitt PB, Alcolea J, De Balbín R, González-Sainz C, de las Heras C, Lasheras JA, Montes R, Zilhão J (2012) U-series dating of paleolithic art in 11 caves in Spain. Science 336(6087):1409–1413
Polhem C (1739) Tankar, til ytterligare styrkande af Wallerii utgifne rön om rep. In: Kongl. Swenska Wetenskaps Academiens Handlingar, för månaderna Julius, August. och September, 1739, vol 1. Kongl. Swenska Wetenskaps Academien, Stockholm, pp 65–67
Przybył S, Pierański P (2001) Helical close packings of ideal ropes. Eur Phys J E 4(4):445–449


Rausing G (1967) The bow – some notes on its origin and development. Acta archaeologica lundensia series altera in 8°, N° 6. CWK Gleerups Förlag, Lund
Redondo FJ (2017) Analysis of the perforated batons functional hypothesis. In: Alonso R, Baena J, Canales D (eds) Playing with the time. Experimental archaeology and the study of the past, 4th international experimental archaeology conference, 8–11 May 2014. Museo de la Evolución Humana/Universidad Autónoma de Madrid, Burgos/Madrid, pp 209–214
Ross M, Nolan RP (2003) History of asbestos discovery and use and asbestos-related disease in context with the occurrence of asbestos within ophiolite complexes. In: Dilek Y, Newcomb S (eds) Ophiolite concept and the evolution of geological thought, Special paper 373. Geological Society of America, Boulder, pp 447–470
Ryan D, Hansen D (1987) A Study of Ancient Egyptian Cordage in the British Museum. Occasional Paper No 62. British Museum, London
Saheb DN, Jog JP (1999) Natural fiber polymer composites: a review. Adv Polym Technol 18(4):351–363
Sanders D (2010) Knowing the ropes: the need to record ropes and rigging on wreck-sites and some techniques for doing so. Int J Naut Archaeol 39(1):27–126
Semaw S (2000) The world’s oldest stone artefacts from Gona, Ethiopia: their implications for understanding stone technology and patterns of human evolution between 2.6–1.5 million years ago. J Archaeol Sci 27(12):1197–1214
Semaw S, Rogers MJ, Quade J, Renne PR, Butler RF, Dominguez-Rodrigo M, Stout D, Hart WS, Pickering T, Simpson SW (2003) 2.6-million-year-old stone tools and associated bones from ogs-6 and ogs-7, Gona, Afar, Ethiopia. J Hum Evol 45(2):169–177
Semenov SA (1973) Prehistoric technology – an experimental study of the oldest tools and artefacts from traces of manufacture and wear. Adams & Dart, Bath. Translated by Thompson, M. W.
Shahzad A (2012) Hemp fibre and its composites – a review. J Compos Mater 46(8):973–986
Shahzad A (2013) A study in physical and mechanical properties of hemp fibres. Adv Mater Sci Eng 2013:1–9
Soffer O (2004) Recovering perishable technologies through use wear on tools: Preliminary evidence for upper paleolithic weaving and net making. Curr Anthropol 45(3):407–413
Soffer O, Adovasio J, Hyland D (2000) The “venus” figurines – textiles, basketry, gender, and status in the upper paleolithic. Curr Anthropol 41(4):511–537
Speiser N (1974) The Japanese art of braiding. CIBA-GEIGY Rev 4(4):24–35
Splitstoser J (2012) The parenthetical notation method for recording yarn structure. In: Textiles and politics: textile society of America 13th biennial symposium proceedings. Textile Society of America, Washington, DC, pp 1–16
Stevenson MC (1915) Ethnobotany of the Zuñi Indians. Thirteenth annual report, 1908–1909. Bureau of American Ethnology, Washington, DC
Stigum H (1933) Forsynkroken i Norge og på færöiene. RIG 16(1–2):83–86
Teeter E (1987) Techniques and terminology of rope-making in ancient Egypt. J Egypt Archaeol 73(1):71–77
Tyson W (1966) Rope – a history of the hard fibre cordage industry in the United Kingdom. Published for the Hard Fibre Cordage Institute by Wheatland Journals LTD, London
Van de Griend P (1993) Culture-historical aspects and the science behind knots. Privately Published, Århus
Van de Griend P (2006) Handling knots – the Faroe Islands. Knot News 10(58):1–7
Veldmeijer A (2005) “knotless” netting in ancient Egypt: a reappraisal on the basis of archaeologically attested material from berenike and qasr ibrim. Göttinger Miszellen 206:91–102
Veldmeijer A (2006) Knots, archaeologically encountered: a case study of the material from the ptolemaic roman harbour and at berenike (Egyptian red sea coast). Studien zur Altägyptischen Kultur 35:337–366
Veldmeijer A (2009) Cordage production. In: Wendrich W (ed) UCLA Encyclopedia of Egyptology. University of California, Los Angeles, pp 1–9


Veldmeijer A, Bourriau J (2009) The carrier nets from a burial at Qurna. J Egyp Archaeol 95:209–222
Veldmeijer AJ, Zazzaro C, Clapham AJ, Cartwright CR, Hagen F (2008) The “rope cave” at mersa/wadi gawasis. J Am Res Cent Egypt 44:9–39
Vinson S (2013) Boats (use of). In: Wendrich W (ed) UCLA encyclopedia of egyptology. University of California, Los Angeles, pp 1–13
Wahlbeck O (1991) Rep och repslageri under olika tidsåldrar. Samhall Klintland Grafiska, Linköping
Walford T (1748) Specification for an engine or machine for the laying or intermixing of threads, cords or thongs of different kinds, commonly called platting. British Patent Number: 638
Warner C, Bednarik R (1996) Pleistocene knotting. In: Turner J, Van de Griend P (eds) History and science of knots. Series on knots and everything, vol 11. World Scientific Publishing Co. Pte. Ltd., Singapore, pp 3–18
Weber-Partenheimer W (1974a) Braiding machinery. CIBA-GEIGY Rev 4(4):5–9
Weber-Partenheimer W (1974b) Machine braids. CIBA-GEIGY Rev 4(4):10–13
Weber-Partenheimer W (1974c) Patents and skills. CIBA-GEIGY Rev 4(4):14–19
Wendrich W (1996) Ancient Egyptian rope and knots. In: Turner J, Van de Griend P (eds) History and science of knots. Series on knots and everything, vol 11. World Scientific Publishing Co. Pte. Ltd., Singapore, pp 43–68
Zaki NM, Iskander Z, Salah OM, Youssof MA (1960) The cheops boats, part I. General Organisation for Government Printing Offices, Cairo

16 A Survey of Cellular Automata in Fiber Arts

Joshua Holden and Lana Holden

Contents
Introduction
Cellular Automata
Representations of Cellular Automata in Fiber Arts
Sierpiński Triangles and Related Cellular Automata
Other Designs from Well-Known Cellular Automata Rules
Cellular Automata Designs Created for Fiber Arts
Conclusion
Cross-References
References

Abstract Cellular automata provide a natural way of exploring the intersection between mathematics and art, and many fiber arts provide natural ways of depicting cellular automata. We review some of the types of cellular automata, including elementary cellular automata, the Game of Life and “lifelike” cellular automata, and stranded cellular automata. We then provide a survey of many of the different ways that cellular automata have been used in fiber arts. This includes both depictions of well-known patterns produced by cellular automata and also cellular automata that were specifically designed or specifically chosen for their suitability in fiber media.

J. Holden, Rose-Hulman Institute of Technology, Terre Haute, IN, USA
L. Holden, SkewLoose, LLC, Terre Haute, IN, USA
© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_54


Keywords Cellular automata · Fiber arts · Game of Life · Elementary cellular automata · Sierpiński triangle · Knitting · Crochet · Weaving · Cross-stitch · Beading

Introduction Mathematicians in general are intensely creative people and often express themselves in art. Conversely, artists frequently explore the constraints of structure in their work, and mathematics is a common way to do that. The intersection of artistic creativity and mathematical structure can be particularly fruitful when the mathematics is easy to work with and yet produces complex results. At the same time, the artistic medium can be chosen to either support or work against the structure.

Cellular automata were first explored in the 1940s in order to conceptualize the idea of a programmable machine in the simplest possible way. Time and space are modeled as broken into discrete chunks which obey a small set of simple rules. Despite the simplicity of the rules, it can be shown that any calculation that can be done on a computer can be reproduced on a cellular automaton.

A discrete grid of cells can be depicted in any of a vast range of artistic media, but some media have such a grid built in. Knitting, crochet, weaving, braiding, cross-stitch, needlepoint, and bead weaving all have discrete elements in a natural grid. Other fiber arts such as tatting, Chinese knotwork, macramé, and quilting do not have a built-in grid, but they have discrete elements which could be arranged in such a grid. Thus fiber arts provide natural media for presenting depictions of cellular automata.

This chapter starts with a definition of cellular automata and a review of some of the common types of cellular automata, their notations and depictions, and their characteristics. We then survey some representations of cellular automata in fiber arts. We start with cellular automata rules that produce approximations of the well-known Sierpiński triangle fractal, following that with designs produced from other well-known rules. Designs from cellular automata rules that were specifically invented for fiber artwork, or obscure ones that were specifically chosen for that purpose, are the final topic covered.

Cellular Automata A cellular automaton, or CA, is a mathematical construct which models a system evolving in time. This system could be physical, chemical, biological, social, computational, or purely mathematical. A CA is characterized by a discrete set of cells, finite or infinite, in a regular grid, with a finite number of states that each cell can be in. Each cell has a well-defined finite neighborhood which determines how the cell will evolve through the different states. The CA starts in some arbitrary initial configuration at time t = 0. Time moves in discrete steps from that point, and the state of each cell at time t is determined by the states of the cells in its neighborhood at time t − 1. Finally, each cell uses the same rule to determine its state.

Often, but not always, one of the states is designated as a “quiescent,” or “ground,” state. In this case, the rule is often set up so that if a cell is in the ground state at time t − 1 and the entire neighborhood of the cell is also in the ground state, then the cell will remain in the ground state at time t. If the grid of the CA is infinite, it is also common to consider only initial configurations in which all but a finite number of cells are in the ground state. These are known as “finite configurations.” If the grid is finite, on the other hand, the behavior of the CA is somewhat less complex, and it may not be important to have a ground state. A finite grid has only finitely many configurations and each configuration determines the next, so the pigeonhole principle implies that any CA on the grid will eventually repeat. The length of such a repeat may be quite difficult to predict, however (Holden, 2017; Fabienne Serriere, 2018).

The earliest cellular automaton is credited to John von Neumann, with help from Stanislaw Ulam (Schiff, 2008, Sec. 1.1). Von Neumann was interested in the question of whether it was possible to build machines that could replicate themselves, and the structure of cells on a grid seemed like a good way to mathematically conceptualize this. In the early 1950s, von Neumann sketched out a system with an infinite two-dimensional square grid and 29 states, including a ground state. The other states were part of the mathematical “machine,” known as a universal constructor. The neighborhood of a cell is the cell itself along with the four cells which are horizontally or vertically directly adjacent to the given cell, as shown in Fig. 1. This is now known as the von Neumann neighborhood (Schiff, 2008, Ch. 4).
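The general definition and the pigeonhole argument for finite grids can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the grid is a small finite torus (edges wrap around), the neighborhood is the five-cell von Neumann neighborhood, and the example rule (the sum of the five states modulo 2) is chosen here purely for illustration. All function names are hypothetical and do not come from the sources cited in this chapter.

```python
from typing import Callable, Tuple

# A configuration is an immutable tuple of rows so it can be used as a dict key.
Grid = Tuple[Tuple[int, ...], ...]
Rule = Callable[[int, int, int, int, int], int]


def von_neumann_step(grid: Grid, rule: Rule) -> Grid:
    """Apply the same rule to every cell, using the states at the previous time step.

    The neighborhood is the cell itself plus the four horizontally and
    vertically adjacent cells; indices wrap around (an assumed torus).
    """
    rows, cols = len(grid), len(grid[0])
    return tuple(
        tuple(
            rule(
                grid[r][c],               # the cell itself
                grid[(r - 1) % rows][c],  # neighbor above
                grid[(r + 1) % rows][c],  # neighbor below
                grid[r][(c - 1) % cols],  # neighbor to the left
                grid[r][(c + 1) % cols],  # neighbor to the right
            )
            for c in range(cols)
        )
        for r in range(rows)
    )


def parity_rule(center: int, up: int, down: int, left: int, right: int) -> int:
    """Illustrative two-state rule: live (1) if an odd number of the five cells are live."""
    return (center + up + down + left + right) % 2


def find_repeat(initial: Grid, rule: Rule) -> Tuple[int, int]:
    """Return (first_time, period) of the first repeated configuration.

    Termination follows from the pigeonhole principle: a finite grid with
    finitely many states has only finitely many configurations.
    """
    seen = {initial: 0}  # configuration -> time at which it first appeared
    grid, t = initial, 0
    while True:
        grid = von_neumann_step(grid, rule)
        t += 1
        if grid in seen:
            return seen[grid], t - seen[grid]
        seen[grid] = t
```

Calling `find_repeat` on a small grid with a single live cell returns the time at which the repeating part begins and its period. Storing every configuration is only practical for small grids, which is consistent with the remark that the length of the repeat can be hard to predict.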
The 29 states allowed the machine to read and store enough information from a virtual “tape” to turn sets of cells in the ground state into more copies of itself and the tape. The final details were filled in by Arthur Burks in 1966, after von Neumann’s death, although it was not until 2008 that computers became powerful enough to run a full simulation of the universal constructor, which requires over 200,000 cells for the machine and the tape. One of the best known examples of cellular automata is the “Game of Life,” invented by John Conway in 1970 (Schiff, 2008, Sec. 4.1). The grid in this game

Fig. 1 The von Neumann neighborhood

446

J. Holden and L. Holden

is an infinite two-dimensional square grid as in the von Neumann automaton. Each cell now has only two possible states, which are often thought of as “live” and “dead,” reminiscent of a biological system. The neighborhood of a cell is the cell itself along with the eight cells which are horizontally, vertically, or diagonally directly adjacent to the given cell, as shown in Fig. 2. This neighborhood is also known as the Moore neighborhood (Schiff, 2008, Ch. 4). The Game of Life can be summarized in just four instructions, which are listed in Fig. 3. These simple instructions produce extremely complex behavior, including patterns which move and replicate. An example of a “glider” is shown in Fig. 4. In fact, it has been shown that these patterns can be used to replicate any computation that can be done by a digital computer.

Fig. 2 The Moore neighborhood

Any live cell with two or three live neighbors at time t − 1 stays live at time t.
Any other live cell at time t − 1 dies at time t.
Any dead cell with exactly three live neighbors at time t − 1 becomes a live cell at time t.
Any other dead cell at time t − 1 stays dead at time t.

Fig. 3 The Rules for the Game of Life

Fig. 4 A “glider” in the Game of Life

Cellular automata based on the same setup as the Game of Life but varying the number of neighbors required for “birth” and “survival” are often called “lifelike” cellular automata or outer totalistic Moore neighborhood cellular automata. A common notation for these rules consists of a list of the numbers of live neighbors which allow survival, followed by a slash, followed by a list of the numbers of live neighbors which produce a birth. The Game of Life rule is 23/3 in this notation. Other examples include the nine-cell parity rule (Schiff, 2008, Sec. 4.3), notated 02468/1357. In this rule a cell becomes live if it has an odd number of live neighbors, including itself, and dead if it has an even number of live neighbors, including itself. Another example is the “1-of-8” rule (Griswold, 2004a), notated 012345678/1, in which a dead cell becomes live if it has exactly one live neighbor, and live cells always stay live. Rules in which live cells always stay live are sometimes called “solidification” rules.

Lifelike cellular automata can also use the von Neumann neighborhood described earlier. Interesting rules on this neighborhood include the five-cell parity rule (Schiff, 2008, Sec. 4.3), which operates analogously to the nine-cell parity rule, and the voter rule (Griswold, 2004a). In the voter rule, a cell becomes live if it has more than two live neighbors, becomes dead if it has fewer than two live neighbors, and flips its state if it has exactly two live neighbors.

Another well-known set of cellular automata is the set of “elementary cellular automata” (ECAs), popularized by Stephen Wolfram (2002). In this case, the grid is infinite but one-dimensional. Once again there are two possible states for each cell, which are arbitrarily labeled 0 and 1. The neighborhood of a cell is the cell itself plus its two immediate neighbors, one on each side. There are a total of $2^3 = 8$ possible configurations of the entire neighborhood at time t − 1, and a rule needs to specify for each configuration whether the cell at time t is in state 0 or state 1. There are therefore $2^8 = 256$ possible rules, which have been assigned numbers from 0 to 255 based on a system devised by Wolfram. An example is shown in Fig. 5. Wolfram divided these rules into four rough classes based on the complexity of their behavior starting with a random initial configuration of cells (Wolfram, 2002, pp. 231–235; Schiff, 2008, Sec. 3.8).
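The numbering scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the chapter: the names `rule_table` and `step` are hypothetical, and a finite row that wraps around stands in for the infinite grid. Bit k of the rule number gives the new state for the neighborhood whose three cells, read left to right as a binary number, equal k, which is how the bits of Rule 90 in Fig. 5 are labeled.

```python
def rule_table(rule_number):
    """Map each neighborhood (left, center, right) to the new state.

    Bit k of the rule number is the output for the neighborhood whose
    three cells, read as a binary number, equal k (Wolfram's numbering).
    """
    if not 0 <= rule_number <= 255:
        raise ValueError("an ECA rule number must lie between 0 and 255")
    table = {}
    for k in range(8):
        neighborhood = ((k >> 2) & 1, (k >> 1) & 1, k & 1)
        table[neighborhood] = (rule_number >> k) & 1
    return table


def step(cells, table):
    """Advance one time step on a finite row that wraps around.

    The wrap-around is an assumption made so the example can run;
    the text uses an infinite row.
    """
    n = len(cells)
    return [table[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]


if __name__ == "__main__":
    table = rule_table(90)
    row = [0] * 31
    row[15] = 1  # a single cell in state 1
    for _ in range(16):
        # Time runs downward: each printed line is one time step.
        print("".join("#" if c else "." for c in row))
        row = step(row, table)
```

Storing the rule as a lookup table keeps the update step independent of the rule number, so any of the 256 rules can be swapped in by changing the argument to `rule_table`.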
Class 1 rules converge rapidly to a configuration where all cells are the same. Class 2 rules converge rapidly to a stable but nonuniform configuration or a repetitive pattern of such configurations. These first two classes are generally less interesting from an artistic point of view. Rules in Class 3, when started in a random configuration, continue to evolve with apparent randomness. Rule 30, for instance, produces aperiodic patterns which have passed all known statistical tests of randomness. Rules in Class 4 form some areas of stable or repeating configurations but also produce structures which can move, interact, and replicate. For example, it can be shown that Rule 110 is like the Game of Life in that it can be used to replicate any computation. A common way to represent the evolution of elementary cellular automata is with a two-dimensional square grid, where the top row is t = 0 and each succeeding time step is placed underneath its predecessor. State 1 is usually represented with a dark

Fig. 5 ECA Rule 90 (bit 7 = 0, bit 6 = 1, bit 5 = 0, bit 4 = 1, bit 3 = 1, bit 2 = 0, bit 1 = 1, bit 0 = 0)


square and state 0 with a light square. Using this representation, Rule 90 started on a row with a single dark square and run for $2^n$ time steps produces an approximation to the Sierpiński triangle fractal as shown in Fig. 6. This approximation converges to the fractal as n goes to infinity. Several other rules also produce Sierpiński triangles of various sorts, including reversed, doubled, and asymmetric ones.

The firing squad synchronization problem (see Schiff 2008, Sec. 3.11) is related to elementary cellular automata, although it uses more than two states and a finite one-dimensional grid. The goal is to find an automaton which starts with one cell (the “general”) in an “active” state and the others in a quiescent state and ends with all cells reaching a “firing” state. All cells must enter the firing state simultaneously; no cell may reach it before any other. This problem was first proposed by John Myhill in 1957 and first solved by John McCarthy and Marvin Minsky in 1962. Their solution is illustrated in Fig. 7. The shortest possible amount of time required is 2n − 1 steps for n cells; this was first achieved in 1962. The best currently known minimal-time solution to the problem requires six states and was achieved in 1987. It is not known whether a five-state minimal-time solution exists, although it is known that a four-state one does not.

Holden and Holden (2014a, 2016) introduced a type of cellular automaton designed specifically for use in fiber arts. Each cell contains zero, one, or two “strands” which could represent raised ribs of knitted or crocheted cables; fibers or reeds in loom weaving, basket weaving, braiding, or plaiting; or cords used in Chinese knotwork or macramé. The grid of a stranded cellular automaton (SCA) is one-dimensional but finite. The states of each cell represent whether the cell contains no strands, only a strand starting on the left, only a strand starting on the right, or strands starting on both sides.
The strands can be upright or slant from one side of the cell to the other. If there are two slanting strands, they will cross, and we need to specify which strand is on “top.” (The possibility where one strand is slanted and

Fig. 6 An approximation to the Sierpiński fractal produced by Rule 90


Fig. 7 A solution to the firing squad problem with 15 states and 3n time steps. (Wikipedia User Lecanard~enwiki)

Fig. 8 Each cell will store four pieces of information in eight states (columns: no strands, left only, right only, both; rows: upright, slanted)

the other is upright has not yet been considered.) All of this amounts to four pieces of binary information but only eight distinct states, as shown in Fig. 8. Unlike the standard for elementary cellular automata, stranded cellular automata diagrams represent time as moving from the bottom of the picture to the top, in order to make them resemble knitting or crochet patterns. The neighborhood used is a one-dimensional version of the Margolus neighborhood (Schiff, 2008, Sec. 4.2.1). The


Fig. 9 The neighbors of each cell are the two cells which touch it from below

grid shifts back and forth by one-half the width of a cell at each time step, giving the two-dimensional diagram a “brick wall” appearance, as shown in Fig. 9. The neighbors of a cell at time t are the two cells which overlapped its space at time t − 1. In addition, the cells at the far left and far right of the finite grid are considered as if they were next to each other. (In the cellular automata literature, this is known as “periodic boundary conditions.” See, for example, Schiff (2008, Sec. 3.4).) Therefore the state of a given cell depends on the state of the two cells which touch it from below in the figure, keeping in mind that the grid “wraps around” at the edges into a cylinder. For physical representations, one could “cut” the cylinder in a convenient place and unroll it or use it in a tubular context such as a sock or the sleeve of a garment.

In theory, such a cellular automaton could use any of the $8^{8 \cdot 8} \approx 6 \cdot 10^{57}$ rules which map the states of the two neighbor cells to the state of the new cell. However, the stranded cellular automata system restricts this rule set for aesthetic and practical reasons. First, it specifies that if the left neighbor has a strand ending on the right, the new state will have a strand starting on the left, and similarly for the right neighbor, in order to preserve the continuity of the strands. Figure 10 shows examples of this. This still leaves $2^{3 \cdot 5} \, 2^{5 \cdot 3} \, 3^{5 \cdot 5} \approx 9 \cdot 10^{20}$ possible rules, which is a very large rule set to explore. In order to provide a practical starting point for exploration, the system is broken into two simpler cellular automata controlling different aspects of the state. The first cellular automaton controls whether strands are upright or slanted, based on whether the strands in the neighbor cells are upright, slanted, or absent.
This can be thought of as a function which takes nine possibilities as input (three for each neighbor) and for each possible input chooses one of two possibilities as output (upright or slanted). This gives $2^9 = 512$ possible functions. These functions are coded using binary numbers as a “Turning Rule,” similar to the coding for elementary cellular automata mentioned earlier. Figure 11 shows Turning Rule 39, which in binary is 000100111. (Note that in fact the value of bit 4 is irrelevant, since it always controls an empty cell and thus does not affect the final output.) The second cellular automaton controls which strand is on top if the strands cross. Each neighbor again is considered to have three possibilities: the strand going

Fig. 10 The conditions controlling whether strands are present or not (panels: no left, left, no right, right)

Fig. 11 Turning Rule 39 (bit 8 = 0, bit 7 = 0, bit 6 = 0, bit 5 = 1, bit 4 = 0, bit 3 = 0, bit 2 = 1, bit 1 = 1, bit 0 = 1)

toward the new cell is on top, the strand going toward the new cell is on the bottom, or strands do not cross. (In the last case either there is only one strand or the two strands are upright.) Again, there are two possible outputs and $2^9 = 512$ rules, coded in binary as “Crossing Rules.” Figure 12 shows Crossing Rule 39.


Fig. 12 Crossing Rule 39 (bit 8 = 0, bit 7 = 0, bit 6 = 0, bit 5 = 1, bit 4 = 0, bit 3 = 0, bit 2 = 1, bit 1 = 1, bit 0 = 1)
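The rule counts quoted for stranded cellular automata can be checked by brute-force enumeration. The sketch below is an illustration under stated assumptions rather than the authors' code: the tuple encoding in `STATES` and the helpers `ends_on_right`, `ends_on_left`, and `allowed_outputs` are hypothetical names, and the reading of which side a strand leaves a cell (an upright strand stays on its own side, a slanted strand moves to the other side) is inferred from the description of the eight states.

```python
from itertools import product

# Hypothetical encoding of the eight cell states: (left, right, crossing),
# where left/right is None (no strand), "upright", or "slanted", and
# crossing names the strand on top when two slanted strands cross.
STATES = [
    (None, None, None),
    ("upright", None, None),
    ("slanted", None, None),
    (None, "upright", None),
    (None, "slanted", None),
    ("upright", "upright", None),
    ("slanted", "slanted", "left on top"),
    ("slanted", "slanted", "right on top"),
]


def ends_on_right(state):
    """True if some strand leaves the cell on its right-hand side."""
    left, right, _ = state
    return left == "slanted" or right == "upright"


def ends_on_left(state):
    """True if some strand leaves the cell on its left-hand side."""
    left, right, _ = state
    return left == "upright" or right == "slanted"


def allowed_outputs(left_neighbor, right_neighbor):
    """Count the new-cell states compatible with strand continuity."""
    has_left = ends_on_right(left_neighbor)   # a strand enters on the left
    has_right = ends_on_left(right_neighbor)  # a strand enters on the right
    return sum(1 for (l, r, _) in STATES
               if (l is not None) == has_left and (r is not None) == has_right)


# Every pair of neighbor states may map to any of the eight states.
unrestricted = len(STATES) ** (len(STATES) ** 2)

# With continuity enforced, multiply the choices left for each pair.
restricted = 1
for left_neighbor, right_neighbor in product(STATES, repeat=2):
    restricted *= allowed_outputs(left_neighbor, right_neighbor)

if __name__ == "__main__":
    print(f"unrestricted rules: {unrestricted:.1e}")
    print(f"continuity-preserving rules: {restricted:.1e}")
    print(f"turning rules: {2 ** 9}; Turning Rule 39 in binary: {39:09b}")
```

Under this encoding five of the eight states send a strand toward a given side and three do not, which is where the exponents 3·5, 5·3, and 5·5 come from: a new cell with one incoming strand has two possible states, and one with two incoming strands has three.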

Representations of Cellular Automata in Fiber Arts

Cellular automata have been explicitly used in fiber arts since at least Debbie New’s article (New, 1997) in Knitter’s Magazine. Many fiber art media, such as knitting, crochet, loom weaving, cross-stitch, and needlepoint, have a natural rectangular grid. The different states of cells on this grid can be represented by different colors of fiber or different types or patterns of stitches. This situation lends itself particularly well to Game of Life, elementary, and stranded cellular automata. The use of cellular automata in art in general can be divided into two broad categories: artistic representations of designs generated by well-known cellular automata and designs that are generated by cellular automata specifically created for that purpose. Of course there is overlap between the two categories.

Sierpiński Triangles and Related Cellular Automata

The best-known design element produced by cellular automata is the Sierpiński triangle approximation. As was mentioned earlier, several cellular automaton rules produce these approximations, notably including ECA Rule 90. Since Sierpiński triangles have been used in art for centuries, before either fractals or cellular automata had been recognized as mathematical objects, it is not always straightforward to determine whether a design should be classified as being from a cellular automaton.


However, cellular automata have been specifically mentioned by several authors and designers in the context of Sierpiński triangles depicted in fiber arts. We discuss in addition some patterns for Sierpiński triangles which don’t specifically mention cellular automata but are constructed in such a way that a cell structure is visible in (the approximation to) the fractal.

Debbie New is often credited with popularizing the use of cellular automata in fiber arts. Her knitting patterns in New (1997) use elementary cellular automaton Rule 22. This is one of the rules that generate Sierpiński triangles, although her examples do not confine themselves to this design. She suggests several possible ways to differentiate the two states, including different colors, different stitches such as knit and purl, four-by-four blocks of stitches producing cables which are either twisted or untwisted, and lace patterns with blocks of stitches that do or do not have a hole. The cable instructions include a suggested method for determining which way the cables cross. This can be seen as a precursor to the stranded cellular automata system of Holden and Holden.

Norah Gaughan also gives a well-known pattern for knitting a Sierpiński triangle scarf (Fig. 13) in her chapter on knitting fractals (Gaughan, 2012, Ch. 5), although she doesn’t specifically mention cellular automata. In her pattern the two states of the cellular automaton are represented by blocks of knit stitches and lace eyelets.

Crocheted Sierpiński triangles are also popular. Wildstrom (2007) credits the first such design to Mary Pat Campbell (2002). Her design (Fig. 14) is based on a staggered rectangular grid of double crochet stitches, with filled and open grid spaces representing the two states. Wildstrom’s own design (Fig. 15) is based on a

Fig. 13 Sierpiński triangle scarf designed by Norah Gaughan, knit by Bonnie Sennott


Fig. 14 Sierpiński triangle, designed by Mary Pat Campbell, crocheted by Julia Collins

Fig. 15 Sierpiński triangle, designed and crocheted by Jake Wildstrom, photograph by sarah-marie belcastro

diamond-shaped mesh of chain stitches in which some diamonds are filled by a shell stitch. He also gives quite a bit of detail about the mathematical relation between Sierpiński triangles and cellular automata.

Ted Ashton (2011) expands on Wildstrom’s work to give instructions for creating Sierpiński triangles in five other fiber art media. Of these, tatting and cross-stitch clearly show the cell structure of a generating cellular automaton rule. In fact, Ashton describes several cross-stitched variations based on different rules and starting configurations. The states are represented by colored stitches on a white background (Fig. 16). Tatting is a particularly interesting case because it is built around modular structures. In Ashton’s work (Fig. 17) a single tatted ring is used to represent one of the states. The other state is represented simply by the absence


Fig. 16 Sierpiński triangle, designed and cross-stitched by Ted Ashton

Fig. 17 Sierpiński triangle, designed and tatted by Ted Ashton

of a ring. Gwen Fisher (2016) used a similar idea to create Sierpiński triangles (Fig. 18) from angle-woven beads, where the states are indicated by the presence or absence of a ring of beads. (In fact, Fisher is currently writing a book (Fisher, 2020, Forthcoming) specifically on cellular automata and bead weaving.) The same idea could easily be adapted to other fiber arts with a modular structure, such as modular crochet.

Andrew Kieran (2013) used a computer-controlled loom to produce a Jacquard-woven Rule 90 pattern which was not a Sierpiński triangle. He represented the states by 3/1 warp-faced twill and 1/3 weft-faced twill, avoiding the problem of loose threads on the fabric. All of the threads were the same color, resulting in a fabric (Fig. 19) where cells in different states were visually distinguishable by the reflective characteristics of the different twills.

Elaine Ellison (2009; 2018, p. 4 and p. 7) has made quilts, including “Sierpinski’s triangle” (Fig. 20) and “Pascal’s Pumpkin,” based on the Sierpiński triangle and


Fig. 18 Sierpiński tetrahedron, designed and beaded by Gwen Fisher

Fig. 19 Rule 90 pattern, Jacquard-woven by Andrew Kieran

related patterns. While the cell structure is not clearly visible in the construction of these works, Ellison has indicated that she was aware of the connection to cellular automata (Ellison, Personal communication, Email from 11 Aug 2018).

It should be noted that even before elementary cellular automata had been classified, J.C.P. Miller (1970) used a form of needlepoint tapestry (i.e., needlepoint embroidery, not tapestry weaving) to illustrate his analysis of a dynamical system that was later seen to be equivalent to the Rule 90 ECA. Oddly enough, Miller does not seem to have connected his system to the Sierpiński triangle or produced one, despite his work having been inspired by a problem found in one of Sierpiński’s books. In Miller’s needlepoint (Miller, 1970, Figs. 48 and 49), different colors are used to indicate the two states of the cellular automaton.
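The link between Rule 90 and the Sierpiński triangle discussed in this section can be illustrated with a short Python sketch. It relies on two standard facts that the text above does not spell out: under Rule 90 a cell's new state is the exclusive-or (XOR) of its two neighbors, and the rows produced from a single starting cell are the rows of Pascal's triangle reduced modulo 2. The function names `rule90_rows`, `pascal_mod2_row`, and `render` are hypothetical.

```python
from math import comb


def rule90_rows(steps):
    """Rows of Rule 90 started from a single 1, one Python integer per row.

    Each integer holds a row of cells, one bit per cell. A new cell is
    the XOR of its two old neighbors, so the whole row is updated by
    XOR-ing two shifted copies of itself. Row t is stored in 2t + 1 bits,
    so the copies are the row shifted left by two places and the row
    left where it is.
    """
    row = 1
    rows = [row]
    for _ in range(steps):
        row = (row << 2) ^ row
        rows.append(row)
    return rows


def pascal_mod2_row(t):
    """Row t of Pascal's triangle mod 2, with a blank between entries."""
    row = 0
    for j in range(t + 1):
        row |= (comb(t, j) % 2) << (2 * j)
    return row


def render(rows):
    """Draw the rows as a centered triangle of '#' and '.' characters."""
    width = 2 * (len(rows) - 1) + 1
    return "\n".join(
        format(row, "b").replace("0", ".").replace("1", "#").center(width).rstrip()
        for row in rows
    )


if __name__ == "__main__":
    rows = rule90_rows(15)
    print(render(rows))
    print(all(row == pascal_mod2_row(t) for t, row in enumerate(rows)))
```

Because each row depends only on the row beneath the needles, so to speak, the same XOR comparison could be carried out stitch by stitch without a computer.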


Fig. 20 “Sierpinski’s triangle,” designed and quilted by Elaine Ellison

Fig. 21 Rule 110 tea cosy, designed and knit by Camilla Fox

Other Designs from Well-Known Cellular Automata Rules

In addition to the rules that produce Sierpiński triangles, some other elementary cellular automata rules are also well known, notably Rules 30 and 110 mentioned earlier. Rules 73 and 54 are also notable for being on the borders of Wolfram’s complexity classes described earlier. Rule 73 will devolve into a Class 2 repetitive configuration starting from most initial conditions but exhibits random-looking Class 3 behavior starting from others (Wolfram, 2002, p. 699). Rule 54 appears to be Class 4, and in fact appears to have the potential to replicate any computation, but no one has been able to prove definitively that such computations are possible (Wolfram, 2002, p. 697).

Camilla Fox (2008) is known on the Internet for her two-color knitted tea cozies (Fig. 21) based on Rules 30, 110, and 109. Julia Collins (2013) has also designed a pattern (Fig. 22) based on Rule 30. Fabienne Serriere has produced two-color machine-knit scarves using Rule 110 (Fig. 23) and Rule 73 (Lamb, 2015). Jer Thorp


Fig. 22 Knitting pattern based on Rule 30, designed and knit by Julia Collins

Fig. 23 Rule 110 scarf, designed and machine-knit by Fabienne Serriere

(2012) and Diane Thorp have collaborated on handwoven textiles based on Rule 30 and also on the von Neumann neighborhood voter rule (Fig. 24).

Conway’s Game of Life has also been used to generate knitting patterns. A common practice seems to be starting with a pattern based on someone’s initials and letting it evolve for one or more time steps. Debbie Chachra knit such a sweater (Fig. 25) based on a design by Clive Thompson (2006).

David Gross (2013) cross-stitched a design (Fig. 26) based on a 16-state one-dimensional cellular automaton that solves the firing squad synchronization problem. The states were indicated by different colors, although not all of the colors were unique.
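The practice of seeding the Game of Life with a small pattern and letting it evolve can be sketched as follows. This is a minimal illustration, assuming a finite grid that wraps around at the edges so that it fits on a garment-sized chart; the names `life_step` and `find_cycle` and the keyword parameters `birth` and `survival` are hypothetical. The defaults are the Game of Life rules of Fig. 3, and other lifelike rules can be tried by passing different neighbor counts.

```python
def life_step(live, width, height, birth=(3,), survival=(2, 3)):
    """One time step of a lifelike rule on a grid that wraps around.

    `live` is a set of (row, column) pairs for the live cells. Neighbors
    are the eight surrounding cells (Moore neighborhood, cell excluded).
    Grids narrower than three cells would count a neighbor twice.
    """
    new_live = set()
    for r in range(height):
        for c in range(width):
            n = sum(((r + dr) % height, (c + dc) % width) in live
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            if n in (survival if (r, c) in live else birth):
                new_live.add((r, c))
    return new_live


def find_cycle(live, width, height, max_steps=10_000, **rule):
    """Return (first_time, period) of the first repeated configuration.

    On a finite grid a repeat must occur eventually; None is returned
    only if it does not occur within max_steps.
    """
    seen = {}
    current = frozenset(live)
    for t in range(max_steps + 1):
        if current in seen:
            return seen[current], t - seen[current]
        seen[current] = t
        current = frozenset(life_step(current, width, height, **rule))
    return None


if __name__ == "__main__":
    glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
    state = glider
    for t in range(5):
        print(f"t = {t}")
        for r in range(8):
            print("".join("#" if (r, c) in state else "." for c in range(8)))
        state = life_step(state, 8, 8)
    print(find_cycle(glider, 8, 8))
```

Scanning every cell of the grid is slower than tracking only the live cells, but it keeps the sketch correct for rules in which a cell with no live neighbors survives or is born.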

Fig. 24 Von Neumann neighborhood voter rule fabric, designed by Jer Thorp and handwoven by Diane Thorp

Fig. 25 Game of Life baby sweater, designed by Clive Thompson and knit by Debbie Chachra


Fig. 26 iPhone case based on the firing squad synchronization problem, designed and cross-stitched by David Gross

Cellular Automata Designs Created for Fiber Arts

Fiber artists have also designed new cellular automata, or chosen obscure ones, especially for their suitability in specific media. Even before the work mentioned earlier, Mary Pat Campbell (2002) designed a cross-stitch using a lifelike cellular automaton with a solidification rule, specifically chosen because it was easier to replicate on graph paper for her cross-stitch design process. She called this the “kudzu automaton” for its growth properties. The live states appear to have been represented by the presence of cross-stitches and the dead states by their absence. The pattern, like Clive Thompson’s, used someone’s initials as the starting point.

In New (2003, Sec. 7), Debbie New expanded her work mentioned earlier to include three other elementary cellular automaton rules, numbers 126, 60, and a shifted version of Rule 22. While it turns out that all of these rules also can be used to generate Sierpiński triangles, New emphasizes the fact that all four of the rules can be easily described in knitting terms. If the states are represented by different colors or different stitches, then only a few stitches on or below the tips of the needles need to be compared in order to determine the new state.

Similarly, Ralph Griswold (2004b) investigated a large number of lifelike cellular automaton rules in order to find ones which produced a sequence of patterns he considered attractive and interesting when used with periodic boundary conditions.


Fig. 27 Cellular automaton Möbius scarf, designed and machine-knit by Elisabetta Matsumoto, Henry Segerman, and Fabienne Serriere

He specifically looked for rules which generate long sequences of patterns as time progresses before repeating. (Recall that any cellular automaton restricted to a finite grid will eventually repeat.) He also looked for rules which tended to maintain the balance of live and dead cells over time, which eliminated the standard Game of Life. Griswold settled on four rules which fit his criteria: the five-cell and nine-cell parity rules and lifelike rules 123/123 and 234/234. (Note that the source has two typos: rule 123/123 is called /123 and 234/234 is called /234.) He used these rules to produce a collection of 761 weaving patterns called the Fancy Twill Variations (Griswold, 2005). Elisabetta Matsumoto, Henry Segerman, and Fabienne Serriere (2018) considered double-knitting two-color scarves with elementary cellular automaton patterns on Möbius strips of finite width. The double-knitting technique produces a fabric where the second side is the mirror image of the first side except that the colors (and therefore the states of the cells) are switched. In addition, since the scarf is in the form of a Möbius strip, it was desired that the two sides be able to be joined such that the pattern continues through the half twist. The authors determined that under periodic boundary conditions, there were exactly 16 ECA rules satisfying the desired properties. Two of these, Rules 105 and 150, were deemed interesting enough to knit. Each of these was knit on a Stoll CMS 530 HP 7.2 multigauge industrial flat knitting machine, using a width of 114 and a repeat length of 1022 in order to get scarves of appropriate proportions (Fig. 27). More drastic innovations in cellular automata have also been made in order to work well with fiber arts. As mentioned earlier, the stranded cellular automata system was specifically invented to model artistic depictions of knotwork, including knitted and crocheted cables and woven textiles as well as actual knots. 
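One plausible reading of the double-knitting condition described above is that the rule must give the same result whether it is applied before or after mirroring the row and switching the two colors. The Python sketch below enumerates the ECA rules with that property. Whether this is exactly the criterion used by Matsumoto, Segerman, and Serriere is an assumption, and the sketch does not model the half twist or the finite width of the strip. The names `rule_output` and `commutes_with_mirror_and_swap` are hypothetical.

```python
def rule_output(rule_number, left, center, right):
    """New state of a cell under Wolfram's numbering of ECA rules."""
    return (rule_number >> (4 * left + 2 * center + right)) & 1


def commutes_with_mirror_and_swap(rule_number):
    """Check that mirroring and color-switching commute with the rule.

    Mirroring reverses the neighborhood (a, b, c) to (c, b, a), and
    switching colors replaces each state s by 1 - s. The rule commutes
    with the combined operation when applying it to the transformed
    neighborhood gives the switched version of the original output.
    """
    for k in range(8):
        a, b, c = (k >> 2) & 1, (k >> 1) & 1, k & 1
        transformed = rule_output(rule_number, 1 - c, 1 - b, 1 - a)
        if transformed != 1 - rule_output(rule_number, a, b, c):
            return False
    return True


symmetric_rules = [r for r in range(256) if commutes_with_mirror_and_swap(r)]

if __name__ == "__main__":
    print(len(symmetric_rules), symmetric_rules)
```

Under this reading the eight neighborhoods fall into four pairs exchanged by the mirror-and-switch operation, so a rule is fixed by its output on one member of each pair. That leaves 16 rules, which agrees with the count quoted above, and both Rule 105 and Rule 150 are among them.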
Lana Holden used stranded cellular automata patterns to design cable-knit socks (Fig. 28)

Fig. 28 “Automata socks,” designed and knit by Lana Holden

Fig. 29 Cowl based on SCA Rules 201 and 39, designed and knit by Lana Holden


Fig. 30 Puppytooth pattern

Fig. 31 Fabric woven from the cellular automata puppytooth pattern

and cowls (Fig. 29), including the commercially available “automata socks” pattern (Holden, 2014b; Holden and Holden, 2016). Loe Feijs and Marina Toeters (2017) were interested in the weaving pattern known as houndstooth and its simpler version, puppytooth. An example of puppytooth is shown in Fig. 30. In order to produce the puppytooth pattern, they developed a one-dimensional cellular automaton with the same neighborhood as ECAs. They proved that no cellular automaton with only two states was suitable, even if the neighborhood was enlarged. Therefore they created an automaton with five states: two light-colored (red and green), two dark-colored (red and green), and a quiescent state (white). When the rule is started on a row with a single dark red square, the dark and light states produce the puppytooth pattern. Feijs and Toeters produced woven fabric starting with a row sparsely seeded with dark red and dark green squares, producing a semi-random effect that is still reminiscent of puppytooth. Garments


with this pattern were created from fabric woven by a specialty fabric business in the Netherlands (Fig. 31).

Conclusion

Many artists have found inspiration from the mathematical concept of cellular automata. The relatively simple rules of a cellular automaton are easy to understand and work with, yet the structure they provide leads to complex patterns, many of which are aesthetically pleasing. The discrete nature of cellular automata makes them especially appealing to artists who work with fiber. This medium has an essentially discrete nature in itself – fiber generally comes in discrete strands which are separate entities. Thus the range of fiber artwork based on cellular automata shown here should come as no surprise. We hope this chapter has shown the reader a variety of useful and pleasing combinations of this mathematical concept with this artistic medium.

It should be noted that much of this work was done by art hobbyists outside the realm of galleries and professional art shows. (Much of it was done by math hobbyists as well!) The use of social media has greatly increased the dissemination of this sort of hobby work, but there is still surely much out there which is documented poorly or not at all. We therefore also hope that this chapter spreads the awareness of mathematical fiber arts, especially that related to cellular automata, and encourages others to continue both producing and disseminating this work.

Cross-References

Fractal Geometry in Architecture
Mathematical Design for Knotted Textiles
Parametric Design: Theoretical Development and Algorithmic Foundation for Design Generation in Architecture
Projections of Knots and Links
Shape Grammars: A Key Generative Design Algorithm

References

Ashton T (2011) Fashioning fine fractals from fiber. In: belcastro s-m, Yackel C (eds) Crafting by concepts, Chap 4. A K Peters/CRC Press, Natick, pp 58–86
Campbell MP (2002) Fractal crochet. http://www.marypat.org/stuff/nylife/020325.html. Accessed 10 Aug 2018
Collins J (2013) Chaotic knitting. https://botanicamathematica.wordpress.com/2013/03/18/chaotic-knitting/. Accessed 13 Jan 2019
Ellison EF (2009) Mathematical classroom quilts. In: Proceedings of Bridges 2009: mathematics, music, art, architecture, culture. Tarquin Publications, pp 341–342
Ellison EK (2018) Mathematical quilts. http://mathematicalquilts.com. Accessed 6 Oct 2018
Feijs L, Toeters M (2017) A cellular automaton for pied-de-poule (Houndstooth). In: Proceedings of Bridges 2017: mathematics, music, art, architecture, education, culture. Tessellations Publishing, pp 403–406
Fisher G (2016) Free beading pattern – sierpinski triangle. https://gwenbeads.blogspot.com/2016/10/free-beading-pattern-sierpinski-triangle.html. Accessed 18 Aug 2018
Fisher G (2020, Forthcoming) Bead weaving with algorithms. World Scientific Publishers
Fox C (2008) Cellular automata tea cozy – instructions. http://web.mit.edu/cfox/www/knitting/b.html. Accessed 10 Aug 2018
Gaughan N (2012) Knitting nature: 39 designs inspired by patterns in nature. Stewart, Tabori and Chang, New York/London
Griswold RE (2005) Cellular automata, part 1. Webside 10:3–4
Griswold RE (2004a) Drawdown automata, part 1: basic concepts, 8p. https://www2.cs.arizona.edu/patterns/weaving/webdocs/gre_dda1.pdf. Accessed 27 Oct 2018
Griswold RE (2004b) Drawdown automata, part 4: a few good rules, 4p. https://www2.cs.arizona.edu/patterns/weaving/webdocs/gre_dda4.pdf. Accessed 27 Oct 2018
Gross D (2013) Cellular automaton design for cross-stitch IPhone case. http://www.instructables.com/id/Cellular-Automaton-Design-for-Cross-Stitch-iPhone-/. Accessed 10 Aug 2018
Holden L (2014a) Knit stranded cellular automata. Sockupied, 10 Spring 2014
Holden L (2014b) Automata socks. Sockupied, 11 Spring 2014
Holden J (2017) The complexity of braids, cables, and weaves modeled with stranded cellular automata. In: Proceedings of Bridges 2017: mathematics, music, art, architecture, education, culture. Tessellations Publishing, pp 463–466
Holden J, Holden L (2016) Modeling braids, cables, and weaves with stranded cellular automata. In: Proceedings of Bridges 2016: mathematics, music, art, architecture, education, culture. Tessellations Publishing, pp 127–134
Kieran A (2013) Weaving Wolfram rule 90. https://edinburghhacklab.com/2013/12/weaving-wolfram-rule-90/. Accessed 18 Aug 2018
Lamb E (2015) Help make wearable cellular automata a thing. https://blogs.scientificamerican.com/roots-of-unity/help-make-wearable-cellular-automata-a-thing/. Accessed 10 Aug 2018
Matsumoto E, Segerman H, Serriere F (2018) Möbius cellular automata scarves. In: Proceedings of Bridges 2018: mathematics, music, art, architecture, education, culture. Tessellations Publishing, pp 523–526
Miller J (1970) Periodic forests of stunted trees. Phil Trans R Soc Lond A 266(1172):63–111
New D (1997) Cellular automaton knitting. Knitter’s Mag 49:82–83
New D (2003) Unexpected knitting. Schoolhouse Press, Stevens Point
Schiff JL (2008) Cellular automata: a discrete view of the world, 1st edn. Wiley, Hoboken
Thompson C (2006) A baby sweater generated by Conway’s “Game of Life”!. http://www.collisiondetection.net/mt/archives/2006/04/_as_a_principle.php. Accessed 13 Aug 2018
Thorp J (2012) Infinite weft (Exploring the old aesthetic). http://blog.blprnt.com/blog/blprnt/infinite-weft-exploring-the-old-aesthetic. Accessed 27 Oct 2018
Wildstrom DJ (2007) The sierpinski variations: self-similar crochet. In: belcastro s-m, Yackel C (eds) Making mathematics with needlework: ten papers and ten projects, Chap 3. A K Peters/CRC Press, Wellesley, pp 40–52
Wolfram S (2002) A new kind of science. Wolfram Media, Champaign

Mathematics and Art: Connecting Mathematicians and Artists

17

Joseph Malkevitch

Contents
Introduction
Mathematical Tools for Artists
Symmetry
Asymmetry
Mathematical Artists and Artist Mathematicians
Geometrical Art
Polyhedra, Tilings, and Dissections
Origami
Bridging the World of Art and Mathematics
End Notes
References

Abstract

Mathematics and art interact in surprisingly many different ways. These include mutual goals of seeking insights into the world we live in, the “human condition,” and creating beautiful works. Mathematics provides tools for the generation of art (e.g., perspective) and concepts that inspire artists. Many mathematicians choose to express themselves through art in addition to proving theorems. The interaction of mathematics and art has been beneficial for both subjects.

J. Malkevitch () York College (CUNY), New York, NY, USA e-mail: [email protected] © Springer Nature Switzerland AG 2021 B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_83


J. Malkevitch

Keywords Art · Beauty · Convex · Euclidean geometry · Fractal · Frieze · Geometry · Group · Hyperbolic geometry · Mathematics · Pattern · Perspective · Polygon · Polyhedra · Projective geometry · Regular polygon · Symmetry

Introduction Wild animals do not make art. However, humans may admire a bird’s nest as being a “work of art,” and we may find the patterns in the snow made by a squirrel or rabbit pleasing. (1) The study of patterns is sometimes given as a short definition of the content of mathematics. Appreciating art often involves seeing the patterns within a picture and the patterns that connect an artist’s output over time. Similarly, it is not clear if the sounds that whales make, sometimes treated as “musical art,” qualify as art in the same way that a Mozart opera or a Mahler symphony does. Yet bird songs are indeed a form of communication for birds, as giraffe vocalizations may be for giraffes, though the meaning of giraffe sounds is unclear to us. Certainly “art” is a form of communication for humans. What constitutes art is a very complex and hotly debated subject. When Jackson Pollock first experimented with expressing himself by flinging paint at a canvas, many saw his activity as a form of self-indulgence rather than artistic expression. Many see art as the pursuit of the beautiful and a way to express emotional truth as “seen” by the artist. Beauty and truth are also commonly mentioned by mathematicians as what attracted them to mathematics and continues to shape the reason they pursue their mathematical endeavors. My purpose here is to call attention to the surprisingly rich association between mathematics and narrow sections of the arts – the visual arts (design) and architecture as shown in paintings/sculpture. One mathematical connection with art is that some individuals known as artists have needed to develop or use mathematical thinking to carry out their artistic vision. Among such artists were Luca Pacioli (c. 1447–1517), Leonardo da Vinci (1452–1519), Albrecht Dürer (1471–1528), and M.C. Escher (1898–1972). Another connection is that some mathematicians have become artists while often at the same time pursuing their research in mathematics. 
Mathematicians commonly talk about beautiful theorems and beautiful proofs of theorems. They also often have emotional reactions to proofs or theorems. There are nifty proofs of “dull” mathematical facts and “unsatisfying” proofs of “nifty” theorems. Artists and art critics also talk about beauty. Does art have to be beautiful? Remember that Francis Bacon’s paintings may or may not be beautiful to everyone, yet there are few people who have no reaction to his work. Art is concerned with communication of emotions as well as beauty. Some people may see little emotional content in many of M.C. Escher’s prints, but it’s hard not to be “impressed” by the patterns he created, though familiarity with what he did makes others who now do similar things appear to be “copycats.” Some find Escher’s prints beautiful but

17 Mathematics and Art: Connecting Mathematicians and Artists


with a different beauty from the great works of Rembrandt. Like art itself, the issues of beauty, communication, and emotions are complex subjects, but then so is mathematics.

Mathematical Tools for Artists Art is born of the attempt by humans to express themselves about the experience of life. Art can take the form of writing, music, painting, architecture, and sculpture, as well as a variety of other forms of expression. There is art in the combining of function and aesthetics in everyday objects such as plates, cutlery, and lamps. Mathematicians have been able to assist artists by creating “tools” of various kinds for them. Such tools sometimes consist of theorems which show the limitations of what artists can do. One cannot hope to represent more than five regular (faces are regular convex polygons) convex polyhedra in Euclidean three-dimensional space because mathematics shows there are only five such regular or Platonic solids. Dodecahedra (solids with 12 faces) can be used to put 1 month of a calendar on each face, but the number of faces of a regular convex polyhedron in Euclidean three-dimensional space is 4, 6, 8, 12, or 20. There can be no others. Yet there is another convex symmetrical solid, the rhombic dodecahedron (12 congruent rhombus faces), that can be used for displaying a year’s worth of monthly calendars. While the regular dodecahedron has 120 symmetries, the rhombic dodecahedron has only 48 (Fig. 1).

Fig. 1 Two symmetrical dodecahedra. For both of these polyhedra, when they sit on a plane, there is another face which sits in a parallel plane. This means one can display the 12 months of a year on the faces of these solids, though one almost never sees this done with the rhombic dodecahedron on the left. (Courtesy of Wikipedia)
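The statement that a regular convex polyhedron can only have 4, 6, 8, 12, or 20 faces can be checked with a short computation. The sketch below is my own illustration rather than part of the original argument. It assumes the standard counting approach in which p is the number of sides of each face and q is the number of faces meeting at each vertex (the names p, q, and platonic_solids are mine), and it uses Euler's formula V - E + F = 2 to recover the vertex, edge, and face counts.

```python
from fractions import Fraction


def platonic_solids(max_sides=12):
    """Return (p, q, V, E, F) for every regular convex polyhedron.

    p = sides per face, q = faces meeting at each vertex.
    """
    solids = []
    for p in range(3, max_sides + 1):
        for q in range(3, max_sides + 1):
            # The q face angles at a vertex must total less than 360 degrees,
            # which is equivalent to 1/p + 1/q > 1/2.
            if Fraction(1, p) + Fraction(1, q) > Fraction(1, 2):
                # From V - E + F = 2 with p*F = 2*E and q*V = 2*E.
                edges = Fraction(2 * p * q, 2 * p + 2 * q - p * q)
                faces = 2 * edges / p
                vertices = 2 * edges / q
                solids.append((p, q, int(vertices), int(edges), int(faces)))
    return solids


if __name__ == "__main__":
    for p, q, v, e, f in platonic_solids():
        print(f"faces with {p} sides, {q} at each vertex: V={v}, E={e}, F={f}")
```

Only five pairs (p, q) pass the angle test, and their face counts are exactly 4, 6, 8, 12, and 20. Raising max_sides does not produce any further solutions, since the inequality already fails for every p or q of 6 or more.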


A much more important issue is the realism with which artists can draw on a flat piece of paper what they perceive when they look out at their three-dimensional world. If one looks at attempts at scene representation in ancient Egyptian and Mesopotamian art, one sees that phenomena associated with the human vision system are not always respected. We are all familiar with the fact that objects far away from us appear smaller than they actually are and that lines which are parallel appear to converge in the distance. This phenomenon is familiar to anyone who sees a straight section of railroad tracks receding into the distance. These features, which are a standard part of the way three-dimensional objects are now usually represented on a planar surface, were not fully understood before the Renaissance. It is common to refer to artists as using “perspective” (or “linear perspective”) to increase the realism of their representations. The issues and ideas involved in understanding perspective are quite subtle and evolved over a long time. There has been an interaction between scholars and practitioners with regard to ideas about “perspective” which parallels the interactions between theory and application that go on in all the arenas where mathematical ideas are put to work. An artist may want to solve a problem better than he or she did in the past and will not always be concerned with the niceties of proving that the technique being used always works or has the properties the artist wants. An analogy from a more modern situation is that if the current system used to route email packets takes on average 7.2 units of time and one discovers a way of doing the routing in 6.5 units of time on average, one may not worry that one can prove the very best system might do the job in 6.487 units of time even if no one has found such a system yet. 
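The railroad-track phenomenon can be reproduced with a few lines of arithmetic. The following is a minimal sketch under assumptions of my own: a “single point eye” sits at the origin looking along the z-axis, the picture plane is at distance d in front of it, and the two rails run parallel to the viewing direction one unit below eye level. The function names project and rail_gaps are hypothetical.

```python
def project(point, d=1.0):
    """Central projection of a 3D point onto the picture plane z = d."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the eye (z > 0)")
    return (d * x / z, d * y / z)


def rail_gaps(depths, half_width=1.0, height=-1.0):
    """Project two parallel rails and return their images and apparent gaps."""
    left = [project((-half_width, height, z)) for z in depths]
    right = [project((half_width, height, z)) for z in depths]
    gaps = [r[0] - l[0] for l, r in zip(left, right)]
    return left, right, gaps


if __name__ == "__main__":
    left, right, gaps = rail_gaps([1, 2, 4, 8, 1000])
    for l, r, g in zip(left, right, gaps):
        print(f"left {l}, right {r}, apparent gap {g}")
```

The rails stay two units apart in space, yet their apparent separation in the picture is 2/z, so it shrinks steadily with depth and both images approach the same point of the picture plane, the vanishing point.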
Questions about perspective are very much in the spirit of mathematical modeling questions, mathematical modeling being the part of mathematics concerned with using mathematics to get insights into subjects outside of mathematics. In the usual approach, one is concerned with the issue of the perception of, say, a scene in three-dimensional space on a flat canvas under the assumption that the scene is being viewed by a “single point eye.” Yet we all know that humans are endowed with binocular vision! We are attacking such binocular vision questions today because we have the mathematical tools to take such questions on, while the artist/mathematicians of the past had to content themselves with simpler approaches. A variety of people whose names are known to mathematicians (but perhaps not to the general public) have contributed to a theory of perspective (Fig. 2). Though every calculus student knows the name of Brook Taylor (1685–1731) for his work on power series, how many mathematicians know that Taylor wrote on the theory of linear perspective? On the other hand, every art historian will recognize the name of Piero della Francesca (c. 1412–1492), yet how many of these art historians (or mathematicians) will be familiar with his contribution to mathematics? Similarly, Girard Desargues (1591–1661) is a well-known name to geometers for his work on projective geometry. Projective geometry of the plane concerns points and lines, but unlike Euclidean geometry where lines can be parallel (never intersect or meet), in a projective plane, any pair of lines always intersects. Few people involved with art are familiar with Desargues’s work. The diagram


Fig. 2 Portrait of Brook Taylor. (Courtesy of Wikipedia)

Fig. 3 Two triangles, ABC and A'B'C', thought of as being in different planes in three-dimensional space; the lines joining corresponding vertices pass through a single point, labeled “Eye.” (My drawing)

below (a portion of a “Desargues Configuration”), familiar to students of projective geometry, can be thought of as a plane drawing of an “eye” (point) viewing triangles which lie in different planes (Fig. 3). Here we are thinking of a drawing in the plane as representing something we are thinking about in three-dimensional space. What is true of this diagram is that if the sides of the “corresponding triangles” are not parallel, then these sides meet in three points which all must lie on the same line. One can state different versions of what Desargues discovered in the Euclidean plane, but the natural place to think about the result is in the real projective plane


which, from a theoretical point of view about geometry, offers an alternative to the geometry of the Euclidean plane. In Euclidean geometry use is made of an axiom of John Playfair (1748–1819) that given a point P not on a line l, there is a unique line through P parallel to l, which captures more intuitively the content of the somewhat complicated statement of Euclid’s fifth postulate. In the real projective plane, pairs of different lines always meet; in the real hyperbolic plane, given a point P not on a line l, there will be infinitely many lines through P which are parallel to (do not meet) l. Hyperbolic geometry is fascinating but will not be discussed here, though, as we will see later, artists occasionally want to use ideas from hyperbolic geometry to help express ideas in a way that can’t be done in Euclidean geometry. Johann Lambert (1728–1777), whose name is best known for having produced results which would follow from assuming that Euclid’s fifth postulate (or Playfair’s axiom) does not hold, also made systematic contributions to the mathematics of perspective. Today we understand better that there are alternatives to Euclidean geometry in both theoretical mathematics and possible models of behavior in the “real world.” Though perspective is a well-mined area, this does not stop the flow of continued thoughts on the subject. For those accustomed to working with one-point or two-point perspective, there is the monograph of D. Termes, who treats one- through six-point perspective! Related to the tool of linear perspective is the branch of geometry known as descriptive geometry. While in the nineteenth century descriptive geometry was widely taught, especially in schools of engineering, today the subject is not widely known. The reason in part is that computer software makes it possible for people not familiar with descriptive geometry to perform the tasks that once required it, which makes explicit knowledge of the subject increasingly unnecessary. 
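The property of the Desargues configuration described earlier, that the three points where corresponding sides meet lie on one line, can be verified exactly on an example. The sketch below is my own and uses homogeneous coordinates, in which both the line through two points and the meeting point of two lines are given by a cross product. The particular coordinates chosen for the eye and the two triangles are arbitrary assumptions, as are the helper names cross and det3.

```python
def cross(u, v):
    """Cross product of two homogeneous triples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def det3(a, b, c):
    """Determinant of three triples; zero exactly when the points are collinear."""
    return sum(x * y for x, y in zip(a, cross(b, c)))


def meet(p1, p2, q1, q2):
    """Intersection of line p1p2 with line q1q2."""
    return cross(cross(p1, p2), cross(q1, q2))


# The eye is at (0, 0). Each primed vertex lies on the ray from the eye
# through the matching unprimed vertex, so the triangles are in perspective.
A, B, C = (1, 0, 1), (0, 1, 1), (-1, -1, 1)
A2, B2, C2 = (3, 0, 1), (0, 2, 1), (-4, -4, 1)

P = meet(A, B, A2, B2)   # where side AB meets side A'B'
Q = meet(B, C, B2, C2)   # where side BC meets side B'C'
R = meet(C, A, C2, A2)   # where side CA meets side C'A'

if __name__ == "__main__":
    print(P, Q, R, "collinear:", det3(P, Q, R) == 0)
```

Because all the coordinates are integers, the collinearity test involves no rounding. In ordinary coordinates the three meeting points of this example are (-3, 4), (2, 5), and (17, 8), which lie on a line of slope 1/5.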
Descriptive geometry provides a set of procedures for representing three-dimensional objects in two dimensions. The two-dimensional representation might be on either a piece of paper or a computer screen. These techniques are of great importance to engineers, architects (notably Frank Gehry), and designers. Designing, say, a large aircraft might involve tens of thousands of drawings. The roots of the subject lie with people such as Albrecht Dürer (1471–1528) and Gaspard Monge (1746–1818). If an artist, creative designer, sculptor, or architect cannot get across his/her conception of how to manufacture or otherwise assemble a “creative” design, then the work involved might go unrealized. Descriptive geometry supports both constructive manufacture and creative design by giving procedures for showing how to represent proposed three-dimensional creations on a flat surface. To give some of the flavor of the issues involved, the diagram below shows in blue how parallel lines “project a triangle onto a line,” while the red lines show how the same triangle is “projected” from the “eye” onto the same line. Intuitively, parallel projection can be thought of as a projection from an “eye” looking at the triangle from “infinitely far away.” A’, B’, and C’ show where the vertices of the triangle are moved by “parallel” projection, and A”, B”, and C” show where the vertices are moved by “conical” projection (Fig. 4). Here is a rather cute result which grew out of this interaction between mathematics and the needs of artists for representing three dimensions on a flat plane.


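Fig. 4 Two approaches to “projecting” a triangle onto a line in a plane. Diagram labels: triangle vertices A, B, C; eye; images A', B', C' under parallel projection; images A'', B'', C'' under projection from the eye; B' = B''. (My drawing)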

The result is known as Pohlke’s theorem. Karl Wilhelm Pohlke (1810–1876) was a German painter and teacher of descriptive geometry at an art school. He formulated this result in 1853, though the first proof seems to be due to K.H.A. Schwarz (1843–1921) in 1864. The theorem is quoted in various levels of generality. Here is one version:

Theorem (Pohlke): Given three segments (no two collinear) of specified length (not necessarily the same) which meet at a point in a plane, there are three equal length line segments which meet at right angles at a point in 3-dimensional space such that a parallel projection of these segments maps them onto the three chosen line segments in the plane.

Intuitively, this means that if one wants to draw a cubical box in the plane, one can draw any triad of lines for a corner of the cube because there is some position of a cube in three-dimensional space which maps to the given triad. Thus, in the diagram below, the triad on the left can be completed to form a “cube,” and there is some set of three orthogonal segments in three-dimensional space which can be mapped using parallel projection onto the triad on the left. The angles at the vertex of the triad are (in degrees) 90, 135, and 135. In general, the three angles formed at the vertex of a triad sum to 360 degrees (Fig. 5). Sometimes Pohlke’s theorem is referred to as the fundamental theorem of axonometry which, like descriptive geometry, deals with the problem of drawing three-dimensional objects in the plane.


Fig. 5 A triad of lines used as a corner of a box. (My drawing)
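A triad with angles 90, 135, and 135 degrees is what one obtains from a familiar kind of parallel projection. The sketch below runs in the easy direction only: it starts from three equal, mutually perpendicular segments and projects them, whereas Pohlke's theorem asserts that one can always go the other way. The particular projection (an oblique projection onto the xy-plane, with the receding axis drawn at 45 degrees) and the names oblique_projection and triad_angles are assumptions of mine, not taken from the figure.

```python
import math


def oblique_projection(point, scale=1.0, angle_deg=45.0):
    """Parallel projection onto the plane z = 0.

    Every projecting line has direction
    (scale*cos(angle), scale*sin(angle), 1), so all of them are parallel.
    """
    x, y, z = point
    a = math.radians(angle_deg)
    return (x - scale * z * math.cos(a), y - scale * z * math.sin(a))


def triad_angles(images):
    """Angles (in degrees) between neighboring segments around the vertex."""
    directions = sorted(math.degrees(math.atan2(v, u)) % 360.0 for u, v in images)
    return [(directions[(i + 1) % 3] - directions[i]) % 360.0 for i in range(3)]


# Three unit segments meeting at right angles at the origin.
corner = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
images = [oblique_projection(p) for p in corner]
angles = triad_angles(images)

if __name__ == "__main__":
    print("projected segments:", images)
    print("angles at the vertex:", angles, "sum:", sum(angles))
```

Changing scale or angle_deg changes the lengths and angles of the drawn triad while the segments in space stay equal and orthogonal, which gives some feeling for why so many different triads can serve as the corner of a cube.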

Symmetry Art critics have evolved a language for discussing and analyzing art. It turns out that some mathematics can be useful in analyzing art. Some art or parts of pieces of art consist of things that are pleasing to the eye because they are symmetrical in a mathematical sense. Although the study of symmetry has implicitly been done within mathematics for a long time, in some ways its systematic study is quite recent. Thus, it was Felix Klein (1849–1925) who called attention to the fact that one way of classifying different kinds of geometries was by looking at the geometric transformations in each of these geometries which preserved interesting properties. In particular, it is of interest in Euclidean, spherical, or Bolyai-Lobachevsky geometry to look at the geometric transformations that are isometries. An isometry is a transformation that preserves distance. A complete list of isometries in the Euclidean plane consists of translations, rotations, reflections, and glide reflections. Of course, it is important to keep in mind that many of the pieces of art that mathematicians (and others) analyze using mathematical ideas may not reflect attempts on the part of the artist to incorporate these mathematical ideas in his/her work. Just as someone who makes a cubical box may not realize that the cube is a regular polyhedron (one where all the vertices are alike and where all the faces are congruent regular polygons), a maker of a weaving may not know anything about isometries (distance-preserving transformations). Thus, a mathematician may state which of the 17 wallpaper groups is used in the design of a rug, but that does not mean that the designer of the rug invoked any mathematical thinking at all. Yet symmetry ideas and the conservatism of the mechanisms of cultural transmission have been used as a tool by social scientists. 
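To make the notion of an isometry concrete, here is a minimal sketch of my own with one representative of each of the four kinds listed above, together with a numerical check that distances are preserved. The specific choices (reflection in the x-axis, glide along the x-axis) and all function names are illustrative assumptions. A scaling is included as a transformation that fails the test.

```python
import math


def translation(dx, dy):
    return lambda p: (p[0] + dx, p[1] + dy)


def rotation(theta):
    """Rotation about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda p: (c * p[0] - s * p[1], s * p[0] + c * p[1])


def reflection_in_x_axis():
    return lambda p: (p[0], -p[1])


def glide_along_x_axis(d):
    """Reflect in the x-axis, then translate by d along that axis."""
    return lambda p: (p[0] + d, -p[1])


def scaling(k):
    """Not an isometry unless |k| = 1; included for contrast."""
    return lambda p: (k * p[0], k * p[1])


def preserves_distance(f, p, q, tol=1e-9):
    """Check one pair of points (a spot check, not a proof)."""
    return abs(math.dist(f(p), f(q)) - math.dist(p, q)) < tol


ISOMETRIES = {
    "translation": translation(3.0, -1.0),
    "rotation": rotation(math.radians(40.0)),
    "reflection": reflection_in_x_axis(),
    "glide reflection": glide_along_x_axis(2.5),
}

if __name__ == "__main__":
    p, q = (1.0, 2.0), (-4.0, 0.5)
    for name, f in ISOMETRIES.items():
        print(name, preserves_distance(f, p, q))
    print("scaling by 2", preserves_distance(scaling(2.0), p, q))
```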
Archeologists and anthropologists have attempted to use symmetry ideas to date artifacts (pottery or fabrics) and to study trade and patterns of commerce. A major tool in the analysis of symmetry has been the concept of a group, which has been studied by mathematicians who think of themselves as algebraists or geometers. Using groups or symmetry ideas is insightful but typically involves the reality that the actual art is being “modeled,” since it rarely meets the strict mathematical requirements involved. The group concept has a rich and complicated


history with ties to the study of the theory of equations (attempts to show that one could not find formulas to solve quintic (fifth-degree) polynomial equations). By late in the nineteenth century, group theory was being used as a tool to help crystallographers understand the symmetry of crystals and other naturally occurring structures. Out of this interest grew the work that made possible the classification of tilings and patterns using group theory considerations. Artists, architects, and designers (of clothes and furniture) often make use of patterns on a band or frieze. Here are some samples of such “infinite” symmetric patterns on a frieze, constructed using letters of the alphabet. It might seem that infinite variety is attainable using different motifs (here the motifs all consist of letters but one can have more varied motifs).

... L L L L L ...
... H H H H H ...
... p b p b p b ...
... p q p q p q ...
... b q b q b q ...
... W W W W W ...
... C C C C C ...
... X X X X X ...
... E E E E E ...
... A A A A A ...
... p d p d p d ...
... b p b p b p ...
... d b d b d b d b ...

It turns out that although these 13 patterns “look” different, there is an amazing mathematical result showing that there is a mathematical sense in which any frieze pattern is one of 7 kinds, so you might enjoy seeing which of the patterns above are the same and which are different. The use of groups to classify frieze patterns was discussed by Speiser, Pólya, and Niggli in 1924 in a joint paper. In about 1980 Branko Grünbaum and Geoffrey Shephard found a way of generalizing the notion of a pattern on a strip which results in 15 different types of patterns. Often mathematics grows when things which were thought to be the same can be seen to be different by identifying some “new” property that distinguishes them. One can look at the symmetry of a single motif or ornament. An example of such motifs (illustrated from patterns used in batiks) is shown below. 
Such ornaments typically have rotational symmetry and/or reflection symmetry (Fig. 6). A more complex pattern such as the one below can be built up from simple motifs. Such patterns have translational symmetry in one direction. Designs or patterns of this kind are known as strip, band, or frieze patterns. Think of the diagram as consisting of three separate “vertically” positioned friezes, which go off to infinity in both the up and down directions (Fig. 7). The motifs used to make such frieze patterns may be isolated from one another or coalesce into a “continuous” geometric design along the strip. If a pattern has


Fig. 6 Two symmetrical patterns but note that the symmetry is not mathematically “perfect”

translations in two directions, then the pattern is often referred to as a wallpaper pattern. Wallpaper patterns are usually thought of as, say, a white piece of paper, the background on which a symmetric pattern has been drawn, where the pattern appears in a single color, say black. However, mathematicians have looked at the problem of enumerating symmetrical patterns with many colors which have translations in two directions. Thus, there are 17 two-colored frieze patterns and 46 two-colored wallpaper patterns. Whereas an artist may choose to create a pattern with absolute and strict adherence in all details to have symmetry in the pattern, this is not all that common for “tribal” artists or artisans. Thus, if one looks carefully at a rug which at first view looks very symmetrical, it is common to see that at a more detailed level it is not quite totally symmetric either in the use of the design or the colors used in different parts of the design. One can see the small liberties that are taken either as a sign of the difficulty of making patterns exact without machines to make the designs or because the artist wants consciously to make such small variations. In analyzing the symmetry of such a pattern, it probably makes sense to idealize what the artist has done before applying some mathematical classification of the symmetry involved.
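The sorting of the letter friezes listed earlier in this section into the 7 kinds can be mechanized, in the same spirit of idealizing a design before classifying it. The sketch below is my own and rests on several assumptions: the letters are drawn in an idealized sans-serif typeface and are equally spaced, a frieze is described by the block of letters that repeats, and the names of the 7 kinds are the crystallographic symbols p1, p11g, p1m1, p2, p2mg, p11m, and p2mm. The tables and the function frieze_type are hypothetical helpers, not a standard library.

```python
# Image of each letter under a motion of the strip. A letter missing from a
# table is carried to a shape that is not a letter of our alphabet.
MIRROR_V = {"b": "d", "d": "b", "p": "q", "q": "p",
            "H": "H", "W": "W", "X": "X", "A": "A"}      # left-right flip
MIRROR_H = {"b": "p", "p": "b", "d": "q", "q": "d",
            "H": "H", "C": "C", "X": "X", "E": "E"}      # up-down flip
HALF_TURN = {"b": "q", "q": "b", "d": "p", "p": "d",
             "H": "H", "X": "X"}                         # 180-degree rotation


def image(table, word):
    """Apply a table to every letter; None if some letter has no image."""
    if any(ch not in table for ch in word):
        return None
    return "".join(table[ch] for ch in word)


def cyclic_shifts(word):
    return {word[k:] + word[:k] for k in range(len(word))}


def frieze_type(word):
    """Classify the frieze ...word word word... built from one repeating block."""
    shifts = cyclic_shifts(word)
    h = image(MIRROR_H, word)
    v = image(MIRROR_V, word)
    r = image(HALF_TURN, word)
    horizontal = h == word                     # mirror in the strip's center line
    glide = h is not None and h in shifts      # flip, then slide along the strip
    vertical = v is not None and v[::-1] in shifts
    half_turn = r is not None and r[::-1] in shifts
    if horizontal:
        return "p2mm" if vertical else "p11m"
    if glide:
        return "p2mg" if vertical else "p11g"
    if vertical:
        return "p1m1"
    return "p2" if half_turn else "p1"


SAMPLES = ["L", "H", "pb", "pq", "bq", "W", "C", "X", "E", "A", "pd", "bp", "db"]
TYPES = {word: frieze_type(word) for word in SAMPLES}

if __name__ == "__main__":
    for word, kind in TYPES.items():
        print(f"... {' '.join(word * 3)} ...  {kind}")
```

Under these assumptions the 13 samples fall into six of the seven kinds. A block such as "bdpq" supplies the remaining one (p2mg), which has a vertical mirror and a glide reflection but no mirror along the center line of the strip.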


Fig. 7 A symmetrical pattern which is two-dimensional but each of the three columns can be thought of as a frieze pattern

Fig. 8 Symmetrical pattern

In the patterns shown above, no color appears. We have a black design on a white background. However, in discussing the symmetry of a pattern, one can study the symmetry involved either with color disregarded or with color taken into account. If you look at the batik below from a symmetry point of view, you must idealize (model) what is going on to use mathematics. This batik is not infinite in either one or two directions. You must decide what colors have been used and what is the background color (Fig. 8). Many find it interesting to use mathematics to decide what symmetry pattern is involved for various interpretations of the whole or parts of the design one sees. E. Fedorov (1853–1919) enumerated the 17 two-dimensional patterns in 1891 in a paper which did not receive wide attention because it was in Russian. P. Niggli


(1888–1953) and G. Pólya (1887–1985) developed the 7 one-dimensional and the 17 two-dimensional patterns in the 1920s; it was through this work that a mathematical approach to the analysis of symmetry patterns became more widely known. One extension of this work to color symmetry was accomplished by H. Woods in the 1930s. It turns out that there are 46 two-color types of patterns. Subsequently much work has been done with regard to studying symmetry in higher-dimensional spaces and using many colors. Surprisingly recently, Branko Grünbaum and Geoffrey Shephard, in a long series of joint papers and in their seminal book Tilings and Patterns (1987), have explored many extensions and facets of patterns, tilings, and their symmetries. In particular, they explored the interaction between the use of a motif and symmetry. This enabled them, for example, to develop a “finer” classification of the 7 “frieze” patterns and 17 “wallpaper” patterns. Unfortunately, this work is not as widely known as it should be. Many people have been instrumental in expanding mathematical knowledge of symmetry and pattern to scholars outside of mathematics as well as to the general public. One of the most influential and early books of this kind was the book called Symmetry by Hermann Weyl (1885–1955). Also noteworthy among these popularizers are Doris Schattschneider, Branko Grünbaum (1929–2018), Geoffrey Shephard (1927–2016), Marjorie Senechal, Michele Emmer, H. S. M. Coxeter (1907–2003), Dorothy Washburn (an anthropologist), Donald Crowe, and Kim Williams. These individuals called attention to the use of symmetry as a tool for insight into various aspects of fabrics, ethnic designs and cultural artifacts, architecture, and art, as well as to artists such as Escher whose work tantalizes people with a mathematical bent.

Asymmetry Artistic endeavors that were symmetric date back a long way, for example, mosaic tilings. Roman examples exist in both Italy and Africa. While symmetric objects seem to attract human attention as both art and mathematics, so do “random” structures. Intriguingly, the mathematical theory of chance events has been much more recent in its development than that of “deterministic” (non-chance) ones. Early art was primarily representational: images of people, animals, homes, and panoramic scenes of beauty. However, art also came to encompass nonrepresentational works. Examples of such art include various fractal shapes. Perhaps surprisingly, when points are plotted using color coding from a function which involves complex numbers, the images which result can be strikingly beautiful (Fig. 9).
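The idea of coloring points according to a function of a complex number can be tried out in a few lines. The sketch below uses one well-known construction, the escape-time coloring of the Mandelbrot set. This choice is an assumption on my part, since nothing here says which function lies behind Fig. 9, and the names escape_time and render, the viewing window, and the character "palette" are likewise mine. Characters stand in for colors so that the picture can be printed as text.

```python
def escape_time(c, max_iter=50):
    """Number of steps of z -> z*z + c (from z = 0) before |z| exceeds 2."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(width=60, height=24, max_iter=50):
    """Return the picture as a list of text rows, one character per point."""
    shades = " .:-=+*#%@"
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.1 + 3.0 * i / (width - 1)
            n = escape_time(complex(x, y), max_iter)
            # Slow-escaping points get darker characters.
            row += shades[(len(shades) - 1) * n // max_iter]
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

Replacing the characters by a color scale and increasing the resolution gives pictures of the familiar kind, in which the interesting detail sits along the boundary between points that escape quickly and points that never escape.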

Mathematical Artists and Artist Mathematicians Not surprisingly, in light of the internal aesthetic qualities of mathematics, many mathematicians (and computer scientists) have chosen to express themselves not only by proving theorems but by producing art. There are many such individuals including Helaman Ferguson, Nathaniel Friedman, George Hart, Koos Verhoeff,


Fig. 9 A very appealing pattern which is very asymmetrical. This is an example that involves fractals. (Courtesy of Wikipedia)

Michael Field, and many others. Complementing these career “mathematicians” who are also artists is a group of people who are not mathematicians but who have drawn great inspiration from mathematical phenomena. Examples of such individuals are Brent Collins, Charles Perry, and Sol LeWitt. Not surprisingly there are many architects whose work has a feeling of having been influenced by “technical capability” made possible by mathematics and computer systems. Though perhaps having only a tangential connection with mathematics, the distinguished architect Frank Gehry has discussed how the availability of CAD (computer-aided design) software has made it possible for him to express himself in a way that would not otherwise have been possible. Structural engineering has many ties with mathematics. If you are not familiar with the work of Santiago Calatrava, you are in for a treat. Many mathematicians find his bridges and other structures appealing and having a mathematical flavor. There have also been attempts of various kinds to generate art with algorithms. Some of this work is rather interesting. For the general public, there is one artist whose work, perhaps more than any other, is seen as having a mathematical quality (Fig. 10). This artist was M. C. Escher (1898–1972). This is true even though Escher did not see himself as having mathematical talent. Yet despite his lack of formal study of mathematics, Escher approached many artistic problems in a mathematical way. Doris Schattschneider has been instrumental in calling to the public’s attention Escher’s work and its relation to mathematics. Not only did mathematical issues inspire Escher, but his work inspired others to create art which relates to mathematics (Fig. 11). Some of Escher’s work relates to drawings which can be made in the plane of what appear to be three-dimensional situations but are impossible objects in the sense that they can’t be physically achieved. 
These objects which are related to issues involving visual illusions have both mathematical and artistic aspects (Fig. 12). Escher was influenced by at least one very distinguished mathematician (geometer), Harold Scott MacDonald Coxeter (1907–2003). Escher interacted with


Fig. 10 A three color pattern due to Escher. (Courtesy of Wikipedia)

Fig. 11 A piece of sculpture inspired by Escher on the Twente University Campus in Holland. (Courtesy of Wikipedia)


Fig. 12 Three visual illusions. The Penrose staircase, the impossible triad, and the Penrose triangle. (Courtesy of Wikipedia)

Fig. 13 A photo of the distinguished geometer and algebraist Harold Scott MacDonald Coxeter, known to his friends as Donald. (Courtesy of Wikipedia)

Coxeter about the difficulties he was having with representing “infinity” in a finite region. Coxeter responded by showing how tilings of the hyperbolic plane that involve infinitely many tiles can be drawn in a finite region of the Euclidean plane. Coxeter has explained the details of this mathematical connection. Amazingly, Coxeter continued to do mathematics well into his 90s (Fig. 13). While there have been many examples of mathematician-artist collaborations, a rarer event, until the Bridges Organization phenomenon, was a community where art and mathematics interacted and flourished in a “college” setting. Black Mountain College (1933–1957) was an experimental art school located in North Carolina. Among the distinguished artists who were


associated with the college were John Cage (1912–1992) music, Merce Cunningham (1919–2009) dance, Walter Gropius (1883–1969) architect, Willem (1904–1997) and Elaine (1918–1989) de Kooning, Franz Kline (1910–1962), Robert Motherwell (1915–1991), and Dorothea Rockburne (1932– ). Also remarkably, (Richard) Buckminster Fuller (1895–1983) appears to have done some of the development work and experimentation with the “geodesic dome” while there in the summers of 1948 and 1949. Fuller domes appear in the architecture of many houses and botanical spaces. The mathematics associated with ideas related to Fuller domes has attracted lots of attention (Figs. 14–15). Fuller domes are related to the study of convex three-dimensional polyhedra all of whose faces are triangles and which have exactly 12 vertices with 5 edges at a vertex and h (more than 1) vertices with exactly 6 edges at a vertex. The convex polyhedra obtained from those above by “interchanging” the roles of vertices and faces have come to be called fullerenes. These convex polyhedra have 3 edges at every vertex, exactly 12 faces with 5 sides, and h (greater than 1) faces with 6 sides. Models of both of these kinds of polyhedra are very attractive physically as well as having nifty mathematical properties. One reason perhaps for the connection between mathematics and art that seems to have been fostered by Black Mountain College was the presence on the faculty there of the distinguished mathematician Max Dehn (1878–1952). Dehn, though he worked in many areas of mathematics, was primarily known for his work in topology and geometry. In fact, part of what brought Dehn to fame was that he solved Hilbert’s third problem. David Hilbert (1862–1943) having been one of

Fig. 14 Photo of a “Fuller” dome. (Courtesy of Wikipedia)


Fig. 15 Diagram related to the polyhedra associated with Fuller domes. A near-triangulation with interior vertices of degree only 5 or 6. (Courtesy of Wikipedia)

the most important contributors to mathematics in the early parts of the twentieth century produced a famous list of problems in 1900 that he argued would, when solved, lead to important new insights and methods in topics that concerned mathematicians at the time. One of these problems asked, in essence, whether it is always possible to cut one of two three-dimensional convex polyhedra of the same volume into polyhedral pieces which can be reassembled to form the other. Dehn showed that this cannot be done for a cube and a regular tetrahedron of the same volume. He showed that such a decomposition is possible only if the two polyhedra have the same value of what today is called the Dehn invariant. The two-dimensional analog of this question will be treated briefly later.
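Returning to the fullerenes described earlier in this section, the number 12 in their description is forced by Euler's formula V - E + F = 2. The sketch below is my own check of that arithmetic, under the assumption that the polyhedron has 3 edges at every vertex and only 5-sided and 6-sided faces. The function names euler_value and fullerene_counts are hypothetical.

```python
from fractions import Fraction


def euler_value(pentagons, hexagons):
    """V - E + F for a polyhedron with 3 edges at every vertex.

    Each edge lies on 2 faces and each vertex on 3 faces, so counting the
    sides of all faces gives 2*E = 3*V = 5*pentagons + 6*hexagons.
    """
    sides = 5 * pentagons + 6 * hexagons
    vertices = Fraction(sides, 3)
    edges = Fraction(sides, 2)
    faces = pentagons + hexagons
    return vertices - edges + faces


def fullerene_counts(h):
    """(V, E, F) for a fullerene with 12 pentagons and h hexagons."""
    faces = 12 + h
    edges = (5 * 12 + 6 * h) // 2
    vertices = 2 * edges // 3
    return vertices, edges, faces


if __name__ == "__main__":
    for h in (2, 10, 20):
        print(h, fullerene_counts(h), euler_value(12, h))
```

The value of V - E + F works out to one sixth of the number of pentagons, whatever the number of hexagons, so it equals 2 exactly when there are 12 pentagons. With h = 20 one gets 60 vertices, 90 edges, and 32 faces, the pattern of a traditional soccer ball.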

Geometrical Art

While for a long period of time painting was dominated by portraits and landscapes, by the time the twentieth century arrived, there were many artists who seemed to be inspired by shapes and the way they interacted with each other as well as the way the shapes were colored. Abstract art certainly preceded the twentieth century but purely geometric shape for its own sake as a source of interest was much less common. Some of this geometric art emphasized symmetry but much of this work also used “disordered” shapes. There were many aspects to the geometric art “movement.” While some of the artists associated with geometric abstraction perhaps were not “inspired” by mathematical considerations, their work often speaks to mathematicians aesthetically and/or emotionally and often suggests mathematical questions


J. Malkevitch

to geometers. A good example of such an artist was Piet Mondrian (1872–1944), many of whose most famous works center around geometrical shape, in particular, rectangles (Fig. 16). Some of his work encouraged mathematicians to analyze when a rectangular shape could be decomposed into other rectangles with interesting properties. A typical example of such a question might be: When can an integer-sided square be decomposed into other squares, no two of which are congruent? This question leads to the notion of the “perfect squared square.” The problem has a complicated history, but one noteworthy contribution was by Rowland Brooks (1916–1993), Cedric Smith (1917–2002), Arthur Stone (1916–2000), and William Tutte (1917–2002) while they were undergraduate students at Cambridge University. While Mondrian may be the best-known such artist, there were many artists attracted to geometric painting. A short list, with no attempt to be complete, includes:
Frantisek Kupka (1871–1957)
Bart van der Leck (1876–1958)
Kazimir Malevich (1879–1935)
Theo Van Doesburg (1883–1931)
Sonia Delaunay (1885–1979)
Josef Albers (1888–1976)
Ilya Bolotowsky (1907–1981)
Barnett Newman (1905–1970)
Morris Louis (1912–1962)

Fig. 16 A painting of Piet Mondrian. (Courtesy of Wikipedia)


Ad (Adolph) Reinhardt (1913–1967)
Carmen Herrera (1915- )
Ellsworth Kelly (1923–2015)
Frank Stella (1936- )
Slavik Jablan (1952–2015)

Another term in the art world associated with this interest in geometric shapes was hard-edge painting. In this style of painting, crisp transitions between the objects that the eye focuses attention on are made in much the same way that mathematicians aim for clarity in distinctions between different kinds of shapes by using definitions. Another strand of geometric art with mathematical appeal concerns what has come to be called optical art. Some of this art is tied to interest in optical illusions, but much of it rests on “surprise” aspects of the interplay of shapes and light.

Victor Vasarely (1906–1997)
Bridget Riley (1931- )
Richard Anuszkiewicz (1930–2020)
Larry Poons (1937- )
Jeffrey Steele (1931- )
Ted Collier (1974- )

While the roots of geometric abstraction go back a long way, many practicing artists still find this style of communication to their liking. Many new directions are being explored by a younger generation of geometric abstraction artists.

Polyhedra, Tilings, and Dissections

Drawing polyhedra was an early testing ground for ideas related to perspective drawing. Renaissance artists tried to build on historical references to the “Archimedean polyhedra,” which were discussed in the writings of Pappus (290–350), though Archimedes’ original work is lost. The Archimedean solids (traditionally these do not include the Platonic solids) are a set of convex polyhedra with the property that locally every vertex looks like every other vertex and whose faces are regular polygons, perhaps not all with the same number of sides. Rather surprisingly, no complete reconstruction occurred until the work of Kepler (1571–1630), who found 13 such solids, even though one can make a case for there being 14 such solids. Pappus-Archimedes missed one in ancient times. The modern definition of Archimedean solids defines them as convex polyhedra which have a symmetry group under which all the vertices are alike. Using this definition, there are 13 solids, but there is little reason to believe that in ancient Greece geometers were thinking in terms of groups rather than in terms of local vertex equivalence, that is, the pattern of faces around each vertex being identical.
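The local description, that is, the list of regular polygons meeting at each vertex, already fixes the number of vertices, edges, and faces. The following is a minimal sketch in Python, assuming Descartes' theorem that the angular defects at the vertices of a convex polyhedron add up to 720 degrees (a fact not stated in the text above). The function name `counts_from_vertex_pattern` is illustrative only.

```python
from fractions import Fraction


def counts_from_vertex_pattern(pattern):
    """Return (V, E, F) for a convex polyhedron in which every vertex is
    surrounded by regular polygons with the given numbers of sides."""
    # Interior angle of a regular n-gon in degrees, kept as an exact fraction.
    angles = [Fraction(180 * (n - 2), n) for n in pattern]
    defect = 360 - sum(angles)
    if defect <= 0:
        raise ValueError("these polygons do not fit around a convex vertex")
    vertices = Fraction(720) / defect       # Descartes: total defect is 720
    if vertices.denominator != 1:
        raise ValueError("pattern gives a non-integer number of vertices")
    v = int(vertices)
    e = v * len(pattern) // 2               # every edge joins two vertices
    f = 2 - v + e                           # Euler's formula V - E + F = 2
    return v, e, f


print(counts_from_vertex_pattern((3, 4, 4, 4)))  # triangle, three squares
print(counts_from_vertex_pattern((5, 6, 6)))     # pentagon, two hexagons
```

The pattern (3, 4, 4, 4) yields 24 vertices, 48 edges, and 26 faces. These counts are shared by the rhombicuboctahedron and by the usual candidate for the fourteenth solid, a twisted variant of it, which illustrates why the local vertex pattern alone cannot separate the two definitions.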


In more modern times, polyhedra have inspired artists and mathematicians with an interest in the arts. Inspired by polyhedra, Stewart Coffin has created a wonderful array of puzzle designs which require putting together pieces he designed, made from rare woods, to form polyhedra. Coffin’s puzzles are remarkable for both their ingenuity as puzzles and their beauty. This beauty is a reflection of the beauty of the polyhedral objects themselves, but also the beauty of the rare woods he used to make his puzzles. Coffin showed creativity in selecting symmetrical variants of well-known polyhedra. Like Escher’s work, which inspired many others, Coffin’s work has been an inspiration to others. Good puzzles engender the same sense of wonder that beautiful mathematics inspires. George Hart, whose background is in computer science, provides a recent example of a person who is contributing to the mathematical theory of polyhedra, while at the same time he uses his skills as a sculptor and artist to create original works inspired by polyhedral objects (Fig. 17). Another artist who is inspired by polyhedra, symmetry, and topological phenomena is Bathsheba Grossman. A sample of her art appears below (Fig. 18). There is a long tradition of making precise models of polyhedra with regularity properties. It is common at mathematics conferences for geometers to feature a models room where mathematician/artists who enjoy building models can display the beauties of geometry in a physical form. These models complement the beauty of such geometric objects in the mind’s eye. The beauty of polyhedral solids in the hands of a skilled model maker results in what are, indeed, works of art. Magnus Wenninger (1919–2017) is the author of several books about model making. His models are especially beautiful. Here is a small sample, which only hints at the variety of models Wenninger made over many years (Fig. 19). His models of “stellated” polyhedra are particularly striking.

Fig. 17 A sculpture by George Hart. (Courtesy of George Hart)


Fig. 18 Sculpture by Bathsheba Grossman. (Courtesy of Wikipedia)

Fig. 19 A sample of models of symmetrical polyhedra, made by Magnus Wenninger. (Courtesy of Magnus Wenninger, now deceased)


A tiling of the plane is a way of filling up the plane without holes or overlaps with shapes of various kinds. For example, one can tile the plane with congruent copies of any triangle, and more surprisingly with congruent copies of any simple quadrilateral, whether convex or not. Tilings are closely related to artistic designs one finds on fabrics, rugs, and wallpaper. Though scattered studies of different ways of tiling the plane date back to ancient times, there was surprisingly little in the way of a theory for tilings of the plane as compared to what was done to understand polyhedra. Kepler did important work on tilings, but from his time to the late nineteenth century, relatively little work was done. Unfortunately, not only was work on tilings sporadic, but often it was incomplete or misleading. The publication of the monumental book by Branko Grünbaum (1929–2018) and Geoffrey Colin Shephard (1927–2016), Tilings and Patterns, changed this. Many new tiling problems were addressed and solved, and a variety of software tools for creating tilings (and polyhedra and playing games) of different kinds was developed. Daniel Huson and Olaf Delgado-Friedrichs (RepTiles) and Kevin Lee (Tesselmania) developed very nice tiling programs, but some of the locations where this software used to be available are no longer supported.

A more recent source of art inspired by mathematics has been related to dissections. A good starting place for the ideas here is the remarkable theorem known as the Bolyai-Gerwien-Wallace theorem. It states that two (simple) polygons A and B in the plane have the same area if and only if it is possible to cut one of the polygons up into a finite number of polygonal pieces and assemble the pieces to form the other polygon. In one direction, this result is straightforward: if one has cut polygon A into pieces which will assemble to form polygon B, then B’s area is the same as A’s area.
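The straightforward direction can be checked on a small example. The sketch below is only an illustration, with a hypothetical square and triangle chosen for this note: a 2-by-2 square is cut along a diagonal, one of the two triangles is moved rigidly, and the areas are compared using the standard shoelace formula.

```python
def polygon_area(vertices):
    """Area of a simple polygon given its vertices in order (shoelace formula)."""
    total = 0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Polygon A: a 2-by-2 square, cut along a diagonal into two right triangles.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
pieces_in_square = [[(0, 0), (2, 0), (2, 2)], [(0, 0), (2, 2), (0, 2)]]

# Polygon B: the first triangle stays put, the second is moved rigidly so
# that both right angles sit at (2, 0); together they form a large triangle.
triangle = [(0, 0), (4, 0), (2, 2)]
pieces_in_triangle = [[(0, 0), (2, 0), (2, 2)], [(2, 0), (4, 0), (2, 2)]]

area_of_pieces = sum(polygon_area(p) for p in pieces_in_square)
print(polygon_area(square), area_of_pieces, polygon_area(triangle))
```

All three numbers agree, as they must, because rigid motions of the pieces do not change their areas.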
The delightful surprise is that if A and B have the same area, then one can cut A up into finitely many polygonal pieces and reassemble the pieces to get B. Where does the art come in? Given two polygons with the same area, one can ask for two extensions of the Bolyai-Gerwien-Wallace theorem:
a. Find the smallest number of pieces into which A can be cut and reassembled to form B.
b. Find pieces with appealing properties into which A can be cut and reassembled to form B. These properties might be that all the pieces are congruent, are similar, or have edges which are related by some appealing geometric transformation.
Greg Frederickson has collected together a large amount of material about how polygons of one shape can be dissected into other polygons of the same area. Many of these are dissections of regular polygons (which may be convex or “star-shaped”) into other regular polygons. One might expect that the mathematical regularity of the objects leads to aesthetic solutions. This turns out to be the case. Frederickson also describes how, with a suitable mechanism, one can attach the pieces which arise from one polygon to one another and move them so that they form the other polygon. These dissections are known as hinged dissections. The first way to hinge the pieces that comes to mind is to attach the pieces at their vertices. There are lovely examples of hinged dissections of this kind including ones


that prove the Pythagorean theorem geometrically by showing how one can cut the squares on the two legs of a right triangle and assemble the pieces to form the square on the hypotenuse. However, there is another ingenious way to do the hinging. This involves hinging two pieces along a shared edge so that the polygons joined along this edge can rotate with respect to one another. This type of hinging is known as twist-hinging. Frederickson arranged for several very attractive hinged dissections to be realized physically, with the polygonal pieces of the dissection made of beautiful woods. These physical models draw heavily for their beauty on the mathematics behind the dissections. For example, in one of the physically realized hinged dissections commissioned by Frederickson, a regular hexagon with a hole is twist-hinge dissected into a hexagram with a hole of the same area. The mathematics behind this model is a way to dissect a hexagon with a hole into a hexagram with a hole. Based on this dissection, Frederickson cleverly produced a twist-hinged dissection. This appealing object (Fig. 20) is not very interesting as a puzzle but creates a lovely effect as one watches the unexpected transformation between the two shapes evolve as one manipulates the twist-hinged pieces.

Origami

Traditional origami was concerned with taking a single piece of paper and folding it into complex shapes, typically an animal or something representational. However, Tomoko Fusè revolutionized the world of origami from a mathematical perspective by popularizing “modular” origami. In modular origami, one typically starts with congruent pieces of paper (usually, but not always, squares) and folds each of these into identical “units.” These units are then “woven” together to form highly symmetrical objects such as polyhedra, tilings, or boxes. By using appropriate colors, one can often construct fascinating paper models of a wide variety of polyhedra and tilings with attractive symmetry properties. The creativity involved in unit origami lies in the ingenious panels which have been developed and in the way that the panels can be assembled. Fusè’s books appear in the “art section” of book

Fig. 20 Three positions of a “flexible” sculpture of Greg Frederickson, which transforms a convex hexagon with a hole into a “star” hexagon with a hole. (Courtesy of Greg Frederickson)


stores. Interestingly, for people with some experience in origami, those of Fusè’s books which have not been translated into English can still be used because of the universality of the instruction system for folding origami (e.g., symbols for mountain folds, valley folds, etc.). Along with the artistic aspects of origami constructions has been a parallel development of a mathematical theory of origami. This has taken many approaches. The elaborate mathematical theory of what plane figures can be drawn using the traditional Euclidean construction tools of a straight edge (unmarked ruler) and compass has an origami companion. What are the shapes that can be constructed using various rules (axioms) concerning the folding of paper? Thomas Hull, Erik Demaine, and others have also studied issues related to folding and origami. A major area of interest has been the study of the crease patterns (system of lines on the paper) which can be folded “flat.” The mathematics needed involves ideas and methods somewhat different from what was done in the past in attempting to understand how a piece of a plane (a square of origami paper) could be transformed by a geometric transformation, because at the end of the transformation parts of the origami paper touch each other, though they do not interpenetrate other parts of the paper. Here is an example of a spectacular result that has been proved in this area. Suppose one is allowed to make one cut along a straight line after having folded a piece of paper flat with the goal of taking the pieces that are cut off and opening them up. What shaped pieces can one get in this way? The surprising answer is that one can get any graph consisting of vertices and straight line segments that can be drawn in the plane! For example, one could cut out the shape of something prosaic like the letter “I” or the outline of a butterfly. This result was originally developed by Erik Demaine, Martin Demaine, and Anna Lubiw. 
Subsequently a different approach to the result was developed by Marshall Bern, Erik Demaine, David Eppstein, and Barry Hayes. One can make many different kinds of polyhedral objects using modular origami. Here is a sample of origami models of Helena Verrill (Fig. 21). Some origami models of polyhedra use approaches where the panels one makes become the faces of the polyhedra, so that the challenge becomes producing panels with different numbers of sides but with the same edge lengths. One can also produce polyhedra which are “pyramided.” By this I mean that the solids represent convex polyhedra with pyramids erected on each face. (Those are not stellations in the usual sense that geometers use this term.) Other models emphasize the edges of the polyhedron and in essence serve as rigid rod models for the polyhedra. They resemble the drawings that Leonardo da Vinci made to demonstrate the use of emerging techniques of drawing polyhedra in perspective. In addition to being beautiful objects, many of the polyhedra which can be created using origami paper suggest mathematical questions of interest. As a simple example of this, one can make a cube out of six unit origami pieces. If these six pieces are all the same color, then one can make only one “type” of colored cube. Suppose that one has three panels of one color and three panels of another color. How many inequivalent cubes can one make?
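A question like this can be settled by brute force. The sketch below assumes that “inequivalent” means “not related by a rotation of the cube” (mirror images are counted separately if they differ), and it assumes that each panel covers exactly one face. The face numbering and function names are illustrative choices made for this note.

```python
from itertools import combinations

FACES = range(6)  # 0 = top, 1 = bottom, 2 = front, 3 = back, 4 = left, 5 = right


def perm_from_cycle(cycle):
    """Face permutation sending each face in `cycle` to the next one."""
    perm = list(FACES)
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        perm[a] = b
    return tuple(perm)


def rotation_group():
    """All rotations of the cube, generated by two quarter turns."""
    generators = [
        perm_from_cycle((2, 5, 3, 4)),  # quarter turn about the vertical axis
        perm_from_cycle((0, 2, 1, 3)),  # quarter turn about the left-right axis
    ]
    identity = tuple(FACES)
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = tuple(g[h[i]] for i in FACES)  # compose g after h
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group


def count_inequivalent(k, group):
    """Number of rotation classes of cubes with k panels of one color
    and 6 - k panels of the other."""
    classes = set()
    for chosen in combinations(FACES, k):
        coloring = tuple(1 if f in chosen else 0 for f in FACES)
        # The smallest rotated copy serves as a canonical representative.
        canonical = min(tuple(coloring[g[i]] for i in FACES) for g in group)
        classes.add(canonical)
    return len(classes)


group = rotation_group()
print(len(group), count_inequivalent(3, group))
```

Under these assumptions the enumeration finds 24 rotations and two inequivalent cubes: either the three panels of one color meet at a corner, or they form a strip running around the cube.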


Fig. 21 Models used with Helena Verrill’s permission. (Courtesy of Helena Verrill)

Fig. 22 Used with the permission of Joe Gilardi. (Courtesy of Joe Gilardi)

The diagram below shows an origami construction based on ideas of Thomas Hull and folded by Joe Gilardi. At the mathematical level, what one sees is a nested collection of tetrahedra folded from dollar bills (Fig. 22). Many also see a work of art! In the discussions above, I have charted the tip of the iceberg of the connections between mathematics and art. These connections are good for both mathematics and art and will continue to grow and prosper.


Bridging the World of Art and Mathematics

For some time now, there has been an organization called Bridges whose goal is to foster links between art and mathematics. The Bridges Organization has an annual meeting and publishes online proceedings of the many talks which are given at its meetings, which include many samples of art that are mathematically inspired as well as treating topics which help artists use mathematics. The American Mathematical Society has sponsored an exhibit of mathematically based art (including textiles) at its annual Joint Mathematics Meetings for a number of years. This includes having a “juried” award system for the best art in the display held at the Joint Meeting. This exhibit together with the Bridges Organization is continuing to create incentives and interest to foster mathematician artists and art inspired by mathematics.

Acknowledgments

This chapter grew out of an online column posted to the Feature Column of the American Mathematical Society for Mathematics Awareness Month in 2003.

End Notes

1. In discussing connections between art and mathematics (in particular geometry), I take the view that art and mathematics are human endeavors. While some have claimed that species other than Homo sapiens “create” mathematics and art, I personally don’t find these discussions compelling in understanding the nature of either mathematics or art or what they have in common.

References The literature connecting art and mathematics is especially scattered and varied. What is listed here is meant only as a small sample of what is available. There are many online sites related to this, in particular the Bridges site. Abas S, Salman A (1995) Symmetries of islamic geometrical patterns. World Scientific, Singapore Anderson K (1992) Brook Taylor’s role in the history of linear perspective. Springer, New York Auckly D, Cleveland J (1995) Totally real origami and impossible paper folding. Am Math Mon 102:215–226 Bangay S (2000) From virtual to physical reality with paper folding. Comput Geom 15:161–174 Bartashi W (1981) Linear perspective. Van Nostrand, New York Berne M, Hayes B (1996) The complexity of flat origami. In: Proceedings of 7th ACM-SIAM symposium on discrete algorithms, pp 175–183 Bixler N (1980) A group theoretic analysis of symmetry in two-dimensional patterns from Islamic art, Ph.D. Thesis, New York University Boehm W, Prautzsch H (1994) Geometric concepts for Geometric Design. A. K. Peters, Wellesley Booker P (1963) A history of engineering drawing. Chatto & Windus, London Bool F, Ernst B, Kist J, Locher J, Wierda F, Escher MC (1982) His life and complete graphic work. Harry Abrams, New York Botermans J, Slocum J (1986) Puzzles old and new. University of Washington Press, Seattle


Bourgoin J (1973) Arabic geometrical pattern & design. Dover, New York Coffin S (1990) The puzzling world of polyhedral dissections. Oxford University Press, New York Comar P (1992) La Perspective En Jeu: les dessous de limage. Découvertes Gallimard Sciences, Paris Coxeter H et al. (eds) (1986) M.C. Escher: art and science. North-Holland, Amsterdam Crannell A, Frantz M (2000) A course in mathematics and art. J Geosci Educ 48:313–316 Cromwell P (1997) Polyhedra. Cambridge University Press, London Crowe D (1971) The geometry of African art, I. Bakuba art. J Geom 1:169–182 Crowe D (1975) The geometry of African art, II. A catalog of Benin patterns. Hist Math 2:253–271 Crowe D (1981) The geometry of African art, III: the smoking pipes of Begho. In: Davis C et al. (eds) The geometric vein, (Coxeter Festschrift). Springer, New York Crowe D (1986) The mosaic patterns of H. J Woods. In: Hargittai I (ed) Symmetry: unifying human understanding. Pergamon, New York, pp 407–411 Crowe D (1994) Tongan symmetries. In: Morrison J, Garaghty P, Crowl L (eds) Science of pacific island peoples, part IV, education, language, patterns and policy. Institute of Pacific Studies, Suva Crowe D, Nagy D (1992) Cakaudrove-style masi kesa of Fiji. Ars Textrina 18:119–155 Crowe D, Torrence R (1993) Admiralty Islands spear decorations: a minicatalog of pmm patterns. Symmetry Cult Sci 4:385–396 Crowe D, Washburn D (1985) Groups and geometry in the ceramic art of San Ildefonso. Algebras Groups Geom 3:263–277 Davies C (1857) A treatise on shades, shadows, and linear perspective. A. S. Barnes and Burr, New York Demaine E (2001) Folding and unfolding linkages, paper, and polyhedra. In: Akiyama J, Kano M, Urabe M (eds) Discrete and computational geometry, vol 2098. Lecture notes in computer science. Springer, New York, pp 113–124 Demaine E, Demaine M (2001) Recent results in computational origami. 
In: Proceedings of 3rd international meeting of origami science, math and education (held in Monterey, CA., March) Demaine E, Demaine M, Lubiw A (1998) Folding and cutting paper. In: Akiyama J, Kano M, Urabe M (eds) Japan Conference on Discrete and Computational Geometry, vol 1763. Lecture notes in computer science. Springer, New York, pp 104–117 Demaine E, Demaine M, Mitchell J (2000) Folding flat silhouettes and wrapping polyhedral packages: new results in computational origami. Comput Geom Theory Appl 16:3–21 Descargues P (1982) Perspective: history, evolution, techniques. Van Nostrand, New York Dress A, Huson D (1991) Heaven and hell tilings. Struct Topology 17:25–42 Eastwood M, Penrose R (2000) Drawing with complex numbers. ArXiv: Math, 0001097 Edgerton S (1975) The renaissance rediscovery of linear perspective. Basic Books, New York El-Said I, Parman A (1976) Geometric concepts in islamic art. World of Islam Festival, London Emmer M (1984) M.C. Escher: geometries and impossible worlds; M.C. Escher: symmetry and space, 16mm films. International Telefilm Enterprises, Toronto Emmer M (ed) The visual mind. MIT Press, Cambridge (1993) Ernst B (1976) (Hans de Rijk), The magic mirror of M. C. Escher. Random House, New York Ernst B (1992) (Hans de Rijk), Optical illusions. Benedict Taschen Verlag, Koln Fahr-Becker G (2000) Owienero Werkstaette, Köln Farmer D (1996) Groups and symmetry. American Mathematical Society, Providence Federov E (1891a) Symmetry in the plane. In: Proceedings of the imperial Saint Petersburg society, series 2. 28, pp 345–389 (in Russian) Federov E (1891b) Symmetry of regular systems of figures. In: Proceedings of the imperial Saint Petersburg society, series 2, 28, pp 1–146 (in Russian) Field J (1985) Giovanni Battista Benedetti on the mathematics of linear perspective. J Warburg Courtauld Inst 48:71–99 Field J (1987) Linear perspective and the projective geometry of Girard Desargues. 
Nuncius 2:3–40 Field J (1988) Perspective and the mathematicians: Alberti to Desargues. In: Hay C (ed) Mathematics from manuscript to print. Oxford University Press, New York, pp 236–263


Field J (1993) Mathematics and the craft of painting: Piero della Francesca and perspective. In: Field J, James F (eds) Renaissance and revolution: humanists, craftsmen and natural philosophers in early modern Europe. Cambridge University Press, London, pp 73–95 Field J (1995) A mathematician’s art. In Piero della Francesca and his Legacy. In: Lavin M (ed) Studies in the history of art, number 48, center for the advanced study of the visual arts. National Gallery of Art, Washington, pp 177–197 Field R (1996) Geometric patterns from Churches & Cathedrals. Tarquin, St. Albans Field J (1997) The invention of infinity: mathematics and art in the renaissance. Oxford University Press, New York Field R (2004) Geometric patterns from islamic art & architecture. Tarquin, Norfold Field J, Gray J (1987) The geometrical work of Girard Desargues. Springer, New York Frantz M (1998) The telescoping series in perspective. Math Mag 71:313–314 Frederickson G (1997) Dissections plane & fancy. Cambridge University Press, New York Frederickson G (2002) Hinged dissections: swinging & twisting. Cambridge University Press, New York Frederickson G (2001) Geometric dissections that swing and twist. In: Akiyama J, Kano M, Urabe M (eds) Discrete and computational geometry, vol 2098. Lecture notes in computer science. Springer, New York, pp 137–148 Gasson P (1983) Geometry of spatial forms: analysis, synthesis, concept formulation and space vision for CAD. Ellis Horwood, New York Gamwell L (2006) Mathematics + Art: A cultural history. Princeton U. Press, Princeton Gerdes P (1999) Geometry from Africa. Mathematical Association of America, Washington Glassner A (1999) Andrew Glassner’s notebook: recreational computer graphics. Morgan Kaufmann, San Francisco Gray J (1979) Ideas of space: Euclidean, Non-Euclidean, and relativistic. Oxford University Press, London Grünbaum B (1994) Regular polyhedra. 
In: Grattan-Guinness I (ed) Companion encyclopedia of the history and philosophy of the mathematical sciences. Routledge, London, pp 866–876 Grünbaum B, Shephard G (1986) Is there an all-purpose tile? Am Math Mon 93:545–551 Grünbaum B, Shephard G (1987) Tilings and patterns. Freeman, New York Grünbaum B, Grünbaum Z, Shephard G (1986) Symmetry in moorish and other ornaments. Comput Math Appl 12:641–653 Grünbaum B, Shephard G (2016) Tilings and patterns, Second edition. Dover, New York Gurkewitz R, Arnstein B (1995) 3-D geometric origami: modular polyhedra. Dover, New York Hanson R (1995) Molecular origami: precision scale models from paper. University Science Books, Sausalitio, Hargettai I (ed) (1986) Symmetry1: unifying human understanding. Pergamon, Oxford Hargettai I (ed) (1989) Symmetry2. Pergamon, Oxford Hargittai I (ed) (1992) Fivefold symmetry. World Scientific, Singapore Holden A (1991) Shapes space and symmetry. Dover Press, New York Hull T (1994) On the mathematics of flat origamis. Congr Number 100:s215–224 Hull T (1996) A note on “impossible” paper folding. Am Math Mon 103:240–241 Jablan S (1995) Mirror generated curves. Symmetry Cult Sci 6:275–278 Jones O (1856) The grammar of ornament, day and son, London, 1856, reprint, Studio Editions, London (1988) Kaplan C, Salesin D (2000) Escherization, international conference on computer graphics and interactive techniques. In: Proceedings of the 27th annual conference on computer graphics and interactive techniques, association of machinery Kappraff J (1990) Connections. The geometric bridge between art and science. McGraw Hill, New York Kemp M (1990) The science of art: optical themes in western art from brunelleschi to seurat. Yale University Press, New Haven Kinsey L, Moore T (2002) Symmetry, shape and space. Key Curriculum Press, Emeryville


Lang R (1996) A computational algorithm for origami design. In: Proceedings of 12th symposium on computational geometry. ACM, New York, pp 98–105 Lindberg D (1976) Theories of vision from Al-Kindi to Kepler. University of Chicago Press, Chicago Liu Y (1990) Symmetry groups in robotic assembly planning, Ph.D. Thesis, University of Massachusetts, Amherst Locher J (ed) (1972) The world of M. C. Escher. Harry Abrams, New York MacGillavery C (1976) Fantasy and symmetry: the periodic drawings of M.C. Escher. Harry Abrams, New York Mainzer K (1994) Symmetries in mathematics. In: Grattan-Guinness I (ed) Companion encyclopedia of the history and philosophy of the mathematical sciences. Routledge, London, pp 1612–1623 Makovicky E (1989) Ornamental brick work, theoretical and applied symmetrology and classification of pattern. Comput Math Appl 17:995–999 Martin G (1992) Transformation geometry. Springer, Berlin Miura K (ed) (1997) Origami science and art. In: Proceedings of the second international meeting of origami science and scientific origami, Seian University, Otsu, Shiga Niggli P (1924) Die Flachensymmetrien homogener diskontinuen. Zeit. f. Kristallographie 60: 283–298 Niggli P (1926) Die regelmassige Punkverteilung langs einer Geraden in einer Ebene. Zeit. f. Kristallographie 63:255–274 Ouchi J (1977) Japanese optical and geometric art. Dover, New York Ornes S (2019) Math Art: Truth, beauty, and equations. Sterling, New York Penrose L, Penrose R (1958) Impossible objects: a special type of illusion. Br J Psychol 49:31 Peterson I (2001) Fragments of infinity: a kaleidoscope of math and art. Wiley, New York Polya G (1924) Uber die Analogie der Kristallsymmetrie in der ebene. Z Kristall 60:278–282 Rowe C, McFarland J (1939) Engineering descriptive geometry, 2nd edn, 1953. Princeton University Press, Princeton Salenius T (1978) Elementart bevis for pohlkes sats. 
Nordisk Matematisk Tidskrift 25–26:150–152 Sarhangi R (ed) Bridges: mathematical connections in art, music, and science, conference proceedings, yearly, 1998–2001 Sarnitz A (2007) Hoffmann, Taschen, Köln Schaaf W (1951) Art and mathematics: a brief guide to source materials. Am Math Mon 58: 167–177 Schattschneider D (1978a) Tiling the plane with congruent pentagons. Math Mag 51:29–44 Schattschneider D (1978b) The plane symmetry groups. Their recognition and notation. Am Math Mon 85:439–450 Schattschneider D (1980) Will it tile/try the conway criterion! Math Mag 53:224–233 Schattschneider D (1986) In black and white: how to create perfectly colored symmetric patterns. Comput Math Appl 12B:673–695 Schattschneider D (1987) The Polya-Escher connection. Math Mag 60:293–298 Schattschneider D (1988) Escher: a mathematician in spite of himself. In: Guy R, Woodrow R (eds) The lighter side of mathematics. Mathematical Association of America, 1994, pp 91–100. (Reprinted from Structural Topology 15:9–22) Schattschneider D (1990) Visions of symmetry. W. H. Freeman, New York Schattschneider D, Walker WMC (1987) Escher kaleidocycles. Pomegranate Artbooks, Rohnert Park Schreiber P (1994) Art and architecture. In: Grattan-Guiness I (ed) Companion encyclopedia of the history and philosophy of the mathematical sciences. Routledge, London, pp 1593–1611 Senechal M (1975) Point groups and color symmetry. Z Kristall 142:1–23 Senechal M (1979) Color groups. Disc Appl Math 1:51–73 Senechal M, Fleck G (eds) (1974) Patterns of symmetry. University Massachusetts Press, Amherst Shubnikov A, Koptsik V (1974) Symmetry in science and art, Nauka, Moscow, 1972. Plenum Press, New York


Stevens P (1981) Handbook of regular patterns. MIT Press, Cambridge Stewart I, Golubitsky M (1992) Fearful symmetry – is god a geometer? Blackwell, Oxford Taylor R, Micolich A, Jones D (1999) Fractal analysis of Pollock’s drip paintings. Nature 399:422 Termes D (1998) New perspective systems, (privately published). Spearfish, South Dakota Van Delft P, Botermans J (1978) Creative puzzles of the world. Harry Abrams, New York Veltman K (1986) Linear perspective and the visual dimension of science and art. Deutscher Kunstverlag, Munich Videla C (1997) On points constructible from conics. Math Intell 19:53–57 Washburn D (1990) Style, classification and ethnicity: design categories on Bakuba raffia cloth. American Philosophical Society, Philadelphia Washburn D, Crowe D (1988) Symmetries of culture. University Washington Press, Seattle Washburn D, Crowe D (eds) (2004) Symmetry comes of age. University Washington Press, Seattle Wenninger M (1971) Polyhedron models. Cambridge University Press, New York Wenninger M (1983) Dual models. Cambridge University Press, New York Weyl H (1952) Symmetry. Princeton University Press, Princeton White J (1987) The Birth and rebirth of pictorial space, reprinted. Harvard University Press, Cambridge Wittkower R, Carter B (1953) The perspective of Piero della Francesca’s “Flagellation,”. J Warburg Courtauld Inst 16:292–302 Yen J, Sequin C (2001) Escher sphere construction kit. In: Proceedings of the 2001 symposium on interactive 3D graphics. ACM, pp 95–98 Zaslavsky C (1973) Africa counts: number and pattern in African culture. Lawrence Hill Books, Brooklyn

Mathematics and Art: Unifying Perspectives

18

Heather M. Russell and Radmila Sazdanovic

Contents
Introduction
Mathematics in Art
  Mathematics as an Artistic Inspiration
  Mathematics as an Artistic Tool and Medium
  The Interplay of Art, Culture, and Mathematics
Artistic Ideas in Mathematics
  Graphs and Their Visualizations
  Examples of Graphs
Unifying Perspectives
Conclusion
Cross-References
References

Abstract In this chapter, we explore the interconnection of mathematics and art. We discuss mathematics as a lens to understand artwork and investigate how mathematical thinking and mathematical tools contribute to the process of creating art. Turning then to the manifestation of art within mathematics, we introduce ideas and constructions from mathematical graph theory that can be appreciated from an artistic perspective. Finally, we reflect on how the process of doing and communicating mathematics is inherently artistic.

H. M. Russell
Department of Mathematics and Computer Science, University of Richmond, Richmond, VA, USA

R. Sazdanovic
Department of Mathematics, North Carolina State University, Raleigh, NC, USA

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_125

Keywords Visualizations · Visual mathematics · Graphs · Knots · Tessellations · Diagrams · Culture · Symmetry · Dimension · Deformation

Introduction

The word calculus may not elicit excitement as an initial reaction. However, behind the typical topics one sees in calculus lie a fascinating history and interesting modern applications. Few know of the beautiful foundation of ideas on which calculus is built or that it was independently developed in different parts of the world at different times (Joseph, 2010).

Best known are the contributions of Cavalieri (1598–1647), Fermat (1607–1665), Newton (1643–1727), Leibniz (1646–1716), and their contemporaries to the field of calculus. However, ideas of infinitesimal calculus and expressions akin to modern Taylor approximations of the second order stated by Brook Taylor in 1715 were developed in the Kerala region of south India more than two hundred years prior to that. The work of Nilakantha Somayaji (1444–1544), Citrabhana (1475–1550), Narayana (1500–1575), and others, all of which builds on the work of Madhava (1340–1425), focuses on astronomical calculations (Joseph, 2010). For example, the infinite series for π appears in the work of Madhava in the fifteenth century, but it does not appear in Western culture until the seventeenth century in the work of Leibniz and James Gregory (Joseph, 2010). Calculus ideas were also independently developed in seventeenth-century Japan in the work of Seki Takakazu (1642–1708) (Joseph, 2010; Smith and Mikami, 1914). This work, called yenri, is considered to be the foundation of wasan (Poole, 2014; Restivo, 2013; Selin, 2013).

This interesting history demonstrates that mathematics, like art, is contemporary and mediated by both cultural and historical influences. Its advancement depends on societal needs and technological developments. As is explored in the field of ethnomathematics, culture has a particularly strong influence on mathematical visualization (D’Ambrosio, 2001; Gerdes, 1994, 1997; Washburn and Crowe, 1988, 2004). Visualizations, like art, are also impacted by the media in which they are rendered. For instance, modern software enables us to generate and interact with incredibly sophisticated mathematical objects and complex data.

In addition to the cultural and visual connections between mathematics and art, there is a similarity of processes; both doing mathematics and generating art are inherently creative tasks. This creativity is essential for successfully addressing the problems we face in our global, interconnected, and increasingly interdependent society. Solutions require innovative and cross-disciplinary thinking. Mathematics, like art, thrives on originality. The space in which mathematics and the arts come together is a powerful place where diverse perspectives are celebrated and problems are tackled with strength, agility, and imagination.


The goal of this chapter is to explore the interconnection of mathematics and art. In section “Mathematics in Art”, we focus on the role mathematics plays in art. In particular, section “Mathematics as an Artistic Inspiration” discusses mathematics as a lens to understand artwork, while sections “Mathematics as an Artistic Tool and Medium” and “The Interplay of Art, Culture, and Mathematics” investigate how mathematical thinking and mathematical tools contribute to the process of creating art. In section “Artistic Ideas in Mathematics”, we introduce ideas and constructions from mathematical graph theory that can be appreciated from an artistic perspective. In section “Unifying Perspectives”, we reflect on how the process of doing and communicating mathematics is inherently artistic. Acknowledging and celebrating this fact serves to showcase math as useful, accessible, and interesting.

Mathematics in Art

Mathematics provides computational and theoretical means to many areas of science and engineering, and it is essential for disseminating and explaining results (Mößner, 2013). Likewise, in addition to lending itself to analyzing artwork, mathematics provides inspirational, conceptual, and technological tools to visual artists. In this section we focus on mathematical visualizations, followed by a collection of mathematical ideas that are used by artists to give life to their ideas. Section “The Interplay of Art, Culture, and Mathematics” provides illustrations of some of these ideas appearing in Tess-celestial (Sazdanovic et al., 2018).

Mathematics as an Artistic Inspiration

Within the mathematical community, there is an historical and ongoing debate about the role of visual representations in mathematics. The group of mathematicians in France collectively known as Bourbaki sought to introduce added rigor and precision into mathematics. In Arnold (1990), Arnold mentions Bourbaki’s opinion on the work of Barrow (Barrow and Whewell, 1860), thus hinting at the tension surrounding the issue of mathematical visualization.

Bourbaki writes with some scorn that in Barrow’s book in a hundred pages of the text there are about 180 drawings. (Concerning Bourbaki’s books it can be said that in a thousand pages there is not one drawing, and it is not at all clear which is worse.)

Certainly, some remain skeptical of mathematical visualization as a necessary and rigorous framework for proving theorems. On the other hand, areas like graph theory, which is discussed below, demonstrate the merit of using mathematical visualization to do mathematics. Regardless of where one sides in this debate, it is indisputable that visualizations provide a vehicle for teaching mathematics and communicating contemporary research results. Pictures and diagrams are commonly used to convey and clarify the structure of an argument and to break complex arguments into pieces that are easier to understand.


Fig. 1 A photo of the whiteboard featuring ideas by R. Koytcheff and R. Sazdanovic from May 2018

Not only are visualizations used to communicate mathematics as a finished product, but they are also used as a tool to shape mathematics. For example, Fig. 1 shows an attempt by R. Koytcheff and R. Sazdanovic to formulate and then prove a theorem. Diagrams similar to the ones on the board, common in algebraic topology and category theory, appear in the work of Bernar Venet, for example, his “TOND to TOND: Self-Similarity of Persian TOND Patterns, Through the Logic of the X-Tiles”, “Saturation Commutativity”, and “Saturation with Three Red Arrows” from the beginning of the twenty-first century. Venet’s work shows that mathematical beauty, as described in Hardy’s book (Hardy, 1992), inspires art.

Whether or not one appreciates mathematical visualizations as inherently artistic, it is clear that mathematics interacts with art in a variety of other ways. Mathematics provides a precise language and set of technical tools for describing, analyzing, and conceptualizing art. Mathematical concepts like symmetry, dimension, and even deformation become useful lenses through which we can view art. Even the paintings of Salvador Dalí (1904–1989), inspired by the French surrealist movement and research on the subconscious, can be analyzed from this viewpoint.

We all have a sense of what symmetry means; perhaps it involves similarity or repetition. To a mathematician, an intuitive feeling of what a concept means is an irresistible target for further investigation. Indeed, mathematics has developed a rich language to describe different types of symmetry precisely. Most of these are featured in the paintings of Dalí. In “Portrait of My First Cousin”, we see central symmetry in parts of the image, which fixes a point, and in “The Elephants”, we see mirror symmetry, which fixes a vertical line. According to gestalt theory, which claims that a whole is more than a simple sum of its parts, we perceive this painting as symmetrical, and after that we recognize the details and discover some deviations from the exact symmetry. “The Railway Station at Perpignan” combines several symmetries: mirror symmetries with respect to two axes and a central symmetry. Taken together they create dihedral symmetries, or symmetries of a square. Additionally, this piece demonstrates scaling, or homothetic symmetry, about the central point. Because the centers of the homothety and dihedral symmetry coincide, we get the impression of balance and perspective. The eye is drawn toward the center, but the figures themselves are static.

In “Crucifixion (Corpus Hypercubus)”, Dalí not only uses but is also inspired by the mathematical concept of dimension. A collection of seven cubes is depicted with an eighth one hidden but implied. This configuration of cubes, shown in Fig. 2, is an unfolding of the four-dimensional cube in three-dimensional space just as the three-dimensional cube unfolds into six squares in the plane (The image in Fig. 2 was obtained via Creative Commons. “File:8-cell net.png” was generated by Robert Webb using Robert Webb’s Stella software available at http://www.software3d.com/Stella.php.). In the 1970s and 1980s, mathematician Thomas Banchoff, who was struck by the hypercube and also a fan of Dalí’s paintings, met and talked with Dalí often about concepts such as four-dimensional space. In fact, Banchoff gave his model of the hypercube to Dalí, and it now resides at the Dalí museum in St. Petersburg, Florida (Reidy, 2018). In the 2004 documentary The Dalí Dimension, Dalí says “Scientists give me everything, even the immortality of the soul” (Marques and Joan, 2004).
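The counts in the hypercube discussion above (eight cubes for the four-dimensional cube, six squares for the ordinary cube) can be checked with a standard counting formula. The sketch below is only an illustration and is not taken from the chapter; the function name `cube_faces` is a hypothetical helper. It assumes the usual fact that a k-dimensional face of an n-dimensional cube is obtained by letting k coordinates vary and fixing each of the other n − k coordinates at one of two extreme values.

```python
from math import comb


def cube_faces(n: int, k: int) -> int:
    """Number of k-dimensional faces of an n-dimensional cube (hypothetical helper)."""
    # Choose the k coordinate directions that vary along the face, then fix
    # each of the remaining n - k coordinates at one of its two extremes.
    return comb(n, k) * 2 ** (n - k)


# Ordinary cube: vertices, edges, square faces.
print([cube_faces(3, k) for k in range(3)])  # [8, 12, 6]
# Hypercube: vertices, edges, squares, cubic cells.
print([cube_faces(4, k) for k in range(4)])  # [16, 32, 24, 8]
```

The last entry of each list is the number of pieces in the corresponding unfolding: six squares for the cube and eight cubes for the hypercube, matching the seven visible cubes plus one hidden cube described above.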
Fig. 2 Hypercube: four-dimensional analogue of a cube and a square

In “A Propos of the ‘Treatise on Cubic Form’ by Juan de Herrera”, we see Dalí’s take on Necker’s illusion and the Koffka cube: a regular hexagon divided into three congruent rhombs. Dalí also breaks the archetypal symbol of a cross into cubes to create the illusion of movement within a flying cross in “Nuclear Cross.”

Although mathematics is sometimes understood to be a rigid, one-question-one-answer field, the reality is that mathematicians often play with ideas, tweaking them and exploring the consequences of their game. Reaching beyond the notion of symmetric sameness, mathematicians who work in the field of topology take a broader view of what shapes count as “the same.” Shapes are considered to be the same if, informally and only slightly incorrectly, one can be deformed to the other one by gently stretching and bending, without cutting or re-gluing. For example, a topologist considers a circle to be the same as an ellipse, square, diamond, or rectangle. This abstract idea has found a contemporary application in analyzing big data. A central principle of the area of mathematics known as topological data analysis (Edelsbrunner and Harer, 2010; Ghrist, 2008, 2014) is that data have shape and shape contains relevant information.

Dalí, of course, was quite comfortable with the concept of deformation. In “Swans Reflecting Elephants”, where swans and elephants can be viewed as the same, we see Dalí playing with both reflective symmetry and topological ambiguity. Similarly, in “The Persistence of Memory,” also known as “Soft Watches,” one of Dalí’s most famous paintings and one of the icons of surrealism, we recognize watches and paintings although they are melted and deformed. Dalí was inspired by Einstein’s special relativity, and he used ideas from topology to depict the relativity of time and space. The title of his “Topological Contortion of a Female Figure” demonstrates Dalí’s direct interest in this mathematical topic. In his works Dalí also used crystallographic symmetries, sphere packings, self-referential systems (“The Face of War”), tessellations, and textures.
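The topologist's claim above that a circle is "the same" as a square can be made concrete with one explicit deformation. The following is a minimal sketch under stated assumptions, not a construction from the chapter: it uses radial projection between the unit circle and the boundary of the square with corners (±1, ±1), and the function names are hypothetical.

```python
import math


def circle_to_square(x: float, y: float) -> tuple[float, float]:
    """Push a point of the unit circle radially onto the square with corners (+-1, +-1)."""
    m = max(abs(x), abs(y))  # never zero for a point on the unit circle
    return (x / m, y / m)


def square_to_circle(x: float, y: float) -> tuple[float, float]:
    """Pull a point of the square's boundary radially back onto the unit circle."""
    r = math.hypot(x, y)
    return (x / r, y / r)


for k in range(8):
    t = 2 * math.pi * k / 8
    p = (math.cos(t), math.sin(t))
    q = circle_to_square(*p)
    back = square_to_circle(*q)
    # q lies on the boundary of the square, and the round trip returns to p.
    assert math.isclose(max(abs(q[0]), abs(q[1])), 1.0)
    assert math.isclose(back[0], p[0], abs_tol=1e-12)
    assert math.isclose(back[1], p[1], abs_tol=1e-12)
```

Both maps are continuous and undo each other, which is the informal "stretching without cutting or re-gluing" in this particular case.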

Mathematics as an Artistic Tool and Medium

In addition to providing a language for analyzing and understanding artwork, mathematical techniques can be used as tools to create art, enabling artists to realize their ideas. For example, conceptual, algorithmic, and generative artforms are all infused with mathematical processes and ideas. Ritual drawings by the Tchokwe of Central Africa (Huylebrouck, 2019) and the Tamil of India come from different cultures and continents, and yet both are examples of algorithmic generative art (Jablan, 2002). In the Tchokwe sand drawings, called “sona”, a set of rules determines the end result based on the placement of dots and a choice of starting point for the curve. The artistic skill resides in cleverly choosing these initial conditions. Figure 3 shows two examples of “sona” through which the Tchokwe illustrate their stories. The drawing on the left describes the process of selecting a chief, and the one on the right represents life (Huylebrouck, 2019).

Fig. 3 Tchokwe sand drawings (Huylebrouck, 2019)

The algorithm that generates sand drawings from initial choices is a bit like billiards. Imagine a billiard table as a rectangular grid with a certain number of additional walls positioned within the grid. A ball is then placed in the grid and set in motion. The trajectory as it bounces off of various walls traces out what is called a mirror curve in the mathematical literature. Mirror curves are related to the mathematical field of knot theory and also arise in ideas related to quantum computing (Jablan et al., 2012; Lomonaco and Kauffman, 2008). Modern-day mirror curves are most often generated using software, thus automating the underlying algorithm. This makes creating such curves and exploring their properties easier and more accessible.

Jablantile is a six-character font created by computer scientist Donald Knuth (Knuth, 2009), based on work of mathematician-artist Slavik Jablan (1952–2015) in the theory of symmetry and modularity (Attenberg, 2005). By using the six characters of the jablantile font as modular pieces, one can generate images in the spirit of Jablan’s original work. This is shown in Fig. 4.

Fig. 4 S. Jablan’s work on modularity and Kufic tiles (left) and an image created by D. Knuth using the jablantile font (right)

An understanding of the mathematics of symmetry in combination with other modern technologies like 3D printing can lead to striking new artistic developments, for example, in the work of Rinus Roelofs (Fig. 5). Here and in our previous examples, we can see that mathematics is not only a tool but also in some sense a medium. The artistic object that is produced has precise mathematical relevance. This is further exemplified by the work of Helaman Ferguson. Ferguson, a mathematics Ph.D., has been an artist for the greater part of his life (Ferguson and Ferguson, 2011). In his words


Fig. 5 Rinus Roelofs’s sculptures: Elevated cube, the trefoil knot as the shape of the hole (left); double version of Coxeter’s infinite polyhedron (middle), and three interwoven layers with elliptical holes (right)

Fig. 6 The process of creating Helaman Ferguson’s umbilic torus: Helaman with the robot and the board describing the mathematics behind the sculpture (left) and on top of the welded piece (right)

Fig. 7 The process of creating Helaman Ferguson’s umbilic torus: mathematical robot carving the molds (left), one half of the sculpture while it was being welded in artist’s studio (center), prototype (right)

I celebrate mathematics with sculpture and sculpture with mathematics. Eons-old stone strikes me as a perfect medium through which to celebrate timeless mathematics.

The photos shown in Figs. 6 and 7 illustrate the process of creating his almost 10-ton bronze sculpture “Umbilic Torus” whose home is in Stony Brook, New York. On the left in Fig. 6, we see the artist with a chalkboard that contains most of the mathematics that describes the piece. To construct the sculpture, molds for various parts were 3D printed and then used to cast the bronze in many pieces. These pieces were then welded into two gigantic pieces. Mathematics plays an essential role not only in conceptualizing the piece but also in the 3D printing process and the technical details of assembly. The installation took a lot of careful planning.
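The billiard-style description of mirror curves earlier in this section can be sketched in code. What follows is a simplified model and not the software or the algorithm used by the artists mentioned: it assumes the ball travels diagonally between midpoints of grid edges, that the border acts as a mirror, and that each optional internal wall is a two-sided mirror lying along an internal grid edge. The function name, the doubled-coordinate convention, and the `mirrors` parameter are all hypothetical choices made for this illustration.

```python
def count_mirror_curves(cols, rows, mirrors=frozenset()):
    """Count the closed curves traced in a cols x rows grid of unit cells.

    Coordinates are doubled so that edge midpoints are integer points (x, y)
    with x + y odd. `mirrors` holds internal edge midpoints that carry a
    two-sided mirror aligned with that edge (a modelling assumption).
    """
    width, height = 2 * cols, 2 * rows

    def reflect(x, y, dx, dy):
        on_border = x in (0, width) or y in (0, height)
        if on_border or (x, y) in mirrors:
            # Even y marks a horizontal edge, which reverses vertical motion;
            # odd y marks a vertical edge, which reverses horizontal motion.
            return (dx, -dy) if y % 2 == 0 else (-dx, dy)
        return (dx, dy)

    # A state is "ball at midpoint (x, y), about to leave in direction (dx, dy)".
    states = set()
    for x in range(width + 1):
        for y in range(height + 1):
            if (x + y) % 2 == 1:
                for dx in (-1, 1):
                    for dy in (-1, 1):
                        if 0 <= x + dx <= width and 0 <= y + dy <= height:
                            states.add((x, y, dx, dy))

    cycles = 0
    while states:
        start = state = next(iter(states))
        while True:
            states.discard(state)
            x, y, dx, dy = state
            x, y = x + dx, y + dy            # cross one cell diagonally
            dx, dy = reflect(x, y, dx, dy)   # bounce if a mirror sits here
            state = (x, y, dx, dy)
            if state == start:
                break
        cycles += 1
    # Each closed curve is traversed once in each direction.
    return cycles // 2


print(count_mirror_curves(2, 2))                    # 2
print(count_mirror_curves(2, 2, mirrors={(2, 1)}))  # 1
```

In this model a 2 × 2 grid without internal walls produces two separate closed curves, and a single well-placed wall merges them into one. This mirrors the remark above that the skill lies in choosing the initial conditions.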

The Interplay of Art, Culture, and Mathematics

Mathematical ideas, patterns, and regularities appear in the work of many artists, although not always in geometric form. The concept of pattern, for instance, is communicated via the law of complementary colors in impressionist paintings and in cubism by depicting objects simultaneously from multiple viewpoints. Excellent examples can also be found in the works of Josef Albers (1888–1976) based on elementary geometrical forms (e.g., the series of graphics “Homeomorphisms Between the Circular Disc and the Square”), Edna Andrade (1917–2008) (e.g., “Torsion”), Frank Stella (e.g., “Gran Cairo”), the conceptual art of Sol LeWitt (1928–2007) (e.g., “Corner Piece No. 2”), and many more in “Lumen Naturae: Visions of the Abstract in Art and Mathematics” by Matilde Marcolli (Marcolli, 2020).

In the rest of this section, we focus on a recent take on tessellations of the hyperbolic plane featured in the work of the second author, which is akin to the “Circle Limit” series of M. C. Escher (1898–1972). Artisans in the classical Islamic world made great strides discovering, describing, and exploiting patterns and relationships among shapes via tilings long before the formal mathematical study of these ideas. While the simplest tessellations date back to 4000 BC, Harmonices Mundi by Johannes Kepler (1571–1630) is among the first written documents in which their study appears (Kepler, 1969). A tessellation is usually thought of as a way of covering a plane by geometric shapes with no gaps or overlaps subject to certain conditions (Conway et al., 2016), but there is no reason to stop at two-dimensional spaces or flat geometry. Tessellations appear as a research topic in several distinguished branches of mathematics, such as algebra, geometry, and dynamical systems.
A formal classification of tessellations of the plane and the proof that there exist precisely 17 two-dimensional crystallographic groups are both results derived only in the late nineteenth and early twentieth centuries. While there are finitely many tessellations in the flat setting, there are infinitely many tessellations of the hyperbolic plane (Conway et al., 2016). Rather than going into details and definitions, let us take a dive into hyperbolic tessellations by analyzing the mathematics behind Escher’s “Circle Limit III”.

Images in Fig. 8 illustrate the steps of constructing the tessellation shown in Fig. 9 inspired by Escher’s “Circle Limit III”. The underlying structure of this hyperbolic tessellation is determined by the Schläfli symbol (3, 4, 3, 4, 3, 4). Each 3 in the Schläfli symbol indicates a triangle, and each 4 indicates a square (Sazdanovic, 2012b). The order in which these numbers appear means that at every vertex of this tessellation, one can find a triangle, followed by a square, followed by a triangle, etc. as is shown in Fig. 8.

Fig. 8 Hyperbolic tessellation (3, 4, 3, 4, 3, 4) (left), with the tile spanned by green vertices (middle), tile filled in with a pattern inspired by Escher’s “Circle Limit III” (right)

Fig. 9 “Escheresque” image created by M. Sremcevic and R. Sazdanovic using software Tess (Sazdanovic and Sremcevic, 2004)

Presumably, one could pick any finite sequence of numbers in this manner, where a number n represents an n-gon, and attempt to construct a tessellation that this sequence describes. However, a given sequence may have one, many, or no corresponding tessellations (Conway et al., 2016). Once we have the tessellation, we can use some algebra and geometry to analyze its symmetries to determine the tile, that is, the smallest piece of the plane that could be used to cover the whole plane using the symmetries of the tessellation. As shown in the middle image in Fig. 8, the tile for tessellation (3, 4, 3, 4, 3, 4) is spanned by the green vertices and inscribed in the red circle. It is important to note that all triangles are congruent and equilateral, and the same holds for the squares; their appearance and size vary because of their representation within the Poincaré disk model for the hyperbolic plane (Sazdanovic, 2012a,b). Finally, the choice of the three-fish pattern shown on the right of Fig. 8 for the tile creates a tessellation in Fig. 9 which resembles “Circle Limit III”.

It is important to note that the structure of the tessellation is determined, often not uniquely, by the Schläfli symbol, but the choice of the pattern in the tile is completely up to the artist. The geometry and combinatorics behind tessellations provide the framework for creative, artistic expression. Both tessellations in Fig. 10 have the underlying structure of hyperbolic tessellation (3, 4, 3, 4, 4) combined with patterns of increasing complexity and size. A striking difference between these and the previous tessellation in Fig. 9 stems from using circular patterns for the tile. Moreover, they are examples of conceptual, generative, and algorithmic art and the interplay between the artist, mathematics, and the algorithm. The rules of the game are as follows: the artist picks the Schläfli sequence of numbers; mathematics either says “End game” if the symbol does not determine a tessellation or hands one or more tessellations back to the artist together with the shape of each corresponding tile. Then the artist picks the pattern for the tile and hands it back to the algorithm. The mathematics and the algorithm behind the software Tess determine the order in which the circles are drawn, hence the appearance of the artwork as a whole. The choice of a particular motif can alter or hide the mathematical structure, obscuring or enhancing internal symmetries. The differences between “Disoriented” (Fig. 10, left) and “Disoriented and Confused” (Fig. 10, right) are caused partly by the color, complexity, and size of their tile patterns but most importantly by their placement with respect to the tile.
“Disoriented and Confused” is an example of breaking the symmetry of the original tessellation (Grünbaum and Shephard, 1987) by choosing a pattern that does not fit inside of the tile. This ultimately leads to overlaps and a more chaotic appearance.
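One small part of the "rules of the game" described above can be sketched numerically: given the sequence of polygons around a vertex, the sum of the corresponding Euclidean interior angles indicates which geometry a tessellation with that vertex type would have to live in. This is a standard angle-sum check and not the procedure implemented in Tess. It is only a necessary condition, since, as noted above, a sequence may still correspond to one, many, or no tessellations. The function name `vertex_geometry` is hypothetical.

```python
from fractions import Fraction


def vertex_geometry(config):
    """Classify a cyclic sequence of polygon sizes meeting at one vertex.

    A regular Euclidean n-gon has interior angle (1 - 2/n) * 180 degrees.
    An angle sum above 360 degrees cannot close up in the flat plane, so the
    polygons need the thinner angles available in the hyperbolic plane.
    """
    total = sum(Fraction(n - 2, n) for n in config)  # in units of 180 degrees
    if total < 2:
        return "spherical"
    if total == 2:
        return "Euclidean"
    return "hyperbolic"


for config in [(3, 4, 3, 4, 3, 4), (3, 4, 3, 4, 4), (4, 4, 4, 4), (3, 3, 3, 3, 3)]:
    print(config, vertex_geometry(config))
```

Exact fractions are used so that the flat case, where the angle sum is exactly 360 degrees, is not misclassified by rounding.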

Fig. 10 “Disoriented” (left) and “Disoriented and Confused” (right) art by R. Sazdanovic. The underlying structure is that of a tessellation but Disoriented has gaps, while Disoriented and Confused has both gaps and overlaps: both prohibited under the definition of tessellation


Let us go back to the time when Japanese mathematician Seki Takakazu was developing ideas from calculus and introducing determinants (Joseph, 2010). It was then that sangaku, geometric theorems or problems painted on wooden tablets, were also created. An example is shown on the left in Fig. 11. Placed as offerings at Shinto shrines and Buddhist temples during Japan’s Edo period (1603–1868) by members of all social classes, sangaku were often presented as mathematical solutions to questions or challenges to congregants (Clark, 2016). “Sugaku-geijutsu I” (2018), shown on the right in Fig. 11, uses sangaku from the Tashiro Shrine of Gifu Prefecture as a tile for (4, 4, 4, 4, 4, 4). The result is a modern homage to a culturally and historically important moment in mathematics.

Fig. 11 A photo from Tashiro Shrine, Gifu Prefecture, January 2015 by David Clark (left); “Sugaku-geijutsu I” (2018) by R. Sazdanovic (right)

“Knotted DiscoTess” (2018) combines R. Sazdanovic’s research interest in knots with visualizations of hyperbolic space (Fig. 12). The tangle, or piece of a knot, used as a tile pattern was created using KnotPlot by R. Scharein (Scharein, 1998; Scharein and Booth, 2002) and carefully chosen to match the internal symmetries of the underlying tessellation (5, 5, 5, 5, 5, 5, 5). This tessellation contains seven pentagons at each vertex, and it was created using software by Malin Christersson that allows the user to create a tessellation by choosing one of the hyperbolic tessellations and any image (Christersson, 2015).

Fig. 12 “Knotted DiscoTess” (2018) tessellation by R. Sazdanovic with the pattern consisting of a tangle created by KnotPlot by R. Scharein (Scharein, 1998)

Created using the software Tess (Sazdanovic and Sremcevic, 2003) and KnotPlot (Scharein, 1998), R. Sazdanovic’s tessellations showcase different symmetries of hyperbolic, negatively curved space. Together, these images convey the beauty and subtlety of this deep mathematical concept, providing the viewer with an intuitive understanding of hyperbolic geometry. They were created as a part of the exhibition and associated webapp “Tess-celestial” created within the Immersive Scholar initiative (Sazdanovic et al., 2018).
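The earlier remark that congruent tiles appear to change size within the Poincaré disk model can be illustrated numerically. The sketch below is not part of Tess or of any software cited here. It uses the standard distance formula for the Poincaré disk of curvature −1, and the function name and the choice of sample points are assumptions made for illustration.

```python
import math


def poincare_distance(u: complex, v: complex) -> float:
    """Hyperbolic distance between two points of the open unit disk (curvature -1)."""
    num = 2 * abs(u - v) ** 2
    den = (1 - abs(u) ** 2) * (1 - abs(v) ** 2)
    return math.acosh(1 + num / den)


# Ten points along a radius, equally spaced by 0.1 in the Euclidean sense.
radii = [k / 10 for k in range(10)]
steps = [
    poincare_distance(complex(a, 0), complex(b, 0))
    for a, b in zip(radii, radii[1:])
]
for a, step in zip(radii, steps):
    print(f"from radius {a:.1f}: hyperbolic length {step:.3f}")

# Steps that look equal on the page get hyperbolically longer toward the boundary.
assert all(s < t for s, t in zip(steps, steps[1:]))
```

Read in reverse, segments of equal hyperbolic length look shorter and shorter on the page as they approach the boundary circle, which is why congruent triangles and squares in the figures seem to shrink toward the edge of the disk.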

Artistic Ideas in Mathematics

So far we have discussed many of the mathematical aspects of art. In his May 1935 obituary for visionary mathematician Emmy Noether (Einstein, 1935), Albert Einstein addresses the artistic nature of mathematics.

Pure mathematics is, in its way, the poetry of logical ideas. One seeks the most general ideas of operation which will bring together in simple, logical and unified form the largest possible circle of formal relationships. In this effort toward logical beauty spiritual formulas are discovered necessary for the deeper penetration into the laws of nature.

This quote highlights the creativity and importance of mathematical abstraction and the sense in which abstract thinking strengthens the eventual application of mathematics. The goal of this section is to explore a collection of artistic ideas in mathematics centered around the common theme of the graph. We have chosen graphs because of their simple yet satisfying and inherently visual nature and because they are essential objects in the research of both authors (Adamaszek et al., 2019; Beier et al., 2016; Bhakta et al., 2019; Dasbach and Russell, 2018; Gasparovic et al., 2018; Jablan and Sazdanović, 2007, 2008; Jablan et al., 2011; Khovanov and Sazdanovic, 2015, 2020a,b; Lowrance and Sazdanović, 2017; Pabiniak et al., 2009; Przytycki and Sazdanovic, 2012; Russell, 2013; Sazdanović and Yip, 2015).

Thanks to courses like algebra and calculus, the reader is most likely familiar with the concept of the graph of a function as illustrated in Fig. 13a. This type of a graph, with perpendicular axes representing inputs and outputs, is important in a variety of contexts but in general does not fit into the framework of graphs we will explore here. The graphs we will discuss, those coming from discrete mathematics like the one in Fig. 13b, are not as familiar to some since they tend not to show up in standard pre-university math courses. Nevertheless they are fundamental objects in mathematics and beyond, both simple to describe and, at the same time, leading to rich, interesting, and highly creative mathematics. We begin by exploring graphs and their properties with an emphasis on visualization. Then, we show how these ideas are used via two examples of the application of graphs.

Fig. 13 Two types of graphs. (a) The graph of the function $f(x) = x^2$. (b) A discrete graph

Graphs and Their Visualizations Graphs are ubiquitous in mathematics and its applications. They are used to represent everything from algebraic structures and geometric objects to social networks and cartographic maps. A graph G = (V , E) consists of a set V of vertices (or nodes) and a set E of edges connecting pairs of vertices. A graph can be represented purely via mathematical symbols. For instance, G = ({a, b, c, d}, {ab, ac, ad, bc, bd, cd}) describes a graph with four vertices and every possible connection between distinct pairs of vertices. This is an example of a complete graph since there is an edge between every pair of vertices. To understand the structure of a graph more deeply and intuitively, it is often useful to visualize it. This is typically done by representing each vertex as a dot and each edge as a curve connecting the corresponding pair of dots. We use the word curve intentionally to emphasize edges do not have to be represented by straight line segments. There are infinitely many ways one can pictorially represent a graph using dots and curves, so there is a great deal of flexibility involved in graph visualization. Figure 14 shows four ways the graph G described in the previous paragraph can be represented. A fundamental difference between the instantiations of G in Fig. 14 is that Fig. 14a has intersecting edges, whereas the graphs in Fig. 14b,c do not. The broken line segment in Fig. 14c indicates that one edge is above the other from the perspective of the viewer. This diagrammatic trick allows us to use a twodimensional rendering to describe G as a subset of three-dimensional space. We can think of this graph three-dimensionally as having a curved top edge or noncoplanar vertices. A graph is said to be embedded in a space if it has no self-intersections;

18 Mathematics and Art: Unifying Perspectives

Fig. 14 Three diagrams of the complete graph G on four vertices. (a) A non-embedding. (b) Two different embeddings in R^2. (c) An embedding in R^3

Fig. 15 A Hopf link in a graph embedding. (a) Unlinked circles. (b) A Hopf link. (c) An intrinsically linked graph

a graph that is embeddable in a flat, infinitely extending plane is called planar. Figure 14b demonstrates that a complete graph on four vertices is planar. An interesting classical result is that complete graphs on more than four vertices are not planar. In fact, complete graphs are key in characterizing exactly which graphs have the planarity property (Kuratowski, 1930; West, 2000).

While only some graphs are planar, every graph can be embedded in three-dimensional space. This fact leads to interesting questions about properties of graph embeddings, which are the focus of spatial graph theory. The spatial study of graphs is closely connected to the study of knots, which we will touch on below, and the study of rigid molecules in chemistry. A graph is said to be intrinsically linked if no matter how it is embedded in three-dimensional space, the embedding contains a pair of linked circles. The complete graph on six vertices is intrinsically linked. See Fig. 15 for an example of an embedding of this graph with linked circles highlighted in red and blue. As is the case with planar graphs, there is an exact characterization of graphs that are intrinsically linked (Conway and Gordon, 1983).

We can also consider graphs to be embedded in subspaces of two- or three-dimensional space. For instance, we might consider a graph to be embedded in a disk or sphere or the doughnut-shaped surface called a torus. We stated earlier that the complete graph on five vertices is not planar. It follows that it cannot be embedded on the sphere. Figure 16 demonstrates that it can be embedded on the torus. In fact, given a finite graph there are infinitely many surfaces on which it can be embedded. An interesting question is to find the simplest such surface for a particular graph.
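The claim that complete graphs on more than four vertices are not planar can be checked against a standard counting bound. The sketch below is an illustration, not part of the original chapter. It builds the complete graph from the symbolic description G = (V, E) and tests the well-known necessary condition that a simple planar graph with at least three vertices has at most 3|V| - 6 edges. The function names are hypothetical. The bound is only necessary, so passing it does not prove planarity.

```python
from itertools import combinations


def complete_graph(vertices):
    """Return (V, E) for the complete graph on the given vertices."""
    vs = list(vertices)
    # One edge for every unordered pair of distinct vertices.
    edges = [frozenset(pair) for pair in combinations(vs, 2)]
    return vs, edges


def passes_planar_edge_bound(num_vertices, num_edges):
    """Necessary (not sufficient) condition for a simple graph to be planar."""
    if num_vertices < 3:
        return True
    return num_edges <= 3 * num_vertices - 6


if __name__ == "__main__":
    for n in range(4, 7):
        vs, es = complete_graph(range(n))
        print(n, len(es), passes_planar_edge_bound(len(vs), len(es)))
```

The complete graph on four vertices has six edges and meets the bound with equality, while the complete graphs on five and six vertices have ten and fifteen edges and exceed it.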


H. M. Russell and R. Sazdanovic

Fig. 16 A graph embedded on a torus

Fig. 17 A graph (in blue) embedded in a disk together with its dual (in red)

A graph G embedded on a surface divides the surface into regions we call faces. Using this additional structure, we can construct a dual graph G∗ on the same surface where vertices of G∗ lie in the faces of G and edges in G∗ indicate adjacent faces of G. Dual graphs are an important tool in graph theory and one example of a more general mathematical theme of constructing pairs of dual objects. Figure 17 below is an example of a graph embedded in a disk together with its dual. For a connected graph G in the plane, it is always true that (G∗ )∗ = G. Our example in Fig. 17 demonstrates this is not always true in more general settings. More on this and many other subtleties can be found in Mohar and Thomassen (2001). The concept of a dual can also be used to generate a graph from any cartographic map as is shown in Fig. 18. It is common practice to decorate graphs with additional structure. For instance, one can assign labels such as numbers or letters to the vertices or edges of a graph; the labels are often diagrammatically represented using colors which is one reason this labeling is called graph coloring. A vertex coloring is called proper if no two distinct vertices connected by an edge have the same label. A natural question is to find the minimum number of colors needed to properly color a specific graph. This quantity is known as the chromatic number of the graph. The four-color theorem, which states that, using four colors, any cartographic map can be colored in such a way that adjacent regions have different colors, is equivalent to the statement


Fig. 18 A graph relating regions of a map and a coloring of the vertices in the graph

Fig. 19 Proper colorings of graphs. (a) Bipartite. (b) Not bipartite


that the chromatic number of the graph coming from a map is at most four (Appel and Haken, 1977; Robertson et al., 1997). Figure 18 shows such a graph and an associated proper coloring. (The underlying map in Fig. 18 is modified from an image obtained via Creative Commons. “File:1st Circuit map.svg” by MarginalCost is licensed under CC BY-SA 4.0.) Note that, while we use four colors in the figure, only three colors are needed for this particular map.

An important class of graphs consists of those with chromatic number two; these are called bipartite graphs. Figure 19a shows a graph that is two-colorable and therefore bipartite, while the graph in Fig. 19b has chromatic number three and is hence not bipartite. Bipartite graphs have many applications including in coding theory and in the modeling of distributed and concurrent systems in computer science (Moon, 2005; Peterson, 1981).
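The definitions of proper coloring and chromatic number can be made concrete with a brute-force search. The following is a minimal sketch for very small graphs only; it is not an efficient algorithm and does not come from the chapter. The helper names and the example graphs (cycles and the complete graph on four vertices) are chosen here purely for illustration.

```python
from itertools import combinations, product


def is_proper(coloring, edges):
    """A coloring is proper if the endpoints of every edge get different labels."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(vertices, edges):
    """Smallest k admitting a proper coloring with k colors (exhaustive search)."""
    vertices = list(vertices)
    if not vertices:
        return 0
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            if is_proper(dict(zip(vertices, colors)), edges):
                return k
    return None  # only reached if some edge joins a vertex to itself


def cycle(n):
    """Vertices and edges of the cycle on n vertices."""
    return list(range(n)), [(i, (i + 1) % n) for i in range(n)]


if __name__ == "__main__":
    k4_vertices = list(range(4))
    k4_edges = list(combinations(k4_vertices, 2))
    print(chromatic_number(*cycle(4)))   # even cycle
    print(chromatic_number(*cycle(5)))   # odd cycle
    print(chromatic_number(k4_vertices, k4_edges))
```

Under the definition used in the text, a graph is bipartite exactly when this function returns two, which happens for the four-cycle but not for the five-cycle.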

Examples of Graphs

While the preceding exposition barely scratches the surface of graph theory, hopefully it convinces the reader of the beautiful and rich ideas that can come from exploring a seemingly simple mathematical concept. We now turn to examples of the application of graphs in two different areas: knot theory and reconfiguration theory.

Fig. 20 A shadow together with crossing information yields a knot diagram. (a) Shadow of a diagram. (b) Crossing information choices. (c) A knot diagram

Knots and Graphs

A knot is an embedding of a circle in three-dimensional space. Knots arise both in pure mathematics and in fields like chemistry, physics, and biology. An accessible introduction to knot theory, including most of the ideas in this section, can be found in Adams (2004). As we saw in Fig. 14c above, we can represent three-dimensional embeddings two-dimensionally by using broken lines to indicate crossing information from the perspective of the viewer. This leads to the concept of a knot diagram. So that we may apply constructions from graph theory, we sometimes think of knot diagrams as planar graphs with vertices labeled or “colored” by crossing information. The underlying graph is referred to as the shadow of the diagram. Crossing information identifies the overstrand and understrand at each vertex of the shadow, yielding a knot diagram. This is illustrated in Fig. 20.

Knot theory has many other connections to graph theory. From the beginning, graphs have been used to study knots (Tait, 1877, 1878); conversely, knot theory has contributed essential ideas to the study of spatial graphs, graph invariants, and many other topics in graph theory (Flapan et al., 2017; Helme-Guizon and Rong, 2005; Sazdanović and Yip, 2015). Like graphs, knots interact with surfaces in interesting ways. Two examples of this are shown in Fig. 21. (The images in Fig. 21 were obtained via Creative Commons. “File:Noeud de trefle et surface de seifert.svg” by Accelerometer in Fig. 21a is licensed under CC BY-SA 3.0. “File:TorusKnot3D.png” by Michiel Sikma in Fig. 21b is licensed under CC BY-SA 2.5.) A link is a collection of circles embedded in space. See Fig. 15 for an example. Figure 21a shows a Seifert surface for a link with three components. Constructed via Seifert’s algorithm, this is a particular orientable surface for which the link is the boundary (Seifert, 1935). Figure 21b shows an example of a torus knot which, as the name suggests, can be embedded on the torus. The pair of integers (3, −7) describes how the knot wraps around the torus.

Fig. 21 Knots interacting with surfaces. (a) A link and its Seifert surface. (b) A (3, −7) torus knot

Fig. 22 A checkerboard shaded diagram and its two dual Tait graphs. (a) A checkerboard shading. (b) Overlaid dual Tait graphs. (c) Individual Tait graphs

There is a variety of graphs we can associate to a knot diagram to analyze its structure. A classical example is the Tait graph, also referred to as the checkerboard graph since it comes from a checkerboard shading of the diagram like the one shown in Fig. 22a. Every knot diagram has two Tait graphs associated to it: one for shaded


regions and one for unshaded regions of the checkerboard coloring. Part of the data of a Tait graph is its embedding, which interacts with the knot diagram in a particular way. The vertices correspond to either shaded or unshaded regions, and the edges pass through crossings that connect regions with the same shading. A checkerboard shading and its two associated Tait graphs are shown in Fig. 22. The reader may notice that the two Tait graphs in Fig. 22 are dual to one another. By construction, this is always the case. Another observation is that the Tait graph does not depend on the crossing data for the diagram. Hence, the Tait graph is really an object associated to the shadow of the diagram.

A common operation on knot diagrams is smoothing. This is used in Seifert’s algorithm mentioned above and also extensively applied to compute knot invariants, which are functions used to distinguish different knots. To smooth a crossing in a diagram, we reconnect the four strands coming into the crossing in such a way that they no longer cross. A choice of smoothing at every crossing results in a resolution of the diagram. Figure 23 shows a diagram, local smoothing choices, and a resolution with labels indicating which smoothings were chosen at each crossing. As the figure shows, every smoothing can be uniquely labeled either A or B according to the following convention. Begin by viewing the crossing so that the overstrand extends from the bottom left to the top right and the understrand extends from the bottom right to the top left. From this perspective, an A-smoothing connects the regions above and below the crossing, whereas a B-smoothing merges the left and right regions.

Smoothings interact with Tait graphs in a striking way. Around each crossing of a checkerboard-colored diagram, the regions alternate shading. As a result, the two dual Tait graphs intersect in a transverse fashion at each crossing. A choice of smoothing can be thought of as merging two shaded regions or two unshaded regions and choosing a corresponding Tait graph edge passing through the crossing. This idea is illustrated in Fig. 24. Thistlethwaite capitalizes on this relationship between Tait graphs and smoothings to reimagine an important knot invariant called the Jones polynomial (Thistlethwaite, 1987).

Fig. 23 A resolution of a diagram is obtained by smoothing all crossings. (a) A knot diagram. (b) Local smoothing choices. (c) A resolution of the diagram

Fig. 24 A resolution together with the corresponding collection of Tait edges. (a) Smoothing and selecting Tait edges. (b) A resolution

Reconfiguration Systems

Our final example of an application of graphs is reconfiguration. Given a problem with a well-defined set of solutions and a rule for transitioning between solutions, one can construct a reconfiguration system. Such a system can be represented as a graph with vertex set corresponding to the set of solutions to the problem and edges representing transitions between solutions according to the given rule. By describing the solution set as a graph, we can use tools from graph theory to characterize its structure. Reconfiguration systems arise in a variety of settings including robotics (Nishimura, 2018).

We return to knot theory for a preliminary example of a reconfiguration system. As we have seen, there are two ways to smooth each crossing in a knot diagram, and a choice of smoothing at every crossing results in a resolution of the diagram. Since there are two choices of smoothing at each crossing, a diagram with n crossings has 2^n resolutions. To obtain a reconfiguration system, we equip the set of resolutions with the transition rule given by switching the choice of smoothing at one crossing. The graph for this reconfiguration system, sometimes called a cube of resolutions, is shown in Fig. 25. For any n-crossing diagram, this reconfiguration system will have the same underlying graphical structure, so its structure does not directly enable us to distinguish different knots. However, organizing a diagram’s resolutions in this way provides a useful framework for computing many knot invariants including the Jones polynomial (Bar-Natan, 2002).
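The combinatorial shape of a cube of resolutions can be sketched without drawing any knots. In the illustration below, a resolution is modeled only as a string of smoothing labels A and B, one per crossing, and two resolutions are joined when they differ at exactly one crossing. This encoding and the function name are assumptions made for the example. The code records the underlying graph and nothing about the smoothed diagrams themselves.

```python
from itertools import product


def cube_of_resolutions(n):
    """Graph whose vertices are A/B smoothing choices at n crossings.

    Two choices are adjacent when they differ at exactly one crossing.
    """
    states = ["".join(labels) for labels in product("AB", repeat=n)]
    edges = [
        (s, t)
        for i, s in enumerate(states)
        for t in states[i + 1:]
        if sum(a != b for a, b in zip(s, t)) == 1
    ]
    return states, edges


if __name__ == "__main__":
    # A standard trefoil diagram has three crossings.
    states, edges = cube_of_resolutions(3)
    print(len(states), len(edges))  # 8 vertices and 12 edges, an ordinary cube
```

For three crossings the result has 2^3 = 8 vertices and 12 edges, which is the graph of an ordinary cube. The output depends only on n, in line with the remark that the graph alone cannot distinguish different knots.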


Fig. 25 A cube of resolutions for the trefoil

Fig. 26 Proper colorings correspond to good schedules. (a) Bad schedule. (b) Good schedule


We have already discussed graph coloring abstractly and in the context of the four-color theorem. It can also be used to model a variety of other applied problems including networking or scheduling (Marx, 2004). To do this, we construct a graph that encodes all scheduling requirements and constraints. We begin by assigning one vertex to each event that needs to be scheduled. If the events are meetings, perhaps one person needs to attend two or more of the meetings. If the events are jobs within a computational system, perhaps two jobs need to access the same resource. To encode such constraints, we put an edge between vertices corresponding to events that cannot be scheduled at the same time. If we let colors represent meeting times, a proper vertex coloring of the resulting graph corresponds to a good schedule since no pair of events connected by an edge will be scheduled at the same time. This is demonstrated in Fig. 26.

From here, we can describe a graph coloring reconfiguration system as follows. Fix a graph G that encodes a particular scheduling scenario, and let k be the maximum number of meeting times. As we have just discussed, the problem of finding a schedule satisfying all constraints using at most k meeting times can be rephrased in the language of graph theory as properly vertex coloring G with at most k colors. Our rule for transitioning between solutions is rescheduling one meeting or, in the language of graph theory, recoloring one vertex. The resulting graph coloring reconfiguration system, denoted by C_k(G), has one vertex for each proper k-coloring of G with edges between colorings if they differ at only one location. Note that C_k(G) is a sort of meta-graph as it is a graph with vertices that are graph colorings. An example is shown in Fig. 27 for the path on three vertices, which we denote by P_3.
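The construction of C_k(G) described above can be sketched by direct enumeration. The code below is an illustrative sketch, not the authors' software. It lists every proper coloring of G that draws from k available colors and joins two colorings when they differ at exactly one vertex. The function names and the vertex labels a, b, c for the path P_3 are assumptions made for the example.

```python
from itertools import combinations, product


def proper_colorings(vertices, edges, k):
    """All assignments of colors 0..k-1 to vertices with no monochromatic edge."""
    vertices = list(vertices)
    result = []
    for colors in product(range(k), repeat=len(vertices)):
        coloring = dict(zip(vertices, colors))
        if all(coloring[u] != coloring[v] for u, v in edges):
            result.append(colors)
    return result


def coloring_graph(vertices, edges, k):
    """Vertices are proper colorings; edges join colorings differing at one vertex."""
    nodes = proper_colorings(vertices, edges, k)
    links = [
        (a, b)
        for a, b in combinations(nodes, 2)
        if sum(x != y for x, y in zip(a, b)) == 1
    ]
    return nodes, links


if __name__ == "__main__":
    # The path on three vertices a - b - c, colored with three colors.
    nodes, links = coloring_graph("abc", [("a", "b"), ("b", "c")], 3)
    print(len(nodes), len(links))  # 12 colorings, 15 single-vertex recolorings
```

For P_3 with three colors the enumeration finds 12 proper colorings and 15 transitions. The number of candidate assignments grows as k to the power of the number of vertices, which is one way to see why these systems become large so quickly.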

Fig. 27 The graph coloring reconfiguration system C_3(P_3)

Fig. 28 A graph coloring reconfiguration system with cubical structure. (a) The graph G. (b) C_3(G)

Graph coloring reconfiguration systems get large very quickly, so it is imperative to have software to visualize anything beyond simple examples. With the aid of such software, we begin to notice properties of these systems. In particular, they seem to have a locally cube-like structure. Figure 28 shows an example of this phenomenon. We emphasize that each vertex in Fig. 28b represents a coloring of the graph in Fig. 28a. This cubical structure is not just coincidental. It is a theorem that every reconfiguration system has an associated cubical complex where the cubes represent reconfigurations that are independent of one another (Ghrist and Peterson, 2007). The graph for the reconfiguration system is embedded as the so-called one-skeleton of this complex. A cubical complex can be endowed with a geometric structure. In fact, complexes coming from reconfiguration problems have nonpositive curvature and are known as locally CAT(0) spaces (Gromov, 1987). A useful tool for studying symmetries of reconfiguration systems is their associated block-cut (or BC) trees. A cut vertex in a graph has the property that removing it disconnects the graph. A subgraph of a graph is called biconnected if it does not have cut vertices. In Fig. 29a, cut vertices are colored yellow, and maximal biconnected components are circled. The BC tree associated to a graph has one vertex for each maximal biconnected component and one vertex for each cut vertex.

Fig. 29 Constructing a BC tree from a graph. (a) Graph with cut vertices. (b) The associated BC tree

Fig. 30 Graph coloring reconfiguration systems exhibit radial symmetry. (a) Base graph G. (b) C_4(G). (c) The BC tree for C_4(G)

There is an edge between a biconnected component and a cut vertex if the cut vertex lies in the component. A simple example is shown in Fig. 29. Examining BC trees for graph coloring reconfiguration systems, a radial symmetry becomes apparent. Indeed, we can prove that connected graph coloring reconfiguration systems always have a star-shaped, radial structure (Bhakta et al.). For this reason, we call the central biconnected component of such a system its nucleus. Examples of a base graph, coloring graph, and BC tree are shown in Fig. 30. Note that symmetry in the BC tree indicates the same symmetry is present in the coloring graph. This emphasizes the power of the BC tree since symmetry is not readily apparent when examining Fig. 30a.
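The block-cut tree construction described above can be sketched with a depth-first search. The code below uses the classical edge-stack technique usually credited to Hopcroft and Tarjan. This is a choice made for the illustration and not necessarily the method used by the authors. The adjacency-dictionary input format, the function names, and the small example graph (two triangles sharing a vertex, plus a pendant edge) are all assumptions. The recursion is only suitable for small graphs, and isolated vertices are ignored.

```python
def biconnected_components(adj):
    """Return (blocks, cut_vertices) for a simple graph given as {v: set_of_neighbors}."""
    index, low = {}, {}
    edge_stack, blocks, cut_vertices = [], [], set()
    counter = [0]

    def dfs(v, parent):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        children = 0
        for w in adj[v]:
            if w not in index:
                children += 1
                edge_stack.append((v, w))
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if low[w] >= index[v]:
                    # Nothing below w reaches above v, so a block closes at v.
                    if parent is not None or children > 1:
                        cut_vertices.add(v)
                    block = set()
                    while True:
                        a, b = edge_stack.pop()
                        block.update((a, b))
                        if (a, b) == (v, w):
                            break
                    blocks.append(frozenset(block))
            elif w != parent and index[w] < index[v]:
                # Back edge to an ancestor.
                edge_stack.append((v, w))
                low[v] = min(low[v], index[w])

    for v in adj:
        if v not in index:
            dfs(v, None)
    return blocks, cut_vertices


def bc_tree(adj):
    """One node per block, one node per cut vertex, joined when the vertex lies in the block."""
    blocks, cuts = biconnected_components(adj)
    nodes = [("block", b) for b in blocks] + [("cut", c) for c in cuts]
    edges = [(("block", b), ("cut", c)) for b in blocks for c in cuts if c in b]
    return nodes, edges


if __name__ == "__main__":
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3, 5}, 5: {4}}
    nodes, edges = bc_tree(adj)
    print(len(nodes), len(edges))
```

In the example, vertices 2 and 4 are the cut vertices and the blocks are {0, 1, 2}, {2, 3, 4}, and {4, 5}, so the resulting tree has five nodes and four edges.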

Unifying Perspectives

In this final section, we reflect on the creativity involved in the process of doing and making mathematics, drawing parallels with the creation of an art piece. Of course, visual areas of mathematics like graph theory have a literal connection to the arts as they seek to communicate ideas pictorially, but we claim the connection is much deeper.


Solving a mathematical problem is akin to painting: it is a nonlinear process that requires a lot of imagination. Applying results from theoretical mathematics to problems in other sciences is often like cubism: one needs to analyze the object, take it apart, and then put it back together in an abstract form. Every idea – from embedding graphs to computing their duals to coloring them – was at some point introduced by an imaginative mind in an attempt to solve or understand a problem. Additional creativity is involved in asking questions that push these ideas further and connect them with other concepts. Indeed, graph coloring was initially motivated by the map coloring problem. What started as a method to study map coloring was then applied to other problems like scheduling. Labeling vertices with colors became labeling vertices or edges with more complicated quantities and mathematical structures. Mathematicians play with assumptions, rules, and variations to see what happens. This is the “jazz” of mathematics.

Another analogy between math and art stems from the relationship between an idea and its realization. A proof realizes a mathematical idea just as an art piece brings to life an artist’s vision. Just as artists have different styles, so too do mathematicians differ in the way they present ideas. For instance, some are detail-oriented, while others focus on the big picture. Some write to be accessible to a wide audience, and others require readers to approach the work with extensive prior knowledge. Even different proofs of the same statement are not considered equal. First is the controversial issue, also seen in the arts, of whether or not a proof is elegant (Aigner and Ziegler, 2009; Hardy, 1992). Beyond that, one proof may be preferred over another because of the mathematics it uses. For example, a proof might use a simpler set of tools, provide a unique insight, or offer opportunity for generalization.
This is similar to the idea that different depictions of the same object in art – perhaps one figurative and the other abstract or one coming from realism and the other surrealism – can bear little to no resemblance to one another.

The argument that substantiates a mathematical truth is just as much the artwork of a mathematician as is the theorem to which it leads. The process of formulating such arguments requires inspiration, experience, creativity, and patience. Artists and mathematicians both must learn to cross the abyss between an idea and its substantiation.

In the process of proving things, mathematicians may need to define new quantities or concepts, like graphs, for instance, in order to build a solid framework for an argument. In this way, a graph can be seen as a choice of mathematical medium for problem-solving. Choosing to reason using graphs as opposed to some other mathematical object is an important decision that will shape the results that emerge. This process is analogous to the artist choosing a medium. Just as animation can easily communicate the idea of movement, graphs are a natural framework for studying knots.

The concept of a free ride in mathematics, first introduced by Shimojima (1996), points at an important parallel between visually or diagrammatically driven mathematical media like graphs and the visual arts. De Toffoli explains the concept of a free ride as follows, “A diagram produced to represent a textually specified object or situation gives us a free ride when it reveals


a consequence not logically deducible from the specification alone” (De Toffoli, 2019). By translating mathematical information into a visual or diagrammatic framework, consequences that were not obvious may become apparent. Of course, these consequences can be proven without the diagrams, but the diagrams provide the insight. This is a striking connection between mathematics and art. Experiencing a piece of artwork can bring new understanding and fundamentally change the viewer’s perspective. Visualizing mathematics can enable the mathematician to see a problem in new and exciting ways leading to new discoveries. Mathematicians, like artists, likely have at least one preferred medium though they may experiment with several over time. Like an artist, the mathematician at the beginning of their career may mimic the approaches and constructions of others to solve problems. Over time and with work, the mathematician develops a unique voice and perspective. As happens in the arts, there are exciting moments in the unfolding of mathematics when a new idea is introduced. This idea sends ripples through the community as mathematicians investigate its implications in their own mathematical territories. Selecting the medium for doing mathematics is one choice, but there are additional stylistic decisions that must be made. In the case of graphs, what will vertices represent? What will edges encode? Will the graph be embedded? If so, how and where? Should the edges or vertices be labeled? Each of these choices impacts the structure of the graph and which additional tools can be used to analyze it. The mathematician chooses which properties of a mathematical object are retained when it is represented as a graph and how they are communicated. Similarly, the artist has a set of choices to make about color, perspective, level of abstraction, and composition. Yet another aspect of creativity in mathematics is the immense level of detail it requires. 
In order for our theories to be solid and correct, there must be a careful accounting of how precisely we represent the object of study in our chosen medium and how we encode each of its properties. The ideas we produce may be surprising, revolutionary, or beautiful, but the structure that leads us there should not be debatable or subjective. We see this same level of detail play out in the artwork of Helaman Ferguson. Mathematics, like art, demonstrates that creativity and structure are not at odds with one another.

Conclusion

Mathematics and art share many connections. The language of mathematics can be used to analyze and understand art. Mathematics inspires art in literal and figurative ways. Not only mathematical visualizations but also mathematical structures, ideas, and writing have artistic merit. The creative processes of the mathematician and the artist have many parallels. The centuries-old relationship between art and mathematics continues to evolve, all the while remaining an important and mutually beneficial connection worthy of recognition and celebration.


Cross-References

Mathematics and Art: Connecting Mathematicians and Artists
The Beauty of Blaschke Products

References

Adams CC (2004) The knot book. American Mathematical Society, Providence. An elementary introduction to the mathematical theory of knots, Revised reprint of the 1994 original
Adamaszek M, Adams H, Gasparovic E, Gommel M, Purvine E, Sazdanovic R, Wang B, Wang Y, Ziegelmeier L (2019) On homotopy types of Vietoris–Rips complexes of metric gluings. arXiv preprint arXiv:1712.06224
Aigner M, Ziegler GM (2009) Proofs from the book, 4th edn. Springer Publishing Company, Incorporated
Appel K, Haken W (1977) Every planar map is four colorable. Part I: discharging. Illinois J Math 21(3):429–490, 09
Arnold VI (1990) Huygens and Barrow, Newton and Hooke: pioneers in mathematical analysis and catastrophe theory from evolvents to quasicrystals. Springer Science & Business Media, Cham
Attenberg L (2005) Modularity: understanding the development and evolution of natural complex systems. MIT Press, Cambridge
Bar-Natan D (2002) On Khovanov’s categorification of the Jones polynomial. Algebr Geom Topol 2(1):337–370
Barrow I, Whewell W (1860) The mathematical works of Isaac Barrow . . . University Press, Cambridge
Beier J, Fierson J, Haas R, Russell HM, Shavo K (2016) Classifying coloring graphs. Discrete Math 339(8):2100–2112
Bhakta P, Buckner BB, Farquhar L, Kamat V, Krehbiel S, Russell HM (2019) Cut-colorings in coloring graphs. Graphs Combin 35(1):239–248
Bhakta P, Krehbiel S, Morris R, Russell HM, Sathe A, Su W, Xin M (In preparation) Symmetry and connectivity of coloring graphs
Clark D (2016) Seeking Sangaku: visiting Japan’s homegrown mathematics. Math Horizons 24(2):8–11
Conway JH, Gordon CM (1983) Knots and links in spatial graphs. J Graph Theory 7(4):445–453
Conway JH, Burgiel H, Goodman-Strauss C (2016) The symmetries of things. CRC Press, Boca Raton
Crissterson (2015) Make hyperbolic tilings of images. http://www.malinc.se/m/ImageTiling.php
D’Ambrosio U (2001) Mathematics across cultures: the history of non-Western mathematics, vol 2. Springer Science & Business Media, Dordrecht
Dasbach OT, Russell HM (2018) Equivalence of edge bicolored graphs on surfaces. Electron J Combin 25(1):1.59, 15
De Toffoli S (2019) Epistemic roles of mathematical diagrams. Ph.D. thesis, Stanford University
Edelsbrunner H, Harer J (2010) Computational topology: an introduction. American Mathematical Society, Providence
Einstein A (1935) The late Emmy Noether. New York Times
Ferguson H, Ferguson C (2011) Celebrating mathematics in stone and bronze. The Best Writing on Mathematics, pp 150
Flapan E, Mattman TW, Mellor B, Naimi R, Nikkuni R (2017) Recent developments in spatial graph theory. In: Knots, links, spatial graphs, and algebraic invariants. Contemporary mathematics, vol 689. American Mathematical Society, Providence, pp 81–102


Gasparovic E, Gommel M, Purvine E, Sazdanovic R, Wang B, Wang Y, Ziegelmeier L (2018) A complete characterization of the one-dimensional intrinsic Čech persistence diagrams for metric graphs. In: Research in computational topology. Springer, pp 33–56
Gerdes P (1994) Reflections on ethnomathematics. Learn Math 14(2):19–22
Gerdes P (1997) Survey of current work on ethnomathematics. Ethnomathematics: challenging Eurocentrism in mathematics education, pp 331–372
Ghrist R (2008) Barcodes: the persistent topology of data. Bull Am Math Soc 45(1):61–75
Ghrist RW (2014) Elementary applied topology, vol 1. Createspace Seattle
Ghrist R, Peterson V (2007) The geometry and topology of reconfiguration. Adv Appl Math 38(3):302–323
Gromov M (1987) Hyperbolic groups. In: Gersten SM (ed) Essays in group theory. Springer, New York, pp 75–263
Grünbaum B, Shephard GC (1987) Tilings and patterns. Courier Dover Publications, New York
Hardy GH (1992) A mathematician’s apology. Cambridge University Press, Cambridge
Helme-Guizon L, Rong Y (2005) A categorification for the chromatic polynomial. Algebr Geom Topol 5:1365–1388
Huylebrouck D (2019) Missing link. In: Africa and mathematics. Springer, pp 153–166
Jablan SV (2002) Symmetry, ornament and modularity, vol 30. World Scientific, River Edge
Jablan S, Sazdanović R (2007) Unlinking number and unlinking gap. J Knot Theory Ramifications 16(10):1331–1355
Jablan S, Sazdanović R (2008) Braid family representatives. J Knot Theory Ramifications 17(07):817–833
Jablan S, Radović L, Sazdanović R (2011) Tutte and Jones polynomials of links, polyominoes and graphical recombination patterns. J Math Chem 49(1):79–94
Jablan S, Radović L, Sazdanović R, Zeković A (2012) Mirror-curves and knot mosaics. Comput Math Appl 64(4):527–543
Joseph GG (2010) The crest of the peacock: non-European roots of mathematics. Princeton University Press, Princeton
Kepler J (1969) Harmonices mundi libri V. Forni, Bologna
Khovanov M, Sazdanovic R (2015) Categorifications of the polynomial ring. Fundam Math 3(230):251–280
Khovanov M, Sazdanovic R (2020a) Bilinear pairings on two-dimensional cobordisms and generalizations of the Deligne category. arXiv preprint arXiv:2007.11640
Khovanov M, Sazdanovic R (2020b) Diagrammatic categorification of the Chebyshev polynomials of the second kind. arXiv preprint arXiv:2003.11664
Knuth D (2009) Jablantile. https://www-cs-faculty.stanford.edu/~knuth/graphics.html
Kuratowski C (1930) Sur le problème des courbes gauches en topologie. Fundam Math 15(1):271–283
Lomonaco SJ, Kauffman LH (2008) Quantum knots and mosaics. Quant Inf Process 7(2–3):85–115
Lowrance AM, Sazdanović R (2017) Chromatic homology, Khovanov homology, and torsion. Topology Appl 222:77–99
Marcolli M (2020) Lumen naturae: visions of the abstract in art and mathematics. MIT Press, Cambridge
Marques S, Joan U (2004) The Dali dimension. https://www.amazon.com/Dali-DimensionDecoding-Mind-Genius/dp/B001BWYT4E
Marx D (2004) Graph colouring problems and their applications in scheduling
Mohar B, Thomassen C (2001) Graphs on surfaces. In: Johns Hopkins series in the mathematical sciences
Moon TK (2005) Error correction coding: mathematical methods and algorithms. Wiley-Interscience, Hoboken
Mößner N (2013) Photographic evidence and the problem of theory-ladenness. J Gen Philos Sci 44(1):111–125
Nishimura N (2018) Introduction to reconfiguration. Algorithms 11:52, 04


Pabiniak MD, Przytycki JH, Sazdanović R (2009) On the first group of the chromatic cohomology of graphs. Geom Dedicata 140(1):19
Peterson JL (1981) Petri net theory and the modeling of systems. Prentice Hall PTR, Englewood Cliffs
Poole D (2014) Linear algebra: a modern introduction. Cengage Learning, Stamford
Przytycki JH, Sazdanovic R (2012) Torsion in Khovanov homology of semi-adequate links. arXiv preprint arXiv:1210.5254
Reidy K (2018) Salvador Dalí and the hypercube. Sci Am
Restivo S (2013) Mathematics in society and history: sociological inquiries, vol 20. Springer Science & Business Media, Dordrecht
Robertson N, Sanders D, Seymour P, Thomas R (1997) The four-colour theorem. J Combin Theory Ser B 70(1):2–44
Russell HM (2013) An explicit bijection between semistandard tableaux and non-elliptic sl3 webs. J Algebraic Combin 38(4):851–862
Sazdanovic R (2012a) Diagrammatics in art and mathematics. Symmetry 4(2):285–301
Sazdanovic R (2012b) Fisheye view of tessellations. In: Proceedings of bridges 2012: mathematics, music, art, architecture, culture. Tessellations Publishing, pp 361–364
Sazdanovic R, Sremcevic M (2003) Tessellations of the Euclidean, elliptic and hyperbolic plane. https://library.wolfram.com/infocenter/MathSource/4540/
Sazdanovic R, Sremcevic M (2004) Hyperbolic tessellations by tess. Symmetry Art Sci 1:226–229
Sazdanović R, Yip M (2015) A categorification of the chromatic symmetric polynomial. In: Proceedings of FPSAC 2015, discrete mathematics and theoretical computer science proceedings. Association of Discrete Mathematics and Theoretical Computer Science, Nancy, pp 631–642
Sazdanovic R, Vandegrift M, Wust M, Hallman S, Hayes E, Lang J, Gurley W, Waller M, Davidson B (2018) Tess-celestial. https://immersive-scholar.github.io/tess-celestial/
Scharein R (1998) Knotplot. Program for drawing, visualizing, manipulating, and energy minimizing knots. See http://www.knotplot.com
Scharein RG, Booth KS (2002) Interactive knot theory with knotplot. In: Multimedia tools for communicating mathematics. Springer, pp 277–290
Seifert H (1935) Über das Geschlecht von Knoten. Math Ann 110(1):571–592
Selin H (2013) Encyclopaedia of the history of science, technology, and medicine in non-Western cultures. Springer Science & Business Media
Shimojima A (1996) On the efficacy of representation. Ph.D. thesis, Indiana University Indiana
Smith D, Mikami Y (1914) A history of Japanese mathematics. Open Court Pub, Chicago
Tait (1877) VIII. On knots. Trans R Soc Edinb 28(1):145–190
Tait (1878) 2. On links. Proc R Soc Edinb 9:321–332
Thistlethwaite MB (1987) A spanning tree expansion of the Jones polynomial. Topology 26(3):297–309
Washburn DK, Crowe DW (1988) Symmetries of culture: theory and practice of plane pattern analysis. University of Washington Press, Seattle
Washburn DK, Crowe DW (2004) Symmetry comes of age: the role of pattern in culture. University of Washington Press, Seattle
West DB (2000) Introduction to graph theory, 2 edn. Prentice Hall, Upper Saddle River

Spherical Perspective

19

António B. Araújo

Contents
Introduction
History
Spherical Anamorphosis
Radial Occlusion and Mimesis
Spherical Anamorphs and Their Vanishing Points
Spherical Perspective as Cartography of the Visual Sphere
Referentials
Azimuthal Coordinate System
Horizontal Coordinate System
Angular Measurements
Azimuthal Equidistant Spherical Perspective (360-degree Fisheye)
The Azimuthal Equidistant Flattening
Solving the Azimuthal Equidistant Spherical Perspective
Fixed Grids for the Azimuthal Equidistant Perspective
A Ruler and Compass Construction of the Azimuthal Equidistant Spherical Perspective
Perspective Constructions
Dynamic Grids
Equirectangular Perspective
VR Panoramas as Immersive Anamorphoses
Construction of the Equirectangular Flattening
Images of Geodesics
Ruler and Compass Approximations
Drawing Lines
Sliding Grids
Conclusion: What Is (Not) a Spherical Perspective
Cross-References
References

A. B. Araújo
CIAC-UAb, Center for Research in Arts and Communication, Universidade Aberta, Lisbon, Portugal
e-mail: [email protected]; [email protected]

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_100

Abstract
We survey the present state of spherical perspective, regarding both mathematical structure and drawing practice, with a view to applications in the visual arts. We define a spherical perspective as the entailment of a conical anamorphosis with a compact flattening of the visual sphere. We examine a general framework for solving spherical perspectives, exemplified with the azimuthal equidistant (“fisheye”) and equirectangular cases. We consider the relation between spherical and curvilinear perspectives. We briefly discuss computer renderings but focus on methods adapted to freehand sketching or technical drawing with simple instruments such as ruler and compass. We discuss how handmade spherical perspective drawings can generate immersive anamorphoses, which can be rendered as virtual reality panoramas, leading to hybrid visual creations that bridge the gap between traditional drawing and digital environments.

Keywords
Spherical perspective · Anamorphosis · Fisheye · Equirectangular · Drawing · Curvilinear perspective · Virtual reality panoramas

Introduction
We survey the present state of spherical perspective and discuss its mathematical definition and its connection to immersive visualizations and anamorphoses. We are interested in spherical perspective both as a mathematical object and as a practical discipline of drawing that connects traditional technical drawing with digital visualization methods. We start by briefly discussing the history of spherical perspectives and then redefine them, taking anamorphosis as the central concept. Then we consider how to solve the two perspectives that we have chosen as main examples: the azimuthal equidistant (“fisheye”) and equirectangular cases. From the parallels found between the solutions obtained, we infer a general framework for approaching arbitrary cases.

In the present work, we are interested in a notion of perspective that can be both mathematically rigorous and executable by the human hand, not requiring modern digital machinery. Such drawing has a long history of fertile interactions between mathematics and art, a tradition of rational drawing (Araújo 2020) that straddles the two, connecting the hand, the eye, and the mind. It is an intellectual thread that we can trace at least back to Euclid (through both his optics (Brownson 1981) and his geometry) and that touches an endless multitude of artists, engineers, and architects, some of them sound theoreticians, most of them delightful and sometimes infuriating hackers. Once in a while it brushes again against a mathematician proper, often a now surprising name, such as Taylor (yes, that Taylor (Andersen 1992)), now known mostly for his series rather than for his views on vanishing points.

This thread of rational drawing also connects strongly with modern ideas of graphical computing, but we are mostly concerned with what new drawing software can be installed in our minds and hands. Computers will be considered mostly as a means of displaying such constructions. This should not be read as any deprecation of the shining role of computer graphics in both geometric investigation and artistic creation in the area of immersive drawing; quite the opposite. It is rather an attempt to focus on another domain, sometimes lost in that bright light, which has its own scope and an enduring interest that deserves consideration, especially for those of us who see handmade drawing not only as a means of expression but as a mode of thought.

History
Spherical perspective had a curious development, with many false starts and equivocations. It can be argued that the first spherical perspective was not spherical and the next spherical perspectives were not perspectives. The first was Barre and Flocon’s formal treatment of the azimuthal equidistant spherical perspective (commonly called fisheye perspective) in their 1968 book La Perspective Curviligne (Barre and Flocon 1968, 1987). This was a proper perspective, in the sense that it provided a system to calculate and render all points, lines, planes, and vanishing points, using only ruler and compass constructions, to an approximation with controlled error bounds. Yet spherical perspective is somewhat of a misnomer here, as those rules were set for only half of the sphere, therefore rendering only a 180-degree view around a central axis, in what is often called, rather ambiguously, a five-point perspective.

Later attempts at spherical perspectives suffered from the opposite problem: they were fully spherical (so-called six-point perspectives), but they were not actually perspectives by the standards set by Barre and Flocon (or indeed by classical perspective), i.e., they were not general, rigorous methods for calculating all vanishing points and line images. They were either qualitative, ad hoc, or partial, grid-based methods. Later, they were computer-based (see, for instance, Correia et al. 2013), moving out of the reach of the human draughtsman; the focus having shifted to computer graphics, the original perspective problem was dropped rather than solved.

In a Leonardo paper, Casas (1983) writes of a setup that seems at first sight a generalization of Barre and Flocon (1968) (though not claiming to be so and in fact never citing their work; it is unclear whether Casas was familiar with it) but is limited to qualitative observations that apply both to such a generalization and to a whole class of so-called six-point perspectives. Further, these qualitative observations suffer from peculiar misconceptions regarding the behavior of the projection near what we here will call its blowup, that is, the point at which the projection from the
sphere to the plane is not defined. For instance, Casas states that the sphere flattening cannot be achieved mathematically but only graphically. What is meant by this is somewhat cryptic; Casas’s difficulties seem at times to be related to the topological differences between the sphere and the projection disc, at times to the well-known fact that the plane and sphere are not isometric, and at times to a misconception that lines going to the blowup point should widen into regions of nonzero measure as they approach it. The author was apparently unaware, unlike Flocon, Barre, and Bouligand in their discussion of the theoretical aspects of their perspective (Barre et al. 1964), that the flattening itself is not new. It is the azimuthal equidistant projection, well known in cartography. The real question is how to calculate it in a way appropriate for drawing with simple instruments and how to relate the projection to the spatial scene in an elegant way. This is alluded to by K. R. Adams in a letter to Leonardo (Adams 1983). Adams was the author of another spherical perspective (Adams 1976), the tetraconic perspective (based on a flattening of a tetrahedron), and was aware of the real problem: that of balancing the pros and cons of various projections, their topological and metric properties, and the possibility of plotting them efficiently by simple means such as ruler and compass. Casas provides no specific metric description of his projection, being seemingly convinced of the impossibility of doing so; he would later publish a second paper in Leonardo (Casas 1984), describing a spherical picture-within-picture recursive graphical device reminiscent of the so-called Droste effect in Escher’s Print Gallery (de Smit and Lenstra 2003; Escher 1956), again worked qualitatively.

Moose (1986), in an attempt to make the qualitative considerations of Casas specific and practical, proposed a five-point perspective with an angle of view wider than 180° and with a specific ruler and compass construction. It is, however, a rather ad hoc construction and not a generalization of Barre and Flocon’s.

G. Michel approaches the problem in a clear and pragmatic fashion, understanding correctly that the azimuthal equidistant projection can be extended all the way to a 360-degree view (Michel 2013). Michel constructs a correct grid of horizontal and vertical line projections on which measurements from observation can be plotted and lines drawn by interpolation on the grid. This does not constitute a formal “perspective” in the sense that Barre and Flocon’s work does; rather, it is a gridding method, but unlike Casas’s it is well defined, and unlike Moose’s it is indeed a generalization of Barre and Flocon’s to a 360-degree view. This informal scheme, in the hands of a virtuoso draughtsman like G. Michel, is quite enough to produce exquisite drawings in 360-degree spherical perspective. A very similar gridding method was applied by the same author to the equirectangular case (Michel n.d.). In a private communication, G. Michel informed the present author that his approach was developed independently of Barre and Flocon’s. In fact, a main instrument of these authors, the approximation of lines by arcs of circle in the anterior hemisphere (as we will see ahead), is absent from the work of Michel, who works from exact azimuthal equidistant grids, through freehand interpolation, without explicit consideration of error bounds.

In contrast to these works, A. B. Araújo (Araújo 2018c) proposes a formal perspective in the sense of Barre and Flocon, meaning a systematic method to obtain
all line images and their vanishing points for any given 3D scene, with consideration of formal ruler and compass interpolations. This perspective is the total azimuthal equidistant spherical perspective, that is, the formal generalization of Barre and Flocon’s “fisheye” perspective to the full sphere. This is the main work we will be describing and developing here, so let us consider its main points in a quick summary.

The work can be divided into two parts: (1) a general definition of spherical perspectives and a framework for their study and (2) the application of this framework to the azimuthal equidistant case. A spherical perspective is defined as a two-step process: a spherical anamorphosis followed by a cartographic flattening. Anamorphosis itself is redefined as an equivalence relation between three-dimensional sets, each three-dimensional scene being represented by a canonical anamorphic image on the surface of a sphere. This concept of anamorphosis is studied at length in another chapter of the present volume (see Chap. 9, “Anamorphosis Reformed: From Optical Illusions to Immersive Perspectives”), but we will recall it briefly ahead. The main point is that the canonical anamorphosis is the same for any spherical perspective, and it is very symmetrical: each spatial line is represented as a meridian (half of a geodesic) ending at exactly two vanishing points antipodal to each other. These vanishing points are functions of the observation point and of the lines themselves and do not depend on the projection chosen for a particular perspective.

In the second step, this anamorphic image is projected onto a compact subset of the plane by a flattening map, which is usually a cartographic map of the sphere. The resulting picture, flat and bounded, is what we call the perspective drawing. Implicit here is also a philosophy of compactness: unlike in classical perspective, the spherical perspective picture, including its vanishing points, should be fully contained in a prescribed bounded and closed region of the plane where the drawing is to be performed. In fact, ideally, even the auxiliary construction diagrams should be fully contained therein.

This approach separates the mimetic from the representational aspects of the problem and suggests a general approach to solving a spherical perspective. Unlike in the previous approaches, the focus shifts from lines to geodesics (the vanishing sets of spatial planes), and an emphasis is placed on the importance of vanishing points always arising as antipodal pairs. Each perspective is characterized by the particular form of its flattening map, and this naturally breaks up the set of geodesics into classes which must be handled differently to get an efficient rendering of the perspective. This partition and its rendering methods turn out to rely heavily on the duality of vanishing points. Beyond the original azimuthal equidistant case treated in Araújo (2018c), this approach was also fruitful in the equirectangular (Araújo 2018b) and cubical (Araújo et al. 2020) cases. These latter cases relate naturally to VR visualizations, using the perspective drawings as data sources for VR panoramas. This establishes an interesting connection between the development of hand-drawn perspectives and immersive computer graphics, at both a theoretical and an artistic level (Araújo et al. 2019; Michel
2013; Olivero et al. 2019b). Spherical perspective had become somewhat split after Flocon and Barre, with handmade practice being handled mostly by gridding or ad hoc sketching with little new mathematical insight, and most new developments taking place at the computational level in ways that required computers to render a scene (see, for instance, the software of Correia et al. (2013) which can render a scene in parametric curvilinear perspectives). The methods we will present ahead are focused on understanding spherical perspective in a theoretically meaningful way, and rendering it in a way that is appropriate for hand drawing, yet accurate enough to serve as a basis for the creation of immersive VR visualizations.

Spherical Anamorphosis
To understand spherical perspective, we need to review spherical anamorphosis. We will first approach the subject loosely and then restate it formally.

Radial Occlusion and Mimesis
Suppose you are stuck to a point in space but free to look around you in any direction. You want to make a drawing that captures all that you see. How would you go about it? The question is too open ended. The answer depends on what you are allowed to draw and how. It also depends on what “capturing what you see” means. Ultimately, we will want to define a spherical perspective as a drawing done on a bounded and closed region of the plane, which represents all the visual data accessible from a fixed observation point. But before we get to that definition, we will relax one of those criteria. If we are allowed to draw on a curved surface, we can give a very precise meaning to the notion of “capturing what we see.” The answer lies in anamorphosis. This is a word used to denote several different concepts. Here we mean it in a way that is more in line with Euclid’s optics (Brownson 1981; Burton 1945) than with its usual definition derived from linear perspective. The reader may find a careful accounting of this notion in Chap. 9, “Anamorphosis Reformed: From Optical Illusions to Immersive Perspectives,” in the present volume or in Araújo (2017, 2018c). Here we will proceed loosely.

It is an empirical fact that under certain optical setups, two points will “look the same” (i.e., seem to occupy the same position in the visual field) for an observer at a point O if and only if they lie on the same ray from O. More generally, two 3D objects will “look the same” (i.e., seem to occupy the same region of the visual field) if and only if their points define the same cone of rays from O. We call this the principle of radial occlusion: visual occlusion coincides with geometrical alignment. This is not a mathematical truth, but a geometric statement of an empirical fact that holds only when special circumstances (optical media, lighting conditions, geometric
conditions) are themselves met. It fails whenever refraction or reflection effects are strong enough, for instance, or when for some reason or another the abstraction of a monocular observer stationed at a point is not adequate. We will not be concerned with specifying all those conditions, but simply with deriving the interesting conclusions that result when the principle does hold.

For instance, if two objects look the same from O whenever they have the same cone of rays from O, it follows that every 3D object X has an infinity of other objects that look exactly like it from point O. We will call these objects the anamorphs of X relative to O (the O-anamorphs of X, for short). This defines an equivalence relation between objects. It also follows that there are anamorphs of X which are two-dimensional, since, if you intersect the cone of rays of X with an adequate surface S, the intersection will be a two-dimensional object and by construction will have the same cone of rays as X. We call the closure of this 2D projection, uniquely defined by O, S, and X, the anamorphosis of X onto S relative to O.

Since the space of rays from O is itself isomorphic to the unit sphere around O, this sphere is the most natural choice of projection surface to define anamorphosis relative to O. It records every ray in the most symmetrical way possible, and every other surface anamorphosis can be obtained from it, so that it can be called the canonical anamorphosis. Further, it “captures” this information in a very concrete sense: if the principle of radial occlusion is assumed, then the anamorphosis results in mimesis: an observer standing at O and looking at the anamorphic drawing on the sphere would be unable to distinguish it from the actual 3D environment from which the drawing was obtained. For instance, in Fig. 1, if the 3D cube were to be replaced by its conical projection on the sphere, the observer at the center of the sphere would not notice the switch.

Fig. 1 Anamorph of a cube on a sphere. The 3D cube is projected radially toward O to obtain a 2D anamorph on the surface of the sphere. The 2D anamorph would look exactly like the 3D object for an observer at O, if the principle of radial occlusion is valid

This projection onto the sphere, although it has the same visual information as the 3D scene (when seen from O), has very different and interesting properties. Unlike the original scene, the anamorphic image is bounded, as the sphere is compact. And if we make it compact (bounded and closed), this gives rise to a natural notion of vanishing points. Let us explore this in the next section, while making the concept rigorous.

Note 1. The notions of spherical anamorphosis and spherical perspective are often conflated in the literature and are not at all standard. For instance, Catalano (1986) uses the term spherical perspective for what we here call spherical anamorphosis. Dick Termes, who is famous for his spherical paintings (which are spherical anamorphoses by our definition), uses a mix of the two terms: for instance, Termes (1991) uses five-point spherical perspective for the hemispherical perspective of Barre and Flocon (fisheye perspective) but six-point perspective for what we define here as spherical anamorphosis. Termes uses two separate, complementary fisheye perspectives to arrange on the plane the projection of a full sphere anamorphosis (Termes 1998, 1991), which technically counts as a total spherical perspective as we will define the term ahead. It can be argued that the conflation of the terms perspective and anamorphosis has led to much confusion historically (see Chap. 9, “Anamorphosis Reformed: From Optical Illusions to Immersive Perspectives,” or Araújo (2016)), and in the present work, we use the term spherical perspective exclusively for a representation on the plane (that follows certain conditions explained ahead) and never for the anamorphic drawing on the sphere.

Spherical Anamorphs and Their Vanishing Points
Let us formalize the intuitions of the previous section. We have seen that accepting the principle of radial occlusion means that all the visual information of a scene, relative to an observation point, is held by the visual cone of the scene.

By an observer we mean a point $O$ in three-dimensional Euclidean space. Let $R_O$ be the set of rays from $O$. Let $S_O^2$ be the unit sphere centered at $O$. The map $P \mapsto \overrightarrow{OP}$ is a bijection between the sphere of center $O$ and the set of rays from $O$, identifying each ray from $O$ with the point where the ray crosses the sphere. Hence we can speak of rays from $O$ and points on the sphere $S_O^2$ interchangeably.

Definition 1. Given a set $\Sigma$, we say that $C_O(\Sigma) = \{\overrightarrow{OP} : P \in \Sigma\}$ is the visual cone of $\Sigma$ relative to $O$. The conical projection of $\Sigma$ onto $S_O^2$ is the intersection of the visual cone with the unit sphere, $C_O(\Sigma) \cap S_O^2$. We identify freely the cone with the conical projection on the sphere whenever context is clear.

We will have a special interest in sets like planes and lines, so the following example is vital:

Example 1. Visual cone of a line: Consider a line $l$ not containing $O$. The visual cone of $l$ relative to $O$ is a half-plane minus a line. Let $H_l$ be the single plane that contains both $O$ and $l$. Let $l_O$ be the translation of $l$ to $O$, which is parallel to $l$. Then the rays from $O$ to points of $l$ cover the half-plane of $H_l$ that contains $l$ and is bounded by $l_O$, with the exception of $l_O$ itself. $l_O \setminus \{O\}$ is the union of two diametrically opposite rays $V$, $V'$. These are not in the cone since they are parallel to $l$ and therefore do not intersect it. Intuitively, the eye, following along the length of $l$ to infinity (in both directions), approaches but never reaches these two diametrically opposite limit directions.

Now consider the conical image of the visual cone on the sphere, $C_O(l) \cap S_O^2$ (see Fig. 2). Plane $H_l$ intersects the sphere in a geodesic of the sphere, and the visual cone $C_O(l)$ projects on half that geodesic, that is, on a meridian of the sphere. The meridian will be missing its two endpoints $V$ and $V'$, corresponding to the two rays that make up $l_O \setminus \{O\}$.

Fig. 2 Two parallel lines $l$, $j$. Their visual cones are two meridians converging at the vanishing points $V$ and $V'$

We recall that a geodesic of the sphere is a great circle on the sphere, that is, a circle with the same radius as the sphere itself. A geodesic is defined by intersecting the sphere with a plane passing through $O$. So we see that the visual cone of a line is a half-plane missing two rays, or, equivalently, on the sphere it is a meridian missing its two endpoints. Now, we rather dislike that these points are missing, and we will do something about that. But first we must recall some topological notions.

Definition 2. Let $S$ be a subset of a topological space. We say that $P$ is a limit point of $S$ if every neighborhood of $P$, no matter how small, contains a point $Q$ of $S$ other than $P$. The closure of $S$ is the union of $S$ with its limit points and is denoted by $\mathrm{cl}(S)$. We say $S$ is closed if $S = \mathrm{cl}(S)$. The residue of $S$ is the closure of $S$ minus the set itself, denoted $\mathrm{Res}(S) = \mathrm{cl}(S) \setminus S$.
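Example 1 and the notion of limit point can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original treatment: the function name conical_projection and the particular values of O, A, and d are assumptions chosen only for illustration. It projects points of a line onto the unit sphere and verifies that the images approach the two antipodal directions V and V' without reaching them, while staying on one half of a great circle.

```python
import numpy as np

def conical_projection(P, O):
    """Send point(s) P (different from O) to the unit sphere around O, as unit direction vectors from O."""
    v = np.asarray(P, dtype=float) - np.asarray(O, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical data: observer O and a line l = {A + t*d} that does not pass through O.
O = np.array([0.0, 0.0, 0.0])
A = np.array([1.0, 2.0, 0.5])
d = np.array([0.0, 1.0, 1.0])

# Vanishing points: the directions of the two rays that make up l_O minus O.
V = d / np.linalg.norm(d)
V_prime = -V

# Images of points ever farther along l approach V and V' without reaching them.
gaps = []
for t in (1e2, 1e4, 1e6):
    fwd = conical_projection(A + t * d, O)
    bwd = conical_projection(A - t * d, O)
    gaps.append((np.linalg.norm(fwd - V), np.linalg.norm(bwd - V_prime)))
    print(f"t = {t:.0e}: distance to V = {gaps[-1][0]:.2e}, to V' = {gaps[-1][1]:.2e}")

# Every image lies on the great circle cut by the plane H_l (normal n),
# and on one side of l_O within that plane: half a geodesic, a meridian.
ts = np.linspace(-50.0, 50.0, 11)
images = conical_projection(A + ts[:, None] * d, O)
n = np.cross(A - O, d)
side = (A - O) - np.dot(A - O, V) * V   # component of OA perpendicular to l_O
on_great_circle = np.allclose(images @ n, 0.0)
on_one_side = bool(np.all(images @ side > 0.0))
```

The shrinking but nonzero distances illustrate that V and V' are limit points of the projected line that do not belong to it, which is what the topological notions above are meant to capture.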


Intuitively, a limit point of a set is a point you can approach as closely as you want without leaving the set. Closed sets are those that contain all their limit points. For instance, spatial points, lines, and planes are closed. A line segment with its endpoints included is closed. But a line segment without its endpoints is not closed, as its missing endpoints are limit points of the set. We are interested in these notions because they will relate to the notion of vanishing points ahead. We are now ready to proceed.

We call closed sets in three-dimensional space objects. A scene is a finite set of objects.

There is on $R_O$ a single topology that makes the canonical bijection from $S_O^2$ to $R_O$ continuous both ways (a homeomorphism). We will from now on endow the set of rays with this topology, so we can speak of the topological closure of a set of rays from $O$.

In topological terms, we saw in Example 1 that a spatial line, which is closed but unbounded, has a visual cone (the meridian on the sphere) which is bounded. However, it is not closed, since it is missing its endpoints. We call a set compact if it is both closed and bounded. Now, speaking intuitively, compact sets are all that we can draw. We cannot draw an infinite line, only a bounded one. And we cannot really draw a line missing endpoints. Things going to infinity or things missing endpoints are things that we can signify but not really draw. The point of anamorphosis is to replace the real 3D object, closed but unbounded, by another, its spherical anamorph, that is visually equivalent to it and yet is compact, hence drawable. To achieve that, we cannot simply take the strict conical projection, since as we have seen it will not contain the endpoints of a line image. These two missing endpoints of the meridian correspond naturally to the notion of vanishing points. We add them to the conical projection by taking the topological closure and call the resulting compact object a spherical anamorphosis. This is the visual equivalent of the original 3D object, but is 2D, bounded, and closed: a drawing, even if one made on a curved surface.

Definition 3. Let $\Sigma$ be a scene. We say that $\Lambda_O(\Sigma) = \mathrm{cl}(C_O(\Sigma) \cap S_O^2)$ is the spherical anamorphosis of $\Sigma$ relative to $O$. Let $\lambda_O : \mathbb{R}^3 \setminus \{O\} \to S_O^2$ be the map $P \mapsto \overrightarrow{OP} \cap S_O^2$. We call $\lambda_O$ the anamorphism (or conical projection) onto $S_O^2$ relative to $O$. We use the same name for the corresponding map $\lambda_O : R_O \to S_O^2$. We have $\Lambda_O(\Sigma) = \mathrm{cl}(\lambda_O(\Sigma))$.
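The anamorphism of Definition 3 is straightforward to compute for individual points. The sketch below, in Python with NumPy, is only an illustration under stated assumptions: the function name anamorphism, the position of O, and the placement of the cube are hypothetical, and only vertices are projected, so the closure step of the definition is not modeled. It shows that two different solids whose vertices lie on the same rays from O have the same image on the sphere, in line with the principle of radial occlusion.

```python
import numpy as np

def anamorphism(P, O):
    """lambda_O: map point(s) P != O to where the ray from O through P meets the unit sphere centered at O."""
    O = np.asarray(O, dtype=float)
    v = np.asarray(P, dtype=float) - O
    r = np.linalg.norm(v, axis=-1, keepdims=True)
    if np.any(r == 0.0):
        raise ValueError("the anamorphism is not defined at O itself")
    return O + v / r

# Hypothetical scene: the eight vertices of a unit cube placed away from the observer O.
O = np.array([1.0, 0.0, -2.0])
offsets = np.array([[x, y, z] for x in (2.0, 3.0)
                              for y in (-0.5, 0.5)
                              for z in (-0.5, 0.5)])
cube = O + offsets
image = anamorphism(cube, O)

# Stretching the cube radially from O by a different factor per vertex gives
# another (distorted) solid whose vertices lie on the same rays from O.
factors = np.linspace(0.5, 4.0, len(cube))[:, None]
distorted = O + factors * (cube - O)
same_image = np.allclose(anamorphism(distorted, O), image)
on_sphere = np.allclose(np.linalg.norm(image - O, axis=1), 1.0)
print(same_image, on_sphere)
```

The map is deliberately left undefined at O, matching the domain $\mathbb{R}^3 \setminus \{O\}$ in the definition.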

Definition 4. We say that $V_O(\Sigma) = \mathrm{cl}(C_O(\Sigma)) \setminus C_O(\Sigma)$ is the set of vanishing points of scene $\Sigma$ relative to $O$. We say that $V_O(\Sigma) \cap S_O^2$ is the set of vanishing points of $\Sigma$ in the spherical anamorphosis $\Lambda_O(\Sigma)$.

So, we define the anamorphosis of a scene onto the sphere to be the topological closure of the strict conical projection. That is, the anamorphosis is what we get if we add the missing limit points to the strict conical projection. Those missing points, which we have to add to make the anamorphosis closed, we call the vanishing points. The vanishing set is therefore the topological residue of the conical projection.

Note 2. Note that both the anamorphosis and the vanishing points of a scene are naturally defined on the space of rays and depend only on the scene and on the viewpoint. Of course rays and points on the sphere are the same thing. When we view the anamorphosis on the sphere, we can think more concretely, using the sphere as a projection surface. In Araújo (2018c) or in the present volume’s Chap. 9, “Anamorphosis Reformed: From Optical Illusions to Immersive Perspectives,” anamorphosis is defined for more general surfaces, but the spherical anamorphosis is the canonical one, from which all others can be derived (again because the sphere is identified with the space of rays), and since we are only interested in spherical perspectives here, we will discard further generality.

From Example 1 we see that the vanishing points of a line $l$ are obtained by translating the line to $O$ and intersecting it with the sphere. Since parallel lines will have the same translated line at $O$, their vanishing points are equal. Hence vanishing points of lines are meeting points for their anamorphic images, thus preserving the main feature of the classical notion of vanishing points. We note that the vanishing points of a line are antipodal to each other. Recall that the antipode of a point $P$ on a sphere is the point diametrically opposite to $P$ on the sphere. From now on we denote the antipode of a point $P$ by $P'$.

Proposition 1. The spherical anamorphosis of a line $l$ not containing $O$ is a meridian ending at two antipodal vanishing points, given by $l_O \cap S_O^2$, where $l_O$ is the translation of $l$ to $O$. Hence parallel lines have the same vanishing points.

See, for instance, in Fig. 2 the two parallel lines $l$ and $j$.
Their planes $H_l$ and $H_j$ intersect at the common parallel line through O, which in turn projects onto the sphere at two antipodal points V and V  , which are meeting points for the meridians that are the anamorphic images of the spatial lines.

Along with lines and points, planes are the most important objects in perspective. Their projections are very simple:

Proposition 2. Let σ be a plane that does not contain O. Let $\sigma_O$ be the translation of σ to O. The intersection of $\sigma_O$ with the sphere is a geodesic, which is the vanishing set $V_O(\sigma)$. This geodesic divides the sphere into two disjoint hemispheres. The visual cone of σ relative to O is the hemisphere bounded by $V_O(\sigma)$ and on the same side of $\sigma_O$ as σ. Note that all planes parallel to σ will have the same vanishing set, as their translation to O is the same.

Example 2. In Fig. 3 we see two planes σ, τ. The two planes are parallel to each other; hence their translations to the origin are equal and define the same geodesic when intersected with the sphere. This geodesic is their vanishing set. The visual

A. B. Araújo

Fig. 3 Two parallel planes σ , τ . Their visual cones are two hemispheres converging at a geodesic that is their common vanishing set

cone of σ is the hemisphere on the side of σ relative to $\sigma_O$, and the cone of τ is the opposite hemisphere. The spherical anamorph of each plane is obtained by joining the vanishing set to its visual cone, so the two anamorphs intersect at the geodesic. Note that any plane parallel to σ has to have exactly one of these two visual cones and anamorphs, according to which side of $\sigma_O$ it happens to lie on.

We see that the images of both lines and planes are determined by the geodesics of the sphere. Indeed, geodesics and their rendering are the main concern of spherical anamorphosis and perspective. We therefore recall some generalities now. Because two different planes with a common point intersect along a line, the planes of two geodesics will intersect along a diameter of the sphere, which will project as a pair of antipodal points (so the vanishing sets of two nonparallel spatial planes will always have two points in common). Conversely, two antipodal points P and P  on the sphere define a family of planes around the axis P P  , which define a family of geodesics that covers the sphere without intersecting outside of those two points (a foliation of the sphere $S_O^2$ with those two points removed). By contrast, two non-antipodal points define a single geodesic. Note that although two points of a spatial line, P and Q, define the line, giving their projections P  and Q on the sphere only defines their geodesic, but does not determine which meridian of that geodesic is the line’s anamorphic image. But giving the projection of one point of the line, P  , and the projection of another point V  which is known to be a vanishing point, does determine the line’s anamorph to be the meridian V  P  V  .

Example 3. The perspective image of a cube is a classical problem in the study of both perspective and anamorphosis.

Fig. 4 The six vanishing points of the spherical anamorphosis of a cube

In linear perspective the various positions of the cube relative to the projection plane result in so-called one-, two-, and three-point perspectives of the cube. In spherical anamorphosis, the vanishing points defined by the cube are always six. There is no privileged direction of view, the sphere being isotropic, and the position of the vanishing points depends only on the orientation of the cube relative to O. Note that by Proposition 1 translating the cube does not move its vanishing points (though it changes the cube’s visual cone) but rotating it does. The position of the vanishing points depends only on the orientation – not the spatial position – of the edges of the cube (Fig. 4).

We take note of the degenerate cases when the lines or planes do contain O. Then a line l will project as just the two antipodal points of its translation to the origin, $l_O$, but these points will not be vanishing points, by Definition 4. Analogously, a plane σ through O will project as the geodesic $\sigma \cap S_O^2$, but this will be its full image and contain no vanishing points. We note that Definition 4 attributes a vanishing set to the anamorphic image of arbitrarily complex closed objects, not only lines and planes.

We have now a nice way to capture all the visual information around a point onto a drawing. It comes with an elegant definition of vanishing points. All of this followed from the principle of radial occlusion and the insistence that a drawing is a compact set. But our drawing is still on a curved surface. The passage from anamorphosis to perspective comes simply from insisting that our drawings should be flat.
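As a small numerical illustration of Proposition 1 and of the cube example, the following sketch computes vanishing points directly from edge directions. It is only a sketch under our own assumptions: the function names (`normalize`, `vanishing_points`, `cube_vanishing_points`, `rotate_z`, `anamorph`) are hypothetical, O is taken as the origin, and the cube is represented only by its three edge directions, which is all that its vanishing points depend on.

```python
import math

def normalize(v):
    """Radial projection of a nonzero vector onto the unit sphere centered at O."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def vanishing_points(direction):
    """Translate the line to O and intersect it with the sphere: the pair +d/|d|, -d/|d|."""
    v = normalize(direction)
    return v, tuple(-c for c in v)

def cube_vanishing_points(edge_directions):
    """One antipodal pair per edge direction, hence six points for a cube."""
    points = []
    for d in edge_directions:
        points.extend(vanishing_points(d))
    return points

def rotate_z(v, angle):
    """Rotate a vector about the z axis; used only to reorient the example cube."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def anamorph(point):
    """Spherical anamorph of a spatial point other than O."""
    return normalize(point)

# A cube aligned with the axes, then the same cube turned 30 degrees about z.
aligned = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
turned = [rotate_z(d, math.radians(30)) for d in aligned]
print(cube_vanishing_points(aligned))
print(cube_vanishing_points(turned))
```

Since `vanishing_points` takes only a direction, translating the cube cannot change the result, while `rotate_z` does; and `anamorph` applied to points far along a line approaches one of that line's vanishing points.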

Spherical Perspective as Cartography of the Visual Sphere

We have seen how the spherical anamorphosis captures the visual information relative to a point O in the very concrete mimetic sense: it is a 2D projection that looks exactly like the original objects when seen from O. We define a spherical perspective to be a flattening of a spherical anamorphosis, that is, a representation of it on the plane. Passing the spherical drawing to the plane will of course deform it, as is well known in cartography. So we will sacrifice the mimetic properties of the anamorphosis: the perspective drawing will no longer fool the eye. It will require interpretation. But we insist that it should preserve all the visual information, so that the anamorphosis should be in principle recoverable from the data in the drawing – that is, one should be able to fold the perspective back onto the visual sphere. Hence, the flattening must be a bijection (almost everywhere at least). We also try to preserve at least some properties of continuity.

Here as in the previous anamorphosis step, we will keep our adherence to the view of perspective as compactification. Our spherical perspectives will all be compact perspectives; they will be projections from the visual sphere to a compact subset of the plane. In this we are guided by the philosophy that compact sets are a fruitful abstraction of those things that we can actually draw, and we can only draw bounded shapes in bounded time. We can certainly conceive of unbounded perspectives (classical perspective is one), but so can the logician conceive of infinite proofs, and yet we find the concept of a finite proof to be most fruitful. Mathematics profits as much from self-imposed restrictions as from liberties. In this work our game will be this notion of a topologist’s view of perspective, and this restriction will prove to have its benefits.
For instance, at its most prosaic, a compact representation can be stored as a raster representation in a computer file, something very useful if one is to recover the anamorphosis as a VR visualization. Barnard (1983, p. 441) points out this advantage, calling the spherical representation “closed” by opposition to the “open” Cartesian space, where by “closed” he means “bounded.” An unbounded representation would require more sophisticated ways of storing visual data, for instance, through parametric forms or equations. For some graphical computing, these would be inconvenient, but to the human draughtsman, they are fatal. We are interested in what we can recover from the pencil marks on a single drawing. We are bound to compactness by the nature of our tools. We now present our definition of perspective, which may seem quite abstract but will be made clear ahead through examples.

Definition 5. Let U be an open dense subset of the sphere $S_O^2$. We say that $\mu : U \to \mathbb{R}^2$ is a flattening of $S_O^2$ if μ is a homeomorphism onto μ(U), and there is a continuous map $\tilde{\mu} : \mathrm{cl}(\mu(U)) \to S_O^2$ such that $\tilde{\mu}|_{\mu(U)} = \mu^{-1}$. We say that $p = \mu \circ \lambda_O$ is the perspective associated with the flattening μ, where $\lambda_O : R_O \to S_O^2$ is the conical projection. Let $\tilde{p} = \lambda_O^{-1} \circ \tilde{\mu}$. Given an object Σ, we say that $\tilde{p}^{-1}(C_O(\Sigma))$ is the strict perspective image of Σ, that $\tilde{p}^{-1}(V_O(\Sigma))$ is the vanishing set of Σ, and that the perspective image of Σ is the union of its strict perspective image with its vanishing set.

The following set is often useful in discussing perspectives.

Definition 6. Given a flattening μ of $S_O^2$, we call the blowup of μ the subset of points of $\mathrm{cl}(\mu(S_O^2))$ where $\tilde{\mu}$ is not injective.

The intuitive translation is that a flattening is an arbitrary map that sends the sphere (and hence the anamorphic representation of a scene) to a compact set of the plane, in such a way that it is a bijection and continuous both ways “almost everywhere.” The exception is the blowup set, where the flattening is not defined. But the blowup set is of measure zero, that is, it is at most one-dimensional; hence the anamorphic picture can be recovered by continuity. The perspective is simply defined as the sequence of the two steps from space to plane: anamorphosis onto the sphere followed by flattening. Since the anamorphosis is always the same, each spherical perspective is completely characterized by its flattening map.

Note 3. The definition above assumes that we are projecting the whole sphere. We can easily define a partial spherical perspective to be a flattening of a compact, connected region of the sphere. Then we can include, trivially, the (hemi)spherical perspective of Barre and Flocon. In this work we found it proper to focus on the total spherical perspectives, that is, spherical perspectives that project the whole sphere. But partial spherical perspectives are also important in practice, since they include things like cylindrical perspective or linear perspective onto a compact region of the plane.

Note 4. A note for geometers: the term blowup is not being used here with its usual significance in geometry. The name arose from the fact that in the azimuthal equidistant case (treated ahead), this set is the inverse image of a point by $\tilde{\mu}$ and equals a circle which corresponds to the tangent plane at that point in the sphere. This is analogous to the blowup of a point (although not projectivised), and that is where the name comes from.

Referentials

Ahead we will study three flattenings, each of which defines a spherical perspective. In all cases some conventions will be useful, so we will set them here. Given a point P in space, we may be concerned with its projection on the sphere, P  , and then with its perspective image on the plane, P  . This notation gets rather cumbersome, so we will instead use the following simplification: we denote both the spatial point and its anamorph on the sphere by P, in bold font, leaving context to disambiguate between the two, and we denote the perspective image on the plane

Fig. 5 Reference points and planes

by P . We will use more careful notation when ambiguity warrants, but for the most part, we can identify spatial points with their spherical anamorphs.

We will need a reference frame. This is of course arbitrary, but we will choose the following in this text: We consider a referential (x, y, z) centered at the observer’s viewpoint, which we take to be O = (0, 0, 0) (see Fig. 5). We take three arbitrary points on the unit sphere $S_O^2$, denoted R (short for Right), F (Front), and U (Up), such that the vectors $(\overrightarrow{OR}, \overrightarrow{OF}, \overrightarrow{OU}) = (\vec{u}_x, \vec{u}_y, \vec{u}_z)$ form a right-handed orthonormal referential. We denote the antipodes of these points by L (Left), B (Back), and D (Down), respectively.

We call the plane ROF the plane of the horizon, the geodesic it defines by intersection with the sphere the horizon, and any plane parallel to it horizontal. We call the plane ROU the observer’s frontal plane or coronal plane, and its geodesic the corona (or “crown,” from anatomical notation). We call any plane parallel to it a frontal plane. We call the plane UOF the median plane, its geodesic the median circle, and any plane parallel to it sagittal. We call anterior space everything on the same side of the observer’s frontal plane as point F and posterior space everything on the opposite side. Finally, regarding spatial lines, we say a line is a frontal line if it lies in a frontal plane; otherwise we call it a receding line. A receding line must intersect the coronal plane; if it intersects it orthogonally, then we say it is a central line.

Note 5. These notations vary widely in the literature, some authors preferring the geographical notation for the reference points, using “north” in place of “front,” “east” in place of “right,” etc. In the present text, we opted for anatomical notation rather than geographical, as over time we found the associated imagery

clearer, especially for beginners. We mostly follow Araújo (2018c), except that it calls equatorial plane what we here call the coronal plane, as we feel it was mixing metaphors, and confusing, to let “equator” remain a sole geographical term in the midst of an otherwise uniform anatomical notation.

Given a spatial point P, we will call P-geodesic any geodesic that contains point P. But note that we will only call P-meridian a meridian that goes from P to its antipode P .

Two spherical coordinate systems will be needed ahead and must sometimes be related to each other. The following two sections define them. These may be skimmed now and read carefully when the developments require them.

Azimuthal Coordinate System

This coordinate system is especially useful for the azimuthal equidistant perspective. It corresponds to the so-called astronomer’s system except that we put our reference point F(ront) where the zenith usually is in the star maps, simply because it is more convenient for the most common setups of spherical perspective drawings. We locate a point P on the sphere by two angles θ and ζ (see Fig. 6). Theta is called the azimuth and determines in which of the F-geodesics the point is, while zeta, often called the zenith angle or polar angle, measures how far P is from F along its

Fig. 6 Azimuthal coordinate system. A point P on the sphere is determined by angles θ (azimuth) and ζ (zenith angle or polar angle)

F-meridian. More specifically, $\theta = \angle P_F O R$, where $P_F$ is the orthogonal projection of P onto the observer’s frontal plane, and $\zeta = \angle POF \in [0, \pi[$. In coordinates,

$$\theta = \operatorname{atan2}(z, x), \qquad \zeta = \arccos(y) \tag{1}$$

where $\operatorname{atan2} : \mathbb{R}^2 \setminus \{(0, 0)\} \to\, ]-\pi, \pi]$ is the two-argument arctangent, such that atan2(y, x) is the angle between the ray from (0, 0) to (x, y) and the positive x-axis. Note that atan2(y, x) = arctan(y/x) when x > 0 and otherwise differs by the appropriate addition/subtraction of π, or by extension through continuity, in order to extend the arctangent function to the four quadrants. Of course the azimuthal system on the sphere may be extended to a full set of spherical coordinates for $\mathbb{R}^3$ by adding the radial coordinate $\rho = \sqrt{x^2 + y^2 + z^2} = |OP|$. In that case we have $\zeta = \angle POF = \arccos(y/|OP|)$ for a general spatial point P, which reduces to the previous expression when |OP| = 1.
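For readers who wish to check Eq. 1 numerically, here is a minimal sketch of the conversion between Cartesian and azimuthal coordinates, with R, F, U on the x, y, z axes as in our referential. The function names `to_azimuthal` and `from_azimuthal` are our own, and angles are in radians. At F and B the azimuth is geometrically undefined; Python's `math.atan2(0, 0)` happens to return 0 there, which the sketch simply accepts.

```python
import math

def to_azimuthal(x, y, z):
    """Eq. 1 for a general spatial point: theta = atan2(z, x), zeta = arccos(y/|OP|)."""
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(z, x)
    # Clamp against rounding so that acos never sees a value slightly outside [-1, 1].
    zeta = math.acos(max(-1.0, min(1.0, y / rho)))
    return theta, zeta, rho

def from_azimuthal(theta, zeta, rho=1.0):
    """Inverse map: zeta is measured from the F axis (y), theta turns from R (x) toward U (z)."""
    s = math.sin(zeta)
    return (rho * s * math.cos(theta), rho * math.cos(zeta), rho * s * math.sin(theta))

# The point U = (0, 0, 1): azimuth 90 degrees, polar angle 90 degrees.
theta, zeta, rho = to_azimuthal(0.0, 0.0, 1.0)
print(math.degrees(theta), math.degrees(zeta), rho)
```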

Horizontal Coordinate System

The horizontal coordinate system locates a point P on the sphere by two angles λ and ϕ (see Fig. 7) such that if $P_H$ is the orthogonal projection of P onto the plane of the horizon, then $\lambda = \angle P_H O F$ and $\phi = \angle P O P_H$. We will call lambda the bearing and phi the elevation. Intuitively, to locate P, we start at F, rotate our gaze around the z axis by the angle λ until facing the vertical plane through P, and lift our gaze by an angle ϕ to face P. This is of course the system that surveyors ordinarily use and is naturally adapted for measuring from nature using theodolites, clinometers, and other surveying equipment (see, for instance, Schofield and Breach 2007). In coordinates we have

$$\lambda = \operatorname{atan2}(y, x), \qquad \phi = \arcsin(z) \tag{2}$$

and adding the radial coordinate ρ, we obtain a general spherical coordinate system for $\mathbb{R}^3 \setminus O$, with $\phi = \arcsin(z/|OP|) = \arcsin\left(z/\sqrt{x^2 + y^2 + z^2}\right)$. We note that in drawing practice it is useful to measure angles (and distances along F-meridians) in degrees rather than radians, and we will follow this practice.

Fig. 7 Horizontal coordinate system. A point P is determined by angles λ (bearing) and ϕ (elevation)
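Since raw measurements are usually taken in the horizontal system, a conversion to azimuthal coordinates through Cartesian coordinates is often handy. The sketch below is one way to do it, with hypothetical function names of our own. It takes Eq. 2 literally, so the bearing is the atan2 angle counted from the x-axis and the point F has bearing 90°; this reading is an assumption of the sketch. If bearings are instead counted from F, as in the verbal description, the two arguments of atan2 (and the roles of sine and cosine of λ) would be swapped.

```python
import math

def from_horizontal(lam, phi):
    """Unit vector with bearing lam and elevation phi (radians), inverting Eq. 2 as printed."""
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))

def to_horizontal(x, y, z):
    """Eq. 2 as printed, for a general spatial point other than O."""
    rho = math.sqrt(x * x + y * y + z * z)
    return math.atan2(y, x), math.asin(max(-1.0, min(1.0, z / rho)))

def horizontal_to_azimuthal_deg(lam_deg, phi_deg):
    """Convert (bearing, elevation) in degrees to (azimuth, polar angle) in degrees via Eq. 1."""
    x, y, z = from_horizontal(math.radians(lam_deg), math.radians(phi_deg))
    theta = math.degrees(math.atan2(z, x))
    zeta = math.degrees(math.acos(max(-1.0, min(1.0, y))))
    return theta, zeta

# A point on the median line, in front of the observer, lifted 45 degrees above the horizon.
print(horizontal_to_azimuthal_deg(90.0, 45.0))
```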

Angular Measurements

In order to draw spherical perspectives, one must take angular measurements, such as an astronomer or surveyor would, and then render their projections according to the rules of the perspective. When drawing from nature, these angular measurements can be taken with adequate devices – such as theodolites, physical or digital – or, with greater error, by very simple methods such as using a pencil or even one’s hand as reference. An old astronomer’s trick is to stretch the arm straight ahead and rotate the palm away from the eye with the knife of the hand facing upward. In this position, the hand should subtend about 10° of the field of view when measured vertically along the line of the knuckles. Of course this rule of thumb must be calibrated to one’s own body proportions, but it is good enough to casually take measurements for an outdoor sketch. The measurement as described obtains the angular elevation. Rotating the hand until the knuckles are horizontal allows you to similarly measure the bearing. So this is appropriate for the horizontal coordinate system.

Measuring angles in this system is somewhat natural, and the horizontal system is the one commonly used in surveying (Schofield and Breach 2007), for which there is no shortage of tools, both mechanical and digital. Measuring angles directly in the azimuthal system is rather more awkward, but up ahead we will see how horizontal system measurements may be easily converted to azimuthal measurements at key points of the drawing. So in either case, the horizontal system is the one most often used to obtain the raw measurements of a drawing.

Azimuthal Equidistant Spherical Perspective (360-degree Fisheye)

We will first consider the case of the azimuthal equidistant spherical perspective. As we have seen in Definition 5, a spherical perspective is defined by its flattening (and often, as in the present case, named after it). So we must begin by defining the flattening.

The Azimuthal Equidistant Flattening

We consider the flattening defined by the azimuthal equidistant map. This is the projection used by Barre and Flocon for their fisheye perspective (Barre and Flocon 1968). The flattening far antedates the perspective, having a long history in the cartography of both Earth and night sky. Barre and Flocon call it the Postel projection, as is usual in France, where it is named in honor of Guillaume Postel, who used it in a 1581 map, but it had in fact been used much earlier by several authors. The earliest extant example is a star map of 1426 by Conrad of Dyffenbach, and several others use it before Postel, including Albrecht Dürer in a 1515 map and Mercator as an inset for north polar areas on his 1569 navigational map (Snyder 1993, p. 29). According to Berggren (1981), it had already been described in principle, with an emphasis on its uses for celestial mapping, by al-Biruni in the tenth century, who in turn attributes its origin to other authors of the ninth century (Savage-Smith 2015, pp. 35–36). Snyder mentions that it may have been known to the ancient Egyptians and used for star maps. It is in fact eminently suited to that function, since it preserves angular distances from the zenith.

Although this flattening of the sphere is well known in cartography, it has been derived again and again by multiple authors interested in spherical perspective (the present one included) either out of ignorance of its existence or the need to express it in some particular convenient form. We here express it in a way especially useful for our purposes. The azimuthal equidistant flattening can be defined by the following requirements:

1. Maps F to (0, 0).
2. Maps F-geodesics to line segments.
3. Preserves lengths along F-geodesics.
4. Preserves tangents at F.

Let’s consider intuitively how to make such a flattening. The sphere may be seen as the union of all the F-meridians, that is, all the meridians going from F to its antipodal point B. Now imagine that the meridians are inextensible threads, and suppose that, keeping them fixed at F, you release them from B and straighten them without stretching on the sphere’s tangent plane at F, all the while keeping each meridian on its original plane through OF. Then you will have transformed the sphere of unit radius into a disc of radius π, satisfying the conditions above (see Figs. 8 and 9). Then F-meridians will be distributed radially, and distances along them will equal angular measurements from F. Parallels of the FB axis, that is, circles of the sphere which are parallel to the frontal plane, will project as uniformly spaced circles. In particular, the perspective image of the corona URD is a circle with half the diameter of the projection disc. The anterior space (the space in front of the observer) will project onto the inner disc bounded by the corona URD, and the posterior space (the space behind the observer) will project onto the ring between the corona’s circle and the outer rim of the perspective disc.

Fig. 8 The azimuthal equidistant flattening. Side and top views. Distances are preserved along F-meridians and angles are preserved at F

Analytically, the flattening may be seen as a two-step process: Given a point P = (x, y, z) on the sphere, we project it onto the observer’s frontal plane to get $P_F = (x, z)$. This determines the plane of the F-meridian containing P. We then scale this projection to a length of arccos(y), to preserve the angular measure of P’s position along the F-meridian. This defines a map $\mu : S_O^2 \setminus \{B\} \to \mathbb{R}^2$ by $\mu(P) = \frac{P_F}{|P_F|} \arccos(\overrightarrow{OP} \cdot \overrightarrow{OF})$, or, in coordinates,

$$\mu(x, y, z) = \frac{(x, z)}{\sqrt{x^2 + z^2}} \arccos(y) \tag{3}$$

which is a flattening according to Definition 5. Indeed, it is a homeomorphism from the dense subset of the sphere $U = S_O^2 \setminus \{B\}$ (the sphere minus the “Back” point) to the open disc $D = \{(x, z) : \sqrt{x^2 + z^2} < \pi\}$. According to Definition 6 the blowup set is the outer perimeter of the disc, $\mathcal{B} = \mathrm{cl}(D) \setminus D = \{(x, z) : \sqrt{x^2 + z^2} = \pi\}$,

Fig. 9 Several steps in flattening the sphere onto a disc. The sphere is pierced at B and the meridians straightened onto the tangent plane at F

and the inverse of μ can be extended by continuity to a map $\tilde{\mu} : \mathrm{cl}(D) \to S_O^2$, by setting $\tilde{\mu}(P) = B$ for all P in the blowup set $\mathcal{B}$. The blowup, which gets entirely mapped by $\tilde{\mu}$ to point B, can be seen as the replacement of the point B by a circle that represents the tangent plane of the sphere at B. Each point of the circle is connected to F by an F-meridian, representing a direction from which B may be approached. So, every radius from F to the rim of the disc is a line from F to B, but each such line codifies a different direction from which B may be reached along the sphere. The term blowup is borrowed from the analogous procedure in the study of singularities, wherein a point is replaced by a projective line.

This flattening is very natural when expressed in the azimuthal coordinate system we described above. It maps (θ, ζ) to ζ(cos(θ), sin(θ)), that is, (θ, ζ) act as polar coordinates from the sphere to the perspective picture plane.
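The flattening of Eq. 3 and its continuous inverse extension can be sketched in a few lines. The names `flatten` and `unflatten` are our own, angles are in radians, and the treatment of the special points is an assumption of the sketch: F is sent to the center as the limit of Eq. 3, and B is rejected because it has no single image.

```python
import math

def flatten(x, y, z):
    """Eq. 3: azimuthal equidistant flattening of a unit vector (x, y, z)."""
    r = math.hypot(x, z)
    if r == 0.0:
        if y > 0:
            return (0.0, 0.0)  # F goes to the center of the disc (limit of Eq. 3)
        raise ValueError("B has no single image: it is blown up to the rim of the disc")
    zeta = math.acos(max(-1.0, min(1.0, y)))  # polar angle measured from F
    return (x / r * zeta, z / r * zeta)

def unflatten(u, v):
    """Continuous extension of the inverse: the closed disc of radius pi back onto the sphere."""
    zeta = math.hypot(u, v)
    if zeta > math.pi + 1e-12:
        raise ValueError("point outside the perspective disc")
    if zeta == 0.0:
        return (0.0, 1.0, 0.0)  # the center is the image of F
    s = math.sin(zeta)
    # On the rim (zeta = pi) this returns B up to rounding, whatever the direction (u, v).
    return (s * u / zeta, math.cos(zeta), s * v / zeta)

# The flattening acts as polar coordinates: (theta, zeta) goes to zeta * (cos theta, sin theta).
theta, zeta = 0.7, 2.0
P = (math.sin(zeta) * math.cos(theta), math.cos(zeta), math.sin(zeta) * math.sin(theta))
print(flatten(*P), (zeta * math.cos(theta), zeta * math.sin(theta)))
```

Feeding two different rim points to `unflatten` returns the same point B up to rounding, which is the non-injectivity that characterizes the blowup set in Definition 6.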

Solving the Azimuthal Equidistant Spherical Perspective

Now that we have defined the azimuthal equidistant flattening, the corresponding perspective is intrinsically defined: to obtain the perspective image of a spatial point P, first project it radially to the sphere by doing $P \mapsto \overrightarrow{OP}/|\overrightarrow{OP}|$ and then flatten it to the plane. Given a scene, a computer could render it easily pixel by pixel, using Eq. 3. Since occlusion is radial in spherical perspective, even the usual hidden-faces algorithms would still work for rendering 3D scenes. But drawing by hand is a more complicated process that must be handled with judgment. How can you efficiently perform the necessary measurements and renderings if you can only handle dozens rather than millions of operations?

In the following we will be concerned with solving this perspective. By the term “solving a perspective,” we mean to obtain a systematic procedure for drawing all line images, plane images, and their vanishing sets, for an arbitrary scene, using simple geometrical operations that a human draughtsman may reasonably perform. Following Araújo (2018c), we adopt the strategy of classifying geodesics and using this as a basis for rendering all other objects. It is important to realize that the classification is not unique; it is a matter of convenience. Different partitions of the set of geodesics may be adapted to different strategies for rendering the perspective images of lines, planes, and their vanishing points. Proposition 3 gives us a simple classification of geodesics in this perspective, with two classes.

Lemma 1. Two distinct geodesics intersect at exactly two antipodal points.

Proof. Let $g_1, g_2$ be distinct geodesics defined by the planes $H_1, H_2$. Since both planes contain O, they intersect on a line through O, which must therefore intersect the sphere at two antipodal points belonging to both geodesics.
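Returning to the two-step recipe at the start of this section (radial projection followed by Eq. 3), the computer's side of the task can be sketched as follows. The names `perspective` and `segment_image` are hypothetical, and sampling a segment at evenly spaced spatial points is just one simple choice among many.

```python
import math

def perspective(P):
    """Perspective image of a spatial point: project radially to the sphere, then apply Eq. 3."""
    x, y, z = P
    rho = math.sqrt(x * x + y * y + z * z)
    if rho == 0.0:
        raise ValueError("the viewpoint O has no image")
    x, y, z = x / rho, y / rho, z / rho
    r = math.hypot(x, z)
    if r == 0.0:
        if y > 0:
            return (0.0, 0.0)  # points on the ray OF go to the center
        raise ValueError("points on the ray OB are blown up to the rim")
    zeta = math.acos(max(-1.0, min(1.0, y)))
    return (x / r * zeta, z / r * zeta)

def segment_image(start, end, n=16):
    """Polyline of n + 1 image points approximating the image of a spatial segment."""
    return [perspective(tuple(a + (b - a) * k / n for a, b in zip(start, end)))
            for k in range(n + 1)]

# A horizontal frontal segment in front of and above the observer.
for point in segment_image((-1.0, 1.0, 0.5), (1.0, 1.0, 0.5), n=4):
    print(point)
```

Because the first step divides by |OP|, the image depends only on the ray through P, so scaling a point away from O leaves its image unchanged.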

Proposition 3. In azimuthal equidistant perspective, a geodesic is projected as either a diameter of the perspective disc or an F-starred, closed curve.

Proof. Recall that, given a point P, a set is called P-starred if any ray from P intersects the set in one point at most. Let g be a geodesic not passing through F and H its plane. Since g doesn’t contain F, it doesn’t contain its antipode B either. The F-geodesics are a foliation of $S_O^2 \setminus \{F, B\}$. By Lemma 1, g intersects each F-geodesic at exactly two antipodal points, neither of which is F or B. Hence it intersects each FB-meridian at exactly one point, which is neither F nor B. Hence its image intersects each ray of the perspective disc at a single point, hence it is F-starred, and it is continuous and closed because the geodesic is continuous and closed and the flattening is continuous in $S_O^2 \setminus \{B\}$. As for geodesics through F, they must project as diameters by the definition of the azimuthal equidistant map.

Fig. 10 The LR family of geodesics, plotted at 5-degree intervals of polar angle along the median line

Let us investigate the aspect of these curves by parametric plotting (GeoGebra (Hohenwarter et al. 2013) code and grids are available at the author’s website, Araújo 2015). In Figs. 10 and 11, we can see the family of LR geodesics and its perspective image. The family has been plotted at 5-degree intervals of polar angle ζ at the median line, from the horizon at polar angle ζ = 0° to the corona at ζ = 90°. Each LR geodesic other than the horizon projects on the perspective as a continuous, closed, convex, F-starred curve. The corona projects as a circle with half the radius of the perspective disc. Then, as ζ decreases, the anterior part of the projected curve becomes flatter but still fairly circular in nature, while the posterior part of the curve (on the ring outside the corona) bulges to hug the line LR closer and closer as the geodesic approaches the horizon. The tangent at L and R becomes closer and closer to the horizontal until finally at ζ = 0° the geodesic reaches the horizon, which no longer projects as an F-starred curve, but instead collapses into a diameter of the disc.

Each geodesic divides the perspective disc into a convex region and a non-convex complement. Note that both these regions are the images of planes, or rather of equivalence classes of planes parallel to each other (since they all have the same vanishing set which is the geodesic itself). The planes that are inside the convex region are antipodal to the ones represented by the outer region – they are parallel but face opposite sides of the sphere. These geodesic images are rather complex curves in appearance, and ahead we will consider how to plot them by elementary means in close approximation. For now we will take them as given and consider the uses of these plots.
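The plots of the LR family can be reproduced, at least in outline, by sampling each geodesic and flattening the samples with Eq. 3. The sketch below is not the GeoGebra code mentioned above but a hypothetical stand-in. It assumes the parametrization P(t) = cos(t) R + sin(t) M, where M = (0, cos ζ, sin ζ) is the point where the geodesic crosses the anterior median line, and the names `flatten` and `lr_geodesic` are our own.

```python
import math

def flatten(x, y, z):
    """Eq. 3, with F sent to the center; B has no single image."""
    r = math.hypot(x, z)
    if r == 0.0:
        if y > 0:
            return (0.0, 0.0)
        raise ValueError("B has no single image")
    zeta = math.acos(max(-1.0, min(1.0, y)))
    return (x / r * zeta, z / r * zeta)

def lr_geodesic(zeta0_deg, n=360):
    """Image points of the LR geodesic crossing the anterior median line at polar angle zeta0.

    Intended for 0 < zeta0 <= 90 degrees; the horizon (zeta0 = 0) passes through F and B
    and projects as a diameter, so it is better drawn directly as a segment.
    """
    z0 = math.radians(zeta0_deg)
    points = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        points.append(flatten(math.cos(t),
                              math.sin(t) * math.cos(z0),
                              math.sin(t) * math.sin(z0)))
    return points

# The family at 5-degree intervals, from 5 degrees up to the corona at 90 degrees.
family = {z: lr_geodesic(z) for z in range(5, 95, 5)}
print(len(family), family[45][90], family[45][270])
```

Each list of points can be passed to any plotting tool as a closed polyline. Sample 0 is always the image of R and sample n/2 the image of L, which all curves of the family share.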

Fig. 11 The image of the LR family of geodesics under the azimuthal equidistant perspective

Fixed Grids for the Azimuthal Equidistant Perspective

A simple way of drawing spherical perspectives is by interpolation on grids. One can parametrize a finite number of horizontal and vertical lines on frontal planes, at regular intervals of elevation and bearing, respectively, and plot them through Eq. 3 to some required precision. Then any point may be obtained in approximation by locating it on this grid, to whatever precision it allows. In fact the ordinary literature on spherical perspective for artists uses such gridding methods almost exclusively. The most sophisticated use of fixed grids is probably what can be found in Michel (2013), and the intricate works therein make obvious that such methods can be perfectly adequate for many artistic applications. Let us therefore consider the construction of such grids.

Instead of parametrizing horizontal and vertical lines, we note that the family of LR geodesics we presented above is all that we need to print an adequate grid. Consider an LR geodesic that crosses the anterior median line at a polar angle ζ, say the ζ = 45° geodesic in Figs. 10 and 11. This may be interpreted as a pair of diametrically opposite frontal lines: the anterior meridian of the geodesic is the projection of a horizontal frontal line that crosses the median line in front of the observer at a 45° elevation, that is, at point (θ = 90°, ζ = 45°), and goes to the vanishing points L and R. On the other hand, the posterior meridian of the same geodesic (the antipodal image of the first meridian) can be seen as the image of a horizontal spatial line going from R to L that crosses the median line behind and

under the observer, at a negative 45° elevation, or at point (θ = −90°, ζ = 135°), that is, at the antipode of the first point. The meridians of these two lines are antipodal to each other and together form a complete geodesic. Repeating the plot of the LR family for negative ζ at the frontal median line, we obtain a symmetrical family of horizontal line plots, and this forms a full plot of horizontals.

Now consider the F-family of geodesics, which as we have seen project as diameters. Each diameter can be interpreted as a pair of lines that go from F to B, perpendicular to the coronal plane (central lines). Each diameter will represent two antipodal line images, say one passing the coronal plane to the left and above the observer and the other passing to the right and below the observer along the same plane through O and perpendicular to the coronal plane. We plot these diameters crossing the corona at regular intervals of θ. Since, as we have seen above, each ray from F in the perspective disc intersects each LR geodesic exactly once, the set of F rays together with the set of LR geodesic images forms a grid on the perspective disc, such that any point may be located at the intersection of two such lines. In Fig. 12 (left) we see a grid where we plotted these two classes of geodesics at 15-degree intervals. This can be interpreted either as a grid of geodesics or as a grid of central and frontal lines. In either case they locate any point on the sphere up to the precision determined by the grid interval.

As mentioned previously, a crucial part of treating a perspective is to consider the classes into which one should divide the set of geodesics. The partition is not unique. In the present case, one might as well choose a grid of crossing LR and UD geodesics (interpreted as horizontal and vertical lines) as in Fig. 12 (right). The UD geodesics are readily obtainable from the set of LR geodesics in Fig. 11 by a 90° rotation around F.
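The two grids just described can be assembled from very few ingredients, as the following sketch suggests. All names (`flatten`, `lr_curve`, `quarter_turn`, `diameter`, `geodesic_grid`) are hypothetical, and the signed angle used to label LR curves (positive above the horizon, negative below, at the anterior median line) is a convention of the sketch only.

```python
import math

def flatten(x, y, z):
    """Eq. 3, with F sent to the center; B has no single image."""
    r = math.hypot(x, z)
    if r == 0.0:
        if y > 0:
            return (0.0, 0.0)
        raise ValueError("B has no single image")
    zeta = math.acos(max(-1.0, min(1.0, y)))
    return (x / r * zeta, z / r * zeta)

def lr_curve(angle_deg, n=180):
    """Image of the LR geodesic through (0, cos a, sin a), for a nonzero signed angle a."""
    a = math.radians(angle_deg)
    return [flatten(math.cos(t), math.sin(t) * math.cos(a), math.sin(t) * math.sin(a))
            for t in (2.0 * math.pi * k / n for k in range(n))]

def quarter_turn(points):
    """Rotate image points by 90 degrees about F, the center of the disc."""
    return [(-v, u) for (u, v) in points]

def diameter(theta_deg):
    """End points of the image of the F-geodesic of azimuth theta (a diameter of radius pi)."""
    t = math.radians(theta_deg)
    end = (math.pi * math.cos(t), math.pi * math.sin(t))
    return (-end[0], -end[1]), end

def geodesic_grid(step_deg=15):
    """LR curves, UD curves (as rotated LR curves) and F-diameters at a common interval."""
    angles = [a for a in range(-90 + step_deg, 91, step_deg) if a != 0]
    lr = {a: lr_curve(a) for a in angles}
    ud = {a: quarter_turn(curve) for a, curve in lr.items()}
    fb = {a: diameter(a) for a in range(0, 180, step_deg)}
    return lr, ud, fb

lr, ud, fb = geodesic_grid(15)
print(len(lr), len(ud), len(fb))
```

The horizon and the median line are excluded from the curved families because they pass through F and project as diameters; they appear instead among the F-diameters, at azimuths 0° and 90°.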

Fig. 12 Geodesic grids. Left: Grid of LR and FB geodesics. Right: Grid of LR and UD geodesics. All geodesics plotted at 15-degree intervals
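As a rough numerical counterpart to the left grid of Fig. 12, the following Python sketch samples the two families of curves. Since Eq. 3 is not repeated in this section, the sketch assumes that the flattening sends a direction with polar angle ζ (measured from F) and azimuth θ to the disc point ζ(cos θ, sin θ), in degrees, with axes x toward R, y toward F, and z toward U. The names `flatten`, `lr_geodesic`, and `f_ray` are illustrative and do not come from the text.

```python
import math

def flatten(d):
    """Map a 3D direction (x toward R, y toward F, z toward U) to the disc.

    Assumed flattening: radius = polar angle from F in degrees,
    angle = azimuth measured from R toward U.
    """
    x, y, z = d
    n = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n
    zeta = math.degrees(math.acos(max(-1.0, min(1.0, y))))
    rho = math.hypot(x, z)
    if rho < 1e-12:
        # F maps to the centre; B blows up to the whole rim, so one
        # rim point is returned for it as an arbitrary representative.
        return (0.0, 0.0) if y > 0 else (180.0, 0.0)
    return (zeta * x / rho, zeta * z / rho)

def lr_geodesic(zeta_deg, steps=360):
    """Sample the full LR geodesic crossing the anterior median line at zeta."""
    a = math.radians(zeta_deg)
    pts = []
    for k in range(steps + 1):
        t = math.radians(360.0 * k / steps)
        # Great circle through R = (1, 0, 0) and the point (0, cos a, sin a).
        d = (math.cos(t), math.sin(t) * math.cos(a), math.sin(t) * math.sin(a))
        pts.append(flatten(d))
    return pts

def f_ray(theta_deg, radius=180.0):
    """Endpoints of the F-geodesic ray at azimuth theta (a radius of the disc)."""
    th = math.radians(theta_deg)
    return (0.0, 0.0), (radius * math.cos(th), radius * math.sin(th))

# 15-degree grid in the spirit of Fig. 12 (left): LR geodesics and F rays.
grid_lr = {z: lr_geodesic(z) for z in range(-75, 90, 15)}
grid_f = [f_ray(th) for th in range(0, 360, 15)]
```

Under these assumptions the ζ = 45° curve passes through (0, 45) on its anterior meridian and through (0, −135) on its posterior meridian, matching the points (θ = 90°, ζ = 45°) and (θ = −90°, ζ = 135°) discussed above.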

19 Spherical Perspective


We will see ahead that the set of LR geodesics is even more versatile than that: it can be seen as a seed for the set of all geodesics. But for now this is enough of grids. Thinking in terms of grids is a limiting view of perspective, both classical and spherical. In the next section, we will develop spherical perspective as a perspective proper, in the vein of Barre and Flocon's treatise (Barre and Flocon 1968), and then generalize it following Araújo (2018c). Only after that will we revisit grids, in a new way that makes them dynamic rather than fixed and exploits the symmetries of the perspective.

Note 6. While we focus here, for conceptual reasons, on grids of geodesics, there are other useful spherical grids made up of non-geodesic foliations: Fig. 8 shows a grid of constant polar angle circles crossing LR geodesics, and the curves of constant elevation together with UD geodesics make a very useful grid (see Araújo 2018c).

A Ruler and Compass Construction of the Azimuthal Equidistant Spherical Perspective

We will now learn to draw geodesics (and images of lines and planes) by ruler and compass constructions. We do this in two parts. In the first part, we follow Barre et al. (1964) to render the anterior hemisphere by arc of circle approximations. In the second part, we follow Araújo (2018c) in generalizing these constructions to the posterior hemisphere through the use of antipodal images.

A crucial point in the work of Barre et al. (1964) is that in the anterior disc, the geodesic projections are well approximated by arcs of circle, and their perspective constructions are all built under that approximation. In fact it is a matter of definition whether the fisheye perspective of Barre and Flocon is the azimuthal equidistant perspective or another one, defined by projection of geodesics in exact arcs of circle. Either way, they are functionally the same. The point of using arcs of circle is that they are very nice objects to work with, being the very next step in complexity from line segments: a line segment has zero curvature, while an arc of circle has constant curvature. Two points determine a line segment, while three points determine an arc of circle. Our goal in rendering lines in the anterior hemisphere consists in finding a set of three adequate points to define their arc of circle image. We start by noticing that points in the coronal plane are very easy to render.

Proposition 4. If a spatial point P is on the coronal plane, then its perspective image (also denoted P) is on the coronal circle, and the angle ∠PFR measured in the drawing equals the spatial angle ∠POR.

This proposition follows trivially from the form of the azimuthal equidistant flattening, but it suggests a very important construction that relates the spherical perspective and the orthographic projection. This is crucial for drawing from plan and elevation, just as in classical perspective.


Construction 1. Construction of coronal point. Let P ≠ O be a spatial point on the coronal plane. Draw an orthographic projection of the 3D scene, onto the coronal plane, on the same drawing as the perspective disc, with the point F of the drawing serving both as the perspective image and as the orthogonal image of the spatial point F. Then, if P is the perspective image of the spatial point and Pf is its orthogonal projection onto the coronal plane, P is the intersection of the ray OPf with the perspective image of the corona (see Fig. 13). Note that the orthographic projection may be scaled at will relative to the perspective disc.

We are now ready to construct the images of lines in the anterior hemisphere. Following Barre and Flocon (1968), we divide them into two classes: receding (intersecting the coronal plane) and frontal (parallel to the coronal plane). In particular, receding lines may be central, that is, have F and B as vanishing points.

Construction 2. Construction of central lines. Suppose l is a central line. Then it intersects the coronal plane at a point P. Let Pf be the frontal orthographic projection of P. Then the anterior image of l is the radius of the anterior disc that goes through Pf.

Proof. Since l is a central line, it has F as a vanishing point and must project as a radius. By Construction 1, it must contain Pf.

Construction 3. Construction of frontal lines in the anterior half space. Let l be a line on an anterior frontal plane. Then l has two antipodal vanishing points V

Fig. 13 The perspective image of a point P on the coronal plane is the point P on the image of the corona such that θ = ∠PFR = ∠POR. This may be obtained directly by drawing a line from F to Pf, the orthogonal projection of P on the coronal plane, scaled at will and overlaid on the perspective plane with the perspective image of F coincident with its orthogonal image


and V′, obtained by translating l to O and intersecting it with the sphere. Since l is frontal, its vanishing points lie on the corona. Plot them as in Construction 1. Also, l must intersect either the median plane or the horizon. Since both the median plane and the horizon define F-geodesics, angles are preserved along their images. Hence if P is a point of l on the horizon (resp. median plane), then its image P is a point on the horizontal (resp. vertical) diameter of the perspective disc such that |FP| = ∠POF. Then line l projects on the anterior plane as the arc of circle VPV′. If l is in the coronal plane, then it projects as one half of the coronal circle.

In particular we can now construct horizontal and vertical lines in the anterior half space. They just project as LPR (resp. UPD), where P is the point where the line intersects the median plane (resp. horizon). Note that we like to measure the points where the lines cross the median plane or the horizon because at those positions the polar angle is the same as the bearing or the angular elevation, respectively, and as we have mentioned before, it is much more natural to measure elevation and bearing in the horizontal system than polar angle and azimuth in the azimuthal equidistant system. Through the judicious use of the median, horizon, and coronal intersections, we relate the azimuthal system, which is natural for this perspective, with the horizontal system of coordinates, which is convenient for measurements from nature. This suggests a construction for points from their orthogonal projections on the median and coronal planes:

Construction 4. Construction of points in the anterior space. Let P be a point in the anterior half space. Pass a frontal horizontal line and a frontal vertical line through P. Project the lines as in Construction 3 to get two arcs of circle. Then the image of P is found at the intersection of the two arcs.

Now we can plot general receding lines.

Construction 5. Construction of receding lines in the anterior space. Let l be a receding line. It will intersect the coronal plane at a point P. Plot P as in Construction 1. Since P is in the geodesic of the line, so is its antipode P′, which is found diametrically opposite to P on the corona. Let lO be the parallel to l through O. It intersects the anterior hemisphere at a point V, which is a vanishing point of l. Project V as in Construction 4. Then the line image on the anterior half space is the segment PV of the arc PVP′.

Let us see some examples of the perspective constructions we can obtain:

Example 4. In Fig. 14 we see three lines. Line l = VPV′ is on a frontal plane somewhere ahead of the observer. It makes a 30-degree angle with the horizon, going down as it goes to the right, so that its vanishing points lie on the corona and ∠VFR = 30°. The line crosses the median plane at point P, which is at 60-degree


Fig. 14 Three lines rendered in the anterior half space following the method of Barre and Flocon (1968)

elevation in front of the observer and therefore is marked two thirds of the way up on the segment FU. Line l1 is a receding line on a horizontal plane, going 45° to the right; hence it has a vanishing point V1 at the midpoint of segment FR. It crosses the corona at a 45-degree angle down and to the left of the observer, as evidenced by point P1, such that ∠P1FL = 45°. Since P1 is on the line, the antipode P1′ must be on its geodesic, which is therefore equal to the arc P1V1P1′ in the anterior disc. However, the line itself only covers the segment P1V1 of this arc. Line l2 is also receding but is a central line; hence its vanishing point in the anterior space is point F, and the line projects as a straight line segment P2F, where P2 is the projection on the corona of its intersection with the coronal plane, which in this case happens somewhere at a 20-degree angle down and to the left of the observer.

These constructions, along with the decision to approximate line images by arcs of circle, are the essence of the (hemi)spherical perspective of Barre and Flocon (1968). We will now consider its generalization to the full sphere, following Araújo (2018c). This method piggybacks on the previous one by relying on the arc of circle approximation in the anterior hemisphere and extending the line classification and geodesic rendering through the use of antipodal images. We begin with the construction of the antipode of a point.
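To get a feel for the arc of circle approximation that underlies Constructions 3 to 5, the following Python sketch compares the circle through three construction points with a sampled geodesic. It is only an illustration under stated assumptions: the flattening is taken to map polar angle ζ and azimuth θ to ζ(cos θ, sin θ) in degrees (axes x toward R, y toward F, z toward U), and the test case is the horizontal frontal line that crosses the median plane at 45° elevation, so the three points are L, P, and R. The function names are hypothetical.

```python
import math

def flatten(d):
    """Assumed azimuthal equidistant flattening of a unit direction (x, y, z)."""
    x, y, z = d
    zeta = math.degrees(math.acos(max(-1.0, min(1.0, y))))
    rho = math.hypot(x, z)
    if rho < 1e-12:
        return (0.0, 0.0)  # the direction of F; B is not needed in this sketch
    return (zeta * x / rho, zeta * z / rho)

def circle_through(a, b, c):
    """Centre and radius of the circle through three non-collinear points."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    det = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(det) < 1e-12:
        raise ValueError("points are collinear: the image is a straight segment")
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / det
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / det
    return (ux, uy), math.hypot(ax - ux, ay - uy)

def arc_error(zeta_deg):
    """Largest gap (in disc degrees) between the arc L-P-R and the sampled geodesic."""
    a = math.radians(zeta_deg)
    center, radius = circle_through((-90.0, 0.0), (0.0, zeta_deg), (90.0, 0.0))
    worst = 0.0
    for t_deg in range(1, 180):  # anterior meridian only
        t = math.radians(t_deg)
        d = (math.cos(t), math.sin(t) * math.cos(a), math.sin(t) * math.sin(a))
        x, y = flatten(d)
        gap = abs(math.hypot(x - center[0], y - center[1]) - radius)
        worst = max(worst, gap)
    return worst

print(arc_error(45.0))  # to be compared with the disc radius of 180 units
```

The returned gap is expressed in the same angular units as the disc, so it can be read directly against the scale of the drawing.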


Fig. 15 Construction of antipodes. Since ∠POF = 180° along the FP geodesic, then |PP′| equals half a diameter along the ray from P to F

Proposition 5. Let P ≠ B be a point with perspective image P. The image of the antipode of P is the point P′ on the ray PO (toward the center of the disc, where F is drawn) such that |PP′| equals half a diameter of the perspective disc.

Proof. Since P and F are not antipodal, they define a single F-geodesic, which projects as a diameter of the perspective disc. To find P′ from P, one must move 180° along this geodesic (see Fig. 15). Hence, since lengths are preserved on F-geodesics, |PP′| must equal half a diameter. Since the projection is continuous, it must preserve the ordering of points, so P′ must be across from O on the projected geodesic, therefore on the ray PO.
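Proposition 5 translates into a one-line computation in disc coordinates. The sketch below is a minimal illustration, assuming that the center of the disc (the image of F) is the origin and that the disc has radius 180, so that half a diameter is 180 units; `antipode_image` and `antipodal_meridian` are hypothetical names.

```python
import math

def antipode_image(p, radius=180.0):
    """Image of the antipode of the point whose image is p.

    Moves from p toward the centre (the image of F) by half a diameter,
    that is, by the disc radius.
    """
    x, y = p
    r = math.hypot(x, y)
    if r == 0.0:
        raise ValueError("the antipode of F is B, which has no single image point")
    scale = 1.0 - radius / r
    return (x * scale, y * scale)

def antipodal_meridian(points, radius=180.0):
    """Apply the construction pointwise to sampled points of a meridian."""
    return [antipode_image(p, radius) for p in points]

# The point at 45 degrees elevation on the anterior median line ...
print(antipode_image((0.0, 45.0)))
# ... is sent 135 units below the centre, on the posterior median line.
```

Applying the map twice returns the original point, as expected of an antipodal construction.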

Now we can draw a posterior meridian from a given anterior meridian.

Corollary 1. Let g be a geodesic and m a meridian image in the anterior perspective disc. Then m′, the antipodal meridian of m, is the locus of the points P′ determined by Proposition 5 for P in m, and we have g = m ∪ m′.

Construction 6. Construction of the antipodal image of a meridian. Let m be the arc of circle approximation of an anterior meridian of a geodesic g. Mark points Yi, i = 1 . . . n, at regular angular intervals along m. Draw the antipodes Yi′ according to Proposition 5, and interpolate a smooth curve between them to draw an approximation to m′.

The interpolation in Construction 6 may be done in several ways. Araújo (2018c) uses so-called fatlines, that is, sequences of overlapping arcs of circle between sequential triples of the antipodal reference points, in the sequence Y1′Y2′Y3′, Y2′Y3′Y4′, etc., so that each triple shares two points with the previous one. The corresponding overlap of the arcs serves as a guide for the accuracy of the approximation. If the arcs do not


align well enough at the overlap (if the line is “fat”), then this serves as a warning to further subdivide the offending region of the meridian, adding further points Yi to the list of points that generate the locus, until a required accuracy is reached. This can be done in the specific areas identified, since the required density of points will not be equal along all areas of the meridian, curvature changing faster near the corona than elsewhere. For the draughtsman, the hardest part of the process is theoretically the most prosaic: drawing the arcs between the triples takes time and effort using ruler and compass by the usual Euclidean method of taking perpendiculars through the segment midpoints to find the center of the arc. But any trained draughtsman may in practice abbreviate that part by just eyeballing constant curvature arcs through each set of three given points. Of course, other interpolation methods may be used to join the points. For instance, arcs may be drawn without overlap, using the matching of end tangents to judge accuracy; Bézier curves, which are easily drawn by hand, may be interpolated instead of arcs of circle; and, of course, many draughtsmen will in practice just reach for their French curves, curve guides, or flexible curves and judge the approximation by eye.

Apart from fatlines, Araújo (2018c) proposes a mechanical method for very quick drawing with excellent accuracy, called “ruler, compass and nail.”

Construction 7. Stick a nail through the drawing at point F. Suppose a meridian m is given in the anterior disc. Suppose the perspective disc has radius r in the drawing sheet. Take a ruler marked at length 0 and at length r. Move the 0 mark of the ruler over the arc of the meridian m, while sliding the edge of the ruler against the nail, so that it never stops touching it; then, by Proposition 5, the r mark of the ruler will describe the locus of m′.

This provides a very quick and accurate way of plotting as many points as required of the antipodal meridian. Just slide the ruler as described, stopping along the way to mark as many points as desired, and then interpolate by hand. It is quite easy and efficient to perform the process as described, as you only need to keep the eye on mark 0 and feel the contact of the ruler against the nail to make sure the ruler is properly positioned to draw the locus of m′ at the other end. One could further conceive of a simple mechanism to draw the locus automatically and continuously as the 0 mark follows the meridian m: a ruler with a lengthwise slit wherein the nail could slide, with pencils at both marks. But the method as described is efficient enough and requires only ordinary drawing tools (apart from the nail).

We now generalize the constructions of lines to the full perspective disc:

Construction 8. Construction of central lines. Suppose l is a central line. Then it intersects the coronal plane at a point P. Plot this point according to Construction 1. The image of l is the radius of the perspective disc that goes through P.


Construction 9. Construction of posterior frontal lines. Let l be a line on a frontal posterior plane. Then l has two antipodal vanishing points V and V′ on the coronal geodesic. Plot them as in Construction 1. Also, l must intersect either the median plane or the horizon at some point behind the observer. If P ∈ l is on the horizon (resp. median plane), then its image P is a point on the horizontal (resp. vertical) diameter of the perspective disc such that |FP| = ∠POF. Obtain P′ by the construction in Proposition 5. The arc m = VP′V′ is the image of the anterior meridian of the geodesic through l. Draw m by Construction 3. Draw m′ by Construction 6 to obtain the image of l.

Construction 10. Plotting posterior points from orthographic projections. Let P be a point in the posterior space, let v and h be the frontal vertical and frontal horizontal through P, and let PH, PM be the perspective images of their respective orthographic projections in the horizon and in the median plane. Plot the antipodes PH′ and PM′ by Proposition 5. Pass a frontal vertical v′ through PH′ and a frontal horizontal h′ through PM′ as in Construction 4. These intersect at P′. Construct the full geodesics of v′ and h′ by Construction 6. These geodesics will contain v and h and intersect at P. Alternatively, if the full geodesics are not required, just obtain P by Proposition 5 once P′ is obtained.

Construction 11. Construction of receding lines in the posterior hemisphere. If l is a receding line, then it has a point image P on the coronal circle and a vanishing point in each of the hemispheres. If the posterior vanishing point V is given, construct the anterior vanishing point V′, according to the way V is given. If the plot of V is known, use Proposition 5; if the orthographic projections of an anamorph point of V are known, use Construction 10. Draw the anterior meridian PV′P′ by Construction 5. Then draw the full geodesic of l through Construction 6. The image of l is the subset VPV′ of the full geodesic.

Example 5. Figure 16 illustrates the use of the preceding constructions in obtaining the antipodal of line l and the prolongations to the posterior hemisphere of lines l1 and l2 from Example 4.

It is often useful, especially in the plotting of vanishing points, to use the horizontal coordinate system. To use it graphically, it is necessary to know how to plot the lines of constant elevation (the lines of constant bearing are just vertical planes through O, so we already know how to plot them). The following proposition solves this in the anterior hemisphere, following Barre and Flocon (1968):

Proposition 6. Let h be a circle of constant elevation ϕ on the visual sphere around O. Then h intersects the median line at the point PM such that |PM F| = ϕ and intersects the corona at points PL and PR such that ∠PL F L = ∠PR F R = ϕ. The anterior image of h is approximated by the arc of circle PL PM PR.


Fig. 16 Construction of lines and their geodesics on the full perspective disc according to the method of Araújo (2018c)

In the posterior hemisphere, we use the following proposition, following Araújo (2018c):

Proposition 7. Let h be a circle of constant elevation ϕ on the visual sphere around O. Let P ≠ F be a point of h in the anterior disc. Let M be the intersection of the ray FP with the corona. Let Q be the point such that M is the midpoint of PQ. Then Q is in h. The posterior part of h is the locus of the Q thus defined by P, for all P in the anterior part of h.

Proof. This follows from the fact that the ray FP defines an F-meridian, hence preserving lengths, and an F-meridian will intersect a line of constant elevation at exactly two points. These points are equidistant from the corona, since both the circle of h and the plane of the F-meridian are mirror symmetric relative to the coronal plane.

Construction 12. To construct an approximation of a constant elevation line, use the arc of circle approximation in the anterior hemisphere, and then construct from it the locus of Proposition 7 to use as an approximation for the posterior line.
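The reflection in Proposition 7 is easy to check numerically. The following Python sketch does so under stated assumptions: the flattening maps polar angle ζ and azimuth θ to ζ(cos θ, sin θ) in degrees, bearing is measured from F toward R, elevation from the horizon toward U, and the corona is the circle of radius 90 around the center. Unlike Construction 12, the anterior points here are computed exactly rather than by the arc of circle approximation. All function names are illustrative.

```python
import math

def flatten(d):
    """Assumed azimuthal equidistant flattening of a unit direction (x, y, z)."""
    x, y, z = d
    zeta = math.degrees(math.acos(max(-1.0, min(1.0, y))))
    rho = math.hypot(x, z)
    if rho < 1e-12:
        return (0.0, 0.0) if y > 0 else (180.0, 0.0)
    return (zeta * x / rho, zeta * z / rho)

def from_horizontal(bearing_deg, elevation_deg):
    """Unit direction for a bearing (from F toward R) and an elevation."""
    lam, phi = math.radians(bearing_deg), math.radians(elevation_deg)
    return (math.cos(phi) * math.sin(lam),
            math.cos(phi) * math.cos(lam),
            math.sin(phi))

def reflect_in_corona(p, corona_radius=90.0):
    """Point Q such that M, on the corona along the ray FP, is the midpoint of PQ."""
    x, y = p
    r = math.hypot(x, y)
    if r == 0.0:
        raise ValueError("the construction requires P different from F")
    mx, my = corona_radius * x / r, corona_radius * y / r
    return (2.0 * mx - x, 2.0 * my - y)

def elevation_circle(phi_deg, step=5):
    """Anterior samples of a constant elevation circle and their reflections."""
    anterior = [flatten(from_horizontal(b, phi_deg)) for b in range(-90, 91, step)]
    posterior = [reflect_in_corona(p) for p in anterior if p != (0.0, 0.0)]
    return anterior, posterior

anterior, posterior = elevation_circle(45.0)
```

For a quick check, the reflection of the image of (bearing 30°, elevation 45°) can be compared with the directly computed image of (bearing 150°, elevation 45°), which is its mirror image in the coronal plane.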


Note that although the result of Proposition 7 is correct, the use of the arc of circle approximation in Construction 12 leads to errors also in the posterior curve. These are mostly visible as a break of continuity of the tangent near the intersection with the coronal circle. See Araújo (2018c) for details.

Perspective Constructions

The constructions we obtained above allow us to draw general scenes in azimuthal equidistant spherical perspective. Let us explore some examples of perspective constructions.

Tiled Floor (Central)

In Fig. 17 we see the construction of a tiled floor with a central axis (or a uniform grid), a classical problem in perspective. We assume a grid of square tiles on a plane perpendicular to the coronal plane. Assume one of the vertices of the grid is at D and a line q of the grid is parallel to LR. Then the vertices of the grid on line q will lie on the coronal plane at equally spaced points Qi, i ∈ Z. The lines of the grid perpendicular to q are central lines, going back to front; hence they project as rays

Fig. 17 Construction of a uniform central grid


Fig. 18 Construction of a cubical room from 45-degree geodesics

FQi. To find the lines of the grid going left to right, we use exactly the same trick as in classical perspective: we pass a horizontal line through D, going at 45° to the grid lines. Then it will intersect exactly one vertex of the grid per row. To mark this line, place its vanishing point V at the midpoint of FR. Since the line passes through D, its geodesic also passes through U = D′. Draw the full geodesic g as in Construction 6. The vertices of the grid will be at the points Gi, i ∈ Z, where g intersects the central lines of the grid.
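The central tiled floor can also be cross-checked by projecting the grid vertices directly. The Python sketch below assumes an observer at height `height` above the floor, square tiles of side `side`, a grid vertex directly below the observer (in the direction of D), and the same assumed flattening as before (polar angle ζ and azimuth θ sent to ζ(cos θ, sin θ), axes x toward R, y toward F, z toward U). The parameter values and function names are arbitrary choices for illustration.

```python
import math

def flatten(d):
    """Assumed azimuthal equidistant flattening of a (not necessarily unit) direction."""
    x, y, z = d
    n = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n
    zeta = math.degrees(math.acos(max(-1.0, min(1.0, y))))
    rho = math.hypot(x, z)
    if rho < 1e-12:
        return (0.0, 0.0) if y > 0 else (180.0, 0.0)
    return (zeta * x / rho, zeta * z / rho)

def central_floor(height=1.0, side=1.0, columns=range(-3, 4), rows=range(0, 6)):
    """Images of the floor vertices (i*side, j*side, -height); row 0 is the line q."""
    return {(i, j): flatten((i * side, j * side, -height))
            for i in columns for j in rows}

def azimuth(p):
    """Angle of a disc point around the centre, in degrees."""
    return math.degrees(math.atan2(p[1], p[0]))

floor = central_floor()

# Vanishing point of the horizontal line going at 45 degrees to the right.
V = flatten((1.0, 1.0, 0.0))

# Vertices met by the 45-degree line through D: one per row.
diagonal = [floor[(k, k)] for k in range(0, 4)]
```

In this sketch all vertices of a given column share one azimuth, which is the numerical counterpart of the central lines projecting as rays from F, and V falls at the midpoint of FR.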

Inside a Cube

In Fig. 18 we show the construction of a cubical room seen from its center. The whole construction is done through the use of a 45-degree geodesic like the one in the previous example, which crosses the vertices of the room. Note that we have shaded the planes of the walls with the exception of the back wall, which ends at the blowup; the resulting region is shaped like a four-leafed clover. Notice the extreme distortion in the back wall, which nonetheless does not preclude an accurate construction by the techniques presented here.


Fig. 19 A square on the ground. Spherical perspective view (top) constructed from the orthographic projection onto a horizontal plane (bottom)

Arbitrary Square

In Fig. 19 we show the construction of an arbitrary square on the horizontal plane through D, in this case slanted at a 30-degree angle to the OF axis. We draw the orthographic top view of the square at the bottom of the figure, at a convenient distance from the perspective drawing. Line h is a top view of the coronal plane. We extend the four sides of the square along their lines, find the intersections of these with h, and then lift these intersections to the orthographic image of line h on the perspective disc itself, using Construction 1 to get their images at the coronal circle. These four lines are all examples of receding lines, so we use their intersections with the coronal plane, and their antipodes, together with their vanishing points at 30° to the right and 60° to the left on the horizon, to obtain the images of all four lines. Their intersections determine the image of the square ABCE.


Fig. 20 Multiplication of a square to form a uniform grid or floor tiling

Tiled Floor

In Fig. 20 we show how to multiply the arbitrary square we designed in Fig. 19 to get a tiled floor that is not aligned with the central axis (Fig. 21). In this case we get a grid on a horizontal plane, one of whose axes goes to a vanishing point V at 30° to the right of F and the other to a vanishing point W at 60° to the left. Starting from square ABCE, we pass two lines a and b at 45° to the edges of the square, going to vanishing point Z at 75° to the right of F and to its antipode at 105° to the left of F. In the orthographic view, we see how these lines are used to obtain the grid: edge AB is extended until it hits line a. The point P1 where it hits a is a vertex of the grid, and two crossing lines of the grid are obtained by passing lines from P1 to the vanishing points V and W. One of these lines will hit line b, obtaining another point P2, and the process repeats. The same thing happens in the opposite direction, obtaining the points P−1, P−2, and so on. In each case, one obtains a line that ping-pongs between lines a and b and at each collision locates another vertex of the grid, which in turn generates two lines of the grid. In this way the whole grid is obtained by perspective multiplication (Fig. 21).


Fig. 21 The final uniform tiling, constructed in Fig. 20

A fundamental departure from the classical method adopted in Fig. 17 is that we don’t use the orthographic projection to plot the points Pi , which is convenient as these points quickly extend beyond the boundaries of the drawing sheet, as they similarly did in the grid of Fig. 17 or indeed in classical perspective. By contrast, in Fig. 20 all the constructions beyond the first square and the lines a, b are internal constructions, that is, are done directly from the points in the perspective disc without requiring external auxiliary diagrams. This makes the construction method compact, like the end result itself, and is therefore more in accord with the general philosophy of spherical perspective. In Fig. 22 we added cubes and a ramp to Fig. 21 in an example of the use of vanishing planes and constant elevation lines. Consider first the ramp, drawn on the right side of the picture. Its base occupies 2 × 5 square tiles. The long axis of the base goes toward the horizon, to vanishing point V at 30◦ to the right of F . Suppose the ramp has a slope of 45◦ . Consider the proximal vertical plane that contains the long side of the ramp. Since the plane is vertical and contains the base line going to V , its vanishing plane is the geodesic through V U . Since the sloping edges l of the ramp rise at 45◦ from the ground, their translation to the origin, lO , rises at 45◦ from O with respect to the horizon. Hence it is on the line of constant elevation 45◦ . We


Fig. 22 Cubes and ramps, drawn using a ground tiling as reference and sending lines to vanishing points on the curve c, which is a circle of constant elevation 45°

draw that line of constant elevation by the construction of Proposition 6 and denote it by c in Fig. 22. The vanishing point V′ of the sloping edges is at the intersection of the geodesic UV with c. We draw the ramp by sending lines from the base to V′ and stopping them when they hit the vertical coming from the base of the ramp.

Now consider the three boxlike figures at the center and left of the picture. All three represent cubes of exactly the same size, which is a testament to how disconcerting the deformations of this perspective may be with regard to one's intuitive judgment of size relations. In all three cases, a base was established that is equal to 3 × 3 squares. The leftmost cube stands on this base, while the others are aligned with theirs by vertical lines. To establish the height of the cubes, one just has to note that on each face of a cube, opposite vertices are connected by a diagonal making a 45-degree angle with the edges of the cube. Therefore, to establish height, send such a diagonal from a base vertex to its vanishing point, and the top vertex will be at the point where the diagonal hits the distal vertical of the same face of the cube.
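The vanishing point of the sloping edges of the ramp can be located numerically from its horizontal coordinates. The Python sketch below is an illustration only: it assumes the flattening that maps polar angle ζ and azimuth θ to ζ(cos θ, sin θ) in degrees, with bearing measured from F toward R and elevation measured from the horizon toward U. The helper names and the variables `V` and `V_RAMP` are hypothetical.

```python
import math

def flatten(d):
    """Assumed azimuthal equidistant flattening of a unit direction (x, y, z)."""
    x, y, z = d
    zeta = math.degrees(math.acos(max(-1.0, min(1.0, y))))
    rho = math.hypot(x, z)
    if rho < 1e-12:
        return (0.0, 0.0) if y > 0 else (180.0, 0.0)
    return (zeta * x / rho, zeta * z / rho)

def from_horizontal(bearing_deg, elevation_deg):
    """Unit direction for a bearing (from F toward R) and an elevation."""
    lam, phi = math.radians(bearing_deg), math.radians(elevation_deg)
    return (math.cos(phi) * math.sin(lam),
            math.cos(phi) * math.cos(lam),
            math.sin(phi))

def on_same_geodesic(d1, d2, d3, tol=1e-9):
    """True when three directions lie on one plane through O (one great circle)."""
    cx = d1[1] * d2[2] - d1[2] * d2[1]
    cy = d1[2] * d2[0] - d1[0] * d2[2]
    cz = d1[0] * d2[1] - d1[1] * d2[0]
    return abs(cx * d3[0] + cy * d3[1] + cz * d3[2]) < tol

U = (0.0, 0.0, 1.0)
v_base = from_horizontal(30.0, 0.0)   # long axis of the ramp base
v_ramp = from_horizontal(30.0, 45.0)  # sloping edges, rising at 45 degrees

V = flatten(v_base)        # vanishing point on the horizon
V_RAMP = flatten(v_ramp)   # vanishing point of the sloping edges

print(V, V_RAMP, on_same_geodesic(U, v_base, v_ramp))
```

The coplanarity test mirrors the argument in the text: the direction of the sloping edges lies both on the vertical plane through U and V and on the circle of constant elevation 45°.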


Dynamic Grids

Let us revisit grid methods. Using grids can be liberating, especially when drawing from nature, where ruler and compass constructions can become cumbersome. However, drawing on top of fixed grids like those of Fig. 12 is also very limited. Most lines will not be in the class one chose to plot in the grid and must therefore be guessed at with great uncertainty. Araújo (2019b) proposes a method of dynamic grids that has the mechanical simplicity of a grid method without the limitations. The method works by exploiting the group of symmetries of a specific perspective to create a simple mechanical device that generates all geodesics from a seed family. It can be implemented in other spherical perspectives (as we shall see ahead in the equirectangular case) that have other symmetry groups. In the azimuthal equidistant case, it works as follows.

The LR family of geodesics that cross the median plane at ζ ∈ [0, 90°] (Fig. 11) generates all other geodesics by rotation around the y-axis (i.e., the FB axis), and that spatial rotation acts on the perspective plane as a rotation around F. Indeed, giving a geodesic is the same as giving a plane through O, and all planes through O will either be the coronal plane or intersect it on a diameter. The azimuth of the intersection, together with the dihedral angle ζ between the plane of the geodesic and the coronal plane, therefore determines the geodesic. If we define the apex of the geodesic relative to OF to be the point of a geodesic that is closest to F, any geodesic outside the coronal plane is uniquely determined by the (θM, ζM) coordinates of its apex, with ζM ∈ [0, 90°] and θM ∈ [−180°, 180°]. Then we can select the ζM of a geodesic by picking one curve of the LR family and select the value of θM by rotating this chosen element around F, which changes θM for fixed ζM.

This suggests a simple mechanical method to draw the segment between two given points (Araújo 2019b). Place a print of the LR family of geodesics under a sheet of tracing paper, and fix one to the other by sticking a nail through point F in the print. The drawing will be executed on the tracing paper sheet. Then, given two points P and Q (Fig. 23) drawn on the tracing paper sheet, rotate the drawing on top of the print; there is one and only one geodesic of the LR family that will pass through the given points. Once found, just trace over it. Of course, since we are only printing a finite number of geodesics, the correct geodesic will often be found between two printed ones and can be traced by drawing between these, with a measurement error bounded by half the angular distance between printed lines.

Unlike the ruler and compass methods of Barre and Flocon (1968) and Araújo (2018c), this method avoids the need for measuring and plotting special points in the construction of geodesics (any two points will do), and unlike the fixed grid methods, it is not constrained by arbitrary setups. Also, when tracing from parametric plots, we are using a more accurate rendering of geodesics (no longer using the arc of circle approximation), which may be somewhat more accurate when recovering the anamorphosis from the perspective (say, in VR renderings, which we will discuss ahead).
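The search performed by rotating the tracing paper has a direct numerical analogue: given two points of the disc, one can compute the apex coordinates (θM, ζM) of the geodesic through them. The following Python sketch does this under stated assumptions: the flattening is inverted as polar angle = distance from the center in degrees and azimuth = angle around the center, with axes x toward R, y toward F, and z toward U. The names `unflatten` and `geodesic_apex` are illustrative and are not taken from Araújo (2019b).

```python
import math

def unflatten(p):
    """Unit direction for a disc point, inverting the assumed flattening."""
    u, v = p
    zeta = math.radians(math.hypot(u, v))
    theta = math.atan2(v, u)
    return (math.sin(zeta) * math.cos(theta),
            math.cos(zeta),
            math.sin(zeta) * math.sin(theta))

def geodesic_apex(p, q):
    """Apex coordinates (theta_M, zeta_M), in degrees, of the geodesic through p and q."""
    a, b = unflatten(p), unflatten(q)
    # Normal of the plane through O containing both directions.
    n = (a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0])
    norm = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    if norm < 1e-12:
        raise ValueError("points coincide or are antipodal: the geodesic is not unique")
    n = (n[0] / norm, n[1] / norm, n[2] / norm)
    # Orthogonal projection of F = (0, 1, 0) onto the plane gives the apex direction.
    ax, ay, az = -n[1] * n[0], 1.0 - n[1] * n[1], -n[1] * n[2]
    length = math.sqrt(ax * ax + ay * ay + az * az)
    if length < 1e-12:
        raise ValueError("coronal geodesic: all of its points are equally far from F")
    zeta_m = math.degrees(math.acos(max(-1.0, min(1.0, ay / length))))
    # theta_M carries no information when zeta_M is 0 (an F-geodesic).
    theta_m = math.degrees(math.atan2(az, ax))
    return theta_m, zeta_m

theta_m, zeta_m = geodesic_apex((90.0, 0.0), (0.0, 45.0))
print(theta_m, zeta_m)
```

In this convention the seed curves of the LR family have their apex on the median line, at θM = 90°, so the rotation to apply to the print would be θM − 90°; this offset is a consequence of the axes assumed here, not a statement from the text.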


Fig. 23 Finding the geodesic that joins two given points P and Q by rotation of the LR family around F. This can be executed mechanically by placing a tracing paper sheet over a print of the family of geodesics and sticking a pin through the center of both sheets. Then the tracing paper containing points P and Q can be easily rotated around F while the printed geodesics remain fixed, and thus the geodesic through P and Q will be found

Drawing from Nature

We end the section with two drawings made from observation. The first one, a picture of a stairwell (Fig. 24) drawn entirely with the ruler and compass methods, required careful measurements and the preliminary construction of a plan and elevation of the scene. The steps of the stairs are built from lines vanishing to F and U that bounce from twin sets of ramps going to common vanishing points at around 33° elevation. The second example (Fig. 25) was sketched using the dynamic grid method in a much looser way, using a few key measurements to obtain the scene, which certainly is far from exact and yet essentially correct. The two drawings are both small, fitting, respectively, into an A4 and an A3 drawing sheet, and show what can be done by an average draughtsman with the methods presented above.

Equirectangular Perspective

We will now consider a different flattening and its corresponding perspective: the equirectangular spherical perspective. We will not go into as much detail as in the previous case, but rather establish parallels with it, while referring the reader to external references for details. Our point here is that there is a general schema, or strategy (although not an algorithm), for solving spherical perspectives. We will therefore stress the parallels between the two cases.


Fig. 24 Construction of a stairwell. Pencil and ink on graph paper. Drawing by the author

VR Panoramas as Immersive Anamorphoses Before we start solving equirectangular perspective, let us take the time to point out its peculiar connection with VR visualizations, for which it is particularly convenient. As we have discussed above, spherical perspectives can be seen as repositories of the visual information of a spherical anamorphosis, that is, of an immersive optical illusion that elicits in the observer the impression of being surrounded by a 3D environment. The construction of immersive anamorphoses is a problem with a long history. The problem was posed and answered in various ways, from the partial to the fully immersive, and using a great variety of projection surfaces. Andrea Pozzo, among many, deals with this problem both theoretically (Pozzo 1693) and in his work itself (Fasolo and Mancini 2019), for anamorphoses on surfaces planar, multiplanar, or even irregularly curved. The hugely successful panoramas of the nineteenth century (Grau 1999) dealt with the recovery of immersive anamorphoses from cylindrical perspectives, even if, as pointed out by Kemp (1990), these were

A. B. Araújo

Fig. 25 Corridor and stairs. Pencil on paper. Drawing by the author

usually not cylindrical at all but rather clever hacks involving the stitching of adequate linear perspectives (this is quite proper: the business of artists is to be clever magicians rather than mindful technicians). There were even attempts at spherical anamorphoses (Belisle 2015), although a spherical perspective as such was not yet formulated. Most of these early anamorphoses required the creation of large-scale architectonic structures wherein the trompe l’oeil could take form. Today we have a rather more economical tool at our disposal, in the form of the virtual reality panorama. These panoramas work by taking an image file (Fig. 26) – usually photographic or a 3D graphics render – and wrapping it around a virtual sphere. Then a user may view them interactively by virtually pointing the screen (moving a mouse or moving the head in a VR helmet), like a camera, at a desired section of the sphere. In terms of anamorphosis, the user controls the position of a rectangle in space (Fig. 27), which defines a cone of rays that cuts the sphere, selecting a region of the spherical anamorphosis, and this rectangular anamorphosis is what is seen on screen (Figs. 28 and 29). Risking pedantry, we point out that anamorphosis, being

Fig. 26 An equirectangular perspective of a corridor and inner courtyard. Graphite on paper. Drawing by the author

Fig. 27 An interactive virtual reality panorama is obtained by projecting the spherical anamorphosis radially onto a rectangle that can move freely around O. Each such projection will capture at most one vanishing point of each line

defined as an equivalence relation, is transitive; hence a (plane) anamorphosis of a (spherical) anamorphosis of a scene X is still an anamorphosis of X. These panoramas have many applications in computer graphics; they can be used for environment mapping or generally as painted backgrounds for 3D environments (Greene 1986). Photographic VR panoramas (also called 360-degree photographs or VR photographs) created by image stitching have been extensively studied both in the methods of their accurate creation and of efficient interactive rendering (for a technical account, see Benosman et al. 2000) and have many applications, from their uses in casual or artistic photography to scientific applications in the documentation of cultural heritage sites (Rossi 2017). Our peculiar interest in these panoramas stems from the fact that we can use our handmade perspective drawings as source data for the same rendering engines that display VR photography. Indeed, the data files for these applications are constructed with the exact same chart projections that we use in our spherical perspectives. Since, as we have discussed, all (total, compact) spherical perspectives are equivalent, we could, in theory, use any spherical perspective as source. In practice, VR engines are built to use certain specific formats, and we must either draw in the corresponding perspectives or use software to convert our drawings to them.

Fig. 28 Virtual reality view of the panorama of Fig. 26 (looking toward the left)

Fig. 29 Virtual reality view of the panorama of Fig. 26 (looking toward the right)
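
The view selection described above (a rectangle in space defining a cone of rays that cuts the sphere) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of any actual VR engine: the axis convention (x toward F, y toward R, z toward U), the parameters `yaw_deg`, `pitch_deg`, `hfov_deg`, `aspect`, and both function names are hypothetical.

```python
import math

def view_ray_to_equirect(u, v, yaw_deg, pitch_deg, hfov_deg=90.0, aspect=16 / 9):
    """Bearing and elevation (degrees) of the ray through a point of the view rectangle.

    u, v are in [-1, 1] (left to right, bottom to top). yaw, pitch, hfov and
    aspect are hypothetical viewer parameters, not taken from any VR engine.
    """
    half_w = math.tan(math.radians(hfov_deg) / 2.0)
    half_h = half_w / aspect
    # Ray in the viewer's frame (assumed: x forward, y right, z up).
    x, y, z = 1.0, u * half_w, v * half_h
    # Pitch: tilt the view upward by rotating in the x-z plane.
    p = math.radians(pitch_deg)
    x, z = x * math.cos(p) - z * math.sin(p), x * math.sin(p) + z * math.cos(p)
    # Yaw: turn the view to the right by rotating in the x-y plane.
    w = math.radians(yaw_deg)
    x, y = x * math.cos(w) - y * math.sin(w), x * math.sin(w) + y * math.cos(w)
    lam = math.degrees(math.atan2(y, x))
    phi = math.degrees(math.atan2(z, math.hypot(x, y)))
    return lam, phi

def equirect_pixel(lam_deg, phi_deg, width, height):
    """Nearest pixel (column, row) of a 2:1 panorama, with F at the centre."""
    col = (lam_deg + 180.0) / 360.0 * (width - 1)
    row = (90.0 - phi_deg) / 180.0 * (height - 1)
    return round(col), round(row)
```

Rendering one frame then amounts to looping over the screen pixels, calling `view_ray_to_equirect` for each, and copying the colour found at `equirect_pixel`. A real viewer would interpolate between neighbouring pixels rather than round to the nearest one.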

The common formats for VR panoramas are either cubical or equirectangular, the latter being the most common in photography applications. The azimuthal equidistant projection, unfortunately, is not common for these purposes. Although fisheye cameras naturally project light onto the sensor in a configuration that approaches the azimuthal equidistant projection, this is then converted to equirectangular format for further processing. Unlike the fisheye projection, which renders onto a disc, the equirectangular projection renders onto a 2 × 1 rectangle, thus fitting well with the rectangular nature of image files. The deformations of this projection are exhibited well by the equirectangular perspective drawing in Fig. 26. They are similar to those of a cylindrical projection near the horizon and then become squarish as we approach the top and bottom edges of the picture. Since the drawing has been correctly drawn according to the precepts of equirectangular perspective (that we will discuss ahead), it can be loaded as is into any VR panorama engine, which will render it as if it were a 360-degree photograph. In Figs. 28 and 29, we see two frames of such an interactive render, looking left and right along the corridor depicted in Fig. 26. Note that in each linear perspective picture thus obtained, each line will exhibit at most one of the two vanishing points present in the full spherical perspective.

Note 7. Currently there are VR panorama visualizers freely available both on the desktop and on social media websites. The peculiarities of loading images into each such viewer (which sometimes involve metadata injections or image tagging) are ephemera better left to a quick online search or to supplementary notes at the author’s website (Araújo 2015).

We will now solve the equirectangular perspective by following the strategy we used in the azimuthal equidistant case: classify geodesics and their projections, and then learn how to render them efficiently, with a focus on antipodal constructions. In this we will be following Araújo (2018b).

Construction of the Equirectangular Flattening

Equirectangular perspective is defined by the flattening of the same name. Again, this flattening is a well-known map projection; it was at one time attributed to Eratosthenes, but present research tends to follow Ptolemy in crediting Marinus of Tyre (about A.D. 100) as its originator (Snyder 1993). The equirectangular projection was very prevalent in maps due to the simplicity arising from its main property, that of transforming the graticule of meridians and parallels (lines of constant longitude and latitude) into a uniform grid of verticals and horizontals on a projection rectangle. We can construct this flattening in the following way: taking the same frame of reference as in the previous sections, we now release the meridians at points U and D and straighten them without stretching while keeping fixed the line of the horizon (Fig. 30). This transforms the unit radius sphere into a cylinder of radius one and

Fig. 30 Equirectangular flattening. Left: Straightening the UD meridians with horizon circle fixed transforms the unit sphere into a cylinder of height π and radius one. Right: Cutting the cylinder vertically through point B and unrolling it results in a 2 × 1 rectangle

height π. Then we cut the cylinder vertically across B and unroll it onto the plane, as a rectangle of height π and length 2π, with F at the center. Lengths are preserved along the UD lines, which become verticals in the perspective rectangle. Both lengths and angles are also preserved along the line of the horizon. The map is a homeomorphism everywhere except at U, D and at the UBD meridian. Points U and D become the top and bottom lines of the rectangle, respectively (each point of the line representing a vector of the tangent plane), and UBD is sent to both the left and right borders of the perspective rectangle. Hence the border of the perspective rectangle is the blowup of this flattening, by Definition 6. The flattening is very simple in terms of the horizontal coordinate system in Equation 2 (recall also Fig. 7). In terms of this system of coordinates, a point P that projects on the sphere with bearing λ and elevation ϕ is projected onto the plane as the point P = (λ, ϕ).
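
As a minimal sketch of this flattening, the following Python functions send a spatial point to its image (λ, ϕ) and back to the unit sphere. The Cartesian frame (x toward F, y toward R, z toward U, with O at the origin) is an assumption made for illustration and may not match the convention of Equation 2; the function names are hypothetical.

```python
import math

def flatten(x, y, z):
    """Equirectangular image (lam, phi), in radians, of a spatial point.

    The point is first projected radially onto the unit sphere around O.
    At U and D (x = y = 0) the bearing is undefined; atan2 returns 0 there,
    which picks one point of the blown-up top or bottom edge.
    """
    r = math.sqrt(x * x + y * y + z * z)
    lam = math.atan2(y, x)      # bearing, in [-pi, pi], 0 at F
    phi = math.asin(z / r)      # elevation, in [-pi/2, pi/2]
    return lam, phi

def unflatten(lam, phi):
    """Point of the unit sphere with bearing lam and elevation phi (radians)."""
    return (math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi))
```

With these conventions F maps to (0, 0) at the centre of the rectangle, R to (π/2, 0), and the image fills a rectangle of length 2π and height π, as in Fig. 30.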

Images of Geodesics

We must begin with a classification of geodesics in equirectangular perspective. Take once more the LR geodesics (Fig. 10), as we did for the azimuthal equidistant case. If we plot them at 5-degree intervals with their maximum elevation ϕM in the interval [0°, 90°], we get the curves in Fig. 33. It can be shown (Araújo 2018b) that these curves follow the equation

$$\phi(\lambda) = \arctan\bigl(\tan(\phi_M)\cos(\lambda)\bigr) \tag{4}$$

where ϕM is the angular elevation of the curve at its apex relative to $\overrightarrow{OU}$ (its point of maximum elevation), which in this case occurs for λM = 0, that is, on the median plane. When ϕM = 90°, the curves degenerate into the union of the two vertical lines through L and R.
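
One way to see where Eq. 4 comes from is the short derivation below. It is only a sketch under an assumed Cartesian frame (x toward F, y toward R, z toward U), and it is not claimed to be the argument given in Araújo (2018b).

```latex
% Unit vector with bearing \lambda and elevation \phi (assumed frame: x = F, y = R, z = U)
\[ p(\lambda,\phi) = (\cos\phi\cos\lambda,\ \cos\phi\sin\lambda,\ \sin\phi) \]
% The LR geodesic with apex elevation \phi_M < 90^\circ lies on the plane through O
% that contains R = (0,1,0) and the apex (\cos\phi_M, 0, \sin\phi_M).
% A normal to that plane is
\[ n = (-\sin\phi_M,\ 0,\ \cos\phi_M) \]
% A point of the sphere lies on the geodesic exactly when n \cdot p = 0:
\[ \cos\phi_M \sin\phi = \sin\phi_M \cos\phi \cos\lambda
   \iff \tan\phi = \tan\phi_M \cos\lambda \]
% Taking the arctangent gives Eq. 4.
```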

Analogously to the azimuthal equidistant case, we can obtain all other geodesics from the LR family by rotating it, but now around the $\overrightarrow{UD}$ axis. The rotation determines the value of λM ∈ [−180°, 180°], while the choice of geodesic in the family selects the value of ϕM ∈ [0°, 90°]. Together they determine the apex and therefore the geodesic. The difference with regard to the azimuthal equidistant case is that this time the spatial rotation around $\overrightarrow{UD}$ by an angle δ acts as the translation (λ, ϕ) → (λ + δ, ϕ) in the perspective image. Hence the general geodesic is determined by the coordinates of the apex (λM, ϕM) in the form

$$\phi(\lambda \mid \lambda_M, \phi_M) = \arctan\bigl(\tan(\phi_M)\cos(\lambda - \lambda_M)\bigr) \tag{5}$$
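
Eq. 5 translates directly into code. The following is a minimal sketch for plotting purposes; the function names and the sampling step are illustrative choices, and the degenerate case ϕM = 90° (two verticals) is not handled.

```python
import math

def geodesic_elevation(lam_deg, apex_lam_deg, apex_phi_deg):
    """Elevation (degrees) at bearing lam of the geodesic with the given apex, per Eq. 5.

    Assumes apex_phi_deg < 90; the vertical (degenerate) geodesics are excluded.
    """
    t = math.tan(math.radians(apex_phi_deg))
    return math.degrees(math.atan(t * math.cos(math.radians(lam_deg - apex_lam_deg))))

def sample_geodesic(apex_lam_deg, apex_phi_deg, step_deg=5):
    """Points (lam, phi) of the geodesic image across the whole rectangle."""
    return [(lam, geodesic_elevation(lam, apex_lam_deg, apex_phi_deg))
            for lam in range(-180, 181, step_deg)]
```

The curve reaches ϕM at the apex, crosses the horizon 90° to either side of it, and reaches −ϕM at the antipode of the apex. Shifting both `lam_deg` and `apex_lam_deg` by the same amount leaves the result unchanged, which is the translational symmetry noted above.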

Ruler and Compass Approximations

The geodesic images in Fig. 33 may look somewhat daunting to the draughtsman at first. They turn out not to be so hard to draw in good approximation using a ruler, compass, and protractor. Modulo translational and reflection symmetries, the difficulty is reduced to the plot of the curves of the LR family in the interval λ ∈ [−90°, 0°]. What can we say about these? A cursory look at the geodesic plot shows that the geodesic images are very similar to sinusoids for low values of elevation. In fact, equirectangular drawings (Fig. 26) look very similar to cylindrical perspectives for the regions of the image near to the horizon (the casual photographer may mistake an equirectangular photo for the photo panoramas of old, made by simply overlapping photos side by side on a wall). This cursory impression turns out to be partly accurate, but more can be said. As pointed out in Araújo (2018b):

1. For apex elevations smaller than ϕM = 33°, a geodesic image is well approximated by the sinusoidal curve that matches it at the apex and horizon. The maximum error is less than 1° for ϕM < 29° and less than 2° for ϕM < 36°. At ϕM = 33°, it equals 1.7°.
2. As apex elevations grow larger than ϕM = 33° and just as the sinusoidal approximation breaks down, the geodesics become well approximated by the arcs of circle that match them at apex and horizon. At ϕM = 33° the circular approximation yields a maximum error of 1.6°. This approximation holds well (maximum error under 2°) until ϕM = 60°, when the error is 1.9°.
3. When the apex rises above 60°, the circular approximation becomes useless, as the geodesics assume their squarish form. Araújo (2018b) proposes a simple descriptive geometry construction that allows one to obtain points of the geodesic by using operations with ruler, compass, and protractor. Even for an apex elevation as high as ϕM = 80° (see Fig. 31), a mere four points thus calculated are enough to obtain an error of the order of 1° (using arc of circle interpolation in between the calculated points).
4. Because the equirectangular flattening preserves angles at the horizon, the perspective image of a geodesic whose apex has elevation ϕM makes an angle of ϕM with the perspective image of the horizon (see, for instance, the plot of the ϕM = 80° geodesic in Fig. 31 (right)). Since we also know that the tangent is flat at the apex, we have two tangents that can be used to constrain the drawing of the curve. These constraints are helpful with Bézier approximations.

We will not elaborate here on the construction mentioned in point 3 above. Details may be consulted in the work cited above. The point is that it puts equirectangular perspective on the list of perspectives that can be reasonably constructed without computers, even if it takes some effort. Having classified and rendered geodesics, we can now consider the line images within them.

Fig. 31 A construction (left) of the ϕM = 80° geodesic (right) using a ruler, compass, and protractor, according to Araújo (2018b)
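
The behaviour of the sinusoidal and circular approximations in points 1 and 2 can be explored numerically with a short script such as the one below. It is a sketch under an explicit assumption: the error is taken as the vertical difference in elevation at each bearing, sampled over λ ∈ [−90°, 0°]. The cited paper may measure error differently, so the values obtained this way need not coincide exactly with the figures quoted above. All function names are hypothetical.

```python
import math

def true_elevation(lam_deg, apex_deg):
    """Eq. 4: elevation of the LR geodesic with the given apex elevation."""
    t = math.tan(math.radians(apex_deg))
    return math.degrees(math.atan(t * math.cos(math.radians(lam_deg))))

def sinusoid(lam_deg, apex_deg):
    """Sinusoid matching the geodesic at the apex (0, apex) and at the horizon (+-90, 0)."""
    return apex_deg * math.cos(math.radians(lam_deg))

def circle_arc(lam_deg, apex_deg):
    """Circle through (-90, 0), (0, apex), (90, 0) in the (lam, phi) plane, in degrees."""
    k = (apex_deg ** 2 - 90.0 ** 2) / (2.0 * apex_deg)  # ordinate of the centre
    r = apex_deg - k                                    # radius
    return k + math.sqrt(r * r - lam_deg ** 2)

def max_error(approx, apex_deg, samples=9000):
    """Largest vertical gap (degrees) between approximation and geodesic (assumed metric)."""
    worst = 0.0
    for i in range(samples + 1):
        lam = -90.0 * i / samples
        worst = max(worst, abs(approx(lam, apex_deg) - true_elevation(lam, apex_deg)))
    return worst

if __name__ == "__main__":
    for apex in (29, 33, 36, 60):
        print(apex, round(max_error(sinusoid, apex), 2), round(max_error(circle_arc, apex), 2))
```

Running it for a range of apex elevations shows the pattern described in the list: the sinusoid error grows with ϕM, while the arc of circle remains usable over a middle range of elevations.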

Drawing Lines

Having understood the shape of the equirectangular geodesics, and having learned how to plot them according to the coordinates of their apex by elementary means, we now need to learn how to plot lines themselves. Equirectangular line images may take rather confusing, sigmoid shapes when seen in isolation. This confusion vanishes once we see them as meridians (halves) of geodesics. Then we know all of them are described by the curves of Fig. 33 (modulo translation) by choosing a meridian between two vanishing points. Note how we follow very much the same schema as in the azimuthal equidistant case:

First, we learn how to plot points from adequate angular measurements: this is easier in the present case, since equirectangular perspective relates very simply to the horizontal system, which is very natural to use in observations. Having measured the bearing λ and elevation ϕ of a point P (with a theodolite, from nature, or a protractor, from plans), the perspective image of P is just the point (λ, ϕ) in the perspective rectangle. To find the vanishing points of a line l, bring the line to O by translation to get the line $l_O$, and then plot the points of $l_O \cap S^2_O$.

We then learn how to plot antipodes: on the sphere you can get the antipode of a point P by rotating 180° around the z axis and then reflecting across the plane of the horizon. In the equirectangular flattening, the first operation becomes a translation of half the perspective rectangle’s length, and the second becomes a reflection across the line of the horizon.

Now we divide lines into classes, according to whether they intersect the horizon or not. The horizon takes, in this perspective, a role analogous to the corona’s role in the azimuthal equidistant case.

Suppose a line does intersect the horizon. Measure the bearing λH at which this happens. If the line is on a vertical plane through O, then it projects into the union of two verticals, the one through (λH, 0) and the one through its antipode, at a distance of 180° along the horizon. Find a vanishing point V, and plot it and its antipode. If these points are U and D, the line is vertical and projects as the vertical segment through (λH, 0). If not, then the image of the line contains either U or D. Find which, and project the meridian (consisting of two segments of the two verticals) which contains U or D, according to the case.

If the line crosses the horizon but is not on a vertical plane through O, then it will fall on one of the LR geodesics modulo translation. Find where the line hits the horizon. Then the apex must be at 90° from that point along the λ axis (find on which side by inspection). Having obtained λM, measure the elevation over that point to get the apex (λM, ϕM). Now just plot the proper geodesic, modulo translation by λM, as described in the previous section, and crop at the vanishing points.

If the line does not touch the plane of the horizon, then it must have its two vanishing points on the horizon. Measure them in the usual way; the bearing λM of the apex must then be halfway between them. Measure the elevation there to get the apex, and plot the geodesic, cropping at the vanishing points.

This briefly covers all cases. See Araújo (2018b) for more details. The drawing of Fig. 32 was done entirely with the methods just described. But ahead we will see how to use the symmetries of the perspective to avoid these auxiliary constructions in drawing practice.
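
When the scene is known from plans rather than measured by eye, the quantities used in the schema above (vanishing points, antipodes, apex) can be computed instead of measured. The sketch below is one possible way of doing so, not the procedure of Araújo (2018b). It assumes a Cartesian frame with O at the origin, x toward F, y toward R and z toward U, takes a line as two of its spatial points, and does not handle the degenerate cases (lines through O or on a vertical plane through O). All function names are hypothetical.

```python
import math

def to_angles(v):
    """Bearing and elevation (degrees) of the direction of vector v."""
    x, y, z = v
    return (math.degrees(math.atan2(y, x)),
            math.degrees(math.atan2(z, math.hypot(x, y))))

def antipode(lam_deg, phi_deg):
    """Shift by half the rectangle's length, then reflect across the horizon."""
    lam = lam_deg + 180.0
    if lam > 180.0:
        lam -= 360.0
    return lam, -phi_deg

def vanishing_points(a, b):
    """Both vanishing points of the line through spatial points a and b."""
    d = tuple(q - p for p, q in zip(a, b))   # direction of the line translated to O
    v = to_angles(d)
    return v, antipode(*v)

def apex(a, b):
    """Apex (lam_M, phi_M) of the geodesic that carries the image of line ab."""
    # Normal of the plane through O, a and b, oriented upward.
    nx = a[1] * b[2] - a[2] * b[1]
    ny = a[2] * b[0] - a[0] * b[2]
    nz = a[0] * b[1] - a[1] * b[0]
    if nz < 0:
        nx, ny, nz = -nx, -ny, -nz
    # Highest direction in that plane: the up vector projected onto the plane
    # (scaled by |n|^2, which does not change its direction).
    n2 = nx * nx + ny * ny + nz * nz
    return to_angles((-nz * nx, -nz * ny, n2 - nz * nz))
```

For a horizontal line one unit ahead and one unit above the eye, running left to right, `vanishing_points` gives R and L, and `apex` gives bearing 0° and elevation 45°, halfway between the two vanishing points as described above.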

Sliding Grids

The translational symmetry in Eq. 5 implies that, given the perspective images P, Q of two spatial points, the geodesic segment image PQ may be found by translating

Fig. 32 The stairway of Fig. 24, now in equirectangular perspective, constructed with ruler, compass, and protractor. Graphite on graph paper. Drawing by the author. See author’s website (Araújo 2015) to view the corresponding VR panorama

Fig. 33 Equirectangular projection of the LR family of geodesics at 5-degree intervals of maximum elevation

the LR family (Fig. 33) until a member of this family is found to go through P and Q (see Fig. 34). This may be exploited mechanically to make a very efficient “dynamical grid” sketching method (Araújo 2018a) analogous to the rotation grid method we described for the fisheye case: draw an equirectangular grid (or print one) onto a sheet of hard-backed paper of a convenient size (say A4). Then take a sheet of tracing paper twice as large (A3 in this case), fold it in two to form an envelope of size A4, and place the print inside it. All the drawing is done on one side of the tracing paper envelope. When two points need to be joined, slide the print sideways along the fold of the tracing paper envelope, to implement mechanically

Fig. 34 Sliding the LR family of geodesics to find the geodesic segment that joins points P and Q. Drawn with the free software Eq A Sketch 360 (Araújo 2019a). Additional verticals are marked for measuring convenience

the operation of Fig. 34. After an adequate amount of sliding, a segment will be found to match both points. Trace it, and it is guaranteed that the curve obtained is the correct projection of the segment PQ. The maximum error will be half the separation of the printed geodesics. This is of course analogous to the dynamic grid method we described for the azimuthal equidistant case, but now the mechanical device expresses the group of translations rather than rotations. For convenience, the grid usually contains four copies of the LR family, one for each cardinal point, and another four, their reflections across the horizon (see Fig. 35). Although this makes the grid more crowded, it helps the drawing process in some ways, since the physical print of the LR family, unlike the example of Fig. 34, cannot flow cylindrically out of one edge to reappear at the opposite one (i.e., physical translation is not translation modulo 360◦ ). Therefore the repeated grid will avoid some awkward physical operations like grid inversions or the need to trace partial curves on opposite edges of the grid. In a digital process, of course, the grid can indeed flow cylindrically across opposite edges, and a sliding system

Fig. 35 Equirectangular grid, with copies of the LR family centered on each cardinal point and reflected over the horizon for convenience. Verticals are also marked at 5-degree intervals. The grid slides inside a tracing paper envelope where the drawing is done. In the figure, it has been slid about 40° to the right to find the line joining A and B

has been implemented in the equirectangular drawing program Eq A Sketch 360 (Araújo 2019a) which uses the much leaner system of Fig. 34. The simplicity of this operation hides its importance. It provides us with a practical and accurate equirectangular straightedge, that is, it gives us a method to join any two points. The common practice among artists was until very recently to just trace over fixed grids, with all the limitations that entails, as most lines will not be in the grid. The analytic expression for the equirectangular line joining two given points of known coordinates seems too complex for ruler and compass construction (a derivation of this expression may be found in Araújo 2019a). The ruler and compass method we described above is in fact a roundabout way of obtaining this line by relying on the measuring of special pairs of points that make the construction feasible; but the sliding grid method we just described reduces that computation to a simple, physical operation of visual inspection that can be performed with greater ease than ruler and compass constructions, and with comparable precision, for any two arbitrarily chosen points. This situation is in both its difficulties and their resolution quite similar to what we found in the azimuthal equidistant case. The sliding grid method opens the way for easier drawing by internal perspective operations. In Fig. 36 we see an example which comes from what is sometimes called the “telephone pole construction” in classical perspective. In the drawing, made from observation, only the proximal arch was measured from nature. Since it was known that the others had the same measurements, these were constructed by the following operations. Mark points A (upper left internal

Fig. 36 Telephone pole problem. The arch delimited by A and B was measured from observation. Once drawn, it was used as a template to construct the other arches by internal operations. See author’s website (Araújo 2015) to view the corresponding VR panorama

corner of the left column of the arch) and B (lower right external corner of the right column) as in the figure. Then, using the sliding grid method, draw the geodesic AB. Trace the vanishing set of the plane of the arches. In this case they are going left to right, so the vanishing set of the plane is represented by the vertical line through point R. The vanishing point of line AB, marked V in the figure, must lie where the geodesic AB intersects this line. Now mark point C at the point of the second arch corresponding to point A of the first arch. Using the sliding grid method, trace line CV. Let D be the intersection of CV with the ground (i.e., with line BR). Since the arches are congruent, the diagonals AB and CD are parallel; hence D must be the point of the second arch that corresponds to B on the first. We have obtained the second arch. Repeat as needed to get the rest. There are many such operations from classical perspective that translate easily once we have a spherical perspective straightedge. Many such operations in fact work better in spherical perspective, since compactness ensures that all the vanishing points will be available inside the drawing paper.
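
The telephone pole construction can be checked numerically by carrying out the same steps on the sphere, where a geodesic is the intersection of the sphere with a plane through O and two geodesics meet where their planes meet. The sketch below is an illustration only, under assumptions not stated in the text: a frame with x toward F, y toward R, z toward U, arches on a vertical plane running left to right, and hypothetical function names. Points are given by their perspective coordinates (λ, ϕ) in degrees.

```python
import math

UP = (0.0, 0.0, 1.0)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def direction(lam_deg, phi_deg):
    """Unit vector for a perspective point (lam, phi)."""
    lam, phi = math.radians(lam_deg), math.radians(phi_deg)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))

def angles(v):
    """Perspective point (lam, phi), in degrees, for a direction vector."""
    return (math.degrees(math.atan2(v[1], v[0])),
            math.degrees(math.atan2(v[2], math.hypot(v[0], v[1]))))

def meet(n1, n2, near):
    """Intersection of two geodesics given by their plane normals.

    Two antipodal points satisfy both; return the one on the side of `near`.
    """
    p = cross(n1, n2)
    return p if dot(p, near) >= 0 else tuple(-c for c in p)

def next_arch_corner(A, B, C, R=(90.0, 0.0)):
    """Return (V, D): the vanishing point of AB and the corner D of the next arch."""
    a, b, c, r = (direction(*p) for p in (A, B, C, R))
    vanishing_set = cross(r, UP)                   # vertical great circle through R
    v = meet(cross(a, b), vanishing_set, near=r)   # where geodesic AB meets it
    d = meet(cross(c, v), cross(b, r), near=b)     # geodesic CV meets ground line BR
    return angles(v), angles(d)
```

In a test scene with A = (1, −1, 1), B = (1, 0, −1) and C = (1, 0.5, 1) in spatial coordinates, the function returns for D the perspective image of the point (1, 1.5, −1), which is the corner one arch width beyond C, as the construction predicts.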

Spherical Straightedges in Digital Drawing Programs

We have seen the importance of a spherical straightedge for internal perspective constructions and how this straightedge is made available for handmade spherical perspective drawings. Of course in the digital realm, such a straightedge can be obtained with greater ease and precision. In Araújo (2019a) an analytic expression is derived for the geodesic that goes through two arbitrary points. This was then used for programming a digital equirectangular straightedge – a snapping system for drawing geodesics – in the drawing application Eq A Sketch 360. This is a piece of free, experimental software

for teaching and drawing equirectangular perspective. As of this writing, Eq A Sketch 360 (available at Araújo 2015) seems to be the only drawing program to implement both a sliding grid and an equirectangular straightedge that works as a geodesic snapping system: the user selects two points on the screen and the drawing pen or mouse snaps to the geodesic AB, that is, draws only over the path of the geodesic. The sliding grid is used in combination with the snapping system as a measuring device to plan the perspective constructions. In its latest version (September 2020), the Microsoft Garage app Sketch 360 has now adopted a sliding grid into its set of tools, though not yet a snapping system for general geodesics.

Note 8. For the azimuthal equidistant case, there exists a vector drawing program implemented as a script in GeoGebra (available at the author’s website Araújo 2015) that draws arbitrary geodesics, but currently no raster drawing app. The calculations required for programming the azimuthal equidistant case are an adaptation of those described in Araújo (2019a) for the equirectangular case.
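
To give an idea of what a digital straightedge has to compute, the sketch below finds the apex of the geodesic through two perspective points by solving Eq. 5 for both points at once. Writing tan ϕ = a cos λ + b sin λ, with a = tan ϕM cos λM and b = tan ϕM sin λM, turns the problem into a 2 × 2 linear system. This is a derivation made here from Eq. 5 and is not claimed to be the expression of Araújo (2019a) or the code of Eq A Sketch 360; the function names are hypothetical.

```python
import math

def geodesic_through(P, Q):
    """Apex (lam_M, phi_M), in degrees, of the geodesic through P and Q.

    P and Q are perspective points (lam, phi) in degrees with elevations
    strictly between -90 and 90. Points with equal or opposite bearings lie
    on a vertical (or undetermined) geodesic, which is not handled here.
    """
    lam1, phi1 = map(math.radians, P)
    lam2, phi2 = map(math.radians, Q)
    det = math.sin(lam2 - lam1)
    if abs(det) < 1e-12:
        raise ValueError("equal or opposite bearings: vertical or undetermined geodesic")
    t1, t2 = math.tan(phi1), math.tan(phi2)
    # Cramer's rule for: a*cos(lam_i) + b*sin(lam_i) = tan(phi_i), i = 1, 2
    a = (t1 * math.sin(lam2) - t2 * math.sin(lam1)) / det
    b = (t2 * math.cos(lam1) - t1 * math.cos(lam2)) / det
    return math.degrees(math.atan2(b, a)), math.degrees(math.atan(math.hypot(a, b)))

def snap(pen_lam_deg, apex):
    """Point of the geodesic with the given apex at the pen's bearing."""
    lam_m, phi_m = apex
    t = math.tan(math.radians(phi_m))
    phi = math.atan(t * math.cos(math.radians(pen_lam_deg - lam_m)))
    return pen_lam_deg, math.degrees(phi)
```

A snapping tool could call `geodesic_through` once when the two points are selected and then `snap` for every pen position, so that only points of the geodesic are drawn.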

Conclusion: What Is (Not) a Spherical Perspective

We have been concerned with defining what a spherical perspective is. It is just as enlightening to consider what it is not. Definitions, of course, are just ways of organizing concepts, and they are quite arbitrary, but we need to at least settle momentarily upon a working meaning for our terms if we are not to get hopelessly lost. This has long been the case with the term “spherical perspective,” as well as its relative, “curvilinear perspective,” both heavily loaded terms that have often been discussed without any definition being attempted, leading to endless confusion in theory, artistic practice, and historical study (Andersen 2007, p. 109). To gain some clarity, we have followed here the definition of Araújo (2018c) (slightly adapted). We may want to stop and consider what it encompasses and what it doesn’t. The technical Definition 5 is somewhat heavy, so let us recall its intuitive meaning: from a 3D scene, projected radially on a sphere and then made compact by adding vanishing points, we get a spherical anamorphosis. A spherical perspective is the flattening of that anamorphosis, that is, the mapping of the anamorphosis onto a compact region of the plane, following certain conditions of continuity. This definition includes the things that have traditionally been called by the term “spherical perspective.” It includes the azimuthal equidistant and equirectangular cases. These are total spherical perspectives, in the sense that they project the whole sphere. We can trivially expand Definition 5 to cover what we called partial spherical perspectives in Note 3, and then the definition covers Barre and Flocon’s perspective, being the restriction of the azimuthal equidistant perspective to a single hemisphere. It also covers cylindrical perspective and most things that have been

traditionally called “curvilinear perspectives,” a vague term that we don’t attempt to define formally here, as we don’t really need it.

Our definition of spherical perspective insists on compactness, and that leads us to our main, perhaps surprising exclusion: our definition does not cover classical perspective. To be sure, it covers any actually constructible linear perspective drawing! The restriction of linear perspective to any compact region, like a rectangle, is a spherical perspective, and the rectangle can be as large as you want. It just cannot be the whole plane, which is not compact. And we cannot get the projective plane by our schema, since we don’t identify opposite directions. We split them quite deliberately: we want a line to have two vanishing points in a total spherical perspective, not one. Hence, even as we open a spherical anamorphosis up to a hemisphere, so that the rays cover a whole plane, we find that frontal lines will have two vanishing points. This may be seen already at the edge of Barre and Flocon’s perspective. By contrast, in the projective plane, we would be finding all lines to have one vanishing point. Ours is a fundamentally different construct, and we have already discussed why we think it is, for our purposes, the right one. We could of course drop compactness and reasonably call the result a noncompact spherical perspective. Then we could allow classical perspective, and we could allow the use of stereographic projections in a total perspective, as well as many other useful and interesting projections. But here we have focused on the compact ones because we think these bring something elegant into focus, which merits special attention: having all vanishing points accessible for effective drawing constructions and allowing the full reconstruction of the immersive anamorphosis from the flat perspective drawing.

Even if they are a restricted set, compact spherical perspectives cover many things, some not obvious at first sight. For instance, Correia and Romão (2007) propose a computational extended perspective system, with applications to architecture (Correia et al. 2013), which encompasses cylindrical, hemispherical, and linear perspectives, among others. It does this by changing the surface of projection smoothly according to some parameter (radius, eccentricity) to change, for instance, a cylinder into a rectangle or a half sphere into a half-ellipsoid continuously. This in turn will change the plane projection smoothly. This may seem different from our perspectives, since we always start from a spherical anamorphosis. Yet, these are still conical anamorphoses, and as we have discussed, spherical anamorphosis is the canonical conical anamorphosis. It can represent any other. If you project radially onto a cylinder or an ellipsoid or any starred surface, you are still getting, in the end, a (partial or total) spherical perspective. Then any transformation of the projection surface (say, from a sphere to an ellipsoid) results in a change of the final perspective image that can be equally accomplished by an adequate change of the sphere’s flattening map. This is not to say that the use of different surfaces is irrelevant. It is not. It may often be much more intuitive to think of a change of the flattening in terms of a change of projection surface. What we are saying is that these perspectives are encompassed by the present definition (whether they result in perspectives amenable to handmade drawing is a different matter).

This same argument shows that cubical perspective is also a spherical perspective in our sense. Recall that a cubical perspective is an image obtained by projecting a 3D scene radially onto a cube, then cutting open the cube and flattening it (Rossi et al. 2018). This projection, which recalls the tradition of perspective boxes (Spencer 2018; Verweij 2010), now has many uses in computer graphics (Greene 1986). It can be treated as a perspective, in the general sense of the word, by simply seeing it as an organized set of six linear perspectives (see, for instance, Olivero et al. 2019a). But it can also be seen, much more elegantly, as a spherical perspective. Again, the fact that the projection is made onto a cube is irrelevant. The cube is homeomorphic to the sphere by a radial map, so, for our purposes, a cube might as well be a sphere. We take the radial projection of the sphere onto the cube as the first step of the flattening and the usual planification of the cube as the second step. Cubical perspective can then be treated according to our general strategy above: we can talk of geodesics on the cube, of antipodes, of line classification and rendering, in a way that is far more efficient and elegant than seeing it simply as a group of six somehow related classical perspectives. This has been done in Araújo et al. (2020), following very much the same approach we used here for the two cases we treated explicitly. It would be interesting to try the same approach on other polytope-based perspectives, such as the tetraconic perspective of Adams (1976). All of these perspectives (cylindrical, cubical, etc.) are encompassed in our definition for the simple reason that they are conical, that is, they satisfy radial occlusion. So our spherical perspectives are not only compact, but fundamentally conical. If we valued accuracy over terseness, we might have named this chapter quite correctly as conical, compact, total spherical perspectives.
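
The radial map between cube and sphere mentioned above is easy to make explicit. The sketch below converts between a point on a face of the cube and the bearing and elevation of the corresponding point of the sphere. The face frames (which way is "right" and "up" on each face) and the axis convention (x toward F, y toward R, z toward U) are assumptions chosen for illustration; actual cube map formats differ in these details, and the function names are hypothetical.

```python
import math

# Assumed face frames: (centre, right, up) for a cube of half-side 1 around O.
FACES = {
    "F": ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
    "R": ((0, 1, 0), (-1, 0, 0), (0, 0, 1)),
    "B": ((-1, 0, 0), (0, -1, 0), (0, 0, 1)),
    "L": ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
    "U": ((0, 0, 1), (0, 1, 0), (-1, 0, 0)),
    "D": ((0, 0, -1), (0, 1, 0), (1, 0, 0)),
}

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cube_to_sphere(face, u, v):
    """Bearing and elevation (degrees) of the point (u, v), with u, v in [-1, 1], of a face."""
    centre, right, up = FACES[face]
    p = tuple(c + u * r + v * w for c, r, w in zip(centre, right, up))
    return (math.degrees(math.atan2(p[1], p[0])),
            math.degrees(math.atan2(p[2], math.hypot(p[0], p[1]))))

def sphere_to_cube(lam_deg, phi_deg):
    """Face and face coordinates (u, v) hit by the ray with the given bearing and elevation."""
    lam, phi = math.radians(lam_deg), math.radians(phi_deg)
    d = (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))
    face = max(FACES, key=lambda f: _dot(d, FACES[f][0]))  # face the ray points at most directly
    centre, right, up = FACES[face]
    s = _dot(d, centre)                                    # rescale the ray to reach the face
    return face, _dot(d, right) / s, _dot(d, up) / s
```

Composing `cube_to_sphere` with an equirectangular or azimuthal equidistant flattening converts a cubical perspective into either of the perspectives treated in this chapter, and `sphere_to_cube` goes the other way.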
What if we dropped radial occlusion? We could certainly project onto the sphere in a nonradial way and then obtain flattenings of that. Should we include these perspectives as “spherical”? Spherical non-conical? We could, but it is a delicate choice, as a concept loses interest if it encompasses too much. So we elect not to. As an example of what is left out, consider spherical reflections. Reflection on a sphere is a time-honored way of drawing a wide-angle view without need for formal mastery of the projection (see, for instance, Jan van Eyck’s Portrait of John Arnolfini and his Wife (1434)). The artist can simply look at a spherical mirror and draw what he sees. The canonical example in the modern mind is Escher’s Hand with Reflecting Sphere (1935). The end result is very similar in aspect to a fisheye perspective. Yet, on careful analysis, it is fundamentally different, because occlusion is not radial. The points that occlude each other in a sphere reflection are not aligned along rays that stem from a central point in the observer’s eye. Hence, although Escher’s famous print looks very much like a fisheye drawing, it is fundamentally different: the set of points represented is simply not the same, even when the angle covered is. The set of equivalent points in the reflected image does not respect the principle of radial occlusion. Since the picture does not derive from a conical anamorphosis, it cannot be represented as a spherical perspective in our sense of the word, since the anamorphic step is broken. For more on sphere reflections and their relations with perspective, see Crannell (2011), Glaeser (1999), and Araújo (2018c).


Reflections are just the most obvious of the things we are leaving out of our definition and that could conceivably be called perspectives or even spherical perspectives. Definitions are tentative, but they are crucial to help us delimit, separate, classify, and clarify. It is useful to have a clear way of separating very similar-looking things, to express clearly why Escher’s reflecting sphere and a fisheye picture are fundamentally different. A sphere reflection may define a curvilinear perspective, maybe even a spherical perspective if you decide to define it so, but certainly not a total, conical, compact, spherical perspective. These partitions of concepts are not just nitpicking. They have technical implications, in computing, for instance: the fact that our perspectives are conical means that occlusion computations are completely determined at the anamorphosis step and in the exact same way for any spherical perspective. The two steps are computationally separate and functionally composable. Further, occlusion being radial implies that we can use the same hidden faces algorithms as in classical perspective! This is not the case at all with reflections, where computing occlusion is a far more complex task (Glaeser 1999). Even within the realm of our very specifically restricted spherical perspectives (total, conical, compact), we have available a large variety of image-creating structures, each of which opens new aesthetic avenues for the artist, problems for the geometer, and opportunities for the technologist. We believe that the tools here presented may help both the artist and the geometer to tame the curious members of this perspective bestiary. And, outside the realm of these spherical perspectives, there is an infinite variety of other “perspectives” yet to classify and put to use. Hopefully this chapter has also shaped a vague glimpse of this further realm in the reader’s mind.

Cross-References  Anamorphosis Reformed: From Optical Illusions to Immersive Perspectives

References
Adams KR (1976) Tetraconic perspective for a complete sphere of vision. Leonardo 9(4):289–291. https://doi.org/10.2307/1573354
Adams KR (1983) Flat sphere and tetraconic perspective (letter to Ed.). Leonardo 16(4):333
Andersen K (1992) Brook Taylor’s role in the history of linear perspective. In: Brook Taylor’s work on linear perspective. Springer, New York, pp 1–67
Andersen K (2007) The geometry of an art: the history of the mathematical theory of perspective from Alberti to Monge. Springer Science & Business Media, New York
Araújo A (2017) Anamorphosis: optical games with perspective’s playful parent. In: Silva JN (ed) Proceedings of the recreational mathematics colloquium V (2017) – G4G Europe, Associação Ludus, Lisbon, pp 71–86


Araújo AB (2015) Notes on spherical perspective. http://www.univ-ab.pt/~aaraujo/full360.html
Araújo AB (2016) Topologia, anamorfose, e o bestiário das perspectivas curvilíneas. Convocarte–Revista de Ciências da Arte (2):51–69
Araújo A (2018a) Let’s sketch in 360º: spherical perspectives for virtual reality panoramas. In: Bridges 2018 conference proceedings, Tessellations Publishing, pp 637–644
Araújo AB (2018b) Drawing equirectangular VR panoramas with ruler, compass, and protractor. J Sci Technol Arts 10(1):2–15. https://doi.org/10.7559/citarj.v10i1.471
Araújo AB (2018c) Ruler, compass, and nail: constructing a total spherical perspective. J Math Arts 12(2–3):144–169. https://doi.org/10.1080/17513472.2018.1469378
Araújo AB (2019a) Eq a sketch 360, a serious toy for drawing equirectangular spherical perspectives. In: Proceedings of the 9th international conference on digital and interactive arts. ACM, Braga, Portugal, pp 1–8. https://doi.org/10.1145/3359852.3359893
Araújo AB (2019b) A fisheye gyrograph: taking spherical perspective for a spin. In: Goldstine S, McKenna D, Fenyvesi K (eds) Proceedings of bridges 2019: mathematics, art, music, architecture, education, culture, Tessellations Publishing, Phoenix, pp 659–664. Available online at http://archive.bridgesmathart.org/2019/bridges2019-659.pdf
Araújo AB (2020) Explorations in rational drawing. J Math Arts 14(1–2):4–7. https://doi.org/10.1080/17513472.2020.1734437
Araújo AB, Olivero LF, Antinozzi S (2019) HIMmaterial: exploring new hybrid media for immersive drawing and collage. In: Proceedings of the 9th international conference on digital and interactive arts, ACM, Braga, pp 1–4. https://doi.org/10.1145/3359852.3359950
Araújo AB, Olivero LF, Rossi A (2020) A descriptive geometry construction of VR panoramas in cubical spherical perspective. Diségno (6):35–46. https://doi.org/10.26375/disegno.6.2020.06
Barnard ST (1983) Interpreting perspective images. Artif Intell 21(4):435–462
Barre A, Flocon A (1968) La perspective curviligne. Flammarion, Paris
Barre A, Flocon A (1987) Curvilinear perspective: from visual space to the constructed image. University of California Press, Berkeley
Barre A, Flocon A, Bouligand G (1964) Étude comparée de différentes méthodes de perspective, une perspective curviligne. Bulletin de la Classe des Sciences de La Académie Royale de Belgique 5(L)
Belisle B (2015) Nature at a Glance: immersive maps from panoramic to digital. Early Pop Vis Cult 13(4):313–335
Benosman R, Kang S, Faugeras O (2000) Panoramic vision. Springer, New York
Berggren JL (1981) Al-Biruni on plane maps of the sphere. J Hist Arab Sci (5):191–222
Brownson CD (1981) Euclid’s optics and its compatibility with linear perspective. Arch Hist Exact Sci 24:165–194
Burton HE (1945) Euclid’s optics. J Opt Soc 35(5):357–72
Casas F (1983) Flat-sphere perspective. Leonardo 16(1):1–9. https://doi.org/10.2307/1575034
Casas F (1984) Polar perspective: a graphical system for creating two-dimensional images representing a world of four dimensions. Leonardo 17(3):188–194. https://doi.org/10.2307/1575189
Catalano G (1986) Prospettiva Sferica. Università degli Studi di Palermo, Palermo
Correia V, Romão L (2007) Extended perspective system. In: Proceedings of the 25th eCAADe international conference, pp 185–192
Correia JV, Romão L, Ganhão SR, da Costa MC, Guerreiro AS, Henriques DP, Garcia S, Albuquerque C, Carmo MB, Cláudio AP, Chambel T, Burgess R, Marques C (2013) A new extended perspective system for architectural drawings. In: Zhang J, Sun C (eds) Global design and local materialization, vol 369. Springer, Berlin/Heidelberg, pp 63–75. https://doi.org/10.1007/978-3-642-38974-0_6
Crannell A (2011) Perspective drawings of reflective spheres. J Math Arts 5(2):71–85
de Smit B, Lenstra HW Jr (2003) The mathematical structure of Escher’s print gallery. Not AMS 50(4):446–451
Escher MC (1956) Print gallery. Lithograph


Fasolo M, Mancini MF (2019) The ‘architectural’ projects for the church of St. Ignatius by Andrea Pozzo. diségno (4):79–90. https://doi.org/10.26375/disegno.4.2019.09
Glaeser G (1999) Reflections on spheres and cylinders of revolution. J Geom Graph 3(2):121–139
Grau O (1999) Into the Belly of the image: historical aspects of virtual reality. Leonardo 32(5):365–371. https://doi.org/10.1162/002409499553587
Greene N (1986) Environment mapping and other applications of world projections. IEEE Comput Graph Appl 6(11):21–29
Hohenwarter M, Borcherds M, Ancsin G, Bencze B, Blossier M, Delobelle A, Denizet C, Éliás J, Fekete Á, Gál L, Konečný Z, Kovács Z, Lizelfelner S, Parisse B, Sturr G (2013) GeoGebra 4.4. http://www.geogebra.org
Kemp M (1990) The science of art. Yale University Press, New Haven/London
Michel G (2013) L’oeil, au Centre de la Sphere Visuelle. Boletim da Aproged (30):3–14
Michel G (n.d.) Dessin à main levée du Cinéma Sauveniére. http://autrepointdevue.com/blog/wpcontent/vv/vv-gm-sauveniere/vv-gm-sauveniere.html
Moose M (1986) Guidelines for constructing a fisheye perspective. Leonardo 19(1):61–64
Olivero LF, Rossi A, Barba S (2019a) A codification of the cubic projection to generate immersive Models. diségno (4):53–63. https://doi.org/10.26375/disegno.4.2019.07
Olivero LF, Sucurado B (2019b) Analogical immersion: discovering spherical sketches between subjectivity and objectivity. Estoa Revista de la Facultad de Arquitectura y Urbanismo de la Universidad de Cuenca 8(16):80–109. https://doi.org/10.18537/est.v008.n016.a04
Pozzo A (1693) Perspectiva pictorum et architectorum. Rome
Rossi A (2017) Immersive high resolution photographs for cultural heritage, vol 2. Libreriauniversitaria.it, Padova
Rossi A, Olivero LF, Barba S (2018) “CubeME”, a variation for an immaterial rebuilding. In: Rappresentazione/materiale/immateriale drawing as (in) tangible representation, Cangemi Editore, pp 31–36
Savage-Smith E (2015) Celestial mapping. In: Harley J, Woodward D, Lewis G (eds) The history of cartography. University of Chicago Press, pp 12–70
Schofield W, Breach M (2007) Engineering surveying, 6th edn. Butterworth-Heinemann, Amsterdam/Boston
Snyder JPP (1993) Flattening the Earth: two thousand years of map projections. University of Chicago Press, Chicago
Spencer J (2018) Illusion as ingenuity: Dutch perspective boxes in the Royal Danish Kunstkammer’s ‘perspective chamber’. J Hist Collect 30(2):187–201
Termes D (1998) New perspective systems. Self-published
Termes DA (1991) Six-point perspective on the sphere: the termesphere. Leonardo 24(3):289–292
Verweij A (2010) Perspective in a box. In: Architecture, mathematics and perspective. Springer, Berlin, pp 47–62

A Hidden Order: Revealing the Bonds Between Music and Geometric Art – Part One

20

Sama Mara and Lee Westwood

Contents
Introduction
Harmony
Harmony of Time
Harmony of Space
A Mapping between Music and Geometric Art
Color to Pitch Relationship
A Relationship Between Rhythm and Pattern
A Unit of Time and a Unit of Space
Hexagons
Pentagonal Symmetry
Octagonal Symmetry
Summary
References

Abstract The following chapter describes a method of translating music into geometric art and vice versa. This translation is achieved through an exploration of the mutual foundations – in mathematics and its role in harmony – of both music and geometric art. More specifically, the process involves the implementation of principles derived from traditional Islamic geometric art and contemporary mathematics, including fractal geometry and aperiodic tilings.

S. Mara, Musical Forms, London, UK
L. Westwood, University of Sussex, Brighton, UK
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_18


The method was discovered by Mara in 2011 and was subsequently developed during his collaboration with composer Lee Westwood on the project A Hidden Order. Examples from this project are used to illustrate parts of this chapter. Also discussed are the implications of establishing such a connection between music and geometric art. These include the possibility of unique creative processes that combine practices from both visual arts and musical composition, as well as facilitating the application of developments, practices, and creative processes from one discipline to the other.

Keywords Music · Geometry · Visualization · Sonification · Islamic · Art · Fractal · Aperiodic · Tilings · Harmony

Introduction The exploration of the relationship between music and visual art recurs in the arts and sciences over the centuries and includes a variety of different approaches such as the use of theoretical, intuitive, and experiential methods, alongside the exploration of physical phenomena. The theoretical studies considered include Isaac Newton’s relation of the seven colors from the visible spectrum to the seven notes of the musical scale, detailed in his book Opticks (1704). Also of interest are the various systems of proportion explored in architecture, particularly in the Renaissance period, by scholars such as Alberti, who relates the musical intervals (1:2, 2:3, etc.) to rectangles of the same ratios (Padovan 1999). According to the art historian Wittkower, Leonardo da Vinci theorizes on the shared harmonic principles at play within music and the practices of perspective in art, stating “The same harmonies reign in music and perspective space” (Wittkower 1953). Recent theoretical approaches include “A Geometry of Music” by Dmitri Tymoczko (2011), which views Western musical tonality through geometric space. Physical explorations of this relationship between music and art may be seen in experiments conducted in cymatics, whereby liquids or particles on metal plates are exposed to the vibrations of sound waves, creating symmetric forms dependent on the sonic frequencies. In a similar manner, the harmonograph may be used to create specific geometric shapes via the use of pendulums swinging in relative frequencies analogous to the musical ratios. Experiential and intuitive approaches in this field include the phenomenon of synaesthesia, in which the stimulus of one sense may cause an impression in another, most pertinently where a visual sensation is experienced in response to a sound stimulus.
Many composers have related pitch to color. Perhaps the best documented is the French composer Olivier Messiaen, whose experiences directly informed his approach to harmony (Messiaen 2002). Similarly, Scriabin’s later works devised elaborate relationships between color and key centers (Galev and Valechkina 2001). In practice, the intuitive and experiential approaches of relating music to art may be seen in the works of the abstract painters of the 1900s such as Mondrian, Kandinsky, and Klee, as well as animated pieces by the likes of Oskar Fischinger and John Whitney. Wassily Kandinsky wrote “The sound of colors is so definite that it would be hard to find anyone who would express bright yellow with bass notes, or dark lake with the treble” (Kandinsky 1914). This exemplifies the shared intuitive understanding of a relationship that seems to be common to the human experience of colors and sound. If we omit the references to hues from this statement by Kandinsky, it may be interpreted as relating the pitch of the sound with the brightness of color, in that the higher the pitch, the brighter the color. Within a branch of research called cross-modal correspondence that “refers to consistent associations between features in two different sensory domains” (Griscom 2015), there are many studies exploring the relationship between color and sound which support Kandinsky’s observation. Ward et al. (2006) found that “both synaesthetes and non-synaesthetes associate higher pitches with lighter colors” (Griscom 2015). The sonification of art is perhaps less explored, although the use of the golden ratio through the implementation of the Fibonacci series is one area in which we find numerous examples of the influence of geometry and the visual arts on music, as seen in compositions by Joseph Schillinger (Livio 2002), Béla Bartók (Lendvai 2000), and Debussy (Howat 1983). The approach documented in this chapter is primarily a theoretical one which explores the common root of mathematics at the foundation of music and geometric art. Throughout this method, geometric art is considered in relation to color and pattern and music in terms of sound and rhythm.
From these two aspects of music and of visual art, we relate notes to color, and rhythm to pattern. The reason for this lies in the relative frequencies of sound, color, rhythm, and pattern and in how they are perceived. In the realms of color and musical notes, when considered as light and sound waves, the human sensory systems are not able to distinguish individual cycles of the waves, but instead register a general sensation of color or sound. Rhythm and pattern are also of a cyclical nature. However, each unit of the cycle may be experienced individually. In essence, the translation from sound to color is achieved through understanding both as waveforms and drawing upon the physical properties of waves, such as amplitude and frequency. The relationships derived are supported through research in cross-modal correspondence. In establishing a relationship between rhythm and pattern, the approach demanded study of the mathematical roots of both. Rhythm is here considered as a division of time and pattern as a division of space. Hence, if both rhythm and pattern are rooted in mathematics, the question of their relationship develops into finding a meaningful and logical association between their respective mathematical roots. By defining rhythm as divisions of time, the theory assumes a pulse in the music. As such, arrhythmic music (e.g., Boulez’s “smooth time” (Boulez 1971)) does not carry a meaningful analogy in the method described here.


Harmony At the root of this approach are the mathematical foundations of “harmony of time” (as exemplified by the musical rhythms of many cultures around the world) and “harmony of space” (as understood and applied in Islamic geometric art and many other traditions). We refer to “harmony of time” and “harmony of space,” relating to aural and visual harmony, respectively.

Harmony of Time Systems of harmony governing rhythm and pitch in Western music are often based upon simple divisions of time. Concerning rhythm, the bar is divided into a given number of beats, each beat being subsequently divided further into equal parts, thus creating a structure in which the rhythmical aspects of the music may be conceived. In regard to pitch, and more specifically the intervals between two pitches, it is understood that “when the frequency ratio of the two notes is a ratio of low integers: the simpler the ratio, the more consonant are the two notes” (Bibby 2003). The octave is a ratio of 2:1, meaning that when a note is played at twice the frequency of another note, the interval created is an octave. Likewise, in just intonation, relationships of 3:2 (a fifth), 4:3 (a fourth), 5:4 (a major third), and 6:5 (a minor third) all form intervals found within common musical scales and produce consonant sounds. The tempered scale, now standard in music in the Western world, is also rooted in these harmonies and although it does not employ them precisely apart from the octave, it achieves a harmonious sound by closely approximating these ratios. This is possible because there is a certain tolerance in deviating from these ratios that still creates a harmonious result (Bibby 2003). It is this understanding of harmony of both rhythm and pitch that is used in this method.
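As a rough illustration of how closely the tempered scale approximates the just ratios listed above, the following sketch compares each just interval with its equal-tempered counterpart. It assumes the standard twelve-tone equal temperament and the conventional cent unit (1200 cents per octave); the function names and the table of semitone counts are introduced here for illustration and are not part of the authors' method.

```python
import math

# Just-intonation ratios named in the text, paired with the number of
# equal-tempered semitones conventionally used for the same interval.
JUST_INTERVALS = {
    "octave": (2, 1, 12),
    "fifth": (3, 2, 7),
    "fourth": (4, 3, 5),
    "major third": (5, 4, 4),
    "minor third": (6, 5, 3),
}


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


def tempered_deviation(numerator, denominator, semitones):
    """Equal-tempered interval minus the just interval, in cents."""
    return 100 * semitones - cents(numerator / denominator)


for name, (p, q, semitones) in JUST_INTERVALS.items():
    deviation = tempered_deviation(p, q, semitones)
    print(f"{name:12s} {p}:{q}  {deviation:+6.2f} cents")
```

Only the octave comes out exact; the fifth and fourth differ from their just ratios by about two cents, and the thirds by roughly fourteen to sixteen cents, which is the kind of tolerance the paragraph above refers to.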

Harmony of Space In regard to harmony of space, we draw upon the principles applied in Islamic geometric art, which is rooted in Euclidean geometry. These principles are not unique to Islamic art and are true for other systems of proportioning that originate from the regular polygons. We start with the circle and the equal divisions of its circumference which produce the regular polygons (the equilateral triangle, square, regular pentagon, and so on). The forms and ratios that are revealed through the natural subdivisions of the regular polygons form the myriad of different patterns of Islamic geometric art. The numerical ratios that are at play are different to those in the harmony of time and are far more complex. Within systems of proportion implemented in art and architecture, the artist Jay Hambidge defined two general approaches as static versus dynamic symmetry. Within static symmetry is included Alberti’s system of proportion as well as the “musical ratios” applied in Renaissance art and architecture, consisting of rectangles with edge lengths in whole number ratios. Dynamic symmetry, on the other hand, involves the use of ratios with irrational numbers such as √2:1 and the golden ratio. Although these two systems are not entirely exclusive of one another, it is the approach regarding dynamic symmetry that is explored here. To serve as a brief introduction to this system of harmony, we shall look at the ratios and forms that appear from the subdivisions of the regular pentagon, hexagon, and octagon. The intent of this is to familiarize the reader with the forms and numbers at play and to illustrate the harmonious interactions between these forms. In each of the examples to follow, we shall derive a dynamic rectangle from the regular polygons – illustrating the inherent harmonic properties they contain – and show an example of them applied in Islamic geometric art.
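The ratios met in the three polygons discussed next can be previewed numerically. The sketch below relies on the standard chord-length formula for a regular polygon inscribed in a circle; the helper name `diagonal_length` and its `span` parameter (the number of edges a diagonal skips over) are illustrative choices made here, not notation from the chapter.

```python
import math


def diagonal_length(n_sides, span, edge=1.0):
    """Distance between two vertices that lie `span` edges apart in a
    regular polygon with `n_sides` sides, each of length `edge`.

    With circumradius R, the edge is 2R*sin(pi/n) and the chord across
    `span` edges is 2R*sin(span*pi/n), so R cancels in the quotient.
    """
    return edge * math.sin(span * math.pi / n_sides) / math.sin(math.pi / n_sides)


golden = diagonal_length(5, 2)      # pentagon diagonal
root_three = diagonal_length(6, 2)  # shorter diagonal of the hexagon
silver = diagonal_length(8, 3)      # octagon diagonal parallel to an edge

print(f"pentagon: {golden:.6f}")
print(f"hexagon:  {root_three:.6f}")
print(f"octagon:  {silver:.6f}")
```

With unit edges, these three values agree with the golden ratio, √3, and √2 + 1, the irrational proportions on which the dynamic rectangles in the following subsections are built.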

The Pentagon The regular pentagon contains, within its subdivisions, the well-known golden ratio, as seen in Fig. 1. The fascinating properties of the golden ratio are well documented, with numerous publications devoted to its occurrence in the visual arts, architecture, nature, and music, reaching back from Ancient Egypt through the Middle Ages, the Renaissance, and Modernism, up to the present day. A few of the numerical properties of the golden number may be seen in Fig. 2. The geometric properties of the golden ratio include the golden rectangle and accompanying spiral, and the golden triangle formed by two diagonals and an edge of a regular pentagon. The golden triangle also has a related spiral, seen in Fig. 3. The edge of the pentagon and the diagonal of the decagon form a dynamic rectangle whose ratio is related to the golden ratio. Subdivisions within this rectangle display a harmonious arrangement of decagons and pentagons whose interplay suggests the extra levels of subdivisions that may continue indefinitely – see Fig. 4.

Fig. 1 The ratio between the edge of a regular pentagon and its diagonal is the golden ratio, represented by the Greek letter φ (phi)


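\[ \varphi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}} \qquad \varphi = \sqrt{1 + \sqrt{1 + \sqrt{1 + \sqrt{1 + \cdots}}}} \]

\[ \varphi^{2} = \varphi + 1 \qquad \varphi = 1 + \frac{1}{\varphi} \]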

Fig. 2 Unique properties of the golden number. Top left shows the golden ratio expressed as a continued fraction; top right as a nested radical
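The continued fraction and nested radical named in the caption of Fig. 2 can be checked numerically. The sketch below simply truncates each infinite expression after a fixed number of levels; the function names and the starting value of 1 are assumptions made for this illustration.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # closed form of the golden ratio


def continued_fraction(depth):
    """Evaluate 1 + 1/(1 + 1/(1 + ...)) truncated after `depth` levels."""
    value = 1.0
    for _ in range(depth):
        value = 1 + 1 / value
    return value


def nested_radical(depth):
    """Evaluate sqrt(1 + sqrt(1 + ...)) truncated after `depth` levels."""
    value = 1.0
    for _ in range(depth):
        value = math.sqrt(1 + value)
    return value


for depth in (1, 5, 10, 40):
    print(depth, continued_fraction(depth), nested_radical(depth))
```

Both truncations approach the same limit, and that limit also satisfies the relation φ² = φ + 1, which is equivalent to φ = 1 + 1/φ.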


Fig. 3 The golden rectangle with edge lengths 1: φ, with related spiral (left). The regular pentagon with golden triangle (created by the edge of the pentagon and two diagonals) and related spiral (right)

Within Islamic geometric art, the regular pentagon and related golden ratio stand as one of a series of harmonious forms among the other regular polygons, each of which possesses its own unique properties of number and form.

The Hexagon The regular hexagon with edge length 1 has a diagonal of length √3. A parallel diagonal reveals a dynamic rectangle referred to as the root-3 rectangle – see Fig. 5. As the root-3 rectangle originates from the hexagon, it may be subdivided indefinitely with combinations of hexagons and triangles. Figure 6 illustrates some of these harmonic subdivisions. The root-3 rectangle is often used as a repeat unit in Islamic arts, as shown in Fig. 7.


Fig. 4 The regular decagon and subdivision revealing a harmonious arrangement of decagons and pentagons (left). A classic Islamic pattern developed from the same subdivision of the decagon (right)

Fig. 5 The regular hexagon, with edge length 1 and diagonal √3. Two parallel diagonals and edges form the root-3 rectangle shown in orange


Fig. 6 Harmonic subdivisions of the root-3 rectangle


Fig. 7 An example of a geometric pattern from Islamic art, with a root-3 rectangle as the repeat unit

Interactions between the square and the root-3 rectangle create a series of rectangles with edge lengths in the ratios of 1:√3, 1:(√3 − 1), and 1:(√3 + 1) that may be used together at various scales to create endless possible arrangements and subdivisions within the root-3 rectangle – see Fig. 8.
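The cycle of rectangles described here can be traced step by step. The following sketch assumes that, at each step, one square is cut from the rectangle against a shorter edge, as in Fig. 8; the function names are illustrative, and ratios are reported as longer edge divided by shorter edge, so that a rectangle of 1:(√3 − 1) appears as 1/(√3 − 1).

```python
import math


def remove_square(long_edge, short_edge):
    """Cut a square of side `short_edge` from the rectangle and return
    the edges of what remains, ordered as (longer, shorter)."""
    remainder = long_edge - short_edge
    return max(short_edge, remainder), min(short_edge, remainder)


def ratio_cycle(long_edge, short_edge, steps):
    """Longer-to-shorter edge ratios of the rectangles left after each
    successive square has been removed."""
    ratios = []
    for _ in range(steps):
        long_edge, short_edge = remove_square(long_edge, short_edge)
        ratios.append(long_edge / short_edge)
    return ratios


# Start from the root-3 rectangle with edges sqrt(3) and 1.
ratios = ratio_cycle(math.sqrt(3), 1.0, 3)
for step, ratio in enumerate(ratios, start=1):
    print(step, round(ratio, 6))
```

After three removals the remaining rectangle has the proportions of the original root-3 rectangle again, which is why the sequence can continue indefinitely at ever smaller scales.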

The Octagon The regular octagon with edge length 1 has a diagonal of length √2 + 1 (Fig. 9). The ratio 1:(√2 + 1) is known as the silver ratio. The silver rectangle is another dynamic rectangle and may be subdivided by octagons, squares, and related forms, as shown in Fig. 10. As with the golden rectangle and root-3 rectangle, a repeating sequence of rectangles may be created by the use of squares inside of the silver rectangle. This time, at each generation two squares are placed inside the silver rectangle, leaving a smaller silver rectangle at the next generation, as shown in Fig. 11. The silver number has interesting numerical properties (see Fig. 12) and is used within Islamic arts to subdivide the square, resulting in interesting interplays of form and harmony – see Fig. 13. We have seen here a variety of patterns and geometric forms derived from the regular pentagon, hexagon, and octagon, with the intention of appreciating their unique – and at times delightful – properties. We have also seen examples of how these ratios and forms are at the foundation of harmony for geometric art,


Fig. 8 Sequence showing the interaction between the square and root-3 rectangle. If a square (with edge length equal to the shorter edge of the root-3 rectangle) is placed inside the rectangle against a shorter edge, the remaining area is a rectangle of 1:(√3 − 1). This process may be repeated, leaving a rectangle of 1:(√3 + 1). Applying this process again returns the area to the proportions of the original root-3 rectangle. This process may be repeated indefinitely with a series of rectangles of ratios 1:√3, 1:(√3 − 1), and 1:(√3 + 1). Bottom right: a possible subdivision of the root-3 rectangle into squares and smaller congruent rectangles

particularly in the Islamic tradition. These principles also extend to other traditions that are based upon the regular polygons and Euclidean geometry. Comparing the systems of harmony of time and space, as described here, we find both similarities and differences. In much the same way as the musical bar is divided into beats, which in turn are further subdivided (with each subdivision having its own relevant and defined place in relation to other parts and to the whole rhythmical structure), so we see an equivalent with the elements of a pattern, where each unit is further subdivisible (again, with each subdivision having its place relative to the whole, with no areas left unresolved). The common root of mathematics is clear in both systems. However, when we observe the numbers at play, two different systems appear. The harmony of time is based upon whole number ratios both in pitch and rhythm: 2:1 resulting in an octave (C to C) and 3:2 in a fifth (C to G), while integer divisions of a bar


Fig. 9 The relation between the edge length and the diagonal of the octagon describes the silver ratio of 1:(√2 + 1). The shaded area is the silver rectangle

Fig. 10 Harmonious subdivisions of the silver rectangle


Fig. 11 Interactions between the silver rectangle and the square. A silver rectangle may be subdivided into two squares and a smaller silver rectangle (left). Repeating this process (center) reveals the framework for the related double spiral (right)

\[ s = 2 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}}} \qquad s^{2} = 2s + 1 \qquad s = 2 + \frac{1}{s} \]

Fig. 12 The silver number expressed as a continued fraction (left)

create a certain number of beats. Visual harmony is also based upon simple integer divisions, but this time of the circumference of the circle. The numbers that unfold over the two-dimensional interaction of the intersecting lines are relatively complex and involve irrational numbers. These two number systems are an outcome of the respective dimensions at play within the disciplines of music and art, rhythm being one-dimensional (that of time), while pattern is based in two dimensions (those of space). The act of bringing together music and pattern becomes essentially about bridging the gap between one and two dimensions.
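To make the contrast with whole-number ratios concrete, the irrational proportions met in the pentagon, octagon, and hexagon can each be written as the positive root of a quadratic with integer coefficients. The short derivation below is a worked restatement of the identities quoted for the golden and silver numbers, with s denoting the silver number √2 + 1; the decimal values are rounded.

```latex
\begin{align*}
\varphi = 1 + \frac{1}{\varphi}
  &\;\Longrightarrow\; \varphi^{2} - \varphi - 1 = 0
  \;\Longrightarrow\; \varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618, \\
s = 2 + \frac{1}{s}
  &\;\Longrightarrow\; s^{2} - 2s - 1 = 0
  \;\Longrightarrow\; s = 1 + \sqrt{2} \approx 2.414, \\
x^{2} - 3 = 0
  &\;\Longrightarrow\; x = \sqrt{3} \approx 1.732 .
\end{align*}
```

The second line also accounts for the behaviour of the silver rectangle under square removal: taking two unit squares from a rectangle of edges 1 and s leaves edges 1 and s − 2 = 1/s, which is again in the ratio 1:s. None of these roots can be expressed as a ratio of whole numbers, unlike the 2:1 and 3:2 of the octave and fifth.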

A Mapping between Music and Geometric Art Color to Pitch Relationship The relationship between pitch and color described here is based upon the wave properties of sound and light. The mapping is not new, but is described for the sake of completeness, in order to present the whole method together. Sound and light may both be understood as waves, though they are of differing natures. Light is part of the electromagnetic spectrum and is a transverse wave with the ability to travel through a vacuum, whereas sound waves are longitudinal, requiring a medium such as air or water in which to travel. Studying the wave


Fig. 13 A geometric design from the Alhambra in Spain, showing the repeat unit as a square, with the silver ratio at the foundation of the design


Fig. 14 The amplitude of a wave

nature of these two allows for simple correlations to become apparent between the properties of color and sound, these correlations being supported by experiments in cross-modal correspondence.

Loudness and Brightness The amplitude of a wave is defined as the distance of a peak or valley from the equilibrium point – see Fig. 14. The amplitude of a sound wave corresponds to the loudness of the sound, where increased amplitude results in increased loudness. In regard to light waves, the amplitude relates to the brightness of light. From this we deduce that the amplitude of the waveform relates the loudness of the sound to the brightness of the color.


Studies in cross-modal correspondence support this correlation between loudness of sound and brightness of color. For example, experiments conducted by Stevens and Marks (1965) found consistent correlations between loudness of sound and brightness of color. The inverse of the above-stated relationship is also of relevance in this context. If the color presented is based upon a white ground rather than black, one might expect a louder sound to have a more intense color, so creating a darker result. An experiment was conducted that asked subjects to match the loudness of the sound to samples of neutral grey paper of different values, presented on either a white background or dark background. On the dark background, 10 of the 12 participants matched the increasing loudness to increasing brightness of the cards, while 2 matched this with increasing darkness. With the light background, of the 10 subjects involved, 4 matched increasing loudness to increasing brightness, 5 to increasing darkness, with 1 subject matching identical sound pressures to all the greys (Marks 1974). In this experiment, there is a consistent relationship displayed between the increase or decrease in both loudness and brightness: on a darker background the tendency was towards the sequence increasing in brightness, while on a white background it tended towards increased darkness. It is the relationship of increased darkness with increased loudness that is applied in this chapter and was predominantly applied in A Hidden Order. Starting from the white background of the paper, louder notes would leave darker marks, with quieter notes leaving less dark marks. Ultimately silence would be the white of the background.
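The convention adopted above (white paper for silence, darker marks for louder notes) might be sketched as follows. The linear scale, the normalization of loudness to the range 0 to 1, and the 8-bit gray levels are all assumptions made for this illustration; the chapter does not specify how loudness is quantified or how the darkness of a mark is graded.

```python
def mark_gray_level(loudness, white=255):
    """Gray level of a mark on a white ground for a given loudness.

    `loudness` is a hypothetical normalized value: 0.0 is silence and
    1.0 is the loudest note. The result runs from `white` (the paper
    itself) down to 0 (black), so louder notes leave darker marks.
    """
    if not 0.0 <= loudness <= 1.0:
        raise ValueError("loudness must lie between 0.0 and 1.0")
    return round(white * (1.0 - loudness))


for loudness in (0.0, 0.25, 0.75, 1.0):
    print(loudness, mark_gray_level(loudness))
```

Reversing the direction of the scale would give the alternative reading for a dark ground, where increasing loudness was more often matched to increasing brightness.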

Hue and Pitch The frequency of a wave is defined as the number of wave cycles passing a given point per second and is measured in hertz (Hz). The frequency of light determines the hue of the color, and in sound it governs the pitch of the note. It is upon this common root of frequency that we relate the pitch of the sound to the hue of the color. The human eye is sensitive to wavelengths between 390 nm and 760 nm (Jacobson et al. 2000), a nanometer being 1/1,000,000,000 of a meter, the frequency range being approximately 400–770 THz. The wavelengths of sound are much larger, at around 1.3 m for middle C, the frequency range of audible sound being roughly 30–18,000 Hz (Taylor 2003). From these frequency ranges, we can see that humans are able to hear up to around ten octaves (doublings of frequency), whereas the range of visible frequencies is under one doubling of frequency. This sets up a mismatch between the ranges of the two sets to be mapped to one another. The auditory experience of the interval of an octave in music is such that two notes an octave apart (with a frequency relationship of 1:2) possess a similar quality, referred to as octave equivalency. This property leads to the formation of cyclical scale systems whereby each note that is double the frequency of another has the same name attributed to it, with a key being defined by the collection of notes within one octave. The perception of hue is also of a cyclical nature, with violet and red at opposite ends of the visible spectrum, bridged by magenta (not in itself a spectral color,


S. Mara and L. Westwood

but a combination of the opposite ends of the spectrum) that completes the cyclic gradation of hues. The means of mapping hue to pitch makes use of the property of octave equivalency and the cyclic nature of both pitch and hue. Specifically, the visible spectrum of colors (plus magenta) is mapped onto the frequencies within one octave of sound, meaning that any two pitches an octave apart are mapped to the same hue. This creates a continuous mapping between hue and the frequencies of pitch within an octave, independent of any formal system of frequencies, notes, or scales. As each subsequent octave maps to the same range of hues, it is consequently a many-to-one mapping from pitch to hue. Let us now consider how our starting pitch is mapped to a particular hue, from which point all the other frequencies may be mapped relative to this.

C Is Green To map a particular frequency of hue to a particular frequency of pitch, we start from a wavelength of light. For our purposes, we shall choose a mid-green light at a wavelength of 520 nm, though it should be noted that each hue is related to a band of wavelengths on the visible spectrum and is not assigned a particular wavelength as such. When converted to frequency, a light wave of 520 nm comes to approximately 576 THz. This frequency, considered as a sound wave, is far beyond the human hearing range. However, recognizing the property of octave equivalency, the frequency may be halved and still retain the same quality and so the same pitch-to-color mapping. Halving the frequency repeatedly (41 times, in fact), stepping down one octave each time, eventually brings the frequency within audible range at 262.17 Hz, very close to the note of middle C (261.63 Hz) in equal-tempered tuning. The authors do not maintain that this relationship between the note of C and the color of green holds any weight, but it is a seemingly logical way of establishing a mapping between pitch and hue. 
Once this relationship between the frequency of light and sound is established, the other frequencies are derived in relation to these, resulting in a continuous mapping. Applied to the notes of the chromatic scale, an approximate relationship is derived, as shown in Fig. 15.
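The arithmetic behind this mapping can be checked with a short sketch. It converts the 520 nm wavelength to a frequency, halves it until it is audible, and expresses any pitch as a position within the octave. The function names, the 500 Hz stopping threshold, and the use of the vacuum speed of light are assumptions made for illustration and are not part of the authors' method.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second (vacuum value, assumed here)


def light_frequency_hz(wavelength_nm: float) -> float:
    """Frequency of light, in hertz, for a wavelength given in nanometres."""
    return SPEED_OF_LIGHT / (wavelength_nm * 1e-9)


def drop_octaves(frequency_hz: float, ceiling_hz: float = 500.0):
    """Halve a frequency until it falls below ceiling_hz.

    Returns the reduced frequency and the number of halvings (octaves).
    The 500 Hz ceiling is an arbitrary choice that lands in the octave
    around middle C.
    """
    halvings = 0
    while frequency_hz >= ceiling_hz:
        frequency_hz /= 2.0
        halvings += 1
    return frequency_hz, halvings


def octave_position(frequency_hz: float, reference_hz: float) -> float:
    """Position of a pitch within the octave, as a fraction in [0, 1).

    Pitches an octave apart give the same value, which is what makes the
    pitch-to-hue mapping many-to-one.
    """
    return math.log2(frequency_hz / reference_hz) % 1.0


if __name__ == "__main__":
    green_c, octaves = drop_octaves(light_frequency_hz(520.0))
    print(f"{green_c:.2f} Hz after {octaves} halvings")
    # Octave spans of hearing (30 Hz to 18,000 Hz) and of vision (400 to 770 THz)
    print(f"audible span: {math.log2(18_000 / 30):.2f} octaves")
    print(f"visible span: {math.log2(770 / 400):.2f} octaves")
    # Equal-tempered semitones above the reference, as fractions of the hue cycle
    for semitone in range(12):
        pitch = green_c * 2 ** (semitone / 12)
        print(semitone, round(octave_position(pitch, green_c), 3))
```

How a position in the octave is turned into an actual hue (for example, the direction of travel around the color circle in Fig. 15) is left open here, since the text does not state it numerically.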

Brightness, Loudness, and Pitch A remaining concern is that all the scales are now matched to the same hue, leaving no differentiation between high-pitched notes and low-pitched ones. This brings us back to Kandinsky’s quote and the correlation between an increase in brightness and higher pitch which, as was stated earlier, is supported by studies in cross-modal mappings. As described above, brightness has already been attributed to loudness, meaning that now both the loudness of the note and height of the pitch contribute to the brightness of the color. This relationship of both loudness and pitch to brightness has also been noted in cross-modal studies: Marks states that “visual brightness has at least two structural and functional correlates in the auditory realm – pitch and loudness” (Marks 1989).

Fig. 15 Hue-to-pitch relationship within one octave

The resultant effect of both loudness and pitch relating to brightness is that the same hue and brightness of a color may be achieved by two notes an octave apart, but with a counterbalancing change in loudness.

Timbre and Saturation The “timbre” of a sound is the quality that allows us to tell the difference between two different instruments playing the same note at the same loudness. It may be defined as “the way in which musical sounds differ once they have been equated for pitch, loudness, and duration” (Krumhansl 1989). Timbre is a complex subject and various aspects contribute towards it, one of these being the relationships between the overtones of a sound. In musical sounds, the overtones typically form the harmonic series, whereby the frequency of each overtone is a whole-number multiple of the fundamental frequency. It is the relative amplitudes of these overtones that have an effect upon the timbre of the sound. In general terms, a tone consisting of a single sine wave with few or no overtones will have a very “pure” sound, where the pitch of the sound is clearly discernible. A sound wave with complex interactions in its harmonic series and less order among them will have a less pure sound, leading ultimately to white noise, where no particular pitch is defined.


Likewise with color, a light source containing just one wavelength will output a color with high saturation, such as the light emitted by a laser. More complex interactions of different frequencies of light will reduce the saturation until ultimately the color is a shade of grey, white, or black. By the complexity of the interaction of frequencies in either sound or light, this mapping relates aspects of the timbre of sound to the saturation of color. This correlation has also been noted in studies in cross-modal mappings (Caivano 1994).

A Relationship Between Rhythm and Pattern

A Unit of Time and a Unit of Space To explore the relationship between rhythm and pattern, we shall start from a basic premise where one unit of time relates to one unit of space. For our unit of time, we shall choose a “beat” in music. This assumes that the music in question does in fact have a pulse whereby a beat may be defined, thereby excluding arrhythmic music. As a unit of space, we shall choose the square. We shall see later that we may equally choose other regular polygons. The decision to start from a polygon is based upon the approach to harmony of space described above. We now have a beat as our unit of time, represented by a square as our unit of space. The next question follows: what would two beats look like? One option is to place another square next to our original square, creating a double square. This is a valid step and is explored later in this chapter. We shall first look at the approach that represents two “beats” also as a square but twice the area of the original square (see Fig. 16). The process of doubling the area may be repeated indefinitely, each step representing double the number of beats, and thus creating the sequence 2, 4, 8, 16, 32, 64, and so on (see Fig. 16).

We shall refer to each square as a “generation,” so that the first square may be known as the “first generation,” the square of area two as the “second generation,” area 4 as “third generation” and so on. Beats 1 and 2 are already located within the diagram, the first beat being the original square and the second beat being the second generation minus the first generation square. Beats 3 and 4 are located somewhere in the area defined by the third generation square minus the second generation square. Beats 5–8 are located somewhere within the fourth generation square minus the third generation square. We may now ask, where are beats 3–8 located? 
A solution presents itself when we reveal a natural subdivision of these nested squares into a grid of equally sized and shaped cells. Each cell is now the visual representation of a beat in music – see Fig. 17. The convention here shall be that the cell number is correlated to the beat number directly (e.g., cell-3 is the visual representation of beat-3). The grid displays fourfold symmetry about the origin, so from here only the top-left eighth of the grid needs to be considered, as this area is reflected and repeated around the origin.
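The relationship between beat numbers and generations described above can be written down directly. The sketch below assumes beats and generations are both numbered from 1, as in the text; the function names are hypothetical.

```python
def generation_of_beat(beat: int) -> int:
    """Generation square in which a beat (numbered from 1) first appears."""
    if beat < 1:
        raise ValueError("beats are numbered from 1")
    # Generation g has area 2**(g - 1), so it holds beats 1 .. 2**(g - 1).
    return (beat - 1).bit_length() + 1


def beats_added_by_generation(generation: int) -> range:
    """Beats lying in a generation square but outside the previous one."""
    if generation < 1:
        raise ValueError("generations are numbered from 1")
    if generation == 1:
        return range(1, 2)
    return range(2 ** (generation - 2) + 1, 2 ** (generation - 1) + 1)


if __name__ == "__main__":
    for g in range(1, 6):
        print(g, list(beats_added_by_generation(g)))
```

This reproduces the statement that beats 3 and 4 belong to the third generation and beats 5–8 to the fourth. It says nothing about where within the new area each beat sits, which is the question the cell indexing answers.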

Fig. 16 A square representing one beat in music (left). Doubling the area of the square represents 2 beats (center). The process of doubling the area is repeated three more times, resulting in a series of nested squares with the largest representing 16 beats (right)

Fig. 17 The nested square sequence broken down into a symmetric grid of equally sized cells. Each cell represents one beat and is repeated eight times around the origin. The cells in the top left section of the grid are indexed according to which beat they represent


Within a given generation, there is a choice as to the location of a particular cell number. A logical choice within the third generation square is to place cell-3 neighboring cell-2, and cell-4 neighboring cell-3. This indexing sequence may be continued indefinitely, whereby any two consecutive beats are visually represented by contiguous cells (see Fig. 17). This sequence describes a version of the Sierpinski space-filling curve from fractal geometry, which may be created by joining the centers of consecutive cells in the grid by a continuous curve, as shown in Fig. 17. A curve is inherently 1-dimensional, the term “space-filling” referring to the fact that this curve will, in the limit, fill a two-dimensional space. As Peitgen describes, “given some patch of the plane, there is a curve which meets every point in that patch” (Peitgen et al. 2004). Successively subdividing the grid and redrawing the curve creates more and more dense versions (Fig. 18), tending towards its limit of covering every part of the grid. The principles behind space-filling curves are well suited to our aim of crossing the dimensional gap between 1 and 2 dimensions, from line to plane, and from rhythm to pattern. By implementing this indexing sequence with a selection of rhythms, we see how each rhythm is represented by a unique pattern that may also be read back from pattern to rhythm. In Fig. 19, the shaded areas represent an accented beat, and the white areas represent silence in the music. The visualizations reveal an inherent problem with the method and indexing sequence so far described, in that simple rhythms do not necessarily relate to simple patterns. For example, one of the simplest rhythms in music, where every other beat is sounded, leaving the intermediate beats silent, creates a relatively complex pattern (see the first pattern in Fig. 19). Within this pattern, the shapes formed on the horizontal and vertical axes differ from those on the diagonal axes, which in turn differ from those not lying on the axes at all, while only at the origin is a square formed. Consequently, there are four different forms representing a rhythm that repeats every two beats.

Fig. 18 Three stages of a version of the Sierpinski space-filling curve. The curve is created by joining the centers of each of the cells with a continuous line in the order shown in Fig. 17. The three stages illustrate how the curve becomes tighter and more dense. When the cells are infinitely small, the curve will cover every part of the defined area

Fig. 19 Three 16-beat rhythms and related patterns. A rudimentary rhythm where every other beat is accented (left). A selected rhythm (center) and a palindromic rhythm where beats 9–16 are the reverse of beats 1–8 (right). The rhythms applied are shown in the row of boxes numbered 1–16 above each pattern. Each box relates to a beat, with shaded boxes representing accented beats and white boxes representing silence

It turns out that palindromic rhythms create simple patterns (Fig. 19). However, palindromic rhythms are not a standard approach to rhythm in music, and this translation between rhythm and pattern is not consistent in relating the perceived complexity of the rhythm to that of the pattern. When considering the qualitative aspects of rhythm and pattern, there are basic relationships which ideally should hold true for a successful visualization of music and vice versa. Two of these are:

1. A sparse rhythm should create a sparse pattern, and they should increase in density together.
2. The complexity of a rhythm should be reflected in its visual counterpart, so a simple rhythm creates a simple pattern.

Of these two requirements the first is already met, but not the second. A solution that meets both requirements is to re-order the cell indexing, as shown in Fig. 20. In a sense, we have embedded the palindromic aspect within the cell order itself, so relieving the rhythmic counterpart of that restriction. This cell order no longer has the property of the original space-filling curve whereby consecutive beats are represented in contiguous cells. However, on visualizing various rhythms, the results satisfy the requirement that simple rhythms create simple patterns and that the complexity of the two increases together (see Fig. 21). The new indexing sequence is created through a series of reflections, whereby each reflection-line runs along the edge of a generation square (see Fig. 22). These reflections govern the ordering of the cells. Through this process, cell 2 is located via a reflection of cell 1 through reflection-line-1 (rl-1), which runs along the edge of the original square. Cells 3 and 4 are located by reflecting cells 1 and 2 using rl-2 to map them onto the new cells 3 and 4, respectively. Rl-3 maps each of cells 1 to 4 to a new cell, as follows: 1 → 5, 2 → 6, 3 → 7, and 4 → 8. The reflection-lines determine the indexing of the cells as each generation is a reflection of all the previous generations, reflecting the origin of the grid out to the vertex of the new generation square and preserving the relative order of cells through the reflection.

Fig. 20 Alternative cell indexing

Fig. 21 The same 16-beat rhythms as visualized in Fig. 19, but this time applying the new indexing system. The simple rhythm on the left now creates a visually simple pattern more in line with what one would expect of such a simple rhythm

Binary Counting Grid An interesting property becomes apparent if the cells are numbered with binary numbers, with the first cell as 0 – see Fig. 23.

Fig. 22 Series of reflections used to locate and index the cells

Fig. 23 Indexing system using binary numbers

Each cell may be located within the grid through a unique combination of reflections. For example, to locate the cell with binary number 1000 (decimal 8), only one reflection is necessary, rl-4; rl-1, rl-2 and rl-3 are all omitted (Fig. 24). Figure 25 shows another example, locating the cell with binary number 1011 by applying rl-1, rl-2 and rl-4, while leaving out rl-3. As a final example, the cell with binary number 110 (decimal 6) is located by applying rl-2 and rl-3, leaving out the first reflection-line. Rl-4 is not applicable as it relates to cell numbers beyond this generation (see Fig. 26). Table 1 displays the transforms that are applied to locate the cells in the examples. On observing the binary numbers and the reflection-lines applied, it becomes apparent that in the above examples the binary expression of the cell number encodes the instructions as to which reflections to apply in order to locate that cell within the grid. Reading the digits of the binary numbers from right to left, each digit corresponds to a particular reflection-line in order, whereby the first digit relates to the first reflection, the second digit to the second reflection and so on. If the digit is “1” then we apply the associated reflection-line; if it is “0” then this reflection-line is omitted. It turns out that this may be extended indefinitely: for example, to locate the cell with binary number 111000110110011 within the grid, we apply reflection-lines 1, 2, 5, 6, 8, 9, 13, 14, and 15, and omit the others. In a sense this grid and indexing sequence may be considered a visual form of the binary counting system.

Table 1 This table shows the reflection-lines applied to locate three cells within the grid. Note how the reflection-lines applied relate to the binary numbers themselves

Binary number   Reflection-line-1   Reflection-line-2   Reflection-line-3   Reflection-line-4
1000            omitted             omitted             omitted             applied
1011            applied             applied             omitted             applied
110             omitted             applied             applied             not applicable

Fig. 24 Locating cell with binary number 1000. Only the fourth reflection-line (rl-4) shown in orange is applied

Fig. 25 Locating cell with binary number 1011 using three reflections: rl-1, rl-2 and rl-4

Fig. 26 Locating cell with binary number 110 using two reflections: rl-2 and rl-3

Fig. 27 The Hilbert Curve and cell indexing of the first 16 cells in the grid
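The digit-reading rule described above is easy to express in code. The following sketch takes a binary label as a string and lists the reflection-lines to apply; the function name `reflection_lines` is a hypothetical choice, and the sketch covers only the decoding step, not the geometry of the reflections themselves.

```python
def reflection_lines(binary_label: str) -> list[int]:
    """Reflection-line numbers applied to reach the cell with this binary label.

    Digits are read from right to left: the rightmost digit controls rl-1,
    the next rl-2, and so on. A "1" applies the line, a "0" omits it.
    """
    if not binary_label or set(binary_label) - {"0", "1"}:
        raise ValueError("expected a non-empty string of 0s and 1s")
    return [
        position
        for position, digit in enumerate(reversed(binary_label), start=1)
        if digit == "1"
    ]


if __name__ == "__main__":
    for label in ("1000", "1011", "110", "111000110110011"):
        print(label, reflection_lines(label))
```

Run on the examples in the text, this returns rl-4 alone for 1000, rl-1, rl-2 and rl-4 for 1011, rl-2 and rl-3 for 110, and lines 1, 2, 5, 6, 8, 9, 13, 14 and 15 for the fifteen-digit label.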

An Alternative Square Tiling In the derivation of the previous grid and mapping, we explored the route that represented two beats as a larger square with twice the area of the original square. Now we shall look at two options pursued from an alternative approach, whereby two beats are represented by two squares placed next to each other.

Hilbert Curve Tiling We start from one square representing a beat and the double square representing two beats. A generation may be completed by adding two further squares to the first two, forming a larger square composed of four squares. To continue the process, we may refer to another space-filling curve known as the Hilbert Curve (see Fig. 27). As this grid resolves to the next generation at four times the original area, it relates to a time signature based upon the powers of 4 (groups of 4 beats or 16, 64 and so on). On visualizing rhythms using this grid with the standard indexing for the Hilbert curve, we find a similar issue to that of the Sierpinski space-filling curve, in that simple rhythms result in complex patterns (see Fig. 28). As before, this is overcome by indexing the cells in a different sequence, based upon reflections, as in Fig. 29.
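The difference between the two indexings of the 4 × 4 grid can be illustrated with a small sketch. `hilbert_cell` uses the widely published index-to-coordinate routine for the Hilbert curve. `reflected_cell` assumes a particular order of reflection axes (vertical, horizontal, horizontal, vertical) chosen to be consistent with the cell numbers printed in Fig. 29; that order, the function names, and the reduction of the rhythm to accents on beats 1, 5, 9 and 13 are assumptions for illustration rather than the authors' stated construction.

```python
def hilbert_cell(size: int, d: int) -> tuple[int, int]:
    """(x, y) of step d (from 0) along a Hilbert curve on a size-by-size grid.

    size must be a power of two; (0, 0) is the bottom-left cell.
    """
    x = y = 0
    t = d
    s = 1
    while s < size:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Axis and block size for rl-1 .. rl-4 (assumed order, see the lead-in).
REFLECTIONS = (("x", 1), ("y", 1), ("y", 2), ("x", 2))


def reflected_cell(d: int) -> tuple[int, int]:
    """(x, y) of cell d (from 0, at most 15) under the reflection-based indexing."""
    x = y = 0
    for bit, (axis, block) in enumerate(REFLECTIONS):
        if (d >> bit) & 1:  # binary digit 1: apply this reflection-line
            if axis == "x":
                x = 2 * block - 1 - x
            else:
                y = 2 * block - 1 - y
    return x, y


def render(cell_of, accented_beats, size: int = 4) -> str:
    """Text picture of the grid: '#' for accented beats, '.' for silence."""
    marks = {cell_of(beat - 1) for beat in accented_beats}
    rows = []
    for y in reversed(range(size)):  # print the top row first
        rows.append("".join("#" if (x, y) in marks else "." for x in range(size)))
    return "\n".join(rows)


if __name__ == "__main__":
    accents = [1, 5, 9, 13]  # first beat of each group of four
    print(render(lambda d: hilbert_cell(4, d), accents))
    print()
    print(render(reflected_cell, accents))
```

Under these assumptions the reflection-based order places beats 1, 5, 9 and 13 in the four corners of the grid, while the Hilbert order scatters them irregularly, which is the contrast the text describes between Figs. 28 and 29.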

Fig. 28 Visualization of a simple 4 beat repeated rhythm on a grid based on the Hilbert Curve. The white areas represent silence while darker cells represent accented beats, with the intermediate shades representing varying degrees of loudness

In contrast to the first grid explored, we begin to see here how each grid has its own distinctive quality and enables a different language of form and pattern.

The Dragon Curve When starting from a square and a double square, we may alternatively draw upon the space-filling curve called “The Heighway Dragon.” As opposed to the previous grids that are created by the use of reflections, this grid is based upon rotations, creating a distinctive visual quality. Each generation doubles the area of the grid. Unlike the other grids, each generation does not resolve to the same congruent form, but tends towards a particular form known as the Heighway Dragon (Fig. 30). The grid is indexed using the same principles as before, where each rotation maps the existing cells onto the new cells in the same order (see Fig. 31). The use of rotations rather than reflections lends the grid a look and feel that is no longer so reminiscent of traditional forms of geometry (such as those found in Islamic art), but is characteristic of fractal geometry. In contrast to the hard crystalline quality of the reflection-based grids, this grid may lend itself to certain styles of music.

Fig. 29 Visualization of the same simple repeated rhythm as in Fig. 28, using a grid based on the “Hilbert Curve” with an alternative indexing sequence

Hexagons The above examples all explored possibilities on the basis of a square representing a beat. What happens when we choose a hexagon to represent a beat? What implications does this have upon the system for translating rhythm to pattern? Following a similar process to that implemented in the first of our grids, we start by assigning a regular hexagon to represent a beat. Two beats are represented by a hexagram, so doubling the area. At the next step the area of the original form is tripled and the sequence returns to a regular hexagon, so completing one generation (see Fig. 32). Continuing this process creates a sequence of nested hexagons that triples the area with each generation (1, 3, 9, 27, etc.). This series of nested hexagons will naturally subdivide into a grid of triangles, as shown in Fig. 33. As the grid resolves to each new generation at powers of 3, it relates to musical time signatures based upon the same numbers of beats per bar, such as 3, 9 or 27 and so on. The grid may be created, and cells located, by a series of reflections in the same manner as the first grid, though now there are three possible transformations (no reflection, 1 reflection, or 2 reflections), as shown in Fig. 34.

Fig. 30 The first six stages of the Heighway Dragon created through a series of 90° rotations, and a higher order of the curve showing its distinctive visual quality

Fig. 31 Heighway Dragon grid, visualizing a simple 4-beat repeated rhythm. The grid is composed of four Heighway Dragon curves meeting at the origin, creating the fourfold symmetry

Fig. 32 The regular hexagon, representing 1 beat of music (left); doubling the area results in a hexagram; returning to another hexagon three times the area of the initial hexagon; repeating this process results in a hexagon nine times the original area (right)

Fig. 33 The hexagon divided into a grid representing 9 beats of music

As with the square grid and its relation to the binary counting system, this grid and indexing system relates to the ternary counting system. Here, each cell number written as a ternary number contains the information as to which reflections to apply and which to omit to locate any given cell within the grid.
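By analogy with the binary case, the ternary expansion of a cell number can be computed as below. This is only a sketch: numbering the first cell as 0 is carried over from the binary grid as an assumption, and reading each digit (0, 1 or 2) as the choice among the three transformations for that generation is an interpretation of the text, since the exact geometry is given only in Fig. 34. The function name is hypothetical.

```python
def ternary_digits(cell_index: int) -> list[int]:
    """Ternary digits of a cell index (first cell = 0), least significant first.

    Position k in the returned list corresponds to generation k + 1; the
    digit value selects one of the three transformations for that generation.
    """
    if cell_index < 0:
        raise ValueError("cell indices start at 0")
    digits = []
    while True:
        cell_index, digit = divmod(cell_index, 3)
        digits.append(digit)
        if cell_index == 0:
            return digits


if __name__ == "__main__":
    for index in range(9):
        print(index, ternary_digits(index))
```

The nine cells of Fig. 33 then correspond to the nine two-digit ternary numbers, just as the sixteen cells of the square grid correspond to the sixteen four-digit binary numbers.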

More Hexagons Returning to the sequence from hexagon to hexagram, then to a larger hexagon, we may continue another step to arrive once more at a larger hexagon which is four times the original area, again continuing this sequence to quadruple the area in each generation (see Fig. 35). Because the hexagon resolves with an increase of three times or four times the original area, hexagonal grids may be created that resolve after groups of 3 cells, 4 cells, or any combinations of these (e.g., 9, 12, 16, 18, 36). This makes it possible to explore a variety of different note groupings, time signatures, and, accordingly, musical styles.

Fig. 34 Reflections used to locate cells within the hexagonal grid. The resulting grid is a visual representation of the ternary counting system

Fig. 35 The hexagon may also follow a sequence where it resolves to a larger hexagon at four times the original area (top). The related grid displays the first 16 cells (bottom)

Rhythmic Motifs Within music a rhythmic motif may define an entire style of music. A Flamenco compás, the clave from Central American music, the waltz and so on, all have unique qualities defined by the arrangement of beats, accents, and rests. The instrumentation and performance of these motifs are of vital importance to the music, but the DNA of the style, as it were, is encoded within the rhythm itself. Figure 36 shows examples of a variety of traditional rhythms translated into their corresponding patterns. These examples are all 12-beat motifs based upon a hexagon divided into a 12-cell sequence. As with the rhythmic motifs, each visual pattern has its own unique character and expression, created purely by the different arrangements of the shaded cells within the grid.

Rotations Figure 37 shows an example of a hexagonal grid created through rotations. The aesthetic quality is in line with the Heighway dragon, but this grid has sixfold symmetry rather than fourfold, and relates to a time signature with 6 or 12 beats per bar. Further grids created through rotations are possible, exploring different symmetries and related time signatures.

Grid Symmetry, Time Signature, and Structure of the Composition The symmetry of the grid relates to the time signature of the music. The growth of the areas between subsequent generations of the grid determines the number of beats in a bar. For example, in the first square grid described, a square resolves to a larger square with a doubling of the area, relating to either a 2-beat bar, a 4-beat bar, an 8-beat bar, or any number in that doubling sequence. The hexagon resolves to a larger hexagon with a tripling of the area, relating to a 3-beat bar, but may also resolve to the next generation at 4 times the original area, so relating to a 4-beat bar, or a combination of these. The structure of the grid also determines the structure of the musical piece beyond a bar length. It determines the arrangement of the bars into sections and ultimately of the sections into the whole composition. For example, with the square grid, we may choose a bar length of 4 beats. These bars themselves would also be structured into sections relating to the grid sequence, such as 16 bars. These sections would then also be structured according to the grid and could have four sections, creating a macro-level pattern over the whole piece, where the juxtaposition of one section against another will create an overall pattern. Just as when exploring rhythms within a bar and the related visual motifs, the macro level of the pattern and overall structure of the piece is open to creative exploration.

Fig. 36 Four 12-beat rhythms visualized using a hexagonal grid, showing the rhythm repeated 16 times: a simple rhythm accenting the first beat of each triplet, which results in a simple pattern of uniformly spaced hexagons (top left); an African bell pattern (top right); a Flamenco compás (bottom left); and an Arabic rhythm (bottom right)

Fig. 37 A grid created through a series of rotations, displaying sixfold symmetry. Successive generations of a section of the grid are shown below

Pentagonal Symmetry The grids explored so far involve the square and regular hexagon. These two regular polygons, together with the equilateral triangle, create the three regular tilings that use just one type of regular polygon to tile the plane, leaving no gaps or overlaps. These tilings translate to time signatures commonly found within music, such as 4/4, 3/4, and 12/8. In recent decades, there has been progress in tiling theory, particularly regarding aperiodic tilings, the best known of which is the Penrose tiling, which tiles the plane indefinitely with fivefold symmetry and has two cell types. Building a grid based upon the Penrose tiling and other aperiodic tilings reveals interesting implications for the rhythmical and structural counterpart in music, as we will see. To derive a Penrose tiling, we shall use a similar approach to the other grids and start from a regular decagon. This time, rather than expanding outwards, the areas shall be subdivided to reveal the tiling. This process is known as substitution tiling, whereby each cell of a given shape is replaced by a specific grouping of cells to reveal the next generation of the tiling. The two substitutions applied here are as shown in Fig. 38. The cell which forms one tenth of the decagon (here called a type “a” tile) is substituted with one smaller version of itself and a new, wider tile called type “b.” The type-b tile, in turn, is substituted with two smaller type-b tiles and one type-a tile. These are known as Robinson Triangles. Figure 39 shows this substitution applied to the decagon four times to create a Penrose tiling. There are particular orientations in which this substitution must take place. For further information, see Grünbaum and Shephard (1987). The number of cells in one segment of the decagon in each generation increases in the sequence 1, 2, 5, 13, 34. These are alternate numbers from the Fibonacci series, the remaining numbers of the Fibonacci series being revealed where the grid resolves as a pentagonal shape composed of type-b tiles (see Fig. 40).

Fig. 38 The two substitutions of the Penrose tiling. The type-a tile is substituted by one type-a and one type-b tile. The type-b tile is substituted by one type-a and two type-b tiles

Fig. 39 Creating a Penrose tiling starting from a decagon (left), recursively applying the substitutions from Fig. 38

Fig. 40 Number of cells in each generation of the Penrose tilings (1, 2, 3, 5, 8, 13, 21, 34), also showing the intermediate steps revealing the Fibonacci sequence
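The cell counts quoted above follow directly from the two substitution rules, as the following sketch shows. It tracks only the numbers of type-a and type-b tiles in one segment of the decagon, not their positions or orientations; the function names are hypothetical.

```python
def substitute(type_a: int, type_b: int) -> tuple[int, int]:
    """One substitution step applied to tile counts.

    Each type-a tile becomes one type-a and one type-b tile;
    each type-b tile becomes one type-a and two type-b tiles.
    """
    return type_a + type_b, type_a + 2 * type_b


def cells_per_generation(generations: int) -> list[int]:
    """Total tiles in one decagon segment, starting from a single type-a tile."""
    type_a, type_b = 1, 0
    totals = []
    for _ in range(generations):
        totals.append(type_a + type_b)
        type_a, type_b = substitute(type_a, type_b)
    return totals


if __name__ == "__main__":
    print(cells_per_generation(5))
    counts = (1, 0)
    for generation in range(1, 6):
        print(generation, counts)
        counts = substitute(*counts)
```

The totals are 1, 2, 5, 13 and 34, and after the first step the separate counts of type-a and type-b tiles are consecutive Fibonacci numbers (1 and 1, 2 and 3, 5 and 8, 13 and 21).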

Fig. 41 An example of the Penrose tiling governing the structure of a musical piece. Each cell represents a beat of music. They are grouped into 5 (blue) and 8 (orange) cells, representing a 5 beat bar and an 8 beat bar. These groups are arranged into sections of 13 (green) and 21 (purple); finally these sections are arranged into a structure of the whole piece of 3 (red) and 2 (yellow)

Fibonacci, Bar Length, and Structure of Composition

The implications of the generations of this grid being based on the Fibonacci series are that, when explored as music, the Fibonacci series will govern the structure at every scale from beat to bar, to section, to the whole piece – see Fig. 41. The two cell types, although different in size, are both considered to be the same beat length in music and so represent equal lengths of time. This seems acceptable given that both cells may be considered as projections from the same higher-dimensional cubic structure (Senechal 1995). This does raise a concern, however, in that at larger scales of the grid – for example, bar lengths – the two areas which are congruent to these two forms contain different numbers of cells, and so correspond to different lengths of time. The shapes congruent to the type-a and type-b cells on any given scale contain consecutive numbers of cells from the Fibonacci series (Fig. 41), meaning that in musical form there will be two bar lengths within a piece using consecutive Fibonacci numbers. For example, we may choose a bar length of 5 beats and a


Fig. 42 Indexing the Penrose tiling using the tiling substitution order. An issue with this indexing can be seen in the second level of subdivision (center), where cells 2 and 3 together create a type-a tile shaded in orange, as do cells 4 and 5 shaded in blue, though the order of the tiles within this shape is reversed, shown by the arrow

Fig. 43 The Penrose Cartwheel tiling indexed as by F. Lunnon

bar of 8 beats. These bars shall then also be grouped into sequences based upon the Fibonacci series, and so on, up until the level of the whole piece (Fig. 41). To make meaningful use of the grid and the relationship of the forms that naturally occur within it, the composer and designer must consider the order of “a” type cells and “b” type cells within the bar. What this all adds up to is a rhythmical structure tightly governed by the Fibonacci series at every scale of the piece.
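The nesting of beats into bars and bars into sections can be sketched by expanding a tile symbol with the same substitution rule. This is an illustration under stated assumptions rather than the authors' implementation: the names `SUBSTITUTION`, `expand` and `beats` are hypothetical, and the order of tiles inside each substitution ("ab" and "abb") is chosen arbitrarily, whereas the geometric tiling fixes particular orientations.

```python
# The order of tiles inside each substitution is an assumption made for
# illustration; the geometric tiling fixes specific orientations.
SUBSTITUTION = {"a": "ab", "b": "abb"}


def expand(tile, generations):
    """Apply the substitution to a tile symbol the given number of times."""
    sequence = tile
    for _ in range(generations):
        sequence = "".join(SUBSTITUTION[t] for t in sequence)
    return sequence


def beats(tile, generations):
    """Number of beats (cells) inside a tile after the given generations."""
    return len(expand(tile, generations))


bar_lengths = {t: beats(t, 2) for t in "ab"}      # bars of 5 and 8 beats
section_lengths = {t: beats(t, 3) for t in "ab"}  # sections of 13 and 21 beats

# A type-a tile expanded three times, read as a sequence of bars:
bars = [bar_lengths[t] for t in expand("a", 3)]
print(bar_lengths, section_lengths)
print(bars, sum(bars))
```

Under these assumptions a type-a tile taken through five generations holds 89 beats, arranged as 13 bars of either 5 or 8 beats, so the two consecutive Fibonacci bar lengths appear side by side throughout the piece.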


Fig. 44 The Ammann-Beenker tiling, an aperiodic tiling displaying eightfold symmetry made with two tile shapes. The two tiles, with substitution arrangements, are shown below

Indexing the Penrose Tiling

One form of indexing the Penrose tiling would be to follow the sequence of type-a and type-b tiles created from the substitution itself. Figure 42 shows this applied to the first 13 cells of the Penrose tiling. This sequence presents an issue, in that congruent shapes within the tiling sometimes have a different route through them, as shown in Fig. 42. Other indexing sequences have been explored, though an entirely satisfactory grouping of cells which works at every level of the substitution with the Penrose tiling remains elusive.


Fig. 45 Aperiodic octagonal tiling with four tiles, shown with substitutions

However, the Penrose cartwheel tiling (Fig. 43) has been indexed by Fred Lunnon (Grünbaum and Shephard 1987) in a manner that works at every level of the sequence figure and may be applied in this translation method.

Octagonal Symmetry

Another aperiodic tiling which may be applied in this method is the Ammann-Beenker tiling, with eightfold symmetry – see Fig. 44. Another eightfold aperiodic tiling discovered by Mara, inspired by the aesthetics of Islamic arts, is shown in Fig. 45.


Summary

In summary, we have seen a process for translating rhythm to pattern and vice versa, derived from a simple premise and following logical steps in geometry. This results in a geometric grid standing as the visual counterpart of the temporal structure of music, whereby each cell in a grid represents a particular beat, allowing for creative explorations in either rhythm or pattern to be represented in the other. We subsequently explored a selection of tilings, looking at their different symmetries and related time signatures. We have observed how each tiling has its own visual qualities and vocabulary of forms and have examined the contrasting quality of tilings created through reflections or rotations, as well as the regular tilings versus aperiodic tilings.

References

Bibby N (2003) Tuning and temperament: closing the spiral. In: John F, Flodd R, Wilson R (eds) Music and mathematics from pythagoras to fractals, 1st edn. Oxford University Press, New York, pp 13–14
Boulez P (1971) Notes of an apprenticeship. Faber & Faber, London
Caivano JL (1994) Color and sound: physical and psychophysical relations. Color Res Appl 19(2):126–133
Galeev BM, Valechkina IL (2001) Was Scriabin a Synesthete? Leonardo 34(4):357–361
Griscom W (2015) Visualizing sound: cross-modal mapping between music and color. PhD, University of California, Berkeley, P1
Grünbaum B, Shephard GC (1986) Tilings and patterns. In: Klee V (ed). W.H. Freeman and company, New York, pp 540–570
Howat R (1983) Debussy in proportion. Cambridge University Press, Cambridge
Jacobson R, Ray SF, Attridge GG, Axford NR (2000) The manual of photography, digital and photographic imaging, 9th edn. Focal Press, Oxford
Kandinsky W (1914) Concerning the spiritual in art. Dover Publications, New York, p 25. (1977)
Krumhansl C (1989) Why is musical timbre so hard to understand? In: Olsson O, Nielzén S (eds) Structure and perception of electroacoustic sound and music, 1. Excerpta Medica, Amsterdam, pp 43–53
Lendvai E (2000) Bela bartok: an analysis of his music. Kahn & Averill, London
Livio M (2002) The golden ratio: the story of phi, the extraordinary number of nature, art and beauty. Review, London, p 193
Marks LE (1974) On associations of light and sound: the mediation of brightness, pitch, and loudness. Am J Psychol 87:173–188
Marks LE (1989) On cross-modal similarity: the perceptual structure of pitch, loudness, and brightness. J Exp Psychol Hum Percept Perform 15(3):598
Messiaen O (2002) Traite De Rythme, De Couleur, Et D’Ornithologie, tome VII. Alphonse Leduc, Paris, pp 95–191
Padovan R (1999) Proportion. Routledge, New York, pp 221–227, 2008
Peitgen HO, Jürgens H, Saupe D (2004) Chaos and fractals: new frontiers of science, 2nd edn. Springer Science+Business Media Inc, New York, p 92
Senechal M (1995) Quasicrystals and geometry. Cambridge University Press, Cambridge, p 195
Steinitz R (2013) György ligeti: music of the imagination. Faber & Faber, London, pp 267–269
Stevens JC, Marks LE (1965) Cross-modality matching of brightness and loudness. Proceedings of the National Academy of Sciences of the United States of America 54(2):407–411
Taylor C (2003) The science of musical sound. In: John F, Flodd R, Wilson R (eds) Music and mathematics from pythagoras to fractals, 1st edn. Oxford University Press, New York, p 51
Tymoczko D (2011) A geometry of music: harmony and counterpoint in the extended common practice. Oxford University Press, New York
Wittkower R (1953) Brunelleschi and “proportion and perspective”. J Warburg Courtauld Inst XVI. Idea and Image, Thames & Hudson, London, p 131, 1978

21 A Hidden Order: Revealing the Bonds Between Music and Geometric Art – Part Two

Lee Westwood and Sama Mara

Contents
Structure of Final Design
  Indefinite Growth
  Linear Layout
  Compound Grids
Creative Implications of a Translation Between Music and Art
  Creative Approaches Explored in A Hidden Order
  Applying Musical Composition Techniques to Geometric Artwork
  Aperiodic Rhythms
Conclusion
  Some Final Thoughts on the Research
Cross-References
References

Abstract

This chapter follows on from Part 1 in describing a method of translating music into geometric art and vice versa. Here we look at the macro level structure of the whole piece. Also discussed are the implications of establishing such a connection between music and geometric art. These include the possibility of unique creative processes that combine practices from both visual arts and musical composition, as well as facilitating the application of developments, practices and creative processes from one discipline to the other.

L. Westwood, University of Sussex, Brighton, UK
S. Mara, Musical Forms, London, UK
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_19

Keywords Music · Geometry · Visualisation · Sonification · Islamic · Art · Fractal · Aperiodic · Tilings · Harmony

Structure of Final Design

In the following text, works from the A Hidden Order project shall be used to illustrate the points being made. Created by Mara and Westwood, this project explored the relationship between music and art, implementing the process for translating music into geometric art discussed here. The project comprises a series of geometric artworks and a corresponding suite of compositions for mixed ensemble, short films, an album, an interactive platform, and a live audiovisual performance. Upon completion of the design and compositional process, the works were performed by a small ensemble comprising flute, cor anglais, cello, and marimba/percussion, and recorded professionally in the studio. These recordings were entered into a bespoke computer program implementing the translation described here, resulting in a series of high-quality digital prints and their accompanying animations.

Indefinite Growth

The mapping between pattern and rhythm discussed here describes an ever-expanding form that grows relative to the length of time of the music. The nature of how the grids are constructed means that two notes or sections of music separated by long periods of time may neighbor one another in the work’s visual form (Fig. 1). This can lead to visual results that do not correspond to the auditory experience. For example, two patterns neighboring each other may appear visually dissonant, but given the distance in time separating these sections in its musical translation, there is no corresponding dissonance aurally. Creating visually harmonious forms that bridge these areas, maintaining logical structures, is of course possible. There are many musical forms that follow similarly simple and rigid structures, and this was the approach taken towards a number of pieces within the A Hidden Order project, including Octagon I – Flute & Marimba (Fig. 2), whereby the geometric structure of the grid dictated the musical structure precisely. However, always sticking to a predefined and rigid structure ultimately leads to creative restrictions on the musical composition. To free up the compositional requirements when working with the translation from rhythm to pattern, different structural systems may be applied at different scales of the piece so that, for example, one system of symmetry may be used up


Fig. 1 An example of two cells neighboring each other which are temporally distant. The grid (right) shows four sections of the structure of a piece. The grid represents 768 beats. At 120 bpm each cell represents half a second, so the grid would represent 6 min and 24 s. The magnified area (left) shows two cells neighboring each other shaded orange and blue. These cells exist in different sections and represent beats that are 96 s apart in the musical structure

Fig. 2 Octagon I – Flute & Marimba

to the level of a bar, with another that arranges the bars and sections themselves. This becomes relevant when considering the structure of a piece of music, whereby different scales of the piece, from beat, to bar, to section, to the whole piece may follow different structural arrangements, and where each scale may be considered as an independent object within the whole, one not necessarily entwined with, or dependent on, another distant part.
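The timings quoted in the caption to Fig. 1 follow from simple arithmetic on the tempo. The sketch below reproduces them; the function names are hypothetical, and the figure of 96 s per section rests on the assumption that the four sections of the grid are of equal length.

```python
def beat_seconds(bpm):
    """Duration of one beat (one cell) in seconds at a fixed tempo."""
    return 60.0 / bpm


def grid_duration(cells, bpm):
    """Total duration in seconds of a grid with the given number of cells."""
    return cells * beat_seconds(bpm)


total = grid_duration(768, 120)         # whole grid, in seconds
minutes, seconds = divmod(total, 60)    # split into minutes and seconds
section = grid_duration(768 // 4, 120)  # one of four equal sections (assumed)

print(total, minutes, seconds, section)
```

At 120 bpm this gives 384 s, or 6 min 24 s, for the whole grid and 96 s per section, so two visually adjacent cells in neighboring sections can easily lie a full section apart in time.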


Linear Layout

One example of a change of structure applied at different scales is to introduce the linear element of music into the overall structure of the visual pattern. To illustrate this, we shall look at the piece Octagon Square–Marimba (Fig. 3), where each bar is visually represented as a square. These squares are arranged in a simple linear sequence. This arrangement allows more freedom for the musical composition, as

Fig. 3 Octagon Square–Marimba. Each bar of music is represented by a square. These squares are arranged in a linear sequence starting from top-left and ending bottom right


Fig. 4 A visualization of a change in time-signature. Each row represents one bar of music. For example, the squares (bars 1–4) could represent 4-beat bars, the triangles 9-beat bars, and the rectangles above could be based on another symmetry and represent bar lengths of 5 beats

the linear reading of the piece frees up the consideration of how disparate parts of the piece might juxtapose visually.

Change of Time Signature

It has not yet been possible to have a logical and continuous visual representation of a change of time-signature, as the symmetry of the grid would need to change in accordance with this. The linear layout provides an opportunity to allow for this to happen in a seamless manner (see Fig. 4). Any time-signature change may be accommodated by the use of a triangular or rectangular section of a grid that contains the required number of cells.
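A linear layout that tolerates changes of bar length can be sketched as a simple bookkeeping exercise: each bar is placed in reading order and records which beats it holds. The function `linear_layout` and its `columns` parameter are hypothetical, since the text does not state how many bars sit in a row; the bar lengths of 4, 9 and 5 beats are those suggested in the caption to Fig. 4.

```python
def linear_layout(bar_lengths, columns):
    """Place bars in reading order and record which beats each one holds.

    bar_lengths: beats in each successive bar (may change between bars).
    columns: bars per row; a hypothetical parameter, not given in the text.
    """
    placed = []
    first_beat = 0
    for index, beats in enumerate(bar_lengths):
        row, column = divmod(index, columns)
        placed.append({"bar": index + 1, "row": row, "column": column,
                       "first_beat": first_beat, "beats": beats})
        first_beat += beats
    return placed


# Four 4-beat bars, then two 9-beat bars, then two 5-beat bars.
layout = linear_layout([4, 4, 4, 4, 9, 9, 5, 5], columns=4)
for bar in layout:
    print(bar)
```

Because each bar only needs to know its own cell count and its position in the sequence, a change of time signature alters the shape drawn for that bar without disturbing the bars around it.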

Compound Grids

Drawing inspiration from Islamic geometric artworks once again, we see beautiful works that display local areas of radial symmetry at different scales of a piece (see Fig. 5). This artwork displays a symmetric cell unit, with an immediate surrounding area also displaying radial symmetry and ultimately a symmetric


Fig. 5 An example of Zellij tiling from Morocco, from the Moulay Idriss II. Within this artwork are areas of local symmetry that are repeated and arranged to create a symmetric overall composition (Image © Ghulam Hyder Daudpota)

composition of the whole piece. Visually this is very satisfying and something that was felt appropriate to integrate within the system itself. This is not only an aesthetic consideration but also has an impact on the effectiveness of the translation method. By applying an inherent symmetry at various levels, the bars, motifs, and sections of the musical composition are in a sense visually self-contained within an area of the piece, sitting alongside other sections rather than being entwined together. This frees each area of the piece from dependence on another, more akin to the way music is heard and experienced. It is possible to achieve radial symmetry at various levels of the work following the original approach, and this was explored most notably in the pieces Pentagon III – ‘Roundels’ Ensemble and Octagon III – Solo Conga (Fig. 6). However, as discussed above, this is achieved only through the use of a restrictive set of compositional requirements. What follows is a solution for achieving local areas of symmetry at various levels of scale within the visual design while maintaining more freedom in the compositional process. The approach again draws upon techniques from fractal geometry: radially symmetric groupings are constructed, which are themselves arranged and repeated so as to create a larger symmetric group, and this process is repeated up to the scale of the whole piece. Examples of this approach from A Hidden Order are Hexagon I – Ensemble (Fig. 7), Square – Ensemble (Fig. 8), and Triangle – Ensemble (Fig. 9). To illustrate the process, we shall examine the underlying structure of Hexagon I – Ensemble. Each bar of music is comprised of 12 beats and is represented by a hexagon which itself is radially symmetric. The hexagons are grouped into fours


Fig. 6 Octagon III – Solo Conga, from the A Hidden Order collection. The area highlighted in purple is a symmetric motif, and this is repeated eight times around the blue highlighted area to create a larger symmetric motif in orange. The orange area itself is repeated eight times around the center of the whole piece

Fig. 7 Hexagon I – Ensemble

and arranged together, with the centers of each hexagon lying on the vertices of a rhombus comprised of two equilateral triangles (Fig. 10). This group of four hexagons is then copied and rotated three times to create a section of radial symmetry (Fig. 11).
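The construction just described can be sketched in coordinates: four hexagon centers on the vertices of a rhombus built from two equilateral triangles, then rotated copies of that group about a common point. This is a sketch under stated assumptions, not the authors' program. The function names are hypothetical, and the center of rotation (`pivot`) is not specified in the text, so an arbitrary point is used purely for illustration.

```python
import math


def rhombus_centres(side):
    """Vertices of a rhombus made of two equilateral triangles."""
    h = side * math.sqrt(3) / 2
    return [(0.0, 0.0), (side, 0.0), (side / 2, h), (3 * side / 2, h)]


def rotate(point, pivot, degrees):
    """Rotate a point about a pivot, counter-clockwise."""
    angle = math.radians(degrees)
    x, y = point[0] - pivot[0], point[1] - pivot[1]
    return (pivot[0] + x * math.cos(angle) - y * math.sin(angle),
            pivot[1] + x * math.sin(angle) + y * math.cos(angle))


def radial_copies(points, pivot, copies=3):
    """Return the group of points rotated into equally spaced copies."""
    step = 360.0 / copies
    return [[rotate(p, pivot, k * step) for p in points]
            for k in range(copies)]


side = 1.0
centres = rhombus_centres(side)
pivot = (2.0, -1.0)  # hypothetical centre of rotation, for illustration only
groups = radial_copies(centres, pivot)
for group in groups:
    print([(round(x, 3), round(y, 3)) for x, y in group])
```

All three groups stand for the same four bars, since the repetition applies to the visual representation only; the same two functions could be reused at the next level by treating each group as a single unit.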


Fig. 8 Square – Ensemble

Fig. 9 Triangle – Ensemble – Variation I

These four bar sections themselves are arranged together with other four bar sections on the vertices of a triangle (Fig. 12), which are then copied and rotated to create another level of radial symmetry (Fig. 13). This process may be repeated up until the final arrangement of the piece. In Hexagon I – Ensemble, the final arrangement was left without radial symmetry for aesthetic reasons.


Fig. 10 Each hexagon represents one bar of music. The diagram shows four bars arranged together based on a rhombus

Fig. 11 The four bar section is repeated in the visualization three times to create a radially symmetric area. This whole area represents four bars, as the repetition is applied to the visual representation only

At various levels of the piece, we see local symmetry displayed: first on the level of a bar, then within four-bar sections, then 12-bar sections, and so on. This approach leads to visually satisfying pieces and relaxes the creative constraint of having to strictly consider the relationship of disparate parts of the composition. Now each area contains its own symmetry and is therefore visually more self-contained, though the juxtaposition of these areas still has a visual impact, and there is still


Fig. 12 The four bar section is combined with two other 4-bar sections in a triangular formation, visually representing a 12-bar section

the ability to create pieces with consideration for the larger scales of the musical structure and visual pattern.

Performance Dynamics and Accents

In exploring the relationship between rhythm and pattern, we have built upwards from a “beat” and a “cell,” towards bars, sections, and a whole piece. The beat and cell may also be subdivided in the same manner as has already been described, by working down to smaller generations rather than up to larger ones. This subdivision may be performed indefinitely, leading to shorter and shorter beats and ultimately enabling the analysis of the contours of a sound within a particular beat. The process is best performed by the use of a computer to achieve more detailed and accurate results, given the short lengths of time that are being analyzed. This fine-detailed analysis of sounds within a beat allows for the performance dynamics and even aspects of the timbre of the music to be visually represented in the rendering of the pattern, creating artworks with subtle nuances and variations which lend a natural feel to an otherwise digital process. In Fig. 14, we see the rendering of a section of a percussive piece performed on the conga. The piece was performed by a musician, recorded, and processed through a computer program which implements the method described here. The two colors are created by the two different-pitched congas, one bass and one treble. Within each cell of the grid, there is a gradation of intensity from a rich, saturated dark color at the more acute vertex, fading gradually to the white of the background (Fig. 15). This gradation is created by the attack of the note as the conga is struck, and subsequent decay of the sound as it returns to silence.
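The way an attack and decay might turn into a gradation of color within one cell can be sketched as follows. Everything here is an assumption made for illustration: the text does not say how the bespoke program measures loudness or blends colors, so this sketch uses the peak amplitude in each subdivision of a beat, a linear blend towards white, an arbitrary base color, and a synthetic decaying tone standing in for a recorded conga stroke.

```python
import math


def envelope(samples, subdivisions):
    """Peak absolute amplitude within each equal subdivision of a beat."""
    size = len(samples) // subdivisions
    return [max(abs(s) for s in samples[i * size:(i + 1) * size])
            for i in range(subdivisions)]


def shade(base_rgb, intensity):
    """Blend from white (intensity 0) to the base colour (intensity 1)."""
    return tuple(round(255 - intensity * (255 - c)) for c in base_rgb)


# Synthetic stand-in for a struck drum: a 200 Hz tone with exponential decay.
rate = 8000
beat = [math.exp(-6 * t / rate) * math.sin(2 * math.pi * 200 * t / rate)
        for t in range(rate // 2)]  # half a second, one beat at 120 bpm

levels = envelope(beat, 8)
peak = max(levels)
colours = [shade((120, 30, 30), level / peak) for level in levels]
for level, colour in zip(levels, colours):
    print(round(level, 3), colour)
```

With this input the first subdivision receives the full base color and each later one is paler, which mirrors the gradation from the saturated vertex to the white ground described above.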


Fig. 13 The 12-bar section is repeated visually three times, arranged on an equilateral triangle to create a radially symmetric 12-bar section, comprised of three sections of four bars, where each four bar section is also radially symmetric

Fig. 14 Detail of Octagon – Solo Conga

Across the area of the artwork are subtle fluctuations of the colors, created by the variations within the dynamics of the performance, when the drum is struck slightly harder or softer, or in a slightly different manner that resulted in a change of instrumental tone (Fig. 16).


Fig. 15 A single beat (outlined in orange) from the piece Octagon III – Solo Conga. The attack of the sound starts at the sharpest vertex of the triangle, represented by the darker and more saturated area, and fades towards the opposite side

Fig. 16 The octagram shown in blue is comprised of the same drum being played 16 different times, resulting in slight variations in the sections of the octagram

Timbre and Texture

At an even finer level of detail, the timbre of the sound affects the texture of the pattern. There are many aspects that contribute towards the timbre of a sound. According to Griscom, “timbre is particularly interesting as a musical feature because it is inherently multidimensional, consisting of aspects that are temporal,


Fig. 17 The note E played on marimba (left) and flute (right). The darker center of the marimba shows the sharp attack of the note, with the sound quickly fading to silence as the color fades back to the white of the ground. Inversely, the flute has a slow attack, with a paler center, the curve of the sound reaching its loudest in the area with the most saturated purple. The note then starts to fade away as the colors return towards white. At a finer level of detail, the specific texture within the forms is created through the analyses of the harmonics of the sound and how they vary over the course of the note length. The texture of the flute’s visualization is more consistent and smoother, whereas the marimba is more erratic and has more contrast, resulting in a rougher texture

spectral, and spectro-temporal in nature” (Griscom 2015). The spectral aspects of timbre were discussed earlier and relate to the saturation of the color. Here it is the temporal and spectro-temporal aspects which result in a textural variation within the pattern. Figure 17 shows visualizations of the note E, performed at similar amplitudes on two different instruments. The resolution of the analyses is high, each cell representing 0.06 ms, and the visualization in total representing 1 s. On a general level, we see the variations in attack and decay of the two notes. Also evident here is a variation of the texture itself, created by the difference in the harmonic spectrum of the two instruments. The subtle variations in hue, brightness, and saturation over each piece are due to the interactions of the harmonics of the sounds varying over time.

Creative Implications of a Translation Between Music and Art

In the above sections, we have described a translation between music and geometric art, working from the mathematical foundations of both. The process strives to place minimal restrictions upon musical composition, allowing for a wide spectrum of creative expression. Such translation enables music to have a visual form and for geometric artwork to be heard.


This direct dialogue between music and art allows for a unique creative process whereby, at any stage during the development of a piece, it may be considered in either its aural or visual form. As such, it is possible that a piece may be developed by applying techniques and practices from either the discipline of musical composition or geometric design. A further outcome of this interaction between music and art is the ability to bring in developments of one practice to inform and advance the other.

Creative Approaches Explored in A Hidden Order

Within the development of the works of A Hidden Order, Mara and Westwood explored a range of creative possibilities:

1. The piece was composed purely from aural considerations, using a time signature, and structure related to the chosen grid. The piece was then visualized upon completion of the composition process (Figs. 7 and 8).
2. Starting from a selection of musical motifs, these motifs were then visualized and selected based upon their aesthetic qualities. The selected motifs were subsequently developed into a final musical composition (Fig. 9).
3. Starting from a selection of visual motifs, these were translated into their rhythmical counterparts and developed into a final musical composition (Fig. 3).
4. A visual template was created that governed the rhythmic and overall structure of the piece. This was then developed musically, with the addition of pitches and harmonic accompaniment, and once again visualized. A to-and-fro process between the visual and sonic states took place before arriving at a final composition based upon the composers’ aural and aesthetic considerations (Fig. 2).
5. A complete geometric artwork was designed, based purely upon aesthetic considerations. This was transcribed to its musical counterpart (Fig. 6).

Applying Musical Composition Techniques to Geometric Artwork

The translation between music and geometric art allows for practices and creative decisions of one medium to be directly applied to the other, creating the possibility that developments from one field are able to benefit the other.

Introduction/Contrasting Sections

Within the structure of a musical piece, it is common to have a variety of contrasting sections. The introduction, for example, may be more sparse, where themes are slowly introduced and an atmosphere is set. Developing over time, there may be a chorus that acts in contrast to a verse, and so on. In geometric art, the use of contrasting sections and the development of themes occurs, but is not standard


practice. There is usually one theme within the work, repeated across the image field, and there will usually be only one color scheme, with a consistent dynamic across the artwork (Fig. 18). Exceptions to this exist: for example, within the Zellij tiling work of Morocco (Fig. 5), we see works that are not purely one design, but instead develop over the area of the artwork. However, even in this example, there is only one color scheme and a consistent dynamic across the piece. In Triangle – Ensemble – Variation I, a piece primarily created through musical composition, the central area (Fig. 9) is the introduction to the piece. Here, the areas of white are created by the silences within the music, with the accented notes creating the strong green. Gradually, longer notes are introduced, creating larger areas of color. Following this, more defined rhythms come into play, creating areas with more cohesive patterns. Here we see a more lyrical approach to geometric art, created from the narrative and development that is so natural to musical composition. The sparse introduction to the piece stands in contrast to the two sections depicted on either side of the introduction (Fig. 9). In these two sections, the full ensemble of flute, cor anglais, cello, and marimba is performing together, creating a rich tapestry of colors and a greater depth of pattern, resulting from the complex interactions of the harmonies and dynamics of the musical piece. Again this use of layered motifs and rich overlaying textures is not standard practice within geometric art, but translates very successfully and opens many possibilities that are hitherto less explored. Another example of compositional approaches applied to geometric artwork is the development and manipulation of motifs (Fig. 19). Within the four sections, there is a common theme which may be seen in the overall structure of the section, though the details vary and the level of complexity within each of them grows as the theme is developed.

Fig. 18 A classic Islamic pattern from the Alhambra in Granada, Spain. The use of a repeated, unaltered motif across the image plane is standard practice in Islamic arts. (Image © Ghulam Hyder Daudpota)


Fig. 19 Detail from Triangle – Ensemble, showing a series of motifs that are variations on a theme

Aperiodic Rhythms

Through this translation, we have also been able to bring in developments from geometric art into musical composition. An example of this in A Hidden Order is the application of aperiodic sequences as a musical structure. As mentioned earlier, the Fibonacci series has been implemented within musical composition. This is usually applied to the overall structure, or at specific scales of the piece. Within the method described here, however, the Fibonacci series permeates every level, from beat, to bar, to motif, to section, to the structure of the work as a whole, all informed and guided by its visual counterpart. Pentagon III – ‘Roundels’ Ensemble was created by examining the Penrose tiling and locating the areas which displayed local five-fold symmetry at various sizes of cell groupings (Fig. 20). It was of interest to see if the visual and geometric structure that governs the occurrence and placement of these roundels (and the grid as a whole) would have a harmonic value when translated into a time-based structure. In this compositional game, each instrument in the quartet was assigned a different scale of cell groupings. The lengths chosen were 2, 5, 13, 34, and 89 beats. To create these areas of radial symmetry, the given instrument would need to repeat the same motif at specific parts of the composition, governed by the structure of the grid. The subsequent interplay of these motifs would be an outcome of the nature of the tiling. Though this is a very particular setup, it gives us the opportunity to explore sequences standard in mathematics and geometry but not common in music, to hear how these sequences would sound and feel, and to interact with them in a creative manner musically. There is still much to be explored, but the first inroads seem promising, as the aperiodic sequences, although unfamiliar, do not feel unnatural, and they do allow for creative expression within their structures.

21

A Hidden Order: Revealing the Bonds Between Music and. . .


Fig. 20 The Penrose tiling, with areas of local symmetry at various cell groupings highlighted. Each size of cell grouping was assigned to a different instrument and governed the structure of their respective parts

Conclusion Some Final Thoughts on the Research Beyond outlining the theory and methodology involved in this creative process, it is worth considering some of the practical, aesthetic, and philosophical implications that were brought to light throughout the course of the research.

Pain and Gain Through Restrictions While there is undoubtedly an enormous amount of creative freedom to be found within the system outlined throughout this chapter, these acts of composition also come with a number of practical considerations. These include: the necessity of a fixed tempo (bpm); a limitation on the kinds of time signature that might be used within one composition (an area in which progress has been made subsequent to the A Hidden Order project); and the musically unfamiliar structural demands of aperiodic tiling. Such considerations might be viewed as restrictive, but in practice they can also prove to be the most liberating or innovative side of the process. In the author’s experience, working within the very specific requirements of a brief can facilitate one’s creativity, freeing the composer from paralysis in the face of the dauntingly infinite possibilities of a blank page. Using rules adopted from other worlds – scientific, mathematical, philosophical – can also help to shape the work


L. Westwood and S. Mara

in unexpected ways, giving a fresh slant to old methods, and guiding us away from habit. Most fundamentally though, in this case, the conceptually rigorous foundations for Mara’s translation between sound and pattern circumvented any efforts in musical visualization which, having a pleasing surface quality to them, would nonetheless be founded on arbitrary relationships (consider your average computer screensaver, for example). While it is common practice across many cultures to write music that follows the 4/4, 12/8, 3/4, or 6/8 time signatures characteristic of square and triangular symmetry, and even to structure such music in ways sympathetic with the unfolding of higher generations within these grids (doubling, tripling, etc.), forms based on pentagonal or octagonal symmetry have quite plausibly not been explored in the same strict manner undertaken in the A Hidden Order suite. The music, guided by the geometry and related numerical sequences, must follow a very specific sequence of mixed time signature changes, with stresses and relationships between musically unusual beats of the bar. Enlarging the musical structure from one generation of the grid to the next also requires the composition of a highly specific and exponentially growing amount of material, entirely unfamiliar to the doubling or ternary sequences amenable to, say, Classical music (consider the implications of the Fibonacci series: 1, 1, 2, 3, 5, 8, 13, 21, ...). Reflecting on the experience of working within these various symmetries, it is interesting to consider how each one promoted a slightly different approach to the writing practice, resulting in the creation of a very diverse and colorful suite of compositions.
From a musical perspective, while composing within square and triangular grids felt as natural as putting on one’s own shoes, writing in the unfamiliar environments of pentagonal and octagonal symmetry was most fruitful when we approached them with geometric sketches as a starting point, working music into these designs (or rather, working the music out of them) as we went. A particularly exciting systemic method of composition was adopted for the work Pentagon III – ‘Roundels’ Ensemble, as described above in Section Aperiodic Rhythms. Here, following Fibonacci proportions faithfully through what became a gradual build-up, break-down, and final concentration of melodic material transposed into a very natural musical structure. At other times, desirable changes in the geometry led to the need for frustrating edits to musical details (and vice versa) which, in their original medium, had worked very well, as was the case with Octagon I – Flute & Marimba. Having carefully crafted a melodic line, requests such as “could you change the second and seventh beat of every fourth bar so that it’s blue” (to paraphrase a certain geometer) can be painful! This, of course, goes with the territory of any collaborative work, and learning how to navigate the parallel-composition process resulted in what are perhaps A Hidden Order’s most exciting and successful works. These differences should not suggest that the two worlds are unconnected or that this research forces a nonexistent relationship between music and geometry. Rather, one might speculate that they are symptomatic of modes of expression which hold a common root – a shared mathematical genome, if you will – but


which have diverged over many centuries of cultural evolution, adopting their own idiosyncrasies along the way. Whatever the case, there were times when it came as a surprise to find that a very natural musical gesture would look awkward, or where a pleasing geometric formation exhibited less musical interest than anticipated. Consequently, when working in one medium exclusively until a satisfactory outcome was reached, it could not necessarily be expected that the results would translate with equal success. This was more typical of grids obeying the golden and silver ratio than those based on triangular, hexagonal, or square symmetry, although it might simply be put down to a lack of experience working with these kinds of structure in a musical context. Broadly speaking, however, when the compositional process was founded on continual feedback from one side to the other, the results yielded were hugely rewarding. Hexagon II – Cello & Percussion, for example, made recourse to some of our earliest research into simple rhythms and rudimentary, repeated geometric patterns, providing a very reliable foundation on which to build the work. Furthermore, its visual construction benefitted from the linear layout discussed above in Part I Section Linear Layout, freeing the music from the necessity of strict interactions over large time-spans.

A Multidimensional Artistic Object As mentioned in Part I Section Performance Dynamics and Accents, the use of Fourier analysis, and the consequent application of the rhythm-to-pattern translation at higher resolutions, allowed the rendering of a fine level of detail from the recorded audio, one far exceeding the information available in the original score. This meant subtle nuances in performance, dynamics, timbre, and timing became visible in the final image, making the geometry a blueprint not just for the music, but for the recording of that particular performance. This brings to light a number of interesting philosophical questions surrounding the nature and function of the score: if this printed language, originally a series of symbolic instructions for recreating sounds, could now be used as a map of another image, in turn, could this geometric image be considered a score for the music, and vice versa? At a certain point in the creative process, it became apparent that what was being created was not specifically an image or a musical work, but rather a multidimensional artistic object that existed independently of either of these mediums, one which could, however, be “viewed” through a number of different goggles: you might put on your “music goggles” and experience the object as sound; or you might wear your “geometry goggles” and experience it from a different dimension, one of pattern... In either of these cases, the object itself was essentially something else, something intangible yet all-pervading, universal. One very exciting outcome of this mode of creation is that each pair of goggles offers a unique vantage point, from which the viewer can best experience differing qualities of the object. Using our “music goggles,” we are better positioned to distinguish the various layers of a dense “polyphonic” texture. 
Our ears may effortlessly isolate the activities of different overlaid voices using timbral information, a clarity we struggled to replicate in our 2-dimensional grids. For example, it is far harder


for us to draw out the cello line with our eyes from Triangle – Ensemble than it is with our ears. Similarly, through our “geometry goggles,” where sound in time is mapped onto 2-dimensional space, we are able to visualize the entire work on one canvas, allowing us to gather an impression of the entire structure at once, in contrast to the necessarily moment-to-moment experiential nature of music. Here, in this visual realm, free from the restrictions of memory, disparate points in time may neighbor one another and form more explicit relationships which otherwise may go overlooked or forgotten.

Time as Space Dealing with temporality in 2-dimensional space led to the consideration of a number of alternatives in the way that time unfolded geometrically. While the musical works can be visualized as a static image – one containing all the musical information at a glance – it is also possible to render them in real-time, allowing the creation of animations for each composition, and the performance of the suite live in a multimedia concert environment. Mara’s original method of visualization involved working from the center outwards, eventually culminating in an entire image which contained the full duration of the musical work, much like a printed score. This provided the ideal means of displaying the architecture of the piece as a whole, but resulted in occasional difficulties where musical sections otherwise conceived of as unrelated would neighbor one another visually and thus have to be molded to suit (see Part I Section Indefinite Growth). Striving for something that emulated the momentary experience of music, Mara then developed a process whereby each visualized bar of music was tessellated across the visual field which, on completion, began to fade and write over itself. This offered an excellent means of exploring musical works with more unusual structures, as well as works that were through-composed, or improvisatory in nature. In summary, the two methods employed allowed the work to proceed on an equal footing with each medium, or alternatively with a bias towards either musical or geometric freedom. A number of other philosophical considerations arose from these issues in translation. For one, the geometric work was founded on the principle of rotational symmetry. As a consequence, each moment in time appears at numerous points in space simultaneously. 
A second perceptual dilemma (discussed in Part I Section A Hidden Order: Revealing the Bonds Between Music and Geometric Art – Part I) stems from the fact that cells of different sizes within the aperiodic tiling of pentagonal and octagonal grids (type-A and type-B shape cells) correspond to the same beat length. In keeping with the foundation of the theory, which relates length of time directly to area of space, it may prove fascinating to experiment with music that embraces these two distinct beat lengths (in the case of the Penrose tiling, the resulting two beat lengths are in the ratio 1:ϕ). A future consideration might be to explore such relationships through the use of electronic music, bringing to mind the


“impossible” rhythms found in the works of Conlon Nancarrow, who used piano rolls to achieve the otherwise unplayable (Steinitz 2013).
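The ratio 1:ϕ quoted above for the Penrose tiling can be checked numerically. The sketch below is an illustration only: it assumes the rhombic form of the Penrose tiling with unit edge length (the text does not say which form was used), in which the thin rhomb has an acute angle of 36° and the thick rhomb an acute angle of 72°. The variable names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

# The area of a rhombus with unit sides and interior angle theta is sin(theta).
thin_area = math.sin(math.radians(36))
thick_area = math.sin(math.radians(72))

area_ratio = thick_area / thin_area
print(area_ratio, PHI)

# If duration is taken to be proportional to area, a beat mapped to a thick
# cell would last area_ratio times as long as a beat mapped to a thin cell.
thin_beat = 1.0
thick_beat = thin_beat * area_ratio
```

Under these assumptions the two areas differ by exactly the golden ratio, since sin 72° / sin 36° = 2 cos 36° = ϕ.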

Looking Ahead At the time of creating A Hidden Order, it was not yet clear how a grid might be constructed which would be capable of dealing with a sequence of time signatures or tempi that relate to changing symmetries (see Part I Section Change of Time Signature). Now that the issue of time signatures has been overcome through linear layouts, vast avenues for future research have opened up. Reflecting on such restrictions, one might also ask how much the still-developing software that aids the visualization has been allowed to guide the creative process. While this is something that should continue to be reviewed, it is without doubt that the implementation of these technologies in A Hidden Order has taken the resulting art to levels that had not originally been foreseen, not only making very time-consuming processes a reality, but also revealing some of the most exciting aspects of the work, ones which would have remained hidden had the authors proceeded by hand. Perhaps the most poignant dilemma when working between the arts and sciences concerns how we define the quality of “beauty” and how one judges what is aesthetically “true.” For the most part – and as the authors of this paper, one can only speak from the point of view of one’s own ears and eyes – the works of A Hidden Order display a staggering fluency between the musical and geometric worlds that hints at a universal harmony transcending artistic mediums, a harmony deeply rooted in the world of mathematics. This sense is strengthened by a perceived success in the compositions derived from musically nontraditional structural proportions, such as Pentagon III – ‘Roundels’ Ensemble and Octagon I – Flute & Marimba. However, this dependence upon the senses becomes especially contentious when evaluating those endeavors throughout the research that were deemed by us to be less successful.
When a composition follows a certain musical logic, but to our eye does not immediately look right (or vice versa), can this be considered evidence of the work not exhibiting a “true” or “common” harmony across both mediums, or are we simply responding to what we are culturally conditioned to see as “right”? Is it conceivable that we are viewing a geometry we are not yet accustomed to? It is important, as practitioners, to keep an open mind and to continually ask the question of what guides our aesthetic judgments – whether we should look back to tradition, rely on instinct, or embrace new musical and geometric harmonies which push the boundaries of those established within our culture.

Cross-References
• A Hidden Order: Revealing the Bonds Between Music and Geometric Art – Part One


References
Bibby N (2003) Tuning and temperament: closing the spiral. In: John F, Flodd R, Wilson R (eds) Music and mathematics from pythagoras to fractals, 1st edn. Oxford University Press, New York, pp 13–14
Boulez P (1971) Notes of an apprenticeship. Faber & Faber, London
Caivano JL (1994) Color and sound: physical and psychophysical relations. Color Res Appl 19(2):126–133
Galeev BM, Valechkina IL (2001) Was Scriabin a Synesthete? Leonardo 34(4):357–361
Griscom W (2015) Visualizing sound: cross-modal mapping between music and color. PhD, University of California, Berkeley, P1
Grünbaum B, Shephard GC (1986) Tilings and patterns. In: Klee V (ed). W.H. Freeman and Company, New York, pp 540–570
Howat R (1983) Debussy in proportion. Cambridge University Press, Cambridge
Jacobson R, Ray SF, Attridge GG, Axford NR (2000) The manual of photography, digital and photographic imaging, 9th edn. Focal Press, Oxford
Kandinsky W (1914) Concerning the spiritual in art. Dover Publications, New York, p 25. (1977)
Krumhansl C (1989) Why is musical timbre so hard to understand? In: Olsson O, Nielzén S (eds) Structure and perception of electroacoustic sound and music, 1. Excerpta Medica, Amsterdam, pp 43–53
Lendvai E (2000) Bela bartok: an analysis of his music. Kahn & Averill, London
Livio M (2002) The golden ratio: the story of phi, the extraordinary number of nature, art and beauty. Review, London, p 193
Marks LE (1974) On associations of light and sound: the mediation of brightness, pitch, and loudness. Am J Psychol 87:173–188
Marks LE (1989) On cross-modal similarity: the perceptual structure of pitch, loudness, and brightness. J Exp Psychol Hum Percept Perform 15(3):598
Messiaen O (2002) Traite De Rythme, De Couleur, Et D’Ornithologie, tome VII. Alphonse Leduc, Paris, pp 95–191
Padovan R (1999) Proportion. Routledge, New York, pp 221–227, 2008
Peitgen HO, Jürgens H, Saupe D (2004) Chaos and fractals: new frontiers of science, 2nd edn. Springer Science+Business Media Inc, New York, p 92
Senechal M (1995) Quasicrystals and geometry. Cambridge University Press, Cambridge, p 195
Steinitz R (2013) György ligeti: music of the imagination. Faber & Faber, London, pp 267–269
Taylor C (2003) The science of musical sound. In: John F, Flodd R, Wilson R (eds) Music and mathematics from pythagoras to fractals, 1st edn. Oxford University Press, New York, p 51
Tymoczko D (2011) A geometry of music: harmony and counterpoint in the extended common practice. Oxford University Press, New York
Wittkower R (1953) Brunelleschi and “proportion and perspective”. J Warburg Courtauld Inst XVI. Idea and Image, Thames & Hudson, London, p 131, 1978

Korean Traditional Patterns: Frieze and Wallpaper

22

Hyunyong Shin, Shilla Sheen, Hyeyoun Kwon, and Taeseon Mun

Contents
Introduction
Frieze Patterns
Wallpaper Patterns
Some Designs
Conclusion
References

Abstract Examples of every type of frieze and wallpaper pattern found in Korean tradition are presented, together with their locations. By analyzing their symmetries, the patterns can be recognized audibly as well as visually. The seven frieze patterns are played on two traditional Korean musical instruments, and a piano piece is also introduced. The music can be accessed through QR codes. A cover for an umbrella in the shape of a regular 17-gon is proposed, using all 24 patterns.

H. Shin, Korea National University of Education, Cheongju-si, South Korea
S. Sheen, Graduate School, Korea National University of Education, Cheongju-si, South Korea
H. Kwon, Gyeongsangnam-do Office of Education, Changwon, South Korea
T. Mun, Seoul Metropolitan Office of Education, Seoul, South Korea
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_17


Keywords Frieze · Wallpaper · Symmetry · IUC notation · Orbifold notation

Introduction From a geometric point of view, there are 7 types of frieze patterns and 17 types of wallpaper patterns (Conway and Huson 2002). In this article, a traditional Korean frieze or wallpaper pattern is presented for each type, and the locations of the patterns are marked on the map (Fig. 1) by Roman numerals. For example, the frieze pattern of type p1 (∞∞) can be found in Seoul, which is coded as I.

Fig. 1 Locations (map labels: Russia, China, North Korea, South Korea, Japan; sites coded I–XI)


In the patterns presented, color has been ignored to keep the classification simple (Grünbaum 2006). Most patterns are from South Korea (Republic of Korea), and the others are from North Korea (Democratic People’s Republic of Korea). Both the IUC notation (Macbeath 1966; Blanco and de Camargo 2011) and the orbifold notation (Conway and Huson 2002) are used for the types of pattern. The symbols for the symmetries can be found in the literature (Blanco and de Camargo (2011), for example), but they are largely self-explanatory.

Frieze Patterns The symmetries of each frieze pattern are quite obvious. The fundamental regions showing symmetries are omitted in this article.

p1, ∞∞ Bo (Dish for ritual offering), National Palace Museum of Korea, Seoul, South Korea (I)

p11m, ∞* Girder, Donghak Temple, Choongchungnamdo, South Korea (IV)

p1m1, *∞∞ National Treasure of Korea No. 68, Gansong Museum, Seoul, South Korea (I)

p2, 22∞ Brass Censer, Sukwang Temple, Kangwondo, North Korea (IX)


p2mm, *22∞ Girder, Ryongheung Temple, Hamkyungnamdo, North Korea (X)

p11g, ∞× Dodecagon Porcelain Dish (Editorial Board of Chosun Relics 2002), North Korea (XI)

p2mg, 2*∞ Girder, Ryongheung Temple, Hamkyungnamdo, North Korea (X)
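The seven types listed above can be told apart by asking which symmetries, besides translation, a frieze admits. The following Python sketch is one standard decision procedure, offered as an illustration and not as the authors' method; the function name and its four boolean parameters are assumptions made for this example. The dictionary records the IUC and orbifold names exactly as they are paired in this article.

```python
# IUC name -> orbifold signature, as paired in this article.
FRIEZE_TYPES = {
    "p1": "∞∞",
    "p11m": "∞*",
    "p1m1": "*∞∞",
    "p2": "22∞",
    "p2mm": "*22∞",
    "p11g": "∞×",
    "p2mg": "2*∞",
}


def classify_frieze(vertical_mirror, horizontal_mirror, half_turn, glide):
    """Return the IUC name of a frieze pattern from its symmetries.

    vertical_mirror:   reflection in a line perpendicular to the strip
    horizontal_mirror: reflection in the central axis of the strip
    half_turn:         rotation by 180 degrees
    glide:             glide reflection along the central axis
    """
    if vertical_mirror:
        if horizontal_mirror:
            return "p2mm"
        return "p2mg" if half_turn else "p1m1"
    if horizontal_mirror:
        return "p11m"
    if glide:
        return "p11g"
    return "p2" if half_turn else "p1"


# A pattern with only translations, such as the first example above:
name = classify_frieze(False, False, False, False)
print(name, FRIEZE_TYPES[name])
```

The order of the questions matters: a horizontal mirror combined with a translation always yields a glide, so the mirror is tested before the glide.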

Wallpaper Patterns The fundamental region of each of the 17 types of wallpaper pattern is tabulated in Table 1. An example of a traditional Korean wallpaper pattern is now presented for each type, with the fundamental region showing its symmetries. Some patterns can be found in Lee (1995), LEEUM (2014), Lim (2004), Shin and Sheen (2014a,b) or Shin et al. (2014).

Table 1 Types of Wallpaper

p1, o National Treasure of Korea No. 60, National Museum of Korea, Seoul, South Korea (I)

pm, ** Flower Door, Soodug Temple, Choongchungnamdo, South Korea (II)

pg, ×× Cotton Jacket, Sook Myung Womens University Museum, Seoul, South Korea (I)


cm, *× Porcelain Dish with Wave Pattern (Editorial Board of Chosun Relics 2002), North Korea (XI)

p2, 2222 Bamboo Basket, Korea Bamboo Museum, Chollanamdo, South Korea (VI)

In all the patterns in this article, interlacing is taken into account. If it is not taken into account, a pattern may be of a different type. On this issue, see Grünbaum (2006).


pmm, *2222 Flower Door, Dae Seung Temple, Kyungsangbookdo, South Korea (VII)

pmg, 22* National Treasure of Korea No. 198, Gyeongju National Museum, Kyungsangbookdo, South Korea (VII)


pgg, 22× Bamboo Winnow, Korea Bamboo Museum, Chollanamdo, South Korea (VI)

cmm, 2*22 Door, Magok Temple, Choongchungnamdo, South Korea (III)


p3, 333 Korean Kite with Samtaegeuk Pattern, Private Possession

As mentioned before, color is ignored in all patterns.

p3m1, *333 Flower Door, Soodug Temple, Choongchungnamdo, South Korea (II)

p31m, 3*3 Girder, Naeso Temple, Chollabookdo, South Korea (V)


p4, 442 Dining Room, Changduk Palace, Seoul, South Korea (I)

p4m, *442 National Treasure of Korea No. 95, National Museum of Korea, Seoul, South Korea (I)


p4g, 4*2 Door, Changduk Palace, Seoul, South Korea (I)

p6, 632 Bamboo Wife, Korea Bamboo Museum, Chollanamdo, South Korea (VI)

p6m, *632 Flower Door, Boolgook Temple, Kyungsangbookdo, South Korea (VII)
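The orbifold signatures listed above can be checked against one another. The sketch below applies the cost rule known as Conway's "magic theorem", which this article does not state and which is used here only as a consistency check: each symbol in a signature has a cost, and every wallpaper signature costs exactly 2. The function name `signature_cost` is an assumption made for this example.

```python
from fractions import Fraction

# IUC name -> orbifold signature, as paired in this article.
WALLPAPER_TYPES = {
    "p1": "o", "pm": "**", "pg": "××", "cm": "*×", "p2": "2222",
    "pmm": "*2222", "pmg": "22*", "pgg": "22×", "cmm": "2*22",
    "p3": "333", "p3m1": "*333", "p31m": "3*3", "p4": "442",
    "p4m": "*442", "p4g": "4*2", "p6": "632", "p6m": "*632",
}


def signature_cost(signature):
    """Total cost of an orbifold signature under the magic-theorem rule."""
    cost = Fraction(0)
    after_star = False  # digits after a star are mirror (kaleidoscopic) points
    for symbol in signature:
        if symbol == "o":
            cost += 2
        elif symbol == "*":
            cost += 1
            after_star = True
        elif symbol in ("×", "x"):
            cost += 1
        elif symbol == "∞":
            cost += Fraction(1, 2) if after_star else 1
        else:
            n = int(symbol)
            cost += Fraction(n - 1, 2 * n) if after_star else Fraction(n - 1, n)
    return cost


for iuc, orbifold in WALLPAPER_TYPES.items():
    print(iuc, orbifold, signature_cost(orbifold))
```

The same rule, with ∞ treated as the limiting case of a digit, also gives a cost of 2 for each of the seven frieze signatures used earlier in this article.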


Some Designs • Original colors The following design consists of seven concentric circles carrying the seven frieze patterns in their original colors, as shown in this article. Through the QR code below, a piano piece that presents the friezes can be heard.

• Obangsaek (Korean traditional colors) The following design consists of seven concentric circles carrying the seven frieze patterns in Obangsaek, the Korean traditional colors. Through the QR code below, a piece of Korean music that presents the friezes can be heard. The music is played by Professor Chung Tae Seok of Seoul National University and Ms. Chung Yun Soo, a teacher of Korean National School of Korean Music.


• Umbrella The prime number 7 is special because it is the first prime p for which the regular p-gon is not constructible with straightedge and compass. The prime number 17 is also remarkable because it is the first “nontrivial” prime p for which the regular p-gon is constructible. It is also worth remembering that 7 and 17 are the first two full reptend primes. We therefore construct a regular 17-gon, place the 7 frieze patterns of this article on 7 concentric circles, and place the 17 wallpaper patterns of this article at the sides of the polygon. The resulting sheet may be used as the cover of an umbrella in the shape of a regular 17-gon.
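The number-theoretic remarks about 7 and 17 can be verified with a short computation. The Python sketch below is illustrative only, and its function names are assumptions made for this example. It relies on two standard facts not derived in this article: a prime p is full reptend in base 10 when the multiplicative order of 10 modulo p equals p − 1, and, by the Gauss-Wantzel theorem, a regular p-gon with p an odd prime is constructible exactly when p − 1 is a power of two.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_full_reptend(p, base=10):
    """True if 1/p has a repeating block of length p - 1 in the given base."""
    if not is_prime(p) or base % p == 0:
        return False
    order, power = 1, base % p
    while power != 1:
        power = (power * base) % p
        order += 1
    return order == p - 1


def is_constructible_prime_gon(p):
    """True if p is an odd prime and p - 1 is a power of two (a Fermat prime)."""
    return is_prime(p) and p > 2 and (p - 1) & (p - 2) == 0


primes = [p for p in range(2, 100) if is_prime(p)]
full_reptend = [p for p in primes if is_full_reptend(p)]
constructible = [p for p in primes if is_constructible_prime_gon(p)]
print(full_reptend[:2], constructible)
```

Among the primes below 100, the constructible ones are 3, 5, and 17, which is consistent with reading 3 and 5 as the "trivial" cases mentioned above.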

Conclusion Group theory classifies the types of frieze and wallpaper patterns. This article presents examples of all of these types from the traditions of Korea. Through mathematics, visual patterns can be presented as music. The seven frieze patterns can be “heard” on Korean musical instruments as well as on the piano.

References
Blanco MFB, de Camargo HALN (2011) Symmetry groups in the Alhambra. In: Visual mathematics, vol 13. Mathematical Institute SASA, Beograd
Conway JH, Huson DH (2002) The orbifold notation for two-dimensional groups. Struct Chem 13(3/4):247–257
Editorial Board of Chosun Relics (2002) Cultural assets of North Korea I, II, III, IV. Seoul National University, Seoul
Grünbaum B (2006) What symmetry groups are present in the Alhambra? Notices of the AMS 53:670–673
Lee Y (1995) Relics of palaces (2). Daewonsa, Seoul
LEEUM (2014) LEEUM collections of Korean old arts. LEEUM, Seoul
Lim Y (2004) Traditional patterns of Korea. Daewonsa, Seoul
Macbeath AM (1966) The classification of non-Euclidean plane crystallographic groups. Can J Math 19:1192–1205
Shin H, Sheen S (2014a) A mathematical approach to pattern: classification of Korean traditional frieze patterns according to group theory. Arch Des Res 27(3):295–311
Shin H, Sheen S (2014b) Korean traditional frieze/wallpaper patterns according to mathematical classification. Presented at ICM2014, Seoul
Shin H, Sheen S, Mun T, Kwon H, Lee Y (2014) A study on development of teaching/learning materials based on wallpaper patterns. J Korean Soc Math Educ Ser A: Math Educ 53(3):433–445

Projections of Knots and Links

23

Alexander Åström and Christoffer Åström

Contents
Introduction
Terminology
Mathematical Concepts
Knotwork Concepts
Rectangular Diagonal Knotwork
Circular Knotworks
Archaeological and Historical Aspects
Contemporary and Traditional Art
Knotwork Analysis
The Number of Components
The Number of Crossings
Braiding Pattern
Symmetry
Coloring
Construction of Knotworks
Discussion
References

Abstract This chapter presents an introduction to the archaeological and historical aspects of the use of projections of knots and links for decorative purposes. These projections of knots and links have been created by man on various artifacts for at least some 4500 years. Apparently, the most prominent types of artifacts in most of the studied civilizations and cultures have been decorated with knots and links. Some examples are the Greek and Roman mosaics of the Roman world and various artifacts and art forms in the British Isles in the Anglo-Saxon era. The working hypothesis is that these knots and links are potential projections of real-life three-dimensional objects, or at least that the intention of their creators was to evoke a feeling or sense in the beholder that the images were depictions of actual physical knots or links. The topological study of knots and links considers objects in E³. However, these projections can in one way be seen as two-dimensional designs and could thus be studied, at least to some extent, with a geometrical approach in E² instead. From the perspective of decorative art, this may in fact be sufficient and perhaps even more convenient.

A. Åström, Gothenburg, Sweden
C. Åström, Ucklum, Sweden
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_16

Keywords Archaeology · History · Geometry · Decorative art · Interlaced patterns · Knotworks · Knots · Links

Introduction The practical use of knots dates back thousands of years, alongside the evolution of mankind. Although rope is not strictly needed in order to tie knots, the two are tightly linked. Some of the earliest archaeological evidence of knots has been found in the context of rope or cordage. For instance, findings in the Dzudzuana Cave in Georgia, which include strings with numerous knots, are believed to be around 26,000–32,000 years old (Kvavadze et al., 2009). Knots made out of rope or cordage obviously suffer from the same difficulty as cordage itself in being perishable objects. Thus, when seeking even older uses of knots, indirect evidence can be sought instead. For example, seafaring implies the use of rope and knots. Undoubtedly, practical knots must have preceded the use of decorative knots, but somewhere along the line humans began to express themselves through art, and knots came to be used as decoration. Some of the earliest findings of art are figurative motifs, such as Venus figurines, with the currently oldest specimen being the Venus of Hohle Fels, believed to be some 35,000 years old (Conrad, 2009). According to Pike et al. (2012), cave paintings are about 40,000 years old. Non-figurative motifs such as geometrical motifs or patterns can also be found on cave walls and portable artifacts, for instance, the 17,000-year-old ivory figurines from Mezin in Ukraine (Soffer, 1997). Some geometrical motifs can be, and often are, interpreted as being projections of actual physical knots, as they often show distinct over- and undercrossings. An example of a knot-like motif can be seen on the carved plaque in Fig. 1, comprising two intertwined serpents forming a link (two separate components). See, for instance, Carter et al. (1992) for further information regarding this artifact. A real link, or knot, like this does not have much value from a practical point of

23 Projections of Knots and Links

667

Fig. 1 Carved plaque with figures and an interlaced geometrical motif in the upper middle portion. The geometrical motif is composed of two intertwined serpents forming a link. Originating in Mesopotamia (present Iraq) in 2600–2500 BC. (Photo credits: photo ©Musée du Louvre, Dist. RMN-Grand Palais/Les frères Chuzeville). Museum item Sb 2724: (a) full-size image of the artifact and (b) close-up on the link

view in real life. Thus, some transaction from pure practical knots into decorative knots must have been made at some time. The reasons are not known to us but might be religious or perhaps having a symbolic value. It is however intriguing why these ancient projections of knots and links are often made with no lose ends but rather being closed loops. This was the case at least until the seventh century when the Celtic and Pictish knotworks started to emerge and became rather elaborate and showing more examples of knotworks with lose ends compared to before. Later on the same trend appeared in Norse art as well. Motifs showing intertwined serpents, like the one in Fig. 1, are common on artifacts originating in Mesopotamia in the third and fourth millennia BC. These motifs often have distinct over- and undercrossing, but these early examples do not have closed loops like the one in Fig. 1. An example is the Morgan Seal 1 in the Pierpont Morgan Library originating late fourth to early third millennia; see No. 137 in Ward (1909). There are also examples of motifs with intertwined serpents in nearby cultures and civilizations, such as the chalcolithic period of Egypt (4000–3200 BC); see Köhler (2011) for a chronology and de Morgan (1924) for an example. There are also examples in the Susa II period (3500–3100 BC) of Iran Carter et al. (1992), and according to Herzfeld (1941), there are even examples of intertwined serpents in the Susa I period (4000–3500 BC) of Iran. Some examples stretch even as far back as 590– 5700 BC, for instance, a flint dagger depicting an intertwined serpent from Çatal Hüyük (Mellaart, 1964). However, more complex motifs with intertwined serpents forming knots and links can be found in Mesopotamia from around 2600–2500 BC; see, for instance, Przytycki (2009) and Jablan et al. (2012) showing an image of a cylinder seal from Ur with a rectangular-shaped knot structure but with open ends. 
In Herzfeld (1941), some examples of intertwined serpents are shown, which form rectangular designs with open ends. These examples are ascribed to the Mesilim period, which according to Porada et al. (1992) is placed at the end of Early Dynastic I. These examples may then predate the one from Ur.

In this chapter, an introduction is given to mathematical knots and to archaeological and historical aspects of knots used for decorative purposes. Also included is an overview of the mathematics behind these projections of knots, which can be found on various materials and span at least some 4,500 years.

Terminology

The terminology related to knots and links often differs across the literature, depending on the field of research, context, material, techniques, etc. The main terminology used in this chapter is presented here, starting with knots. Note the distinction between knots used for practical applications and mathematical knots. The former are defined as the result of connecting one or more bendable objects with the assistance of loops and bends, so that these form a concatenated unity. That is, nothing is stated regarding the material used for tying the knot or the number of components forming the knot. In knot theory, a knot is a closed-loop curve, and a link is a collection of one or more intertwined knots. The components of a link are sometimes referred to as strands, which are not to be confused with the strands of a rope. A formal definition of mathematical knots and links is given in section “Knot Theory” below.

Some of the key terminology of this chapter comes from the work by Anderson (1881) and Allen and Anderson (1903), derived in the late nineteenth century. This work is based on the Celtic and Pictish carved stones from the sixth to the ninth century located in Scotland and the Scottish Isles. The definitions are, however, applicable in a wider scope. The sculptured designs of their classification are presented under four main headings, where each subject is further divided into subclasses. However, it is sufficient to look only at those classes of interest to this chapter. The limited classification scheme of Allen and Anderson (1903):

(1) Symbols
(2) Ornaments
    (2.1) Geometrical
        (2.1.a) Interlaced-work
        (2.1.b) Key-patterns
        (2.1.c) Step-patterns
        (2.1.d) Spirals
    (2.2) Zoömorphic
    (2.3) Foliageous
(3) Figure Subjects
(4) Inscriptions


The branch of interest in this chapter is the one labeled (2.1.a), namely, Ornaments, Geometrical, Interlaced-work. Allen and Anderson describe interlaced-work as patterns consisting of narrow bands or cords forming a path of straight or curved lines with over- and undercrossings at regular intervals. In some literature, the word knotted is used synonymously with interlaced. Allen (1883) also divided interlaced-work into plait-works and knot-works. Plait-works are described as patterns formed by twisted or plaited bands, together with patterns subsequently derived from them by a process he refers to as stopping off. The definition itself does not state whether the ends should be joined. The examples given suggest, however, that the ends are not joined. That is, the plait-works are somewhat comparable to braids. Knot-works, the pattern of interest in this chapter, are described as follows:

The latter includes all patterns made by the repetition of an elementary knot at regular intervals, the ends being joined so as to form one or more continuous bands. (Allen, 1883, p. 226).

Throughout this chapter, the spelling knotwork is used instead of Allen and Anderson’s knot-work. Looking at the knotwork definition, it should be noted that elementary knot clearly does not mean a mathematical knot, that is, a closed curve with joined ends. Rather, the pattern is created by repeating a motif composed of a looped structure with over- and undercrossings (given by the fact that knotwork is a subclass of interlaced-work). The pattern is complete when the motif has been repeated in such a way that there are no loose ends. Thus, ends being joined refers to the entire knotwork. This also makes it clear that the knotwork concept includes both knots and links. In literature concerning various contexts such as Roman mosaics and Near Eastern cylinder seals, the term guilloche is used more or less synonymously with knotwork. But some (e.g., Lasić 1995) have a much wider definition of guilloche, which makes it clear that it cannot be used synonymously with knots and links. Finally, in the maritime context, fancywork is the corresponding term for a knotwork, or any kind of ropework, made out of rope for decorative purposes.

Mathematical Concepts

The study of mathematics goes as far back as ancient Egypt and Babylonia, around 3000 BC; see, for instance, Kline (1972). Evidence of ancient Egyptian mathematics, including geometry, can be found in papyri from the second millennium BC, such as the Moscow Mathematical Papyrus and the Rhind Mathematical Papyrus; see, for instance, Clagett (1999). Evidence of Babylonian knowledge of geometry from the second millennium BC can be seen in the Plimpton 322 tablet; see, for instance, Neugebauer and Sachs (1945) and Neugebauer (1969). This section aims at giving a brief description of the mathematical concepts and terms used throughout this chapter, mainly from geometry and knot theory.


Geometry

Around 300 BC, Euclid compiled the Elements, perhaps one of the most important books in the history of science and mathematics. The Elements consists of a logical system of axioms and propositions concerning geometry. For an English translation, see, for instance, Heath (1956a,b,c). In this chapter, there are two Euclidean geometries of interest: the two-dimensional plane $E^2$ and the three-dimensional space $E^3$; see, for instance, Coxeter (1989) or Thurston and Levy (1997). In $E^2$, for instance, points can be defined, and line segments or arcs can be drawn between them, constructing different geometrical motifs or figures. Grünbaum and Shephard (1989) describe a motif as follows:

Definition 1. A motif $M$ is a nonempty set in $E^2$.

Distance-preserving operations, called isometries or rigid motions, transform these motifs or figures without altering their shape. A symmetry is an isometry that maps a figure onto itself. In $E^2$, the following symmetries are possible: translation, glide reflection, rotation, and mirror reflection. For further information regarding symmetry, see Weyl (1952), Jablan (2002), Conway et al. (2008), and Rosen (2008), among others. Symmetry implies that a pattern in the plane can be created by repeating a motif, for instance, by translation. Grünbaum and Shephard (1989) define what they call a mono-motif pattern $P$ as follows:

Definition 2. A mono-motif pattern $P$ with motif $M$ is a nonempty family $\{M_i \mid i \in I\}$ of sets in $E^2$, where $I$ is an index set, if the following conditions are fulfilled:
(i) $M_i \cap M_j = \emptyset$ for each pair $M_i, M_j$ with $i, j \in I$ and $i \neq j$,
(ii) $M_i \cong M$ for all $i \in I$ ($M_i$ is called a copy of $M$), and
(iii) there is an isometry in $E^2$ that maps $P$ onto itself and $M_i$ onto $M_j$, for each pair $M_i, M_j$ with $i, j \in I$.

To describe a nontrivial pattern, they give the following definitions (Grünbaum and Shephard, 1989):

Definition 3. A pattern $P$ with motif $M$ is discrete if $M$ is a bounded and connected set.

Definition 4. A discrete pattern $P$ with motif $M$ and index set $I$ is nontrivial if $\operatorname{card}(I) \geq 2$.

Further, symmetry groups are often divided into three classes: finite, one-dimensional, and two-dimensional designs. The latter two classes are sometimes referred to as monotranslational and ditranslational designs, respectively. For further information, see, for instance, Washburn and Crowe (1991), Horne and Hann (1998), and Horne (2000). The symmetries of foremost interest in this chapter are rotational symmetry and the two-dimensional designs. Rotational symmetry in $E^2$ consists of cyclic and dihedral symmetries, denoted $C_n$ and $D_n$, respectively, where $n$ is the order of symmetry. The two-dimensional designs can be divided into 17 distinct symmetry groups. The one of interest to this chapter is the one called p4m.

Knot Theory

The mathematical study of knots and links is referred to as knot theory, which is a branch of topology. This field is relatively new, just a few hundred years old, but has occupied mathematicians such as Gauss and Tait, among others. For a history of knot theory, see, for instance, Van De Griend (1996). Yet another branch of topology is braid theory. A braid can be described by a set of points $A = \{a_1, a_2, \ldots, a_n\}$ on one side of a rectangle (which frames the braid) and a set of points $B = \{b_1, b_2, \ldots, b_n\}$ on the opposite side of the rectangle, with each point $a_i \in A$ connected by a string to a point $b_j \in B$. Each string always heads in the same direction. See, for instance, Artin (1925) or Adams (2000).

In fact, looking at incisions on cylinder seals featuring knots and links, which originate in Mesopotamia from the late fourth to the second millennium BC, it can be seen that some impressions are braids rather than knots or links. If the pattern on the cylinder surface is a knot or link that goes around the cylinder, the impression will naturally become a braid. See Fig. 2 for an example. Thus, on the one hand, when studying the knot or link on the artifact, knot theory would be used to describe it. On the other hand, when studying the impressions made by the cylinder seal, braid theory would perhaps rather be used. In fact, the closure of a braid is a link, and the least number of strings needed to obtain a given link as a closed braid is known as its braid index. This could instead be used in the study. However, in this chapter, knot theory is chosen for the study. For further information on braid theory, see, for instance, Birman (1974). A knot can mathematically be defined as follows:

Fig. 2 Near Eastern cylinder seal with corresponding impression on clay: (a) cylinder seal featuring a knot and (b) impression of the cylinder seal featuring a braid


Definition 5. A knot $K$ is a closed-loop curve in $E^3$.

Thus, a knot is like a piece of string where the ends are joined; for further description, see Kauffman (1987, 1993), Adams (2000, 2008), and Cromwell (2004), among others. This also tells us that a knot can have only one component. A structure with more than one component can be seen as several interlinked knots, which is called a link $L$. Links are sometimes defined as a set of one or more knots; thus a knot is a link of only one component. The number of components in a link $L$ is referred to as its multiplicity, denoted $\mu(L)$. Throughout this chapter, a link is considered to be a set of one or more knots:

Definition 6. A link $L$ is a set of disjoint knot(s) in $E^3$.

Two knots $K_1$ and $K_2$ are said to be equivalent if $K_1$ can be transformed, with an ambient isotopy, into $K_2$ and vice versa without cutting the strings. Otherwise, the knots are inequivalent; see, for instance, Kauffman (1987). Knot equivalence can be expressed formally as follows:

Definition 7. Two knots $K_1$ and $K_2$ are considered equivalent if and only if there is an orientation-preserving homeomorphism $f : E^3 \to E^3$ such that $f(K_1) = K_2$.

Knots and links are three-dimensional objects, which is why they are commonly represented by projections onto $E^2$, with additional information for the crossing points, i.e., the over- and undercrossings. These projections are called knot or link diagrams (Alexander, 1928) or sometimes grid diagrams (Schaake and Hall, 1995; Turner et al., 1991). A diagram of a knot can be transformed into a diagram of an equivalent knot using three basic types of moves, referred to as Reidemeister moves. These moves change the relation between the crossings and may also change the number of crossings; see Reidemeister (1948). The minimal number of crossings in a diagram of a knot or link $L$ is called its crossing number, denoted by $\chi(L)$. These over- and undercrossings represent the braiding pattern of the knot or link or, as it is sometimes called, the weaving pattern; for the latter, see, for instance, Schaake and Turner (1991). A common braiding pattern is the alternating pattern, defined as follows:

Definition 8. A link that consists of over- and undercrossings with a period length of 2 (over, under, over, under, and so on) is said to be alternating.

Knotwork Concepts

In this chapter, the focus lies on a few of the knotworks that are frequently encountered on archaeological artifacts, foremost in Asia, Europe, and Africa. They are described mathematically in order to have definitions in place for subsequent sections. One of the foremost reasons for using the term knotwork, which in some sense is redundant given knots and links, is the decorative and artful appearance that a symmetrical knot or link displays. The aim is to preserve this thought in the definitions below by using geometry rather than knot theory for some specific aspects. That is, although these knots and links are topological objects, and as such deformable and flexible structures, the focus lies mainly on the specific projections found on the studied artifacts.

Rectangular Diagonal Knotwork

A rectangular diagonal knotwork is a knot or link like the ones shown in Fig. 3. These can be characterized by their number of bights in the x and y directions, $u$ and $v$, respectively. Most archaeologically encountered rectangular diagonal knotworks are alternating, which is why the focus lies on these. Throughout, rectangular diagonal knotworks are referred to as $R_{u,v}$. Note also that the link shown in Fig. 1 can be classified as an $R_{2,2}$ knotwork.

Fig. 3 Examples of knot and link diagrams for rectangular knotworks: (a) $R_{3,3}$, (b) $R_{4,3}$ and (c) $R_{5,3}$

An interesting approach to describing designs like these is given by Gerdes (2007b). He describes them as mirror curves by defining a set of diagonal designs $D[m, n]$ consisting of curves (straight diagonal lines), which in turn are described by rectangular grids $RG[m, n]$ of arbitrary size, where $m, n \in \mathbb{N}$. He describes the diagonal designs as the trace(s) of one or more light rays reflected back and forth within the rectangular grid at an angle of 45° in relation to the grid, in such a way that each defined coordinate in the grid is covered. The lines are then transformed into a smooth mirror line design by a set of transformation rules. The sides of the rectangles are described as mirrors. He expands this concept to allow double-sided mirrors placed within the grid, forcing the imagined light rays to be reflected in different ways; see Gerdes (2007b) and Jablan (1995, 2001, 2012). This was previously described as breaks by Allen and Anderson (1903) and Allen (1912) in their early twentieth-century analysis of Celtic and Pictish knotworks. An alternative is given by Dunham (2000) for the construction of knotwork, referring to this concept as avoiding tiles. However, the concept of using these breaks within the knotworks has been left out of this chapter and is only noted for the sake of completeness. In general, patterns which need to be described using breaks seem to first appear in ancient Greece and Rome, shaped as rectangular borders and L-shaped knotworks made of tesserae, forming mosaics. Patterns like these are discussed by Fisher and Mellor (2004). Further, Lomonaco and Kauffman (2008) use a system of mosaic tiles referred to as knot mosaic to construct various knots and links in their study of quantum knots. A knot mosaic is constructed using an arbitrary set of the 11 presented tiles, 5 unique ones if considering rotational symmetry. Other examples of describing knotworks are given by Bain (1973, 1986). However, their approaches are more focused on methods for constructing knotwork using paper and pen.

The method described by Fisher and Mellor (2004) could be seen as a refinement of Bain. They describe a knotwork like the ones in Fig. 3 by using a square grid with an arbitrary number of rows and columns, with each cell in the grid populated with a rotated inscribed square. A crossing is inserted where the vertex of an inscribed square meets another square’s vertex. Below, this method is used, somewhat more formalized by using Grünbaum and Shephard’s (1989) concept of mono-motif patterns. For tutorial reasons, the construction of the intermediate pattern $P$ with motif $M$ within a square grid $SG[u, v]$ is illustrated, where $u, v \in \mathbb{N}$ represent the number of columns and rows, respectively. Our motif $M$ can be described as a square of the size of one cell in the square grid, rotated $\pi/4$ radians around its own center and scaled by the factor $1/\sqrt{2}$. See Fig. 4 for an illustration. Using this motif, a ditranslational design with p4m symmetry of any given size can be created by copying the motif horizontally and vertically according to the number of columns and rows; see Fig. 5 for examples. The number of copies of $M$ in $P$ is equal to $\operatorname{card}(I) = u \cdot v$, where $I$ is the index set. Requiring only that the pattern be discrete and nontrivial means that patterns like the one in Fig. 5a are possible, that is, patterns whose copies of $M$ lie in a single row or a single column. Applying the crossings to such a $P$ to obtain the corresponding knotwork $R_{u,v}$ clearly yields a projection of a trivial knot, the unknot. However, from an aesthetic point of view, it is still considered a knotwork. This will prove useful further below.

Fig. 4 Motif M within a square grid SG[1, 1]

Fig. 5 Pattern produced by motif M with p4m symmetry: (a) 2 × 1, (b) 2 × 2, (c) 3 × 2, (d) 3 × 3 and (e) 4 × 3
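The construction of the pattern $P$ described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `motif_vertices` and `pattern`, the unit cell size, and the indexing of cells by (column, row) are choices made here and are not taken from the cited works. The sketch rotates the cell-sized square by $\pi/4$ about the cell center, scales it by $1/\sqrt{2}$, and copies it over $u$ columns and $v$ rows.

```python
import math

def motif_vertices(col, row, cell=1.0):
    """Vertices of the motif M placed in grid cell (col, row).

    M is the cell-sized square rotated pi/4 about the cell centre and
    scaled by 1/sqrt(2), so its vertices fall on the midpoints of the
    four sides of the cell.
    """
    cx, cy = (col + 0.5) * cell, (row + 0.5) * cell
    c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
    k = 1 / math.sqrt(2)                      # scaling factor
    half = cell / 2
    corners = [(half, half), (-half, half), (-half, -half), (half, -half)]
    # rotate each corner of the cell about the centre, then scale
    return [(cx + k * (x * c - y * s), cy + k * (x * s + y * c))
            for x, y in corners]

def pattern(u, v, cell=1.0):
    """Mono-motif pattern P: one copy of M per cell, u columns and v rows."""
    return {(i, j): motif_vertices(i, j, cell)
            for i in range(u) for j in range(v)}

P = pattern(4, 3)
print(len(P))  # card(I) = u * v = 12
```

Up to floating-point rounding, the four vertices returned for the cell (0, 0) are the midpoints of its sides, which is where copies of $M$ in adjacent cells meet.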


Looking at the square grids in Fig. 5, it can be seen that the squares of motif $M$ in adjacent cells touch at their vertices in such a way that their sides line up into line segments running from side to side of the outer rectangular border. The meeting point of each pair of adjacent cells, in both the x and y directions, can be turned into either an over- or an undercrossing. Basically, all crossings between pairs of adjacent cells in the x direction are assigned one type, over or under, while all crossings between pairs of adjacent cells in the y direction are assigned the other type; see Fig. 6 for an example. From Fig. 6, the number of crossings can easily be derived: there are $(u - 1) \cdot v$ pairs of horizontally adjacent cells and $(v - 1) \cdot u$ pairs of vertically adjacent cells; thus

$$\chi(R) = (u - 1) \cdot v + (v - 1) \cdot u. \qquad (1)$$
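As a small check of Eq. (1), the crossings can be enumerated directly as pairs of adjacent cells. The following Python sketch is illustrative only: the function names and the choice of labeling horizontal pairs "over" and vertical pairs "under" (the convention of Fig. 6, which may equally be reversed) are assumptions made here.

```python
def crossings(u, v):
    """One crossing per pair of adjacent cells in a grid of u columns, v rows."""
    horizontal = [((i, j), (i + 1, j), "over")
                  for j in range(v) for i in range(u - 1)]
    vertical = [((i, j), (i, j + 1), "under")
                for i in range(u) for j in range(v - 1)]
    return horizontal + vertical

def crossing_count(u, v):
    """Closed form of Eq. (1)."""
    return (u - 1) * v + (v - 1) * u

# the enumeration agrees with the closed form for the knotworks of Fig. 3
for u, v in [(3, 3), (4, 3), (5, 3)]:
    assert len(crossings(u, v)) == crossing_count(u, v)

print(crossing_count(4, 3))  # 17
```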

Finally, adding both over- and undercrossings to the pattern gives the resulting knotwork $R_{4,3}$ as in Fig. 7. The result is a clear approach, which can be used to describe and define a rectangular diagonal knotwork:

Fig. 6 Example of crossings in a 4 × 3 pattern of motif M: (a) each pair of horizontally adjacent cells has over-crossings, and (b) each pair of vertically adjacent cells has undercrossings

Fig. 7 Example of the resulting $R_{4,3}$ knotwork: (a) over- and undercrossing shown in grid diagram style and (b) over- and undercrossing presented as a knot diagram


Definition 9. A discrete and nontrivial mono-motif pattern $P$ with motif $M$, visually defined in Fig. 4, with $u \cdot v$ copies of $M$ arranged by translational symmetry in $u$ columns and $v$ rows, where each pair of row-wise adjacent vertices forms an overcrossing and each pair of column-wise adjacent vertices forms an undercrossing, or vice versa, is called a rectangular diagonal knotwork $R_{u,v}$.

The number of components in a rectangular diagonal knotwork can be calculated as the greatest common divisor (gcd) of its numbers of bights in the x and y directions. To our knowledge, this was first mentioned by Carmichael (1922) but without giving any proof. A proof is however given by Fisher and Mellor (2004). Thus, the number of components in an $R_{u,v}$ is

$$\mu(R) = \gcd(u, v). \qquad (2)$$
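Equation (2) can be checked experimentally by tracing the curves of $R_{u,v}$ in the manner of the light rays in the mirror-curve description above. The Python sketch below is an illustration, not the method of any of the cited authors; the doubled integer coordinates and the name `count_components` are assumptions introduced here. Each cell $(i, j)$ spans $[2i, 2i+2] \times [2j, 2j+2]$, so the midpoints of cell sides are the lattice points with $x + y$ odd, and a ray moves diagonally from midpoint to midpoint, reflecting at the outer border.

```python
from math import gcd

def count_components(u, v):
    """Count the closed curves of R_{u,v} by tracing 45-degree rays."""
    W, H = 2 * u, 2 * v
    seen = set()       # undirected segments already assigned to a curve
    components = 0
    for x in range(W + 1):
        for y in range(H + 1):
            if (x + y) % 2 == 0:
                continue                      # not a side midpoint
            for dx in (-1, 1):
                for dy in (-1, 1):
                    nx, ny = x + dx, y + dy
                    if not (0 <= nx <= W and 0 <= ny <= H):
                        continue
                    if frozenset(((x, y), (nx, ny))) in seen:
                        continue
                    components += 1           # a new, untraced curve
                    px, py, ddx, ddy = x, y, dx, dy
                    while True:
                        qx, qy = px + ddx, py + ddy
                        seg = frozenset(((px, py), (qx, qy)))
                        if seg in seen:
                            break             # curve has closed up
                        seen.add(seg)
                        if qx in (0, W):      # reflect at left/right border
                            ddx = -ddx
                        if qy in (0, H):      # reflect at top/bottom border
                            ddy = -ddy
                        px, py = qx, qy
    return components

for u in range(1, 7):
    for v in range(1, 7):
        assert count_components(u, v) == gcd(u, v)
```

The traced segments are exactly the sides of the copies of the motif $M$, so interior midpoints are visited twice (crossings) and border midpoints once (bights). The over- and under-information plays no role in counting components and is therefore omitted.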

Circular Knotworks

There are many different kinds of circular-shaped knotworks. A circular knotwork can be defined as a knotwork with a rotational symmetry of some kind. In knot theory, these would perhaps best be described as periodic knots. A knot $K$ in $E^3$ is said to be periodic with period $n$ if there is a periodic map $f$ of $(E^3, K)$ such that $f$ is a $2\pi/n$ rotation about a line $F$ in $E^3$ and $F$ and $K$ are disjoint. See, for instance, Murasugi (1971), Livingston (1993), and Kawauchi (1996). Further, the most prominent type of circular knotwork encountered on archaeological artifacts up until the eighth or ninth century AD is the Turk’s head. This type is described further in the subsequent section. There are other types of archaeologically encountered knotworks that resemble Turk’s heads. An example is the late eighth-century knotwork type that can be found on the Hilton of Cadboll and the Nigg stones in Easter Ross in Scotland. See Allen and Anderson (1903) for a description of the stones and the carvings. See Åström and Åström (2011, 2015, 2017) for a description of the mathematics, including the equations for the number of components.

Turk’s Head

The naming of these knots as Turk’s heads seems to date back at least some 200 years. In the late eighteenth century, Röding (1798a,b) referred to them in German as “Turkiche knoop,” which in English is Turkish knot. Lever (1819) describes that the finalizing part of tying these knots will form a kind of Crown or Turban, which gives an explicit hint as to what the name refers to.

A Turk’s head is a knot or link like the ones shown in Fig. 8. These can be characterized by their number of bights and leads, $p$ and $q$, respectively. Bights can be seen as the segments of rope that form the boundary of the knot or link and give it its scalloped appearance. Leads can be seen as the individual pieces of rope that cross every vertical line in the diagram. A Turk’s head is alternating, according to Definition 8. Throughout, Turk’s heads are referred to as $H_{p,q}$. As mentioned above, the link shown in Fig. 1 can be classified as an $R_{2,2}$ knotwork; it can also be classified as $H_{4,2}$. This is however the only one that can be classified as both types.

Fig. 8 Examples of knot and link diagrams for Turk’s heads: (a) $H_{5,3}$, (b) $H_{6,3}$ and (c) $H_{7,3}$

Since this chapter focuses on projections of knots and links onto $E^2$ and their corresponding geometrical patterns, it may be useful to have a definition focused on the projection of a Turk’s head rather than the knot itself. Thus, in a more geometrical manner:

Definition 10. Let $p$ and $q$ be integers, both greater than or equal to 2. Then, a set of points $\{P_1, \ldots, P_p\}$, pairwise connected by tangles with $p$-fold rotational symmetry according to $q$, where $q$ is the number of steps between points, is called a Turk’s head $H_{p,q}$ with $p$ bights and $q$ leads if it is alternating.

Note however that our definition does not exclude other shapes that a Turk’s head may have; it is still a topological object that may be deformed in any possible way. In fact, another shape than the flat Turk’s heads presented in Fig. 8 is that of a cylindrical shell. In order to understand the mathematical structure of a Turk’s head, and more specifically how it will look in the shape of a cylindrical shell, a different view of it may be helpful. By cutting a flat Turk’s head open along a ray emanating from the center of the knot and unfolding it, it may be arranged in a rectangular grid. This is shown in Fig. 9. By bending the structure as if around a cylinder and pairwise connecting the loose ends, top left with top right, continuing downward until all are connected, the result is a Turk’s head formed as a cylindrical shell. Note also that viewing a Turk’s head in this way shows the resemblance to rectangular diagonal knotworks. In addition, looking at the highlighted tangle, it can be seen how the number of leads $q$ relates to the number of steps of a tangle between bights in the margin of the knot. Studying Fig. 9, it can intuitively be seen that the number of crossings is equal to

$$\chi(H) = p \cdot (q - 1). \qquad (3)$$

Fig. 9 Knot diagram of a cut and unfolded Turk’s head $H_{7,4}$. The figure highlights one tangle which shows the relation between the number of leads q and the number of steps of a tangle between bights in the margin

A Turk’s head knot can also be referred to as a Rosette knot, from the German “Rosettenknotten” coined by Krötenheerdt (1964, 1971); see also Murasugi (1965). For further examples and reading on Turk’s heads in general, see the works of Öhrvall (1908), Brochmann (1941), Ashley (1944), Lund (1968), Nilsen (1978), Hall (1996), and Grant (2002), among others. A Turk’s head is a knot if and only if its numbers of bights and leads are coprime. With more than one component, it is a link. That is, the number of components $\mu$ in a Turk’s head $H_{p,q}$ (i.e., its multiplicity) is the greatest common divisor (gcd) of its numbers of bights and leads. For a proof of the gcd theorem for Turk’s heads, see Schaake and Turner (1988) and Turner and Schaake (1991) and further discussions in Coleman (1997, 2008) and Canute (2008). Thus,

$$\mu(H) = \gcd(p, q). \qquad (4)$$
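Equations (3) and (4) can be illustrated with the cut and unfolded view of Fig. 9. The Python sketch below rests on an assumption that is not spelled out in the text: that the unfolded $H_{p,q}$ can be modeled as $p$ repetitions of a period in which each of the $q - 1$ pairs of neighboring leads crosses once, with alternating signs. The names `braid_word` and `components` are hypothetical. The crossing count is the length of the word, and the number of components is the number of cycles of the permutation that the word induces on the $q$ leads.

```python
from math import gcd

def braid_word(p, q):
    """p periods, each crossing every pair of neighbouring leads once.

    Entry (i, sign) means leads at positions i and i + 1 cross; the sign
    alternates with i so that the closed diagram is alternating.
    """
    period = [(i, +1 if i % 2 == 1 else -1) for i in range(1, q)]
    return period * p

def components(p, q):
    """Number of cycles of the permutation of the q leads."""
    perm = list(range(q))                 # perm[k]: lead now at position k
    for i, _sign in braid_word(p, q):
        perm[i - 1], perm[i] = perm[i], perm[i - 1]
    seen, cycles = set(), 0
    for start in range(q):
        if start in seen:
            continue
        cycles += 1
        k = start
        while k not in seen:              # walk one cycle of the permutation
            seen.add(k)
            k = perm[k]
    return cycles

for p in range(2, 10):
    for q in range(2, 8):
        assert len(braid_word(p, q)) == p * (q - 1)   # Eq. (3)
        assert components(p, q) == gcd(p, q)          # Eq. (4)

print(components(6, 3), len(braid_word(6, 3)))  # 3 12
```

The signs of the crossings do not affect the number of components; they are kept in the word only to reflect that a Turk’s head is alternating.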

Archaeological and Historical Aspects

In this section, the focus lies on those cultures and civilizations that seem to have had the most prominent use of knotworks as decorations, at least to the best of our knowledge. Further, the focus lies on the time periods that appear to have had a direct or an indirect impact on the development of knotworks. That is, this section does not aim at being complete in terms of listing all cultures and civilizations, at all time periods, which may have practiced the art of decorating artifacts with knotworks. The artifacts are studied chronologically but divided into the geographical areas where these ancient civilizations had their roots. Thus, some overlap in time is to be expected.

The ancient Near East is often regarded as the cradle of civilization and the birthplace of some of the most important inventions made by mankind, for instance, the wheel or, as mentioned above, mathematics. Even before the invention of writing, various means were used to identify oneself or to safeguard important goods against tampering or against theft of the contents of vessels or jars. Examples of objects used to achieve this were stamp seals and cylinder seals with inscriptions or marks of various kinds. The stamp seal first appeared sometime in the late Neolithic, in the eighth to fifth millennia BC, in Mesopotamia, present-day Iraq (Denham, 2013). The cylinder seal started to emerge in Mesopotamia in the middle or late fourth millennium BC, known as the Uruk period; see, for instance, Teissier (1984). By rolling a cylinder seal on wet clay, a laterally reversed impression of the inscription or markings is made in the clay to mark the ownership of the seal. These cylinder seals, often rather small (about 1.5–4 cm) and made of hematite, steatite, marble, limestone, etc., are carved or drilled, showing figurative as well as geometrical motifs. There are some seals from the late Uruk period which show sinusoidal carvings around the seals, so as to form closed loops. However, these seals do not have any clear crossings where the lines cross each other; see Fig. 10 for an example.

Fig. 10 Drawing of late Uruk period cylinder seal impression of item NBC 3968 in the Yale Babylonian collection, where p indicates the period length. See Buchanan (1981) for a further description

In the period succeeding Uruk, called Jemdet Nasr (3100–2900 BC), seals with similar sinusoidal patterns can be found, still with no clear over- and undercrossings. Some specimens are, for instance, held by the British Museum; see items BM 119202, BM 120562, BM 126425, BM 126326. After the Jemdet Nasr period comes the Early Dynastic period, around 2900–2350 BC, to which the artifact in Fig. 1 belongs. In this period, there are plenty of artifacts showing clear knot and link patterns. Following the Early Dynastic period, more or less throughout the greatness of Mesopotamia, artifacts with knot and link patterns can be found. Figure 11 shows an example of an $H_{10,2}$ link on a disk-shaped seal or amulet, which according to Von der Osten (1934) is post-Hittite, from a territory consisting of Asia Minor and Syria. The artifact is dated to after 1200 BC. Further examples of knots and links on cylinder and stamp seals originating in Mesopotamia can be found in Ward (1899, 1910), Carnegie (1908), Delaporte (1923), Frankfort (1939, 1955), and Pittman (1987), among many others.
Projections of knots and links are neither limited to seals nor to Mesopotamia and Asia Minor in the first to the late third millennia BC; there are findings from Egypt as well as the Indus Valley. In Wallis (1898), some Egyptian pottery from the McGregor collection with artifacts from 2100 to 1800 BC is shown which displays intertwined bands around the neck on the pottery, forming Turk’s heads. Mackay (1943) shows a stamp seal from Sindh in the lower part of the Indus Valley, from the Jhukar period, which displays a flat H (5, 2), referred to as a interlaced coil pattern. Piggott (1950) writes that this specific motif on the seal is comparable with the motifs in Hittite contexts in Asia Minor. The H (2, 2) Turk’s head is another


A. Åström and C. Åström

Fig. 11 Link diagrams of H10,2: (a) interpretation of No. 385 of Von der Osten (1934) and (b) a topologically equivalent link, transformed using a series of Reidemeister moves of Type I

common motif that can be found on artifacts belonging to the Bactria-Margiana Archaeological Complex in Central Asia in the second to late third millennia BC; see, for instance, Sarianidi (1981a, 1986) and Hiebert (1994). Some examples of the H(2, 2) link consist of intertwined serpents, resembling those in Fig. 1, or of what Sarianidi (1981b) describes as dragons. In the first millennium BC, there are still plenty of examples of circular knotworks to be found throughout Europe, for instance, on Greek pottery, painted around the neck of vases (Richter, 1959; Williams, 1999), or on decorated metalwork belonging to the Hallstatt and La Tène cultures; see Finlay (1973) and Megaw and Megaw (2001), among others. The spread of mosaics in the Roman world can be seen as a new beginning for decorative knots and links, where the focus lies on rectangular diagonal knotworks, like the one in Fig. 12, rather than on circular knotworks as before. These mosaics can be found throughout the countries that surround the Mediterranean Sea, including North Africa and southern Europe, extending up to Great Britain and even the central parts of Europe. The knotworks consist mainly of rectangular diagonal knotworks, rectangular borders, and H(2, 2) Turk’s heads, occasionally of larger Turk’s heads as well; see, among others, Ovadiah (1980), Balmelle et al. (2002), Becker and Kondoleon (2005), Dunbabin (2006), and Madden (2014). With the decline of the Roman Empire and the migration period in Europe in the fourth to sixth centuries AD, a new era followed. In Scandinavia and the British Isles, a vast number of artifacts bear witness to a widespread practice of decorative art including knots and links from around the eighth century AD to the twelfth century AD (Allen, 1912; Allen and Anderson, 1903). In this period, the evolution of the knotworks seems to have advanced, with knotworks consisting of curls, bends, etc. in addition to the previously dominant diagonal and circular designs.
The designs from this time period are often referred to as Celtic knotworks, but note that this could sometimes also include Anglo-Saxon, Pictish, and sometimes even Scandinavian knotworks. Examples from the British Isles can be found in Allen and Anderson (1903), Allen (1912) and Cromwell (2008), among others. Another great source of decorative knotworks from this time period is illuminated manuscripts such as Coptic Bibles, Qurans, and Gospels; see, for instance, Westwood (1988) for examples of illuminated manuscripts of fourth–sixteenth-century versions of

23 Projections of Knots and Links


Fig. 12 Mosaic with R(3, 3) knotwork. Conimbriga, Portugal, second–third century AD

the Bible. Note that these knotworks are often rather elaborate, and several of the patterns are in fact just interlaced-work. Further, Scandinavian excavation sites that have yielded numerous artifacts with elaborate knotworks include Birka (Arbman, 1940, 1943) and Valsgärde (Olsén, 1945). See also Arwidsson (1942) for examples of artifacts with knotworks from various sites in Sweden belonging to the Vendel Period. The findings at Sutton Hoo (Kendrick et al., 1939) in England have many similarities to the Swedish findings. On the Isle of Man, a vast number of stones and crosses can be found, which can be divided into a pre-Scandinavian and a Scandinavian class, where the earliest stones are presumed to date back to the sixth or seventh century AD; see Cumming (1857), Kermode (1907), and Cubbon (1996). Another rather distinct style of decorative knotworks can be found in the Islamic world, emerging in the seventh century AD. Mosaics, pottery, and metalwork are among the art forms where examples of knotworks and interlaced-work can be found. Some examples, including preceding Arabic art, are given in Gayet (1893), Atil (1975), El-Said and Parman (1976), and Rice (1986), among others. See also Grünbaum and Shephard (1992) and Cromwell (2010), among others, for mathematical analysis of Islamic art including interlaced patterns.

Contemporary and Traditional Art

Nowadays the variety of knot patterns is quite large, as is the range of media in which they are realized. However, two types of knotwork that may still be among the most common decorative knotworks are rectangular diagonal knotworks and Turk’s heads. These two types, originating from the ancient world, can be found in various places and cultures around the world in traditional art, both as carved projections and as realized in rope or sennit, etc. These patterns are used by sailors and craftsmen of the Western world as well as by indigenous people in various places around the world. In Indonesia, there is a tradition of weaving


Fig. 13 A small Egyptian stamp seal in steatite with carved pattern and corresponding impression on clay: (a) stamp seal and (b) impression on clay

baskets of coconut or palm leaves, called ketupat (Van de Griend, 1994). These designs are made by weaving together either two or four strips of leaves according to traditional basket designs in such a way that the loose ends of the strips connect and form a single closed loop, hence a knot. Some of these designs, when unfolded, rather resemble a flat Turk’s head, as pointed out by Van de Griend (1994). Another interesting example is the pattern that can be seen in Fig. 13. It has been found both in Mohenjo-Daro (Mackay, 1937a,b), in present-day Pakistan, and in Egypt. This specimen, a rather small steatite stamp seal potentially originating from the First Intermediate Period of Egypt (later part of the third millennium BC), shows exactly the same pattern as one made by the Chokwe people in Angola; see Gerdes (1999, 2007a). The Chokwe people practice an ancient tradition of sand drawings, called sona in their language, which among other things are used to illustrate tales and legends (Gerdes, 2007a). The example in Fig. 13 clearly lacks over-undercrossings, but such patterns are often described as knot designs, coiled cord designs, or, as mentioned above, mirror curves. Other patterns that the Chokwe people draw include various rectangular diagonal knotworks and Turk’s heads, but without over-undercrossings. In Gerdes (1999), there are plenty of other examples of patterns from Africa that do have over-undercrossings, which form both rectangular diagonal knotworks and Turk’s heads. Patterns similar to those made by the Chokwe people are made by the Tamil women of India. These designs, which are formed by drawing one single line, are among other things called Brahma’s knot (Gerdes, 1989). In the maritime world, there is a tradition among sailors and craftsmen of creating knotworks out of rope or sennit, normally referred to as fancywork; see Fig. 14 for examples.
Ashley (1944) is a comprehensive source for many different designs made of rope, including many archaeologically encountered specimens, such as flat and cylindrical Turk’s heads as well as rectangular diagonal knotworks. In Stormes


Fig. 14 Examples of knotworks realized in rope: (a) an R(9, 7) and (b) various cylindrical-shaped Turk’s heads

and Reeves (2010), examples of cylindrical Turk’s heads as part of the Californian tradition of braiding can be seen.

Knotwork Analysis

In this section, some mathematical properties and quantities (invariants) of knots and links are studied. The properties are chosen so that they may help in understanding the structure of knotworks and, in addition, facilitate the interpretation of the evolution and development of these kinds of knotworks.

The Number of Components

The number of components in a link, or its multiplicity, is an important invariant, for instance, for knot artists. The multiplicity tells a knot artist how many pieces of rope are needed when creating a fancywork, and it also makes it possible to use different colors for different components. This can also be seen in Roman mosaics and in illuminated manuscripts, as discussed further below. But note that in both of these examples of colored knotworks, it is more common for a single component to shift in color than for components of different colors to be used. Some of the early examples of knots and links from ancient Mesopotamia feature components composed of one or two serpents, like the one in Fig. 1; in addition, there are the preceding patterns consisting of one or two intertwined serpents, where the heads and tails do not necessarily form closed loops. These may be depictions of the ancient god Nirah of Mesopotamia (Black and Green, 1992); similar ones were depicted and worshipped in ancient Egypt (Lurker, 2005). Depictions of, for instance, Nirah are shown with snakes as legs which are entwined or surrounded by snake coils, etc. Subsequently, when the designs had in one way or another evolved into mathematically defined knotworks, rectangular diagonal knotworks on


Mesopotamian artifacts show a multiplicity of one or more. However, when it comes to cylindrical Turk’s heads on cylinder seals, a multiplicity of more than one seems to be more common. One reason for this may be constructional: the very small size of these seals sets the limit for what is possible to create, so the multiplicity is a result rather than a parameter. There may also be different meanings or purposes behind cylindrical Turk’s heads and flat ones. These kinds of cylindrical Turk’s heads are, as mentioned above, also common on Greek pottery, but available studies of these Turk’s heads are somewhat sparse. In fact, studies of the multiplicity of decorative knotworks are overall rather sparse. Until now, the most analyzed knotworks are likely to be those found on the British Isles from the eighth century AD onward. Allen (1883) writes that most of these Celtic and Pictish knotworks have a multiplicity of one or two. The circular knotwork design on the Hilton of Cadboll cross slab in Scotland is an example of a link with a multiplicity of six (Åström and Åström, 2011). Nonetheless, one distinct type of cross-slab style, as categorized by Allen and Anderson (1903), shows traces of Christian-related motifs and scenes from the Bible and at the same time shows symbols from the preceding pagan times of polytheism. This is seen as a transition period. This period is, as mentioned above, very richly ornamented with both knotworks and plaitworks. It may not be too far-fetched to imagine that knotworks with a multiplicity of one may have had religious meanings not conforming to Christianity. Compare with the tradition of the Tamil people (Gerdes, 1989), who strive to draw designs using only one single line, where the term line is equivalent to a component and the patterns are curves rather than knotworks. There, the reasons why one single line is preferred seem to lie in religious beliefs concerning both gods and evil spirits (Gerdes, 1989).
Further, when it comes to making fancywork, a multiplicity greater than one is often dismissed outright by many craftsmen; one reason may be a belief that it is easier to work with one piece of string than with several. In Ashley (1944), for instance, when discussing the parameters of Turk’s heads, a table is presented which leaves out all parameter pairs other than those with a multiplicity of one.
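
The multiplicity of a Turk’s head can be computed directly from its two parameters. The following is a minimal sketch in Python, assuming the greatest-common-factor rule for Turk’s heads (the subject of Coleman (1997, 2008) in the reference list) and assuming that H(p, q) simply lists the two parameters of the knotwork. The function name multiplicity is hypothetical and is not taken from the sources cited here.

```python
from math import gcd


def multiplicity(p: int, q: int) -> int:
    """Number of components of a Turk's head H(p, q).

    Assumption: the greatest-common-factor rule, i.e. the number of
    closed loops in the knotwork equals gcd(p, q).
    """
    if p < 1 or q < 1:
        raise ValueError("both parameters must be positive integers")
    return gcd(p, q)


# Classify a few of the parameter pairs mentioned in the text.
for p, q in [(3, 2), (5, 2), (2, 2), (10, 2)]:
    m = multiplicity(p, q)
    kind = "knot" if m == 1 else "link"
    print(f"H({p}, {q}): multiplicity {m} ({kind})")
```

Under this rule H(5, 2) is a knot, while H(2, 2) and H(10, 2) are links with two components, which agrees with how these examples are described above. A table restricted to a multiplicity of one then corresponds to the parameter pairs without a common factor.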

The Number of Crossings

Knowing the minimum number of crossings can be useful for various purposes, for instance, for measuring the complexity of links (Cromwell, 2008). The same argument is valid for knot artists making knotworks with rope, who in addition gain a measure for comparing the complexity of different designs, such as a circular versus a rectangular one. Cromwell (2008) studies interlaced-work, or knotted designs, in Celtic art by looking at the statistical distribution of knot types. Based on a small number of basic designs, he alters these designs by inserting what was described above as breaks and maps the designs to the knot table, which can be found in several standard books on knot theory, for instance, Adams (2000) and Cromwell (2004). Extending studies like these to include designs and knotworks from Mesopotamia, the Roman world, etc. would give an interesting insight into the evolution of knotworks with regard to complexity.
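
For a regular family such as Turk’s heads, the number of crossings in the standard diagram follows from the parameters. The following is a minimal sketch, assuming that in H(p, q) the first parameter p counts the bights (the scallops along the rim) and q counts the leads (the parallel strands), so that each of the p repeats contributes q − 1 crossings. This reading of the notation is an assumption, suggested by Fig. 15 where H3,2 has 3-fold symmetry, and the function name is hypothetical.

```python
def turks_head_crossings(p: int, q: int) -> int:
    """Crossings in the standard diagram of a Turk's head H(p, q).

    Assumption: p is the number of bights and q the number of leads,
    so the diagram repeats p times and each repeat has q - 1 crossings.
    """
    if p < 1 or q < 1:
        raise ValueError("both parameters must be positive integers")
    return p * (q - 1)


# Compare the complexity of a few designs from the same family.
for p, q in [(3, 2), (5, 2), (7, 2), (10, 2)]:
    print(f"H({p}, {q}): {turks_head_crossings(p, q)} crossings")
```

This gives a simple measure for comparing designs within one family. It counts the crossings of the drawn diagram only and, as written, makes no claim about the minimum over all diagrams of the same knot or link.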


Braiding Pattern

Looking at archaeological artifacts featuring knotworks, the main body of specimens consists of alternating patterns. The same goes for braidwork and various interlaced-works. When it comes to what Allen (1883) refers to as Celtic interlaced-work, he as well as Carmichael (1922) argue that they are always alternating. Looking at the early examples from Mesopotamia and surrounding areas, there are three noteworthy bodies of specimens preceding actual projections of knots and links. Firstly, there are examples of intertwined serpents, as discussed above; these have an over-under pattern but do not form closed loops. Secondly, there are examples of rectangular diagonal patterns which, except for the lack of a braiding pattern, seemingly have the forms of knotworks; see, for instance, Frankfort (1939), where they are referred to as snake coils. Thirdly, there are the cylindrical sinusoidal patterns shown in Fig. 10, or similar ones with straight line segments instead. These lack, more or less, any over-under pattern, or at least any periodic one. That is, from a historical perspective, alternating patterns and patterns structured as knotworks seem to have existed at the same time, but for a period they were not yet integrated. Further, when it comes to contemporary knotworks realized in rope, the braiding patterns are sometimes used as an extra source of decoration; see, for instance, Fig. 14b. Braiding patterns are studied by, for instance, Schaake et al. (1988, 1990, 1991, 1992).
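
Whether a drawn or tied component is alternating can be checked mechanically once its crossings are recorded. The following is a minimal sketch, assuming a hypothetical encoding in which one closed component is traversed once and each crossing passed is written as 'O' (over) or 'U' (under); neither the encoding nor the function name comes from the sources cited here.

```python
def is_alternating(passes: str) -> bool:
    """Return True if overs and unders alternate around a closed component.

    `passes` lists the crossings met along one full traversal of the
    component, e.g. "OUOUOU". Because the component is closed, the last
    pass is also compared with the first one.
    """
    if not passes or set(passes) - {"O", "U"}:
        raise ValueError("expected a non-empty string of 'O' and 'U'")
    n = len(passes)
    # Compare every pass with its successor, wrapping around at the end.
    return all(passes[i] != passes[(i + 1) % n] for i in range(n))


print(is_alternating("OUOUOU"))  # strictly alternating traversal
print(is_alternating("OUUOOU"))  # two unders in a row
```

For a link, each component would be traversed and checked separately, and the knotwork counts as alternating only if every component passes the check.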

Symmetry

As indicated above, the aim of this chapter is, to some extent, to look at knotworks from a geometrical perspective rather than a topological one. Since knots and links are three-dimensional objects, their symmetry groups can be defined accordingly; this is done by, for instance, Grünbaum and Shephard (1985). Given that these knotworks are potential projections of real-life three-dimensional objects, they should perhaps be treated as such when studying their symmetry. However, they can in one way be seen as two-dimensional designs, and symmetry groups corresponding to E2 may be sufficient. Figure 15 shows three examples of Turk’s head knots where the rotational symmetry is emphasized. The method and constraints should be chosen to fit the application and the goal of the study. From the perspective of decorative art, a two-dimensional approach is perhaps convenient, in which two different projections of the same knot can be classified differently. Symmetry of knots is, for instance, studied by Cromwell (1993).
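
The two-dimensional view can be illustrated by testing a planar outline for rotational symmetry. The following is a minimal sketch, assuming an illustrative stand-in curve r(t) = R + a·cos(pt/q) for the outline of a flat Turk’s head and ignoring the over-under information at the crossings. The parametrization, the function names, and the default values are assumptions made for this example; they are not the construction used in this chapter.

```python
import math


def outline(p, q=2, radius=2.0, amplitude=1.0, steps_per_bight=60):
    """Sample the planar curve r(t) = radius + amplitude * cos(p * t / q).

    The curve closes after q turns around the origin when p and q have no
    common factor. Using a multiple of p samples per turn ensures that a
    rotation by 2*pi/p maps sample angles onto sample angles.
    """
    per_turn = steps_per_bight * p
    points = []
    for i in range(per_turn * q):
        t = 2 * math.pi * i / per_turn
        r = radius + amplitude * math.cos(p * t / q)
        points.append((r * math.cos(t), r * math.sin(t)))
    return points


def has_rotational_symmetry(points, order, tol=1e-6):
    """True if rotating by 2*pi/order maps the sampled points onto themselves."""
    angle = 2 * math.pi / order
    c, s = math.cos(angle), math.sin(angle)
    for x, y in points:
        xr, yr = c * x - s * y, s * x + c * y
        # Every rotated point must coincide with some original sample.
        if not any(math.hypot(xr - u, yr - v) < tol for u, v in points):
            return False
    return True


trefoil_like = outline(3)
print(has_rotational_symmetry(trefoil_like, 3))
print(has_rotational_symmetry(trefoil_like, 4))
```

The check treats the design purely as a point set in the plane, so it corresponds to the two-dimensional approach described above; a three-dimensional treatment would also have to respect which strand passes over at each crossing.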

Coloring

The use of colored components in knots or links can be seen on painted pottery in, for instance, Greece, but coloring a link in multiple colors is not seen until mosaics


Fig. 15 Knot diagrams for Turk’s heads with axes of rotational symmetry: (a) H3,2, 3-fold symmetry, (b) H5,2, 5-fold symmetry and (c) H7,2, 7-fold symmetry

Fig. 16 Example of an R(7, 7) knotwork realized in rope with two different colors, which highlight an additional pattern

were used in the Roman era, thus indicating different components with unique colors. However, symmetry from a geometrical point of view with regard to color seems to be prioritized over creating a realistic link. This results in both knots and links where components change color from one bight to the next in order to achieve color symmetry. However, examples of simple knotworks consisting of multiple components can be found where each component is colored throughout in its own color; see, for instance, Fig. 12. This knotwork also shows that the constituent parts of a mosaic, the tesserae, could be used to indicate parallel components in the Greek and Roman mosaics. Later, when illuminated manuscripts started to emerge, for example, Coptic Bibles, Qurans, and Gospels such as the Book of Kells or the Lindisfarne Gospels, more intricate patterns were used, often with multiple colors, both with uniformly colored components and components with shifting color, again giving precedence to color symmetry. Coloring in knot diagrams is further explored in, for instance, Knoll et al. (2017). Figure 16 shows an example of a knotwork realized in rope with two different colors exposing an additional pattern.


Construction of Knotworks

Nowadays, there are plenty of practical and user-friendly computerized tools that can be used to construct or analyze knotworks of various kinds. Kaplan and Cohen (2003), for instance, present techniques and methods for the construction of Celtic knotwork. Earlier, fairly modern methods used pen and paper; see, for instance, Bain (1973, 1986), which also focus on Celtic and Pictish knotworks. Some work has also been based on Roman mosaics; see, for instance, Liu and Toussaint (2010) and Parzysz (2009). Further, looking at how our ancestors may have constructed their knotworks on various media, they must certainly have encountered various problems. Some of the artifacts of ancient Mesopotamia that often hosted knotworks, such as cylinder seals and stamp seals, have in common that they are quite small. As noted above, the first evidence of geometry in the study of mathematics goes back to the second millennium BC, which means that the makers of the first knotworks most likely lacked this knowledge. However, the faults that can be seen in some of the specimens may be material problems rather than geometrical ones. Imagine these small objects, like a cylinder seal that is just a few centimeters long; a minor accidental shift of an incision will have a large effect on the overall pattern. Consider also the Greek and Roman mosaics, constructed at a time when the study of geometry had advanced quite far and the work of Euclid had been compiled, although these concepts may not have been accessible to the common man or the craftsman. These large compositions sometimes contain faults that may be related to geometry in one way or another. One potential source of faults could be the probable difficulty of getting an overview of a design that may span several meters.
Another reason might be that the need or wish to create a pattern with a certain symmetry or dimension simply forced the craftsmen to introduce a fault into the design intentionally. With proper geometrical thinking or knowledge, the designer should probably have been able to choose a design that conformed to the given constraints. However, there might be other aspects of the construction of Roman mosaics in which geometrical considerations were actually taken into account. In Fig. 17, the process of making a knotwork realized in rope is shown. This is done using a plate with wooden spikes in the margin of the framework that ease the constructional work. These plates can be rather large, with drilled holes forming a grid system that lets the craftsmen configure the parameters of the design in a straightforward manner. Compare this arrangement of drilled holes and spikes in the margin to the often white tesserae placed in the mosaic designs; see, for instance, Fig. 12. These tesserae form a grid just like the configurable system that a modern knot artist uses. These grids may have helped the mosaic craftsmen during construction in much the same way as the plate helps in making fancywork. In fact, they may be a reflection of how small-scale knotworks realized in rope may have been constructed as models for large-sized knotworks in the Greek and Roman mosaics. The grid system could then have been used to easily scale up the knotwork and get the proportions correct.


Fig. 17 Example of the manufacturing of an R(9, 5) knotwork realized in rope using a plate with wooden spikes in the margin

Discussion

Decorations that could easily be interpreted as projections of knots and links have been created by man continuously for about 4500 years. The main body of the archaeologically encountered knotwork types from the first 2000–2500 years can be described using some fairly easy mathematics. Naturally, as time advances, more complex knotworks emerge, as can be seen from the early examples of Mesopotamia up to those of Celtic and Pictish origin. As Allen (1883) points out, the knotworks found on artifacts prior to the Celtic and Pictish era are rather simple and in most cases put in a subordinate position in the designs. This is often the case in Greek and Roman mosaics, where various kinds of knotworks occur. These are often used as padding or as borders around more dominant figures. In the Celtic and Pictish era, however, there is a distinct evolution in the complexity of the knotworks, and in addition, they seem to be allowed as dominant designs. In this chapter, the main focus has been on those designs that can be classified as mathematical knots and links, which can be somewhat abstract. However, there is a tight, indisputable connection to rope, which can be seen in some projections of knots and links displayed on various artifacts. There are, for instance, some interesting similarities between modern-day fancywork and knotworks in Greek and Roman mosaics. One resemblance is, as discussed above, between the grid system of the constructional plate of the fancywork and the grid of white tiles in the mosaics. In addition, markings like these, adjacent to the crossing points, can also be seen in some knotworks painted on Greek pottery; see, for instance, Williams (1999). Further, modern-day fancyworks are often tied with a few ropes in parallel in order to achieve a thicker design, as seen in Figs. 14a and 16. This could be compared to Fig. 12, where the tiles that form the knotwork have been laid out so as to create more than one string or band in parallel. However, the oldest known examples of knots and links do not show as distinct a connection to rope as those on the Roman mosaics. Some are even made to resemble serpents, as seen in Fig. 1. Knotworks may have evolved in different ways; some alternatives are listed below. Combinations of the


below suggestions are possible as well as simultaneous development at different geographical locations. Some potential steps in the evolution of knotworks are:
• Intertwined serpents depicted on artifacts already in the sixth millennium BC. The earliest examples do not form closed loops, but these may be predecessors of later examples where the heads and tails of the serpents are connected so as to form closed loops, like the one seen in Fig. 1 forming a link.
• Impressions of cordage, braids, or woven baskets on wet clay can give a distinct pattern that could have been further elaborated by connecting the loose ends in such a way that a closed-loop design is made. Petrie (1920) writes that braid patterns may have been developed like this, from accidental markings of rope on wet clay.
• Developed from geometrical designs, such as diagonal patterns or sinusoidal patterns, for instance, painted on pottery or carved as lines around cylinder seals, like the one in Fig. 10. See, for instance, Moortgat (1945), Müller-Karpe (1968a,b), and Mellaart (1975) for several examples of artifacts with geometrical patterns from the Middle East going back to the sixth millennium BC.
• Depictions of actual knots and links realized in rope, leaves, or any other material that may have been used as decoration or for practical use. For instance, in Papua New Guinea, there are indigenous people who use Turk’s heads as adornments on arrows and bows (Bush, 1985; Fyfe, 2008; Van de Griend, 1994).
• Tribes in Africa and India have in modern days been encountered making sand drawings; these drawings have several similarities to knotworks. Before men had the means to manufacture items that allowed them to paint or carve the objects, or simply before they knew how to carve or paint, they may have drawn the patterns.

Regardless of how knotworks have developed, it is clear that their complexity has increased as time has advanced.
Although some designs are quite simple, the craftsmen who once created the knotworks must have had some kind of mathematical awareness in order to be able to copy these designs and adjust them by unknowingly changing the parameters following the mathematical rules behind the different types of knotwork.

Note: All figures and images are either made by or photographed by A. Åström if not stated otherwise. All ropes and fancywork are made by C. Åström if not stated otherwise.

References

Adams CC (2000) The knot book – an elementary introduction to the mathematical theory of knots, 1st edn. W.H. Freeman and Company, New York
Adams CC (2008) A brief introduction to knot theory from the physical point of view. In: Buck D, Flapan E (eds) Proceedings of symposia in applied mathematics – applications of knot theory, vol 66. American Mathematical Society, Providence, pp 1–20
Alexander JW (1928) Topological invariants of knots and links. Trans Am Math Soc 30(2):275–306


Allen JR (1883) On the discovery of a sculptured stone at St Madoes, with some notes on interlaced ornament. Proc Soc Antiquaries Scotland 17:211–271
Allen JR (1912) Celtic art in pagan and Christian times, 2nd edn. Dover Publications, New York. Reprinted 2001
Allen JR, Anderson J (1903) The early Christian monuments of Scotland. The Pinkfoot Press, Balgavies, by Forfar, Angus. Reprinted 1993
Anderson J (1881) Scotland in early Christian times. Kessinger Publishing, La Vergne. Reprinted 2009
Arbman H (1940) Birka I die Gräber – Tafeln. Almqvist & Wiksells Boktryckeri Aktiebolag, Uppsala. Kungl. Vitterhets Historie och Antikvitets Akademien
Arbman H (1943) Birka I die Gräber – Text. Almqvist & Wiksells Boktryckeri Aktiebolag, Uppsala. Kungl. Vitterhets Historie och Antikvitets Akademien
Artin E (1925) Theorie der Zöpfe. Abh Math Semin Univ Hamb 4:47–72
Arwidsson G (1942) Vendelstile – Email und Glas Im 7.-8 Jahrhundert. Almqvist & Wiksells Boktryckeri Aktiebolag, Uppsala
Ashley C (1944) The Ashley book of knots, 1st edn. Doubleday, New York
Åström A, Åström C (2011) Circular knotworks consisting of pattern No.295: a mathematical approach. J Math Arts 5(4):185–197
Åström A, Åström C (2015) Circular knotworks II: combining pattern No.295 with Turk’s heads. J Math Arts 9(3–4):91–102
Åström A, Åström C (2017) A practical approach to circular knotworks consisting of pattern No.295. International Guild of Knot Tyers/Gipping Press, Needham Market
Atil E (1975) Art of the Arab world. Smithsonian Institution, Washington, DC
Bain G (1973) Celtic art: the methods of construction. Dover, Mineola
Bain I (1986) Celtic knotwork. BAS Printers Ltd, Over Wallop
Balmelle C, Blanchard-Lemée M, Darmon J, Gozlan S, Raynaud M (2002) Le Décor Géométrique De La Mosaïque Romaine: II. Répertoire graphique et descriptif décors centrés. Picard, Paris
Becker L, Kondoleon C (2005) The arts of Antioch – art historical and scientific approaches to Roman mosaics and a catalogue of the Worcester Art Museum Antioch collection. Worcester Art Museum, Worcester
Birman JS (1974) Braids, links, and mapping class groups. Annals of mathematics studies, 1st edn. Princeton University Press, Princeton
Black J, Green A (1992) Gods, demons and symbols of ancient Mesopotamia. The British Museum Press, London
Brochmann D (1941) Tyrkeknoper. Norsk Sjøfartsmuseum, 1st edn. Oslo, Norway
Buchanan B (1981) Early near eastern seals in the Yale Babylonian collection. Yale University Press, New Haven
Bush T (1985) Form and decoration of arrows from the highlands of Papua New Guinea. Rec Aust Mus 37(5):255–293
Canute K (2008) Hypothesis, rule or law? The fourth installment on a philosophy of knots. Knotting Matters 27(98):28–29
Carmichael EK (1922) The elements of Celtic art, 1st edn. An Comunn Gaidhealach, Glasgow
Carnegie H (ed) (1908) Catalogue of the collection of antique gems formed by James Ninth Earl of Southesk K.T., vol 2. Bernard Quaritch, London
Carter E, Bahrani Z, André-Salvini B, Caubet A, Tallon F, Aruz J, Deschesne O (1992) The old elamite period. In: Harper PO, Aruz J, Tallon F (eds) The royal city of Susa – ancient near eastern treasures in the Louvre, pp 81–120. The Metropolitan Museum of Art, New York
Clagett M (1999) Ancient Egyptian science a source book, vol 3. Ancient Egyptian mathematics of memoirs of the American philosophical society, vol 232. American Philosophical Society, Philadelphia
Coleman J (1997) Turks head knots and the rule of the greatest common factor. Knotting Matters 16(57):22–25
Coleman J (2008) Turks head knots and the role of the greatest common factor. Knotting Matters 27(100):31–33


Conrad NJ (2009) A female figurine from the basal Aurignacian of Hohle Fels Cave in southwestern Germany. Nature 459(7244):248–252
Conway JH, Burgiel H, Goodman-Strauss C (2008) The symmetries of things. Ak Peters series. CRC Press, Taylor & Francis Group, New York
Coxeter HSM (1989) Introduction to geometry, 2nd edn. Wiley, New York
Cromwell PR (1993) Celtic knotwork: mathematical art. Math Intelligencer 15(1):36–47
Cromwell PR (2004) Knots and links, 1st edn. Cambridge University Press, Cambridge
Cromwell PR (2008) The distribution of knot types in celtic interlaced ornament. J Math Arts 2(2):61–68
Cromwell PR (2010) Islamic geometric designs from the Topkapı scroll II: a modular design system. J Math Arts 4(3):119–136
Cubbon A (1996) The art of the Manx crosses. Manx National Heritage, The Manx Museum and National Trust, Douglas, Isle of Man
Cumming JG (1857) The Runic and other monumental remains of the isle of man. Bell and Daldy, London
de Morgan J (1924) Prehistoric man – a general outline of prehistory. The Edinburgh Press, Edinburgh
Delaporte L (1923) Catalogue des Cylindres – Cachets et Pierres Gravées de Style Oriental. Librairie Hachette, Paris
Denham S (2013) The meanings of late Neolithic stamp seals in North Mesopotamia. PhD dissertation, University of Manchester
Dunbabin MK (2006) Mosaics of the Greek and Roman world. Cambridge University Press, Cambridge
Dunham D (2000) Hyperbolic celtic knot patterns. In: Sarhangi R (ed) Bridges: mathematical connections in art, music, and science. Southwestern College, Winfield, pp 13–22
El-Said I, Parman A (1976) Geometric concepts in Islamic art. World of Islam Festival Publishing Company Ltd, London
Finlay I (1973) Celtic art – an introduction. Noyes Press, New Jersey
Fisher G, Mellor B (2004) On the topology of celtic knot designs. In: Sarhangi R, Séquin C (eds) Bridges: mathematical connections in art, music, and science. Southwestern College, Winfield, pp 37–44
Frankfort H (1939) Cylinder seals – a documentary essay on the art and religion of the ancient near east. Macmillan and Co., London
Frankfort H (1955) Stratified cylinder seals from the Diyala region. The University of Chicago Oriental Institute Publications, vol LXXII. The University of Chicago Press, Chicago
Fyfe A (2008) Gender, mobility and population history: exploring material culture distributions in the Upper Sepik and Central New Guinea. PhD dissertation, The University of Adelaide
Gayet A (1893) L’Art Arabe. Ancienne Maison Quantin, Paris
Gerdes P (1989) Reconstruction and extension of lost symmetries: examples from the tamil of South India. Comp Math App 17(4–6):791–813
Gerdes P (1999) Geometry from Africa: mathematical and educational explorations. The Mathematical Association of America, Washington
Gerdes P (2007a) Drawings from Angola: living mathematics. Privately Published, Maputo
Gerdes P (2007b) LUNDA geometry: mirror curves, designs, knots, polyominoes, patterns, symmetries, 2nd edn. Lulu Enterprises, Morrisville. First published in 1996
Grant B (2002) Encyclopedia of rawhide and leather braiding, 1st edn. Cornell Maritime Press, Centreville
Grünbaum B, Shephard GC (1985) Symmetry groups of knots. Math Mag 58(3):161–165
Grünbaum B, Shephard GC (1989) Tilings and patterns – an introduction, 1st edn. W.H. Freeman and Company, New York
Grünbaum B, Shephard GC (1992) Interlace patterns in islamic and moorish art. Leonardo 25(3/4):331–339. Visual mathematics: special double issue
Hall T (1996) Introduction to Turk’s-head knots, 1st edn. Privately Published

Heath STL (1956a) The thirteen books of Euclid’s elements, 2nd edn. Books I and II, vol 1. Dover Publications Inc., New York
Heath STL (1956b) The thirteen books of Euclid’s elements, 2nd edn. Books III–IX, vol 2. Dover Publications Inc., New York
Heath STL (1956c) The thirteen books of Euclid’s elements, 2nd edn. Books X–XIII, vol 3. Dover Publications Inc., New York
Herzfeld EE (1941) Iran in the ancient east – archaeological studies presented in the Lowell lectures at Boston. Oxford University Press, London
Hiebert FT (1994) Origins of the bronze age oasis civilization in Central Asia. American School of Prehistoric Research Bulletin 42. Peabody Museum of Archaeology and Ethnology Harvard University, Cambridge
Horne CE, Hann MA (1998) The geometrical basis of patterns and tilings: a review of conceptual developments. J Text Inst 89(1):27–46
Horne CE (2000) Geometric symmetry in patterns and tilings, 1st edn. Woodhead Publishing Ltd/CRC Press LLC, Cornwall
Jablan S (2001) Mirror curves. In: Sarhangi R, Séquin C (eds) Bridges: mathematical connections in art, music, and science. Southwestern College, Winfield, pp 233–246
Jablan S (2012) Mirror-curves and knot mosaics. Comput Math Appl 64(4):527–543
Jablan SV (1995) Curves generated by mirror reflections. Filomat 9(2):143–148
Jablan SV (2002) Symmetry, ornament and modularity. Series on knots and everything, vol 30. World Scientific, Singapore
Jablan SV, Radović L, Sazdanović R, Zeković A (2012) Knots in art. Symmetry 4(4):302–328
Kaplan M, Cohen E (2003) Computer generated celtic design. In: Proceedings of the 14th Eurographics workshop on rendering, EGRW’03, pp 9–19
Kauffman LH (1987) On knots. Annals of mathematics studies, 1st edn, vol 115. Princeton University Press, New Jersey
Kauffman LH (1993) Knots and physics. Series on knots and everything, 2nd edn, vol 1. World Scientific, Singapore
Kawauchi A (1996) A survey of knot theory. Birkhäuser Basel, Basel
Kendrick TD, Kitzinger E, Allen D (1939) The sutton hoo finds. Br Mus Q 13(4):ii+111–136
Kermode P (1907) Manx crosses. The Pinkfoot Press, Balgavies, Angus. Reprinted 1994
Kline M (1972) Mathematical thought from ancient to modern times. Oxford University Press, New York
Knoll E, Taylor T, Landry W, Carreiro P, Puxley K, Harrison K (2017) The aesthetics of colour in mathematical diagramming. In: Swart D, Séquin CH, Fenyvesi K (eds) Proceedings of bridges 2017: mathematics, art, music, architecture, education, culture, pp 563–570
Köhler EC (2011) The rise of the Egyptian state. In: Teeter E (ed) Before the pyramids – the origins of Egyptian civilization. Oriental Institute Museum Publications 33. The Oriental Institute of the University of Chicago, Chicago, pp 123–125
Krötenheerdt O (1964) Über eine speziellen typ alternierender knoten. Mathematische Annalen 153(4):270–284
Krötenheerdt O (1971) Zur lösung des isotopieproblems der rosettenknoten. In: Herrmann M, Kertész A, Krötenheerdt O (eds) Beiträge zur Algebra und Geometrie 1. Springer, Berlin/Heidelberg, pp 19–31
Kvavadze E, Bar-Yosef O, Belfer-Cohen A, Boaretto E, Jakeli N, Matskevich Z, Meshveliani T (2009) 30,000-year-old wild flax fibers. Science 325(5946):1359
Lasić VD (1995) Pleterni Ukras – Od Najstarijih Vremena Do Danas Njegov Likovni Oblik I Značenje. Ziral, Chicago. Translated: the twist or guilloche as ornament from ancient times to the present: its exterior form and inner meaning
Lever D (1819) The young sea officer’s sheet anchor, or a key to the leading of rigging, and to practical seamanship, 2nd edn. Dover Publications, Inc., Mineola. Reprinted in 1998
Liu Y, Toussaint GT (2010) Unraveling roman mosaic meander patterns: a simple algorithm for their generation. J Math Arts 4(1):1–11

Livingston C (1993) Knot theory. Carus mathematical monographs, vol 24, 1st edn. The Mathematical Association of America, Washington
Lomonaco SJ, Kauffman LH (2008) Quantum knots and mosaics. Quantum Inf Process 7(2–3):85–115
Lund K (1968) Måtter og rosetter, 1st edn. Borgen, Ringkøbing
Lurker M (2005) An illustrated dictionary of the gods and symbols of ancient Egypt. Thames and Hudson, London
Mackay EJH (1937a) Further excavations at Mohenjo-Daro, vol. I: text. Government of India Press, New Delhi
Mackay EJH (1937b) Further excavations at Mohenjo-Daro, vol. II: plate I – CXLVI. Government of India Press, New Delhi
Mackay EJH (1943) Chanhu-Daro excavations 1935-36. American oriental series, vol 20. American Oriental Society, New Haven
Madden AM (2014) Corpus of Byzantine church mosaic pavements from Israel and the Palestinian territories. Peeters, Leuven
Megaw R, Megaw V (2001) Celtic art – from its beginnings to the book of kells. Thames and Hudson, New York
Mellaart J (1964) Excavations at Çatal hüyük, 1963, third preliminary report. Anatolian Studies 14(1):39–119
Mellaart J (1975) The neolithic of the near East. Thames and Hudson Ltd., London
Moortgat A (1945) Die Entstehung der Sumerischen Hochkultur. Der Alte Orient, Band 43. J. C. Hinrichs Verlag, Leipzig
Müller-Karpe H (1968a) Handbuch der Vorgeschichte – Jungsteinzeit, vol 2. Text. C.H. Beck’sche Verlagsbuchhandlung, München
Müller-Karpe H (1968b) Handbuch der Vorgeschichte – Jungsteinzeit, vol 2. Tafeln. C.H. Beck’sche Verlagsbuchhandlung, München
Murasugi K (1965) Remarks on rosette knots. Math Ann 158(5):290–292
Murasugi K (1971) On periodic knots. Commentarii mathematici Helvetici 46:162–177
Neugebauer O (1969) The exact sciences in antiquity, 2nd edn. Dover Publications Inc., Mineola
Neugebauer O, Sachs A (1945) Mathematical cuneiform texts. Pub. jointly by the American Oriental Society and the American Schools of Oriental Research, New Haven
Nilsen KW (1978) Om tyrkerknop slått i handa. In: Molaug S, Kolltveit B, Dahl GB (eds) Norsk Sjøfartsmuseum, 1st edn. Aktietrykkeriet i Stavanger, Oslo, pp 105–154
Öhrvall H (1908) Om knutar, 1st edn. Bonniers, Stockholm
Olsén P (1945) Die Saxe Von Valsgärde I. Almqvist & Wiksells Boktryckeri AB, Uppsala
Ovadiah A (1980) Geometric and floral patterns in ancient Mosaics: a study of their origin in the Mosaics from the classical period to the age of Augustus. L’Erma di Bretschneider, Rome
Parzysz B (2009) Using key diagrams to design and construct roman geometric mosaics? Nexus Netw J 11(2):273–288
Petrie WMF (1920) Egyptian decorative art – a course of lectures delivered by the royal institution, 2nd edn. Methuen & Co, LTD., London
Piggott S (1950) Prehistoric India – to 1000 B.C. Penguin Books LTD, Harmondsworth
Pike AWG, Hoffmann DL, García-Diez M, Pettitt PB, Alcolea J, De Balbín R, González-Sainz C, de las Heras C, Lasheras JA, Montes R, Zilhão J (2012) U-series dating of paleolithic art in 11 caves in Spain. Science 336(6087):1409–1413
Pittman H (1987) Ancient art in miniature – near eastern seals from the collection of Martin and Sarah Cherkasky. The Metropolitan Museum of Art, New York
Porada E, Hansen DP, Dunham S, Babcock SH (1992) The chronology of mesopotamia, ca. 7000–1600 B.C. In: Ehrich RW (ed) Chronologies in old world archaeology, 3rd edn, vol 1. University of Chicago Press, Chicago, pp 77–121
Przytycki JH (2009) The trieste look at knot theory. In: Kauffman LH, Lambropoulou S, Jablan S, Przytycki J (eds) Introductory lectures on knot theory: selected lectures presented at the advanced school and conference on knot theory and its applications to physics and biology. Series on knots and everything, vol 46. World Scientific Publishing Co, Singapore, pp 407–441

Reidemeister K (1948) Knotentheorie. Ergebnisse Der Mathematik und Ihrer Grenzgebiete, vol 1. Chelsea Publishing Company, New York
Rice DT (1986) Islamic art. Thames and Hudson, London
Richter GMA (1959) A handbook of Greek art. Phaidon Press Limited, London. Reprinted 1994
Röding JH (1798a) Allgemeines Wörterbuch Der Marine, vol 3. Licentiat Nemnich und Adam Friedrich Böhme, Hamburg
Röding JH (1798b) Allgemeines Wörterbuch Der Marine, vol 4. Licentiat Nemnich und Adam Friedrich Böhme, Hamburg
Rosen J (2008) Symmetry rules. How science and nature are founded on symmetry, 1st edn. Springer, Berlin
Sarianidi V (1986) Die Kunst des alten Afghanistan. VEB E.A. Seemann Verlag, Leipzig
Sarianidi VI (1981a) Margiana in the bronze age. In: Kohl PL (ed) The bronze age civilization of Central Asia – recent Soviet discoveries. M. E. Sharpe, Inc., New York, pp 165–193
Sarianidi VI (1981b) Seal-amulets of the murghab style. In: Kohl PL (ed) The bronze age civilization of Central Asia – recent Soviet discoveries. M. E. Sharpe, Inc., New York, pp 221–255
Schaake GA, Hall T (1995) Braiding instructions and their presentation. Braider 1(1):2–12
Schaake GA, Hall T, Turner JC (1992) Braiding – standard herringbone knots. A series of books on braiding, book 3/1, 1st edn. Department of Mathematics and Statistics University of Waikato, Hamilton
Schaake GA, Turner JC (1988) A new theory of braiding. Research report 1/1, No. 165, 1st edn. Privately Published, Hamilton
Schaake GA, Turner JC (1991) An introduction to flat braids. Pamphlet No. 5, 1st edn. Privately Published, Hamilton
Schaake GA, Turner JC, Sedgwick DA (1988) Braiding – regular knots. A series of books on braiding, book 1/1, 1st edn. Department of Mathematics and Statistics University of Waikato, Hamilton
Schaake GA, Turner JC, Sedgwick DA (1990) Braiding – regular fiador knots. A series of books on braiding, book 2/1, 1st edn. Department of Mathematics and Statistics University of Waikato, Hamilton
Schaake GA, Turner JC, Sedgwick DA (1991) Braiding – standard herringbone pineapple knots. A series of books on braiding, book 4/1, 1st edn. Department of Mathematics and Statistics University of Waikato, Hamilton
Soffer O (1997) The mutability of upper paleolithic art in central and Eastern Europe: patterning and significance. In: Conkey M, Soffer O, Stratmann D, Jablonski N (eds) Beyond art: pleistocene image and symbol, Wattis symposium series in anthropology, Memoirs of the California academy of sciences, vol 23. University of California Press, San Francisco, pp 239–262
Stormes C, Reeves D (2010) Luis Ortega’s rawhide artistry – braiding in the California tradition. University of Oklahoma Press, Oklahoma
Teissier B (1984) Ancient near eastern cylinder seals – from the Marcopoli collection. University of California Press, Los Angeles
Thurston WP, Levy S (1997) Three-dimensional geometry and topology, vol 1. Princeton mathematical series, 35. Princeton University Press, New Jersey
Turner JC, Schaake GA (1991) A proof of the law of the common divisor in braids. Knotting Matters 10(35):6–10
Turner JC, Schaake GA, Sedgwick DA (1991) Introducing grid-diagrams in braiding, 1st edn. Privately Published, Hamilton
Van de Griend P (1994) Ketupat knot designs. Privately Published, Århus
Van De Griend P (1996) A history of topological knot theory. In: History and science of knots. Series on knots and everything, vol 11. World Scientific Publishing Co. Pte. Ltd., Singapore, pp 205–260

Von der Osten HH (1934) Ancient oriental seals in the collection of Mr. Edward T. Newell. The University of Chicago Oriental Institute Publications, vol XXII. The University of Chicago Press, Chicago
Wallis H (1898) Egyptian ceramic art – the Macgregor collection. Taylor and Francis, London
Ward WH (1899) The hittite gods in hittite art. Am J Archaeol 3(1):1–39
Ward WH (1909) Cylinders and other ancient oriental seals – in the library of J. Pierpont Morgan. Privately published, New York
Ward WH (1910) The seal cylinders of western Asia. Carnegie Institution of Washington, Washington, DC
Washburn DK, Crowe DW (1991) Symmetries of culture: theory and practice of plane pattern analysis, 1st edn. University of Washington Press, Seattle/London
Westwood JO (1988) The art of illuminated manuscripts – illustrated sacred writings. Arch Cape Press, New York
Weyl H (1952) Symmetry, 1st edn. Princeton University Press, Princeton
Williams D (1999) Greek vases, 2nd edn. British Museum Press, London

Comparative Temple Geometries

24

Kelly McGonigal

Contents
Introduction
Islamic Region and Religion
Trading Mathematics and Art
Islamic Mathematics
Islamic Geometric Patterns and Art
Japanese Mathematics
Japanese Temple Geometry
Conclusion
Cross-References
References

Abstract

This chapter explores the historical development of mathematically influenced artistic works in the Islamic and Japanese regions. The main artworks examined are Islamic geometric patterning and Japanese temple geometry. Through a comparative analysis of the historical developments of these regions, we can gain a perspective on cultural influences, on the interconnectivity of ancient civilizations, and on how these aspects shaped mathematical artworks. Cultural influences such as religion and beliefs play an important role in mathematics and art. The Islamic regions and Japan may at first seem culturally isolated from one another, but an exploration of ancient trade routes such as the Silk Road diminishes this idea of isolation. We take a historical look at the transfer of mathematics to the Islamic and Japanese regions and at the native developments in mathematics of both regions.

K. McGonigal, Independent Scholar, Anaconda, MT, USA
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_22


Keywords Islamic geometric patterning · Islam · Japanese temple geometry · Sangaku · Shinto · Silk Road

Introduction

Exploring the convergence of mathematics, religion, culture, and visual artistry from a historical viewpoint can reveal both contrasts and interesting similarities among the artworks of different cultures. This chapter discusses the characteristics of visual artwork containing Islamic geometric patterning from the Islamic region and the artwork of Japanese temple geometry known as Sangaku. The Islamic and Japanese regions have been chosen for comparison because they were historically connected by trade routes such as the Silk Road. Islamic geometric patterns developed through collaborations between artists and mathematicians. Japanese temple geometry was developed by unlikely mathematicians, such as samurai, farmers, and children. Both kinds of artwork were deeply influenced by cultural ideals and religion. From simple, elegant geometric constructions to elaborate geometric patterning of architectural structures, this chapter provides a historical comparative analysis of Islamic geometric patterning and Japanese temple geometry.

Islamic Region and Religion

We first get a sense of the time and geography of the Islamic region. The period under discussion is an interval of roughly 700 years, from about 750 CE to 1450 CE. The Islamic region, or Medieval Islam, refers to the regions of the world where the religion of Islam is most prominent. The Islamic religion was first introduced by the prophet Muhammad early in the seventh century CE. The teachings of Muhammad first came to Arabic people and eventually led to the spread of Islam by missionaries and armies (Katz, 2007). Geographically, Medieval Islam stretched from the Iberian Peninsula through North Africa and the Middle East to the central Asian republics of the former Soviet Union, Afghanistan, Iran, and even into parts of India (Katz, 2007). This agrees with the geographic classification of the Islamic region given in The Crest of the Peacock by Joseph (2000).

When speaking of the mathematics of this region, various authors prefer the term Arab mathematics, since the main language was Arabic, but the medieval Islamic regions were not home to Arabic-speaking people alone. Iranians, Egyptians, and Moroccans were also part of the medieval Islamic regions. Thus, the mathematics is better categorized as Islamic mathematics (Katz, 2007). Similarly, the notion of Islamic artworks is associated with the places and peoples of the Islamic region defined above.

It is important to get a sense of how the Islamic religion affected both Islamic mathematics and Islamic artworks. The word “Islam” means submission, or submission to God, and Islam is classified as a monotheistic religion. In relation to mathematics, the Islamic religion perhaps acted as a gateway or spark for a new enthusiasm for higher learning. This period of enthusiasm for learning mathematics and science, from 750 CE to 900 CE, is known as the Muslim acquisition (Katz, 2007).

Religion also had an impact on Islamic art. Since Islam is based on monotheism, artistic works tended to avoid imagery that could be interpreted as idolatrous. The Islamic religion is based on the religious text called the Qur’an. Although the Qur’an gives no specific instructions or comments on permissible artwork, scholars have conjectured how Islamic peoples may have interpreted certain passages with regard to artwork. One such passage (Qur’an 5:93) reads, “O you, who believe, indeed wine, games of chance, statues, and arrows for divination are a crime, originating in Satan.” The word statue in this passage most likely refers to false idols. This cultural and religious influence led to an opposition to creating representational forms of living entities in artworks (Ettinghausen et al., 2002). Artists could therefore create religiously appropriate artworks by using abstract geometric patterning and ornamentation (El-Said and Parman, 1976). Aside from geometric patterns, artists could also incorporate calligraphy, figural forms, and plant or floral motifs into their artworks, as seen in Fig. 1.

Besides religious reasons, other theories have been offered for what led to the formation of Islamic patterning. From a western point of view, there is a tendency to grant Islamic geometric patterning only decorative merit, but perhaps it has a deeper significance. It is interesting to note that patterning that becomes ornamental is, as Carol Bier puts it, “a function of completion” (Bier, 2008). That is, the algorithm of a repeated process, such as weaving or stacking, leads through repetition to a finished work of art that contains geometry. Complex patterns can therefore possibly be formed without an understanding of sophisticated mathematics. There are also some obvious theories, such as that Islamic culture simply had a love of ornament. Another plausible theory is that patterning could have been used as a marker of dynastic or political remembrance (Bier, 2008).

It is also important to get a sense of the underlying principles of belief that may govern various forms or artistic structures in Islamic artwork. According to Jale Nejdet Erzen, there are three principles: the principle of constant change within permanence, the principle of the uncertainty of human cognition, and the principle of love, or understanding with the heart (Erzen, 2007). The first principle, constant change within permanence, expresses the constant flow and flux in the world: constant movement in a universe that is permanent. This is lofty subject matter, but it could be that artists wanted to create something worthy of the creation of God and, in some ways, artwork that would bring humans closer to God. The second principle mostly deals with the incorporation of mirrors and screens in Islamic architecture, and the third deals with

Fig. 1 Detail of Jameh Mosque of Yazd incorporates geometric patterns with floral motifs and calligraphy. (Image by Hesamkhandan. CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=50898633)

the admiration between artists and God (Erzen, 2007). We now turn our attention to the transmission of artistic and mathematical ideas.

Trading Mathematics and Art

As ancient cultures developed economically and became involved in commerce with other regions, many goods and ideas were transferred from culture to culture. One of the most famous trade routes is the Silk Road. Opening around

105 BCE or 115 BCE, and possibly dating back as much as 2000 years earlier, the Silk Road served as the major trade and communication link between the Mediterranean and China. Traders, fortune hunters, adventurers, soldiers, pilgrims, wanderers, emigrants, and refugees followed the great path of the Silk Road (Franck and Brownstone, 1986). Given the variety of people who traversed the Silk Road, it seems plausible that mathematicians were among them. It is interesting to take note of the mathematics that may have found its way from the Islamic region to Japan, although such ideas would not have taken a direct route via the Silk Road. As we will discuss later, many of Japan’s early mathematical influences came from China. This is why we must look at the Silk Road: ideas from Islamic mathematics would have reached China first and could then have been transferred to Japan.

We now examine a plausible transfer of mathematics between the Islamic region, China, and Japan. One of the many great achievements of Islamic mathematics was the methods of Ibn al-Haytham and al-Buzjani for creating what are now known as “magic squares” (Katz, 2007). According to Joseph (2000), the Chinese were first introduced to magic squares in the third millennium BCE, when Emperor Yu acquired two diagrams. As the legend goes, Emperor Yu received the first diagram, Ho Thu (River Chart), from a dragon-horse that emerged from the Huang Ho River. The second diagram, Lo shu (Lo river writing), was on the back of a sacred turtle, which was found in the Lo River. Chinese methods for constructing magic squares were not published until the mathematician Yang Hui did so in 1275 CE (Joseph, 2000). Japanese mathematics from the early ages (before 522 CE) to 1600 CE was heavily influenced by Chinese learning (Smith and Yoshio, 1914). Perhaps knowledge of magic squares began in the waters of Chinese rivers. Then, through the spread of Islam or through transfer along the Silk Road, magic squares fell into the hands of Islamic mathematicians, who found methods for their construction. These methods returned to the Chinese, who then passed this accumulated knowledge of magic squares on to Japan.

According to the historian of mathematics Jean-Claude Martzloff, ideas from Chinese mathematics spread to both the east and the west. It is hard to determine exact transmissions; however, we can look at similarities in techniques or ideas and identify possible influences. Geometric and algebraic relationships in al-Khwarizmi’s works show fundamental similarities to Chinese mathematics. The rule of double-false position, which originated in the Nine Chapters, can be found in Islamic mathematics. We also know of mathematical ideas that were possibly transmitted to China; these include Islamic (Arabic) spherical trigonometry and the Arabic notation for numbers. Martzloff claims that many of the mathematical ideas transferred to and from China were transmitted along the Silk Road. We must, however, consider whether ideas were shared and accepted or whether there were simply parallel developments in mathematics. Martzloff concludes that parallel developments may have occurred, while it is less likely that China rejected all outside influence and developed in complete isolation. Chinese mathematical ideas were also transmitted to Japan and Korea. We will examine some of the texts that originated in China and were introduced in Japan later,

but for now we note that many of China’s mathematical ideas on algebra were transmitted to Japan and Korea (Martzloff, 1997). This illustrates the dynamic transfer and development of mathematical ideas during ancient times. It also suggests that mathematics from the Islamic region was transferred to Japan, most likely by passing through China. Mathematics crawled across a vast, ancient web.

Mathematical art was also transferred from Islamic regions to Japan. Silk textiles made in Iran and adorned with patterns created by simple algorithms eventually made their way to churches in Europe and emperors’ shrines in Japan (Bier, 2009). As we have seen, the Islamic religion played a major role in motivating decorative geometric patterning in Islamic art. The Silk Road provided a route along which the Islamic religion spread; however, this spread led to contact with other religions such as Buddhism. Thus there may have been a Buddhist influence on Islamic art. There are even claims that the geometrical patterning and decorative forms found in Buddhist ivory panels and stone carvings may have been copied by Islamic artists (Elverskog, 2010). This is interesting, since the Japanese temple geometry we will soon examine also has ties to Buddhism.

The Qur’an influenced Islamic art in that only nonrepresentational forms were allowed. It is also interesting to note how Islamic art eventually came to accept representational forms. As Islam spread to areas where representational art was acceptable, the view of which representational forms were allowable in art slowly changed. Representational living forms also crept into Islamic art as new ideas progressed. As we have seen, the Qur’an forbade idolatrous forms in artwork; however, Islamic artists began to reason that, since idols are three-dimensional statues, an idol casts a shadow. By this logic, a representational living form depicted in an artwork without a shadow was not an idol (Elverskog, 2010). We will now look at the mathematics that developed in the Islamic region.
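The magic squares mentioned in this section can be illustrated with a short sketch. It checks the 3 × 3 arrangement traditionally associated with the Lo shu diagram and builds larger odd-order squares. The construction used here is the well-known "Siamese" rule, chosen only because it is compact; it is an assumption made for illustration and is not claimed to be the method of Ibn al-Haytham, al-Buzjani, or Yang Hui. The function names are hypothetical.

```python
def is_magic(square):
    """Return True if all rows, columns, and both diagonals share one sum."""
    n = len(square)
    target = sum(square[0])
    rows = all(sum(row) == target for row in square)
    cols = all(sum(square[r][c] for r in range(n)) == target for c in range(n))
    diag = sum(square[i][i] for i in range(n)) == target
    anti = sum(square[i][n - 1 - i] for i in range(n)) == target
    return rows and cols and diag and anti


# The 3x3 arrangement traditionally associated with the Lo shu diagram.
LO_SHU = [
    [4, 9, 2],
    [3, 5, 7],
    [8, 1, 6],
]


def siamese_square(n):
    """Build an odd-order magic square with the 'Siamese' rule.

    Assumption: a later, widely known recipe used only for illustration,
    not one of the historical methods discussed in the text.
    """
    if n % 2 == 0:
        raise ValueError("this rule only works for odd n")
    square = [[0] * n for _ in range(n)]
    row, col = 0, n // 2  # start in the middle of the top row
    for value in range(1, n * n + 1):
        square[row][col] = value
        # Move up and to the right, wrapping around the edges.
        next_row, next_col = (row - 1) % n, (col + 1) % n
        if square[next_row][next_col]:
            # Target cell already filled: step down one cell instead.
            next_row, next_col = (row + 1) % n, col
        row, col = next_row, next_col
    return square


if __name__ == "__main__":
    print("Lo shu is magic:", is_magic(LO_SHU))
    for line in siamese_square(5):
        print(line)
```

Every row, column, and diagonal of an order-n square filled with 1 through n² must sum to n(n² + 1)/2, which gives 15 for the Lo shu arrangement.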

Islamic Mathematics

Islamic mathematics can be grouped into three traditions: the first is geometry from Greek mathematics, from mathematicians such as Archimedes, Apollonius, Diophantus, Euclid, and Heron; the second is an adoption of the Hindu arithmetic system along with algebraic methods, trigonometry, solid geometry, and astronomy; and the third is practical mathematics, which served tax and treasury officials, merchants, surveyors, builders, and artisans in geometric design (Katz, 2007). It is important to look at the influences on mathematics that came from outside the Islamic region. As the Islamic region grew, scholars and mathematicians were exposed to Greek mathematics. Before Islamic mathematicians and scholars could begin learning the works of Greek mathematicians, the Greek texts first had to be translated. Three brothers, Muhammad, Ahmad, and al-Hasan ibn Musa, worked with the linguist Thabit ibn Qurra to produce Arabic translations of Euclid’s Elements, Archimedes’ On the Measurement of the Circle and On the Sphere and Cylinder, Ptolemy’s

Almagest, and Conics by Apollonius of Perga (Katz, 2007). These works were available to Islamic mathematicians by the end of the ninth century (Ball, 1960). The geometric constructions that came from the Greek works were crucial for the geometric patterning of Islamic art. Islamic mathematicians understood the methods and proofs of geometrical constructions, which allowed them to collaborate with artisans and architects. Influences from the west may also have included the philosophy of art and geometry. Classical ideas and ideas from antiquity could have influenced Islamic culture’s view of the beauty that is derived from mathematics (Bier, 2008). In the eighth or ninth century, an Arabic translation of Plato’s Timaeus became available to Islamic mathematicians. In Timaeus, Plato discusses the right isosceles triangle and the half equilateral triangle. From these triangles he constructs the three-dimensional forms known as the Platonic solids. The right isosceles triangle contains √2, and the half equilateral triangle contains √3. Much of the two-dimensional Islamic patterning contains these two ratios (Bier, 2009). Another important influence came from Indian mathematics in the mid-seventh century. The Indian base-10 positional system aided Islamic mathematicians in the development of algebra (Katz, 2007). Islamic mathematicians made a number of interesting extensions of Greek works, including Abu al-Wafa al-Buzjani’s method for constructing a regular pentagon with a compass of fixed opening and Abu Sahl al-Kuhi’s method of inscribing an equilateral pentagon in a square (Katz, 2007).
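The two ratios attributed above to Plato’s triangles follow from the Pythagorean theorem. As a small worked sketch, assume legs of length 1 for the right isosceles triangle and side length 2 for the equilateral triangle; these unit choices are ours, made only for illustration. Here c is the hypotenuse of the right isosceles triangle and h is the altitude of the equilateral triangle, which is the longer leg of the half equilateral triangle.

```latex
\[
c^2 = 1^2 + 1^2 = 2 \quad\Longrightarrow\quad c = \sqrt{2},
\]
\[
h^2 = 2^2 - 1^2 = 3 \quad\Longrightarrow\quad h = \sqrt{3}.
\]
```

Under these assumptions the right isosceles triangle has sides 1, 1, √2 and the half equilateral triangle has sides 1, √3, 2.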

Islamic Geometric Patterns and Art

Mathematics can take on many forms of beauty. We see this beauty visually in the geometric patterning of Islamic art. Were the artisans trained in mathematics? Or were mathematicians involved in training the artists? In many Islamic regions, patterning was applied to almost all mediums, including architecture, metalwork, ceramics, textiles, and book illumination. In some cases the patterns are simple algorithms that can be easily understood and used by artists. Some patterns applied to architecture, such as the patterning of a full circular vault, may require mathematics beyond what a common artisan understood at that time. When artists make a pattern, they become immediately involved in principles of applied geometry. Formulas and symbolic expressions can remain unknown to the artist, who needs only the idea of repetition and the laws of symmetry. Examples of basic grid layouts an Islamic artist would have used include a grid of seven overlapping circles and a triangular grid. Artists could use a grid and another simple algorithm to create a desired pattern, or they could improvise to create interesting, intricate patterns.

Some problems and constructions went beyond the scope of the artist, and some Islamic texts were published to address such problems. Abu al-Wafa al-Buzjani wrote the book On the Geometric Constructions Necessary for the Artisan. In his book he describes how he collaborated with artisans to teach them correct geometric constructions. The artisans were aware of cutting up square material

to make patterns, but Abu al-Wafa al-Buzjani claimed that the artisans would occasionally make elementary mistakes. In this work Abu al-Wafa al-Buzjani examines how the artisan is concerned with the correctness of the construction as perceived by his senses, while the geometer is concerned with the correctness of the proof as perceived by his imagination. The errors made in construction are due to the artisan’s unfamiliarity with proof and the geometer’s unfamiliarity with construction (meaning construction as a craft, not geometric construction). Abu al-Wafa al-Buzjani discusses a meeting of artisans and geometers trying to work out how to construct one square from three squares (Katz, 2007). In this discussion the artisans devise two methods that seem correct to the eye but are geometrically incorrect. The artisans, however, keep in mind that they will be cutting material, so a simple layout for the cuts is best. The geometers then propose their method to the artisans. Their method is geometrically correct; however, cutting the material in this way would be impractical for the artisan. Figure 2 shows the interior of the Dome of Sheikh Lotfollah Mosque. Collaborations between artisans and mathematicians produced breathtaking patterns and designs adorning architecture.

Another important text was a Persian manuscript, On Inscribing Similar and Congruent Figures. The manuscript contains many geometry problems. Without

Fig. 2 Dome of Sheikh Lotfollah Mosque, Isfahan, Iran. Completed in 1619 CE, the interior dome exemplifies the stunning visual beauty and complexity of Islamic art. (Image by Adam Jones from Kelowna, BC, Canada https://creativecommons.org/licenses/by-sa/2.0/legalcode)


proofs, the author, who remains unknown, gives methods and constructions for separating polygons into smaller polygons that can be reassembled into other polygons.

In the Topkapi Museum in Istanbul, there is a scroll which contains patterns that can be applied to architectural structures. This scroll is known as the Topkapi scroll and was published by Gulru Necipoglu. “The mathematics of these structures attracted the attention of mathematicians of the caliber of al-Kashi, and considerable mathematical investigation must have been necessary for many of the patterns” (Berggren, 2007, p. 620). The Topkapi scroll is not Necipoglu’s original. The original contained the patterns along with Necipoglu’s construction lines. The Topkapi scroll contains a pattern for a full circular vault. The portion shown is only a fourth of the pattern and can be repeated to cover the entire vault. The vault, when completed with this patterning, would have 24-fold rotational symmetry.

The geometric art of Islam was shaped by the artist’s hand and the mathematician’s mind. This close collaboration of artist and mathematician produced breathtaking, impressive, and beautiful artwork; but where does the beauty arise from? Is it the system of mathematics behind the artwork? Is it the artist, capturing what the senses say is correct and beautiful? Perhaps it is a delicate balance between mathematical truth and truth perceived by our senses.

It is now time that we leave our examination of Islamic patterning and journey to Japan. It is perhaps the temple geometry of Japan that allows us to see mathematical beauty in artwork. We will now turn our attention to Japan’s history of mathematics and the story of temple geometry.

Japanese Mathematics

Although the oldest surviving Japanese mathematical tablet is dated 1683 CE, Japanese mathematics has a much longer history, which we survey briefly here. From ancient times up to 552 CE, there are few records documenting any sort of mathematical history. Before 522 CE, Japan may have had a system for expressing very high powers of ten, similar to Archimedes’ Sand Reckoner (Rothman and Fukagawa, 2008). According to Smith and Yoshio (1914), in 284 CE Korea introduced Chinese ideograms to Japan, which allowed Japanese scholars to learn to read and write. Around 522 CE Buddhism was introduced to Japan (Smith and Yoshio, 1914).

Japan’s development of mathematics relied on the influence of China and Korea. Most of the mathematical ideas that came from China went to Korea and were then transmitted to Japan. Later, mathematical ideas, mostly in the form of various mathematical texts, came directly from China (Hayashi, 1905). We should also note that the influence of Korea and China on Japan did not come from invasion or from China forcing its ideas upon Japan; rather, it was Japan that reached out to gain the knowledge available in Chinese mathematics.

The next important date is 701 CE, when Emperor Monbu established an institute of higher learning that included mathematical studies. Nine Chinese mathematical texts were studied. These nine Chinese works were:


1. Chou-pei Suan-ching, which is the oldest Chinese work on mathematics and includes something similar to the Pythagorean theorem, although the proof of this theorem would not come about for another 500 years
2. Sun-Tzu Suan-Ching, which includes a treatise on algebraic quantities and indeterminate equations
3. Liu-Chang, which is unknown
4. San-Kai Chung-cha, which includes a treatise on the mensuration of heights and distances
5. Wu-tsao Suan-shu, which includes a treatise on Chinese arithmetic
6. Hai-tao Suan-shu, which includes a problem about measuring an island from a faraway point
7. Chiu-szu, which is unknown
8. Chiu-chang, known as the greatest Chinese arithmetical classic (nine sections). The Chiu-chang contains topics such as mensuration of various plane figures (triangles, quadrilaterals, circles, circular segments, sectors, and the annulus), problems solved by the rule of three, extraction of square and cube roots, mensuration of solids (prism, cylinder, pyramid, circular cone, frustum of a cone, tetrahedron, and wedge), rules of false position, linear equations involving two or more unknown quantities, an equivalent formula for the Pythagorean theorem, and something of a quadratic equation

The last of the Chinese texts used at Emperor Monbu’s university was the Chui-Shu, which unfortunately is lost (Smith and Yoshio, 1914).

The Chiu-chang was the Chinese classic studied by Japanese mathematicians. The sixth chapter of the Chiu-chang deals with fair taxes and also includes pursuit problems, in which a hound chases a hare. These same problems were introduced to Europe by Islamic mathematicians (Joseph, 2000). It is not clear whether Islamic mathematicians obtained these problems from China or whether there was a transmission from the Islamic region to China. It does, however, emphasize the influence of China on both the Islamic regions and Japan.
Oddly enough, in the eighth century interest in mathematics declined rather than taking off with these newly available Chinese texts. At this time, a person who could perform arithmetic would be considered of low birth. Even stranger, “ignorance of the value of different coins was a token of good breeding” (?). From then until 1600 CE, Japan went through a dark age in mathematics: no advances were made and no texts were published.

Japanese Temple Geometry

In the 1600s CE, Christianity tried to gain a foothold in Japan, but this led to Christians being expelled from Japan. This began Japan’s isolation from the outside world, known as sakoku, which means closed country (Rothman and Fukagawa, 2008). During the period of Japan’s sakoku, many developments took place in mathematics and the sciences. This is the period when we find a niche of


Fig. 3 Sangaku of Konnoh Hachimangu Shrine, Shibuya, Tokyo, Japan. (1859 CE) (Image by Momotarou2012 https://creativecommons.org/licenses/by-sa/3.0/legalcode)

Japanese mathematics that some have classified as Japanese folk mathematics. This niche contains the creation of sangaku, wooden tablets containing sacred temple geometry problems. Sangaku contained beautifully painted geometric configurations. Included with the geometric configurations were usually formulas or statements that had to do with some geometrical property of the figures. These wooden panels were then hung at either a Shinto or a Buddhist shrine (Rothman and Fukagawa, 2008). Figure 3 shows the sangaku of Konnoh Hachimangu Shrine.

So why were these geometric tablets hung at shrines? There must have been a close tie between religion and mathematics. Sangaku actually evolved from a Shinto offering custom. The Shinto gods were known as Kami, whose spirits infuse everything from the sun and moon to rivers, mountains, and trees (Rothman and Fukagawa, 2008). Worshipers would bring gifts to the shrines in honor of the Kami. Some of these gifts were wooden panels with paintings on them. The Kami were known to be fond of horses, so panels containing pictures of horses or carvings of horses incorporated into the wood were quite common. From here there are no sure theories on why worshipers suddenly started hanging mathematical tablets, but we will examine further the Shinto religion and cultural views on art to see if we find any connections.

The Japanese are an island people who are both masters of their environment and mastered by it. Art can be seen as the response to one’s environment. Much of Japanese art is connected to the close relationship with nature. This response to nature and the beauty that is found in nature has roots in the Shinto religion. In the Shinto religion, the Kami are the objects of reverence.


On one hand, Kami can be interpreted as the nature gods of the Shinto religion. There are other interpretations as well, including Kami as the unique force in nature, animals, and humans that evokes emotions such as fear, reverence, or gratitude. Another interpretation of Kami is the force that fills people with wonder and awe (Picken, 2002). Perhaps this last interpretation best explains why a wooden tablet containing a geometric proof would be appropriate to hang at a Shinto temple. As Japanese mathematicians began working with geometric proofs, they must have experienced the awe that anyone who has completed a successful proof feels. It is sometimes odd, as a mathematician, to comment on emotions and feelings while in the realm of science, and yet it is the joy of mathematics that inspires us to continue. Viewing Kami as the force that fills one with awe may be a plausible explanation of why Japanese mathematicians felt geometric proofs would be appropriate as religious offerings.

It is interesting to note who the mathematicians or geometers that created sangaku were. Apparently samurai, farmers, women, and children were the mathematicians of the sangaku (Rothman and Fukagawa, 2008). There existed a sort of competitive spirit among the geometers of the sangaku. Some felt that if they hung a more difficult theorem at their shrine, then the gods would give them greater favor for their achievement. This is perhaps what pushed these geometry problems to more and more advanced levels.

Since we want a full understanding of how sangaku existed in Japanese culture, we must look at the criticism that came from Japanese scholars. Sangaku is a beautiful art, but according to Fujita Sadasuke (1734–1807), it had no practical purpose, since the geometry had no application to production techniques or natural sciences. In Fujita’s book Seiyou Sanpou (1781), he remarks that in mathematics there is something useful, something not immediately useful, and then something totally useless. He then remarks that the custom of hanging sangaku was totally useless work (Ogawa, 2001). Although sangaku problems may not have had practical applications, they were used in the teaching of mathematics, and by this logic they can be seen as useful in developing an intelligent culture. We must also remember that sangaku were used as religious offerings and therefore did not need to have practical application. Perhaps sangaku can be seen as having spiritual application.

Conclusion

Islamic geometric patterning and Japanese temple geometry are both sophisticated art forms that combine art and mathematics. The mathematician may see beauty in the proof or the number, and the artist may see beauty in the craft, but when the two are combined, as in Islamic geometric patterning, we witness a unique dance between mathematical truth and truth of the senses. There is no doubt that the translations of Greek classics into Arabic aided in the creation of Islamic geometric patterning. We must remember, however, that it was the native developments of Islamic mathematicians and artisans that laid the grids for geometric patterning. The religion of Islam also shares


in the shaping of Islamic patterning, as did the Shinto religion and Buddhism in Japanese temple geometry. The Islamic region was vast and open to influence while geometric patterning was being created, whereas temple geometry arose in Japan at a time when Japan was closed to outside influences. Of the influences that had reached Japan, China’s was crucial to Japan’s development of mathematics. The Nine Sections (Chiu-chang) was a work that helped Japanese geometry evolve into the gem known as sangaku. Sangaku wooden tablets exemplify how geometric proofs can be visually beautiful, intellectually stimulating, and culturally important. Islamic regions and Japan, as culturally different and geographically separated as they were, share a common thread in that they gave to the world the gift of mathematical art.

Cross-References

Islamic Design and its relation to Mathematics
Korean Traditional Patterns: Frieze and Wallpaper

References

Ball WWR (1960) A short account of the history of mathematics. Dover Publications, New York
Berggren JL (2007) Mathematics in medieval Islam. In: Katz V (ed) The mathematics of Egypt, Mesopotamia, China, India, and Islam: a sourcebook. Princeton University Press, Princeton, pp 515–675
Bier C (2008) Art and Mithāl: reading geometry as visual commentary. Iranian Stud 41(4):491–509
Bier C (2009) Number, shape, and the nature of space: thinking through Islamic art. In: Robson E, Stedall J (eds) The Oxford handbook of the history of mathematics. Oxford University Press, Oxford, pp 827–851
El-Said I, Parman A (1976) Geometric concepts in Islamic art. World of Islam Festival Publishing Company Ltd, London
Elverskog J (2010) Buddhism and Islam on the silk road. University of Pennsylvania Press, Philadelphia
Erzen JN (2007) Islamic aesthetics: an alternative way to knowledge. J Aesthet Art Critic 65:69–75. https://doi.org/10.1111/j.1540-594X.2007.00238.x
Ettinghausen R, Grabar O, Jenkins-Madina M (2002) Islamic art and architecture 650–1250. Yale University Press, New Haven
Franck IM, Brownstone DM (1986) The silk road: a history. Facts on File Publications, New York
Hayashi T (1905) A brief history of the Japanese mathematics. Nieuw Arch Wisk (Ser 2) 6, Amsterdam
Joseph GG (2000) Crest of the peacock: the non-European roots of mathematics. Princeton University Press, Princeton
Katz V (2007) The mathematics of Egypt, Mesopotamia, China, India, and Islam: a sourcebook. Princeton University Press, Princeton
Martzloff JC (1997) A history of Chinese mathematics. Springer, New York
Ogawa T (2001) A review of the history of Japanese mathematics. Revue d’histoire des Mathematiques 7:137–155
Picken SDB (2002) Historical dictionary of Shinto. The Scarecrow Press Inc, Lanham
Rothman T, Fukagawa H (2008) Sacred mathematics: Japanese temple geometry. Princeton University Press, Princeton
Smith DE, Yoshio M (1914) A history of Japanese mathematics. The Open Court Publishing Co., Chicago

Wasan Geometry

25

Hiroshi Okumura

Contents

Introduction
Wasan
Wasan Geometry
Problems Involving Congruent Circles
Congruent Circles on a Line and a Circle
Congruent Circles on a Line with Two Congruent Circles on a Line
Congruent Circles on a Line and Congruent Squares
Two Congruent Circles on a Line
Congruent Circles on a Line with Two Intersecting Congruent Circles
Two Sets of Congruent Circles on a Line and Two Circles
A Square and Three Congruent Circles in an Isosceles Triangle
Congruent Circles in a Rectangle
The Arbelos in Wasan Geometry
Two Sangaku Problems Involving a Circle of the Same Radius
Two Congruent Circles Touching a Perpendicular to AB
Two Circles Touching a Perpendicular to AB at the Same Point
Two Congruent Circles Touching an Inclined Line to AB
Congruent Circles Touching a Circle Passing Through the Center of α
Reflection in the Axis
Golden Arbelos
Arbelos with Overhang
Arbeloi Determined by a Chord
A Sangaku Problem Involving an Archimedean Circle
A Sangaku Problem Involving Two Archimedean Circles
Wasan Geometry and Division by Zero
The Configuration A(1)
A Three-Circle Problem
Practical Side
Study of Wasan Geometry: Past and Present
References

H. Okumura, Faculty of Engineering, Department of Life science and Informatics, Graduate School of Engineering, Division of Life Science and Informatics, Maebashi, Gunma, Japan. e-mail: [email protected]

© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_122

Abstract

After giving a simple introduction to traditional Japanese mathematics, called Wasan, and to Wasan geometry, we consider problems in Wasan geometry in detail in three sections and show that these problems are a rich source for mathematical study today, although many people regard Wasan as a purely historical mathematics. In the first section, we consider problems involving several congruent circles. Those figures have not been considered elsewhere, though they have interesting properties, and there are few expository writings dealing with those problems today. In the second section, we consider problems involving an arbelos formed by three mutually touching circles with collinear centers. Since it is one of the best-known plane figures and has been studied by many mathematicians, it is a very good example for seeing the different approaches to the same figure in the East and the West. In the third section, we consider a simple application of a recently proposed definition of division by zero to Wasan geometry. At the end of this chapter, we look briefly at the practical side of Wasan and Wasan geometry and give a simple history of the study of Wasan geometry.

Keywords Wasan · Wasan geometry · Sangaku · Touching circles · Congruent circles on a line · Arbelos · Archimedean circle · Division by zero

Introduction

In this chapter, we consider geometric problems proposed in old Japanese mathematics, called Wasan. We first give a simple introduction to Wasan and its geometry. Then we consider problems in Wasan geometry in detail in three sections, which form the main part. In the first section, we consider problems involving several congruent circles. It seems that those figures have not been considered elsewhere, though they have interesting properties. Also, there are very few expository writings dealing with those problems today. In the second section, we consider problems involving an arbelos formed by three mutually touching circles with collinear centers. The arbelos is one of the best-known plane figures and has been studied by many mathematicians, but those problems in Wasan geometry are not known. The problems are good examples for seeing the different approaches to the same figure in the East and the West. In the third section, we consider a simple application of a recently proposed definition of division by zero to Wasan geometry (Kuroda et al., 2014). Throughout the three sections, we will see that problems in Wasan geometry are a rich source for mathematical study today. At


the end of this chapter, we see the practical side of Wasan and Wasan geometry briefly and give a simple history of the study of Wasan geometry.

Wasan

“Wasan” refers to Japanese mathematics developed independently of Western science in the seventeenth–nineteenth centuries. “Wa” means Japan or Japanese, and “san” means mathematics. Takakazu (or Kowa) Seki (?–1708) improved the Chinese way of algebraic calculation of an unknown quantity and made it possible to solve equations with several unknowns. After that, Japanese mathematics developed very rapidly in its own original way. This mathematics after Seki is called Wasan in the narrower sense. The Japanese government closed the country during the Edo period. But at the beginning of the Meiji era (1868–1912), the new government opened the country and adopted Western mathematics in the new schooling system. Thereafter Wasan followed a course of decline. For extensive references, see Mikami (1913) and Smith and Mikami (1914).

There were two customs which accelerated the development of Wasan. One is idai, a challenging problem at the end of a Wasan book. When Wasan mathematicians published a book, they proposed unsolved problems at the end of the book. Then others, who succeeded in solving the problems, published their solutions with other challenging problems at the end of their books. Seki’s attempt to improve algebraic calculation was also made when he tried to solve such a challenging problem. The other custom is the sangaku, a wooden tablet of mathematics. When people found interesting properties or solved hard problems, they wrote them as problems on a framed wooden board, which was dedicated to a shrine or a temple to express gratitude. The board would then be hung on the wall under the roof. Most such problems were geometric, and the figures were beautifully drawn in color. It was also a means to publish discoveries or to propose new problems.

Wasan covers a part of analysis, number theory, combinatorics, and geometry. The area of circles, the length of circular arcs, the volume of intersecting solids, solutions to indeterminate equations, and magic squares are popular Wasan topics. In many cases, Wasan mathematicians also studied astronomy, surveying, and the art of divination. Though they studied certain aspects of some things deeply, they did not establish a theoretical system. Since almost all Wasan books were written as problem books following the tradition of Chinese mathematics books, it was not necessary to treat particular topics. On the other hand, there are also Wasan books which treated particular subjects (Okumura, 1999). As with sado (tea ceremony) and kado (the art of flower arrangement), there were several Wasan schools. The Seki school and the Saijō school were famous major Wasan schools, but the difference between their mathematics was not so clear.

There are several digital libraries of Wasan books and Wasan manuscripts today. Tohoku University has the largest collection of Wasan materials, where about ten thousand books and manuscripts are online as the Tohoku University Digital


Collection. Aida, the founder of the Saijō school, left behind several hundred volumes, which are online at the Yamagata University Academic Repository. Toyoyoshi proposed many problems involving congruent circles, and his manuscripts can be seen in the Digital Library of the Department of Mathematics, Kyoto University. More than a hundred titles are online at the Kotenseki Sogo Database of Waseda University. Several hundred books and manuscripts are also online in the Shimoura Collection of the Museum of Science, Tokyo University of Science. The first and the last collections can also be found at the site of the National Institute of Japanese Literature.

Wasan Geometry

Let us look at the catalog pages at the opening of a popular book of formulas (Yamamoto, 1841) (see Figs. 1, 2, 3, 4, 5, and 6). The figures are arranged to be read top-to-bottom and right-to-left. There are a hundred and two figures, and a formula is stated on later pages for each of the figures. For example, for the leftmost figure at the top in Fig. 5, consisting of an ellipse and a square, the formula states that the side of the square equals $ab/\sqrt{a^2 + b^2}$, where $a$ and $b$ are the major axis and the minor axis of the ellipse. As this example shows (it is also stated as a problem and its answer), most geometric problems attempt to discover certain dimensions of geometric figures such as the diameter of a circle, a major (or minor) axis of an ellipse, the side of a rectangle, etc. The catalog also shows that figures involving touching circles constitute the major part of Wasan geometry. Touching spheres were also considered (see Fig. 5, right). As in Figs. 5 and 6, ellipses, which were considered as sections of cylinders, were studied often, but hyperbolas and parabolas were not, apart from several exceptions such as Iwai (no date) and a sangaku problem (Gunma Wasan Kenkyūkai, 1987, p. 89); see also Mikami (1913-2). Rhombuses were also considered, but parallelograms were not, barring a couple of exceptions such as Gunma Wasan Kenkyūkai (1987, p. 72) and Kimura (1855). Wasan plane geometry is not a triangle geometry, i.e., Wasan mathematicians did not study a single triangle but studied relationships which arise when several elementary figures such as triangles, rectangles, and circles come together. For many notable problems in Wasan geometry, see the nice collection (Fukagawa and Pedoe, 1989).
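The ellipse-and-square formula quoted above can be checked numerically. The sketch below is only an illustration under two assumptions that the text does not state explicitly: the square is inscribed in the ellipse with its sides parallel to the axes, and a and b are the full lengths of the major and minor axes. The function names are hypothetical.

```python
import math


def inscribed_square_side(a: float, b: float) -> float:
    """Side length quoted in the text: ab / sqrt(a^2 + b^2)."""
    return a * b / math.sqrt(a * a + b * b)


def vertex_on_ellipse(a: float, b: float, tol: float = 1e-12) -> bool:
    """Check that the square's vertex (s/2, s/2) lies on the ellipse.

    Assumption: the ellipse has semi-axes a/2 and b/2, so its equation is
    (2x/a)^2 + (2y/b)^2 = 1; with x = y = s/2 this becomes (s/a)^2 + (s/b)^2 = 1.
    """
    s = inscribed_square_side(a, b)
    return abs((s / a) ** 2 + (s / b) ** 2 - 1.0) < tol


print(inscribed_square_side(6.0, 4.0), vertex_on_ellipse(6.0, 4.0))
```

When a = b the ellipse is a circle of diameter a, and the formula reduces to the familiar side a/√2 of the square inscribed in that circle.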

Problems Involving Congruent Circles

We state results of Wasan geometry as problems and answers to distinguish them from the others. In this section, we consider problems of Wasan geometry involving several congruent circles. Those problems were considered especially by mathematicians in Shiseisanka school and Toyoyoshi in Miki school. Since the problems are easily generalized today, we consider the problems together with their generalization. We will see that the 3-4-5 triangle is often associated with several interesting special cases.


Fig. 1 Catalog page 1

In many cases, congruent circles in Wasan geometry are arranged in a row as in Fig. 7; we define this arrangement as follows. Let $\alpha_1, \alpha_2, \dots, \alpha_n$ be congruent circles touching a line $t$ from the same side such that $\alpha_1$ and $\alpha_2$ touch, and $\alpha_i$ ($\ne \alpha_{i-2}$) touches $\alpha_{i-1}$ for $i = 3, 4, \dots, n$. In this case, we call the circles $\alpha_1, \alpha_2, \dots, \alpha_n$ congruent circles on a line or congruent circles on $t$. The next proposition is very useful for problems in Wasan geometry involving externally touching circles.

Fig. 2 Catalog page 2

Fig. 3 Catalog page 3

Fig. 4 Catalog page 4

Fig. 5 Catalog page 5

Fig. 6 Catalog page 6

Fig. 7 Congruent circles on t

Proposition 1. If two externally touching circles of radii $r_1$ and $r_2$ touch a line at two points $P$ and $Q$, then $|PQ| = 2\sqrt{r_1 r_2}$.
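Proposition 1 can be seen from the Pythagorean theorem: the centers are at heights $r_1$ and $r_2$ above the line and at distance $r_1 + r_2$ from each other, so $|PQ|^2 = (r_1 + r_2)^2 - (r_1 - r_2)^2 = 4 r_1 r_2$. The following Python sketch checks this numerically; the coordinate placement (the line as the x-axis) and the helper names are assumptions made for illustration.

```python
import math


def tangent_point_distance(r1: float, r2: float) -> float:
    """|PQ| from Proposition 1 for externally touching circles of radii r1, r2."""
    return 2.0 * math.sqrt(r1 * r2)


def circles_touch_externally(r1: float, r2: float, tol: float = 1e-12) -> bool:
    """Place the line on the x-axis with centers (0, r1) and (|PQ|, r2).

    The circles touch externally exactly when the distance between the
    centers equals r1 + r2.
    """
    d = tangent_point_distance(r1, r2)
    center_gap = math.hypot(d, r1 - r2)
    return abs(center_gap - (r1 + r2)) < tol


print(tangent_point_distance(1.0, 4.0), circles_touch_externally(1.0, 4.0))
```

Radii 1 and 4 give the horizontal distance 4, the vertical difference 3, and the center distance 5, an instance of the 3-4-5 triangle mentioned earlier in this section.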


Congruent Circles on a Line and a Circle

If $\beta_1, \beta_2, \dots, \beta_n$ ($n \ge 2$) are congruent circles of radius $b$ on a line $t$, and a circle $\alpha$ of radius $a$ touches $\beta_1$, $\beta_n$, and $t$, we denote the configuration consisting of $\alpha, \beta_1, \beta_2, \dots, \beta_n$, and $t$ by $A(n)$ (see Fig. 8). If $n = 2$, the circle $\alpha$ coincides with the incircle of the curvilinear triangle made by $\beta_1$, $\beta_2$, and $t$. If $n = 3$, the circles $\alpha$ and $\beta_2$ coincide, and we get the trivial relation $a = b$. The relation between $a$ and $b$ was considered in the cases $n = 2, 4, 5$ for $A(n)$ in Wasan geometry. The next theorem can easily be proved by Proposition 1 (Okumura, 2017a):

Theorem 1. A circle of radius $a$ and $n$ ($n \ge 2$) congruent circles of radius $b$ on a line can form $A(n)$ if and only if

$$\frac{a}{b} = \left(\frac{n-1}{2}\right)^2. \tag{1}$$
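One way to see why Proposition 1 yields (1): the points where $\beta_1$ and $\beta_n$ touch $t$ are $2(n-1)b$ apart, and by symmetry $\alpha$ touches $t$ midway between them, so $2\sqrt{ab} = (n-1)b$. The Python sketch below checks this condition for several $n$; the symmetric placement and the function names are assumptions for illustration, not taken from the cited source.

```python
import math


def alpha_radius(n: int, b: float) -> float:
    """Radius a of the circle alpha in A(n), from relation (1)."""
    return b * ((n - 1) / 2) ** 2


def forms_A(n: int, a: float, b: float, tol: float = 1e-9) -> bool:
    """Tangency test based on Proposition 1.

    beta_1 touches t at x = 0 and beta_n at x = 2(n-1)b; by symmetry alpha
    touches t at x = (n-1)b, so alpha touches beta_1 (and beta_n) exactly
    when 2*sqrt(a*b) = (n-1)*b.
    """
    return abs(2.0 * math.sqrt(a * b) - (n - 1) * b) < tol


for n in range(2, 8):
    print(n, alpha_radius(n, 1.0), forms_A(n, alpha_radius(n, 1.0), 1.0))
```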

Figure 9 shows $A(5)$ as it appeared in the problem proposed by Shinohara in 1809 (Shimura, no date). Since $a/b = 1/4$ for $A(2)$, and $a/b = 4$ for $A(5)$, we can construct a recursive configuration arising from Shinohara’s figure (see Fig. 10), where the horizontal parallel segments are removed. The configuration $A(n)$ has several interesting properties, one of which is as follows (Okumura, 2017a):

Theorem 2. Assume that $\beta_1^n, \beta_2^n, \dots, \beta_n^n$ ($n \ge 2$) are congruent circles on a line $t$ forming $A(n)$ with a circle $\alpha$ and $t$. If $\gamma$ is the incircle of the curvilinear triangle made by $\alpha$, $\beta_1^n$, and $t$, then there are congruent circles $\gamma = \beta_1^{n+2}, \beta_2^{n+2}, \dots, \beta_{n+2}^{n+2}$ on $t$ such that they form $A(n + 2)$ with $\alpha$ and $t$.

Assume that a circle $\alpha$, its tangent $t$, and congruent circles $\beta_1^3, \beta_2^3\,(= \alpha), \beta_3^3$ on $t$ form $A(3)$. Starting with this figure, and using Theorem 2 repeatedly, we get Fig. 11. With this construction, we can define the configuration $A(1)$ as the figure consisting of $\alpha$, $t$, and $\beta_1^1$, where $\beta_1^1$ is the tangent of $\alpha$ parallel to $t$ (see Fig. 12).
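The radius relation behind Theorem 2 can be checked with Proposition 1 alone. If two externally touching circles of radii $r_1$ and $r_2$ touch a line, the circle inscribed in the curvilinear triangle they form with the line has radius $r$ satisfying $1/\sqrt{r} = 1/\sqrt{r_1} + 1/\sqrt{r_2}$, which follows from splitting $|PQ|$ at the third tangent point. The sketch below, with hypothetical function names, verifies that the incircle $\gamma$ has exactly the radius that (1) requires for $A(n+2)$. It checks radii only and does not construct the remaining circles.

```python
import math


def incircle_radius(r1: float, r2: float) -> float:
    """Radius of the circle touching two externally touching circles and
    their common tangent line, from 1/sqrt(r) = 1/sqrt(r1) + 1/sqrt(r2)."""
    return r1 * r2 / (math.sqrt(r1) + math.sqrt(r2)) ** 2


def gamma_radius(n: int, a: float) -> float:
    """Radius of gamma in Theorem 2 for the configuration A(n)."""
    b = a / ((n - 1) / 2) ** 2      # radius of the beta circles, from (1)
    return incircle_radius(a, b)


for n in range(2, 8):
    c = gamma_radius(n, 1.0)
    # Relation (1) for A(n+2) predicts a / c = ((n + 1) / 2)^2.
    print(n, 1.0 / c, ((n + 1) / 2) ** 2)
```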

Fig. 8 A(5)

Fig. 9 A(5) in Shinohara’s problem (Shimura, no date)

Fig. 10 Configuration arising from A(5)

Fig. 11 A(3) ∪ A(5) ∪ A(7) · · ·

Fig. 12 A(1)

Fig. 13 B(5)

Fig. 14 Toyoyoshi (no date)

Congruent Circles on a Line with Two Congruent Circles on a Line

Let $\beta_1$ and $\beta_2$ be congruent circles of radius $b$ on a line $t$, and let $\gamma_1, \gamma_2, \dots, \gamma_n$ be congruent circles of radius $c$ on $t$ such that they lie inside the curvilinear triangle made by $\beta_1$, $\beta_2$, and $t$, $\gamma_1$ touches $\beta_1$, and $\gamma_n$ touches $\beta_2$. The configuration consisting of $\beta_1, \beta_2, \gamma_1, \gamma_2, \dots, \gamma_n$, and $t$ is denoted by $B(n)$ (see Fig. 13). The configuration $B(1)$ coincides with $A(2)$. In Wasan geometry, the relation between $b$ and $c$ was considered in the cases $n = 1$, $n = 5$, and $n$ even. If we divide $B(n)$ by the perpendicular from the point of tangency of $\beta_1$ and $\beta_2$ to $t$, we get two congruent figures. Toyoyoshi considered one of the resulting figures and, for even $n$, obtained a relation essentially the same as (2) in the next theorem (Toyoyoshi, no date) (see Fig. 14).

Fig. 15 B(1) ∪ B(4) ∪ B(9)

Theorem 3. Two circles of radius $b$ and $n$ congruent circles of radius $c$ on a line can form $B(n)$ if and only if

$$\frac{b}{c} = \left(\sqrt{n} + 1\right)^2. \tag{2}$$

A theorem similar to Theorem 2 also holds for $B(n)$ (Okumura, 2017a) (see Fig. 15).

Theorem 4. Assume that $\gamma_1, \gamma_2, \dots, \gamma_{n^2}$ are congruent circles on a line $t$ forming $B(n^2)$ with two congruent circles $\beta_1$ and $\beta_2$ and $t$ for a positive integer $n$, where $\beta_1$ touches $\gamma_1$. If $\delta$ is the incircle of the curvilinear triangle made by $\beta_1$, $\gamma_1$, and $t$, then there are congruent circles $\delta = \delta_1, \delta_2, \dots, \delta_{(n+1)^2}$ on $t$ such that they form $B((n + 1)^2)$ with $\beta_1$, $\beta_2$, and $t$.
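Both statements can be checked numerically from Proposition 1. For Theorem 3, the tangent points of $\beta_1$ and $\beta_2$ on $t$ are $2b$ apart, and this distance is made up of $2\sqrt{bc}$ at each end plus $2(n-1)c$ between the first and last small circles. For Theorem 4, the incircle radius again follows from $1/\sqrt{r} = 1/\sqrt{r_1} + 1/\sqrt{r_2}$. The Python sketch below uses hypothetical function names and checks radii only.

```python
import math


def b_over_c(n: int) -> float:
    """Ratio b/c for B(n), from relation (2)."""
    return (math.sqrt(n) + 1.0) ** 2


def forms_B(n: int, b: float, c: float, tol: float = 1e-9) -> bool:
    """Tangency test: 2b = 2*sqrt(bc) + 2(n-1)c + 2*sqrt(bc) (Proposition 1)."""
    return abs(2.0 * b - (4.0 * math.sqrt(b * c) + 2.0 * (n - 1) * c)) < tol


def incircle_radius(r1: float, r2: float) -> float:
    """Circle touching two externally touching circles and their common tangent."""
    return r1 * r2 / (math.sqrt(r1) + math.sqrt(r2)) ** 2


def delta_radius(n: int, b: float) -> float:
    """Radius of delta in Theorem 4, starting from B(n^2)."""
    c = b / (n + 1) ** 2            # relation (2) with n^2 small circles
    return incircle_radius(b, c)


for n in range(1, 5):
    # Theorem 4 predicts b / d = (n + 2)^2, i.e. relation (2) for B((n+1)^2).
    print(n, forms_B(n, b_over_c(n), 1.0), 1.0 / delta_radius(n, 1.0), (n + 2) ** 2)
```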

Congruent Circles on a Line and Congruent Squares

Congruent squares are derived from the configurations $A(n)$ and $B(n)$. We consider the following problem (Aida, 1788) (see Fig. 16).

Problem 1. Let $\beta_1$ and $\beta_2$ be congruent circles of radius $b$ on a line $t$, and let $ABCD$ be a square inside the curvilinear triangle made by $\beta_1$, $\beta_2$, and $t$ such that the side $DA$ lies on $t$ and the points $C$ and $B$ lie on $\beta_1$ and $\beta_2$, respectively. Show that $2b = 5|AB|$.

The problem is the case $n = 1$ of the next theorem (Okumura, 2018a) (see Fig. 17).

Theorem 5. Assume that congruent circles $\beta_1$ and $\beta_2$ of radius $b$ on a line $t$ and congruent circles $\gamma_1, \gamma_2, \dots, \gamma_n$ on $t$ form $B(n)$ with $t$, where $\gamma_1$ and $\gamma_n$ touch $\beta_1$ and $\beta_2$ at points $C$ and $B$, respectively. If $A$ is the foot of the perpendicular from $B$ to $t$, the following relations hold.

25 Wasan Geometry

Fig. 16 n = 1

Fig. 17 n = 3

Fig. 18 A(4)

(i) n|AB| = |BC|.
(ii) $2b = (1 + (\sqrt{n} + 1)^2)|AB|$.
If D is the foot of the perpendicular from C to t in the theorem, then the rectangle ABCD can be divided into n congruent squares of side length |AB|. The theorem also shows that the incircle of the curvilinear triangle formed by β1, β2, and t touches β1 and β2 at C and B, respectively, in Problem 1. For the configuration A(n), we also have a similar theorem (see Fig. 18).
Theorem 6. Assume that a circle α of radius a and congruent circles β1, β2, ..., βn on t form A(n) with t, where α touches β1 and βn at points C and B, respectively. If A is the foot of the perpendicular from B to t, then the following relations hold.
(i) (n − 1)|AB| = |BC|.
(ii) $2a = \left(1 + \left(\frac{n-1}{2}\right)^2\right)|AB|$.
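Both relations of Theorem 5 can be confirmed with coordinates. This is a minimal sketch under assumptions that are not in the text: t is taken as the x-axis, β1 and β2 are centred at (−b, b) and (b, b), and c is taken from relation (2). The function name is hypothetical.

```python
import math

def square_data(b: float, n: int):
    """Return (|AB|, |BC|) for B(n) with beta circles of radius b."""
    c = b / (math.sqrt(n) + 1) ** 2           # relation (2) of Theorem 3
    # gamma_n touches beta_2 and t, so its centre is (b - 2*sqrt(b*c), c).
    gx, gy = b - 2 * math.sqrt(b * c), c
    k = b / (b + c)                           # B divides the centre segment in ratio b : c
    bx, by = b + k * (gx - b), b + k * (gy - b)
    # |AB| is the height of B above t; C is the mirror image of B in the line x = 0.
    return by, 2 * bx

if __name__ == "__main__":
    for n in (1, 3, 4):
        ab, bc = square_data(1.0, n)
        print(n, ab, bc)
```

For n = 1 and b = 1 this gives |AB| = |BC| = 0.4, so that 2b = 5|AB| as in Problem 1.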


Two Congruent Circles on a Line
We consider the next problem (Furuya, 1854) (see Fig. 19). There are several sangaku problems and books stating the same problem.
Problem 2. Let α and β be externally touching circles of radii a and b, respectively, with an external common tangent t. Let γ and γ′ be two congruent circles on t of radius c such that they lie inside the curvilinear triangle made by α, β, and t, γ touches α, and γ′ touches β. Find c in terms of a and b.
Part (iii) of the next theorem gives an answer to the problem (Okumura, 2017b) (see Fig. 20).
Theorem 7. Let α, β, γ, γ′, and t be as in Problem 2. Assume that δ and δ′ are two congruent circles on t of radius d such that α and β lie inside the curvilinear triangle made by δ, δ′, and t, δ touches α, and δ′ touches β. Then the following relations hold.
(i) $\sqrt{a} + \sqrt{b} = \sqrt{d} - \sqrt{c}$.
(ii) ab = cd.
(iii) \[ c = \frac{w - \sqrt{w^2 - 4ab}}{2} \quad\text{and}\quad d = \frac{w + \sqrt{w^2 - 4ab}}{2}, \quad\text{where } w = a + b + 4\sqrt{ab}. \]
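The three relations of Theorem 7 can be cross-checked against the tangency conditions. The sketch assumes, as the phrase "congruent circles on a line" suggests, that γ touches γ′ and that δ touches δ′, and it again uses the contact-point distance 2√(pq) for two touching circles on a line. The function names are hypothetical.

```python
import math

def theorem7_radii(a: float, b: float):
    """Radii (c, d) from part (iii) of Theorem 7."""
    w = a + b + 4 * math.sqrt(a * b)
    root = math.sqrt(w * w - 4 * a * b)
    return (w - root) / 2, (w + root) / 2

def contact_gap(p: float, q: float) -> float:
    """Distance between the contact points with t of two touching circles on t."""
    return 2 * math.sqrt(p * q)

def theorem7_residuals(a: float, b: float):
    c, d = theorem7_radii(a, b)
    g = contact_gap
    inner = g(a, b) - (g(a, c) + 2 * c + g(c, b))      # gamma, gamma' fit between alpha and beta
    outer = 2 * d - (g(d, a) + g(a, b) + g(b, d))      # alpha, beta fit between delta and delta'
    part_i = math.sqrt(a) + math.sqrt(b) - (math.sqrt(d) - math.sqrt(c))
    part_ii = a * b - c * d
    return inner, outer, part_i, part_ii

if __name__ == "__main__":
    print(theorem7_radii(1.0, 1.0))
    print(theorem7_residuals(1.0, 2.5))
```

All four residuals vanish up to rounding, so under these assumptions the stated c and d satisfy both tangency conditions as well as (i) and (ii).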

Fig. 19 A pair of two congruent circles on t

Fig. 20 Two pairs of two congruent circles on t

Congruent Circles on a Line with Two Intersecting Congruent Circles
We generalize the next problem (Takeda, 1825), which can also be found in several Wasan books (see Fig. 21).
Problem 3. For two intersecting circles δ1 and δ2 of radius s, there are four congruent smaller circles of radius r such that two of them touch each other and touch δ1 and δ2 internally, and each of the other two touches δ1 and δ2 externally and one of the external common tangents of δ1 and δ2. Show s = 6r.
Problem 3 is the case n = 1 in the next theorem (Okumura, 2018g) (see Fig. 22).
Theorem 8. For a rectangle ABCD satisfying s = |BC| > |AB|, let δ be the circle of center B passing through C. If γ1, γ2, ..., γn are congruent circles of radius r on DA lying inside the curvilinear triangle made by CD, DA, and δ such that γ1 touches CD and γn touches δ, and γ′1, γ′2, ..., γ′n are congruent circles of radius r on DA touching DA from the same side as γ1 such that γ′1 touches δ internally from the side opposite to D and γ′n touches AB from the same side as D, then the following statements hold.
(i) s = 2(2n + 1)r.
(ii) There is a circle of radius r touching γn, γ′1, and DA.

Two Sets of Congruent Circles on a Line and Two Circles
We consider the next sangaku problem stated in a collection of problems proposed by mathematicians in Shiseisanka school (no author’s name, no datea) (see Fig. 23). The problem is also cited in Saitama Prefectural Library (1969).

Fig. 21 n = 1

Fig. 22 n = 4

Fig. 23 Two congruent circles on t and one circle on t of the same radius

Problem 4. Let α and β be externally touching circles of radii a and b (a < b), respectively, with external common tangents t and u. Congruent circles γ1 and γ2 on t lie inside the curvilinear triangle made by t, u, and α such that γ1 touches u and γ2 touches α, and the incircle of the curvilinear triangle made by α, β, and t is congruent to γ1. Show that b = 2a.
The problem can be solved by Proposition 1. It involves two larger circles and two sets of congruent circles of the same radius on a line: one is {γ1, γ2}, and the other consists of the incircle of the curvilinear triangle made by α, β, and t, though the latter contains only one circle. This suggests that, for similar figures, there may be simple relationships between the ratio of the radii of the two larger circles and the numbers of circles in the two sets of congruent circles. The next theorem shows one such relationship (Okumura and Sodeyama, 1998) (see Fig. 24).

Fig. 24 (n, m) = (2, 6) (left), (n, m) = (1, 3) (right)

Theorem 9. Let α and β be externally touching circles of radii a and b (a < b), respectively, with external common tangents t and u. Congruent circles γ1, γ2, ..., γn on t lie inside the curvilinear triangle made by t, u, and α such that γ1 touches u and γn touches α, and congruent circles γ′1, γ′2, ..., γ′m on t lie inside the curvilinear triangle made by α, β, and t such that γ′1 touches α and γ′m touches β, where γ1 and γ′1 are also congruent. Then b = 4a if and only if 3n = m.
Notice that the point of intersection of t and u, the center of α, and the point of tangency of α and t form a 3-4-5 triangle in Fig. 24.
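The "if" direction of Theorem 9 and the statement of Problem 4 can be tested numerically. The sketch below measures contact points along t from the intersection of t and u, and uses the assumption (derived here, not quoted from the text) that a circle of radius ρ touching both t and u touches t at distance ρ cot θ from that intersection, where cot θ = 2√(ab)/(b − a). The function names are hypothetical.

```python
import math

def chain_radius_near_u(a: float, b: float, n: int) -> float:
    """Radius c of n congruent circles on t between u and alpha."""
    cot = 2 * math.sqrt(a * b) / (b - a)       # cot of half the angle between t and u
    # Contact points on t:  a*cot = c*cot + 2(n-1)c + 2*sqrt(a*c),
    # which is a quadratic equation in x = sqrt(c).
    qa, qb, qc = cot + 2 * (n - 1), 2 * math.sqrt(a), -a * cot
    x = (-qb + math.sqrt(qb * qb - 4 * qa * qc)) / (2 * qa)
    return x * x

def chain_length_between(a: float, b: float, c: float) -> float:
    """Number m (not rounded) of circles of radius c on t bridging alpha and beta."""
    # 2*sqrt(ab) = 2*sqrt(ac) + 2(m-1)c + 2*sqrt(bc)
    return 1 + (math.sqrt(a * b) - math.sqrt(a * c) - math.sqrt(b * c)) / c

if __name__ == "__main__":
    for n in (1, 2, 3):
        c = chain_radius_near_u(1.0, 4.0, n)
        print(n, chain_length_between(1.0, 4.0, c))
```

With b = 4a the computed m equals 3n up to rounding, and with b = 2a and n = 2 it equals 1, which is the situation of Problem 4.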

A Square and Three Congruent Circles in an Isosceles Triangle
We consider the next problem (Takigawa, 1827), which can be found in several Wasan books (see Fig. 25).
Problem 5. For an isosceles triangle EFG with base EF, let ABCD be a square such that B and C lie on the sides FG and GE, respectively, and DA lies on the side EF, where the triangles ABF and BCG have the same inradius r. Show that 4r = |AB|.

Fig. 25 n = 1

Fig. 26 n = 3

For a triangle EFG, let γ1, γ2, ..., γn be congruent circles on EF of radius r lying inside EFG such that γ1 touches GE and γn touches FG. In this case we say that EFG has n circles of radius r on EF. Problem 5 is the case n = 1 in the next theorem (Okumura, 2018b) (see Fig. 26).
Theorem 10. For an isosceles triangle EFG with base EF, let ABCD be a square such that B and C lie on the sides FG and GE, respectively, and DA lies on the side EF. If the triangle ABF has n circles of radius r on AB and the triangle BCG has n circles of radius r on BC, then 2(n + 1)r = |AB| and |EF| : |FG| = 6 : 5.
The theorem shows that the shape of the isosceles triangle EFG, which is divided into two 3-4-5 triangles by the perpendicular bisector of the base, is the same for every positive integer n.

Congruent Circles in a Rectangle
Let ABCD be a rectangle of center O and circumcircle δ of radius s. If the triangle OAB has n circles of radius r on AB and a circle of radius r touches the side BC at the midpoint and the minor arc of δ cut by BC, then the configuration is denoted by S(n). Furthermore, if the triangle OBC has m circles of radius r on BC for S(n), we denote the configuration by S(n, m). The following problem was proposed on the occasion of the 150th anniversary of the death of Kōwa (or Takakazu) Seki (Okayu, 1855) (see Fig. 27).

Fig. 27 S(2), k = 1

Fig. 28 S(12), k = 3

Problem 6. Show s/r = 5 for S(2).
Problem 6 is the case k = 1 in the next theorem (Okumura, 2018d) (see Fig. 28).
Theorem 11. The configuration S(n) can be constructed for any positive integer n, and the shape is determined uniquely by n, where s/r is an integer if and only if there is a positive integer k such that n = k(k + 1). In this event $s/r = (k + 1)^2 + 1$.
The configuration S(n, m) exists if and only if (n, m) = (2, 2), (6, 5), where the triangle ABC is a 3-4-5 triangle in both cases (Okumura, 2018e) (see Figs. 29 and 30).
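The integer case of Theorem 11 and the two configurations S(2, 2) and S(6, 5) can be reproduced with a short computation. The closed form used below is not from the text: with half-sides u = |AB|/2 and v = |BC|/2 it follows from 2r = s − u and from the chain condition u = r(s + u)/v + (n − 1)r, which reduce to w² − w − n = 0 for w = √(u/r + 1). Treat it as an assumption of this sketch; the function names are hypothetical.

```python
import math

def s_n_dimensions(n: int, r: float = 1.0):
    """Return (s, u, v): circumradius and half-sides |AB|/2, |BC|/2 of S(n)."""
    w = (1 + math.sqrt(1 + 4 * n)) / 2        # positive root of w^2 - w - n = 0
    s = r * (w * w + 1)
    u = s - 2 * r                             # the circle at the midpoint of BC has diameter s - u
    v = math.sqrt(s * s - u * u)
    return s, u, v

def circles_on_base(half_base: float, height: float, r: float) -> float:
    """Length (not rounded) of a chain of radius-r circles on the base of an
    isosceles triangle whose end circles touch the legs."""
    leg = math.hypot(half_base, height)
    cot_half = (leg + half_base) / height     # cot of half the base angle
    return 1 + (half_base - r * cot_half) / r

if __name__ == "__main__":
    for k in (1, 2, 3):
        s, u, v = s_n_dimensions(k * (k + 1))
        print(k, s, circles_on_base(u, v, 1.0), circles_on_base(v, u, 1.0))
```

For k = 1, 2, 3 the value of s/r comes out as (k + 1)² + 1, and the chain on BC has integer length only for n = 2 and n = 6 among these, matching S(2, 2) and S(6, 5).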

Fig. 29 S(2, 2)

Fig. 30 S(6, 5)

The Arbelos in Wasan Geometry
In this section we consider problems of Wasan geometry involving an arbelos. For a point O on the segment AB, let α, β, and γ be the semicircles of diameters AO, BO, and AB, respectively, constructed on the same side of AB (see Fig. 31). The area surrounded by the three semicircles is called an arbelos and denoted by (α, β, γ). We call the perpendicular to AB at O the axis. The axis divides the arbelos into two curvilinear triangles with congruent incircles. The two congruent circles are believed to have been studied by Archimedes and are called the twin circles of Archimedes. Circles congruent to the twin circles are said to be Archimedean. Today’s research on the arbelos is mostly concerned with finding new Archimedean circles. However, the third Archimedean circle was not found until the twentieth century. Meanwhile Leon Bankoff found an Archimedean circle, which is sometimes called the Bankoff triplet circle (Bankoff, 1974). After the publication of the notable paper entitled “Those ubiquitous Archimedean circles” (Dodge et al., 1999), which presented dozens of Archimedean circles, many mathematicians became interested in the arbelos, and many papers on the arbelos have been published. Since the arbelos was also considered in Wasan geometry, it is a very good example for seeing the different approaches to studying the same figure in the East and the West.
Ordinarily the arbelos is described by three semicircles so that the axis is vertical as in Fig. 31. In Wasan geometry, however, it is described by three mutually touching circles with collinear centers such that the line passing through the centers is vertical (see Fig. 32). Here we nevertheless describe figures of Wasan geometry in the ordinary way to save space. As Fig. 32 shows, the twin circles of Archimedes were also considered in Wasan geometry (Akabane, 1998; Fukushimaken Wasan Kenkyū Hozonkai, 1989; Hirayama and Matsuoka, 1966); however, there are few problems involving other Archimedean circles. We will see two sangaku problems involving such Archimedean circles at the end of this section. On the other hand, various kinds of non-Archimedean circles were considered. We generalize those problems or show that the figures of the problems have some interesting properties. We assume |AO| = 2a, |BO| = 2b. The radius of Archimedean circles equals rA = ab/(a + b). The twin circles of Archimedes touching α and β are denoted by δα and δβ, respectively.

Fig. 31 (α, β, γ) and the twin circles

Fig. 32 Fujisawa (1874)

Two Sangaku Problems Involving a Circle of the Same Radius
We consider two sangaku problems involving a circle of the same radius. Part (i) of the next problem was proposed by Izumiya in 1866 (Saitama Prefectural Library, 1969) (see Fig. 33), and part (ii) was proposed by Naitoh in 1983 (this sangaku seems to have been made in modern times) (Fukushimaken Wasan Kenkyū Hozonkai, 1989) (see Fig. 34).
Problem 7. Assume that a = b for (α, β, γ).
(i) The tangent of α from B meets γ again in a point D. Show that the inradius of the curvilinear triangle made by α, γ, and the perpendicular from D to AB equals a/9.
(ii) Show that the radius of the circle touching the remaining external common tangent of α and δα and the minor arc of γ cut by the tangent at the midpoint equals a/9.
The next theorem shows that there are several congruent circles including the two circles in the problem even in the case a ≠ b (Okumura, 2019d) (see Fig. 35).

Fig. 33 A small circle of radius a/9

Fig. 34 A small circle of radius a/9

Fig. 35 A generalization

Theorem 12. We assume that t is the external common tangent of α and β, u is the remaining external common tangent of α and δα, v is the tangent of α from B meeting γ again in a point D, and w is the perpendicular from D to AB. Then the following five circles are congruent and have radius
\[ \frac{a^2 b}{(a + 2b)^2}. \]
ε1: the incircle of the curvilinear triangle made by α, γ, and w
ε2: the circle touching u and the minor arc of γ cut by u at the point D
ε3: the incircle of the curvilinear triangle made by γ, δβ, and the axis
ε4: the smallest circle touching w and passing through the point of intersection of u and v
ε5: the smallest circle touching the axis and passing through the point of intersection of β and v
The circle ε1 also touches the line t, and D is the midpoint of the minor arc of γ cut by u in the theorem.
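The radius in Theorem 12 can be checked for the circle ε1 with coordinates. This sketch assumes O = (0, 0), A = (2a, 0), B = (−2b, 0), and that ε1 lies on the side of w containing A. Only ε1 is tested, and the function name is hypothetical.

```python
import math

def theorem12_residual(a: float, b: float) -> float:
    """Zero when the circle of radius a^2 b/(a+2b)^2 touches alpha, gamma and w."""
    r = a * a * b / (a + 2 * b) ** 2                 # claimed common radius
    sin_phi = a / (a + 2 * b)                        # angle at B between BA and the tangent v
    # B lies on gamma, so the chord BD has length 2(a+b)cos(phi).
    x_d = -2 * b + 2 * (a + b) * (1 - sin_phi ** 2)  # abscissa of D, i.e. of the line w
    x = x_d + r                                      # epsilon_1 touches w from the side of A
    y = math.sqrt((a + r) ** 2 - (x - a) ** 2)       # forces external contact with alpha
    # Internal contact with gamma (centre (a - b, 0), radius a + b) remains to be checked.
    return math.hypot(x - (a - b), y) - (a + b - r)

if __name__ == "__main__":
    for a, b in ((1.0, 1.0), (1.0, 2.0), (3.0, 0.5)):
        print(a, b, theorem12_residual(a, b))
```

The residual vanishes up to rounding for a = b and for a ≠ b. For a = b the radius reduces to a/9, the value in Problem 7.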

Two Congruent Circles Touching a Perpendicular to AB
We consider the next sangaku problem, which was proposed by Tamura in 1898 (Saitama Prefectural Library, 1969) (see Fig. 36).
Problem 8. Assume that h is a perpendicular to AB intersecting α, and that the incircle of the curvilinear triangle made by α, γ, and h is congruent to the circle touching α and β externally and h. Show that the common radius equals |AB|/10 in the case a = b.
For this problem, we are interested in a necessary and sufficient condition under which the two congruent circles are obtained. The next theorem gives such a condition (Okumura, 2019e) (see Fig. 37).

Fig. 36 Tamura’s problem

Fig. 37 Three congruent circles

Theorem 13. Let T be the point of tangency of the incircle of (α, β, γ) and α. Let h be the perpendicular from P to AB for a point P on α. The incircle of the curvilinear triangle made by α, γ, and h is congruent to the circle touching α and β externally and h if and only if P coincides with T. In this event, if h meets AB in a point U, then the circle of diameter TU is also congruent to the two circles and the common radius equals
\[ \frac{ab(a + b)}{b^2 + (a + b)^2}. \]
Notice that the line BT meets α again in the farthest point on α from AB in the theorem.
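The "if" part of Theorem 13 can be verified with coordinates. The sketch assumes O = (0, 0), A = (2a, 0), B = (−2b, 0), and uses the well-known inradius ab(a + b)/(a² + ab + b²) of the arbelos, which is not stated in the text. The function names are hypothetical.

```python
import math

def touching_alpha_beta(a: float, b: float, r: float):
    """Centre of the radius-r circle touching alpha and beta externally."""
    x = r * (b - a) / (a + b)
    return x, math.sqrt((a + r) ** 2 - (x - a) ** 2)

def theorem13_residuals(a: float, b: float):
    r = a * b * (a + b) / (b * b + (a + b) ** 2)       # claimed common radius
    rho = a * b * (a + b) / (a * a + a * b + b * b)    # inradius of the arbelos (assumed known)
    cx, cy = touching_alpha_beta(a, b, rho)            # incentre of the arbelos
    k = a / (a + rho)
    tx, ty = a + k * (cx - a), k * cy                  # T, the contact point with alpha
    dx, _ = touching_alpha_beta(a, b, r)
    res_delta = (dx + r) - tx                          # circle touching alpha, beta and h
    ex = tx + r                                        # incircle of alpha, gamma, h
    ey = math.sqrt((a + r) ** 2 - (ex - a) ** 2)
    res_eps = math.hypot(ex - (a - b), ey) - (a + b - r)
    res_tu = ty / 2 - r                                # circle of diameter TU
    return res_delta, res_eps, res_tu

if __name__ == "__main__":
    print(theorem13_residuals(1.0, 1.0))
    print(theorem13_residuals(2.0, 1.0))
```

All three residuals vanish up to rounding. For a = b the radius is 2a/5 = |AB|/10, as required in Problem 8.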

Two Circles Touching a Perpendicular to AB at the Same Point
We consider two sangaku problems and assume that h is the perpendicular to AB at H for a point H on the segment AO, δ is the circle touching α and β externally and h, and ε is the incircle of the curvilinear triangle made by α, γ, and h. Part (i) of the next problem was proposed by Satoh in 1883 (Fukushimaken Wasan Kenkyū Hozonkai, 1989) (see Fig. 38). Part (ii) was proposed by Chiba in 1880 (Yasutomi, 1987) (see Fig. 39).
Problem 9.
(i) The circle ε is congruent to the circle of diameter OH and a = b. Show that the common radius is triple the inradius of the curvilinear triangle made by α, γ, and the external common tangents of α and β.
(ii) The circle δ is congruent to the circle of diameter AH. Show that the radius of ε equals ab/(a + 2b).

Fig. 38 Two congruent circles

Fig. 39 Two congruent circles

Fig. 40 Circles derived from the tangent of α from B touching α at V

The incircle of the curvilinear triangle made by α, γ, and the external common tangents of α and β in (i) is the circle ε1 in Theorem 12, i.e., its radius equals $a^2 b/(a + 2b)^2$ in the case a = b. We also consider necessary and sufficient conditions under which each of the pairs of congruent circles in the problem is obtained. The next theorem gives several such conditions (Okumura, 2019b,i) (see Fig. 40).
Theorem 14. Let V be the point of tangency of α and its tangent from B. The following statements are equivalent.
(i) The line h passes through the point V.
(ii) The circle δ and the circle of diameter AH are congruent.
(iii) The circle ε and the circle of diameter HO are congruent.
(iv) The circles δ and ε touch.
(v) The two circles touching the two circles of diameters HO and AH externally and the axis are Archimedean.
If h passes through the point V, then |HO| = 2ab/(a + 2b) and the radius of the circle δ equals a(a + b)/(a + 2b) in the theorem.

Two Congruent Circles Touching an Inclined Line to AB
We denote the point of intersection of γ and the axis by I. Let ρ be the reflection in the line AB. We assume that ε1 is the incircle of the curvilinear triangle made by α, γ, and the line IP for a point P on the semicircle $\alpha^\rho$ and ε2 is the circle touching the semicircles $\alpha^\rho$, $\gamma^\rho$, and IP from the side opposite to A (see Fig. 41). We consider the next problem (Akita, 1833).
Problem 10. The circles ε1 and ε2 are congruent. Find the common radius in terms of a and |IO|.
Considering the power of the point O with respect to γ, we get $|IO| = 2\sqrt{ab}$. Therefore Problem 10 essentially asks for the common radius of the two circles in terms of a and b. However, b was not used in the problem because of the absence of the semicircle β from the figure. Let J be the point of intersection of α and the segment AI. The next theorem gives a condition under which the pair of congruent circles in the problem is obtained (Okumura, 2019a) (see Fig. 42).

Fig. 41 Two congruent circles touching an inclined line to AB

Fig. 42 Two congruent circles touching α, γ and I J ρ

Theorem 15. The circles ε1 and ε2 are congruent if and only if the point $P^\rho$ coincides with the point J. In this event the common radius equals
\[ \frac{4a^2 b}{(2a + b)^2}. \]
The point J is one of the most notable points on α, because the external common tangent of α and β touches α at J, the distances from the center of β to I and J are the same, and the distance between J and the axis equals 2rA, where rA is the radius of Archimedean circles.

Congruent Circles Touching a Circle Passing Through the Center of α
The next sangaku problem was proposed by Matsuda (no author’s name, no dateb) (see Fig. 43).

Fig. 43 Matsuda’s problem

Fig. 44 A generalization

Problem 11. A circle δ passes through the points I and $I^\rho$ and divides the curvilinear triangle made by α, γ, and IO into two curvilinear triangles with congruent incircles. Show that δ passes through the center of α.
The problem is generalized as follows (Okumura, 2019c) (see Fig. 44).
Theorem 16. For a point Pi (i = 1, 2) on the segment AO, let δi be the circle passing through the points I, $I^\rho$, and Pi. Then the incircle of the curvilinear triangle made by α, γ, and δ1 is congruent to the incircle of the curvilinear triangle made by α, IO, and δ2 if and only if the center of α coincides with the midpoint of P1P2.

Reflection in the Axis
Let ′ be the reflection in the axis. The next sangaku problem was proposed by Satoh in 1850 (Yasutomi, 1987) (see Fig. 45).
Problem 12. Show that the radius of the circle touching the semicircles β′ externally, γ internally, and the axis from the side opposite to B equals a/2.
The problem is obtained if n = 1 in the next theorem (Okumura, 2019f) (see Fig. 46).
Theorem 17. Let (n) be the composition of ′ and the homothety with center O and ratio n for a non-negative real number n, where we consider $X^{(0)} = O$ for any point X. If α2 (resp. β2) is the circle touching $\beta^{(n)}$ (resp. $\alpha^{(n)}$) externally, γ internally, and the axis from the side opposite to B (resp. A), then the circles α2 and β2 touch and have radii a/(n + 1) and b/(n + 1), respectively.

Fig. 45 Satoh’s problem

Fig. 46 n = 0.8

Fig. 47 C1

We construct a recursive circle configuration by Theorem 17. For this purpose we consider the arbelos together with its reflection in AB, i.e., we assume that α, β, and γ are the circles of diameters AO, BO, and AB, respectively, and denote the configuration consisting of the three circles by (α, β, γ). Let C1 = (α, β, γ) ∪ (α′, β′, γ′) (see Fig. 47). We call C1 a symmetric arbelos of diameter |AB|, where we assume |AB| = 1. Let α2 (resp. β2) be one of the circles touching β′ (resp. α′) externally, γ internally, and the axis from the side opposite to B (resp. A), where we assume that α2 and β2 lie on the same side of AB, i.e., they touch by Theorem 17. If γ2 is the smallest circle touching the two circles internally, then the figure (α2, β2, γ2) ∪ (α2, β2, γ2) is a symmetric arbelos of diameter 1/2 similar to C1. We call (α2, β2, γ2) ∪ (α2, β2, γ2) with its reflection in AB the two small copies of C1 (see Fig. 48). If the configuration Ck has been constructed, which consists of C1, two symmetric arbeloi of diameter 1/2 similar to C1, four symmetric arbeloi of diameter 1/4 similar to C1, ..., and $2^{k-1}$ symmetric arbeloi of diameter $1/2^{k-1}$ similar to C1, then we add the two small copies of each of the symmetric arbeloi of diameter $1/2^{k-1}$ and let Ck+1 be the resulting configuration. The configuration Cn is now defined for any positive integer n. Then we define the configuration $C_0 = \bigcup_{i \ge 1} C_i$ (see Fig. 49).

Fig. 48 C1 with the two small copies = C2

Fig. 49 C0
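The radii in Theorem 17, on which the recursive construction rests, can be checked with coordinates. The sketch assumes O = (0, 0), A = (2a, 0), B = (−2b, 0), so that the image of β under the reflection in the axis followed by the homothety of ratio n has centre (nb, 0) and radius nb. It also counts the symmetric arbeloi in Ck as described above. The function names are hypothetical.

```python
import math

def theorem17_residuals(a: float, b: float, n: float):
    ra, rb = a / (n + 1), b / (n + 1)                  # claimed radii of alpha_2 and beta_2
    # alpha_2 has centre (ra, ya): it touches the axis x = 0 and the image of beta.
    ya = math.sqrt((n * b + ra) ** 2 - (n * b - ra) ** 2)
    # beta_2 has centre (-rb, yb): it touches the axis and the image of alpha.
    yb = math.sqrt((n * a + rb) ** 2 - (n * a - rb) ** 2)
    # gamma has centre (a - b, 0) and radius a + b.
    res_a = math.hypot(ra - (a - b), ya) - (a + b - ra)   # alpha_2 touches gamma internally
    res_b = math.hypot(-rb - (a - b), yb) - (a + b - rb)  # beta_2 touches gamma internally
    res_touch = math.hypot(ra + rb, ya - yb) - (ra + rb)  # alpha_2 touches beta_2
    return res_a, res_b, res_touch

def arbeloi_in_level(k: int) -> int:
    """Number of symmetric arbeloi in C_k: 1 + 2 + 4 + ... + 2^(k-1)."""
    return sum(2 ** i for i in range(k))

if __name__ == "__main__":
    print(theorem17_residuals(1.0, 2.0, 0.8))
    print([arbeloi_in_level(k) for k in range(1, 6)])
```

For n = 1 the radii are a/2 and b/2, so α2 and β2 span a diameter a + b = |AB|/2, which is consistent with the small copies having diameter 1/2.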

Golden Arbelos
If $a/b = \phi^{\pm 1}$, then (α, β, γ) is called a golden arbelos, where $\phi = (1 + \sqrt{5})/2$. We consider a problem involving a golden arbelos. Let σ be the reflection in the perpendicular bisector of AB. Recall that the radius of Archimedean circles equals rA = ab/(a + b). We consider the next problem (Ono, 1855) (see Fig. 50). The same sangaku problem can be found in Fukushimaken Wasan Kenkyū Hozonkai (1989).
Problem 13. Let ε be the circle touching the semicircles γ internally, $\alpha^\sigma$ externally, and the axis from the side opposite to A such that ε and α have the same radius. Find the radius of ε in terms of the radii of γ and δα.

Fig. 50 The case $\alpha^\sigma$ and ε being congruent

Fig. 51 A generalization

We show that the circles δα and ε touch in the problem. Let ζ be a semicircle of diameter BP for a point P on the segment AB constructed on the same side of AB as γ, and let ε be the circle touching γ internally, ζ externally, and the axis from the side opposite to A (see Fig. 51). The next theorem gives conditions under which the circles δα and ε touch (Okumura, 2019h).
Theorem 18. The following statements are equivalent.
(i) The circles δα and ε touch.
(ii) The circle ε has radius b − rA.
(iii) The semicircle ζ coincides with $\alpha^\sigma$.
Assume that ζ and ε have radius a as in the problem. Then the circles δα and ε touch, and we get a = b − rA by the theorem, which implies 2a = a + b − rA. Hence we get the answer to the problem, a = (a + b − rA)/2. On the other hand, a = b − rA is equivalent to b = φa. Therefore (α, β, γ) is a golden arbelos and rA, a, b, a + b form a geometric progression with common ratio φ. Also a = b − rA shows that the circle of diameter $OO^\sigma$ is Archimedean. The circle also touches ε, since it is concentric to γ (see Fig. 52). The next theorem shows that the Archimedean circle can also be obtained in the case b = φa and gives a characterization of the golden arbelos using the Archimedean circle (Okumura, 2019h) (see Fig. 53).
Theorem 19. Let η be the circle touching the circle ε externally and the axis at O from the side opposite to A. Then η is Archimedean if and only if ζ and ε have the same radius. In this event, (α, β, γ) is a golden arbelos such that b = φa if and only if ζ and η touch.
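The algebra behind the golden arbelos can be confirmed in a few lines. The sketch only evaluates the identities quoted above for b = φa; the function names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2

def archimedean_radius(a: float, b: float) -> float:
    """Radius r_A = ab/(a + b) of the Archimedean circles."""
    return a * b / (a + b)

def golden_progression(a: float):
    """Return (r_A, a, b, a + b) for the golden arbelos with b = PHI * a."""
    b = PHI * a
    return archimedean_radius(a, b), a, b, a + b

if __name__ == "__main__":
    seq = golden_progression(1.0)
    print(seq)
    print([seq[i + 1] / seq[i] for i in range(3)])
```

Consecutive ratios of the four numbers all equal φ up to rounding, and a = b − rA = (a + b − rA)/2 holds, which is the answer to Problem 13 in terms of the radii a + b of γ and rA of δα.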

Fig. 52 The Archimedean circle of diameter $OO^\sigma$

Fig. 53 The case η being Archimedean

Fig. 54 $G \cup G^\tau = G_1 \cup G_2$

We assume b = φa and construct a self-similar circle configuration. Let τ be the composition of σ and the homothety of center A and ratio 1/φ, where σ is applied first. Then τ coincides with the homothety of center O and ratio −1/φ for a point on the line AB. Therefore the axis is fixed by τ. Also $\gamma^\tau$ passes through the point of tangency of δα and ε by Proposition 1, because $(2\sqrt{a r_A})^2 = 2a \cdot 2\phi^{-1}a = |B^\tau O||A^\tau O|$ (see Fig. 54). Let G be the figure consisting of α, $\alpha^\sigma$, γ, δα, and ε, which is obtained by removing the axis and AB from Fig. 50. Let $G_i = G^{\tau^{i-1}}$ for i = 1, 2, 3, ..., and $G_0 = \bigcup_{i \ge 1} G_i$. Following the custom of Wasan geometry, we describe G0 so that AB is a vertical line with its reflection in AB (see Fig. 55).

Arbelos with Overhang
Aida (1747–1817) considered a deformed arbelos formed by a circle and two semicircles in the circle as in Fig. 56 and left several notable results, some of which are stated as follows (Aida, no date) (see Fig. 57).

Fig. 55 G0 with its reflection in AB

Fig. 56 Aida (no date)

Fig. 57 Aida’s deformed arbelos

Problem 14. Assume that α̃ (resp. β̃) is the quarter circle of radius p (resp. q) and center P (resp. Q) on the segment AO (resp. BO) such that one of the endpoints is O and the other lies on γ. Show that the following statements hold.
(i) The axis divides the curvilinear triangle made by α̃, β̃, and γ into two curvilinear triangles with common inradius r satisfying q = 2(p − r)r/(p − 2r).
(ii) c = |AP| + q = |BQ| + p, where c is the radius of γ.
(iii) $p^2 + q^2 = c^2$.
(iv) If s is the inradius of the curvilinear triangle made by α̃, β̃, and γ, then $2s = (p + q)^2/\sqrt{p^2 + q^2} - (p + q)$.
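The statements of Problem 14 can be tested against the tangency conditions. The sketch assumes coordinates O = (0, 0), P = (p, 0), Q = (−q, 0), for which the centre of γ works out to (p − q, 0). It takes r = (p + q − c)/2, the value obtained by solving (i) with the help of (iii). The function names are hypothetical.

```python
import math

def aida_radii(p: float, q: float):
    """Return (c, r, s) from statements (iii), (i) with (iii), and (iv)."""
    c = math.hypot(p, q)
    r = (p + q - c) / 2
    s = ((p + q) ** 2 / c - (p + q)) / 2
    return c, r, s

def aida_residuals(p: float, q: float):
    c, r, s = aida_radii(p, q)
    res_i = q - 2 * (p - r) * r / (p - 2 * r)        # statement (i)
    # Incircle on the side of A: it touches the axis x = 0 and the arc centred at P.
    y = math.sqrt((p + r) ** 2 - (r - p) ** 2)
    res_r = math.hypot(r - (p - q), y) - (c - r)     # internal contact with gamma
    # Circle touching both quarter circles externally.
    x = s * (q - p) / (p + q)
    y = math.sqrt((p + s) ** 2 - (x - p) ** 2)
    res_s = math.hypot(x - (p - q), y) - (c - s)     # internal contact with gamma
    return res_i, res_r, res_s

if __name__ == "__main__":
    print(aida_radii(3.0, 4.0))
    print(aida_residuals(3.0, 4.0))
```

For p = 3 and q = 4 this gives c = 5, r = 1, and s = 1.4, and all residuals vanish up to rounding.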


Solving the equation in (i) for r, we get
\[ r = \frac{p + q - \sqrt{p^2 + q^2}}{2} = \frac{p + q - c}{2} \]
by (iii). Therefore we have 2r + c = p + q. By the last equation and (iv), we have
\[ 2s = \frac{(p + q)^2}{c} - (p + q) = \frac{2(p + q)r}{c}. \]
Hence we get
\[ \frac{s}{r} = \frac{p + q}{c}. \]
Essentially the same relation as (iii) was rediscovered (Jobbings, 2011). Aida’s deformed arbelos is a special case of a generalized arbelos called the arbelos with overhang (Okumura, 2014). Let Ah (resp. Bh) be a point on the half line OA (resp. OB) with initial point O such that |OAh| = 2(a + h) (resp. |OBh| = 2(b + h)) for a real number h satisfying −min(a, b) < h. Let αh (resp. βh) be the semicircle of diameter AhO (resp. BhO) constructed on the same side of AB as γ. The configuration of three semicircles αh, βh, and γ is called an arbelos with overhang h and denoted by (αh, βh, γ) (see Figs. 58 and 59). The circle touching αh (resp. βh) externally, γ internally, and the axis from the side opposite to B (resp. A) has radius
\[ \frac{ab}{a + b + h}. \]
Circles of the same radius are called Archimedean circles of (αh, βh, γ). Let V (resp. W) be the point of intersection of αh (resp. βh) and γ in the case h ≥ 0. Since

Fig. 58 − min(a, b)

[A] (from Kond + Sholl to Kond + Sholl)
[A] => [B]. From Kond + Sholl to Tond or, More Often, from [A1] to [B]
[B] => [A1]. From Tond to Kond
[B] => [B]. From Tond to Tond
X-Tiles
Definition
The X-Tiles and the Tond Traditional Family of Pentagonal Patterns
Transition from Kond to Tond with the X-Tiles
Tond to Tond Transition Through the X-Tiles
Self-Similarity of TOND Patterns Through the X-Tiles
Principle
First Inflation Rule: System V1
Second Inflation Rule: System V2
Third Inflation Rule: System V3
Fourth Inflation Rule: System V4
Working with Decorated Rhombuses
To Go Further
Conclusion
Cross-References
References


J.-M. Castera
Independent Artist, Paris, France
© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_58


Abstract
Looking at the heritage of traditional Persian pentagonal patterns (patterns made from tiles derived from the pentagon), one can suppose that the artists, very early on, set targets for their creation. One is the search for self-similarity; another is the search for methods of connection between the two main families of patterns. It is strange and intriguing that the historic artists did not fully achieve these targets. This paper, following a previous publication (Castera, Nexus Netw J 18:223, 2016), proposes solutions and new developments. There are no Penrose patterns in that story; only binary tiling and the X-Tiles.

Keywords Persian pentagonal patterns · Multilevel patterns · X-Tiles · Self-similarity · Binary tiling

Introduction: The Two Traditional Persian Families of Pentagonal Patterns
The impossibility of tiling the plane (without any gap or overlap) with only regular pentagons is well known. Nevertheless, Persian artists trying to overcome that impossibility have produced many patterns with local five-fold symmetry. These “pentagonal patterns” are always included in a periodic network. If they happen to introduce some slight variations that break the periodicity, that does not make the pattern an aperiodic one (in other words, a 2D quasicrystal). For a geometric analysis of pentagonal patterns, see for example Lee (1987); one of the first contributions on the connection with quasicrystals is Makovicky (1992).
There are two main families of pentagonal Persian patterns. Let [A] be the first one, which the Iranian tradition calls “Kond + Sholl” (Mofid and Raieszadeh 1995), and [B] the second one, “Tond.” Let [A1] be the subset of [A] made of only Kond tiles. Those families are defined by the different kinds of tiles they are composed of (Figs. 1 and 6). The vertices of these patterns are always of degree 4 (intersection of two lines). They can therefore be colored using two colors, as in a chessboard. As a result, a remarkable property emerges: in such a coloring, any two tiles that are of the same kind will also be of the same color (this is not the case in the Arabic style). That is why we can distinguish two kinds of tiles. Let them be the “positive” and the “negative” tiles. Note that a pattern can be drawn using only one kind of tile (positive or negative), connected by vertices that respect the continuity of the lines. The shapes of the voids between the tiles are the shapes of the other kind of tiles. The following figures show the set of the tiles, the variations, and some examples.

27 TOND to TOND: Self-Similarity of Persian TOND Patterns, Through. . .

Fig. 1 The “Kond + Sholl” family set of tiles. While it can be slightly different in the Iranian tradition, this is the necessary and sufficient set for any pattern in that family to be fully self-similar. The tile N1, a stellated decagon, is “the mother” of each tile. It is never used in that way, but always decorated with one of its variations (Fig. 2). One could say that the mother is always pregnant. The Kond tiles define the subset [A1]. (© Jean-Marc Castera 2015–2018)

Fig. 2 The mother (Stellated decagon [10/3]). Proportions, angles, and variations. Only N11 (Sun) keeps the symmetries; N12 is oriented and N13 uses a new tile. N41 is the unique variation of the tile N4. (© Jean-Marc Castera 2015–2018)

The Kond + Sholl Family See Figs. 2, 3, 4, and 5.

The Tond Family See Figs. 6 and 7.

Multilevel Patterns. Reminders, and a New Case A first-level pattern is made of large-scale tiles, each one cut into small-scale tiles that fit perfectly with the adjacent ones. That is the second-level pattern. Iranian artists call this process “the pattern into the pattern.” Would it be possible to make a

J.-M. Castera

Fig. 3 The simplest Kond pattern: Nothing but 4 stars holding hands, with variation N11. (© Jean-Marc Castera 2015–2018)

Fig. 4 Left, the simplest pattern made of Kond and Sholl tiles, with stars on each vertex of a rhombus. On the right, the stars are on the vertices of a rectangle. (© Jean-Marc Castera 2015–2018)

third level, and so on ad infinitum, using always the same substitution rules? In this case, the pattern would be self-similar. Of course, there is no way to make such a pattern in real mosaic; even with only three levels, the material difficulties are obvious. Nevertheless, if all the shapes of the tiles being used at the second level are used at the first level (large scale), then all the substitution rules needed are defined, the process can go on (virtually) ad infinitum, and therefore the pattern is fully self-similar.
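The closure condition stated above (every tile shape used at the second level must also have its own substitution rule) can be expressed as a small check. This is a minimal sketch under stated assumptions: the tile labels "a", "b", "c" and both rule sets are hypothetical placeholders, not the actual Kond or Tond substitution rules.

```python
def missing_rules(rules):
    """Return tiles that appear at the second level but have no rule of their own."""
    produced = set()
    for children in rules.values():
        produced |= set(children)
    return produced - set(rules)


def is_fully_self_similar(rules):
    """True when every tile used at the second level can itself be substituted."""
    return not missing_rules(rules)


# Hypothetical labels: each first-level tile maps to the tiles that decorate it.
two_level_only = {"a": {"a", "b", "c"}, "b": {"a", "b"}}
completed = {**two_level_only, "c": {"a", "c"}}

print(missing_rules(two_level_only))     # {'c'}
print(is_fully_self_similar(completed))  # True
```

In the first rule set, tile "c" plays the role of a tile that appears at the second level without existing at the first, which blocks any third level. Adding a rule for it closes the system.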

Fig. 5 A Turkish pattern. Note the variation N41 and flowers from the Tond family at the bottom corners. On the right, the pattern reduced to tiles from only the Kond + Sholl system (doing this reduction is a good exercise). Note the tile P5, which is rarely used, and two slightly forced connections at the edges (showing also the small deviation from the original). (© Jean-Marc Castera 2015–2018)

Fig. 6 The set of Tond tiles (system [B]). Although the tradition is sometimes to use more tiles, these ones are the most common. Moreover, they are X-Tiles compatible (see section “X-Tiles”), thus they can form self-similar systems. (© Jean-Marc Castera 2015–2018)

Surprisingly, it looks like historic Persian artists achieved only two-level patterns: there is at least one tile at the second level that does not exist at the first. So, one substitution rule is missing, which is necessary to get to a third level. In the Kond + Sholl patterns, that tile is the Sormedan (Fig. 8, and N3 in Fig. 1). One can find some two-level patterns with that tile at the first level, but never in good connection with its adjacent tiles and/or never without introducing some extra tiles that do not admit a compatible substitution rule. Examples of traditional multilevel pattern designs can be seen in Mofid and Raieszadeh (1995),

Fig. 7 Top: the most common Tond patterns (stars are set up along a rhombus). Bottom: some slight variations. (© Jean-Marc Castera 2015–2018)

Fig. 8 The problematic Sormedan. (© Jean-Marc Castera 2015–2018)

Necipoğlu (1995), Shaarbaf (1982). For investigations by Western scholars, see for example Bonner (2017), Cromwell (2009), Pelletier (2013).

Two Kond Self-Similar Systems We want to complete the traditional two-level Kond + Sholl patterns in order to make them fully self-similar. That means to pattern each tile with tiles of the same

Fig. 9 A Kond + Sholl fully self-similar system. Bottom left, the covering of the three different kinds of edges in that family. To the right, alternative options for the tiles P1 and N3. Bottom, the covering of the three types of edges. (© Jean-Marc Castera 2015–2018)

family at a reduced scale. The first two solutions come from a previous article (Castera 2016), while the third one presented here is new. The general method consists in first searching for a covering of the different kinds of edges (three for the family A; only two for the subfamily A1), then in patterning the interior of each tile in continuity. The first solution (Fig. 9) works for the whole Kond + Sholl family. A similar solution is shown on http://www.quadibloc.com/math/pen05.htm. The second solution uses smaller second-level tiles but works only for the Kond subfamily (Fig. 12). There is actually a slight difficulty with the Sormedan (tile N3). Resolving it requires breaking the “golden rule,” which asserts that each first-level vertex must be the center of a second-level star.

System 1: This works for the whole Kond + Sholl family (Figs. 10 and 11).

System 2: This works only for the Kond subfamily (Figs. 12 and 13).

A Third Type of Kond Self-Similar System

In the meantime, we have found a third solution, which is the generalization of a two-level pattern seen in Iran (Fig. 14). In these two examples, the Sormedan is still missing at the first level. Figure 15 shows a complete solution, with two options for the Sormedan. The two levels belong to the subfamily [A1] (Kond family). Applying the substitution rules endlessly, starting from any Kond tile, generates a 2D quasicrystal-like pattern, fully self-similar (Fig. 16).

Fig. 10 A two-level pattern in Shiraz. Note (highlighted to the right) the non-symmetric arrangement of the pentagons (Tiles P3). (© Jean-Marc Castera 2015–2018)

Fig. 11 Starting from the Sormedan tile, four levels are necessary for all the substitution rules to be used. (© Jean-Marc Castera 2015–2018)

Transitions Between Different Families

In a two-level pattern, both levels can belong to the same family or to different ones. The historic Persian artists have explored some possibilities but, surprisingly, not all of them.

Fig. 12 The second solution for a Kond self-similar system. To the right, alternative solutions for the tiles P1, P2, and N3. Bottom, the covering of the two types of edges. (© Jean-Marc Castera 2015–2018)

Fig. 13 This famous Darb-el-Iman pattern (Isfahan) belongs to the second system of self-similarity. Again, note that the orientations of the pentagons are not symmetric. And this is NOT an error! (© Jean-Marc Castera 2015–2018)

Let [A] and [B] be the two families (Kond + Sholl and Tond). There are then four transition possibilities:
[A] => [A] (from Kond + Sholl at the first level to Kond + Sholl at the second level)
[A] => [B] (from Kond + Sholl to Tond)
[B] => [A] (from Tond to Kond + Sholl)
[B] => [B] (from Tond to Tond)

Fig. 14 Third type of two-level Kond pattern. Left, (Isfahan, Chahar Bagh Madrasa), only the positive tiles have a second level. To the right, the pattern is made of mirrors and all the tiles have a second level (Shiraz, Shah Cheragh). Superimposed, the covering of the two types of edges (long and short) at the first level. (© Jean-Marc Castera 2015–2018)

Fig. 15 The five Kond tiles, along with their rules of substitution into tiles of the same family at reduced scale. That set defines a third type of self-similar system. The traditional system required in addition only a substitution rule for the tile N3, the Sormedan. Note here the two options (N3a and N3b) for that tile. Note also that, as for the two previous solutions, this one is challenging the traditional “golden rule,” which requires that all the first-level vertices be the centre of a star N1 at the second level. (© Jean-Marc Castera 2015–2018)

The last case does not exist in the traditional heritage, which is why we are going to propose solutions to it later in this article. Moreover, those solutions will be fully self-similar. First, here are some examples seen in Iran.

[A] => [A] (from Kond + Sholl to Kond + Sholl)

This case has been examined in the above-mentioned previous publication and in the previous section of this article (A third type of self-similar system). More common

Fig. 16 An original third type self-similar pattern, with the Sormedan tile at the first level. The variation N11 is systematically applied in place of the N1 star. The substitution rules are defined from the first level onwards, thus this pattern would be fully self-similar . . . if the first level was not periodic. (© Jean-Marc Castera 2015–2018)

Fig. 17 These two mosaics, found in Isfahan, use mostly the same pattern. The first level belongs to the Kond family while the second includes also some Sholl tiles. Note that the positive tiles admit a geometric second level, while the others are decorated with floral elements. The superimposed design shows a generalization of this system, by setting a substitution rule for every tile, including the Sormedan. However, this is only a two level system, not self-similar, because the other tiles (bottom) of the second level do not admit any compatible substitution rules. Note the two variations for the tiles N1 and P2. (© Jean-Marc Castera 2015–2018)

examples are reduced to the subfamily [A1], except that the tile N3 does not exist at the first level (or rarely, with incompatible substitution rules). Here are more examples, which cannot generate a self-similar system. However, we give here a substitution rule for the Sormedan that, once again, is conspicuously absent at the first level (Fig. 17).

[A] => [B]. From Kond + Sholl to Tond or, More Often, from [A1] to [B]

Here is an example (Fig. 18), and its generalization to all the Tond tiles:

Fig. 18 Once again, the first level includes only Kond tiles. It is the simplest pattern in that family, made from 4 stars holding hands, arranged in a diamond shape (see Fig. 3). Superimposed is an original solution for the missing tile at the first level, the Sormedan . . . as usual. (© Jean-Marc Castera 2015–2018)

Fig. 19 In these two mosaics, there are only Kond tiles at the second level, while the first level is nothing but the simplest Tond pattern. Only five Tond tiles are used (P1, P2, P3, N1 and N6). Is it possible to complete the system? (© Jean-Marc Castera 2015–2018)

[B] => [A1]. From Tond to Kond

Figure 19 shows two solutions. The first-level pattern is the same (the simplest Tond pattern), which uses a limited set of tiles from that family (see Fig. 7).

Fig. 20 Generalization of the simplest transition from Tond to Kond. The tile N5 needs two variations of the tiles P1 and P2. (© Jean-Marc Castera 2015–2018)

Generalization of the First Example

The tile N3 admits two chiral options (N3a and N3b). The tile N5 admits a more convoluted solution, which requires the use of two variations for the tiles P1 and P2, each one with two chiral options. The star N1 can be replaced with one of its variations, for example, N1a. This proliferation of the variations is not very elegant. Its advantage is to complete the system. If not perfect, it may be the perfect level of imperfection . . . (Fig. 20).

Generalization of the Second Example

After the second generalization (Fig. 21), we have two systems of connection from any Tond pattern at the first level to a Kond pattern at the second level. Many two-level patterns can be seen in Iran. Not all of them have a second level in which all the tiles fit correctly together; some show severe local deformations. Visually, these little arrangements can go unnoticed. This kind of thing deserves further study . . . in a future article.

[B] => [B]. From Tond to Tond

As far as the author knows, there is no such two-level pattern in the traditional heritage (at least in Iran). If one exists, it is certainly well hidden. In short: We now have transitions, fully self-similar, from [A] to [A], from [A1] to [A1], from [A] to [B], from [B] to [A1]. So, we can imagine cycles like [A] => [B] => [A1] => [A] => [B] => [A1], etc. Now we are going to propose transition solutions from [B] to [B]. First, we have to briefly describe the “X-Tiles” system (Castera 2011).

Fig. 21 Generalization of the second solution of the two-level system with first [B] level and second [A] level. Here, contrary to the previous solution, each tile has a unique substitution rule. Top left, the covering used for each of the three different edge lengths in that set. (© Jean-Marc Castera 2015–2018)

Fig. 22 This pattern on the sphere of the new UAE council is at the root of the emergence of the X-Tiles. (© Foster+Partners)

X-Tiles

As the above-mentioned publication shows, any Tond pattern can be made from a pair of rhombi decorated with two simple lines. Let them be the “X-Tiles,” while the lines are the “X-Lines.” Those two rhombi are well known from the famous Penrose patterns, which use the same rhombi, along with constraints (matching rules) that force the pattern to be non-periodic. But the X-Tiles matching rules are different. In fact, they are the same as those of the binary pattern, which can be found at http://www.quadibloc.com/math/pen02.htm. They allow both non-periodic and periodic patterns.

Fig. 23 The two “penta-rhombi,” coming from the decagon. One fat and one slim, known as Laurel and Hardy in a certain circle of crystallographers. (© Jean-Marc Castera 2015–2018)

Fig. 24 The embellishment of the penta-rhombus with the X-Lines defines the X-Tiles. Right, two partitions of the decagon with penta-rhombi. Only the configuration “Rose 2” enables line continuity. (© Jean-Marc Castera 2015–2018)

This system, which is extremely simple, appeared on the occasion of a competition with the British architect Norman Foster (Fig. 22). The X-tiles naturally lead to all the Tond patterns (as defined in this article). There is no reason to believe that the traditional artists used, or were aware of, this concept.

Definition The X-Tiles are the two “penta-rhombi” decorated with only two lines crossing at the centre of each rhombus, with angles of 36 (fat rhombus) and 108 (slim rhombus) degrees. The matching rules are given by the constraint of continuity of the X-Lines. The X-Tiles constitute an extremely simple system (Figs. 23 and 24).
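As a small illustration of the two penta-rhombi themselves, the sketch below builds them and compares their areas. It assumes, as for the Penrose rhombi the text refers to, that the fat rhombus has an acute angle of 72° and the slim one an acute angle of 36°, with unit sides. The function names are hypothetical, and the X-Lines are not drawn: only their crossing point, the centre of each rhombus, is computed.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def rhombus(acute_deg, side=1.0):
    """Vertices of a rhombus with the given acute angle, listed counter-clockwise."""
    t = math.radians(acute_deg)
    dx, dy = side * math.cos(t), side * math.sin(t)
    return [(0.0, 0.0), (side, 0.0), (side + dx, dy), (dx, dy)]


def area(vertices):
    """Polygon area by the shoelace formula."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


def centre(vertices):
    """Intersection of the diagonals, where the two X-Lines would cross."""
    xs, ys = zip(*vertices)
    return (sum(xs) / 4, sum(ys) / 4)


fat, slim = rhombus(72), rhombus(36)
print(area(fat) / area(slim), PHI)
```

The two printed values agree up to rounding: the area ratio of the fat to the slim rhombus is sin 72° / sin 36° = 2 cos 36°, which equals the golden ratio.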

The X-Tiles and the Tond Traditional Family of Pentagonal Patterns

The figure below (Fig. 25) shows all the different arrangements of the X-Tiles around a common vertex. After exclusion of the cases which break the X-Lines continuity, nine configurations remain. In each one, the X-Lines form, around the common vertex, a special shape; these shapes are exactly the Tond tiles!
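Before any matching rule is applied, the rhombus corners meeting at a vertex must at least add up to 360°. The sketch below enumerates the corner combinations that satisfy this necessary condition, assuming corner angles of 36° and 144° for the slim rhombus and 72° and 108° for the fat one. It counts combinations only: it ignores the cyclic order of the tiles and the X-Lines continuity, so it does not reproduce the selection shown in Fig. 25. All names are hypothetical.

```python
from itertools import product

# Corner angles in degrees (assumed Penrose-type rhombi).
CORNERS = {
    "slim acute": 36,
    "slim obtuse": 144,
    "fat acute": 72,
    "fat obtuse": 108,
}


def corner_combinations():
    """All ways to pick rhombus corners whose angles sum to 360 degrees."""
    names = list(CORNERS)
    ranges = [range(360 // CORNERS[name] + 1) for name in names]
    found = []
    for counts in product(*ranges):
        if sum(c * CORNERS[name] for c, name in zip(counts, names)) == 360:
            found.append(dict(zip(names, counts)))
    return found


combos = corner_combinations()
print(len(combos))
for combo in combos:
    print({name: count for name, count in combo.items() if count})
```

Every arrangement of X-Tiles around a vertex has to use one of these corner combinations. The continuity of the X-Lines then decides which arrangements are actually allowed.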

Fig. 25 All the valid arrangements of X-Tiles around a common vertex lead to all the Tond tiles. Bottom left, the forbidden configurations. (© Jean-Marc Castera 2015–2018)

Transition from Kond to Tond with the X-Tiles The previous publication demonstrates two possibilities for the decomposition of Kond tiles into penta-rhombi, in configurations compatible with the X-Tiles. This leads not only to transitions from Kond to Tond patterns (Fig. 26) but also to transitions from Tond to Tond patterns, which this article is now going to explore.

Tond to Tond Transition Through the X-Tiles

A binary pattern can be made tile by tile, using local rules. However, it is also possible to use an inflation process, which means replacing any basic tile by a set of the same kind of tiles at reduced scale. With the X-Tiles, this process leads to an infinite series of Tond patterns at decreasing scales. This solves the problem of Tond patterns’ self-similarity. However, that is certainly not the only way to produce self-similar Tond patterns.

Fig. 26 A transition from Kond to Tond through the X-Tiles. (© Jean-Marc Castera 2015–2018)

Fig. 27 X-Tiles inflation, applied to the N4 tile environment, leads to its partition into the same kind of tiles, with reduced scale. This process can be applied to any Tond tile (Fig. 28), and then the system becomes fully self-similar. (© Jean-Marc Castera 2015–2018)

Self-Similarity of TOND Patterns Through the X-Tiles

Principle

Since we know that it is possible to apply inflation processes to the binary tiling, thus to the X-Tiles, we have a general method of Tond to Tond transition, with self-similarity. An illustration of the process may be the simplest explanation (Figs. 27, 28, and 29). One could object that, in this example, the inflation rule is not the simplest (see section “Fourth Inflation Rule: System V4”). However, it leads to a relationship between the two levels that is more elegant than those coming from simpler rules. We are now going to explore systematically the use of different X-Tiles inflation rules.

Fig. 28 It becomes possible to forget the X-Tiles and compose multi-level patterns directly with these two-level tiles. The second level lines of any adjacent tiles will automatically match perfectly. Most of the tiles are oriented. Note that the tile N4 can exist only at the first level because, like the others, it is missing at its second level. Apart from that, the system is self-similar. (© Jean-Marc Castera 2015–2018)

Fig. 29 An original pattern from that system. The pattern uses all kinds of Tond tiles, except N2 and N5. It is original, but the originality lies more in the system itself than in any particular pattern made from this system. (© Jean-Marc Castera 2015–2018)

Fig. 30 The simplest inflation rule for the binary tiling. (© Jean-Marc Castera 2015–2018)

Fig. 31 To the left, the diamonds transformed according to the inflation rule, but arranged without regard to their orientation. To the right, the unique valid arrangement in the case of that inflation rule. No gap, no overlap. (© Jean-Marc Castera 2015–2018)

Fig. 32 In addition to the inflation rule, we use this coding to fix the relative orientation of the rhombi. (© Jean-Marc Castera 2015–2018)

First Inflation Rule: System V1

The Inflation Rule

With the first application of the rule (Fig. 30), the tiles lose one symmetry axis, and thus the invariance under 180° rotation. That is why we have to care about the orientation of the rhombi during the inflation process (Figs. 31 and 32). A visual coding fixes the right orientation. Now we have an inflation rule for each rhombus, and a rule fixing their relative orientation: we have everything needed to continue the process. In this case, the solution is unique. We will see later that in the case of different inflation rules, there are many solutions. The relative frequency of the two rhombi converges to the golden number, which confirms that a pattern coming from that process is not periodic. It is easy to demonstrate this property (see case V4) (Fig. 33). Figure 34 shows the set of all Tond tiles that can emerge from this V1 system.

Order of Appearance of the Tiles

See Fig. 35.

Fig. 33 Considering all the possible rhombus configurations around a common vertex, after inflation, leads to all the tiles that can emerge from the V1 system. (© Jean-Marc Castera 2015–2018)

The Two-Level Tiles

Could the superimposition of successive levels lead to a self-similar system defined directly by Tond tiles inflation? Figure 36 shows examples of superimposition of two successive levels (Levels 3 + 4, and 3 + 5). In the two cases, the cutting of the second level tiles by the edges of the first level is not really elegant, giving a feeling of disorder. However, that disorder may have some aesthetic value. Figure 37 shows the set of all the tiles with three levels of inflation. Each one, except the tile N5, requires two options, which are used in the double-sided tiles of the “Abyme puzzle” (Fig. 38). That pattern has also been used for the album cover of “Different Tessellations” by the British pianist, composer and improviser Veryan Weston.

Fig. 34 The set of tiles that can emerge from the V1 inflation system. The tiles N1 and N2 are missing. (© Jean-Marc Castera 2015–2018)

Fig. 35 Successive applications of the inflation process. The tiles appear gradually. At the fourth application, all the possible tiles in this system have emerged. Level 1 (first X-Tiles inflation): Tiles P1 and P2. Level 2: Tile N6. Level 3: Tiles N4 and N5. Level 4: Tiles P3 and N3. (© Jean-Marc Castera 2015–2018)

Fig. 36 Left, Levels 2 and 3 superimposition. Right, levels 3 and 5. (© Jean-Marc Castera 2015–2018)

Fig. 37 The Tond tiles with 3 superimposed levels of inflation. Each one leads to two different configurations, except the most symmetric, N5. (© Jean-Marc Castera 2015–2018)

Fig. 38 The “Abyme puzzle.” Each tile, except N5, is printed recto-verso with the two variations of a three-level inflation. Right, the album cover of “Different Tessellations” (Veryan Weston, Leo Svirsky and the Vociferous Choir, 2011). (© Jean-Marc Castera 2015–2018)

Second Inflation Rule: System V2 The Inflation Rule See Figs. 39, 40, and 41.

Fig. 39 The second X-Tiles inflation rule. (© Jean-Marc Castera 2015–2018)

Fig. 40 Due to the dissymmetry of the second rhombus after inflation, it is necessary to take the orientation into account. That is why a coding is added to the figure. Contrary to the previous case V1, there are many different valid choices. (© Jean-Marc Castera 2015–2018)

Fig. 41 Here is the point of view of decorated rhombuses. Again, the slim rhombus loses one symmetry axis after the first inflation. See the section “Working with Decorated Rhombuses” later in this article. (© Jean-Marc Castera 2015–2018)

Here again, it is not surprising that the relative frequency of the two kinds of rhombuses converges to the golden number.

The Set of All the Tond Tiles that Can Emerge from the V2 System

The six negative Tond tiles have been generated by the different arrangements of the X-Tiles around a common vertex (Fig. 25). In each of those arrangements, the X-Tiles are connected to this common vertex by their 36° angles for the slim rhombus, and by their 72° angles for the fat one. It so happens that, after inflation, these rhombuses have the same immediate environment. Therefore, their arrangements will lead again to the same six negative tiles. The three positive tiles emerge as well (Fig. 42).

Order of Appearance of the Tiles

See Fig. 43.

The Two-Level Tiles

See Figs. 44 and 45. Looking at two successive inflation levels, we got the idea to define an inflation process directly on the Tond tiles, without going through the X-Tiles. The result is

Fig. 42 All the Tond tiles emerge from the system V2. (© Jean-Marc Castera 2015–2018)

Fig. 43 All the Tond tiles emerge after 3 inflation levels: P1, P2, N6 since the first level, P3, N2, N4 at the second, and eventually N1 and N3 at the third. (© Jean-Marc Castera 2015–2018)

Fig. 44 Superimposition of two successive levels (levels 3 and 4). Here comes the idea of an inflation defined directly on the tiles. (© Jean-Marc Castera 2015–2018)

much more . . . calm than with the first system, V1, but the decomposition of some tiles is not unique, and the second level tiles are badly cut by the first level edges.

Fig. 45 Unfortunately, several different inflations are necessary for some kinds of tiles. Moreover, the edges of the first level cut the second level tiles in ways that are not always very elegant. (© Jean-Marc Castera 2015–2018)

Fig. 46 Two valid alternative options for the rhombus orientation, among others. (© Jean-Marc Castera 2015–2018)

Fig. 47 The third X-Tiles inflation rule, system V3. (© Jean-Marc Castera 2015–2018)

Remark: Other Valid Orientation Options in the V2 System

Figure 46 shows two orientation systems, which are slightly different from the one we have used here. The first differs by the orientation of only one rhombus, the second by more. Visually, the differences between the resulting Tond patterns may not be dramatic. However, is the relative number of the different tiles affected? Will every tile of the system emerge? We are going to discuss those questions with the next inflation rule.

Third Inflation Rule: System V3 The Inflation Rule See Figs. 47 and 48.

Fig. 48 When connected, the inflated tiles will fill the gaps at the edges always the same way. Therefore, it is possible to continue the inflation with the decorated rhombuses (to the right). (© Jean-Marc Castera 2015–2018)

Fig. 49 The tiles N2 and N4 occur around a connection with the special obtuse vertex of a slim rhombus. (© Jean-Marc Castera 2015–2018)

Fig. 50 Complete set of the tiles that can emerge from the system V3. Tiles N3 and N5 are missing. (© Jean-Marc Castera 2015–2018)

Note that the covering of the edges is symmetric. Again, the relative frequency of the two kinds of rhombuses converges to the golden number. If the process is applied ad infinitum to a rhombus, its limit becomes a fractal curve.

The Set of All the Tond Tiles that Can Emerge from the V3 System

The tiles P1, P2, P3 and N6 occur from the first level. Note that the gap at the boundary of the first level tiles is always filled in a unique way, so it becomes possible to continue the inflation process from the decorated tiles (Fig. 48, right). It so happens that the vertices of these rhombuses are pieces of stars . . . except one of them, an obtuse vertex of a slim rhombus (see the arrow on the figure). Therefore, each vertex of such a pattern is the centre of a star, except if connected to a slim rhombus by such an obtuse vertex. So, it is sufficient to look at all these possibilities to find out all the other negative tiles that can emerge from a pattern made of these decorated tiles (whether by inflation or not) (Figs. 49 and 50).

Fig. 51 Four valid options of rhombus orientations in the V3 system. (© Jean-Marc Castera 2015–2018)

That system provides all the Tond tiles, except N3 and N5. Now, in order to continue the inflation process, we need to fix the relative orientations of the rhombuses. There are many valid configurations, including the following (Fig. 51). These different configurations affect the resulting patterns:
1. The relative frequency of the different types of tiles is variable.
2. In the case of V3.1, the tile N4 never occurs. Indeed, that tile comes from the connection of one fat rhombus and two slim ones in a head-to-tail configuration, which is impossible in that option. By contrast, the option V3.4 increases the frequency of the tile N4.
In short:
• Option V3.1: No tile N4. The other negative tiles occur from the second level.
• Option V3.2: Few N4 tiles. They appear only at the third inflation level: six in the inflation of the fat rhombus, only one with the slim.
• Option V3.3: Tile N4 occurs from the second level. There are two in the inflation of the fat rhombus, but none with the slim.
• Option V3.4: Tile N4 occurs from the second level. There are three in the inflation of the fat rhombus and two with the slim. At the third level, there are 94 and 58.

Option V3.1 Order of Appearance of the Tiles See Figs. 52, 53, 54, and 55.

The Two-Level Tiles and the Interlacings The superposition of two successive levels provides a mapping of the first level tiles by the next level. Unfortunately, the two levels do not fit very well, and some tiles

Fig. 52 At the first inflation, the three positive tiles and the negative N6 are here. (© Jean-Marc Castera 2015–2018)

Fig. 53 Second level of inflation. Here emerge the tiles N1 and N2. (© Jean-Marc Castera 2015–2018)

Fig. 54 Third level, detail. Emphasized is the boundary between the second level components. Obviously, there is no new kind of tiles: all of them existed already at the previous level. (© Jean-Marc Castera 2015–2018)

Fig. 55 The tiles emerging from the V3.1 option. In addition to N3 and N5, the tile N4 is missing. (© Jean-Marc Castera 2015–2018)

need two chiral options. But there is good news when considering two levels of the same parity (levels 1 and 3 on Fig. 56 right): Even though the second level tiles are badly cut, it becomes possible to design interlacings, whose lines fit nicely with the third level tiles. Those interlacings are the thinnest possible. We could design wider ones, but they would not fit that well. We know that the other options provide the tile N4. Let us have a look at the option V3.4.

Fig. 56 To the left, levels 1 and 2. In terms of tiles, the two levels still do not fit nicely: many second level tiles are cut without regard to their symmetries. However, the nice thing is that we can design interlacings, which match perfectly with the third level tiles (Right). (© Jean-Marc Castera 2015–2018)

Fig. 57 To the left, the rhombus orientations. As in the previous case, all the positive tiles, and N6 as well, occur from the first level. The other negative tiles occur at the second level, including N4, highlighted on the figure. (© Jean-Marc Castera 2015–2018)

Fig. 58 The same zoom as in Fig. 54. The difference is due to the tiles N4. In light color, the previous level of negative tiles. (© Jean-Marc Castera 2015–2018)

Option V3.4 Here the orientations are set up with a maximum of slim rhombuses in a head to tail configuration, in order to allow the emergence of as many tiles N4 as possible (Fig. 57). Is it the maximum solution? Not sure. Figure 58 shows a detail of the pattern after three levels of inflation, and the previous level tiles as well. The pattern of these tiles is still imperfect: many second level tiles are cut without respect to their symmetries.

Fig. 59 (a) That inflation rule needs the use of an extra shape, the decagon. (b): The equivalent full surface of the rhombuses. (c): After the first inflation, we have the positive Tond tiles P1 and P2, and the negative tiles N5 and N6 as well. Note that the covering of the edges is not symmetric. (© Jean-Marc Castera 2015–2018)

Fourth Inflation Rule: System V4

The Inflation Rule

Here the inflation rules are a little bit tortuous. Indeed, if we use only the two rhombuses (Fig. 59a), at each inflation step the resulting pattern leaves gaps. There is no overlapping, but there are decagonal gaps, which have to be filled by the third shape. The algorithm is not obvious; it needs some testing. The second line of the figure (Fig. 59b) shows the equivalent surface of the rhombuses, from which the relative frequency of the two shapes can be calculated (although this arrangement is not X-Tiles compatible). We can see on the figure that, after inflation, each fat rhombus gives way to 20 fat and 12 slim rhombuses. Similarly, each slim rhombus gives way to 12 fat and 8 slim. Let $u_n$ be the number of fat tiles and $v_n$ the number of slim tiles in a pattern made after $n$ inflations. After a new inflation, the new pattern has $u_{n+1}$ fat and $v_{n+1}$ slim tiles, with the relations $u_{n+1} = 20u_n + 12v_n$ and $v_{n+1} = 12u_n + 8v_n$.

27 TOND to TOND: Self-Similarity of Persian TOND Patterns, Through. . .


Fig. 60 Superposition of two successive levels in the cases of the systems V1 (left) and V2 (right). (© Jean-Marc Castera 2015–2018)

In terms of the ratio $r_n = u_n/v_n$, the relations give $r_{n+1} = (20r_n + 12)/(12r_n + 8)$. The sequence $\{r_n\}$ converges, and its limit $r$ satisfies $r(12r + 8) = 20r + 12$, that is, $r^2 - r - 1 = 0$, whose positive solution is the golden ratio. This is not surprising for a pattern made of that kind of tiles.

This system generates all the different Tond tiles except N4. From the first inflation, we have the positive tiles P1 and P2, as well as the negative tiles N5 and N6 (Fig. 59c). At the next inflation, we get the rest: P3, N1, N2, and N3.

Now we may wonder what is really interesting in this tortuous system. The answer comes from looking at the superimposition of different levels of the resulting patterns. First, let us look at what happens in the previous cases when two successive inflations are superimposed (Fig. 60). In these cases, the second-level tiles do not fit nicely with the first level. The fit is better with the system V2 than with the system V1, but still not satisfying. With the system V4, however, everything looks very good (Fig. 61). It is not absolutely perfect with two successive levels: one edge of the little tile P1 is still badly broken by the first-level edges. But if we consider two levels of the same parity (levels 2 and 4 in Fig. 61, right), there is only a small, barely noticeable imperfection. The complete set of the two-level tiles is shown in Fig. 28. It includes all the Tond tiles. Even though the tile N4 does not emerge from the inflation system, it can be designed separately. However, it can only exist at the first level. After that, it is lost in inflation.

The other nice thing here is that the system provides a natural, easy way to draw interlacings (Figs. 62 and 63). Moreover, their width is the correct size, in an aesthetic sense (which was not the case in Fig. 56).


Fig. 61 Case of the system V4. To the left: two successive levels (levels 2 and 3). To the right: two successive levels of same parity (levels 2 and 4). (© Jean-Marc Castera 2015–2018)

Fig. 62 The interlacings, drawn at the first level (left) and at the second level as well (right). (© Jean-Marc Castera 2015–2018)

Working with Decorated Rhombuses

In some inflation systems it is possible, from the first level, to replace the new rhombuses by a decoration of the initial ones with the resulting second-level tiles. This happens when the pattern on the edges of the rhombuses is symmetric. Figure 69 shows examples from the systems V2 and V3. The decoration of the fat rhombus in the second case leads to the most common pentagonal pattern, which can be seen in Persian, Arabic, and Indian traditional styles. However, there are simpler ways to generate it (Castera 1996; Castera and Jolis 1991). It turns out that artists have recognized, and used, the second (slim) rhombus (Fig. 68). Sometimes, fascinated by symmetry, Moroccan artists wanted that rhombus decoration to be symmetric (Fig. 65c). See Figs. 64, 65, 66, 67, 68, and 69. Note that, because the edges of these decorated rhombuses are symmetric, they can be used as a mapping of any pattern made of the two penta-rhombuses, including any generalized Penrose pattern, binary tiling, or periodic pattern.


Fig. 63 Design of the tiles at three successive levels, with the interlacings on the first two levels. All the lines fit nicely together. (© Jean-Marc Castera 2015–2018)

Fig. 64 Decorated rhombuses, rectangles, pentagons, or hexagons can be seen in different styles of traditional patterns. Right: the top panel is made of fat penta-rhombuses, while the bottom one is made of slim penta-rhombuses. Note that the two are not at the same scale. (© Jean-Marc Castera 2015–2018)

The next figures show three applications of these systems (Figs. 70 and 71). Figure 72 shows the Moroccan-style variation of Fig. 71. Note that:


Fig. 65 This Zellij medallion (a) on a spandrel of the Andalous mosque in Fes (Morocco) is made of four decorated fat penta-rhombuses (b). These rhombuses can also be seen as made from the two standard decorated rhombuses (c). See also Fig. 69, bottom. (© Jean-Marc Castera 2015–2018)

Fig. 66 The design of this plaster panel at the medersa Bou Inania in Fès (Morocco) is made of decorated fat rhombuses and 20-pointed stars enclosed in decagons. (© Jean-Marc Castera 2015–2018)

1. There is no more orientation.
2. In order to make it a suitable pattern, we have modified the edges. The pattern now fits perfectly into a periodic structure, whose edges are symmetry axes.
3. Although the pattern is suitable for Persian artists, it is still not perfect for the Moroccans. Just a little thing is missing (although it can be easily added, at least along the horizontal edges). Indeed, Moroccan artists and craftsmen cannot accept that tiles are cut at the limit of a pattern, even though that limit is a symmetry axis. They want the pattern to be framed by a “river” made of entire tiles, but that is another story.

Fig. 67 This is the simplest and most famous pentagonal pattern. It can be seen in different styles and different variations. From left to right: India (Agra), Morocco (Rabat), Iran (Tehran), India (Agra). (© Jean-Marc Castera 2015–2018)

Fig. 68 The two decorated penta-rhombs, in a wooden door (Yazd, Iran). (© Jean-Marc Castera 2015–2018)

To Go Further

Here is just one example of an original pattern (Fig. 73). It is made of a first-level Tond pattern with interlacings and, at the second level, an alternation of Tond and Kond patterns. It can be considered a mix of a Tond to Tond and a Tond to Kond system that fit nicely together. Of course, it is fully self-similar, even though only two levels are drawn.


Fig. 69 The two decorated rhombuses coming from the V2 X-Tiles inflation system (top), and from the V3 system (bottom). Bottom left, the Moroccan-style interpretation of the slim rhombus, with two symmetry axes. (© Jean-Marc Castera 2015–2018)

Fig. 70 Detail of a pattern made from the first system of rhombus mapping with Tond tiles. Bottom, the limits of the rhombuses are highlighted. (© Jean-Marc Castera 2015–2018)

The idea is to have all the different styles of pentagonal patterns connected together inside a single self-similar system. Because everything is related to everything, may those relations be harmonious and peaceful.


Fig. 71 Detail of a pattern made from the second system of rhombus mapping with Tond tiles (which comes from the X-Tiles system V3). Note that this pattern is oriented. (© Jean-Marc Castera 2015–2018)

Fig. 72 A Moroccan-style variation from the pattern in Fig. 71, with modification of the edges, in order to make the pattern periodic. This pattern uses the modified slim rhombus, so it is not oriented anymore. (© Jean-Marc Castera 2015–2018)


Fig. 73 This detail of an original pattern is a mix of Tond to Tond and Tond to Kond systems. (© Jean-Marc Castera 2015–2018)

Conclusion

The connection between the X-Tiles and the Tond patterns was very unexpected. Not only does it show that a whole family of complex-looking patterns can be reduced to arrangements of only two kinds of very simple tiles, but it also gives a solution to a question that the traditional Persian artists, for mysterious reasons, had not solved. Moreover, that solution produces not just two levels but fully self-similar patterns.

Cross-References

• Fractal Geometry in Architecture
• Geometric and Aesthetic Concepts based on Pentagonal Structures

References

Binary tiling: http://www.quadibloc.com/math/pen02.htm
Bonner J (2017) Islamic geometric patterns. Their historical development and traditional methods of construction. Springer, New York
Castera JM (1996) Arabesques: Art Décoratif au Maroc. ACR Edition, Paris (English edition in 1999)
Castera JM (2011) Flying patterns. In: Proceedings of the ISAMA/Bridges, Coimbra. Can be downloaded from the Bridges web site, or from http://castera.net/entrelacs/public/articles/Flying_Patterns.pdf
Castera JM (2016) Persian variations. Nexus Netw J 18:223. https://doi.org/10.1007/s00004-015-0281-5
Castera JM, Jolis H (1991) Géométrie douce. Atelier 6½, Paris
Cromwell PR (2009) The search for quasi-periodicity in Islamic 5-fold ornament. Math Intell 31(1):36–56
Kond to Kond: http://www.quadibloc.com/math/pen05.htm
Lee AJ (1987) Islamic star patterns. In: Grabar O (ed) Muqarnas IV: an annual on Islamic art and architecture. E.J. Brill, Leiden, pp 182–197
Makovicky E (1992) 800-year old pentagonal tiling from Maragha, Iran, and the new varieties of aperiodic tiling it inspired. In: Hargittai I (ed) Fivefold symmetry. World Scientific, Singapore, pp 67–86
Mofid H, Raieszadeh M (1995) Revival of the forgotten arts: principles of the traditional architecture in Iran according to Hossein Lorzadeh. Mola Publications, Tehran. (In Persian)
Necipoğlu G (1995) The Topkapi scroll: geometry and ornament in Islamic architecture. Getty Center Publication, Santa Monica
Pelletier M (2013) Zellij Quasicrystals – A Gallery, Les tracés de l’Arabesque géométrique. Académie des Arts Traditionnels, Casablanca
Shaarbaf A (1982) Ghirih and karbandi, vol 1. The National Organization for Protection of Iran’s Antiquities, Tehran

I also strongly recommend reading every publication by Antony Lee, Peter Cromwell (available at https://girih.wordpress.com/), Emil Makovicky, and Craig Kaplan, with a special mention of the recent brilliant work of the French mathematician Armand Jaspar, available online: http://patterns-islamiques.fr/

Artistic Manifestations of Topics in String Theory

28

Nadav Drukker

Contents

Introduction
Glimpses into String Theory
  Genesis
  First Superstring Revolution
  Second Superstring Revolution
  AdS/CFT Correspondence
The Imagery of String Theory
  A Piece of String
  Pants Diagram
  Calabi-Yau
  M.C. Escher
  Music
  Film and Television
Ceramics Inspired by String Theory
  Circle
  Cusp
  Sewing
  Threehalves
  Cut
  Anomaly
  Subsurface
Conclusions
References

Abstract As both a scientist and an artist, the author presents artworks inspired by his own research in string theory. The chapter starts with an introduction to string theory and a very brief discussion of some common images associated with it. It then focuses on several series of works in clay, together with their mathematical and ceramic background.

N. Drukker, Department of Mathematics, King’s College London, London, UK © Springer Nature Switzerland AG 2021 B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_135

Keywords Ceramics · Pottery · String theory · Theoretical physics

Introduction

String theory is a topic at the frontiers of physics and mathematics. Though it has still not been demonstrated to describe our world, it has repeatedly enriched theoretical physics with innumerable discoveries and provided continued insight into varied fields of mathematics. First developed over 50 years ago, the theory is not well known outside a relatively small circle of practitioners, mainly because there is no commonly known physical phenomenon that string theory addresses. Its main achievement is reconciling quantum mechanics with gravity, as manifested in Einstein’s general relativity. While both those theories are over 100 years old, the details of neither are widely taught, and thus the solution to the problem of quantizing gravity does not easily catch the imagination.

String theory also lacks iconography. The image most associated with quantum physics is an atom formed of a nucleus with electrons zipping around it (see Fig. 1). While a very poor representation of an atom’s wave function, it is related to Bohr’s “old quantum mechanics” model. It is also appropriate, as the understanding of the structure of the atom was intertwined with the development of quantum mechanics, and quantum physics is applicable mostly on the atomic scale. A common image for general relativity is of a curved surface, stretched by the presence of a star or planet. It is actually a very good illustration of the three fundamental concepts of general relativity: that spacetime is curved and not flat, that this is the essence of the gravitational force, and that spacetime itself is dynamical; thus its curvature is determined by the objects inhabiting it.

Fig. 1 Common images used to represent an atom (left) and the curving of space in general relativity (right). (Images: http://pixabay.com, http://commons.wikimedia.org/wiki/File:Spacetime_lattice_analogy.svg)

So what is the art of string theory? Let me propose three answers:

• Foremost, it is the theory itself, as the collective creative endeavor of thousands of researchers.
• The imagery and metaphors used by science communicators and popularizers to convey the basic ideas of string theory.
• Works of art inspired by string theory.

This chapter focuses on the last of these, in particular ceramic art created by the author and directly inspired by his own research projects within string theory, but it touches on the other two as well.

The next section is a brief historical review of string theory, with particular emphasis on the topics that inform the research and, through it, the art presented later. String theory, as the construct of pure thought and imagination guided by physical principles and ideas of harmony, should really be considered a work of art in its own right. This brief review is far from doing it justice and is surely incapable of demonstrating to the reader its true elegance and beauty.

The topics covered are the inception of string theory, originally known as the dual resonance model, as a theory of hadrons (subatomic particles), and the realization that this model is a theory of strings and that it contains gravity. Then comes the first superstring revolution, the developments of the 1980s, in which supersymmetry was incorporated into string theory and many conceptual and technical obstacles were overcome. It led to the conjecture that the universe is in fact described by a superstring theory in ten dimensions, comprising our familiar four-dimensional spacetime and a tiny compact six-dimensional manifold, known as a Calabi-Yau manifold. The next period of rapid development was the mid to late 1990s, known as the second superstring revolution. This involved the understanding of other crucial objects in string theory, known as D-branes, which led to a less string-centric view of the theory and consequently to the unveiling of connections between previously disparate string models. D-branes were also instrumental in connecting string theory to regular quantum field theories (QFT), the bread and butter of particle physics, thus forming a larger web of interconnected theories. This web ranges from strongly gravitating theories describing black holes and the structure of entire spacetimes to very quantum theories like those studied by experimentalists at CERN and other laboratories. The most famous realization of this is the AdS/CFT correspondence, or the gauge-gravity duality, which is discussed in some detail, as a lot of the research realized in the art in the final section is related to it.

The subsequent section talks briefly about common images used to describe string theory and tries to analyze what their purpose is and to what extent they achieve it. As should be obvious after reading the preceding section, string theory is vast and immensely complicated, which frustrates these attempts. In fact, of the two most common kinds of image used to describe string theory, one involves straightforward images of string interactions and fails to show the intricacies and richness of the theory. The other seems to focus mostly on illustrating the complexity itself.

The following section focuses on the ceramic art of the author and its relation to his research. For every research project that he undertakes, the author produces a series of ceramic pieces whose shape and decoration are informed by the science. Often these pieces are produced in parallel with the research project itself, hence representing different stages in the development of the ideas and theories, where refined results and calculations are manifested in finer materials and decoration techniques. These works do not attempt to represent string theory as a whole, but rather particular research projects. As such their scope is much more intimate, allowing for wider freedom of expression.

Glimpses into String Theory

Here is a brief history of string theory with emphasis on points that play a role in the later discussion. While it aims to be accurate, it is far too brief to properly demonstrate any of the points mentioned; that would require an entire monograph. The interested reader can easily find a variety of resources to delve deeper into the topic. In any case, what is presented in this section is not required in order to follow the next two sections.

Genesis

The inception of string theory is traced back to the late 1960s and the work of Veneziano (1968), who wrote down a formula with interesting properties that matched some rough relations observed between the masses and other properties of hadrons, certain subatomic particles detected in experiments. A model that reproduced the Veneziano amplitude was then discovered to be a string theory (Nambu, 1969; Nielsen, 1969; Susskind, 1970). The intuition behind this is rather elegant: quarks, the constituents of protons, neutrons, and other hadrons, are the endpoints of strings. The fact that quarks are never observed alone but always in pairs or triples is because the string has another end.

Two developments combined to turn string theory on its head. First, a better candidate as a model for hadrons was discovered in quantum chromodynamics (QCD), soon verified experimentally and still the accepted description of the strong interactions. Second, it was realized that while open strings (with endpoints) could resemble hadrons, the theory also contains closed strings (loops), whose spectrum includes particles that resemble the quantum particles of gravity: gravitons. While a theory of quantum gravity was lacking, the phenomena of the strong interactions and of gravity are experimentally not closely related, whereas in string theory they would be.


These two developments meant that string theory was no longer a candidate theory of elementary particles, or an alternative to quantum field theory (of which QCD is an example), and left it as a candidate for a unified theory of quantum gravity. In its early manifestations, it suffered from several technical difficulties that prevented it from being a proper candidate to describe our universe, and those were only overcome in the period that has come to be known as the first superstring revolution.

First Superstring Revolution

The mid-1980s was a period of rapid development in string theory. Instead of describing the problems and their resolutions, the following is a rough picture of the understanding of string theory after the revolution and of the hope at the time that it would describe our universe. There are five main consistent versions of superstring theory:

• Type I
• Type IIA
• Type IIB
• Heterotic E8 × E8
• Heterotic spin(32)

All those theories are consistent only in ten-dimensional spacetime, and all have supersymmetry, a symmetry between particles with integer spin and particles with half-integer spin (or Grassmann even and Grassmann odd). Otherwise they have all kinds of different properties. For example, while all those theories have closed strings, only Type I also has open strings. The details are not important for this narrative, but it was commonly assumed that nature should be described by the heterotic E8 × E8 theory, introduced in Gross et al. (1985).

To overcome the embarrassing fact that the theory is ten-dimensional, one needs to require that six of the spatial directions are curled up into tiny, as yet undetected, directions. This small compact manifold has to satisfy certain properties for the theory to be consistent and to manifest some supersymmetry in four dimensions. The simplest way to realize this is with manifolds known as Calabi-Yau (CY) (Calabi, 1957; Candelas et al., 1985; Yau, 1978). Starting from an almost unique theory in ten dimensions, the properties of the four-dimensional universe depend on the details of the Calabi-Yau manifold. In particular, the number of subatomic particles and the interactions between them are all determined by the details of the “compactification.” At the time not much was known about these manifolds, and it was hoped that there would be very few of them, or maybe a single appropriate one, and that it would determine our universe, the spectrum of particles, and all other details of the standard model of particle physics purely from string theory and the mathematics of the Calabi-Yau.


Unfortunately, this hope was not realized, as it turns out that there are plenty of CY manifolds and to date there is no obvious way to choose a particular one. Though it is still a favorite model to realize string theory in nature, it lost its expected predictive power. The choice of parameters in the standard model is now replaced with a choice of a particular CY.

Second Superstring Revolution

The second superstring revolution (mid to late 1990s) upended a lot of the orthodoxy of the first. In particular it was realized that the five superstring theories listed above are all related to each other. Instead of being isolated theories, they are particular points in a continuous space of possible theories. Within this larger framework, dubbed M-theory, the five old string theories are the particular points where strings become weakly coupled. That means that a string will propagate freely for a long time without splitting or joining. Moving away from these points amounts to increasing the string coupling, so strings split and join more readily. If strings easily morph, it is natural to ask whether they are still the most natural constituents of the theory, and it turns out that at a particular point within the continuum, the space actually looks 11-dimensional and the natural objects are membranes and not strings. This point, 11-dimensional supergravity theory, was then added as a sixth corner of M-theory (sometimes that word is used to refer only to the vicinity of this sixth point, the quantum version of the 11-dimensional theory). Thus strings lost their superiority within string theory, and instead one studies theories of strings, membranes, and objects of other dimensionality, many of which are known as D-branes (brane being a generalization of membrane).

Strings have one spatial and one time direction and are usually studied through their embedding into spacetime. So one chooses intrinsic coordinates on this two-manifold, say $(\sigma, \tau)$, and defines the string in spacetime by the ten functions $X^M(\sigma, \tau)$ with $M = 0, \dots, 9$. This description is very general and has a lot of redundancy, coming from the freedom to reparameterize the surface, but it is a very useful one. In this language we can view string theory as a two-dimensional rather than ten-dimensional theory, a theory of a string and the spacetime it is in, rather than the spacetime and the string within it. This point of view is part of what enabled the first superstring revolution. Two-dimensional theories are very rich but also very constrained and relatively easy to study.

Since strings have lost their primacy, one needs to study the theories describing the embedding of all the other objects that exist in M-theory. That includes in particular three-, four-, five-, and six-dimensional theories. If we consider N coincident D-branes and can choose a regime of the theory where they are weakly coupled, the proper description of it is in terms of what is known as a gauge theory, a very close analog of QCD, the theory that replaced string theory as the description of the strong interactions. It is an interesting twist in our tale that the theory that supplanted string theory is actually part of it.


The name chromodynamics comes from the fact that quarks, the constituents of hadrons, come in three varieties, which were called colors (they can also have different “flavors,” but that is not important here). One can easily define the theory with a different number of colors, though nature realizes three. In string theory (taken henceforth to be synonymous with M-theory), one realizes this with the number N of coincident D-branes. For N = 0, there is nothing. N = 1 is very simple, as it is just the interaction of the brane with itself, and it is actually very close to Maxwell’s electromagnetism. For N = 2, 3, ... things get progressively more complicated, except that there is a way to take the N → ∞ limit, where things simplify somewhat. This is not very close to the real value of N = 3 in QCD, but theorists take the liberty to play with the parameters and study the simplified version of the theory. Some of what is learnt there can then be interpolated to finite N.

AdS/CFT Correspondence

A remarkable discovery due to Maldacena (1999) is that the theory of N coincident D-branes has an alternative description. By tuning a parameter in the theory, the original string theory description becomes such that gravity is strong and the large number of branes bends spacetime. The result, known as the AdS/CFT correspondence (or gauge-gravity duality), is that in this regime the gauge theory is better described by a gravitational theory on a curved spacetime, known as anti-de Sitter (AdS) space.

AdS space is a pseudo-Riemannian space with constant negative curvature. Its Euclidean version, EAdS, is more commonly known in mathematics as hyperbolic space. Hyperbolic geometry was the source of a revolution in nineteenth-century geometry, namely, the understanding that the parallel axiom is independent of the others. The simplest examples of it are in two dimensions and go under the names Lobachevsky plane or Poincaré disk.

A great deal of work has gone into realizing this duality explicitly: finding more examples of dual field theories and gravity theories, understanding how to translate every calculation between the field theory and gravity (or string theory) descriptions, trying to interpolate results between the two regimes, and more. A lot of the work presented in the final section of this chapter centers around this and other dualities.

This concludes the lightning review of string theory; for more details the reader is encouraged to find textbooks or popular science books and articles on the topic.

The Imagery of String Theory

In this section, I present some of the common images used to represent string theory and discuss their origin and meaning. Most of them appear, for example, on the Wikipedia page https://en.wikipedia.org/wiki/String_theory, but versions of them are ubiquitous on the Internet and in books, magazines, and lectures on the topic.


A Piece of String

The image in Fig. 2 shows a piece of thread and a rubber band, representing an open and a closed string. The basic objects in string theory were so named because of their resemblance to common strings. They should be much smaller and not made of cotton or rubber; rather, they should be the most fundamental building blocks of all other matter. Such images are not often used to illustrate string theory, possibly because they are too mundane. Instead strings are sometimes drawn as vibrating and/or fuzzy. Presumably, this is meant to convey the fact that different harmonics, i.e., vibration modes, of the string manifest as different particles. It could also indicate their quantum or complicated nature, compared to the simple rubber band.

Fig. 2 Embroidery thread and rubber band illustrating open and closed strings. (Image: Nadav Drukker)

Pants Diagram

The image in Fig. 3 is one of the most common graphical representations of string theory. If we view the vertical direction as time, then the single closed string at the bottom splits in two. Viewed as a waist and two legs, this is called a pants diagram. This image has a deep connection to the representation of string theory in terms of a two-dimensional model and is related to the mathematics of two-dimensional surfaces; see, for example, the chapter of Baik (2020). Versions of this image appear on the string theory Wikipedia page, the covers of many books, and countless conference posters and YouTube videos. According to my subjective impressions, this image has lost some of its popularity in recent years. One reason could be the decline in the supremacy of strings, and particularly of the worldsheet description, in string theory with the advent of dualities and D-branes. Another possibility is that the image (like the rubber band) is viewed as too simple to convey the richness of string theory.

Fig. 3 A string “pants diagram” showing a single closed string at the bottom splitting into a pair. (Image: Nadav Drukker)

Calabi-Yau Some of the most popular graphical representations of string theory in recent years are computer-generated images of complicated surfaces. See Fig. 4 for the version appearing on the string theory Wikipedia page (and reused in many other places). This image is of a two-dimensional cross section of a particular Calabi-Yau manifold. As mentioned above, these are six-dimensional spaces, so they cannot be easily visualized on paper, hence the need to choose a particular cross section. In most places where I have come across this and similar images, the image is not explained, as that would require enough knowledge of algebraic geometry to understand what a CY manifold is, the details of the particular one, and the cross section chosen. In reality, this image is not meant to convey those details but rather that string theory is complicated. In fact, in an informal survey that I took of my colleagues, all of whom have seen this image and many of whom have reproduced it, none knew its exact origin or meaning, except that it’s “probably a cross section of a CY manifold.”

Fig. 4 A two-dimensional cross section of the quintic Calabi-Yau manifold. (Credit: https://en.wikipedia.org/wiki/File:Calabi_yau.jpg)


N. Drukker

M.C. Escher It is well known that M.C. Escher incorporated a great deal of mathematics into his images. They include tessellations, Möbius strips, false perspectives, and more. One particular set of designs is based on the geometry of the hyperbolic plane; see https://mcescher.com/wp-content/uploads/2019/05/LW-436.jpg. The hyperbolic plane is an example of a (Euclidean) AdS space, so images of this type have come to symbolize the AdS/CFT correspondence, and they can illustrate quite a lot of the peculiarities of these geometries. First, we should note that these images are conformal projections of hyperbolic space, not the space itself. Angles are represented correctly, but like in all flat maps of our spherical world, distances are not. In the true metric of hyperbolic space, all the angels and demons in Escher’s image would be of the same size. Many calculations on the AdS side of the correspondence rely on the classical geometry/gravity of that space, involving geodesic length or minimal surfaces. The rough properties of those can be read off these figures, as they clearly illustrate the exponential growth of the area (or volume) of space as one approaches the boundary. This implies that geodesics and minimal surfaces tend to enter deep into the space, rather than stay near the boundary, where the lengths diverge.

Music A common non-visual metaphor for string theory is the string instrument. The basic mathematics of strings, namely their vibration patterns, is shared by the strings of string theory and those on a violin. One manifestation of that was the naming of the discoverers of the heterotic string theory (Gross et al., 1985) “The Princeton String Quartet.” It is also at the heart of the collaborative performance between Brian Greene, the author of a popular string theory book (Greene, 1999), and the Emerson String Quartet; see http://www.college.columbia.edu/cct_archive/sep99/12a.html.

Film and Television It may be worth mentioning that string theory is invoked in many sci-fi movies. It also plays a prominent role in the TV series “The Big Bang Theory,” as the protagonist, Sheldon Cooper, is a string theorist. Though the science in the series is often detailed and mostly accurate, it takes a minor role in the plots, and no clear facets or images of string theory (or the other branches of science featured on the show) permeated the Zeitgeist.


Ceramics Inspired by String Theory The remainder of this chapter is an exposition of the ceramic art the author has made, inspired by his own research within the topics of string theory, M-theory, and supersymmetric field theories. The section is divided into seven subsections, each devoted to a series of works inspired by one research project. The discussion of the motivation and philosophy behind the art, the topics of research, and technical ceramic details are spread across the different subsections, interspersed with images of the works.

Circle This series of works is based on a research paper with Nobel laureate David J. Gross (Drukker and Gross, 2001). The object studied is known as a Wilson loop (Wilson, 1974) and in this case has the shape of a circle, hence the name of the series. All works in the series are plates, thus representing the circle, and indeed the form of the pieces in each series of works described below is related to the associated research. The plates are decorated with certain details of the calculations and/or details from the research. Circle-3, see Fig. 5, shows some Feynman diagrams, graphical illustrations of some integrals common in quantum field theory, which arise in the context of the calculation in the paper. You can also see the unique identifier hep-th/0010274 of the paper on the online repository http://arxiv.org.

Fig. 5 Circle-3 detail. Thrown, incised stoneware with celadon glaze, 33 × 33 × 3 cm. (Photo: Nadav Drukker)


Fig. 6 Circle-10. Thrown porcelain, slip inlay and clear glaze, 35 × 35 × 3 cm. (Photo: Nadav Drukker)

In Fig. 6 you can see another plate in the series, called Circle-10. This one is made of porcelain with blue porcelain inlay. One of the main results of the paper is a function of two parameters $\lambda$ and $N$, given by

\[ \frac{1}{N}\, L^{1}_{N-1}\!\left(-\frac{\lambda}{4N}\right) e^{\lambda/8N} \qquad (1) \]

where $L^{\alpha}_{n}$ are generalized Laguerre polynomials. This expression can also be seen in Fig. 5. This function can be Taylor expanded in powers of $\lambda$ and inverse powers of $N$. Note that the dependence on $N$ is subtle, because it appears also in the subscript of $L$, but at every order in $\lambda$ there is a finite expansion in $1/N$. This expansion up to $\lambda^{18}$ is written on Circle-10. As it is often assumed that $N$ is large, higher negative powers of $N$ are subdominant. This is manifested in the shade of blue, which is darkest for $O(N^0)$, made of porcelain with 3% cobalt carbonate. Subsequent terms in the $1/N$ expansion are progressively fainter, made of more and more diluted colored porcelain. While this and all the projects below are within string theory, it should be clear that these works are not aiming to address a grand question like “what is string theory” (Polchinski, 1994). Rather they offer a small glimpse into a corner of the theory exemplifying some themes in theoretical and mathematical physics.
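The Taylor expansion described above can be reproduced for any fixed integer N with exact rational arithmetic. The following is a minimal sketch, not the procedure used for Circle-10: it assumes the standard closed form of the generalized Laguerre polynomial, L^α_n(x) = Σ_k (−1)^k C(n+α, n−k) x^k / k!, and the function name `circle_series` is purely illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def circle_series(N, order):
    """Coefficients c_0..c_order of lambda^n in (1/N) L^1_{N-1}(-lambda/4N) exp(lambda/8N).

    Assumption (not from the chapter): the closed form
    L^1_{N-1}(x) = sum_k (-1)^k C(N, k+1) x^k / k!.
    At x = -lambda/(4N) the alternating signs cancel.
    """
    # Laguerre factor, already divided by N; comb() returns 0 once k + 1 > N,
    # so the polynomial terminates by itself.
    lag = [Fraction(comb(N, k + 1), factorial(k) * (4 * N) ** k) / N
           for k in range(order + 1)]
    # Exponential factor exp(lambda / 8N), truncated at the requested order.
    exp = [Fraction(1, factorial(m) * (8 * N) ** m) for m in range(order + 1)]
    # Cauchy product of the two truncated power series.
    return [sum(lag[k] * exp[n - k] for k in range(n + 1))
            for n in range(order + 1)]


if __name__ == "__main__":
    for N in (1, 2, 3, 10):
        print(N, circle_series(N, 3))
```

In this sketch the coefficient of λ comes out as 1/8 for every N, and the coefficient of λ² as 1/192 + 1/(384 N²), which illustrates the statement that each order in λ has a finite expansion in 1/N.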

Cusp The “Cusp” series grew out of a collaboration with Valentina Forini (Drukker and Forini, 2011). It too revolves around Wilson loops in the same theory as the “Circle” series, but with a more complicated geometry: two rays meeting at a corner or two arcs meeting at two corners. The latter is the cross section of the pieces in this series and hence the main motivation for the design. The paper approaches the problem in two different ways, one valid at weak coupling, using similar Feynman diagrams as


Fig. 7 Cusp-1. Slab-built, incised stoneware, red iron oxide wash and celadon glaze, 28 × 23 × 7 cm. (Photo: Nadav Drukker)

in Fig. 5. The other uses string theory via the AdS/CFT correspondence to calculate it at strong coupling in terms of a minimal string surface in hyperbolic space. Those two approaches to the problem are represented on the two faces of the pots. In Fig. 7 one can see some of the results of the minimal surface calculations. The previous example of the circle had two parameters λ and N (there is no dependence on the radius, because the calculation is done in a conformal, i.e., scale invariant, theory). In this case there is a conformally invariant quantity which is the angle at the corner, here measured to be roughly φ ∼ 2.13. The results of the calculations are then written on this pot evaluated for this value. In this sense some more mathematical processing was put into producing the pot than was in the original paper, as in the Taylor expansion on Circle-10 in Fig. 6. Figure 8 shows the “weak coupling” side of another piece in this series, Cusp-18. One can see some of the Feynman diagrams (similar to those in Fig. 5) that go into the calculation, as well as the final results. More details about the “Cusp” series can be found in Drukker (2019).


Fig. 8 Cusp-18. Slab-built stoneware, incised colored slip and celadon glaze, 33 × 30 × 7 cm. (Photo: Nadav Drukker)

Sewing The series “Sewing” attempts to illustrate the process of constructing two-dimensional manifolds (Riemann surfaces) out of basic building blocks, known as pairs of pants, closely related to the discussion in the chapter (Baik, 2020). This is the main mathematical tool used in the paper (Drukker et al., 2009) which deals with the problem of classifying line operators, like the Wilson loops in the previous two examples, but in other theories. The theories studied are in four dimensions and arise from considering a six-dimensional theory on a compact two-manifold. In the same way that four-dimensional physics arises from string theory on a six-dimensional Calabi-Yau, here the starting point is in six dimensions, and the resulting four-dimensional theory is determined by a choice of two-manifold. As mentioned previously, most two-dimensional manifolds can be constructed by sewing together pairs of pants as in Fig. 3. There are normally an infinite number of ways of decomposing a surface into pairs of pants, and in this context different decompositions correspond to different four-dimensional theories. Two four-dimensional theories arising from different decompositions of the same manifold are related to each other in a very non-trivial manner, as discussed in Gaiotto (2012). Figure 9 shows the simplest surface that can be constructed out of a single pair of pants, by sewing two of the legs. It is the torus with a single hole. The same surface is shown here in three inequivalent “pants decompositions.” The solution to the problem studied in Drukker et al. (2009) is an elaboration of this. It is not merely the pants decomposition of a surface but how to describe non-self-intersecting curves on the surface. This is a problem that was solved by Dehn (1987) and independently by Thurston (1988) and amounts to assigning a pair of integers satisfying some constraints to each sewn circle. The numbers represent the number of times the curve crosses the circle and how much it twists around it. The red yarn in Fig. 9 represents the curve, which is parallel to the cut in the photo on the left, crosses the cut perpendicularly in the middle, and crosses it with a single twist on the right. Details of these “pants decompositions” and their gauge theory avatars (Wilson, ’t Hooft, and dyonic loops) are inscribed on them. Figure 10 shows a five-punctured sphere made of three pairs of pants. The details of that example were not worked out in the paper (Drukker et al., 2009) and had to be worked out independently in order to write them on this piece.

Fig. 9 Sewing-4j. Wheel-thrown and altered stoneware, incised, iron oxide wash, cobalt blue glaze, and yarn. Triptych. Each 27 × 18 × 9 cm. (Photos: Nadav Drukker)

Threehalves Two of the series involve spherical vessels with cuts through them: “threehalves” and “cut.” Both are related to calculations of three-dimensional theories on $S^3$, the three-sphere, so essentially some properties of a universe that is a compact sphere. Relying on supersymmetry and some detailed analysis, the calculation is reduced to something known as a matrix model (which in fact is also true of the “circle” project discussed above). A matrix model is an integral over an ensemble of matrices with a prescribed measure. In many cases it can be reduced to an integral over the eigenvalues of the


Fig. 10 Sewing-6. Thrown and assembled stoneware with iron oxide wash, shino and blue glazing and leather straps, 63 × 42 × 35 cm. The 5-punctured sphere is assembled from three “pairs of pants.” (Photo: Nadav Drukker)

matrix. For an $N \times N$ matrix, we get $N$ eigenvalues subject to some interactions and potential. In the calculations in Drukker et al. (2011), which is the inspiration for “threehalves,” the eigenvalues tend to coalesce around two regions in the complex plane, the intervals $[-b, -1/b]$ and $[1/a, a]$. In the large $N$ limit, the eigenvalues form a dense subset of these intervals, and it makes sense to discuss the eigenvalue density $\rho(x)$, which is nonzero on these intervals. Starting with a function

\[ \omega(z) = \frac{1}{N} \sum_{n} \frac{1}{z - z_n} , \qquad (2) \]

where $z_n$ are the eigenvalues, the limit replaces the poles with a pair of branch cuts. They are represented by two physical cuts in the pots. If we want to know how many eigenvalues are in each interval, we can use the Cauchy integral formula


Fig. 11 Threehalves-6 two details. Wheel-thrown and altered stoneware, incised, violet matt glaze. 18 × 17 × 17 cm. (Photos: Nadav Drukker)

\[ \frac{N_i}{N} = 2 t_i = \frac{1}{2\pi i} \oint_{C_i} \omega(z)\, dz , \qquad (3) \]

where $C_i$ is a contour around the $i$th cut. Those are known as period integrals. The basic structure of those is shown in Fig. 1 of Drukker et al. (2011) and reproduced on the pots; see the left image in Fig. 11. The formulas are on the other side of the pot, shown in the right image.
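At finite N the counting property of the period integral can be checked numerically: integrating the resolvent ω(z) around a closed contour and dividing by 2πi gives the fraction of eigenvalues enclosed. The sketch below is only an illustration under stated assumptions: the eigenvalue positions are made up rather than taken from the matrix model, the contour is a circle discretized with the trapezoidal rule, and the names `resolvent` and `filling_fraction` are hypothetical.

```python
import cmath


def resolvent(z, eigenvalues):
    """omega(z) = (1/N) * sum_n 1 / (z - z_n) for a finite list of eigenvalues."""
    return sum(1 / (z - zn) for zn in eigenvalues) / len(eigenvalues)


def filling_fraction(eigenvalues, center, radius, samples=2000):
    """Approximate (1 / 2 pi i) * contour integral of omega(z) dz on a circle.

    The circle z = center + radius * exp(i theta) is sampled uniformly;
    the result is the fraction of eigenvalues inside the circle.
    """
    total = 0j
    dtheta = 2 * cmath.pi / samples
    for k in range(samples):
        w = cmath.exp(1j * k * dtheta)
        z = center + radius * w
        dz = 1j * radius * w * dtheta  # dz = i r e^{i theta} d theta
        total += resolvent(z, eigenvalues) * dz
    return (total / (2j * cmath.pi)).real


if __name__ == "__main__":
    # Illustrative eigenvalues only: three on the negative axis, two on the positive.
    sample = [-2.0, -1.5, -1.0, 1.2, 1.6]
    print(filling_fraction(sample, center=-1.5, radius=1.0))
    print(filling_fraction(sample, center=1.4, radius=1.0))
```

With these made-up eigenvalues the two contours enclose three and two of the five poles, so the two results should be close to 3/5 and 2/5.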

Cut The “Cut” series is based on Anderson and Drukker (2017), which is a follow-up paper to that of “threehalves” and the logic of the design is similar, just that in this case the eigenvalues condense to a single interval, hence the single cut. Some of the pieces in this series were made in parallel to the research itself so they are final artworks representing draft calculations. The state of the calculation is manifested by using rough techniques, like the stoneware clay and messy shino glazing on Cut-4 in Fig. 12 which in fact was applied so thick that it obscured some of the mistakes in the early calculations and even sealed the cut. The final result of a research project is a published (peer-reviewed) paper. Its artistic avatars employ much finer techniques and more elegant writing, like Cut-20 in Fig. 13 which was decorated with colored slip inlay and precious metal lusters. The numbers at the base of the piece, which can be seen on the right image, are the full bibliography of the paper. This large spherical porcelain pot owes


Fig. 12 Cut-4. Thrown and altered stoneware with shino glaze 26 × 25 × 25 cm. (Photo: Nadav Drukker)

Fig. 13 Cut-20 two views, wheel-thrown and altered porcelain, colored slip inlay, clear glaze and gold and platinum lusters. 27 × 28 × 28 cm. (Photos: Nadav Drukker)

to techniques refined for the Asian moon jars, and likewise the science relies on previous work that should be properly referenced. Indeed another artistic reference is the canvases (and ceramics) of Fontana, who upended modern art by turning the canvas and the gaps in it into the work of art rather than just a substrate. Here the integrity of the pot as both a ceramic vessel and a writing medium is violated in a similar way.


Fig. 14 Anomaly-7. Slab-built stoneware with blue slip, cobalt wash and matte white glaze 34 × 38 × 3 cm. (Photo: Nadav Drukker)

Anomaly Several of the series above were related to research into Wilson loops and their dual description as the strings of string theory via the AdS/CFT correspondence. The “Anomaly” series is related to the research in Drukker et al. (2020a,b) on two-dimensional generalizations of loop operators in a six-dimensional theory, whose realization using the AdS/CFT correspondence is the M2-branes of M-theory. While studying arbitrary Wilson loops is very hard, and the focus was on a circle or a cusp, for the surfaces it is possible to calculate some quantities, known as anomalies, which are valid for any shape. This is realized by the very bent sheet in Fig. 14. As in the cusp series, there are several different approaches to calculating the anomalies, and on this piece one is written in blue over white in the troughs and the other in white over blue on the peaks. Like Cut-4 in Fig. 12, this piece was also made while the calculation was still ongoing, and indeed there are some partial results written on it that were later realized to be mistaken.

Subsurface One continuation of the “Anomaly” project is still ongoing (Drukker, 2020) at the time of writing this chapter. This is related to restricted classes of two-dimensional surfaces, which are constrained to be within particular higher dimensional submanifolds of $\mathbb{R}^6$, for example, a three-sphere $S^3$. This inspired the spherical generalization of the “Anomaly” series into “Subsurface”; see Fig. 15. As this piece


Fig. 15 Subsurface-2. Wheel-thrown and altered stoneware, incised, red iron oxide wash, shino and cobalt blue glazes. 24 × 25 × 25 cm. (Photo: Nadav Drukker)

was done at the very early stages of the project, the writing includes mainly ideas and postulates with very preliminary calculations. This is also the motivation for the very messy glaze. If the research comes to fruition, there will be more refined versions of this work, and if not, the early pieces in this series will remain as the evidence of an incomplete research project, as has happened with other projects.

Conclusions This chapter focused on the ceramic works of the author which are inspired by his research in string theory. It set off with a lightning review of some main topics in string theory (and its cousin, M-theory). A very rudimentary selection of some other graphical and artistic realizations of string theory was presented in the next section. The bulk of the chapter was devoted to presenting seven series of ceramic works, each comprising 5–30 pieces all with similar forms but differing in the material, size, writing, and glazing. Each of the series is illustrated by 1–2 figures to show the main design elements and explore some of the pottery techniques used across the series. While the topics of research that these works are inspired by are all within string theory, this aspect can be lost when looking at the pots or even when reading the description. They are meant to provide a more intimate view of small corners of the theory and not necessarily deal with grand questions and certainly not represent all of string theory. This allows for each series or particular piece to highlight some


topic in mathematics and physics like Feynman diagrams, minimal surfaces, Taylor expansion, pants decompositions, matrix models, branch cuts, and more. Not treating string theory as a monolith affords the artist more freedom of expression and enables the viewer, the sci-art enthusiast, to approach the topic from a multitude of angles while digesting small nuggets. The medium of clay is ideal for the combination of sculptural and two-dimensional writing possibilities. The forms are chosen to be related in some way to the research, as are design elements in the decoration. These are abstractions of the physics, but the writing itself is firmly grounded in the language of mathematics. The formulas all arise in the research, as draft calculations, final results lifted from the published paper, or further elaboration on them done for the purpose of the art piece. The philosophy and semiotics behind the works are further explored in Drukker (2020). Works from these series, including some of those in the photos, have appeared in multiple solo and group exhibitions, both those dedicated to art and science and purely art exhibitions, and have generally garnered positive reviews. They are meant to be, and hopefully are perceived as, realizations of the deep interconnections between the mathematical sciences and art – the most creative of human endeavors.

References
Anderson L, Drukker N (2017) More large N limits of 3D gauge theories. J Phys A 50(34):345401. https://doi.org/10.1088/1751-8121/aa7e11, http://arxiv.org/abs/1701.04409
Baik H (2020) Almost all surfaces are made out of hexagons. Springer International Publishing, Cham, pp 1–6. https://doi.org/10.1007/978-3-319-70658-0_111-1
Calabi E (1957) On Kähler manifolds with vanishing canonical class. In: Algebraic geometry and topology. Princeton University Press. https://doi.org/10.1515/9781400879915-006
Candelas P, Horowitz GT, Strominger A, Witten E (1985) Vacuum configurations for superstrings. Nucl Phys B 258:46–74. https://doi.org/10.1016/0550-3213(85)90602-9
Dehn M (1987) On curve systems on two-sided surfaces, with application to the mapping problem. Springer, New York, pp 234–252. https://doi.org/10.1007/978-1-4612-4668-8_14, translated from the German by John Stillwell
Drukker N (2019) Ceramics inspired by string theory. In: Goldstine S, McKenna D, Fenyvesi K (eds) Proceedings of Bridges 2019: mathematics, art, music, architecture, education, culture. Tessellations Publishing, Phoenix, pp 311–318. Available online at http://archive.bridgesmathart.org/2019/bridges2019-311.pdf
Drukker N (2020) Representing theoretical physics research in and on ceramics. Leonardo (ja):1–12. https://doi.org/10.1162/leon_a_01908
Drukker N, Forini V (2011) Generalized quark-antiquark potential at weak and strong coupling. JHEP 06:131. https://doi.org/10.1007/JHEP06(2011)131, http://arxiv.org/abs/1105.5144
Drukker N, Gross DJ (2001) An exact prediction of N = 4 SUSYM theory for string theory. J Math Phys 42:2896–2914. https://doi.org/10.1063/1.1372177, http://arxiv.org/abs/hep-th/0010274
Drukker N, Trépanier M (2020) Observations on BPS observables in 6d, arXiv:2012.11087. https://arxiv.org/abs/2012.11087
Drukker N, Morrison DR, Okuda T (2009) Loop operators and S-duality from curves on Riemann surfaces. JHEP 09:031. https://doi.org/10.1088/1126-6708/2009/09/031, http://arxiv.org/abs/0907.2593
Drukker N, Mariño M, Putrov P (2011) From weak to strong coupling in ABJM theory. Commun Math Phys 306:511–563. https://doi.org/10.1007/s00220-011-1253-6, http://arxiv.org/abs/1007.3837
Drukker N, Probst M, Trépanier M (2020a) Defect CFT techniques in the 6D N = (2, 0) theory. http://arxiv.org/abs/2009.10732
Drukker N, Probst M, Trépanier M (2020b) Surface operators in the 6D N = (2, 0) theory. J Phys A 53(36):36. https://doi.org/10.1088/1751-8121/aba1b7, http://arxiv.org/abs/2003.12372
Gaiotto D (2012) N = 2 dualities. JHEP 08:034. https://doi.org/10.1007/JHEP08(2012)034, http://arxiv.org/abs/0904.2715
Greene BR (1999) The elegant universe: superstrings, hidden dimensions, and the quest for the ultimate theory. Norton, New York
Gross DJ, Harvey JA, Martinec EJ, Rohm R (1985) The heterotic string. Phys Rev Lett 54:502–505. https://doi.org/10.1103/PhysRevLett.54.502
Maldacena JM (1999) The large N limit of superconformal field theories and supergravity. Int J Theor Phys 38:1113–1133. https://doi.org/10.1023/A:1026654312961, http://arxiv.org/abs/hep-th/9711200
Nambu Y (1969) Quark model and the factorization of the Veneziano amplitude. In: International conference on symmetries and quark models, Wayne State U., Detroit, pp 269–278
Nielsen HB (1969) An almost physical interpretation of the dual n point function. Nordita preprint; unpublished
Polchinski J (1994) What is string theory? In: NATO advanced study institute: Les Houches summer school, session 62: fluctuating geometries in statistical mechanics and field theory. hep-th/9411028
Susskind L (1970) Dual-symmetric theory of hadrons. 1. Nuovo Cim A 69S10:457–496
Thurston WP (1988) On the geometry and dynamics of diffeomorphisms of surfaces. Bull Amer Math Soc (NS) 19(2):417–431. https://projecteuclid.org:443/euclid.bams/1183554722
Veneziano G (1968) Construction of a crossing-symmetric, Regge behaved amplitude for linearly rising trajectories. Nuovo Cim A 57:190–197. https://doi.org/10.1007/BF02824451
Wilson KG (1974) Confinement of quarks. Phys Rev D 10:2445–2459. https://doi.org/10.1103/PhysRevD.10.2445
Yau ST (1978) On the Ricci curvature of a compact Kähler manifold and the complex Monge-Ampère equation. I. Commun Pure Appl Math 31(3):339–411

Cutting, Gluing, Squeezing, and Twisting: Visual Design of Real Algebraic Surfaces

29

Stephan Klaus

Contents
From Algebraic Formulas to Geometric Forms: Real Algebraic Surfaces
Standard Constructions: Union, Intersection, and Smoothing
Morphing
Symmetry
Cutting and Gluing
Squeezing, Shifting, and Twisting
References

Abstract The theory of real algebraic surfaces studies the connection between the algebraic properties of a polynomial p(x, y, z) in three real variables and the geometric and topological properties of its set of zeros in three-space. We give an overview of some methods for creating interesting real algebraic surfaces which also look nice from the esthetic viewpoint. The surfaces are visualized using the free SURFER software.

Keywords Real algebraic surface · Variable elimination · Möbius strip · Trefoil knot · Octahedral symmetry · Polyhedron · Pentagon · Moduli space · Coordinate transformation · MSC: Primary 14-04 · 14H45 · 14H50 · 14J17 · 14P05 · 14Q10 · and 14Q30; Secondary 65D18 · 68U05 · 97G60 · 97G99 · 97H30 · and 97N80

S. Klaus () Oberwolfach Research Institute for Mathematics, Oberwolfach, Germany e-mail: [email protected] © Springer Nature Switzerland AG 2021 B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_118


S. Klaus and B. Sriraman

From Algebraic Formulas to Geometric Forms: Real Algebraic Surfaces Affine real algebraic surfaces are given as the set S of solutions of one equation p(x, y, z) = 0 in three real variables x, y, and z. Here, p(x, y, z) has to be a polynomial, and we interpret the variables as coordinates in three-dimensional space $\mathbb{R}^3$. As the variables have 3 degrees of freedom and the constraint removes 1 degree (as there is only one equation), the set S of solutions is generically a two-dimensional object, i.e., a curved smooth surface embedded in $\mathbb{R}^3$. This follows, e.g., from the well-known Theorem of Sard in differential topology (Milnor 1997). In the nongeneric case, the set of solutions can have singularities and can be quite different: For the zero polynomial p = 0, it is the whole space (i.e., three-dimensional); for the constant polynomial p = 1, it is the empty set; for $p = x^2 + y^2$, it is the z-axis (i.e., one-dimensional); and for $p = x^2 + y^2 + z^2$, it is the origin (i.e., zero-dimensional). The theory of real algebraic surfaces studies the connection between the algebraic properties of the polynomial p and the geometric and topological properties of the solution set S. All pictures in this article were created by the author with the freely available SURFER software (2008). This software also allows the use of free parameters a, b ∈ [0, 1] in polynomials, which can be set by scroll bars (sometimes, we even use a third parameter c). This is very useful for experiments and for optimizing the graphical result. Note that we additionally give equations in a format which can be used directly in the SURFER software by copy and paste to the command line. Clearly, linear polynomials give planes. Quadratic polynomials already give interesting surfaces:

Sphere: x^2 + y^2 + z^2 - 1 = 0
Cylinder: x^2 + y^2 - 1 = 0
Double cone: x^2 + y^2 - z^2 = 0
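The three quadrics listed above can also be checked point by point outside of SURFER. The following is a small sketch in plain Python, independent of SURFER itself; the helper `on_surface` and its tolerance are assumptions made for illustration only.

```python
import math


def sphere(x, y, z):
    return x**2 + y**2 + z**2 - 1


def cylinder(x, y, z):
    return x**2 + y**2 - 1


def double_cone(x, y, z):
    return x**2 + y**2 - z**2


def on_surface(p, point, tol=1e-12):
    """True if the polynomial p vanishes at the point, up to rounding error."""
    return abs(p(*point)) <= tol


if __name__ == "__main__":
    t, h = 0.7, 2.0
    # Any height h works on the cylinder, since z does not appear in its equation.
    print(on_surface(cylinder, (math.cos(t), math.sin(t), h)))
    # On the double cone the radius in the x-y-plane equals |z|.
    print(on_surface(double_cone, (h * math.cos(t), h * math.sin(t), -h)))
    # The origin lies on the double cone (its singular point) but not on the sphere.
    print(on_surface(double_cone, (0.0, 0.0, 0.0)), on_surface(sphere, (0.0, 0.0, 0.0)))
```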


The explanation of why these formulas give the above forms is very easy: $x^2 + y^2 + z^2$ is the square of the distance d from a point in space to the origin, thus the formula yields the sphere as the set of points of constant distance d = 1. $x^2 + y^2$ is the square of the distance from a point in the x-y-plane to the origin, thus the formula yields the circle in the plane and the cylinder in space, as the z-coordinate can take arbitrary values. In order to understand the third example, we first consider the equation $(x + z)(x - z) = x^2 - z^2 = 0$, which has as solution the two diagonals $x \pm z = 0$ in the x-z-plane. The additional term $y^2$ introduces rotational symmetry in the x-y-plane, i.e., it rotates a St Andrew’s cross around the z-axis, and this yields a double cone. These examples show that one can create interesting forms from simple formulas. Of course, one needs more complicated polynomials for more complex forms. In Hartkopf and Matt (2012, 2013), one can find further examples and information, connecting mathematics and art. The author was also involved in this topic, in particular in its connections with mathematical research: In Klaus (2009, 2019), we found formulas for multiple twisted Möbius strips, which were considered impossible before (in fact, these constructions are very thin closed surfaces, thus a fake construction in some sense):

In Klaus (2010), we produced the solid trefoil knot as an algebraic surface of degree 14, which led us to further research (e.g., Klaus 2014, 2017b) in knot theory. Some results are visualized in the author’s gallery on knots (Gallery 2012):

In Klaus and Violet (2015a), Bianca Violet and the author visualized interesting families of convex polytopes which interpolate between a cube, a dodecahedron,


and an octahedron:

2-parameter family of convex polyhedra: (a*x + b*y + z)^16 + (-a*x + b*y + z)^16 + (a*y + b*z + x)^16 + (-a*y + b*z + x)^16 + (a*z + b*x + y)^16 + (-a*z + b*x + y)^16 - 1 = 0

In Klaus and Kojima (2019), Sadayoshi Kojima and the author examined the moduli space of pentagons as an algebraic surface:

Some of these results are also described in the conference paper (Klaus 2017a) in the Festschrift in honor of Gert-Martin Greuel, who invented the SURFER software with colleagues in 2008. Last but not least, the author worked with Andreas Matt, Bianca Violet, José-Francisco Rodrigues, and other colleagues on mathematical short movies (Movie of CIM and MFO 2010; Klaus and Violet 2015b, 2016) where dynamical deformations of algebraic surfaces play the main role. The two movies (Klaus and Violet 2015b, 2016) were presented at the Bridges Short Film Festivals 2015 and 2016. It should be mentioned that the topic is also highly interesting from the perspective of education in mathematics because of its strong esthetic flavor. The author gave several lectures and courses on algebraic surfaces and SURFER in high schools in Germany, Portugal, and Greece. In the following sections, we give further constructions of interesting algebraic surfaces. Some of them are well-known to the experts; some others seem to be new.


As acknowledgment, I would like to thank Gert-Martin Greuel, Sadayoshi Kojima, Sofia Lambropoulou, Andreas Daniel Matt, José-Francisco Rodrigues, and Bianca Violet for the fruitful cooperation in the mentioned projects.

Standard Constructions: Union, Intersection, and Smoothing These constructions are simple and well-known. As a product is zero if and only if at least one factor is zero, the product of two polynomials $p_1 p_2$ gives the union $S_1 \cup S_2$ of their associated surfaces. Similarly, the sum of squares $p_1^2 + p_2^2$ is zero if and only if each polynomial is zero; thus, it gives the intersection $S_1 \cap S_2$ of their associated surfaces. Unfortunately, the generic intersection of two surfaces is a one-dimensional object, i.e., a curve, which cannot be visualized by the SURFER software. Here, the Theorem of Sard comes into play again. If we have an equation p(x, y, z) = 0 with singularities, a small deformation $p(x, y, z) = \epsilon$ kills the singularities such that we get a smooth surface. It follows that $p_1^2 + p_2^2 - \epsilon = 0$ (for small positive $\epsilon$) defines a tubular surface around the one-dimensional intersection $S_1 \cap S_2$ of the surfaces which are associated to $p_1$ and $p_2$. Here are some examples of these constructions:

Union of two cylinders: (x^2 + y^2 - 1)*((x - 4*a + 2)^2 + z^2 - 1) = 0
Smoothing of intersection of two cylinders: (x^2 + y^2 - 1)^2 + ((x - 4*a + 2)^2 + z^2 - 1)^2 - b = 0

Smoothing of intersection of two cylinders :

868

S. Klaus and B. Sriraman

(xˆ2 + yˆ2 − 1)ˆ2 + ((x − 4 ∗ a + 2)ˆ2 + zˆ2 − 1)ˆ2 − b = 0

Smoothing of eight touching spheres :
((x − 1)ˆ2 + (y − 1)ˆ2 + (z − 1)ˆ2 − 1)
∗((x + 1)ˆ2 + (y − 1)ˆ2 + (z − 1)ˆ2 − 1)
∗((x − 1)ˆ2 + (y + 1)ˆ2 + (z − 1)ˆ2 − 1)
∗((x + 1)ˆ2 + (y + 1)ˆ2 + (z − 1)ˆ2 − 1)
∗((x − 1)ˆ2 + (y − 1)ˆ2 + (z + 1)ˆ2 − 1)
∗((x + 1)ˆ2 + (y − 1)ˆ2 + (z + 1)ˆ2 − 1)
∗((x − 1)ˆ2 + (y + 1)ˆ2 + (z + 1)ˆ2 − 1)
∗((x + 1)ˆ2 + (y + 1)ˆ2 + (z + 1)ˆ2 − 1) − 1000 ∗ a = 0
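As a quick numerical illustration of the three constructions above, the following Python sketch evaluates the product, the sum of squares, and its shifted version for the two cylinders with a = 0.5. The function names and the sample points are chosen here for illustration only; they are not part of SURFER.

```python
def p1(x, y, z):
    """Cylinder around the z-axis: x^2 + y^2 - 1."""
    return x**2 + y**2 - 1


def p2(x, y, z, a=0.5):
    """Second cylinder: (x - 4a + 2)^2 + z^2 - 1 (for a = 0.5 its axis is the y-axis)."""
    return (x - 4 * a + 2)**2 + z**2 - 1


def union(x, y, z):
    """The product vanishes as soon as one factor vanishes."""
    return p1(x, y, z) * p2(x, y, z)


def intersection(x, y, z):
    """The sum of squares vanishes only where both factors vanish."""
    return p1(x, y, z)**2 + p2(x, y, z)**2


def tube(x, y, z, eps=0.01):
    """Negative inside a thin tube around the intersection curve."""
    return intersection(x, y, z) - eps


on_first_only = (1.0, 0.0, 5.0)  # lies on the first cylinder, far from the second
on_both = (0.0, 1.0, 1.0)        # lies on both cylinders

print(union(*on_first_only), intersection(*on_first_only))
print(intersection(*on_both), tube(*on_both))
```

The point on only one cylinder is a zero of the product but not of the sum of squares, while the point on both cylinders lies inside the tube.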

Morphing

By morphing, we understand the interpolation between two polynomials p(x, y, z) and q(x, y, z), i.e., we consider the one-parameter family of polynomials

$p_a(x, y, z) := (1 - a)\, p(x, y, z) + a\, q(x, y, z)$

where a denotes a real parameter. For a = 0, we obtain the form associated to p(x, y, z), and for a = 1, we obtain the form associated to q(x, y, z). Thus the intermediate values 0 < a < 1 interpolate between these two forms.

As an example, we consider the square of the equation of a sphere $x^2 + y^2 + z^2 = 1$, i.e., $p(x, y, z) := x^4 + y^4 + z^4 + 2(x^2 y^2 + x^2 z^2 + y^2 z^2) - 1 = 0$, which obviously gives a sphere again. We also know that $q(x, y, z) := x^4 + y^4 + z^4 - 1 = 0$ is an approximate cube. Thus it is natural to consider the following morphing, which interpolates from q to p with parameter 10a:

xˆ4 + yˆ4 + zˆ4 + 20 ∗ a ∗ (xˆ2 ∗ yˆ2 + xˆ2 ∗ zˆ2 + yˆ2 ∗ zˆ2) − 1 = 0

For a = 0, we have a cube; for a = 0.1, we get a sphere; and for 0.1 < a < 1, the corners are pushed further inward, which results in an octahedron!
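The relation between the SURFER formula and the interpolation family can be checked with exact rational arithmetic. The sketch below is only an illustration; the helper names and the sample grid are our own choices. It confirms that the formula equals (1 − t) q + t p with t = 10a, and that a = 0.1 reproduces the squared sphere equation.

```python
from fractions import Fraction


def sphere_squared(x, y, z):
    """p: the squared sphere equation (x^2 + y^2 + z^2)^2 - 1."""
    return (x**2 + y**2 + z**2)**2 - 1


def approximate_cube(x, y, z):
    """q: x^4 + y^4 + z^4 - 1."""
    return x**4 + y**4 + z**4 - 1


def morph(x, y, z, a):
    """The morphing formula as entered in SURFER."""
    mixed = x**2 * y**2 + x**2 * z**2 + y**2 * z**2
    return x**4 + y**4 + z**4 + 20 * a * mixed - 1


def interpolate(x, y, z, t):
    """(1 - t) * q + t * p."""
    return (1 - t) * approximate_cube(x, y, z) + t * sphere_squared(x, y, z)


# A small grid of rational sample points (chosen arbitrarily).
samples = [
    (Fraction(i, 3), Fraction(j, 5), Fraction(k, 7))
    for i in range(-2, 3)
    for j in range(-2, 3)
    for k in range(-2, 3)
]

a = Fraction(3, 50)
matches_family = all(morph(*s, a) == interpolate(*s, 10 * a) for s in samples)
sphere_at_tenth = all(morph(*s, Fraction(1, 10)) == sphere_squared(*s) for s in samples)
print(matches_family, sphere_at_tenth)
```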

Morphing is a very direct, experimental method to produce new surfaces. The drawback is that it is difficult to predict the results.

Symmetry

Let $A: \mathbb{R}^3 \to \mathbb{R}^3$ be a linear transformation, for example, a rotation in three-space. Thus, A describes a coordinate transformation which can be performed on a polynomial p(x, y, z) to give a new polynomial $p \circ A$. The surface associated to the new polynomial is just given by performing the coordinate transformation on the old surface, which in our example is a rotation. If the polynomial $p \circ A$ coincides with p, it is A-invariant, which means that the associated surface has A-symmetry. Thus, an algebraic symmetry also means a geometric symmetry. The same consideration applies to a group G of symmetries A, in particular to a subgroup G of the orthogonal group O(3). As an example, we consider the polynomial $p(x, y, z) = x^2 + y^2 + z^2 - 1$ which gives a sphere. It has full algebraic symmetry G = O(3), which corresponds to the fact that the sphere has full rotational symmetry. In contrast, the polynomial $p(x, y, z) = x^4 + y^4 + z^4 - 1$, which yields an approximate cube, has a much smaller symmetry group. It is generated by the three coordinate reflections $x \mapsto -x$ (etc.) and the permutations of the three coordinates. It is easy to see that this symmetry group (the octahedral group) has order 48 as there are 3! = 6 permutations and $2^3 = 8$ multireflections. Applying these symmetries to a fixed polynomial and multiplying all results generates a symmetric polynomial. This method gives some interesting results, for example, a symmetric arrangement of planes:

Arrangement : x ∗ y ∗ z ∗ (xˆ2 − yˆ2) ∗ (yˆ2 − zˆ2) ∗ (zˆ2 − xˆ2)
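The count of 48 symmetries and the invariance claims can be verified directly. The following Python sketch enumerates all signed coordinate permutations and applies them to an arbitrarily chosen integer test point. The names `signed_permutations` and `act` are hypothetical helpers introduced here. Note that the arrangement polynomial itself may change sign under a symmetry, which does not affect its zero set.

```python
from itertools import permutations, product


def signed_permutations():
    """Yield all pairs (permutation, signs) acting on the coordinates (x, y, z)."""
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            yield perm, signs


def act(g, v):
    """Apply a signed permutation g to the point v."""
    perm, signs = g
    return tuple(signs[i] * v[perm[i]] for i in range(3))


def cube(x, y, z):
    return x**4 + y**4 + z**4 - 1


def arrangement(x, y, z):
    return x * y * z * (x**2 - y**2) * (y**2 - z**2) * (z**2 - x**2)


group = list(signed_permutations())
point = (2, 3, 5)  # arbitrary integer test point

print(len(group))
print(all(cube(*act(g, point)) == cube(*point) for g in group))
print(all(abs(arrangement(*act(g, point))) == abs(arrangement(*point)) for g in group))
```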

Here are other examples of polynomials with symmetry. Assume that $p(x, y^2)$ is a polynomial in two variables x and y, where y appears only with even powers in p. The set of solutions is a one-dimensional curve in the plane with coordinates x and y. If we form the new polynomial $p(x, y^2 + z^2)$, it describes a surface which emerges by rotating the curve around the x-axis. The reason is that $y^2 + z^2$ is the square of the distance in the y-z-plane, which has rotational symmetry around the x-axis. We apply this method to the well-known lemniscate curve (the “infinity curve”)

$\left((x - 1)^2 + y^2\right)\left((x + 1)^2 + y^2\right) = a$

which is defined as the set of all points in the plane such that the product of distances to the two points (±1, 0) is constant.

Rotated lemniscate : ((x − 1)ˆ2 + yˆ2 + zˆ2) ∗ ((x + 1)ˆ2 + yˆ2 + zˆ2) − 2 ∗ a

Chemical p-orbitals :
(((x − 1)ˆ2 + 4 ∗ yˆ2 + 4 ∗ zˆ2) ∗ ((x + 1)ˆ2 + 4 ∗ yˆ2 + 4 ∗ zˆ2) − 1)
∗(((y − 1)ˆ2 + 4 ∗ xˆ2 + 4 ∗ zˆ2) ∗ ((y + 1)ˆ2 + 4 ∗ xˆ2 + 4 ∗ zˆ2) − 1)
∗(((z − 1)ˆ2 + 4 ∗ yˆ2 + 4 ∗ xˆ2) ∗ ((z + 1)ˆ2 + 4 ∗ yˆ2 + 4 ∗ xˆ2) − 1)

Threefold orbital : (xˆ3 − 3 ∗ x ∗ yˆ2 − 1)ˆ2 + (3 ∗ xˆ2 ∗ y − yˆ3)ˆ2 + (3 ∗ z)ˆ2 − 1
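The effect of substituting y² + z² for y² can be illustrated numerically: the rotated lemniscate polynomial takes the same value at a point and at its image under any rotation about the x-axis. This is a minimal sketch with hypothetical helper names and an arbitrary test point.

```python
import math


def rotated_lemniscate(x, y, z, a=1.0):
    """((x-1)^2 + y^2 + z^2) * ((x+1)^2 + y^2 + z^2) - 2a."""
    r2 = y * y + z * z  # squared distance from the x-axis
    return ((x - 1)**2 + r2) * ((x + 1)**2 + r2) - 2 * a


def rotate_about_x(y, z, theta):
    """Rotate the point (y, z) of the y-z-plane by the angle theta."""
    return (
        math.cos(theta) * y - math.sin(theta) * z,
        math.sin(theta) * y + math.cos(theta) * z,
    )


x, y, z = 0.7, 0.4, -0.3  # arbitrary test point
reference = rotated_lemniscate(x, y, z)
invariant = all(
    math.isclose(
        rotated_lemniscate(x, *rotate_about_x(y, z, theta)),
        reference,
        rel_tol=1e-9,
        abs_tol=1e-9,
    )
    for theta in (0.1, 1.0, 2.5, 4.0)
)
print(invariant)
```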

In order to construct a closed surface of genus n (i.e., a sphere with n holes), we use the equation $(t - a)^2 + z^2 = b^2$ of a circle in the t-z-plane with center (a, 0) and radius b, together with the squared product of the distances to the complex n-th roots of unity,

$t^2 = \left|(x + iy)^n - 1\right|^2.$

By the method of algebraic variable elimination, we get a polynomial equation p(x, y, z) = 0. As an example, n = 3 leads to the following equation:

Surface of genus 3 : ((xˆ3 − 3 ∗ x ∗ yˆ2 − 1)ˆ2 + (3 ∗ xˆ2 ∗ y − yˆ3)ˆ2 + (2 ∗ z)ˆ2 + aˆ2 − bˆ2)ˆ2 − 4 ∗ aˆ2 ∗ ((xˆ3 − 3 ∗ x ∗ yˆ2 − 1)ˆ2 + (3 ∗ xˆ2 ∗ y − yˆ3)ˆ2) = 0
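The elimination step can be made explicit: expanding the circle equation gives t² + z² + a² − b² = 2at, and squaring removes t in favor of t². The Python sketch below is our own illustration, with hypothetical function names. It compares the printed genus 3 formula with this eliminated form and checks one point constructed to lie on the surface. It follows the printed formula in using 2z in place of z, which only rescales the z-axis.

```python
import math


def t_squared(x, y, n=3):
    """|(x + iy)^n - 1|^2."""
    return abs(complex(x, y)**n - 1)**2


def genus_surface(x, y, z, a, b, n=3):
    """Eliminated form (t^2 + (2z)^2 + a^2 - b^2)^2 - 4 a^2 t^2."""
    t2 = t_squared(x, y, n)
    return (t2 + (2 * z)**2 + a**2 - b**2)**2 - 4 * a**2 * t2


def genus3_explicit(x, y, z, a, b):
    """The genus 3 formula as printed, with real and imaginary parts written out."""
    re = x**3 - 3 * x * y**2 - 1
    im = 3 * x**2 * y - y**3
    return (re**2 + im**2 + (2 * z)**2 + a**2 - b**2)**2 - 4 * a**2 * (re**2 + im**2)


# Build a point on the surface: pick a point of the circle, then solve for x on the real axis.
a, b, phi = 2.0, 0.5, 1.0
t = a + b * math.cos(phi)          # (t - a)^2 + (2z)^2 = b^2 with 2z = b sin(phi)
z0 = b * math.sin(phi) / 2
x0 = (1 + t)**(1 / 3)              # then x0^3 - 1 = t, so |(x0 + 0i)^3 - 1| = t

print(abs(genus_surface(x0, 0.0, z0, a, b)) < 1e-9)
```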

The last picture of orbitals with tetrahedral symmetry was created from a three-dimensional construction, generalizing the construction of the lemniscate by taking the product of distances from the point (x, y, z) to the four corners of a tetrahedron.

Cutting and Gluing

At first glance, cutting appears to be impossible within the range of purely algebraic methods, as it typically involves inequalities. For example, cutting a cap from the sphere $x^2 + y^2 + z^2 - 1 = 0$ can be achieved by the condition z ≤ 0.5. Here is a picture of such a cutting phenomenon which only involves one polynomial:

Cutting a hole in sphere : (xˆ2 + yˆ2 + zˆ2 − 1) ∗ ((x − a)ˆ2 + yˆ2 + zˆ2 − bˆ2) + c = 0

This paradox can be resolved as follows: The equation consists of a standard sphere (center (0, 0, 0) and radius 1) and a nonconcentric sphere (center (a, 0, 0) and radius b). For c = 0, multiplication gives just the union of the two spheres (if the second sphere is inside the first sphere, it is not visible in SURFER, and the image looks like the standard sphere). With the deformation parameter c, it is possible to avoid points which are very close to both spheres (as then the product is very small). These points are located in the region of a cap: that is all! The pictures above are created from the formula by playing with the parameters a, b, and c.

The next topic is that of gluing. Suppose we have two surfaces with polynomials $p_1$ and $p_2$ that are disjoint but close together. We suppose furthermore that the polynomials have positive values outside the surfaces (at least in a small neighborhood). Then the polynomial equation $p_1 p_2 - a = 0$ has a good chance to connect the two surfaces for suitable a > 0. As a nice example, we consider two separate spheres:

Gluing two spheres together : ((x − 1)ˆ2 + yˆ2 + zˆ2 − b) ∗ ((x + 1)ˆ2 + yˆ2 + zˆ2 − b) − a = 0
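To get a feeling for which values of a glue the two spheres, one can count how often the surface crosses the x-axis. The sketch below is an illustration under our own assumptions: the function names, the scanning interval, and the grid size are arbitrary, and the threshold a = (1 − b)² is simply the value of the product at the origin for 0 < b < 1, not a statement from the text. Four crossings indicate two separate pieces along the axis, two crossings indicate that the gap between the spheres has closed.

```python
def glued(x, y, z, a, b):
    """((x-1)^2 + y^2 + z^2 - b) * ((x+1)^2 + y^2 + z^2 - b) - a."""
    return ((x - 1)**2 + y**2 + z**2 - b) * ((x + 1)**2 + y**2 + z**2 - b) - a


def crossings_on_x_axis(a, b, lo=-3.0, hi=3.0, steps=6001):
    """Count sign changes of the polynomial along the x-axis on a uniform grid."""
    values = [glued(lo + (hi - lo) * i / (steps - 1), 0.0, 0.0, a, b) for i in range(steps)]
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0)


def gluing_threshold(b):
    """Value of the product at the origin; assumes 0 < b < 1 (disjoint spheres)."""
    return (1 - b)**2


b = 0.25
print(gluing_threshold(b))          # 0.5625
print(crossings_on_x_axis(0.1, b))  # below the threshold
print(crossings_on_x_axis(1.0, b))  # above the threshold
```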

The observant reader will have noticed that this is just the equation of the rotated lemniscate. Now we describe a different gluing method. We start with the formation of a two-dimensional “black hole” in the plane:

Black hole in the plane : xˆ5 ∗ (yˆ2 + zˆ2 + bˆ5) − a = 0

This surface is rotationally invariant and comes from the algebraic curve

$y^2 = \frac{a}{x^5} - b^5$

where we have chosen the fifth power in order to make the black hole steeper. If we want to glue two black holes together in order to create a worm hole between two plane universes, we cannot proceed by the “smoothing method” applied to the two planes (it is easy to see that this does not lead to a good result). Instead, we use a similar function for a curve which we rotate by replacing $y^2$ by $y^2 + z^2$:

$y^2 = -\frac{a}{(x - 1)^5} + \frac{a}{(x + 1)^5} - b^5$

Worm hole between two planes : (xˆ2 − 1)ˆ5 ∗ (yˆ2 + zˆ2 + bˆ5) + a ∗ ((x + 1)ˆ5 − (x − 1)ˆ5) = 0
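The SURFER formula is the curve equation with y² replaced by y² + z² and the denominators cleared. The following sketch checks this identity with exact rational arithmetic. The function names and the sample values are our own and serve only as an illustration.

```python
from fractions import Fraction


def curve_rhs(x, a, b):
    """Right-hand side of y^2 = -a/(x-1)^5 + a/(x+1)^5 - b^5."""
    return -a / (x - 1)**5 + a / (x + 1)**5 - b**5


def worm_hole(x, y, z, a, b):
    """The polynomial entered in SURFER."""
    return (x**2 - 1)**5 * (y**2 + z**2 + b**5) + a * ((x + 1)**5 - (x - 1)**5)


def cleared_denominators(x, y, z, a, b):
    """(x^2 - 1)^5 * (y^2 + z^2 - curve_rhs): the rotated curve equation times its denominator."""
    return (x**2 - 1)**5 * (y**2 + z**2 - curve_rhs(x, a, b))


a, b = Fraction(1, 7), Fraction(2, 3)
points = [
    (Fraction(1, 3), Fraction(1, 2), Fraction(-3, 4)),
    (Fraction(5, 2), Fraction(0), Fraction(1, 5)),
    (Fraction(-7, 4), Fraction(2), Fraction(1)),
]
identical = all(worm_hole(*p, a, b) == cleared_denominators(*p, a, b) for p in points)
print(identical)
```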

Squeezing, Shifting, and Twisting

It is also possible to deform a given surface associated to a polynomial p(x, y, z) by a coordinate transformation $\Phi: \mathbb{R}^3 \to \mathbb{R}^3$, i.e., we consider the equation $p \circ \Phi = 0$. In order to obtain an algebraic surface again, $\Phi(x, y, z) = (\varphi_1(x, y, z), \varphi_2(x, y, z), \varphi_3(x, y, z))$ has to be polynomial in each coordinate. Three cases are particularly interesting for us:

1. $\Phi(x, y, z) = (a(z)x, a(z)y, z)$ with a squeezing function a(z)
2. $\Phi(x, y, z) = (x + a(z), y + b(z), z)$ with a shifting vector (a(z), b(z))
3. $\Phi(x, y, z) = (\cos(a(z))x + \sin(a(z))y, -\sin(a(z))x + \cos(a(z))y, z)$ with a twisting function a(z)

The effect of these transformations is dilatation (squeezing), translation (shifting), and rotation (twisting) in each plane slice $\mathbb{R}^2 \times \{z\}$ of $\mathbb{R}^3$ with parameters depending on z only. Of course, the third case displays a problem as the trigonometric functions cos and sin are not algebraic. This problem can be resolved by the well-known Taylor series

$\cos(a) = 1 - \frac{a^2}{2} + \frac{a^4}{24} - \frac{a^6}{720} \pm \dots$

$\sin(a) = a - \frac{a^3}{6} + \frac{a^5}{120} - \frac{a^7}{5040} \pm \dots$

and by replacing cos and sin by finite polynomial approximations up to a certain order. As a first example, we squeeze a cube $x^4 + y^4 + z^4 = \frac{1}{b^4}$ by the function

$a(z) = 1 + \frac{100a}{z^8 + 1} = \frac{z^8 + 1 + 100a}{z^8 + 1}$:

Squeezed cube : (zˆ8 + 1 + 100 ∗ a) ∗ bˆ4 ∗ (xˆ4 + yˆ4) + (zˆ8 + 1) ∗ (bˆ4 ∗ zˆ4 − 1) = 0
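The relation between the squeezing function and the SURFER formula can be checked exactly. In the sketch below (hypothetical function names, arbitrary rational sample values), the formula is compared with the cube equation in which x⁴ + y⁴ is multiplied by the squeezing function and the denominator z⁸ + 1 is cleared. As far as we can tell from the printed formula, the factor is applied to the fourth powers directly rather than to x and y themselves; this reading is our observation, not a statement from the text.

```python
from fractions import Fraction


def squeeze(z, a):
    """Squeezing function 1 + 100a / (z^8 + 1)."""
    return 1 + 100 * a / (z**8 + 1)


def squeezed_cube(x, y, z, a, b):
    """The polynomial entered in SURFER."""
    return (z**8 + 1 + 100 * a) * b**4 * (x**4 + y**4) + (z**8 + 1) * (b**4 * z**4 - 1)


def cube_with_factor(x, y, z, a, b):
    """(z^8 + 1) * (b^4 * squeeze * (x^4 + y^4) + b^4 z^4 - 1)."""
    return (z**8 + 1) * (b**4 * squeeze(z, a) * (x**4 + y**4) + b**4 * z**4 - 1)


a, b = Fraction(1, 20), Fraction(3, 2)
points = [
    (Fraction(1, 2), Fraction(-1, 3), Fraction(2, 5)),
    (Fraction(0), Fraction(1), Fraction(-1)),
    (Fraction(3, 4), Fraction(3, 4), Fraction(0)),
]
identical = all(squeezed_cube(*p, a, b) == cube_with_factor(*p, a, b) for p in points)
print(identical)
```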

The second example is a deformed thin disk $x^6 + b^6 (y^2 + z^2)^4 = 1$ with shifting vector $(a(z), b(z)) = (az^2, 0)$ growing quadratically, like a parabola:

Deformed thin disk : (x + a ∗ zˆ2)ˆ6 + bˆ6 ∗ (yˆ2 + zˆ2)ˆ4 − 1 = 0

Our last examples concern twisting. We start with a torus which is given by the well-known equation $(x^2 + y^2 + z^2 + b^2 - a^2)^2 - 4b^2 (x^2 + y^2) = 0$ where b is the large radius and a the small one. As we want to twist the torus in the y-direction, we replace x and z by

$x\left(1 - \frac{(cy)^2}{2} + \frac{(cy)^4}{24}\right) + z\left((cy) - \frac{(cy)^3}{6} + \frac{(cy)^5}{120}\right)$

and

$-x\left((cy) - \frac{(cy)^3}{6} + \frac{(cy)^5}{120}\right) + z\left(1 - \frac{(cy)^2}{2} + \frac{(cy)^4}{24}\right).$

This leads to the following equation (in fact, we use 10c as a parameter; also note that a division by a fixed number is an allowed algebraic operation in SURFER):

Twisted torus :
((x ∗ (1 − 100 ∗ cˆ2 ∗ yˆ2/2 + 10000 ∗ cˆ4 ∗ yˆ4/24)
+z ∗ (10 ∗ c ∗ y − 1000 ∗ cˆ3 ∗ yˆ3/6 + 100000 ∗ cˆ5 ∗ yˆ5/120))ˆ2
+yˆ2 + (−x ∗ (10 ∗ c ∗ y − 1000 ∗ cˆ3 ∗ yˆ3/6 + 100000 ∗ cˆ5 ∗ yˆ5/120)
+z ∗ (1 − 100 ∗ cˆ2 ∗ yˆ2/2 + 10000 ∗ cˆ4 ∗ yˆ4/24))ˆ2 + bˆ2 − aˆ2)ˆ2
−4 ∗ bˆ2 ∗ ((x ∗ (1 − 100 ∗ cˆ2 ∗ yˆ2/2 + 10000 ∗ cˆ4 ∗ yˆ4/24)
+z ∗ (10 ∗ c ∗ y − 1000 ∗ cˆ3 ∗ yˆ3/6 + 100000 ∗ cˆ5 ∗ yˆ5/120))ˆ2 + yˆ2)
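How well the truncated Taylor polynomials imitate a true rotation depends on the size of the twisting angle cy. The Python sketch below is a minimal illustration with hypothetical function names; it uses c directly as the parameter, whereas the SURFER formula substitutes 10c. It compares the polynomials with cos and sin and builds the twisted torus polynomial from them.

```python
import math


def cos_poly(t):
    """Taylor polynomial of cos up to order 4."""
    return 1 - t**2 / 2 + t**4 / 24


def sin_poly(t):
    """Taylor polynomial of sin up to order 5."""
    return t - t**3 / 6 + t**5 / 120


def twist(x, y, z, c):
    """Approximate rotation of (x, z) by the angle c*y; y is left unchanged."""
    t = c * y
    return (x * cos_poly(t) + z * sin_poly(t), y, -x * sin_poly(t) + z * cos_poly(t))


def torus(x, y, z, a, b):
    """(x^2 + y^2 + z^2 + b^2 - a^2)^2 - 4 b^2 (x^2 + y^2)."""
    return (x**2 + y**2 + z**2 + b**2 - a**2)**2 - 4 * b**2 * (x**2 + y**2)


def twisted_torus(x, y, z, a, b, c):
    return torus(*twist(x, y, z, c), a, b)


# The approximation is good for small angles and breaks down for large ones.
for angle in (0.5, 1.5, 3.0):
    print(angle, abs(cos_poly(angle) - math.cos(angle)), abs(sin_poly(angle) - math.sin(angle)))
```

For larger values of |cy| the polynomial "rotation" is no longer close to a rotation, which is one reason to expect visible distortion far from the plane y = 0 or for large twisting parameters.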

The last image is a twisted thin cylinder where the twisting function is given by a higher order approximation of cos and sin:

Address: Stephan Klaus, Mathematisches Forschungsinstitut Oberwolfach, Schwarzwaldstrasse 9-11, D-77709 Oberwolfach-Walke, Germany

References

Gallery (2012) Knots of Stephan Klaus on Imaginary Open Mathematics: https://imaginary.org/gallery/stephan-klaus-knots
Hartkopf A, Matt AD (2012) The art of an algebraic surface. Text on Imaginary Open Mathematics: https://imaginary.org/background-material/the-art-of-an-algebraic-surface
Hartkopf A, Matt AD (2013) SURFER in math art, education and science communication. Text on Imaginary Open Mathematics: https://imaginary.org/background-material/surfer-in-math-art-education-and-science-communication
Klaus S (2009) Solid Möbius strips as algebraic surfaces. Text on Imaginary Open Mathematics, 10 pages: https://imaginary.org/background-material/solid-mobius-strips-as-algebraic-surfaces
Klaus S (2010) The solid trefoil knot as an algebraic surface, featured article in CIM bulletin 28. Departamento di Matematica, Universidade de Coimbra, Coimbra, pp 2–4
Klaus S (2014) On algebraic, PL and Fourier degrees of knots and braids. In: Oberwolfach workshop on algebraic structures in low-dimensional topology, 25 May–31 May 2014, organised by Kauffman LH, Manturov VO, Orr KE, Schneiderman R. Oberwolfach reports OWR 11.2, report no. 26. Mathematisches Forschungsinstitut Oberwolfach, Oberwolfach, pp 1434–1438
Klaus S (2017a) Möbius strips, knots, pentagons, polyhedra, and the SURFER software. In: Singularities and computer algebra: Festschrift for Gert-Martin Greuel on the occasion of his 70th birthday. Springer, New York, pp 161–172
Klaus S (2017b) Chapter 13: Fourier braids. In: Lambropoulou S et al (eds) Algebraic modeling of topological and computational structures and applications, Springer proceedings in mathematics and statistics 219. Springer, New York, pp 283–296
Klaus S (2019) Solid N-twisted Mobius strips as real algebraic surfaces, CIM bulletin 41. Departamento di Matematica, Universidade de Coimbra, Coimbra, pp 41–46
Klaus S, Kojima S (2019) On the moduli space of equilateral plane pentagons. In: Beiträge zur Algebra und Geometrie/Contributions to Algebra and Geometry, vol 60. Springer, New York, pp 487–497
Klaus S, Violet B (2015a) Katzengold: pyrite, plato, and a polynomial. Text on Imaginary Open Mathematics, 5 pages: https://imaginary.org/background-material/katzengold-pyrite-plato-and-a-polynomial
Klaus S, Violet B (2015b) Katzengold. Movie on Imaginary Open Mathematics: https://imaginary.org/film/katzengold
Klaus S, Violet B (2016) Algebraic vibrations. Movie on Imaginary Open Mathematics: https://imaginary.org/film/algebraic-vibrations
Milnor JW (1997) Topology from the differentiable viewpoint, Princeton landmarks in mathematics and physics. Princeton University Press, Princeton
Movie of CIM and MFO (2010) LPDJLQH D VHFUHW. Conception: Victor Fernandes, Stephan Klaus, Armindo Moreira and José Francisco Rodrigues; Surfer Movie Sequences: Andreas Matt and Bianca Violet. Movie on Imaginary Open Mathematics: https://imaginary.org/film/lpdjlqh-d-vhfuhw
SURFER Software (2008) Freely available on Imaginary Open Mathematics: https://imaginary.org/de/program/surfer

Double Layered Polyhedra

30

Rinus Roelofs

Contents

Elevation
Vertex Figure
Knots
Holes and Compounds
Connected Holes
Connecting the Knots
Odd or Even, Grünbaum’s Double Polyhedra Versus Jitterbug
Face-Doubling
Jitterbug Transformation Applied to Infinite Uniform Polyhedra
Unfolding Multilayer Polyhedra
Unfolding the Double Layered Cube
Double Layered Tetrahedron
Double Layered Cuboctahedron
Double Layered Dodecahedron
Double Layered Icosahedron
Elevation: Combinations of Polyhedra
Strips and Rings
Zonohedra
Polar Zonohedra
Conclusion
Cross-References
References

Abstract

In their book “La Divina Proportione,” Luca Pacioli and Leonardo da Vinci described and illustrated an operation which you can apply to a polyhedron, called Elevation. Starting from Pacioli’s basic idea, resulting in a second layer around a polyhedral shape, we can develop this idea further towards entwined double layer structures. Some of them are single objects; others appear to be compounds. With the introduction of Elevation by Luca Pacioli and Leonardo da Vinci, the interesting field of double layer structures was opened. Inspired by Albrecht Dürer, who was the first person to publish folding plans of the Platonic solids and also of some of the Archimedean solids, I investigated the possibility of making folding plans for double polyhedra. For some of the Platonic double layer polyhedra, single sheet folding plans can be made. In my opinion, it is important to have real physical models of this group of objects because physical models help to understand the complex structure of these double layered polyhedra. Therefore I developed more ways to create these models. One of those techniques of creating models of double layered polyhedra, based on weaving, is explained here.

R. Roelofs, Independent Sculptor, Hengelo, The Netherlands
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_23

Keywords

Polyhedra · Elevation · Folding plans · Models · Leonardo da Vinci · Luca Pacioli · Albrecht Dürer · Branko Grünbaum · Double polyhedra · Knots · Zonohedra · Weaving

Elevation

In his book “La Divina Proportione,” published in 1509, Luca Pacioli (1991) introduced the concept of elevation, an operation that can be applied to the Platonic polyhedra as well as to the Archimedean polyhedra. The elevated versions of the first two Platonic solids, the tetrahedron and the octahedron, as they were designed by Leonardo da Vinci and drawn by himself or a copyist, are shown in Figs. 1 and 2. To understand what elevation means, we have to read “La Divina Proportione,” chapter L, paragraph XIX.XX, where Pacioli describes the elevated version of the octahedron as follows: “And this object is built with 8 three-sided pyramids, that can be seen with your eyes, and an octahedron inside, which you can only see by imagination.” This means that the object is composed of 32 equilateral triangular faces, of which 8 are hidden (Fig. 3). Pacioli describes the process of elevation as putting pyramids, built with equilateral triangles, on each of the faces of the polyhedron. The result of this operation is a double layered object and has many similarities with the stellated version of the octahedron. However, “stellation” was introduced about a century later by Johannes Kepler in his “Harmonices Mundi” (1619). Kepler defined stellation for polyhedra as the process of extending faces until they meet to form a new polyhedron. That means that the stellation of the octahedron consists of 8 faces, just as the octahedron itself, instead of the 32 faces mentioned by Pacioli. Also the number of vertices is different: the stellation of the octahedron has 6 vertices (again the same number as the octahedron itself) whereas the elevated octahedron has 14 vertices (6 of the octahedron + 8 of the pyramids).
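The face and vertex counts of an elevated polyhedron follow from simple bookkeeping: every face of the base polyhedron carries one pyramid, which contributes one apex and as many triangles as the face has sides, while the base faces themselves become the hidden inner layer. The short Python sketch below reproduces the numbers for the octahedron; the function name and the dictionary keys are our own.

```python
def elevation_counts(faces, vertices, sides_per_face):
    """Count faces and vertices of the elevation of a polyhedron with congruent regular faces."""
    visible = faces * sides_per_face  # pyramid triangles on the outside
    hidden = faces                    # original faces, covered by the pyramids
    return {
        "visible_faces": visible,
        "hidden_faces": hidden,
        "total_faces": visible + hidden,
        "vertices": vertices + faces,  # original vertices plus one apex per pyramid
    }


octahedron = elevation_counts(faces=8, vertices=6, sides_per_face=3)
print(octahedron)
# {'visible_faces': 24, 'hidden_faces': 8, 'total_faces': 32, 'vertices': 14}
```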

Fig. 1 Elevated tetrahedron

Fig. 2 Elevated octahedron

Vertex Figure

Let’s have a closer look at the elevated octahedron (Fig. 4) and especially at the 6 vertices of the original octahedron. After adding a three-sided pyramid to each of the 8 faces, we now have 12 triangles meeting in each of these vertices. By definition, in a polyhedron, only two faces join along any common edge. In Pacioli’s elevation, however, we have some edges where four faces join along a common edge. By the way Pacioli describes the process of elevation, it is very clear which faces are really joined. The vertex figure is the (spherical) polygon which describes how the faces are arranged around a vertex; it is formed as the intersection of the faces that meet at a vertex with a small sphere centered on that vertex. So when we look at the vertex figure around one of those vertices, we can split the set of components of the vertex figure in such a way that closed loops can be formed.

Fig. 3 Pacioli’s description

Fig. 4 Elevated octahedron

Fig. 5 Elevated octahedron

Fig. 6 Composition

Such a closed loop can be used as a path to make a walk around the vertex. We will discuss two different possibilities. In Figs. 5 and 6, we see that we have a first loop on the outer shell of the elevation (Fig. 5) and a second loop on the inner shell, i.e., on the octahedron that is still inside (Fig. 6). So the connection of the faces in these vertices can be described as follows: 33333333 (the faces of the pyramids) plus 3333 (the faces of the octahedron). Because of the special situation that four faces are coming together at each of the edges that meet in these vertices, we can take an alternative choice of pairing the faces. For instance, we can start our walk around the vertex on one of the faces of the octahedron and then step on an adjacent face of a three-sided pyramid (Roelofs 2014). When we continue our walk, we can make a loop like the blue line in Fig. 7. Just like in the first choice of pairing the faces, we end up with two loops around the vertex, but now the elevation splits up into two tetrahedron-like figures instead of one octahedron plus eight three-sided pyramids. It is reminiscent of Kepler’s stellated octahedron (Fig. 8), but still there is a difference: we don’t really have tetrahedra because the big triangles are in fact four joined triangles. Let us define this set of joined faces, one face of the basic polyhedron (in this case the octahedron) surrounded by faces of the elevation pyramids, one at each of the edges, as the “elevation element” (Fig. 9).

Fig. 7 Alternative loops

Knots

After analyzing the double layer structure of Pacioli’s elevated octahedron, the question might be whether we will have similar configurations in the other elevated polyhedra. In Fig. 10, the drawing of the elevated cube is shown. Luca Pacioli explains: “ . . . it is enclosed by 24 triangular faces. This polyhedron is built out of 6 four-sided pyramids, together building the outside of the object as you can see it with your eyes. And there is also a cube inside, on which the pyramids are placed. But this cube can only be seen by imagination, because it is covered by the pyramids. The 6 square faces are the bottom faces of the 6 pyramids.” When we follow this explanation, we can make the exploded view presented in Fig. 11.

Fig. 8 Stella octangula

Fig. 9 Elevation element

Fig. 10 Elevated cube

Fig. 11 Exploded view

Fig. 12 Elevated cube

Fig. 13 First loop

Looking at the vertex figure around one of the vertices of the original cube, we see again two separated loops when we follow the description of Luca Pacioli. These loops can be described as 333333 plus 444. The steps are shown in Figs. 12, 13, and 14. Also in this case, four faces come together at the edges that meet in these vertices. So here we can also choose an alternative order. Instead of just following the outside shell or the inside shell, we start at one of the faces of the cube and then step over to the adjacent triangular face, and after a second triangular face, we step back on a face of the cube again. The total order of the faces of our walk will be 433433433 (Fig. 15). But that means that we now have one single loop. We can modify the shape of the loop so that all edges are connected lines between just two faces, when we use it as a cutting line in the total construction (Fig. 16). The total shape of the loop will then turn out to be the trefoil knot (Fig. 17). By definition, the “elevation element” for the elevated cube is the object shown in Fig. 18. For the real physical construction, we cut away some parts following the knot-line. And now the final model can be built with six of these elements. The final result is a single surface double layered cube (Figs. 19 and 20).
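The two ways of walking around a vertex can be imitated by a small combinatorial model. The sketch below rests on a simplifying assumption of our own: the neighborhood of a vertex is divided into one sector per base face, each sector holds one inner face and two outer pyramid triangles, and the alternative pairing switches between inner and outer layer at every edge. Under this assumption the model reproduces the face sequences quoted in the text; it is meant as an illustration, not as the author's construction.

```python
def vertex_loops(n_faces, base_symbol, alternate):
    """Return the face sequences of all closed walks around one vertex.

    n_faces     -- number of base faces meeting at the vertex
    base_symbol -- "3" for triangular base faces, "4" for squares
    alternate   -- False: stay on one layer; True: switch layer at every edge
    """
    unvisited = {(sector, layer) for sector in range(n_faces) for layer in (0, 1)}
    loops = []
    while unvisited:
        start = min(unvisited)
        state, word = start, ""
        while True:
            unvisited.discard(state)
            sector, layer = state
            # layer 0: one inner base face; layer 1: two outer pyramid triangles
            word += base_symbol if layer == 0 else "33"
            next_layer = 1 - layer if alternate else layer
            state = ((sector + 1) % n_faces, next_layer)
            if state == start:
                break
        loops.append(word)
    return loops


print(vertex_loops(3, "4", alternate=False))  # ['444', '333333']
print(vertex_loops(3, "4", alternate=True))   # ['433433433']
print(vertex_loops(4, "3", alternate=True))   # ['333333', '333333']
```

In this model an odd number of base faces at the vertex yields one single loop, and an even number yields two loops.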

Fig. 14 Two separated loops

Fig. 15 Alternative path

Holes and Compounds

The result of Pacioli’s elevation operation applied to a cube is an entwined double layered structure when we take the alternative order of the faces around one of the cube vertices. This is different from Pacioli’s description of having a cube within a skin of triangles. It is also different from the structure of the elevation of the octahedron, where we had a compound of two polyhedra as the end result. It is one single object. And there is also another difference: the path around the cube vertices in the elevation gave knot-shaped cutting lines for the holes in the construction instead of the simple loop-shaped holes around the vertices of the octahedron in the elevated octahedron. In Figs. 21 and 22, the rounded-off 3D printed versions of both the elevated octahedron and the elevated cube are shown. We can see that both objects are entwined and double layered. The object of Fig. 21 is an entwinement of two different parts, and the object in Fig. 22 is one single object. The shapes of the holes in the elevated octahedron are simple loops, and in the elevated cube, we can recognize the trefoil knots. A few questions arise: If we can make a knot-shaped path around a vertex of an elevated polyhedron, do we always get one single double layered entwined object? When we want to make a single double layered entwined object, is it only possible with knot-shaped holes? The answer to the second question is negative, because there are some counterexamples, as shown in Figs. 23 and 24. But both these examples are built like a torus. And a torus has a higher genus than the sphere, the basic shape of the polyhedra that Pacioli used for the elevations. So the question might be: How can we understand this?

Fig. 16 Trefoil knot

Fig. 17 Single surface

Fig. 18 Elevation element

Fig. 19 Construction element

Fig. 20 Double cube

Fig. 21 Elevated octahedron

Fig. 22 Elevated cube

Fig. 23 Double skin torus

Connected Holes

To get a better understanding of the connection between the shape of the holes and the number of parts in the construction, we start with the structure of two interwoven surfaces as shown in Fig. 25. All the holes in the construction are simple loops.

Fig. 24 Trefoil torus

Fig. 25 Interwoven surfaces

Bending the construction will not change the properties of the holes (Fig. 26). We can bend it so far that the left edges get connected to the right edges. But in this situation, the right edge of one of the surfaces is connected to the left edge of the other surface and vice versa. So we don’t have two surfaces anymore, but one single interwoven surface (Fig. 27). Still, the properties of the holes haven’t changed. Do we have a single double layer entwined object without knot-shaped holes? The answer is “no,” because at the moment the construction closes to a cylinder, two extra holes are created. These holes can be seen as trefoil knots (Fig. 28). The top side of the cylinder shows the same cross-section as the bottom side. So if we place two of them on top of each other, we can connect them. In the connection, two knot-shaped holes will disappear. They will again be transformed to single loops, and thus the total number of knot-shaped holes would be two again. There is another way to connect the top side to the bottom side. We don’t need more copies of the structure; we can just bend again as in Fig. 29. But then, when the top side gets connected to the bottom side, all knot-shaped holes have disappeared. We now have a single interwoven double layer object, but we had to transform it into a torus to get this result (Fig. 30).

Fig. 26 Bending the surfaces

Fig. 27 Cylindrical shape

Fig. 28 Knot

Fig. 29 Bending again

Connecting the Knots

We have seen that we can reduce the number of knots by connecting them. Let us go back to Pacioli’s elevation of the cube. We will take the version with the alternative order of the faces around the cube vertices, the “double cube” of Fig. 20. We can now connect two of these when we remove one of the pyramids plus the underlying face of both of the elevated cubes (Fig. 31). Around the vertices of the connection, the knot-shaped holes are now changed into sets of two connected single loop-shaped holes. To make this more visible, we can round off all the pyramids. In Fig. 32, this is done for the single elevated cube, and we can see the knot-shaped holes at the vertices. In Fig. 33, the rounded-off version of the connected elevated cubes, we see the linked loop-shaped holes in the middle. The rounded versions of the elevated polyhedra are in fact polyhedra in which each face is doubled. The single rounded version of the elevated cube has 2 × 6 “square” faces. The connected elevated cube model has 2 × 10 “square” faces. In the vertices in the middle of the object in Fig. 33, we see 2 × 4 faces coming together, and in the other vertices 2 × 3 faces are coming together. The odd number 3 leads to a knot-shaped hole at the vertices of the single cube (Fig. 34) as well as in eight of the vertices of the two connected cubes, whereas the even number 4, in the middle vertices of the two connected cubes, leads to the two entwined simple loop-shaped holes (Fig. 35). Connecting three cubes, as in Fig. 36, leads to two vertices with a hole with the shape of a fivefold knot. Looking at the connection of four elevated cubes (Fig. 37), we can think of possibilities of applying the “doubling transformation” on other patterns with squares. The cube is the regular polyhedron built with squares. Another construction built with squares is the regular tiling with squares (Fig. 38). Here we can also “double” each of the squares to make a construction similar to the idea of elevation.

Fig. 30 Torus

Fig. 31 Connected elevated cubes

Fig. 32 Rounded model

Fig. 33 Connected cubes rounded

Fig. 34 1 Cube

Fig. 35 2 Connected cubes

The elevation of this pattern, or the “doubled square pattern,” is shown in Fig. 39. As we can see, this structure falls apart into two entwined layers, so it is a compound.

Odd or Even, Grünbaum’s Double Polyhedra Versus Jitterbug Face-doubling, a method to generate new uniform polyhedra by doubling the faces of a known uniform polyhedron, can only be applied if there is at least one vertex in which an odd number of faces come together. Only then it leads to a new polyhedron as one single object. In other cases face-doubling will lead to a


Fig. 36 3 Connected cubes

Fig. 37 4 Connected cubes

compound (Grünbaum 2003). This is what Grünbaum stated in his paper “New Uniform Polyhedra.” The Jitterbug transformation is a transformation that can be applied to uniform polyhedra in which the number of faces that meet at each vertex is even. The Jitterbug transformation was discovered by Buckminster Fuller in 1948 (Verheyen 1989). In Fig. 40, we see how the octahedron transforms into the cuboctahedron by rotating and moving the triangular faces. Each triangular face is connected to three other triangular faces, meeting vertex to vertex. The movement of each of the triangular faces is a translation along the line that connects the midpoint of the face with the center of the polyhedron, together with a rotation for which this line is the axis. The Jitterbug transformation needs two different rotation directions. If one triangle rotates clockwise, then all its neighbors rotate counterclockwise. The complete Jitterbug transformation transforms an octahedron back to an octahedron.
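The motion described above, a translation along the face axis combined with a rotation about it, can be made concrete for the octahedron. The following is a minimal sketch under stated assumptions, not the construction used for the animation in Fig. 40: it takes a unit-edge octahedron with its vertices on the coordinate axes, follows the triangle whose axis is (1, 1, 1), and assumes that the shared vertices stay in the coordinate planes. Under these assumptions the distance of the face center from the center of the polyhedron works out to d(θ) = cos θ/√6 + sin θ/√2. The function name `jitterbug_triangle` is hypothetical.

```python
import math

SQRT2, SQRT3, SQRT6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)


def jitterbug_triangle(theta):
    """Triangle on the (1, 1, 1) axis after a rotation by theta (radians).

    Assumption: unit-edge octahedron with vertices on the coordinate axes
    at theta = 0. Returns the three vertices and the distance d of the
    face center from the center of the polyhedron.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Translation along the face axis, chosen so the first vertex keeps z = 0.
    d = c / SQRT6 + s / SQRT2
    # Face center (d along the axis) plus the rotated radial vector
    # of length 1/sqrt(3), the circumradius of a unit triangle.
    x = d / SQRT3 + 2 * c / (3 * SQRT2)
    y = d / SQRT3 - c / (3 * SQRT2) + s / SQRT6
    z = d / SQRT3 - c / (3 * SQRT2) - s / SQRT6
    # The threefold rotation about (1, 1, 1) permutes the coordinates cyclically.
    return [(x, y, z), (z, x, y), (y, z, x)], d


for degrees in (0, 30, 60, 90, 120):
    vertices, d = jitterbug_triangle(math.radians(degrees))
    edge = math.dist(vertices[0], vertices[1])
    first = tuple(round(value, 4) for value in vertices[0])
    print(degrees, round(d, 4), first, round(edge, 4))
```

In this parametrization θ = 0° gives the octahedron, θ = 60° the cuboctahedron (the first vertex reaches (1/√2, 1/√2, 0)), and θ = 120° the octahedron again, while the edge length stays 1 throughout. The neighboring triangle across the plane z = 0 is the mirror image of this one, so it turns the opposite way and keeps the shared vertex.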

Fig. 38 Square pattern

Fig. 39 Elevated squares

Fig. 40 Jitterbug transformation of the octahedron. Sequence of stills of the animation


Halfway through this process, the cuboctahedron can be recognized, but the square faces are only suggested by the empty spaces in between the triangular faces. The Jitterbug transformation connects pairs of faces with opposite rotational directions at their vertices (Fig. 41). Thus, for the Jitterbug transformation to be applicable to a polyhedron, the number of faces meeting at each of the vertices must be even. When we restrict ourselves to the convex regular and semiregular polyhedra, there are only five polyhedra that meet this requirement: the octahedron, the cuboctahedron, the rhombicuboctahedron, the icosidodecahedron, and the rhombicosidodecahedron (see Fig. 42). In the Jitterbug transformation, all faces move along and rotate about the axis connecting the center of the face with the center of the

Fig. 41 Rotation of the faces: clockwise and counterclockwise

Fig. 42 The five convex regular and semiregular polyhedra with an even number of faces meeting at each vertex


polyhedron. These are helical movements, and the path described by each vertex of a face lies on the cylinder that is an extrusion of the circle determined by the vertices of the face. In the Jitterbug transformation, neighboring faces always rotate in opposite directions and stay connected at a vertex. So, for each uniform polyhedron it seems that either the Jitterbug transformation or face-doubling applies.
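The count of five even-valence solids can be checked directly from the vertex configurations. The sketch below lists the well-known configurations of the five Platonic and thirteen Archimedean solids (prisms and antiprisms are left out, as an assumption about what the text means by regular and semiregular) and keeps those in which an even number of faces meet at each vertex. The dictionary name `VERTEX_CONFIGS` is hypothetical.

```python
# Faces meeting at each vertex, given as a tuple of polygon sizes.
VERTEX_CONFIGS = {
    "tetrahedron": (3, 3, 3),
    "cube": (4, 4, 4),
    "octahedron": (3, 3, 3, 3),
    "dodecahedron": (5, 5, 5),
    "icosahedron": (3, 3, 3, 3, 3),
    "truncated tetrahedron": (3, 6, 6),
    "cuboctahedron": (3, 4, 3, 4),
    "truncated cube": (3, 8, 8),
    "truncated octahedron": (4, 6, 6),
    "rhombicuboctahedron": (3, 4, 4, 4),
    "truncated cuboctahedron": (4, 6, 8),
    "snub cube": (3, 3, 3, 3, 4),
    "icosidodecahedron": (3, 5, 3, 5),
    "truncated dodecahedron": (3, 10, 10),
    "truncated icosahedron": (5, 6, 6),
    "rhombicosidodecahedron": (3, 4, 5, 4),
    "truncated icosidodecahedron": (4, 6, 10),
    "snub dodecahedron": (3, 3, 3, 3, 5),
}

# The valence of a vertex is the number of faces around it.
even_valence = [name for name, faces in VERTEX_CONFIGS.items() if len(faces) % 2 == 0]
odd_valence = [name for name in VERTEX_CONFIGS if name not in even_valence]

print("Jitterbug candidates:", even_valence)
print("Face-doubling candidates:", len(odd_valence))
```

All five even-valence solids have valence 4; the remaining thirteen solids have valence 3 or 5 and are therefore candidates for face-doubling.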

Face-Doubling

As we have seen, the octahedron is the only Platonic solid to which the Jitterbug transformation can be applied, because it is the only Platonic solid with an even number of faces coming together at each vertex. In 1965, Joseph D. Clinton, a student of Buckminster Fuller, made the first dual face polyhedral transformation model (Clinton 2014). He doubled the faces of the “odd” Platonic solids (the tetrahedron, cube, dodecahedron, and icosahedron) to obtain polyhedra in which each vertex has even valence (an even number of faces come together at each vertex). He then connected each vertex of an outer face to a vertex of an inner face, as can be seen in Fig. 10a. Clinton’s construction is suitable for the Jitterbug transformation (Figs. 43 and 44). There is a great similarity between Clinton’s dual face concept and the face-doubling of Grünbaum, who says: “Face-doubling replaces each face by one red and one green face, with edges joining only faces of different colors; hence the number of edges is also doubled. Face-doubling doubles the valence of odd-valent vertices. Face-doubling results in a polyhedron if and only if the starting polyhedron has at least one odd-valent vertex. Since all vertices of a uniform polyhedron have the same valence, face-doubling is applicable only to uniform polyhedra of odd valence.” So in summary, face-doubling can only be applied to the odd-valent

Fig. 43 Clinton’s model of the dual face dodecahedron


Fig. 44 My visualization of Grünbaum’s double dodecahedron

Fig. 45 (a) Cube. (b, c) Dual-face cube, on which the Jitterbug transformation can be applied

uniform polyhedra and will result in even-valent uniform polyhedra. To even-valent uniform polyhedra, it is possible to apply the Jitterbug transformation. As an example, face-doubling of a cube is shown in Fig. 45.
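The bookkeeping in Grünbaum's description (faces and edges are doubled, the vertices are kept, and the valence of each vertex doubles) can be tabulated for the Platonic solids. This is only a counting sketch; the helper `face_double` is a hypothetical name and does not construct any geometry.

```python
# name: (vertices, edges, faces, faces meeting at each vertex)
PLATONIC = {
    "tetrahedron": (4, 6, 4, 3),
    "cube": (8, 12, 6, 3),
    "octahedron": (6, 12, 8, 4),
    "dodecahedron": (20, 30, 12, 3),
    "icosahedron": (12, 30, 20, 5),
}


def face_double(vertices, edges, faces, valence):
    """Counts after face-doubling: same vertices, twice the edges and faces."""
    return vertices, 2 * edges, 2 * faces, 2 * valence


for name, counts in PLATONIC.items():
    if counts[3] % 2 == 1:
        # Odd valence: face-doubling gives one connected polyhedron.
        print(name, "->", face_double(*counts))
    else:
        # Even valence: face-doubling gives a compound; Jitterbug applies instead.
        print(name, "-> compound (use the Jitterbug transformation)")
```

For the cube this gives 8 vertices, 24 edges, 12 faces, and valence 6, in line with the 2 × 6 faces of the double cube and the 2 × 3 faces meeting at each of its vertices.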

Jitterbug Transformation Applied to Infinite Uniform Polyhedra

The vertices of the cube have odd valence. So we first double the faces, and after that the Jitterbug transformation can be applied to the resulting dual-face cube. Grünbaum remarks in his paper that some generalizations are possible: “First, one may add infinite polyhedra, provided they are discrete.” Clinton too makes use of this generalization in his dual-face models of Archimedean tilings. There are two infinite regular tilings that have vertices of even valence and thus can be considered for


Fig. 46 The three regular infinite tilings

Fig. 47 A step in the Jitterbug transformation applied on the 4.4.4.4-tiling

the Jitterbug transformation. The even valence is verified by the fact that these tilings are two-colorable, that is, every white tile is surrounded by black tiles and vice versa (Fig. 46). The two colors, black and white, are translated into two different rotational directions by the Jitterbug transformation, clockwise and anti-clockwise. In Fig. 47, the different colors of the tiles are represented by the two differently colored arrows. The arrows now show the rotation direction of the Jitterbug transformation. There are more ways to build infinite regular structures with only square faces. The column shown in Fig. 48 can be seen as a part of an infinite uniform polyhedron. The valence of all vertices is even and thus the Jitterbug transformation can be applied (Fig. 50). Note that the square faces of this infinite column can be 2-colored like the checkerboard coloring of the planar tiling by squares. Applying Grünbaum’s method of doubling the faces to this column will not result in a new uniform polyhedron, because there is no odd-valence vertex. The result will be a compound


Fig. 48 Square column with square faces

Fig. 49 Doubling the square faces of the column of Fig. 48

of two entwined polyhedra, as shown in Fig. 49. On the other hand, the Jitterbug transformation works as expected, as can be seen in the stills of the animation (Fig. 50). A next step brings us to the infinite Petrie-Coxeter polyhedron 4.4.4.4.4.4, in which all vertices have even valence. The Jitterbug transformation should therefore be applicable. Indeed, this is the case, as shown in Fig. 51, in which we can see a sequence of stills of the animation of the Jitterbug transformation of a fragment of the Petrie-Coxeter 4.4.4.4.4.4 polyhedron. Here too, because there are no odd-valence vertices, Grünbaum’s face-doubling results in a compound of two entwined polyhedra, shown in Fig. 52b (Roelofs 2016). Another infinite polyhedron worth investigating is the infinite Petrie-Coxeter polyhedron 6.6.6.6 (Fig. 53a), which is the dual of the infinite Petrie-Coxeter 4.4.4.4.4.4. Here too, face-doubling doesn’t result in a new polyhedron, because all vertices have even valence, but in a compound, a pair of two entwined polyhedra (Fig. 53b).


Fig. 50 Jitterbug transformation applied on a regular infinite column with square faces

Fig. 51 Jitterbug transformation of a fragment of the infinite Petrie-Coxeter 4.4.4.4.4.4 polyhedron

Again, though face-doubling doesn’t work because of the even-valent vertices, the Jitterbug transformation is possible. The last infinite regular polyhedron we have to examine is the infinite Petrie-Coxeter 6.6.6.6.6.6, shown in Fig. 54a. We know that the Jitterbug transformation gives each face a rotational direction: if one face rotates clockwise, then its neighboring face has to rotate anti-clockwise. In order to assign a rotational direction to the faces of the polyhedron, we begin to color the faces, blue for clockwise, red for anti-clockwise (Fig. 54b). When we continue coloring the faces, we can color the next hexagon blue again (Fig. 54c), but then we have a problem. The hexagon that adjoins that face has both a blue- and a red-colored neighbor. The conclusion is that this polyhedron is not two-colorable, so that the Jitterbug transformation cannot be applied to this polyhedron, despite the fact that all its vertices have even valence. This example shows that even valence of all vertices is a necessary condition for the Jitterbug transformation to be applicable to a polyhedron, but it is not a


Fig. 52 (a) Fragment of the infinite Petrie-Coxeter polyhedron 4.4.4.4.4.4. (b) Compound after face-doubling

Fig. 53 (a) Fragment of infinite Petrie-Coxeter 6.6.6.6 polyhedron. (b) Compound as a result of face-doubling of the infinite Petrie-Coxeter 6.6.6.6 polyhedron

Fig. 54 (a) Fragment of the infinite Petrie-Coxeter 6.6.6.6.6.6. (b, c) Coloring the faces to investigate if and how the Jitterbug transformation can be applied


Fig. 55 Doubling the Infinite Petrie-Coxeter 6.6.6.6.6.6

sufficient condition. The ability of the faces to be two-colored is also necessary. For the Petrie-Coxeter 6.6.6.6.6.6, we now have to analyze face-doubling. The result is a real “new” uniform polyhedron (see Fig. 55). This example shows that when face-doubling is applied to infinite uniform polyhedra, Grünbaum’s requirement that in order to produce a new polyhedron “at least one vertex has to be of odd valence” is not necessary. It should be replaced by “the polyhedron is not two-colorable.” For the Jitterbug transformation to be possible, we should replace the requirement that all vertices have even valence by the requirement that the polyhedron is two-colorable. This is a stronger requirement.
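The two-colorability test that replaces the valence criterion can be sketched as a breadth-first coloring of the face-adjacency graph: color one face, give every edge-neighbor the opposite color, and stop as soon as two neighbors receive the same color. The function and the two small inputs below are hypothetical illustrations. The second input is simply three mutually adjacent faces, the smallest configuration that blocks a two-coloring; it is not a model of the Petrie-Coxeter polyhedron.

```python
from collections import deque


def two_color(adjacent):
    """Color faces 0/1 so that faces sharing an edge differ.

    `adjacent` maps each face to the faces it shares an edge with.
    Returns the coloring as a dict, or None if no such coloring exists.
    """
    color = {}
    for start in adjacent:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            face = queue.popleft()
            for other in adjacent[face]:
                if other not in color:
                    color[other] = 1 - color[face]  # opposite rotation direction
                    queue.append(other)
                elif color[other] == color[face]:
                    return None  # two neighbors would have to turn the same way
    return color


def square_patch(n):
    """Face adjacency of an n-by-n patch of the square tiling."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (x, y): [(x + dx, y + dy) for dx, dy in steps
                 if 0 <= x + dx < n and 0 <= y + dy < n]
        for x in range(n) for y in range(n)
    }


three_mutual_neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

print(two_color(square_patch(3)) is not None)  # checkerboard coloring exists
print(two_color(three_mutual_neighbors))       # no coloring possible
```

A returned coloring can be read as an assignment of clockwise and anti-clockwise rotation to the faces; a result of `None` means that the Jitterbug transformation cannot be applied, whatever the valence of the vertices.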

Unfolding Multilayer Polyhedra

In his book “Unterweisung der Messung”, Albrecht Dürer (1525) explained and showed how to construct paper models of the Platonic solids using 2D plans. The plan for the cube is shown in Fig. 56. As far as I know, this was the first time that the Platonic solids were visualized in this way. In every presentation of 3D objects in 2D, some information will be lost. But it also offers the possibility to focus on some special properties of the presented object. For instance, in the folding plan, every face of the solid can be seen, which is not the case in the drawing of the cube in Pacioli’s book shown in Fig. 57. From Dürer’s drawing, together with extra written information, the model of the cube can be made. Looking at the multilayer polyhedra, a few questions come up: Can we make folding plans of these objects just like Dürer showed us for the elementary polyhedra? And, if so, how many different folding plans do we have, for instance, for the double cube? Can we find a nice way to unfold a double layer polyhedron? Or, in general, how can we construct models of these objects (Roelofs 2018)? We start with the double cube, for which we will concentrate on the flattened version of the structure in Fig. 58: Can we make a folding plan in 2D that can be

Fig. 56 Cube plan

Fig. 57 Cube da Vinci/Pacioli


Fig. 58 Double cube

Fig. 59 Flattened version

folded into the “double cube”? And if so, can we also find a strip folding plan (Fig. 59)? Both turned out to be possible. An example is shown in Fig. 60. So the next question then is: How many different folding plans, or nets, do we have for the double cube? The number of nets for the normal cube is 11 (Fig. 61). Four of these nets are strip folding plans, as shown in Fig. 62. We will start by examining the possible strip folding plans for the double cube. For this, we have to


Fig. 60 The net and the paper model

Fig. 61 The 11 different nets for the cube

realize that when we start with a face of the outer skin, the next face has to be a face of the inner skin. We can make this clear with the light and dark colors in the folding plan (Fig. 63). The way to find the possible folding plans is to use the rolling dice method. For a strip folding plan of the normal cube, the method can be described as follows: we roll a die, numbered 1–6, and each numbered face acts as a stamp. We have to find a path such that every number on the die touches the ground only once and each stamped square has at most two stamped neighbor squares. For the double cube, each neighbor has to be of the opposite color and all numbers from 1 to 6 should occur twice, once on a dark square and once on a light square. With this method, I found eight different solutions, shown in Fig. 64. This is only part of the solution of the problem, but it shows that it is possible to build the double layer cube from one single 2D plan. Other methods will be discussed later. Looking at the strip plans of Fig. 64, there are two plans that start and end with the same number “1”, in different colors (Number 5 and Number 8). Both of

Fig. 62 Strip folding plans for the cube

Fig. 63 Coloring the outer and inner faces

Fig. 64 Strip folding plans of the double layer cube


Fig. 65 Tetrahedron

them turned out not to be practically usable. It’s impossible to really fold them together as a double layered cube. Folding plans can also be made for the double tetrahedron and the double dodecahedron. Starting with the drawings of the tetrahedron and the elevated tetrahedron from Pacioli’s book (Figs. 65 and 66), we can develop the double layered tetrahedron shown in Fig. 67, following the same procedure as we used to obtain the double cube. Because the strip folding plan of the double tetrahedron is also a strip plan for the normal octahedron (Fig. 68), we can use two of them to build the double octahedron (Fig. 69), which is a compound, the well-known Kepler-star. To construct a folding plan for the double icosahedron, I followed another approach. We start with a folding plan of the simple icosahedron and, using two colors, we color this in such a way that no two neighboring triangles have the same color. This represents the alternation inner skin – outer skin that we need for the double icosahedron (Figs. 70 and 71). We then connect a second similar folding plan, but with opposite coloring, to the first one. The complete folding plan now covers the icosahedron twice, and for each triangle in the original icosahedron we now have a face for the inner skin and a face for the outer skin (Fig. 72). The result of this approach can be seen in Figs. 73, 74, and 75.


Fig. 66 Elevated tetrahedron

Fig. 67 Double layered tetrahedron

Finding a net for the double layered dodecahedron turned out to be a more complicated task. Not all possible nets can be practically used. In some cases, you cannot succeed in getting all the faces in the right position. We will discuss this later. Again we start from the drawings designed by Leonardo da Vinci (Figs. 76, 77, and 78).


Fig. 68 Unfolding the double layered tetrahedron

Fig. 69 Double octahedron

The flattened version of the elevated dodecahedron can now be unfolded into the net shown in Fig. 79. So now we have double versions of four of the five Platonic solids (Fig. 80). Nets for the double versions of the Archimedean solids can also be constructed. There is one example that I want to discuss here. In Fig. 81, we see the double version of the pentagonal prism. We can unfold this object in such a way that we get one straight strip of all the square faces of the object. The four pentagonal faces are attached to this strip as shown in Fig. 82.

Fig. 70 Folding plan for the icosahedron

Fig. 71 Faces divided into two groups

Fig. 72 Combining two simple plans


Fig. 73 Final plan for the double icosahedron

Fig. 74 Final paper model

With the straight strip of squares, we fold the pentagonal box and then we can close the box with the pentagonal faces, two at the bottom and two at the top. This means that we can really use this as a box because it can be closed with the two pentagonal faces at the top. This method only works because we started with a prism with an odd number of square faces (Fig. 83). It doesn’t work for the cube, which is a prism with an even number of square faces. It is hard to find a net for the double cube with the property that it can be closed as a box like this pentagonal prism.


Fig. 75 Computer drawing of the double icosahedron

Fig. 76 Dodecahedron

Unfolding the Double Layered Cube

Just like the normal cube, we can unfold the double cube by making some cuts along the edges. The double layered cube can then be unfolded in the way shown in Fig. 84.

Fig. 77 Elevation

Fig. 78 Flattened elevation


Fig. 79 Net for the dodecahedron and model

Fig. 80 Double versions of 4 of the Platonic solids

Looking at the result of the unfolded double layered cube, we can see that the plan is similar to the plan of a simple cube, first presented by Albrecht Dürer in his book Unterweisung der Messung (Fig. 85). But the unfolding of the double layered cube consists of six double faces instead of six simple faces (Fig. 86). When we look closely, we see that the unfolding is an interwoven structure and can be seen as a cut-out part of the interwoven square tiling of Fig. 87. Having seen this, we can go the other way around: we can start with two normal cube plans, weave them together, and then fold the interwoven set into a double layered cube. A few remarks: each of the individual cube plans consists of faces for the inner cube and faces for the outer cube. These faces alternate in each of the plans (in the example of Fig. 88, the outer faces are the faces with the holes). When we weave the two plans together, in the end result all the outer faces have to be on one side


Fig. 81 Double pentagonal prism

Fig. 82 Unfolded paper net of the double pentagonal prism

(in the example of Fig. 88, in the third picture from the left, you can see them on top). The process can be described as follows. First, put the white plan on top of the gray plan and, by rotating, get the gray faces with the holes on top of the white faces without the holes (as illustrated in the second picture of Fig. 88). After that, we have to complete the weaving process by bringing one more white face with a hole to the top. Now we can assemble the double layer cube by folding all the double faces in the right way. Because with this method we make use of a plan of the simple cube to construct the double layered cube, we have 11 different possible choices for our double plans (see Fig. 89). When we compare the set of plans of the example in Fig. 88 with the example in Fig. 90, we can see that in the second example, each element of the set has three inner faces and three outer faces for the double layered cube. And both elements are the same. So in this example, only one type of element is needed. The basic cube


Fig. 83 (a–d) Folding the double pentagonal prism

plan of this example can be seen as one strip of squares. It is one of the four cube plans without a T-junction.
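The rolling dice method described earlier for strip plans of the simple cube can be sketched as a small search. The code below is an illustration under assumptions, not the author's procedure: the die is stored as a tuple (bottom, top, north, south, east, west), the names `roll`, `strip_plans`, and `canonical` are hypothetical, and two plans are counted as the same when they differ only by a rotation or reflection of the plane. The double cube, with its extra color condition, is not treated here.

```python
# Directions on the square grid: name -> (dx, dy)
MOVES = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def roll(die, move):
    """Orientation of the die after tipping it over one edge."""
    b, t, n, s, e, w = die  # bottom, top, north, south, east, west
    if move == "E":
        return (e, w, n, s, t, b)
    if move == "W":
        return (w, e, n, s, b, t)
    if move == "N":
        return (n, s, t, b, e, w)
    return (s, n, b, t, e, w)  # move == "S"


def strip_plans():
    """All 6-square paths on which every face of the die is stamped once."""
    start = (1, 6, 2, 5, 3, 4)
    found = []

    def extend(cells, die, stamped):
        if len(cells) == 6:
            found.append(tuple(cells))
            return
        x, y = cells[-1]
        for move, (dx, dy) in MOVES.items():
            nxt = (x + dx, y + dy)
            new_die = roll(die, move)
            if nxt in cells or new_die[0] in stamped:
                continue
            # A strip: the new square may touch only the square we came from.
            touching = sum((nxt[0] + ax, nxt[1] + ay) in cells
                           for ax, ay in MOVES.values())
            if touching == 1:
                extend(cells + [nxt], new_die, stamped | {new_die[0]})

    extend([(0, 0)], start, {start[0]})
    return found


def canonical(cells):
    """Smallest copy of a set of squares under rotations and reflections."""
    variants = []
    for flip in (1, -1):
        points = [(x * flip, y) for x, y in cells]
        for _ in range(4):
            points = [(-y, x) for x, y in points]  # quarter turn
            min_x = min(p[0] for p in points)
            min_y = min(p[1] for p in points)
            variants.append(tuple(sorted((px - min_x, py - min_y)
                                         for px, py in points)))
    return min(variants)


distinct = {canonical(plan) for plan in strip_plans()}
print(len(distinct))  # 4, the strip plans without a T-junction
```

The search reports four distinct strips, which can be compared with the four strip folding plans of Fig. 62.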

Double Layered Tetrahedron

The double layered cube is constructed by taking a part of the interwoven square tiling in the shape of a plan of a simple cube. For the double layered tetrahedron, we can use the same approach, but now starting with the interwoven triangular tiling (Fig. 91). There are two different plans for the simple tetrahedron. One of them is shown to us by Dürer (Fig. 92). In Figs. 93a and 94, you can see two different solutions for the parts of the interwoven tiling needed to fold the double layer tetrahedron. The steps of the weaving process based on the plan presented by Dürer can be seen in the first three pictures of Fig. 94. In the fourth picture of Fig. 94, the final result is shown.


Fig. 84 (a–d) Unfolding the double layered cube

Fig. 85 Dürer’s cube plan

When we use Dürer’s plan, both elements used for the construction are equal. In the other case, both elements will be different, as can be seen in Fig. 95. The weaving process in this case is based on rotation. There are more possibilities to create sets of plans with which we can make a double layered tetrahedron. One of them is shown in Fig. 96 (third solution) and Fig. 97. This is a perfect solution, but in this chapter, we will limit ourselves to designs based on plans of the simple polyhedra.


Fig. 86 Plan of the double layered cube

Fig. 87 Interwoven square tiling

Double Layered Cuboctahedron

Dürer also showed plans of some of the Archimedean solids. In Fig. 98, we see Dürer’s drawing of the plan of the cuboctahedron. The cuboctahedron is built with squares and triangles. We also have an Archimedean tiling with squares and triangles. It is interesting to see that Dürer’s plan is a fragment of this tiling (Fig. 99). So it looks like we can use the same approach as we did for the double layered cube or the double layered tetrahedron to get the double layered cuboctahedron. But when we double the tiling, as we did with the square tiling and the triangular tiling, the result now isn’t a weaving of two layers (Fig. 100). Everything seems to be connected. And after folding the double cuboctahedron from the cut-out folding plan shown in Fig. 101, we don’t get one connected object but a compound of two separate structures (Fig. 102). This has to do with the odd or even number of tiles coming together at a vertex, which is explained in my paper “Connected Holes,”


Fig. 88 (a–d) Making a model of a double layered cube

Fig. 89 Alternative plans for cube

Bridges 2008. The same effect will show up when we double the octahedron: then, too, the end result will be a compound. When we want to build models starting from a net, we generally have more choices. The decision which net to choose can be based on practical reasons or


Fig. 90 Double layered cube

Fig. 91 Interwoven triangular tiling

on esthetic reasons. When we look at the choices Dürer made for the nets of the Platonic solids, we see that he had a preference for a net which contains at least one node with the highest possible degree. For the cube, this is four and for the dodecahedron, it is five. The graph of the net Dürer took for the dodecahedron has two nodes with degree 5 (Figs. 103, 104, and 105).
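The graphs of the nets and their node degrees can be explored by brute force for the cube. The sketch below rests on a standard observation rather than on anything stated in this chapter: an edge unfolding of the cube corresponds to a spanning tree of its face-adjacency graph, and two trees give the same net when a symmetry of the cube maps one onto the other. Faces are labelled (axis, sign), a convention chosen here for convenience, and all function names are hypothetical.

```python
from itertools import combinations, permutations, product

# Label each face of the cube by (axis, sign), e.g. (0, 1) is the +x face.
FACES = [(axis, sign) for axis in range(3) for sign in (1, -1)]

# Two faces share an edge unless they are opposite, i.e. lie on the same axis.
EDGES = [pair for pair in combinations(FACES, 2) if pair[0][0] != pair[1][0]]


def is_spanning_tree(edges):
    """True if the five given edges connect all six faces."""
    reached = {FACES[0]}
    grew = True
    while grew:
        grew = False
        for a, b in edges:
            if (a in reached) != (b in reached):
                reached.update((a, b))
                grew = True
    return len(reached) == len(FACES)


def apply(symmetry, face):
    """Image of a face under an (axis permutation, sign changes) pair."""
    perm, signs = symmetry
    axis, sign = face
    return (perm[axis], sign * signs[axis])


def canonical(tree, symmetries):
    """Smallest image of a tree under all symmetries of the cube."""
    return min(
        tuple(sorted(tuple(sorted(apply(s, f) for f in edge)) for edge in tree))
        for s in symmetries
    )


def max_degree(tree):
    """Largest number of neighbors any face has in the folding plan."""
    counts = {}
    for edge in tree:
        for face in edge:
            counts[face] = counts.get(face, 0) + 1
    return max(counts.values())


# The 48 symmetries of the cube (rotations and reflections).
SYMMETRIES = list(product(permutations(range(3)), product((1, -1), repeat=3)))

trees = [t for t in combinations(EDGES, 5) if is_spanning_tree(t)]
nets = {canonical(t, SYMMETRIES) for t in trees}

degree_counts = {}
for net in nets:
    degree_counts[max_degree(net)] = degree_counts.get(max_degree(net), 0) + 1

print(len(trees), "spanning trees,", len(nets), "nets up to symmetry")
print("nets by highest node degree:", dict(sorted(degree_counts.items())))
```

The 384 spanning trees collapse to the 11 nets of Fig. 61. Exactly one of them has a node of degree 4, the cross-shaped plan that Dürer chose, and four have no node of degree higher than 2, which are the strip plans. The same approach applies in principle to the dodecahedron and the icosahedron, but the number of spanning trees is then in the millions, so a plain enumeration like this one becomes slow.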

Double Layered Dodecahedron

It’s not possible to make a regular tiling with regular pentagons. So for the plan of the dodecahedron we cannot take a part of a tiling. But we can directly double any plan of a simple dodecahedron to create a double layered dodecahedron. The plan Dürer has chosen is quite familiar and has nice symmetries. For the weaving process, we can look for other properties. Looking at the graph of Dürer’s choice, we can say that this is the choice with the highest vertex values. The graph with the lowest vertex values is the representation of the strip plan, used for the design shown in Fig. 110. Making this one is like really weaving two strips together.


Fig. 92 Dürer’s tetrahedron plan

Fig. 93 (a) Double tetrahedron plans. (b) Two different solutions

The net Dürer chose for the dodecahedron has a twofold rotational symmetry. This will result in two equal parts when we apply this net for the double layered dodecahedron (Fig. 106). This can be seen as a practical advantage. Based on this symmetry property, we have a few more choices, as shown in the graphs of Fig. 107 and the following figures (Figs. 108, 109, and 110). The net used for the paper model in Fig. 111 also has a twofold rotational symmetry and thus two equal parts. The corresponding graph has two nodes of degree 4.


Fig. 94 (a–d) Weaving and folding the double layered tetrahedron based on Dürer’s plan

Fig. 95 (a–c) Alternative plan for the double tetrahedron

For this model, both parts of the construction are equal, and we have to weave two single plans together (Fig. 112a) before we can fold the final double dodecahedron (Fig. 112b). The choices Dürer made for the nets of seven of the Archimedean solids he described are also remarkable. Here, too, we see nets of which the corresponding graph has one node with the highest degree possible, and in most cases, this node is in the most central position (Figs. 113 and 114). The reasons for these choices can again be esthetic, practical, or both. The net for the next example of a double layered dodecahedron is based on Dürer’s


Fig. 96 Three different combinations of simple plans

Fig. 97 Not a combination of two simple plans

supposed preference, and although it needs two different parts, it has an advantage in the building process. To entwine the parts, they have to be laid on top of each other and then the upper part has to be rotated to get the first step in the entwinement (Figs. 115, 116, 117, and 118).

Double Layered Icosahedron

Also for the double layered icosahedron, there are many different basic plans we can use, in total 43,380. In Fig. 118, we see Dürer’s choice which is used for the


Fig. 98 Plan for the cuboctahedron

Fig. 99 Plan, embedded in Archimedean tiling

design of the model of the double layered icosahedron in Fig. 119c. In Fig. 119a, b, the parts and the weaving of the parts are shown. In Figs. 120 and 121, two more solutions for the double layered icosahedron are shown. The second example is again based on a strip folding plan. Now, for the icosahedron, the most practical solution seems to be a net with one central triangle (Fig. 122). An alternative, with the advantage of not having two


Fig. 100 Tiling after doubling

Fig. 101 Double polyhedron plan

different basic parts is the net with a set of two connected triangles in the center (Fig. 123). The final result will look like Fig. 124 in both cases. We now have developed a general method to construct plans for double layered polyhedra. This weaving concept will work for all polyhedra which have at least one vertex in which an odd number of faces come together. As a final example, the construction of the double layered truncated octahedron is shown (Fig. 125).

Elevation: Combinations of Polyhedra

The double layered polyhedra which we studied so far are in fact a simplification of the Pacioli-da Vinci elevated polyhedra. Let’s go back to the original elevated


Fig. 102 Final model

Fig. 103 Dürer’s cube plan

polyhedra and see how the concept of weaving can be applied to create models of the elevated polyhedra (Fig. 126). Basically we can take the same steps as in the construction of the double layered cube in Fig. 90. There is, however, one step we have to take before we start the weaving: first we have to fold the pyramids, the elevations, in both parts of the


Fig. 104 Dodecahedron

Fig. 105 Graphs of the plans

model. This is shown in the second, third, and fourth pictures of Fig. 127. So first fold, then weave, and then fold again: a three-step process. This process also opens another possibility of making variations of the elevated polyhedra: we can vary the height of the elevation in such a way that the outer skin of the construction gets the shape of another polyhedron. In the outer skin of the model in the final picture of Fig. 127, we can recognize the rhombic dodecahedron.
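The height of elevation that turns the cube into a rhombic dodecahedron can be found with a small coplanarity check. The sketch below is an illustration with assumed coordinates: the cube has vertices (±1, ±1, ±1), so its edge is 2, and pyramids of height h are placed on the +x and +y faces. Two triangular pyramid faces that meet along a cube edge merge into one rhombus exactly when they lie in one plane. The function name `coplanarity` is hypothetical.

```python
def coplanarity(h):
    """Scalar triple product for two neighboring pyramid faces.

    The pyramids on the +x and +y faces have apexes (1 + h, 0, 0) and
    (0, 1 + h, 0) and share the cube edge from p to q. The result is zero
    when the triangles apex_x-p-q and apex_y-p-q lie in one plane.
    """
    p, q = (1.0, 1.0, 1.0), (1.0, 1.0, -1.0)
    apex_x, apex_y = (1.0 + h, 0.0, 0.0), (0.0, 1.0 + h, 0.0)
    u = [a - b for a, b in zip(apex_x, p)]
    v = [a - b for a, b in zip(q, p)]
    w = [a - b for a, b in zip(apex_y, p)]
    # Determinant of the 3x3 matrix with rows u, v, w.
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


for h in (0.5, 1.0, 1.5):
    print(h, coplanarity(h))
```

With these coordinates the product equals 2h² − 2, which vanishes for h = 1, that is, for a pyramid height of half the cube edge. The 6 × 4 = 24 triangular faces then merge in pairs across the 12 cube edges into the 12 rhombi of the rhombic dodecahedron.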

Fig. 106 Based on Dürer’s plan

Fig. 107 Symmetric graphs for the nets

Fig. 108 Second graph of Fig. 107


Fig. 109 Third graph of Fig. 107

Fig. 110 Fourth graph of Fig. 107

Fig. 111 Alternative net with two nodes of degree 4


Fig. 112 Paper model of the alternative net with two nodes of degree 4. (a) Weaving of the two layers. (b) The almost finished folded model

Fig. 113 (a) Truncated tetrahedron. (b) Truncated cube. (c) Cuboctahedron. (d) Truncated octahedron. (e) Rhombicuboctahedron. (f) Snub cube. (g) Truncated cuboctahedron

The basic inspiration for the model in Fig. 128 was Leonardo’s elevated tetrahedron. In this model, the height of the elevation is changed in such a way that the outer skin becomes a cube. Indeed, we know we can construct a tetrahedron inside a cube by using the face diagonals of the faces of the cube. With this model, this is illustrated in a nice way.
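The relation between the tetrahedron and the cube used in this model can be verified numerically. The sketch below assumes a cube with vertices (±1, ±1, ±1); the variable names are hypothetical. It picks every other vertex of the cube and checks that all six connecting segments are face diagonals of equal length, so that these four vertices form a regular tetrahedron. The four remaining cube vertices are the apexes of the elevation.

```python
import math
from itertools import combinations, product

cube = list(product((1, -1), repeat=3))

# Every other vertex of the cube: those with an even number of minus signs.
tetrahedron = [v for v in cube if v[0] * v[1] * v[2] == 1]
apexes = [v for v in cube if v not in tetrahedron]

# All six edges of the tetrahedron are face diagonals of the cube.
edge_lengths = {round(math.dist(p, q), 12) for p, q in combinations(tetrahedron, 2)}
print(edge_lengths, round(2 * math.sqrt(2), 12))

# Each apex sits over one face of the tetrahedron, at cube-edge distance (2)
# from the three vertices of that face.
for apex in apexes:
    base = [v for v in tetrahedron if math.isclose(math.dist(apex, v), 2.0)]
    print(apex, "is elevated over", base)
```

Each pyramid of this elevation therefore has three lateral edges that are edges of the cube, which is why the outer skin closes up into the cube.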


Fig. 114 Graphs of the nets of Dürer’s plans of seven of the Archimedean solids

Fig. 115 Star-shaped plan for the dodecahedron

We also know that we can construct a cube inside the dodecahedron by using face diagonals of the faces of the dodecahedron. This, too, can be illustrated with a model of a double layered polyhedron. The parts, the folding of the parts, the weaving, and the final model are shown in Fig. 129. So again the three-step process: folding, weaving, and folding. Another nice and interesting property of this three-step process is that the relation between the flat parts and the final double layered polyhedra is not obvious and sometimes unexpected. An example based on the elevated icosahedron is shown in Fig. 130. The plan seems to be 5/6 of a pattern with rotation symmetry. Only after


Fig. 116 (a, b) Rotating the top layer to start the entwinement

Fig. 117 Entwined plan of the double dodecahedron together with the final model

the first folding step, we can recognize the plan of an icosahedron (Fig. 130b). After interweaving two of those folded plans, we can fold everything together into the final elevated icosahedron.
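The cube inside the dodecahedron mentioned above, built from face diagonals, can be checked with standard coordinates. This is a sketch under assumptions: the dodecahedron is given by the usual vertex set (±1, ±1, ±1) together with the cyclic permutations of (0, ±1/φ, ±φ), and two vertices are taken to span a face diagonal when they are φ edge lengths apart and share a common neighbor. The names used are hypothetical.

```python
import math
from itertools import combinations, product

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

# Eight of the twenty vertices already form a cube with edge 2.
cube = [tuple(float(c) for c in v) for v in product((1, -1), repeat=3)]
rest = []
for a, b in product((1 / PHI, -1 / PHI), (PHI, -PHI)):
    rest += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
dodecahedron = cube + rest

# The shortest distance between two vertices is the edge length.
edge = min(math.dist(p, q) for p, q in combinations(dodecahedron, 2))


def neighbors(v):
    """Vertices joined to v by an edge of the dodecahedron."""
    return {w for w in dodecahedron if math.isclose(math.dist(v, w), edge)}


cube_edges = [(p, q) for p, q in combinations(cube, 2)
              if math.isclose(math.dist(p, q), 2.0)]

# A diagonal of a regular pentagon is PHI times its side, and its two end
# points have a common neighbor on that pentagon.
all_diagonals = all(
    math.isclose(math.dist(p, q), PHI * edge) and bool(neighbors(p) & neighbors(q))
    for p, q in cube_edges
)
print(len(cube_edges), round(edge, 6), all_diagonals)
```

The 12 edges of the cube turn out to be diagonals of pentagonal faces, which matches the 12 faces of the dodecahedron.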

Strips and Rings

There is one other association that appeared to be fruitful in the design process of weaving models of double layered polyhedra. The structure of the double layered cube shown in Fig. 131 has much in common with


Fig. 118 Plan of the icosahedron

Fig. 119 (a–c) Parts, weaving, and final double layered icosahedron

the Borromean ring structure, the three interwoven rings shown in Fig. 132. We can divide the 2 × 6 faces of the double layered cube into three strips of four faces (Fig. 133a). In the final model, we can still recognize the three strips, now as three interwoven rings (Fig. 133c). The final object, however, has the same structure as the double layered cube of Fig. 131. The combination polyhedron cube/dodecahedron of Fig. 129 can also be made as a three-ring construction. The strips and the final folded model are shown in Fig. 134. The Borromean ring structure can still be recognized thanks to the use of the colors (Fig. 134b).


Fig. 120 Cross plan for the double icosahedron

Fig. 121 Single strip plan for double layered icosahedron

Fig. 122 (a, b) Two nets with a central triangular face, to be combined to form the double icosahedron


Fig. 123 (a, b) Single plan and interwoven plans

Fig. 124 Double icosahedron

The real double layered dodecahedron consists of 2 × 12 = 24 faces. We can divide these into four rings of six faces (Fig. 135) and then construct the four-ring double layered dodecahedron as shown in the last picture of Fig. 136. In this construction, each set of two rings is interlocked, in contrast to the Borromean rings. In the same way, we can make a model of the double layered rhombic triacontahedron with six strips (Fig. 137). The rhombic triacontahedron has 30 faces, all rhombi. Doubling this polyhedron leads to 60 faces, which can be divided into 6 strips with 10 faces each. Each strip is made up of five inner faces and five outer faces, alternately.
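The face counts behind these strip constructions follow one pattern: doubling gives twice the faces of the simple polyhedron, and these are shared out equally over the strips. The small table below only restates the numbers given in the text; the helper `faces_per_strip` is a hypothetical name.

```python
# name: (faces of the simple polyhedron, number of strips or rings)
MODELS = {
    "cube": (6, 3),
    "dodecahedron": (12, 4),
    "rhombic triacontahedron": (30, 6),
}


def faces_per_strip(faces, strips):
    """Faces in each strip when the doubled faces are divided equally."""
    doubled = 2 * faces
    per_strip, remainder = divmod(doubled, strips)
    if remainder:
        raise ValueError("the doubled faces do not divide evenly into strips")
    return doubled, per_strip


for name, (faces, strips) in MODELS.items():
    doubled, per_strip = faces_per_strip(faces, strips)
    print(f"{name}: {doubled} faces = {strips} strips of {per_strip}")
```

For the rhombic triacontahedron each strip of 10 faces alternates between inner and outer faces, so it contributes five faces to each skin.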


Fig. 125 (a) Single strip plans and (b) interwoven plan for (c) the double truncated octahedron

Fig. 126 Model based on Leonardo’s drawing of the elevated cube. Fold – weave – fold

To make the paper model, the individual strips were cut out with a laser cutter (Fig. 138a). The first step is to fold one single strip into a ring (Fig. 138b). After that, all the other strips are woven into the construction as shown in Fig. 138c. Figure 138d shows the complete model of this double layered polyhedron. The rhombic triacontahedron is the dual of the icosidodecahedron, one of the Archimedean solids. In the same way, we can make a model of a double layered version of the dual of the rhombic cuboctahedron (Fig. 139).


Fig. 127 (a–f) Double polyhedron: outer skin is the rhombic dodecahedron as a result of the elevation of the cube

Fig. 128 (a–c) Model based on Leonardo’s drawing of the elevated tetrahedron. Fold – weave – fold

Again this results in a model that can be divided into six strips. The main difference is that the strips do not follow the meridians of the sphere that surrounds this polyhedron. Therefore we get the bent strips shown in Fig. 140a. When we start building this model, we have to interweave four strips as shown in Fig. 140b. We then continue by interweaving the next strip, which we close as a ring (Fig. 140c). Figure 140d shows the completed model. The final model that I want to show here is the model of a double layered dual of the rhombicosidodecahedron (another Archimedean solid). The model is constructed out of 10 equal strips, interwoven through each other. Again, because the rings do not follow the great circles of the circumscribed sphere, the strips for the construction are bent, as can be seen in Fig. 142a. The easiest way to build this model is to start by making one ring (Fig. 142b) and then weave five strips through this ring as shown in Fig. 142c.


Fig. 129 (a–d) Variation of the double layered cube: combination of cube and dodecahedron

Finally, Fig. 142d shows the completed paper model of this double polyhedron. Looking at the triangular, square, and pentagonal holes, the basic Archimedean solid, the rhombicosidodecahedron, can still be recognized. I think this ring-weaving concept is a very nice way to construct models of some of the double layered polyhedra.

Zonohedra
In Fig. 127, we have seen a double polyhedron which is in fact a combination of two different polyhedra, the cube and the rhombic dodecahedron. Both of these polyhedra are built with parallelograms and therefore they belong to the family of zonohedra. This is an interesting class of polyhedra that deserves special attention, especially when we look at the possibilities to double these polyhedra (Figs. 143 and 144). We have studied the double cube intensively, and one of the approaches to constructing a model was to just double a basic plan of the cube, interweave both plans, and fold this together to construct the final model.


Fig. 130 (a–d) Model based on Leonardo’s drawing of the elevated icosahedron. Fold – weave – fold

Fig. 131 Double layered cube

Fig. 132 Three interwoven rings

Fig. 133 (a–c) Weaving with strips. Double cube from three strips


Fig. 134 (a) Strip weaving: cube/dodecahedron. (b) Borromean ring model of the cube/dodecahedron

Fig. 135 (a) Double layered dodecahedron (four strips). (b) One strip

Fig. 136 (a, b) Four strips, to be folded to four rings for the construction of the double dodecahedron

We can follow the same steps to obtain a double version of the rhombic dodecahedron. Figure 145 shows the interwoven cut-out plans for the double rhombic dodecahedron, and in Fig. 146, the final model is shown.


Fig. 137 (a, b) Strip weaving: double layered rhombic triacontahedron (six strips)

Fig. 138 (a–d) The construction of the paper model of a double layered rhombic triacontahedron

Polar Zonohedra
A special subgroup of the zonohedra is the so-called polar zonohedra. When we realize that the cube and the rhombic dodecahedron are in fact the first two members


Fig. 139 (a, b) Strip weaving: double layered dual of the rhombic cuboctahedron (six strips)

Fig. 140 (a–d) The construction of the paper model of a double layered rhombic cuboctahedron

of the group of polar zonohedra, we can develop a more general way to construct models of the double versions of the members of this group. Figure 147 shows the front and the top view of the first three polar zonohedra. In the top view, we see the threefold rotational symmetry of the cube and the fourfold symmetry of the rhombic dodecahedron. The next polyhedron in this group is the rhombic icosahedron with fivefold symmetry. Polar zonohedra can


Fig. 141 (a, b) Strip weaving: double layered dual of the rhombicosidodecahedron (10 strips)

Fig. 142 (a–d) The construction of the paper model of a double layered dual of the rhombicosidodecahedron

be decomposed into as many “zones” as the order of their rotational symmetry. Each zone is a strip of connected faces, starting with a face at the top and ending with a face at the bottom. We can see such a strip of the rhombic icosahedron in Fig. 148a. The following step is to double these strips and to weave pairs together


Fig. 143 (a) Elevated cube. (b) Cube and rhombic dodecahedron

Fig. 144 Basic plans for a cube and for a rhombic dodecahedron

Fig. 145 Interwoven paper plans for the construction of the double rhombic dodecahedron


Fig. 146 (a, b) Final model of the double rhombic dodecahedron

Fig. 147 (a, b) The first three members of the group of polar zonohedra

as in Fig. 148b. To make a model of the rhombic icosahedron, we need five of those interwoven strips (Fig. 148c). Now we can connect the double strips as shown in Fig. 149a. We have to make sure that the inner faces get connected to the outer faces. After making the second connection at each of the double strips, we already get the shape of the top of the double rhombic icosahedron (Fig. 149b). When we continue connecting the double strips, we finally get the complete model of the double rhombic icosahedron. The next members of the group of the polar zonohedra have 6-, 7-, and 8-fold symmetry, respectively. To show a different strategy for making a model of a double polar zonohedron, I will start with the polar zono-8 as an example. The basic shape is shown in Fig. 150. Again we will use the strips of connected faces that start with a face at the top and end with a face at the bottom. As we can see in Fig. 151a, b, there are two ways to compose such a strip. The helical strip can be left turning or right turning. We will use both types. To make the double polar zonohedron, the strips should be composed of both inner and outer faces, alternately. We then construct the model by weaving the single


Fig. 148 (a–c) Single strips, which have to be interwoven to form the basic parts for the construction of the double rhombic icosahedron

strips. In Fig. 152a, we can see the left turning strip and the right turning strip. The weaving process is shown in Fig. 152b, c. In Fig. 153, we can see the nice interwoven structure of the model of a double polar zonohedron-8. The final double polar zonohedron-8 is not a compound but one single double polyhedron. In every polar zonohedron, all vertices have degree 4 (at each vertex, four faces meet) except the first ring of vertices next to the top vertex and the first ring of vertices next to the bottom vertex. Those vertices all have degree 3, and therefore the double polar zonohedron becomes one single object. It is nice to use this left-right weaving technique for the other polar zonohedra as well (Fig. 154). In the next pictures, this technique is applied to construct a model of the double polar zonohedron-6 (Fig. 155). Again we have two types of strips, and we get a nice pattern on the interwoven double polar zonohedron-6, as we have also seen on the double polar zonohedron-8 (Fig. 156).
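The bookkeeping for polar zonohedra can be sketched in a few lines of Python. The formula used here is not stated in the text: it is the standard count that a polar zonohedron with n-fold symmetry has n(n − 1) rhombic faces, arranged in n − 1 rings of n faces, so that a top-to-bottom strip contains n − 1 faces. The function name `polar_zonohedron` is hypothetical.

```python
def polar_zonohedron(n):
    """Face and strip counts for the polar zonohedron with n-fold symmetry.

    Assumes the standard count of n * (n - 1) rhombic faces (not from the text).
    """
    if n < 3:
        raise ValueError("n must be at least 3")
    faces = n * (n - 1)
    return {
        "faces": faces,
        "strips": n,                # one top-to-bottom strip per symmetry step
        "faces_per_strip": n - 1,   # a strip crosses each ring of faces once
        "doubled_faces": 2 * faces,
    }


for n, name in [(3, "cube"), (4, "rhombic dodecahedron"), (5, "rhombic icosahedron")]:
    print(name, polar_zonohedron(n))
```

For n = 3, 4, and 5 this gives 6, 12, and 20 faces, matching the cube, the rhombic dodecahedron, and the rhombic icosahedron named in the text, with five strips for the rhombic icosahedron.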


Fig. 149 (a–d) The construction of the double rhombic icosahedron

Fig. 150 Polar zonohedron-8


Fig. 151 (a–c) Left and right turning strips of the zonohedron

Fig. 152 (a) Single strips, left and right turning. (b, c) Weaving the strips together


Fig. 153 (a, b) Front and top view of the paper model of the double polar zonohedron-8

Fig. 154 (a–c) Perspective, top and front view of the polar zono-6

Fig. 155 (a) Single strips, left and right turning. (b) The completed paper model


Fig. 156 Double polar zonohedron-8 and double polar zonohedron-6 constructed using the left-right weaving technique

Fig. 157 Double triangular net

Conclusion
We have seen different methods to construct double polyhedra. Some models were made from just one single strip; others were made using two-layer interwoven plans based on the plans of the single polyhedron. The technique of weaving multiple strips was also applied. The starting points for the double polyhedra were the regular and the semiregular solids, their duals, and the polar zonohedra. With the developed techniques, we can now explore other geometrical shapes like columns and tori.


Fig. 158 Double tetrahelix

Fig. 159 Double triangulated torus

To give a few examples: starting from an interwoven double triangular net (Fig. 157), we can fold a double tetrahelix (Fig. 158), which then could be bent to construct a double triangulated torus (Fig. 159).

Cross-References
Geometric and Aesthetic Concepts Based on Pentagonal Structures
Mathematics, Humanities, and the Language Arts: An Introduction


References
Clinton JD (2014) Intersecting cylinders and the jitterbug. Bridges Conference Proceedings, Winfield, pp 345–346
Dürer A (1525) Underweysung der Messung. Nürnberg
Grünbaum B (2003) “New” uniform polyhedra. In: Discrete geometry: in honor of W. Kuperberg’s 60th birthday. Monographs and textbooks in pure and applied mathematics, vol 253. Marcel Dekker, New York, pp 331–350
Pacioli L, da Vinci L (1991) La Divina Proportione, 1509. Akal SA, Madrid
Roelofs R (2014) Elevations and stellations. Bridges Conference Proceedings, Seoul, pp 235–242
Roelofs R (2016) The elevation of Coxeter’s infinite regular polyhedron 444444. Bridges Conference Proceedings, Jyväskylä, pp 33–40
Roelofs R (2018) Weaving double-layered polyhedra. Bridges Conference Proceedings, Stockholm, pp 139–146
Verheyen HF (1989) The complete set of jitterbug transformers and the analysis of their motion. Comput Math Applic 17(1–3):203–250

Part II Mathematics, Humanities, and the Language Arts

Mathematics, Humanities, and the Language Arts: An Introduction

31

Gizem Karaali and Bharath Sriraman

Contents
Cross-References

Abstract
In this short introduction, the section “Mathematics, Humanities, and the Language Arts” of the Handbook of the Mathematics of the Arts and Sciences is discussed.

Keywords
Literature · Mathematics · Poetry · Mathematics and Fiction

Mathematics is at the heart of four of the seven historic liberal arts (arithmetic, music, geometry, and astronomy, making up the quadrivium). As such, it is natural to expect that today’s humanists would feel some kinship towards mathematics. Unfortunately, this does not seem to be the case. Most humanists do not associate mathematics with the humanities; they see the humanist’s task as decidedly disjoint from that of the mathematician. Of course, this perspective is not limited to the humanists; many mathematicians agree. Several contributors to this handbook, on the other hand, feel strongly that the humanities and mathematics are close relatives,

G. Karaali
Department of Mathematics, Pomona College, Pomona, CA, USA

B. Sriraman
Department of Mathematical Sciences, University of Montana, Missoula, MT, USA

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_144


though they may not always be on speaking terms. They view the intertwining of mathematical themes and structures in works of literature as one of the many bridges between these two worlds. Examples abound where such bridges have been instrumental in bringing interesting mathematics to the public eye as well as tackling societal themes subversively. Edwin A. Abbott’s Flatland is one such classic, where geometry was used to satirize and challenge the mores of Victorian society. Charles Dodgson’s Euclid and his Modern Rivals is another good example, where contemporary geometry books were compared to the 13 books of Euclid’s Elements and were deemed inferior to the treatment of geometry in the Elements. The contributors to this section of the Handbook of the Mathematics of the Arts and Sciences offer us an eclectic selection of chapters where we can see for ourselves how the mathematical and the literary can come together to weave a coherent and even elegant tapestry. Perhaps because the emphasis on form and concision in poetry especially appeals to the mathematical aesthete, most of the articles in this section focus on mathematical poetry and the poetry of mathematics. Gizem Karaali and Lawrence M. Lesser, in Chap. 32, “Mathematics and Poetry: Arts of the Heart,” offer a broad overview of the world of mathematical poetry by exploring the closely related phrases “mathematical poetry,” “poetic mathematics,” “mathematics of poetry,” and “poetry of mathematics.” In Chap. 36, titled “Poems Structured by Mathematics,” Daniel May zeroes in on the formal relationship between mathematics and poetry and takes us on a journey of discovery into the world of mathematically structured poetry. In particular, he addresses the different uses of mathematical ideas in poetry, extracts mathematical structures from these different uses, and touches on the work of the Oulipo group, which intentionally used mathematical forms.
Poetry is a paragon of concision; poetic language is dense in meaning. As such, Dmitri Manin’s Chap. 34, “Running in Shackles: The Information-Theoretic Paradoxes of Poetry,” is a welcome addition to this section. Manin cleverly employs the tools of information theory to explore whether we can translate the first sentence of this paragraph into precise mathematics. Karaali and Lesser in Chap. 32, “Mathematics and Poetry: Arts of the Heart,” explore deeper connections that link mathematics and poetry. In their own words, their chapter aims to “carve out a dynamic and playful transdisciplinary space in which differences melt into similarities, allowing the mind to expansively perceive and embrace the boundless nature of the interaction between two fields commonly seen as being separate and disjoint.” This section also includes Chap. 33, “‘Elegance in Design’: Mathematics and the Works of Ted Chiang,” a delectable chapter by Jessica K. Sklar on the mathematically rich world of Ted Chiang’s stories. Many people enjoy science fiction; Ted Chiang writes science fiction that is especially close to a mathematician’s heart. Sklar opens these stories up for those who already enjoy Chiang’s work and for the others who are just coming on board. Another chapter in this section engages with Lewis Carroll, aka Charles Lutwidge Dodgson, another author dear to many mathematicians’ hearts. In Chap. 37, “Lewis Carroll’s Defense of Euclid: Parallels or Contrariwise,” Natalie Schuler
Evers investigates the Euclidean and non-Euclidean traces in Through the Looking Glass and reminds us once more that mathematics is naturally embedded into our worldview and hence can be deeply embedded in works we never expected to find it. We believe these chapters show the reader the varieties of mathematical experiences that are to be had in the world of the language arts. Mathematics and language are intricately connected, and together they draw for us the boundaries and expanses of our humanity.

Cross-References
Mathematics and Origami: The Art and Science of Folds

Mathematics and Poetry: Arts of the Heart

32

Gizem Karaali and Lawrence M. Lesser

Contents
Introduction
Mathematics of Poetry
  Syllabic Verse
  Rhyme
  Visual Form
  Other Mathematical Concerns About Poetry
Poetry of Mathematics
Poetic Mathematics
Mathematical Poetry
Educational Possibilities
Further Reading and Making Connections
References

Abstract
Two mathematical poets share reflections on the overlap between mathematics and poetry. In particular, they respond to questions such as what makes mathematics poetic, what makes poetry mathematical, what mathematical poetry might entail, and how it might be used in an educational setting. An extensive list of references points to the rich and growing literature on the topic.

G. Karaali
Department of Mathematics, Pomona College, Pomona, CA, USA

L. M. Lesser
Department of Mathematical Sciences, The University of Texas at El Paso, El Paso, TX, USA

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_45


Keywords
Poetry · Meter · Rhyme · Visual poetry · Pedagogy · Mathematical poetry

Introduction
Mathematics and poetry are realms that the uninitiated often believe are distinct or even disjoint. Such people would not be surprised to learn that a search for books on “math and poetry” yields few hits before the twentieth century. (The nineteenth-century English mathematician Ada Lovelace is known to have described mathematics as the “poetic science,” but other than her letters, we have no record of this line of inquiry in the literature.) Indeed, if poetry is concise linguistic expression of emotion and experience, and if, as many people believe, mathematics is not experiential and should be free of emotion, then there should be no overlap! But emotions do not leave the building when math makes its entrance (e.g., see Zeki et al. (2014) for a study that shows math indeed uses the emotional part of the brain). In fact, for many mathematicians, as well as students and teachers of mathematics, and even for many who have proudly avoided that particular four-letter word for decades, mathematics evokes strong emotions. We have found that mathematical poetry can be an outlet for these very human responses. More broadly, in this article, we are interested in exploring the various connections between mathematics and poetry. We begin with a consideration of similarities. Here is what S.T. Sanders wrote in 1942:

Indeed, poet and mathematician are, at times, singularly alike in their sense of detachment from so-called reality, and in their devotion, the one to his logic-spun domain, the other to his empire of fancy and emotion. Alongside of the poet’s dream of a Paradise Lost, or an Ancient Mariner, may be set the mathematician’s vision of a geometry not Euclid’s or a physics not Newtonian. (Sanders 1942)

In other words, mathematicians and poets both live in their respective dream worlds. Thus, what is poetical shares some essential characteristics with what is mathematical:

In each is the instinct for seeing simplicities in the complex thing, the mathematician reducing it to his points, lines, variables, the poet expressing it in terms of equally elementary things, as light, setting, beauty. Again, in each is the urge for a picturization with universal appeal. Tennyson’s “Break, break on thy cold grey stones, O Sea” matches in the wide sweep of its call to human emotion the universality of reason’s conviction that, given Euclid’s assumptions, the sum of the angles of every plane triangle is 180 degrees. (Sanders 1942)

Sanders romanticizes mathematics by associating it with poetry and thus flatters mathematicians and poets alike. But when we dig deeper into this brief essay, or more broadly, look into the precise relationship between the two, what do we get? Is there really some commonality to speak of or are mathematicians unjustifiably claiming a disciplinary kinship to poetry?


Before attempting an answer to this question, we cite some others positing an affinity between mathematics and poetry; we expect the reader will have seen some but hopefully not all:
• “Pure mathematics is, in its own way, the poetry of logical ideas.” – Albert Einstein
• “Mathematics is one of the essential emanations of the human spirit – a thing to be valued in and for itself – like art or poetry.” – Oswald Veblen
• “A mathematician who is not at the same time something of a poet will never be a full mathematician.” – Karl Weierstrass
• “What, after all, is mathematics but the poetry of the mind, and what is poetry but the mathematics of the heart?” – David Eugene Smith
• “You may fly to poetry and music, and quantity and number will face you in your rhythms and your octaves.” – Alfred North Whitehead
• “The true spirit of delight, the exaltation, the sense of being more than man, which is the touchstone of the highest excellence, is to be found in mathematics as surely as in poetry.” – Bertrand Russell
• “The union of the mathematician with the poet, fervor with measure, passion with correctness, this surely is the ideal.” – William James
• “If a man is at once acquainted with the geometric foundation of things and with their festal splendor, his poetry is exact and his arithmetic music.” – Ralph Waldo Emerson
• “Poetry is a mystic, sensuous mathematics of fire, smoke-stacks, waffles, pansies, people, and purple sunsets.” – Carl Sandburg
• “Poetry is a form of mathematics, a highly rigorous relationship with words.” – Tahar Ben Jelloun

It is worth pointing out that the above list includes not only mathematicians but also scientists, philosophers, and writers. See also Growney (2008).
Readers may also find it interesting (and challenging) to decide whether each quotation in JoAnne Growney’s quiz (Growney 1992) is better completed with “poetry/poet” or with “mathematics/mathematician.” In his essay “Two Languages and the Chasm Between Them,” Adin Steinsaltz gives a hint at how poetry and mathematics differ (Steinsaltz 2016): “Science takes a myriad of phenomena and gives them one name, while poetry takes one phenomenon and gives it many names.” This is not unlike the way Cobb and Moore (1997) describe how context obscures structure in mathematics but provides meaning in statistics. However, because many famous people were both mathematicians and poets (Lewis Carroll, Johannes Kepler, Dante, Omar Khayyam, Eratosthenes of Cyrene, Bhaskara, etc.), it is not surprising that there are many ways in which poetry and mathematics concur; see Marcus (1998). Both realms involve: difficulty of categorization, balance of invention and discovery, drive towards abstraction, interest in what is hidden, conciseness, infinity, elegance or delight, self-reference, approximation, a search
for truth, and the ability to be explored across the curriculum. It is a fascinating historical note that a mathematician used poetry to strategically transmit and protect a piece of mathematics. Niccolò Tartaglia presented his solution (with triple rhyme scheme and tercets!) to the cubic equation in 1539 to Cardano as a poem (perhaps to ensure it would not be altered). In addition to similarities of essence, mathematics and poetry also have similarities in their process. “Flashes of inspiration” come anytime/anyplace (be ready!), but are usually preceded by much reflection, false starts, and routine of practice to develop tools to attack the blank page. We often try varying structures or starting points (how would you complete the famous quote that “the essence of ______ is its freedom”?) without always knowing what the final destination will be. During the process, the essence or germinal phase may come quickly, followed by a longer period to work through and finalize details. Some works end up being steps toward something “bigger.” We need to know when exactness is needed, when peer feedback or collaboration is helpful, how multiple “right answers” can differ aesthetically, and so on. In the rest of this chapter, we continue our exploration of the connections between mathematics and poetry by trying to open up four related phrases: mathematics of poetry, poetry of mathematics, poetic mathematics, and mathematical poetry. Many of these ideas connect within the realm of education for both authors, so we wrap up our discussion with reflections on the educational possibilities of mathematics and poetry in tandem.

Mathematics of Poetry
Poetry often involves counting, sequences, visual patterns, and other mathematically interesting formal features. Through the centuries, many people have found poetry a fruitful playground to explore mathematically; see also Growney (2006, 2008). Here we explore a few formal poetic features that have mathematical connotations.

Syllabic Verse
Counting is a natural place to start thinking about the mathematics of poetry. Classic forms of English poetry inspired by Greek forms such as the iambic pentameter and more recent ones inspired by the Japanese haiku often require a strict counting of the syllables in each line. This tendency for numerical constraints is not limited to English-language poetry of course. Many other European and non-European languages also have poetic forms depending on syllabic count constraints. For example, twentieth-century republican Turkish poetry inspired by Anatolian bards and poets of earlier centuries also depends on strict syllable counts. These kinds of constraints may be pointing toward the musical origins of poetry. In many historical cultures, poetry was regularly accompanied by music, and often sung. Thus, what made for good rhythm of public music might have translated into similarly mathematical constraints on the sound systems of poetry.


The meter in English poetry does not only involve counting syllables, however. The length of each syllable is also significant, and hence we get into the realm of combinatorics. In fact, the various ways of organizing long and short vowels in Sanskrit poetry were among the motivations for Pingala (ca. 300 BCE) to develop combinatorial mathematics. His Chandaḥśāstra is the first known work on Sanskrit poetic forms. Even though many today avoid classical forms depending on syllabic count constraints, there are also new forms and styles of poetry that are specifically generated around such conditions. Mike Pinter describes how a modern form called the Dekaaz depends inherently on combinatorics and can be fruitfully studied using some standard combinatorial tools (Pinter 2014). A simpler connection with combinatorics may be found in the fib, a form that reflects the consecutive terms of the Fibonacci sequence in the syllable counts of its consecutive lines.
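The counting ideas in this paragraph can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, “S” and “L” stand for short and long syllables, and a long syllable is assumed to last two beats and a short one a single beat. The sketch lists the syllable counts of a fib and answers two standard counting questions about long/short patterns.

```python
from itertools import product


def fib_syllable_counts(lines):
    """Syllable counts for a fib with the given number of lines: 1, 1, 2, 3, 5, ..."""
    counts = []
    a, b = 1, 1
    for _ in range(lines):
        counts.append(a)
        a, b = b, a + b
    return counts


def meters_by_syllables(n):
    """All patterns of n syllables, each short (S) or long (L)."""
    return ["".join(p) for p in product("SL", repeat=n)]


def meters_by_duration(total):
    """All patterns lasting `total` beats, assuming S = 1 beat and L = 2 beats."""
    if total == 0:
        return [""]
    patterns = ["S" + rest for rest in meters_by_duration(total - 1)]
    if total >= 2:
        patterns += ["L" + rest for rest in meters_by_duration(total - 2)]
    return patterns


print(fib_syllable_counts(6))                              # [1, 1, 2, 3, 5, 8]
print(len(meters_by_syllables(3)))                         # 8
print([len(meters_by_duration(t)) for t in range(1, 7)])   # [1, 2, 3, 5, 8, 13]
```

Fixing the number of syllables gives 2^n patterns, while fixing the total duration gives counts that follow the Fibonacci sequence, the same sequence that shapes the fib.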

Rhyme
Rhyme patterns are, obviously, patterns. And mathematics, in one of its many manifestations, is the study of patterns. So, it is natural that we would talk here about rhyme patterns and poetic forms determined by them. While poetry can have intricate internal rhyme patterns and more complex sound alignments that go beyond simple alliteration within lines, the more common focus is on patterns of end-rhyme, using a different capital letter to denote a different sound that ends a line. Here are some standard examples, roughly in order of increasing complexity:
Couplet: AA
Triplet: AAA
Monorhyme: AAAAAAAAA . . .
Enclosed rhyme: ABBA
Alternating rhyme: ABAB
Ballad: ABAB × 4 or ABCB × 4
Limerick: AABBA
Ghazal: AA BA CA DA . . . (this is a very simplified representation; see more at https://poets.org/text/ghazal-poetic-form)
Terza rima rhyme scheme: ABA BCB CDC . . .
Keats odes rhyme scheme: ABAB CDE CDE
Shakespearean sonnet: ABAB CDCD EFEF GG
Villanelle: A1bA2, abA1, abA2, abA1, abA2, abA1A2 (where the capital letters are whole lines (refrains) that are repeated through the poem and the lower-case letters correspond to rhyme sounds that remain the same throughout)
Pantoum: ABCD BEDF EGFH . . . (where the second and fourth lines of each four-line stanza are the first and third lines of the next four-line stanza; see more at https://poets.org/text/pantoum-poetic-form)

The sestina is perhaps the most mathematical rhyme pattern. It is made up of 39 lines, the first 36 of which are organized into six stanzas of six lines each and
the last three make a three-line envoi. The words that end the six lines of the first stanza are then permuted in a specific pattern through the next five stanzas. Though its origins go back at least to the twelfth century, the mathematics of the rhyme pattern of the sestina is perhaps best described by group theory. Caleb Emmons describes this structure poetically via his sestina “$S_{|\{e,s,t,i,n,a\}|}$” (Emmons 2007).
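The text does not spell out the “specific pattern,” so the following Python sketch relies on an assumption: the traditional sestina rule, in which each new stanza takes the end words of the previous stanza in the order 6, 1, 5, 2, 4, 3. The function names are hypothetical. The sketch generates the end-word order of the six stanzas and computes the order of the permutation in the group-theoretic sense.

```python
# Assumption: the traditional sestina rule. Each stanza ends its lines with the
# previous stanza's end words taken from positions 6, 1, 5, 2, 4, 3.
SESTINA_ORDER = (6, 1, 5, 2, 4, 3)


def next_stanza(end_words):
    """End words of the following stanza."""
    return [end_words[i - 1] for i in SESTINA_ORDER]


def sestina_end_words(end_words):
    """End-word orders for all six stanzas, starting from the first."""
    if len(end_words) != 6:
        raise ValueError("a sestina needs exactly six end words")
    stanzas = [list(end_words)]
    for _ in range(5):
        stanzas.append(next_stanza(stanzas[-1]))
    return stanzas


def permutation_order(order):
    """Smallest k >= 1 such that applying the permutation k times restores the start."""
    start = list(range(1, len(order) + 1))
    current = [start[i - 1] for i in order]
    k = 1
    while current != start:
        current = [current[i - 1] for i in order]
        k += 1
    return k


for stanza in sestina_end_words("123456"):
    print("".join(stanza))
print(permutation_order(SESTINA_ORDER))  # 6
```

Under this rule the permutation is a single cycle of length six, so a seventh stanza would repeat the end-word order of the first. This is one way to see why the form stops at six full stanzas.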

Visual Form
Other poetic forms with mathematical structure may be visually detected. For example, a square poem literally looks like a square and its syllable count in each line is the same as the total number of lines of the whole poem. A triangular poem similarly looks like a triangle on the page, such as the example in Growney (2006). Other poets have used more unusual visual forms, such as poems written in the shape of a decision tree, a bell curve, or a fractal. Many well-known examples of shaped poetry, like Lewis Carroll’s “The Mouse’s Tale,” have mathematics built into their form. Though shaped like a mouse’s tail, the poem might also lead us to imagine the mathematician Carroll playing with the idea of oscillating convergence. Visual poetry engages with mathematics in a wide range of ways. We will have more to say about this in the section “Mathematical Poetry.”

Other Mathematical Concerns About Poetry David Alpaugh in his quantitatively well-stocked essay “The New Math of Poetry” (Alpaugh 2010) bemoans the proliferation of poetry outlets and publications and cites the following numbers: “the total number of literary journals publishing poetry 50 years ago [was] 300 to 400. Today the online writers’ Duotrope’s Digest lists more than 2,000 ‘current markets that accept poetry,’ with the number growing at a rate of more than one new journal per day in the past six months.” He also complains that too many people are writing “poetry” (his quotes) and this is not at all a good thing. The tone of Alpaugh may feel elitist to some (and remind others of Joseph Epstein’s “Who Killed Poetry?” (Epstein 1988) in its tenor if not in its main target). However, one might sympathize with at least one point Alpaugh makes that perhaps some amazing poetry is “buried in the overgrowth” and we may never know. The situation in music might provide some comfort. Since music production became something folks with some training, interest, and minimal gear could get into, there is a lot more music out there, and there is freedom in that. Perhaps the democratization of poetry is also not a bad thing after all! Another author who should be mentioned here is Jon Wesick. In his article “Moneyball for Creative Writers: A Statistical Strategy for Publishing Your Work” (Wesick 2017), Wesick also uses mathematics to the benefit of his argument and his argument is inherently mathematical. Wesick develops a statistical model of his

poetry submissions and offers ways to use various “numerical metrics to maximize publications.” Wesick’s concerns are completely orthogonal or even antithetical to Alpaugh’s, but they too fall naturally under the label of “mathematics of poetry.” Both Alpaugh and Wesick write about issues that are in some sense external to the internal affairs of poetry, but they are still very much relevant to the world of poetry at large.

Poetry of Mathematics It is not too difficult to find mathematicians waxing poetic about mathematics. Indeed, form, an essential feature of poetry, is a significant component of mathematics today. Theorems are written in a formal style, and proofs are stylized, too, at least to an extent. And these styles evolve; in other words, they are cultural artifacts. In these ways, then, mathematics resembles poetry in form and style, as well as in its situatedness within a human (historical and cultural) context. As we said above, however, our goal is to try to tease out exactly what beyond metaphor we can find in such assertions that there is poetry in mathematics. Growney (2009) tries to explain “what poetry is found in mathematics.” She tells us that good mathematics is poetry itself, just like Henle (2011) argues that some mathematics is not just poetic but is itself poetry. We will come back to these assertions in the next section, on poetic mathematics. For now, however, it is perhaps more productive to think about the phrase “poetry of mathematics” as a counterpart of the phrase “mathematics of poetry.” If the latter is about how we can see mathematics in poetry, or how mathematics can organize poetry, then the former should be about how we can see poetry in mathematics, or how poetry can organize mathematics. Or can we use this phrase to talk about how we can do poetry about mathematics? Here boundaries begin to blur. Poetry in mathematics blends into the next section, on poetic mathematics. Poetry about mathematics ushers us into the section after that, about mathematical poetry. And poetry organizing mathematics might take us back to Tartaglia, and perhaps even further back, to Sanskrit mathematicians and astronomers, but is today seen mainly in school mathematics. We will visit this theme in our penultimate section, on the educational possibilities of exploring the intersections and interactions of mathematics and poetry.

Poetic Mathematics To understand what the phrase “poetic mathematics” might mean, we might ask around. Surely many mathematicians will come up with examples of “beautiful mathematics” when asked: G.H. Hardy, for example, would likely offer the proof of the irrationality of the square root of 2 or that of the infinitude of primes. It is indeed

true that these proofs are beautiful and can be appreciated as such by many people. Indeed, recent research shows that laypeople, too, and not only mathematicians, may experience mathematics aesthetically (Johnson and Steinerberger 2019). And there is a whole body of still-growing literature on just what makes beautiful math beautiful; see the special issue of the Journal of Humanistic Mathematics on the nature and experience of mathematical beauty (Raman-Sundström et al. 2016) as a possible entry point to the contemporary conversation on the topic. But not all beautiful things are poetry, and poetry is not always beautiful. This means that the question “what makes some mathematics specifically poetic rather than simply, or more broadly, beautiful?” still remains pertinent. At this juncture, a recent paper titled “Is (Some) Mathematics Poetry?” (Henle 2011) becomes quite relevant. Here James Henle explores and opens up a notion mentioned in passing in Growney (2009). What if some mathematics is not only poetic but actually poetry? Can we see this? For this of course, one might imagine we would need a solid theory of poetry. Henle decides instead “to create works which act on the reader as poetry acts”:

One reads a poem and understands one level immediately, a story, an image. But beyond that level, one senses deeper truths, thoughts not stated but hinted or implied by the words. A poem can kindle a sense of wonder – “can it be true that . . . ?” And of course the very structure of the poem can have aesthetic qualities that, in addition to pleasing, can enhance the message, the thoughts, the truths of the poem. (Henle 2011)

As his first concrete example, Henle offers us (a finite excerpt from) Pascal’s Triangle (known in earlier times as the Meru-prastaara, or Yang Hui’s triangle). He writes:

The poem is, of course, symmetric. Countless millions have seen the lines. They understand how it is put together. They understand how it will grow. They quickly see patterns. They wonder if the patterns they see will persist. They wonder what further patterns can be found. More individuals have found pleasure and inspiration in this work than in most of the verses written in English or any other language. (Henle 2011)

Henle’s assertions are arguable, but for many mathematicians, they resonate. Henle presents six of his own creations in his article and invites others to try their hands at this type of poetry. Sara Katz takes this on in Katz (2017); see also Emmons (2010). Maslanka (2010) coins the phrase “pure maths poetry” for this type of poetry. This leads us naturally to a broader discussion of mathematical poetry.

Mathematical Poetry In a review of Glaz (2016), Caleb Emmons offers a taxonomy of mathematical poetry (Emmons 2017). According to Emmons, there are three main (nondisjoint) categories of mathematical poetry:

1. Poems that have math as a major subject.
2. Poems that apply mathematical language or imagery to something nonmathematical.
3. Poems whose structure is inspired or informed in some way by math.

We know today that ancient Mesopotamians wrote poetry about mathematics (Glaz 2019). So mathematical poetry in the sense of (1) is nothing new. Today several serial outlets continue to publish this kind of poetry; see the last section for some suggestions. Growney (2006) gives a half-dozen examples of each of the last two categories, including many by poets who are quite distinguished (e.g., former US Poets Laureate) and/or famous. In particular she lists “Geometry” by Rita Dove, “Figures of Thought” by Howard Nemerov, “Six Significant Landscapes” by Wallace Stevens, “Pi” by Wislawa Szymborska, and “Gravity and Levity” by Bin Ramke, as poems with mathematical imagery. Other well-known examples in this category include Carl Sandburg’s “Arithmetic,” Linda Pastan’s “Algebra,” Edna St. Vincent Millay’s “Euclid Alone Has Looked on Beauty Bare,” and Yehuda Amichai’s “Through Two Points Only a Straight Line Can Pass: Theorem in Geometry.” Mathematical poetry of this style even won a Forward Prize in 2013; see Akbar (2013) and http://www.forwardartsfoundation.org/forward-prizes-for-poetry/forward-alumni/. In the “mathematical structure” category of poems, Growney (2006) gives examples of poems where the number of syllables (or, in some cases, number of words) per line conforms to a mathematical pattern/rule (e.g., haiku, limerick, or fib) and/or to an intended overall visual or geometric form (e.g., Lewis Carroll’s “The Mouse’s Tale,” a square poem, a triangle poem, etc.). We explored these poetic forms in our section on the mathematics of poetry.

Maslanka’s classification of mathematical poetry (Maslanka 2010) goes into some detail on mathematical visual poetry. In particular, he distinguishes between mathematical visual poetry and visual mathematical poetry and in between inserts a category he labels “equational poetry.” This latter term describes a specialized type of visual poetry where words serve as metaphors within equations and all works according to explicitly mathematical rules. The other types of visual poetry differ depending on how much or how little they engage with lexical units and mathematical rules. Maslanka’s mathematical poetry blog (available at http://mathematicalpoetry.blogspot.com/) is a good place to locate various examples of all three of these types of visual poems. Additional examples of visual math poetry may be found in some of Michael Naylor’s work (see Glaz (2016)), as well as in Bob Grumman’s (Grumman 2014). While exploring Emmons’s third category, we might ask what kinds of mathematical processes might give rise to poetry. Emmons, a prolific mathematical poet himself, offers in Emmons (2013) a mechanism to create mathematical poetry. Described tongue in cheek as a step-by-step algorithm, this might lead one to wonder if other, more mathematical, processes might exist. The avant-garde

composer John Cage is known to have written poetry as well as music using the randomness of the I Ching. The members of the French Oulipo, some of whom were themselves mathematicians, also used mathematical processes in creating their works. Today computers also write poems; a possible starting point to explore the world of computer-generated poetry might be http://botpoet.com/what-is-computerpoetry/. We believe, however, that poetry, as well as mathematics, is an inherently human endeavor, and we are most interested here in what mathematical poetry human poets create.

Educational Possibilities Poetry can be used in courses on mathematics or statistics (indeed, across the curriculum) as a memorable and creative way to motivate students and engage content. One way this can be done is to have a mini-lesson developed around a particular poem with the desired mathematical content. For example, the second author wrote ready-to-use classroom discussion questions to accompany “Diameter” (Lesser 2017), a poem written about a circle’s diameter, in the form of a diameter, and in the meter of diameter! The first author shared her experiences on introducing relevant examples of mathematical poetry into a college course with a liberal arts/humanistic slant (Karaali 2014). A particularly great month to feature mathematical poetry is April, which is not only both “National Poetry Month” and “Mathematics and Statistics Awareness Month,” but is also an ideal month to showcase or cap off what has been learned over the course of the school year that is starting to wind down. Another approach is having students write mathematical poetry (which the first author does near the end of her course, after students have read and discussed assigned selected works by others). The second author has given students in his statistical literacy course an extra-credit assignment (due in the last month of the course) in which students may create a poem, song, or video connecting to course content. A 5-min video-poster about the assignment (including the guidelines given to students) is available at Lesser (2018). Secondary school and college instructors can also encourage their students to submit their poems to national contests offered by organizations such as CAUSE (https://www.causeweb.org/cause/a-mu-sing/) or the AMS (http://www.ams.org/programs/students/math-poetry). The latter’s inaugural contest attracted 110 entries in categories for middle school, high school, and college students and featured its winners at the Joint Mathematics Meeting 2019. 
Daisy Zhang-Negrerie, a high-school teacher at Concordia International School Shanghai, encouraged her calculus students to write poetry during the 2014–2015 school year; this resulted in a lovely book (Zhang-Negrerie 2015). Mathematical poetry assignments have also been used in elementary schools, such as the haiku and limerick assignment in a fifth-grade classroom (Ayebo et al. 2010). The compact phrasing required by writing a poem (versus textbook-style prose) arguably makes the student more deeply engage with concepts to get to the essence, thus

consolidating existing knowledge as well as generating new insights. For those who are concerned about assessing a student-created poem, see the criteria in Table 3 of Connor-Greene et al. (2005) and the strategies of LaBonty and Danielson (2005), Keller and Davidson (2001), and Bay-Williams (2005). Many of the preceding examples demonstrate that the educational uses of poetry are by no means limited to poetic jingles to memorize mathematical facts or formulas (though it is certainly possible to rephrase pieces of mathematics to give them memorable rhyme, such as changing Euler’s formula V − E + F = 2 into “F plus V equals 2 plus E,” or stating the four rigid motions as “turn, slide, flip and glide”). We see that poetry can also be used educationally to introduce concepts or terms, reinforce thinking processes, humanize mathematics, or connect to history or the real world (Lesser 2014). Several scholars have explored the use of poetry in the mathematics classroom, and the converse collaboration, the use of mathematics in the poetry classroom, has sometimes become a topic of interest as well. In the K-12 context, we can list Altieri (2005), Ayebo et al. (2010), Bay-Williams (2005), Curcio et al. (1995), Hammett (2007), Keller and Davidson (2001), Long (2001), and Whitin and Piwko (2008). In the context of the college classroom, readers might find the works of Glaz (2010), Glaz and Liang (2009), Karaali (2014), and Lesser (2014) of interest.

Further Reading and Making Connections An early account of the relationship between mathematics and poetry can be found in Buchanan’s book (Buchanan 1929/1962), which promises “fun, stimulating new ideas, and a delightfully Alice-in-Wonderland atmosphere. Poetry and Mathematics are treated as of equal importance, as two very successful attempts to deal with ideas” (Barber 1930). An interesting personal account of how poetry and mathematics revolve around each other and create a rich life may be found in Grosholz (2018). A more systematically mathematical approach is taken in Birken and Coon (2008). Also see the articles in the special issue of the Journal of Mathematics and the Arts on Poetry and Mathematics (Glaz 2014). Outlets regularly publishing mathematical poetry include Journal of Humanistic Mathematics and The Mathematical Intelligencer. Other outlets include Math Horizons, The Pi Mu Epsilon Journal, and JoAnne Growney’s blog Intersections – Poetry with Mathematics, available at https://poetrywithmathematics.blogspot.com. There are now many books and collections of mathematical poetry. Here is a selection of anthologies: Glaz and Growney (2008), Growney (2001), Fadiman (1958/1997), Plotz (1955), Robson and Wimp (1980), and the Bridges Conference poetry anthologies edited by Sarah Glaz (2013, 2016, 2018). For those who want to connect with readers and writers of mathematical poetry, there is a Facebook group Mathematical Poets: https://www.facebook.com/groups/MathematicalPoets/. Other events to look for include poetry readings at the annual Joint Mathematics Meetings and Bridges Mathematics and Art Conferences.

References
Akbar A (2013) Sexually charged poem about learning metric wins Forward Prize, Independent 1 Oct 2013. Available at https://www.independent.co.uk/arts-entertainment/books/news/sexuallycharged-poem-about-learning-metric-wins-forward-prize-8852157.html
Alpaugh D (2010) The new math of poetry, The Chronicle of Higher Education. Available at https://www.chronicle.com/article/The-New-Math-of-Poetry/64249/
Altieri J (2005) Creating poetry: reinforcing mathematical concepts. Teach Child Math 12(1):18–23
Ayebo A, Wiest LR, Sherard H (2010) Poematics: exploring math through poetry. Math Teach Middle School 15(7):378–381
Barber HC (1930) Poetry and mathematics by Scott Buchanan. Math Teach 23(6):396
Bay-Williams J (2005) Poetry in motion: using Shel Silverstein’s works to engage students in mathematics. Math Teach Middle School 10(8):386–393
Birken M, Coon AC (2008) Discovering patterns in mathematics and poetry. Rodopi, Kenilworth
Buchanan S (1929/1962) Poetry and mathematics. The John Day Company, New York. Republished in 1962, in Philadelphia, by J. B. Lippincott
Cobb GW, Moore DS (1997) Mathematics, statistics and teaching. Am Math Mon 104(9):801–823
Connor-Greene PA, Young A, Paul CP, Murdoch JW (2005) Poetry: it’s not just for English class anymore. Teach Psychol 32(4):215–221
Curcio FR, Zarnowski M, Vigliarolo S (1995) Mathematics and poetry: problem solving in context. Teach Child Math 1(6):370–374
Emmons C (2007) S|{e,s,t,i,n,a}|. Math Intell 29(1):7
Emmons C (2010) Snowflake. Math Intell 32(2):5
Emmons C (2013) How to cook up a math poem in n easy steps. J Humanist Math 3(1):108–114. http://scholarship.claremont.edu/jhm/vol3/iss1/9
Emmons C (2017) Bridges 2016 poetry anthology. J Math Arts 11(1):62–66
Epstein J (1988) Who Killed Poetry?, Commentary. Available at https://www.commentarymagazine.com/articles/who-killed-poetry/
Fadiman C (1958/1997) Fantasia mathematica, an anthology. Simon & Schuster. Republished by Springer, New York in 1997
Glaz S (2010) The enigmatic number e: a history in verse and its uses in the mathematics classroom. Convergence. https://doi.org/10.4169/loci003482
Glaz S (ed) (2013) Bridges 2013 poetry anthology. Tessellations Publishing, Phoenix, AZ
Glaz S (ed) (2014) Journal of Mathematics and the Arts. Special issue on poetry and mathematics 8(1–2). https://www.tandfonline.com/toc/tmaa20/8/1-2
Glaz S (ed) (2016) Bridges 2016 poetry anthology. Tessellations Publishing, Phoenix, AZ
Glaz S (ed) (2018) Bridges 2018 poetry anthology. Tessellations Publishing, Phoenix, AZ
Glaz S (2019) Enheduanna: princess, priestess, poet, and mathematician. Math Intell. https://doi.org/10.1007/s00283-019-09914-7
Glaz S, Growney JA (eds) (2008) Strange attractors: poems of love and mathematics. A K Peters, Wellesley
Glaz S, Liang S (2009) Modeling with poetry in an introductory college algebra course and beyond. J Math Arts 3:123–133
Grosholz ER (2018) Great circles: the transits of mathematics and poetry. Springer, Cham
Growney JA (1992) Are mathematics and poetry fundamentally similar? Am Math Mon 99(2):131
Growney J (ed) (2001) Numbers and faces. Humanistic Mathematics Network
Growney JA (2006) Mathematics in poetry. J Online Math Appl 6. Available at https://www.maa.org/sites/default/files/images/upload_library/4/vol6/Growney/MathPoetry.html
Growney JA (2008) Mathematics influences poetry. J Math Arts 2(1):1–7
Growney JA (2009) What poetry is found in mathematics? What possibilities exist for its translation? Math Intell 31(4):12–14
Grumman B (2014) Visiomathematical poetry, the triply-expressive poetry. J Math Arts 8:31–37

Hammett JE (2007) Turning the mathematics classroom into an intellectual playground through poetry. Math Teach Middle School 13(4):195–198
Henle J (2011) Is (some) mathematics poetry? J Humanist Math 1(1):94–100. http://scholarship.claremont.edu/jhm/vol1/iss1/7
Johnson SGB, Steinerberger S (2019) Intuitions about mathematical beauty: a case study in the aesthetic experience of ideas. Cognition 189:242–259
Karaali G (2014) Can zombies write mathematical poetry? Mathematical poetry as a model for humanistic mathematics. J Math Arts 8(1–2):38–45
Katz SR (2017) A math poem. J Humanist Math 7(2):415–415. https://scholarship.claremont.edu/jhm/vol7/iss2/24
Keller R, Davidson D (2001) The math poem: incorporating mathematical terms in poetry. Math Teach 94(5):342–347
LaBonty J, Danielson KE (2005) Writing poems to gain deeper meaning in science. Middle Sch J 36(5):30–36
Lesser LM (2014) Mathematical lyrics: noteworthy endeavours in education. J Math Arts 8(1–2):46–53
Lesser LM (2017) Diameter [poem and discussion questions]. Tex Math Teach 63(2):14, 31
Lesser LM (2018) Student-created songs in statistics class. Presentation at 4th electronic conference on teaching statistics. https://www.causeweb.org/cause/ecots/ecots18/posters/4-04
Long VM (2001) Polygons to poetry. Math Teach Middle School 6(8):436–438
Marcus S (1998) Mathematics and poetry: discrepancies within similarities. In: Sarhangi R (ed) Bridges: mathematical connections in art, music, and science. Bridges Conference, Southwestern College, Winfield, Kansas. Available online at http://archive.bridgesmathart.org/1998/bridges1998-175.html
Maslanka K (2010) Five types of mathematical poetry, blog post dated 14 June 2010. http://mathematicalpoetry.blogspot.com/2010/06/4-types-of-mathematical-poems.html
Pinter M (2014) How do I love thee? Let me count the ways for syllabic variation in certain poetic forms. J Humanist Math 4(2):94–100. http://scholarship.claremont.edu/jhm/vol4/iss2/10
Plotz H (1955) Imagination’s other place: poems of science and mathematics. Thomas Y Crowell Company, New York
Raman-Sundström M, Öhman L-D, Sinclair N (eds) (2016) Journal of Humanistic Mathematics. Special issue on the nature and experience of mathematical beauty 6(1). https://scholarship.claremont.edu/jhm/vol6/iss1/
Robson E, Wimp J (eds) (1980) Against Infinity. Primary Press, Parker Ford
Sanders ST (1942) Mathematics and poetry. Natl Math Mag 17(3):98
Steinsaltz A (2016) Two Languages and the Chasm between Them, Standpoint, 22 Mar 2016. https://standpointmag.co.uk/issues/april-2016/text-april-2016-rabbi-adin-steinsaltz-twolanguages-science-poetry-religion/
Wesick J (2017) Moneyball for creative writers: a statistical strategy for publishing your work. J Humanist Math 7(1):155–171. https://scholarship.claremont.edu/jhm/vol7/iss1/13
Whitin DJ, Piwko M (2008) Mathematics and poetry: the right connection. YC Young Child 63(2):34–39
Zeki S, Romaya JP, Benincasa DMT, Atiyah MF (2014) The experience of mathematical beauty and its neural correlates. Front Hum Neurosci 8:68. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3923150/
Zhang-Negrerie D (2015) From tangency to truth: an intersection of math, poetry, and art, CreateSpace Independent Publishing Platform

“Elegance in Design”: Mathematics and the Works of Ted Chiang

33

Jessica K. Sklar

Contents
Introduction
Direction
Decryption
Division
Determination
Writing Like a Heptapod: Nonlinear Semasiography
Thinking Like a Heptapod: Variational Principles
Premembering: Nonlinear Orthography and Nonlinear Time
Story of Her Life
Conclusion
References

Abstract Author Ted Chiang masterfully weaves higher-level mathematical concepts into the fabric of his science fantasy narratives: concepts such as the compactification of the real line, linear orderings on sets, public-key encryption, and Fermat’s principle of least time play profound – if not always explicit – roles in his stories. His unique and compelling narratives have broad emotional appeal: the movie Arrival (2016), adapted from his novella “Story of Your Life” (2016), finished in third place at the box office during its opening weekend. But beyond this, Chiang’s stories are instructive: his readers find themselves learning about mathematics – as well as about linguistics, physics, computer science, and Babylonian cosmology – without even trying. This chapter discusses some of the ways in which Chiang uses mathematics to illuminate nonmathematical

J. K. Sklar, Mathematics Department, Pacific Lutheran University, Tacoma, WA, USA
© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_47

concepts; nonmathematical concepts to communicate mathematical ideas; and connections between the two to provide his readers – especially those with some mathematical background or inclination – with literary “aha moments.”

Keywords Ted Chiang · Stories of Your Life and Others · “Tower of Babylon” · Alexandroff one-point compactification of the real number line · Topological spaces · Compactness · Infinity · Real projective plane · Edwin Abbott Abbott · Flatland · “Seventy-Two Letters” · Victorian England · Victorian era · Euonym · Kabbalah · Golem · Shem · Nomenclature · Public-key encryption/decryption · RSA (Rivest-Shamir-Adleman) cryptography · Integers · Factor · Prime number · Relatively prime · Congruent modulo n · φ(n) · Public key · Private key · Bézout’s Identity · Division Algorithm · Euler’s Theorem · “Division by Zero” · Formalism · Kurt Gödel · Gödel numbering · Alex Kasman · Mathematical Fiction (website) · “Story of Your Life” · Arrival · Semasiographic system · Semagrams · Logograms · Variational principle · Fermat’s principle of least time · Calculus of variations · Derivative · Functional

Introduction Around the turn of the twenty-first century, mathematics became increasingly visible in popular culture. Eschewing the notion that mathematics is an esoteric discipline – the bailiwick of gray-bearded philosophers, wunderkinds, and madmen – television series and films were now letting audience members in on the secret that they, too, could do mathematics. The tagline of the 2005–2010 television show NUMB3RS – “We all use math every day; to predict weather, to tell time, to handle money. Math is . . . using your mind to solve the biggest mysteries we know” – suggests that the practice of mathematics is both of profound importance and of the people. Films such as Good Will Hunting (1997), Proof (2005), The Imitation Game (2014), and Hidden Figures (2016) feature mathematicians who are blue-collar, female, gay, and black. Yet as refreshing as it is to see these varied depictions of mathematicians, in popular media mathematics itself is often more or less used as a prop (something that looks cool on a whiteboard) or as a plot device – it solves a crime, allows aliens to communicate with people, or wins college students money at a casino. However, some works, such as the science fantasy short stories of Ted Chiang, are quite deeply informed by mathematics. In a note on his short story “Division by Zero,” Chiang writes: “One of the things we admire most in fiction is an ending that is surprising, yet inevitable. This is also what characterizes elegance in design: the invention that’s clever yet seems totally natural” (Chiang 277). Perhaps the most common manifestation of this type of elegance is a plot twist, such as that found at the end of Shirley Jackson’s “The Lottery” (1948). But there is another,

more subtle manifestation of it, in which a second narrative or, more accurately, a second interpretation of what is being said – that is, a subtext – plays out in a work’s shadows. In this case, the two narratives serve as scaffolds for one another, providing the careful reader, viewer, or listener with enhanced understanding – even epiphany – when the second narrative finally emerges into the light. One finds this type of elegance not only in poetry, where metaphor and analogy frequently take center stage, but also in film and prose; in particular, one finds it – with a mathematical bent – in Chiang’s work. In his collection Stories of Your Life and Others, Chiang uses mathematics to illuminate nonmathematical concepts; nonmathematical concepts to communicate mathematical ideas; and connections between the two to provide his readers – especially those with some mathematical background or inclination – with literary “aha moments.” In this chapter, we explore how this plays out in four of the collection’s tales.

Direction Chiang’s collection opens with “Tower of Babylon,” a quasi-biblical travel narrative whose protagonist, Hillalum, travels not across the earth, but up. Hillalum, an Elamite miner, is summoned to climb to the top of the Tower of Babel – still under construction – and “‘dig through the vault of heaven’” (Chiang 2). The story is peppered with measurements: “Were the tower to be laid down . . . it would be two days’ journey to walk from one end to the other. While the tower stands, it takes a full month and a half to climb from its base to its summit” (Chiang 1); “Four months pass between the day a brick is loaded onto a cart, and the day it is taken off to form a part of the tower” (Chiang 1); crews building the tower climb during four-day stints; the tower’s lowest platform stood “some two hundred cubits on a side and forty cubits high” (Chiang 5). Chiang focuses less on the purpose behind the tower’s creation and more on the details of the engineering required for its construction, yet in the background, the tower’s raison d’être looms: the Babylonians, in their hubris, are attempting to build “a stair that men might ascend to see the works of Yahweh, and that Yahweh might descend to see the works of men” (Chiang 6). The biblical story of the Tower of Babel, of course, suggests another motive for the tower’s creation: the builders are attempting to solidify their unity as a people. “And they said, Go to, let us build us a city and a tower, whose top may reach unto heaven; and let us make us a name, lest we be scattered abroad upon the face of the whole earth” (King James Version, Genesis 11:4). The Lord, wary of humans gaining knowledge, looks poorly upon this:

And the LORD said, Behold, the people is one, and they have all one language; and this they begin to do: and now nothing will be restrained from them, which they have imagined to do. Go to, let us go down, and there confound their language, that they may not understand one another’s speech. (King James Version, Genesis 11:6–7)

One might take a leap now and evoke the proposition – popularized in the film Contact and argued by academics Waller and Flood (2016) – that mathematics is a universal language: perhaps the people’s communal engineering of the building is as threatening to God as their “one language” and the tower’s impending encroachment upon heaven. Hillalum, uncomfortable at the thought of breaking into the firmament, tries to suppress his concern. He occupies himself with learning details of the building’s construction, and pondering celestial geometry and the science of rainfall with his colleagues; he attempts to ignore his literal approach to heaven even as he acquires knowledge that is bringing him closer to God. He rises as far as the moon, the sun, and the stars, and finally arrives at the vault that, as per the Babylonian model of the universe, holds back the waters of heaven. He is profoundly unsettled:

All of Hillalum’s senses were disoriented by the sight of [the vault]. Sometimes when he looked at the vault, he felt as if the world had flipped around somehow, and if he lost his footing he would fall upward to meet it. . . . Too, there were moments when . . . for an instant it seemed that there was no up and no down, and his body did not know which way it was drawn. (Chiang 17)

Despite his apprehension, Hillalum spends years tunneling through granite with his peers until – inevitably, it seems – they break through to a reservoir and their tunnel floods. Hillalum is overtaken by the water, becomes wildly disoriented while engulfed in the deluge, and nearly drowns. Eventually he is spewed up into a cavern and crawls out a passage, onto . . . earth, a short distance from the tower’s base. And this is where a mathematician experiences a little thrill: Hillalum has encountered the Alexandroff one-point compactification of the real number line. Interested readers can learn about topological spaces and compactness in texts such as James Munkres’ Topology, but one can informally describe this compactification as follows: take the real number line and shrink it to, say, the open interval (−1, 1). Then bend that interval into a circle that is missing a point. Finally, attach the “ends” of the interval together at a point labeled “∞.” (This description ignores a great many mathematical nuances of this compactification, including the nonexistence of “ends” of the interval, but can help illuminate the concept.) Hillalum, starting at point 0 (the base of the tower), has gone around the circle toward infinity (in the direction of one’s choosing), reached and passed through infinity (i.e., heaven), and completed essentially a revolution around the circle, returning to a location close to the one at which he started. (See Fig. 1.) Considering he has arrived at and passed through infinity, his disorientation upon nearing the vault is rather understandable. Completing his revolution, Hillalum has two epiphanies: one mathematical and one spiritual. (This is not the only time such a pairing appears in Chiang’s work.) First, he asks himself how it can be that such distant places as earth and heaven can touch, and derives a mathematical model of the situation: When rolled upon a tablet of soft clay, [a seal cylinder] left an imprint that formed a picture. 
Two figures might appear at opposite ends of the tablet, though they stood side by side on the surface of the cylinder. All the world was as such a cylinder. Men imagined heaven and earth as being at the ends of a tablet, with sky and stars stretched between; yet the world was wrapped around in some fantastic way so that heaven and earth touched. (Chiang 28)
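The informal description of the one-point compactification given above can be made concrete with a small sketch. The function names, the particular squashing map x / (1 + |x|), and the choice to put 0 at the bottom of the circle and ∞ at the top are illustrative assumptions, not part of the original discussion; like the informal description, the sketch ignores the topological details.

```python
import math

def squash(x):
    """Shrink the real line onto the open interval (-1, 1)."""
    return x / (1.0 + abs(x))

def to_circle(x):
    """Place a real number, or either infinity, on the unit circle.

    Assumed layout: 0 sits at the bottom, (0, -1); the single added
    point "infinity" sits at the top, (0, 1).
    """
    if math.isinf(x):
        return (0.0, 1.0)          # +inf and -inf are the same point
    angle = math.pi * squash(x)    # lies strictly between -pi and pi
    return (math.sin(angle), -math.cos(angle))

# Hillalum's trip: up from 0, through infinity, and back toward 0
# from the other side of the circle.
journey = [0.0, 1.0, 1e6, math.inf, -1e6, -1.0, -1e-3]
points = [to_circle(x) for x in journey]
```

Very large positive and very large negative numbers land next to the same point at the top of the circle, which is the sense in which the two "ends" of the line are attached at ∞.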

33 “Elegance in Design”: Mathematics and the Works of Ted Chiang

Fig. 1 The Alexandroff one-point compactification of the real number line (a circle with the points ∞ and 0 marked)

Fig. 2 The fundamental polygon of the real projective plane. To form this projective plane, “glue” the vertical edges together and the horizontal edges together, identifying the arrows. Note that this surface is not embeddable in three-dimensional space

The circle in Fig. 1 is a cross section of this cylinder. The mathematically inclined reader might even consider the possibility that Hillalum has traveled in the real projective plane (see Fig. 2). This manifold’s status as a non-orientable surface might have contributed to Hillalum’s extreme disorientation upon both nearing the vault and being consumed by the flood: “With only blackness around him, he once again felt that horrible vertigo that he had experienced when first approaching the vault: he could not distinguish any directions, not even up or down” (Chiang 25). Next, he realizes that the construction of the tower was itself a form of worship: through their endeavor, men would glimpse the unimaginable artistry in Yahweh’s work, in seeing how ingeniously the world had been constructed. By this construction, Yahweh’s work was indicated, and Yahweh’s work was concealed. Thus men would know their place. (Chiang 28)

Hillalum approached the infinite, disorientedly passed through it, and returned to spread the gospel: “He would send word to those on the tower. He would tell them about the shape of the world” (Chiang 28). Hillalum’s insight is simultaneously devotional and topological: like A. Square in Edwin A. Abbott’s novella Flatland (1884), who is taken by a sphere out of his planar universe and introduced to the miraculous three-dimensional world beyond, he has traveled into the mathematical unknown and returned to evangelize. Galileo was forced to recant his scientific findings, and A. Square was imprisoned for suggesting that third – and even fourth! – dimensions exist. One wonders what will become of Hillalum and his apocryphal topological testimony.

Decryption

The novella “Seventy-Two Letters,” one of the most disorienting of Chiang’s tales, takes place in an alternative Victorian England, where unicorn horns are displayed in museums alongside moose antlers, and automata are animated through the use of euonyms, strings of seventy-two Hebrew letters that cause “the latent potentialities of [the name and the object] to be realized” (Chiang 149–150). The true Victorian era was a crucible in which scientists and spiritualists both collaborated and dueled; Chiang transmutes the era’s scientific and spiritual movements and weds them to the kabbalistic tradition in this tour de force, which incorporates everything from thermodynamics to class politics to the concept of self-replicating machines. The reader meets the story’s protagonist, Robert, when he is a young boy, experimenting with the automation of his toys: his clay doll and porcelain horse are animated, like other objects in Robert’s world, via their euonyms, which are inscribed on parchment scraps that are slipped inside the objects. Both industry and personal protection rely on euonyms: There had long been two classes of names: those for animating a body, and those functioning as amulets. Health amulets were worn as protection from injury or illness, while others rendered a house resistant to fire or a ship less likely to founder at sea. (Chiang 153)

Industrial automata push trolleys of ore in mines, act as messengers, and crank manufacturing drive wheels. Advances in technology, as well as in the preservation of human life, therefore rely upon advances in the discovery and parsing of euonyms. The study of euonyms before Robert’s time had traditionally been the domain of mystics and kabbalists. Indeed, each euonym functions as a shem – a name of God (one version of which contains 72 letters) that is used to animate a golem. In Jewish folklore, a golem is an inanimate creature, usually constructed of clay or mud, that is animated, like one of Chiang’s automata, through the inscription of a shem either on its body or on a piece of paper that is inserted into it. Chiang directly connects his automata to golems when he notes that the “earliest use [of a particular component of a euonym] was claimed to have occurred in biblical times, when Joseph’s brothers created a female golem” (Chiang 177).

A DETOUR: MATHEMATICS AND GOLEMS

Chiang’s “Seventy-Two Letters” is not the only work that connects mathematics and the golem legend. In “Truth by the Numbers: Mysticism and Madness in Darren Aronofsky’s π,” Laurie A. Finke and Martin B. Shichtman discuss connections in π between the kabbalah, the creation of a golem using a “sacred word” (Sklar & Sklar 274), and predictions of stock market behavior. In π, protagonist Max (Sean Gullette) attempts to lay bare mathematical secrets; in “Seventy-Two Letters,” Robert seeks to lay bare linguistic ones. But both men are, fundamentally, trying to understand phenomena through the lens of what Jacques Derrida would term signifiers. And truly, after all, what is the study of mathematics but that?

THE DETOUR ENDS

However, times are changing: nomenclature, the study of these euonyms, is increasingly being done by secular researchers – Robert and his boyhood friend purchase a book that “inform[s] them that nomenclators no longer spoke in terms of God or the divine name” (Chiang 149). Robert goes on to study nomenclature at Trinity College, reading kabbalistic texts and “alchemical treatises that placed the techniques of alphabetic manipulation in a broader philosophical and mathematical context” (Chiang 152). Chiang elaborates on Robert’s euonymic research, which is certainly linguistic but also, at heart, mathematical. Even the language Chiang uses is mathematical in nature (emphases added): He learned that every name was a combination of several epithets, each designating a specific trait or capability. Epithets were generated by compiling all the words that described the desired trait . . . By selectively substituting and permuting letters, one could distill from those words their common essence, which was the epithet for that trait. . . . The entire process relied on intuition as much as formulae; the ability to choose the best letter permutations was an unteachable skill. He studied the modern techniques of nominal integration and factorization, the former being the means by which a set of epithets . . . were commingled into the seemingly random string of letters that made up a name, the latter by which a name was decomposed into its constituent epithets. (Chiang 152)

While the emphasized words certainly appear in nonmathematical contexts, together they infuse the passage with an algebraic and number theoretic flavor. Other portions of the story also contain mathematical language: for instance, Robert’s research focuses on the (emphases added) “details of permutation and combination” (Chiang 180) – a clear nod to the lexicon of combinatorics – and at one point Robert realizes that a kabbalistic euonymic researcher’s achievement is, like an ingenious proof, “elegant” (Chiang 198). Further, descriptions of euonymic research, such as the following, could easily, instead, be describing mathematical research: [Robert] worked in collaboration with the other nomenclators in the group, and between them they divided up the vast tree of nominal possibilities, assigning branches for investigation, pruning away those that proved unfruitful, cultivating those that seemed most productive. (Chiang 177)

Robert is, in fact, engaged in the study of decryption and encryption, doing work analogous to that of modern-day cryptographers. (Chiang gives a sly nod to this: Robert’s employer is “Coade Manufactory.”) In particular, one can connect the work of nomenclators to that of mathematicians and computer scientists attempting to make public-key encryption – a commonly used system for encrypting messages that can only be decrypted by their intended recipients – obsolete.

A MATHEMATICAL EXCURSION

Public-key encryption involves the use of two “keys,” a public key and a private one. For example, suppose Roberta’s friend Leona wants to encrypt a secret message for Roberta using RSA (Rivest-Shamir-Adleman) cryptography. First, recall the following:

• The integers are the real numbers 1, 2, 3, . . ., their negatives, and 0. The positive integers are 1, 2, 3, . . .
• An integer m is a factor of an integer n if there exists an integer k such that $n = km$.
• A prime number is an integer $p > 1$ whose only positive factors are 1 and itself.
• Integer a is said to be relatively prime to integer b if no integer greater than 1 is a factor of both a and b.
• Given a positive integer n, two integers a and b are said to be congruent modulo n if n is a factor of $a - b$. If a and b are congruent modulo n, we may write $a \equiv_n b$.
• Given a positive integer n, $\varphi(n)$ denotes the number of positive integers that are less than n and relatively prime to n.

In one implementation of RSA cryptography, the processes of message encryption and decryption proceed as follows. (For a more thorough description of RSA cryptography, see Section 7.2 in Judson 2018.)

1. Roberta chooses two distinct, large prime numbers, p and q, and lets n be their product: that is, $n = pq$.
2. Since p and q are large primes, there is some integer e with $1 < e < \varphi(n)$ such that e and $\varphi(n)$ are relatively prime. The pair of numbers $(n, e)$ is the cryptosystem’s public key; this key, used for encryption, is shared with Leona. Note that Leona knows only n and e: she does not know p, q, or $\varphi(n)$.
3. Since e and $\varphi(n)$ are relatively prime, it follows from Bézout’s identity (see Theorem 2.10 in Judson 2018) that there exists a positive integer d with $ed \equiv_{\varphi(n)} 1$. The number d is the cryptosystem’s private key, used for decryption. Only Roberta knows this private key.
4. To send a message to Roberta, Leona converts her plaintext message into a positive integer $m < n$ that’s relatively prime to n, using a reversible method upon which she agrees with Roberta. By the division algorithm (see Theorem 2.9 in Judson 2018), there exists a unique positive integer c with $c < n$ such that $c \equiv_n m^e$. Leona computes c – the encrypted message – and sends it to Roberta.
5. To decrypt c into the original message, m, Roberta computes the unique positive integer $m'$ such that $m' < n$ and $m' \equiv_n c^d$ (the existence of $m'$ is again guaranteed by the division algorithm). Since $m < n$ and $m' < n$, if $m' \equiv_n m$, then a straightforward proof will yield that $m' = m$. Now, $\equiv_n$ (see, for instance, Theorem 2.1 in Niven et al. 1991) is a transitive relation: that is, if x, y, and z are integers with $x \equiv_n y$ and $y \equiv_n z$, then $x \equiv_n z$. Since $m' \equiv_n c^d$, it thus suffices to prove $c^d \equiv_n m$. Note that since $ed \equiv_{\varphi(n)} 1$, there exists an integer k such that $ed - 1 = k\varphi(n)$, that is, such that $ed = k\varphi(n) + 1$. Since $c \equiv_n m^e$, it follows that $c^d \equiv_n (m^e)^d$ (see, again, Theorem 2.1 in Niven et al. 1991). Thus, $c^d \equiv_n (m^e)^d = m^{ed} = m^{k\varphi(n)+1} = m(m^{\varphi(n)})^k$. Since m and n are relatively prime, Euler’s theorem (Theorem 6.18 in Judson 2018) yields $m^{\varphi(n)} \equiv_n 1$. So $m(m^{\varphi(n)})^k \equiv_n m(1^k) = m$. Therefore, $c^d \equiv_n m$. Thus, $m' \equiv_n m$, and so $m' = m$.
6. Roberta has successfully computed $m' = m$, and she now simply converts $m'$ into Leona’s original plaintext message.

The reason that public-key encryption remains relatively secure boils down to this: one cannot decrypt an encrypted message without knowing the system’s private key, d, and in order to find d, one must know both e and $\varphi(n)$. While the numbers n and e are shared publicly, one cannot reasonably compute $\varphi(n)$ without knowing p and q, and when p and q are very large, these primes are next to impossible to identify because there is no known efficient algorithm for reliably factoring – that is, finding the factors of – extremely large numbers. Thus, only someone holding the system’s private key is able to decrypt an encrypted message.

THE EXCURSION ENDS

If researchers discover an efficient method for factoring very large numbers, government agencies and other organizations will be able to decrypt encrypted messages. Just as it is easy to multiply two large primes together but challenging, if not impossible, to factor their product, in Chiang’s world Not every method of [euonym] integration had a matching factorization technique . . . Some names resisted refactorization, and nomenclators strove to develop new techniques to penetrate their secrets. (Chiang 152–153)
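The six RSA steps in the excursion above can be traced with a toy computation. This is a minimal sketch under stated assumptions, not a secure implementation: the primes 61 and 53, the exponent 17, the message 65, and the function names are all hypothetical choices made for illustration, and the formula φ(n) = (p − 1)(q − 1) for a product of two distinct primes is a standard fact that the excursion does not state. The three-argument `pow` with exponent −1 requires Python 3.8 or later.

```python
from math import gcd

def make_keys(p, q, e):
    """Steps 1-3: build the public key (n, e) and the private key d."""
    n = p * q
    phi = (p - 1) * (q - 1)      # phi(n) when n is a product of two distinct primes
    assert 1 < e < phi and gcd(e, phi) == 1
    d = pow(e, -1, phi)          # positive d with e*d congruent to 1 modulo phi(n)
    return (n, e), d

def encrypt(m, public_key):
    """Step 4: Leona computes c congruent to m^e modulo n."""
    n, e = public_key
    assert 0 < m < n and gcd(m, n) == 1
    return pow(m, e, n)

def decrypt(c, n, d):
    """Step 5: Roberta computes m' congruent to c^d modulo n."""
    return pow(c, d, n)

# Toy values only; real keys use primes hundreds of digits long.
public_key, d = make_keys(61, 53, 17)
c = encrypt(65, public_key)
recovered = decrypt(c, public_key[0], d)   # equals the original message, 65
```

Anyone who can factor n = 3233 back into 61 × 53 can recompute φ(n) and hence d, which is why the scheme's security rests on the difficulty of factoring.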

While in our world number theorists seek numerical factoring algorithms in order to decipher encrypted messages, in Chiang’s, nomenclators seek to factor euonyms in order to animate their machines. And just as in our world there are factions of people who feel strongly that personal privacy must be respected and thus fear the undermining of public-key encryption, in Chiang’s universe there are those who oppose euonymic research. Human sculptors of automata are afraid of losing their jobs, should nomenclators be able to endow automata with the ability to create their own kind; moreover, in Chiang’s England, men will become sterile within the next couple of hundred years. Researchers hope to find euonyms that will allow “‘mankind to perpetuate itself
through nomenclature”’ (Chiang 171). This may seem like a noble goal, at least from an anthropocentric perspective, but the London elite plan to restrict access to these euonyms, should they be discovered, in furtherance of the government and peerage’s eugenic agenda: Lord Fieldhurst, a patron of euonymic research, declares: “. . . once we have human production under our control, we will have a means of preventing the poor from having such large families.” He continues: “‘By exercising some judgment when choosing who may bear children or not, our government could preserve the nation’s racial stock”’ (Chiang 186). In each case, the cracking of a cipher constitutes a threat to civil liberties. But paradoxically, at the end of “Seventy-Two Letters,” it is an extreme violation of a civil liberty – namely, the murder of a kabbalist – that leads to Robert’s novel approach to the preservation of the working classes’ reproductive rights: he will seek a euonym that will allow biologically sterile humans to self-replicate, like mathematician John von Neumann’s theorized automata. Mathematics, throughout it all, has been lurking behind the scenes.

Division

A proof that mathematics is inconsistent, and that all its wondrous beauty was just an illusion, would, it seemed to me, be one of the worst things you could ever learn.
–Ted Chiang, note on “Division by Zero”

Like Hillalum in “Tower of Babylon,” Renee, the antiheroine of Chiang’s story “Division by Zero,” experiences a mathematical epiphany: however, her discovery plunges her into nihilistic despair. The story, divided into nine sections, comprises two parallel nonlinear narratives, preceded by section introductions explaining mathematical history and concepts. In the first narrative, told in the subsections that are labeled with the letter ‘A’, Renee, a former mathematical prodigy now approaching middle age, develops a revolutionary formalism – a syntactic language – which allows her to prove that any two real numbers are equal to one another. As Chiang notes in the story’s sixth section introduction, Kurt Gödel proved that arithmetic as a formal system cannot guarantee that it will not produce results such as “1 = 2”; such contradictions may never have been encountered, but it is impossible to prove that they never will be. (Chiang 79)

By working outside of the system of formal arithmetic, however, Renee has been able to prove something that could never have been proven within it: a result that has rendered mathematics meaningless. Her husband, Carl, argues that “‘Math still works. The scientific and economic worlds aren’t suddenly going to collapse from this realization,”’ but Renee dismisses that sort of mathematics as a “gimmick” (Chiang 81). Since childhood, she has been a believer in the fundamental rightness of math: now she finds that her intuition has “betrayed her” (Chiang 83). Her proof that arithmetic is inconsistent makes sense to her: “in its own perverted way, it felt right. She understood it, knew why it was true, believed it” (Chiang 83). Faced with the disproof of everything she has always “known” to be right, she loses her
clarity of thought and her concentration; she dreams that she has proven that life and death are equal. Her realization that mathematics has, all along, been a false idol makes her suicidal. She no longer knows who she is: having lost her faith, she must wander the wilderness. She remembers an acquaintance who “gave up academia to sell handmade leather goods” (Chiang 86). The second narrative – contained in the subsections that are labeled with the letter ‘B’ – focuses on Carl, who had attempted suicide while in graduate school but since then has never suffered despair. While Renee’s narrative focuses on her relationship with mathematics and, in conjunction with that, her sense of self, Carl’s narrative, for the most part, focuses on his relationship with Renee. It discusses what drew him to her when they first met, and details his attempts to understand what is going on in her mind when she begins to show signs of frustration with her work: while she pores over “hieroglyphic equations, interspersed with commentary in Russian,” he attempts to “decipher [her] glare” (Chiang 77). Even when she shares with him her disastrous revelation, she remains a cipher: fundamentally a pragmatist, he is unable to comprehend the depths of her grief and tries to assuage her existential anguish with logical arguments (that are no match for her own) and with the suggestion of a weekend away. Eventually, Renee attempts suicide, and when he finds her note and dashes through the house towards her, he has his own epiphany: he realizes that “because he couldn’t understand what had brought her to such an action, he couldn’t feel anything for her” (Chiang 85). Finally, the two parallel storylines intersect: after she is released from the hospital and Carl has decided to leave the marriage, she makes one last attempt – too little, too late – to explain her emotions. “The things that have been going on in my head . . . 
If it had been any normal kind of depression, I know you would have understood, and we could have handled it. . . . But what happened, it was almost as if I were a theologian proving that there was no God. Not just fearing it, but knowing it for a fact. Does that sound absurd? . . . It’s a feeling I can’t convey to you. It was something that I believed deeply, implicitly, and it’s not true, and I’m the one who demonstrated it.” He opened his mouth to say that he knew exactly what she meant, that he had felt the same things as she. But he stopped himself: for this was an empathy that separated rather than united them, and he couldn’t tell her that. (Chiang 88–89)

The story ends with this exquisite paradox: because Carl cannot empathize with her loss of her faith (in mathematics, in herself as a mathematician), he has lost his faith (in their marriage, in his empathic skills) and thus understands exactly what she is going through. This is a beautiful example of the elegance in design that Chiang discusses (not surprisingly) in his note on this very story: the ending is “clever yet seems totally natural.” Chiang writes, “Of course, we know that [endings that are surprising yet inevitable] aren’t really inevitable; it’s human ingenuity that makes them seem that way, temporarily” (Chiang 277). In fact, Chiang has been setting up this ending from the beginning of the story, with the story literally structured like a mathematical proof. Recall that, perhaps in a nod to Gödel numbering – a function assigning numbers to the symbols and formulas of a formal language – Chiang has divided the story into, ostensibly, 18 narrative subsections: 9 focusing
on Renee and 9 focusing on Carl. But in truth, as is noted in an anonymous comment on Alex Kasman’s website Mathematical Fiction, there is only one story: the story of a loss of faith. Renee and Carl’s narratives collide in the last subsection, labeled “9A = 9B.” Dividing both sides of this equation by 9 (a valid arithmetical move in a universe in which mathematics is consistent, since the division is not by zero), one obtains the equality A = B: thus, 1A = 1B, 2A = 2B, etc. In other words, Renee and Carl’s narrative arcs have been linked all along. In the end, just as Renee must break, to a greater or lesser degree, with mathematics, Carl must break with Renee: the two narratives of division have, ironically, been joined all along.

Determination

I used to think this was the beginning of your story. . . . And this was the end. . . . But now I’m not so sure I believe in beginnings and endings.
–Louise Banks, Arrival

Like “Division by Zero,” the novella “Story of Your Life” – the story from which the 2016 film Arrival was adapted – interweaves two related narratives: a linear chronicle of first contact with an alien race and a nonlinear contemplation of a relationship between a mother and daughter. The narrator in each case is linguist Louise Banks, and together the narratives comprise what is fundamentally a love letter to her daughter, beginning and ending (if you believe in beginnings and endings) at the moment when she decides to conceive a child. The linear narrative begins straightforwardly enough: aliens – termed heptapods by humans, since each alien has seven radially symmetric limbs – have placed 112 two-way communication devices, which allow humans and heptapods to both see and hear one another, in various locations around the world. Louise and physicist Gary Donnelly are members of a team of scientists brought to one of these looking glasses and charged with establishing meaningful communication with the heptapods.

Writing Like a Heptapod: Nonlinear Semasiography

Louise eventually determines that the heptapods’ oral language and written system of communication are completely different from one another: she calls them, respectively, Heptapod A and Heptapod B. Heptapod A is relatively comprehensible to humans, despite the factors distinguishing it from human languages: [The linguists] made steady progress decoding the grammar of the spoken language . . . It didn’t follow the pattern of human languages, as expected, but it was comprehensible so far: free word order, even to the extent that there was no preferred order for the clauses in a conditional statement, in defiance of a human language “universal.” Peculiar, but not impenetrable. (Chiang 113–114)

Heptapod A, as a spoken language, is intrinsically linear. Indeed, given a Heptapod A sentence, we can define a relation ≤ on its set S of words by declaring w1 ≤ w2 if and only if w1 is spoken before w2 in the sentence. In the parlance of mathematics, ≤ is a linear ordering of S. Since word order is free in Heptapod A, equivalent sentences may correspond to different linear orderings on their sets of words, but this doesn’t change the fact that every sentence’s set of words has a unique linear ordering, induced by the order in which the words are spoken. (For a rigorous discussion of relations and linearly ordered sets, see, for instance, Devlin 1993.) Heptapod B, on the other hand, is nonlinear: the performance of writing a Heptapod B sentence induces no linear ordering on its set of constituent parts. These parts, in fact, bear no relation to the words of Heptapod A. Heptapod B is a complex semasiographic system of writing: The term [semasiographic] combines the Greek word semasia meaning “meaning” with a “graphic” presentational style . . . Semasiographic systems of communication convey ideas independently from language and on the same logical level as spoken language rather than being parasitic on them . . . [T]hey can function outside of language. . . . [In one type of semasiographic system,] meaning is indicated by the interrelationship of symbols that are arbitrarily codified. Mathematical notation . . . is one example of such a conventional system, where the numerals, letters, and plethora of specialized signs are conventionally understood as numbers, things, and actions. (Boone 15–16)
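The linear ordering on a spoken sentence's words, described earlier in this passage, can be checked mechanically. The following is a minimal sketch with several assumptions: the sample words are hypothetical stand-ins, each word is assumed to occur only once in a sentence, and ≤ is read as "is spoken no later than" so that the relation is reflexive, as the usual definition of a linear ordering requires.

```python
def spoken_order(sentence):
    """Relation on words: (u, v) is included iff u is spoken no later than v."""
    position = {word: i for i, word in enumerate(sentence)}
    return {(u, v) for u in sentence for v in sentence
            if position[u] <= position[v]}

def is_linear_order(relation, elements):
    """Check reflexivity, antisymmetry, transitivity, and totality."""
    reflexive = all((a, a) in relation for a in elements)
    antisymmetric = all(a == b for (a, b) in relation if (b, a) in relation)
    transitive = all((a, c) in relation
                     for (a, b) in relation
                     for (b2, c) in relation if b == b2)
    total = all((a, b) in relation or (b, a) in relation
                for a in elements for b in elements)
    return reflexive and antisymmetric and transitive and total

# Two hypothetical "equivalent sentences" built from the same set of words.
words = {"w1", "w2", "w3"}
first = spoken_order(["w1", "w2", "w3"])
second = spoken_order(["w3", "w1", "w2"])

# Two items produced simultaneously: neither precedes the other.
simultaneous = {("s1", "s1"), ("s2", "s2")}
```

Both spoken orders pass the check even though they differ as relations, while the relation for two simultaneously produced items fails totality, which is the property a sentence written in Heptapod B lacks.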

Louise calls the constituent components of Heptapod B sentences semagrams. It appeared that a semagram corresponded roughly to a written word in human languages: it was meaningful on its own, and in combination with other semagrams could form endless statements. [The linguists] couldn’t define it precisely, but then no one had ever satisfactorily defined “word” for human languages either. (Chiang 111)

Significantly, however, there is no natural linear ordering of semagrams in a Heptapod B sentence. Louise recalls watching a heptapod write: [I] watched the web of semagrams being spun out of inky spider’s silk. . . . Comparing [the] initial stroke with the completed sentence, I realized that the stroke participated in several different clauses of the message. It began in the semagram for “oxygen,” . . . then it slid down to become the morpheme of comparison in the description of . . . two moons’ sizes; and lastly it flared out as the arched backbone of the semagram for “ocean.” Yet this stroke was a single continuous line . . . The heptapods didn’t write a sentence one semagram at a time; they built it out of strokes irrespective of individual semagrams. (Chiang 122–123)

A sentence written in Heptapod B doesn’t correspond to a linear ordering of its semagrams since multiple semagrams share components, and therefore portions of many semagrams are written at once. This is distinct from the flexibility in Heptapod A word order: while multiple distinct orderings of words in Heptapod A may yield equivalent sentences, the words in each such sentence are spoken in a unique, identifiable order. Semagrams, on the other hand, are not written as discrete units during discrete periods of time. They are distinct from one another, but their forms overlap in space and their creation is simultaneous. (In Arrival, semagrams – created by artist Maxine Bertrand – are roughly circular, underscoring the nonlinear
nature of Heptapod B. Interested readers can find images of these semagrams – termed logograms by the filmmakers – in the subdirectory https://github.com/WolframResearch/Arrival-Movie-Live-Coding/tree/master/ScriptLogoJpegs of the Wolfram Research GitHub repository.) Louise describes the challenge inherent in writing a sentence in Heptapod B: [A] heptapod had to know how [an] entire sentence would be laid out before it could write the very first stroke. . . . I had seen a similarly high degree of integration before in calligraphic designs . . . But those designs had required careful planning by expert calligraphers. No one could lay out such an intricate design at the speed needed for holding a conversation. At least, no human could. (Chiang 123)

But, eventually, a human can. In time, Louise learns how to both read and write Heptapod B, and in doing so, her perceptions start to mimic those of her xenogenic conversation partners. In Arrival, Louise (Amy Adams) and the film’s version of Gary (Jeremy Renner) – inexplicably rechristened Ian – educate viewers on the Sapir-Whorf hypothesis, which plays an important (albeit unstated) role in Chiang’s novella:

IAN: I was doing some reading about this idea that if you immerse yourself into a foreign language, that you can actually rewire your brain.
LOUISE: Yeah, the Sapir-Whorf hypothesis . . . It’s the theory that the language you speak determines how you think.

Louise, in learning Heptapod B, begins to think like a heptapod – and heptapods think very differently from humans. A voiceover in Arrival states that [u]nlike speech, a [semagram] is free of time . . . [The heptapods’] written language has no forward or backward direction. Linguists call this nonlinear orthography, which raises the question: is this how they think?

Indeed, it is.

Thinking Like a Heptapod: Variational Principles

Gary and his scientific peers are nonplussed when they discover that the heptapods do not seem to understand simple algebra, geometry, or what humans consider to be basic physical principles. The heptapod race is clearly significantly more scientifically advanced than that of humans, yet the heptapods’ understanding of mathematics and physics seems minimal. It turns out that this is because the heptapods’ first principles of physics profoundly differ from ours. The physicists finally have a breakthrough when they present to the heptapods Fermat’s principle of least time. Gary presents the concept to Louise: he explains that if a ray of light is going from a point A in the air to a point B in water, then “The light ray travels in a straight line until it hits the water; the water has a different index of refraction, so the light changes direction” (Chiang 116). He notes that there are many possible theoretical paths that the ray of light could take. (See Fig. 3.)

Fig. 3 Four possible paths that light can take when traveling from A (in the air) to B (in the water). The light follows the solid line. For more details, see “Snell’s Law” (Flens 2016)

Though in three-dimensional space the shortest-length path between two points is always a straight line (e.g., the rightmost dashed line in Fig. 3), the fact that light travels more slowly in water than in air means that following a straight-line path is not the fastest way for light to get from A to B. Gary notes that a path involving the ray of light traveling perpendicularly through the water (the leftmost dashed line in Fig. 3) “reduces the percentage [of the path] that’s underwater, but the [path’s] total length is larger. It would also take longer for light to travel along this path than along the actual one. . . . Any hypothetical path would require more time to traverse than the one actually taken. In other words, the route that the light ray takes is always the fastest possible one. That’s Fermat’s principle of least time.” (Chiang 117–118)
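Gary's comparison of candidate paths can be checked numerically. The sketch below is only an illustration: the refractive indices, the positions of A and B, and all function names are assumptions chosen for the example, not values from Chiang's story or from Fig. 3. It searches for the crossing point on the water's surface that minimizes travel time, then compares that path with the straight-line path and with the path that enters the water directly above B.

```python
import math

# Illustrative values (assumptions, not taken from the chapter or the story).
N_AIR, N_WATER = 1.00, 1.33   # refractive indices
A_HEIGHT = 1.0                # height of A above the water surface
B_DEPTH = 1.0                 # depth of B below the water surface
B_OFFSET = 2.0                # horizontal distance from A to B


def travel_time(x):
    """Travel time (in units of 1/c) when the ray crosses the surface at x."""
    in_air = math.hypot(x, A_HEIGHT)
    in_water = math.hypot(B_OFFSET - x, B_DEPTH)
    return N_AIR * in_air + N_WATER * in_water


def fastest_crossing(lo=0.0, hi=B_OFFSET, iterations=200):
    """Ternary search for the minimum of travel_time, which is convex in x."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if travel_time(m1) < travel_time(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


x_best = fastest_crossing()
x_straight = B_OFFSET * A_HEIGHT / (A_HEIGHT + B_DEPTH)  # straight line from A to B
x_vertical = B_OFFSET                                     # enter the water directly above B

# Sines of the angles between each leg of the fastest path and the surface normal.
sin_air = x_best / math.hypot(x_best, A_HEIGHT)
sin_water = (B_OFFSET - x_best) / math.hypot(B_OFFSET - x_best, B_DEPTH)

print(f"fastest crossing point: x = {x_best:.4f}")
print(f"time, fastest path:     {travel_time(x_best):.4f}")
print(f"time, straight line:    {travel_time(x_straight):.4f}")
print(f"time, vertical descent: {travel_time(x_vertical):.4f}")
print(f"n_air * sin(air angle)     = {N_AIR * sin_air:.6f}")
print(f"n_water * sin(water angle) = {N_WATER * sin_water:.6f}")
```

Under these assumed values, the last two printed quantities should agree to within the precision of the search, which is Snell's law recovered from the least-time requirement alone.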

Gary admits that his first explanation of Fermat’s principle was overly simplistic and adds that similar principles exist in “all branches of physics” (Chiang 119): “[T]he word ‘least’ is misleading. You see, Fermat’s principle of least time is incomplete; in certain situations light follows a path that takes more time than any of the other possibilities. It’s more accurate to say that light always follows an extreme path, either one that minimizes the time taken or one that maximizes it. A minimum and a maximum share certain mathematical properties, so both situations can be described with one equation. So to be precise, Fermat’s principle isn’t a minimal principle; instead it’s what’s known as a ‘variational’ principle. . . . Almost every physical law can be restated as a variational principle. The only difference between these principles is in which attribute is minimized or maximized.” (Chiang 119)

A MATHEMATICAL JAUNT The idea that a single equation can be associated with finding both maxima and minima is analogous to one found in first-semester differential calculus. The local minima and maxima of a differentiable function f from an open interval of real numbers to the set of real numbers can occur only at values c at which the derivative of f is 0. In the calculus of variations, one’s goal is to minimize or maximize a functional – that is, a mapping from a set of functions to the set of all real numbers – rather than a function; instead of identifying real numbers at which the derivative of a function is 0,


J. K. Sklar

one identifies the functions at which the functional derivative is 0. The definition of a functional derivative is beyond the scope of this chapter, but readers can learn more about the calculus of variations in The Open University’s unit Introduction to the Calculus of Variations 2016. It is surprising to the physicists that, other than arithmetic, Fermat’s principle of least time is the first mathematical concept they present to the heptapods that the latter seem to understand. Gary and Louise have the following exchange, beginning with an observation by Gary: “[I]t’s curious that Fermat’s principle was the first breakthrough; even though it’s easy to explain, you need calculus to describe it mathematically. And not ordinary calculus; you need the calculus of variations. We thought that some simple theorem of geometry or algebra would be the breakthrough.” “Curious indeed. You think the heptapods’ idea of what’s simple doesn’t match ours?” “Exactly . . . If their version of the calculus of variations is simpler to them than their equivalent of algebra, that might explain why we’ve had so much trouble talking about physics . . . ” (Chiang 118)
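For the two-medium refraction problem, the variational question collapses to ordinary calculus, which makes the analogy concrete. The following is a sketch under assumed notation that does not appear in the chapter: A lies at height a above the water's surface, B at depth b below it, d is their horizontal separation, v_1 and v_2 are the speeds of light in air and water, x is the point at which the ray crosses the surface, and θ_1, θ_2 are the angles the two legs of the path make with the normal to the surface.

```latex
\begin{align*}
T(x) &= \frac{\sqrt{a^2 + x^2}}{v_1} + \frac{\sqrt{b^2 + (d - x)^2}}{v_2}, \\[4pt]
\frac{dT}{dx} &= \frac{x}{v_1\sqrt{a^2 + x^2}} - \frac{d - x}{v_2\sqrt{b^2 + (d - x)^2}}
              = \frac{\sin\theta_1}{v_1} - \frac{\sin\theta_2}{v_2}, \\[4pt]
\frac{dT}{dx} = 0 \quad &\Longleftrightarrow \quad \frac{\sin\theta_1}{\sin\theta_2} = \frac{v_1}{v_2}.
\end{align*}
```

Setting the derivative of the travel time to zero thus yields Snell's law. In the general problem, where the path is an unknown curve rather than a single crossing point, the travel time becomes a functional of that curve, and the condition dT/dx = 0 is replaced by the vanishing of the functional derivative.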

THE JAUNT ENDS

Later, as they dine in a Chinese restaurant, Louise tells Gary that something about Fermat’s principle seems strange to her. Gary immediately identifies the reason for her intellectual disquiet: “You’re used to thinking of refraction in terms of cause and effect: reaching the water’s surface is the cause, and the change in direction is the effect. But Fermat’s principle sounds weird because it describes light’s behavior in goal-oriented terms. It sounds like a commandment to a light beam: ‘Thou shalt minimize or maximize the time taken to reach thy destination.’ . . . The thing is, while the common formulation of physical laws is causal, a variational principle like Fermat’s is purposive, almost teleological.” (Chiang 124)

Their exchange continues, beginning with a question by Louise: “[L]et’s say the goal of a ray of light is to take the fastest path. How does the light go about doing that?” “Well, if I can speak anthropomorphic-projectionally, the light has to examine the possible paths and compute how long each one would take.” [Gary] plucked the last potsticker from the serving dish. “And to do that,” [Louise] continued, “the ray of light has to know just where its destination is. If the destination were somewhere else, the path would be different.” Gary nodded . . . “That’s right; the notion of a ‘fastest path’ is meaningless unless there’s a destination specified. And computing how long a given path takes also requires information about what lies along that path, like where the water’s surface is.” [Louise] kept staring at the diagram on the napkin. “And the light ray has to know all that ahead of time before it starts moving, right?”


“So to speak . . . The light can’t start traveling in any old direction and make course corrections later on, because the path resulting from such behavior wouldn’t be the fastest possible one. The light has to do all its computations at the very beginning.” (Chiang 125)

Louise identifies what it is that she finds discomfiting: “the ray of light has to know where it will ultimately end up before it can choose the direction to begin moving in” (Chiang 125).

Premembering: Nonlinear Orthography and Nonlinear Time Many science fiction writers have allowed their characters to experience nonlinear time, and explored the paradoxes that nearly inevitably arise from such experiments. Some approach this with utter seriousness, while others, like the television series Doctor Who, approach it with whimsy, with The Doctor declaring: People assume that time is a strict progression of cause to effect, but actually, from a nonlinear, non-subjective viewpoint, it’s more like a big ball of wibbly-wobbly, timey-wimey . . . stuff. (“Blink” 2007)

But in the real world, humans have what might be considered a locally linear concept of time, in that they can remember what has happened in the past, but cannot remember what will happen in the future. Heptapods, on the other hand, have a nonlinear concept of time. Louise elaborates: [W]hen humans thought about physical laws, they preferred to work with them in their causal formulation. I could understand that: the physical attributes that humans found intuitive, like kinetic energy or acceleration, were all properties of an object at a given moment in time. And these were conducive to a chronological, causal interpretation of events: one moment growing out of another, causes and effects creating a chain reaction that grew from past to future. In contrast, the physical attributes that the heptapods found intuitive, like “action” or those other things defined by integrals [used in the statement of Fermat’s principle of least time], were meaningful only over a period of time. And these were conducive to a teleological interpretation of events: by viewing events over a period of time, one recognized that there was a requirement that had to be satisfied, a goal of minimizing or maximizing. And one had to know the initial and final states to meet that goal; one needed knowledge of the effects before the causes could be initiated. (Chiang 129–130)

The heptapods’ writing, like their radially symmetric bodies, has no “forward” or “backward” direction; nor, it turns out, does their memory. They experience time not as a sequence of forward-marching moments, with those behind one accessible by one’s memory and those ahead of one not; rather, they know both what has happened in the “past” and what will happen in the “future.” One might ask how one can know what will happen in the future and still maintain free will; Louise notes: The heptapods are neither free nor bound as we understand those concepts; they don’t act according to their will, nor are they helpless automatons. What distinguishes the heptapods’ mode of awareness is not just that their actions coincide with history’s events; it is also that their motives coincide with history’s purposes. They act to create the future, to enact chronology.


Freedom isn’t an illusion; it’s perfectly real in the context of sequential consciousness. Within the context of simultaneous consciousness, freedom is not meaningful, but neither is coercion; it’s simply a different context, no more or less valid than the other. (Chiang 137)

As Louise begins to develop a heptapod-like consciousness, she “remembers” – or perhaps a better word is premembers – what will happen in the same way that she remembers what has happened. Like the ray of light that knows where it must enter the water in order to travel from A to B in the least amount of time, Louise knows that she must hit her marks in what she says and does in order to “enact chronology” (Chiang 137). She feels compelled to hit these marks, yet it doesn’t feel compulsory. While shopping, she comes upon a salad bowl that she premembers falling off a counter and hitting her as-yet-unborn daughter in the head; she, in keeping with what she knows will happen, reaches out to take the bowl. But she notes: The motion didn’t feel like something I was forced to do. Instead it seemed just as urgent as my rushing to catch the bowl when it falls on [my daughter]: an instinct that I felt right in following. (Chiang 133)

Story of Her Life Though the linear narrative forms the bulk of the novella’s content, the emotional heart of the story is found in the interwoven patchwork quilt of Louise’s vignettes about her relationship with her daughter, whom we may call “Hannah,” after her character in Arrival (she is unnamed in Chiang’s story). The story begins with Louise’s announcement that Gary, Hannah’s future father, “is about to ask me the question” (Chiang 91) and ends with both the question itself – “Do you want to make a baby?” – and the defining moment when Louise answers, “Yes” (Chiang 145). The vignettes are all prememories, addressed by Louise to yet-to-be-born Hannah, presenting an olio of moments from their shared lives: preteen Hannah arguing with her mother about vacuuming the floor; 6-year-old Hannah eagerly anticipating a trip to Hawaii; 14-year-old Hannah writing a school report; infant Hannah, squalling; and 25-year-old Hannah, identified by her parents after a fatal rock climbing accident. The vignettes are presented in nonchronological order: by the time Gary asks her if she wants to have a baby, Louise is already fluent in Heptapod B and so knows how her entire future with Hannah will unfold before she answers – indeed, she knows it before the question is even asked. The textually separated linear and nonlinear narratives eventually thematically unite, like Renee and Carl’s stories in “Division by Zero.” In his note on “Story of Your Life,” Chiang writes: I’ve found [variational principles of physics] fascinating ever since I first learned of them, but I didn’t know how to use them in a story until I saw a performance of Time Flies When You’re Alive, Paul Linke’s one-man show about his wife’s battle with breast cancer. It occurred to me then that I might be able to use variational principles to tell a story about a person’s response to the inevitable. (Chiang 277)


The fundamental question posed by Chiang’s novella is: Would you choose to love, knowing that your choice will lead to grief ? And its a priori answer, at least for Louise, is yes. Her actions, like the heptapods’, “coincide with history’s events” and her “motives coincide with history’s purposes” (Chiang 137). Her future is predetermined: not because she has no free will but because her choices are exactly those that determine the future that she knows will occur – indeed, that future occurs exactly because of the choices she makes. And, like the ray of light minimizing its traveling time from A to B, Louise, in choosing to have a child who she knows will die young, is optimizing her experience. But an optimization can be a maximization or a minimization: which is it for Louise? Even she doesn’t know: From the beginning, I knew my destination, and I chose my route accordingly. But am I working toward an extreme of joy, or of pain? Will I achieve a minimum, or a maximum? (Chiang 145)

Most likely, Louise has achieved both.

Conclusion Chiang is certainly not the only author to tackle mathematical themes in his writing. Many works – from Flatland, to Robert Heinlein and Ian McEwan’s respective stories “And He Built a Crooked House” (1941) and “Solid Geometry” (1975), to the low-budget Canadian horror film Cube 2: Hypercube (2002) – play with mathematically informed spatial and temporal concepts. Additionally, the cracking of codes – literal or figurative – has captured the interest of myriad authors, readers, and viewers: the mathematician in Sneakers (1992) is murdered because he has made a breakthrough in the factoring of large numbers, presumably making public-key encryption obsolete (Len Adleman, the ‘A’ in RSA cryptography, was a consultant on the film), and mathematicians in The Bank and π (1998) discover ways of predicting the behavior of the stock market. Still, Chiang’s use of mathematics in his work is relatively unique. “Division by Zero” and “Story of Your Life” have structures that are mathematical. And the emotional impact of his work is both mathematically informed and exceptional. In his review of Stories of Your Life and Others, China Miéville writes: [Chiang’s] stories are constructed on a bedrock of profound humanism, so the most abstruse philosophical conjectures are experienced as resonant and emotional. In “Division by Zero”, for example, we feel painful empathy and pity for the main character only because and insofar as we have understood the crisis in her life occasioned by a mathematical paradox. This is scientific problem not as puzzle to be solved, but as ontological catastrophe, and human catastrophe too. Similarly, in his notes, Chiang describes how “Story of Your Life” “grew out of [an] interest in the variational principles of physics”: perhaps surprisingly, the tenderness of this story, and its astonishingly moving culmination, are not achieved despite the scientific speculation, but are direct functions of it.


In Chiang’s universe, humanism is inextricable from rationalism. Far from being counterposed as “cold” to emotion’s “warmth”, it is the rationalism of the characters—and the writer—that makes them emotional and human.

Mathematics is not simply a prop, or even a character, in Chiang’s stories: rather, it is the frame on which the stories are built. Like those of Renee, the “intellectual and emotional lives” of Chiang’s stories are “inextricably linked” (Chiang 87): in his work, mathematics and emotion, like the marble tiles that Renee admires, meet at “incredibly fine lines” – and, like Renee, we may “[shiver] at the precision” (Chiang 74).

Acknowledgments The author thanks Drs. Tom Edgar, Elizabeth Sklar, and Bharath Sriraman, as well as Sean McQueen and an anonymous peer reviewer, for their invaluable help as readers of and consultants on this chapter.

References
Abbott E (1884) Flatland: a romance of many dimensions. Dover Publications, New York
Arrival (2016) Dir. Denis Villeneuve. Film
Moffat S (2012) “Blink.” Doctor Who: The Complete Third Series. Dir. Hettie Macdonald. BBC Home Entertainment, London. Television episode
Boone E (1994) Introduction: writing and recording knowledge. In: Boone E, Mignolo W (eds) Writing without words: alternative literacies in Mesoamerica & the Andes. Duke University Press, Durham
Chiang T (2016) Stories of your life and others. Vintage Books, New York (Original work published 2002)
Devlin K (1993) The joy of sets: fundamentals of contemporary set theory, 2nd edn. Springer, New York
Flens H (2016) “Snell’s law.” Engineering LibreTexts. CC BY-NC-SA 3.0 US. 28 Jul 16. https://eng.libretexts.org/Bookshelves/Materials_Science/Supplemental_Modules_(Materials_Science)/Optical_Properties/Snell’s_Law. Accessed 19 Mar 19
Judson T (2018) Abstract algebra: theory and applications. GFDL, Tacoma. http://abstract.ups.edu/aata/. Accessed 27 Oct 2018
Kasman A (2018) Division by zero. In: Mathematical fiction. College of Charleston. http://kasmana.people.cofc.edu/MATHFICT/mfview.php?callnumber=mf194. Accessed 19 Oct 2018
Miéville C (2004) “Wonder boy.” The Guardian. Guardian News and Media Unlimited. 23 Apr 2004. Accessed 30 Dec 2018
Munkres J (2017) Topology, 2nd edn. Prentice Hall, Inc., Upper Saddle River
Niven I, Zuckerman H, Montgomery H (1991) An introduction to the theory of numbers, 5th edn. Wiley, Hoboken
NUMB3RS (2005–2010) Creators Nicolas Falacci and Cheryl Heuton. Television series
Introduction to the calculus of variations (2016) The Open University, Milton Keynes. http://www.open.edu/openlearn/ocw/mod/resource/view.php?id=72745
π (1998) Dir. Darren Aronofsky. Film
Sklar J, Sklar E (eds) (2012) Mathematics and popular culture: essays on appearances in film, fiction, games, television, and other media. McFarland, Jefferson
Waller P, Flood C (2016) Mathematics as a universal language: transcending cultural lines. J Multicult Educ 10(3):294–306. https://doi.org/10.1108/JME-01-2016-0004
Wolfram C (2017) Arrival logograms. In: Wolfram Research GitHub repository. https://creativecommons.org/licenses/by-nc/4.0/ CC BY-NC 4.0; https://github.com/WolframResearch/Arrival-Movie-Live-Coding/tree/master/ScriptLogoJpegs. Accessed 2 July 19

34 Running in Shackles: The Information-Theoretic Paradoxes of Poetry

Dmitri Manin

Contents
Introduction
The Form Paradox
The Nonsense Paradox
The Curious Case of Missing Synonyms
A Word in Its Place
Beyond Entropy
Conclusion
References

Abstract Information theory, developed by Claude Shannon in the 1940s, provides a simple but powerful model for reasoning about communication that far transcends the relatively narrow technical domain of telecommunications for which it was initially developed. The use of language for exchanging messages is, arguably, the most distinctive feature of Homo sapiens as a species. Language is central to almost everything we do, and among the many different ways language is used, poetry is perhaps the most enigmatic. Poetry is an ancient invention that has never gone out of fashion, but the reasons for its existence and the mechanisms of its impact remain elusive. Does information theory have anything to say about poetry? If poetry is often conceptualized as a message with highly concentrated meaning, can it be proven that it has high information content? The attempts to answer these questions over the past 60 years, which we review in this chapter, are rich with important insights and nagging paradoxes.

D. Manin
Independent researcher, Menlo Park, CA, USA
e-mail: [email protected]
© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_55


Keywords Autoencoders · Entropy · Formal constraints · Information · Kolmogorov complexity · Meter · Metaphor · Poetry · Rhyme · Redundancy

Introduction From the very beginning of information theory, researchers were tempted to apply it, even if informally or metaphorically, to literary texts, especially poetry. In his groundbreaking paper, Claude Shannon (1948) wrote: Two extremes of redundancy in English prose are represented by Basic English [Wikipedia (2018) – DM] and by James Joyce’s book “Finigan’s Wake” [sic – DM]. The Basic English vocabulary is limited to 850 words and the redundancy is very high. This is reflected in the expansion that occurs when a passage is translated into Basic English. Joyce on the other hand enlarges the vocabulary and is alleged to achieve a compression of semantic content.

Poetry is commonly described in terms of “economy of language,” “compression of meaning,” and so on. This notion even occurs in dictionary definitions, such as this one from the Merriam-Webster’s dictionary: “writing that formulates a concentrated imaginative awareness of experience in language” (Merriam-Webster online, 2018). So it seems that information theory must be able to contribute something to this intuitive notion that a work of literature can pack a lot of punch in a short stretch of text. Indeed, the theory defines a measure of content, and if one could demonstrate that poetry has more content per unit length (information density or entropy) than nonliterary, utilitarian language, then that would be a proof of its elevated expressive power. Shannon’s definition of information operationalized some of the intuitive notions that had been expressed long before. Without getting into technical subtleties, the amount of information in a sequence of symbols is a measure of its unexpectedness. The amount of information per symbol (entropy or information density) is given by

\[ H = -\sum_{i} p_i \log p_i \tag{1} \]

where p_i is the probability for the ith symbol to occur. If every symbol can be predicted with certainty, the sequence has no information in it, in agreement with the intuitive notion. The way to increase the amount of information is to compose the message out of unexpected elements. For example, if we consider each word in the text as a “symbol,” the greater the vocabulary, the smaller these probabilities are on average, so the amount of information increases. Compare this with how Aristotle in his Poetics recognized the importance of unexpected elements: “That diction [. . . ] is lofty and raised above the commonplace which employs unusual words.” and “It is precisely because such phrases are not part of the current idiom that they give distinction to the style” (Aristotle, 2008, Part XXII). As another reference point in modernity, consider Viktor Shklovsky’s theory


of defamiliarization (ostranenie). Shklovsky, a writer and scholar of the Russian formalist school, contended that our perception is dulled by habituation, and that the goal of literature is to refresh it by presenting things in an unfamiliar, estranged way, in particular with language (Shklovsky, 2015): The language of poetry is difficult, laborious language which puts the brakes on perception. In some particular cases, the language of poetry approaches the language of prose, but this does not violate the law of difficulty. [. . . ] For Pushkin’s contemporaries, Derzhavin’s elevated diction was the usual language of poetry, so that Pushkin’s style was unexpectedly difficult for them in its ordinariness. Recall that Pushkin’s contemporaries were horrified by his vulgar expressions. Pushkin used the vernacular as a device to arrest attention [. . . ]

Shannon’s definition allows us to actually compute some measure of unexpectedness of a text’s language: either by using word frequency as an approximation to its probability or by asking human subjects to predict the next word (or letter) and estimating probability from the success rate on this task. There have been some attempts to do that early on. Shannon himself conducted the first experiments (Shannon, 1951), where a respondent was asked to guess the next letter in randomly chosen fragments of a Jefferson biography. Shannon was mostly interested in estimating the information rate itself and didn’t study differences between texts. Later, a number of researchers repeated and elaborated Shannon’s experiments. Fónagy (1961) compared the guess rate of the next character in three different texts: a poem, a newspaper article, and a “conversation of two young girls.” Apparently, he used a simplified method, where each letter was guessed only once (in contrast to Shannon’s guessing until the correct answer is obtained), which doesn’t provide a way to compute entropy estimates. But the guess rates of 40% in the poem, 67% in the newspaper article, and as high as 71% in the conversation suggested an elevated entropy for the poetry. Other estimates of the natural language entropy, both using Shannon-style experiments and statistical text analysis, can be found in Burton and Licklider (1955), Paisley (1966), Kolmogorov (1968), Cover and King (1978), Kontoyiannis (1996/1997), Pereira et al. (1996), Teahan and Cleary (1996), Moradi et al. (1998). In some of these works, attempts have been made to correlate information-theoretical characteristics of texts with their style and literary quality. However, for the most part, such studies were motivated by practical applications like text compression. The only systematic study of entropy as a function of style, period, and author that I am aware of is Paisley (1966). 
Though it employed a very crude entropy estimate (by frequencies of two-letter sequences), a systematic difference was found between prose and poetry texts, the latter having higher entropy, i.e., information density, in agreement with expectations. However, if we attempt to naively apply information-theoretic concepts to poetry, two paradoxes immediately arise. First, the constraints of poetic form, such as meter, alliteration, and rhyme, restrict the pool of words that can fill a given place in the line and so should decrease unpredictability and entropy, rather than increase it. Call this The Form Paradox. Second, according to Shannon’s theory, the highest entropy is found in a completely random character sequence, where all characters occur with equal frequency and independently of each other. But a random sequence of letters


(or words) can’t form a meaningful text, so it’s not even clear in what sense it can be treated as containing information. Call this The Nonsense Paradox. We will consider them in the following two sections.
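The frequency-based estimates mentioned above are easy to sketch. The following is a minimal illustration, not a reconstruction of any cited study's method: the sample text and all function names are hypothetical. It computes Eq. (1) in bits from single-character frequencies, adds a crude estimate from two-letter sequences (the entropy of a character given the preceding one), and compares a piece of English with a sequence drawn uniformly and independently from the same alphabet, the case that the Nonsense Paradox turns on.

```python
import math
import random
from collections import Counter


def entropy(symbols):
    """Empirical Shannon entropy, in bits per symbol, of a sequence of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropy(text):
    """Entropy of a character given the previous one: H(pairs) - H(first characters)."""
    pairs = [text[i:i + 2] for i in range(len(text) - 1)]
    firsts = [pair[0] for pair in pairs]
    return entropy(pairs) - entropy(firsts)


# Hypothetical sample text, chosen only for illustration.
sample = ("the quick brown fox jumps over the lazy dog while "
          "the other dogs watch the fox and the fox watches them ") * 20

# A sequence of the same length drawn uniformly and independently
# from the same alphabet (fixed seed so the run is repeatable).
rng = random.Random(0)
alphabet = sorted(set(sample))
uniform_text = "".join(rng.choice(alphabet) for _ in range(len(sample)))

for name, text in [("sample", sample), ("uniform random", uniform_text)]:
    print(f"{name:15s} single-letter: {entropy(text):.3f} bits/char, "
          f"given previous letter: {conditional_entropy(text):.3f} bits/char")
```

Estimates of this kind depend heavily on the amount of text available and ignore all dependencies longer than two characters, so they are best read as rough upper bounds on the entropy of real language.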

The Form Paradox Fónagy (1961) noted that rhythmical and phonic organization of poetry should, it seems, decrease unpredictability and entropy, contrary to his experimental results. In the same collection where that article was published, we find a short, but insightful, article by Abernathy (1961) where the same idea is put in this striking way: “poetry uses a considerably circumscribed and impoverished language compared to the everyday speech.” Abernathy proposed to resolve this paradox by ascribing a subjective probability to each message and postulating that poems are characterized by a drastically lower probability, i.e., higher unexpectedness, despite the fact that they use “a considerably circumscribed” language. Unfortunately, this is a rather nonconstructive approach, since it is not clear how to ascribe probabilities to texts, and, more importantly, why or whether lower probability would result from satisfying formal restrictions. A much more detailed exploration of this paradox is due to A.N. Kolmogorov, a great mathematician with a deep interest in literature. In a series of unpublished (at the time) works popularized by Lotman (1977) (see also Kolmogorov, 1997; Yaglom and Yaglom, 1983), he was developing a formalized approach based on considering a set of all possible texts and a particular poem as a member of this set. A poet can be seen as selecting or finding the text that expresses the desired meaning and at the same time satisfies some formal constraints. As an illustration, consider the set of all character sequences, say, no longer than War and Peace. It is a very large, but finite, number. There would be in this set a small subset of grammatically correct and meaningful Russian texts. In this highly idealized model, we don’t care what exactly the “meaning” is, but we postulate that each character sequence has either no meaning or exactly one, and that given any two “meanings” we can always say whether they are the same or different. 
Subdivide all texts into equivalence classes by synonymy, so that each class contains texts that all mean the same. In other words, each class would express in all possible ways some content different from that of the other classes. Consider the synonymy class in which all texts express the meaning of, say, Eugene Onegin. If this class is large enough, one can find in it a text that is composed in Onegin stanzas, which is what a poet does. Then the number of synonymy classes would be equal to the number of meanings expressible with texts no longer than War and Peace, and the average number of texts in each class is essentially the number of different ways to express any given meaning. According to Kolmogorov, the former quantity (or rather, its logarithm) reflects the “content capacity entropy” of the language (h1 ), while the latter one reflects its “flexibility entropy” (h2 ). If the flexibility entropy is large enough, i.e., if the given content can be expressed in a large enough number of ways, one can


expect to find among those ways some that also satisfy the formal restrictions of versification. Because formal constraints reduce the pool of admissible texts, they can be characterized by a negative entropy (β), and Kolmogorov proposed that versification is only possible in languages where β < h2 . Of course, one can’t get rid of the feeling that something is wrong in this picture. Cf. the uncharacteristically impressionistic passage in a classical text on probability and information (Yaglom and Yaglom, 1983, p. 214): However, in the compositions of many eminent poets the decrease in the information content of one text letter, related to the fulfilment of known formal rules, is apparently compensated for to a great extent by the enhanced radiance and unconventionality of language. Therefore, it can be well expected that here the redundancy of the language has the same order as that of a prose literary text.
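The condition β < h2 can be motivated by a simple counting heuristic. What follows is a sketch under assumptions that the chapter does not state: texts have length n characters, h_2 and β are measured per character, a synonymy class contains roughly 2^{h_2 n} texts, and the formal constraints are satisfied by a fraction of about 2^{-βn} of texts regardless of their meaning.

```latex
\[
N_{\text{class}} \approx 2^{h_2 n}, \qquad
\Pr[\text{a text satisfies the form}] \approx 2^{-\beta n}
\quad\Longrightarrow\quad
\mathbb{E}\bigl[\#\{\text{texts in the class that satisfy the form}\}\bigr]
\approx 2^{(h_2 - \beta)\, n}.
\]
```

If β < h_2, the expected number of texts that carry the desired meaning and also fit the form grows exponentially with n; if β > h_2, it shrinks toward zero, and a poet would almost never find one. Like the model itself, this sketch takes the set of admissible texts as fixed and well defined, the very assumption that invites the unease voiced above.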

I suppose Kolmogorov felt that way too, which is why he apparently never published his model. In fact, in an undated manuscript (Kolmogorov, 2002) first published by Uspensky (2002), he wrote: “Poetry admits somewhat freer use of word order unconventional in prose, which somewhat increases [the flexibility entropy].” This essentially means that the basic notion of the set of all admissible texts is somewhat fuzzy: what is barely admissible in prose can be quite admissible in poetry. But if the statistical population is not well defined, probability and, hence, information are also ill-defined. One can even surmise that it was this train of thought that eventually led Kolmogorov to propose his algorithmic complexity theory. There is a telling remark in his groundbreaking paper (Kolmogorov, 1968) after the introduction of what is now known as Kolmogorov complexity: “such quantities as the ‘complexity’ of the text of ‘War and Peace’ can be assumed to be defined with what amounts to uniqueness.” Uspensky in his preface to the publication of Kolmogorov’s notes (Uspensky, 1997) also admits that “the very notion that the corpus of literary texts is only a subset of the corpus of meaningful texts” requires rethinking and gives an example of the famous nonsense line from a poem by the Russian modernist poet Kruchenykh, consisting of three meaningless monosyllabics “Dyr bul shchyl” (Perloff, 2017, p. 73). Note, however, that while Kolmogorov pointed out the relaxation of syntactic norms in poetry, Uspensky’s example hints at the possible relaxation of semantic norms. In fact, such relaxation is well-known to literary scholars. 
We are used to the notion of metaphor as a specific feature of poetry and literary prose, but most metaphors are literally absurd phrases, inadmissible by the standard semantics of the language, as one can see with any textbook example like “the curtain of the night fell upon us” or “Juliet is the sun.” So it’s easy to see that metaphor (as well as metonymy and other tropes) already serves to expand the space of admissible texts in the semantic dimension. As for syntactic expansion, an obvious example is the word order violation often arising from the demands of poetic form. However, not all syntactic oddities arise for purely technical reasons. Many figures of speech known and meticulously catalogued at least since antiquity are particular ways to enhance expression by violating rules of syntax. Consider, for example, the opening of G.M. Hopkins’ poem “To His Watch”: “Mortal my mate. . . ” (Hopkins and Blaisdell, 2013, p. 84). From the point of view of rhythm, the standard word order, “My mortal mate,” would be no worse, but Hopkins forcefully emphasizes mortality, the theme of the poem, by shifting the word to the syntactically awkward initial position. To summarize, it appears that although poetic form does narrow down the statistical population of all admissible texts, various poetic devices counteract this by expanding it and by pushing the boundaries of the standard language syntax and semantics. In this way, they keep the language of poetry from becoming “considerably circumscribed and impoverished” (Abernathy, 1961) and prevent entropy reduction.

The Nonsense Paradox

Suppose that, armed with information theory, we decided to construct a poem with the highest possible information content. That requires each next element of the text (whether we consider it composed of letters or words) to be as unpredictable as possible. It’s easy to demonstrate that maximum entropy in a sequence is achieved when every element is independently drawn from a uniform random distribution. But such a “text” would be nonsensical. How can nonsense have any information content, let alone maximum information content?

To untangle this problem, recall that Shannon’s theory was formulated in the context of reversible transformations of symbolic sequences, such as encoding and decoding. If we compress the text of War and Peace using any standard algorithm like LZ77 (e.g., Dumas et al., 2015), we get a seemingly random character sequence where all characters appear at roughly the same rate and independently of each other (in statistical terms). However, a decoding transformation exists that can be applied to this sequence to recover the original text. That means that however one defines the “content” of a text, it remains intact in the compressed form, and since the compressed text is shorter than the original, its information content per character is higher.

What makes such compression possible is that the elements of a natural language text are not equiprobable and, more importantly, not independent. Compression is possible for a sequence whose entropy is lower than the maximum achievable with a given alphabet. The relative deficit of entropy is called redundancy (Shannon, 1948). An important role of redundancy, as also discovered by Shannon, is error correction, i.e., redundancy makes it possible to reconstruct the signal if it is distorted by noise. Obviously, natural language needs a fair amount of redundancy, because it evolved in an environment where quite literal noise often interferes with communication. So, as is well known, n nglsh txt wth ll vwls rmvd s stll mstl rdbl, because not all letter combinations are pronounceable and not all pronounceable combinations are meaningful words. Syntax also contributes to the redundancy of natural texts: some word sequences (in fact, a great many of them) are grammatically invalid. And a third major component of redundancy comes from semantics: most syntactically correct sentences would be meaningless and so highly unlikely to occur in reality.
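The relation between entropy, redundancy, and compressibility described above can be tried out numerically. The following is a minimal sketch, not a reproduction of any experiment cited in this chapter. It assumes a 27-symbol alphabet (lowercase letters plus space), uses only single-character frequencies (so it ignores the dependence between neighboring letters that accounts for much of the real redundancy of language), and relies on Python's standard `zlib` module, whose DEFLATE format combines LZ77 with Huffman coding. The sample sentence and all function names are illustrative.

```python
import math
import random
import zlib
from collections import Counter


def entropy_bits_per_char(text: str) -> float:
    """Shannon entropy of the single-character frequency distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(text: str, alphabet_size: int) -> float:
    """Relative entropy deficit: 1 - H / H_max, where H_max = log2(alphabet_size)."""
    return 1.0 - entropy_bits_per_char(text) / math.log2(alphabet_size)


# Illustrative sample; any longer English passage would serve the same purpose.
english = ("the quick brown fox jumps over the lazy dog and then "
           "the lazy dog sleeps while the quick brown fox runs away ") * 20

# A "nonsense" sequence: independent draws from a uniform distribution.
alphabet = "abcdefghijklmnopqrstuvwxyz "
rng = random.Random(0)
nonsense = "".join(rng.choice(alphabet) for _ in range(len(english)))

# Compression is reversible: decompressing recovers the original exactly.
packed = zlib.compress(english.encode("utf-8"), 9)
restored = zlib.decompress(packed).decode("utf-8")

print("entropy (english): ", entropy_bits_per_char(english))
print("entropy (nonsense):", entropy_bits_per_char(nonsense))
print("maximum entropy:   ", math.log2(len(alphabet)))
print("original bytes:", len(english), "compressed bytes:", len(packed))
```

The uniform random sequence comes close to the maximum entropy for its alphabet and barely compresses, while the English sample has a clear entropy deficit and shrinks substantially, yet is recovered exactly by decompression.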


So, with reversible text transformations, there is no paradox. Problems begin when we notice that not all interesting text transformations are strictly reversible. For example, when Shannon discusses the translation of Finnegans Wake into Basic English, it is fairly obvious that such a translation can’t be done in such a way as to ensure that the reverse translation would exactly recover Joyce’s text. In fact, it’s highly doubtful that such a translation is even possible. Consider a typical sentence from the novel (Joyce, 2012, p. 6):

Hurrah, there is but young gleve for the owl globe wheels in view which is tautaulogically the same thing.

The word “tautaulogically” is a playful misspelling of “tautologically” that echoes its own meaning of pointless repetition by making the second syllable into a copy of the first one. It would be possible to reproduce the pun or create an equivalent one in a translation into a different, but similarly unrestricted language. But writing in Basic English, by its very definition, precludes one from using words beyond its 850-word lexicon. One could explain the pun in Basic English, but that’s not the same as translating it, i.e., creating an equivalent text that could function in the same way as the original one.

Of course, James Joyce’s creation is an extreme example. But with respect to literary texts in general, it’s not uncommon to encounter the notion that they can’t be paraphrased, because changing any element would irreparably alter the text and change its meaning. That’s what Leo Tolstoy meant when he famously remarked that in order to explain the message of Anna Karenina, he would have to write the novel again. Semiotician and literary scholar Yuri Lotman wrote (Lotman, 1977):

the reader considers the text placed before him (if we speak of a perfect work of art) as the only possible text; “you can’t throw a word out of a song”. For the reader the substitution of a word in the text is not a variant of the content, but new content. Carrying this tendency to its logical extreme, we can say that for the reader there are no synonyms.

Compare a very similar remark from a contemporary American introductory text on poetics (Mandel, 1998):

The selection of the right or best word or phrase in the right or best place is so delicate a task because — the exaggeration is minute — there are no synonyms.

Let’s step back for a moment. The root of the Nonsense Paradox lies in the different notions of the “content” of a message in information theory versus our tentative application. From the information-theoretic point of view, a character sequence is the message, while with natural language, we have an intuitive notion of meaning that is “contained” in the message and can be expressed in different ways. When we consider writing a highly informative poem, we imply the most compact way of expressing a meaning and so have to deal with the question of whether two different texts express the same meaning. Information theory wasn’t designed to deal with this notion of text synonymy. But note that we already encountered it before, when we considered the Form Paradox: it arose in Kolmogorov’s model of poetic composition. It’s worthwhile to look at it in more detail.


The Curious Case of Missing Synonyms

Natural language possesses huge paraphrasing capabilities. The same meaning can be expressed in many different ways (in particular, this sentence can be regarded as a paraphrase of the previous one). This property seems to play an important role in composing literary texts, especially poetry, by allowing us to select, out of all texts expressing the desired meaning, one that also satisfies certain requirements on its form. Except, if we are to believe philologists, it doesn’t, because changing even a single word would alter the meaning. Even making an allowance for the obvious hyperbole of those claims (because, of course, some words in most poems can be altered without completely changing the poem), it’s clear that there’s something important lurking here. Where does the powerful paraphrasing ability of natural language disappear, or at least, why is it significantly reduced, when we deal with poetry? And how is this related to the expressive power of poetic language?

According to Lotman (1977), the lack of textual synonymy plays a crucial role in poetry: if every poetic text has a different meaning, it follows that poetic texts open up an enormous space of new meanings. In terms of Kolmogorov’s model, if we reduce the flexibility entropy to near zero (each synonymy class has only one element), the content entropy must increase, provided that their sum still equals the total entropy of the language. The value of poetry, then, is not that it can express any given meaning in an especially economical way but that it can express meanings otherwise inexpressible.

The way Lotman explains this effect is by postulating that there are two different ways to perceive a text: from the author’s and from the reader’s points of view. The reader is given an immutable, finished work of art and tends to read significance into every word choice, every sound of it. In that sense, the entire entropy of the text is the content entropy. On the other hand, from the writer’s point of view, the artistic freedom is virtually unlimited. In the extreme case, one could say that any word can be substituted for any other, in the same sense as Carroll’s Humpty Dumpty claimed that “When I use a word, [. . . ] it means just what I choose it to mean—neither more nor less.” So for the writer, the entire entropy of the language is the flexibility entropy. It is not clear whether this boldly impressionistic view can be given a precise meaning, but let us note that underlying it is again the idea that the norms of the standard language can and should be violated or altered in poetry. For the writer, the standard semantics of words becomes malleable and customizable. For the reader, normally non-semantic features, like the sound of words or their grammatical relationships, become vehicles of meaning. Lotman demonstrated in his analyses of specific poems how phonemes can take on semantic meaning (unique for each poem) by virtue of their recurrence in meaningful positions. Likewise, Roman Jakobson analyzed the semanticization of grammatical structures in poetry (Jakobson et al., 1987).

Another way to look at the text synonymy problem is, in Lotman’s words (Lotman, 1977, p. 24):

The fact is that in numerous instances the receiver must not only decipher a message with the help of a particular code, but must determine the “language” in which the text is encoded.


The claim is, essentially, that each poem defines its own language, overlapping with, but not identical to, the standard base language in which it is written. This is done by overriding some of the syntax and semantics of the base language and replacing them with text-specific syntax and semantics. If so, it’s not surprising that textual synonymy is significantly reduced. But is it possible to somehow substantiate this view in a more scientific manner? We’ll address this question in the next section.
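The entropy bookkeeping behind the argument of this section can be written out explicitly. This is only a sketch in the notation suggested by the condition β < h₂ quoted earlier: the symbol h for the total entropy per character, the symbol h₁ for the content entropy, and the identification of h₂ with the flexibility entropy are assumptions made here for illustration.

```latex
\begin{align*}
h &= h_1 + h_2
  && \text{(total entropy = content entropy + flexibility entropy)} \\
\beta &< h_2
  && \text{(formal constraints must fit within the available flexibility)} \\
h_2 \to 0 \ \text{with } h \text{ fixed} &\;\Longrightarrow\; h_1 \to h
  && \text{(no synonyms: the entire entropy becomes content)}
\end{align*}
```

Read this way, the reader's and the writer's perspectives in Lotman's account correspond to the two extremes: h₁ ≈ h for the reader and h₂ ≈ h for the writer.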

A Word in Its Place

The notion that literary texts possess some special rigidity of structure is actually rather commonplace and widespread, as expressed in the form of the Russian proverb “you can’t drop a word from a song.” One can find literally dozens of quotes from different cultural contexts that all express this feeling of “the right word in the right place” in a remarkably consistent way, for example:

• There’s no dropping a word out of a song (Russian proverb).
• Not a line is drawn without intention. . . As Poetry admits not a Letter that is Insignificant so Painting admits not a Grain of Sand or a Blade of Grass Insignificant much less an Insignificant Blur or Mark (Blake et al., 2008, p. 560).
• Every word must be mandatory (Kharms, 2002, book 38, sheet 3).
• The poet’s conviction that nothing is left to chance in poetry refutes the puerile reasoning of some literary scholars who think that “a poem may contain structures that are not connected with its literary function and impact” (Jakobson, 1987).
• The selection of the right or best word or phrase in the right or best place is so delicate a task because—the exaggeration is minute—there are no synonyms (Mandel, 1998, p. 192).

Just as the degree of unpredictability can be measured directly by asking subjects to guess words or letters in a text, the degree of this rigidity (word irreplaceability) can be measured by asking subjects to assess whether a given word belongs to the text or has been replaced. This idea was recently realized experimentally (Manin, 2011, 2012) in the form of an online literary game (in Russian), which yielded a large amount of data over a considerable corpus of text fragments, both verse and prose. The results show that the word unpredictability of typical poetry and of literary prose lies roughly in the same range, just as predicted by Yaglom and Yaglom (1983) (quoted in section “The Form Paradox”).
As discussed above, words in poetry are harder to guess because of the greater freedom of word selection and combination in poetry compared to prose, but they are at the same time easier to guess because of the constraints of meter and rhyme. As it turns out, these two opposite forces are approximately of equal strength, so that their effects cancel out. This is likely not a coincidence, but an indication that language dictates a certain level of unpredictability. Texts that are too predictable sound banal, and texts that are too unpredictable border on nonsense (which is, in fact, a playground of much avant-garde literature). But this raises a question: why would the poet bother to increase unpredictability with unusual word combinations while at the same time decreasing it back to where it was by following formal restrictions? Is anything measurably gained in the process?

It turns out that yes, poetry has a significantly higher measure of constrainedness, a quantity indicating how easy it is for a reader to detect a substituted word. One can view each word in a text as being constrained by its environment via syntax and semantics. In poetry, though, words are also constrained by other features of the text: rhythm (not necessarily meter), sound patterning (not necessarily rhyme), and all kinds of semantic and grammatical echoes. This is what makes a poem into a rigid structure where each word uniquely fits in its place and is hard or impossible to replace.

The dynamics of entropy (predictability, redundancy) becomes clearer if we invoke the error-correcting function of redundancy. Consider the simplest code, which achieves higher reliability in a noisy environment by simply doubling each transmitted character. If any symbol is omitted in a message encoded this way, it can be reconstructed with certainty. Hence, the extra characters carry no information. Why then are they needed? Because they carry meta-information, a meta-message: “this message was not corrupted in transmission.” The feeling of “the right word in the right place” in a literary text can have a similar function, except that it is evidence that the message has been deciphered (understood) correctly. Lotman’s idea of a literary text simultaneously carrying a message and defining the language in which it is encoded is apt here.
If the reader feels that all the words are where they belong, this feeling is an assurance that the language was correctly decoded. For this, the text must have sufficient redundancy. But poetry, as opposed to prose, can transmit the meta-message via meter, as well as rhyme and other formal devices, which are not usable for encoding the message proper. By shifting the redundancy/predictability into these non-semantic domains, poetry frees up the semantic resources of the text for transmitting the meaning, which therefore can be made much denser. If that is true, it means that formal constraints paradoxically increase information content or rather open the possibility for such an increase. With some exaggeration, we could say that the formal organization of the text helps us accept linguistic constructs that would be impossible otherwise, those that expand the set of all admissible texts. There seems to be a clear parallel here with the remarkable result of McGlone and Tofighbakhsh (1999) who discovered that rhymed aphorisms are perceived as being more true than their non-rhymed semantic equivalents (while being equally comprehensible). Indeed, a rhymed saying has a higher total redundancy than the unrhymed equivalent, which is subconsciously perceived as a certificate of correctness, but this redundancy doesn’t come at the expense of the meaning-carrying capacity. In fact, the results of Manin (2012) make it possible to estimate the relative magnitude of entropy redistribution between semantic and formal aspects of a poem.


The probability of guessing a word is the product of two probabilities: that of guessing the word length only (the formal part) and that of guessing the word once its length is known (the mostly semantic part). Since entropy depends logarithmically on probability, and −log(pq) = −log p − log q, the total entropy can be approximated as the sum of formal and semantic entropies. It turns out that in prose they are approximately equal, while in metered poetry the formal entropy is very low, which means that semantic entropy is at least twice as high as in prose.

Perhaps, it is one of the functions of great poetry to expand the set of all admissible texts. “In order to be realized, poetry needs to transcend the boundaries of language, needs a kind of ex-stasis” (Geide, 2006). As a counterpoint, consider the following raw account, not mediated by theoretical thought, of how a new, and otherwise inexpressible, meaning is actually created for the reader (listener) when the poet “transcend[s] the boundaries of language.” Director T. Haynes (2007) discusses Bob Dylan’s song I’m Not There, after which he named his film about Dylan:

There’s words that no one can ever decipher, that he is sort of filling in as he is singing. And in a way because it escapes linguistic logic and mastery it almost communicates more than it ever could if it was fully legible, readable. It’s sort of beyond words.
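The character-doubling code invoked earlier in this section is simple enough to sketch in a few lines. The version below is an illustration under stated assumptions, not a description of any practical coding scheme. It assumes an erasure channel, in which the receiver knows which position was lost (marked here with `None`). Both the function names and the use of `None` as the erasure marker are hypothetical choices.

```python
from typing import List, Optional


def encode(message: str) -> str:
    """Doubling code: transmit every character twice."""
    return "".join(ch * 2 for ch in message)


def decode(received: List[Optional[str]]) -> str:
    """Decode a received sequence in which None marks a lost symbol.

    Raises ValueError if both copies of a character are lost or if the
    two surviving copies disagree (the message was corrupted).
    """
    if len(received) % 2 != 0:
        raise ValueError("received sequence must have even length")
    out = []
    for i in range(0, len(received), 2):
        a, b = received[i], received[i + 1]
        if a is None and b is None:
            raise ValueError(f"both copies lost at pair {i // 2}")
        if a is not None and b is not None and a != b:
            raise ValueError(f"copies disagree at pair {i // 2}")
        out.append(a if a is not None else b)
    return "".join(out)


sent = list(encode("poem"))   # every character appears twice
damaged = sent.copy()
damaged[3] = None             # one symbol is lost in transit
recovered = decode(damaged)   # the surviving copy fills the gap
print(recovered)
```

When nothing is lost, the second copy adds nothing to the message itself. Its only job is the check in `decode` that the two copies agree, which is the "meta-message" that the transmission arrived intact.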

Beyond Entropy

Though the Shannonian paradigm eventually proved quite productive in its applications to literary analysis, it is not the only possible information-theoretic approach. Kolmogorov complexity, which is roughly the length of the shortest program that generates a given text, provides another interesting perspective. Quoting Manin and Manin (2017) (see also Manin, 2003):

A natural language message is usually treated as carrying information. But it also can be treated as a program that runs in the brain of the receiver and whose purpose is to create a certain mind state in it. This interpretation is particularly interesting for literary texts, especially poetry, because their purpose is not conveying information, but rather imparting an emotional state to the reader. It is customary to state that successful poetry compresses its language and, consequently, if one wants to fully explicate the “meaning” of a good poem, an extensive prose text has to be written. So perhaps the right way to conceptualize a great poem is to say that it represents a maximally Kolmogorov-compressed representation of the target mind state.

Note that because Kolmogorov complexity is not computable, finding the maximally compressed description of something, whether a target mind state or an image, a scene, or a situation being described in the poem, is necessarily a creative act, something that can’t be done with an algorithm (faster than brute-force enumeration). Although this idea seems quite speculative, recent work by Grietzer (2017a,b) explores closely related territory in a spectacularly concrete fashion.

Grietzer’s main instrument is autoencoders, a class of machine-learning models based on neural networks. An autoencoder is “trained” on a large corpus of objects from a multidimensional feature space and “learns” a transformation that maps this input space into a low-dimensional representation space. In other words, a trained autoencoder embodies a compression algorithm. The compression can be undone, in the sense that any point in the representation space can be mapped back into the input space, but it is also, generally speaking, lossy, i.e., the reverse mapping will not necessarily reconstruct the original object exactly. However, there is an important low-dimensional submanifold in the input space, namely, the set of points that the trained neural net has learned to project into. We can call it the invariant manifold, because the objects belonging to it are compressed by the autoencoder losslessly and can be reconstructed from their representations exactly. A crucial property of the invariant manifold is that the best way to recreate the same mapping in another autoencoder is to train it on the first autoencoder’s invariant manifold. (Of course, when dealing with computer software, it’s easier to just copy the trained state to another instance, but that only works for instances with identical architecture.)

This is where art and literature come in. Everyone perceives the world by constructing an internal lower-dimensional representation of it. Only the most important features of the objects and events are represented there. Some important features are more or less shared by all people; others are defined by one’s unique perspective and personality. So every human essentially embodies an autoencoder. Suppose one wants to share one’s unique perspective on the world (or, rather, a fragment of it) with others. The best way to do it is to publish one’s invariant manifold (or, rather, a consistent piece of it) and let others train themselves on it. This describes creating artwork and reading/viewing/listening to it.
Interestingly, this description seems to circumvent the non-computability problem of Kolmogorov complexity, because instead of trying to find the shortest description of a predefined fragment of the world, the author takes an existing short description and constructs a fragment of the world best described by it. But of course, the analogy is somewhat broken at this point, because a work of art is not (usually) a literal fragment of the world, but a symbolic representation of it. Nevertheless, this looks like a very powerful paradigm with great promise.
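The notion of an invariant manifold described above can be made concrete with a deliberately tiny stand-in for an autoencoder. The sketch below is not Grietzer's construction and involves no training. It assumes a fixed linear encoder that projects points of a 2-D "input space" onto a single direction, and a decoder that maps the resulting 1-D code back. Real autoencoders are nonlinear neural networks fitted to data. The direction `U`, the sample points, and all function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

# Hypothetical "learned" direction: a unit vector spanning the 1-D code space.
ANGLE = math.radians(30.0)
U: Point = (math.cos(ANGLE), math.sin(ANGLE))


def encode(p: Point) -> float:
    """Compress a 2-D point to a single number: its coordinate along U."""
    return p[0] * U[0] + p[1] * U[1]


def decode(z: float) -> Point:
    """Map a 1-D code back into the 2-D input space."""
    return (z * U[0], z * U[1])


def reconstruction_error(p: Point) -> float:
    """Distance between a point and its encode-then-decode reconstruction."""
    q = decode(encode(p))
    return math.hypot(p[0] - q[0], p[1] - q[1])


on_manifold: Point = decode(2.5)     # a point the decoder itself can produce
off_manifold: Point = (1.0, -2.0)    # a generic point of the input space

err_on = reconstruction_error(on_manifold)
err_off = reconstruction_error(off_manifold)
print("error on the invariant manifold: ", err_on)
print("error off the invariant manifold:", err_off)
```

In this toy setting the invariant manifold is the line spanned by `U`. Points on it survive the round trip up to floating-point rounding, while a generic point is replaced by its projection onto the line, which is the lossy part of the compression.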

Conclusion

Thinking about poetry and literature in general in terms of information theory turns out to be surprisingly productive. Many centuries-old notions, as well as relatively recent developments in the humanities, can be naturally integrated into a cohesive paradigm. Moreover, recent technological developments make it possible to move from informed but informal speculation to verifiable, reproducible results that give solid ground to conclusions. Poetry emerges perhaps most importantly as language use that pushes the boundaries of standard syntax and semantics, expanding the set of all admissible texts. A poem can utilize highly unusual word combinations but still maintain a comfortable level of total redundancy with rich patterning on phonic, morphologic, and syntactic levels. This effectively increases its meaning-carrying capacity. The rules of the underlying language are relaxed and partially replaced with regularities particular to this specific piece. Thus, a poem can be described as a message that to some extent defines its own language. This in turn leads to the reduction of textual synonymy and the feeling of every word fitting harmoniously in its place, which likely is an important component of aesthetic satisfaction. We are still at the beginning of this journey, and one should expect many new developments in the years to come.

References

Abernathy R (1961) Mathematical linguistics and poetics. In: Davie D (ed) Poetics. Poetyka. Poetika. Państwowe Wydawnictwo Naukowe, Warsaw, pp 564–569
Aristotle (2008) Poetics. Cosimo classics. Cosimo Classics, New York. ISBN:9781605203553. https://books.google.com/books?id=L6MzVeeCyN4C. Translated by Butcher SH
Blake W, Erdman D, Bloom H (2008) The complete poetry and prose of William Blake. University of California Press. ISBN:9780520256378. https://books.google.com/books?id=pyaJajW3kEC
Burton NG, Licklider JCR (1955) Long-range constraints in the statistical structure of printed English. Am J Psychol 68(4):650–653
Cover TM, King RC (1978) A convergent gambling estimate of the entropy of English. Trans Inf Theory 24(4):413–421
Dumas JG, Roch JL, Tannier É, Varrette S (2015) Foundations of coding: compression, encryption, error correction. Wiley. ISBN:9781118960523. https://books.google.com/books?id=dAWBgAAQBAJ
Fónagy I (1961) Informationsgehalt von Wort und Laut in der Dichtung. In: Davie D (ed) Poetics. Poetyka. Poetika. Państwowe Wydawnictwo Naukowe, Warsaw, pp 591–605. In German
Geide M (2006) Dorechevoe [before speech]. Text only (a web publication), (17). http://textonly.ru/case/?issue=17&article=9538. In Russian
Grietzer P (2017a) Deep learning, literature, and aesthetic meaning, with applications to Modernist studies. Précis of April’17 HUJI Einstein Institute of Mathematics talk. https://medium.com/@peligrietzer/informal-research-overview-deep-learning-sense-andliterature-with-applications-to-modernist-fc22f12858ae
Grietzer P (2017b) Mood, vibe, system: the geometry of ambient meaning. PhD thesis, Harvard University. https://www.academia.edu/36402721/Mood_Vibe_System_The_Geometry_of_Ambient_Meaning
Haynes T (2007) Interview on NPR’s All Things Considered program, 15 Nov 2007. Audio recording. http://www.npr.org/templates/story/story.php?storyId=16303037
Hopkins GM, Blaisdell B (2013) Selected poems of Gerard Manley Hopkins. Dover Thrift Editions. Dover Publications. ISBN:9780486320779. https://books.google.com/books?id=rRHCAgAAQBAJ
Jakobson R (1987) Raboty po poetike [Works on Poetics], chapter “Post Scriptum” to Questions de poetique. Progress, Moscow, pp 80–95. In Russian
Jakobson R, Pomorska K, Rudy S (1987) Language in literature. Belknap Press series. Belknap Press. ISBN:9780674510289. https://books.google.com/books?id=5AEB8QfCtMMC
Joyce J (2012) Finnegans wake. Wordsworth. ISBN:9781840226614
Kharms D (2002) Zapisnye knizhki. Dnevniki. [Notebooks. Diaries.]. Akademicheskij proekt, SPb
Kolmogorov AN (1968) Three approaches to the quantitative definition of information. Int J Comput Math 2(1–4):157–168
Kolmogorov AN (1997) Semioticheskie poslaniia [Semiotic letters]. Novoe Literaturnoe Obozrenie (24):216–243. http://magazines.russ.ru/nlo/1997/24/kholmog.html
Kolmogorov AN (2002) On possible applications of basic information-theoretic concepts to the study of verse, literary prose and translation techniques. In: Uspensky VA (ed) Works in nonmathematics, vol 2. OGI, Moscow, pp 743–745. In Russian
Kontoyiannis I (1996/1997) The complexity and entropy of literary styles. NSF Technical report no. 97, Department of Statistics, Stanford University, June 1996/Oct 1997
Lotman IuM (1977) The structure of the artistic text. Michigan Slavic contributions. Department of Slavic Languages and Literature, University of Michigan
Mandel O (1998) Fundamentals of the art of poetry. Sheffield Academic Press. ISBN:9781850758372
Manin DYu (2003) Explanatory note [to the site ygrec.org]. In Russian. http://ygrec.org/cgi/kl/showComment.cgi?id=1
Manin DYu (2011) Chopped-up prose or liberated verse? An experimental study of Russian vers libre. Mod Philol 108(4):580–596. https://doi.org/10.1086/660697
Manin DY (2012) The right word in the left place: measuring lexical foregrounding in poetry and prose. Sci Study Lit 2(2):273–300. ISSN:2210-4372. https://doi.org/10.1075/ssol.2.2.05man
Manin DY, Manin YI (2017) Cognitive networks: brains, internet, and civilizations. In: Sriraman B (ed) Humanizing mathematics and its philosophy. Birkhäuser, Cham. ISBN:978-3-319-61230-0. https://doi.org/10.1007/978-3-319-61231-7_9
McGlone MS, Tofighbakhsh J (1999) The Keats’ heuristic: Rhyme as reason in aphorism interpretation. Poetics 26(4):235–244
Merriam-Webster online (2018) “poetry”. https://www.merriam-webster.com/dictionary/poetry
Moradi H, Grzymala-Busse JW, Roberts JA (1998) Entropy of English text: experiments with humans and a machine learning system based on rough sets. Inf Sci Int J (104):31–47
Paisley WJ (1966) The effects of authorship, topic structure, and time of composition on letter redundancy in English text. J Verbal Behav 5:28–34
Pereira FCN, Singer Y, Tishby N (1996) Beyond word n-grams. arXiv:cmp-lg/9607016
Perloff N (2017) Explodity: sound, image, and word in Russian futurist book art. The Getty Research Institute publications program. Getty Research Institute. ISBN:9781606065082. https://books.google.com/books?id=kYwmDwAAQBAJ
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423. ISSN:1538-7305. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Shannon CE (1951) Prediction and entropy of printed English. Bell Syst Tech J 30(1):50–64. https://doi.org/10.1002/j.1538-7305.1951.tb01366.x
Shklovsky V (2015) Art, as device. Poetics Today 36(3):151–174. Translated from Russian by Berlina A
Teahan WJ, Cleary JG (1996) The entropy of English using PPM-based models. In: Proceedings of data compression conference DCC’96, Snowbird
Uspensky VA (1997) A preface for the readers of Novoe literaturnoe obozrenie to Andrei Nikolaevich Kolmogorov’s Semiotic letters. Novoe Literaturnoe Obozrenie (24):123–209. http://magazines.russ.ru/nlo/1997/24/uspensky.html. In Russian
Uspensky VA (2002) Works in non-mathematics. OGI, Moscow. ISBN:5-94282-086-4. In Russian
Wikipedia (2018) Basic English. https://en.wikipedia.org/wiki/Basic_English
Yaglom AM, Yaglom IM (1983) Probability and information. Theory and decision library. Springer, Netherlands. ISBN:9789027715227

35 Metaphor: A Key Element of Beauty in Poetry and Mathematics

Sânziana Caraman and Lorelei Caraman

Contents

Introduction
Beauty in Poetry and Math
Metaphors in Mathematics
A Taxonomy of Mathematical Metaphors
Explicative or Homey Metaphors
Discovery or Eureka Metaphors
Creative or Special Metaphors
Mathematical and Poetic Metaphors: Differences and Similarities
Seven Differences Between Mathematical and Poetic Metaphors
Seven Reasons Why Metaphor Creates Beauty (Emotion) in Poetry and Mathematics
Cross-References
References

Abstract Seeking to bridge the gap between sciences and arts, this chapter explores the deeper connections linking together Mathematics and Poetry. From several common areas where they meet each other like old friends, to the centrality of metaphor and metaphorical thinking in both domains, Metaphor: A Key Element of Beauty in Poetry and Mathematics endeavors to carve out a dynamic and playful transdisciplinary space in which differences melt into similarities, allowing the mind to expansively perceive and embrace the boundless nature of the interaction between two fields commonly seen as being separate and disjoint.

S. Caraman
Department of Mathematics, Gheorghe Asachi Technical University, Iasi, Romania
e-mail: [email protected]

L. Caraman
Department of English, Alexandru Ioan Cuza University, Iasi, Romania
e-mail: [email protected]

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_110

Keywords Mathematics · Poetry · Metaphor · Beauty · Taxonomy

Introduction

    Many and many a year ago,
    In a kingdom by the sea

there lived together two different people, the Arts and the Sciences. It was in Plato’s and Archimedes’ time, when the dominant interest was in qualitative changes. Both reason and intuition were used for understanding natural phenomena. Grammar, rhetoric, and logic (the “trivium”) together with astronomy, music, arithmetic, and geometry (the “quadrivium”) were all intertwined under the name of “the seven liberal arts” and, for centuries, flourished side by side.

    The angels, not half so happy in Heaven,
    Went envying their peaceful life.
    And this was the reason that, long ago,
    In this kingdom by the sea
    A wind blew out of a cloud, chilling their beautiful harmony.

And so the Arts and the Sciences were now separated. When the Renaissance period started, the sciences were increasingly influenced by the mathematical quantitative formalism of Galileo Galilei. Each people built its own kingdom, but something was missing.

    For the moon never beams,
    And the stars never rise, without bringing dreams (Poe 1896a)

of a world that was not divided.

And so, after a while, around the nineteenth century, the Sciences and the Arts started to search for one another again. And from time to time, they even meet. As they do, for instance, in this book.

This chapter starts with a story, the allegory of the arts and sciences accompanied by Poe’s lines, because it is concerned with the atoms of story found in the world of imagination, spread like butterflies above the meadows of human fantasy. And because these tiny pieces of story transport thoughts into the realm of the fantastic, and also because they had to bear a name, they have been called metaphors, which in ancient Greek means “transfer.” As a schema of thinking, metaphor has articulated itself since the very dawn of civilization and, despite its long history, it still speaks of the eternal youth, freshness, unexpectedness, and spirit of adventure of the mind. The qualities that metaphor possesses are indispensable in the act of creation for two main domains of human spirituality, Poetry and Mathematics, and, as shown in the following, metaphor is not only an important link between them but also an essential ingredient that brings beauty to both domains.

Beauty in Poetry and Math

But is there any real connection between the two domains? At first glance, nothing seems more distant than these two terms, and pairing Mathematics with Poetry is as incredible as seeing an elephant dancing with a giraffe. Well, this may be the image held by somebody coming from the age when Art and Science were separated, a period that Henry calls the “modern barbary” (1987). If one looks closely enough at this subject, one may observe that, from time to time, Mathematics meets Poetry. The existing literature around this topic already highlights some important links (Marcus 1998; Aharoni 2005; Grosholz 2018; see also Chap. 32, “Mathematics and Poetry: Arts of the Heart” in the present book). A short overview reveals the following:

• Both belong to the same universe as they inhabit the realm of fantasy. Poems do not exist in the real world and numbers cannot be touched. Mathematical objects are abstract concepts that Graham-Dixon describes as “signposts to the next world, placed in this one” (qtd. in Pimm 2006). Poetry is interested in a second reality, which is located beyond our senses. In order to make this hidden world visible, poets create imaginary entities or phantasmagoric scenarios that may tell much more about this world than direct speech would.
• The main feature in the land of Math and Poetry is freedom. Poets and mathematicians have complete liberty in choosing their subjects or the approach to them: “unlike physics, there is no reality against which mathematical truth must be measured” (Sinclair 2011). Mathematics is “free from empirical reality” (Rota 1997), and Poetry’s freedom also has another valence, one that regards not the creator but the reader, who may have his or her personal interpretation.
• Another common element that links Poetry and Math is their attraction for mysteries and their fascination with infinity.
  Modern mathematics started when infinity was tamed and transformed into a tool; in this way, unbounded spaces were conquered by large infinities, while the subatomic world was discovered with infinitesimals. On the other hand, immortality, the absolute, eternity, the never ending, and the nevermore are favorite themes of poets.
• The pursuit of infinity, in all its possible forms, also has another ramification, which is the quest for truth. Jan Zwich highlights this common element, saying that “mathematics shows us necessary truth unconstrained by time’s gravity; poetry, on the other hand, articulates the necessary truths of mortality” (2006).
• Even if the road towards truth may be long and difficult, it is interesting to observe, both in Mathematics and Poetry, the glittering of a certain desire to play. Mathematicians often express complicated knowledge in the guise of a game. Poets love to play too, hiding the truth in their lines and leaving only some traces, like the pebbles Hansel and Gretel left in the woods. Poet T. Roethke observes that “a poet must have his childhood close at hand” (qtd. in Aharoni 2005).
• Mathematics and Poetry seem to diverge in one major area: that of means of expression. And yet there is a solid common feature, outlined by B. Lipman, who notes that “in both Mathematics and Poetry one finds a large amount of thought expressed in very few words” (qtd. in Albers et al. 1990). Both domains appear to respect the principle of “the maximum of meaning in the minimum of expression” (Marcus 1998).
• The search for common patterns is a main concern for Mathematics and Poetry (Birken and Coon 2008; Chap. 36, “Poems Structured by Mathematics” in the present book). Patterns can be found in arrangements of numbers or words, in progressions, repetitions, symmetries, in recurring symbols or images, in repeated occurrences of cycles, in proportions. There are patterns in forms and patterns in structures; there are visible patterns and hidden ones. And, as Warwick Sawyer notes, patterns and meanings are embedded into one another: “where there is a pattern there is a significance” (Sawyer 1955).

There are, therefore, real intersections between Mathematics and Poetry. But are all these enough to justify the assertion that “a mathematician who is not somewhat of a poet will never be a perfect mathematician” (Weierstrass qtd. in Moritz 1942) and also, perhaps, to understand J. Growney’s (2009) point of view that “good mathematics is poetry”? According to S. Glaz, it seems there is a more profound relation between Mathematics and Poetry, one that “defies all attempts to give it full explanation” (2010). I. Barbu speaks about a “spiritual high point” (1968) where these two domains meet, and L. Kempthorne thinks that both Mathematics and Poetry “share an affective quality, otherwise indescribable” (2015). Finally, what is this deep and, at the same time, mysterious connection that many refer to?
One answer, coming from those “of many far wiser,” may be that “there is another, more prominent common feature that makes us feel that mathematics and poetry are close and this is beauty” (2005). This assertion, belonging to mathematician R. Aharoni, is in harmony with the French poet P. Valéry, who considers that Mathematics and Poetry together give “a delicate and beautiful explanation of the world” (1952).

But what does mathematical beauty entail? Any attempt to delve into the topic of beauty soon falters, because the subject is caught in the crevasse between two slippery walls: subjectivity versus objectivity. Is beauty in the eye of the beholder or is it independent of the beholder? There are many who think that beauty in Mathematics is an objective quality. G.-C. Rota asserts that “both the truth of the theorem and its beauty are equally objective qualities, equally observable characteristics of a piece of mathematics which are equally shared and agreed upon by the community of mathematicians” (1997b). Others argue that beauty is projected onto mathematics by the observer. As evidence of the subjective nature of mathematical beauty, David Wells presented a list of 22 theorems to mathematicians around the world and found widespread variations in their aesthetic preferences (1990).


Mathematicians do agree on one point, however: it is not possible to lay down aesthetic norms for mathematical results. One may wonder then, what are the signs of beauty that a mathematical object has to possess? Bertrand Russell says that “mathematics, rightly viewed, not only possesses truth but also supreme, sublimely pure beauty, capable of a stern perfection such as only the greatest art can show” (qtd. in Moritz 1942). But what does “rightly viewed” mean? Is there any connection between truth and beauty? For some, there is. H. Poincaré and J. Hadamard were convinced that the subconscious plays a role in selecting beautiful ideas and that these are the ideas that lead to productive results. They emphasized the intuitive aspect of mathematical discovery and the fact that aesthetic combinations of ideas may influence the development of mathematical knowledge (Poincare qtd. in Newman 1956; Hadamard 1945).

For Poetry, the relation between truth and beauty is essential. Adopting the holistic view which says that the sum of the parts is less than the whole, poets also believe that separately the two concepts do not have the same value as together. Mathematically, it would sound like this: Truth + Beauty < Truth-Beauty, and the following lines illustrate this:

    Beauty becomes just valueless
    Where is no truth,
    And truth becomes charmless
    Where no beauty comes with
        The Truth Beauty, Muzahidul Reza (2018)

Sometimes, beauty and truth are seen as one and the same, which expressed mathematically becomes: Truth = Beauty, as in Keats’ famous Ode on a Grecian Urn:

    Beauty is truth, truth beauty, – that is all
    Ye know on earth, and all ye need to know.

Even if mathematical beauty cannot be defined, some important characteristics of mathematical products that have aesthetic merit have been identified. G.H. Hardy took as main criteria depth and significance, together with “pure aesthetic qualities” such as “unexpectedness, economy and inevitability” (1940). He believes that “the mathematician’s patterns, like the painter’s or the poet’s must be beautiful; the ideas, like the colors or the words must fit together in a harmonious way” (1940). One may thus speak about beauty not only in Poetry but in Mathematics as well. And yet, beauty is not universal: a theorem and a poem may both be beautiful but in different ways. So naturally one might wonder: is there a common element in the beauty of both Poetry and Mathematics? In Poetry, a main device that creates beauty is metaphor:

    Beauty is the language of metaphor
    And poetry without beauty
    It’s like poetry without metaphor.
        The Language of Poetry Is Metaphor, Shalom Freedman (2012)

Looking at the features of enlightenment, unexpectedness, and depth, which are widely recognized as sources of mathematical beauty, one realizes that these characteristics are also specific to metaphor. Therefore, the following question arises: is it possible that metaphors, or the “atoms of story” described at the beginning of this imaginary journey, are responsible for creating beauty, not only in Poetry, but in Mathematics as well?

Metaphors in Mathematics

Metaphors are organically and intrinsically poetic. While metaphor is not the sole technique for creating poetical effects, it is nonetheless so frequently used that sometimes Poetry itself is identified with the creation of metaphors. To Romantic poet William Wordsworth, poetry means “to discern similarities between things that look different to the passive observer” (1805). Metaphor, in turn, is defined by I.A. Richards as “a comparison between two seemingly dissimilar concepts that involves the carry on of a word from the normal use to a new use” (1936). It can easily be observed that the two definitions, for poetry and for metaphor, are very close. The Poetry = Metaphor relation becomes obvious, for instance, in the following lines:

    A poem is a metaphor
    Isn’t it?
    Aren’t we all poems?
    Aren’t we all metaphors?
        A Poem is a Metaphor, isn’t it? Shalom Freedman (2009)

Metaphor is a key element in Poetry, but lately many studies consider it to be essential for mathematics as well (Grosholz 2018). This is rather surprising, since a mathematical radiography of metaphor arrives at the following: there are two terms, A and B, which are said to be equal (A = B), when obviously they are different (A ≠ B). This fact contradicts the Aristotelian principle standing at the foundation of logic, the “tertium non datur.” “To be or not to be” is accepted by mathematical reasoning, but “to be and not to be,” this is the question with apparently only one answer: “impossible!”, for according to mathematical rules it leads to nothingness. And yet, paradoxically, the power of metaphor comes from its bivalent structure. Two terms stay side by side, in a relation where both analogy and heterogeneity coexist. These antagonistically positioned forces create a fruitful tension that generates emotions. This is why poets choose metaphor as their preferred instrument. Playing skillfully on metaphor’s strings, poets can make it sing in various ways, fulfilling a vast range of functions: to surprise, to intrigue, to impress, to clarify, to amaze, to doubt, to wonder, to suggest.

But what do all these have to do with mathematics, the world of rigor and rational thinking, equipped with an unequivocal conceptual apparatus? In the very “paragon of mental activity” (Nunez 2000) is there a place for metaphors? The traditional view of human thought has evolved, and mathematics can no longer be seen as “a universal system of disembodied eternal truths, independent of human beings” (Nunez 2000). Indeed, the latest discoveries and the new theories made mathematics shiver and new questions arose. Is the universe deterministic? Is visual space Euclidean? Is binary logic fitted to our reality? Is the continuum hypothesis true or false?

Beginning in the late nineteenth century and continuing through the twentieth, long-held models of thought began to change. Non-Euclidean geometries “burst into European consciousness” (Richards 2003) and showed that there are other conceivable descriptions of space. Chaos theory reveals that, in our universe, predictability cannot be taken for granted, quantum mechanics asserts that reality is probabilistic, not deterministic, and Gödel proved that undecidability exists at the very core of mathematics. Even the ancient binary logic of Parmenides, coming from the fifth century BC, which dominated human thinking for more than 2000 years, is no longer unique. An infinite-valued logic appeared, called fuzzy logic, which is a mathematical model of vagueness. It shelters under its roof, besides the two values of classical logic, true and false, the concept of “partial truth.” Intriguing also is the answer given by mathematicians regarding the veracity of the continuum hypothesis. The true and false possibilities are like the two faces of the same coin. P. Cohen proved that if the coin lands on one face, we get a certain mathematics, and if it lands on the other, another mathematics emerges. Finally, one arrives at the conclusion that “A = B and at the same time A ≠ B” is not so outrageous after all, and the notion of metaphor, with its speculative, imaginative, equivocal nature, is not far away from the present mathematical landscape.

Once our mind is liberated from what Lakoff and Nunez call the “romance of mathematics” (Lakoff and Nunez 2000), one realizes that metaphors are not only encountered quite often in mathematics, but they also fulfill a variety of functions. Sabine Maasen describes metaphors as “fascinating units of knowledge transfer and knowledge transformation.
By disorganizing the given stock of knowledge, they may eventually provide new perspectives on the issues in hand” (Maasen 2000). But metaphor is not a mathematical instrument. So how can it help mathematics? To answer with an analogy, it is similar to the way complex analysis, based entirely on a fictive entity, the imaginary number “i,” is used to solve problems from the real case. Metaphor is thus a benefit for Mathematics not in spite of, but because of, the fact that it is not a usual mathematical tool. As Arthur Miller says, metaphors are “means for extending our intuition into realms beyond sense perception. The study of the nature and uses of metaphor can help us to understand the process of accommodation and the nature of visual imagery” (Miller 2000).

A Taxonomy of Mathematical Metaphors

Why does Mathematics need metaphors? How do metaphors achieve their goal? What are metaphor’s realizations? An in-depth analysis of the different functions that metaphor fulfills in the vast domain of Mathematics, and of the concrete strategies it uses, followed by illustrative examples, may bring an answer to these questions. For mathematical metaphors, three main functions may be identified: to transmit knowledge, to transform knowledge, and to create new perspectives in knowledge. These give rise to the following main types of mathematical metaphors: “explicative or homey” metaphors, “discovery or eureka” metaphors, and “creative or special” metaphors (Caraman and Caraman 2020). Note that these classes of metaphors are not disjoint; a given metaphor is assigned to one of the three categories when one of its functions is dominant.

Explicative or Homey Metaphors

The metaphors belonging to this class are conceived with the purpose of explaining or translating from a complex mathematical language to a simpler mathematical one, or even into a nonmathematical one, to make comprehension easier. Sometimes their goal is not only to make communication clearer but also more pleasant and memorable. Briefly, their main function is to transmit knowledge. The “homey” metaphors have been used since Antiquity. Aristotle considered them to be “pleasant tools for learning” and they are still used nowadays, especially in teaching or in presenting mathematical results. While the explicative metaphor generally has an active profile (something is transmitted to somebody else), one may also encounter it in a reflexive role, since it helps in gaining mathematical insight. As a mathematician confesses: “to understand a new concept I must create an appropriate metaphor. A personification. Or a spatial metaphor. A metaphor of structure. Only then can I answer questions, solve problems. I may even be able to perform some manipulation on the concept. Only when I have the metaphor. Without the metaphor I just can’t do it” (Sfard 1994).

At this level of metamorphosis, the first term of a metaphor, the ground (A), is a mathematical object and the second term of the metaphor, the figure (B), may be a more familiar mathematical object (such as a graph, a diagram, a geometrical figure) or an object pertaining to a domain other than mathematics (for instance: a structure, a human activity, a drawing, a figurative expression, a riddle, a story). Regardless of the choice of (B), be it a mathematical object or not, the explicative metaphors offer a new way of looking at the mathematical entity (A), that “is often accompanied by a feeling of astonishment or of things falling into place of their coming home” (Zwich 2006). This is the reason why this class of metaphors can also be called “homey.”

In the following, we illustrate how explicative metaphors work by choosing as (A), the first term of the metaphor, a famous theorem from topology, Brouwer’s fixed point theorem, which asserts that:

(A) “Any continuous function f that maps a closed ball of an n-dimensional Euclidean space into itself has at least one fixed point, that is, there exists an x such that f(x) = x.”

For different dimensions of the Euclidean space, appropriate metaphors may be imagined. For n = 1, the ball is a closed bounded interval [a, b] and in this case Brouwer’s theorem becomes a well-known result from Mathematical Analysis. Constructing a graph for the given function f, the abstract analytic form of the theorem (A) may be replaced by a less abstract formulation (B) (Fig. 1):

Fig. 1 Visual representation of Brouwer’s theorem using a graph (axes x and y, square OABC, curve f(x))

Fig. 2 Visual representation of Brouwer’s theorem using a structure

(B) “Any continuous curve, lying inside the square OABC that unites two points from the opposite vertical sides, crosses the diagonal AC of the square in at least one point.”
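
The one-dimensional case can also be checked numerically. A fixed point of f is a point where the graph of f meets the line y = x, that is, a zero of g(x) = f(x) − x; since f maps [a, b] into itself, g(a) ≥ 0 and g(b) ≤ 0, so bisection can trap such a zero. The following is only a minimal sketch under these assumptions: the function name `fixed_point_bisection`, the tolerance, and the sample function cos on [0, 1] are illustrative choices, not taken from the chapter.

```python
import math

def fixed_point_bisection(f, a, b, tol=1e-12, max_iter=200):
    """Approximate a fixed point of a continuous f that maps [a, b] into itself.

    Illustrative helper (hypothetical name): bisection on g(x) = f(x) - x.
    """
    g = lambda x: f(x) - x              # a fixed point of f is a zero of g
    lo, hi = a, b
    g_lo, g_hi = g(lo), g(hi)
    # Endpoint check only: f(a) >= a and f(b) <= b must hold if f maps [a, b] into itself.
    if g_lo < 0 or g_hi > 0:
        raise ValueError("f does not map [a, b] into itself at the endpoints")
    if g_lo == 0:
        return lo
    if g_hi == 0:
        return hi
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_mid == 0 or hi - lo < tol:
            return mid
        if g_mid > 0:                   # the sign change lies to the right of mid
            lo = mid
        else:                           # the sign change lies to the left of mid
            hi = mid
    return (lo + hi) / 2

# cos maps [0, 1] into [cos 1, 1], a subset of [0, 1], so the theorem applies.
x_star = fixed_point_bisection(math.cos, 0.0, 1.0)
print(x_star, math.cos(x_star))
```

Bisection is used here only because it mirrors the picture of a curve forced to cross the line y = x; it finds one fixed point, while the theorem itself says nothing about how many there are.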

For n = 2, a metaphor may be constructed by using a structure with two sheets of paper. Note that, in topology, a disk is topologically equivalent to a rectangle, so that the use of sheets of paper is suited to the content of the theorem (Fig. 2). (B) “If we take two sheets of paper, one lying directly above the other, and crumple the top sheet and place it on top of the other sheet, then there is at least one point on the top sheet that is directly above the corresponding point on the bottom sheet.”

For n = 3, a metaphorical image may be realized with the help of a diagram (Fig. 3): (B) “If a ball is stretched, moved, distorted, without tearing, and the body remains within the same place it previously occupied, then there is a point that did not move.”

An inspired “imagistic domestication” (Wall 2012), for the three-dimensional case, was formulated by Brouwer himself, choosing the second term of the metaphor, the figure, from the domain of human activities. Using a familiar object, a cup of coffee, he employed the following analogy to explain the statement of the theorem:


Fig. 3 Visual representation of Brouwer’s theorem using a diagram

(B) “No matter how much we stir the coffee in our cup, finally there will be a point in the same place as before stirring.”

As they particularly serve teaching purposes, homey metaphors are also designated as “pedagogical metaphors” (Gibbs 1994). Therefore, the choice of (B) may depend on the audience. For instance, if the fixed-point theorem is presented to students of Civil Engineering or Architecture, then an adequate metaphor would be: (B) “If one walks inside a building holding a scale model of it, then, in whichever room one enters and in whichever position one holds the model, there will always be a point from the model coinciding with the real point in the building.”

The homey metaphor may also be useful in rendering the beauty of mathematics to nonmathematicians. The various metaphors given by the above examples help not only to understand the content of the theorem but also to grasp its beauty. Or at least part of it. Because the beauty of the theorem does not come only from its simple and elegant form or from its intriguing content. It also has depth and relevance, and these derive from the tremendous number of applications that Brouwer’s theorem generates.

The creation of metaphors using narrative scenarios is frequently employed in the mathematical community. This way, “mathematics is concretely experienced; not the abstract logic of it, but heartbeat-by-heartbeat sense of getting the piece of knowledge through one’s eyes, hands, thoughts” (Netz 2012). Many examples are to be found in Mathematics, where complex mathematical objects are replaced by literary constructions such as stories, riddles, or poems. For instance, Hilbert imagined a metaphor showing an incredible property of infinite sets, which is the following: “a part of the set may have the same size as the whole set.” The narrative scenario describes a hotel with an infinite number of rooms. A guest arrives, but all the rooms are full. The manager has an idea. He asks each guest to move to the next room and, in this way, room number 1 becomes vacant. The next day an infinite number of guests come to the hotel, which is again full. Then the manager asks the guest from room 1 to move to room 2, the guest from room 2 to move to room 4, the guest from room 3 to move to room 6, and so on. In this way, an infinite number of rooms are cleared, namely all those with odd numbers. Regarding the above metaphor, R. Aharoni confesses that “for anyone who encounters Hilbert’s hotel for the first time, this probably seems mystifying, even entertaining, and possibly even beautiful. I must admit that, even as a professional mathematician, who uses this idea on an everyday basis, Hilbert’s hotel still has not lost its charm for me” (2005).

Homey metaphors give Mathematics a human appearance. One encounters Dido, the queen of Carthage, thinking about isoperimetric problems: “What shape should this land have, enclosed by the strip made from a single ox hide, in order to obtain the greatest area for my future kingdom?” It is also possible to meet Bertrand Russell’s barber wondering about Russell’s paradox: “To shave or not to shave myself?” One may even see an imaginary cat pursuing an imaginary mouse, both running with constant speed inside a circle, so that everyone may understand the continuity concept, or a monkey typing symbols until the whole work of Shakespeare is reproduced.

Metaphors encapsulated in figurative expressions are also seen, from time to time, in Mathematics. One finds the “golden number” or the “divine proportion” for a famous irrational number, “the butterfly effect” and the “horseshoe of Smale” for small changes that can produce huge effects in time, the “monstrous set,” “the pathological curve,” “Menger sponge,” “Koch snowflake,” or “the fingerprint of God” for fractals, the “kissing number” for generating the Sierpinski gasket stochastically, the “small infinity” for constructing the limit, the “blow up” for geometric transformations that replace a subset of a given space by all the directions pointing out of that subspace, “Hadamard’s billiards” for the chaotic motion of a free particle sliding frictionlessly on a surface of constant negative curvature, “Eratosthenes’s sieve,” which is an algorithm for finding prime numbers, or “Gabriel’s horn,” a geometrical figure with infinite surface and finite volume. Lakoff calls these types of metaphors “extraneous” (Lakoff and Nunez 2000), considering them to be expressive metaphors, like those in Poetry. However, on closer inspection, one observes that they also have interesting functions.
They may be “economical carriers of complex meanings” (Maasen and Weingart 1997), sometimes emphasizing characteristic features of the mathematical object they refer to, or representing precursory names of a future concept. Each metaphor has its own universe and exploring it can prove to be an interesting journey and a way of entering the wonderland of mathematics.

Now, turning to Poetry, one may observe that many metaphors are modelled using the strategy of the “homey” metaphor. That is the case with metaphors for which the ground (A) is an abstract object and the figure (B) is a concrete one, as in the following poem:

    “Hope” is the thing with feathers –
    That perches in the soul –
    And sings the tune without the words –
    And never stops – at all –
        Hope, Emily Dickinson (1961)

And indeed, “abstract words cannot touch the roots of thought, nor the profundities of emotions . . . To reach people’s mind it is better to construct images instead of abstractions, pictures that are close to the roots of thought. Mathematicians know this, as do poets.” (Aharoni 2005).
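
Returning to Hilbert’s hotel, the manager’s two instructions described earlier in this section are simply two maps on the room numbers: n ↦ n + 1 for a single new guest and n ↦ 2n for infinitely many. The short sketch below applies them to the first few rooms only; the function names and the cut-off `N` are hypothetical, chosen for illustration, and a finite prefix can of course only hint at the infinite picture.

```python
def admit_one_guest(room):
    """Occupant of `room` moves one door down, so room 1 becomes free."""
    return room + 1

def admit_infinitely_many(room):
    """Occupant of `room` moves to room 2 * room, so every odd room becomes free."""
    return 2 * room

N = 10  # illustrative cut-off: we only look at the guests in rooms 1..N
after_one = [admit_one_guest(n) for n in range(1, N + 1)]
after_many = [admit_infinitely_many(n) for n in range(1, N + 1)]

# Rooms up to 2N that none of the first N guests moved into.
occupied = set(after_many)
freed = [r for r in range(1, 2 * N + 1) if r not in occupied]

print(after_one)   # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(after_many)  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(freed)       # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```

Both maps are one-to-one, so no two guests ever end up in the same room; this is the sense in which a part of the set (the rooms from 2 onward, or the even rooms) has the same size as the whole.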


Discovery or Eureka Metaphors

An “insight into the wonderful concatenation of truth” (Gauss qtd. in James 2002) reveals a very simple thing about Mathematics. New knowledge does not appear out of nowhere. It is rather old knowledge that undergoes a transformation. This metamorphosis of old into new knowledge is to be found in the schema of the “discovery” metaphor. In this case, both terms of the metaphor, (A) and (B), are mathematical objects. But while the explicative metaphor acts at the surface of the mathematical object, providing a better interpretation, the discovery metaphor works from inside the mathematical object; it helps to test hypotheses, to invent new methods, or to find ideas in demonstrations. The main function of discovery metaphors is to transform knowledge.

It is worth noticing that, at this level, metaphors were not easily accepted in science and, for many years, they were hotly debated. Some consider that metaphors are “hazardous, controversial and provisional” (Polya 1954) or “speculative, imaginative and not very precise” (Bailer-Jones 2000). Others think they are just “ornamental, inessential and dangerous” (Maasen and Weingart 1997). Even if they seem to be a gift, they may be a “Trojan horse” and are not to be trusted. Butler agrees that analogy is often misleading but “it is the least misleading thing we have” (1919). On the other hand, science without metaphor (analogy) is inconceivable. C. Simic sees metaphor as the “supreme way of searching for truth” (1990) and L. Blaga says that “the scientific spirit is the tactful chief of analogy” (1937). The change of perspective with respect to the subject of metaphor happened around the 1960s and was greatly influenced by the works of M. Black, M. Hesse, and R. Harre, in which scientific models are viewed as metaphors (Black 1962; Hesse 1966; Harre 1960). Two decades later, the works of G. Lakoff and M. Johnson highlighted the conceptual nature of metaphor and brought a scientific foundation to the present topic (Lakoff and Johnson 1980). “The scandal of the metaphor” (Eco 1983), as Umberto Eco calls it, is far from being over. However, the idea that metaphor is a figure of thought, not merely of language, has revolutionized the way metaphors are perceived. A metaphorical description of the discovery metaphor may be grasped in the following lines:

    What be metaphor?
    If not a way to help explore
    Wonders of Collective Minds
    Eyes that see or eyes of blind.
        Metaphor, Ray Lucero (2008)

The two mathematical objects (A) and (B) of the metaphor may come from different fields of mathematics, being the creations of several minds, sometimes even of generations of mathematicians. A discovery metaphor may thus be an exploration of the wonders of collective minds. While (A) and (B) themselves are not hidden and may grab the attention of the eyes that see, the fruitful connection between (A) and (B) can be seen only in imagination, by the eyes of the blind.

35 Metaphor: A Key Element of Beauty in Poetry and Mathematics

In mathematics, discovery metaphors are fairly standard and, in many cases, one finds them at the turning point of demonstrations; this is why they can also be called “eureka” metaphors. According to the different strategies that they employ, they can be further classified into: “as if” metaphor, “billiard shot” metaphor, “hidden” metaphor, and “two faced” metaphor (Caraman and Caraman 2020).

As If Discovery Metaphors (A) → (B) ⇒ (A)

In this case, the strategy is the following: either (A) is modified in a convenient way so that it becomes (B), or a suitable analogy is found for (A), denoted again by (B). Then, one returns to (A), now enriched with the knowledge gained from (B). This type of metaphor may be recognized in one of the most beautiful theorems of Mathematics, Cantor’s theorem from set theory. It asserts that: The size of the set of real numbers is greater than the size of the set of natural numbers.

Cantor used the reductio ad absurdum method. It is enough to show that [0,1] is uncountable; it then follows that the set R is uncountable too. The first term of the metaphor is: (A) the set $\{x \in R \mid 0 \le x \le 1\}$ is countable. This means that one may take the first number, the second, the third, and so on, and arrange them one under the other. The decimals now form a giant square. In this way, Cantor moved the problem from an abstract scenario (A) to a visual scenario (B), which is a square with infinite sides. Since any square has a diagonal, Cantor walks along this diagonal, picking up its digits one by one. From the first row (the decimals of the first real number with the integer part 0) he takes the first digit $d_1$, from the second row (the decimals of the second real number with the integer part 0) he takes the second digit $d_2$, and so on. He obtains an infinite sequence of digits: $d_1 d_2 d_3 \ldots d_k \ldots$ And then, like in a fairy tale, he changes all the digits. Not into frogs or pumpkins but into other digits. This is possible by using the spell of an algorithm, for instance, $\delta_k = d_k + 1 \pmod{10}$. In this way, a sequence of digits $\delta_1 \delta_2 \delta_3 \ldots \delta_k \ldots$ is formed that cannot be found on any row of the square, because it differs from the k-th row in the k-th digit. Therefore (B) is not possible and, since (B) is nothing else but a modified (A), it follows that (A) is not possible either. The set [0,1] is thus uncountable, which means that $|R| > |N|$.

The “as if” strategy is also frequently encountered in poetry, as revealed in the following lines: it’s some specific calling, saying, see that far-off other? we’re are related; see us as together, your surprise may tell you about the vastness- maybe, awe, wonder, beauty, wisdom, truth . . . In praise of metaphor, Michael Shepherd (2007a)
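Returning to Cantor's walk along the diagonal, the digit-changing step can be tried on a finite corner of the infinite square. The following is only an illustrative sketch: the 5 x 5 table of digits and the names `square` and `diagonal_escape` are hypothetical, and a finite table can of course only hint at the infinite argument.

```python
from typing import List


def diagonal_escape(rows: List[List[int]]) -> List[int]:
    """Return the digits delta_k = (d_k + 1) mod 10, where d_k is the k-th digit of row k."""
    return [(rows[k][k] + 1) % 10 for k in range(len(rows))]


# Hypothetical finite corner of the "giant square": row k holds the first
# five decimals of the k-th number in the supposed enumeration of [0, 1].
square = [
    [1, 4, 1, 5, 9],
    [7, 1, 8, 2, 8],
    [3, 3, 3, 3, 3],
    [0, 0, 0, 0, 0],
    [9, 9, 9, 9, 9],
]

delta = diagonal_escape(square)

# The new sequence disagrees with row k at position k, so it equals no row.
differs_from_every_row = all(delta[k] != square[k][k] for k in range(len(square)))
print("new digits:", delta)
print("differs from every row:", differs_from_every_row)
```

Whatever digits are placed in the table, the constructed sequence disagrees with each row on the diagonal, which is the whole content of the "spell".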

The linking of the two terms of the metaphor produces not only beauty but also wisdom. And sometimes one may discover a whole world in a sole poem:

S. Caraman and L. Caraman

Letters can change into music, Words can turn into magic, A pen can be your sword, A poem can be the world. A poem can be the world, Gabrielle Yana Concepcion (2014)

Billiard Shot Discovery Metaphors (B) ⇒ (A)

In the “billiard shot” strategy, one starts with (B) and, either by modifying it or by finding a suitable analogy, one arrives at (A), where interesting properties of (B) are used. It is like in the game of billiards, when it is not possible to move the ball (A) directly, so one first hits the ball (B), which is going to reach (A) and, with its force, change the position of the ball (A). An illustrative example of this type of metaphor is the way Archimedes computed the area A of the circle. He first considered a right triangle with one cathetus equal to the radius r of the circle and the other having the same length as the circumference C of the circle. The area of this triangle is $T = \tfrac{1}{2} rC$. There are three possibilities: A > T, A < T, or A = T. In order to prove that both A < T and A > T are false, Archimedes used a double reductio ad absurdum method and two other metaphorical constructions, the inscribed and the circumscribed polygons. His argument is based on two facts. The first is an observation regarding the relation between the circumference C of the circle, the perimeter p of the inscribed polygon, and the perimeter P of the circumscribed polygon, namely p < C < P. The second assertion is more sophisticated, as it already contains the seeds of the integral calculus. It claims that for any circle one can find polygons with area “as close as we want” to the area of the circle. If T − A > 0, then there exists a circumscribed polygon with apothem h (which, for a circumscribed polygon, equals the radius r) such that A(circ. polygon) − A < T − A. It follows that $A(\text{circ. polygon}) < T = \tfrac{1}{2} rC < \tfrac{1}{2} hP = A(\text{circ. polygon})$, which is false. Similarly, using an inscribed polygon, it is proved that A − T > 0 is not possible. Therefore A = T and, since C = 2πr, the area of the circle is $A = \pi r^2$ (Fig. 4).

Fig. 4 The billiard-shot metaphor in Archimedes’ proof
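The squeeze between inscribed and circumscribed polygons in Archimedes' argument can be checked numerically. This is a minimal sketch, not a reconstruction of his method: it assumes regular n-gons, uses modern trigonometric functions (which already presuppose π), and the function names are illustrative.

```python
import math


def inscribed_area(n: int, r: float = 1.0) -> float:
    """Area of a regular n-gon inscribed in a circle of radius r."""
    return 0.5 * n * r * r * math.sin(2 * math.pi / n)


def circumscribed_area(n: int, r: float = 1.0) -> float:
    """Area of a regular n-gon circumscribed about a circle of radius r.

    Computed as (1/2) * h * P with apothem h = r and perimeter P = 2 n r tan(pi / n).
    """
    perimeter = 2 * n * r * math.tan(math.pi / n)
    return 0.5 * r * perimeter


r = 1.0
T = 0.5 * r * (2 * math.pi * r)  # area of Archimedes' triangle, (1/2) r C

for n in (6, 12, 24, 48, 96):
    low, high = inscribed_area(n, r), circumscribed_area(n, r)
    print(f"n={n:3d}  inscribed={low:.6f}  T={T:.6f}  circumscribed={high:.6f}")
```

For every n the triangle's area T sits strictly between the two polygon areas, and the gap shrinks as n doubles, which is the "as close as we want" claim in numerical form.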

In poetry, the “billiard-shot” strategy is highly praised as it can, in the words of Emily Dickinson “tell all truth, but tell it slant.” In many cases, only the figure (B) is visible, while the ground (A) is implied. Michael Shepherd describes metaphor as the: incongruous instrument of speech with which we say one thing, when we mean quite another. “Seeking Metaphor,” Michael Shepherd (2007b)

This definition suits the billiard-shot metaphor perfectly. As in the poem below, where the poet uses the image of the sea to touch upon the theme of his own life: I’d like to stand on deck on a boat and jump in the sea and say, follow me, and know you would. The sea is cold and it’s deep, too, I’d joke standing at the edge of the boat’s bow. A wind breathes across the sea, joining gently the edges of time. Vestiges, A. Van Jordan (2005)

Hidden Discovery Metaphors (A) → ? ⇒ (A)

This type of metaphor appears when techniques of (B) are used in (A) because there is an analogy between (A) and (B), but (B) is not mentioned. In mathematics, metaphorical thinking is encountered quite often as a cognitive process, but in many cases this process remains invisible. For instance, when a polynomial is decomposed into factors, one has in mind the decomposition of the integers into prime factors. The set Z of integers, with its algebraic properties, is also a good analogy for other sets, such as the continuous functions, the matrices, and the series. This will lead (see “Creative Metaphors”) to the so-called “birth-concept” metaphors. Poetry is populated with hidden metaphors, which enhance the atmosphere of mystery. There are cases when only traces of the figure are shown, as in the following lines. Tell me, if I caught you one day and kissed the sole of your foot, wouldn’t you limp a little then, afraid to crush my kiss? Poem, Nichita Stanescu (1964)

In the above poem, the ground (A) is the kiss and the figure (B) is an unknown precious and delicate thing. Only its trace is visible: “wouldn’t you limp a little then?” If (B) had been revealed to the reader’s eyes (a flower, a jewel, a cake, a letter), the aesthetic effect would not have been as strong.

Two-Faced Discovery Metaphor (A) ⇐⇒ (B)

In this category of metaphors, the roles of ground and figure are switched. Not only does (B) have an effect on (A), but in its turn, (B) is semantically influenced by (A). For instance, in the Euclidean plane, vectors can be seen as pairs of real numbers, and conversely. That is, if P(a, b) is a point of the Cartesian plane xy, then it may be identified with the vector $\overrightarrow{OP}$ having its tail at the point O(0,0) and its head at P. In turn, a vector $\overrightarrow{OP}$ may be identified with the pair of numbers (a, b), which are the scalar projections of the vector on the coordinate axes (Fig. 5).

Fig. 5 Visual representation of two-faced metaphors

As in Poetry, where a certain figure (B) may be associated with different grounds (A), the Euclidean plane is also encountered in other metaphorical constructions. An example is the identification of the complex numbers with points in the plane by replacing the y-axis with an imaginary axis. M. Reyers calls this metaphoric maneuver “rhetorical-discursive invention” (see Chap. 102, “Mathematics and Rhetoric” in the present handbook). In Poetry, two-faced metaphors increase the effect of unexpectedness. For instance, the world is famously identified with the stage and life with the play. “Life” is thus the ground and the “play” is the figure. In the following poem, the traditional roles change; the figure is “the life” and the ground becomes the “play”: “And so, the play is over, like the play of the life of a man.” The Book of Life, Jacob Steinberg (qtd. in Aharoni 2005)

Creative or Special Metaphors

These metaphors fit A. Miller’s definition: “being tools for exploration, metaphors provide entry into possible worlds.” (Miller 2000). Their paradoxical nature has a stimulating effect: “it pushes us out of our airtight logical mental compartments and opens the door to new ideas, new insights, deeper understanding.” (Byers 2007). According to Kepler, the value of analogy lies in the spacious field of invention that it opens, rather than in the argumentation (Kepler 1960). Generally, in mathematics, a metaphor is meant to solve a concrete problem, but sometimes
the metaphoric maneuver surpasses the boundary of the particular problem which inspired it. One may say that, in this case, the metaphor gains a flavor of universality. Creative metaphors come either from explicative metaphors or from discovery metaphors, but they have a special destiny. For instance, the metaphor that Cantor imagined in order to prove that the infinity of the reals is greater than the infinity of the naturals was taken up and applied to prove other results. His metaphor is thus a creative one, known as the “diagonal method.” Among special metaphors, some interesting subtypes include dictatorial special metaphors, germinating special metaphors, birth concept special metaphors, and bird special metaphors.

Dictatorial Special Metaphors

The name “dictatorial” was inspired by Solomon Marcus, who speaks about the “dictatorship of metaphors” (Marcus 2011). He refers to those powerful metaphors that may even dictate new directions of research in certain domains. The “arrow” metaphor in category theory, the “chaos” metaphor for nonlinear system analysis and even for science in general (see James Gleick’s book “Chaos: Making a New Science”), the “virus” metaphor in informatics, the “black box” in cybernetics, the “neurological” metaphor in the domain of international relations, and the metaphor of the “chess game” in linguistics and ethnology are only a few examples. These metaphors may also dictate a certain direction in the way reality is perceived, because our thoughts are prisoners of the figure of speech in which knowledge is shaped. A. Wall warns about the danger of forgetting that we are inside a metaphor: “delude yourself that it is not a metaphor, merely a transparent filament between yourself and reality, and you are trapped, like a metaphoric fly in a metaphoric amber.” (Wall 2012).
He gives as an example the case of Heisenberg who, when he formulated the Uncertainty Principle, had to defeat a misleading metaphor, namely the “path of the electron.” Therefore, “metaphor is a double-edged sword” (Sfard 1998); on the one hand, it provides a “terra firma” for new knowledge, and on the other hand, it may confine imagination within its own bounds.

Germinating Special Metaphors

“Germinating” metaphors are those metaphors that have such a high degree of suggestion that they transgress the frontiers of their own domain. They are like seeds that can sprout in other fields. Even if conceived with the goal of better communicating a mathematical result, they nevertheless succeed in inflaming human imagination and their spread may be spectacular. A vivid example is the “butterfly effect” metaphor. Its story started with Edward Lorenz’s attempt to study the behavior of the solutions of a nonlinear system of differential equations. The simplified mathematical model of atmospheric convection that he considered manifested a sensitive dependence on initial conditions. This means that two trajectories that start from very close initial conditions may subsequently become totally different. The amazing result was presented in 1972 by E. Lorenz at the AAAS meeting, under an equally amazing title: “Predictability: Does the flap of a butterfly’s wing in Brazil set off a tornado in Texas?”
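Sensitive dependence on initial conditions is easy to observe numerically. The sketch below assumes that the model in question is the classic three-variable Lorenz system with the commonly used parameters σ = 10, ρ = 28, β = 8/3; the starting point, the step size, and the fixed-step Runge-Kutta integrator are illustrative choices, not taken from the chapter.

```python
import math
from typing import Tuple

State = Tuple[float, float, float]


def lorenz(s: State, sigma: float = 10.0, rho: float = 28.0, beta: float = 8.0 / 3.0) -> State:
    """Right-hand side of the Lorenz system (assumed classic parameter values)."""
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(s: State, dt: float) -> State:
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(k: State, h: float) -> State:
        return (s[0] + h * k[0], s[1] + h * k[1], s[2] + h * k[2])

    k1 = lorenz(s)
    k2 = lorenz(shifted(k1, dt / 2))
    k3 = lorenz(shifted(k2, dt / 2))
    k4 = lorenz(shifted(k3, dt))
    return (
        s[0] + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
        s[1] + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
        s[2] + dt / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2]),
    )


def distance(a: State, b: State) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Two trajectories whose starting points differ by about 1e-8 in x only.
a: State = (1.0, 1.0, 1.0)
b: State = (1.0 + 1e-8, 1.0, 1.0)
dt, steps = 0.01, 4000

initial_gap = distance(a, b)
max_gap = initial_gap
for _ in range(steps):
    a, b = rk4_step(a, dt), rk4_step(b, dt)
    max_gap = max(max_gap, distance(a, b))

print(f"initial gap: {initial_gap:.2e}")
print(f"largest gap over the run: {max_gap:.2e}")
```

The two runs are indistinguishable at first and end up on unrelated parts of the attractor, which is the "flap of a butterfly's wing" in miniature.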

The short version of this metaphor is known as the “butterfly effect” and its widespread use is prodigious. One finds it not only in physics, chemistry, biology, and generally in sciences but also in music, video games, cinema, fiction novels, fashion shows, and art galleries. It is interesting that this principle, that small changes may have drastic consequences, was formulated, in literature, long before Lorenz. In 1800, in the novel “The vocation of man,” Fichte wrote: “You could not remove a single grain of sand from its place without thereby changing something throughout all parts of the immeasurable whole” (Fichte 1800). This proves that the words which carve the truth are as important as the truth itself. And sometimes the form of a metaphor may surpass the message it carries. Birth Concept Special Metaphors Yuri I. Manin imagined the future of mathematics as “an exploration of metaphors that are already visible but not understood.” (2007). Birth concept metaphors are precisely this kind of metaphors. They describe a certain mathematical phenomenon for which there still does not exist a rigorous mathematical definition. They are a kind of proto-concepts or metaphors that prepare a birth concept. Solomon Marcus says that the way from metaphor to concept is gradual and one may speak of degrees of the metaphorical and degrees of the conceptualization (Marcus 2011). In mathematics, there are many examples of such metaphors: continuous functions are imagined by Euler, using the metaphor “a curve described by freely using the hand” and Dedekind speaks about a linear continuum, as “a dense aggregate with no gaps.” The “small infinity” precedes the limit concept. It is with the help of the “infinitesimals” that Leibniz defined the notion of differential and Newton, the fluxions. The birth of a new concept may sometimes encounter difficulties, until it reaches its perfect shape. 
For instance, the use of “infinitesimals” (entities smaller than any feasible measurement, yet not zero) was criticized by Berkeley, who considered them a fuzzy notion and ironically named them “ghosts of departed quantities” (1734). The struggle of finding names for things is also specific to poetry: . . . as imagination bodies forth The forms of things unknown, the poet’s pen Turns them to shapes and gives to airy nothing A local habitation and a name. A Midsummer Night’s Dream, William Shakespeare

There are also birth concept metaphors whose point of origin is a discovery metaphor. When a figure (B) proves to be useful for several grounds (A), a usual technique is to empty (B) of its elements while keeping its structure; in this way, (B) is replaced by a newborn abstract concept. For instance, Z, the set of integers, is a fruitful analogy for different sets: the polynomials in one variable, the continuous functions, the matrices, and the series. Mathematicians therefore created a depersonalized new concept called “ring,” by taking only the shell of Z (the algebraic properties) without its elements, that is, without the integers. The strategy used in this case is the reverse of the “homey” metaphor. One may even name them “yemoh” (homey spelled backwards) metaphors, but in mathematics,
one calls it “generalization” and it is a frequently used process. Even if in poetry the customary way of creating metaphors is by replacing the abstract by the concrete, sometimes one encounters the reverse (the “yemoh” metaphor) as in the following lines by Edgar Allan Poe: A full-orbed moon, that, like thine own soul, soaring, Sought a precipitate pathway up through heaven. To Helen, E.A. Poe (1896b)

Bird Special Metaphors Freeman Dyson, in the Foreword to Yuri Manin’s “Mathematics as Metaphor,” speaks about metaphors as being “the greatest achievements in mathematics, linking one world of ideas with another.” (2007). He even makes a daring description of mathematicians: some, he argues, are “birds” and others are “frogs.” Those capable of creating metaphors, the birds, are those whose thoughts fly so high they catch, in their perspective, different domains and establish fertile connections. Mathematics offers many such examples. For instance, Descartes, when he imagined a pair of numbers as a point in the plane, linked algebra and geometry. The new domain that emerged, analytic geometry, is the foundation of most modern fields of geometry. Newton connected geometry and dynamics by observing that both the slope of the tangent to a plane curve and the speed may be computed by the same technique, which is the derivative (the fluxion). The gate of differential calculus was thus opened. Boole had the idea of describing logical operations (conjunction, disjunction, and negation) in the same way elementary algebra describes operations with numbers. He thus connected logic and algebra and today all modern computers perform their functions using two-value Boolean logic. In the case of “bird” metaphors, one observes an intensification of the interaction of the two terms (A) and (B) of the metaphor, like in the case of the “two faced” metaphor. It is not only (B) that, by its action helps (A) discover in itself new aspects, but in its turn (B), under the influence of (A), receives a new cognitive potential. Finally, the joining of the two terms is so beneficial that the knowledge transformation brings new horizons to the respective topic. For the “birds” metaphors, two main strategies are visible. 
One is the “linking” strategy, in which the joining of the two terms (A) and (B) produces a transformation of both elements of the metaphor, but the features of (A) and (B) are still recognizable. This is the case with Descartes’ coordinates, where the geometric and algebraic properties are visible and peacefully coexist in the new world of analytic geometry. The other strategy is a “fusion” technique, in which (B) is melted into (A), as for instance in the Riemann integral, in which neither the geometrical properties of computing areas nor the limit operation from analysis is visible. Considering the taxonomy of metaphors above, one may speak about the “unreasonable effectiveness” of metaphors not only in poetry, but in mathematics as well. The classification of mathematical metaphors may be connected with the taxonomy considered by I.A. Richards for linguistic metaphors, which presents three aspects: “ornamental or expressive,” “enriching or modifying,” and “creative” (1936). Explicative metaphors are similar to the ornamental ones, because they both
act at the surface of the object they describe. But if the mathematical metaphor aims to clarify, in poetry, literature, and rhetoric, the ornamental metaphor has the purpose of obtaining aesthetic effects. Discovery metaphors may be linked to enriching metaphors, since both are characterized by the fact that the ground (A), under the influence of the figure (B), suffers an enriching transformation. In the case of creative metaphors, mathematical intuition and poetical inspiration excel. Even if the strategies for creative metaphors described above are not encountered in Poetry (with maybe the exception of birth-concept metaphor), in both domains they bring changes to the way the world is seen. The moments, when creative metaphors appear, are described in the following lines: . . . moments when the mind’s a god, and life itself a metaphor; a glimpse that somewhere, two things mentioned meet under the astonished, single gaze of eternity itself Seeking Metaphor, Michael Shepherd (2007b)

Mathematical and Poetic Metaphors: Differences and Similarities Metaphors are not only found at the crossing of Mathematics and Poetry. They also accompany poets and mathematicians, like faithful pilgrims, in what one may call “the adventure of finding truth.” Thought does not crush to stone. The great sledge drops in vain. Truth never is undone; Its shafts remain. The Adamant, Theodore Roethke (1941)

But even if the truth is “never undone,” sometimes it is difficult to grasp. As Sue William Silverman confesses, “it is impossible to live every life, fight every war, battle every illness, belong to every tribe, believe every religion. The only way we may come close to the whole experience is by embracing what we see both inside and outside the window of the page.” (2009). And metaphors can give us the whole experience. With its bicameral structure, it may link the abstract to the concrete, the mysterious to the familiar, the universal to the particular, the already known to the unknown, the inside with the outside of the page. However, one may wonder, how can the same schema of thinking help create things that are so different, such as a poem or a theorem? The answer can be found in its water-like quality, allowing metaphor to take the form of the structure into which it is poured. Even if mathematical and poetical metaphors use, in many cases, the same strategies (as shown in this chapter), they may fulfill a diversity of functions. An appropriate metaphor for metaphor itself would thus be a musical instrument. As a violin may be played in different tempos (from adagio and andante, to allegro or presto) and may use different levels of loudness (piano, pianissimo or forte, fortissimo), creating totally different tunes, so can a metaphor. The rhythm, the dynamic, or the structure of a metaphor is not the same in Poetry and in Mathematics.

Seven Differences Between Mathematical and Poetic Metaphors

The Principle of Simplicity

In Mathematics, the figure (B) is chosen in such a way that the truth concerning the ground (A) is reached in the simplest and clearest way possible. Therefore (B) may be a more familiar object than (A), or may possess properties that are easier to exploit than those of (A). In Poetry, there is no such principle. On the contrary, the imagistic resolution of poetic metaphors is much lower than that of mathematical ones, since a light too bright may harm the haze of mystery in which the truth lies, as shown in the poem below: Crushing the world’s crown of wonders is not what I do and killing, with my mind, the mysteries I find along the way, in flowers, eyes, on lips, or graves, is not what I do. Others’ light breaks the spell of fathomless darkness, but I, with my light, deepen the world’s riddles – and as, with its white rays, the moon lessens not, but tremblingly deepens night’s mystery, I, too, amplify the dark horizon with broad tremors of holy mystery and all that’s unrevealed turns ever more unknowable under my gaze – for I so love both flowers and eyes, and lips and graves. I do not crush the corolla of wonders, L. Blaga (2019)

The Principle of Speed

Even if the aim of both Mathematics and Poetry is to arrive at the truth, mathematical metaphor aims for the shortest way to get there. The faster, the better. Poetic metaphor is not in a hurry and prefers the sinuous path that leads to the truth. In Poetry, a principle of slowness can be observed, aiming “to increase the difficulty and length of perception, because the process of perception is an aesthetic end in itself and must be prolonged.” (Shklovski 2017). Metaphor, with its multitude of nuances, contributes to the so-called labyrinthine effect that language possesses. As Wittgenstein observes: “language is a labyrinth of paths. You approach from one side and you know your way about; you approach the same place from another side and no longer know your way about.” (1972).

The Constraints

In Mathematics, the construction of a metaphor has to take into account previously existing rules and, when (B) is chosen, it is necessary to prove that “inferences are preserved.” (Lakoff and Núñez 2000). In Poetry, even if the choice of (B) is never arbitrary, there are practically no constraints (with the exception of those dictated by common sense). As L. Kronecker once said: “are not mathematicians
veritable and innate poets? Indeed, they are, just that their representation ought to be demonstrated.” (qtd. in Brescan 2009). The Different Nature of Metaphors The nature of mathematical metaphor is conceptual. Its structure is based on concepts explicitly defined and the meaning of a certain metaphor does not change if the concrete problem for which it was created is replaced with another. Whereas the structure of the metaphor encountered in Poetry is often contextual: “poetry is trying to diminish, if not to cancel, the conceptual dimension of language, by leaving the dictionary meaning of words and replacing it with a contextual, ad-hoc meaning, the reader has to build and learn on his own.” (Marcus 1998). Poets use metaphor in poetry To tell one thing to mean another Behind the curtain under the layers They hide thousand meanings. Religion is a metaphor, Abdul Wahab (2013)

As G. Braque remarks: “art is made to disturb, science reassures.” The invitation to decipher the multitude of meanings gives poetic metaphors a wider opening to interpretation than scientific ones. Poetic metaphor is thus a reflexive kind of construct, whereas mathematical metaphor is rather a transitive one.

Metaphor, Between Rigor and Imprecision

In Mathematics, a good discovery metaphor is one that not only leads to an interesting result but also reaches a high level of rigor, whereas for the explicative metaphor, the main quality is clarity. In Poetry, things are quite the opposite. Poets prefer to suggest rather than to assert. This is why, in many cases, metaphors are imprecise, creating ambiguity, paradox, mystery. In the poem “The Lake of Metaphor,” this feature of vagueness is called by M. Shepherd (2007c) “the grace of metaphor” and it is described as “the brushing of a white dove’s wing.”

The Splendor and Decline of Metaphors

Poets consider that the main feature of a metaphor is vividness. But everything that is alive may also die. This happens to a metaphor which is too frequently used and, in this way, loses its aura of surprise. It becomes a “dead” or a “frozen” metaphor: To be a metaphor You need to know your place; Stay around too long, you lose The vital force; no one believes In the unbelievable-when there’s no mystery, That’s the end of metaphor for man. Reborn metaphor, Michael Shepherd (2006)

In Mathematics, when a metaphor that was created for a particular case enters into common use for other problems, it becomes a method, a technique, or a principle. Thus the same phenomenon, attrition, brings the metaphor to decline in one domain and to splendor in the other.

Beauty, Between Motivation and Consequence In Poetry, metaphor is created with an aesthetic purpose. Poets polish the facets of metaphors in order to make them shine, as one may readily observe in the following lines: . . . at its heights, a metaphor shines like new light; bringing together, two images so disparate and making of their neighbouring, a moment magical in memory as if we’d never seen the world so brilliant or so revealing. Seeking Metaphor, Michael Shepherd (2007b)

While in mathematics metaphor is not created with an aesthetic design, beauty may appear as a byproduct. After the burning of a mathematical demonstration is finished, sometimes, in its ashes, one finds the beauty of a metaphor. For metaphor is not only a key element in the link between Mathematics and Poetry, being “unreasonably effective” in uncovering the truth in both domains, but it is also “unreasonably effective” at creating beauty. Why? Why and how does a metaphor create beauty? Where does its power over the human mind come from? And how is it that one can state whether a poem or a theorem is beautiful or not, even without having a rigorous definition of beauty or clear aesthetic norms, in Poetry or in Mathematics? Even if one does not have an explanation for the “beauty or not beauty” verdict, in many cases, the decision is accompanied by a halo of certitude. G. H. Hardy says: “we may not know quite what we mean by a beautiful poem, but that does not prevent us from recognizing one when we read it.” (1940). Paul Erdős asserts that numbers are beautiful but he cannot explain why, just as he cannot say why Beethoven’s Ninth Symphony is beautiful (qtd. in Devlin 2000). Therefore, one may conclude that the certainty of aesthetic assessments does not have a rational texture but rather an emotional one. The question thus has to be reformulated as “why does a metaphor create emotion?”

Seven Reasons Why Metaphor Creates Beauty (Emotion) in Poetry and Mathematics A Fresh Look Over the World Since the very dawn of civilization, the aliveness and vividness of metaphor has been praised. Aristotle speaks about this in his Rhetoric: “ordinary words convey only what we know already; it is from metaphor that we can best get hold of something fresh.” And poets tell that metaphors are: making new language out of words grown old, sounding new sounds of tales that are not yet. The secret treasured world of metaphor, Michael Shepherd (2007d)

New images or ideas attached to common things or words awaken our attention, having a revitalizing effect and provoking a feeling of pleasure. For instance,
everyone knows what fog is, but when one reads, Carl Sandburg’s poem, one feels amazement and delight: The fog comes on little cat feet It sits looking over harbor and city on silent haunches and then, moves on. The Fog, Carl Sandburg (1994)

Likewise, in Mathematics, we are enchanted and surprised, as in front of a magic trick, when Cantor walks along the diagonal of his imaginary square with infinite sides, picking up decimals and proving, in this way, that not all infinities have the same size and that there is a hierarchy among them. Due to their unexpectedness, metaphors often constitute Eureka moments. Rota talks about these moments “as the instantaneousness of a light bulb being lit” (1997) and Hadamard compares the feeling of enlightenment to “a flash of the insight” and to the “light illuminating the darkness.” (1945). They all describe the sudden apparition of beauty.

The Pleasure of the Game

One of the encounters between Mathematics and Poetry is facilitated by the playfulness they both possess. In the mark of the metaphor, one also finds a ludic line. It may be pronounced or blurry but it is never missing from the constructive schema of a metaphor. The “let us pretend that (A) is (B),” when one knows very well that (A) is not (B), resembles childhood games in which children imagine being a certain character (mother, father, a princess, a ninja, a cowboy). Metaphors, through the detour they make in an imaginary world, become “atoms of story.” And stories have always fascinated human minds. the wind is a Lady with bright slender eyes (who moves) at sunset and who-touches-the hills without any reason. Spring, E.E. Cummings (1991)

The Reverberance Through the Senses Metaphors, especially those tailored for Poetry (but also for “homey” mathematical metaphors) have, in many cases, as figure (B) an object that can be perceived by the senses. It is for this reason that metaphors may already communicate to the reader (listener) something, before the message is fully understood. It is a subliminal communication that makes the entering into the small universe of the metaphor to be done on two levels: the sensitive and the rational one. And this participation, with “body and mind,” makes the felt emotion more powerful. “A fruitful or important analogy is one that establishes a deep field of resonance.” (Zwich 2006). As in the following poem of Pablo Neruda (2020), in which the metaphors invade our imagination with visual, gustatory, tactile, and auditory images:

35 Metaphor: A Key Element of Beauty in Poetry and Mathematics


Oh, love is a journey with water and stars with drowning air and storms of flour; Love is a clash of lightnings, two bodies subdued by one honey. Sonnet 12, Pablo Neruda (2020)

Saving Mental Energy In Mathematics, the first term of a metaphor, the ground (A), may be a complex concept or a complicated problem. It is a sky full of dark clouds where no light is passing through. One is prepared to struggle with a tempest of difficulties. Suddenly, the choice of the figure (B) brings the clarity of a blue sky and our mind can sharply grasp it. Therefore, the mental energy prepared to be consumed is saved. The Freudian principle of “saving mental energy” results from the difference between what one expects and what finally happens, creating a feeling of happiness. This is what happens with Gauss's identity for the sum of the first n positive integers, $1 + 2 + 3 + \cdots + n = n(n + 1)/2$: instead of computing the sum directly, one constructs a suitable metaphor in the following way. One transforms the numbers into dots, doubles the sum, and arranges the two copies so that they form a rectangle with n rows and n + 1 columns; suddenly the proof of the statement stares us in the face (Fig. 6). A similar phenomenon takes place in Poetry as well. Metaphors, especially profound ones, generally create a gap between what we consciously read (from the words that are in front of our eyes) and the unconscious perceptions we are invaded by. This difference creates the “saving mental energy” effect, as in the following poem: The apparition of these faces in the crowd; Petals on a wet, black bough. In a station of the Metro, Ezra Pound (2013)

Fig. 6 A visual metaphor for Gauss identity
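The dot argument behind Fig. 6 can also be checked mechanically. The following is only a minimal sketch, not a reconstruction of the authors' figure: the function names `gauss_rectangle` and `triangular_number` and the choice of `#` and `.` as the two kinds of dots are assumptions made for illustration. Row i holds i dots from the first copy of the sum and n + 1 - i dots from the second copy, so the two copies together fill an n by (n + 1) rectangle.

```python
def gauss_rectangle(n):
    """Return the n x (n + 1) rectangle of dots as a list of strings.

    Row i (counting from 1) holds i dots '#' from the first copy of
    1 + 2 + ... + n, followed by n + 1 - i dots '.' from the second copy,
    which is the same staircase turned upside down.
    (Illustrative layout; the symbols are arbitrary.)
    """
    return ["#" * i + "." * (n + 1 - i) for i in range(1, n + 1)]


def triangular_number(n):
    """Count the '#' dots, i.e. one copy of the sum 1 + 2 + ... + n."""
    rows = gauss_rectangle(n)
    first_copy = sum(row.count("#") for row in rows)
    second_copy = sum(row.count(".") for row in rows)
    # Both copies have the same size and together fill the rectangle,
    # so each copy is half of n * (n + 1).
    assert first_copy == second_copy
    assert first_copy + second_copy == n * (n + 1)
    return first_copy


if __name__ == "__main__":
    for row in gauss_rectangle(5):
        print(row)
    print(triangular_number(5))  # 15, which equals 5 * 6 / 2
```

Printing the rectangle for a small n makes the "staring us in the face" effect visible: the boundary between the two symbols is the staircase, and the rectangle is exactly two staircases.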


S. Caraman and L. Caraman

From just a few words, a flood of thoughts and emotions rush towards us, without any effort: the impossibility of catching the moment, the transiency of life, the many things that pass us by, but are lost forever. Revealing Patterns Metaphors find similarity in difference and this is done by revealing common patterns. And recognizing these patterns gives us a pleasant feeling coming from a sense of relief, because it brings us to the idea that there is a hidden order in the universe. One only has to discover it. In R. Aharoni’s words: “what poetry does to human emotions and cravings, mathematicians do to order in the material world. It tries to find the internal logic of things.” (2005). In mathematics, one example would be the almost crushing feeling one has when discovering that the distribution of the roots of the “zeta function” of Riemann is similar to the distribution of the primes in the sequence of the positive integers. Poets, in turn, are especially attracted to and fascinated with patterns: What more could a poet wish for, Than this round room where The light turns into river And the river into light. Holderin, Ana Blandiana (2016)

Chaos is frightening, while order is reassuring and beautiful. Je hais le mouvement qui déplace les lignes (“I hate movement that displaces the lines”), says French poet Charles Baudelaire (1868), referring to unpleasant disorder. The human mind likes to discover and understand models because it aids in making predictions. Projections into the future are a way in which thoughts may escape the physical world and travel in another dimension. The Forbidden Apple The strategy of metaphor possesses a flavor of paradox, as it uses a non-truth (or a half truth, or an approximative truth) in order to arrive at the truth. This contradictory “half Jesus and half Juda” (Lesovici 1999) structure fits with these lines from Romanian poet L. Blaga: Where does the light of Heaven come from? I know, it is hell that lights it with its flames. The Light of Heaven, Lucian Blaga (1919)

Its almost ambiguous nature brings a creative dynamism that seems to have inspired the following definition: “Metaphor is an affair between a predicate with a past and an object that yields while protesting.” (Godman 1976). Accepting a metaphor is accepting to break the laws of logic (to be and not to be at the same time), and this awakens an almost guilty pleasure of biting into the “forbidden apple.” Discovering the Trace A new experience always also carries, within itself, something of a past experience. Andre Gide formulates this idea by using the following metaphor: “every wave owes


the beauty of its line to the withdrawal of the preceding one.” In Derridean terms, the past leaves its trace into the present. As everyone has a shadow, every event contains a “minimal repeatability” (Derrida 1997). But in many cases, this trace remains in the hidden part of our consciousness. A metaphor may reveal it to us. The “finding of trace” is in the same class of equivalence (with respect to the emotions it generates), with other actions that humans do: returning home, rereading one’s diary, looking at old photos, talking with friends about past experience, etc. Finding the trace that our life left, while passing through the years, sometimes silently and sometimes noisily, sometimes slowly and sometimes in a hurry, is always precious. Therefore, in order to perceive the beauty of a metaphor, one has to find within it, elements of our personal universe, coming from past experiences, from our knowledge, or from our dreams, from something that left a trace in our life. Otherwise it cannot have power over us. For metaphor is a holy sacrament: one should never dare to use it without some faint echo of a moment clear recalled when that which one refers to, came dazzling bright into the mind as life transfigured to another world, time lifted to the timeless. Seeking Metaphor, Michael Shepherd (2007b)

Conclusions If metaphor’s natural habitat is considered to be Poetry, this chapter has shown that metaphors may also wander about, with great naturalness, in the land of Mathematics. The traces left by its steps can be named “the poetry of mathematics.” On the other hand, the mathematical model of metaphor is the function, which is a main concept in Mathematics. Lakoff (1980) describes metaphor as being “a grounded, inferences preserving, cross-disciplinary mapping.” Having thus a solid mathematical structure, metaphor moves graciously and freely everywhere, but reigns in Poetry. It is what may be called “the mathematics in poetry.” But what is even more spectacular about this topic is that the traces metaphors leave, both in Mathematics and Poetry, are in many cases, traces of beauty.

Cross-References
Mathematics and Poetry: Arts of the Heart
Poems Structured by Mathematics

Acknowledgments The authors thank the following people and publishers for their permission to use the material found in this chapter. “Vestiges” by A. Van Jordan, originally published in Poem-a-Day on May 27, 2015, by the Academy of American Poets. Copyright © 2015 by A. Van Jordan. Excerpt reprinted here by kind permission of the author. “Metaphor” by Ray Lucero appeared on Poem Hunter site, on August 18, 2008. All poems on this website are the property of their respective owners. The selected lines of these poems are reprinted here by kind permission of the author.


References
Aharoni R (2005) Mathematics, poetry and beauty. World Scientific Publishing, Singapore
Albers D, Alexanderson G, Reid C (1990) More mathematical people: contemporary conversations. Harcourt Brace Jovanovich, Boston
Bailer-Jones DM (2000) Scientific models as metaphors. In: Hallyn F (ed) Metaphor and analogy in the sciences. Springer, Dordrecht, pp 199–244
Barbu I (1968) Pagini de proza. Editura pentru literatura, Bucuresti
Baudelaire C (1868) La Beauté in “Les Fleurs du Mal”. Calmann-Lévy, Paris
Berkeley G (1734) The analyst: a discourse addressed to an infidel mathematician. https://www.maths.tcd.ie/pub/HistMath/People/Berkeley/Analyst/Analyst.html
Birken M, Coon AC (2008) Discovering patterns in mathematics and poetry. Rodopi, New York
Black M (1962) Models and metaphors. Cornell University Press, Ithaca
Blaga L (1919) The light of heaven. Our translation. “Poemele Luminii”. Minerva Publishing House, Bucuresti
Blaga L (1937) Trilogia culturii: geneza metaforei si sensul culturii. Republished in 2018 by Humanitas, Bucuresti
Blaga L (2019) I do not crush the corolla of wonders. Translated by Maria Magdalena Biela. https://www.poemhunter.com/poem/eu-nu-strivesc-corola-de-minuni-a-lumii-ido-not-crush-the-world-s-corolla-of-wonders/
Blandiana A (2016) Holderin, three poems by Ana Blandiana translated by Paul Scott Derrick and Viorica Patea, Glasgow Review of Books
Brescan M (2009) Mathematics and art. Sci Stud Res Math Inform Univ Bacau 19(2):99–118
Butler S (1919) On the making of music, pictures and books. The note-books of Samuel Butler. A.C. Fifield, London
Byers W (2007) How mathematicians think: using ambiguity, contradiction and paradox to create mathematics. Princeton University Press, Princeton
Caraman S, Caraman L (2020) Metaphors at the crossing of mathematics and the literary arts. In: Bridges conference proceedings, pp 497–500
Concepcion GY (2014) A Poem can be the World. Available at http://forums.familyfriendpoems.com/topic.asp?TOPIC_ID=59768
Cummings EE (1991) Complete poems 1904–1962. Liveright, New York
Devlin KJ (2000) The math gene: how mathematical thinking evolved and why numbers are like gossip. Basic Books, New York
Derrida J (1997) Of grammatology. Johns Hopkins University Press, Baltimore/London
Dickinson E (1961) Hope is the thing with feathers. The complete poems of Emily Dickinson. Back Bay Books/Little, Brown and Company, New York
Eco U (1983) The scandal of metaphor: metaphorology and semiotics. Poetics Today 4(2):217–257
Fichte JG (1800) The vocation of man. Hackett Publishing Company, Indianapolis
Freedman S (2009) A poem is a metaphor, isn’t it? Available at https://www.poemhunter.com/poem/a-poem-is-a-metaphor-isn-t-it/
Freedman S (2012) The language of poetry is metaphor. Available at https://www.poemhunter.com/poem/the-language-of-poetry-is-metaphor/
Gibbs RW (1994) The poetics of mind: figurative thoughts, language and understanding. Cambridge University Press, Cambridge
Glaz S (2010) Discovering patterns in mathematics and poetry, by M. Birken and AC Coon. J Math Arts 4(4):227–229
Godman N (1976) Languages of art: an approach to a theory of symbols. Hackett Publishing Company/ING, Indianapolis/Cambridge
Grosholz E (2018) Great circles: the transits of mathematics and poetry. Springer, Cham
Growney J (2009) What poetry is to be found in mathematics? What possibilities exist for its translation? Math Intell 31(12):12–14
Hadamard J (1945) The mathematician’s mind: the psychology of invention in the mathematical field. Princeton University Press, New York
Hardy GH (1940) A mathematician’s apology. Cambridge University Press, Cambridge
Harre R (1960) Metaphor, model and mechanism. Proc Aristot Soc 60:101–122
Henry M (1987) La barbarie. Grasset, Paris
Hesse M (1966) Models and analogies in science. University of Notre Dame Press, Notre Dame
James I (2002) Remarkable mathematicians from Euler to von Neumann. Cambridge University Press, Cambridge
Jordan A Van (2015) Originally published in Poem-a-Day on May 27, by the Academy of American Poets. https://poets.org/poem/vestiges
Kempthorne LJA (2015) Relation between modern mathematics and poetry: Czeslaw Milosz; Zbigniew Herbert; Ion Barbu/Dan Barbilian. PhD dissertation, Victoria University of Wellington
Kepler J (1615/1960) Nova stereometria doliorum vinariorum, etc. In: Hammer F (ed) Johannes Kepler: Gesammelte Werke, vol IX. C.H. Beck, Munich
Lakoff G, Johnson M (1980) Metaphors we live by. The University of Chicago Press, Chicago
Lakoff G, Nunez RE (2000) Where mathematics comes from: how the embodied mind brings mathematics into being. Basic Books, New York
Lesovici MD (1999) Ironia. Institutul European, Iasi
Lorenz EN (1972) Predictability: does the flap of a butterfly’s wings in Brazil set off a tornado in Texas? American Association for the Advancement of Science, 139th meeting
Lucero R (2008) Metaphor. Available at https://www.poemhunter.com/poem/metaphor/
Maasen S (2000) Metaphors in the social sciences: making use and making sense of them. In: Hallyn F (ed) Metaphor and analogy in the sciences. Springer, Dordrecht, pp 199–244
Maasen S, Weingart P (1997) The order of meaning: the career of chaos as a metaphor. Configurations 5:463–520
Manin Y (2007) Mathematics as metaphor: selected essays of Yuri I. Manin. American Mathematical Society, Providence
Marcus S (1998) Mathematics and poetry: discrepancies within similarities. Bridges, mathematical connections in art, music and sciences. Gilliad Printing, Kansas
Marcus S (2011) Paradigme universale. Paralela 45:52
Miller A (2000) Metaphor and scientific creativity. In: Hallyn F (ed) Metaphor and analogy in the sciences. Springer Science+Business Media, Dordrecht, pp 147–164
Moritz (1942) Memorabilia mathematica: the Philomath quotation book: 1140 anecdotes, aphorisms and passages by famous mathematicians, scientists and writers. Spectrum series. Mathematical Association of America, Washington DC
Neruda P (2020) 100 love sonnets. Black Eagle Books, Ashland
Netz R (2012) Inside a mathematical proof lies literature. Stanford Report, May 7
Newman JR (1956) The world of mathematics. Simon and Schuster, New York
Nunez R (2000) Conceptual metaphor and the embodied mind: what makes mathematics possible. In: Hallyn F (ed) Metaphor and analogy in the sciences. Springer, Dordrecht, pp 125–145
Pimm D (2006) Drawing on the image in mathematics and art. In: Sinclair N, Pimm D, Higginson W (eds) Mathematics and the aesthetic: new approaches to an ancient affinity. Springer, New York, pp 160–189
Poe EA (1896a) Annabel Lee. The works of Edgar Allan Poe. George Routledge & Sons, London
Poe EA (1896b) To Helen. The works of Edgar Allan Poe. George Routledge & Sons, London
Polya G (1954) Mathematics and plausible reasoning, volume I: induction and analogy in mathematics. Princeton University Press, Princeton
Pound E (2013) In a Station of the Metro. https://www.poetryfoundation.org/poetrymagazine/poems/12675/in-a-station-of-the-metro
Reza M (2018) The truth beauty. Available at https://www.poemhunter.com/poem/the-truthbeauty/
Richards IA (1936) The philosophy of rhetoric. Oxford University Press, New York/London
Richards JL (2003) The geometrical tradition: mathematics, space, and reason in the nineteenth century. In: Nye MJ (ed) The Cambridge history of Science, volume 5: The modern physical and mathematical sciences. Cambridge University Press, Cambridge, pp 449–467
Roethke T (1941) The Adamant. Open House. Available at http://adilegian.com/roethke.htm
Rota GC (1997a) Indiscrete thoughts. Birkhauser, Boston
Rota GC (1997b) The phenomenology of mathematical beauty. Synthese 111(2):171–182
Sandburg C (1994) Fog. In: Chicago poems: the unabridged. Courier Corporation, Chelmsford
Sawyer WW (1955) Prelude to mathematics. Penguin Books, Harmondsworth
Sfard A (1994) Reification as the birth of metaphor. Learn Math 14(1):44–55
Sfard A (1998) On two metaphors of learning and the danger of choosing just one. Educ Res 27(2):4–13
Shepherd M (2006) Reborn metaphor. Available at https://www.poemhunter.com/poem/0003reborn-metaphor/
Shepherd M (2007a) In the spirit of Rumi – 71 – In praise of metaphor. Available at https://www.poemhunter.com/poem/in-the-spirit-of-rumi-71-in-praise-of-metaphor/
Shepherd M (2007b) Seeking metaphor. Available at https://www.poemhunter.com/poem/seekingmetaphor/
Shepherd M (2007c) The lake of metaphor. Available at https://www.poemhunter.com/poem/thelake-of-metaphor/
Shepherd M (2007d) The secret treasured world of metaphor. Available at https://www.poemhunter.com/poem/the-secret-treasured-world-of-metaphor/
Shklovski V (2017) A reader. Alexandra Berlina (ed) Bloomsbury, New York
Silverman W (2009) Fearless confessions. A writer’s guide to memoir. University of Georgia Press, Athens
Simic C (1990) Notes on poetry and philosophy. Wonderful words, silent truth. The University of Michigan Press, Ann Arbor
Sinclair N (2011) Aesthetic considerations in mathematics. J Humanist Math 1(1):1–32
Stanescu N (1964) Poem in “O viziune a sentimentelor”. Editura pentru literatura
Valery P (1952) Lettres a quelques-uns. Gallimard, Paris
Wahab A (2013) Religion is a metaphor. Available at https://www.poemhunter.com/poem/religionis-a-metaphor/
Wall A (2012) The Janus face of metaphor. The Fortnightly Review. https://fortnightlyreview.co.uk/2012/05/Wall-metaphor/
Wells D (1990) Are these the most beautiful? Math Intell 12(4):37–41
Wittgenstein L (1953/1972) Philosophical investigations, 3rd edn. Basil Blackwell, Oxford
Wordsworth W (1799/1805) The prelude or growth of a poet’s mind: an autobiographical poem. Ernest de Selincourt, Oxford
Zwich J (2006) Mathematical analogy and metaphorical insight. Math Intell 28(2):4–9

36 Poems Structured by Mathematics

Daniel May

Contents
Introduction
Early Examples of Mathematical Form
The Oulipo and Raymond Queneau
Sestinas
Poetic Enumeration
  Syllables per Line
  Words per Line and Latin Squares
  Lines per Stanza and Pi
  Letters per Line
Pantoums and Platonic Solids
Fundamental Theorem of Arithmetic Poetry
Incidence Geometry Poetics
Summary and Concluding Remarks
References

Abstract Poets have long used mathematical ideas in the structure of their poetry. Sometimes the mathematical structure of a poem is simple and obvious, while the mathematics guiding other forms of poetry is more opaque. Pre-twentieth-century examples often included combinatorial techniques and sometimes offered the reader choices which rendered thousands of poetic possibilities out of a single short text. Since the mid-twentieth century, the use of explicit mathematical poetic form has become more intentional. Founded in the 1960s, the Oulipo remains a group of writers interested in the overlap of mathematics and poetry. One of the founders of the group, Raymond Queneau, explored the

D. May () Black Hills State University, Spearfish, SD, USA e-mail: [email protected] © Springer Nature Switzerland AG 2021 B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_113


generalization of the sestina, and the work produced around this question remains a highlight in the mathematics of poetry. Other well-known poetic forms, such as the haiku and pantoum, can be described in mathematical terms. Many less famous mathematical poetic forms also exist. Poets have structured their work according to numerous mathematical ideas, including the Fibonacci sequence, pi, Latin squares, Platonic solids, the fundamental theorem of arithmetic, graphs, and finite projective planes. This chapter presents these ideas and also includes the occasional example of a poet intentionally violating the strict mathematical form in which they write.

Keywords Mathematical poetry · Constrained writing · Poetic structure · Oulipo · Sestina · Pantoum

Introduction The idea of mathematical poetry might seem a bit incongruous to a reader first encountering this topic. But Emily Grosholz suggests that both mathematics and poetry provide a “seed” out of which sprouts an entire branch of human knowledge. Her analogy of the subjects’ respective contexts is illuminating: “poetry stands in the same relation to the humanities as mathematics stands to the sciences” (Grosholz, 2018). As it turns out, countless individuals have explored the connections between mathematics and poetry. A rich history of mathematical poetry has evolved over the centuries, and among many others the long list of excellent surveys of mathematical poetry includes the work of Sarah Glaz, JoAnne Growney, Marcia Birken, and Anne C. Coon listed in the References at the end of this chapter (Birken and Coon, 2008; Glaz, 2011; Growney, 2006, 2008). In particular, the reader interested in learning more about the many ways that mathematics and poetry overlap is encouraged to visit JoAnne Growney’s excellent blog Intersections. Since 2010, this site has provided hundreds of examples of mathematical poetry of all varieties, and the field of mathematical poetry would be impoverished if the site did not exist. Several of the examples discussed in this chapter also appear on Growney’s site. Mathematical poetry is sometimes categorized into three broad areas. The first is poetry which is literally about mathematics. Poetry of this type includes both poetic formulations of mathematical questions or solutions and poems which are expositions of mathematics in general. Archimedes’s verses on his cattle problem and Cardano’s poem describing his method of solving the cubic equation are examples of the poetic asking or answering of mathematical questions. Expository poems on mathematics and math history abound, such as Glaz’s “Calculus,” a depiction of the controversy and intrigue present in the early days of the subject. 
Numerous poets have also written about their personal experiences in doing mathematics and being


a part of the mathematics community, including Glaz, Growney, Grosholz, Marion Cohen, and many others. The second type of mathematical poetry uses mathematical language or imagery in a metaphorical sense. Examples of this approach include Emily Dickinson’s “We Shall Find the Cube of the Rainbow” (which appears in section “The Oulipo and Raymond Queneau” and is one example of Dickinson’s many uses of mathematical imagery in her work), Langston Hughes’s “Addition [1],” and Alice Major’s “The god of prime numbers.” And Grosholz has examined metaphorical connections between mathematics and poetry, through personal reflections, historical and mathematical study, and her own poetry. The third type of mathematical poetry involves poetic form, and this type is the subject of this chapter. Speaking broadly, mathematics can be used to structure poetry in numerous ways. For example, poets use sequences of integers to dictate the number of syllables, lines, or stanzas they compose, as in the examples in section “Poetic Enumeration”. Sometimes the poet will employ repetition of a word, phrase, or entire line, and this repetition is often mathematical in nature, as in sections “Sestinas” and “Pantoums and Platonic Solids”. Many of these techniques are very old, such as some of the examples discussed in section “Early Examples of Mathematical Form”. Sometimes when one reads a mathematical poem, the mathematics can be rather opaque, as in the examples in sections “Fundamental Theorem of Arithmetic Poetry” and “Incidence Geometry Poetics”. On the other hand, sometimes the math is quite explicit in structuring a piece, and a poet might consciously borrow from theorems or concepts in mathematics to invent entirely new poetic forms. Each of these ideas will be discussed in this chapter. 
Because the history of overlap between mathematics and poetry is so deep, this chapter should be read as a necessarily selective history of just some of the ways poets use mathematics to structure their work. Hopefully this presentation of mathematically structured poetry will inspire readers to experiment with these forms. While writing to a constraint might initially seem overly restrictive, some poets find that rising to the challenge of a particularly thorny mathematical framework can actually produce creative ideas they wouldn’t have otherwise considered. For those readers interested in more creative advice about how to write mathematically structured poetry, or mathematical poetry more broadly, several excellent resources are available. Stephen Fry’s “The Ode Less Travelled: Unlocking the Poet Within” is a guide for writers which covers some of the forms presented in this chapter, and Alice Major offers some suggestions for the poet wanting to employ mathematical metaphor in their work (Major, 2018). There are also references available specifically for teachers wishing to bring poetry ideas and activities into a mathematics classroom (Cohen, 2012; Despeaux, 2015). As the reader proceeds through an approximately chronological tour of mathematical form, they should keep in mind that mathematical structure is not, and never has been, intended to be a rigid poetic prison. Indeed the well-known syllabic rules of the haiku are less restrictive than is perhaps commonly believed, and the Oulipo (discussed in section “The Oulipo and Raymond Queneau”) was so fond of setting up mathematical restrictions and then infrequently breaking them that they


devised a name for this: the “clinamen.” Writing for the Poetry Foundation, Rodney Koeneke sardonically rejects the idea of pre-modern composition of poetry as a sort of algorithmic process: “the poet chose a meter [...] and selected a form to contain it. [...] Follow the instructions and everyone knows you’ve made a poem [...] I doubt things were ever that simple” (Koeneke, 2014). In that spirit, this chapter is not intended to degrade the magical process by which a poem is born into a lifeless and dry algorithm. Quite the contrary, readers are encouraged to learn some mathematical poetic forms, write to them, and then experiment with an intentional and infrequent flouting of the rules. One other thread that will be followed throughout this chapter is the idea of generalization. Especially during the twentieth century and beyond, constrained forms became increasingly amenable to mathematical extension. This generalizing is also relevant to the modern studies of much older forms, as in the case of the sestina (discussed in section “Sestinas”).

Early Examples of Mathematical Form This section will present early progenitors of mathematically structured poetry. With these early examples, it is not always clear that the poet explicitly had mathematics on their mind when considering structure. With hindsight, however, the examples in this section do form a foundation of the mathematically structured poetry to follow. Perhaps the first writer to ever identify herself as the author of her own work, the Mesopotamian poet and priestess Enheduanna lived and wrote circa 2300 BCE. It has been suggested (Glaz (2019) is the source of the following quotations) that some of her poetic concerns were mathematical in nature, and indeed her depiction of a person who “measures the heavens above and stretches the measuring-cord on the earth” could describe a mathematician’s activities some 4000 years ago. Enheduanna paid close attention to an enumeration of the lines present in each of her temple hymns; each hymn ends with a statement such as “14 lines for the house of Nisaba in Eresh.” The mention of the number of lines a poem contains is certainly not a sophisticated mathematical concern. But the fact that it is included as part of the poem itself is a very early example of a poet making explicit numerical considerations of the structure of their poetry and of calling this structure to the reader’s attention. By the first century BCE, poets were explicitly considering combinatorial implications of their poetry. Consider the following passage from Book II of “De rerum natura” (“On the Nature of Things”), the well-known work from Roman poet and philosopher Lucretius. from “De rerum natura” Lucretius, first century BCE Moreover, it is important in my own verses With what and in what order The various elements are placed. For the same letters denote Sky, sea, earth, rivers, sun, The same denote crops, trees, animals.


If they are not all alike, yet by far the most part are so; But position marks the difference in what results. So also when we turn to real things: When the combinations of matter, When its motions, order, position, shapes are changed, The thing also must be changed.

When Lucretius refers to the order in which “the various elements are placed,” he is pointing out that various permutations of members from some set of symbols (in his case, the Latin alphabet) yield very different meanings. While the word “combination” here predates the modern mathematical meaning of the term, and is used in the context of Lucretius’s interest in atomism (Duncan, 2012), it is telling that he uses an explicitly literary metaphor to illustrate his point. Writing poetry itself is presented as an inherently combinatorial act. A natural extension of the combination of letters into words described by Lucretius is the combination of words into sentences and ideas, and indeed early poets did explore this idea in ways that seem implicitly mathematical. An illustrative example is the late thirteenth-century writer Ramon Llull. Born on the Mediterranean island of Mallorca, Llull was a religious figure, philosopher, and mathematician whose work is known to have influenced Gottfried Wilhelm Leibniz. Llull devised a system of circular diagrams, known as the Ars Magna, which in modern mathematical language are best described as graphs. For example, Fig. 1 (which is a modern recreation by the author of this chapter) is $K_{16}$, the complete graph on 16 vertices. Llull devised a set of attributes of God and then used this $K_{16}$ to explore all possible binary combinations of these attributes. In general, $K_n$ contains $\binom{n}{2} = \frac{n(n-1)}{2}$ edges, so Fig. 1 illustrates 120 different combinations of attributes of God that would have held some meaning to Llull. Anthony Bonner’s The Art and Logic of Ramon Llull: A User’s Guide (Bonner, 2007) is a book-length exploration of these ideas and the principal source of the brief outline of Llull’s work here (including the labelling of vertices in Fig. 1). On the surface, the Ars Magna has nothing to do with poetry. But Llull’s interpretive work with his system was indeed poetic.
For example, even though Llull intended that his methods would provide a literal answer to the question of, say, the existence of God (and his resulting “proof” does indeed look rather algebraic), he also explicitly states that his methods may be applied metaphorically to scientific questions. Even more, Llull specifically suggests that his techniques may be used to compose sermons, and this combinatorial generation of text foreshadows other methods described later in this chapter. Llull also inserts himself as a sort of creative voice into his techniques when he rejects such possibilities as “God is greedy.” From a modern vantage, his work appears more creative than scientific. Indeed, in an essay entitled “Ramón Llull’s Thinking Machine,” Jorge Luis Borges suggested a repurposing of Llull’s graphs and tables for poetic ends. In fact, Llull was a poet himself. His poem “Desconhort” is a numerically structured work: 69 12-line stanzas, each line written in alexandrine form (see

Fig. 1 Ramon Llull’s labelling of $K_{16}$ from Ars Magna, 1290 (vertex labels: simplicity, generosity, nobility, mercy, justice, perfection, dominion, glory, goodness, truth, greatness, eternity, virtue, power, will or love, wisdom)

section “Poetic Enumeration”) such that all 12 lines per stanza rhyme. Part of this poem is devoted to extolling the virtues of the Ars Magna. from “Desconhort” Ramon Llull, 1295 I still tell you that I bring a general Art that is given back to the spiritual person for whom one can know everything that is natural
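The edge count quoted for Fig. 1 can be checked by brute force. The following Python sketch is an illustration only: it enumerates every unordered pair of the sixteen labels read off Fig. 1 and compares the total with n(n−1)/2. The order of the list is arbitrary and is an assumption of this sketch, not Llull’s own arrangement.

```python
from itertools import combinations
from math import comb

# Vertex labels as they appear in Fig. 1; the ordering here is arbitrary.
ATTRIBUTES = [
    "goodness", "greatness", "eternity", "power", "wisdom", "will or love",
    "virtue", "truth", "glory", "perfection", "justice", "generosity",
    "mercy", "simplicity", "nobility", "dominion",
]

# Every unordered pair of attributes corresponds to one edge of K16.
pairs = list(combinations(ATTRIBUTES, 2))

n = len(ATTRIBUTES)
# All three counts should agree: enumerated pairs, C(n, 2), and n(n-1)/2.
print(n, len(pairs), comb(n, 2), n * (n - 1) // 2)
```

Each element of `pairs` is one binary combination of attributes, such as ("goodness", "greatness"), so the list can also be scanned by hand in the spirit of Llull’s own exploration.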

Several of the examples in this section are explored in more depth in (Johnson, 2012), including Ramon Llull and the seventeenth-century German poet Georg Philipp Harsdörffer. Harsdörffer constructed Llullian wheels for the automatic generation of German words, both real and nonexistent, for poetic uses. He also explored poetic form in a way that was later interpreted as early combinatorial poetry (Motte, 1998). Harsdörffer would compose couplets consisting of several monosyllabic German nouns and allow for any permutation of these nouns to generate a different reading of the poem. The following translation of such a poem appears in (Motte, 1998).

from “Récréations”
G. P. Harsdörffer, ca. 1650

Honor, Art, Money, Property, Praise, Woman, and Child
One has, seeks, misses, hopes for, and disappears.

Given n monosyllabic nouns, this of course yields n! different poems, each of which is rhythmically identical. In the original German, the couplet above has n = 7, the seven nouns in the first line. As in this couplet, it seems that sometimes Harsdörffer expanded this idea to include both interchangeable nouns and interchangeable verbs. In the case of a couplet containing n > 0 such nouns and v > 0 such verbs, n! × v! different poems result, fewer than the (n + v)! poems if all of the words were allowed to be interchanged with each other. This is because $(n+v)! - n! \times v! = n! \times v! \left[\binom{n+v}{v} - 1\right]$, but $\binom{n+v}{v} > 1$ since n > 0 and v > 0, so indeed the difference is positive. Of course fewer poems are expected with the restrictions that only nouns may be exchanged for nouns, and verbs for verbs; the interchange of words without respect to their parts of speech would almost surely produce grammatically incorrect transpositions of nouns and verbs. It’s not clear how seriously Harsdörffer took this form of composition. These poems appeared in a book he called Récréations, and he is perhaps best known for publishing in 1647 his Poetischer Trichter (Poetic Funnel). The subtitle of that book claimed the volume would “pour” the “art of poetry” into the reader in 6 hours, and this kind of mechanical teaching and learning is sometimes referred to as the Nuremberg Funnel. It does seem at least plausible that Harsdörffer considered this technique as a kind of poetic combinatorial writing. However, German-language scholar J. G. Robertson, writing centuries later, states that Harsdörffer’s techniques “reduced, in all seriousness, [. . . ] to absurdity” earlier theories of German-language poetics (Robertson, 1911). In any case, Harsdörffer’s work does provide an early example which can be analyzed combinatorially. And his work provided inspiration for German combinatorial poets to follow.
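The comparison between n! × v! and (n + v)! can be made concrete with a short Python sketch. The function names are hypothetical, and while n = 7 comes from the text, v = 5 is an assumption obtained by counting the verbs in the English translation of the couplet above.

```python
from math import comb, factorial

def restricted_readings(n, v):
    """Readings when nouns swap only with nouns and verbs only with verbs."""
    return factorial(n) * factorial(v)

def unrestricted_readings(n, v):
    """Readings if all n + v words could be interchanged freely."""
    return factorial(n + v)

n, v = 7, 5  # v = 5 is an assumption read off the English translation
restricted = restricted_readings(n, v)
unrestricted = unrestricted_readings(n, v)

# The identity from the text: (n+v)! - n! v! = n! v! (C(n+v, v) - 1).
assert unrestricted - restricted == restricted * (comb(n + v, v) - 1)

print(restricted, unrestricted)
```

Under these assumptions the restricted couplet admits 604,800 readings, against 479,001,600 if parts of speech were ignored.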
Born near the end of Harsdörffer’s life, Quirinus Kuhlmann took the idea of the poetic funnel and put it into more serious practice. In 1671, he published a collection of sonnets called Himmlische Libes-Küsse (Heavenly love-kisses) in which the intentional permutation of words is the defining characteristic of the poems. For example, “Love-Kiss XLI: The Mutability of Human Affairs” is a sonnet in which the first 12 of the 14 lines require a choice of the reader. The first stanza, as translated in 2009 by Richard Sieburth, includes the following.

from “Love-Kiss XLI: The Mutability of Human Affairs”
Quirinus Kuhlmann, 1671

From Night / Fog / Clash / Frost / Wind / Sea / Heat / South / East / West / North / Sun / Fire and Plagues
Come Day / Blaze / Bloom / Snow / Peace / Land / Flash / Warmth / Heat / Joy / Cool / Light / Flames and Dread
From Ache / Pain / Shame / Fear / War / Oh / Cross / Stress / Scorn / Hurt / Shock / Guile / Scold and Fraud
Come Joy / Charm / Fame / Ease / Wins / Truce / Gains / Peace / Jokes / Games / Rest / Mild / Morning Rays.

Each of these lines contains 13 monosyllabic words (the words before “and” in the first 3 lines and before “Rays” in the fourth). Kuhlmann encourages the reader to choose one word per line, leaving the first and last word (or two) unchanged under all readings. For example, the first stanza might be read as follows.

From Wind and Plagues
Come Snow and Dread
From Fear and Fraud
Come Morning Rays.

There are thus $13^4$ = 28,561 ways in which a reader may choose to read this first stanza and $13^{12}$ = 23,298,085,122,481 ways the reader can read the entire sonnet. Because of the words used by Kuhlmann, some of these readings produce lines that don’t seem to make sense: the intended meaning of “From Oh and Fraud/Come Games Rays” is not at all clear. Two important figures from the history of mathematics had lesser-known works which are relevant to the early history of mathematically structured poetry. Roughly contemporary with Harsdörffer was seventeenth-century French mathematician and Catholic priest Marin Mersenne. Mersenne is best known today for the category of prime numbers which bear his name. In work less famous than that on primes, Mersenne attempted to enumerate all possible 22-note songs. By “song,” Mersenne apparently meant a musical sequence consisting of any sound producible in any human language. In a sense, Mersenne was attempting to produce every possible short poem, in a manner foreshadowing Borges’ short story “The Library of Babel.” This sort of exhaustive search for all pronounceable sounds, or more broadly of all words regardless of pronounceability, was a project approached by a variety of thinkers of the time (Eco, 1995). In Mersenne’s case, he produced computations suggesting there are 1,124,000,727,777,607,680,000 possible 22-note songs. He further calculated that printing these short songs on paper would produce a stack that stretched “between heaven and earth,” or about 28,826,640,000,000 inches. Leibniz too was inspired by Llull to attempt to count all possible vocal sounds in his first published work, Dissertatio de arte combinatoria (“Dissertation on the Art of Combinations”). Numerous other writers have been named as forerunners of modern mathematically structured poetry. One is Jonathan Swift, whose 1726 novel Gulliver’s Travels describes an automatic text-generating machine.
Another is Victor Hugo, who published the poem “Les Djinns” in 1829; the mathematical structure of the poem is discussed in section “Poetic Enumeration”. Raymond Roussel’s lengthy 1910 poem “New Impressions of Africa” consists of one very long sentence in which the reader must navigate a sequence of nested parentheses five levels deep, and the sentence is reminiscent of order-of-operations problems from a basic algebra class.


Several of the examples from this section evoke a time before strict divisions existed between academic disciplines, or even between the academic and the nonacademic. One might rightly call this a prescientific view, or decry a lack of academic standards. But for these poets and thinkers, considering math and poetry side by side may not have seemed so unusual. The people mentioned in this section also inspired a flowering of mathematically constructed poetry in the second half of the twentieth century through the work of the Oulipo, a group discussed in the following section.
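The counts quoted earlier in this section for Kuhlmann’s sonnet follow from multiplying the number of options at each independent choice. The Python sketch below reproduces them; the variable names are illustrative only. It also records an observation that is not made in the text above: the figure quoted for Mersenne’s 22-note songs coincides with 22!, the number of orderings of 22 distinct notes.

```python
from math import factorial

WORDS_PER_LINE = 13  # interchangeable words in each variable line

# One independent choice per line: 4 lines in the first stanza,
# 12 variable lines in the whole sonnet.
stanza_readings = WORDS_PER_LINE ** 4
sonnet_readings = WORDS_PER_LINE ** 12

# Observation of this sketch: Mersenne's quoted total equals 22!.
mersenne_count = factorial(22)

print(f"{stanza_readings:,}")
print(f"{sonnet_readings:,}")
print(f"{mersenne_count:,}")
```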

The Oulipo and Raymond Queneau

Any discussion of mathematically structured poetry must include an examination of the Oulipo, which began in 1960 as a collection of French writers and mathematicians. This section will provide an introduction to the group, some of the key figures in it (including Raymond Queneau), and a few of their mathematical poetry techniques. Edited by Harry Mathews and Alastair Brotchie, the Oulipo Compendium is an essential survey of the Oulipo. Ian Monk and Daniel Levin Becker edited All That Is Evident Is Suspect: Readings from the Oulipo 1963–2018, an anthology which includes Oulipian work from every member of the group. Though most assuredly not a “movement” (a distinction stressed by both Mathews and Levin Becker), the group remains active, and members now hail from several countries. Levin Becker provides a description of the twenty-first-century activities of the group in Many Subtle Channels: In Praise of Potential Literature. One might consider the founding of the group as the commencement of the modern period of mathematical poetry. Indeed, the group itself might prefer to see things that way: they often refer to instances of mathematical or other constrained writing which occurred prior to their founding as “anticipatory plagiary.” Calling attention to their stylistic predecessors was one of the two main goals of the early Oulipo. But while they sought to uncover previous writers who had used combinatorial or constrained writing techniques (including some mentioned in the previous section), the group also devised new literary structures of their own. These Oulipian structures are not exclusively mathematical, but the group has had a keen interest in mathematics from the start. As early Oulipian Jacques Duchateau described in a lecture in 1963, “[W]e are prompted to ask ourselves questions about these notions of structure. This is not new.” The new part, according to Duchateau, is an explicit focus on mathematics.
“The search for new forms,” he says, was “made possible by a working practice first used by mathematicians: the axiomatic method” (Monk and Levin, 2018). While working on what would become the first canonical Oulipian work, “Cent mille milliards de poèmes,” French writer Raymond Queneau enlisted the assistance of chemical engineer and writer François Le Lionnais, and the Oulipo was initiated. Soon thereafter, a collection of like-minded writers and mathematicians gathered at a colloquium covering Queneau’s work. Early on, the group adopted the name Ouvroir de Littérature Potentielle, which can be translated as “Workshop for Potential Literature.” The name would later be abbreviated as “OuLiPo,” or “OU.LI.PO,” or simply “Oulipo.” Other well-known early members of the group included Georges Perec and Jacques Roubaud. Raymond Queneau’s “Cent mille milliards de poèmes” (“A Hundred Thousand Billion Poems”) was published in 1961. In it, Queneau composed a sequence of ten 14-line sonnets in which every sonnet has the rhyme pattern ABABCDCDEFEFGG, where each rhyme A, B, C, D, E, F, and G is held constant over the entire collection. That is, the rhyme in the first and third lines of the first sonnet matches the rhyme in the first and third lines of every other poem in the collection and similarly for all other corresponding lines. Further, every line across all the sonnets is written in the same meter. The (quite intentional) result of this is that any of the ten sonnets’ first lines may be combined with any of their second lines, any of their third lines, and so forth, until a full 14-line sonnet is obtained. Any poem produced this way will have identical rhyme patterns and remain rhythmically consistent. In the first printing of the work, each of Queneau’s sonnets was printed on one side of ten consecutive pages, with dotted lines printed in between each line of poetry to encourage the reader to physically cut the poems into strips, allowing for a rearrangement of the lines into many different sonnets as shown in Fig. 2. This printing scheme is reproduced in the Oulipo Compendium and can be synthesized online at various websites. The enumeration of all possible sonnets present in “Cent mille milliards de poèmes” is a basic example of the fundamental counting principle, sometimes also called the multiplication principle (as are some of the examples in section “Early Examples of Mathematical Form”). In order to read a sonnet, the reader must first choose any one of the ten first lines Queneau has written.
The reader then reads a second line, and because of the care taken in their composition, any second line will fit with whichever first line the reader has chosen. In this way, the reader proceeds to make 14 independent choices, with 10 options at each choice, for a total of $10^{14}$ different ways the reader could read the poem. Thus Queneau has produced 100,000,000,000,000 distinct (but closely related) sonnets, hence the name of the work. Whether or not Queneau himself can be considered the author of all of these poems has been the subject of some debate. Reminiscent of Mersenne’s calculations of the height of a stack of all songs, Queneau noted that reading at a rate of one sonnet per minute, 24 hours per day, it would take someone approximately 190,258,751 years to finish every one. It seems worth mentioning, then, that because no one has checked every single sonnet in “Cent mille milliards de poèmes,” the poetic consistency of the entire collection is of course dependent on the transitive property.

Fig. 2 Raymond Queneau’s “Cent mille milliards de poèmes”

The breadth of Queneau’s work was vast. At the time of the founding of the Oulipo, one of Queneau’s more well-known works was Exercises in Style (Queneau, 2012). This collection of stories is 99 different versions of the same anecdote, with each version written according to a different literary style. Most of these styles are not mathematical: the “botanical” retelling of the story, for example, recasts the principal characters as plants. However, several of the versions do use mathematics: the version called “Mathematical” describes the path of a bus as a solution to a second-order differential equation, and “Probability” recounts the events in terms of what the narrator calls the “established laws of probability.” Appearing in a later edition, the retellings “Set Theory” and “Geometry” pose the anecdote as textbook exercises. The Exercises in Style versions most relevant to this chapter are a series of retellings based on the permutation of the words or letters of the original story.
“Permutations by groups of 1, 2, 3, and 4 words,” for example, tells the story in four sentences. In the first, the reader is required to transpose every other word to obtain a coherent sentence. In the second sentence, the reader must transpose every other pair of words; in the third sentence, the reader transposes every other triple of words; and in the final sentence, the reader must transpose every other set of four words. Put another way, the words of sentence one are swapped according to the permutation (1 2), the words of the second sentence according to (1 3) (2 4), the third according to (1 4) (2 5) (3 6), and the fourth by (1 5) (2 6) (3 7) (4 8). “Permutations by groups of 2, 3, 4, and 5 letters” follows a similar pattern, producing what appears to be gibberish. While not strictly speaking poetry, these parts of Exercises in Style certainly recall the early combinatorial poetry of Harsdörffer and Kuhlmann. They also foretell the main literary concern of the early Oulipo: the contrainte, or “constraint.” For their purposes, the Oulipo has defined a literary constraint as a “strict and clearly definable rule, method, procedure or structure that generates [a] work” (Mathews and Brotchie, 2005). The mixture of styles employed by Queneau in Exercises in Style demonstrates the diversity of the Oulipian constraint: some constraints are mathematical and others are not.

The discussion of Raymond Queneau will be concluded with a brief mention of perhaps his most important mathematical contribution, in the area of Ulam numbers. Starting with the integers 1 and 2, each subsequent (1, 2)-Ulam number is defined to be the next smallest integer which can be represented as a sum of two distinct smaller Ulam numbers in precisely one way. The first few (1, 2)-Ulam numbers are 1, 2, 3, 4, 6, 8, 11, and 13 (OEIS sequence A002858). For example, 5 is not a (1, 2)-Ulam number because 5 = 1 + 4 = 2 + 3. In general, (u, v)-Ulam numbers are obtained in the same way (starting with positive integers u and v), and a sequence of (u, v)-Ulam numbers is regular if the sequence of differences between consecutive (u, v)-Ulam numbers is eventually periodic. Queneau showed that (2, 7)-Ulam numbers and (2, 9)-Ulam numbers are regular (Queneau, 1972), and in 1994 this result was generalized by Schmerl and Spiegel, who showed that for all odd values of v > 3, (2, v)-Ulam numbers are regular.

Broadening the discussion to the wider group, the name “Workshop for Potential Literature” suggests that the Oulipo is often more interested in the “potentiality” of a literary constraint than in actually producing literature according to that constraint (Levin Becker, 2012). However, a large number of Oulipian works do exist which adhere to constraints, mathematical or otherwise.

An early member of the Oulipo, Georges Perec produced several well-known Oulipian works. His 1978 novel Life: A User’s Manual relies on a plot structure dictated by the pair of orthogonal Latin squares discovered by the so-called Euler spoilers a few decades earlier. (Orthogonal Latin squares are described in more detail in section “Poetic Enumeration”.) One less mathematical constraint used by Perec was the lipogram, which is a text in which the author intentionally omits the use of a letter or set of letters. While the lipogram was not invented by the Oulipo, Perec used it to write La Disparition, a 1969 novel excluding the letter e, which was published in 1995 in an English translation by Gilbert Adair as A Void. (This novel was the subject of a “Google doodle” on what would have been Perec’s 80th birthday, in which the “e” in Google is partially erased.)
Perec’s history of the lipogram appears in (Motte, 1998) and cites examples as far back as the sixth century BCE. In 1978, Paul Braffort published a collection of 21 poems entitled “Mes Hypertropes,” whose structure was determined by the Fibonacci sequence. One of the most famous sequences in all of mathematics, the Fibonacci sequence arises from starting with two copies of the integer 1 and producing subsequent integers by summing the preceding two. The first several terms of the sequence are thus 1, 1, 2, 3, 5, 8, 13, 21, and 34. Among other restrictions, the poems in the sequence call attention to Zeckendorf’s theorem on Fibonacci numbers, named for Edouard Zeckendorf but originally proved by Gerrit Lekkerkerker. This theorem states that every positive integer can be uniquely written as the sum of one or more distinct Fibonacci numbers such that none of the numbers in the sum are consecutive Fibonacci numbers. For example, 20 = 2 + 5 + 13 and 20 cannot be expressed as a sum of mutually nonconsecutive Fibonacci numbers in any other way. Braffort’s poems metaphorically demonstrate this theorem in that poem 20 in “Mes Hypertropes” is determined by the contents of poems 2, 5, and 13, and each of the other poems is likewise determined by the Zeckendorf representation of the poem’s ordinality in the collection.
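The Zeckendorf representation described above can be computed greedily: repeatedly subtract the largest Fibonacci number that still fits. The following Python sketch assumes this standard greedy approach and uses a hypothetical function name; it lists each Fibonacci value only once (1, 2, 3, 5, ...) rather than starting from two copies of 1. It is not a description of how Braffort himself worked.

```python
def zeckendorf(n):
    """Return the Zeckendorf representation of n, largest term first."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Distinct Fibonacci values not exceeding n: 1, 2, 3, 5, 8, ...
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    for f in reversed(fibs):
        if f <= n:  # greedy step: take the largest value that still fits
            parts.append(f)
            n -= f
    return parts

# Which earlier poems would "determine" each of the 21 poems?
for poem in range(1, 22):
    print(poem, zeckendorf(poem))
```

For poem 20 the sketch returns 13, 5, and 2, matching the example in the text; for poems whose number is itself a Fibonacci number, the representation is just that number.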


“Mes Hypertropes” also demonstrates Roubaud’s second principle, named after Oulipian Jacques Roubaud. First published by him in 1981, Roubaud’s principles are the following two statements (as they appear in the Oulipo Compendium) regarding the relationship between a piece of constrained writing and the constraints that structure it. Roubaud’s First Principle: A text written in accordance with a restrictive procedure refers to the procedure. Roubaud’s Second Principle: A text written according to a mathematically formulable procedure includes the consequences of the mathematical theory that it illustrates. An example of a piece of literature that obeys Roubaud’s first principle is Perec’s La Disparition, the plot of which includes a description of the disappearance of the letter “e.” One could interpret the excerpt of De rerum natura presented in section “Early Examples of Mathematical Form” as exemplifying Roubaud’s first principle as well, as Lucretius describes the combinatorial consequences of arranging letters into words in an actual piece of writing itself. Roubaud’s First Principle was modified by Jacques Jouet in 1990 into the following corollary (as it appears in the Oulipo Compendium). Jouet’s Corollary: Any text that follows a restrictive method and pretends to ignore it semantically contradicts the very principle of its use. An example of Jouet’s corollary can be found in a collection of short poems by the author of this chapter first published in The Fib Review in 2019. Some of the poems in the collection claim to be Fibs (a type of short poem described in section “Poetic Enumeration”) but in fact are not. Another mathematical Oulipian form is the Métro Poem, a collaborative project undertaken by Jouet and graph theorist Pierre Rosenstiehl. The self-referential “Poem of the Paris Métro,” published in 1998, describes the simple rules of the form: while taking a ride through the Métro, the poet writes exactly one line of poetry between each stop. 
In this way, the poem will contain one fewer line than the number of stations on the trip. According to Jouet, the poet should mentally compose a line, while the train is moving, but only actually write when the train is stopped at a station. The mathematical goal was to construct a Métro ride that would allow for a poem covering the entire Paris Métro system. This problem may remind the reader of the foundational problem in graph theory, the Königsberg Bridge Problem. As Rosenstiehl points out, the entire Paris Métro is not Eulerian, as some stations have an odd number of lines arriving into them. As such, there was work to be done to design the most efficient path for Jouet to take as he composed his poem, and the problem can be seen as an example of the Travelling Salesman Problem. The final Oulipian technique mentioned here is the “N + 7 machine,” which was one of the earliest Oulipian methods to be adapted to the Internet. The “machine”


simply replaces every noun in a text with the seventh noun following it in a chosen dictionary. This process can be explored online at the collaborative poetry site The Spoonbill Generator, where the “large” dictionary contains 11,700 nouns. As an example of this process, Emily Dickinson’s poem “We Shall Find the Cube of the Rainbow” was used as the original source material.

“We Shall Find the Cube of the Rainbow”
Emily Dickinson

We shall find the Cube of the Rainbow.
Of that, there is no doubt.
But the Arc of a Lover’s conjecture
Eludes the finding out.

From this poem, The Spoonbill Generator produces the following (admittedly nonsensical) verse.

We shall find the Cudgel of the Raisin
Of that - there is no dovecote
But the Archdiocese of a Luck’s connoisseur
Eludes the fingertip out.
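The mechanics of the N + 7 substitution can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function name, the ten-word toy dictionary, the rule that a word counts as a noun exactly when it appears in that dictionary, and the choice to wrap around at the end of the list. It does not reproduce The Spoonbill Generator or its dictionary.

```python
import re

def n_plus_k(text, nouns, k=7):
    """Replace each word found in `nouns` by the noun k entries later.

    `nouns` stands in for a real dictionary; wrapping around at the end
    of the list is a choice made for this sketch.
    """
    ordered = sorted({w.lower() for w in nouns})
    position = {w: i for i, w in enumerate(ordered)}

    def swap(match):
        word = match.group(0)
        i = position.get(word.lower())
        if i is None:  # not a listed noun: leave it unchanged
            return word
        new = ordered[(i + k) % len(ordered)]
        # Preserve an initial capital, as in Dickinson's capitalized nouns.
        return new.capitalize() if word[0].isupper() else new

    return re.sub(r"[A-Za-z]+", swap, text)

# Hypothetical ten-noun dictionary, far smaller than any real one.
TOY_NOUNS = ["apple", "bird", "cat", "dog", "egg",
             "fish", "goat", "hat", "ink", "jar"]

print(n_plus_k("The Cat chased a bird.", TOY_NOUNS))
```

A realistic version would need a full noun list and some way of deciding which words in the source text are nouns, which is where most of the difficulty lies.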

This section concludes with a short discussion of the clinamen, alluded to in section “Introduction”. As described in the Oulipo Compendium, a clinamen is a “deviation from the strict consequences of a restriction.” However, the group suggests that use of a clinamen can only be justified in cases where the deviation from the rule is not necessary. As such, the clinamen technique is a sort of wink from the writer to the reader, a meta-textual commentary on the artificial nature of the constraint at play. The clinamen is also sometimes referred to as a “swerve” (Duncan, 2012). Oulipian Italo Calvino has said that given a piece of constrained writing, the clinamen “alone can make the text a true work of art” (Motte, 1998). In practice, actual examples of the clinamen in Oulipian work can be hard to find, and the Oulipo Compendium specifically points out that it does not contain many clinamen. However, in subsequent sections, a few places where a constraint is violated in the spirit of the clinamen will be highlighted.

Sestinas

Of all the Oulipo’s techniques, Raymond Queneau’s generalization of the sestina is perhaps the most mathematically rich. Indeed the sestina’s mathematical renown is well-earned, and the form admits a fairly deep mathematical examination. The sestina is generally attributed to Arnaut Daniel, a French poet and troubadour of the twelfth century. The first English-language sestinas were published in 1579, and the form saw a resurgence in the twentieth century, with well-known examples being written by Elizabeth Bishop, W.H. Auden, and Ezra Pound. Overviews of the history, structure, and mathematics of the sestina have been written by Peter Asveld, Michael Saclolo, Mark Strand and Eavan Boland, and Daniel Tammet and are listed in the References at the end of this chapter. The essential feature of the sestina is the systematic repetition of the words appearing at the ends of the poem’s lines. The reader is invited to read Bishop’s “Sestina” below, from her 1965 collection of poems Questions of Travel, and see what patterns they can ascertain.

“Sestina”
Elizabeth Bishop, 1965

September rain falls on the house.
In the failing light, the old grandmother
sits in the kitchen with the child
beside the Little Marvel Stove,
reading the jokes from the almanac,
laughing and talking to hide her tears.

She thinks that her equinoctial tears
and the rain that beats on the roof of the house
were both foretold by the almanac,
but only known to a grandmother.
The iron kettle sings on the stove.
She cuts some bread and says to the child,

It’s time for tea now; but the child
is watching the teakettle’s small hard tears
dance like mad on the hot black stove,
the way the rain must dance on the house.
Tidying up, the old grandmother
hangs up the clever almanac

on its string. Birdlike, the almanac
hovers half open above the child,
hovers above the old grandmother
and her teacup full of dark brown tears.
She shivers and says she thinks the house
feels chilly, and puts more wood in the stove.

It was to be, says the Marvel Stove.
I know what I know, says the almanac.
With crayons the child draws a rigid house
and a winding pathway. Then the child
puts in a man with buttons like tears
and shows it proudly to the grandmother.

But secretly, while the grandmother
busies herself about the stove,
the little moons fall down like tears
from between the pages of the almanac
into the flower bed the child
has carefully placed in the front of the house.

Time to plant tears, says the almanac.
The grandmother sings to the marvelous stove
and the child draws another inscrutable house.


Perhaps the clearest pattern in the end-words is that the final end-word of each stanza appears as the first end-word of the subsequent stanza. The reader may also notice that (aside from the concluding tercet) the poem contains six stanzas of six lines each and that each end-word appears precisely once in each line position. For example, the word “house” appears at the end of lines 1, 2, 4, 5, 3, and 6, respectively, in each of the poem’s six-line stanzas. The word “grandmother” appears at the end of lines 2, 4, 5, 3, 6, and 1, respectively. “Stove” concludes lines 4, 5, 3, 6, 1, and 2. These sequences of positions may call to mind a permutation. Specifically, consider the set of end-words {house, grandmother, child, stove, almanac, tears}. A permutation of this set of six words is a bijection from the set onto itself. Less formally, a permutation of these words is simply a reordering of them. For example, take the ordering of these words in the first stanza of “Sestina” as a starting order: “house” first, “grandmother” second, “child” third, “stove” fourth, “almanac” fifth, and “tears” sixth. They can be put into alphabetical order (almanac, child, grandmother, house, stove, tears) via the permutation (1 4 5) (2 3) (6), written in cycle notation. This notation means that the first word in the starting order (“house”) moves to the fourth position in the alphabetized list, the fourth word in the starting order (“stove”) moves to the fifth position in the alphabetized list, and the fifth word in the starting order (“almanac”) moves to the first position in the alphabetized list. The second and third words trade places from the starting order to the alphabetized order, and “tears” appears sixth in both lists. Written in cycle notation, the permutation which reorders the end-words from their positions in the first stanza to their positions in the second stanza is (1 2 4 5 3 6).
Further, this permutation reorders the end-words from any stanza to the subsequent stanza, and this prescribed reordering is the defining feature of the sestina.

Definition. A sestina is a seven-stanza poem in which the first six stanzas each include six lines. The same six words appear at the end of each of these stanzas’ six lines, permuted from one stanza to the next according to the permutation (1 2 4 5 3 6).

The sestina’s final tercet contains two end-words per line, with an end-word appearing at the end of each of these three lines. While at one time the order of the end-words in the final three lines was prescribed (Fry, 2005), this requirement seems to have often been ignored in twentieth-century sestinas. As the concluding tercet does not have any bearing on the mathematical questions to follow, its form has not been included in the definition of a sestina. Because the defining permutation has order 6 (since it is a 6-cycle), a sixth application of that permutation will put the end-words back in their original order for a would-be seventh full stanza. It is also common to present the repetition of end-words as a spiral, as in Fig. 3. With the spiral labelled by the end-words of a previous stanza, the poet follows it starting from the bottom to obtain the end-words for the following stanza. If stanza B follows stanza A, then the final end-word from A becomes the first end-word in B, the first end-word from A becomes the second end-word in B, and so forth. For Elizabeth Bishop’s “Sestina,” this repetition structure is presented in Table 1.

Fig. 3 The defining spiral of the sestina, labelled according to the first stanza of Elizabeth Bishop’s “Sestina” (spiral labels: house, grandmother, child, stove, almanac, tears)

Table 1 Ordered end-words of Elizabeth Bishop’s “Sestina”

Stanza 1      Stanza 2      Stanza 3      Stanza 4      Stanza 5      Stanza 6
House         Tears         Child         Almanac       Stove         Grandmother
Grandmother   House         Tears         Child         Almanac       Stove
Child         Almanac       Stove         Grandmother   House         Tears
Stove         Grandmother   House         Tears         Child         Almanac
Almanac       Stove         Grandmother   House         Tears         Child
Tears         Child         Almanac       Stove         Grandmother   House
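The rows of Table 1 can be regenerated mechanically from the spiral rule (last end-word, then first, then second-to-last, then second, and so on). The Python sketch below is one way to do so; the function name `spiral` is hypothetical, and the code is offered as an illustration of the rule rather than as a standard implementation.

```python
def spiral(words):
    """Return the next stanza's end-words: last, first, second-to-last, ..."""
    result = []
    lo, hi = 0, len(words) - 1
    while lo <= hi:
        result.append(words[hi])      # take from the end
        if lo < hi:
            result.append(words[lo])  # then from the start
        lo += 1
        hi -= 1
    return result

FIRST_STANZA = ["house", "grandmother", "child", "stove", "almanac", "tears"]

# Apply the spiral five times to obtain stanzas 2 through 6.
stanzas = [FIRST_STANZA]
for _ in range(5):
    stanzas.append(spiral(stanzas[-1]))

for number, stanza in enumerate(stanzas, start=1):
    print(f"Stanza {number}: {', '.join(stanza)}")
```

Applying `spiral` once more to the sixth stanza returns the end-words to their original order, which is the order-6 property noted above.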

Because the spiral permutation in Fig. 3 produces the 6-cycle (1 2 4 5 3 6), each of the end-words in “Sestina” appears precisely once in each of the six possible line positions across the six stanzas. But analogous spiral permutations (see Fig. 5) of other numbers of words will not necessarily force the end-words to appear in every possible line position exactly once. As an example, the reader is encouraged to attempt to construct a hypothetical “quadina” from “We Shall Find the Cube of the Rainbow” (from section “The Oulipo and Raymond Queneau”) by following the analogous spiral permutation on the four end-words shown in Fig. 4. To form the second stanza of the proposed “quadina,” the required end-words, spiraling from bottom to top, will come in the following order: “out,” “Rainbow,” “conjecture,” and then “doubt.” The end-words’ positions in subsequent stanzas are shown in Table 2, and it is clear that not every end-word appears in every possible position. In particular, “conjecture” would appear at the end of the third line of every stanza, meaning none of the other end-words can ever appear there. So a four-stanza poem analogous to the sestina cannot be constructed from a similar spiral permutation. This example raises the question: for which numbers of end-words can one write a sestina-like poem by using an appropriate spiral permutation of the end-words?

D. May

Fig. 4 A spiral for a would-be “quadina,” labelled according to Emily Dickinson’s “We Shall Find the Cube of the Rainbow” (end-words in line order: Rainbow, doubt, conjecture, out)

Table 2 “Quadinas” do not exist

          Line 1   Line 2   Line 3      Line 4
Stanza 1  Rainbow  Doubt    Conjecture  Out
Stanza 2  Out      Rainbow  Conjecture  Doubt
Stanza 3  Doubt    Out      Conjecture  Rainbow
Stanza 4  Rainbow  Doubt    Conjecture  Out

While the first published results on this question were produced by Raymond Queneau, there was at least one earlier attempt at generalizing the sestina. In 1904, Algernon Charles Swinburne published “The Complaint of Lisa.” He labelled this poem a “double sestina,” and it consists of twelve 12-line stanzas, in which the 12 end-words are permuted from stanza to stanza so that they appear in various line positions. The poem and a discussion of its form appear in (Birken and Coon, 2008), and as they point out, the poem isn’t a true generalization of the sestina. In fact, for reasons Swinburne perhaps couldn’t have known (but which will be discussed later in this section), a 12-stanza generalization of the sestina is not possible (at least not without significant changes to the poem’s form). Upon close inspection, Swinburne does not even use a consistent permutation to permute end-words in moving from one stanza to the next. The end-words of the first stanza are permuted according to

(1 2 7 6 11 4 5 12) (3 9) (8) (10)

to obtain the second stanza’s end-words, but the permutation

(1 2 6 5 4 11 3 10 9 8 12) (7)

is used in moving from the second stanza to the third. Not only are these permutations unequal, they do not even have the same cycle structure, and neither is a 12-cycle. (He similarly uses different permutations in moving from stanza to stanza

Table 3 Queneau numbers, hand-checked by Queneau, 1963

 1   2   3   5   6   9  11  14
18  23  26  29  30  33  35  39
41  50  51  53  65  69  74  81
83  86  89  90  95  98  99

in “Sestina,” a poem which at first glance does resemble a proper sestina.) But he shouldn’t be judged too harshly, for his work was completed over a half-century before it was proven that a “double sestina” cannot exist.

One other poem that on the surface might look like a generalized sestina is “Pentatina for Five Vowels” by Campbell McGrath. Not only does the poem’s title invoke the sestina, but the poem itself contains five stanzas of five lines each. But in fact the end-words are not repeated at all (rather, they adhere to a strict rhyme scheme). As will be shown, though, it is indeed possible to write a five-stanza version of a sestina if one is so inclined.

These examples (which are not generalized sestinas) notwithstanding, a poem analogous to the sestina but with some number of end-words other than six is often called a “quenina” in honor of Raymond Queneau. They are also sometimes referred to as “n-inas,” where n is the number of end-words, and thus also the number of stanzas and of lines per stanza. If such an “n-ina” exists, n is called a Queneau number (OEIS sequence A054639). In 1963, Queneau published a list (see Table 3) of all Queneau numbers less than 100, which he found by hand calculations.

The general spiral permutation presented in Fig. 5 is often labelled as δn (Saclolo, 2011). As we’ve seen, δ6 = (1 2 4 5 3 6), and Table 2 suggests that δ4 = (1 2 4) (3). Comparing the unsuccessful quadina-generating δ4 to the sestina-generating δ6, the most obvious difference is that δ4 is composed of two disjoint cycles, whereas δ6 is a single 6-cycle. And this is the key feature required of a quenina-generating δn. That is, δn must be an n-cycle, and not simply a permutation of order n. For example, the alphabetizing permutation (1 4 5) (2 3) (6) considered above has order 6 (the least common multiple of its cycle lengths). But this alphabetizing permutation would not permute each end-word into every possible position.
If applied to the first stanza of Elizabeth Bishop’s “Sestina,” the word “tears” would appear at the end of the sixth line in every stanza. So the important feature of the defining spiral permutation is that it is a 6−cycle, not that it has order 6. So, the relevant question is: for which n−values is δn an n−cycle? In order to answer this question, a formal definition of the permutation δn is required. In reading off the end-words starting from the bottom in the generalized spiral in Fig. 5, notice that end-word 1 appears second, end-word 2 appears fourth, end-word 3 appears sixth, end-word 4 appears eighth, and so forth. This is because each time an end-word from the first half (or so) of the previous stanza is encountered, the number of end-words encountered so far will be twice its line position. This would require δn (x) = 2x, but there is an obvious problem that prevents this formula from completely describing δn : eventually 2x will be greater


Fig. 5 The spiral nature of permutation δn (the spiral passes through end-word 1, end-word 2, end-word 3, . . . , end-word n − 2, end-word n − 1, end-word n)

than n, the number of lines available in a stanza. Thus, for end-words with line positions after the halfway line of the previous stanza, doubling will not work. In the subsequent stanza, the odd line positions have not yet been filled in, and it’s the second half of the previous stanza’s end-words that have yet to be used. Specifically, the previous stanza’s end-word n should become the subsequent stanza’s end-word 1, the previous stanza’s end-word n − 1 should become the subsequent stanza’s end-word 3, and in general the previous stanza’s end-word n − i should become the subsequent stanza’s end-word 2i + 1. This is accomplished by mapping x from the second half of the previous stanza to 2(n − x) + 1 in the subsequent stanza. Thus, the full description of δn is

\[
\delta_n(x) =
\begin{cases}
2x & \text{if } 2x \le n, \\
2(n - x) + 1 & \text{otherwise.}
\end{cases}
\]

It is easy to use this definition to check that δ4 is indeed (1 2 4) (3), so clearly not a 4−cycle, preventing the existence of the quadina. As further examples, this definition gives δ5 = (1 2 4 3 5) and δ7 = (1 2 4 7) (3 6) (5), demonstrating why 5 appears in Table 3 but 7 does not. The following was one of the first general results on Queneau numbers, which excluded certain integers. Often referred to as Queneau’s Lemma, it was conjectured by (Queneau, 1963) and later proved by (Bringer, 1969).
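The checks in the preceding paragraph can be reproduced with a short Python sketch. The function names `delta`, `cycles`, and `is_queneau` are hypothetical helpers introduced here for illustration; they implement the two-case rule for δn (double x when 2x ≤ n, otherwise send x to 2(n − x) + 1) and test by brute force whether δn is a single n-cycle.

```python
def delta(n, x):
    """Spiral permutation: the end-word in line position x moves to position delta(n, x)."""
    return 2 * x if 2 * x <= n else 2 * (n - x) + 1

def cycles(n):
    """Disjoint-cycle decomposition of delta_n as a list of tuples."""
    seen, result = set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = delta(n, x)
        result.append(tuple(cycle))
    return result

def is_queneau(n):
    """True when delta_n consists of a single n-cycle."""
    return len(cycles(n)) == 1

print(cycles(4))  # [(1, 2, 4), (3,)]
print(cycles(5))  # [(1, 2, 4, 3, 5)]
print(cycles(7))  # [(1, 2, 4, 7), (3, 6), (5,)]
print([n for n in range(1, 100) if is_queneau(n)])
```

The last line lists every n below 100 for which the spiral permutation is an n-cycle, which can be compared against Table 3.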


Lemma. If n = 2xy + x + y for x, y positive integers, then n is not a Queneau number.

Proof. Let x and y be positive integers, and suppose n = 2xy + x + y = y(2x + 1) + x, so 2x + 1 < n. Suppose m(2x + 1) is an arbitrary multiple of 2x + 1 which is less than or equal to n. (Larger multiples of 2x + 1 are not relevant here, since the poem in question only contains n end-words.) If 2m(2x + 1) ≤ n, then δn(m(2x + 1)) = 2m(2x + 1). On the other hand, if 2m(2x + 1) > n, then

δn(m(2x + 1)) = 2(n − m(2x + 1)) + 1 = 2(y(2x + 1) + x − m(2x + 1)) + 1 = (2x + 1)(2y − 2m + 1).

Thus all multiples of 2x + 1 map to multiples of 2x + 1 under δn. So in order for δn to be an n-cycle, each of 1, 2, . . . , n (every possible image) must be a multiple of 2x + 1. But for a positive integer x we have 2x + 1 ≥ 3, so this is impossible (1, for instance, is not a multiple of 2x + 1), and n is not a Queneau number.

Also in her 1969 paper, Bringer proved a necessary condition for an integer to be a Queneau number. The following and preceding proofs are essentially the presentations of Bringer’s proofs that appear in (Asveld, 2013).

Theorem. If n is a Queneau number, then 2n + 1 is prime.

Proof. The contrapositive will be proved. If 2n + 1 is composite, then it can be expressed as the product of two odd integers since it is odd itself. So 2n + 1 = (2x + 1)(2y + 1) for x, y positive integers. Thus n = 2xy + x + y, which, by Queneau’s Lemma, means that n is not a Queneau number.

This theorem provides another reason why quadinas do not exist: 2(4) + 1 is not prime. It also dooms Swinburne’s “double sestina” to failure, since 2(12) + 1 is also composite. Bringer also proved several other partial characterizations of Queneau numbers. Unfortunately, none of her necessary conditions were also sufficient, and it wasn’t until nearly four decades later that the matter of Queneau numbers was completely solved. In 2008, Jean-Guillaume Dumas proved the following (Dumas, 2008).

Theorem. Suppose n is an integer with p = 2n + 1 prime. The integer n is a Queneau number if and only if one of the following holds:
• 2 is of order 2n in F_p
• n is odd and 2 is of order n in F_p

In the theorem, F_p is the finite field of order p, often written as Z_p or Z/pZ, meaning the integers modulo p. Dumas’ proof contains several cases and subcases, and is not presented here. However, the following examples demonstrate how the theorem guarantees that δ6 is a 6-cycle, but δ8 is not an 8-cycle. Both of these n-values are even with 2n + 1 prime, so it must be the first condition in the theorem that determines whether or not they are Queneau numbers.


In general, Fermat’s little theorem states that 2^p ≡ 2 mod p for primes p, so 2^{p−1} ≡ 1 mod p when p is odd. In the case here where 2n + 1 = p, this means that 2^{2n} ≡ 1 mod p. In the case of n = 6, it is easy (but perhaps tedious) to verify that no smaller power of 2 is congruent to 1 modulo 13, which forces 6 to be Queneau. In the case of n = 8, note that 2^8 ≡ 1 mod 17, so 2 is of order 8. According to Dumas’ theorem, this is too small for 8 to be Queneau.

The second case of the theorem states that the only situation in which a Queneau number n can require 2 to have anything less than its maximal possible order modulo p is when n is odd. In the case of n = 3, 2(3) + 1 is prime, and so the order of 2 in F_7 determines whether 3 is Queneau or not. As before, Fermat guarantees that 2^6 ≡ 1 mod 7, but 6 is not the order of 2 modulo 7: note that 2^3 ≡ 1 mod 7 as well. This is permitted by the second condition of the theorem, since n = 3 is odd, so indeed 3 is a Queneau number.

If one wants to write a quenina for some large value of n, Dumas’ rather complicated characterization of Queneau numbers is the best known way to determine if this is possible. That is, one has to first check that 2n + 1 is prime (which is easy unless n is very large) and then calculate the order of 2 modulo 2n + 1 (which could take a while even if n is relatively small).

Other generalizations of sestinas have been suggested by changing the permutation of the end-words, for example, by changing the 2 in the definition of δn to a 3 (Saclolo, 2011). This idea is also mentioned in the Oulipo Compendium; essentially, it seems to be a matter of simply choosing some n-cycle to produce a quenina-like poem of the desired length. In fact, any n-cycle should suffice if the only goal is to ensure each of the n ending words will appear at the end of each of the n line positions precisely once over the poem’s n stanzas. For non-Queneau numbers, of course, this n-cycle cannot be the traditional spiral permutation.
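The two-step test described above (check that 2n + 1 is prime, then compute the order of 2 modulo 2n + 1) can be sketched as follows. This is a naive illustration rather than an efficient implementation, and all function names are assumptions introduced here. A brute-force check of the spiral permutation is included so the criterion can be compared with the definition directly.

```python
def is_prime(m):
    """Trial-division primality test; adequate for small m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def order_of_two(p):
    """Multiplicative order of 2 modulo an odd prime p."""
    k, power = 1, 2 % p
    while power != 1:
        power = (power * 2) % p
        k += 1
    return k

def dumas_says_queneau(n):
    """The criterion as stated in the theorem above."""
    p = 2 * n + 1
    if not is_prime(p):
        return False
    k = order_of_two(p)
    return k == 2 * n or (n % 2 == 1 and k == n)

def spiral_is_n_cycle(n):
    """Brute force: follow position 1 under delta_n and count the steps back to 1."""
    x, steps = 1, 0
    while True:
        x = 2 * x if 2 * x <= n else 2 * (n - x) + 1
        steps += 1
        if x == 1:
            return steps == n

print(order_of_two(13), order_of_two(17), order_of_two(7))  # 12 8 3
print([n for n in (3, 4, 6, 8, 12) if dumas_says_queneau(n)])  # [3, 6]
```

Under these assumptions the two tests can be compared over any range of n, for example with `all(dumas_says_queneau(n) == spiral_is_n_cycle(n) for n in range(1, 200))`.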

Poetic Enumeration

Though less mathematically complicated than the generalization of the sestina, many poetic forms involve the enumeration of syllables, words, lines, or stanzas. Sarah Glaz’s 2016 paper (Glaz, 2016) provides an excellent survey of poetic forms of this type, and she describes such poetry as being structured by sequences of positive integers. This section will present some examples of forms where enumeration is the defining feature.

Syllables per Line

The first several examples presented are defined by the number of syllables contained in a given line of poetry. Perhaps the most famous such example is the haiku. The following poem by Ron Padgett summarizes the popular conception of the syllabic definition of the form (and is an excellent example of Roubaud’s first principle described in section “The Oulipo and Raymond Queneau”).


“Haiku”
Ron Padgett, 1995

First: 5 syllables
Second: 7 syllables
Third: 5 syllables

However, the syllabic strictures of the haiku are traditionally not as rigidly followed as is sometimes imagined, and some have even called this syllable restriction an “urban myth.” Small changes to a haiku’s syllable count recall Koeneke’s statement (cited in section “Introduction”) that the composition of a poem is never as simple as merely following a set of instructions. In fact, a poem with 5 syllables in its first line, 7 syllables in its second line, and 6 syllables in its third and final line could be taken as an example of the Oulipian clinamen.

It seems natural to ask whether “Haiku” was composed by Padgett or discovered by him, and this question is reminiscent of the long-standing debate over whether mathematics is discovered or invented. This debate is among the themes explored in Amy Uyematsu’s poem “The Invention of Mathematics,” which includes the following lines about positive integers.

from “The Invention of Mathematics”
Amy Uyematsu, 2005

/ the imaginary number i my students don’t get the joke after all, every number is imaginary even those we count out as beads and stones and miles to the sun

“The Invention of Mathematics” also describes another traditional syllable-restricted form of Japanese poetry, the tanka. The tanka extends the syllable count per line from the sequence (5, 7, 5) to the sequence (5, 7, 5, 7, 7). In Uyematsu’s poem (which is not itself a tanka), the syllable-count restrictions of the tanka are presented as a sort of musical rhythm. Daniel Tammet (Tammet, 2012) has noted that prime numbers appear prominently in both the haiku and the tanka. In addition to the abundance of fives and sevens in their defining sequences, each form also contains a prime number of lines and a prime number of total syllables. More specifically, Tammet notes that the tanka’s total syllable count is both a Mersenne prime and a twin prime (since the total syllable count of a tanka, 31, is 2 more than another prime number). One could also point out that the haiku’s total syllable count is a twin prime as well.

Haiku have sometimes been used as building blocks for longer mathematical poems. Glaz’s “Reflection about the t-axis” does this and also employs a visual symmetry on the page. The poem “adore” by the author of this chapter is composed of haiku and is also an example of a Fano plane poem (which will be discussed in more detail in section “Incidence Geometry Poetics”). The following poem by Daniel Mathews is also a sequence of haiku. He notes that the second line in the second haiku is pronounced “p squared on q squared is 2.”
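The arithmetic behind Tammet’s observation is easy to confirm. The following Python sketch uses names chosen here for illustration (`is_prime`, `forms`, `summary`) and simply checks the line counts and syllable totals of the two forms.

```python
def is_prime(m):
    """Trial-division primality test for small integers."""
    return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))

forms = {"haiku": (5, 7, 5), "tanka": (5, 7, 5, 7, 7)}

summary = {}
for name, counts in forms.items():
    total = sum(counts)
    # Twin prime: a prime that differs by 2 from another prime.
    twin = is_prime(total) and (is_prime(total - 2) or is_prime(total + 2))
    # Mersenne prime: a prime of the form 2**k - 1, so total + 1 is a power of 2.
    mersenne = is_prime(total) and ((total + 1) & total) == 0
    summary[name] = (len(counts), total, twin, mersenne)

for name, (lines, total, twin, mersenne) in summary.items():
    print(f"{name}: {lines} lines, {total} syllables, twin prime: {twin}, Mersenne prime: {mersenne}")
```

The haiku total of 17 pairs with 19, and the tanka total of 31 pairs with 29 and equals 2^5 − 1.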


“A proof that √2 is irrational”
Daniel Mathews, 2004

Suppose rational
Let fraction be p on q
hcf is 1

Square both sides and so
p²/q² = 2
then multiply out.

But then p’s even. . .
. . . But then q’s even! Bang! wow!
Like freakout! Pigs fly!

Woe, too much to take.
So now spare a moment few.
Poor Pythagoras

Dozens of other mathematically themed haiku are available in a 2018 issue of the Journal of Humanistic Mathematics.

Given the renown of the Fibonacci sequence (mentioned in section “The Oulipo and Raymond Queneau”), it is no wonder that mathematically minded poets have mined it for structure for years. Some scholars have suggested that structural references to the Fibonacci numbers appear in poetry as varied as Virgil’s “Aeneid,” Homer’s “Odyssey,” and the general form of the limerick (Birken and Coon, 2008). More recently, poets have used the Fibonacci sequence as a syllabic constraint in a variety of ways. In 2006, Gregory Pincus posted a six-line poem on his blog, in which each line contained the number of syllables equal to the corresponding Fibonacci number. He defined this form as the “Fib.” The popularity of the fib soared in subsequent months and years: the form received coverage in The New York Times and inspired The Fib Review, an online journal devoted solely to fibs and other Fibonacci-related poems.

As pointed out in (Birken and Coon, 2008), poems using the Fibonacci sequence as a guide for syllable count existed before the phenomenon of the Fib. In 1980, Michael Johnson published “Fibonacci Time Lines,” a poem in which the syllable count of each of the nine lines matches the corresponding Fibonacci number. And in 2002, Denis Garrison wrote a poem, “Nautilus,” in which not only do the first seven lines have syllable counts matching the corresponding Fibonacci numbers, but the last seven lines contain a number of syllables prescribed by the first seven Fibonacci numbers in descending order. The poem thus contains a sort of syllabic swell and ebb. There are many other poems which mirror the ascending syllable-count lines with corresponding descending syllable-count lines. Examples include Sherman Alexie’s “Requiem for a Pay Phone” and “BEAM: A Fibonacci Poem” by the author of this chapter. In “Momentary Time Travel,” Jennifer R. Shloming demonstrates this growth and contraction of syllables.


“Momentary Time Travel”
Jennifer R. Shloming, 2019

For
some
a smell
or a taste;
For me, it’s a song.
I close my eyes and I travel
to a memory.
Press repeat;
And stay;
And
Stay.
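The syllabic swell and ebb described above can be written down as a sequence of target syllable counts. The sketch below is illustrative only: the function names are hypothetical, and it assumes the Fibonacci numbers are indexed so that the sequence begins 1, 1, 2, 3. The `repeat_peak` flag distinguishes a form like the one described for “Nautilus” (seven ascending lines followed by the same seven counts descending) from a single-peak variant.

```python
def fibonacci(k):
    """First k Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    seq = []
    a, b = 1, 1
    for _ in range(k):
        seq.append(a)
        a, b = b, a + b
    return seq

def swell_and_ebb(k, repeat_peak=False):
    """Ascending Fibonacci syllable counts followed by the same counts descending."""
    up = fibonacci(k)
    # With repeat_peak the largest count appears twice; otherwise once.
    down = up[::-1] if repeat_peak else up[-2::-1]
    return up + down

print(fibonacci(6))                        # a six-line Fib: [1, 1, 2, 3, 5, 8]
print(fibonacci(9))                        # nine ascending lines
print(swell_and_ebb(7, repeat_peak=True))  # seven lines up, seven lines down
print(swell_and_ebb(6))                    # single peak of 8 syllables
```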

The following definition encompasses a variety of syllabic restrictions afforded by the Fibonacci sequence.

Definition. A Fibonacci poem is a poem in which the Fibonacci sequence is used to prescribe the number of syllables in each line of the poem.

Another simple form of syllabically enumerated poem is the syllable square.

Definition. A syllable square is an n-line poem in which each line contains n syllables.

Such poems go back centuries, and an n = 10 example from the late sixteenth century is “Square Poem in Honor of Elizabeth I.” More recently, JoAnne Growney published a collection of syllable squares for n = 4, 5, 6, and 7, related to mathematics and gender, in a 2019 issue of Math Horizons. The following n = 4 example is from that collection.

“Little Women”
JoAnne Growney, 2019

In school, many
gifted math girls.
Later, so few
famed math women!

The final writer mentioned in the discussion of syllabically constrained poems is Victor Hugo. In 1829, Hugo published “Les Djinns” in his collection Les Orientales. “Les Djinns” is a sequence of 15 stanzas of 8 lines each. In the first stanza, each line has 2 syllables, in the second stanza each line has 3 syllables, and so forth, until one reaches the seventh stanza, where each line has 8 syllables. In the eighth stanza, perhaps surprisingly, each line contains 10 syllables. Stanzas 9 through 15 then consist of decreasing numbers of syllables per line, so that the ninth stanza’s lines each contain 8 syllables, the tenth stanza’s lines each contain 7 syllables, and so on, until one reaches the fifteenth and final stanza, where each line consists once again of 2 syllables. (These syllable counts are based on the original French, and may not necessarily be reproduced faithfully in every translation.)


One could ask a mathematical question about “Les Djinns”: what function s(n) models the number of syllables per line in the nth stanza? Because of Hugo’s pre-Oulipian use of a clinamen, this is not as simple as it would be otherwise. If the eighth stanza’s lines each contained the expected 9 syllables, an absolute value function could be used to model the poem. Specifically, the function ŝ(n) = −|n − 8| + 9 would give the correct number of syllables ŝ(n) per line in stanza n, for n ∈ D = {1, 2, . . . , 15}. But because the middle stanza of Hugo’s poem does not follow the same pattern, the vertex of ŝ(n) doesn’t correctly count that stanza’s syllables. An additional “+1” must be added only to ŝ(8). One (perhaps rather daft) way to accomplish this with only elementary functions is to define s(n) = ŝ(n) + h(n), where

\[
h(n) = \frac{-1}{(7!)^2} \prod_{i=1}^{7} (n - i) \prod_{i=9}^{15} (n - i).
\]

In particular, h(n) is zero for all values of n in D except at the n = 8 stanza, where the function “turns on” and gives a value of 1. So at the n = 8 stanza, s(8) = ŝ(8) + h(8) = 9 + 1 = 10 as required. A shorter definition of s(n) could also be given using the Kronecker delta function: s(n) = ŝ(n) + δ_{8n}, where δ_{ij} is zero everywhere except at i = j, where the function takes the value 1. The reader is invited to devise their own functions consisting of combinations of absolute values and Kronecker deltas to set syllable-count restrictions for themselves in the spirit of “Les Djinns.”

To conclude this subsection, some other examples that could fit here are mentioned. The traditional sonnet is structured by the syllables-per-line sequence (a_i)_{i=1}^{14}, where a_i = 10 for all i. This is the syllabic constraint Queneau followed in his “Cent mille milliards de poèmes.” The traditional French alexandrine is governed by the syllable sequence (a_i)_{i=1}^{n}, where a_i = 12 for all i. Finally, the “pi-ku” is another syllable-per-line mathematical constraint that will be discussed later in this section in the context of other pi-related forms.
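The claims about s(n) can be checked numerically. In the sketch below, h(n) is read as −1/(7!)² times the product of (n − i) for i = 1, …, 7 and for i = 9, …, 15; that reading, and the Python function names, are assumptions made here for illustration. Exact rational arithmetic is used so that h(8) comes out as exactly 1.

```python
from fractions import Fraction
from math import factorial, prod

def s_hat(n):
    """Absolute-value model that would apply if stanza 8 had 9 syllables per line."""
    return -abs(n - 8) + 9

def h(n):
    """Correction term: 1 at n = 8 and 0 at every other n in 1..15."""
    left = prod(n - i for i in range(1, 8))    # i = 1, ..., 7
    right = prod(n - i for i in range(9, 16))  # i = 9, ..., 15
    return Fraction(-left * right, factorial(7) ** 2)

def s(n):
    """Syllables per line in stanza n of "Les Djinns" under the product formula."""
    return s_hat(n) + h(n)

def s_kronecker(n):
    """The same function written with a Kronecker delta at n = 8."""
    return s_hat(n) + (1 if n == 8 else 0)

counts = [int(s(n)) for n in range(1, 16)]
print(counts)  # [2, 3, 4, 5, 6, 7, 8, 10, 8, 7, 6, 5, 4, 3, 2]
```

Both definitions agree on the whole domain D, which is the point of the Kronecker delta shorthand.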

Words per Line and Latin Squares

This subsection presents a variant on the syllable square, which has appeared in a mathematically nontrivial way.

Definition. A word square is an n-line poem in which each line contains n words.

In a pair of papers, Lisa Lajeunesse explored various kinds of poetry puzzles, including one that involves word square poems (Lajeunesse, 2018, 2019). Lajeunesse first demonstrated how to scramble and unscramble preexisting word square poems by using Latin squares, and she then composed original poetry according to a pair of orthogonal Latin squares.

Fig. 6 A pair of orthogonal Latin squares on the sets {A, B, C} and {α, β, γ}

Aα  Bγ  Cβ
Bβ  Cα  Aγ
Cγ  Aβ  Bα

An order n Latin square is an n × n array in which each of n symbols appears precisely once in each row and each column. A pair of orthogonal Latin squares (also referred to as a single Graeco-Latin square) is a pair of Latin squares using different symbol sets in which each of the n^2 possible pairs of symbols (with one taken from each set) appears precisely once in the n^2 array positions. A pair of orthogonal Latin squares of order 3 is shown in Fig. 6.

Lajeunesse constructs a Graeco-Latin square poem as follows. A single word is placed in each position of the array, and two stanzas are then recovered: one for each symbol set used to produce each individual Latin square. Thus for the n = 4 case, Lajeunesse obtains a two-stanza poem, and each stanza is a 4-line word square containing the same total 16 words. Writing something coherent and effective from such a restrictive device, as Lajeunesse has done, is quite a feat, and her paper includes some welcome advice to anyone else attempting to write to this constraint.

And Lajeunesse’s ideas can be generalized. As it turns out, it is possible to have more than two Latin squares which are pairwise orthogonal to each other; such collections of arrays are called mutually orthogonal Latin squares, or MOLS. Among the many interesting results on MOLS is the fact that for prime powers n = p^e (for p a prime and e a positive integer), one can construct n − 1 MOLS of order n. (In fact, Fig. 6 is an example of this result, with n = p = 3 and e = 1.) Lajeunesse’s definition of a Graeco-Latin square poem could thus be extended to an n − 1 stanza poem in which each stanza is an n-line word square. However, composing a meaningful poem in such a way seems quite challenging indeed for any number of MOLS greater than 2.
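For prime n (the case e = 1), one standard way to obtain n − 1 mutually orthogonal Latin squares is to use the entries (k·i + j) mod n for k = 1, …, n − 1. The Python sketch below builds and checks such a family. It is offered as an illustration of the result quoted above, not as the construction used by Lajeunesse, and it does not cover prime powers with e > 1, which require finite field arithmetic. All function names are assumptions introduced here.

```python
from itertools import combinations

def latin_square(n, k):
    """For prime n and 1 <= k < n: entry (k*i + j) mod n in row i, column j."""
    return [[(k * i + j) % n for j in range(n)] for i in range(n)]

def is_latin(sq):
    """Every symbol 0..n-1 appears exactly once in each row and each column."""
    n = len(sq)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in sq)
    cols_ok = all({sq[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok

def are_orthogonal(a, b):
    """Superimposing a and b yields each of the n*n ordered pairs exactly once."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n

n = 5  # must be prime for this particular construction
squares = [latin_square(n, k) for k in range(1, n)]

print(len(squares), "squares of order", n)
print(all(is_latin(sq) for sq in squares))
print(all(are_orthogonal(a, b) for a, b in combinations(squares, 2)))
```

With n = 5 this would in principle support a four-stanza poem of the extended kind described above, one stanza per square.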

Lines per Stanza and Pi

Many poets pay close attention to the number of lines in a given stanza, and the mathematical constant pi has focused this attention in a surprising number of ways. Additionally, pi has provided other kinds of poetic structure for mathematically minded poets to explore. The reader will recognize the short integer sequence (3, 1, 4, 1, 5) as the first five digits of pi, and the Oulipo has made use of the fact that this sequence sums to 14, the


number of lines in a traditional sonnet. Both Jacques Bens and Harry Mathews have published so-called irrational sonnets, in which the 5 stanzas contain, respectively, 3, 1, 4, 1, and 5 lines. This idea has been extended by Peter Meinke to a poem with 7 stanzas in which the number of lines per stanza is determined by the sequence (3, 1, 4, 1, 5, 9, 2). And this idea has been applied to at least one other famous irrational number: JoAnne Growney has written a 6-stanza poem in which the number of lines per stanza is dictated by the sequence (1, 4, 1, 4, 2, 1).

Pi has been used in other ways to structure poetry, and the next several forms are all examples of what have been called “piems.” The first of these is the cadae, which combines syllables-per-line and lines-per-stanza constraints based on pi (Arndt and Haenel, 1998).

Definition. A cadae is a 5-stanza poem, in which the number of lines per stanza is given by the sequence (3, 1, 4, 1, 5). Further, the number of syllables per line of the poem’s 14 lines is given by the sequence (3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7).

The name “cadae” can be explained by looking at the relative position in the alphabet of each letter in the word, and the form takes advantage of the happy coincidence that no 0 appears in the first 14 digits of pi. Tony Leuzzi included a series of 33 cadae in his 2014 book The Burning Door. The untitled poems are collected in a section entitled “A Thing or Two,” which concludes with the following.

“Things stay green”
Tony Leuzzi, 2014

Things stay green
for
such a short time.

Time

is a lousy word
for poems that are all about it
and most
poems certainly are.

+

Let’s try this again:

things stay green
for such a short while
then turn disappear and return
in abundance as if “everywhere”
distracts the eye from “ever.”
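The explanation of the name “cadae” and the bookkeeping of the form can be verified directly. In this small Python sketch, the constant `PI_DIGITS` and the helper `letter_positions` are names introduced here for illustration; the digit string is the 14-digit sequence quoted in the definition.

```python
PI_DIGITS = "31415926535897"  # the 14 digits quoted in the definition of the cadae

def letter_positions(word):
    """Position in the alphabet of each letter (a = 1, b = 2, ...)."""
    return [ord(c) - ord("a") + 1 for c in word.lower()]

lines_per_stanza = letter_positions("cadae")
syllables_per_line = [int(d) for d in PI_DIGITS]

print(lines_per_stanza)                                   # [3, 1, 4, 1, 5]
print(sum(lines_per_stanza) == len(syllables_per_line))   # 14 lines in total
print(0 in syllables_per_line)                            # no line needs zero syllables
print(sum(syllables_per_line), "syllables in a complete cadae")
```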

Another type of piem is the pi-ku, a discussion of which would have also fit in the syllables-per-line subsection. Definition. A pi-ku is a three-line poem in which the first line contains three syllables, the second line one syllable, and the third line four syllables.


Fig. 7 A pi-ku by BLW McGrory, from Pi(e)-ku Poetry Zine Issue No. 3 (March, 2020)

Pi(e)-ku Poetry is a biennial poetry zine founded to fight hunger and food insecurity in the mid-Atlantic region and is available online and in print. The example in Fig. 7 was written by editor in chief BLW McGrory, and she points out that the traditional practice of pairing an image with haiku also works well with the pi-ku form.

Though not specifically a form of poetry, the constrained technique known as “pilish” also deserves mention here.

Definition. Pilish is a constrained form of writing in which the number of letters in each successive word matches the successive digits of pi.

Pilish texts are sometimes used by people who want to memorize digits of pi. The eight-digit mnemonic “May I have a large container of coffee?” appears to have been first recorded by Martin Gardner in 1959, and published examples exist at least as far back as 1905 (Arndt and Haenel, 1998). In 1995, Mike Keith wrote a poem which provides a 740-digit representation of pi. Keith’s poem is based on Edgar Allan Poe’s “The Raven” and begins as follows.

“Poe, E. Near a Raven”

Midnights so dreary, tired and weary.
Silently pondering volumes extolling all by-now obsolete lore.
During my rather long nap - the weirdest tap!
An ominous vibrating sound disturbing my chamber’s antedoor.
“This,” I whispered quietly, “I ignore.”
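A pilish text can be checked mechanically by comparing word lengths with the digits of pi. The following Python sketch is a minimal illustration under stated assumptions: the names `word_lengths` and `is_pilish` are hypothetical, apostrophes are ignored when counting letters, and no convention is implemented for the digit 0, which longer pilish texts need.

```python
import re

PI_DIGITS = "31415926535897932384626433832795"  # first 32 digits of pi

def word_lengths(text):
    """Letter counts of the words in text; punctuation and apostrophes are ignored."""
    words = re.findall(r"[A-Za-z'’]+", text)
    lengths = [len(w.replace("'", "").replace("’", "")) for w in words]
    return [n for n in lengths if n > 0]

def is_pilish(text):
    """True when the word lengths match the leading digits of pi (up to 32 words)."""
    lengths = word_lengths(text)
    if not lengths or len(lengths) > len(PI_DIGITS):
        return False
    return all(n == int(d) for n, d in zip(lengths, PI_DIGITS))

print(word_lengths("May I have a large container of coffee?"))  # [3, 1, 4, 1, 5, 9, 2, 6]
print(is_pilish("May I have a large container of coffee?"))     # True
print(is_pilish("Poe, E. Near a Raven Midnights so dreary, tired and weary."))  # True
```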


Perhaps the most famous example of a pilish song was composed by Andrew Huang in 2004. His song acts as a mnemonic for the first 101 digits of pi, and his performance of the song (complete with guitar accompaniment) has been viewed over 100,000 times on YouTube. The first few lines are as follows.

from “PI MNEMONIC SONG”
Andrew Huang, 2004

Man, I can’t, I shan’t, formulate an anthem where the words comprise mnemonics dreaded mnemonics for Pi.

Letters per Line

The final enumeration constraints mentioned in this section are based on the number of letters per line in a poem. The sequences (1, 2, 3, . . . , n − 1, n) and (n, n − 1, n − 2, . . . , 2, 1) have been used by Oulipians and others to construct, respectively, snowballs and melting snowballs.

Definition. In a snowball poem, each line contains one more letter than the previous line, and the first line consists of a one-letter word. Each line of a melting snowball poem has one fewer letter than the preceding line, ending with a one-letter word in the final line.

Sometimes poets combine a snowball and melting snowball to obtain a diamond-shaped poem structured by the sequence (1, 2, 3, . . . , n − 1, n, n − 1, . . . , 3, 2, 1). Victor Hugo’s “Les Djinns” is a form of diamond-shaped syllable snowball, and examples of various kinds of snowballs appear in the August 5, 2010, entry of Intersections.

Finally, the Oulipian practice of “going for the limit” pushes the enumeration constraints of this section to one extreme. François Le Lionnais has written poems consisting of a single word, and even a single letter, both of which can be found in the Oulipo Compendium.

Untitled
François Le Lionnais, 1961

T.
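The snowball definitions translate directly into a check on letters per line. The sketch below is illustrative: the function names are hypothetical, only alphabetic characters are counted as letters, and the four-line sample is an invented placeholder rather than a published poem.

```python
def letters(line):
    """Number of alphabetic characters in a line."""
    return sum(ch.isalpha() for ch in line)

def is_snowball(lines):
    """Line k (counting from 1) must contain exactly k letters."""
    return [letters(line) for line in lines] == list(range(1, len(lines) + 1))

def is_melting_snowball(lines):
    """A melting snowball is a snowball read from the last line to the first."""
    return is_snowball(lines[::-1])

def diamond(n):
    """Letters-per-line sequence 1, 2, ..., n, ..., 2, 1 of a diamond-shaped poem."""
    return list(range(1, n + 1)) + list(range(n - 1, 0, -1))

sample = ["I", "do", "not", "know"]  # invented placeholder lines
print(is_snowball(sample))           # True
print(is_melting_snowball(sample))   # False
print(diamond(4))                    # [1, 2, 3, 4, 3, 2, 1]
print(is_snowball(["T."]))           # True: a one-letter poem is a trivial snowball
```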

Pantoums and Platonic Solids

Like the sestina, the pantoum is built around a systematic pattern of repetition throughout the poem. Instead of repeated words, however, the poet repeats entire lines when moving from stanza to stanza in a pantoum. This section provides a


discussion of the pantoum, and a modification of the form leads naturally into an investigation of poetry structured by Platonic solids. Before formally defining the repetition in a pantoum, the reader is invited to look for patterns in the following pantoum by Sarah Glaz, inspired by the description of a famous theorem.

“A Pantoum for the Power of Theorems”
Sarah Glaz, 2017

The power of the Invertible Matrix Theorem lies in the connections it provides among so many important concepts. . . It should be emphasized, however, that the Invertible Matrix Theorem applies only to square matrices.
David C. Lay, Linear Algebra and Its Applications

The power of a theorem lies
In the connections it provides
Among many important concepts
Under a certain set of assumptions.

In the connections it provides
We are always able to find
Under a certain set of assumptions
Some that fell through the cracks.

We are always able to find
Neglected aspects of ourselves
Some that fell through the cracks
Left unexplored by mathematics.

Neglected aspects of ourselves
(The power of a theorem lies)
Left unexplored by mathematics
Among many important concepts.

As is evident from Glaz’s example, the pantoum is a sequence of quatrains in which some lines recur from stanza to stanza. This sequence can be any number of stanzas in length, and unlike in section “Poetic Enumeration”, there are no syllabic or word-count restrictions in play. While pantoums sometimes employ an ABAB rhyme pattern in the quatrains (Strand and Boland, 2000), pantoums also frequently adhere to no rhyme scheme at all. The pantoum is defined by the following prescribed line repetition.

Definition. A pantoum is a sequence of quatrains of indeterminate length in which the second and fourth lines of one quatrain become the first and third lines, respectively, of the following quatrain.

This definition requires the poet to begin by composing any quatrain, consisting (in order) of lines A, B, C, and D. The second stanza requires her to write two new lines of poetry, lines E and F, and to interweave them with lines B and D from the previous stanza. Thus, the second stanza will consist (in order) of lines B, E, D,


Stanza 1: Line A, Line B, Line C, Line D
Stanza 2: Line B, Line E, Line D, Line F
Stanza 3: Line E, Line G, Line F, Line H
Stanza 4: Line G, ???, Line H, ???

Fig. 8 The line repetition pattern of the pantoum, with the final stanza left blank to allow for various methods of completing the pantoum

and F. This pattern of repetition continues through the final stanza, however long it takes the poet to reach it; see Fig. 8.

It may not be completely clear how this pattern should end, and indeed different poets end their pantoums in different ways. The definition does not specify what should appear as the second and fourth lines of the pantoum’s final stanza. Some poets write new lines of poetry for these positions, which have not appeared anywhere previously in the poem. In that situation, lines A and C (from the first stanza) never reappear, and the new (second and fourth) lines of the final stanza, too, only show up once in the pantoum. Donald Justice’s “Pantoum of the Great Depression” is one such example.

A symmetric approach to the final stanza is perhaps more common, in which the as-yet unrepeated lines from the first stanza (lines A and C) make a repeat appearance, being the only two lines which have not yet appeared twice in the entire poem. Some poets prefer to repeat lines A and C as the fourth and second lines, respectively, of the final stanza. This is how Nellie Wong concludes her pantoum “Grandmothers’ Song.” In the case of a four-stanza pantoum (as in Fig. 8), this produces a final stanza with lines G, C, H, and A. This approach does have the nice property that the poem’s first and last lines will match.

However, this placement of lines A and C into the final stanza could be considered a bit mathematically inconsistent. The reader might expect a different placement of lines A and C if the pattern of arrows in Fig. 8 is extended to “wrap around” the poem from the last stanza to the first. That wrapped-around extension would produce an arrow between the second line of the final stanza and the first line of the opening stanza and also one between the last line of the final stanza and the third line of the opening stanza. This would produce lines G, A, H, and C in the fourth (and in this case final) stanza. Given that she is a mathematician, it is perhaps unsurprising that Glaz follows this more symmetric approach to line repetition in the final stanza.

Wrapping the repetition around the poem like this may be reminiscent of modular arithmetic. Or to take a more geometric analogy, one could imagine the stanzas written on the surface of a cylinder. Upon completion of the final stanza, the same line repetition pattern then naturally produces the first stanza. In that sense, a pantoum can be read as an infinitely repeating poem. The Oulipo has used the


term “cylinder” to describe a circular text like this, and one example of an infinitely repeating poem is discussed in section “Incidence Geometry Poetics”.

While the strictest definition of the pantoum requires the poet to repeat the lines verbatim from previous stanzas, some poets relax this requirement somewhat. In “Pantoum of the Great Depression,” Justice modifies some of his lines very slightly from stanza to stanza, substituting one word for another here or there. Carolyn Kizer uses the same techniques in her “Parents’ Pantoum,” and these poets’ subtle modifications to their repeat lines can be seen as an Oulipian clinamen. Whether the repetition is word-for-word, or blurred a bit, the repeated appearance in close proximity of lines can produce an arresting sort of cyclic meditation, as the reader (or listener) repeatedly encounters the same ideas and phrases, but situated in new contexts. This can be especially powerful when the pantoum is used as a reflection on the continuing nature of trauma, loss, or grief. Writing in The Washington Post in 2018, Tracy K. Smith said that the successful pantoum “creates an eerie sense of deja vu.” The repetitive structure of the pantoum can also be used as an aid to remember a poem, and the form was adapted into the English language (through France) out of the Malay language in the early nineteenth century (Strand and Boland, 2000).

Even though the form has been used to celebrate mathematical themes in verse, there is still work to be done to mathematically generalize the repetition pattern of the pantoum. It is clear that any even number of lines per stanza could produce a pantoum-like poem: simply repeating the even-numbered lines of the preceding stanza into the odd-numbered line positions of the following stanza will produce a cylindrical poem in the spirit of the pantoum. But a more robust exploration of generalized pantoums has apparently not been undertaken.
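The even-stanza generalization suggested above can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions: the function names `line_labels` and `pantoum_pattern` are invented here, lines are represented by letter labels as in Fig. 8, and the final stanza is closed with the "wrap around" convention (the odd-numbered lines of the first stanza fill the even-numbered positions of the last stanza). Poets who end their pantoums differently would need a different closing rule.

```python
from itertools import count
from string import ascii_uppercase


def line_labels():
    """Yield A, B, ..., Z, A1, B1, ... as placeholder names for new lines."""
    for n in count():
        yield ascii_uppercase[n % 26] + (str(n // 26) if n >= 26 else "")


def pantoum_pattern(stanzas, lines_per_stanza=4):
    """Return the line labels of a cylindrical pantoum-like poem, stanza by stanza.

    Assumption: the last stanza wraps around to the first, which is one
    of several endings poets actually use.
    """
    if stanzas < 2 or lines_per_stanza % 2:
        raise ValueError("need at least 2 stanzas and an even stanza length")
    fresh = line_labels()
    poem = [[next(fresh) for _ in range(lines_per_stanza)]]
    for s in range(1, stanzas):
        # Even-numbered lines (2nd, 4th, ...) of the previous stanza are repeated.
        repeated = poem[-1][1::2]
        if s == stanzas - 1:
            # Final stanza: reuse the odd-numbered lines of the first stanza.
            others = poem[0][0::2]
        else:
            others = [next(fresh) for _ in repeated]
        stanza = [None] * lines_per_stanza
        stanza[0::2] = repeated  # odd-numbered positions (1st, 3rd, ...)
        stanza[1::2] = others    # even-numbered positions (2nd, 4th, ...)
        poem.append(stanza)
    return poem


if __name__ == "__main__":
    for stanza in pantoum_pattern(4):
        print(" ".join(stanza))
```

With four quatrains the sketch ends on the stanza G, A, H, C described above, and with longer stanzas (for example `pantoum_pattern(3, 6)`) every line still appears exactly twice.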
Generalization of the pantoum notwithstanding, at least one mathematical variant does indeed exist: Enriqueta Carrington created a form she calls the “tetrahedral pantoum.” First appearing in the April 8, 2010, entry on Intersections, the tetrahedral pantoum was later outlined in detail on the blog Math with Bad Drawings in 2018. Like a traditional pantoum, the form is based on a prescribed repetition of lines from stanza to stanza. Unlike the traditional pantoum, the tetrahedral pantoum involves a tetrahedron. Definition. A tetrahedral pantoum is a four-tercet poem, which in total contains six distinct lines of poetry. Each poetic line corresponds to an edge of a tetrahedron, and each tercet corresponds to a face of the tetrahedron. The ordering of the lines in the poem is dictated by Fig. 9. To compose a tetrahedral pantoum, the poet starts with a tetrahedron. For aesthetic reasons, the poet might prefer to consider a regular tetrahedron, the Platonic solid consisting of four congruent equilateral triangular faces. The vertices are labelled A, B, C, and D, and the six edges of this solid are each inscribed with a distinct line of poetry. This inscription mostly determines the poem, but the writer is still required to make some decisions about how to start. Specifically, each of the four faces of the tetrahedron produces one three-line stanza consisting of the lines of poetry corresponding to the edges of that face.


Fig. 9 The tetrahedral pantoum, as depicted in Math with Bad Drawings

Thinking of the tetrahedron as a graph, the edges contained in a face are traversed in order to complete a cycle. The poet chooses one face which contains the lines she wishes to include in the first stanza and chooses an order for those lines (thus creating a cycle on that face). In pantoum style, the first line of the next stanza must be the middle line from the previous stanza, and the corresponding edge must be traversed in the same direction. Because each edge is contained in only two faces, this completely determines the next stanza and hence the entire poem. Figure 9 displays the mechanics for the line repetition in Enriqueta Carrington’s tetrahedral pantoum “The Goddess Works Her Loom.”

“The Goddess Works Her Loom”
Enriqueta Carrington, 2010

Until at last the pattern is fully there,
who can read the figures that she weaves?
Ixchel sits on her heels, a snake in her hair.

Who can read the figures that she weaves
as she murmurs a lullaby, spell, or prayer?
One mother rejoices, another one grieves,

as she murmurs a lullaby, spell, or prayer
while children drop like tears, or rain, or leaves.
Ixchel sits on her heels, a snake in her hair

while children drop like tears, or rain, or leaves
until at last the pattern is fully there
one mother rejoices, another one grieves.
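The stanza-to-stanza rule just described is mechanical enough to sketch in code. The following Python is a minimal illustration, not Carrington's own procedure: the names `tetrahedral_pantoum` and `number_lines` are invented here, and the starting cycle (C, A, B, C) is simply one possible choice of first face and opening edge. Each stanza is stored as three directed edges, and the middle edge of one stanza opens the next stanza on the other face that contains it.

```python
def tetrahedral_pantoum(first_cycle, vertices="ABCD"):
    """Return four stanzas, each a list of three directed edges.

    first_cycle gives three vertices (x, y, z) standing for the cycle
    (x, y, z, x), that is, the edges xy, yz, zx in reading order.
    """
    x, y, z = first_cycle
    stanzas = []
    for _ in range(4):
        stanzas.append([(x, y), (y, z), (z, x)])
        # The middle edge y -> z must open the next stanza, traversed in
        # the same direction, on the only other face containing it.
        (w,) = set(vertices) - {x, y, z}
        x, y, z = y, z, w
    return stanzas


def number_lines(stanzas):
    """Number the six poetic lines in order of first appearance."""
    numbers = {}
    numbered = []
    for stanza in stanzas:
        row = []
        for edge in stanza:
            key = frozenset(edge)  # an edge is the same line in either direction
            numbers.setdefault(key, len(numbers) + 1)
            row.append(numbers[key])
        numbered.append(row)
    return numbered


if __name__ == "__main__":
    for row in number_lines(tetrahedral_pantoum("CAB")):
        print(row)
```

Starting from (C, A, B, C), the numbering produced by this sketch agrees with the order in which the six lines of “The Goddess Works Her Loom” recur across its four tercets.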

In this example, Carrington has chosen to use ABC as the face determining the first stanza and edge CA as the poem’s opening line. This gives the cycle (C, A, B, C), corresponding to lines 1, 2, and 3 of the poem’s opening stanza. The line for edge AB (line 2) must now be the first line in the second stanza; thus ABD


is the face determining this second stanza. Because edge AB must be traversed from A to B as before, the cycle that produces the second stanza has to be (A, B, D, A). Thus the second stanza contains, respectively, lines 2, 4, and 5. The remaining two stanzas are produced similarly.

Generalizing the tetrahedral pantoum to larger Platonic solids is irresistible, but for now no examples of a (say) dodecahedral pantoum are known to exist. However, some poets have explored forms based on the most famous of Platonic solids, the cube. One such form was known as the cube puzzle and was developed by Herbert Schuldt. His cube, completed in 1981, is a physical cube which was cut into ten irregular pieces. Six of the pieces are pentahedral, and four of them are tetrahedrons. Each face of each piece is inscribed with a word, for a total of (6 × 5) + (4 × 4) = 46 words in any poem constructed from the cube. These words are then rearranged into a poem in some way that remains a bit opaque from the description in the Oulipo Compendium.

A more decipherable cube-based poetic form is the Rubik’s cube poem devised by Shelley Wood, which she wrote about on her website in detail in 2015. Wood selected 6 poems, each of which contained the word “puzzle,” and collected 43 distinct words in total from them. Repeating the word “puzzle” 6 times, “the” 4 times, “a” 3 times, and “you” twice, Wood obtained a list of 54 words, taking into account the words’ multiplicities. From this list, she then composed six short poems of nine words each, such as the following.

Unscrambled Rubik’s cube poem #1
Shelley Wood, 2015
a million birds solved the puzzle so unexpectedly simple

She then affixed each poem to one of the six faces of a Rubik’s cube and asked students in her creative writing class to scramble the puzzle. The consequences of this are evident: each scrambled position of the Rubik’s cube corresponds to six short poems of nine words each. Wood suggests that while many of the resulting nine-word poems were gibberish, there were some gems in the bunch, such as the poem shown in Fig. 10. Scrambled Rubik’s cube poem #2 Shelley Wood, 2015 his eyes behold the always puzzle a superhuman poem(s)

Fig. 10 A scrambled Rubik’s cube poem

Poetically, it would be an interesting challenge to find collections of 54 words which are most likely to produce more coherent poems. Mathematically, an interesting aspect of Wood’s Rubik’s cube poems is the reduction of poetic possibilities the puzzle imposes. Suppose, in the absence of the cube, a poet selected 54 distinct words (which isn’t quite the situation in Wood’s Rubik’s cube poems) and wanted to randomly compose a poem of nine words from them. The number of such poems possible would be P(54, 9) = 1,929,772,710,028,800. But the number of distinct poems allowed by the Rubik’s cube is much less than that. This number is equal to the number of possible configurations of a single face of the cube. To begin this count, notice there are 6 choices for the center square. Because a Rubik’s cube has 8 corner pieces, there are P(8, 4) ways to arrange 4 corner pieces onto the face in question, and each of these corner pieces has 3 orientations. Of the Rubik’s cube’s 12 edge pieces, there are P(12, 4) ways to arrange 4 of them on the face in question, and each of these edge pieces has 2 orientations. This gives $6 \times P(8, 4) \times 3^4 \times P(12, 4) \times 2^4$ distinct poems produced by a Rubik’s cube labelled with 54 distinct words, or 155,196,518,400 different poems. This is less than one hundredth of 1% of the nine-word poems allowed without using the Rubik’s cube.

Wood’s use of repeated words could be considered a type of clinamen, diverging from the stricter constraint of 54 unique words. Because of these repeats, the number of nine-word poems present in her cube is less than the number of configurations of a single face of the Rubik’s cube. As it turns out, Wood’s cube contains a mere 29,937,600 potential poems. Given all of this, one possible definition of a Rubik’s cube poem is as follows.

Definition. A Rubik’s cube poem is any nine-word poem composed from a scrambling of a Rubik’s cube labelled with 1 word in each of its 54 cells.
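The two counts for a cube labelled with 54 distinct words can be checked with a few lines of arithmetic. The sketch below uses Python's standard `math.perm` for the permutation counts P(n, k); the variable names are illustrative only, and the calculation does not attempt to reproduce the smaller figure quoted for Wood's own cube, which depends on her particular repeated words.

```python
from math import perm

# Nine-word poems drawn in order from 54 distinct words, with no cube.
unconstrained = perm(54, 9)

# Configurations of a single face: 6 centers, 4 of the 8 corner pieces
# with 3 orientations each, and 4 of the 12 edge pieces with 2 orientations each.
face_configurations = 6 * perm(8, 4) * 3**4 * perm(12, 4) * 2**4

ratio = face_configurations / unconstrained

if __name__ == "__main__":
    print(f"{unconstrained:,}")
    print(f"{face_configurations:,}")
    print(f"{ratio:.6%}")
```

The ratio comes out below 0.01%, in line with the comparison made in the text.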


Of course this definition is not very robust in the sense that any collection of nine words could be a Rubik’s cube poem for some labelling of the unscrambled cube. Wood’s Rubik’s cube poems could be extended to other cubes or cube-like twisty puzzles. She used the standard 3 × 3 × 3 variety, but n × n × n cubes are widely available for at least the set of n-values {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 17}. And virtual cubes can of course be constructed for any n. Additional physical Rubik’s cube-like puzzles also exist: the “pyraminx” is a tetrahedral twisty puzzle, and the “megaminx” is a dodecahedron. Several models of each of these exist, which vary the number of cells per face of the puzzle. It would be an interesting poetic exploration to see which of these puzzles are most amenable to Wood’s style of scrambled poetry. The last form briefly mentioned in this section is the villanelle, which is in some ways similar to the pantoum. Definition. A villanelle is a six-stanza poem, in which the first five are tercets. The first line of the first tercet appears at the end of the second and fourth tercets, and the third line of the first tercet appears at the end of the third and fifth tercets. The final stanza is a quatrain with the final two lines a repetition of the poem’s first and third lines, respectively. Further, the first and last lines of every tercet rhyme. Finally, the middle lines of each tercet also rhyme with each other and the second line of the concluding quatrain. Because of line repetition, a villanelle contains 13 distinct lines, as depicted in Fig. 11. Even further, the restrictive rhyme scheme of the villanelle dictates that lines A, C, D, F, H, J, and L all rhyme, while the remaining lines B, E, G, I, K, and M share a different rhyme. Famous twentieth-century villanelles were written by Dylan Thomas, Sylvia Plath, and Elizabeth Bishop. 
As with the pantoum, the line repetition pattern of the villanelle has evidently not been generalized to produce any longer villanelle-like poems. Still, the form remains irresistible to mathematical poets, and Gizem Karaali published “A Mathematician’s Villanelle,” an investigation into the origins of her mathematical life, in a 2018 issue of Math Horizons. (See also her chapter in this volume with Larry Lesser, Chap. 32, “Mathematics and Poetry: Arts of the Heart”.)

Stanza 1   Stanza 2   Stanza 3   Stanza 4   Stanza 5   Stanza 6
Line A     Line D     Line F     Line H     Line J     Line L
Line B     Line E     Line G     Line I     Line K     Line M
Line C     Line A     Line C     Line A     Line C     Line A
                                                       Line C

Fig. 11 The line repetition pattern of the villanelle
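The villanelle's definition can likewise be turned into a small generator that reproduces the layout of Fig. 11 and confirms the line and rhyme counts. This Python sketch is an illustration only; the function names `villanelle_pattern` and `rhyme_groups` are invented here, and the lines are represented by the letter labels A through M used in the figure.

```python
def villanelle_pattern():
    """Return the six stanzas of a villanelle as lists of line labels."""
    fresh = iter("ABCDEFGHIJKLM")
    first = [next(fresh) for _ in range(3)]  # A, B, C
    refrains = (first[0], first[2])          # lines A and C
    poem = [first]
    for tercet in range(1, 5):
        # Tercets 2 and 4 end on line A, tercets 3 and 5 end on line C.
        poem.append([next(fresh), next(fresh), refrains[(tercet - 1) % 2]])
    # The closing quatrain ends with both refrains.
    poem.append([next(fresh), next(fresh), refrains[0], refrains[1]])
    return poem


def rhyme_groups(poem):
    """Split the labels into the two rhyme classes given by the definition."""
    a_rhyme, b_rhyme = set(), set()
    for stanza in poem:
        for position, label in enumerate(stanza):
            # Second lines share one rhyme; every other position shares the other.
            (b_rhyme if position == 1 else a_rhyme).add(label)
    return a_rhyme, b_rhyme


if __name__ == "__main__":
    poem = villanelle_pattern()
    for stanza in poem:
        print(" ".join(stanza))
    print(rhyme_groups(poem))
```

Flattening the result gives 19 lines in total, of which 13 are distinct, and the two rhyme classes match the lists of lines given above.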


Fundamental Theorem of Arithmetic Poetry

Similar to the discussion in section “Pantoums and Platonic Solids”, this section is devoted to a form which is based on a prescribed repetition of poetic lines. Although the form itself is rather arcane, fundamental theorem of arithmetic poems (or FTA poems) are derived from one of the most famous and widely used theorems in mathematics. The fundamental theorem of arithmetic is a statement about the factorization of positive integers. Specifically, it states that every composite integer greater than one can be factored as a product of prime numbers, where the factorization is unique up to the order of the prime factors. For example, the positive integer 360 is factored as 2 × 2 × 2 × 3 × 3 × 5, which is more succinctly written using exponents as $2^3 \times 3^2 \times 5$. The conventional representation of an integer’s prime factorization is to arrange the primes in increasing order from left to right. For some purposes, it is convenient to include the exponent 1 when appropriate (as in $360 = 2^3 \times 3^2 \times 5^1$), but FTA poems do not use the exponent 1. For prime numbers themselves, the prime factorization is trivial. Using the theorem, an FTA poem is defined as follows.

Definition. An FTA poem is a poem in which each prime-numbered line consists of a unique phrase. The composite-numbered lines are constructed according to the prime factorization of that line’s position, where phrases corresponding to the primes in that factorization (either in the base or the exponent) appear in order, connected by single words representing multiplication and exponentiation.

The definition does not make it clear whether the poem’s lines are numbered in increasing or decreasing order, and indeed both approaches have appeared in published examples. JoAnne Growney numbers her lines in increasing order from top to bottom in the following example, and the title is considered line 1.
“We Are the Final Ones”
JoAnne Growney, 2015

we breathe dirty air
coral reefs die
we breathe dirty air as we breathe dirty air
storms are extreme
we breathe dirty air and coral reefs die
climate change affects the poor first
we breathe dirty air as coral reefs die
coral reefs die as we breathe dirty air
we breathe dirty air and storms are extreme
we drive instead of walk
we breathe dirty air as we breathe dirty air and coral reefs die
drought is a serial killer
we breathe dirty air and climate change affects the poor first
coral reefs die and storms are extreme
we breathe dirty air as we breathe dirty air as we breathe dirty air
What will happen to the polar bears?
we breathe dirty air and coral reefs die as we breathe dirty air
trash piles grow
we breathe dirty air as we breathe dirty air and storms are extreme
coral reefs die and climate change affects the poor first

Table 4 Prime number phrases in JoAnne Growney’s “We Are the Final Ones”

Prime   Phrase
2       “we breathe dirty air”
3       “coral reefs die”
5       “storms are extreme”
7       “climate change affects the poor first”
11      “we drive instead of walk”
13      “drought is a serial killer”
17      “What will happen to the polar bears?”
19      “trash piles grow”

In this poem, Growney uses the word “and” to stand for multiplication and “as” for exponentiation, and the phrases used to represent each prime number are listed in Table 4. Thus, for example, line 5 (counting the title as line 1) of the poem is the phrase “storms are extreme.” The remaining lines are constructed from composite integers. For example, the 12th line is produced as $12 = 2^2 \times 3$, so that line contains the phrase for 2 (representing the base in $2^2$) followed by “as” (for exponentiation) followed by a repeated statement of the phrase for 2 (representing the exponent in $2^2$), then “and” (for multiplication), and finally the phrase for 3. In the situation where the prime factorization requires an exponent that is composite, the poet will simply use the line produced by that composite number from earlier in the poem. For example, the 16th line of “We Are the Final Ones” is composed from the prime factorization of 16, which is $2^4 = 2^{2^2}$. So this line is “we breathe dirty air as we breathe dirty air as we breathe dirty air.”

In the other variant of the FTA poem, the poet numbers their lines in decreasing order from top to bottom. Under this approach, lines with phrase repetition appear earlier in the poem than the line containing only that phrase. In this case, line 1 (the poem’s final line) may either include simply a punctuation mark, or remain blank. The first FTA poem, published in 1978 by Carl Andre, adhered to this decreasing line numbering. Another such example is Sarah Glaz’s “13 January 2009.” One of the poetic strengths of the FTA poem is that while the repetition of phrases is obvious to the reader, the pattern of that repetition remains hidden. So the reader comes to expect phrases to recur over and over, but does not know exactly which phrases to expect, and when.
The overall effect can be haunting and ominous, as in “We Are the Final Ones.” Because every other integer is even, the phrase used for 2 will reappear quite frequently: as a base, it appears in precisely every other poetic line, always at the beginning of that line, so the phrase serves as a kind of anaphora for the poem. (The phrase can also surface as an exponent in an odd-numbered line, as it does in line 9 of “We Are the Final Ones.”)


Similar considerations, of course, must be given to the phrase standing for 3, which will recur approximately two-thirds as often as the phrase for 2. And each successive prime number phrase will be repeated less frequently than those introduced before it, so the poet is able to introduce more specific ideas that do not need to hold meaning in as many different contexts.

Some famous results and open questions about prime numbers have interesting consequences for FTA poems. Euclid provided a proof that there are infinitely many primes in Book IX of the Elements, so a poet will never stop introducing new phrases into an FTA poem, regardless of how long she writes. However, the reader will encounter these new phrases with decreasing frequency as a result of the prime number theorem, which describes how prime numbers become less and less frequent as they get larger. (A result depicted in Alice Major’s “The god of prime numbers” mentioned in section “Introduction”.) But it would take quite a long poem for the reader to notice this widening of the gap between newly introduced phrases.

There is a sort of ebb and flow to the action in “We Are the Final Ones.” After encountering mostly new phrases over the poem’s first several lines, repetition sets in and tends to produce a meditative state. This state is then occasionally punctured by the startling introduction of new phrases. The shock to the reader is accelerated in lines 11–13, with the introduction of two new phrases in three lines. This is because 11 and 13 are twin primes. Will the reader of a very long FTA poem ever stop encountering these poetic accelerations? The twin prime conjecture asserts that there are infinitely many twin primes, but it remains unproven.
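The construction rule for FTA poems in this section can be sketched as a short recursive procedure. The Python below is an illustrative reading of the definition, not a published algorithm: the names `prime_factorization`, `fta_line`, and `PHRASES` are invented here, the connecting words default to Growney's “and” and “as,” and the phrases are copied from Table 4. An exponent greater than 1 is rendered by recursively building the line for that exponent, which matches the treatment of composite exponents described for line 16.

```python
PHRASES = {
    2: "we breathe dirty air",
    3: "coral reefs die",
    5: "storms are extreme",
    7: "climate change affects the poor first",
    11: "we drive instead of walk",
    13: "drought is a serial killer",
    17: "What will happen to the polar bears?",
    19: "trash piles grow",
}


def prime_factorization(n):
    """Return [(prime, exponent), ...] with the primes in increasing order."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            exponent = 0
            while n % p == 0:
                n //= p
                exponent += 1
            factors.append((p, exponent))
        p += 1
    if n > 1:
        factors.append((n, 1))
    return factors


def fta_line(n, phrases, times="and", power="as"):
    """Build the text of line n (n >= 2) from the phrases for its primes."""
    parts = []
    for prime, exponent in prime_factorization(n):
        part = phrases[prime]
        if exponent > 1:
            # The exponent is itself rendered as the line for that number.
            part += f" {power} {fta_line(exponent, phrases, times, power)}"
        parts.append(part)
    return f" {times} ".join(parts)


if __name__ == "__main__":
    for n in range(2, 22):
        print(n, fta_line(n, PHRASES))
```

Run over lines 2 through 21, the sketch reproduces the body of “We Are the Final Ones” with the title counted as line 1. A longer poem would need a phrase for every further prime, both as a line number and wherever it occurs as a factor of an exponent.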

Incidence Geometry Poetics

In incidence geometry, one is concerned with two types of objects and an incidence relation between them. These types of objects are often referred to as “points” and “lines,” and in that case the incidence relation may be described in terms of a point lying on a line. This section presents two poetic forms based on different kinds of incidence geometry: graph theory and finite projective planes.

In a graph, the two types of incidence geometric objects are vertices and edges. Specifically, a graph is defined as a set of vertices and a set of edges, where each edge is a subset of cardinality two of the vertex set. A directed graph, or digraph, is a graph in which each edge has an orientation, so that movement is implied from one vertex to another. Working in collaboration with Courtney Huse Wika, the author of this chapter used the direction provided by certain digraphs as templates for multiple choice poetry (Huse Wika and May, 2017). One specific digraph used in this project was the balanced tournament graph shown in Fig. 12, which is obtained by assigning directions to the edges of the complete graph on five vertices. Poetry was composed over this digraph according to the following definition.

Definition. Given a digraph, a digraph poem is a poem in which one stanza or canto is written to correspond to each vertex of the digraph. Each edge from a vertex is represented with a line of poetry, which corresponds to a choice the reader must make at the end of that section of the poem.

Fig. 12 Balanced directed graph on five vertices (vertices labelled 1–5, edges labelled a–j)

Huse Wika’s poem “This Is Where You’ll Find Her” superimposes the stages of grief with the seasons in a year. In that poem, the following canto occupies vertex 4 in Fig. 12.

from “This Is Where You’ll Find Her”
Courtney Huse Wika, 2017
When there is nothing but the hymns of wind, even the jays are silent. This part is always the hardest: when she writes the same letter when the wind forces the pines to their knees when she tries to forgive her long winter.
g. She measures her days in resignation.
h. Winter breaks, and she is back to her beginning.

After reading this canto, the reader is asked to travel to the next section of the poem by choosing either edge g or edge h. Included with the poem is a table which assigns a poetic choice to each of the ten edges, including the choices listed in the preceding excerpt. “This Is Where You’ll Find Her” consists of at least 80 different readings, depending on the various choices the reader makes in navigating the poem (Huse Wika and May, 2017). In fact the poem may be read by starting at any vertex and is thus an example of an Oulipian cylinder. “This Is Where You’ll Find Her” is also infinite in the sense that the digraph contains cycles, so a reader can continue to reread sections multiple times if they wish. As a part of that project, the author of this chapter wrote multiple choice poetry over other digraphs, including the Hasse diagram of a three-element set. In that case, the digraph suggests a specific starting and ending point, and does not contain cycles. The resulting poem is thus finite and was published in 2020 in the Journal of Mathematics and the Arts. Further generalizations of this form to other digraphs await future poets.
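The way a reader moves through a digraph poem can be sketched as a walk driven by a sequence of edge choices. In the Python below, everything about the edge table is an assumption made for illustration: the heads and tails in `EDGES` are a hypothetical balanced tournament on five vertices and are not read off Fig. 12, although the labelling is chosen so that vertex 4 offers the choices g and h, as in the excerpt. The names `choices` and `read` are likewise invented here.

```python
# Hypothetical stand-in for the digraph of Fig. 12: each label maps to a
# directed edge (tail, head). The true edge directions may differ.
EDGES = {
    "a": (1, 2), "b": (1, 3),
    "c": (2, 3), "d": (2, 4),
    "e": (3, 4), "f": (3, 5),
    "g": (4, 5), "h": (4, 1),
    "i": (5, 1), "j": (5, 2),
}


def choices(vertex):
    """Labels of the edges leaving a vertex: the options offered after that canto."""
    return sorted(label for label, (tail, _) in EDGES.items() if tail == vertex)


def read(start, picks):
    """Follow a sequence of edge labels from a starting canto and return the cantos visited."""
    path = [start]
    for label in picks:
        tail, head = EDGES[label]
        if tail != path[-1]:
            raise ValueError(f"edge {label} does not leave canto {path[-1]}")
        path.append(head)
    return path


if __name__ == "__main__":
    print(choices(4))
    print(read(4, ["h", "a", "d"]))
```

In this stand-in graph the walk h, a, d returns to the canto where it began, which illustrates how cycles allow a reading to continue indefinitely. A digraph without cycles, such as a Hasse diagram, would force every reading to terminate.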


Fig. 13 The Fano plane

The second incidence geometric poetic form devised by Huse Wika and the author of this chapter is based on a finite projective plane (May and Huse Wika, 2015). Projective planes are incidence structures which adhere to a short set of straightforward axioms. Specifically, a projective plane is a set of points, along with a set of lines (which are each subsets of those points) which obey the following axioms.

Projective Plane Axiom 1: Given any two distinct points, there exists exactly one line containing both points.
Projective Plane Axiom 2: Given any two distinct lines, there exists exactly one point contained in both lines.
Projective Plane Axiom 3: There exist at least 4 points such that no 3 of them are contained in the same line.

The symmetry of the first two axioms means that not only does any pair of points determine a unique line (as in the familiar context of Euclidean geometry) but also that any pair of lines intersects in a unique point. That is, projective planes do not contain parallel lines. A finite projective plane is one with a finite set of points. The smallest example of a finite projective plane contains seven points and seven lines. Each line contains three points, each point is contained in three lines, and these symmetries are not coincidental. This smallest finite projective plane is known as the Fano plane and is depicted in Fig. 13. (In that figure, the “circle” g is merely one of the seven lines, containing points B, D, and F.) A Fano plane poem is defined as follows.


Definition. A Fano plane poem is a poem consisting of seven stanzas, each corresponding to one line of the Fano plane. The poem contains seven repeated key words, each corresponding to one point of the Fano plane. The incidence of the repeated words in the stanzas follows the projective plane axioms governing the Fano plane.

Some of the poetic consequences of this definition include that every pair of the repeated key words appears precisely once together in some stanza and that every pair of stanzas share precisely one key word. Oftentimes the poet writes to a sort of Fano plane template. The poet chooses seven key words, places them on the points of the Fano plane, and then composes seven stanzas of poetry to represent each of the plane’s lines. Figure 14 is such a template from a Fano plane poem written by Michelle Stampe. The following is an excerpt from that poem.

from “As It Is”
Michelle Stampe, 2015

You notice it caged behind the wire mesh,
Miraculously dry in the flood of rainwater,
And lean in close to study how the drops
Muddle its colors to grey.

Drops creating synapses between drops.

You glimpse it in puddles in the yard,
Altered and rippling in craters
Made from a grey sky and a nihilistic lawnmower.
You look up to see if nature’s an honest painter,
And are disappointed by her depiction.

Comparing the excerpt to Fig. 14, the key words “flood,” “rain,” and “grey” appear in the first stanza, and “alter,” “crater,” and “grey” appear in the second. These stanzas correspond to Fano plane lines d and e, respectively, in the labelling of Fig. 13. Because of the first projective plane axiom, the key words “flood” and “grey” must appear together in exactly one stanza of the poem (the first in the excerpt). The second projective plane axiom guarantees that every pair of stanzas will have precisely one key word in common; for the two excerpted stanzas, that word is “grey.” Stampe’s introduction of a single line between the excerpted stanzas is a device she employs elsewhere in the poem. While not a formal requirement of the Fano plane poem, these additional single lines serve to momentarily disrupt the flow of connections from stanza to stanza. These single lines provide a sort of clinamen, the kind of poetic disruption of form that has existed throughout many other forms of constrained writing. As with FTA poems, Fano plane poems feature repetition that is generally evident to the reader, but the underlying structure of this repetition is quite opaque. But the symmetry of connections afforded by the projective plane axioms provides a nice unification of the poem: every stanza is connected to every other stanza via a key word, and every key word appears close to every other key word exactly once.


Fig. 14 Fano plane poem template for the poem “As It Is”

Examples of Fano plane poems can be found in the 2016 and 2018 Bridges poetry anthologies and online in a 2016 issue of Talking Writing. The next largest projective plane contains 13 points and 13 lines, and in general a projective plane of order n has $n^2 + n + 1$ points and the same number of lines. Planes are known to exist for n-values which are powers of primes, but no projective plane poems are known to exist for order larger than 2 (the order of the Fano plane).
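The incidence properties that a Fano plane poem relies on can be verified by brute force. The Python sketch below assumes one particular labelling of the seven points: A, C, and E at the corners of the triangle, B, D, and F at the midpoints of its sides, and G at the center. Only the “circle” line containing B, D, and F is taken from the description of Fig. 13; the rest of the labelling, and the names `satisfies_axioms` and `plane_size`, are assumptions made for illustration.

```python
from itertools import combinations

POINTS = "ABCDEFG"
# Hypothetical labelling: three sides, three medians through the center G,
# and the circle through the midpoints B, D, F.
LINES = [frozenset(line) for line in ("ABC", "CDE", "EFA", "AGD", "CGF", "EGB", "BDF")]


def satisfies_axioms(points, lines):
    """Check the three projective plane axioms by exhaustive search."""
    # Axiom 1: every pair of points lies on exactly one line.
    axiom1 = all(
        sum(set(pair) <= line for line in lines) == 1
        for pair in combinations(points, 2)
    )
    # Axiom 2: every pair of lines meets in exactly one point.
    axiom2 = all(len(l1 & l2) == 1 for l1, l2 in combinations(lines, 2))
    # Axiom 3: some four points have no three on a common line.
    axiom3 = any(
        all(not set(triple) <= line
            for triple in combinations(quad, 3)
            for line in lines)
        for quad in combinations(points, 4)
    )
    return axiom1 and axiom2 and axiom3


def plane_size(n):
    """Number of points (and of lines) in a projective plane of order n."""
    return n * n + n + 1


if __name__ == "__main__":
    print(satisfies_axioms(POINTS, LINES))
    print(plane_size(2), plane_size(3))
```

For a poem, the points stand for the seven key words and the lines for the seven stanzas, so the first two checks are exactly the statements that every pair of key words shares one stanza and every pair of stanzas shares one key word.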

Summary and Concluding Remarks

Three categories of mathematical poetry were mentioned in the introduction. While this chapter has focused on mathematical form, several of the examples presented have also contained mathematical content. And so it goes: the distinction between various categories of mathematical poetry is often porous, or even nonexistent. Complicating matters further, it can be difficult to assess just how mathematical a certain poetic form is. Most of the forms discussed here included mathematically prescribed word patterns or repetition, or enumeration of a poem’s syllables, words, or lines. Some of these mathematical forms have been generalized, while others have yet to be. One major area not addressed here is poetry with a visually mathematical structure, important early examples of which appeared in the 1979 anthology Against Infinity. Various square-shaped poems did appear in section “Poetic Enumeration”, and triangle poems also exist. Both of these are examples of concrete poetry, and many mathematical examples abound. Mike Naylor’s “Decision Tree” is visually structured as a binary tree, and Brian Bilston’s poem “At the Intersection” is


both visually and thematically structured by a Venn diagram. Other poets make conspicuous use of mathematical notation in their work as a sort of visual structure, and the work of Kaz Maslanka straddles the line between poetry and the visual arts. The poetry of Larry Lesser also explores these types of connections. Poems have even been written onto physical Möbius strips. Another subject which deserves much more detailed examination is a history of mathematical poetic structure in languages other than English. Most of the original members of the Oulipo wrote primarily in French. The French alexandrine was mentioned only in passing, but not its Spanish cousin Alejandrino. And a discussion of the French rondeau probably belongs alongside any comments on pantoums and villanelles. A few traditional Japanese forms, briefly mentioned in section “Poetic Enumeration”, have been widely adapted into other languages. The Malaysian roots of the pantoum were mentioned, but the Malaysian form known as the empat perkataan, which is based on a long sequence of four-word lines, was not. Another form is the chronogram, which exploits the double linguistic and numerical meanings that some characters have in a given language. A tradition of chronograms exists in both Java (where the chronogram is known as the sengkalan) and Rome (where Latin chronograms abounded on buildings, coins, and other artifacts). And yet many other stones remain unturned. Lewis Carroll mixed mathematics and poetry throughout his career, and while his logic puzzles might not be considered poetry in the strictest sense, they do provide a nice bridge between logical reasoning and literature. Carroll is discussed in N. Schuler’s chapter in this volume “Lewis Carroll and Euclid: Parallels or Otherwise”. For his various incursions into mathematical poetry, Martin Gardner deserves more credit than the single mention he received here in section “Poetic Enumeration” for his pi-related wordplay.
Poets have invoked fractals in their poetic imagery, including Tatiana Bonch-Osmolovskaya. Not mentioned at all in this chapter is the hypertext poetry of Stephanie Strickland, or computer-generated poetry, which simultaneously explores new frontiers in mathematically composed poetry while also extending the very early explorations of Jonathan Swift in automatic writing mentioned in section “Early Examples of Mathematical Form”. Also not present is statistically motivated poetry, for example, the work of Eveline Pye, or poetry structured by matrix multiplication, which has been suggested by Carol Dorf. Despite these oversights, hopefully the reader has been inspired by the mathematical forms of poetry presented in this chapter. After learning a constraint, it is ultimately up to the writer to determine how strictly to follow it and when to drop in a clinamen as a sly wink to their readers.

Cross-References
Lewis Carroll’s Defense of Euclid: Parallels or Contrariwise
Mathematics and Poetry: Arts of the Heart
Two-Way Thermodynamics


Acknowledgments
The author thanks the following people and publishers for their permission to use the material found in this chapter. While many of Emily Dickinson’s poems remain fully or partially covered by copyright, “We Shall Find the Cube of the Rainbow” is now in the public domain, in all versions, and may be used freely, with proper citation to source. It is available in The Poems of Emily Dickinson: Reading Edition, edited by Ralph W. Franklin, Harvard University Press, 1999. Used by permission of Farrar, Straus and Giroux: “Sestina” from POEMS by Elizabeth Bishop. Copyright © 2011 by The Alice H. Methfessel Trust. Publisher’s Note and compilation copyright © 2011 by Farrar, Straus and Giroux. “Haiku” was published in Collected Poems, copyright © 2013, by Ron Padgett. Reprinted by permission of Coffee House Press. Amy Uyematsu, excerpt from “The Invention of Mathematics” from Stone Bow Prayer. Copyright © 2005 by Amy Uyematsu. Reprinted with the permission of The Permissions Company, LLC, on behalf of Copper Canyon Press, coppercanyonpress.org. “A proof that √2 is irrational” originally appeared on Daniel Mathews’s website in 2000 and is reprinted here by kind permission of the poet. “Momentary Time Travel” by Jennifer R. Shloming was originally published in 2019 in The Fib Review Issue #33 and is reprinted here by kind permission of the poet. “Little Women” by JoAnne Growney appeared in a 2019 issue of Math Horizons and is reprinted here with the kind permission of the poet. “Things stay green” is excerpted from “A Thing or Two: Cadae,” by Tony Leuzzi, from The Burning Door, Copyright © 2014, used by kind permission of Tiger Bark Press and the poet. The pi-ku and accompanying image were originally published in Pi(e)-ku Poetry, Issue No. 3, 2020. Poem reprinted with the kind permission of the poet BLW McGrory, Delaware Beat Poet Laureate (2019–2021), and Editor in Chief of the Pi(e)-ku Poetry Zine – bite-sized poems to help fight hunger. “Poe, E. Near a Raven” originally appeared on Mike Keith’s website in 1995 and is reprinted here by kind permission of the poet. “PI MNEMONIC SONG” by Andrew Huang originally appeared on his website in 2004 under the name “I Am The First Fifty Digits of Pi.” A recording of Huang performing the song is available on YouTube. Lyrics reprinted here by kind permission of the songwriter. “A Pantoum for the Power of Theorems” by Sarah Glaz appears in Ode to Numbers, published in 2017 by Antrim House, and is reprinted here by the kind permission of the poet. The images of the tetrahedral pantoum originally appeared in a 2018 post on the Math with Bad Drawings blog, where the work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. The images here are reproduced by the explicit and kind permission of Ben Orlin, creator of the blog. “The Goddess Works Her Loom” by Enriqueta Carrington originally appeared in a 2010 post on JoAnne Growney’s Intersections blog and appears here by kind permission of the poet. Shelly Wood’s Rubik’s cube poems, including a description of her process and photos, originally appeared on her website in 2015. The poems and photo appear here by kind permission of the poet and photographer. “We Are the Final Ones” by JoAnne Growney appeared in the Bridges Conference Art Exhibition Catalog, Baltimore, 2015, along with artwork by Allen Hirsh. Reprinted here with the kind permission of the poet. “This Is Where You’ll Find Her” by Courtney Huse Wika originally appeared in “The Poetics of a Cyclic Directed Graph” by Courtney Huse Wika and Dan May in the Bridges Conference Proceedings, Waterloo, Canada, 2017. Excerpt reprinted here with the kind permission of the poet. “As It Is” by Michelle Stampe originally appeared in “Galaxies Containing Infinite Worlds: Poetry from Finite Projective Planes” by Dan May and Courtney Huse Wika in the Bridges Conference Proceedings, Baltimore, 2015. Excerpt reprinted here by the kind permission of the poet.



Lewis Carroll’s Defense of Euclid: Parallels or Contrariwise

37

Natalie Schuler Evers

Contents
Introduction
Euclid and His Controversial Elements
Emergence of Non-Euclidean Geometries
Non-Euclidean Geometries and the Education System
Charles Dodgson: The Oxford Mathematician
Lewis Carroll’s New Approach to the Euclidean Debate
Geometric “Straight” Analogies
Defense of the Parallel Postulate
Carroll and Mathematics Examinations
Euclid and His Modern Rivals
Carroll’s Misunderstandings of Non-Euclidean Geometries
Conclusion: The Real Reason Carroll Fought for Euclid
References

Abstract
In the nineteenth century, mathematical discoveries were being made that radically changed the world of mathematics. These changes sprang from the claim that Euclidean geometry – as laid out in Euclid’s Elements – was flawed, inadequate, and perhaps contingent. In response to these claims, Lewis Carroll used his position as the writer of the famous Alice in Wonderland to publish Through the Looking Glass, in which he satirized what he considered the folly of teaching non-Euclidean geometries in schools. Some critics have noticed that Through the Looking Glass refers to contemporary mathematical ideas, but none have put it into context with the education reform movement, nor have they explored why Carroll included complex mathematics in what is typically considered a children’s story. I argue that not only was Carroll engaging in the debate, but many of the most memorable scenes and characters in Through the Looking Glass represent different elements in the Euclidean debate, and putting the story in context with the debate removes some of what has been attributed to “nonsense” from the text.

N. S. Evers, University of South Alabama, Mobile, AL, USA
© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_128

Keywords
Non-Euclidean geometry · Euclidean geometry · Lewis Carroll · Through the Looking Glass

Introduction
In the nineteenth century, mathematical discoveries were made that radically changed the world of mathematics. These changes sprang from the claim that Euclidean geometry was flawed, inadequate, and unnecessary. In response to these claims, Charles L. Dodgson – better known as Lewis Carroll – used his position as the writer of the famous Alice in Wonderland to publish Through the Looking Glass and later Euclid and His Modern Rivals, in which he satirized the introduction of non-Euclidean geometries into the education system. Although Carroll did not deny the validity of non-Euclidean geometries, he firmly believed that only Euclid should be taught in schools even though it relied heavily on complicated proofs. Some Carroll scholars have noticed that Through the Looking Glass refers to contemporary mathematical ideas and the Euclidean debate. Michael Lucibella (2013) remarks:

Riemann’s seminal contributions to geometry likely inspired Lewis Carroll when he wrote Through the Looking Glass. [Carroll] was a traditional Euclidean at heart; he liked his spaces flat. In many respects, the absurdity of the imaginary world he created for Alice mirrored the intellectual upheaval of late 19th century mathematics, in which scholars grappled with a topsy-turvy looking glass world filled with curved space and imaginary numbers.

Although Lucibella notices a connection between the mathematical debate and the publication of Through the Looking Glass, he never develops the idea. Bjørn Felsager suspected a connection between Through the Looking Glass and Minkowski geometry, but there is no indication that Carroll had any knowledge of Minkowski geometry. Felsager briefly mentions Through the Looking Glass in his article “Through the Looking Glass: A Glimpse of Euclid’s Twin Geometry: The Minkowski Geometry,” but his argument primarily pertains to Alice’s conversation with the Mock Turtle and the Gryphon concerning the Pythagorean Theorem in Alice in Wonderland (Felsager 2004, pp. 11–13). Other Carroll scholars, such as Martin Gardner, Francine F. Abeles, Amirouche Moktefi, and Edward Wakeling, to name a few, have extensively studied Carroll’s interest in the Euclidean debate, but they do not carry the argument further than his mathematical works and Euclid and His Modern Rivals. I argue that not only was Lewis Carroll engaging in the Euclidean debate in his mathematical works and Euclid and His Modern Rivals, but that many of the most memorable scenes and characters in Through the Looking Glass represent different elements of the debate as well.

Euclid and His Controversial Elements
Written circa 300 BC, Euclid’s Elements is composed of postulates, propositions, and complicated proofs that laid the groundwork for the study of mathematics for centuries, and it had been used to teach mathematics in European schools for over a thousand years. However, geometry as laid out in Elements contained partial proofs and flaws. Euclid’s Elements begins with five postulates:

1. To draw a straight line from any point to any point.
2. To produce (extend) a finite straight line continuously in a straight line.
3. To describe a circle with any center and distance (radius).
4. That all right angles are equal to one another.
5. That, if a straight line falling on two straight lines makes the interior angles on the same side less than two right angles, the two straight lines, if produced indefinitely, meet on that side on which are the angles less than two right angles. (Euclid and Heath 300 BC/1926, pp. 195–202)

Plainly stated,

1. Points and straight lines exist.
2. Lines can be infinitely long.
3. Circles exist.
4. All right angles are equal to each other.
5. Two lines that stay the same distance apart from each other never meet.
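The fifth postulate can also be written symbolically. The block below is only a sketch and is not taken from Euclid or Heath: the symbols $\ell$, $m$, $t$, $\alpha$, $\beta$, and $P$ are notation assumed here for illustration, with angles measured in radians so that two right angles equal $\pi$. The second statement is Playfair's axiom, a standard form that is equivalent to the fifth postulate given the other four and is closer in spirit to the plain restatement in item 5.

```latex
% Assumed notation (not in the source): a transversal $t$ crosses the lines
% $\ell$ and $m$, making interior angles $\alpha$ and $\beta$ on one side of $t$.
\[
  \alpha + \beta < \pi
  \;\Longrightarrow\;
  \ell \text{ and } m \text{ meet on the side of } t
  \text{ on which } \alpha \text{ and } \beta \text{ lie.}
\]
% Playfair's axiom: through a point $P$ not on $\ell$ there is exactly one
% line $m$ that never meets $\ell$.
\[
  P \notin \ell
  \;\Longrightarrow\;
  \exists!\, m \ \text{ such that } \ P \in m \ \text{ and } \ m \cap \ell = \emptyset .
\]
```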

All of these postulates had been accepted, but the fifth postulate was seen as problematic, and perhaps this is why Euclid waited to use it until later in Book I. The fifth postulate – more commonly known as the Parallel Postulate – was convoluted and ambiguous, yet its existence is necessary to prove more significant results later in Elements. Therefore, if the Parallel Postulate does not hold, later propositions in Elements that rely on the Parallel Postulate cannot be proven to be true. By the fifth century, the Parallel Postulate had already sparked controversy. Proclus (Proclus and Morrow 412–485/1970) notes in his commentary on the first book of Elements that “[the Parallel Postulate] ought to be struck from the postulates altogether” since it “invites many questions . . . and requires for its demonstration a number of definitions as well as theorems” (p. 150). He then adds:

But perhaps some persons might mistakenly think this proposition deserves to be ranked among the postulates on the ground that the angles’ being less than two right angles makes us at once believe in the convergence and intersection of the straight lines. To them Geminus has given the proper answer when he said that we have learned from the very founders of this science not to pay attention to plausible imaginings in determining what propositions are to be accepted in geometry. Aristotle likewise says that to accept probable reasoning from a geometer is like demanding proofs from a rhetorician . . . These considerations make it clear that we should seek a proof of the theorem that lies before us and that it lacks the special character of a postulate. (Proclus and Morrow 412–485/1970, p. 151)

Proclus was not the last to take issue with Euclid’s Parallel Postulate, but the only solution thus far had been to accept it as gospel and continue the attempt to verify it. However, this mindset changed in the nineteenth century. Some mathematicians, such as Carl Gauss, continued to either attempt to rectify Euclid’s Parallel Postulate or discredit Euclidean geometry entirely, but some, including János Bolyai and Nikolai Lobachevsky, created their own forms of geometry as alternatives.

Emergence of Non-Euclidean Geometries
Carl Friedrich Gauss was a renowned mathematician in the nineteenth century. Gauss expressed his misgivings concerning Euclid’s Elements, and said in a letter to a close friend later in life, “I have from time to time in jest expressed the desire that Euclidean geometry would not be correct” (Lucibella 2013). While Gauss knew there were flaws in Euclid’s teachings, it is rumored he remained quiet due to the popularized view of Immanuel Kant, who concluded that Euclidean geometry is “the inevitable necessity of thought” (O’Connor and Robertson 1996). In fact, despite all of Gauss’ works and discoveries throughout his life, he hid some of his discoveries, particularly those pertaining to his reservations towards Euclid’s Elements. Gray (2018) remarks:

[Gauss] began to doubt the a priori truth of Euclidean geometry and suspected that its truth might be empirical. For this to be the case, there must exist an alternative geometric description of space. Rather than publish such a description, Gauss confined himself to criticizing various a priori defenses of Euclidean geometry. It would seem that he was gradually convinced that there exists a logical alternative to Euclidean geometry . . . Gauss failed to give a coherent account of his own ideas.

Despite not publicly declaring his doubts about Euclid, Gauss shared his ideas with his friend and fellow mathematician, Farkas Bolyai, who made several failed attempts to prove the Parallel Postulate. Farkas Bolyai’s fascination with the Parallel Postulate was passed to his son, János Bolyai, with the warning “not to waste one hour’s time on that [Parallel Postulate]” (O’Connor and Robertson 1996). Undeterred by his father’s warning, János Bolyai spent years wrestling with the problem, and in the end his tinkering with the postulate gave rise to a new form of geometry. Bolyai called his new form of geometry “imaginary geometry.” In imaginary geometry, planes are not flat but curved or saddle-shaped; the conventional meaning of “straight” is completely disregarded; parallel lines are not actually parallel; and the angle sum of a triangle does not remain constant at 180°. In his geometry, the shortest distance between two points is not a straight line but a curved line, and therefore, parallel lines actually diverge from each other. In 1830, Bolyai wrote of imaginary geometry to his father and remarked, “Out of nothing I have created a strange new universe” (Mastin 2010). When Gauss learned of János Bolyai’s geometry, he said to a friend, “I regard this young geometer Bolyai as a genius of the first order” (O’Connor and Robertson 1996), but he told János Bolyai that he had reached similar conclusions years prior yet never published them (O’Connor and Robertson 1996).

Shortly after Bolyai introduced his imaginary geometry, Russian mathematician Nikolai Lobachevsky published results similar to Bolyai’s. Despite the similarities, Lobachevsky did not know Bolyai personally, nor did he base his geometry on Bolyai’s. Lobachevsky’s geometry also ignored the Parallel Postulate and was so similar to Bolyai’s form of geometry that they came to share the credit for imaginary geometry, which is now more commonly known as Bolyai-Lobachevskian geometry or hyperbolic geometry. Lobachevsky published his findings in 1830 like Bolyai, but they were not widely circulated until after Lobachevsky’s death in 1856.

In 1854, Bernhard Riemann, who studied under Gauss, gave a lecture titled “On the Hypotheses Which Lie at the Foundations of Geometry,” in which he openly questioned the soundness of Euclid. In the lecture, Riemann builds his argument on Gauss’ previous works and shows that Euclid’s assumption of constant curvature of lines and planes is incorrect. Once the mathematical community learned of the alternative approaches developed by Bolyai and Lobachevsky, and the flaws in Euclid’s ideas presented by Riemann, a revolution started in England.

Non-Euclidean Geometries and the Education System
After non-Euclidean geometries had circulated throughout the mathematical community, they permeated the education system as well. The education community in England in the nineteenth century was intrigued by non-Euclidean geometries, and the concept caught on soon after Legendre’s book Éléments de Géométrie had been adopted in France and a delegation was sent to Britain to analyze the Euclidean system of education. In the preface to Éléments de Géométrie, Legendre and Smith (1823/1867) state: “Geometry rests upon a few simple and self-evident truths; and from these, by the rigid processes of deduction, the student of mathematics is afforded a valuable mental discipline, which supplies an important corrective for some of the evils resulting from an exclusive devotion to analysis” (p. i). The “evils resulting from an exclusive devotion to analysis” is a sly way of referring to Euclid’s Elements and how it was being taught at the time.

In 1870, the Elementary Education Act was passed, which established compulsory education for all children between the ages of 5 and 13. That same year, Rawdon Levett introduced the idea of an “Anti-Euclid Association,” which was transformed into the Association for the Improvement of Geometrical Teaching (AIGT) in 1871. The AIGT’s desire was to reform the teaching of mathematics in primary schools as well as in universities throughout England. One of the biggest problems the AIGT noticed was that Euclid was considered a “very English approach to mathematics” (Price 1994). By “English,” the reformers were not referring to the country of England, but to the language: they were concerned that students being taught Euclid’s approach focused mainly on memorizing and writing complicated proofs and neglected the more practical applications of geometry. The Euclidean approach also caused a problem because examinations were rising in popularity at the primary school level as well as in universities, and proofs were tedious to grade. The concerns about the “English approach” and the push for more examinations, coupled with the popularized revelations that some of Euclid’s propositions were incorrect, led the AIGT to recommend the removal of the majority of Euclid’s teachings from the education system. It was at this point that Charles Dodgson entered the debate.

Charles Dodgson: The Oxford Mathematician
Charles Dodgson was a mathematics lecturer at Christ Church for 26 years, but his interest in mathematics did not stop inside the classroom. Martin Gardner (Gardner and Carroll 1974) remarks, “[Dodgson’s] lectures were humorless and boring,” but while he was a lecturer, Dodgson was introduced to the new dean, Henry Liddell, and Dodgson formed a special relationship with one of Liddell’s daughters, Alice. On top of being a respected member of the church and a brilliant mathematician, Dodgson had a love for children – particularly girls – and he was also a gifted storyteller. Dodgson was well known for writing letters and poems to his child friends, and he often made up stories for them. The Liddell children would beg Dodgson to tell them stories, and one day, on a boat, he began what would later develop into Alice in Wonderland.

Lewis Carroll’s New Approach to the Euclidean Debate
Before the publication of Alice in Wonderland and Through the Looking Glass, Carroll had published several works on the importance of keeping Euclid in the education system. Most of these works were dry and intended for the mathematical gurus and were ultimately overlooked due to Carroll’s unpopularity up to this point. However, the fame he received after Alice in Wonderland was published allowed Carroll to use his position to subtly confront some of the issues of discarding Euclid. The sequel to Alice in Wonderland, entitled Through the Looking Glass, was published in 1871, right in the middle of the Euclidean debate. Although Through the Looking Glass is undoubtedly a continuation of the famous story, it contains conversations and scenes that are often dismissed as Carroll’s propensity for nonsense. However, some of these seemingly nonsensical conversations and scenes contain subtle attacks against the emergence of non-Euclidean geometries in the education system.

Once Alice begins her journey in the Looking Glass world, she quickly finds that the differences between her world and the Looking Glass world do not stop with only books being backwards. When she meets the Red Queen, she looks over the land ahead of her and exclaims: “It’s marked out just like a large chessboard! . . . It’s a great huge game of chess that’s being played” (Carroll and Tenniel 1871/2014, p. 150). As I will show, chessboards are Euclidean in nature and automatically follow three out of the five postulates, but Carroll’s chessboard follows four of the five. The giant chessboard has players travelling in straight lines as far as their piece allows, which is equivalent to Euclid’s first postulate, which states that points and straight lines exist. Also, pieces can travel what seems like an infinite distance while still travelling on the same line. This is shown when the Queen and Alice make their first move: “‘Are we nearly there?’ Alice managed to ask. ‘Nearly there!’ the Queen repeated. ‘Why, we passed it ten minutes ago! Faster!’ . . . And they went so fast that at last they seemed to skim through the air, hardly touching the ground with their feet” (Carroll and Tenniel 1871/2014, p. 151). This is Carroll’s clever way of working the concept of infinite lines (the second postulate) into the otherwise finite chessboard: no matter how fast they travel, they never leave the line nor do they run out of line.¹ Since the chessboard is divided into squares, and all squares contain equal angles (right angles), the fourth postulate is satisfied. The third and fifth postulates are the only two postulates not directly seen on the chessboard itself, but the fifth postulate – the highly controversial Parallel Postulate – is seen in depth later, and the third pertains to circles and not lines and angles (like the first, second, and fourth), which may be why Carroll did not bother to include it. Even before Alice steps onto the chessboard, however, Carroll has already attacked the non-Euclidean idea of “straight.”

¹ The conversation between Alice and the Red Queen as they whisk across the chessboard serves two purposes for Carroll: establishing the infinite lines postulate and showing the foolishness of pressing exams. The latter is discussed in more depth later.

Geometric “Straight” Analogies
As soon as Alice steps outside the Looking Glass house and into the Looking Glass world, she sees the garden and wants to go to the top of a hill to get a better view. She says to herself, “ . . . here’s a path that leads straight to it – at least, no, it doesn’t do that – . . . but I suppose it will at last. But how curiously it twists! It’s more like a corkscrew than a path! Well, this turn goes to the hill, I suppose – no, it doesn’t!” (Carroll and Tenniel 1871/2014, p. 143). Alice quickly realizes that even though the path appears to lead straight to the garden, it does not. For several more paragraphs, Alice tries in vain to reach the garden by walking in a straight line, until finally she encounters the Live Flowers, who – after some harsh words towards Alice – advise her to walk the other way. “This sounded nonsense to Alice, so she said nothing, but set off at once towards the Red Queen. To her surprise . . . she found herself walking in at the front door again. A little provoked, she drew back, and . . . she thought she would try the plan, this time, of walking in the opposite direction. It succeeded beautifully” (Carroll and Tenniel 1871/2014, p. 148). The fact that the path appears to lead to the top of the hill and then straight to the garden is a reference to non-Euclidean geometries in which the plane is not flat but saddle-shaped: the lines are not straight, and the plane curves. Although curves are present in Euclidean geometry, the shortest distance between two points is always a straight line. However, in non-Euclidean geometries, the shortest distance may actually be curved instead of straight.

In Euclid and His Modern Rivals (1879/2009), when Henrici’s book Elementary Geometry: Congruent Figures is on trial, Minos asks about Henrici’s definition of a straight line. Niemand responds, “Here my client’s meaning is not very clear. The first Definition I can find is that of a curve. He says (p. 6) ‘a point may be moved, and then it will describe a path. This path of a moving point is a curve’” (p. 72). This definition does not satisfy Minos and he replies, “Surely he does not mean that a point can never move straight? He must mean that there are two kinds of curves . . . But if so, he makes ‘Line’ and ‘curve’ synonymous” (pp. 72–73). Niemand reads on and discovers that when Henrici does define “line,” it “seems to limit the word to bent lines” (p. 73). Minos remarks sarcastically, “So then a ‘Line’ must be bent, though a ‘curve’ need not be so? Your client has clearly one merit – great originality of style!” (p. 73). Carroll is emphasizing, both in Through the Looking Glass and in Euclid and His Modern Rivals, how illogical it is to have a line – or path in this case – that appears to be going in one direction but actually goes in another, and so is not straight or direct in any logical sense.

In chapter seven of Through the Looking Glass, the White King tells Alice that he needs two messengers so that he has “one to fetch and one to carry” messages.
Despite the message going and coming from the same location, the White King insists he needs two messengers. With this scene, Carroll is mocking J. M. Wilson’s definition of a straight line or pair of lines that appears in his manual, Elementary Geometry, published in 1868. Carroll explicitly attacks Wilson’s flawed definition in his later book, Euclid and His Modern Rivals, in which Wilson is put on trial for his manual. Minos says of Wilson’s concept of lines and direction, “In asserting that there is a real class of non-coincidental Lines that have ‘the same direction,’ are you not also asserting that there is a real class of Lines that have no common point? For, if they had a common point, they must have ‘different directions” (Carroll 1879/2009, p. 113). In this statement, Carroll is simultaneously calling out Wilson for his misinterpretation of parallel lines and meeting lines: if the lines meet, they must have different directions. Thus, the White King has one messenger to fetch and one to carry since, although the lines meet (they are travelling the same line), they cannot be going the same direction. Wilson’s definition of a straight line also did not define direction before first using direction to define a straight line and this was an important distinction to Carroll. Wilson – as the character in Euclid and His Modern Rivals – says, “[Minos] you are using ‘straight line’ to help you in defining ‘direction.’ We, on the contrary, consider ‘direction’ as the more elementary idea of the two, and use it in defining ‘straight line’” (Carroll 1879/2009, p. 104). Wilson’s textbook defines a straight line as, “a line which has the same direction at all parts of its length” (Carroll 1879/2009, p. 104). The difference between Wilson’s definition and Euclid’s definition is that Euclid specified in a later proposition that a line has

37 Lewis Carroll’s Defense of Euclid: Parallels or Contrariwise

only one direction but that it can be traveled either way. Carroll himself said of Wilson’s manual: The abundant specimens of logical inaccuracy, and of loose writing generally, which I have here collected would, I feel sure, in a mere popular treatise be discreditable – in a scientific treatise, however modestly put forth, deplorable – but in a treatise avowedly put forth as a model of logical precision, and intended to supersede Euclid, they are simply monstrous. My ultimate conclusion on your Manual is that it has no claim whatever to be adopted as the Manual for purposes of teaching and examination. (Wilson and Moktefi 2019, p. 27)

The definition stated by Wilson is what Carroll is mocking when the White King has two messengers because they can each go only one way, even though they are making the same journey. If the king were a proper Euclidean, he would need only one messenger who could travel in both directions. Carroll (1879/2009) was also bothered by non-Euclidean geometers’ willingness to rearrange problems and theorems from the Elements. In Euclid and His Modern Rivals, Minos says: I understand that the Committee of the Association for the Improvement of Geometrical Teaching, in their Report on the Syllabus of the Association, consider the separation as ‘equivalent to the assertion of the principle that, while Problems are from their very nature dependent for the form, and even the possibility, of their solution on the arbitrary limitation of the instruments allowed to be used, Theorems, being truths involving no arbitrary element, ought to be exhibited in a form and sequence independent of such limitations.’ They add however that ‘it is probable that most teachers would prefer to introduce Problems, not as a separate section of Geometry, but rather in connection with the Theorems with which they are essentially related.’ (p. 18)

To which Euclid responds: “It seems rather a strange proposal, to print the Propositions in one order and read them in another. But a stronger objection to the proposal is that several of the Problems are Theorems as well” (Carroll 1879/2009, p. 19). The AIGT as well as other “Modern Rivals” argue that axioms and their propositions are not necessarily needed in the order in which they appear in the Elements. The ridiculousness of rearranging theorems and propositions is seen in Through the Looking Glass, specifically in chapter five: “Wool and Water.” While Alice is with the White Queen, she notices one of the White King’s messengers, Hatta – the Mad Hatter – in jail. The Queen explains that he is in jail for a crime he has not committed yet. Alice asks: “Suppose he never commits the crime?” to which the Queen replies: “That would be all the better, wouldn’t it? . . . Were you ever punished?” The conversation then proceeds: “Only for faults,” said Alice. “And you were all the better for it, I know!” the Queen said triumphantly. “Yes, but then I had done the things I was punished for,” said Alice: “that makes all the difference.” “But if you hadn’t done them,” the Queen said, “that would have been better still; better, and better, and better!”. (Carroll and Tenniel 1871/2014, pp. 181–182)

Following the looking-glass theme of being out of order, Hatta has not committed a crime and it is not even proven that he will commit a crime, yet he is locked up. The idea of justice has been reversed as well as the natural order of proceedings.

N. S. Evers

The premature and unnecessary imprisonment of Hatta alludes to the foolishness of rearranging Euclid’s propositions and theorems.

Defense of the Parallel Postulate

Although Carroll disagreed with various aspects of non-Euclidean geometries, the idea that appeared to anger him most was the desire to disregard or remove the Parallel Postulate from the education system. The Parallel Postulate is portrayed most prominently in two of the most renowned characters in Through the Looking Glass: Tweedledum and Tweedledee. Alice encounters these two characters in chapter four of Through the Looking Glass, but the characters themselves are not original to Carroll. Gardner and Carroll (1974) points out, “In the 1720s there was a bitter rivalry between Handel, the German-English composer, and Bononcini, an Italian composer. John Byrom, an eighteenth-century hymn writer and teacher of shorthand, described the controversy as follows: ‘Some say, compared to Bononcini That Mynheer Handel’s but a ninny; Others aver that he to Handel Is scarcely fit to hold a candle; Strange all this difference should be Twixt tweedle-dum and tweedle-dee.’” (p. 230)

Years later, a famous nursery rhyme was written which included the two characters:

Tweedledum and Tweedledee
Agreed to have a battle;
For Tweedledum said Tweedledee
Had spoiled his nice new rattle.
Just then flew down a monstrous crow,
As black as a tar-barrel;
Which frightened both the heroes so,
They quite forgot their quarrel. (Anon 1805)

This nursery rhyme is what Carroll’s Tweedledum and Tweedledee act out in Through the Looking Glass, so it has always been assumed to be Carroll’s main source of inspiration for the characters. However, Carroll was the first to depict Tweedledum and Tweedledee not only as brothers but, more importantly, as twins. Even the original drawing of Tweedledum and Tweedledee indicates their parallel nature. Although Carroll himself was not the illustrator, there is strong evidence in letters that Carroll was extremely particular about the appearance of illustrations, so while it is not certain that he ordered Tenniel to render the characters in parallel postures, it is certain that Carroll endorsed the illustrations by allowing them to appear in the book at all. At the close of chapter three, Alice is following a road with two finger-posts and asks herself which she should follow: “It was not a difficult question to answer, as there was only one road through the wood, and the two finger-posts both
pointed along it . . . she went on and on, a long way, but wherever the road divided, there were sure to be two finger-posts pointing the same way, one marked ‘To Tweedledum’s House’, and the other ‘To the House of Tweedledee’” (Carroll and Tenniel 1871/2014, p. 164). Before the two characters are ever formally introduced, the idea of the Parallel Postulate is already presented: there are two different lines, but the signs point the same direction and never truly diverge in conflicting directions, indicating that the lines adhere to Euclid’s definition of parallel. Gardner and Carroll (1974) observes that Tweedledum and Tweedledee “are what geometers call ‘enantiomorphs,’ mirror-image forms of each other. That Carroll intended this is strongly suggested by Tweedledee’s favorite word, ‘contrariwise’” (p. 231). Gardner is right in noticing that they are mirror-images of one another, but this claim can be taken one step further. Enantiomorph is a term more frequently used in chemistry than in mathematics, but the idea of an enantiomorph is similar to the mathematical concept of being parallel. This is seen even in how Tweedledee and Tweedledum are depicted in Tenniel’s original picture: their postures and the way their arms are positioned across each other indicate a right angle, which Euclid said is necessary to show that two lines are parallel. One of the strongest proofs of Tweedledum and Tweedledee representing the Parallel Postulate is Tweedledee’s famous statement: “Contrariwise if it was so, it might be; and if it were so it would be; but as it isn’t, it ain’t. That’s logic” (Carroll and Tenniel 1871/2014, p. 166). This quotation is famous for its repetitive tautological structure, and that is exactly how Carroll wanted it to be; Carroll is mocking the critics who have removed the Parallel Postulate from the education system – a straightforward concept – and replaced it with something much more difficult to understand.
The theme of parallelism is carried throughout the entire chapter with Tweedledum and Tweedledee. In this chapter, Tweedledum, Tweedledee, and Alice encounter the sleeping Red King. Tweedledee asks Alice, “What do you suppose he’s dreaming about? . . . Why, about you! . . . And if he left off dreaming about you, where do you suppose you’d be?” (Carroll and Tenniel 1871/2014, p. 173). To which Alice replies, “Where I am now, of course.” Here begins the philosophical debate that would come to haunt philosophers such as Bertrand Russell: whose dream is the story actually taking place in? Alice’s or the Red King’s? However, the point Carroll is making is not simply a philosophical point, but a mathematical one as well. Gardner and Carroll (1974) says, “there is an odd sort of infinite regress involved here in the parallel dreams of Alice and the Red King. Alice dreams of the King, who is dreaming of Alice, who is dreaming of the King, and so on, like two mirrors facing each other” (p. 239). The dream sequence follows the logic of the Looking Glass world, but it separates itself from merely a mirrored dream. Carroll demonstrates at the beginning of the story when Alice is reading the books in the Looking Glass house that the words are backwards, but the dreams of Alice and the King are the same dream – they are going the same direction, but they are dreamed by different people. It follows the same imagery of parallelism given by the signs pointing to the house of Tweedledum and Tweedledee. This means that the question Carroll raises is not simply a question of whose dream the story is actually taking place in, but what would happen if one changed the dream slightly
or even woke up from the dream at all? Carroll’s answer is given in the answers provided by Tweedledee: “You’d be nowhere. Why, you’re only a sort of thing in his dream!” and Tweedledum: “You’d go out – bang! – just like a candle!” (Carroll and Tenniel 1871/2014, p. 173). Carroll reasons that if parallelism is disrupted, logic will disappear just like Alice would if the Looking Glass dream were disrupted. This follows Carroll’s view that if the Parallel Postulate were removed from the education system, the entire teaching of logic and mathematics would be permanently altered and possibly harmed. When Alice encounters Humpty Dumpty in chapter six of Through the Looking Glass, they get into an argument over word choice. Humpty Dumpty uses a word in an unconventional way, and when Alice asks what the word means in the context of that sentence, he responds, “When I use a word, it means just what I choose it to mean – neither more nor less” (Carroll and Tenniel 1871/2014, p. 196). Alice cleverly retorts, “The question is, whether you can make words mean so many different things” (Carroll and Tenniel 1871/2014, p. 151). Here, Carroll is saying that despite what the non-Euclidean geometers believe, the word “parallel” cannot have two meanings. The phrase Humpty Dumpty uses – “neither more nor less” – is a reference to the definition of parallel as defined by Euclid: parallel lines exist only when the angle between the lines and line crossing them is no more or less than 90°.

Carroll and Mathematics Examinations

Carroll was not fond of examinations – especially written examinations not unlike the standardized tests used in schools today. In 1855, he failed his examination to keep his scholarship at Christ Church, and he recorded in his diary: It is tantalizing to think how easily . . . I might have got it, if I had only worked properly during this term, which I fear I must consider as wasted. However, I have now got a year before me, and with this past term as a lesson . . . I mean to have read by next time, Integral Calculus, Optics (and theory of light), Astronomy, and higher Dynamics. I record this resolution to shame myself with, in case March 1856 finds me still unprepared, knowing how many similar failures there have been in my life already. (Cohen 1995, p. 51)

His phrase “knowing how many similar failures there have been in my life already” leads one to believe this is not the first time Carroll struggled with an exam, which must have added to his frustration later in life when the AIGT began pushing for more written examinations. Carroll disliked examinations especially as a lecturer because he felt that they were not an adequate way to determine whether a student truly grasped a topic. In an 1885 letter, Carroll wrote to a woman seeking advice on furthering her daughter’s education: As one who has lectured for 26 years on Mathematics, I may perhaps make bold to say that the amount of work you tell me [your daughter] went through in 5 months is simply absurd. Thorough mastery, of so much in so short a time, would be (even if she were a female Isaac Newton) out of the question: and if there is one subject less adapted than another to be got up by “cram,” it is Mathematics. And again, if there is one subject more than another, where it is absolutely fatal to success to attack higher parts of the subject, while lower parts
are still only half-understood, it is Mathematics. That she ‘passed’ an examination in those subjects is no real criterion of her having mastered them. (Cohen 1995, p. 142)

Carroll uses part of Alice’s conversation with Humpty Dumpty to show the flaws with purely written examinations. Humpty Dumpty asks Alice what 365 minus one is, and she responds with “364.” He “looked doubtful and said, ‘I’d rather see that done on paper’” (Carroll and Tenniel 1871/2014, p. 196). She writes it down and “Humpty Dumpty took the book, and looked at it carefully. ‘That seems to be done right – ,’ he began” (Carroll and Tenniel 1871/2014, p. 196). Alice interrupts him and shows him that the book is upside down. After he rotates it, he says, “I thought it looked a little queer. As I was saying, that seems to be done right – though I haven’t time to look it over thoroughly just now” (Carroll and Tenniel 1871/2014, p. 196). With this, Carroll makes light of teachers and lecturers who feel that written examinations are the only way to accurately portray learning. He shows that even the teachers and lecturers themselves do not always answer them correctly, but he also implies that written examinations are also more burdensome to teachers because they have to “look it over thoroughly” once the exams are completed. He also hints that an examination-driven education takes some of the joy out of learning towards the beginning of Through the Looking Glass when Alice runs with the Red Queen. As Alice and the Red Queen make their move on the chessboard in chapter two, Alice and the Red Queen move so quickly that “they seemed to skim through the air, hardly touching the ground” (Carroll and Tenniel 1871/2014, p. 151). Although previously it was pointed out that this passage was meant to establish the existence of infinite lines, it also serves to show how much more effort examinations cause in order to obtain the same results. When they finally come to a stop, Alice is astonished to see that although they had been moving quickly, they had not moved much at all and that “Everything’s just as it was!” (Carroll and Tenniel 1871/2014, p. 152). 
The Red Queen is taken aback by Alice’s astonishment, so Alice continues, “Well, in our country . . . you’d generally get to somewhere else – if you ran very fast for a long time as we’ve been doing.” The Red Queen responds, “A slow sort of country! . . . Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that” (Carroll and Tenniel 1871/2014, p. 152). Alice is displeased by this notion and mentions she is hot and thirsty to which the Red Queen says, “I know what you’d like! . . . Have a biscuit?” (Carroll and Tenniel 1871/2014, p. 152). The passage continues: “[Alice] took it, and ate it as well as she could: and it was very dry: and she thought she had never been so nearly choked in all her life” (Carroll and Tenniel 1871/2014, p. 152). While Alice is choking on her “refreshment,” the Red Queen takes measurements: “She took a ribbon out of her pocket, marked in inches, and began measuring the ground, and sticking little pegs in here and there. ‘At the end of two yards,’ she said, putting in a peg to mark the distance, ‘I shall give you your directions . . . at the end of three yards I shall repeat them – for fear of your forgetting them. At the end of four I shall say good-bye. At the end of five I shall go!’” (Carroll and Tenniel 1871/2014, p. 152). The Red Queen’s measurements are setting goals for Alice to reach and each measurement has a specific purpose. The use of the word “measurement” is not coincidental especially
since in some cases, it is synonymous with “assessment.” The measurements the Red Queen makes are rather pointless and only benefit herself, yet she says they are for Alice. While Alice is first becoming acquainted with the Red Queen, the Red Queen asks where she is from and is already correcting Alice: “Where are you going? Look up, speak nicely, and don’t twiddle your fingers all the time” (Carroll and Tenniel 1871/2014, p. 148). Her corrections continue: “‘I don’t know what you mean by your way, . . . all the ways about here belong to me – but why did you come out here at all?’ she added in a kinder tone. ‘Curtsey while you’re thinking what to say. It saves time’” (Carroll and Tenniel 1871/2014, p. 148). The Red Queen means well, and Carroll himself says of the Red Queen, “The Red Queen I pictured as a Fury, but of another type; her passion must be cold and calm; she must be formal and strict, yet not unkindly; pedantic to the tenth degree, the concentrated essence of all governesses!” (Gardner and Carroll 1974, p. 206). Part of a governess’s job is to tutor children, so in many ways, she echoes a school teacher, and this idea is confirmed by the Red Queen giving biscuits to Alice for following instructions. This is not an idea original to Carroll – it is taken from a quote from Horace’s Satires: “Quamquam, ridentem dicere verum quid vetat? Ut pueris olim dant crustula blandi doctores, elementa velint ut discere prima” (Harrison 2007, p. 80). Translated: “Though what is there to prevent one who is laughing from telling the truth? Just as sometimes teachers are charming and give cakes to boys, in order to make them wish to learn their first letters” (Harrison 2007, p. 80). Carroll (1879/2009) was aware of this quote, and he used it in the preface to the first edition of Euclid and His Modern Rivals.
The Red Queen’s exaggerated corrections, her dry-biscuit reward for Alice’s hard work, and her obsession with unnecessary measurements all indicate the push for what Carroll would have considered unnecessary assessments. Towards the end of Through the Looking Glass when Alice is about to be crowned queen, the Red Queen and the White Queen subject her to an examination to see if she is qualified: ‘Can you do Addition?’ the White Queen asked. ‘What’s one and one and one and one and one and one and one and one and one and one?’ ‘I don’t know,’ said Alice. ‘I lost count.’ ‘She ca’n’t do Addition,’ the Red Queen interrupted. ‘Can you do Subtraction? Take nine from eight.’ ‘Nine from eight I ca’n’t, you know,’ Alice replied very readily: ‘but – ’ ‘She ca’n’t do Subtraction,’ said the White Queen. ‘Can you do Division? Divide a loaf by a knife – what’s the answer to that?’ ‘I suppose – ’ Alice was beginning, but the Red Queen answered for her. ‘Bread-and-butter, of course. Try another Subtraction sum. Take a bone from a dog: what remains?’ (Carroll and Tenniel 1871/2014, p. 232)

The answer to this question is debated and then the “examination” starts again, but this time, with Alice: ‘Can you do sums?’ Alice said, turning suddenly on the White Queen . . . The Queen gasped and shut her eyes. ‘I can do Addition’, she said, ‘if you give me time – but I ca’n’t do Subtraction under any circumstances!’ . . . The White Queen said in an anxious tone. ‘What is the cause of lightning?’
‘The cause of lightning,’ Alice said very decidedly, for she felt quite certain about this, ‘is the thunder – no, no!’ she hastily corrected herself. ‘I meant the other way.’ ‘It’s too late to correct it,’ said the Red Queen: ‘when you’ve once said a thing, that fixes it, and you must take the consequences.’ (Carroll and Tenniel 1871/2014, pp. 233–234)

Luckily for Alice, the senseless examination did not determine whether or not she was capable of being queen, and shortly after, she passes through the door and is crowned Queen Alice. However, Carroll was not entirely against examinations; he just did not see them as an accurate representation of a student’s knowledge. In a letter sent in 1873 to Gaynor Simpson, Carroll says, “My name is spelt with a ‘G,’ that is to say ‘Dodgson.’ Any one who spells it the same as that wretch (I mean of course the Chairman of Committees in the House of Commons) offends me deeply, and for ever! It is a thing I can forget, but never can forgive! If you do it again, I shall call you ‘ ‘aynor.’ Could you live happy with such a name? . . . Your affectionate friend, Lewis Carroll” (Cohen 1995, p. 56). The Dodson who is the Chairman of Committees in the House of Commons is John Dodson who held the position from 1865 to 1872. The only stance Dodson would have had that would have offended Carroll so deeply was that in 1864 Dodson proposed a bill that would abolish all examinations in universities. In another letter, to a friend worried about an impending examination, he writes: “tell [your daughter] not to be nervous about the examination. In the Oxford examinations, the best candidates are always fancying they will get plucked, but they come out, after all, crowned with beautiful wreaths of cauliflowers – and so will she, no doubt” (Cohen 1995, p. 94). The phrase “they come out, after all, crowned with beautiful wreaths” sounds like the end of Through the Looking Glass when Alice herself has made it through the Looking Glass world and is crowned queen. These are not opinions that would be held by a man who is wholly against examinations; he is only opposed to over-assessing as he fears it does an injustice to both the subject and the learner.

Euclid and His Modern Rivals

By incorporating mathematics into Through the Looking Glass, Carroll hoped to stress the dangers of replacing the teaching of Euclid’s Elements. Unfortunately, Through the Looking Glass was seen by the reading public as merely a continuation of Alice in Wonderland rather than as a serious contribution to the Euclidean debate, and therefore his input on the mathematical debate was overshadowed by his established reputation as a fantasy novelist and a nonsense poet. Since his ideas were not portrayed obviously enough in Through the Looking Glass, he later published Euclid and His Modern Rivals in which he openly attacked the reformers by name. Even though Lewis Carroll published a book in between Through the Looking Glass and Euclid and His Modern Rivals under his real name, he once again used his pseudonym when Euclid and His Modern Rivals was published in 1879. Carroll (1879/2009) explains in the preface:

The object of this little book is to furnish evidence, first, that it is essential, for the purpose of teaching or examining in elementary Geometry, to employ one textbook only; secondly, that there are strong a priori reasons for retaining, in all its main features, and specially in its sequence and numbering of Propositions and in its treatment of Parallels, Euclid’s Elements; and thirdly, that no sufficient reasons have yet been shown for abandoning it in favour of any one of the modern Manuals which have been offered as substitutes. (p. xxxv)

The preface to Euclid and His Modern Rivals goes on to say, I have not thought it necessary to maintain throughout the gravity of style which scientific writers usually affect, and which has somehow come to be regarded as an “inseparable accident” of scientific teaching. I never could quite see the reasonableness of this immemorial law; subjects there are, no doubt, which are in their essence too serious to admit of any lightness of treatment – but I cannot recognise Geometry as one of them. (Carroll 1879/2009, p. xxxv)

Here, Carroll openly admits that he sees no wrong in breaking one of the unspoken agreements about math; he uses math and humor together. In fact, he incorporates math into fantasy. Since he overtly incorporates mathematics into his fantasy with Euclid and His Modern Rivals, there is no reason to doubt that he incorporated it covertly in Through the Looking Glass only 8 years prior, and because his underlying messages went unnoticed in Through the Looking Glass, Carroll does not mince words in Euclid and His Modern Rivals. Written as a play, Euclid and His Modern Rivals takes place in Minos’ study, and he and Euclid put “Modern” (non-Euclidean) geometers on trial for wanting to replace Euclid. One of those people, or groups, was the AIGT itself, which he calls the “Association for the Improvement of Things in General.” He says of the group: Enter a phantasmic procession, grouped about a banner, on which is emblazoned in letters of gold the title “Association for the Improvement of Things in General.” Foremost in the line marches Nero, carrying his unfinished “Scheme for Lighting and Warming Rome”; while among the crowd which follow him may be noticed – Guy Fawkes, President of the “Association for Raising the Position of Members of Parliament.” (Carroll 1879/2009, p. 182)

Nero’s “Scheme for Lighting and Warming Rome” is referring to Nero burning Christians, and Fawkes’ “Association for Raising the Position of Members of Parliament” is referring to Guy Fawkes’ failed attempt to blow up Parliament in 1605. Carroll is not implying that the removal of Euclid is tantamount to murder, but the gravity of the comparisons indicates that Carroll truly did not want his beliefs to go unnoticed again. As Carroll (whose opinions are voiced through Minos and Euclid) examines each non-Euclidean geometer’s works, he fails to find one completely without flaw when held in comparison to Euclid’s Elements. In some cases, Carroll dismisses entire works as being useless, but in others his only issues are with the phrasing or misuse of words. Regardless of how minor the flaw appears to be, Carroll uses this as grounds to reject the entire work. However, this seems hypocritical when compared to the known facts about Euclid’s Elements. Even Carroll in later years could no longer deny that the Elements contained flaws that needed to be rectified in order to be completely accurate.

Carroll’s Misunderstandings of Non-Euclidean Geometries

In 1868, Carroll published the second edition of his book called The Fifth Book of Euclid Treated Algebraically in which he wrote his own version of the fifth book of Euclid’s Elements. Stuart Dodgson Collingwood (1898/2017) says of the book, The object of his edition of Euclid Book V., . . . was to meet the requirements of the ordinary Pass Examination, and to present the subject in as short and simple a form as possible. Hence [one of the controversial ideas] was omitted, though, as the author himself said in the Preface, to do so rendered the work incomplete, and, from a logical point of view, valueless. He hinted pretty plainly his own preference for an equivalent amount of Algebra, which would be complete in itself. (p. 58)

In fact, the entire book’s title is: Euclid, Book V: Proved Algebraically So Far As It Relates to Commensurable Magnitudes, to Which is Prefixed a Summary of All the Necessary Algebraical Operations. Arranged in Order of Difficulty. The complete title of the book essentially says, “Here is a new version of Book V of Euclid’s Elements that has been simplified as much as possible in order to appease the education system. Make sure you read the preface of the book before you get started, though, since the book you are holding is practically worthless.” By publishing his own version of Book V, Carroll hoped to clarify some of the confusion associated with the Euclidean approach to mathematics, but by his later years, his staunch stance on Euclid had shifted. In his later years, Carroll shifted his focus from the imaginative to the logical and focused more on writing mathematical works. Although Carroll had corresponded with mathematicians before this point, his earlier works on logic and geometry indicate that he was not fully familiar with the newer, more popular developments in the subjects. Wilson and Moktefi (2019) note, It remains true that Dodgson was outside the main mathematical circles and could not claim familiarity with the major mathematicians of his time (Henry Smith possibly being excepted), and yet one should not infer that he worked alone and isolated. Indeed, his diaries, letters, and papers show that he regularly solicited the opinions of Oxford colleagues concerning mathematical topics in which he was interested. (p. 29)

Carroll went on to publish several books and essays on formal logic and on Euclid’s role in the present education system. By the late 1880s, Lewis Carroll’s knowledge and understanding of non-Euclidean geometries had grown. Wilson (2019) notes that “[Carroll] was aware of these ‘non-Euclidean geometries’ but rejected them as meaningless and irrelevant to the geometrical world in which we live. He accepted that they were consistent mathematical theories, but did not recognize them as depictions of real space” (p. 52). By this point, Carroll had begun publishing papers arguing that Euclid had actually meant for the Parallel Postulate to be taken as a theorem and not as an axiom. With this new revelation, Carroll published his own version of the Parallel Postulate taking on what he considered to be a more “intuitive” response. Abeles (2019) says:

Dodgson’s attempts to dispense with Euclid’s Parallel Postulate was his reason for writing A New Theory of Parallels, first published in July 1888. His aim was to replace Euclid’s axiom with one that would be ‘intuitively’ true – by which he meant an axiom that does not involve infinities and infinitesimals. He regarded Euclid’s Parallel Postulate as something that was true for finite magnitudes only, and he believed that this is what Euclid really intended but did not state directly. (pp. 179–180)

After his publication of A New Theory of Parallels, the rest of Carroll’s works that involved Euclid were focused more on defending Euclid’s possible reasons as opposed to his actual words. Abeles (2019) further remarks, In Appendix III of the first edition of A New Theory of Parallels, [Carroll] faulted Euclid’s definition of parallel lines as lines that do not meet, no matter how far they are produced, because he claimed that it did not produce a unique pair of lines, writing: ‘given a Line and a Point not on it, a whole “pencil” of Lines may be drawn, through the Point, and not meeting the given Line . . . after drawing one such Line, that the others make with it angles which are infinitely small fractions of a right angle’. In other words, allowing parallelism to apply to infinite lines introduces both the necessity of infinitesimals and also the possibility of there being another parallel axiom – a hyperbolic one. (pp. 185–186)

Carroll attempted to create his own proof of the Parallel Postulate in which he inscribed an equilateral hexagon in a circle. To a critic of A New Theory of Parallels who disagreed with his third attempted (and failed) proof of the Parallel Postulate, Carroll remarked, I shall be told, no doubt, that this is too bizarre and unprecedented an Axiom – that it is an appeal to the eye and not to the reason. That it is somewhat bizarre I am willing to admit – and am by no means sure that this is not rather a merit than a defect. But as to its being an appeal to the eye, what is ‘two straight Lines cannot enclose a space’ but an appeal to the eye? What is ‘all right angles are equal’ [Postulate 4] but an appeal to the eye? (Abeles 2019, p. 184)

This statement was published after Carroll’s third attempt to produce a proof for the Parallel Postulate, and it shows that his mindset towards the Parallel Postulate had shifted – it was no longer a matter of proving the Parallel Postulate but a matter of justifying it in order to keep the five axioms intact. However, despite his shifting view on Euclid’s Elements, his belief that mathematics was infallible did not shift. The introduction to A New Theory of Parallels states that “The charm [of pure mathematics] lies chiefly, I think, in the absolute certainty of the results; for that is what, beyond almost all mental treasure, the human intellect craves for . . . but neither thirty years, nor thirty centuries, affect the clearness, or the charm, of Geometrical truths” (Dodgson 1890, p. xv). In the end, it seems Carroll did not care so much about Euclid himself remaining present in the education system as long as the foundations of mathematics remained based on the idea that mathematics was infallible and unchangeable regardless of which curriculum it followed – even though, if he had his way, he would have preferred the conservative, Euclidean approach.


Conclusion: The Real Reason Carroll Fought for Euclid

Carroll was a conservative mathematician, and he honestly believed it was dangerous to remove Euclid from school systems. At that time, the teaching of logic (emphasized in Euclid) went hand-in-hand with ethics. Logic was therefore a necessity for preserving an ethical standard as well. Carroll wrote several books dedicated to logic, and Moktefi (2019) remarks on Carroll’s interest in logic:

Religious thinking also provided crucial motivation for Dodgson’s work on logic. Indeed, he believed that ‘the bad logic that occurs in many and many a well-meant sermon, is a real danger to modern Christianity’, and even considered writing a book that would address religious difficulties treated ‘from a logical point of view, in order to help those, who feel such difficulties, to get their ideas clear, and to see what are the logical results of the various views held.’ (p. 88)

Therefore, Carroll’s fight to keep Euclid went beyond mathematics; he feared that removing Euclid – which was a vital part of the foundation of students’ education – would fuel arguments for changing other aspects of the foundations of students’ education, such as the teaching of the Bible. In the preface to Euclid and His Modern Rivals, one of the key statements Carroll makes is “subjects there are, no doubt, which are in their essence too serious to admit of any lightness of treatment . . . .” (Carroll 1879/2009, p. xxxvi). Abeles (2019) says,

As a deeply religious man [Carroll] considered his mathematical abilities to be a gift that he should use in the service of God, and he linked his work in geometry with his religious beliefs through the way he perceived natural theology and the nature of mathematical truth. In his time mathematics was considered uniquely capable of generating truths from axioms that captured the nature of reality. His need to study and develop logical rules for reasoning reflected this conviction. (p. 178)

Although Carroll felt there was nothing inappropriate with satirizing the mathematical community, he recognized there were some subjects, like the Bible, that were too sacred to alter. Since Euclid’s Elements had always been considered to be based on axioms – much like the Bible – what was to stop the school boards from then supplementing, editing, or removing the Bible? By this point in time, the infallibility of the Bible was already being questioned. Four months after the publication of Origin of Species, Essays and Reviews was published, in which seven Anglicans called into question the divine inspiration of the Bible as well as how the Bible should be read in context with modern science. Three of these seven writers were Oxford teachers, one of whom was Baden Powell. Powell was the Savilian Professor of Geometry and was opposed to Euclid. He said of students who studied mathematics: “though a certain portion had ‘got up’ the first four books of Euclid, not more than two or three could add Vulgar Fractions or tell the cause of day or night, or the principle of a pump . . . ” (Wilson 2019, p. 43), and it is noted that Carroll attended his lectures (Hofstadter et al. 2011, p. 194). To a conservative like Carroll, the truth itself was at stake. Alice in Wonderland is truly a nonsense adventure even though it contains elements of social satire and mathematical concepts, but Through the Looking Glass is different; it serves a polemic purpose: revealing the folly of discarding Euclid for the sake of not only protecting Euclid, logic, and the students, but ultimately, protecting God.

References

Abeles FF (2019) Mathematical legacy. In: Wilson R, Moktefi A (eds) The mathematical world of Charles L. Dodgson (Lewis Carroll). Oxford University Press, Oxford
Anon (1805) Original ditties for the nursery; so wonderfully contrived that they may be either sung or said, by nurse or baby. Original Juvenile Library, London
Carroll L (2007) Through the looking glass. Penguin Books, London. (Original work published in 1871)
Carroll L (2009) Euclid and his modern rivals. Macmillan and Co, New York. (Original work published in 1879)
Carroll L, Tenniel J (2014) Alice’s adventures in wonderland and other classic works. Fall River Press, New York
Cohen MN (1995) Lewis Carroll: a biography. Alfred A. Knopf, Inc., New York
Collingwood SD (2017) The life and letters of Lewis Carroll: (rev. C.L. Dodgson) Illustr. CreateSpace Independent Publishing Platform, Columbia. (Original work published in 1898)
Dodgson CL (1890) Curiosa Mathematica, part 1: a new theory of parallels. Macmillan and Co., London
Euclid, Heath TL (trans) (1926) The thirteen books of Euclid’s elements, vol I. Cambridge University Press, Cambridge. (Original work published approximately 300 BC)
Felsager B (2004) Through the looking glass: a glimpse of Euclid’s twin geometry: the Minkowski geometry, ICME-10. Haslev Gymnasium & HF, Copenhagen
Gardner M, Carroll L (1974) The annotated Alice. New American Library, New York
Gray JJ (2018) Carl Friedrich Gauss. Encyclopædia Britannica
Harrison SJ (2007) Generic enrichment in Vergil and Horace. OUP, Oxford
Henderson A (2009) Math for math’s sake: non-Euclidean geometry, aestheticism, and flatland. PMLA 124(2):455–471
Hofstadter D, Gardner M, Burstein M (2011) A bouquet for the gardener: Martin Gardner remembered. Lewis Carroll Society of North America, New York
Legendre AM, Smith F (trans) (1867) Élements de géometrie. Kelly & Piet Publishers, Baltimore. (Original work published in 1823)
Lucibella M (2013) June 10, 1854: Riemann classic lecture on curved space. American Physical Society 22(6):2, Maryland
Mastin L (2010) 19th century mathematics – Bolyai and Lobachevsky. Bolyai and Lobachevsky – 19th century mathematics – the story of mathematics
Moktefi A (2019) Logic. In: Wilson R, Moktefi A (eds) The mathematical world of Charles L. Dodgson (Lewis Carroll). Oxford University Press, Oxford
O’Connor J, Robertson E (1996) Non-Euclidean geometry. Retrieved August 4, 2019, from http://www-history.mcs.st-and.ac.uk/~history/HistTopics/Non-Euclidean_geometry.html
Price MH (1994) Mathematics for the multitude? A history of the Mathematical Association. The Mathematical Association, Leicester
Proclus, Morrow GR (trans) (1970) A commentary on the first book of Euclid’s elements. Princeton University Press, Princeton. (Original work published between 412–485)
Wilson JM (1868) Elementary geometry. MacMillan & Co, London
Wilson R (2019) Geometry. In: Wilson R, Moktefi A (eds) The mathematical world of Charles L. Dodgson (Lewis Carroll). Oxford University Press, Oxford
Wilson R, Moktefi A (2019) A mathematical life. In: Wilson R, Moktefi A (eds) The mathematical world of Charles L. Dodgson (Lewis Carroll). Oxford University Press, Oxford

Part III Mathematics and Architecture

38 Architecture and Mathematics: An Ancient Symbiosis

Michael J. Ostwald

Contents
Introduction
Relationships and Epistemology
Mathematics in Architecture
Mathematics for Architecture
Mathematics of Architecture
Conclusion
Cross-References
References

Abstract

From the construction of primitive tribal huts to the design of technologically advanced high-rise buildings, architects have relied on and been inspired by mathematics. This chapter introduces and categorizes three types of applications of mathematics in architecture. The first comprises aesthetic applications, which are visible or expressed in a building. The second comprises practical applications, which provide support for the creation of a stable, durable, and functional building. The last category is made up of analytical applications, which reveal various invisible properties of a building. Using this framework, the chapter examines 17 major architectural themes, spanning historically from Ancient Egypt to the present day, and in scope from number mysticism to computational analysis. Through this process, the chapter investigates the rich, symbiotic relationship that exists between architecture and mathematics, providing a series of mechanisms for understanding the many ways they are connected.

M. J. Ostwald
UNSW Built Environment, University of New South Wales, Sydney, NSW, Australia
e-mail: [email protected]
© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_138


Keywords Architecture · Mathematics · Design · History · Construction · Aesthetics · Symbolism · Analysis

Introduction

Throughout history, the intricate and fertile relationship between architecture and mathematics has evolved and strengthened. From the moment the first primitive tribes wove grass into mats, and then lashed them to fallen branches to create a shelter, a connection was established between architecture and mathematics. When Neolithic communities created symmetrical thatched roofs and lined floors with flat stones, this connection was confirmed. The relationship deepened in the civilizations of ancient Greece and Rome, when architects began to associate proportions and shapes with notions of beauty and order (Kruft 1994). In the fifteenth century, Renaissance architects used perspective to geometrically define visual perception, and in the seventeenth century, Baroque architects developed complex overlapping geometric plans (March 1998). Two hundred years later, mathematicians would name these Baroque geometric constructs “Boolean operations.” In the 1940s, architects were inspired by non-Euclidean geometry and Le Corbusier presented the Modulor, a system of ideal proportional ratios for designing buildings. In the 1970s, Christopher Alexander used graph theory to propose a universal “pattern language” for design, and architects employed set theory to analyze urban infrastructure. Today, the symbiotic relationship between architecture and mathematics is visible in most major cities. Building facades have been lined with aperiodic tiles, roofs shaped using Voronoi tessellations, foyers extrapolated from Klein bottles, and designs inspired by Sierpinski cubes. From the Euclidean compositions of 1980s’ architects Mario Botta and Ricardo Bofill to the organic designs of Zaha Hadid in the 2000s, architects continue to be both inspired and supported by advances in mathematics. This chapter explores the close association between architecture and mathematics that has evolved throughout history.
It adopts a three-part framework (“in, for, and of”) to classify relationships between mathematics and architecture. The three categories in the framework distinguish between mathematical applications that are visible in a building, those that are used to support the design process, and finally, those that reveal hidden properties of a building (Ostwald and Williams 2015a). These three could also be thought of as aesthetic, practical, and analytical applications, respectively. Starting with the “in” or aesthetic category, the chapter traces mathematical applications across six major eras and civilizations: Ancient Egyptian, Greek, Roman, Medieval, Renaissance, and Baroque. Three historic building “types” – the labyrinth, the temple, and the utopian city – which are significant for demonstrating applications of mathematics in architecture, are also considered. In the “for” or practical category, tessellations, stereotomy, fractal geometry, and parametric algorithms are described. Finally, in the “of” category, four computational analytical methods are presented: shape grammars, space syntax, isovists, and fractal dimensions. In total, this chapter describes 17 architectural themes which are categorized into three types. These categories are, however, not mutually exclusive. As later sections in this chapter reveal, architects often use mathematics for multiple simultaneous purposes, and it is common to see combinations of practical and aesthetic (for + in) or practical and analytical (for + of) applications.

The content of this chapter is structured broadly in chronological order, as understanding changing applications of mathematics in architecture is one way of revealing insights into the relationship between them. Constructing a rigorous chronological account is, however, neither possible nor desirable in the limited scope of this chapter. In the first instance, there are many practical applications, like tiling and measuring, that occur across eras, and there are several building types, like the temple, which are repeated throughout history. In the second, we cannot assume that developments in architecture and mathematics have occurred contemporaneously. Certainly, throughout history architecture and mathematics have tended to follow a “progressive trajectory,” developing and applying new knowledge and challenging old beliefs with scientific evidence (Ostwald and Williams 2015b). But there have been eras and specific architectural movements when connections between the two were less direct. One example is the nineteenth-century Gothic Revival movement in architecture, which adopted the symbolic geometry and number mysticism of the original sixteenth-century Gothic. In this revival, architects sought to employ a historic language of mathematics that was completely out of step with mathematical developments at the time.
A further example of this misalignment is seen in the 1980s Postmodern Historicist movement, where architects revived the use of proportional ratios from classical Greek temples, completely ignoring contemporary mathematical advances in particle modeling and algebraic number theory. These two examples are useful reminders that the relationship between architecture and mathematics may have continued throughout history, but the two have not necessarily been focused on the same topics and at the same time. Architecture, more so than mathematics, is subject to the whims of clients, social pressures, financial constraints, and legislation. These factors can limit the extent to which an architect can make direct use of new mathematical knowledge. Nevertheless, in recent years, the symbiotic relationship between architecture and mathematics has become more pronounced, and there are now many examples where branches of mathematics have had a visible impact in architecture, less than a decade after being developed (Burry and Burry 2012). Given the time taken to design and construct a major building, this could be regarded as evidence that contemporary architects are paying closer attention to mathematics than ever before.

Relationships and Epistemology

In one of the first books of the modern era to directly address the relationship between architecture and mathematics, Mario Salvadori (1968) notes the profound difficulty with this task. The problem, he argues, is that “there is not just one mathematics, but many, and there is not one architecture, but many” (Salvadori 2015: 25). Mathematics is often divided into “pure” or “applied” branches, with the former including number theory, geometry, algebra, topology, and combinatorics. The latter, applied mathematics, tends to be delineated into general subfields like statistics, probability and algorithmic operations, and discipline-specific subfields, like statics, mechanics, and dynamics. Architecture is potentially even more diverse. In a professional sense, it is not just associated with the design and construction of buildings; its scope encapsulates landscapes, interiors, and urban plans. As a scholarly discipline, architecture has specializations in design, history, conservation, science, sociology, structures, tectonics, and computing. Any and all of these branches of architecture can, and do, use mathematics for their quotidian operations. This is why Salvadori asks the question, “which mathematics should I discuss in relationship to which architecture?” (Salvadori 2015: 25). Two paths to answering this question are to consider the education of architects and the reasons why architects use mathematics.

From the earliest architectural treatises, it is clear that architects were expected to understand numbers and geometry. In Ancient Greece, skilled artisans and military architects required expertise in stone-cutting and woodwork, both of which relied on metrology and geometry (Kostof 1977). In Ancient Rome, architects were expected to have knowledge of geometry, as well as history, medicine, philosophy, astronomy, music, and law (Vitruvius 2009). In medieval Europe, where education was broadly divided into three areas of study – logic, grammar, and rhetoric – mathematics provided a foundation for knowledge (Ostwald and Williams 2015c).
By the Renaissance, the curriculum had expanded to include four arts (geometry, arithmetic, astronomy, and music) and four disciplines (art, science, medicine, and architecture). Since that time, architectural education has always included studies in geometry and arithmetic, and in the last few decades, mathematics has become both ubiquitous and effectively invisible in architectural practice, by virtue of being embedded in software (Ostwald and Williams 2015c). CAD (computer aided design) and BIM (building information modeling) software automates and enables many architectural processes today, providing a mathematical rigor and logic for them, as well as an advanced capacity for simulation and visualization of building data.

In addition to the formal knowledge embedded in the architectural curriculum, architects also selectively appropriate mathematical knowledge to solve problems, to seek authority, or to communicate ideas (Ostwald 1999). Several explanations for how these appropriations occur have been offered by scholars. Evans (1995), for example, observes that architects treat geometric knowledge like the bricks needed to build a house: it is “reliably manufactured elsewhere and delivered to site ready for use” (1995: xxvi). Indeed, after closely considering architects’ attitudes to mathematics, Evans concludes that architects don’t just apply mathematical knowledge, they “consume it” (1995: xxvi). In this sense, mathematics may be a critical source of sustenance for architectural endeavors. Conversely, the way architects borrow and adopt mathematical knowledge is evocatively described by Downton as occurring on “dark nights,” when knowledge is “smuggled over the difficult terrain at disciplinary borders by radical thinkers” and then “introduced in clandestine meetings and infiltrated by stealth into the mainstream of the discipline” (Downton 1997: 82). Unlike Evans, who views architecture’s adoption of mathematics as largely pragmatic and opportunistic, Downton sees it in a more subversive and even political context. Both of these explanations emphasize the importance of understanding the motivations of architects when they use mathematical knowledge.

The “in, for, and of” framework adopted in this chapter was originally proposed in response to Salvadori’s (1968) unanswered question. It is used for examining and classifying historic and contemporary mathematical applications in architecture (Ostwald and Williams 2015a). The first category is called “mathematics in architecture,” because it comprises applications that are tangible in a completed building. It encompasses aesthetic, intellectual, and phenomenal applications of mathematics in design. The next category is named “mathematics for architecture,” because its applications are used for supporting the creation of a building. These include the use of mathematics to make predictions about stability, durability, and environmental performance. The final category, “mathematics of architecture,” captures analytical and visualization methods for investigating the properties of a building. These three categories, all of which overlap in different ways, are used in the remainder of this chapter to examine 17 architectural themes.

Mathematics in Architecture

The first category centers on numeric and geometric properties that are expressed in the formal characteristics of a building and are intended to communicate a message or be read in a particular way. There are four application types in this category: aesthetic, symbolic and semiotic, phenomenal and rational, and inspirational (Table 1).

Table 1 Mathematics in architecture. (Adapted from Ostwald and Williams 2015a)

Definition: Mathematical properties that are demonstrated, visible or sensible in architecture

Application | Explanation
Aesthetic | Using mathematics to achieve a particular appearance or visual effect
Symbolic and semiotic | Using mathematics to represent or communicate something about a building
Phenomenal and rational | Using mathematics to evoke a connection by way of the senses or the mind
Inspirational | Using mathematics as influence, motivation, or animation


Throughout history, some of the most common applications of mathematics in architecture have been to serve a symbolic or semiotic purpose. For example, tripartite columns in Gothic cathedrals symbolize the Holy Trinity and the 36 columns in the Lincoln Memorial (1922) in Washington, D.C., represent the states of the union at the time of Abraham Lincoln’s death. Both are examples of number symbolism. The vesica piscis – the lens-shape formed through the overlapping of two circles of the same radius – has, depending on its context, either Christian or Pagan meanings (Barrallo et al. 2015). When found in gothic pointed arches and stone traceries in cathedral windows, the lens-shape evokes the fish-bladder or Pisces symbols of Christianity. Conversely, in pre-Christian iconography, the vesica piscis was a symbol of fertility, female sexuality, or an eclipse. Regardless of the meaning attributed to it by a particular culture, its application in architecture reflects the desire to communicate a message. Alternatively, as an example of a semiotic application, the façade of Robert Venturi’s postmodern Lieb House (1969) in New Jersey is adorned with a large number “9” to communicate two simultaneous messages. First, this is the street number of the house and its enlarged scale suggests an ironic celebration of its address. Second, the “9” recalls the oversized numbers and letters found in pop-art, a movement which inspired many postmodern architects including Venturi. In the Lieb House, these two levels of communication (known as “dual coding”) are coexistent, each with a different message. A further application of mathematics in architecture is associated with rationalism, which is the practice of prioritizing logic, reasoning, and empirical knowledge over beliefs and feelings. In its architectural variation, designers typically use simple Phileban solids (pyramids, cones, and cubes) to evoke higher order thinking.
Étienne-Louis Boullée’s spherical design for a Cenotaph for Sir Isaac Newton (1784) is an example of the rationalist use of geometry in architecture. Mathematical concepts have been known to inspire architects. An example of this is Bolles Wilson’s design for the Forum of Water (1993) in Berlin, which is shaped like the iconic fractal, the Menger Sponge. Similarly, UN Studio’s Möbius House (1998) in the Netherlands is structured in a way that is reminiscent of its titular non-Euclidean counterpart, the Möbius strip.

The world of Ancient Egypt, which spanned from around 3000 BCE to 30 BCE, is an obvious starting point when considering the symbiotic relationship between mathematics and architecture. The Egyptian pyramid, exemplified in the Great Pyramid of Khufu, is one of the oldest examples of the transformation of a geometric solid into a major structure (Bartlett 2014). Furthermore, as the Egyptians were able to construct many alternative building forms, it might be assumed that there was a degree of deliberation in the decision to create a four-sided pyramid (Rossi 2003). The evidence for this is, however, not so emphatic. Ancient Egypt did not recognize mathematics and architecture as distinct fields, and the closest equivalents to mathematicians and architects were scribes and craftsmen (Kostof 1977). Furthermore, as Rossi notes in Chap. 39, “Egyptian Architecture and Mathematics,” while systems of measurement and counting did exist in the ancient world, they were largely used to record quantities and lengths associated with trade. As such, we do not know, for example, how builders used numbers or geometry, or how designers determined which forms to use. Rossi (Chap. 39, “Egyptian Architecture and Mathematics”) reviews the evidence and finds that initial decisions about the design of a building “might have had a symbolic meaning that was evidently decided in advance.” She also notes, however, that after the design was chosen, “many geometrical details were handled directly in the field by the workmen” (Chap. 39, “Egyptian Architecture and Mathematics”). As such, the geometric form of the pyramid may have been chosen for symbolic reasons, but once construction commenced, the primary applications of mathematics in this building were largely practical and logistical.

From a mathematical perspective, one of the most significant buildings of the ancient world was the Egyptian Labyrinth. First described by Herodotus in the fifth century BCE, this grand palace was said to be so spatially complex that it confused visitors. The Greek word labyrinthos describes the bewildering properties of a maze-like network of corridors with dead-ends and blind corners. In more conventional use, a labyrinth is a complex geometric construct that is physically constructed for controlling the way people experience space, and a maze is a two-dimensional decoration or pattern. In Chap. 40, “Labyrinth,” Morrison argues that this geometric pattern is the oldest symbol of the symbiosis between mathematics and architecture. Significantly, ancient Greek myths name Daedalus the first architect because he created both Ariadne’s maze-like dance floor and the labyrinth at Crete (Ostwald and Williams 2015a; Chap. 48, “Tessellated, Tiled, and Woven Surfaces in Architecture”). The mythology of ancient Greece even suggests that it is the inclusion of mathematics that transforms a rustic structure into architecture. Importantly, it is possible to identify a lineage connecting many of the world’s most famous labyrinths. Morrison (Chap.
40, “Labyrinth”) uses topology to demonstrate that the “structure of the symbol of the labyrinth” that emerges in the ancient world, “is repeated in various cultures” throughout history. Moreover, this structure reflects the Cretan labyrinth of Daedalus, effectively identifying this ancient structure as the first “fusion of architecture and symbol” (Chap. 40, “Labyrinth”).

While the sphere and the labyrinth have been positioned in architectural theory as representing two opposing positions – the former reflecting reasoning and logic and the latter phenomenology and mysticism (Tafuri 1987) – the classical Greek and Roman eras developed many more complex applications of mathematics in architecture. Three examples of these are identified in Duvernoy’s Chap. 41, “Classical Greek and Roman Architecture: Mathematical Theories and Concepts.” The first involves the figurate representation of quantities, an approach that is concerned with the different shapes that numbers of discrete elements can be arranged into. The second revolves around the visual comparison of magnitudes, which uses graphic means to represent various relations. This was significant at the time because drawing was “the only means capable of imparting concrete form to irrational numbers and thereby proving their existence” (Chap. 41, “Classical Greek and Roman Architecture: Mathematical Theories and Concepts”). The third mathematical concept is the theory of proportions. Ratios were used in classical Greek and Roman architecture to capture the relationship between numbers, lengths, and musical tones. They were valued because beauty could be conceptualized through harmonic proportions and ratios. The practical application of these themes can be seen in the ways Doric, Ionic, and Corinthian temples were designed. In Chap. 41, “Classical Greek and Roman Architecture: Examples and Typologies,” Duvernoy draws on a reading of Vitruvius’ De Architectura Libri Decem, the oldest surviving architectural treatise, to describe how mathematics was used to support the design and construction of buildings as well as to communicate messages about the relationship between the individual and the world (later embodied in Da Vinci’s image of the “Vitruvian man”). Duvernoy also examines Roman amphitheaters and outlines the evidence they display of the use of geometry for both practical and symbolic purposes.

The combination of aesthetic and pragmatic applications of mathematics in architecture strengthened in the medieval era in Europe, which spanned from the fifth to the fifteenth century CE. Gothic and Medieval cathedrals were not only structurally complex, but they had to communicate meaning, often using numbers to emphasize the power of the liturgy (Lluis i Ginovart et al. 2018). One of the most famous examples of this combination is found in the Cathedral of Tortosa in Catalonia. As Lluis i Ginovart observes in Chap. 43, “Mathematics and the Art and Science of Building Medieval Cathedrals,” this cathedral is unique for its era because its original design drawings still exist and also because a complete archive is preserved of the mathematical treatises available to the design team at the time. Using a reading of historic archival documents, Lluis i Ginovart (Chap. 43, “Mathematics and the Art and Science of Building Medieval Cathedrals”) shows that it is possible to understand the arithmetic and geometric knowledge of the people responsible for the design of the cathedral.
Some of the most important insights developed through this process involve the practical way geometric figures, such as octagons, were constructed and then grew to embody symbolic meaning.

The word “renaissance” is derived from an Old French word meaning “rebirth.” The Renaissance era in Europe spanned from the fifteenth century and the end of the “middle ages,” to the seventeenth century and the “age of enlightenment.” The concept of a rebirth is interpreted in multiple ways in the Renaissance, incorporating both a renewed interest in art and literature and a revival of classical Greek and Roman knowledge (Kruft 1994; Millon 1996). The classical revival is especially important when considering mathematics and architecture, because Renaissance designers used geometry and proportions to revive classical notions of beauty (March 1998). As Duvernoy argues in Chap. 44, “Renaissance Architecture,” the rediscovery of Vitruvius’s classic treatise, De Architectura Libri Decem, provided a catalyst for the rebirth of the era. From this source, Renaissance designers learnt that beauty in architecture arises from the use of numbers and geometric shapes, “their ratios, their proportion, their modularity, and their commensurability” (Chap. 44, “Renaissance Architecture”). One of these ratios was so significant that Luca Pacioli named it the “divine proportion” (Fletcher 2006).

In the latter part of the Renaissance, in an era now known as the Baroque, the phenomenological or emotional impacts of geometry became more pronounced (Norberg-Schulz 1980). Architecturally, the Baroque, like the Renaissance, was


inspired by the classical orders, but whereas the latter venerated these motifs, the former interpreted them in a free and expansive way. Baroque spatial experience was about “unbridled movement, overwhelming richness in color and composition, theatrical effects produced by a free play of light and shade, and [the] indiscriminate mixture of materials and techniques” (Panofsky 1995: 23). Duvernoy in Chap. 45, “Baroque Architecture” describes the era as a time when “complex shapes are experimented with and new layouts are designed, giving birth to dynamic and fluid spaces” (Chap. 45, “Baroque Architecture”). She emphasizes the phenomenological significance of the curved and undulating walls of the oval-plan churches of the era (Duvernoy 2015). This was a time when the rules of classical Greek and Roman proportions were challenged and even replaced by new, illusory alternatives. This was, for example, the period when trompe-l’oeil (lifelike murals with false perspective) and anamorphosis (images that only reveal their true content when viewed from a particular angle or location) replaced the elaborate perspectival rules of the Renaissance (Lyttelton 1974).

Just as the Great Pyramid of Khufu and the Cretan labyrinth at Knossos encapsulate particular formal and experiential connections, respectively, between mathematics and architecture, possibly the most important symbolic relationship between the two is found in the archetypal Temple of Solomon (Morrison 2010a). Also known as the First Temple, the Temple of Solomon was most likely built in Jerusalem around 950 BCE and destroyed by Nebuchadnezzar II in 587 BCE. Early accounts of the temple in the Tanakh (c. 450 BCE) and Middoth (c. 200 CE) place great significance on the dimensions and proportions of the temple. In the Renaissance, along with a renewed interest in classical Greek and Roman proportions, there was also a revival of interest in recreating the mathematical properties of the ideal temple (Morrison 2010b).
Two of the lead proponents of this endeavor were the Spanish Jesuits Jerome Prado and Juan Bautista Villalpando. In  Chap. 46, “Temple of Solomon,” Morrison traces their attempts to reconstruct the mathematical structure of the temple and replicate its symbolic messages. This was not a straightforward task, as Villalpando and Prado fundamentally disagreed about the geometry of the temple. Prado maintained that the temple’s geometry “followed the description from the twelfth-century Rabbi Moses Maimonides, whose ground plan was asymmetrical, whereas Villalpando believed that . . . the Temple was highly symmetrical” ( Chap. 46, “Temple of Solomon”). For many hundreds of years, the mathematical proportions and structure of the ideal temple have remained a contested subject. It could be regarded as one of the earliest examples of the use of mathematics to embed symbolic or spiritual properties into architecture. Throughout history, there has been a tradition linking the physical and moral ills of society to the properties of its buildings and cities. Regardless of whether the argument is that architecture shapes society, or that society’s flaws are reinforced by buildings, there was a belief that ordered, geometric environments are healthier and more moral than chaotic streets lined with ramshackle houses. Such arguments echo the philosophical position mentioned previously in this chapter, which linked pure geometric forms to higher order thinking and labyrinths to immoral, debased, or decadent behaviors. Following this reasoning, throughout history proposals for


healthy, just, and virtuous societies have typically used Euclidean geometry to embed a pervasive order and structure in a city plan and its architecture. As Morrison argues in  Chap. 47, “Utopian Cities,” the “concept of the ideal city being a perfect geometrical structure has its roots in the Pythagorean philosophy of number symbolism. For the Pythagoreans, the mystical cosmos could be understood through geometry and numerology, through perfect and pure numbers” ( Chap. 47, “Utopian Cities”). In utopian architecture, geometry serves a symbolic role, ordering the fabric of a city in accordance with either a rational or theological view of the ideal world.

Mathematics for Architecture

The second category in this chapter encompasses the practical or functional techniques and tools used to support architectural design and construction (Table 2). This category includes mathematical applications that support the siting or construction of a building, ensure its stability or durability, and predict or optimize its performance. In these applications, mathematics informs or serves the design and construction processes, but it is not visible in the completed building. Thus, for example, the techniques of triangulation, traversing, and measurement may be used to survey a building site and determine its boundaries, but their application is not directly apparent in the completed building. In contrast, some applications of mathematics for architecture overlap with the previous category, mathematics in architecture, combining both practical and aesthetic applications. As the previous

Table 2 Mathematics for architecture. (Adapted from Ostwald and Williams 2015a)

Definition: Practical or functional tools or techniques for the support of architectural design, construction, and conservation

Application            Explanation
Measurement            Using mathematics to record and communicate dimensional information
Surveying              Using mathematics to derive and translate locational or site-related measures
Modularity             Using mathematics for achieving coordination and consistency within a larger design system
Performance            Using mathematics to inform decisions about structural, acoustic, visual, environmental, and related physical properties
Surface articulation   Using mathematics to achieve an efficient or controlled coverage of a defined plane
Generation             Using algorithms or rules to evolve aspects of a design


section notes, Egyptian and classical Greek and Roman architecture includes applications of both types in the same buildings. Furthermore, whereas tiling, tessellation, weaving, and stereotomy are, in a practical sense, instances of modularity or surface articulation, they are also frequently used for aesthetic purposes. The use of fractal geometry in architecture is also typical of this dual application, as a designer may claim a practical motivation, but it is more often an example of mathematical inspiration. The use of mathematics to generate or evolve a design that meets predetermined parameters may also, ostensibly at least, be presented as a practical application, but it can equally serve as a means of seeking novel or original design solutions (Yu et al. 2015a).

It is often theorized that the first intuitive applications of mathematical knowledge in architecture were associated with the use of weaving or tiling to cover or decorate surfaces. For example, Grünbaum and Shephard (1987: 1) argue that the “art of tiling must have originated very early in the history of civilization,” because, from the first attempt to “use stones to cover the floors,” a primitive knowledge of geometry was displayed. One of the oldest “origin myths” of architecture, recorded in Vitruvius’ De Architectura Libri Decem, describes the first house, “the primitive hut of the ancients,” as being constructed by weaving branches and leaves together. As such, examples of mathematical tiling in architecture are “not defined by the craft of combining materials but by the repetitious creation of patterns formed through the application of a set of usually polygonal shapes” (Ostwald 2015: 460). The combination of different materials and patterns, “to create a more durable or weatherproof finish for a floor, wall, or ceiling,” is described in Chap. 48, “Tessellated, Tiled, and Woven Surfaces in Architecture.” Architectural applications of tessellations range from Neolithic “stone cutting practices to late twentieth-century aperiodic cladding systems in major public buildings” (Chap. 48, “Tessellated, Tiled, and Woven Surfaces in Architecture”). The latter category includes the use of Penrose and Conway tiles, along with Voronoi tessellations to cover large surfaces. In some of these cases, the tiles serve a practical purpose, whereas in others, they are intended to evoke a connection to contemporary mathematics.

A special type of three-dimensional tiling in architecture and construction is stereotomy. In Chap. 49, “Stereotomy: Architecture and Mathematics,” Fallacara and Gadaleta define stereotomy as “the art of building with small structural elements, geometrically refined, which allow the construction of architectural systems with triple value: aesthetic, static, and functional” (Chap. 49, “Stereotomy: Architecture and Mathematics”). While originally described in Philibert Delorme’s sixteenth-century Le Premier Tome de l’Architecture, with the rise of computer-controlled, robotic fabrication processes, stereotomy has seen a recent revival.

Just as tiling can have both a practical and aesthetic application in architecture, so too can many other geometric systems. One of the most interesting developments in late twentieth-century mathematics was presented in Benoit Mandelbrot’s (1982) The Fractal Geometry of Nature. In this work, Mandelbrot showed how certain recursive mathematical systems can be categorized, their properties measured, and this knowledge applied to a diverse range of fields. A core component of this


theory is the notion of fractal geometry, which refers to shapes that have non-integer dimensions, or possess “characteristic” roughness or irregularity. A fractal is “generated by successively subdividing or growing a geometric set using a series of iterative rules, producing a figure that has parts, which under varying levels of magnification tend to look similar, if not identical, to each other” (Chap. 55, “Fractal Dimensions in Architecture: Measuring the Characteristic Complexity of Buildings”). At the start of the twenty-first century, more than 200 cases had already been identified of architects using fractal geometric forms or surfaces (Ostwald 2001). Over the next decade, this number expanded, as advanced computer-aided manufacturing systems made it possible to construct partial fractals in two and three dimensions. While these fractals were sometimes employed for practical purposes – for example, to increase the surface area of solar collectors or to create more effective water channels in facades – more often they served symbolic purposes. Chapter 50, “Fractal Geometry in Architecture” undertakes a review of the ways architecture and geometry can be combined through inspiration, practical application, and algorithmic evolution. This last factor, arising from the growth of generative and evolutionary algorithms, is also an example of the final type of mathematics for architecture.

In the latter half of the twentieth century, a growing number of algorithmic processes were developed to solve problems in engineering, construction, and design. As a general principle, these processes started by defining allowable parameters (for example, distance, size, angle, density, and cost), and then the algorithm was applied sequentially to generate and test solutions that fulfill these parameters.
Significantly, if multiple solutions fulfill the parameters, the architect is able to choose an outcome from a range of equally valid alternatives, a process that is often associated with creativity (Yu et al. 2015a). In  Chap. 51, “Parametric Design: Theoretical Development and Algorithmic Foundation for Design Generation in Architecture,” Gu et al. provide a critical review of generative design, its background, and history. They describe the way parametric design processes use a sequential and recursive application of mathematics to support “design generation in a structured manner, from an initial state of the design to the final outcome. Design alternatives emerge when different combinations of rules are selected and applied” ( Chap. 51, “Parametric Design: Theoretical Development and Algorithmic Foundation for Design Generation in Architecture”). Furthermore, through the development of detailed mathematical rules, parametric design can be used to replicate the structure of complex environments (Yu et al. 2015b).
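The generate-and-test principle described above can be illustrated with a minimal sketch. It is not drawn from any of the cited chapters: the room dimensions, the area and cost limits, and the function names are all hypothetical values chosen only to show how allowable parameters are defined first and how several equally valid alternatives can survive the test.

```python
from itertools import product

# Hypothetical allowable parameter values for a rectangular room (metres).
WIDTHS = [3.0, 3.5, 4.0, 4.5]
DEPTHS = [4.0, 5.0, 6.0]
HEIGHTS = [2.4, 2.7, 3.0]

# Hypothetical constraints, chosen only for illustration.
MIN_AREA = 16.0      # square metres
MAX_AREA = 20.0      # square metres
MAX_COST = 32000.0   # currency units
COST_PER_M3 = 500.0  # currency units per cubic metre of enclosed volume


def generate_candidates():
    """Yield every combination of the allowable parameter values."""
    for width, depth, height in product(WIDTHS, DEPTHS, HEIGHTS):
        yield {"width": width, "depth": depth, "height": height}


def satisfies_constraints(candidate):
    """Test one candidate against the area and cost limits."""
    area = candidate["width"] * candidate["depth"]
    cost = area * candidate["height"] * COST_PER_M3
    return MIN_AREA <= area <= MAX_AREA and cost <= MAX_COST


def feasible_designs():
    """Generate all candidates and keep those that pass the test."""
    return [c for c in generate_candidates() if satisfies_constraints(c)]


if __name__ == "__main__":
    designs = feasible_designs()
    print(len(designs), "feasible alternatives out of",
          len(WIDTHS) * len(DEPTHS) * len(HEIGHTS))
```

Because more than one candidate passes, the choice among them is left to the designer, which is the point at which the text locates creativity. Real parametric tools use rule application and search rather than exhaustive enumeration; the enumeration here is only the simplest way to show the principle.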

Mathematics of Architecture

The final category in this chapter, the “mathematics of architecture,” incorporates mathematical methods for measuring, analyzing, visualizing, and assessing the properties of a building. The mathematical knowledge in this category is neither the type that is readily visible in a building, nor the type used to support the siting, construction, or creation of the building. Instead, this last category is concerned


Table 3 Mathematics of architecture. (Adapted from Ostwald and Williams 2015a)

Definition: Logical and analytical methods for quantifying or determining various properties of architecture

Application         Explanation
Analysis            Using mathematics to better understand the properties of a design
Informatics         Using mathematics to visualize or characterize architectural, urban and regional spatial and formal properties
Logical reasoning   The reasoned or disciplined application of knowledge to understand, justify, and evaluate the properties of a design

with architectural properties that are uncovered through rigorous investigation or assessment. There are three application types in this category: analysis, informatics, and logical reasoning (Table 3).

The first application type involves the mathematical measuring, modeling, or simulation of the properties of a building. It is used to develop an improved understanding of a design and its performance. For example, axial line analysis can be used to identify potential congestion zones in major office buildings and to measure the ease with which hospitals can be navigated. The mathematical basis for this method is graph theory, correlated to human behavioral and cognitive patterns. Axial line analysis can be used to assist the design process, or more commonly to investigate and improve the operations of a completed building. As such, it can be either a practical or analytical application, depending on the architect’s motivations.

The second application type, informatics, uses data and graphics to visualize properties of a building. For example, major buildings are often visualized in the first instance using spatial information derived from point-cloud models (developed through 3D laser scanning) or GIS data. This spatial data is, however, rarely revealing in isolation but becomes more valuable when augmented with data derived from building operations. This additional data, which can be visualized in the space of the building, might include energy use, user behavior, natural ventilation levels, or rental income. The benefit of using a data visualization method of this type is that it allows for an intuitive understanding to be developed of a building and its operations.

The third mathematical application reflects the importance of logical reasoning in architecture.
This reasoning may be intuitive (as, for example, in the weaving of the “primitive hut of the ancients”), purposive (in the Cretan labyrinth), or evidence based (in the majority of contemporary buildings). Most architectural design relies on the application of logic and reasoning, which are a type of applied mathematical thinking. Indeed, logic, in its inductive, deductive, abductive, and computational forms, is a foundation of most contemporary architecture. While analytical applications of mathematical thinking existed prior to the last decade, as seen in Antoni Gaudí’s hanging chain models from the late nineteenth


century, the majority of examples in this category were developed in the late twentieth century. Two well-known applications, which were developed in the 1970s and 1980s, are now considered foundation methods in the field of computational design (March 2011; Ostwald 2011). The first of these examines the “grammar” of architectural shape and the second analyses the “syntax” of architectural space (Hillier and Hanson 1984).

A Shape Grammar “is a computational approach that is used to identify and understand the rules required to produce the formal properties of a design, or model the design process used by an architect” (Lee and Ostwald 2020: xi). Conceptually, the grammar of architecture is the set of rules that determines how forms (the vocabulary of architecture) are combined into a distinct language. In Chap. 52, “Shape Grammars: A Key Generative Design Algorithm,” Gu and Amini Behbahani describe the way this method has developed over time to “become the foundation and inspiration for many contemporary computational design methods and tools, especially parametric design” (Chap. 52, “Shape Grammars: A Key Generative Design Algorithm”).

In contrast, Space Syntax is the name for a set of computational tools and techniques that use “mathematics to measure the social, cognitive or experiential properties of a building or city plan” (Lee and Ostwald 2020: xi). As explained in Chap. 53, “Space Syntax: Mathematics and the Social Logic of Architecture,” most syntactical methods “convert the spatial properties of a plan into a graph” before using mathematics to “derive various measures, which are interpreted in the context of the original plan or against benchmark data for particular building types” (Chap. 53, “Space Syntax: Mathematics and the Social Logic of Architecture”). Because Space Syntax is based, in part, on correlations between human social and behavioral data and spatial data, its mathematical basis differs slightly from conventional graph theory.
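The conversion of a plan into a graph, and the derivation of a measure from it, can be sketched as follows. This is an illustrative example only: the six-room plan is hypothetical, and integration is taken here as the reciprocal of relative asymmetry (RA), one common simplification; normalised variants of the measure also exist. The sketch assumes a connected plan with at least three rooms.

```python
from collections import deque

# Hypothetical plan: each room is a node, each doorway an undirected edge.
PLAN = {
    "entry": ["hall"],
    "hall": ["entry", "living", "kitchen", "bedroom"],
    "living": ["hall"],
    "kitchen": ["hall"],
    "bedroom": ["hall", "bath"],
    "bath": ["bedroom"],
}


def depths_from(graph, root):
    """Breadth-first search: number of steps from root to every room."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        room = queue.popleft()
        for neighbour in graph[room]:
            if neighbour not in depth:
                depth[neighbour] = depth[room] + 1
                queue.append(neighbour)
    return depth


def integration(graph, root):
    """Integration of root, taken here as 1 / relative asymmetry (RA)."""
    k = len(graph)  # number of rooms; assumed to be at least 3
    depth = depths_from(graph, root)
    mean_depth = sum(depth.values()) / (k - 1)
    relative_asymmetry = 2 * (mean_depth - 1) / (k - 2)
    if relative_asymmetry == 0:
        return float("inf")  # root is one step from every other room
    return 1 / relative_asymmetry


INTEGRATION = {room: integration(PLAN, room) for room in PLAN}
MOST_INTEGRATED = max(INTEGRATION, key=INTEGRATION.get)

if __name__ == "__main__":
    for room, value in sorted(INTEGRATION.items(), key=lambda kv: -kv[1]):
        print(f"{room:8s} {value:.2f}")
```

In this hypothetical plan the hall, through which most routes pass, has the highest value, and the bathroom, reached only through the bedroom, has the lowest.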
Thus, for instance, in Space Syntax, “integration” is a primary indicator of the likelihood, all other factors being equal, of people meeting in a particular place or of their paths crossing in a particular area (Hillier 1996). In graph theory, the “centrality” measure is an equivalent and arguably more appropriate measure, but it does not possess the same level of empirical evaluation. In the last decade, both Shape Grammars and Space Syntax have evolved into more advanced mathematical variations, the former as part of evolutionary algorithms and the latter in multiple software platforms. Both have also been combined to develop generative and analytical models encompassing both space and form (Lee et al. 2017).

A technique that is closely associated with Space Syntax is isovist analysis. In Chap. 54, “Isovists: Spatio-Visual Mathematics in Architecture,” an isovist is defined as the geometry of space “that is delineated by the human cone of vision, scanning in every direction from a fixed location” (Chap. 54, “Isovists: Spatio-Visual Mathematics in Architecture”). It was originally described by Benedikt (1979: 47) as “the set of all points visible from a single vantage point in space with respect to an environment.” The set of all isovists in a given building plan is the isovist field, which provides a global measure of spatio-visual properties, instead of a local one (Batty 2001; Ostwald and Dawes 2018). The properties of an isovist field can also be measured and compared using graph theory, a technique called Visibility Graph Analysis (VGA). In its most basic application in architectural


analysis, an isovist is represented in two dimensions as a polygon, from which various measures can be derived. Many of these measures have been correlated to human psychological responses, and so an isovist is not just an abstract concept; it can be used to model human perceptions and responses.

Mandelbrot (1982), as previously mentioned in this chapter in the discussion of applications of fractal geometry, also presented a method for measuring the “fractal dimension” or “characteristic complexity” of an object. Fractal dimensions are non-integer dimensions that can provide a statistical indication of the typical complexity of an object. This fractal dimension is rarely significant in itself; rather, it provides a basis for comparing different objects or sets, and then correlating different properties to dimensions. Architects became interested in measuring the fractal dimensions of buildings in the 1990s, and since that time, this method has grown to be widely accepted and used (Bovill 1996). In Chap. 55, “Fractal Dimensions in Architecture: Measuring the Characteristic Complexity of Buildings,” the primary approach to measuring characteristic complexity, the box-counting method, is explained, along with its major mathematical variables (Chap. 55, “Fractal Dimensions in Architecture: Measuring the Characteristic Complexity of Buildings”). This method has been used to analyze architectural designs, drawing conclusions about visual relationships, spatial psychology, the orientation of facades, functionalist expression, and the visual differences between architectural styles (Ostwald and Vaughan 2016). Like Shape Grammars, Space Syntax, and isovists, fractal dimensions are derived from the physical properties of a building, which are typically represented in plans and elevations.
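The box-counting method mentioned above can be sketched in a few lines. The example is a simplified illustration, not the procedure of Chap. 55: it assumes that a drawing has already been reduced to a set of integer pixel coordinates, it overlays grids of several box sizes s, counts the boxes N(s) that contain part of the drawing, and estimates the dimension as the least-squares slope of log N(s) against log(1/s). The function names, grid sizes, and test shapes are hypothetical.

```python
import math


def box_count(points, box_size):
    """Number of box_size x box_size grid cells containing at least one point."""
    return len({(x // box_size, y // box_size) for x, y in points})


def box_counting_dimension(points, box_sizes):
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(points, s)) for s in box_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical test shapes on a 64 x 64 pixel grid.
SIZES = [1, 2, 4, 8, 16, 32]
FILLED_SQUARE = {(x, y) for x in range(64) for y in range(64)}
STRAIGHT_LINE = {(x, 0) for x in range(64)}

if __name__ == "__main__":
    print("filled square:", round(box_counting_dimension(FILLED_SQUARE, SIZES), 3))
    print("straight line:", round(box_counting_dimension(STRAIGHT_LINE, SIZES), 3))
```

A straight line and a filled square recover the integer dimensions 1 and 2, which serves as a sanity check; the line drawing of an elevation would typically fall between these two values. Published applications involve further choices (grid position, scaling coefficient, line weight) that this sketch leaves out.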

Conclusion

Architecture and mathematics share a strong cultural and intellectual heritage along with a capacity for supporting abstract, pragmatic, and poetic applications. They both rely on highly structured and coded logic systems to develop and test propositions, and they value the elegance and functionality of solutions. When viewed in this way, it is natural to see why architects have grown to rely on and respect mathematics. But the word “symbiosis,” which has been used throughout this chapter, suggests a mutual relationship. While the focus of this chapter has been on the use of mathematics in architecture, the reverse case also occurs. Georg Cantor, Robert Hooke, and Isaac Newton were all fascinated by architecture and sought, in different ways, to design buildings. Ludwig Wittgenstein, Benoit Mandelbrot, and many other mathematicians have used architectural examples to explain mathematical concepts. In “recreational” mathematics, Martin Gardner and Ian Stewart have employed architectural tropes to solve problems. Naum Vilenkin’s Mathematical Art Museum and Henri Poincaré’s Gallery of Monsters point to the ways mathematical knowledge can be ordered and conceptualized architecturally. Paul Lévy’s references to architecture suggest authority and order, and Douglas Hofstadter used architectural analogies to explore cognitive structures. Brian Kaye’s tale, Topological Alice, combines mathematical icons with architectural elements


(windows, stairs, carpets, doors) to both educate and entertain. The title of Kaye’s account of life in a mathematical house references Lewis Carroll’s (Charles Lutwidge Dodgson’s) Alice in Wonderland and Alice Through the Looking Glass, but its playful commingling of mathematics, architecture, and social commentary has parallels with Edwin Abbott’s Flatland and Tom Stoppard’s Arcadia. In general, mathematicians, like philosophers, borrow architectural concepts or use architecture as an analogy or inspiration to explain processes of creation, construction, stability, and endurance (Wigley 1993; Ostwald 1999). Mathematicians talk about constructing a theorem, laying its foundations, and testing the strength of its principles. The examples in this conclusion emphasize that architecture and mathematics, at the very least, have a propensity for borrowing from each other and, perhaps, have a degree of mutual dependence and regard. From ancient Egypt to the modern era, architects have used mathematics to solve practical problems and have been inspired by the beauty and poetry of numbers and geometry. These signs suggest that the relationship has been, and will continue to be, a productive and symbiotic one.

Cross-References

Baroque Architecture
Egyptian Architecture and Mathematics
Classical Greek and Roman Architecture: Examples and Typologies
Classical Greek and Roman Architecture: Mathematical Theories and Concepts
Fractal Dimensions in Architecture: Measuring the Characteristic Complexity of Buildings
Fractal Geometry in Architecture
Isovists: Spatio-Visual Mathematics in Architecture
Mathematics and the Art and Science of Building Medieval Cathedrals
Parametric Design: Theoretical Development and Algorithmic Foundation for Design Generation in Architecture
Renaissance Architecture
Shape Grammars: A Key Generative Design Algorithm
Space Syntax: Mathematics and the Social Logic of Architecture
Stereotomy: Architecture and Mathematics
Temple of Solomon
Tessellated, Tiled, and Woven Surfaces in Architecture
Utopian Cities

References

Barrallo J, González-Quintial F, Sánchez-Beitia S (2015) An introduction to the vesica piscis, the Reuleaux triangle and related geometric constructions in modern architecture. Nexus Netw J 17:671–684. https://doi.org/10.1007/s00004-015-0253-9
Bartlett C (2014) The design of the great pyramid of Khufu. Nexus Netw J 16:299–311. https://doi.org/10.1007/s00004-014-0193-9
Batty M (2001) Exploring isovist fields: space and shape in architectural and urban morphology. Environ Plan B: Plan Des 28(1):123–150
Benedikt ML (1979) To take hold of space: isovists and isovist view fields. Environ Plan B: Plan Des 6(1):47–65
Bovill C (1996) Fractal geometry in architecture and design. Birkhäuser, Boston
Burry J, Burry M (2012) The new mathematics of architecture. Thames and Hudson, London
Downton P (1997) The migration metaphor in architectural epistemology. In: Cairns S, Goad P (eds) Building dwelling drifting. Melbourne University, Melbourne
Duvernoy S (2015) Baroque oval churches: innovative geometrical patterns in early modern sacred architecture. Nexus Netw J 17:425–456. https://doi.org/10.1007/s00004-015-0252-x
Evans R (1995) The projective cast: architecture and its three geometries. MIT Press, Cambridge, MA
Fletcher R (2006) The golden section. Nexus Netw J 8:67–89. https://doi.org/10.1007/s00004-006-0004-z
Grünbaum B, Shephard GC (1987) Tilings and patterns. W. H. Freeman, New York
Hillier B (1996) Space is the machine. Cambridge University Press, London
Hillier B, Hanson J (1984) The social logic of space. Cambridge University Press, Cambridge
Kostof S (1977) The architect: chapters in the history of the profession. University of California Press, Berkeley
Kruft HW (1994) A history of architectural theory from Vitruvius to the present. Princeton Architectural Press, New York
Lee JH, Ostwald MJ (2020) Grammatical and syntactical approaches in architecture. IGI Global, Hershey
Lee JH, Ostwald MJ, Gu N (2017) A combined plan graph and massing grammar approach to Frank Lloyd Wright’s prairie architecture. Nexus Netw J 19:279–299. https://doi.org/10.1007/s00004-017-0333-0
Lluis i Ginovart J, López-Piquer M, Urbano-Lorente J (2018) Transfer of mathematical knowledge for building medieval cathedrals. Nexus Netw J 20:153–172. https://doi.org/10.1007/s00004-017-0359-3
Lyttelton M (1974) Baroque architecture in classical antiquity. Cornell University Press, Ithaca
Mandelbrot BB (1982) The fractal geometry of nature. W.H. Freeman, San Francisco
March L (1998) Architectonics of humanism: essays on number in architecture. Academy Editions, London
March L (2011) Forty years of shape and shape grammars, 1971–2011. Nexus Netw J 13:5–13. https://doi.org/10.1007/s00004-011-0054-8
Millon HA (1996) Italian Renaissance architecture: from Brunelleschi to Michelangelo. Thames and Hudson, London
Morrison T (2010a) The body, the temple and the Newtonian man conundrum. Nexus Netw J 12:343–352. https://doi.org/10.1007/s00004-010-0029-1
Morrison T (2010b) Juan Bautista Villalpando and the nature and science of architectural drawing. Nexus Netw J 12:63–73. https://doi.org/10.1007/s00004-010-0017-5
Norberg-Schulz C (1980) Baroque architecture. Electra, Milan
Ostwald MJ (1999) Architectural theory formation through appropriation. Architectural Theory Rev 4(2):52–70. https://doi.org/10.1080/13264829909478370
Ostwald MJ (2001) Fractal architecture: late twentieth century connections between architecture and fractal geometry. Nexus Netw J 3:73–84. https://doi.org/10.1007/s00004-000-0006-1
Ostwald MJ (2011) The mathematics of spatial configuration: revisiting, revising and critiquing justified plan graph theory. Nexus Netw J 13:445–470. https://doi.org/10.1007/s00004-011-0075-3
Ostwald MJ (2015) Aperiodic tiling, Penrose tiling and the generation of architectural forms. In: Williams K, Ostwald MJ (eds) Architecture and mathematics, from antiquity to the future. Volume II: 1500s to the future. Birkhäuser, Cham, pp 459–472
Ostwald MJ, Dawes MJ (2018) The mathematics of the modernist villa: architectural analysis using space syntax and isovists. Birkhäuser, Basel
Ostwald MJ, Vaughan J (2016) The fractal dimension of architecture. Birkhäuser, Cham
Ostwald MJ, Williams K (2015a) Mathematics in, of and for architecture: a framework of types. In: Williams K, Ostwald MJ (eds) Architecture and mathematics, from antiquity to the future. Volume I: antiquity to the 1500s. Birkhäuser, Cham, pp 31–57. https://doi.org/10.1007/978-3-319-00137-1_3
Ostwald MJ, Williams K (2015b) The revolutionary, the reactionary and the revivalist: architecture and mathematics after 1500. In: Williams K, Ostwald MJ (eds) Architecture and mathematics, from antiquity to the future. Volume II: 1500s to the future. Birkhäuser, Cham, pp 1–27. https://doi.org/10.1007/978-3-319-00143-2_1
Ostwald MJ, Williams K (2015c) Relationships between architecture and mathematics. In: Williams K, Ostwald MJ (eds) Architecture and mathematics, from antiquity to the future. Volume I: antiquity to the 1500s. Birkhäuser, Cham, pp 1–21. https://doi.org/10.1007/978-3-319-00137-1_1
Panofsky E (1995) Three essays in style. MIT Press, Cambridge, MA
Rossi C (2003) Architecture and mathematics in ancient Egypt. Cambridge University Press, Cambridge
Salvadori M (1968) Mathematics in architecture. Prentice Hall, Englewood Cliffs
Salvadori M (2015) Can there be any relationships between mathematics and architecture? In: Williams K, Ostwald MJ (eds) Architecture and mathematics, from antiquity to the future. Volume I: antiquity to the 1500s. Birkhäuser, Cham, pp 25–28. https://doi.org/10.1007/978-3-319-00137-1_2
Tafuri M (1987) The sphere and the labyrinth: avant-gardes and architecture from Piranesi to the 1970s. MIT Press, Cambridge, MA
Vitruvius (2009) On architecture (trans: R Schofield). Penguin Books, London
Wigley M (1993) The architecture of deconstruction: Derrida’s haunt. MIT Press, Cambridge, MA
Yu R, Gu N, Ostwald M, Gero J (2015a) Empirical support for problem-solution co-evolution in a parametric design environment. Artif Intell Eng Des Anal Manuf (AIEDAM) 29(01):33–44
Yu R, Ostwald M, Gu N (2015b) Parametrically generating new instances of traditional Chinese private gardens that replicate selected socio-spatial and aesthetic properties. Nexus Netw J 17(3):807–829

Egyptian Architecture and Mathematics

39

Corinna Rossi

Contents
Introduction
Definitions
  Accurate Reckoning for Enquiring into Things
  Scribes and Builders
Mathematics and Architecture
  Practical Operations
  Meanings Beyond Numbers?
Conclusions
References

Abstract

An analysis of the relationship between mathematics and architecture in ancient Egypt requires, first of all, an analysis of the terms involved in the discussion. Mathematics, mathematician, architecture, and architect are modern terms that convey a range of meanings that may or may not find a precise correspondence in ancient Egyptian culture. Textual, iconographic, and archaeological sources provide many pieces of the puzzle that is the complex task of building a monument, and yet some important aspects still remain unclear. Mathematical knowledge was deeply intertwined with architectural practice, but defining its nature and boundaries is not easy. The extant mathematical texts are schoolbooks and cast a relatively limited light on the way in which numbers and geometrical figures were used in the planning and building process; in particular, it is difficult to establish who decided the shape and the dimensions of the buildings and of their architectural elements. The overall impression is that building a monument was a collective enterprise, carried out by a long line of individuals, the majority of whom remained anonymous.

C. Rossi, Politecnico di Milano, Milan, Italy. e-mail: [email protected]
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_57

Keywords Ancient Egypt · Architecture · Mathematics · Scribe · Reckoning

Introduction

Mathematics has always been deeply linked with the arts and architecture, but the nature of this connection has varied substantially over time and space. Taking for granted that numbers and geometrical figures were always used in the same way is a mistake that often leads to misunderstandings and misinterpretations, especially when dealing with cultures that developed centuries or even millennia ago. Ancient Egyptian culture represents a special case, as it lasted for over thirty centuries, during which the concepts of continuity and cyclicity occupied a fundamental and defining place. From the Late Period onwards, increasing contacts with other Mediterranean cultures started to accompany the natural evolution of Egyptian culture, until the advent of the Greek-speaking Ptolemaic Dynasty certified the spread of Hellenistic culture along the Nile Valley. The continuity in terms of artistic and architectural forms, eagerly pursued, now rested on and interacted with a mathematical system that certainly combined ancient Egyptian and Hellenistic elements. In order to analyze how mathematics, arts, and architecture interacted in ancient Egypt, it is easier to turn to the “classic” pharaonic period, from the beginning of the Old Kingdom to the end of the New Kingdom: the abundant textual, iconographic, artistic, and architectural evidence allows a reconstruction of the big picture, even if some parts still remain obscure.

Definitions

Accurate Reckoning for Enquiring into Things

Writing books and articles allows scholars to share the results of their research, to start discussions, and to spread knowledge; in other words, to communicate with other people. In this respect, researchers should adopt a language that is comprehensible and refer to concepts that can be understood by readers. However, one must always be aware that modern words and concepts might or might not overlap with what appear to be their ancient counterparts (Rossi 2010). As this chapter is dedicated to the relationship between mathematics and architecture in ancient Egypt, it is important, first of all, to check whether using the modern terms mathematics (and mathematician) and architecture (and especially architect) would be really appropriate or whether it would generate misunderstandings.

Describing mathematics and mathematicians is not necessarily an easy task as, to start with, there is no modern consensus on their definition: “there is no single body of knowledge that we can conveniently call ‘mathematics’ but (...) we can identify many mathematical disciplines and activities”; moreover, “mathematicians,” as we now understand the term, is a modern European invention. (...) One thing is certain: “the history of mathematics is not the history of mathematicians” (Stedall 2012: Chap. 2, esp. pp. 29 and 31). The word “mathematics” comes from the ancient Greek mathema (“what has been learned”), and originally referred to any subject that required formal training; by the time of Plato and Aristotle, it had become primarily associated with arithmetic and geometry (Perisho 1965: 64), a link that persists in the modern perception.

The ancient Egyptian concept that can be assimilated to our mathematics is the action of “reckoning.” The first line of the most important extant mathematical source from ancient Egypt, the Rhind Mathematical Papyrus (Peet 1923; Robins and Shute 1987), gives itself the title: “Accurate reckoning [or rules for reckoning] for enquiring into things, and the knowledge of all things, mysteries . . . all secrets” (Clagett 1999: 122; Imhausen 2016: 66). Reckoning (literally “breaking up numbers,” Gardiner 1957: 538) and its rules were thus perceived as a field, a discipline, an area of knowledge; the same expression also meant “correct method, norm, standard (of speech, conduct), rectitude” (Faulkner 1981: 297).
One of the most important concepts in ancient Egyptian culture was that of truth, justice, and cosmic order, personified by the goddess Maat (Wilkinson 2003: 150); the containment of disorder, often represented as wild animals being encircled and constrained or killed, was a recurring theme in the Egyptian iconography of power (Kemp 2006: 92–99). In this respect, reckoning was certainly a way to put things in order (cf. Clagett 1999: 183–184) and, as such, had a regulating role in society; from the first line of the Rhind Mathematical Papyrus, it is also clear that the ancient Egyptians were fully aware that numbers could be used as tools to investigate the surrounding world. How far and in what way this aspect was developed is very difficult to tell (Roero 1994: 43–44; see also Spalinger 1990: 310), as the few surviving mathematical sources are textbooks meant to teach the basics to young scribes, thus offering a glimpse only of their initial training. What appears clearly from these textbooks is that “reckoning” was applied and deeply linked to a wide variety of activities, including agriculture, architecture, administration, and accounting. Indeed, it is likely that the need to keep accounts relating to agriculture was a fundamental factor that triggered the invention of writing in Egypt, as happened in Mesopotamia (Imhausen 2016: 15–17). If this is the case, then the situation could be described the other way round: rather than being applied to other activities, reckoning was born with them, and represented a fundamental and founding element of their development.

Scribes and Builders

In ancient Egypt, mathematical knowledge generally belonged to the realm of scribes, who were expected to achieve a set level of literacy and numeracy. This condition guaranteed a relatively privileged position in Egyptian society in comparison with other, more physical occupations. In the Middle Kingdom text known as the Satire of the Trades, a father lists to his son all the disadvantages of being engaged in trades and professions other than that of scribe: compared with the potter (who “grubs in the mud more than a pig”), the mason (plagued by backache), the courier (forced to travel in the scary desert), and the washermen and fishermen (who are constantly at risk of encountering a crocodile), the scribe appears to have the most convenient profession (Lichtheim 1973: 184–192). The New Kingdom instructions of Papyrus Lansing strongly suggest: “be a scribe! Your body will be sleek; your hand will be soft. (...) You stride freely on the road. You will not be like a hired ox” (Lichtheim 1976: 171). Since the earliest times, scribes were associated with the hieroglyphic sign depicting a writing palette and were depicted in a squatting position that became associated with their activities. This classic and long-lasting iconography is not mirrored, however, by a unique and unequivocal interpretation of these figures: “all scribes may be assumed to be literate to some extent, but it is a logical fallacy to assume that all literate people were scribes” (Allon and Navratilova 2017: 1–4). The term scribe appears to refer to employees in charge of recording items and events but also appears among the titles of individuals who clearly exploited their literacy and numeracy for high-level tasks. Rather than being described as two closed categories, these two groups are likely to represent two parts of the spectrum of literacy and numeracy that could be encountered in ancient Egyptian society (Baines 1983; Lesko 2001; cf. Cuomo 2012).
As indicated by the surviving mathematical textbooks, young scribes were trained in a variety of tasks involving the use of numbers, including some directly relating to the organization and management of building sites. One very important aspect appears to have been recording the progress of the works, as abundantly witnessed by the written material, dating to the Nineteenth and Twentieth Dynasties, retrieved in the Valley of the Kings and in the area of Deir al-Medina, the village where the workmen engaged in the construction of the royal tombs lived: some of these documents bear traces of the actual planning process, while the majority are lists of parts of tombs with their dimensions written out; in some cases, scribes also calculated the total volume of the stone that had been quarried up to a certain date; a final, detailed survey probably certified the successful conclusion of the building work (Rossi 2004: 142–147).

As clearly hinted at by the first line of the Rhind Mathematical Papyrus, being able to write, read, and reckon could open further opportunities. The Royal Scribe Amenhotep Son of Hapu wrote that the king Amenhotep III “showed me his favor by appointing me royal scribe under the (direct) orders of the sovereign. I then entered the religious literature and saw the useful works of Thoth. I made myself master of ideas inaccessible at first sight; I put in full light all the obscure passages; information was extracted from me in all the questions they dealt with” (cf. Varille 1968: 40–41). The art of writing was said to have been invented by the god Thoth, responsible for all kinds of accounts and records, Lord of Time, Reckoner of the Years, Scribe of the Gods. He was responsible for all the written treatises kept in the so-called Houses of Life, attached to temples: they fulfilled many functions, as they stored sacred texts as well as accounts, contracts, and correspondence with other temples. They were basically archives containing collections of diverse written sources and may have acted as a model for the creation, centuries later, of the Museion, the Library of Alexandria (Wilkinson 2003: 74, 215–217). Officials like Amenhotep Son of Hapu clearly proceeded further in their acquaintance with the ancient Egyptian written sources. As nothing is known about the nature of the texts kept in the Houses of Life, it is impossible to make any suggestion on the nature of this deeper and more articulated knowledge; whether treatises specifically dedicated to “reckoning” existed, and therefore whether there were figures that we could assimilate to mathematicians, is impossible to tell.

It is also extremely difficult to outline and pinpoint the ancient Egyptian equivalent of our modern architect, the person in charge of designing, planning, and constructing a building. It seems that in ancient Egypt this process did not fall under the control of a single individual but was rather a collective enterprise that saw the participation of a large number of individuals who remained anonymous. The construction of major monuments was obviously ordered by the king himself and entrusted to a high-ranking officer.
The latter generally bore a string of important titles, including (Royal) Scribe, but not necessarily relating to technical aspects; in the New Kingdom, Superintendent of the Works is one of the most common titles, but it is difficult to tell whether it was an honorific title or was attributed as a consequence of specific technical knowledge. In the case of Kha (called “the Scribe” on one of his walking sticks), who served under Amenhotep II, the presence of a foldable cubit among his grave goods suggests that he was directly involved in some technical work. It is worth noting that he was also buried with two scribal palettes, as well as a golden cubit donated by the king (Schiaparelli 1957: 63, 80–81, 87, 188). Kha is known to have been in charge of the construction of a small temple at Hermopolis (Schiaparelli 1957: 169–173), just as Amenhotep Son of Hapu was the “head of the works of the king for the transport of his great monuments in all types of solid materials” and was entrusted with directing the construction of a monument dedicated to Amon in Karnak and with moving some statues (Varille 1968: 8 and 27). As for Senenmut, one of the most powerful officials under Queen Hatshepsut, his titles relating to building works might have been honorific and reflect his being generally in charge, rather than indicating a specific technical knowledge (cf. Dorman 1988: Chap. 7).

This ambiguity goes back to the figure of Imhotep, high-ranking officer of the Third Dynasty king Djoser, who was buried under one of the most striking and innovative ancient Egyptian funerary monuments: the Step Pyramid of Saqqara (Lauer 1936–9). The only Third Dynasty sources mentioning his titles are the base of a statue of the king, on which he is said to be “Seal-Bearer of the King of Lower Egypt, First One under the King, Administrator of the Great Mansion, Prince, Chief of Seers,” and a slightly later graffito that simply calls him, again, “Seal-Bearer of the king of Lower Egypt” (Kemp 2006: 159; see also Gunn 1926). In the subsequent centuries, Imhotep acquired great posthumous fame as a sage and a physician (Wildung 1977), and was identified as the “architect” of the Step Pyramid. It is entirely possible that, as one of the highest-ranking officers of the court, Imhotep might have actually been entrusted with the responsibility of building the royal tomb and, moreover, that the ancient Egyptians might have had access to biographical sources that are now lost to us. However, this does not imply that he designed the monument himself and technically directed the operations. In fact, other major architectural innovations relating to the evolution of royal tombs remained totally anonymous: nothing is known about who first thought of building a smooth pyramid, or of moving from pyramids to rock-cut tombs. Distinguished figures appear to have been generally remembered as “wise men,” without further specifications; at the same time, “art and architecture were parts of the stream of direct activity that emanated from the court” (Kemp 2006: 158). It seems that such collective efforts left room for only the king, obviously, and his highest officers in charge to be remembered as directly associated with the monument; the stream of lower-ranking technical personnel, including various specialists, was not necessarily recorded.

It is interesting that monuments appear to be the result of a joint effort also at the lower level of private architecture, as hinted, for instance, by the Middle Kingdom story of Sinuhe: “a pyramid of stone was built for me in the midst of the pyramids. The overseers of stonecutters of the pyramids marked out its ground plan.
The draughtsmen sketched in it, and the master sculptors carved in it. The overseers of works who were in the necropolis gave it their attention” (Snape 2011: 175). It is worth noting that nothing in this text suggests who chose the dimensions and the slope of the pyramid, that is, who decided the mathematics behind the monument. In conclusion, modern titles should be used with care when dealing with ancient Egypt prior to the Hellenistic Period. There is no evidence of the existence in ancient Egypt of the equivalent of modern mathematicians, and the activities of a modern architect appear to have been subdivided among several different figures, many of whom were destined to remain anonymous. Therefore, it is safer to refer to “reckoning,” “scribes,” and “builders,” bearing in mind that these terms cover a wide variety of activities and functions and are themselves rather vague. At least, they remind the reader of the difficulties involved in overlapping modern and ancient terminology.

Mathematics and Architecture

Practical Operations

Mathematical texts may be defined as texts specifically meant to convey information on a specific mathematical technique or procedure (Robson 1999: 8). Sources of this kind from ancient Egypt amount to only a handful of texts; the problems listed in the Rhind Mathematical Papyrus that may be directly linked to building practice are grouped together: nos. 41–47 deal with the calculation of the volume of circular and rectangular granaries; nos. 48–55 with the calculation of the area of fields in the shape of circles, triangles, and truncated triangles; nos. 56–60, finally, deal with the dimensions of pyramids. All these problems fall under the modern definition of geometry (cf. Chace and Manning 1927); they all deal with simple geometrical figures.

Information on mathematical knowledge, however, can be gained from other sources as well. Similarly, information on the ancient Egyptian planning and building process can be derived from a variety of sources, ranging from drawings, models, and written documents relating to building activities, to the buildings themselves, including sometimes also the traces left by the provisional structures that aided the construction and that were generally completely dismantled once the building was completed (Clarke and Engelbach 1930; Arnold 1991). Architectural drawings represented the various portions of buildings and tombs in proportion but were not to scale; the dimensions were written out beside the architectural element to which they referred. They only represent buildings in plan: this is interesting as it appears to suggest that the design of the three-dimensional space was not worked out in two dimensions. It may be suggested that the three-dimensional complexity of the building could be evaluated by means of architectural models; however, not many survive from ancient Egypt, and most of them are likely to have had a ceremonial, rather than a technical, function. The overall impression is that only a few details of the complex task of building a monument were handled by means of drawings and models (Rossi 2004: 96–147).
In this respect, and considering also the voluntary or involuntary lack of information on the individuals involved in the planning and building process, most of the creative process behind the construction of ancient Egyptian monuments remains rather obscure. The role of mathematics must have been pervasive, and yet it is not easy to pinpoint. Keeping track of the dimensions and volume of corridors and rooms excavated during the construction of rock-cut royal tombs appears to belong to the administrative, rather than the technical, side of the management of a building site. The opposite, however, is likely to have also happened, even if the surviving sources are scant or nonexistent: Reisner Papyrus I, for instance, might contain the calculations performed to establish the amount of soil or rock to be removed to lay the foundations of a temple or another type of monument (Rossi and Imhausen 2009); and calculations of the total volume of stone to be quarried in order to build a monument or parts of it must have been performed continuously, even if no written documents survive.

It is difficult to establish how far calculations stretched and where the building process relied instead on observations. Certainly no structural calculation was performed in antiquity, and therefore the dimensions of vertical supports and horizontal elements were established by eye and on the basis of the builders’ experience. The relatively short period that saw the introduction of stone as the main building material for royal funerary monuments, and its quick development, represents an interesting case study. The first stone monument was the funerary complex of the Third Dynasty king Djoser at Saqqara that, besides the Step Pyramid, included a series of smaller buildings within a vast rectangular enclosure. Many of these smaller constructions are labeled “dummy buildings”: they do not contain any internal space, but are simply reproductions in stone of pavilions that evidently formed the traditional funerary complex of kings and that until then had been constructed in light materials, such as wood, reeds, and mats (Badawy 1948). In particular, none of the columns that appear in the Saqqara complex is free-standing: they are either half-embedded in the masonry or placed at the end of short transversal walls, as if the builders were not sure that these stone versions could stand by themselves. Moving from light materials to stone must have represented a big and complex step, as it required shapes and dimensions to be redefined; it quickly determined the birth of totally new aesthetic criteria, based on the use of massive blocks, that flourished in the relatively short space of about 300 years (Clarke and Engelbach 1930: 5–11). Builders must have quickly gained a great deal of practical experience that resulted in their ability to build massive stone constructions; the role of “reckoning” in this whole process might have been minimal.

In general, “reckoning” must have been involved in designing the outline of the buildings on the ground, for instance, to set up alignments and right angles (Rossi 2004: 148–161). When dealing with the orientation of buildings, instead, it is difficult to establish how much the builders relied on calculations and how much on direct observations (cf. Hannah 2009: Chap. 3; Johnstone 2011: Chap. 3).
Solar, stellar, and topographic alignments were certainly based on accurate observations of the movements of the celestial bodies and of the natural or artificial features of the landscape, and in general it is obvious that a certain number of measurements must have been performed in order to materialize the connection between the chosen reference points and the actual building (cf. Magli 2013: 89–93). The accuracy in the orientation of some monuments suggests a parallel accuracy in the accompanying calculations, but establishing the nature of the latter would be extremely difficult. Keeping a straight line and marking on the ground the four right angles of a building are relatively simple geometrical problems, but when the building is the size of the pyramid of Khufu, with a base 440 cubits (over 230 m) wide, outlined around a central rock knoll, and a final height of 280 cubits (over 146 m), simple tasks become rather complex. The number of calculations that were necessary to achieve accurate results, as well as their level of complexity, remains unknown.

Geometrical schemes (lines and grids) were used to guide the design and reproduction of complex artistic shapes, first of all for the representation of the human body. Grids were probably used since the Middle Kingdom to guide the way bodies were drawn on the walls of tombs: some horizontal and vertical lines passed across specific points of the body, such as the hairline, the elbows, the buttocks, and the ear. There were loose rules also to represent pairs of figures, as well as seated figures; the proportions of the grid and of the human figures did not remain constant over the centuries, but it is clear that square grids represented the most widespread and common support for drawing human figures (Robins 1994). Square grids were also used to keep the desired proportions of statues, architectural elements, and other artistic objects (Rossi 2004: 93, 122–123). The use of grids in architecture is less clear; cords were certainly used to establish the orientation and the general outline of the buildings on the ground, but this does not mean that square grids made of modules were actually used. Regular square grids may be useful tools for modern scholars to check and highlight the presence of modules or recurring dimensions but do not necessarily reflect the ancient building process (Rossi 2004: 122–127).
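
The remark that simple setting-out tasks become complex at the scale of Khufu's pyramid can be made concrete with a little modern trigonometry. The following is only a minimal sketch, not a reconstruction of any ancient procedure: it estimates how far the far end of a 440-cubit side drifts sideways if the angle at the corner is slightly off. The function name `end_offset`, the sample errors, and the metric cubit length of 0.523 m (chosen only to be consistent with "440 cubits (over 230 m)") are assumptions made for illustration.

```python
import math

# Assumed metric length of the cubit, consistent with "440 cubits (over 230 m)".
CUBIT_M = 0.523


def end_offset(side_cubits, error_arcmin):
    """Sideways drift (in cubits) at the far end of a side set out with a small angular error."""
    error_rad = math.radians(error_arcmin / 60.0)
    return side_cubits * math.tan(error_rad)


# Hypothetical angular errors, in arcminutes, applied to a 440-cubit side.
for arcmin in (1, 5, 30):
    drift = end_offset(440, arcmin)
    print(f"{arcmin:>2} arcmin -> {drift:.3f} cubits (about {drift * CUBIT_M * 100:.1f} cm)")
```

Under these assumptions, an error of a single arcminute already displaces the far corner by about 0.13 cubit, a little under 7 cm, which gives a sense of the care that an accurate layout on this scale required.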

Meanings Beyond Numbers?

Architecture combines eminently practical and highly symbolic components. Any building must stand, first of all, and thus respect some basic rules of stability; but everything else, from the design of the interior and the exterior to the proportions of its architectural elements, depends on the message that each specific building is meant to convey. One example from ancient Egypt is the choice of the slope of pyramids, which was expressed as the horizontal displacement of the sloping surface at the height of one cubit (the basic unit of measurement, corresponding to the forearm); called seked, this linear measure was generally expressed in palms and fingers, the submultiples of the cubit. In general, the slope of pyramids increased as the size of the monuments decreased: the overall impression is that the builders aimed at building tall and pointed monuments, but also that they always kept below a certain slope when they built large pyramids. The intuitive explanation that builders referred to the natural “angle of repose” of the material, that is, the natural angle at which any heaped matter remains without changing its form (Dreyer and Swelim 1980: 95), is certainly a starting point; however, well-laid stone courses allowed a wider range of possible slopes, as is in fact attested by the archaeological remains (Rossi 2004: Appendix). One of the reasons must have been the fear of overloading the core of the monument, a problem encountered during the construction of the first smooth pyramid, the Bent Pyramid of Dahshur of king Snefru; this problem was avoided by Snefru’s son, Khufu, who chose to build his pyramid (the largest ever built) on the Giza plateau, on and around a small rocky hill (Lehner 1997: 109).
He carefully chose the slope of the steepest true pyramid that had been completed so far, that of Meidum, corresponding to 5 palms and 2 fingers (a bit less than 52°), which was slightly flatter than the lower part of the Bent Pyramid. His successor Khafra chose a slightly steeper slope, 5 palms and 1 finger (ca 53°), that allowed his pyramid to rival Khufu’s monument in height, even if its side-length was shorter; Khafra’s slope was the steepest ever used in the construction of large pyramids.

It cannot be excluded that keeping below a certain value might have been due also to other practical reasons, such as facilitating the construction of the ramps that must have been used to lift the stone blocks. There is substantial archaeological evidence of ramps used to transport blocks to the base of the pyramids, and up to a certain level; but things become quite foggy when trying to imagine how the blocks reached the upper part of these monuments: linear ramps starting from ground level are out of the question (as they would eventually become too long), as are large spiral ramps encapsulating the whole monument (which would hide the lines of the four corners and prevent the builders from checking their alignment). If small and narrow ramps were built on and around the upper part of the monument, then a not-too-steep slope would have helped. This problem did not concern smaller pyramids, which could be built with the aid of relatively small mudbrick ramps and structures that did not pose specific problems. The New Kingdom pyramids that topped the private tombs of Deir al-Medina reached slopes of 3 or even 2 palms (corresponding to 67° and 74°), but their height was just between 3 and 5 m: these pyramids could be easily built with the aid of small additional structures that could rise and be dismantled independently from the small monument that was being built.

For his first true pyramid, that of Dahshur, Snefru had initially chosen a slope of 4 palms (60°); after the first cracks appeared, an attempt was made to solve the problem by enclosing this stump of pyramid within a larger layer with a flatter slope, corresponding to 5 palms (ca 54°). As the structural problems persisted, the decision was taken to drastically reduce the slope to 7 palms and 1 finger (ca 43°): the final result was a jagged profile, from which the modern name of Bent Pyramid derives. It is interesting that the first slope chosen for this monument (4 palms, corresponding to 60°) means that the vertical section of the original pyramid was meant to be an equilateral triangle.
The second choice, 5 palms, would have generated a pyramid in which the faces corresponded to equilateral triangles, thus somehow retaining the use of this geometrical figure. Curiously enough, the equilateral triangle did not bring any luck to the kings who tried to use it in their pyramids and does not rank among the most successful shapes used by royal pyramid builders (Rossi 2004: Appendix). One of the most successful slopes appears to have been 5 palms and 1 finger, the one first used by the Fourth Dynasty king Khafra at Giza, that corresponded to the right-angled triangle 3-4-5. All the Sixth Dynasty kings appear to have chosen this slope for their funerary monuments; the secondary pyramids that completed their funerary complexes, dedicated to their queens, were all steeper. However, different slopes appear to have been used in secondary pyramids that were extremely similar in size: all four pyramids of the queens of Pepi I had a slope of half a cubit (and thus their bases were equal to their heights), but two had a side-length of 30 cubits and two of 40 cubits; that is, their proportions were the same, but two pyramids were larger than the other two. Of the three pyramids of the queens of Pepi II, instead, the largest was the steepest and the smallest was the flattest, the opposite of what one could expect if the size of the monument was the only element to be taken into account to choose the slope. It is therefore possible that slopes were chosen also on other bases, perhaps as a direct reference to older funerary monuments and their owners, or to other factors that are unknown to us (Rossi 2004: 236–238).
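
The slopes discussed in this section can be checked with a short calculation. The sketch below, in modern Python, converts a seked into an angle and derives a seked from a base and a height. It assumes a cubit of 7 palms and a palm of 4 fingers, a subdivision that is standard for the royal cubit but is not stated in the text above; the function names are hypothetical, and the trigonometry is a modern convenience rather than the scribes' own method.

```python
import math

PALMS_PER_CUBIT = 7   # assumption: royal cubit divided into 7 palms
FINGERS_PER_PALM = 4  # assumption: palm divided into 4 fingers


def seked_to_degrees(palms, fingers=0):
    """Angle of slope for a seked: horizontal run (palms, fingers) per cubit of rise."""
    run_in_palms = palms + fingers / FINGERS_PER_PALM
    return math.degrees(math.atan(PALMS_PER_CUBIT / run_in_palms))


def seked_from_dimensions(base_cubits, height_cubits):
    """Seked in palms of a pyramid with the given base side and height (in cubits)."""
    half_base = base_cubits / 2
    return PALMS_PER_CUBIT * half_base / height_cubits


# Khufu's pyramid as described in the text: base 440 cubits, height 280 cubits.
khufu = seked_from_dimensions(440, 280)
print(f"Khufu: {khufu} palms")  # 5.5 palms, i.e. 5 palms and 2 fingers

# Sekeds of the two "equilateral" designs considered for the Bent Pyramid.
section_equilateral = PALMS_PER_CUBIT / math.sqrt(3)  # vertical section equilateral
faces_equilateral = PALMS_PER_CUBIT / math.sqrt(2)    # each face equilateral
print(f"{section_equilateral:.2f} palms, {faces_equilateral:.2f} palms")

# Several of the slopes mentioned in the text, as (palms, fingers).
for palms, fingers in [(2, 0), (3, 0), (3, 2), (4, 0), (5, 0), (5, 1), (5, 2)]:
    angle = seked_to_degrees(palms, fingers)
    print(f"{palms} palms {fingers} fingers -> {angle:.1f} degrees")
```

On these assumptions the two equilateral cases come out at roughly 4.04 and 4.95 palms, so that 4 and 5 palms are close whole-palm approximations of the two triangles, while the dimensions given for Khufu's pyramid correspond exactly to 5 palms and 2 fingers. The entry (3, 2) is the half-cubit slope of the queens' pyramids of Pepi I, for which the height equals the base.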



Conclusions
In conclusion, any discussion on the relationship between mathematics and architecture in ancient Egypt should be carried out bearing in mind not only the similarities, but also the differences between the ancient and the modern languages and concepts. “Reckoning” (keeping things in order) was deeply intertwined with “building,” but many aspects of this relationship remain unclear. Calculations supported some practical aspects of the construction process, but the role of numbers and geometrical figures at the planning stage is unclear. Scribes were involved in the management of building sites that were under the responsibility of high-ranking officials who also bore the title of scribe, and often that of Superintendent of the Works; in some cases, this title appears to reflect an actual technical knowledge, while in other cases it might have been just an honorific title. A grey area still persists between the workmen, whose tasks can be easily defined and described, and the high-ranking officers. The overall appearance and design of the royal monuments is likely to have been chosen by the highest personalities involved in the task, and then someone with specific technical knowledge must have been entrusted with the practical tasks and must have coordinated the work of the lower-ranking personnel. In a schematic interpretation, the symbolic aspects of building a monument might be attributed to the king and the high-ranking officers, whereas the practical aspects would be the realm of the workmen. “Reckoning” is likely to have been involved at all levels, possibly playing different roles. The choice of the overall, initial, and basic dimensions might have had a symbolic meaning that was evidently decided in advance, but there is ample evidence that many geometrical details were then handled directly in the field by the workmen.
This appears to confirm that “reckoning” was pervasive and that it could play different roles and probably have different meanings at the various social levels of the personnel involved in the complex task of building a monument.

References Allon N, Navratilova H (2017) Ancient Egyptian scribes: a cultural exploration. Bloomsbury, London Arnold D (1991) Building in Egypt, pharaonic stone masonry. Oxford University Press, Oxford Badawy A (1948) Le dessin architectural chez les anciens Egyptiens. Imprimerie Nationale, Cairo Baines J (1983) Literacy and ancient Egyptian society. Man 18(3):572–599 Chace A, Manning HP (1927) The Rhind mathematical papyrus. British museum 10057 and 10058. Mathematical Association of America, Oberlin Clagett M (1999) Ancient Egyptian science. A source book. Vol. Three: ancient Egyptian mathematics. American Philosophical Society, Philadelphia Clarke S, Engelbach R (1930) Ancient Egyptian masonry. Oxford University Press, Oxford Cuomo S (2012) Exploring ancient Greek and Roman numeracy. BSHM Bull: J Br Soc Hist Math 27(1):1–12. https://doi.org/10.1080/17498430.2012.618101 Dorman P (1988) The monuments of Senenmut. Kegan Paul, London



Dreyer G, Swelim N (1980) Die kleine Stufenpyramide von Abydos-Süd (Sinki). Mitteilungen des Deutschen Archäologischen Instituts Kairo 38:83–95 Faulkner R (1981) A concise dictionary of middle Egyptian. Griffith Institute and Ashmolean Museum, Oxford Gardiner A (1957) Egyptian grammar, 3rd edn. Griffith Institute and Ashmolean Museum, Oxford Gunn B (1926) Inscriptions from the step pyramid site. Annales du Service des Antiquités de l’Égypte 26:177–196 Hannah R (2009) Time in antiquity. Routledge, London/New York Imhausen A (2016) Mathematics in Ancient Egypt. A contextual history. Princeton University Press, Princeton/Oxford Johnstone S (2011) A history of trust in ancient Greece. University of Chicago Press, Chicago/London Kemp BJ (2006) Ancient Egypt. Anatomy of a civilization, 2nd edn. Routledge, London/New York Lauer J P (1936–9) La pyramide à degrés, l’architecture. Service des Antiquités de l’Égypte, Cairo Lehner M (1997) The Complete Pyramids. Thames & Hudson, London Lesko L (2001) Literacy. In: Redford D (ed) The Oxford encyclopedia of ancient Egypt, vol 2. Oxford University Press, Oxford, pp 297–299 Lichtheim M (1973) Ancient Egyptian literature, vol. I: The old and middle kingdoms. University of California Press, Berkeley/Los Angeles/London Lichtheim M (1976) Ancient Egyptian literature, vol. II: The New Kingdom. University of California Press, Berkeley/Los Angeles/London Magli G (2013) Architecture, astronomy and sacred landscape in ancient Egypt. Cambridge University Press, Cambridge Peet TE (1923) The Rhind mathematical papyrus, British museum 10057 and 10058. The University Press of Liverpool and Hodder & Stoughton, London Perisho MW (1965) The etymology of mathematical terms. Pi Mu Epsilon Journal 4(2):62–66 Robins GR (1994) Proportion and style in ancient Egyptian art. Thames & Hudson, London Robins GR, Shute CCD (1987) The Rhind mathematical papyrus: an ancient Egyptian text. 
British Museum Publications, London Robson E (1999) Mesopotamian mathematics, 2100–1600 BC. Technical constants in bureaucracy and education. Clarendon Press, Oxford Roero CS (1994) Egyptian mathematics. In: Grattan-Guinness I (ed) Companion encyclopedia of the history and philosophy of the mathematical sciences, vol 1. Routledge, London, pp 30–45 Rossi C (2004) Architecture and mathematics in ancient Egypt. Cambridge University Press, Cambridge Rossi C (2010) Science and technology, Chapter 21. In: Lloyd AB (ed) The Blackwell companion to ancient Egypt, vol I. Blackwell, Oxford, pp 390–408 Rossi C, Imhausen A (2009) Papyrus Reisner I: architecture and mathematics in the time of Senusret I. In: Ikram S, Dodson A (eds) Beyond the horizon: studies in Egyptian art, archaeology and history in Honour of Barry J. Kemp. American University Press, Cairo, pp 440–455 Schiaparelli E (1957) Relazione sui Lavori della Missione Archeologica Italiana in Egitto (anni 1903–1920). Volume secondo: La Tomba Intatta dell’Architetto Cha nella Necropoli di Tebe. Museo delle Antichità, Torino Snape S (2011) Ancient Egyptian tombs. Wiley and Blackwell, Oxford Spalinger A (1990) The Rhind mathematical papyrus as a historical document. Studien zur Altägyptischen Kultur 17:295–337 Stedall J (2012) The history of mathematics. A very short introduction. Oxford University Press, Oxford Varille A (1968) Inscriptions concernant l’architecte Amenhotep fils de Hapou. Imprimerie de l’Institut Français d’Archéologie Orientale, Cairo Wildung D (1977) Egyptian saints. Deification in pharaonic Egypt. New York University Press, New York Wilkinson RH (2003) The complete gods and goddesses of ancient Egypt. Thames and Hudson, London

40 Labyrinth

Tessa Morrison

Contents
Introduction
Topology of Labyrinths
Definitions
Definition
Mnemonic Devices
Conclusion
References

Abstract
The labyrinth is a fusion of architecture and symbol, and it has permeated culture since ancient times. It is also a synthesis of reality, religion, and myth that has merged through the ages. It has been a prison for the Minotaur in ancient Crete and an ancient Egyptian palace as described by Herodotus and Pliny. In ancient Rome, it appears as graffiti in Pompeii. For pseudo-Dionysius (late fifth century to early sixth century), it was the dance of the angels which has formed the basis for Christian processions and rituals and for shaping the architectural boundaries of early churches. In mediaeval times, it was the pathway to Jerusalem that decorated the floors of cathedrals, particularly in France. Although labyrinths come in various forms, a very precise structure of the symbol of the labyrinth emerges in ancient times that is repeated in various cultures, sometimes round and sometimes square but exactly the same structure. The word “labyrinth” seems to have so many meanings, yet many of these ancient labyrinths have one precise geometrical structure that has been retained for millennia and is commonly called the Cretan labyrinth. The labyrinth, architecture, and geometry have been entwined through time. This chapter considers this relationship by examining a paradigm that makes it possible to assess these simple structures and how they have changed in appearance, while the geometrical structure has stayed the same.

T. Morrison
The School of Architecture and Built Environment, The University of Newcastle, Newcastle, NSW, Australia
e-mail: [email protected]

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_4

Keywords Cretan labyrinths · Roman labyrinths · Church labyrinths

Introduction
In modern speech, the words “maze” and “labyrinth” are often interchanged, yet they have very different meanings. The word maze originated from the old English mazen, which means to bewilder, and it was first recorded in c.1385 by Geoffrey Chaucer in The Legend of Good Women (Bradley et al. 1989, 507). It is mostly associated with English maze gardens where there is more than one path to the center; these are multicursal patterns. The most famous of these is at Hampton Court in England that dates to the Elizabethan era of the sixteenth century. Labyrinths, on the other hand, are unicursal patterns, meaning they only have one path to the center, and once on the path, no deviation is possible except to return to the beginning. These labyrinths have been dated back to the early Bronze Age, 2500–2000 BC; however, they are notoriously difficult to date. They were carved into stone and are found in Italy, Spain, Iran, England, Ireland, and Sardinia and are all circular. The first precise dating of this geometric structure was in Pylos, in Greece. There it is found in square form, preserved on a clay tablet: on one side is a labyrinth, and on the other side is an inventory. This early example of the labyrinth appears as a doodle that was preserved by fire in the destruction of the city of Pylos in the thirteenth century BC (see Fig. 1a). Exactly the same symbol was found some 1300 years later, and it was also preserved by disaster. It came in the form of graffiti on the peristyle of a villa in Pompeii, with the inscription “labyrinths. Here lies the Minotaur” (Fig. 1b). The same pattern occurs on coins from Crete from the second and third century BC (Fig. 1c), on a seventh-century BC Etruscan wine pitcher (see Fig. 1d), and in a ninth-century AD Biblical manuscript with the inscription “the walls of Jericho” (Fig. 1e). Sometimes this symbol is round and sometimes square, but it is the same structure.
The symbol has been called the Cretan labyrinth after the myth of the Minotaur. In this myth, every 7 years, seven males and seven females from Athens were to be sacrificed to the Minotaur, half man, half bull, who was imprisoned in the labyrinth in Crete. The labyrinth had been built by the famous architect Daedalus, and it was designed with many confusing paths, making it difficult to find the way out. However, Ariadne, daughter of the King of Crete, fell in love with one of those to be sacrificed, Theseus. He smuggled a sword into the labyrinth to kill the Minotaur, and Ariadne gave him a ball of string so that he would be able to retrace his steps and escape the labyrinth (Apollodorus 1997).


Fig. 1 (a) The Pylos labyrinth, thirteenth century BC, (b) graffiti from Pompeii 79 AD, (c) Cretan coins c.190–100 BC and 267–200 BC, (d) seventh-century BC Etruscan wine pitcher, and (e) nineteenth-century AD Biblical manuscript

Classical writers such as Herodotus, c.485–425 BC (Herodotus 1939), and Pliny the Elder, 23–79 AD (Pliny 1949), wrote of an Egyptian labyrinth. This was found in the Temple of the Twelfth Dynasty Pharaoh Ammenemes III, c.1842–1797 BC, which was built beside his pyramid at Hawara in the Fayum district. Herodotus was overwhelmed by the magnificence of the Egyptian labyrinth. He stated that “the pyramids, too, are astonishing structures, each one of them equal to many of our most ambitious works of Greece; but the labyrinth surpasses them. It has 12 covered courts – six in a row facing north, six south – the gates of the one range exactly fronting the gates of the other, with a continuous wall around the outside of the whole. Inside, the building is of two storeys and contains 3000 rooms, of which half are underground, and the other half directly above them” (Herodotus 1939, II.149). By the nineteenth century, when Flinders Petrie found evidence of its existence, nothing of this magnificent Egyptian labyrinth survived; it was literally a hole in the ground (Hall 1904). Herodotus’ description of the Egyptian labyrinth sounds more like that of a multicursal maze than a unicursal labyrinth. However, Pliny claimed “there is no doubt that Daedalus adapted it [the Egyptian labyrinth] as the model for the labyrinth built by him in Crete, but he reproduced only a hundredth part of it containing passages that wind, advance and retreat in a bewilderingly intricate manner” (Pliny 1949, XXXVI. 85). Yet, by this time the unicursal labyrinth structure that became known as the Cretan labyrinth was well established, as is demonstrated in the graffiti at Pompeii. However, it is often stated that the unicursal labyrinthine structure was the pattern left by Ariadne’s thread that Theseus took into the labyrinth to find his way out, rather than the labyrinth itself.

Topology of Labyrinths
The main purpose of this chapter is to analyze the ancient labyrinths associated with architecture; thus there are restrictions on the type of labyrinths examined. To examine the structure of the unicursal labyrinth, it is first necessary to define its anatomy.

1. There are an equal number of paths or levels of the labyrinth equidistant from the center, so that the levels are concentric. These can be concentric circles, squares, or any geometrical shape, as long as they are parallel and equidistant from the center. Each level has maximum coverage, meaning the only gaps on the level are for turns.
2. The unicursal labyrinths have a restricted anatomy, as described in Fig. 2.
3. There are a finite number of levels.

These ancient labyrinths obey the law of alternation, meaning that the direction of the path changes whenever the level of the labyrinth changes. In short, these labyrinths are simple, alternative, transitive labyrinths or SAT labyrinths. In strict mathematical terms, the unicursal labyrinth’s topology is that of a straight line, which gives very little information on the symbol’s structure. However, in this chapter a paradigm will be outlined that emphasizes its features, such as turns. Figure 2a shows an example of a SAT labyrinth that has alternating rows, spirals, and only one semiaxis. The semiaxis from the entrance to the center is called the throat of the labyrinth. Figure 2b only obeys the laws of alternation and has four semiaxes. To analyze these symbols, it is necessary to first unroll the labyrinths. This reduces the symbol to its fundamental form (FF), which consists purely of the geometry of its turns, thus making it simpler to analyze. Figure 3 demonstrates the process of unrolling the labyrinth with only one semiaxis, to reveal its FF (Fig. 3e). Often the FF consists of fundamental elements (FE) which are stacked vertically to create the FF. However, many SAT labyrinths have more than one semiaxis, particularly church and Roman mosaic labyrinths.

Fig. 2 Anatomy of the SAT labyrinths

Fig. 3 The process of unrolling a SAT labyrinth


Definitions
A fundamental form or FF is a rectangular graph obtained when a labyrinth is unrolled. An FF is constructed from fundamental elements or FEs, which are the minimal building blocks of the FF. One or more FEs are stacked vertically to create an FF. The SAT labyrinth is the most enduring and simplest labyrinth. There are two other families of labyrinths: the church labyrinths and the ancient Roman labyrinths that were found in mosaic floors. Both generally have four semiaxes, although this number can be extended. This gives the labyrinth the appearance of perfect fourfold rotational symmetry. By using the same process of unrolling the labyrinths, the end result is a series of FFs. Figure 4 shows a Roman mosaic labyrinth unrolled, and this reduces the labyrinth to the four sections that make up the FF of the Roman labyrinth. The labyrinth has ten rows and four sectors that can be clearly seen when reduced to an FF. Each section consists of a single FE or a collection of vertically stacked FEs, and each section is called a fundamental element sector (FES).

Definition
In an FF of a labyrinth with semiaxes other than the throat, there is a division caused by each semiaxis, which divides the FF into sectors. These sectors consist of a single FE, or a collection of vertically stacked FEs, and they are called fundamental element sectors or FESs. The Cretan labyrinth has eight levels, and its level sequence from the outside, designated as 0, to the center is 032147658. An example of an FE is 3214, which can be symbolized by $\gamma_4$; the other part of the sequence, 7658, is isomorphic to 3214 but lies four levels further from the entrance, so it is also symbolized by $\gamma_4$; this is demonstrated in Fig. 3e. Therefore, the FF of the Cretan labyrinth is denoted as SAT[$\gamma_4^2$]. In general the notation is SAT[$\xi_n^m \ldots \psi_n^m$], where $\xi, \psi$ = FE and $n, m = 1, 2, 3, \ldots, s$. The level sequence of the FEs essentially determines the topology of the SAT labyrinths. The two FEs of the Cretan labyrinth are clearly visible in the unrolling of the SAT labyrinth demonstrated in Fig. 3. The subscript gives the number of levels in the particular FE sequence, and the superscript gives the number of these particular FEs that are stacked vertically in succession. The level sequence of the FEs is completely determined by the topology of the SAT labyrinths. The level sequences of the FEs are read from the beginning of the right-hand side, and the levels run from the top to the bottom. The notation of the FEs’ stacking is also read from the top to the bottom. Figure 4a demonstrates how the levels of the graph are read, while Fig. 4b illustrates examples of FEs’ dual or inverse sequence patterns and their level sequences. The duals of the FEs denote the walls of the labyrinths, and the FEs denote the paths of the labyrinths. However, what is interesting is that only three main FEs are needed to define the topology of the SAT labyrinth. These are $\gamma_n$, which denotes the FE $(n-1) \ldots 3\,2\,1\,n$ (examples of this are $\gamma_2 = 12$, $\gamma_4 = 3214$, and $\gamma_6 = 543216$),


Fig. 4 Demonstrating how the levels of the graph are read


$\phi_n$, which denotes $1\,2\,3\,4 \ldots (n-3)\,(n-2)\,(n-1)\,n$, and furthermore $\eta_n$, which denotes the FE $(n-1)\,2\,(n-3)\,4\,(n-5)\,6 \ldots 5\,(n-4)\,3\,(n-2)\,1\,n$. An example of this is $\eta_8 = 72543618$. Another two FEs are added: $\alpha_n$ and $\alpha_2[\alpha_n]$ or $[\alpha_n]\alpha_2$. The brackets indicate nested turns, and $n$ is always even. These nested turns are needed for the FFs of the church and Roman labyrinths. With these five classifications of FEs, it is possible to classify the overall topology of all unicursal labyrinths. Excluding spirals, SAT labyrinths are the simplest form of labyrinthine structures. Roman labyrinths are the next significant development in labyrinths. The typical arrangement of the Roman labyrinth is in four sections. Only rarely is this number extended, and traditionally Roman labyrinths have a fourfold symmetry. These sections become clear when the labyrinth is unrolled. At first appearance, the Roman labyrinths look far more complex than the Cretan labyrinth and completely different from it in structure. Figure 5 demonstrates this, showing that by unrolling the Roman labyrinth its structure is simplified. The topology of each of the Roman labyrinth’s four FESs becomes that of a SAT labyrinth. Figure 5 clearly shows a typical Roman labyrinth which has been unrolled, and the first, second, and third sections are isomorphic to SAT[$\gamma_4^2$], with each sector’s path being linked. But the fourth FES introduces a new form of FE. Fig. 4c shows an FE that is labelled $\alpha_2[\alpha_n]$; this means that an FE $\alpha_n$ on the right-hand side of the graph is nested within an $\alpha_2$ on the left-hand side. This type of FE cannot constitute an FF alone. In Fig. 5, the fourth FES is labelled $(\alpha_2[\alpha_n])^2$; the notation that

Fig. 5 Unrolling Roman mosaic labyrinths


will be used for the FF of the Roman labyrinth in Fig. 5 is R[$\gamma_4^2 \times 3$, $(\alpha_2[\alpha_n])^2$]. This classifies each FES and is read from left to right. In general, the notation is R[$\xi \times i, \ldots, \psi \times j$], where $\xi, \psi$ = FES and $i, j = 1, 2, 3, \ldots, n$. This will enable the variations of each Roman labyrinth to be detected no matter how many FESs there are. Looking at the topology of the FESs, each FES, with the occasional exception of the last sector, runs to the center and becomes a SAT labyrinth. Many of the early Roman labyrinths are associated with cities and the myth of Theseus and the Minotaur (see Fig. 6). There are thought to be at least 43 recorded Roman labyrinth mosaic floors, but the survey of these floors is incomplete, since many of the labyrinths are in war zones and many have been reburied to protect them (Kern 2000). The most common topology of these Roman floor labyrinths is $\gamma_4^2$, the FF of the Cretan labyrinth, strongly suggesting a cultural or ritual connection. The function of these labyrinths is difficult to ascertain, as there is no ancient written reference to their purpose, whether mythical or possibly ritual. But they do indicate some movement through into the center. The earliest labyrinth in a church is a Roman labyrinth in the Cathedral of Algiers, dated to the fourth century. It is the standard four-sector Roman labyrinth, and in the center is a matrix of letters where the word “Sancta” (holy) is spelt out in all four directions in the shape of a swastika, replacing the more standard image of the Minotaur. The labyrinth is in the entrance of the basilica, seemingly unrelated to the shape of the building. Nevertheless, it was a dominant feature of the original basilica. Church labyrinths are the third family of unicursal labyrinths. The most famous of these is the floor labyrinth at Chartres Cathedral, France (see Fig. 7). The FF of the Chartres Cathedral labyrinth is isomorphic to Fig. 7d.
The standard church labyrinth has 12 concentric levels and is divided into four sections. However, Fig. 8 shows examples of these labyrinths for levels 8, 10, and 12 with their FFs. The symmetry of the labyrinth is highlighted, and it is unrolled in a similar fashion to the Roman labyrinth. It is possible to detect the variations of the structure and symmetry of the labyrinth. An example of the notation used for the FFs of the church labyrinths in Fig. 7 is C[$(\alpha_2[\alpha_2])^2$, $\alpha_3^2$, $\alpha_3^2$, $(\alpha_2[\alpha_2])^2$]. This expression reads from left to right and classifies each FES. In general the notation is C[$\xi_n, \ldots, \psi_m$], where $\xi, \psi$ = FES and $n, m = 1, 2, 3, \ldots, s$. The dotted and solid line under the FESs indicates an extra line on the first and last level FES. If the turns on the semiaxes are removed, then the SAT structure underlying the church labyrinth is revealed as shown in Fig. 8. When the church labyrinth has eight concentric levels, the level sequence of the underlying SAT labyrinth is 032147658, which is isomorphic to the Cretan labyrinth (see Fig. 9). This is the only variation of a level 8 standard church labyrinth. Level 10 church labyrinths have two variations (see Fig. 8b, c), which are symmetrical to each other. The Chartres Cathedral labyrinth has an underlying SAT labyrinth with a level sequence of 0, 5, 4, 3, 2, 1, 6, 11, 10, 9, 8, 7, 12, that is, two repeated sequences of the FE $\gamma_6$. The SAT labyrinths underlying all the church labyrinths are combinations of the FEs $\gamma_6$ and $\gamma_4$. The 12-level style labyrinth is found on the floors of churches, on roof bosses, on stone steles, and in many manuscripts. These structures indicate a strong architectural and mathematical continuity with the past (Morrison 2009).


Fig. 6 Roman labyrinths

By using this process of unrolling and assessing the overall structures of the labyrinths, it becomes possible to see the cultural transference of the structures and how they are embedded into other symbols from other cultural symbolic formats. By classifying the FFs of the three types of labyrinths (SAT, Church, and Roman labyrinths), it is possible to examine their structural connections, variations, and transformations from one to the other. The level 8 church labyrinth


Fig. 7 Chartres Cathedral labyrinth

is interesting, since it gives the appearance of having no resemblance to the Cretan labyrinth. The initial thought is that the church labyrinth evolved from the Roman labyrinth because of the appearance of the four quadrants and the close cultural links. However, the structure of the SAT labyrinths is embedded into the Roman labyrinths, which in turn is the foundational structure of the church labyrinth. This raises the question: how could such a complex structure be preserved, embodied into other cultural symbols, and transferred through time? The Cretan labyrinth is an enduring and complex symbol; it is also very difficult (if not impossible) to hand draw with any accuracy, yet this precise structure has been repeated for centuries. However, on close examination of these labyrinths, particularly the ones which have been drawn into clay tablets or as graffiti, there is clearly an underlying structure to assist in this drawing. The continuous repetition of this complex symbol appears to support the suggestion that it was drawn with the aid of a mnemonic, a small symbol that acted as a guide to construct these more complex symbols. Using mnemonic devices in training the memory was common in classical times. In a world devoid of printing and notepaper, a highly trained memory was of paramount importance to recover information, and rhetoric was an important part of classical education. The earliest surviving treatise on training the memory is known as Ad Herennium and is dated c.86–5 BC. However, within the text of Ad Herennium are described earlier Greek writings on the art of memory which do not survive (Yates 1966).


Fig. 8 Level 8, 10, and 12 church labyrinths and their FF and underlying SAT FF

Mnemonic Devices
A mnemonic is a device used to remember something that is otherwise too hard to recall in detail. Several mnemonic systems have been suggested. One symbol and method of construction is continually pointed to as being the easiest way to draw a Cretan labyrinth (Attali 1999; Kern 2000; Morrison 2009). The mnemonic, or


Fig. 9 Removing the semiaxis from the level 8 church labyrinth

nucleus, is shown in Fig. 10a and is expanded by beginning at the top vertical of the cross and then inserting a right angle or arc between the vertical of the cross and the vertical of the L-shape on the right-hand side (see Fig. 10b). The second step begins on the vertical on the left-hand side, follows the path made in the previous step, and terminates at the dot in the right-hand quadrant (see Fig. 10c). The subsequent steps, following Fig. 10, continue to build up the labyrinth by beginning at the dot or the line on the left-hand side, leaving the lines that have terminated at the dots, and then traversing the symbol in the same direction and terminating at the first dot or line on the right-hand side, again leaving the lines that have terminated at the dots. Expanding this nucleus (Fig. 10a) into the complete labyrinth is reported to have been a game called “walls of Troy” that was well known at the beginning of the twentieth century (Heller 1946). This same nucleus is a symbol that is found on many ancient pottery shards and may have been used as seen in the thirteenth-century BC clay tablet from Pylos, where a pattern of dots can be seen pressed into the clay (Fig. 1a). Although there is no doubt that there would be a need for some mnemonic system for such a complex structure, and there is evidence of dots, and sometimes pinholes in parchment, there is no evidence of the actual algorithm, and this can only be speculation. Other suggested nuclei have been in the form of religious symbols such as the double axes from the Minoan society and the ancient symbol of the swastika, where the labyrinth is constructed in a similar manner to Fig. 10 (Morrison 2009). Nevertheless, these labyrinthine structures endured through time and appeared to have some mythical and ritualistic function, and this function was adapted and absorbed through changes in culture and religion.
By the end of the sixth century, the church labyrinth had developed from the Roman labyrinth. The earliest surviving example is from the sixth century at San Vitale, Ravenna. However, it was not until the twelfth century that the church labyrinths became a significant part of cathedrals, particularly in France, the most famous of all being the floor labyrinth of Chartres Cathedral. Originally there was an image of Theseus and the Minotaur on a bronze plaque at the center of the Chartres Cathedral


Fig. 10 One of the suggested systems of mnemonic for the Cretan labyrinth

labyrinth. Unfortunately this was removed and melted down for the Napoleonic Wars. The original purpose of the church labyrinths is debatable and the connection with the Minotaur obscure. Many modern churches are having a church labyrinth incorporated into the church as a mosaic floor, and walking the labyrinth is equated to the concept of walking a pilgrimage. However, there is no evidence that this was their original purpose, but walking or dancing the pattern of all the labyrinths, whether it be Cretan, Roman, or church labyrinths, does appear to be the obvious, although speculative, purpose.


Conclusion
When the labyrinths are examined by eye, there is little to hold them together apart from the fact that they are unicursal. However, examining these three types of labyrinths through the paradigms described above does indicate that there is some structural connection, which may have evolved through ritual, and perhaps cultural, connections. Apart from the geometrical structure, the architectural elements of the walls of the labyrinths or of a city, whether it be Troy or Jericho, are common decorative features, and sometimes a literary reference or connection is strongly associated with these labyrinths. The connection of the labyrinths with architecture has a mythical presence over the millennia. Their connection in modern times has become more literal than symbolic through literature such as Invisible Cities, by Italo Calvino, and “Coleridge’s Dream” by Jorge Luis Borges. The patterns of their labyrinths cannot be assessed as easily as the Cretan, Roman, and church labyrinths, but they are nevertheless part of Ariadne’s thread.

References

Apollodorus (1997) The library of Greek mythology (trans: Hard R). Oxford University Press, Oxford
Attali J (1999) The labyrinth in culture and society. North Atlantic Books, Berkeley
Bradley H, Murray JAH, Craigie WA, Onions CT (1989) The Oxford English dictionary. Clarendon Press, Oxford
Herodotus (1939) The histories – book II. Methuen, London
Hall HR (1904) The two labyrinths. J Hell Stud XXIV, p 320–337
Heller JL (1946) Labyrinth or troy town. Class J 42:175–191
Kern H (2000) Through the labyrinth. Prestel, Munich
Morrison T (2009) Labyrinthine symbols in western culture: an exploration of the history, philosophy and iconography. VDM Verlag, Saarbrücken
Pliny TE (1949) Natural history. Harvard University Press, Cambridge, MA
Yates FA (1966) The art of memory. The University of Chicago Press, Chicago

Classical Greek and Roman Architecture: Mathematical Theories and Concepts

41

Sylvie Duvernoy

Contents

Introduction
The Figurate Representation of Quantities
  Arithmetic
  Geometry
The Visual Comparison of Quantities
The Theory of Proportion and Means
  Musical Proportions
  The Duplication of the Cube
Art and Architecture
Conclusion
Cross-References
References

Abstract

In classical antiquity only whole numbers (the natural integers) were known, and mathematics was very different from the way it is today. But whereas the mathematics of this ancient era was in one sense more basic, it made use of many theoretical concepts and approaches that are no longer familiar to modern scientists. This chapter introduces three mathematical concepts or approaches that provided a foundation for classical Greek and Roman architecture. The first of these, which was equally significant for geometry and arithmetic, is concerned with the figurate representation of quantities. The second is associated with the visual comparison of magnitudes, and the last is the theory of mean proportions.

S. Duvernoy
Politecnico di Milano, Milan, Italy
e-mail: [email protected]

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_61


Keywords Pythagoras · Euclid · Plato · Mean proportional · Musical proportions · Commensurability · Symmetry

Introduction

The Classical era in Greece and Rome is conventionally regarded as spanning almost 1000 years, from around 800 BC through to 200 AD. During this period the Greco-Roman world laid the foundations for contemporary thought in science, philosophy, literature, art, politics, architecture, and mathematics. The latter two are the focus of the present chapter, in particular the mathematical theories, concepts, and practices that became central to the architecture of the era and that continue to have an enduring, if more subtle, influence to the present day. In order to understand the relationship between architecture and mathematics in classical antiquity, it is first necessary to understand the essence of Greek mathematics and the methodologies of scientific research that made developments in mathematical and architectural knowledge possible. Geometry and arithmetic, the two main branches of ancient theoretical mathematics (and also two of the four liberal arts of the quadrivium), relied on sensorial perception, both in the process of undertaking research and in the process of dissemination. Sensorial perception provides a fundamental point of connection with architecture and points to the essence of the relationship between science and arts. In classical geometry and arithmetic, numbers and magnitudes were tangible objects that could be shaped, modeled, and transformed from linear to planar and to solid. They could be drawn, measured, and compared. Among the main research tools that ancient mathematicians used were the figurate (meaning graphic or figurative) representation of quantities, the visual comparison of magnitudes, and the theory of mean proportions. In classical antiquity mathematicians utilized a sensorial approach to scientific research, while artists and architects worked out intellectual design procedures relying on numbers and proportional systems to size and shape their works.
A common concern of all of these disciplines was understanding and imitating the beauty of divine creation. As such, mathematicians, artists, and architects potentially shared similar methods and motives. However, historians of Greek mathematics have a far richer bibliography of extant ancient treatises on which to base their studies than historians of Greek architecture. Even a cursory comparison between the number of preserved treatises of mathematics and those of architecture reveals a disparity heavily in favor of the mathematical treatises. From ancient Greece, not only have the books of Euclid (fl. 300 B.C.) come down to us but also those written by Aristotle (384–322 B.C.), Archimedes of Syracuse (287–212 B.C.), Aristarchus of Samos (?–230 B.C.), and Apollonius of Perga (240–? B.C.), among many others. On architecture, by contrast, we have only the treatise entitled The Ten Books on Architecture, a “second-hand source” written much later, in the Roman era, by


the architect-engineer Marcus Vitruvius Pollio (c. 80–c. 15 B.C.). Being the only literary source dealing with ancient architecture, Vitruvius’ text has been studied and analyzed at length by scholars (historians and architects). This text effectively clarifies the concerns of Greek architects while they were designing, but in order to understand how these concerns were linked to mathematics, we have to turn to the mathematical treatises of the era and the sensory methods they employed. As an example of how important these were, Plato (428–348 B.C.) refers to the role of visualization in the development of the cognitive process. For Plato, drawn figures are simply imperfect, although necessary, illustrations of a perfect idea: an intermediate step in thoughts that have to remain purely intellectual.

. . . they further make use of the visible forms and talk about them, though they are not thinking of them but of those things of which they are a likeness, pursuing their inquiry for the sake of the square as such and the diagonal as such, and not for the sake of the image of it which they draw . . . And so in all cases. (Plato 2013. Republic, VI, 510d)

Plato’s perfect idea may be found in mathematics, but its capacity to be understood relies on visualization of its properties. Thus, the connections between architecture and mathematics in the Classical Greco-Roman world were often found in particular processes or concepts. Three of these are the focus of the present chapter: the figurate representation of quantities, the visual comparison of magnitudes, and the theory of mean proportions. Architects, while defining their project and their design options, go through all three procedures themselves. They manipulate geometric figures that are figurate representations of quantities assigned by the commissioner or by other factors, such as maximum available area, maximum height, or maximum cost. They visually compare and evaluate the proportions of the designed object. The control of proportions (which modern architects address in a very loose way) was one of the main concerns in classical antiquity. In this chapter each of the three approaches is introduced in terms of its historic mathematical purpose, before the penultimate section of the chapter looks at how they began to have an impact in architecture.

The Figurate Representation of Quantities

Arithmetic

In Plato’s day, mathematicians made a clear and fundamental distinction between arithmetic and logistic. “Logistic” deals with numbered things, rather than numbers. It is the art of calculation: it comprises the ordinary operations of adding, subtracting, multiplying, and dividing. “Arithmetic” is the science that considers numbers in themselves (Heath 1981). Classical arithmetic stems from the “Theory of Numbers” of Pythagoras (c. 580–c. 495 B.C.), which claims that not only do all things possess numbers, but all things are numbers. Discussing Pythagorean principles, Aristotle (one of the earliest commentators) explains:


Fig. 1 Polygonal numbers

Fig. 2 Adding triangular numbers

. . . since it seemed clear that all other things have their whole nature modeled upon numbers, and that numbers are the ultimate things in the whole physical universe, they assumed the elements of numbers to be the elements of everything, and the whole universe to be a proportion or number. (Aristotle 1989, Metaphysics, A.5, 985 b 27-986 a 2)

Aristotle also relates how Pythagoreans represented numbers by arranging pebbles according to various geometric patterns. The disposition of the pebbles (visible points), one for each unit of the number, gave figurative shapes to the quantities. The quantities were thus grouped together according to their common fitting shapes, and therefore numbers were called “triangular,” “square,” “pentagonal,” “hexagonal,” and so on. Moreover, the numbers in each category had peculiar properties that the graphic representation itself brought to light and sufficed to demonstrate (Fig. 1). The manipulation and transformation of polygonal quantities also reveal arithmetical properties and relationships between numbers, which can be deduced simply by visually observing the graphic patterns and shapes achieved by varying the arrangement of the pebbles. For instance, the sum of two successive triangular numbers gives a square number, whereas the duplication of a single one generates an oblong number, which is a rectangular number whose sides are in superparticular proportion (n : n + 1). While square numbers are found by summing the successive odd terms of the natural series, oblong numbers are found by summing the successive even natural numbers (Fig. 2). The table of the first ten terms of the various polygonal series shows that the unit, 1, is common to them all. Conversely, the double, 2, is not a regular polygonal number and does not appear in any series. The second term of each series equals the number of sides of the generating polygon which depicts the series. Some quantities


Table 1 The first ten terms of the various polygonal series

n             1    2    3    4    5    6    7    8    9   10
Triangular    1    3    6   10   15   21   28   36   45   55
Square        1    4    9   16   25   36   49   64   81  100
Pentagonal    1    5   12   22   35   51   70   92  117  145
Hexagonal     1    6   15   28   45   66   91  120  153  190
Heptagonal    1    7   18   34   55   81  112  148  189  235
Octagonal     1    8   21   40   65   96  133  176  225  280

may also have several different shapes. For example, 55 is both the tenth triangular number and the fifth heptagonal number. The series of hexagonal numbers can be found by picking every other term of the triangular series starting from 1, which means that every hexagonal number is also a triangular number. Along a single column of the table, the difference between two successive terms is constant, and this constant is equal to the triangular number of the preceding column. Thus, the successive polygonal numbers of the same rank form an arithmetical progression whose common difference is the preceding triangular number (Table 1). Further playful manipulations and experiments led the arithmetician to develop three-dimensional representations. The game of piling up the pebbles in pyramidal shapes leads from planar to solid arithmetic. The series of pyramidal numbers derived from various polygonal bases are obtained by stacking the decreasing terms of a single polygonal series in a stable pile. Triangular pyramidal numbers are obtained by summing the successive triangular numbers: 1, 1 + 3 = 4, 1 + 3 + 6 = 10, 1 + 3 + 6 + 10 = 20, etc. Square pyramidal numbers come from the summing of the square numbers: 1, 1 + 4 = 5, 1 + 4 + 9 = 14, 1 + 4 + 9 + 16 = 30, etc. The same procedure generates the series of numbers shaped like a pentagonal, hexagonal, heptagonal, or n-gonal pyramid. The solid numbers of each category are obtained by summing the planar numbers of the relevant polygon (Fig. 3). The properties that tie pyramidal numbers together match the relationships existing between polygonal numbers, and they can be observed in the table of the first ten terms of the various pyramidal series. The unit, 1, is common to all series


Fig. 3 Triangular and pentagonal pyramidal numbers

Table 2 The first ten terms of the various pyramidal series

Levels (n)         1    2    3    4    5    6    7    8    9    10
Triangular base    1    4   10   20   35   56   84  120  165   220
Square base        1    5   14   30   55   91  140  204  285   385
Pentagonal base    1    6   18   40   75  126  196  288  405   550
Hexagonal base     1    7   22   50   95  161  252  372  525   715
Heptagonal base    1    8   26   60  115  196  308  456  645   880
Octagonal base     1    9   30   70  135  231  364  540  765  1045

and its shape belongs to any kind of pyramid. The double and the triple, 2 and 3, have no pyramidal shape. The second term of each series indicates the number of faces of the pyramidal solid that depicts the series. Some planar polygonal numbers also appear among the pyramidal numbers. For instance, 55, which is both a triangular and a heptagonal number, is also a square pyramidal number of five levels. Some numbers can take more than one pyramidal shape: 196 is both a pentagonal pyramid of seven levels and a heptagonal pyramid of six levels. 196 is also a planar square number with side 14.


From this quick overview, it is clear that some numbers, but not all of them, can switch from planar to solid and of course linear. The sum of two successive tetrahedral numbers is a square pyramidal number, and, more generally, along a single column of the table, the difference between two successive numbers is a constant equal to the tetrahedral number of the preceding column. This implies that the series formed by the successive pyramidal numbers with the same number of levels is a simple arithmetic progression (Table 2).
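The relationships read off the pebble patterns and from Tables 1 and 2 can also be checked mechanically. The following is a minimal sketch in modern notation, not a reconstruction of any ancient procedure: the closed formula in `polygonal` and the function names are assumptions introduced here for illustration.

```python
def polygonal(sides, n):
    """n-th polygonal number for a polygon with `sides` sides (3 = triangular, 4 = square, ...)."""
    # Modern closed formula (an assumption of this sketch, not found in the ancient sources).
    return ((sides - 2) * n * n - (sides - 4) * n) // 2


def pyramidal(sides, levels):
    """Pyramidal number: the first `levels` polygonal numbers stacked on one another."""
    return sum(polygonal(sides, k) for k in range(1, levels + 1))


def check_claims(limit=50):
    """Check the relationships stated in the text for ranks 2..limit."""
    for n in range(2, limit + 1):
        tri = polygonal(3, n)
        # Two successive triangular numbers make a square number.
        assert polygonal(3, n - 1) + tri == n * n
        # Doubling a triangular number gives the oblong number n(n + 1).
        assert 2 * tri == n * (n + 1)
        # Squares are sums of successive odd numbers, oblongs of successive even ones.
        assert sum(range(1, 2 * n, 2)) == n * n
        assert sum(range(2, 2 * n + 1, 2)) == n * (n + 1)
        # Hexagonal numbers are every other triangular number.
        assert polygonal(6, n) == polygonal(3, 2 * n - 1)
        # Two successive tetrahedral numbers make a square pyramidal number.
        assert pyramidal(3, n - 1) + pyramidal(3, n) == pyramidal(4, n)
        for sides in range(3, 8):
            # Passing to the next series adds the preceding triangular
            # (respectively tetrahedral) number.
            assert polygonal(sides + 1, n) - polygonal(sides, n) == polygonal(3, n - 1)
            assert pyramidal(sides + 1, n) - pyramidal(sides, n) == pyramidal(3, n - 1)
    return True


if __name__ == "__main__":
    check_claims()
    for sides in range(3, 9):
        print(sides, [polygonal(sides, n) for n in range(1, 11)])
    print(polygonal(3, 10), polygonal(7, 5), pyramidal(4, 5))  # three shapes of 55
    print(pyramidal(5, 7), pyramidal(7, 6), polygonal(4, 14))  # three shapes of 196
```

Every identity is tested only up to a finite rank, so the sketch illustrates the claims rather than proving them.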

Geometry

The Pythagorean representational system applies only to a few arithmetical quantities: those that can fit in this particular geometric bidimensional or tridimensional shaping. In contrast, geometry is a non-arithmetized discipline which uses a more flexible representational system, in which any quantity can assume either a mono-, bi-, or tridimensional shape and can therefore be modeled according to various figures. Numbers no longer belong to separate definite categories but can be transformed into whatever shape is most convenient for the purposes of specific problem-solving. They are no longer natural numbers, or integers, but “quantities” or “magnitudes”: forms that can be either larger or smaller and placed together or in opposition to one another. Classical geometry is said to have originated in ancient Egypt with the necessity of measuring land. However, ancient Greek mathematicians made a clear distinction between geometry and geodesy, similar to the division between arithmetic and logistic. “Geodesy” is the art of mensuration, not confined to land-measuring but covering generally the practical measurement of surfaces and volumes (Heath 1981). Geometry is a theoretical science. In geometry, quantities are represented either by lines, surfaces, or volumes, and arithmetical operations are graphically equivalent to drawing figures. When numbers are lines, addition means lengthening an initial line; multiplication of two numbers means drawing a rectangle, or constructing a parallelepiped when three numbers are involved. Therefore, multiplying a number by itself creates a square and multiplying it once again creates a cube.

We divided all number into two classes. The one, the numbers which can be formed by multiplying equal factors, we represented by the shape of the square and called square or equilateral numbers. . . .
The numbers between these, such as three and five and all numbers which cannot be formed by multiplying equal factors, but only by multiplying a greater by a less or a less by a greater, and are therefore always contained in unequal sides, we represented by the shape of the oblong rectangle and called oblong numbers. (Plato 1967, Theaetetus, 147e – 148ab)

Plato is not explicit when discussing volumes and thus different shapes can be created according to the kind of arithmetical operation. When three different numbers are multiplied, the resulting parallelepiped will be scalene; if the three numbers are equal, a cube will be constructed; and if two of the three numbers are


equal and different from the third, a square-base prism is generated. The shape of the solid reflects the relationship between the numbers involved in its construction. The geometric representation of numbers is unlimited and can be applied to any quantity. This is the abstract intellectual tool that allows the scientist to switch from practice to theory and to draw general theoretical conclusions from the study of single examples. In ancient geometric texts, alphabetical symbols are associated with these graphics; however, they must not be confused with the arithmetical notation system. These symbols name the objects, indicating either points such as the vertices of the figures or the quantities themselves represented by linear segments or surfaces. The written demonstrations accompanying the graphics can thus refer to the letters, stating relationships of equality or inequality between them, heralding the later algebraic equation system of calculation (Fig. 4).
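The correspondence between factors and shapes described above can be stated as a small classification. This is an illustrative sketch only: the function names are hypothetical, and the inputs are restricted to positive integers, whereas the geometric representation itself applies to any quantity.

```python
from math import isqrt


def plane_shape(n):
    """Classify a positive integer as in the Theaetetus passage:
    'square' if it is the product of two equal factors, otherwise 'oblong'."""
    root = isqrt(n)
    return "square" if root * root == n else "oblong"


def solid_shape(a, b, c):
    """Shape of the solid built on three factors a, b, c."""
    distinct = len({a, b, c})
    if distinct == 1:
        return "cube"                    # three equal factors
    if distinct == 2:
        return "square-base prism"       # two equal factors, one different
    return "scalene parallelepiped"      # three different factors


if __name__ == "__main__":
    for n in (3, 4, 5, 9):
        print(n, plane_shape(n))
    print(solid_shape(2, 2, 2), solid_shape(2, 2, 5), solid_shape(2, 3, 5))
```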

The Visual Comparison of Quantities

The graphic representation of mathematical variables using lines or other geometrical figures makes it possible to visually materialize the so-called incommensurable quantities, that is, irrational numbers, as well as natural integers. In this way the irrational numbers can be manipulated together with rational numbers in research processes. The scientific value of the graphic depiction emerges in this example in its strongest form, drawing being the only means capable of imparting concrete form to irrational numbers and thereby proving their existence and giving shape and reality to quantities which would otherwise remain “invisible.” The most ancient geometric problem regarding incommensurability is the duplication of the square and the proportion between its side and its diagonal. Socrates’ lesson to the slave about this problem, related by Plato in the Meno (82a-85b), is the quintessential illustration of the intellectual power of the drawing in a process of rational deduction using the simple means of visual perception and comparison of figurate quantities. In this example, Socrates inquires into the natural virtues of man, and in so doing he demonstrates the potential of graphic representation in the development of knowledge and science. The figure illustrating the graphic solution to the duplication of the square, a visual image or sensible object, allows anyone to become aware of the intelligible relationships between opposing quantities and to draw his or her own conclusions regarding the obvious evidence. Senses and sensorial perception are common to mankind. The capacity to observe and understand is innate and latent in anyone and does not come from a cultural privilege or a high level of education. In order to learn and progress in conscious knowledge, it is sufficient to exercise one’s natural skills. The education of the neophyte or the methodology of the scientist need only concentrate on how to observe.
Socrates, in showing the figure, does not give the conclusion, because to see the figure, as a mathematical figure, actually means knowing how to look at it, how to read it, in short how to think it. (Caveing 1996)

Fig. 4 Geometrical transformation of a given arithmetical quantity: from an irregular shape to the perfect square


It is notable that most of the propositions that Euclid includes in the 13 books of the Elements are demonstrated through graphics, whose interpretation requires the visual comparison of figurate quantities. The accompanying text to each diagram guides the learner in the sensitive reading of the scheme, in the tangible evaluation of the entities, and in the appraisal of their perceptible equalities, complementarities, or differences. A primary property of the visual comparison between figurate quantities is that it provides the only means of perceiving the value of the irrational magnitudes. These entities can be estimated only by being represented together with a known quantity, drawn in the same scale. Thus, in the realm of geometry, a quantity in itself cannot be considered “incommensurable,” in the sense of not equal to a natural integer, but it is its ratio to another quantity – representing for instance the unit – that may be incommensurable. Only ratios between quantities may be nonmeasurable and noncomputable. In book X, Euclid defines irrationality in terms of incommensurability with a given line. The concept of rationality – or irrationality – refers to the quality of a magnitude. The notion of commensurability – or incommensurability – refers to the quality of the ratio between two magnitudes. Two quantities are commensurable if they are linked by some plain relationship, the simplest one being the fact of having a ratio equal to some ratio existing between two natural integers. Ancient mathematicians discerned several kinds of commensurability. Plato already introduces two of them: commensurability of lines and commensurability of surfaces, since the commensurability of two surfaces obviously implies the existence of a particular kind of relationship between the respective sides of these surfaces. 
All the lines which form the four sides of the equilateral or square numbers we called lengths, and those which form the oblong numbers we called surds, because they are not commensurable with the others in length, but only in the areas of the planes which they have the power to form. And similarly in the case of the solids. (Plato 1967, Theaetetus, 148b)

In order to illustrate the two different kinds of commensurability mentioned here, it is sufficient to go back to the problem of the duplication of the square. The unit, 1, and the square root of 2 are two incommensurable lengths, whereas their squares, 1 and 2, are commensurable, being in plain ratio from simple to double. Therefore 1 and square root 2 are two quantities commensurable-in-square-only. Book X is by far the longest book of The Elements by Euclid: it contains 117 propositions, while the second longest, Book I, contains only 48. It is also the most difficult to understand since it is entirely dedicated to the study of the incommensurability concept in terms that are no longer familiar to modern readers. In this work Euclid sets out 13 different kinds of irrationality and commensurability, starting by providing precise definitions of the notions mentioned by Plato. Quantities may be commensurable-in-length or commensurable according to the squares that can be constructed on their sides. Magnitudes commensurable in length are those that have a common divisor (which may be either a natural integer or an irrational number). Magnitudes commensurable-in-square-only are those whose squares are both multiples of a common quantity. One type of relationship does not categorically exclude the other: the same quantities may be both commensurable in length and in square.
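The two kinds of commensurability can be illustrated in modern arithmetical terms, which is admittedly the very translation that hides the figurate character of Euclid's definitions. The sketch below assumes that each line is given by the rational area of the square built on it, so that the lines themselves have lengths √p and √q; the function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def is_rational_square(value):
    """True if a non-negative rational is the square of a rational number."""
    value = Fraction(value)
    if value < 0:
        return False
    n, d = value.numerator, value.denominator  # already in lowest terms
    return isqrt(n) ** 2 == n and isqrt(d) ** 2 == d


def compare_lines(square_p, square_q):
    """Compare two lines given by the (positive, rational) areas of their squares.

    The squares are rational, hence always commensurable with each other.
    The lines are commensurable in length exactly when the ratio of the
    squares is itself the square of a rational number.
    """
    ratio = Fraction(square_p) / Fraction(square_q)
    if is_rational_square(ratio):
        return "commensurable in length"
    return "commensurable in square only"


if __name__ == "__main__":
    print(compare_lines(1, 2))  # side and diagonal of the unit square
    print(compare_lines(2, 8))  # sqrt(2) and sqrt(8) = 2 * sqrt(2)
    print(compare_lines(1, 4))  # the lines 1 and 2
```

The second case shows two irrational lines that share a common measure (√2), matching the remark that the common divisor may itself be irrational.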


Rectangles contained by lines commensurable in square only are medial quantities (areas), and therefore the side of the square equal to such a rectangle is a medial quantity (line) too. Since a rectangle is the mean proportional between the squares on its sides, medial lines are the mean proportional between two lines commensurable in square only. Medial areas can be arithmetically interpreted as the square root of a non-square rational number, and medial lines as the fourth root of the same number. A magnitude equal to the sum of two lines commensurable in square only is a binomial line. A magnitude equal to their difference is an apotome, and so on. Among the 13 possible kinds of irrationality listed by Euclid in book X, only one is defined as being the mean proportional between two magnitudes commensurable in square only (the medial); the others are defined by adding or subtracting incommensurable quantities, either areas or lines, leading to one subclassification for medial magnitudes and another for binomials. All these nuances disappear in the realm of modern mathematics, where algebra and the decimal system offer a new approach to the study of irrationality and commensurability. Moreover, contemporary arithmetized geometry erases the true figurate shapes of the quantities studied in book X, where “medial” or “binomial” may be the characteristic of either a line or a surface, generating several further properties in each case. In order to understand the reason and the sense of Euclid’s investigations, we have to understand the ancient context of non-arithmetized geometry, in which the manipulation of shapes and figures predominates, and of which a far from negligible aspect is the study of regular polygons and their inscription in the same circle. This is the first step toward the study of the regular “Platonic” polyhedra inscribed in a single sphere, discussed in book XIII of the Elements.
The purpose is to find some qualitative as well as quantitative relationship between the sides and the diagonals of an n-gon or some qualitative definition of the ratio of the circumdiameter to a side, a diagonal, etc.
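As a worked illustration in modern notation, take the two lines 1 and √2 used earlier as the example of lines commensurable in square only. The choice of these particular values is an assumption made here for concreteness; Euclid's definitions are stated for lines in general.

```latex
\begin{align*}
\text{given lines:} \quad & a = 1,\ b = \sqrt{2}, \qquad a^{2} : b^{2} = 1 : 2 \\
\text{medial area:} \quad & a \cdot b = \sqrt{2} \\
\text{medial line:} \quad & m = \sqrt{a\,b} = \sqrt[4]{2}, \qquad a : m = m : b \\
\text{binomial:}    \quad & a + b = 1 + \sqrt{2} \\
\text{apotome:}     \quad & b - a = \sqrt{2} - 1
\end{align*}
```

The medial area is the square root of the non-square rational number 2, and the medial line is its fourth root, in agreement with the arithmetical interpretation given above.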

The Theory of Proportion and Means

It is commonly accepted that the theory of proportion and means appeared in the history of mathematics simultaneously with the concept of irrationality, the Pythagoreans’ discovery of the incommensurability of the diagonal of the square to its side, and later on, the incommensurability of the edges of two cubes whose volumes are in a ratio from simple to double. It is important to distinguish between the concept of a ratio and the concept of proportion. They refer to different kinds of mathematical notions and are not interchangeable. A ratio is the relationship (a function) linking two quantities: it may be either commensurable or not, according to the various distinctions stated by Euclid. He himself defines a ratio in these terms:

A ratio is a sort of relation in respect of size between two magnitudes of the same kind. (Euclid 1908. Book V, Definition 3)


In contrast, a proportion is the relationship (a condition) linking three or more numbers or quantities. For instance, if the ratios of two pairs of magnitudes are equal, then the three or four magnitudes are proportional. In ancient Greece the word mean applies either to a sequence of three terms in continuous proportion or to the middle term which ties together the two extremes. Ratios, proportions, and means are calculation tools both in arithmetic and in geometry. Ratios between natural integers belong of course to the realm of arithmetic, but proportions and means are calculation systems that apply to both disciplines, being the only possible scientific and computational support of the visual and sensitive comparison of incommensurable figurate quantities that are so abundant in the realm of geometry. The three different kinds of means reported by Plato in his dialogues are the arithmetic, geometric, and harmonic means. In the arithmetic mean, the second term exceeds the first by the same amount as the third exceeds the second. In other words, the sequence of numbers is formed by successively adding a constant quantity to each one, thus creating regular intervals of a constant length between the numbers, and each term is equal to half the sum of its predecessor and its follower. If a and b are the two extremes, the midterm X (the arithmetic mean) can be written as follows: X = (a + b)/2. In the geometric mean, the second term is to the first as the third is to the second: the ratio is constant. The sequence is found by consecutively multiplying the terms by the same amount. Therefore each term is the square root of the product of its predecessor and follower. The midterm of a trio (the geometric mean) can be written as follows: X = √(ab). In the harmonic mean, three consecutive terms are such that by whatever part of the first the second exceeds the first, the third exceeds the second by the same part of itself.
In modern notation, a and b being the extremes, X – the harmonic mean – is found by the equation: X = 2ab/(a + b). Several other means were added later by other mathematicians: three of them being the subcontraries of the geometric and harmonic, but none of them were as important for scientific progress, most of them being the result of purely systematic computational investigation.
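The three definitions can be checked against their formulas with a short sketch. The function names and the sample extremes 2 and 8 are illustrative assumptions; exact fractions are used for the arithmetic and harmonic means, while the geometric mean is a floating-point value because it is in general irrational.

```python
from fractions import Fraction
from math import isclose, sqrt


def arithmetic_mean(a, b):
    """X = (a + b)/2: X exceeds a by the same amount as b exceeds X."""
    return (Fraction(a) + Fraction(b)) / 2


def geometric_mean(a, b):
    """X = sqrt(ab): X is to a as b is to X."""
    return sqrt(a * b)


def harmonic_mean(a, b):
    """X = 2ab/(a + b): X exceeds a by the same part of a as b exceeds X by a part of b."""
    a, b = Fraction(a), Fraction(b)
    return 2 * a * b / (a + b)


if __name__ == "__main__":
    a, b = 2, 8
    am, gm, hm = arithmetic_mean(a, b), geometric_mean(a, b), harmonic_mean(a, b)
    print(am, gm, hm)

    # Each mean satisfies its verbal definition.
    assert am - a == b - am                    # equal differences
    assert isclose(gm / a, b / gm)             # equal ratios
    assert (hm - a) / a == (b - hm) / b        # equal parts of the extremes
    # The product of the arithmetic and harmonic means equals the product of the extremes.
    assert am * hm == a * b
```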

Musical Proportions

The most ancient discovery to use the theory of proportions is the definition of musical harmony proposed by the Pythagoreans. The sensorial acuity behind their scientific research was, in this case, no longer visual but auditory perception. Faithful to the basic concept found in the Theory of Numbers, which states that all things are numbers, the question for Pythagoreans was to find a way to numerically quantify the sounds that musicians play on their flutes and lyres. Thus, which numbers are capable of describing musical notes? The desire to be able to compose such notes in a thoughtful and melodic sequence suggests a reflection on their intrinsic properties and reciprocal relations.


Sound is the sensible effect of a physical phenomenon: vibrations that travel through the air. It is a function of the frequency of that vibration, which in turn depends on the length of the string of the lyre plucked by a musician or the length of the flute through which air is blown. If the longest string of a tetrachord, whose length is double that of the shortest one, produces a sound that is an octave lower than that of the shortest one, of what length must the two middle strings be in order to produce the intermediate notes, being the middle sounds that the ear can perceive? How can intervals between 1 and 2 be defined, 1 and 2 being two successive numbers? Plato reports the procedure for solving this by having Socrates speak words that sound slightly ironic.

. . . they talk of something they call minims and, laying their ears alongside, as if trying to catch a voice from next door, some affirm that they can hear a note between and that this is the least interval and the unit of the measurement, while others insist that the strings now render identical sounds, both preferring their ears to their minds. (Plato 1935, Republic, VII, 531a-c)

The Pythagoreans had previously demonstrated that there is no rational geometric mean between two numbers in superparticular ratio, that is, between two consecutive integers (n, n + 1). Therefore any geometric answer will produce irrational quantities. The interval of the octave has to be “filled” with the two other means: arithmetic and harmonic. However, in order to express the four magnitudes as whole numbers, it is necessary to give the extremes of the octave the values of 6 and 12, rather than 1 and 2. Thus the arithmetic mean is 9 and the harmonic mean is 8. This formulation generates a sequence of natural integers, 6, 8, 9, and 12, and the interval of the octave – or diapason – has been filled. The lengths of the two middle chords of the lyre have also been determined, and they are commensurable with the shortest and longest ones. Several intermediate intervals have also been created. The interval between 6 and 9, equal to the interval between 8 and 12 (whose size is 1 + 1/2), is the fifth – diapente – and its extremes are in a ratio of 2:3. The intervals 6–8 and 9–12, whose size is 1 + 1/3, are the fourths – or diatessaron – and their extremes are in a ratio of 3:4. The last interval, between 8 and 9, whose size is 1 + 1/8, is the tone. The fifth is equal to three tones and a leimma; the fourth is two tones and a leimma. The Pythagorean quantification of the leimma is 256/243. The musical intervals and their subdivisions appear clearly on the piano keyboard, which shows the repetition of the successive octaves, and the various notes of each, better than any other musical instrument (Fig. 5). In the dialogue Timaeus, Plato applies the Pythagorean theory of musical proportions to the explanation of the creation of the soul of the world, while the creation of the body of the world was made possible thanks to the geometrical mean. The mathematical process is both the guarantee and the explanation of the beauty of the divine composition.
In Ancient Greece, mathematics, metaphysics, philosophy, and artistic creation were linked together by a common sense of awe about the beauty of the universe and the divine creation and by the desire to understand, explain, quantify, and imitate divine beauty.
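The arithmetic behind the filling of the octave can be checked with a short sketch. The Python snippet below is a modern illustration, not a reconstruction of any Pythagorean procedure; the function and variable names are assumptions chosen for readability, and exact fractions are used so that no rounding intervenes.

```python
from fractions import Fraction


def arithmetic_mean(a, b):
    """Arithmetic mean of the two extremes."""
    return (a + b) / 2


def harmonic_mean(a, b):
    """Harmonic mean of the two extremes, X = 2ab / (a + b)."""
    return 2 * a * b / (a + b)


# Extremes of the octave, taken as 6 and 12 so that both means are whole numbers.
low, high = Fraction(6), Fraction(12)
harmonic = harmonic_mean(low, high)        # 8
arithmetic = arithmetic_mean(low, high)    # 9

# Intervals expressed as ratios between the four terms 6, 8, 9, 12.
octave = high / low                        # 2
fifth = arithmetic / low                   # 3/2, also equal to high / harmonic
fourth = harmonic / low                    # 4/3, also equal to high / arithmetic
tone = arithmetic / harmonic               # 9/8

# The small interval left over when two tones are removed from a fourth
# (the value the text gives as 256/243).
leimma = fourth / tone**2

print(harmonic, arithmetic, fifth, fourth, tone, leimma)
```

Working with ratios rather than lengths makes the composition of intervals a multiplication: three tones and the small residual interval multiply to the fifth, and two tones and the same residual multiply to the fourth.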


S. Duvernoy

Fig. 5 The Pythagorean musical proportions

The Duplication of the Cube With parallels to the solution developed for musical intervals, the problem of the duplication of the cube was solved by applying the system of proportion and means. The duplication of the cube, the trisection of any angle, and the squaring of the circle are the three classic problems of ancient mathematics that triggered scientific research for centuries and for which various solutions were put forward. All three seek a precise quantification of an irrational quantity and its commensurability with a known rational magnitude. It is now well known that the squaring of the circle has no solution. The trisection of any angle was the easiest to solve. The duplication of the cube can be solved in many ways, some more complicated and others less. Hippocrates of Chios (470–410 B.C.) was the first to take an initial step toward the solution of the duplication of the cube. We are told by Proclus Diadochus (412–485 A.D.) that he “reduced” the problem to the need to find two mean proportionals, in continued proportion, between two given straight lines. In modern mathematics the problem reduces to finding an approximate value for the cube root of 2. But in Hippocrates’s time, when irrational quantities only had a graphic shape appearing on a geometric figure, the arithmetical problem of finding two geometric means between two numbers, notably (once again) between 1 and 2 (where ∛2 lies), becomes a drawing problem involving finding two mean proportionals between two straight lines. We may only guess what kind of reasoning Hippocrates pursued in order to reach his conclusion. We may suppose that he extended the methods of calculation typical of linear and planar geometry to solid geometry by simple analogy. If the geometric mean of two quantities is the square root, that is, a line equal to the side of the square whose area equals the product of the two extremes, then a cube root too is to be found in a geometric progression of linear quantities. Proposition 17 in book VI of the Elements shows an exact two-dimensional counterpart to the duplication of the cube as “reduced” by Hippocrates a century and a half earlier. His assertion, which was not yet a solution, had a massive influence on the later mathematicians who continued to work on the problem, taking it as a starting point. “Reducing” a problem does not mean solving it; it simply means making it conform to a familiar pattern of queries. Thus the work of Hippocrates is a precious indication that the theory of proportion and means was already largely in use as a method for problem-solving in his day, and he only expanded it from the realm of planar geometry to the field of solid geometry (Fig. 6).

Fig. 6 The geometrical solution to the doubling of the cube attributed to Apollonius
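In modern terms, the reduction to two mean proportionals can be illustrated numerically. If x and y are the two means between a and b, so that a : x = x : y = y : b, then x² = ay and y² = xb, from which x³ = a²b; with b = 2a the cube on x has twice the volume of the cube on a. The following Python sketch is a modern illustration only: the function name is hypothetical, and it relies on floating-point approximations that no ancient geometer had at hand.

```python
import math


def two_mean_proportionals(a, b):
    """Return (x, y) such that a : x = x : y = y : b, for positive a and b.

    From x*x = a*y and y*y = x*b it follows that x**3 = a*a*b and y**3 = a*b*b.
    """
    x = (a * a * b) ** (1.0 / 3.0)
    y = (a * b * b) ** (1.0 / 3.0)
    return x, y


# Edge of the original cube (an arbitrary unit length, chosen for illustration).
edge = 1.0
x, y = two_mean_proportionals(edge, 2 * edge)

# The three ratios of the continued proportion agree up to rounding error.
print(edge / x, x / y, y / (2 * edge))

# The cube built on the first mean has twice the volume of the original cube.
print(x ** 3, 2 * edge ** 3)
```

The first mean is the edge of the doubled cube; the second mean is not needed for the construction itself but is what makes the proportion continuous.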

Art and Architecture While mathematicians approached their work with an artistic sensibility that pushed them to unveil the beauty inherent in the laws of nature, artists and architects attempted to identify a mathematical support for their art. The sculptor Polycleitus (mid-fifth century B.C., roughly contemporary with Hippocrates of Chios) is famous for his accurate study of the proportions of the human body, that is, the commensurability of the various measures of the individual parts of the body with one another and with the whole. His studies resulted in both a theoretical treatise, the “Canon,” and a model statue, the “Doryphoros.” The original statue is lost, but some Roman copies have been preserved. The treatise is unfortunately lost too, but some quotations are found in later texts, including the writings of the Greek physician Galen (129–c.200 A.D.), a scientist obviously interested in the secrets of human anatomy. Modern scholars have attempted to reconstruct what was written in the “Canon.” It seems that Polycleitus unveiled a proportional rule, starting from the lengths of the finger phalanges, the ratio of the finger length to the hand itself, the hand to the forearm, and so on, showing that all those dimensions are in continuous geometric proportion. He thus established a mathematical basis on which artists could rely to control the making of well-proportioned statues. This mathematical analysis of the beauty of the human body had a strong and long-lasting impact on artists and architects since it tended to prove that divine beauty (Man was created by God) was indeed based on proportional rules. Another ancient claim for the mathematical rules of the visual arts is reported by Pliny the Elder (23–79 A.D.) in his Naturalis Historia, Book 35. Pliny tells us about the Greek art school which was founded in Sicyon by the painter Eupompus (fourth century B.C.). The school flourished under the guidance of Eupompus’s successor Pamphilus (fourth century B.C.), who is described by Pliny as the first painter with a theoretical background in arithmetic and geometry. Pamphilus claimed that art could not attain perfection without the support of mathematical knowledge. Consequently he elevated the visual arts to the level of the liberal arts, to such a degree that free-born children could be given drawing lessons.
As in painting and sculpture, the Greek architectural search for beauty and harmonious visual effect expresses itself through numbers and arithmetical ratios. Numbers, ratios, and mean proportions are borrowed from, and shared with, other scientific or artistic disciplines where they have proved to be successful, including anatomy and music. Vitruvius provides an extensive literary source about this research on numbers and ratios, but more significant than the results he reports is the methodology of design that is perceptible in his writing, though veiled by the compilation of lists of rules. Speaking about temple design, the most important typology of Greek architecture, he writes:

The design of a temple depends on symmetry, the principles of which must be most carefully observed by the architect. They are due to proportion, in Greek ἀναλογία. Proportion is a correspondence among the measures of the members of an entire work, and on the whole to a certain part selected as standard. Without symmetry and proportion there can be no principles in the design of any temple; that is, if there is no precise relation between its members, as in the case of those of a well shaped man. (Vitruvius 1914, III, 1)

For Vitruvius, the concept of “symmetry” is not the same as its modern mathematical meaning. In Greek and Roman times, the architectural concept of “symmetry” is the equivalent of the mathematical concept of “commensurability.” All dimensions of a building must be commensurable with one another. Architects must learn from Polycleitus, who unveiled the “symmetry” of the human body – God’s creation – and apply the same concept in their designs in order to try to attain perfect beauty.

Conclusion Relationships between architecture and mathematics are too often considered to be founded on a hierarchy and chronology which put mathematics first and architecture second. Such a hierarchy suggests that mathematics existed before art and architecture, and that the former always provided the latter with a solid scientific background, a sort of database of shapes, geometric figures, numbers, and proportional systems, from which designers could pick at any given time the solution that best answered their needs. In a time in which borders between sciences, philosophy, and arts were loose and all fields of knowledge and culture were intertwined, mathematics grew simultaneously with the arts, acting as a catalyst as well as a recipient of progress and cultural evolution. The practical aspects and problems related to architectural design and building operations often drove the development and progress of theoretical research and surely contributed to the arithmetization of geometry. As often happens in Greek mythology, the classic problem of the duplication of the cube is coupled to a legend, the famous “legend of Delos,” and is therefore often referred to as the “Delian problem.” The legend of Delos is the most ancient and best historical example of an interactive relationship between architecture and mathematics. The legend says that the people of Delos, struck by a severe plague, asked the oracle of Apollo how to calm the Gods’ wrath. The answer was that they only had to build a new altar to Apollo, twice as big as the existing one, which was of a cubic form. So the locals immediately obeyed the divine wish and built a new altar, whose edges were twice as long as those of the previous one . . . but the plague did not stop. In fact, by doing so, they had multiplied the volume of the altar by eight instead of two.
The legend of Delos is reported in every essay on the history of ancient mathematics, with some variation from book to book and with more or less detail, but the essence of the problem is always the same. Tradition and mythology associate this famous classic mathematical problem with an architectural problem. The goal is to build a votive monument of a given shape (cubic) and dimension. In order to determine the dimension, architects turn to mathematicians, who are unable to give a precise numerical value. The arithmetic of the time still lacked some fundamental answers to ordinary and practical problems in the field of construction. With that single and basic query, architectural design promoted efforts in theoretical mathematical research which would last several centuries. Whether or not the legend is true, it illustrates very explicitly the quality of the relationship between mathematics and architecture in ancient Greece, and more generally between theory and practice, which are not always linked by a mere cause-effect sequence but often act on each other through reciprocal influence and stimulation.


Cross-References  Classical Greek and Roman Architecture: Examples and Typologies

References
Aristotle (1989) Metaphysics (trans: Tredennick H). Harvard University Press, Cambridge
Caveing M (1996) Platon et les mathématiques. In: Barbin E, Caveing M (eds) Les philosophes et les mathématiques. Ellipses, Paris
Euclid (1908) The elements (ed and trans: Heath Sir TL). C.U.P., Cambridge
Heath ST (1981) A history of Greek mathematics. Dover, New York
Plato (1935) The Republic: books 6–10 (trans: Shorey P). Loeb Classical Library 276. Harvard University Press, Cambridge, MA
Plato (1967) Theaetetus, Sophist (trans: North Fowler H). Loeb Classical Library. William Heinemann Ltd, London
Plato (2013) Republic, Volume I: Books 1–5 (trans: Emlyn-Jones C, Preddy W). Loeb Classical Library 237. Harvard University Press, Cambridge
Vitruvius (1914) The Ten Books on Architecture (trans: Hicky Morgan M). Harvard University Press, Cambridge, MA

Classical Greek and Roman Architecture: Examples and Typologies

42

Sylvie Duvernoy

Contents
Introduction
Vitruvius
Symmetry: Numbers and Ratios in Greek Temples
Ionic Temples
Doric Temples
Arithmetization of Geometry
Roman Innovation: Amphitheaters
Conclusion
Cross-References
References

Abstract Focusing on Vitruvius’ De Architectura Libri Decem, the oldest extant architectural treatise available today, this chapter traces the ways in which mathematical concepts were embedded in the architectural design process in general and more particularly in the creation of Doric, Ionic, and Corinthian temples. In essence, Vitruvius provides a way of understanding how mathematics was used by the architects of the Classical Greek and Roman worlds to both solve practical problems and to create buildings which conformed to the highest aesthetic aspirations of the era. The study of the later typology of Roman amphitheaters provides a clear example of the changing role of geometry in architectural planning, as it shows innovative patterns with respect to Greek tradition and close connections between progress in mathematics and modernity in architecture.

S. Duvernoy
Politecnico di Milano, Milan, Italy
© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_60


Keywords Vitruvius · Commensurability · Symmetry · Modularity · Arithmetization of geometry · Greek temples · Archimedes · Apollonius · Roman amphitheaters

Introduction Architects rely on both geometry and arithmetic while designing. The geometric shapes that they create have to be quantified using numbers, in order to be constructed. This is still true today, as the various stages in the architectural design process have not changed much in time, but the concern for numbers has changed. In the Greco-Roman world, numbers were used to define the proportions of the building and to guarantee the beauty of the designed object. Symmetry, the aesthetic goal sought by classical Greek and Roman architects, was achieved through a thoughtful choice of architectural dimensions, all of which had to be commensurable. Such dimensions referred to a particular module or length and the choice of the true length of this basic module gave its scale to the entire building. However, irrational magnitudes resulted in incommensurable outcomes, raising questions for both mathematicians and architects about how to approach such issues in theory and in practice. As a result of this need for practical developments, architecture was among the driving forces behind the arithmetization of geometry and the task of assigning approximate values to irrational quantities. In this chapter, examples of Greek and Roman architectural connections to mathematics are introduced. In terms of sources, our knowledge of ancient Greek architecture relies mainly on surviving sacred buildings, that is, on the remains of temples. Conversely, remains of Roman buildings belong to many different architectural typologies, some of which show their origins in, and respect for, their Greek antecedents. The primary written source for understanding the theoretical, and mathematical, basis for classical Greek and Roman architecture is De Architectura Libri Decem by Vitruvius (c. 80–70 B.C. – after c. 15 B.C.). 
This work is the starting point for the present chapter, and thereafter sections examine the classical concept of “symmetry,” temple architecture, and amphitheaters, the latter two building types being ones where architecture and mathematics are especially closely connected.

Vitruvius De Architectura Libri Decem (Ten Books on Architecture) is the only architectural treatise from ancient times that is still extant. Written in Roman times, in the late first century BC, and dutifully reporting the Greek cultural heritage, the book is of paramount importance for understanding the relationships between architecture and mathematics that existed in ancient Greece and in the Roman era. The author of De Architectura Libri Decem is Marcus Vitruvius Pollio, a former engineer and officer in Julius Caesar’s army, and the book is dedicated to the Emperor Augustus. Vitruvius composed his treatise in ten books, of which the first seven are specifically dedicated to architecture, while the eighth is about water, the ninth is about clocks and astronomy, and the tenth is about mechanics and machinery. The tone and content of the ten books show that Vitruvius was far from being a brilliant mathematician himself. He lived in the first century before Christ, after the end of the period of glory of Greek mathematics – which began with Thales of Miletus (625–546 BC) and ended with the deaths of Archimedes of Syracuse (c.287–c.212 BC) and Apollonius of Perga (c.240–c.190 BC) – and before the rebirth of mathematical research and the writing of the great treatises such as the ones by Nicomachus of Gerasa (fl. 100 AD). There were no famous mathematicians who were contemporaries of Vitruvius. In fact, he lived during the only historic period, from antiquity to the present day, in which there was an interruption of progress in mathematics (Williams and Duvernoy 2014). In Book I Chapter 1, in the first paragraphs dedicated to the formation and education of the architect, Vitruvius addresses the problem of the relationship between architecture and the sciences and consequently between architecture and mathematics. Architecture, he argues, has two aspects, one purely theoretical and the other mainly practical. Because of this, the architect has to be skilled in both aspects. For Vitruvius, designing and building without the support of a solid cultural background in theoretical sciences will not lead to success and fame, while the use of theory, without any practical application of the design principles in any construction, will only result in a shadow of architecture. Yet, the duality of the architect’s skill set, which requires a preliminary education in both fields of theoretical and applied sciences, is the very fact that prevents the architect from any possibility of becoming a great scientist.
Vitruvius admits (and seems to regret) that the education of the architect is aimed more at providing a general familiarity with the theoretical sciences than at a mastery of them, “so that he is not at a loss when it is necessary to judge and test any work done” (Vitruvius 2009, I,1). If architects possessed exhaustive knowledge of other arts, they would no longer be architects:

. . . those individuals on whom nature has bestowed so much skill, acumen, retentiveness that they can be thoroughly familiar with geometry, astronomy, music and other studies, go beyond the duty of an architect and are to be regarded as mathematicians. (Vitruvius 2009, I,1)

Thus, according to Vitruvius, the fundamental difference between the architect and the “mathematician” is the latter’s freedom and capacity to deepen their level of knowledge in one or more scientific fields. Vitruvius uses the term “mathematician” in a broad sense, as a synonym of “scientist,” “theorist,” or perhaps “expert.” For this reason the term “mathematici” that appears in the original text has been interpreted in various ways by the many translators and commentators and has not always been translated in a way that refers specifically to the science of mathematics (Vitruvius 1990). In Vitruvius’s words there is a sense of admiration and even a touch of envy for those scientists who, not restricted by the numerous practical necessities of their trade, are free to undertake theoretical studies in their fields, eventually contributing to the increase and growth of knowledge by means of their discoveries and inventions.


Vitruvius makes two types of references to individual mathematicians. First, he sometimes provides a list of mathematicians in passing, as examples of scientists of the highest order but without mentioning any of their achievements. Second, he gives credit to individual mathematicians for particular discoveries. Thus, he cites Plato for the doubling of the square, Pythagoras for his theorem, Archytas for having resolved the problem of doubling the cube by means of half-cylinders, and Eratosthenes for the resolution of the same problem by means of the mesolabium. It is surprising to note that at no stage does he cite Euclid, nor mention the Elements, the theoretical basis of classical mathematics. This tends to suggest that Vitruvius was taught or learned geometry directly from texts which were focused on applications rather than from theoretical sources. Indeed, Vitruvius appears to be scarcely capable of establishing connections between his scant theoretical knowledge (mainly mentioned in the prefaces to the various books) and the practical applications that are discussed in the course of the treatise. Vitruvius assigns to geometry and arithmetic duties that cover the two aspects of the professional work: theoretical and practical. From the practical point of view, geometry teaches a student how to draw with the compass and the square and how to draw right angles or axes on the horizontal plane and levels on the vertical plane, while arithmetic makes it possible to calculate the costs of the construction process. From the theoretical point of view, arithmetic also makes it possible to control the proportions of the designed building by computing the ratios between the dimensions, and geometry and arithmetic together allow architects to solve the “difficult questions of symmetry” (Vitruvius 2009, I,1). 
In the De Architectura, references to geometry for the resolution of problems inherent in architectural design remain strictly within the context of plane geometry without ever entering the field of solid geometry. Furthermore, the geometric figures and numeric ratios suggested for the resolution of the design problems largely refer to simple lengths (one-dimensional elements) or to polygonal shapes, preferably quadrilaterals (two-dimensional elements), to the exclusion of complex compositions. The geometry of the circle is examined only in the case of the theater (Book V, 6), since the previous paragraph dedicated to round temples (Book IV, 8) is extremely brief and undeveloped from the geometrical point of view. However, there are no attempts whatsoever to quantify numerically the ratios between the diameter and circumference of the circle or between the diameter and the sides of inscribed polygons. It would be imprudent to draw general conclusions on the quality and quantity of mathematical knowledge of Roman architects based only on a reading of the De Architectura. Not having other similar, contemporary treatises, in order to compare and contrast them, we cannot on the basis of a single exemplar elaborate an absolute rule. Many recent studies by architectural historians have uncovered evidence of the reasoned use of contemporary mathematical and geometrical notions in the design of Roman monuments. Thus, the effort and merit of Vitruvius in conceptualizing the mental and deductive processes connected to architectural design resides largely in their literary transcription rather than in their geometric theorizing.


While his knowledge of the theoretical treatises about mathematics may have been limited, Vitruvius is more precise about his architectural literary sources (all of which are lost to us). In the preface of Book VII, he compiles a list of texts written by Greek architects that presumably helped him to compose his own treatise and which cover a period of time of about 400 years. These “books” could be of two different types: either general treatises on architectural design or texts focusing on single monuments. It seems that some “books” discussed proportions and symmetry in general terms, while others described in a precise way the ideation process related to a specific work. When the authors of the books were the architects of the buildings in question, they typically wanted to put in writing the various design options that they applied in their own oeuvre.

Silenus published a book on the proportions of Doric structures, Theodorus, on the Doric temple of Juno which is in Samos; Chersiphron and Metagenes, on the Ionic temple at Ephesus which is Diana’s, Pytheos, on the Ionic fane of Minerva which is at Priene; Ictinus and Carpion, on the Doric temple of Minerva which is on the acropolis of Athens; Theodorus the Phocian, on the Round Building which is at Delphi; Philo on the proportions of temples, and on the naval arsenal which was at the port of Piraeus; Hermogenes, on the Ionic temple of Diana which is at Magnesia, a pseudodipteral, and on that of Father Bacchus at Teos, a monopteral; Arcesius, on the Corinthian proportions, and on the Ionic temple of Aesculapius at Tralles, which it is said that he built with his own hands; on the Mausoleum, Satyrus and Pytheos who were favoured with the greatest and highest fortune. (Vitruvius 2009, Preface to Book VII)

If Vitruvius wanted to cite these texts about single monuments chronologically, in the order of their construction from the earliest to the most recent, he misplaced the works by Pytheos in his list. Pytheos was a contemporary of Philo. The arsenal in Piraeus and the temple of Athena in Priene were built in the same years, sometime after the construction of the Mausoleum. The dates of the temples built by Hermogenes are more difficult to establish since no other text besides the De Architectura provides additional information about him. However, the temple of Magnesia is probably to be placed at about 150 BC. Similarly, it is not possible to determine a date for the temple of Aesculapius at Tralles built by Arcesius. Vitruvius’s list shows that general treatises and descriptions of specific monuments could be written by the same author. For example, Philo, the architect of the naval arsenal in Piraeus, a civic building that he described in a dedicated text, is said to have also written a general treatise on temples. Arcesius wrote a book on the Corinthian order, but built an Ionic temple, which he also described in writing. It seems clear from these examples that practice and theory were not separate activities for architects in ancient Greece. On the contrary, it appears that writing and building were two aspects of the work of a professional that mutually benefited each other. Vitruvius, influenced by the Greek approach to architecture, explains in the very first paragraphs of his treatise the complementarity between the two facets of professional work.

. . . architects who have aimed at acquiring manual skill without scholarship have never been able to reach a position of authority to correspond to their pains, while those who relied only upon theories and scholarship were obviously hunting the shadow, not the substance. But those who have a thorough knowledge of both, like men armed at all points, have the sooner attained their object and carried authority with them. (Vitruvius 2009, I,1)

A series of names completes the list of authors provided by Vitruvius who have written “on the laws of symmetry.” These authors include “Nexaris, Theocydes, Demophilus, Pollis, Leonidas, Silanion, Melampus, Sarnacus, and Euphranor” (Preface to Book VII). Greek literature about theories of architectural design was obviously abundant. Information about the content of this literature is partially revealed in the opening paragraphs of the first Book of the De Architectura. In all matters, but particularly in architecture, there are these two points: — the thing signified, and that which gives it its significance. That which is signified is the subject of which we may be speaking; and that which gives significance is a demonstration on scientific principles. (Vitruvius 2009, I,1)

It is clear from this quote that in classical Greece, reflection upon the architectural design process was intensive and that the codes for the definition of architectural form shifted from pure creative manipulation to intellectual investigation, focusing on the relationships between architecture and other theoretical scientific disciplines, mainly the liberal arts. The Greeks were the first that we know of to commit to writing the theoretical concepts of architectural design. Since the echoes of those texts occur most frequently in temple descriptions (the naval arsenal in Piraeus being an exception), we may suppose that the evolution of styles and progress in temple design, and the search for an archetypal “canon,” acted as a driver for the development of architectural theory.

Symmetry: Numbers and Ratios in Greek Temples Vitruvius dedicates two full books (III and IV) of his treatise to the design and layout of temples, thus showing how important this architectural typology was in the contemporary culture. In Book III he deals with the “temples of the immortal Gods describing and explaining them in the proper manner,” although only Ionic temples are considered: the restriction is stated at the end of the book, as if to correct the lack of initial precision. Book IV, on the other hand, is concerned with Doric and Corinthian temples but also includes Etruscan and circular temples and altars. Some scholars assert that Book III, because of its consistency and unique topic, is largely dependent on a single Greek source, the lost books by Hermogenes. In contrast, Book IV is more broadly based and reports pieces from several sources (Gros 1989; Tomlinson 1989). The fact that Vitruvius discusses the Ionic temples first and the two other styles after, regardless to his own chronological order that puts the Doric as the earliest and the Corinthian as the latest, surely adds credit to this interpretation. A major concern Vitruvius identifies in architectural design is solving the “difficult questions of symmetry.” In Chapter 2 of Book I, Vitruvius lists, in the following order, the six theoretical rules of architectural design: ordinatio, dispositio, eurhythmy, symmetry, decor, and distributio. Many commentators have

42 Classical Greek and Roman Architecture: Examples and Typologies


already pointed out the awkwardness with which Vitruvius mixes Latin and Greek terms in this list, underlining the fact that he probably relies on at least two different written sources and that his writing may be a series of quotations rather than a synthesis of his bibliographic sources (Frézouls 1987). Recent scholarship suggests an alternative interpretation, explaining how these concepts might be linked together:

An interpretation attracting growing support detects a bipartite split between processes of design and the attribute they produce. Ordinatio might be the process of calculation that gives rise to symmetria; dispositio might be the process of composition which gives rise to eurythmia; distributio might be the process of evaluation that gives rise to decor. (Wilson Jones 2000: 40)

Indeed, ordinatio and symmetry are closely linked to one another. Ordinatio, Vitruvius says, is based on the choice of a special magnitude that the Greeks call “posotes,” which determines all the proportional ratios of the various dimensions of an architectural object. Thus, notwithstanding some inelegance in his explanation, Vitruvius conveys the basic ancient principles of architectural design, which are based on the concept of symmetry. The modern meaning of the word “symmetry” has shifted to a mathematical concept based on transformation by mirror reflection with respect to a plane, a line, or a point. But for Greek and Roman design theory, “symmetry” was a precise mathematical concept and an aesthetic goal, which a mirrored disposition about a central axis could help to fulfill but which it did not exhaust. Symmetry is achieved through a thoughtful choice of the building’s dimensions, which must all be commensurable. This commensurability may be achieved in different ways. The easiest option is to choose a basic unit to which all measures refer. Commensurability of all the various dimensions of a single monument, through the multiplication or subdivision of a basic length acting as the “module” of the building, guarantees the symmetry of the architecture and therefore its aesthetic quality. Chapter 2 of Book I, where the concept of “modularity” is introduced for the first time, explains which part has to be selected as the basic standard dimension: “in the case of temples, symmetry may be calculated from the thickness of a column, from a triglyph, or even from a module.” Obviously, the width of the triglyph can be the starting length for the design of Doric temples only, while for the proportions of Ionic and Corinthian temples, the thickness of the column is used.
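The notion of a module as a common measure can be illustrated numerically. The following is only a minimal sketch: the dimensions are hypothetical values in dactyls chosen for illustration (they are not taken from any surveyed temple), and the function names `common_module` and `in_modules` are likewise assumptions introduced here. The sketch recovers the largest length of which all the given dimensions are whole multiples and then restates each dimension as a count of modules.

```python
from functools import reduce
from math import gcd


def common_module(dimensions):
    """Largest whole-number length that divides every dimension exactly."""
    return reduce(gcd, dimensions)


def in_modules(dimensions, module):
    """Express each dimension as a whole number of modules."""
    return [d // module for d in dimensions]


# Hypothetical dimensions in dactyls (illustrative values only,
# not measurements of a real building).
dimensions_dactyls = [30, 60, 150, 390]

module = common_module(dimensions_dactyls)
multiples = in_modules(dimensions_dactyls, module)
print("module:", module, "dactyls; dimensions in modules:", multiples)
```

Once the dimensions are restated this way, the whole design reduces to a short list of small integers, which is the mnemonic advantage the text attributes to modular design.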

Ionic Temples

In Book III, rectangular Ionic temples are classified by Vitruvius in two different ways: according to their importance and monumentality in relation to the number of columns on the façades and, more typically, according to the ratio symmetriarum applied in the design. His architectural classification is therefore based on numbers. These numbers may quantify either tangible elements (quantity of columns) or


S. Duvernoy

Table 1 Symmetry: classification of Ionic temples according to the ratio between column diameter (D) and intercolumnation

Table 2 Symmetry: Ionic eustyle temples ratio symmetriarum according to a “module” (M)

intangible qualities, like the ratio between two repetitive distances that shape a façade. In the second classification, the ratio of the intercolumnar width to the diameter of the columns determines the style of the monument and the category to which it belongs. The height of the columns is fixed for each style (Table 1). In a process of growing abstraction, Vitruvius then determines and classifies the dimensional qualities of eustyle Ionic temples (the most beautiful) in terms of commensurability to a “module” and no longer to a basic element of the building such as the diameter of the column (Table 2). The module is – however – equal to the diameter of the column, but the fact of referring to a “module” instead of referring to a “column diameter” is a further step in the definition of theoretical concepts. The module could be any conventional length, and the concept of modularity can


be applied to the design of any architectural typology. The modularity prescriptions for eustyle Ionic temples are said to be taken from the book by Hermogenes, the architect for whom Vitruvius seems to have had a special admiration. Hermogenes’ books probably represented the most up-to-date literature in architectural design that was available to Vitruvius.

Doric Temples

In Book IV, in a manner similar to the study reported in Book III, Vitruvius lists the ratios that must be applied in the design of Doric temples. The proportions are here all given in terms of a “module” that corresponds to both half the diameter of the column and the width of the triglyph. Therefore, there is no contradiction, nor are there several different schemes depending on the choice of the basic unit of design. The three options mentioned in Book I, Chapter 2 (the thickness of the column, the triglyph, or the module) turn out to be equivalent, since they are linked by a clear relationship: the “module” is equal to half the diameter of the column, which in turn is equal to the width of the triglyph (Table 3).

From the prescriptions listed in Books III and IV of the De Architectura, it appears that the relationship between architecture and mathematics, as far as the design of rectangular temples is concerned, is very basic and simple. No design rule deals with surfaces or volumes. Only lengths, and ratios between lengths, are called for. No geometrical concept is involved: arithmetic is a necessary and sufficient support for the design. The project of a temple is quantified by a series of numbers, the definition of which is left to the discretion of the designer. These numerical specifications, which described the form of the projected building, had a dual purpose. First, they simplified the construction process by transmitting the instructions to the builders in a simple mnemonic way. Second, in addition to their practical utility, these numbers expressed not only quantities but also aesthetic qualities. The definition of numbers is a design methodology: the ratio symmetriarum, which derives from and produces the commodulatio, is

Table 3 Symmetry of Doric temples according to a “module” (M)


both a guideline to be taught to designers and the result that they must achieve in order to create beautiful monuments.

Measured surveys and archeological studies conducted on many extant temple remains show that the numerical rules given by Vitruvius were not unique. Analyses reveal many variations on the theme of temple design. The common link between all the monuments, and the precious information contained in Vitruvius’s treatise, is not the series of ratios but the design methodology, that is, the theoretical concepts that Greek architects had established and to which they all adhered, without restricting their creativity. Strict compliance with the concepts of modularity and symmetry was never an impediment to improvement or originality (Figs. 1 and 2).

Arithmetization of Geometry

However, the processes of architectural design and construction go beyond the realm of linear dimensions and their modularity. Building involves many kinds of computations, including areas, volumes, costs, work force, and time, to name just a few. As soon as areas and volumes are considered, the commensurability of dimensions is at risk. Each geometric shape produces and contains in itself dimensions (lines, areas) which are not commensurable in the first sense of the word. That is, they are not multiples of the same unit. The incommensurability of the side and the diagonal of the square had been discovered many years previously by the Pythagoreans and the Egyptians before them. Other similar incommensurabilities can be found in most of the basic geometrical figures, including the equilateral triangle, the half equilateral triangle (which Plato describes as the most beautiful of all triangles), and, of course, the circle. In fact, the height of the equilateral triangle of side equal to 1 is √3/2; its area is √3/4: both irrational magnitudes. Any computation of circular perimeter or area involves the magnitude named “π,” which is also irrational. All those magnitudes can be geometrically manipulated thanks to the use of the ruler and the compass: they may be exactly doubled, tripled, halved, and so on. But they need to be coupled with numbers when the figure is given a size and thus a dimension, once its shape underlies the geometrical diagram of an architectural object. An approximate value for the diagonal of the square had been sought by the ancient Egyptians, and the value most commonly adopted was 7/5 (or 14/10). The research on the squaring of the circle led Archimedes to state that the numerical value of π lay somewhere between 3 + 10/71 and 3 + 1/7. From then on, the relatively easy-to-handle value of 3 + 1/7 (or 22/7) was extensively adopted for all computational purposes related to architecture and other practical fields. Also, for triangles, hexagons, and other polygons, 6/7 was adopted as a satisfactory numerical approximation of √3/2. Furthermore, it seems that the Greeks had devised a numerical value that made it possible to approximate the solution of the doubling of the cube. In Book II, Chapter 3, Vitruvius lists two kinds of Greek bricks, the tetradoron and the pentadoron, both cubic, measuring, respectively, 4 and 5 palms to a side. The ratio


Fig. 1 The Hephaisteion in Athens and the Temple of Poseidon at Sounion. (Wilson Jones 2006: 158)

5/4 is a fair numeric approximation for the irrational cube root of 2. Vitruvius tells the reader that the larger pentadoron was used in the construction of public buildings (for which monumentality was important), while the smaller tetradoron was used in the construction of private buildings (presumably of a minor scale). He does not relate the two brick shapes to the famous classic “Delian problem” (of doubling the


Fig. 2 The entablatures of the so-called Temples of Juno Lacinia (top) and Concord at Agrigento (bottom). The module (M) is equivalent to the triglyph width. In both cases M = 30 dactyls. The dactyl is 1/16 of the so-called Doric foot of ca. 328 mm. (Wilson Jones 2006: 159)

size of a cubic altar), but his words cannot escape the attention of a reader interested in the relationship between architecture and mathematics. The so-called “Delian problem” has its basis in a legend reporting Apollo’s demand that a cubic altar located on the island of Delos be doubled in size, as an offering to stop a plague on the island. The Delians doubled the edge length of the old altar while building the new one, which did not in fact double its size but multiplied it by eight. Assuming that the former altar had been built with tetradorons, the replication of the same monument using a set of bricks almost exactly twice as big in volume as the old ones (pentadorons, since (5/4)³ = 125/64 ≈ 1.95) would indeed have been a clever architectural solution to their problem. Unfortunately, to this day no archeological evidence has confirmed the actual existence of the bricks mentioned by Vitruvius or their use in architecture.

Surviving monuments from Roman times are abundant, and they comprise samples of many different typologies: from sacred architecture to public and private


buildings. Several measured surveys and analyses of monuments’ remains show skillful combinations of geometry and arithmetic. For example:

The perimeter of a [circular] 70 ft wide tomb at Capua known as “Le Carceri Vecchie” has twenty two semi-columns, an unusual number, but one that exploited the fact that 22/7 is a close approximation to π. It meant that the 220 ft perimeter divided up neatly into 10 ft intervals marked by the axes of the semi-columns. (Wilson Jones 2000: 79)

For large-scale buildings, the basic module could hardly be the diameter of a column or the width of a triglyph. Roman designers habitually chose a module of 12 ft as the unit for their dimensions. This made all numbers describing the building 12 times smaller and therefore easier to handle and to remember. The Roman counting system, which is still in use today, is based on multiples of 10, but a module equal to 12 ft makes all the dimensions multiples of 12 too. Decimal numbers lend themselves to multiplication, while duodecimal numbers lend themselves to subdivision: ten has only two divisors other than 1 and itself (2 and 5), while 12 has four (2, 3, 4, and 6). Multiples of four are particularly interesting in the case of geometrical patterns with double mirror symmetry with respect to two orthogonal axes. Multiples of seven modules appear in the case of circular or semicircular buildings, since they produce areas and perimeters expressible in round numbers when multiplied by 22/7.
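The quality of the rational approximations quoted in this section (7/5, 6/7, 22/7, and 5/4) and the arithmetic of the Capua tomb can be checked with a few lines of code. The following is a minimal sketch rather than a reconstruction of any ancient procedure; the names `relative_error` and `proper_divisors` are assumptions introduced here for illustration.

```python
import math
from fractions import Fraction

# Rational approximations quoted in the text, paired with the
# irrational magnitudes they stand in for.
APPROXIMATIONS = {
    "diagonal of the unit square, sqrt(2)": (Fraction(7, 5), math.sqrt(2)),
    "height of the unit equilateral triangle, sqrt(3)/2": (Fraction(6, 7), math.sqrt(3) / 2),
    "pi": (Fraction(22, 7), math.pi),
    "cube root of 2": (Fraction(5, 4), 2 ** (1 / 3)),
}


def relative_error(approx, exact):
    """Signed relative error of a rational approximation."""
    return (float(approx) - exact) / exact


def proper_divisors(n):
    """Divisors of n other than 1 and n itself."""
    return [d for d in range(2, n) if n % d == 0]


for name, (approx, exact) in APPROXIMATIONS.items():
    print(f"{name}: {approx} vs {exact:.5f} "
          f"(relative error {relative_error(approx, exact):+.2%})")

# Archimedes' bounds: 3 + 10/71 < pi < 3 + 1/7.
lower, upper = Fraction(223, 71), Fraction(22, 7)
print("bounds hold:", float(lower) < math.pi < float(upper))

# Capua tomb: a 70 ft diameter multiplied by 22/7 gives a whole-number
# perimeter, which 22 semi-columns divide into equal whole-number intervals.
perimeter_ft = 70 * Fraction(22, 7)
interval_ft = perimeter_ft / 22
print("perimeter:", perimeter_ft, "ft; interval:", interval_ft, "ft")

# Subdivision: proper divisors of 10 versus 12.
print(proper_divisors(10), proper_divisors(12))
```

Exact fractions are used for the designer's arithmetic so that the whole-number results (a 220 ft perimeter and 10 ft intervals) appear without rounding, while floating-point values are used only to measure how far each fraction lies from the irrational magnitude it replaces.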

Roman Innovation: Amphitheaters

In the late Roman Republic and early Empire, a new typology of buildings appeared in Roman culture: the amphitheater. Classical Roman architecture displays a vast array of building types for entertainment, each one intended for a particular kind of spectacle: shows, plays, sport, games, or races. Theaters, open-air structures, were built to host dramas and ballets. Odeons were enclosed theaters, designed for music and lyrics. Stadiums hosted athletic games and races, and circuses, which despite their name were not circular in shape, were used mostly for horse races. Each kind of building had a specific form consistent with its function. These four architectural typologies, as well as the shows that they hosted, were inherited from Hellenistic culture, and early examples of each kind can be found in ancient Greece.

Amphitheaters, however, are a Roman architectural innovation. They were designed to host a type of show that did not exist in ancient Greece or in Oriental cultures: gladiatorial combat (munera and venationes). Munera (fights between gladiators) and venationes (fights between gladiators and beasts) belonged to the traditions of the Samnite and Etruscan civilizations that Rome had conquered while expanding in the Italian peninsula. Originally, Etruscan munera were private shows offered by relatives to honor the memory of valorous soldiers who had died in war. Captured prisoners, whom the victors had brought home as slaves, were forced to fight in pairs to the death, in special arenas that were temporarily built for the spectacle. Munera were thus originally a kind of ritual human sacrifice that Etruscans and


Samnites used to pay to the Gods as a tribute for victory and in remembrance of, or vengeance for, the dead warriors.

The earliest amphitheaters appeared in Campania at the beginning of the second century BC and were immediately characterized by a closed elliptic/ovoid shape that had never been adopted in architectural design before. With the exception of the so-called oval forum of Gerasa in Roman Syria (now Jerash, Jordan), this geometric plan was never used for any other purpose in Roman culture. Form followed function, and since the function was new, a new geometric layout entered the architectural vocabulary. Scholars still wonder why amphitheaters are ovoid instead of being simply circular or rectangular. There are no objective answers to that question, only subjective ones. The most convincing arguments probably rely on the fact that the shape was a compromise between the theater hemicycle, which had a stage that was far too small; the circus shape, which was too elongated; and the forum rectangle, which was not originally related to the concept of plays and shows and whose corners could act as “dead corners” during the show. In addition, a possible interpretation of the shape is that a circular plan would have been too democratic for a strongly hierarchical society such as the Roman one, and there would not have been a privileged position for the pulvinar of the Emperor, which was always placed in the middle of the front row of the podium (the sector reserved for high-class society), at one extremity of the short axis of the monument. Furthermore, the appearance of elliptic or oval-shaped buildings in Southern Italy no earlier than the beginning of the second century BC, at the very time and in the very place in which Archimedes and Apollonius were studying conics, brings us face to face with a triple coincidence: historical, geographical, and morphological.
So we may assume that the simultaneity and similarity are not due to chance and that innovation in architecture went in parallel with research in theoretical mathematics, each one acting as an inspiration for the other (Fig. 3).

Scholarly interest in amphitheaters has tended to focus on the true nature of their geometrical pattern. No indication or information can be found in Vitruvius’ treatise. Since Vitruvius was only interested in classical architecture and not in innovative and modern buildings, this is not unexpected. Even though several important buildings of this kind were built before the completion of his books, Vitruvius mentions the word “amphitheater” only once in his whole treatise and does not discuss it any further. Thus, in the absence of reliable and contemporary literary sources, the morphological question (the shape of the building type) has to be solved by means of direct observations and inquiries. Are amphitheaters elliptic or oval? Pompeii’s amphitheater, the most ancient extant example, is the most interesting one as far as relationships between architecture and mathematics are concerned. Accurate measured surveys and in-depth studies have shown that its geometrical diagram is indeed based on ellipses (Duvernoy and Rosin 2006) (Fig. 4).

Fig. 3 View of the remains of Pompeii’s amphitheater

Fig. 4 Geometrical diagram and modularity of the axes’ lengths of Pompeii’s amphitheater

In the late Roman Republic, conic curves were a well-known part of mathematical knowledge. Conic curves were first discovered by Menaechmus (380–320 BC) while searching for the solution to the Delian problem around 350 BC. He is credited with having obtained them from the sectioning of acute-angled, right-angled, and obtuse-angled cones. The curves were further studied by Aristaeus (c.370–c.300 BC) and Euclid (fl. 300 BC). Sir Thomas Heath states that in Euclid’s day some kind of focus-directrix property was already known (Heath 1981). In any case, for tracing an ellipse, only the knowledge of the existence of the foci is necessary (independent of the directrix). By the time Pompeii’s amphitheater was built, Archimedes and Apollonius of Perga had further investigated the properties of the conics. The ellipse was also known to Archimedes as the section of a cylinder; Apollonius even suggested that the planets had elliptical orbits of which the sun occupied one focus. In his treatise entitled “Conics,” Apollonius examines various properties of the lines drawn from the foci to points on the curve or to specific points on straight lines tangential to the curve. And finally, proposition 52 of book III shows that if, in an ellipse, straight lines are deflected from the “points resulting from the application” (i.e., the foci) to any point on the curve, the sum of the distances will be equal to the major axis (Apollonius of Perga 1998). This well-known property of the ellipse, which because of its simplicity has been extensively applied to drawing ellipses by the so-called gardener’s method, is not emphasized by the author, and no corollary of the proposition is provided to point out a converse practical aspect (Fig. 5).

Fig. 5 The so-called gardener’s method
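The property on which the gardener’s method rests, namely that the two distances from the foci to any point of the ellipse add up to the major axis, is easy to verify numerically. The following minimal sketch uses hypothetical semiaxes a = 5 and b = 3 (illustrative values, not the dimensions of any monument); the function names `foci` and `string_length` are assumptions introduced here.

```python
import math


def foci(a, b):
    """Foci of the ellipse x^2/a^2 + y^2/b^2 = 1, assuming a >= b > 0."""
    c = math.sqrt(a * a - b * b)
    return (-c, 0.0), (c, 0.0)


def string_length(a, b, t):
    """Sum of the distances from the two foci to the point at parameter t.

    This is the length of taut string between the two stakes in the
    gardener's method.
    """
    x, y = a * math.cos(t), b * math.sin(t)
    (f1x, f1y), (f2x, f2y) = foci(a, b)
    return math.hypot(x - f1x, y - f1y) + math.hypot(x - f2x, y - f2y)


# Hypothetical semiaxes (illustrative values only).
a, b = 5.0, 3.0

# Sample twelve points around the curve and measure the string each time.
samples = [string_length(a, b, 2 * math.pi * k / 12) for k in range(12)]
print("foci:", foci(a, b))
print("string length constant and equal to 2a:",
      all(math.isclose(s, 2 * a) for s in samples))
```

In practical terms, two stakes driven at the foci and a loop of string whose free length equals the major axis are enough to trace the curve on the ground, with no need for the directrix.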

Table 4 Comparing the perimeter and area of a circle and an ellipse

            Circle (radius R)    Ellipse (semiaxes a and b)
Perimeter   P = 2πR              P = 2π(a + b)/2 = π(a + b)
Area        S = πR²              S = πab


Fig. 6 Squaring the ellipse: symmetry and commensurability of Pompeii’s geometrical diagram


In his book On Conoids and Spheroids, Archimedes, before focusing on other topics, briefly studies the characteristics of the ellipse and shows how some properties of the circle can likewise be applied to the ellipse (Archimedes 2009). The fundamental proposition 4 compares the area of an ellipse with the area of a circle having the same diameter as the major axis of the ellipse. The result is that the two areas are in the same ratio as the minor axis is to the diameter of the circle. From this comparison it is clear that the area of any ellipse is equal to the product of its semiaxes multiplied by π. When calculating elliptic perimeters, the equation used for the circle can be directly applied, as a first approximation, if the value for R (the radius) is replaced by the arithmetic mean of the two semiaxes a and b. Similarly, when calculating elliptic areas, the square on the radius must be replaced by the rectangle a × b (Table 4). The following proposition 5 states that any elliptic area is in the same ratio to its circumscribing rectangle as a circle is to its circumscribing square. This statement makes it possible to compare the areas of several ellipses, which are to one another as the areas of their circumscribing rectangles. The ratio between the area of a circle and its circumscribing square (or of an ellipse to its circumscribing rectangle) is equal to π/4 ≈ (22/7)/4 = 22/28 = 11/14.

The search for a module acting as the greatest common divisor of all dimensions of Pompeii’s amphitheater has shown that this module was, typically, equal to 12 ft. Expressed in modules, rather than in feet, the dimensions of Pompeii’s amphitheater show some striking “coincidences.” Like most amphitheaters, Pompeii’s is divided into a central arena, a circumscribing podium for the elite, and a cavea for spectators divided into a media cavea (divided into 20 wedges in plan) and a summa cavea (divided into 40 wedges). The summa cavea is closed by a solid masonry ring which hosts what seem to have been individual boxes and which used to hold the poles for the vela: the shades protecting the spectators. The area comprising the arena and podium is equal to 225 square modules. The area comprising arena, podium, and media cavea is 400 square modules. The elliptic ring of the summa cavea and boxes is 441, and the total structure is 841 square

Fig. 7 Reconstructed transversal section of Pompeii’s amphitheater


Fig. 8 Examples of geometrical diagrams of extant oval amphitheaters from the late Roman Republic and Imperial period (Wilson Jones 2006: 154)

modules. 225, 400, 441, and 841 are all square numbers. From this it appears that, at a time in which the squaring of the circle (together with the doubling of the cube and the trisection of any angle) was among the main mathematical concerns, the architect succeeded in squaring some ellipses, through the arithmetization of the geometry, with an approximation error of only 1.5%. Furthermore, 841 (the square of 29) is the sum of 400 (the square of 20) and 441 (the square of 21). Thus, the designer used a Pythagorean triple (20, 21, 29) to size his building and divide it into sectors (Duvernoy 2009) (Figs. 6 and 7).
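The numerical relationships reported for Pompeii’s amphitheater can be checked directly. The sketch below takes the four areas in square modules exactly as quoted in the text and verifies that each is a perfect square, that the last three form a Pythagorean triple, and that 22/7 reduces the ratio π/4 to 11/14. It is an arithmetic check only and makes no attempt to reproduce the survey data or the 1.5% figure; the names `AREAS_SQUARE_MODULES` and `integer_root` are assumptions introduced here.

```python
import math
from fractions import Fraction

# Areas in square modules, as quoted in the text.
AREAS_SQUARE_MODULES = {
    "arena and podium": 225,
    "arena, podium, and media cavea": 400,
    "ring of summa cavea and boxes": 441,
    "total structure": 841,
}


def integer_root(n):
    """Return the integer square root of n if n is a perfect square, else None."""
    r = math.isqrt(n)
    return r if r * r == n else None


roots = {name: integer_root(area) for name, area in AREAS_SQUARE_MODULES.items()}
for name, root in roots.items():
    print(f"{name}: side of the equivalent square = {root} modules")

# The inner area plus the outer ring gives the total: 20^2 + 21^2 = 29^2.
is_triple = 20 ** 2 + 21 ** 2 == 29 ** 2
print("Pythagorean triple (20, 21, 29):", is_triple)

# Ratio of an ellipse to its circumscribing rectangle, using pi ~ 22/7.
ratio = Fraction(22, 7) / 4
print("pi/4 approximated as", ratio, "against", round(math.pi / 4, 5))
```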


As of today, Pompeii’s amphitheater is the only monument to have revealed an elliptic diagram. All other studies conducted by scholars on later examples of the type have shown that their geometrical pattern is invariably based on oval curves. The oval, a polycentric curve made of two pairs of mirrored arcs of two different radii, provides a good approximation of the ellipse, as far as visual perception is concerned, and it has many practical advantages that the ellipse does not have. Parallel ovals are concentric: they may be drawn at small scale, or traced at full scale, from the same centers. The centers of the arcs make radial division possible and easy to compute. Oval shapes can be drawn according to different patterns, and the most recurrent ones for amphitheater design are shown in Fig. 8. The evolution of diagrams during the four centuries in which amphitheaters were built throughout the Roman world reveals numerous variations on a single theme, with a decisive shift from the initial ellipse to the flexible oval shape.
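The practical advantage of the oval, that parallel curves share the same centers, can be illustrated with a generic four-center oval. The following is a minimal sketch under stated assumptions: it does not reproduce any of the specific diagrams of Fig. 8, the parametrization (small arcs centered at (±c, 0) with radius r, large arcs centered at (0, ∓d)) is one common modern way of describing such a curve, and the names `four_centre_oval` and `parallel_oval` are hypothetical. The arcs join smoothly when the distance between a small-arc center and a large-arc center equals the difference of the two radii.

```python
import math


def four_centre_oval(c, d, r):
    """Return (R, a, b) for a four-center oval.

    Small arcs: centers (+c, 0) and (-c, 0), radius r.
    Large arcs: centers (0, -d) and (0, +d), radius R.
    Smooth junction requires the distance between centers to equal R - r.
    """
    R = r + math.hypot(c, d)   # radius of the large arcs
    a = c + r                  # semi-major axis (end of a small arc)
    b = R - d                  # semi-minor axis (top of a large arc)
    return R, a, b


def parallel_oval(c, d, r, offset):
    """Oval parallel to the first one, traced from the SAME four centers."""
    return four_centre_oval(c, d, r + offset)


# Hypothetical centers and radius (illustrative values only).
c, d, r = 3.0, 4.0, 2.0
R, a, b = four_centre_oval(c, d, r)
R2, a2, b2 = parallel_oval(c, d, r, offset=1.0)
print("inner oval: R =", R, "a =", a, "b =", b)
print("outer oval: R =", R2, "a =", a2, "b =", b2)
```

Increasing both radii by the same offset leaves the four centers untouched and enlarges both semiaxes by exactly that offset, which is why successive rings of seating can be set out from the same stakes. A family of confocal ellipses has no such property: a curve at constant distance from an ellipse is not itself an ellipse.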

Conclusion

Although Vitruvius’ treatise is of fundamental importance to our knowledge of many aspects of Roman culture and civilization, as far as mathematics and architecture are concerned its interpretation is limited by the fact that Vitruvius apparently did not study the Greek mathematical books that are preserved and still available to modern scholars, while he knew well the long-lost treatises on architecture that no one can check any longer. In addition, he was more interested in the Greek tradition than in the progress made in Roman times, a factor which shapes the content of his work. The study of the measurements of the remains of some prominent Roman monuments shows that some designers, whose names are unfortunately forgotten, were far more skilled in theoretical mathematics than Vitruvius himself. They evidently made use of their knowledge to design and build innovative masterpieces. Whether, like their ancient Greek counterparts, they wrote books about their inventions is unknown. No written records whatsoever are preserved. Modern technology, however, makes it possible to carry out accurate surveys that reveal evidence of the dynamic interaction between science and art (mathematics and architecture). The mere evidence of bridges between theoretical mathematics and the art of design in the Roman Republic is worth highlighting in its own right, since archeological and historical studies conducted by modern scholars often tend to focus on single narrow topics, as if each were isolated from all other aspects of culture and science. The level of sophistication of a given society and its culture is readable not only in its writings but in all the items that this culture produced, and it is up to scholars to see it.


Cross-References

Classical Greek and Roman Architecture: Mathematical Theories and Concepts

References

Apollonius of Perga (1998) Conics, books I–III. In: Densmore D (ed). Green Lion Press, Santa Fe, NM
Archimedes (2009) On conoids and spheroids. In: Heath T (ed) The works of Archimedes. Cambridge University Press, Cambridge
Duvernoy S (2009) L’anfiteatro di Pompei: un esempio di sintonia fra matematica antica e architettura. In: Conferenze e Seminari dell’Associazione Subalpina Mathesis 2008–2009. Kim Williams Books, Turin
Duvernoy S, Rosin P (2006) The compass, the ruler and the computer. In: Duvernoy S, Pedemonte O (eds) Nexus VI: architecture and mathematics. Kim Williams Books, Turin
Frézouls E (1987) Vitruve et le dessin d’architecture. In: Le dessin d’architecture dans les Sociétés Antiques: Actes du colloque de Strasbourg, 26–28 janvier 1984. In: Annales. Économies, Sociétés, Civilisations. 42è année, n. 2
Gros P (1989) Structures et limites de la compilation vitruvienne dans les livres III et IV du De Architectura. In: Geertman H, de Jong JJ, Kooreman A (eds) Munus non ingratum. Stichting Bulletin Antieke Beschaving, Leiden
Heath ST (1981) A history of Greek mathematics. Dover, New York
Tomlinson RA (1989) Vitruvius and Hermogenes. In: Geertman H, de Jong JJ, Kooreman A (eds) Munus non ingratum. Stichting Bulletin Antieke Beschaving, Leiden
Vitruvius (1990) De l’Architecture (trans: Philippe Fleury). Les Belles Lettres, Paris
Vitruvius (2009) On architecture (trans: Richard Schofield). Penguin Books, London
Williams K, Duvernoy S (2014) The shadow of Euclid on architecture. Math Intell 36(1):37–48
Wilson Jones M (2000) Principles of Roman architecture. Yale University Press, New Haven
Wilson Jones M (2006) Ancient architecture and mathematics: methodology and the Doric temple. In: Duvernoy S, Pedemonte O (eds) Nexus VI: architecture and mathematics. Kim Williams Books, Turin

Mathematics and the Art and Science of Building Medieval Cathedrals

43

Josep Lluis i Ginovart

Contents

Introduction. The Cathedral and the Gothic Order
Gothic Apses and Sacred Geometry
The Theorica of the Canons of Tortosa Cathedral
Commentary on Euclid’s Elements by Al-Haijaj (c.325–c.265 BC)
Saint Augustine’s De Civitate Dei
Translation of Plato’s Timaeus by Calcidius, with Part of a Commentary
Part of the Commentary on Plato’s Timaeus by Calcidius
Commentary on Somnium Scipionis by Macrobius
Part of Geometria from Martianus Capella’s Marriage of Philology and Mercury
Geometria Incerti Auctoris by Gerbert (Silvester II)
The Positional Number System of Adelard of Bath
Practica Versus Theorica of Tortosa Cathedral
The Construction of Heptagons
The Construction of Octagons
The Geometria Fabrorum
Mathematics and the Art and Science of Building Medieval Cathedrals
References

Abstract

The Cathedral of Tortosa (1345) is one of the most important Gothic buildings of Catalonia (Spain). Its relevance is partly due to the large volume of documents that survive from the era of its construction. One of the most important of these documents is Guarc’s parchment (c.1345–1380), which contains the drawing of the plan of a cathedral that was never built; it is the oldest drawing of this type in Spain and offers us information about the construction of

J. Lluis i Ginovart
Universitat Internacional de Catalunya, Barcelona, Spain

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_84


geometric figures, such as the heptagon and the octagon in medieval times. The study of the Gothic layout through its imprints enables us to establish the geometric and arithmetic knowledge of the agents involved in the design and construction of the cathedral. This chapter provides a background to the use of mathematics in Gothic architecture, by way of a detailed study of the Cathedral of Tortosa.

Keywords
Medieval geometry · Medieval drawing · Heptagon · Octagon · Dome · Tortosa Cathedral

Abbreviations
ACTo: Arxiu Capitular Tortosa
AHCTE: Arxiu Històric Comarcal Terres Ebre
FBMPM: Fundación Bertomeu March Palma Mallorca

Introduction. The Cathedral and the Gothic Order

The Gothic era in architecture is conventionally defined as spanning from the twelfth to the sixteenth centuries in Europe. While it is regarded as originating in France, it soon spread across Spain, Portugal, and Germany, with the most prominent examples of Gothic architecture typically being religious structures. One of the most important of these is Tortosa’s Gothic cathedral, which lies equidistant between those of Barcelona and Valencia and which replaced the Romanesque cathedral from 1345 onward. The Arxiu Capitular of the Cathedral of Tortosa (hereinafter ACTo) contains a large collection of codices and manuscripts from the era (Fig. 1). The collection exists in part because the ecclesiastical figures of the bishop and Chapter, along with the medieval architect, envisioned the cathedral as the city of God built by men. As such, the collection makes it possible to connect the knowledge of the ecclesiastical promoters with that of the medieval masters. A key vehicle for attaining this knowledge is the building project, essentially embodied in the drawings of Gothic cathedrals. This relationship between the ecclesiastical figures (bishop and Chapter) and the medieval architect was emphasized by Wilhelm Worringer (1881–1965) in Formprobleme der Gotik (Worringer 1911) and by Erwin Panofsky (1892–1968) in his work Gothic Architecture and Scholasticism (Panofsky 1951). The canons of medieval creation are partly to be found in the cosmology of the Timaeus by Plato (c.429–347 BC), as recognized by Francis Macdonald Cornford (1874–1943) in Plato’s Cosmology: The Timaeus of Plato (Cornford 1937). Thus, one of the points of overlap between clerical and artisan statements lies in the measurement and proportion of architecture, and Gothic drawings are considered one important tool that captures this overlap.

43 Mathematics and the Art and Science of Building Medieval Cathedrals


Fig. 1 Cathedral of Tortosa in Spain

From the perspective of geometric knowledge and Gothic cathedrals, Otto von Simson (1912–1993), in The Gothic Cathedral: The Origins of Gothic Architecture and the Medieval Concept of Order (Simson 1956), addressed the existence of learned sources (treatises and codices of reference) and looked for evidence in De Civitate Dei, De Ordine, and the Musica of Saint Augustine (354–430). This research was supplemented by examining authors such as Boethius (480–524), with his De consolatione philosophiae and the Musica, as well as the main commentators on Plato, such as Calcidius (fl. 350) with Timaeus translatus commentarioque instructus, Martianus Capella (fl. 430) with De Nuptiis Philologiae et Mercurii Comentarii, and Macrobius (fl. 400) with Somnium Scipionis. The cathedral represents a synthesis of the knowledge of both the clergy and the builders. The clergy acquired knowledge from the codices, while the master masons learnt from the practice of their craft. The members of the Cathedral Chapter worked closely with the master builders, and despite the lack of primary sources demonstrating the transfer of knowledge between them, the evidence found when studying the completed cathedral is clearly related to the sources available at the Capitular Archive (ACTo) of Tortosa Cathedral (Fig. 2). It is therefore worth considering which methods the canons and architects used when designing Gothic apses (Beaujouan 1963, 555–563; Beaujouan 1975, 437–484; Høyrup 2006, 2009; Sarrade 1986: 27–40).

Mathematical ideas spread throughout Europe during the Gothic period thanks to the book De Scientiis by Dominicus Gundissalinus (fl. 1150), although its precursor was the Enumeration of the Sciences by al-Fārābī (c.870–950). According to al-Fārābī, mathematics, a sciencia doctrinali, is one of the five known sciences. The mathematical sciences include arithmetic, geometry, optics, astronomy, music, the science of weights, and mechanics. In the sections on arithmetic and geometry, al-Fārābī makes a distinction between theorica and practica (González 1932: 97–105). Gundissalinus used the same terms in the third chapter of De Scientiis (Alonso 1955: 85–112). As such, the medieval cathedral, the ecclesia materialis, was viewed in terms of its material presence and construction, although the desire to create an ecclesia spiritualis, depicted in the Chapter’s Archives, remained hidden.

Fig. 2 Apse and presbytery of Tortosa Cathedral

Gothic Apses and Sacred Geometry

The ACTo of Tortosa Cathedral has preserved both Neoplatonic codices and most of the new cathedral’s construction books, the Llibres d’Obra (Ll.o.) (Bayerri 1962, 122–503). The chevet (or eastern end) of Tortosa Cathedral, with a heptagonal plan, was roofed between 1383 and 1441. It has a double ambulatory apse, which was constructed around the old Romanesque cathedral. The first construction phase, that of the belt of radial chapels, took place between 1383 and 1424, with a very low sectional proportion of 9:5. During the second phase of construction, that of the ambulatory, the sectional proportion increased to 9:6. Unlike the previous vaults, the ambulatory was built symmetrically (1424–1435), closing inward from the ambulatory’s mouth to the presbytery. Finally, the presbytery was covered (1435–1441) (Fig. 3).


Fig. 3 Laser scanner of Tortosa Cathedral

The basic pattern of measurements appearing in the Llibres d’Obra (ACTo) is the cana of 8 palms and the palm of 12 fingers. The cana of Tortosa is defined in Book IX, no. 15.5 of the Consuetudines Dertosae (1272) (AHCTE cod.53, fol.256r), and in the copy of 1346, when the Gothic cathedral was started, Llibre de les Costums Generals feutes de la insigne ciutat de Tortosa (FBMPM, fol.100r). By comparing the documents unifying the cana of Tortosa with that of Barcelona (24-VII-1593), it can be concluded that the cana of Tortosa used for the cathedral measures 1.858 m and the palm 0.2323 m. Thus, the cathedral chevet has the following metrological proportions: it is 150 palms wide and 100 palms deep, and the radial chapels have a square plan of 21 × 21 palms. Using computer technology, it was determined that the center points of the pillars, around which the work was raised, are equidistant by 24 palms, i.e., 3 Tortosa canas. The measurements in the apse section are 45 palms in the radial chapels, 72 palms at the ambulatory, and 100 palms at the presbytery, where the keystone has a diameter of 10 palms (Fig. 4).

The importance of radial chapels, as a modular element in the cathedral’s construction, originates in the Gothic liturgy Prochiron, vulgo Rationale divinorum officiorum (1291), by the bishop of Mende, Gulielmus Durandus (1230–1296). In that work the correspondence between the physical ecclesia materialis and the heavenly ecclesia spiritualis is defined (Sebastián 1994, 352–355). The new pattern was first used at Clermont-Ferrand, then Narbonne, and came to Catalunya through Girona, where the Chapter established a program of nine chapels at the apse (Street 1926, 318, 339). Direct knowledge of Durandus’s text in Tortosa Cathedral is attested by the codex ACTo no. 58, dating from the end of the thirteenth century, the Roman incunabulum dating from 1477 (ACTo no.258), and the Venetian incunabulum from 1482 (ACTo no.290). Following this tradition, the radial chapels in Tortosa Cathedral were placed 54 palms from the center of the presbytery, and the pillars are also equidistant at 24 palms (3 canas). There is therefore a relationship between the circumference radius (54 palms = 18 moduli) and the side of the 14-sided polygon (24 palms = 8 moduli), with a proportion of 9:8.

Fig. 4 Metrical structure of the apse of Tortosa Cathedral

In medieval Christian culture, since the days of the Fathers of the Church, the number seven had represented the Creation: the first 6 days of work (Gen. 1:1), and the seventh, when God rested. In the Pythagorean tradition, the number seven was the perfect and religious number, known as telesphoros. The humanist and mathematician Charles Bouvelles (1478–1567), the author of Geometricum Introductorium (1503), found in ACTo 300 in the edition published in Paris in 1510 (Bouvelles 1510), and of Geometrie pratique (1542), acknowledged that a figure as important for Christian symbolism as the heptagon did not appear in the Elementa (Chap.2.57) by Euclid (c.325–265 BC) (Bouvelles 1542, 5) (Fig. 5). In spite of this, many Gothic cathedrals, including the Cathedral of Tortosa, have apses with heptagonal structures, although many key treatises do not consider this geometry. The Elementa as translated by Adelard of Bath (1075–1166) in 1142 considers the layout of regular polygons in Book IV: the triangle (IV.5), the square (IV.6 to 9), the pentagon (IV.11 to 14), the hexagon (IV.15), and the 15-sided polygon (IV.16), as in the first edition in Spanish (Euclides 1576, 71–79), but fails to mention the heptagon (Heath 1908:2, 88–111). The same is true of the Mathematical Syntaxis, the Almagest of Ptolemy (c.85–165), which was translated by Gerard of Cremona (1114–1187) in around 1175: in the chapter on the measures of the straight lines drawn in the circle (LI, IX), the figures that appear are the square, pentagon, hexagon, decagon, and dodecagon (Toomer 1984, 35–74), as in the first edition in Spanish (Ptolomeo 1578, 61–81). Only a few known works mentioned the heptagon: the Heptagon Book by Archimedes (287–212 BC) (Hogendijk 1984, 197–330; Knorr 1998, 257–271), transmitted by Arab sources, in this case by Abu Ali al-Hasan ibn al-Haytham (c.965-c.1040) (Rashed 1976, 387–409), through Archimedes’ lemma; the text written by Abu Sahl Waijan ibn Rustam al-Quhi (c.940-c.1000) (Hogendijk 1984, 231–234) on the proportional division of the segment into three parts; and finally another text on the trisection of the right angle written by Abu Said Ahmad ibn Muhammad Al-Sijzi (c.945-c.1020) (Hogendijk 1984, 290–316).
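The metrological figures quoted above for the chevet can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming only the values given in the text (a cana of 8 palms, a palm of 12 fingers and 0.2323 m, radial chapels at 54 palms from the center, pillars 24 palms apart); the function and variable names are hypothetical and do not come from the source.

```python
import math

# Values quoted in the text; all names below are illustrative assumptions.
PALMS_PER_CANA = 8
FINGERS_PER_PALM = 12
PALM_M = 0.2323  # metres per Tortosa palm


def palms_to_metres(palms: float) -> float:
    """Convert a length in Tortosa palms to metres."""
    return palms * PALM_M


def polygon_side(radius: float, sides: int) -> float:
    """Side (chord) of a regular polygon inscribed in a circle of the given radius."""
    return 2 * radius * math.sin(math.pi / sides)


cana_m = palms_to_metres(PALMS_PER_CANA)
finger_m = PALM_M / FINGERS_PER_PALM
chevet_m = (palms_to_metres(150), palms_to_metres(100))  # width, depth
chapel_m = palms_to_metres(21)

# A 14-sided polygon inscribed in a circle of radius 54 palms,
# compared with the 24 palms measured between pillar centers.
side_palms = polygon_side(54, 14)
deviation_mm = palms_to_metres(side_palms - 24) * 1000

print(f"cana   = {cana_m:.4f} m, finger = {finger_m:.4f} m")
print(f"chevet = {chevet_m[0]:.2f} m x {chevet_m[1]:.2f} m, chapel side = {chapel_m:.2f} m")
print(f"14-gon side at radius 54 palms = {side_palms:.3f} palms")
print(f"difference from 24 palms = {deviation_mm:.1f} mm")
```

Under these assumptions the exact side of the 14-sided polygon differs from the round figure of 24 palms by less than a centimetre, which suggests why whole-number measures could be used on site.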
Indeed, the angle trisection method influenced the construction of the heptagon discussed in the earliest medieval Latin texts, such as the Geometria vel de triangulis libri IV by Jordanus Nemorarius (1225–1260) (Liber IV, 23) (Curtze 1887, 25–32) or Varii de Rebus Mathematicis Responsi, Liber VIII (1593) by François Viète (1540–1603) and his Protasis IV. Theorema (Viète 1696, 362–363). All these methods are completely inapplicable for the construction of Gothic apses (Fig. 6).

Fig. 5 Charles Bouvelles ACTo n◦ 300, Tractatus varii de rebus philosophicis

Fig. 6 Laser scanner heptagonal apse of the Cathedral of Tortosa

The heptagonal geometric construction which we have inherited today determines the side of the regular heptagon inscribed within a circumference as the height of an equilateral triangle with a side equal to the circumference’s radius. It was described by Albrecht Dürer (1471–1528) in his Underweysung der Messung, mit dem Zirckel und Richtscheyt: in Linien Ebnen und gantzen Corporen (1525), as a corollary to the layout of the pentagon (LII.15), rather than of the heptagon (LII.11) (Dürer 1525, 27v–28v). The method has the same basis as the one mentioned by Mohammad Abu’l-Wafa Al-Buzjani (940–998) in Kitāb fī mā yaḥtāju al-ṣāni‘ min al-a‘māl al-handasiyya (Book on those geometric constructions which are necessary for craftsmen) (c.993–1008) (Aghayani-Chavoshi 2010; Suter 1922, 94–109; Woepcke 1855, 218–256: 309–359), (CII.6) and (CIII.13), which was also completed in the instructions in his Aritmética (P2, C.IV-V) (Saidan 1974, 367–375). The construction of the heptagon by means of geometrical instruments was first refuted by Kepler (1571–1630) in his Harmonices Mundi, Libri V; in his Primus Geometricus, De figurarum regularium, quae proportiones harmonicas pariunt, ortu, classibus, ordine et differentiis, causa scientiae et demonstrationis (1619); and in the Propositio, Heptagonus et figurae ab eo omnes (LI.45) (Kepler 1864:5, 101–108) and later by Gauss (1777–1855), at the end of his Disquisitiones Arithmeticae (1801) (Sect. VII – Propositions 361–366) (Gauss 1801, 454–463). This means that the heptagon is one of the most enigmatic geometric figures, particularly if we take into account that it is the basic layout geometry for the apses of some Gothic cathedrals.

The medieval architect therefore had to consider the problem of how to lay out seven chapels around the arc of a semicircle. The problem was aggravated in the case of the Cathedral of Tortosa because the center of the circumference necessary for laying out the seven radial chapels was inaccessible: it was physically located inside the presbytery of the old cathedral, which was to be replaced by the Gothic building but was still in use. The magister operis therefore had to resolve three geometrical problems when building the heptagonal apse: first, finding a method to lay out the heptagon, as this method did not appear in the texts available to him at the time; second, building the geometric figure without knowing where its center was; and finally, commensurably resolving the proportional ratio between the radial chapels, situated along the curved section of the apse, and the side chapels, situated along the straight section of the choir and the side aisles (Fig. 7).
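The approximate construction described above, in which the side of the heptagon is taken as the height of an equilateral triangle whose side equals the radius, can be compared numerically with the exact side of the regular heptagon. This is a minimal sketch using standard trigonometry; the function names are hypothetical, and the "closure gap" is simply an illustrative way of expressing the error that accumulates when the approximate side is stepped seven times around the circle.

```python
import math


def exact_heptagon_side(radius: float) -> float:
    """Exact side of the regular heptagon inscribed in a circle."""
    return 2 * radius * math.sin(math.pi / 7)


def approx_heptagon_side(radius: float) -> float:
    """Height of an equilateral triangle whose side equals the radius."""
    return radius * math.sqrt(3) / 2


def closure_gap_degrees(radius: float, side: float, steps: int = 7) -> float:
    """Angle left uncovered after stepping `side` around the circle `steps` times."""
    angle_per_step = 2 * math.asin(side / (2 * radius))
    return 360.0 - steps * math.degrees(angle_per_step)


r = 1.0
exact = exact_heptagon_side(r)
approx = approx_heptagon_side(r)
relative_error = (exact - approx) / exact
gap = closure_gap_degrees(r, approx)

print(f"exact side  = {exact:.6f} r")
print(f"approx side = {approx:.6f} r")
print(f"relative error = {relative_error:.4%}")
print(f"gap after seven steps = {gap:.3f} degrees")
```

The approximate side is slightly shorter than the exact one, so seven steps of the compass fall just short of closing the circle; the shortfall is under one degree.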

The Theorica of the Canons of Tortosa Cathedral

After carrying out an in-depth analysis of the content of a number of codices, the sources listed below were identified as providing a connection between mathematics, architecture, and religious beliefs. Significantly, the Canons had access to these works prior to the construction of the Gothic cathedral:

(1) Commentary on the Elements of Euclid (c.325–c.265 BC) by al-Ḥajjāj ibn Yūsuf ibn Maṭar (786–833), ACTo 80 (fol. 161r.6-13), twelfth c.
(2) De Civitate Dei (The City of God) by Saint Augustine (354–430), ACTo 20 (fols. 1r-408r), twelfth c.
(3) Translation of Plato’s Timaeus by Calcidius (fl. 350), ACTo 80 (fols. 146r-155v.14), including part of his commentary (fol. 155v.15-66), twelfth c.
(4) Another part of the commentary on Timaeus by Calcidius (fl. 350), inserted in ACTo 236 (fol. 39), thirteenth c.
(5) Commentary on Somnium Scipionis (The Dream of Scipio) by Macrobius (fl. 400), ACTo 236 (fol. 1r-61v), except fol. 39, thirteenth c.; both books constituting the commentary are complete.
(6) An excerpt from Book VI, Geometria, in De Nuptiis Philologiae et Mercurii (On the Marriage of Philology and Mercury) by Martianus Capella (fl. 430), ACTo 80 (fols. 160v.28-161r.5), twelfth c.
(7) Part of Books III and IV of Geometria Incerti Auctoris by Gerbert of Aurillac (Silvester II, c. 940–1003), ACTo 80 (fols. 159r.1-160v.27), twelfth c.
(8) The positional number system, ACTo 80 (fol. 162r.1-3), of Adelard of Bath (1090–1160), twelfth c.


Fig. 7 Superposition of the Romanesque cathedral and new Gothic cathedral

In what follows we outline part of the content of these sources, focusing on the knowledge that pertains to arithmetic and geometry and relates to the part called practica. It is not the objective of this chapter to track the origins and genealogy of these sources.

Commentary on Euclid’s Elements by Al-Haijaj (c.325–c.265 BC)

This extract (fol. 161r.6-13) is based on Euclid’s Elements in the version of al-Ḥajjāj ibn Yūsuf ibn Maṭar (786–833): Hec est de abecedario. Ait Elhageth dic [it] quia linea longior cum proporcionaliter in potencia. [Biteg dixit]. The Commentarii reads as follows:

The first of the irrational numbers will be the first side, the second one will be the second side, the third one will be the third side, and so on. In the same way it is situated beyond the extremes that are always further away, its rational portion is more distant, as it is further away from the previous rational square. It thus needs to be understood that no rectilinear angle can be so small that it does not fit inside it, that angle a b d cannot be narrower, as shown in, because angles c and d are smaller than it, or in other words, a [b] d, according to [theorem] XVI [from the first book of Euclid’s Elements], but not d, because c and e are narrower. As a result, if someone framed it in those terms, we would be surprised to see that the three angles are equal in the trigon. This is because two of them must be right angles, meaning that the other one must be smaller than the minimum [which is impossible]. (Fig. 8)

Fig. 8 Detail from the commentary on Euclid’s Elements by Al-Haijaj, ACTo 80, fol. 161v

Saint Augustine’s De Civitate Dei

ACTo 20, Saint Augustine’s (354–430) De Civitate Dei, contains 22 books (fols. 5v-407v); an introduction (fols. 1r.5r-4v), including five graphical representations, one of which is shown in Fig. 9 and represents the creation and the symbolism of the numbers 5, 6, and 7; and an appendix (fol. 408r). The first book begins with “Gloriosissimam Civitatem Dei, sive in hoc . . . ” (fol. 5v), and Book XXII finishes with “Explicit Liber vicesimus secundus . . . Finito libro Reeddamus gracias Christo” (fol. 407r) (Fig. 9). In Book XI, Civitatem Dei dicimus cuis ea Scriptura . . . (fol. 156r-170v), the number 6 is recognized as the number of perfection, for it is the first number made up of its own parts added together, 6 = 3 + 2 + 1 (XI.30). To Saint Augustine, the number 10 was also significant: its divisors are 1, 2, 5, and 10, and it is made up of the number 3, representing the Trinity, and the number 7, which represents the seventh day, recognized as the day of God, and is itself the sum of 4 and 3 (fols. 168v-169v) (XI.31). The number 12 has divisors 6, 4, 3, 2, and 1 (XI.30).

Fig. 9 Detail from Saint Augustine, Civitatis Dei, ACTo 20

At the end of Book XX, from “De die ultimi judicii Dei, quod ipse donaverit . . . ” to “ . . . et posse facere quod imposible est infideli” (fols. 333r-359r), new proportional references appear. Saint Augustine redefines the number 12 as the number of the Apostles; 12 is the product of 3 and 4, the two parts of the number 7. The number 12, which represents the 12 tribes of Israel, is the triple of 4 and the quadruple of 3 (fols. 335r-336v) (XX.5.5). The number 1000 appears as the measure of past or future time: 1000 is the perfect number in the fullness of time. It represents the cube of the number 10, inasmuch as 10 × 10 = 100 is a square, but a plane figure, while a solid figure, with volume, requires 10 × 10 × 10 = 1000 (XX.7.2).
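The arithmetical properties attributed to these numbers are easy to verify. The sketch below is only an illustration of the statements in the passage above (6 as the sum of its own parts, the divisors of 10 and 12, 7 = 3 + 4, 12 = 3 × 4, 1000 = 10³); the helper names are hypothetical.

```python
def divisors(n: int) -> list[int]:
    """All positive divisors of n, including n itself."""
    return [d for d in range(1, n + 1) if n % d == 0]


def aliquot_parts(n: int) -> list[int]:
    """Divisors of n smaller than n (the 'parts' of the number)."""
    return [d for d in divisors(n) if d != n]


def is_perfect(n: int) -> bool:
    """A number is perfect when it equals the sum of its own parts."""
    return sum(aliquot_parts(n)) == n


print("parts of 6:", aliquot_parts(6), "perfect:", is_perfect(6))
print("divisors of 10:", divisors(10))
print("parts of 12:", aliquot_parts(12), "perfect:", is_perfect(12))
print("7 = 3 + 4:", 3 + 4 == 7, "| 12 = 3 x 4:", 3 * 4 == 12)
print("10 squared:", 10 ** 2, "| 10 cubed:", 10 ** 3)
```

Note that 12 shares the divisors listed in the text but, unlike 6, is not equal to the sum of its parts.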

Translation of Plato’s Timaeus by Calcidius, with Part of a Commentary

ACTo no. 80 contains Plato’s Timaeus translated by Calcidius (fols. 146r-155v) and includes only a summary of his commentary about arithmetic and geometry (fol. 155v.15–66) (Fig. 10). The text starts on fol. 146r.1, “Socrates in exortationibus suis virtutem . . . ” (Moreschini 2003, 4–109; Waszink 1975, 5–52), with the translation of the first part and goes to “nancisceretur imaginem. Liber Platonis Timaeus explicit” [32 Waz] on fol. 152v.3. At the end, there is a commentary on Timaeus, from “[Quis igitur] primae portionis numerus . . . ” (fol. 155v.15–66) to “qui sunt in formula”. The work only shows one commentary on Timaeus: Descriptio tertia, quae est armónica (XLIX), accompanied by [tab. 9]. Figure 10 (left) provides the description for the ratios diatessaron and diapente (Moreschini 2003, 204–207; Waszink 1975, 88–89), while Fig. 10 (right) provides the generation of numbers -2-3-.

Fig. 10 Details from the commentary on Timaeus by Calcidius, ACTo 80, fol. 156v, fol. 150r

One of the codex’s peculiarities is the figure inserted between fol.150.1 “cuncta intra suum ambitum” [25 Wz] and fol.150r.41 “a Graecis epitritum dicitur” [28Wz], in the right margin (fols. 150r.20-150v.16). The figure explains the passage devoted to generating mathematical proportions [27Wz-28Wz]. In Calcidius’s translation, they are generated as follows: 1, 2 (=2 × 1), 3 (=2 + (1/2) × 2), 4 (=2 × 2), 9 (=3 × 3), 8 (=1 + 7), and 27 (=27 × 1). In the intervals it defines the whole plus its half part (1 + 1/2); the whole plus its third part (1 + 1/3), which he called epitrite; the whole plus its eighth part (1 + 1/8), which is called epogdus; as well as the double, triple, and quadruple, and the ratio 243:256. The figure accompanies the Timaeus throughout the commentary. XXXII [tab. 7] and XLI [tab. 8] (Moreschini 2003, 186–191; Waszink 1975, 89–91) are summarized in a single figure. The first one is dedicated to the origin of the soul with the numbers 1, 2, 4, 8 and 1, 3, 9, 27, while the other one refers to harmonic modulations with the series 6, 8, 9, 12, 16, 18, 24, 32, 36, 48 and 6, 9, 12, 18, 27, 36, 54, 81, 108, 162.
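The two series of harmonic modulations quoted above can be reproduced by inserting a harmonic mean and an arithmetic mean into each double interval (6, 12, 24, 48) and each triple interval (6, 18, 54, 162). The sketch below assumes this reading of the series; it is offered as one way of generating the numbers, not as a reconstruction of the procedure in the codex, and the function name is hypothetical.

```python
from fractions import Fraction


def fill_with_means(start: int, ratio: int, steps: int) -> list[Fraction]:
    """Fill successive intervals [low, low * ratio] with their harmonic and arithmetic means."""
    series = [Fraction(start)]
    low = Fraction(start)
    for _ in range(steps):
        high = low * ratio
        harmonic = 2 * low * high / (low + high)
        arithmetic = (low + high) / 2
        series += [harmonic, arithmetic, high]
        low = high
    return series


doubles = [int(x) for x in fill_with_means(6, 2, 3)]
triples = [int(x) for x in fill_with_means(6, 3, 3)]

print("double intervals:", doubles)
print("triple intervals:", triples)
```

Starting from 6 rather than 1 keeps every mean a whole number, which is consistent with the series as they appear in the text.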

Part of the Commentary on Plato’s Timaeus by Calcidius

In the codex ACTo 236, fol. 39, in the middle of Macrobius’s commentary on The Dream of Scipio, there is part of the commentary on Timaeus by Calcidius, “De modulatione siue Harmonica” (XL, XLI, XLII) [tab. 7] [tab. 8] (Moreschini 2003, 186–191; Waszink 1975, 89–91). XL is complete, “Itaque figura similis eius quae paulo superius” (ACTo 236, fol. 39r.7), and so is XLI, “Quia VI numeris facit unum limitem et item XII” (ACTo 236, fol. 39r.10). XLII is incomplete; the beginning, “Haec eadem ratio”, is cut off by the binding.

Commentary on Somnium Scipionis by Macrobius

In ACTo 236, fols. 1r-61v, Book I goes from fol. 1r, “Inter Platonis et Ciceronis” (I.1.1), to fol. 35v.18, “disputationem sequentium reseruemus” (I.23.13) (Armisen-Marchetti 2001, 1–134; Willis 1970). Book II goes from fol. 35v.19, “Superiore comentario Eustathi” (II.1.1), to fol. 61v.28, “philosophiae comtinetur integritas” (II.17.17) (Armisen-Marchetti 2003, 1–869; Willis 1970). Passages I.5.7 to I.5.13 (fol. 6r) are missing. In the first part of the commentary (fol. 6r.4), the first citation of the dream (I.5.2) starts by mentioning the notion of plenitude of arithmetic, “Ac prima nobis tractandam . . . ”, and goes to fol. 12v.69, which concludes with the arithmetical excursus (I.6.83), “singulos certa lege metitur”. In the left margin (fol. 6v.19–22), where he talks about the virtues of the number 7 (I.6.3), a diagram appears with the two series 1, 2, 4, 8 and 1, 3, 9, 27, recalling God as the creator of the soul, taking even and odd numbers with doubles and triples. It defines the virtues of the main numbers 8, 7, 1, 6, 2, 5 and 3, 4 (I.5.15 and I.6.23) (Fig. 11). He calls the number 8 justicia (justice): 8 = 7 + 1; 8 = 2 × 4; 8 = 2 × 2 × 2; 8 = 5 + 3. Macrobius called the number 7 pleno (full): 7 = 1 + 6; 7 = 2 + 5; 7 = 3 + 4. He calls 1 monás (monad); 1 is both male and female, and even and odd at the same time. The number 6 has the parts 1/6, 1/3, and 1/2: 6/6 = 1, 6/3 = 2, 6/2 = 3; 1 + 2 + 3 = 6. The number 2, called diada (dyad), is considered the first number after 1. The number 5 is the supreme God and is the sum total of the universe. The number 3 is the number situated between the numbers 1 and 5. The number 4 is the first number to obtain two means.

Fig. 11 Detail from the commentary on Somnium Scipionis. (ACTo 236, fol. 51v and fol. 52r)

In Book II, the part dealing with music, the harmony of the spheres (II.1.1) appears, and he explains why we do not hear the music of the spheres (II.4.15), non capitat audium (fol. 43r.4). Both fols. 46v-47v, with diagrams of the earthly and celestial orbs, and fol. 51bis contain the figure with the definitions of diapente and diatessaron. The diagram explains the harmonic relations (II.1.15) in a way similar to that in which they are described in Calcidius’s commentary on Timaeus XLIX (Waszink 1975, 98–99). He defines the harmonic relations (II.1.4), as well as the concepts of epitrite (4:3), hemiolia (3:2), double (2:1), triple (3:1), quadruple (4:1), and epogdus (9:8) (fol. 36v.23). The dià tessárōn comes from the ratio of the epitrite, the interval dià pénte from the hemiolia, the dià pason from the double, the dià pason kaì dià pénte from the triple, the dis dià pason from the quadruple, and the tónos (II.1.15–20) from the epogdus, which is completed with the definition of a semitone 243:256 (II.1.15–22), which the Pythagoreans called diesis (fol. 37r.13).
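The relations between these intervals follow from multiplying and dividing their ratios. The following sketch uses the standard Pythagorean arithmetic (ratios written with the larger term first) to show how the intervals named above combine; the dictionary keys and variable names are hypothetical labels chosen for illustration.

```python
from fractions import Fraction

# Ratios of the intervals named in the passage, larger term first.
INTERVALS = {
    "dia tessaron (epitrite)": Fraction(4, 3),
    "dia pente (hemiolia)": Fraction(3, 2),
    "dia pason (double)": Fraction(2, 1),
    "dia pason kai dia pente (triple)": Fraction(3, 1),
    "dis dia pason (quadruple)": Fraction(4, 1),
    "tonos (epogdus)": Fraction(9, 8),
}

fourth = INTERVALS["dia tessaron (epitrite)"]
fifth = INTERVALS["dia pente (hemiolia)"]
octave = INTERVALS["dia pason (double)"]
tone = INTERVALS["tonos (epogdus)"]

# Adding intervals corresponds to multiplying ratios,
# and taking their difference corresponds to dividing them.
octave_from_parts = fourth * fifth      # a fourth plus a fifth
twelfth = octave * fifth                # an octave plus a fifth
double_octave = octave * octave         # two octaves
tone_from_parts = fifth / fourth        # a fifth minus a fourth
semitone = fourth / (tone * tone)       # a fourth minus two tones

print("fourth + fifth    =", octave_from_parts)
print("octave + fifth    =", twelfth)
print("octave + octave   =", double_octave)
print("fifth - fourth    =", tone_from_parts)
print("fourth - two tones =", semitone)
```

The last line gives 256/243, the same pair of numbers that the text records as the semitone 243:256.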

Part of Geometria from Martianus Capella’s Marriage of Philology and Mercury

The excerpt in ACTo 80, fols.160v.28-fol.161r.5, is from Book VI, Geometria, of the De Nuptiis Philologiae et Mercurii, and goes from “Ergasticis Schematibus” (715) to “Ex his aloge XIII fiunt, quarum prima dicitur Mese Alogos” (720) (Capella 2001, 486–491; Willis 1983). It defines the two different types of plane figures: ergastic figures, which contain the precepts to form any figure, and apodictic figures, which provide evidence. The methods used are systatikós, which creates a triangle from a few lines; tmēmatikós, which shows how lines can be cut for a procedure; anágraphos, which shows how a line can be joined and described; éngraphos, which shows how a figure can be inscribed in a circle; perígraphos, which shows how a figure circumscribes a circle; parembolikós, for equivalent polygons; and proseuretikós, for finding the mean proportional between two lines of different lengths. The excerpt from De Nuptiis Philologiae et Mercurii defines three types of angles: regular angles, which are right and always identical; narrow angles, which are acute and variable; and, finally, wide angles, which are obtuse, variable like acute angles, and wider than right angles. The lines are isotes when two equal lines are in proportion to a mean line the length of which is equal or double; homólogos, when the lines meet; análogos, when a line is twice as long as another line but half the length of a third one; and álogos, or irrational, in which there is no proportional coincidence. All lines are either rhētós or álogos. The first of these is rational and can be compared with a common measurement, while the second does not match any measurement, so it cannot be compared. Lines may also be classified as those that are the same as others, symmétrus, and those that are not, asymmétrus. It is not only length that makes them commensurable but also their strength, in which case they are referred to as dynámei symmétroi. Those lines that are the same length are called mêkei symmétroi. Those that differ in either length or in strength are asymmétros. These lines give rise to another 13 irrational lines.

Geometria Incerti Auctoris by Gerbert (Silvester II)

ACTo 80 fols. 159r.1-160v.27 deals with questions of geometry ascribed to followers of Gerbert. The codex does not appear in the main editions of the work; the text corresponds to Caput XIV-Caput XXXII (Silvester II 1853: 115–127) and Cap. XIV-XXXII (Olleris 1867, 427–441), from Liber IV and Liber III (Bubnov 1899, 317–330, 336–338). Geometria consists of 20 propositions from different sources. The definitions P-1 and P-2 are gromatic, while P-3, P-4, P-5, P-6, and P-7 are utilities of the astrolabe of Arab origin, as are P-8 and P-9, though the latter had its foundations in optics. P-18, P-19, P-20, and P-21 are ascribed to Euclid’s Optics, as are P-10, P-11, P-13, P-15, and P-16. Finally, triangle proportionality is defined in P-12, P-14, P-17, and P-20, including isosceles (P-18) and Pythagorean (P-19) triangles. These proportions use a proportional base whose unit is 12, which can in turn be divided as 1:1, equality; 2:1, dupla or diapason; 3:2, sesquialtera or diapente; and 4:3, sesquitertia or diatessaron. Of particular interest in this codex are the different interpretations given of proposition P-20. In the editions of Olleris (1867) and Bubnov (1899), the slope is constructed with a vertical auxiliary element, which may be either a rule or a plumb line uhz, unlike the Migne edition (Pez 1853) and ACTo 80, fol. 161v, in which uhz is constructed horizontally. In the diagram included in ACTo, a set square is used on the line of sight bd at point d to determine point u and uh, unlike the Pez edition, which extends the line of sight bd until it meets the horizontal line hz. A comparative analysis of these interpretations is shown in Fig. 12.

[Figure 12: three diagrams with points a, b, d, e, g, h, u, z, labelled P-19 (Pez XXXII), P-19 (OII.52, Bub.59), and P-19 (ACTo 80), and annotated with the proportions (zh/hd) - (ge/gb), (uh/hd) - (gb/gd), (eb/ab) - (ge/gd), and (uz/hu) - (dg/ab)]

Fig. 12 Comparison of different editions of Prop. 20 from Geometria incerti auctoris: (left) Pez (1853); (center) Olleris (1867) and Bubnov (1899); (right) ACTo 80, fol. 161r


Fig. 13 Detail of the positional numbering system found in ACTo 80, fol.162r.1-3

The Positional Number System of Adelard of Bath

Indo-Arabic numbering is found in ACTo 80 (fol. 162r.1–3), arranged in three lines. In the first line appear the figures 2, 3, 4, 5, 6, 7, 8, and 9; in the second, the numbers 11, 12, 13, 14, 15, 16, 17, 18, 19, and 20; and in the third, 30, 40, 50, 60, 70, 80, 90, and 100 (Fig. 13). The sign for zero, which has approximately the shape of the letter tau (τ), is used specifically to write the tens. The Mozarabic tradition of the codices Vigilianus, written in 976, and Aemilianensis, written in 992, arranges the numbers in descending order, 9, 8, 7, 6, 5, 4, 3, 2, 1 (Menéndez 1959, 45–116). The same is done by Abraham Ibn Ezra (1140–1167) in Sefer ha mispar (Silberberg 1895, 2), Leonardo Pisano (1180–1250) in Liber abaci (Boncompagni 1854, 253), and Alexandre Villedieu (c.1175–1250) in Carmen de Algorismo (Halliwell 1841, 3–27, 73–83), as well as in the Algorismus vulgaris by Joan Sacrobosco (1200–1256) (Curtze 1897, 1–19). Pedro de Dacia (c.1235–1289), Rector of the University of Paris and canon of the cathedral, introduces it in this sphere with Commentum Magistri Philomeni de Dacia (Curtze 1897, 20–92), as does the medieval manuscript MS 1 from Cashel cathedral in Tipperary, Ireland, held in the G.P.A. Bolton Library (Burnett 2006, 15–26). The sign τ, used to mark the position of the tens, from 20 to 90, and of the hundreds, was also employed by Johannes Ocreatus (fl. 1200) and the followers of Adelard of Bath (1090–1160) (Burnett 1996, 221–331).
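The role of the sign τ as a place holder can be illustrated by writing out the three lines of numbers described above with τ standing in for zero. This is a purely schematic sketch: it assumes that τ simply occupies the empty position, as in modern decimal notation, and it does not attempt to reproduce the actual glyphs or layout of the manuscript. The names used are hypothetical.

```python
ZERO_SIGN = "τ"  # stand-in for the tau-shaped zero described in the text


def with_tau(n: int) -> str:
    """Write n in decimal positional notation, using the tau sign for every zero."""
    return str(n).replace("0", ZERO_SIGN)


# The three lines of numbers as listed in the text.
rows = [
    list(range(2, 10)),         # 2 ... 9
    list(range(11, 21)),        # 11 ... 20
    list(range(30, 101, 10)),   # 30, 40, ... 100
]

rendered = [[with_tau(n) for n in row] for row in rows]

for line in rendered:
    print(" ".join(line))
```

Only the round tens and the hundred require the sign, which agrees with the statement that τ is used to write the tens.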

Practica Versus Theorica of Tortosa Cathedral The “traça de Guarc” (c.1345–1380), ink on parchment (917 × 682 mm), Fábrica no. 49 (ACTo), is a plan for a Gothic cathedral chevet which is different to the one that was built (Fig. 14). The existence of a modular structure with a reiterative transfer of measures – starting from the lateral chapels – was deduced from an analysis of the parchment’s auxiliary lines, traced with stylet and compass. This

1220

J. Lluis i Ginovart

Fig. 14 The parchment of Antoni Guarc (c.1345–1380)

allowed the method to be used to trace the octagon of the ciborium and then determine the double heptagon at the apse. Guarc’s structure is based on a modular system in base 9: this provides the proportions of 18 (9 + 9) for the central nave, 9 for the side aisles, and 8 for the chapels. The 3 × 3 structure was often used at that time. Raymundus Lullus (1232–1315) based his constructions on this structure, while Juan de Herrera (1530–1597), in El trazado de la figura cúbica (c.1580), still recognized the 3 × 3-based structures (Herrera 1988, 19–106). The assessment of Guarc’s layout is based on the analysis of points, compass layouts, and auxiliary lines. It may lead to establish an interpretive methodology of the possible graphic construction of the main geometrical figures. An analysis of the preceding imprints to the final plans allows us to establish an interpretive methodology of the possible graphical construction (Fig. 4). In particular, several points on the support penetrate the parchment, and some points are located on the perimeter to fasten the parchment to its support (Pa1). Other points are from compass marks that could be used to transport measurements (Pa2) or for tracing circumferences, which penetrate and break the surface of the parchment (Pa3) (Fig. 15). The lines in the parchment are drawn using a punch for straight sections (La4) or through a two-pointed compass for circular layouts (La5). Two different types of graphic techniques were used to draw the lines: some are laid out with sharp tools that alter the surface of the skin; others are similar to graphite strokes. The technique that uses sharp tools frames the drawing and establishes the proportions, whereas the graphite strokes were placed as auxiliary lines and then fixed with

43 Mathematics and the Art and Science of Building Medieval Cathedrals


Fig. 15 Imprints of points, auxiliary lines, and plotting, Tortosa

final ink strokes (La6). An analysis of the auxiliary lines and points gives us the drawing sequence and determines the number of graphical operations. Some points are used only once, and others are used twice; others, such as PO.1, are used up to five times and form most of the base of the auxiliary drawing. This point is known as the focal point or point of origin. Moreover, PO.2 is the center of the dome, and PO.3 is the keystone of the choir and the center of the layout of the apse (Fig. 16). The sequence of auxiliary lines should have been as follows (Fig. 17):
• T0. Fixing points on the edge of the parchment; the incisions pierce the surface.
• T1. Fitting lines of the drawing. The border lines are T1.1 and T1.2. A point on T1.1 is used to lay out T1.3; the measure between lines T1.1 and T1.3 is the module used for the layout of the plan.
• T2. Fitting lines of the plan. T2.1 is located at the mouth of the chevet. It is the starting point for determining the ratio of the width of the naves. A line outside the main drawing, T2.2, is drawn and divided into six equal parts.
• T3. Vertical composition lines of the plan. T2.1 contains points with a dual function. Some are used for the layout of the dome, and others to set the proportions of the central nave in relation to the lateral naves. The first point is located at 1/3 of the total length and the second at 1/6. The lines are drawn in the following sequence: the total width of the plan with T3.1 and T3.2, the width of the central nave with T3.3 and T3.4, and finally the widths of the side naves with T3.5 and T3.6.


Fig. 16 Auxiliary imprints typology. 1, line; 2, points

Fig. 17 Point plotter and auxiliary lines sequence tracing


• T4. Horizontal composition lines of the plan. The horizontal lines T4.1 and T4.2 cross the verticals drawn before, shaping the structure of the nave. At this point it was possible to draw either the dome or the apse.
• T5. Composition lines of the chevet. An arc named T5.1 is drawn from T4.1 to T2.1 with its center at PO.1. Next, the vertical line T5.2 is drawn, whose end determines the location of the semicircular chevet. This process can be interpreted as the method for laying out the heptagon.
• T6. Layout lines of the apse. The line T5.2 has the same length as the distance between the points (T2.1-T3.3) and (T2.1-T5.1). It determines the location of the straight section of the presbytery, marked by the line T6.1. The point PO.3 is the center of the presbytery and of the arc T6.2 and its correction T6.3, set according to lines (T3.5-T6.1). The arc T6.4 then completes the presbytery.
• T7 and T8. These are the main lines in the auxiliary operations that determine the construction of the apse.
From the Guarc parchment, it has been possible to determine that module 9 on the axis of the lateral chapels was divided between 8 units for the width and 1 unit for the separation wall. This module of 8 in the lateral chapel was transferred to the ambulatory arc and determined the radial chapels. In order to trace the heptagon, an arc was lowered over the straight section of the presbytery, dividing it into 8 units and 10 modules. Guarc transfers the measure of 8 to the straight section of the presbytery, creating the first chapels in the apse, and then places the measure on the arc seven times, with a compass opening of 8 units. From here, he establishes the interior construction of the radial chapels. Guarc also establishes a relationship between the two axes: the width of the side nave is 9 and the width of the radial chapel is 8, so their proportional ratio is 9:8.
The ratio (9:8) between the geometric layout of the Traça de Guarc and the general laying out of the masonry in the radial chapels (1383–1424) – 24 palms wide and traced at 54 palms – is the same. The relationship between a circumference radius of 18 modules and a side of 8 on the 14-sided polygon in turn provides a geometric, arithmetic, and metrological solution (Fig. 18).

The Construction of Heptagons

At the beginning of his construction, Guarc locates point P13 on the line T3.6, which is perpendicular to the line T2.1. The location of point P13 is evident from a dark mark of ink, which reveals a direction of movement. It cannot be determined whether point P13 is a measure carried from T5.2 with the compass or whether it is instead the starting point of the layout of the heptagon, with the line T5.2 drawn from P13. The result would be the same in both cases. The layout of the curve T5.1, centered at PO.1, is very evident from the mark left on the parchment. It is used to transfer the measure (PO.1-P13) to T2.1, where the point P14 is located (Fig. 19).


Fig. 18 Antoni Guarc’s parchment metrical structure (c.1345–1380)

Fig. 19 Sequence of the heptagon layout. Plan by Guarc, ACTo Fábrica no. 49

The vertical line T5.2 is located between points (P14-P15) and measures the same as the segment (P2-P14). Guarc determines this measure as the unit measure of the radial chapels, which will be the same as the measure of the chapels of the lateral nave. This measure will also define the proportion of the side of the heptagon.


The horizontal line T6.1 is placed through the point P15, and the center point of the circumference of the apse, PO.3, is located on the line T6.1. The semicircle T6.2 is drawn with center at PO.3 and radius (P16-PO.3). The distance (P2-P14 = P14-P15) is marked on the semicircle T6.2 from point P16, obtaining P17. The operation is repeated in turn with the same measure from P17, obtaining P18, P19, P20, P21, P22, and P23. Thus, the semicircle is divided into seven equal parts. To complete the layout of the chapels, the semicircle T6.4 is drawn with the center at PO.3 and the radius at point P24. Next, the six auxiliary radial lines, T7.1 to T7.6, are laid out, connecting the center PO.3 with the points P17 to P22. The shape of the back of the chapels is drawn with the lines perpendicular to the segments formed by the succession of points P16 to P23. Thus, 14 orthogonal lines are drawn, T8.1 to T8.14, and the points P24 to P37 define the depth of the chapels. The method used by Guarc to trace the cathedral apse has nothing in common with the surviving major treatises in the geometrical tradition of the Gothic builders. In general terms, the methods for tracing the heptagon in Gothic geometry work by using either angular division – the method used by the makers of geometrical instruments such as the astrolabe and the quadrant – or the division of the circumference by means of the compass, as already established by the gromatic tradition. The angular division method is illustrated by the Practica Geometriae (c.1125–1130) by Hugh of Saint Victor (1096–1141), in the definition of Altimetria (p. 10–14) (Baron 1956, 194–198; Frederick 1991, 31–70), and by the Practica Geometriae (1346) by Dominicus de Clavasio (fl. 1346), Instrumentum gnomonicum construere (Monac.410, fol.76r) (Busard 1965, 530–531).
These methods for the angular division of the circumference arc survived until the Renaissance and were used by Bouvelles (1478–1567) in the Geometrie pratique (Bouvelles 1542, 25) and in the Compendio de Arquitectura y Simetría de los Templos by Simón García (1681), which is the compilation of Gothic knowledge by Rodrigo Gil de Hontañón (1500–1577) (García 1991, fol. 64v). These systems led to trigonometric tracings such as the Problemata geometrica (1583) by Stevin (1548–1620), in his Tomus Secundus Mathematicarum hypomnematum de Geometriae Praxi (Lib2. pro.7), using a 360°/7 angular division, with the ratio of 10,000,000/8,677,676 (Stevin 1605, 66). La siensa de atermenar (1401), by Bertrand Boysset (1355–1415), contains a chapter about the layout of the heptagon: Quapitol d’atermenar posesion partida que fos tota redona. The graphic solution is achieved using a square-shaped landmark with some clefts indicating the heptagon diagonals. This method suggests the angular division of the circumference applied to a geometric template (Portet 1995). However, the first geometry disseminated widely beyond the lodges of architecture was in German – the Geometrie Deutsch (1488) by Matthäus Roriczer (d. c.1495) – in which the heptagonal construction takes place by means of a combined operation with the compass, starting from the equilateral triangle with a side equal to the circumference radius (Hoffstadt 1847, 17–18; Recht 1980, 24–25; Roriczer 1999, 56–60; Shelby 1977, 118–119) (Fig. 20.1). The major treatise writers of the Renaissance also used this construction, among them Dürer (1471–1528) in Underweysung der Messung (1525) (L II. 12), together with the pentagon corollary (L II. 15) (Fig. 20.2).


Fig. 20 Heptagonal layout 1: Roriczer (1488), 2: Dürer (1525)

The metalsmith Juan de Arfe (1535–1603), in his De varia commensvracion para la escvltvra y architectura (1585), LI. Cap.II, Heptágono (Arfe 1585, 7v), describes a method to trace a heptagon. The same is true of the manuscript from the Arab tradition (L. I Cap 15) by Diego López de Arenas (d. c.1640), in the Primera y segunda parte de las reglas de la carpintería (1616), published in 1633 (López de Arenas 1633, 16–17; Nuere 2001, 155). This tradition was continued by Fray Lorenzo de San Nicolás (1593–1679) in Arte y Vso de Architectvra (1633) LI, Chap. XLII (Lorenzo de San Nicolás 1633, 80v). These solutions work by means of some variation on the plotting of the heptagon set out by Abū al-Wafā al-Būzjānī (940–998) in the Kitāb fī mā yaḥtāju al-ṣāniʿ min al-aʿmāl al-handasiyya (c.993–1008), propositions Y47B, Y47C, and Y47D (Aghayani-Chavoshi 2010), which was disseminated across Europe during the Renaissance (Raynaud 2008, 34–83) (Fig. 21). Another possible solution to the plotting of the heptagon is based on the proportion between the side of the polygon inscribed in the surface of a heptagon and its diameter, as in the pseudo-Heronian Metrica, attributed to Hero of Alexandria (c.20–62) (I, 29 Theorem 54), Dimetiendi rationes (I, XX) (Schoene 1903, 55–59). Other approaches can be found in the pseudo-Boethian Geometria II, defined in De multiangulis figuris, De eptagoni (Friedlein 1867, 420–421). Similarly, an approach can be found in the work of Giorgio Valla (1447–1501), De expetendis et fugiendis rebus, with one part dedicated to the six books of Geometry (Valla 1501 Exp.et Fug. Lib XIIII et Geometriae V). Importantly, an approach similar to Guarc’s layout (c.1345–1380) is found in the Manifiesto Geométrico del Muy Reverendo Padre Fray Ignacio Muñoz (1683),


Fig. 21 Geometric construction of the heptagon


by the Dominican mathematician Fray Ignacio Muñoz (1612–1685), which was published almost two centuries after the Guarc parchment (Muñoz 1683). The Manifiesto Geométrico was intended as an addition to Euclid’s Elements Book IV. It determines that the heptagon’s side has a ratio of 4:9 with the figure’s main diagonal and traces the other five vertices using a method that is different to Guarc’s. In the treatise, he refutes J. Kepler (1571–1630) and takes issue with the cosmographer Luis Serrán Pimentel (1613–79), saying that he has used angular division based on the trigonometric tables of P. Joseph Zaragoza (1627–79). The search for a geometrical solution to the heptagon layout problem continues today, through the inscription in other polygons. For example, Miranda Lundy’s Sacred Geometry (1998) uses a method, “seven from three”, that starts with a regular triangle and hexagon inscribed in the circumference (Lundy 1998, 36–37). In Mark Reynolds’ construction (2001), From Pentagon to Heptagon: A Discovery on the Generation of the Regular Heptagon from the Equilateral Triangle and Pentagon, the construction of the regular heptagon is related to the pentagon, through the inscribed and circumscribed circumference of the pentagon (Reynolds 2001, 139–145). Nevertheless, practical geometry has other methods capable of tracing the heptagon that were not passed down in the learned geometry texts. In the six-knot method, using an extended rope split into five segments, two circumferences are traced with a radius of four parts from both edges; their intersection determines the heptagon’s side. In the 14-knot method, with a triangle (4, 4, 5) over the side of 5, a line is traced from the first knot to the opposite vertex, obtaining the heptagon side (Lundy 1998, 36–37).
In the seven equal segments method, a neusis construction, a segment determines the base for the construction of a triangular figure, while the other six intersect two by two over the bisector of the base segment until three of their points are aligned, forming a triangle. As a template, the method can be constructed easily with a folding pocket rule, which articulates the instrument in its segments. Some of the later classical treatises dealt with a proportional solution between the side of the inscribed polygon and the heptagon’s surface, such as the Metrica of Hero of Alexandria (c.20–62) (I, 29 theorem 54), also known as Dimetiendi rationes (I, XX). Starting from the regular hexagon, Hero deduces that the equilateral triangle constructed on the radius – to which he assigns a length of 8 units – is 7 units high (I, IXX). Hence in proposition (I.XX), a heptagon of side 7 has a radius of 8, giving a ratio of circumference diameter to side of 16:7. This gives a commensurable solution similar to Guarc’s proportion (Bruins 1964; Schoene 1903, 55–59). A similar method in Geometria II of Boethius, De multiangulis figuris, De eptagoni, relates a 6 foot side with a heptagon surface of 81 feet (Friedlein 1867, 420–421). Using a computer simulation of Guarc’s method for the division of the circumference of the 18-unit radius into seven equal parts, a side of 8011 is obtained. Analyzing the different methods, we obtain (Fig. 22):
• Ancestral 1, rope with 6 knots (6/5): 6 of 8010 and 1 of 8014
• Ancestral 2, ribbon with 13 knots (4, 4, 5): 6 of 7994 and 1 of 8113
• Ancestral 3, neusis 7 L, equal segments: 7 sides of 8011


Fig. 22 Heptagon methods with R = 9

• Heronis Alexandrinus (c.20–62) (16/7): 6 of 7875 and 1 of 8823
• Abu’l-Wafa (c.990) and its derivatives: 6 sides of 7994 and 1 of 8113
• Guarc (c.1345–1380) (18/8): 6 sides of 8000 and 1 of 8075
• Fray Ignacio Muñoz (1683) (9/4): 1 side of 8000, 4 of 8009, and 2 of 8019
• Simon Stevin (1605), angular 360/14 division: 7 sides of 8011
• Miranda Lundy (1998), the triangle: 4 of 8046, 2 of 7985, and 1 of 7920
• Mark Reynolds (2001), the pentagon: 6 sides of 8016 and 1 of 7981


An analysis of the accuracy of the solutions, each of which was acceptable and valid in its own time, shows that Guarc’s method has a smaller error than Abu’l-Wafa’s and its derivatives and an accuracy similar to the 13-knot layout. Only the ancestral six-knot method and Fray Ignacio Muñoz’s method are more accurate than Guarc’s.
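The figures listed for Guarc can be checked with a few lines of trigonometry. The sketch below is only an illustration and rests on two assumptions that the text does not state outright: that the listed values are thousandths of a module for a radius of 18 modules, and that "6 sides of 8000 and 1 of 8075" means six compass steps of 8 modules along the semicircle plus the one chord that closes it. The names `closing_chord`, `RADIUS`, and `CHORD` are hypothetical. Under this reading the script gives about 8.011 for the exact side and about 8.075 for the closing chord.

```python
import math

RADIUS = 18.0  # modules: radius of the apse circumference given in the text
CHORD = 8.0    # modules: chapel measure that Guarc steps along the arc


def closing_chord(radius, chord, steps):
    """Return the chord left over after stepping `chord` along a semicircle.

    Each step subtends a central angle of 2*asin(chord / (2*radius)).
    Whatever remains of the 180 degrees is closed by one last chord.
    """
    step_angle = 2.0 * math.asin(chord / (2.0 * radius))
    remaining = math.pi - steps * step_angle
    return 2.0 * radius * math.sin(remaining / 2.0)


# Side of the exact regular 14-gon (seven equal sides on the semicircle).
exact_side = 2.0 * RADIUS * math.sin(math.pi / 14.0)

# Assumed reading of the list entry: six steps of 8 modules plus one closing chord.
last_side = closing_chord(RADIUS, CHORD, 6)

print(f"exact side: {exact_side:.3f} modules")
print(f"Guarc: six sides of {CHORD:.3f} and one of {last_side:.3f} modules")
```

The same helper could be called with other chord and radius pairs, but whether the remaining entries in the list were produced this way is not something the text confirms.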

The Construction of Octagons

The layout of the dome drawn on Guarc’s parchment requires a method for constructing the octagon. Guarc therefore takes the main line, T2.1, at the foot of the presbytery as the base and constructs the structural square that contains the dome. A compass point is observed at P1, P2, P3, and P4, which was how the center of the square, PO.2, was laid out. This point is determined by the intersection of the diagonals (P1-P3) and (P2-P4), where the auxiliary layouts of graphite are still visible. The opposite vertices of the square, P1 and P3, have two compass marks, unlike the rest. The points P5 and P6 are obtained by rotating the segment (P1-PO.2) about the vertex P1. The same sequence is carried out at point P3 and gives points P7 and P8. The distance (P5-P7), like (P6-P8), is the measurement of the side of the octagon. Points P9, P10, P11, and P12 are obtained by repeating this measurement with a compass, whose marks can be observed at each point (Fig. 23). However, practical methods for the construction of this figure were used in the late classical world. This is the case of the Gromatic text Fragmentum de hexagono et octogono (Fig. 24.1), which is attributed to Marcus Terentius Varro (116–27 BC) (Bubnov 1899, 552). This text contains the drawing of the octagon, whose squaring construction was widely used in Roman flooring (Watts 1996, 165–181). This method was also considered by Hero of Alexandria in the Metrica (c.20–62) (LI.XVIII) (Fig. 24.2) (Bruins 1964, 1153; Schoene 1903, 57–59). This method is considered a reference for the plan of the Horologion of Andronikos (Tower of the Winds, Athens, first century BC) (Svenshon 2010, 103–112) and in some apses with an octagonal layout (Cantor 1907, 377–379; Özdural 2002, 217–242). The pseudo-Heronian De mensuris also contains a construction of an octagon inscribed in a square (Heiberg 1914, 206–207; Høyrup 2009, 367–377).
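The compass construction described at the start of this section can be reproduced with coordinates. The following is a minimal sketch, not a reconstruction of the parchment: the placement of the square on the axes, the choice of which side each of P5 to P8 falls on, and the name `octagon_points` are assumptions made for illustration. It swings the half-diagonal (P1-PO.2) from P1 and from P3 onto the adjacent sides and checks that the distance (P5-P7) equals the side of the regular octagon inscribed in the square, which is the square's side divided by 1 + √2.

```python
import math


def octagon_points(side):
    """Swing the half-diagonal from two opposite corners onto the square's sides.

    The square is placed with P1 at the origin and P3 at (side, side); this
    placement is an assumption made only to obtain coordinates.
    """
    half_diagonal = side / math.sqrt(2.0)      # length of the segment P1-PO.2
    p5 = (half_diagonal, 0.0)                  # from P1, along the side P1-P2
    p6 = (0.0, half_diagonal)                  # from P1, along the side P1-P4
    p7 = (side, side - half_diagonal)          # from P3, along the side P3-P2
    p8 = (side - half_diagonal, side)          # from P3, along the side P3-P4
    return p5, p6, p7, p8


side = 18.0  # modules; an illustrative value for the base of the dome
p5, p6, p7, p8 = octagon_points(side)

octagon_side = math.dist(p5, p7)               # the distance (P5-P7) in the text
expected = side / (1.0 + math.sqrt(2.0))       # side of the regular octagon in the square

print(f"(P5-P7) = {octagon_side:.3f}, (P6-P8) = {math.dist(p6, p8):.3f}")
print(f"regular octagon side = {expected:.3f}")
```

Because the result involves √2, the octagon side is incommensurable with the side of the square, whatever module is chosen for the base.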
In the Middle Ages, there is also a method from the Arab tradition of Abū al-Wafā al-Būzjānī (940–998), the Book on those geometric constructions which are necessary for craftsmen (c.993–1008). In the method of W79, an octagon is inscribed in a square in the tradition of De mensuris (Fig. 24.3). This method would have been one of Guarc’s direct predecessors (Aghayani-Chavoshi 2010, W31). In Western culture, there is the figure of the mediatrix, which mediates between the square and circle, in De triangulis (c.1250) by Jordanus Nemorarius (1225–1260) (P.IV.15) (Beaujouan 1975, 453–454) (Fig. 24.4). The Gothic design of the octagon appeared in the Geometrie Deutsch (1488) (fol. 3r) by Matthäus Roriczer (c.1435–1495). It is drawn using the inscribed square and swinging down its semidiagonal (Hoffstadt 1847, 20; Recht 1980, 24–25; Roriczer 1999, 56–60; Shelby 1977, 119–120) (Fig. 24.5). An operating system was produced to


Fig. 23 Sequence of the octagon layout. Plan by Guarc, ACTo Fábrica no. 49

simulate the construction WG 18 in the Frankfurt album “1560–1572” (Bucher 1979, 219). The method is similar to the approach used 100 years earlier by Antoni Guarc (c.1345–1380). It is also similar to W79 by Abū al-Wafā al-Būzjānī (c.993–1008). The geometries of the Renaissance also considered the problem of the octagon. Luca Pacioli (1445–1517) considers this in the Summa de Aritmética, Geometría, Proportioni et Proporcionalità (1494), where he uses a method of squaring the circle that is similar to that of Nemorarius (Pacioli 1494, fol. 31v). He also addresses the issue in Divina proportione (1497) and applies integer arithmetic ratios between the diameter and the side of the octagon (Pacioli 1509, fol. 7r-7v). Leonardo da Vinci (1452–1519) in the Windsor codex 12542 (1478–1518) constructs the circle that is inscribed on the side of the square of the isoptic of the side of the octagon (Reynolds 2008, 51–76) (Fig. 24.6). Albrecht Dürer (1471–1528) in the Underweysung der Messung (1525) uses the reiterated process for the partition of the side of the square


Fig. 24 Octagon layout. 1 Gromatics (c.100 BC). 2 Hero of Alexandria (c.60). 3 Abu’l-Wafa (c.990). 4 Jordanus Nemorarius (c.1250). 5 Roriczer (1488). 6 Leonardo da Vinci (1478–1518). 7 Dürer (1525). 8 Arfe (1585)


with the compass, in the tradition of Nemorarius and Pacioli (Dürer 1525, L.II.14) (Fig. 24.7). In the architectural treatises, such as Il Primo libro d’architettura (1545) by Sebastiano Serlio (1475–c.1554), the Roriczer method is used (Serlio 1545, 19). In Spain, it appears in the architectural manuscript (c.1545–1562) of Hernán Ruiz el Joven (c.1514–1569) (Navascués 1974, Lam. XXIII). The goldsmith Juan de Arfe (1535–1603), of German origin, wrote De varia commensvracion para la escvltvra y architectura (1585), the most influential geometric treatise of the Spanish Renaissance, which addressed the octagon (Arfe 1585, fol. 7v-8r). He designs the figure by using the circumference and its arcs (LI.T1.C2.13), for which the reference could have been Albrecht Dürer. This influence also applies to the octagon that is inscribed in the square (LI.T1.C2.14), which could be influenced by Augustin Hirschvogel (1503–1553) in Ein aigentliche vnd grundtliche anweysung in die Geometria, Berg und Neuber (1543) (Hirschvogel 1543, 18). This method is similar to the approach used by Guarc (Fig. 24.8). These methods, which were inherited from the W78 and W79 of Abū al-Wafā al-Būzjānī (c.993–1008), persisted until Lorenzo de San Nicolás (1593–1679) in the Arte y Vso de Architectvra (Lorenzo San Nicolás 1639, fol. 144r).

The Geometria Fabrorum

The sequence in which Guarc traced his project (c.1345–1380) is similar to the way the cathedral chevet was built. The Gothic master laid out the stonework in the same way as he drew it. A comparison of the beginning of the Gothic cathedral and Guarc’s project – despite their considerable formal differences – shows that they are very similar both geometrically and arithmetically (Fig. 25). The construction of the chevet involved several masters. Guarc was not one of them, but they had to deal with the same problems. One of the main challenges was the staking out of a heptagon without knowing its center. The new cathedral had to replace the old Romanesque one, and the construction began while it was still in use. Thus, the new chevet surrounded the old one, and the center of the heptagon was inside the Romanesque building. The study of the parchment reveals a method for geometric plotting in which the ratio of Guarc’s plan (c.1345–80) is equal to that of the apse as built (1383–1424). The solution is both arithmetic and geometric. According to the theory of proportionality, if the presbytery has a width of 18, the chapels must have 8 modules. To build a chapel of 3 canas (24 palms), the radius must be 6 canas and 6 palms (54 palms). These metrical structures are directly related to some sources found in the ACTo (Fig. 26). To lay out a heptagon without knowing its center is a geometrical problem that does not have an easy solution. In the following section, a historical review of the learned methodologies from the point of view of Gothic construction is used to examine this issue.


Fig. 25 Metrology of the Guarc layout (c.1345–1380) and construction of the apse (1383–1441)

Fig. 26 Geometric structure of A. Guarc and the Cathedral (c.1345–1375)


On the other hand, the keystone of Tortosa Cathedral was 10 palms in diameter at the base, 3.5 palms in diameter at the neck, and 5.5 palms high from the upper surface of the rough tiling of the roof. The keystone volume has been calculated to be 3.64 m³ and its weight 87.47 kN. Thus, in the quarry, the stone block used for the keystone must have been approximately 8.77 m³. The block’s grinding, carving, and sculptural decoration are attributed to the sculptor Bartomeu Santalínia, who came from a family of goldsmiths and was employed to work on the construction of the cathedral for 59 days in the summer of 1439. The final cut of the keystone had to resolve both the problems involved in carving the iconography of the Coronation of the Virgin Mary and the problems related to the geometry of the Gothic stonework. The lower carving is set out on a circumference, and the neck of the keystone had to accommodate the nine diagonal arches of the presbytery that converged on it. There must be a relationship between the circumference length, its diameter, and the dimensions of the diagonal arch that meets the keystone. As a result, this circumference had to be divided into seven equal parts. The keystone was cut prior to the completion of the diagonal arches, and the size of the mold of the arch was dictated by the mold of the base of the column placed at the presbytery. The geometry of the neck needed to accommodate the diagonal ribs. Its size can be determined using the simple proportion 9:4, either as the ratio of the circumference to the rib or vice versa. The ratio 9:4 (radius of the circumference to the side of the inscribed polygon) is highly effective for the division of the circumference into 14 equal parts. The arches’ axes are equidistant, 36 cm, and the neck of the keystone has a radius of 81 cm, corresponding to the ratio of 9:4.
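How close the integer ratio 9:4 comes to an exact division into 14 parts can be checked directly. The sketch below is illustrative only: it compares 4/9 with the exact side-to-radius ratio of the regular 14-sided polygon, 2 sin(π/14), and confirms that the 81 cm and 36 cm quoted for the keystone reduce to 9:4. The variable names are hypothetical, and the comparison with the sine formula is a modern check rather than anything the medieval sources used.

```python
import math
from fractions import Fraction

# Exact side/radius ratio of a regular 14-sided polygon inscribed in a circle.
exact_ratio = 2.0 * math.sin(math.pi / 14.0)

# The workshop ratio from the text: radius 9 to side 4, so side/radius = 4/9.
workshop_ratio = Fraction(4, 9)

# Signed relative error of the integer ratio against the exact value.
relative_error = (float(workshop_ratio) - exact_ratio) / exact_ratio

# Keystone measurements quoted in the text, in centimetres.
keystone_ratio = Fraction(81, 36)

print(f"exact side/radius : {exact_ratio:.5f}")
print(f"4/9               : {float(workshop_ratio):.5f}")
print(f"relative error    : {relative_error:.3%}")
print(f"81:36 reduces to  : {keystone_ratio.numerator}:{keystone_ratio.denominator}")
```

The integer ratio falls short of the exact value by a little over one part in a thousand, which is the sense in which it can be called effective for setting out on site.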
The keystone of the presbytery of Tortosa Cathedral, which represents the Coronation of the Virgin Mary after her Ascension into Heaven surrounded by a choir of ten angels, is a highly symbolic element that overlooks the high altar and was the culmination of the cathedral’s construction. The keystone was placed during a public ceremony on Sunday 27 September 1439, the feast day of the Virgin Mary’s Assumption. The geometric and topographic metrology of the clau major of Tortosa Cathedral, its main keystone, is strongly based, in symbolic terms, on Saint Augustine’s number theory (10 × 100) (Fig. 27). Following his teaching, the keystone’s theoretical diameter is 10 palms (2.32 m), and its position is at a height of 100 palms (23.23 m). It thus represents the number 1000, the perfect number in the fullness of time that Saint Augustine defined in De Civitate Dei (XX.7.2). The keystone represents the volumetric concept of space. The square figure is flat, and it is given height to make it three-dimensional or volumetric. This appears in the codex ACTo 20, fols. 337v-338v.

Mathematics and the Art and Science of Building Medieval Cathedrals

In the codex ACTo 80, Capella describes two kinds of lines, rhētós and álogos. The plot of the 14-sided polygon used in Guarc’s elevation, as in Tortosa’s apse,


Fig. 27 The Coronation of Virgin Mary in the main keystone of Santa Maria Cathedral, Tortosa

uses the ratio 9:8 to define the width of the nave and the chapel. This makes the chapels in the straight section of the apse and those located on its plumb line commensurable and equal. In Capella’s terms, it is rhētós. The ratio 9:8 appears in Calcidius (ACTo 80) and in Macrobius (ACTo 236) as the whole plus one-eighth (1 + 1/8), which is called epogdous. The geometrical layout of Guarc’s octagon is similar to the one published in the Geometrie Deutsch. This layout is álogos, that is, incommensurable. The arithmetical transposition of the octagon may be based on Hero of Alexandria’s Metrica. It enables the drawing of a 7.5 side, equal to the depth of a lateral chapel and a dome with an 18-module base. When Tortosa’s Gothic cathedral was constructed, the remains of a previous Romanesque cathedral still existed. The builders were therefore unable to draw the chord of the circle within the apse. Using Guarc’s ratio, 9:4, circumscribing the polygon, it was possible to draw it. Furthermore, using a 2-9-9 triangle and a 9-8-9-4 trapezium, and by gradually rotating the two polygons on their sides, the apse could be plotted without knowing its center (Fig. 28). The 18:8 ratio and the 9-8-9-4 trapezium are similar to the 13-18-13-8 trapezium in the Hibbur ha-Meshihah ve-ha-Tishboret of 1116 by Abraam Bar Hiia (1070–1136) (L II. 77) (Millàs 1931, 62) and the Practica geometriae of 1223 by Leonardo Pisano (c.1180–1250) (Boncompagni 1854, 78–81) and have similar solutions (Levey 1952) (Fig. 29). Guarc’s ratio, 9:4, does not appear in any learned treatises, but it is an instrument of the geometria fabrorum that provides a geometric and arithmetic solution. In Tortosa Cathedral, taking as a starting point the measurement of the chapel, which


Fig. 28 Laying out the apse of Tortosa Cathedral (1383–1441)

Fig. 29 Laying out the apse of Tortosa Cathedral (1383–1441) with trapezium 9-8-9-4


is 3 canas (24 palms), all the measurements of the apse, both in the floor plan and in the cross section, are implemented as an algorithm. The large measurements are related to the numerical modulations diapente and diatessaron, which were well known to the canons who had read ACTo 80 and ACTo 236. The Prochiron gothic liturgy confers particular importance on lateral chapels. Moreover, chapels are a financing instrument of the masonry, as well as a support providing balance during the staggered construction of Catalan cathedrals (Font 1891, 9–14; Puig i Cadafalch 1923, 65–87). From a metric point of view, the radial and lateral chapels must be equal. Guarc presents a solution which is metrologically commensurable for the division of the semicircumference. The chapels along the curved section must be equal to the chapels on the straight side. The radial chapel is the basic unit in Tortosa Cathedral and is determined on liturgical, economic, and constructive grounds. It is the essential modulus which imposes the final form and metrology, defining a Neoplatonic proportional relation in terms of the quadrivium. The lateral chapels acted as an auxiliary structure during the construction process. They provided the essential buttressing system while the cathedral was being built, when it needed a staggered system of chapels and lateral naves. The Tortosa masters solved the heptagon construction in an elementary way. Using the 18:8 ratio, they traced the cathedral chevet without using the center of the circumference. The method consisted of the 9:8 ratio between the modulus of the lateral nave and the chapel, which is also a tonal expression in medieval music. The proportion of 9:8 also has the same metrological structure as the Tortosa cana, in base 8, making it a good approximation, tracing ad triangulum, to the height of the equilateral triangle.
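Why a 9-8-9-4 trapezium can stand in for the missing center can be seen with similar triangles. The sketch below is an interpretation, not a documented procedure: it assumes the two sides of 9 are the legs pointing toward the center, with the side of 8 on the outside and the side of 4 on the inside, and the names `leg_meeting_point` and `wedge_angle` are hypothetical. Extending the legs until they meet gives an inner radius of 9 and an outer radius of 18, so each trapezium is a wedge of the 18:8 layout, and seven wedges placed side by side sweep a little under 180 degrees.

```python
import math


def leg_meeting_point(long_base, short_base, leg):
    """Distances from the point where the extended legs meet to each base.

    By similar triangles, short_base / inner = long_base / (inner + leg).
    """
    inner = leg * short_base / (long_base - short_base)
    return inner, inner + leg


def wedge_angle(chord, radius):
    """Central angle, in radians, subtended by a chord at the given radius."""
    return 2.0 * math.asin(chord / (2.0 * radius))


# Assumed orientation: legs of 9, outer base of 8, inner base of 4.
inner_radius, outer_radius = leg_meeting_point(8.0, 4.0, 9.0)

# Seven trapezia laid side by side around the unseen center.
total_sweep = math.degrees(7 * wedge_angle(8.0, outer_radius))

print(f"inner radius {inner_radius}, outer radius {outer_radius}")
print(f"seven wedges sweep {total_sweep:.2f} degrees")
```

The small shortfall from a full semicircle is the same discrepancy that appears as the longer seventh side in Guarc's division of the arc.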
The solution in Tortosa Cathedral is in keeping with the tradition of pseudo-Heron’s Metrica, in which the inscription of some regular polygons is solved by means of the proportional solution of integers. The arithmetical and geometrical expressions used by some southern Gothic masters are integers within Neoplatonic culture and far from approximations of irrational numbers. The way in which Guarc traced the heptagon, or staked out the Tortosa apse, does not appear in medieval geometries, but it has been proven to be far more effective than the Renaissance methods because it solves the layout of the enigmatic figure geometrically and arithmetically. These methods do not need to know the center of the circumference to trace the heptagon and can be used to build new apses around old presbyteries while the latter are still in use. The magister knew that if the apse was started from the gospel radial chapel, it would end at a particular point – the epistle chapel – according to the measure of the radial chapel. The canons of the Cathedral of Tortosa commissioned a cathedral with a chapel of 3 canas (24 palms). Everything else opens out automatically from this unit into a predetermined order: the width of the cathedral is 150 palms; the depth of the apse is 100 palms (Fig. 30).
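The integer relationships quoted in this section can be tabulated with exact fractions. This is a minimal sketch under stated assumptions: the value of 8 palms per cana is inferred from "3 canas (24 palms)", the values 3:2 for the diapente and 4:3 for the diatessaron are the standard musical ratios rather than figures given in the text, and reading 150:100 as a diapente is an observation of the sketch, not a claim of the source. The helper `to_canas` is hypothetical.

```python
from fractions import Fraction

PALMS_PER_CANA = 8  # inferred from "3 canas (24 palms)"


def to_canas(palms):
    """Split a length in palms into whole canas and remaining palms."""
    return divmod(palms, PALMS_PER_CANA)


chapel_palms = 24   # width of the chapel
radius_palms = 54   # radius at which the chapels are traced

# Standard musical ratios (assumed, not quoted in the text).
diapente = Fraction(3, 2)      # the fifth
diatessaron = Fraction(4, 3)   # the fourth
tone = diapente / diatessaron  # the interval between them

print("chapel:", to_canas(chapel_palms), "radius:", to_canas(radius_palms))
print("radius : chapel =", Fraction(radius_palms, chapel_palms))
print("diapente / diatessaron =", tone)
print("width : depth =", Fraction(150, 100))
```

Exact fractions are used instead of floating-point numbers so that equalities such as 54:24 = 18:8 = 9:4 hold without rounding.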


Fig. 30 Construction apse of Tortosa Cathedral (1383–1441)

Acknowledgments Members of the Patriarq research group: PhD Agustí Costa Jover, PhD Sergio Coll Pla, and Arq. Mónica López Piquer.

J. Lluis i Ginovart

Renaissance Architecture

44

Sylvie Duvernoy

Contents
Introduction
The Heritage from Classical Antiquity
Mathematical Beauty in the Renaissance
Beauty in Renaissance Architecture
Perspective
Conclusion
Cross-References
References

Abstract
In classical culture, architecture and mathematics were strongly connected by a parallel sense of beauty. The conceptual beauty of mathematical rules was echoed in the search for canons of beauty in art. Scientists and artists both pursued divine perfection. In his treatise titled De Architectura Libri Decem, the Roman architect-engineer Marcus Vitruvius Pollio conveyed to the Renaissance the cultural background inherited from Classical culture, whose substance and values were in turn conveyed to posterity by the Renaissance humanists. For Vitruvius, beauty in architecture originates from the numbers that the designer carefully and knowingly chooses while sizing the geometric shapes of the designed object: the numbers, their ratios, their proportion, their modularity, and their commensurability. Renaissance humanists also included among the many mathematically beautiful items the peculiar proportion that Euclid called the “extreme and mean ratio.” The mathematician Luca Pacioli named it “the divine proportion.” A further geometric form became essential to Renaissance architecture when, toward the end of the fifteenth century, a new diagram appeared in Italian religious architecture: the central plan. In centrally planned churches, the dome rises over the heads (and souls) of the worshippers, thus creating a different spatial relationship between the single individual and the house of God. The central space and its dome represent the Earth under the Heavens, with Man at the center of the universe. While these geometric forms and systems were embedded in architectural plans, the Renaissance is also the time in which the geometric rules for perspective representation were reinvented, thus providing designers with a new work tool.

S. Duvernoy, Politecnico di Milano, Milan, Italy. e-mail: [email protected] © Springer Nature Switzerland AG 2021 B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_10

Keywords Alberti · Piero della Francesca · Bramante · Golden section · Musical proportions · Central plan · Perspective

Introduction
The name of the historical period following the Middle Ages – the Renaissance – comes from the extraordinary blossoming of arts and sciences that suddenly spread across Europe, starting in Italy in the very last years of the fourteenth century. This progress is said to be due to the rediscovery, by the humanists of that time, of the cultural and scientific heritage of Classical Greek and Roman antiquity. But rather than a rediscovery of ancient knowledge, it seems more correct to speak of a renewed intellectual approach to knowledge and learning. It is a new approach toward Classical culture and its values that produced the so-called rebirth of art and science. Science and its written supports – manuscripts and copies of treatises – had circulated all through the Middle Ages, in most regions of Europe, thus making knowledge available to people who could read and write. In the Renaissance, however, scientific matters started to be more widely studied by scholars and consequently discussed, commented upon, and debated. This cultural movement inevitably led to an improvement and increase in knowledge in all fields. Therefore, rather than by a “renaissance” of arts and sciences, the European Renaissance is marked by the “renaissance” of interest in the cultural values of the heritage from the past and by a renewed theoretical approach to scientific research. All of this was made possible by the invention of mechanical printing by Johannes Gutenberg (c. 1400–1468) in Germany, in the mid-fifteenth century, which supported the production of books in great quantity in a very short time. Mechanical printing provided a communication tool among scholars that had no equivalent in the previous centuries. The subsequent flourishing of the publishing business, and the publication of theoretical treatises, was of paramount importance for the progress and international dissemination of Renaissance culture.


This chapter commences by tracing the influence of classical antiquity, and its attitude to mathematical beauty, on the Renaissance. Thereafter, particular Renaissance developments of these theories are described, including both number and geometric symbolism. Theories and practices of proportional beauty in Italian Renaissance architecture are described in the following section, before the chapter explains the rediscovery of perspective projection, its significance, and its application in architecture.

The Heritage from Classical Antiquity
As far as the nexus between mathematics and architecture is concerned, theoretical research resumed from where it had been interrupted in the past. It is important to recall that in Classical culture, architecture and mathematics shared a parallel sense of beauty and were thus strongly connected. The conceptual beauty of mathematical rules and theories was echoed in the search for canons of beauty in art. As such, scientists and artists both pursued divine perfection. Evidence of this sense of beauty is found in the scientific and philosophical treatises of antiquity, not least Plato’s dialogues. While inquiring into the mathematical properties of some basic geometric objects – both planar and solid – Greek mathematicians and philosophers had pointed out that some special numbers, shapes, and proportion systems were fairer than others. For example, Plato (428–348 B.C.) explains in the Timaeus that the half equilateral triangle is the most beautiful of all triangles:

We, however, shall pass over all the rest and postulate as the fairest of the triangles that triangle out of which, when two are conjoined, the equilateral triangle is constructed as a third. (Plato 1929. Timaeus, 54b)

This peculiar triangle makes it possible to build three out of the five regular polyhedra that Plato describes at length in his dialogue, and surely this is why it held a special value and a special beauty. The search for the construction of the regular polyhedra itself was a search for formal beauty and perfection (Fig. 1).

For to no one will we concede that fairer bodies than these, each distinct of its kind, are anywhere to be seen. (Plato 1929. Timaeus, 53e)

Fig. 1 The five regular polyhedra, also called “platonic solids.” (Drawing: S. Duvernoy)

In antiquity, mathematical beauty is linked to the concepts of order and commensurability. The concept of “order” is related to regularity: shapes that are equilateral and equiangular are more beautiful than irregular ones. In modern terms, we would say that the fewer parameters are needed to define a shape, the more beautiful the shape is. In addition to material shapes, the sense of beauty also applies to abstract notions and research tools. Plato states that some proportions and mean proportionals have special beauties. A proportion is the relationship linking three or more quantities. In ancient Greek mathematics, the word “mean” applies either to a sequence of three terms in continuous proportion or to the middle term which ties the two extremes together. In modern terminology, the expression “mean proportional” is often used to stand for the second meaning. By contrast, a “ratio” is the relationship linking two quantities: it may be either rational or not. At a time when only whole numbers were known, ratios, proportions, and means were the main research tools, both in arithmetic and in geometry. Ratios between natural numbers belong, of course, to the realm of arithmetic, but proportions and means are calculation systems that apply to both disciplines. In Plato’s day, three different kinds of means were already known: the arithmetical, geometric, and harmonic means. The harmonic mean was defined by the Pythagoreans in order to calculate the musical intervals. In the Timaeus, Plato refers to two different means to explain the creation of the world. In this famous description, he reestablishes the link between mathematics and metaphysics, restoring the lofty philosophical value of scientific research in all the liberal arts. The Pythagorean musical proportions formed by an arithmetical and a harmonic mean were used for the creation of the soul of the world, whereas for the creation of the body of the world (which occurred prior to the creation of the soul) the geometrical mean was necessary. The quality of the proportion creates beauty, and, of the three existing means, the geometrical one is given the most emphasis:

But it is not possible that two things alone should be conjoined without a third; for there must be some intermediary bond to connect the two. And the fairest of bonds is that which most perfectly unites into one both itself and the things which it binds together; and to effect this in the fairest manner is the natural property of proportion. For whenever the middle term of any three numbers, cubic or square, is such that as the first term is to it, so is it to the last term, – and again, conversely, as the last term is to the middle, so is the middle to the first, – then the middle term becomes in turn the first and the last, while the first and last become in turn middle terms, and the necessary consequence will be that all the terms are interchangeable, and being interchangeable they all form a unity. (Plato 1929. Timaeus 31c–32b)
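The three means named in the text can be made concrete with a short sketch. It is only an illustration in modern notation, not a reconstruction of any ancient procedure: the function names and the extremes 6 and 12 (an octave, ratio 1:2) are assumptions chosen for the example.

```python
from fractions import Fraction
from math import sqrt, isclose

def arithmetic_mean(a, c):
    # Middle term b with b - a == c - b
    return Fraction(a + c, 2)

def harmonic_mean(a, c):
    # Middle term b with (b - a) / a == (c - b) / c
    return Fraction(2 * a * c, a + c)

def geometric_mean(a, c):
    # Middle term b with a : b == b : c (may be irrational, hence a float)
    return sqrt(a * c)

# Hypothetical extremes, chosen so that the results are whole numbers.
low, high = 6, 12
am = arithmetic_mean(low, high)
hm = harmonic_mean(low, high)
gm = geometric_mean(low, high)

# The harmonic and arithmetical means divide the octave into the
# musical ratios 3:4 (fourth) and 2:3 (fifth).
fourth = low / hm
fifth = low / am

# Plato's condition for the geometric proportion, on the terms 2, 4, 8:
# the first is to the middle as the middle is to the last.
continuous = Fraction(2, 4) == Fraction(4, 8)

print(low, hm, am, high)
print(fourth, fifth, round(gm, 3), continuous)
```

With these extremes, the arithmetical and harmonic means give the sequence 6, 8, 9, 12, while the geometric mean of 6 and 12 is not a whole number, which is one way of seeing why the text treats it as belonging to geometry as much as to arithmetic.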

Apart from the philosophical demonstration, what is interesting in this quote is the recurrence of the word “fair,” which applies to an abstract mathematical concept. The mean – the sequence of three or four terms mathematically connected by a proportional relation – is the cause of the beauty of the composition created by God. Passing from the Greeks to the Romans, the strong classical sense of beauty in mathematics is echoed in the oldest architectural treatise that has been preserved: the treatise entitled De Architectura Libri Decem written by the Roman architect-engineer Marcus Vitruvius Pollio (c. 80–c. 15 B.C.). Vitruvius speaks about numbers and notes that Plato stated that the number 10 was the perfect number, being formed by units named “monads.” However, he adds that Greek mathematicians had a different opinion and considered that 6 was the perfect number, because it can be divided into parts that have harmonic proportions with the whole: 1 is the sixth, 2 is the third, 3 is half, 4 is two thirds, and 5 is five sixths. The anthropomorphic references enhance and underlie the concept of beauty. The number of man’s fingers is 10; 6 is both the ratio between man’s height and foot and the ratio between man’s cubitus and palm. Arithmetically speaking, 10 is the sum of the first four integers, 12 (6+6) is the smallest common multiple of the first four integers, while 6 is both the sum and the product of the first three integers. Vitruvius explains that the Romans, following the Greeks, first adopted the number 10 as the perfect number and later stated that 16 was the most perfect, being the sum of 10 and 6 (Vitruvius 2009. III, 1).
Perfect beauty is achieved when geometry and arithmetic combine to generate special shapes whose geometrical properties can be “numbered.” This is the case of the so-called Pythagorean triangle: the triangle whose three sides can be quantified through natural numbers, and in which summing the squares on the two shorter sides gives the whole-number square on the hypotenuse. The sides and the squares on the sides are all rational and commensurable quantities, and the triangle is always a right-angled triangle.
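The arithmetical claims about 6, 10, and 12, and the defining property of a Pythagorean triangle, are easy to check mechanically. The following is a minimal sketch; the helper names and the sample triangles (3-4-5 and 5-12-13) are assumptions introduced for illustration and are not taken from Vitruvius.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, prod

def lcm_of(numbers):
    # Least common multiple of a list of positive integers
    return reduce(lambda a, b: a * b // gcd(a, b), numbers)

def is_pythagorean(a, b, c):
    # True when the squares on the two shorter sides sum to the square on c
    return a * a + b * b == c * c

first_three = [1, 2, 3]
first_four = [1, 2, 3, 4]

six_is_sum_and_product = sum(first_three) == prod(first_three) == 6
ten_is_sum = sum(first_four) == 10
twelve_is_lcm = lcm_of(first_four) == 12

# The parts of 6 as fractions of the whole: 1/6, 1/3, 1/2, 2/3, 5/6
parts_of_six = {k: Fraction(k, 6) for k in range(1, 6)}

print(six_is_sum_and_product, ten_is_sum, twelve_is_lcm)
print(parts_of_six)
print(is_pythagorean(3, 4, 5), is_pythagorean(5, 12, 13), is_pythagorean(2, 3, 4))
```

The last line shows that whole-number sides alone are not enough: 2, 3, 4 are integers but do not form a Pythagorean triangle.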


Mathematical Beauty in the Renaissance
When Poggio Bracciolini (1380–1459) brought back to Florence a copy of the De Architectura Libri Decem that he had found in the monastery of Saint Gall (in modern Switzerland), the manuscript immediately caught the attention of artists, architects, and humanists from all disciplines. Vitruvius’ treatise had been available during the Middle Ages in many monastery libraries, but in Florence, at the beginning of the fifteenth century, it started to be studied again, discussed, commented upon, and illustrated, prompting the writing and publishing of architectural treatises across Europe for several centuries onward. Vitruvius indeed conveyed to the Renaissance the cultural background inherited from classical culture, whose substance and values were in turn conveyed to posterity by the Renaissance humanists. However, the mathematics contained in Vitruvius’ treatise is quite elementary and disordered. References to classical problems are scattered throughout the ten books of the treatise, and thorough explanations are missing. Vitruvius makes two types of references to individual mathematicians. He sometimes gives a list of mathematicians in passing, as examples of scientists of the highest level but without entering into their achievements. At other times, he gives credit to individual mathematicians for their discoveries. Thus, he cites Plato for the doubling of the square, Pythagoras for his theorem, Archytas for having resolved the problem of doubling the cube by means of half cylinders, and Eratosthenes for the resolution of the same problem by means of the mesolabium. It is surprising to note that he does not cite Euclid (fl. 323–283 B.C.) a single time, nor does he mention the Elements, the theoretical basis from which all classical studies of mathematics began (Williams and Duvernoy 2014). The main message with which Vitruvius can be credited regards the concept of mathematical beauty. Beauty in architecture originates from the numbers that the designer carefully and knowingly chooses while sizing the geometric shapes of the designed object: the numbers, their ratios, their proportion, their modularity, and their commensurability. This message that he passed on to posterity was fully received by the Renaissance scholars and architects, who highlighted it, claimed it, and broadened it to encompass even more notions and concepts. Around 1450 – in the years in which mechanical printing was being invented – Leon Battista Alberti (1404–1472) wrote a treatise entitled De Re Aedificatoria, the very first of the many literary works on architectural theory that would be produced in the centuries that followed. While discussing the question of defining measures expressible in whole numbers, he referred to Pythagorean musical ratios and the harmonic proportion, stating that:

Surely the same numbers that make the sound of voices so pleasant to the human ears, are the same that will fill the eyes and the souls with marvelous pleasure. We shall thus draw our proportion rules from the musicians who are perfectly familiar with those numbers. (Alberti 1988. IX, 5)

This statement immediately became a shared design rule and was reiterated in later treatises. We can mention a couple of well-known examples. In a book entitled De Geometria (Paris 1545), which would later become the first of the seven books assembled in a treatise entitled L’Architettura, Sebastiano Serlio (1475–1554), after addressing various themes, draws up a list of seven quadrangular proportions in which the reader recognizes the Pythagorean musical ratios. The list starts with the square (1:1) and ends with the double square (1:2), with five intermediate proportions: 4:5 – 3:4 (the fourth, or diatessaron) – 1:√2 – 2:3 (the fifth, or diapente) – 3:5. Only one proportion is irrational, the middle one, and comes from the very old problem of the duplication of the square (Serlio 2005).

Later on, in 1570, Andrea Palladio (1508–1580) published in Venice a treatise entitled I Quattro libri dell’Architettura. In chapter 23 of Book 1, Palladio gives the rule to define the height of a quadrangular room, and in chapter 24 he gives a series of drawings showing the appropriate kind of vault for each kind of room. The list of the seven kinds of rooms is very similar to Serlio’s proportion list. Palladio starts with a room in the shape of a circle (1:1), then a square (1:1), and five rectangles whose proportions are, respectively, 3:4 (the fourth, or diatessaron) – 1:√2 – 2:3 (the fifth, or diapente) – 3:5 – 1:2 (the octave). The musical names are not mentioned, as if, at the end of the sixteenth century, they need not be repeated once again. By then they were well known (Palladio 1997).

Renaissance humanists also included among the many mathematically beautiful items the peculiar proportion that Euclid called the “extreme and mean ratio.” Euclid gives the definition of “extreme and mean ratio” at the beginning of Book 6 of the Elements (definition number 3), and he gives the rule of how to actually divide a line in extreme and mean ratio in Book 6, proposition XXX (Euclid, 1956).
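In modern algebraic terms, a line is cut in extreme and mean ratio when the whole is to the greater segment as the greater segment is to the lesser. The sketch below restates that condition numerically; it is not Euclid’s compass-and-straightedge construction, and the function name and the unit length are assumptions made for the example. Taking the whole as 1 and the greater segment as x, the condition 1 : x = x : (1 − x) gives x² + x − 1 = 0, whose positive root is (√5 − 1)/2.

```python
from math import sqrt, isclose

def cut_in_extreme_and_mean_ratio(whole):
    # Positive root of x^2 + x - 1 = 0, scaled to the length of the line
    greater = whole * (sqrt(5) - 1) / 2
    lesser = whole - greater
    return greater, lesser

# Hypothetical line of unit length
whole = 1.0
greater, lesser = cut_in_extreme_and_mean_ratio(whole)

# The two ratios that the definition requires to be equal
ratio_whole_to_greater = whole / greater
ratio_greater_to_lesser = greater / lesser

print(round(greater, 6), round(lesser, 6))
print(round(ratio_whole_to_greater, 6), round(ratio_greater_to_lesser, 6))
```

Both ratios come out at approximately 1.618, that is (1 + √5)/2, a value that cannot be written as a ratio of two whole numbers.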
In Euclid’s day, this proportion was a necessary calculation tool to define the ratios between the various sides of the five regular polyhedra inscribed in a sphere, which are linked by different kinds of incommensurable relationships. Around 1509, the mathematician Luca Pacioli (1445–1517) wrote a full treatise dedicated to the “extreme and mean ratio” entitled De Divina Proportione. In this book he explained at length the 13 properties of this special proportion, each being highlighted by an epithet such as “bizarre,” “wonderful,” “supreme,” “superb,” “incomprehensible,” “magnificent,” etc (Pacioli 1509). No word was enough for him to describe this marvel of mathematics. In 1496 Luca Pacioli had been called by the Duke Ludovico Sforza (1452–1508) at the Milanese court to teach mathematics. There, he met Leonardo da Vinci (1452–1519) who had been in the Duke’s service since 1482. Under Luca’s guidance Leonardo started a systematic study of theoretical mathematics, studying the Elements of Euclid and becoming familiar with all the classical geometrical problems. In the last chapters of De Divina Proportione, Luca addresses the problem of building the regular polyhedra, and Leonardo provides a series of illustrations which forms the very first historical, exact – and beautiful – perspective representation set for these complex solids. Before Leonardo, nobody had been able to draw them correctly in three-dimensional representation. Today the “divine proportion” is commonly named “the golden section,” and its arithmetical value is the “golden number,” which is an irrational magnitude. It is surprising to note that no architectural treatise of the Renaissance lists the


Fig. 2 Geometry and irrationality of the “divine proportion”. (Drawing: S. Duvernoy)

golden section among the beautiful proportions. However, there is evidence of its use in architectural design, showing that interactions between mathematics and architecture did exist beyond what is reported in the architectural literature (Figs. 2 and 3).
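The “extreme and mean ratio” discussed above can be checked numerically. The following is only a minimal sketch, assuming a line of unit length; the function name `divide_extreme_mean` is hypothetical and introduced purely for illustration.

```python
import math

def divide_extreme_mean(length: float) -> tuple[float, float]:
    """Split a line so that whole : larger part = larger part : smaller part."""
    phi = (1 + math.sqrt(5)) / 2  # the irrational "golden number"
    larger = length / phi
    smaller = length - larger
    return larger, smaller

larger, smaller = divide_extreme_mean(1.0)

# The two ratios that define the division should coincide.
whole_to_larger = 1.0 / larger
larger_to_smaller = larger / smaller
print(round(whole_to_larger, 6), round(larger_to_smaller, 6))
```

Both printed ratios agree to six decimal places (about 1.618), which is the defining property of the division: the whole is to the larger part as the larger part is to the smaller.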

Beauty in Renaissance Architecture

Besides emphasizing the Greek musical proportions, Alberti also adheres to the classical taste for order and regularity in geometrical shapes. In his architectural treatise, he recommends nine possible geometrical diagrams for temple design, six of which are circular or polygonal and only three of which are rectangular. The square, the hexagon, the octagon, the decagon, and the dodecagon are the recommended polygonal shapes, and he insists on the fact that their angles must be precisely drawn equal to one another; otherwise, they will not be regular and inscribed in a circle. The circle seems, therefore, to be the ultimate reference, since it is – Alberti states – the favorite shape of nature.

According to ancient Greek philosophy, the circle and the sphere represent the shape of the universe: the divine creation. The cosmos was conceptualized as seven concentric spheres, at the center of which was the Earth. In Plato’s day, symbolic values were attributed to the regular polyhedra. They represented the four elements that were believed to compose the physical world: the tetrahedron represented fire, the octahedron air, the icosahedron water, and the cube stood for earth. The fifth Platonic solid, the dodecahedron, symbolized the ether (the quintessence).

Toward the end of the fifteenth century, a new diagram appears in Italian religious architecture: the central plan. Many centrally planned churches are shaped according to the basic geometrical pattern of a central cube covered by a hemispherical cupola, to which peripheral chapels are added, shaped as half-cubes or semi-cylinders, and covered by portions of spheres. In centrally planned churches, the dome rises over the heads (and souls) of the worshippers, thus creating a different spatial relationship between the single individual and the house of God. The central space and its dome represent the Earth under the Heavens, with Man at the center of the universe.


Fig. 3 Unknown architect, Palazzo Uguccioni, begun c. 1550, Florence. The façade has “divine” proportions. (Courtesy of K. Williams)

One of the most famous early examples of a centrally planned church is Santa Maria delle Carceri, designed by Giuliano da Sangallo (1445–1516), built in Prato (Tuscany) in 1485. A composition of squares, circles, cubes, and spheres, its geometrical diagram is the paradigm of symbolic solid geometry. Other examples include the church of Santa Maria della Consolazione in Todi (Umbria), whose construction started around 1508 under the direction of Cola da Caprarola (?–1518); the church of the Madonna di San Biagio at Montepulciano (Tuscany), built by Antonio da Sangallo the Elder (1453–1534) between 1518 and 1545; and Sant’Eligio degli Orefici in Rome, which was designed by Raffaello Sanzio (1483–1520), although its construction was carried out by Baldassare Peruzzi (1481–1536) after Raffaello’s death.


In the same period, Donato Bramante (1444–1514) designed the small temple of San Pietro in Montorio in Rome (built c. 1502), also known as the Tempietto, which was immediately recognized as one of the most beautiful and perfect examples of Renaissance architecture, embodying the canons of Classical antiquity. The geometrical pattern of the temple, based on the circle and the sphere, is supremely simple, clear, and perceptible.

Bramante’s choice of 1:1, √2:1, and 2:1 ratios for the Tempietto makes perfect sense in the context of a centralized plan based on concentric circles and cardinal axes, since these ratios fit an ad quadratum sequence habitually used to resolve this kind of composition. (Wilson Jones 1990)
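The ad quadratum sequence mentioned in the quotation can be sketched numerically. The sketch assumes the usual reading of the term, in which each new square is built on the diagonal of the previous one; the helper name `ad_quadratum_sides` is hypothetical and not taken from any source.

```python
import math

def ad_quadratum_sides(first_side: float, count: int) -> list[float]:
    """Sides of successive squares, each built on the diagonal of the previous one."""
    sides = [first_side]
    for _ in range(count - 1):
        # The diagonal of a square of side s measures s * sqrt(2).
        sides.append(sides[-1] * math.sqrt(2))
    return sides

sides = ad_quadratum_sides(1.0, 3)
ratios = [s / sides[0] for s in sides]
print([round(r, 4) for r in ratios])
```

Under this assumption the first three terms stand in the ratios 1, approximately 1.4142 (the square root of 2), and 2 to the first side, so one irrational step separates the square from its double.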

Bramante’s contemporaries’ admiration of the Tempietto can be measured by the number of drawings and references that almost instantly appeared in the sketchbooks and architectural treatises of his fellow architects. The building was inserted in the collection of classical models that modern architects had to learn from. In the third book of his treatise, L’Architettura, entitled De le Antiquità (Venice 1540), Sebastiano Serlio describes the church of San Pietro in Montorio with three drawings: a plan and an elevation of the building itself and the general plan of the original project showing the Tempietto inside a round courtyard (Serlio 2005). The plan and the elevation of the Tempietto also appear in Andrea Palladio’s treatise. Palladio justifies the inclusion of this contemporary monument among the collection of Classical temples by pointing out that Bramante was the first to bring to light the beautiful ancient architecture that to his day had been forgotten (Palladio 1997). Sketches of the Tempietto are also found among the freehand drawings by Bastiano da Sangallo (1481–1551), a sign that by the first years of the sixteenth century, the study of this building had become an inevitable part of the experience of architects on a Grand Tour to Rome (Figs. 4 and 5).

Classicist Renaissance architects struggled to adapt the model of the Greek temple front to the façade of Christian churches, a question that could be tricky, especially for buildings with the Latin cross diagram. Palladio solved the problem by intertwining two fronts: one tall and narrow for the volume of the central nave and a second one, low and wide, for the side aisles (Wittkower 1998). The Greek temple front is one of the main themes of Palladio’s architecture. He designed colonnades supporting triangular pediments even for villa façades, thus mixing the vocabulary and features of sacred architecture with those of private buildings.

Perspective

The relationship between mathematics and architecture has always had a double aspect. On one side, mathematics is the theoretical discipline which provides models and algorithms from which forms and shapes can be numerically generated: the design tool. On the other side, geometry is the scientific tool that makes it possible to construct exact representations of an architectural object: the communication tool. In the early Renaissance, we witness a renewed interest in both aspects. The biggest

Fig. 4 Pellegrino Tibaldi, San Sebastiano, begun in 1576, Milan. Central plan, classical proportions, and symbolism in late sixteenth-century church design. (Drawing: S. Duvernoy)


Fig. 5 Pellegrino Tibaldi, San Sebastiano, begun in 1576, Milan. (Photo: S. Duvernoy)

and most important progress that was achieved in the Renaissance in the realm of the relationship between mathematics and architecture concerns architectural representation and the rebirth of perspective. Since the ancient knowledge of perspective representation had been lost, Renaissance humanists had to redefine it. The question was how the Ancients had drawn the beautiful frescoes and trompe l’oeil paintings that can still be seen in the Roman villas in Pompeii, Herculaneum, and Rome, dating back to the first century A.D. No ancient text could help, not even Vitruvius’s treatise, which had been such an inspiration to date. Vitruvius hastily mentions a kind of representation in which the sides of the building recede into the distance, but he is so vague about it that we are allowed to think that he was not an expert in this kind of drawing. Not even Euclid’s treatise on “Optics,” which inquires in depth into the geometry of human vision, could help. Euclid studies the differences between the true forms of objects and their distorted appearance to the human eye. His purpose, however, is not to try to reproduce the visual image artificially (Euclid, 1945). Therefore, it was through practical experiments first, and subsequently through scientific theorization, that the perspective rules were rediscovered in the Renaissance.

The first three fundamental steps in the rebirth of perspective science can be listed as follows. First, Filippo Brunelleschi (1377–1446) undertook the successful experiment of reproducing on a small panel the image of Florence’s baptistery as


it could be seen from a precise standpoint located on the cathedral’s threshold. The panel is long lost, but the experiment was reported by Brunelleschi’s biographer, Antonio di Tuccio Manetti (1423–1497), who also mentions a second panel showing a view of Palazzo Vecchio, which has also been lost. Second, Leon Battista Alberti put into written words, in a small book entitled De Pictura (1435), a didactic explanation which is known as the “costruzione legittima,” i.e., the rule for the correct geometrical depiction of a horizontal square grid receding in the distance (Alberti 1991, I, 19–20). No image illustrated the text, but the words were precise enough for the commentators to be able to draw their own pictures. Third, Piero della Francesca (1415–1492) wrote a full treatise on perspective drawing entitled De Prospectiva Pingendi, a text which is considered to have been completed in 1482. This treatise is the first thorough theoretical study on perspective. It is scientifically organized and divided into three books, each book comprising a series of theorems. The author both studies and teaches how to draw in perspective. His set of rules established the basis of the “linear perspective” technique, which is the technique that makes it possible to simulate our static visual perception of objects and space on a flat canvas or sheet of paper (Della Francesca 2005) (Fig. 6). After Piero della Francesca, perspective science spread very quickly across Europe. The first applications are found in painting and examples of art works are countless. They range from small pictures to full-size paintings and frescoes in churches and palaces. Italian Renaissance artists focused mainly on one-point perspective and very seldom addressed the question of two-point perspective. 
In fact, the technique of one-point linear perspective makes it possible to solve the question raised by Euclid in Optics, theorem 8, regarding the proportion of equal objects’ apparent sizes with respect to their distances from the viewer. Euclid states in theorem 8 that lines of equal length and parallel, if placed at unequal distances from the eye, are not seen in proportion to the distances (Euclid, 1945). This theorem has been commented on at length by many scholars, underlining the fact that it mostly intends to deny what probably was a common erroneous feeling at the time it was written. While this assertion is true in most of the cases, it turns out to be wrong in the case in which the parallel lines of equal length (e.g., several similar columns or pillars) are aligned on a line which is perpendicular to the picture plane, thus receding to the vanishing point which is the orthogonal projection of the viewer’s eye on the horizon line. Indeed, in one-point linear perspective, apparent sizes of equal objects are inversely proportional to their distances from the viewer. This means that the space and the objects represented in one-point perspective views may be measured – to some extent – and their apparent sizes are going to suggest a correct dimension of the spatial depth to the viewer, especially when the square grid of the floor is clearly visible under the polychrome decoration of the marble tiles (Fig. 7).

The treatise by Piero della Francesca does not discuss the meaning of perspective. Piero is only addressing the question of how to draw – and not why to draw – in perspective. His purpose is only to unveil the geometrical rules. The various meanings of the various representation techniques are discussed in the document known as “Raffaello’s letter” to Pope Leone X. The pope had


Fig. 6 From Alberti: the basic rule for drawing a horizontal square grid in perspective. (Drawing: S. Duvernoy)

Fig. 7 In one-point linear perspective, apparent sizes of equal objects are inversely proportional to their distances from the viewer. (Drawing: S. Duvernoy)


commissioned Donato Bramante to do a measured survey of the city of Rome, but the survey was later carried out by Raffaello Sanzio. Raffaello wrote a letter to his patron to explain his work methodology, in which he opposes the “architect’s drawing” and the “painter’s drawing.” He insists that the architect needs to be able to measure the drawings; for this purpose, the drawings of a building need to be three: the plan, the outside façade showing the ornaments, and the vertical section showing the inside of the building. This letter is the first historical document which mentions the technique of the vertical section, and defines the modern full set of 2D drawings, by adding to the traditional ichnographia (plan) and orthographia (façade) the vertical section as a third necessary representation. Nonetheless, and despite the fact that no measurement can be taken from a foreshortened line, Raffaello agrees that the perspective view – which belongs to the painter’s skills – may be useful to the architect to fully explain the shape of a building. The visual image may be joined to the scientific representation of the scale drawings for the sake of communication.

Raffaello was probably too strict in his statements. Renaissance artists were quick to notice that the perspective view could also be used as a design tool and not just a representation tool. Leonardo da Vinci was probably among the first to extensively use this type of drawing for study purposes. He filled many pages of his sketchbooks with drawings and studies of centralized churches, exploring several different geometrical options. All plan diagrams are completed by a bird’s-eye perspective view of the possible volume that could be built from the plan sketch (Xavier 2008). While studying an extension to the church of Saint Peter in Rome, Baldassare Peruzzi drew an astonishing cutaway bird’s-eye perspective view showing both the plan and the inside space of his project. 
This sketch is both very modern and very innovative for the time in which it was made.
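The inverse proportionality between apparent size and distance in one-point perspective, stated earlier in this section, can be illustrated with a minimal sketch. It assumes an idealized pinhole model: the eye is a point, the picture plane stands at a fixed distance in front of it, and depth is measured from the eye perpendicular to the picture plane. The function and parameter names are hypothetical.

```python
def apparent_height(true_height: float, depth: float, picture_distance: float) -> float:
    """Height on the picture plane of an upright object, by similar triangles."""
    if depth <= 0:
        raise ValueError("the object must stand in front of the eye")
    return true_height * picture_distance / depth

# Identical columns, 3 units tall, aligned on a line perpendicular to the picture plane.
picture_distance = 1.0
depths = [2.0, 4.0, 8.0]
heights = [apparent_height(3.0, z, picture_distance) for z in depths]

# Apparent height times depth stays constant: this is inverse proportionality.
products = [h * z for h, z in zip(heights, depths)]
print(heights, products)
```

In this model, doubling the depth of a column halves its drawn height, which is why a row of equal columns receding toward the vanishing point lets the viewer estimate depth from the picture.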

Conclusion

The Renaissance spread to France at the end of the fifteenth century, thanks to the king, François I (1494–1547), who invited several Italian artists to come to the court of France, not least Leonardo da Vinci, who spent the last years of his life in France. He asked painters such as Rosso Fiorentino (1495–1540) and Francesco Primaticcio (1504–1570) to decorate his castle in Fontainebleau, and their gathering started what is known now as the “School of Fontainebleau.” Sebastiano Serlio, too, worked in Fontainebleau and published in France four of the seven books of his treatise on architecture: Books 1 and 3 were published in Paris in 1545, Book 5 followed in 1547. The last book (Book 7) was published in Lyon in 1551. Philibert de l’Orme (1514–1570) and Pierre Lescot (1510–1578) are among the great French Renaissance architects. Philibert de l’Orme spent 3 years in Rome between 1533 and 1536 to complete his training in Classical architecture.

The Renaissance reached England and Northern Europe thanks to cultural exchanges mostly due to the works and studies by the British architect Inigo Jones


(1573–1652). Jones’s first travels in Italy took place shortly before 1603. He was an admirer of Palladio and he collected a significant number of the master’s drawings on his second trip in 1613–1614, acquiring them from Vincenzo Scamozzi (1548–1616), a disciple of Palladio. Jones played an important role in the spread of the so-called Palladianism: the architectural style that became very popular in Great Britain in the mid-seventeenth century, even reaching the United States in the eighteenth century.

Cross-References

Baroque Architecture

References

Alberti LB (1988) In: Rykwert J, Leach N, Tavernor R (eds) On the art of building in ten books (1452). MIT Press, Cambridge, MA
Alberti LB (1991) In: Kemp M, Grayson C (eds) On painting (1435). Penguin, London
Della Francesca P (2005) De Prospectiva Pingendi (1482). Fasola G.N. Le Lettere, Florence
Euclid (1945) Optics (ca 300 B.C.) (trans: Burton HE). J Optical Soc Am 35(5)
Euclid (1956) Elements (ca 300 B.C.) (trans: Sir Thomas Little Heath). Dover, New York
Pacioli L (1509) De Divina proportione. Venice
Palladio A (1997) The four books on architecture (1570) (trans: Tavernor R, Schofield R). MIT Press, Cambridge, MA
Plato (1929) The loeb classical library. In: Timaeus. Critias. Cleitophon. Menexenus. Epistles (trans: Bury RG), Harvard University Press, Cambridge, MA
Serlio S (2005) On architecture (1540–1551) (trans: Hart V, Hicks P). Yale University Press, New Haven
Vitruvius (2009) On architecture (15 B.C.) (trans: Schofield R). Penguin, London
Williams K, Duvernoy S (2014) The shadow of Euclid on architecture. Math Intell 36(1):37–48
Wilson Jones M (1990) The Tempietto and the roots of coincidence. Archit Hist 33:1–28
Wittkower R (1998) Architectural principles in the age of humanism. Wiley, Hoboken
Xavier JP (2008) Leonardo’s representational technique for centrally-planned temples. Nexus Network J 10(1):77–99

Baroque Architecture

45

Sylvie Duvernoy

Contents

Introduction
Baroque Architecture and Architects
Church Design: The Elongated Centrality
Odd Polygons and Complex Curves
Literary Sources and Onsite Studies
Perspective and Anamorphosis
Baroque Polymathy
Conclusion
Cross-References
References

Abstract

The Baroque approach to geometry is challenging and playful. Complex shapes are experimented with and new layouts are designed, giving birth to dynamic and fluid spaces. Baroque architecture makes use of curves, curved spaces, and undulating walls. Some daring interpretations in the use of the classical orders are visible in façade design, and Baroque designers play with classical design rules. The Renaissance was the time in which rules were established and applied, and Baroque is the time when shape grammars are enriched with new forms. Guarino Guarini, one of the main Italian architects of the High Baroque period, wrote in his treatise: “Architecture may modify ancient rules, and invent new ones” (Guarini (1737) L’Architettura civile. Torino). This approach toward tradition and classical culture leads designers and artists to innovate in all fields. Church design – for instance – evolved from the concept of centrality to the notion of “elongated centrality”; from a square or circular to an oval plan diagram. In the meantime, perspective representation evolved to trompe-l’oeil and anamorphosis. “Baroque” can also be an epithet which is sometimes used in a deprecatory sense, meaning an excess of decoration or ineffective complexity. This is an incorrect understanding of the early and High Baroque styles which are, on the contrary, very elegant and balanced, even when dealing with sophisticated geometrical patterns.

S. Duvernoy
Politecnico di Milano, Milan, Italy
e-mail: [email protected]

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_9

Keywords

Vignola · Borromini · Bernini · Guarini · Ellipse · Oval · Oval domes · Spiral · Trompe-l’oeil · Anamorphosis

Introduction

When does the Renaissance end? When does the Baroque begin? It is quite impossible to define a precise date. Art and architecture historians assert that the Renaissance style first evolved into Mannerism and then Baroque. The beginning and the end of the Renaissance period vary with geographical areas and countries. As far as Italy is concerned, it is commonly accepted that the Renaissance started with the Florentine artist Filippo Brunelleschi (1377–1446). According to Giorgio Vasari (1511–1574) in his book “Le Vite de’ più eccellenti pittori, scultori e architettori” (published in Florence 1550), Filippo and Donatello (1386–1466) went together to Rome to study (and measure) the remains of ancient classical architecture and artworks (Vasari 1550), thus starting a practice that would later evolve into the tradition of the Grand Tour.

The turn from the Renaissance to the Baroque begins at the end of the sixteenth century. Michelangelo Buonarroti (1475–1564), a contemporary of Baldassare Peruzzi (1481–1536) and older than Andrea Palladio (1508–1580), is considered to be a “High Renaissance” artist, belonging to the last generation of the Renaissance humanists. Vignola (1507–1573) worked in the same years as Michelangelo did. He can thus be listed among the greatest Renaissance architects, but his works show that he played a leading role in the evolution from the Renaissance to the Baroque style (see below). The main building site in Rome, the construction of Saint Peter’s Basilica, started in the Renaissance in 1506 under the direction of Donato Bramante (1444–1514) and ended in 1626 under the guidance of Carlo Maderno (1556–1629), one of the main Baroque architects, who designed the basilica’s façade.

This chapter commences with an overview of Baroque architecture and architects before examining a series of themes that are pertinent to the era. 
The first theme concerns oval and elliptical geometry, as such curves are central to the creation of the Baroque plan. The following section describes complex polygons and compound curves, which Baroque architects used, often alongside ovals, to shape space and form. Literary sources (architectural treatises) that describe such geometric constructions are discussed in the next section, before Baroque perspective and anamorphosis techniques are introduced. The final section of the chapter considers the general respect in which architects held mathematics at the time.

Baroque Architecture and Architects

The evolution from Renaissance to Baroque can be conceptualized as an evolution in design, based on a new approach toward the design rules. While the Renaissance architects would adhere to the rules that they helped to define, Baroque architects would start to play with the rules, in the search for new geometries. Renaissance geometrical patterns were mostly compositions of basic figures such as the square, the circle, the cube, the sphere, and the rectangle of musical proportions. Of course, the manipulation of a few basic figures could produce infinite patterns, which in turn could produce infinite architectures. The Baroque approach to geometry is more challenging and playful. Complex shapes, less easy to handle, are experimented with; new layouts are designed, giving birth to dynamic and fluid spaces. Baroque architecture makes use of curves, curved spaces, and undulating walls. Some daring interpretations in the use of the classical orders show in façade design. However, when used as an adjective, “baroque” is also an epithet which can be used in a deprecatory sense, meaning an excess of decoration, or ineffective complexity. This is not the case for a proper Baroque style, which is on the contrary very elegant and balanced, even when dealing with composite geometrical patterns.

The Baroque era is loosely divided into three approximate phases for convenience: Early Baroque, ca. 1590–1625; High Baroque, ca. 1625–1660; and Late Baroque, ca. 1660–1725. Indeed, in some areas and countries, Late Baroque architecture happens to be richly decorated with sculptures and stuccos that unfortunately tend to hide the geometries of the buildings. But Early and High Baroque monuments are sophisticated outcomes of the interaction between geometry and art. The main architects of the Italian Early and High Baroque eras include Carlo Maderno, Gianlorenzo Bernini (1598–1680), Francesco Borromini (1599–1667), and Guarino Guarini (1624–1683). 
The most famous French Baroque architects are François Mansart (1598–1666), whose masterpiece is the castle of Maisons-Laffitte; Louis Le Vau (1612–1670), the designer of the castle of Vaux-le-Vicomte; and Jules Hardouin-Mansart (1646–1708), who built the Invalides in Paris. Sir Christopher Wren (1632–1723) is the main English Baroque architect, known for the construction of Saint Paul’s Cathedral in London.

Church Design: The Elongated Centrality

When discussing geometry and mathematics in architecture, words describing the geometrical nature of shapes must be carefully employed according to their proper significance and must not be used only to suggest generic visual aspects. Each geometrical shape has unique mathematical properties, and those specific properties


Fig. 1 The mathematical difference between an ellipse and an oval. (Drawing: S. Duvernoy)

give their special shapes – and beauty – to the buildings that designers carefully plan. The difference between a square, a rhombus, and a rectangle is known to anyone, even if they all belong to the general category of “quadrilaterals.” A square is equiangular and equilateral; a rhombus is equilateral but not equiangular, while the rectangle is equiangular but not equilateral. The same mathematical precision is required while speaking of ovals and ellipses. In many texts, the adjectives elliptic and oval are used as if they were synonyms, whereas they describe two different kinds of geometric elements (Fig. 1).

The ellipse is a natural curve: it is the shape of the shadow of a sphere or a disk on a plane; it is also the path of the planets orbiting in the sky around the sun. The oval is a closed convex polycentric curve, a graphic composition of circular arcs that can be either ovoid or regular. Ellipses and ovals have different mathematical properties. The tracing of an elliptic curve relies on the prior determination of the length of its main axes and on the position of its two focal points. The construction of an oval consists in joining four or more segmental arcs of different dimensions and of different radii, which meet where they share the same tangent. The centers of an oval can be arranged according to a variety of patterns; however, regular ovals have four centers symmetrically set on two perpendicular axes. Any ellipse can be closely approximated by a regular oval made of four arcs – two big and two small – mirroring each other with respect to the symmetry axes. In fact, an oval and an ellipse, having identical axes, are very similar as far as visual perception is concerned, and this optical similarity is the underlying cause of the ambiguity of the geometrical terms used by many commentators. 
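The distinction between the two curves can be made concrete with a small numerical sketch. It assumes the simplest regular case, a four-center oval with semi-axes a > b, and treats the position h of the small-arc centers on the major axis as a free parameter; this parameterization and all names are illustrative assumptions, not a reconstruction of any historical recipe. The sketch checks that the arcs join on the line through their centers (so they share a tangent there), that the junction does not lie on the ellipse with the same axes, and that a point of the ellipse satisfies the focal property.

```python
import math

def four_center_oval(a: float, b: float, h: float) -> dict:
    """Regular four-center oval with semi-axes a > b.

    h (an assumed free parameter) is the distance of the two small-arc
    centers from the middle of the oval, with a - b < h < a.
    """
    if not (a > b > 0 and a - b < h < a):
        raise ValueError("need a > b > 0 and a - b < h < a")
    r = a - h                      # radius of the two small end arcs
    c = h - (a - b)
    k = (h * h - c * c) / (2 * c)  # offset of the large-arc centers on the minor axis
    big_r = b + k                  # radius of the two large side arcs
    # First-quadrant junction: it lies on the straight line through the
    # large-arc center (0, -k) and the small-arc center (h, 0), so the two
    # arcs have the same tangent where they meet.
    d = math.hypot(h, k)
    junction = (big_r * h / d, -k + big_r * k / d)
    return {"r": r, "R": big_r, "k": k, "junction": junction}

a, b = 2.0, 1.0
oval = four_center_oval(a, b, h=1.5)
jx, jy = oval["junction"]

# The junction is also at distance r from the small-arc center (h, 0).
gap = math.hypot(jx - 1.5, jy) - oval["r"]

# For a point on the ellipse with the same axes this value would be exactly 1.
ellipse_value = (jx / a) ** 2 + (jy / b) ** 2

# Focal property of the ellipse: the distances to the two foci sum to 2a.
f = math.sqrt(a * a - b * b)
t = 0.7
px, py = a * math.cos(t), b * math.sin(t)
focal_sum = math.hypot(px - f, py) + math.hypot(px + f, py)
print(oval, gap, ellipse_value, focal_sum)
```

With these sample values the junction falls slightly inside the ellipse (the ellipse expression evaluates to about 0.97 rather than 1), which illustrates how close the two curves are to the eye while remaining different mathematical objects.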
However, the pragmatic and technological demands of architecture are of such importance that geometrical ambiguity is in no way compatible with site operations and building processes. Italian sacred architecture of the sixteenth century is characterized by an enlargement of the vocabulary of shapes and forms, whose application in church design adds further variations to the theme of the central plan (Norberg-Schulz 2003). In fact, from a functional point of view, the central plan was not fully


appropriate to Christian liturgy. Strict concentricity and radiality did not provide the main path from entrance to altar required for processions and special celebrations. In the mid-sixteenth century, Jacopo Barozzi da Vignola and his disciples were the first to design, and actually build, churches with oval plans and oval domes (Adorni 2008). The oval shape is obviously a clever compromise between a central space and an oriented space. Being oval, the nave has a major and a minor axis, but its closed curved perimeter guarantees its intimacy. Vignola’s very first oval church is Santa Anna dei Palafrenieri and was built in the Vatican for the confraternity of the papal horse keepers. Vignola himself did not live long enough to see it completed; the building operations began in 1572 (a year before he died) and were directed by his son, Giacinto Barozzi, from 1573 on (Fig. 2).

Fig. 2 Jacopo Barozzi da Vignola, Santa Anna dei Palafrenieri, Rome 1575. (Photo: S. Duvernoy)


This innovative geometry will be echoed later on several times by architects influenced by this new design. Santa Anna dei Palafrenieri, with its elegant space, spawned a new typology of religious architecture that would spread all over Europe in the following two centuries. The second oval church built in Rome, San Giacomo degli Incurabili, was designed soon after 1590 by a disciple of Vignola, Francesco Capriani da Volterra (1535–1594). Like Vignola, Francesco Capriani died before the end of the construction, and the church was completed by Carlo Maderno. Other oval churches were soon built in other Italian cities such as Ferrara and Naples. The new design spread quickly abroad also, first in Spain and in Malta, where oval churches were built before the end of the seventeenth century, and later on in northern countries such as France and Austria. The unique geometrical properties of a shape – planar or solid – suggest solutions to design problems, but they also inevitably raise technological problems. The two aspects are undividable, and the effort of the designer will be to solve the technological problems without altering the beauty and perfection of the geometry. The geometric challenges of an oval design reside mainly in the definition of the plan layout and in the construction of the oval dome (Duvernoy 2015). Each oval layout is characterized by the proportional ratio between the axes of symmetry of the curve and the position of the centers of the paired arcs. An overview and comparison of some of the diagrams of the main churches of the seventeenth century show both the differences and the similarities between the layouts. No two churches are alike, but, like variations on a single theme, they all refer to the same design principles. Very few historical drawings are preserved, and not all of those show the geometric diagram underlying the final form. 
In this sense, the drawings by Francesco Borromini showing the plan of San Carlo alle Quattro Fontane (also known as San Carlino, built in Rome in 1634) are unique. They are among the very few authentic drawings revealing the recipe for the design. However, the shape of the church only follows the diagram in a loose way. In earlier oval churches, the wall surrounding the nave followed a strict oval curve that was interrupted or opened where it intersected the geometrical and symmetry axes, in order to insert the side altars or chapels. Instead, Borromini designed a sinuous wall whose alternating concavities and convexities enclose both the nave and its niches. The symmetrical side chapels, and the main altar recess, do not appear as additions external to the oval central space: they are part of it. The oval pattern acts as a starting diagram which is then deliberately deformed to produce a more dynamic space. The upper architrave supported by 16 columns is not oval in plan: it is alternately straight and bowed. Borromini deliberately played with the rule, to create what could almost be defined today as a “free form.”

Gianlorenzo Bernini, too, played with the rule. In Sant’Andrea al Quirinale, built in Rome in 1658, he proposed a new kind of elongated centrality. The oval nave is oriented transversally to the entrance direction: the path from the door to the main altar follows the minor axis of the central space. The church had to have five altars, dedicated to five different Saints. Bernini thus designed an oval church with a total of ten surrounding niches and chapels. The central space can be read as an


elongated decagon. The geometrical figure of the decagon is rarely adopted when double symmetry is required, precisely because it cannot be divided into four equal quarters (see below). Here Bernini takes advantage of this property in order to stress the visual perspective along the minor axis, blocking the elongation of the major axis. The central space is quite simple in shape and volume; it is covered by a large oval dome resting on the strong entablature supported by pilasters. In order to design a solemn entrance to the church, Bernini transformed the awkward convex shape into a concave curve. Semicircular steps, covered by a semicircular porch, lead from the street to the main door, and the convex protrusion of the porch itself is framed and enclosed by two symmetrical, curved, concave walls, extending sideways. The façade design is dated to 1669, some 10 years after the design of the church itself, and the entrance ensemble was built after 1676 (Spagnesi and Fagiolo 1983, 227). To a certain extent, this urban arrangement may be seen as a small-scale imitation of his own design for the monumental Piazza San Pietro, in Rome.

The oldest description of the geometry of oval domes is contained in the Tratado de Arquitectura, written around 1580 by the Spanish architect Alonso de Vandelvira (1544–1626) (Vandelvira 1977). This first Spanish scientific theorization testifies to the intense cultural and scientific exchanges between Italy and Spain in the late sixteenth and early seventeenth centuries. A few years after the completion of San Carlo alle Quattro Fontane (?) for the Spanish Trinitarians, Diego Martinez Ponce de Urrane built the church of the Virgen de los Desamparados between 1652 and 1666, the first oval church in Valencia (Fig. 3).
Two important oval domes had already been constructed in Spain by then: the dome above the choir of the cathedral of Cordoba and the dome of the Sala Capitular of the Cathedral of Seville, both in the second half of the sixteenth century. Later, the oval Oratory of San Filippo Neri was built in Cadiz by the architect Blas Diaz (?) at the end of the seventeenth century, in roughly the same years in which Mattia de Rossi (1637–1695), a disciple of Bernini, was building the church of Santa Maria dell’Assunta in Valmontone, near Rome. Vandelvira discussed six different kinds of oval domes in his treatise. The theoretical – and practical – problem was to define the geometry of the “meridian” and “parallel” ribs of the domes. When the cross section of the dome is a semicircle and the longitudinal section is an oval (Vandelvira’s case study three), or when the longitudinal section is a semicircle and the cross section is an oval (case study four), what is the shape of a meridian rib? Vandelvira drew the curves point by point with the help of the combined orthogonal projections. The question of the geometrical shape of the dome’s ribs brings us back to the question of the mathematical difference between the ellipse and the oval. Many Italian oval domes have semicircular cross sections. If the longitudinal section of the dome is a semioval, in order to match the geometry of the plan, then the dome intrados is a surface of revolution around the major axis. Any cross section is thus a semicircle, and by placing semicircular transverse centerings, the dome may be easily built by successive rings, until it is closed. But if the longitudinal section is drawn as an “elongated” semicircle as prescribed by the traditional method explained both by Leonardo da Vinci (1452–1519) and


Fig. 3 Diego Martinez Ponce de Urrane, the Virgen de los Desamparados, Valencia, 1652–1666. (Photo: S. Duvernoy)

Albrecht Dürer (1471–1528), the resulting curve is obviously an ellipse, and any meridian rib of such a dome is elliptic. Conversely, its parallel ribs (i.e., the horizontal sections) are ovals of the same nature as the pattern of the plan, but they are not concentric: the quadrilateral formed by the centers shrinks regularly with the curve itself, as the section reaches to the top of the dome (Fig. 4). Solid geometry raises mathematical problems that were solved empirically by architects and builders before being theorized by mathematicians in the seventeenth century (Huerta 2007). It was the Swiss mathematician Paul Guldin (1577–1643) who discovered in 1640 the elliptic nature of the elongated semicircle. The combination of planar oval sections and vertical elliptic sections in the same volume clarifies


Fig. 4 The complex geometry of an oval dome. (Drawing: S. Duvernoy)

why the mathematical indistinctness between ellipse and oval is so persistent in architectural literature, from Sebastiano Serlio (1475–1554) onward.
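The elliptic nature of the “elongated” semicircle can be checked with a short derivation. The sketch below assumes that “elongating” means a uniform horizontal stretch of the semicircle by a factor k > 1, which is only one reading of the traditional method; the symbols r, k, x, y, X, and Y are introduced here for illustration and do not come from the historical sources.

```latex
% Semicircular cross section of radius r (upper half, y >= 0):
x^2 + y^2 = r^2 .
% Assumed elongation: stretch horizontally by k > 1, keep heights unchanged:
X = k\,x , \qquad Y = y .
% Substituting x = X/k and y = Y gives
\left(\frac{X}{k}\right)^{2} + Y^{2} = r^{2}
\quad\Longleftrightarrow\quad
\frac{X^{2}}{(k r)^{2}} + \frac{Y^{2}}{r^{2}} = 1 ,
% which is the upper half of an ellipse with semi-axes a = k r and b = r.
```

Under this assumption the stretched profile is a true half-ellipse, whereas the plan remains an oval composed of circular arcs, so the two kinds of curve coexist in the same dome.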

Odd Polygons and Complex Curves

Among the beautiful shapes that Renaissance texts recommended, some were easier to handle than others. Mirrored symmetry was one of the basic design principles of the Renaissance and Baroque times. Façades, plans, and volumes were all drawn following the concept of symmetry with respect to a central axis. Therefore, figures


having an even number of sides were most suitable to be inserted in architectural geometric diagrams. The square, the octagon, and the dodecagon were the most suitable polygons for central plan design. They were also fully appropriate to solve any problem related to the crossing of two directions, such as – for instance – the nave and the transept of a church. The hexagon and the decagon were much less common, because although their numbers of sides are even, they are not – as the ancient Greeks would put it – “even-even” numbers, i.e., multiples of four. Furthermore, the hexagon is not inscribable in a square. Its use in a geometrical diagram generates spaces that expand in three directions: a quality that would be developed much later in architectural design.

In the Renaissance, figures with a double symmetry with respect to two orthogonal axes were the most common. But Baroque designers enjoyed pushing the limits. For example, Francesco Borromini designed the Chapel of Sant’Ivo alla Sapienza (built in Rome in the mid-seventeenth century) shaping its plan on the basis of an equilateral triangle. Of course the triangle recalls the idea of the Holy Trinity and therefore adds spiritual symbolism to the design of the chapel, but the odd number of sides and the narrow angles of the vertices raise various design problems, not least the position of the entrance door and the location of the main altar. In order to expand the space, Borromini designed three large semicircular chapels along the sides of the triangle, each facing the opposite vertex, and softened the sharply pointed polygon by rounding off the vertices somewhat. The door could thus be put on a vertex and the altar in the facing chapel. The dome is a complex composition of three-dimensional curved surfaces and rests on an entablature that mirrors the plan diagram: triangular with added semicircles on each side. On the cupola stands a lantern which is topped with a spiral shape, surmounted by a Cross.
When addressing spiral curves, architectural design enters the realm of infinity and pure symbolism. The spiral is an infinite curve; it may have a beginning but has no end. It may be either “arithmetic” (regular expansion) or “logarithmic” (accelerated expansion). Borromini’s spiral is arithmetic. Archimedes (287–212 B.C.) defined the arithmetic spiral as the planar curve described by a point moving at constant speed along a line, while this line rotates with constant angular velocity about one of its extremities. The turns of this kind of spiral are equidistant when measured along a radius. Pappus of Alexandria (290–350 A.D.) defined the conic spiral as the curve described by a point moving at constant speed along a line, while this line rotates at constant angular speed and at a constant inclination around an axis. Borromini’s spiral on top of the lantern is a conic spiral: it rises up to the sky, adding mystical verticality to the building.

There is no evidence that Borromini studied the ancient mathematical treatises by Archimedes and Pappus or even a textbook referring to them. His knowledge of complex curves might have been purely practical or fragmentary. However, the Sant’Ivo chapel shows how the architectural shape grammar was enlarged in Baroque times, thanks to innovative designers such as Borromini.
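The difference between the three spirals described above can be made concrete with a small numerical sketch. It is only an illustration: the function names, the constants `a` and `b`, and the 20-degree inclination are assumptions chosen for the example, not values taken from Borromini’s lantern or from the ancient sources.

```python
import math


def archimedean_radius(theta, a=1.0):
    """Arithmetic (Archimedean) spiral: the radius grows linearly, r = a * theta."""
    return a * theta


def logarithmic_radius(theta, a=1.0, b=0.1):
    """Logarithmic spiral: the radius grows geometrically, r = a * exp(b * theta)."""
    return a * math.exp(b * theta)


def conic_spiral_point(theta, a=1.0, inclination=math.radians(20)):
    """Point on a conic spiral with its apex at the origin and z as the axis.

    The generating point has travelled s = a * theta along a line that keeps
    the fixed angle `inclination` to the axis while turning by theta around it.
    """
    s = a * theta
    rho = s * math.sin(inclination)  # distance from the axis
    return (rho * math.cos(theta), rho * math.sin(theta), s * math.cos(inclination))


# Sample the two planar spirals once per full turn, i.e. along one radius.
full_turns = [2 * math.pi * n for n in range(1, 6)]
arithmetic_gaps = [
    archimedean_radius(t2) - archimedean_radius(t1)
    for t1, t2 in zip(full_turns, full_turns[1:])
]
logarithmic_ratios = [
    logarithmic_radius(t2) / logarithmic_radius(t1)
    for t1, t2 in zip(full_turns, full_turns[1:])
]

print("gap between successive turns (arithmetic):", [round(g, 6) for g in arithmetic_gaps])
print("growth factor per turn (logarithmic):", [round(q, 6) for q in logarithmic_ratios])
print("conic spiral sample:", conic_spiral_point(3.0))
```

The arithmetic spiral shows a constant gap between turns, the logarithmic spiral a constant growth factor, and every sampled point of the conic spiral keeps the same ratio between its distance from the axis and its height.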


Literary Sources and Onsite Studies

The question of the relationships between mathematics and architecture in the Baroque period raises the question of the connections between built projects (architectural monuments) and written sources of the era (architectural treatises). There is evidence, for example, that the mathematical beauty of the golden section was known to Renaissance architects. Thus the fact that it is not mentioned in any architectural treatise does not mean that authors were unaware of it. It only shows that, for some reason, it had not been written down or at least not by them or not at that time. The analysis of Early Baroque architecture through the mere study of contemporary treatises would not suffice to underline all its richness and its many innovative aspects. Indeed, there is a lack of literature regarding this period (Kruft 1994).

Oval layouts, for instance, are discussed at some length only in Serlio’s treatise. Serlio lists four possible layout constructions, which in fact reduce to three, because diagram four is a special case of the general rule shown in diagram one. Those diagrams come from the studies that he carried out with Baldassare Peruzzi on ancient Roman amphitheaters, which, indeed, acted as inspiring examples. After Serlio, oval diagrams were regularly discussed in treatises ranging from architecture to military engineering to stone cutting. Pietro Cataneo (ca. 1510–1569/1573) and Buonaiuto Lorini (ca. 1540–1611) are among the Italian authors of the late sixteenth century who, following Serlio, included examples of oval shapes in the chapters of their treatises dedicated to theoretical geometry. The figures that illustrate the discussions are very similar to those by Serlio: the basic diagrams are still those of the inscribed double square and the inscribed double equilateral triangle. No further discussions are present and no additions are made.
These same graphics are still visible in a treatise by Ferdinando Galli-Bibiena (1656–1743), first published in Bologna in 1731, nearly two centuries after Serlio’s book. No updates inspired by recent developments are mentioned, as if theory and practice had never interacted on that subject. Accurate modern studies – and actual measured surveys of oval churches – show that reality is much more varied than that. Many geometrical patterns that cannot be found in any written source were unveiled by recent on-site inquiries. This suggests that either additional oral rules circulated among architects or ongoing research and experiments were conducted by individual designers who were more eager to apply them in their projects than to publish them in texts.

The intensity of cultural exchanges between disciplines related to science and art shows only partially in Late Renaissance and Early Baroque written sources. The notions of theoretical geometry present in architectural treatises are no more than a shadow of the knowledge of the time, and innovative applications do not immediately echo in specialized literature. However, the mathematical skills of Baroque architects, and their ability to play with the rules in order to devise new design solutions,


indisputably show in the monuments that they actually built. It is up to scholars interested in this matter to find them and point them out.
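The oval diagrams discussed in this section, in which paired arcs are struck from four centers, can be sketched numerically. The code below is a minimal illustration, not a reconstruction of any surveyed church: the coordinate convention, the function names, and in particular the two parameter sets labeled “double triangle” and “double square” are this sketch’s own reading of those diagrams and should be checked against Serlio’s plates.

```python
import math


def four_center_oval(c, d, r):
    """Half-axes of an oval drawn with two pairs of circular arcs.

    Small arcs: radius r, centers at (+c, 0) and (-c, 0).
    Large arcs: centers at (0, +d) and (0, -d).
    For the arcs to join without a kink, each junction must lie on the line
    through a large center and a small center, which forces
    R = r + hypot(c, d).
    Returns (half_length, half_width, R).
    """
    R = r + math.hypot(c, d)
    half_length = c + r  # extent along the axis carrying the small centers
    half_width = R - d   # extent along the axis carrying the large centers
    return half_length, half_width, R


def junction_point(c, d, r):
    """First-quadrant point where a small arc meets a large arc."""
    dist = math.hypot(c, d)
    # Start at the small center (c, 0) and move r further along the ray
    # that leaves the large center (0, -d) and passes through (c, 0).
    return (c + r * c / dist, r * d / dist)


# Assumed parameter sets for a unit side length (hypothetical readings):
diagrams = {
    # centers at the vertices of two equilateral triangles sharing a side,
    # small circles passing through each other's center
    "double triangle": (0.5, math.sqrt(3) / 2, 1.0),
    # small centers at the centers of two adjacent squares, large centers
    # at the ends of their common side
    "double square": (0.5, 0.5, math.sqrt(2) / 2),
}

for name, (c, d, r) in diagrams.items():
    half_length, half_width, R = four_center_oval(c, d, r)
    print(f"{name}: large radius {R:.4f}, axis ratio {half_length / half_width:.4f}")
```

Changing `c`, `d`, and `r` independently shows how many different ovals share the same four-center principle, which is consistent with the variety that measured surveys reveal.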

Perspective and Anamorphosis

By 1482, the publication of Piero della Francesca’s (1415–1492) treatise De Prospectiva Pingendi had provided contemporary artists with the necessary theoretical knowledge that allowed them to simulate – by painting – natural views of any architectural space or urban scenery. Surely enough, when designers and artists are offered a new professional tool, they are instantly eager to use it, and they want to figure out all the applications that the new device makes possible. Therefore, many creative art works were inspired by perspective science, spanning from trompe-l’oeil to anamorphoses.

The art of trompe-l’oeil is the art of deceiving viewers by painting on walls landscapes or architectural spaces that seem real. Trompe-l’oeil paintings are full-scale perspective representations, and they appeared in art history as early as the end of the fifteenth century, right after the mathematical rules of perspective were definitively established. The first celebrated example is the fake choir of Santa Maria presso San Satiro, in Milan, painted by Donato Bramante in 1483 on the shallow niche behind the church altar. The painting is meant to produce the illusion that the church is built according to a double symmetrical plan shaped as a Greek cross, whereas its true form is only T-shaped. In order to be perfectly deceiving, trompe-l’oeil paintings have to be seen from the viewpoint chosen by the artist to construct the perspective drawing. In Santa Maria presso San Satiro, this viewpoint is located in the center of the nave, close to the entrance of the church. Like most Italian perspective paintings, Bramante’s fresco is a central perspective and has to be looked at frontally. On the contrary, when a painted decoration has to be looked at sideways, from an unusual position and with an unusual oblique sight direction, then the painting is technically no longer a perspective but an anamorphosis.
This is the case, for instance, of the fake cupola painted by Andrea Pozzo (1642–1709) in 1685 above the altar of the church of Sant’Ignazio di Loyola in Rome. From a viewpoint located in the center of the nave close to the entrance door, looking up obliquely to the ceiling above the altar, the visitor sees a cupola that quickly reveals its true nature as a flat image when he moves along the nave to see it from below. The geometric rule for the correct construction of an anamorphosis is quite simple: it comes from distorting the perspective rule (Baltrušaitis 1984) (Fig. 5). The first artist to be concerned with trompe-l’oeil paintings that could not be looked at frontally was Leonardo da Vinci, who suggested artificially reducing the natural foreshortening effect in the distance. Anamorphosis was then used to produce special effects of “forced perspective” (illusory space elongation or depth reduction) or to produce special art works. Both artists and architects enjoyed playing the game. In 1642, Emmanuel Maignan (1601–1676), a French Jesuit priest, painted a fresco showing San Francesco


Fig. 5 The geometrical principle of oblique planar anamorphosis. (Drawing: S. Duvernoy)

di Paola on the wall of the cloister of the convent of Trinità dei Monti in Rome. When looked at frontally, the fresco shows a mountain landscape, but when looked at sideways – while walking along the corridor – the viewer sees the patron saint of the Order of Minims. Maignan explained how he proceeded in a book entitled Perspectiva Horaria, published in Rome in 1648. Borromini, too, played with forced effects of perspective. In the entrance of Palazzo Spada, in Rome, he designed a portico whose depth appears much greater than it really is, for the new owner, who wanted to have his Renaissance palace modernized. Thanks to the accelerated reduction of the column heights and the sloping floor, the end of the gallery seems much further away than it really is.
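The principle of oblique planar anamorphosis described in this section can be sketched as a ray construction: an ordinary picture is imagined on a plane facing the viewer, and each of its points is carried along the sight line until it meets the real wall. The example below is a hedged illustration only; the viewer position, the wall plane, the dimensions, and all function names are hypothetical and are not taken from Maignan’s or Pozzo’s works.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, k):
    return tuple(k * x for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_to_wall(eye, point, wall_point, wall_normal):
    """Return the point where the sight line from `eye` through `point` meets the wall plane."""
    direction = sub(point, eye)
    denom = dot(wall_normal, direction)
    if abs(denom) < 1e-12:
        raise ValueError("sight line is parallel to the wall")
    t = dot(wall_normal, sub(wall_point, eye)) / denom
    if t <= 0:
        raise ValueError("wall is behind the viewer along this sight line")
    return add(eye, scale(direction, t))


# Hypothetical setup (metres): the wall is the plane y = 0; the viewer stands
# 1 m away from it and looks obliquely at a spot 5 m further along the wall.
eye = (0.0, 1.0, 1.6)
target = (5.0, 0.0, 1.6)
sight = sub(target, eye)
sight = scale(sight, 1.0 / math.sqrt(dot(sight, sight)))
right = (-sight[1], sight[0], 0.0)  # horizontal unit vector perpendicular to the sight line
up = (0.0, 0.0, 1.0)

# The undistorted picture: a 0.2 m square on a virtual plane 1 m in front of the eye.
center = add(eye, sight)
corners = [(-0.1, -0.1), (0.1, -0.1), (0.1, 0.1), (-0.1, 0.1)]
square = [add(center, add(scale(right, u), scale(up, v))) for u, v in corners]

# Its anamorphic image on the wall.
wall_image = [project_to_wall(eye, p, (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)) for p in square]

for p, q in zip(square, wall_image):
    print([round(v, 3) for v in p], "->", [round(v, 3) for v in q])
```

Seen from `eye`, each wall point hides exactly behind its source point, so the stretched quadrilateral on the wall reads as a square; from any other position the stretching becomes visible.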

Baroque Polymathy

Guarino Guarini is the paradigm of the Baroque international polymath. A Theatine priest, he traveled and worked in several European countries (Southern Italy, Portugal, and France) and wrote major scientific treatises on several subjects spanning from theology, philosophy, and astronomy to mathematics and architecture. He designed and built palaces and churches in Messina, Lisbon, and Paris and ended his career in Torino where he worked for the Savoia family, the rulers of Piedmont. Guarini used to define himself as a mathematician. He wrote a treatise in Latin, of about 700 pages, entitled Euclides Adauctus et Methodicus Mathematicaque Universalis, in which he follows the steps of Euclid, book by book, adding personal corollaries and propositions. This book alone testifies to the extent of his knowledge in theoretical mathematics. Among his many other writings, there are two books that deal with architecture, both written in Italian. The first is Modo Di Misurare le Fabriche. The second is L’Architettura Civile, published posthumously, 50 years after his death, by Bernardo Vittone (1704–1770).


L’Architettura Civile is very different from all the previous architectural treatises, and – from different points of view – it is also very modern. It is divided into five books. Book I deals with theoretical mathematics. Book II deals with questions of ichnographia, i.e., problems of plan design and representation, in which the author mixes elements of graphic geometry with elements of architectural history and theory. Similarly, book III deals with orthographia: problems of the design and representation of façades with a lesson on classical orders. Book IV is about ichnographia gettata, i.e., projective geometry and deals with problems of vaults and stereotomy. Book V addresses “geodesy” (problems of measuring and quantifying areas of irregular shapes) from a very theoretical approach that recalls Euclidean geometry.

In this treatise, the mathematical knowledge that underlies the architectural design is no longer reduced to a shadow. It is described in depth in the first book, and references continuously emerge in the following pages, through to the end of the volume. In Book II Guarini discusses ovals and elliptical shapes from a pragmatic standpoint, giving rules – for instance – for the design and drawing of columns around an oval courtyard. In Book III he discusses spirals (even oval spirals) and parabolas and also mentions Nicomedes’s conchoids. Guarini claims that architecture comes from mathematics. In his treatise, he refers to both ancient mathematicians, such as Archimedes and Nicomedes, and contemporary colleagues and scholars, proving that his knowledge was not only academic but updated with the most recent works in progress. His scholarly approach is very modern. In his writing he cites either scientists’ names, in passing, as experts in the subject that is being discussed, or books, mentioning title and page number, not least his own book, Euclides Adauctus . . .
We can assume that during his travels and stays abroad, Guarini had the opportunity to meet and have fruitful exchanges with several fellow mathematicians and architects. Those were the years in which Girard Désargues (1591–1661) and Abraham Bosse (1602–1676) were working in France, studying what the French used to call “la pratique du trait,” i.e., the way to represent on paper the tricky three-dimensional shapes of stones required for building complex vaults. Book IV of L’Architettura Civile deals with problems of vault design, and the tables that illustrate the book are full of drawings whose sketching technique anticipates what would be later theorized by Gaspard Monge (1746–1818) and become the science of descriptive geometry. As a novice, when he was in Rome, Guarini had the opportunity to admire Borromini’s work, especially San Carlo alle Quattro Fontane, which had a strong influence on his own work. He himself designed and built many churches and palaces. Some were destroyed. Many are preserved. Both the drawings contained in his treatise and the extant buildings show how daring and innovative his style was. In L’Architettura Civile, Book I, chapter III, sixth remark, he says: “Architecture may modify ancient rules, and invent new ones.” This is a clear declaration of independence with respect to classical tradition. He says it in writing, while predecessors such as Bernini and Borromini only asserted it by building.


Fig. 6 Guarino Guarini, San Lorenzo, Torino, inaugurated in 1680. The ribbed dome. (Courtesy of K. Williams)

The church of Sainte Anne La Royale in Paris, on which he worked around 1662, was demolished in 1823, but from the drawings published in the treatise, it seems to have been designed following the diagram of a regular pentagon. The church for the Padri Somaschi in Messina (Sicily) had a hexagonal plan diagram. Other basilical churches show how easily he could manipulate circular and oval shapes, combining them to form an elongated nave, flanked by side chapels, creating very fluid spaces. However, the vaults are his most spectacular achievements. Unlike his colleagues and predecessors, Guarini claimed that Gothic architecture was beautiful. His own ribbed vaults, showing arcs that cross the covered space, with large windows in between, indeed recall the Gothic taste for visible construction technology and largely open façades (Fig. 6).

Conclusion

Classical culture considered arithmetic and geometry to be two of the four liberal arts of the quadrivium, which also included astronomy and music. When writing De Architectura Libri Decem, Vitruvius was attempting to elevate the status of architecture to that of a liberal art, and the form taken by the treatise – a literary work – followed from that desire (Williams and Duvernoy 2014). In the early Renaissance, this ambiguity, the question of the status of architecture, was still present. In fifteenth-century Italy, architecture was still considered (together with painting and sculpture) one of the “three arts of drawing.” Renaissance architects were all trained as painters before


becoming architects. The first treatise on perspective was written by a man who was both a painter and a mathematician. The relationship between mathematics and architecture aimed at being a reciprocal relationship between arts, in which mathematical beauty and symbolism would suggest rules to generate architectural beauty. At the end of the seventeenth century, we can notice an evolution to another kind of relationship. Guarino Guarini was first a mathematician, then an architect. He claimed that architecture stemmed from mathematics. According to him, architecture was a science that came from another science – mathematics – the latter providing the necessary support for the former. Therefore, the relationship between architecture and mathematics evolved to a relationship between sciences. This change of status, of course, did not happen as a sudden and irreversible switch. As with their Renaissance fellows, many Baroque designers were also famous artists: Bernini was a prominent sculptor. However, this kind of approach was definitely modern, compared to Renaissance culture, and it would be the one that would spread later on, in the following centuries.

Cross-References

Renaissance Architecture

References

Adorni B (2008) Jacopo Barozzi da Vignola. Skira, Milan
Baltrušaitis J (1984) Anamorphoses ou Thaumaturgus Opticus. Flammarion, Paris
De Vandelvira A (1977) In: de Lisle GB-C (ed) Tratado de Arquitectura. Caja de Ahorros Provincial, Albacete
Duvernoy S (2015) Baroque oval churches: innovative geometrical patterns in early modern sacred architecture. Nexus Netw J 17(2):425–456
Guarini G (1737) L’Architettura civile, Torino
Huerta S (2007) Oval domes: history, geometry and mechanics. Nexus Netw J 9(2):211–248
Kruft H-W (1994) History of architectural theory. From Vitruvius to the present. Princeton Architectural Press, Princeton
Norberg-Schulz C (2003) Baroque architecture. Electa Architecture, Milan
Spagnesi G, Fagiolo M (eds) (1983) Gian Lorenzo Bernini Architetto e l’Architettura Europea del Sei-Settecento. Treccani, Rome
Vasari G (1550) Le Vite de’ più eccellenti pittori, scultori e architettori, Florence. (English translation: Conaway Bondanella J, Bondanella P, The lives of the artists. Oxford University Press, Oxford)
Williams K, Duvernoy S (2014) The shadow of Euclid on architecture. Math Intell 36(1):37–48

46 Temple of Solomon

Tessa Morrison

Contents
Introduction
Villalpando’s Flawless System
Ezechielem Explanationes’ Influence
Conclusion
References

Abstract

In the 1580s, two Spanish Jesuit priests, Father Jerome Prado and Juan Bautista Villalpando, began to work on a collaborative scriptural exegesis of the Book of Ezekiel. Prado instigated the project and Villalpando, an architect, was commissioned to complete the commentary on chapters 40–43, which contained the architectural description of the Temple of Jerusalem. The project was financed by the Spanish King Philip II for over 20 years. In 1592, both priests moved to Rome to complete the project. The project had many problems from the beginning, as Villalpando and Prado disagreed about the design and the importance of Solomon’s Temple. Prado believed that Ezekiel’s Temple was a Temple of the future and not Solomon’s Temple. He claimed that the architecture of the precinct of Solomon’s Temple followed the description from the twelfth-century Rabbi Moses Maimonides, whose ground plan was asymmetrical, whereas Villalpando believed that Ezekiel’s Temple was the Temple of Solomon, and that the precinct of the Temple was highly symmetrical and represented the microcosm of the macrocosm – the earthly image of the heavens. However, Prado died in 1595

T. Morrison, The School of Architecture and Built Environment, The University of Newcastle, Newcastle, NSW, Australia
© Springer Nature Switzerland AG 2021 B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_3


after the first volume had been completed, and Villalpando found himself in charge of the entire project. In 1604 Villalpando published a further two volumes. One of these volumes was entirely dedicated to the architecture of the Temple of Solomon as described by Ezekiel. This volume was entitled In Ezechielem explanationes et apparatus urbis ac templi Hierosolymitani (Ezekiel’s explanation and the preparation of the cities and of the Temple of Jerusalem), and it almost immediately stimulated a heated debate that lasted for 150 years. It was either heavily criticized or profoundly praised. This debate was not only a religious one. There were commentaries from architects, professors of astronomy, scientists, and laypeople. The debate concerned the religious, architectural, and mathematical representation of the building. The text of Ezekiel is problematic; it is vague and has contradictory measurements, and the entire precinct of the Temple is not described. This chapter considers the 150 years of debate that Villalpando began, which brought a new mathematical approach to the reconstruction of the Temple.

Keywords Solomon’s temple · Sacred architecture · Juan Bautista Villalpando · John Greaves

Introduction

The Temple of Solomon is mentioned in Middoth and the Hebrew Bible, or Tanakh, which is the basis of the Old Testament. In the twelfth century, Moses Maimonides reconstructed the Temple in Book Eight of the Code of Maimonides (Mishneh Torah) (Maimonides 1949). The measurements for this reconstruction came straight from the Middoth, and Maimonides’ reconstruction of the Temple remains one of the most important reconstructions in the Jewish tradition. From the Old Testament, over the centuries, I Kings 6–8 and II Chronicles 3–4 were considered to be the description of the first Temple, and Ezekiel 40–43, which included a great deal more architectural description, was considered to be the Temple of the future.

Prior to the publication of Ezechielem explanationes, although Solomon’s Temple played an important role in both Judaism and Christianity, its role was religious and ritual. Although its architecture was discussed, that discussion was on the use of the architecture rather than the architecture itself. In Villalpando’s use of the text of Ezekiel, he was going against the traditional interpretation of biblical texts where Ezekiel’s Temple was the Temple of the future. Villalpando’s justification for using the Book of Ezekiel as his main source was that the numbers and proportions of the building are so absolutely perfect that it provided the perfect rules for architecture, which would permit no deviation (Villalpando and Prado 1604, 16ff).


Villalpando’s Flawless System

Villalpando changed the perspective of the Temple, and he envisaged that the Temple of Solomon was a building that encapsulated the entire formal grammar of classical architecture. The Temple had harmonic proportions, and Villalpando’s interpretation of ancient measurements appears to create a flawless system. This flawless system is a combination of the Pythagorean-Platonic musical harmonies, the hermetic worlds of microcosm-macrocosm, a cosmic astrological plan that determines the Temple precinct, Vitruvian anthropomorphism, and the module that governs the buildings. Villalpando considered Vitruvius to be the pioneer of architecture, who had codified the norms of architecture. These norms of profane architecture had been derived by Vitruvius from sacred architecture, and in particular from Solomon’s Temple, which itself had been derived from the plan of the Tabernacle given to Moses on Mount Sinai. These norms constituted a natural order. Villalpando believed that “Sacred architecture constitutes the origins of architecture, and that the profane one is like a copy, or, better still, like a shadow of sacred architecture” (Villalpando and Prado 1604, 414). Villalpando claimed that when Vitruvius had studied the norms of architecture, he derived his three classical orders – Ionic and Doric from the elevation of the Holy of Holies, and Corinthian, which was solely derived from the Solomonic order (435).

Throughout Ezechielem explanationes Villalpando emphasized the concept of the Divine Architect and that the Temple was a plan drawn by the hand of God, who had “ordered all things in measure, and number, and weight” (Wisdom 11:21). He carefully defined all of the measurements of the Temple, which he believed were derived from the sacred texts and supported by profane texts such as those of Josephus. Villalpando supplied all of the measurements of the three main floors of the buildings that form the precinct of the Temple. In Fig. 1, column one gives the measurements of the House of the Lord; column two gives the measurements of the atrium, and column three those of the house of the King. These columns are grouped under the headings: the diameter of the columns, the height of the columns, the height of the entablature, the height of the floors, the height of the balcony, and the overall height of the buildings. The measurements are in harmonic relationship with each other. The atrium’s measurements are double those of the King’s house, and the measurements of the Lord’s house are double those of the atrium. In all of the columns, the measurements reveal that the second floor is a quarter part smaller than the measurements of the first floor, or a third part of its own measurement smaller than the first floor; the third floor is a fifth part smaller than the second-floor measurements, or a fourth part of its own measurements smaller than the second floor. Proceeding in the same way makes it possible to calculate the measurements of the remaining floors: for example, the fourth floor will be a sixth part smaller than the measurements of the third floor, or a fifth part of its own measurements smaller than the third floor, and so on (Fig. 1).

Villalpando examined the heights of the entablatures and columns. He then considered the atrium’s columns. The atrium’s height is 60 cubits; the height of

Fig. 1 Symmetry of sacred architecture

T. Morrison

46 Temple of Solomon


the first-floor columns is 20 cubits, a third of the overall height; the height of the second-floor columns is 15 cubits, a quarter of the overall height; and the height of the third-floor columns is 12 cubits, a fifth of the overall height. All of these ratios are replicated in the height of the columns in relation to the overall height. The height of the entablature of the first floor is 1/12 of the overall height; for the second floor it is 1/16, and for the third floor it is 1/20. The ratio between the first and second floor is 4:3, and between the second and third floor it is 5:4. To evaluate the heights of the elements of the entablature, he divided the height of the entablature by eight: the frieze and the crown are each three parts, and the remaining two parts are the height of the architrave. The widths of the metopes and triglyphs can be calculated from the distance between the centers of the columns; on the first floor, the width of the triglyph was equal to the height of the architrave, and the width of the metope was equal to the height of the frieze. The width of the triglyph of the second floor to the first is in the ratio 3:2 and the third to the second 4:3. For Villalpando, all of the measurements of the entire Temple were based on the proportions of the columns. The symmetry and the proportions of the different parts of the Temple have no arithmetic progression; following the same reasoning, the proportions can be verified in the numbers of the rows of the vestibule, which Villalpando deduced from the sacred texts. These are 50 – 40 – 30. Using the monochord, Villalpando demonstrated the relationship between the widths of the triglyphs and the metopes, which resulted in Pythagorean intervals of the musical scale. The interrelationships between these three buildings of Solomon are shown in Fig. 2.
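The column and entablature proportions described above can be checked with a few lines of exact arithmetic. The following is only a sketch: the variable names are illustrative, and the fractions are the ones quoted in the text (a third, a quarter, and a fifth of 60 cubits for the columns; 1/12, 1/16, and 1/20 for the entablatures).

```python
from fractions import Fraction

OVERALL_HEIGHT = 60  # cubits: the atrium height given in the text

# Column heights for floors 1-3: a third, a quarter, and a fifth of the whole.
column_heights = [Fraction(OVERALL_HEIGHT, d) for d in (3, 4, 5)]

# Entablature heights for floors 1-3: 1/12, 1/16, and 1/20 of the whole.
entablature_heights = [Fraction(OVERALL_HEIGHT, d) for d in (12, 16, 20)]

# Ratio of each floor's columns to those of the floor above it.
floor_ratios = [
    column_heights[i] / column_heights[i + 1]
    for i in range(len(column_heights) - 1)
]

# Differences between successive column heights; unequal differences mean
# the heights do not form an arithmetic progression.
differences = [
    column_heights[i] - column_heights[i + 1]
    for i in range(len(column_heights) - 1)
]

if __name__ == "__main__":
    print("column heights:", [str(h) for h in column_heights])
    print("entablature heights:", [str(h) for h in entablature_heights])
    print("floor ratios:", [str(r) for r in floor_ratios])
    print("differences:", [str(d) for d in differences])
```

The floor ratios come out as 4/3 and 5/4, matching the 4:3 and 5:4 stated in the text, and on every floor the column is four times the height of its entablature.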
Vitruvius outlined six harmonic ratios: the fourth (diatessaron), the fifth (diapente), the octave (diapason), the octave with a fourth (diapason with diatessaron), the octave with a fifth (diapason with diapente), and the double octave (disdiapason) (Vitruvius 1960, V. ix). Villalpando followed the translation and commentary of Vitruvius by Daniele Barbaro, entitled De architectura libri decem, cum commentariis Danielis Barbari and published in 1567 (Vitruvius 1567). Barbaro was opposed to Vitruvius’ musical theory, and he believed that architecture was like a singer who sang in tune: “This beautiful manner in music, as well as in architecture is called harmony, mother of grace and of delight” (Wittkower 1988, 139). However, Villalpando misinterpreted Barbaro, claiming that he rejected Vitruvius’ octave with a fourth. Villalpando claimed that the octave with a fourth can be called ‘superpatiens’ (beyond endurance) and that it is truly a discordant chord which is not suitable for architecture; consequently there are only five chords, three simple and two composed.

Villalpando determined the different progressions of the floor plan of the Temple precinct, beginning with the 12 tribes of Israel surrounding the Tabernacle. The distribution of these tribes determined the perfect plan, but the Temple’s dimensions were double those of the Tabernacle. He then demonstrated that the Temple represents the microcosm of the universe, the macrocosm (Fig. 3). Reflected in his floor plan are the celestial orbs, which were fortified by 12 bastions, or fortifications, on the perimeter of the Temple precinct. The positions of these bastions equate with the 12 tents of the tribes of Israel.
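One way to see why the diapason with diatessaron stands apart from the other five intervals is to write all six as string-length ratios. The sketch below assumes the standard Pythagorean values (4:3, 3:2, 2:1), which the text does not state explicitly, and the `classify` helper is a hypothetical illustration rather than Villalpando's own procedure.

```python
from fractions import Fraction

# Standard Pythagorean ratios (an assumption; not quoted in the text).
FOURTH, FIFTH, OCTAVE = Fraction(4, 3), Fraction(3, 2), Fraction(2, 1)

# Compound intervals are obtained by multiplying the ratios of their parts.
consonances = {
    "diatessaron": FOURTH,
    "diapente": FIFTH,
    "diapason": OCTAVE,
    "diapason with diatessaron": OCTAVE * FOURTH,
    "diapason with diapente": OCTAVE * FIFTH,
    "disdiapason": OCTAVE * OCTAVE,
}


def classify(ratio):
    """Return 'multiple' for n:1, 'superparticular' for (n+1):n, else 'other'."""
    if ratio.denominator == 1:
        return "multiple"
    if ratio.numerator == ratio.denominator + 1:
        return "superparticular"
    return "other"


classes = {name: classify(ratio) for name, ratio in consonances.items()}

if __name__ == "__main__":
    for name, ratio in consonances.items():
        print(f"{name}: {ratio.numerator}:{ratio.denominator} ({classes[name]})")
```

Under these assumptions only the diapason with diatessaron (8:3) is neither a multiple nor a superparticular ratio, which leaves the five chords that Villalpando accepted.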


Fig. 2 Table of the parts of the entablature

The circumference of the heavens was divided into 360° according to the movement of the Sun, which every 24 hours returns in a circle around the central earth. The circumference is three times the diameter of the heavens, which is 120°. The height of the Temple, being 120 cubits, equates with the width of the celestial orbit. The atrium symbolizes the residence of man and is 60 cubits in height, which is half the diameter of the heavens, i.e., man dwells under the heaven of heavens. This plan represents both the microcosm and the macrocosm, creating a perfect vision of a geocentric universe. While the Temple precinct emphasized the celestial, the Temple’s proportions equate to man’s proportions. His outstretched arms correspond with this measurement, but if those arms are folded in front of the chest, as in Fig. 4, the width of man will be 1/2 cubits, or 3 ft. The colonnades of the Temple have eight intercolumns, which correspond with the height of the head of man measured from the chin to the upper part, and are divided into three galleries or promenades, which correspond to the barrel of the man’s chest with the arms folded. These colonnades correspond to a proportion of 1:2, which is not only a double square but also the harmonic ratio of the octave. Thus, Villalpando established


Fig. 3 The plan of the temple as microcosm of the universe

that the proportion of the central atrium which surrounds the altar and Temple is a double square. Villalpando’s gridded plan of the Temple (Fig. 5) had 8 open courtyards surrounded by colonnades and incorporated 1500 columns. The Temple precinct itself was 500 × 500 cubits, and the Temple Mount consisted of foundations of 300 cubits, so that the total height, from the foundations to the top of the Temple, was 420 cubits. For Villalpando, the Temple of Solomon and its precinct represented the origin of architecture, and no building could ever surpass such perfection.

Ezechielem Explanationes contains the important elements of the hermetic philosophy of the fifteenth and sixteenth centuries: astrology, music, and allegory. Astral images and configurations are an essential part of the justification of Villalpando’s floor plan. Villalpando claimed that his reconstruction was an “architecture of theology” for the betterment of his brethren. Through the plan and the


Fig. 4 A single colonnade and the resemblance to the division of the human stature

architecture of the Temple, which was a symbol of the microcosm, the brethren would gain a closer understanding of the mind of God. For Villalpando, the act of contemplating or meditating on the image of the macrocosm through the microcosm of Solomon’s Temple was an attempt to reveal the Divine truth. In short, the plan of the Temple of Solomon and the celestial heavens contained the power and understanding of the macrocosm.
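The numerical correspondences that Villalpando drew between the heavens and the Temple reduce to simple arithmetic. The sketch below only restates the figures quoted in this section; the variable names are illustrative, and the factor of three between circumference and diameter is the approximation the text itself uses.

```python
CIRCUMFERENCE = 360        # degrees in the circle of the heavens
CIRCLE_FACTOR = 3          # the text takes circumference = 3 x diameter

TEMPLE_HEIGHT = 120        # cubits
ATRIUM_HEIGHT = 60         # cubits
FOUNDATION_HEIGHT = 300    # cubits, foundations of the Temple Mount

# Width (diameter) of the celestial orbit implied by the circumference.
diameter = CIRCUMFERENCE // CIRCLE_FACTOR

# Total height from the base of the foundations to the top of the Temple.
total_height = FOUNDATION_HEIGHT + TEMPLE_HEIGHT

if __name__ == "__main__":
    print("celestial diameter:", diameter)
    print("Temple height equals diameter:", TEMPLE_HEIGHT == diameter)
    print("atrium is half the Temple height:", 2 * ATRIUM_HEIGHT == TEMPLE_HEIGHT)
    print("total height in cubits:", total_height)
```

The atrium-to-Temple ratio of 60:120 is the same 1:2 proportion that the text identifies with the double square and the octave.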


Fig. 5 Villalpando’s floor plan of the Temple of Solomon

Ezechielem Explanationes’ Influence

The notoriety of Ezechielem Explanationes was spread primarily through commentaries written by supporters and critics. The first commentary, Agostino Tornielli’s Annales Sacri, was published in Milan in 1610, and it was subsequently reprinted six times in Frankfurt, Antwerp, and Cologne over the next 50 years. Louis Cappel made an extensive study of the Temple, which contained abstracts from Villalpando’s Ezechielem Explanationes and was printed in Brian Walton’s Biblia Sacra Polyglotta (Cappel 1657). Both Tornielli and Cappel wrote vehement criticisms of Villalpando’s design. Cappel’s criticism was revised in John Pearson’s Critici Sacri and published in 1660. Nicolaus Goldmann’s Vollständige Anweisung zu der Civil Bau-Kunst, written before 1665, made Ezechielem Explanationes known throughout


Germany. Goldmann agreed with many points that Villalpando raised, and he believed that the dimensions of the Temple, as prescribed in the Bible, should be normative for all architecture (Goldmann 1696, 33). Fischer von Erlach’s Entwurff einer Historischen Architectur, a précis of architectural history printed in Vienna in 1721, included Villalpando’s Temple. Ezechielem Explanationes, with its high-quality etchings, was expensive, and it was through these books, which contained abstracts of and/or commentaries on Ezechielem Explanationes, that it became widely known. At times Villalpando was quoted from these abstracts rather than from his original work (Perrault 1969, 144–147 and note 24).

This interest in the Temple of Solomon was not only scholarly. It led to the construction of large-scale models of the Temple that were exhibited in Germany (Whitmer 2010), Amsterdam, and England. Competing models of the Temple, constructed in different architectural styles, were put on exhibition, and the public flocked to see them (Morrison 2016). Samuel Lee, a natural philosopher and nonconformist minister, published Orbis miraculum, or The temple of Solomon pourtraied by Scripture-light in 1659; a second edition was published in 1665. Although it was also a scriptural exegesis using the book of Ezekiel as its main text, Lee emphasized and illustrated the architecture in his reconstruction. Reconstructing the architecture of the Temple became an integral part of any scriptural exegesis of this period. Inigo Jones studied Stonehenge; a book entitled The most notable antiquity of Great Britain, vulgarly called Stone-Heng on Salisbury plain was printed posthumously, although it was possibly rewritten by John Webb, Jones’ assistant (Jones 1655).
However, the study revealed that Jones believed Stonehenge was connected to the Temple of Solomon as the microcosm of the macrocosm and that, with its classical composition, it was in fact a Roman temple with Tuscan columns (Morrison 2014). Architects and scientists, such as Christopher Wren and Robert Hooke, discussed the Temple’s architectural merits. There were also unpublished reconstructions, such as Newton’s Prolegomena ad Lexici Propretici partem Secundam: De Forma Sanctuary Judaici, commonly called Babson Ms 434 (Newton mid-1680–early 1690s), and William Stukeley’s manuscript entitled The Creation, Music of the Spheres K[ing] S[olomon’s] Temple Microco[sm] – and Macrocosm Compared &C, written between 1721 and 1724 (Stukeley 1721–1724).

Isaac Newton studied the Temple of Solomon for over 50 years. It is often said that his work on the Temple commenced after he had a nervous breakdown in 1693. However, Newton’s work on the Temple had begun by the late 1670s, and some of his key mathematical work on the Temple was completed before 1690 (Morrison 2011). Babson Ms 434 is primarily in Latin, with some Greek and Hebrew and a single English marginal note on one of the six drawings in the manuscript. It is an incomplete working document which works through the measurements of Ezekiel’s Temple. Newton attempted to piece the architecture together from Ezekiel’s measurements and endeavored to explain contradictions within these measurements. For his reconstruction of the Temple, Newton studied Vitruvius to gain a better understanding of architecture. However, he also studied contemporary reconstructions, such as those of Villalpando, Constantin L’Empereur, John Lightfoot and


Louis Cappel, and many others, which represented a wide variety of both Christian and Jewish traditional designs. He also considered a large selection of ancient and mediaeval sources. Newton praised the mathematical approach of Villalpando, but he pointed out that the grid plan cannot be supported, that it lacks reason, and that Villalpando had incorrectly interpreted Ezekiel, which leads to a contradiction between Villalpando’s text and his reconstruction. Within his text Villalpando had clearly established that a portion of the atrium was a double square. This correlated with the proportions of Moses’ Tabernacle. However, by creating the grid with triple colonnades of 50 cubits and atriums of 100 cubits square, the Temple atrium is in fact 100 × 250 cubits. For Newton, to accept Villalpando’s reconstruction was to reject the proportions of the atrium of Moses that surrounded the Temple and the altar. Newton believed Villalpando’s reconstruction was a “fantasy” (Newton mid-1680–early 1690s, f56; Morrison 2008).

The Book of Ezekiel describes two Hebrew cubits, a sacred cubit and what Newton called a vulgar cubit. Ezekiel described the sacred cubit as “the cubit is a cubit and a palm breadth” (Ezekiel 43:13). The measurements of the cubit are essential for Newton’s reconstruction; however, the lengths of these two Hebrew cubits are not known. Newton claimed that a vulgar cubit was 5 palms and a sacred cubit was 6 palms. But the length of a palm, like that of the cubit, varied from nation to nation. In a paper printed posthumously in 1737 entitled “A Dissertation Upon the Sacred Cubit of the Jews,” he carried out an ingenious study to uncover the length of these cubits (Newton 1737). The original paper no longer exists; however, his surviving papers reveal that he had been studying the cubit since the late 1670s (Newton c 1680, c 1670s–1680s).
From the biblical texts, Newton believed that the builders of the Great Pyramid would have used a standard, uniform unit of measurement because of the number of itinerant workers; he called this standard measurement the Memphis cubit. Newton believed the length of the Hebrew cubit could be derived from the cubits of the different nations, for instance, the Babylonian cubit, the Egyptian cubit, the Greek cubit, the Roman cubit, etc. To calculate the size of the Hebrew cubit, Newton approached the problem of the variation in the measurements by assessing ancient authors’ references to cubits, examining their limits, and then comparing them to each other. He also used the work of John Greaves, a professor of astronomy at the University of Oxford, who had made a study of the Great Pyramid entitled Pyramidographia (Greaves 1646). Newton stated that the measurements Greaves had taken of the Great Pyramid could be evenly divided into Memphis cubits. For instance, the length of the base of the Great Pyramid was 692.8 English feet, or 400 Memphis cubits, and the square passageway was 3.463 English feet, or 2 Memphis cubits. (Newton used fractions, which were very clumsy, as they often went to the 1000th of an inch; for the purposes of this chapter they have been converted into decimals for convenience.) Both Newton and Greaves believed that the foot and the cubit were the measurements used by every ancient nation, and that the ratio between the two was 5:9. They also considered that the lengths of these measurements were a true representation


of a foot and a forearm. According to the Talmud, a man’s height is 3 cubits; thus, according to Newton, 3 cubits is more than 5 Roman feet and less than 6 Roman feet. A vulgar cubit would therefore be no less than 20 Roman unciae (a Roman foot equals 12 unciae) and no more than 24 unciae, and a sacred cubit no less than 24 unciae and no more than 28.8 unciae. Newton examined examples from ancient literature to further define his limits, using Josephus’ measurements of the columns of the court of the Jewish temple, which could be embraced by three men with their arms joined (Josephus 1963). Both the Talmud and Josephus claimed that the circumference of these columns was 8 Hebrew cubits. Newton did not follow the Vitruvian man; he added a palm length to the length of the outstretched man, making the Newtonian man more oval or rectangular (Morrison 2010). Thus, the circumference of the column was greater than 15.75 Roman feet and less than 18.75 Roman feet. This further defined the sacred cubit as being greater than 24 unciae and less than 27 unciae. He verified this limit with other texts, particularly from Josephus. Newton confirmed that the vulgar Hebrew cubit was derived from the Memphis cubit when the Jews were held in captivity in Egypt. From Greaves’ measurements, his own estimate of the Memphis cubit, and the 5:6 ratio between the vulgar cubit and the sacred cubit, he concluded that the sacred cubit is 25.6 unciae or 24.816 English inches. This measurement falls within his limit range, confirming its validity.

In his reconstruction, Newton assessed the measurements from both Jewish and Christian sacred texts, as well as ancient texts such as those of Josephus, who had given measurements of the Temple in cubits. Examining Ezekiel’s text, Newton found contradictions which he attempted to explain.
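The first stage of Newton's bounding argument for the cubit can be retraced with exact fractions. This is a minimal sketch under stated assumptions: the function name and variables are hypothetical, only figures quoted in the text are used (3 cubits between 5 and 6 Roman feet, the 5:6 ratio, Greaves' two measurements, and the final 25.6 unciae), and the conversion between Roman unciae and English inches is deliberately left out because the text does not give it.

```python
from fractions import Fraction

UNCIAE_PER_FOOT = 12
SACRED_PER_VULGAR = Fraction(6, 5)  # sacred cubit = 6 palms, vulgar = 5 palms


def cubit_bounds_unciae(low_feet, high_feet, cubits):
    """Bounds on one cubit, in unciae, when `cubits` cubits span the range."""
    low = Fraction(low_feet) * UNCIAE_PER_FOOT / cubits
    high = Fraction(high_feet) * UNCIAE_PER_FOOT / cubits
    return low, high


# A man's height of 3 cubits lies between 5 and 6 Roman feet.
vulgar_low, vulgar_high = cubit_bounds_unciae(5, 6, 3)
sacred_low = vulgar_low * SACRED_PER_VULGAR
sacred_high = vulgar_high * SACRED_PER_VULGAR

# Greaves' measurements in English feet, divided into Memphis cubits.
memphis_from_base = Fraction("692.8") / 400
memphis_from_passage = Fraction("3.463") / 2

# Newton's final value, checked against the limits quoted in the text.
NEWTON_SACRED_CUBIT = Fraction("25.6")  # unciae
within_first_limits = sacred_low < NEWTON_SACRED_CUBIT < sacred_high
within_refined_limits = 24 < NEWTON_SACRED_CUBIT < 27

if __name__ == "__main__":
    print("vulgar cubit (unciae):", vulgar_low, "to", vulgar_high)
    print("sacred cubit (unciae):", float(sacred_low), "to", float(sacred_high))
    print("Memphis cubit from base (ft):", float(memphis_from_base))
    print("Memphis cubit from passage (ft):", float(memphis_from_passage))
    print("25.6 unciae within limits:", within_first_limits, within_refined_limits)
```

Both of Greaves' measurements give a Memphis cubit of roughly 1.73 English feet, which is the sense in which they divide evenly.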
According to Ezekiel the width of the atrium was 100 cubits and the Temple was 70 cubits in width, leaving 15 cubits on either side of the Temple. However, Ezekiel also claimed there were 20 cubits surrounding the Temple. To reconcile the measurements, Newton claimed that the 20 cubits referred to the thickness of the building surrounding the Temple, i.e., the store room + walkway + rooms + wall. However, this only measures 19 cubits according to Newton’s own measurements in his reconstruction. Ezekiel claimed that the foundations of the side chambers are 6 cubits, but the walls of the chambers measure 5 cubits on each side. To correct this, Newton stated that the chambers themselves are 6 cubits, and he removed the width of the wall from the measurements, making the perimeter 180 cubits as stated by Ezekiel. Although Newton’s reconstruction is flawed, he was working with contradictory measurements, and his reconstruction (Fig. 6) is perhaps one of the most comprehensive attempts to answer Ezekiel’s challenge.

William Whiston was a former pupil of Newton and his successor as Lucasian Professor at Cambridge; he was later dismissed because of his theological views and was charged with heresy, although he was not convicted. He had very strong views on the architecture of Solomon’s Temple. In London in 1726, newspapers announced that he had made a model of the Temple that was to be exhibited in the Haymarket (Anonymous 1726). At that time, a large model, 4 × 4 m, from Hamburg, based on Villalpando’s reconstruction, was being exhibited to the public in London, and it had captured their imagination.


Fig. 6 Isaac Newton’s reconstruction of Solomon’s Temple

Whiston believed Ezekiel was an incorrect source for the Temple, but unfortunately his reconstruction and his writing on his reconstruction no longer exist. There are other newspaper advertisements for other models being exhibited at the same time, but unfortunately none of these were documented. The puzzle of Ezekiel’s measurements was a challenge to be resolved or refuted, and this challenge became a public one, with public lectures and models being exhibited. Books explaining the Temple exhibit and prints that illustrated the Temple were sold at these exhibitions. In the British Library some of these books remain, which had been bound together with prints from other exhibitions. The interest in the early eighteenth century in the reconstruction of the Temple was both an intellectual and a public obsession. A manuscript by William Stukeley shows that he examined the order of Solomon. Stukeley disagreed with Villalpando’s claims that the Corinthian style imitated the capitals of Solomon, and that the original foliage was a palm, not the leaves of


the tree as stated by Vitruvius. Stukeley emphasized that the natural elements of Solomon’s order came from trees, the scrolls of the ram’s horn, and the eventual addition of architectural norms of proportion. For Stukeley, the Doric order was the order that was derived from the order of Solomon. However, in his reconstruction of the Temple of Solomon, he used all three classical orders. The main sources of his reconstruction are I and II Kings, II Chronicles, and Jeremiah, and thus he avoided all of the contradictory measurements.

The architect John Wood of Bath produced a reconstruction of the Temple that is completely different from those of Villalpando, Newton, and Stukeley (Wood 1968). His precinct is rectangular, being 500 cubits by 840 cubits. His measurements of the Temple were derived from events in the Bible. For instance, he estimated the number of columns that surrounded the court to be 508, which represents the number of years from the Israelites leaving Egypt to the building of Solomon’s Temple. His reconstruction was more allegorical than mathematical, and he used the texts of I and II Kings and II Chronicles, avoiding the complications of Ezekiel. By the mid-eighteenth century, the debate that had begun with Villalpando’s publication of Ezechielem Explanationes was slowly drifting away from its mathematical basis.

Conclusion

With the popularity of Villalpando’s reconstruction of the Temple, Ezekiel’s architectural description became a puzzle to be solved. Between the beginning of the seventeenth century and the mid-eighteenth century, many attempts were made to reconcile Ezekiel’s measurements into a cohesive plan. In London in 1726, there were so many architectural models of the Temple being exhibited, lectures being given about the Temple, and publications and pamphlets on the reconstructions, most of them disagreeing with each other, that a reporter despairingly wrote that they are all “pretending to be true models, yet are different. If our virtuosos can’t agree upon corporeals, no wonder there is such a difference in speculative matters” (Anonymous 1726). Villalpando had claimed that the Temple of Solomon was the microcosm of the macrocosm: it was the earthly vision of heaven, and thus to derive its floor plan correctly was to have insight into the mind of God. However, the problem was never going to be resolved, as Ezekiel’s measurements do not add up. Nevertheless, the debate that arose from Villalpando’s reconstruction produced an array of different and intricate plans that used the measurements to justify the proportions, structure, and layout of the Temple. Even though Villalpando’s flawless system of ancient measurements and harmonic proportions is fundamentally flawed, both through his interpretation and through the inconsistency of his source, Ezekiel, he created a system that was worth the long and very productive debate.


References

Anonymous (1726) Advertisment. Mist’s Wkly J issue 67 (August 6)
Cappel L (1657) Chronologia sacra. In: Walton B (ed) Biblia sacra polyglotta. London
Goldmann N (1696) Vollständige anweisung zu der civil bau-kunst, 2nd edn. Wolfenbüttel
Greaves J (1646) Pyramidographia. London
Jones I (1655) The most notable antiquity of Great Britain vulgarly called Stonehenge on Salisbury Plain. London
Josephus (1963) Jewish antiquities. William Heinemann Ltd, London
Maimonides M (1949) The code of Maimonides. Yale University Press, New Haven
Morrison T (2008) Villalpando’s sacred architecture in the light of Isaac Newton’s commentary. In: Williams K (ed) Nexus: architecture and mathematics VII. Basel, pp 79–91
Morrison T (2010) The body, the temple and the Newtonian man conundrum. Nexus Netw J 12:343–352
Morrison T (2011) Isaac Newton’s Temple of Solomon and his reconstruction of sacred architecture. Springer, Basel
Morrison T (2014) The origins of architecture: an English sixteenth to eighteenth century perspective. Common Ground, Champaign
Morrison T (2016) Isaac Newton and the Temple of Solomon: an analysis of the description and drawings and a reconstructed model. McFarland, Jefferson
Newton I (c 1670s–1680s) Untitled treatise on revelation (Yahuda Ms 3). Unpublished manuscript. National Library of Israel, Jerusalem
Newton I (c 1680) Miscellaneous notes and extracts on the temple, the fathers, prophecy, church history, doctrinal issues, etc. (Yahuda 14). Unpublished manuscript. National Library of Israel, Jerusalem
Newton I (mid-1680–early 1690s) Prolegomena ad lexici propretici partem secundam: De forma sanctuary Judaici (Babson Ms 434). Unpublished manuscript. Babson College, Wellesley
Newton I (1737) A dissertation upon the sacred cubit of the Jews. In: Miscellaneous works of John Greaves Professor of Geometry at Oxford. London
Perrault C (1969) Unknown designs for the Temple of Jerusalem. In: Fraser D, Hibbard H, Lewine MJ (eds) Essays presented to Rudolf Wittkower. Phaidon, London/New York
Stukeley W (1721–24) The creation, music of the spheres K[ing] S[olomon’s] Temple microco[sm]- and macrocosm compared &c. FM MS 1130 Stu (1). Freemasons Library, London
Villalpando JB, Prado J d (1604) Ezechielem explanationes et apparatus urbis Hierosolymitani commentariis et imaginibus illustratus, Roma
Vitruvius (1567) De architectura libri decem, cum commentariis Danielis Barbari. Apud Franciscum Franciscium Senensem & Ioan. Crugher Germanum, Venetiis
Vitruvius (1960) The ten books on architecture. Dover Publications, New York
Whitmer KJ (2010) The model that never moved: the case of a virtual memory theater and its Christian philosophical argument, 1700–1732. Sci Context 23(3):289–327
Wittkower R (1988) Architectural principles in the age of humanism. Academy Editions, London
Wood JoB (the Elder) (1968) The origin of building or, the plagiarism of the heathens detected. Gregg International Publishers, Farnborough

47 Utopian Cities

Tessa Morrison

Contents
Introduction
The Search for the Ideal City
Conclusion
References

Abstract

The concept of the ideal city as a perfect geometrical structure has its roots in the Pythagorean philosophy of number symbolism. For the Pythagoreans, the mystical cosmos could be understood through geometry and numerology, through perfect and pure numbers. These philosophies carried through into neo-Pythagorean, Platonic, and Neoplatonic philosophies. In John of Patmos’s Book of Revelation, the image of heaven is in the form of a city. The shape of this city is heavily influenced by Neoplatonic geometry, and the Bible says of God, “You have ordered all things in measure, number and weight” (Wisdom 11:21). Although early Christianity in the fourth century was reluctant to accept pagan wisdom, Augustine claimed that the pagan philosophers came close to the truth because of their observations of the pattern of creation through nature. Through the writings of Augustine, the numbers of the Pythagorean concept of the decad retained their significance in Christianity (Augustine (1972) The city of God. Penguin Books, Harmondsworth). The continuing reverence for this tradition is illustrated by Raphael’s “School of Athens,” commissioned by Pope Julius II in 1509, in which Plato is depicted as a central figure. Plato is holding a copy of the Timaeus, a dialogue which tells the story of the origin of

T. Morrison
The School of Architecture and Built Environment, The University of Newcastle, Newcastle, NSW, Australia

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_2


the cosmic system and man’s place within it. The pagan tradition of numerology and geometry was revered in the highest places throughout the Renaissance and beyond. It became enmeshed in the wisdom of the macrocosm (the heavens) and how it was reflected within the microcosm (earth) through the perfection of its geometry, which in turn became bound up with the symbolism of the ideal city and the ancient unsolved problem of squaring the circle.

Keywords Utopia · Symmetry · Ideal republic

Introduction

Thomas More published Libellus vere aureus, nec minus salutaris quam festivus, de optimo rei publicae statu deque nova insula Utopia (Of a Republic’s Best State and of the New Island Utopia) in 1516. It tells the story of a fictional island and its society, religion, and political and social customs. The title soon became shortened to Utopia, and the word “Utopia” has now become part of common language, meaning an imaginary place or a state of things in which everything is perfect. That perfection is generally associated with a city that is geometrical: the ideal Utopian city is geometrical and perfect, while an unsymmetrical city is imperfect and corrupt. This image has recurred for more than 2000 years, from Plato’s ideal city of Atlantis to the solutions for a housing crisis in the Industrial Revolution, and the geometrical city and the Utopian built environment still resonate today. This chapter looks at the progression of the perfect numbers and their geometry, which has become embedded in the plan of the Utopian city.

The Search for the Ideal City

The sixth-century BC mystic Pythagoras founded a new religion based on philosophy and science. To the Pythagoreans, geometry, numbers, and harmonies were closely linked, since numbers in astronomy could be expressed as geometrical constructions, and harmonies were expressed in the ratios of the distances between planets and in the harmony of the spheres. Pythagoras’s numerological and geometric concepts have been powerful philosophical tools in both design and religious symbolism for the past 2000 years. They represented the ideal: the perfect heavenly structures which could be replicated on earth.

For the Pythagoreans, the monad was a single point or the number one. It was the ultimate unit of being, such as an atom or a soul. The monad was a principle of magnitude. The dyad was two points, or the number two; the two points define a line that has no dimension (Heninger 1974, 78–79). The Pythagoreans regarded the dyad as the cosmic opposite to the monad (Gorman 1979, 140). Although the monad and the dyad are the archetypes of odd and even numbers, to the Pythagoreans they are not arithmetical numbers. The monad has the potential of all numbers,


while the dyad represents the concept of extension and, as such, represented the divisibility of the physical world (Heninger 1974, 87). Three, or the triad, was the first arithmetical number and defined a surface. The triad was the addition of the monad and the dyad, which were the creators of numbers yet not themselves numbers. Four points was the number four, or the tetrad, and represented volume, while the sum of these four numbers was the decad, or ten, which encompassed the whole and was sacred to the Pythagoreans. All of the numbers of the decad had symbolic meaning. Odd numbers were masculine and even numbers feminine. The addition of the feminine two and the masculine three was the first marriage number, five. Six is the product of two and three and is the first fertile marriage number (Capella 1992, 735–36). Twelve, the product of three and four, is the first marriage number of real numbers, since three is the first number. Additionally, it is the fertile marriage of the soul, three, and the body, four. The tetrad and the decad, however, had special significance for the Pythagoreans. The tetrad represented the extended universe, and the decad represented the limits of the universe. Embodied in the concept of the tetrad are the four elements: earth, fire, water, and air (Plato 1995, 32.C). There are five regular solids; the first four, the pyramid, cube, octahedron, and icosahedron, can be constructed out of triangles.

Plato built on Pythagorean philosophies of number symbolism, which were carried through in the Neoplatonic and neo-Pythagorean philosophies. The Timaeus is the only complete dialogue of a work that consisted of at least three dialogues relating a connected and coherent story. The second dialogue is Critias, of which only a fraction survives, while the third dialogue, Hermocrates, has been lost and its subject matter is unknown.
While the Timaeus tells the story of the cosmic system, it finishes with a dialogue about the place of man in that cosmos, and the Critias tells the story of the city of Atlantis, which studies the ideal environment through the perfect commonwealth. Critias is the first surviving story of the city of Atlantis and the first description of an ideal – or later to be called a Utopian – city. In Critias, Poseidon became the god of Atlantis after the gods had decided to divide the earth up between them. Atlantis was a very large island that lay opposite the Straits of Gibraltar. Poseidon fell in love with one of the inhabitants of the island, called Cleito. She became one of his mistresses, and he enclosed her in a fortress that was encircled by concentric rings of sea and land. “There were two rings of land and three of sea, like cartwheels, with the island at their center and equidistant from each other, making the place inaccessible to man” (Plato 1995, 113). Enclosed on that island, Cleito bore Poseidon five sets of male twins. Thus, Atlantis eventually became divided into ten kingdoms that were distributed between the sons. After this initial division between all of the sons, each of the ten kingdoms was to be ruled only by the eldest son of its king, so that there would only ever be ten kingdoms of Atlantis. The central island was five stades in diameter; it was surrounded by a ring of water that was one stade wide, followed by a ring of water and land that were both two stades wide, and finally a larger ring of water and land that were three stades wide (see Fig. 1). The city of Atlantis was adjoined to a rectangular and uniform flat plain that measured 2000 stades by 3000 stades. The ten kings had absolute power in their


T. Morrison

Fig. 1 The ring of Atlantis

own kingdom. However, Poseidon’s laws jointly governed the ten kingdoms, and at the temple of Poseidon the kings “assembled alternately every fifth and sixth year (thereby showing equal respect to both odd and even numbers)” (Plato 1995, 119), to respect those laws. Numbers and geometry play an important role in the city of Atlantis. It was orderly and symmetrical in both its design and its political structure. Ten, six, and five were important numbers in Atlantis, while the masculine three and the feminine two are used as the first marriage numbers in this dialogue. The product of three and two, six, represents the fertility of the marriage, while the decad, or ten, embodies all creation. Plato utilizes Pythagorean number symbolism to represent the myth of the union of Poseidon and Cleito and the foundation of its models, as well as to create the orderly and ideal city of Atlantis. The number and geometry of the ideal city of Atlantis are in sharp contrast with the description in Critias of the prehistoric Athens. In Critias there is no mention of any numbers associated with the prehistoric Athens that relate to its construction or its appearance. Additionally, the city was nonsymmetrical. There is no mention of any distances or anything about its rulers or the number of kings. While Atlantis counts and measures everything, Athens has no geometry and quantifies nothing – the ideal celestial origins of both its commonwealth and city versus the earthly and nondescript. Only a fraction of Critias survives, and most of the remaining dialogue is the description of the two cities. Atlantis was a mystical city of order and symmetry. With its embedded numbers and geometry, it appears to embody the ideal and celestial wisdom. However, Atlantis was “swallowed up by the sea and vanished” (Plato 1995, 25), despite its perfection. Nevertheless, the tale of Atlantis in the dialogue of Critias establishes the concept of geometry and perfection in a city.
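The ring dimensions given for Atlantis can be added up to see how large the ringed city would be. The following is only a minimal sketch of that arithmetic: the names are hypothetical, and the order of land and sea within each pair of rings is an assumption (the total does not depend on it).

```python
# Widths in stades, taken from the description of Atlantis in the text.
CENTRAL_ISLAND_DIAMETER = 5

# (kind, width) pairs listed from the centre outward. The land/sea order
# inside each pair is an assumption; the total does not depend on it.
RINGS = [
    ("sea", 1),
    ("land", 2),
    ("sea", 2),
    ("land", 3),
    ("sea", 3),
]


def overall_diameter(central_diameter, rings):
    """Diameter of the whole ringed city: each ring adds its width twice."""
    return central_diameter + 2 * sum(width for _, width in rings)


def count_rings(rings, kind):
    """Number of rings of a given kind ('land' or 'sea')."""
    return sum(1 for ring_kind, _ in rings if ring_kind == kind)


if __name__ == "__main__":
    print(count_rings(RINGS, "land"), "rings of land")
    print(count_rings(RINGS, "sea"), "rings of sea")
    print(overall_diameter(CENTRAL_ISLAND_DIAMETER, RINGS), "stades across")
```

On these figures the two rings of land and three of sea, together with the central island, span 27 stades from shore to shore.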


In John of Patmos’s Book of Revelation, Neoplatonic and neo-Pythagorean numerology becomes a central part of Christian symbolism. At the heart of the Book of Revelation is a perfect gem-like city that is as clear as crystal – the New Jerusalem. It is a perfect cube in shape and had 12 gates, 3 on each side of the cube. The length, breadth, and height of this great city were 12,000 furlongs (Revelation, XXI:11–21). Throughout Revelation John measures the city with a rod to emphasize that God “ordered all things in measure, number, and weight.” The numbers 3, 4, 7, and 12 and their multiples, such as 24 and 144, play a dominant role. The cube was a Platonic solid that represented the earth. Within John’s vision, he represented the integration of heaven and earth – incorporeal and corporeal. The New Jerusalem is the earthly city in heaven. The incompatibility of these two worlds is represented by the problem of squaring the circle. According to Aristotle, either a line is finite or infinite. A circle represents infinity as it has no ends and is complete and perfect, while a square is complete because it has an end and a beginning (Aristotle 1953). According to the Pythagorean philosophy, a circle is equivalent to a point – monad – one. Since the circle is eternal, therefore it is infinite. This contrasts with a square, which represents the physical world because four points were the minimum required for the three-dimensional extension (Heninger 1974). The geometrical problem of squaring the circle endeavors to explain the infinite in terms of the material world. This is reflected in the medieval imagery of the ideal city, New Jerusalem and the Divine Architect. The celestial, or heaven, was represented as the city of New Jerusalem in early images in Christian churches from the fourth century on. The reverence for the symmetry of the celestial city is echoed in the designs of medieval cathedrals and their representations of the New Jerusalem (von Simson 1984). 
Medieval builders and churchmen argued that the more symmetrical and geometrically precise the cathedral was, the more it led to a better understanding of God. The more geometrically harmonious the church was, by relying on orderly, balanced geometrical shapes, the closer it was possible to come to God and the divine (Lace 2001). One of the most important religious poems in early Ireland is Saltair na Rann, and it is stylistically dated to the tenth century. Although its architectural description is different from Revelation, there is an emphasis on the measurements of the city. The geometry of the celestial city in Saltair na Rann reflects the significance of pagan and early Christian numerology – the 4 walls, 8 porches with 12 divisions and a total of 40 doorways. The number 40 became an important number in Babylonian astronomy but gained significance in the Roman era through the astronomy of Ptolemy’s 40 epicycles. Ptolemy’s astronomy shaped the vision of the universe, from the time it was written in c.130 CE in Alexandria to the early seventeenth century. The city of Saltair na Rann’s geometrical structure (Fig. 2) reveals the importance of the numerology and geometry of the ideal city (Morrison 2010). However, from the eleventh century, the geometry of the image of the divine city became more consistent. Between the eleventh and the thirteenth century, there was an attempt to represent the ideal city, New Jerusalem, as a bird’s eye view of the entire city in a square motif. The image was a complex, symmetrical one. In a


Fig. 2 The plan of the celestial city in Saltair na Rann

single plane, all external views of the four walls and the inner courtyard are seen from a bird’s eye (Fig. 3). From an aerial view, all four walls of the city lie 90° from the usual position and in the same plane as the aerial view. Each wall lies 90° from the next, and the four walls border the central courtyard, which is also viewed aerially but at the same angle as the walls. In addition, John’s visions of the Lamb and the Angel, and occasionally himself, are integrated into the image in the courtyard but are viewed at 90° difference to the aerial view of the courtyard. These images of the city are developed into an elaborate geometric motif, and the city is viewed from different angles from a complex perspective; it is possible to see the full 360° view of the exterior of the city and the center of the interior of the city. Although this moved away from Aristotle’s representation of infinity – a circle – the complex perspective was a visual representation of the entire city that was also complete and perfect. This square geometrical motif was copied in manuscripts for over 200 years, throughout Europe. Although the geometrical structure of this motif was copied, it was stylized to suit contemporary and cultural tastes of that time. For example, an


Fig. 3 The New Jerusalem, mid-thirteenth century, Trinity College Cambridge MS R.16.2

eleventh-century manuscript, Facundus Beatus from Spain, depicts keyhole arches which are typical of Mozarabic architecture (Anonymous 1047). In a twelfth-century manuscript from France, by contrast, the gates have more classical rounded arches, which was a style of the French contemporary architecture of the day (Anonymous c1150). An English manuscript from the thirteenth century, The Trinity Apocalypse, reveals pointed Gothic arches (Fig. 3) (Anonymous 1260). Architecture of the ideal city was seen in terms of contemporary architecture. These same manuscripts reveal this complex geometrical motif for the ideal city, whereas the fall of Babylon is depicted with no symmetry or geometry and only viewed from one elevation. Thus, while symmetry and geometry define the ideal, asymmetry defines the earthly and corrupt. The image of the Divine Architect creating a perfect symmetrical universe was deeply rooted in medieval thought and imagination. The ideal universe could be


Fig. 4 The Divine Architect, mid-thirteenth century, Frontispiece of Bible Moralisee

constructed through the symmetry and geometry created with the compass and rule, an image that originated from Plato’s Timaeus as well as Ptolemy’s Almagest. There are over 40 surviving images of the Divine Architect wielding compass and rule from the medieval period (see Fig. 4) (Friedman 1974). This highlights the concept of geometry and the ideal city, whether earthly or celestial. Although the geometrical square motif of the celestial city disappeared from manuscripts after the fourteenth century, its description remained in poetry. The fourteenth-century poem Pearl describes the celestial city, where all angles and elevations of the city are seen from a single point (Morrison 2009). The richness of the description of this view from the single point underlines the significance of geometry and the concept of the ideal. The first architectural design for an ideal city in the Renaissance came from Florentine architect Antonio di Pietro Averlino, now commonly called Filarete, who described the layout and buildings of the city Sforzinda and its port Plusiapolis in his book Libro Architettonico, written between 1461 and 1464. Libro Architettonico is a dialogue, with many of the characters in the dialogue contemporaries of Filarete. In the dialogue, he promises to build the Duke of Milan, Francesco Sforza, the ideal


Fig. 5 The ground plan of Sforzinda as drawn in Libro Architettonico

city. He describes the overall ground plan of the city (Fig. 5) as being formed by two squares, stating: The basic form is two squares, one atop the other without the angles touching. One angle will be equal distance from two other angles in each square. There is a distance of ten stadii from one angle to the other, that is, a mile and a quarter. The perimeter of each square is 80 stadii and the diameter with the 28 stadii. The angular circumference is 80 stadii. (A stadium is an ancient Greek measurement which is the length of a semicircle course where foot races were held and is equal to about 185 m (607 ft).) (Filarete 1965, 13v)
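The figures in this passage can be checked with a little arithmetic. The sketch below assumes that the 80-stadii circumference and the 28-stadii diameter are meant to belong to the same circle, and that the 20-stadii side follows from dividing the stated perimeter by four; the function names are hypothetical.

```python
import math

# Figures in stadii, as given in the quoted passage.
SQUARE_PERIMETER = 80  # perimeter of each square
DIAMETER = 28          # stated diameter
CIRCUMFERENCE = 80     # stated "angular circumference"


def square_side(perimeter):
    """Side length of a square with the given perimeter."""
    return perimeter / 4


def square_diagonal(side):
    """Diagonal of a square with the given side."""
    return side * math.sqrt(2)


def implied_pi(circumference, diameter):
    """Ratio of circumference to diameter implied by a pair of measurements."""
    return circumference / diameter


if __name__ == "__main__":
    side = square_side(SQUARE_PERIMETER)
    print(f"side of each square: {side:.0f} stadii")
    print(f"diagonal of that square: {square_diagonal(side):.2f} stadii")
    print(f"implied circumference/diameter: {implied_pi(CIRCUMFERENCE, DIAMETER):.3f}")
    print(f"true value of pi: {math.pi:.3f}")
```

Under these assumptions the stated diameter of 28 is close to the diagonal of a 20-stadii square (about 28.28), and the implied ratio of 80/28 (about 2.857) falls below both 3 and the true value of π.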

There is a distance of 10 stadii between the corners, and the side of each square is 20 stadii, with a diameter of 28 stadii and an exterior circumference around the corners of the superimposed squares of 80 stadii. In other words, the perimeter of the square is equal to the circumference of the circle; Filarete is attempting to square the circle, reconciling the celestial and the earthly in his ideal city. However, to do so Filarete uses a value for π (80/28 ≈ 2.86) that is less than the Biblical 3 (1 Kings 7:23).

Filarete develops an architectural Theory of Decorum based on Leon Battista Alberti’s De re aedificatoria. Alberti suggested that his social theory of architectural orders was based on the notion that working class people were short, strong, and


simply dressed, while the upper classes were tall, slender, and richly decorated (Onians 1988). Like Alberti, Filarete claims that the qualities of the buildings match the qualities of the occupants, but his principles of Decorum to “dress” the buildings go much further than Alberti. The qualities are related to social class, an architectural order, and proportion. Both Alberti and Filarete followed Vitruvius’s architectural orders. However, Filarete changed the sequence of the architectural orders (see Table 1). He believed that Ionic was for bearing weight; Corinthian was used in buildings that need support, while the greatest of them all is Doric, used to support and to decorate a building. Filarete states that the form and relationships that were founded in the orders of architecture corresponded to similar forms and relationships in men and society (Morrison 2015). Filarete associates two different types of proportions to the three orders; that is, the proportion of the columns in the height and the proportion of the ground plan. These three proportions for the ground plan “are made in two squares, one and one half and one diameter” (Filarete 1965, 60r). Filarete correlates these measurements with the three orders (see Table 2). The three proportions for the ground plans are used as a baseline, and there is a sliding scale around them showing degrees of superiority of the inhabitants. There is a consistency in Filarete’s use of his social stratification classification throughout the city except for one enclosed compound, which has a stratification of its own. This compound is the largest compound in the entire city; it is given two measurements in the text: 1500 × 750 and 1500 × 700 braccia. (A braccio is an old Italian unit of length, usually about 26 or 27 in. (66 or 68 cm), but can vary between 18 and 28 in. (46 and 71 cm).) 
This large compound housed the House of Virtue and Vice, a circus, craftsmen’s workshops, the Temple of Virtue, and the house of the Architect who created the compound. Architectural historian Hanno-Walter Kruft claimed that the House of Virtue and Vice represented an “architectural allegory [that] attains the dimensions of architectural parlante” (Kruft 1994: 55). It was an architecture that “spoke” to the viewer. The House of Virtue and Vice was architecturally very complex, and many symbolic references were included in the design of the building. There was only one

Table 1 Alberti and Filarete’s principles of decorum

Alberti       Qualities and class                    Filarete
Corinthian    Elegant, upper class                   Doric
Ionic         Reliable, solid, middle class          Corinthian
Doric         Load-bearing, strong, working class    Ionic

Table 2 Filarete’s proportions

Classical order    Proportion of the height    Proportion of the ground plan
Doric              1:9                         1:2
Corinthian         1:8                         2:3
Ionic              1:7                         1:√2


Fig. 6 Sketches of the House of Virtue and Vice from Filarete’s Libro Architettonico (Drawn by Author)

entrance to the entire compound, and visitors to the compound had to pass through the House of Virtue and Vice to enter the compound. The plan consisted of a square building, 2 floors high, surrounding a rotunda building 11 stories high, including 1 floor that was below ground level (Fig. 6). In the square building of the House of Virtue and Vice, visitors had to select a series of doors to pass through, as a test to decide the path that they would follow (i.e., virtue or vice). Those who chose to follow vice were not included in the remainder of the text and never mentioned again. However, it can be presumed these visitors disappeared to the two lower levels of prisons. Only those who followed the path of virtue could continue into the House of Virtue and move into the compound. Those who enter the House of Virtue and Vice, whether it be scholars, men of arms or artisans, are tested for their virtue. The entire compound celebrated the talents of scholars, men of arms, and craftsmen. Within the compound, all men were viewed as socially equal, and individuals competed to achieve the highest results in their disciplines. Although the compound itself is in the dimension of Doric or a little under Doric, the buildings within the compound no longer follow Filarete’s Theory of Decorum. However, the House of Virtue and Vice supports the idea of squaring the circle, with the square base of the House of Vice representing the earthly and the round tower of the House of Virtue representing the celestial. While the rest of the city is classified in the hierarchical stratification of the Theory


of Decorum, to pass through the House of Virtue and Vice was to square the circle and enjoy a more equitable social structure within the walls of the compound.

The theme of squaring the circle is replicated in Thomas More’s Utopia to symbolize the ideal world and city. The word “Utopia” has Greek origins and has been translated as “nowhere” or “no place.” More’s island of Utopia is not located in time or place; however, it gives the illusion of having both place and time. In the original Latin, the island is described as being shaped as a “lune,” a geometrical shape created by the overlapping of two circles with different radii, and this text is often translated as “shaped like a new moon.” (Utopiensium insula in media sui parte (nam hac latissima est) millia passuum ducenta porrigitur, magnumque per insulae spatium non multo angustior, fines versus paulatim utrimque tenuatur. Hi velut circumducti circino quingentorum ambitu millium, insulam totam in lunae speciem renascentis effigiant.) The island is:

200 miles across in the middle part, where it is widest, and nowhere much narrower than this except towards the two ends, where it gradually tapers. These ends, curved round as if completing a circle 500 miles in circumference, make the island crescent shaped, like a new moon. (More 2001, 51)

Hippocrates of Chios (late fifth century BC) was the first mathematician to examine the lune shape and to calculate its exact area. It appears that he wanted to solve the classical problem of squaring the circle; in the process, he became the first to square a figure with a curved boundary (Postnikov and Shenitzer 2000). In making the island of Utopia a lune shape, More brings together the concepts of squaring the circle and transforming the celestial into the earthly. It is impossible to work out the exact area of the island, since More only gives one circumference of 500 miles. Moreover, the measurements that he does give are impossible. If the circumference of the island was 500 miles, the diameter would be about 159 miles, so it is not possible for the lune-shaped island to be 200 miles wide since there could be no point of intersection with a second circle to create the lune. Impossibility reinforces the idea that the island is nowhere. The concept of squaring the circle is present in another aspect of the text as well. Hythloday, visitor to the island of Utopia and the narrator of the dialogue, also explained that the Utopian language was very robust, stating: “[t]hey study all branches of learning in their native tongue, which is not deficient in terminology or unpleasant in sound and adapts itself as well as any to expression of thought” (More 2001, 155) (Disciplinas ipsorum lingua perdiscunt. Est enim neque verborum inops nec insuavis auditu nec ulla fidelior animi interpres est.). Hythloday claimed that the language was derived from Persian in most respects but had vestiges from Greek. Notably, the 1517 edition of Utopia included a Utopian alphabet in which the Utopian script corresponds with Roman letters (Fig. 7). The invention of the Utopian language has been credited to both More and Peter Giles.
Geometrically, the language plays on the transformation of squaring a circle, supporting More’s concept of reconciling the divine and earthly, but also suggests mathematical precision and perfection (Conley and Cain 2006, 202). There is a clear metamorphosis in the shape of the letters, in the transformation of their geometrical shape from square to circle, perhaps mimicking the square cities and the lune-shaped island.
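The impossibility of the island’s measurements, noted above, can be confirmed numerically. This is a minimal sketch under the same reading used in the text, namely that the 500-mile circle bounds the island; the function names are hypothetical.

```python
import math

# Figures in miles, as given in More's description of the island.
CIRCUMFERENCE_MILES = 500
STATED_WIDTH_MILES = 200


def diameter_from_circumference(circumference):
    """Diameter of a circle with the given circumference."""
    return circumference / math.pi


def width_is_possible(width, circumference):
    """A figure lying inside a circle cannot be wider than its diameter."""
    return width <= diameter_from_circumference(circumference)


if __name__ == "__main__":
    diameter = diameter_from_circumference(CIRCUMFERENCE_MILES)
    print(f"diameter of a 500-mile circle: {diameter:.1f} miles")
    print("200-mile width possible:",
          width_is_possible(STATED_WIDTH_MILES, CIRCUMFERENCE_MILES))
```

Any width up to about 159 miles would fit inside such a circle; the stated 200 miles does not.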


Fig. 7 Utopian language devised by More and first published in the 1517 edition of Utopia

The lune-shaped island contained 54 quadrilateral planned cities. More only described the main bureaucratic city, Amaurot, but stated that each city had the same layout and design as that of Amaurot (insofar as the terrain allowed). More was a humanist scholar, and he would have found a precedent in Plato’s “geometrical truths” as a graphical means of reconciling the earthly and the celestial. In the original 1516 text, next to the descriptions of the quadrilateral planned cities is a side note: “likeness breeds concord,” the geometrical similarity of the cities breeds harmony. Both Filarete and More would have believed that the problem of squaring the circle was a possible geometrical construction, even though the solution had been elusive. Although it was believed by the mid-eighteenth century that the problem was impossible and search for its solution “fruitless,” this was not proven until the late nineteenth century (Schepler 1950). Utopia was a phenomenal success; within 3 months of its first Latin edition in 1516, it was reprinted. This was followed by a Paris edition in 1517 and a Basel edition in 1518. In less than 40 years, it had been translated into six vernacular European languages. More acknowledged that he had been influenced by Plato and


was indebted to Plato’s Republic, Timaeus, and Critias (Logan 1989). The concept of the perfect city and Commonwealth became replicated in many books that were written in the form of dialogue and situated in a possible location – nowhere. One of these dialogues, which was heavily influenced by Utopia, was written by a Calabrian Dominican priest, Tommaso Campanella, who wrote City of the Sun in 1602, although it was not published until 1623. It was a time of great unrest, with new scientific, as well as mystical, ideas being promulgated by philosophers but against a background of suppression supported by capital punishment by the papacy for unorthodox ideas. Campanella turned away from the Aristotelian philosophy which was central to scholarship at the time and turned toward Platonic ideals. He had been imprisoned for “heresy” in 1594 and spent 27 years in various Neapolitan and Papal prisons. Writing in prison he conceived the City of the Sun as an earthly Utopian city that was laid out to the celestial plan. The city was perfectly circular and consisted of seven circuits, each named after a planet. The seven circuits were intersected by four streets that were aligned with the four points of the compass. The first two circuits were on the flat plain, while the others rose up a hill. There was a gate at each circuit where it crossed the road and stairs leaving the circuits as well as the road. Central to the plan, and on top of the hill, was a massive temple which was 350 paces in diameter and was crowned by a massive dome which was approximately 300 paces in diameter (124.4 m or 408 ft). Campanella had been to Florence and Rome and was familiar with Brunelleschi’s Dome in Florence which was 41.5 m or 136 ft in diameter, while Michelangelo’s Dome in Rome was 43 m or 139 ft and the Pantheon 43.5 m or 141 ft. Campanella’s Dome of 300 paces was close to three times larger than these three famous domes.
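The comparison of dome sizes can be made explicit. The sketch below simply divides the metric diameter quoted for Campanella’s dome by the diameters quoted for the three built domes; it takes the figures in the text at face value, and the names are hypothetical.

```python
# Diameters in metres, as quoted in the text.
CAMPANELLA_DOME_M = 124.4
BUILT_DOMES_M = {
    "Brunelleschi, Florence": 41.5,
    "Michelangelo, Rome": 43.0,
    "Pantheon, Rome": 43.5,
}


def size_ratios(reference, others):
    """How many times wider the reference dome is than each built dome."""
    return {name: reference / diameter for name, diameter in others.items()}


if __name__ == "__main__":
    for name, ratio in size_ratios(CAMPANELLA_DOME_M, BUILT_DOMES_M).items():
        print(f"{name}: {ratio:.2f} times")
```

On the quoted figures each ratio lies between roughly 2.85 and 3, which matches the description of a dome close to three times the width of its built precedents.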
The symmetry and the monumental size of the City of the Sun were beyond human scale (Fig. 8). By the early seventeenth century, symmetry was synonymous with Utopian cities. Although many plans for Utopian cities followed, not all were as monumental as the city of the Sun, but all followed strict symmetrical rules. By the beginning of the nineteenth century, a housing crisis caused by the Industrial Revolution and enclosures of common land were beginning to change the urban landscape of Britain. In 1817 Robert Owen, a successful industrialist and manager of the New Lanark Mills, was asked by the “Committee of the Association for the Relief of the Manufacturing and Labouring Poor” to give his views of the causes of the national distress. Owen conceived the Villages of Unity and Co-operation Utopian villages, a concept of self-contained urbanism. The layout used strict symmetry (Fig. 9). The squared buildings of his plan were sufficient to house the ideal size of these

Fig. 8 Elevation of City of the Sun


Fig. 9 Owen’s villages of unity and cooperation

communities, 1200 people. Within the square were the public buildings, and the square was divided into parallelograms. They were heavily criticized and became known as “the parallelogram of paupers” (Harrison 2009).

Conclusion

Despite the criticism, many plans for similar symmetrical self-sufficient villages were published, for instance, John Minter Morgan’s The Christian Commonwealth (1849), James Silk Buckingham’s National Evils and Practical Remedies, with The Plan of a Model Town (1849), and Robert Pemberton’s Happy Colony (1851). Each plan was perfectly symmetrical. Symmetrical and Utopian cities have become interdependent concepts, though perhaps not mathematically rationalized as they were in ancient and Renaissance Utopian cities. However, this interdependency remains into the twenty-first century. One of the most publicized current Utopian plans is The Venus Project (https://www.thevenusproject.com/). Developed for the millennium by Jacque Fresco, it purports to be an “alternative vision for the future.” The Venus Project is a futuristic city which embodies geometrical symmetry and is entirely equitable and self-sustaining. Despite its futuristic, sustainable, high-tech projections, its concepts and plans are strongly rooted in the Renaissance mathematical concepts of the Utopian vision for architecture and society.

References

Anonymous (1047) Facundus beatus. Biblioteca Nacional, Madrid
Anonymous (1260) The trinity apocalypse. Trinity College, Cambridge
Anonymous (c1150) Beatus of Saint-Sever. Bibliothèque Nationale, Paris
Aristotle (1953) On the heavens. William Heinemann, London
Augustine (1972) The city of God. Penguin Books, Harmondsworth
Capella M (1992) Martianus Capella and the seven liberal arts: the quadrivium of Martianus Capella: Latin traditions in the mathematical sciences. Columbia University Press, Columbia
Conley T, Cain S (2006) Encyclopaedia of fictional and fantastic languages. Greenwood Press, Westport/London
Filarete (1965) Filarete’s treatise on architecture. Yale University Press, New Haven/London
Friedman JB (1974) The architect’s compass in creation miniatures of the late middle ages. Traditional dwellings and settlements review 30
Gorman P (1979) Pythagoras a life. Routledge and Kegan Paul, London
Harrison JFC (2009) Robert Owen and the Owenites in Britain and America: the quest for the new moral world. Routledge, London
Heninger SK (1974) Touches of sweet harmony: Pythagorean cosmology and renaissance poetics. The Huntington Library, San Marino
Kruft H (1994) A history of architectural theory: from Vitruvius to the present. Princeton Architectural Press, New York
Lace WW (2001) The mediaeval cathedral. Lucent Books, San Diego
Logan GM (1989) The argument of utopia. In: Interpreting Thomas More’s utopia. Fordham University Press, New York, pp 7–35
More T (2001) Utopia. Yale University Press, New Haven/London
Morrison T (2009) Seeing the apocalyptic city in the fourteenth century. In: Ryan MA, Kinane K (eds) End of days: essays on the apocalypse from antiquity to modernity. McFarland Press, Jefferson
Morrison T (2010) Imperial roman elements in the architecture of the city in Saltair na Rann. In: Celts in legends and reality: papers from the sixth Australian conference of celtic studies. Sydney University Press, Sydney
Morrison T (2015) Unbuilt utopian cities 1460 to 1900: reconstructing their architecture and political philosophy. Ashgate, Farnham
Onians J (1988) Bearers of meaning: the classical orders in antiquity, the middle ages, and the renaissance. Princeton University Press, Princeton
Postnikov MM, Shenitzer A (2000) The problem of squarable lunes. Am Math Mon 107(7):645–651
Plato (1995) Timaeus and Critias. Penguin Books, London
Schepler HC (1950) The chronology of PI. Math Mag 23(4):216–228
von Simson O (1984) The gothic cathedral: origins of the gothic architecture and the medieval concept of order. Princeton University Press, Princeton

Tessellated, Tiled, and Woven Surfaces in Architecture

48

Michael J. Ostwald

Contents

Introduction
Background to Tiling
Tiling in Architecture
Conclusion
Cross-References
References

Abstract

This chapter is focused on the tessellation, tiling, and weaving of architectural surfaces. These three processes result in the production of a geometric pattern of connected shapes which cover a plane. In architecture, such techniques are typically employed to create a more durable or weatherproof finish for a floor, wall, or ceiling. But they also provide a means of decorating a surface to achieve an aesthetic, poetic, or symbolic outcome, some of which are used to evoke particular mathematical properties. This chapter provides an overview of the development of architectural tiling, highlighting key connections to mathematics. The architectural examples range from simple Neolithic weaving and stone cutting practices to late-twentieth-century aperiodic cladding systems in major public buildings. The chapter also refers to past research into tiling in architecture and the primary themes that research has examined.

Keywords

Architecture · Tiling · Tessellation · Weaving · Aperiodic tiling

M. J. Ostwald
UNSW Built Environment, University of New South Wales, Sydney, NSW, Australia

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_86


M. J. Ostwald

Introduction

Nikos Salingaros (1999) argues that throughout history, patterns have provided a fundamental connection between architecture, mathematics, and society. The human mind is not only drawn to identify patterns in nature and society, but also to develop new patterns and visually represent them to make sense of their applications or consequences. Salingaros postulates that at its most basic level, this sociocultural affinity for patternmaking and identification “explains the ubiquitousness of visual patterns in the traditional art and architecture of mankind” (1999: 76). He even suggests that tiled, tessellated, or woven geometric arrangements are the “visible tip” (Salingaros 1999: 76) of mathematics in society, providing an important link between abstract ideas and their practical application. In a sense, even the most basic tiled patternmaking in architecture is significant because it represents both a pragmatic and poetic connection between mathematics and the world. But for Salingaros (1999), these basic geometric patterns that were once pivotal to maintaining this connection have not always been celebrated or respected. For example, he proposes that in the first half of the twentieth century Modernist architects devalued the role of the tile, and that since then most architects have either ignored it or forgotten its potential.

The reasons why Modernist architects devalued tiling provide an interesting insight into the differences between architects’ and mathematicians’ attitudes to tessellations. The Modern movement in architecture, which developed in Europe and America in the early decades of the twentieth century, had a fascination with large-scale sculptural form-making, functional expression, and mechanized production (Ostwald and Dawes 2018). As such, Modernist architects tended to valorize materials, like concrete, steel, and glass, which did not need applied finishes, and they also called for the rejection of all ornamentation and decoration.
For example, European architectural theorists like Adolf Loos (1998) rejected tiling on the basis that it was unhygienic (having too many unnecessary joints), socially biased (as it required a particular class of artisan workers), and culturally inferior (because geometric patterns were often used in primitive or tribal art in Africa and Asia). The only tessellations or patterns that Modernist architects found acceptable were those that arose as a by-product of large-scale industrial manufacturing. Thus, the simple rectilinear gridded or “stacked” pattern produced by the use of prefabricated components was considered “honest” because it was an expression of the basic properties of the material (Fig. 1). The standard “stretcher” bond tiling arrangement was considered “functional” because its overlapping pattern enhanced its structural properties (Fig. 2). However, a “herringbone” pattern was regarded as “dishonest,” because the material was being used to achieve a decorative and thereby morally debased outcome (Fig. 3). These three tessellations – stacked, stretcher, and herringbone – are all somewhat trivial in a mathematical sense. Indeed, they can be constructed using exactly the same components or tiles and they all cover a surface in an efficient manner. But for a Modernist architect or theorist, the choice of patterns had additional ethical, moral, or symbolic implications which also had to be taken into account.
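The remark that the three patterns use the same tile and each covers a surface can be checked with a small sketch. The Python below is an illustration only and is not taken from the chapter: it assumes a brick two units long and one unit wide, laid on a square grid that wraps around at its edges so that no cut bricks are needed at the border. The function names and the particular herringbone-style layout are hypothetical choices made for the example.

```python
from collections import Counter

def coverage(bricks, size):
    """Count how many bricks cover each cell of a size x size grid that wraps at its edges."""
    counts = Counter()
    for x, y, horizontal in bricks:
        dx, dy = (1, 0) if horizontal else (0, 1)
        for step in range(2):  # each brick is two unit cells long
            counts[((x + step * dx) % size, (y + step * dy) % size)] += 1
    return counts

def stacked(size):
    """Horizontal bricks aligned in every course (size assumed even)."""
    return [(x, y, True) for y in range(size) for x in range(0, size, 2)]

def stretcher(size):
    """Horizontal bricks; every second course is shifted by half a brick."""
    return [(x + y % 2, y, True) for y in range(size) for x in range(0, size, 2)]

def herringbone(size):
    """Diagonal stairs of horizontal and vertical bricks (size assumed a multiple of 4)."""
    bricks = []
    for y in range(size):
        for x in range(size):
            if (x - y) % 4 == 0:
                bricks.append((x, y, True))    # horizontal brick starts here
            elif (x - y) % 4 == 3:
                bricks.append((x, y, False))   # vertical brick starts here
    return bricks

def is_complete(bricks, size):
    """True when every cell is covered exactly once: no gaps and no overlaps."""
    counts = coverage(bricks, size)
    return len(counts) == size * size and all(c == 1 for c in counts.values())

for name, layout in (("stacked", stacked), ("stretcher", stretcher), ("herringbone", herringbone)):
    print(name, len(layout(8)), is_complete(layout(8), 8))
```

Under these assumptions, all three layouts use the same number of identical bricks and leave neither gaps nor overlaps, so the difference between them lies in the arrangement rather than in the tile.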

48 Tessellated, Tiled, and Woven Surfaces in Architecture


Fig. 1 Stacked pattern

Fig. 2 Stretcher or running pattern

While a detailed analysis of early Modernist attitudes to patternmaking is beyond the scope of this chapter, they do provide an insight into the complexity of the topic. For a mathematician, a tessellation may be expressed using an algebraic or algorithmic notation, and its essence may be interrogated or classified using several related methods. For an architect, a tiled or patterned surface must be appropriate


Fig. 3 Herringbone pattern

for the environmental conditions of a site, the proposed function of the building, and its budget. In addition, a pattern might also evoke social or cultural properties and symbolize values or communicate particular messages. Paradoxically, it is these latter, more aesthetic or poetic properties of architectural tiling which have been more closely associated with developments in mathematics. Indeed, in the last decade of the twentieth century, there was a renewed interest in complex tiling in architecture which has continued to the present day. Despite shifting attitudes to tiling, a recent framework for classifying connections between architecture and mathematics identifies “surface articulation” as a consistent theme throughout the history of architecture. Surface articulation is defined as the “use of mathematics to achieve an efficient or controlled coverage of a defined plane” (Ostwald and Williams 2015: 40). This category encompasses “empirically or intuitively derived methods for achieving a waterproof, or windproof [ . . . ] surface,” along with the use of geometric tiles “to achieve an intricate, patterned surface covering” (40). These definitions encompass both the pragmatic and the aesthetic implications of tiling in architecture. However, in this chapter, the focus is only briefly on the practicalities of tiling, as some of the most interesting connections between architectural and mathematical tessellations often occur because designers are inspired by particular developments in mathematics. This chapter provides a background to the history of tiling in the built environment and an overview of parallel developments that have occurred in mathematics. However, before progressing, there are four caveats that must be stated. First, the majority of the “architectural” examples discussed in this chapter are interior or exterior surfaces, rather than complete buildings, and as such they could also be


regarded as being drawn from the fields of interior design or façade design. Second, this chapter also discusses woven surfaces, treating them as a special type of tiling. The reason for this addition will become clearer in the following section, but the earliest patterned surfaces created in primitive societies were typically woven, and their resultant geometric expression was similar to a tiled or tessellated surface, even if they were not produced in the same way. Third, the majority of this chapter is about surface or “plane-filling” tiling. However, it is also possible to talk about the tiling or tessellation of space, which is associated with “space-filling” properties. Just as a square tile can readily cover a surface creating a gridded pattern, a cubic tile can readily fill a volume in a lattice. With a few exceptions, architecture’s interest in tiling has been dominated by the former category, surface tessellations, and examples of the use of volumetric tessellations are less common. Finally, when artists, designers, or architects are inspired by mathematics, they tend to adapt, rather than adopt, the language of mathematicians. As such, the ways architects describe tiling are not consistent with the nomenclature used in mathematics. Furthermore, architects’ knowledge of contemporary tessellation theory is limited and most often tiling patterns are applied without any deep understanding of their mathematical properties. This last caveat should be kept in mind when reading this chapter and looking at its examples.

Background to Tiling

In architecture and mathematics, the words “tiling” and “tessellation” are often used interchangeably to describe the process of covering a surface in polygons. While some mathematical texts use the word “tessellation” when the pattern has a clear underlying rule and “tiling” when it does not, this use is not consistent. The word “tessellation” is derived from the Latin tessella, which was a small, semi-regular cubic tile used for mosaics in the ancient Greek and Roman worlds. The origins of the word “tile” are typically traced to the Latin tegula, a roof tile, from tegere, meaning to cover. In essence, both words have their origins in the process of covering large walls, floors, or ceilings with smaller, more durable or decorative elements. A more mathematical definition of tiling is that it is the process of covering a surface using plane shapes such that two conditions are met. First, there must be no overlap between any tiles, and second, there must be no gaps in the resultant surface. If these two conditions are met, then the tiling is said to be “complete” or “true.” In addition to this, every instance of a particular polygon in a pattern is defined as a “tile,” and the number of different polygon types used to create the tessellation is called the “tile-set.” Thus, if a surface is covered in identically sized square tiles, this is a tile-set of one as there is only one type of tile (Fig. 4). The stacked, stretcher, and herringbone examples in the previous section all have a tile-set of one. If a surface is covered in a mixture of square tiles and right-angled triangular tiles (made by dividing a square across its diagonal), it is described as a tile-set of two, and so on. In theory, there is no maximum number for a tile-set, as it is possible that every tile differs from every other one. In contrast, there are only three “regular”


Fig. 4 Square

or “congruent” tessellations with a tile-set of one – squares, equilateral triangles, and hexagons (Figs. 4, 5, and 6). Tessellations such as these, which display multiple lines of symmetry or repeat their configuration, are usually called “periodic” sets. Some of the most recognizable periodic patterns are based on squares, rectangles, trapezoids, or parallelograms. Furthermore, several three-dimensional shapes also perfectly fill a volume of space by repeating only a single shape. The most obvious example of this is the cube, but the rhombic dodecahedron also has this property, along with a number of triangular and rhombic prisms. Whereas periodic tiles must have lines of symmetry or repeat their configuration, there are also two types of “aperiodic” tiles. The first type can fill a plane either periodically or aperiodically, depending on the rule used to assemble them. Gardner’s quadrilaterals are an example of the first type of aperiodic tiles (Ostwald 1998). The second type, which are also the most interesting to mathematicians and architects, are those that can only tile aperiodically. While in recent years this category has been the focus of considerable research, prior to the 1960s it was not even apparent that it existed at all. In 1961, Hao Wang set out to determine if, given a particular polygonal tile-set, there was a way of determining its capacity to tile a plane periodically (that is, to determine if the tiles would repeat their configuration). However, in 1965, Robert Berger proved that it was not possible to develop a decision procedure for periodic tiling, and thus aperiodic tile-sets and patterns must exist (Rubinsteim 1996). This realization triggered a series of developments in the field, with Berger proposing the first aperiodic tile-set, which had 20,426 different shapes. In 1967, Berger reduced this number to 104; in 1968, Donald Knuth reduced it to 92; and in 1971,


Fig. 5 Equilateral triangular tessellation

Fig. 6 Hexagonal tessellation

Raphael Robinson reduced the set to just 6 tiles. Two years later, Roger Penrose demonstrated that using a parallelogram tiling period, the set could be reduced to just two tiles, known today as “Penrose tiles” or the “darts and kites” set (Fig. 7). In the early 1990s, John Conway identified a further two-tile aperiodic set known as the “pinwheel” set.
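Returning to the regular tessellations described earlier in this section, the reason there are only three can be sketched with a short calculation. The argument, which is standard but not spelled out in the chapter, is that identical regular polygons meeting corner to corner must fill exactly 360 degrees around each shared vertex. A regular polygon with n sides has an interior angle of 180(n − 2)/n degrees, so the number of polygons around a vertex is 2n/(n − 2), and this must be a whole number. The function name below is hypothetical.

```python
def regular_tilers(max_sides=100):
    """Return the side counts n of regular polygons that can tile the plane edge to edge."""
    tilers = []
    for n in range(3, max_sides + 1):
        # Polygons meeting at a vertex: 360 / interior angle = 2n / (n - 2).
        # A tiling is possible only when this is a whole number.
        if (2 * n) % (n - 2) == 0:
            tilers.append(n)
    return tilers

print(regular_tilers())  # prints [3, 4, 6]
```

For any polygon with more than six sides, 2n/(n − 2) lies strictly between 2 and 3, so no further cases arise however far the search is extended.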


Fig. 7 Penrose “dart and kite” aperiodic tiling

Interestingly, the Penrose “darts and kites” set exhibits a type of “quasisymmetry” reminiscent of the fivefold symmetry found in quasi-crystal geometry in the 1960s and in aluminum-manganese crystals in the 1980s (Stewart and Golubitsky 1993; Ostwald 1998). A peculiar characteristic of Conway tiles is that, like fractal geometric forms, they can be scaled, such that a smaller version of the tile-set can nest within a larger one (Fig. 8). There are multiple periodic sets where this can occur, but the Conway pattern is the only well-known aperiodic set with this property. Despite this, such tile-sets tend to be what Benoit Mandelbrot calls “trivial fractals,” meaning that the connection between tessellations and fractal geometry is not necessarily as significant or interesting as this example might suggest (Ostwald and Vaughan 2016). Some tessellation patterns repeat their small-scale geometry at larger scales, which is part of the definition of a fractal. But there are many examples of nonfractal patterns with these same properties, including congruent square (Fig. 4) and equitriangular (Fig. 5) tessellations. In parallel with the development of examples of plane-filling aperiodic tiling, there have also been several proposals for space-filling aperiodic sets. Both Koji Miyazaki (1977) and Robert Ammann identified two tile-sets that will fill space aperiodically. Ammann’s tile-set comprises a pair of rhombohedra formed by creating two solids, each of which has six sides that are all the same as Penrose’s starting rhombus for the formation of the dart and kite set. Plane-filling and space-filling variations of aperiodic Voronoi tessellations have also been developed. While these Voronoi sets do not possess the mathematical purity of Ammann’s and Miyazaki’s sets, they are interesting because they appear to replicate several natural phenomena, which have in turn inspired architects, designers, and artists.
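The nesting property of the pinwheel pattern can be illustrated numerically. The chapter does not give the geometry of the tile, so the sketch below assumes the commonly described construction: a right triangle with legs of length 1 and 2, which can be cut into five congruent triangles, each a copy of the original scaled by 1/√5. The coordinates, the way the cut is made (an altitude to the hypotenuse, then a midpoint subdivision of the larger part), and all function names are assumptions chosen for the example.

```python
import math

def side_lengths(tri):
    """Sorted edge lengths of a triangle given as three (x, y) points."""
    a, b, c = tri
    return sorted(math.dist(p, q) for p, q in ((a, b), (b, c), (c, a)))

def area(tri):
    """Triangle area from the cross product of two edge vectors."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2

def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def is_scaled_copy(tri, original, factor):
    """True when tri has the side lengths of original multiplied by factor."""
    return all(
        math.isclose(s, factor * t, abs_tol=1e-9)
        for s, t in zip(side_lengths(tri), side_lengths(original))
    )

# Assumed starting tile: right angle at A, legs AB = 2 and AC = 1.
A, B, C = (0.0, 0.0), (2.0, 0.0), (0.0, 1.0)
original = (A, B, C)

H = (0.4, 0.8)  # foot of the altitude from A onto the hypotenuse BC
M_AH, M_HB, M_AB = midpoint(A, H), midpoint(H, B), midpoint(A, B)

pieces = [
    (A, H, C),            # the smaller part cut off by the altitude
    (A, M_AH, M_AB),      # the larger part AHB, split into four
    (M_AH, H, M_HB),
    (M_AB, M_HB, B),
    (M_AH, M_HB, M_AB),
]

scale = 1 / math.sqrt(5)
print(all(is_scaled_copy(p, original, scale) for p in pieces))
print(math.isclose(sum(area(p) for p in pieces), area(original)))
```

Because each piece is a scaled copy of the whole, the same cut can be applied again to every piece, which is the sense in which a smaller version of the pattern nests inside a larger one.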


Fig. 8 Conway “pinwheel” aperiodic tiling

Tiling in Architecture

Throughout history, tiling has been closely associated with architecture. Even the earliest primitive shelters were often floored with woven matting, walled with roughly stacked masonry, and carved with geometric patterns. One of the oldest foundation myths of architecture, the “primitive hut of the ancients,” depicts prehistoric humans plaiting together branches in a tight grid to create the first shelter (Rykwert 1991). Evidence of this approach to weaving baskets, mats, and screens can be found in Pre-Mesolithic and Neolithic societies (10,000 BC). These early constructions were produced by plaiting dried leaves or vines to create larger surfaces. By repeating a regular pattern, these woven objects became more durable, and by combining different materials, they became decorative surfaces which often grew to take on cultural significance (Gerdes 1999). Indra Kagis McEwen (1993) suggests that the origins of architecture – meaning not only the first constructed works, but also those cited in mythology as the starting point for architecture – can be traced to either woven structures or mosaic floors. While, from a mathematical perspective, weaving and tiling are different processes, they share several common linguistic ancestors and their surface conditions have similar properties. For example, in Greek mythology, Daedalus is credited as being the first architect because he created the Labyrinth at Knossos and Ariadne’s choros, an intricate mosaic dance-floor. However, there remains some confusion about


which of these is regarded as the first work of architecture. In mythology, both are classified as architecture by virtue of the fact that they are geometric weaves that animate human behaviors. In the first instance, Daedalus’ Labyrinth at Knossos is a complex spatial pattern that describes the pathways taken by the Minotaur at its core, and the sacrificial victims released into its confines. Conversely, the choros of Ariadne encapsulates the movement patterns followed by dancers as they engage in a choreographed rite. In both cases, the surface pattern is significant because it shapes or animates the actions of people. Regardless of which of these examples might be considered the first work of architecture, it is clear that in Greek and Roman mythology, the origins of architecture are tied to the act of creating patterned surfaces. For Grünbaum and Shephard (1987: 1), the practical “art of tiling” commenced with the first use of stones to cover a floor. Such irregular floors can be found in Neolithic ruins, and from around 7000 BC, evidence is available that more regular stone and timber floors were used in granaries. In Mesopotamia and ancient Egypt, primitive bricks were made by putting clay into square or rectangular molds (often with thickened or hollowed central sections, to facilitate stacking) and then firing the clay in a kiln or drying it in the sun. These bricks were used to create roads, paths, floors, and walls, being tiled together in a range of patterns which were intended to increase their strength or durability. Grünbaum and Shephard (1987: 1) also specifically connect the “art of tiling” with architecture because in even the earliest societies, people realized that tessellated patterns add richness, decoration, or ornamentation to the surface of a building.
Furthermore, at any point in history, “whatever kind of tiling was in favor, its art and technology always attracted skillful artisans, inventive practitioners, and magnanimous patrons” (Grünbaum and Shephard 1987: 1). For example, some of the earliest mosaics can be traced to Mesopotamia, when colored tiles were combined together to create patterns or pictures on walls or floors. Colored tiles, which celebrated geometry, rather than pictorial representation, were also extensively used in Ancient Persian and Indian buildings, and today they are often linked to Islamic architecture. While the use of such tiling patterns was more ubiquitous than this, the Mosques, Madrassas, and Palaces of North Africa and the Middle East represent some of the most advanced tiling applications ever seen, and remain a source of fascination and debate to the present day (Bonner 2017; Wichmann and Wade 2017). Tiling was an important part of Islamic architecture because religious beliefs restricted the use of pictorial or figurative representations. As such, it is not surprising that elaborately carved and inlaid tilings and arabesques reached possibly their highest level of refinement in Islamic architecture. For example, the Moorish palace of the Alhambra in Granada (Andalusia) features examples of between thirteen and sixteen different types of congruences or isometries. These tiles, which are mostly from the late thirteenth and early fourteenth century expansion of the building, are today celebrated for their richness and inventiveness. The latter dimension, inventiveness, is significant because it was only in the seventeenth century that the first serious study on tessellation was completed by Johannes Kepler, and the nineteenth century that Yevgraf Fyodorov identified that periodic


tilings all conform to one of seventeen isometries or “wallpaper” groups. Thus, three hundred years before the different types of congruences in periodic tiling were classified and understood by mathematicians, the designers of the Alhambra had already used many of them. In a further example, Lu and Steinhardt (2007) note that several tiling patterns in mid-fifteenth century Iranian architecture have similarities to Penrose tiling in that they appear to possess quasi-periodicity. Lu originally observed these properties in the Madrasa in Bukhara (Uzbekistan) and in a more advanced version in the Darb-i Imam shrine in Isfahan (Iran). While both the veracity of claims about the tiling patterns in the Darb-i Imam shrine and the date they were produced have been disputed (Lauwers 2018), Kukeldash Madrasah in Bukhara also possesses several complex tiled surfaces within three-dimensional ribbed-ceiling domes or arches (Makovicky 2018). A completely different set of tiling traditions emerged across Europe, with the floors of monasteries, public buildings, and commercial buildings often tiled in symmetrical, repetitive colored patterns, while churches and palaces were lined in mosaic representations of biblical scenes, famous battles, or landscapes. As such, a bifurcation occurred in European tiling practices with more functional patterns reserved for interior thoroughfares or axes, and a pictorial tradition, sometimes occurring in parallel with frescos or trompe l’oeil, serving as a type of communication or entertainment. Examples of the first type, and especially so-called “checkerboard” tilings, can be found throughout history in classical Greek temples, Roman villas, Medieval wineries, and Victorian hospitals. In general, such geometric tiles tend to be used in entries, hallways, or atria.
While the base patterns were often simple, they were also regularly framed with “border tiles” or punctuated with decorative tiles, sometimes depicting heraldic coats-of-arms or guild signs and emblems. Despite the use of colored tiles and those with different textures or illustrations, many of these tiles could be regarded as being largely functional applications. However, there are notable exceptions including the tiled floors in Masonic lodges and temples. Such ceremonial floors are typically rectilinear in shape and are tiled (often on the diagonal) in a black and white checkerboard, with a border of triangular (or diamond-shaped) tiles enclosing them. Many of these floors feature decorative central tiles depicting crosses (“saltires”) and six-pointed stars, or they are inlaid at their corners with images of Masonic tools: the square, compass, plumb-line, and trowel. An example of the second type of tiling, where the purpose is more representational or pictorial, is the famous second century mosaic from Palencia, Spain. This tiled panel depicts the four seasons on a square field that has been divided diagonally creating four triangular regions, which are further subdivided into octagonal, trapezoidal, and cruciform tilings. Each of the triangular regions is centered on a human face (within an octagonal tile), surrounded by depictions of birds, plants, and mythical beasts, representing the passage of time in a year. In another example, in the sixth century Basilica di San Vitale in Ravenna, Italy, the elaborate mosaics depict Old Testament stories in the rich golden and brown hues of the Hellenistic-Roman decorative tradition. The underlying geometry of


the tiling patterns in these mosaics is not significant, because their purpose is to illustrate a combination of biblical themes and natural elements (flora and fauna). The Basilica di San Vitale also has a famous mosaic floor depicting a labyrinth, as does the thirteenth century Chartres Cathedral in France. While the actual purpose of labyrinth tilings of this type remains unknown, they are often interpreted as either symbols representing the challenging paths taken by pilgrims to reach the place of worship, or else as the literal path a kneeling penitent follows around the floor of the cathedral while seeking absolution. Such examples, much like the Daedalic Labyrinth at Knossos and Ariadne’s dance-floor, use tiling to illustrate potential or desired movement. By the early nineteenth century, pictorial tiling had fallen out of favor across most parts of Europe, being both prohibitively expensive and, in some countries, less socially acceptable than it previously was. In contrast, simple, decorative tiling had become sufficiently affordable that it could be found in many houses in fireplace surrounds, window and door sills, and even the walls of kitchens. During the Victorian era in England and America, tiles were not only hard-wearing, they were also a means of displaying middle class wealth, and they offered a level of hygiene which the previous stone, timber, or stucco surfaces could not. However, in the closing years of the nineteenth century, a range of conflicting forces began to change the ways tiles were used. On the one hand, the industrial revolution increased the scale of mass production such that tiles were more available than ever before, but on the other, with the rise of the Arts and Crafts movement in England, there was a new-found enthusiasm for handmade tiles.
Modern Architecture, which grew to dominate the architectural canon of the early twentieth century, rejected Victorian, Gothic-Revival and Arts and Crafts style architecture, along with almost all of their tiling traditions. Oscar Niemeyer was possibly the only Modernist architect who included elaborate murals and tiled surfaces in his designs for religious structures in Pampulha and Brasilia (Brazil). By the 1970s and 1980s, a growing number of architects and urban planners had begun to criticize Modernism for its often cold, inhuman spaces and forms. The backlash against Modernism, which was collectively known as Post-modernism, sought to reintroduce a connection to history and a sense of humanity. One design strategy Post-modern architects used to achieve these goals was to employ ostentatious tiled decorations on walls and floors, some of which were intended to replicate or recall historic places or events. For example, architect Charles Moore’s Piazza d’Italia in New Orleans (USA) was completed in 1978. This design for a public space features multiple decorative, oversized, and brightly colored tiling patterns, which generally evoke ancient Roman (or classical Greek) architecture (Jencks 1991). The ground-plane of Piazza d’Italia is tiled in concentric black and white rings, recalling both the labyrinth floors of Basilica di San Vitale and Chartres Cathedral and the iconography of Las Vegas casinos and shopping malls. As the tiles Moore chose were often cheap and mass-produced, the whole effect was like a temporary stage set. Robert Venturi and Denise Scott Brown’s 1982 Gordon Wu Hall (Princeton, USA) uses grand, decorative tiling patterns above its entry façade, and their 1983 Lewis Thomas Laboratory


for Molecular Biology (Princeton, USA) is covered in multiple different brick tessellation patterns. In both cases, the tiling patterns tend to be simple, but the way they are positioned so prominently and boldly was intended as an ironic counterpoint to Modernism and a means of evoking historic building styles (von Moos 1987). One of the first direct architectural engagements with the mathematics of tiling occurred in the final decade of the twentieth century. In Melbourne (Australia), ARM architects were commissioned to refurbish the historic Storey Hall auditorium. Their design, which was completed in 1996, overlays lime green Penrose Tiles on the historic façade, as well as throughout the new foyer and auditorium. While it might be possible to classify ARM’s use of tiles as an extension of the Post-modern tradition, their design includes multiple references to the actual mathematical properties of aperiodic tiles and their explanations directly reference Penrose’s work (Ostwald 1998). Not only does their version of the dart and kite set partially function in three dimensions, but it emphasizes the importance of those parts of surfaces which are either untiled or which cannot be tiled because of a deliberate flaw in the logic followed. In another more recent example, Federation Square in Melbourne is a large arts and entertainment complex which was completed in 2002. It was designed by LAB architects and it uses a pinwheel aperiodic tile-set as both exterior cladding and to shape parts of its planning. The exterior pinwheel tiles are made of zinc (both solid and perforated), glass (translucent and frosted), and sandstone. 
A three-dimensional glass tile-set also serves as an atrium roof and the public spaces of Federation Square are lined with cobblestones (often inlaid with fragments of text), beneath which is a hidden environmental “labyrinth.” Federation Square and Storey Hall are not alone in their flamboyant adoption of recent mathematical tile-sets for generating architectural space and form. There have also been multiple examples of Voronoi tessellations including Minifie Nixon’s Centre for Ideas (Melbourne) in 2004 and ARM’s 2009 Melbourne Recital Hall (Melbourne). Such is the popularity of aperiodic and quasi-periodic paving patterns that they have been used in many public buildings and spaces – for example, the Mathematical Institute in Oxford, the Science Centre in Paris, and the Zamet Centre in Rijeka – and have even been used in commercially produced bathroom and kitchen tiles. In most of the examples given in this chapter, the focus has been on seamlessly covered surfaces, but certain combinations of periodic tile-sets can create areas that are unable to be tiled. Conventionally, this is regarded as a flaw or error and is rectified by removing a number of surrounding tiles and reworking the pattern until there are no holes. But what if the holes, their shapes and frequency, are viewed as a different dimension of the tiling pattern? John Conway describes this way of thinking about tessellations as “hole theory” and likens it to imagining “a vast temple with a floor tessellated by Penrose tiles and a circular column exactly in the center. The tiles seem to go under the column, [but] the column covers a hole that can’t be tessellated” (Gardner 1989: 26–27). ARM’s Storey Hall offers an early architectural interpretation of the significance of untiled, or un-tile-able parts of surfaces.


Conclusion

In the first century BC, the Roman author and military architect Marcus Vitruvius Pollio published De Architectura. In that work, he famously defined the three core properties of architecture as firmness, commodity, and delight. These same properties are also behind the enduring appeal of tiled, tessellated, and woven surfaces in architecture. Tiling occurs in part to increase the strength or stability of a surface; it also improves the function or usability of a surface, and it provides an opportunity to develop decorative, symbolic, or poetic properties. Thus, for practical, social, and cultural reasons, tiling has remained significant in architecture. However, the relationship between architectural and mathematical tiling has been less consistent. It could be argued that architectural tiling, particularly in Islamic architecture, effectively predates the development of formal mathematical knowledge about tiling. It must also be noted that more recently, architects and designers have learnt from and openly adapted advances in aperiodic tiling. Whether or not this relationship will continue is currently unknown, as multiple cultural and social factors shape the way architects and designers work with mathematical concepts. But regardless of the evolving relationship between architecture and mathematics, tiling remains one of their most obvious and visible points of contact.

Cross-References

Classical Greek and Roman Architecture: Examples and Typologies
Classical Greek and Roman Architecture: Mathematical Theories and Concepts
Fractal Geometry in Architecture
Labyrinth

References

Bonner J (2017) Islamic geometric patterns: their historical development and traditional methods of construction. Springer, Cham
Gardner M (1989) Penrose tiles to trapdoor ciphers. W. H. Freeman, New York
Gerdes P (1999) Geometry from Africa: mathematical and educational explorations. Mathematical Association of America, Washington, DC
Grünbaum B, Shephard GC (1987) Tilings and patterns. W. H. Freeman, New York
Jencks C (1991) The language of post modern architecture. Wiley, London
Lauwers L (2018) Darb-e imam tessellations: a mistake of 250 years. Nexus Netw J 20:321–329
Loos A (1998) Ornament and crime: selected essays. Ariadne Press, Riverside
Lu PJ, Steinhardt PJ (2007) Decagonal and quasi-crystalline tilings in medieval Islamic architecture. Science 315:1106–1110
Makovicky E (2018) Vault mosaics of the Kukeldash madrasah, Bukhara, Uzbekistan. Nexus Netw J 20:309–320
McEwen IK (1993) Socrates’ ancestor: an essay on architectural beginnings. MIT Press, Cambridge, MA
Miyazaki K (1977) On some periodical and non-periodical honeycombs. University Monographs, Kobe
Ostwald MJ (1998) Aperiodic tiling, Penrose tiling and the generation of architectural forms. In: Williams K (ed) Nexus II: Architecture and mathematics. Edizioni dell’Erba, Florence, pp 99–111
Ostwald MJ, Dawes MJ (2018) The mathematics of the modernist villa: architectural analysis using space syntax and isovists. Birkhäuser, Cham
Ostwald MJ, Vaughan J (2016) The fractal dimension of architecture. Birkhäuser, Cham
Ostwald MJ, Williams K (2015) Mathematics in, of and for architecture: a framework of types. In: Williams K, Ostwald MJ (eds) Architecture and mathematics from antiquity to the future: Volume I. Birkhäuser, Cham, pp 31–57
Rubinsteim H (1996) Penrose tiling. Transition 52(53):20–21
Rykwert J (1991) On Adam’s house in paradise: the idea of the primitive hut in architectural history. MIT Press, Cambridge, MA
Salingaros N (1999) Architecture, patterns and mathematics. Nexus Netw J 1(1):75–85
Stewart I, Golubitsky M (1993) Fearful symmetry. Penguin, London
von Moos S (1987) Venturi, Rauch and Scott Brown: buildings and projects. Rizzoli, New York
Wichmann B, Wade D (2017) Islamic design: a mathematical approach. Springer, Cham

Stereotomy: Architecture and Mathematics

49

Giuseppe Fallacara and Roberta Gadaleta

Contents

Introduction
Geometric Knowledge for the Rationalization of Structural Form Constructed with Small Elements
Stereotomic Architecture Is Historically Based on Geometrical and Cutting Technique Knowledge
The Application of Stereotomy Using Innovative Technology: “Stereotomy 2.0”
Research About “Stereotomy 2.0”
Stereotomy with 3D Printing in the Age of Industry 4.0
Conclusion
Cross-References
References

Abstract Stereotomy is the art of building with small, geometrically refined structural elements, which allow the construction of architectural systems with a triple value: aesthetic, static, and functional. A sense of constructive rationalization is inherent to the discipline: knowledge of geometry and cutting techniques permits the optimization of structural form, and advanced technologies for the design and production of building elements simplify its realization. While originally formulated in the sixteenth–seventeenth century, stereotomy is valuable in contemporary architecture, and research about it continues today in various centers around the world. This chapter traces the fundamental aspects of the history of stereotomy and its development to the present day, touching on its geometrical principles and mentioning some important examples.

G. Fallacara · R. Gadaleta, DICAR, Politecnico di Bari, Bari, Italy. e-mail: [email protected]; [email protected]
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_85

Keywords Stereotomy · Geometry · Architecture · Stone · Bond · Cutting techniques · 3D printing · Robotic fabrication · Industry 4.0

Introduction

Stereotomy (from ancient Greek στερεός, “solid,” and τομή, “cut”) is the art of cutting solids. Through geometrical knowledge, it provides a constructive rationalization of architectonic systems composed of distinct, interlocking structural elements, generating a resistant form that combines aesthetic expressivity with structural strength. Stereotomy derives from the art du trait géométrique (the art of the geometrical line), that is, the art of drawing, through rules of projective geometry, the real shapes of the faces of the constructive elements in order to cut them. The ancient discipline of stereotomy, originally defined by Philibert Delorme in Le Premier Tome de l’Architecture and formulated in the sixteenth–seventeenth century, has been revived in recent years because it is valuable in contemporary architectural design and construction. Research on stereotomy, started by its founder Professor Architect Claudio D’Amato Guerrieri, for many years constituted the cultural identity of the Faculty of Architecture of the Politecnico di Bari and later of the DICAR, and it continues today in other research centers around the world. Through the knowledge of stereotomy, it is possible to update traditional construction with small structural elements, which constitutes an important part of architectural heritage and is a fundamental model of constructive rationalization, and to simplify it using modern technologies for the design and production of building elements. This chapter provides an overview of the fundamental aspects of the history of stereotomy before describing its development to the present day, including examples of recent applications and propositions.

Geometric Knowledge for the Rationalization of Structural Form Constructed with Small Elements

Studies of stereotomy are closely related to historical architectural practice, which over the centuries has used a construction technique based on the organic connection of small structural elements, made of different materials according to the particular genius loci. This traditional construction technique, although mostly not codified by the stereotomic discipline, used methods dictated by the ancient geometria fabrorum, handed down over the centuries, to meet the same needs of constructional rationalization that are intrinsic to the logic of masonry structures but also to that of wooden
structures. The application of this construction technique to materials with a plastic-masonry vocation is visible in the use of both stone and brick; Brunelleschi used brick for the original bond solution of the dome of the cathedral of Santa Maria del Fiore in Florence. With materials of an elastic-wood vocation, the technique is well represented by the studies of Leonardo da Vinci and Sebastiano Serlio on reciprocal frame structures, whose subsequent transposition into the constructional logic of stone is represented by the stereotomic patents of Joseph Abeille and Jean Truchet, approved in 1699 and published in Machines et inventions approuvées par l’Académie Royale des Sciences (Gallon 1735: 159–164). These patents present stone bonds for a “flat vault” with a single ashlar (a square-cut building stone) configured by stereotomic techniques. In parallel with the traditional construction technique, the art du trait géométrique in the historical stereotomic treatises also refers “to the projective technique aimed to the wood and stone building art” (Fallacara 2007: 23). The dome of the Royal Chapel of the Château d’Anet (sixteenth century) in Anet, France, designed by Philibert Delorme, is an example of architecture codified by the stereotomic discipline. This dome spans approximately 8 m and uses 28 invariant ashlar types (Potié 1996: 116–117), whose face height at the extrados (the outer or upper surface of an arch or vault) ranges from 20 to 50 cm. This example shows how the stereotomic discipline, compared with the traditional construction of a dome, which requires more ashlars, allowed Delorme to control the definition of the form and optimize the production of the invariant ashlars, minimizing the number of panneaux (panels or cardboard templates) needed to cut them.
The continuity of constructive knowledge seen in these historic examples demonstrates that techniques which optimize and rationalize construction have always been pursued, because they are efficient and advantageous and make it possible to respond coherently and practically to the problems of their time. As Paolo Portoghesi said to Claudio D’Amato in the interview published in D’Amato (2014):

Innovation and permanence are two values of exactly the same importance [ . . . ]. Because innovation is not in vain, it is necessary to have a substantially strong permanence; and a permanence associated with a construction technique is certainly very important (179). The study of stone by adding stereotomical knowledge to the knowledge, let’s say instinctive, of practice tradition, it is a fascinating innovation (187). Heidegger said that the great tradition is wait, openness to the future, the future is built only with [ . . . ] the gifts of the past (189).

Stereotomic Architecture Is Historically Based on Geometrical and Cutting Technique Knowledge

A structure built from small constructive elements requires a well-determined geometry, which generates a form-resistant structure whose aesthetically refined form also has structural value.

“Geometria autem plura praesidia praestat architecturae” (geometry offers many aids to architecture), writes Vitruvius in De Architectura (Liber I, 1, 4), specifying that the art of construction is based on the knowledge of precise geometric rules that allow control of the shape to be constructed. A constructed form can be obtained through the application of geometria fabrorum, namely, the empirical rules derived from traditional knowledge, or through theoretical geometry in its disciplinary codification, on which the art of stereotomy is based for the control of complex forms (Gadaleta 2018: 708). Knowledge of classical and projective geometry is fundamental for a stereotomic architectural project, because it allows control of the whole constructed form and of the structural elements that compose it, for both their design and their cutting. The treatise by A. F. Frézier of 1737–1739 (Frézier 1980) (Fig. 1) proposes that stereotomy, understood as the art de la coupe des pierres (the art of stone cutting), is the direct consequence of the rigorous method du trait géométrique, originally used in carpentry for cutting structural elements and based on the projective method analyzed and then codified by Girard Desargues, who extended Euclidean geometry to infinite space, an extension later expressed through homogeneous coordinates. Philibert Delorme, “who was ascribed with the origin of the stereotomic discipline” (Fallacara 2007: 38), declares that his geometry derives from Euclidean geometry. Stereotomy is a body of theoretical-practical knowledge, and geometric theory is always indispensable for the stereotomic project, as it is used to determine the form of each single structural element in relation to the sound realization of the whole constructive system.

The theory, writes Rondelet, is a science that guides all practical operations. This science is the result of experience and reasoning based on the principles of mathematics and physics applied to the different operations of art. It is by the theory that a skilled builder comes to determine the forms and the right dimensions that must be given to each part of the building, according to its position and to the efforts it will have to bear, so it turns out perfection, solidity and economy. (D’Amato 2014: 31).

The correspondence of form and structure is inherent in the masonry construction technique, which assembles distinct, interlocking structural elements of reduced dimensions into a precise bond that simultaneously determines aesthetic expressivity and structural strength. In a stereotomic architectural system, it is very important that the morphology of each individual construction part and the configuration of the whole construction system stand in a strong relationship of reciprocal dependence.

Stereotomy is founded on three invariants that are characteristic and intrinsic to the definition of stereotomic architecture: the ‘prefigurative invariant’, or the ability to subdivide an architectural continuum into the appropriate parts; the ‘technical-geometric’ invariant, or the ability to provide a precise geometric definition of the stone element (projection technique); the ‘structural invariant’, or the ability to ensure the mechanical balance of the architectural system with dry joints (the mechanics of rigid bodies). The existence of these three principles [ . . . ] makes it possible to define a selective filter that can distinguish generic stone architecture from architecture of a stereotomic nature, which may be considered as the extreme speculation of stone architecture. (D’Amato Guerrieri and Fallacara 2006a: 34–35).

Fig. 1 A. F. Frézier, Traité de stéréotomie, 1737–1739, Tome II, Planche 53. (From Frézier 1980: 331)

These three principles provide an increased sense of constructive rationalization that simplifies the production and construction of organic masonry architectonic systems. This occurs because of:
– The design accuracy of the whole masonry bond, founded on knowledge of its geometrical and static rules, which can improve the morphology of the elements to be produced and reduce their number.
– The executive precision of each structural element, founded on knowledge of the most advantageous current cutting techniques.

The Application of Stereotomy Using Innovative Technology: “Stereotomy 2.0”

In contemporary terms, “Stereotomy 1.0” can be identified with the historical discipline of stereotomy, related to the definition of stonecutting techniques. It follows that “Stereotomy 2.0” is the contemporary development of the discipline through the use of new technology. This “new field of research, called ‘digital stereotomy’, was born in 2000 at the Politecnico di Bari (Italy) and has since spread throughout academic community worldwide” (Fallacara and Barberio 2018). Its origins lie in the knowledge of construction techniques, both those based on the traditional modus operandi and those codified by the stereotomic discipline, combined with knowledge of the innovative technologies of infographic design and of the production of structural elements, demonstrating “the potential of load-bearing stone in contemporary architecture when tradition and technological innovation are combined” (D’Amato Guerrieri and Fallacara 2006b: 39). The stereotomic discipline allows for the precise geometric definition of the whole architectural system and of each constructive element that composes it, leading to a “strong conceptual affinity with the development of integrated CAD-CAM systems, which ensure the fundamental requisite of Stereotomy, specifically design and building precision” (D’Amato Guerrieri and Fallacara 2006a: 35). The complex stereotomic art du trait géométrique (the art of geometric drawing) is based on knowledge of complex drawing rules from projective geometry, used to determine the whole masonry system and the real shape of every face of each element for cutting. This process is greatly simplified by current three-dimensional computational design techniques.
The control of this complex geometry is made simpler by modern three-dimensional infographic modeling tools that, compared to the traditional geometric tracing method, allow the architect to visualize the geometric division on the plane and the corresponding structural tessellation in space. This can be used to verify equivalences and symmetries of the parts into which it is subdivided, with visible assurance of the actual accuracy of the
design, without resorting to mathematical calculations and more complex geometric construction rules. The use of CAD allows the architect to model each construction element and assemble it with the others into an architectural system. Its two-dimensional and three-dimensional configuration can then be viewed through a progressive sequence of commands and operations. This process effectively connects the contemporary architect with the ancient tailleur de pierre, the stonecutter or stereotomist who knew the geometric rules and the sequence necessary to reproduce them on the material, shaping it. In this sense, the architect defines the ashlars and the entire architectural system using software, with a constructive logic that brings this process closer to the real than to the virtual. Existing CAD/CAM software and advanced production technology, combined with the structural verification of the constructive system, allow the direct transfer from the ideal to the real. Infographics software, in fact, enables the direct transfer of information from a three-dimensional model to 3D printing and rapid prototyping (RP) machines, which permit the construction of the maquette or preliminary model, the first stage on the way from the drawing to the concrete architectural realization, on a small scale. The direct transfer from the three-dimensional model to robotic fabrication simplifies the production of structural elements, which in the past required highly specialized manual workmanship and longer fabrication times. Furthermore, robotic fabrication achieves a high level of precision, which is very important for the structural stability of the whole architectural system. The more accurate the three-dimensional model and the more advanced the robotic fabrication, the more precise the structural elements will be, and with them the whole architectonic system, ensuring its correct functionality.
It is useful to observe that the stereotomic principle of constructive rationalization (based on design accuracy and executive precision) is very much in line with the process of industrial production. It is useful to observe another concept of high logical and perfect rigor for the process of industrial production referred to the programmed modularity of the blocks, according to which of any vaulted system it is possible to optimize a least number of useful blocks to satisfy the whole construction of the vault. The idea of serial production, typical of the manufacturing industry, conjugates today with the notion of uniqueness of the architectural product giving life to the concept of “serial uniqueness” of the manufactured article. It is a justifiable ossimoro [oxymoron] thanks to the advent of the parametrical–variational info-graphic era conjugated to the use of the utensils machines [ . . . ] for which every project is unique and reproducible in series (Fallacara 2007: 145).

The use of robots in the stone processing industry permits the optimization of cutting processes, minimizing the margin of error between the virtual model and its realization and thus responding to the requirements of stereotomy. Diamond wire cutting is an important technique for reducing production time and increasing the quality of the elements produced. For this reason, it is important for the architect to be continuously updated on
production and cutting technologies, of which robotics represents a fundamental innovation for its speed and precision. An accurate knowledge of practical operations and robotic fabrication technologies is therefore fundamental for the correct stereotomic design of structural elements and their realization, which “carries over the peculiarities of the traditional architecture to a level of best technical efficiency and performance” (Fallacara 2007: 145). For all these reasons, “the role of stereotomy [ . . . ] may prove strategic because it can help restore theoretical and practical unity to the process of designing and building an architectural work” (D’Amato Guerrieri and Fallacara 2006a: 34), a synthesis of “knowing” and “knowing how.” The right way of using the material in stereotomy determines its potential in the architectural project: with the art of stereotomy, it is possible to distribute the material where it is structurally necessary, lightening the structural form, as happens in natural forms. Thus, stereotomy, in accordance with the definition written by Charles Perrault, is the “art de se servir de la pesanteur de la pierre contre elle-même et de la faire soutenir en l’air par le même poids qui la fait tomber” (the art of using the weight of the stone against itself and of holding it up in the air by the same weight that makes it fall) (Fallacara 2007: 40). As Ludwig Mies van der Rohe said in his inaugural address at the Illinois Institute of Technology in 1938, “we must remember that everything depends on how we use a material, not on the material itself . . . New materials are not necessarily superior. Each material is only what we make of it” (Mies qtd in Mills et al. 2013: 43).
Stereotomy can be used with current technologies and artificial or composite materials; however, the choice of the material used must be made based on its mechanical and geometric properties in relation to the construction technique that is reliant on the connection of small elements.

Research About “Stereotomy 2.0”

Stereotomic research aims to study the most suitable structural form of a constructive system, considering aesthetic, static, and functional values and the rationalization of the construction and production process. This is achieved through research into the optimization of the form, dimension, and number of structural elements, considering static resistance, aesthetic expressiveness, typological character, and the symbolic value of the whole structural form. Studies, treatises, and constructive applications show how stereotomy is often related to the definition of vaulted space. The symbolic value of the vault is expressed by the formal analogy between the architectural vault and the celestial vault (celestial sphere), through the etymological relationship between the ancient word stereoma (meaning celestial vault), used by Cosmas Indicopleustes (sixth century AD), and the word stereotomy (Fallacara 2014: 17–21). For example, the Mausoleum of Galla Placidia in Ravenna has an intrados (the inner or lower surface of an arch or vault) decorated with stars on a blue background. This affirms the coincidence between the

Fig. 2 Claudio D’Amato and Giuseppe Fallacara, Portale Abeille. Left: From Abeille’s patent to the Portale Abeille through topological variation. Right: The Portale Abeille exhibited at the X Venice Biennale in 2006

constructed form of the vault and the symbol of the celestial vault, the space that overhangs and “covers.” Studies conducted by Prof. Claudio D’Amato Guerrieri and Prof. Giuseppe Fallacara allowed them to theorize about and experiment with the topological deformation of J. Abeille’s planar reciprocal structure into a barrel vault called the Portale Abeille (Fig. 2). This was accomplished by repeating two types of invariant ashlars, reciprocally interconnected, and was presented at the X International Architecture Exhibition held in Venice in 2006. The Bin Jassim Dome, constructed in Qatar in 2012, is a covering for the hammam designed by architect J. Caspari and likewise uses reciprocally interconnected ashlars. It was conceived by a team composed of architect G. Fallacara, engineer M. Brocato, the construction company Mecastone of L. Tamboréro, the company SNBR, which carried out the stonecutting with digitally controlled machines, EDM Projects, and Protostyle (Fallacara 2012: 120). The dome measures about 6 m in extrados diameter and consists of 110 trapezoidal stone ashlars, of which 8 are invariant element types, whose face at the extrados measures about 1 m in length. During her doctoral research (XXIX cycle, 2014–2016, Politecnico di Bari with Università degli Studi Roma Tre), architect Roberta Gadaleta designed an innovative bond for a hemispherical stereotomic dome in cut stone, which optimizes its construction compared with traditional methods. This is achieved with a fivefold structural geometry of the bond that reduces the number of invariant ashlars to be produced while keeping their dimensions small and increasing the diameter of the dome, respecting static laws and improving aesthetic expressivity (Gadaleta 2018). Her study drew on knowledge of the number and morphology of invariant structural elements, as well as the size and configuration of historical

Fig. 3 Roberta Gadaleta, New stereotomic bond for dome in stone architecture. Left: View of intrados and lateral view. Right: View of intrados of the new stereotomic dome, highlighting the geometry of the spherical dodecahedron (grey lines), the distribution of static forces (black curved lines and arrows), and the minimum triangular part (dotted lines)

constructed forms, together with the study of the possible geometrical methods of spherical division. The innovative bond is based on a particular geodesic tessellation with fivefold symmetry: the hemispherical calotte of the dome is divided according to the spherical regular dodecahedron, whose regular pentagonal faces are subdivided according to a structural pattern derived from J. Kepler’s “Aa” pentagonal tessellation (Kepler 1619), the same geometry that characterizes the atomic structure of quasicrystals with icosahedral symmetry (Fig. 3). The 34 invariant ashlars in Gadaleta’s dome are repeatable and measure from 25 to 40 cm in length; the diameter of the dome is 10.06 m at the extrados, and the thickness of the ashlars is 22 cm. Static analysis concluded that the structural form is verified, including when particular ashlars are appropriately removed to lighten the structural form (Gadaleta 2018). From the infographic model of Gadaleta’s dome, it was possible to prototype ashlars in PLA with the Ultimaker 2 machine for the construction of a maquette at a scale of 1:14.37, with a dome diameter of 70 cm at the extrados. The face height of the ashlars at the extrados ranges from a minimum of 1.5 cm to a maximum of 4 cm (Fig. 4). After all the prototyped ashlars had been manually numbered and classified, the maquette was constructed in 5 working days, and the assembly process was simplified using construction drawings prepared for that

Fig. 4 Roberta Gadaleta, new stereotomic bond for dome in stone architecture. Intrados (left) and lateral view (right) of maquette. Extrados diameter: 70 cm. Scale of maquette: 1:14.37

Fig. 5 Left: Lapisystem, ABB robot of T&D robotics used for cutting stone ashlars. Center: Ashlars cut by SNBR for this research at Sainte-Savine-Troyes (France) for the construction of the stone prototype. Right: Intrados of the stone prototype constructed by SNBR and designed by the author. Extrados diameter: 1.24 m; Scale of prototype: 1:5.74

purpose, in which invariant ashlars of the same type have the same color and number and belong to the same layer in the infographic software. The maquette was constructed without centering, using the traditional technique of the trammel: a small wooden rod, equal in length to the radius of the sphere inscribed in the dome, with one end positioned at the center of the dome and the other on the faces of the ashlars, which makes it possible to define the perfect sphericity of the intrados. The construction of the maquette was very useful for checking the interlocking achieved by the bond and for understanding the most effective laying rules (Gadaleta 2018). The static behavior of this new bond was analyzed, along with the possibility of opening holes in it by removing some ashlars. To verify the feasibility of the cutting work and of the assembly of the bond, a stone prototype of a smaller dome with this innovative bond was made by the société SNBR at Sainte-Savine-Troyes in France, using an ABB robot (Fig. 5). The scale of the stone prototype is 1:5.74, and the extrados diameter measures 1.24 m. The prototype was further simplified, since
the decagons and the nonagons are each made up of only one ashlar, because they are smaller than at real scale; in fact, they measure about 10 cm in length and 4 cm in thickness. The ashlars were made of pierre de Lens and cut by the robot in 60 h; they were first numbered and then placed on a polyurethane centering, which the robot prepared in 4 h and onto which the design of the bond was transferred by engraving. This engraved design is not indispensable for the construction of the dome, as demonstrated by the PLA maquette, but it helped to speed up assembly, which took two people 2 working days, including cleaning the prototype after assembly (Gadaleta 2018).
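
The scale figures quoted for the two models can be cross-checked with simple arithmetic. The following is a minimal sketch in Python, not part of the authors' workflow: the function and variable names are hypothetical, and the full-size diameter it derives for the smaller stone dome is an inference from the stated scale, not a figure given in the text.

```python
def model_length(full_size_m: float, scale_denominator: float) -> float:
    """Length on a model built at scale 1:scale_denominator (metres)."""
    return full_size_m / scale_denominator


def full_size_length(model_m: float, scale_denominator: float) -> float:
    """Full-size length implied by a model length at scale 1:scale_denominator."""
    return model_m * scale_denominator


# Figures quoted in the text.
DOME_DIAMETER_M = 10.06       # extrados diameter of the designed dome
MAQUETTE_SCALE = 14.37        # PLA maquette, scale 1:14.37
PROTOTYPE_DIAMETER_M = 1.24   # extrados diameter of the stone prototype
PROTOTYPE_SCALE = 5.74        # stone prototype, scale 1:5.74

# The maquette diameter should come out close to the 70 cm stated in the text.
maquette_diameter_m = model_length(DOME_DIAMETER_M, MAQUETTE_SCALE)

# Inference only: full-size diameter of the smaller dome that the prototype represents.
smaller_dome_diameter_m = full_size_length(PROTOTYPE_DIAMETER_M, PROTOTYPE_SCALE)

print(f"maquette extrados diameter: {maquette_diameter_m * 100:.1f} cm")
print(f"implied full-size diameter of the smaller dome: {smaller_dome_diameter_m:.2f} m")
```

Under these figures the maquette diameter agrees with the stated 70 cm, and the stone prototype would correspond to a dome of roughly 7.1 m, consistent with the text's description of it as a smaller dome than the 10.06 m design.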

Stereotomy with 3D Printing in the Age of Industry 4.0

Stereotomy is effectively the technique (or rather the art) of removing material in order to create geometrically refined stone blocks, which allow for the construction of elements and architectural systems of triple value: aesthetic, static, and functional. Modern stereolithography, on the other hand, is the technique that makes it possible to create objects, by means of appropriate machines, by adding material (originally liquid resin solidified by UV rays) in overlapping layers. The term “stereo,” common to the two techniques, indicates their shared purpose, the creation of solid, volumetric objects, which they achieve in opposite ways: by removal of matter (cut, tomia) or by addition of matter. Viewed in terms of how the raw material is exploited, the two techniques could theoretically be considered complementary and integrative. Specifically, if by stereolithography we mean, in a broader sense, additive manufacturing or 3D printing through the layering and solidification of a semifluid material composed of a specific mortar, it is useful to consider reusing stone processing waste for the aggregate part of the mortar. To this end, Giuseppe Fallacara conceived the AHS (Architectural Hypar System) (Fallacara 2016: 26–35) as one possible process of updating the ancient stereotomic discipline in its triple aesthetic, geometric, and constructive aspects. Fallacara’s AHS modular construction system was designed in 2016 and first presented to the public, with the prototype HyparWall, at the exhibition “New Marble Generation,” curated by Raffaello Galiotto and Vincenzo Pavan for the 51st edition of Marmomacc in Verona (Veronafiere, 28 September–1 October 2016) (Fig. 6).
In particular, the annual exhibition is part of the project The Italian Stone Theater, created by Marmomacc with the support of the Ministry for Economic Development (MISE), ICE-Italian Trade Agency, and Confindustria Marmomacchine, as part of the Extraordinary Promotion Plan for Made in Italy, which aims to enhance the excellence of the national stone and technology sector. This research on geometric-modular construction refers to “Modular Constructivism,” a sculptural trend developed between 1950 and 1960 whose founders are Erwin Hauer and Norman Carlberg. The AHS (Architectural Hypar System) is a modular construction system that allows for the construction of many

Fig. 6 Giuseppe Fallacara, the prototype HyparWall, for the 51st edition of Marmomacc in Verona (Veronafiere, 28 September – 1 October 2016)

types of wall, using “brick-blocks” of complex shape resulting from the geometry of the hyperbolic paraboloid, from which the term Hypar derives (hypar = hyperbolic paraboloid). From the geometric point of view, the “type brick” is a solid derived from the extrusion of a saddle surface (with rectilinear edges) inscribed in a parallelepiped with a square, rectangular, trapezoidal, or parallelogram base. The “type brick,” through its topological variation, can be used for the construction of multiple wall systems composed of discrete elements subjected to compression: walls, vaults, domes, etc. (Fig. 7). The system is part of the broader research on updating construction techniques of a stereotomic nature, in which statics, aesthetics, and geometry are part of a single, inseparable design and construction thought. From the material point of view, AHS has been conceived for the use of natural and/or recomposed stone, according to three constructive possibilities focused mainly on eliminating excess processing waste and recovering it completely, in a logic of eco-compatibility of the product and respect for the raw material:
1. Natural stone: CNC cutting with robotized diamond wire
2. Artificial stone: Realization through an appropriate mold
3. Artificial stone: Realization through large-sized 3D printing
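
The geometry of the saddle surface with rectilinear edges described above can be sketched numerically. A hyperbolic paraboloid bounded by four straight edges is the bilinear interpolation of its four corner points, so every line of constant parameter on it is straight, which is the ruled-surface property. The Python sketch below illustrates this under stated assumptions: the corner coordinates are hypothetical (a unit square base with two opposite corners raised), the function names are invented for illustration, and it is not the authors' construction of the AHS brick.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def lerp(p: Point, q: Point, t: float) -> Point:
    """Point at parameter t on the straight segment from p to q."""
    return tuple(a + (b - a) * t for a, b in zip(p, q))


def hypar_point(corners: Tuple[Point, Point, Point, Point], u: float, v: float) -> Point:
    """Point on the bilinear (hyperbolic paraboloid) patch spanned by four corners.

    corners are ordered (p00, p10, p01, p11); u and v run from 0 to 1.
    """
    p00, p10, p01, p11 = corners
    a = lerp(p00, p10, u)   # point on the straight edge v = 0
    b = lerp(p01, p11, u)   # point on the straight edge v = 1
    return lerp(a, b, v)    # point on the straight ruling joining a and b


def rulings(corners: Tuple[Point, Point, Point, Point], count: int) -> List[Tuple[Point, Point]]:
    """End points of count + 1 straight rulings that sweep out the patch."""
    p00, p10, p01, p11 = corners
    return [
        (lerp(p00, p10, i / count), lerp(p01, p11, i / count))
        for i in range(count + 1)
    ]


# Hypothetical corners: unit square base, two opposite corners raised by 1.
CORNERS = ((0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (1.0, 1.0, 0.0))

center = hypar_point(CORNERS, 0.5, 0.5)
lines = rulings(CORNERS, 4)

print("center of the saddle:", center)
for start, end in lines:
    print("ruling from", start, "to", end)
```

For these corners the height of the patch is z = u + v - 2uv, the standard saddle form, and each printed pair of end points defines one straight line lying entirely on the surface.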

Fig. 7 Giuseppe Fallacara, composition of multiple systems and cut blocks

In the first case, the use of natural stone required the study of a specific construction and cutting technique that reduced to a minimum the waste of material due to the particular shape of the “saddle brick.” The hyperbolic paraboloid, being a ruled surface (or a surface made by the union of different lines), can easily be made with a diamond wire cut and, in this case, with a computerized numerical control cut by an anthropomorphic robot with a diamond wire head. The anthropomorphic robot, being able to control more movements during the cut, compared to a static diamond wire cutting machine, is able to realize segments with a rectangular, trapezoidal, and parallelogram base, allowing ample freedom of action in the design of complex shapes. In this way, starting from a parallelepiped block, it is possible to easily obtain many identical segments contained in a series inside the parallelepiped. The loss of raw material, due to the excavation of the tool, is almost zero, except

49 Stereotomy: Architecture and Mathematics


for the material removed by the thickness of the diamond wire. The cut blocks are also perfectly stackable for transport and storage. During the assembly phase, the segments can be arranged in different configurations that allow both porous and completely closed walls.

In the second case, artificial stone, the waste powders from stone processing become the aggregates of a specific cement-based mortar used as raw material for filling the “hypar-brick” molds. The production process therefore requires producing the molds (traditionally made of cement or fiberglass) for the mass production of the brick. While the mold is being filled with mortar, it is possible to insert glass fibers and/or light metal reinforcements in order to make the brick very resistant. The molds can be produced from a master hypar-brick made in plaster with traditional manual techniques, or in wood or high-density polystyrene cut with CNC machine tools. The color and surface roughness of the brick can vary with both the specific chromatic qualities of the stone or marble powders used and the treatment of the internal surface of the mold.

In the third case, as in the previous one, the raw waste material of stone processing, i.e., residues and stone powders of different granulometry, can be usefully reused to form the aggregates of specific mortars for large 3D printers (Fig. 8). Thanks to this, the mortar, composed of binder and aggregate, would have a more natural color and the appearance of an artificial stone. In this regard, mortars and binders based on geopolymers would be optimal. The research, in this third case, focuses on two fundamental aspects: the mortar composition and the printing technique related to specific machines or anthropomorphic robots and extruders for large-scale 3D printing.
At present, the limits of these technologies concern three aspects: the aesthetic quality of the printed product, the difficulty

Fig. 8 Giuseppe Fallacara, 3D model of Hypar Dome (left) and production of ashlars with large 3D printer (right)


in moving the machinery for printing and for the continuous production of mortar on construction sites, and the risk of non-homogeneous mechanical properties in the printed product owing to different climatic exposure during construction in the open (parts more or less exposed to atmospheric agents could cure differently). Another very important aspect of this construction technique concerns the geometry of the element to be printed, which is produced by depositing successive horizontal layers of mortar, each of which must solidify in time to support the subsequent layer without global or local variation in the shape of the element itself. It is, therefore, easy to understand that all shapes implying a cantilevered overhang of the mortar during printing, beyond the allowed angles, are prohibited or strongly limited. The AHS construction system, thanks to its geometry, is compatible with the 3D printing process without the use of supporting elements, which makes it interesting for large printing systems. In addition, it allows the printing of both individual bricks and the continuous wall as a whole, formed by the aggregation of several bricks without interruption. The experimental phase, concerning the full-scale realization of the prototypes deriving from the three construction techniques previously described, was organized chronologically, starting from the second construction system, then moving on to the first and, finally, to the third, which is currently in progress. The first prototype of the AHS, created for the exhibition “New Marble Generation” with the aim of creating new high-quality lithic design products for mass production, is called HyparWall. This is a sinusoidal modular diaphragm wall, made up of segments of hyperbolic paraboloid.
The segments are made from the waste of Pietra Leccese stone working, using specific binders able to create a sort of reconstructed Pietra Leccese, very similar to the original. The HyparWall research involved the collaboration of a leading company in the sector, PiMar (Lecce), and the contribution of Tarricone Prefabbricati of Corato (Bari). The aim of the research was to extend the production cycle of natural stone beyond the cutting of the elements, using waste materials in a sustainable and innovative way. The overall geometry of the whole wall is constituted by the mutual aggregation of two mirrored “type bricks,” which, suitably arranged, can be used to create:

• Straight wall, full or pierced
• Curvilinear and cylindrical wall
• Sinusoidal wall
• Barrel vault

The HyparWall prototype, exhibited in 2016 in Hall 1 of the Marmomacc fair, was selected in 2017 for the exhibition outside the fair, in the heart of the city of Verona within the “Marmomac & the City” format. The second prototype phase, currently underway and developed since 2017 in collaboration with the French company SNBR (Sainte-Savine, Troyes), has


produced two prototypes in natural stone, referring to two different construction systems: the sinusoidal wall and the reinforced barrel vault. The first prototype, very similar to the sinusoidal wall made of artificial stone with the molding technique described previously, was realized by diamond wire cutting driven by an anthropomorphic robot. The starting parallelepiped-shaped lithic block, with an isosceles trapezoid base, was “sliced” in serial succession by the diamond wire along a specific spatial direction. The production speed of the blocks and the almost total lack of processing waste made the prototype very interesting from an economic and constructive point of view. The second prototype, named Hypar Vault, is a perforated barrel vault made of the same two type blocks used for the sinusoidal wall geometry. The structure, although able to stand through the natural compression of the stone elements alone, has been reinforced and prestressed by post-tensioning harmonic steel wires that pass through the linear axis of the individual segments, along the median axial arc of each row of the barrel vault. The construction of the vault requires a wooden centering, which is removed once all the segments constituting the vaulted structure have been assembled and prestressed (Fig. 9). The construction system can also be used for the construction

Fig. 9 Giuseppe Fallacara, Hypar Vault


of other geometric types of vaulted and domed structures. It is possible to equip the extrados of the vaulted structures with a glass or plexiglass covering system, completely integrated into the overall geometry. The third prototype phase of AHS refers to research, still in progress, deriving from a recent international architecture award (www.printarch.net – Architectural Hypar System) based on the development of technologies and building components created with the additive technique of robotic 3D printing, which is currently being tested. In general, the priority research themes in this area concern the application of additive manufacturing to the building elements and systems of architecture. The development trajectories include research on innovative and environmentally friendly materials and on waste treatment. The aim, as already underlined, is to manufacture and build on a large scale using additive manufacturing methods, with powders and waste from stone working as the printing material. This application would, therefore, transform waste from a cost element into an economic resource, from unusable processing by-products into raw materials for environmentally friendly services and products, in a cyclic process of environmental and economic regeneration. From this basic concept, the large-scale project Anthill Tower was born, an “anthill” tower inspired by nature and the building behavior of animals, in which large vertical structures are built through the sedimentation and stratification of small grains of soil (Fig. 10). In this specific case, the incessant and methodical work of the ants is carried out by anthropomorphic robots, coordinated in series, which extrude mortar on parallel horizontal levels developed vertically according to the logic of 3D printing. The specific geometry of the tower, created as a seamless aggregation of Hypar maxi-blocks, allows the mortar to settle according to the optimal angles of vertical growth of the tower. The interior spaces of the concave-convex tower give rise to a new housing dynamic wherein the sequence of shared spaces and labyrinthine

Fig. 10 Giuseppe Fallacara, Anthill Tower project


connections is greater than the aggregation of individual private housing cells. The anthropomorphic landscape of the city, thus, returns to reflect on the nature and the biological life that surrounds us. The practical results of this research were made with a large 3D printer (WASP delta 3mt), financed by the Fondazione Puglia within the legislative measure: Resources in the “scientific and technological research” sector 2018.

Conclusion

The present chapter outlines the fundamental principles of stereotomy and its use in architecture. It explains the constructive rationalization achieved through a specialized knowledge of geometry, which permits the architect to optimize the construction of architectural systems that possess a triple value: aesthetic, static, and functional. Thanks to advanced techniques and technologies for the design and production of structural elements, stereotomic architecture remains competitive today and can be continuously updated in the age of Industry 4.0.

Cross-References

Geometric and Aesthetic Concepts Based on Pentagonal Structures

References

D’Amato C (2014) Studiare l’architettura. Un vademecum e un dialogo. Gangemi Editore, Roma
D’Amato Guerrieri C, Fallacara G (2006a) Costruire con la pietra oggi. In: D’Amato Guerrieri C (ed) Città di Pietra/Cities of stone. Costruire in pietra portante/Stones of Apulia. To build load-bearing stone. Venezia, Marsilio, Pietre di Puglia, pp 34–35
D’Amato Guerrieri C, Fallacara G (2006b) Archetipi costruttivi dell’architettura in pietra. In: D’Amato Guerrieri C (ed) Città di Pietra/Cities of stone. Costruire in pietra portante/Stones of Apulia. To build load-bearing stone. Venezia, Marsilio, Pietre di Puglia, p 39
Fallacara G (2007) Verso una progettazione stereotomica. Nozioni di stereotomia, stereotomia digitale e trasformazioni topologiche: ragionamenti intorno alla costruzione della forma/towards a stereotomic design. Notions of stereotomy, digital stereotomy and topological transformations: reasoning about the construction of the form. ARACNE editrice, Roma
Fallacara G (2012) Stereotomy. Stone architecture and new research. Presses des Ponts et Chaussées, Paris
Fallacara G (2014) Stereotomia e rappresentazione del mondo. In: Fallacara G, Minenna V (eds) Stereotomic design. Edizioni Gioffreda, Maglie (Lecce), pp 17–21
Fallacara G (2016) Architectural stone elements. Research, design and fabrication. Presses des Ponts, Paris
Fallacara G, Barberio M (2018) Stereotomy 2.0: the rebirth of a discipline that never died. Nexus Netw J Archit Math 20:509–514. https://doi.org/10.1007/s00004-018-0408-6
Frézier AF (1980) La théorie et la pratique de la coupe des pierres et des bois, pour la construction des voutes et autres parties des bâtimens civils & militaires, ou Traité de stereotomie à l’usage de l’architecture (1737–1739). Tome second, Planche 53. Jacques LAGET-L.A.M.E, Nogent-le-Roi
Gadaleta R (2018) New stereotomic bond for the dome in stone architecture. Nexus Netw J Archit Math 20:707–722. https://doi.org/10.1007/s00004-018-0379-7
Gallon J-G (1735) Machines et inventions approuvées par l’Académie Royale des Sciences depuis son établissement jusqu’à present; avec leur Description. Dessinées & publiées du consentement de l’Académie, par M. Gallon. Tome Premier. Voute Plate inventée par M. Abeille, 1699, N° 50, pp 159–161, Planche N° 50. Voute Plate inventée par Le Pere Sebastien, de l’Academie Royale des Sciences, 1699, N° 51, pp 163–164, Planche N° 51. Chez Gabriel Martin, Jean-Baptiste Coignard, Fils, Hippolyte-Louis Guerin, Ruë S. Jacques. Avec Privilege du Roy, Paris
Kepler J (1619) Harmonices Mundi. Liber II, Fig. Aa, between pp 58–59. Lincii Austriae, Sumptibus Godofredi Tampachii Bibl. Francof. Excudebat Ioannes Plancus, Frankfurt
Mills D, Warminski M, Greenhills Historical Society (2013) Greenhills. Arcadia Publishing, Charleston
Potié P (1996) Philibert de l’Orme. Figures de la pensée constructive. Editions Parenthèses, Marseille

Fractal Geometry in Architecture

50

Josephine Vaughan and Michael J. Ostwald

Contents

Introduction
Background
Fractal Geometry
Fractal Geometry in Architecture
Examples of Fractal Geometry in Architecture and Design
Conclusion
Cross-References
References

Abstract

Fractal geometry is a product of fractal theory, a mathematical approach that describes the way space is filled by figures or objects. A fractal geometric figure is one that can be iteratively subdivided or grown in accordance with a series of rules. The overall fractal figure then has parts, which under varying levels of magnification tend to look similar – if not identical – to each other, and the figure fills more space than its topological boundaries. While pure mathematical fractal figures can be infinite in their iterations, there are examples of fractal shapes with limited scales that can be found in architecture. This chapter briefly outlines the background of fractal theory and defines fractal geometry. It then looks at the confusion surrounding the claims about fractal geometry in architecture before reviewing the way architecture and fractal geometry can be combined through inspiration, application, or algorithmic generation.

J. Vaughan
The University of Newcastle, Newcastle, NSW, Australia

M. J. Ostwald
UNSW Built Environment, University of New South Wales, Sydney, NSW, Australia

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_11


J. Vaughan and M. J. Ostwald

Keywords Fractal geometry · Architecture · Design · Interpretation · Critique

Introduction

Benoît Mandelbrot (1982) originally defined fractal geometry as a type of deep geometric phenomenon that arises from the application of a system of repetitively applied feedback rules. Today fractal geometry has become a valuable branch of mathematics, with applications in many fields. This chapter examines the way architects and scholars have incorporated fractal geometry into the design and interpretation of the built environment. By the start of the twenty-first century, more than 200 examples of building designs had been identified that have been shaped by or allegedly designed in accordance with fractal geometry (Ostwald 2001). Since that time this number has continued to grow, and today there are multiple examples of major buildings and spaces that have been designed in response to various properties, qualities, or forms associated with fractals. These buildings display many possible connections between fractal geometry and design, ranging from inspiration to structural stability, from construction systems to surface treatments, and from applied ornament to algorithmic generators. From this initial list of connections, it should begin to be apparent that there is no single or fixed relationship between architecture and fractal geometry and that many possible connections have been proposed. However, it must also be acknowledged that the relationship between fractals and architecture is not a straightforward one, and there are several misunderstandings associated with this particular connection between architecture and mathematics. Since the late 1970s, architectural scholars and designers have opportunistically appropriated images and ideas from fractal geometry along with concepts broadly related to fractal dimensions and nonlinear dynamics and used them for a wide variety of purposes. Some of these appropriations have been motivated by the desire to advance architecture or to offer new ways of understanding design, but many others have a seemingly more superficial or expeditious agenda (Ostwald 1998). These derivations of fractal theory in architecture have adopted many labels including “Fractalism,” “Complexitism,” “Complexity Architecture,” and “Nonlinear Architecture,” leading Yannick Joye to argue that “a systematic, encompassing, scholarly treatment of the use and presence of this geometrical language in architecture is missing” (2011: 814). Joye is not alone in this observation, and other scholars have sought to define the different possible connections between architecture and fractal geometry (Ostwald and Vaughan 2016). However, for the purposes of the present chapter, a much simpler classification is adopted. This chapter divides applications of fractal geometry in architectural design into three types: those that are inspired by fractal theory, applications of fractal geometry in built form, and fractal geometry as an algorithmic generator for design. Before describing each of these categories, it is first necessary to understand fractal theory and fractal geometry. The following section provides an overview of these topics.

50 Fractal Geometry in Architecture


Background

Fractal theory, which encompasses fractal geometries and fractal dimensions, is derived from a mathematical concept formalized by Benoît Mandelbrot in 1975. His idea was that an object or image has an amount of detail in it that may not be initially obvious to the viewer. However, upon further investigation of this object or image, a greater volume of detail or roughness may be present in the way this object or image occupies space. The traditional Euclidean approach to space-filling geometry – as taught to most school students – considers geometry to be finite. Thus, a square remains a square, and under closer scrutiny, no more detail can be found in the figure. We can measure a square; we can increase or decrease its size, but it remains unquestionably a two-dimensional figure with four sides. The apparent measurability and predictability of Euclidean geometry, in which each figure has a single integer dimension, make it easy to handle and account for its popularity and common use. However, in the last few decades, the idea that multiple dimensions may exist simultaneously in Euclidean space has become known as the “theory of general dimensions” (Edgar 2008). One of the catalysts for this development was the growing realization that whole number or integer dimensions are incapable of describing the full complexity of the material world. Probably the most famous of the general dimensions, and the first to methodically develop non-integer values, is the fractal dimension. Fractal theory, while accepting of finite geometry, whole numbers, and integer dimensions, also allows for the inclusion of infinite geometry and non-integer dimensions. Fractal geometry can be used to examine those more unusual and disturbing non-Euclidean shapes and uncertain geometries that were once described by mathematicians as a “gallery of monsters” (Mandelbrot 1982: 3). In the late 1970s, Mandelbrot developed fractal theory to embrace these geometries which upon closer inspection have more complexity (or a higher Hausdorff–Besicovitch dimension) than their apparent Euclidean integer dimension (their topological value) (Mandelbrot 1982: 15). From this overarching fractal theory, Mandelbrot developed two applications – both based on the same mathematical theory – to produce different outcomes: fractal geometry and fractal dimensions. Mandelbrot built on the mathematician Gaston Julia’s early twentieth-century work on iterative geometries to define fractal geometry as the physical or theoretical embodiment of fractal theory. A fractal geometric figure is generated by successively subdividing or growing a geometric set using a series of iterative rules. The resulting figure has parts which, under varying levels of magnification, tend to look similar, if not identical, to each other, and its shape is too complex to fit into its original topological category. A fractal dimension, on the other hand, is not a geometric figure but a mathematical application of the theory of fractals to measure the characteristic complexity or roughness of any object or image (which may be fractal or not), by counting the detail present in it over many scales. Because fractal geometry and fractal dimensions are related and part of the same general theory, Mandelbrot used fractal geometric sets to explain fractal dimensions and vice versa. However, while all images or objects have fractal dimensions, not all are examples of fractal


geometry. Yet this distinction is often lost when non-mathematicians work with fractal theory, and the term “fractal” has tended to become a somewhat generic concept in many fields, which has led to often inaccurate or meaningless theoretical meldings of fractal geometry with fractal dimension. As such, in architecture and the arts, there is often confusion about what might be described as an application of fractal geometry and what significance a fractal dimension might hold. To clarify the differences between Euclidean geometry, fractal dimensions, and fractal geometry, let us look at three different images over three scales of observation (Fig. 1). The first is a typical Euclidean shape, a rectangle (a); the second is an image of a cement-rendered wall of a building (b); and the third is a branching, tree-like figure (c). Each image contains a small framed section, a close-up of which is depicted in the image below it. When we zoom in for a closer look, each image responds differently depending on its qualities. For example, when zooming in for a close look at the first iteration of the perfect Euclidean rectangle, the section that is framed remains a simple straight line (d). A closer look at the framed section of the wall shows us some more detail of its material properties, with a few irregular bumps in a rough surface (e). In the same way that the wall displays greater detail as the viewer approaches it, the tree-like figure shows us some more detail of a twig, which at this closer scale reveals further branching that occurs in the same way as in the entire tree. That is, halfway up each length the twig splits into two twiglets, creating a Y shape (f). Now each of these new views (d–f) also contains a small framed section, which is investigated at a closer scale in the image below it. The straight line of the rectangle, when observed at a closer scale, remains a straight line (g) and is therefore an example of a Euclidean shape. The wall, when viewed at a closer scale (h), reveals further detail of its surface, with even more roughness due to the aggregate in the concrete. As such, this wall does not conform to the typical definition of a Euclidean geometric form, nor, as we will see, is it an example of fractal geometry. A closer look at the twig reveals additional branching (i), following the previous splitting rule, and furthermore the overall shape of the new twiglet is remarkably similar to the image of the tree initially observed in image c. This last example, which reveals the recurring geometric structure of an irregular form, is an example of fractal geometry.
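The idea of measuring roughness "by counting the detail present in it over many scales" can be illustrated with box counting, one widely used estimator of fractal dimension. The chapter does not name a specific method at this point, so the choice of box counting, the function names, and the box sizes below are assumptions for illustration only.

```python
import math


def box_count(points, box_size):
    """Count the grid boxes of a given size that contain at least one point."""
    return len({(math.floor(x / box_size), math.floor(y / box_size))
                for x, y in points})


def box_counting_dimension(points, exponents=range(1, 6)):
    """Estimate dimension as the slope of log(count) against log(1 / box size).

    Box sizes are 1/2, 1/4, ... (an assumed range); the slope is fitted
    by ordinary least squares.
    """
    xs, ys = [], []
    for k in exponents:
        size = 2.0 ** -k
        xs.append(math.log(1.0 / size))
        ys.append(math.log(box_count(points, size)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Densely sampled test shapes inside the unit square.
line = [(i / 1000, 0.5) for i in range(1000)]
filled_square = [(i / 100, j / 100) for i in range(100) for j in range(100)]
```

Under these assumptions the sampled line gives an estimate of 1 and the filled square an estimate of 2. A rough outline would be expected to fall between the two, although real images need care over resolution and the range of box sizes.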

Fractal Geometry

While the tree-like shape (c, f, and i) is the only example of fractal geometry in Fig. 1, all three images can be considered through the lens of fractal theory. For example, all three can be measured to determine their fractal dimension. The hollow rectangle, with its four perfect lines, would have a fractal dimension close to one, as it is a Euclidean figure with planar geometry. The wall and the tree would have fractal dimensions somewhere between 1 and 2, as they both fill more space than their initial one-dimensional view suggests. But only the tree is an example of what Mandelbrot calls fractal geometry, a special type of irregularity formed by using a


Fig. 1 Three images (a–c), which are then viewed at a closer scale (d–f), and then this closer scale is itself the subject of a closer investigation (g–i)

series of geometric constructions which parametrically repeat themselves to produce evocative and potentially infinitely complex images. In such cases, the resultant geometric figure, when examined at increasingly fine scales, is seen to be “self-similar.” That is, at a variety of ranges, the object in question tends to resemble itself. Fractal geometry is generated using a method known as an iterated function system, or IFS (Peitgen and Richter 1986). In Fig. 1c, f, i, the examples are increasingly closer views of a portion of a completed fractal shape. The shape (Fig. 1c) was initially generated using IFS, with the branching rule as described. Figure 2 shows this process from the initial line to the shape found in Fig. 1c. If the iterations continue, the shape progressively increases in detail, and yet it does not exceed the boundaries of the initial figure, which is a typical property of many fractal sets. The rule for this fractal geometric figure could be described as follows: halfway along every new twig, a division occurs, splitting it into a Y shape. For the


Fig. 2 Starting geometry and first two iterations of a fractal geometric image

Fig. 3 Iterations three to five of a branching tree

next iteration, each of the arms of the Y must follow the same rule, and therefore each arm splits halfway along its length into another Y and so on (Fig. 3). So from one stick, we have two twigs, from those we have four twiglets, and in a theoretical model, this rule-based iteration could continue infinitely, and this would be an example of a true fractal geometric form. There are many well-known fractal sets – including the Koch snowflake and the Sierpinski triangle (Fig. 4) – which feature infinitely deep and repetitious shapes. These are often called “ideal fractals” because they can only exist in the mind, in computer simulations, or as algorithmic processes. In the material world however, if an actual tree did happen to grow following this algorithmic rule, these repeated steps would only occur a few times before the sticks became so slender that practical, material, and scale limits would begin to apply, meaning the form could not be infinitely divided or enlarged.
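The branching rule described above can be sketched as a short iterative procedure. This is only an illustration: the starting stick, the 30-degree spread of the arms, and all function names are assumptions, since the text specifies only that each twig divides into a Y halfway along its length.

```python
import math


def split_twig(start, angle, length, spread):
    """Apply the Y rule once to a twig.

    The lower half of the twig becomes a fixed stem; two arms of half the
    twig's length leave its midpoint, rotated by +spread and -spread.
    """
    x, y = start
    mid = (x + 0.5 * length * math.cos(angle),
           y + 0.5 * length * math.sin(angle))
    stem = (start, mid)
    arms = [(mid, angle + spread, 0.5 * length),
            (mid, angle - spread, 0.5 * length)]
    return stem, arms


def branching_tree(iterations, spread=math.radians(30)):
    """Return (stems, twigs) after applying the rule a number of times.

    stems are finished segments; twigs are the open (start, angle, length)
    branches that the next iteration would split.
    """
    stems = []
    twigs = [((0.0, 0.0), math.pi / 2, 1.0)]  # one vertical stick of unit length
    for _ in range(iterations):
        next_twigs = []
        for start, angle, length in twigs:
            stem, arms = split_twig(start, angle, length, spread)
            stems.append(stem)
            next_twigs.extend(arms)
        twigs = next_twigs
    return stems, twigs
```

Calling `branching_tree(3)` yields eight open twigs, each one eighth of the original length, which matches the doubling from stick to twigs to twiglets. Because each generation is half as long as the last, the figure stays within a bounded region however many iterations are run.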


Fig. 4 First four iterations of the Sierpinski triangle (top) and Koch snowflake (bottom) fractals

Mandelbrot originally identified two types of fractal geometric sets, “finite” and “infinite,” which he later described as “multi-fractal” and “uni-fractal” sets, respectively. Ideal mathematical fractals – such as the Koch snowflake or Sierpinski triangle – possess infinite scalability and singular stable dimensions and as such they are sometimes known as “uni-fractals.” In contrast, if the branching tree in Fig. 1 was a real-life plant, it would be an example of a multi-fractal, an object that simultaneously possesses a range of dimensions, each of which is relatively consistent over several scales, but is not continuous. Like the tree, many examples exist in nature that exhibit levels of relatively consistent complexity over a few distinct scales. This is also the case in architecture where, for example, an entire building or parts of a building might use a rule-based, scaled motif, to create a repetitively applied feedback pattern which is visible to the naked eye. However, due to physical limitations of construction and tangible space, examples of fractal geometry in architecture are never infinitely present as “buildings are not fractals in the same way that mathematical constructs” are (Bovill 1996: 117), and – like natural forms – buildings can only ever be classed as multi-fractals (Ostwald and Vaughan 2016). The following section looks more closely at this topic.
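For the ideal fractals named above, the single stable dimension can be computed with the standard similarity-dimension formula, D = log N / log(1/r), where the figure is made of N copies of itself, each scaled by a factor r. The formula is not stated in the chapter and is introduced here as a hedged illustration; the function name is hypothetical.

```python
import math


def similarity_dimension(copies, scale_factor):
    """Similarity dimension of an exactly self-similar figure.

    copies: number of scaled copies that make up the whole figure.
    scale_factor: linear scale of each copy relative to the whole (0 < r < 1).
    """
    return math.log(copies) / math.log(1.0 / scale_factor)


# Koch curve (one side of the snowflake): 4 copies at one third scale, about 1.26.
koch = similarity_dimension(4, 1 / 3)

# Sierpinski triangle: 3 copies at one half scale, about 1.58.
sierpinski = similarity_dimension(3, 1 / 2)
```

Both values fall strictly between 1 and 2, consistent with figures that fill more space than a line but less than a plane. A building with only a few scaled repetitions has no such single value, which is one way to read the uni-fractal and multi-fractal distinction drawn above.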

Fractal Geometry in Architecture

In his early publications on fractal theory, Mandelbrot suggested that architecture could provide an example of fractal geometry when he proposed that certain architectural styles possess similar formal properties to those of various natural fractals. This argument is encapsulated in his statement that “a high period Beaux


Arts building is rich in fractal aspects” (1982: 24), because it possesses “very many scales of length and favour[s] self-similarity” (1982: 23). Mandelbrot’s identification of the Beaux Arts (French, neoclassical architecture of the eighteenth century) as being especially fractal could be regarded as being reasonable, given the way a Beaux Arts building has a complex form, often featuring elaborate ornamentation, and some of its elements, including columns, archways, tiling, or paving, are even repeated at different scales. But Mandelbrot’s description of Beaux Arts as capturing the essence of fractal geometry is at best a visual simile or a generalization. Architects have offered equally evocative but often simplistic arguments about the essence of fractal geometry and its application in design. For example, misapplications of a romantic notion of fractal theory have led to a proliferation of quasi-scientific applications of fractals and architecture. This situation has developed from design proposals or publications which merge multiple, often dissimilar properties which may only be tentatively related to fractals. Probably the best known of these is Charles Jencks’s (1995) “Architecture of the Jumping Universe,” an evocative title for an eclectic set of ideas cherry-picked from science, philosophy, and art. Also emerging in a similar vein in the 1990s were Jeffrey Kipnis’ proposals for the “New Baroque” and Peter Eisenman’s “Architecture of the Fold” which freely merge concepts from fractal theory with themes from the writings of Deleuze and Guattari, philosophers who once used fractal geometry as a metaphor for political systems (Ostwald 2006). Other examples of confusion include designs that critics have interpreted as having fractal geometric properties but without any apparent agreement from the designers involved.
For example, Bovill (1996) observes that Lucien Kroll’s 1986 book, The Architecture of Complexity, contains images and ideas which are suggestive of fractal geometry. This may be true, but Kroll does not mention fractals, and most of his ideas about complexity relate to the use of modular elements in design and construction. Likewise, the patterns on the facades of Storey Hall by architects Ashton Raggatt McDougall and LAB Architecture Studio’s Federation Square have been described as containing fractals; however, both are actually examples of tessellations, a category of plane-filling topographic structures which are only superficially reminiscent of fractals. Inaccurate claims about the presence of fractal geometry in architecture are typically derived from the basic misunderstanding that a repeated, scaled-down shape found in a building constitutes a fractal. While fractal geometry may contain such examples, the key difference is that in a true case of fractal geometry, these scaled, self-similar shapes must be located according to an algorithmic rule, not randomly placed in a building. For example, Fig. 5 shows the hypothetical facade of a modern building with repeated rectangles of different scales found across the surface as doors and windows. These are likely placed according to the requirements of the inhabitants; their position cannot be modeled by any algorithmic rule, and thus the facade is not an example of fractal geometry. Figure 6, however, while also a modern building facade, does provide a partial example of fractal geometry in architecture, as the rectangles on the surface are scaled and placed according to an iterative rule, and the smallest shape looks similar to the generating shape

50 Fractal Geometry in Architecture


Fig. 5 Despite some self-similarity and scaling, this is not an example of fractal geometry

Fig. 6 This is a partial example of fractal geometry as the self-similarity and scaling conforms to an iterative rule

(a recumbent cross in a rectangular frame). If the shapes were repeated infinitely, it would be regarded as fractal geometry, but as the figure depicts the elevation of a building, which has practical scale and material limits, it is not a true fractal in a mathematical sense, but it begins to capture the spirit of fractal geometry. In this context, Frank Lloyd Wright’s Palmer House provides an interesting example of the confusion over what might constitute fractal geometry in architecture. Leonard Eaton (1998) proposed that Wright’s Usonian work of the 1950s and 1960s features a “striking anticipation of fractal geometry” (1998: 31). Eaton’s rationale for this argument is derived from the recurring presence of equilateral triangles, at different scales, in the plan of Wright’s Palmer House (Fig. 7). Eaton counts the triangular forms found in this house, ranging from the large triangular slabs of the cast concrete floors, down to the triangular shape of the fire-iron rest adjacent to the hearth, leading him to conclude that the Palmer House has “a


Fig. 7 Palmer House Plan, redrawn from Storrer (1993: 352)

three-dimensional geometry of bewildering complexity” (1998: 35). Since Eaton’s publication, many other scholars have incorrectly cited the Palmer House as an example of fractal geometry in architecture (Capo 2004; Sedrez and Pereira 2012). Unfortunately, Eaton’s argument is largely incorrect as his definition of fractal geometry – comprising “a geometrical figure in which an identical motif repeats itself on an ever diminishing scale” (Eaton 1998: 33) – lacks a crucial property. As explained previously, fractal geometry requires a rule base or recurring structure. Repetition of geometry alone does not make an object fractal. Andrew Crompton (2001) is highly critical of proposals like Eaton’s, noting that “[f]rom this point of view almost any building can show fractal qualities, one simply has to count the elements of a façade which occur within different ranges of size and see how they increase in number as they get smaller” (245–246). James Harris also soundly rejects Eaton’s definition, stating that it “points out the misconception that a repetition of a form, the triangle in this case, constitutes a fractal quality. It is not the repetition of the form or motif but the manner in which it is repeated or its structure and nesting characteristics which are important” (Harris 2007: 98). In a similar way, Salingaros is critical of Jencks’s claim that Frank Gehry’s Bilbao Guggenheim Museum is self-similar and thereby fractal. Salingaros (2004) argues that Jencks “is misusing the word ‘fractal’ to mean ‘broken, or jagged’ [and ...] he has apparently missed the central idea of fractals, which is their recursiveness generating a nested hierarchy of internal connections” (47).
Ostwald and Moore (1996) observed this same problem soon after the publication of Eaton’s argument about the Palmer House, when they demonstrated that even the most Euclidean of designs, Mies van der Rohe’s Seagram Building, has more than 12 scales of conscious self-similarity; however, this does not make it, or the Palmer House, fractal.
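The distinction drawn in this section, between shapes that merely recur at several sizes and shapes that are scaled and placed by an iterative rule, can be made concrete with a short sketch. The Python fragment below is illustrative only: the Rect type, the apply_rule function, the particular rule (halve a rectangle and recurse into its lower-left and upper-right quadrants), and the 16 x 8 facade with a minimum size of 1.0 are all assumptions introduced here, not the rule behind Fig. 6. The min_size argument stands in for the practical limit that stops the iteration in a real building.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float  # lower-left corner, horizontal position
    y: float  # lower-left corner, vertical position
    w: float  # width
    h: float  # height


def apply_rule(rect, min_size):
    """Return rect plus every rectangle the (hypothetical) rule derives from it."""
    shapes = [rect]
    half_w, half_h = rect.w / 2, rect.h / 2
    # Practical limit: stop once the next generation would be too small to build.
    if min(half_w, half_h) < min_size:
        return shapes
    # The rule itself: two half-scale copies, placed in opposite corners.
    lower_left = Rect(rect.x, rect.y, half_w, half_h)
    upper_right = Rect(rect.x + half_w, rect.y + half_h, half_w, half_h)
    for child in (lower_left, upper_right):
        shapes.extend(apply_rule(child, min_size))
    return shapes


facade = Rect(0.0, 0.0, 16.0, 8.0)  # assumed overall facade dimensions
shapes = apply_rule(facade, min_size=1.0)

# Summarize how many rectangles exist at each scale.
for height in sorted({r.h for r in shapes}, reverse=True):
    count = sum(1 for r in shapes if r.h == height)
    print(f"{count} rectangle(s) of size {2 * height} x {height}")
```

Every rectangle in the result is obtained from its predecessor by the same halving-and-placing step, which is what separates this arrangement from a facade on which similar shapes are positioned case by case.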


Examples of Fractal Geometry in Architecture and Design

The examples in the previous section show that attempts to merge fractal geometry with architecture often fail because there is no iterative feedback structure to the geometry. However, when fractal geometry is better understood, there is no reason why it cannot be integrated into architectural design. For example, in his 1996 book Fractal Geometry in Architecture and Design, Carl Bovill suggests that for fractal geometry to be present in architecture “there should always be another smaller-scale [ . . . ] detail that expresses the overall intent of the composition” (5). Maycon Sedrez and Alice Pereira (2012) also note that fractal geometry can be found in architecture “through [ . . . ] recursive patterns, as generative patterns [or] as tools of scale perception” (99). These recursive architectural features are those that are characterized by both formal repetition and routine geometric construction. Kirti Trivedi (1989) provides an example of a building type, the Indian temple, which features both recursive and rule-based geometries that conform relatively closely to the expectations of fractal geometry. Trivedi (1989) observes that in certain ancient Indian temples, visually complex shapes are generated through the use of successive “production rules that are similar to the rules for generating fractals.” Moreover, there appear to be multiple different rule variables which are pertinent to different parts of the temple. In combination these rules, scales, and variables operate through “self-similar iteration in a decreasing scale: repetition, superimposition and juxtaposition” (249). Indian temples provide a strong case for an intuitive connection between fractal geometry and architecture, in part because they actually possess, to a limited extent, scaled, self-similar geometric forms that follow a seemingly clear generative process (Trivedi 1989; Lorenz 2011; Sedrez and Pereira 2012).
Like ancient Indian temple architecture, many recursive, scaled geometries can be found in built environments that long predate Mandelbrot’s formulation of fractal theory. For example, Ron Eglash (1999) notes the similarities between the geometric patterns found in indigenous African design and the self-similar shapes of fractal geometry. Several architects and mathematicians have also observed that the thirteenth-century plan of Frederick II’s Castel del Monte possesses self-similarity at two scales, thereby suggesting the start of a sequence of fractal iterations (Schroeder 1991). Each of these examples is an instance of scaled, geometric repetition, which is superficially similar to the geometric scaling found in mathematical fractals. Furthermore, researchers have identified fractal properties in the way the classical Greek and Roman orders have been iteratively (Capo 2004). Along with other scholars, Manfred Schroeder (1991), Andrew Crompton (2001), Wolfgang Lorenz (2011), and Albert Samper and Blas Herrera (2014) all suggest that Gothic architecture has fractal properties or can be interpreted in terms of fractal geometry. Joye (2011) even proposes that the Gothic cathedral offers one of “the most compelling instances of building styles with fractal characteristics” (2011: 820). Turning to the nineteenth century, Mandelbrot, in addition to making his case for the fractal features of the Beaux Arts, is one of several authors to suggest that the Eiffel Tower could be


considered structurally fractal, at least for up to four iterations (Mandelbrot 1982; Schroeder 1991; Crompton 2001). Since Mandelbrot’s development of fractal theory, many building designs have been published or constructed that explicitly acknowledge a debt to fractal geometry, even though the resultant architecture may not have such a clear relationship to it. Some of the architects and firms that have either made explicit reference to complexity science or have been linked to fractals include architectural firms from the USA (Asymptote, Peter Eisenman, Steven Holl, Morphosis, Eric Owen Moss and Kenneth Haggard and Polly Cooper), Germany (Kulka and Königs), Austria (Coop Himmelblau), France (Jean Nouvel), the Netherlands (Aldo and Hannie van Eyck, Van Berkel and Bos), Belgium (Philippe Samyn), the UK (Bolles Wilson, Zaha Hadid, Ushida Findlay), Japan (Yoshinobu Ashihara, Arata Isozaki, Kisho Kurokawa, Fumihiko Maki, Kazuo Shinohara), India (Charles Correa), Spain (Carlos Ferrater), and Colombia (Plan B). In some cases the influence of fractal geometry in a particular project may be obvious, whereas in others it is less clear what the connection is. For example, one of Charles Correa’s designs for a research facility in India features a landscaped courtyard that is tiled in a representation of the Sierpinski triangle. This is an obvious and literal connection that might be appropriate, given the function of the building, but it is potentially little more than an ornamental application. In contrast, Ushida Findlay produced a three-dimensional map of the design themes they had been investigating at different stages during their joint career. This map, a nested, recursive structure which traces a spiraling path toward a series of design solutions, is visually and structurally similar to a strange attractor, an iconic form in complexity science (Ostwald 1998).
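For readers unfamiliar with the Sierpinski triangle mentioned above in connection with Correa’s courtyard, the following sketch shows how such a figure is usually constructed: a triangle is divided at its edge midpoints into four, the central one is left out, and the step is repeated on the remaining three. The function names, the unit right triangle, and the depth of four are assumptions chosen for illustration; nothing here is taken from the actual paving layout.

```python
def midpoint(p, q):
    """Midpoint of two 2D points given as (x, y) tuples."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def sierpinski(a, b, c, depth):
    """Return the triangles that remain after `depth` subdivision steps."""
    if depth == 0:
        return [(a, b, c)]
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    # Keep the three corner triangles; the central one (ab, bc, ca) is omitted.
    return (sierpinski(a, ab, ca, depth - 1)
            + sierpinski(ab, b, bc, depth - 1)
            + sierpinski(ca, bc, c, depth - 1))


def area(tri):
    """Area of a triangle from its three vertices."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2


outline = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))  # assumed starting triangle
tiles = sierpinski(*outline, depth=4)
covered = sum(area(t) for t in tiles) / area(outline)
print(len(tiles), covered)
```

Each step triples the number of tiles while the covered area falls to three quarters of its previous value.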
More commonly, architecture that explicitly acknowledges a connection to fractal geometry is inspired by some part of the theory or its imagery, even though it does not employ a scientific or mathematical understanding of the concept. Thus, in architecture the fractal tends to serve as a sign, symbol, or metaphor representing a connection to something else. For instance, a large number of architectural appropriations of fractal forms are inspired by the desire to suggest a connection to science, nature, or ecology. An example of these motivations is found in the Botanical Gardens of Medellin, jointly designed by Plan B Architects and JPRCR Architects, and in the work of architects Haggard and Cooper, who also use fractal geometry for its “holistic characteristics and endless scales aiming at the creation of sustainable architecture” (Sedrez and Pereira 2012: 99). In both of these cases, geometric scaling is deployed to evoke a connection to nature, a link which might be reasonable in symbolic or phenomenal terms but does not support any genuine ecological agenda. Further examples of mathematical inspiration in architecture include the work of Philippe Samyn who researched the geometry of fractals as inspiration for his design of “‘harmonic’ double curved structures [...] which are low-cost, lightweight, and easy to erect” (Pearson 2001: 62). Japanese architect Kisho Kurokawa used fractal geometry as an inspiration in his Kuala Lumpur International Airport Terminal Complex and also used it in a more practical way to solve computational modeling and construction challenges in some of his designs (Ostwald and Vaughan 2016).


The influence of fractal theory on architecture also encompasses designs that have been generated using the mathematics, rules, or processes of fractal geometry. These examples range from the straightforward proposition to construct a classical fractal set and inhabit it, to more elaborate, computational, algorithmic, or scripted approaches to evolving a formal solution to a design problem. Among the more literal examples is Bolles Wilson’s proposal for the Forum of Water in the 1993 Das Schloss Exhibition, a design in the shape of a modified Menger cube – a classic or ideal fractal object. Menger cubes have also been proposed as architectural designs in the works of the Russian architects Turin and Bush, Podyapolsky, and Khomyakov (Ostwald 2010) and as a facade treatment in the architecture of Steven Holl (2010). After Mandelbrot’s fractal theory became well known, architects also developed computational algorithms for creating fractal designs. Unfortunately, the core problem with this endeavor is that the resultant “organic,” “blob-like,” or “crystalline” forms are often unusable or unfeasible for architectural purposes (Yessios 1987). Of the large number of these works that have been generated, relatively few have resulted in habitable and constructible forms (Coates et al. 2001). One of the dilemmas with using any generative design algorithm to create architecture is that it is almost always form-based. That is, these methods are used for creating shapes, not for accommodating social structures or functional needs and without any sense of the tectonic or material properties of the form (Ostwald 2010). Furthermore, such evolved forms typically do not take into account environmental or cultural considerations. 
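The Menger cube referred to above is commonly built by dividing a cube into 27 smaller cubes, removing the central cube and the six face-centre cubes, and repeating the step on each of the 20 that remain. The sketch below follows that standard construction, not the modified version in the Bolles Wilson proposal; the function name menger_cells and the integer grid representation are assumptions made for this example. The level argument is what keeps the process finite.

```python
from itertools import product


def menger_cells(level):
    """Return integer grid coordinates of the solid cells after `level` steps."""
    cells = [(0, 0, 0)]
    for _ in range(level):
        cells = [
            (3 * x + dx, 3 * y + dy, 3 * z + dz)
            for (x, y, z) in cells
            for dx, dy, dz in product(range(3), repeat=3)
            # Drop a sub-cube when two or more of its offsets are central:
            # that removes the body centre and the six face centres.
            if (dx == 1) + (dy == 1) + (dz == 1) < 2
        ]
    return cells


for level in range(4):
    solid = len(menger_cells(level))
    fraction = solid / 27 ** level  # share of the bounding cube still solid
    print(level, solid, round(fraction, 3))
```

The number of cells grows by a factor of 20 at each level while the solid share of the bounding cube shrinks by 20/27, so after only two steps there are 400 cells occupying roughly 55 percent of the original volume.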
As Van Tonder (2006) observes, “[f]ractals emulate our natural visual surroundings in terms of structural self-similarity, a fact which unfortunately renders architectural fractals prohibitively expensive to construct, and inefficient as architectural space for human occupation” (2). This is why, instead of attempting to design a complete building using fractal algorithms, most architectural works in this final category only use fractals to generate part of the building. As Holl (2010) observes, “[a] real building, of course, cannot be a perfect mathematical figure” (7). Several attempts have been made to break away from the form-dominated approach to fractal architecture. For example, in an early strategy to come to terms with the problems inherent in fractally generated designs, Yessios (1987) notes the paradox that, “if left unrestrained,” a fractal process “will go on forever” (173) and then proceeds to identify the fundamental challenge mentioned previously in this section, arguing that “if applied in a ‘pure’ fashion, [fractal geometry] will create an interesting shape but will never produce a building. A building typically has to respond to a multiplicity of processes, superimposed or interwoven. Therefore, the fractal process needs to be guided, to be constrained and to be filtered” (Yessios 1987: 173). In a similar way, almost two decades before architects began to design buildings using fractal algorithms, Schmitt (1988) suggested that the resultant designs would be too “boring” without some additional consideration of the qualities that define architecture. “The next logical step,” Schmitt (1988) argues, is toward modes of “computer creativity” which can develop “context sensitive associations, preferably in interaction with the user” (103). Schmitt rejects the simplistic formalism so often associated with fractally generated design, calling for


architecture to be sensitive to external, site-based, and internal, inhabitation-based, qualities. The suggestion that a fractal process might be modified in some way to produce a context-sensitive architecture has become a common response to the problems of creating useful (functional, constructible, and inhabitable) geometric objects. In an interesting alternative to this tradition, Saleri (2005) proposed using fractal algorithms to generate building facades from a palette of architectural elements. Thus, rather than producing ambiguous organic shapes, Saleri created elevations and forms which feature windows, doors, and other building elements that have been placed in accordance with a set of rules, rather than the needs of inhabitants. In a different way, Harris (2007) uses mathematical transformations and repetitions of simple forms to generate potentially functional architectural designs, mostly skyscrapers. He examines the basic design “rules” of various styles of architecture or architects and applies these rules to his fractal, generative process. In this manner, he has produced three-dimensional images of buildings which look convincingly similar to real Art Deco towers and designs by Frank Lloyd Wright. In a more recent reflection of this proposition, Sedrez and Pereira (2012) propose a method which commences with a fractal form and then uses functional emergence principles to “choose the appropriate size for that object, [ . . . ] proposing architectural information through doors/windows, landscaping, furniture and surfaces, finishes, colours” (100). Another application of fractal generation to design is associated with “contextual fit”; that is, the capacity of a new intervention to be “sympathetic to,” or “in keeping with,” the visual character of its surrounding site. Applications of fractal generation have been proposed for a range of urban neighborhoods and regions (Saleri 2005). 
However, much like the architectural examples of generative design, some of these cases are dominated by formalist solutions, which have only limited connection to their sites or cultural contexts.

Conclusion

Despite attempts to define “fractal architecture,” the central paradox of the endeavor is that no building can possess true fractal geometry (Ostwald and Vaughan 2016). Remember that fractal geometry is a system which describes forms that are generated from precise algorithmic rules which are infinitely repeated. The reason that the idea of fractal architecture is so problematic is that buildings have what is called a “scaling limit”: a point at which any sense of self-similarity either changes or simply breaks down completely. Pure fractal geometry only exists in mathematics, in hypothetical examples, computer simulations, or philosophical puzzles. This is why architecture in the real world can never be a true example of fractal geometry. Given this paradox, is it even meaningful to talk about fractal architecture? The proliferation of unsubstantiated quasi-fractal references in architecture and design has tended to undermine the usefulness of the concept and the degree to which it can be taken seriously.
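The scaling limit can be illustrated with a rough calculation. Assuming, purely for the sake of example, a facade 30 m wide, a rule that reduces each successive element to one third of the previous size, and a smallest buildable detail of 10 mm, the sketch below counts how many iterations fit before the limit is reached. The function name and all three figures are hypothetical.

```python
def iterations_before_limit(overall_size, ratio, smallest_detail):
    """Count how many times a size can be scaled by `ratio` (0 < ratio < 1)
    before the next element would fall below `smallest_detail`."""
    count = 0
    size = overall_size
    while size * ratio >= smallest_detail:
        size *= ratio
        count += 1
    return count


# Hypothetical figures: 30 m facade, one-third scaling, 0.01 m smallest detail.
levels = iterations_before_limit(30.0, 1 / 3, 0.01)
print(levels)
```

Under these assumptions only seven levels are available, which is a long way from the infinite repetition that the mathematical definition requires.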


Probably the biggest problem for architects has been that fractal geometry relies on a structured, iterative process, which is not the same as repeating forms at different scales. This issue arose when, 26 years after the publication of The Fractal Geometry of Nature, Benoît Mandelbrot was asked if he thought that Frank Gehry’s architecture expressed some of the properties of fractal geometry. Mandelbrot replied that he found Gehry “repetitive” rather than iterative, because there was no algorithmic scaling in Gehry’s architecture (Mandelbrot qtd. in Obrist 2008). This distinction, between a repeated shape and a fractal shape, is key to producing fractal geometry in architecture. The final issue is that the connections between architecture and fractal geometry do not necessarily have to be mathematically correct to be valuable. A great building could be inspired by fractal geometry but possess no clear trace in the finished design of the origins of that inspiration. Conversely, a computational algorithm could be used to develop a fractal building, but this is no guarantee that the building will be successful. A fractally generated design will likely have to be modified – through the inclusion of a range of site, social, or context-based measures – before it can become suitable for inhabitation. Ultimately, architecture can, potentially, have multiple different connections to fractal geometry, all of which, within clearly described limits, are informative or useful.

Cross-References

Fractal Dimensions in Architecture: Measuring the Characteristic Complexity of Buildings
Fractal Geometry in Architecture

References

Bovill C (1996) Fractal geometry in architecture and design. Birkhäuser, Boston
Capo D (2004) The fractal nature of the architectural orders. Nexus Netw J 6:30–40
Coates P, Apples T, Simon C, Derix C (2001) Current work at CECA: Dust plates and blobs. In: Proceedings of 4th international conference on generative art, GA2001, Milan, 12–14 Dec, unpag. Politecnico di Milano, Milan
Crompton A (2001) The fractal nature of the everyday environment. Environ Plan B: Plan Des 28:243–254
Eaton L (1998) Fractal geometry in the late work of Frank Lloyd Wright. In: Williams K (ed) Nexus II: architecture and mathematics. Edizioni dell’Erba, Fucecchio, pp 23–38
Edgar D (2008) Measure, topology, and fractal geometry. Springer, New York
Eglash R (1999) African fractals: modern computing and indigenous design. Rutgers University Press, New Brunswick
Harris J (2007) Integrated function systems and organic architecture from Wright to Mondrian. Nexus Netw J 9:93–102
Holl S (2010) Architecture surrounds you, in the same way that music surrounds you. ArchIdea 42:4–11
Jencks C (1995) Architecture of the jumping universe. Academy Editions, London
Joye Y (2011) A review of the presence and use of fractal geometry in architectural design. Environ Plan B: Plan Des 38:814–828
Lorenz WE (2011) Fractal geometry of architecture. In: Dietmar B, Gruber P, Hellmich C, Schmiedmayer HB, Stachelberger H, Gebeshuber IC (eds) Biomimetics – materials, structures and processes: examples, ideas and case studies. Springer, Berlin, pp 179–200
Mandelbrot BB (1982) The fractal geometry of nature. W.H. Freeman, San Francisco
Obrist HU (2008) The father of long tails interview with Benoît Mandelbrot. In: Brockman J (ed) EDGE.org. Retrieved from http://www.edge.org/3rd_culture/obrist10/obrist10_index.html
Ostwald MJ (1998) Fractal traces: geometry and the architecture of Ushida Findlay. In: van Schaik L (ed) Ushida Findlay. 2G, Barcelona, pp 136–143
Ostwald MJ (2001) Fractal architecture: late twentieth century connections between architecture and fractal geometry. Nexus Netw J 3:73–84
Ostwald MJ (2006) The architecture of the new baroque: a comparative study of the historic and new baroque movements in architecture. Global Arts, Singapore
Ostwald MJ (2010) The politics of fractal geometry in Russian paper architecture: the intelligent market and the cube of infinity. Archit Theory Rev 15:125–137
Ostwald MJ, Moore RJ (1996) Fractalesque architecture: an analysis of the grounds for excluding Mies van der Rohe from the oeuvre. In: Kelly A, Bieda K, Zhu JF, Dewanto W (eds) Traditions and modernity in Asia. Mercu Buana University, Jakarta, pp 437–453
Ostwald MJ, Vaughan J (2016) The fractal dimension of architecture. Birkhäuser, Cham
Pearson D (2001) The breaking wave: new organic architecture. Gaia, Stroud
Peitgen H-O, Richter PH (1986) The beauty of fractals: images of complex dynamical systems. Springer, New York
Saleri R (2005) Pseudo-urban automatic pattern generation. Chaos Art Archit – Int J Dyn Syst Res 1:24.1–24.12
Salingaros NA (2004) Anti-architecture and deconstruction. Umbau-Verlag Harald Püschel, Solingen
Samper A, Herrera B (2014) The fractal pattern of the French gothic cathedrals. Nexus Netw J 16:251–271
Schmitt G (1988) Expert systems and interactive fractal generators in design and evaluation. In: CAAD Futures ‘87 Eindhoven, The Netherlands, 20–22 May 1987. Elsevier, Amsterdam, pp 91–106
Schroeder MR (1991) Fractals, chaos, power laws: minutes from an infinite paradise. W.H. Freeman, New York
Sedrez M, Pereira A (2012) Fractal shape. Nexus Netw J 14:97–107
Storrer WA (1993) The Frank Lloyd Wright companion. University of Chicago Press, Chicago
Trivedi K (1989) Hindu temples: models of a fractal universe. Vis Comput 5:243–258
Van Tonder G (2006) Changing the urban façade through image superposition. In: Proceedings of the international symposium on architecture of habitat system for sustainable development. Kyushu University, Fukuoka, pp 83–89
Yessios CI (1987) A fractal studio. In: Integrating computers into the architectural curriculum. ACADIA, Raleigh, pp 169–182

Parametric Design: Theoretical Development and Algorithmic Foundation for Design Generation in Architecture

51

Ning Gu, Rongrong Yu, and Peiman Amini Behbahani

Contents

Introduction
Generative Design
  Common Characteristics of Generative Design
  Main Generative Design Systems
Parametric Design
  Historical Review of Parametric Design
  Parametric Design
Reshaping Architectural Design
  Impact on Architectural Design
  Limitations of Parametric Design
Conclusion
Cross-References
References

Abstract

This chapter presents the theoretical foundation of parametric design for design generation in architecture. Parametric design has been increasingly applied to architectural design in recent years. It is essentially a digital design method, which can be characterized by rule-algorithmic design and multiple-solution generation. Parametric design originates from generative design, which is a typical computational design approach based on rules or algorithms (e.g., in generative grammars or evolutionary systems). This chapter starts with a critical review of generative design, followed by the background, history, and theory of parametric design, including various fundamental concepts and applications that underpin parametric design, and concludes with a discussion of the impact of parametric design on architecture.

N. Gu · P. A. Behbahani School of Art, Architecture and Design, University of South Australia, Adelaide, South Australia, Australia e-mail: [email protected]; [email protected]
R. Yu School of Engineering and Built Environment, Griffith University, Southport, Australia e-mail: [email protected]
© Springer Nature Switzerland AG 2021 B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_8

Keywords

Parametric design · Architectural design · Generative design · Mathematics · Algorithm

Introduction

Generative design is a general term commonly used to describe a number of computational methods that aim to automate the whole, or a part, of the design process. Their main purpose has been to assist human designers to explore the design space with computational means (Herr and Kvan 2007). This is usually realized through rapid prototyping, powered by computers (Chien 1998). Furthermore, because the conceptual frameworks and design opportunities they offer are very distinct from the traditional approaches to design, it is arguable that generative systems have also created new paradigms or tools for design thinking (Oxman 2006). Shape grammar is one of the typical generative design methods (see Chap. 52, “Shape Grammars: A Key Generative Design Algorithm”). As discussed in that chapter, shape grammars, their generative nature, as well as formal approaches to design representation and generation, have influenced the emergence and development of contemporary computational design methods and tools, including parametric design, which is the focus of this chapter. Deriving from generative design principles, parametric design has distinctive features and has been widely adopted in leading architectural design practices. This chapter is structured into three main sections. The first section summarizes generative design as the theoretical background for parametric design; the section introduces generic features of generative systems and reviews key examples of generative design methods. The next section “Parametric Design” focuses on parametric design including its historical development as well as its main components, applications, and implementation. The section “Reshaping Architectural Design” discusses the impact of parametric design and how parametric design has reshaped architecture and is followed by a concise conclusion.

Generative Design

As defined above, generative design refers to a collection of computational methods or algorithms used to explore the design space for generating solutions. For this purpose, algorithmic procedures are formally defined and applied to produce


numerous design alternatives, of which the most suitable would be selected (Herr and Kvan 2007). This definition has been systematically used since the 1990s in the computational design domains, although most generative design systems (such as shape grammars) had been developed decades earlier. The term generative appeared much earlier in the relevant fields of design and visual art, at least since 1965, as reported by Boden and Edmonds (2007). It has gradually evolved to shift or extend its meaning from a description of design mechanics (as computer-generated design) to a more specific reference to design paradigms capable of supporting design thinking (Eckert et al. 1999). Tracing back to its origin, generative design initially emerged in the 1950s–1960s, inspired by natural phenomena. For example, evolutionary systems or genetic algorithms were inspired by the processes of biological evolution (Holland 1975), cellular automata were modeled based on biological growth (von Neumann 1951), and design grammars had synergies with the formulation of natural languages (Stiny and Gips 1972).

Common Characteristics of Generative Design

Although developed for a variety of purposes, generative design methods share some common characteristics. As noted, generative design systems rely on formal algorithms to describe and generate designs and are mostly associated with computation. This chapter defines three common characteristics of generative design: algorithm, ideation (for numerous design alternatives), and computation. Algorithms, in most generative methods, are realized as a collection of generative rules, an interpretation made by Oxman (1990) and Boden and Edmonds (2007). Others describe similar concepts as build commands (Hornby and Pollack 2001), transformation rules (Gero and Kazakov 1996), and so on. Generative rules are responsible for changing the whole or a part of the design to a new one. Rules differ by the mechanisms of generation they adopt, which refer to the different ways rules alter the design. For example, in shape grammars the generative mechanism in a shape rule can be used for either replacement or modification (Knight 2003). Replacement is the substitution of the design or its part(s) with another; it can also include addition (replacing void with an element) and subtraction (replacing an element with void). On the other hand, modification is used to change the scale, orientation and direction, or other numerical properties (e.g., in parametric shape grammars) of the design. Through the application of the rules, design generation is then arranged in sequence (Herr and Kvan 2007). This sequential and recursive process selects a combination of rules to process the design generation in a structured manner, from an initial state of the design to the final outcome. Design alternatives emerge when different combinations of rules are selected and applied. These sequences are not always explicitly defined and may emerge during their applications.
For example, in shape grammars, rule sequencing is made possible in real time by the recursive matching of existing shapes from the design with

N. Gu et al.

the left-hand side (LHS) shapes of the rule set (Maher 1990). Another more advanced type of sequencing is through the cyclic reproduction of design (Eckert et al. 1999). In this case, rules that have directed the generation of the previous stage will undergo further round(s) of application, but only applied to the highest potential designs and alternatives identified during the process. Such a quality is quintessential to evolutionary design, which will be introduced in the section “Evolutionary Systems.” Ideation (for numerous alternatives), or design divergence, refers to the capability of generative design methods to create multiple instances that satisfy the design requirements; usually this is enabled by the selection of multiple rules and their different sequencing for application during design generation. For example, in shape grammars, the decidability of LHS shapes (e.g., multiple possibilities in recognizing and matching LHS shapes) can contribute to the divergence of the design results (Knight 2003). Other factors can also play a role in enabling design divergence. For example, in parametric shape grammars, the range of parameters may allow multiple definitions of design by adjusting the values of those parameters (Knight 2003). The computational characteristic refers to the fact that generative design has a formal structure comprising visual or mathematical properties for design generation in a systematic way. Because of the increasing complexity and intensity of the design and the generation processes, research has extended the notion of “computation” to the use of computers as well (Boden and Edmonds 2007). In some domains, for example, “computer art” and “generative art” have been used interchangeably deriving from this context.
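The rule mechanisms and sequencing described above can be illustrated with a minimal sketch. It is not an implementation of any published shape grammar: the `Shape` and `Rule` types, the three rules, and the shape vocabulary are all hypothetical, and real shape grammars match geometry rather than labels. The sketch only shows replacement, modification, and subtraction rules being selected by matching their left-hand side (LHS) against the current design, with different random selections producing different alternatives.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Shape:
    kind: str    # hypothetical shape vocabulary: "square" or "triangle"
    size: float


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Shape], bool]       # left-hand side (LHS) test
    apply: Callable[[Shape], List[Shape]]  # right-hand side (RHS) result


RULES = [
    # Replacement: substitute a square with two triangles.
    Rule("split", lambda s: s.kind == "square",
         lambda s: [Shape("triangle", s.size), Shape("triangle", s.size)]),
    # Modification: keep the shape but change a numerical property (scale).
    Rule("shrink", lambda s: s.size > 1.0,
         lambda s: [replace(s, size=s.size / 2)]),
    # Subtraction: replace a small triangle with void.
    Rule("erase", lambda s: s.kind == "triangle" and s.size <= 1.0,
         lambda s: []),
]


def generate(initial, steps, seed):
    """Apply up to `steps` rules, choosing among all current LHS matches."""
    rng = random.Random(seed)
    design = list(initial)
    for _ in range(steps):
        # Sequencing emerges from matching: collect every applicable (index, rule).
        candidates = [(i, r) for i, s in enumerate(design)
                      for r in RULES if r.matches(s)]
        if not candidates:
            break
        i, rule = rng.choice(candidates)
        design[i:i + 1] = rule.apply(design[i])
    return design


if __name__ == "__main__":
    start = [Shape("square", 4.0)]
    for seed in range(3):
        print(seed, generate(start, steps=4, seed=seed))
```

Running the sketch with different seeds gives different rule sequences from the same initial shape, which is the sense of design divergence used in this section.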

Main Generative Design Systems

Generative design systems have been developed since the 1960s. Many of them share common origins or features, which are useful for categorizing different systems. For example, Fischer and Herr (2001) characterized generative systems into four general types: emergent systems (e.g., cellular automata), generative grammars, algorithmic generation (e.g., parametric design), and algorithmic reproduction (e.g., genetic algorithms). While such categorization makes sense in abstracting and distinguishing common generative features, the terminologies themselves may not be very precise. For example, “algorithmic generation” is a very broad term and arguably applies to all generative systems. The term “emergent systems” suggests support for design emergence and is also applicable to a number of systems. A different approach (Singh and Gu 2011) does not explicitly develop more general categorizations but acknowledges that different generative systems can both differ from and overlap with each other. This section reviews four categories, considering both their differences and their similarities, and therefore does not treat them as mutually exclusive. These categories are: generative grammars, evolutionary systems, self-organized emergent systems, and associative generation.

Parametric Design: Theoretical Development and Algorithmic. . .

Generative Grammars

As discussed in Chap. 52, “Shape Grammars: A Key Generative Design Algorithm”, generative grammars are based on transformational rules which can be applied recursively to develop the final shape. This notion originated in linguistics but has evolved into different forms for different purposes. For example, visual grammars such as shape grammars were inspired by Chomsky’s idea of generative grammars (Stiny and Gips 1972); graph grammars were initially developed in computer science and are especially suitable for computer implementations (Chakrabarti et al. 2011); and L-systems were based on another linguistic concept – string grammars – developed in the 1950s (Singh and Gu 2011). In recent years, research has combined these approaches to enhance generative capabilities; for example, combinations of shape grammars and graph grammars (Grasl and Economou 2013; Lee et al. 2016) have been demonstrated to address both visuospatial and syntactic issues in architecture.

Generative grammars were among the earliest generative design systems to be used in architectural design, for example, the shape grammar of Palladian villas by Stiny and Mitchell (1978). Graph grammars and L-systems, on the other hand, were applied to architectural design much later, having been in use since the 2000s (e.g., Parish and Müller 2001). The original generative mechanisms of rules in grammars are mostly for replacement, not modification (Knight 2003). Nevertheless, modification later became possible in grammars and significantly enhanced their generative power, especially with the emergence of parametric shapes and parametric shape grammars. Generative grammars are sequential due to the nature of rule application. The sequencing of rules is determined by matching existing elements of the design against the LHS shapes of the rule set for rule selection (Stiny and Gips 1972). 
The divergent generation of design alternatives could be fulfilled by different possibilities in such rule selection and application. While all grammars are computational, only graph grammars and L-systems have been realized as computer implementations to a greater extent. As discussed in  Chap. 52, “Shape Grammars: A Key Generative Design Algorithm”, during shape detection, shape grammars allow designers to freely decompose and recompose shapes. This feature is called shape emergence, which regards shapes as not being predefined but emerged. To appropriately handle shape emergence has been difficult in computer implementations because of the finite representations of design being restricted by the computer memory (Krish 2011; Tching et al. 2016), and therefore it has become an ongoing challenge for visual grammars such as shape grammars.
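Because L-systems are string grammars, they are the easiest of the grammars mentioned above to sketch in code. The example below is a minimal illustration, assuming a deterministic, context-free L-system; the function names `rewrite` and `trace` are hypothetical, and the two rule sets (Lindenmayer's algae system and a quadratic Koch curve) are standard textbook examples rather than examples taken from this chapter. Every symbol is replaced in parallel at each generation, and the resulting string can be read as drawing commands.

```python
from typing import Dict, List, Tuple

ALGAE = {"A": "AB", "B": "A"}   # Lindenmayer's original algae system
KOCH = {"F": "F+F-F-F+F"}       # quadratic Koch curve, 90-degree turns


def rewrite(axiom: str, rules: Dict[str, str], generations: int) -> str:
    """Replace every symbol in parallel; symbols without a rule are kept."""
    s = axiom
    for _ in range(generations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s


# Unit steps for headings east, north, west, south (counterclockwise order).
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def trace(commands: str) -> List[Tuple[int, int]]:
    """Interpret 'F' as a unit step, '+' as a left turn, '-' as a right turn."""
    x, y, h = 0, 0, 0
    points = [(x, y)]
    for ch in commands:
        if ch == "F":
            dx, dy = HEADINGS[h]
            x, y = x + dx, y + dy
            points.append((x, y))
        elif ch == "+":
            h = (h + 1) % 4
        elif ch == "-":
            h = (h - 1) % 4
    return points


if __name__ == "__main__":
    print(rewrite("A", ALGAE, 4))          # ABAABABA
    print(trace(rewrite("F", KOCH, 1)))    # polyline ending at (3, 0)
```

The rules here perform replacement only, which matches the observation that the original generative mechanism of grammars is replacement rather than modification.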

Evolutionary Systems

As the term suggests, evolutionary systems in design were derived from evolutionary biology, especially the simplified notion of “survival of the fittest” (Holland 1975). In summary, an evolutionary system defines a recursive process of design reproduction, in which each organism reproduces (slightly) divergent offspring. Only the “fittest” among them survive to trigger another iteration of the reproduction and selection cycle (Fasoulaki 2007). In an evolutionary system, the

“fitness” of the generated design instances is evaluated using certain fitness criteria to select the most appropriate ones to continue the generation and selection cycle, until a satisfactory outcome is achieved. Because of such a basis in evolutionary biology, these generative design systems are also called genetic algorithms (GA). Unlike most other generative design systems, there are no inherent generative rules in evolutionary systems. Design alternatives can be produced by different means, including other generative methods (e.g., shape grammars, L-systems, etc.). The sequential characteristic of generation reflects the cycle of generation and selection, rather than the order of specific rule selection and application. Divergence in evolutionary design is also supported by the generation and selection cycle, and the evolutionary process is usually sizable in terms of both depth (number of stages in design generation) and breadth (number of alternatives per stage). In most existing evolutionary design systems, computers have been used to assist in managing the generation and selection cycle. However, in cases where the selection criteria are more subjective (e.g., aesthetics), human designers in particular have been used for intervention. Generative design systems or genetic algorithms have emerged from the very beginning of computational design (Holland 1968). However, their explicit applications in architecture only began in the 1990s (Jo and Gero 1998). A possible reason may be the complications of formulating fitness criteria that address rather complex architectural design issues or considerations and the challenges for formulating them using abstract computer representations. The concepts of generation and selection, or synthesis and evaluation, have always been an important part of design theories and methodologies, and they are also the fundamental basis of evolutionary design systems.
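The generation and selection cycle can be sketched in a few lines. The following is a deliberately simplified, mutation-only evolutionary loop (crossover is omitted), not a full genetic algorithm. The design representation (a rectangle given by width and depth), the target area, and the fitness weights are all hypothetical choices made for illustration; `breadth` and `generations` correspond to the breadth and depth of the evolutionary process mentioned above.

```python
import random

TARGET_AREA = 100.0  # hypothetical design requirement


def fitness(design):
    """Higher is fitter: close to the target area, with a short perimeter."""
    width, depth = design
    area_error = abs(width * depth - TARGET_AREA)
    perimeter = 2 * (width + depth)
    return -(area_error + 0.1 * perimeter)


def mutate(design, rng, step=0.5):
    """Produce a slightly divergent offspring of one design."""
    return tuple(max(0.1, v + rng.uniform(-step, step)) for v in design)


def evolve(generations=200, breadth=20, survivors=5, seed=0):
    rng = random.Random(seed)
    population = [(rng.uniform(1, 20), rng.uniform(1, 20)) for _ in range(breadth)]
    for _ in range(generations):
        # Selection: only the fittest designs continue the cycle.
        parents = sorted(population, key=fitness, reverse=True)[:survivors]
        # Generation: survivors are kept and also produce mutated offspring.
        offspring = [mutate(rng.choice(parents), rng)
                     for _ in range(breadth - survivors)]
        population = parents + offspring
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, fitness(best))
```

Because the surviving parents are carried into the next generation, the best fitness never decreases from one cycle to the next. Where the criteria are subjective, the `fitness` function could be replaced by a human designer's ranking of the candidates.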

Emergent and Self-Organized Systems

The concept of emergence here is defined as the explicit outcome of implicit representation (Gero 1996). Emergent systems therefore refer to generative design systems whose outcome emerges out of self-organized components. A common approach to this emergence is a collection of self-organizing agents that shape the final form by interacting with each other and with their surroundings. Cellular automata and swarm intelligence are the two typical examples of emergent and self-organized systems (Singh and Gu 2011).

Cellular automata were inspired by biology through an interest in simulating biological growth (von Neumann 1951). With their formal establishment in the 1980s (Wolfram 1986), they attracted the attention of designers. In practice, a cellular automaton is made of a grid (or structured geometry) whose cells can have different properties or states (Sarkar 2000). A rule or set of rules, inherent to the cells, directs the alteration of states in response to neighboring cells or time intervals (Batty 1997b). As time passes, the cells automatically change their properties or states, and the corresponding design combinations emerge on the grid as the outcome. This automatic updating of the cells gives such generative design systems the name cellular automata.
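A cellular automaton of the kind described above can be sketched as follows. The rule set used is Conway's Game of Life, chosen only because it is widely known; it is an assumption of this sketch and is not discussed in the chapter. The grid wraps around at its edges, each cell has two states (0 or 1), and every cell updates from the states of its eight neighbors at each time step.

```python
def step(grid):
    """Advance the grid one time step under the Game of Life rules."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live cells among the eight neighbors (edges wrap around).
            neighbors = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # A cell is alive next step if it has 3 live neighbors,
            # or if it is alive now and has exactly 2.
            if neighbors == 3 or (grid[r][c] == 1 and neighbors == 2):
                new[r][c] = 1
    return new


def run(grid, steps):
    """Return the list of grid states at times 0..steps."""
    history = [grid]
    for _ in range(steps):
        history.append(step(history[-1]))
    return history


if __name__ == "__main__":
    blinker = [
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    for state in run(blinker, 2):
        print("\n".join("".join("#" if v else "." for v in row) for row in state))
        print()
```

Each entry returned by `run` is a snapshot of the collective cell states at one time step, so a series of snapshots can be read as a series of candidate patterns.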

Considering the rather abstract nature of the cells, the generative rules (i.e., for state alteration) can have both replacement and modification as the generative mechanism. The automation process is sequential as state alteration proceeds and spreads cell by cell. Design divergence in cellular automata can be achieved either by recording the collective states of cells at different times or enabling parameters and variations that can affect the automation rules (Batty 1997a). Swarm intelligence concerns the design of complex and intelligent systems based on collective behaviors of simple individual agents who work collaboratively (Blum and Li 2008). Similar to the cells in cellular automata, agents in swarm intelligence behave collectively based on their inherent rules, as well as interactions with each other and their surroundings. However unlike cellular automata, their design outcomes are not bound to the defined grid (Singh and Gu 2011), therefore their behaviors can be more complex than those of the alteration of states in cellular automata. Swarm intelligence originated in the study and simulation of insect swarms, and later has been extended to other animals and humans who can undertake more complex cognitive activities (Garnier et al. 2007). Similar to cellular automata, the agents in swarm intelligence usually operate based on time intervals, which define the sequential characteristic of design generation using swarm intelligence. The divergence of the design outcomes may be realized by various means such as varying (or even randomizing) the numbers and types of agents, their surroundings, time intervals, etc.
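The contrast with cellular automata can be made concrete with a very small agent sketch. It is a strong simplification of swarm intelligence: each agent follows a single hypothetical rule (move a fraction of the way toward the centroid of the swarm, with optional random jitter), and the names `simulate`, `rate`, and `jitter` are assumptions of this sketch. The agents hold continuous coordinates, so their positions are not bound to a grid, and they update at discrete time intervals.

```python
import random


def centroid(agents):
    n = len(agents)
    return (sum(x for x, _ in agents) / n, sum(y for _, y in agents) / n)


def spread(agents):
    """Mean distance of the agents from their centroid."""
    cx, cy = centroid(agents)
    return sum(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in agents) / len(agents)


def simulate(agents, steps, rate=0.1, jitter=0.0, seed=0):
    """Move every agent toward the swarm centroid at each time interval."""
    rng = random.Random(seed)
    for _ in range(steps):
        cx, cy = centroid(agents)
        agents = [
            (x + rate * (cx - x) + rng.uniform(-jitter, jitter),
             y + rate * (cy - y) + rng.uniform(-jitter, jitter))
            for x, y in agents
        ]
    return agents


if __name__ == "__main__":
    swarm = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    print(spread(swarm), spread(simulate(swarm, steps=10, rate=0.2)))
```

Varying the number of agents, the `jitter`, the seed, or the number of steps gives different outcomes from the same rule, which is one of the routes to divergence noted above.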

Associative Generation

In this type of generative system, the design and its alternatives are generated by first defining the associations between the components that will make up the design, and then by assigning and altering the values of their different properties. Unlike in grammars, the design components are not explicitly transformed, but any change made to their properties leads to the generation of new design instances by adjusting other properties through the defined association. Parametric design is one of the most typical generative systems and the latest development based on associative generation or associative geometry. Parametric design is the focus of this chapter and will be introduced more extensively in the second half of the chapter.

Parametric, as a term or method, originated from the mathematical notion of parametric equations, with its design application initially seen in the works of the Italian architect Luigi Moretti in the 1940s (Frazer 2016). The essential idea of parametric design is that the values of different variables change, usually associatively, based on different input parameters. The concept of parametric design has also been adopted to enhance other generative design systems, for example, parametric shape grammars. In theory, parameters and variables can be of any type or value, including those related to visual qualities, while in practice they are usually numerical, which makes them ideal for computer implementation. Divergence may be supported by either adjusting the parameters and variables or modifying and redefining the association between them. The sequential characteristic here is largely reflected in iteration or rapid prototyping,

where a large number of design alternatives can be very efficiently produced by the abovementioned adjustment, modification, or redefinition, which can then be explored and reviewed (often with the intervention of human designers) to perfect the design generation.
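The idea of associative generation can be sketched with a small model in which derived variables are defined as functions of input parameters. The `ParametricModel` class, its methods, and the building-mass example (width, depth, floors, floor height) are hypothetical and only illustrate the principle; they do not reproduce the data model of any parametric design package. Changing one input parameter regenerates every variable associated with it.

```python
class ParametricModel:
    """Input parameters plus named associations that derive further variables."""

    def __init__(self, **parameters):
        self.parameters = dict(parameters)
        self.associations = {}

    def associate(self, name, func):
        """Define variable `name` as a function of other values.

        `func` receives a lookup callable and returns the derived value.
        """
        self.associations[name] = func

    def value(self, name):
        if name in self.parameters:
            return self.parameters[name]
        # Derived variables are recomputed from their associations on demand.
        return self.associations[name](self.value)

    def variant(self, **changes):
        """Return a new design instance with some parameters adjusted."""
        new = ParametricModel(**{**self.parameters, **changes})
        new.associations = dict(self.associations)
        return new


if __name__ == "__main__":
    model = ParametricModel(width=10.0, depth=8.0, floors=5, floor_height=3.0)
    model.associate("footprint", lambda v: v("width") * v("depth"))
    model.associate("height", lambda v: v("floors") * v("floor_height"))
    model.associate("floor_area", lambda v: v("footprint") * v("floors"))

    print(model.value("floor_area"))                      # 400.0
    print(model.variant(width=12.0).value("floor_area"))  # 480.0
```

Divergence can come from either route described above: calling `variant` adjusts parameters, while calling `associate` again with a different function redefines an association.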

Parametric Design

As described above, the concept of parametric design originated from generative design. Deriving from those core generative design principles, parametric design has since evolved into one of the most widely applied generative design systems in practice and in education. This section first introduces the historical background and development of parametric design. The main content of the section describes the theoretical foundation of parametric design, including various fundamental concepts, as well as their application and implementation.

Historical Review of Parametric Design

Origin of Parametric Design

As noted, parametric design is a specific type of system for generative design. The generative process of parametric design focuses on the control of variations and constraints. The variation control of geometry can be traced back to the Church of Colònia Güell designed by Antonio Gaudi. He created a model in which the ceiling shape could be altered by adjusting the position of the weight or the length of the string. This association of the ceiling shape with the position of the weight and the length of the string demonstrates the fundamental concept of parametric design and is its initial application in architecture, although without explicit computational techniques. Based on this concept, Hillyard and Braid (1978) proposed a system for computer-aided mechanical design that allowed the specification of geometric constraints, restricted within a certain range. This system represents one of the first attempts to theorize variations and constraints in computational design systems. A few years later, Light and Gossard (1982) furthered the development and presented what was called variational geometry or variational design. Their work provided geometrical representations with enhanced mathematical and geometrical modeling tools to support the design process. These parametric modeling tools were then utilized widely in the aerospace, ship construction, and product design industries.

From the late 1980s, Frank Gehry, a leading figure in contemporary architecture, used CATIA as the main platform for the documentation of his building designs. As a component of this platform, a parametric package (for NURBS surface modeling) formally entered commercial architectural practice. 
Though most of Gehry’s designs only used parametric techniques in the documentation stage, at around that time his style suggested new architectonic possibilities and triggered intellectual debates about the style and the role of the computer. Without parametric tools, these dynamic, open-ended, and complex designs would

have been difficult to achieve. Later, the consulting firm Gehry Technologies (GT) standardized this parametric approach into Digital Project (DP) to specifically handle complex architectural designs. Today GT has expanded its services to a wide range of computational design areas.

Development of Parametric Design

From the mid-1990s, the number of architectural practices and academic organizations that explored and adopted parametric design in architecture increased significantly. During that period, software such as GenerativeComponents™ (GC), Grasshopper, and Processing was developed and evolved rapidly with enhanced features. By the year 2000, the applications of parametric techniques in building design had matured, and a growing number of landmark buildings using parametric design were proposed and constructed. The leading practices using parametric design in the architecture industry at the time included Foster+Partners, Zaha Hadid Architects, UNStudio, KPF, AA Emtech, and SPAN. Designers were able to use different parametric design tools to produce and control free-form architecture. This was a major breakthrough for many of the leading architectural practices as they shifted to designing through parametric modeling. In research and education, AA in London, LAAC in Spain, Hyperbody at Delft University of Technology, MIT, and Columbia University have since become cradles for the next generation of parametric designers. Meanwhile, international computational design conferences including ACADIA, ASCAAD, CAADRIA, eCAADe, SIGraDi, and CAAD Futures have published an increasing number of papers on parametric design, which has since become an established topic in these research forums. In more recent times, many other conferences and workshops have continued the effort to promote parametric design.

Parametricism

Over the past 20 years, parametric design has played an increasingly important role in architectural design, especially among leading design practices. Some scholars have argued that it has become a design style or movement that is replacing Modernism (Schumacher 2009). 
At the 2008 Venice Biennale of Architecture, titled “Out There: Architecture Beyond Building,” Patrik Schumacher first raised the term parametricism, which he later described more comprehensively as a combination of design concepts that provide a new and complex order based on key principles of differentiation and correlation (Schumacher 2009). During the 2011 ACADIA conference, the term became the main conference theme. The overarching implications of parametricism and its related concepts are still largely underexplored in a broader industry context, and the approaches to applying parametricism can also vary. For example, architects can maintain modernist aesthetics in buildings while using parametric modeling only to address different levels of complexity or to optimize the overall design. Such approaches can be seen in projects like Soho Shang Du in Beijing. It is considered to be a finely

designed building achieved through parametric tools, but with a rather conventional appearance.

Parametric Design Key Concepts and Terminologies

Parametric design. Parametric design focuses on the representation and control of the relationships between objects in a computational design model. It supports the creation, management, and organization of complex designs (Woodbury et al. 2007). Using parametric design tools, designers can create rules according to the aesthetics and functional or performance-related requirements of a design. A parameter is a value or measurement of a variable that can be altered or changed. Each object in a parametric system may have certain rules embedded, and when one parameter changes, other parameters will adapt automatically (Ostwald 2012). By changing various parameters, particular instances can be created from a potentially infinite range of possibilities (Kolarevic 2003).

From the notions above, parametric design can be understood to have the following characteristics. Firstly, these systems are structured based on design parameters. Eastman (2008) suggests that in parametric design, key variables are often defined by parameters. While most parameters are related to geometrical modeling, others can be connected to functional or performance-related requirements. Secondly, there are relationships between design variables. As Cárdenas (2007) noted, parametric design establishes the relationships between modeling components defined by constraints, while Abdelsalam (2009) claims that the relationships are maintained by variations. In most cases, constraints and variations in combination help to define the relationships between the modeling components. The entire parametric system will update when certain variables are changed, because the defined relationships between them will lead to changes in other variables. However, a parametric system may become “over-constrained” if there is no effective control over the defined relationships (Burry 2003). 
Thirdly, the rule-based algorithms ensure that the parametric design process is flexible but controllable. By making rules, designers can produce variations that result in “fully organized controllable building forms” (Abdelsalam 2009). Fourthly, parametric design provides a formal process from which multiple design solutions can be developed simultaneously and efficiently. Both Hernandez (2006) and Karle and Kelly (2011) emphasize that to develop parallel ideas is one of the main advantages in parametric design. In summary, parametric design is a dynamic, rule-based design process controlled by variations and constraints, in which multiple design solutions can be developed in parallel. It is especially effective in generating complex forms and optimizing multiple design solutions. As a result, parametric design is usually utilized in complex building form generation and fabrication, structural and energy optimization, and other design iteration and prototyping tasks. As a new digital

Fig. 1 Simplified examples of parametric variations generated in the same system

design method, parametric design is quite different from traditional CAD/CAM because of these rule-based algorithmic characteristics. Parametric variations. In a parametric design system, variations are controlled by changing the values of parameters and constraints without necessarily altering the original structure of the model (see Fig. 1 for different variations of a primitive geometry generated by the same system). Variations can be single or multiple, independent from or interrelated with each other. Karle and Kelly (2011) argue that as a new design method, parametric design does not push designers into generating the right design solution but, rather, into asking the right question, which can then be answered with multiple solutions. As a result, the selection process of the generated variations is an important step in parametric design. Prior to selecting the appropriate variation(s), a typical workflow in parametric design also involves identifying the design problem(s) and developing a series of rule sets with associative design variables, which can support a dynamic and flexible design process enabling the emergence of variations. Design constraints have a significant role in directing the entire parametric process. They can be used by designers to describe and generate a range of variations and to define and control the unique characteristics of each. Constraints set limits on parameters and control the possibilities for the range of variations. In parametric design, constraint satisfaction is important in the decision-making process, connecting individual factors with the overall design outcomes. There are two basic forms of constraint – geometric and dimensional (Monedero 2000). Geometric constraints are properties that control how geometrical entities relate to each other. Dimensional constraints are properties that can be assigned to a singular value, relating the geometry to a numerical value that fixes its behavior until it is changed or removed. 
An optimal parametric design process often requires the computational model to be well balanced, which is neither under-constrained nor over-constrained (Monedero 2000). Parametric modeling. Parametric modeling refers to specific techniques that “create design spaces and geometric dependencies within a model” (Gane and Haymaker 2009). It provides a formal descriptive and generative design framework (through parameters and their associative relationships), by which designers are able to change the input values to generate and optimize the design with variations. The most significant advantage of parametric modeling is that it allows changes to be made to the parameters at any stage of the design process (Monedero 2000). Different parametric modeling techniques have been developed for visual purposes

Fig. 2 Examples of parametric modeling techniques for form finding: repetition (left) and subdivision (right)

(i.e., form finding), as well as for other functional or performance-related purposes. Figure 2 demonstrates some typical parametric modeling techniques for form finding, for example, repetition (Fig. 2, left) and subdivision (Fig. 2, right); other similar techniques include tiling, recursion, weaving, etc. Research has further synthesized and formalized essential parametric modeling techniques into “design patterns” (Woodbury et al. 2007). With these patterns, more standardized approaches can be adopted in parametric design to address particular problems. The advantage of these general patterns in parametric design is that “[p]atterns are a way to identify successful general strategies that exemplify a key concept in a memorable fashion that can easily be taught” (Woodbury et al. 2007, p. 229). Parametric design patterns have since been proposed as abstract and reusable tools in parametric design. It is suggested that by learning and using these general patterns, architects and students alike are able to master parametric design more efficiently and skillfully (Woodbury 2010) through abstraction and standardization.

Application

Parametric design and form finding – Parametric design tools can capture and explore critical relationships between design intentions and geometries. Designers interact with the tools through rule algorithms to capture and manipulate these relationships, as well as relationships among different design elements. For the purpose of form finding, parametric design can be especially useful in facilitating the modeling process for complex geometries, and the integration of parametric tools in design can also enhance flexibility and control during the process (Fischer et al. 2003). To explore form finding in parametric design, some studies have focused on geometrical modeling methods. 
For example, Hnizda (2009) suggested that there are at least two different geometrical modeling methods in parametric design – object extraction and transformation – each of which can be used to explore the relationships between formal aesthetics and functional properties. Others, such as Baerlecken et al. (2010), explored the issues through problem definition in the early conceptual stage. To a large extent, this latter form-finding approach depends on variation settings (Baerlecken et al. 2010); through parametric variations, they explored the functional and structural properties resulting from sunshading in order to derive the final design form.
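The repetition technique and the role of dimensional constraints discussed in this section can be sketched together. The example is illustrative only: the parameter names (`sides`, `copies`, `twist_deg`, `scale`) and the ranges in `CONSTRAINTS` are hypothetical, and the sketch does not reproduce the system behind Fig. 1 or Fig. 2. A primitive polygon is repeated, each copy twisted and scaled relative to the previous one, and parameter values outside the allowed ranges are rejected.

```python
import math

# Hypothetical dimensional constraints: allowed (low, high) range per parameter.
CONSTRAINTS = {
    "sides": (3, 12),
    "copies": (1, 50),
    "twist_deg": (0.0, 45.0),
    "scale": (0.5, 1.0),
}


def check(params):
    """Raise ValueError if any parameter violates its dimensional constraint."""
    for name, (low, high) in CONSTRAINTS.items():
        if not low <= params[name] <= high:
            raise ValueError(f"{name}={params[name]} outside [{low}, {high}]")


def polygon(sides, radius, rotation):
    """Vertices of a regular polygon centered at the origin."""
    return [
        (radius * math.cos(rotation + 2 * math.pi * k / sides),
         radius * math.sin(rotation + 2 * math.pi * k / sides))
        for k in range(sides)
    ]


def repeat(sides, copies, twist_deg, scale, radius=1.0):
    """Repetition: each copy is twisted and scaled relative to the previous one."""
    check({"sides": sides, "copies": copies, "twist_deg": twist_deg, "scale": scale})
    return [
        polygon(sides, radius * scale ** i, math.radians(twist_deg) * i)
        for i in range(copies)
    ]


if __name__ == "__main__":
    # Two variations generated by the same system with different parameter values.
    a = repeat(sides=4, copies=3, twist_deg=10.0, scale=0.9)
    b = repeat(sides=6, copies=5, twist_deg=15.0, scale=0.8)
    print(len(a), len(b))
```

The structure of the model stays the same across variations; only the parameter values change, and the constraints bound the range of variations that can be generated.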

One of the most significant form-finding exercises using parametric design is the “Sagrada Familia” project. Gaudi’s Sagrada Familia cathedral is characterized by curved sculptural surfaces that follow specific sets of principles. These embedded rules in Gaudi’s design provide perfect opportunities for further design and analysis using parametric tools (Roberto and Hernandez 2004). These rather simple geometric rules and procedures can produce a rich formal language. Since the early 2000s, Professor Mark Burry (2003) and his team have developed and applied parametric systems for modeling complex geometries in a major case study of Gaudi’s Sagrada Familia. The research later extended to aspects of fabrication and construction aiming to complete the unfinished masterpiece. Their series of developments has provided important evidence of the effectiveness of parametric design for designing and analyzing complex geometries through rule algorithms. Extending into landscape design, Yu et al. (2015b) applied parametric design to produce new garden plans that are consistent with the style of Chinese traditional private gardens, replicating selected socio-spatial characteristics of those heritage gardens. The socio-spatial characteristics of the original gardens were analyzed employing various space syntax techniques, and the resultant mathematical measurements were then mapped into parameters and constraints in the system. Through the defined rule algorithms, the parametric system can generate new garden plans with similar spatial structures and connectivity values. Figure 3 shows selected examples of the newly generated garden plans using the developed system, on three different given sites. Parametric design and design performance – As discussed earlier, beyond forms and aesthetics, parametric design tools can also respond to functional or performance-related requirements. 
Such applications are mostly for structural analysis and building performance optimization (i.e., sustainability). In applying parametric tools to structural design, Maher and Burry (2003) compared parametric structural analysis with traditional approaches in a cross-disciplinary collaboration between architects and structural engineers. The combination of structural analysis and parametric design not only enhanced the design process and result but also provided new platforms for collaboration across disciplines. In a similar study linking architectural design to structural optimization, Holzer et al. (2007) investigated geometry generation with structural analysis and optimization, showing that parametric systems (supported by parameters, rules, and constraints) are effective for generating a variety of solutions for structural design. Most of these works have focused on analyzing the volume of structural materials. ETABS is one of the popular structural analysis tools whose data can be imported into parametric design software. For instance, Almusharaf and Elnimeiri (2010) studied structural performance in high-rise building design by utilizing ETABS in the parametric environment of Grasshopper. A design scenario was presented in their study where instant feedback on structural performance could be provided during the parametric process to assist decision making and design generation. One of the advantages of a parametric design system is that multiple analyses can be established and conducted in parallel within the same model to support

Fig. 3 Selected examples of new garden plans generated using a parametric design system (Yu et al. 2015b)

more comprehensive optimization. Besides structural analysis, building performance, especially in terms of sustainability, is another important application of parametric design. For instance, “multi-parametric façade elements” were proposed and examined by Schlueter and Thesseling (2008). In their study, the performance analysis of the façade offers instant assessment (in terms of the solar gain in different façade forms) during the design process, so that designers can better improve energy performance while generating design forms. In another study, a formal framework that combines parametric modeling with the “performance-based design” (PBD) paradigm was proposed (Bernal 2011), aiming to standardize the parametric applications of building performance and providing real-time feedback on the building performance index during the design process. Collaboration in parametric design – With the continuing evolution and growing popularity of parametric design, researchers have begun to explore design collaboration using parametric systems, including cross-disciplinary and distant scenarios. As discussed above, collaboration between disciplines using parametric systems, for example, in structural design and analysis, can critically optimize solutions using interdisciplinary knowledge. Cross-disciplinary collaboration is considered to be especially beneficial in the early design stage, and the use of

51 Parametric Design: Theoretical Development and Algorithmic. . .

parametric design in the early design stage can effectively assist multidisciplinary teams in problem finding, which in turn directs the generation and selection of variations during the parametric process (Gane and Haymaker 2009). In terms of distant collaboration, Burry and Holzer (2009) explored the potential for sharing parametric models in a version control platform, which could be used by design teams in different geographical locations. Rajus et al. (2010) used the participant observation method to study distant collaboration in parametric design. In their study, 18 participants were asked to perform different design tasks using the parametric software GenerativeComponents™ (GC). The study developed different controls in collaboration and applied them in the experiments. The results show that moderately controlled collaboration can enhance team performance and user satisfaction in parametric design.

Implementation Software

To date, quite a number of software packages have been developed and adopted for parametric design in practice. Most software supports free-form modeling and also offers scripting plug-ins that enable designers to create rule algorithms more directly and freely. Some of these software packages are briefly introduced in this section. Rhino + Grasshopper has been one of the most commonly used parametric tools, especially in architectural design, while Digital Project (DP) and GenerativeComponents™ (GC) are more tailored for large-scale projects with complex parametric and geometric associations.

Computer Aided Three-dimensional Interactive Application (CATIA) was first developed in 1977 by French aircraft manufacturer Dassault Aviation. Thereafter, it was applied in aerospace, automotive, shipbuilding, and other industries for its capabilities in controlling and manipulating complex geometries, as well as its support for manufacturing accuracy. Multiple versions of CATIA have since been developed for commercial use, and at the same time it has gradually entered the architectural field. Based on CATIA 5.0, Gehry Technologies developed Digital Project (DP), which is specifically aimed at architectural design. DP is known as a very powerful parametric software package that can effectively handle complex parametric as well as geometric associations, which makes it ideal for large complex parametric design projects.

In 2005, GenerativeComponents™ (GC) was developed by Robert Aish for Bentley Systems. GC implements parametric concepts across the entire design project life cycle, from the early conceptual phase to the final documentation. In addition, GC has been integrated with Building Information Modeling (BIM) as well as other analysis and simulation platforms for design evaluation and optimization. Such integration can potentially make parametric design more targeted and realistic, effectively linking design conceptualization with production, fabrication, and construction.

Rhinoceros (Rhino) is a stand-alone, NURBS-based 3D modeling tool. It was developed by Robert McNeel and has since been widely applied in a range of domains, including architecture, industrial design, jewelry design, automotive, as well as marine design. Grasshopper, on the other hand, is a rule-algorithm editor

with a graphic interface, which can be integrated into Rhino as a scripting plug-in. It is structured with specific definition files that link to the main parametric model in Rhino. Usually Grasshopper is used as a generative tool rather than a modifier during the parametric design process. Compared to other parametric software, Rhino + Grasshopper is currently much more widely adopted in both practice and education. This is because of its ease of operation as a visual programming tool (supported by the graphic rule-algorithm editor) as well as its relatively low cost.

Scripting uses computer programming languages such as Java and Visual Basic (VB) to directly interpret and execute commands. It can establish and control parameters and translate design intent into code that a computer can readily interpret. Consequently, designers (with fluent scripting skills) can define and control the logic and rules for parametric design with more freedom than any other tool allows. Common scripting languages include Python, VB, and Ruby, while scripting tools include Python Script, RhinoScript, Processing, and CADscript.

There are also several design analysis tools, which are often used together with parametric software. For example, Ecotect is used for analyzing energy performance, while ETABS is used for structural analysis. These analysis tools are capable of exporting analysis data into parametric software to direct design generation and optimization.
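The idea that scripting establishes parameters and encodes rules can be illustrated with a minimal Python sketch. It is not tied to Rhino, Grasshopper, or any other package named above; the names `TowerParams`, `floor_outline`, and `build_tower`, as well as the taper and twist rules themselves, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
import math


@dataclass(frozen=True)
class TowerParams:
    """Hypothetical parameter set for a simple tapering, twisting tower."""
    floors: int
    floor_height: float   # metres
    base_radius: float    # metres, circumradius of the ground-floor plate
    taper: float          # fraction of the radius lost from base to top (0..1)
    twist_deg: float      # total rotation from base to top, in degrees


def floor_outline(p, level, sides=4):
    """Rule: derive one floor plate (a list of (x, y, z) vertices) from the parameters."""
    t = level / max(p.floors - 1, 1)            # 0 at the base, 1 at the top
    radius = p.base_radius * (1.0 - p.taper * t)
    rotation = math.radians(p.twist_deg * t)
    z = level * p.floor_height
    return [
        (radius * math.cos(rotation + 2 * math.pi * k / sides),
         radius * math.sin(rotation + 2 * math.pi * k / sides),
         z)
        for k in range(sides)
    ]


def build_tower(p):
    """Apply the floor rule to every level to obtain the whole model."""
    return [floor_outline(p, level) for level in range(p.floors)]


base = TowerParams(floors=10, floor_height=3.5, base_radius=12.0,
                   taper=0.3, twist_deg=45.0)
tower = build_tower(base)

# Changing one parameter regenerates the whole model from the same rules.
twisted = build_tower(replace(base, twist_deg=90.0))
```

In this sketch the designer edits the parameter values or the rule in `floor_outline`, never the individual vertices, which is the shift in activity described in the text.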

Reshaping Architectural Design

Impact on Architectural Design

Designers’ activities change in a parametric design environment compared to those in a traditional CAD environment. Firstly, designers design rules and define logical relationships of the design rather than only modeling geometries. One of the biggest differences between parametric design and traditional computer-aided design is that rule sets become basic design elements and procedures in parametric design (Abdelsalam 2009). While building parametric models, designers set variations, design data flows, and define and adjust rules as well as the values of parameters. They are required to think not only about a particular building design but also about the rule sets that generate and underpin the design. Additionally, through the control of these logical relationships, parametric design opens up many more possibilities for the generated solutions (Hernandez 2006; Karle and Kelly 2011). Secondly, architects can easily trace back through the parametric design process and are free to make changes at any step of the process. During parametric design, through the defined rule algorithms, systems are clearly differentiated and correlated, and design activities are clearly communicated with each other within the process (Schumacher 2009). Therefore, designers are able to go back to any step along the process to change parameters or revise rules in order to modify the design process for different purposes or to trial different variations. This level of flexibility allows architects

Fig. 4 A large number of form variations generated in a parametric design system

to keep the design “open.” Thirdly, a large number of design alternatives can be developed in parallel. Traditionally, designers consider a very small number of alternatives because of time or memory limitations (Woodbury and Burrow 2006). Therefore, as Akın (2001) has argued, design solutions are not usually optimal, but only satisfactory, according to a preset level of aspiration. In parametric design, once the rule algorithms are established, a large number of design alternatives may be generated (see Fig. 4, for instance). This rapid design prototyping can significantly expand the design possibilities and widen the designer’s thinking. Further, designers do not need to predetermine or fixate on any solutions at early stages (Hernandez 2006; Holland 2011; Karle and Kelly 2011), which allows the maximum potential to be maintained in the process, until the final design solution is achieved and evaluated. Parametric design offers not only a new computational design tool but also a new way of design thinking. Generally, design is an abduction process in which both the final artifacts and their behaviors are designed or defined (Dorst 2011). In parametric design, Yu et al. (2015a) suggested that designers’ behavioral patterns shift between two levels of activities – the design-knowledge level and the rule-algorithm level (Fig. 5). During a typical parametric design process, there are design activities that occur on both levels. At the design-knowledge level, architects make use of their design knowledge, including, for example, how to adapt a building to the site, how to shape the way people use the building, and how to satisfy other requirements of their clients. 
At the rule-algorithm level, designers apply design knowledge through the operations of various parametric design features, including defining the rules and their logical relationships in the design, choosing the parameters suitable for a particular purpose, and importing external data into the system to direct and

Fig. 5 Two types of design spaces in a parametric design environment

evaluate the generation. During the parametric design process, designers progress by applying specialist knowledge, and in some parts (namely, at the rule-algorithm level), they apply design knowledge indirectly through rule algorithms. As observed by Yu et al. (2015a), designers often start from the design-knowledge level; then, as the design proceeds, their activities at the rule-algorithm level can increase, while those at the design-knowledge level can decrease but do not cease. This implies that, although rule algorithms are the focus of parametric design and offer many advantages, such as support for design flexibility (Fischer et al. 2003) and a capacity to deal with design complexity (Leach 2008), design knowledge still appears to be essential during parametric design. These preliminary findings by Yu et al. (2015a) have implications for the practice and teaching of parametric design in architecture. Although training in programming/scripting skills is of course important, training in design thinking is also vital. As designers substitute rule algorithms for design knowledge during parametric design, understanding how design knowledge is expressed in rule algorithms is important. It has been noted that, in terms of high-level design thinking, designers’ cognitive behavior does not significantly vary because of the computational tools adopted (Yu et al. 2013).

Limitations of Parametric Design

Parametric design has been widely practiced in recent years, and its characteristics, such as design flexibility and controllability, have rapidly increased its popularity (Barrios 2005; Fischer et al. 2005; Salim and Burry 2010). However, a number of limitations have been reported. Firstly, parametric design involves relatively high acquisition and implementation costs. Some large-scale and more complex tasks also require additional and specialized computing power. Therefore it can be difficult for small-scale design firms to adopt parametric design. Secondly, a typical challenge of using parametric design is how to select appropriate variations. Peña and Parshall (2001) have argued that problem finding is the most important part of a design process, and that only by defining problems appropriately can problems be

solved. In parametric design, defining variations is critical to representing design concepts. Variations support design flexibility and the possibility of developing ideas in parallel. Designers benefit from having multiple variations, but only when the system can adequately and appropriately capture and represent the problems. For example, some design factors such as Energy Conservation Index (ECI), budget, architectural building codes, and structural requirements can be quite easily quantified and integrated into the parametric system. However, other factors such as aesthetics, historical context, and social interaction are difficult to measure and represent in the system. As a result, architects’ critical thinking (by applying the relevant design knowledge) is still important even with the use of parametric design tools. Thirdly, parametric design without careful implementation can sometimes lead to excessive or unnecessary complexity. Over-pursuing forms and appearances can make some parametric design projects appear superficial and lacking in contextualization. After all, for architecture, complex form should come from the complex needs and behaviors of people rather than from the pure urge for “bizarre” appearances (Leach 2008).
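The distinction between quantifiable and non-quantifiable factors can be made concrete with a small Python sketch that enumerates variations and keeps only those meeting numeric constraints. Every name and number here (`floor_area`, `estimated_cost`, the cost rate, the area, height, and budget limits) is an invented assumption for illustration, not data from the studies cited.

```python
from itertools import product


def floor_area(width, depth, floors):
    """Gross floor area in square metres for a simple box-shaped building."""
    return width * depth * floors


def estimated_cost(width, depth, floors, cost_per_m2=2000.0):
    """Hypothetical cost model: a flat rate per square metre of floor area."""
    return floor_area(width, depth, floors) * cost_per_m2


# Parameter ranges (assumed values) that span the set of variations.
widths = [10, 15, 20]
depths = [10, 20]
floor_counts = [2, 4, 6]

variations = [
    {"width": w, "depth": d, "floors": f}
    for w, d, f in product(widths, depths, floor_counts)
]

# Quantifiable constraints (all assumed): brief, height limit, budget.
MIN_AREA = 1000.0
MAX_FLOORS = 4
BUDGET = 3_000_000.0


def is_feasible(v):
    """Check only the factors that can be expressed as numbers."""
    return (
        floor_area(**v) >= MIN_AREA
        and v["floors"] <= MAX_FLOORS
        and estimated_cost(**v) <= BUDGET
    )


feasible = [v for v in variations if is_feasible(v)]
print(len(variations), "variations,", len(feasible), "feasible:", feasible)
```

Nothing in `is_feasible` can express aesthetics, historical context, or social interaction, so the surviving variations still have to be judged by the designer.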

Conclusion

As one of the most frequently applied computational and generative design systems, parametric design plays an important role, especially in various leading architectural practices. This chapter has presented the theoretical development of parametric design, discussing its connection to generative design in general, together with a historical review and a basic introduction of the key concepts and applications. Parametric design has a significant impact on architecture. Firstly, parametric design provides a formal framework and computational means for generating and managing complex and dynamic building forms that could not be easily and adequately achieved with traditional CAD tools. This has opened up intellectual debates about new aesthetics, styles, and movements in contemporary architecture. Secondly, parametric design promotes collaboration. With parametric design tools, as well as the integration of other specialized tools for performance analysis, we have witnessed increasing and more integrated teamwork between architects and other experts for design optimization. Thirdly, through defining rule algorithms, parametric design adds and controls rationality in architectural design, which can assist designers in making more informed decisions to achieve superior outcomes. From the designer’s perspective, parametric design tools provide architects with opportunities for designing at both the design-knowledge and rule-algorithm levels, which opens up many new possibilities, as discussed throughout the chapter. At the same time, new challenges also emerge. First and foremost, the role of architects is changing, such that they need to be both architects and programmers/scripters. The fact that architects can directly encode designs in computer languages can be both efficient and constraining. Constantly switching between design-knowledge and rule-algorithm activities during parametric design can also affect an architect’s

design cognition, which should be carefully considered when designing parametric systems and facilitating them in practice and in education.

Cross-References

Shape Grammars: A Key Generative Design Algorithm

References

Abdelsalam M (2009) The use of the smart geometry through various design processes: using the programming platform (parametric features) and generative components. Proceedings of the Arab Society for Computer Aided Architectural Design (ASCAAD 2009), Manama, Kingdom of Bahrain, pp 297–304
Akın Ö (2001) Variants in design cognition. In: Eastman C, Newstetter W, McCracken M (eds) Design knowing and learning: cognition in design education. Elsevier Science, Oxford, pp 105–124
Almusharaf AM, Elnimeiri M (2010) A performance-based design approach for early tall building form development. Proceedings of the Arab Society for Computer Aided Architectural Design (ASCAAD 2010), Fez, Morocco, pp 39–50
Baerlecken D, Martin M, Judith R, Arne K (2010) Integrative parametric form-finding processes. Proceedings of the 15th international conference on computer aided architectural design research in Asia (CAADRIA), Hong Kong, pp 303–312
Barrios C (2005) Transformations on parametric design models. Proceedings of the 11th international conference on computer aided architectural design futures (CAAD Future), Vienna, Austria, pp 393–400
Batty M (1997a) Cellular automata and urban form: a primer. J Am Plan Assoc 63(2):266–274
Batty M (1997b) Urban systems as cellular automata. Environ Plan B Plan Des 24:159–164
Bernal M (2011) Analysis model for incremental precision along design stages. Proceedings of the 16th international conference on computer aided architectural design research in Asia (CAADRIA), Newcastle, Australia, pp 19–18
Blum C, Li X (2008) Swarm intelligence in optimization. In: Blum C, Merkle D (eds) Swarm Intelligence. Springer, Berlin, pp 43–85
Boden M, Edmonds E (2007) What is generative art. Digit Creat 20(1):21–46
Burry M (2003) Between intuition and process: parametric design and rapid prototyping. In: Kolarevic B (ed) Architecture in the digital age – design and manufacturing. Spon Press, New York/London, pp 149–162
Burry J, Holzer D (2009) Sharing design space: remote concurrent shared parametric modeling. Proceedings of 27th eCAADe conference, Istanbul, Turkey, pp 333–340
Cárdenas CA (2007) Modeling strategies: parametric design for fabrication in architectural practice. Dissertation, Harvard University
Chakrabarti A, Shea K, Stone R, Cagan J, Campbell M, Hernandez NV, Wood KL (2011) Computer-based design synthesis research: an overview. J Comput Inf Sci Eng 11:021003–021012
Chien SF (1998) Supporting information navigation in generative design systems. Dissertation, Carnegie Mellon University
Dorst K (2011) The core of ‘design thinking’ and its application. Des Stud 32(6):521–532
Eastman CM (ed) (2008) BIM handbook: a guide to building information modeling for owners, managers, designers, engineers and contractors. Wiley, Hoboken
Eckert C, Kelly I, Stacey M (1999) Interactive generative systems for conceptual design: an empirical perspective. Artif Intell Eng Des Anal Manuf 13:303–329

Fasoulaki E (2007) Genetic algorithms in architecture: a necessity or a trend? Proceedings of 10th generative art international conference. Milan, Italy
Fischer T, Herr C (2001) Teaching generative design. Proceedings of 4th generative art international conference. Milan, Italy
Fischer T, Burry M, Frazer J (2003) Triangulation of generative form for parametric design and rapid prototyping. Proceedings of 21st eCAADe conference, Graz, Austria, pp 441–448
Fischer T, Burry M, Frazer J (2005) Triangulation of generative form for parametric design and rapid prototyping. Autom Constr 14(2):233–240
Frazer J (2016) Parametric computation: history and future. Archit Des 86(2):18–23
Gane V, Haymaker J (2009) Design scenarios: methodology for requirements driven parametric modeling of high-rises. Proceedings of the 9th international conference (CONVR 2009), Sydney, Australia, pp 79–90
Garnier S, Gautrais J, Theraulaz G (2007) The biological principles of swarm intelligence. Swarm Intell 1:3–31
Gero JS (1996) Creativity, emergence and evolution in design. Knowl-Based Syst 9(7):435–448
Gero JS, Kazakov V (1996) An exploration-based evolutionary model of generative design process. Comput Aided Civ Infrastruct Eng 11(3):211–218
Grasl T, Economou A (2013) From topologies to shapes: parametric shape grammars implemented by graphs. Environ Plan B Plan Des 40:905–922
Hernandez CRB (2006) Thinking parametric design: introducing parametric Gaudi. Des Stud 27(3):309–324
Herr C, Kvan T (2007) Adapting cellular automata to support the architectural design process. Autom Constr 16(2007):61–69
Hillyard RC, Braid IC (1978) Analysis of dimensions and tolerances in computer-aided mechanical design. Comput Aided Des 10(3):161–166
Hnizda M (2009) Systems-thinking: formalization of parametric process. Proceedings of the Arab Society for Computer Aided Architectural Design (ASCAAD 2009), Manama, Kingdom of Bahrain, pp 215–223
Holland J (1968) Hierarchical descriptions, universal spaces and adaptive systems: technical report. University of Michigan, Ann Arbor
Holland J (ed) (1975) Adaptation in natural and artificial systems. An introductory analysis with application to biology, control, and artificial intelligence. University of Michigan Press, Ann Arbor
Holland N (2011) Inform form perform. Proceedings of ACADIA regional 2011 conference, pp 131–140
Holzer D, Hough R, Burry M (2007) Parametric design and structural optimisation for early design exploration. Int J Archit Comput 5(4):625–643
Hornby GS, Pollack J (2001) The advantages of generative grammatical encodings for physical design, 2001 IE congress on evolutionary computation. IEEE, Seoul
Jo J, Gero JS (1998) Space layout planning using an evolutionary approach. Artif Intell Eng 12(3):149–162
Karle D, Kelly B (2011) Parametric thinking. In: Proceedings of ACADIA regional 2011 conference, pp 109–113
Knight T (2003) Computing with emergence. Environ Plan B Plan Des 30:125–156
Kolarevic B (ed) (2003) Architecture in the digital age: design and manufacturing. Spon Press, New York
Krish S (2011) A practical generative design method. Comput Aided Des 43:88–100
Leach N (ed) (2008) (Im)material processes – new digital techniques for architecture. China Architecture & Building Press, Beijing
Lee JH, Ostwald M, Gu N (2016) A justified plan graph (JPG) grammar approach to identifying spatial design patterns in an architectural style. Environ Plann B: Urban Analytics City Sci 45(1):67–89
Light R, Gossard D (1982) Modification of geometric models through variational geometry. Comput Aided Des 14(4):209–214

Maher ML (1990) Process models for design synthesis. AI Mag 11:49–58
Maher A, Burry M (2003) The parametric bridge: connecting digital design techniques in architecture and engineering. Proceedings of the 2003 annual conference of the association for computer aided design in architecture, Indianapolis, Indiana, pp 39–47
Monedero J (2000) Parametric design: a review and some experiences. Autom Constr 9(4):369–377
Ostwald M (2012) Systems and enablers: modeling the impact of contemporary computational methods and technologies on the design process. In: Gu N, Wang X (eds) Computational design methods and technologies: applications in CAD, CAM and CAE education. IGI Global, Pennsylvania, pp 1–17
Oxman R (1990) Design shells: a formalism for prototype refinement in knowledge-based design systems. Artif Intell Eng 5(1):2–8
Oxman R (2006) Theory and design in the first digital age. Des Stud 27(3):229–265
Parish Y, Müller P (2001) Procedural modeling of cities. Proceedings of the 28th annual conference on computer graphics and interactive techniques. ACM, New York, pp 301–308
Peña W, Parshall S (eds) (2001) Problem seeking: an architectural programming primer. Wiley, New York
Rajus VS, Woodbury R, Erhan H, Riecke BE, Mueller V (2010) Collaboration in parametric design: analyzing user interaction during information sharing. Proceedings of the 30th annual conference of the Association for Computer Aided Design in Architecture (ACADIA), New York, pp 320–326
Roberto C, Hernandez B (2004) Parametric Gaudi. Proceedings of the 8th Iberoamerican congress of digital graphics (SIGraDi 2004), Porto Alegre, Brazil, pp 213–215
Salim F, Burry J (2010) Software openness: evaluating parameters of parametric modeling tools to support creativity and multidisciplinary design integration. Proceedings of computational science and its applications (ICCSA 2010), Berlin and Heidelberg, Germany, pp 483–497
Sarkar P (2000) A brief history of cellular automata. ACM Comput Surv 32(1):80–107
Schlueter A, Thesseling F (2008) Balancing design and performance in building retrofitting: a case study based on parametric modeling. Proceedings of the 28th annual conference of the Association for Computer Aided Design in Architecture (ACADIA), pp 214–221
Schumacher P (2009) Parametricism – a new global style for architecture and urban design. AD Archit Des – Digi Cities 79(4):14–23
Singh V, Gu N (2011) Towards an integrated generative design framework. Des Stud 33:185–207
Stiny G, Gips J (1972) Shape grammars and the generative specification of painting and sculpture. Inf Process 71:1460–1465
Stiny G, Mitchell WJ (1978) The Palladian grammar. Environ Plan B 5:5–18
Tching J, Reis J, Paio A (2016) A cognitive walkthrough towards an interface model for shape grammar implementations. Comput Sci Inf Technol 4:92–119
Von Neumann J (1951) The general and logical theory of automata. In: Taub AH (ed) John von Neumann, collected works. Pergamon Press, New York, pp 280–326
Wolfram S (1986) Random sequence generation by cellular automata. Adv Appl Math 7:123–169
Woodbury R (ed) (2010) Elements of parametric design. Routledge, New York
Woodbury RF, Burrow AL (2006) Whither design space? Artif Intell Eng Des Anal Manuf 20(2):63–82
Woodbury R, Aish R, Kilian A (2007) Some patterns for parametric modeling. Proceedings of the 27th annual conference of the association for computer aided design in architecture, Halifax, Nova Scotia, pp 222–229
Yu R, Gero JS, Gu N (2013) Impact of using rule algorithms on designers’ behavior in a parametric design environment: preliminary results from a pilot study. Proceedings of the 15th International conference on computer aided architectural design futures (CAAD FUTURES 2013), Shanghai, China, pp 13–22

Yu R, Gu N, Ostwald M, Gero J (2015a) Empirical support for problem-solution co-evolution in a parametric design environment. Artif Intell Eng Des Anal Manuf (AIEDAM) 29(01):33–44
Yu R, Ostwald M, Gu N (2015b) Parametrically generating new instances of traditional Chinese private gardens that replicate selected socio-spatial and aesthetic properties. Nexus Netw J 17(3):807–829

Shape Grammars: A Key Generative Design Algorithm

52

Ning Gu and Peiman Amini Behbahani

Contents
Introduction
Background
Basic Shape Grammars
  Main Components of a Shape Grammar
  Shape Grammar Application
  Designing a Shape Grammar
Extensions of Basic Shape Grammars
  Parallel Grammars
  Parametric Grammars
  Graph Grammars
  Further Discussion on the Extensions
Applications of Shape Grammars
  Description and Analysis
  Reproduction and Generation
  Optimization and Customization
  Combination with Other Methods
Implementation of Shape Grammars
Shape Grammar and Other Generative Design Algorithms
Discussion and Conclusion
References

Abstract Shape grammars are one of the main generative design algorithms. The theories and practices of shape grammars have developed and evolved for over four

N. Gu · P. Amini Behbahani
School of Art, Architecture and Design, University of South Australia, Adelaide, South Australia, Australia
© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_7

decades and have had a significant impact on design computation and contemporary architecture. The formal computational approach to generative design as specified in shape grammars, and the novel descriptions and applications of shapes and shape rules for representing and composing a design, have become the foundation and inspiration for many contemporary computational design methods and tools, especially parametric design, which is a current leading computational design method. This chapter gives an overview of the historical developments and applications of shape grammars. The algorithm is introduced by highlighting the background, the key components and procedures for design generation, the methods and issues for authoring shape grammars, shape grammar evolution and extension, the purposes of shape grammar application, and the implementation of shape grammars. The characteristics of shape grammars are presented and discussed by comparing them to other key generative design algorithms, some of which have been applied in conjunction with shape grammars. This chapter shows that shape grammars have significant potential in design generation, analysis, and optimization, as seen in many of the grammar studies. Future directions should focus on further research, improved pedagogy, and validation in design practice, to further advance the field.

Keywords Shape grammars · Shape rule · Generative design algorithm · Design generation · Design analysis

Introduction

Generative design algorithms, as an important part of computational design research, are used to formulate or automate parts of the design process. They are intended to assist the human designer to explore the design space more formally and effectively (Singh and Gu 2011). They add to the traditional design philosophy and methodology by enabling designers to systematically manipulate, engage with, and interact with design (McCormack et al. 2004). Shape grammars have been one of the main computational approaches to generative design for about four decades. Emerging during the epoch of Noam Chomsky’s structuralist ideas in the 1960s and 1970s, they have since been developed for various design applications, including stylistic analysis, design generation, and customization, and have also been used in conjunction with other generative algorithms for more optimal results. Shape grammars and their extensions have been used in various creative fields, ranging from architecture, where the earliest examples were demonstrated, to industrial design and visual arts such as painting and textile design, and even extending to music, animation, and computer languages. This chapter aims to provide an overview of shape grammars, highlighting various aspects of this key generative design algorithm. The chapter begins with the background to grammar development. It proceeds by introducing the main

components and procedures of the algorithm. Further, it introduces the extensions of shape grammars. A number of noteworthy examples of shape grammar applications are then presented to demonstrate the algorithm. The chapter concludes by summarizing the current developments and future directions of shape grammars.

Background

After World War II, a movement started that criticized Modernism for an excessive functionalism in modernist architecture (Argan 1996). One of the main arguments of this movement, called typology, valued the continuity and integration of premodernist architecture and urbanism. With typology, designers started to develop and apply the abstraction of classic shapes, forms, and their combinations, which were labeled as types (Colquhoun 1996). Although typology was efficient in describing features of a style, it fell short of providing a mechanism for generating instances under the style (Duarte 2001). In parallel and within the structuralist school of linguistics research, Chomsky (1965) proposed that humans possess an innate mechanism (called Language Acquisition Device or LAD) which includes basic grammatical features and enables infants to quickly learn the grammatical syntax of their language. He called these hard-wired features universal grammars (UG). Chomsky hypothesized that UG provide a set of rules for generating different syntactic possibilities in all human languages. The generative capability of UG allows a language to produce an infinite number of grammatically correct sentences and phrases out of a limited vocabulary, relative to each language. One of the basic rules of UG is transformation: the components of the sentence (or their positions) are transformed to generate new sentences while preserving the correct grammatical structure of the sentence. The major innovation of LAD and UG was that they opposed the behaviorist view that humans learn language through interaction and inference by trial and error (Chomsky 1965). While there have since been rigorous debates (Everett 1993; Tomasello 2009) on the extent or even existence of UG within linguistics and cognitive sciences, the UG hypothesis has created the prospect of structuring the generative process or algorithm. One of the areas where UG has been adopted is design, and this concept has proved to be extremely useful in furthering design studies on typology. In this context, Stiny and Gips (1972) first applied the core concept of UG in design to develop shape grammars. A generative system with transformation rules was demonstrated, which could generate geometrical shapes through the application of the rules. The premise of such design grammars was that, similar to language, there are abstract transformation rules which can generate numerous design instances out of a finite set of components. Each set of rules can capture a particular style or a language of similar designs by formally representing and reproducing them. Stiny first demonstrated a shape grammar for generating patterns for acrylics on canvas, as reported by Stiny and Gips (1972). This was followed by more complex shape grammars for generating lattice patterns for Chinese traditional


N. Gu and P. Amini Behbahani

Fig. 1 Selected processes of the Palladian grammar application for generating the ground floor plan of La Malcontenta

windows (Stiny 1977) and for generating the ground plans of Palladian villas (Stiny and Mitchell 1978) (Fig. 1).

Basic Shape Grammars A shape grammar is generative in nature, which provides a formal and recursive framework to rapidly produce designs within a certain style (Chakrabarti et al. 2011). For this purpose, grammars encode design knowledge within a computational structure as rules. Because the design knowledge can be embedded into the grammar, users of the grammar do not need to have disciplinary expertise in order to generate a design. The application of the grammar is supported by the transformation rules that each replaces one shape with another through a set of geometrical operations. This section briefly reviews the basic components of a shape grammar and its application.

Main Components of a Shape Grammar The main components of a shape grammar are shape rules that specify a set of spatial transformations of shapes. A shape is a generic term for any set of geometrical entities containing finite sets of vertices connected by maximal lines (Stiny 1980). A shape rule is a couplet of left-hand side (LHS) and right-hand side (RHS) shapes representing the states before and after the rule application, respectively, where the RHS shape is the result of the spatial transformations of the LHS shape. For illustrating a common shape rule, the shape couplet is ordered from left to right usually separated by an arrow symbol (Fig. 2).

52 Shape Grammars: A Key Generative Design Algorithm


Fig. 2 A common shape rule containing the LHS and RHS shapes

Fig. 3 Shape rule examples based on different types of transformation: (a) addition, (b) subtraction, (c) division, (d) modification, and (e) substitution

A shape grammar contains a finite set of shapes, a finite set of symbols, as well as a finite set of rules that can be applied to transform the shapes. The final key component of a shape grammar is the initial shape that specifies the starting point of the grammar application (Stiny 1980). While the initial shape can be any shape (including void), a part or the whole of the shape should match the LHS shape of at least one shape rule so that the grammar can be initiated. After the initiation, the grammar recursively applies the rules by transforming LHS shapes that are generated in each earlier step into RHS shapes. Generally, for a grammar application to proceed, a part or the whole of the generated shapes will continue to match the LHS shape(s) of the shape rule(s); otherwise the application will be terminated. Stiny (1980) emphasizes the use of labels in specifying shapes and rules in a shape grammar, and this will be discussed in section “Shape Grammar Application.” The rules can transform shapes in various ways. The literature often refers to five common types of transformation (also called replacement format), which are addition, subtraction, division or split, modification, and substitution (Knight 1999; Trescak et al. 2009; Wonka et al. 2003). Addition rules retain the LHS shape while adding a new part to form the RHS shape. In contrast, subtraction rules remove a part from the LHS shape. In division rules, the LHS shape is divided into two or more parts while its outline remains intact. Modification rules preserve the geometrical elements of the LHS shape but modify its properties, such as proportion, orientation, or direction. Finally, in substitution rules, the RHS shape replaces the entire LHS shape, and the two shapes can differ significantly, for example, a square replaces a circle. See Fig. 3 for examples.
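The components described above (shapes, rules as LHS/RHS couplets, an initial shape, and recursive application until no LHS matches) can be sketched in code. The following is a minimal illustration under strong simplifying assumptions, not an implementation from the literature: shapes are sets of axis-aligned unit segments on an integer grid, matching is tested under translation only (a full shape grammar also allows rotation, reflection, scaling, and matching on parts of maximal lines), and the LHS is assumed to be non-empty. The names `Rule`, `find_matches`, `apply_rule`, `derive`, and `ADD_RIGHT` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Point = Tuple[int, int]
Segment = Tuple[Point, Point]
Shape = FrozenSet[Segment]


def seg(a: Point, b: Point) -> Segment:
    """Store a segment with its endpoints in sorted order."""
    return (min(a, b), max(a, b))


def square(x: int, y: int) -> Shape:
    """Unit square whose lower-left corner is at (x, y)."""
    return frozenset({
        seg((x, y), (x + 1, y)), seg((x, y + 1), (x + 1, y + 1)),
        seg((x, y), (x, y + 1)), seg((x + 1, y), (x + 1, y + 1)),
    })


def translate(shape: Shape, dx: int, dy: int) -> Shape:
    return frozenset(seg((x1 + dx, y1 + dy), (x2 + dx, y2 + dy))
                     for (x1, y1), (x2, y2) in shape)


@dataclass(frozen=True)
class Rule:
    lhs: Shape  # state before the rule application
    rhs: Shape  # state after the rule application


def find_matches(shape: Shape, lhs: Shape) -> List[Tuple[int, int]]:
    """All translations (dx, dy) that place the (non-empty) LHS inside the shape."""
    (ax, ay), _ = min(lhs)  # reference segment of the LHS
    offsets = {(sx - ax, sy - ay) for (sx, sy), _ in shape}
    return sorted(o for o in offsets if translate(lhs, *o) <= shape)


def apply_rule(shape: Shape, rule: Rule, offset: Tuple[int, int]) -> Shape:
    """Replace the matched LHS with the RHS at the same position."""
    dx, dy = offset
    return (shape - translate(rule.lhs, dx, dy)) | translate(rule.rhs, dx, dy)


def derive(initial: Shape, rule: Rule, max_steps: int) -> Shape:
    shape = initial
    for _ in range(max_steps):
        matches = find_matches(shape, rule.lhs)
        if not matches:
            break  # no LHS can be matched: the application terminates
        # Assumed selection policy: always use the right-most match.
        shape = apply_rule(shape, rule, max(matches))
    return shape


# Addition rule: keep the square and add a second square to its right.
ADD_RIGHT = Rule(lhs=square(0, 0), rhs=square(0, 0) | square(1, 0))
row = derive(square(0, 0), ADD_RIGHT, max_steps=3)
print(len(row))  # number of segments in a row of four squares
```

Because an addition rule always leaves a matching LHS behind, the derivation here is bounded by `max_steps` rather than by termination; choosing which match to use is exactly the kind of control question the next section raises.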

Shape Grammar Application A shape grammar is essentially a production system specifying a formal procedure for design generation that starts from the initial shape and proceeds by recursively transforming that shape through the shape rules. For each step of the application,


it is critical to detect and match possible LHS shapes and apply the corresponding rules until no LHS shape can be matched in the generated design. Applying a shape grammar also requires consideration of the following three issues:
• In the case where more than one LHS shape is detected, should one or some be prioritized? If so, which shape(s) and rule(s)?
• Further, how are these prioritization and selection processes facilitated, and by whom?
• During shape detection, shape grammars allow designers to freely decompose and recompose shapes. This feature is called emergence, which signifies shapes that are not predefined but emerge during the application. How should emergence be handled appropriately in a shape grammar application?
The further development of shape grammars addresses these three key concerns flexibly, with varying degrees of decidability. For example, shape grammars can be completely unrestricted, allowing multiple rules to be applied simultaneously in a flexible order; or they can be deterministic, with strict controls for both rule selection and application (Knight 1999). To manage decidability, a number of control mechanisms are used during shape grammar application to determine what rules to apply and in what order they will be applied, as well as what degree of emergence is to be enabled. This chapter categorizes the control mechanisms into three general levels: internal, external, and parallel. The internal mechanisms are essentially geometric. The shapes in the rules are defined so as to enable or prevent the application of multiple rules, to determine the order of rule application, and to anticipate possible shape emergence. We call this level internal, as it pertains only to shapes, which are the core of shape grammars. The second category is external, in which an external agent (human or computer) makes these decisions. 
The criteria for rule selection, application, and termination can be the fulfillment of certain design criteria and preferences or time, process, and scale constraints. The third category of control is parallel, which can be described as the incorporation of selected factors influencing the external agent’s decision-making within the shapes and rules. For this purpose, shapes are accompanied by nongeometrical descriptions or symbols. This is where the earlier mentioned “labels” become applicable. See Fig. 4 for examples. In a simpler form, the descriptions can be represented as various markers (symbols, labels, lateral geometrical features, etc.), which are only used for controlling the detection of LHS shapes for rule application. This approach has been most common in shape grammar development (Stiny 1980). In a more elaborate form, the descriptions can contain semantic indicators related to function or shape. An important role of these descriptions is to facilitate the detection of LHS shapes (Stiny 1981). In other words, they control shape emergence in order to effectively detect and match LHS shapes from the resultant design for grammar


Fig. 4 Two variants of descriptions using different labels: alphabets (left) and points (right)

application. Descriptions are especially important in more complex shapes where visual scanning of the design becomes difficult or unexpected shape emergence cannot be effectively managed. Figure 4 illustrates some example descriptions and demonstrates their uses. In addition to the practical uses discussed above, descriptions can also have a further effect on the development and extension of shape grammars. With descriptions, it is possible to combine nongeometrical elements with shapes. Shapes have become symbolic objects rather than pure geometries. This notion is the main cause for the emergence of several shape grammar extensions, as outlined in section “Extensions of Basic Shape Grammars.” With descriptions, the original shape grammars have also inspired the development and application of other grammars such as spatial grammar (McKay et al. 2012), design grammar (Pauwels et al. 2015), and set grammar (shapes are treated as members of a predefined set) (Garcia 2017).
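The role of labels as a control mechanism can be illustrated with a small sketch. It assumes a deliberately reduced representation in which a design is a set of grid cells (each standing for a square) and each cell carries at most one label; the label `"A"` and the functions `grow_right` and `stop` are hypothetical and are not taken from any published grammar. Without labels, a rule whose LHS is a square matches every square in the design; with the label, only one candidate remains, and erasing the label terminates the derivation.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Design = Dict[Cell, str]  # cell position -> label ("" means unlabeled)


def matches_unlabeled(design: Design) -> List[Cell]:
    """Without labels, every square is a candidate LHS match."""
    return sorted(design)


def matches_labeled(design: Design, label: str) -> List[Cell]:
    """With labels, only squares carrying the label are candidates."""
    return sorted(cell for cell, tag in design.items() if tag == label)


def grow_right(design: Design, cell: Cell) -> Design:
    """Labeled addition rule: the label moves to the newly added square."""
    x, y = cell
    new = dict(design)
    new[cell] = ""
    new[(x + 1, y)] = "A"
    return new


def stop(design: Design, cell: Cell) -> Design:
    """Termination rule: erase the label so that no further LHS is detected."""
    new = dict(design)
    new[cell] = ""
    return new


design: Design = {(0, 0): "A"}
for _ in range(3):
    (cell,) = matches_labeled(design, "A")  # exactly one labeled candidate
    design = grow_right(design, cell)

print(len(matches_unlabeled(design)), matches_labeled(design, "A"))
```

In this sketch the label does two jobs at once: it resolves the ambiguity of which square to transform and it gives the grammar an explicit way to stop.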

Designing a Shape Grammar In this subsection, common methods and procedures for developing a shape grammar are discussed. The section presents three aspects of developing a grammar: the selection of design instances (corpus) to prepare for the development, the grammar development, and, finally, evaluation.

Corpus Selection The corpus of a shape grammar is a collection of design instances which are selected out of a particular style or class. Such a corpus is only necessary for analytical grammars (grammars that aim to represent or reproduce existing styles or classes), but not for original grammars (grammars that aim to develop original


or new designs) (Pauwels et al. 2015). The corpus forms the basis for grammar development. Ideally, the grammar corpus contains all the available instances. However, this rarely happens due to the large and diverse collection of some styles. The importance of the corpus selection is that different selections may result in different grammars (Stiny 2006). The corpus selection is an inductive process (Johnson 2003) involving the extraction of common rules to generate the selected design instances. The goal of this process is to capture commonalities of the instances and to remove superficial characteristics in order to access deeper and more essential properties which underpin the generalized concepts of the style or class. Seawright and Gerring (2008) define the following cases for selecting design instances:
• Typical cases form an exemplary selection of the sample population, which largely represent the average characteristics of the style or class (Flyvbjerg 2006). However, these cases are not necessarily the richest in information.
• Diverse cases retain maximum variation in different dimensions (Seawright and Gerring 2008), which enables information to be extracted from widely different situations (Flyvbjerg 2006).
• Extreme and deviant cases hold unusual or surprising values compared to other cases. They are usually used for enhancing and evaluating understandings about the main characteristics (Flyvbjerg 2006).
• Influential cases are those whose analysis may affect the overall understanding about the style or class, disregarding the rest of the cases.
• Most similar and most different cases are those with the most similar or most distinct features and values when compared to each other.

Shape Grammar Development Shape grammar development is a logical argumentation process based on shapes and their operations. It requires multiple rounds of axiom-based modeling and analysis, trial and error, and so on. Duarte (2001) proposes a four-step process for developing a shape grammar based on an existing corpus. It includes rule inference, prototyping, reverse grammar, and testing:
• In rule inference, the corpus is analyzed from different perspectives to reveal the spatial or form-related structures of the individual instances.
• Prototyping accounts for finding common features of the individual structures identified from the previous step.
• Reverse grammar is about checking every instance in the corpus to test whether it is possible to trace back to the initial shape from an existing design.
• The testing step is to evaluate the success of the grammar applications in producing different designs in both the corpus and beyond. This step is further explained below.


Shape Grammar Evaluation The evaluation of a shape grammar examines both the feasibility and efficiency of the grammar. A well-designed grammar should be able to generate designs and alternatives that are consistent with the selected corpus. The grammar should also be free from errors and provide controls over unexpected design emergence. For analytical grammars, three types of tests are considered for evaluation (Stiny and Mitchell 1978). The tests aim to evaluate the abilities of the grammar to:
• Regenerate the selected corpus.
• Generate other design instances of the style that are not in the corpus.
• Generate new designs based on the style (only if the grammar’s goal is to also generate such new designs).
The evaluation of a grammar has subtle differences from the evaluation of its semantic outcomes (i.e., meeting specific design criteria). Shape grammars do not always have an internal mechanism to closely match their outcome against the design criteria (Singh and Gu 2011). Such evaluation and selection processes are usually performed after the generation, either by human designers or through other computational processes.
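The first two tests listed above can be approximated computationally by enumerating the designs a grammar generates up to a bounded number of rule applications and checking which known instances appear among them. The sketch below assumes a toy grammar in which a design is simply the (columns, rows) size of a grid plan and the two rules add a column or a row; the corpus and held-out instances are invented for illustration, and `enumerate_designs` and `coverage` are hypothetical helper names.

```python
from typing import Callable, Iterable, List, Set, Tuple

Design = Tuple[int, int]  # (columns, rows) of a toy grid plan


def successors(design: Design) -> List[Design]:
    """Two toy rules: add a column, or add a row."""
    cols, rows = design
    return [(cols + 1, rows), (cols, rows + 1)]


def enumerate_designs(initial: Design,
                      step: Callable[[Design], List[Design]],
                      max_depth: int) -> Set[Design]:
    """Every design reachable within max_depth rule applications."""
    seen = {initial}
    frontier = {initial}
    for _ in range(max_depth):
        frontier = {nxt for d in frontier for nxt in step(d)} - seen
        seen |= frontier
    return seen


def coverage(generated: Set[Design], targets: Iterable[Design]) -> float:
    """Fraction of the target instances that the grammar can generate."""
    targets = set(targets)
    return len(targets & generated) / len(targets)


generated = enumerate_designs((1, 1), successors, max_depth=4)
corpus = {(2, 1), (3, 2), (1, 4)}   # instances used to develop the grammar
held_out = {(2, 2), (4, 2)}         # same style, not used in development

print(coverage(generated, corpus))    # test 1: regenerate the corpus
print(coverage(generated, held_out))  # test 2: generate other instances
```

A bounded enumeration of this kind only shows what is reachable within the chosen depth; for real shape grammars the design space is usually too large to enumerate, and membership is typically checked by attempting a derivation of each instance.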

Extensions of Basic Shape Grammars Basic shape grammars are one of the earliest applications of Chomsky’s grammars in the context of visual art and design. Since the 1980s, and on the basis of the shape grammar theorization, different developments have extended the basic shape grammars to address different needs of design generation or analysis. This section focuses on discussing three different types of extensions, including parallel grammars, parametric grammars, and graph grammars.

Parallel Grammars Additional descriptions of the shapes other than the geometries can also be specified in a shape grammar, enabling a parallel design transformation within the grammar. To facilitate this, an independent rule set can be added that corresponds and is synchronized with the respective shape grammar rules (Knight 2003). These types of shape grammars are called parallel grammars (prior to the 1990s, the term “parallel grammar” referred to an unrelated notion of a grammar being able to produce two or more shapes in each step during the application (see Stouffs 2016, “An algebraic approach to implementing a shape grammar interpreter,” pp. 329–338)) where each rule set addresses a different aspect of the design, and together they are applied simultaneously (Knight 2003). The description rules work


Fig. 5 Selected rule sets from a parallel grammar, based on Duarte’s (2001) Malagueira grammar

in the same way as the shape rules do, by replacing an LHS component with the RHS one. Descriptions in a basic shape grammar are realized using labels and symbols that attach directly to the shapes. In a parallel grammar, they can be an entirely separate rule set. The main advantage of parallel grammars is to allow a grammar to address other design requirements beyond shape or form. They can also effectively address some technical issues during shape grammar application, such as shape matching and design emergence, due to the more structured and complex system of defining design descriptions. Figure 5 demonstrates an example of parallel grammar developed for Siza’s Malagueira houses (Duarte 2001). A set of semantic rules describe functional zones, dividing a general-purpose lot into indoor and outdoor spaces, then further splitting the indoor space into public and private areas. They are applied in parallel to the shape rules to generate the outcomes (Fig. 5).
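A parallel grammar of this kind can be sketched as two rule sets keyed on the same design elements and applied in the same step. The example below loosely follows the zoning sequence described above (lot, then indoor and outdoor, then public and private), but the lot dimensions, the split fractions, and all identifiers are assumptions made for illustration and do not reproduce Duarte's actual rules.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height

# Description rule set: how a functional zone is rewritten.
DESCRIPTION_RULES = {
    "lot": ("indoor", "outdoor"),
    "indoor": ("public", "private"),
}
# Shape rule set: the fraction of the width at which the zone is split.
SHAPE_RULES = {"lot": 0.5, "indoor": 0.5}


def split_rect(rect: Rect, fraction: float) -> Tuple[Rect, Rect]:
    x, y, w, h = rect
    return (x, y, w * fraction, h), (x + w * fraction, y, w * (1 - fraction), h)


def apply_parallel(shapes: Dict[str, Rect], descriptions: Dict[str, str],
                   key: str) -> Tuple[Dict[str, Rect], Dict[str, str]]:
    """Apply the shape rule and its description rule to one element together."""
    label = descriptions[key]
    if label not in DESCRIPTION_RULES:
        return shapes, descriptions  # no rule pair matches this element
    first, second = split_rect(shapes[key], SHAPE_RULES[label])
    name_a, name_b = DESCRIPTION_RULES[label]
    new_shapes = {k: v for k, v in shapes.items() if k != key}
    new_descr = {k: v for k, v in descriptions.items() if k != key}
    new_shapes.update({key + "a": first, key + "b": second})
    new_descr.update({key + "a": name_a, key + "b": name_b})
    return new_shapes, new_descr


shapes = {"L": (0.0, 0.0, 8.0, 12.0)}
descriptions = {"L": "lot"}
shapes, descriptions = apply_parallel(shapes, descriptions, "L")   # lot
shapes, descriptions = apply_parallel(shapes, descriptions, "La")  # indoor
print(descriptions)
```

Keeping the two rule sets separate but applying them through one function is what keeps the geometry and its functional description synchronized at every step.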

Parametric Grammars Descriptions can add some levels of richness and complexity to the rigid geometries of shapes. To maximize such flexibility in a grammar, a selection of shapes and their transformations are generalized as one generic shape with parameterized features that can be adjusted according to a number of criteria and constraints (Stiny 1980). Such a grammar is called a parametric grammar.


Parameters can affect the grammar at different stages and in different forms. They can be applied during the detection of LHS shapes, during the selection of rules, or when determining the features of the RHS outcome. For example, the LHS shape of a rule can be a quadrilateral whose height-to-width ratio (i.e., the parameter) falls within a certain range. This ratio may in turn affect a corresponding ratio in the RHS shape. Parametric constraints are not necessarily numerical, as in the above example. They can be geometrical or semantic, or take any other form a description may have. Because of their flexibility, parametric grammars possess more potential and have been applied more commonly in recent development (Abdesalam 2012). In addition, they may reduce the number of rules and therefore simplify the grammar (Stavrev 2011).
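The quadrilateral example can be made concrete with a short sketch. It assumes axis-aligned rectangles described only by width and height, a hypothetical ratio range of 1.5 to 3.0, and a hypothetical RHS that stacks two half-height rectangles; the class name `SplitTallRectangle` is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float]  # (width, height)


@dataclass(frozen=True)
class SplitTallRectangle:
    """Parametric rule: the LHS is any rectangle whose height/width ratio
    lies in [min_ratio, max_ratio]; the RHS stacks two half-height parts."""
    min_ratio: float
    max_ratio: float

    def matches(self, rect: Rect) -> bool:
        width, height = rect
        return self.min_ratio <= height / width <= self.max_ratio

    def apply(self, rect: Rect) -> List[Rect]:
        if not self.matches(rect):
            raise ValueError("LHS parameter range not satisfied")
        width, height = rect
        # The ratio of each RHS part is half the ratio of the LHS shape.
        return [(width, height / 2), (width, height / 2)]


rule = SplitTallRectangle(min_ratio=1.5, max_ratio=3.0)
parts = rule.apply((2.0, 4.0))
print(parts, [rule.matches(p) for p in parts])
```

One parametric rule here stands in for what would otherwise be a separate fixed rule for every admissible proportion, which is the sense in which parameterization can reduce the number of rules.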

Graph Grammars Historically, graph grammars derive from string grammars, not Chomskyite generative grammars (Chakrabarti et al. 2011). More recently, they have been contextualized and applied within shape grammar research. Graph grammars contribute to shape grammar research through two approaches, namely, the geometric and the semantic approach. The geometric approach comes from the analogy between graph-topological and shape-geometrical vertices and edges, focusing on defining shapes or forms with graph elements. The shapes here are represented as a collection of vertices that are connected by edges. In the semantic approach, the graph’s nodes are associated with the semantics of shapes. For example, in a building, the nodes of the graph represent rooms and their associated purposes, while the edges represent the openings or adjacencies between them. Some examples of this approach are Eloy’s transformation grammar (Eloy 2012) and Lee et al.’s (2016) graph grammar of Prairie-style houses. The main incentive for incorporating graph theory into shape grammars is the simplicity and abstractness of graphs. On the one hand, graph grammars mitigate the complexities of visual shape detection. On the other hand, they represent shapes or forms in a topological manner, which may better reflect the interrelations within the design, both structurally and functionally.
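The semantic approach can be sketched with plain dictionaries: nodes carry room functions and edges carry the relation between two rooms. The rule below, which turns an adjacency between a living room and a kitchen into a door, is a hypothetical example chosen for illustration and is not taken from Eloy (2012) or Lee et al. (2016); the room identifiers and relation names are also assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

Rooms = Dict[str, str]              # node id -> room function
Edges = Dict[FrozenSet[str], str]   # unordered node pair -> relation

rooms: Rooms = {"r1": "living", "r2": "kitchen", "r3": "bedroom"}
edges: Edges = {
    frozenset({"r1", "r2"}): "adjacent",  # shared wall, no opening
    frozenset({"r1", "r3"}): "door",
}


def find_matches(rooms: Rooms, edges: Edges,
                 functions: Tuple[str, str], relation: str) -> List[Tuple[str, ...]]:
    """LHS detection: edges with the given relation between the given functions."""
    return sorted(
        tuple(sorted(pair))
        for pair, rel in edges.items()
        if rel == relation and {rooms[n] for n in pair} == set(functions)
    )


def open_wall(edges: Edges, match: Tuple[str, ...]) -> Edges:
    """RHS: the matched adjacency becomes a door; the nodes are unchanged."""
    new_edges = dict(edges)
    new_edges[frozenset(match)] = "door"
    return new_edges


matches = find_matches(rooms, edges, ("living", "kitchen"), "adjacent")
new_edges = open_wall(edges, matches[0])
print(matches, new_edges[frozenset({"r1", "r2"})])
```

Matching here is a lookup over labeled nodes and edges rather than a geometric search, which illustrates why graph representations can sidestep some of the difficulty of visual shape detection.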

Further Discussion on the Extensions The three extensions introduced above can be combined to create more powerful grammars. For example, it is possible to have a parallel parametric grammar that includes a rule set defined by graph elements (see Eloy 2012, the transformation grammar definition). These are only three typical extensions, each contributing to aspects of the manageability, flexibility, and intelligibility of shape grammars. Other important extensions of shape grammars include:
• Color grammar (Knight 1989), which is a parallel grammar that uses colors as descriptive elements. A major difference between color and symbolic descriptions is that colors can be combined together to form a more visually rich design. They can also represent design attributes other than the actual colors, for example, materials, building elements, and so on.
• Sortal grammar (Stouffs and Krishnamurti 2001), which abstracts shapes (or descriptions) as sorts to optimize the manageability and flexibility of the grammar.
• Discursive grammar (Duarte 2001), which is a parallel parametric grammar with additional heuristics for managing rule selection. The heuristics are high-level non-procedural semantic descriptions (e.g., design goals or criteria), which can act as parameters for applying and assessing rules.
There are different terminologies regarding grammar developments and their application domains. In general, the most abstract levels of grammars are called formal or structure grammars, in which the vocabulary (LHS and RHS components) can be of any kind of structure. The next level pertains to grammars with specific types of basic vocabularies. For example, basic shape grammars have vocabularies consisting of pure geometrical elements, graph grammars involve graph representations, and the vocabulary of a motion grammar includes patterns of movement. Beyond visual art and design, developments such as motion or music grammars have extended the logic of shape grammars into the respective new disciplines. Hence, shape grammars may be called design grammars in this regard. Figure 6 shows the chronology of the shape grammars and their related developments.

Applications of Shape Grammars Shape grammars are commonly known for being a type of generative design method. However, generation is not the only application of shape grammars. McKay et al. (2012) consider four major areas where shape grammars can be applied, based on a survey by Gips (1999): parsing, development, generation, and inference. Considering other studies (Huang et al. 2009; Özkar and Stiny 2009; Singh and Gu 2011), this section adapts the four areas as follows.

Description and Analysis Originally, shape grammars were developed for describing paintings and sculptures (Stiny and Gips 1972). Their aim was to understand and abstract the form-related structure of a certain style or from a particular era (Huang et al. 2009). This application was then extended to, and commonly applied in, architecture. Shape grammars have been used to study a wide range of architectural styles or related designs, such as Palladian villas (Stiny and Mitchell 1978), Turkish traditional houses (Çağdaş 1996), Safavid caravanserais (Andaroodi et al. 2006), and Glenn Murcutt’s domestic architecture (Lee et al. 2016).


Fig. 6 A diagram depicting the chronology of shape grammars and their related developments

Reproduction and Generation As a generative design method, shape grammars can be used to automate parts, or all, of the design process. Specifically, shape grammars are useful when dealing with modularity, due to their articulated nature (Özkar and Stiny 2009). They are also efficient at reducing costs in the mass customization of design (Duarte 2001), because their embedded design knowledge reduces the need for human expertise during the generation process. Through defined shapes and rules, shape grammars also retain the consistent compositions that grant a brand identity or design style (Özkar and Stiny 2009).


Fig. 7 A schema of shape grammar application for Prairie-style houses, figure adapted by Amini Behbahani (2016) based on Koning and Eizenberg’s original (1981)

We have discussed the distinction between analytical and original grammars (Pauwels et al. 2015). Analytical grammars are developed based on a corpus of existing designs, while original grammars are developed from scratch to generate novel designs. So far, a significant number of shape grammar examples have been of the analytical type. Many of them have not been intended to create a complete or practical design, but to demonstrate the generative possibilities. Examples of analytical grammars are shape grammars for Wright’s Prairie-style houses (Koning and Eizenberg 1981) (Fig. 7), coffee-making machine (Agarwal and Cagan 1998), Siza’s Malagueira houses (Duarte 2001), and Harley-Davidson motorcycles (Pugliese and Cagan 2002). Examples of original grammars are relatively rare. One reason may be that shape grammar research has focused more on design style, methodology, and philosophy rather than specific issues of practice (Krish 2011). Another possible reason is that, even for the purposes of generating original designs, there are always existing precedents that are useful as the foundation of the grammar and new design instances. There are, however, some examples of original grammars, including the planar truss grammar (Sass et al. 2005) (Fig. 8).


Fig. 8 Selected rules and a possible design outcome from a shape grammar for generating space trusses (Sass et al. 2005), adapted by the authors

Fig. 9 A demonstration of transformation grammar (on shapes and graphs in parallel): two selected rules and current configurations in plan and graph (above); optimized configurations by applying the rules (below), with changes highlighted in circles on both plan and graph (Eloy 2012), adapted by the authors

Optimization and Customization One of the generative purposes of shape grammars is to support designers in exploring and optimizing generated design solutions. This is especially beneficial in education for understanding design alternatives and possibilities (Özkar and Stiny 2009). A typical example is the transformation grammar (Eloy 2012) for modifying and upgrading apartments in Lisbon (Fig. 9).


Combination with Other Methods Gips (1999) suggests that a shape grammar can be used for extracting or developing another shape grammar, a use he refers to as inference. Moreover, shape grammars have been used in conjunction with other generative methods, such as cellular automata (Speller et al. 2007). In another attempt, Kielarova et al. (2015) demonstrate a combination of shape grammars with evolutionary processes for earring design, in which the progress of the grammar application is fine-tuned, directed, and controlled by evolutionary fitness functions.

Implementation of Shape Grammars The earlier grammars were implemented manually. However, most current grammars are applied through digital means. This is likely due to the increased complexity of more recent shape grammars and the digital nature of contemporary design practice. The implementation of shape grammars has always been feasible through interpreters (Gips 1999), which are software packages that enable the design or application of grammars. Interpreters are often plug-ins within CAD/BIM software or stand-alone programs with possible interactions with CAD/BIM software. Earlier interpreters were stand-alone tools with a basic interface. Since the 1990s, selected interpreters have been developed as plug-ins or scripts in commercial CAD packages (e.g., AutoCAD and Rhino) (Gips 1999). These interpreters address various aspects of grammar implementation, including capabilities of 2D or 3D, bitmap or vector-based shape detection, parametrization, graph grammar incorporation, etc. The development of new interpreters, or new features of interpreters, is likely to continue into the future for more efficient and effective implementations. There are limitations regarding shape grammar implementation that require further research. These limitations are both technical and, more importantly, behavioral. In one of the earliest reviews of interpreters, Gips (1999) argues that there is a gap between the developers and users of grammars. The developers are usually advanced in symbolic thinking and computer programming, while the users excel in visual thinking and designing. As a result, the main problem of most interpreters is that they do not adequately address the behavioral needs of both groups. Gips further correlates this issue to the fact that most interpreters are prototypes for demonstration and proof of concept and the available industrial and commercial tools are scarce. 
The core of this problem still exists today, despite dozens of grammar interpreters having since emerged for either general purposes or specific contexts. The lack of current CAD integration is also limiting shape grammar implementation (Chakrabarti et al. 2011). With the rapid development of contemporary CAD programs, the development of new interpreters does not seem to pick up the same pace and correspond to the latest CAD tools or methods. The existing grammar


interpreters also do not closely resemble the interfaces and project workflows of the common CAD tools with which the designers are more familiar (Tching et al. 2016). Subsequently, because of the lack of commercialization, they can also lack the interactivity and appeal often expected by designers. Another limitation regarding shape grammar implementation (and the grammar itself) is that it does not adequately facilitate the reframing of design alternatives (Pauwels et al. 2015). In practice, designers may drastically move from one idea to another, but such exploratory flexibility is not feasible in most interpreters and shape grammars.

Shape Grammar and Other Generative Design Algorithms A generative design algorithm (GDA), or simply generative algorithm, refers to any of various computational algorithms that assist in exploring the design space, usually by outlining a formal framework with principles and processes for generating numerous alternatives. (The acronym GA is also used for generative algorithms; this chapter uses GDA to avoid confusion with the more common use of GA for genetic algorithms.) These algorithms vary significantly in their components, procedures, criteria, and efficiency. Shape grammars are one of these algorithms, and this section discusses several other main algorithms in relation to shape grammars. GDAs can be approached and classified in various ways. In this chapter, we categorize them into three types, based on their identifiable approaches to design space exploration. These three types of GDAs are replacement, evolution, and agent interaction. In the replacement type, a part of the design is replaced with another to generate new alternatives, usually based on certain rules. Shape grammars belong to this type of GDA. Another replacement type of GDA is the L-system (or Lindenmayer system), which shares similar origins with shape grammars (namely, formal grammars from the 1960s) (Singh and Gu 2011). Unlike shape grammars, the components in L-systems are mainly symbols (or symbolic geometries). Another difference is that L-systems predominantly feature multiple rule selection in each step. These two differences make L-systems suitable for design situations with small components repeating in patterns over a relatively large extent. Examples of such situations include plant modeling and street design (Parish and Müller 2001). Another GDA that can be loosely put under this type is parametric combination or associative geometry (Hernandez 2006). 
In parametric combination, the replacement affects features such as the proportions and dimensions of the design rather than the entire entity of shapes or components (as in basic shape grammars). Under this method, a design is defined with its components parametrically associated with each other. Once a parameter changes, the dependent components will also change automatically, resulting in new design alternatives. The algorithms discussed above share a limitation: they do not have a native mechanism to evaluate the effectiveness of the outputs (Duarte 2001; Singh


and Gu 2011). This often necessitates a more elaborate approach in developing the rules by incorporating foreknowledge for quality control. The second type of GDA, evolution, is theorized based on its biological inspirations. The essential feature here is that the design alternatives are matched against a set of fitness functions in each step; the next step continues from the “fittest” alternatives for maximal optimization. Genetic algorithms are the most typical example of this type of GDA, supported by an evolutionary mechanism. For the third type of GDA, agent interaction, a society of agents navigate through the design space and generate design alternatives based on their interactions with the context and other related elements. Common examples of this type are cellular automata, swarm intelligence (Singh and Gu 2011), and space colonization (Runions et al. 2007). This type of GDA is usually a bottom-up process, where the useful outcomes emerge from the visual or spatial and functional congruence of different components. An obvious advantage of such a method is that the generated outcomes can be closely aligned to the requirement and context, minimizing the need for post-design evaluation. On the other hand, they can be visually or spatially restricted because their design space is geometrically limited (e.g., rigid grids for cellular automata). The above categorizations are not mutually exclusive. In other words, it is possible to have a GDA with features from two or even three different types. For example, in the development of shape grammars, evolutionary processes have been incorporated to better direct and evaluate design generation. These examples include shape grammars for Coca-Cola bottles (Ang et al. 2006), earring design (Kielarova et al. 2015), Prairie house validation (Granadeiro et al. 2013), and steel section evaluation (Franco et al. 2014).
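The contrast drawn earlier in this section between shape grammars and L-systems (symbolic components, and multiple rules applied in every step) can be seen in a few lines of code. The sketch uses Lindenmayer's well-known two-symbol "algae" system as a standard textbook example; it does not come from this chapter, and the function name `rewrite` is an assumption.

```python
from typing import Dict


def rewrite(axiom: str, rules: Dict[str, str], steps: int) -> str:
    """Apply the production rules to every symbol in parallel, `steps` times."""
    current = axiom
    for _ in range(steps):
        # Every symbol is rewritten in the same step; symbols without a
        # rule are copied unchanged.
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current


ALGAE_RULES = {"A": "AB", "B": "A"}
for n in range(5):
    print(n, rewrite("A", ALGAE_RULES, n))
```

Because every symbol is rewritten in every step, there is no match selection to control, which is one reason L-systems suit large repetitive patterns, whereas a shape grammar step typically transforms one detected LHS shape at a time.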

Discussion and Conclusion

This chapter has presented shape grammars, including the key components, the procedures of design generation, the main applications and extensions, as well as their implementations. Since their introduction in the 1970s, shape grammars have evolved and extended into various forms and been applied in different disciplines. They have also demonstrated design capabilities in addressing nongeometrical and semantic issues. To further enhance the generative power and to better assist designers, research has also focused on providing flexibility and manageability during shape grammar application.

The significant potential of shape grammars in design generation, analysis, and optimization is evident, as seen in many of the grammar studies. A key to such success is perhaps their visual basis (shapes and their spatial transformations as specified in shape rules), which makes shape grammars more suitable for, and more easily understood by, visual thinkers such as architects and product designers. In addition, their generative capabilities (especially when empowered by computation) make them very suitable and useful for mass design tasks. Despite this promise, shape grammars as a generative algorithm have limitations, as discussed throughout the chapter. Further, the theories and

52 Shape Grammars: A Key Generative Design Algorithm

applications of shape grammars have not yet been broadly utilized in practice. Broader adoption will require further research as well as validation in design practice.

The last decade has especially seen a surge in studies that combine shape grammars with either other grammars or other computational methods, to address different design needs. The generative nature of shape grammars, and the formal approaches to design representation and generation as specified in shape grammar research, have become the foundation and inspiration for contemporary computational design methods and tools, most notably parametric design, which has been widely adopted in leading architectural practices. It is perhaps in this regard that shape grammar research has demonstrated its most significant impact on design computation and contemporary architecture. We expect these hybrid developments and applications will continue and further increase into the future.

Parallel to research and practice, pedagogy is another important future issue of shape grammars. Compared to research and practice, shape grammar pedagogy has often been overlooked and has rarely been recorded and debated in the literature. Further developing shape grammars, and applying their generative power effectively and broadly in the design industry, will rely on the future generation of designers and design students. Besides generation, the roles of shape grammars, especially in design analysis and optimization, have been largely unexplored in design education. They have the potential to provide novel approaches for teaching design students about design history and science topics. Developing these three key pillars – research, practice, and pedagogy – equally is important for the future of shape grammars, and each will critically inform the others.

References

Abdesalam MM (2012) The use of smart geometry in Islamic patterns. CAAD | INNOVATION | PRACTICE 6th international conference proceedings of the Arab Society for computer aided architectural design, pp 49–68
Agarwal M, Cagan J (1998) A blend of different tastes: the language of coffee makers. Environ Plan B: Urban Analytics City Sci 25(2):205–226
Amini-Behbahani P (2016) Spatial properties of Frank Lloyd Wright’s prairie style: a topological analysis. Dissertation, University of Newcastle, Australia
Ang MC, Chau HH, McKay A, De Pennington A (2006) Combining evolutionary algorithms and shape grammars to generate branded product design. In: Gero JS (ed) Design computing and cognition ’06, pp 521–540
Andaroodi E, Andres F, Einifar A, Lebirge P, Kando N (2006) Ontology-based shape-grammar schema for classification of caravanserais: a specific corpus of Iranian Safavid and Ghajar open, on-route samples. J Cult Herit 7:312–328
Argan GC (1996) Typology of architecture. In: Nesbitt K (ed) Theorizing a new agenda for architecture: an anthology of architectural theory 1965–1995. Princeton Architectural Press, New York, pp 240–247
Çağdaş G (1996) A shape grammar: the language of traditional Turkish houses. Environ Plan B, Plan Des 23:443–464
Chakrabarti A, Shea K, Stone R, Cagan J, Campbell M, Hernandez NV, Wood KL (2011) Computer-based design synthesis research: an overview. J Comput Inf Sci Eng 11:021003–021012

Chomsky N (1965) Aspects of the theory of syntax. MIT Press, Cambridge, MA
Colquhoun A (1996) Typology and design method. In: Nesbitt K (ed) Theorizing a new agenda for architecture: an anthology of architectural theory 1965–1995. Princeton Architectural Press, New York, pp 248–257
Duarte J (2001) Customizing mass housing: a discursive grammar for Siza’s Malagueira houses. Dissertation, MIT
Everett DL (1993) Sapir, Reichenbach and the syntax of tense in Pirahã. Pragmat Cogn 1(1):89–124
Eloy S (2012) A transformation grammar-based methodology for housing rehabilitation: meeting contemporary functional and ICT requirements. Dissertation, TU Lisbon
Flyvbjerg B (2006) Five misunderstandings about case-study research. Qual Inq 12:219–245
Franco JMS, Duarte J, Batista EM, Landesmann A (2014) Shape grammar of steel cold-formed sections based on manufacturing rules. Thin-Walled Struct 79:218–232
Garcia S (2017) Classifications of shape grammars. In: Gero JS (ed) Design computing and cognition ’16. Springer, Cham, pp 229–248
Gips J (1999) Computer implementation of shape grammars. Workshop on shape computation, MIT 1999
Granadeiro V, Pina L, Duarte J, Correia JR, Leal VMS (2013) A general indirect representation for optimization of generative design systems by genetic algorithms: application to a shape grammar-based design system. Autom Constr 35:374–382
Hernandez CRB (2006) Thinking parametric design: introducing parametric Gaudi. Des Stud 27:309–324
Huang J, Pytel A, Zhang C, Mann S, Fourquet E, Hahn M, Cowan W (2009) An evaluation of shape/split grammars for architecture. Research Report CS-2009-23, David R. Cheriton School of Computer Science, University of Waterloo, Ontario
Johnson R (2003) Case study methodology. In: Proceedings of the international conference methodologies in housing research, pp 22–24
Kielarova SW, Pradujphongphet P, Bohez ELJ (2015) New interactive-generative design system: hybrid of shape grammar and evolutionary design – an application of jewelry design. In: Tan Y, Shi Y, Buarque F, Gelbukh A, Das S, Engelbrecht A (eds) Advances in swarm and computational intelligence. ICSI 2015. Lecture Notes in Computer Science, Springer, Cham, 9140:302–315
Knight T (1989) Color grammars: designing with lines and colors. Environ Plan B, Plan Des 16:417–449
Knight T (1999) Shape grammars: six types. Environ Plan B, Plan Des 26:15–32
Knight T (2003) Computing with emergence. Environ Plan B, Plan Des 30:125–156
Koning H, Eizenberg J (1981) The language of the prairie: Frank Lloyd Wright’s prairie houses. Environ Plan B, Plan Des 8:295–323
Krish S (2011) A practical generative design method. Comput Aided Des 43:88–100
Lee JH, Ostwald M, Gu N (2016) A justified plan graph (JPG) grammar approach to identifying spatial design patterns in an architectural style. Environ Plan B: Urban Analytics City Sci 45(1):67–89
McCormack J, Dorin A, Innocent T (2004) Generative design: a paradigm for design research. In: Richmond J, Durling D, de Bono A (eds) Proceedings of futureground. Design Research Society, Melbourne, pp 156–164
McKay A, Chase S, Shea K, Chau HH (2012) Spatial grammar implementation: from theory to useable software. Artif Intell Eng Des, Anal Manuf 26:143–159
Özkar M, Stiny G (2009) Shape grammars. In: Proceedings, SIGGRAPH 2009 (course)
Parish Y, Müller P (2001) Procedural modeling of cities. In: Proceedings of the 28th annual conference on computer graphics and interactive techniques, pp 301–308
Pauwels P, Strobbe T, Eloy S, De Meyer R (2015) Shape grammars for architectural design: the need for reframing. In: Celani G, Sperling DM, Franco JMS (eds) Communications in computer and information science. Springer, Berlin, pp 507–526
Pugliese MJ, Cagan J (2002) Capturing a rebel: modeling the Harley-Davidson brand through a motorcycle shape grammar. Res Eng Des 13:139–156

Runions A, Lane B, Prusinkiewicz P (2007) Modeling trees with a space colonization algorithm. In: NPH’07 proceedings of the Third Eurographics conference on natural phenomena, pp 63–70
Sass L, Shea K, Powell M (2005) Design production: constructing freeform designs with rapid prototyping. In: Digital design: the quest for new paradigms – 23rd eCAADe conference proceedings, Lisbon (Portugal) 21–24 Sept 2005, pp 261–268
Seawright J, Gerring J (2008) Case selection techniques in case study research, a menu of qualitative and quantitative options. Polit Res Q 61:294–308
Singh V, Gu N (2011) Towards an integrated generative design framework. Des Stud 33:185–207
Speller TH, Whitney D, Crawley E (2007) Using shape grammar to derive cellular automata rule patterns. Complex Syst 17:343–351
Stavrev V (2011) A shape grammar for space architecture – part II. 3D graph grammar – an introduction. In: 41st international conference on environmental system, American Institute of Aeronautics and Astronautics
Stouffs R (2016) An algebraic approach to implementing a shape grammar interpreter. In: 34th eCAADe conference, Oulu, Finland, vol 2, pp 329–338
Stouffs R, Krishnamurti R (2001) Sortal grammars as a framework for exploring grammar formalisms. Mathematics and Design ’01, Deakin University, July 2001
Stiny G (1977) Ice-ray: a note on the generation of Chinese lattice designs. Environ Plan B 4:89–98
Stiny G (1980) Introduction to shape and shape grammars. Environ Plan B 7:343–351
Stiny G (1981) A note on the description of designs. Environ Plan B, Plan Des 8
Stiny G (2006) Shape: talking about seeing and doing. The MIT Press, Cambridge, MA
Stiny G, Gips J (1972) Shape grammars and the generative specification of painting and sculpture. Inf Process 71:1460–1465
Stiny G, Mitchell WJ (1978) The Palladian grammar. Environ Plan B 5:5–18
Tching J, Reis J, Paio A (2016) A cognitive walkthrough towards an interface model for shape grammar implementations. Comput Sci Inf Technol 4:92–119
Tomasello M (2009) Universal grammar is dead. Behav Brain Sci 32:470–471
Trescak T, Rodriguez I, Esteva M (2009) General shape grammar interpreter for intelligent designs generations. Computer Graphics, Imaging and Visualization, 2009. CGIV’09, pp 235–240
Wonka P, Wimmer M, Sillion F, Ribarsky W (2003) Instant architecture. ACM Trans Graph 22:669–677

Space Syntax: Mathematics and the Social Logic of Architecture

53

Michael J. Dawes and Michael J. Ostwald

Contents

Introduction
Space Syntax and Mathematics
Spaces, Lines, and Points
Application
Conclusion
Cross-References
References

Abstract

Space syntax is the title given to a set of mathematical and computational theories and techniques for analyzing the social and cognitive characteristics of an architectural or urban plan. Several of the most famous of these techniques convert the spatial properties of a plan into a graph. Thereafter, graph theory is used to derive various measures, which are interpreted in the context of the original plan or against benchmark data for particular building types. This chapter presents an overview of the conceptual basis for space syntax and introduces three major analytical techniques: convex space analysis, axial line analysis, and intersection point analysis. Applications of these techniques are also described, along with a brief discussion of potential criticisms or limits. References cited in this chapter include the formulas and protocols needed to apply each of these techniques.

M. J. Dawes
University of New South Wales, Sydney, NSW, Australia

M. J. Ostwald
UNSW Built Environment, University of New South Wales, Sydney, NSW, Australia

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_6

Keywords

Space syntax · Architectural analysis · Graph theory · Social analysis

Introduction

One of the earliest definitions of architecture identifies three essential characteristics: “firmness,” “commodity,” and “delight.” The first of these refers to form or shape, being the tangible or solid geometry of architecture. Firmness is associated with the surfaces, walls, floors, and roofs of buildings. The second part, commodity, refers to the spaces or voids that are enclosed or defined by form. Commodity is found in the network of alcoves, rooms, annexes, and hallways in an interior. These are the spaces and connections between them that support a functional or social purpose. The final part of the classical definition of architecture, delight, refers to the way in which combinations of forms appeal to perceptions of beauty or suggest more transcendent, spiritual, or poetic qualities.

In the late twentieth century, inspired by the linguistic analogies of the era, architecture was reconceptualized as having three distinct properties: “vocabulary,” “syntax,” and “grammar.” The first of these encapsulates the basic formal units of a building, the second, the spaces enclosed by these units, and the third, the combinations of forms which create the overall architectural expression. The last of these three was the catalyst for shape grammar theory, while the second, associated with the syntactical properties of an environment, led to space syntax.

The primary paradigm shift promoted by early syntactical research in architecture was the suggestion that space could be examined independent of form. As most histories and theories of architecture are dominated by discussions about form, style, and aesthetics, the notion that architecture could be analyzed solely from the point of view of the network of voids in a building was almost heretical at the time. 
However, as Bill Hillier and Julienne Hanson argue, while “we may prefer to discuss architecture in terms of visual styles, its most far-reaching practical effects are not at the level of appearances at all, but at the level of space” (1984: ix). This realization led Hillier and Hanson (1984) to reject the scholarly fascination with the “geography” of architecture – meaning its dimensionality, orientation, and location – and focus instead on its “topology.” Space may be empty and invisible, but it has two topological properties that exert a significant influence over architecture’s capacity to function. The first of these is “permeability,” being the way spaces are connected or configured. The second is “difference,” which refers to the extent to which spaces may be distinguished from each other solely on the basis of their location in a system or network of spaces. The topology of space is significant because it “can reflect and embody a social pattern” (Hillier 2005: 104). This pattern encapsulates the collective values and social structures of a group in an architectural plan. However, “space can also shape a social pattern” (Hillier 2005: 104), because of the way an architectural plan positions some areas more centrally than others. Bafna, describing this relationship, argues “that social structure is inherently spatial and inversely that

the configuration of inhabited space has a fundamentally social logic” (2003: 18). Space syntax theories and techniques focus on the topology of space to examine the relationship between configurational patterns in architecture and their generative or reproductive social structures. While this introduction provides a background to the theoretical foundations of space syntax, the following section describes the analytical techniques that are used to investigate the topology of architecture. Before progressing, it must be noted that the scope of this chapter is primarily limited to architectural applications of space syntax theory and to analytical techniques that draw on graph theory. These limits are necessary for two reasons. First, despite initially being developed for architectural analysis, in the last few decades, space syntax has been more extensively applied in micro- and macroscale urban analysis. The present chapter acknowledges this evolution, but the majority of its examples and applications are drawn from architectural research. Second, several well-known space syntax techniques are not considered in this chapter because they have a different conceptual or mathematical basis. Two of the most important of these are isovist analysis and visibility graph analysis.

Space Syntax and Mathematics

There are three main stages in the typical space syntax technique. The first abstracts or reduces various properties of an environment – typically represented in an architectural or urban plan – into a series of differentiated components and the connections between them. The set of connected components identified in this process is called a “map,” although it is also, mathematically, a graph. The second stage analyzes the topological properties of the map using graph theory. In the final stage, the mathematical measures derived in stage two are used to analyze the social or cognitive properties of the plan from which the map was generated. The early space syntax techniques, along with the ones introduced in this chapter, all repeat this process of abstraction, analysis, and interpretation, and the mathematical processes and formulas are largely unchanged in each technique. The primary difference between the various syntactical techniques is the abstraction process, which, depending on the variation applied, develops graphs of different architectural properties. The remainder of this section explains these processes in more detail, before the following section describes the three different approaches for abstracting a map from an architectural plan.

The first stage of the typical space syntax analytical technique involves the abstraction or reduction of various properties from an architectural plan into a map of elements and the connections between them. These elements and connections in the map are then treated as the nodes and edges in a graph. Because graphs do not possess traditional geographic features, such as orientation or dimensionality, they can be rearranged in various ways to emphasize particular properties. While this process doesn’t change the mathematical properties of the graph, it can be used to investigate different architectural properties. 
From a mathematical viewpoint, this process might seem irrelevant, but an important principle in syntactical analysis is that the “spatial layout not only looks but is different when seen

from different points of view in the layout” (Hillier 2005: 101). The process of “justifying” a graph around a “carrier” to emphasize particular spatial relationships can be effective for intuitively analyzing a graph, before later interpreting its mathematical results.

The next stage in the standard space syntax technique involves the mathematical analysis of the map or graph. The analysis commences with a derivation of simple summative measures, including the number of nodes, edges, or types of elements (e.g., if the focus is rooms, then the nodes may be coded to differentiate public from private spaces). The total depth (TD) and mean depth (MD) of nodes are then determined, along with measures for relative asymmetry (RA), choice or control (C), and integration (i). There are also normalized variations of many of the standard measures to allow graphs of different sizes to be compared. For example, real relative asymmetry (RRA) is derived from a formula that normalizes relative asymmetry values against those of an idealized diamond-shaped graph.

Despite the fact that space syntax and graph theory have a common mathematical foundation, over time the primary concerns and methods employed in the two fields have diverged. For example, whereas research in computer science has sought to develop increasingly refined algorithms for modeling complex graphs, in architecture, research has more often been directed to determining the usefulness of various graph measures. Thus, architectural scholars developed empirical evidence linking some graph theory values with observations of human behavior. They also began to rely on ways of relativizing graph measures against benchmark data that is thought to embody particular architectural properties. As a result of this, several of the most important mathematical measures developed for use in space syntax analysis don’t have an equivalent in conventional graph theory or wouldn’t make sense outside of architectural analysis.
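
The justification and measurement steps described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the five-node map is hypothetical, and the chapter does not state the formulas, so the definitions used here (MD = TD/(K − 1), RA = 2(MD − 1)/(K − 2), i = 1/RA, and a control value summing 1/degree over a node's neighbours) are the ones commonly given in the space syntax literature and should be checked against the cited references.

```python
from collections import deque

# Hypothetical five-node map: an exterior "carrier" (X) and four rooms.
# The adjacency list is an assumption for illustration, not a plan from the chapter.
GRAPH = {
    "X": ["hall"],
    "hall": ["X", "living", "kitchen"],
    "living": ["hall", "kitchen"],
    "kitchen": ["hall", "living", "store"],
    "store": ["kitchen"],
}


def depths_from(graph, carrier):
    """Breadth-first search: topological depth of every node from the carrier."""
    depth = {carrier: 0}
    queue = deque([carrier])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


def justified_levels(graph, carrier):
    """Group nodes by depth: the rows of a graph justified around the carrier."""
    levels = {}
    for node, d in depths_from(graph, carrier).items():
        levels.setdefault(d, []).append(node)
    return [sorted(levels[d]) for d in sorted(levels)]


def measures(graph, node):
    """Return TD, MD, RA and integration (i) for one node.

    Assumes a connected graph with more than two nodes.
    """
    k = len(graph)                                # number of nodes, K
    td = sum(depths_from(graph, node).values())   # total depth
    md = td / (k - 1)                             # mean depth
    ra = 2 * (md - 1) / (k - 2)                   # relative asymmetry
    i = 1 / ra if ra > 0 else float("inf")        # integration
    return {"TD": td, "MD": md, "RA": ra, "i": i}


def control_value(graph, node):
    """Control: each neighbour contributes the reciprocal of its own degree."""
    return sum(1 / len(graph[n]) for n in graph[node])


print(justified_levels(GRAPH, "X"))
for name in GRAPH:
    print(name, measures(GRAPH, name), round(control_value(GRAPH, name), 3))
```

Justifying the same graph around a different carrier (for example `"kitchen"` instead of `"X"`) changes the levels and the depth-based values for that viewpoint, while the underlying set of nodes and edges stays the same.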

Spaces, Lines, and Points

This section introduces three important analytical techniques used by space syntax researchers. The first is convex space analysis, which is typically used to investigate the permeability or connectivity relationships in a plan. These properties are associated with social structures and hierarchies. The second technique, axial line analysis, is used to investigate movement and surveillance potential in a plan. The third technique, intersection point analysis, can be employed to examine the role played by “choice” or “pause points,” in navigation through a building. Respectively, these three techniques are used to analyze the topology of an architectural plan in terms of spatial accessibility, movement or surveillance, and decision-making.

A “convex space,” like a convex polygon, is one where all the interior angles in the room’s plan are less than 180°, such that a straight line drawn between any two points of the room does not cross the perimeter of the room. It is a significant concept in architectural analysis because it encapsulates two psychological and social properties of space in a geometric formulation. In the first instance, a psychologically self-contained unit of space is one where the entire perimeter is visible from any location within it. Such a space is also described as “visually

coherent,” because it can be understood in its totality from within. In a social sense, this type of space accommodates “copresence” and “co-awareness,” meaning that people are visible to one another or can interact with or acknowledge each other (Peponis and Bellal 2010). By drawing on the concept of convexity, these psychological and social definitions can be translated into a mathematical one. If the shape of a room in an architectural plan conforms to this definition, it is classed as a “convex space,” and it therefore replicates the basic psychological and social characteristics of the smallest, habitable unit of space. If a room does not conform to this geometry – say, its plan is an L-shaped concave polygon with one interior angle of 270° – the map representation of the space must be subdivided in such a way as to identify only convex spaces.

Convex space analysis abstracts the plan of a building or environment into the minimum number of visually coherent units of space (Hillier and Hanson 1984). Markus defines a convex map as the set of a building’s “fewest and fattest spaces that cover the entire plan, the former always prevailing over the latter” (1993: 14). This means that the lowest number of spaces takes precedence over the geometry of the convex space, when making decisions about the classification or division (in the event of a concave space) of spaces in a plan. Once the complete set of convex spaces in the plan is identified, then any trafficable connections between them are marked on the map. Such connections must be large enough to allow for human movement. After this map has been drawn (being effectively an overlay on an architectural plan), then the convex spaces are designated as nodes, and the doorways between them (or shared surfaces in the case of a subdivided concave room) are classified as edges. In this way, the plan of the building is transformed into a graph of spaces and accessibility (Fig. 1). 
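
The convexity condition described above can be checked numerically. The following Python sketch is a minimal illustration, assuming each room is given as a simple (non-self-intersecting) polygon with ordered vertices; the function name and the sample coordinates are hypothetical. It relies on a standard property of convex polygons: the cross products of successive edges all have the same sign.

```python
def is_convex(polygon):
    """Return True if a simple polygon, given as ordered (x, y) vertices, is convex.

    Every turn between successive edges must go the same way, so no interior
    angle exceeds 180 degrees (collinear vertices are tolerated).
    """
    n = len(polygon)
    if n < 3:
        return False
    sign = 0
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        cx, cy = polygon[(i + 2) % n]
        # z-component of the cross product of edges AB and BC.
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False  # the boundary turns the other way: a reflex corner
    return sign != 0


# Hypothetical rooms (coordinates are assumptions for illustration).
rectangular_room = [(0, 0), (4, 0), (4, 3), (0, 3)]
l_shaped_room = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]

print(is_convex(rectangular_room))  # True
print(is_convex(l_shaped_room))     # False: one interior angle is 270 degrees

# One possible subdivision of the L-shaped room into two convex spaces,
# which become two nodes joined by one edge in the graph of the convex map.
part_a = [(0, 0), (4, 0), (4, 2), (0, 2)]
part_b = [(0, 2), (2, 2), (2, 4), (0, 4)]
convex_graph = {"a": ["b"], "b": ["a"]}
print(is_convex(part_a), is_convex(part_b))  # True True
```

The sketch only tests convexity. Choosing the "fewest and fattest" subdivision of a concave plan is a separate decision that the test does not automate.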
The content of this graph can be conceptualized as capturing the permeable properties of the plan, which in turn define its interior social structure (Hanson 1998). Furthermore, by adding a node to the graph representing the exterior of the building, different social relations can also be compared. For example, understanding the social relationships between visitors and inhabitants, or inhabitants and other inhabitants, relies on the role of the exterior in the topology of the plan. The convex space graph produced in this process can be visually assessed in the first instance, to intuitively identify particular properties. This typically commences

Fig. 1 (i) Simple building plan, (ii) annotated convex space map of this plan, (iii) graph of the convex map

by arraying the graph vertically, with the exterior at the base (as the “carrier” of the graph). The shape of this graph gives an immediate sense of its social structure. For example, a graph with a dense branching topological structure is commonly described as “bush-like” or “tree-like.” This configuration arises in a plan where users must pass through a small number of controlling spaces to access deeper parts of the plan. Conversely, if the graph has extensive interconnections or is rhizomorphous, it may be described as “lattice-like” or as containing “loops” or “rings.” Such spatial configurations in a plan offer users a high level of flexibility in the way they move through and inhabit a building. The third common archetype is a linear graph, called an “enfilade.” This configuration represents the most highly controlled plan, where there is no choice but to progress through a strict sequence of rooms in the same order every time. Despite this simple tripartite visual classification of graph archetypes, many large buildings actually combine all three as substructures. For example, in an institutional building (say a hospital or school), the spaces near the entry tend to conform to more linear or secure spatial configurations, while spaces at an intermediate depth tend to be looped to accommodate efficient and flexible functional interactions. Finally, private offices and services are often deep in a building plan, where they are arrayed in tree-like configurations. While this type of visual analysis can be useful, it rarely supplants the need to mathematically analyze the graph and then interpret the numerical results in terms of specific spaces in the plan (Ostwald 2011a). The convex space technique was one of the earliest to be developed by space syntax researchers, but over the last decade, a simplified version of it has become more common. 
In this version, called “functional space analysis,” nodes equate to room functions (Hanson 1998; Markus 1993; Ostwald 2011b). Thus, for example, a dining room in a house plan is classified as one graph node, regardless of whether it is strictly convex or not. This variation has many benefits, not the least of which is that it simplifies the complex and inconsistent process of subdividing concave spaces. Furthermore, it also accommodates the presence of very small convex spaces in a plan (like alcoves, nooks, or window seats), which can otherwise bias or skew the results. Such small convex spaces are simply merged into the largest adjacent functional space. A further variation of this technique proposes that “functional sectors” in a plan should be grouped into single nodes. A sector-based approach of this type has proven useful for considering large-scale social patterns and more complex space and form-related decision-making (Amorim 1999; Lee et al. 2015). In this second variation, for example, a service area in a house may have multiple small convex and functional spaces within it (including a powder room, bathroom, toilet, laundry, and storage), but these are all grouped into a single “service” node in the graph. The next major analytical approach developed by space syntax researchers is axial line analysis. An axial line is a straight path or vector through space, signifying potential for both movement and vision in a plan. While people do not necessarily walk or look along completely straight vectors, an axial line is an idealized approximation of the capacity to walk or see in a direction. Certainly, people have physical properties (including shoulder-width and stride-length) which prevent them from following perfectly linear paths, and objects in space (furniture,

people, and vehicles) can temporarily obstruct movement or vision, but the axial line offers a generalized way of modeling movement and vision in an environment. An axial map is the set of the most efficient (in number) and effective (in coverage) axial lines in a plan, all of which are connected to at least one other line to form a single set of connected lines (Fig. 2). It could also be described as the set of fewest and longest lines in an architectural plan, which give access to all spaces and allow a person to move everywhere and see everything in an environment. Once this set of connected lines is identified, then the lines are treated as nodes and their connections as edges, producing a graph which is then analyzed mathematically. In practice, the axial map looks like a set of angled lines, all of which intersect at least one other line and therefore form a single connected set. Unlike the convex space map, which is often presented as a graph and even analyzed qualitatively through visual inspection, the axial line map is rarely depicted or examined in this way. It is more common for some level of visual coding to be applied to the map to assist readers to interpret its properties. For example, graph theory values – such as integration or connectivity – can be color-coded in the map. Using this approach, axial lines with high values are typically red or orange, and low values are pale blue or violet, with intermediate values distributed along a color gradient between them.

A third technique developed by space syntax researchers is intersection point analysis. Whereas the convex space and axial line techniques are concerned with, respectively, generalized notions of inhabitation and movement, the intersection approach examines the relationship between significant locations in a plan (Dawes and Ostwald 2013). Such locations are the ones where the primary paths through a plan commence, cross, or conclude. 
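
As a hedged sketch of the steps described above, the following Python code treats axial lines as nodes and their crossings as edges, and records where the lines cross. The line coordinates and all names are hypothetical, and the segment-crossing test is a standard geometric routine rather than a procedure taken from the chapter. Finding the fewest and longest lines for a real plan is a separate problem that this example does not address.

```python
from itertools import combinations


def crossing_point(seg_a, seg_b, eps=1e-9):
    """Return the point where two segments cross, or None if they do not.

    Parallel and collinear segments are treated as not crossing, which is a
    simplifying assumption of this sketch.
    """
    (x1, y1), (x2, y2) = seg_a
    (x3, y3), (x4, y4) = seg_b
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if abs(denom) < eps:
        return None
    # Parameters t and u locate the crossing along seg_a and seg_b respectively.
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if -eps <= t <= 1 + eps and -eps <= u <= 1 + eps:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def axial_graph(lines):
    """Treat axial lines as nodes and their crossings as edges."""
    graph = {name: set() for name in lines}
    crossings = {}
    for a, b in combinations(sorted(lines), 2):
        point = crossing_point(lines[a], lines[b])
        if point is not None:
            graph[a].add(b)
            graph[b].add(a)
            crossings[(a, b)] = point
    return graph, crossings


# Hypothetical axial map of three lines (coordinates are assumptions).
AXIAL_LINES = {
    "A": ((0, 1), (4, 1)),  # horizontal line
    "B": ((2, 0), (2, 4)),  # vertical line crossing A and C
    "C": ((1, 3), (5, 3)),  # horizontal line parallel to A
}

graph, crossings = axial_graph(AXIAL_LINES)
print(graph)      # A and C each connect only to B
print(crossings)  # where the paths cross
```

The crossing points, together with the end points of each line, correspond to the locations where paths commence, cross, or conclude.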
This map is essentially a variation of the axial line graph, focusing instead on the connections between paths, rather than on the paths themselves (Batty 2004; Jiang and Claramunt 2004; Ostwald and Dawes 2013; Porta et al. 2006; Turner 2005). Thus, whereas the axial line map represents the network of movement or vision-related potential in a plan, its inversion captures the relationships between significant locations in this network. Such locations approximate, in an experiential sense, places where a person is most likely to pause and make a decision about how to navigate or explore a building. Because the intersection map appears to reverse the focus of the axial map, some researchers describe the former as a dual of the latter’s primal graph. However,

Fig. 2 (i) Simple building plan with axial lines marked, (ii) annotated axial line map of this plan, (iii) graph of the axial map


M. J. Dawes and M. J. Ostwald

Fig. 3 (i) Simple building plan with original axial map overlay, (ii) annotated intersection point map of this plan, (iii) graph of the intersection map

the graph that arises from the intersection map is actually an inversion of the one generated from the axial map. To invert an axial map, each intersection is treated as being just one topological step from each other intersection on the axial line. This process, which was developed by Batty (2004), maintains the integrity of the graph (Fig. 3). There are multiple variations of the intersection mapping technique to accommodate different interpretations of the importance of the endpoints of axial lines, called “stubs.” In the standard version of this method, the only nodes included in the intersection graph are those where axial lines cross. But in many architectural plans, these locations only occur in particular spaces or zones, leaving large parts of the plan unrepresented. However, any spaces in a plan that don’t have intersection points will likely have stubs. Placing a point at the end of these lines creates a more complete map of an environment, but it can also bias or skew the results, which might be inappropriate for some purposes. While there is no unilaterally accepted approach to determining which line ends to include or exclude, several alternative rules have been canvassed and tested (Dawes and Ostwald 2013). Both the axial line and intersection point maps can also be used to examine various cognitive properties of an environment. The most important of these is intelligibility (I), which is a measure of global-local relationships in a graph (Peponis et al. 1990). It indicates the degree to which the entire configuration of a building plan can be understood by moving through it (in the case of an axial map) or being located at key locations in the environment (in the case of an intersection point map). Intelligibility is, in essence, a measure of the capacity to navigate or understand an environment. 
It is calculated as the correlation coefficient of connection (C) and integration (i), which indicates the extent to which any line or point in a graph is connected to any other line or point.
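The calculation described above can be sketched in a few lines of Python. This is a minimal illustration only, under several assumptions that are not part of the chapter: the six-line axial map is hypothetical, the graph is assumed to be connected, connectivity is taken as the number of lines each line intersects, the reciprocal of mean depth is used as a simplified stand-in for integration, and the correlation coefficient is taken to be Pearson's r.

```python
from collections import deque
from math import sqrt

# Hypothetical axial map: each key is an axial line (a graph node) and each
# value is the set of lines it intersects (the graph edges).
axial_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "e"},
    "d": {"b"},
    "e": {"c", "f"},
    "f": {"e"},
}


def mean_depth(graph, start):
    """Average number of topological steps from `start` to every other node."""
    depth = {start: 0}
    queue = deque([start])
    while queue:  # breadth-first search over a connected graph
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return sum(depth.values()) / (len(graph) - 1)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def intelligibility(graph):
    """Correlate a local measure (connectivity) with a global one (integration)."""
    nodes = sorted(graph)
    connectivity = [len(graph[n]) for n in nodes]
    # Assumption: 1 / mean depth is a simplified proxy for integration.
    integration = [1.0 / mean_depth(graph, n) for n in nodes]
    return pearson(connectivity, integration)


print(round(intelligibility(axial_graph), 3))
```

A value close to 1 would indicate that well-connected lines also tend to be shallow in the graph as a whole, which is the global-local relationship that intelligibility is intended to capture.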

Application

Since the late 1970s, syntactical techniques have been used in architecture, urban design, and town planning. They have also been more selectively adopted in

53 Space Syntax: Mathematics and the Social Logic of Architecture


archaeology, anthropology, geography, spatial psychology, and traffic engineering. In architecture and urban design, these techniques have been used to investigate a range of spatial properties, including power structures in primitive societies, social patterns in historic housing, wayfinding, pedestrian movement, vehicular traffic volumes, property values, and crime rates. Convex space analysis was initially used by researchers for identifying structural genotypes, which are dominant or recurring spatial configurations for a particular building type and context (Conroy-Dalton and Kirsan 2008; Hanson et al. 1987). Some common building types which have been analyzed in this way include houses, schools, mosques, and prisons. A structural genotype can be identified by examining a set of buildings with a similar function and from a particular era and sociocultural context. For example, studies of historic houses from different parts of the world reveal recurring local social structures and values (Hanson 1998; Hillier and Hanson 1984). Similar methods have been employed to investigate power relations within institutional spaces (Dovey 1999; Markus 1993) and historic buildings (Bustard 1999; Cooper 1997; Dawson 2001). If the works of a particular architect are analyzed in this way, it is possible to uncover insights into the design logic they have used (Bafna 1999; Major and Sarris 1999; Ostwald 2011b). More recently, convex space analysis has been used to extract properties from culturally significant environments and then parametrically generate new versions of these (Yu et al. 2015). While the majority of applications of convex space analysis have been in architecture, axial line analysis has tended to be used for investigating the properties of urban space (Desyllas and Duxbury 2001; Hillier et al. 1993; Peponis et al. 1997a). 
Past research reveals that the syntactical measure integration may be used to predict movement potential in a spatial network and that choice (C – a derivation of the earlier measure control) may be used to model pedestrian distribution (Hillier 1996; Hillier et al. 1987). Other syntactic measures derived from axial maps have also been shown to correlate with levels of criminal activity (Friedrich et al. 2009; Hillier and Shu 2000). Despite being less common in architecture, axial maps have been used to examine the frequency of social encounters and the volume of pedestrian movement in office buildings (Ermal and Peponis 2008). Axial maps have also been used to study wayfinding problems in health-care facilities (Haq and Zimring 2003). In much the same way that convex maps have been employed to analyze the design styles or theories of famous architects, so too axial line maps have been employed for this purpose (Hanson 1998; Dawes and Ostwald 2012). As the most recent of the syntactical techniques considered in this chapter, developments in intersection point analysis have been focused largely on refining the method, with only a small number of applications in architectural or urban analysis, or using geographic information systems (Batty 2004; Jiang and Claramunt 2004; Porta et al. 2006; Turner 2005). In the case of architectural analysis, comparisons between results derived from axial line and intersection point maps have been undertaken for domestic modernist architecture (Ostwald and Dawes 2013).


Conclusion

Despite its success, there have been several criticisms of the space syntax movement that have directly shaped its development and refinement. Nevertheless, the interpretation of mathematical results in architectural terms has been repeatedly described as complex, subjective, and overly reliant on ambiguous terminology and concepts (Dovey 1999; Osman and Suliman 1994). Full worked examples – which explain the rationale and formulas used for each stage in the analytical processes – are also surprisingly rare. As most researchers use software for syntactical analysis, a growth in application of these techniques has occurred in parallel with a diminution in understanding of what the software is doing or producing and why. The axial line map, in particular, has been singled out for criticism for its failure to accommodate population density, metric distance, and subtle changes in geometry (Ratti 2004a, b). Some of these criticisms can be directly attributed to space syntax’s rejection of geography in favor of topology. The lack of consideration of distance and direction is a limitation of the approach, although, by focusing on topology, the results are potentially more significant for human wayfinding and cognition (Peponis et al. 1997b). Setting aside the various arguments for and against the use of space syntax techniques for architectural analysis, even its critics accept that the mathematical analysis of space is crucial for the future of design. Space syntax offers a language of spatial representation that can be used for analysis, optimization, and prediction. Such languages are rare, and for this very reason, the syntactical approach has found an enduring place in architectural design and analysis.

Cross-References

Isovists: Spatio-visual Mathematics in Architecture

References

Amorim L (1999) The sectors paradigm: a study of the spatial and functional nature of modernist housing in northeast Brazil. Dissertation. University of London Bafna S (1999) The morphology of early modernist residential plans: geometry and genotypical trends in Mies van der Rohe’s designs. In: Proceedings, space syntax second international symposium, Brasilia, 1999, pp 01.1–01.12 Bafna S (2003) Space syntax: a brief introduction to its logic and analytical techniques. Environ Behav 35(1):17–29 Batty M (2004) A new theory of space syntax, Working paper series. Paper 75. UCL, London, pp 1–36 Bustard W (1999) Space, evolution, and function in the houses of Chaco Canyon. Environ Plan B: Plan Des 26(2):219–240 Cooper LM (1997) Comparative analysis of Chacoan Great Houses. In: Proceedings, space syntax first international symposium, vol 2, London, p 22


Conroy-Dalton R, Kirsan C (2008) Small-graph matching and building genotypes. Environ Plan B: Plan Des 35(5):810–830 Dawes MJ, Ostwald MJ (2012) Lines of sight, paths of socialization: an axial line analysis of five domestic designs by Richard Neutra. Int J Constr Environ 1(4):1–28 Dawes MJ, Ostwald MJ (2013) Precise locations in space: an alternative approach to space syntax analysis using intersection points. Archit Res 3(1):1–11 Dawson PC (2001) Space syntax analysis of Central Inuit snow houses. J Anthropol Archaeol 21(4):464–480 Desyllas J, Duxbury E (2001) Axial maps and visibility graph analysis. In: Proceedings, space syntax 3rd international symposium, Atlanta, 2001 Dovey K (1999) Framing places: mediating power in built form. Routledge, London Ermal S, Peponis J (2008) The effect of floorplate shape on office layout integration. Environ Plan B: Plan Des 35(2):318–336 Friedrich E, Hillier B, Chiaradia A (2009) Anti-social behaviour and urban configuration using space syntax to understand spatial patterns of socio-environmental disorder. In: Proceedings of the 7th international space syntax symposium, Stockholm, 2009 Hanson J (1998) Decoding homes and houses. Cambridge University Press, Cambridge, UK Hanson J, Hillier B, Graham H (1987) Ideas are in things: an application of the space syntax method to discovering house genotypes. Environ Plan B: Plan Des 14(4):368–385 Haq S, Zimring C (2003) Just down the road a piece: the development of topological knowledge of building layouts. Environ Behav 35(1):132–160 Hillier B (1996) Space is the machine. Cambridge University Press, London Hillier B (2005) The art of place and the science of space. World Archit 11(185):96–102 Hillier B, Hanson J (1984) The social logic of space. Cambridge University Press, Cambridge, UK Hillier B, Shu S (2000) Crime and urban layout: the need for evidence. 
In: Ballintyne S, Pease H, McLaren V (eds) Secure foundations: key issues in crime prevention, crime reduction and community safety. Institute for Public Policy Research, London, pp 224–248 Hillier B, Brudett R, Peponis J, Penn A (1987) Creating life: or does architecture determine anything? Archit Behav 3(3):233–250 Hillier B, Penn A, Hanson J, Grajewski T, Xu J (1993) Natural movement: or, configuration and attraction in urban pedestrian movement. Environ Plan B: Plan Des 20(1):29–66 Jiang B, Claramunt C (2004) Topological analysis of urban street networks. Environ Plan B: Plan Des 31(1):151–162 Lee JH, Ostwald MJ, Gu N (2015) A syntactical and grammatical approach to architectural configuration, analysis and generation. Archit Sci Rev 58(3):189–204 Major MD, Sarris N (1999) Cloak and dagger theory: manifestations of the mundane in the space of eight Peter Eisenman houses. In: Proceedings, space syntax second international symposium, Brasilia, 1999 Markus T (1993) Buildings and power. Routledge, London Osman KM, Suliman M (1994) The space syntax methodology: fits and misfits. Archit Behav 10(2):189–204 Ostwald MJ (2011a) The mathematics of spatial configuration: revisiting, revising and critiquing justified plan graph theory. Nexus Netw J 13(2):445–470 Ostwald MJ (2011b) A justified plan graph analysis of the early houses (1975–1982) of Glenn Murcutt. Nexus Netw J 13(3):737–762 Ostwald MJ, Dawes MJ (2013) Miesian intersections: comparing and evaluating graph theory approaches to architectural spatial analysis. In: Cavalcante A (ed) Graph theory: new research. Nova Science, New York, pp 37–86 Peponis J, Bellal T (2010) Fallingwater: the interplay between space and shape. Environ Plan B: Plan Des 37(6):982–1001 Peponis J, Zimring C, Choi YK (1990) Finding the building in wayfinding. Environ Behav 22(5):555–590 Peponis J, Ross C, Rashid M (1997a) The structure of urban space, movement and co-presence: the case of Atlanta. Geoforum 28(3–4):341–358


Peponis J, Wineman J, Rashid M, Kim H S, Bafna S (1997b) On the generation of linear representations of spatial configuration. In: Proceedings, first international space syntax symposium, London, 1997 Porta S, Crucitti P, Latora V (2006) The network analysis of urban streets: a primal approach. Environ Plan B: Plan Des 33(5):705–725 Ratti C (2004a) Urban texture and space syntax: some inconsistencies. Environ Plan B: Plan Des 31(4):487–499 Ratti C (2004b) Rejoinder to Hillier and Penn. Environ Plan B: Plan Des 31(4):513–516 Turner A (2005) Could a road centre line be an axial line in disguise. In: Proceedings 5th international space syntax symposium, Delft, 2005 Yu R, Ostwald MJ, Gu N (2015) Parametrically generating new instances of traditional Chinese private gardens that replicate selected socio-spatial and aesthetic properties. Nexus Netw J 17(3):807–829

Isovists: Spatio-visual Mathematics in Architecture

54

Michael J. Dawes and Michael J. Ostwald

Contents
Introduction
Background
Isovist Measures and Mathematics
Application
Conclusion
Cross-References
References

Abstract

This chapter introduces an important concept in architectural analysis, the isovist. Along with two associated concepts – the isovist field and the visibility graph – the isovist provides a mathematical basis for analyzing and shaping architectural space and form. Significantly, isovists can also be used for investigating, and even predicting, human behavioral and cognitive responses to buildings. This chapter commences with an overview of both the isovist and isovist field, before describing their origins and applications. The chapter also presents a summary of past research that uses isovists to analyze architectural and urban space or seeks to correlate isovist measures with human perceptions. While the chapter does not provide detailed explanations of all the mathematical and computational processes involved, it does include a list of key isovist measures

M. J. Dawes
University of New South Wales, Sydney, NSW, Australia

M. J. Ostwald
UNSW Built Environment, University of New South Wales, Sydney, NSW, Australia

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_5


M. J. Ostwald and M. J. Dawes

and their behavioral interpretations. The references cited in this chapter include detailed worked examples of isovist analysis along with discussion of limitations and recent developments.

Keywords

Isovists · Isovist fields · Visual graph analysis · Spatio-visual analysis · Space syntax

Introduction

In order to understand what an isovist is and how it operates, imagine yourself standing inside the entry to a large interior space, say a shopping mall, airport terminal, or hospital. Now carefully look in each direction, slowly turning around through 360°. In some directions, various objects – including walls and furniture – will block your view, hiding parts of the interior. In other directions, there is nothing occluding your vision, and you can see the full extent of the space. From this vantage point, various types of information can be gleaned. For example, you may identify objects that can serve as landmarks, supporting navigation and wayfinding. You may also be able to differentiate between parts of the space that are smaller or more enclosed and those which are larger or more exposed. When looking in some directions, it may even be possible to sense that there is more to discover, because of the way various objects partially screen your view. In other directions, however, the objects and elements which intrude into your view are so numerous and complex that you can gain no deeper understanding about the spaces they hide without movement to a new location. If you draw a plan (a two-dimensional view from above, looking directly down) of the volume of space that is visible from your location in each direction, you will create a polygon. This polygon will have some edges that are defined by the walls

Fig. 1 Isovist generated for a given position within an environment and annotated to identify some typical measures


that have blocked your view and others that are by-products of the limits of your cone of vision, because without moving you cannot see around corners or behind objects. This polygon, which is a representation of the combined spatial and visual (or “spatio-visual”) geometry and limits of your experience, is an isovist (Fig. 1). One of the earliest definitions of an isovist described it as “the set of all points visible from a single vantage point in space with respect to an environment” (Benedikt 1979: 47). It is, in essence, the geometry of the space that is delineated by the human cone of vision, scanning in every direction from a fixed location. In architecture, an isovist is typically represented in two dimensions as a polygon traced on a floor plan. This polygon provides a simple graphic representation of spatio-visual geometry relative to a particular position. However, the polygon also has many measurable geometric properties. For example, it has an area, a number of connected edges, and a perimeter length. Such measures are not only properties of the polygon, but they also capture various characteristics of the environment from which it was generated. Moreover, measures derived from this isovist can be used to model or predict a person’s understanding of, and possible reaction to, the environment. But while these measures for an individual isovist are useful, comparing multiple isovists can generate even more valuable information. Imagine yourself back in the same large interior space as before. Now move a short distance away from your original position, and use this new location to start another rigorous visual exploration of the environment, turning slowly through 360°. You will now generate a new and most likely different understanding of the environment. In some directions, nothing may have changed; the same walls may still completely block your view. But in other directions, spaces or objects that were previously hidden from you may now be revealed.
Once again, if you trace the outline of the extent of your vision on a plan, you will produce a polygon, which is likely to differ from the previous one. There may be more edges, and the angles between the edges may have changed, along with the area of the polygon and its perimeter length. This means that you can mathematically compare the properties of the two isovist polygons and determine, for example, whether, from this new location, you see more or less of the space than before. If more of the space is revealed, then you have increased your awareness and possible understanding of the building, effectively taking the first step to discovering its properties. However, if the extent of visible space is reduced, you have entered a more private, hidden, or enclosed area (Fig. 2).

Fig. 2 A graphic comparison of isovists generated from three positions in the same architectural plan


If you have the time and inclination, you could also methodically move through the entire interior, producing new isovists and generating new spatio-visual measures for every position. By comparing these measures, you will be able to differentiate between a range of spatial properties. This rigorous mapping of the complete set of spatio-visual properties in the building produces an “isovist field.” Whereas the individual isovist is useful for understanding precise locations in a building or the experience of a type of user (say, a first-time visitor), the isovist field is more useful for analyzing the building in its entirety.

Background

The first model of the geometry of visual perception was developed by the environmental psychologist James Gibson (1966). Gibson’s “ambient optic array,” which proposed a new way of examining the relationship between the viewer and an environment, has three characteristics which are encapsulated in its title. The word “ambient” refers to the fact that a person’s experience of any environment is necessarily localized. A person must be immersed within an environment and occupy a specific part of it to experience it. The second word, “optic,” relates to the realization that environmental information is generally contingent on the mechanics of vision, the process wherein light enters the eye and strikes the retina. Finally, the word “array” refers to an ordered arrangement of elements as part of a larger system. In this context, the arrangement occurs when light from an environmental surface is reflected into the eye. It is this process which provides light rays with the spatial information that is conveyed to the observer. Gibson’s conceptual breakthrough was to realize that a light ray is not just a stream of photons; rather, it is a geometrically structured source of information. However, if the viewer is constrained to a single location, the quantity of environmental information available is also fixed. Therefore, movement is a critical part of the experience of space, because it is only by moving that a person can discover additional properties of an environment. Gibson (1979) explains this by observing that, as a person moves, the ambient optic array changes, revealing more or less information with each step. Thus, the volume and type of visual information available in an environment are relative to the location from which that information is obtained. Gibson’s ambient optic array has parallels with the concepts of “intervisibility” (Gallagher 1972) and the “viewshed” (Lynch 1976), both developed for research in urban and landscape geography.
One of the first references to an isovist is also derived from the same fields (Tandy 1967), although it was not until Benedikt (1979) formalized its construction and, with Davis (Davis and Benedikt 1979), identified a repeatable method for constructing and measuring isovists that the concept was adopted in architecture. One important part of the definition of an isovist is that it accommodates all light rays accessible to a person’s eye, from any direction. While it might be more intuitive to assume that the geometric limits of human vision are necessarily directed


in a cone, with its apex in the human eye, from the earliest research, it was observed that vision and perception are more complex. The “eyes, head and body can all move” and “under normal conditions, a viewer is continuously sampling a much broader portion of the environment even though at any one instant the new stimuli are limited” (Smardon et al. 1986: 41). Indeed, while the high acuity region of the macular field typically has a 124° cone of vision, the low acuity zone expands to over 170°. Furthermore, by tilting the head to the left or right, the cone of vision is increased to around 230°. If partial rotation of the shoulders and waist is combined with movement of the head and eyes, the cone of vision increases to around 300°. Therefore, the decision that an isovist is delineated by the space visible in every direction from a fixed location reflects the capacities and behaviors of people in space. Certainly, isovist theory can and does accommodate more limited cones of vision, but for most purposes, 360° isovists are now the accepted standard in architectural analysis (Meilinger et al. 2009). While individual isovists are valuable for exploring the spatial properties of a single location, or the perceptions of viewers at this location, such “local” measures are of relatively limited use. From the earliest research into isovists, it was apparent that a system was required to produce a more comprehensive or “global” mapping of the isovist properties of a plan. Davis and Benedikt (1979) proposed the first of these mappings, the isovist field. It worked by superimposing a regular grid on an environment’s plan and generating an isovist at the center of each grid square. From this data, it was then possible to color each grid square in the architectural plan to represent relative values. This was effectively the first “visibility map,” with the data encapsulated in the visualization being based on the metric values of individual isovists.
However, over the following decades, several researchers developed ways of calculating global visibility measures from such maps (De Floriani et al. 1994), the most famous of which is visibility graph analysis (VGA). VGA treats each observation point in a plan (being the center of each grid square in the isovist field) as a graph node. If a straight line can link any two nodes without passing through a wall (i.e., if the locations are mutually visible), then this is classed as a graph edge. Once the visibility graph is completed, then multiple topological measures are derived from the isovist field (Batty 2001; Turner 2003; Turner and Penn 1999; Turner et al. 2001). Not surprisingly though, the scale of the data developed from VGA has tended to require the development of new methods of

Fig. 3 Visibility maps of a simple architectural plan, colored to represent, from left to right, (a) integration, (b) occlusivity, and (c) area


visualization to present any findings. The most common practice has been to apply color to each grid square to represent high or low values (Fig. 3). Furthermore, while all of the early isovists were constructed by hand and measures were derived from them using a similarly manual process, in the last few decades, all isovist analysis and VGA has been undertaken using software. The most well-known software for this process is UCL Depthmap.
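The visibility graph construction described above (grid-square centers as nodes, mutual visibility as edges) can be sketched as follows. This is an illustrative outline under stated assumptions rather than the method used by any particular software: the plan, the wall segment, and the function names are hypothetical, walls are modeled as straight line segments, and a sight line counts as blocked only when it properly crosses a wall, so grazing contact with a wall end is ignored.

```python
from itertools import combinations


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, 0 or -1."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segment ab properly crosses segment cd (touching is ignored)."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def visibility_graph(points, walls):
    """Link every pair of observation points whose sight line crosses no wall."""
    graph = {p: set() for p in points}
    for p, q in combinations(points, 2):
        if not any(segments_cross(p, q, w0, w1) for w0, w1 in walls):
            graph[p].add(q)
            graph[q].add(p)
    return graph


# Hypothetical plan: a 4 x 2 grid of unit squares with one internal partition.
walls = [((2.0, 0.0), (2.0, 1.2))]
points = [(x + 0.5, y + 0.5) for x in range(4) for y in range(2)]

graph = visibility_graph(points, walls)

# Connectivity of each grid square: how many other squares it can see.
connectivity = {p: len(neighbours) for p, neighbours in graph.items()}
print(connectivity)
```

Once the graph exists, coloring each grid square by a value such as `connectivity` gives a simple version of the visualization practice described above.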

Isovist Measures and Mathematics

There are two types of isovist measures, “metric” (sometimes called “real”) and “statistical.” The former are derived from the properties of the isovist itself, whereas the latter arise from its method of construction. This section describes several important measures from each type. An individual or local isovist potentially has a large number of metric properties. For example, the first measured property was area (A), which is one of the most basic descriptions of the amount of a floor plan that is visible from a location. However, while area is relatively straightforward to calculate, it provides no information about the shape of the isovist. For this reason, the perimeter (P) length was soon identified as one means of differentiating shapes with similar areas but different boundary conditions. These two are frequently combined in the area:perimeter ratio (A:P) to assist in comparing isovists (Conroy 2001).

$$\text{Area:perimeter ratio} = \frac{\text{isovist area}}{\text{isovist perimeter}}$$

In an isovist, the perimeter has both a number of edges (Pol#) and a number of changes in edge direction, known as jaggedness (J). Moreover, the perimeter can be made up of two types of edges, “boundaries,” being edges which correspond with solid surfaces, and “occluded radials,” which are edges that result from part of the environment being obscured by a closer object. These radials are also not congruent with a physical barrier and therefore change as observation points change. From these measures, a range of secondary ratios have been developed for architectural analysis, including occluded/perimeter ratio (O:P) and average occluded length/area ratio ($R_{O(A)}$:A). Stamps (2005) argues that the mathematical measures concavity (Con) – which he also calls convex deficiency – and circularity (Circ) are especially useful for comparing isovists. Circularity compares the squared perimeter of the isovist with its area.

$$\text{Circularity} = \frac{(\text{isovist perimeter})^2}{4 \times \pi \times \text{isovist area}}$$
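As a worked illustration of the two ratios above, the following Python sketch computes them for an isovist supplied as an ordered list of polygon vertices. The function names and the rectangular example are hypothetical, and the area is obtained with the standard shoelace formula, which assumes a simple (non-self-intersecting) polygon.

```python
from math import hypot, pi


def polygon_area(vertices):
    """Area of a simple polygon from its ordered (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def polygon_perimeter(vertices):
    """Sum of the edge lengths, including the closing edge."""
    n = len(vertices)
    return sum(
        hypot(vertices[(i + 1) % n][0] - vertices[i][0],
              vertices[(i + 1) % n][1] - vertices[i][1])
        for i in range(n)
    )


def area_perimeter_ratio(vertices):
    """A:P, a scaled measure (its units are those of length)."""
    return polygon_area(vertices) / polygon_perimeter(vertices)


def circularity(vertices):
    """Perimeter squared over (4 x pi x area); dimensionless."""
    return polygon_perimeter(vertices) ** 2 / (4.0 * pi * polygon_area(vertices))


# Hypothetical isovist: a 4 m x 3 m rectangular room seen from any interior point.
isovist = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(polygon_area(isovist), polygon_perimeter(isovist))
print(area_perimeter_ratio(isovist), circularity(isovist))
```

With this form of the circularity formula a perfect circle scores exactly 1, and less compact or more jagged polygons score higher.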

Three additional metric measures are associated with the visual “enticement” embodied in a space. Drift magnitude (DrM) is the distance from the observation point, d, to the “center of gravity,” c, of the isovist polygon, where the center of the isovist is calculated as a “polygonal lamina” (Conroy 2001: 154). The distances between the observation point and the center of gravity are measured separately along the x and y axes, and the square root of the sum of the squares of these two differences is the magnitude of the isovist drift.

$$\text{Drift} = \sqrt{(d_x - c_x)^2 + (d_y - c_y)^2}$$
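The drift calculation can be illustrated with a short Python sketch. It assumes the isovist is a simple polygon given as ordered vertices and uses the standard centroid formula for a uniform polygonal lamina. The function names and the example room are hypothetical.

```python
from math import hypot


def polygon_centroid(vertices):
    """Center of gravity (cx, cy) of a simple polygon treated as a uniform lamina."""
    twice_area = 0.0
    sum_x = 0.0
    sum_y = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    # The centroid is the weighted sum divided by 6 x signed area = 3 x twice_area.
    return sum_x / (3.0 * twice_area), sum_y / (3.0 * twice_area)


def drift_magnitude(observer, vertices):
    """Distance from the observation point d to the isovist's center of gravity c."""
    cx, cy = polygon_centroid(vertices)
    return hypot(observer[0] - cx, observer[1] - cy)


# Hypothetical isovist: a 4 m x 2 m room viewed from a point 1 m from its left wall.
isovist = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(polygon_centroid(isovist))
print(drift_magnitude((1.0, 1.0), isovist))
```

In this example the center of gravity lies at the middle of the room, so the drift is simply the distance from the observer to that midpoint; an observer standing at the midpoint would have a drift of zero.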

Unlike these metric measures for isovists, to understand the origins of the statistical measures, we have to explain how isovists are constructed. There are primarily two methods for constructing isovists, the second of which generates the statistical measures (Ostwald and Dawes 2013a). The first method commences by drawing lines on a plan from the location of the viewer to every vertex (corner) in the plan. If any of these lines cross the edges of the plan, they are deleted (because they have connected to points in space which are obscured). The locations where the remaining lines intersect with surfaces are then identified, and all of these are connected by edges, creating a polygon. While a few steps have been left out of this explanation, it can be used to quickly generate a viable isovist. The second method is much more computationally intensive, but it has a beneficial side effect. It commences by projecting a radial (a line) on a plan, from the location of the viewer at 0◦ until it meets a surface. Then the next radial is projected from the location of the viewer with, for example, an angle of 1◦ until it meets the surface. The end points of the first two radials are then connected. Thereafter, 358 more radials are projected, each 1◦ more than the previous radial, and their ends joined creating an isovist. This second method has the advantage that each radial has a length, and these lengths, 360 of them in this example, can be analyzed statistically. Thus, both basic – mean, median, and standard deviation – and more advanced statistical measures, kurtosis (K) and entropy (En), can be used to characterize isovists. Benedikt (1979) proposes that the statistical measures variance and skewness can also be used to quantify the dispersion of the perimeter around the observation point and the asymmetry of the isovist polygon. Variance (M2 ) is the second moment about the mean of the radials, and skewness (M3 ) is the third moment about the mean of the radials.   
$$M_2 = \frac{1}{N} \sum_{i=1}^{N} (r_i - \mu)^2$$

Skewness relies on a similar formula to variance, although the radial difference calculations are cubed rather than squared. A few isovist measures are also effectively both metric and statistical. For example, the longest radial length (RL(L)) in a set is necessarily the furthest visible distance, which is also a metric measure, as too is the shortest radial length (RL(S)), being the closest surface. Furthermore, while most isovist measures are described as either “metric” or “statistical,” in practice there are two other ways of classifying these measures and their applications: “scaled” and “scale-free.” Scaled
measures possess an absolute value, for example, length in meters or area in square meters. Isovist area, perimeter, and longest and shortest radials are all examples of scaled measures. In contrast, scale-free measures are relativized or normalized in some way, typically through the construction of ratios or proportions. Occluded to perimeter ratio is a scale-free measure, and some basic numerical results, including jaggedness and number of occluded radials, are also treated as scale-free.
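The second construction method and the moment formulas can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the procedure of any particular isovist software: the function names (`ray_segment_distance`, `radial_lengths`, `central_moment`), the representation of a plan as a list of wall segments, and the 10 m square test room are all introduced here for the example. It projects 360 radials at 1° increments from a viewpoint, records the distance to the nearest surface for each, and then computes the mean, the second and third moments about the mean (M2 and M3), and the shortest and longest radials.

```python
import math

def ray_segment_distance(origin, angle_deg, segment):
    """Distance along a radial from origin to a wall segment, or None if it misses."""
    ox, oy = origin
    dx = math.cos(math.radians(angle_deg))
    dy = math.sin(math.radians(angle_deg))
    (x1, y1), (x2, y2) = segment
    ex, ey = x2 - x1, y2 - y1
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:
        return None  # radial is parallel to the wall
    t = ((x1 - ox) * ey - (y1 - oy) * ex) / denom  # distance along the radial
    u = ((x1 - ox) * dy - (y1 - oy) * dx) / denom  # position along the wall, 0..1
    if t > 1e-9 and -1e-9 <= u <= 1 + 1e-9:
        return t
    return None

def radial_lengths(origin, walls, n_radials=360):
    """Length of each radial, measured to the nearest surface it meets."""
    lengths = []
    for k in range(n_radials):
        angle = 360.0 * k / n_radials
        hits = [d for d in (ray_segment_distance(origin, angle, w) for w in walls)
                if d is not None]
        if not hits:
            raise ValueError("radial at %.1f degrees meets no surface" % angle)
        lengths.append(min(hits))
    return lengths

def central_moment(values, k):
    """k-th moment about the mean: k=2 gives variance (M2), k=3 gives M3."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** k for v in values) / len(values)

# Hypothetical plan: a closed 10 m x 10 m room, viewer at its centre.
room = [((0, 0), (10, 0)), ((10, 0), (10, 10)),
        ((10, 10), (0, 10)), ((0, 10), (0, 0))]
radials = radial_lengths((5, 5), room)

mean_length = sum(radials) / len(radials)   # average radial length
m2 = central_moment(radials, 2)             # variance
m3 = central_moment(radials, 3)             # third moment (skewness in the text's sense)
shortest, longest = min(radials), max(radials)
```

For a viewer at the centre of a square room, the shortest radial equals half the side length and the longest equals half the diagonal, which gives a quick check on the ray casting. All of these values are scaled measures in the sense described above, since they carry units of length or powers of length.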

Application
Isovists are often considered part of the “space syntax” set of theories and techniques for analyzing architectural and urban spaces. For this reason, they are often linked to the assessment of various social and cognitive properties of architectural plans. For example, isovists and isovist fields have been used for a range of purposes since the 1970s, including studies of spatial cognition and wayfinding (Braaksma and Cook 1980; Conroy 2001; Meilinger et al. 2009), accessibility (Turner et al. 2001), spatial structure (Psarra 2005; Tzortzi 2004; Zamani and Peponis 2013), social structure (Markhede and Koch 2007), and object display (Stavroulaki and Peponis 2003). Isovists and isovist fields have also been used to analyze the properties of particular architects’ works (Choudhary et al. 2007; Peponis and Bellal 2010) and the experience of historic buildings. For example, isovist fields have been used to test several theories about the spatio-visual experience of movement through the architecture of Frank Lloyd Wright (Dawes and Ostwald 2014a, b; Ostwald and Dawes 2013b). In a similar way, VGA has been used to examine the paradoxical sense of mystery and transparency people observe when walking through traditional Chinese private gardens (Yu et al. 2016). Research using isovists, fields, and VGA also identifies a close correlation between several mathematical measures and human behaviors and perceptions. For example, it has been demonstrated that human perceptions of the “most visible” and “most hidden” locations in a building can be directly correlated to the largest and smallest isovist areas in a field (Conroy-Dalton and Bafna 2003; Wiener and Franz 2005). Human perceptions of spatial exposure and enclosure also correlate closely with several local isovist measures (Dosen and Ostwald 2016). Global measures derived from VGA have also been adapted for the analysis and prediction of pedestrian movement (Desyllas and Duxbury 2001; Turner and Penn 1999; Turner et al. 2001). Because pedestrian behavior has a significant impact on several social factors, including crime rates, rental returns, spatial occupation, and social encounters, isovists can be used to optimize space for a range of purposes (Desyllas 2000; Hillier and Shu 2000). Insights into social interaction within office buildings and spatial use in urban plazas have also been supported by VGA (Bada and Farhi 2009; Steen and Markhede 2010). While Wiener and Franz (2005) demonstrate that people have an innate capacity to assess space in terms of planar geometry, Stamps (2006, 2008) takes this suggestion further by directly comparing isovist geometry with environmental preference. In particular, he suggests that the isovist measure
skewness might be used to identify spaces that satisfy the human psychological need for shelter (Stamps 2005). In the last two decades, it has been proposed that correlations exist between specific isovist measures and human responses or behaviors. Some of these connections are statistically significant, others are logical but untested, and the remainder is less compelling. Nevertheless, it is possible to group the most commonly used isovist measures into five sets that have been linked to patterns of human perceptions and responses. The first two sets feature isovist measures that relate to perceptions of exposure and enclosure. This category includes some of the most consistent and powerful perceptions identified in spatial psychology and has some of the best evidence of connections to isovist measures. This is because spaces that are larger or more open provide ideal opportunities for surveillance and outlook (often called “prospect” spaces), whereas those which are small or enclosed tend to be perceived as hidden and safe (often called “refuge” spaces). Because the properties of exposure and enclosure are reciprocal – that is, the more open a space feels, the less enclosed it feels and vice versa – the first set features measures which can be an indicator of either perception (Table 1). Thus, isovist area, if relatively small, can be an indicator of feelings of enclosure, safety, or refuge, whereas if this measure is large, it can be linked to perceptions of exposure and risk. But there are also isovist measures that seem to encapsulate, in a single measure, the balance of exposure and enclosure in a space (Table 2). Such measures include variance, skewness, and kurtosis, the last of which is an indicator of the extent to which a peak in a frequency-distribution curve is exaggerated. These measures combine the frequency of close surfaces (which potentially support feelings of enclosure) and distant surfaces (relating to exposure) in a single figure. The next two categories of correlations between isovist measures and human perceptions have been tested in various ways, but the evidence is typically more limited. Despite this, they are still useful in architectural design and analysis for examining the potential properties of space. The third set (Table 3) includes measures that have been associated with perceptions of mystery or spatial ambiguity. These measures rely on occlusion, which occurs when vision is not constrained by a simple surface but by the edge of a surface which hides something from view. For example, it has been argued that the higher the level of occlusion in an isovist perimeter, the greater the sense of mystery. The fourth set (Table 4) features isovist measures – like jaggedness and entropy – that have been connected to perceptions of spatial complexity or intricacy, potentially leading to feelings of confusion. Finally, empirical studies examining behaviors associated with enticement and discovery have found potential correlations with several drift measures (Table 5).

Table 1 Isovist measures associated with perceptions of exposure or enclosure, classified by use (scaled/scale-free) and type (metric/statistical)

Isovist measure (units)          Abbreviation
Area (m2)                        A
Perimeter (m)                    P
Shortest radial length (m)       RL(S)
Average radial length (m)        RL(A)
Longest radial length (m)        RL(L)
Convex deficiency                Con
Circularity                      Circ
Area/perimeter ratio             A:P
Elongation – Γ                   El(Γ)
Elongation – Ψ                   El(Ψ)

Table 2 Isovist measures associated with perceptions of openness and enclosure, classified by use (scaled/scale-free) and type (metric/statistical)

Isovist measure (units)          Abbreviation
Std dev of radial lengths        RL(SD)
M2 – variance                    M2
M3 – skewness                    M3
M4                               M4
Kurtosis                         K

Table 3 Isovist measures associated with perceptions of mystery, classified by use (scaled/scale-free) and type (metric/statistical)

Isovist measure (units)          Abbreviation
Occlusivity (m)                  O
Number of occluded radials       RO(#)
Average occluded length (m)      RO(A)
Occluded/perimeter ratio         O:P
Average occluded length/area     RO(A):A

Table 4 Isovist measures associated with perceptions of complexity, classified by use (scaled/scale-free) and type (metric/statistical)

Isovist measure (units)          Abbreviation
Entropy (bits) 1 mm              Ent(1mm)
Entropy (bits) 100 mm            Ent(100mm)
Number of polygon edges          Pol#
Jaggedness                       J

Table 5 Isovist measures associated with perceptions of enticement, classified by use (scaled/scale-free) and type (metric/statistical)

Isovist measure (units)          Abbreviation
Drift (m)                        Dr
Area in directed view cone       T(A)
% of total area in view cone     VC%

Conclusion
The isovist encapsulates “an intuitively attractive way of thinking about a spatial environment,” because it offers “a description of the space ‘from inside’, from the point of view of individuals, as they perceive it, interact with it, and move through it” (Turner et al. 2001: 103). Isovists are one of the rare mathematical and computational methods in architecture which provide a “beholder-centered perspective” that can be used to model the “properties of space that are relevant for spatial behavior and experience” (Wiener and Franz 2005: 44). It is the existence of empirical evidence to support the use of isovist measures which has also, ironically, limited its development and use primarily to two dimensions. Traditional isovist analysis and VGA abstract the environment into a horizontal plane and a two-dimensional polygon. Such an isovist is incapable of differentiating between the spatial experiences of standing under a low or high ceiling and is unable to document visibility up or down staircases or across multilevel voids. For this reason, several attempts have been made to develop rules and methods for examining three-dimensional isovists and visibility graphs (Yang et al. 2007). Three-dimensional isovists have been utilized to study local properties, such as spatial openness (Fisher-Gewirtzman et al. 2003), wayfinding (Bhatia et al. 2013), and the effects of changing urban forms (Wong et al. 2012; Yang et al. 2007). However, while with current computational power it is feasible to undertake three-dimensional isovist analysis (Bhatia et al. 2013; Morello and Ratti 2009), the lack of standard methodologies for constructing and using them, coupled with limited evidence of their correlation to human behavioral patterns, means that most practical applications continue to use only two-dimensional isovists.

Cross-References
Space Syntax: Mathematics and the Social Logic of Architecture

References
Bada Y, Farhi A (2009) Experiencing urban spaces: isovists properties and spatial use of plazas. Courrier Savoir 9:101–112
Batty M (2001) Exploring isovist fields: space and shape in architectural and urban morphology. Environ Plan B: Plan Des 28(1):123–150
Benedikt ML (1979) To take hold of space: isovists and isovist view fields. Environ Plan B: Plan Des 6(1):47–65
Bhatia S, Chalup SK, Ostwald MJ (2013) Wayfinding: a method for the empirical evaluation of structural saliency using 3D isovists. Archit Sci Rev 56(3):220–231
Braaksma JP, Cook JW (1980) Human orientation in transportation terminals. Transp Eng J 106(2):189–203
Choudhary R, Heo Y, Bafna S (2007) A study of variations among Mies’s courtyard houses by a combined set of all visual and environmental properties. In: Proceedings, 6th international space syntax symposium, Istanbul, 2007, pp 096.01–096.08
Conroy R (2001) Spatial navigation in immersive virtual environments. Dissertation. University of London
Conroy-Dalton R, Bafna S (2003) The syntactical image of the city: a reciprocal definition of spatial elements and spatial syntaxes. In: Proceedings, 4th international space syntax symposium, London, 2003, pp 59.1–59.22
Davis LS, Benedikt ML (1979) Computational models of space: isovists and isovist fields. Comput Graph Image Process 11(1):49–72
Dawes MJ, Ostwald MJ (2014a) Testing the ‘Wright Space’: using isovists to analyse prospect-refuge characteristics in Usonian architecture. J Archit 19(5):645–666
Dawes MJ, Ostwald MJ (2014b) Prospect-refuge theory and the textile-block houses of Frank Lloyd Wright: an analysis of spatio-visual characteristics using isovists. Build Environ 80:228–240
De Floriani L, Marzano P, Puppo E (1994) Line-of-sight communication on terrain models. Int J Geogr Inf Syst 8:329–342
Desyllas J (2000) The relationship between urban street configuration and office rent patterns in Berlin. Dissertation. University College London
Desyllas J, Duxbury E (2001) Axial maps and visibility graph analysis. In: Proceedings, space syntax 3rd international symposium. Georgia Institute of Technology, Atlanta
Dosen A, Ostwald MJ (2016) Lived space and geometric space: comparing people’s perceptions of spatial enclosure and exposure with metric room properties and isovist measures. Archit Sci Rev 60(1):62–77
Fisher-Gewirtzman D, Burt M, Tzamir Y (2003) A 3-D visual method for comparative evaluation of dense built-up environments. Environ Plan B: Plan Des 30(4):575–587
Gallagher GL (1972) A computer topographic model for determining intervisibility. In: Brock P (ed) The mathematics of large scale simulation. Simulation Councils, La Jolla, pp 3–16
Gibson JJ (1966) The senses considered as perceptual systems. Houghton Mifflin Company, Boston
Gibson JJ (1979) The ecological approach to visual perception. Houghton Mifflin Company, Boston
Hillier B, Shu S (2000) Crime and urban layout: the need for evidence. In: Ballintyne S, Pease H, McLaren V (eds) Secure foundations: key issues in crime prevention, crime reduction and community safety. Institute for Public Policy Research, London, pp 224–248
Lynch K (1976) Managing the sense of region. MIT Press, Cambridge, MA
Markhede H, Koch D (2007) Positioning analysis: social structures in configurative modelling. In: Proceedings, 6th international space syntax symposium, Istanbul, 2007
Meilinger T, Franz G, Bulthoff H (2009) From isovists via mental representations to behaviour: first steps toward closing the causal chain. Environ Plan B: Plan Des 39(1):1–16
Morello E, Ratti C (2009) A digital image of the city: 3D isovists in Lynch’s urban analysis. Environ Plan B: Plan Des 36(5):837–853
Ostwald MJ, Dawes MJ (2013a) Using isovists to analyse architecture: methodological considerations and new approaches. Int J Constr Environ 3(1):85–106
Ostwald MJ, Dawes MJ (2013b) Prospect-refuge patterns in Frank Lloyd Wright’s prairie houses: using isovist fields to examine the evidence. J Space Syntax 4(1):136–159
Peponis J, Bellal T (2010) Fallingwater: the interplay between space and shape. Environ Plan B: Plan Des 37(6):982–1001
Psarra S (2005) Spatial culture, way-finding and the educational message. In: Macleod S (ed) Reshaping museum space: architecture, design, exhibitions. Routledge, London, pp 78–94
Steen J, Markhede H (2010) Spatial and social configurations in offices. J Space Syntax 1(1):121–132
Smardon RC, Palmer JF, Felleman JP (1986) Foundations for visual project analysis. Wiley, New York
Stamps AE (2005) Isovists, enclosure, and permeability theory. Environ Plan B: Plan Des 32(5):735–762
Stamps AE (2006) Interior prospect and refuge. Percept Mot Skills 103(3):643–653
Stamps AE (2008) Some findings on prospect and refuge I. Percept Mot Skills 106(1):147–162
Stavroulaki G, Peponis J (2003) The spatial construction of seeing at Castelvecchio. In: Proceedings, 4th international space syntax symposium, London, 2003
Tandy CRV (1967) The isovist method of landscape survey. In: Murray HC (ed) Methods of landscape analysis. Landscape Research Group, London, pp 9–10
Turner A (2003) Analysing the visual dynamics of spatial morphology. Environ Plan B: Plan Des 30:657–676
Turner A, Penn A (1999) Making isovists syntactic: Isovist integration analysis. In: Proceedings, space syntax second international symposium, Brasilia, 1999
Turner A, Doxa M, O’Sullivan D, Penn A (2001) From isovists to visibility graphs: a methodology for the analysis of architectural space. Environ Plan B: Plan Des 28(1):103–121
Tzortzi K (2004) Building and exhibition layout: Sainsbury Wing compared with Castelvecchio. Archit Res Q 8(2):128–140
Wiener J, Franz G (2005) Isovists as a means to predict spatial experience and behavior. In: Freksa C, Knauff M, Krieg-Brückner B, Nebel B, Barkowsky T (eds) Spatial cognition IV: reasoning, action, interaction, Lecture Notes in Computer Science, vol 3343. Springer, Heidelberg, pp 42–57
Wong ASW, Chalup SK, Bhatia S, Jalalian A, Kulk J, Nicklin S, Ostwald MJ (2012) Visual gaze analysis of robotic pedestrians moving in urban space. Archit Sci Rev 55(3):212–223
Yang P, Putra SY, Li W (2007) Viewsphere: a GIS-based 3D visibility analysis for urban design evaluation. Environ Plan B: Plan Des 34(6):971–992
Yu R, Gu N, Ostwald MJ (2016) The mathematics of spatial transparency and mystery: using syntactical data to visualise and analyse the properties of the Yuyuan Garden. Vis Eng 4(4):1–9
Zamani P, Peponis J (2013) Co-visibility and pedagogy: innovation and challenge at the high museum of art. J Archit 15(6):853–879

Fractal Dimensions in Architecture: Measuring the Characteristic Complexity of Buildings

55

Michael J. Ostwald and Josephine Vaughan

Contents
Introduction
Background
The Box-Counting Method in Architecture
  Stage 1: Data Preparation
  Stage 2: Data Representation
  Stage 3: Data Preprocessing
  Stage 4: Data Processing
Application
Conclusion
Cross-References
References

Abstract
In architectural research, debates about the development, function, or appropriateness of building forms have traditionally been dominated by qualitative approaches. These have been common in the past because the full geometric complexity of a building has proven difficult to encapsulate in any single measurement system. Even simple buildings may be made up of many thousands of separate changes in geometry, which combine together across multiple scales to create a habitable or functional structure. However, since the 1990s architectural scholars have begun to adopt one particular method for mathematically examining the form of a building. This method relies on fractal dimensions, which are measures of the characteristic complexity of an image, object, or set. This chapter introduces fractal dimensions and the primary method used to measure them in architecture, the box-counting approach. The chapter describes key methodological variables and limits that are pertinent to its application in architecture, and then it summarizes the results of past research using this approach. The paper concludes with a tabulated set of typical fractal dimension ranges for sets of plans and elevations of designs by 11 famous architects or practices.

M. J. Ostwald
UNSW Built Environment, University of New South Wales, Sydney, NSW, Australia

J. Vaughan
The University of Newcastle, Newcastle, NSW, Australia

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_12

Keywords
Fractal dimension · Architecture · Design · Box-counting method · Measurement · Assessment

Introduction
This chapter is about a way of using mathematics to measure, analyze, and compare the dimensional properties of buildings. The particular measure that is its focus is the fractal dimension, and in architectural research it is typically developed using the box-counting approach. Applications of fractal dimension analysis in architecture have been growing in number and sophistication since the 1990s, and the findings of these studies have been used to challenge a number of views about architectural history, theory, and design. At the heart of this approach to the mathematical and computational analysis of architecture is a paradigm shift that occurred in the 1970s, challenging the world to think differently about dimensions. When – in practical terms – we discuss the geometry of the world, we typically describe objects as either two- or three-dimensional. For example, walls are usually described as two-dimensional surfaces, because they are generally flat and can be delineated primarily in terms of just two measures, width and height. Similarly, most people would describe a car as a three-dimensional object, because it can be characterized in terms of three measures, length, width, and height. While both of these descriptions might be reasonable in a general sense, a rigorous approach to the issue of dimensionality reveals that the reality is more complicated. Consider a typical wall of a rectilinear room. The surface of this wall might be exposed brick, timber-lined, cement rendered, or plastered, but from a distance most of these materials will appear relatively smooth. Even the joints in the bricks and between the timber panels – that we intuitively know are not completely smooth – still coalesce into a single, flat plane when viewed from a distance. But if we move slightly closer to the wall, we can see that its surface has pattern, texture, and granularity. If we then get close enough to touch the wall, we can see levels of unevenness in the surface, and we can feel that the wall has an innate texture or degree of roughness. If we then examine this wall through a magnifying glass, a network of plateaus and fissures is revealed, and, under a microscope, the wall’s
surface is transformed into a minute landscape of mountain ranges and crevasses, none of which could be considered flat, smooth, or two-dimensional. All of which begs the question, is this wall two-dimensional or not? Objects in the real world – as opposed to those in theoretical mathematics or computer simulations – actually have complex dimensional properties that change depending on the scales at which they are examined. Because of this, mathematicians and scientists have suggested that it might be more meaningful to ask: what is the average, typical, or characteristic dimensionality of an object? But before answering this question, an interesting conceptual problem occurs. If we average multiple values, surely we are likely to produce a result that is a non-integer or fraction. For example, if we return to the problem of the wall, logic would suggest that if its smoothest part is approximately two-dimensional, what a mathematician would describe as D ≥ 2, and the roughest part is higher than this, but less than three-dimensional, or D ≤ 3, then the average must be a non-integer somewhere between 2 and 3. Again, logic would suggest that if the wall was mostly quite smooth, this average might be closer to two, say D = 2.001, while if the wall was coarsely finished or strongly textured, the number would be higher but still less than D = 3. If we had a method that allowed us to rigorously compare the roughness of the wall across different scales, we could compute the average dimensionality of each wall surface. Such methods do exist (as the following sections reveal), and using them we can measure the surface geometry of various materials, discovering, for example, that a rough plaster wall might have an average dimension of around D = 2.015, while the brick surface is potentially closer to D = 2.035, and the cement render and timber lining are between these two. 
Thus, all four materials are on average much closer to being two-dimensional than three-dimensional, but they are also all dimensionally different if we adopt a rigorous approach to measuring them. A further interesting characteristic of this way of viewing the world is that some of these wall surfaces possess a relatively consistent degree of irregularity over multiple scales, while others do not. Thus, not all averages are necessarily informative in isolation, and as with most statistical approaches to the world, additional indicators can assist in interpreting the characteristic dimensions of objects. For example, a car also has multiple different dimensional properties depending on the scale at which it is viewed, although, on average, most of these are potentially higher than those of the wall. But again, conceptually, the typical car does not fill a perfect three-dimensional volume of space, and thus its mean dimensionality will be less than this (D < 3). We could, for example, calculate that the characteristic or average dimensionality of the car might be D = 2.650, but whereas some of the walls we considered earlier were relatively consistent in their spread of dimensions, the car is much less so. As such, knowing the characteristic complexity of an object, like a wall or a car, is interesting, but to be useful or significant, we require additional information, or the results must be compared in some way with other data or indicators. While this way of viewing the world in terms of “average roughness” or, as it is more commonly known, “characteristic complexity” might seem strange at first, it has been adopted in many fields as a valuable analytical approach. In engineering,
medicine, botany, astronomy, geology, geography, and metallurgy, it has been used to compare, classify, and study a range of systems. Most often this approach is used to differentiate between classes of objects and then correlate these with various known or theorized properties or behaviors. For example, various types of lung disease produce changes in the texture of the bronchi and alveoli. By calculating the average distribution of roughness in an MRI image of a lung, it may be possible to make a preliminary diagnosis of an underlying medical problem. Similarly, by comparing the different dimensions of rust patterns in the joints of a bridge, the risk that this structure may fail or collapse can also be considered. In these examples the dimensions themselves are not innately significant; they are a symptom or condition of something else – which through careful correlation, comparison, or reasoning can be very revealing. This is also the case in architecture and urban design where, since the early 1990s, a growing number of scholars have begun to measure the characteristic complexity of buildings, cities, and landscapes. Through these processes, a large number of studies have been undertaken into the way architects and designers work, the development of particular architecture styles, the impacts of local materials on buildings, social inhabitation patterns, and even psychological or behavioral responses to space and form. The key, in many of these studies, has been the capacity to measure and analyze the dimensional properties of architectural form. As previously stated, the primary method used for this purpose is fractal dimension analysis, employing the box-counting approach. This chapter introduces both, before describing various architectural studies that have examined the fractal dimensions of buildings. 
While very similar methods have been used to analyze urban growth patterns, street networks, and parks, the majority of the examples (and the methods) described in this chapter are drawn solely from architectural research. Furthermore, most applications of fractal dimension analysis, and not just in architecture, are focused on representational images (elevations, plans, and sections) rather than objects in their entirety. The reasons for this are relatively straightforward. The methods available for analyzing images are well understood, have been used for several decades, and are widely accepted. The methods for analyzing objects in three dimensions are much more computationally intensive, still relatively poorly understood (especially in terms of their accuracy and limits), and are not commercially available for most researchers.

Background
The Latin word frangere means to fragment or break, and its past participle, fractus, provides the basis for the mathematical term “fraction.” A fraction is a part or fragment of a whole, and in mathematics it is commonly explained as resulting from the process of dividing one number (the “numerator”) by another number (the “denominator”). When the mathematician Benoit Mandelbrot wanted a term
to describe the complexity of a range of fragmented geometric systems, he drew inspiration from the words frangere, fractus, and fraction, to coin the term “fractal.” Mandelbrot (1977) initially used the word fractal to refer to two related, but different, concepts. The first is a type of infinitely deep and recurring geometric pattern. This is the most famous of the two concepts, and images of fractal geometric forms were soon adopted in popular culture. The second concept is associated with a kind of irregular dimensionality that defies conventional classification of objects as one-, two-, or three-dimensional. In his early research, Mandelbrot blurred the distinction between these two – fractal geometry and fractal dimensions – because the special type of mathematically generated patterns he was interested in also had particular, corresponding types of irregular dimensionality. However, as his research developed over the following decades, he realized that fractal geometry and fractal dimensions are not necessarily so intrinsically related (Feder 1988; Mandelbrot 1982). In particular, he observed that some geometric patterns (like the Golden Section spiral), which conform to the mathematical definition of a fractal geometric form, are so trivial as to be irrelevant. Conversely, many irregular forms (including the vast majority of those found in nature and all of those constructed by humankind) do not conform to any of the mathematical properties of fractal geometry, but their dimensions are of great significance for understanding the world. For Mandelbrot, any image, object, or set may have a fractal dimension, but only those with a defined mathematical formula and repetitious structure (known as the “scaling” pattern) can be instances of fractal geometry. This distinction is significant for architectural analysis, where the two are often conflated or confused, leading to a large number of spurious claims about fractals and architecture. 
From a mathematical perspective, buildings may have fractal dimensions, but they are never, in the physical world at least, examples of fractal geometry (Ostwald 2001, 2003). Technically, buildings (and forms in nature) are actually “multi-fractals,” a class of objects that have multiple simultaneous dimensions but which do not follow a precise mathematical formula. This chapter is about fractal dimensions, not fractal geometry, although the origins of the two are closely intertwined. Mathematicians and scientists have identified seven different approaches to measuring the fractal dimensions of images (where 1.0 < D < 2.0) or objects (where 2.0 < D < 3.0) or equivalent data sets. The first two are the box-counting method and the “differential box-counting” method, both of which are often traced to the work of mathematician Richard Voss (1986, 1988). The next two are the “power spectrum” and the “power differentiation” methods. The remainder are the “Kth nearest neighbor,” “covering blanket,” and “difference statistics” methods. These seven will all produce relatively accurate fractal dimension calculations for images or sets with midrange complexity (say 1.2 < D < 1.8). However, in tests the box-counting method has generally been found to be the most accurate (Asvestas et al. 2000; Li et al. 2009). The following section describes its application in architectural image analysis (generally measuring the characteristic complexity of elevations and plans), although the procedure is similar in other fields.


M. J. Ostwald and J. Vaughan

The Box-Counting Method in Architecture
There are four main stages in the architectural application of the box-counting method: data preparation, data representation, data preprocessing, and data processing. The decisions made in each stage can have a substantial impact on the accuracy, usefulness, and repeatability of results. The four stages are described hereafter, and the issues raised in each stage apply to both computational (software-assisted) and manual versions of the method.

Stage 1: Data Preparation
In order to produce a useful fractal dimension calculation, the first step is to prepare an appropriate data source. The box-counting method analyzes and measures black and white line images (or data sets with equivalent binary coding). Some software packages may import a color image for box-counting analysis, but it will always be converted into a black and white line image before processing. For this reason, computational line drawings (CAD images) are the best data sources for the fractal analysis of architecture and photographs are the worst. The problem with photographs is that the software (or the researcher, if using the manual method) must convert the image into a line drawing, and many photographs contain shadows, reflections, vegetation, and other complicating factors. If the goal is to analyze the building, then any of these additional elements in a photograph are highly likely to affect the result.

Stage 2: Data Representation
The next issue relates to representation. What lines should be present in the image being measured? Or phrased in another way, which lines are significant for architectural analysis using fractal dimensions? There is no universal answer to this question, but there is a line of reasoning that provides guidance for researchers. Consider the following examples. First, if the purpose of the fractal analysis is to compare the change in inhabitation patterns in architectural plans over time, there is no need to include any representation of floor textures (tiles, parquetry, timber boards, and the like) on the plan being examined. The geometry of a floor texture has no direct role in shaping social function, but the geometry of the walls does. Second, if the purpose of the research is to analyze the visual impact of proposed, high-level power lines on the character of an historic streetscape, it is irrelevant to try to represent the textures of individual brick patterns on façades. From the distance required to assess the power lines, such textures are not visible. For such a study, it is the silhouette of the streetscape, with and without power lines, that is most valuable for assessing visual impact. These two examples demonstrate that the level of representation must be appropriate for the purpose of the research. The representation chosen should also


be clearly documented and reasoned. A framework to assist in this matter has been developed with five cumulative levels of representation identified (Ostwald and Vaughan 2013). Each level could be conceptualized as an overlay on the one before; thus Level 3 representation assumes that Levels 1 and 2 are already included and visible in the image (Table 1). The example images of level of representation – depicted for Le Corbusier’s arts and crafts style Villa Jaquemet – also include the fractal dimensions for each, demonstrating how the same elevation can have very

Table 1 Levels of data representation mapped against research purpose. (Adapted from Ostwald and Vaughan (2013))

| Level | Representation | Research focus | Research purpose |
|---|---|---|---|
| 1 | Outline of the building | Building skyline or footprint | To consider major social, cultural, or planning trends or issues which might be reflected in large-scale patterns of growth and change in the built environment |
| 2 | + Primary formal modeling of the building | Building massing | To consider issues of building massing and permeability which might be a reflection of social structure, hierarchy, responsiveness (orientation), and wayfinding (occlusion) |
| 3 | + Secondary formal modeling of the building | Building design | To consider general design issues, where “design” is taken to encompass decisions about form and materiality, but to not extend to concerns with applied ornament, fine decoration, or surface texture |
| 4 | + Tertiary forms associated with detail design | Detail design | To consider both general and detail design issues, or where “design” is taken to include not only decisions about form and materiality but also movable or tertiary forms and fixed furniture which directly support inhabitation |
| 5 | + Material texture | Surface finish and ornament | To consider issues associated with the distribution or zoning of texture within a design, or the degree to which texture is integral to design |


Fig. 1 Example of Level 1 representation for an elevation of Le Corbusier’s Villa Jaquemet, D = 1.084

Fig. 2 Example of Level 2 representation for an elevation of Le Corbusier’s Villa Jaquemet, D = 1.348

different dimensional properties, depending on how it is depicted (Figs. 1, 2, 3, 4, and 5).

Stage 3: Data Preprocessing
Once the image or data has been prepared, there are four preprocessing practicalities that should be adopted to minimize errors and improve repeatability.


Fig. 3 Example of Level 3 representation for an elevation of Le Corbusier’s Villa Jaquemet, D = 1.425

Fig. 4 Example of Level 4 representation for an elevation of Le Corbusier’s Villa Jaquemet, D = 1.447

First, the image must be positioned against an appropriately sized field (being the “page” or “background” on which the image is positioned prior to analysis), and second, it should be centrally located in this field. Both of these requirements arise from the way the box-counting method includes the space around the image in the early stages of the analytical process, and past research has shown that poor image placement can have a detrimental impact on results. Third, the image must be represented in the thinnest possible line weights or thicknesses. This is because


Fig. 5 Example of Level 5 representation for an elevation of Le Corbusier’s Villa Jaquemet, D = 1.606

Table 2 Data preprocessing variables and settings. (Adapted from Ostwald and Vaughan 2016)

| Category | Variable | Optimal setting | Notes |
|---|---|---|---|
| Preprocessing | White space | 40–50% increase | 40–50% white space around a starting image produces the most consistent result |
| | Image position | Center-Center | The more centered the image on the field, the more consistent the set of results |
| | Line weight | 1 pt | The thinner the line, the better the result. In practice, all images should be converted into lines of 1 pixel width using edge detection algorithms |
| | Image resolution | 175 dpi | In principle, the higher the resolution and the larger the field, the better the result. In practice, 175 dpi at 2 MB will produce reasonable results |

thicker lines can be read as objects by some software and in the manual method they may be counted more than once when smaller scale analysis is undertaken. Fourth, if using software, the higher the image resolution, the better the result. While some of these variables have relatively minor impacts on the process, others (including line weight) have been shown to create substantial errors in calculations (Table 2).


Stage 4: Data Processing
Once the image has been appropriately prepared, the box-counting process commences by placing a grid over the image and then analyzing each square in the grid to determine if any of the lines of that image are present. The number of boxes with lines in them is then recorded. Thereafter, a smaller grid is overlaid on the same image and the process is repeated, now at a different scale, and the number of boxes with lines in them is again recorded. A mathematical comparison is then constructed between the number of boxes with lines in the first grid ($N(s_1)$) and the number of boxes with lines in the second grid ($N(s_2)$). Such a comparison is made by plotting a log-log diagram ($\log[N(s_\#)]$ versus $\log[1/s_\#]$) for each grid size. The slope of the straight line produced by this comparison is called the box-counting dimension. This value is calculated for a comparison between two grids (# = 1 and # = 2 in this example) as follows:
\[
D_b = \frac{\log N(s_2) - \log N(s_1)}{\log(1/s_2) - \log(1/s_1)}
\]
where:
$N(s_\#)$ = the number of boxes in grid number “#” containing some detail.
$1/s_\#$ = the number of boxes in grid number “#” at the base of the grid.
This process compares just two scales of the image, and as the example of the wall previously in this chapter demonstrates, to produce a better average requires many more. Once the image has been subjected to multiple additional grid overlays and counts, and the results computed, the average slope of the log-log chart is the fractal dimension (D) of the image. In order to produce a result that is accurate to around ±1%, at least ten grid comparisons are needed. However, there are four additional variables that can have an impact on the accuracy of this process. First, the ideal ratio by which each successive grid must reduce has been calculated. Second, the best location for subsequent grids has also been tested. Third and fourth, the ideal dimensions (box-multiples) for the largest and smallest grids have also been identified by researchers. By using these settings, error rates can be minimized (Table 3).
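The procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in the studies cited: the image is assumed to be a set of (x, y) coordinates of black pixels, the helper names (`grid_sizes`, `count_boxes`, `box_counting_dimension`) are hypothetical, and the grid is anchored at the top-left corner. The grid sizes follow the settings referred to in Table 3 (start at 0.25 l, reduce by √2:1, stop at 0.03 l), and the dimension is taken as the least-squares slope of log N(s) against log(1/s).

```python
import math


def grid_sizes(short_side, start=0.25, stop=0.03, ratio=math.sqrt(2)):
    """Box sizes from start*l down to stop*l, shrinking by `ratio` each step."""
    sizes = []
    size = start * short_side
    while size >= stop * short_side:
        sizes.append(size)
        size /= ratio
    return sizes


def count_boxes(pixels, box_size):
    """Number of grid cells (origin at top left) containing at least one pixel."""
    return len({(math.floor(x / box_size), math.floor(y / box_size))
                for x, y in pixels})


def box_counting_dimension(pixels, sizes):
    """Least-squares slope of log N(s) versus log(1/s) over all grids."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(count_boxes(pixels, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy inputs on a 256-pixel field (assumed data, not an architectural drawing):
# a straight line and a filled square.
line = {(x, 0) for x in range(256)}
square = {(x, y) for x in range(256) for y in range(256)}
sizes = grid_sizes(256)

line_dimension = box_counting_dimension(line, sizes)      # close to 1
square_dimension = box_counting_dimension(square, sizes)  # close to 2
```

On these toy shapes the estimates land near 1 and 2 but not exactly on them, because box counts are whole numbers even when the grid size is not. With a 256-pixel short side these settings yield seven grids, so a real analysis aiming for the accuracy quoted above would need a larger field or a different stopping rule.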

Application
Carl Bovill (1996) was responsible for some of the earliest applications of the box-counting method in architecture, and his book Fractal Geometry in Architecture and Design contains a detailed explanation of the manual version of the method. Bovill’s book not only demonstrates the method, but it also offers the earliest fractal dimension measures of the plans and elevations of several famous buildings. However, while Bovill’s work is traditionally regarded as the catalyst for fractal dimension analysis in architecture, more than a decade was to pass


Table 3 Optimal variables and settings for applying the box-counting method. (Adapted from Ostwald 2013)

| Category | Variable | Optimal setting | Notes |
|---|---|---|---|
| Processing variables | Scaling coefficient | 1.4142:1 | √2:1 (or 1.4142:1) produces the best balance between varying levels of white space being included in the calculations while generating enough grids for comparison to achieve a statistically viable data set |
| | Grid disposition | Top left | Edge-growth (top left-hand corner as point of origin) is the optimal setting although the center-growth variable generates results with a similar level of accuracy |
| | Starting grid size | 0.25 l | The short side (l) of the field should be divisible by four (0.25 l) to generate the starting grid proportion and cell size |
| | Closing grid size | 0.03 l | The lowest grid cell size should be 0.03 l where l is the length of the shortest side of the field |

before researchers would develop a sophisticated understanding of the method and its application. Furthermore, it was not until the first software programs were available to measure fractal dimensions that the approach started to produce useful and accurate results. As such, fractal dimension calculations in architecture from the mid-1990s to the late 2000s should be either treated with care or viewed with skepticism. Some of these results were simply limited in their accuracy because they employed manual versions of the method, while others used completely flawed data sources, representation standards, and processing approaches. There are also several examples of architectural researchers drawing false conclusions from their data or simply misunderstanding what the box-counting method does or what fractal dimensions are. In the latter category, there have been multiple examples of researchers assuming that particular fractal dimensions are somehow intrinsically significant. Thus, some researchers sought to find new Fibonacci-style numbers or ratios in an attempt to identify a “golden,” “natural,” or “magical” dimension to which architecture should adhere. A fractal dimension is a measure of the distribution of geometric data in an object; it has no innate spiritual, poetic, or mystical properties. Focusing on the more scholarly applications of fractal dimensions in architecture, historic buildings – especially temples, memorials, and monuments – were among the earliest subjects of this approach. For example, Klaudia Oleschko measured the properties of three Teotihuacan pyramids and six ancient complexes, spanning from


100 BC to 700 AD (Oleschko et al. 2000). Mesoamerican monuments, memorials, and temples have been the subject of multiple similar studies (Burkle-Elizondo 2001; Burkle-Elizondo and Valdez-Cepeda 2001), and the Kandariya Mahadeva, an eleventh-century temple in India, has also been analyzed using dimensional analysis (Rian et al. 2007). Wolfgang Lorenz (2003) and Daniele Capo (2004) examined the fractal dimensions of classical Greek and Roman temples. As part of his research, Lorenz (2003) measured the dimensions of the entry elevations of four Grecian temples from the fourth century BC, including the Treasury of Athens and the Erechtheion. Ottoman architecture of the sixteenth century has been another recurring subject of fractal dimension analysis. Indeed, the earliest example of fractal analysis of architecture was focused on Ottoman housing in Amasya (Bechhoefer and Bovill 1994). Lorenz (2003) repeated this study, with a more accurate result, and it has since been the subject of further investigation (Vaughan and Ostwald 2010). The dimensional properties of a different group of traditional Ottoman houses in the Chora district of Istanbul have also been measured (Cagdas et al. 2005). In the largest application of fractal dimension analysis in architecture, the sixteenth-century Süleymaniye and Kılıç Ali Paşa Mosques in Istanbul were measured and compared using three levels of representation: form, ornament, and materiality (Ediz and Ostwald 2012; Ostwald and Ediz 2015). Across these two connected studies, almost 2,000,000 calculations were completed to measure the dimensional properties of these complex and ornate buildings. Vernacular housing in Europe has also been a subject of fractal dimension analysis, most often for studying the development or propagation of various traditional approaches to design. 
For example, Jadwiga Zarnowiecka (2002) studied traditional architecture in Poland, Laurent Debailleux (2010) analyzed 36 elevations of timber-framed structures in Belgium, and Lorenz (2003) measured 61 elevations of vernacular farmhouses in Italy. The twentieth-century architecture of Frank Lloyd Wright and Le Corbusier has also been a popular subject of fractal dimension analysis. Not only are these architects highly influential, but their designs are so seemingly different that they appeared to provide scholars with valuable test cases for dimensional analysis (Bovill 1996; Lorenz 2003). Bovill compared the D values for the south elevation of Wright’s Robie House with those of the west elevation of Le Corbusier’s Villa Savoye, concluding that the former was in the order of 10% higher or more complex than the latter. In total, 20 of Wright’s houses have been measured, with varying degrees of success and repeatability, using the box-counting method (Wen and Kao 2005; Vaughan and Ostwald 2011; Ostwald and Vaughan 2016). The north elevation of Wright’s 1905 Unity Temple in Chicago has also been the subject of several different studies. As has often been the case in this discipline, when the same data originally analyzed using an early manual version of the method was later remeasured using more refined computational versions, different results were produced. For example, the first measure for this elevation was D = 1.550 (Bovill 1996), the second was D = 1.513 (Lorenz 2003), and the third, D = 1.574 (Vaughan and Ostwald


2010). This is a range of 6.1%, which is relatively close given the methodological developments that took place over the intervening years. It does, however, dramatize the difficulty with combining new and old measures, and this example is one of the closer ones. While the west elevation of the Villa Savoye has been a common test subject, Le Corbusier’s early, pre-Modernist Swiss-chalet style homes have also been examined, as too have his later, Modernist designs (Bovill 1996; Lorenz 2003; Ostwald and Vaughan 2016; Wen and Kao 2005). Other Modernist works which have been examined using this method include designs by Ludwig Mies van der Rohe (Wen and Kao 2005; Ostwald and Vaughan 2016), Eileen Gray (Ostwald and Vaughan 2009), and Gerrit Rietveld and Peter Behrens (Lorenz 2012). Domestic designs by minimalist architect Kazuyo Sejima (Vaughan and Ostwald 2008), regionalist architects Glenn Murcutt and Peter Stutchbury, and late-Modern, avant-garde architects John Hejduk, Peter Eisenman, and Richard Meier have also been measured (Ostwald and Vaughan 2016). A recent study of over 625 plans and elevations of famous houses classified them into low (lower quartile), medium (interquartile range), and high (upper quartile) dimensional ranges (Ostwald and Vaughan 2016). Tables 4 and 5 record the classification of the mean results of the elevations and plans of sets of 5 houses from each of 11 architects or practices (Frank Lloyd Wright, Le Corbusier, Mies van der Rohe, Eileen Gray, Venturi and Scott Brown, Frank Gehry, John Hejduk, Richard Meier, Peter Eisenman, Glenn Murcutt, and Kazuyo Sejima). All of the designs were constructed between 1901 and 2007, and all of the measures were developed using consistent methodical standards and settings. In the case of architect Frank Lloyd

Table 4 Mean fractal dimension results for the plans of 65 houses by 11 architects or practices, classified by movement and fractal dimension range. (Adapted from Ostwald and Vaughan 2016) Movement Organic Modernism Functionalist Modernism Postmodernism Avant-garde and abstraction Minimalism and regionalism Totals

Lowest quartile μD 0.

with

(15)

and with the hazard function
\[
h(x) = \frac{1}{\beta} \quad \text{with} \quad \beta > 0. \tag{16}
\]

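(2) Weibull Distribution
The CDF of the Weibull distribution is
\[
F(x) = 1 - e^{-(x/\lambda)^{k}} \quad \text{with} \quad x \ge 0,\ \lambda > 0,\ k > 0. \tag{17}
\]
Hence, the survival function can be written as
\[
S(x) = e^{-(x/\lambda)^{k}} \quad \text{with} \quad x \ge 0,\ \lambda > 0,\ k > 0, \tag{18}
\]
with the hazard function
\[
h(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} \quad \text{with} \quad x \ge 0,\ \lambda > 0,\ k > 0. \tag{19}
\]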
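As a quick consistency check on Eqs. (18) and (19), the hazard function should equal the negative derivative of log S(x). The sketch below compares the closed-form Weibull hazard with a central-difference approximation of that derivative. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def weibull_survival(x, lam, k):
    """S(x) = exp(-(x/lam)^k), as in Eq. (18)."""
    return math.exp(-((x / lam) ** k))


def weibull_hazard(x, lam, k):
    """h(x) = (k/lam) * (x/lam)^(k-1), as in Eq. (19)."""
    return (k / lam) * (x / lam) ** (k - 1)


def numerical_hazard(survival, x, step=1e-6):
    """Approximate h(x) = -d/dx log S(x) with a central difference."""
    upper = math.log(survival(x + step))
    lower = math.log(survival(x - step))
    return -(upper - lower) / (2.0 * step)


# Hypothetical scale and shape parameters, for illustration only.
lam, k = 80.0, 6.0
age = 70.0

closed_form = weibull_hazard(age, lam, k)
approximation = numerical_hazard(lambda t: weibull_survival(t, lam, k), age)
```

Setting k = 1 in `weibull_hazard` gives a constant hazard of 1/λ, which is the exponential case.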

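(3) Gamma Distribution
The probability density function (PDF) of the gamma distribution is
\[
f(x) = \frac{\lambda^{\alpha} x^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)} \quad \text{with} \quad x \ge 0,\ \lambda > 0,\ \alpha > 0, \tag{20}
\]
with
\[
\Gamma(\alpha) = \int_{0}^{\infty} t^{\alpha-1} e^{-t}\, dt \quad \text{with} \quad \alpha > 0. \tag{21}
\]
There is no closed-form formula for the survival or hazard function.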
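Because the gamma survival and hazard functions have no closed form, they are usually evaluated numerically. A minimal sketch, assuming α ≥ 1 so that the density is finite at zero, integrates the density in Eq. (20) with composite Simpson's rule. The function names are hypothetical, and a production implementation would more likely call a library routine for the regularized incomplete gamma function.

```python
import math


def gamma_pdf(x, lam, alpha):
    """Gamma density of Eq. (20) with rate lam and shape alpha."""
    return lam ** alpha * x ** (alpha - 1) * math.exp(-lam * x) / math.gamma(alpha)


def gamma_survival(x, lam, alpha, steps=2000):
    """S(x) = 1 - integral of the density over [0, x], by Simpson's rule.

    Assumes alpha >= 1 and an even number of steps.
    """
    if x == 0:
        return 1.0
    width = x / steps
    total = gamma_pdf(0.0, lam, alpha) + gamma_pdf(x, lam, alpha)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * gamma_pdf(i * width, lam, alpha)
    return 1.0 - total * width / 3.0


def gamma_hazard(x, lam, alpha):
    """h(x) = f(x) / S(x)."""
    return gamma_pdf(x, lam, alpha) / gamma_survival(x, lam, alpha)
```

With α = 1 the gamma distribution reduces to an exponential distribution with rate λ, so `gamma_survival(x, lam, 1.0)` should agree with exp(−λx) and `gamma_hazard` should return λ, which gives a simple check on the quadrature.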


P. L. Brockett and Y. Zhang

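(4) Lognormal Distribution
The CDF of the lognormal distribution is
\[
F(x) = \Phi\left(\frac{\log(x) - \mu}{\sigma}\right) \quad \text{with} \quad x \ge 0,\ \sigma > 0, \tag{22}
\]
with Φ(·) being the CDF of the standard normal distribution. Hence, the survival function can be written as
\[
S(x) = 1 - \Phi\left(\frac{\log(x) - \mu}{\sigma}\right) \quad \text{with} \quad x \ge 0,\ \sigma > 0. \tag{23}
\]
The hazard function, being the ratio of the density to the survival function, is in the form of
\[
h(x) = \frac{\phi\left(\frac{\log(x) - \mu}{\sigma}\right)}{x\,\sigma\,S(x)}, \tag{24}
\]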

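with φ being the PDF of the standard normal distribution.

(5) Log-Logistic Distribution
The CDF of the log-logistic distribution is
\[
F(x) = \frac{(\lambda x)^{\kappa}}{1 + (\lambda x)^{\kappa}} \quad \text{with} \quad x \ge 0,\ \lambda > 0,\ \kappa > 0. \tag{25}
\]
Hence, the survival function can be written as
\[
S(x) = \frac{1}{1 + (\lambda x)^{\kappa}} \quad \text{with} \quad x \ge 0,\ \lambda > 0,\ \kappa > 0, \tag{26}
\]
with the hazard function
\[
h(x) = \frac{\lambda \kappa (\lambda x)^{\kappa-1}}{1 + (\lambda x)^{\kappa}} \quad \text{with} \quad x \ge 0,\ \lambda > 0,\ \kappa > 0. \tag{27}
\]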

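(6) Generalized Gamma Distribution
The CDF of the generalized gamma distribution is
\[
F(x) = \frac{\gamma\{\alpha/p, (\lambda x)^{p}\}}{\Gamma(\alpha/p)} \quad \text{with} \quad x \ge 0,\ \alpha > 0,\ \lambda > 0,\ p > 0. \tag{28}
\]
Here,
\[
\gamma(s, x) = \int_{0}^{x} t^{s-1} e^{-t}\, dt \quad \text{with} \quad s > 0. \tag{29}
\]
Hence, the survival function can be written as
\[
S(x) = 1 - \frac{\gamma\{\alpha/p, (\lambda x)^{p}\}}{\Gamma(\alpha/p)} \quad \text{with} \quad x \ge 0,\ \alpha > 0,\ \lambda > 0,\ p > 0. \tag{30}
\]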

61 Actuarial (Mathematical) Modeling of Mortality and Survival Curves


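with the hazard function, obtained as the ratio of the density to the survival function,
\[
h(x) = \frac{p \lambda (\lambda x)^{\alpha-1} e^{-(\lambda x)^{p}}}{\Gamma(\alpha/p) - \gamma\{\alpha/p, (\lambda x)^{p}\}}. \tag{31}
\]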

Stochastic Mortality Model for Individual Mortality Rate
In the past few decades, another approach to modeling individual mortality has emerged: the stochastic mortality model. It has been developed further over time and is increasingly used in research studies. Milevsky and Promislow (2001) proposed the idea of a stochastic mortality model in their paper. Subsequently, many researchers have utilized this model in their research or proposed extended models (e.g. Biffis, 2005; Dahl, 2004, 2005; Luciano et al., 2008). The stochastic mortality model describes the death time of an individual as the first jump time of a doubly stochastic Poisson process (i.e., a Cox process). Since it involves multiple concepts in mathematics and stochastic calculus, we do not give full modeling details here. For detailed information, see, for example, Luciano et al. (2008).

Joint Life Mortality Models
Why Do We Need Joint Life Mortality Models?
A useful extension of the mortality model for individuals is the bivariate mortality model that jointly (simultaneously) considers lives which are joined or bound together by some environmental, marital, or familial relationship. The basic idea of constructing the joint mortality model is that an individual’s mortality could be jointly determined together with other lives for which there is a statistical connection in longevity. The most common example is spouses, whose lifestyles, common hazards, etc. lead to dependent individual rates of mortality. Indeed, empirical studies in actuarial science, finance, and demography present evidence of dependency between joint lives. For example, Nielsen et al. (2018), Seifter et al. (2014), and Jagger and Sutton (1991) find that the risk of mortality of a survivor increases after the individual’s partner dies. Using data on twins, Hougaard et al. (1992) show that the death of an individual in a significant paired relationship affects the surviving partner’s longevity. Manor and Eisenbach (2003) examine how the effects of spousal death on mortality are driven by the duration of bereavement and other individual and family characteristics, including age, gender, educational attainment, and the number of household members. All the above examples show that the mortality rate of an individual can indeed depend on that of his/her spouse/partner. Mortality rates associated with joined lives are of high importance to the insurance and annuity industry, as they guide the product design and pricing for joint and survival


annuities. A joint and survival annuity is an annuity product that guarantees some regular payments (to the annuitant/annuitants) for as long as one of the annuitants is alive (first to die annuity) or until the death of the second annuitant (last survivor annuity). Typically, the payment amount to the second to die will be reduced to half or one third of the original payment once one of the annuitants is dead. This recognizes that it does not cost twice as much for two to live as it costs for one to live. It is common in defined benefit pension plans. In traditional actuarial practice, the joint probability of survival of couples is simplified to be the product of the two individual probabilities of survival, i.e., it assumes independence of the longevities for couples. Although the mathematical models are simpler under the assumption of independent life lengths, this assumption does not reflect the reality that joint lives are actually dependent. This could lead to inaccurate bivariate (joint) life tables. Furthermore, inaccurate estimation of joint mortality rates can lead to miscalculated pricing and inaccurate risk estimates for such insurance and annuity products. Therefore, mortality models that can accurately describe the dependence between joint lives are of substantial use. Since defined benefit pension plans require joint life annuities for annuitants by default, this is practically important. Methods to incorporate dependency into joint life tables include the axiomatic approach of Brockett (1984), who followed the Gompertz-style axiomatic approach of delineating desirable characteristics of models for human hazard (mortality) rates (but in two dimensions) and incorporated a bivariate exponential distribution for joint accidental deaths. Frees et al. (1996) show how to use the copula approach, to be discussed next, to incorporate dependency in mortality.
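In standard actuarial notation (an assumption here, since the chapter does not introduce these symbols), let ${}_t p_x$ and ${}_t p_y$ denote the probabilities that lives aged $x$ and $y$ survive a further $t$ years, ${}_t p_{xy}$ the probability that both survive, and ${}_t p_{\overline{xy}}$ the probability that at least one survives. The independence simplification described above can then be sketched as:

```latex
\[
{}_t p_{xy} = {}_t p_x \cdot {}_t p_y ,
\qquad
{}_t p_{\overline{xy}} = {}_t p_x + {}_t p_y - {}_t p_{xy} .
\]
```

The second identity holds whether or not the lives are independent. Only the first relies on independence, so any error in it carries over to both joint-life and last-survivor probabilities.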

Copula Model
In probability theory, copulas are widely used to model dependence in a joint multivariate probability distribution. Mathematically, suppose that the joint distribution of $N$ random variables, $(X_1, X_2, \cdots, X_N)$, is to be modeled. Here, $(X_1, X_2, \cdots, X_N)$ could be dependent. Assume that the marginal cumulative distribution functions of $(X_1, X_2, \cdots, X_N)$ are continuous functions $F_i$. The probability integral transform $F_i(X_i) = U_i$ yields a uniformly distributed variable over $[0, 1]$, and so in a multivariate setting, $(U_1, U_2, \cdots, U_N) = (F_1(X_1), F_2(X_2), \cdots, F_N(X_N))$ is distributed over the unit cube in $N$ dimensions, with the dependence of the original variables inherited by the joint distribution of the $U_i$. The copula of $(X_1, X_2, \cdots, X_N)$ is defined as the joint CDF of $(U_1, U_2, \cdots, U_N)$, i.e., the copula function $C(u_1, u_2, \cdots, u_N) = \Pr(U_1 \le u_1, U_2 \le u_2, \cdots, U_N \le u_N)$. Table 1 provides several commonly used copulas. Sklar (1973) proved that for any $N$-dimensional joint distribution function $F_{X_1, X_2, \cdots, X_N}(x_1, x_2, \cdots, x_N) = \Pr(X_1 \le x_1, X_2 \le x_2, \cdots, X_N \le x_N)$ with marginal distribution functions for $(X_1, X_2, \cdots, X_N)$ given as


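Table 1 Examples of frequently used copulas

| Name of copula | Formula | Parameters |
|---|---|---|
| Frank copula | $C(u, v) = \frac{1}{\theta} \ln\left(1 + \frac{(e^{\theta u} - 1)(e^{\theta v} - 1)}{e^{\theta} - 1}\right)$ | $\theta \in \mathbb{R} \setminus \{0\}$ |
| Gumbel copula | $e^{-\left((-\log(u))^{\theta} + (-\log(v))^{\theta}\right)^{1/\theta}}$ | $\theta \in [1, +\infty)$ |
| Gaussian copula | $\Phi_N(\Phi^{-1}(u_1), \cdots, \Phi^{-1}(u_N))$ | $\Phi_N$ is an $N$-dimensional normal distribution |
| Clayton copula | $[\max\{u^{-\theta} + v^{-\theta} - 1; 0\}]^{-1/\theta}$ | $\theta \in [-1, +\infty) \setminus \{0\}$ |
| Joe copula | $1 - [(1 - u)^{\theta} + (1 - v)^{\theta} - (1 - u)^{\theta}(1 - v)^{\theta}]^{1/\theta}$ | $\theta \in [1, +\infty)$ |
| Ali-Mikhail-Haq copula | $\frac{uv}{1 - \theta(1 - u)(1 - v)}$ | $\theta \in [-1, 1)$ |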

$(F_1(x_1), F_2(x_2), \cdots, F_N(x_N))$, there always exists a copula function $C : [0, 1]^N \to [0, 1]$ such that $F_{X_1, X_2, \cdots, X_N}(x_1, x_2, \cdots, x_N) = C(F_1(x_1), F_2(x_2), \cdots, F_N(x_N))$. Because of these mathematical attributes, copulas are widely used in statistics and mathematical finance to describe the dependence between random variables (Cherubini et al., 2004; Nielsen, 2007). Specifically, it is straightforward to use a copula to model dependent joint mortality. For example, Frees et al. (1996) exploit a copula formulation to model bivariate dependent joint survival in a large Canadian insurance company dataset. Carriere further illustrates how the selected marginal distributions and the so-called “Frank copula” outperformed other options in his paper (Carriere, 2000). Spreeuw (2006) investigated how mortality between joint lives is correlated using mainly the “Archimedean copula.” Luciano et al. (2008) consider an Archimedean copula and study the dependency of mortality between couples. In order to illustrate how the joint mortality for couples is constructed by copula functions, we take the mortality model introduced by Frees et al. (1996). Here the observed cases are individual life lengths of heterosexual couples; however, same-sex dependence relationships can be modeled in exactly the same way. Since we have male and female lives, $N = 2$ in our model. The first modeling step is to find the marginal distribution for male and female life length. Any of the mortality models we introduced in the last section, or any other proper ones, can be used here. In Frees et al. (1996), the authors chose to use the Gompertz distribution, with some variable transformation, for both male and female, because there are only two unknown parameters to be estimated for each of the genders, and the Gompertz fits mortality data quite well. The second step is to find the desired copula function. In theory, any bivariate distribution function on $[0, 1]^2$ with uniform marginals can be used as a copula function. However, in practice, there are several classic copula functions that are widely used in modeling dependent variables (see Table 1 for some examples). In Frees et al. (1996), the authors chose to use the Frank copula, which is in the form of $C(u, v) = \frac{1}{\alpha} \ln\left(1 + \frac{(e^{\alpha u} - 1)(e^{\alpha v} - 1)}{e^{\alpha} - 1}\right)$. It can be seen from the formulation that independence of life lengths corresponds to the limiting case $\alpha \to 0$. Once the desired distribution functions have been chosen, parameter estimation can be


done through maximum likelihood estimation (MLE). In Frees et al. (1996), the authors found that the best estimate for α is −3.367, which is quite different from 0, indicative of dependent lives. Also, the calculated Spearman’s correlation based on α is 0.49. Both numbers indicate dependence between members of a couple.
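The link between the Frank copula parameter and Spearman's correlation can be illustrated numerically. The sketch below, which uses hypothetical function names and is not the estimation procedure of Frees et al. (1996), evaluates the Frank copula in the sign convention used above and approximates Spearman's correlation through the standard identity ρ = 12 ∫∫ C(u, v) du dv − 3, using a midpoint rule on the unit square.

```python
import math


def frank_copula(u, v, alpha):
    """Frank copula C(u, v) in the chapter's sign convention (alpha != 0)."""
    numerator = math.expm1(alpha * u) * math.expm1(alpha * v)
    return math.log1p(numerator / math.expm1(alpha)) / alpha


def spearman_rho(copula, grid=200):
    """Approximate 12 * (integral of C over the unit square) - 3 by midpoints."""
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) / grid
        for j in range(grid):
            v = (j + 0.5) / grid
            total += copula(u, v)
    return 12.0 * total / (grid * grid) - 3.0


alpha_hat = -3.367  # estimate quoted in the text above
rho = spearman_rho(lambda u, v: frank_copula(u, v, alpha_hat))
```

With α = −3.367 the approximation comes out close to the 0.49 quoted above, and as α approaches 0 the copula value approaches the product uv, which is the independence case.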

A New Stochastic Mortality Model for Joint Lives
In Zhang and Brockett (2019), a new method is given for modeling mortality rates for dependent joint lives. Their model falls into the framework of stochastic mortality. However, they provide a new technique for modeling the mortality rate process of an individual and for building the dependence connection between joint lives. They model the hazard rate process of an individual as a time-changed Brownian motion and introduce dependence between members of a couple through correlated time changes, i.e., the subordinators. The intuition behind the process is that two lives have their own survival curves that they move along as they age, but common experiences, foods, environments, etc. affect how fast or slow they move along their curve, and this “speeding up” or “slowing down” of the aging process movement along the curve is dependent. We first introduce basic definitions and then present the model. The basic definitions are presented as follows:
• Stopping Time The stopping time for a stochastic process is an important well-studied mathematical concept. Its definition can be found in any book on stochastic calculus. Let $(\mathcal{F}_t, t \ge 0)$ be a filtration of σ-algebras. A stopping time is a random variable τ with values in $[0, +\infty]$ such that $\{\tau \le t\} \in \mathcal{F}_t$ for $t \ge 0$.
• Time-Changed Brownian Motion Time-changed Brownian motion is also a well-studied mathematical concept (e.g., Luciano and Schoutens, 2006; Hurd, 2009; Luciano and Semeraro, 2010). By Hurd (2009), “The time-changed Brownian motion (TCBM) generated by X and G is defined to be the process $L_t = X_{G_t}$, $t \ge 0$.” Here, “$(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ is a filtered probability space that supports a Brownian motion B and an independent strictly increasing càdlàg process G with $G_0 = 0$, called the time change. $\mathbb{P}$ may be thought of as either the physical or risk-neutral measure,” and $X_t = x + \sigma B_t + \beta\sigma^2 t$ is the Brownian motion starting at x having constant drift $\beta\sigma^2$ and volatility $\sigma > 0$ (Hurd, 2009). Here, a càdlàg process is one for which almost all sample paths are continuous from the right and have limits from the left at every point.
• Subordinator Subordinator is another name for the time change transformation $G_t$, which essentially changes intrinsic (clock or calendar) time into effective (or operational) time (i.e., G can speed up or slow down the time at which a process progresses along a path when used as a subordinator).

61 Actuarial (Mathematical) Modeling of Mortality and Survival Curves


• Hazard Rate Process The hazard rate process is the hazard rate (mortality rate) considered as a stochastic process over time, rather than as a deterministic function as in the models introduced before.

In Zhang and Brockett (2019), the authors model the intensity function $h(t)$ of the hazard rate process as a time-changed Brownian motion, i.e., $h_t = X_{G_t}$, and model the death time of an individual as a stopping time, defined as the first jump time of a Cox process with intensity function $h(t)$. Mathematically, the death time of an individual who has survived to age s can be written as $\tau_s = \inf\{t : \int_s^t h(u)\,du \ge E \mid \mathcal{F}_s\}$, where E represents a unit exponential random variable and $\mathcal{F}_s$ represents the information about $h(t)$ up to age s. Accordingly, it can be shown that the survival function is

$$ P(\tau_s > t \mid \mathcal{F}_s, t \ge s) = \mathbb{E}\left\{ e^{-\int_s^t X_{G_u}\,du} \,\middle|\, \mathcal{F}_s, t \ge s \right\} \tag{32} $$

where $\mathcal{F}_s$ represents the information up to age s. Now, consider a couple with a male partner (indicated by M) and a female partner (indicated by F), who have base log hazard rate (force of mortality) processes $X^M$ and $X^F$, with

$$ X_t^M = X_0^M + \sigma^M B_t^M + \beta^M (\sigma^M)^2 t \tag{33} $$

and

$$ X_t^F = X_0^F + \sigma^F B_t^F + \beta^F (\sigma^F)^2 t \tag{34} $$

respectively. Assume also that there are subordinators $G_t^M$ (for the male) and $G_t^F$ (for the female), and that $X^M$ and $X^F$ are independent of each other and of the subordinators. The subordinators determine how fast or slowly the male (or female) life moves along the respective curve. Construct the subordinators $G_t^M$ and $G_t^F$ in the following way:

$$ G_t^M = \alpha^M G_t + (1 - \alpha^M)\, G_t^{M0}, \quad 0 \le \alpha^M \le 1 \tag{35} $$

and

$$ G_t^F = \alpha^F G_t + (1 - \alpha^F)\, G_t^{F0}, \quad 0 \le \alpha^F \le 1 \tag{36} $$

By Zhang and Brockett (2019), “$G_t$, $G_t^{M0}$ and $G_t^{F0}$ are three increasing Càdlàg processes which are mutually independent, allowing jumps, and $\alpha^M$ and $\alpha^F$ are numbers ranging between 0 and 1.” We can see that by modeling in this way, the subordinators $G_t^M$ and $G_t^F$ are correlated through $G_t$ when both $\alpha^M$ and $\alpha^F$ are positive. This setting leads in turn


P. L. Brockett and Y. Zhang

to there being dependence between the intensities of the mortality rate processes $X_t^M$ and $X_t^F$.

We finally introduce model calibration. In Zhang and Brockett (2019), the authors calibrate the parameters through simulation and apply their model to the same Canadian insurance dataset used by Frees et al. (1996). When the subordinators (for both genders) take the form of the inverse Gaussian (IG) process, it is found that the hazard rate process can be well modeled by the newly constructed time-changed Brownian motion: the normal inverse Gaussian (NIG) process. For more details on the NIG process, see Luciano and Semeraro (2010). Mathematically, the NIG process requires that the subordinators take the form

$$ G_t = \mathrm{IG}(t, b), \quad G_t^{M0} = \mathrm{IG}\!\left( \frac{1 - \sqrt{\alpha^M}}{\sqrt{1 - \alpha^M}}\, t,\ \frac{b \times \sqrt{1 - \alpha^M}}{\sqrt{\alpha^M}} \right), \quad G_t^{F0} = \mathrm{IG}\!\left( \frac{1 - \sqrt{\alpha^F}}{\sqrt{1 - \alpha^F}}\, t,\ \frac{b \times \sqrt{1 - \alpha^F}}{\sqrt{\alpha^F}} \right) \tag{37} $$

and that the Brownian motions take the form

$$ \beta^M = \sqrt{(\alpha^M)^2 - b^2} \,/\, \left(\alpha^M (\sigma^M)^2\right), \quad \beta^F = \sqrt{(\alpha^F)^2 - b^2} \,/\, \left(\alpha^F (\sigma^F)^2\right). \tag{38} $$
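The construction in Eqs. (35) and (36) can be illustrated with a short simulation. The sketch below is not the authors' calibration: it assumes gamma-distributed increments as a convenient stand-in for the increasing processes $G$, $G^{M0}$, and $G^{F0}$ (the source uses inverse Gaussian processes), and the function names and default parameters are hypothetical. It only shows how the shared component $G_t$ induces correlation between the two subordinators.

```python
import numpy as np


def simulate_subordinators(alpha_m, alpha_f, n_paths=20000, n_steps=20,
                           dt=0.1, seed=0):
    """Terminal values of G^M and G^F built as in Eqs. (35)-(36).

    Assumption: gamma increments (shape dt, scale 1) stand in for the
    increasing processes G, G^{M0}, G^{F0}; this is not the inverse
    Gaussian specification used in the source.
    """
    rng = np.random.default_rng(seed)
    dims = (n_paths, n_steps)
    # Three mutually independent increasing processes, summed up to time T.
    g_common = rng.gamma(dt, 1.0, size=dims).sum(axis=1)
    g_m0 = rng.gamma(dt, 1.0, size=dims).sum(axis=1)
    g_f0 = rng.gamma(dt, 1.0, size=dims).sum(axis=1)
    # Mix the common and the individual components with weights alpha.
    g_m = alpha_m * g_common + (1.0 - alpha_m) * g_m0
    g_f = alpha_f * g_common + (1.0 - alpha_f) * g_f0
    return g_m, g_f


def theoretical_corr(alpha_m, alpha_f):
    """Correlation of G^M_T and G^F_T when the three drivers share one variance."""
    num = alpha_m * alpha_f
    den = np.sqrt((alpha_m**2 + (1 - alpha_m)**2)
                  * (alpha_f**2 + (1 - alpha_f)**2))
    return num / den
```

Under these assumptions, the sample correlation of the simulated pairs should be close to `theoretical_corr(alpha_m, alpha_f)`, and setting either weight to 0 removes the dependence, in line with the remark that the subordinators are correlated only when both weights are positive.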

Nonparametric Estimation of the Mortality Function

Another widely used method to estimate human mortality is a nonparametric estimation method. This method is preferred if the modeler desires to model without assuming any distributions. This can occur, for example, when determining the mortality process of people with a new disease.

One-Sample Estimation

Assume that there are a total of n individuals observed. If the observations are not censored, i.e., if we can observe the actual death time of each individual, then a simple estimator can be used to estimate the survival rate (survival function) at each time point,

$$ S(t) = \frac{1}{n} \sum_{i=1}^{n} I\{t_i > t\} \tag{39} $$


Here, $t_i$, $i = 1, \dots, n$, denotes the death time of each individual, and I is the indicator function. This essentially takes us back to Halley's empirical mortality table model (in probability terms instead of strict counts of those still living as a function of age). In the real world, however, we can often obtain only (right-)censored samples: all we know about some people is that they survived beyond a certain age, not exactly when they died (e.g., the record of their life is censored at the study's end). For censored data (lives not observed until death), new models have been developed to estimate the survival curve. Kaplan and Meier (1958) proposed the following estimator, which extends the empirical distribution estimator above to accommodate data some of which are (right-)censored. Assume that there are only m (m ≤ n) individuals whose deaths we observe. Sort their death times and denote them by $t_{(1)}, \dots, t_{(m)}$. Let $n_{(i)}$ be the number of individuals alive just prior to time $t_{(i)}$ and $d_{(i)}$ be the number of deaths at time $t_{(i)}$. The Kaplan-Meier estimator of the survival function at time t is then

$$ S(t) = \prod_{i:\, t_{(i)} \le t} \left( 1 - \frac{d_{(i)}}{n_{(i)}} \right) \tag{40} $$

[…]

H(s, t) = […]

$$ K_1(s, t) = \frac{1}{n} \sum_{i=1}^{n} I(Y_{1i} > s,\ Y_{2i} > t,\ \delta_{1i} = 1,\ \delta_{2i} = 1) \tag{48} $$

$$ K_2(s, t) = \frac{1}{n} \sum_{i=1}^{n} I(Y_{1i} > s,\ Y_{2i} > t,\ \delta_{1i} = 1) \tag{49} $$

and

$$ K_3(s, t) = \frac{1}{n} \sum_{i=1}^{n} I(Y_{1i} > s,\ Y_{2i} > t,\ \delta_{2i} = 1) \tag{50} $$

Here, $\delta_{1i}$ and $\delta_{2i}$ indicate whether or not an observation on life length has been censored, with δ = 1 indicating that the individual is observed until death during the observation period and δ = 0 indicating that the individual left observation before death (so the actual death time was censored). There are several extensions of Dabrowska's estimator. Lin and Ying (1993) proposed a modified Dabrowska estimator that estimates the paired survival rate in a special case: when the two members have a univariate censoring time. This estimator simplifies the calculation and still maintains good estimation results. Gribkova et al. (2013) proposed a nonparametric estimator for another special case, in which the two censoring variables for the paired individuals differ only through an additional observed variable. It also fixes the issue that Dabrowska's estimator


assigns negative mass to some points on the plane. For more details see Lin and Ying (1993) and Gribkova et al. (2013).
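The one-sample estimators in Eqs. (39) and (40) can be sketched in a few lines of Python. The function names and the toy data are hypothetical, and the Kaplan-Meier product is taken over observed death times $t_{(i)} \le t$, which is the usual convention and is assumed here.

```python
import numpy as np


def empirical_survival(death_times, t):
    """Eq. (39): fraction of fully observed lives whose death time exceeds t."""
    death_times = np.asarray(death_times, dtype=float)
    return float(np.mean(death_times > t))


def kaplan_meier(times, observed, t):
    """Product-limit estimate of S(t) from right-censored data, Eq. (40).

    times    : observed time for each life (death time or censoring time)
    observed : 1 if the death was observed, 0 if the life was censored
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    surv = 1.0
    # Loop over the distinct observed death times t_(i) in increasing order.
    for t_i in np.unique(times[observed]):
        if t_i > t:
            break
        n_i = np.sum(times >= t_i)                # alive just prior to t_(i)
        d_i = np.sum((times == t_i) & observed)   # deaths at t_(i)
        surv *= 1.0 - d_i / n_i
    return float(surv)
```

When no observation is censored, the two functions agree, which reflects the statement that the Kaplan-Meier estimator extends the empirical estimator to censored samples.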

Mortality Modeling with Cohort Effect

Increase in Human Life Expectancy and Longevity Risk

According to the World Health Organization (WHO), life expectancy is the expected number of years that a person will live, given that current mortality rates remain the same. Modern mortality research, however, indicates that people are living longer now than decades ago, and this trend is expected to continue. Possible causes of mortality improvement include improvements in sanitation, medical technology, and individuals' health literacy. Mathers et al. (2015) showed that reduced tobacco use contributed significantly to reduced older-age male mortality, and that reductions in cardiovascular disease and diabetes led to significant life expectancy gains for both males and females at older ages. Baker et al. (2007) and Meara et al. (2008) argued that education level, especially health literacy, is strongly associated with an individual's life expectancy, with people who have higher education and health literacy levels tending to live longer than those who lack them. Developments in medical technology also contribute significantly to the increase in human life expectancy. For example, Samji et al. (2013) showed that, thanks to combination antiretroviral therapy (cART), HIV-positive individuals now have a life expectancy closer to that of the general population. This increasing trend in life expectancy has been observed globally. For example, the WHO reports that “Global average life expectancy increased by 5.5 years between 2000 and 2016, the fastest increase since the 1960s.” Leon (2011) also showed that life expectancy at birth has increased steadily worldwide. Global Burden of Disease (GBD) (2016) showed in their analysis for the Global Burden of Disease Study that “life expectancy from birth increased from 61.7 years (95% uncertainty interval 61.4–61.9) in 1980 to 71.8 years (71.5–72.2) in 2015.” This trend is more pronounced in several countries in sub-Saharan Africa, while it is less pronounced (or even reversed) in some countries that were experiencing war or interpersonal violence.

Longevity risk is one risk caused by the potential increase in human life expectancy, namely, the risk that one lives so long as to outlive one's financial resources in retirement. Longevity risk can affect both individuals and the institutions that provide guaranteed retirement income and/or life insurance (such as annuity providers and institutions providing defined benefit pensions). In individual retirement planning, an unexpected increase in life span may lead to a shortage of money later in life. People who expect to live only a certain number of years may not adjust their spending to accommodate the likelihood of living longer than they expected. From the institutional perspective, longevity risk for defined benefit pension plans shifts this risk of underestimating the length of monetary need to


the employer. Butrica et al. (2009) and the Bureau of Labor Statistics (2008) have shown that the percentage of private wage and salary workers covered by defined benefit (DB) pension plans decreased while the percentage covered by defined contribution (DC) plans increased, and this trend is expected to continue (Aglira, 2006; Gebhardtsbauer, 2006; Government Accountability Office, 2008; Munnell et al., 2006), with more of the longevity risk burden borne by individuals. As a result, “about 26 percent of last-wave boomers would have lower family incomes at age 67” (Butrica et al., 2009). For institutions that provide retirement income, an increase in life expectancy may lead to increased financial liability and a shortage of money to make promised payments. The report by the Chartered Institute of Management Accountants (CIMA, 2008) points out that “at the end of September 2007, there were four companies in the FTSE100 with FRS17/IAS19 pension liabilities in excess of their equity value.” Reuters news service (reuters.com 2010) also reported that in 2010, 5 billion British pounds were added to corporate pension obligations in the UK because of increased longevity. Although the majority of individuals and institutions providing retirement income are exposed to longevity risk, few have given it enough attention. According to Antolin (2007), many pension funds in the European Union still relied on mortality rates and tables based on lives observed some time ago, without taking into consideration that life expectancy may have changed. Therefore, mortality models that incorporate the evolution of human life expectancy over time, as opposed to the static (snapshot) models discussed previously, are greatly needed in practice.

Lee-Carter Model

Lee and Carter (1992) proposed a mortality model that builds in a longer-run forecast of age-specific mortality in the USA from 1990 to 2065. Instead of modeling the mortality rate directly, they modeled the log of the mortality rate as a linear function of parameters that are age and year specific (as opposed to only being age-dependent snapshots at a particular time). Mathematically, the Lee-Carter model for the central mortality rate $m(x, t)$ of an individual aged x at time t is

$$ \ln(m(x, t)) = a_x + b_x k_t + \varepsilon_{x,t} \tag{51} $$

or, equivalently,

$$ m(x, t) = e^{a_x + b_x k_t + \varepsilon_{x,t}} \tag{52} $$

Here $a_x$, $b_x$, and $k_t$ are parameters to be estimated, and $\varepsilon_{x,t}$ is white noise (with zero mean and small variance). Lee and Carter (1992) imposed certain constraints on $a_x$, $b_x$, and $k_t$ in order for them to have unique solutions. The constraints are

$$ \sum_x b_x = 1; \quad \sum_t k_t = 0. \tag{53} $$

Under such constraints, $a_x$ is simply the average over time of $\ln(m(x, t))$, $b_x$ describes the mortality evolution pattern by age, and $k_t$ describes the mortality evolution pattern by year. To solve for $a_x$, $b_x$, and $k_t$, Lee and Carter (1992) applied singular value decomposition (SVD) to the log of the mortality matrix after subtracting the average, i.e., to the matrix (rows indexed by the ages $x_1, \dots, x_m$, columns by the years $t_1, \dots, t_n$)

$$
\tilde{M} =
\begin{pmatrix}
\ln(m_{x_1,t_1}) - \overline{\ln(m_{x,t})} & \ln(m_{x_1,t_2}) - \overline{\ln(m_{x,t})} & \cdots & \ln(m_{x_1,t_n}) - \overline{\ln(m_{x,t})} \\
\ln(m_{x_2,t_1}) - \overline{\ln(m_{x,t})} & \ln(m_{x_2,t_2}) - \overline{\ln(m_{x,t})} & \cdots & \ln(m_{x_2,t_n}) - \overline{\ln(m_{x,t})} \\
\vdots & \vdots & \ddots & \vdots \\
\ln(m_{x_m,t_1}) - \overline{\ln(m_{x,t})} & \ln(m_{x_m,t_2}) - \overline{\ln(m_{x,t})} & \cdots & \ln(m_{x_m,t_n}) - \overline{\ln(m_{x,t})}
\end{pmatrix}
$$

Here, $\overline{\ln(m_{x,t})}$ is the average over the observed years of $\ln(m_{x,t})$ for the age x of the row, i.e., the estimate of $a_x$. Formally, the SVD of the matrix $\tilde{M}$ can be written as $\tilde{M} = U D V^{\top}$. After rearranging the singular values and the corresponding columns of U and V such that the diagonal of D is sorted in descending order, the estimated values of the $b_x$ are the first column of U, and the estimated values of the $k_t$ are $D_{11} V_{*1}$, i.e., the largest singular value times the first column of V. Lee and Carter (1992) then proposed to model the mortality index $k_t$ (the year-dependent evolutionary factor) as an ARIMA process. More specifically, in practice a random walk with drift, i.e., ARIMA(0, 1, 0), is used almost exclusively (Girosi and King, 2007), i.e.,

$$ \hat{k}_t = \hat{k}_{t-1} + \theta + \varepsilon_t \tag{54} $$

with $\varepsilon_t$ being white noise, $\varepsilon_t \sim N(0, \sigma^2)$. In another paper, Carter and Lee (1992) used the Lee-Carter model to model the mortality rate for each gender separately. The authors mentioned three possible methods to extend the Lee-Carter analysis to male and female subpopulations, although they used only the first in their paper. The first method is to model the male and female populations separately and then determine their dependence as needed. The second is to “estimate a single k which drives changes in all the age-specific rates of both subpopulations” (Carter and Lee, 1992). The third is to estimate the subpopulations jointly “as a co-integrated process” (Engle and Granger, 1987). Carter and Lee (1992) found that the differences in the mortality index across age and gender were significant. For example, between 1970 and 1980, the mortality rate of young adults fell more rapidly than that of their older counterparts. Additionally, they found that young male adults' mortality rates rose steeply around 1960, while young female adults' mortality rates were almost flat. Carter and Lee (1992) also proposed a new method to re-estimate the mortality index $k_t$. Instead of using the estimated $k_t$'s directly from the SVD, they re-estimated the $k_t$'s using the estimated $a_x$ and $b_x$ in order to have estimated


death numbers to be consistent with the observed values, assuming that the white noise terms $\varepsilon_{x,t}$ were all zero. Mathematically, if the total number of deaths in year t is $D_t$ and the total population of age x in year t is $N_{x,t}$, then $k_t$ can be re-estimated from

$$ D_t = \sum_x \left[ N_{x,t}\, e^{a_x + b_x k_t + \varepsilon_{x,t}} \right], \quad \varepsilon_{x,t} = 0 \ \ \forall x, t \tag{55} $$

Carter and Lee (1992) argued that there were no analytical solutions for kt by this equation. However, it can be solved in an iterative manner.
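The two estimation steps described above, the SVD fit and the iterative re-estimation of $k_t$ from Eq. (55), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the input is a complete matrix of log central death rates with ages in rows and years in columns, bisection is used as one possible iterative solver, and the bracket `[lo, hi]` is an arbitrary choice that assumes all $b_x$ are positive.

```python
import numpy as np


def fit_lee_carter(log_m):
    """SVD fit of ln m(x,t) = a_x + b_x k_t; log_m has shape (ages, years)."""
    a = log_m.mean(axis=1)                      # a_x: average over years
    u, d, vt = np.linalg.svd(log_m - a[:, None], full_matrices=False)
    b, k = u[:, 0], d[0] * vt[0]                # first singular triple
    scale = b.sum()                             # rescale so that sum_x b_x = 1
    return a, b / scale, k * scale


def reestimate_k(a, b, deaths_t, exposure_t, lo=-100.0, hi=100.0, tol=1e-10):
    """Solve Eq. (55) for k_t in a single year by bisection (epsilon set to 0)."""
    def excess(k):
        # Fitted deaths minus observed deaths for a trial value of k.
        return np.sum(exposure_t * np.exp(a + b * k)) - deaths_t

    if excess(lo) * excess(hi) > 0:
        raise ValueError("bracket [lo, hi] does not contain a root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(lo) * excess(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def drift_estimate(k):
    """Average yearly change of k_t, an estimate of theta in Eq. (54)."""
    return (k[-1] - k[0]) / (len(k) - 1)
```

Dividing $b_x$ by its sum and multiplying $k_t$ by the same factor leaves the product $b_x k_t$ unchanged while enforcing the first constraint in Eq. (53); the second constraint holds automatically because each row of the centered matrix sums to zero.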

Extensions of Lee-Carter Model

Since Lee and Carter (1992), many researchers have worked on extensions of the Lee-Carter model to obtain better fits on different datasets. Here we introduce seven forms of extensions.

(1) Renshaw and Haberman (2006)

Renshaw and Haberman (2006) proposed an extension of the Lee-Carter model that incorporates cohort effects. Mathematically, Renshaw and Haberman (2006) model the force of mortality as

$$ m(x, t) = \exp(a_x + b_x^0 \iota_{t-x} + b_x^1 k_t + \varepsilon_{x,t}), \tag{56} $$

with the following constraints,

$$ \sum_t k_t = 0; \quad \sum_x b_x^0 = 1; \quad \sum_{x,t} \iota_{t-x} = 0; \quad \sum_x b_x^1 = 1 \tag{57} $$

where either $\iota_{t_1 - x_k} = 0$ or $k_{t_1} = 0$. Here, $b_x^0 \iota_{t-x}$ describes the cohort effect, with t − x being the birth year. It can be calculated that $a_x = \frac{1}{n} \sum_t \ln(m(x, t))$. Instead of adopting the SVD procedure, Renshaw and Haberman selected the Poisson response model “with the response variable equal to the number of deaths” (Renshaw and Haberman, 2006) and used maximum likelihood estimation methods. The authors listed two maximum likelihood estimation methods that can be used to estimate the parameters of the Lee-Carter model. The same methods can also be used to estimate two extended Lee-Carter models: (1) the one adopted in Wilmoth (1993) and Brouhns et al. (2002) and (2) the one following James and Segal (1982). For more details, see Renshaw and Haberman (2006). Later on, Cairns et al. (2009) proposed a revision of Renshaw and Haberman's iterating procedure.


(2) Currie et al. (2004)

Currie et al. (2004) proposed the following model, which uses the B-splines and P-splines method to fit the mortality surface.

$$ \ln(m(t, x)) = \sum_{i,j} \theta_{ij} B_{ij}^{ay}(x, t) \tag{58} $$

(3) Cairns et al. (2006)

Cairns et al. (2006) proposed a model that is currently referred to as the Cairns-Blake-Dowd (CBD) model. In their model, instead of modeling the force of mortality, the authors modeled the mortality rate $q(x, t) = 1 - e^{-m(x,t)}$. They model the logit of the mortality rate as a linear function of two mortality indices $k_t^1$ and $k_t^2$. Formally, the model is in the form of

$$ \operatorname{logit} q(t, x) = \log \frac{q(t, x)}{1 - q(t, x)} = \beta_x^1 k_t^1 + \beta_x^2 k_t^2 \tag{59} $$

with two constraints,

$$ \beta_x^1 = 1; \quad \beta_x^2 = x - \bar{x} \tag{60} $$

Here, $\bar{x} = \sum_i x_i / n_a$ is the average age in the sample. Therefore, the CBD model can be simplified as

$$ \operatorname{logit} q(t, x) = k_t^1 + k_t^2 (x - \bar{x}) \tag{61} $$

with no identification problem.

(4) Generalized CBD Model (Cairns et al., 2009): Generalized Model 1

Cairns et al. (2009) proposed three generalizations of the original CBD model. We begin with the first generalization, which includes a cohort effect.

$$ \operatorname{logit} q(t, x) = \beta_x^1 k_t^1 + \beta_x^2 k_t^2 + \beta_x^3 \gamma_{t-x}^3 \tag{62} $$

with constraints

$$ \beta_x^1 = 1; \quad \beta_x^2 = x - \bar{x}; \quad \beta_x^3 = 1 \tag{63} $$

The authors pointed out that the model above suffers from the identifiability problem. Therefore, in order for the model to have unique solutions, the following constraints were added (Cairns et al., 2009):

$$ \sum_{c \in C} \gamma_c^3 = 0; \quad \sum_{c \in C} c\, \gamma_c^3 = 0 \tag{64} $$

Here, C is the set of cohort years of birth that have been included in the analysis (Cairns et al., 2009).

(5) Generalized CBD Model (Cairns et al., 2009): Generalized Model 2

The second generalized model adds a quadratic term to the age effect in the first generalization, i.e.,

$$ \operatorname{logit} q(t, x) = k_t^1 + k_t^2 (x - \bar{x}) + k_t^3 \left( (x - \bar{x})^2 - \hat{\sigma}_x^2 \right) + \gamma_{t-x}^4 \tag{65} $$

Here, $\hat{\sigma}_x^2 = \sum_i (x_i - \bar{x})^2 / n_a$. This model requires the following constraints to avoid the identifiability problem (Cairns et al., 2009).

$$ \beta_x^1 = 1; \quad \beta_x^2 = x - \bar{x}; \quad \beta_x^3 = 1; \quad \sum_{c \in C} \gamma_c^4 = 0; \quad \sum_{c \in C} c\, \gamma_c^4 = 0; \quad \sum_{c \in C} c^2 \gamma_c^4 = 0 \tag{66} $$

(6) Generalized CBD Model (Cairns et al., 2009): Generalized Model 3

The third generalization assumes that the impact of the cohort effect for any specific cohort diminishes over time, i.e., $\beta_x^3$ in the first generalized CBD model is decreasing with x.

$$ \operatorname{logit} q(t, x) = \beta_x^1 k_t^1 + \beta_x^2 k_t^2 + \beta_x^3 \gamma_{t-x}^3 \tag{67} $$

with constraints

$$ \beta_x^1 = 1; \quad \beta_x^2 = x - \bar{x}; \quad \beta_x^3 = x_c - x \tag{68} $$

Here, $x_c$ is some constant parameter to be estimated (Cairns et al., 2009). Then, the model can be rewritten as

$$ \operatorname{logit} q(t, x) = k_t^1 + k_t^2 (x - \bar{x}) + \gamma_{t-x}^3 (x_c - x) \tag{69} $$

The authors introduced one more constraint to avoid identifiability problems (Cairns et al., 2009):

$$ \sum_{x,t} \gamma_{t-x}^3 = 0 \tag{70} $$

Age, year period, and the cohort effects are considered in the models above. Modelers who plan to choose one of these models should follow their personal beliefs about how the underlying factors play a role in determining human mortality and pick the most proper one accordingly.
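As an illustration of the simplest of these models, the two indices of the original CBD form in Eq. (61) can be recovered year by year with a straight-line fit of the logit mortality rate against centered age. The sketch below assumes ordinary least squares, which is a choice made here for brevity and not necessarily the estimation method of the cited papers; the function name and the array layout (years in rows, ages in columns) are also assumptions.

```python
import numpy as np


def fit_cbd(q, ages):
    """Least-squares fit of logit q(t,x) = k1_t + k2_t (x - xbar), year by year.

    q    : array of mortality rates with shape (years, ages), values in (0, 1)
    ages : the ages x corresponding to the columns of q
    """
    ages = np.asarray(ages, dtype=float)
    logit_q = np.log(q / (1.0 - q))
    # Design matrix with an intercept column and the centered age x - xbar.
    design = np.column_stack([np.ones_like(ages), ages - ages.mean()])
    # One regression per year, solved for all years at once.
    coef, *_ = np.linalg.lstsq(design, logit_q.T, rcond=None)
    return coef[0], coef[1]   # k1_t and k2_t, one value per year
```

The cohort extensions in Eqs. (62) to (70) add parameters that are shared across years, so they cannot be fitted with independent yearly regressions of this kind.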


Mitchell et al. (2013)'s Extension of the Mortality Model

Both the Lee-Carter model and its extensions are widely used in mortality modeling. However, these models can have properties that are not desired in practice. For example, Mitchell et al. (2013) argued that “the covariance matrix of mortality rates vastly overestimates dependence.” As illustrated in Mitchell et al. (2013), the Lee-Carter model can capture “meaningless” correlation that indicates strong dependence between two independent variables. The authors provided an example in which they constructed 11 independent Brownian motions and simulated each 1,000 times. They recorded 100 points along the path of each Brownian motion. Following the matrix decomposition used in the Lee-Carter analysis, the authors found that the first singular value explained 99.2% of the variability in the data, which is inconsistent with the fact that those 11 Brownian motions are independent. Another criticism of the original Lee-Carter model is that it cannot accurately capture jumps in the mortality rate (e.g., Mitchell et al., 2013; Deng et al., 2012). Deng et al. (2012) applied a double exponential jump diffusion (DEJD) model (Kou and Wang, 2004) to forecast the mortality rate, while Mitchell et al. (2013) treated the mortality index as a normal inverse Gaussian (NIG) process. In this section, we illustrate the model in Mitchell et al. (2013) in detail. The application and the model fit of the DEJD procedure can be found in detail in Deng et al. (2012).

Mitchell et al. (2013) proposed to model the changes in the log mortality rates instead of modeling the levels of the log mortality rates directly. Mathematically, the model is in the form of

$$ \ln(m(x, t + 1)) - \ln(m(x, t)) = \alpha_x + \beta_x k_t + \varepsilon_{x,t} \tag{71} $$

or, equivalently,

$$ m(x, t + 1) = m(x, t)\, e^{\alpha_x + \beta_x k_t + \varepsilon_{x,t}} \tag{72} $$

To avoid the identifiability problem, this model has the same constraints as the original Lee-Carter model, i.e.,

$$ \sum_x \beta_x = 1; \quad \sum_t k_t = 0 \tag{73} $$

The above model describes the change in the log mortality rate as a linear function of the index $k_t$ plus white noise. The authors argued that, by modeling in this way, they de-trended the data, since they considered only the changes instead of the levels. As a result, they avoid the drawback of the Lee-Carter model. To estimate the index $k_t$, Mitchell et al. (2013) applied the singular value decomposition method as in the original Lee-Carter analysis. Mathematically, if we define

$$ m_{x,t}^{change} = \ln(m(x, t + 1)) - \ln(m(x, t)) \tag{74} $$

then the SVD is performed on the matrix (rows indexed by the ages $x_1, \dots, x_m$, columns by the years $t_1, \dots, t_n$)

$$
\tilde{M}^{change} =
\begin{pmatrix}
m_{x_1,t_1}^{change} - \overline{m_{x,t}^{change}} & m_{x_1,t_2}^{change} - \overline{m_{x,t}^{change}} & \cdots & m_{x_1,t_n}^{change} - \overline{m_{x,t}^{change}} \\
m_{x_2,t_1}^{change} - \overline{m_{x,t}^{change}} & m_{x_2,t_2}^{change} - \overline{m_{x,t}^{change}} & \cdots & m_{x_2,t_n}^{change} - \overline{m_{x,t}^{change}} \\
\vdots & \vdots & \ddots & \vdots \\
m_{x_m,t_1}^{change} - \overline{m_{x,t}^{change}} & m_{x_m,t_2}^{change} - \overline{m_{x,t}^{change}} & \cdots & m_{x_m,t_n}^{change} - \overline{m_{x,t}^{change}}
\end{pmatrix}.
$$

Here, $\overline{m_{x,t}^{change}}$ is the average over t of the $m_{x,t}^{change}$ at age x, which is also the value of $\alpha_x$. $\beta_x$ and $k_t$ can be estimated following a similar procedure. The next step is to forecast the future mortality rate. In Mitchell et al. (2013), instead of modeling the $k_t$'s as an ARIMA process as in the Lee-Carter analysis, the authors proposed to model the $k_t$ series as a normal inverse Gaussian (NIG) process. Mathematically, a random variable X with NIG distribution has the density function

$$ f_X(x) = e^{\lambda/\theta + \mu(x - \delta)} \sqrt{\frac{\lambda(\lambda + \mu^2\theta^2)}{\theta^2 \pi^2 (\lambda + (x - \delta)^2)}} \times K_1\!\left( \frac{1}{\theta} \sqrt{(\lambda + \mu^2\theta^2)(\lambda + (x - \delta)^2)} \right). \tag{75} $$

Here, δ is the initial value of the Brownian motion; μ, θ > 0, and λ > 0 are the drift, the mean time when the Brownian motion is evaluated, and the volatility of the inverse Gaussian process, respectively. By adopting this model, Mitchell et al. (2013) captured the big jump in the mortality rate caused by the large number of deaths in the 1918 flu pandemic. This model obtained a very good prediction. In addition, Mitchell et al. (2013) showed that the fit of all of the common mortality rate models above is improved by modeling the changes in the (log of the) mortality rates rather than the (log of the) rates themselves.
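The independent Brownian motion example discussed in this section can be reproduced in spirit with the short experiment below. It is a sketch under assumptions that the text does not pin down: a single simulated set of 11 discretized paths with unit-variance increments, row means subtracted before the SVD, and the share of the first singular value measured through squared singular values. The resulting numbers therefore need not match the 99.2% reported by Mitchell et al. (2013); the point is only that the share is much larger for levels than for changes.

```python
import numpy as np


def first_singular_share(matrix):
    """Share of the total squared singular values carried by the largest one,
    after subtracting each row's mean (as in the Lee-Carter SVD step)."""
    centered = matrix - matrix.mean(axis=1, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))


def brownian_paths(n_series=11, n_points=100, seed=0):
    """Independent discretized Brownian motions, one per row."""
    rng = np.random.default_rng(seed)
    return np.cumsum(rng.standard_normal((n_series, n_points)), axis=1)


paths = brownian_paths()
share_levels = first_singular_share(paths)                     # SVD on levels
share_changes = first_singular_share(np.diff(paths, axis=1))   # SVD on changes
print(f"levels: {share_levels:.2f}, changes: {share_changes:.2f}")
```

Because the rows are independent by construction, a large first-component share on the levels reflects the common stochastic trend structure of the paths rather than genuine dependence, which is the motivation for modeling changes as in Eq. (71).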

References

Aalen O (1978) Nonparametric inference for a family of counting processes. Ann Stat 6(4):701–726
Aglira B (2006) To freeze or not to freeze: observations on the U.S. pension landscape. Global retirement perspective. Mercer Human Resource Consulting, New York
Altshuler B (1970) Theory for the measurement of competing risks in animal experiments. Math Biosci 6:1–11
Amicable Society (1854) The charters, acts of parliament, and by-laws of the corporation of the amicable society for a perpetual assurance office. The Amicable Society for a Perpetual Assurance Office. South Yarra, Australia
Antolin P (2007) Longevity risk and private pensions. OECD working papers on insurance and private pensions, No. 3, OECD Publishing. https://doi.org/10.1787/261260613084


Austad SN (2006) Why women live longer than men: sex differences in longevity. Gend Med 3(2):79–92
Baker T, Simon J (2002) Embracing risk: the changing culture of insurance and responsibility. University of Chicago Press, Chicago
Baker DW, Wolf MS, Feinglass J (2007) Health literacy and mortality among elderly persons. Arch Intern Med 167(14):1503–1509. https://doi.org/10.1001/archinte.167.14.1503
Biffis E (2005) Affine processes for dynamic mortality and actuarial valuations. Insur Math Econ 37(3):443–468
Brockett PL (1984) General bivariate Makeham laws. Scand Actuar J 1984(3):150–156. https://doi.org/10.1080/03461238.1984.10413763
Brouhns N, Denuit M, Vermunt JK (2002) Measuring the longevity risk in mortality projections. Bull Swiss Assoc Actuar (2):105–130
Butrica B, Smith KE, Toder E (2009) How will the stock market collapse affect retirement incomes. The Retirement Policy Program at The Urban Institute, Brief No. 20. June
Cairns AJG, Blake D, Dowd K (2006) A two-factor model for stochastic mortality with parameter uncertainty: theory and calibration. J Risk Insur 73:687–718
Cairns AJG, Blake D, Dowd K, Coughlan GD, Epstein D, Ong A, Balevich I (2009) A quantitative comparison of stochastic mortality models using data from England and Wales and the United States. North Am Actuar J 13(1):1–35. https://doi.org/10.1080/10920277.2009.10597538
Carriere JF (2000) Bivariate survival models for coupled lives. Scand Actuar J 2000(1):17–32. https://doi.org/10.1080/034612300750066700
Carter LR, Lee R (1992) Modeling and forecasting US sex differentials in mortality. Int J Forecast 8(3):393–411
Cherubini U, Luciano E, Vecchiato W (2004) Copula methods in finance. Wiley, Hoboken
CIMA (2008) The pension liability. http://www.cimaglobal.com/Documents/ImportedDocuments/cid_execrep_pension_liability_Feb08.pdf
Clark G (1999) Betting on lives: the culture of life insurance in England, 1695–1775. Manchester University Press, Manchester
Currie ID, Durban M, Eilers PHC (2004) Smoothing and forecasting mortality rates. Stat Model 4:279–298
Dabrowska DM (1988) Kaplan-Meier estimate on the plane. Ann Stat 16(4):1475–1489
Dahl M (2004) Stochastic mortality in life insurance: market reserves and mortality-linked insurance contracts. Insur Math Econ 35:113–136
Dahl M (2005) On mortality and investment risk in life insurance. http://web.math.ku.dk/noter/filer/phd05md.pdf
De Moivre A (1725) Annuities upon lives. Annuities upon lives or the valuation of annuities on any number of lives as also of reversions. London, William Pearson publ. The second edition of annuities upon lives was published in 1743
De Moivre A (1752) Annuities on lives, with several tables, exhibiting at one view, the value of lives for different rates of interest, Fourth Edition. Printed for A. Millar, over against Catherine Street, in the Strand, London
Deng Y, Brockett PL, MacMinn RD (2012) Longevity/mortality risk modeling and securities pricing. J Risk Insur 79:697–721
El-Bar AMTA (2018) An extended Gompertz-Makeham distribution with application to lifetime data. Commun Stat Simul Comput 47(8):2454–2475
El-Gohary A, Alshamrani A, Al-Otaibi AN (2013) The generalized Gompertz distribution. Appl Math Model 37:13–24
Engle RF, Granger CWJ (1987) Co-integration and error correction: representation, estimation, and testing. Econometrica 55(2):251–276
Frees EW, Carriere J, Valdez E (1996) Annuity valuation with dependent mortality. J Risk Insur 63(2):229–261
Gavrilov LA, Gavrilova NS (1996) Mortality measurement at advanced ages: a study of the social security administration death master file. North Am Actuar J 63(2):229–261


Gavrilova NS, Gavrilov LA (2011) Mortality measurement at advanced ages. North Am Actuar J 15(3):432–447
Gavrilova S, Lopez O, Philippe PS (2011) A simplified model for studying bivariate mortality under right-censoring. Demografie 53(2):109–128
GBD 2015 Mortality and Causes of Death Collaborators (2016) Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet 388:1459–1544
Gebhardtsbauer R (2006) The future of defined benefit (DB) plans – Keynote speech at the National Plan Sponsor Conference. The future of DB plans, Washington, DC
Girosi F, King G (2007) Understanding the Lee-Carter mortality forecasting method. https://gking.harvard.edu/files/gking/files/lc.pdf
Gompertz B (1825) On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philos Trans R Soc Lond 115(1825):513–583
Government Accountability Office (GAO) (2008) Fiscal year 2008 financial report of the United States Government (Report). https://www.gao.gov/financial_pdfs/fy2008/08frusg.pdf
Graunt J (1662) Natural and political observations mentioned in a following index, and made upon the bills of mortality. London, John Martin publ
Gribkova S, Lopez O, Saint-Pierre P (2013) A simplified model for studying bivariate mortality under right-censoring. J Multivar Anal 115(1):181–192. https://doi.org/10.1016/j.jmva.2012.10.005
Halley E (1693) An estimate of the degrees of the mortality of mankind, drawn from curious tables of the births and funerals at the city of Breslaw; with an attempt to ascertain the price of annuities upon lives. J Inst Actuar Assur Mag 18(4):251–262. (Reprinted in 1874)
Harlow M, Laurence R (2001) Growing up and growing old in ancient Rome: a life course approach. Routledge, Abingdon
Hougaard P, Harvald B, Holm NV (1992) Measuring the similarities between the lifetimes of adult Danish twins born between 1881–1930. J Am Stat Assoc 87(417):17–24
Hurd TR (2009) Credit risk modeling using time-changed Brownian motion. Int J Theor Appl Financ 12:1213–1230
Hustead EC (1988) The history of actuarial mortality tables in the United States. J Insur Med 20(4):12–16
Jagger C, Sutton CJ (1991) Death after marital bereavement – is the risk increased? Stat Med 10:395–404
James IR, Segal MR (1982) On a method of mortality analysis incorporating age-year interaction, with application to prostate cancer mortality. Biometrics 38(2):433–443
Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53(282):457–481
Keyfitz N, Caswell H (2005) Applied mathematical demography. 3rd edn. Springer-Verlag, New York
Kopf EW (1927) The early history of the annuity. L.W. Lawrence, New York
Kou SG, Wang H (2004) Option pricing under a double exponential jump diffusion model. Manag Sci 50(9):1178–1192
Lee R, Carter LR (1992) Modeling and forecasting U.S. mortality. J Am Stat Assoc 87(419):659–671
Leon AD (2011) Trends in European life expectancy: a salutary view. Int J Epidemiol 40:271–277. https://doi.org/10.1093/ije/dyr061
Lin DY, Ying Z (1993) A simple nonparametric estimator of the bivariate survival function under univariate censoring. Biometrika 80(3):573–581
Luciano E, Schoutens W (2006) A multivariate jump-driven financial asset. Quant Finan 29. www.carloalberto.org
Luciano E, Semeraro P (2010) Multivariate time changes for Lévy asset models: characterization and calibration. J Comput Appl Math 233:1937–1953

61 Actuarial (Mathematical) Modeling of Mortality and Survival Curves

1591

Luciano E, Spreeuw J, Vigna E (2008) Modeling stochastic mortality for dependent lives. Insur Math Econ 43(2):234–244. https://doi.org/10.1016/j.insmatheco.2008.06.005 Makeham M (1860) On the law of mortality and the construction of annuity tables. Assur Mag J Inst Actuar 8(6):301–310 Manor O, Eisenbach Z (2003) Mortality after spousal loss: are there socio-demographic differences? Soc Sci Med 56:405–413 Marshall AW, Olkin I (2007) Life distributions. Structure of nonparametric, semiparametric and parametric families. Springer, New York Mathers CD, Stevens GA, Boerma T, White RA, Tobias MI (2015) Causes of international increases in older age life expectancy Lancet 385(9967):540–548 Meara ER, Richards S, Cutler DM (2008) The gap gets bigger – changes in mortality and life expectancy, by education, 1981—2000. Health Aff (Millwood) 27(2):350–360. https://doi.org/ 10.1377/hlthaff.27.2.350 Milevsky MA, Promislow D (2001) Mortality derivatives and the option to annuitize. York university finance working paper, No. MM08-1 Mitchell D, Brockett PL, Mendoza-Arriaga R, Muthuraman K (2013) Credit risk modeling using time-changed Brownian motion. Insur Math Econ 52:275–285 Munnell AH, Golub-Sass F, Soto M, Vitagliano F (2006) Why are healthy employers freezing their pensions? Issue Brief 44, Chestnut Hill, MA Center for Retirement Research at Boston College (March) Nelsen W (1969) Hazard plotting for incomplete failure data. J Qual Technol 1(1):27–52 Nelsen RB (2007) An introduction to copulas. Springer, New York Nielsen JJ, Hulman A, Witte DR (2018) Spousal cardiometabolic risk factors and incidence of type 2 diabetes: a prospective analysis from the English Longitudinal Study of Ageing. Diabetologia 61(7):1572–1580. https://doi.org/10.1007/s00125-018-4587-1. Epub 2018 Mar 8 Ogborn M (2015) Equitable assurances. Routledge, Abingdon Pearce J, Millett M, Struck M (2015) Burial, society and context in the Roman world. 
Oxbow Books, Oxford Renshaw AE, Haberman S (2006) A cohort-based extension to the Lee–Carter model for mortality reduction factors. Insur Math Econ 38:556–570 Riffi M (2018) A generalized transmuted Gompertz-Makeham distribution. J Sci Eng Res 5(8):252–266 Samji H, Cescon A, Hogg RS, Modur SP, Althoff KN et al (2013) Closing the gap: increases in life expectancy among treated HIV-positive individuals in the United States and Canada. PLoS One 8(12):e81355. https://doi.org/10.1371/journal.pone.0081355 Seifter A, Singh S, McArdle PF, Ryan KA, Shuldiner AR, Mitchell BD, Schäffer AA (2014) Analysis of the bereavement effect after the death of a spouse in the Amish: a populationbased retrospective cohort study. BMJ Open 4:e003670. https://doi.org/10.1136/bmjopen-2013003670 Sklar A (1973) Random variables, joint distribution functions, and copulas. Kybernetika 9(6): 449–460 Spreeuw J (2006) Types of dependence and time-dependent association between two lifetimes in single parameter copula models. Scand Actuar J 2006(5):286–309. https://doi.org/10.1080/ 03461230600952880 U.S. Bureau of Labor Statistics (2008) Current labor statistics monthly labor review, June 2008. https://www.bls.gov/opub/mlr/2008/06/cls0806.pdf Wilmoth JR (1993) Computational methods for fitting and extrapolating the Lee-Carter model of mortality change. Technical Report, University of California, Berkeley Zhang Y, Brockett P (2019) Modeling stochastic mortality for joint lives through subordinators. Working paper

Mathematics in the Maritime

62

Kyra Mycroft and Bharath Sriraman

Contents
Introduction
Calculating Latitude
Calculating Longitude
Map Making
Global Positioning Systems
The Least Squares Method
The Advent of Insurance and Actuarial Science
Conclusion
Cross-References
References

Abstract
In this chapter maritime history is explored in relation to the mathematics that arose from solving the basic problems of determining one’s location on the ocean, viz., latitude and longitude. Celestial navigation and GPS navigation techniques arose from the need to travel accurately from one point to another and to determine locations in transit. Oceans cover over 70% of the earth’s surface, and their exploration is traced back to ancient civilizations attempting to travel upon the water. Over time, different devices were invented for navigation across the ocean, each more sophisticated than the last but built on similar mathematical themes. Navigation on the ocean also presented new risks to safety and marked the advent of insurance and actuarial science. These secondary aspects of maritime history are also discussed.

K. Mycroft: University of Montana, Missoula, MT, USA
B. Sriraman: Department of Mathematical Sciences, University of Montana, Missoula, MT, USA
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_137

Keywords Latitude · Longitude · Map making · Navigation · Least squares · Global positioning systems · Insurance · Risk

Introduction

Humanity has often struggled with finding its way, both philosophically and literally. Navigating within one’s comfort zone is done with ease and familiarity. People may be tempted to remain there, but our collective history indicates that there have always been people pushing to find new places away from what they know best, first to find new resources to sustain life and much later for the purposes of trade and conquest. Accomplishing this also meant finding easier ways to travel and trade across the oceans that cover 70% of the earth’s surface, although the oceans were not always safe for goods. Thus, over time, navigation at sea presented new obstacles to be overcome, among them the problem of maintaining the quality of goods.

Navigation is loosely defined as the process of moving a vessel from one location to another. In this chapter we are concerned with the navigation of marine vessels across bodies of water. Navigation includes ascertaining position at sea in transit from one point to another (i.e., determining latitude and longitude). The history of mankind is intertwined with the history of navigation, as is evident in the migration of peoples across the planet using the oceans as their highway. Polynesian navigation in ancient times has been the subject of many recent studies, especially since the ancient Polynesians were able to navigate and populate remote reaches of the Pacific in canoes long before the invention of the sextant and compass. They were adept at celestial navigation, namely, using knowledge of star configurations together with knowledge of currents and wave patterns to determine the dead reckoning for their canoes. The settlement of remote reaches of the South Pacific bears testimony to the navigational skills of the ancient Polynesians.

Sea navigation has been dated back to ancient civilizations, with their routes thought to have begun in the Far East, populating the South Pacific. Other routes went past the coast of today’s Iran, then splitting into streams north into the Gulf of Aden or south into Alexandria. The sea routes were used for trade, removing the risk of bandits in the desert. The Ancient Egyptians had warships thought to have been constructed during the Middle Kingdom, from 2050 BCE to 1710 BCE; however, the first detailed description of an Egyptian warship is dated to around the sixteenth century BCE. The captains of these ships may have had a written aid for coastal navigation known as a pilot book, or a periplus from the Greek. This book contained the routes between ports “set forth in terms of wind direction” (May et al. 2019). There are examples of these books dated from the fourth century BCE, in which routes, headlands, landmarks, anchorages, currents, and port entrances are all described. It is not known if sea charts were used along with these sailing guides, although there is a map drawn by Herodotus in the fifth century BCE that traces the Mediterranean shoreline very accurately.

These early pilot books measured distances in units of a day’s sail. Later on, sailors deduced distances from estimates of the ship’s speed and the lengths of time for which those speeds were maintained. Determining the speed of the ship was not an easy feat. Likely the oldest method for estimating speed used a floating object, such as a log, that was dropped overboard from the bow of the ship. The navigator kept the log in sight while walking the length of the vessel until it passed the stern, keeping track of how long this took. In the seventeenth century this technique was replaced with a similar one, in which the log was attached to a reel of light line and dropped from the stern of the ship. A sandglass was used to measure the time, and the length of line let out while the sandglass emptied was used as a measure of the ship’s speed. Some English navigators recommended the use of a line knotted at 50-ft intervals and a 30-s sandglass. Eventually, a 28-s sandglass was adopted along with knots at intervals of 47–48 ft. These two sets of measurements correspond to nautical miles of slightly different lengths. In this scenario, speed was measured in knots, where one knot was considered equal to one nautical mile traveled per hour.

Across the world, sea travel was being explored in China as well. There is evidence that by 200 CE the Chinese had constructed ships able to carry over 200 people. The primary purpose of the routes China used was trade.
One trade route extended from Arabia to the ports of China and the other from the Chola kingdom to the coast of India. To guide their way, Ma Jun (200–265 CE), a Chinese mechanical engineer, invented the south-pointing chariot, a wheeled device whose differential gear allowed a figurine to always point in the southern cardinal direction (Fig. 1).

Fig. 1 The Chinese south-pointing chariot (n.d.)

In Europe and western Asia, a device called an astrolabe was being used for navigation at sea. The name translates roughly as “star-taker” in Greek, and the device is thought to have been invented at the time of the Roman Empire by Claudius Ptolemy. This has not been verified, however, as earlier inventions from Egypt would have been documented on papyrus that decayed before discovery. By the eighth century, the device was used in the Islamic world as well, not only for sea navigation but also for religious purposes. Muslims would use an astrolabe to find their way to Mecca, the Holy City, in present-day Saudi Arabia, on their pilgrimage routes. The Islamic use of the astrolabe in finding the way to Mecca illustrates the importance people placed on astrology. The astrolabe was also used for making financial and life-altering decisions, based simply on the zodiac sign rising at the time of an individual’s birth.

There were, most likely, other tools of similar design used for the same purposes as the astrolabe. These tools consisted of stacks of circular disks, or plates, with sliding features. Each disk had a unique purpose. One plate was a two-dimensional projection of the Earth’s latitudinal lines, and another plate contained the locations of particular well-known stars in the sky. Overlying this plate is a straight rule that pivots to line up the time measurements, and on the back of the astrolabe is a pivoting sighting device that helps find the altitude of the chosen star. This final feature is often the starting point for a calculation of location and of distances to a destination (Fig. 2).

The astrolabe is an important piece of ancient navigation history, as it was likely utilized by Christopher Columbus when exploring the New World. Portuguese explorers were accustomed to using the North Star for navigation, but they would use an astrolabe when they explored near the equator, where the North Star was no longer visible to them. John Huth, a physicist at Harvard University, said “Bartolomeu Dias used the astrolabe to figure out the latitude of the Cape of Good Hope in 1488, because they were so far south that they lost Polaris” (Poppick 2017). In support of this account, astrolabes and similar devices have been found in many shipwrecks of Portuguese explorers.

The astrolabe is also an illustration of the importance of technology. In its prime, the astrolabe was the most advanced technology explorers had at their disposal, yet it relied strongly on the position and visibility of the stars. As new rationales for science developed, people looked for a more dependable means of navigation, which led to a decline in the use of astrolabes. Today, the astrolabe serves as a trophy of sorts to be put on display in a place of work. By the seventeenth and eighteenth centuries, a device called a sextant was used for more precise navigation.
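The log-line figures quoted earlier in this section (a 50-ft spacing with a 30-s glass, later 47–48 ft with a 28-s glass) can be checked with a short calculation. The following is a minimal sketch in Python, not a historical procedure. It assumes the modern international nautical mile of about 6076.12 ft, and the function names are purely illustrative.

```python
# Assumption: the modern international nautical mile (1852 m), expressed in feet.
FEET_PER_NAUTICAL_MILE = 6076.12


def knot_spacing_ft(glass_seconds, feet_per_nm=FEET_PER_NAUTICAL_MILE):
    """Spacing between knots on the log line so that one knot running out
    per sandglass corresponds to one nautical mile per hour."""
    return feet_per_nm * glass_seconds / 3600.0


def speed_in_knots(line_out_ft, glass_seconds, feet_per_nm=FEET_PER_NAUTICAL_MILE):
    """Ship speed in knots from the length of line let out during one glass."""
    feet_per_hour = line_out_ft / glass_seconds * 3600.0
    return feet_per_hour / feet_per_nm


if __name__ == "__main__":
    # A 28-s glass with the modern nautical mile gives a spacing near 47.26 ft.
    print(round(knot_spacing_ft(28), 2))
    # A 30-s glass with a 50-ft spacing corresponds to a 6000-ft nautical mile.
    print(knot_spacing_ft(30, feet_per_nm=6000.0))
```

Under these assumptions, the two recommended combinations differ mainly in the length of nautical mile they imply, which is consistent with the remark about nautical miles of slightly different lengths.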


Fig. 2 Pedersen (2015)

Calculating Latitude

In the northern hemisphere, the calculation of latitude at sea is relatively simple. Hipparchus, the Greek astronomer, noted that the position of the pole star remains invariant in relation to the other stars and that it sits at the same angle above the horizon as one’s latitude. This is easily seen in Fig. 3. The same triangle POL is constructible at points Q, R, etc. (to get triangles QOL, ROL, etc. with the angle α) until one reaches the North Pole. In the southern hemisphere, a particular star from a constellation such as the Southern Cross may be used to replicate this calculation of latitude.

Fig. 3 Pole star determines latitude above the horizon

The sextant is an astronomical device designed for determining the angle between the horizon and a celestial body, from which latitude, and possibly longitude as well, can then be calculated. The sextant differs greatly in design from an astrolabe. Rather than a series of circular discs, the sextant consists of an arc of a circle marked in degrees, a movable radial arm pivoted at the center of the circle with a silver mirror mounted on it, and a telescope attached to the framework (Fig. 4). When calculating one’s position, the telescope is lined up with the horizon, and the radial arm is moved until the celestial body is reflected in the mirror in line with the telescope. When looking through the telescope, the star or moon should seem to coincide with the horizon. The angular distance of the star above the horizon can then be read from the arc of the sextant.

Fig. 4 Commons (2014)

Calculating latitude by taking a sight at noon has been a seaman’s practice for centuries. It is based on the simple observation that at noon (local time, not Greenwich Mean Time) the ship and the sun are on the same line of longitude. The sun will be due north or due south of a ship at noon local time. If we assume that the day coincides with the equinox, when there are 12 h of daylight at all latitudes and the sun is directly over the equator, the zenith distance $Z_d$ of the sun is exactly the latitude of the ship. The zenith distance is obtained by subtracting the angle of the sight taken by the sextant, $\alpha$, from 90°, that is, $Z_d = 90^\circ - \alpha$ (see Fig. 5). If it is not the equinox and the sun is not directly over the equator, the latitude at noon is obtained by adding or subtracting the sun’s declination, which is easily obtained from nautical tables available on board the ship. The angle together with the exact time of day can be used to determine the latitude to within a few hundred meters by means of published tables.

Fig. 5 Latitude at noon with a sextant

The tables used to determine position were not easy to create, and many mathematicians worked to solve the problem. The most “prodigious set of trigonometric tables in early Europe, the Opus palatinum, composed by Georg Rheticus” (Brummelen 2017) is one example. Rheticus was an Austrian-born mathematician, astronomer, cartographer, and navigational-instrument maker, and these tables were just the beginning of his accomplishments. The tables required exhaustive work and calculation but were extremely useful, not only for calculating latitude but also for calculating the size of the earth and the distance from the earth to the moon.
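The noon-sight rule described in this section can be written as a short calculation. The sketch below is illustrative only. It assumes latitudes and declinations are signed (north positive), and that the navigator records whether the sun bears south or north at local noon. It ignores the corrections a practicing navigator would apply (refraction, height of eye, the sun's semi-diameter). The function name and parameters are hypothetical.

```python
def noon_latitude(sextant_altitude_deg, declination_deg=0.0, sun_bears="south"):
    """Latitude (degrees, north positive) from a noon sight.

    sextant_altitude_deg: angle of the sun above the horizon (alpha).
    declination_deg: sun's declination from nautical tables (north positive);
                     0 at the equinox.
    sun_bears: "south" if the sun is due south of the ship at noon,
               "north" if it is due north.
    """
    zenith_distance = 90.0 - sextant_altitude_deg  # Z_d = 90 degrees - alpha
    if sun_bears == "south":
        # The ship is north of the point directly beneath the sun.
        return declination_deg + zenith_distance
    if sun_bears == "north":
        # The ship is south of the point directly beneath the sun.
        return declination_deg - zenith_distance
    raise ValueError("sun_bears must be 'south' or 'north'")


if __name__ == "__main__":
    # Equinox, sun sighted 50 degrees above the southern horizon.
    print(noon_latitude(50.0))
    # Same ship near the June solstice (declination about 23.44 degrees north).
    print(noon_latitude(73.44, declination_deg=23.44))
```

At the equinox the function reduces to the statement in the text that the zenith distance equals the latitude. The declination term is the "adding or subtracting" step for other days of the year.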

Calculating Longitude

The strategies discussed thus far encapsulate the primary means for calculating latitude, but longitude is equally important for an accurate estimate of position at sea. Calculating longitude was a feat that many attempted and failed to achieve. It required knowledge of the exact local time of day, which was then compared with the time at either the destination or the origin of the trip. Since the earth turns through 360° in the 24 h of a day, 1 h of time corresponds to 15° of longitude. So once the difference in times was calculated, the hours of that difference were multiplied by 15. Additionally, the remaining minutes and seconds of time were divided by four to convert them to degrees and minutes of arc, respectively, to increase the accuracy of the location. This was difficult, however, since every tool for keeping time was built from materials that would change in density due to changing temperatures, a common occurrence at sea.

Prior to the eighteenth century, seafarers were flummoxed by the issue of longitude. The lack of knowledge proved to be a serious dilemma, as many ships met their doom through misguided navigation. Great scientists and mathematicians such as Isaac Newton proposed methods of calculating longitude but were consistently unsuccessful. After many years of struggle and frustration, Queen Anne issued the Longitude Act on July 8, 1714. This act was to “welcome potential solutions from any field of science or art, put forth by individuals or groups of any nationality, and to reward success handsomely” (Sobel 2007, p. 53). The act offered first, second, and third place monetary rewards based on the accuracy of the proposed method. The third place required accuracy within 1°, which at the equator translates to 60 nautical miles, or 68 geographical miles. This large allowance for error illustrates the urgency that the rulers were experiencing.

The Longitude Act also established the Board of Longitude, a panel of judges responsible for the distribution of the prize money. The members of this board had a variety of qualifications and backgrounds: some were scientists, others naval officers, and still others government officials. Each man held importance and prestige and took pride in it. These men were able to offer incentive awards to “help impoverished inventors bring promising ideas to fruition” (Sobel 2007, p. 54). For many years the Board received letters filled with proposals and ideas, but it consistently judged each unworthy of further action.

Meanwhile, a man by the name of John Harrison was making a name for himself as a talented carpenter and clockmaker. Harrison hailed from the county of Yorkshire and quickly proved himself to be a skilled craftsman. He completed a pendulum clock before he had reached the age of 20, and the clock still ticks today in a museum in London. In 1727, Harrison began to approach the longitude problem from a mechanical perspective. He worked with his brother to develop a clock that would not lose time due to changing weather. After 3 years, Harrison had developed a replacement for the pendulum consisting of “a springing set of seesaws, self-contained and counterbalanced to withstand the wildest waves” (Sobel 2007, p. 73). Thus, in the summer of 1730, Harrison arrived in London to present his promising new invention to the Board of Longitude. The Board, however, was not ready to accept a mechanical solution to a problem it viewed as astronomical. The members entertained Harrison’s idea and sent him on his way to continue development without making any promises. The Board referred Harrison to a clockmaker in London to present his plans, hoping to discourage him from pursuing his idea. Instead, that clockmaker, George Graham, a man of similar mind, was excited by Harrison’s proposition and spent the entire afternoon discussing his theory. Soon, Harrison had the funds necessary to begin building the first sea clock, and he hoped to someday repay Graham with his prize money from the Longitude Act.

He developed several prototypes and presented each to the members of the Board but was continually denied. Eventually, in 1764, Harrison successfully swayed the Board with the chronometer, a spring-driven clock. Previously, clocks had been of little use at sea because they required a pendulum, and a pendulum loses its reliability on the rolling seas amid constantly shifting temperature and weather. A spring-based chronometer, however, is more easily maintained and was found to be accurate within 8 miles (Manu 2018). This was a vast improvement on previous methods and a great achievement for Harrison. The maintenance of the chronometer included a daily routine of winding, careful protection against damage from the moving ship, and careful recording of the chronometer time by comparison with the time signal on the radio. Although this may initially sound cumbersome, it is a significant improvement compared to the loss of goods caused by the unknown.
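The time-to-longitude conversion described at the start of this section (15° per hour, with minutes and seconds of time divided by four) can be illustrated with a few lines of Python. This is a minimal sketch with an illustrative function name. It returns only the size of the longitude difference and leaves the east or west sense to the navigator.

```python
def longitude_difference_deg(hours, minutes=0, seconds=0):
    """Degrees of longitude corresponding to a difference between local time
    and the time at a reference location.

    1 h of time    -> 15 degrees of longitude
    1 min of time  -> 1/4 degree (minutes of time / 4 = degrees)
    1 s of time    -> 1/4 minute of arc = 1/240 degree
    """
    return 15.0 * hours + minutes / 4.0 + seconds / 240.0


if __name__ == "__main__":
    # Local noon observed when the reference clock reads 3 h 20 min later.
    print(longitude_difference_deg(3, 20))
```

The same arithmetic shows why timekeeping mattered so much: on this scale, a clock error of only 4 s of time already shifts the computed longitude by one minute of arc.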

Map Making

Solving the latitude and longitude problem did not change the fact that accurate maps were needed for ships to reach their desired destinations. History seems to be replete with “navigational” mistakes that led to the discovery of unexpected places! The Flemish cartographer Gerardus Mercator (1512–1594) created a map that allowed for navigation along a line of constant course on a sphere, called a rhumb line or, more technically, a loxodrome. Mercator maps allowed ships to sail from point A to point B using the method of dead reckoning with a magnetic compass, i.e., simply following a compass reading to a destination, because the course was a “line” that cut the lines of longitude at the same angle (Fig. 6).

Fig. 6 Rhumb lines (Wikipedia)

Mercator constructed the map by scaling the space between latitudes, which distorted the sizes of landmasses as one moved further north or south, but this did not have any bearing (pun intended) on its navigational use. The distortion caused by scaling can be visualized in Fig. 7, where the radius of a parallel at latitude $\theta$ shrinks to $R\cos\theta$, that is, by a scaling factor of $\cos\theta$. Assuming the earth is a sphere of radius $R = 1$, the stretching factor needed in the North-South direction to make a map conformal, i.e., to preserve angles on the Mercator projection, would be $f(\theta) = \sec\theta$, since $\cos\theta\,\sec\theta = 1 = R$, the radius of the sphere (Fig. 8). Carslaw (1924) claimed that Mercator constructed his map using a compass and a straight-edge, although no documentation of his method of construction exists other than the maps themselves. The original map was pieced together in 18 sections by Mercator (see Fig. 9).

The mathematical accuracy of Mercator’s map was determined by the English mathematician Edward Wright in 1599 by calculating the scaling factor needed in the North-South direction as a function of the latitude. Wright’s table converted latitudes into distances from the equator at an interval of 1 min of arc, or $\tfrac{1}{60}^\circ$, for all latitudes up to 75°, with distortion increasing as one moved further and further north or south (see Fig. 10). For instance, taking the total of the secants in the table of Fig. 10 and multiplying by the interval gives $15.6163 \times 5^\circ = 78.0815^\circ$, which means the 60th line of latitude would be placed at 78.0815° in this projection. In modern terms Wright was simply performing a numerical integration of the secant function, which he accomplished before the formalization of calculus.
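Wright's summation of secants can be reproduced directly. The following Python sketch is an illustration, not Wright's own procedure. It adds the secants at a fixed interval, as in the 5° example above, and compares the result with the closed form of the integral of the secant, $\ln\tan(\pi/4 + \theta/2)$, converted to degrees. The function names are hypothetical.

```python
import math


def wright_sum_deg(latitude_deg, step_deg=5.0):
    """Sum of secants at multiples of step_deg up to latitude_deg,
    multiplied by the step (a right-endpoint Riemann sum), in degrees."""
    n = round(latitude_deg / step_deg)
    total = sum(1.0 / math.cos(math.radians(k * step_deg)) for k in range(1, n + 1))
    return total * step_deg


def exact_mercator_deg(latitude_deg):
    """Integral of sec(theta) from 0 to the latitude, expressed in degrees."""
    theta = math.radians(latitude_deg)
    return math.degrees(math.log(math.tan(math.pi / 4 + theta / 2)))


if __name__ == "__main__":
    print(wright_sum_deg(60.0, 5.0))         # coarse 5-degree table
    print(wright_sum_deg(60.0, 1.0 / 60.0))  # 1 minute of arc, as in Wright's table
    print(exact_mercator_deg(60.0))          # closed-form value
```

With the coarse 5° interval the sum overshoots the closed-form value by a few degrees. At an interval of 1 min of arc the two agree closely, which indicates why so fine a table was worth the labor.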

Fig. 7 Shrinking parallels (Vezie 2016). Diagram labels: R, θ, R cos θ

Fig. 8 f(θ) = sec(θ) (Sriraman and Lande 2018)

Fig. 9 Mercator’s map of the world (1569)

Fig. 10 Table of secants approximated at an interval of 5° (Sriraman and Lande 2018)

Table of Secants
Angle    Secant
5°       1.0038
10°      1.0154
15°      1.0353
20°      1.0642
25°      1.1034
30°      1.1547
35°      1.2208
40°      1.3054
45°      1.4142
50°      1.5557
55°      1.7434
60°      2.0000
Total    15.6163

Global Positioning Systems

Technology today allows for more sophisticated means of navigation by using a global positioning system, more commonly known simply as a GPS. A GPS is aptly named, for it uses satellites around the globe to allow the user to precisely calculate their latitude, longitude, and altitude at any given moment. In addition to requiring much less maintenance, a GPS can calculate one’s position with much greater ease than a chronometer. In order to calculate the exact location, the GPS ideally communicates with four satellites to locate its position on earth, activating the space segment. The GPS satellite transmission contains a pseudo-random code, ephemeris data, and almanac data. The pseudo-random code identifies which satellite is transmitting. The satellites also transmit the condition of the satellite, the current date, and the time, which together are known as ephemeris data. The ephemeris data is essential for determining a position for navigation. By receiving signals from a minimum of three satellites, a GPS can calculate a 2D position, determining latitude and longitude. By adding a fourth satellite, a 3D position can be determined, giving latitude, longitude, and altitude.

There are three components of a GPS that are used to complete this process: the space segment, the control segment, and the user segment. The space segment provides the GPS user with five to eight visible satellites from any point on the earth. These satellites orbit the earth at a constant rate, and each satellite follows the same path daily. This consistency is important for calculating an accurate location. The control segment is a system of tracking stations throughout the world that monitor the satellite signals. The stations use the satellite signals to create “orbital models for those satellites” (House 1997, p. 37). These orbital models are used to calculate precise orbital data and clock corrections for the satellites. The master control station in Colorado sends the data to a satellite. The satellite then communicates with a GPS through a radio signal, relaying part of the information to the user segment. The user segment, the third component, consists of the GPS receivers and their users. The receiver works to “convert satellite signals into position, velocity, and time estimates” (House 1997, p. 37). The GPS requires data from, ideally, four satellites in order to compute latitude, longitude, altitude, and time. The latitude, longitude, and altitude together determine the user’s position on the earth.

The triangulation process (more precisely trilateration, since distances rather than angles are measured) is a vital piece of the mathematics involved in a GPS. This process determines an unknown position from one or more known positions based on the distances between them. For a three-dimensional GPS, the positions of the satellites sending the radio signals are known, and thus the user’s unknown position can be found. The calculation depends on radio signals travelling at a constant speed, so that the travel time of each signal can be converted to a distance. Three satellites communicate with the GPS receiver, and comparing the travel times of their radio signals to the GPS completes the calculation. In a two-dimensional setting, the time error can be found by examining the locus of points where the signals of the three satellites intersect as part of a hyperbola.

For clarification, consider the following example (House 1997). Signal transmission times are given for radio broadcasting stations, where each signal travels at the same rate but reaches the ship at a different time. This information can be used to determine a ship’s position and the amount of receiver clock error. Let two points $F_1$ and $F_2$, the transmitters from which the ship is receiving radio signals, be the centers of two circles. Let the radii of the circles be $F_1A$ and $F_2A$, respectively. These lengths represent the distances from the signal transmitters to a GPS when calculated with some unknown receiver clock error. Let $r_1$ and $r_2$ be the true distances from each transmitter to the GPS, and let $e$ be the error, which is the same for both distance calculations. The two necessary equations are then:

$$F_1A = r_1 + e, \qquad F_2A = r_2 + e.$$


The absolute difference of these equations then yields:

$$|r_1 - r_2| = |(r_1 + e) - (r_2 + e)| = |F_1A - F_2A|$$

Because the clock error $e$ cancels, this difference can be computed from the signal transmission times from each broadcasting station. A hyperbola is the locus of points whose distances from two fixed foci differ by a constant, so the difference can be used in the equation of a hyperbola to determine the intersection point of the broadcasting signals. This intersection point will be the ship’s location (House 1997, p. 45).

A similar process can be extended to a three-dimensional setting using the Pythagorean Theorem as appropriate. In this case, a system of multiple equations and variables can be used to determine position, a calculation a GPS provides with ease. An example of such a system is shown below using Cartesian coordinates, where $(x, y, z)$ is the unknown position, $(x_i, y_i, z_i)$ is the known position of the $i$-th satellite, and $D_i$ is the corresponding distance:

$$D_1 = \sqrt{(x - x_1)^2 + (y - y_1)^2 + (z - z_1)^2}$$

$$D_2 = \sqrt{(x - x_2)^2 + (y - y_2)^2 + (z - z_2)^2}$$

$$D_3 = \sqrt{(x - x_3)^2 + (y - y_3)^2 + (z - z_3)^2}$$
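One way to see how such a system can be solved is sketched below in Python. This is an illustration under stated assumptions, not the algorithm used in an actual receiver. It assumes a fourth satellite is available, that the distances are exact, and that there is no clock error. Squaring each distance equation and subtracting the first from the others removes the quadratic terms and leaves three linear equations in x, y, and z, which are solved here by Cramer's rule. All names and the sample coordinates are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def locate(satellites, distances):
    """Position (x, y, z) from four satellite positions and four distances.

    Subtracting the squared equation for satellite 1 from those for
    satellites 2-4 gives, for i = 2, 3, 4:
        2(xi - x1)x + 2(yi - y1)y + 2(zi - z1)z
            = D1^2 - Di^2 + |Pi|^2 - |P1|^2
    """
    (x1, y1, z1), d1 = satellites[0], distances[0]
    a, b = [], []
    for (xi, yi, zi), di in zip(satellites[1:4], distances[1:4]):
        a.append([2 * (xi - x1), 2 * (yi - y1), 2 * (zi - z1)])
        b.append(d1 ** 2 - di ** 2
                 + (xi ** 2 + yi ** 2 + zi ** 2)
                 - (x1 ** 2 + y1 ** 2 + z1 ** 2))
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("satellite geometry does not determine a unique position")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]      # copy the coefficient matrix
        for r in range(3):
            m[r][col] = b[r]           # replace one column with the right-hand side
        solution.append(det3(m) / d)   # Cramer's rule
    return tuple(solution)


if __name__ == "__main__":
    # Made-up satellite positions and a made-up true position, arbitrary units.
    sats = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0), (10.0, 10.0, 10.0)]
    true_position = (1.0, 2.0, 3.0)
    dists = [math.dist(true_position, s) for s in sats]
    print(locate(sats, dists))  # recovers the true position up to rounding
```

With real measurements the distances contain errors, so the equations are not exactly consistent; that situation is what motivates an error-minimizing method rather than an exact solve.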

This system differs from a GPS in that a GPS expresses position as longitude and latitude in degrees, minutes, and seconds rather than in a Cartesian coordinate system. The two systems are analogous in that the coordinates in the Cartesian system correspond to latitude and longitude in a GPS. If one were to solve the system previously referenced, conversions would be needed. Let $R$ be the approximate value of the earth’s radius, $A$ the latitude, and $O$ the longitude. The conversions are as follows:

$$A = \arcsin\left(\frac{z}{R}\right), \qquad O = \arctan\left(\frac{y}{x}\right).$$
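The conversions can be tried out numerically. The Python sketch below is illustrative and makes a few assumptions not stated in the text. The radius is taken as the distance of the point from the origin rather than a fixed value of R. The two-argument arctangent `math.atan2` is used in place of arctan(y/x) so that the correct quadrant is obtained. A small helper, with a hypothetical name, splits a decimal angle into degrees, minutes, and seconds.

```python
import math


def cartesian_to_lat_lon(x, y, z):
    """Latitude A and longitude O, in degrees, of the point (x, y, z).

    A = arcsin(z / R) and O = arctan(y / x), with R taken here as the
    distance of the point from the origin (an assumption of this sketch).
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0:
        raise ValueError("the origin has no latitude or longitude")
    ratio = max(-1.0, min(1.0, z / r))  # guard against rounding just outside [-1, 1]
    latitude = math.degrees(math.asin(ratio))
    longitude = math.degrees(math.atan2(y, x))  # quadrant-aware arctan(y / x)
    return latitude, longitude


def to_dms(angle_deg):
    """Split a decimal angle into (sign, degrees, minutes, seconds).

    sign is +1 or -1; it is returned separately so that angles between
    -1 and 0 degrees keep their sign.
    """
    sign = -1 if angle_deg < 0 else 1
    total = abs(angle_deg)
    degrees = int(total)
    minutes_full = (total - degrees) * 60
    minutes = int(minutes_full)
    seconds = (minutes_full - minutes) * 60
    return sign, degrees, minutes, seconds


if __name__ == "__main__":
    lat, lon = cartesian_to_lat_lon(1.0, 1.0, 0.0)
    print(lat, lon)       # a point on the equator, halfway between the x and y axes
    print(to_dms(10.5))   # ten and a half degrees
```

The guard on the ratio and the use of `atan2` are design choices of this sketch; the formulas in the text correspond to the simpler case of a point in the first quadrant lying exactly on a sphere of radius R.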

The amount of time taken for the GPS to receive an accurate satellite signal is affected by the speed of the radio signal and by signal propagation. An error as small as 1 minute could potentially “translate into hundreds of miles (or more) of error” (House 1997, p. 47). Because accuracy is essential, the position computation must use an algorithm that minimizes the errors, and the least squares method is most commonly used for systems with multiple equations and unknowns.

62 Mathematics in the Maritime

The Least Squares Method

The method of least squares is considered to be the automobile of modern statistical analysis. This method can be used to solve systems in which there are more equations than unknowns and can be used to fit a linear model to a set of data. Algebraically, this is done by evaluating the ordered pairs of estimated location. A Cartesian coordinate system can be used as an example, and the conversions above can be used to evaluate actual latitude and longitude. Consider a set of data points $(x_i, y_i)$ and fitting a model of the form $y = ax + b$. Because of measurement errors, the deviation of each data point from the model is defined by

\[ d_i = \hat{y}_i - y_i = a x_i + b - y_i, \qquad i = 1, \dots, n, \]

where $\hat{y}_i = a x_i + b$ is the value predicted by the model and $n$ is the number of data points. To solve, one finds the sum of the squares of the errors and minimizes this sum:

\[ S = \sum_{i=1}^{n} (d_i)^2. \]

This sum will then be in terms of $a$ and $b$. The next step is to find the partial derivative of $S$ with respect to $a$, $\partial S / \partial a$, set it equal to zero, and solve for $a$. Then find the partial derivative of $S$ with respect to $b$, $\partial S / \partial b$, set it equal to zero, and solve for $b$. These terms will provide a model for a line of best fit to be applied to the data, from which an error-minimized location can be found. Technology can utilize a similar method by systematically varying the terms $a$ and $b$ until a minimum for $S = \sum (\hat{y} - y)^2 = \sum (d_i)^2$ is determined.

The discovery of this method gave rise to one of the “most famous priority dispute[s] in the history of science” (Stigler 1981, p. 465), as it was unclear who should receive credit, Carl Friedrich Gauss or Adrien-Marie Legendre. However, more evidence has since been compiled that indicates Gauss “probably possessed the method well before Legendre, but that he was unsuccessful in communicating it to his contemporaries” (Stigler 1981, p. 465). Gauss worked as a surveyor, so it is probable that he developed this method before Legendre. However, the rivalry between their home countries, Gauss representing Germany and Legendre representing France, was extreme, so credit may never be attributed unanimously to either man.

Gauss is commonly referred to as a mathematician, but his first meaningful contribution was in the field of astronomy. He began work in 1805 on Theoria Motus Corporum Coelestium, “in which he described his techniques for computing orbits and gave his first probabilistic justification of the principle of least squares” (Stewart 1995, p. 2). Gauss finished his work in 1806, but due to Napoleon's conquest of Germany, Gauss was urged by his publisher to translate his work into Latin. This delay resulted in his work not being published until 1809. Legendre, meanwhile, had published and named the method of least squares on his own in 1805, “in an appendix to a memoir on the orbits of comets” (Stewart 1995, p. 2), well before Gauss's otherwise completed work appeared in print.
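The line-fitting procedure described in this section can be sketched as follows. Setting $\partial S / \partial a = 0$ and $\partial S / \partial b = 0$ for the model $y = ax + b$ gives two linear equations in $a$ and $b$, whose solution is written out in the code. The function names and the sample data are hypothetical and serve only as an illustration; this is a minimal sketch, not the computation a GPS receiver actually performs.

```python
def fit_line(points):
    """Return (a, b) minimizing S = sum((a*x + b - y)**2) over the data points.

    Setting dS/da = 0 and dS/db = 0 yields
        a = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2),   b = (Sy - a*Sx) / n.
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


def sum_squared_error(points, a, b):
    """The quantity S for a candidate line y = a*x + b."""
    return sum((a * x + b - y) ** 2 for x, y in points)


# Hypothetical noisy measurements.
data = [(0, 1), (1, 2), (2, 2)]
a, b = fit_line(data)
print(a, b, sum_squared_error(data, a, b))
```

Nudging `a` or `b` away from the fitted values and recomputing `sum_squared_error` mirrors the trial-and-error search that the text attributes to technology: the error can only increase.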

The method of least squares is used in the case of GPS to “find the receiver position from pseudo-ranges to four or more satellites” (He and Bilgic 2011, p. 204), where the pseudo-range refers to the measured distance from a satellite to the GPS receiver. The equation for the pseudo-range can be computed and linearized in order to utilize the least squares method and compute positions with reasonable positioning accuracy.

Combining a GPS with the Inertial Navigation System, or INS, produces an even more accurate navigational system. A GPS is accurate only when the sky is clear, allowing open communication with the four satellites overhead. The INS can help cover these gaps to allow accurate navigation regardless of weather. However, the INS “generally has good short-term navigation performance but drifts over time” (El-Diasty 2010, p. 1). The GPS is able to account for this shortfall as well. Thus, the combination creates a “drift free, high-accuracy, high-rate navigation solution” (El-Diasty 2010, p. 1).

The theories and beginnings of the mathematics behind the astrolabe relied on the position of the earth and the surrounding celestial bodies. The theory of the sextant relies on calculating the angles between the horizon and a celestial body. The theory of the global positioning system relies on the positioning of the surrounding satellites. In order to be utilized, the GPS must be able to communicate with at least three different satellites, but as previously emphasized, four satellites are optimal.

The technology of the global positioning system raises the question of whether celestial navigation is still necessary today. A GPS can easily fit in a pocket and can be accurate in all corners of the earth. However, how will those at sea today find their location and bearings if a GPS is rendered useless? This is perhaps the greatest drawback of technology today: if one part of the system fails, everything is lost. In this case, if space systems become unavailable, the GPS is unusable. The US Navy has been relying on the accuracy of the GPS, but this reliance increases the vulnerability of US national security space systems. As a result, the Navy has returned to teaching celestial navigation (Smith 2016). This ability to utilize other means of navigation decreases the dependence on technology and electronics. For the layman, however, the knowledge of celestial navigation may not carry many, if any, benefits. The published tables for calculating latitude and longitude would not be a welcome addition for an avid outdoorsman.

The Advent of Insurance and Actuarial Science

Just as sea travel is the patriarch of today's systems of navigation, so also is it the patriarch of today's insurance policies. Before ocean navigation, ancient civilizations would traverse long distances by foot to trade their goods. Travelers were wary of bandits in the desert, but they were willing to face the risk individually for their own gain. With the rise of maritime navigation, risks were distributed among groups of people. As previously discussed, it took skill and training to navigate the ocean, so tradesmen would employ sailors to transport their goods for trade rather than attempting to brave the ocean individually.

Along with this exchange and entrusting of goods came the transfer of risk across the group of people involved. The earliest record of insurance dates to the Babylonian period (2250 BCE), when the Babylonians appear to have developed a form of loan insurance for business that required maritime navigation. A merchant would take out a loan in order to pay to ship his goods. Additionally, the merchant would typically “pay the lender an additional premium in exchange for the lender’s guarantee to cancel the loan should the ship be stolen or lost at sea” (Buckham et al. 2010, p. 2). Thus, the lender assumed the risks of the merchant’s goods at a premium rate of interest, an arrangement referred to as a Bottomry Bond. The loan is named in “reference to the fact that lenders could claim the ship itself if [they were] not paid back” (2016) by the agreed time. If the ship sank or lost the cargo, the merchant did not have to repay the loan. This was not an independent insurance contract, but the system was adopted by the Greeks, the Romans, and the Italian city-states.

There were other forms of insurance as well. At times, merchants might travel on the ocean themselves, paying a sailor for the passage. If a storm were to hit in the middle of the ocean, the men might think it necessary to jettison cargo from the ship. How would they determine who should suffer the loss of merchandise? It is certainly not an argument to be having in the middle of a storm. Thus, before departure, those traveling together would pool goods to make up for the loss of anyone’s merchandise. This practice was soon enforced by law and was part of the “commercial code of all maritime nations” (Winter 1919, p. 2) even in the early 1900s. This form of insurance was called a General Average, where the amount merchants paid was an average of all goods aboard. The Bottomry Bond and the General Average were different systems, but both served to mitigate the risks of marine travel and trade.

Throughout the years, trade routes expanded across the world, encompassing the globe from Europe to China to South America. The cross-Atlantic voyages from Europe to South America presented greater risks due to the length of the voyage. Thankfully, other forms of insurance were arising. An impactful and more modern form of insurance spread in popularity, originating simultaneously in Pisa and Florence, Italy, during the fourteenth century. This insurance policy was simple, involving a premium paid against risk to an underwriter, and was referred to in Italy as “insurance alla Fiorentina” (Ebert 2011, p. 103). In this scenario, the seller of the policy assumed the risk of loss or damage to the merchandise rather than securing his capital in a loan. If a voyage did not go as planned and ended unsuccessfully, the policy holders could abandon their cargo and leave it to the insurer to fret about. The insurer could try to find the lost cargo and recover what they could of their investment. This premium insurance spread beyond maritime use due to the simplicity of the process but was still primarily associated with maritime trade. The premium the insured paid varied widely among purchasers, reflecting the insurer’s perception of each individual’s risk. The Venice branch of the Medici Company regularly insured trips between England and Venice by the middle of the fifteenth century. This company had premium rates that would

depend on how “well armed the ships were” (Ebert 2011, p. 103) and which ranged from 3% to 7%. To determine the rates, the merchants who insured the vessel would consider “the type of vessel, reputation of the captain, destination, season, cargo, piracy, corruption, and war” (Buckham et al. 2010, p. 4). The merchants did not formally use probability theory in the statistical sense, but they did need to rely on their intuition, subjective experience, and objective records to guide their estimates of the risk.

The premium insurance policies spread beyond the borders of Italy to cover trade routes in West Africa, the Indian Ocean, and the sugar trade with Brazil. Voyages to Brazil often followed dangerous routes on poorly armed ships. Thus, premium insurance was a logical choice for risk management. Premium insurance spread to countries other than Italy, including to the Spanish trade, although Bottomry loans were used occasionally as well. The Spanish fleets sailing between Spain and the Americas by the sixteenth century, known as the “Carrera de Indias” (Ebert 2011, p. 104), utilized the premium insurance practices. By contrast, Dutch Baltic shipping did not begin implementing premium-based insurance until the seventeenth century. Although Dutch Baltic shippers were slow on the uptake, the eventual participation of all of Europe shows the worth of the system.

The use of insurance remained primarily in the hands of Europeans until the end of the nineteenth century, when it began to expand into other cultural realms. The spread of modern insurance began “directly from India by European merchants who had settled there” (Borscheid and Haueter 2015, p. 209) and subsequently expanded throughout Asia to China, the Malay Peninsula, and Australia, eventually reaching Japan and other eastern countries via the trading networks from India. Marine insurance began to spread in Asia only after the East India Company “transformed itself from a trading company into an instrument of government” (Borscheid and Haueter 2015, p. 210).

The use of marine insurance has been highlighted here, but the concept has since expanded beyond ocean transport. Today insurance is a necessary component of wealth planning and risk management to protect against the loss of automobiles, homes, life, or health. In modern-day America, some forms of insurance are even required by law, a further illustration of the importance of a system that marine navigation inspired.

Conclusion

In a utopian world, perhaps people would be able to remain completely within their comfort zone their entire life. In this world, however, people must constantly be pushed out of their zones of comfort in order to grow and progress. Along with this advancement outside of what is secure come the unknown and the perception of many risks. Thankfully, both of these obstacles are more easily navigated with the tools brought about by marine navigation. When one is equipped with a global positioning system and an insurance policy, there is a significant decrease in the number and severity of poor outcomes left up to fate. On the other hand, even if

GPS systems fail, the celestial navigational tools that guided sailors for centuries are still available to navigate a vessel from one point to another!

Cross-References

Actuarial (Mathematical) Modeling of Mortality and Survival Curves

References

Borscheid P, Haueter N (2015) Institutional transfer: the beginnings of insurance in Southeast Asia. Bus Hist Rev 89(2):207–228. https://doi.org/10.1017/S0007680515000331
Brummelen GV (2017) Heavenly mathematics: the forgotten art of spherical trigonometry. Princeton University Press, Princeton
Buckham D, Wahl J, Rose S (2010) Executive’s guide to solvency II. SAS Institute, Cary
Carslaw HS (1924) The story of Mercator’s map. Math Gaz 12(168):1–7
Commons W (2014, July 10) File: Chambers 1908. Sextant.png. Retrieved 25 Feb 2019. https://commons.wikimedia.org/w/index.php?title=File:Chambers_1908_Sextant.png&oldid=128661596
Ebert C (2011) Early modern Atlantic trade and the development of maritime insurance to 1630. Past Present 213:87–114
El-Diasty M (2010) Development of a MEMS-based INS/GPS vessel navigation system for marine applications (Order No. NR64919). Available from ProQuest Dissertations & Theses Global. (748279790). Retrieved from https://search-proquest-com.weblib.lib.umt.edu:2443/docview/748279790?accountid=14593
He Y, Bilgic A (2011) Iterative least squares method for global positioning system. Adv Radio Sci 9:203–208. https://doi.org/10.5194/ars-9-203-2011
House P (1997) Mission mathematics: linking aerospace and the NCTM Standards, 9–12. National Council of Teachers of Mathematics, Reston, pp 37–62
Manu (2018, November 12) The Marine Chronometer and its use on board ships. Retrieved 3 Feb 2019, from https://www.brighthubengineering.com/seafaring/25886-the-marine-chronometera-breakthrough-in-celestial-navigation/
May WE, Howard JL, Jones S, Logsdon TS, Richey MW, Anderson EW (2019, February 07) Navigation. Retrieved from https://www.britannica.com/technology/navigation-technology
Pedersen EJ (2015, June 25) Illustration – Astrolabe. Retrieved 24 Feb 2019, from https://www.ericjpedersen.net/astrolabe/
Poppick L (2017, January 31) The story of the Astrolabe, the original smartphone. Retrieved 30 Jan 2019, from https://www.smithsonianmag.com/innovation/astrolabe-original-smartphone180961981/
Smith M (2016, September 19) Navy resumes teaching celestial navigation just in case GPS is neutralized – UPDATE. Retrieved from https://spacepolicyonline.com/news/navy-resumesteaching-celestial-navigation-just-in-case-gps-is-neutralized/
Sobel D (2007) Longitude. Walker Publishing Company, New York
Sriraman B, Lande D (2018) “Integrating” creativity and technology through interpolation. In: Freiman V, Tassell J (eds) Creativity and technology in mathematics education. Mathematics education in the digital era, vol 10. Springer, Cham, pp 399–411
Stewart G (1995) Gauss, statistics, and Gaussian elimination. J Comput Graph Stat 4(1):1–11. https://doi.org/10.2307/1390624
Stigler S (1981) Gauss and the invention of least squares. Ann Stat 9(3):465–474. Retrieved from http://www.jstor.org.weblib.lib.umt.edu:8080/stable/2240811
The Chinese South-Pointing Chariot (n.d.). Retrieved from https://www.lockhaven.edu/~dsimanek/make-chinese/southpointingcarriage.htm
Vezie K (2016) Mercator’s projection: a comparative analysis of rhumb lines and great circles. Senior Thesis, Whitman College, Walla Walla
Winter W (1919) Marine insurance: its principles and practice. McGraw-Hill Book, New York

Mathematics and Economics, with Special Attention to Social Choice Theory

63

Maurice Salles

Contents

Introduction
Mathematics in Economics, Game Theory, and Social Choice Theory
General Equilibrium Theory
Social Choice Theory
Game Theory
The Use of Mathematics in Economics Questioned
Conclusion: The Indispensability of Mathematics
References

Abstract

The historical context of the usage of mathematics in economics and some other related social sciences is briefly described. This usage is then emphasized in three specific topics: (1) the analysis of a competitive market, regarding in particular the existence of equilibrium and the relations between the concepts of equilibrium and of optimality; (2) in social choice theory, Arrow’s theorem and results about the strategy-proofness of voting procedures; and (3) in game theory, focusing on cooperative games, the relations between the concepts of core and of competitive equilibrium, and an analysis of the Shapley-Shubik voting power index in the case of the voting procedure used within the UN Security Council. The use of mathematics in economics has been questioned. Some responses to this questioning are provided. Finally, the thesis of the “indispensability of mathematics” is defended in the case of the social sciences.

M. Salles
CREM (UMR-CNRS 6211), University of Caen-Normandy, Caen Cedex, France
CPNSS, London School of Economics, London, UK
Murat Sertel Center for Advanced Economic Studies, Bilgi University, Istanbul, Turkey
e-mail: [email protected]

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_48

Keywords

Mathematics · Economics · Social choice · Game theory · General equilibrium · Pareto-optimality · Arrow’s theorem · Strategy-proofness · Voting · Heterodoxy in economics · Indispensability of mathematics

Introduction

The fact that mathematics has pervaded economics will be obvious to anyone who quickly glances at highly regarded economics journals such as the American Economic Review, the Quarterly Journal of Economics, or the Journal of Political Economy. Although other social and human sciences, such as, say, psychology or political science, can make use of mathematics, this use is less developed than in economics. A common explanation of this situation is that numbers rule the economy: prices, quantities of goods, rates of growth, levels of incomes and taxes, and so forth. Already in Aristotle, one can find a pertinent analysis of the exchange process. In Book V of Nicomachean Ethics (Aristotle 1984, page 1788), Aristotle writes:

. . . all things that are exchanged must be somehow commensurable. It is for this end that money has been introduced. . . The number of shoes exchanged for a house [or for a given amount of food] must therefore correspond to the ratio of builder to shoemaker.

Some contemporary commentators, for instance, Robert Gallagher (2018), do not hesitate to consider this “ratio” as a mathematical fraction open to arithmetical operations, even if the formal nature of the implied numbers (natural numbers, positive reals, etc.) is unclear.

While alluding in passing to other subjects, I will mainly consider two topics which have been, in my view, central in the use of mathematics: general equilibrium analysis and social choice theory. Social choice theory is concerned with the selection of options on the basis of individual data over these options. This selection can be the outcome of a mechanism, as in the case of voting, or the outcome of a subjective process performed by an individual on the basis of her views regarding fairness, social justice, etc. This dichotomy, opposing the aggregation of interests to the aggregation of judgments, is due to Amartya K. Sen (1977, 1982).

Regarding the aggregation of interests and the difficulty of reaching an outcome, a famous example takes us back to Roman antiquity. In Letter 14 of Book VIII (Pliny the Younger 1969), Pliny the Younger describes a decision problem within the Roman senate. Pliny explains that “the case at issue concerned the freedmen of the consul Afranius Dexter, who had been found dead; it was not known whether

he had killed himself or his servants were responsible, and, if the latter, whether they acted criminally or in obedience to their master.” Three possible outcomes are suggested: acquittal, banishment, or death. Pliny (himself a member of the senate) wished to have a vote by plurality rule, each senator proposing an outcome, the winning outcome being the one proposed by the largest number of senators. Pliny did persuade the senators to use this plurality voting rule. Pliny was obviously in favor of acquittal and thought that the number of senators in favor of acquittal exceeded the number of those who were in favor of banishment and the number of those who were in favor of death. Unfortunately for Pliny, a sufficient number of senators who were in favor of capital punishment decided to join those who were in favor of banishment, and banishment carried the day (this is not said in Pliny’s letter, but it seems clear to social choice theorists, even if some translators did not understand the real situation; on this, see Maurice Salles 2016).

Pliny in his letter gives an example of a voting rule which is manipulated by a coalition in the modern sense: a group of individuals, rather than voting for their preferred option, force the voting rule to generate an outcome which they prefer to the outcome which would have prevailed if they had voted for their preferred option. The fact that all deterministic voting rules are manipulable by a single individual (except dictatorship) is one of the most important results of social choice theory. It was demonstrated independently by Allan Gibbard, Prasanta Pattanaik, and Mark Satterthwaite in the 1970s, thanks to an appropriate mathematical formalism.

It is difficult, if not impossible, to give even a vague date for the introduction of (some) mathematics in economics. Paola Tubaro (2016) provides interesting indications. I believe, however, that a date can be fixed for the first book-length mathematical treatment of a social science topic. The topic is judgment aggregation and voting theory, the date is 1785, and the author is the French visionary scientist Nicolas de Condorcet. The book is entitled Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. The main objective of this book concerns the probability of making correct decisions in a group of individuals. If the probability for an individual to have a correct opinion is greater than 0.5, majorities are more likely to select the correct opinion, and this likelihood will increase with the number of decision-makers.

The book also includes an example showing that majority rule can generate a cyclic outcome. In Condorcet’s example, there are 60 voters who rank three candidates A, B, and C according to their preference. The situation is the following: 23 voters rank the candidates ABC (A first, B second, and C third); 17 voters rank them BCA; 2 voters, BAC; 10 voters, CAB; and 8 voters, CBA. So, 33 voters prefer A to B, 35 prefer C to A, and 42 prefer B to C. Of course, one can obtain a cycle with a very simple example with three voters, one ranking the candidates ABC; the second, BCA; and the third, CAB.

Two quotes from Condorcet’s Esquisse d’un tableau historique des progrès de l’esprit humain (1794–1795) exemplify the fact that he was a visionary. On page 300 he writes:

Nous ferons voir que les déclamations contre l’inutilité des théories,¹ même pour les arts les plus simples, n’ont jamais prouvé que l’ignorance des déclamateurs. . . Ces observations conduiront à cette vérité générale, que dans tous les arts, les vérités de la théorie sont nécessairement modifiées dans la pratique; qu’il existe des inexactitudes réellement inévitables, dont il faut chercher à rendre l’effet insensible sans se livrer au chimérique espoir de les prévenir; qu’un grand nombre de données relatives aux besoins, au temps, à la dépense, nécessairement négligées dans la théorie, doivent entrer dans le problème relatif à une pratique immédiate et réelle; et qu’enfin en y introduisant ces données avec une habileté qui est vraiment le génie de la pratique, on peut à la fois, et franchir les limites étroites où les préjugés contre la théorie, menacent de retenir les arts, et prévenir les erreurs dans lesquelles un usage maladroit de la théorie pourrait entraîner.²

This quote is a kind of premonitory response to the self-styled heterodox economists who recently partitioned economics as it is developed into either mainstream economics or heterodox economics.³ I will allude below to the criticism by the heterodox economists of the (excessive) use of mathematics in mainstream economics. It seems clear that Condorcet counted economics among the arts (when he mentions wants, means, time, expences). It is even clearer on page 306, where he writes:

L’application du calcul n’est-elle pas encore nécessaire à cette partie de l’économie publique qu’embrasse la théorie des mesures, celle des monnaies, des banques, des opérations de finances; enfin celle des impositions, de leur répartition établie par la loi, de leur distribution réelle qui s’en écarte si souvent, de leurs effets sur toutes les parties du système social.⁴
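Returning to Condorcet’s 60-voter example, the majority cycle can be checked mechanically. The following sketch counts, for each ordered pair of candidates, how many voters rank the first above the second; a direct count from the five rankings gives 33 voters for A over B, 42 for B over C, and 35 for C over A. The function names and the dictionary encoding of the profile are illustrative choices, not anything taken from Condorcet.

```python
from itertools import permutations

# Condorcet's example: number of voters holding each ranking (best to worst).
CONDORCET_PROFILE = {"ABC": 23, "BCA": 17, "BAC": 2, "CAB": 10, "CBA": 8}


def pairwise_support(profile, first, second):
    """Count the voters who rank `first` above `second`."""
    return sum(
        count
        for ranking, count in profile.items()
        if ranking.index(first) < ranking.index(second)
    )


def majority_relation(profile):
    """Return {(x, y): votes} for each ordered pair that a strict majority prefers."""
    total = sum(profile.values())
    candidates = sorted(set("".join(profile)))
    wins = {}
    for x, y in permutations(candidates, 2):
        votes = pairwise_support(profile, x, y)
        if 2 * votes > total:
            wins[(x, y)] = votes
    return wins


print(majority_relation(CONDORCET_PROFILE))
# The three-voter profile mentioned in the text produces the same cycle.
print(majority_relation({"ABC": 1, "BCA": 1, "CAB": 1}))
```

In both profiles a majority prefers A to B, B to C, and C to A, so no candidate beats both of the others.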

¹ I think that this is a typo. Obviously, it should be “utilité” rather than “inutilité.” Given that the book was published after Condorcet’s death, the page proofs must not have been correctly checked. Strangely, the English translation is given as “against utility” which is probably what Condorcet wrote or wanted to write.

² The translation is from Condorcet (1796). “We may remark that those declamations which are made against the utility of theories, even in the most simple arts, have never shewn anything but the ignorance of the declaimers. . . These observations will lead us to one general truth, that in all the arts the results of theory are necessarily modified in practice; that certain sources of inaccuracy exist, which are really inevitable, of which our aim should be to render the effect insensible, without indulging the chimerical hope of removing them; that a great number of data relative to our wants, our means, our time, and our expences which are necessarily overlooked in the theory, must enter in the relative problem of immediate and real practice; and that, lastly by introducing these requisites with that skill which truly constitutes the genius of the practical man, we may at the same time go beyond the narrow limits wherein prejudice against theory threaten to detain the arts, and prevent those errors into which an improper use of theory might lead us.”

³ I do not recognize myself in this dichotomy. Of course, the heterodox economists believe that they hold the truth.

⁴ “Is not the application of numbers also necessary to that part of the public economy which includes the theory of public measures, of coin, of banks and financial operations, and lastly, that of taxation as established by law, and its real distribution, which so frequently differs, in its effects on all the parts of the social system.” I do not agree with this translation, as “calcul” is translated as “numbers,” “économie publique” as “public economy,” and “théorie des mesures” as “theory of public measures.” It is evident to me that Condorcet’s “calcul” is “probability calculus,” “économie publique” should be rendered as “public economics” (Condorcet might be the first to use this phrase), and “théorie des mesures” has nothing to do with “public actions” or “public measures” but is a premonitory reference to the then non-existent “measurement theory.”

Mathematics emerged in economics after utility replaced labor in the standard theory of value. One can check that the followers of Adam Smith, David Ricardo, and Karl Marx in the nineteenth century, for instance, John Stuart Mill or Henry Sidgwick, hardly use any mathematics in their otherwise excellent books entitled Principles of Political Economy. Among the precursors, Augustin Cournot and Jules Dupuit must be distinguished. But the new paradigm of economics is essentially due to Stanley Jevons, Carl Menger, and Léon Walras. In his Eléments d’économie pure, Walras proposes a formalization of an economy and gives what will be the standard conception of the general equilibrium model. Later, important developments are due to Francis Ysidro Edgeworth and Vilfredo Pareto. A number of books which appeared at the beginning of the twentieth century include a mathematical appendix. This is the case of Pareto’s Manuel d’économie politique (1909) and of Alfred Marshall’s Principles of Economics (1920). In Marshall’s appendix, one can see that Marshall believed that, given a system of (general) equations, if the number of “equations” is equal to the number of “unknowns,” the system is “determinate,” meaning possibly that there exists a solution (and maybe a unique solution). For instance, on page 855 of Marshall (1920), one can read:

We have then 2n + 2m unknowns, viz. the amounts and prices of n commodities and of m factors; and to determine them we have 2m + 2n equations, viz. – (i) n demand equations, each of which connects the price and amount of a commodity; (ii) n equations, each of which equates the supply price for any amount of a commodity to the sum of the prices of corresponding amounts of its factors; (iii) m supply equations, each of which connects the price of a factor with its amount; and lastly (iv) m equations, each of which states the amount of a factor which is used in the production of a given amount of the commodity.

Even more recently, for instance, in the much respected book of John Hicks, Value and Capital (1946), in Chapter 4 on the general equilibrium of exchange, Hicks writes:

Once a particular set of prices is given, we know how to determine the most preferred position of any individual. This gives us the quantity he will demand of those commodities he does not possess, and the quantities he will be willing to supply in exchange for them of those commodities he does. By simple addition, we can determine the demand and supply for each commodity. If the price-system is such as to make these demands and supplies equal, we have a position of equilibrium. If not, some prices at least will be bid up or down. The determination of this solution was shown by Walras to be ensured by equality of the number of equations and the number of unknowns.
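A small illustration may help show why counting “equations” and “unknowns” does not by itself settle whether a solution exists. The sketch below classifies a system of two linear equations in two unknowns. It is only an assumed toy case: the general equilibrium systems discussed in the text are not linear in general, the function name is hypothetical, and the coefficients should be integers or fractions so that the tests for zero are exact.

```python
def classify_2x2(a11, a12, b1, a21, a22, b2):
    """Classify the system  a11*x + a12*y = b1,  a21*x + a22*y = b2.

    Returns ("unique", (x, y)), ("none", None), or ("infinitely many", None).
    """
    det = a11 * a22 - a12 * a21
    if det != 0:
        # Independent equations: Cramer's rule gives the single solution.
        x = (b1 * a22 - a12 * b2) / det
        y = (a11 * b2 - b1 * a21) / det
        return "unique", (x, y)
    # A row of the form 0 = nonzero can never be satisfied.
    for coefficients, rhs in (((a11, a12), b1), ((a21, a22), b2)):
        if coefficients == (0, 0) and rhs != 0:
            return "none", None
    # det == 0: the left-hand sides are proportional. The system is consistent
    # only if the right-hand sides are in the same proportion.
    if a11 * b2 - a21 * b1 == 0 and a12 * b2 - a22 * b1 == 0:
        return "infinitely many", None
    return "none", None


print(classify_2x2(1, 1, 3, 1, -1, 1))  # x + y = 3, x - y = 1
print(classify_2x2(1, 1, 1, 1, 1, 2))   # x + y = 1, x + y = 2
print(classify_2x2(1, 1, 1, 2, 2, 2))   # x + y = 1, 2x + 2y = 2
```

All three systems have two equations and two unknowns, yet only the first is “determinate” in the sense of having exactly one solution.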


M. Salles

According to Joseph Schumpeter (1954), in fact, this belief was not shared by Walras: Of all unjust and even meaningless objections that have been leveled at Walras, perhaps the most unjust is that he believed that this existence question is answered as soon as we have counted ‘equations’ and ‘unknowns’ and have found that they were equal in numbers. We have already seen that he made sure of one additional prerequisite–independence of equations. But as we analyze his argument we discover further that, though his mathematical equipment was no doubt deficient, his genius saw and sensed all or almost all the other relevant problems and practically always arrived at correct results. If he failed to answer all questions satisfactorily, there was immortal merit in his having posited them. (page 1006) What must be emphasized is that up to the 1940s, economists did not hesitate to use shaky mathematical reasoning to obtain their results. Furthermore, the mathematics they were employing had nothing to do with the mathematics developed in the nineteenth and twentieth centuries. We have two different sources mentioning John von Neumann’s remarks regarding this situation. Martin Shubik (1970) writes: Writing at approximately the same time as Chamberlin, Mrs. Robinson managed to write a book on imperfect competition in which the work of Cournot was not even mentioned. The mathematical apparatus she assembled to do this job was such that von Neumann once remarked that if archeologists of some future civilization were to dig up the remains of ours and find a cache of books, the Economics of Imperfect Competition would probably be dated as an early precursor of Newton. It is of interest to note that the concept of equilibrium in Mrs. Robinson’s book is the same as that of Cournot. 
A few years later, Oskar Morgenstern (1976) insisted: Von Neumann’s view of books on mathematical economics written up to that time (of course, excepting Wald and Menger) and even somewhat later was: “you know Oskar, if these books are unearthed sometime a few hundred years hence, people will not believe that they were written in our time. Rather they will think that they are about contemporary with Newton, so primitive in their mathematics. Economics is simply still a million miles away from the state in which an advanced science is, such as physics.” Morgenstern had in mind not only Robinson’s book but also the revered Hicks’s Value and Capital. Incidentally, according to Cheryl Misak (2020), Joan Robinson once said “As I never learnt mathematics, I have had to think.” However, Misak does not give a source for her quote, an assertion which is so stupid that I wonder whether Robinson really said this. Fortunately, as will be shown below, things have changed, largely due to the interest of some remarkable scholars. In Alexander Soifer’s book (?), the mathematician Harold Kuhn, well known for his papers and his book on game theory and his contribution with Albert Tucker to smooth mathematical programming, writes: Although mathematics became the lingua franca of 20th century economics, only a handful of mathematicians have exerted a direct and lasting influence on

63 Mathematics and Economics, with Special Attention to Social Choice Theory


the subject. They surely include Frank Plumpton Ramsey, John von Neumann, and John Forbes Nash jr. In passing, I will add a few other names. In section “Mathematics in Economics, Game Theory, and Social Choice Theory”, I will take a rather personal, subjective, and cursory glance at some of the mathematics used in economics, in particular in microeconomic theory and social choice theory, and in game theory. Section “The Use of Mathematics in Economics Questioned” will be devoted to the attacks against the use of (advanced) mathematics. I will conclude by considering a thesis which is well established in the philosophy of mathematics, but mainly in the case of the “natural sciences”: the indispensability of mathematics.

Mathematics in Economics, Game Theory, and Social Choice Theory We are presently living, to take the title of a recent volume edited by Roger Backhouse and Cherrier (2017), in the age of the applied economist. As a corollary, the importance of mathematics, or at least of the most advanced kind of mathematics normally used by economists, has somewhat decreased. As a consequence, I will mainly consider developments of theoretical economics that are at this time (2020) out of fashion. For this reason, I will focus my comments on what is still the core of microeconomic theory: general equilibrium theory. Fortunately, the theory of social choice and welfare, including the theory of voting, remains highly theoretical, even if applications are not absent. One of the first books which I consider as a mathematical economics book is Paul Samuelson’s Foundations of Economic Analysis (1947) (of course, excepting von Neumann and Morgenstern’s monograph Theory of Games and Economic Behavior whose first edition appeared in 1944). Samuelson’s Foundations includes fundamental developments on welfare economics, but not much on the Walrasian general equilibrium analysis. However, this analysis has been crucial for the development of modern microeconomic theory. The three main contributors to this domain, Kenneth Arrow (Arrow and Debreu 1954), Gérard Debreu (Arrow and Debreu 1954; Debreu 1959), and Lionel McKenzie (1959), have introduced the mathematical tools which are the basic mathematical ingredients of this microeconomic theory. The main question which was to be solved was the existence of a price system that would equalize demand and supply. The history of the discovery of this existence is beautifully given in Till Düppe and Weintraub (2014). Preliminary developments are due to Abraham Wald (Wald 1936, 1951) who is now better known as a statistician.

General Equilibrium Theory I give a brief description of the considered (exchange) economy in the case where production is excluded (in fact, we may consider that production has been included in the initial endowments of the individuals). Quantities of m goods (commodities)


are given as vectors in the nonnegative orthant of the Euclidean space of finite dimension $m$: $\mathbb{R}^m_+$. A vector $x^i \in \mathbb{R}^m_+$ gives the quantities of goods attributed to individual $i$. There is a finite number $n$ of individuals $i$. Each individual $i$ has an initial endowment $e^i \in \mathbb{R}^m_+$ (not 0). The sum over all $i$ of the $e^i$s is the supply: $\sum_i e^i$. The prices of the commodities are given as a vector $p \in \mathbb{R}^m_{++}$ of positive real numbers, for simplification. The value (scalar product) $p e^i$ of the initial endowment of individual $i$ is accordingly a positive real number that defines that individual’s budget set, $B^i(p)$, the vectors of quantities of commodities she can afford at prices given by the vector $p$: $B^i(p) = \{x^i : p x^i \le p e^i\}$. Each individual $i$ has a preference relation over $\mathbb{R}^m_+$ denoted $\succsim_i$ which is a complete preorder (a term borrowed from Bourbaki – in French préordre – by Debreu who, incidentally, used preordering rather than preorder). A preorder is a reflexive and transitive binary relation. The intuitive meaning of $x \succsim_i y$ (written this way rather than $(x, y) \in \succsim_i$) is that $x$ is at least as good as $y$ from the standpoint of individual $i$. One can define from the preorder an asymmetric part, $x \succ_i y$ ($x$ is better than $y$), and a symmetric part, $x \sim_i y$ (there is an indifference between $x$ and $y$). Each individual $i$ will demand a vector $\bar{x}^i$ which maximizes $\succsim_i$ over $B^i(p)$. With appropriate assumptions on $\succsim_i$, a maximum is guaranteed and is unique (again with rather strong conditions for simplification). These assumptions are topological assumptions and convexity assumptions: (i) the sets $\{x : x \succsim_i a\}$ are closed for all $a \in \mathbb{R}^m_+$ in the relative topology induced by the topology over $\mathbb{R}^m$ defined from the Euclidean distance, and (ii) if $x \sim_i y$, then each point of the convex combination of $x$ and $y$ (other than $x$ and $y$) is strictly preferred to $x$ (and $y$, of course), i.e., for any $\lambda \in \,]0, 1[$, $\lambda x + (1 - \lambda) y \succ_i x$. 
Another property which might be used in the proof of existence is strong monotonicity, which says that the individual prefers a vector $x$ to any vector whose coordinates are less than or equal to the respective coordinates of $x$, with at least one coordinate being strictly less. A Walrasian equilibrium is then a pair $(\bar{p}, \bar{x})$, where $\bar{p}$ is a vector of prices, $\bar{x}$ is the list of vectors $(\bar{x}^1, \ldots, \bar{x}^n)$ where $\bar{x}^i$ maximizes $\succsim_i$ over $B^i(\bar{p})$, and $\bar{x}$ verifies $\sum_i \bar{x}^i = \sum_i e^i$ (demand and supply are equal). The vector $\bar{p}$ is said to be a vector of equilibrium prices, and $\bar{x}$ is an equilibrium allocation. Even if, with all the given simplifications, the described system is not extremely complex, it is far from simple. Of course, it is quite abstract, and we may think that the real world does not look like this abstract system. But there is no other way to try to understand the functioning of complex systems; we must construct simplified systems. What kind of mathematics will be useful to solve the questions posed by these simplified systems? Traditionally, economists used and still use calculus, more or less rigorously. Their main application of mathematics is through maximization (or minimization) of smooth functions where the domains of these functions are restricted by systems of equations (for a modern presentation, see Gunning 2018). This has been extended to the case of inequalities by Kuhn and Tucker (see Simon and Blume 1994).
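As a small illustration of the definition of a Walrasian equilibrium, the following sketch computes one for a two-good exchange economy with two individuals. The Cobb-Douglas utility functions, the numerical endowments, and the names `demand` and `equilibrium_price` are assumptions made for this example and do not come from the text. With such preferences the demand has a closed form, so the market-clearing price can be solved directly instead of being obtained from a fixed-point argument.

```python
from fractions import Fraction as F

# Hypothetical two-good exchange economy (all numbers are illustrative assumptions).
# Individual i has Cobb-Douglas utility u_i(x) = x_1**a_i * x_2**(1 - a_i).
alphas = [F(1, 3), F(2, 3)]
endowments = [(F(2), F(1)), (F(1), F(3))]

def demand(alpha, e, p):
    """Maximizer of the Cobb-Douglas utility over the budget set {x : p.x <= p.e}."""
    wealth = p[0] * e[0] + p[1] * e[1]
    return (alpha * wealth / p[0], (1 - alpha) * wealth / p[1])

def equilibrium_price(alphas, endowments):
    """Price vector clearing both markets, with good 2 as numeraire (p_2 = 1)."""
    num = sum(a * e[1] for a, e in zip(alphas, endowments))
    den = sum((1 - a) * e[0] for a, e in zip(alphas, endowments))
    return (num / den, F(1))

p_bar = equilibrium_price(alphas, endowments)
x_bar = [demand(a, e, p_bar) for a, e in zip(alphas, endowments)]
total_demand = tuple(sum(x[k] for x in x_bar) for k in range(2))
total_supply = tuple(sum(e[k] for e in endowments) for k in range(2))

print("equilibrium prices:", p_bar)
print("equilibrium allocation:", x_bar)
print("demand equals supply:", total_demand == total_supply)
```

Exact rational arithmetic is used so that the equality of demand and supply can be tested without rounding tolerance. Only relative prices matter here, which is why one price can be normalized to 1.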


However, Arrow, Debreu, and McKenzie did not attack the existence problem as a calculus maximization problem. They used methods borrowed from convex analysis and general topology. One of the main tools is a fixed-point theorem due to Y. Kakutani, who generalized Brouwer’s fixed-point theorem to the case of set-valued functions (correspondences). In my presentation above, Brouwer’s theorem would be sufficient. One must remark that a mathematical tool published in 1941 is used in a paper of Nash appearing in 1950, and, inspired by Nash, in the celebrated paper of Arrow and Debreu in 1954. Slightly earlier, the modern version of the so-called fundamental theorems of welfare economics was proposed by Arrow and Debreu separately (it was before their collaboration leading to the 1954 Econometrica paper). An allocation $x$ is a list of vectors $(x^1, \ldots, x^n)$ where $x^i$ is a vector of quantities of goods attributed to individual $i$. An allocation $x$ is Pareto-optimal if there is no other (feasible) allocation $y$ such that $y^i \succsim_i x^i$ for all $i$ and $y^i \succ_i x^i$ for some $i$. One must note that this optimality, which is the concept encompassing welfare considerations, is based on a selfish property. We can say, for instance, that the allocation $x$ is at least as good as $y$ for the society if $x^i \succsim_i y^i$ for all $i$, and that $x$ is better than $y$ for the society if $x^i \succsim_i y^i$ for all $i$ and $x^i \succ_i y^i$ for some $i$. The relation “is at least as good as for the society” is then seen as a partial preorder whose maximal elements are the Pareto-optimal elements. The kind of social preference we obtain is only based on individual preferences which are over quantities of goods accruing to individuals. The first welfare theorem demonstrates that if $\bar{x}$ is an equilibrium allocation, it is Pareto-optimal. This result is often presented as confirming Adam Smith’s property of the invisible hand (Mas-Colell et al. 1995), saying that the pursuit of self-interest leads to the collective good. 
As a matter of fact, Smith writes in the Wealth of Nations (fifth edition 1789, 1994), Book I, Chapter II: It is not from the benevolence of the butcher, the brewer, or the baker, that we expect our dinner, but from their regard to their own interest. (page 15) And in Book IV, Chapter II: By preferring the support of the domestic to that of foreign industry, he [any individual] intends only his own security; and by directing that industry in such a manner as its produce may be of the greatest value, he intends only his own gain, and he is in this, as in many other cases, led by an invisible hand to promote an end which was no part of his intention. Nor is it always the worse for the society that it was no part of it. By pursuing his own interest, he frequently promotes that of the society more effectually than when he really intends to promote it. Of course, the claim that Pareto-optimality of an allocation is a criterion of welfare is highly debatable, since an allocation where nearly all the quantities of the commodities are attributed to a limited number of greedy individuals can be Pareto-optimal. Indeed, supposing that all feasible allocations can only be attained by a redistribution of the quantities of goods among the individuals, any transfer will bring about a decrease of the quantities of goods accruing to some individuals, and, given strongly monotone preferences, these individuals will prefer the vector they had before the transfer.
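The remark about greedy individuals can be checked on a toy example. The sketch below assumes a single good available in whole units, three individuals, and preferences that simply increase with one's own quantity (a strongly monotone preference). These choices and the names `feasible_allocations`, `dominates`, and `is_pareto_optimal` are hypothetical and serve only to illustrate the argument.

```python
from itertools import product

TOTAL = 4  # units of a single good to distribute (illustrative assumption)
N = 3      # number of individuals (illustrative assumption)

def feasible_allocations(total, n):
    """All ways of distributing exactly `total` units among n individuals."""
    return [x for x in product(range(total + 1), repeat=n) if sum(x) == total]

def dominates(y, x):
    """y Pareto-dominates x when each preference increases with one's own quantity."""
    weakly_better = all(yi >= xi for yi, xi in zip(y, x))
    strictly_better = any(yi > xi for yi, xi in zip(y, x))
    return weakly_better and strictly_better

def is_pareto_optimal(x, allocations):
    return not any(dominates(y, x) for y in allocations)

allocations = feasible_allocations(TOTAL, N)
greedy = (TOTAL, 0, 0)  # one individual receives everything

print(is_pareto_optimal(greedy, allocations))
print(all(is_pareto_optimal(x, allocations) for x in allocations))
```

In this setting every redistribution takes something away from someone, so every feasible allocation passes the test, including the one that gives everything to a single individual.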


The so-called second fundamental theorem of welfare economics gives a partial response to this difficulty. We can start from a feasible Pareto-optimal allocation and associate with this allocation a price vector and an appropriate redistribution of the initial endowments so that the Pareto-optimal allocation and the price vector form a pair which is a Walrasian equilibrium allocation. To prove this result, we need to assume that the individual preferences are convex and then use results on the separation of convex subsets by hyperplanes due to Hermann Minkowski. A number of later developments have necessitated the recourse to rather deep and advanced mathematical techniques. I will briefly present some of these contributions. A major break was created by a paper of Robert Aumann (1964). Rather than considering a finite set of individuals, Aumann assumed a continuum of individuals, the set of individuals being, for instance, the closed interval [0, 1] and an individual being a point in this interval. The justification is that in a competitive market, individuals do not have the possibility nor the power to alter anything, including, of course, the prices. The absence of power is accordingly better taken into consideration by this assumption of a continuum of agents. This is a strong assumption, but it has an advantage regarding some convexity assumptions which are no more necessary. Aumann (1966) provides an existence theorem for the appropriately redesigned Walrasian system. In the 1970s, this approach was among the most active research programs in advanced mathematical economics with remarkable contributions by, among others, Hildenbrand (1974) and Shitovitz (1973). In particular, Shitovitz introduced oligopolies within the general economic system, a nice step towards the real world. Mathematically speaking, convex analysis was, in some sense, supplanted by measure theory. 
Related to this consideration of a continuum of agents and, consequently, large economies, there have been a few papers using nonstandard analysis. Noteworthy are the contributions of the modern “creator” of nonstandard analysis, Abraham Robinson, to this enterprise in a series of papers written in collaboration with Donald J. Brown (see, for instance, Brown and Robinson 1975). A rather surprising development of the theory in the 1970s was the reintroduction of differentiability properties. This started with an article of Debreu (1970). Using tools borrowed from differential topology, Debreu dealt with a question that goes beyond the existence of a Walrasian equilibrium. Uniqueness is a target difficult to attain as it needs very strong assumptions. However, what Debreu was able to get is finiteness of the set of equilibria. This development prompted research on aggregate demand by Hugo Sonnenschein, Rolf Mantel, and Debreu himself. The outcome is often considered as marking a kind of halt in general equilibrium research beyond existence, optimality, and relations with cooperative game solution concepts. But this point is debated. A number of remarkable books appeared using the differential topology approach, among which are books by Mas-Colell (1985) and Yves Balasko (1988, 2009, 2011). If we consider that a competitive economy is an ideal economy, this would mean that we look for an equilibrium in an ideal economy. Since it has been proven that this ideal situation is most probably not unique, we reach a kind of


semantic contradiction. But we may consider that, even if the economic system is ideal and generates multiple equilibria (which are, furthermore, Pareto-optimal), there remains to choose the possibly ideal equilibrium. Then we see how social choice theory can play a fundamental role to complete the analysis of the (ideal) competitive economy. Of course, social choice cannot be limited to this crowning achievement of the theory of general equilibrium as will be seen in the next section.

Social Choice Theory In a remarkable book that readers of this chapter must consult, Donald G. Saari writes: A convenient way to describe the reductionist philosophy is to illustrate with a puzzling issue. My choice reflects the reality that it is nearly mandatory for books describing the mathematics of the social and behavioral sciences to include Arrow’s seminal impossibility theorem. As I share this point of view, I will present in this subsection Arrow’s theorem. I will also give a relatively detailed presentation of the so-called Gibbard-Satterthwaite theorem, which I prefer to call the Gibbard-Pattanaik-Satterthwaite theorem. We consider a set $X$ of alternatives or options. We can assume that these options are candidates in an election but also assume that they are, as in Arrow (1948, 1950, 1951a, 1963), social states, that is, detailed descriptions of the state of the world, or also that they are allocations in the sense of the previous subsection. A finite set of individuals $N$ of size $n$ can be a set of voters, a set of economic agents, or any group of individuals having to take social or collective decisions. It is crucial that $N$ be finite, since this assumption is not only used to prove the theorem but is also necessary for the theorem to be true. Each individual $i \in N$ has a preference, $\succsim_i$, over $X$ which is supposed to be a complete preorder. In the case of the set $X$ being finite (this is not necessary regarding Arrow’s theorem), this means that the individual is able to rank the alternatives with possible ties. A profile $\pi$ is a function from $N$ to a subset of the set of complete preorders over $X$, or, equivalently, a list $(\succsim_1, \ldots, \succsim_n)$ with the $\succsim_i$s belonging to the subset. An aggregation function $f$ associates a social preference $\succsim_S$ to each profile in its domain, the domain being the set of profiles such that the $\succsim_i$s belong to the mentioned subset: $\succsim_S = f(\succsim_1, \ldots, \succsim_n)$. The social preference $\succsim_S$ is supposed to be a complete and reflexive binary relation. 
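For a small finite set of alternatives, the complete preorders can be enumerated by brute force, which makes the phrase "rank the alternatives with possible ties" concrete. The sketch below is only an illustration: the three-element set, the choice of three individuals, and the name `complete_preorders` are assumptions, and a relation is represented as the set of pairs (x, y) such that x is at least as good as y.

```python
from itertools import product

X = ("a", "b", "c")  # illustrative set of alternatives
PAIRS = [(x, y) for x in X for y in X]

def is_complete_preorder(rel):
    # Completeness over all pairs (x, y), including x == y, also gives reflexivity.
    complete = all((x, y) in rel or (y, x) in rel for x in X for y in X)
    transitive = all((x, z) in rel
                     for (x, y) in rel for (w, z) in rel if y == w)
    return complete and transitive

def complete_preorders():
    """Every binary relation on X that is complete (hence reflexive) and transitive."""
    found = []
    for bits in product((False, True), repeat=len(PAIRS)):
        rel = frozenset(pair for pair, keep in zip(PAIRS, bits) if keep)
        if is_complete_preorder(rel):
            found.append(rel)
    return found

preorders = complete_preorders()
n_individuals = 3
# With an unrestricted domain, a profile assigns any complete preorder to each individual.
n_profiles = len(preorders) ** n_individuals
print(len(preorders), n_profiles)
```

The enumeration tests all 512 binary relations on three alternatives, which is feasible only because the set is tiny. The count grows very quickly with the number of alternatives.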
Arrow considers aggregation functions which are called social welfare functions. For Arrovian social welfare functions, the social preference has to be transitive, that is, $\succsim_S$ is a complete preorder having, accordingly, the same properties as the individual preferences. The number of possible social welfare functions is huge. With three alternatives, we have thirteen complete preorders. With 3 individuals, supposing that the domain of the social welfare function is the complete set of profiles where each individual can have any of the 13 complete preorders, we have $13^3$ profiles, i.e., 2197 profiles. The number of possible social welfare functions is then $13^{2197}$. Among these functions,


some are clearly undesirable. For instance, constant functions: a constant function would mean that whatever the individual preferences, the social preference is always the same, that is, independent of these individual preferences. Also, imagine that the social preference is systematically the preference of individual 1, whatever the preferences of the other individuals. Obviously, this kind of aggregation function must be rejected. Arrow (1963) proposed four properties of aggregation functions, two of which entail the exclusion of these aforementioned functions. The first property, called condition of unrestricted domain (Condition U), says that the subset of profiles I mentioned to describe the domain of the function is in fact the whole set; the individuals can have any preferences as long as they are complete preorders. With appropriate domain restrictions such as the well-known single-peakedness condition due to Duncan Black in 1948 (see Black 1958), the theorem is no longer valid. The second property says that if in a profile every individual (strictly) prefers some alternative $a$ to some alternative $b$, then in the social preference $a$ must also be preferred to $b$. This is a condition related to the concept of Pareto-optimality considered in section “General Equilibrium Theory”. For this reason, it has been called Condition P. Once we admit that there are profiles where every individual prefers $a$ to $b$ and profiles where every individual prefers $b$ to $a$, this condition entails the impossibility of constant functions. A third property asserts that there is no dictator (Condition D), a dictator being an individual whose strict preference is systematically the social strict preference: whenever a dictator prefers some alternative $a$ to some alternative $b$, $a$ is socially preferred to $b$. The most controversial property is the so-called condition of independence of irrelevant alternatives (Condition IIA). Suppose that we have two profiles, $\pi = (\succsim_1, \ldots, \succsim_n)$ and $\pi' = (\succsim'_1, \ldots, \succsim'_n)$, such that for two alternatives $a$ and $b$, the restrictions of the individual complete preorders to $\{a, b\}$ are identical, i.e., $\succsim_i|_{\{a, b\}} = \succsim'_i|_{\{a, b\}}$ for all $i$. Then the restrictions to $\{a, b\}$ of the social preferences obtained from the aggregation function as values of these two profiles are identical: $\succsim_S|_{\{a, b\}} = \succsim'_S|_{\{a, b\}}$ where $\succsim_S = f(\succsim_1, \ldots, \succsim_n)$ and $\succsim'_S = f(\succsim'_1, \ldots, \succsim'_n)$. Suppose, for instance, that the set $X$ is finite and has 10 elements and that the aggregation is by voting. The fact that, say, individual 1 ranks $a$ first and $b$ last or that she ranks $a$ second and $b$ third, all other things remaining the same, should not have any effect on the relative ranking of $a$ and $b$ in the social ranking. This, in particular, excludes functions based on scores defined from the position of the alternatives in a ranking, as with Borda’s rule, where points are attributed to alternatives on the basis of their ranks. For instance, with three alternatives, supposing there are no ties, 2 points are attributed to an alternative ranked first, 1 point is attributed to an alternative ranked second, and 0 to the alternative ranked third. The points are added, and the social ranking is determined by the number of points obtained by the alternatives. This also excludes plurality rule which is the rule that is used in most


political elections in the UK (where it is called ‘first past the post’) and the USA and, in fact, the quasi-totality of election rules used in practice. Arrow’s theorem states that if there are at least three alternatives and at least two individuals, there is no social welfare function satisfying conditions U, P, D, and IIA. From the mathematical point of view, the Arrovian framework and the theorem itself are remarkable. It is known that Arrow was influenced by the teaching of the logician Alfred Tarski. The Arrovian basic framework is made of elementary set theory and, in particular, relations. Incidentally, one can find a K. J. Arrow among the individuals to whom Tarski expressed thanks and gratitude in the American edition of his book Introduction to Logic and the Methodology of Deductive Sciences (1994, first edition 1941). This framework became a rather standard framework to deal with preference relations in microeconomic theory and, of course, the fundamental framework for social choice theory. The framework and the proof of the theorem can be judged to belong to discrete mathematics as opposed to what was the regular mathematical language used by economists heretofore as exemplified by Samuelson’s book (1947). The second major result of social choice theory I wish to describe is the Gibbard-Pattanaik-Satterthwaite theorem (Gibbard 1973; Pattanaik 1973, 1978; Satterthwaite 1975). I will simplify the Arrovian framework in this presentation. Rather than using complete preorders for individual preferences, we can consider that there are no ties. Since we will also assume that $X$ is finite, we will then have linear orders over $X$ which are rankings of all the options without ties. A profile $\pi$ will, accordingly, be a list of linear orders/rankings denoted $(\succ_1, \ldots, \succ_n)$. Rather than considering aggregation functions whose values are social binary relations, we will associate an option in $X$ to a profile $\pi$. Let us call such functions social choice functions. 
We will assume that the social choice functions, $f$, are surjective: for every $x \in X$, there is a profile $\pi$ in the domain of the social choice function such that $x = f(\pi)$. We will assume that the domain is unrestricted: the individual preference in a profile can be any possible linear order. The problem we want to tackle is the problem of individual strategic behavior, or, if $f$ is a voting rule, strategic voting. We say that $f$ is manipulated by individual $i$ at profile $\pi = (\succ_1, \ldots, \succ_n)$ if there exists a linear order $\succ'_i$ such that $f(\pi_{-i}, \succ'_i) \succ_i f(\pi)$, where $(\pi_{-i}, \succ'_i)$ is the profile $\pi$ in which the preference $\succ_i$ has been replaced by the preference $\succ'_i$. This means that if individual $i$ indicates that her preference is $\succ'_i$ rather than $\succ_i$, the option selected as the social outcome will be preferred by individual $i$ to the outcome that would have prevailed if she had indicated $\succ_i$. If you imagine that the original profile $\pi = (\succ_1, \ldots, \succ_n)$ is constituted of sincere individual preferences, this means that, by expressing a non-sincere preference (a lie), individual $i$ forces the function to generate an outcome that she prefers to the outcome which would have been generated had she sincerely expressed her preference. If some individual $i$ holds full power, the manipulation problem disappears. We want, of course, to exclude this possibility. We must, accordingly, reformulate the notion of dictatorship. A dictator for a social choice function $f$ will be an individual $i$ whose option ranked first in her linear order is systematically the social


choice. Formally, the individual $i$ is a dictator if for any profile $\pi$, $f(\pi) \succ_i x$ for all $x \in X \setminus \{f(\pi)\}$. The theorem is then: if $n \ge 1$ and the number of elements in $X$ is at least 3, and $f$ is surjective and non-manipulable, then there is a dictator. If dictatorship is a repugnant solution, one can see that we have to accept that the social choice function be manipulable. The mathematical tools that have been employed in social choice theory are diverse, depending on the assumptions made on the various concepts. For instance, the set of options can be a subset of the Euclidean space. Then topological properties and convexity properties will be introduced in a way similar to what is done in general equilibrium theory. Also, continuity properties of the aggregation functions can be meaningful in the context of aggregation of individual judgments. Continuity will replace the condition of independence of irrelevant alternatives. We owe this kind of analysis to Graciela Chichilnisky (1983). One must be careful, however. If we deal with aggregation of opinions, like in voting by using majority rule, we know that there are discontinuities. In a voting situation when, for instance, the voters are partitioned into two nearly equal sets, a slight modification of a profile can lead to a radical change in the social preference. Chichilnisky used tools borrowed from algebraic topology. This has sometimes been called the topological approach to social choice theory. Some well-known mathematicians, such as Shmuel Weinberger, contributed to this domain. Important contributors, mathematicians or economists, include Geoffrey Heal, Nicholas Baigent, Charles Horvath, and Yuliy Baryshnikov. Smoothness properties have been considered by Norman Schofield (Schofield 2008, 2014), among others. Saari, in his treatment of scoring rules in voting theory, one of the major developments of post-Arrovian social choice, used tools borrowed from differential geometry and dynamical systems. 
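To make the notion of manipulation concrete for a scoring rule, the following sketch searches by brute force for a manipulation of Borda's rule with three options and three individuals. Turning Borda's rule into a social choice function requires a tie-breaking convention; the alphabetical tie-break used here is an assumption of the example, as are the function names `borda_choice` and `find_manipulation`.

```python
from itertools import permutations, product

X = ("a", "b", "c")
ORDERS = list(permutations(X))  # linear orders, best option first

def borda_choice(profile):
    """Option with the highest Borda score; ties broken alphabetically (assumption)."""
    score = {x: 0 for x in X}
    for order in profile:
        for rank, x in enumerate(order):
            score[x] += len(X) - 1 - rank
    return min(X, key=lambda x: (-score[x], x))

def find_manipulation(n=3):
    """Return (profile, i, false_order) where individual i gains by misreporting, if any."""
    for profile in product(ORDERS, repeat=n):
        sincere = borda_choice(profile)
        for i in range(n):
            for lie in ORDERS:
                changed = profile[:i] + (lie,) + profile[i + 1:]
                outcome = borda_choice(changed)
                # i prefers `outcome` to `sincere` according to her sincere order
                if profile[i].index(outcome) < profile[i].index(sincere):
                    return profile, i, lie
    return None

print(find_manipulation())
```

One manipulation that can be verified by hand: if the sincere orders are a, b, c for the first individual and b, a, c for the two others, option b is chosen. If the first individual reports a, c, b instead, a and b tie on points and the tie-break selects a, which she prefers.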
Taking account of the fact that preferences can be vague, fuzzy set theory has been used by a number of researchers (see the references in Richard Barrett and Salles (2011), and in John N. Mordeson et al. (2015)). Regarding the manipulability issue, several mathematicians wrote remarkable books, for instance, Alan Taylor (2005) and Bezalel Peleg and Peters (2010). Computer scientists building on the pioneering work of John Bartholdi, Craig Tovey, and Michael Trick in 1989 are at the origin of another major development of social choice in recent times, the problem to solve being the practical possibility of the determination of the choice, not only its existence. This important literature is covered in the Handbook of Computational Social Choice (Brandt et al. 2016). This research prompted a renewed interest in logic (philosophical and mathematical logic) as a possible tool for social choice theory. This takes us back to Charles L. Dodgson, an Oxford logician, better known under his pen name of Lewis Carroll, who wrote a few “pamphlets” on voting in the nineteenth century. It takes us back too to Arrow and Tarski (as noted in Wesley Holliday and Pacuit 2020) and other giants such as Sen (1970, 2017) or Pattanaik (1971). Both Sen and Pattanaik refer to books written by logicians (Willard von Orman Quine, and David Hilbert and Wilhelm Ackermann). Already in 1968, a little book by Yasusuke Murakami links logic and social choice (Murakami 1968). Moreover the recent research on the


domain known as “judgment aggregation” based on Condorcet’s 1785 book has recourse to logic as exemplified in the papers by, among others, Christian List, Franz Dietrich, Clemens Puppe, or Philippe Mongin. Two recent books on logic have a chapter devoted to social choice (de Swart 2018; Hansson and Hendricks 2018).

Game Theory A remarkable fact regarding game theory is that some of the greatest mathematicians of the twentieth century made fundamental contributions. Von Neumann, with the economist Morgenstern, wrote the book, Theory of Games and Economic Behavior (1944, 1953), which, in some sense, starts the domain even though there have been some precursors, including von Neumann himself. John Nash (1950a) introduced a famous concept of equilibrium for noncooperative games which bears his name and, moreover, with a short paper is at the origin of bargaining theory (Nash 1950b). John Milnor (1954) wrote an important paper on individual decision theory interpreted as games against nature and papers on the so-called oceanic games (with Lloyd Shapley). Aumann tackled the question of incomplete information in a number of publications, including a major contribution to repeated games with Michael Maschler (Aumann and Maschler 1995). Most of the fundamental works in game theory are due to mathematicians. An encyclopedic treatise by Maschler, Eilon Solan, and Shmuel Zamir appeared in 2013, and a volume which can be seen as a mathematical monograph on repeated games was published in 2015 (Mertens et al. 2015). Even if game theory can be considered as “applied mathematics,” it permeates microeconomics, social choice and voting theory, and political science. However, the influence of game theory in economics, even if von Neumann and Morgenstern referred to “economic behavior” in the title of their book, did not start before the 1970s. Incidentally, nearly all the publications on game theory up to that date had authors who were mathematicians, and their papers appeared in mathematical periodicals or in the famous Princeton Annals of Mathematics Studies series. Times changed in the 1970s with the development of industrial economics and its recourse to concepts forged in noncooperative games (such as Nash equilibrium and refinements of this concept). 
The importance of noncooperative games has been exemplified by the publication of books entitled “game theory” that did not include a word about cooperative games. In what follows, however, I will concentrate my comments on concepts which were created for cooperative games. The first concept is called the “core.” In cooperative games, a major assumption is that there is a function which evaluates, in some sense, the power of groups of people, called coalitions. This function is called the characteristic function in von Neumann and Morgenstern (1953) and the coalition function in Maschler et al. (2013). It measures the “worth” of a coalition with a real number, possibly normalized to belong to the closed interval $[0, 1]$. A coalition can distribute its value/worth among its members. If we consider a vector $x$ of $n$ real numbers, one number for

each individual, such a vector will be an outcome. An outcome is stable if no coalition can, by seceding, obtain a better outcome for each of its members (on the basis of the worth of the coalition). The core is then the set of stable outcomes. I will now describe an application of this concept to the general equilibrium framework presented in section “General Equilibrium Theory”. The underlying idea, as a matter of fact, preceded the introduction of the “core” concept in game theory, which is essentially due to Donald Gillies (1959). One can find this idea of relating a cooperative concept to the noncooperative equilibrium concept of equilibrium analysis in Edgeworth’s Mathematical Psychics (1881), and the specific Edgeworth conjecture was proved only in 1963 by Debreu and Herbert Scarf. Let us consider two allocations $x$ and $y$ and a coalition $C$ where, for all individuals $i \in C$, we have $x_i \succ_i y_i$. Now let us suppose that the individuals in $C$ can afford their share in $x$ in the sense that $\sum_{i \in C} x_i \le \sum_{i \in C} e_i$, that is, by redistributing what they have in common ($\sum_{i \in C} e_i$), the individuals in $C$ can construct a reduced vector made of the $x_i$ for $i \in C$. We can easily imagine that the members of the coalition $C$ have an incentive to secede rather than remain with their share in the allocation $y$. The “core of the economy” is then the set of allocations for which no coalition has an incentive to secede. It is rather easy to demonstrate that an equilibrium allocation belongs to the core. Starting with an allocation in the core, showing that we can “create” an equilibrium allocation is more complicated. The idea of Debreu and Scarf is to see whether, in a large economy, that is, an economy with many individuals, a sort of conjecture à la Edgeworth holds. This conjecture essentially says that for a large economy, the core and the set of equilibrium allocations coincide.
To prove this, or an approximate version, Debreu and Scarf postulate the existence of replicas of the basic economy, that is, an economy having, say, $kn$ individuals where there are $k$ copies of each individual (i.e., $k$ individuals having the same preferences and the same initial endowments). If an allocation in the core of the $k$ times replicated economy, with $k$ tending towards $\infty$, is such that the $k$ identical individuals $i$ receive the same $x_i$, then $x = (x_1, \ldots, x_n)$ is an equilibrium of the basic economy. Of course, one may consider replacing this strange construction of replicas by an assumption that there are infinitely many individuals, and, using tools borrowed from measure theory and functional analysis, Aumann (1964) was able to get an essentially similar result. This domain has been excellently covered in Hildenbrand (1974) and is still actively explored. What is extremely remarkable is that, in the limit, the cooperative game concept of core is essentially identical with the noncooperative concept of Walrasian equilibrium. Another concept of cooperative games that proves important in applications is the concept of value due to Shapley, hence the Shapley value. I will not describe the theory behind the notion of Shapley value, restricting myself to a particular aspect when the value is the value of a specific simple game (Shapley and Shubik 1954). In simple games, a coalition has value 1 or value 0. (The co-domain of the characteristic function is the two-element set $\{0, 1\}$.) I will present an example of a simple/voting game, viz., the voting system of the Security Council as described in the charter of the United Nations Organization. The Security Council has 15

members, with 5 so-called permanent members, the others being elected by the General Assembly. To pass a resolution, one needs the favorable votes of all the permanent members and of at least four other members. So, in the case of a nonpermanent member, for a resolution to pass, one needs an already formed coalition of the five permanent members plus three nonpermanent members. To compute the power index of a nonpermanent member, the Shapley-Shubik index is calculated as follows. Coalitions are supposed to form one member by one member: for instance, first Russia, then China, then a nonpermanent member, etc. up to a coalition of eight members. Then if a nonpermanent member is added to this coalition, the resolution will pass. There are $\binom{9}{3}(8!)$ different orders of the members before adding the nonpermanent member. Once it has been added, there are $6!$ orders of the six remaining nonpermanent members. So according to Shapley and Shubik, the power index of a nonpermanent member is then:

$$\frac{\binom{9}{3}\,(8!)\,(6!)}{15!} = \frac{4}{2145}.$$

The power index of a permanent member is then:

$$\frac{(2145 - 40)/5}{2145} = \frac{421}{2145}.$$

The Security Council voting method is known as giving a veto to permanent members, meaning that each of these permanent members must belong to a coalition for this coalition to win. The calculated power index clearly indicates that the power is concentrated in the permanent members. This kind of simple game structure is said to be proper: if a coalition wins, its complement does not win. Note, however, that this voting rule is not strong (strong meaning that when a coalition does not win, its complement wins). Of course, the combinatorial aspect of the calculation of possible orders might appear as too complicated to reflect the power of a coalition. There exist many power indices, the first one proposed by Luther Martin, one of the United States Founding Fathers.
In modern times, Lionel Penrose defined an index which is essentially the index now known as the Banzhaf index (see Felsenthal and Machover (1998)).
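The Shapley-Shubik calculation for the Security Council given above can be checked with exact rational arithmetic. The following is only a minimal sketch under the voting rule as described in the text (5 permanent members, 10 nonpermanent members, 9 favorable votes including all permanent members); the variable names are illustrative and do not come from Shapley and Shubik.

```python
from fractions import Fraction
from math import comb, factorial

PERMANENT = 5       # members holding a veto
NONPERMANENT = 10   # elected members
MEMBERS = PERMANENT + NONPERMANENT

# A given nonpermanent member is pivotal when it arrives ninth, after the
# five permanent members and exactly three of the other nine nonpermanent
# members: choose those three, order the eight predecessors, then order
# the six nonpermanent members who come afterwards.
pivotal_orders = comb(NONPERMANENT - 1, 3) * factorial(8) * factorial(6)

index_nonpermanent = Fraction(pivotal_orders, factorial(MEMBERS))

# The indices of all members sum to 1, and the five permanent members
# share the remainder equally by symmetry.
index_permanent = (1 - NONPERMANENT * index_nonpermanent) / PERMANENT

print(index_nonpermanent)  # 4/2145
print(index_permanent)     # 421/2145
```

Using `Fraction` rather than floating-point division keeps the result in the same reduced form as the fractions quoted in the text.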

The Use of Mathematics in Economics Questioned

Two recent books have essentially the same title. The first one, edited by Edward Fullbrook, is entitled A Guide to What’s Wrong with Economics (2004). It includes a Part V whose title is Misuse of Mathematics and Statistics. The second one,

by Robert Skidelsky, is entitled What’s Wrong with Economics? A Primer for the Perplexed (2020). Skidelsky is a competent and well-informed economist whose volume is a must-read for all economists, even if I might disagree with him on several points. Fullbrook’s book is representative of the so-called heterodox vision in economics, a vision which is generally highly critical of the use of mathematics in economics. The chapter in Fullbrook by Donald Gillies (who is not the same person as the mathematician Donald Gillies of section “Game Theory”), a respected philosopher, is disappointing. He based his comments mainly on two books: Samuelson (1947) and Helpman and Krugman (1985). He gives no justification for this choice, although the choice of Samuelson could easily be justified, since Gillies compares the use of mathematics in physics to its use in economics. Samuelson alluded to theoretical physics in his book (for instance page 8: “. . . a good deal of theoretical physics consists of the assumption of second order differential equations sufficient in number to determine the evolution through time of all the variables subject to given initial conditions of position and velocity. Similarly, in the field of economics dynamic systems involving the relationship between variables at different points of time (e.g., time derivatives, weighted integrals, functionals, etc.) have been suggested for the purpose of determining the evolution of a set of economic variables through time.”). Gillies has a strange view of physics and economics: “The physical world appears on the surface to be qualitative, and yet underneath it obeys precise quantitative laws. That is why mathematics works in physics. Conversely economics appears to be mathematical on the surface, but underneath it is really qualitative.
This is why attempts to create a successful mathematical economics have failed.” I must say that I do not understand what Gillies says about “quantitative” and “qualitative” when mathematics seems to be excluded from anything “qualitative.” It can be further remarked that some of the developments of economic theory I described in this chapter can be considered as qualitative. According to Gillies, the alternative to “mathematical economics” for economics is to model itself on branches of science which “have had striking discoveries and great achievements without the use of mathematics” (page 196). One particular branch is mentioned: medicine (with the strange specification “medicine between 1860 and 1945”). Incidentally, models from medicine have inspired recent works in economics (although I am not sure that it is medicine between 1860 and 1945). I am thinking of Abhijit Banerjee, Esther Duflo, and Michael Kremer – the Nobel jury for economics has recognized the merits of their “medicine” approach. For most heterodox economists, there is a formal reasoning of the following type: (1) neo-liberalism is harmful; (2) since neo-liberalism is justified by neoclassical economics, neoclassical economics is also harmful; and (3) since neoclassical economics makes heavy use of mathematics, mathematics is not neutral and must be banished. Of course, this is not only bad reasoning, but my presentation of it is a provocative short cut. However, in Fullbrook (2004), in Chapter 20, Steve Keen (2004), regarding neoclassical economists, writes: “they tried to use mathematics to reach personal conclusions. . . they used the wrong types of mathematics. . . they failed to realise the inherent limitations of mathematics. . . they made outright

mathematical errors which persist in economic analysis and tuition to this day” (page 210). The mentioned mathematical errors pertain to the microeconomic theory of the firm. I must confess that I had never met this error before I read Keen’s chapter. Incidentally, rectifying mathematical errors by using non-rigorous mathematics is, in my view, strange. The errors in question belong to the pre-Newtonian times mentioned by von Neumann. Regarding the limitations, Keen considers the completeness assumption of consumer theory. The individual is supposed to have complete preferences, that is, to be able to compare any two vectors of quantities of goods. In Keen’s figures, these quantities are obviously real numbers (nonnegative). In his further comments, he uses discrete versions with natural numbers (indivisible goods). He rightly observes that it is really impossible to compare any two vectors since the number of these vectors is too large (there is no discussion of the notion of the infinite!). Strangely, there is nothing against the use of real numbers (in this case, the impressive numbers he mentions are nothing compared with the cardinality of uncountable sets!). Obviously, Keen’s knowledge is itself limited. He is apparently unaware that several attempts have been made (with only partial success) to deal with incomplete preferences. Concerning the wrong types of mathematics, Keen opposes statics to dynamics. Again, one has the impression that Keen’s knowledge of economics is limited to economics up to 1930, and he seems to be unaware of many developments in growth theory and/or the use of dynamical system theory in economics. The preconceived result, “the personal conclusions,” is about Walras and his economic system and the creation of a fictitious auctioneer whose actions would lead to equilibrium. But what is crucial in Walras’ system is the existence of equilibria, not this ad hoc way to reach them.
To comment on the stability of equilibrium, Keen considers a model proposed in 1960 by Dale Jorgenson, ignoring the result on aggregate demand of Sonnenschein, Mantel, and Debreu. Keen’s chapter is representative of the mainstream heterodox vision. Others, like Tony Lawson, try to establish their heterodoxy on philosophical foundations (Chapter 2 in Fullbrook 2004).5 A book of conversations with René Thom is entitled “Prédire n’est pas expliquer” (1991).6 I believe that the main developments of general equilibrium theory and welfare economics are abstractions from reality that try to make the functioning of complex systems understandable. Social choice theory comprises many fundamental results which are impossibility results. In these cases, the question is: what can be done to avoid these impossibilities? Answers to this question have been provided and are still being provided. I know that we expect economists to make policy recommendations. I am not sure that heterodox economists can make better recommendations than those who do not consider themselves heterodox. But I am sure that, until now, heterodox economists have not been able to explain the functioning of an economy.

5 Although Lawson seems to know some mathematics, his definition (page 22) of a mathematical function is rather shaky.
6 To predict is not to explain.

Conclusion: The Indispensability of Mathematics

Adam Smith’s intuition regarding individual interests leading to the collective good has been largely accepted by classical economists as something which happens in the real world. The first rigorous proof of this intuition appeared with the development of general equilibrium theory in papers by Arrow (1951b) and Debreu (1951) showing that equilibrium allocations were Pareto-optimal. Incidentally, as I already mentioned, considering that Pareto-optimality is the notion which characterizes the collective welfare is extremely controversial. It is true that many mainstream (neoclassical?) economists, following economists who, like Lionel Robbins in the 1930s, believed that, within economics, value judgments had to be excluded, think that the only admissible welfare comparisons are comparisons based on unanimity: allocation $x$ is socially better than allocation $y$ if everyone prefers $x$ to $y$. The general equilibrium analysis clearly indicates that Pareto-optimal equilibrium allocations may prevail while being very unequal. Without recourse to mathematics, these results, showing that Adam Smith’s intuition was not pure speculation, would not have been obtained. Of course, to obtain this result, one must make very strong assumptions, and it is easy to declare that the real world is not like this. Incidentally, when we have a formal description with assumptions clearly defined, it is possible to question the assumptions and to remove them if at all possible, to weaken them, or to replace them by apparently less stringent assumptions. Mathematically inclined economists know that they are using properties in their abstract theories which are not “realistic,” whatever the meaning of this word. But researchers are always trying to create models of reality whose assumptions are more acceptable. Unfortunately, this often implies the use of still more advanced mathematics.
In social choice theory, the proof of Arrow’s theorem does not necessitate the use of advanced mathematics. However, it might be formally considered as a kind of extension of the Condorcet paradox, a paradox which can be explained to school children. But going from the Condorcet paradox to Arrow’s theorem, one needs some mathematical sophistication. Furthermore, Arrow’s theorem, Saari’s treatment of scoring rules, computational social choice, etc. have direct implications in real life. There is a famous paper by Eugene Wigner entitled The Unreasonable Effectiveness of Mathematics in the Natural Sciences (1960). I strongly believe that this too could apply to the social sciences. In the philosophy of mathematics, there is a much discussed thesis concerning the “indispensability of mathematics” (Colyvan 2001). Two great philosophers are associated with this thesis: Quine, over the years and in particular in Theories and Things (1981), and Hilary Putnam (1979). In Philosophy of Logic (1979), Putnam writes: “So far I have been developing an argument for realism along roughly the following lines: quantification over mathematical entities is indispensable for science, both formal and physical; therefore we should accept this quantification; but this commits us to accepting the existence of the mathematical entities in question” (page 347). The problem we have to face

concerns the notion of science which is put forward. Quine, in Chapter 18 of Theories and Things, entitled Success and Limits of Mathematization, writes (this quote is also in Colyvan 2001): “Ordinary interpreted scientific discourse is as irredeemably committed to abstract objects – to nations, species, numbers, functions, sets – as it is to apples and other bodies. All these things figure as values of the variables in our overall system of the world. The numbers and functions contribute just as genuinely to physical theory as do hypothetical particles” (pages 149–50). Although both Putnam and Quine most probably had the so-called natural sciences in mind, this is not so clear regarding Quine when he mentions “nations” in the above quote. In his Chapter 18, Quine even alludes to anthropology and kinship structures. Putnam and Quine’s reasoning can be given a formal expression, as in Øystein Linnebo (2017), where the conclusion is stated as: “We have reasons to believe that there are abstract mathematical objects” (page 97). However, these views are not unanimously accepted in the philosophy of mathematics. Some philosophers do not believe in the notion of abstract mathematical objects; more exactly, for them, there are no abstract objects (at all). This is the case of Hartry Field, who proposed a defense of nominalism (1980, 2017), and Mary Leng (2010). But this does not lead these philosophers to reject the use of mathematics in the natural sciences. Regarding social choice theory (I think that it can apply equally to economics in general), another major philosopher, Michael Dummett, who made fundamental contributions to social choice and voting theory, writes in the Epilogue of his book Voting Procedures (1984): “. . .
it has been the contention of this book that the subject requires a systematic approach just because what seems at first sight obvious is often false; and some of the problems that arise in the theory of voting are far from easy to solve.” As with Quine’s mention of “nations,” Dummett was obviously convinced that mathematics (or, in any case, any formal sort of reasoning such as formal logic, if it is not considered as belonging to mathematics) is indispensable in social science. Dummett mentions that the analysis of voting procedures does not involve deep concepts such as those of infinity or continuity. This is only partially true. For instance, infinity is indirectly considered in Arrow’s theorem since the finiteness of the set of individuals is crucial. The theorem is known to be false if this set is infinite. Continuity may be required in some cases for the set of options and in Saari’s geometrization of the scoring voting methods. As noted by Saari (2018), it happens that economists attempt to squeeze their research topic into available mathematics: “of more value would have been to modify the mathematics so that it encompasses accepted models” (page 160). Dummett (1984) justly remarks: “But to reject in principle a systematic manner of thinking about a subject which of its nature manifestly allows of a systematic approach is to repudiate rationality itself: it is the equivalent of a remark recently made by an MP, ‘logic has nothing to do with real life’” (page 298). Replacing “logic” by “mathematics” unfortunately yields a statement that many people claiming to be social scientists are wont to utter, even in academic circles.

References

Aristotle (1984) The complete works of Aristotle. The revised Oxford translation edited by Jonathan Barnes. Princeton University Press, Princeton
Arrow KJ (1948) The possibility of a universal social welfare function. Document P-41, 26 September 1948, RAND Corporation
Arrow KJ (1950) A difficulty in the concept of social welfare. J Polit Econ 58:328–346
Arrow KJ (1951a) Social choice and individual values. Wiley, New York
Arrow KJ (1951b) An extension of the basic theorems of classical welfare economics. In: Neyman J (ed) Proceedings of the second Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 507–532
Arrow KJ (1963) Social choice and individual values, 2nd edn. Wiley, New York
Arrow KJ, Debreu G (1954) Existence of an equilibrium for a competitive economy. Econometrica 22:265–290
Aumann RJ (1964) Markets with a continuum of traders. Econometrica 32:39–50
Aumann RJ (1966) Existence of competitive equilibria in markets with a continuum of traders. Econometrica 34:1–17
Aumann RJ, Maschler M (1995) Repeated games with incomplete information. MIT Press, Cambridge, MA
Backhouse R, Cherrier B (eds) (2017) The age of the applied economist: the transformation of economics since the 1970s. Annual supplement to vol 49, History of Political Economy. Duke University Press, Durham
Balasko Y (1988) Foundations of the theory of general equilibrium. Academic Press, Orlando
Balasko Y (2009) The equilibrium manifold. MIT Press, Cambridge, MA
Balasko Y (2011) General equilibrium theory of value. Princeton University Press, Princeton
Barrett R, Salles M (2011) Social choice with fuzzy preferences. In: Arrow KJ, Sen AK, Suzumura K (eds) Handbook of social choice and welfare, vol II. North-Holland, Amsterdam, pp 367–389
Black D (1958) The theory of committees and elections. Cambridge University Press, Cambridge
Brandt F, Conitzer V, Endriss U, Lang J, Procaccia A (eds) (2016) Handbook of computational social choice. Cambridge University Press, Cambridge
Brown DJ, Robinson A (1975) Nonstandard exchange economies. Econometrica 43:41–55
Chichilnisky G (1983) Social choice and game theory: recent results with a topological approach. In: Pattanaik PK, Salles M (eds) Social choice and welfare. North-Holland, Amsterdam, pp 79–102
Colyvan M (2001) The indispensability of mathematics. Oxford University Press, Oxford
Condorcet M-J-A-N Caritat Marquis de (1785) Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Imprimerie Royale, Paris
Condorcet M-J-A-N Caritat Marquis de (1794–1795/An III de la République) Esquisse d’un tableau historique des progrès de l’esprit humain. Agasse, Paris
Condorcet M-J-A-N Caritat Marquis de (1796) Outlines of an historical view of the progress of the human mind. M. Carey, Philadelphia
Debreu G (1951) The coefficient of resource utilization. Econometrica 19:273–292
Debreu G (1959) Theory of value. Wiley, New York
Debreu G (1970) Economies with a finite set of equilibria. Econometrica 38:387–392
Debreu G, Scarf H (1963) A limit theorem on the core of an economy. Int Econ Rev 4:235–246
de Swart H (2018) Philosophical and mathematical logic. Springer, Cham
Dummett M (1984) Voting procedures. Oxford University Press, Oxford
Düppe T, Weintraub R (2014) Finding equilibrium. Arrow, Debreu, McKenzie and the problem of scientific credit. Princeton University Press, Princeton
Edgeworth FY (1881) Mathematical psychics. An essay on the application of mathematics to the moral sciences. C. Kegan Paul & Co, London
Felsenthal D, Machover M (1998) The measurement of voting power. Theory and practice, problems and paradoxes. Edward Elgar, Cheltenham

Field H (1980, 2017) Science without numbers. A defense of nominalism. Oxford University Press, Oxford
Fullbrook E (ed) (2004) A guide to what’s wrong with economics. Anthem Press, London
Gallagher R (2018) Aristotle’s critique of political economy, with a contemporary application. Routledge, Abingdon
Gibbard A (1973) Manipulation of voting schemes: a general result. Econometrica 41:587–601
Gillies DA (2004) Can mathematics be used successfully in economics? In: Fullbrook E (ed) (2004), pp 187–197
Gillies DB (1959) Solutions to non-zero sum games. In: Tucker AW, Luce RD (eds) Contributions to the theory of games, vol IV. Princeton University Press, Princeton
Gunning R (2018) An introduction to analysis. Princeton University Press, Princeton
Hansson SO, Hendricks V (2018) Introduction to formal philosophy. Springer, Cham
Helpman E, Krugman P (1985) Market structure and foreign trade. Increasing returns, imperfect competition, and the international economy. MIT Press, Cambridge, MA
Hicks JR (1946) Value and capital. An inquiry into some fundamental principles of economic theory, 2nd edn. Clarendon Press, Oxford
Hildenbrand W (1974) Core and equilibria of a large economy. Princeton University Press, Princeton
Holliday W, Pacuit E (2020) Arrow’s decisive coalitions. Soc Choice Welfare 54:463–505
Keen S (2004) Improbable, incorrect or impossible: the persuasive but flawed mathematics of microeconomics. In: Fullbrook E (ed) (2004), pp 209–222
Leng M (2010) Mathematics and reality. Oxford University Press, Oxford
Linnebo Ø (2017) Philosophy of mathematics. Princeton University Press, Princeton
Marshall A (1920) Principles of economics, 8th edn. Macmillan, London
Maschler M, Solan E, Zamir S (2013) Game theory. Cambridge University Press, Cambridge
Mas-Colell A (1985) The theory of general economic equilibrium. A differentiable approach. Cambridge University Press, Cambridge
Mas-Colell A, Whinston MD, Green JR (1995) Microeconomic theory. Oxford University Press, New York
McKenzie L (1959) On the existence of general equilibrium for a competitive market. Econometrica 27:54–71
Mertens J-F, Sorin S, Zamir S (2015) Repeated games. Cambridge University Press, Cambridge
Milnor J (1954) Games against nature. In: Thrall RM, Coombs CH, Davis RL (eds) Decision processes. Wiley, New York
Misak C (2020) Frank Ramsey. A sheer excess of powers. Oxford University Press, Oxford
Mordeson JN, Malik DS, Clark TD (2015) Application of fuzzy logic to social choice theory. CRC Press, Boca Raton
Morgenstern O (1976) The collaboration between Oskar Morgenstern and John von Neumann on the theory of games. J Econ Lit 14:805–816
Murakami Y (1968) Logic and social choice. Routledge & Kegan Paul, London
Nash J (1950a) Equilibrium points in n-person games. Proc Natl Acad Sci 36:48–49
Nash J (1950b) The bargaining problem. Econometrica 18:155–162
Pareto V (1909) Manuel d’économie politique. Giard et Brière, Paris
Pattanaik PK (1971) Voting and collective choice. Cambridge University Press, Cambridge
Pattanaik PK (1973) On the stability of sincere voting situations. J Econ Theory 6:558–574
Pattanaik PK (1978) Strategy and group choice. North-Holland, Amsterdam
Peleg B, Peters H (2010) Strategic social choice. Springer, Heidelberg
Pliny the Younger (1969) Letters, Book VIII–X, Panegyricus, trans. by B. Radice. Harvard University Press (Loeb Classical Library), Cambridge, MA
Putnam H (1979) Mathematics, matter and method. Philosophical papers, vol I. Cambridge University Press, Cambridge
Quine WVO (1981) Theories and things. Harvard University Press, Cambridge, MA
Saari DG (2018) Mathematics motivated by the social and behavioral sciences. SIAM, Philadelphia

Salles M (2016) Social choice. In: Faccarello G, Kurz HD (eds) Handbook on the history of economic analysis, vol III–Developments in major fields of economics. Edward Elgar, Cheltenham, pp 518–537
Samuelson PA (1947) Foundations of economic analysis. Harvard University Press, Cambridge, MA
Satterthwaite MA (1975) Strategy-proofness and Arrow’s conditions: existence and correspondence theorems for voting procedures and social welfare functions. J Econ Theory 10:187–217
Schofield N (2008) The spatial model of politics. Routledge, Abingdon
Schofield N (2014) Mathematical methods in economics and social choice. Springer, Heidelberg
Schumpeter JA (1954) History of economic analysis. George Allen & Unwin, London
Sen AK (1970) Collective choice and social welfare. Holden-Day, San Francisco
Sen AK (1977) Social choice theory: a re-examination. Econometrica 45:53–89. Also in Sen (1982)
Sen AK (1982) Choice, welfare and measurement. Basil Blackwell, Oxford
Sen AK (2017) Collective choice and social welfare: an expanded edition. Harvard University Press, Cambridge, MA
Shapley L, Shubik M (1954) A method for evaluating the distribution of power in a committee system. Am Polit Sci Rev 48:787–792
Shitovitz B (1973) Oligopoly in markets with a continuum of traders. Econometrica 41:467–501
Shubik M (1970) A curmudgeon’s guide to microeconomics. J Econ Lit 8:405–434
Simon CP, Blume L (1994) Mathematics for economists. Norton, New York
Skidelsky R (2020) What’s wrong with economics? A primer for the perplexed. Yale University Press, New Haven
Smith A (1789, 1994) An inquiry into the nature and the causes of the wealth of nations, edited by E. Cannan. The Modern Library, New York
Soifer A (2009) The mathematical coloring book. Springer, New York
Tarski A (1994) Introduction to logic and the methodology of deductive sciences. Oxford University Press, Oxford
Taylor A (2005) Social choice and the mathematics of manipulation. Cambridge University Press, Cambridge
Thom R (1991) Prédire n’est pas expliquer. Eshel, Paris
Tubaro P (2016) Formalization and mathematical modelling. In: Faccarello G, Kurz HD (eds) Handbook on the history of economic analysis, vol III–Developments in major fields of economics. Edward Elgar, Cheltenham, pp 208–221
von Neumann J, Morgenstern O (1953) Theory of games and economic behavior. Princeton University Press, Princeton
Wald A (1936, 1951) Über einige Gleichungssysteme der mathematischen Ökonomie. Zeitschrift für Nationalökonomie 7:637–670. Translated as ‘On some systems of equations of mathematical economics’. Econometrica 19:368–403
Wigner E (1960) The unreasonable effectiveness of mathematics in the natural sciences. Commun Pure Appl Math 13:1–14

64 Social Algorithms and Optimization

Xin-She Yang

Contents
Introduction
A Brief History
Essence of Algorithms
Optimization Algorithms
Optimization
Search for Optimality
Advantages of Social Algorithms
Social Algorithms
Algorithms as Descriptive Systems
Algorithms as Linear Systems
Firefly Algorithm as a Nonlinear System
Algorithms as Quasi-linear Systems
Algorithm Analysis and Open Problems
Algorithms and Self-Organization
Balance of Exploitation and Exploration
Open Problems
Conclusions
References

Abstract

Social algorithms have become popular and effective for solving problems in optimization and computational intelligence. They are population-based algorithms using multiple, interacting, and coevolving agents. We will review the brief history and introduce some of the commonly used social algorithms. We will also analyze these algorithms and then highlight some open problems so as to inspire further research.

X.-S. Yang
Department of Design Engineering and Maths, School of Science and Technology, Middlesex University, London, UK
e-mail: [email protected]

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_105

Keywords Algorithm · Ant colony optimization · Bat algorithm · Cuckoo search · Firefly algorithm · Particle swarm optimization · Metaheuristic · Nature-inspired computation · Optimization · Self-organization · Social algorithm · Swarm intelligence

Introduction

An algorithm is an iterative, step-by-step procedure for computation or a set of rules to be followed by a computer, and algorithms are widely used to solve problems in science and engineering. One of the oldest algorithms is probably the Euclidean algorithm for finding the greatest common divisor (gcd) of two integers such as 45678 and 123 (the gcd is 3 in this case). The core idea of this algorithm was first outlined in detail in Euclid’s Elements about 2300 years ago (Berlinski 2001; Chabert 1999). As another example, Newton’s method for finding roots of a nonlinear function was described by Isaac Newton in 1669, and it forms the basis for many numerical methods in the current literature. Modern scientific computing consists of a wide range of different algorithms, from the simple Euler scheme to more advanced finite volume methods and from the basic fast Fourier transform (FFT) to more sophisticated techniques for image processing and machine learning.

Optimization concerns many applications, and the main idea is to find the maximum or minimum of a cost function, also called the objective function, which usually depends on a set of decision variables. For example, finding the minimum of the univariate function $f(x) = x^2$ is an optimization problem. Over the whole domain of real numbers $x \in \mathbb{R}$, the optimal solution to this problem is $x_* = 0$ with $f_{\min} = 0$. In this simple case, we can find the solution by inspection because $x^2$ is nonnegative for any real number and its value increases when $|x|$ increases; thus the minimum must be zero at $x = 0$.

In general, optimization problems typically have many constraints, which pose limits on the values of the independent variables and thus modify the domain and the shapes formed by the independent variables. Such limits and constraints tend to make it harder to find solutions to the optimization problems, and often specialized optimization algorithms have to be used to solve a given class of optimization problems. Such techniques and algorithms range from the Newton-Raphson method to the simplex method for linear programming (LP) and from the trust-region method to the interior point method (Boyd and Vandenberge 2004; Yang 2019).

Loosely speaking, optimization is everywhere, from engineering design to business planning and from vehicle routing to artificial intelligence. After all, time, money, and resources are limited, and we have to use resources in an optimal, sustainable way. Engineers and designers have to design products so as to maximize efficiency, performance, and sustainability and, at the same time, to minimize

64 Social Algorithms and Optimization


the costs, energy consumption, and environmental impact. Thus, optimization and optimization algorithms are of both academic interest and industrial importance.

Many problems in other areas of science and engineering can be formulated as optimization problems. For example, classification and clustering in data mining are often formulated as optimization problems, while one of the main aims of machine learning is to minimize its training or fitting errors and maximize its prediction accuracy. Once the problems in data mining and machine learning are properly formulated as optimization problems (Yang 2019), they can in principle be solved by appropriate optimization algorithms.

Realizing that almost everything is optimization does not, however, make it easier to find good solutions. In fact, many problems such as the well-known traveling salesman problem (TSP) and the clustering problem are really hard to solve. For example, the TSP is nondeterministic polynomial-time (NP) hard, which in practice means that the time complexity for solving such problems usually increases exponentially with the problem size. In the general case of $n$ cities, the aim is to visit each city exactly once so as to minimize the overall distance traveled, and the number of possible combinations is of the order of $n!$. For $n = 100$ cities, this means $100! = 100 \times 99 \times 98 \times \cdots \times 3 \times 2 \times 1 \approx 9.3 \times 10^{157}$ possible combinations, which is far more than the number of particles in the whole known universe. Thus, exact algorithms that attempt to try every possible combination are impractical.

There is a wide range of algorithms for solving optimization problems. Traditional techniques tend to be local search methods, and they can be very effective. However, they are usually not effective for global optimization, except for special cases such as linear programming and convex optimization. New algorithms tend to be heuristic and metaheuristic, often based on swarm intelligence. Such metaheuristic algorithms are inspired by successful characteristics of biological systems in nature. The contemporary trend is to use a hybrid approach that combines traditional algorithms with new nature-inspired algorithms such as genetic algorithms, cuckoo search, the firefly algorithm, and particle swarm optimization.

A special class of swarm-intelligence-based metaheuristic algorithms can be classified as “social algorithms,” which is the main focus of this work. Here, social algorithms, like swarm intelligence in general, use a system of multiple agents with “social” interactions, following some local rules. Such algorithms can mimic certain successful features and characteristics of social or biological systems, including social ants, bees, birds, and animals. Under certain conditions, local rules and interactions in a multi-agent system can somehow lead to some characteristics of self-organization, leading to a certain level of collective intelligence (Ashby 1962).

From the terminological point of view, it is worth pointing out that the social algorithms in the present context do not include the algorithms for social media, even though algorithms for social media analysis are sometimes simply referred to as “social algorithms” (Lazer 2015). Therefore, the emphasis here will be mainly on the population-based algorithms that use multiple agents, based on the behavior of social swarm intelligence, and such algorithms are used mainly for the purpose of solving optimization problems.
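The factorial growth quoted above for the traveling salesman problem can be checked directly. The following is a minimal sketch, not part of the original chapter: the function name `brute_force_tsp` and the four-city test instance are illustrative assumptions. It enumerates every tour for a tiny instance and then prints the size of $100!$.

```python
import math
from itertools import permutations


def brute_force_tsp(points):
    """Return (length, tour) of the shortest closed tour by trying every order.

    City 0 is fixed as the start, so (n - 1)! tours are examined.
    Hypothetical helper for illustration; only usable for very small n.
    """
    n = len(points)
    best_length, best_tour = math.inf, None
    for order in permutations(range(1, n)):
        tour = (0,) + order
        # Sum the distances between consecutive cities, returning to the start.
        length = sum(
            math.dist(points[tour[i]], points[tour[(i + 1) % n]])
            for i in range(n)
        )
        if length < best_length:
            best_length, best_tour = length, tour
    return best_length, best_tour


# Four cities on the corners of a unit square: only 3! = 6 tours to check.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(brute_force_tsp(square))

# The same exhaustive approach for 100 cities would face about 100! orderings.
print(f"100! is about {float(math.factorial(100)):.2e}")
```

Even at a billion tours per second, enumerating a number of tours of this size is out of reach, which is why exhaustive search is only a baseline for toy instances.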


A Brief History

Loosely speaking, social algorithms belong to a wider class of swarm-intelligence-based algorithms, which in turn belong to metaheuristic algorithms. Here, “meta” means “beyond” and “higher-level,” while “heuristic” means “by trial and error.” In essence, metaheuristic algorithms tend to use some high-level structure and mechanisms such as memory- and information-guided moves to attempt to find solutions to an optimization problem iteratively by trial and error so that better solutions are more likely to pass onto new generations. With multiple trials iteratively, it is hoped that the best (ideally the global optimal) solutions can be found in the long run.

In the modern context of metaheuristics, it is believed that Alan Turing, the pioneer of artificial intelligence, was the first to use heuristics, both in his Enigma-decoding work during the Second World War and in his inspiring work on connectionism (the essence of neural networks) as outlined in his National Physical Laboratory report Intelligent Machinery (Turing 1948).

Almost all traditional optimization algorithms are deterministic in the sense that the same solutions can be obtained by starting with the same initial point. The procedure is fixed and deterministic, and there is no randomness in the algorithms. Though such purely deterministic algorithms can be very efficient, they can have some serious disadvantages such as the potential of being trapped in local optima and their lack of exploration ability. There is no reason that we should stick with the deterministic nature of traditional algorithms. In fact, there are some distinct advantages in using nondeterministic, stochastic components in algorithms, which can potentially increase their exploration abilities. Though hill-climbing with random restart can be considered as an algorithm with randomness, the randomness is purely in terms of initialization.

Truly nondeterministic algorithms began with the development of both the evolutionary strategy (ES) and the genetic algorithm (GA) in the 1960s. Both algorithms attempted to simulate the key features of Darwinian evolution of biological systems. The GA was developed by John Holland in the 1960s (Holland 1975), and some basic genetic operators such as crossover, mutation, and selection were used. At about the same period, the ES was developed by Ingo Rechenberg and H. P. Schwefel for constructing an automatic experimenter using mutation and selection; however, crossover was not used in ES.

Crossover is a fundamental operator to generate two new (child) solutions from parent solutions by swapping parts of one solution with the corresponding parts of the other solution, typically in terms of representations as binary strings or chromosomes. Mutation is an operation to generate a new solution from an old solution by mutating one part (or multiple parts) of a solution, often by flipping one bit or multiple bits from 1 to 0 or from 0 to 1. Selection is a driving force that chooses the solutions with better fitness so that they pass onto the next generation. Here, the fitness can be the values of the objective function in the case of maximization problems or can be redefined in any proper form. For example, for minimization problems, we can choose lower values of objectives as better fitness.
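The three genetic operators described above can be sketched on bit strings as follows. This is an illustrative sketch under stated assumptions, not Holland’s original formulation: one-point crossover, independent bit-flip mutation, truncation selection, and the toy “count the 1 bits” fitness are all choices made here for brevity, and every function name is hypothetical.

```python
import random


def crossover(parent1, parent2, rng):
    """One-point crossover: swap the tails of two equal-length bit lists."""
    point = rng.randrange(1, len(parent1))
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def select(population, fitness, k):
    """Keep the k fittest solutions (simple truncation selection)."""
    return sorted(population, key=fitness, reverse=True)[:k]


def run_ga(n_bits=20, pop_size=30, generations=40, rate=0.02, seed=1):
    """Maximize the number of 1 bits; returns (best, best fitness per generation)."""
    rng = random.Random(seed)
    fitness = sum  # toy maximization objective (assumption for illustration)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]
    for _ in range(generations):
        parents = select(pop, fitness, pop_size // 2)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            children.extend(mutate(c, rate, rng) for c in crossover(a, b, rng))
        # Parents survive unchanged, so the best fitness never decreases.
        pop = parents + children[: pop_size - len(parents)]
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history


best, history = run_ga()
print(sum(best), history[0], history[-1])
```

For a minimization problem, the same code applies once the fitness is redefined, for example as the negative of the objective value.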


Another important development in the 1960s was evolutionary programming (EP), introduced in about 1966 by L. J. Fogel and colleagues (Fogel et al. 1966). EP attempted to simulate evolution as a learning tool to study artificial intelligence. All these algorithms and developments in the 1960s formed the foundation of evolutionary algorithms or, more generally, evolutionary computation.

A different approach, called simulated annealing (SA), was developed in 1983 by Kirkpatrick et al. (1983), and SA was inspired by the annealing characteristics of metals. Another approach, called tabu search, was developed by Fred Glover in 1986, which uses a tabu list to record the recent search history so as to improve the search performance. It is believed that it was Fred Glover who coined the word “metaheuristic” in his 1986 paper.

In the context of social algorithms and swarm intelligence, the main developments started in the 1990s. In 1992, Marco Dorigo first developed ant colony optimization (ACO) in his PhD work (Dorigo 1992), where pheromone and local rules for social interactions were used. Then, in 1995, particle swarm optimization (PSO) was developed by James Kennedy and Russell C. Eberhart, based on the swarming characteristics of birds and fish (Kennedy and Eberhart 1995). Around 1997, a different algorithm, called differential evolution (DE), was developed for solving optimization problems (Storn and Price 1997). Though DE is a very important algorithm for evolutionary computation and optimization, it is not considered a social algorithm.

A key question that users often ask is which algorithm is the best. In other words, is there a so-called best algorithm for solving optimization problems? This dream was dashed by the so-called no-free-lunch (NFL) theorem that was proved in 1997 by Wolpert and Macready (1997). In essence, this theorem means that there is no best algorithm for solving all possible problems, which has had a profound impact on the optimization and machine learning communities. Loosely speaking, if the performance is averaged over all possible problems, the average performance of any algorithm is the same as that of a random search. The keywords here are “average” and “all.”

In practice, however, our concern is usually not the performance averaged over all possible problems. We are more concerned with finding solutions to a specific set of problems in a specialized subject area, and there is no need to solve all possible problems. In this case, it becomes a zero-sum ranking problem. In other words, for a given set of problems, if we use a finite set of algorithms, it is possible to rank the specific performance for a particular problem (Joyce and Herrmann 2018). In this sense, the best algorithm does exist for a given problem or a particular type of problem. This conclusion is consistent with our empirical observations and experience that some algorithms, especially those using problem-specific knowledge, can perform better than others. For example, for linear programming, the simplex method usually works very well. Therefore, our aim shifts from finding the best algorithm for all problems to identifying the best algorithm for a given type of problem. Consequently, active research continues, just with a different emphasis and from different perspectives.

Another active period for the development of social algorithms was around the turn of this century. First in 2004, a honeybee algorithm was developed by Sunil Nakrani and


Craig Tovey (2004), which was used to optimize Internet hosting centers. In 2005, the bees algorithm was developed by Pham et al. (2005), and the virtual bee algorithm was developed by Xin-She Yang in the same year (Yang 2005). In addition, at about the same time, the artificial bee colony (ABC) algorithm was developed by Karaboga (2005). These bee-based algorithms form the class of bees-inspired algorithms, though they use some (but different) aspects of the foraging behavior of social bees.

Further developments of social algorithms include the firefly algorithm (FA) in 2008 and cuckoo search (CS) in 2009. FA was developed by Xin-She Yang, based on the flashing behavior of tropical firefly species (Yang 2008). The attraction mechanism between different fireflies or agents, together with the nonlinear variation of light intensity, leads to a rich feature: a single swarm of fireflies can automatically subdivide into multiple sub-swarms. Thus, FA can effectively deal with optimization problems with multimodal objective landscapes. On the other hand, cuckoo search (CS) was developed by Xin-She Yang and Suash Deb, based on the coevolutionary behavior of cuckoo-host species (Yang and Deb 2009). The CS algorithm can partly simulate the complex social interactions of cuckoo-host species coevolution. Furthermore, a bat-inspired algorithm was developed by Xin-She Yang, inspired by the echolocation characteristics of microbats (Yang 2010b). This algorithm uses frequency-tuning and variations of loudness and pulse emission rates as key components to mimic the foraging behavior of microbats.

Loosely speaking, all the above algorithms can be considered as social algorithms because they use some sort of “social” interactions and biologically inspired rules. The above algorithms and studies have inspired even more active research in this area, and there are more algorithms emerging each year. However, not all new algorithms can be considered as social algorithms. To name a few nonsocial algorithms, we have music-inspired harmony search (Geem et al. 2001), the gravity-inspired gravitational search algorithm (GSA) (Rashedi et al. 2009), and others. Another example is the flower pollination algorithm (FPA), an algorithm inspired by the pollination features of flowering plants (Yang 2012), which can be effective for solving optimization problems (Rodrigues et al. 2016; Yang et al. 2014).

These algorithms have been applied in different applications with promising results, and interested readers can refer to more recent and advanced literature (Del Ser et al. 2019; Palmieri et al. 2019; Yang and Papa 2016; Yang et al. 2015, 2018a). As applications are not our focus here, we will now introduce optimization and some social algorithms in greater technical detail.

Essence of Algorithms

An algorithm is a step-by-step procedure for computation, usually in terms of an iterative equation or more often a set of iterative equations. For example, to find the square root of $a > 0$, we can use the following iterative equation

$$x_{k+1} = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right), \tag{1}$$

where $k$ is the iteration counter ($k = 0, 1, 2, \ldots$). The new estimate $x_{k+1}$ is calculated by using the value $x_k$ at iteration $k$, and the whole iteration procedure starts with an initial guess $x_0$, which can be chosen randomly. Without much knowledge, we can start with $x_0 = 1$ in most cases. For example, for $a = 16$, we know its exact square roots are $+4$ and $-4$. If we start with $x_0 = 1$, we have

$$x_1 = \frac{1}{2}\left(1 + \frac{16}{1}\right) = 8.5, \tag{2}$$

$$x_2 = \frac{1}{2}\left(8.5 + \frac{16}{8.5}\right) = 5.191176471. \tag{3}$$

Similarly, we have

$$x_3 \approx 4.136664723, \quad x_4 \approx 4.002257525, \tag{4}$$

$$x_5 \approx 4.000000637, \tag{5}$$

which is very close to the true value of $\sqrt{16} = +4$. This iterative formula is very effective, and the accuracy after only five iterations is up to six decimal places. As the iterations continue, the above formula will approach a fixed point $x_* \to +4$. It is straightforward to check that this formula leads to the correct root for other initial guesses such as $x_0 = 10$ or $x_0 = 20$, though more iterations will be needed for $x_0 = 100$ because that initial guess is farther away from the true root.

Now careful readers may wonder how to get the other root, $-4$. If we choose the initial guess to be $x_0 = -1$, we have

$$x_1 = \frac{1}{2}\left(-1 + \frac{16}{-1}\right) = -8.5, \quad x_2 = \frac{1}{2}\left(-8.5 + \frac{16}{-8.5}\right) = -5.191176471, \tag{6}$$

$$x_3 \approx -4.136664723, \quad x_4 \approx -4.002257525, \tag{7}$$

$$x_5 \approx -4.000000637, \tag{8}$$

which approaches the other root quickly. Similarly, if we start from $x_0 = -10$ or $x_0 = -5$, we always get $-4$, not $+4$. The above shows that different initial values can lead to different final solutions. This highlights a key issue. The final solution will somehow depend on the initial


starting point, though in most cases we do not have any prior knowledge of where to start. In addition, if the iteration starts with $x_0 = 0$, it will lead to division by zero, and thus the formula will not work in this case. This again shows that the right initial point is very important.

Another issue is how to design a proper iteration formula such as (1). In general, there is no easy way to design good iteration formulas. In the present case, let us start with Newton’s method. Newton’s method for finding the roots of a polynomial $p(x) = 0$ is a fundamental iteration scheme, which can be written as

$$x_{k+1} = x_k - \frac{p(x_k)}{p'(x_k)}, \tag{9}$$

where $x_k$ is the approximation at iteration $k$, and $p'(x)$ is the first derivative of $p(x)$. This procedure typically starts with an initial guess $x_0$ at $k = 0$. In most cases, as long as $p' \neq 0$ and $x_0$ is not too far away from the target solution, this algorithm can work very well. As we do not know the target solution $x_* = \lim_{k \to \infty} x_k$ in advance, the initial guess can be an educated guess or a purely random guess. However, if the initial guess is too far away, the algorithm may never reach the final solution or may simply fail. In addition, it is obvious that we need $p'(x_0) \neq 0$; otherwise, the iteration will not work, due to division by zero. In our earlier example of finding the square roots of $a > 0$, we have

$$p(x) = x^2 - a, \tag{10}$$

which gives $p'(x) = 2x$. Thus, Newton’s iteration formula becomes

$$x_{k+1} = x_k - \frac{p(x_k)}{p'(x_k)} = x_k - \frac{x_k^2 - a}{2 x_k}, \tag{11}$$

which can be rewritten as

$$x_{k+1} = x_k - \frac{x_k}{2} + \frac{a}{2 x_k} = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right), \tag{12}$$

which is the same as (1). This means that the foundation of (1) is essentially Newton’s method.

This method can be modified to solve optimization problems. For example, for a single objective function $f(x)$, the minimal and maximal values should occur at stationary points $f'(x) = 0$, which becomes a root-finding problem for $f'(x)$. Thus, the maximum or minimum of $f(x)$ can be found by modifying Newton’s method as the following iterative formula:

$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}. \tag{13}$$
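The square-root iteration (1), which the derivation above identifies as Newton’s method applied to $p(x) = x^2 - a$, can be sketched in a few lines. The function name `sqrt_iteration` is a hypothetical choice for illustration. The sketch reproduces the dependence on the initial guess discussed earlier, including the failure at $x_0 = 0$.

```python
def sqrt_iteration(a, x0, n_iter=5):
    """Apply x_{k+1} = (x_k + a / x_k) / 2 and return [x_0, x_1, ..., x_n]."""
    x = x0
    history = [x]
    for _ in range(n_iter):
        x = 0.5 * (x + a / x)  # raises ZeroDivisionError when x == 0
        history.append(x)
    return history


# A positive start converges towards +4, a negative start towards -4.
print(sqrt_iteration(16, 1.0))
print(sqrt_iteration(16, -1.0))

# Starting exactly at zero fails immediately with a division by zero.
try:
    sqrt_iteration(16, 0.0)
except ZeroDivisionError as err:
    print("x0 = 0 fails:", err)
```

The sign of the limit is fixed by the sign of the initial guess, because each update keeps the sign of $x_k$.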


It is worth pointing out that the iteration counter $t$ (instead of $k$) is commonly used as a pseudo-time in many textbooks. Thus, the above formula can be rewritten as

$$x_{t+1} = x_t - \frac{f'(x_t)}{f''(x_t)}. \tag{14}$$
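As a small illustration of (14), the sketch below applies the Newton update to the derivative of a test function. The test function $f(x) = x^4/4 - x$, the stopping tolerance, and the name `newton_optimize` are assumptions chosen here for illustration. Its only real stationary point is $x = 1$, where $f''(1) = 3 > 0$, so the iteration should settle on a minimum.

```python
def newton_optimize(df, d2f, x0, n_iter=20, tol=1e-12):
    """Iterate x <- x - f'(x) / f''(x) until the step is below `tol`."""
    x = x0
    for _ in range(n_iter):
        step = df(x) / d2f(x)  # fails if the second derivative vanishes
        x -= step
        if abs(step) < tol:
            break
    return x


# f(x) = x**4 / 4 - x, so f'(x) = x**3 - 1 and f''(x) = 3 * x**2.
x_min = newton_optimize(lambda x: x**3 - 1, lambda x: 3 * x**2, x0=2.0)
print(x_min)
```

Because the update only looks for $f'(x) = 0$, it can equally converge to a maximum or another stationary point; the sign of $f''$ at the result has to be checked separately.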

For a $D$-dimensional problem with an objective function $f(\boldsymbol{x})$ of $D$ independent variables $\boldsymbol{x} = (x_1, x_2, \ldots, x_D)$, the above iteration formula can be generalized to a vector form

$$\boldsymbol{x}^{t+1} = \boldsymbol{x}^t - \frac{\nabla f(\boldsymbol{x}^t)}{\nabla^2 f(\boldsymbol{x}^t)}, \tag{15}$$

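where we have used the notation convention $\boldsymbol{x}^t$ to denote the current solution vector at iteration $t$ (not to be confused with an exponent). Here, $\boldsymbol{H} = \nabla^2 f(\boldsymbol{x})$ is the Hessian matrix, which can be expensive to compute for large-scale problems; the quotient in (15) is shorthand for multiplying the gradient by the inverse of the Hessian, $\boldsymbol{H}^{-1} \nabla f(\boldsymbol{x}^t)$. In general, an algorithm $A$ can be written as

$$\boldsymbol{x}^{t+1} = A(\boldsymbol{x}^t, \boldsymbol{x}^*, w_1, \ldots, w_K), \tag{16}$$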

which represents the fact that the new solution vector is a function of the existing solution vector $\boldsymbol{x}^t$, some historical best solution $\boldsymbol{x}^*$ during the iteration history, and a set of algorithm-dependent parameters $w_1, w_2, \ldots, w_K$. The exact function forms will depend on the algorithm, and different algorithms differ only in terms of the function form, the number of parameters, and the ways of using historical data.
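To make the vector form (15) concrete, the following sketch performs one Newton step in two dimensions by solving the linear system $\boldsymbol{H} \boldsymbol{d} = \nabla f$ rather than forming an inverse. The quadratic test function, the restriction to $D = 2$, and the helper names `solve_2x2` and `newton_step` are assumptions made for illustration only.

```python
def solve_2x2(H, g):
    """Solve H d = g for a 2x2 matrix H using Cramer's rule."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("singular Hessian")
    return ((g[0] * d - b * g[1]) / det, (a * g[1] - c * g[0]) / det)


def newton_step(grad, hess, x):
    """One update x <- x - H^{-1} grad f(x), as in the vector Newton formula."""
    d = solve_2x2(hess(x), grad(x))
    return (x[0] - d[0], x[1] - d[1])


# Test function f(x, y) = x^2 + x*y + y^2 - 3x with minimum at (2, -1).
grad = lambda p: (2 * p[0] + p[1] - 3, p[0] + 2 * p[1])
hess = lambda p: ((2, 1), (1, 2))

print(newton_step(grad, hess, (0.0, 0.0)))
```

For a quadratic objective the Hessian is constant and a single step reaches the stationary point from any start. For general functions the step is repeated, and in higher dimensions the linear solve is the costly part mentioned above.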

Optimization Algorithms

To see how optimization algorithms work, especially the role of social algorithms for solving optimization problems, let us first outline the general formulation of an optimization problem as constrained optimization.

Optimization

Whatever an optimization problem may be, as long as we can somehow calculate the objective function, we can always formulate it as a mathematical optimization problem in a $D$-dimensional design space or search space. We have

$$\text{minimize } f(\boldsymbol{x}), \quad \boldsymbol{x} = (x_1, x_2, \ldots, x_D) \in \mathbb{R}^D, \tag{17}$$

subject to $M$ equality constraints

$$h_i(\boldsymbol{x}) = 0, \quad (i = 1, 2, \ldots, M), \tag{18}$$

and $N$ inequality constraints

$$g_j(\boldsymbol{x}) \le 0, \quad (j = 1, 2, \ldots, N), \tag{19}$$

where $h_i$ and $g_j$ are functions of $\boldsymbol{x}$ and are usually nonlinear. It is worth pointing out that optimization can also be expressed as maximization, though many textbooks use minimization as the standard formulation.

All functions $f(\boldsymbol{x})$, $h_i(\boldsymbol{x})$, and $g_j(\boldsymbol{x})$ are called problem functions. In the special case when they are all linear, the optimization problem becomes a linear programming (LP) problem. The simplex method, developed by George Dantzig, can be used to solve LP efficiently. In addition, if the search domain is convex, and $f(\boldsymbol{x})$ is also convex, the problem becomes a convex optimization problem, which can be solved effectively by quadratic programming and other methods (Boyd and Vandenberge 2004). The problem becomes difficult if all the problem functions are nonlinear and non-convex. There is a wide spectrum of optimization algorithms and techniques, including quadratic programming, the interior-point method, the trust-region method, and conjugate-gradient methods (Süli and Mayer 2003; Yang 2010c) as well as evolutionary algorithms (Goldberg 1989) and heuristics (Judea 1984). However, traditional algorithms often struggle to deal with nonlinear, multimodal problems. A major current trend is to use nature-inspired metaheuristic algorithms (Yang 2008, 2014b), including social algorithms.
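One possible way to encode the formulation (17)–(19) in code is to treat the problem functions as plain callables and test feasibility against them. The sketch below is only an illustration: the small example problem, the tolerance `tol`, and the name `is_feasible` are all assumptions, not part of the original text.

```python
def is_feasible(x, equalities, inequalities, tol=1e-9):
    """Check h_i(x) = 0 (within tol) and g_j(x) <= 0 for every constraint."""
    return (all(abs(h(x)) <= tol for h in equalities)
            and all(g(x) <= tol for g in inequalities))


# Example: minimize x1^2 + x2^2 subject to x1 + x2 - 1 = 0 and -x1 <= 0.
objective = lambda x: x[0] ** 2 + x[1] ** 2
equalities = [lambda x: x[0] + x[1] - 1]
inequalities = [lambda x: -x[0]]

for candidate in [(0.5, 0.5), (1.0, 1.0), (-1.0, 2.0)]:
    print(candidate, is_feasible(candidate, equalities, inequalities),
          objective(candidate))
```

A black-box optimizer only needs this kind of interface: a way to evaluate $f$ and a way to tell whether a candidate satisfies the constraints.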

Search for Optimality

If an optimization problem is properly formulated, there should always be at least one optimal solution, which should lie in the search domain. If the global optimal solution $\boldsymbol{x}^*$ is unique, this means that $f(\boldsymbol{x}^*)$ achieves the minimum $f_{\min}$. In general, there may be multiple solutions $\boldsymbol{x}_i^*$ $(i = 1, 2, \ldots)$, but their objective values should be the same, $f(\boldsymbol{x}_i^*) = f_{\min} \le f(\boldsymbol{x})$ for all $\boldsymbol{x} \in \mathbb{R}^D$. This latter case makes the search for optimality even harder.

Figuratively speaking, the process of searching for the optimal solution(s) is like treasure hunting with a time limit in a vast hilly landscape. What is the best strategy for finding the hidden treasure? Imagine, as one extreme scenario, that all the hunters are blindfolded without any guidance. The search is purely random, which is not efficient at all. At the other extreme, if we are told that the treasure is at the highest peak with a flashing beacon, all hunters would climb up the highest peak by the shortest path. The former corresponds to random search, while the latter corresponds to hill-climbing in the manner of Newton’s method. In general, the scenario is between these two extremes. Hunters are not blindfolded and there is no guiding beacon. As both time and resources are limited (even for a very valuable treasure), we have to use some strategy to optimize the chance of finding the treasure. Obviously, it is a silly idea to search every square inch of


a vast area as it is not practical and may not be worth the effort. Thus, a more promising approach is to search randomly and gather as much information and as many hints as possible so as to guide future search moves. In addition, the search can be carried out by a single hunter alone (a single agent) or by a group of hunters (a multi-agent system or a swarm). If the treasure is valuable enough, multiple hunters can be more efficient if they share information and search findings. This multi-agent search is similar to swarm intelligence or collective intelligence.

The above swarm-based search strategy can be refined further. For example, each day after a day’s search action, all hunters meet and decide to keep the better hunters. A fraction of the worst hunters are replaced by newly recruited hunters, and the group will continue the search the next day. Such selection, based on their performance, is called elitism. After many days, the group may become elite hunters with more experience and thus more likely to find the treasure. To a certain degree, social algorithms behave like such swarm-based elite multi-agent systems (Yang 2014b).

There are different ways of analyzing search algorithms. One way of looking at algorithms and optimization is to consider an algorithm system as a complex, self-organized system (Ashby 1962; Keller 2009), which evolves far from equilibrium to self-organization. Nowadays, researchers usually look at algorithms from the point of view of swarm intelligence (Engelbrecht 2005; Fisher 2009; Kennedy et al. 2001; Yang 2014b). The main task is to figure out the conditions and rules for the rise of swarm intelligence. However, how and under what conditions swarm intelligence can be achieved is still an unresolved question.
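The treasure-hunting analogy above, with random moves, shared ranking, and replacement of the worst hunters, can be turned into a small sketch. This is not one of the published social algorithms; it is a deliberately simple illustration, and the Gaussian step size, the replacement fraction, and the name `elite_swarm_search` are all assumptions.

```python
import random


def elite_swarm_search(f, lower, upper, dim=2, n_hunters=20, n_days=50,
                       step=0.5, replace_frac=0.25, seed=0):
    """Minimize f with random local moves, ranking, and elitist replacement."""
    rng = random.Random(seed)

    def recruit():
        # A new hunter starts at a uniformly random position.
        return [rng.uniform(lower, upper) for _ in range(dim)]

    hunters = [recruit() for _ in range(n_hunters)]
    n_replace = int(replace_frac * n_hunters)  # must stay below n_hunters
    history = []
    for _ in range(n_days):
        # Each hunter tries a random nearby move and keeps it only if better.
        for i, pos in enumerate(hunters):
            trial = [min(upper, max(lower, p + rng.gauss(0.0, step)))
                     for p in pos]
            if f(trial) < f(pos):
                hunters[i] = trial
        # Rank the group, then replace the worst fraction by new recruits.
        hunters.sort(key=f)
        if n_replace:
            hunters[-n_replace:] = [recruit() for _ in range(n_replace)]
        history.append(f(hunters[0]))
    return hunters[0], history


sphere = lambda x: sum(v * v for v in x)
best, history = elite_swarm_search(sphere, -5.0, 5.0)
print(best, history[-1])
```

Because the best hunter is never replaced, the recorded best value cannot get worse from one day to the next, which is the essence of elitism.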

Advantages of Social Algorithms

As there is a vast literature about traditional optimization algorithms, why do we need new algorithms such as social algorithms? In short, though traditional algorithms such as gradient-based methods can work well, they are mostly local search methods. That is to say, there is no guarantee that such algorithms can find global optimal solutions, except for linear programming and convex optimization. In most cases, the final solutions will depend on the initial starting points. In addition, traditional algorithms tend to use problem-specific information. For example, gradient-based methods such as the Newton-Raphson method use gradient information of the objective function, which can be expensive to calculate, especially the Hessian matrix in higher-dimensional problems. In some cases, the gradient information may not be available, for example, for problems with discontinuities. Furthermore, traditional algorithms are largely deterministic with high exploitation ability, but their exploration ability is usually low.

In contrast, social algorithms have been designed to avoid the above drawbacks by using nondeterministic, population-based approaches. The advantages of social algorithms can be summarized as follows:

• They are gradient-free, global optimizers, with a higher probability of finding the truly global optimal solutions. They are flexible and easy to implement.
• They consider the problem to be optimized as a black box; thus no problem-specific knowledge is needed, which means that they can potentially deal with a wider range of problem types.
• Their exploration ability is high with a diverse population, and the final solutions tend to “forget” their initial starting points. Thus, these methods tend to be more stable and robust.

Despite these advantages, population-based algorithms do have some disadvantages. In general, they require more computational effort, compared with traditional algorithms. In addition, due to their stochastic nature, the final solutions are not exactly repeatable, even though they can be sufficiently close to each other. Thus, multiple runs should be used, and the interpretation of results should be in the statistical sense.

Social Algorithms

Now we are ready to introduce the main social algorithms. As the literature of social algorithms and swarm intelligence is vast and expanding rapidly, we can only introduce some of the most widely used and most recent algorithms here. There are different ways of introducing algorithms. One way is to introduce them in chronological order, as we did in the brief history section. However, to gain more insight, we now group them into different categories, based on the main characteristics of their algorithmic equations. We can have algorithms as descriptive systems, linear systems, nonlinear systems, and quasi-linear systems.

Algorithms as Descriptive Systems

Not all algorithms can be expressed as iterative equations. Genetic algorithms provide a descriptive procedure for carrying out crossover, mutation, and selection. Thus, algorithms without detailed equations can loosely be called descriptive algorithm systems.

Ant Colony Optimization

Ant colony optimization (ACO) was developed by Marco Dorigo (1992), based on the main characteristics of social ants. Ants are social insects that live together in well-organized colonies with a population size ranging from about 2 million to 25 million without centralized control. They communicate with each other and interact with their environment in a swarm using local interactions and scent chemicals or pheromone. Pheromone is deposited by each agent, and such chemicals also evaporate. An ant colony as a complex system can somehow self-organize with emerging behavior, leading to some form of social intelligence.

ACO can be effective in solving discrete optimization problems. For example, a solution in a network optimization problem can be a path or route. Multiple agents, acting as ants, explore the network paths and deposit pheromone when they move. The quality of a solution is related to the pheromone concentration on the path. The combination of pheromone evaporation and deposition can lead to self-organized paths. At a junction with multiple routes, the probability of choosing a particular route is determined by a decision criterion, depending on the normalized pheromone concentration of the route and the relative fitness of this route, compared with all others. From the implementation point of view, ACO is a mixture of a descriptive procedure with equations of pheromone deposition and evaporation as well as the path selection probability.
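The route-selection and pheromone rules are described above only in words. The sketch below uses a form that is common in the ACO literature, where the choice probability is proportional to a power of the pheromone level times a power of a route desirability, and pheromone evaporates at a constant rate. These exact expressions, the exponent names `tau_exp` and `eta_exp`, the rate `rho`, and the function names are assumptions for illustration and are not given in this chapter.

```python
def route_probabilities(pheromone, desirability, tau_exp=1.0, eta_exp=1.0):
    """Probability of each route: proportional to tau^tau_exp * eta^eta_exp."""
    weights = [(t ** tau_exp) * (e ** eta_exp)
               for t, e in zip(pheromone, desirability)]
    total = sum(weights)
    return [w / total for w in weights]


def update_pheromone(pheromone, deposits, rho=0.5):
    """Evaporate a fraction rho of each trail, then add the new deposits."""
    return [(1.0 - rho) * t + d for t, d in zip(pheromone, deposits)]


# Two routes at a junction: the second carries three times more pheromone.
print(route_probabilities([1.0, 3.0], [1.0, 1.0]))

# After evaporation and a deposit on the first route, the trails even out.
print(update_pheromone([1.0, 2.0], [0.5, 0.0], rho=0.5))
```

Evaporation lets poor routes fade while repeated deposits reinforce good ones, which is the mechanism behind the self-organized paths mentioned above.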

Bees-Inspired Algorithms Bees-inspired algorithms are also descriptive. Bees such as honeybees live in a colony with a division of labor among worker bees, queens, and drones. Honeybees communicate by pheromone, the “waggle dance,” and other local interactions, depending on the species. Researchers have developed various forms and variants of bees-inspired algorithms, based on different aspects of foraging behavior and social interactions. Pheromone and foraging characteristics were used in different bee algorithms and their variants (Afshar et al. 2007; Karaboga 2005; Nakrani and Tovey 2004; Pham et al. 2005; Yang 2005). Interested readers can refer to more specialized literature.

Algorithms as Linear Systems The updating of solutions can be achieved by a set of iteration equations. If all the equations are linear, the system behavior can be analyzed relatively easily.

Particle Swarm Optimization Many swarms in nature, such as fish and birds, can have higher-level behavior, but they all obey simple rules. Based on such swarming characteristics, particle swarm optimization (PSO) was developed by Kennedy and Eberhart in 1995; it uses equations to simulate the swarming characteristics of birds and fish (Kennedy and Eberhart 1995). Let $x_i$ and $v_i$ be the position (solution) and velocity, respectively, of a particle or agent $i$. In PSO, there are $n$ particles as a population; thus $i = 1, 2, \ldots, n$. This algorithm consists of two equations for updating the positions and velocities of the particles in the following form:

$$v_i^{t+1} = v_i^t + \alpha \epsilon_1 [g^* - x_i^t] + \beta \epsilon_2 [x_i^* - x_i^t], \tag{20}$$

$$x_i^{t+1} = x_i^t + v_i^{t+1} \Delta t, \tag{21}$$

where $g^*$ is the best solution found so far by all the particles in the population, and $x_i^*$ is the individual best solution found by particle $i$ during its entire past iteration history. Here, $\epsilon_1$ and $\epsilon_2$ are two uniformly distributed random numbers in [0,1]. In addition, the parameters $\alpha$ and $\beta$ are usually called learning parameters or cognitive factors, and they are in the range [0,2]. It is worth pointing out that the time increment $\Delta t = 1$ is used for discrete systems; thus we can simply ignore the units and omit $\Delta t$ in algorithms, as is common practice in the literature on metaheuristic algorithms and in textbooks on optimization. Both equations (20) and (21) are linear in the sense that they depend only linearly on $x_i$ and $v_i$. Thus, the algorithm is relatively easy to implement, and its convergence is relatively quick. Consequently, PSO has been applied in many applications (Engelbrecht 2005; Kennedy et al. 2001). However, linear systems can be limited in terms of system behavior, and the quick convergence can lead to so-called premature convergence, when the population loses diversity and thus gets stuck locally. To remedy this drawback, researchers have developed more than 20 different variants with different degrees of success (Kennedy et al. 2001).
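As an illustration of Eqs. (20) and (21), the following is a minimal Python sketch with Δt = 1. The function name `pso`, the default parameter values, the zero initial velocities, and the clipping of positions to the search box are assumptions added for the example; they are not prescribed by the text.

```python
import random

def pso(objective, lower, upper, n=20, iterations=200, alpha=1.5, beta=1.5, seed=0):
    """Minimize `objective` over a box using Eqs. (20)-(21) with dt = 1."""
    rng = random.Random(seed)
    dim = len(lower)
    x = [[rng.uniform(lower[k], upper[k]) for k in range(dim)] for _ in range(n)]
    v = [[0.0] * dim for _ in range(n)]            # assumed zero initial velocity
    best_x = [xi[:] for xi in x]                   # individual bests x_i^*
    best_f = [objective(xi) for xi in x]
    g_index = min(range(n), key=lambda i: best_f[i])
    g, g_f = best_x[g_index][:], best_f[g_index]   # global best g^*

    for _ in range(iterations):
        for i in range(n):
            for k in range(dim):
                e1, e2 = rng.random(), rng.random()  # epsilon_1, epsilon_2 in [0, 1)
                # Eq. (20): velocity update
                v[i][k] += (alpha * e1 * (g[k] - x[i][k])
                            + beta * e2 * (best_x[i][k] - x[i][k]))
                # Eq. (21): position update, clipped to the box (added safeguard)
                x[i][k] = min(max(x[i][k] + v[i][k], lower[k]), upper[k])
            f = objective(x[i])
            if f < best_f[i]:
                best_x[i], best_f[i] = x[i][:], f
                if f < g_f:
                    g, g_f = x[i][:], f
    return g, g_f

# Usage: search for the minimum of a simple quadratic bowl in two dimensions.
best, value = pso(lambda p: sum(c * c for c in p), [-5.0, -5.0], [5.0, 5.0])
```

Because the best solution is only ever replaced by a better one, the returned objective value cannot get worse as the number of iterations grows; how quickly it improves depends on `alpha`, `beta`, and the problem.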

Artificial Bee Colony As we mentioned earlier, there are quite a few different forms of bees-inspired algorithms. One form is the so-called artificial bee colony (ABC), where bees are divided into three groups: forager bees, onlooker bees, and scouts (Karaboga 2005). For each food source, there is a forager bee to explore. The generation of a new solution $x_{i,k}$ is done by

$$x_{i,k} = x_{i,k} + \phi (x_{i,k} - x_{j,k}), \tag{22}$$

which is updated for each dimension $k = 1, 2, \ldots, D$ for different solutions (e.g., $i$ and $j$) in a population of $n$ bees ($i, j = 1, 2, \ldots, n$). Here, $\phi$ is a random number in [−1,1]. A food source is chosen by a roulette-based probability criterion, while a scout bee uses a Monte Carlo style randomization between the lower bound ($L$) and the upper bound ($U$):

$$x_{i,k} = L_k + r (U_k - L_k), \tag{23}$$

where $r$ is a uniformly distributed random number in [0,1]. Both equations are linear. Thus, characteristics of linear systems can be expected.
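The two ABC updates and the roulette criterion can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, the population is stored as a list of lists, and the roulette probabilities assume non-negative fitness values, which the text does not specify.

```python
import random

def neighbor_solution(x, i, j, k, rng):
    """Eq. (22): perturb dimension k of solution i relative to solution j."""
    phi = rng.uniform(-1.0, 1.0)
    candidate = x[i][:]                      # copy, so the population is not modified
    candidate[k] = x[i][k] + phi * (x[i][k] - x[j][k])
    return candidate

def scout_solution(lower, upper, rng):
    """Eq. (23): a fresh random solution between the bounds L and U."""
    return [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]

def roulette_probabilities(fitness):
    """Selection probability proportional to fitness (assumed non-negative)."""
    total = sum(fitness)
    return [f / total for f in fitness]

rng = random.Random(42)
population = [scout_solution([-1.0, -1.0], [1.0, 1.0], rng) for _ in range(4)]
trial = neighbor_solution(population, i=0, j=2, k=1, rng=rng)
```

In a full algorithm, `trial` would replace solution 0 only if it has better fitness, and a food source that fails to improve for some time would be re-initialized with `scout_solution`.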

Firefly Algorithm as a Nonlinear System The firefly algorithm (FA) was developed by Xin-She Yang in 2008, based on the flashing characteristics of tropical fireflies. The main algorithmic equation in FA is highly nonlinear (Yang 2008, 2010a). In nature, there are about 2000 species of fireflies, and most species produce short, rhythmic flashes by bioluminescence. Each species can have different flashing patterns and rhythms, and one of the main functions of such flashing light is to act as a signaling system to communicate with other fireflies.


From physics we know that light intensity decreases as the distance from the source increases; the range of firefly visibility is typically a few hundred meters, depending on weather conditions. FA uses a nonlinear system by combining the exponential decay of light absorption and the inverse-square law of light variation with distance $r$. Thus, the main algorithmic equation for the position $x_i$, as a solution vector to a problem, can be written as

$$x_i^{t+1} = x_i^t + \beta_0 e^{-\gamma r_{ij}^2} (x_j^t - x_i^t) + \alpha \epsilon_i^t, \tag{24}$$

where the second term on the right-hand side of the equation is highly nonlinear, corresponding to the attraction term with an attractiveness constant $\beta_0$ when the distance between two fireflies is zero (i.e., $r_{ij} = 0$). Here, the scaling factor $\alpha$ controls the step sizes, and the scale-dependent parameter $\gamma$ controls the visibility of the fireflies (and thus the search modes). It is worth pointing out that the brightness of a firefly should be associated with the objective landscape, with its position as the indicator, whereas the attractiveness of a firefly as seen by others depends on their relative positions and relative brightness. Thus, beauty is in the eye of the beholder. This means that a pairwise comparison of all fireflies is needed in an implementation. As $\alpha$ is a parameter controlling the strength of the randomness or perturbations in FA, the randomness should be gradually reduced to speed up the overall convergence. Thus, in FA, we often use

$$\alpha = \alpha_0 \delta^t, \tag{25}$$

where $\alpha_0$ is the initial value and $0 < \delta < 1$ is a reduction factor. In most applications, we can use $\delta = 0.9$ to $0.99$, depending on the type of problem and the desired quality of solutions. In addition, as $\gamma$ is an important scaling factor, a good value of $\gamma$ should be linked to the scale or limits of the design variables so that the fireflies within a range are visible to each other. Loosely speaking, $\gamma$ can be estimated by

$$\gamma = \frac{1}{L^2}, \tag{26}$$

where $L$ is the typical size of the search domain or the radius of a typical mode shape in the objective landscape. Unlike the linear PSO system, FA is an algorithm with a nonlinear updating equation. As the short-distance attraction is stronger than the long-distance attraction, FA has the ability to automatically subdivide the whole swarm into multiple subswarms, leading to an intrinsic multi-swarm system. Consequently, FA is naturally suitable for solving multimodal optimization problems (Yang 2014a, b).
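A minimal Python sketch of one sweep of Eq. (24), together with the schedules in Eqs. (25) and (26), is given below. Several details are assumptions made for the example rather than statements from the text: the random vector in Eq. (24) is drawn from a standard Gaussian, brightness is treated as a quantity to be maximized, and the function names and default parameter values are hypothetical.

```python
import math
import random

def firefly_step(x, brightness, t, alpha0=0.5, delta=0.97,
                 beta0=1.0, gamma=1.0, rng=random):
    """One sweep of Eq. (24) over all pairs; returns the new positions.

    brightness[i] is the quality of firefly i (larger means brighter).
    """
    alpha = alpha0 * delta ** t                    # Eq. (25): shrinking randomness
    new_x = [xi[:] for xi in x]
    for i in range(len(x)):
        for j in range(len(x)):
            if brightness[j] > brightness[i]:      # i is attracted to the brighter j
                r2 = sum((a - b) ** 2 for a, b in zip(new_x[i], x[j]))
                attract = beta0 * math.exp(-gamma * r2)
                new_x[i] = [
                    a + attract * (b - a) + alpha * rng.gauss(0.0, 1.0)
                    for a, b in zip(new_x[i], x[j])
                ]
    return new_x

def gamma_from_scale(L):
    """Eq. (26): gamma = 1 / L^2 for a typical length scale L."""
    return 1.0 / (L * L)

# Usage: three fireflies in a domain of typical size 10.
positions = [[0.0, 0.0], [3.0, 4.0], [-2.0, 1.0]]
quality = [1.0, 5.0, 2.0]
positions = firefly_step(positions, quality, t=0, gamma=gamma_from_scale(10.0))
```

The double loop is the pairwise comparison mentioned above, so one sweep costs on the order of n² attraction evaluations for n fireflies.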


Algorithms as Quasi-linear Systems In some algorithms such as cuckoo search, the equations are quasi-linear due to a switching condition using a Heaviside step function. Their behavior will be different, especially when Lévy flights are used. In addition, the nonlinear pulse emission rates in the bat algorithm also make it a different system.

Bat Algorithm The bat algorithm (BA), developed by Xin-She Yang in 2010, uses some characteristics of the frequency tuning and echolocation of microbats (Yang 2010b, 2011). There are about 1000 different bat species, and most bat species use echolocation to a certain degree, though microbats use echolocation extensively for foraging and navigation. They emit a series of loud, ultrasonic sound pulses and listen for their echoes to “see” their surroundings, and pulse emission rates increase when homing in on prey, with frequency-modulated short pulses (thus varying wavelengths to increase the detection resolution). Depending on the species, each pulse may last about 5 to 20 ms with a frequency range of 25 to 150 kHz, and the spatial resolution can be as small as a few millimeters, comparable to the size of the insects they hunt (Altringham 1998). In BA, variations of the pulse emission rate $r$ and loudness $A$ are also used to control exploration and exploitation. The position $x_i$ and velocity $v_i$ of bat $i$ are updated at each iteration by

$$\begin{cases} f_i = f_{\min} + (f_{\max} - f_{\min}) \beta, \\ v_i^t = v_i^{t-1} + (x_i^{t-1} - x^*) f_i, \\ x_i^t = x_i^{t-1} + v_i^t, \end{cases} \tag{27}$$

where $x^*$ is the current best solution found so far by all the virtual bats. Here, $\beta \in [0, 1]$ is a random vector drawn from a uniform distribution, which allows the frequency to be tuned from $f_{\min}$ to $f_{\max}$. In addition, there is a switching condition to control exploration and exploitation by varying the loudness $A(t)$ from a high value to a lower value and the emission rate $r$ from a lower value to a higher value. We have

$$A_i^{t+1} = \alpha A_i^t, \qquad r_i^{t+1} = r_i^0 (1 - e^{-\gamma t}), \tag{28}$$

where $0 < \alpha < 1$ and $\gamma > 0$ are two parameters. Though the two updating equations are linear, the use of varying loudness and pulse emission rates makes the algorithm weakly nonlinear. Thus, BA can usually have a faster convergence rate in comparison with PSO. The convergence behavior has been analyzed by Chen et al. (2018), who proved that this algorithm can converge quickly under the right parameter settings. BA has been extended to multiobjective optimization and hybrid versions (Yang 2011, 2014b).
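The updates in Eqs. (27) and (28) translate almost line by line into code. In the hedged Python sketch below, the function names and default values are hypothetical, and a single random number β is drawn per bat instead of a random vector, which is a simplification of the description above.

```python
import math
import random

def bat_move(x, v, best, f_min=0.0, f_max=2.0, rng=random):
    """Eq. (27): frequency-tuned velocity and position update for one bat."""
    beta = rng.random()                              # uniform in [0, 1)
    f = f_min + (f_max - f_min) * beta               # tuned frequency f_i
    new_v = [vk + (xk - bk) * f for vk, xk, bk in zip(v, x, best)]
    new_x = [xk + vk for xk, vk in zip(x, new_v)]
    return new_x, new_v

def loudness(A, alpha=0.97):
    """Eq. (28), first part: A^{t+1} = alpha * A^t (loudness decays)."""
    return alpha * A

def pulse_rate(r0, t, gamma=0.1):
    """Eq. (28), second part: r^{t+1} = r0 * (1 - exp(-gamma * t))."""
    return r0 * (1.0 - math.exp(-gamma * t))

# Usage: one bat, one iteration.
x, v = bat_move([1.0, -2.0], [0.0, 0.0], best=[0.5, 0.5])
A = loudness(1.0)
r = pulse_rate(0.5, t=1)
```

As the iterations proceed, `loudness` shrinks toward zero while `pulse_rate` rises toward `r0`, which is the shift from exploration to exploitation described in the text.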


Cuckoo Search Cuckoo search (CS) was developed by Xin-She Yang and Suash Deb in 2009, inspired by cuckoo-host coevolutionary behavior. In the natural world, among 141 cuckoo species, 59 species engage in so-called obligate brood parasitism (Davies 2011). These cuckoo species do not build their own nests, and they lay eggs in the nests of host birds such as warblers. The eggs of cuckoos can be sufficiently similar to the eggs of host birds in terms of size, color, and texture so as to increase the survival probability of the cuckoo eggs. In reality, about 1/5 to 1/4 of the eggs laid by cuckoos will be discovered and abandoned by hosts. In fact, there is an arms race between cuckoo species and host species, forming an interesting cuckoo-host species coevolution system. In terms of algorithmic equations, CS uses a combination of both local and global search capabilities, controlled by a discovery probability $p_a$ (Yang and Deb 2009, 2010). In CS, one equation is quasi-linear:

$$x_i^{t+1} = x_i^t + \alpha s \otimes H(p_a - \epsilon) \otimes (x_j^t - x_k^t), \tag{29}$$

where $x_j^t$ and $x_k^t$ are two different solutions selected randomly by random permutation, $H(u)$ is a Heaviside function, $\epsilon$ is a random number drawn from a uniform distribution, and $s$ is the step size. Here, the multiplications are carried out in a component-wise manner using $\otimes$. Though these steps are mainly local, such moves can become global if $s$ is large enough. The main global search mechanism is realized by Lévy flights:

$$x_i^{t+1} = x_i^t + \alpha L(s, \lambda), \tag{30}$$

where the Lévy flights are simulated by drawing random numbers from a Lévy distribution

$$L(s, \lambda) \sim \frac{\lambda \Gamma(\lambda) \sin(\pi \lambda / 2)}{\pi} \frac{1}{s^{1+\lambda}}, \quad (s \gg 0). \tag{31}$$

Here $\alpha > 0$ is the step size scaling factor and $\Gamma$ is the standard gamma function. In CS, the use of Lévy flights can enhance the search capability because a fraction of the steps generated by Lévy flights are larger than those drawn from a Gaussian distribution. Thus, the search steps in CS are heavy-tailed (Pavlyukevich 2007; Reynolds and Rhodes 2009). Consequently, CS can be very effective for nonlinear optimization problems and multiobjective optimization (Gandomi et al. 2013; Yang 2014a; Yang and Deb 2013; Yildiz 2013). A relatively comprehensive literature review of cuckoo search has been carried out by Yang and Deb (2014). Most recently, cuckoo search has been extended to a multi-species cuckoo search (MSCS) by Yang et al. (2018b), where multiple cuckoo species and multiple host species coevolve to achieve some optimal characteristics.

As we mentioned earlier, the literature concerning nature-inspired algorithms and swarm intelligence is expanding rapidly, and more social algorithms are being developed by researchers, but we will not introduce more algorithms here. Instead, we will focus on summarizing the key characteristics of social algorithms and other population-based algorithms so as to gain a deeper understanding of these algorithms from the perspectives of self-organization and the balance of exploitation and exploration.
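Returning briefly to the cuckoo search updates in Eqs. (29) and (30), a small Python sketch may make the Heaviside switch and the heavy-tailed steps concrete. The text does not say how the Lévy-distributed steps are generated; the sketch below uses Mantegna's algorithm, a commonly used sampler, purely as an assumed choice. The function names and default values are also hypothetical.

```python
import math
import random

def levy_step(lam=1.5, rng=random):
    """Draw one heavy-tailed step with Mantegna's algorithm (an assumed choice)."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / lam)

def local_move(xi, xj, xk, pa=0.25, alpha=1.0, s=1.0, rng=random):
    """Eq. (29): component-wise local random walk gated by a Heaviside switch."""
    new = []
    for a, b, c in zip(xi, xj, xk):
        h = 1.0 if pa - rng.random() > 0 else 0.0   # H(pa - epsilon)
        new.append(a + alpha * s * h * (b - c))
    return new

def global_move(xi, alpha=0.01, lam=1.5, rng=random):
    """Eq. (30): global move with Lévy-distributed step sizes."""
    return [a + alpha * levy_step(lam, rng) for a in xi]

# Usage: one local and one global move for a two-dimensional solution.
x_local = local_move([1.0, 2.0], [0.5, 0.5], [-0.5, 1.5])
x_global = global_move([1.0, 2.0])
```

With `pa = 0.25`, roughly a quarter of the components are perturbed in each local move, while the occasional very large value from `levy_step` is what lets the global move jump far from the current solution.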

Algorithm Analysis and Open Problems Algorithms can be analyzed from many different perspectives, including computational complexity, fixed-point theory, Bayesian statistics, dynamical systems, filter theory, and Markov chain theory (Yang 2019). Here we will briefly discuss the algorithms and their link with self-organization and the role of exploration and exploitation.

Algorithms and Self-Organization Algorithms can be very different, and different social algorithms have different characteristics and inspiration from nature. However, it is possible to look at such algorithms to find some common features and characteristics.

• Multi-agent systems: All social algorithms are population-based algorithms with a set of multiple agents such as ants, cuckoos, fireflies, and particles. These agents interact and coevolve with other agents, and ultimately certain converged stages are reached by some or all the agents in the population.
• Search mechanisms: Search moves can be local or global, depending on how far they may be from existing solutions. Modifications of the current population are usually carried out by mutation, randomization, crossover, and some hybrid operations. Explorative moves tend to have higher diversity, but solutions may be far away, while exploitative moves tend to be very effective but with lower diversity. If the search is primarily local and exploitative, it increases the probability of getting stuck locally. If the search focuses too much on global, explorative moves, it will slow down the convergence. Different algorithms may use different amounts of randomization, and a fine balance between exploitation and exploration is often required (Blum and Roli 2003).
• Selection and elitism: Selection of the better or best solutions acts as a driving force for the system to converge. In most cases, the “survival of the fittest,” or simply elitism, can be used to make sure that the best solution $g^*$ is kept in the population in the next generation. The consequence of such a selection mechanism is to reduce the diversity of the population and force the population into a certain organized structure or self-organization.

We can summarize the above components and characteristics in Table 1. On the other hand, we can also look at social algorithms from the self-organization perspective. A complex system with diverse states can self-organize into high-level structures under certain conditions such as higher degrees of freedom, being far from equilibrium, and enough noise or perturbation combined with a driving mechanism (Ashby 1962). There are some strong similarities between algorithms and self-organization, as can be seen from the same table.

Table 1 Characteristics and properties of social algorithms and self-organization

| Algorithm | Properties | Self-organization | Characteristics |
| Population | Diversity and sampling | States | Complexity |
| Randomization | Escape local optima | Noise, perturbation | Far from equilibrium |
| Iteration | Evolution of solutions | Reorganization | State changes |
| Selection | Convergence/driving forces | Selection mechanism | Organization |

Despite the similarities mentioned in Table 1, there are some fundamental differences between algorithms and self-organization. First, the exact avenue to self-organization may not be clear for self-organized systems, whereas the evolution of solutions is often clear in algorithms. Second, time is usually not important for self-organization, but it is very important for algorithms. On the other hand, it may be desirable in algorithms to avoid certain organized structures that lead to premature convergence, but it is not clear how to avoid such premature convergence and under what conditions.

Balance of Exploitation and Exploration A key issue concerning algorithms is the right amount of exploitation and exploration. Exploration provides diversification, which enables an algorithm to explore different regions in a vast search space, and it is thus more likely to find the true global optimal solution. Exploration is often realized by mutation and strong randomization. On the other hand, exploitation uses information such as derivatives to guide local search more intensively, and such intensification can usually enhance the overall convergence. However, too much exploitation and too little exploration can significantly reduce the probability of finding the true global optimum, while too much exploration and too little exploitation can make an algorithm converge slowly. Clearly, a fine balance between these two conflicting components is needed, but how to achieve such a balance is still an open problem. For a given algorithm, it is possible to track the percentage of moves in exploration and exploitation. However, such moves largely depend on the algorithmic structure and configuration of the algorithm. Based on some simulations (Yang 2019), we can plot these components for the commonly used algorithms as an exploration-exploitation graph, shown in Fig. 1, where nonsocial algorithms are also presented, including evolutionary programming (EP), evolutionary strategy (ES), simulated annealing (SA), and the Nelder-Mead simplex method.


Fig. 1 Exploration and exploitation abilities of some commonly used algorithms

Open Problems As we mentioned earlier, there are still some open problems in this area, despite active research and extensive studies. Here, we highlight a few open problems as follows:

1. Is there a unified mathematical framework? At the moment, there is still no rigorous mathematical framework for analyzing all social algorithms and all other algorithms in a unified way so as to understand their stability, convergence, robustness, and other properties (He et al. 2017; Yang and He 2019). More importantly, the conditions for the rise of social intelligence and swarm intelligence in a multi-agent system are still unclear.
2. What is the optimal balance between exploration and exploitation? As we have seen earlier, we do not know what the optimal balance should be. Such a balance may be problem-specific and algorithm-specific, and it may vary dynamically during iterations rather than being static.
3. Can these algorithms solve large-scale problems without modifications? Though these algorithms are quite flexible and effective for many applications, the problem sizes are usually small or moderate, with the number of variables at most up to a few hundred. It is not clear if we can scale them up to solve truly large-scale problems without modifications, including NP-hard problems such as the traveling salesman problem. Researchers are not sure how to modify existing social algorithms to cope with such challenges.
4. Can we design some self-adaptive and self-evolving algorithms? In almost all algorithms, there are parameters that need to be tuned because their settings may affect the performance of the algorithm under consideration. Ideally, an algorithm should be able to adapt so as to suit different types of problems (Yang et al. 2013). Ultimately, an algorithm should be self-adaptive and able to tune itself automatically to suit a given type of problem without much supervision from the users, and such algorithms should also be able to evolve by learning from their past performance histories. Again, there is still a long way to go, and it may require some multidisciplinary effort.
5. What are the conditions for the rise of collective intelligence? Though there are some extensive studies on social and swarm intelligence in the literature, a fundamental understanding of the conditions for the rise of intelligence is still lacking. How exactly can local rules and interactions lead to certain higher-level structures and collective behavior?

To answer all these important questions, systematic research by the whole research community may be required in the coming years.

Conclusions As we have seen in this work, social algorithms have become a powerful tool set for solving optimization problems, and their studies form an active area of research. We have introduced social algorithms concerning their brief history, algorithm structures, and characteristics. We have also highlighted some open questions. It is hoped that the above open questions can inspire more research and researchers can design more efficient algorithms so that they can solve large-scale real-world problems in a diverse range of applications.

References
Afshar A, Haddad OB, Marino MA, Adams BJ (2007) Honey-bee mating optimization (HBMO) algorithm for optimal reservoir operation. J Franklin Inst 344(4):452–462
Altringham JD (1998) Bats: biology and behaviour. Oxford University Press, Oxford
Ashby WR (1962) Principles of the self-organizing system. In: Von Foerster H, Zopf GW Jr (eds) Principles of self-organization: transactions of the University of Illinois Symposium. Pergamon Press, London, pp 255–278
Berlinski D (2001) The advent of the algorithm: the 300-year journey from idea to the computer. Harvest Book, New York
Blum C, Roli A (2003) Metaheuristics in combinatorial optimization: overview and conceptual comparison. ACM Comput Surv 35(2):268–308
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Chabert JL (1999) A history of algorithms: from the pebble to the microchip. Springer, Heidelberg
Chen S, Peng GH, He XS, Yang XS (2018) Global convergence analysis of the bat algorithm using a Markovian framework and dynamical system theory. Expert Syst Appl 114(1):173–182
Davies NB (2011) Cuckoo adaptations: trickery and tuning. J Zool 284(1):1–14
Del Ser J, Osaba E, Yang XS, Salcedo-Sanz S, Camacho D, Das S, Suganthan PN, Coello Coello CA (2019) Bio-inspired computation: where we stand and what’s next. Swarm Evol Comput 48(1):220–250


Dorigo M (1992) Optimization, learning and natural algorithms. Ph.D. thesis, Politecnico di Milano
Engelbrecht AP (2005) Fundamentals of computational swarm intelligence. Wiley, Hoboken
Fisher L (2009) The perfect swarm: the science of complexity in everyday life. Basic Books, New York
Fogel LJ, Owens AJ, Walsh MJ (1966) Artificial intelligence through simulated evolution. Wiley, New York
Gandomi AH, Yang XS, Alavi AH (2013) Cuckoo search algorithm: a metaheuristic approach to solve structural optimization problems. Eng Comput 29(1):17–35
Geem ZW, Kim JH, Loganathan GV (2001) A new heuristic optimization algorithm: harmony search. Simulation 76(2):60–68
Glover F (1986) Future paths for integer programming and links to artificial intelligence. Comput Oper Res 13(5):533–549
Goldberg DE (1989) Genetic algorithms in search, optimisation and machine learning. Addison Wesley, Reading
He XS, Yang XS, Karamanoglu M, Zhao YX (2017) Global convergence analysis of the flower pollination algorithm: a discrete-time Markov chain approach. Proc Comput Sci 108(1):1354–1363
Holland J (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
Joyce T, Herrmann JM (2018) A review of no free lunch theorems, and their implications for metaheuristic optimisation. In: Yang XS (ed) Nature-inspired algorithms and applied optimization. Springer, Cham, pp 27–52
Karaboga D (2005) An idea based on honeybee swarm for numerical optimization. Technical Report, Erciyes University
Keller EF (2009) Organisms, machines, and thunderstorms: a history of self-organization, part two. Complexity, emergence, and stable attractors. Hist Stud Nat Sci 39(1):1–31
Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceedings of IEEE international conference on neural networks, Piscataway, pp 1942–1948
Kennedy J, Eberhart RC, Shi Y (2001) Swarm intelligence. Academic Press, London
Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680
Lazer D (2015) The rise of the social algorithm. Science 348(6239):1090–1091
Nakrani S, Tovey C (2004) On honeybees and dynamic server allocation in internet hosting centers. Adapt Behav 12(3):223–240
Palmieri N, Yang XS, De Rango F, Marano S (2019) Comparison of bio-inspired algorithms applied to the coordination of mobile robots considering the energy consumption. Neural Comput Appl 31(1):263–286
Pavlyukevich I (2007) Lévy flights, non-local search and simulated annealing. J Comput Phys 226(2):1830–1844
Pearl J (1984) Heuristics. Addison-Wesley, New York
Pham DT, Ghanbarzadeh A, Koc E, Otri S, Rahim S, Zaidi M (2005) The bees algorithm. Technical Note, Manufacturing Engineering Centre, Cardiff University
Rashedi E, Nezamabadi-pour H, Saryazdi S (2009) GSA: a gravitational search algorithm. Inf Sci 179(13):2232–2248
Reynolds AM, Rhodes CJ (2009) The Lévy flight paradigm: random search patterns and mechanisms. Ecology 90(4):877–887
Rodrigues D, Silva GFA, Papa JP, Marana AN, Yang XS (2016) EEG-based person identification through binary flower pollination algorithm. Expert Syst Appl 62(1):81–90
Storn R, Price K (1997) Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 11(4):341–359
Süli E, Mayer D (2003) An introduction to numerical analysis. Cambridge University Press, Cambridge
Turing AM (1948) Intelligent machinery. National Physical Laboratory, Technical Report


Wolpert DH, Macready WG (1997) No free lunch theorem for optimization. IEEE Trans Evol Comput 1(1):67–82
Yang XS (2005) Engineering optimization via nature-inspired virtual bee algorithms. In: Artificial intelligence and knowledge engineering application: a bioinspired approach, Proceedings of IWINAC, pp 317–323
Yang XS (2008) Nature-inspired metaheuristic algorithms. Luniver Press, Bristol
Yang XS (2010a) Firefly algorithm, stochastic test functions and design optimisation. Int J Bio-Inspired Comput 2(2):78–84
Yang XS (2010b) A new metaheuristic bat-inspired algorithm. In: Nature-inspired cooperative strategies for optimization (NICSO 2010). Springer, Berlin, SCI 284, pp 65–74
Yang XS (2010c) Engineering optimization: an introduction with metaheuristic applications. Wiley, Hoboken
Yang XS (2011) Bat algorithm for multi-objective optimisation. Int J Bio-Inspired Comput 3(5):267–274
Yang XS (2012) Flower pollination algorithm for global optimization. In: Unconventional computation and natural computation. Lecture notes in computer science, vol 7445. Springer, Heidelberg, pp 240–249
Yang XS (2014a) Cuckoo search and firefly algorithm: theory and applications. Studies in computational intelligence, vol 516. Springer, Heidelberg
Yang XS (2014b) Nature-inspired optimization algorithms. Elsevier Insight, London
Yang XS (2019) Introduction to algorithms for data mining and machine learning. Academic Press, London
Yang XS, Deb S (2009) Cuckoo search via Lévy flights. In: Proceedings of world congress on nature & biologically inspired computing (NaBic 2009), Coimbatore. IEEE Publications, pp 210–214
Yang XS, Deb S (2010) Engineering optimization by cuckoo search. Int J Math Model Num Opt 1(4):330–343
Yang XS, Deb S (2013) Multiobjective cuckoo search for design optimization. Comput Oper Res 40(6):1616–1624
Yang XS, Deb S (2014) Cuckoo search: recent advances and applications. Neural Comput Appl 24(1):169–174
Yang XS, He XS (2019) Mathematical foundations of nature-inspired algorithms. Springer briefs in optimization. Springer, Cham
Yang XS, Papa JP (2016) Bio-inspired computation and applications in image processing. Academic Press, London
Yang XS, Deb S, Loomes M, Karamanoglu M (2013) A framework for self-tuning optimization algorithm. Neural Comput Appl 23(7–8):2051–2057
Yang XS, Karamanoglu M, He XS (2014) Flower pollination algorithm: a novel approach for multiobjective optimization. Eng Opt 46(9):1222–1237
Yang XS, Chien SF, Ting TO (2015) Bio-inspired computation in telecommunications. Morgan Kaufmann, Waltham
Yang XS, Deb S, Zhao YX, Fong S, He X (2018a) Swarm intelligence: past, present and future. Soft Comput 22(18):5923–5933
Yang XS, Deb S, Mishra SK (2018b) Multi-species cuckoo search algorithm for global optimization. Cogn Comput 10(6):1085–1095
Yildiz AR (2013) Cuckoo search algorithm for the selection of optimal machine parameters in milling operations. Int J Adv Manuf Technol 64(1):55–61

Applications of the Gini Index Beyond Economics and Statistics

65

Roberta La Haye and Petr Zizler

Contents
Introduction
Gini’s Measures and the Lorenz Curve
The Standard Deviation and Coefficient of Variation
Applications of the Gini Index and GMD
Society and Household Income Inequity
Contrast in Grayscale Images
Other Lorenz-Inspired Measures of Spread and Inequality
Further Modeling with the Lorenz Curve and Gini Index
Equalization and the Gini Index
The Golden Equity
Golden Academia
Summary of Desirable Properties of Measures of Inequality and Spread
Conclusions
References

Abstract This chapter discusses the Gini index and Gini mean difference, two closely related statistics that have vastly different applications. These applications include traditional ones such as income and wealth as well as nontraditional ones arising in fields as diverse as digital imaging, genetics, and astronomy. The Gini index and Gini mean difference are defined in terms of the Lorenz curve. The Lorenz curve allows the development of a geometric intuition regarding the quantities involved. The curve provides an interesting perspective on how the Gini index and the Gini mean difference change under various scenarios involving household income inequality in a country. The Lorenz curve and Gini’s two measures are further explored in the context of measuring contrast in grayscale images as well. The coefficient of variation and standard deviation are also discussed for the sake of comparison, and several other measures of spread and inequality arising in the Lorenz curve are noted.

R. La Haye · P. Zizler, Mount Royal University, Calgary, AB, Canada
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_103

Keywords Gini index · Gini mean difference · standard deviation · coefficient of correlation · Robin Hood index

Introduction

Italian statistician Corrado Gini, 1884–1965, had a variety of interests including economics, sociology, demography, and biology. No doubt his broad interests inspired many of his over 800 publications, including articles, books, and conference papers (Giorgi and Gubbiotti 2017). Gini also founded the statistical journal Metron and the population studies journal Genus among a litany of other accomplishments. It was the Gini index, though, that made him famous. Perhaps this is appropriate, since the Gini index is a leading measure of inequality appearing in a startling breadth of applications.

The Gini index or Gini coefficient is a well-known and well-used measure of inequality or inequity. It is a unitless quantity bounded between zero and one. The Gini index quantifies how fairly a resource is distributed in a population. The closer the Gini index is to zero, the closer the situation is to perfect equity. A Gini index of one would occur if one population member held all of the resource, marking the worst inequality possible. The Gini index is commonly used in economics to summarize the distribution of household income or wealth, and it continues to find applications in a wide variety of fields.

A second, lesser-known statistical measure is also credited to Gini. It is now referred to as the Gini mean difference, or the GMD. It is a measure of spread that has also found applications in a variety of fields beyond statistics. When the GMD was first defined and discussed, it was in competition with the standard deviation for prominence as a measure of dispersion. While the standard deviation is unquestionably a vital measure of spread, some experts now believe the GMD deserves more recognition; see, for example, Yitzhaki (2003) and Gerstenberger and Vogel (2015).

This chapter considers both the Gini index and GMD and notes their relationship to each other, their applications, and their properties. A function known as the Lorenz curve facilitates the discussion and provides the option of understanding various quantities geometrically. Various models of the Lorenz curve and wealth distributions are discussed in the context of applications. Other measures of spread and measures of inequity are noted as well, including the standard deviation and the coefficient of variation.

65 Applications of the Gini Index Beyond Economics and Statistics


Gini’s Measures and the Lorenz Curve

The Gini index and the GMD have both been known for over a century. There are many ways to define the Gini index and GMD, both in the case of discrete random variables and in the case of continuous ones. The aptly named article More than a Dozen Alternative Ways of Spelling Gini (Yitzhaki 1997) describes many equivalent definitions of the two quantities. In order to develop a geometric intuition regarding the quantities involved, the Gini index is defined in this chapter in terms of the Lorenz curve and integrals of continuous functions. This is a common way to define the Gini index.

Consider some quantity of interest distributed in a population. The options for both the quantity and the population are vast, and some options are noted in the beginning of the section on applications. The Gini index in this situation can be defined in terms of a function known as the Lorenz curve.

Definition 1. The Lorenz curve, \(L(p)\), is defined by \(L(p) = q\) if and only if the poorest \(p \times 100\) percent of the population own \(q \times 100\) percent of the quantity of interest.

It is clearly the case that \(L(0) = 0\) and \(L(1) = 1\) and \(L(p)\) is increasing on \([0, 1]\). It is also well known that the Lorenz curve is convex on \([0, 1]\) or possibly piecewise linear. The Lorenz curve \(L(p) = p\) is known as the line of perfect equity as it represents the situation when every member of the population has the same amount of the quantity of interest. Figure 1 below shows the Lorenz curve \(L(p) = p^3\) along with the line of perfect equity. The Gini index can be defined in terms of the Lorenz curve.

Definition 2. If \(L(p)\) is the Lorenz curve describing the distribution of the quantity of interest in a population, then the Gini index, \(G\), is defined as

\[ G = 2\int_0^1 (p - L(p))\,dp. \]
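As a rough numerical illustration of Definition 2, the sketch below approximates the integral with a midpoint rule. The function name `gini_from_lorenz` and the step count are assumptions made for this example only; they do not come from the chapter.

```python
def gini_from_lorenz(lorenz, steps=100_000):
    """Approximate G = 2 * integral_0^1 (p - L(p)) dp with the midpoint rule."""
    h = 1.0 / steps
    area = 0.0
    for i in range(steps):
        p = (i + 0.5) * h               # midpoint of the i-th subinterval
        area += (p - lorenz(p)) * h     # strip of the region between the curves
    return 2.0 * area


# The line of perfect equity L(p) = p gives a Gini index of (numerically) zero.
equity_gini = gini_from_lorenz(lambda p: p)

# The Lorenz curve L(p) = p^3 of Fig. 1 gives 2 * (1/2 - 1/4), close to 0.5.
cubic_gini = gini_from_lorenz(lambda p: p ** 3)
print(equity_gini, cubic_gini)
```

Any other increasing convex function with L(0) = 0 and L(1) = 1 can be passed in the same way.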

Simply put, the Gini index quantifies how far the situation is from perfect equity. Geometrically, it is twice the area trapped between the line of perfect equity and the Lorenz curve. See Fig. 2 for an example. The factor of 2 in the defining formula is a scaling factor that forces the Gini index to fall between 0 and 1 inclusive. As noted in the introduction, the closer the Gini index is to 0 the more equitable the distribution of the resource.

Fig. 1 The Lorenz curve \(L(p) = p^3\) and the line of perfect equity \(L(p) = p\)

In contrast, the GMD is more typically defined as an expected value. Let \(X\) be the amount (or number of units) of the quantity of interest held by a randomly selected population member. Let \(\mu\) be the mean number of units of the quantity held by a population member. Assume \(X\) is a continuous random variable with distribution function \(F(x)\). Thus \(F(x)\) is the proportion of the population holding at most \(x\) units of the quantity of interest. The derivative of the distribution function is the density function \(f(x)\). In the context described, it is reasonable and convenient to assume \(X \ge 0\). (Yitzhaki (1997) explains how to handle the situation when the random variable is permitted to have negative values.) The original definition of the GMD is the mean or the expected absolute difference between two identical, independent copies of \(X\).

Definition 3. Let \(X_1\) and \(X_2\) be two identical, independent copies of the random variable \(X\). Then the GMD is defined as follows:

\[ \mathrm{GMD} = E(|X_1 - X_2|) = \int_0^\infty \int_0^\infty |x_1 - x_2|\, f(x_2) f(x_1)\,dx_2\,dx_1. \]

It was Gini himself (see Dalton (1920)) who first noted the following fundamental relationship between the GMD and the Gini index:

\[ \mathrm{GMD} = 2\mu G. \tag{1} \]
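Equation (1) can be checked on a small sample. The sketch below is only an illustration under stated assumptions: the income values are made up, the GMD is averaged over all ordered pairs (including a member paired with itself, which mirrors two independent draws), and the Gini index is taken from the piecewise-linear Lorenz curve of the sample. Many software packages divide by n(n - 1) instead, in which case the identity holds only approximately.

```python
def mean(values):
    return sum(values) / len(values)


def gmd(values):
    """Mean absolute difference over all ordered pairs (i, j), including i == j."""
    n = len(values)
    return sum(abs(a - b) for a in values for b in values) / (n * n)


def gini(values):
    """Gini index computed from the piecewise-linear Lorenz curve of a sample."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    area = 0.0          # area under the Lorenz curve, one trapezoid per member
    cumulative = 0.0
    for x in xs:
        previous = cumulative
        cumulative += x
        area += (previous + cumulative) / (2 * n * total)
    return 1.0 - 2.0 * area     # same as 2 * integral of (p - L(p))


incomes = [12, 30, 45, 45, 80, 150]     # hypothetical incomes, any unit
print(gmd(incomes), 2 * mean(incomes) * gini(incomes))   # the two values agree
```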


Fig. 2 The geometric interpretation of the Gini index, \(G\), for the Lorenz curve \(L(p) = p^3\)

Equation (1) may seem surprising given that the Gini index was defined in terms of the Lorenz curve \(L(p)\) and the GMD in terms of the density function \(f(x)\). However, the relationship can be made more transparent with a few observations. Note that \(\int_0^x t f(t)\,dt\) is the amount held by those with at most \(x\) units, averaged over the whole population. Denote \(p = F(x)\) so that \(x = F^{-1}(p)\) and observe

\[ L(p) = \frac{\int_0^{F^{-1}(p)} t f(t)\,dt}{\int_0^{\infty} t f(t)\,dt} = \frac{1}{\mu}\int_0^{F^{-1}(p)} t f(t)\,dt. \tag{2} \]

Let \(s(p) = L'(p)\). Note that if equation (2) is differentiated then

\[ \frac{dL}{dp} = s(p) = \frac{x}{\mu}. \tag{3} \]
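Equations (2) and (3) suggest a direct recipe: integrating the quantile function \(F^{-1}\) from 0 to p and dividing by the mean gives L(p). The sketch below follows that recipe numerically. The function name, the midpoint rule, and the uniform quantile function used as input are assumptions chosen for illustration.

```python
def lorenz_from_quantile(quantile, p, steps=20_000):
    """Approximate L(p) = (1/mu) * integral_0^p F^{-1}(t) dt with the midpoint rule."""

    def integral(upper):
        h = upper / steps
        return sum(quantile((i + 0.5) * h) for i in range(steps)) * h

    mu = integral(1.0)          # the mean is the integral of F^{-1} over [0, 1]
    return integral(p) / mu


def uniform_quantile(t, b=300_000):
    """Hypothetical example: F(x) = x / b on [0, b], so F^{-1}(t) = b * t."""
    return b * t


# For this input s(p) = x / mu = 2p, so the Lorenz curve is L(p) = p^2.
print(lorenz_from_quantile(uniform_quantile, 0.5))   # close to 0.25
```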

It is clear from equation (3) that the Lorenz function \(L\) is indeed convex or possibly piecewise linear, since \(x = F^{-1}(p)\) does not decrease as \(p\) increases. Equations (2) and (3) also establish how to convert between dealing with \(p\) and the Lorenz function \(L(p)\) and dealing with \(x\) and \(F(x)\). This conversion process was established by multiple authors including Gastwirth (1971). The relationships between \(x\) and \(p\) derived in equation (3) can be used to give a fairly quick proof of equation (1). Indeed, if \(p = F(x)\), then for the purpose of substitution in integrals, \(dp = f(x)\,dx\). Denote \(dp = f(x_1)\,dx_1\), \(dq = f(x_2)\,dx_2\), \(L'(p) = s(p)\), \(L'(q) = s(q)\), and transform the GMD from an integral in terms of \(x_1\) and \(x_2\) into an integral in terms of \(p\) and \(q\).

\[
\begin{aligned}
\mathrm{GMD} = E(|X_1 - X_2|) &= \int_0^\infty \int_0^\infty |x_1 - x_2|\, f(x_2) f(x_1)\,dx_2\,dx_1 \\
&= 2\int_0^\infty \int_0^{x_1} (x_1 - x_2)\, f(x_2) f(x_1)\,dx_2\,dx_1 \\
&= 2\mu \int_0^1 \int_0^p (s(p) - s(q))\,dq\,dp \\
&= 2\mu \int_0^1 \left(s(p)\,p - L(p)\right) dp \\
&= 2\mu \left(\frac{1+G}{2} - \frac{1-G}{2}\right) \\
&= 2\mu G.
\end{aligned}
\]

Equation (1) confirms that the Gini index is indeed a unitless quantity. Another consequence is that the GMD is at most \(2\mu\). The relationship between the Gini index and GMD might remind the reader of the relationship between the standard deviation and coefficient of variation, CV. This will be discussed further in the next section.

The Standard Deviation and Coefficient of Variation

The standard deviation is the best-known measure of spread and its importance as a parameter of the normal distribution guarantees that this will always remain the case. The usual definition of the standard deviation is as the square root of the mean square distance to the mean. Specifically:

Definition 4. The standard deviation, \(\sigma\), is defined by

\[ \sigma^2 = E\left((X - \mu)^2\right). \]

By definition, the standard deviation measures spread about the mean and is nonnegative. It is known to be sensitive to outliers. It will be more sensitive to outliers than the GMD because of the squaring of differences noted in the definition.


If the standard deviation is close to zero, then there is little spread from the mean. However, it can attain arbitrarily large values. The standard deviation can be defined without mention of the mean, specifically

\[ \sigma^2 = \frac{1}{2} E\left((X_1 - X_2)^2\right), \tag{4} \]

where \(X_1\) and \(X_2\) are the same as in the definition of the GMD. This form makes the relationship between the GMD and the standard deviation more apparent. To see that equation (4) holds, note

\[
\begin{aligned}
E\left((X_1 - X_2)^2\right) &= E\left(((X_1 - \mu) - (X_2 - \mu))^2\right) \\
&= E\left((X_1 - \mu)^2\right) + E\left((X_2 - \mu)^2\right) - 2E\left((X_1 - \mu)(X_2 - \mu)\right) \\
&= E\left((X_1 - \mu)^2\right) + E\left((X_2 - \mu)^2\right) = 2\sigma^2
\end{aligned}
\]

since \(E((X_1 - \mu)(X_2 - \mu)) = 0\) due to independence. The GMD can be bounded above by a multiple of the standard deviation. Specifically:

\[ \mathrm{GMD} \le \frac{2\sigma}{\sqrt{3}}. \]

One proof of this result can be found in Piesch (2005). This inequality cannot be improved in general since equality is attained for the uniform distribution. However, it can be shown that this inequality can be improved upon if additional information is known (La Haye and Zizler 2019).

The coefficient of variation can be easily defined in terms of the standard deviation.

Definition 5. The coefficient of variation, CV, is defined by

\[ \mathrm{CV} = \frac{\sigma}{\mu}. \]

The coefficient of variation is undefined if the mean is equal to zero. Like the Gini index, it is a unitless quantity. However, while the Gini index is bounded between zero and one, the CV has no upper bound. It is possible to express the CV in terms of the derivative of the Lorenz function \(L'(p) = s(p)\) as follows:

\[ \mathrm{CV}^2 + 1 = \int_0^1 s^2(p)\,dp. \tag{5} \]

Indeed

\[
\begin{aligned}
\int_0^1 s^2(p)\,dp &= \int_0^1 (s(p) - 1 + 1)^2\,dp \\
&= \int_0^1 (s(p) - 1)^2\,dp + \int_0^1 2(s(p) - 1)\,dp + 1 \\
&= \int_0^1 (s(p) - 1)^2\,dp + 2\left(L(p) - p\right)\Big|_0^1 + 1 \\
&= \int_0^1 (s(p) - 1)^2\,dp + 1 \\
&= \int_0^\infty \left(\frac{x}{\mu} - 1\right)^2 f(x)\,dx + 1 \\
&= \mathrm{CV}^2 + 1.
\end{aligned}
\]

With the Gini index, GMD, standard deviation and CV now introduced, applications can be discussed.
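The identities of this section can be sanity-checked on a finite sample. The sketch below is an illustration, not the authors' computation: it treats the sample as the whole population (dividing by n rather than n - 1), averages over all ordered pairs for X1 and X2, and uses the fact that for a sample s(p) is a step function taking the value x/μ on an interval of width 1/n for each member. The function name and the data are assumptions.

```python
import math


def spread_summary(values):
    """Compute both sides of equations (4) and (5) and the bound GMD <= 2*sigma/sqrt(3)."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    sigma = math.sqrt(variance)
    pairs = [(a, b) for a in values for b in values]   # all ordered pairs
    return {
        "variance": variance,
        "half_mean_sq_diff": sum((a - b) ** 2 for a, b in pairs) / (2 * n * n),
        "gmd": sum(abs(a - b) for a, b in pairs) / (n * n),
        "gmd_bound": 2 * sigma / math.sqrt(3),
        "cv_squared_plus_one": (sigma / mu) ** 2 + 1,
        # integral of s(p)^2: each member contributes (x/mu)^2 over a width of 1/n
        "integral_s_squared": sum((x / mu) ** 2 for x in values) / n,
    }


summary = spread_summary([12, 30, 45, 45, 80, 150])   # hypothetical data
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```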

Applications of the Gini Index and GMD

Economists have long used the Gini index to gauge inequality in such things as income, wealth and land ownership. Its applications extend well beyond economics though. Below are but a few examples. The Gini index has been used to gauge inequality in water use in China (Wang et al. 2012) and to understand the nitrous oxide emissions from a bioenergy landscape (Saha et al. 2018). In the field of astronomy researchers proposed using the Gini index to measure the inequality with which a galaxy’s light is distributed (Abraham et al. 2003). Researchers in genetics used it to describe the distributions of expression levels of different genes between tissues or between cell lines (O’Hagan et al. 2018).

Measuring spread is better understood by the general population, as it is discussed in every introductory statistics class. As noted when the GMD was defined, the argument that has not yet been settled is whether the GMD is a more appropriate measure of spread in some instances than the standard deviation. At the very least, the GMD, through the Lorenz curve, allows for a better geometric understanding of situations than the standard deviation.

Below, two applications are introduced and explored in more detail: using the Gini index to measure household income inequality, and using the Gini mean difference to measure contrast in a grayscale image. In both cases the Lorenz curve plays a fundamental role in the discussion.

Table 1 Recent Gini Index Values for Household Incomes

Country          Gini Index
Sweden           0.249
Canada           0.321
United States    0.45
Mexico           0.482
Guatemala        0.53
South Africa     0.625

Society and Household Income Inequity

Consider the incomes of US households. According to Jantzen and Volpert (2012), in the year 2009, the poorest 80 percent of American households earned under 50 percent of the total household income. In terms of the corresponding Lorenz function, this translates as L(0.8) < 0.5. Table 1 shows the Gini index for household incomes for various countries at a given time. The data is drawn from the CIA World Factbook in January 2020 (www.cia.gov/library/publications/resources/the-world-factbook/fields/223rank.html). In calculations X would represent the income of a randomly selected household, and p = F(x) would be the proportion of households with income at most x.

Anyone wishing to use or understand the Gini index in the context of household income inequality must also appreciate its limitations. It is a single number used to summarize a complex situation, and thus it can be a blunt tool to gauge inequality. Furthermore, in the specific case of income, one must be careful when comparing Gini indices. The following concerns, for example, were noted in Evelyn Lamb’s 2012 article Ask Gini: How to Measure Inequality (https://www.scientificamerican.com/article/ask-gini/):

1. The data used to create the most recent Gini index on income often comes from different years for different countries. Consequently the Gini index for one country might be 5 years or so older than the Gini index for the country that it is compared to.
2. The methodology used to calculate the Gini index for one country might differ radically from another. The US Census Bureau usually reports the Gini index on income based on pretax numbers, while many foreign countries employ post-tax numbers. Post-tax numbers include the redistribution of income from rich to poor and thus reduce the Gini index. A potential solution to this problem is discussed in a later section.
3. The Gini index does not report the total household income of a country. Two countries could have the same Gini index, and one country could be quite rich in income and the other country desperately poor in income.

Despite these limitations, the Gini index is arguably the most well-used measure of income inequality. The examples below outline a few common income distribution models as well as showcase the utility of the Lorenz function. It is worthwhile to note that the Lorenz curve L(p) can be piecewise linear; take for example a society with two possible incomes.

Uniformly Distributed Incomes: Consider the uniform distribution function

\[
F(x) = \begin{cases} 0, & x < a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ 1, & x > b \end{cases}
\]

in the context of household income. Suppose the interval \([a, b]\) is \([0, 300{,}000]\), so that household incomes are uniformly distributed over this interval. Therefore, half of the society has income at most $150,000, and the other half have income between $150,000 and $300,000. Every income interval of width $3,000 represents one percent of the population. Then \(F(x) = x/300{,}000\), \(0 \le x \le 300{,}000\), and it follows from equation (3) that the resulting Lorenz curve is \(L(p) = p^2\) for \(0 \le p \le 1\). Despite the obvious inequality between, say, the poorest one percent and the richest one percent, the Gini index is equal to 1/3, so it is closer to zero than it is to one. This value is lower than the Gini index for the United States. The standard deviation in this situation is about $86,600, while the GMD is equal to $100,000.

The uniform distribution of income and the Lorenz curve \(L(p) = p^2\) can arise from a discrete setting as well. The following example was noted by Catalano et al. (2009). Suppose people in a society are lined up at random and assigned an income according to the following rule: The kth person in line is given \(2k - 1\) thousand dollars. If there are n people in total, then the total income T, in thousands of dollars, is

\[ T = \sum_{k=1}^{n} (2k - 1) = 2\,\frac{n(n+1)}{2} - n = n^2. \]

It follows that \(L\left(\frac{k}{n}\right) = \frac{k^2}{n^2}\). If n is large then it is natural to model this scenario by the continuous Lorenz function \(L(p) = p^2\).

Distribution of Incomes Based on the Pareto Principle: The Pareto principle claims

that, in many instances, 20 percent of the causes are responsible for 80 percent of the effect. This idea inspires a family of Lorenz curves called the Pareto distribution Lorenz curves. Consider a society where the richest 20 percent of the households


hold 80 percent of the wealth. Assume that this phenomenon repeats within itself, in a fractal-like manner, meaning 20 percent of the richest 20 percent hold 80 percent of the 80 percent of wealth, and so on. Generalizing the above with arbitrary constants R and P instead of 0.20 and 0.80, we have the following relationships, as noted by Jantzen and Volpert (2012):

\[ L(1 - R) = 1 - P, \qquad L(1 - R^2) = 1 - P^2, \qquad L(1 - R^k) = 1 - P^k. \]

Setting \(x = 1 - R^k\), then \(1 - x = R^k\) and \(P^k = (R^k)^{\ln P / \ln R}\). Thus we obtain

\[ L(x) = 1 - (1 - x)^{\ln P / \ln R}. \]

Thus, any arising Lorenz curve \(L(p)\) is of the form \(L(p) = 1 - (1 - p)^q\), \(0 \le q \le 1\), reflecting the Pareto distribution. Equations (2) and (3) allow for the translation to the probability distribution function. Indeed, consider the particular example when \(L(p) = 1 - (1 - p)^{1/3}\), with the mean \(\mu\) left unspecified. Using equation (3) and solving for \(p = F(x)\) yields

\[ F(x) = 1 - \left(\frac{\mu}{3}\right)^{3/2} x^{-3/2}. \]

In order for \(F(0) = 0\) and the total probability under the distribution to be 1, we have

\[ F(x) = 1 - \left(\frac{\mu}{3}\right)^{3/2} x^{-3/2}, \qquad \frac{\mu}{3} \le x < \infty. \]

If \(f(x) = F'(x)\) then it is simple to check that \(\int_{\mu/3}^{\infty} f(x)\,dx = 1\) as it should be. Calculating \(\int_{\mu/3}^{\infty} x f(x)\,dx\) confirms the mean is indeed equal to \(\mu\). One can also confirm the standard deviation is infinite. The corresponding Gini index is equal to 0.5, while the resulting GMD is equal to \(\mu\).

Returning to the general case, suppose a proportion \(\alpha\) of a certain society earns a proportion \(\beta\) of all the income of the society and this phenomenon repeats itself in the same fractal-like manner. Then

\[ \alpha L(p) = L(\beta p) \]


for all \(p \in [0, 1]\), which yields \(L(p) = p^{\lambda}\) for all \(p \in [0, 1]\) with \(\lambda = \log_{\beta}(\alpha)\). We can calculate the corresponding Gini index

\[ G = 2\int_0^1 \left(p - p^{\lambda}\right) dp = \frac{\lambda - 1}{\lambda + 1} = \frac{\ln(\alpha/\beta)}{\ln(\alpha\beta)}. \]

For the case of \(\alpha = 0.2\) and \(\beta = 0.8\), we have \(G \approx 0.76\), a fairly high Gini index. The Pareto principle can be thought of as a sort of right-sided similarity for income distribution. It is worth noting that Jantzen and Volpert (2012) construct a model for the Gini index based on self-similar behavior on both the low and high ends of the income spectrum. Their hybrid model has the form \(L(p) = p^a\left(1 - (1 - p)^b\right)\) for suitable values of a and b.

Adding people with no income to a society: If people with no income are added

to a population (such as babies), then the effect on the Lorenz function can be easily established. Suppose \(L(p)\) is the Lorenz curve of the original society. Adding babies shifts and scales the original Lorenz curve. More specifically, suppose that the society increases its population by \(a \times 100\) percent with no additional increase in total income. Then the mean in the new society is \(\frac{\mu}{1+a}\). The new Lorenz curve \(L^*(p)\) would be

\[
L^*(p) = \begin{cases} 0 & \text{for } 0 \le p \le \dfrac{a}{a+1} \\[2mm] L\left((1 + a)\left(p - \dfrac{a}{a+1}\right)\right) & \text{for } \dfrac{a}{a+1} < p \le 1. \end{cases}
\]

In this situation the Gini index will increase while the mean decreases. The new Gini index is given by \(G^* = \frac{G + a}{1 + a}\) and the new Gini mean difference, \(\mathrm{GMD}^*\), is

\[ \mathrm{GMD}^* = \frac{\mathrm{GMD}}{(1 + a)^2} + \frac{2a\mu}{(1 + a)^2}. \]
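The two formulas above can be tested on a toy population. In the sketch below the incomes, the number of newcomers, and the helper name `summarize` are all hypothetical; the Gini index is taken from the piecewise-linear Lorenz curve of the sample and the GMD is averaged over all ordered pairs, as in Definition 3.

```python
def summarize(values):
    """Return (mean, Gini index, GMD) for a list of nonnegative incomes."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    area = 0.0          # area under the sample Lorenz curve (trapezoids of width 1/n)
    cumulative = 0.0
    for x in xs:
        previous = cumulative
        cumulative += x
        area += (previous + cumulative) / (2 * n * total)
    gini = 1.0 - 2.0 * area
    gmd = sum(abs(u - v) for u in xs for v in xs) / (n * n)
    return total / n, gini, gmd


incomes = [20, 35, 50, 95]        # hypothetical original society
newcomers = 2                     # members added with zero income
a = newcomers / len(incomes)      # relative population increase, here 0.5

mu, g, d = summarize(incomes)
mu_new, g_new, d_new = summarize(incomes + [0] * newcomers)

print(g_new, (g + a) / (1 + a))                    # new Gini index, two ways
print(d_new, (d + 2 * a * mu) / (1 + a) ** 2)      # new GMD, two ways
```

Varying `newcomers` shows the behavior described in the text: with this G = 0.3 the GMD first rises as zero-income members are added.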

If \(G < 1/2\) then letting in folk without an income (\(a \approx 0\)) will initially increase the GMD, and when the proportion of these folk a reaches the value of \(1 - 2G\) and beyond, the new GMD will begin to decrease. The maximum new GMD occurs when \(a = 1 - 2G\) and the corresponding new GMD is then equal to \(\frac{\mu}{2(1 - G)}\). If, on the other hand, \(G \ge 1/2\), then the new GMD will always decrease.

The case for standard deviation is similar. Calculations show the new standard deviation is given by

\[ \sigma^* = \sqrt{\frac{\sigma^2}{1 + a} + \frac{a\mu^2}{(1 + a)^2}}. \]

If the original coefficient of variation is less than one, then letting in a few people that have no income will initially increase the standard deviation until the proportion of such people reaches \(a = \frac{1 - \mathrm{CV}^2}{1 + \mathrm{CV}^2}\). Then the standard deviation begins to decrease. The maximal standard deviation occurs for the above a value, and the corresponding standard deviation \(\sigma^*_{\max}\) is then given by

\[ \sigma^*_{\max} = \sqrt{\frac{(1 + \mathrm{CV}^2)\sigma^2}{2} + \frac{(1 - \mathrm{CV}^4)\mu^2}{4}}. \]

If \(\mathrm{CV} \ge 1\) then the standard deviation is always decreasing.

Merging Societies in General: Merging general societies together is significantly

more complicated in the context of Gini measures of spread than in terms of the standard deviation and coefficient of variation. Consider two societies, society A with mean income \(\mu_1\) and standard deviation \(\sigma_1\) and society B with mean income \(\mu_2\) and standard deviation \(\sigma_2\). Merge those two societies together and obtain a combined society with mean income \(\mu\) and standard deviation \(\sigma\). Let \(p_1\) be the proportion of society A in the combined society and let \(p_2\) be the proportion of society B in the combined society. Thus \(\mu = p_1\mu_1 + p_2\mu_2\). Finding the Gini index or GMD of the combined society in terms of the Gini indices of the individual societies is very hard; only in certain cases can one obtain a nice formula (like the case above where people with no income were added to the society). A powerful feature of standard deviation is the fact that there is a straightforward formula for the standard deviation in the new combined society in terms of the standard deviations of the individual societies. Namely,

\[ \sigma^2 = p_1\sigma_1^2 + p_2\sigma_2^2 + p_1(\mu_1 - \mu)^2 + p_2(\mu_2 - \mu)^2. \tag{6} \]
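Equation (6) is easy to confirm numerically. The sketch below compares the formula with a direct computation on the pooled data; the two societies are made-up lists and the variances are population variances (dividing by n), which is the convention under which the formula is exact.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance (divide by n, not n - 1)."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def merged_variance(society_a, society_b):
    """Variance of the combined society from equation (6)."""
    n1, n2 = len(society_a), len(society_b)
    p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)     # population shares
    mu1, mu2 = mean(society_a), mean(society_b)
    mu = p1 * mu1 + p2 * mu2                    # mean of the combined society
    return (p1 * variance(society_a) + p2 * variance(society_b)
            + p1 * (mu1 - mu) ** 2 + p2 * (mu2 - mu) ** 2)


society_a = [20, 35, 50, 95]      # hypothetical incomes
society_b = [10, 10, 60]
print(merged_variance(society_a, society_b), variance(society_a + society_b))
```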

This is a magnificent property of standard deviation as a measure of spread. Let \(L_1\) and \(L_2\) be the Lorenz curves for the society A and B, respectively, and let \(L\) denote the Lorenz curve for the combined society. Similarly, consider the derivatives of the Lorenz curves \(s_1\), \(s_2\), and \(s\) for the respective societies. Using the definition of covariance and equations (5) and (6) yields the following property involving definite integrals of the squares of the derivatives of the respective Lorenz functions:

\[ \mu^2 \int_0^1 s^2(p)\,dp = p_1\mu_1^2 \int_0^1 s_1^2(p)\,dp + p_2\mu_2^2 \int_0^1 s_2^2(p)\,dp. \]

The next section considers a completely different type of application of the Gini index and GMD. This next application was discussed in La Haye and Zizler (2018).

Contrast in Grayscale Images

Consider the distribution of shades of gray in a grayscale image. Instead of income, X would be the intensity of a randomly selected point in the picture. By convention, 0 ≤ X ≤ 1, where black is assigned intensity 0, and as the shade of gray gets lighter, the corresponding value of X gets closer to 1. The brightest white has intensity 1. The mean intensity would be μ and F(x) would represent the proportion of points with intensity at most x. Certainly contrast is more than just shading to an artist or photographer, but there is some subjectivity in converting color pictures to grayscale or clearing up grayscale images (such as in medical imaging). In such cases a numeric quantification of contrast can be valuable.

Unlike the income scenario, there is a maximum value that X can attain, namely, the value of one. Another difference is that one can argue that the choice to make black zero and white one is arbitrary. In a grayscale picture, black is no better than white. The corresponding Lorenz curve L(p) for a grayscale picture gives the proportion of the picture’s total intensity contributed by the darkest p × 100 percent of its points.

The Gini index is not a suitable measure of contrast. For example, consider a picture that is all black except one pixel is white. The Gini index would be maximal, the mean intensity would be essentially 0, yet intuitively the contrast should be low. The standard deviation and the GMD in this instance would be better measures of contrast as their values would be very close to 0. The standard deviation and GMD have another property making them more suitable measures of contrast in a grayscale picture. This property can be seen in the following example.

The negative image of a grayscale picture: Let L(p) be the Lorenz function for a grayscale picture and let μ, G, GMD, σ, and CV be the mean intensity, Gini index, GMD, standard deviation, and coefficient of variation in this scenario. Let L∗(p) be the Lorenz curve for the negative image. In the negative image, the black is replaced by white and in general the pixel with intensity x has the intensity replaced by 1 − x. (See Fig. 3 for an example.)


Fig. 3 A grayscale picture and its negative image

If \(\mu\) is the mean intensity in the original picture, then the new picture has mean intensity \(1 - \mu\). Furthermore, the following relationship holds between the Lorenz curves of the original image and the negative image:

\[ L(p)\,\mu + \left(1 - L^*(1 - p)\right)(1 - \mu) = p. \]

Let the superscript “\(*\)” denote the attributes of the negative image. The definition of the GMD and equation (4) show that both the GMD and the standard deviation for the negative image are the same as in the original grayscale picture. Intuitively this seems like a reasonable property of contrast. However, since the mean did change, both the coefficient of variation and the Gini index change in the negative image. By the definition of the coefficient of variation and equation (1)

\[ \mathrm{CV}^* = \frac{\mu}{1 - \mu}\,\mathrm{CV} \qquad \text{and} \qquad G^* = \frac{\mu}{1 - \mu}\,G \]

where G is the Gini index of the original image. This property would generalize as follows, beyond grayscale images. Suppose M is the maximum quantity value that can be attained. Suppose every population member has their quantity x replaced by \(M - x\). Refer to this as the negative image population. The Lorenz function for the negative image population can be stated in terms of the original Lorenz function. Indeed, consider a population and its negative image. If the first population has mean wealth \(\mu\), then its negative image has mean \(M - \mu\). The poorest \(p \times 100\) percent of the original society are the richest \(p \times 100\) percent of the negative image society. Suppose \(L(p)\) is the Lorenz function of the original society and \(L^*(p)\) is the Lorenz function for the negative image society. Then we have

\[ L(p)\,\mu + \left(1 - L^*(1 - p)\right)(M - \mu) = pM. \]

Therefore

\[ \mathrm{CV}^* = \frac{\mu}{M - \mu}\,\mathrm{CV} \]

where CV is the coefficient of variation of the original image. Calculations show the Gini index \(G^*\) for the negative image is given by

\[ G^* = \frac{\mu}{M - \mu}\,G \]

where G is the Gini index of the original image.

Fig. 4 A value scale has a uniform distribution of pixel intensities

The Uniform Distribution in a grayscale image: In terms of a grayscale picture,

the uniform distribution has an interpretation that is well-known in art education. Students are often asked to create a value scale with a pencil by starting with the darkest shade of black they can render and go through lighter and lighter shades of gray until they end at the white of the paper. See Fig. 4. The distribution function in this case would be

\[
F(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases}
\]

and the mean intensity is given by \(\mu = 0.5\). The corresponding Lorenz function is \(L(p) = p^2\), and therefore the Gini index is \(G = \frac{1}{3}\), while the GMD is also \(\frac{1}{3}\). Note that \(\sigma \approx 0.29\) and the \(\mathrm{CV} \approx 0.58\).

A black and white image: The analogue of a two income society in a grayscale image would be an image consisting of only two shades of gray. Consider a black and white picture that is \(a \times 100\) percent white and the rest is black. Then \(\mu = a\) and the corresponding Lorenz function L is given by

\[
L(p) = \begin{cases} 0, & \text{for } 0 \le p \le 1 - a \\ \dfrac{1}{a}\,(p + a - 1), & \text{for } 1 - a \le p \le 1. \end{cases}
\]

Then \(G = 1 - a\), \(\mathrm{GMD} = 2a(1 - a)\), and \(\sigma = \sqrt{a(1 - a)}\). The GMD value is a maximum when \(a = 0.5\). Note that the standard deviation is also maximal when \(a = 0.5\) and both quantities are then equal. Maximal contrast occurs when half of the picture is white and half is black.

Fig. 5 A grayscale image and a version where all intensities have been increased by a fixed amount

Raising the intensity of all points by the same amount: Suppose a grayscale picture

is too dark with all point intensities low (no white in the picture). If every intensity is increased by the same fixed amount (so that no intensity would go above the maximum value of 1), then intuitively it seems the spread of intensity (contrast) will be unchanged. (Fig. 5 has this property.) It is easy to see using the definitions of the GMD and standard deviation that the GMD and standard deviation would not be changed in such a situation. The mean does increase though, and thus by equation (1) and the definition of the coefficient of variation both the Gini index and the CV will change as well. Specifically, if every pixel’s intensity is increased by K (K chosen so that no pixel intensity would be larger than 1), then the new mean will be \(\mu + K\), and the new Gini index will be \(\frac{\mu}{\mu + K}\) times the old one. Similarly the new CV will be \(\frac{\mu}{\mu + K}\) times the old one.
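The behavior of the four measures under the negative-image and brightening transformations can be seen on a tiny example. The sketch below is an assumption-laden illustration: the "image" is a short, made-up list of pixel intensities, the function name is hypothetical, and the Gini index is obtained from the GMD through equation (1) rather than from a Lorenz curve.

```python
import math


def contrast_measures(pixels):
    """Return (mean, Gini index, GMD, standard deviation) of pixel intensities."""
    n = len(pixels)
    mu = sum(pixels) / n
    gmd = sum(abs(u - v) for u in pixels for v in pixels) / (n * n)
    sigma = math.sqrt(sum((u - mu) ** 2 for u in pixels) / n)
    gini = gmd / (2 * mu)          # equation (1): GMD = 2 * mu * G, needs mu > 0
    return mu, gini, gmd, sigma


pixels = [0.0, 0.1, 0.1, 0.25, 0.4, 0.6]    # hypothetical dark image
negative = [1 - x for x in pixels]          # intensity x becomes 1 - x
K = 0.3                                     # brightening amount, keeps values <= 1
brighter = [x + K for x in pixels]

mu, g, d, s = contrast_measures(pixels)
mu_neg, g_neg, d_neg, s_neg = contrast_measures(negative)
mu_up, g_up, d_up, s_up = contrast_measures(brighter)

print(d, d_neg, d_up)      # the GMD is the same for all three images
print(s, s_neg, s_up)      # so is the standard deviation
print(g, g_neg, g_up)      # the Gini index changes with the mean
```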

Other Lorenz-Inspired Measures of Spread and Inequality

The Gini index, CV, GMD, and standard deviation are by no means the only measures of inequality and spread considered by researchers. The measures of spread and inequality in this section have been around for quite some time but are not well-known. They also have nice geometric interpretations in the Lorenz curve. Another measure of spread defined in terms of expected values is the following.

Definition 6. The mean absolute deviation from the mean, MAD, is defined by \(\mathrm{MAD} = E(|X - \mu|)\).

The MAD is related to a measure of inequality called the Robin Hood index. The Robin Hood index is usually defined in terms of the Lorenz curve.

Definition 7. The Robin Hood index, H, is \(H = F(\mu) - L(F(\mu))\).

Note that this is the maximum vertical distance from the line of perfect equity to the Lorenz curve. Indeed, \(p - L(p)\) is greatest when \(L'(p) = s(p) = 1\), and equation (3) means that occurs when \(x = \mu\) and hence \(p = F(\mu)\). In the context of considering the distribution of some quantity among a population, the Robin Hood index gives the proportion of the total amount of the quantity that must be taken from those with more than a mean amount of the quantity and redistributed to those with less than a mean amount to attain perfect equity. The percentile value \(p = F(\mu)\) also has an interpretation. It is the cutoff between those population members having less than the mean amount of the quantity and those having more than the mean amount of the quantity. The relationship between the MAD and H will now be established, using similar techniques to those used in the proof of equation (1).



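\[
\begin{aligned}
E(|X - \mu|) &= \int_0^\infty |x - \mu|\, f(x)\,dx \\
&= \mu \int_0^1 |s(p) - 1|\,dp \\
&= \mu \left[\int_0^{F(\mu)} (1 - s(p))\,dp + \int_{F(\mu)}^1 (s(p) - 1)\,dp\right] \\
&= 2\mu\left(F(\mu) - L(F(\mu))\right) \\
&= 2\mu H.
\end{aligned}
\]

Thus

\[ \mathrm{MAD} = 2\mu H. \tag{7} \]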
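Equation (7) can also be checked on a sample. In the sketch below the data and function names are hypothetical. For a sample the Lorenz curve is piecewise linear, so the largest gap p - L(p) is attained at one of the vertices p = k/n, which is where the code looks.

```python
def robin_hood_index(values):
    """Largest gap p - L(p), checked at the vertices p = k/n of the sample Lorenz curve."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    gap, cumulative = 0.0, 0.0
    for k, x in enumerate(xs, start=1):
        cumulative += x
        gap = max(gap, k / n - cumulative / total)
    return gap


def mean_absolute_deviation(values):
    """MAD = E(|X - mu|), treating the sample as the whole population."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)


incomes = [12, 30, 45, 45, 80, 150]     # hypothetical incomes
mu = sum(incomes) / len(incomes)
print(mean_absolute_deviation(incomes), 2 * mu * robin_hood_index(incomes))
```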


The Robin Hood index and equation (7) have been known for some time. Indeed, Gastwirth (1972) discusses the Robin Hood index (but not by that name) and its relation to the MAD. He notes that the Robin Hood index was proposed in the 1930s by different statisticians, namely, Yntema and Pietra. It did not become popular like the Gini index and has been rediscovered and renamed several times. Alternate names for the Robin Hood index include the Pietra ratio, the Hoover index, and the Shutz index. In the case of a society with only two possible incomes H = G. Indeed when ⎧ 2x0 p 1 ⎪ ⎪ for 0 ≤ p ≤ ⎨ x0 + x1 2 L(p) = p − x 2x x 1 ⎪ 1 0 1 ⎪ ⎩ + for < p ≤ 1. x0 + x1 x0 + x1 2 x1 −x0 0 it is the case that H = 1/2 − x0x+x = 2(x . 1 0 +x1 ) The property noted by the standard deviation in equation (4) has no analogue with the GMD. In fact

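\[
E(|X-\mu|) \ge \tfrac{1}{2}\,E(|X_1-X_2|),
\]

i.e.

\[
\mathrm{MAD} \ge \tfrac{1}{2}\,\mathrm{GMD},
\]

with equality if and only if L(p) = p for all p. Indeed, note that the difference F(μ) − L(F(μ)) is the maximal value of p − L(p), and therefore, from the definition of G, it must be the case that G < 2H (strict inequality) unless L(p) = p for all p ∈ [0, 1].

A median M can replace the mean μ in the expression above. In that case M = s(0.5)μ, and an argument similar to the proof of equation (7) yields

\[
E(|X-M|) = \mu \int_0^1 \left| s(p) - \frac{M}{\mu} \right| dp
= \mu\left[\int_0^{0.5} (1-s(p))\,dp + \int_{0.5}^{1} (s(p)-1)\,dp\right]
= 2\mu\,(0.5 - L(0.5)).
\]

In the middle step the constant M/μ has been replaced by 1; this is legitimate because any constant contributes equal and opposite amounts on the two halves [0, 0.5] and [0.5, 1].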

The quantity E(|X − M|) is called the mean deviation from the median. Observe that all we need to know is L(0.5) and the mean μ to be able to evaluate the mean deviation from the median. It is known that E(|X − M|) ≤ E(|X − μ|),


R. La Haye and P. Zizler

Fig. 6 The Robin Hood index and the measure 1/2 − L(1/2)

see Kendall et al. (1987). Figure 6 shows the Lorenz curve L(p) = p³, along with the vertical heights F(μ) − L(F(μ)) and 1/2 − L(1/2). Observe that s(F(μ)) = 1 when F(μ) = √(1/3) ≈ 0.5774, with H = F(μ) − L(F(μ)) ≈ 0.3849 and 1/2 − L(1/2) = 0.3750.

Another measure of inequality developed by Farris (2010) is based on the Lorenz curve as noted below. Suppose a quantity of interest is distributed throughout a population. Think of lining up the population members poorest in the quantity to richest in the quantity, with ties broken arbitrarily. Define a new random variable Q, as follows: Pick a unit of the quantity held by the population at random and let q be the percentile of the population member holding said unit of the quantity. In a perfectly equitable society, every population member would be equally likely to be holding the selected unit of the quantity. In a sharply unequal society, the selected unit is far more likely to be held by a population member rich in the quantity (so at a high percentile).


Note that L(p) is the distribution function for this random variable and s(p) is the density function. In the case of a perfectly equitable income distribution, s(p) = 1 for all p ∈ [0, 1]. The expected value q of Q under the share density function s(p) can be thought of as the percentile level of the individual who earns the “average” unit of the quantity. In particular,

\[
q = \int_0^1 p\,s(p)\,dp
\]

and

\[
q = \frac{G+1}{2}. \tag{8}
\]
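Equation (8) can be checked numerically. The sketch below is illustrative only: it reuses the Lorenz curve L(p) = p³ from Fig. 6, approximates the integrals with a midpoint rule, and the function names are hypothetical choices.

```python
def integrate(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))


def lorenz(p):
    return p ** 3          # example Lorenz curve L(p) = p^3


def share_density(p):
    return 3 * p ** 2      # s(p) = L'(p)


# G = 2 * integral of (p - L(p)); q = integral of p * s(p).
gini = 2 * integrate(lambda p: p - lorenz(p), 0.0, 1.0)
q = integrate(lambda p: p * share_density(p), 0.0, 1.0)
print(gini, q, (gini + 1) / 2)
```

For this curve the exact values are G = 1/2 and q = 3/4, so the “average” unit is held at the 75th percentile.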

The measure q has an interesting geometric interpretation in terms of the Lorenz curve. It is the area between the Lorenz curve and the horizontal line y = 1. The following can also be used to visualize the value q. The point q is located on the p-axis in such a way that the area below the Lorenz curve L from p = 0 to p = q is equal to the area above the Lorenz curve L (bounded by the horizontal line y = 1) from p = q to p = 1. See Fig. 7, where the two equal areas are indicated.

Further Modeling with the Lorenz Curve and Gini Index

Included below are further examples of modeling with the Lorenz curve and Gini index in the context of household income.

Equalization and the Gini Index

Consider a population with total income W, population size N, and mean income μ. Suppose the society loses proportion (1 − α) of its highest-income people but distributes their income equally among the remaining individuals. Let L_N denote the new resulting Lorenz curve and s_N denote its derivative; then s_N(p) = αs(αp) + (1 − L(α)). To see this, let X̄ = W/N denote the old mean, X̄_N = W/(αN) the new mean, and x_N the new income of an individual whose old income was x. Observe

\[
\frac{x_N}{\bar{X}_N} = \frac{x + \dfrac{(1-L(\alpha))\,W}{\alpha N}}{\dfrac{W}{\alpha N}} = \alpha\,\frac{x}{\bar{X}} + (1-L(\alpha)).
\]


Fig. 7 The geometric interpretation of the measure q

As a result, L_N(p) = L(αp) + (1 − L(α))p.

Suppose instead that the society taxes income at proportion α and then distributes the taxed income equally among all of its members. Then X̄_N = X̄ and therefore s_N(p) = (1 − α)s(p) + α, resulting in L_N(p) = (1 − α)L(p) + αp. To see this observe

\[
\frac{x_N}{\bar{X}_N} = \frac{(1-\alpha)\,x + \dfrac{\alpha W}{N}}{\dfrac{W}{N}} = (1-\alpha)\,\frac{x}{\bar{X}} + \alpha.
\]

The new Gini is

\[
G_N = (1-\alpha)\,G.
\]

Recall that one of the limitations of the usage of the Gini index in the household incomes situation was that, when comparing the Ginis of two countries, one might use pre-tax income data while the other uses post-tax income data in their respective Gini index calculations. The above discussion suggests how to adjust for this effect for the sake of comparison. It is possible to combine the above effects imposed on incomes in a society and track the resulting changes in the Gini index and other indices along the way. For example, suppose a society increases its population by a × 100 percent by people who have zero wealth (babies) and then taxes its population by α × 100 percent. Then the new resulting Gini index G_N in terms of the old one G is given by

\[
G_N = \frac{1-\alpha}{1+a}\,(a+G).
\]

The reader may trace the new GMD and the new standard deviation in this case as an exercise.
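The tax and the added-babies effects on the Gini index can be traced on a finite population. This is a minimal Python sketch under stated assumptions: the sample incomes, tax rate, and helper names are hypothetical, and the Gini index is computed as the mean absolute difference over all ordered pairs divided by twice the mean.

```python
def gini(values):
    """Gini index of a finite population: mean absolute difference over all
    ordered pairs (including a member paired with itself), over twice the mean."""
    n = len(values)
    mu = sum(values) / n
    gmd = sum(abs(x - y) for x in values for y in values) / (n * n)
    return gmd / (2 * mu)


def tax_and_redistribute(values, alpha):
    """Tax each income at rate alpha and share the proceeds equally."""
    mu = sum(values) / len(values)
    return [(1 - alpha) * x + alpha * mu for x in values]


incomes = [10, 20, 30, 40, 100]   # hypothetical data
alpha = 0.25                      # tax rate
babies = 5                        # zero-income newcomers
a = babies / len(incomes)         # population growth proportion, here a = 1

g_old = gini(incomes)
g_taxed = gini(tax_and_redistribute(incomes, alpha))
g_both = gini(tax_and_redistribute(incomes + [0] * babies, alpha))

print(g_taxed, (1 - alpha) * g_old)
print(g_both, (1 - alpha) * (a + g_old) / (1 + a))
```

With these numbers the original Gini index is 0.4, taxation alone lowers it to 0.3, and adding the zero-income members before taxing gives 0.525, in agreement with the two formulas for G_N.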

The Golden Equity

Suppose two people are randomly selected and the lower income of the two people is recorded as Y^min. Then Y^min is a random variable. Let Ȳ^min indicate the expected value of Y^min. It was shown (Farris 2010) that

\[
\frac{\bar{Y}^{\min}}{\mu} = 1 - G.
\]

One can see this as follows. Let X be the random variable indicating the income of an individual and let μ be the mean income in a society. Let f(x) be the density function for the income and let

\[
F(x) = \int_0^x f(t)\,dt
\]

be the cumulative distribution function for X. Choose randomly two incomes and denote by Y^min the lower value. Then P(Y^min < x) = 1 − (1 − F(x))(1 − F(x)), and

\[
\bar{Y}^{\min} = \int_0^{\infty} 2x\,f(x)\,(1-F(x))\,dx.
\]

Setting p = F(x) with dp = f(x)dx and s(p) = x/μ yields, by equation (8),

\[
\frac{\bar{Y}^{\min}}{\mu} = \int_0^{\infty} 2\,\frac{x}{\mu}\,f(x)\,(1-F(x))\,dx = \int_0^1 2s(p)\,(1-p)\,dp = 2 - 2q = 1 - G.
\]

Now let the random variable Y^max indicate the higher of the two incomes, and let Ȳ^max denote the expected value of Y^max. It follows that P(Y^max < x) = F(x)², and

\[
\frac{\bar{Y}^{\max}}{\mu} = \int_0^{\infty} 2\,\frac{x}{\mu}\,f(x)\,F(x)\,dx = \int_0^1 2s(p)\,p\,dp = 2q = 1 + G.
\]

Observe now

\[
\frac{\bar{Y}^{\min}}{\bar{Y}^{\max}} = \frac{1-G}{1+G}.
\]
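For a finite population, the expected lower and higher incomes can be computed exactly by enumerating all equally likely ordered pairs. The sketch below is illustrative; the incomes are hypothetical, and it assumes the two people are drawn with replacement, which matches the independence used in the derivation above.

```python
from itertools import product

incomes = [10, 20, 30, 40, 100]   # hypothetical data
n = len(incomes)
mu = sum(incomes) / n

# All n*n equally likely ordered pairs (sampling with replacement).
pairs = list(product(incomes, repeat=2))

# Gini index as the mean absolute difference divided by twice the mean.
gmd = sum(abs(x - y) for x, y in pairs) / len(pairs)
G = gmd / (2 * mu)

mean_min = sum(min(x, y) for x, y in pairs) / len(pairs)
mean_max = sum(max(x, y) for x, y in pairs) / len(pairs)

print(mean_min / mu, 1 - G)
print(mean_max / mu, 1 + G)
print(mean_min / mean_max, (1 - G) / (1 + G))
```

Here G = 0.4, the expected lower income is 24 = 0.6μ, and the expected higher income is 56 = 1.4μ.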

This ratio has the following geometrical interpretation. It is the ratio of the area under the Lorenz curve L to the area above the Lorenz curve, bounded by the horizontal line y = 1.

One can bring the omnipresent golden ratio into this discussion as well. Clearly the utopia of equally distributed income is not desirable, as it stunts innovation and progress. One could attempt to define a new equality in income that could retain some of the drive for innovation in a society. Suppose two people meet randomly in a society and compare their incomes. If the income Gini index in the society is too low, then the higher-earning individual loses the zeal for trying. On the other hand, if the Gini index is too high, the lower-earning individual might feel too discouraged to try to catch up with the higher-earning one. Define a golden equity in incomes as follows. Suppose the lower-earning person measures himself or herself in relation to the higher-earning individual the same way as the higher-earning individual measures himself or herself in relation to the whole. The result is a golden equation


\[
\frac{1-G}{1+G} = \frac{1+G}{2},
\]

which yields the golden Gini index, G = √5 − 2 ≈ 0.24. By equation (8), the golden Gini index forces q to be equal to φ − 1, where φ = (1 + √5)/2 ≈ 1.618 is the golden ratio. As a percentage, q in this instance is equal to about 61.8%. Thus we can conclude that the average dollar is earned by the (φ − 1) × 100 percentile of the population in this ideal society. The golden Gini index also has a nice geometrical interpretation using the Lorenz curve L. The area between the curve L and the horizontal line y = 1 is equal to φ − 1. If the Lorenz curve were of the form L(p) = p^λ, then λ = φ.

Consider the Gini index data given in Table 1. Sweden reported a Gini index of around G = 0.25. This value is close to the golden Gini index. On the other hand, the US Gini index in Table 1 was reported to be around G ≈ 0.45. As a result,

\[
\frac{1-G}{1+G} \approx 0.38 \quad\text{and}\quad \frac{1+G}{2} \approx 0.73.
\]
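The golden equation is equivalent to the quadratic G² + 4G − 1 = 0, whose positive root is √5 − 2. The following short Python sketch (variable names are illustrative) checks this root together with the claims about q and the exponent λ, using G = (λ − 1)/(λ + 1) for a Lorenz curve of the form p^λ.

```python
import math

phi = (1 + math.sqrt(5)) / 2

# Positive root of G**2 + 4*G - 1 = 0, i.e. of (1 - G)/(1 + G) = (1 + G)/2.
golden_gini = math.sqrt(5) - 2

# Equation (8): percentile of the "average" unit.
q = (golden_gini + 1) / 2

# For L(p) = p**lam the Gini index is (lam - 1)/(lam + 1); solve for lam.
lam = (1 + golden_gini) / (1 - golden_gini)

print(round(golden_gini, 4))
print(q, phi - 1)
print(lam, phi)
```

The same computation shows that the ratio of the expected lower income to the expected higher income in the golden scenario is (1 − G)/(1 + G) = 1/φ.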

Golden Academia

It is sometimes a practice in academia to increase the income of professors based on age alone. Let T be the mean time until retirement and let r denote the continuous growth rate of income per year. Assume a uniform distribution of people across the age brackets. Therefore, letting t ∈ [0, T], we have p = t/T. Assume without loss of generality that the initial income at t = 0 is one dollar, and thus the income at age t is given by e^{rt}. The Lorenz function can be found to be

\[
L(p) = \frac{e^{rTp}-1}{e^{rT}-1}
\]

and thus

\[
G = 2\int_0^1 (p - L(p))\,dp = \frac{rT\,e^{rT} + rT - 2e^{rT} + 2}{rT\,\bigl(e^{rT}-1\bigr)}.
\]

For example, let r = 0.04; that is, assume a continuous growth rate of income of 4% per year in this academic setting. This might be a fairly reasonable assumption in many university settings. With T = 35 years of service, the Gini index is G ≈ 0.23, which is very close to the golden equity scenario.
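The closed form for G can be evaluated directly and cross-checked against a numerical integration of 2(p − L(p)). This is a minimal sketch; the function names are hypothetical and the midpoint rule is merely one convenient choice of quadrature.

```python
import math


def academia_gini(r, T):
    """Closed-form Gini index for incomes e**(r*t) with t uniform on [0, T]."""
    k = r * T
    return (k * math.exp(k) + k - 2 * math.exp(k) + 2) / (k * (math.exp(k) - 1))


def academia_gini_numeric(r, T, steps=100_000):
    """Cross-check: G = 2 * integral of (p - L(p)) over [0, 1], midpoint rule."""
    k = r * T
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        total += p - (math.exp(k * p) - 1) / (math.exp(k) - 1)
    return 2 * h * total


g = academia_gini(0.04, 35)
print(round(g, 3), round(math.sqrt(5) - 2, 3))
```

With r = 0.04 and T = 35 the closed form gives G ≈ 0.226, compared with the golden Gini index of about 0.236.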


Having noted various measures of inequality and spread, it is worth noting some desirable properties of such statistical quantities. Many of the properties noted below have come up in previous sections.

Summary of Desirable Properties of Measures of Inequality and Spread

Many of the properties below were discussed in the context of applications. Most of these properties were discussed in Hurley and Ricard (2009) as desirable attributes of measures of sparsity.

It is valuable that a measure of spread or inequality be bounded for ease of comparison. By their definitions, the Gini index and Robin Hood index are both bounded between 0 and 1. In contrast, while the GMD and MAD are not universally bounded, both lie between 0 and 2μ. On the other hand, the coefficient of variation and the standard deviation are not bounded, with the Pareto distribution showing that the standard deviation can be infinite with a finite mean.

It seems intuitive that if the units of the quantity of interest were converted to another unit, for example, dollars to euros, then the underlying inequality would be unchanged, and the measure of inequality should thus be unchanged. This unit conversion is a form of scaling. The Lorenz function is clearly unchanged by scaling; hence the Gini index, Robin Hood index, and coefficient of variation are invariant to scaling. The mean will change with scaling, as E(kX) = kE(X). By the definition of GMD, MAD, and standard deviation, these measures will change with scaling. This is not surprising, as they are quantities with units.

If every population member’s ownership of the quantity of interest is topped up by the same amount, then intuitively it seems the measure of spread should be unchanged. Indeed, think of a dot plot or histogram capturing the situation before and after the shift. The shape would remain unchanged, but the whole graph would be shifted. This means that spread is unaffected. The GMD, MAD, and standard deviation all remain the same under shifting. The mean does shift, though. Therefore, by equations (1) and (7), the Gini index and Robin Hood index will change, as will the coefficient of variation.
Specifically, if every member’s amount is increased by K, then the new mean will be μ + K, and by equations (1) and (7) the new Gini index and Robin Hood index will both be μ/(μ + K) times the old ones (meaning they will decrease if K > 0). Similarly, by its definition, the new CV will be μ/(μ + K) times the old one. This property was examined in the discussion of grayscale pictures, where the possibility of increasing all pixel intensities by the same fixed amount was analyzed. To confirm that it is a desirable property that a measure of inequality change with a shift, consider an extreme case. Suppose the Gini index is used to analyze inequality of incomes and the maximum household income in a society is one million dollars. Suppose everyone’s household income is happily increased by one billion dollars. Then intuitively the inequity is not as bad as it was, as the difference between a


household with income of one billion dollars and one with one billion plus one million dollars seems insignificant.

Another desirable property for a measure of inequality is sometimes known as the Robin Hood property. Not to be confused with the Robin Hood index, the Robin Hood property simply requires that taking the quantity of interest from richer population members and giving it to poorer population members, assuming we don’t make the rich poor and the poor rich, should decrease inequality. This is an intuitively desirable property. It is clear that no matter what transfer is done, the mean remains unchanged. Consider the more specific situation where the quantity of interest from members richer in it is given to those poor in it so that it does not change the order of the population when ordered poorest to richest. Then the proportion of the quantity owned by the poorest p × 100 percent of the population after the redistribution is as large as or larger than it was before redistribution. Thus, if L(p) is the Lorenz curve before the transfer and L∗(p) is the Lorenz curve for the society after the transfer from rich to poor, then L∗ ≥ L on [0, 1]. Hence the Gini index decreases in the population where the rich in the quantity have given to the poor in the quantity. Thus by equation (1) the GMD decreases as well.

If members with zero of the quantity of interest are added to the population (e.g., adding babies in the case of income or wealth), then clearly the inequity should increase. This holds true for the Gini index. Adding babies shifts and scales the Lorenz curve. This scenario was dealt with in the context of household incomes, where it was noted that while the Gini index increased, the situation with the GMD and standard deviation was more complex.

A property sometimes known as the Bill Gates property says that adding a population member with an extreme amount of the quantity of interest to the society will increase inequality.
In a sense, this is the opposite of the adding babies property. Obviously if this happens L(p) decreases for all percentiles lower than that of the added individual, who is at or near p = 1. There is an increase in the area trapped between p and L(p), and therefore the Gini index will increase, indicating the increased inequality. The mean will also increase and hence the GMD increases as well. The standard deviation would also increase. The next property was not noted by Hurley and Ricard (2009), but arose in the grayscale image discussion in the context of a negative image. Suppose M is the maximum quantity value that can be attained. Suppose every population member has their quantity x replaced by M − x. (This would change the total amount of the quantity held by the population and the mean.) As noted in the context of grayscale images, it does not change the standard deviation, GMD, or MAD, but it does change the Gini index (as well as the Robin Hood index) and coefficient of variation. A final property noted in Hurley and Ricard (2009) is the idea of combining copies of a society. If a society is duplicated and then combined with its duplicate, the inequality should clearly remain unchanged. Cloning does not change the Lorenz function as it deals with proportions of the total population. Therefore the Gini index, Robin Hood index and coefficient of variation are all unchanged in cloning. Moreover, the mean does not change either as the population size doubles along with


its total wealth. Consequently, the GMD, MAD, and standard deviation also remain the same. This seems an odd property, but it is useful in theory. The property can be appreciated in the context of a grayscale image. If a grayscale image is doubled in size, it would have no effect on any of the measures of spread or inequality discussed.
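The scaling, shifting, and cloning properties summarized above can be illustrated on a finite population. The sketch below is not from the chapter: the data and helper names are hypothetical, and the GMD is taken as the mean absolute difference over all ordered pairs, so that the Gini index equals GMD/(2μ).

```python
def gmd(values):
    """Gini mean difference over all ordered pairs of a finite population."""
    n = len(values)
    return sum(abs(x - y) for x in values for y in values) / (n * n)


def gini(values):
    """Gini index as GMD divided by twice the mean."""
    mu = sum(values) / len(values)
    return gmd(values) / (2 * mu)


incomes = [10, 20, 30, 40, 100]   # hypothetical data
mu = sum(incomes) / len(incomes)
k, K = 3, 60                      # scale factor and shift amount

scaled = [k * x for x in incomes]
shifted = [x + K for x in incomes]
cloned = incomes + incomes

print(gini(scaled), gmd(scaled))     # Gini unchanged, GMD multiplied by k
print(gini(shifted), gmd(shifted))   # Gini multiplied by mu/(mu + K), GMD unchanged
print(gini(cloned), gmd(cloned))     # both unchanged
```

Starting from G = 0.4 and GMD = 32, scaling by 3 leaves G at 0.4 and triples the GMD to 96, while adding 60 to every income keeps the GMD at 32 and lowers G to 0.4 · 40/100 = 0.16.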

Conclusions

The Lorenz curve is very useful in understanding the Gini index and the GMD. It is also amenable to modeling various scenarios, be it dealing with incomes in a country, pixel intensities in a grayscale image, or some other quantity distributed among a population. Furthermore, the Lorenz curve contains additional measures of spread and inequality, and its derivative contains information on the standard deviation and the CV. While Gini’s two measures differ only by a factor of twice the mean, that difference has a profound effect on the properties of each measure and hence on their applications. Statisticians and economists have cemented the Gini index’s role as a prominent measure of inequality, but consensus has not yet been reached regarding the value of the GMD. The standard deviation shares many properties with the GMD, but there are differences. Some of these differences and similarities were noted in this chapter. More research needs to be done to assess the true value of the GMD, and no doubt, while that research progresses, more and more applications will be found for the Gini index. Corrado Gini’s measures are still inspiring argument and applications long after he first proposed them.

References

Abraham R, van der Bergh S, Nair P (2003) A new approach to galaxy morphology: I. Analysis of the Sloan Digital Sky Survey early data release. Astrophys J 588:218–229. https://doi.org/10.1086/373919
Catalano M, Leise T, Pfaff T (2009) Measuring resource inequality: the Gini coefficient. Numeracy 2:1–22
Dalton H (1920) Measurement of the inequality of income. Econ J 30:348–361
Farris F (2010) The Gini index and measures of inequality. Am Math Mon 117:851–864
Gastwirth J (1971) A general definition of the Lorenz curve. Econometrica 39:1037–1039
Gastwirth J (1972) The estimation of the Lorenz curve and Gini index. Rev Econ Stat 54:306–316
Gerstenberger C, Vogel D (2015) On the efficiency of Gini’s mean difference. Stat Methods Appl 24:569–596
Giorgi G, Gubbiotti S (2017) Celebrating the memory of Corrado Gini: a personality out of the ordinary. Int Stat Rev 85:325–339
Hurley N, Ricard S (2009) Comparing measures of sparsity. IEEE Trans Inform Theory 55:4723–4741
Jantzen R, Volpert K (2012) On the mathematics of income inequality: splitting the Gini index in two. Am Math Mon 119:824–837
Kendall M, Stuart A, Ord J (1987) Kendall’s advanced theory of statistics, vol 1, 5th edn. Oxford University Press, New York
La Haye R, Zizler P (2018) The Gini index and grayscale images. Coll Math J 49:205–211
La Haye R, Zizler P (2019) The Gini mean difference and variance. Metron 77:43–52
O’Hagan S, Muelas M, Day P, D K (2018) GeneGini: assessment via the Gini coefficient of reference “housekeeping” genes and diverse human transporter expression profile. Cell Syst 6:230–244. https://doi.org/10.1016/j.cels.2018.01.003
Piesch W (2005) A look at the structure of some extended Ginis. Int J Stat 63:263–296
Saha D, Kemanian A, Montes F, Gall H, Adler P, Rau B (2018) Lorenz curve and Gini coefficient reveal hot spots and hot moments for nitrous oxide emissions. J Geophys Res-Biogeo 123:193–206
Wang X, Zhang J, Shahid S, AlMahdi E, He R, Wang X, Ali M (2012) Gini coefficient to assess equity in domestic water supply in the Yellow River. Mitig Adapt Strateg Glob Change 17:65–75
Yitzhaki S (1997) More than a dozen alternative ways of spelling Gini. Res Econ Inequality 8:13–30
Yitzhaki S (2003) Gini’s mean difference: a superior measure of variability for non-normal distributions. Metron 61:285–316

A Computational Music Theory of Everything: Dream or Project?

66

Guerino Mazzola

Contents
The World Formula: A Physical Theory of Everything (ToE)
The ToE in Contemporary Physics
Are Physicists Dreaming?
Is ToE Essentially a Mathematical Problem?
A Computational Music Theory of Everything (COMMUTE), a Mathematical Nightmare?
Arguments Against a COMMUTE
Individual Creativity
Colonialist Universalism
Uncontrollable Complexity
What Does “Computational” Mean in COMMUTE?
Some Directions Toward COMMUTE
Two Dimensions, Same Idea: Harmony and Rhythm
Understanding Harmony and Counterpoint via Gestures
Counterpoint Worlds for Different Musical Cultures
Unification of Mental and Physical Realities in Music: Introducing Complex Time
Unifying Note Performance and Gestural Performance: Lie Operators
Unifying Composition and Improvisation?
Conclusions
References

Abstract

This chapter outlines future perspectives of music as a cultural achievement of humans. We discuss the role of mathematics and physics in music from Pythagoras to string theory and music’s global human presence, transcending

G. Mazzola
School of Music, University of Minnesota, Minneapolis, MN, USA
e-mail: [email protected]
© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_90


specific fields of knowledge in its synthetic force that unifies distant fields of knowledge and action in the concrete and abstract realms. We discuss the idea of a Computational Music Theory of Everything (COMMUTE) that would parallel the physical project of a Theory of Everything (ToE).

Keywords World formula · Theory of everything · Music theory · Mathematical structures · Topos theory

The World Formula: A Physical Theory of Everything (ToE)

Before introducing the idea of a musical ToE, the physical ToE should be recalled. In physics, the ToE historically was preceded by the idea of a “world formula.” This magic wand concept means that the entire physical reality can be completely described by a big mathematical formula. First versions of such a formula are known as Democritus’s idea of an atomic construction of physics and, more specifically in its mathematical form, in the Pythagorean tetractys; see Fig. 1. This metaphysical concept was thought to describe the basic principle of the universe, a musical entity, built from the Greek holy number of ten stacked points, which specifies the consonant intervals (octave, fifth, and fourth) through their frequency ratios 2:1, 3:2, and 4:3, when played on a monochord with corresponding subdivisions of the string.

The wording “world formula” was introduced in 1872 by Emil Heinrich du Bois-Reymond in a talk entitled “Über die Grenzen der Naturerkenntnis” (On the Limits of Natural Science). He criticized the idea of a Laplace demon, as proposed by mathematician Pierre-Simon Laplace in 1814, who could know the state of the universe at any past or future time from the data at a given time and the differential equations that describe the universe. This world formula idea was restated in the 6th of the 23 problems listed by mathematician David Hilbert in his keynote at the International Mathematical Congress in Paris in 1900. Hilbert’s sixth problem asks if one could describe the entire physics by an axiomatic mathematical system. He was in fact restating the idea of a world formula in terms of contemporary mathematics. This problem is still unsolved and now would be called ToE. The

Fig. 1 The Pythagorean tetractys


thesis of a ToE was in fact preceded by the world formula idea by many centuries. Let us now discuss the shape of the ToE in contemporary physics.

The ToE in Contemporary Physics

In the perspective of contemporary physics, the ToE claims the integration of the four fundamental physical force types: electromagnetic, weak (showing up in radioactivity), strong (gluing protons together in atomic nuclei), and gravitation. The electromagnetic and weak forces are already united and called “electro-weak.” This is the result of the 1979 Nobel Prize winning research by Sheldon Glashow, Abdus Salam, and Steven Weinberg. The still hypothetical integration of the electroweak force with the strong force is called GUT: grand unification theory. Integration of forces means that ultimately, all these forces are special cases of a fundamental force, which splits into the four forces by a breaking of structural symmetries when energies are below a threshold.

Let us stress first that even the reduction to those four forces is anything but evident. For example, would you guess that the mechanical forces of, say, a hammer hitting a nail are of the same type as the forces of adhesion of a glue? Or the forces of chemical reactions? Or the force of sunlight tanning your skin? They all pertain to the electromagnetic force type. Recall that this force type is completely described by the famous four Maxwell equations established in 1865. The physical sciences and their outlets in chemistry have achieved an incredible reduction of the apparent variety of force types. This means that the surface of physical actions does not prevent a deep theory from unifying superficial diversities. We should keep this fact in mind when stepping over to the musical realm.

It is remarkable that “Everything” in physics relates to physical forces, but not to psychological or symbolic realities. This restriction is significant since it is different from any physicalist totalitarianism. Physics does not claim a total explanation of this world; physics deals with the outer nature and has never tried to reduce psychological or symbolic realities to physics.
Quite the opposite, prominent theoretical physicists, such as Roger Penrose (Fig. 2), argue that the innermost physical ontology might rely on mathematics. He also argues in Penrose (2002) that physics has not yet included the psychological realm of the human mind in its basic conceptual architecture. Despite the success of simplifying physical force types to three (electro-weak, strong, and gravitation), it is not clear why the ToE and even the GUT should work. And not all physicists subscribe to the ToE thesis; for example, Nobel Prize winner Robert Laughlin in his book “A Different Universe: Reinventing Physics from the Bottom” (Laughlin, 2005) argues that the physical laws may have different layers, not only fundamental laws. But most physicists seem to believe in an ultimate unification, probably because of some monotheist paradigm: there is only one innermost, well yes, divine, entity that shapes the universe. The success gives them enough motivation to work in this direction with an impressive shared social, organizational (the World Wide Web was invented by Tim Berners-Lee at CERN, the Conseil


Fig. 2 Theoretical Physicist Roger Penrose

Européen pour la Recherche Nucléaire, near Geneva, Switzerland, to coordinate nuclear research efforts globally), and economic effort called “Big Science.”

Although a mathematical description of the conditions and possibilities of a ToE is far beyond this chapter’s possibilities, we want to sketch a basic mathematical conflict between the two antipodes within present physics: general relativity theory and quantum mechanics. The geometric shape of general relativity is to view the universe’s space-time as a mathematical manifold which is endowed with a multilinear form, a tensor, which defines the manifold’s curvature, and thereby the geometric expression of gravitational forces. It is a thoroughly differentiable mathematical structure that is embedded in the classical theory of global calculus.

Opposed to this global picture of the universe, the microscopic behavior of physical objects, such as atoms and elementary particles (electrons, protons, neutrons, photons, etc.), is described by a completely different mathematical structure. The reality of physical objects here is defined by a very abstract “configuration space,” the Hilbert space H. This is a vector space with a scalar product, which is complete for the corresponding topology. The vectors ψ of length one of H are called the “states,” and an observable A on H is a special (“self-adjoint”) linear operator on H. The physical reality is no longer a specific space-time point, but the statistical expectation value for a measurement of an observable A in a determined state ψ. This expectation value is the value ⟨Aψ, ψ⟩ of the given scalar product. Intuitively this means that the observables are not directly perceived but only measured with respect to given states.

It is evident that these two types of mathematical structures are far from being unified. This makes plausible how difficult a ToE must be in view of the mathematical discrepancies of the gravitational and microscopic structure theories.
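The expectation value ⟨Aψ, ψ⟩ can be made concrete in the smallest nontrivial case. The following is a minimal Python sketch under stated assumptions: the Hilbert space is taken to be the two-dimensional complex space C², the observable is a diagonal self-adjoint matrix chosen only for illustration, and the helper names are hypothetical.

```python
import math


def apply(matrix, vector):
    """Matrix-vector product A*psi for small dense matrices."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def inner(u, v):
    """Scalar product <u, v>, linear in u and conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


# A self-adjoint operator on C^2 (equal to its own conjugate transpose).
A = [[1 + 0j, 0j],
     [0j, -1 + 0j]]

# A state: a vector of length one.
psi = [complex(math.sqrt(0.8)), complex(0.0, math.sqrt(0.2))]

norm_squared = inner(psi, psi).real
expectation = inner(apply(A, psi), psi)
print(norm_squared, expectation)
```

Because A is self-adjoint, the expectation value is real; here it equals 0.8 − 0.2 = 0.6 up to rounding, the average of the possible measurement outcomes +1 and −1 weighted by the squared magnitudes of the state’s components.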


Are Physicists Dreaming?

So what is the status of ToE in physics? Is it a dream or a project? Or a project of a dream? For many physicists it is a project; the activity of CERN is a reality. But for others it might be a nightmare, an expensive waste of energies, money, and intellectual potential. Let us be realistic: CERN and similar organizations of Big Science have been very successful so far. Let us just recall that the discovery of the Higgs boson at CERN in 2012 and the experimental confirmation of gravitational waves by the LIGO (Laser Interferometer Gravitational-Wave Observatory) and Virgo Scientific Collaboration (French-Italian collaborative measurement of gravitational waves) in 2016 are sensational steps toward that ToE ideal.

Suppose that ToE is impossible: are then all the efforts at CERN a waste of human energy? This question presupposes that one has proved the impossibility of ToE. Such a deep insight would in itself mean an incredible success of physical research. Physics is not only the pursuit of an idée fixe; it is a strong dialogue between theory and experiment, whatever its outcome may be. Such a strong dialogue can only be successful if it is based upon strong ideas, and ToE is such a strong idea. One would have to propose equally strong alternative ideas if one wanted to go in different directions. For example: What could Laughlin’s layering of physical laws look like in concreto?

Perhaps one should step back for a moment and rethink the world formula idea that stands at the origin of a ToE. This idea is a mathematical one: one searches for a mathematical expression for this universe’s physics. This search is not a physical experiment to be designed at CERN; it is a mental experiment, evolving in the a priori of mathematical conceptualization. Without calculus, no Maxwell equations are feasible. This means that ToE is also, and heavily so, an idea that challenges the conceptual architecture of such a potential world formula.
Galilei’s experimental mechanics were built without the language of calculus. Without the concept of an instantaneous velocity, the differential quotient of the spatial coordinate in time, those experiments could not converge into a valid mechanical theory. The role of mathematics in this ToE is a big point. If one has to perform experiments, such an endeavor must also be conceived within a valid formal framework, be it in terms of theoretical mathematics, but also relating to computer technology, which may in its mathematically shaped software control the Big Data flow generated by such experiments.

Is ToE Essentially a Mathematical Problem?

The above thoughts make clear that an important component of the ToE enterprise is the question about its mathematical form. Evidently, the latter is a necessary condition for a ToE. But could it also be its sufficient condition? Is it all only about finding the unifying mathematical language? Is there also a purely mathematical ToE?

G. Mazzola

A mathematical ToE cannot relate to experimentally perceived phenomena; it only refers to mathematical theories about different objects, structures, and concepts. Recall that set theory was a first Theory of Every possible Object, a ToEO. Category theory later was a Theory of Every possible Structure, a ToES. And Grothendieck’s topos theory was a unification of geometry, logic, and arithmetic, essentially the project of a mathematics of concepts, a Theory of Concepts (ToC). Its absorption of set theory and essentials of category theory might be interpreted as a strong step toward a mathematical ToE. It is not astonishing that one recent “Nobel Prize” in mathematics, the Fields Medal, was awarded to the theoretical physicist and string theorist Edward Witten in 1990. Progress in physics is becoming an option that involves heavy mathematics. This confirms the now widely accepted idea that physics in its substance shares a mathematical ontology. A ToE in physics seems to address a mathematical ToE.

A Computational Music Theory of Everything (COMMUTE), a Mathematical Nightmare?

In view of the fundamental role of music in all cultures (even where music is virtually forbidden, as under the Taliban, its force is recognized, and that is why it is forbidden, sad irony), it is not astonishing to try to imagine a unified view of music in the spirit of the physical ToE. Here is a first corresponding statement: Could it be that the variety of musical expressivity is the unfolding of a unique fundamental “force field”? Before we delve into this hypothesis of musical unification, we should understand that the wording COMMUTE, meaning Computational Music Theory of Everything, would not, similar to ToE, include strictly everything. This is also why the adjective “computational” was added. The idea is to think of a music theory that is computational; other theories might exist, but they are not addressed here. Moreover, it is also, similar to ToE, not intended to subsume all realities, the physical, psychological, or symbolic. Nevertheless, recall that the musical idea has been a driving force in the development of physics and astronomy from Pythagoras via Kepler to string theory. It is not evident how far a COMMUTE would connect to ToE, but one should keep in mind these deep relationships. It should also be considered important that the physical ToE is heavily based upon the conceptual architecture of mathematics. A similar configuration could be envisaged for a COMMUTE. The physical ToE is a precise hypothesis, the unification of all physical force types, independently of how this would work. For a COMMUTE the strict analogy to physics is problematic. Music does not share the simple idea of forces which are embodied by quanta (these are photons for the electromagnetic force, W and Z bosons for the weak force, gluons for the strong force, and hypothetical gravitons for gravitation).
Nevertheless, music theory has developed models which are in strict analogy to physical models where forces are mediated by quanta (bosons). For example, the core idea of Mazzola’s modulation model (Mazzola, 2002, Ch. 27.1) was an exact analogy to physical force quanta, musical symmetries being the analogy of forces, while modulation quanta (sets of pitches) are the analogy of physical quanta; see Fig. 3. If we are given two tonalities S and T, they must be in the same symmetry orbit of the group of transpositions and inversions. For example, both must be major scale tonalities. The force between S and T is a symmetry that transforms S into T. The quantum of this force is by definition a set M of notes that is invariant under this symmetry. This setup enables the exhibition of the modulation’s pivotal chords. But this is a special situation that cannot for the time being be generalized to general modulations.

Fig. 3 Physical quanta and musical quantum M for a modulation from tonality S to tonality T; k is the cadence

Another type of force, elastic deformation as described by local symmetries in physics, was used in the mathematical model of counterpoint (Counterpoint et al., 2015). Here, the dichotomy of consonances and dissonances was deformed by a symmetry in the role of an elastic force acting on the set of intervals. See below in the section about counterpoint worlds for more details. The force paradigm could enable a unifying force theory in music theory. The idea of forces in music (theory) is not new; see Larson (2012). Schoenberg already used the metaphor of erotic (!) forces to explain harmonic tension in Schönberg (1911). But there is no such universally acclaimed theoretical architecture as in physics; forces may show up (metaphorically or literally), but music has a number of very different structural paradigms:

• The local-global duality describes musical structures as “manifolds” that are glued together from local “charts.” A sonata is glued together from its movements; a movement can have the charts of exposition, development, recapitulation, and coda. The exposition might be glued together from melodic units, chords, etc. Mathematical Music Theory describes such manifolds as global compositions; their classification has been a major topic of the theory.
• Special cases of musical manifolds display pronounced geometric ideas, the harmonic Moebius strip, for example. This structure is associated with the covering of a major scale by its seven triadic degree chords I, II, ..., VII. For C major, we would have I = {C, E, G}, etc. The Moebius strip is generated by the intersections of these degrees: this is the nerve construction in algebraic topology. The fact that we get a Moebius strip (see Fig. 4), which is non-orientable, is the mathematical reason for the failure of classical Riemann harmony; see Mazzola (2002, Ch. 13.4.2.1) for details. Of course, such manifold structures are also present in physics, for example, the manifold structure of global space-time with its interpretation as a gravitational force field. But in music theory, the precise role of manifolds is not related to forces, but to semiotic aspects. We come back to this point below.
• Gestural and topological approaches are perspectives of music theory that share a view of music opposed to algebraic abstraction; they stress a movement in space and time that expresses geometric/gestalt rather than force-driven aspects of musical creativity. Musical architectures, such as large forms (sonata forms, blues forms, etc.), are not conceived as being force generators.

The musical concept architecture is more complex than the physical one. One crucial reason for such a complexity is the semiotic charge of musical objects, a phenomenon that is absent in physics.
Accordingly, a COMMUTE must take care of the dimension of meaning in music. Recall the classical insight that musical ontology is spanned by four dimensions: realities (physical, psychological, symbolic), communication (poiesis, neutral level, aesthesis), semiotics (expression, signification, content), and embodiment (facts, processes, gestures); see Fig. 5. Therefore a COMMUTE must try to take care of these coordinates, at least those which are accessible from the computational point of view. Before we further develop the idea of a COMMUTE, we want to set forth some important arguments against such a strong COMMUTE hypothesis.

Fig. 4 The intersection configuration of triadic degrees of a major scale is a non-orientable Moebius strip. It is generated by drawing lines between any two degrees that have notes in common and by drawing triangular surfaces for any three degrees with common notes

Fig. 5 The musical ontology has four dimensions: realities, communication, semiotics, and embodiment
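The nerve construction behind the harmonic Moebius strip of Fig. 4 can be explored with a few lines of code. The following is a minimal sketch, not the author's implementation: the function names `degree_triads` and `nerve` are hypothetical, and pitch classes are encoded as integers modulo 12, which is an assumption of this example. It counts the vertices, edges, and triangles generated by the seven triadic degrees of C major.

```python
from itertools import combinations

C_MAJOR = [0, 2, 4, 5, 7, 9, 11]  # pitch classes of the C major scale


def degree_triads(scale):
    """Triadic degrees I..VII: stack two scale thirds on every scale step."""
    n = len(scale)
    return [frozenset(scale[(i + j) % n] for j in (0, 2, 4)) for i in range(n)]


def nerve(cover, max_dim=3):
    """Simplices of the nerve: index tuples whose sets share a common element."""
    simplices = {}
    for dim in range(max_dim + 1):
        simplices[dim] = [
            idx
            for idx in combinations(range(len(cover)), dim + 1)
            if frozenset.intersection(*(cover[i] for i in idx))
        ]
    return simplices


triads = degree_triads(C_MAJOR)
simplices = nerve(triads)
counts = {dim: len(found) for dim, found in simplices.items()}
euler = sum((-1) ** dim * n for dim, n in counts.items())

print(counts)  # {0: 7, 1: 14, 2: 7, 3: 0}
print(euler)   # 0
```

The resulting Euler characteristic 7 - 14 + 7 = 0 is consistent with a Moebius strip, but it does not by itself establish non-orientability (a cylinder has the same value); for that, one would have to inspect how the seven triangles are glued along their edges.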

Arguments Against a COMMUTE

This section is not exhaustive, but focuses on three critical situations, where the quasi-scientific attitude of the COMMUTE hypothesis is viewed as an obstruction to core values in music.

Individual Creativity

To begin with, the individual creativity of a musical composer or improviser seems to forbid any “universalist” background. But individual creativity has not always been the backbone of artistic utterance. Before the Renaissance, a European artist was the waiter who would bring divine messages to the public, but never an autonomous agent. Genuine artistic creativity was introduced in the Renaissance, especially in the poetry of the Grand Testament by François Villon and in visual arts by the central perspective, which would focus on the individual artist’s position rather than on the divine view sub specie aeternitatis. This is the heritage of the Renaissance movement, which opposes to the Pythagorean “world formula” its irreducible individual genealogy of artistic creativity. This opposition to universalist principles is however not stringent. A painter, for example, may create a deeply individual work of art whose colors are completely described by the electromagnetic force that defines light and its action within the human eye. And on a higher level of structural abstraction, the variety of musical transformations can be described by a huge mathematical group, typically with a cardinality on the order of $10^{40}$; the number of transformations on the local score of the presto software (Mazzola, 1989–1994) is 10 445 260 466 832 483 579 436 191 905 936 640 000 ≈ $1.04453 \times 10^{37}$. This is all comprised within a very clear and unified conceptual architecture of theory, but such a virtually infinite number guarantees an unlimited variety of individual utterances. In other words, the individual point of view can subsist as a surface that is built upon universal background structures. The question here would be about the layer of reality, where the individual work is created, and about the hidden universal structures that enable such an individual expressivity. As a pianist, for example, one would fully accept the physical laws that are present in a grand piano, but one would nevertheless feel free to compose or improvise on the surface of the piano’s physical structure. Here we should stress again the limitation of COMMUTE to its computable reality, which may not include the human creative mentality. At present, no computational theory of human creativity is being developed, and this limit applies to the argument against COMMUTE.

Colonialist Universalism

Another argument against COMMUTE would be the suspicion that this thesis is a consequence of the Western (Christian) colonialist mentality. Are we trying to unite and thereby dominate all musical cultures under a big equalizing umbrella? And are we thereby destroying insurmountable differences? This is a delicate question since we already have many examples of a destructive reduction. For instance, when Arab Maqam music is transcribed to Western notation, all the essential pitch bend effects are eliminated. The same destruction takes place when transcribing a jazz saxophone solo of Archie Shepp to Western standard notation. Other examples of the same type are abundant. But such a translational pathology may be eliminated by a more diligent conceptualization. For example, the language of denotators and forms as developed in Mazzola (2002, Ch. 6) can describe musical objects much better than the traditional Western score. Denotators represent musical objects, while forms are the spaces where denotators live. This conceptual framework uses cutting-edge mathematics, especially topos theory, which is a marriage of geometry and logic, and has been used as a foundation of computer science and physics. At the bottom of the space architecture, we have so-called simple forms. Simple forms are mathematical spaces, such as the real line $\mathbb{R}$ for onset or duration, or the pitch class space $\mathbb{Z}_{12}$. But in our theory, concepts must all have an explicit name; this is an important difference to mathematics, where names are irrelevant. For example, a simple onset form would be notated by Onset : .Simple(R), and the duration form would be written as Duration : .Simple(R). One recognizes the difference to mathematics: same mathematical coordinates ($\mathbb{R}$), but different names in the musical setup. This approach uses three types of composed spaces built from already given spaces, which are products (also called limits), unions (also called colimits), and collections (also called power sets); see Fig. 6. These three types of composition are justified by topos theory and thereby guarantee the best conceptual architecture presently available.

1. Products describe spaces by a number of coordinate spaces, and points in product spaces must be described by a sequence of coordinate values. For example in a standard score, a denotator is a note in the product space whose coordinates are onset, pitch, loudness, and duration. Such a product form would be denoted by Note : .Limit(Onset, Pitch, Loudness, Duration).
2. Unions are spaces that also have a number of coordinate spaces, but now the idea is akin to a library. The coordinate spaces are like books, and a denotator of a union is a denotator in one of the given “coordinate books.” For example, in an orchestra, the instrumental parts are the books, and a note is a denotator in one of the given instrumental “books.” The form named orchestra is denoted by Orchestra : .Colimit(Instrument1, Instrument2, ...). It refers to a sequence of already defined forms (here Instrument1, Instrument2, ...).
3. Collection spaces refer to one given coordinate space, and a denotator here is just a set of denotators in one single coordinate space. For example, a chord is a collection of notes, which are denotators in the above product space of notes in a score. The musical motif form is denoted by Motif : .Powerset(Note), and a denotator is a set of notes.

Fig. 6 Composed spaces can be products, unions, or collections of already given spaces

This means that the question of destructive reduction could in some cases be answered by extending the given language to a state where some differences would be taken care of. For example, even the very different music notation of Japanese Noh theater can be expressed by denotators. The conceptual architecture of denotators has recently been extended to include denotators of (mathematical) gestures (Mazzola, 2018, Ch. 68) and therefore a large class of objects that are of gestural nature, not only module-theoretic as in the first setup of denotators in Mazzola (2002). This situation is delicate because the argument of an extended language seems to be purely formal; it would not touch the cultural differences of the role of music. For example, the African social role of music is radically different from its role in Western cultures and also different from the Indian role of Raga music, say. This aspect relates to music sociology or psychology. We do not include these aspects in the hypothesis of COMMUTE. For the time being such an inclusion seems too ambitious and also dangerous for the named reasons.
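To make the form constructions described above (simple forms, products, unions, collections) more tangible, here is a minimal sketch in Python. It is an illustration only, not the denotator implementation of Mazzola (2002): the class names mirror the form types of the text, but the `contains` method, the string-coded domains "R" and "Z12", the encoding of union denotators as (index, value) pairs, and the sample note values are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


def _is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


@dataclass(frozen=True)
class Simple:
    """Named simple form over a basic space ("R" or "Z12" in this sketch)."""
    name: str
    domain: str

    def contains(self, value) -> bool:
        if self.domain == "R":
            return _is_number(value)
        if self.domain == "Z12":
            return isinstance(value, int) and not isinstance(value, bool) and 0 <= value < 12
        raise ValueError(f"unknown domain {self.domain!r}")


@dataclass(frozen=True)
class Limit:
    """Product form: a denotator has one coordinate per factor form."""
    name: str
    factors: Tuple[object, ...]

    def contains(self, value) -> bool:
        return (isinstance(value, tuple) and len(value) == len(self.factors)
                and all(f.contains(v) for f, v in zip(self.factors, value)))


@dataclass(frozen=True)
class Colimit:
    """Union form: a denotator is (index, value) inside one chosen cofactor."""
    name: str
    cofactors: Tuple[object, ...]

    def contains(self, value) -> bool:
        if not (isinstance(value, tuple) and len(value) == 2):
            return False
        index, inner = value
        return (isinstance(index, int) and 0 <= index < len(self.cofactors)
                and self.cofactors[index].contains(inner))


@dataclass(frozen=True)
class Powerset:
    """Collection form: a denotator is a set of denotators of one form."""
    name: str
    element: object

    def contains(self, value) -> bool:
        return isinstance(value, frozenset) and all(self.element.contains(v) for v in value)


onset = Simple("Onset", "R")
duration = Simple("Duration", "R")  # same space as Onset, different name
pitch = Simple("Pitch", "R")
loudness = Simple("Loudness", "R")
pitch_class = Simple("PitchClass", "Z12")

note = Limit("Note", (onset, pitch, loudness, duration))
motif = Powerset("Motif", note)
orchestra = Colimit("Orchestra", (Limit("Instrument1", note.factors),
                                  Limit("Instrument2", note.factors)))

first = (0.0, 60, 80, 1.0)    # onset, pitch, loudness, duration (made-up values)
second = (1.0, 64, 80, 1.0)
```

With these definitions, `note.contains(first)`, `motif.contains(frozenset({first, second}))`, and `orchestra.contains((1, first))` all evaluate to true, while `onset == duration` is false although both forms share the coordinate space R, which mirrors the remark that names matter.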
Nevertheless, the argument that a language extension is a purely formal procedure without “colonialist” side effects is naive. It cannot be accepted without a diligent contextual inquiry, without discussing the language extension with specialists on both sides of the cultural interface. The prominent Ghanaian music semiotician Kofi Agawu has written important texts relating to failed Western scholarly appropriations of African music cultures (Agawu, 2003). This caveat does not mean that different music cultures are a priori incomparable. Recall that physical laws are also valid for galaxies that lie millions of light-years from our solar system. So why couldn’t Indian musics be realizations of humanly universal musical rule systems? On the level of music theory, relations between European counterpoint and Raga music are in fact appearing; we come back to this phenomenon when discussing counterpoint worlds for different musical cultures below. Perhaps one should also reflect upon the idea of “fundamentally incomparable cultures” in view of a separation and even the famous “clash” of civilizations. On the level of human rights, the vast majority of cultures have agreed on a shared canon, such as the habeas corpus principle. In the movie Teak Leaves at the Temples (Nugroho, 2007), Mazzola argued that the language of gestures might be an approach to a non-divisional and mutual understanding of musical cultures.

For these reasons, the COMMUTE thesis is not only a scientific program; it also pertains to the ethical idea of a unified humanity: humans should attempt to understand their basic unity and to overcome forces of separation.

Uncontrollable Complexity

A third argument against the COMMUTE thesis could be that music, even in its computational aspects, is too complex to be understood in its (computational) totality. One could also rephrase this argument, stating that whatever might be subsumed under a COMMUTE would be too simple and too marginal to be seriously considered a theory of everything. We already mentioned that very probably psychology and sociology of music might, for the time being, be unreachable by computational methods. This argument is supported by the complexity of the human psyche and mental stratum (mathematics, language, emotions, memory, and so forth). This argument is also met for the physical ToE, where the solution is found by excluding such realities from the physical roadmap. Can we simply copy the physical methodology and focus on the musical realm outside the human psyche and mental stratum? In physics this works because physics by definition works on the external reality, the Cartesian “res extensa.” But music does not live “out there” alone, although it has its physical dimension in acoustics and associated technology. But let us recall that physics also cannot be restricted to the external reality when it comes to physical formulas: the mathematical mentality is the most intimate part of physics; experiments within the “res extensa” cannot be performed without a huge amount of mathematics. Quantum mechanics, for example, is a mathematical theory that relates abstract objects (living in Hilbert space) to their experimental reality. And general relativity is a thorough mathematization of space-time, eliminating the concept of a physical force in favor of a purely geometric structure. This means that a ToE and a COMMUTE must both accept the mental reality of mathematics, a reality which pertains to the Cartesian “res cogitans,” which does not mean that ToE and COMMUTE would explain the mathematical reality. They simply presuppose it in their unfolding.
We are therefore stepping away from the psychological and sociological fields and have to reconsider the argument of excessive complexity of music aside from these two domains. There is one important aspect of musical ontology, which is not relevant in physics and does not, strictly speaking, pertain to the psychological or mathematical realities, namely, the semiotic dimension of music. Music is a meaningful system; it is/has a semiotic anatomy with its expressions, signification processes, and contents. For example, harmony is a classical semiotic structure, where the expressive surface of the score’s notes is given a content according to a harmonic mechanism. The Riemannian concept of tonality, for example, is such a harmonic device. It is built to assign to every chord expression one of the three content values: tonic, dominant, or subdominant. Another example is the classical Fuxian species counterpoint, where a semiotic syntax of consonant and dissonant intervals is constructed. Of course, this musical semiotics can also include psychological contents, but its overall architecture, specifically in music theory, is more formal. Consonant intervals, for example, do not refer to psychological contents; the dissonant fourth is the counterexample. Psychologically, it is consonant, but for Fuxian counterpoint, it is dissonant. Therefore the third argument reduces to the question whether musical semiotics is too far beyond a computational reality to be controlled. There are two answers to this challenge: The first answer regards semiotic structures in music theory, such as harmony, especially modulation theory, rhythm theory, melody theory, counterpoint, or serialism. In all these cases, and also in performance theory, which is in part a rhetorical enterprise, mathematical music theory has developed mathematically explicit models of semantic processes. The second answer regards the question about a mathematical theory of semiotics. This is a deep problem since the very process of creativity relies on extensions of a given semiotic; creativity is a semiotic phenomenon. If one aims at understanding creativity and its potential for becoming a computational field of knowledge, one has to solve the problem of a mathematical theory of semiotics. With regard to this challenge, the answer at present is as follows: there are efforts to build such a mathematical semiotics, for example, by Mazzola (2019). Summarizing this third argument’s discussion, one can say that a COMMUTE faces some challenges, but it is not a priori excluded from their solutions; efforts to overcome complexity in music theory using computational approaches, and especially in music semiotics, are being made.

What Does “Computational” Mean in COMMUTE?

Let us conclude this exposition with a discussion of the specification “computational” in COMMUTE. This means two things:

1. Such a music theory should cover all that is accessible by mathematical methods and concepts.
2. It should be accessible via computational engines such as computers.

This is a restriction of “everything” to “computational everything.” It has the same function as in physics: what is not accessible in this way is not physically relevant. It is simply an expression of modesty; one only considers topics that are “visible” to computation. Of course, as musical research progresses, more and more things may become “visible.” It is however risky to restrict one’s views to what is actually computable without reflecting upon the conditions of actual computation. For example, Stephen Hawking claimed that the concept of a God is superfluous in physics. He did not consider the language and spirit of mathematics as being a conditio sine qua non for physics. The computational principle is pre-physical, and Hawking forgot to ask where we get that language from. Radical neuroscientists might argue that mathematics is an artifact of neurons, i.e., a product of physical reality. But any proof thereof would use huge mathematical tools, which creates a circular argument, and it would not explain anything. In this sense, COMMUTE is a methodological limitation and should not be taken as a definition of music, but as a perspective, which can be tested and used to shape concrete progress. Nevertheless, the hypothesis is a very strong one, similar to ToE, or even more tricky because the conceptual landscape in music is less unified than in physics.

Some Directions Toward COMMUTE

In the following sections, a number of vectors toward such a COMMUTE are described, without claiming completeness. “Mathematical Music Theory” is abbreviated by “MaMuTh.”

Two Dimensions, Same Idea: Harmony and Rhythm

Harmony and rhythm have played very different roles in the history and cultural diversity of music. For example, recent research (Munyaradzi and Zimidzi, 2012) stresses the fundamental difference in harmony and rhythm between classical European and African music. The complexity of Western harmony corresponds to the complexity of African polyrhythms, and vice versa: the simplicity of Western rhythms corresponds to the simplicity of African harmony. This difference however does not mean that rhythm cannot be dealt with like harmony. In fact, both phenomena deal with periodic sets of events, octave periodicity in harmony vs. time periodicity of rhythm. In MaMuTh, Mazzola’s modulation model as described above is not limited to pitch, but can equally be applied to time; just rotate pitch by 90° into time. This has been used for rhythmic modulation in the first movement of the composition Synthesis; see Mazzola (2002, Ch. 50). This double periodicity has also been used to investigate periodic structures in pitch or time using the finite Fourier decomposition of periodic functions; see Amiot (2016). Fourier analysis of time functions is a classical theory (partials, fast Fourier transform, etc.), but the (octave-)periodic pitch functions have been analyzed only recently by David Lewin, Ian Quinn, and Emmanuel Amiot. This research suggests that harmony and rhythm could converge to a unified theory of periodic functions in a number of parameter spaces. And such a unification could eventually create a less diversified perspective of Western and African musical cultures. Of course, the difference between music sociology of Europe and Africa remains untouched: the African music culture is shared by everybody, from childhood to adult life; Africans are natural musicians; the European musician as a specifically educated person is not a standard role in Africa. This difference may vanish in the future if we learn to think in pitch and time according to the same theoretical and compositional paradigms.
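The shared periodicity of pitch and time can be illustrated with the finite Fourier decomposition mentioned above. The sketch below is a hedged illustration rather than the method of Amiot (2016): the function names `fourier_magnitudes` and `transpose` are hypothetical, and the 3+3+2 onset pattern in a bar of eight pulses is an example chosen here, not one taken from the text. The same function is applied to a pitch-class set (period 12) and to a rhythm (period 8).

```python
import cmath


def fourier_magnitudes(subset, period):
    """Magnitudes of the finite Fourier coefficients of the 0/1 indicator
    function of `subset`, viewed as a periodic subset of Z_period."""
    return [
        abs(sum(cmath.exp(-2j * cmath.pi * k * x / period) for x in subset))
        for k in range(period)
    ]


def transpose(subset, shift, period):
    """Shift every element by `shift` modulo `period`."""
    return {(x + shift) % period for x in subset}


# Pitch: the C major scale as pitch classes, period 12 (the octave).
c_major = {0, 2, 4, 5, 7, 9, 11}
pitch_spectrum = fourier_magnitudes(c_major, 12)

# Time: a 3+3+2 onset pattern in a bar of 8 pulses (illustrative choice).
rhythm = {0, 3, 6}
time_spectrum = fourier_magnitudes(rhythm, 8)

# For a diatonic scale the coefficient of index 5 dominates.
strongest = max(range(1, 7), key=lambda k: pitch_spectrum[k])
print(strongest, round(pitch_spectrum[5], 3))  # 5 3.732
```

The magnitudes are unchanged when the set is transposed in pitch or rotated in time, which is one reason they are convenient descriptors of periodic structure in either parameter space.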

Understanding Harmony and Counterpoint via Gestures

In MaMuTh, modulation theory and counterpoint were developed using symmetries on pitch class spaces. This algebraic setup had its limits. For modulation theory the pairing of tonalities was only permitted for tonalities in the same orbit of the symmetry group of transpositions and inversions. Modulation from a major tonality to a gipsy tonality or even a pentatonic one was not conceived. A similar restriction applies to contrapuntal concepts. The mathematical model of counterpoint (Mazzola, 2002, Part VII) used a finite number of consonances selected by specific symmetries. See below for a more detailed discussion of the contrapuntal model. It would break down if we had to consider an infinity of consonances in a continuous and therefore infinite interval model of counterpoint. Both restrictions could be solved by a shared new language: musical gestures. Modulation could be remodeled as a gestural deformation of chords instead of a symmetric action (Mazzola, 2017). Such a deformation no longer requires that the two tonalities live in the same symmetry orbit. Using gestures and their singular homology theory, counterpoint could be remodeled independently of the number of consonant intervals at stake (Counterpoint et al., 2015, Ch. 10). It is not possible here to give a detailed account of the gestural application to modulation and counterpoint. But we can give a hint about how singular homology can be related to gesture theory. Singular homology is based on singular simplices in topological spaces. A singular simplex σ is a continuous map $\sigma : \Delta^n \to X$ from an n-dimensional simplex $\Delta^n$ to a topological space X. This can also be stated as a continuous map $\Delta^1 \to \mathrm{Top}(\Delta^{n-1}, X)$ from the one-dimensional simplex to the topological space $\mathrm{Top}(\Delta^{n-1}, X)$ of continuous maps $f : \Delta^{n-1} \to X$. Continuing this recursive descent, one comes to the situation of (1) continuous maps $f : \Delta^1 \to X$ and (2) topological spaces of singular simplices $\mathrm{Top}(\Delta^n, X)$. Now, mathematical gestures are generalizations of continuous maps $f : \Delta^1 \to X$, and spaces $\mathrm{Top}(\Delta^n, X)$ relate to what are “hypergestures,” i.e., gestures mapping into topological spaces of gestures. This association of homology and gesture theory enables a homology theory of gestures, and this one is used to generalize the contrapuntal setup to possibly infinite sets of consonances. This is an example of mathematical gesture theory being a unifier of music theory, an interesting parallel to physics, where string theory is a strong candidate for ToE. Strings are analogs to gestures in music theory; in other words, gestures are the musical analog to strings.

Counterpoint Worlds for Different Musical Cultures The mathematical theory of counterpoint (Counterpoint et al., 2015) not only embraced the Fux tradition but also opened with its five new worlds connections to Raga music and Scriabin’s mystic chord. Such a contrapuntal world is defined

66 A Computational Music Theory of Everything: Dream or Project?

1707

by a specific dichotomy of the 12 intervals into “consonant” and “dissonant” intervals. The Fux dichotomy (K = {0, 3, 4, 7, 8, 9}, D = {1, 2, 5, 6, 10, 11}) of consonances K and dissonances D is just one of six possible dichotomies. Another one is the “major” dichotomy ({2, 4, 5, 7, 9, 11}, {0, 1, 3, 6, 8, 10}), whose “consonances” are defined as the proper intervals from the tonic of the major scale. The defining property of a contrapuntal world is the existence of a unique symmetry between the “consonant” and the “dissonant” parts. For the Fux dichotomy, this is the symmetry d = 5k + 2. Every dissonance is the image under this symmetry of exactly one consonance. For example, the dissonant fourth, described by an interval of d = 5 semitone steps, is the image of the minor third k = 3; in fact, 5 = 5 · 3 + 2 modulo the octave 12. For the major dichotomy, the symmetry d = 11k + 5 does the job. Scriabin’s mystic chord Myst = {C, F♯, B, E, A, D} defines another dichotomy, and Scriabin interestingly considered his chord as a consonant kernel entity.

The counterpoint model uses the physical idea of forces being induced by local symmetries. Here, as shown in Fig. 7, a local symmetry g deforms the Fux dichotomy (K, D), and the two consonant intervals (3) and (5) are now separated by the deformed limit line; the movement between these intervals is a “virtual” passage from a deformed consonant to a deformed dissonant interval. This theory allows one to reconstruct all the Fux rules for first-species counterpoint, in particular the rule forbidding parallel fifths. This theory is on its way to a global counterpoint theory.

Connecting counterpoint models with Indian music (especially relating to the major dichotomy) is a sensational bridge between totally different musical cultures; see Mazzola (2002, Ch. 31.4.2). We should recall here that connections between Western and Raga music have been investigated by Robert Morris and Chitravina N. Ravikiran under the title of “Melharmony” (Ravikiran, 2014). This theory “aims to create chords and counterpoints based on the melodic rules of evolved systems across the world” (citation from Wikipedia). There is hope that Mazzola’s and Agustín-Aquino’s counterpoint worlds and melharmony converge to a new synthesis of two strong theoretical traditions.

Fig. 7 The movement in species counterpoint from consonant interval minor third (3) to fifth (5) is defined by deforming the dichotomy (K, D) via a symmetry g such that in this deformed image, the first interval (3) would be a deformed consonance, while the target interval (5) would be a deformed dissonance
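
The symmetries quoted above for the Fux and major dichotomies can be checked mechanically. The following Python sketch is only an illustration: the function names are hypothetical and are not taken from any software mentioned in this chapter. It verifies that x → 5x + 2 and x → 11x + 5 (modulo 12) map the consonant half of each dichotomy exactly onto the dissonant half, and it lists the consonances that are sent to the fourth.

```python
# Interval classes are integers modulo 12 (semitone steps within the octave).
FUX = ({0, 3, 4, 7, 8, 9}, {1, 2, 5, 6, 10, 11})      # (K, D) as given in the text
MAJOR = ({2, 4, 5, 7, 9, 11}, {0, 1, 3, 6, 8, 10})    # the "major" dichotomy


def affine_image(a, b, intervals):
    """Return the image of a set of intervals under x -> a*x + b (mod 12)."""
    return {(a * x + b) % 12 for x in intervals}


def exchanges_halves(a, b, dichotomy):
    """True if x -> a*x + b (mod 12) maps the consonances exactly onto the dissonances."""
    consonances, dissonances = dichotomy
    return affine_image(a, b, consonances) == dissonances


fux_ok = exchanges_halves(5, 2, FUX)        # symmetry d = 5k + 2
major_ok = exchanges_halves(11, 5, MAJOR)   # symmetry d = 11k + 5

# Consonances whose image under d = 5k + 2 is the fourth (5 semitones).
fourth_preimages = sorted(k for k in FUX[0] if (5 * k + 2) % 12 == 5)

print(fux_ok, major_ok, fourth_preimages)
```

Because both halves of a dichotomy have six elements, equality of the image with the dissonant half also confirms that every dissonance has exactly one consonant preimage.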

Unification of Mental and Physical Realities in Music: Introducing Complex Time
In MaMuTh, complex time was introduced in Mazzola and Mannone (2015) to solve problems of the transition from thinking to making music. The concept of complex time unifies Descartes’ res extensa, the physical reality, with res cogitans, the thinking, mental reality. The complex space-time is the space R³ × C, which is the sum of the four-dimensional res extensa space-time R³ × R with real time R and the res cogitans space-time R³ × iR with the imaginary time iR. This is a strong step toward a unification of a well-known duality in music: thinking and making, the physical utterance of performance vs. the mental construction in composition, performance, and improvisation. In Mazzola (2018, Ch. 78), ways of structurally connecting these two realities by means of world sheets were sketched, which are completely analogous to world sheets in physical string theory. Figure 8 shows such a world sheet that connects the symbolic gesture of moving a pianist’s finger up and down to the physical gesture of a real movement of the pianist’s finger.

The duality of thinking and making in music has been a dividing force between theorists and performers in the Western music world. It is also present in the academic structure where “applied” scholars (teaching instrumental fields) are separated from “academic” scholars (music theory, musicology, education, ethnomusicology, and music psychology). Their interaction is reduced to a poor “laissez vivre” and is not based on a shared reality, a situation that complex time theories might remedy in the future.

[Figure 8: world-sheet plot with axes x, y, and s]

Fig. 8 The symbolic gesture of moving a pianist’s finger up and down (left) is connected by a world sheet to the physical gesture of a real movement of the pianist’s finger (right)

Unifying Note Performance and Gestural Performance: Lie Operators
Computational performance theory was strongly developed by the Stockholm group around Johan Sundberg (Friberg et al., 2006). This approach was concerned with the transformation of notes to sound events. The work of Mazzola’s Zurich group (Mazzola, 2002, Ch. 39.7) showed that important cases of such a performative transformation can be described by classical Lie operators from differential geometry. More precisely, the transition from a given performance stage to a more refined one is described by a Lie operator acting on the given performance vector field. Performance vector fields are a conceptual generalization to other parameters, such as pitch, loudness, duration, glissando, and crescendo, of the classical one-dimensional time performance field, namely, the tempo curve. Recall that the tempo curve is the inverse of the differential quotient of the transition p : SymbolicTime → PhysicalTime. When performance theory was extended to gestural performance, i.e., the transformation of symbolic gestures in the score to physical gestures of a musician, it could be shown that the same Lie operator formalism can be carried over to gestures (Mazzola, 2018, Ch. 78.2.13). In other words, we now have a unified formalism of performance theory for notes and for gestures.
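
The relation between the performance map p and the tempo curve recalled above can be illustrated numerically. In the sketch below, the two performance maps, their units (symbolic time in quarter notes, physical time in minutes), and the finite-difference step are assumptions made purely for illustration; they are not part of the cited theory.

```python
def tempo_curve(p, symbolic_time, step=1e-6):
    """Tempo at a symbolic time: the inverse of the derivative of p.

    The derivative is approximated by a central finite difference.
    """
    derivative = (p(symbolic_time + step) - p(symbolic_time - step)) / (2 * step)
    return 1.0 / derivative


def constant_tempo_performance(symbolic_time):
    """Hypothetical performance map: 120 quarter notes per minute."""
    return symbolic_time / 120.0


def slowing_performance(symbolic_time):
    """Hypothetical ritardando: physical time grows faster than linearly."""
    return symbolic_time / 120.0 + 0.0005 * symbolic_time ** 2


tempo_constant = tempo_curve(constant_tempo_performance, 4.0)
tempo_start = tempo_curve(slowing_performance, 0.0)
tempo_later = tempo_curve(slowing_performance, 8.0)
```

With these assumed maps, the first performance has a flat tempo curve, while the second starts at the same tempo and slows down as symbolic time advances.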

Unifying Composition and Improvisation?
It is an open question to what extent composition and improvisation could be unified as special cases of a unique but still hidden dynamics. Many famous composers, such as Beethoven or Mozart, are known to have often created their compositions from improvisation; see the genesis of Beethoven’s Sonata Op. 109 (Kinderman, 2003), for example. In the Indian tradition of Raga, improvisation and composition are intimately related on the basis of mela scales. Perhaps the combination of flow concepts and gestures, as sketched in the free jazz book (Mazzola and Cherlin, 2009), could help find a unified understanding of musical creativity in composition and improvisation.

Conclusions
So is COMMUTE a dream or a project? Comparing the musical situation to the physical one, it is more likely to be a dream, or, say, a daydream. The Big Science in physics is not paralleled by an equivalent in music, although some institutions, such as the IRCAM (Institut de Recherche et Coordination Acoustique/Musique) in Paris, are aiming at a larger science project. What are the obstructions so far? The three counterarguments are relevant points here. Music is still celebrated (also with pronounced economic advantages) as a work of geniuses, as a more or less divine revelation, and as a super-complex entity transcending scientific methodologies. This attitude is a psychological wall, which can only be overcome by a diligent enlightenment. We have presented some very promising progress in MaMuTh. May its arguments help turn the daydream into a prosperous project. After all, music is of fundamental importance to humans.

References
Agawu K (2003) Representing African music. Routledge, New York/London
Agustín-Aquino OA, Junod J, Mazzola G (2015) Computational counterpoint worlds. Springer series computational music science. Heidelberg
Amiot E (2016) Music through Fourier space. Springer series computational music science. Heidelberg, Springer
Friberg A, Bresin R, Sundberg J (2006) Overview of the KTH rule system for music performance. Advances in experimental psychology. Special issue on music performance
Kinderman W (2003) Artaria 195. University of Illinois Press, Urbana/Chicago
Larson S (2012) Musical forces. Indiana University Press, Bloomington
Laughlin R (2005) A different universe: reinventing physics from the bottom. Basic Books, New York
Mazzola G (1989–1994) presto software manual. SToA music, Zürich
Mazzola G (2002) The topos of music. Birkhäuser, Basel
Mazzola G (2017) Gestural dynamics in modulation: (towards) a musical string theory. In: Pareyon G, Pina-Romero S, Agustin-Aquino OA, Lluis-Puebla E (eds) The musical-mathematical mind. Springer, Springer series computational music science. Heidelberg
Mazzola G (2018) The topos of music, vol III: gestures, 2nd edn. Springer, Heidelberg
Mazzola G (2019) Functorial semiotics for creativity. Submitted for publication
Mazzola G, Cherlin PB (2009) Flow, gesture, and spaces in free Jazz: towards a theory of collaboration. Springer series computational music science. Heidelberg, Springer
Mazzola G, Mannone M (2015) Hypergestures in complex time: creative performance between symbolic and physical reality. Springer proceedings of the MCM15 conference, London
Munyaradzi G, Zimidzi W (2012) Comparison of Western music and African music. Creat Educ 3(2):193–195
Nugroho G (2007) Teak leaves at the temples. DVD, Trimax Enterprises Film
Penrose R (2002) The road to reality. Vintage, London
Ravikiran CN (2014) Robert Morris and the concept of Melharmony. Perspect New Music 52(2):154–161
Schönberg A (1911) Harmonielehre. Universal Edition, Wien 1966

67 Groovy Mathematics: Toward a Theoretical Model of Rhythm

Carl Haakon Waadeland

Contents
Introduction
  Order in Movement
  A Natural Attraction to Rhythmic Behavior and Experience of Rhythm
  Expressive Timing in Music
  Modeling Music Performance
RFM: A Continuous Model of Rhythm Performance
  Oscillations and Rhythmic Structure
  Synthesis of Expressive Timing by Frequency Modulation
  Computer Implementation
Simulating Movements in Rhythmic Behavior
  Synthesis of Asymmetric Movement Trajectories
  Illustration: RFM Simulation of fON
Conclusion
References

Abstract
An attraction to rhythmic behavior and experience of rhythm is deeply rooted in man’s nature. Human life and in general all living beings are subject to a chain of structured temporal events, cycles, and well-ordered patterns of movements. Based on the antique understanding of ‘rhythmos’ as “order in the movement,” this chapter presents a theoretical model of rhythm production which is capable of making syntheses of various empirically investigated aspects of micro-timing in performances of rhythm and is also shown to be a possible approach in the construction of approximations to movement trajectories observed in studies of timed, rhythmic human behavior. The mathematical model is built upon frequency modulation of idealized, sinusoidal movement curves representing rhythmic structure and might be seen as demonstrating an application of “groovy mathematics.”

C. H. Waadeland, Department of Music, Norwegian University of Science and Technology (NTNU), Trondheim, Norway. e-mail: [email protected]
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_96

Keywords
Rhythm · Rhythmos · Timing · Music performance · Rhythmic behavior · Frequency modulation · Model · Synthesis · Movement trajectories

Introduction
Order in Movement
To place the topic of this chapter in a more general context, it is both interesting and relevant to become acquainted with the origin of the concept ‘rhythm.’ Etymologically, ‘rhythm’ originates from rhythmos, a concept born of the ancient Greeks’ understanding of music, of which a general overall feature is that music (mousiké/mousiké techne) and the “musical” have a dominating and integrated role in social life, human coexistence, and the understanding of reality as a whole. This is demonstrated in a beautiful way in the myth of the birth of the Muses, as told by Pindar in the fifth century B.C., in his hymn to Zeus (see Otto 1955, p. 28):

When Zeus had ordered the world, the gods beheld in silent amazement the splendor that presented itself to their eyes. At last the father of the gods asked them whether they still missed anything. They answered that one thing was still lacking: a voice to praise the great works and his entire creation in words and tones. For this, a new divine being was needed, and so the gods asked Zeus to beget the Muses.

The Muses, brought to birth to complete and perfect the existing, performed through singing, dancing, poetry, and playing instruments and provided existence with a voice capable of reporting the glory of the world. According to Greek antiquity, “musical” activities denoted the totality of performing activities executed by the Muses and were as such characterized by the following features:
1. “Musical” activities unfold through time.
2. “Musical” activities are intimately related to the phenomenon ‘movement.’
3. “Musical” activities exist through a process of progression and are related to a development of action.
4. “Musical” activities represent a form of social fellowship.

Of major importance for the theoretical formulation of antiquity’s understanding of music are the philosophers Pythagoras, Plato, and Aristotle. In this respect it seems adequate to regard Pythagoras as the most extreme. For him and his disciples, the Pythagoreans, the understanding of music and the understanding of reality itself were two views of the same matter. The Pythagoreans regarded music as a sonorous realization of harmonic relations between numbers, and these numerical relations represented a key to an understanding and comprehension of being. Founded on Pythagoras’ interpretation of the universal and “cosmic” meaning of numerical relations, the Pythagoreans supposedly stated: “All things resemble number” (see Barker 1989, p. 30). Pythagoras is sometimes also quoted as stating: “All is number!” These are statements which in many respects have gained renewed validity today, viewed in light of the dominating role digital representations have achieved in technological transformation of information and communication.

As pointed out above, movement is a crucial component of the art forms performed by the Muses. On the other hand, movement was also seen as a characteristic of cosmos (e.g., in speaking of the “beautiful ring dance” of the sun, the moon, and the stars) as well as an important quality of the human soul (microcosmos), where different states of mental moods were seen to be related to different “movements” of the soul. Essential in this respect is that the patterns of movements occurring in the “musical” were seen as similar to or, more precisely, as an imitation (mimesis) of corresponding patterns of movements in cosmos as well as in the human soul. The “musical” is thus, according to this ancient thinking, governed by the same “laws of movements” which determine processes of cosmos and the human soul. This view reflects the dominating analogy (correspondence) thinking typical of much of antiquity’s understanding of reality and its philosophy in general and the Pythagorean thinking in particular.

Through the phenomenon movement, music and the art forms of the Muses thus become an important link, or connection, between cosmos and the human soul. A beautiful expression of this is found in antiquity’s thinking of the well-ordered, harmonic cosmos pictured as a tuned lyre: Through man’s performance of the “musical” art forms, the inner “strings” of the soul will be so tuned that when the cosmic lyre is played, the strings of man’s soul will be brought in vibrating resonance. As a consequence of this view, music, not to say performance of music, also becomes important in education and ethics. This is explicitly formulated by the theorist and pedagogue Damon in the fifth century B.C. and later in works by Plato (The Republic and The Laws) and Aristotle (Politics, book VIII). In light of the importance of the concept ‘movement’ in ancient philosophy and analogy thinking, the meaning of rhythmos becomes extremely interesting. According to Paul Fraisse (1982, pp. 149–150):

Rhythmos appears as one of the key words in Ionian philosophy, generally meaning “form,” but an improvised, momentary, and modifiable form. Rhythmos literally signifies a “particular way of flowing.” Plato essentially applied this term to bodily movements, which, like musical sounds, may be described in terms of numbers. He wrote in The Banquet “The system is the result of rapidity and of slowness, at first opposed, then harmonized.” In The Laws he arrived at the fundamental definition that rhythm is “the order in the movement.”

Hence, we understand that rhythmos is a fundamental concept in antiquity’s understanding of reality and in ancient philosophy in general.

A Natural Attraction to Rhythmic Behavior and Experience of Rhythm
The view of rhythmos as a phenomenon representing a structuring and ordering of patterns of movements has much in common with the meaning many of us today, on a purely intuitive basis, will assign to the concept ‘rhythm.’ The need of structuring and grouping events in time, space, and movement seems to be deeply founded in man and in human communication in general. Our whole life is fundamentally formed and organized within well-ordered patterns and repeating cycles of events:
• The heartbeat gives a basic pulse, establishing a reference for elapsing time.
• Breathing represents a continuous cycling between tension and relaxation (breathe in/breathe out).
• The rotation of the earth gives a repeating alternation of night and day.
• The gravitational attraction between the earth and the moon causes tidal ebb and flow.
• The earth revolving around the sun causes the cycle of the year.

These are examples of basic biological and astronomical cycles that we all are subject to. In addition we make our own everyday routines and “working rhythms” and develop our own “life patterns,” which at different times and in varying degree we may break or ignore but which we are still governed by and relate our actions to. Moreover, it is interesting to know that by studying the brain’s electrical activity, Hans Berger (1929) discovered occurrences of “alpha rhythms” and “beta waves” in the brain, and ever since Berger’s early observations, neuroscientists have in various ways studied “brain rhythms” and the oscillatory nature by which the brain organizes neuronal information (Buzsáki 2006). Thus, it seems reasonable to say that human life and in general all living organisms are subject to a chain of structured temporal events, cycles, and well-ordered patterns of movements. Based on the Platonic understanding of ‘rhythmos,’ it thereby makes sense to assert that an attraction to rhythmic behavior, rhythmic organization, and experience of rhythm is deeply rooted in man’s nature.

Expressive Timing in Music
Around 1930, a large group of researchers headed by Carl E. Seashore studied music performance at the University of Iowa. Most of their reports appeared in two volumes edited by C. E. Seashore (1932, 1937a), both classics in music performance literature. A characteristic feature of music performance, soon discovered by C. E. Seashore and his coworkers, is the different kinds of variability in performance and various deviations from presumed properties according to musical notation, e.g., deviations in relative note durations compared to simple ratios like 2:1, 3:1, etc. defined in the notation of music, fluctuations in tempo, and various dynamic deviations. As an “explanation” of these facts, C. E. Seashore stated a principle concerning “deviation from the exact” as a characteristic feature of artistic expression:

The unlimited resources for vocal and instrumental art lie in artistic deviation from the pure, the true, the exact, the perfect, the rigid, the even, and the precise. This deviation from the exact is, on the whole, the medium for the creation of the beautiful – for the conveying of emotion. (Cited from H. G. Seashore 1937b, p. 155)

Various such deviations have been studied in empirical rhythm research by investigating systematic variations of durations (SYVARD; see, e.g., Bengtsson and Gabrielsson (1983)) and are also discussed and investigated as participatory discrepancies (Keil 1995; Prögler 1995; Butterfield 2010). Deliberate discrepancies or deviations may be seen as a process by which (more or less) conceptualized structural properties of rhythm are transformed into live performances of rhythm by the performing musician. Such a process of rhythmic transformation is often denoted expressive timing and is among the characteristic features of expressive music performances (see Clarke (1999) and Gabrielsson (1999, 2003)).

Modeling Music Performance
Several computational models of expressive music performance have been developed over the last 30 years or so. An overview of a large number of these models is given by Kirke and Miranda (2013). Interesting computational models of rhythm and meter are also found in Boenn (2018). An underlying assumption in the construction of these models is that there is a strong systematic link between musical structure and structure of music performance, and a common strategy in many of the model constructions has been to apply various quantitative empirical research to identify different performance rules by which expressive music performance can be modeled. A very influential model in this respect is the KTH rule system for music performance, which is a result of a long-term research project initiated by Johan Sundberg (Sundberg et al. 1983). For an overview of this model, see Friberg et al. (2006). The program Director Musices represents an implementation of the rule system, with input being a symbolic score and output being the expressive rendering. The rules model typical performance principles such as phrasing, articulation, intonation, micro-level timing, rhythmic patterns, and tonal tension. Examples of some KTH performance rules are: “Create arch-like tempo and sound level changes over phrases,” “Shorten relatively short notes and lengthen relatively long notes,” and “Introduce long-short patterns for equal note values (swing)” (Friberg et al. 2006, p. 148).

A common feature of most computational models of music performance, which is also a basic characteristic in the identification of the KTH rules, is that music performance is primarily studied by investigating and modeling note onsets/attack points and durations. These models are thereby based on information of a finite number of discrete points along the one-dimensional axis of time. The very object of study, the musical performance itself, however, is unfolded as a continuous movement in time and space, created through an interaction between the musician, a musical instrument, and the different physical/social environments in which the performance takes place. Rather than presenting a discrete model based on performance rules, this chapter will therefore describe the construction and application of a continuous model of rhythmic behavior which is intimately related to the ancient meaning of rhythmos: the order in the movement. This model is based on rhythmic frequency modulation, RFM, and was presented by Waadeland (2000, 2001) and developed further by Waadeland and Saue (2018).

RFM: A Continuous Model of Rhythm Performance
Oscillations and Rhythmic Structure
When oscillating the index finger up and down in the air in such a way that the minimal points are in perfect synchronization with the clicks from a metronome, an idealized curve describing the finger movement could be something similar to what is shown in Fig. 1. To be quite explicit, this curve is given by the mathematical function:

y(t) = A[1 – cos(2πft)],    (1)

where t is time, f is the frequency of the finger movement, and A is the amplitude (2A is a measure of the finger’s maximal distance from the minimal value, 0). In Fig. 1 f = 2. Viviani (1990) stated that sine waves are easy to approximate by human movements and are among the simplest predictable motions, whereas Balasubramaniam et al. (2004) conducted an empirical experiment where they found that an unpaced oscillation of the index finger tends to create a sinusoidal, close-to-symmetric movement curve, but when the finger movements were paced by clicks from a metronome, the movement curves attained different characteristic asymmetric shapes. Thus, the trajectory in Fig. 1 shows an illustration of a static, robot-like (unpaced) performance of isochronous beats.

[Figure 1: movement curve; horizontal axis t = time [s] from 0 to 2, vertical axis from 0 to 2A, period = 1/f]
Fig. 1 Graphic illustration of an idealized performance of isochronous beats with frequency f = 2. Time is displayed along the horizontal axis, and the finger’s vertical position is measured along the vertical axis

If the sinusoid in (Eq. 1) with frequency f is a representation of (a performance of) quarter notes, a corresponding sinusoid with frequency 2f will represent eighth notes, 3f corresponds to eighth note triplets, whereas (2/3)f represents dotted quarter notes. In other words, every note value may be represented by a sinusoid oscillation, and various sequences of such oscillations thereby make up a continuous representation of rhythmic structure. Figure 2 illustrates the connection between a sequence of notes, a movement curve, and a representation by sinusoidal oscillations. In this figure quarter notes correspond to the frequency f = 1.

Fig. 2 An illustration showing the connection between a sequence of notes, a movement curve, and a mathematical representation of sinusoids, all related to a robot-like rhythmic performance executed in perfect synchronization with a metronome. The horizontal axis displays time, t, where the first beat occurs at time t = 0. In this figure the amplitude is made a function of note duration: A(duration) = duration = 1/frequency, reflecting that the amplitudes of finger oscillations in the performance of notes with larger frequencies are smaller than in the performance of note values with smaller frequencies (which is in accordance with most real performances)
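
The representation of note values by sinusoids can be made concrete with a short sketch. The Python code below is illustrative only; the function and key names are hypothetical. It evaluates Eq. (1), lists the frequencies that the text assigns to a few note values relative to the quarter note, and computes the times of the metronomic beats as the minima of the curve.

```python
import math


def movement(t, frequency, amplitude=1.0):
    """Idealized finger position from Eq. (1): y(t) = A[1 - cos(2*pi*f*t)]."""
    return amplitude * (1.0 - math.cos(2.0 * math.pi * frequency * t))


def note_frequencies(quarter_note_frequency):
    """Frequencies of some note values relative to the quarter note, as listed in the text."""
    f = quarter_note_frequency
    return {
        "quarter": f,
        "eighth": 2 * f,
        "eighth_triplet": 3 * f,
        "dotted_quarter": (2 / 3) * f,
    }


def beat_times(frequency, count):
    """Times of the first `count` minima of Eq. (1), i.e. the metronomic beats."""
    return [k / frequency for k in range(count)]


frequencies = note_frequencies(1.0)            # quarter notes at f = 1, as in Fig. 2
eighth_beats = beat_times(frequencies["eighth"], 4)
dotted_beats = beat_times(frequencies["dotted_quarter"], 3)
```

With quarter notes at f = 1, the eighth-note beats fall every half time unit and the dotted-quarter beats every one and a half time units, and the curve reaches its maximum 2A halfway between two beats.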

Synthesis of Expressive Timing by Frequency Modulation
In the model of rhythmic structure presented in the previous section, metronomic performances of note values are represented by frequencies of sinusoidal functions. This is the basic premise and starting point in the construction of the RFM model. The next step in the model construction is to introduce some operation inducing deviations or alterations of frequencies that reflect empirically observed deviations in live, non-metronomic, performances of note values. A well-known technique of sound synthesis using various alterations or distortions of the frequency of an oscillator in order to achieve parameter control over the spectral richness of sound is frequency modulation, FM, pioneered by Chowning (1973). The most basic FM instrument consists of two sinusoidal oscillators interacting to give the output:

y(t) = A sin[2πf_c t + d sin(2πf_m t)].    (2)

A is the amplitude, f_c is commonly denoted carrier frequency, f_m is the modulating frequency, and d is the peak frequency deviation (ibid.). If d = 0, there is no modulation, and the output is simply a sine wave with frequency f_c. This very situation resembles, at least on a purely theoretical level, the situation the RFM model will establish for syntheses of rhythm: When there is no modulation, the result is a strict metronomic (i.e., sinusoidal) performance. When, on the other hand, modulation occurs, various deviations of frequency are created, resulting in different kinds of “deviations from the exact” in the modeled performance. Motivated by this observation, the basic RFM algorithm is defined on the basis of a combination of Eqs. (1) and (2) above and is illustrated in Fig. 3. The output of this RFM algorithm is given by the function:

y(t) = A{1 – cos[2πf_c t + d sin^n(2π(f_m t + φ_m))]},    (3)

where t = time, A = carrier amplitude, f_c = carrier frequency, f_m = modulator frequency, φ_m = modulator phase divided by 2π, d = peak frequency deviation = modulator amplitude = strength of modulation, n = exponent of modulating function.

Fig. 3 Flowchart for basic rhythmic frequency modulation
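
As a minimal sketch of the basic RFM output function, the following Python code evaluates Eq. (3) directly and checks the property stated in the text that a modulation strength of d = 0 gives back the metronomic curve of Eq. (1). The function names and the sample parameter values are assumptions chosen for illustration; this is not the FMRhythm implementation.

```python
import math


def rfm(t, amplitude, fc, fm, phi_m, d, n=1):
    """Basic RFM output, Eq. (3).

    y(t) = A * (1 - cos(2*pi*fc*t + d * sin(2*pi*(fm*t + phi_m))**n))
    """
    modulator = math.sin(2.0 * math.pi * (fm * t + phi_m)) ** n
    return amplitude * (1.0 - math.cos(2.0 * math.pi * fc * t + d * modulator))


def metronomic(t, amplitude, f):
    """Unmodulated movement curve, Eq. (1)."""
    return amplitude * (1.0 - math.cos(2.0 * math.pi * f * t))


sample_times = [i / 100.0 for i in range(200)]

# With d = 0 the modulated curve coincides with the metronomic one.
max_gap_unmodulated = max(
    abs(rfm(t, 1.0, 2.0, 1.0, 0.0, 0.0) - metronomic(t, 1.0, 2.0)) for t in sample_times
)

# With d != 0 the curve departs from the metronomic one.
max_gap_modulated = max(
    abs(rfm(t, 1.0, 2.0, 1.0, 0.0, 0.5) - metronomic(t, 1.0, 2.0)) for t in sample_times
)
```

The exponent n is assumed to be a positive integer here, so that the power of a negative sine value is well defined.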

It should be noted that Eq. (3) defines how a modulator operates on the function y(t) = A[1 – cos(2πf_c t)], representing a specific note value (e.g., quarter note). Frequency modulation of subdivisions and ties of this sinusoid is defined accordingly and is expressed explicitly in Waadeland (2001, p. 29).

Example 1: RFM Synthesis of Vienna Waltz Accompaniment
As noted by Bengtsson and Gabrielsson (1983), a well-known feature of performances of Vienna waltzes occurs at the beat level in the accompaniment; the first beat is shortened, and the second beat is lengthened, whereas the third beat is close to one third of the measure length. Thus, the quarter note beats in the ¾ meter are characterized by a cyclic pattern of durations: short (S)–long (L)–intermediate (I). These deviations from metronomic regularity may, indeed, vary throughout the performance of a single waltz and may also change in proportional values in different performances. However, as pointed out by Bengtsson and Gabrielsson (ibid., p. 42), a typical distribution of beat durations in a Vienna waltz accompaniment may be first beat, 25–27%; second beat, 40–42%; and third beat, close to 33% of the total measure. By modulating a metronomic performance of a Vienna waltz accompaniment which is played with equal durations of the quarter notes, as written in the notation of the music, the RFM model is able to simulate the characteristic S–L–I pattern by means of the RFM algorithm in (Eq. 3). This is obtained by choosing f_c = 3 (3 beats to the measure), f_m = 1, φ_m = 0.25, n = 1, and d = 1 (see Waadeland (2001, pp. 30–32)). Figure 4 shows the movement curve of this simulation. Observe that the movement curve now consists of different asymmetric movements.

Fig. 4 Illustration of a modulated movement curve associated with a synthesis of a performance of the first two measures of a Vienna waltz accompaniment. The durations of the metronomic quarter notes are “stretched” and “compressed” by the action of the rhythmic frequency modulation, and the characteristic cyclic pattern of beat durations, S–L–I, is created
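
The beat durations produced by the parameter values quoted for the Vienna waltz can be estimated with a small numerical sketch. The Python code below is an illustration under stated assumptions, not the published implementation: it takes the beats to be the minima of the RFM curve, which occur where the argument of the cosine in Eq. (3) equals a multiple of 2π, and it locates them by bisection. The helper names are hypothetical.

```python
import math


def rfm_phase(t, fc, fm, phi_m, d, n=1):
    """Argument of the cosine in Eq. (3)."""
    return 2.0 * math.pi * fc * t + d * math.sin(2.0 * math.pi * (fm * t + phi_m)) ** n


def beat_onset(k, fc, fm, phi_m, d, n=1, iterations=60):
    """Time of the k-th minimum of the RFM curve, found by bisection.

    Minima occur where the phase equals 2*pi*k. The bracket below is valid
    because the modulation term never exceeds |d| in absolute value; a unique
    root is assumed, which holds when the phase is strictly increasing.
    """
    target = 2.0 * math.pi * k
    low = (target - abs(d)) / (2.0 * math.pi * fc)
    high = (target + abs(d)) / (2.0 * math.pi * fc)
    for _ in range(iterations):
        middle = 0.5 * (low + high)
        if rfm_phase(middle, fc, fm, phi_m, d, n) < target:
            low = middle
        else:
            high = middle
    return 0.5 * (low + high)


# Parameter values quoted in the text for the Vienna waltz accompaniment.
params = dict(fc=3.0, fm=1.0, phi_m=0.25, d=1.0, n=1)

onsets = [beat_onset(k, **params) for k in range(4)]      # four minima span one measure
durations = [later - earlier for earlier, later in zip(onsets, onsets[1:])]
measure = sum(durations)
shares = [duration / measure for duration in durations]   # fraction of the measure per beat

print([round(share, 3) for share in shares])
```

Under these assumptions the three inter-onset intervals come out at roughly 42%, 32%, and 26% of the measure, in that order. Read cyclically this is the short–long–intermediate pattern with proportions close to those reported by Bengtsson and Gabrielsson; which minimum is labeled the first beat of the measure is a convention that this sketch does not fix.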

Computer Implementation
In order to investigate the sounding implications of the RFM model, a computer application, FMRhythm, which allows interactive experimentation with modulation parameters, has been developed (Saue 2000) (see also Waadeland (2001) and Waadeland and Saue (2018)). FMRhythm is written in C++ and is available as open source at https://github.com/ssaue/FMrhythm. The basic ideas of the computer implementation are:
• A multi-track, polyphonic MIDI recording is imported into the computer as a MIDI file. (Specifications for MIDI and the standard MIDI file format are available from the web site of the MIDI Association; see References.)
• The computer calculates RFM and gives a graphic representation of modulated oscillations according to the definition given in the previous section.
• The computer “runs through” the movement curves created by means of RFM, and in every minimal point (modulated or not), the computer generates a sound on the basis of the following MIDI parameters: MIDI channel, note number, duration (note on/note off), and velocity.

Figure 5 shows the setup of a single modulator and the parameters available. The application supports a number of modulating waveforms (sine, sawtooth, triangle, and square) and more complex setups with two modulating oscillations in series or in parallel. A sounding realization of the simulation of Vienna waltz accompaniment described above may be heard at the URL “Audio Realization of FM Rhythm Synthesis of Swing” (see References).

Fig. 5 Illustration of the basic RFM instrument written and designed in C++
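
The third step of the implementation outline, generating a sound at every minimal point of a movement curve, can be sketched as follows. This Python code is a hypothetical illustration and does not reproduce the FMRhythm source: the `NoteEvent` class, the sampling rate, and the default channel, note number, duration, and velocity are placeholder assumptions, and no MIDI file is read or written.

```python
import math
from dataclasses import dataclass


@dataclass
class NoteEvent:
    """Hypothetical container for the MIDI parameters named in the text."""
    onset: float      # seconds
    channel: int
    note_number: int
    duration: float   # seconds between note on and note off
    velocity: int


def movement_curve(t, fc=2.0, fm=1.0, phi_m=0.0, d=0.0, amplitude=1.0):
    """RFM movement curve of Eq. (3) with n = 1."""
    phase = 2.0 * math.pi * fc * t + d * math.sin(2.0 * math.pi * (fm * t + phi_m))
    return amplitude * (1.0 - math.cos(phase))


def events_at_minima(curve, total_time, sample_rate=1000, channel=9,
                     note_number=51, duration=0.1, velocity=90):
    """Scan a sampled curve and emit one event at every interior local minimum."""
    samples = [curve(i / sample_rate) for i in range(int(total_time * sample_rate) + 1)]
    events = []
    for i in range(1, len(samples) - 1):
        if samples[i] <= samples[i - 1] and samples[i] < samples[i + 1]:
            events.append(NoteEvent(i / sample_rate, channel, note_number, duration, velocity))
    return events


events = events_at_minima(movement_curve, total_time=2.0)
onsets = [event.onset for event in events]
```

With the unmodulated default curve (frequency 2, d = 0) the scan finds the interior beats at 0.5, 1.0, and 1.5 seconds; the first and last samples are skipped because a local minimum needs a neighbor on both sides. Onset times are resolved only to the sampling grid, which is a simplification of this sketch.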

Example 2: Synthesis of Swing Groove
A swing groove is a particular rhythmic ostinato played by jazz drummers. The swing groove originated in the swing era and has developed further in later jazz styles, like bebop and contemporary jazz. A typical swing groove is most often played on the ride cymbal. In its most basic form, a swing groove may have the written representation illustrated in Fig. 6. The remark “With swing feel” in the notation in Fig. 6 indicates that the experienced jazz drummer plays this groove according to her or his embodied understanding of swing, where the eighth notes in the swing groove are not performed with equal durations, as written in Fig. 6, but rather with characteristic long–short patterns. These patterns change the ratio between the durations of successive pairs of eighth notes, often called the swing ratio, from 1:1 (subdivision in eighth notes (as written in Fig. 6)) to 2:1 (subdivision in eighth note triplets) or around 3:1 (sixteenth note subdivision) – or somewhere in between – depending on, among other things, the individual drummer and the tempo of the performance. Several studies of how different performance conditions influence the swing ratio have been conducted (see, e.g., Friberg and Sundström (2002), Waadeland (2006), and Honing and De Haas (2008)). RFM simulations of swing grooves for different fixed values of swing ratios have been constructed (Waadeland 2001). However, live performances of rhythm are often characterized by various local and long-term fluctuations from static patterns of systematic deviations. To be able to make syntheses of various time- and tempo-dependent performances of a swing groove, the RFM model has therefore been further developed into a dynamic model by making the peak frequency deviation, d, a function of time and tempo (Waadeland and Saue 2018). Audible realizations of some dynamic swing syntheses may be heard at “Audio Realization of FM Rhythm Synthesis of Swing” (cf. References).
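
The meaning of the swing ratio can be illustrated by computing note onsets directly. The Python sketch below is not an RFM synthesis; it only shows how a ratio of long to short eighth-note durations translates into onset times within a beat. The function name, the tempo, and the simplification that every beat is split into a pair of eighth notes are assumptions made for illustration.

```python
def swing_onsets(beat_times, beat_duration, swing_ratio):
    """Onsets of eighth-note pairs when each beat is split long:short = swing_ratio:1."""
    long_share = swing_ratio / (swing_ratio + 1.0)
    onsets = []
    for beat in beat_times:
        onsets.append(beat)                                # on-beat eighth note
        onsets.append(beat + long_share * beat_duration)   # delayed off-beat eighth note
    return onsets


beat_duration = 0.5                       # hypothetical tempo: 120 quarter notes per minute
beats = [0.0, 0.5]

straight = swing_onsets(beats, beat_duration, 1.0)   # 1:1, even eighth notes
triplet = swing_onsets(beats, beat_duration, 2.0)    # 2:1, triplet subdivision
hard = swing_onsets(beats, beat_duration, 3.0)       # 3:1, sixteenth-note subdivision
```

At a ratio of 1:1 the off-beat note falls exactly halfway through the beat, at 2:1 two thirds of the way, and at 3:1 three quarters of the way.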
[Figure: notated swing groove in 4/4 time, marked “With swing feel”]
Fig. 6 An often written representation of a swing groove, commonly played on a ride cymbal

Example 3: A “Weird” Two-Part Bach Invention

Rhythmic frequency modulation can also be applied to create exciting, unplayable, and rather “weird” performances of music. Waadeland and Saue (2018) demonstrated this by importing a metronomic, quantized MIDI recording of J. S. Bach’s composition “2-Part Invention No. 13 in A Minor” as a MIDI file into the RFM program. The voices played by the right and the left hand were given different frequency modulations in such a way that the two voices move out of sync and into sync with each other in a somewhat “symmetric” way: whenever the first voice makes an accelerando, the second voice makes a ritardando, and vice versa. The two voices “meet” (i.e., are synchronized) every eighth bar and also at the end. Such a “weird” performance may be heard at the URL “Audio Realization of FM Rhythm Synthesis of Swing” (see References). This example shows how a new piece of electronic music can be created and indicates that RFM synthesis might represent an interesting compositional instrument for electro-acoustic music.

Apart from the last example, this chapter has so far demonstrated how the RFM model may be applied to construct various syntheses of empirically investigated aspects of timing in performances of rhythm. In the next section, it will be shown that rhythmic frequency modulation is also a possible approach in the construction of approximations to movement trajectories observed in studies of timed rhythmic behavior, as investigated through the application of motion capture systems. For an overview of how motion capture systems are used in studies of music performance, see, e.g., Dahl (2016) and Nusseck et al. (2017).

Simulating Movements in Rhythmic Behavior

There are two dominant traditions in behavioral studies of movement timing: the information processing approach and the dynamical systems approach, also referred to as the nonlinear oscillator approach (cf. Beek et al. 2000; Wing and Beek 2002; Balasubramaniam 2006). The information processing approach deals with discrete aspects of timing behavior, and the variables of major interest are time intervals, i.e., intertap intervals and asynchronies. The dynamical systems approach, rather than studying discrete synchronization events, investigates continuous movement trajectories, dynamic pattern formation, and the evolution of performance with time. In the dynamical systems approach, timing is considered to be an emergent property of the organizational principles that govern a particular coordinated action, whereas in the information processing approach, time is considered a mental abstraction that depends on central timing processes, represented independently of any particular effector system (cf. Wing and Beek 2002). These different approaches in studies of movement timing are closely related to the theoretical framework that distinguishes between two forms of timing control: emergent timing and event-based timing (see, e.g., Delignières and Torre (2011)). As stated by Delignières and Torre (ibid., p. 313): “ . . . , the essential difference between event-based and emergent timing is in the involvement or noninvolvement, respectively, of an abstract and effector independent representation of the time intervals to produce.” Discussions of whether event-based and emergent timing can coexist in a single task are found in Repp and Steinman (2010) and Delignières and Torre (2011).
In Delignières and Torre (ibid.), we read that some timing tasks tend to favor event-based timing (e.g., discrete finger tapping), others emergent timing (e.g., continuous circle drawing or forearm oscillations), whereas other tasks appear more ambiguous. Moreover, they state: “Air tapping (in which taps are performed in the air, without contact with any surface) seems to present this ambiguity” (ibid., p. 313).


An interesting empirical investigation that emphasizes the importance of combining the dynamical systems approach and the information processing accounts of movement timing is presented by Balasubramaniam et al. (2004). As a starting point for their study, they comment that whereas previous investigations of paced repetitive movements with respect to an external beat have either emphasized the form of movement trajectories (the dynamical systems approach) or timing errors made with respect to the external beat (the information processing approach), the question of what kinds of movement trajectories assist timing accuracy has not previously been addressed. Following up on this question, Balasubramaniam et al. (ibid.) constructed a new experimental paradigm aimed at investigating how various timing tasks are reflected in different movement trajectories. This experiment involves synchronization or syncopation with an external auditory metronome, and they show that the nervous system produces trajectories that are asymmetric with respect to time and velocity in the out and return phases of the repeating movement cycle (see Fig. 7). Moreover, they find that this asymmetry is task specific and independent of motor implementation details (flexion vs. extension) and, furthermore, that the degree of asymmetry in the flexion and extension movement times is positively correlated with timing accuracy. On the basis of their findings, they suggest that “movement asymmetry in repetitive timing tasks helps satisfy requirements of precision and accuracy relative to a target event” (ibid., p. 129). Thus, they point at an interesting result which is related to research questions that are basic to both of the two dominant traditions in behavioral studies of movement timing.

Correlation between asymmetric movement trajectories and timing accuracy has also been discussed in Balasubramaniam (2006), Delignières and Torre (2011), Elliott et al. (2009), and Torre and Balasubramaniam (2009) and is given additional support by results from an empirical investigation of synchronization of the index finger with various visual pacing sequences (Hove and Keller 2010). Empirical studies of drummers’ movements in the performance of different rhythms and grooves also show that the movement of the drumstick produces trajectories that are asymmetric (cf. Dahl 2004, 2006, 2011; Waadeland 2003, 2006, 2011). Results from different investigations on gestural aspects of timed rhythmic movements thus indicate that the production of asymmetric movement trajectories seems to be a common characteristic of various performances of repetitive rhythmic patterns. The behavioral or neural origin of these asymmetrical trajectories is, however, not identified.

Fig. 7 Example of asymmetric trajectory. Illustration of asymmetric movement trajectory in the experiment of Balasubramaniam et al. This trajectory is the result of a subject being instructed to synchronize the minimal points of the movement of the index finger to the beats of a metronome. (Adapted from Fig. 1 in Balasubramaniam et al. 2004, p. 130)


C. H. Waadeland

Synthesis of Asymmetric Movement Trajectories

How might the different empirically documented asymmetric movement curves be approximated in a model of rhythmic behavior? In Waadeland (2017) it is demonstrated that one possible solution is given by an application of the RFM model described in this chapter. As was shown in Fig. 4 in the previous section, a result of applying RFM to symmetric (sinusoidal) curves is that the trajectories become asymmetric in various ways. Based on this simple observation, the idea (ibid.) was to adjust the modulation parameters in Eq. (3) in such a way as to make the corresponding movement curves approximate the empirically documented movement trajectories in the experiment of Balasubramaniam et al. (2004), hereafter called the BWD experiment.

The performance conditions in the BWD experiment were the following. Subjects were instructed to synchronize their index finger movement to a metronome in two ways:

(i) Peak flexion on the beat: fON (i.e., to synchronize the minimal points of the movement to the beat of the metronome).
(ii) Peak extension on the beat: eON (i.e., to synchronize the maximal points of the movement to the beat),

moreover, to syncopate:

(iii) Peak flexion off the beat: fOFF (i.e., flexing to strike (midway) between beats),

and, in an unpaced condition:

(iv) Subjects were instructed to oscillate their index finger at a comfortable frequency and amplitude in the absence of a metronome.

In all these conditions, the index finger made no contact with any surface during the movement trials, i.e., the subjects performed “air tapping.” The conditions (i)–(iii) were conducted with three different metronome intervals: 1000 ms (1 Hz), 750 ms (1.33 Hz), and 500 ms (2 Hz). The kinematics of the movement trajectories of the index finger were recorded by a motion capture system. Starting with the basic RFM algorithm given in Eq. (3), it has been shown (Waadeland 2017) that all the movement trajectories in the BWD experiment, given by each of the four different performance conditions, may be approximated by an application of the following RFM formula:

$$ y(t) = 3\left\{ 1 - \cos\left( 2\pi f_c (t - \delta) + d \cos^2\left[ 2\pi \left( \tfrac{1}{2} f_c (t - \delta) \right) \right] \right) \right\}, \tag{4} $$

where

f_c = carrier frequency = frequency of metronome in the paced conditions,
d = peak frequency deviation, which is a measure of the “strength” of modulation,
δ = temporal transposition along the time axis, reflecting the different task specific performance conditions fON, fOFF, and eON.


It should be observed that Eq. (4) contains only two parameters, d and δ, in addition to the metronome frequency, fc , which is fixed within each condition of metronome speed.
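As a rough illustration of how the two parameters act, the sketch below evaluates Eq. (4) under the assumption that the modulation term d·cos²(·) sits inside the argument of the outer cosine, and measures what fraction of one cycle the curve spends rising. The function names and the sampling resolution are illustrative choices, not taken from Waadeland (2017).

```python
import math


def rfm_trajectory(t, fc, d, delta):
    """Evaluate Eq. (4), read as y = 3 * (1 - cos(carrier + modulation)).

    t and delta are in seconds, fc in Hz; d is the peak frequency deviation.
    The grouping of terms is an assumption about the printed formula.
    """
    u = fc * (t - delta)
    phase = 2.0 * math.pi * u + d * math.cos(math.pi * u) ** 2
    return 3.0 * (1.0 - math.cos(phase))


def rising_fraction(fc, d, delta, samples=2000):
    """Fraction of one period (1/fc) during which the trajectory increases."""
    period = 1.0 / fc
    ys = [rfm_trajectory(i * period / samples, fc, d, delta)
          for i in range(samples + 1)]
    rising = sum(1 for a, b in zip(ys, ys[1:]) if b > a)
    return rising / samples


if __name__ == "__main__":
    print(rising_fraction(fc=2.0, d=0.0, delta=0.1))   # unmodulated sinusoid
    print(rising_fraction(fc=2.0, d=1.85, delta=0.1))  # values used in Fig. 8
```

With d = 0 the curve is a plain sinusoid and rises for half of each cycle. With d = 1.85 the rising part takes up roughly 70% of the cycle, so the out and return phases have clearly different durations. Changing δ only shifts the curve along the time axis and leaves this fraction unchanged.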

Illustration: RFM Simulation of fON

In the fON condition, subjects were instructed to synchronize the minimal points of the movement of their index finger to the beats of the metronome. Figure 8 shows a comparison between an empirically recorded movement curve and a theoretically constructed simulation. The metronome interval in this illustration is 500 ms (2 Hz). To obtain a quantitative evaluation of how good this RFM simulation is, Waadeland (2017) applied MATLAB to carry out FFT analysis of the motion data in order to obtain insight into the different spectra of the finger movements related to the various performance conditions. These spectra were compared to the output of an FFT analysis of the RFM simulations in order to reveal to what extent the empirically documented movement trajectories and the corresponding RFM simulations share the same spectral properties. Figure 9 shows a comparison of the spectra of the trajectories in Fig. 8. Looking at Fig. 9, it is interesting to note that the RFM simulation has spectral components at the same frequencies as the fON performance, at 2 Hz, 4 Hz, 6 Hz, and 8 Hz, and that the relative magnitudes of the components in the RFM approximation match quite well the relative magnitudes of the corresponding components in the fON performance. At least this is the case for the two components with the largest amplitudes, at 2 Hz and 4 Hz, whereas the components at 6 Hz and 8 Hz in the simulation should have somewhat larger amplitudes to make an even better approximation. Observe, moreover, that the metronome tempo, 2 Hz, is reflected in the spectral component with the largest amplitude, whereas the components at 4 Hz, 6 Hz, and 8 Hz contribute to the asymmetric shape of the trajectory.

Fig. 8 RFM simulation of fON. The upper trajectory shows the performance of a subject in the fON condition, whereas the lower trajectory illustrates an RFM simulation (d = 1.85, δ = 0.1)


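[Figure: two amplitude spectra; horizontal axis: frequency, 0 to 10 Hz; vertical axis: relative magnitude, 0 to 100. Upper graph, labeled peaks (x in Hz, y): (2, 92.64), (4, 45.32), (6, 17.49), (8, 4.70). Lower graph, labeled peaks: (2, 91.35), (4, 45.18), (6, 11.05), (8, 1.743).]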

Fig. 9 Comparison of spectra. Illustration of the spectra of an fON performance, metronome 2 Hz (upper graph), and an RFM simulation of the performance, with strength of modulation: d = 1.85 (lower graph). Frequency is displayed along the horizontal axis, whereas the vertical axis displays relative magnitude of the amplitudes of the spectral components
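The spectral comparison can be approximated with a short sketch that samples the RFM simulation and computes its amplitude spectrum. This is not the MATLAB analysis of Waadeland (2017): it uses a plain discrete Fourier transform from the Python standard library, reads Eq. (4) with the modulation term inside the outer cosine, and the sampling rate of 64 Hz over a 1 s window is an arbitrary choice made so that bin k corresponds to k Hz.

```python
import cmath
import math


def rfm_trajectory(t, fc, d, delta):
    """Eq. (4), read as y = 3 * (1 - cos(carrier + modulation)); an assumption."""
    u = fc * (t - delta)
    phase = 2.0 * math.pi * u + d * math.cos(math.pi * u) ** 2
    return 3.0 * (1.0 - math.cos(phase))


def amplitude_spectrum(samples):
    """Naive DFT; entry k is the amplitude of the component in bin k (k >= 1)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        spectrum.append(2.0 * abs(coeff) / n)
    return spectrum


SAMPLE_RATE = 64  # samples per second; with a 1 s window, bin k is k Hz
samples = [rfm_trajectory(i / SAMPLE_RATE, fc=2.0, d=1.85, delta=0.1)
           for i in range(SAMPLE_RATE)]
spectrum = amplitude_spectrum(samples)

if __name__ == "__main__":
    for hz in (2, 4, 6, 8):
        print(hz, "Hz, relative to 2 Hz:", round(spectrum[hz] / spectrum[2], 3))
```

Because the window holds exactly two periods of the 2 Hz pattern, only the even bins carry energy. Under these assumptions the components at 4, 6, and 8 Hz come out at roughly 0.49, 0.12, and 0.02 of the 2 Hz component, which is close to the ratios of the peak values labeled in the lower graph of Fig. 9.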

If the frequencies of the spectral components are translated into musical note values, with the metronome tempo representing quarter notes, Fig. 9 illustrates that the live realization of the performance conditions in the BWD experiment, creating the various asymmetric movement curves, might be interpreted as rhythmic behavior that mirrors different integrated (sinusoidal, symmetric) performances of quarter notes, eighth notes, eighth note triplets, and sixteenth notes. As mentioned above, the subjects of the BWD experiment were performing “air tapping.” Moreover, air tapping was mentioned by Delignières and Torre (2011) as a timing task that appears ambiguous as to whether the subjects favor event-based or emergent timing control. In this connection, it is interesting to note that Delignières and Torre (ibid., p. 315) stated: “It is possible that during emergent epochs in air tapping the trajectory of the index finger should be smooth and harmonic, whereas during event-based epochs it should be more jagged, with the presence of systematic pauses before each downstroke.” Thus, emergent vs. event-based timing modes might be reflected in the degree of asymmetry of the movement trajectories. Since asymmetry in the RFM model is represented by the strength of modulation, determined by the peak frequency deviation, d, Waadeland (2017) suggested that a change between the two modes of timing control, emergent vs. event-based timing, might be reflected in the magnitude of the absolute value of d. However, further research is needed to obtain more knowledge about these matters.

Conclusion

As pointed out in the introduction of this chapter, an attraction to rhythmic behavior and experience of rhythm is deeply rooted in human nature. Human life, and in general all living beings, are subject to a chain of structured temporal events, cycles, and well-ordered patterns of movements. Different unfoldings of rhythmic organization are basic features of human communication and human behavior, e.g., in speech, music, dance, and sports, and are also fundamental to human biology, from the oscillatory nature by which the brain organizes neuronal information to our heartbeats, breathing patterns, and longer “biorhythms.” To study and get insight into the many multifaceted aspects of rhythm therefore requires an interdisciplinary approach that takes advantage of knowledge, results, methods, and experiences from different fields of scientific research. One very interesting and important contribution to such an approach is given by RITMO Centre for Interdisciplinary Studies in Rhythm, Time, and Motion, at the University of Oslo (see the web site RITMO (in the References)). At this web site, under the heading “About the Centre,” we read: “RITMO is an interdisciplinary research centre focused on rhythm as a structuring mechanism for the temporal dimensions of human life,” and we learn that the research center involves researchers within different fields of musicology, music technology, informatics, psychology, and neuroscience.

A main focus of this chapter has been to present a theoretical model of rhythm performance which is capable of making syntheses of various live performances of rhythm that are investigated and documented in empirical rhythm research. Moreover, the model is shown to be an interesting approach in the construction of approximations to movement trajectories observed in studies of timed rhythmic human behavior. The mathematical model is based on frequency modulation of idealized, sinusoidal, movement curves representing rhythmic structure.
As underlined in the introduction of this chapter, the model is strongly motivated by a comprehension of rhythm as a continuous phenomenon and the antique, Platonic understanding of rhythmos as “the order in the movement.” Movement, on the other hand, is often seen as a characteristic of a perception of a swinging, groovy rhythm. Or, as stated by the composer and jazz historian, Gunther Schuller, a rhythm is perceived as swinging when: “ . . . a listener inadvertently starts tapping his foot, snapping his fingers, moving his body or head to the beat of the music” (Schuller 1989, p. 223). In light of the Pythagorean view of music as a sonorous realization of mathematics (harmonic relations between numbers), and on the basis of how the presented mathematical model of rhythm is built upon an idea of interacting movements, it is tempting to make a twist of the Pythagorean understanding of the relation between mathematics and music and claim that this chapter demonstrates how swinging performances of rhythm and various rhythmic behavior might be reflected in or (in antique terminology) be a mimesis of groovy mathematics.

References

Audio Realization of FM Rhythm Synthesis of Swing. http://folk.ntnu.no/carlw/FMrhythm.html
Balasubramaniam R (2006) Trajectory formation in timed repetitive movements. In: Latash ML, Lestienne F (eds) Motor control and learning. Springer, New York, pp 47–54
Balasubramaniam R, Wing AM, Daffertshofer A (2004) Keeping with the beat: movement trajectories contribute to movement timing. Exp Brain Res 159:129–134
Barker A (1989) Greek musical writings II. Cambridge University Press, Cambridge
Beek PJ, Peper C, Daffertshofer A (2000) Timekeepers versus nonlinear oscillators: how the approaches differ. In: Desain P, Windsor L (eds) Rhythm perception and production. Swets & Zeitlinger, Lisse, pp 9–33
Bengtsson I, Gabrielsson A (1983) Analysis and synthesis of musical rhythm. In: Sundberg J (ed) Studies of music performance, vol 39. Royal Swedish Academy of Music, Stockholm, pp 27–59
Berger H (1929) Ueber das Elektroenkephalogramm des Menschen. Arch Psychiatr Nervenkrankh 87:527–570
Boenn G (2018) Computational models of rhythm and meter. Springer International Publishing, Cham
Butterfield M (2010) Participatory discrepancies and the perception of beats in jazz. Music Percept 27(3):157–175
Buzsáki G (2006) Rhythms of the brain. Oxford University Press, Oxford
Chowning JM (1973) The synthesis of complex audio spectra by means of frequency modulation. J Audio Eng Soc 21:526–534
Clarke EF (1999) Rhythm and timing in music. In: Deutsch D (ed) The psychology of music, 2nd edn. Academic, San Diego, pp 473–500
Dahl S (2004) Playing the accent – comparing striking velocity and timing in an ostinato rhythm performed by four drummers. Acta Acust Acust 90(4):762–776
Dahl S (2006) Movements and analysis of drumming. In: Altenmüller E, Wiesendanger M, Kesselring J (eds) Music, motor control and the brain. Oxford University Press, Oxford, pp 125–138
Dahl S (2011) Striking movements: a survey of motion analysis of percussionists. Acoust Sci Tech 32(5):168–173
Dahl S (2016) Movements, timing, and precision of drummers. In: Müller B, Wolf SI (eds) Handbook of human motion. Springer International Publishing, Cham
Delignières D, Torre K (2011) Event-based and emergent timing: dichotomy or continuum? A reply to Repp and Steinman (2010). J Mot Behav 43(4):311–318
Elliott MT, Welchman AE, Wing AM (2009) Being discrete helps keep to the beat. Exp Brain Res 192:731–737
Fraisse P (1982) Rhythm and tempo. In: Deutsch D (ed) The psychology of music. Academic, New York, pp 149–180
Friberg A, Sundström A (2002) Swing ratios and ensemble timing in jazz performance: evidence for a common rhythmic pattern. Music Percept 19:333–349
Friberg A, Bresin R, Sundberg J (2006) Overview of the KTH rule system for musical performance. Adv Cogn Psychol 2(2–3):145–161
Gabrielsson A (1999) The performance of music. In: Deutsch D (ed) The psychology of music, 2nd edn. Academic, San Diego, pp 501–602


Gabrielsson A (2003) Music performance research at the millennium. Psychol Music 31(3):221–272
Honing H, De Haas WB (2008) Swing once more: relating timing and tempo in expert jazz drumming. Music Percept 25(5):471–476
Hove MJ, Keller PE (2010) Spatiotemporal relations and movement trajectories in visuomotor synchronization. Music Percept 28(1):15–26
Keil C (1995) The theory of participatory discrepancies: a progress report. Ethnomusicology 39(1):19
Kirke A, Miranda ER (eds) (2013) Guide to computing for expressive music performance. Springer, London
MIDI Association. https://www.midi.org/specifications
Nusseck M, Wanderley MM, Spahn C (2017) Body movements in music performances: the example of clarinet players. In: Müller B, Wolf SI (eds) Handbook of human motion. Springer International Publishing, Cham
Otto WF (1955) Die Musen und der göttliche Ursprung des Singens und Sagens. Eugen Diederichs Verlag, Düsseldorf-Köln
Prögler JA (1995) Searching for swing: participatory discrepancies in the jazz rhythm section. Ethnomusicology 39:21–54
Repp BH, Steinman SR (2010) Event-based and emergent timing: synchronization, continuation, and phase correction. J Mot Behav 42(2):111–126
RITMO. https://www.uio.no/ritmo/english/
Saue S (2000) Implementing rhythmic frequency modulation. In: Waadeland CH (ed) Rhythmic movements and moveable rhythms – syntheses of expressive timing by means of rhythmic frequency modulation. PhD thesis, NTNU, Trondheim, pp 252–276
Schuller G (1989) The swing era: the development of jazz 1930–45. Oxford University Press, New York
Seashore CE (ed) (1932) University of Iowa studies in the psychology of music. The vibrato, vol 1. University of Iowa, Iowa City
Seashore CE (ed) (1937a) University of Iowa studies in the psychology of music. Objective analysis of musical performance, vol IV. University of Iowa, Iowa City
Seashore HG (1937b) An objective analysis of artistic singing. In: Seashore CE (ed) Objective analysis of musical performance. University of Iowa, Iowa City, pp 12–157
Sundberg J, Askenfelt A, Frydén L (1983) Musical performance: a synthesis-by-rule approach. Comput Music J 7:37–43
Torre K, Balasubramaniam R (2009) Two different processes for sensorimotor synchronization in continuous and discontinuous rhythmic movements. Exp Brain Res 199:157–166
Viviani P (1990) Common factors in the control of free and constrained movements. In: Jeannerod M (ed) Attention and performance XIII. Lawrence Erlbaum Associates, Hillsdale, pp 345–373
Waadeland CH (2000) Rhythmic movements and moveable rhythms – syntheses of expressive timing by means of rhythmic frequency modulation. PhD thesis, NTNU, Trondheim
Waadeland CH (2001) “It don’t mean a thing if it ain’t got that swing” – simulating expressive timing by modulated movements. J New Music Res 30:23–37
Waadeland CH (2003) Analysis of jazz drummers’ movements in performance of swing grooves – a preliminary report. In: Bresin R (ed) Proceedings of SMAC 03, Stockholm Music Acoustic Conference 2003, pp 573–576
Waadeland CH (2006) Strategies in empirical studies of swing groove. Stud Musicol Nor 32:169–191
Waadeland CH (2011) Rhythm performance from a spectral point of view. In: Jensenius AR, Tveit A, Godøy RI, Overholt D (eds) Proceedings of the International Conference on New Interfaces for Musical Expression, University of Oslo, pp 248–251
Waadeland CH (2017) Synthesis of asymmetric movement trajectories in timed rhythmic behaviour by means of frequency modulation. Hum Mov Sci 51:112–124


Waadeland CH, Saue S (2018) Modulated swing: dynamic rhythm synthesis by means of frequency modulation. In: Aramaki M, Matthew EPD, Kronland-Martinet R, Ystad S (eds) Music technology with swing, CMMR 2017, LNCS 11265. Springer, Switzerland, pp 135–150
Wing AM, Beek PJ (2002) Movement timing: a tutorial. In: Prinz W, Hommel B (eds) Attention and performance, vol 19. Oxford University Press, Oxford, pp 202–226

Music, Dance, and Differential Equations

68

Lorelei Koss

Contents
Introduction
Music
Sound Generation
Musical Composition
Dance
Dance Movement
Choreography
Summary
Cross-References
References

Abstract

We describe some connections between differential equations, music, and dance. Some of these connections involve using differential equations to understand and explain the physical world around us, such as modeling sound vibrations or the motion of dancers. Other applications involve composers or choreographers using ideas from differential equations as part of their creative process.

Keywords

Differential equations · Arts · Music · Dance · Choreography · Musical composition

L. Koss
Department of Mathematics and Computer Science, Dickinson College, Carlisle, PA, USA
e-mail: [email protected]

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_114


L. Koss

Introduction

This chapter aims to survey a variety of ways that ideas from differential equations can be connected to the creation and performance of music and dance. Some of these connections are explicit and explain how differential equations help us understand sound and movement. Other examples are less explicit and illustrate how artists or composers use ideas from differential equations in their creative process. Ultimately, the examples from this chapter are intended to help people think about music and dance through the lens of mathematics. Although this chapter is split into separate sections covering music and dance, some differential equation models, such as chaotic dynamical systems and the wave equation, appear in both sections. This chapter does assume some basic familiarity with differential equations. For the simpler differential equations, we provide a short reminder of how the differential equation is constructed. However, some examples use very complicated differential equations, and we skip the mathematical background in these sections. This chapter is an expansion of a 2016 article on the connections between differential equations, music, and dance (Koss, 2016) that focused more on using these models as teaching tools.

Music

This section describes some ways that differential equations have been applied to the field of music. First, we describe how a differential equation called the wave equation can be used to model sound generation on stringed instruments and synthesizers. Second, we discuss how composers have used chaotic differential equations as part of their creative process.

Sound Generation

To produce sound on a stringed instrument such as a guitar, the string is fixed at both ends and then stretched to a certain height. Releasing it forms a wave that moves to the left and a wave that moves to the right from the point of release. In the examples in this section, a differential equation helps us understand different types of sounds that a guitar makes, which composers and performers then use in their creative process. The description of the wave equation can be found in any introductory partial differential equations textbook, but we include an overview here. Readers who are familiar with this derivation can skip to Eq. 4. To model this situation, let y(x, t) denote the vertical displacement of a string from the x-axis at position x and time t, where the left (x = 0) and right (x = k, for some constant k) ends of the string are fixed at height 0,

$$ y(0, t) = 0, \qquad y(k, t) = 0, \tag{1} $$

for all t. These assumptions are illustrated in Fig. 1. Using Newton’s Law applied to a small piece of the string results in a differential equation called the wave equation:

$$ a^2 \frac{\partial^2 y}{\partial x^2} = \frac{\partial^2 y}{\partial t^2}, \tag{2} $$

where 0 ≤ x ≤ k, t > 0, and a is a constant. The solution to the wave equation is

$$ y(x, t) = \sum_{n=1}^{\infty} \left[ \alpha_n \cos\left(\frac{a n \pi t}{k}\right) + \beta_n \sin\left(\frac{a n \pi t}{k}\right) \right] \sin\left(\frac{n \pi x}{k}\right), \tag{3} $$

where

$$ \alpha_n = \frac{2}{k} \int_0^k f(x) \sin\left(\frac{n \pi x}{k}\right) dx, \tag{4} $$

$$ \beta_n = \frac{2}{a n \pi} \int_0^k g(x) \sin\left(\frac{n \pi x}{k}\right) dx. \tag{5} $$

Here f(x) = y(x, 0) is the initial shape of the string and g(x) is its initial velocity, the value of ∂y/∂t at t = 0.

[Figure: the string drawn as a curve over the x-axis from x = 0 to x = k, with the point (x, y(x, t)) marked; axes x and y]
Fig. 1 Wave equation model of a stringed instrument

The n = 1 term in Eq. 3 is called the fundamental and produces the lowest pitch. The n > 1 terms are called overtones; they correspond to higher pitches. Figure 2 shows the graphs of the fundamental and some of the overtones. James Hughes (2000) discussed how the wave equation can be used to understand how the performer can affect the tone of a classical guitar. Tone can be adjusted in two ways. First, the performer can dampen the guitar string at a point k/n, which varies the harmonics. Dampening the guitar string at this point removes the fundamental, which results in a higher tone, and many of the overtones, which results in a softer tone. The tone can also be affected by changing the horizontal location at which the string is plucked. For example, sol tasto means to pluck a string near the fingerboard, and sol ponticello means to pluck a string near the bridge, as shown in Fig. 3.
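The effect of damping at the point k/n can be sketched from Eq. 3 under an idealized assumption: lightly touching the string there forces a node, so only those modes whose shape sin(mπx/k) already vanishes at x = k/n keep sounding. The function name and the cutoff of twelve modes are illustrative choices.

```python
import math


def surviving_modes(n, max_mode=12, tol=1e-9):
    """Modes m of Eq. 3 whose shape sin(m*pi*x/k) has a node at x = k/n.

    Idealized assumption: damping at k/n silences every mode that moves there.
    """
    return [m for m in range(1, max_mode + 1)
            if abs(math.sin(m * math.pi / n)) < tol]


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"damping at k/{n}: modes {surviving_modes(n)} remain")
```

In this idealization, damping at the midpoint k/2 leaves only the even-numbered modes, so the fundamental (m = 1) disappears and the lowest remaining pitch is that of the n = 2 term.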

[Figure: graphs of the fundamental (n = 1) and the overtones n = 2 and n = 3, plotted as y against x from 0 to k]
Fig. 2 Varying tone using harmonics

Fig. 3 Locations on guitar for sol tasto and sol ponticello

Plucking near the fingerboard produces a mellow tone, and plucking near the bridge produces a metallic tone. Using a computer analysis, Hughes shows that sol tasto is a result of lower overtones dominating higher overtones and sol ponticello is a result of higher overtones dominating lower overtones. To model this, he uses a piecewise function yj (x, 0) = fj (x) to describe the shape of the string after it is plucked at different locations on the fingerboard, where  is a small constant representing the initial vertical displacement that occurs when plucking the string at position j , where 0 < j < k. Figure 4 shows the graph of fj for a sol ponticello position. Hughes computes a formula for the strength |αn,j | (from Eq. 4) of the nth partial of a plucked string with initial shape fj (x) using calculus. Figure 5 shows the graph of |αn,j | as a function of j for a few small values of n. Here, we can see that when j is close to k/2, or sol tasto, the higher overtones have less strength in relation to the fundamental. Similarly, the higher overtones have greater strength when j is close to k, or sol ponticello. The use of sol ponticello techniques to create a harsher tone can be found in sixteenth-century violin compositions (Boyden, 1965). Books from the nineteenth century explain how to create these sounds on the guitar (Josel and Tsao, 2014). These techniques can also be used on the guitar or violin to imitate other instruments. Another way that sol ponticello and sol tasto are used in classical guitar

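68 Music, Dance, and Differential Equations

Fig. 4 Model of location where string is plucked (string from x = 0 to x = k, pulled to height ε at x = j)

Fig. 5 Sol tasto and sol ponticello (|α(n)| plotted against j for n = 1, 2, 3, with j running up to k; sol tasto and sol ponticello marked on the j axis)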

performance is to produce a different sounding echo effect when musical material is repeated. As one example, this effect was used by Peter Fricker in his Paseo, Op. 61 (Fricker, 1971). Antonis Hatzinikolaou has released a CD containing a performance of Fricker’s Paseo (Hatzinikolaou, 2013).

The wave equation can also be used to model sound in instruments that don’t have strings. Rachel Hall and Kresimir Josic use the wave equation to analyze a Norwegian folk instrument called a willow flute (Hall and Josic, 2001). They also perform an analysis of two-dimensional instruments such as drums and bells and give an overview of how one might approach three-dimensional instruments such as the marimba, the glockenspiel, and the instruments of the Indonesian gamelan orchestra.

Systems of differential equations can also be used to synthesize sound. Nikolaos Stefanakis et al. (2015) used systems of real and complex nonlinear dynamical systems to generate sound. They connect the coefficients of the differential equations to pitch, pitch bend, decay rate, and attack time. The book by Rudolf Rabenstein and Lutz Trautmann (2003) gives detailed explanations of the physics of instruments that can be described by the wave equation and connects them to advanced techniques for simulating these sounds using a computer.
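Returning to the plucked-string model of Fig. 4, the dependence of overtone strength on plucking position can be checked numerically. The following Python sketch assumes that f_j is a triangle (straight segments from the fixed ends at x = 0 and x = k up to height ε at x = j) and that α_{n,j} is the coefficient of sin(nπx/k) in the Fourier sine series of f_j. Under those assumptions the coefficient has the closed form 2εk² sin(nπj/k) / (n²π² j(k − j)). The function names, the default values k = 1 and ε = 0.01, and the numerical cross-check are illustrative choices and are not taken from Hughes (2000).

```python
import math


def pluck_coefficient(n, j, k=1.0, eps=0.01):
    """Closed-form Fourier sine coefficient of a triangular pluck.

    Assumed model: the string spans 0 <= x <= k and is pulled up to
    height eps at x = j, with straight segments down to the fixed ends.
    """
    return 2.0 * eps * k**2 * math.sin(n * math.pi * j / k) / (
        n**2 * math.pi**2 * j * (k - j)
    )


def pluck_coefficient_numeric(n, j, k=1.0, eps=0.01, steps=20000):
    """Same coefficient by midpoint-rule integration, as a cross-check."""

    def shape(x):
        # Piecewise-linear initial shape f_j(x).
        return eps * x / j if x <= j else eps * (k - x) / (k - j)

    h = k / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += shape(x) * math.sin(n * math.pi * x / k)
    return 2.0 / k * total * h


def relative_strength(n, j, k=1.0):
    """Strength of partial n relative to the fundamental, |a_n| / |a_1|."""
    return abs(pluck_coefficient(n, j, k)) / abs(pluck_coefficient(1, j, k))
```

Under these assumptions, plucking at j = k/2 gives a third partial with one ninth the strength of the fundamental and suppresses the even partials, while moving j toward k raises the relative strength of the higher partials, in line with Fig. 5.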

Musical Composition

In 1988, Jeff Pressing (1988) published the first paper on using chaotic discrete dynamical systems in algorithms to generate music. Pressing was interested in investigating whether musical composition of this type was useful, and, in his opinion, he found the results were “idiosyncratic, but show[ed] a listenable degree of structural consistency.” Pressing concluded by suggesting a number of possible extensions that might produce better results, one example of which was the use of continuous variables. Rick Bidlack (Bidlack, 1990, 1992) extended Pressing’s study to differential equations when he composed a twelve-part Canon Dodecanon I using the Lorenz system. The Lorenz equations, developed by the meteorologist Edward Lorenz in 1963, are given by the following system of three differential equations,

\[
\begin{aligned}
\frac{dx}{dt} &= \sigma (y - x), \\
\frac{dy}{dt} &= Rx - y - xz, \\
\frac{dz}{dt} &= xy - Bz,
\end{aligned}
\tag{6}
\]

where σ, R, and B are positive constants. The Lorenz system exhibits chaotic behavior and also has an attractor. Figure 6 shows the attractor when σ = 10, B = 8/3, and R = 28. Two solutions that start close together can end up relatively far apart (this is the chaotic behavior) but must stay close to the curve shown in Fig. 6 as time progresses (due to the presence of the attractor). Bidlack used the three variables x, y, and z to represent the three musical dimensions of frequency (pitch), loudness, and duration of each note. For example, as a point moves along the attractor, the change in its x-coordinate dictates the change in pitch over time. At times, the changes in these variables are slight, and at other times, changes are significant. Bidlack discussed other models of chaotic behavior in his paper, and he noted that all such models exhibit “sudden changes in texture and range of values” (Bidlack, 1990, 1992). An excerpt from Dodecanon I is available online (Bidlack, 1995). Bidlack saw great potential in using differential equations as a tool for interesting and varied musical compositions.

Fig. 6 The attractor of the Lorenz system

In the early 1980s, Leon Chua developed a chaotic system that could be constructed in an electronic circuit. Chua’s circuit is defined by the system

\begin{align}
\frac{dx}{dt} &= \alpha (y - x - f(x)), \tag{7} \\
\frac{dy}{dt} &= x - y + z, \tag{8} \\
\frac{dz}{dt} &= -\beta y, \tag{9}
\end{align}

where α and β are constants and f(x) is a nonlinear function. The attractor for the Chua system for the parameters α = 0.57, β = 0.5, and f(x) = 0.57(−0.5x − 1.5(|x + 1| − |x − 1|)) is shown in Fig. 7.

Fig. 7 The attractor of the Chua system

It is possible to get sound from Chua’s circuit (Bertacchini et al., 2013; Mayer-Kress et al., 1993; Rodet, 1993; Sears, n.d.a) using an approach similar to Bidlack’s in which the x, y, and z coordinates correspond to different musical dimensions. The movement of a point along a trajectory determines the sounds in the composition. Composers have used the chaotic behavior of Chua’s circuit to compose works that give an auditory representation of chaos. Nick Sears and Nathan Ruyle were the sound designers of a staging of José Rivera’s Obie-winning play Marisol in 2001 at Southern Illinois University-Edwardsville. Sears and Ruyle used music from Chua’s oscillator to highlight the setting of apocalyptic New York City (Sears, 2007). Chua’s circuit was chosen as part of the sound design in the second act to “create a surreal and chaotic soundscape to accent the script’s shift in tone” (Sears, n.d.b). The score and audio files are available online (Sears, 2007).

There is a large body of work on using chaos in composition. Diana Dabby (1996), KB McAlpine et al. (1997), Jonathon Salter (2009), and David Borgo (2005) have all used ideas from mathematical chaos explicitly or as inspiration for their work. Andres Eduardo Coca Salazar et al. (2011) published an algorithm on generating melodies from both discrete dynamical systems and systems of differential equations, including the Lorenz and Chua systems, as well as others. In addition, they performed a statistical study of how music generated from chaotic systems differs from classical compositions.
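A mapping in the spirit of Bidlack’s can be sketched in a few lines of Python. The sketch below integrates Eq. 6 with a classical fourth-order Runge-Kutta step for σ = 10, R = 28, and B = 8/3, then maps x to a pitch number, y to loudness, and z to duration. The step size, the sampling interval, the value ranges used for scaling, and the use of MIDI-style pitch and loudness numbers are all assumptions made for illustration; Bidlack’s actual mapping is not reproduced here.

```python
def lorenz(state, sigma=10.0, R=28.0, B=8.0 / 3.0):
    """Right-hand side of the Lorenz system (Eq. 6)."""
    x, y, z = state
    return (sigma * (y - x), R * x - y - x * z, x * y - B * z)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def trajectory(start, steps, dt=0.01):
    """List of points on the orbit through `start`, including the start."""
    points = [start]
    for _ in range(steps):
        points.append(rk4_step(lorenz, points[-1], dt))
    return points


def scale(value, lo, hi, out_lo, out_hi):
    """Map value linearly from [lo, hi] to [out_lo, out_hi], clamping."""
    t = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    return out_lo + t * (out_hi - out_lo)


def to_notes(points, every=25):
    """Sample the orbit and map (x, y, z) to (pitch, loudness, duration).

    The input ranges are rough, assumed bounds for the attractor; the
    output ranges (MIDI notes 48-84, loudness 30-120, durations of
    0.25-2 beats in quarter-beat steps) are arbitrary illustrative choices.
    """
    notes = []
    for x, y, z in points[::every]:
        pitch = round(scale(x, -20.0, 20.0, 48, 84))
        loudness = round(scale(y, -30.0, 30.0, 30, 120))
        duration = round(scale(z, 0.0, 50.0, 0.25, 2.0) * 4) / 4
        notes.append((pitch, loudness, duration))
    return notes
```

Calling `to_notes(trajectory((1.0, 1.0, 1.0), 3000))` yields a list of note triples. Repeating the call from a start point that differs in the sixth decimal place gives a sequence that begins identically and later diverges while staying in the same ranges, which is the behavior described above for nearby solutions on the attractor.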

Dance

This section describes applications of differential equations to dance through both explicit and implicit connections. First, we present an analysis of how differential equation models can help us understand a specific movement in ballroom dance. Second, we discuss dance choreography that utilizes differential equations in a variety of ways.

Dance Movement

In ballroom dancing, sway is a technique used to help dancers complete turns in certain steps. Sway helps to improve balance and is also considered to give a pleasing visual effect. Traditionally, sway is defined as the inclination of the body away from the foot to the inside of a turn. In the early twentieth century, dance instructors developed an understanding of kinetic techniques such as sway to allow dancers to complete turns with grace and fluidity (Buckland, 2018). In Shioya (2018), Tadashi Shioya uses differential equations to develop a better explanation of sway. Shioya defines two new terms: inclination sway is defined as the inclination of the center of gravity to the vertical axis, and bending sway is the bending of the body at the hips in the side direction. The dancers shown in Fig. 8a are completing a turn and exhibiting bending sway in their positioning. The purple line in Fig. 8b shows a schematic of the dancer’s body, hinged at the hips at (x_H, y_H). The center of gravity of the upper portion is denoted by (x_U, y_U) with mass m_U, and the center of gravity of the lower part is denoted by (x_L, y_L) with mass m_L. The particular movement of the dancer’s body during ballroom dancing allows for the assumption that the y components are constant. The angle θ indicates the inclination of the dancer’s body, and φ represents the bending angle at the hinge. These can be expressed as

\[
\theta = \frac{m_U x_U}{m_U y_U + m_L y_L} + \frac{m_L x_L}{m_U y_U + m_L y_L},
\qquad
\varphi = \frac{x_U}{y_U - y_H} - \frac{y_U}{y_L (y_U - y_H)}\, x_L .
\]

The following differential equation relates the movement of θ to the other parameters,

\[
\frac{d^2\theta}{dt^2} = \frac{m_U y_U + m_L y_L}{m_U y_U^2 + m_L y_L^2}\, g\theta
+ \frac{m_U m_L y_L (y_U - y_H)(y_U - y_L)}{(m_U y_U + m_L y_L)(m_U y_U^2 + m_L y_L^2)}\, \frac{d^2\varphi}{dt^2},
\tag{10}
\]

where g is the acceleration of gravity. The first component in Eq. 10 represents the solution to a simplified model in which the hinge angle is fixed. The second component represents the acceleration when the swing angle changes. Shioya splits his investigation of sway into a progression of three components: the sway developing stage, the sway maintaining stage, and the sway diminishing stage. He uses Eq. 10 to investigate the change in inclination sway θ and bending sway φ through these three stages. For example, in a three-step waltz turn, inclination sway begins at the initiation of the movement and diminishes by the end. In contrast, bending sway is not necessary at the initiation of the movement, but it increases during the turn to allow the dancer to balance at the end, as is shown in Fig. 8a. Shioya’s work helps determine the necessary positioning of a dancer’s body and clarifies the different roles that inclination sway and bending sway play in ballroom dance. In this way, Shioya’s model improves on the original definition and explanation of sway. Shioya concludes by noting that dancers could use the analysis of this differential equation to improve their sway movement.

Fig. 8 Sway in ballroom dance. (a) Dancers. (b) Variables for the differential equation describing sway. Attribution for (a): Michaelfoskett/CC BY-SA (https://creativecommons.org/licenses/by-sa/3.0)
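The quantities in the sway model can be made concrete with a short Python sketch. It evaluates θ and φ from the positions of the two centers of gravity and computes the two coefficients of Eq. 10, read here as θ'' = a·θ + b·φ''. The function names and any numerical values passed to them (masses in kilograms, heights in meters) are hypothetical and are not taken from Shioya (2018); the closed form used for the fixed-hinge case additionally assumes φ'' = 0 and that the dancer starts with θ'(0) = 0.

```python
import math


def sway_angles(xU, yU, xL, yL, yH, mU, mL):
    """Return (theta, phi) from the expressions given for the two sways."""
    weighted_height = mU * yU + mL * yL
    theta = (mU * xU + mL * xL) / weighted_height
    phi = xU / (yU - yH) - yU * xL / (yL * (yU - yH))
    return theta, phi


def eq10_coefficients(yU, yL, yH, mU, mL, g=9.81):
    """Coefficients (a, b) in theta'' = a * theta + b * phi'' (Eq. 10)."""
    weighted_height = mU * yU + mL * yL
    second_moment = mU * yU**2 + mL * yL**2
    a = g * weighted_height / second_moment
    b = mU * mL * yL * (yU - yH) * (yU - yL) / (weighted_height * second_moment)
    return a, b


def theta_fixed_hinge(theta0, a, t):
    """theta(t) when phi'' = 0 and theta'(0) = 0.

    With the hinge angle fixed, Eq. 10 reduces to theta'' = a * theta,
    whose solution from rest is theta0 * cosh(sqrt(a) * t).
    """
    return theta0 * math.cosh(math.sqrt(a) * t)
```

As a consistency check, a dancer who leans as a straight line (x_U / y_U = x_L / y_L) has φ = 0 and θ equal to that common ratio, so the two formulas separate leaning from bending as the definitions intend. In the fixed-hinge case the inclination grows over time, which suggests why a changing bending angle is needed to regain balance at the end of a turn.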

Choreography

In this section, we discuss some connections between choreography and differential equations. In the first four examples, artists explicitly use differential equation models as part of their creative process. We conclude with an example in which the use of differential equations in the choreography is not explicit but contains a connection to an important classical example.

Three-Body Problem

The three-body problem was originally formulated by Newton when he studied the motions of the Earth, Moon, and Sun. A special case, called the Pythagorean three-body problem, dates to work by Meissel in 1893 but was not solved until 1967, when Szebehely and Peters used numerical techniques to find solutions. The Pythagorean three-body problem is formulated as follows: three bodies x_0, x_1, and x_2 with masses m_0 = 3, m_1 = 4, and m_2 = 5 are placed on the vertices of a 3-4-5 right triangle, as shown in Fig. 9. The bodies start at rest, so dx_i/dt = 0 at t = 0 for 0 ≤ i ≤ 2, and move based on Newton’s Laws of Gravitation. We also define

\[
r_0 = |x_2 - x_1|, \qquad r_1 = |x_0 - x_2|, \qquad r_2 = |x_1 - x_0| .
\]

The equations describing the motion of the particles are

\[
\begin{aligned}
\frac{d^2 x_0}{dt^2} &= m_1 \frac{x_1 - x_0}{r_2^3} + m_2 \frac{x_2 - x_0}{r_1^3}, \\
\frac{d^2 x_1}{dt^2} &= m_2 \frac{x_2 - x_1}{r_0^3} + m_0 \frac{x_0 - x_1}{r_2^3}, \\
\frac{d^2 x_2}{dt^2} &= m_0 \frac{x_0 - x_2}{r_1^3} + m_1 \frac{x_1 - x_2}{r_0^3}.
\end{aligned}
\]

Fig. 9 Three-body start

An approximation of the initial movements of the three bodies is shown in Fig. 10, drawn using Mathematica code written by Michael Trott (CC BY-NC-SA, https://creativecommons.org/licenses/by-nc-sa/3.0/).

Fig. 10 Three-body trajectory

Edward Warburton, Greg Laughlin, Karlton Hester, Lyès Belhocine, and Drew Detweiler collaborated on a dance piece called “The Three-Body Project” based on the Pythagorean three-body problem (Scudellari, 2016; Warburton and Laughlin, 2020). The project was interdisciplinary and featured a collaboration between a choreographer, an astronomer, three dancers, a musician, and two experts in technology. The goal was to work “with the predefined conditions–precise constraints on spatial arrangement, acceleration, timings and trajectories–to create dances that explore feelings of longing, connection and isolation as bodies are flung apart in response to the same gravitational forces that draw them together” (Warburton and Laughlin, 2020). Each dancer wore an LED light that was tracked and projected onto the scenery behind them, showing a tracing of their movement. The team extended their collaboration to additional related projects that focused on other situations that exhibit periodic solutions as well as sensitive dependence on initial conditions. Videos of the performance can be found at Warburton’s website (Warburton, n.d.).
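The initial motion of the three bodies can be approximated with a short Python sketch of the equations of motion given above (with the gravitational constant taken as 1, as in those equations). The starting coordinates are an assumption: each mass is placed opposite the side of the triangle whose length equals its value, with the center of mass at the origin, which may differ from the orientation drawn in Fig. 9. The integrator is a fixed-step velocity Verlet scheme chosen for brevity; it is not the method of Szebehely and Peters, and a fixed step is only adequate before the bodies pass close to one another.

```python
import math

MASSES = (3.0, 4.0, 5.0)
# Assumed layout (hypothetical orientation): a 3-4-5 right triangle with
# each mass opposite the side of matching length, centre of mass at (0, 0).
START = ((1.0, 3.0), (-2.0, -1.0), (1.0, -1.0))


def accelerations(pos, masses=MASSES):
    """Gravitational acceleration of each body with G = 1."""
    acc = []
    for i, (xi, yi) in enumerate(pos):
        ax = ay = 0.0
        for j, (xj, yj) in enumerate(pos):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            r3 = math.hypot(dx, dy) ** 3
            ax += masses[j] * dx / r3
            ay += masses[j] * dy / r3
        acc.append((ax, ay))
    return acc


def energy(pos, vel, masses=MASSES):
    """Total energy (kinetic plus potential), conserved by the exact flow."""
    kinetic = sum(0.5 * m * (vx * vx + vy * vy) for m, (vx, vy) in zip(masses, vel))
    potential = 0.0
    for i in range(3):
        for j in range(i + 1, 3):
            potential -= masses[i] * masses[j] / math.dist(pos[i], pos[j])
    return kinetic + potential


def simulate(t_end, dt=1e-3, pos=START, masses=MASSES):
    """Velocity Verlet from rest; returns (positions, velocities) at t_end."""
    pos = [tuple(p) for p in pos]
    vel = [(0.0, 0.0)] * 3
    acc = accelerations(pos, masses)
    for _ in range(round(t_end / dt)):
        # Half kick, drift, recompute forces, half kick.
        vel = [(vx + 0.5 * dt * ax, vy + 0.5 * dt * ay)
               for (vx, vy), (ax, ay) in zip(vel, acc)]
        pos = [(x + dt * vx, y + dt * vy) for (x, y), (vx, vy) in zip(pos, vel)]
        acc = accelerations(pos, masses)
        vel = [(vx + 0.5 * dt * ax, vy + 0.5 * dt * ay)
               for (vx, vy), (ax, ay) in zip(vel, acc)]
    return pos, vel
```

Comparing `energy` before and after a call such as `simulate(1.0)` gives a quick check on the step size. For longer runs, in which pairs of bodies approach each other closely, an adaptive or regularized integrator would be the safer choice.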

Influenced by Chaos

Section “Musical Composition” discusses how composers use ideas from chaos to create new musical pieces. Elizabeth Bradley and Joshua Stuart choreograph dance using a technique employing symbolic dynamics, rigid-body mechanics, computational geometry, graph theory, statistics, computational linguistics, and a chaotic system of ordinary differential equations (Bradley and Stuart, 1998). The specific details of their methods are beyond the scope of this chapter, so we give just a broad overview of their technique.

Bradley and Stuart’s mathematical choreography is based on a sequence of body movements that represents a dance step. As a simple example, they look at a dancer jumping and break the movement into seven different body positions, much like an old-fashioned cartoon would be constructed from seven different drawings. Using techniques from symbolic dynamics and graph theory, they associate the sequence of these movements with a trajectory, or a path along a solution curve, in the chaotic dynamical system. They then start with a small change in the initial condition and follow the trajectory of this new point. This new trajectory will differ from the original because of sensitive dependence on initial conditions but will still vaguely resemble the original as a type of variation on a theme because the system has an attractor.

Of course, the human body adds restrictions to the movements that are possible. In Bidlack’s technique of musical composition, described in section “Musical Composition”, any note can follow any other note. The same is not true of dance: you may not be able to move immediately from one body position to some other body position. Bradley and Stuart sometimes have to modify the movements dictated by the computer by adding additional postures in between positions that are too different in nature. Instead of doing this by hand, they use graph theory to make these changes. Videos of the original movements and the variations, including examples produced using the Lorenz and Rössler equations, can be seen online (Bradley, n.d.). Bradley and David Capps created a performance piece called Con/cantation: chaotic variations that showcased a live dancer with animations of variations created by Bradley and Stuart’s process.

Choreography Using Waveforms

Katie Roy used the graphs of the solutions to the wave equation (Eq. 2) as the foundation for the choreography of a dance called musicality. Roy found one piece of music, La valse d’Amélie, performed in three different versions: original, piano, and orchestral. She used GarageBand to view the waveforms and noticed distinct features in all three. Trained in classical dance, Roy used ballet vocabulary and a process of listening and viewing the waveforms as the core of her choreographic process (Roy and Koss, 2015). The three waveforms are shown in Fig. 11. Roy noticed that the waveform of the original version, shown in Fig. 11a, was sharp, mechanical, and repetitive, so her choreography exhibited these features. For the piano version, shown in Fig. 11b, the waveform was smoother and featured peaks that tapered off slowly. The orchestral waveform, in Fig. 11c, was smaller, more delicate, and more intricate than the previous two. The choreography throughout the dance mimicked the shapes that were determined by the wave equation. Figure 12 is a photo from the performance of musicality and has the waveform of the music projected live behind the dancers.

Fig. 11 Waveforms for three versions of La valse d’Amélie. (a) Original. (b) Piano. (c) Orchestral

Fluid Dynamics

Hope Goldman and Andrew Moffat have choreographed and performed a piece based on fluid dynamics using the Navier-Stokes equations (Moffat, n.d.), partial differential equations that describe the flow of incompressible fluids. Moffat used an infrared camera to capture Goldman’s movement in real time and then used a computer simulation to compute velocity fields representing her motion. These velocity fields were projected onto the dancer and the background. Goldman’s movements appeared to stir the scenery around her. In this way, the dancer was simultaneously determining and interacting with the scenery. A photo of the performance can be seen in Fig. 13, and video of the performance can be seen at Goldman and Moffat (2012).

Movement of a Pendulum

In this final example, the choreographer did not explicitly use differential equations when creating the work, but there is a strong connection between the dance and a classical example that appears in typical undergraduate courses in differential equations. Most introductory textbooks include a model of a swinging pendulum.


Fig. 12 Musicality performance. (Photo credit: A. Pierce Bounds/Dickinson College)

Fig. 13 Photo credit: Daniel R. James. Concept and Direction: Hope Goldman and Andrew Moffat, Choreography and Performance: Hope Goldman, Visual Art and Programming: Andrew Moffat

William Forsythe has choreographed a number of dance pieces in which a dancer interacts with swinging pendulums. Nowhere and Everywhere at the Same Time (Forsythe, 2005) was first performed in New York in 2005. Forsythe has adapted the piece to different venues and incorporated different numbers of dancers, including some performances in which audience members participated.

Fig. 14 William Forsythe: Nowhere and Everywhere at the same time (6), ©Marc-Henri Le Noir

The simplest pendulum model describes a light rigid rod of mass 0 and length 1, with a bob of mass m at one end. The pendulum can move in a circular motion in a plane that is perpendicular to the ground. If the position of the bob at time t is described by the angle b(t), measured clockwise, with b = 0 corresponding to the bob pointing straight down, then the movement of the pendulum can be described by the differential equation

\[
\frac{d^2 b}{dt^2} + mg \sin b = 0. \tag{11}
\]
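The behavior of this model can be explored with a short Python sketch. It rewrites the second-order equation as a first-order system in the angle b and its rate of change v, and integrates it with a classical fourth-order Runge-Kutta step. The coefficient multiplying sin b (the product mg in Eq. 11) is passed as a single parameter `c`; the function names, the step count, and the release-from-rest initial condition are illustrative assumptions.

```python
import math


def pendulum_rhs(b, v, c):
    """First-order form of b'' + c * sin(b) = 0, with v = db/dt."""
    return v, -c * math.sin(b)


def rk4_step(b, v, c, dt):
    """One classical fourth-order Runge-Kutta step for (b, v)."""
    k1b, k1v = pendulum_rhs(b, v, c)
    k2b, k2v = pendulum_rhs(b + 0.5 * dt * k1b, v + 0.5 * dt * k1v, c)
    k3b, k3v = pendulum_rhs(b + 0.5 * dt * k2b, v + 0.5 * dt * k2v, c)
    k4b, k4v = pendulum_rhs(b + dt * k3b, v + dt * k3v, c)
    b += dt / 6.0 * (k1b + 2 * k2b + 2 * k3b + k4b)
    v += dt / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return b, v


def swing(b0, c, t_end, steps=2000):
    """Release the bob from rest at angle b0 and integrate to t_end."""
    b, v = b0, 0.0
    dt = t_end / steps
    for _ in range(steps):
        b, v = rk4_step(b, v, c, dt)
    return b, v


def energy(b, v, c):
    """Quantity conserved by the equation: v**2 / 2 - c * cos(b)."""
    return 0.5 * v * v - c * math.cos(b)
```

For a small release angle the bob returns almost exactly to its starting point after the small-angle period 2π/√c, while for a large release angle it does not, because the period of the nonlinear pendulum grows with amplitude. A dancer moving among pendulums released from different heights would therefore meet them at slowly shifting relative timings.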

Forsythe’s use of the pendulums varied for different performances in this series. Sometimes the dancers started the motion of the pendulums as part of the dance, and sometimes the pendulums moved automatically. The movement of the dancers was influenced by the movement of the pendulums, but dancers were able to use their bodies expressively under these constraints. Figure 14 shows a performance in Paris in 2017, and a video of the performance can be seen on The Forsythe Company’s website (Forsythe, 2013). The movement of the pendulums allows the dancer’s body to “become freer in terms of its movement creation, rather than following a set notation to fit their body in” (Kato, 2015).


Summary

Artists can take inspiration from ideas outside of their field, and many choreographers and composers have used mathematical ideas from the field of differential equations and adapted them to their work. On a basic level, the laws of physics dictate certain aspects of what dancers and musicians are able to do, and differential equations can help us understand these restrictions. On a creative level, choreographers and composers have used differential equations to provide a different lens through which they focus their innovative energies.

Cross-References

Breaking the Ice: Figure Skating

References

Bertacchini F, Bilotta E, Gabriele L, Pantano P, Tavernise A (2013) Toward the use of Chua’s circuit in education, art and interdisciplinary research: some implementation and opportunities. Leonardo 46(5):456–463
Bidlack R (1990) Music from chaos: nonlinear dynamical systems as generators of musical materials. Ph.D. dissertation, University of California, San Diego
Bidlack R (1992) Chaotic systems as simple (but complex) compositional algorithms. Comput Music J 16:33–47
Bidlack R (1995) Dodecanon I. https://nujus.net/~locusonus/site/english/compatible/telecharge.html, Online. Accessed 24-Sept-2020
Borgo D (2005) Sync or swarm: improvising music in a complex age. Bloomsbury Publishing, New York
Boyden DD (1965) The history of violin playing from its origins to 1761: and its relationship to the violin and violin music. Oxford University Press, London
Bradley E (n.d.) Using mathematics to generate choreographic variations. http://www.cs.colorado.edu/~lizb/chaotic-dance.html, Online. Accessed 22-July-2020
Bradley E, Stuart J (1998) Using chaos to generate variations on movement sequences. Chaos Interdiscip J Nonlinear Sci 8(4):800–807
Buckland TJ (2018) How the Waltz was won: transmutations and the acquisition of style in early English modern ballroom dancing. Part two: the Waltz regained. Dance Res 36(2):138–172
Dabby DS (1996) Musical variations from a chaotic mapping. Chaos Interdiscip J Nonlinear Sci 6(2):95–107
Forsythe W (2005) Nowhere and everywhere at the same time. Creative Time, The Plain of Heaven, New York
Forsythe W (2013) Choreographic objects–nowhere and everywhere at the same time, no. 2. https://www.youtube.com/watch?v=as1bQ6Xl_fg&feature=youtu.be, Online. Accessed 15-Jul-2020
Fricker P (1971) Paseo, op. 61
Goldman H, Moffat A (2012) Form constant. https://www.youtube.com/watch?v=d1KbiIytrE0, Online. Accessed 24-Jul-2020
Hall RW, Josic K (2001) The mathematics of musical instruments. Am Math Mon 108(4):347–357
Hatzinikolaou A (2013) Music of memory
Hughes JR (2000) Applications of Fourier series in classical guitar technique. Coll Math J 31:300–303
Josel S, Tsao M (2014) The techniques of guitar playing. Bärenreiter, Kassel
Kato S (2015) Notating the spatiotemporal. http://www.interactivearchitecture.org/choreographyand-notating-the-spatiotemporal.html, Online. Accessed 17-July-2020
Koss L (2016) Differential equations in music and dance. J Math Arts 10(1–4):53–64
Mayer-Kress G, Choi I, Weber N, Barger R, Hubler A (1993) Musical signals from Chua’s circuit. Circuits Syst II Analog Digit Signal Process IEEE Trans 40(10):688–695
McAlpine K, Miranda ER, Hoggar SG (1997) Dynamical systems and applications to music composition: a research report. In: Proceedings of Journées d’Informatique Musicale, pp 106–113
Moffat A (n.d.) Form constant. https://www.arwmoffat.com/work/form-constant, Online. Accessed 24-Jul-2020
Pressing J (1988) Nonlinear maps as generators of musical design. Comput Music J 12(2):35–46
Rabenstein R, Trautmann L (2003) Digital sound synthesis of string instruments with the functional transformation method. Signal Process 83(8):1673–1688
Rodet X (1993) Sound and music from Chua’s circuit. J Circuits Syst Comput 3(01):49–61
Roy K, Koss L (2015) Movement matters! Public lecture, Dickinson College, Carlisle
Salazar AEC, Romero RAF, Zhao L (2011) Generation of composed musical structures through recurrent neural networks based on chaotic inspiration. In: The 2011 international joint conference on neural networks, pp 3220–3226
Salter JR (2009) Chaos in music: historical developments and applications to music theory and composition. The University of North Carolina at Greensboro
Scudellari M (2016) Science and culture: dancing with Pythagoras. Proc Natl Acad Sci 113(12):3123–3124
Sears JN (2007) Marisol. http://jamesnsears.com/archive/musicmarisol.htm, Online. Accessed 7-Oct-2014
Sears JN (n.d.a) Chua’s oscillator in musical applications. http://jamesnsears.com/archive/ecechua.htm, Online. Accessed 7-Oct-2014
Sears JN (n.d.b) Marisol sound design. http://www.jamesnsears.com/2001/05/marisol_sound_design.php, Online. Accessed 7-Sept-2016
Shioya T (2018) Analysis of sway in ballroom dancing. In: Multidisciplinary digital publishing institute proceedings, vol 2, p 223
Stefanakis N, Abel M, Bergner A (2015) Sound synthesis based on ordinary differential equations. Comput Music J 39(3):46–58
Warburton EC (n.d.) Three bodies. https://sites.google.com/a/ucsc.edu/tedw/praxis/threebodies, Online. Accessed 20-Jul-2020
Warburton EC, Laughlin G (2020) A performed solution to the Pythagorean problem: the three bodies project. Leonardo 53(2):145–150

69 Breaking the Ice: Figure Skating

Diana Cheng

Contents
Introduction
History and Equipment
Mathematics Within Skaters’ Blade Tracings
Quantitative Ways to Describe Pattern Dances
Geometric Transformations
Rotations
Reflections
Translations
Biomechanical Principles Within Skating
Angular Momentum in Spins
Angular Momentum in Jumps
Projectile Motion in Jumps
Quintuple Jumps?
International Judging System Scoring
Judging Biases
Figure Skating Team Event
Entrants’ Contributions to Their Team Scores
Team Event Compared to Hypothetical Team Event
Application of Hypothetical Team Event to Past Olympic Winter Games
Summary
References

D. Cheng
Towson University, Towson, MD, USA
e-mail: [email protected]

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_51


Abstract

Figure skating is filled with mathematically rich opportunities. Performing figure skating can be viewed as both an art form and a physically challenging endeavor. Competitive skaters continually seek to improve their skating skills and the scores that they receive for their performances. This chapter describes how knowledge of various facets of mathematics can be useful in helping them achieve these goals. Facets of figure skating that are described mathematically include: skaters’ blade tracings formed on the ice, skaters’ movements as they can be described using geometric transformations, pattern dances’ quantitative features such as musical tempi and step rates, and biomechanical research applying physical principles governing skating spins and jumps. The International Judging System used to score skating in individual events and the team event is explained. Shortfalls of the judging system, such as judging biases that have occurred in past competitive events, are described. Finally, mathematical analyses that can be used to consider aspects of the figure skating team event that was conducted at the 2014 and 2018 Olympic Winter Games are provided.

Keywords

Figure skating · Geometry · Biomechanics · Judging bias · Player evaluation

Introduction

The idea of skating on ice surfaces began as a way for people to travel. With improvements in the design of skates that allow for more intricate maneuvers, ice skating has evolved into a sport with both artistic and technical aspects. The sport of figure skating is recognized by the International Olympic Committee and has been contested at the Olympic Winter Games since 1924. The disciplines of ladies, men, pairs, and ice dance are conducted using International Judging System rules published by the International Skating Union (ISU). A separate team event was introduced at the Olympic Winter Games in 2014. Mathematics can be used to describe the two-dimensional tracings that skaters form on the ice, the three-dimensional movements that skaters perform, biomechanics, and the scoring used to evaluate skaters.

History and Equipment

Ice skating originated as a form of human transportation in northern European countries where rivers and canals frequently froze during the winter months. Formenti and Minetti (2007) studied the evolution of the materials used to construct ice skates and conducted an experiment to determine the amount of human energy, or metabolic cost, that trained skaters needed in order to transport themselves with several different designs of ice skates. The oldest known types of ice skates were animal bones strapped to regular shoes; they were found in the Netherlands and date back to approximately 1800 BC. In order to provide 1 meter of transport, these skates had a metabolic cost of approximately 4.62 joules per kilogram of the skater’s mass. Ice skates have also been made of ash and iron in 1200 AD (requiring approximately 2.46 J/(kg·m)), and birch and steel in 1700 AD (requiring approximately 1.78 J/(kg·m)). The most modern ice skates that Formenti and Minetti examined were made in 2004; they were composed of fiberglass, carbon fiber, and steel, and were affixed to leather boots. These skates required a metabolic cost of only 1.32 J/(kg·m), and Formenti and Minetti hypothesize that improvements in the design of ice skates were driven by the desire to lower the metabolic cost of transport. Current figure skating blades are commonly composed of carbon steel, nickel, and aircraft aluminum (Santee 2014).

Fig. 1 The figure skate blade has a circular curvature

Another factor that has improved the metabolic cost involved in skating is the shape of the blade. Animal bones were irregularly shaped and flat, without edges. In current figure skate blades, there is a slight curvature of the blade. The back third of the blade, corresponding to the arc from the heel of the blade (refer to point A of Fig. 1) to a spot just underneath the toe plate of the blade (refer to point B of Fig. 1), matches the curvature of a circle. Near the toe pick, the front third of the blade is set on a circle with a slightly smaller radius. Blades are approximately 4 mm wide and have slightly tapered edges so that the skater can lean to either side of the blade to obtain a deeper curvature of the trajectory skated on the ice. The tapered edges’ curvature is called the “radius of hollow” and generally ranges between 3/8 inch and 1 inch. The larger the radius of hollow, the more speed that can be generated for the same metabolic cost (Santee 2014).
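The metabolic costs quoted above are per kilogram of body mass and per meter traveled, so the energy for a given trip is the product of cost, mass, and distance. The following Python sketch makes that arithmetic explicit; the four cost values are the ones reported in the text, while the skater’s mass and the distance used in any call are hypothetical inputs.

```python
# Metabolic cost of transport in joules per kilogram per meter,
# as quoted in the text from Formenti and Minetti (2007).
COST_J_PER_KG_M = {
    "bone, c. 1800 BC": 4.62,
    "ash and iron, 1200 AD": 2.46,
    "birch and steel, 1700 AD": 1.78,
    "fiberglass, carbon fiber, and steel, 2004": 1.32,
}


def transport_energy_kj(cost, mass_kg, distance_m):
    """Energy in kilojoules to move a skater of mass_kg over distance_m."""
    return cost * mass_kg * distance_m / 1000.0


def reduction_percent(old_cost, new_cost):
    """Percentage by which new_cost is lower than old_cost."""
    return 100.0 * (old_cost - new_cost) / old_cost
```

For example, `transport_energy_kj(4.62, 70, 1000)` gives the energy a 70 kg skater would need to cover 1 km on bone skates, and `reduction_percent(4.62, 1.32)` shows that the 2004 design cuts the cost of transport by roughly 71 percent relative to the oldest skates.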

Mathematics Within Skaters’ Blade Tracings Due to the circular radii of the blades, some of the paths that figure skaters trace on the ice can be identified as composite arcs of circles. For example, Fig. 2 illustrates

D. Cheng

Fig. 2 A trajectory of semi-circles that a skater could form for the Basic Consecutive Edges move

Table 1 Combinations of edges to form the arcs on the rink surface shown in Fig. 2

Method      Arc CD   Arc DE
Method 1    RFO      LFO
Method 2    LFI      RFI
Method 3    RFO      LBI
Method 4    LFI      RBO
Method 5    LBO      RBO
Method 6    LBO      RFI
Method 7    RBI      LBI
Method 8    RBI      LFO

the trajectory that a figure skater could make traveling down the ice, alternating feet on each half-circle. Fig. 2 uses an aerial view of the rink, as if a camera on the ceiling of the rink photographed the ice surface from directly above. The solid curves (arc CD is an example of a solid curve) must be skated on one foot, and the dotted curves (arc DE is an example of a dotted curve) must be skated on the other foot. The illustrated trajectory is similar to the pattern for “Basic Consecutive Edges,” a required element in the US Figure Skating (USFS) Pre-Preliminary level test (USFS 2018). To create the Basic Consecutive Edges pattern, the skater must begin from a stationary position and then create four to six semi-circles along a straight-line axis. There are eight different possible combinations of edges to form the curves to travel from point C through point D to point E, and each combination is listed as a different row in Table 1. The first listed method that a skater could use is a right forward outside (RFO) edge starting from point C to travel to point D, and then a left forward outside (LFO) edge to travel from point D to point E. In Table 1, the skating foot used is designated by “R” for right foot or “L” for left foot. The direction of the skater is indicated by “F” for forwards skating or “B” for backwards skating. The edge of the skater is indicated by “I” for inside edge of the blade or “O” for outside edge of the blade. The tracings that skaters make can be studied in terms of scaled drawings, combinatorics, lengths of composite figures, geometric transformations, and time needed to complete the path (Cheng et al. 2019). Tracings can also be studied to determine the area that skaters cover on the ice (Cheng and Twillman 2018). Kerrigan and Spencer (2003) provide detailed illustrations and descriptions of the

69 Breaking the Ice: Figure Skating

different types of turns that skaters can perform. The tracings of steps and turns that American skaters need to complete to pass proficiency tests in order to qualify to compete at certain levels are illustrated in the US Figure Skating rulebook (USFS 2018), and the pattern dances that are part of the current season’s ice dance competitive events are illustrated in (ISU 2018d).
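The count of eight edge combinations for the Basic Consecutive Edges pattern can be reproduced with a short enumeration. The sketch below is illustrative only: the function names are hypothetical, and the rule used to decide which way an edge curves (changing any one of foot, direction, or edge flips the curve) is an assumption introduced here, not something stated in the text. The labels "A" and "B" simply name the two curving senses, with "A" being the sense of the RFO edge.

```python
from itertools import product

FEET = "RL"        # right or left skating foot
DIRECTIONS = "FB"  # forwards or backwards
EDGES = "OI"       # outside or inside edge


def curve_sense(code: str) -> str:
    """Return 'A' or 'B' for the two possible curving senses of an edge.

    Assumed rule (not from the source): starting from RFO, changing any one
    of foot, direction, or edge reverses the way the tracing curves.
    """
    foot, direction, edge = code
    flips = (foot == "L") + (direction == "B") + (edge == "I")
    return "A" if flips % 2 == 0 else "B"


# All eight three-letter edge codes, e.g. "RFO", "LBI".
ALL_EDGES = ["".join(parts) for parts in product(FEET, DIRECTIONS, EDGES)]


def consecutive_edge_methods():
    """List (arc CD, arc DE) pairs that curve opposite ways on alternating feet."""
    methods = []
    for first, second in product(ALL_EDGES, repeat=2):
        opposite_curves = curve_sense(first) == "A" and curve_sense(second) == "B"
        alternate_feet = first[0] != second[0]
        if opposite_curves and alternate_feet:
            methods.append((first, second))
    return methods


if __name__ == "__main__":
    for first, second in consecutive_edge_methods():
        print(first, "->", second)
```

Under these assumptions there are four edges that can trace arc CD, and for each of them two edges on the other foot that can trace arc DE, which gives the eight methods.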

Quantitative Ways to Describe Pattern Dances

Within the discipline of ice dance, there are established pattern dances used for testing proficiency levels and for competing. These pattern dances have a designated set of steps and turns that are described in the US Figure Skating rulebook (USFS 2018) using three types of representations: (1) tables specifying the step number, partnered holds and relative positions of the partners, edges used by the lady and man, and number of beats assigned for each step; (2) drawings of the blade tracings as they should be formed on the ice, as seen from a bird’s-eye view of the rink; and (3) text descriptions of the nuances of the dance and notes about how skaters should express the style of the dance. The pattern dances are also referred to as “compulsory dances” or “set dances” because ice dancers must attempt to implement the prechoreographed set of steps and turns as closely as possible (Kerrigan 2003). The term “pattern” refers to the two-dimensional representation of the blade tracings on the rink’s surface since they serve as a pattern for the ice dancers. During training, coaches will sometimes use a marker to draw out the patterns for the skaters on the ice surface so that the skaters can trace those patterns with their blades as they skate.

The 2018–2019 US Figure Skating rulebook (USFS 2018) identifies 33 distinct pattern dances that are classified into eight levels. Listed in order of increasing difficulty, the levels are the following: Preliminary, Pre-Bronze, Bronze, Pre-Silver, Silver, Pre-Gold, Gold, and International dance. The ten international dances are the Austrian Waltz, Cha Cha Congelado, Finnstep, Golden Waltz, Midnight Blues, Ravensburger Waltz, Rhumba, Silver Samba, Tango Romantica, and Yankee Polka. In 2018, the International Skating Union added another pattern dance, the Tea-Time Foxtrot (ISU 2018e). In Table 2, the 23 Preliminary through Gold pattern dances are listed.
Skaters must pass proficiency tests for all of the pattern dances in a given level before moving onto the next level. Within a given level, skaters are permitted to test all the pattern dances on the same day, if they are able to train these pattern dances simultaneously. Skaters are also permitted to take pattern dance tests at separate test sessions, an option frequently used at the more advanced levels. The pattern dances to be performed in ice dance competitions are preselected by the International Skating Union each year, so competitive ice dancers do not have a choice in the pattern dance they will perform in a given competitive season. In the 2018–2019 season, the pattern dance selected for the Junior level was the Argentine Tango, and the pattern dance selected for the Senior level was the Tango Romantica (ISU 2018e). However, for training purposes, such as to broaden the

Table 2 Pattern dances at the Preliminary through Gold levels

Level         Pattern dances
Preliminary   Dutch Waltz, Canasta Tango, Rhythm Blues
Pre-Bronze    Swing, Cha Cha, Fiesta Tango
Bronze        Hickory Hoedown, Willow Waltz, Ten-Fox
Pre-Silver    Fourteenstep, European Waltz, Foxtrot
Silver        American Waltz, Tango, Rocker Foxtrot
Pre-Gold      Kilian, Paso Doble, Starlight Waltz, Blues
Gold          Viennese Waltz, Westminster Waltz, Quickstep, Argentine Tango

scope of the skaters’ skill repertoire, coaches may elect to teach ice dancers some pattern dances that were not designated for competition in the upcoming season. Skaters may benefit from practicing dances that are less challenging than the current season’s dances, as a progression toward the difficulty level of the current season’s dances. A more common practice is to learn more technically demanding dances, for reasons such as making the current season’s dances seem relatively easier or preparing in advance for a future season’s selected dance. Skaters who train pattern dances as a cross-training tool (such as synchronized skaters and freestyle skaters) are not restricted by competitive rules, so they can select the next dance that they learn based on their technical strengths.

What makes one pattern dance more complex than another? Or, what makes the Pre-Gold, Gold, and International dances harder than the remaining dances? Some of the factors that can be used to quantitatively describe the pattern dances, and thus to distinguish between them, include:

• The tempo of the music
• The number of steps taken per pattern of the dance
• The length of time it takes to complete the dance
• The average speed of the skaters’ steps
• The number of times the skaters turn from forwards to backwards (or vice versa)
• The number of holds that the partners use

This numeric analysis does not take into account the difficulty of individual steps within the pattern dances, or the number of different difficult steps within a dance, because ratings of difficulty can vary between skaters of differing abilities. As an illustration of how these factors can be used, the numeric analysis below compares the ten waltz pattern dances listed in the US Figure Skating rulebook (USFS 2018). The waltzes are the following: Dutch Waltz, Willow Waltz, European Waltz, American Waltz, Starlight Waltz, Viennese Waltz, Westminster Waltz, Austrian Waltz, Ravensburger Waltz, and Golden Waltz. Waltz was the style of pattern dance selected for this analysis because it is the most commonly occurring genre within the dances. Waltzes are also commonly selected to be performed in international competitions organized by the International Skating Union. At the Olympic Winter Games, the Ravensburger Waltz was performed in 2006, the Golden Waltz was

performed in 1998, the Starlight Waltz was performed in 1994, and the Westminster Waltz was performed in 1984. The Austrian Waltz was performed at the 1982 World Figure Skating Championships. The Viennese Waltz was performed at the 2002 World Junior Figure Skating Championships.

Tempo of the Music. The pattern dances’ tempos are set as indicated in the US Figure Skating rulebook (USFS 2018) and reported using a bar graph in Fig. 3. The music for all of the Pre-Gold, Gold, and International pattern dances is faster than 138 beats per minute. A fast tempo alone does not cause a dance to be classified at a high level; for example, the American Waltz is a Silver level dance that has a musical tempo of 198 beats per minute, which is the same tempo as the International level Ravensburger Waltz.

Lady’s Steps per Pattern. The steps that the man and the lady take are the same in the Dutch Waltz, but the man’s and lady’s steps differ in all other dances. A step is counted each time that a blade strikes the ice: a left forward outside edge followed by a right forward inside edge would count as two steps. For simplicity, the bar graph in Fig. 4 reports only the lady’s steps of the waltzes. All of the steps must be memorized and performed in sequential order, so it is reasonable that the more difficult dances require a larger number of steps. The Pre-Gold and higher level dances range from 43 to 54 steps per pattern, whereas the lower level dances include 22 or fewer steps.

Performance Time. The length of time reported on the horizontal axis in Fig. 5 is the number of seconds that it takes for skaters to perform the test, not including any optional introductory steps prior to the start of the pattern dance or optional exit steps after the pattern dance has been finished.
All waltz pattern dances require the skater to complete two patterns of the dance in the test setting, although this is not necessarily the case for other styles of pattern dance (e.g., for the Quickstep test, three patterns must be completed). Two patterns of the shortest waltz, the Dutch Waltz, take 42 s to perform. Two patterns of the longest waltz, the Golden Waltz, take 118 s (or just under 2 min) to perform. The two dances with the

Fig. 3 Waltz musical tempos as reported in beats per minute

Fig. 4 Number of steps for the lady in each waltz

[Figure 5 plot: “Step Rate vs. Test Time”; horizontal axis: approximate test time (s), 40 to 120; vertical axis: step rate (steps/min), 30 to 90; one labeled point per waltz, keyed by US Figure Skating level (Preliminary, Bronze, Pre-Silver, Silver, Pre-Gold, Gold, International)]

Fig. 5 Lady’s step rate for waltz pattern dances vs. length of time in tests

longest performance duration (the Golden Waltz and the Austrian Waltz, respectively) are international dances, the highest test level.

Lady’s Step Rate. The average number of lady’s steps taken for each minute of the dance is reported on the vertical axis in Fig. 5. The fastest waltz is an International dance, the Ravensburger Waltz, which requires an average of 88 steps per min for the lady. On average, this is equivalent to having each step of the Ravensburger Waltz last approximately 0.68 s. At an average of 62.61 steps per min, the Gold level Viennese Waltz also has a rate higher than one step per second, or equivalently, approximately 0.96 s per step. The Silver level American Waltz has the lowest step rate, at an average of 33.10 steps per min, yet its music tempo is the fastest of the waltz tempos.

Partners’ Holds. The “holds” for each pattern dance are based on the relative positions of the partners and their points of contact. The holds are prescribed for each dance, as indicated in the US Figure Skating rulebook (USFS 2018). The holds are categorized by whether the partners are facing in the same direction or facing each other (where one partner is traveling forwards and the other is traveling backwards), whether the lady is on the left or right side of the man, and their arm positions. In the Starlight Waltz, the partners start in “closed” hold (see Fig. 6, left picture) for the lady’s first 16 steps, move to “open” hold for 3 steps (Fig. 6, center picture), go back to “closed” hold, change to “Kilian” hold where the man’s right hand is on the lady’s right hip (Fig. 6, right picture), and return to “closed” hold for the remainder of the pattern. For this analysis, since the partners remain in closed hold at the end of the pattern (steps 16 through 24) and also begin in closed hold for step 1, the number of holds in the Starlight Waltz is four.
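A step rate in steps per minute converts to an average step duration by dividing 60 s by the rate. The following minimal sketch applies that conversion to the three step rates quoted in the text; the function name is hypothetical, and the durations are computed directly from the quoted (rounded) rates.

```python
def seconds_per_step(steps_per_minute: float) -> float:
    """Average duration of one step, in seconds, for a given step rate."""
    return 60.0 / steps_per_minute


# Lady's step rates (steps per minute) as quoted in the text.
step_rates = {
    "Ravensburger Waltz": 88.0,
    "Viennese Waltz": 62.61,
    "American Waltz": 33.10,
}

if __name__ == "__main__":
    for waltz, rate in step_rates.items():
        duration = seconds_per_step(rate)
        print(f"{waltz}: {rate:.2f} steps/min, {duration:.2f} s per step")
```

Any rate above 60 steps per min corresponds to less than one second per step.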
The fifth column in Table 3 reports the average holds per min, calculated by multiplying the patterns per min by the holds per pattern for the dances that have more than one hold per pattern (dances with a single hold are reported as 1.00). The average holds per min are also represented on the horizontal axis in the graph of Fig. 7. In the lowest level waltz, the Dutch Waltz, both partners face the same direction for the entire dance, and the partners have exactly the same steps. In the next higher level dances, the Willow Waltz, the European Waltz, and the American Waltz, the partners face each other and perform mirrored steps without a change of hold. The Pre-Gold and higher waltzes involve changing holds, and the partners’ steps do not always mirror each other. Any time there is a change of hold, there is a greater chance of error since the partners risk separating too far from each other; thus, dances with changes of hold are considered to be more difficult. The Golden Waltz has the largest rate of changes in hold, with an average of 16.27 holds per min of the dance, or equivalently, on average every 3.68 s the partners are in a different hold. The Viennese Waltz has the second largest average holds per min, 13.04.

Changes of Lady’s Orientation. Each time that the lady moves from forwards to backwards, or vice versa, a change of orientation is counted. A change of orientation can be accomplished by a step (which involves changing weight from the standing leg to the opposite leg) or by a turn without taking an additional step (which does not involve a transfer of weight).

Fig. 6 Partner holds for the Starlight Waltz pattern dance: Closed, Open, Kilian

In Table 3, the sixth column reports the number of changes of orientation that the lady needs to complete for each pattern of the dance. The seventh column reports the average number of changes of orientation for each min of the dance. In Fig. 7, the vertical axis displays the information reported in the seventh column of Table 3. While performing the Golden Waltz, the lady changes orientation an average of 40.68 times per min; the Golden Waltz also has the highest average number of holds per min. The dance with the second-highest rate of orientation changes is the Austrian Waltz, which has

Table 3 Waltz dances’ average holds per min and average changes of orientation per min

Waltz          Pattern    Patterns   Holds per   Average holds   # changes of orientation   Average # changes of
               time (s)   per min    pattern     per min         per pattern                orientation per min
Dutch          21         2.86       1           1.00            0                          0
Willow         23         2.61       1           1.00            2                          5.22
European       24         2.50       1           1.00            14                         35.00
American       29         2.07       1           1.00            11                         22.76
Starlight      35         1.71       4           6.86            16                         27.43
Viennese       23         2.61       5           13.04           4                          10.43
Westminster    29         2.07       6           12.41           11                         22.76
Austrian       49         1.22       8           9.80            32                         39.18
Ravensburger   29         2.07       6           12.41           12                         24.83
Golden         59         1.02       16          16.27           40                         40.68

[Figure 7 plot: “Holds per Minute vs. Orientation Changes per minute”; horizontal axis: average # holds per minute, 1 to 17; vertical axis: average # changes of orientation per minute, 0 to 40; one labeled point per waltz, keyed by US Figure Skating level (Preliminary, Bronze, Pre-Silver, Silver, Pre-Gold, Gold, International)]

Fig. 7 Average number of partners’ holds and average changes of partners’ orientation in waltzes

an average of 39.18 changes of orientation for the lady each min. The European Waltz has a relatively high rate of orientation change (35 per min), as two-thirds of the steps in the dance involve changing orientation; half of these changes are achieved by stepping and the other half are achieved through turns.
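The per-minute columns of Table 3 follow from the pattern time by simple arithmetic: patterns per min is 60 divided by the pattern time in seconds, and each per-minute rate is that value multiplied by the per-pattern count. The sketch below recomputes those columns from the per-pattern data in Table 3. The function and variable names are hypothetical, and the rule that a single-hold dance is reported as 1.00 holds per min is read off the table values rather than stated as a formula in the text.

```python
# Per-pattern data from Table 3:
# waltz -> (pattern time in s, holds per pattern, orientation changes per pattern)
WALTZES = {
    "Dutch": (21, 1, 0),
    "Willow": (23, 1, 2),
    "European": (24, 1, 14),
    "American": (29, 1, 11),
    "Starlight": (35, 4, 16),
    "Viennese": (23, 5, 4),
    "Westminster": (29, 6, 11),
    "Austrian": (49, 8, 32),
    "Ravensburger": (29, 6, 12),
    "Golden": (59, 16, 40),
}


def patterns_per_minute(pattern_time_s: float) -> float:
    """Number of patterns that fit into one minute of skating."""
    return 60.0 / pattern_time_s


def holds_per_minute(pattern_time_s: float, holds_per_pattern: int) -> float:
    """Average holds per minute; single-hold dances are reported as 1.00."""
    if holds_per_pattern <= 1:
        return 1.0
    return patterns_per_minute(pattern_time_s) * holds_per_pattern


def orientation_changes_per_minute(pattern_time_s: float, changes_per_pattern: int) -> float:
    """Average number of forwards/backwards changes per minute."""
    return patterns_per_minute(pattern_time_s) * changes_per_pattern


if __name__ == "__main__":
    for waltz, (time_s, holds, changes) in WALTZES.items():
        print(
            f"{waltz:<13}"
            f"{patterns_per_minute(time_s):6.2f}"
            f"{holds_per_minute(time_s, holds):8.2f}"
            f"{orientation_changes_per_minute(time_s, changes):8.2f}"
        )
```

Sorting the output by either rate gives the same ordering that is read off the axes of Fig. 7.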

Geometric Transformations

Many kinesthetic movements can be described using geometric transformations. Leonard and Bannister (2018) showed how dancers’ bodily movements could be identified as rotations, reflections, and translations. Figure skaters’ movements can also be explained using these transformations. A few examples are provided in this section.

Rotations

There are many examples of rotations in figure skating, since skaters frequently turn on the ice during their performances. Some technical elements in which rotations occur include twizzles, step sequences, jumps, and spins. Another example of rotation occurs during lifts. Rotational lifts are technical elements within a free dance program (see Fig. 8). When they are performed in isolation, the maximum time for

Fig. 8 Three stages of a rotational lift

the lift is 7 s. During a rotational lift, the lifting partner (most commonly the man) uses a continuous motion to rotate the lifted partner at least two rotations in one direction, either clockwise or counterclockwise (ISU 2018d).

Reflections

There are many kinds of reflections within figure skating. First, the ice serves somewhat as a mirror in which skaters may be able to see their reflections, depending on the lighting in the rink (in Fig. 8, skaters’ reflections on the ice surface are apparent). Second, different photographs can produce images that appear to be reflections of each other (in Fig. 8, the lift depicted in the left-hand photograph is a mirror image of the lift in the right-hand photograph). Third, skaters’ movements can have reflection symmetry. In Fig. 9, in the picture on the left, the skater performs a split jump so that her legs are spread across an invisible vertical line of reflection through the center of her torso. In the picture on the right of Fig. 9, the skater performs a “spread eagle” move by turning out her feet so that her blades’ toe picks face in opposite directions. Fourth, partners can perform mirroring kinesthetic movements with their arms and/or legs, for example, within the pattern dances’ closed hold.

Fig. 9 Skating moves with reflection symmetry: split jump and spread eagle

Fig. 10 Skaters’ translations of poses

Translations

Any time that a skater travels down the ice in a stationary pose, it is a translation of the skater’s body. Translation of movement occurs when two skaters are placed side by side, travel down the ice in the same direction, and strike the same pose. Figure 10 depicts several poses that are translated. In the picture on the top left, both of the skaters’ arms are outstretched to their sides, the skaters are both standing on their left legs, and the skaters’ right legs are raised. In the top right picture, the skaters are performing a “hydroblade” movement where they glide backwards on concentric circles with their weight over their right back outside edges. In the picture in the bottom center, the skaters are performing a “lunge” movement where they travel forwards with their left knees bent while their right boots drag along the ice.

Biomechanical Principles Within Skating

Skaters and coaches have partnered with biomechanical researchers to study athletic movements based on physical principles, with the goal of improving athletic performance. Since the difficulty of movements that are performed is rewarded in competitions, skaters continually aim to learn how to execute movements requiring more coordination, muscle strength, and flexibility. Sport trainers have used results from biomechanical analyses, such as those published by Dubravicic-Simunjak et al.

(2003), to inform their design of strength and conditioning programs. Rehabilitation physicians have also benefitted from biomechanical studies to assist in skaters’ recoveries from injuries sustained during training (Fortin et al. 1997). In this section, the physical principle of angular momentum conservation is discussed as it relates to figure skating spins; and angular momentum conservation and projectile motion are discussed as they relate to figure skating jumps. We also describe how knowing about these physical principles helps coaches consider how skaters might have a chance at breaking existing records for spins and jumps.

Angular Momentum in Spins

There are three basic spin positions identified by the ISU (2018b): camel, sit, and upright. In Table 4, photographs of these spin positions are provided. Typically, spins are performed on one foot, ideally with the skater’s center of mass over the skating leg. In a camel spin, the skater’s free leg is extended backwards and the knee of this free leg must be higher than the hip level. In a sit spin, the skating knee is deeply bent, with the upper part of the skating leg at least parallel to the ice. In an upright spin, the skating leg is extended. All other spin positions, such as those which require additional flexibility, are considered to be variations of the basic positions. Spin combinations are formed when at least two basic positions are performed within the same spin. The most outstretched spin position (with the largest distance from the spinning axis) is the camel position, and the most compact position relative to the spinning axis is the upright spin. Thus, the upright spin will rotate much more quickly than the camel spin.

Suppose a skater wants to compare the angular velocities in two different spin positions: camel and upright. How many times faster will the skater rotate in an upright spin than in the camel spin? Might we be able to use this information to break world records in spinning? This section discusses spins using the idea of conservation of angular momentum. Imagine that a skater performs a spin combination, first starting in a camel spin and then raising the torso and going into an upright spin. If the skater does

Table 4 Three basic spin positions

not change the spinning axis during the combination spin, the skater’s angular momentum is the same in the camel position and in the upright position ($L_{camel} = L_{upright}$). As previously mentioned, angular momentum is the product of the moment of inertia and the angular velocity, so $I_{camel}\,\omega_{camel} = I_{upright}\,\omega_{upright}$. A skater’s moment of inertia is determined by the position that the skater adopts: it depends on the masses that are rotating and on how far they are from the axis of rotation. The formulas for the moments of inertia ($I$) of rotating masses ($m$) can be derived using calculus, by evaluating the integral $I = \int r^2\,dm$, where $r$ represents the distance from each mass element to the axis of rotation (for example, see Ling et al. 2018). For a solid cylindrical object of radius $r$ that rotates about the axis through the centers of its circular faces, the moment of inertia is $I = \frac{1}{2} m r^2$. For a cylindrical object that rotates about an axis through its center and perpendicular to the aforementioned axis, the moment of inertia also depends on the length ($l$) of the cylinder: $I = \frac{1}{4} m r^2 + \frac{1}{12} m l^2$.

Moment of Inertia in Camel Spin

For the purposes of this example, the camel spin position will be approximated as a compound object that rotates about the vertical spinning leg. In the camel spin position, there are two approximately cylindrical entities that rotate about the vertical axis through the center of the skater’s spinning leg (see Fig. 11):

1. The skater’s spinning leg, which has moment of inertia $I_{leg} = 0.5\, m_{leg}\, r_{leg}^2$. In Fig. 11, the skater’s spinning leg is the right leg.
2. The rest of the skater’s body, which has moment of inertia $I_{body} = 0.25\, m_{body}\, r_{body}^2 + (1/12)\, m_{body}\, l_{body}^2$. In this formula, $r_{body}$ represents the radius of the cylinder that models the body, and $l_{body}$ represents the distance away from the vertical spinning axis that the skater’s outstretched free leg and head reach. For simplicity, we consider the distances from the vertical spinning axis to the free leg and to the skater’s head to be the same. Also for simplicity, we assume that the portions of the body on each side of the rotational axis have the same mass (in reality, a skater’s head and torso have more mass than the free leg). In Fig. 11, the skater’s free leg is the left leg.

Fig. 11 Camel spin position approximated as two cylinders rotating about an axis

Suppose a skater’s mass is 50 kg, and that the mass of the standing leg is $m_{leg} = 15$ kg. Then the mass of the rest of the skater’s body is $m_{body} = 50\ \mathrm{kg} - 15\ \mathrm{kg} = 35\ \mathrm{kg}$. Also, suppose the skater’s spinning leg has a radius of $r_{leg} = 0.1$ m and the body has a radius of $r_{body} = 0.2$ m. In this example, we will vary the length between the spinning axis and the skater’s head to demonstrate the calculations for skaters with different heights; for the first sample calculation, reported in Table 5, we will use $l_{body} = 1$ m. The moment of inertia in the camel spin, $I_{camel}$, is the sum of the moments of inertia of the two entities: the spinning leg and the rest of the skater’s body. In this illustration, $I_{camel} = I_{leg} + I_{body} = 0.075\ \mathrm{kg\,m^2} + 3.27\ \mathrm{kg\,m^2} = 3.34\ \mathrm{kg\,m^2}$.

Table 5 Moments of inertia for two-cylinder compound object representing skater’s position in camel spin

Body part (approximately cylindrical):
    Skater’s body (not including spinning leg)   |   Spinning leg
Cylindrical diagram:
    [cylinder of radius $r_{body}$ and length $l$, with axis of rotation]   |   [cylinder of radius $r_{leg}$, with axis of rotation]
Photo:
    [photo]   |   [photo]
Moment of inertia formula:
    $I_{body} = (1/4)\, m_{body}\, r_{body}^2 + (1/12)\, m_{body}\, l_{body}^2$   |   $I_{leg} = (1/2)\, m_{leg}\, r_{leg}^2$
Numeric calculations:
    $I_{body} = (1/4)(35\ \mathrm{kg})(0.2\ \mathrm{m})^2 + (1/12)(35\ \mathrm{kg})(1\ \mathrm{m})^2$   |   $I_{leg} = (1/2)(15\ \mathrm{kg})(0.1\ \mathrm{m})^2$
Moment of inertia:
    $I_{body} = 0.35 + 2.92 = 3.27\ \mathrm{kg\,m^2}$   |   $I_{leg} = 0.075\ \mathrm{kg\,m^2}$

Moment of Inertia in Upright Spin

In the second part of the spin, the skater moves into an upright position. The skater’s body can now be modeled as a single cylinder spinning about the same vertical axis. In the upright position, the skater has moment of inertia $I_{upright} = 0.5\, m_{upright}\, r_{upright}^2$. Using the same measurements as in the camel spin, $I_{upright} = 0.5 \times (50\ \mathrm{kg}) \times (0.2\ \mathrm{m})^2 = 1\ \mathrm{kg\,m^2}$.

Conservation of Angular Momentum from Camel Spin to Upright Spin

Since the skater starts from the camel spin and continues into the upright spin position, angular momentum is conserved. Using the numeric values mentioned above with $l_{body} = 1$ m, the conservation equation $I_{camel}\,\omega_{camel} = I_{upright}\,\omega_{upright}$ becomes $3.34\,\omega_{camel} = 1 \times \omega_{upright}$, so the angular velocities are related by $\omega_{upright} = 3.34\,\omega_{camel}$. In other words, the skater’s angular velocity during the upright spin is 3.34 times the angular velocity during the camel spin, or the skater rotates 3.34 revolutions in the upright spin for each revolution in the camel spin. Figure 12 reports the number of times the upright spin rotates for each rotation of the camel spin, for different values of $l_{body}$ ranging from 1 meter to 1.65 meters. As shown in Fig. 12, if the skater’s mass, body radius, and leg radius are held constant, then the number of times faster a skater will rotate in an upright spin than in a camel spin depends on the horizontal length of the position attained in the camel spin.
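The camel-to-upright calculation can be repeated for other body lengths with a few lines of code. The sketch below uses the two-cylinder model and the sample measurements from the text (50 kg total mass, 15 kg spinning leg, leg radius 0.1 m, body radius 0.2 m). The function names and the particular sample values of $l_{body}$ in the loop are illustrative choices, not values taken from Fig. 12.

```python
def camel_moment_of_inertia(l_body, m_total=50.0, m_leg=15.0, r_leg=0.1, r_body=0.2):
    """Moment of inertia (kg m^2) of the two-cylinder camel spin model."""
    m_body = m_total - m_leg
    i_leg = 0.5 * m_leg * r_leg**2                               # spinning leg
    i_body = 0.25 * m_body * r_body**2 + m_body * l_body**2 / 12.0  # rest of body
    return i_leg + i_body


def upright_moment_of_inertia(m_total=50.0, r_upright=0.2):
    """Moment of inertia (kg m^2) of the single-cylinder upright spin model."""
    return 0.5 * m_total * r_upright**2


def upright_to_camel_speed_ratio(l_body):
    """Upright revolutions per camel revolution, from I_camel w_camel = I_upright w_upright."""
    return camel_moment_of_inertia(l_body) / upright_moment_of_inertia()


if __name__ == "__main__":
    # Sample body lengths in meters (illustrative choices within the 1 m to 1.65 m range).
    for l_body in (1.0, 1.2, 1.4, 1.65):
        ratio = upright_to_camel_speed_ratio(l_body)
        print(f"l_body = {l_body:.2f} m: upright spin is {ratio:.2f} times faster")
```

Because $l_{body}$ enters the camel moment of inertia as a squared term, the ratio grows faster than linearly with the horizontal length of the camel position.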


Is There Potential for More Record-Breaking Spins?

There are currently two categories of spin records that have been established. Canadian skater Olivia Rybicka-Oliver holds the record for the fastest on-ice spin at 342 rotations per min. She was 11 years old at the time of the recording in Warsaw, Poland (Guinness World Records 2019). Swiss skater Lucinda Ruh, with height

[Figure 12 plot: “Angular velocity of upright spin based on skater's horizontal length in camel spin”; horizontal axis: skater's $l_{body}$, 0 to 1.8; vertical axis: # of revolutions in upright spin per camel spin revolution, 0 to 10]

Fig. 12 Angular velocity comparisons in camel spin vs. upright spin, for skaters with different body lengths

1.75 m, holds the record for the most continuous upright spins on ice skates on one foot (Guinness World Records 2019). On April 3, 2003, she performed an upright spin rotating 115 revolutions. Will it be possible to surpass a record set in the Guinness World Records? As discussed above, a skater can use the camel spin position to generate angular momentum, and then can use this angular momentum to rotate quickly in an upright spin position. Both of these records were set without the skater commencing the spin in a camel position. With this simple change in the entrance to the spin, the skater can generate more angular momentum initially through the camel spin position and transfer it to the upright position to increase angular velocity. Being able to hold a spin for the length of time that Ruh did requires muscles developed to balance on one foot for a prolonged period of time and to hold the tightly wrapped upright position for the duration of the spin. Skaters who are taller and can wrap their bodies into positions more tightly may also have a chance at breaking the world records.

Angular Momentum in Jumps

The physical principles of angular momentum and projectile motion govern jumping motions, so gaining a better understanding of how these principles work can help skaters achieve more complex jumps. At the 2018 Olympic Winter Games, the top six men’s singles event finishers each landed two quadruple jumps in their short programs. The same skaters landed an average of 4.2 quadruple jumps during their free skate programs (ISU 2018a). A large amount of biomechanical research has focused on multi-revolution jumps, because landing triple jumps is now the norm for internationally competitive skaters, many elite skaters are landing quadruple jumps, and some skaters are even training towards landing jumps with more than four revolutions in the air. This section describes angular momentum and projectile motion as they relate to skaters’ jumps, illustrates characteristics of quintuple jumps based on both physical principles and empirical measurements, and then describes training tools that some skaters have used to assist with their jumping.

Rotational speed and time in the air are the main factors that contribute to successful multi-revolution jumps (King 2005). The training implication is that when learning jumps with higher numbers of revolutions, skaters need to develop ways to increase both the number of revolutions achieved per second and the amount of time they are in the air. A skater’s angular momentum, $L$, is created when the skater takes off for a jump. Angular momentum is the product of the skater’s moment of inertia, $I$, and the skater’s angular velocity, $\omega$, also known in the literature as rotational speed. A skater’s jumping motion can be approximated by a cylindrical rod that rotates about a vertical axis (Lewin 2018). If the skater has mass $m$ and the skater’s limbs are a distance $r$ away from the axis of rotation, the skater’s moment of inertia is given by $I = 0.5\, m r^2$.
Skaters can control their moments of inertia by repositioning their arms and legs

closer to the axis of rotation, which then in turn changes the skaters’ angular velocity (King et al. 2004). During the takeoff of the jump, the skater applies torque to the ice and this produces angular momentum. When the skater is in the air, the skater does not have any additional opportunity to change the amount of angular momentum. Since a skater’s mass does not change during a jump, the factor $0.5\,m$ is the same at any two instants, and conservation of angular momentum, $0.5\, m r_1^2 \omega_1 = 0.5\, m r_2^2 \omega_2$, reduces to $r_1^2 \omega_1 = r_2^2 \omega_2$.

A sample illustration describing the relationship between radius (in meters) and angular velocity (as reported in revolutions per min, RPM) follows. Assume that a skater is performing a triple toe loop with a peak angular velocity of 343 RPM. This estimate was derived from King’s (2005) study, which found the average angular velocity of triple jumps to be 243 RPM; the peak RPM in a jump is always higher than the average RPM. Also, assume that the skater begins in a position with arms bent at the elbows and away from the body, forming a radius of 0.3 m during takeoff. Assume that the skater’s tightest air position has a radius of 0.18 m, which was an average found from analyzing Olympic male skaters’ triple jumps (King et al. 2004). The skater’s moment of inertia is smaller when the body is in a tighter position, since the radius is smaller; there is a natural limit to the radius of the tightest air position based on body size (King 2005). Using the conservation of angular momentum equation, the initial angular velocity in this situation is given by $\omega_1 = (0.18)^2 \times 343\ \mathrm{RPM} / (0.3)^2$, or approximately 123 RPM. Given the initial radius of 0.3 m and angular velocity of 123 RPM, we can then compute other instantaneous RPMs for the radii between 0.3 m and the tightest position with the radius at 0.18 m. Figure 13 shows a graph of the results from this illustration.
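The triple toe loop illustration can be sketched numerically from the relation $r_1^2 \omega_1 = r_2^2 \omega_2$. The code below uses the takeoff radius (0.3 m), tightest radius (0.18 m), and peak angular velocity (343 RPM) from the text; the function names and the intermediate radii sampled in the loop are illustrative assumptions.

```python
def takeoff_rpm(peak_rpm, r_takeoff, r_tight):
    """Angular velocity at takeoff implied by the peak RPM in the tightest position."""
    return peak_rpm * r_tight**2 / r_takeoff**2


def rpm_at_radius(radius, r_takeoff, rpm_takeoff):
    """Angular velocity at a given radius, from r1^2 * w1 = r2^2 * w2."""
    return rpm_takeoff * r_takeoff**2 / radius**2


if __name__ == "__main__":
    r_takeoff = 0.30   # meters, arms away from the body at takeoff
    r_tight = 0.18     # meters, tightest air position
    peak_rpm = 343.0   # revolutions per minute in the tightest position

    rpm_start = takeoff_rpm(peak_rpm, r_takeoff, r_tight)
    print(f"Takeoff angular velocity: {rpm_start:.1f} RPM")

    # Intermediate radii between takeoff and the tightest position (illustrative).
    for radius in (0.30, 0.27, 0.24, 0.21, 0.18):
        rpm = rpm_at_radius(radius, r_takeoff, rpm_start)
        print(f"radius {radius:.2f} m: {rpm:.1f} RPM")
```

Because angular velocity scales with the inverse square of the radius in this model, most of the gain in RPM comes from the last few centimeters of tightening.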

Fig. 13 Illustration of angular velocity and radius in a triple toe loop jump (horizontal axis: radius in meters; vertical axis: angular velocity in RPM)

69 Breaking the Ice: Figure Skating


Fig. 14 Japanese skater Yuzuru Hanyu’s quadruple toe loop from the 2014 Olympic Games, as constructed by King (2018)

Projectile Motion in Jumps

Projectile motion is an area of physics that applies to skaters when they propel themselves into the air during a jump. A skater’s trajectory in the air is parabolic (see Fig. 14). The kinematics equation relating the vertical height $H$ (in meters) of the skater’s center of mass to time $T$ (in seconds) is $H(T) = -\tfrac{9.8}{2} T^2 + v_0 T$, where $v_0$ is the skater’s initial vertical velocity (in m/s), 9.8 m/s$^2$ is the magnitude of the acceleration due to gravity, and height is measured relative to the takeoff height. The vertical takeoff velocity $v_0$ determines the amount of time that the skater spends in the air, known as the “flight time.” The maximum height is reached at time $T = v_0/9.8$ s, when the vertical velocity $-9.8T + v_0$ is zero, and the maximum height is $H(v_0/9.8) = v_0^2/19.6$ m.

Skaters’ center of mass is generally lower upon landing than upon takeoff, because skaters typically take off for the jump with the skating knee flexed at approximately 10 degrees, and then land with the skating knee in a much deeper bend (King 2005). In the quantitative illustrations in this section, it is assumed that the skater’s center of mass is 0.1 m lower upon landing than upon takeoff, so the flight time is the positive solution of $H(T) = -\tfrac{9.8}{2} T^2 + v_0 T = -0.1$. Skaters are not advised to land with the center of mass much lower than at takeoff, such as in a seated position with a deeply bent landing knee. While this would seem to be a sensible strategy to increase flight time, biomechanically this position does not allow the absorption of landing forces to be spread over a longer period of time on the ice, and thus can lead to overuse injuries to the landing knee and ankle (Dubravicic-Simunjak 2003).

In King’s (2005) report based on video analysis of male skaters’ performances at the 2002 Olympic Winter Games, she recorded the vertical takeoff velocities of quadruple and triple toe loops.
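As a minimal sketch of this projectile-motion calculation, the Python functions below solve $H(T) = -0.1$ for its positive root and evaluate the peak of the parabola, which occurs where the vertical velocity $-9.8T + v_0$ is zero. The names `flight_time` and `max_height` are hypothetical helpers introduced here for illustration; the example takeoff velocities of 3.3 m/s and 3.2 m/s are the averages King (2005) reports for quadruple and triple toe loops.

```python
import math

G = 9.8  # magnitude of gravitational acceleration, m/s^2


def flight_time(v0, landing_drop=0.1):
    """Time (s) until the center of mass is landing_drop meters below takeoff.

    Positive root of -(G/2) T^2 + v0 T = -landing_drop.
    """
    return (v0 + math.sqrt(v0 ** 2 + 2 * G * landing_drop)) / G


def max_height(v0):
    """Peak height (m) above takeoff, reached at T = v0 / G."""
    return v0 ** 2 / (2 * G)


for label, v0 in [("quadruple toe loop", 3.3), ("triple toe loop", 3.2)]:
    print(f"{label}: flight time {flight_time(v0):.3f} s, "
          f"max height {max_height(v0):.3f} m")
```

With a takeoff velocity of 3.3 m/s this gives a flight time of about 0.703 s and a peak of about 0.556 m; with 3.2 m/s it gives about 0.683 s and 0.522 m.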
The average takeoff velocity for the quadruple toe loop was 3.3 m/s, with a standard deviation of 0.2 m/s. Using this average takeoff velocity, the kinematics equation describing the vertical height $H$ of the skater’s center of mass as a function of time $T$ is $H(T) = -\tfrac{9.8}{2} T^2 + 3.3T$ (see the solid curve in Fig. 15). If the skater’s center of mass is 0.1 m lower upon landing than upon takeoff, the flight time during the quadruple toe loop (in Fig. 15, the time between the skater’s takeoff at point A and landing at point F) is 0.703 s. The maximum of the parabolic trajectory of the center of mass is at a height of 0.556 m.

King (2005) found that the average vertical takeoff velocity for triple toe loops by the same Olympic male skaters was 3.2 m/s, with a standard deviation of 0.4 m/s. The kinematics equation describing the skater’s vertical height is then $H(T) = -4.9T^2 + 3.2T$ (see the dotted curve in Fig. 15, from point A to point C). If the skater’s center of mass lands 0.1 m below takeoff, the flight time is 0.683 s and the center of mass reaches a maximum height of 0.522 m (see point B in Fig. 15) above the takeoff height (point A in Fig. 15).

Fig. 15 Average vertical height vs. flight time for male skaters’ quadruple toe loops (solid curve) and triple toe loops (dotted curve); horizontal axis: time (s); vertical axis: height (m); labeled points A, B, C, E, and F

Through experimental studies examining different physical factors contributing to vertical velocity, King (2005) found that powerful extension of the takeoff leg is the main factor that generates vertical velocity. The skater creates downward forces against the ice by extending the hip, knee, and ankle. Other factors that contribute to vertical velocity include an extension motion of the trunk and upward motions of the free limbs. Skaters’ shoulder abductor strength and knee extensor strength are the strongest predictors of jump height (King et al. 2004; King 2005).

The last phase of the jump is the landing. Understanding the biomechanics of the landing is extremely important from an injury prevention standpoint, as injuries can occur if the skater falls frequently or absorbs the landing force using improper techniques. When a skater lands a jump during off-ice practice, the landing ankle absorbs a force of at least three times the skater’s body weight (Saunders et al. 2014). On-ice jumps land with a force of 5 to 8.5 times the skater’s body weight, whether the jump is landed properly or missed (Arbour 2012). In the past, some coaches included only the successfully landed jumps in their skaters’ daily jump counts, which are tabulated in an effort to prevent overuse injuries; biomechanical research suggests including all jump attempts in the daily count, because the landing forces for missed jumps are similar to those of successful jumps (Arbour 2012).

In biomechanically optimal jump landings, the skater transfers body weight from the toe pick of the landing foot to the edge of the landing blade. Ideally, the skater lands in a controlled and balanced manner and forms a smooth curved tracing on the ice, which has the effect of dissipating the landing force. The free leg needs to be extended, and the shoulders are generally perpendicular to the landing edge of the jump (Lockwood et al. 2006). Figure 16 shows a skater’s jump landing on the right back outside edge. In a study of 42 Canadian judges’ assessments of jump landings, Lockwood et al. (2006) found that judges are consistent in their ratings of jump landings; that is, they agree on whether jumps have aesthetically pleasing landings. Lockwood et al. (2006) also found that jumps which are more biomechanically optimal are also more pleasing to watch and more highly rated by judges. There does not appear to be a single theoretical angle at which the landing foot should be flexed to minimize landing forces sustained by the ankle, but there are ways in which skaters can empirically use movements of their hips and other joints to minimize the landing forces sustained (Rowley and Richards 2015).

Quintuple Jumps?

Athletes, coaches, and biomechanical researchers have studied how skaters might attain jumps with more than four revolutions. Now that a handful of skaters have achieved quadruple jumps, including Nathan Chen, who received credit for fully rotating five different types of quadruple jumps in his free skate at the 2018 Olympic Winter Games, the next natural question is whether it will be possible for a skater to perform jumps with an even greater number of rotations. No skater has yet landed a quintuple jump in an International Skating Union sanctioned competition. Internationally renowned coach Tom Zakrajsek believes that certain skaters have the body build to achieve a quintuple jump (Poppick 2014). In order to attain additional revolutions, skaters must have both increased flight time and increased rotational speed.


Fig. 16 Jump landing edge

What might a successfully landed quintuple jump look like? Below, we describe some scenarios in which a quintuple jump could theoretically be landed. As King et al. (2004) found, successfully landed jumps may have a range of rotational speeds and flight times. The scenarios described below all assume that the skater’s center of mass lands 0.1 m below its height at takeoff. The average rotational speed in RPM is computed by dividing the number of revolutions (five for a quintuple jump) by the flight time in seconds and multiplying by 60. The scenarios reflect calculations with the premises that quintuple jumps must have at least the same vertical velocity as quadruple jumps, and at least the same angular velocity as quadruple jumps. The first scenario reported in Table 6 uses the same initial vertical velocity (3.3 m/s) and flight time (0.703 s) as the average quadruple toe loop examined in King’s (2005) study. This scenario results in an average angular velocity of 427 RPM. Each scenario in a subsequent row of Table 6 increases the vertical velocity by 0.1 m/s over the prior row’s scenario. The tenth scenario uses the same average angular velocity, 341 RPM, as the quadruple jumps in King’s (2005) study; the corresponding initial vertical velocity needed for a quintuple jump is 4.2 m/s. The greatest height of the skater’s center of mass in these scenarios is that of scenario 10, at 0.900 m.


Table 6 Characteristics of theoretically possible quintuple jumps

Scenario   Initial vertical velocity (m/s)   Flight time (s)   Average angular velocity (RPM)   Max. height (m)
1          3.3                               0.703             427                              0.556
2          3.4                               0.722             416                              0.590
3          3.5                               0.742             404                              0.625
4          3.6                               0.761             394                              0.661
5          3.7                               0.781             384                              0.698
6          3.8                               0.800             375                              0.737
7          3.9                               0.821             365                              0.776
8          4.0                               0.841             357                              0.816
9          4.1                               0.860             349                              0.858
10         4.2                               0.880             341                              0.900
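The scenarios in Table 6 follow from the projectile-motion model, and a short script can regenerate them. The Python sketch below assumes a 0.1 m lower center of mass at landing, five revolutions per jump, and takeoff velocities from 3.3 to 4.2 m/s in steps of 0.1 m/s; the function name `quintuple_scenario` is a hypothetical helper introduced for illustration. Values computed this way may differ from the printed table in the last digit because of rounding.

```python
import math

G = 9.8  # magnitude of gravitational acceleration, m/s^2


def quintuple_scenario(v0, revolutions=5, landing_drop=0.1):
    """Return (v0, flight time in s, average RPM, max height in m)."""
    # Positive root of -(G/2) T^2 + v0 T = -landing_drop.
    t = (v0 + math.sqrt(v0 ** 2 + 2 * G * landing_drop)) / G
    # Revolutions per second, converted to revolutions per minute.
    rpm = revolutions / t * 60
    # Peak of the parabola above the takeoff height.
    height = v0 ** 2 / (2 * G)
    return v0, t, rpm, height


scenarios = [quintuple_scenario(round(3.3 + 0.1 * i, 1)) for i in range(10)]
for number, (v0, t, rpm, height) in enumerate(scenarios, start=1):
    print(f"{number:>2}  v0={v0:.1f} m/s  T={t:.3f} s  "
          f"{rpm:.0f} RPM  Hmax={height:.3f} m")
```

The first and last rows reproduce the two anchoring cases in the text: about 427 RPM at 3.3 m/s, and about 341 RPM with a 0.900 m peak at 4.2 m/s.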

Figures 17 and 18 are visual representations of the information in Table 6. As with all projectile motion, the skater’s center of mass travels along a parabolic path in the vertical plane during the jump. The skater’s vertical height as a function of time for each scenario reported in Table 6 is graphed in Fig. 17. The highest curve represents scenario 10, and the lowest curve represents scenario 1. Figure 18 is a graph showing the relationship between the skater’s maximum height and the angular velocity. These are both factors that skaters must be aware of in order to complete a successful quintuple jump, based on principles of angular momentum and projectile motion. The quintuple jump in scenario 10 is represented by the top-most dot, showing a height of 0.900 m and an angular velocity of 341 RPM.

Fig. 17 Trajectories of vertical height vs. time for quintuple jumps

Fig. 18 Vertical height and angular velocity of possible quintuple jumps (horizontal axis: angular velocity in RPM; vertical axis: vertical height in m)

It may be natural to wonder whether it is within the realm of human possibility to land a quintuple jump, given the necessary angular velocities and vertical heights calculated. Regarding angular velocity, biomechanist Jim Richards observed that skaters’ peak angular velocities are typically 80–100 RPM higher than their average angular velocities while performing quadruple jumps (Gonzalez 2018). The quadruple jumps in Nathan Chen’s programs at the 2018 Olympic Winter Games had a peak rotation rate of approximately 440 RPM (The New York Times 2018). Even though 440 RPM was not Chen’s average angular velocity, it seems promising for quintuple jumps that it is humanly possible to rotate at the speed that Chen was able to produce. Regarding vertical height, we have no empirical data showing that skaters have attained the vertical heights required for quintuple jumps; however, other athletes have reached and surpassed these heights (without rotation). Each year, the National Basketball Association hosts a Draft Combine event at which players who are entered in the draft must perform certain skills. One of the Draft Combine skills is the maximum vertical jump, for which players are allowed a running start to jump as high as they can. In 2013, player D.J. Stephens attained a maximum vertical height of 46 inches, or approximately 1.17 m (NBA 2014). This height is much higher than is needed for a skater’s quintuple jump. Many basketball players in the annual combines attain a maximum vertical height of at least 0.9 m. However, some basketball players’ moments of inertia (in particular, those arising from large shoulder width) may prevent them from attaining the angular velocities needed to complete successful quintuple jumps on the ice.

Training Tools for Jumps

Thus far, three kinds of training aids have been used to help skaters work towards attaining multi-revolution jumps on the ice: pole harnesses (see Fig. 19, left-hand side), hinged figure skating boots (see Fig. 19, right-hand side), and weighted gloves. The effects of these training aids are explained in this section.

Pole Harness

A quadruple Axel involves 4.5 revolutions in the air, with a forward takeoff and a backward landing. Several internationally competitive male skaters have successfully landed quadruple Axels in practice with the assistance of a handheld pole harness operated by their coaches (OlympicTalk 2018). The first such video was taken in October 2016 of Max Aaron, the 2013 US national champion, with his coach Tom Zakrajsek assisting him. To operate the pole harness, a coach skates alongside the skater. The coach holds a flexible cylindrical rod, which is connected to a rope that is affixed to a shoulder harness worn by the skater. The pole harness resembles a fishing pole and acts like a lever with which the coach assists the skater in rotating about a vertical axis. The pole harness is a commonly used training tool that helps skaters develop the timing and muscle memory needed to perform a jump. In addition to helping skaters gain flight time, the pole harness has other benefits. If the skater makes an error in technique, the coach can use the pole to help the skater fall at a slower rate. However, skaters cannot rely on these harnesses during their competitive performances.

Fig. 19 Training tools: pole harness and the Pro-Flex model hinged figure skating boot manufactured by Jackson Ultima

Hinged Figure Skating Boot

Coach Zakrajsek suggested that an equipment modification, such as a pivoting hinge on the skater’s takeoff boot, may be necessary to increase skaters’ range of motion in the ankle and increase vertical takeoff velocities (Wetzel 2018). The idea of using a hinged skating boot is not new, but additional research is needed to develop a usable model. In 2004, biomechanical researchers at the University of Delaware collaborated with a figure skating boot company, Jackson Ultima, to manufacture and test a hinged skating boot that they called the “Pro-Flex” (Mayer 2018). US Figure Skating approved hinged boots for use in competitions, as long as they are commercially available to all skaters who wish to use them, as opposed to being available only to certain skaters affiliated with biomechanics research labs. However, the Jackson Ultima Pro-Flex boot did not gain popularity among skaters, and Jackson Ultima discontinued manufacturing it. No other boot manufacturer has decided to market this boot design (Waldman 2006). University of Delaware research team members Dustin Bruening and Jim Richards reported that skaters found this design to be bulkier than a traditional skating boot, and that skaters had difficulty adapting to the increased range of motion afforded by the hinge (Bruening and Richards 2006).

Weighted Gloves

Additional equipment may be needed to help skaters increase their rotation speed. Biomechanist Sarah Ridge proposes the use of weighted gloves: having extra mass at arm’s length from the skater’s axis of rotation would increase the skater’s moment of inertia. This would also cause an increase in angular momentum, and when skaters bring their arms towards their torsos in mid-air, their angular velocities would increase (Gonzalez 2018).

International Judging System Scoring

The ISU’s International Judging System is currently used for ISU sanctioned figure skating competitions. It provides a way to rate figure skating performances numerically based upon rubrics that are revised annually (e.g., ISU 2018b). Technical panel officials and judges all watch each skating performance, but they have different roles in rating it. Technical panel officials identify and discuss the elements that the skaters performed, and judges rate the quality with which the elements were performed. Five people are assigned to each technical panel for an event, and they have access to video replay equipment to review a skater’s performance. There is a maximum of nine judges on the judges’ panel for each ISU sanctioned event.


In each competitive event, skaters perform two programs. For singles and pairs, the first program is called the “short program” and the second program is called the “free skate” (formerly called the “long program”). For ice dance, the first program is called the “rhythm dance” (formerly called the “short dance”) and the second program is called the “free dance”. The skater or couple with the highest competition score wins the competition. The competition score is the sum of the two program scores. Each program’s score is called the “Total Segment Score” (TSS) and is assigned taking into consideration the Total Element Score (TES), the Program Components Score (PCS), and any deductions that have been merited. Calculation methods for the TES and PCS are detailed below. Deductions can be taken for errors such as: a time violation, when the program is shorter or longer than the required time; costume or prop violations; illegal elements or movements; falls; an interruption in performing the program in excess of 10 s; a late start of the program after the skater’s name has been announced; and a violation of tempo specifications (ISU 2018b).

To illustrate how scores are tabulated, Fig. 20 shows the breakdown of the Total Element Scores and Program Components Scores for the 2018 Olympic Winter Games pairs event. German pairs team Aljona Savchenko and Bruno Massot won the event after they earned a Total Segment Score of 76.59 points in the short program and a Total Segment Score of 159.31 points in the free skate, for a point total of 235.90 overall. Even though Savchenko and Massot placed only fourth in the short program, they garnered enough points in the free skate to win overall. Chinese team Wenjing Sui and Cong Han earned the silver medal after they scored 82.39 points in the short program and 153.08 points in the free skate, earning 235.47 points overall.
The bronze medalists were the Canadian team of Meagan Duhamel and Eric Radford, who earned 230.15 points overall, with 76.82 points in the short program and 153.33 points in the free skate.

Fig. 20 2018 Olympic Winter Games pairs event medalists’ scores

The Total Element Score depends upon the “base values” of the technical elements performed within the program, along with the judges’ assessment of how well the elements were performed. Technical elements include jumps, spins, step sequences, lifts (for pairs and ice dance couples), death spirals (pairs only), throw jumps (pairs only), and twizzles (ice dance only). The base value is a numeric value assigned to each element depending upon the difficulty of the element, and the judging panel decides the Grade of Execution of the element. A description of how jumps are scored follows; similar scoring mechanisms are in place for the other technical elements.

For jumps, the larger the number of revolutions, the more difficult the jump is to perform and, consequently, the higher the base value assigned to that jump. Quadruple jumps are the most difficult jumps that have been performed to date, and the ISU Figure Skating Media Guide (ISU 2018c) reports records that skaters have attained. The first triple jump recorded in ISU competition was a triple loop landed by American skater Richard Button at the 1952 Olympic Winter Games. In 1988, Canadian skater Kurt Browning became the first man to cleanly land a quadruple jump in an ISU sanctioned competition when he landed a quadruple toe loop at the World Championships. The first lady to cleanly land a quadruple jump was Japanese skater Miki Ando, who landed a quadruple salchow at the 2002 Junior Grand Prix Final; and the first lady to land a quadruple toe loop was Russian skater Alexandra Trusova at the 2018 World Junior Championships. American skater Nathan Chen set a record for the highest number of quadruple jumps in one program, six, in his men’s free skate at the 2018 Olympic Winter Games. At the 2002 Cup of Russia competition, Russian skater Evgeni Plushenko completed the first jump combination that involved one quadruple jump and two triple jumps: a quadruple toe loop, triple toe loop, and triple loop performed consecutively.

Table 7 shows the base values for the toe loop jump from the 2018–2019 season’s International Judging System Scale of Values. In order to perform the toe loop jump, a skater takes off and lands on the same backward outside edge, and the skater uses the toe pick of the skating blade to assist the takeoff. The toe loop is considered the easiest of the toe-assisted jumps because the hips and upper body rotate half a revolution upon takeoff (Petkevich 1989). The single toe loop involves one revolution (360 degrees) in the air and has a base value of 0.4 points; the double toe loop involves two revolutions in the air and has a base value of 1.3 points; the triple toe loop involves three revolutions in the air and has a base value of 4.2 points; and the quadruple toe loop involves four revolutions in the air and has a base value of 9.5 points. A jump is considered under-rotated if the skater rotated between 0.25 and 0.5 revolutions less than the designated jump; for instance, a toe loop jump that rotates 1.6 revolutions in the air is considered an under-rotated double toe loop. Under-rotated jumps receive 75% of the credit that fully rotated jumps receive. As of the 2018–2019 season, each completed element is evaluated for quality of execution on an eleven-point Grade of Execution (GOE) scale of −5 to +5, where each step on the scale corresponds to a change of 10% of the base value: an increase for positive grades and a decrease for negative grades. Reductions (negative Grades of Execution: −5, −4, −3, −2, −1) are taken if the skater makes an error in the jump, such as if the skater falls, touches down with one or both hands, lands on two feet instead of on one foot, uses the wrong edge for the takeoff of the jump, and/or has poor speed, height, distance, or air position. Positive Grades of Execution (+1, +2, +3, +4, +5) are given if the skater has good height and good horizontal length of the jump, has a good takeoff and landing, looks effortless throughout, has a preparation that includes an unexpected or creative entrance, has good body position, and/or matches the music (ISU 2018b).

Table 7 Points awarded for toe loop jumps (ISU 2018b)

Jump name                           Abbreviation   Base value   Minimum score (−5 GOE)   Maximum score (+5 GOE)
Single toe loop, under-rotated      1T<            0.30         0.15                     0.45
Single toe loop                     1T             0.40         0.20                     0.60
Double toe loop, under-rotated      2T<            0.98         0.49                     1.47
Double toe loop                     2T             1.30         0.65                     1.95
Triple toe loop, under-rotated      3T<            3.15         1.58                     4.73
Triple toe loop                     3T             4.20         2.10                     6.30
Quadruple toe loop, under-rotated   4T<            7.13         3.57                     10.70
Quadruple toe loop                  4T             9.50         4.75                     14.25

To provide a perspective on how the technical elements of jumps, spins, and step sequences contribute to a skater’s Total Element Score, Fig. 21 was generated using the results of the 2018 ISU Grand Prix Final (ISU 2018f). The Total Element Scores of men’s event winner Nathan Chen and ladies’ event winner Rika Kihira are displayed. The short program and free skate at the senior level together include 19 required elements: 10 jumps, 6 spins, and 3 step sequences. Typically, successful skaters will have jumps contributing a significant portion of the Total Element Score. For example, two of the jumps in Chen’s free skate (a quadruple flip earning 15.40 points and a triple Axel earning 10.06 points) received more points than his six spins in the short program and the free skate combined. Chen earned 110.19 points for jumps, 25.12 points for spins, and 15.26 points for step sequences, for a Total Element Score of 150.57 points. Kihira earned 87.24 points for jumps, 22.94 points for spins, and 15.39 points for step sequences, for a Total Element Score of 125.57 points. The percentages of the Total Element Score for each type of element are reported in the figure.
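The toe loop scoring rules described in this section can be sketched in a few lines of Python. This is an illustration only, not the ISU’s scoring software: the function name `toe_loop_score` is hypothetical, and the rounding convention (round the under-rotated base value to two decimals, half up, before applying the Grade of Execution) is an assumption inferred from the values printed in Table 7.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Base values for fully rotated toe loops, by number of revolutions.
TOE_LOOP_BASE = {
    1: Decimal("0.4"),
    2: Decimal("1.3"),
    3: Decimal("4.2"),
    4: Decimal("9.5"),
}


def round_half_up(value):
    """Round a Decimal to two places, with halves rounded up."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def toe_loop_score(revolutions, goe, under_rotated=False):
    """Score for one toe loop given an integer GOE from -5 to +5."""
    if goe not in range(-5, 6):
        raise ValueError("GOE must be an integer from -5 to +5")
    base = TOE_LOOP_BASE[revolutions]
    if under_rotated:
        # Under-rotated jumps receive 75% of the base value
        # (rounding the result first is an assumed convention).
        base = round_half_up(base * Decimal("0.75"))
    # Each GOE step changes the score by 10% of the base value.
    return round_half_up(base + base * Decimal(goe) / 10)


print(toe_loop_score(4, 5))                      # quadruple toe loop, +5 GOE
print(toe_loop_score(4, -5, under_rotated=True)) # under-rotated, -5 GOE
```

Using `Decimal` rather than floating-point arithmetic is a design choice made here so that values such as 7.125 round up to 7.13 instead of being subject to binary rounding artifacts.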
In addition to rating skating performances for their technical element execution, judges also rate skaters on their overall presentation of five Program Components: Skating Skills, Transitions, Performance, Composition, and Interpretation of the Music (Interpretation of the Music/Timing for ice dance). These program components are based on judges’ impressions of, respectively, the overall quality of skating ability, the purposeful use of intricate footwork and movements linking the technical elements, the physical and emotional involvement of the skater, the original arrangement of the movements and the use of space, and the translation of the character of the music to movement on the ice. Each of the five Program Components is rated on a scale from 0.25 to 10.00 in increments of 0.25. In the free skating program, the Program Components Score is multiplied by a factor of 2, whereas in the short program it is multiplied by a factor of 1. For both Grades of Execution and Program Components, the judges’ highest and lowest scores are dropped and the remaining scores are averaged for the final score.

Fig. 21 Distribution of technical elements in 2018 ISU Grand Prix Final winners’ Total Element Scores. Nathan Chen (USA): jumps 73.2%, spins 16.7%, steps 10.1%. Rika Kihira (Japan): jumps 69.5%, spins 18.3%, steps 12.3%.

To illustrate this calculation, we provide the Program Components scores of Chinese skater Boyang Jin from the free skating program of the men’s single skating event at the 2018 Olympic Winter Games (see Table 8). The highest and lowest scores within each row are marked with square brackets and were not considered in the calculation of the score. For Skating Skills, Judge 2 rated Jin’s skating with the lowest score of 8.25 points, and Judge 7 rated Jin’s skating with the highest score of 9.5 points. The scores from the other seven judges (9.00, 8.75, 8.75, 8.5, 8.5, 8.75, and 8.75 points, respectively) were averaged to obtain the Skating Skills score of 8.71 points.

Table 8 Chinese skater Boyang Jin’s Program Components Scores from the 2018 Olympic Winter Games men’s free skate (ISU 2018a)

Component                     Judge 1   Judge 2   Judge 3   Judge 4   Judge 5   Judge 6   Judge 7   Judge 8   Judge 9   Score
Skating Skills                9.00      [8.25]    8.75      8.75      8.50      8.50      [9.50]    8.75      8.75      8.71
Transitions                   8.50      [7.50]    8.50      7.75      8.00      8.00      [9.25]    8.50      8.25      8.21
Performance                   8.75      8.50      8.75      8.75      [8.25]    8.50      [9.50]    8.75      8.50      8.64
Composition                   9.00      [7.75]    9.00      8.50      8.50      8.75      [9.50]    8.75      8.25      8.68
Interpretation of the Music   9.00      [8.00]    9.00      8.25      8.50      8.50      [9.50]    8.75      8.50      8.64
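The drop-highest-and-lowest averaging described above can be expressed as a short function. The following is a minimal Python sketch, assuming one mark per judge; `panel_score` is a hypothetical name introduced here, and the sample data are the nine Skating Skills marks for Boyang Jin quoted in the text, listed in judge order.

```python
def panel_score(marks):
    """Average of the marks after dropping one highest and one lowest."""
    if len(marks) < 3:
        raise ValueError("need at least three marks to trim both ends")
    ordered = sorted(marks)
    kept = ordered[1:-1]  # discard the single lowest and single highest
    return sum(kept) / len(kept)


# Judges 1 to 9; Judge 2 gave the lowest mark and Judge 7 the highest.
skating_skills = [9.0, 8.25, 8.75, 8.75, 8.5, 8.5, 9.5, 8.75, 8.75]
print(f"Skating Skills: {panel_score(skating_skills):.2f}")
```

Only one copy of the highest mark and one copy of the lowest mark are removed, so tied extreme marks from other judges still count towards the average.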


Judge 2, American Lorrie Parker, provided the lowest ratings for Jin’s Skating Skills, Transitions, Composition, and Interpretation of the Music; and Judge 5, Israeli Albert Zaydman, provided the lowest rating for Jin’s Performance. Judge 7, Weiguang Chen from China, provided the highest ratings for all of Jin’s five Program Components, so her scores were never included in Jin’s scores for Program Components. Jin placed fourth overall at the 2018 Olympic Winter Games in the men’s figure skating event.

Judging Biases

In any sport where the final results are determined by subjective ratings, developing a valid and reliable scoring system is challenging (Emerson et al. 2009). At the 2002 Olympic Winter Games, a highly publicized judging scandal involving vote-trading occurred. As a result, the International Olympic Committee ordered that a new, more transparent scoring system be developed in hopes of avoiding some of the well-known problems with the judging system then in use. Among these problems were that one judge’s vote could drastically change the outcome of the event, variability among judges’ ratings was high, skate order (determined randomly prior to the event) mattered greatly, judges’ cognitive loads were high, and validity was in question when skaters regularly placed difficult choreography in blind spots for the judges (Swift 1998). The newer judging system, the International Judging System (IJS), has been used since 2004 and has been an improvement on multiple fronts over the 6.0 judging system, yet reputation biases and nationalism biases may still affect judging outcomes.

Immediately after the 2002 Olympic Winter Games pairs free skate, ISU referee Ron Pfenning filed an official complaint about the judging of the pairs event (ISU 2002). The Canadian team of Jamie Sale and David Pelletier had a clean free skate with easier technical elements than the Russian team of Yelena Berezhnaya and Anton Sikharulidze, whereas the Russian team made several noticeable errors on their more difficult technical elements. French judge Marie-Reine Le Gougne reported being pressured to rate the Russian pairs team highest in the pairs event, reportedly in exchange for a Russian judge’s rating the French ice dance team highest. After an investigation ordered by the International Olympic Committee, Le Gougne was suspended from judging for 3 years and barred from the 2006 Olympic Winter Games (ISU 2002).
Two gold medals were awarded for the pairs event: one to the Russian pairs team and one to the Canadian pairs team. Details about the computation of scores in the “6.0” judging system and how Le Gougne’s ratings affected the initial overall outcome of the pairs event are described in Cheng (2013).

Huang and Foote (2011) found considerable variability among judges’ ratings in events at the 2004 World Figure Skating Championships. The total variance for judges’ ratings was 7.9% and 9.8%, respectively, for the men’s and ladies’ singles events, indicating that scoring reliability under the 6.0 judging system was problematic. The 2004 World Figure Skating Championships was the last major ISU international competition scored under the 6.0 judging system.

In the 6.0 judging system, judges only reported their preferences for relative rankings of skaters’ technical and artistic marks, using numeric values ranging from 0.0 to 6.0 points. Judges had to keep track of the elements that were performed, mentally rate the difficulties of those elements, and rate the quality of execution of these elements while deciding upon skaters’ relative rankings. A gap of several hours might occur between the first and the last skater of an event. With this difficult cognitive load, the skate order of the skaters in the event played a large role in the overall result. Skaters tended to score higher if they skated later in the event, since skaters who skated later were more memorable to the judges. With the IJS, judges no longer need to identify the elements performed or provide relative rankings. Rather, under the IJS, three different individuals on the Technical Panel (the Technical Specialist, the Assistant Technical Specialist, and the Technical Controller) work together to identify the elements that were performed so that the accuracy of identifying elements is improved, and the Technical Panel officials do not serve as judges of the same event. In the IJS, the judges only rate each performed element for its quality, so that wait time between elements or skaters is not a factor in their decision-making process.

Also, in the 6.0 judging system, judges did not have access to video review of elements after the performances occurred. Coaches and skaters therefore often placed their jump combinations in a blind spot for the judges, so that the judges would be less likely to see any errors made in these jump combinations (Swift 1998).
With the International Judging System, judges have access to video replay of elements so that they can review questionable elements after the skating performance takes place, irrespective of the placement of technical elements on the ice surface.

Reputation bias can be present in any judging system whenever some skaters are better known to judges than others. Findlay and Ste-Marie (2004) performed an experimental study with qualified Canadian figure skating judges, in which they asked 12 judges to watch videos of selected skaters’ performances and score them on their technical and artistic aspects. Half of the judges were from Ontario and the other half were from Quebec. All the judges watched the same 14 videos of skating performances. Half of the videos were of Ontario section skaters who were rated collectively as being positively well known to the Ontario judges and not known to the Quebec judges; the other half were of Quebec section skaters who were rated collectively as being positively well known to the Quebec judges and not known to the Ontario judges. The judges participating in this study gave higher technical marks to skaters whom they already knew and thought had made positive names for themselves.

Nationalism bias occurs when a judge deliberately shows preference for skaters representing the judge’s country, and it is considered a serious ethical offense by the International Skating Union. Zitzewitz (2006) found evidence of nationalism bias under the 6.0 judging system, and nationalism bias continues to appear when events are judged using the IJS. After the 2018 Olympic Winter Games, the International Skating Union suspended Chinese judge Weiguang Chen and barred

69 Breaking the Ice: Figure Skating


her from judging at international figure skating events for the next 2 years and at the 2022 Olympic Winter Games (Waldeck et al. 2018). The suspension was imposed because Judge Chen rated Chinese skater Boyang Jin more highly than the other judges did and under-scored Jin’s competitors. Table 3 shows that judge #7, Weiguang Chen, awarded Jin the highest marks for all five program components. The marks from Judge Chen were 9.25 or 9.50 for each program component, whereas the marks from the remaining judges ranged from 7.50 to 9.00, and the scores of the panel ranged from 8.21 to 8.71 after eliminating the highest and lowest marks. Waldeck et al. (2018) also provide numeric evidence that Judge Chen’s ratings of two of the three men’s figure skaters who eventually placed higher than Jin in this event (Shoma Uno of Japan and Javier Fernandez of Spain) were much lower than the ratings that the remaining judges awarded to these skaters.

Figure Skating Team Event

The figure skating team event was first contested at the Olympic Winter Games in 2014, and it was held again in 2018. Ten countries qualified to have teams compete in the team event. In Round 1 of the figure skating team event, each of the ten countries had an entrant perform the short program (or short dance) in each of four disciplines: men’s singles, ladies’ singles, pairs, and ice dance. In Round 2, only the top five countries from Round 1 performed programs. Each of these five countries had an entrant perform the free skate (or free dance) in the same four disciplines. Within each round, the top-ranked entrant per discipline earned 10 points for the team, the second-ranked entrant earned 9 points, and so on, down to the last-ranked entrant, who earned 1 point (in Round 1) or 6 points (in Round 2). The team with the largest sum of points accrued by its entrants after Round 2 earns first place and the gold medal; the team with the second-largest sum of points earns second place and the silver medal; the team with the third-largest sum of points earns third place and the bronze medal. In the case of a tie in points earned, the International Judging System scores for each entrant are added and the country with the higher sum wins the tiebreaker.

The points earned by the entrants of each of the teams in the 2018 Olympic Winter Games team event Round 1 are listed in Table 9. The ten countries which participated in the team event in 2018 are the following, in order of placement from first to last: Canada (CAN), Olympic Athletes from Russia (OAR), United States of America (USA), Italy (ITA), Japan (JPN), People’s Republic of China (CHN), Germany (GER), Israel (ISR), Republic of Korea (KOR), and France (FRA).
The column headings “M”, “L”, “P”, and “D” refer to the points earned by a given country’s Men’s, Ladies, Pairs, and Dance entrants, respectively. Adding the team event to the figure skating schedule at the Olympic Winter Games gives skaters another opportunity to earn medals. For example, the team representing the United States of America at the 2018 Olympic Winter Games


Table 9 Points earned at the 2018 OWG team event Round 1

Country  M    L    P    D    Round 1 Total
CAN      8    8    9    10   35
OAR      3    10   10   8    31
USA      7    6    7    9    29
ITA      6    9    4    7    26
JPN      10   7    3    6    26
CHN      4    4    6    4    18
GER      2    3    8    3    16
ISR      9    1    2    1    13
KOR      5    5    1    2    13
FRA      1    2    5    5    13

consisted of men’s skaters Nathan Chen and Adam Rippon, ladies skaters Mirai Nagasu and Ashley Wagner, pairs couple Alexa Scimeca and Christopher Knierim, and ice dance couple Maia and Alex Shibutani. In the team event, the American team placed third and earned a bronze medal. Maia and Alex Shibutani earned another bronze medal in the ice dance event. However, none of the other American entrants in the team event earned medals in their individual events.
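The Round 1 arithmetic behind Table 9 can be sketched in a few lines of Python. This is only an illustration of the scoring rule described above: the names `ROUND1_POINTS`, `placement_points`, `round1_totals`, and `round2_qualifiers` are hypothetical helpers introduced here, and the sketch does not implement the IJS-score tiebreaker.

```python
# Round 1 points from Table 9, in the order Men (M), Ladies (L), Pairs (P), Dance (D).
ROUND1_POINTS = {
    "CAN": (8, 8, 9, 10),
    "OAR": (3, 10, 10, 8),
    "USA": (7, 6, 7, 9),
    "ITA": (6, 9, 4, 7),
    "JPN": (10, 7, 3, 6),
    "CHN": (4, 4, 6, 4),
    "GER": (2, 3, 8, 3),
    "ISR": (9, 1, 2, 1),
    "KOR": (5, 5, 1, 2),
    "FRA": (1, 2, 5, 5),
}


def placement_points(rank):
    """Points for a placement: 1st earns 10, 2nd earns 9, ..., 10th earns 1.

    The same rule covers Round 2, where 5th (last) place earns 6 points.
    """
    return 11 - rank


def round1_totals(points):
    """Sum each country's four discipline scores."""
    return {country: sum(scores) for country, scores in points.items()}


def round2_qualifiers(totals, n=5):
    """Return the n countries with the highest totals.

    Ties are left in input order; the IJS-score tiebreaker is not modeled.
    """
    return sorted(totals, key=totals.get, reverse=True)[:n]


totals = round1_totals(ROUND1_POINTS)
print(totals)
print(round2_qualifiers(totals))
```

Italy and Japan are tied on points in Round 1, but both are inside the top five, so the missing tiebreaker does not affect which teams advance in this example.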

Entrants’ Contributions to Their Team Scores

Sports analysts and countries’ skating federations may be interested in determining entrants’ contributions towards the team, whether to award “Most Valuable Player” types of designations or to decide how to split financial incentives. It may be natural to think that the contributions of the entrants towards the team are proportional to the number of points earned, but depending on the team’s goal, this might not be the case. Cheng and Coughlin (2017) suggested using equations from the Banzhaf and Shapley-Shubik indices to measure the extent to which entrants in the figure skating team event contributed towards their countries’ goals.

Next, we show how the Banzhaf and Shapley-Shubik indices can be applied to a scenario in the 2018 Olympic Winter Games team event. The 2018 Olympic Winter Games team representing the Olympic Athletes from Russia earned a total of 31 points in Round 1 of the team event, as shown in Table 9. The Men’s entrant, Mikhail Kolyada, earned 3 points. The Ladies entrant (Evgenia Medvedeva) and the Pairs entrant (couple Evgenia Tarasova and Vladimir Morozov) each earned 10 points, and the Ice Dance entrant (couple Ekaterina Bobrova and Dmitri Soloviev) earned 8 points. Since the Men’s entrant earned the fewest points towards the team total, it may appear that he contributed


3/31, or approximately a tenth, of the way towards the team’s placement in Round 1. However, there are at least two situations in which this is not the case:

• Suppose the OAR team wants to know the contributions of the entrants towards a team goal of earning 26 points. This goal would correspond to earning at least as many points as the fourth-place team of Italy. The resulting Banzhaf and Shapley-Shubik indices show that the Men’s entrant did not contribute at all towards this goal, since the sum of the points earned by the Ladies, Pairs, and Dance entrants is 28 points and already meets the goal of 26 points. Also, even though there is a two-point difference in the number of points earned by the Dance entrant in comparison to the Ladies or the Pairs entrants, the Banzhaf and Shapley-Shubik indices indicate that all three entrants contributed equally (33.3%) towards this goal.
• Suppose the OAR team wants to know the contributions of the entrants towards a team goal of 29 points, corresponding to the number of points earned by the third-place team from the United States of America. The Banzhaf and Shapley-Shubik indices show that all four entrants contributed equally (25%) towards this goal. A goal of 29 points is sufficiently high that no combination of three entrants’ points will meet it, and the points earned by all four entrants are needed.
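The two scenarios above can be checked with a short Python sketch. It treats the OAR entrants as players in a weighted voting game whose weights are their Round 1 points and whose quota is the team goal. This uses the standard textbook definitions of the normalized Banzhaf index and the Shapley-Shubik index, which may differ in detail from the equations in Cheng and Coughlin (2017). The function names and the `OAR` dictionary are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, permutations

# Round 1 points of the OAR entrants (Table 9), used as voting weights.
OAR = {"M": 3, "L": 10, "P": 10, "D": 8}


def banzhaf(weights, quota):
    """Normalized Banzhaf index: each player's share of all critical memberships."""
    if quota > sum(weights.values()):
        raise ValueError("quota cannot be met even by the full team")
    players = list(weights)
    swings = dict.fromkeys(players, 0)
    for size in range(1, len(players) + 1):
        for coalition in combinations(players, size):
            total = sum(weights[p] for p in coalition)
            if total < quota:
                continue  # not a winning coalition
            for p in coalition:
                # p is critical if the coalition fails without p
                if total - weights[p] < quota:
                    swings[p] += 1
    all_swings = sum(swings.values())
    return {p: Fraction(swings[p], all_swings) for p in players}


def shapley_shubik(weights, quota):
    """Shapley-Shubik index: share of orderings in which a player is pivotal."""
    if quota > sum(weights.values()):
        raise ValueError("quota cannot be met even by the full team")
    players = list(weights)
    pivots = dict.fromkeys(players, 0)
    n_orders = 0
    for order in permutations(players):
        n_orders += 1
        running = 0
        for p in order:
            running += weights[p]
            if running >= quota:  # p tips the running sum over the quota
                pivots[p] += 1
                break
    return {p: Fraction(pivots[p], n_orders) for p in players}


for goal in (26, 29):
    print(goal, banzhaf(OAR, goal), shapley_shubik(OAR, goal))
```

Enumerating every coalition and ordering is only practical for small teams, but with four entrants there are just 15 non-empty coalitions and 24 orderings, so exact fractions are cheap to compute.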

Team Event Compared to Hypothetical Team Event

Sports fans may wonder what the results of a team event would have been if the Olympic Winter Games prior to 2014 had conducted such an event. Since the only scores available from past Olympic Winter Games are individual event scores, it is worth first examining whether the actual 2014 and 2018 team event results resemble the results obtained when team event scoring mechanisms are applied to the individual event results from the same Games. Cheng and Coughlin (2018) compared the placements of the countries in the actual 2014 Olympic Winter Games team event to the placements that would have resulted if, hypothetically, the 2014 individual event results had been used to assemble the team event scores. They used the results of the skaters on the ten qualifying countries’ rosters to tabulate the hypothetical number of points that each entrant would have earned for their country. These ten countries are listed in order of placement in the team event, from first place to last place: Russia (RUS), Canada (CAN), United States of America (USA), Italy (ITA), Japan (JPN), France (FRA), China (CHN), Germany (GER), Ukraine (UKR), and Great Britain (GBR). The countries’ placements in the team event are represented by rectangular bars in Fig. 22, and the hypothetical placements of the countries based on individual event results are represented by circular dots in the same figure. The hypothetical team event identified the same top three teams, in the same order, as the actual team event did. The Spearman’s


Fig. 22 Comparison of 2014 Olympic Winter Games placements for team event and hypothetical team event

correlation coefficient comparing the placements of the countries in the actual team event and the hypothetical team event at the 2014 Olympic Winter Games was 0.878, indicating a strong correlation.

Below, the methods of Cheng and Coughlin (2018) are applied to the results of the 2018 Olympic Winter Games, and the results of the team event and the hypothetical team event are compared. The hypothetical team results are based on relative placements of entrants on the ten qualifying countries’ rosters. Table 10 shows the points that each entrant would have earned in the hypothetical team event. The abbreviation “OAR” is used for the Olympic Athletes from Russia. The underlined and italicized numbers indicate that the team did not have an entrant qualified to compete in that individual event, so the team is presumed to have placed last in that event. For example, the Israeli team did not have a skater qualified to compete in the ladies event. It is remarkable that the top five countries’ placements in the actual and the hypothetical team event for the 2018 Olympic Winter Games are exactly the same. A visual representation of the comparison of placements resulting from the team event and the hypothetical team event is provided in Fig. 23. The Spearman’s correlation coefficient comparing these placements at the 2018 Olympic Winter Games was 0.915, which confirms that skaters had very similar relative placements in both settings.

The high Spearman’s correlation coefficients found for both the 2014 and 2018 Olympic Winter Games may reflect the fact that the performances at the team event took place very close in time (within 2 weeks) to the individual events. Some of the discrepancies may arise because countries might not have selected their strongest skaters to perform in the team event. For example, the French ice dance couple of Marie-Jade Lauriault and


Table 10 2018 Olympic Winter Games’ figure skating individual results used to tabulate a hypothetical team event

Actual     Country  Round 1             Round 2             Hypothetical  Hypothetical
placement           M    L    P    D    M    L    P    D    point total   placement
1          CAN      9    9    9    10   9    6    10   10   72            1
2          OAR      8    10   10   7    10   10   9    8    72            2
3          USA      5    5    4    9    8    7    7    9    54            3
4          ITA      3    7    5    8    6    9    8    7    53            4
5          JPN      10   8    2    6    7    8    6    6    53            5
6          CHN      4    2    7    2    -    -    -    -    15            8
7          GER      2    4    8    4    -    -    -    -    18            6
8          ISR      7    1    3    1    -    -    -    -    12            10
9          KOR      6    6    1    5    -    -    -    -    17            7
10         FRA      1    3    6    3    -    -    -    -    13            9

Fig. 23 Comparison of 2018 Olympic Winter Games placements for team event and hypothetical team event

Romain Le Gac performed in Round 1 of the 2018 Olympic Winter Games, and they earned five points for the team; in the individual ice dance event, they placed 17th. The French team could have selected ice dance couple Gabriella Papadakis and Guillaume Cizeron, who placed second in the ice dance event and likely would have earned more points than Lauriault and Le Gac did for the French team had they participated in the team event. However, the French team probably knew that the team would not qualify for Round 2 of the team event irrespective of the points earned from the ice dance event, and it may have wanted to rest its top athletes.
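The Spearman coefficient reported for 2018 can be reproduced from the two placement columns of Table 10. The sketch below uses the standard formula for rankings without ties, rho = 1 − 6 Σd² / (n(n² − 1)). The chapter does not state how its coefficient was computed, so treating it as this untied formula is an assumption, and `spearman_rho` is a hypothetical helper name.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two rankings of equal length with no ties."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must have the same length")
    n = len(ranks_a)
    # Sum of squared differences between paired ranks.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n**2 - 1))


# Placements from Table 10, in the country order CAN, OAR, USA, ITA, JPN,
# CHN, GER, ISR, KOR, FRA.
actual_2018 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hypothetical_2018 = [1, 2, 3, 4, 5, 8, 6, 10, 7, 9]

rho = spearman_rho(actual_2018, hypothetical_2018)
print(round(rho, 3))  # 0.915
```

Only the bottom five placements differ, giving squared rank differences of 4, 1, 4, 4, and 1 (a sum of 14), so rho = 1 − 84/990, which rounds to the 0.915 quoted in the text.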


Table 11 2006 Olympic Winter Games’ figure skating individual results used to tabulate a hypothetical team event based on ISU (2006)

Country  Round 1             Round 2             Hypothetical  Hypothetical
         M    L    P    D    M    L    P    D    point total   placement
RUS      10   9    6    10   10   8    10   10   73            1
CAN      9    10   9    9    9    7    7    7    67            2
USA      6    7    7    7    8    9    9    8    61            3
ITA      7    8    2    4    6    6    8    9    50            4
JPN      2    6    2    6    7    10   6    6    45            5
UKR      4    5    5    8    -    -    -    -    22            6
CHN      5    4    10   2    -    -    -    -    21            7
FRA      8    2    4    5    -    -    -    -    19            8
GER      3    2    8    2    -    -    -    -    15            9
GBR      1    2    2    2    -    -    -    -    7             10

When Cheng and Coughlin (2018) created a hypothetical team event using individual event results from the 2010 Olympic Winter Games, they found that Canada, the United States of America, and Russia, in that order, would have earned medals. The Japanese and Italian teams, in that order, would also have qualified for Round 2.

By extending the methods described in Cheng and Coughlin (2018) to the 2006 Olympic Winter Games, Table 11 was generated. The same ten countries that qualified for the 2014 Olympic Winter Games team event were considered in this hypothetical team event based on 2006 Olympic Winter Games individual event results. In 2006, three rounds were skated in the ice dance event: compulsory dance, original dance, and free dance. In calculating the points for Round 1 ice dance, the sums of the IJS scores from compulsory dance and original dance were used to determine relative placements. The Russian team would have won the gold medal, the Canadian team would have won the silver medal, and the American team would have won the bronze medal in the 2006 hypothetical team event. The Italian and Japanese teams, in that order, would also have qualified for Round 2.

At the 2002 Olympic Winter Games, the 6.0 judging system was used. In the 6.0 system, entrants were still ranked with ordinals for placement relative to the other skaters in their individual events, and these ordinal rankings were used to generate the points earned in each round of the hypothetical team event (the lower the ordinal ranking, the higher the number of points). The ten countries selected for analysis in the hypothetical team event were the same ten which qualified for the figure skating team event at the Olympic Winter Games in 2014. The ice dance event consisted of four rounds: two pattern dances, an original dance, and a free dance. For the purposes of this analysis, the entrant earning the most points in Round 1 had the lowest sum of ordinal rankings from the two pattern dances and the original dance.


Table 12 2002 Olympic Winter Games’ figure skating individual results used to tabulate a hypothetical team event (Olympic.org 2002)

Country  Round 1             Round 2             Hypothetical  Hypothetical
         M    L    P    D    M    L    P    D    point total   placement
RUS      10   9    10   9    10   9    9.5  9    75.5          1
USA      8    10   7    5    9    10   8    7    64            2
CAN      7    7    9    4    7    7    9.5  8    58.5          3
FRA      6    5    2    10   6    6    6.5  10   51.5          4
JPN      9    8    2    1.5  8    8    6.5  6    49            5
ITA      2    6    4    8    -    -    -    -    20            6
GER      2    2    6    7    -    -    -    -    17            7
UKR      4    4    5    6    -    -    -    -    19            8
CHN      5    2    8    1.5  -    -    -    -    16.5          9
GBR      2    2    2    3    -    -    -    -    9             10

Table 12 was generated to display the results of a hypothetical team event based on individual results from the 2002 Olympic Winter Games.

Application of Hypothetical Team Event to Past Olympic Winter Games

One way of applying the results from a hypothetical team event is to determine whether entrants who did not win an Olympic medal in an individual event could have become Olympic medalists if the figure skating team event had existed earlier. The 1980 Olympic Winter Games hypothetical team event results are reported in Table 13. There were exactly ten countries represented by the men’s entrants, and these ten countries were assumed to have qualified for the hypothetical team event. The country abbreviations used include: Soviet Union (RUS), West Germany (WGR), and East Germany (EGR). There were only five countries which qualified entrants to the individual events in all four disciplines: RUS, USA, GER, GBR, and CAN. However, the Canadian skaters did not earn enough points to qualify for Round 2.

The first-placing team in the 1980 hypothetical team event was the United States of America. The three men’s entrants representing the USA were Charles Tickner, David Santee, and Scott Hamilton. Charles Tickner placed third in the men’s event and earned the bronze medal. David Santee placed fourth and Scott Hamilton placed fifth, so neither earned a medal in the men’s event. Yet, for the purposes of the team event, entrants representing the same country earn the same number of points if they have consecutive rankings in a given round, and Tickner, Santee, and Hamilton had consecutive rankings in both rounds of the hypothetical team event.


Table 13 1980 Olympic Winter Games’ figure skating individual results used to tabulate a hypothetical team event (Olympic.org 1980)

Country  Round 1             Round 2             Hypothetical  Hypothetical
         M    L    P    D    M    L    P    D    point total   placement
USA      8    10   8    7    8    10   7    8    66            1
RUS      7    5    10   10   7    6    10   10   65            2
EGR      10   9    9    3    9    9    9    6    64            3
GBR      9    6    5    9    10   7    6    9    61            4
WGR      4    8    7    6    6    8    7    7    53            5
CAN      3    5    6    8    -    -    -    -    22            6
JPN      5    7    2.5  3    -    -    -    -    17.5          7
FRA      6    1.5  2.5  3    -    -    -    -    13            8
CHN      1    4    2.5  3    -    -    -    -    10.5          9
SWE      2    1.5  2.5  3    -    -    -    -    9             10

If there had been a figure skating team event contested in 1980, and if David Santee or Scott Hamilton had been selected to represent the USA in the team event, they could have earned an Olympic gold medal (Scott Hamilton won the gold medal in the 1984 Olympic Winter Games). The second-placing team in the 1980 hypothetical team event was the Soviet Union. In the pairs short program, there were three teams from the Soviet Union that earned equivalent rankings for the purposes of the hypothetical team event: Irina Rodnina and Alexander Zaitsev (first place), Marina Cherkasova and Sergei Shakhrai (second place), and Marina Pestova and Stanislav Leonovich (third place). The teams of Rodnina and Zaitsev and of Cherkasova and Shakhrai earned medals in the pairs event (gold and silver, respectively), but Pestova and Leonovich earned the pewter medal (fourth place) after all results were tabulated from the pairs event. If there had been a team event in 1980, the team from the Soviet Union could have chosen Pestova and Leonovich to skate in Round 1 and they would have earned a silver medal in the figure skating team event.
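The fractional values in Tables 11 to 13 (such as 1.5, 2.5, and the repeated 3s) are consistent with teams that had no entrant in a discipline sharing last place and splitting the leftover points equally. That reading is inferred from the table values rather than stated as a rule in the text, so the following Python sketch should be read as one plausible tabulation. The function name `hypothetical_points` and the lists of ranked and absent teams are assumptions made for illustration. It reproduces the Round 1 Dance column of Table 13.

```python
def hypothetical_points(ranked, absent, n_teams=10):
    """Assign team-event points for one discipline in one round.

    ranked: country codes in finishing order (best first).
    absent: country codes presumed to share last place (assumed: no entrant).
    """
    if len(ranked) + len(absent) != n_teams:
        raise ValueError("every team must be either ranked or absent")
    points = {}
    for rank, country in enumerate(ranked, start=1):
        points[country] = n_teams - rank + 1  # 10 for 1st, 9 for 2nd, ...
    if absent:
        # Assumption: the unused low point values are averaged across absent teams.
        leftover = range(1, len(absent) + 1)
        shared = sum(leftover) / len(absent)
        for country in absent:
            points[country] = shared
    return points


# Round 1 Dance order inferred from Table 13 (higher points = better placement).
dance_ranked = ["RUS", "GBR", "CAN", "USA", "WGR"]
dance_absent = ["EGR", "JPN", "FRA", "CHN", "SWE"]
print(hypothetical_points(dance_ranked, dance_absent))
```

Under this assumption the five absent teams split the point values 1 through 5 and receive 3 points each, matching the table; the same function gives 2.5 points to each of the four teams tied in the Pairs column.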

Summary

Many aspects of the art of figure skating can be explained and investigated mathematically. Due to the circular nature of the skate blade, many of the tracings that figure skaters form on the ice can be described as composite arcs of circles. Describing pattern dances within ice dance uses proportional reasoning. The movements that skaters perform can be interpreted geometrically as combinations of transformations. From a biomechanical standpoint, mathematics is used to explore ways in which skaters can train to achieve better spins and higher skills such as multi-revolution jumps. The numeric scoring system used to evaluate


skaters’ performances is also an area of mathematical consideration, as skaters try to maximize their points earned or as skating federations attempt to explain why they think scoring biases occurred. The newest event added to the Olympic Winter Games figure skating schedule, the team event, also provides an opportunity for additional explorations regarding athletes’ relative contributions towards their teams’ goals and what results might have taken place if the team event had been held in previous years.

References

Arbour K (2012) Tibial Shock of on-ice figure skating jumps relates to jump-characteristics and kinematics. Dissertation, University of Delaware
Bruening D, Richards J (2006) The effects of articulated figure skates on jump landing forces. J Appl Biomech 22(4):285–295. https://doi.org/10.1123/jab.22.4.285
Cheng D (2013) International judging system of figure skating: a middle grades activity on decimal operations. The Bridges archive. http://archive.bridgesmathart.org/2013/bridges2013-563.pdf. Accessed 21 Feb 2019
Cheng D, Coughlin P (2017) Using equations from power indices to analyze figure skating teams. Public Choice 170(3):231–251. https://doi.org/10.1007/s11127-016-0392-x
Cheng D, Coughlin P (2018) What if a figure skating team event had been held at past Olympic Winter Games? An analysis of a hypothetical competition. J Sports Anal 4(3):215–227. https://doi.org/10.3233/JSA-170148
Cheng D, Twillman M (2018) Double the fun: mathematics within pairs figure skating side by side jumps. Math Teach 111(4):249–253
Cheng D, Berezovski T, Talbert R (2019, in press) Dancing on ice: mathematics of blade tracings. J Math Arts. https://doi.org/10.1080/17513472.2018.1509259
Dubravicic-Simunjak S, Pecina M, Kuipers H, Moran J, Haspl M (2003) The incidence of injuries in elite junior figure skaters. Am J Sports Med 31(4):511–517
Emerson J, Seltzer M, Lin D (2009) Assessing judging bias: an example from the 2000 Olympic Games. Am Stat 63(2):124–131
Findlay L, Ste-Marie D (2004) A reputation bias in figure skating judging. J Sport Exerc Psychol 6:154–166
Formenti F, Minetti A (2007) Human locomotion on ice: the evolution of ice-skating energetics through history. J Exp Biol 2010:1825–1833
Fortin J, Harrington L, Langenbeck D (1997) The biomechanics of figure skating. Phys Med Rehabil 11(3):627–648
Gonzalez R (2018) Figure skating’s quintuple jump: maybe impossible, definitely bonkers. https://www.wired.com/story/can-figure-skaters-master-the-head-spinning-physics-of-a-quintuplejump/. Accessed 21 Feb 2019
Guinness World Records (2019) Records. http://www.guinnessworldrecords.com/records/. Accessed 21 Feb 2019
Huang J, Foote C (2011) Using generalizability theory to examine scoring reliability and variability of judging panels in skating competitions. J Quant Anal Sports 7:1–21
International Skating Union (2002) Communication No. 1181: Sanctions relating to 2002 Olympic Winter Games pair skating event. http://www.isu.org/vsite/vfile/page/fileurl/0,11040,4844137120-154336-25640-0-file,00.pdf. Accessed 21 Feb 2019
International Skating Union (2006) XX Olympic Winter Games Torino 2006 results. http://www.isuresults.com/results/owg2006/. Accessed 21 Feb 2019
International Skating Union (2018a) Olympic Winter Games 2018 Pyeongchang results. http://www.isuresults.com/results/season1718/owg2018/. Accessed 21 Feb 2019


International Skating Union (2018b) Communication No. 2168: single & pair skating: scale of values, levels of difficulty and guidelines for making grade of execution, season 2018/19. https://www.isu.org/communications/17142-isu-communication-2168/file. Accessed 21 Feb 2019
International Skating Union (2018c) ISU figure skating media guide 2018/19. https://isu.org/media-guides/17522-figure-skating-media-guide-2018-19/file. Accessed 21 Feb 2019
International Skating Union (2018d) International judging system: handbook for referees and judges: ice dance. https://www.isu.org/inside-single-pair-skating-ice-dance/isu-judgingsystem-fs/handbooks-faq-ice-dance-2/17832-handbook-for-referees-and-judges-2018-19-final/file. Accessed 21 Feb 2019
International Skating Union (2018e) ISU communication 2188: ice dance. https://usfsa.org/content/ISU%202188%20ID%20Communication%20replacing%202164.pdf. Accessed 21 Feb 2019
International Skating Union (2018f) ISU grand prix of figure skating final: 2018/19. http://www.isuresults.com/results/season1819/gpf1819/. Accessed 21 Feb 2019
Kerrigan N, Spencer M (2003) Artistry on Ice: figure skating skills and style. Human Kinetics, Champaign
King D (2005) Performing triple and quadruple figure skating jumps: implications for training. Can J Appl Physiol 30(6):743–753
King D (2018) Dr. Deborah King: biomechanics of figure skating. 2018. American College of Sports Medicine. https://www.youtube.com/watch?v=TacSOaXZgJQ. Accessed 21 Feb 2019
King D, Smith S, Higginson B, Muncasy B, Schierman G (2004) Characteristics of triple and quadruple toe-loops performed during the Salt Lake City 2002 Olympic Winters. Sports Biomech 3(1):109–123
Leonard A, Bannister N (2018) Dancing our way to geometric transformations. Math Teach Middle Sch 23(5):258–267
Lewin W (2018) Blended Learning Open Source Science or Math Studies (BLOSSOMS) – Ice skater’s delight: the conservation of angular momentum with professor Walter Lewin. https://techtv.mit.edu/videos/5731-blossoms-ice-skater-s-delight-the-conservationof-angular-momentum-with-professor-walter-lewin. Accessed 21 Feb 2019
Ling S, Sanny J, Moebs B (2018) OpenStax University physics: moment of inertia and rotational kinetic energy. https://phys.libretexts.org/Bookshelves/University_Physics/Book%3A_University_Physics_(OpenStax)/Map%3A_University_Physics_I_-_Mechanics%2C_Sound%2C_Oscillations%2C_and_Waves_(OpenStax)/10%3A_Fixed-Axis_Rotation__Introduction/10.4%3A_Moment_of_Inertia_and_Rotational_Kinetic_Energy. Accessed 21 Feb 2019
Lockwood K, Gervais P, McCreary D (2006) Landing for success: A biomechanical and perceptual analysis of on-ice jumps in figure skating. Sports Biomech 5(2):231–241
Mayer J (2018) The history of ice skates. https://www.sciencefriday.com/articles/history-iceskates/. Accessed 21 Feb 2019
National Basketball Association (2014) A closer look at the Draft Combine. https://www.nba.com/sixers/news/140512-combine-primer. Accessed 21 Feb 2019
Olympic.org (1980) Lake Placid 1980 Olympic Figure Skating. https://www.olympic.org/lakeplacid-1980/figure-skating. Accessed 21 Feb 2019
Olympic.org (2002) Salt Lake City 2002 Olympic Figure Skating. https://www.olympic.org/saltlake-city-2002/figure-skating. Accessed 21 Feb 2019
OlympicTalk (2018) Olympic figure skater lands quadruple Axel in harness. https://olympics.nbcsports.com/2018/06/28/keegan-messing-quadruple-axel-video-harness/. Accessed 21 Feb 2019
Petkevich J (1989) Figure skating championship techniques. Sports Illustrated Books, New York
Poppick L (2014) Is the quintuple jump in figure skating physically possible? https://www.scientificamerican.com/article/is-the-quintuple-jump-in-figure-skating-physically-possible/. Accessed 21 Feb 2019
Rowley K, Richards J (2015) Increasing plantarflexion angle during landing reduces vertical ground reaction forces, loading rates and the hip’s contribution to support moment within participants. J Sports Sci 33(18):1922–1931


Santee J (2014) Essential Knowledge of the Figure Skating Blade: Course SS 208 – Sport Safety & Science. Professional Skaters Association. http://www.skatepsa.com/psa/CER.html. Accessed 21 Feb 2019
Saunders N, Hanson N, Koutakis P, Chaudhari A, Devor S (2014) Landing ground reaction forces in figure skaters and non-skaters. J Sports Sci 32(11):1042–1049. https://doi.org/10.1080/02640414.2013.877593
Swift E (1998) Blind Justice. Sports Illustrated, pp 52–54, 57, 60
The New York Times (2018) The quad jump is changing skating. Nathan Chen is leading the way. https://www.nytimes.com/interactive/2018/02/08/sports/olympics/nathan-chen-figureskating.html. Accessed 21 Feb 2019
US Figure Skating (2018) The 2019 Official U.S. Figure Skating Rulebook. http://www.usfsa.org/content/2018-19 Rulebook.pdf. Accessed 21 Feb 2019
Waldeck V, Bohm A, Petricevic S (2018) Case No. 2018–06: Final Decision in the matter of International Skating Union against Ms. CHEN Weiguang and Chinese Skating Association. https://www.isu.org/communications/17361-case-2018-06-isu-vs-chen/file. Accessed 21 Feb 2019
Waldman P (2006) Figure skaters blame boot design for injury plague. https://www.wsj.com/articles/SB114014691681476717. Accessed 21 Feb 2019
Wetzel D (2018) Skating’s final frontier: Is the quintuple jump possible? https://sports.yahoo.com/figure-skatings-final-frontier-quintuple-jump-possible-043451973.html. Accessed 21 Feb 2019
Zitzewitz E (2006) Nationalism in winter sports judging and its lessons for organizational decision making. J Econ Manag Strateg. https://doi.org/10.1111/j.1530-9134.2006.00092.x

The Mathematical Foundations of the Science of Cities

70

Christa Brelsford and Taylor Martin

Contents
Introduction
Ebenezer Howard’s Perspective on Cities
Jane Jacobs’ Perspective on Cities
Graph Theory
Network Science
Space Syntax
The Axial Map
Measures Using the Axial Map
Criticisms of Space Syntax
Road Network Analysis
Named-Street Construction
Intersection Continuity Negotiation
Measures of Road Network Analysis
Social Network Analysis

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy. The US Government retains and the publisher, by accepting the article for publication, acknowledges that the US Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

C. Brelsford
Oak Ridge National Laboratory, Oak Ridge, TN, USA

T. Martin
Sam Houston State University, Huntsville, TX, USA

© This is a U.S. Government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_56


C. Brelsford and T. Martin

Urban Scaling Theory
Conclusions
References

Abstract In this chapter, we describe graph-theoretic representations of infrastructure and social processes in urban environments and trace the development of these fields from the perspective of mathematics, social science, and urban planning. We follow the historical development of two different perspectives on cities and urban planning: one in which infrastructure and urban form are the primary focus and another in which people and social processes are the primary focus. These perspectives can be traced through their application of concurrently developing mathematical techniques in graph theory, network science, and social network analysis. These different perspectives are now becoming integrated into a more mathematically grounded understanding of cities as coupled social and physical systems, in which the social life of a city shapes and defines the infrastructure that is built and a city’s infrastructure and physical form also shape the lives and communities of its residents.

Keywords Cities · Graph theory · Network science · Social network analysis · Infrastructure · Urban planning · Space syntax · Urban scaling theory

Introduction Since the beginning of the twentieth century, two very different conceptions of how cities can be organized have influenced our management of cities and the strategies we use to understand them. Both viewpoints sought to address what were perceived as key problems in urban management: slums, poor-quality housing, overcrowding, crime, poverty, and disease. One viewpoint draws from art and architecture and focuses on urban infrastructure and urban form. The other viewpoint takes a people-first approach and focuses on the way that urban form shapes social processes within cities. These two perspectives on cities, one putting infrastructure first and the other putting people first, can be traced through their application of concurrently developing mathematical techniques in graph theory, network science, and social network analysis. They are now becoming integrated into a more mathematically grounded understanding of cities as coupled social and physical systems, in which the social life of a city shapes and defines the infrastructure that is built, and a city’s infrastructure and physical form in turn shape the lives and communities of its residents.

70 The Mathematical Foundations of the Science of Cities


Ebenezer Howard’s Perspective on Cities Ebenezer Howard is sometimes called the first modern urban planning theorist. His 1902 work on “garden cities” is well known (Howard, 1902) and has been highly influential in urban planning. His proposals were explicitly intended to address contemporary problems of urbanization: slums and congestion. In this book, he described the ills of “town” as lack of nature, crowds, foul air, and murky sky and the ills of “country” as lack of activity and low wages. Howard proposed the “Garden City of Tomorrow,” a plan intended to offer all the benefits of both the town and the country and none of their ills. Howard aimed to provide healthy, safe, and appropriate housing for the working classes from a top-down master plan, with rigid population caps within any given city. The defining characteristics of this plan were a strongly enforced agricultural greenbelt surrounding the town and ample, high-quality, low-density housing. Figure 1, an illustration reproduced from Howard (1902), shows the geometric layout of Howard’s garden cities. This greenbelt perspective significantly influenced town planning in the UK for the next century and thus around the world. Howard has been criticized for producing one of the first descriptions of, and plans for, suburbia. Few formally planned garden cities were ever built, but the model of low-density single-family housing surrounded by yards and gardens and well separated from commercial and industrial activity still underpins a substantial share of urban development plans. Howard and later scholars who also expounded on the virtues of large swaths of green space within dense urban environments relied heavily on classical notions of geometric simplicity, art, and beauty.

Fig. 1 Reproduced from Howard (1902)

Jane Jacobs’ Perspective on Cities In her 1961 book The Death and Life of Great American Cities, Jane Jacobs (1992) proposed a multi-use, people-centered, bottom-up approach to urban structure and management. Jacobs believed that places where people go to interact, congregate, and exist must be central to all urban management strategies. This means that plans must be made at a human scale rather than a monumental scale. Functional neighborhoods must have a wide mix of different uses. Cities, and even more particularly neighborhoods, must have a wide mix of building types and use types, small blocks, and sufficient density, all in service of maintaining a robust community of diverse people (not cars) who use the streets and public places. This facilitates casual social contact and interactions between people who are unlikely to become close friends. Jacobs argued that these happenstance interactions ultimately generate the rich capacity for informal problem-solving and urban management that well-running cities rely heavily upon. Jacobs held an essential faith in the importance of self-organization and self-governance among urban dwellers, and her conceptual understanding of cities arose from the goal of enabling residents to manage themselves. The core point of all of this work was to develop practical tools for planners to use as a strategy for building better cities, free from the accidentally planned ills that Jacobs perceived as generated by formally trained urban planners and theorists, who historically had focused on the physical layout and organization of a city rather than its social processes. Not surprisingly, then, Jacobs did not think highly of master-planned Garden Cities like those described by Howard and Le Corbusier. Jacobs described Howard’s Garden Cities as:


. . . really very nice towns if you were docile and had no plans of your own and did not mind spending your life among others with no plans of their own. As in all Utopias, the right to have plans of any significance belonged only to the planners in charge. (Jacobs, 1992)

In this critique, Jacobs makes the point that the vast majority of people “have plans” and that cities need the ability to incorporate these plans. That incorporation is an inherent part of how cities adapt to the continuously changing social, economic, and physical contexts in which any city exists. This is consistent with the present-day perspective of cities as complex adaptive systems. Jacobs’ work is almost entirely qualitative and descriptive. At the time she was writing, very little mathematical effort had been put into describing the types of social processes Jacobs viewed as fundamental to the existence of cities. The advent of more mathematical network theory later in the century began to allow these qualitative descriptions of social processes to be expressed mathematically. Once those mathematical tools had developed, it became clear that Jacobs’ qualitative descriptions of social processes foreshadowed fundamental discoveries in network science. For example, in the context of a childhood game called “Messages,” Jacobs gives a description of a social process that is a near-perfect characterization of the one described more than 30 years later, in formal mathematical language, by Watts and Strogatz (1998):

The idea was to pick two wildly dissimilar individuals – say a headhunter in the Solomon Islands and a cobbler in Rock Island, Illinois – and assume that one had to get a message to the other by word of mouth; then we would each silently figure out a plausible, or at least possible, chain of persons through whom the message could go. The one who could make the shortest plausible chain of messengers won. The headhunter would speak to the trader who came to buy copra, who would speak to the Australian patrol officer when he came through, who would tell the man who was next slated to go to Melbourne on leave, etc. Down at the other end, the cobbler would hear it from his priest who got it from the governor, etc. We soon had these close-to-home messengers down to a routine for almost anybody we could conjure up, but we would get tangled in long chains in the middle until we began employing Mrs. Roosevelt. Mrs. Roosevelt made it suddenly possible to skip whole chains of intermediate connections. She knew the most unlikely people. The world shrank remarkably. It shrank us right out of our game, which became too cut and dried. (Jacobs, 1992)

Jacobs’ description even includes a qualitatively accurate account of the real-world (and game-based) consequences of differences in network structure between a “regular” and a “small-world” network. From this description of the importance of small-world networks, Jacobs concludes that cities should be tuned to facilitate social ties: both for the Mrs. Roosevelts of the world who know many diverse people and for the cobblers who know the shoe sizes of all of their neighbors but rarely leave their neighborhood. Jacobs’ work didn’t have a deep mathematical foundation, and the mathematics that we now use to describe social networks had not yet been developed. This problem also shows up in many social science fields, in part because the mathematics we use to describe social processes has only recently been developed, and data collection at the scale of cities has historically been prohibitively difficult. In an urban context, a great deal of socially focused work seeks to identify priorities for planning and improving cities. As cities, urban plans, and information about residents and their behavior became digitized, a new set of mathematically based perspectives for describing cities arose. A substantial literature now draws on graph theory and applied network science to describe streets and urban infrastructure from a network perspective, and it can draw on Jacobs’ qualitative insights.
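The effect Jacobs attributes to Mrs. Roosevelt can be illustrated numerically. The following is a minimal sketch, not the construction used by Watts and Strogatz: it starts from a regular ring network and adds a few random long-range ties instead of rewiring existing ones, which is a simplification. The network size, the number of shortcuts, and all function names are assumptions chosen for illustration.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Regular network: each of n nodes is tied to its k nearest neighbours on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def add_shortcuts(adj, count, seed=0):
    """Add `count` random long-range ties (hypothetical 'Mrs. Roosevelt' links)."""
    rng = random.Random(seed)
    nodes = list(adj)
    added = 0
    while added < count:
        v, w = rng.sample(nodes, 2)
        if w not in adj[v]:
            adj[v].add(w)
            adj[w].add(v)
            added += 1
    return adj


def average_path_length(adj):
    """Mean length of the shortest chain of messengers over all pairs (BFS from every node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


regular = average_path_length(ring_lattice(200, 2))
small_world = average_path_length(add_shortcuts(ring_lattice(200, 2), count=20))
print(round(regular, 2), round(small_world, 2))
```

Because the shortcuts only add ties, the second figure is always smaller than the first; how much smaller depends on where the random ties happen to land.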

Graph Theory The mathematical history of the fields of topology and graph theory dates back to 1736, when Euler published his famous paper on the Seven Bridges of Königsberg. Thus, from its origin, graph theory was used to model traffic patterns in urban settings. Below, we define some graph-theoretic tools on which scientists studying urban environments, road network analysis, urban morphology, complexity theory, and contemporary city science rely when building mathematical models and measures of the changing dynamics of city infrastructure and function.

Definition 1. A graph is a set of points called vertices V and a set of line segments called edges E such that every edge e ∈ E is bounded by two distinct vertices v1, v2 ∈ V.

Graphs that can be arranged on a plane such that no two edges cross are called planar graphs. Using a planar graph to represent the transportation network of an urban environment seems natural; roads are just thickened edges, and they intersect each other, giving natural placements of vertices. When studying the science of cities and, in particular, road networks, a graph that is created from a map of a city by representing each road, street, path, or other transportation line by an edge and placing vertices at each intersection of two such edges is referred to as a primary graph.

Definition 2. For a vertex v in a graph G, the degree of v, denoted d(v), is the number of edges that are incident to v in the graph.

Definition 3. A graph is called simple if each pair of vertices v1, v2 of G is connected by at most one edge and no edge has both endpoints incident to the same vertex. In other words, a graph is simple if it does not contain loops or multiple edges between the same two vertices.
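As a small illustration of Definitions 1 to 3, the sketch below stores a graph as a vertex list and an edge list and computes degrees and simplicity. The example graph (one city block with a diagonal avenue) and the function names are hypothetical.

```python
from collections import Counter


def degree(vertices, edges):
    """Return d(v) for every vertex: the number of edges incident to v."""
    d = {v: 0 for v in vertices}
    for a, b in edges:
        d[a] += 1
        d[b] += 1
    return d


def is_simple(edges):
    """True if there are no loops and no repeated edges between the same two vertices."""
    seen = Counter(frozenset(e) for e in edges)
    no_loops = all(a != b for a, b in edges)
    no_multiple_edges = all(count == 1 for count in seen.values())
    return no_loops and no_multiple_edges


# Hypothetical primary graph: one city block with corners A..D
# and a diagonal avenue from A to C.
V = ["A", "B", "C", "D"]
E = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]
print(degree(V, E))   # {'A': 3, 'B': 2, 'C': 3, 'D': 2}
print(is_simple(E))   # True
```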


Definition 4. For two vertices, v and w, in a graph G, we define the distance between v and w, which we denote $\delta(v, w)$, to be the number of edges in the shortest path between v and w.

A face of a planar graph is a polygonal region in the plane that is bounded by edges of the graph. We may also include an exterior face, which is the unbounded region in the plane. Given a primary graph G for a street network, we will employ several ways to create new graphs from G that model flow through the network or connectivity within the network. Broadly, each of these new graphs is referred to as a dual graph, though we should be careful to distinguish the various types of dual construction in city science and road network analysis. In graph theory, a traditional graph dual has a specific construction.

Definition 5. For G a planar graph, the dual of G is a graph G∗ created by assigning one vertex of G∗ to each face of G, including the external face, and placing an edge between two vertices v, w of G∗ for every edge of G that bounds both of the faces corresponding to v and w.

The definition given above is the strongest form of graph dual and frequently includes multiple edges between the same two vertices and loops, which are edges whose endpoints are incident to the same vertex. A graph and its dual are pictured in Fig. 2. A slightly more relaxed definition of a graph dual is given below and will be used in later sections.

Definition 6. For G a planar graph, the weak dual of G is a graph G1 that is created by assigning one vertex of G1 for every interior face of G and connecting two vertices of G1 by an edge if and only if the corresponding faces of G share a common edge.
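Definitions 4 and 6 can be sketched in a few lines. The code below is an illustration under stated assumptions: distances are found by breadth-first search, and the interior faces are supplied by hand as sets of bounding edges rather than computed from a planar embedding. The two-block street layout and all names are hypothetical.

```python
from collections import deque
from itertools import combinations


def distance(adj, v, w):
    """Number of edges on a shortest path from v to w (None if w is unreachable)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if x == w:
            return dist[x]
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None


def weak_dual(faces):
    """Weak dual from a dict {interior face: set of bounding edges}.

    Two faces are joined when they share at least one edge, so the
    result has no loops and no multiple edges.
    """
    dual = {f: set() for f in faces}
    for f, g in combinations(faces, 2):
        if faces[f] & faces[g]:
            dual[f].add(g)
            dual[g].add(f)
    return dual


def e(a, b):
    """Undirected edge between intersections a and b."""
    return frozenset((a, b))


# Hypothetical street grid: two blocks side by side.
#   1 - 2 - 3
#   |   |   |
#   4 - 5 - 6
adj = {1: {2, 4}, 2: {1, 3, 5}, 3: {2, 6}, 4: {1, 5}, 5: {2, 4, 6}, 6: {3, 5}}
faces = {
    "west": {e(1, 2), e(2, 5), e(4, 5), e(1, 4)},
    "east": {e(2, 3), e(3, 6), e(5, 6), e(2, 5)},
}
print(distance(adj, 1, 6))   # 3
print(weak_dual(faces))      # {'west': {'east'}, 'east': {'west'}}
```

The two blocks share the street segment between intersections 2 and 5, so their faces are adjacent in the weak dual.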

Fig. 2 A graph G and its dual G∗


Fig. 3 A graph G and its weak dual G1

Fig. 4 A graph G and its edge dual $\bar{G}$

Notice that the definition of weak dual specifically does not include multiple edges or loops and omits the exterior face from consideration. This results in a weak dual that is simple. A graph G and its weak dual G1 are pictured in Fig. 3. A similar construction is occasionally referred to as a dual in road network literature, but involves replacing edges with vertices instead of replacing faces with vertices. We will refer to this type of construction as an edge dual, though we should note that many names are used for this type of construction in graph theory, including line graph, edge-to-vertex dual, adjoint, conjugate, and others. A graph G and its edge dual $\bar{G}$ are pictured in Fig. 4.

Definition 7. For G a planar graph, the edge dual of G is a graph $\bar{G}$ that is created by assigning one vertex of $\bar{G}$ for every edge of G and connecting two vertices of $\bar{G}$ by an edge if and only if the associated edges of G are adjacent.

Definition 8. An edge contraction on a planar graph G is a mapping that removes an edge e from G by identifying the two vertices incident to e as a single vertex.

Thus, performing a single edge contraction on a graph will reduce the number of vertices by one. Further, we will choose to disallow the creation of multiple edges between the same two vertices from the edge contraction map. If contracting along an edge creates multiple edges, we will keep one edge and remove the duplicates.


Fig. 5 A graph $\bar{G}$ and the result of an edge contraction on $\bar{G}$

In this way, we ensure that an edge contraction on a simple graph produces a simple graph. Figure 5 shows the edge dual from Fig. 4 before and after contracting along an edge (pictured in green) and eliminating a duplicate edge created by the contraction.
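Definitions 7 and 8 translate directly into code. The sketch below builds an edge dual by joining edges that share an endpoint, then contracts one edge of the result while discarding the loop and any duplicate edges, as described above. The four-street example and the function names are hypothetical.

```python
from itertools import combinations


def edge_dual(edges):
    """Edge dual (line graph): one vertex per edge of G, joined when the edges share an endpoint."""
    dual = {e: set() for e in edges}
    for e, f in combinations(edges, 2):
        if set(e) & set(f):
            dual[e].add(f)
            dual[f].add(e)
    return dual


def contract(adj, u, v):
    """Contract the edge {u, v}: merge v into u, dropping the loop and duplicate edges."""
    if v not in adj[u]:
        raise ValueError("u and v must be adjacent")
    merged = {}
    for x, neighbours in adj.items():
        if x == v:
            continue
        # Every reference to v now points at the merged vertex u.
        merged[x] = {u if y == v else y for y in neighbours}
    # u inherits v's neighbours; sets remove duplicates, and u is not its own neighbour.
    merged[u] = (merged[u] | {u if y == v else y for y in adj[v]}) - {u}
    return merged


# Hypothetical primary graph with four street segments.
E = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]
dual = edge_dual(E)
after = contract(dual, ("a", "b"), ("b", "c"))
print(len(dual), len(after))   # 4 3
print(after[("a", "b")])
```

Storing neighbours in sets is what enforces the convention of keeping one edge and removing duplicates.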

Network Science Once the basic mathematical tools to describe graphs and their properties had been developed, fields including city planning, engineering, epidemiology, and the social sciences began applying graphs to study and describe their own sets of problems. This became the new field of network science. In network science, vertices V are typically called nodes, graphs are often called networks, and edges E are occasionally called links or relationships. Many types of physical infrastructure, such as road and rail lines, power distribution systems, water distribution systems, and wired telecommunication infrastructure, are easily and directly represented through networks. In these physically based applications of graph theory, a small set of general questions applicable across many different types of networks began to be asked. How can we categorize and describe the overall properties of a network? Basic characterization strategies include the number of nodes, the number of edges, and the density of a network, which is defined as the number of edges divided by the total possible number of edges. Additional refinements include the average degree of the nodes in G, a characterization of the degree distribution in G, and the average clustering coefficient, which is related to the probability that two of a node’s neighbors are also linked. Another set of fundamental questions includes efforts to characterize which nodes are “most important” or most influential by some measure. This has led to the development of a range of centrality measures on nodes and edges in networks. Finally, we would like a strategy for splitting a large network into subgroups in order to observe the underlying structure of the network. There is now a substantial literature on clustering algorithms in networks. These measures of networks all support a broad series of questions that are essentially related to the transfer of information, ideas, disease, or materials along a network.
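The basic characterizations named above can be computed from an adjacency structure. The following is a minimal sketch for a simple undirected network; the local clustering coefficient used here (the fraction of a node's neighbour pairs that are linked) is one common formalization and is an assumption, as is the small example network.

```python
def density(adj):
    """Edges present divided by the n(n-1)/2 edges possible in a simple undirected network."""
    n = len(adj)
    m = sum(len(neighbours) for neighbours in adj.values()) / 2
    return m / (n * (n - 1) / 2)


def average_degree(adj):
    """Mean number of links per node."""
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)


def clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    neighbours = list(adj[v])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in adj[neighbours[i]]
    )
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(clustering(adj, v) for v in adj) / len(adj)


# Hypothetical network: a triangle a-b-c with one extra node d attached to c.
net = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(round(density(net), 3))             # 0.667
print(average_degree(net))                # 2.0
print(round(average_clustering(net), 3))  # 0.583
```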
Concurrently, there are characterizations of different categories of networks such as Erdős–Rényi random graphs, scale-free networks, and small-world networks. Behavior on these idealized, generic networks can be compared to behavior on some actual network, with the goal of observing whether an idealized network category is a sufficiently good approximation of some real-world networked phenomenon. Some network science research aims to understand the underlying properties of networks themselves, and some aims to understand processes that happen on networks. These methods and ideas have led to important advances in modern epidemiology, and versions of these mathematical techniques have been applied to the transmission of diseases, ideas, behavior, physical material or energy, and many other characteristics. They also set the stage for the analysis of explicitly social networks: estimating the role that network structure plays in shaping socioeconomic outcomes in different contexts.

Space Syntax The study of cities dubbed space syntax emerged in the 1970s and 1980s as a collection of theories for the analysis of built space using mathematical tools. This school of thought is based on the premise that:

[S]ocial structure is inherently spatial and inversely that the configuration of inhabited space has a fundamentally social logic. One important implication of this assertion is that the relationship between society and space is not merely that of mapping one domain onto the other but has a dynamic aspect as well; each modifies and restructures the other. (Bafna, 2003)

The goal, therefore, of space syntax research is to describe human use of configured space – city streets and roads, but also floor plans within an office building, collections of neighboring buildings, or any other continuous physical space that has been partitioned for social use – in a way in which mathematical tools can be used to measure social function. In order to apply any tools of space syntax to a built space, it is common to create a graph that conveys information about how society uses the space. Figure 6, from Bafna (2003), shows an example of a graph that describes accessibility of rooms within an office building. Each individual office and hallway is represented by a vertex of the graph. An edge connects two vertices in the graph if and only if the rooms represented by the vertices are connected by a doorway.

Fig. 6 Creating a graph to represent accessibility of an office space. (From Bafna 2003)


The Axial Map There is a large body of space syntax literature, concentrated in the 1990s–2000s, devoted to urban network analysis using topological tools that rely on a concept called an axial map. The axial map, as we will see, lends itself to a corresponding axial graph that describes mobility within the network. It is on this graph that we can define mathematical measures of social connectivity within a city. However, the construction of an axial map from a given urban access network has been the topic of much debate among scholars in the field; as with many mathematical models of physical systems, there is no perfect mapping. As we will see, this leads to interesting pathological examples from which we can derive mathematical conclusions. Ultimately, this indeterminacy prompted refinements of the space syntax methodologies that have spurred more recent branches of urban morphology study. The construction of an axial map originated as an intuitive iterative process rather than a well-defined mathematical procedure. We describe the process for creating an axial map of an urban street network, using the description from Bafna (2003), but this process can also be applied to any configured space. Beginning with a street-level map of a city or neighborhood and a ruler, look for the “longest straightest line” segment that is completely contained within the road network; we will refer to such a segment as an axial line. For example, one may find a long main road that does not have many bends or turns to be the first axial line included in the axial map. After drawing that line, iterate this process, drawing the second longest straight line that is fully contained within the network. Continue in this way until every street segment is accounted for in the axial map and, necessarily, every street intersection is represented by an intersection of axial lines. An example of a portion of a street-level map with its axial map appears in Fig. 7.
Once an axial map has been created, we can associate a graph to the urban street network. Each axial line in the axial map corresponds to a vertex in the axial graph. An edge exists between two vertices if and only if their corresponding axial lines intersect. The axial graph corresponding to the street network in Fig. 7 is shown in Fig. 8. Mathematically, the axial map serves as a way to represent the two-dimensional configured space as a 1-complex. Certainly, there are myriad ways to collapse a surface with boundary into a 1-complex. The objective of the axial map’s concept of “longest straightest line” is to be able to identify main thoroughfares as important junctions for mobility. Therefore, a road segment in the urban street network that intersects with many smaller side streets will be represented in the axial graph by a vertex of higher degree than that of a smaller axial line with fewer intersections. It is worth noting that the axial map is not a proper graph. Axial lines are edgelike, but they can intersect in the interior of the axial lines, at places that are not vertices, as they are not endpoints of individual edges.
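The step from axial map to axial graph can be sketched as follows, assuming the axial lines have already been drawn and are given as straight segments with known endpoints. The sketch does not attempt the "longest straightest line" construction itself. The coordinates and street names are hypothetical.

```python
from itertools import combinations


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def _on_segment(p, q, r):
    """True if r, already known to be collinear with p and q, lies within segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(s, t):
    """True if the closed segments s and t share at least one point."""
    (p1, q1), (p2, q2) = s, t
    o1, o2 = _orient(p1, q1, p2), _orient(p1, q1, q2)
    o3, o4 = _orient(p2, q2, p1), _orient(p2, q2, q1)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(p1, q1, p2))
            or (o2 == 0 and _on_segment(p1, q1, q2))
            or (o3 == 0 and _on_segment(p2, q2, p1))
            or (o4 == 0 and _on_segment(p2, q2, q1)))


def axial_graph(lines):
    """One vertex per axial line; an edge whenever two axial lines intersect."""
    graph = {name: set() for name in lines}
    for a, b in combinations(lines, 2):
        if segments_intersect(lines[a], lines[b]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical axial map: one thoroughfare, three cross streets, one short lane.
axial_lines = {
    "Main": ((0, 0), (10, 0)),
    "First": ((2, -3), (2, 3)),
    "Second": ((6, -3), (6, 3)),
    "Third": ((8, -3), (8, 1)),
    "Lane": ((6, 2), (9, 2)),
}
graph = axial_graph(axial_lines)
print({name: len(neighbours) for name, neighbours in graph.items()})
# {'Main': 3, 'First': 1, 'Second': 2, 'Third': 1, 'Lane': 1}
```

In this example the thoroughfare receives the highest degree, in line with the intent of the "longest straightest line" rule.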


Fig. 7 An example of a road network with its axial map overlaid

Fig. 8 An example of the axial graph corresponding to the axial map in Fig. 7

Measures Using the Axial Map One key measurement of mobility within a road network derived from the axial map is referred to in space syntax literature as topological distance:

Space syntax takes the position that the key to urban function, at the level of the movement of people through the city and the distribution of people within the spaces of the city, is the way in which each space is accessible from every other space in the city, not in terms of metric distance, but rather in terms of topological distance, or the number of changes of direction needed to move from one space to another. (Read, 1999)

Each vertex of the axial graph is given a relative asymmetry (RA) value, introduced by Hillier and Hanson in 1984, which is calculated as follows:


Definition 9. For a fixed vertex v in a graph G with n vertices, compute the mean depth of v, denoted $\bar{d}(v)$, by averaging the minimal distance across the network between v and all other vertices of the graph:

\[ \bar{d}(v) = \frac{\sum_{w \in G} \delta(v, w)}{n - 1} \]

where $\delta(v, w)$ is the distance between v and w from Definition 4. Then, choose the maximal mean depth across the system,

\[ \bar{d}_{\max} := \max\{\bar{d}(v) \mid v \in G\} \tag{1} \]

Definition 10. For a vertex v in an axial graph G, the relative asymmetry (RA) of v is defined by the ratio $\bar{d}(v) / \bar{d}_{\max}$.

By normalizing the $\bar{d}(v)$ values, we guarantee that the relative asymmetry of a vertex in an axial graph always ranges between 0 and 1. This allows for comparison across different systems. Vertices with lower RA values are said to be shallow, while vertices with higher RA values are called deep. The distribution of RA values across the vertices of an axial graph is a way to distinguish urban areas. However, this distribution doesn’t scale especially well; an axial graph with a large number of vertices doesn’t easily compare to a graph with a small number of vertices. A further normalization can be obtained by dividing RA values for a given axial graph by the RA value of the central vertex of a standardized axial graph with the same number of vertices, referred to as a diamond graph. This measure is called real relative asymmetry (RRA). The reciprocal of a vertex’s RRA value is called its integration value; several space syntax scholars have shown:

a very noticeable correlation between integration values of a node, which represents a particular axial line, and the average number of people found on the space that is associated with the same axial line. This result has been reproduced in different cultural settings, at different scales, and in different types of environment and has often helped generate insights about urban structure. (Bafna, 2003)
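Mean depth and relative asymmetry, as defined in Definitions 9 and 10 of this section, can be computed with one breadth-first search per vertex. The sketch below assumes a connected axial graph and does not implement the diamond-graph normalization behind RRA. The five-vertex axial graph and the function names are hypothetical.

```python
from collections import deque


def mean_depth(adj, v):
    """Average shortest-path distance from v to the other n - 1 vertices (connected graph assumed)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return sum(dist.values()) / (len(adj) - 1)


def relative_asymmetry(adj):
    """RA as defined in this section: mean depth divided by the largest mean depth in the graph."""
    depth = {v: mean_depth(adj, v) for v in adj}
    d_max = max(depth.values())
    return {v: depth[v] / d_max for v in adj}


# Hypothetical axial graph: a thoroughfare, three cross streets, and a short lane.
axial = {
    "Main": {"First", "Second", "Third"},
    "First": {"Main"},
    "Second": {"Main", "Lane"},
    "Third": {"Main"},
    "Lane": {"Second"},
}
ra = relative_asymmetry(axial)
for street in sorted(ra, key=ra.get):
    print(street, round(ra[street], 3))
```

The listing runs from the shallowest vertex to the deepest; in this example the thoroughfare is shallowest and the lane, which scores exactly 1, is deepest.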

Criticisms of Space Syntax Space syntax views mobility in an urban system through a topological lens: path length between two vertices in an axial graph, which represent streets in the road network, is determined by the minimal number of turns required to travel from one street to the other. While the “longest straightest line” approach to constructing an axial map allows main thoroughfares to be represented by vertices of much higher degree than smaller side streets, it also treats any two points on the same urban road as the same place, even though these two points may be located miles apart geographically.


Fig. 9 A slight deformation of a gridded neighborhood yields a different axial graph

However, the main drawback of using axial graphs to model urban function is that the axial graph is not mathematically well-defined. By this, we mean that small deformations of the urban road network can lead to vastly different axial graphs, which can result in large swings in RA, RRA, and integration values for cities that seem very similar from the perspective of someone navigating at the level of streets and roads (Ratti, 2004). Figure 9, from Ratti (2004), shows an axial map overlaid on a fictional neighborhood. In this image, we see that by shifting the city blocks slightly, the roads are no longer straight. The axial map does not compensate for this small perturbation and would thus represent the horizontal street as three separate vertices of the axial graph. Similarly, each vertical street would be represented as two separate streets in the axial graph. Space syntax uses graph-theoretic and topological tools to model and study urban road networks. However, it doesn’t distinguish between different types of configured space; land use has no part in the model of an urban network. In fact, space syntax makes the simplifying assumption that configured space is more or less evenly distributed in terms of function and that the variation between, say, suburban residential homes and busy urban skyscrapers is represented by the integration values of the streets they call home (Ratti, 2004).

Road Network Analysis The work of Batty and colleagues from the early 2000s through today has its basis in space syntax literature, but deviates from it to correct for some of the mathematical ambiguity present in space syntax methodologies. This work is focused on analyzing the urban road networks of cities using mathematical measures and validating these measures by observing the road networks of various cities. The goal of this work is to better understand urban morphology as an evolving complex system. City science, past and present, agrees that cities cannot be modeled and studied using a single methodology or approach. In this section, we explore an area dubbed urban morphology: the study of the spatial structure of human settlements, with a focus on pattern and transformation. Urban morphologists study cities as complex systems which emerge organically, displaying fractal-like geometry, which influence and are influenced by social structure and land use, and which dispel the utopian perspectives of cities as ineffective. In order to study road networks and, more broadly, cities as complex systems, it is key to agree upon a well-defined procedure for representing the two-dimensional road network as a graph. Urban morphology seeks to model the flow and function of the road network using graph-theoretic and topological measures; therefore, a street-level graph is not sufficient. Thus, we look to build upon the edge dual idea that is central to space syntax and seek to change the construction of the axial map into a map that is mathematically well-defined.

Named-Street Construction One such approach, due to Jiang and Claramunt, is dubbed the “named-street approach” (Jiang and Claramunt, 2004). In this construction, we form a primary graph from the road network by placing a vertex at every street intersection and an edge between two vertices if and only if a road segment connects the two intersections. As in the axial graph construction from space syntax, we then create a secondary graph by contracting an edge dual. In the edge dual, each edge of the primary graph is assigned to a vertex, and two vertices $v_1$ and $v_2$ are joined by an edge if and only if there are edges $e_1$ and $e_2$ in the primary graph such that $e_i$ maps to $v_i$ and $e_1$ and $e_2$ share a vertex in the primary graph. In the named-street construction, edges of the primary graph are assigned to the same vertex of the edge dual if they represent streets with the same name, and any loops or multiple edges created in the process are removed. In this way, though a main thoroughfare may be represented by numerous edges in the primary graph, that set of edges is represented by a single vertex in the dual. However, if a street changes names, even though it continues in a linear fashion, it maps to two distinct vertices in the edge dual (which are connected by an edge). This approach has the benefit of producing a well-defined graph construction, in the sense that minor perturbations of the geometry of a road, such as a slight bend or curve, do not disrupt the resulting graph. However:

The main problem with this approach is that it introduces a nominalistic component in a pure spatial context . . . street names are not always meaningful in any sense, they are not always reliable as the same street may be termed in different ways by different social groups, or in different contexts, at different scales, in different ages. Other problems are that street name databases are not easily available for all cases or at all scales, and that the process of embedding and updating street names into GIS seems rather costly for large datasets. (Porta et al., 2006)
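The named-street construction can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of Jiang and Claramunt: the input format (a list of intersection pairs tagged with a street name), the function name `named_street_dual`, and the toy network are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def named_street_dual(segments):
    """Build a contracted edge dual keyed by street name.

    `segments` is a list of (intersection_u, intersection_v, street_name)
    tuples, one per edge of the primary graph. The result maps each street
    name to the set of other street names it meets at some intersection.
    """
    # Collect the street names that touch each intersection.
    names_at = defaultdict(set)
    for u, v, name in segments:
        names_at[u].add(name)
        names_at[v].add(name)

    # Every street becomes one dual vertex, even if it meets no other street.
    dual = {name: set() for _, _, name in segments}

    # Two streets are adjacent in the dual when they share an intersection.
    # Sets discard multiple edges, and pairing distinct names avoids loops.
    for names in names_at.values():
        for a, b in combinations(sorted(names), 2):
            dual[a].add(b)
            dual[b].add(a)
    return dual


# Hypothetical toy network: Main St runs through intersections 1-2-3 and
# continues as High St from 3 to 4; Oak Ave branches off at intersection 2.
segments = [
    (1, 2, "Main St"),
    (2, 3, "Main St"),
    (3, 4, "High St"),
    (2, 5, "Oak Ave"),
]
dual = named_street_dual(segments)
print(dual)
```

In the toy network the two Main St segments collapse to a single dual vertex, while the renamed continuation, High St, becomes a separate vertex joined to Main St, which is exactly the behavior described above.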

C. Brelsford and T. Martin

Intersection Continuity Negotiation A different approach uses purely spatial filters to decide which roads in a primal road network graph should be contracted to the same vertex in a contracted edge dual. In this approach, dubbed intersection continuity negotiation (ICN), each intersection of road segments is assigned a continuity value based on the largest convex angle of the intersection, and streets are grouped together based on these continuity values (Porta et al., 2006). Attempts have also been made to strike a balance between algorithmic and social determinations of road identity. For example, hierarchical intersection continuity negotiation (HICN) is a hybrid approach that first sorts roads into a hierarchy: motorways, class A, class B, and minor roads. ICN is then applied to each category in succession, with the stipulation that segments meeting at an angle of less than 90° may not be identified in the edge dual graph. Lastly, minor roads may not be identified if they straddle a major road (Masucci et al., 2014).
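The angle-based grouping at a single intersection can be illustrated with a short Python sketch. It is a simplified reading of the idea, not the published ICN or HICN procedure: the greedy pairing rule, the function names, the `min_angle` parameter (standing in for the 90° stipulation), and the coordinates are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def angle_between(center, p, q):
    """Convex angle in degrees (0 to 180) at `center` between rays to p and q."""
    a1 = math.atan2(p[1] - center[1], p[0] - center[0])
    a2 = math.atan2(q[1] - center[1], q[0] - center[0])
    diff = abs(a1 - a2) % (2 * math.pi)
    return math.degrees(min(diff, 2 * math.pi - diff))


def pair_segments(center, neighbors, min_angle=0.0):
    """Greedily pair the segments meeting at `center` by largest convex angle.

    `neighbors` maps a segment label to the coordinates of its far endpoint.
    Returns (pairs, unpaired); paired segments are treated as one street.
    """
    candidates = sorted(
        ((angle_between(center, neighbors[a], neighbors[b]), a, b)
         for a, b in combinations(sorted(neighbors), 2)),
        reverse=True,
    )
    used, pairs = set(), []
    for angle, a, b in candidates:
        if angle < min_angle:
            break  # remaining candidates bend too sharply to be identified
        if a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
    unpaired = [s for s in sorted(neighbors) if s not in used]
    return pairs, unpaired


# Hypothetical four-way crossing with slightly irregular geometry.
center = (0.0, 0.0)
neighbors = {
    "west": (-1.0, 0.0),
    "east": (1.0, 0.05),
    "north": (0.0, 1.0),
    "south": (0.1, -1.0),
}
pairs, unpaired = pair_segments(center, neighbors, min_angle=90.0)
print(pairs, unpaired)
```

Because the pairing depends only on angles, a slight bend in a road leaves the grouping unchanged, and no street names are needed.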

Measures of Road Network Analysis Road network analysis begins with a primary graph $G$ of an urban road network. Then, we must choose a convention for creating a contracted edge dual $\bar{G}$ for the primary graph in order to look at measures of the flow of the system. These edge contractions are performed so that the vertices of the final contracted edge dual best represent individual but entire streets and roads in the urban system. It stands to reason that, as with any mathematical model of a physical system, there will not be a perfect representation; some error and pathology must be accepted. Once the final edge dual representation of the urban network has been constructed, we can consider several measurements that are features of the city. A simple measure of flow within a road network is the degree distribution. Each vertex $v_i$ of $\bar{G}$ has a positive integer-valued degree, $d(v_i)$. As vertices of $\bar{G}$ represent streets in the urban system, this degree is the total number of streets that intersect street $v_i$. If $N$ is the number of vertices of $\bar{G}$ and, for each positive integer $k$, $N(k)$ represents the number of vertices of $\bar{G}$ with degree $k$, we can consider the degree distribution of $\bar{G}$, which we define to be:

$$P(k) = \frac{N(k)}{N}. \tag{2}$$

The study of complex systems, including urban morphology, looks to identify large complex systems as being scale free. A network is scale free if its degree distribution follows a power law with exponent $2 < \gamma < 3$:

$$P(k) \sim k^{-\gamma} \tag{3}$$
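As a small worked example, the degree distribution of a contracted edge dual can be computed directly from its adjacency sets. The sketch below is illustrative only: the dictionary-of-sets representation, the function name `degree_distribution`, and the five-street network are assumptions, and a network this small says nothing about whether a power law holds.

```python
from collections import Counter


def degree_distribution(dual):
    """Return {k: P(k)} with P(k) = N(k) / N for an adjacency-set graph."""
    n_vertices = len(dual)
    counts = Counter(len(neighbors) for neighbors in dual.values())
    return {k: counts[k] / n_vertices for k in sorted(counts)}


# Hypothetical contracted edge dual: one long avenue crossed by four
# short streets, two of which also meet each other.
dual = {
    "Avenue": {"1st", "2nd", "3rd", "4th"},
    "1st": {"Avenue", "2nd"},
    "2nd": {"Avenue", "1st"},
    "3rd": {"Avenue"},
    "4th": {"Avenue"},
}
dist = degree_distribution(dual)
print(dist)  # {1: 0.4, 2: 0.4, 4: 0.2}
```

Testing for scale-free behavior would require a much larger network, for instance by plotting P(k) against k on logarithmic axes and estimating the slope.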

70 The Mathematical Foundations of the Science of Cities

Social Network Analysis Most examples of network science applied to a social context represent each person as a vertex and a specific type of relationship between two people as an edge. These edges might represent friendship, kinship, or some other social relationship, co-location, communication, financial dependence, or any of the many ways a person can relate to others. After the basic mathematics and computational tools to explore large hypothetical networks had been developed, researchers began using representations of real or simulated social networks to explore (1) interaction patterns, (2) clustering in networks, (3) epidemics, and (4) idea and behavior generation and transmission. The relationships that Jacobs attributes to Mrs. Roosevelt are direct analogs to the “shortcut” links in Watts and Strogatz’s paper on small-world networks (Watts and Strogatz, 1998). Jacobs described how it only takes a few Mrs. Roosevelts or “hop-skip people” to profoundly change the dynamics of communication through networks and that these information passing processes play a fundamental role in well-functioning cities. The broad concept of information passing on social networks is now a core idea in understanding social processes. Combining Jacobs’ rich, detailed, and intuitive understanding of social processes in cities with new tools from social network analysis allows a field that has historically been nearly purely qualitative to be written down in a quantitative manner. This enables us to develop empirically testable hypotheses and attempt to falsify them. In Jacobs’ framework, we can explore whether cities and neighborhoods with rich sidewalk lives are objectively safer than communities without them. Do very diverse neighborhoods and communities have broader social networks? Can we identify the Mrs. Roosevelts who can bridge disparate communities and allow people to learn from other residents they’re unlikely to have met without her? 
Can cities be designed to facilitate that social role? More significantly, developing empirical strategies for describing social properties in cities allows for the development of a coupled understanding of cities as composed of both physical infrastructure, which has its own set of laws, behaviors, and properties, and also complex social processes. These two different domains – the social and the physical – are obviously deeply interactive. Human social processes underlie the design and construction of all physical infrastructure. Likewise, our lives are shaped and constrained by the infrastructure of the city we live in. Thus, it seems logical that a fundamental understanding of the science of cities requires a coupled understanding of both the social and physical processes that allow cities to exist.

Urban Scaling Theory In 1949 George Zipf described an observation on the distribution of word frequencies in text corpora and the distribution of city sizes within nations (Zipf, 1949). Zipf was a linguist and noticed the relationship first with word frequencies in common text corpora: the most frequently used word in some text (with rank 1) is used approximately twice as often as the word ranked 2 in frequency, three times as often as the third-ranked word, and so on. Zipf postulated that this empirical relationship was the result of balancing the effort of maintaining and understanding a large vocabulary against clear, concise communication of ideas. Zipf’s law, as most typically written, is

$$P_n \sim \frac{1}{n^{a}} \tag{4}$$

where $P_n$ is the frequency of a word of rank $n$ and $a$ is approximately 1. This observation also translated directly to the distribution of city sizes within a nation, where $P_n$ in the equation above becomes the population of the city with rank $n$. Zipf suggested that this was also the outcome of a constrained optimization problem on the part of firms and individuals. This empirical relationship was further developed into some concepts that later became a foundational part of urban scaling theory: there is a (consistent, nonlinear) relationship between a city’s population and the aggregate number of wage earners, gross receipts, etc. (Zipf didn’t postulate these relationships directly in terms of population, but the beginnings of the idea are there.) In the 1990s, allometric scaling theory about metabolic processes was being developed at the Santa Fe Institute (West et al., 1999). The core idea of this theory is that the observed 3/4 scaling relationship between metabolic rate and an animal’s mass is an inevitable consequence of the fact that all animals use the same “terminal units,” capillaries, to distribute nutrients and resources to each cell of the organism. Capillaries are essentially size invariant across enormous variation in animal mass: the capillaries of a shrew are essentially the same as the capillaries of a blue whale. The rest of the circulatory system is tuned to move nutrients in blood from the heart to the capillaries as efficiently as possible while maintaining the capacity to deliver nutrients to each cell in the body. The 3/4 exponent comes from the combination of geometric constraints and biological constraints. Later, Zipf’s observations about city size distributions and the theoretical framing derived from allometric scaling theory were applied to cities and became urban scaling theory (Bettencourt, 2013; Bettencourt et al., 2007a).
These papers propose an empirical relationship between city size and a broad suite of socioeconomic outcomes, including aggregate GDP, innovation markers like patents, and more negative urban outcomes such as crime and disease. Bettencourt and colleagues propose a scaling relationship in the same functional form as the allometric (biological) scaling result:

$$Y = Y_0 N^{\beta} \tag{5}$$

where $N$ is a city’s population, $\beta$ is the exponent, $Y_0$ is a constant, and $Y$ is an aggregate socioeconomic outcome. Bettencourt proposes a theoretical value of $\beta = 7/6$ for most socioeconomic outcomes and $\beta = 5/6$ for most infrastructure-based outcomes. The sub-linear scaling observed for infrastructure outcomes (road
lane miles, sewer system capacity, electrical grid capacity) is a result of the fact that this infrastructure can be more intensively used in larger and therefore denser urban environments. The superlinear scaling observed for socioeconomic outcomes results from the fact that most outputs of a city arise from novel interpersonal interactions, and there is a greater potential for this in larger populations. These outcomes are the result of a complex interplay between social processes and biophysical processes: a core area of research today. This relationship has been explored theoretically (Bettencourt et al., 2007a, 2010; Gomez-Lievano et al., 2016) and empirically observed across a broad range of countries and development status in modern times (Bettencourt and Lobo, 2016; Bettencourt et al., 2007a; Brelsford et al., 2017; Gomez-Lievano et al., 2016), in ancient society (Ortman et al., 2015), and across many different urban characteristics. Superlinear scaling is expected and observed for a broad range of social characteristics including GDP, wealth, innovation, serious crime, and infectious disease (Bettencourt et al., 2007a,b; O’Clery et al., 2016; Patterson-Lomba et al., 2015; Youn et al., 2016). Linear scaling relationships are expected and observed for characteristics relating to individual human needs, such as firms, employment, housing, and household water and electricity consumption (Bettencourt et al., 2007a). Sub-linear scaling relationships are expected and observed for many physical characteristics of the city: density, urban area, road lane miles, and other infrastructure characteristics (Bettencourt et al., 2007a; Samaniego and Moses, 2008). While urban scaling theory is well supported in the literature, there are disagreements, which are chiefly centered around the areal extent that should be used to describe a city (Arcaute et al., 2015). 
One paper has considered the time stability of deviations from urban scaling results at the city level or across urban characteristics (Bettencourt et al., 2010). Deviations in patenting rates by city are quite temporally consistent (Bettencourt et al., 2007b), but more careful study of these deviations might suggest policy tools to, for example, decouple deviations in crime and disease from GDP deviations.
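To make the scaling relationship concrete, the exponent β can be estimated by ordinary least squares after taking logarithms, since log Y = log Y0 + β log N. The following Python sketch assumes this simple estimator and uses synthetic data generated exactly from the power law; it is not the estimation procedure of the cited papers, and the function name `fit_scaling` is hypothetical.

```python
import math


def fit_scaling(populations, outcomes):
    """Ordinary least squares fit of log Y = log Y0 + beta * log N."""
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in outcomes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx
    y0 = math.exp(y_mean - beta * x_mean)
    return y0, beta


# Synthetic data, generated exactly from Y = 2 * N**(7/6) for illustration.
populations = [1e4, 1e5, 1e6, 1e7]
outcomes = [2.0 * n ** (7 / 6) for n in populations]
y0, beta = fit_scaling(populations, outcomes)
print(round(beta, 4), round(y0, 4))
```

One way to read the exponent: doubling a city’s population multiplies the outcome by 2^β, which is roughly 2.24 for β = 7/6 (more than double) and roughly 1.78 for β = 5/6 (less than double).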

Conclusions In past work on urban planning, some scholars have focused primarily on the geometric, spatial, or infrastructure-based properties of cities, and some have focused primarily on social properties of urban environments. The mathematical language to describe the geometric properties of cities was developed many centuries ago, while the graph theoretical language to describe connectivity via roads, streets, and infrastructure is comparatively recent. The mathematical foundation for describing social processes has developed much more recently, and it’s only with the recent advent of digital communication that scientists have had an empirically grounded approach based on large-scale measurement of social interactions and social processes. This makes it possible, for the first time, to take a systematic and empirical perspective on the kinds of ideas Jane Jacobs was writing about and then move forward with a science of cities which combines mathematical perspectives on both a city’s physical infrastructure and its social processes. These processes have historically been studied in isolation even though they are deeply intertwined. The social life of a city cannot exist without the infrastructure to facilitate it, and likewise infrastructure with no people is merely a shell. Current research in urban systems and complexity science is seeking new strategies for incorporating both social processes and physical processes in our understanding of cities. This coupled understanding will support better theory, prediction, and modeling. For example, the urban scaling work of Bettencourt, West, and colleagues does treat infrastructure-based outcomes and socioeconomic outcomes as related processes within the same theoretical framework. Their work primarily considers the aggregate properties of cities. The next branch of the research perspective is to look at fine spatial scale analysis, another development made possible in large part by the increasing availability of fine-scaled data. Brelsford, Martin, Hand, and Bettencourt recently published a paper observing within-city topological characteristics (Brelsford et al., 2018). The mathematical perspective of this paper is purely infrastructure based. In this chapter, we consider the connectivity of streets and pathways rather than the geometry – which allows a purely topological definition of urban streets. For most places within an urban environment, connection to the urban infrastructure network is topologically simple: there is (at least) one direct access pathway. For these places, the topological properties of the places are then completely defined by the access system. However, in some places, predominantly in slums and informal settlements, many homes, offices, and buildings do not have direct road access. For these blocks and neighborhoods, the topological properties of the road network do not fully describe the area, and a more complex perspective is needed.
As cities proliferate and grow, the question of ideal city form has become a central question affecting urban sustainable development and economic change for billions of people worldwide. There have been many studies analyzing urban geometries and providing taxonomies of spatial plans, but the central issue of how to best plan a city has remained unresolved. This has huge consequences for people who live in places like Dharavi, a slum in the middle of Mumbai, India, that is home to uncounted hundreds of thousands, and Khayelitsha, a township in Cape Town, South Africa. Brelsford, Martin, Hand, and Bettencourt show that the answer to this question is not geometric but topological. They show that cities can be decomposed into two types of networked spaces (the access system and places) and prove that these spaces display universal topological characteristics common to all cities. Networks of urban infrastructure develop a universal topology because they surround and connect places of work and residence, solving the central problem of cities: the circulation of people, goods, and information. Any city block, anywhere in the world, can be characterized mathematically by a spatial graph and classified topologically based on a hierarchy of weak graph duals: when the second weak dual is a tree, a block is isomorphic to the surrounding infrastructure; otherwise there are internal places that lack public access. The block complexity is defined to be the number of weak duals that exist for any given block.

Fig. 10 Reproduced from Brelsford et al. (2018)

The block complexity for individual blocks, as in Fig. 10, and also for all the blocks within a neighborhood is calculated. Figure 11 shows a neighborhood in Harare, Zimbabwe, in which a substantial portion of blocks contain parcels without direct road access, and also demonstrates the substantial heterogeneity in space available and infrastructure access in even very poor communities. When places don’t have access, as in the block from Harare, Zimbabwe, in Fig. 10d, this topological language can be used to express the central difficulty of developing cities and neighborhoods (access to infrastructure and the circulation of goods, people, and ideas) as a rigorous mathematical problem that can be solved optimally through the introduction of infrastructure networks into urban blocks. Brelsford, Martin, Hand, and Bettencourt have developed a computational algorithm that can be used to propose solutions to this access problem with minimal disruption and cost for both simple and complex blocks. Within any given block, they identify the most deeply nested parcel within that block. They then identify the set of all possible paths from any node on that parcel to any node on the surrounding infrastructure and preferentially select short paths from the set of paths based on a polynomial function of the logarithm of path length. More formally, they define a probability distribution $p(l)$ over the candidate paths, where $l$ is the length of a path and the sum runs over the lengths $l_i$ of all candidate paths:

$$p(l) = \frac{1/l^{n}}{\sum_i 1/l_i^{n}}$$

The selection of $n$ can be tuned to penalize long paths more or less strictly. Once a path has been selected, a road is proposed along that pathway, and then the entire block is reanalyzed based on the new infrastructure topology. This process is repeated until all parcels have direct access to the infrastructure system. This is a simple application of mathematical techniques to facilitate work that people who live in slums and informal settlements are already doing to improve their lives and communities. It is explicitly not intended to provide a single correct answer but to facilitate discussion and to allow residents to have robust discussions about neighborhood structure and form. Many purely graph theoretic models of urban infrastructure don’t distinguish between different types of places, and therefore the ability to incorporate social functions and perspectives was accidentally lost. The algorithms that were developed in Brelsford et al. (2018) currently focus only on physical infrastructure. However, it is algorithmically trivial to incorporate additional weights based on community input – for example, more heavily weighting path distance to community facilities or other important infrastructure such as toilets or water pumps. The probabilistic approach used was developed specifically to elicit community input and therefore should interface smoothly with community-directed processes of social organization. In the future, it’s plausible that one could use digital trace data such as movement tracks, online communication patterns, or cell phone call and text records to describe networked social processes within urban spaces. These spatially embedded social processes could then be described with the same rigor and detail as the physical infrastructure.

Fig. 11 The topological complexity of blocks within the Epworth Neighborhood in Harare, Zimbabwe. All parcels in blocks with complexities 1 and 2 have direct road access. For blocks with complexities 3, 4, and 5, at least one parcel does not have direct road access
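The probabilistic path-selection step can be sketched as follows. This is a minimal illustration, assuming that each candidate path is weighted by 1/l^n and that the weights are normalized to sum to one; the function names, the path labels, and the lengths are hypothetical and are not taken from the code of Brelsford et al. (2018).

```python
import random


def path_probabilities(lengths, n=2.0):
    """Return p(l_i) = (1 / l_i**n) / sum_j (1 / l_j**n) for each path length."""
    weights = [1.0 / length ** n for length in lengths]
    total = sum(weights)
    return [w / total for w in weights]


def choose_path(paths, lengths, n=2.0, rng=random):
    """Draw one candidate path at random, favouring short ones."""
    probs = path_probabilities(lengths, n)
    return rng.choices(paths, weights=probs, k=1)[0]


# Hypothetical candidate paths from a landlocked parcel to the road network.
paths = ["via north alley", "via east yard", "via south lane"]
lengths = [10.0, 20.0, 40.0]
probs = path_probabilities(lengths, n=2.0)
print(probs)
print(choose_path(paths, lengths, n=2.0, rng=random.Random(0)))
```

With n = 2 the shortest path receives 16/21 of the probability mass; setting n = 0 makes all paths equally likely, and larger values of n concentrate the choice ever more strongly on the shortest path. Because the selection is random rather than deterministic, repeated runs produce alternative proposals that residents can compare.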

References
Arcaute E, Hatna E, Ferguson P, Youn H, Johansson A, Batty M (2015) Constructing cities, deconstructing scaling laws. J R Soc Interface 12(102):20140745
Bafna S (2003) Space syntax: a brief introduction to its logic and analytical techniques. Environ Behav 35(1):17–29
Bettencourt LMA (2013) The origins of scaling in cities. Science 340(6139):1438–1441
Bettencourt LMA, Lobo J (2016) Urban scaling in Europe. J R Soc Interface 13(116):20160005
Bettencourt LMA, Lobo J, Helbing D, Kuhnert C, West GB (2007a) Growth, innovation, scaling, and the pace of life in cities. Proc Nat Acad Sci 104(17):7301–7306
Bettencourt LMA, Lobo J, Strumsky D (2007b) Invention in the city: increasing returns to patenting as a scaling function of metropolitan size. Res Policy 36(1):107–120
Bettencourt LMA, Lobo J, Strumsky D, West GB (2010) Urban scaling and its deviations: revealing the structure of wealth, innovation and crime across cities. PLoS ONE 5(11):e13541
Brelsford C, Lobo J, Hand J, Bettencourt LMA (2017) Heterogeneity and scale of sustainable development in cities. Proc Nat Acad Sci 114(34):8963–8968
Brelsford C, Martin T, Hand J, Bettencourt LMA (2018) Toward cities without slums: topology and the spatial evolution of neighborhoods. Sci Adv 4(8):eaar4644
Gomez-Lievano A, Patterson-Lomba O, Hausmann R (2016) Explaining the prevalence, scaling and variance of urban phenomena. Nat Hum Behav 1(1):s41562–016–0012–016
Howard E (1902) Garden cities of tomorrow. Swan Sonnenschein & Co., Ltd., London
Jacobs J (1992) The death and life of great American cities. Vintage Books, New York
Jiang B, Claramunt C (2004) Topological analysis of urban street networks. Environ Plan B Plan Des 31(1):151–162
Masucci AP, Stanilov K, Batty M (2014) Exploring the evolution of London’s street network in the information space: a dual approach. Phys Rev E 89(1):012805
O’Clery N, Gomez-Lievano A, Lora E (2016) The path to labor formality: urban agglomeration and the emergence of complex industries. CID Working Paper 78, Center for International Development at Harvard University, Oct 2016
Ortman SG, Cabaniss AHF, Sturm JO, Bettencourt LM (2015) Settlement scaling and increasing returns in an ancient society. Sci Adv 1(1):e1400066
Patterson-Lomba O, Goldstein E, Gómez-Liévano A, Castillo-Chavez C, Towers S (2015) Per capita incidence of sexually transmitted infections increases systematically with urban population size: a cross-sectional study. Sex Transm Infect 91(8):610–614
Porta S, Crucitti P, Latora V (2006) The network analysis of urban streets: a dual approach. Phys A Stat Mech Appl 369(2):853–866
Ratti C (2004) Space syntax: some inconsistencies. Environ Plan B Plan Des 31(4):487–499
Read S (1999) Space syntax and the Dutch city. Environ Plan B Plan Des 26(2):251–264
Samaniego H, Moses ME (2008) Cities as organisms: allometric scaling of urban road networks. J Transp Land Use 1(1):21–39
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440–442
West GB, Brown JH, Enquist BJ (1999) A general model for the structure and allometry of plant vascular systems. Nature 400(6745):664–667
Youn H, Bettencourt LMA, Lobo J, Strumsky D, Samaniego H, West GB (2016) Scaling and universality in urban economic diversification. J R Soc Interface 13(114):20150937
Zipf GK (1949) Human behavior and the principle of least effort: an introduction to human ecology. Martino Publishing, Mansfield Centre, Conn.

Gilles Deleuze’s The Fold: Calculus and Curvilinear Design

71

Menno Hubregtse

Contents
Introduction
Deleuze’s The Fold
The Fold and Architecture
Greg Lynn on Folded Architecture, Blobs, and Animate Form
Summary
Cross-References
References

Abstract This chapter addresses how Gilles Deleuze’s The Fold: Leibniz and the Baroque has affected architectural design and theory. It summarizes key concepts in The Fold, and it discusses how architects and architectural historians have drawn upon this philosophical text to analyze architectural design processes and the built environment. In The Fold, Deleuze argues that Gottfried Wilhelm Leibniz’s philosophy exemplifies the Baroque, the predominant style of art, architecture, and music created in Europe during the seventeenth and early eighteenth centuries. Leibniz’s conception of matter as forces correlates with the curvilinearity and dynamism that characterize this style. This overview concentrates on how calculus, a mathematical method that Leibniz invented in the late seventeenth century, inspired his notions of matter, the soul, and perception. Moreover, it addresses how Deleuze interprets these concepts in The Fold, and it explains how he uses architectural examples to clarify his arguments.

M. Hubregtse, Art History and Visual Studies, University of Victoria, Victoria, BC, Canada. © Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_104

Shortly after The Fold was published in French in 1988, architects such as Peter Eisenman began to create designs inspired by Leibniz’s thoughts on ontology and matter. During the 1990s, Greg Lynn employed Deleuze’s reading of Leibnizian philosophy to assess contemporary architecture and design processes. This chapter discusses how his publications on “folding” architecture draw an analogy between Leibniz’s calculus-inspired notions of matter and the new digital tools that architects were using to create and manipulate forms. It also addresses critiques of “folding” architecture and digital design practices.

Keywords Deleuze · Fold · Leibniz · Calculus · Architecture · Greg Lynn · CAD · Digital design · Animate form

Introduction Throughout history, architects have looked to philosophical texts for inspiration for their designs. During the 1980s, architects such as Peter Eisenman and Bernard Tschumi created buildings and plans based on Jacques Derrida’s writing on deconstruction. Within a few years, Gilles Deleuze eclipsed Derrida in terms of popularity among architects and architectural theorists. Greg Lynn (1993a, 1996, 1999), for instance, promoted Deleuze’s concepts in The Fold: Leibniz and the Baroque (1993; first published in 1988 in French as Le Pli: Leibniz et le Baroque) since they aligned with contemporary changes occurring within architectural design. During the early 1990s, architects began to use digital tools to design complex structures, and Gottfried Wilhelm Leibniz’s notion of matter as forces is an apt metaphor for the newfound flexibility enabled by computer software. What’s more, Leibniz’s philosophy derives from his invention of calculus, and computer-aided design (CAD) programs use this form of mathematics to generate and manipulate forms. This chapter begins with an overview of the main topics in The Fold, particularly those that pertain to calculus. Deleuze explains how Leibniz’s conception of matter, the soul, and perception relate to differential equations, and he argues that Leibniz’s philosophy typifies the Baroque – a style that appears in the late sixteenth century and that characterizes European art, architecture, and music during the seventeenth and early eighteenth centuries. Deleuze refers to a range of actual and fictional building designs to illustrate Leibniz’s ideas. After a brief overview of the principal examples of architecture and design employed in The Fold, this chapter considers other analyses of the built environment that draw upon Deleuze’s interpretation of Leibniz’s philosophy. 
The latter part of this chapter concentrates on Lynn’s publications on contemporary “folding” architecture which popularized ideas in The Fold among architects and architectural theorists. It examines why Lynn draws upon Leibniz’s conception of matter to champion a new approach to architecture that responds to external influences and that differs from Deconstructivism. Lynn also refers to Leibniz’s philosophy and calculus in order to support his analyses of CAD design practices and the forms they generate. This chapter concludes with an overview of the critical assessments of Lynn’s contributions to architectural theory.

Deleuze’s The Fold For Deleuze, Leibniz’s philosophical work, which appears in books, published essays, and letters, exemplifies the Baroque’s curvilinear aesthetic. Paintings, sculptures, and buildings in this style incorporate numerous dramatic curves, folds, and swirls to convey a sense of dynamism and to affect the viewers’ emotions. Deleuze, however, does not suggest that there is a causal connection between Leibniz’s writing and the Baroque. Rather, he demonstrates how central aspects of Leibniz’s thoughts on matter and ontology resemble this style’s qualities. Indeed, Leibniz publishes his philosophical tracts during the late seventeenth and early eighteenth centuries, decades after Gian Lorenzo Bernini completed two of the most quintessential Baroque works, the Baldacchino (1634) in St. Peter’s Basilica and the Ecstasy of Saint Teresa (1652) in the Cornaro Chapel. Deleuze does not simply examine Leibniz’s philosophy with regard to Baroque cultural products. Rather, he considers how Leibniz’s ideas are analogous to those of modern philosophers and poets such as Alfred Whitehead, Henri Bergson, and Stéphane Mallarmé. Why does Deleuze equate the sculpted curves and folds of Baroque art and architecture with Leibniz’s philosophy? Leibniz’s ideas apropos matter, perception, and the soul mirror his mathematical conception of curvature, specifically with regard to calculus. Many of his philosophical works, such as Theodicy (1710) and Monadology (1714), were written after his discovery of calculus (he published his research on differential equations in 1684). Leibniz’s notion of a tangent as an infinitesimal, a value expressed via his differential equation, resembles his conception of matter. Deleuze notes that for Leibniz a physical entity has “cohering parts that form a fold, such that they are not separated into parts of parts but are rather divided to infinity in smaller and smaller folds that always retain a certain cohesion” (1993, 6). 
Moreover, a body does not consist of divisible units such as grains of sand but resembles a piece of fabric with an infinite number of folds. What’s more, the size, shape, and motion of these folded forms of matter are defined by “the pressure of surrounding forces,” much like a tangent is defined by its curve (1993, 6). Leibniz proposes a monist ontology that contrasts with René Descartes’s dualistic philosophy, where the mind and body are separate entities. With regard to the soul, Leibniz posits that it is within the body and occupies a “point of view.” Deleuze describes this point in mathematical terms: “Moving from a branching of inflection, we distinguish a point that is no longer what runs along inflection, nor is it the point of inflection itself; it is the one in which the lines perpendicular to tangents meet in a state of variation. It is not exactly a point but a place, a position, a site, a ‘linear focus,’ a line emanating from lines. To the degree it represents variation or inflection, it can be called point of view” (Deleuze 1993, 19; italics in original). Deleuze explains that there are an infinite number of souls, each with their own unique point of view. Furthermore, each point of view encompasses the entire world within it. Leibniz defines the soul as a monad, and he argues that the relations between monads occur according to a pre-established harmony (see Grene and Ravetz [1962] for a mathematical interpretation of Leibniz’s proposed pre-determined system). Each monad contains the world within itself, and this constitutes an infinite number of perceptions. Deleuze writes: “Since it does not exist outside of the monads that convey it, the world is included in each one in the form of perceptions or ‘representatives,’ present and infinitely minute elements” (1993, 86; italics in original). But, not all of these perceptions are consciously registered – only those which are remarkable. He discusses how a differential equation allows a monad to consciously perceive what is notable. For instance, the color green is the product of a differential relation among blue and yellow, and Deleuze illustrates this relationship with the following notation: $\frac{db}{dy}$ (1993, 88). Hunger becomes noticeable when it is the product of a differential relation among parts such as “a lack of sugar, butter, etc.” (1993, 88). In short, the infinite number of minute ordinary perceptions, which constitute the world, are obscure and located within the perceiving monad, and differential relations among some of these inconspicuous elements bring forth a clear and consciously registered perception such as hunger.

The Fold and Architecture

Deleuze employs a number of architectural examples to demonstrate his theoretical concepts in The Fold. He refers to actual buildings as well as conceptual structures to explain his thoughts. The most notable example is the “Baroque House (an allegory),” which appears in the first chapter (1993, 5). This architectural metaphor illustrates how the body and soul are inseparable. They occupy two floors of a house that fold over each other. The body is located on the lower floor, and the soul is on the upper floor. The upper room has no windows, whereas the lower room has small openings which allow the outside world to enter as sensations. Hélène Frichot (2005) considers the Baroque House in terms of broader themes in Deleuze’s oeuvre as well as other architectural metaphors employed in his texts. Ntovros Vasileios (2009) draws upon a visual analysis of Guarino Guarini’s design for the San Lorenzo in Turin (1668–1687) to discuss the layout of Deleuze’s Baroque House and how it pertains to his arguments in The Fold. He also describes how inflection points and folds appear in this building’s plan and interior. When looking upward in the San Lorenzo, Vasileios notes that the dome’s ribs lead the eye to their springing points, which are located above the pendentives and arches that support the dome. He posits that this space of transition between the church’s dome and lower structure is analogous to the fold that exists between the two floors in Deleuze’s Baroque House. Deleuze also draws upon contemporary design technologies to elucidate his concepts regarding folds, curves, and Leibniz’s calculus. He invents the word

“objectile” to refer to Bernard Cache’s investigations into industrially produced variable non-standard forms (1993, 19; italics in original). Cache, an architect, was working with Patrick Beaucé and developers at Missler Software to research how designers could use digital tools to modulate an object’s form (Cache and Girard 2013). Deleuze explains that this objectile is “manneristic, not essentializing: it becomes an event” (1993, 19). He also refers to Cache’s insights on inflection in Terre meuble, a manuscript that investigates furniture design with regard to “geographical” folds. It was published in English as Earth Moves (1995). Cache studied with Deleuze at the University of Paris VIII at Vincennes-St. Denis, and he named his architectural and digital design firm “Objectile.” Deleuze’s The Fold has had a substantial influence on architectural design since its publication. Lynn (1993b, 1996, 1998, 1999) has helped circulate Deleuze’s ideas on Leibniz’s philosophy among architects and architectural theorists via his numerous publications on folded architecture, architectural curvilinearity, blobs, and animate form. Some critics, however, have chastised the resultant folding architecture as simple applications of folded forms that are not analogous with Leibniz’s philosophy or Deleuze’s ideas in The Fold (Jobst 2013; Vidler 2000). Nonetheless, Lynn’s contributions are significant in terms of this overview since he concentrates on themes pertaining to calculus and how this mathematical method is used in contemporary architectural design. His publications and designs are discussed in the following section. A number of architectural theorists and historians employ concepts in The Fold in their analyses of historic and contemporary architecture and design. Paul Harris (2005), for example, draws upon Deleuze’s notion of folds to consider Simon Rodia’s Watts Towers, built in Los Angeles between 1921 and 1955. 
He discusses how Rodia’s method of construction is a bottom-up process that differs from Lynn’s and Cache’s conceptions of folded architecture and design. Martin Prominski and Spyridon Koutroufinis (2009) discuss how Deleuze’s writing on the Baroque is applicable to contemporary landscape design, and they argue that designers should not simply create folded forms inspired by the French philosopher’s ideas. They illustrate how Carlos Ferrater and Bet Figueras’s Jardí Botànic de Barcelona, Schweingruber Zulauf’s Administration of Canton Zug, and the Versailles École Nationale Supérieure du Paysage’s plans for a flooded area in Redon recall key concepts in The Fold. Simon O’Sullivan (2006) relies on Deleuze’s description of the Baroque and his thoughts on the arts to analyze Ivan Chtcheglov’s writing on utopian cities and spaces of experimentation. Robert Porter (2009) applies ideas from The Fold to consider Belfast’s built environment. Don Handelman (2010) considers how Jerusalem’s security barriers, city walls, Yad Vashem museum, and Santiago Calatrava’s Chords Bridge are folded into the city’s topology and why this matters in terms of Israeli politics and urban space. Tom Lundborg (2012) examines the plans for New York’s “Ground Zero” – the site of the World Trade Center towers destroyed on September 11, 2001 – as modes of folding in response to a traumatic event. Lundborg, however, draws upon Deleuze’s notion of folding processes based on Michel Foucault’s philosophical tracts rather than his conception of Leibnizian folds and calculus. Menno Hubregtse (2017) explains how Eero Saarinen’s Trans

World Airlines Terminal at John F. Kennedy International Airport (New York, 1962) can be read in terms of Leibniz’s and Descartes’s conceptions of mathematical curves and matter. While the building’s curvilinear aesthetic evokes Leibniz’s calculus and a dynamic sense of matter, its non-adaptable physical structure calls to mind Descartes’s conception of geometric curves and matter as extensions in space.

Greg Lynn on Folded Architecture, Blobs, and Animate Form

Lynn introduced Deleuze’s ideas in The Fold to a broad architectural audience in his guest-edited March–April 1993 issue of Architectural Design (1993a), which is titled Folding in Architecture. It includes a number of essays and short articles that examine “a new pliant, flowing architecture” that started to emerge in the early 1990s (Powell 1993). Kenneth Powell’s opening essay, “Unfolding folding,” explains that this new “approach” departs from Postmodern and Deconstructivist architecture, which largely focused on contradiction. Indeed, Robert Venturi’s Complexity and Contradiction in Architecture (1966) prompted architects to move away from the Modern Movement’s functionally efficient and, typically, homogeneous forms. Deconstructivist architecture, which is defined by fragmented shapes and shards, was showcased in a 1988 exhibition at the Museum of Modern Art in New York. The show’s curators, Philip Johnson and Mark Wigley (1988), alleged that this type of architecture was subversive and shared an affinity with Russian Constructivism from the 1920s. What’s more, Deconstructivist architecture is also related to Derrida’s philosophical conception of deconstruction. Two of the architects featured in the show, Tschumi and Eisenman, collaborated with Derrida to consider how his theory applied to architectural design (Benjamin 1988; Derrida 1989; Eisenman 1988; Kipnis and Leeser 1997; Tschumi 1986, 1988). The latter architect’s work appears in Lynn’s (1993a) guest-edited issue of Architectural Design which focuses on folding architecture. Eisenman (1991, 1993a, b) describes how his design for Frankfurt’s Rebstockpark incorporates folded forms inspired by Deleuze’s writing in The Fold (Fig. 1). He states that “the idea of folding was used on the site to initiate new social organisations of urban space and to reframe existing organisations” (1993b, 27).
Powell (1993) explains that the folding architecture featured in the (1993a) issue of Architectural Design was created by a number of architects whose previous designs were Deconstructivist but who have abandoned its confrontational approach. In Lynn’s essay for the issue, he claims that the “contradictory logic [of Deconstructivism] is beginning to soften in order to exploit more fully the particularities of urban and cultural contexts” (1993b, 9). Lynn’s (1993b) article also equates Deleuze’s ideas in The Fold and René Thom’s catastrophe theory with a new type of architectural curvilinearity that responds to external influences. He contends that the architects featured in the (1993a) issue have folded their buildings in relation to immediate political, structural, economic, and contextual concerns. Lynn suggests that this is an active process and that these folded forms are analogous to viscous fluids that morph according

Fig. 1 Eisenman Architects, Rebstockpark Masterplan, Frankfurt am Main, Germany, 1990– 1992. (Image credit: Courtesy Eisenman Architects)

to their surroundings. To be sure, Lynn’s conception of architectural curvilinearity recalls Leibniz’s conception of matter as a material defined by adjacent pressures and forces. Lynn refers to Eisenman’s Rebstockpark to support his argument. Eisenman’s plan morphs a rectilinear building type, the siedlung housing block, in conjunction with the landscape’s contours (Fig. 1). Similarly, Frank Gehry’s Guggenheim Museum in Bilbao, Spain, employs curvilinear shapes that respond to the city, the river, and the neighboring bridge and roadways. Lynn stresses that his notion of architectural curvilinearity is not merely a stylistic application of folded forms but an approach that integrates external factors: “Rather than speak of the forms of folding autonomously, it is important to maintain a logic rather than a style of curvilinearity. The formal affinities of these projects result from their pliancy and ability to deform in response to particular contingencies” (Lynn 1993b, 14–15). John Rajchman’s (1991) essay on the Rebstockpark competition documents discusses how Eisenman’s design incorporates and exemplifies ideas in Deleuze’s Le Pli, and he notes that “electronic modeling” allows architects to create increasingly complex designs (62). Lynn’s (1993b) article on architectural curvilinearity explains that architects were using computer programs to help create the forms that evoke Deleuze’s notion of folded matter. In a retrospective essay on his (1993a) guest-edited issue of Architectural Design, Lynn (2004a) notes that he highlighted experimental digital designs from the early 1990s which foreshadowed the architectural forms created solely with CAD programs. Some of the architectural designs featured in the (1993a) issue, however, were not formed exclusively with CAD technologies. 
Gehry, for example, drew a number of sketches for the Guggenheim Museum in Bilbao in 1991, and he used CATIA software to create a digital model and estimate construction costs (Ragheb 2001; Rappolt and Violette

Fig. 2 Frank Gehry, Nationale-Nederlanden Building, Prague, 1992–1996. (Photo: Menno Hubregtse, 2013)

2004). The Guggenheim Museum, which was completed in 1997, was the first major commission where Gehry employed this design process. He has used CATIA to realize the complex forms of his sketches and sculpted models for a number of subsequent projects. The Nationale-Nederlanden Building in Prague, which Gehry began to design in 1992, has a form that, like the Guggenheim Museum, responds to its riverfront site (Figs. 2 and 3). Lynn’s retrospective account (2004a) also explains that calculus was the primary concern in his analysis of architectural curvilinearity. He wrote this essay for a revised edition of Folding in Architecture (Lynn 2004b). In its preface, Helen Castle (2004) argues that digital architectural design, and its use of calculus, does not merely indicate a technological innovation; it “represent[s] the same sort of full-scale perceptual and tectonic shift” that occurred when Filippo Brunelleschi began to create perspectival drawings of buildings such as Florence’s Baptistery during the early fifteenth century (7). The revised edition also contains Mario Carpo’s (2004) insights on how folding architecture had developed since the original issue was published in 1993. By 2004, architects were typically using CAD software to design buildings. Carpo explains that complex curvilinear forms are no longer prohibitively expensive to build since CAD programs have reduced the cost of manufacturing non-standard parts. He alleges that the mass production of standardized building components “began with the mechanical phase of the Industrial Revolution – and

Fig. 3 Frank Gehry, Nationale-Nederlanden Building, Prague, 1992–1996. (Photo: Menno Hubregtse, 2013)

ended with it” and that “non-standard production has opened for business and is here to stay” (2004, 18). Carpo also notes that the language regarding curvilinear architecture had changed: “Folds became blobs” (2004, 17). Here, he is referring to a new term that Lynn began to use in 1996 to designate rounded architectural forms. In “Blobs, or why tectonics is square and topology is groovy,” Lynn (1996) considers the shape of contemporary curved designs – such as Shoei Yoh’s roof structures – in terms of blobs in Hollywood films as well as philosophical assessments of fluidity and form. As in his earlier essay on architectural curvilinearity (1993b), Lynn posits that blobs meld with their environment while still retaining their own unique characteristics. He also draws an analogy between Leibniz’s monads and contemporary animation technologies used to model blob-like forms. In an interview with Mark Rappolt in 2003, Lynn stated that the term “blob” derived from a module in Wavefront software: “it was an acronym for Binary Large Object – spheres that could be collected to form larger composite forms” (Lynn cited in Rappolt 2003). In Animate Form, Lynn (1999) investigates the dynamic aspect of designing architecture with computer visualizations and parametric modeling. He alleges that CAD programs, which use mathematical models such as non-uniform rational basis spline (NURBS) curves, allow architects to embed virtual forces and motion within the constructed design: “Instead of a neutral abstract space for design, the context for design becomes an active abstract space that directs form within a current of forces

that can be stored as information in the shape of the form” (1999, 11). According to Lynn, CAD programs allow architects to embed temporality and flows into their designs since they rely on differential equations to formulate architectural forms; buildings drawn on paper with rulers and compasses, on the other hand, simply plot fixed coordinates in space. Lynn uses a sailboat hull as an illustrative example to clarify how virtual motions can be embedded into a form. The hull is modeled such that it planes smoothly when sailing downwind. It leans sideways and presses against the water when tacking into a headwind. Even though a sailboat’s hull does not change as a physical object, it incorporates a number of potential motions into its design. Similarly, an undulating landscape contains the forces that carved its hills and valleys, and these slopes contain virtual motions that permit objects to roll down their surface. As in his (1993b, 1996) analyses of architectural curvilinearity and blobs, Lynn incorporates Leibniz’s ideas regarding monads, forces, and matter to support his argument: “Once design is posed within a Leibnizian monadological space, architecture may embrace a sensibility of micro and macro contextual specificity as a logic that can not be idealized in an abstract space of fixed coordinates” (1999, 15). Lynn’s competition entry in 1994 for a gateway that accesses the Port Authority Bus Terminal in New York City demonstrates this approach (Fig. 4). Using Wavefront software, Lynn based the structure’s form on the dynamic flows of pedestrians and vehicles (Greg Lynn Form n.d.). Timothy Lenoir and Casey Alt (2003) consider Lynn’s early adoption of CAD programs for architectural design in terms of contemporaneous innovations in bioinformatics. They posit that his animate design process resembles “the mapping of molecular energy landscapes” as well as computer visualizations of proteins as vectors (348).
Lenoir and Alt note, however, that Lynn’s animate designs have been critiqued as a somewhat quixotic initiative since buildings are typically static structures (see, for instance, Speaks 2001). No matter how many forces are accounted for in the design process, the completed building is a fixed physical object. In addition, Lenoir and Alt note that some critics maintain that traditional architectural design practices are as dynamic and animate as Lynn’s use of CAD programs.

Fig. 4 Greg Lynn FORM, competition entry, gateway for Port Authority Bus Terminal, New York City, 1994. (Image credit: © Greg Lynn FORM)
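
The NURBS curves that Lynn mentions in Animate Form can be made concrete with a small sketch. The Python below is an illustration only and is not drawn from Lynn’s work or from any CAD package: the function names `bspline_basis` and `nurbs_point` and the quarter-circle data are assumptions introduced here. It evaluates a NURBS curve with the standard Cox-de Boor recursion and checks that three weighted control points trace an exact circular arc, a shape that an unweighted polynomial spline can only approximate.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def bspline_basis(i: int, p: int, u: float, knots: Sequence[float]) -> float:
    """Cox-de Boor recursion: i-th B-spline basis function of degree p at u."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that u == knots[-1] is included.
        if u == knots[-1] and knots[i] < knots[i + 1] == u:
            return 1.0
        return 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:  # skip terms with a zero-length knot span
        left = (u - knots[i]) / d1 * bspline_basis(i, p - 1, u, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + p + 1] - u) / d2 * bspline_basis(i + 1, p - 1, u, knots)
    return left + right


def nurbs_point(u: float, ctrl: Sequence[Point], weights: Sequence[float],
                knots: Sequence[float], degree: int) -> Point:
    """Evaluate a NURBS curve: a weighted, normalized sum of control points."""
    num_x = num_y = den = 0.0
    for i, ((x, y), w) in enumerate(zip(ctrl, weights)):
        b = bspline_basis(i, degree, u, knots) * w
        num_x += b * x
        num_y += b * y
        den += b
    return (num_x / den, num_y / den)


# Hypothetical example data: a quarter of the unit circle as a degree-2 curve.
DEGREE = 2
CTRL = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
WEIGHTS = [1.0, math.sqrt(2) / 2, 1.0]
KNOTS = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

for u in (0.0, 0.25, 0.5, 0.75, 1.0):
    x, y = nurbs_point(u, CTRL, WEIGHTS, KNOTS, DEGREE)
    print(f"u={u:.2f}  point=({x:.4f}, {y:.4f})  radius={math.hypot(x, y):.4f}")
```

Raising the middle weight pulls the curve toward the corner control point (1, 1). This is one sense in which a form can be reshaped by adjusting parameters rather than by replotting fixed coordinates.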

A number of architects who define their buildings as “folding architecture” claim that their designs are dynamic. In Lynn’s (1993a) guest-edited issue of Architectural Design, Eisenman describes the Alteka Office Building in Tokyo in terms of philosophical conceptions of becoming and ontology: “The building evades its cartesian definition: not representing an essential form, but a form ‘becoming’” (1993c, 28). Even though Eisenman’s model suggests a morphing and moving form, it does not change as a constructed physical object. Hubregtse (2017) illustrates how complex curvilinear designs – which connote Leibniz’s notion of matter as forces – are less dynamic than rectilinear designs when they are fabricated as built forms. He discusses how air terminals constructed with rectilinear modules are more adaptable than those with complicated curved forms. A number of overviews of CAD-designed architecture base their analyses upon Lynn’s theoretical assessments of The Fold, architectural curvilinearity, blobs, and animate form. These include investigations such as: Christian Pongratz and Maria Rita Perbellini’s (2000) overview of ten architects who began practicing when CAD programs became available during the 1990s; Alicia Imperiale’s (2000) analysis of innovations in the design of architectural surfaces; Paola Gregory’s (2003) assessment of how architects are using computer visualizations to create information “scapes” for their designs; Antoine Picon’s (2010) consideration of digital design tools and their effect on the urban environment; and Carpo’s (2011) discussion of how architects use CAD programs and how they differ from traditional architectural design practices. The tendency to associate digitally designed buildings with Deleuze’s ideas in The Fold has, in some cases, led to allegations that CAD-designed architecture is a new iteration of the Baroque (see, for instance, Massumi 1998). 
Michael Ostwald (2006) critiques this assessment, and he argues that contemporary CAD-designed buildings share few similarities with Baroque architecture and that these recent examples of folding architecture are usually more characteristic of Expressionist architecture. His analysis is based on an in-depth comparison between seventeenth-century Baroque buildings and contemporary structures in terms of their visual and material qualities as well as the social, cultural, and political conditions that influenced their designs. Nadir Lahiji (2016) refers to recent designs such as Gehry’s Guggenheim Museum in Bilbao as “neobaroque,” but he argues that they “signify a perversion in the original philosophical idea of the Baroque” (129; italics in original). Ostwald (2006) welcomes CAD programs for allowing architects to explore and shape new forms, but he is also apprehensive of unforeseen social and cultural effects. Douglas Spencer (2016) argues that these new forms of architecture, inspired by Deleuze’s philosophical writing, epitomize a “neoliberal ideal of the post-political” (56). As noted above, Lynn draws upon The Fold to call for an architecture that is not contradictory but folds in response to its surroundings. Spencer argues that this type of architectural approach lacks criticality and is part of a larger trend where architectural design is merely “a service provider for the ‘real’ of the market” (72). Similarly, Adrian Parr (2013) notes that Lynn and Eisenman have merely created new forms based on their readings of Deleuze’s philosophy; he argues that these designs do not engage with social and political issues and that

“architects have chosen to turn a blind eye to the politics underpinning Deleuze’s work” (203). Simone Brott (1998) also alleges that Lynn’s notion of folding architecture is politically ineffective, and she suggests that Derrida’s contributions to Deconstructivism led to a more critical form of architecture (see also Brott 2011).

Summary

In The Fold, Deleuze describes how calculus inspired Leibniz’s conceptions of matter, the soul, and perception, and he posits that Leibniz’s philosophy typifies the Baroque. The Fold has had a significant impact on architectural design and theory as well as scholarly assessments of the built environment. Deleuze’s interpretation of Leibniz’s philosophy and calculus was published when architects were starting to experiment with CAD software. Lynn’s publications on folding architecture theorize the digital design process in terms of Leibniz’s calculus-inspired notions of matter, and they popularized ideas in The Fold among architects. Some critics, however, claim that folding architecture is based only on a superficial application of Deleuze’s philosophy.

Cross-References

Baroque Architecture
Parametric Design: Theoretical Development and Algorithmic Foundation for Design Generation in Architecture

References

Alaçam S, Güzelci OZ, Gürer E, Bacınoğlu SZ (2017) Reconnoitring computational potentials of the vault-like forms: thinking aloud on muqarnas tectonics. Int J Archit Comput 15(4):285–303. https://doi.org/10.1177/1478077117735019
Badiou A (1994) Gilles Deleuze, The fold: Leibniz and the Baroque. In: Boundas C, Olkowski D (eds) Gilles Deleuze and the theatre of philosophy. Routledge, New York, pp 51–69
Benjamin A (1988) Derrida, architecture and philosophy. In: Papadakis A (ed) Deconstruction in architecture. St Martin’s Press, New York, pp 8–11
Berressem H (2005) Multiplicity: foldings in architectural and literary landscapes. In: Benesch K, Schmidt K (eds) Space in America: theory, history, culture. Editions Rodopi, Amsterdam, pp 91–105
Brott S (1998) The form of form: the fold and architecture. Archit Theory Rev 3(2):88–111. https://doi.org/10.1080/13264829809478347
Brott S (2011) Architecture for a free subjectivity: Deleuze and Guattari at the horizon of the real. Ashgate, Farnham
Bruno G (2003) Pleats of matter, folds of the soul. Log 1:113–122
Burns K (2013) Becomings: architecture, feminism, Deleuze – before and after the fold. In: Frichot H, Loo S (eds) Deleuze and architecture. Edinburgh University Press, Edinburgh, pp 15–39
Cache B (1995) Earth moves: the furnishing of territories (trans: Boyman A). MIT Press, Cambridge

Cache B, Girard C (2013) Objectile: the pursuit of philosophy by other means? In: Frichot H, Loo S (eds) Deleuze and architecture. Edinburgh University Press, Edinburgh, pp 96–110
Carpo M (2004) Ten years of folding. In: Lynn G (ed) Folding in architecture. Wiley-Academy, Chichester, pp 14–19
Carpo M (2011) The alphabet and the algorithm: form, standards, and authorship in times of variable media. MIT Press, Cambridge
Carpo M (2013) The digital turn in architecture 1992–2012. Wiley, Chichester
Carpo M (2016) Parametric notations: the birth of the non-standard. Archit Des 86(2):24–29. https://doi.org/10.1002/ad.2020
Castle H (2004) Preface. In: Lynn G (ed) Folding in architecture. Wiley-Academy, Chichester, pp 6–7
Centre Georges Pompidou (2003) Architectures non standard: exposition présentée au Centre Pompidou, Galerie Sud, 10 décembre 2003–1er mars 2004. Centre Pompidou, Paris
Deleuze G (1988) Le pli: Leibniz et le Baroque. Éditions de Minuit, Paris. English edition: Deleuze G (1993) The fold: Leibniz and the Baroque (trans: Conley T). University of Minnesota, Minneapolis
Derrida J (1989) In discussion with Christopher Norris. In: Papadakis A (ed) Deconstruction II. St Martin’s Press, New York, pp 6–11
Duffy S (2010) Deleuze, Leibniz and projective geometry in the Fold. Angelaki 15(2):129–147. https://doi.org/10.1080/0969725X.2010.521401
Ednie-Brown P (2012) Vicious architectural circles: aesthetics, affect and the disposition of emergence. Archit Theory Rev 17(1):76–92. https://doi.org/10.1080/13264826.2012.661747
Egginton W (2005) Of Baroque holes and Baroque folds. In: Spadaccini N, Martín-Estudillo L (eds) Hispanic Baroques: reading cultures in context. Vanderbilt University Press, Nashville, pp 55–71
Eisenman P (1988) An Architectural Design interview by Charles Jencks. In: Papadakis A (ed) Deconstruction in architecture. St Martin’s Press, New York, pp 48–61
Eisenman P (1991) Unfolding events: Frankfurt Rebstock and the possibility of new urbanism. In: Eisenman P, Rajchman J, Geib J, Kohso S (eds) Unfolding Frankfurt. Ernst & Sohn, Berlin, pp 8–17
Eisenman P (1993a) Folding in time: the singularity of Rebstock. Archit Des 63(3–4):22–25
Eisenman P (1993b) Rebstock Park masterplan, Frankfurt. Archit Des 63(3–4):26–27
Eisenman P (1993c) Alteka office building, Tokyo. Archit Des 63(3–4):28–29
Frichot H (2005) Stealing into Gilles Deleuze’s Baroque House. In: Buchanan I, Lambert G (eds) Deleuze and space. Edinburgh University Press, Edinburgh, pp 61–79
Frichot H (2013) Deleuze and the story of the superfold. In: Frichot H, Loo S (eds) Deleuze and architecture. Edinburgh University Press, Edinburgh, pp 79–95
Galofaro L (1999) Digital Eisenman: an office of the electronic era. Birkhäuser, Basel
Greg Lynn Form (n.d.) Port authority triple bridge gateway. http://glform.com/buildings/portauthority-triple-bridge-gateway-competition/. Accessed 28 Dec 2019
Gregory P (2003) New scapes: territories of complexity. Birkhäuser, Basel
Grene M, Ravetz JR (1962) Leibniz’s cosmic equation: a reconstruction. J Philos 59(6):141–146. https://doi.org/10.2307/2022829
Handelman D (2010) Folding and enfolding walls: statist imperatives and bureaucratic aesthetics in divided Jerusalem. Soc Anal 54(2):60–79. https://doi.org/10.3167/sa.2010.540205
Harris PA (2005) To see with the mind and think through the eye: Deleuze, folding architecture, and Simon Rodia’s Watts Towers. In: Buchanan I, Lambert G (eds) Deleuze and space. Edinburgh University Press, Edinburgh, pp 36–60
Hills H (2007) The Baroque: beads in a rosary or folds of time. Fabrications 17(2):48–71. https://doi.org/10.1080/10331867.2007.10539610
Hubregtse M (2017) Concrete curves: architectural curvilinearity, Descartes’ Géométrie, Leibniz’s calculus and Eero Saarinen’s TWA terminal. J Math Arts 11(4):223–239. https://doi.org/10.1080/17513472.2017.1368001
Imperiale A (2000) New flatness: surface tension in digital architecture. Birkhäuser, Basel

Imperiale A (2002) Smooth bodies. J Archit Educ 56(2):27–30. https://doi.org/10.1162/10464880260472558
Jobst M (2013) Why Deleuze, why architecture. In: Frichot H, Loo S (eds) Deleuze and architecture. Edinburgh University Press, Edinburgh, pp 61–75
Johnson P, Wigley M (1988) Deconstructivist architecture. Museum of Modern Art, New York
Kipnis J (1993) Towards a new architecture. Archit Des 63(3–4):40–49
Kipnis J, Leeser T (eds) (1997) Chora L works: Jacques Derrida and Peter Eisenman. Monacelli Press, New York
Lahiji N (2016) Adventures with the theory of the Baroque and French philosophy. Bloomsbury, London
Lenoir T, Alt C (2003) Flow, process, fold. In: Picon A, Ponte A (eds) Architecture and the sciences: exchanging metaphors. Princeton Architectural Press, New York, pp 314–353
Lundborg T (2012) The folding of trauma: architecture and the politics of rebuilding Ground Zero. Alternatives 37(3):240–252
Lynn G (ed) (1993a) Folding in architecture. Archit Des 63(3–4):1–96
Lynn G (1993b) Architectural curvilinearity: the folded, the pliant and the supple. Archit Des 63(3–4):8–15
Lynn G (1996) Blobs, or why tectonics is square and topology is groovy. ANY Archit N Y 14:58–61
Lynn G (1998) Folds, bodies & blobs: collected essays. La Lettre volée, Bruxelles
Lynn G (1999) Animate form. Princeton Architectural Press, New York
Lynn G (2004a) Introduction. In: Lynn G (ed) Folding in architecture. Wiley-Academy, Chichester, pp 8–13
Lynn G (ed) (2004b) Folding in architecture. Wiley-Academy, Chichester
Massumi B (1998) Sensing the virtual, building the insensible. Archit Des 68(5–6):16–24
O’Sullivan S (2006) Art encounters Deleuze and Guattari: thought beyond representation. Palgrave Macmillan, London. https://doi.org/10.1057/9780230512436
Ostwald M (2006) The architecture of the New Baroque: a comparative study of the historic and New Baroque movements in architecture. Global Arts, Singapore
Paek S (2018) Fold as a non-spectacular event: the cases of Peter Eisenman’s Rebstockpark master plan (1990–1991) and the Aronoff Center for Design and Art (1988–1996). J Asian Archit Build Eng 17(3):385–392. https://doi.org/10.3130/jaabe.17.385
Parisi L (2009) Symbiotic architecture: prehending digitality. Theory Cult Soc 26(2–3):346–374. https://doi.org/10.1177/0263276409103121
Parr A (2013) Politics + Deleuze + Guattari + architecture. In: Frichot H, Loo S (eds) Deleuze and architecture. Edinburgh University Press, Edinburgh, pp 197–212
Picon A (2010) Digital culture in architecture: an introduction for the design professions. Birkhäuser, Basel
Pongratz C, Perbellini MR (2000) Natural born CAADesigners: young American architects. Birkhäuser, Basel
Porter R (2009) Deleuze and Guattari: aesthetics and politics. University of Wales Press, Cardiff
Powell K (1993) Unfolding folding. Archit Des 63(3–4):6–7
Prominski M, Koutroufinis S (2009) Folded landscapes: Deleuze’s concept of the fold and its potential for contemporary landscape architecture. Landsc J 28(2):151–165. https://doi.org/10.3368/lj.28.2.151
Ragheb JF (ed) (2001) Frank Gehry, architect. Guggenheim Museum Publications, New York
Rajchman J (1991) Perplications: on the space and time of Rebstockpark. In: Eisenman P, Rajchman J, Geib J, Kohso S (eds) Unfolding Frankfurt. Ernst & Sohn, Berlin, pp 18–77
Rajchman J (1993) Out of the fold. Archit Des 63(3–4):60–63
Rajchman J (1998) Constructions. MIT Press, Cambridge
Rappolt M (2003) Greg Lynn. Icon Magazine. https://www.iconeye.com/architecture/features/item/2355-greg-lynn-%7C-icon-005-%7C-september-2003. Accessed 28 Dec 2019
Rappolt M, Violette R (2004) Gehry draws. MIT Press, Cambridge

Rocker IM (2006) Calculus-based form: an interview with Greg Lynn. Archit Des 76(4):88–95. https://doi.org/10.1002/ad.298
Seppi A (2016) Simply complicated: thinking in folds. In: Friedman M, Schäffner W (eds) On folding: towards a new field of interdisciplinary research. Transcript Verlag, Bielefeld, pp 49–76
Speaks M (2001) It’s out there . . . . In: Di Cristina G (ed) Architecture and science. Wiley-Academy, Chichester, pp 184–189
Spencer D (2016) The architecture of neoliberalism: how contemporary architecture became an instrument of control and compliance. Bloomsbury, New York
Teyssot G (2005) A topology of thresholds. Home Cult 2(1):89–116. https://doi.org/10.2752/174063105778053427
Tissandier A (2018) Affirming divergence: Deleuze’s reading of Leibniz. Edinburgh University Press, Edinburgh
Tschumi B (1986) La case vide: la villette, 1985. Architectural Association, London
Tschumi B (1988) Parc de la Villette, Paris. In: Papadakis A (ed) Deconstruction in architecture. St Martin’s Press, New York, pp 32–39
van Tuinen S, McDonnell N (eds) (2010) Deleuze and The Fold: a critical reader. Palgrave Macmillan, London. https://doi.org/10.1057/9780230248366
Vasileios N (2009) Unfolding San Lorenzo. Nexus Netw J 11(3):471–488. https://doi.org/10.1007/978-3-7643-8978-9_10
Venturi R (1966) Complexity and contradiction in architecture. Museum of Modern Art, New York
Vidler A (2000) Warped space: art, architecture, and anxiety in modern culture. MIT Press, Cambridge
Voordouw J (2018) Topology and interiority: folding space inside. In: Marinic G (ed) The interior architecture theory reader. Routledge, London, pp 318–326
Williams J (2000) Deleuze’s ontology and creativity: becoming in architecture. Pli 9:200–219
Young M (2016) Minimalism and the phenomenological fold. In: Kanaani M, Kopec D (eds) The Routledge companion for architecture design and practice: established and emerging trends. Routledge, London, pp 61–76

Mathematics and Oenology: Exploring an Unlikely Pairing

72

Lucio Cadeddu, Alessandra Cauli, and Stefano De Marchi

Contents
Introduction
Maths and Wine-Related Problems
Barrel Volume Calculations
The Mathematics of Wine Aging: Arrhenius and Eyring Equations
Optimal Wine Storage Conditions
The Influence of the Heat Flow in the Temperature Equation
The Optimal Depth for a Wine Cellar
The Temperature Equation at the Optimal Depth
A Qualitative Study of the Depth of a Wine Cellar Based on the Chosen Reference Period and Soil Conditions While the Temperature Is Changing
What’s Food and Wine Pairing?
The Graph
Geometrical Issues
Matching Algorithm (MA)
Implementation Details and Examples
More Recent Investigations
Conclusion
References

L. Cadeddu, University of Cagliari, Cagliari, Italy
A. Cauli, Politecnico di Torino, Turin, Italy
S. De Marchi, University of Padova, Padova, Italy
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_67


Abstract The aim of this chapter is to discuss some applications of mathematics in oenology and in food and wine pairing. First, we introduce and study some partial differential equations relevant to the correct design of a wine cellar and to the chemical processes involved in wine aging. Second, we present a mathematical method, and some algorithmic issues, for analyzing the process of food and wine pairing carried out by sommeliers.

Keywords Wine · PDE · History of mathematics · General applied mathematics

Introduction

Mathematics is hidden everywhere, and it surprises us with its many fascinating applications. It is one of the oldest sciences to have developed in the course of human history, constantly evolving, resolving practical problems, and deeply influencing our daily lives. This chapter introduces several applications of mathematics to problems that arise when dealing with wine: aging, storage, the construction of wine cellars, and so on. In particular, it presents applications of partial differential equations to the correct design of a wine cellar and to the chemical processes involved in wine aging, together with some historical details about wine-inspired problems, such as measuring the internal volume of a wine barrel. A mathematical method, and some algorithmic issues, for analyzing the technique of food and wine pairing are then discussed. The approach is based on comparing the areas of two planar polygons after some affine transformations. A MATLAB package that analyzes the overlap of these planar polygons has been developed for this purpose and is presented here.

Maths and Wine-Related Problems

Barrel Volume Calculations

In 1613, at the age of 42, Johannes Kepler married the 24-year-old Susanna Reuttinger, after considering 11 different candidates over 2 years (this problem was later formalized as the “marriage problem” or the “secretary problem”). The wedding was celebrated in Linz, Austria. For the occasion, Kepler bought a barrel of wine but questioned the method the wine merchant used to measure the volume of the barrel and thus determine the selling price. Upset by the merchant’s apparently incorrect method, Kepler decided to study how to determine the correct volume of a wine barrel and, moreover, to find the optimal proportions that maximize its internal volume. He formalized the problem in his “new stereometry (solid geometry) of wine barrels” (Kepler 2018).

The scheme of a wine barrel (see Fig. 2) shows how the wine merchant calculated the internal volume and thus the price of the wine. The merchant inserted a stick through the tap hole until it reached the opposite edge of the lid of the barrel (at the upper left in the picture). The length d of the stick determined the volume of the barrel and hence the selling price. This method outraged Kepler, who saw that a narrow, high barrel might have the same d as a wide one and would indicate the same wine price, though its volume would be much smaller.

To determine the volume of a wine barrel accurately, Kepler thought of the wine in a full barrel, or of any solid body, as made up of numerous thin horizontal sheets stacked in layers, and treated the volume as the sum of the volumes of these leaves (see Fig. 1). In the case of a wine barrel, each of these leaves was, at least approximately, a cylinder (Klein 2004, p. 209). In effect, he had introduced the notion of integration by cylindrical shells (more or less in the same fashion as Cavalieri’s theorem), which was still to come. Incidentally, Kepler’s calculations gave approximately the same results the wine merchant used to determine the cost of the barrel! At the same time, Kepler found the optimal proportions that maximize the internal volume of the barrel.

Fig. 1 Horizontal sheets in a full barrel

Nowadays, the problem of optimizing that internal volume can be solved easily by means of elementary differential calculus. Consider a cylinder of height h and diameter 2r; its volume can be expressed as a function of h and of the length d of the stick the merchant used (see Fig. 2). Applying Pythagoras’ theorem (the stick is the hypotenuse of a right triangle with legs 2r and h/2, so that $d^2 = 4r^2 + \frac{h^2}{4}$), r can be determined as a function of d and h; more precisely, $r^2 = \frac{d^2}{4} - \frac{h^2}{16}$. The volume of the cylinder is $V = \pi r^2 h$. Substituting this expression of $r^2$ in terms of h and d into the formula, it is easily seen that $V = \frac{\pi}{4} d^2 h - \frac{\pi}{16} h^3$, i.e., a cubic polynomial in h, which attains its maximum at $h = \frac{2d}{\sqrt{3}}$.
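The optimization above can be checked numerically. The following Python sketch is only an illustration under the cylindrical model of the text; the function names are ours and do not come from the chapter. It evaluates the volume for a given stick length d and compares it with the volume at heights 10% away from the optimum.

```python
import math


def cylinder_volume(h: float, d: float) -> float:
    """Volume of a cylindrical barrel of height h gauged by a stick of length d.

    Uses r^2 = d^2/4 - h^2/16 and V = pi * r^2 * h, as in the text.
    """
    r_squared = d**2 / 4 - h**2 / 16
    if r_squared < 0:
        raise ValueError("h is too large for the given stick length d")
    return math.pi * r_squared * h


def optimal_height(d: float) -> float:
    """Height maximizing the volume for a fixed stick length d: h = 2d / sqrt(3)."""
    return 2 * d / math.sqrt(3)


if __name__ == "__main__":
    d = 1.0  # illustrative stick length
    h_star = optimal_height(d)
    v_star = cylinder_volume(h_star, d)
    print(f"optimal h = {h_star:.4f}, maximal V = {v_star:.4f}")
    # The volume changes little near the optimum: try heights 10% off.
    for factor in (0.9, 1.0, 1.1):
        h = factor * h_star
        print(f"h = {h:.4f}  V / V_max = {cylinder_volume(h, d) / v_star:.4f}")
```

Running the script prints the ratio V / V_max for the three heights, which makes the flatness of the maximum visible without plotting.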


Fig. 2 Wine merchant’s calculation scheme

Fig. 3 Barrel volume V function graph as a function of h, only h > 0 and V > 0 are admissible

For simplicity’s sake, choose d = 1 in order to draw the cubic function $V = \frac{\pi}{16}\left(4h - h^3\right)$, which attains its maximum at $h = \frac{2\sqrt{3}}{3}$; the resulting maximum volume of the barrel is $\frac{\pi\sqrt{3}}{9}$. This volume changes very slowly near its maximum (see Fig. 3); hence, small changes of h do not generate large changes of V. In other words, Kepler discovered that the shape of the Austrian barrel was very close to the optimal one, the one that maximizes the internal volume, and that small variations of h were negligible. This could not be completely accidental; perhaps internal volumes had been determined empirically to obtain this result. Kepler’s calculation showed that the volume of cylinders of equal diagonal d does not vary significantly in a neighborhood of the maximum. In the end, the merchant’s method was fairly accurate and honest: the length d of the stick determines the volume of the barrel satisfactorily, provided the proportions satisfy the requirement $3h^2 = 4d^2$, as found in Austrian barrels.

Ever since the earliest days of wine, various containers have been used for storage and maturation. Their capacity was not always known and differed from one country to the next, which is why the “stick” method was so popular. Different barrels also shared the same name, adding confusion to storage measurements. The medieval barrel used in Florence (Italy) contained just 45 l, while the fifteenth-century English wine barrel measured 143 l. Nowadays the situation is different, as there are only a few standard wine barrels worldwide. The volume of the French barrel (Bordeaux or Burgundy) varies from 225 to 228 l. The modern “hogshead” stores 300 l, while the “puncheon” has a capacity of 500 l. There are also larger barrels that range from 550 to 630 l. In general, the 225-l barrel has more or less become the standard size for wine maturation. This size was widely adopted more than a century ago because a barrel of this size could be handled by a single person.

The Mathematics of Wine Aging: Arrhenius and Eyring Equations

Wine aging is a result of oxidative and non-oxidative processes. Complex chemical reactions involving the wine’s sugars, acids, and phenolic compounds (e.g., tannins) can modify the aroma, taste, color, and mouthfeel of the wine in a way that may make it more pleasing to the taste. Not all wines can age, though. The ratio of sugars, acids, and phenolics to water is a key determinant of how well a wine can age. For example, the less water in the grapes prior to harvest, the more likely the resulting wine will have a good aging potential. Climate, grape variety, vintage, and viticultural practice are always relevant. Contact with oak, either during fermentation or after it (the so-called barrel aging), adds more phenolic compounds to the wine.

Wine aging, hence, is a process that involves many chemical reactions taking place over time. Each of these reactions occurs at a certain rate, and each one is affected by temperature changes in a different way. Any chemical reaction has a unique “energy factor,” “activation energy,” or natural energy barrier that must be overcome for the reaction to occur. For a chemical reaction to proceed at a reasonable (or desirable) rate, the temperature of the system should be high enough that an appreciable number of molecules have energy equal to or greater than the activation energy. The term activation energy was first introduced in 1889 by the Swedish Nobel Prize-winning scientist Svante Arrhenius, and the empirical equation of reference is the so-called Arrhenius equation (see Laidler 1987). He followed the work of the Dutch chemist J. H. van’t Hoff (1884). The equation reads:


$$\ln k = \ln k_0 - \frac{E_a}{RT}$$

or, equivalently:

$$k = k_0\, e^{-\frac{E_a}{RT}}.$$

This equation expresses the dependence of the rate constant k of a chemical reaction on the absolute temperature T (in kelvin), where $k_0$ is the pre-exponential factor (constant for small temperature variations), $E_a$ is the activation energy of the chemical reaction (also constant for small temperature variations), and R is the universal gas constant. This formula can be easily deduced from the following van’t Hoff differential equation, which states that the rate of variation of K with temperature is directly proportional to K and to the activation energy $E_a$ and inversely proportional to the square of the temperature T multiplied by the universal gas constant R:

$$\frac{dK}{dT} = \frac{E_a}{RT^2}\, K.$$

Integrating this equation by a simple separation of variables yields:

$$\int_{k_0}^{k} \frac{dK}{K} = \int_{T_0}^{T} \frac{E_a}{R\tau^2}\, d\tau$$

and hence

$$\ln k - \ln k_0 = -\frac{E_a}{RT}$$

(assuming that the term $\frac{E_a}{RT_0}$ arising from the lower limit $T = T_0$ vanishes) and, finally, the exponential form

$$k = k_0\, e^{-\frac{E_a}{RT}}.$$

Arrhenius states that for every 10 °C increase in temperature, the rate of a chemical reaction increases by between 50% and 200%.

Like this “empirical” Arrhenius equation, the so-called Eyring equation describes the variation of the rate of a chemical reaction with temperature. It is also known as the Eyring-Polanyi equation because it was developed almost simultaneously in 1935 by the chemist Henry Eyring, the physical chemist Meredith Gwynne Evans, and the polymath Michael Polanyi. This equation provides insight into how a reaction progresses at the molecular level. Its expression results from transition state theory as follows:

$$k = \kappa\, \frac{k_B T}{h}\, e^{-\frac{\Delta H^{\ddagger}}{RT}}\, e^{\frac{\Delta S^{\ddagger}}{R}},$$

where:

• k is the reaction rate constant.
• κ is the transmission coefficient.
• $k_B$ is Boltzmann’s constant.
• h is Planck’s constant.
• T is the absolute temperature.
• $\Delta H^{\ddagger}$ is the enthalpy of activation.
• R is the gas constant.
• $\Delta S^{\ddagger}$ is the entropy of activation.
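Returning to the Arrhenius form, the 10 °C rule quoted above can be explored numerically. The sketch below is a minimal illustration: the activation energies and the temperature interval (13 °C to 23 °C) are hypothetical values chosen by us, not data from the chapter, and the function names are ours.

```python
import math

R = 8.314  # universal gas constant, J/(mol K)


def arrhenius_rate(k0: float, ea: float, temp_kelvin: float) -> float:
    """Rate constant k = k0 * exp(-Ea / (R T))."""
    return k0 * math.exp(-ea / (R * temp_kelvin))


def rate_ratio(ea: float, t1_celsius: float, t2_celsius: float) -> float:
    """Ratio k(T2) / k(T1); the pre-exponential factor k0 cancels."""
    t1 = t1_celsius + 273.15
    t2 = t2_celsius + 273.15
    return math.exp(-ea / R * (1.0 / t2 - 1.0 / t1))


if __name__ == "__main__":
    # Hypothetical activation energies in J/mol, for illustration only.
    for ea in (30e3, 50e3, 80e3):
        ratio = rate_ratio(ea, 13.0, 23.0)
        print(f"Ea = {ea / 1e3:.0f} kJ/mol: rate changes by {100 * (ratio - 1):.0f}% "
              "between 13 and 23 degrees C")
```

Because only the ratio of two rate constants is computed, no value of the pre-exponential factor has to be assumed; the result depends on the activation energy alone.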

According to transition state theory, the reaction does not occur instantly at the moment of collision. Instead, the molecules, initially at an infinite distance from one another, start to interact as they approach: the bond lengths change gradually until a particular configuration is reached, the so-called transition state or activated complex. All these transformations correspond to an increase in potential energy, and the activated complex corresponds to the activation energy $E_a$. The activated complex lies at an energy level higher than that of both the reactants and the products, and it is a very unstable state. For this reason, the activated complex exists only for a limited time: it soon evolves toward the formation of products, or it recedes and forms the initial reactants again. The activated complex cannot be isolated.

Consider a bimolecular reaction A + B → C with $K = \frac{[C]}{[A][B]}$, where K is the equilibrium constant. In the transition state model, the activated complex $AB^{\ddagger}$ is formed:

$$A + B \rightleftharpoons AB^{\ddagger} \rightarrow C, \qquad K^{\ddagger} = \frac{[AB]^{\ddagger}}{[A][B]}.$$

The rate of a reaction is equal to the number of activated complexes decomposing to form products. Hence, it is the concentration of the high-energy complex multiplied by the frequency ν with which it surmounts the barrier:

$$\text{Rate} = \nu\,[AB]^{\ddagger} \qquad (1)$$

$$\text{Rate} = \nu\,[A][B]\,K^{\ddagger}. \qquad (2)$$

The rate can be rewritten:

$$\text{Rate} = k\,[A][B]. \qquad (3)$$

Combining Eqs. (2) and (3) gives:


$$k\,[A][B] = \nu\,[A][B]\,K^{\ddagger} \qquad (4)$$

$$k = \nu K^{\ddagger} \qquad (5)$$

where

• ν is the frequency of vibration.
• k is the rate constant.
• $K^{\ddagger}$ is the thermodynamic equilibrium constant.

The frequency of vibration is given by:

$$\nu = \frac{k_B T}{h} \qquad (6)$$

where

• $k_B$ is Boltzmann’s constant ($1.381 \times 10^{-23}$ J/K).
• T is the absolute temperature in kelvin (K).
• h is Planck’s constant ($6.626 \times 10^{-34}$ J s).

Substituting Eq. (6) into Eq. (5):

$$k = \frac{k_B T}{h}\, K^{\ddagger}. \qquad (7)$$

Equation (7) is often tagged with another term ($M^{1-m}$) that makes the units consistent, with M the molarity and m the molecularity of the reaction:

$$k = \frac{k_B T}{h}\, K^{\ddagger} \left(M^{1-m}\right). \qquad (8)$$

The following thermodynamic equations further describe the equilibrium constant:

$$\Delta G^{\ddagger} = -RT \ln K^{\ddagger} \qquad (9)$$

$$\Delta G^{\ddagger} = \Delta H^{\ddagger} - T \Delta S^{\ddagger} \qquad (10)$$

where $\Delta G^{\ddagger}$ is the Gibbs energy of activation, $\Delta H^{\ddagger}$ is the enthalpy of activation, and $\Delta S^{\ddagger}$ is the entropy of activation. Combining Eqs. (9) and (10) to solve for $\ln K^{\ddagger}$ gives:


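$$\ln K^{\ddagger} = -\frac{\Delta H^{\ddagger}}{RT} + \frac{\Delta S^{\ddagger}}{R}. \qquad (11)$$

The Eyring equation is finally given by substituting Eq. (11) into Eq. (7):

$$k = \frac{k_B T}{h}\, e^{-\frac{\Delta H^{\ddagger}}{RT}}\, e^{\frac{\Delta S^{\ddagger}}{R}}. \qquad (12)$$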
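The chain of substitutions leading to Eq. (12) can be verified numerically. The following Python sketch evaluates the rate constant once directly from Eq. (12) and once step by step through Eqs. (6), (10), (9), and (7). The activation parameters used in the example are hypothetical values chosen by us for illustration, and the function names are ours.

```python
import math

K_B = 1.381e-23   # Boltzmann's constant, J/K (value quoted in the text)
H_PLANCK = 6.626e-34  # Planck's constant, J s (value quoted in the text)
R = 8.314         # gas constant, J/(mol K)


def eyring_rate(delta_h: float, delta_s: float, temp: float, kappa: float = 1.0) -> float:
    """Eq. (12), optionally scaled by a transmission coefficient kappa."""
    return (kappa * (K_B * temp / H_PLANCK)
            * math.exp(-delta_h / (R * temp)) * math.exp(delta_s / R))


def eyring_rate_via_gibbs(delta_h: float, delta_s: float, temp: float) -> float:
    """Same rate constant, assembled from Eqs. (6), (10), (9), and (7)."""
    nu = K_B * temp / H_PLANCK                    # Eq. (6)
    delta_g = delta_h - temp * delta_s            # Eq. (10)
    k_dagger = math.exp(-delta_g / (R * temp))    # Eq. (9) solved for K
    return nu * k_dagger                          # Eq. (7)


if __name__ == "__main__":
    # Hypothetical activation parameters, for illustration only.
    delta_h = 60e3   # J/mol
    delta_s = -50.0  # J/(mol K)
    for celsius in (13.0, 23.0):
        temp = celsius + 273.15
        print(f"T = {celsius:.0f} C: k = {eyring_rate(delta_h, delta_s, temp):.3e} 1/s, "
              f"via Gibbs energy = {eyring_rate_via_gibbs(delta_h, delta_s, temp):.3e} 1/s")
```

The two functions agree up to rounding, which is simply the statement that Eq. (11) is Eqs. (9) and (10) combined.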

In particular, the difference between the Arrhenius activation energy $E_a$ and the activation enthalpy $\Delta H^{\ddagger}$ is quite small and numerically comparable to the accuracy attained in most experiments. These two energies are therefore frequently used interchangeably to define the activation barrier of a reaction. As the aging of a wine is the combination of several chemical reactions, the Arrhenius and the Eyring equations are needed to study the problem of the average temperature and to understand why wine collectors are so hung up on temperature.

One of them was Jean-Baptiste Joseph Fourier (1768–1830) who, as any good Frenchman, was motivated to carry out his research into the propagation of heat because of his love for wine and food. More precisely, he had trouble keeping his bottles of wine at the ideal temperature. During summer, the bottles would get too hot and, hence, change in taste. During winter, the bottles would get far too cold, freeze, and eventually crack. The obvious solution was to keep the bottles in a cellar with small temperature changes, but determining the right depth of a cellar was a problem. Too deep, and it could become too expensive to build and troublesome to manage (e.g., getting wine bottles in and out); not deep enough, and temperature fluctuations might be harmful both for the aging process and for long-term storage. For these reasons, in 1804, he started studying the propagation of heat, and in 1807 he submitted to the Paris Institute the first paper on the topic, titled “On the Propagation of Heat in Solid Bodies.” The paper met with several objections, raised by many well-known scholars (viz., Lagrange, Laplace, Biot, Poisson), but he kept working on it and in 1822 published it as “Théorie analytique de la chaleur” (for a detailed biography of Fourier, we refer to O’Connor and Robertson 2003).

Before attacking the problem Fourier’s way, we need to study and understand the parameters that affect wine aging and storage.

Optimal Wine Storage Conditions

The physical parameters that play an important role during the wine aging process are the optimal average temperature, temperature fluctuations, humidity, light, and vibrations.


Optimal Average Temperature

As seen in the previous section, maintaining the right temperature is certainly one of the most important factors in the preservation of a wine. Studying the average temperature to be maintained in a wine cellar, and its possible variations in relation to the external environment, is of paramount importance. At around 10 °C, wine already ages very well; increasing the temperature to 20 °C makes it age in a few years. At 30 °C, wine ages in just a few months, but this is not recommended. The chemical reactions occurring inside a bottle as its temperature increases are quite complicated and proceed at different speeds. In order to slow down or control these chemical reactions, it is recommended to keep the wine cool, within the optimal range, to achieve ideal aging of the bottle during storage. Thus, a wine stored at 30 °C is of lower quality than a wine stored at 13 °C. Indeed, wine experts have found the ideal wine storage temperature to be about 13 °C for a well-balanced aging.

Keeping the wine at lower temperatures, such as 10 °C, is not harmful as long as it does not freeze. However, there are three drawbacks to keeping the wine at a low temperature until the end of the aging process. First, the wine takes (too) long to mature, which is not ideal if you plan to drink it within a few years; it is the best solution only if you plan to leave it to future generations, provided the wine can age for a long time. Second, keeping the wine at a low temperature is expensive. The ground temperature in most of the USA is about 10 °C or higher, so maintaining a constantly cooler temperature implies continuous artificial cooling of the cellar. Third, storage at too low a temperature makes it difficult to handle moisture levels properly. Artificial refrigeration of the cellar tends to dry the air, which is not recommended, as it might affect wine quality (see the subsection on humidity below).

It has been shown that the maximum allowed temperature for storing a wine is about 18 °C, except for particularly delicate wines, for which the value should be lower, as already observed. Higher temperatures, above 21 °C, might be useful to shorten the time needed to get the final product, while a few days at 27 °C or a few hours at 32 °C can permanently damage even a robust wine. Hence, the ideal temperatures for storing wine are 13–16 °C for red wines, 10–14 °C for white wines, and 10–11 °C for rosé wines. A decades-old wine could taste quite young if stored continually under these optimal conditions. The basic idea is therefore to keep the temperature constant at an optimal average value, so that the wine ages gradually over time.

Temperature Fluctuation

Temperature fluctuation is perhaps even more important than the optimal average storage temperature. According to the Arrhenius equation (see section “Maths and Wine-Related Problems”), the highest temperature reached within a cycle speeds up the aging process much more than the lowest temperature slows it down. In addition, temperature oscillations move the air inside the bottle, and the wine expands as soon as it is heated. The rising pressure may then compromise the bottle: the cork moves slightly outward, or a small amount of wine is pushed past the cork. As soon as the bottle cools down, the wine contracts and the pressure inside the bottle drops. Excessive temperature variations thus bring small amounts of air into the bottle to replace the wine, which alters the fill level of old bottles. Since oxygen, one of the most reactive gases, is the most damaging element for wine, bottles that are subjected to repeated temperature fluctuations tend to lose their freshness. Minimizing the frequency of temperature oscillations is as important as minimizing their extent. An acceptable level of temperature oscillation is about 7 °C (average annual oscillation). A change of about 11 °C can become harmful if it occurs on a daily basis. The idea is that a deep wine cellar shelters the wine from seasonal variations in temperature and also from daily changes. This leads immediately to the problem of building an appropriate wine cellar.
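The asymmetry between warm and cool phases of a cycle follows from the exponential form of the Arrhenius equation and can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the activation energy, the mean temperature of 13 °C, and the swing of ±5 °C are hypothetical values chosen by us, the cycle is taken to be sinusoidal, and the function names are ours.

```python
import math

R = 8.314  # universal gas constant, J/(mol K)


def relative_rate(temp_celsius: float, ea: float, ref_celsius: float = 13.0) -> float:
    """Arrhenius rate at temp_celsius divided by the rate at ref_celsius."""
    temp = temp_celsius + 273.15
    ref = ref_celsius + 273.15
    return math.exp(-ea / R * (1.0 / temp - 1.0 / ref))


def mean_rate_over_cycle(mean_celsius: float, swing: float, ea: float,
                         steps: int = 1000) -> float:
    """Average relative rate over one sinusoidal cycle mean +/- swing (degrees C)."""
    total = 0.0
    for i in range(steps):
        phase = 2.0 * math.pi * i / steps
        total += relative_rate(mean_celsius + swing * math.sin(phase), ea)
    return total / steps


if __name__ == "__main__":
    ea = 50e3  # hypothetical activation energy, J/mol
    speed_up = relative_rate(13.0 + 5.0, ea) - 1.0
    slow_down = 1.0 - relative_rate(13.0 - 5.0, ea)
    print(f"+5 C speeds the reaction up by {100 * speed_up:.0f}%")
    print(f"-5 C slows it down by {100 * slow_down:.0f}%")
    mean = mean_rate_over_cycle(13.0, 5.0, ea)
    print(f"mean rate over the cycle: {mean:.3f} times the rate at a steady 13 C")
```

Under these assumptions the rate averaged over the cycle exceeds the rate at the steady mean temperature, so a fluctuating cellar ages wine faster than a stable one with the same average.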

Humidity

This is the most controversial issue in the whole wine conservation process. The general consensus among experts is that a wine should be kept in a damp environment to keep the cork wet and perfectly sealed. Some experts have recently addressed this problem by asserting that, as long as the bottle is stored with the wine in contact with the stopper, the cork remains wet and the ambient humidity is insignificant for optimal conservation. Other tasting experts, however, have indicated that wines kept at an appropriate temperature but in dry conditions seem to lose their freshness. Moreover, in dry conditions, the top of the cork, which is normally not in contact with the wine, dries up and therefore shrinks in diameter. This slightly loosens the seal, and air may enter the bottle. Together with temperature oscillations, this favors harmful oxidation of the wine. So, keeping the wine in conditions that are too dry is a risk to avoid. On the other hand, controlling humidity inside a cellar is easy and not too expensive. Humidity is therefore much less critical than temperature. Storing wine below 50% relative humidity is harmful, while above 80% there is a risk of mold. The range between 50% and 80% relative humidity is therefore acceptable. The amount of water that can dissolve in air increases with temperature, and the maximum amount that can be held at a particular temperature corresponds to 100% relative humidity. It is therefore important to distinguish between humidity and relative humidity.

Light

It is advisable to keep the wine in a dark environment, as light, at least at short wavelengths, lowers the resistance of the complex of molecules that will create the main shades of taste during a proper aging process. Glass absorbs most ultraviolet rays, and dark green glass in particular absorbs all the other short-wavelength rays. If you have a good wine in clear bottles, such as white wines or Champagne, you should keep them in a dark place. It is also not recommended to expose the bottles to artificial light.


Vibrations

Finally, wine should be stored in a vibration-free environment. It has been carefully verified that this is not so important for wines that do not throw a deposit. For those that produce sediment, the most harmful aspect is the agitation of that deposit before serving the wine. Hence, if the storage area is subject to small vibrations, it is preferable to leave the wine for a couple of weeks in a quiet place before serving it. The same attention should be paid whenever an aged red wine is moved.

The Influence of the Heat Flow in the Temperature Equation

The heat equation is of paramount importance in several scientific fields. In mathematics, it is the prototype of a parabolic partial differential equation. In financial mathematics, it is needed to solve the Black–Scholes partial differential equation in option pricing models. In probability theory, it is related to the study of Brownian motion (random walks) via the Fokker–Planck equation. A more general version of the heat equation, called the “diffusion equation,” is related to the study of chemical diffusion and other related processes, such as those that arise in wine aging.

The simplest form of the heat flow density is given by Fourier’s law, which states that the time rate of heat transfer through a material is proportional to the negative gradient of the temperature and to the area, at right angles to that gradient, through which the heat flows. Along the z axis (see Fig. 4), we have $q_s(t) = -k\,\frac{\partial T}{\partial z}$, where k is the coefficient of thermal conductivity, which is always positive, as is the temperature $T = T(z, t)$, which varies in time t and along the z axis. All materials obey this law, at least to a first approximation, provided the material is not heated excessively and quickly. The heat flow $q_s(t)$ is actually a vector, and it is assumed here that it propagates along the z axis only.

As seen in the previous section, wine storage in underground cellars requires proper temperature control. The idea is that good soil conditions shelter the wine from seasonal and daily variations in temperature. Clearly, very deep cellars have this advantage, but their construction is very expensive, and they are not very practical to use and to maintain in good condition. It is therefore relevant to ask how the heat propagating through the surface of the cellar (represented along the t axis of the figure) affects the temperature function.

Consider as a reference a homogeneous half-space, isolated, that is, without heat sources inside. The heat flow can thus be represented through the surface along the t axis. Denote by z the depth of the cellar. Solving the heat equation means finding the precise relationship between the temperature T, the depth z of the wine cellar, and the time t (Fig. 4). The goal is to find the best depth of a wine cellar. We formulate the problem with the heat diffusion equation in the form:

$$\frac{\partial^2 T}{\partial z^2} - \frac{1}{\kappa}\frac{\partial T}{\partial t} = 0, \qquad (13)$$


Fig. 4 Find z0 , the optimal depth!

where the temperature is a function of space and time, denoted by T(z, t), and $\kappa = \frac{k}{\rho c}$ is the diffusion coefficient of the Earth (with ρ the density and c the specific heat of the soil). Assume also that the heat flow $q_s(t)$ is a periodic function, so that the boundary condition at the surface z = 0 is expressed by:

$$-k\,\frac{\partial T}{\partial z} = F_0\, e^{-i\omega t}, \qquad (14)$$

where k is the coefficient of thermal conductivity. Hence, we need to solve the following second-order differential system:

$$\begin{cases} \dfrac{\partial^2 T}{\partial z^2} - \dfrac{1}{\kappa}\dfrac{\partial T}{\partial t} = 0 \\[2mm] -k\,\dfrac{\partial T}{\partial z} = F_0\, e^{-i\omega t} \end{cases} \qquad (15)$$

whose solution, obtained by straightforward application of Fourier series techniques (for more details refer to Bornemann (2004), Cadeddu and Cauli (2018), Siegfried and Marcuson (2010), and Taler (2006)), is of the form:

$$T(z, t) = A f(z)\, e^{-i\omega t}. \qquad (16)$$

In fact, substituting the expression of T(z, t) into the heat diffusion equation (13) yields:

$$A e^{-i\omega t}\left[\frac{d^2 f(z)}{dz^2} - \frac{(-i\omega)}{\kappa}\, f(z)\right] = 0$$


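which thus becomes an ordinary differential equation of the second order:

$$\frac{d^2 f(z)}{dz^2} + \frac{i\omega}{\kappa}\, f(z) = 0$$

that has a general solution of the form:

$$f(z) = a\, e^{\sqrt{-\frac{i\omega}{\kappa}}\, z} + b\, e^{-\sqrt{-\frac{i\omega}{\kappa}}\, z}.$$

Since the temperature oscillation must decay exponentially as the depth grows, choose a = 0 and keep only the second term of f(z), absorbing b into the constant A. Substituting it into the expression (16) of the temperature, and using $\sqrt{-i} = \frac{1-i}{\sqrt{2}}$, gives:

$$T(z, t) = A f(z)\, e^{-i\omega t} = A\, e^{-\frac{1-i}{\sqrt{2}}\sqrt{\frac{\omega}{\kappa}}\, z}\, e^{-i\omega t}. \qquad (17)$$

By separating the oscillatory part:

$$T(z, t) = A\, e^{-\sqrt{\frac{\omega}{2\kappa}}\, z}\, e^{i\left(\sqrt{\frac{\omega}{2\kappa}}\, z - \omega t\right)}. \qquad (18)$$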

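It remains to determine the constant A to obtain the particular solution of the problem (15) considered. Differentiate the quantity (18) with respect to z:

$$\frac{\partial T}{\partial z} = -A\, e^{-i\omega t}\, \frac{1-i}{\sqrt{2}}\sqrt{\frac{\omega}{\kappa}}\; e^{-\frac{1-i}{\sqrt{2}}\sqrt{\frac{\omega}{\kappa}}\, z}$$

and impose the boundary condition (14) at z = 0:

$$kA\,\sqrt{\frac{\omega}{\kappa}}\;\frac{1-i}{\sqrt{2}}\; e^{-i\omega t} = F_0\, e^{-i\omega t}$$

and

$$kA\,\sqrt{\frac{\omega}{\kappa}}\; e^{-i\omega t - i\frac{\pi}{4}} = F_0\, e^{-i\omega t},$$

from which the value of the constant is obtained:

$$A = \frac{F_0}{k}\sqrt{\frac{\kappa}{\omega}}\; e^{i\frac{\pi}{4}}.$$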

Thus, the desired solution, which expresses the temperature under the action of the heat flow that crosses the surface, is:

$$T(z, t) = \frac{F_0}{k}\sqrt{\frac{\kappa}{\omega}}\; e^{-\sqrt{\frac{\omega}{2\kappa}}\, z}\; e^{i\left(\sqrt{\frac{\omega}{2\kappa}}\, z - \omega t + \frac{\pi}{4}\right)} \qquad (19)$$

By comparing the equations of the temperature wave (19) and the heat flow (14), it is readily seen that the oscillatory parts differ by $\frac{\pi}{4}$. This means that, at the surface, the temperature anomaly due to the heat flow lags the heat flow itself by 1/8 of the oscillation period (a phase of $\frac{\pi}{4}$ is 1/8 of a full cycle of $2\pi$). Equation (19) completely solves the heat diffusion problem, since it gives the precise relationship between the temperature T, the depth z of the cellar, and the time t. Given T and t, one can find z and vice versa. This will be done in the next section.
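The temperature wave of Eq. (19) can be checked against the system (15) by finite differences. The Python sketch below is an illustration only: the conductivity and the heat-flow amplitude are hypothetical values chosen by us, the diffusivity is the value used later in the chapter, and the function names are ours. It evaluates the complex solution, the residual of the heat equation (13), and the surface flux of the boundary condition (14).

```python
import cmath
import math

KAPPA = 2e-7     # thermal diffusivity, m^2/s (value used later in the chapter)
K_COND = 1.0     # thermal conductivity, W/(m K); hypothetical value
F0 = 10.0        # heat-flow amplitude, W/m^2; hypothetical value
OMEGA = 2.0 * math.pi / (365 * 86400)  # annual angular frequency, rad/s


def temperature(z: float, t: float) -> complex:
    """Complex temperature wave of Eq. (19)."""
    alpha = math.sqrt(OMEGA / (2.0 * KAPPA))
    amplitude = (F0 / K_COND) * math.sqrt(KAPPA / OMEGA)
    phase = alpha * z - OMEGA * t + math.pi / 4.0
    return amplitude * math.exp(-alpha * z) * cmath.exp(1j * phase)


def pde_residual(z: float, t: float, dz: float = 1e-2, dt: float = 3600.0) -> complex:
    """T_zz - (1/kappa) T_t of Eq. (13), approximated by central differences."""
    t_zz = (temperature(z + dz, t) - 2.0 * temperature(z, t)
            + temperature(z - dz, t)) / dz**2
    t_t = (temperature(z, t + dt) - temperature(z, t - dt)) / (2.0 * dt)
    return t_zz - t_t / KAPPA


def surface_flux(t: float, dz: float = 1e-4) -> complex:
    """-k dT/dz at z = 0 (Eq. (14)), approximated by a one-sided difference."""
    return -K_COND * (temperature(dz, t) - temperature(0.0, t)) / dz


if __name__ == "__main__":
    t = 1.0e6  # an arbitrary instant, s
    print("relative PDE residual at z = 1 m:",
          abs(pde_residual(1.0, t)) / abs(temperature(1.0, t)))
    print("flux mismatch at the surface:",
          abs(surface_flux(t) - F0 * cmath.exp(-1j * OMEGA * t)))
```

Both printed quantities are small compared with the size of the terms involved, and they shrink further when the difference steps dz and dt are reduced, as expected from discretization error alone.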

The Optimal Depth for a Wine Cellar

The aim is to estimate the depth z of a wine cellar at which the temperature anomaly is reduced so much that the oscillation is negligible. Suppose that the temperature variation at the surface is described by the wave law T(0, t) = T0 + A0 cos ωt, where T0, A0, and ω are positive constants. In particular, ω = 2π/t, where t here denotes the reference period, is the angular frequency of the temperature oscillation about the known mean surface temperature T0. For example, taking the year as the reference period of storage,

$$\omega = \frac{2\pi\ \text{rad}}{365\ \text{days}} = \frac{2\pi\ \text{rad}}{31\,536\,000\ \text{s}} = 1.99 \times 10^{-7}\ \text{rad/s}.$$

The problem is to find the optimal depth z0 of the wine cellar, or to establish the winery size for different values of ω. The problem is then formulated for the case where the temperature wave at the surface is of the form T(0, t) = T0 + A1 cos ω1t + A2 cos ω2t, where ω1 is the frequency of the temperature variation with a period of 1 year and ω2 is the frequency of the temperature variation with a period of 1 day.

Let T = T0 + U. Then the initial problem is modelled by the following second-order system:

$$\begin{cases} U_t - k U_{zz} = 0, & 0 < z < \infty,\ -\infty < t < \infty \\ U(0,t) = A_0 \cos \omega t, & -\infty < t < \infty \\ |U(z,t)| < C, & 0 \le z < \infty,\ -\infty < t < \infty \end{cases}$$

whose solution is of the form

$$U(z,t) = \Re\, A_0 e^{-\alpha z} e^{i(\omega t - \alpha z)} = A_0 e^{-\alpha z} \cos(\omega t - \alpha z), \qquad \alpha = \left(\frac{\omega}{2k}\right)^{1/2}.$$

Notice that the average surface temperature T0 must be added to the solution U(z, t) in order to obtain the temperature function. Observe that the diffusion coefficient k (also written κ) depends on the nature of the soil: it can vary by a factor of 5 or more between a wet and a dry soil. For more precise calculations (see Turcotte and Schubert 2002), assume an average value of κ = 2 × 10⁻⁷ m²/s while assuming that the Earth's temperature fluctuates around 20 °C. Imposing the condition αz = π gives the optimal depth of the cellar for a period of one year:

$$z_0 = \frac{\pi}{\alpha} = \pi \left(\frac{2k}{\omega}\right)^{1/2} \approx 4.45\ \text{m}.$$
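The damped-wave solution can be checked numerically. The following is a minimal sketch, not part of the original text: it evaluates U(z, t) = A0 e^(−αz) cos(ωt − αz) with the average diffusivity quoted above, estimates U_t − kU_zz by central differences, and computes z0 = π/α. The amplitude `A0`, the step sizes `hz` and `ht`, and the function names are assumptions chosen for illustration.

```python
import math

K = 2e-7                            # diffusivity in m^2/s (average value from the text)
OMEGA = 2 * math.pi / 31_536_000    # annual angular frequency in rad/s
A0 = 15.0                           # surface amplitude in kelvin (assumed for illustration)
ALPHA = math.sqrt(OMEGA / (2 * K))  # alpha = (omega / 2k)^(1/2), in 1/m


def u(z, t):
    """Temperature anomaly U(z, t) = A0 exp(-alpha z) cos(omega t - alpha z)."""
    return A0 * math.exp(-ALPHA * z) * math.cos(OMEGA * t - ALPHA * z)


def heat_residual(z, t, hz=1e-2, ht=3600.0):
    """Central-difference estimate of U_t - k U_zz; close to 0 for a solution."""
    u_t = (u(z, t + ht) - u(z, t - ht)) / (2 * ht)
    u_zz = (u(z + hz, t) - 2 * u(z, t) + u(z - hz, t)) / hz**2
    return u_t - K * u_zz


z0 = math.pi / ALPHA                # depth at which alpha * z = pi
print(f"alpha = {ALPHA:.4f} 1/m, z0 = {z0:.2f} m")
print(f"residual at z = 2 m, t = 90 days: {heat_residual(2.0, 90 * 86_400):.2e} K/s")
```

The residual is several orders of magnitude smaller than the individual terms U_t and kU_zz, which is what one expects from a finite-difference check of an exact solution.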

If the amplitude of the annual fluctuation is assumed to be 15 °C, things get better because a damping factor of 10 is now sufficient. That is, choose z so that e^(−αz) = 0.1, which gives z0 = ln(10)/α ≈ 3.3 m. However, taking 1 day = 86,400 s instead of 1 year as the period yields ω = 2π/86,400 = 7.27 × 10⁻⁵ rad/s, and the optimal depth needed to damp the oscillations of the temperature is

$$z_0 = \frac{\pi}{\alpha} = \pi \left(\frac{2k}{\omega}\right)^{1/2} = \pi \left(0.004 \cdot \frac{86{,}400}{2\pi}\right)^{1/2} \approx 23.3\ \text{cm},$$

where 2k = 0.004 cm²/s. So the optimal depth z0 is tied to the conditions of the wine cellar described above; in particular, z0 depends primarily on ω, i.e., on the period of aging, and on the diffusion coefficient κ of the Earth.

When the temperature at the surface is of the form T(0, t) = T0 + A1 cos ω1t + A2 cos ω2t, putting T = T0 + U + V, the model system of the problem becomes:

$$\begin{cases} U_t - k U_{zz} = 0, \quad V_t - k V_{zz} = 0, & 0 < z < \infty,\ -\infty < t < \infty \\ U(0,t) = A_1 \cos \omega_1 t, \quad V(0,t) = A_2 \cos \omega_2 t, & -\infty < t < \infty \\ |U(z,t)| < C, \quad |V(z,t)| < C, & 0 \le z < \infty,\ -\infty < t < \infty \end{cases} \tag{20}$$

As in the previous case, one gets:

$$U(z,t) = \Re\, A_1 e^{-\alpha_1 z} e^{i(\omega_1 t - \alpha_1 z)} = A_1 e^{-\alpha_1 z} \cos(\omega_1 t - \alpha_1 z), \qquad \alpha_1 = \left(\frac{\omega_1}{2k}\right)^{1/2},$$

$$V(z,t) = \Re\, A_2 e^{-\alpha_2 z} e^{i(\omega_2 t - \alpha_2 z)} = A_2 e^{-\alpha_2 z} \cos(\omega_2 t - \alpha_2 z), \qquad \alpha_2 = \left(\frac{\omega_2}{2k}\right)^{1/2},$$

from which the temperature at depth z and time t is

$$T(z,t) = T_0 + A_1 e^{-\alpha_1 z} \cos(\omega_1 t - \alpha_1 z) + A_2 e^{-\alpha_2 z} \cos(\omega_2 t - \alpha_2 z),$$

with

$$\omega_1 = \frac{2\pi}{3.15 \times 10^{7}} = 1.99 \times 10^{-7}\ \text{rad/s} \qquad \text{and} \qquad \alpha_1 = \left(\frac{\omega_1}{2k}\right)^{1/2} = \frac{\alpha_2}{\sqrt{365}}.$$

So, choosing the year as the reference period, the optimal depth is found to be

$$z_0 = \frac{\pi}{\alpha_1} = \pi \left(\frac{2k}{\omega_1}\right)^{1/2} \approx 4.45\ \text{m}.$$
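To see how differently the two components are damped, the superposition can be evaluated directly. The sketch below is illustrative only: the mean temperature `t_mean` and the amplitudes `amp_annual` and `amp_daily` are hypothetical values, and the function names are not from the original text. Only the diffusivity and the two periods are taken from the section.

```python
import math

K = 2e-7                                  # diffusivity in m^2/s (average value from the text)
OMEGA_1 = 2 * math.pi / (365 * 86_400)    # annual angular frequency in rad/s
OMEGA_2 = 2 * math.pi / 86_400            # daily angular frequency in rad/s


def alpha(omega, k=K):
    """Damping constant alpha = (omega / 2k)^(1/2), in 1/m."""
    return math.sqrt(omega / (2 * k))


def temperature(z, t, t_mean=13.0, amp_annual=15.0, amp_daily=6.0):
    """T(z, t) = T0 + annual wave + daily wave (amplitudes are hypothetical)."""
    alpha_1, alpha_2 = alpha(OMEGA_1), alpha(OMEGA_2)
    annual = amp_annual * math.exp(-alpha_1 * z) * math.cos(OMEGA_1 * t - alpha_1 * z)
    daily = amp_daily * math.exp(-alpha_2 * z) * math.cos(OMEGA_2 * t - alpha_2 * z)
    return t_mean + annual + daily


z0 = math.pi / alpha(OMEGA_1)                    # optimal depth for the annual wave
annual_factor = math.exp(-alpha(OMEGA_1) * z0)   # equals exp(-pi)
daily_factor = math.exp(-alpha(OMEGA_2) * z0)    # far smaller: alpha_2 = sqrt(365) * alpha_1

print(f"z0 = {z0:.2f} m")
print(f"annual amplitude factor at z0: {annual_factor:.4f}")
print(f"daily amplitude factor at z0:  {daily_factor:.2e}")
```

At the depth chosen for the annual wave, the daily component is attenuated far more strongly, so only the annual period matters for the sizing.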

The Temperature Equation at the Optimal Depth

Writing the temperature function T(z0, t) at the optimal depth z0 highlights how its oscillations vary with depth, and hence which depth is best for building the cellar, taking into account the so-called skin effect occurring in the first layer of the Earth's surface, where the temperature function decreases with depth by a factor of 1/e. Consider the temperature inside the cellar, T = (1/e) T0 = (1/e) A e^(−iωt), where T0 = T(0, t) = A e^(−iωt) is the temperature on the outer surface. Matching this value to the temperature function found earlier,

$$T(z,t) = A\, e^{-\frac{1-i}{\sqrt{2}} \sqrt{\frac{\omega}{k}}\, z}\, e^{-i\omega t},$$

and solving for z, one easily finds the value of the optimal depth (already calculated):

$$z_0 = \pi \sqrt{\frac{2k}{\omega}},$$

which depends exclusively on ω and κ. Now rewrite the temperature equation (19) in terms of the optimal depth z0 found:

$$T(z_0,t) = \frac{F_0}{k} \sqrt{\frac{k}{\omega}}\; e^{-\frac{z}{z_0}}\, e^{i\left(\frac{z}{z_0} - \omega t + \frac{\pi}{4}\right)} \tag{21}$$

The choice of the optimal depth z0 calculated above brings two advantages. First, the amplitude of the temperature wave is reduced by a factor of 23, so that, with a correct design of the winery, the temperature oscillation is reduced to a mere 1 °C. Second, the phase of the temperature wave at this depth is exactly opposite to its phase at ground level. This is the desired effect: it dampens excessive temperature fluctuations and limits the impact of heat-transfer mechanisms, such as the opening of the cellar door, which push the cellar temperature well beyond the optimal storage value of 13 °C.
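Both advantages follow in one line from the condition αz0 = π applied to the single-frequency solution U(z, t) = A0 e^(−αz) cos(ωt − αz). The short derivation below is a worked restatement under that assumption, not an additional result:

```latex
\begin{align*}
\alpha z_0 = \pi
  \;&\Longrightarrow\; e^{-\alpha z_0} = e^{-\pi} \approx 0.0432 \approx \frac{1}{23.1}, \\
U(z_0, t) &= A_0 e^{-\pi} \cos(\omega t - \pi) = -A_0 e^{-\pi} \cos \omega t .
\end{align*}
```

With the annual amplitude A0 = 15 °C used earlier, the residual amplitude is about 15 × 0.0432 ≈ 0.65 °C, which is of the order of the 1 °C quoted above, and the minus sign expresses the phase opposition with respect to the surface.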

A Qualitative Study of the Depth of a Wine Cellar Based on the Chosen Reference Period and Soil Conditions While the Temperature Is Changing

To study how the depth of the cellar must be modified according to the time chosen for aging the wine and to the soil conditions where the cellar is located, choose three main reference periods and examine the optimal depths for the oscillations to which the temperature is subjected during 1 day, during 1 year, or under intense and persistent cold conditions, i.e., for three different values of ω:

• ω = 7.27 × 10⁻⁵ rad/s, for a daily period
• ω = 1.99 × 10⁻⁷ rad/s, for an annual period
• ω = 1.99 × 10⁻¹² rad/s, for a period of about 100,000 years

Also consider three ground conditions, namely clay, sandy, and rocky soil. In this regard, observe the following table:

Taking 1 day as the reference period and examining the oscillations of the outside temperature at the surface over time, the trend is the following:

Figure: Surface temperature for daily oscillations (horizontal axis: time in hours; vertical axis: surface temperature oscillation in kelvin)

It can be observed, therefore, that during the hottest hours of the day the temperature undergoes strong oscillations, reaching a considerable variation of 12° at midday. As for the depth z0 of the wine cellar, when 1 day is taken as the reference period for temperature fluctuations and the soil characteristics are taken into account, no major changes are necessary: to damp daily fluctuations, an optimal depth of only 20 cm would be needed for a rocky terrain, as shown in the following table:

Choosing a year as a wine aging period, depending on the characteristics of the soil, the optimal depths vary between 1.4 and 3.8 m, as shown in the following table:

To understand how important it is to adjust the temperature oscillation by means of the depth of the cellar, consider the extreme situation of an overly long aging period of about 100,000 years, corresponding to the third value of ω considered. In this case, a proper adjustment of the depth of the cellar would be impracticable for the same soil characteristics, since it would require building a cellar deeper than 1 km, as confirmed by the following table:

Now take the three main types of soil as a reference and choose a wine production area in the world where the average annual temperature is 13 °C and where the temperature varies substantially during the yearly aging period considered. Look at the respective charts for the optimal depths in the three chosen soil conditions for the annual value ω = 1.99 × 10⁻⁷ rad/s.

Figure: Temperature depending on depth and time: clay soil (horizontal axis: temperature variations in time, in kelvin; vertical axis: depth in meters)

Figure: Temperature depending on depth and time: sandy soil (horizontal axis: temperature variations in time, in kelvin; vertical axis: depth in meters)

Figure: Temperature depending on depth and time: rocky soil (horizontal axis: temperature variations in time, in kelvin; vertical axis: depth in meters)

With the optimal wine conservation conditions described in this section, the charts show that for a clay soil the optimal depth is over 6 m; for a sandy soil, it is almost 8 m; and, finally, for a rocky soil, which is subject to strong variations in temperature, it is more than 20 m. Finally, to keep the temperature of the wine conservation environment constantly equal to the optimal temperature of 13 °C, the depth of a wine cellar is crucially important for the aging process, since it also copes with the daily oscillations of the temperature function. The chart "Attenuation of temperature oscillations with depth" highlights the importance of correctly sizing a wine cellar even in climates that are persistently cold throughout the period. The graph shows the different depths, in meters, to which the cellar can be adjusted in an attempt to mitigate the oscillations of the temperature function that occur during the annual aging process. For wine cellars located almost at the surface, where the temperature drops more than 10° below 0, one can observe how increasing the depth of the cellar makes it easier to attenuate the frequent temperature oscillations.
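As a numerical cross-check of the three reference periods considered in this section, the formula z0 = π(2κ/ω)^(1/2) can be evaluated directly. The sketch below uses only the average diffusivity κ = 2 × 10⁻⁷ m²/s quoted earlier. The soil-specific diffusivities for clay, sand, and rock are not reproduced in this text, so they are deliberately not guessed here; the names `REFERENCE_FREQUENCIES` and `optimal_depth` are illustrative.

```python
import math

K = 2e-7  # average diffusivity in m^2/s, as quoted in the text

# The three reference angular frequencies (rad/s) listed in this section.
REFERENCE_FREQUENCIES = {
    "daily": 7.27e-5,
    "annual": 1.99e-7,
    "about 100,000 years": 1.99e-12,
}


def optimal_depth(omega, k=K):
    """Depth z0 = pi * (2k / omega)^(1/2) at which alpha * z = pi, in meters."""
    return math.pi * math.sqrt(2 * k / omega)


depths = {name: optimal_depth(w) for name, w in REFERENCE_FREQUENCIES.items()}
for name, z in depths.items():
    print(f"{name:>20}: z0 = {z:10.2f} m")
```

Because z0 scales like the square root of the period, lengthening the period by a factor of 100,000 multiplies the depth by roughly 316, which is why the longest period leads to a depth of more than 1 km.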

Figure: Attenuation of temperature oscillations with depth (horizontal axis: years 2007 to 2010; vertical axis: temperature in °C; curves for depths of 0.25 m, 1 m, 1.5 m, 2 m, 3 m, 5 m, 7 m, and 10 m)

What’s Food and Wine Pairing?

The discussion starts by explaining what the process of food and wine pairing is. “Food and wine pairing is often cast as a rather mysterious science, but in truth it is actually quite simple, and the experimentation involved is great fun. People often thought that the purpose for a wine is for it to be drunk with food in a situation where both complement each other, and it still amazes how often a rather humble wine will synergize with a food match in a profound way.” These words give a perfect idea of the aim of matching food and wine. Mathematically speaking, one can formalize the process as an algorithm that can be modelled geometrically as the comparison of the areas of two polygons obtained by connecting the values indicated in the sommelier graph depicted in Fig. 5. This diagram is somewhat reminiscent of a dart board, highlighting the most important characteristics of the food and of the wine that one would like to understand when tasting a food and smelling a wine. How does one compare these polygons? What kind of information does one wish to get from this comparison? In what follows, a pairing algorithm based on the technique that sommeliers use when tasting the food and the wine is presented and tested on some classical food and wine pairings. On reading the chapter, one should be convinced of the importance of choosing the proper wine for every dish: the proper wine exalts the dish and its preparation.

The Graph

First, the graph that the Italian Association of Sommeliers (AIS) uses to find the wine that best matches a given food is presented (see Fig. 5). There are two different sets of descriptors, called characteristics, which are distinguished in the graph by different type styles.

Fig. 5 The graph for food and wine pairing used by the AIS association

• Words in normal size correspond to the food characteristics, evaluated using the natural numbers from 0 to 10. These characteristics are considered in counterclockwise order and are greasiness, juiciness, sweetness tendency, fattiness, sapidity/bitterness tendency/acidity tendency/sweetness, and persistence taste-aroma/spiciness/aromatic.
• Words in capitals correspond to the wine characteristics, again using the values from 0 to 10. As above, these are considered in counterclockwise order, opposite to the circles' center, as follows: alcohol level, tannin content, sapidity/effervescence, acidity, aroma intensity/IAP (intense aromatic persistence), and sweetness/softness.

Notice: When one has a group of characteristics (such as, for the food, sapidity/bitterness tendency/acidity tendency/sweetness) and gives each of them a value, the value reported in the graph is the highest one, which corresponds to the most perceptible characteristic of the group.

To be more precise:

1. The value 0 is used when a characteristic is absent.
2. The values in the range 1–3 are used when a characteristic is barely perceptible.
3. The values in the range 4–6 are used when a characteristic is more perceptible than before, but not clearly.
4. The values 7 and 8 are used when a characteristic is clearly perceptible.
5. Finally, the values 9 and 10 are given when a characteristic is perfectly perceptible.

These values are represented in the graph of Fig. 5 by 11 concentric circles of radii 0 up to 10, intersected by lines emanating from the inner circle and connecting the values of the characteristics. Figure 6 presents a slightly different form of the original graph (as in Associazione Italiana Sommeliers 2001), which gives a good idea of what Italian sommeliers have to handle when performing food and wine pairing. In the food and wine pairing process, some characteristics are matched by opposition and others by accordance.

It is time to turn to the mathematical point of view. Fixing a Cartesian coordinate system (x, y), the graph has essentially three main symmetries: one bottom-up, corresponding to the vertical line x = 0 and describing a matching by accordance, and two describing a matching by opposition, corresponding to the bisecting lines y = x and y = −x.

(a) Consider the bottom-up line x = 0. In the bottom part are the characteristics of the food (left: sapidity, bitterness tendency, acidity tendency, sweetness; right: persistence taste-aroma, spiciness, aromatic), which should agree with the corresponding characteristics of the wine on the same side at the top of the graph. There is one exception: the sapidity, bitterness, and acidity tendency of the food have to be considered as opposed to the sweetness and the softness of the wine. As a simple example, if the food has aromatic 6 and spiciness 5, the wine should have aroma intensity and/or IAP with almost the same values.
(b) The second symmetry is along the bisecting line y = x. Here the matching is made by opposition. For example, a food which is juicy with value 8 must match a wine with similar values of alcohol level and/or tannin content.
(c) The third symmetry is along the bisecting line y = −x. Here again the matching is by opposition. For example, a food which is fat should be matched with a wine with a good level of acidity and effervescence or sapidity. The reason is simple: effervescence and acidity have the effect of cleaning the fat from the mouth.

Figure 6 shows the graph, similar to the one found in Italian books for sommeliers (cf. Associazione Italiana Sommeliers 2001), where the characteristics matched by opposition are identified with the colors red and black and those matched by accordance with the colors green and magenta.

Fig. 6 The graph used by sommeliers to match food and wine by accordance or opposition

Example 1 In Fig. 7 the graph for the matching of a slice of S. Daniele ham, Italian prosciutto crudo (blue polygon), and a red wine from Sicily, DOC Nero d’Avola, year 2002, 14% (red polygon), is presented. One question arises: “given a food, which characteristics should a wine have for the optimal match (or the best possible)?” Since the problem consists of the “comparison” of two polygons inscribed in 11 concentric circles, the comparison is made by analyzing the intersection of their areas and how much they overlap. Hence, the best matching problem can be solved, modulo a roto-translation, by looking at how much the polygons overlap: the more they overlap, the better the matching.

Fig. 7 This graph corresponds to the match of a slice of ham and a glass of Nero d’Avola (Sicilian wine), year 2002

Geometrical Issues

The vertices of the polygons are identified as follows:

• Put the common center of the circles at the origin.
• Every polygon has vertices at the points (x_s, y_s) = (k cos θ_s, k sin θ_s), where k ∈ {0, . . . , 10} and the θ_s are the corresponding angles.
• Looking at the graph used by Italian sommeliers in Fig. 5, for the wine characteristics one can take the angles θ_1 = π/2 − δ/2, θ_2 = π/2 + δ/2, θ_3 = π + δ, θ_4 = π + 2δ, θ_5 = −2δ, θ_6 = −δ, with δ = π/12.
• Similarly for the food: the food characteristics are at the polygon vertices that, by symmetry with respect to the center of the circles, have angles α_s = π + θ_s, s = 1, . . . , 6, where the θ_s are the angles of the wine's characteristics.

Let

$$W = \{(x_s, y_s),\ s = 1, \dots, S_W\}, \qquad F = \{(u_s, v_s),\ s = 1, \dots, S_F\}$$

be the two polygons. Applying Green's theorem (cf. Kaplan 1991) to the region enclosed by each polygon, one can easily compute A_W and A_F, i.e., the signed areas of the polygons. The vertices must be ordered clockwise or counterclockwise: if they are ordered clockwise, the area has negative sign. The wine polygon has signed area

$$A_W = \frac{1}{2} \sum_{i=1}^{S_W} (x_i y_{i+1} - x_{i+1} y_i), \tag{22}$$

where x_{S_W+1} = x_1 and y_{S_W+1} = y_1 (cf. Bockman 1989). Equation (22) can be interpreted as the cross product of the two column arrays formed by the coordinates of the vertices. One also has to consider the location of the centroid of each polygon, i.e., the geometrical analogue of the center of mass. The general formula is well known. For the sake of completeness, the coordinates of the centroid of the food polygon are:

$$\bar{u}_F = \frac{1}{6 A_F} \sum_{i=1}^{S_F} (u_i + u_{i+1}) (u_i v_{i+1} - u_{i+1} v_i) \tag{23}$$

$$\bar{v}_F = \frac{1}{6 A_F} \sum_{i=1}^{S_F} (v_i + v_{i+1}) (u_i v_{i+1} - u_{i+1} v_i) \tag{24}$$
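Equations (22)–(24) translate almost literally into code. The following is a minimal sketch, assuming the angles listed above and that every rating is positive, so that the vertices are traversed counterclockwise. The example ratings and all function names are hypothetical and serve only to illustrate the formulas.

```python
import math

DELTA = math.pi / 12
# Wine angles theta_1..theta_6 as listed in the text; food angles are pi + theta_s.
WINE_ANGLES = [math.pi / 2 - DELTA / 2, math.pi / 2 + DELTA / 2, math.pi + DELTA,
               math.pi + 2 * DELTA, -2 * DELTA, -DELTA]
FOOD_ANGLES = [math.pi + theta for theta in WINE_ANGLES]


def vertices(ratings, angles):
    """Map ratings k in 0..10 to points (k cos(theta), k sin(theta))."""
    return [(k * math.cos(a), k * math.sin(a)) for k, a in zip(ratings, angles)]


def edge_terms(poly):
    """Yield (x_i, y_i, x_{i+1}, y_{i+1}, x_i*y_{i+1} - x_{i+1}*y_i) for each edge."""
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]  # index wraps to close the polygon
        yield x0, y0, x1, y1, x0 * y1 - x1 * y0


def signed_area(poly):
    """Signed area as in Eq. (22); negative for clockwise vertex order."""
    return 0.5 * sum(cross for *_, cross in edge_terms(poly))


def centroid(poly):
    """Centroid as in Eqs. (23) and (24)."""
    area = signed_area(poly)
    cx = sum((x0 + x1) * c for x0, _, x1, _, c in edge_terms(poly)) / (6 * area)
    cy = sum((y0 + y1) * c for _, y0, _, y1, c in edge_terms(poly)) / (6 * area)
    return cx, cy


# Hypothetical wine ratings, one per angle theta_1..theta_6.
wine = vertices([7, 6, 5, 6, 7, 7], WINE_ANGLES)
print(f"wine area = {signed_area(wine):.2f}, centroid = {centroid(wine)}")
```

The modulo in `edge_terms` implements the convention x_{S+1} = x_1, y_{S+1} = y_1, and the centroid is undefined for a degenerate polygon of zero area.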

Matching Algorithm (MA)

Let us start by constructing the graph of Fig. 5 and the corresponding polygons. In each polygon, the two moment lines corresponding to the centroidal principal moments about axes parallel to the Cartesian ones are indicated. Then one finds the centroid, as described above. The matching algorithm (MA) can be summarized as follows:

1. Find the equations of the moment lines.
2. Apply a roto-translation to the food polygon (so that the two polygons have the same centroid).
3. Analyze the area of the difference

$$E(W, F) = (W \cup F) \setminus (W \cap F), \tag{25}$$

to check if the wine matches the food. Mathematically speaking, one has to face the question of finding the minimum of E(W, F). From linear programming, each (convex) polygon represents a planar region that can be represented as A x ≤ b. Hence, W := A1 x

j , xi = hi ; • xj = 0; • for i < j , xi = gi if Y =H yi = 0, xi = 1 − gi if Y =H yi = 1, where the games Y range over the nonselected heaps and their respective heap sizes are y.

Conway’s Theory of the Full Class of Normal Play

In the 1970s and 1980s, Berlekamp, Conway, and Guy generalized the normal play convention (Berlekamp et al., 2001; Conway, 2000), in the spirit of Milnor’s earlier positional games (see section “Positional Games with Nonnegative Incentive”), to partizan games, where the players do not necessarily have the same move options. The 0-game is the empty game where no player has an option. Two results are possible: Left (the maximizer) wins or Right (the minimizer) wins. This gives rise to four outcome classes, N, P, L, and R, corresponding to a win for the Next player, the Previous player, Left, and Right, respectively. Hence, these four outcomes are partially ordered, with L > P > R and L > N > R, where the outcomes P and N are incomparable. When a starting player is given, the result in optimal play is simply Left wins (L) or Right wins (R). In this spirit, we depart from the standard notation (Berlekamp et al., 2001) and write o(G) = (oL(G), oR(G)). We get:

P: Left wins if Right starts and Right wins if Left starts; oL(G) = R and oR(G) = L
N: Left wins if Left starts and Right wins if Right starts; oL(G) = L and oR(G) = R
L: Left wins if Right starts and Left wins if Left starts; oL(G) = L and oR(G) = L
R: Right wins if Right starts and Right wins if Left starts; oL(G) = R and oR(G) = R
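The four outcome classes can be computed by a direct recursion on the game tree. The sketch below assumes a hypothetical encoding, not taken from the text: a game is a pair (Left options, Right options) of tuples of games, and a player who cannot move loses (normal play). The names `ZERO`, `ONE`, `MINUS_ONE`, and `STAR` are the usual small examples.

```python
from functools import lru_cache

# A game is encoded as (left_options, right_options), each a tuple of games.
ZERO = ((), ())              # no player has an option
ONE = ((ZERO,), ())          # only Left can move, to 0
MINUS_ONE = ((), (ZERO,))    # only Right can move, to 0
STAR = ((ZERO,), (ZERO,))    # either player can move to 0


@lru_cache(maxsize=None)
def left_wins_moving_first(game):
    """Left wins moving first iff some Left option is lost by Right moving first."""
    left_options, _ = game
    return any(not right_wins_moving_first(option) for option in left_options)


@lru_cache(maxsize=None)
def right_wins_moving_first(game):
    """Right wins moving first iff some Right option is lost by Left moving first."""
    _, right_options = game
    return any(not left_wins_moving_first(option) for option in right_options)


def outcome_pair(game):
    """Return (oL(G), oR(G)): the winner when Left starts and when Right starts."""
    o_left = "L" if left_wins_moving_first(game) else "R"
    o_right = "R" if right_wins_moving_first(game) else "L"
    return o_left, o_right


OUTCOME_CLASS = {("R", "L"): "P", ("L", "R"): "N", ("L", "L"): "L", ("R", "R"): "R"}


def outcome_class(game):
    """Classify a game as P, N, L, or R from its outcome pair."""
    return OUTCOME_CLASS[outcome_pair(game)]


for name, game in [("0", ZERO), ("1", ONE), ("-1", MINUS_ONE), ("*", STAR)]:
    print(name, outcome_pair(game), outcome_class(game))
```

The dictionary `OUTCOME_CLASS` is exactly the list above read as a lookup table from the pair (oL(G), oR(G)) to the class.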

73 CombinArtorial Games


This gives rise to a partial order of equivalence classes of games, by using a notion of disjunctive sum, suitably generalized for partizan games: in the game G + X, Left plays either in the G component to G^L + X or in the X component to G + X^L, and similarly for Right (where G^L denotes a typical Left option of G).

Definition 1. [Partial order] For all games G and H, G ≥ H if, for all games X, o(G + X) ≥ o(H + X).

This definition is the right one, since we want to be as careful as possible: we consider the behavior of the games G and H in any given context, i.e., in any normal play disjunctive sum. The conjugate of G, denoted by −G, is defined by (recursively) swapping the players’ roles. We will prove that this is the negative of G, i.e., that G + (−G) = 0. This will serve to prove that G ≥ H is equivalent to Left winning G + (−H) = G − H playing second, i.e., oR(G − H) = L; this is the fundamental theorem of normal play games. Equivalently, Left wins G − H independently of who plays first if and only if G − H > 0. In fact, this implies a bijection between the partial order relations and the outcome classes.

Let us first motivate the result by studying the classical partizan normal play game of DOMINEERING. Left plays vertical (blue) pieces, and Right plays horizontal (red) pieces. If the game board is a 2 by 2 square, then the first player wins in one move. For example, if Left moves, then, without loss of generality, the game board becomes the shape to the right.

Note that this position is terminal if and only if Right starts. (Observe that a non-empty NIM heap is a next player win, but it carries much less urgency than the 2 by 2 DOMINEERING position. In general, there is more incentive to play in a DOMINEERING position than in a NIM heap. Such ideas are developed in temperature theory.) Suppose that we want to justify that

[Figure: a DOMINEERING position] > [Figure: a second DOMINEERING position]

A consequence of the fundamental theorem of normal play is that this is true if and only if Left wins the following compound independently of who starts:


U. Larsson

[Figure: a DOMINEERING position] + [Figure: a second DOMINEERING position]

And this is easy to verify, because Left can terminate each component in one move, whereas Right, by moving first in a component, must leave Left with a free move in that component. That the second component has the correct shape follows from the facts that normal play games form a group and that the negative of a game is obtained by swapping the players. By using a mimic strategy, one can prove that Definition 1 implies a group structure.

1. We must prove that, for all games G, o(G − G + X) = o(X), for any game X. To prove oL(G − G + X) ≥ oL(X), we may assume that oL(X) = L (for otherwise we are done). Left has an optimal move in X since oL(X) = L. This move can be used to play to the game G − G + X^L. If Right responds in G − G, then Left mimics, and otherwise Left continues in the X component. To prove oR(G − G + X) ≥ oR(X), we may assume that oR(X) = L, and the argument is analogous (if Right plays in the X component, Left has a winning strategy, and otherwise Left mimics). The remaining two cases are symmetric.

Next, we prove that oR(G − H) = L if and only if G ≥ H. By the group structure, it suffices to study the inequality G − H ≥ 0; that is, we may instead prove that, for any game G, oR(G) = L if and only if G ≥ 0.

2. Note that, if G ≥ 0, then X = 0 gives oR(G + 0) ≥ oR(0) = L.
3. For the other direction, assume that oR(G) = L; we must prove that, for all X, o(G + X) ≥ o(X). First we prove that oR(G + X) ≥ oR(X), and we may assume that oR(X) = L. Then Left can play in the same component as Right until the end of the game, and Left wins G + X, since Left wins both individual games (Left can force Right to start in G). Next we prove that oL(G + X) ≥ oL(X), and we may assume that oL(X) = L. Since Left has a winning strategy in X, Left can force Right to start in G and then win by assumption.

Thus, we have proved constructive game comparison for normal play combinatorial games.
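The constructive comparison just proved can be tried out on small games. The sketch below again assumes a hypothetical encoding of a game as a pair (Left options, Right options). It implements the disjunctive sum, the conjugate −G, and the test "G ≥ H if and only if Right, moving first, loses G − H", i.e., oR(G − H) = L. The function names are illustrative.

```python
from functools import lru_cache

# A game is encoded as (left_options, right_options), each a tuple of games.
ZERO = ((), ())
ONE = ((ZERO,), ())
STAR = ((ZERO,), (ZERO,))


def neg(game):
    """Conjugate -G: recursively swap the roles of Left and Right."""
    left, right = game
    return (tuple(neg(g) for g in right), tuple(neg(g) for g in left))


def add(g, h):
    """Disjunctive sum G + H: a move is made in exactly one component."""
    g_left, g_right = g
    h_left, h_right = h
    left = tuple(add(gl, h) for gl in g_left) + tuple(add(g, hl) for hl in h_left)
    right = tuple(add(gr, h) for gr in g_right) + tuple(add(g, hr) for hr in h_right)
    return (left, right)


@lru_cache(maxsize=None)
def first_player_wins(game, left_starts):
    """Normal play: the starting player wins iff some move leaves a lost position."""
    options = game[0] if left_starts else game[1]
    return any(not first_player_wins(option, not left_starts) for option in options)


def greater_equal(g, h):
    """G >= H iff Left wins G - H playing second, i.e., oR(G - H) = L."""
    return not first_player_wins(add(g, neg(h)), False)


def equal(g, h):
    """G = H iff G >= H and H >= G."""
    return greater_equal(g, h) and greater_equal(h, g)


print(greater_equal(ONE, ZERO), greater_equal(ZERO, ONE))    # 1 versus 0
print(greater_equal(STAR, ZERO), greater_equal(ZERO, STAR))  # * versus 0
print(all(equal(add(g, neg(g)), ZERO) for g in (ZERO, ONE, STAR)))
```

The last line checks G − G = 0 on three small games, which is the mimic-strategy statement of item 1 in miniature. The recursion is exponential in general, so this is only meant for tiny examples.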
Next, we will look at Milnor’s positional (scoring) games that preceded Conway’s normal play by about 20 years (but which is nowadays known to be a subgroup of normal play if we restrict the scores to dyadic rationals and identify “scores” with “moves”).


Positional Games with Nonnegative Incentive Von Neumann and Morgenstern’s classical game theory concerns strategic form games, where utility functions with numerical values define equilibria via simultaneous play and mixed strategy profiles. The setting is quite different from CGT, and a concept such as “disjunctive sums of games” is rarely encountered. One early departure is Milnor’s so-called positional games (Hanner, 1959; Milnor, 1953). These are perfect information extensive form games, and moreover, they are “partizan” games with nonnegative incentive. Nonnegative incentive means that it is never bad to play first, a notion that is intimately attached to the concept of a disjunctive sum of games. (But note that this concept is not meaningful in normal play games, because every P-position has negative incentive.) Positional games are zero-sum, and hence, given a starting player, a backward induction algorithm determines optimal play and, the corresponding equilibrium utility, which is a real number, a score. In CGT terms Milnor’s positional games correspond to so-called dicot scoring games without zugzwangs. Dicot means that if one of the players has a move option, then so does the other player. A recreational game example for attempted membership is the “placement game” BLOKUS. Players place tiles in form of polyominoes on a finite grid, for each player, chosen from a given finite set. The game ends when both players pass, because they cannot place any of their remaining tiles, and scoring starts. The player who tiles the largest area wins. This popular ruleset is slightly outside the class, because the pass rule is a global feature, which does not apply to individual game components in a disjunctive sum of games (for a similar reason, the popular “impartial” scoring game DOTS AND BOXES lands slightly outside the class). 
Let us instead illustrate the ideas of Milnor’s positional games with a scoring variation of the game DOMINEERING from section “Conway’s Theory of the Full Class of Normal Play”. The membership question is sensitive to the particular scoring rules, and because of the dicot rule, a final score must be assigned to any component where at most one of the players can move. Therefore, the final score for player Left, say, is the number of vertical tiles on the game board plus the maximal number of remaining move options for Left in each terminated game component; note that a generic game decomposes during play. The final score for Right is computed analogously, and the total score is the difference between them. Clearly this scoring variation of DOMINEERING has nonnegative incentive, and it is dicot by definition. (In fact, optimal play in this scoring variation is very similar to optimal normal play; only the parity, or the so-called infinitesimal part, of normal play differs.)

An outcome function o(G) = (oL(G), oR(G)) = (ℓ, r) defines the optimal play scores ℓ and r given the maximizing and minimizing player as starting player, respectively. Left is the maximizing player and Right is the minimizing player. (That is, ℓ or r is the game value if Left or Right starts, respectively.) This is similar to normal play, but outcomes are now ordered pairs of reals. Milnor was the first author to understand how one should define equality of games in the context of games such as GO and in view of the disjunctive sum theory of NIM and its Sprague-Grundy generalization. Since outcomes are partially ordered, it is more convenient to define inequality, and thus we reuse Definition 1. The neutral element 0 is the terminal game where no player has an option and the score is 0 (no matter who starts).

Milnor proved that the restriction of nonnegative incentive permits a constructive comparison of games. (And this property is closed under disjunctive sum.) That is, to understand the order relation between the games G and H, it suffices to play on the games G and H, i.e., the “for all X” part is redundant. See item 2 below, where we prove that, for all games G and H, G ≥ H is equivalent to o(G − H) ≥ 0. Moreover, his definition gives nontrivial (big) equivalence classes of games, and this fact initiated a field of research, which fits under the umbrella of combinatorial game theory, CGT. We encourage the reader to check the example in section “Conway’s Theory of the Full Class of Normal Play”, but now in the case of SCORING DOMINEERING. Indeed, the composite game in the third picture has total final score 1 point, independently of who starts, which proves “G > H”. A useful inequality says that, for any games G and H,

$$o_L(G) + o_R(H) \le o_L(G + H), \tag{5}$$

because if Left starts by playing to G^L + H, then from then on Left can follow Right’s choice of game component, unless, at some point, one of the two components has run out of move options. In this case, by the nonnegative incentive criterion, Left is not worse off by continuing instead in the other component.

Similar to normal play, we prove first that this monoid of games is in fact a group and that the inverse of G is obtained by swapping the players’ roles. We prove that o(G − G + X) = o(X) for all X. By symmetry it suffices to study one of the outcome functions, say oR.

1a. To prove oR(G − G + X) ≥ oR(X), we find a Left response to any Right play in G − G + X. If Right plays optimally to, say, G − G + X^R, and Left responds with the optimal move in X^R to, say, G − G + X^{RL}, we get

\begin{align}
o_R(G - G + X) &= o_L(G - G + X^{R}) \tag{6} \\
&\ge o_R(G - G + X^{RL}) \tag{7} \\
&\ge o_R(X^{RL}) \tag{8} \\
&\ge o_R(X), \tag{9}
\end{align}

where (6) is by assumption, (7) is by Left playing, (8) is by induction, and (9) is by Right playing and Left playing optimally. If X^R is the final move in the X component, then Left starts in G − G, and by the nonnegative incentive condition, she does at least as well as if Right starts. If Right starts in G − G, Left has the mimic strategy, and the previous argument is repeated.

1b. To prove oR(G − G + X) ≤ oR(X), Right has a mimic strategy to any Left play in G − G, and Right can start by playing to G − G + X^R, where X^R is Right’s optimal play in the game X. If Right gets to start in the G − G component, then he is better off, by the nonnegative incentive criterion.

2. Next we prove that for all games G and H, G ≥ H is equivalent to o(G − H) ≥ 0. The forward direction follows from the proved group structure by taking X = 0. For the other direction, similarly, since G ≥ H if and only if G + (−H) ≥ 0, it suffices to prove that o(G − H) ≥ 0 implies, for all X, o(G − H + X) ≥ o(X).

3. Thus, to simplify notation, we prove instead that o(G) ≥ 0 implies, for all X, o(G + X) ≥ o(X). We begin by proving that oR(G) ≥ 0 implies, for all X, oR(G + X) ≥ oR(X). If Right plays optimally to G + X^R and Left can respond optimally in X, the inequality holds by induction. If Left has no move in X^R, then oR(G + X) = oL(G) + oR(X) ≥ oR(X), by o(G) ≥ 0. Similarly, if Right plays optimally to G^R + X, then by nonnegative incentive (G^R is perhaps not optimal in G alone), inequality (5), and the assumption oR(G) ≥ 0, we get oR(X) ≤ oR(G) + oR(X) ≤ oL(G^R) + oR(X) ≤ oL(G^R + X) = oR(G + X).

4. By the same inequality (5), it follows that oR(G) ≥ 0 implies, for all X, oL(G + X) ≥ oR(G) + oL(X) ≥ oL(X).

It turns out that Milnor’s games are congruent with a subgroup of normal play games, namely, those without so-called infinitesimal games (corresponding to reduced canonical forms), and this was explained in full detail in the survey part of a recent paper (Larsson et al., 2018b).
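The outcome function (ℓ, r) and inequality (5) can be explored on small examples by backward induction. The sketch below assumes a hypothetical encoding: a terminal position is a number (its score), and any other position is a pair (Left options, Right options) in which both players have at least one move (the dicot assumption). The two sample games `G` and `H` are simple switches invented for illustration, not positions from the text.

```python
from functools import lru_cache


def is_score(game):
    """Terminal positions are encoded as plain numbers (the final score)."""
    return isinstance(game, (int, float))


def options(game, left_moves):
    """Move options of the given player; a terminal score has none."""
    if is_score(game):
        return ()
    return game[0] if left_moves else game[1]


def neg(game):
    """Swap the players' roles and negate all scores."""
    if is_score(game):
        return -game
    return (tuple(neg(g) for g in game[1]), tuple(neg(g) for g in game[0]))


def add(g, h):
    """Disjunctive sum; scores add up once both components are terminal."""
    if is_score(g) and is_score(h):
        return g + h
    left = tuple(add(gl, h) for gl in options(g, True)) + \
        tuple(add(g, hl) for hl in options(h, True))
    right = tuple(add(gr, h) for gr in options(g, False)) + \
        tuple(add(g, hr) for hr in options(h, False))
    return (left, right)


@lru_cache(maxsize=None)
def o(game, left_starts):
    """Optimal score by backward induction: Left maximizes, Right minimizes."""
    if is_score(game):
        return game
    if left_starts:
        return max(o(option, False) for option in game[0])
    return min(o(option, True) for option in game[1])


def outcome(game):
    """The pair (oL(G), oR(G)) = (l, r)."""
    return o(game, True), o(game, False)


G = ((1,), (-1,))   # hypothetical switch: Left moves to score 1, Right to score -1
H = ((2,), (0,))    # hypothetical switch: Left moves to score 2, Right to score 0

print(outcome(G), outcome(H), outcome(add(G, H)))
print(outcome(add(G, neg(G))))   # G - G should behave like the neutral element 0
```

On these two switches, inequality (5) holds with equality, and G − G has outcome (0, 0), in line with the group structure proved above. The encoding relies on the dicot assumption: a non-terminal position with no options for the player to move would make `max` or `min` fail.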
Later, Ettinger studied a class of positional scoring games with zugzwang (Ettinger, 1996), and his work was generalized further to the class of guaranteed scoring play (Larsson et al., 2016a, 2018b) and to absolute combinatorial game theory (Larsson et al., 2016b, 2018a). In fact, by using guaranteed scoring play, the somewhat artificial scoring method of SCORING DOMINEERING can be removed, and one may instead simply count the total area of the placed tiles. A guaranteed rule is invoked, where a player who can still move when the game ends may cash in any remaining possible placements, but there is no need to keep track of individual components; no component need be terminated before the full game ends. We could also allow zugzwang positions, where players are sometimes forced to place a piece of the opponent, and there is still a constructive or “play” game comparison, similar to what has been described here. We refer the reader to those papers for more on these theories.


“Wythoff’s Star” Larsson et al (2011); Larsson (2012a)

Patterns of Generalized Games

The solutions of combinatorial heap games can be visualized, revealing a richness of patterns. This happens frequently for generalizations of NIM and WYTHOFF NIM, and we would like to share some of our findings. (Related papers (Fink 2012; Larsson 2013b; Larsson and Wästlund 2013) show that standard variations of heap games encompass computational universality, so in theory all kinds of patterns could appear in visualizing solutions; the artistic part embraces the intuition of where to look.) In the Epilogue, we explain the idea behind some of the pictures.


Epilogue

An Emperor wants to hire a new Advisor, and only two contestants remain to be considered. He sets up the following game to test their skills. The Emperor has been on a visit to foreign territory and wants to return to his fortress in the northeast corner of his empire. At each step, he may travel arbitrarily far toward this goal, but only along prescribed directions. He lets the candidate Advisors alternate in prescribing the current move. The Advisor who brings him back to his safe Fortress wins the prominent position. To make the contest more interesting, for a fixed positive integer b, at each stage of the game, he allows the non-advising Advisor to occupy, or block, at most b positions with hostile Pawns (the Emperor can jump). At each new stage of play, any previous blocking maneuver is forgotten, and the Pawns are handed over to the other player.

Fig. 2 Left: The Emperor’s move directions and his fortress in the NE. Right: Some palace positions are blocked off by hostile Pawns

In Fig. 2, we see an example where the Emperor moves as the Queen of CHESS, with blocking number b = 4, and where the olive green area in the NE is the safe fortress. The yellow colors represent (intermediate) safe palaces, whereas the black, blue, and purple colors represent various degrees of positions from which the Emperor can move to a palace position (or the fortress), independently of how the non-advising Advisor blocks off at most four positions. In the rightmost picture, we see an unsuccessful blocking of three positions. Note: even if one more hostile Pawn were placed to block one more yellow palace, there would still remain one safe palace within the prescribed directions. (Of course, the color coding is not revealed to the players.)

In the above pictures, we have often departed from the color code used in this example, but the idea is the same. Moreover, the prescribed move directions vary, the blocking number varies, and often we show only the local behavior of the terrain. In addition to the given directions, in some pictures a small finite number of additional move options have been added, and in one of the pictures, an additional forced asymmetrical palace position has been adjoined to perturb otherwise symmetric game rules.

By a backward induction approach, exactly one of the two Advisors will fail to put the Emperor in safe palaces, because hostile Pawns will occupy all (at most b) palaces in view. The colors in the pictures represent the number of palaces the Emperor sees from the given position. Differing coloring schemes are used to highlight various appearing structures. Each pixel in a picture represents one game position. A reader might want to zoom in to discover patterns of individual palace numbers in various pictures. (Please feel free to email me with questions, suggestions, and comments on the pictures/games. There is still a lot to be done to explore these types of games.)


By following the Emperor-Pawn idea, we define a general class of games on two heaps, with a blocking maneuver, for which only very few special cases have been studied in the literature. A move option from the position $(x, y)$ is $(x - r, y - s) \ge (0, 0)$, where the pair $(r, s)$ belongs to a given set $M \subset \mathbb{N} \times \mathbb{N}$ of move options. This defines the vector subtraction game M. For example, NIM satisfies $M = \{(m, 0), (0, m) \mid m \in \mathbb{N}\}$. Let L be a given finite set of pairs of nonnegative integers, $(0, 0) \notin L$. A linear restriction to the class is of the form $M(L) = \{(\ell_1 m, \ell_2 m) \mid (\ell_1, \ell_2) \in L, m \in \mathbb{N}\}$. Thus, for example, WYTHOFF NIM is the linear vector subtraction game $M(\{(0, 1), (1, 0), (1, 1)\})$. Let $S \subset \mathbb{N} \times \mathbb{N}$ be a finite set of pairs of nonnegative integers, $(0, 0) \notin S$. Then, the S alteration of the linear game $M(L)$ is the symmetric difference $M(L, S) = (M(L) \setminus S) \cup (S \setminus M(L))$ (and this is what the Emperor sees). For example, the game MAHARAJA NIM is as WYTHOFF NIM but with the alteration $S = \{(1, 2), (2, 1)\}$.

Given a game M, the b-blocking maneuver defines the game Mk: at each stage of the game, before the current player moves, the other player is allowed to block off at most b move options (and these are the hostile Pawns). When the current player has moved, any blocking maneuver is forgotten. A player who cannot move because all move options are blocked off by the other player loses. Outside of this definition, one may impose a finite number of forced terminal positions, or similarly a finite number of positions with moves to (0, 0), and in this case we say that the game is perturbed (the only perturbed picture is the Butterfly in Amazonas). One would perhaps think that such a small change would not change the outcome patterns dramatically, but we have not yet seen any proof to support or dismiss this guess.

The game rules give a way to recursively compute the palace numbers for each position. But there is a more efficient way, via a dynamic programming algorithm.
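As an illustration of the definitions above, the following Python sketch builds the move set M(L, S) on a bounded board and computes palace numbers by plain backward induction. It is only a sketch under stated assumptions: the names move_set, palace_numbers, and palaces are hypothetical, the board is truncated to n × n positions, and a position is taken to be a palace exactly when at most b of its move options are palaces (so that the hostile Pawns can block them all).

```python
from itertools import product

WYTHOFF = [(0, 1), (1, 0), (1, 1)]      # directions L of WYTHOFF NIM
MAHARAJA_S = [(1, 2), (2, 1)]           # alteration S of MAHARAJA NIM


def move_set(L, S, n):
    """Moves (r, s) of the altered linear game M(L, S), truncated to r, s < n."""
    linear = {
        (l1 * m, l2 * m)
        for (l1, l2) in L
        for m in range(1, n)
        if l1 * m < n and l2 * m < n
    }
    return linear ^ set(S)  # symmetric difference, as in the definition


def palace_numbers(n, L, S=(), b=0):
    """count[x][y] = number of palaces among the move options of (x, y).

    Assumption of this sketch: (x, y) is a palace iff count[x][y] <= b.
    """
    moves = move_set(L, S, n)
    assert (0, 0) not in moves, "a move must remove something"
    count = [[0] * n for _ in range(n)]
    # Lexicographic order guarantees every option is computed before it is used.
    for x, y in product(range(n), repeat=2):
        count[x][y] = sum(
            1
            for (r, s) in moves
            if r <= x and s <= y and count[x - r][y - s] <= b
        )
    return count


def palaces(count, b=0):
    """Set of positions whose palace number is at most b."""
    n = len(count)
    return {(x, y) for x in range(n) for y in range(n) if count[x][y] <= b}
```

With b = 0 and the WYTHOFF NIM directions, the palaces on an 8 × 8 board are (0, 0), (1, 2), (3, 5), (4, 7) and their reflections, which is the classical solution of the game without blocking.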
Suppose that, for all positions smaller than X, we have computed the number of palaces the Emperor sees in each given direction (given by L). Then, we compute the number of palaces the Emperor sees in each direction at position X from the number stored at the nearest position in that direction, adding one whenever that nearest position is itself a palace. In this way, it suffices to store a linear number of positions (the standard backward induction approach uses the whole game board, a quadratic number of positions).

Most of the pictures have not appeared in the literature, with a few exceptions in Larsson (2012a, 2013a), Cook et al. (2017), Friedman et al. (2019), and Larsson et al. (2011). The overarching idea is that, when we find interesting game variations, the pictures of their solutions may carry appealing features. Conversely, one may ask, when we see interesting patterns, can they be simulated by some elegant game rules? (For sequences and games generalizing WYTHOFF NIM, see a recent survey (Duchene et al., 2019).) The last picture does not originate in any game rules; it was produced while I was still developing the code, which at that point contained various mistakes that I cannot recreate. Most of the pictures stem from both combinatorial games and cellular automata, and they generalize the games studied in Larsson (2011, 2012b, 2014), Larsson and Wästlund (2014), Cook et al. (2017), and Friedman et al. (2019).
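The dynamic algorithm described above can be sketched in Python as follows. This is a hedged illustration rather than the code used for the pictures: the function name palace_numbers_dynamic is hypothetical, only linear games M(L) without alterations are handled, the directions in L are assumed to be pairwise non-parallel, and a position is assumed to be a palace when its palace number is at most b. For readability the sketch keeps full tables; since each entry only reads the position one step back in each direction, a sliding window of rows would give the linear storage mentioned in the text.

```python
def palace_numbers_dynamic(n, directions, b=0):
    """Palace numbers on an n-by-n board for the linear game M(L).

    For each direction d, seen[d][x][y] is the number of palaces visible
    from (x, y) along d. It is obtained from the nearest position (x, y) - d:
    whatever that position sees along d, plus one if it is itself a palace.
    """
    directions = [tuple(d) for d in directions]
    assert (0, 0) not in directions, "a direction must remove something"
    seen = {d: [[0] * n for _ in range(n)] for d in directions}
    total = [[0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            for d in directions:
                px, py = x - d[0], y - d[1]
                if px >= 0 and py >= 0:
                    nearest_is_palace = total[px][py] <= b
                    seen[d][x][y] = seen[d][px][py] + int(nearest_is_palace)
            # The palace number is the sum over all prescribed directions.
            total[x][y] = sum(seen[d][x][y] for d in directions)
    return total
```

Each position costs one lookup per direction, instead of a scan along every ray, which is where the saving over the plain recursive computation comes from.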


A game may exhibit a large-scale geometry in which the individual patterns of regions and the apparent “borders” of shapes are hard nuts to crack. However, sometimes methods from physics help out, as we see in a recent paper (Friedman et al., 2019), where various linear extensions of WYTHOFF NIM have been explored (but without a blocking maneuver).

Acknowledgments

I thank the participants of the two introductory CGT workshops at Ohio State University and IIT-Bombay, organized by Dr. Érika B. Roldán Roa and by Prof. Mallikarjuna Rao and Dr. Ravi Kant, respectively. They have inspired much of this book chapter. Thanks to Gal Cohensius, Melissa Huggan, Richard Nowakowski, Pia Moberg and family, Ofer Zivony, David Wahlstedt, Hans Ekbrand, and Silvia Heubach for comments that helped improve this chapter.

References

Beatty S (1926) Problem 3173. Am Math Mon
Berlekamp ER, Conway JH, Guy RK (2001) Winning ways for your mathematical plays, vol 1–4, 2nd edn. A K Peters, Ltd., Natick
Berstel J (1986) Fibonacci words – a survey. In: The book of L. Springer, pp 13–27
Bouton CL (1901) Nim, a game with a complete mathematical theory. Ann Math 3(1/4):35–39
Cole AJ, Davie A (1969) A game based on the Euclidean algorithm and a winning strategy for it. Math Gaz 53(386):354–357
Conway JH (2000) On numbers and games, 2nd edn. AK Peters/CRC Press
Cook M, Larsson U, Neary T (2017) A cellular automaton for blocking queen games. Nat Comput 16(3):397–410
Duchene E, Fraenkel AS, Gurvich V, Ho NB, Kimberling C, Larsson U (2019) Wythoff visions. In: Larsson U (ed) Games of no chance 5. Mathematical sciences research institute publications, vol 70. Cambridge University Press, pp 35–87
Ettinger JM (1996) Topics in combinatorial games. PhD thesis, University of Wisconsin–Madison
Fink A (2012) Lattice games without rational strategies. J Comb Theory Ser A 119(2):450–459
Friedman E, Garrabrant SM, Phipps-Morgan IK, Landsberg AS, Larsson U (2019) Geometric analysis of a generalized Wythoff game. In: Larsson U (ed) Games of no chance 5. Mathematical sciences research institute publications, vol 70. Cambridge University Press, pp 351–380
Gardner M (1989) Penrose tiles to trapdoor ciphers. Freeman, New York. Chapter 8, section 6, “corner the lady”
Grundy PM (1939) Mathematics and games. Eureka 2(6–8):21
Hanner O (1959) Mean play of sums of positional games. Pac J Math 9(1):81–99
Isaacs RP, Berge C (1962) The theory of graphs and its applications. London: Methuen & Co; New York: Wiley, Chap. 6
Larsson U (2011) Blocking Wythoff nim. Electron J Comb 18(1):120
Larsson U (2014) Wythoff nim extensions and splitting sequences. J Integer Sequences 17(2):3
Larsson U, Wästlund J (2013) From heaps of matches to the limits of computability. Electron J Comb 20(3):P41
Larsson U, Wästlund J (2014) Maharaja nim: Wythoff’s queen meets the knight. Integers 14(G05)
Larsson U, Hegarty P, Fraenkel AS (2011) Invariant and dual subtraction games resolving the Duchêne–Rigo conjecture. Theor Comput Sci 412(8-10):729–735
Larsson U, Nowakowski RJ, Neto JP, Santos CP (2016a) Guaranteed scoring games. Electron J Comb 23(3):3–27
Larsson U, Nowakowski RJ, Santos CP (2016b) Absolute combinatorial game theory. arXiv preprint arXiv:1606.01975
Larsson U, Nowakowski RJ, Santos CP (2018a) Game comparison through play. Theor Comput Sci 725:52–63
Larsson U, Nowakowski RJ, Santos CP (2018b) Games with guaranteed scores and waiting moves. In: Special issue of combinatorial games, vol 47, Springer, pp 653–671
Larsson U (2012a) The *-operator and invariant subtraction games. Theor Comput Sci 422:52–58
Larsson U (2012b) A generalized diagonal Wythoff nim. Integers 12(5):1003–1027
Larsson U (2013a) Impartial games and recursive functions. Chalmers University of Technology
Larsson U (2013b) Impartial games emulating one-dimensional cellular automata and undecidability. J Comb Theory Ser A 120(5):1116–1130
Lekkerkerker CG (1952) Representation of natural numbers as a sum of Fibonacci numbers. Simon Stevin 29:190–195
Milnor J (1953) Sums of positional games. Contributions to the Theory of Games II 28:291–301
Ostrowski A (1922) Bemerkungen zur Theorie der Diophantischen Approximationen. Abh Math Sem Univ Hamburg 1(1):77–98. https://doi.org/10.1007/BF02940581
Ostrowski A, Hyslop J, Aitken A (1927) Solutions to problem 3173. Am Math Mon 34(3):159–160
Rayleigh JWSB (1896) The theory of sound, vol 2. Macmillan
Siegel AN (2013) Combinatorial game theory, vol 146. American Mathematical Soc.
Silber R (1976) A Fibonacci property of Wythoff pairs. Fibonacci Quart 14(4):380–384
Sprague R (1935) Über mathematische Kampfspiele. Tôhoku Math J 41:438–444
Stolarsky KB (1976) Beatty sequences, continued fractions, and certain shift operators. Can Math Bull 19(4):473–482. https://doi.org/10.4153/CMB-1976-071-6
Whinihan MJ (1963) Fibonacci nim. Fibonacci Quart 1(4):9–13
Wythoff WA (1907) A modification of the game of nim. Nieuw Arch Wisk 7(2):199–202
Zeckendorf E (1972) Représentation des nombres naturels par une somme de nombres de Fibonacci ou de nombres de Lucas. Bull Soc Roy Sci Liège 41:179–182

Combinatorial Artists: Counting, Permutations, and Other Discrete Structures in Art

74

Lali Barrière

Contents

Introduction
Combinatorics in Music
  Dodecaphonic Music
  Iannis Xenakis
  Tom Johnson
  Elliott Carter
  Further Examples
Combinatorics in Literature
  The Oulipo
  Juan Eduardo Cirlot
  Digital Poetry
Combinatorics in Visual Art
  Sol LeWitt
  Vera Molnar
  Manfred Mohr
  Vladimir Bonačić
  Anders Hoff Aka Inconvergent
  Other Combinatorial Visual Artists
Dance, Theatre, and Cinema
  Dance
  Theatre
  Cinema
Closing Time
References

L. Barrière
Departament de Matemàtiques, Universitat Politècnica de Catalunya, Barcelona, Spain

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_132


Abstract

This chapter is motivated by a question I asked myself: “How can combinatorial structures be used in a work of art?” Immediately, other questions arose: Are there artists who work or think combinatorially? If so, what works have they produced in this way? What are the similarities and differences between artworks produced using combinatorics? Combinatorics is a very transversal branch of mathematics. It is connected to logic, and it intervenes in the building of languages in general, such as natural language, musical language, and poetry. Combinatorics has been present in artistic practice for millennia, especially in music and poetry; for instance, in the shaped poems of Simmias of Rhodes in Ancient Greece. However, we are interested in artistic practices that are driven by the use of combinatorial ideas, structures, and methodologies, and in artists whose work is conceptualized or merely inspired by combinatorics. This often happens in connection with an interest in structure and language, and the phenomenon was significant in the twentieth century, as a by-product of the revolution that took place not only in art but in many other areas of knowledge. In this chapter, we present a survey of artists who think combinatorially, work combinatorially, and construct combinatorial artworks. The selection covers music, literature, visual arts (including digital art and an example of early physical interactive art), dance, theatre, and cinema. It is a non-exhaustive list of artists, selected to show differences and similarities in their ways of approaching art when using combinatorics.

Keywords

Combinatorics · Permutations · Music · Literature · Visual art · Computer art · Conceptual art

Introduction

Combinatorics is the study of sets, subsets, and other discrete, finite structures, as defined by a number of accepted combinatorial rules. Typical combinatorial problems concern the existence, counting, and optimization of a particular configuration. Combinatorics is related to many other branches of mathematics; for instance, enumerative combinatorics, or the “art of counting,” is the basis of (discrete) probability. That is to say, it is related to randomness in the sense that whenever you have a choice to make, the study of the range of that choice is a combinatorial problem. The basic configurations are the permutations and combinations of the elements of a set. Other combinatorial structures include graphs, hypergraphs, partitions of sets, and combinatorial designs. The latter are much more complex, include Latin squares and finite geometries, and have fundamental connections with finite fields, group theory, and number theory (Cameron, 1994; Stinson, 2004).


Motivated by the question, “How can combinatorial structures be used in a work of art?,” this chapter presents a survey of the relationship between combinatorics and the arts, focusing on the twentieth century. It is certainly possible to find combinatorial structures in artworks before the twentieth century. There are numerous examples in music, such as Bach’s fugues or Indian traditional music. In literature, especially in poetry, the use of permutations and rearrangements dates back to the fourth century, with the Carmina of Publilius Optatianus Porphyrius. Visual arts are more inclined to geometry, although one could argue that the symmetry groups in Islamic art and architecture are a particular case of discrete structure. Here, the emphasis is on artists that take a combinatorial approach to their work, and on works in which combinatorial structures are part of the creative process itself rather than simply being a tool to achieve some kind of “artistic” result. In such works, whether the artist intends to make the combinatorial structure visible or not, it is consciously and purposely made part of the work rather than being only a means to an end. This artistic approach is one of the consequences of the artistic revolution that took place, in consonance with the changes in science, philosophy, and technology, at the beginning of the twentieth century, with the avant-garde and the beginnings of abstract art. In most of the disciplines, the artistic language had exhausted its resources. Artists in search of a renaissance wanted to break with tradition and certainly did it in many different ways. Some of them opted for simply destroying the old rules, like the Dadaists, and later Burroughs; others looked for new resources, new materials. 
In music, Russolo, Edgar Varèse, and – some years later – John Cage, three musicians whose work aims to liberate sound, were open to including any kind of sound in their music, abolishing the distinction between “musical” and “not-musical” sounds. Other artists developed a growing interest in formalism as a defining aspect of their work, generating new narratives in literature and music, and working with simple geometric shapes to evidence abstraction and structure in the visual arts. Thus meaning became less important. In this context, the Constructivists and the artists of the Bauhaus school became interested in mathematics as a way to free their art from subjectivity. Unsurprisingly, in the visual arts, combinatorics comes together with geometry. The revolution begins with geometric abstraction, with works where the presence of combinatorics is not so evident. But the artists were already thinking combinatorially. For instance, a varied set of combinatorial schemes and diagrams can be found in the teaching notes of Paul Klee (see Fig. 1), who taught at the Bauhaus. Since the beginning of the artistic abstraction, the artistic language has become more diverse. What motivates an artist to use combinatorics, to create a system, to follow a process, in which combinatorics plays an important role? We have seen that combinatorial techniques are a resource for artists with an interest in formalism. Many of them intend to create an art with no meaning and, in the case of music and literature, with no narrative or with a non-linear narrative determined by a mathematical structure and process.


Fig. 1 Three pages of Paul Klee’s Teaching Notes on Pictorial Creation, from the period in which he taught at the Bauhaus, 1921 to 1931. (From the online database at Zentrum Paul Klee, public domain)

A desired consequence of this formalism is the abolition of the artist’s ego. Ultimately, the work of art will no longer be a masterpiece; rather, it becomes a process that, in theory, anyone could carry out. One seeks a more objective art. Although, in our opinion, this remains a utopia, it does open doors to new forms of working and creating.

Another approach with a long tradition is to confer a mystical meaning on the use of mathematical tools, particularly logic and combinatorics. This is connected with the ideas of Ramon Llull, who at the end of the thirteenth century designed a universal, rational system of knowledge, which he called Ars Combinatoria, based on logical and calculatory operations, translated into diagrams and “thinking machines” on paper. Because of its enormous influence on formal logic and algorithmic theories, it is considered a precursor of programming languages and their algorithmic logic. The ultimate goal of Llull was to get closer to God. The exhibition DIA-LOGOS. Ramon Llull and the Ars Combinatoria, which took place in 2018 at the Zentrum für Kunst und Medien in Karlsruhe, Germany, was dedicated to Llull’s work and life and to artists whose work can be related to Llull’s ideas (Vega et al., 2018).

Zweig (1997) presents some examples of art that make use of the basic combinatorial structures: combinations, permutations, and variations. Her focus is on literature and music, and she includes some works based on the use of randomness. Indeed, the use of combinatorics is often connected with the use of randomness, because discrete probability calculus is a combinatorial calculus. In art, this relates to the exploration of the possibilities of a configuration or a set of restrictions, often implying the notion of total exhaustion, which takes advantage of a new form of artwork, the serial work, and is enlarged by the advent of computers.
Indeed, combinatorics is a very transversal branch of mathematics which has undergone significant development since the arrival of computers. Before that, in the mid-nineteenth century, Charles Babbage devised his “analytical engine.”


Even though the machine was never built, Ada Lovelace designed a programming language for it. In the Notes to the translation of Menabrea (1843), she wrote:

[The Analytical Engine’s operating mechanism] might act upon other things besides number, were objects found whose mutual fundamental relations could be expressed by those of the abstract science of operations, and which should be also susceptible of adaptations to the action of the operating notation and mechanism of the engine. . . Supposing, for instance, that the fundamental relations of pitched sounds in the science of harmony and of musical composition were susceptible of such expression and adaptations, the engine might compose elaborate and scientific pieces of music of any degree of complexity or extent.

This quote shows how far ahead of her time Lovelace was: she conceived of Babbage’s machine as a general-purpose machine, as computers would later be. In doing so, she understood that the goal of the Analytical Engine was to manipulate discrete structures, “other things besides number,” as she says.

Combinatorics is at the foundations of computer science, and the two areas have benefited each other. Code can be considered a set of combinatorial rules; built upon logic and combinatorics, it is a means to manipulate discrete structures. Programming, that is, working with code, implies working with combinatorics. As with many other technologies, artists have used the computer and worked with code since the very beginning. It is not surprising that an important segment of the artists who use combinatorics in their work are digital or computer artists.

Nowadays, combinatorics is an available resource for artists of any discipline. It has become a common tool in the artistic language, or rather the set of artistic languages. Artists use it habitually, taking different approaches according to their involvement, background, and formalism. Some artists use combinatorics in only one work; others, the ones we are more interested in, devote their entire artistic life to works or creative processes based on combinatorics.

In the following pages, we present the works and ideas of some artists in music, literature, visual art, dance, theatre, and cinema. The list of artists we present is not exhaustive, for evident reasons of time and space. We focus on Western art of the twentieth century, on artists who decided to use combinatorics as a main tool in seeking a renewal of the artistic language, or on artists who, although using combinatorics in only a few of their works, did so in a significant and representative way. The examples are chosen to answer the following questions: “Are there artists who work or think combinatorially? If so, which works are the result of this mode of creation? What are the similarities and differences between combinatorial artworks?”

Combinatorics in Music

Music has been associated with mathematics from its very beginnings. Among the mathematical models that have inspired musical theories, combinatorics has had a continuous role because almost all musical systems are discrete. From the musical systems of the Middle Ages to the present day, combinatorics has been present, underlying scales and chords, rhythm patterns, and compositional techniques, as explained in Knoblock (2002) and Nolan (2000). Some famous examples are the fugal works by Johann Sebastian Bach, a master of counterpoint techniques. Some decades later, while Bernoulli and Euler were independently working on some combinatorial problems, the Musikalisches Würfelspiel appeared, a musical game that allowed the composition of polonaises and minuets. With strict aesthetics, this game presented a table of motifs, chosen with a die. The result was a classical musical piece with a pre-established structure, designed to avoid “undesirable” results. Some of the musicians who composed a Musikalisches Würfelspiel are Wolfgang Amadeus Mozart, Joseph Haydn, and Carl Philipp Emanuel Bach. In 1821, Dietrich Nikolaus Winkel constructed the first machine for automatic composition, the Componium (see Fig. 2), which was, in fact, a mechanized Musikalisches Würfelspiel.

Fig. 2 The Componium, invented in 1821 by Dietrich Nikolaus Winkel. (Source Wikipedia. Licence CC Attribution-Share Alike 4.0)

Benson (2006) dedicates a chapter to the study of symmetry in music, which includes a section on change ringing and permutation groups, and another on the use of Pólya’s enumeration theorem in solving music-related counting problems. Read (1997) presents a number of musical combinatorial problems, with a focus on Messiaen and Stockhausen. Sethares (2007) carried out an original study of rhythm, with a chapter dedicated to its combinatorial aspects. In fact, when composition is viewed as a methodology of choice, the range of options that a musical system provides is always a combinatorial problem (Loy 2006, chapter 9).

In the twentieth century, just as the conventions of figurative painting were radically transformed by abstraction, so too the fundamental forms of music were changed by the shift from tonal music to atonality. Musicians who wished to break with tradition began to explore new sonic territories, for which they needed new methodologies. This new way of thinking resulted in the stronger presence of combinatorics in modern musical works. The mid-century appearance of computers further enhanced this presence through the development of automatic composition systems.

Dodecaphonic Music

In addition to these new developments, there emerged dodecaphonic music, also known as serialism, developed by Arnold Schoenberg (1874–1951). This methodology of composition, based on the permutations of the 12 notes of the chromatic scale, was later transformed into a more organized system, so-called integral serialism. The first wave of serialist composers, the Second Viennese School, was a group formed in the early twentieth century by Arnold Schoenberg and his pupils, including Alban Berg (1885–1935) and Anton Webern (1883–1945), among others. With the objective of deconstructing tonal expectation, serialism works with 12-tone rows, that is, ordered sequences of the 12 notes in which each note appears exactly once. Schoenberg (1975) established the following rules, which he called Method of Composing with Twelve Tones Which are Related Only With One Another:

• A composition uses a particular ordering of the 12 notes. The chosen tone row is called the basic set. It can be viewed either as a sequence of notes or as a sequence of intervals, and it can be denoted by a permutation of the set {0, 1, ..., 11}. The notes can be freely applied to any octave.
• Each time a new tone is needed in the composition, the composer uses the next tone in the row, circling back to the first tone when the row is exhausted.
• The composer can also use the tone rows obtained from the basic set via the application of the following operations:
  – Inversion. Replace each tone by its negative (modulo 12).
  – Retrograde. Read the sequence in reverse order.
  – Retrograde inversion. The composition of the inversion and the retrograde operations.
  – Transposition by n half-tones. Add n to each tone (modulo 12).

After World War II, a second group of composers, including Karlheinz Stockhausen (1928–2007), Pierre Boulez (1925–2016), Milton Babbitt (1916–2011), and Olivier Messiaen (1908–1992), inspired by the systematic treatment of pitch, rhythm, dynamics, and articulation in the compositions of Anton Webern, created a new form of music. Integral serialism is a composition methodology that takes dodecaphonic techniques a stage further. Essentially, it consists of extending the 12-tone ordering technique to all other parameters of music. For instance, Olivier Messiaen established series or modes of 36 pitches, 24 durations, 12 articulations or attacks, and 7 degrees of loudness for his work Mode de valeurs et d’intensités, a work that was not strictly dodecaphonic but that, like the music of Anton Webern, inspired this second group of composers.
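The row operations in Schoenberg's rules above translate directly into arithmetic modulo 12. The following is a minimal sketch under stated assumptions: the function names and the example row `basic_set` are illustrative choices made here, not taken from Schoenberg or from any particular composition.

```python
from typing import Dict, List, Tuple

Row = List[int]  # pitch classes 0..11, one entry per note of the row


def is_tone_row(row: Row) -> bool:
    """A tone row uses each of the 12 pitch classes exactly once."""
    return sorted(row) == list(range(12))


def transpose(row: Row, n: int) -> Row:
    """Transposition by n half-tones: add n to each tone (modulo 12)."""
    return [(p + n) % 12 for p in row]


def invert(row: Row) -> Row:
    """Inversion: replace each tone by its negative (modulo 12)."""
    return [(-p) % 12 for p in row]


def retrograde(row: Row) -> Row:
    """Retrograde: read the sequence in reverse order."""
    return list(reversed(row))


def retrograde_inversion(row: Row) -> Row:
    """Composition of the inversion and the retrograde operations."""
    return retrograde(invert(row))


def row_forms(row: Row) -> Dict[Tuple[str, int], Row]:
    """All 4 x 12 forms: each basic operation followed by a transposition."""
    base = {
        "P": row,
        "I": invert(row),
        "R": retrograde(row),
        "RI": retrograde_inversion(row),
    }
    return {(name, n): transpose(form, n)
            for name, form in base.items() for n in range(12)}


# Arbitrary illustrative row (an assumption, chosen only for the demo).
basic_set = [0, 1, 3, 9, 2, 11, 4, 10, 7, 8, 5, 6]
forms = row_forms(basic_set)
print(len(forms), all(is_tone_row(f) for f in forms.values()))
```

Every derived form is again a permutation of {0, 1, ..., 11}, which is why the composer can substitute any of them for the basic set. For rows with internal symmetries some of the 48 labelled forms coincide, so the sketch counts labels rather than distinct rows.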

Iannis Xenakis

Musicians have tried to formalize music using mathematics since ancient times. The Pythagorean construct, the music of the spheres, related the musical notes with numbers, and further examples can be found throughout history. Among these examples, of particular importance is Iannis Xenakis (1922–2001), a Greek composer who reacted against serialism and is known for his “stochastic music.” His musical theory is based on the idea of using all available mathematical tools for composition. As he says:

The formalization that I attempted in trying to reconstruct part of the musical edifice ex nihilo has not used, for want of time or capacity, the most advanced aspects of philosophical thought. But the escalade is started and others will certainly enlarge and extend the new thesis. (Xenakis, 1971).

Although stochastic music is based on probability theory, combinatorics is present in Xenakis’ stochastic compositions (see Fig. 3). Moreover, he also produced some combinatorial compositions, like Duel, 1958, and Stratégie, 1962, both based on game theory, and Nomos alpha, 1965, and Nomos gamma, 1968, both based on the permutation group of the cube (see Fig. 4). He was also one of the first musicians to use a computer to compose. In Xenakis (1971) he gathers his proposals for composition, which cover a widely diverse use of mathematics. The treatise includes an interesting analysis of some of his works.

Tom Johnson

Another composer who uses combinatorics in his music is Tom Johnson (born 1939), who takes an approach quite different both from that of the serialist composers and from that of Xenakis. Johnson uses combinatorics to define the structure of his compositions, sometimes using complex combinatorial design configurations. However, as he himself admits, one of his goals is to make his music accessible and understandable, and its mathematical (combinatorial) structure and process easy to perceive. He has sometimes been called “The man who counts.”

Fig. 3 Two pages from Xenakis’s notebook from 1959. (©Collection Famille Iannis Xenakis, with permission)

Fig. 4 Schemes of the cube’s rotations for the piece Nomos Alpha by Xenakis, 1965. (© Collection Famille Iannis Xenakis, with permission)

Johnson’s webpage (2020a) is a rich source of information, including writings, scores, and links. He also maintains a YouTube channel, with explanatory videos about his music (2020b). Johnson devises graphs and diagrams to represent and understand the structures used in his compositions. He collaborated with the mathematician Franck Jedrzejewski to explain the combinatorial content of his drawings and music in Looking at Numbers (Johnson and Jedrzejewski, 2014). Johnson systematically explores ways to apply mathematical ideas to rhythm, melody, and harmony. Some works that make particular use of combinatorics are L’opéra de quatre notes, 1972, Symmetries, 1981–1990, Full rotation of 60 notes through 36 positions, 1996, and Automatic music, 1997, to name only a few. A selection of techniques used in Johnson’s combinatorial works from the early 2000s includes the following:

• Integer partitions are used to construct chords with the same average height. In Trio, 2005, the three instruments play chords that add up to 72, while in Hexagons, 2005, two sums, 30 and 31, are combined.
• Subsets. In Mocking, 2009, three percussionists play rhythms constructed by choosing four different numbers from 1 to 8. This is represented by Johnson as sums from 10 = 1 + 2 + 3 + 4 to 26 = 5 + 6 + 7 + 8, in a graph connecting intersecting subsets.
• Combinatorial designs. Working with block designs, Johnson identifies the elements of the blocks with musical parameters, such as pitch and rhythm, and uses the combinatorial structure of these blocks to create a path through the musical material, linking blocks by their common objects. In Vermont Rhythms, 2008, he uses 42 × 11 rhythms based on the (11, 6, 3) design, while Block Design for Piano, 2012, is built on the 4-(12, 6, 10) design defined by 30 base blocks and one automorphism of the permutation group over 12 elements.

The score Kirkman’s Ladies, 2005, was inspired by the well-known schoolgirl problem, posed by Reverend Kirkman in 1850 in The Lady’s and Gentleman’s Diary:

Fifteen young ladies in a school walk out three abreast for seven days in succession: it is required to arrange them daily so that no two shall walk twice abreast.

The solution to this problem involves Kirkman triple systems, a special case of combinatorial design. In the score, Johnson uses a (15, 3, 1) design with 13 × 35 blocks (see Fig. 5). As he explains: In my score, entitled Kirkman’s Ladies, the 15 ladies become a scale of 15 notes, and the daily walks of five rows, three ladies in each row, become phrases of five chords with three notes in each chord. Each lady/note occurs once in each sequence of five chords, each pair of ladies walks together once a week, and by the end of the 13 weeks/sections, all 455 possible trios of women, all 455 possible combinations of three notes, have passed by. (Johnson and Jedrzejewski, 2014, page 39.)
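The figures quoted above for Johnson's subsets and block designs can be checked with a few lines of counting. The sketch below assumes the standard formula for the number of blocks of a t-(v, k, λ) design, b = λ·C(v, t)/C(k, t); the function name `block_count` is a label chosen here. Reading 330 = 30 × 11 as 30 base blocks developed by an automorphism of order 11 is an inference from the numbers, not a statement taken from the source.

```python
from itertools import combinations
from math import comb


def block_count(t: int, v: int, k: int, lam: int) -> int:
    """Number of blocks of a t-(v, k, lam) design: lam * C(v, t) / C(k, t)."""
    numerator = lam * comb(v, t)
    denominator = comb(k, t)
    if numerator % denominator:
        raise ValueError("parameters violate the divisibility condition")
    return numerator // denominator


# Kirkman's Ladies: a (15, 3, 1) design has 35 triples (7 days x 5 rows),
# and 13 such weeks exhaust all C(15, 3) = 455 trios.
week = block_count(2, 15, 3, 1)
all_trios = comb(15, 3)

# Vermont Rhythms: the (11, 6, 3) design has 11 blocks.
vermont = block_count(2, 11, 6, 3)

# Block Design for Piano: the 4-(12, 6, 10) design has 330 blocks.
piano = block_count(4, 12, 6, 10)

# Mocking: four different numbers chosen from 1 to 8.
subsets = list(combinations(range(1, 9), 4))
sums = sorted({sum(s) for s in subsets})

print(week, all_trios, all_trios // week)
print(vermont, piano, piano // 30)
print(len(subsets), sums[0], sums[-1])
```

The first line of output ties together the 35 blocks per section, the 455 trios, and the 13 sections in Johnson's description of the score.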

Fig. 5 A representation of the (15, 3, 1) design used by Tom Johnson in Kirkman’s Ladies, 2005. (With permission of the author)

Elliott Carter

A composer who explored new forms of composition involving the use of combinatorial structures is Elliott Carter (1908–2012), famous for his innovations in rhythm and metric forms, and for developing a new rhythmic language. However, his experiments covered all aspects of musical creation. His exhaustive exploration of all possible chords of a certain number of notes is particularly innovative. Carter, who received a quite traditional musical education, had been influenced by Charles Ives (1874–1954), Henry Cowell (1897–1965), and Conlon Nancarrow (1912–1997), three composers who were at the time experimenting with tone divisions, polytonality, polyrhythms, and new harmonies in a very radical way. As a consequence, in the 1940s, he engaged in new directions with the composition Piano Sonata, 1945, and more clearly Cello Sonata, 1948, beginning his search for a new treatment of rhythmic matters.

Ives’s combinatorial ideas come from his eagerness to experiment with new forms of music. For instance, he applied dodecaphonic rules (independently of Schoenberg’s theory) to some compositions and worked with cyclical structures (Bellamann, 1933; Lambert, 1990). The compositions by Nancarrow and Cowell achieved great rhythmic complexity, to the point that some of them could not be played by human musicians; both composers constructed machines to play their works. Cowell published the influential book New Musical Resources (1969), in which he discusses new techniques regarding tone, overtones and polyharmony, time and rhythm, and tone-clusters as a new form of chord formation. Carter, influenced by these three composers, went a step further in the formalization of his new musical language. He introduced a technique that is now commonly called metric modulation, which consists of moving from one speed to another by means of changes of time signature and revision of the beat. This technique evolved until its culmination: the projection of simultaneous speeds together, with acceleration and retardation in the same passage (Bernard, 1988). Besides his exploration and renovation of rhythm patterns, from the 1960s, Carter engaged in a search for new pitch structures. As an example, in his piece Piano Concerto, 1964, he composes with all twelve 3-note chords, each with a set of speeds. Carter’s Harmony Book (2001) is a handbook of harmonic materials that he used in nearly all of his works from the early 1960s, written in the form of a catalogue of all possible chords, with a complete and systematic analysis of them in a non-tonal universe.
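The count of "all twelve 3-note chords" can be reproduced by exhaustive enumeration. The sketch below rests on an assumption made here, not stated in the text: that two chords count as the same when one is a transposition or an inversion of the other (modulo 12), as in the usual pitch-class set classification. The function names are illustrative.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple

Chord = Tuple[int, ...]  # sorted pitch classes in 0..11


def transpositions(chord: Iterable[int]) -> Set[Chord]:
    """All 12 transpositions of a chord, each written as a sorted tuple."""
    pcs = list(chord)
    return {tuple(sorted((p + n) % 12 for p in pcs)) for n in range(12)}


def set_class(chord: Iterable[int]) -> Chord:
    """Canonical representative under transposition and inversion."""
    pcs = list(chord)
    inverted = [(-p) % 12 for p in pcs]
    return min(transpositions(pcs) | transpositions(inverted))


def transposition_class(chord: Iterable[int]) -> Chord:
    """Canonical representative under transposition only."""
    return min(transpositions(chord))


all_trichords = list(combinations(range(12), 3))
up_to_t_and_i = {set_class(c) for c in all_trichords}
up_to_t_only = {transposition_class(c) for c in all_trichords}

print(len(all_trichords), len(up_to_t_only), len(up_to_t_and_i))
```

Of the 220 three-note subsets of the chromatic scale, 19 remain when transpositions are identified and 12 when inversions are identified as well, which matches the number quoted for the Piano Concerto under this assumption.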

Further Examples

Combinatorics has flooded the realm of music composition, becoming a regular tool for all kinds of composers. Hybrid techniques involving combinatorics and other mathematical ideas are applied by almost all composers in contemporary music. This makes it impossible to compile a complete list of composers who use combinatorics in their works. However, we can point out some notable examples of composers, works, and theorists.

James Tenney (1934–2006) worked in many different areas, including algorithmic composition, for which he formally and computationally developed the ideas of dissonant counterpoint (Polanski et al., 2011). He defined the concept of harmonic distance based on the idea that harmonic relations between pitches can be modelled by a multidimensional space with metrical and topological properties that reflect how the human auditory apparatus perceives relations between pitches. In the model, pitches are represented by points in a multidimensional lattice, and the perceived harmonic distance between two pitches is the Manhattan distance between the corresponding points in harmonic space (Winter, 2019).

Luiz Henrique Yudo (born 1962) is an autodidact musician, trained as an architect. His experimental music comes from structures, labyrinths, alphabets, codes, patterns, architecture, paintings, drawings, sculptures, displays, textiles, numbers, geometry, and tables: anything visual with a sonic potential.
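Tenney's lattice model of harmonic distance, described above, can be illustrated numerically. The sketch below makes two assumptions that are not spelled out in the text: an interval is given as a frequency ratio, placed in the lattice by the exponents of its prime factorization, and each prime axis p is scaled by log2(p), which is the form in which Tenney's measure is usually stated. Under those assumptions the Manhattan distance from the origin reduces to log2(a·b) for a ratio a/b in lowest terms.

```python
from fractions import Fraction
from math import isclose, log2
from typing import Dict


def prime_exponents(ratio: Fraction) -> Dict[int, int]:
    """Lattice coordinates of a ratio: exponent of each prime factor."""
    exponents: Dict[int, int] = {}
    for value, sign in ((ratio.numerator, 1), (ratio.denominator, -1)):
        p = 2
        while value > 1:
            while value % p == 0:
                exponents[p] = exponents.get(p, 0) + sign
                value //= p
            p += 1
    return exponents


def harmonic_distance(ratio: Fraction) -> float:
    """Weighted Manhattan distance from the origin (assumed weight log2 p)."""
    return sum(abs(e) * log2(p) for p, e in prime_exponents(ratio).items())


fifth = Fraction(3, 2)
major_third = Fraction(5, 4)
print(prime_exponents(fifth), round(harmonic_distance(fifth), 3))
print(prime_exponents(major_third), round(harmonic_distance(major_third), 3))
```

In this model the perfect fifth (3/2) lies closer to the origin than the major third (5/4), since log2(6) is smaller than log2(20).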

James Saunders (born 1972) started to develop the idea of modular music with the piece compatibility hides itself, 1998–1999. The work consists of small pieces of music, each of a few seconds’ duration, which can be recombined into various longer combinations for each performance. This has become a combinatorial method of composition (Saunders, 2008).

Samuel Vriezen (born 1973) is a composer with a mathematical background. His piece Within Fourths/Within Fifths, 2006, is his most clearly combinatorial work, and he has also composed some pieces exploring block designs as a harmonic technique. As he says, “I have been, increasingly, more interested in game-like structures. [. . . ] certain structures of enumeration, permutation, combination etc. pop up in my work all the time, but they do not always occupy the centre stage, the way it happens in Tom’s [Johnson] work for example. Rather, they are among the formal tools that I use for a wide variety of purposes” (Vriezen, 2020). His piece Linking, 2019, dedicated to Tom Johnson on his 80th birthday, is structured as a card game. Vriezen (2017) explains how game-like structures can be applied to the analysis of open scores.

Ferran Fages (born 1974) is a composer who, with the notion of memory as a common thread, is interested in sound and silence, space, and vibration. He has no mathematical background. However, he has used combinatorial ideas to generate a structure using combinations in Un lloc entre dos records, 2017, and a process using the notion of a tree in 107 arbres, 2018–2020 (Fages, 2020).

To refer to other genres, some discussion of mathematics in jazz can be found in Maurer (2004). To mention only two examples, the music of the pianist Thelonious Monk is concerned with permutation, and the band Dawn of Midi uses translations and permutations to compose its music, a sort of minimalistic jazz. In rock, math rock and mathcore are two styles that involve the use of complex and changing rhythm structures, sometimes with explicit application of combinatorial techniques. We can mention the bands The Dillinger Escape Plan, Converge, Meshuggah, Car Bomb, and Jute Gyte.

Combinatorics in Literature

In literature, poetry stands apart from prose since metrics and rhyme patterns can be considered combinatorial structures. Moreover, mathematical permutations have been used in poetry since the twelfth century. The sextine, invented by the troubadour Arnaut Daniel, is a famous structure involving a permutation of six elements. A typical rhyme scheme is ABCDEF, FAEBDC, CFDABE, ECBFAD, DEACFB, BDFECA. The sextine was practiced by several medieval poets, including Dante and Petrarch, and, as a poetic form, it has survived to the present day: it was revived during the French Resistance by Louis Aragon and generalized by Raymond Queneau, and the Catalan experimental poet Joan Brossa wrote four books of sextines. Another known poetic form that uses permutations is the proteus poem, in which each line employs the same words but in a different order. The earliest surviving proteus poem dates back to 1561 (Higgins, 1987). In the twentieth century, we find permutations in the cut-up technique of the Dadaists and the machine for making poems of Tristan Tzara, and some decades later, in the works of the digital poetry pioneer Brion Gysin, among others.

Fig. 6 The shaped poem The Mouse’s Tail in Carroll’s manuscript Alice’s Adventures Under Ground, 1865, which would be published as Alice’s Adventures in Wonderland. (Public domain)

More ancient and also with a combinatorial flavor are the carmina figurata, which are poems written in a way that forms a figure or a shape, sometimes hidden inside another text. The oldest documented carmen figuratum is the calligram The egg, by Simmias of Rhodes, in the fourth century BCE. Perhaps the most famous, both for their simple visual appeal and for their display of remarkable technical skill, are the poems written by Publilius Optatianus Porphyrius to honor the Emperor Constantine (Edwards, 2005). This poetic form has survived to our time and has been used by a number of poets and writers, particularly the Dadaists and the Concrete movement. There is even an example in Lewis Carroll’s Alice’s Adventures in Wonderland, 1865 (see Fig. 6). In the twentieth century, a number of poets cultivated the carmina figurata, sometimes called calligrams or shaped poems, including Guillaume Apollinaire, Allen Ginsberg, William Carlos Williams, Dylan Thomas, Vicente Huidobro, Guillermo Cabrera Infante, Gerardo Diego, and Joan Brossa, among others. D’Ors (1977) studies this long tradition of experimental
poetry, which passes through poets from Ancient Greece, artists in the Middle Ages and in the sixteenth and seventeenth centuries, and, as we have seen, more recent ones.

Fig. 7 Gulliver’s engine, an illustration from one of the first editions of Gulliver’s Travels, 1735. (Public domain)

In prose, there are a number of works that contain references to combinatorial structures, mainly permutations. One of the most relevant examples is in chapter five of part three of Gulliver’s Travels, by Jonathan Swift (1735, pp. 188–189), in which Gulliver visits the Grand Academy of Lagado. There he encounters a professor who shows him an engine that would create prose and poetry entirely mechanically. The method of its operation involved turning the frame on which all the words of the language hung and having students read them aloud while capturing the results. This is one of the earliest mentions of a computer-like machine, one hundred years before Babbage’s engine (see Fig. 7). A similar example is found in Jorge Luis Borges’s short story La biblioteca de Babel, 1941, which describes a “universe” in the form of a library that consists of a structure filled with volumes of text, each a distinct arrangement of the letters of the alphabet. Because arrangements of finitely many symbols, when repetition is allowed, provide a number of variations that is finite for books of a fixed length but far beyond anything that could be read, Borges touches on the notion that exhaustive variations suggest a key to the mysteries of the infinite, one of the recurrent themes in his work. The science fiction novel Permutation City, 1994, by Greg Egan, who holds a bachelor’s degree in mathematics, starts with a 20-line poem consisting of anagrams of the title of the novel:

Into a mute crypt, I
Can’t pity our time
Turn amity poetic
Ciao, tiny trumpet!
Manic piety tutor
Tame purity tonic
Up, meiotic tyrant!
I taint my top cure
To it, my true panic
Put at my nice riot
To trace impunity
I tempt an outcry, I
Pin my taut erotic
Art to epic mutiny
Can’t you permit it
To cite my apt ruin?
My true icon: tap it
Copy time, turn it; a
Rite to cut my pain
Atomic putty? Rien!

Moreover, the chapter headings also consist of permutations of the title: “(Rip, tie, cut toy man)” appears in the headings of the prologue and chapters 3, 6, 9, and 12; “(Remit not paucity)” in the headings of chapters 1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 15, 17, 18, 19, 21, 22, and the epilogue; “Toy man picture it” in chapter 16; and “Can’t you time trip” in chapter 20. More conceptually interesting are the writers who explore new forms of narrative. The innovative narrative techniques of the Argentinian writer Julio Cortázar eschew temporal linearity. His most famous work is the novel Rayuela, 1963, in which the reader decides the order in which the chapters are read. Another such work is the novel Composition no. 1, 1962, by the French experimental writer Marc Saporta, whose pages are read in arbitrary order. These two novels connect with the idea of potentiality, one of the fundamentals of the Oulipo group.
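Two of the permutation devices mentioned in this section lend themselves to a mechanical check: the sextine scheme ABCDEF, FAEBDC, ... quoted earlier, and the anagram headings of Permutation City. The sketch below is illustrative only. The function names are chosen here, and the rule "last, first, second-to-last, second, ..." is the usual description of the sextine's reordering, assumed rather than quoted from the text.

```python
from collections import Counter
from typing import List


def spiral(order: str) -> str:
    """Sextine step: take last, first, second-to-last, second, and so on."""
    out = []
    lo, hi = 0, len(order) - 1
    while lo <= hi:
        out.append(order[hi])
        if lo != hi:
            out.append(order[lo])
        lo += 1
        hi -= 1
    return "".join(out)


def sextine_stanzas(first: str = "ABCDEF") -> List[str]:
    """Apply the step repeatedly, one ordering per stanza."""
    stanzas = [first]
    while len(stanzas) < len(first):
        stanzas.append(spiral(stanzas[-1]))
    return stanzas


def letters(text: str) -> Counter:
    """Multiset of letters, ignoring case, spaces, and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def is_anagram(a: str, b: str) -> bool:
    return letters(a) == letters(b)


stanzas = sextine_stanzas()
headings = [
    "Rip, tie, cut toy man",
    "Remit not paucity",
    "Toy man picture it",
    "Can't you time trip",
]
print(stanzas)
print([is_anagram(h, "Permutation City") for h in headings])
```

Applying the step a seventh time returns the original order ABCDEF, so the permutation has order six, which is why the form closes after six stanzas.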

The Oulipo

The “Ouvroir de Littérature Potentielle,” Oulipo, was a group of writers and mathematicians founded in 1960 by François Le Lionnais and Raymond Queneau (mathematicians) as a reaction against both traditional, in particular romantic, literature and surrealism. The group worked basically with two objectives: first, to generate what they called “potentiality,” that is, to provide methodologies to create potential works, even if they were not realized, to create what they called “the book of all stories,” and second, to write with “constraints” or rules, in order to elaborate new forms and structures to serve as a support for literary works. This technicist vision of literature, aiming to use language in a more abstract manner and understand it as a sort of Meccano of signs which can be assembled by following rules, combinations, or even algorithms, is what gives the works of the Oulipo their combinatorial character. Claude Berge, a member of the Oulipo and, as a mathematician, a renowned expert in combinatorics and graph theory, explains how the literature of the Oulipo is related to combinatorics and proposes many ways to represent the combinatorial structure of some Oulipian works using graphs (see Berge, 2016). Besides Le Lionnais, Queneau, and Berge, other relevant members of the group were Georges Perec, Italo Calvino, and Marcel Duchamp. It is worth mentioning that Cortázar refused to be part of the Oulipo, for political reasons, and Boris Vian, who could have belonged to the group, died one year before its foundation. The following authors produced works as part of this group.

Fig. 8 The book is an object of art, in Cent mille milliards de poèmes, by Raymond Queneau. (Photo by Glòria Solsona)

Raymond Queneau

The most famous work of Raymond Queneau (1903–1976) is Cent mille milliards de poèmes, 1961, which is, in fact, a set of 10 sonnets. Since each sonnet follows the same rhyme scheme and bears the same rhyming sounds in the same places, each line of one poem can be replaced by the equivalent line of another without breaking any formal rules. The work is made by cutting slits on either side of each line of each poem, which yields a set of 140 recombinable lines. Since each of the 14 lines can be chosen independently from any of the 10 sonnets, they are easy to rearrange into one hundred trillion (10^14) possible sonnets. In this way, the author goes beyond the use of a permutation, with the purpose of having the whole set of poems together, showing the potentiality of the work. An edition of Cent mille milliards de poèmes is shown in Fig. 8.

Georges Perec

Georges Perec (1936–1982) is one of the most representative members of the Oulipo. His novel La vie, mode d’emploi, 1978, is an extremely detailed description of the rooms of a house and of everything found there, including occupants, at a particular moment in time. Throughout the novel, many connections are established between the rooms, objects, and characters, which create a network, a novel of novels. A number of combinatorial schemes are involved in the writing of La vie, mode d’emploi. For example, the rooms are mapped onto a pair of orthogonal Latin squares of order 10, that is, two Latin squares over two n-sets, S and T, superimposed on the same n × n square so that each pair (s, t) of the Cartesian product S × T appears exactly once. Furthermore, during the first part of the novel, the movements around the house, that is, the 10 × 10 square, follow a Hamiltonian path, according to the movements of a knight on a 10 × 10 chessboard (Perec, 1981). In 1976, Perec explained La vie mode d’emploi in a video recorded by the Institut National de l’Audiovisuel (1976). Perec was extraordinarily prolific and his work has been extensively studied. One example that is less famous but, in our opinion, very interesting and revealing of Perec’s mode of creation is the piece Deux Cent Quarante-Trois Cartes Postales en Couleurs Véritables, 1978, a set of 243 postcards supposedly sent from different places around the world. They were created according to five “ingredients” (location, activities, entertainments, special mentions, and farewells), each of them with three possible values. The values were converted into names from a table, one for each ingredient, containing 81 possibilities, some of them invented by Perec (Bonch-Osmolovskaya, 2018).
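The counts quoted above for Queneau's sonnets and Perec's postcards, and the orthogonality condition behind La vie, mode d’emploi, can be checked with a short script. This is a sketch under stated assumptions: the helper names are chosen here, and the order-5 squares are a textbook-style example used only because a pair of that order is easy to write down. Perec's actual order-10 pair is not reproduced.

```python
from itertools import product
from typing import List

Square = List[List[int]]

# Queneau: 10 sonnets of 14 lines, each line chosen independently.
SONNETS, LINES = 10, 14
strips = SONNETS * LINES
poems = SONNETS ** LINES

# Perec: five "ingredients", each with three possible values.
postcards = list(product(range(3), repeat=5))


def is_latin(square: Square) -> bool:
    """Every symbol 0..n-1 appears exactly once in each row and column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols
                  for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(a: Square, b: Square) -> bool:
    """Superimposed, each ordered pair (s, t) appears exactly once."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n


# Illustrative order-5 pair (an assumption made for this demo).
n = 5
A = [[(i + j) % n for j in range(n)] for i in range(n)]
B = [[(i + 2 * j) % n for j in range(n)] for i in range(n)]

print(strips, poems, len(postcards))
print(is_latin(A), is_latin(B), are_orthogonal(A, B))
```

A Latin square is never orthogonal to itself for n greater than 1, since the superimposed pairs are then all of the form (s, s); `are_orthogonal(A, A)` returns `False` accordingly.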

Italo Calvino

Italo Calvino (1923–1985), an Italian writer who joined the Oulipo in the last stage of his career, wrote novels that potentially contain many different stories, which must be constructed by the reader, and that have been called “combinatorial literature” (Calvino, 1968). Anticipating hypertext, he called these novels “hypernovels.” In this period, the most famous books he wrote are Le città invisibili (1972), a series of short stories that describe a number of fantastic cities; Il castello dei destini incrociati (1973), for which he used combinations of tarot cards; and Se una notte d’inverno un viaggiatore (1979), a collection of interwoven stories, some of them interrupted by accidents such as an error in typing (purposely made by the author). In all three books, the story is conceived as a network of stories. Calvino (1987) is an essay in which the author explains how he created Se una notte d’inverno un viaggiatore, enlarging the hypertextual character of the novel. In this essay, Calvino shows a diagram representing the structure of the novel using graphs.

Juan Eduardo Cirlot

Juan Eduardo Cirlot (1916–1973) was a musician, an intellectual, an art critic, a medievalist, the author of a dictionary of symbols, and a poet. His early poems were classical sonnets, but shortly after he began a very personal exploration through surrealism (he was in personal contact with André Breton), experimental poetry, and permutational poetry. His interests in the Jewish kabbalah, symbolism, structuralism, musical serialism, and almost all ancient and modern literary traditions, including esoterism, occultism, and Sufi mysticism, permeate his writings. Cirlot chooses topics from all these sources to combine them with his own spiritual search (Janés, 2014). The kabbalah is the science of esoteric interpretation of the sacred texts, not only through letters but through numbers, since in the Hebrew alphabet each letter has an associated numeric value. Juan Eduardo Cirlot found in the hermeneutical techniques of Abraham Abulafia, a Spanish kabbalist from the thirteenth century, a source of inspiration for his permutational poetry, also influenced by dodecaphonic music and the study of symbology. Cirlot’s use of permutations is closely linked to the phonetic value of the words, rhyme and rhythm, and structure. It also relies on the visual distribution of the verses on the page, as in shaped poems. Perhaps the most original feature of Cirlot’s permutational poetry is the connection that he establishes between permutations and symbolism. As he says in the prologue to his Diccionario de los símbolos:

[. . . ] the premises that allow the symbolist conception [. . . ] are:
(a) Nothing is indifferent. Everything expresses something and everything is significant.
(b) No form of reality is independent; everything is related in some way.
(c) The quantitative becomes qualitative at certain essential points that constitute precisely the significance of quantity.
(d) Everything is serial.
(e) There are correlations of situation between the various series and of meaning between said series and the elements they comprise.
Seriality, a fundamental phenomenon, encompasses the physical world as well as the spiritual world.

His first experiment with permutations is the work Homenaje a Bécquer, 1954–1968, in which he composes 23 poems using exclusively lines and words from the famous poem Volverán las oscuras golondrinas, by the Spanish romantic poet Gustavo Adolfo Bécquer. Only one word of the original poem is missing in Cirlot’s homage: Dios, the Spanish word for God (Martin, 2012). Other permutational works are El palacio de plata, 1955, and Inger, permutaciones, 1971, in which he applies permutations to letters and syllables. His creative inquiries resulted in the immense collection of poems Bronwyn, 1966–1971. This work gathers all the influences and resources of the poet, including the use of combinatorial rules for rhyme, in the books Bronwyn, II and Bronwyn, V, and permutations, in the books Bronwyn, permutations and Bronwyn, n (Berenguer, 2007).

Digital Poetry

In 1952, Christopher Strachey wrote an algorithm for a Ferranti Mark I that created combinatorial love letters. The algorithm built the love letters according to a predetermined structure, choosing the words to use in each part from a list compiled by Strachey from a thesaurus. In 1959, Theo Lutz produced what he called stochastic texts with a program in ALGOL on a Zuse Z22. He generated a database with 100 words from Franz Kafka’s novel The Castle. The program selected words from the database and generated syntactically correct sentences. These two examples are known as the first digital poetry experiments, although both were isolated experiments with no artistic context.

According to Funkhouser (2007), “digital poetry is a new genre of literary, visual and sonic art launched by poets who began to experiment with computers in the late 1950s.” As Strachey and Lutz’s examples show, digital poetry began with text generators, which were based either on random choices of words from a database or permutations of words from a source text. Since the beginning of the 1980s, hypertext has provided the possibility of non-linear texts to digital poets, enhanced with the possibility of dissemination and communication that came with the advent of the Web and network technology. Digital poetry has become a wide field of creation, resulting in myriad types of expression.

The foundations of digital poetry are the poem Un coup de dés jamais n’abolira le hasard, by Stéphane Mallarmé, first printed in 1897 and published in book form in 1914, and the cut-up techniques from the Dadaists, rediscovered in the 1960s by Brion Gysin, who explained and “lent” them to William Burroughs. In fact, the influence of the Dada poets, who challenged convention with methods like collage, the invention of neologisms, typographical distortion, and the use of non-semantic sounds, is very strong on digital poets who create by means of programmed text generators. Most of the techniques used in digital poetry are based either on the use of randomness or the use of combinatorics, like the permutation poems and the works that use hypertext to create a network structure. Moreover, combinatorial thinking is always present in computer programming, and this gives computer art a more combinatorial character. However, we need to notice that, in general, digital poetry is conceptually more involved with pure randomness than with combinatorics.

The book by Funkhouser (2007) is a very complete and reliable source of information. It begins with a chronology of works from the origins of digital poetry in 1959 to 1995. With numerous examples, it covers the different types of computer poetry the author has identified, namely, computer programmed texts; visual works (static and kinetic); hypertext and hypermedia; early Internet publications and audio productions; code poetry; and holographic poetry. It also addresses some interesting questions on digital poetry, like how to classify such a varied spectrum of works, and how the technology, hardware and software, available to the poets evolved over time. Bailey (1973) is an interesting anthology of computer poems. The author relates the computer poems in the anthology with known poetic tendencies, such as concrete poetry, sound poetry, shaped poetry, and haiku, a Japanese poetic form that was considered easily mechanized because of its shortness. The webpage I love e-poetry (2020) is a good repository of works and resources on digital poetry.
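The text generators described above, which fill a predetermined structure with words chosen at random from lists, can be sketched in a few lines. Everything specific in this example is hypothetical: the template and the word lists are invented here for illustration and are not Strachey's or Lutz's actual material.

```python
import random

# Hypothetical word lists (not the lists Strachey compiled).
ADJECTIVES = ["tender", "eager", "curious", "devoted"]
NOUNS = ["heart", "longing", "fancy", "wish"]
ADVERBS = ["keenly", "fondly", "ardently"]
VERBS = ["adores", "treasures", "longs for"]

# Hypothetical fixed structure with six slots.
TEMPLATE = "My {adj1} {noun1} {adv} {verb} your {adj2} {noun2}."


def generate(rng: random.Random) -> str:
    """Fill every slot of the template with a random word from its list."""
    return TEMPLATE.format(
        adj1=rng.choice(ADJECTIVES),
        noun1=rng.choice(NOUNS),
        adv=rng.choice(ADVERBS),
        verb=rng.choice(VERBS),
        adj2=rng.choice(ADJECTIVES),
        noun2=rng.choice(NOUNS),
    )


def count_outputs() -> int:
    """Size of the combinatorial space: product of the slot list sizes."""
    return (len(ADJECTIVES) ** 2) * (len(NOUNS) ** 2) * len(ADVERBS) * len(VERBS)


rng = random.Random(1952)  # fixed seed so a run can be repeated
for _ in range(3):
    print(generate(rng))
print(count_outputs())
```

The randomness only selects one sentence at a time; the set of sentences that can ever appear is fixed by the template and the lists, which is the combinatorial side of such generators.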

Brion Gysin

Brion Gysin (1916–1986) was a painter, poet, inventor, historian, novelist, songwriter, performer, polyglot (he could handle seven languages), and an artist who made things happen. His eclectic interests are variations of a single fascination: language, in all its forms. He lived in London, Greece, New York, Tangier (Morocco), and Paris, the city that through the years remained his most constant home. In the late 1950s, he spent 4 years in the so-called Beat Hotel, where he began long-lasting artistic collaborations with both the writer William Burroughs and Ian Sommerville, a student of mathematics interested in electronics and computer programming. Those years turned out to be the most productive of Gysin’s artistic career.

One of his most famous works, the Dreamachine, was a recreation, by means of stroboscopic light, of an experience he had on a bus: I had the actual experience, in the back of a bus, driving along a row of trees that was spaced exactly as was necessary to produce the effect with the sun setting behind the trees. I closed my eyes and had what I thought was a spiritual experience. (Weiss, 1991)

The Dreamachine, constructed by Gysin and Sommerville, was the first artwork to be experienced with the eyes closed. As a poet, Gysin (re)discovered the cut-up technique and was one of the fathers of sound poetry with his permutation poems. He says that the idea of permutations came from the famous divine tautology “I am that I am,” which was quoted in The Doors of Perception by Huxley, and which he started to play with, repeating and rearranging its words to finally build his first permutation poem. The poem is generated by a cyclical, randomized rearrangement of the three distinct words contained in that phrase. One version of his first permutation poem is: I AM THAT I AM I THAT AM I AM I AM I THAT AM I I AM THAT AM I THAT I AM AM I I THAT AM AM I AM THAT AM I I THAT AM AM I I AM AM THAT I I AM AM THAT I I THAT AM AM I AM THAT AM I I AM I AM THAT I I AM AM THAT

Gysin’s permutation poetry imposes a pre-established pattern on the words in a phrase, so they appear in different orders until all possibilities have been exhausted. The availability of computer technology automated the process of randomizing these permutations. Other permutation poems by Gysin are Rub Out The Word, In the Beginning there was the Word, Kick That Habit Man, Poets Don’t Own Words, I Don’t Work You Dig, and Junk Is No Good Baby. In a 1964 piece called Cut-Ups Self-Explained, Gysin declares, “The permuted poems set the words spinning off on their own; echoing out as the words of a potent phrase are permuted into an expanding ripple of meanings which they did not seem to be capable of when they were struck and then stuck into that phrase” (Kuri, 2003). Another famous poem by Gysin is the Pistol Poem, composed by cutting into pieces a recording of a pistol firing and rearranging them. Corbett (1998) writes about Gysin’s collaboration with the saxophonist Steve Lacy as a songwriter. He also collaborated with the graffiti artist Keith Haring, who was also interested in language and used permutations and anagrams in his drawings.
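The idea of running a phrase through its orderings until all possibilities are exhausted can be illustrated as follows. This is a minimal sketch, not Gysin's or Sommerville's procedure: the function name is chosen here, and the orderings are produced in the order Python's `itertools.permutations` happens to yield them, with repeats removed when a word occurs more than once.

```python
from itertools import permutations
from typing import List


def distinct_orderings(phrase: str) -> List[str]:
    """All different word orders of a phrase, each listed once."""
    words = phrase.split()
    seen = set()
    lines = []
    for order in permutations(words):
        if order not in seen:  # repeated words produce duplicate orders
            seen.add(order)
            lines.append(" ".join(order))
    return lines


poem = distinct_orderings("I AM THAT I AM")
print(len(poem))
print("\n".join(poem[:5]))

print(len(distinct_orderings("KICK THAT HABIT MAN")))
```

For “I AM THAT I AM” the five words contain two repeated pairs, so there are 5!/(2!·2!) = 30 distinct lines rather than 120; a phrase of four different words, such as “Kick That Habit Man,” gives 4! = 24.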

L. Barrière

Combinatorics in Visual Art

The interaction between art and geometry has existed since antiquity. In the early decades of the twentieth century, starting with the experiments in Cubism and Expressionism, visual art turned from the figurative to the abstract. A large number of artistic movements arose, some of them inspired by mathematics and, in the case of Constructivism, Geometric Abstraction, and Minimalism, by the simplest forms of geometry. Non-representational compositions were concerned with the production of various geometric shapes, where the size and character of these shapes, their relationship to each other, and the colors used throughout the work became the defining motifs of abstraction. Such artworks were conceived as combinations of forms and colors. The survey by Lorenzi and Francaviglia (2011) provides detailed descriptions of the relations between geometry and art in the twentieth century. In some cases, geometry is the visual part of a work that makes use of, or is based on, combinatorial structures and methodologies. In this sense, and despite the strong geometric character of their works, the avant-garde artists Paul Klee, Wassily Kandinsky, and Josef Albers, who were teachers at the Bauhaus in the interwar period, were influenced by the work of the chemist and Nobel laureate Wilhelm Ostwald, who used combinatorics as a creative and interdisciplinary way of thinking in areas such as knowledge organization and in his theory of colors and forms, as studied by Hapke (2012). In Fig. 1, three pages of Paul Klee’s teaching notes from those years show how he used graphs to represent the relations between colors. From the mid-twentieth century, conceptual geometric abstract artists have worked with systems of rules or instructions that were often considered artworks in themselves.
The use of numerical sequences and the basic combinatorial structures – combinations and permutations – sometimes with the exploration of all possibilities, together with the work in series are the main characteristics of their combinatorial work. Conceptual art developed in parallel with the works of computer art pioneers. In the late 1960s, several computer labs in universities around the world hosted groups of artists that wanted to use computers to draw. Some of these artists were, in fact, scientists who had turned into artists: mathematicians, engineers, physicists. Many collective exhibitions took place around the world: from 1961 to 1973, five exhibitions of artists from the group EXAT51 under the title Nove Tendencije in Zagreb; in 1965, Computer Art in Stuttgart; and in 1968, Cybernetic Serendipity in London. The first solo exhibition of a computer artist was Manfred Mohr: une esthétique programmée, at the Musée d’Art Moderne, in Paris in 1971. Computer art is a vast territory. For those who want to explore it, Shanken (2009) is a survey on how artists use electronic media; McCormack and d’Inverno (2012) and Paul (2016) are good collections of writings on the history, aesthetics, and politics of computer art; Migayrou (2018) is a catalogue of the exhibition Coder le monde that took place in the Centre Georges Pompidou, Paris, in 2018.

74 Combinatorial Artists

Sol LeWitt

A conceptual artist, Sol LeWitt (1928–2007) is known for his series of cube works and for his wall drawings. His first cube work, Serial Project #1, 1966, shows the 36 possible configurations that result from two cubes nested into each other, each with two parameters, “surface” (open/closed) and “height” (low/middle/high). In his accompanying text, LeWitt (1967a) describes the combinatorial rules that define this work. Variations of incomplete cubes, 1974, is an installation that explores all possible figures that can be obtained by removing edges from a cube (see LeWitt and Garrels, 2011). Figure 9 shows some of the constructed figures. The next example allows us to grasp the combinatorial character of LeWitt’s creative process. Between 1969 and 1970, LeWitt created four drawing series on paper. In each series he applied a different system of change to each of twenty-four possible combinations of a square divided into four equal parts, each containing one of the four basic types of lines LeWitt used. The result is four possible permutations for each of the twenty-four original units, which are presented in a grid of twenty-four sets of four squares, each divided into four equal parts. In Drawing Series IV, LeWitt used the Cross Reverse method of change, in which the parts of each of the original units are crossed and reversed. In Wall Drawing #413, shown in Fig. 10, LeWitt executed his Drawing Series IV using ink. Other works that follow combinatorial rules are Wall Drawing #450, A wall is divided vertically into four equal parts. All one-, two-, three- and four-part combinations of four colors, 1985, and Wall Drawing #493, The wall is divided vertically into three equal parts. All one-, two-, and three-part combinations of three colors, 1986. LeWitt wrote about his ideas and his art in an easy-to-read style, not lacking a sense of humor. The most famous writings are the Paragraphs on Conceptual Art (1967b) and the Sentences on Conceptual Art (1969).
They are short artist statements, in which he says, for example: “Ideas can be works of art; they are in a chain of development that may eventually find some form. All ideas need not be made physical.” (sentence number 9) or “For each work of art that becomes physical there are many variations that do not.” (sentence number 11) – two sentences that connect with the ideas of the Oulipo group – and also “Conceptual artists are mystics rather than rationalists. They leap to conclusions that logic cannot reach.” (sentence number 1), which suggests connections with Llull’s Ars Combinatoria and Cirlot’s permutation poems. LeWitt’s interest in making the combinatorial structure of the work visible to the observer contrasts with the conception of Donald Judd, an artist who worked with numerical sequences and whose early work influenced LeWitt. Judd thought that the mathematics in his work had to be somehow hidden or, at least, made less clear to the observer (Rottman, 2015).

Fig. 9 Some of LeWitt’s incomplete open cubes, 1974. (Courtesy MASS MoCA)

Fig. 10 Sol LeWitt’s Wall Drawing #413, 1984. (Courtesy MASS MoCA)
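The counts quoted for LeWitt's works can be checked with a few lines of Python. This is a minimal sketch under stated assumptions, not a reconstruction of LeWitt's procedure. The function names and the labels for the four line types are placeholders chosen here. "All one-, two-, ... part combinations" is read as "all non-empty subsets of the colors", which is one plausible interpretation of the wall drawing titles.

```python
from itertools import combinations, permutations, product

SURFACES = ("open", "closed")
HEIGHTS = ("low", "middle", "high")


def serial_project_configurations():
    """Pairs (inner form, outer form), each form being a (surface, height) state."""
    states = list(product(SURFACES, HEIGHTS))   # 2 * 3 = 6 states per form
    return list(product(states, repeat=2))      # 6 * 6 = 36 nested pairs


def quartered_square_arrangements(
    line_types=("vertical", "horizontal", "diagonal-right", "diagonal-left"),
):
    """Assign each of four line types to one quarter of the square: 4! ways."""
    return list(permutations(line_types))


def part_combinations(colors):
    """All one-, two-, ..., n-part combinations, read here as non-empty subsets."""
    return [
        subset
        for size in range(1, len(colors) + 1)
        for subset in combinations(colors, size)
    ]


configs = serial_project_configurations()
squares = quartered_square_arrangements()
four_colors = part_combinations(("A", "B", "C", "D"))
three_colors = part_combinations(("A", "B", "C"))
```

Under these assumptions the lists have 36, 24, 15, and 7 elements respectively; the first two match the figures given in the text for Serial Project #1 and for the drawing series.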

Vera Molnar

Vera Molnar (born 1924), who was educated in classical painting, worked with a variety of techniques such as collage, gouache, and pencil, and yet her fame comes from being a pioneer of digital art (Molnar, 2020). She used the computer as a tool to develop a systematic language. “Its immense combinatorial capacity facilitates the systematic investigation of the infinite field of possibilities,” she said (Molnar, 1984). In her work, influenced by the ideas of Max Bill (1996) concerning mathematics in art, she used geometric figures such as squares, circles, and lines, with an emphasis on repetition. Besides the use of simple geometric figures (as she says, “I love squares”), her artworks are based on an investigation of the expressive possibilities of the contrast between order and disorder, which she accomplishes by combining irregularities or randomness with the structure given by combinatorics. The representation and decomposition of the 3 × 3 grid, sometimes using polyominoes, is a recurrent theme in Molnar’s work, as is, more generally, the square grid, a combinatorial structure that she decomposes and deconstructs in different ways in a large number of her works (see Fig. 11). One of these works is Hommage à Dürer, a series that she started in 1948, and on which she worked for more than 50 years. It represents a magic square by a line following a permutation of the squares of a 4 × 4 grid, referring to the magic square in Melencolia I, an engraving by Albrecht Dürer from 1514. A number of different permutations are generated and placed to occupy the squares of a 10 × 10, 15 × 15, or 20 × 20 grid. Figure 12 shows a reinterpretation of this work by Lali Barrière, a creative programming exercise.

Fig. 11 Structure de Quadrilatères by Vera Molnar, 1987. (Courtesy Spalter Digital Art Collection)

Fig. 12 A recreation of Vera Molnar’s Hommage à Dürer by L.B
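The idea of a line following a permutation of the cells of a 4 × 4 grid can be sketched in Python. The code below is an illustrative reading, not Molnar's program or the recreation of Fig. 12. It stores the magic square from Dürer's engraving, checks that it is magic, and lists the cells in the order 1, 2, ..., 16, which is the polyline one would draw. The helper names and the use of a random shuffle for the variants are assumptions made for this example.

```python
import random

# The 4 x 4 magic square from Dürer's Melencolia I (1514).
DURER = [
    [16, 3, 2, 13],
    [5, 10, 11, 8],
    [9, 6, 7, 12],
    [4, 15, 14, 1],
]


def is_magic(square):
    """True if all rows, columns, and both diagonals have the same sum."""
    n = len(square)
    target = sum(square[0])
    rows = all(sum(row) == target for row in square)
    cols = all(sum(square[r][c] for r in range(n)) == target for c in range(n))
    diag = sum(square[i][i] for i in range(n)) == target
    anti = sum(square[i][n - 1 - i] for i in range(n)) == target
    return rows and cols and diag and anti


def path_through(square):
    """Cells (row, col) visited in the order of the values 1, 2, ..., n*n."""
    position = {
        value: (r, c)
        for r, row in enumerate(square)
        for c, value in enumerate(row)
    }
    return [position[v] for v in range(1, len(position) + 1)]


def random_variant(n=4, rng=None):
    """A random permutation of 1..n*n laid out on an n x n grid."""
    rng = rng or random.Random()
    values = list(range(1, n * n + 1))
    rng.shuffle(values)
    return [values[i * n:(i + 1) * n] for i in range(n)]


durer_path = path_through(DURER)
variant_path = path_through(random_variant(rng=random.Random(1514)))
```

Drawing `durer_path` as a connected polyline gives the sign associated with the magic square. Repeating `random_variant` for every cell of a larger grid produces a field of related signs, most of which are no longer magic.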

Manfred Mohr

Manfred Mohr (born 1938) was attracted to computer-generated algorithmic geometry after discovering Max Bense’s information aesthetics in the early 1960s (Bense, 1956). In 1969, he programmed his first computer drawings. Since then he has continued to develop and write algorithms for his visual ideas, and his art is produced exclusively with computers. As he explains:

The first step in that direction was an extended analysis of my own paintings and drawings from the last ten years. It resulted in a surprisingly large amount of regularities, determined of course by my particular aesthetical sense, through which I was able to establish a number of basic elements that amounted to a rudimentary syntax. After representing these basic constructions through a mathematical formalism, and setting them up in an abstract combinatorial framework, I was in a position to realize all possible representations of my algorithms. (Mohr, 1971, page 36)

Despite the fact that Mohr has worked for more than 50 years with computers, he does not consider himself a digital artist; similarly, despite the fact that he has always used elements of mathematics to realize his ideas, he does not consider himself a mathematical artist. Language and logic are the intellectual concepts that sustain his work, and he sees the computer as a tool that allows him to write down what he wants to do. Combinatorial ideas show up in Mohr’s work very early, before he began to use the computer, but become completely realized in his computer works. In 1973, Mohr started his work on the cube and its combinatorial properties, first considering the 12 lines of the cube as an alphabet, and explicitly showing combinations of a certain number of lines in the computer-generated films Cubic Limit, 1972–1975, and Complementary Cubes, 1973–1974 (see Fig. 13). His work on the cube has continued to the present day with an extension to multidimensional hypercubes and their combinatorial properties. Mohr has succeeded in constructing a very rich, personal language from a fixed and apparently simple geometric structure. In using the cube as an instrument to create his language, the necessity of working in higher dimensions comes not from a desire for complexity but from the fact that more complex structures provide more elements to work with, more possibilities (Mohr, 2019). In the series Dimensions I, 1978, he introduced the concept of the diagonal path in his works, which he used first in the graph of the 4-dimensional hypercube, with some interesting combinatorial works: exploration of all possible diagonal paths between two opposite vertices, for all pairs of opposite vertices; the construction of maximal planar graphs containing only diagonal paths between two vertices; and arranging the sets of edges in subsets with some given properties.
With an increasing structural complexity, given by the dimension of the hypercube Mohr worked with, other combinatorial series of works are Divisibility I, II, III, 1980–1985, Dimensions II, 1986–1989, Line Cluster, 1989–1990, and Laserglyphs, 1990–1993. Mohr’s work is based on the systematic creation of signs from a simple geometric structure, the cube and the hypercube. Showing only some aspects of it breaks the symmetry of the underlying structure and creates an ambiguity in the sign. This is especially true in his series of works from 2009: parallelResonance, 2009–2011, Artificiata II, 2012–2016, Transit code, 2017–2018, and Algorithmic Modulations, 2019. Through explorations of the diagonal paths of hypercubes of dimensions 11 and 12 and their rotations, Mohr generates aesthetically appealing works where the hidden structure is suggested but difficult to infer. Figures 14 and 15 show two of these works.

Fig. 13 Some images from Mohr’s film Cubic Limit, 1972–1975. (With permission of the author)

Fig. 14 parallelResonance by Manfred Mohr, 2009–2012, on a diagonal path through the 11-dimensional hypercube. (With permission of the author)
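The diagonal paths mentioned above have a simple combinatorial description, assuming that a diagonal path is a shortest route along edges between two opposite vertices of the n-dimensional hypercube. Each such path changes every coordinate exactly once, so it corresponds to an ordering of the n axes, and there are n! paths for each pair of opposite vertices. The Python sketch below enumerates them for small n. The function name and the 0/1 encoding of vertices are illustrative choices, not Mohr's own code.

```python
from itertools import permutations


def diagonal_paths(n, start=None):
    """All shortest edge paths from `start` to the opposite vertex of the n-cube.

    Vertices are tuples of 0/1 coordinates; each step flips one coordinate.
    """
    start = tuple(start) if start is not None else (0,) * n
    paths = []
    for order in permutations(range(n)):
        vertex = list(start)
        path = [tuple(vertex)]
        for axis in order:
            vertex[axis] ^= 1          # walk along the edge parallel to `axis`
            path.append(tuple(vertex))
        paths.append(path)
    return paths


paths_4d = diagonal_paths(4)
```

For the 4-dimensional hypercube this gives 4! = 24 paths for each of its 8 pairs of opposite vertices. For dimensions 11 and 12 the count grows to 11! and 12!, so full enumeration is impractical and a path would rather be chosen by sampling a random ordering of the axes.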

Fig. 15 Manfred Mohr’s Algorithmic modulations, 2019, one of his late works, on diagonal paths through the 12-dimensional hypercube. (With permission of the author)

Manfred Mohr’s webpage (2020a) is very well maintained and a huge source of information. Besides his artworks, it includes information about the processes that led him to them and also documents from all the exhibitions and events he was involved in. Several videos from Artificiata II can be watched on Mohr’s Vimeo channel (2020b).

Vladimir Bonačić

Vladimir Bonačić (1938–1999) was an electrical engineer at the University of Zagreb. In 1967, he wrote a PhD thesis on pattern recognition and data structures, a very advanced subject at that time. In 1968, on the occasion of the 4th Nove Tendencije exhibition, he collaborated with the painter and graphic designer Ivan Picelj, a constructivist artist who was a founding member of the Croatian group EXAT51. This collaboration was the beginning of Bonačić’s artistic career, and he became a pioneer in interactive kinetic art, using computer systems to create what he called cybernetic art. From 1969 to 1973, he was the head of the Laboratory of Cybernetics at the Research Institute Ruđer Bošković in Zagreb, and in 1971 served as an advisor to UNESCO on art and science matters. From 1973 to 1977, together with the software developer Miro A. Cimerman and the architect Dunja Donassy, he created the laboratory bcd – cybernetic art in Jerusalem, in the framework of the “Jerusalem Program in Art and Science,” where he taught “computer-based art,” again, a very advanced matter to teach in the 1970s. Just as Manfred Mohr was influenced by the aesthetical and theoretical ideas of Max Bense, and Vera Molnar by Max Bill’s proposal of formalizing art using mathematics, Vladimir Bonačić was influenced by the ideas of Matko Meštrović, who did not consider artworks as unique goods for the artistic market, but as “plastic-visual research, with the aim of determining the objective psychophysical bases of the plastic phenomenon and visual perception, in this way a priori excluding any possibility of including subjectivism, individualism, and romanticism [...]” (Meštrović, 2005). Bonačić’s innovative approach was a result of the combination of his artistic personality with his scientific and technological knowledge. In his exploration of interactivity he said:

The fact that the dimension of time is a key factor in objects of a kinetic art type makes it possible to give artistic expression to visual and auditory experiences, perhaps of more significance than those that can be made by computer display methods that have been commonly used. (Bonačić, 1974)

This quotation shows Bonačić’s concerns about kinetics and interactivity in the early stages of this form of art. His artworks are physical works that he calls “dynamic objects,” constructed using group theory and other mathematical concepts. Most of them were panels of light (and sometimes sound) combinations, controlled by a computer that generated sequences of Galois field patterns. They also provided some primitive interaction. The best known is GF.E (16,4) CNSM, 1969–1971, a 178 × 178 centimeter panel of colored lights that weighed half a ton, shown in Fig. 16. Far from the artists that used the computer to simulate existing art, Bonačić conceived of the computer as a tool to uncover a new world where scientists and artists could work together. In his aesthetic and artistic conception, he was very critical of the use of pure randomness in computer art. His reflections on this matter, which can be found in Bonačić (1974), led him to the creation of the work Random 63, 1969 (see Fig. 17). Bonačić (1974) is a good source for a first-hand explanation of both Bonačić’s works on Galois fields and his ideas on what computer art must be and how. Fritz (2008, 2011) sheds more light on the history of Bonačić’s artistic career and other works.

Fig. 16 The dynamic object GF.E (16,4) CNSM, 1969–1971, by Vladimir Bonačić. (©Dunja Donassy-Bonačić, with permission)

Fig. 17 Vladimir Bonačić’s Random 63, 1969. (Photo by L.B)
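To give a feeling for what a sequence of Galois field patterns can look like, here is a small Python sketch. It is an illustration only and does not reproduce the algorithm that drove Bonačić's objects. The choice of the field GF(16), of the primitive polynomial x^4 + x + 1, and of a four-light row per field element are all assumptions made for this example. Repeated multiplication by x visits each of the 15 nonzero elements of the field exactly once before returning to 1, which yields a deterministic sequence of patterns that is far from trivially regular.

```python
# x^4 + x + 1, a primitive polynomial over GF(2), written as a bit mask.
PRIMITIVE_POLY = 0b10011


def times_x(a):
    """Multiply the GF(16) element `a` (4-bit integer) by x, reducing mod the polynomial."""
    a <<= 1
    if a & 0b10000:            # degree reached 4: subtract (xor) the polynomial
        a ^= PRIMITIVE_POLY
    return a


def power_sequence():
    """The successive powers x^0, x^1, ..., x^14 as 4-bit integers."""
    sequence, a = [], 1
    for _ in range(15):
        sequence.append(a)
        a = times_x(a)
    return sequence


def as_lights(a):
    """Render a field element as a row of four lights: '#' on, '.' off."""
    return format(a, "04b").replace("1", "#").replace("0", ".")


patterns = [as_lights(a) for a in power_sequence()]
```

Printing `patterns` one row at a time shows a cycle of 15 light configurations with no repetition inside the cycle, a structured alternative to the pure randomness that Bonačić criticized.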

Anders Hoff aka Inconvergent

Hoff (2020) is a young digital artist who explores algorithms in depth. Since the beginning of his career as an artist, he has evolved from works inspired by nature simulation to more conceptual ones, leading him to a more combinatorial approach to creation. Hoff has a profound knowledge of programming languages, advanced data structures, and algorithms, from which he gets inspiration in his work. He is one of the current digital artists who think combinatorially.

Fig. 18 The plotter drawings 8dbcdd3 and 89143f7, 2017–2020, by inconvergent. (With permission of the author)

Inconvergent’s webpage (2020), a repository of his work, shows images of his digital prints and plotter drawings, but also some writings about how he approaches his artistic activity and some technical explanations on some of the algorithms he creates. Figure 18 shows two of his plotter drawings.

Other Combinatorial Visual Artists

Conceptual geometric abstraction is an artistic tendency that favors the use of combinatorial ideas. Besides Sol LeWitt, other artists have used combinatorial ideas in their work, although in a less committed way. In this sense, the most interesting works are the drawings of Kenneth Martin (1905–1984) and Mary Martin (1907–1969), but also the use of the grid structure by Luis Tomasello (1915–2014), the original colored paintings by Channa Horwitz (1932–2013), the graph-like structures of Gertrude Goldschmidt aka Gego (1912–1994), and the conceptual art by François Morellet (1926–2016). Algorithmic artists Vera Molnar and Manfred Mohr shared their interest in drawing with the computer with other digital art pioneers, who necessarily used combinatorics in their works, though they were interested in other areas of mathematics. Interesting works include those of Frieder Nake (born 1938), based on probability and Markov chains, some of them on the grid structure (see Fig. 19), the computer-calculated but hand-painted works of Hiroshi Kawano (1925–2012), the graphic experiments using random numbers by Georg Nees (1926–2016), and the geometric but somewhat combinatorial drawings of Edward Zajec (born 1938), to name only a few.

Fig. 19 Frieder Nake’s Walk-through-raster Series 7.1-6 and Walk-through-raster Series 2.1-4, 1966. (With permission of the author)

During the 1980s and 1990s, the rapid evolution of the computational power and graphic capabilities of computers awakened artists’ interest in graphic and computational challenges like simulation and 3D, far from the conceptual ideas and mathematical formalization of combinatorial artists. As we have already mentioned, the use of combinatorics has become natural for artists nowadays, often mixed with other techniques and methods. We have highlighted some of the works by Anders Hoff, aka inconvergent, because they represent a younger generation of computational artists that have a solid background in modern computer science (data structures and programming) and discrete mathematics. We are interested in artists that pay attention to the process, to the systems they create, something that will always involve the use of combinatorics. The following list is subjective and represents only a tiny sample: the minimalist conceptual art of Casey Reas, the artistic projects based on data visualization of Ryoji Ikeda, the generative works of the software designer and artist Reza Ali, the data visualization works of Manuel Lima, the computational geometric universe of Marius Watz, and the works Forms, 2012, by Memo Akten and Quayola; Colorfield, 2009–2016, by Jon McCormack; and Sphere Packing: Bach, 2018, by Rafael Lozano-Hemmer.

Dance, Theatre, and Cinema

Dance, cinema and theatre are artistic manifestations that revolve around the notion of “the spectator”: the processes of expression, communication, and reception are an important part of the artist’s interests. Moreover, in dance and theatre, there is an obvious interest in the body and movement. These notions are not easy to formalize using mathematics, and so there are few examples of combinatorics applied to these three forms of art. In most of them, artists get close to combinatorics through their interest in language and structure. We have combined them in this section for the sake of completeness.

Dance

Merce Cunningham (1919–2009) was an influential dancer and choreographer. For more than 50 years he developed an experimental and innovative body of work, in collaboration with artists from other fields. His lifelong partner, the musician John Cage, was known for the use of chance in his music, something that they also applied in their numerous works together. Cunningham is famous for applying the technique of collage to dance and for being the first choreographer to use the computer as a choreographic tool. The earliest dance Cunningham choreographed with the assistance of the computer program Life Forms was Trackers in 1991 (Copeland, 1999, 2002 and Cunningham, 1994). Lucinda Childs (born 1940) and Deborah Hay (born 1941) are two pioneering dancers and choreographers in the field of experimental dance. They both trained with Merce Cunningham. Lucinda Childs’ conceptual dance is based on creating patterns by means of the repetition, combination, and permutation of simple minimalistic movements. She has collaborated with Philip Glass and Sol LeWitt among others (Bither and Engberg, 2011). Deborah Hay is one of the founding members of the Judson Dance Theater. She used specific movements as permutable elements in 20 Permutations of 2 Sets of 3 Equal Parts in a Linear Pattern, 1969 (Hay and McDonagh, 1970). Matteo Fargion (born 1961), musician, and Jonathan Burrows (born 1960), choreographer, met in 1989 and have collaborated closely since then. Together they have built a language that explores the relationship between music and dance. They are interested in the process more than the result, and concerned with attention, rhythm, repetition, unpredictability, surprise, and counterpoint (see Burrows, 2020 and Fargion, 2020).

Theatre

Samuel Beckett (1906–1989) was a playwright, theatre director, writer, and poet known for his exploration of language. The linguistic experimentation present in his works, which throughout his career became increasingly minimalist, is embodied in Beckett’s use of mathematical processes as artistic techniques. Beckett tries to describe the world mathematically by presenting all (or many) possible combinations or permutations of a situation, and shows the contrast between the limited but precise world of mathematics and the imprecise but unlimited world of natural language. Quad, 1981, is a performance he wrote for television based on the number 4, which involves a very committed use of combinatorial structures (Stevens, 2010). Watt, 1945, is a novel that explores the absurd and the contradictions of language (Howard, 1994). Beckett’s universe has been and continues to be widely studied. The article by Brits (2019) is devoted to the use of mathematics in Beckett’s works.

Allan Kaprow (1927–2006) is known for inventing, in the late 1950s, the concept of happening, an artistic action which blurred the line between life and art, and between the artist and the audience. He advocated against competition in favor of a playful participation:

[...] This critical difference between gaming and playing cannot be ignored. Both involve free fantasy and apparent spontaneity, both may have clear structures, both may (but needn’t) require special skills that enhance the playing. Play, however, offers satisfaction, not in some stated practical outcome, some immediate accomplishment, but rather in continuous participation as its own end. Taking sides, victory, and defeat, all irrelevant in play, are the chief requisites of game. In play, one is carefree; in a game one is anxious about winning. (Kaprow, 1993, page 122)

His work influenced the Fluxus movement and the art of installation. Although his approach is not mathematical, in his explorations of the artistic language he used variations on a given number of elements in performance work. Some of the numerous and interesting essays he wrote about his work and ideas are collected in Kaprow (1993).

Cinema

In cinema, there are two different types of combinatorial works: abstract works that show combinatorial geometric structures and works that treat the narrative in a nonlinear combinatorial way. The earliest combinatorial abstract film we found is John Whitney’s Permutations, 1968, although we could find, with little effort, some combinatorial elements in early films classified as “visual music,” namely, the films by Viking Eggeling (1880–1925), for example, Symphonie Diagonale, 1924, and the works of Oskar Fischinger (1900–1967) (Jennings, 2015; Keefer and Guldemond, 2013). Hollis Frampton (1936–1984) was a photographer and experimental filmmaker who made audacious films that shunned narrative and adopted systems based on mathematics and linguistics for their organizational structures. His most famous film, Zorns Lemma, 1970, takes the title from a result in set theory. In fact, for Frampton, set theory permits the abstract representation of film’s capacity to catalogue intersecting planes of perception in infinite combinations (MacDonald, 1979; Ragona, 2004). Raul Ruiz (1941–2011), a Chilean filmmaker exiled in France since 1973, created an experimental cinema that explores lack of identity and loss of territory. Raul Ruiz’s cinema comes from a continuous reflection on narrative modes that results in a narrative universe made of interwoven stories which cross and turn over themselves. This generates a network of stories. All of them seem possible (Vásquez Rocca, 2013). Peter Greenaway (born 1942) is a filmmaker who uses numbers, sets, and proportions to help him structure a film. Greenaway writes:

I am constantly looking for something more substantial than narrative to hold the vocabulary of cinema together. I have constantly looked for, quoted, and invented organising principles that reflect temporal passing more successfully than narrative and that code behaviour more abstractly than narrative, performing these tasks with some form of passionate detachment. [...] Numbers help. Numbers can mean definable structure, readily understandable around the world. And numbers essentially carry no emotional overload. (Greenaway, 2005)

Closing Time

We have seen several musicians that work with combinatorics in one way or another: dodecaphonic music, a rigid system based on permutations of the twelve tones; Iannis Xenakis, a musician that wanted to formalize music using mathematics and who created combinatorial compositions; Tom Johnson, “the” combinatorial musician, the man who counts, whose whole production is inspired by the application of mathematical rules, most of them combinatorial; and Elliott Carter, an experimental musician who explored rhythm and harmony in a very combinatorial way. In literature, permutations have been the most used combinatorial structure since very early times. The examples of Juan Eduardo Cirlot, who managed to give permutations a mystical meaning, and Brion Gysin, who used them in a playful way as part of his diverse artistic production, show two very distinct approaches. Besides permutations, we have presented some works exploring the idea of potentiality, such as the novels by Cortázar and Marc Saporta, and also the Oulipo group. This group of writers was strongly involved in using combinatorics in their art in all possible forms, with the help of some of the members of the group that were also mathematicians. Like music and literature, visual art became combinatorial with the arrival of abstraction. However, in visual art, combinatorics is always mixed with geometry. We have presented the conceptual geometric art of Sol LeWitt, together with two digital artists, Vera Molnar and Manfred Mohr, who are the most combinatorial examples among the pioneers of computer art. These three artists were contemporary and shared ideas, but clearly have different approaches. Vladimir Bonačić, an artist who worked with hardware and electronics and was interested in collaboration between scientists and artists, built an emblematic installation based on a Galois field, an algebraic discrete structure, and made other interesting cybernetic artworks.
Many of the early computer artworks involved the use of combinatorial structures and methodologies, but not all artists adopted a combinatorial approach in their work. Among those who did, we found two main non-exclusive approaches: the ones that worked conceptually with combinatorial ideas, structures, and methodologies and the ones that were inspired by discrete structures and algorithms in a more technical manner. Combinatorial artists are those artists that merge these two interests in a balanced way. Concepts and ideas matter; methodologies and techniques matter too. This is the reason why Anders Hoff aka inconvergent is “the” artist we choose to represent the current generation of digital combinatorial artists. Notice that in dance, theatre, and cinema there are few examples. We cannot think of these examples as isolated cases, because they belong to the extended family of artists of all disciplines that worry about structure and about renewing language in a formal way, and in doing so, rely on mathematics.

The selection of artists we have presented shows a variety of combinatorial works and a variety of approaches to the use of combinatorics. Nowadays, combinatorics is in the toolbox of all artists, and we can find a huge number of works of art or artistic projects with the word “permutation” in the title. This current state of affairs is thanks to the pioneers that philosophically and artistically engaged in the paradigm changes that took place during the twentieth century. To the question we posed in the introduction of this chapter, “Are there artists who work or think combinatorially?” the answer is yes. Those artists (in order of appearance) are our beloved Tom Johnson, Georges Perec, Sol LeWitt, and Manfred Mohr.

References

Bailey RW (1973) Computer poems. Potagannissing Press, Michigan
Bellamann H (1933) Charles Ives: the man and his music. Music Q 19(1):45–58
Bense M (1956) Aesthetica II – aesthetische information. Agis-Verlag, Baden-Baden
Benson D (2006) Music: a mathematical offering. Cambridge University Press, Cambridge
Berenguer Martín J (2007) Fundamentos epistemológicos de la permutación en la poesía de Juan Eduardo Cirlot. Anuario de filología hispánica 10:69–81
Berge C (2016) Para un análisis potencial de la literatura combinatoria. In: Salceda H (ed) OULIPO. Atlas de Literatura Potencial, 1: Ideas Potentes. Pepitas de Calabaza, Logroño
Bernard JW (1988) The evolution of Elliott Carter’s rhythmic practice. Perspect New Music 26(2):164–203
Bill M (1996) The mathematical approach in contemporary art. In: Stiles K, Selz P (eds) Theories and documents of contemporary art: a sourcebook of artists’ writings. University of California Press, Berkeley, pp 74–77
Bither P, Engberg S (2011) Philip Glass and Lucinda Childs Discuss “Dance” (video). Walker Art Center. https://www.youtube.com/watch?v=LBcAEmIn19g. Accessed 24 Sept 2020
Bonačić V (1974) Kinetic art: application of abstract algebra to objects with computer-controlled flashing lights and sound combinations. Leonardo 7(3):193–200
Bonch-Osmolovskaya T (2018) Combinatorial greetings from Georges Perec. In: Torrence B, Torrence E, Séquin CH, Fenyvesi K, Kaplan CS (eds) Proceedings of Bridges Stockholm 2018: Mathematics, Music, Architecture, Education, Culture, Stockholm, 25–29 July 2018. Tessellations Publishing, Phoenix, pp 253–258
Brits B (2019) Beckett and mathematics. In: Rabaté JM (ed) The new Samuel Beckett studies. Cambridge University Press, Cambridge, pp 215–230
Burrows J (2020) Jonathan Burrows’ website. http://www.jonathanburrows.info. Accessed 24 Sept 2020
Calvino I (1968) Cibernetica e fantasmi (Appunti sulla narrativa come processo combinatorio). In: Le conferenze dell’Associazione Culturale Italiana 1967–68, Torino, pp 9–23
Calvino I (1987) Comment j’ai écrit un de mes livres. In: La Bibliothèque Oulipienne, vol 2. Ramsay, pp 24–44
Cameron PJ (1994) Combinatorics: topics, techniques, algorithms. Cambridge University Press, Cambridge
Carter E (2001) Harmony book. Carl Fischer, New York
Copeland R (1999) Cunningham, collage, and the computer. PAJ J Perform Art 21(3):42–54
Copeland R (2002) Merce Cunningham and the aesthetic of collage. Drama Rev 46(1):11–28
Corbett J (1998) Nothing is true; everything is permuted: the Brion Gysin/Steve Lacy songbook. Discourse 20(1/2):110–123
Cowell H (1969) New musical resources. Something Else Press, New York

74 Combinatorial Artists

1961

Cunningham M (1994) Four events that have led to large discoveries. In: Vaughan D, Harris M (eds) Merce Cunningham: fifty years. Aperture Press, New York, p 276 D’Ors M (1977) El Caligrama de Simmias a Apollinaire. Historia y antología de una tradición clásica. Universidad de Navarra, Pamplona Edwards JS (2005) The Carmina of publilius optatianus porphyrius and the creative process. In: Deroux C (ed) Studies in latin literature and roman history, vol XII. Collection Latomus, Bruxelles, pp 447–466 Fages F (2020) Personal communication Fargion M (2020) Matteo Fargion’s website https://www.matteofargion.com. Accessed 24 Sept 2020 Fritz D (2008) Vladimir Bonaˇci´c: computer-generated works made within Zagreb’s new tendencies network (1961–1973). Leonardo 41(2):175–183 Fritz D (2011) Images and sound created and synchronized by algorithm Vladimir Bonaˇci´c’s computer-generated interactive audiovisual object GF. E(16,4), 1969–1974. In: Proceedings of the International Conference Pierre Schaeffer: MediArt, Rijeka, 2011. Museum of Modern and Contemporary Art, Rijeka, pp 134–143 Funkhouser CT (2007) Prehistoric digital poetry. An archaeology of forms, 1959–1995. The University of Alabama Press, Tuscaloosa Greenaway P (2005) Some organizing principles. In: Emmer M (ed) The visual mind II. The MIT Press, Cambridge, pp 601–622 Hapke T (2012) Wilhelm Ostwald’s combinatorics as a link between in-formation and form. Libr Trends 61(2):286–303 Hay D, McDonagh D (1970) Interview with Deborah Hay. (2 audio disks) Higgins D (1987) Pattern poetry: guide to an unknown literature. SUNY Press, Albany Hoff A (2020) Anders Hoff a.k.a. inconvergent’s website https://inconvergent.net. Accessed 24 Sept 2020 Howard JA (1994) The roots of Beckett’s aesthetic: mathematical allusions in ‘Watt.’ (Samuel Beckett). Papers Lang Lit 30(4):346–351 I love e-poetry (webpage). http://iloveepoetry.com/. Accessed 24 Sept 2020 Janés C (2014) Juan Eduardo Cirlot. Cuando la palabra y la letra llaman a su forma. 
Tintas. Quaderni di letterature iberiche e iberoamericane Special issue pp 389–400 Jennings G (2015) Abstract video. The moving image in contemporary art. University of Clifornia Press, Oakland Johnson T (2020a) Editions 75 – works by Tom Johnson. http://www.editions75.com/. Accessed 24 Sept 2020 Johnson T (2020b) Tom Johnson – Youtube. http://www.youtube.com/channel/ UCUJX3s4zL0tc2vgnCNpDjAg. Accessed 24 Sept 2020 Johnson T, Jedrzejewski F (2014) Looking at numbers. Birkhäuser, Basel Kaprow A (1993) Essays on the blurring of art and life. University of California Press, Berkeley Keefer C, Guldemond J (2013) Oskar Fischinger 1900–1967. In: Experiments in cinematic abstraction. EYE Filmmuseum, Amsterdam Knoblock E (2002) The sounding algebra: relations between combinatorics and music from Mersenne to Euler. In: Assayag G, Feichtinger HG, Rodrigues JF (eds) Music and mathematics. A diderot mathematical forum. Springer, Berlin/Heidelberg Kuri JF (ed) (2003) Brion Gysin: tuning in to the multimedia age. Thames and Hudson, New York Lambert JP (1990) Interval cycles as compositional resources in the music of Charles Ives. Music Theory Spect 12(1):43–82 LeWitt S (1967a) Serial project #1. Aspen 5–6, 1967. http://www.ubu.com/aspen/aspen5and6/ serialProject.html. Accessed 24 Sept 2020 LeWitt S (1967b) Paragraphs on conceptual art. Artforum June 1967 LeWitt S (1969) Sentences on conceptual art. Art-Language 1(1) LeWitt S, Garrels G (2011) Untangling the puzzle of Sol LeWitt’s open cubes (video). San Francisco Museum of Modern Art. https://www.youtube.com/watch?v=w9ROCnWMPww

1962

L. Barrière

Lorenzi MG, Francaviglia M (2011) The role of mathematics in contemporary art at the turn of the millenium. Aplimat – J Appl Math 4(4):215–238 Loy G (2006) Musimathics. The mathematical foundations of music, vol 1. The MIT Press, Cambridge MacDonald S (1979) Interview with Hollis Frampton: “Zorns Lemma”. Q Rev Film Stud 4(1): 23–37 Martí Ortega E (2012) Juan Eduardo Cirlot y las golondrinas de Bécquer. Cuad Hispanoam 747: 39–48 Maurer WD (2004) The mathematics of Jazz. In: Sarhangi R, Séquin C (eds) Proceedings of Bridges 2004: Mathematical Connections in Art, Music, and Science, Winfield, Kansas. Central Plain Book Manufacturing, Winfield, Kansas, pp 273–280 McCormack J, d’Inverno M (eds) (2012) Computers and creativity. Springer, Berlin/Heidelberg Menabrea LF (1843) Sketch of the analytical engine invented by Charles Babbage. R and JE Taylor, London Meštrovi´c M (2005) Rendering scientific as the condition for humanization. In: From the particular to the general, Mladost, Zagreb. (Originally in the catalog of Nove Tendencije 2, 1963) Migayrou F (ed) (2018) Coder le Monde. Éditions du Centre Pompidou, Paris Mohr M (1971) Computer graphics. Une ésthetique programmée. Weberdruck, Pforzheim Mohr M (2019) Personal communication Mohr M (2020a) Manfred Mohr’s website. http://www.emohr.com/. Accessed 24 Sept 2020 Mohr M (2020b) Manfred Mohr’s vimeo video channel. https://vimeo.com/manfredmohr. Accessed 24 Sept 2020 Molnar V (1984) Regard sur mes images. Revue d’esthétique 7:116 Molnar V (2020) Vera Molnar. http://www.veramolnar.com/. Accessed 24 Sept 2020 Nolan C (2000) On musical space and combinatorics: historical and conceptual perspectives in music theory. In: Sarhangi R (ed) Bridges. Mathematical connections in art, music, and science. Winfield, Kansas, pp 201–208 Paul C (ed) (2016) A companion to digital art. Wiley Blackwell, Oxford Perec G (1976) Georges Perec présente La Vie mode d’emploi (video). Institut Nationale de l’Audiovisuel. 
Available online: https://www.ina.fr/video/I09365760/georges-perec-presentela-vie-mode-d-emploi-video.html. Accessed 24 Sept 2020 Perec G (1981) Quatre figures pour La Vie mode d’emploi. L’Arc 76:50–54 Polanski L, Barnett A, Winter M (2011) A few more words about James Tenney: dissonant counterpoint and statistical feedback. J Math Music 5(2):63–82 Ragona M (2004) Hidden noise: strategies of sound Montage in the films of Hollis Frampton. October 109:96–118 Read RC (1997) Combinatorial problems in the theory of music. Discr Math 167–168:543–551 Rottman M (2015) Donald Judd’s arithmetics and Sol LeWitt’s combinatorics. On the relationship between visual and mathematical in New York art around 1960. In: Emmer M (ed) Imagine math 3. Between culture and mathematics. Springer, Cham Saunders J (2008) Modular music. Perspect New Music 46(1):152–193 Schoenberg A (1975) Composition with twelve tones. In: Schoenberg A (ed) Style and idea. Selected writings of Arnold Schoenberg. Faber & Faber, London, pp 102–143 Sethares WA (2007) Rhythm and transforms. Springer, London Shanken E (ed) (2009) Art and electronic media. Paidon Press, New York Stevens B (2010) A purgatorial calculus: Beckett’s mathematics in “Quad”. In: Gontarski SE (ed) A companion to Samuel Beckett. Wiley-Blackwell, Chischester, pp 164–181 Stinson DR (2004) Combinatorial designs: constructions and analysis. Springer, New York Swift J (1735) Gulliver travel’s. Available at the Internet Archive: https://archive.org/details/ thetravelsofdoctorlemuelgulliverintoremotenationsoftheworld. Accessed 24 Sept 2020 Vásquez Rocca A (2013) Diálogo de exiliados, cine y políticas estéticas en Latinoamérica: Raúl Ruiz, territorios, ontología de lo fantástico y polisemia visual. Nómadas. Crit J Soc Juridical Sci 187–211. https://dialnet.unirioja.es/servlet/articulo?codigo=4232804

74 Combinatorial Artists

1963

Vega A, Weibel P, Zielinski S (2018) DIA-LOGOS. Ramon Llull and the Ars Combinatoria (brochure). ZKM | Center for Art and Media, Karlsruhe. https://zkm.de/media/file/en/dialogos_ broschure-en_digital.pdf. Accessed 24 Sept 2020 Vriezen S (2017) Diagrams, games and time (Towards the analysisi of open form scores). In: Pareyon G, Pina-Romero S, Agustín-Aquino OA, Lluis-Puebla E (eds) The musicalmathematical mind. Patterns and transformations. Springer, Cham Vriezen S (2020) Personal communication Weiss J (1991) Writing at risk: interviews in Paris with uncommon writers. University of Iowa Press, Iowa City Winter M (2019) A few more thoughts about Leibniz: the prediction of harmonic distance in harmonic space. MusMat Braz J Music Math 3:79–92 Xenakis I (1971) Formalized music: thought and mathematics in composition. Pendragon Press, Styuvesant Zweig J (1997) Ars combinatoria: mystical systems, procedural art, and the computer. Art J 56(3):20–29

Part V Mathematics, Science, and Dynamical Systems

75 Mathematics, Science, and Dynamical Systems: An Introduction

Torsten Lindström and Bharath Sriraman

Abstract In this short introduction, the section “Mathematics, Science, and Dynamical Systems” of the Handbook of the Mathematics of the Arts and Sciences is summarized.

Keywords Dynamical systems · Differential equations · Difference equations

T. Lindström, Department of Mathematics, Linnaeus University, Växjö, Sweden
B. Sriraman, Department of Mathematical Sciences, University of Montana, Missoula, MT, USA
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_143

Mathematics is usually considered to be an abstract and general science for problem-solving and method development. It comprises the development, use, and analysis of methods. In some cases, the context in which an original problem or a class of problems was solved presents opportunities for axiomatization, and a mathematical theory can develop out of it (e.g., the theory of differential and partial differential equations). In other words, the abstraction level is high enough to allow an analysis of the problems outside their original context (pure mathematics). This approach allows for focusing on the problem structure, isolating key dependencies, and analyzing the generality of the methods. In other cases, the coupling to the original context is explicit and requires a solid understanding of both the mathematical methods and the particular application in question (applied mathematics). In rare instances, theoretical mathematics such as Minkowskian space-time geometry or Birkhoff’s ergodic theorem precipitates subsequent advances in the study of dynamical systems.

A dynamical system is a model designed for predicting the future given a current state. In most cases, the model is a differential equation, but it can also be based on recursion, i.e., on difference equations, which result in discrete dynamical systems. Differential equations have a number of advantages in this context. On an infinitesimal time-scale, the various contributions tend to operate independently and can simply be added. This makes it possible to build models that, in principle, do the prediction for very extensive setups of processes. The process of integration is also better understood than summation in general, and this gives another advantage to the differential equation approach. The laws of nature can in most cases be expressed as a set of differential equations. This means that the changes in the abundances of certain quantities are given as functions of these abundances. The modeling itself is the easy part in this case, but integrating these equations to formulate the solutions brings us in many cases to the unknown. Even if some simplifications of reality can be completely understood, taking an additional aspect into account might change the situation entirely. In other words, the complexity can increase radically with the addition of variables (e.g., models of climate change). The study of dynamical systems today includes problems that are studied outside their original context as well as problems that have a clear connection to their particular application. This section of the handbook provides a number of contributions covering both directions.
Indeed, the contributions range over topics such as model parameter validation and statistics, possibilities for using dynamical systems to solve equations and optimization problems, descriptions of evolution, a detailed description of an extremely nonlinear phenomenon called limit cycles, predictions of the spread of invasive species, and recurrent population dynamics. We have tried to order the chapters in this section accordingly, starting from contributions that are quite close to their original applications and ending with contributions that show a tendency to become separated from them. The study of dynamical systems covers a wide spectrum of problems from both the physical and natural sciences, and the mathematics can involve deep notions from both ergodic theory and topology. The chapters in this section also reveal this range of mathematics.
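To make the distinction between differential and difference equations drawn in this introduction concrete, the following minimal sketch discretizes the linear equation dx/dt = r x with the explicit Euler rule, which turns it into the difference equation x_{k+1} = x_k + h r x_k. It is not taken from any chapter of this section: the model equation, the parameter values, and the function names `euler_orbit`, `exact_solution`, and `euler_error` are illustrative assumptions.

```python
import math


def euler_orbit(rate, x0, step, n_steps):
    """Iterate the difference equation x_{k+1} = x_k + step * rate * x_k.

    This is the explicit Euler discretization of dx/dt = rate * x
    (an illustrative model, assumed here for simplicity).
    """
    orbit = [x0]
    for _ in range(n_steps):
        orbit.append(orbit[-1] + step * rate * orbit[-1])
    return orbit


def exact_solution(rate, x0, t):
    """Closed-form solution x(t) = x0 * exp(rate * t) of dx/dt = rate * x."""
    return x0 * math.exp(rate * t)


def euler_error(rate, x0, t_end, n_steps):
    """Absolute gap at time t_end between the discrete and continuous models."""
    step = t_end / n_steps
    discrete = euler_orbit(rate, x0, step, n_steps)[-1]
    return abs(discrete - exact_solution(rate, x0, t_end))


if __name__ == "__main__":
    # The discrete system approaches the continuous one as the step shrinks.
    for n_steps in (10, 100, 1000):
        print(n_steps, euler_error(-1.0, 1.0, 1.0, n_steps))
```

In this simple case the two descriptions agree in the limit of small steps; for nonlinear models the discrete and continuous versions may behave quite differently, which is one reason both kinds of systems are studied.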

76 Modern Ergodic Theory: From a Physics Hypothesis to a Mathematical Theory with Transformative Interdisciplinary Impact

Doğan Çömez

Contents
Prelude
Origins
Consequence of the Ergodic Theorem and Other Significant Results
Interdisciplinary Aspects of Ergodic Theory in Mathematics
Number Theory
Combinatorics
Functional Analysis and Harmonic Analysis
Fractal Geometry
Interdisciplinary Aspects of Ergodic Theory with Other Disciplines
References

Abstract Ergodic theory emerged as a statistical mechanics hypothesis and quickly grew into a mature and influential mathematical theory. Beginning with a brief historical account of the origins of the theory, the first two sections of this chapter aim to provide a broad exposé of some major results in ergodic theory and dynamical systems. The remaining sections are devoted to the discussion of the interdisciplinary nature of ergodic theory from a broad perspective. Select applications of the theory are outlined, and its interactions with other mathematical fields and with some nonmathematical disciplines are briefly illustrated.

D. Çömez, Department of Mathematics, North Dakota State University, Fargo, ND, USA
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_31

Keywords Ergodic theory · Dynamical system · Ergodic transformation · Measure-preserving transformation · Invariant measure · Interdisciplinary aspects of ergodic theory

Prelude

The history of the immense body of knowledge and technology that humans have created clearly indicates that mathematics was the first body of knowledge to emerge as an independent discipline. In the beginning, particularly in ancient Egypt and Mesopotamia, mathematics consisted merely of a collection of “technical methods” that were useful in solving practical problems in surveying, construction, and astronomical prediction. Greek thinkers made the leap of transforming these practical methods into an abstract and organized collection of concepts, as exemplified in the Elements of Euclid or the Conics of Apollonius. The conditions that led to this transformation were peculiar to the social and cultural environment of Greek society, which valued intellectual pursuits. This intellectual pursuit also took the form of making connections; hence, we see attempts to describe musical harmony using properties of numbers and to apply geometric concepts to geography and astronomy. During the Roman and Islamic periods that followed, due to various political and cultural changes, the development of mathematics continued, albeit at a slower pace and with a shift toward ideas aligned with practical purposes rather than the pursuit of knowledge for its own sake. The emphasis was more on applications, especially for the Romans, who favored the engineering aspect. Islamic intellectuals continued the Greek tradition, as exemplified in the al-jabr of al-Khwarizmi, but they were equally interested in applications that would make social life orderly, such as calculating prayer times and predicting the phases of the moon accurately; hence, their other contributions focused more on trigonometry. Even in these early stages of development, one encounters significant interactions between mathematical ideas and other disciplines such as astronomy, music, geography, and engineering.
Needless to say, these interactions have continued with increasing intensity, resulting in numerous interdisciplinary subjects. In the broadest terms, a discipline is considered interdisciplinary if it combines or involves two or more (rather distinct) fields of study. In today’s scientific and intellectual environment, this description is rather restrictive. Some disciplines have evolved to the point of including several sub-disciplines that have their own methodologies and perspectives. In such a case, one can also speak of “interdisciplinary” subfields within a vast field.

The evolution of many STEM disciplines suggests that the emergence of an intellectual or technological discipline, whether as an independent discipline or as a subfield, is rarely an isolated occurrence; rather, it is the result of interaction among several disciplines. Therefore, it is not an exaggeration to posit that many disciplines first come into being as an interdisciplinary field and grow into an independent one over time. In one case, an idea, method, or practice in a certain discipline finds a fruitful application to a situation or problem in another discipline, often in a transformative way. Over time this leads to new or different methodologies and practices within the latter. Indeed, sometimes the newly formed discipline evolves into an independent discipline in itself. There are also cases where a problem or a phenomenon in a discipline calls for ideas from one or more others, which leads to a new sub-discipline that utilizes methodologies and ideas from all the disciplines involved while generating its own in the meantime. Occasionally, a new idea or development within a field gives rise to new practices and eventually grows apart from the mother discipline into an independent one. As an example, we can consider the emergence and development of computer science out of a subfield of mathematics.

In whichever manner disciplines or sub-disciplines emerge and evolve, they naturally keep interacting strongly with their “mother” disciplines in one form or another. Hence, not surprisingly, some consider them interdisciplinary fields, whereas others may object to this view. Depending on the discipline or the subfield, the arguments pro or con may have merit. In this chapter, we will take the broad view of the disciplines under discussion as commonly accepted today. We will also measure the level of the interdisciplinary aspect of a field by the impact and contributions of its interaction with other sub-disciplines in the same broad discipline or with other disciplines.

Mathematics at large, and many of its sub-disciplines in particular, is a prime example of a body of knowledge that has always been at the heart of many interdisciplinary activities. This is partly because it provides the most definitive and effective language for other disciplines, and mostly because its methodologies and concepts are devoid of ambiguity owing to its deductive nature.
During its long history, mathematics has experienced significant growth, particularly from the Age of Enlightenment to the present. Now it is a vast discipline containing numerous diverse fields. Speaking about the impact or application of mathematics is often not descriptive unless the particular subfield and the particular idea, concept, or method are mentioned. Our central field will be ergodic theory, which initially emerged independently of classical dynamical systems and over time has grown to encompass measurable dynamical systems. We will exhibit the interactions of ergodic theory, together with its dynamical system component, within mathematics as well as with nonmathematical disciplines.

Origins

Physics has always been one of the most prolific sources of ideas and concepts for mathematicians. This is particularly the case if one looks at the development of many mathematical theories in the last two centuries. For instance, Fourier analysis, the theory of distributions, and string theory all have their origins tied to physical concepts. Ergodic theory is a prime case of such a field: it originated from a physical idea, grew into a mathematical field, and interacts with other fields within mathematics as well as with other (nonmathematical) disciplines, including physics. Ergodic theory, which focuses on the statistical properties of a system, initially emerged independently of classical dynamical systems and over time became synonymous with, or inseparable from, measurable dynamical systems.

The earliest ideas in ergodic theory have their origins in the attempts of the pioneers of statistical mechanics, such as Boltzmann, Maxwell, and Gibbs, to investigate connections between the ensembles typically studied in statistical mechanics and the properties of single systems. More specifically, their work led them to postulate the existence and equality of infinite time averages and phase averages. Consider a physical system consisting of a (large) number of particles, say $N$, confined in a compact phase space $X$; this is usually an energy surface. The evolution of the system can be described by the trajectory of a point $x = (p, q)$ in this space, where $p, q \in \mathbb{R}^N$ represent the positions and momenta of all $N$ particles in the system. We can also assume that $X \subset \mathbb{R}^{2N}$ is a compact manifold inheriting its topological structure from $\mathbb{R}^{2N}$. By the same token, we can consider $X$ as having a measurable structure inherited from $\mathbb{R}^{2N}$; in other words, $X$ is a probability space $(X, \mathcal{B}, \mu)$, where $\mathcal{B}$ is the Borel $\sigma$-algebra of subsets of $X$ and $\mu$ is the normalized Lebesgue measure. Now, given an initial state $x$, the equations of motion of such a system always have a unique solution, which determines the state $T_t(p, q) = (p(t), q(t))$ at any time $t \ge 0$. In particular, this gives us a one-parameter continuous flow of transformations $\tau = \{T_t\}_{t \in \mathbb{R}}$ on the phase space $X \subset \mathbb{R}^{2N}$ that describes the evolution of the system. The quadruple $(X, \mathcal{B}, \mu, \tau)$ is then a measurable as well as a topological dynamical system, and the orbit of a point $x$ is the set $O_x = \{T_t(x)\}_{t \in \mathbb{R}} \subset \mathbb{R}^{2N}$.
By Liouville’s theorem, $\tau$ preserves the normalized Lebesgue measure on $X$, i.e., each $T_t$ is a (Lebesgue) measure-preserving transformation. If $f : X \to \mathbb{R}$ denotes a physical quantity measured during an experiment, then for any $t \ge 0$, $f(T_t x)$ is the value it takes at the instant $t$, provided that the system is at $x$ at time $t = 0$. It is argued that measuring the precise values of $f(T_t x)$ is not possible, since it requires knowing the detailed positions and momenta of all $N$ particles. Hence, it is assumed that the result of a measurement is actually the time average of $f$, i.e.,

\[ \frac{1}{t} \int_0^t f(T_s x)\, ds \quad \text{(time average of } f\text{)}. \]

Indeed, since the macroscopic time interval of a measurement is extremely large from the microscopic point of view, one may actually consider the limit of the time averages:

\[ \lim_{t \to \infty} \frac{1}{t} \int_0^t f(T_s x)\, ds. \]


Originally, Boltzmann argued that such a system left to itself will pass through all the points of the phase space; hence, the phase space is completely filled by the orbit of a single particle. He then deduced that the time average should coincide with the average value of $f$ over $X$, which is $\int_X f(x)\, d\mu$ (space average of $f$). Thus, he hypothesized that

\[ \lim_{t \to \infty} \frac{1}{t} \int_0^t f(T_s x)\, ds = \int_X f(x)\, d\mu. \]

This is the ergodic hypothesis of Boltzmann. Although it was instrumental in laying the foundation of statistical mechanics, it has been a controversial hypothesis since its inception. For example, some influential figures of that time, such as Landau, raised doubts about the verifiability of this hypothesis for many systems. There is a very good account of these historical arguments in the sources (Patrascioiu 1987; Sklar 1993; van Leth 2001).

The original formulation of the hypothesis has some mathematical issues with regard to rigor. First of all, $f$ must be integrable on $X$, which is usually the case. However, there is a deeper problem. Mathematically, an orbit of the flow is a smooth curve, that is, the image of an interval in $\mathbb{R}$ under a differentiable map, and any such curve in $\mathbb{R}^n$ is one dimensional. Therefore, such a curve cannot fill a space of dimension greater than one. This is an apparent contradiction to Boltzmann’s assumption that “the orbit of a single point in the phase space visits every point in the space.” Having recognized this contradiction, around 1911 P. Ehrenfest (a student of Boltzmann) modified this assumption into what is known today as the quasi-ergodic hypothesis: “The orbit of a single point comes arbitrarily close to any point in the phase space.” Mathematically speaking, this means that the orbit of a point is dense in the phase space. This is a reasonable assumption, which is accepted as the actual and workable hypothesis by adherents of the theory and many mathematicians.

While working on some problems in celestial mechanics, around 1893, the well-known mathematician H. Poincaré observed an important property of measure-preserving transformations on probability spaces:

Theorem 1 (Poincaré recurrence theorem). Let $T$ be a measure-preserving transformation on a probability space $(X, \mathcal{B}, \mu)$. If $E \in \mathcal{B}$ with $\mu(E) > 0$, then for almost every $x \in E$, there exists $k \ge 1$ such that $T^k x \in E$. That is, almost every point of $E$ returns to $E$ (and does so infinitely often).
Soon after its appearance, this innocent but powerful result stirred quite a controversy among many scholars in mathematics, physics, and philosophy. For instance, it suggests that in a box containing a large number of gas molecules, left on its own for a long period of time, there will be an instant at which all molecules occupy the right half of the box while no molecule resides in the other half! Since this result has many interesting consequences within mathematics, physics, and astronomy, investigations of the qualitative behavior of orbits were continued by Lyapunov (1892), Perron (1929), Kolmogorov (1954), Sinai (1959), and many others. Observations that small changes in the initial conditions of some differential equations result in large deviations led to concepts like bifurcation and chaotic behavior as we know them today. Their pioneering work was extended by Kolmogorov and his students to the concept of entropy, which, roughly speaking, is a measure of the complexity of a system.

Recall that the Poincaré recurrence theorem guarantees that an initial state in a set of positive measure returns infinitely often to this set. Naturally, one asks about the rate at which such a state returns to the set. With a clever use of Birkhoff’s ergodic theorem, Kac proved that the average return time to the set is inversely proportional to the measure of the set (Kac 1947). Furthermore, if the point is originally outside the given set $A$ of positive measure, there is no guarantee that it ever enters $A$. The condition that guarantees this for almost every point in the system is known as the ergodicity of the transformation, which states that the only sets invariant under the map (up to sets of measure zero) are the whole space and the empty set. Having defined ergodicity, it is now time to check whether we can expect the time average to be equal to the space average in such systems, as claimed. The result stating this fact, known as the ergodic theorem or Birkhoff’s ergodic theorem, is the jewel of ergodic theory.
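As a numerical aside, the return-time statement of Kac quoted above can be observed along a single orbit. The sketch below is only an illustration under stated assumptions: the system is the circle rotation $T(x) = x + \alpha \pmod 1$ with $\alpha$ the golden-ratio conjugate (an irrational rotation, which preserves Lebesgue measure and is ergodic), the set is $A = [0, \ell)$, and the names `ALPHA`, `return_times`, and `mean_return_time` are hypothetical. For such a system the mean gap between consecutive visits to $A$ should be close to $1/\mu(A) = 1/\ell$.

```python
import math

# Rotation number: the golden-ratio conjugate, an irrational (assumed choice).
ALPHA = (math.sqrt(5.0) - 1.0) / 2.0


def return_times(x0, set_length, n_steps, alpha=ALPHA):
    """Gaps between consecutive visits of the orbit of x0 to A = [0, set_length).

    The map is the circle rotation T(x) = x + alpha (mod 1); its k-th iterate
    is computed directly as (x0 + k * alpha) mod 1.
    """
    visits = [k for k in range(n_steps) if (x0 + k * alpha) % 1.0 < set_length]
    return [later - earlier for earlier, later in zip(visits, visits[1:])]


def mean_return_time(x0, set_length, n_steps, alpha=ALPHA):
    """Average gap between consecutive visits to A along one orbit."""
    gaps = return_times(x0, set_length, n_steps, alpha)
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Compare the observed mean return time with 1 / mu(A) for several sets A.
    for set_length in (0.5, 0.1, 0.02):
        observed = mean_return_time(set_length / 2.0, set_length, 100_000)
        print(set_length, observed, 1.0 / set_length)
```

Halving the size of $A$ roughly doubles the observed mean return time, which is the inverse proportionality described in the text.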

Theorem 2 (Birkhoff’s ergodic theorem, discrete parameter version). Let $(X, \mu)$ be a probability space, $T : X \to X$ a measure-preserving transformation, and $f \in L^1(X)$. Then

(a) $\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x) = f^*(x)$ exists for almost every $x \in X$,
(b) $f^*(T x) = f^*(x)$ for almost every $x \in X$ (i.e., the limit is $T$-invariant),
(c) if $T$ is ergodic, then $f^*(x) = \int_X f(x)\, d\mu$ for almost every $x \in X$.

Hence, Birkhoff’s ergodic theorem, proved in 1931 (Birkhoff 1931), confirms that for ergodic dynamical systems the time average must be equal to the space average. However, one should notice that the assertions may fail on a set of measure zero. This may be a concern for the philosophically minded, since there are sets that are topologically dense in $X$ while having measure zero.

If one shifts focus to the mathematical side of the issue, it is an understatement to say that the ergodic hypothesis created tremendous interest among mathematicians of the early twentieth century. For mathematicians, the main problems of interest were:

1. Which (measurable) dynamical systems satisfy the ergodic hypothesis?
2. In a dynamical system, can we always expect the time average to be equal to the space average? If not, what is the limit of the time averages?
3. What is the structure of dynamical systems satisfying ergodicity?
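The second question can be probed numerically in a very simple case. The following sketch is an illustration only, under assumptions not made in the chapter: the system is the circle rotation $T(x) = x + \alpha \pmod 1$ with Lebesgue measure, the observable is $f(x) = x^2$ (space average $1/3$), and the names `GOLDEN`, `time_average`, and `square` are hypothetical. For an irrational $\alpha$ the rotation is ergodic and the Birkhoff averages approach $1/3$ from any starting point; for $\alpha = 1/2$ the rotation is not ergodic, and the limit still exists but depends on the starting point, in line with parts (a) and (b) of Theorem 2.

```python
import math

GOLDEN = (math.sqrt(5.0) - 1.0) / 2.0  # irrational rotation number (assumed choice)


def time_average(f, x0, alpha, n_steps):
    """Birkhoff average (1/n) * sum_{k<n} f(T^k x0) for T(x) = x + alpha (mod 1)."""
    return sum(f((x0 + k * alpha) % 1.0) for k in range(n_steps)) / n_steps


def square(x):
    """Observable f(x) = x^2; its space average over [0, 1) is 1/3."""
    return x * x


if __name__ == "__main__":
    for x0 in (0.1, 0.3):
        ergodic_case = time_average(square, x0, GOLDEN, 100_000)
        periodic_case = time_average(square, x0, 0.5, 100_000)
        # Ergodic rotation: close to 1/3 for both starting points.
        # Rotation by 1/2: the orbit has two points, so the average depends on x0.
        print(x0, ergodic_case, periodic_case)
```

In the non-ergodic case the orbit of $x_0$ consists of the two points $x_0$ and $x_0 + 1/2$, so the time average is simply the mean of $f$ over these two points.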


Each of these questions (and subsequent inquiries) led to a prolific output of results that not only provided answers to these questions but also paved the way to new ideas and techniques that impacted many fields within mathematics and nonmathematical disciplines. To name a few: Birkhoff’s ergodic theorem has been extended and generalized to many different settings, and numerous systems (both purely mathematical and physical) have been proved to be ergodic. Deeper investigations of the structure of measure-preserving systems led to many properties beyond ergodicity, such as mixing, weak mixing, and exactness, which provide finer statistical information about measure-preserving transformations. More specifically, a measure-preserving transformation $T$ is

• mixing if $\lim_{n \to \infty} \mu(T^{-n} A \cap B) = \mu(A)\mu(B)$,
• weak-mixing if $\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} |\mu(T^{-k} A \cap B) - \mu(A)\mu(B)| = 0$,

for all measurable sets $A$ and $B$. Furthermore, the hierarchy between these types of transformations is

\[ T \text{ is exact} \;\Rightarrow\; T \text{ is mixing} \;\Rightarrow\; T \text{ is weak-mixing} \;\Rightarrow\; T \text{ is ergodic}. \]

Indeed, there are also several types of transformations between each of these categories, and none of these implications can be reversed. Also, by a remarkable theorem of Rohlin (1967), any measure-preserving transformation can be decomposed into ergodic components, although this decomposition can be very involved.
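The mixing condition above can be probed by simulation. The sketch below is a hedged illustration rather than a method from the chapter: it uses two standard Lebesgue-measure-preserving maps of $[0, 1)$ chosen here as assumptions, the doubling map $T(x) = 2x \pmod 1$ (which is mixing) and an irrational rotation (which is ergodic but not mixing), takes $A = B = [0, 1/2)$, and estimates $\mu(T^{-n}A \cap B)$ by uniform random sampling. All function names are hypothetical.

```python
import math
import random

GOLDEN = (math.sqrt(5.0) - 1.0) / 2.0  # irrational rotation number (assumed choice)


def doubling_iterate(x, n):
    """n-th iterate of the doubling map T(x) = 2x (mod 1).

    Scaling by a power of two and reducing mod 1 are exact in binary floating
    point, but a float carries only 53 binary digits, so n must stay well below 53.
    """
    return (x * 2.0 ** n) % 1.0


def rotation_iterate(x, n):
    """n-th iterate of the rotation T(x) = x + GOLDEN (mod 1)."""
    return (x + n * GOLDEN) % 1.0


def uniform_samples(count, seed=0):
    """Reproducible uniform samples from [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


def correlation(iterate, n, samples):
    """Monte Carlo estimate of mu(T^{-n}A ∩ B) for A = B = [0, 1/2).

    A point x lies in T^{-n}A ∩ B exactly when x is in B and T^n x is in A.
    """
    hits = sum(1 for x in samples if x < 0.5 and iterate(x, n) < 0.5)
    return hits / len(samples)


if __name__ == "__main__":
    samples = uniform_samples(100_000)
    for n in (1, 2, 3, 5, 8, 13):
        # Doubling map: estimates stay near mu(A) * mu(B) = 0.25.
        # Rotation: estimates keep oscillating and do not settle at 0.25.
        print(n, correlation(doubling_iterate, n, samples),
              correlation(rotation_iterate, n, samples))
```

For the rotation the estimates oscillate, so the limit in the mixing condition does not exist for this choice of $A$ and $B$, even though the rotation is ergodic.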

Consequence of the Ergodic Theorem and Other Significant Results

Being, undoubtedly, the most fundamental result in dynamical systems and ergodic theory, Birkhoff’s ergodic theorem has over the years been generalized and extended to numerous settings. For easy reference in the discussions of the next sections, we outline some of these. The first two are operator-theoretic generalizations. A contraction T on $L_1(X)$, where (X, μ) is a σ-finite measure space, is a linear operator that satisfies the condition $\|T\|_1 \le 1$; if it also satisfies $\|T\|_\infty \le 1$, it is called an $L_1 - L_\infty$-contraction.

Theorem 3 (Dunford and Schwartz 1956). Let (X, μ) be a σ-finite measure space and $T : L_1(X) \to L_1(X)$ be an $L_1 - L_\infty$-contraction. Then, for all $f \in L_p(X)$, 1 ≤ p < ∞, $\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} T^k f(x)$ exists for a.e. x ∈ X.

If T is an $L_1$-contraction only, then the assertion is no longer valid; hence, the condition that T be an $L_1 - L_\infty$-contraction in the Dunford-Schwartz theorem cannot be dropped.


D. Çömez

There are other operator-theoretic generalizations of Birkhoff’s ergodic theorem; a notable one is the analogous result when T is an $L_p$-contraction (i.e., $\|T\|_p \le 1$), 1 < p < ∞. By the Riesz interpolation theorem, an $L_1 - L_\infty$-contraction is also an $L_p$-contraction for all 1 < p < ∞ (Koopman operators induced by measure-preserving transformations are prime examples of such operators). On the other hand, this is not the case for $L_p$-contractions when 1 < p < ∞; namely, an $L_p$-contraction need not be an $L_q$-contraction if p ≠ q and 1 < p, q < ∞.

Theorem 4 (Akcoglu 1975). Let (X, μ) be a σ-finite measure space, and let $T : L_p(X) \to L_p(X)$ be a positive contraction for some 1 < p < ∞. Then, for all $f \in L_p(X)$, $\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} T^k f(x)$ exists for a.e. x ∈ X.

It is known that the condition of positivity in Akcoglu’s theorem cannot be dropped when p = 2, and it is still an open problem whether positivity can be dropped when p ≠ 2. Many physical and mathematical situations involve multiple operators; hence, obtaining multiparameter versions of (one-parameter) ergodic theorems is a natural endeavor. In some fields of mathematics, extending a one-parameter statement to higher parameters is merely a simple exercise; however, this is not the case for multiparameter extensions of ergodic theorems, especially those involving a.e. convergence of ergodic averages. The following are two extensions of the ergodic theorem to the multiparameter setting. Let T : X → X and S : X → X be two measure-preserving transformations, or let $T, S : L_p(X) \to L_p(X)$ be linear contractions, 1 ≤ p ≤ ∞. Then it makes sense to consider multiparameter averages of the form $\frac{1}{mn}\sum_{i,j=1}^{m,n} T^i S^j f(x)$.

Theorem 5 (Fava 1972; Zygmund 1951). Let (X, μ) be a σ-finite measure space and $T, S : L_1(X) \to L_1(X)$ be positive $L_1 - L_\infty$-contractions. Then for all $f \in L \log L(X)$, the limit of multiparameter averages exists a.e. and

$$\lim_{m,n\to\infty} \frac{1}{mn}\sum_{i,j=1}^{m,n} T^i S^j f(x) = \Big(\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} T^k f(x)\Big)\Big(\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} S^k f(x)\Big).$$

This statement is not true if $f \in L_1$. On the other hand, if the operators commute and the averaging is limited to “squares,” then multiparameter averages converge for $f \in L_1$ as well:

Theorem 6 (Brunel 1973; Dunford and Schwartz 1956). If (X, μ) is a σ-finite measure space and $T, S : L_1(X) \to L_1(X)$ are commuting $L_1 - L_\infty$-contractions, then, for all $f \in L_1(X)$, $\lim_{n\to\infty} \frac{1}{n^2}\sum_{i,j=1}^{n} T^i S^j f(x)$ exists a.e.

Dunford and Schwartz proved this theorem in the setting of commuting measure-preserving flows; later, Brunel provided the discrete version stated above.


One can view the Lebesgue differentiation theorem as a result that recovers a function from its averages. Following this approach, one can ask whether it is possible to recover f from the averages $\frac{1}{t}\int_0^t f(T_s x)\,ds$. Surprisingly, the answer is “yes” in many cases; such results are known as local ergodic theorems.

Theorem 7 (Local ergodic theorem, Wiener 1939). Let $\{T_t\}$ be a continuous one-parameter flow of measure-preserving transformations on X such that $T_0 = I$. Then, for all $f \in L_1(X)$,

$$\lim_{t\to 0^+} \frac{1}{t}\int_0^t f(T_s x)\,ds = f(x) \quad \text{a.e.}$$

Extension of this result to the multiparameter setting was proved by Terrell (1971), and to positive linear $L_1$-contractions by Ornstein (1970) and Akcoglu and del Junco (1981). Another extension of the ergodic theorem is along subsequences. Imagine that we are able to make precise measurements at each point $T^k x$ along the orbit of a point in our dynamical system. Suppose that you asked one of your graduate students to make the measurements. But the poor student is sleep deprived and, rather than making measurements at each $T^n x$, n = 0, 1, 2, . . . , made measurements only at times $n_1, n_2, n_3, \dots$ Having realized this mistake, the student was afraid to tell the truth; hence, you ended up with the averages $\frac{1}{N}\sum_{k=0}^{N-1} f(T^{n_k} x)$. Does $\lim_N \frac{1}{N}\sum_{k=0}^{N-1} f(T^{n_k} x)$ exist a.e.? If so, does it converge to the right value? If not, what conditions on the sequence $(n_k)$ guarantee an affirmative result? The answer turns out to be affirmative in some cases and negative in others. Below is a sample list (out of a vast literature) of sequences $\{n_k\}$ along which a.e. convergence holds in the indicated space of functions (i.e., $\lim_N \frac{1}{N}\sum_{k=0}^{N-1} f(T^{n_k} x)$ exists a.e.):

• The sequence of square-free integers, in $L_1$ (Boshernitzan and Wierdl 1996),
• The sequences $[n^{3/2}]$ and $[n \log n]$, in $L_p$, 1 < p < ∞ (Boshernitzan and Wierdl 1996),
• Return time sequences, in $L_2$ (Bourgain 1989),
• Randomly generated sequences of positive density, in $L_1$; randomly generated sequences of zero density, in $L_2$ (Boshernitzan and Wierdl 1996; Bourgain 1988a),
• Sequences of squares, in $L_p$, 1 < p < ∞ (Bourgain 1988a),
• Sequences of primes, in $L_p$, 1 < p < ∞ (Bourgain 1988b; Wierdl 1988).

Furthermore, the limit along the first three sequences is the right one, namely, the space average! A more comprehensive list of sequences along which ergodic averages converge a.e. (as well as those along which averages fail to converge in any system) is available in the sources (Eisner et al. 2015; Petersen and Salama 1995).
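To make the subsequence averages concrete, here is a minimal sketch that samples an orbit only at the square times $n_k = k^2$. The setting is an assumption chosen to keep the example elementary and is far simpler than the general theorems listed above: T is the rotation T(x) = x + α (mod 1) with α = √2 − 1, and f(x) = cos²(2πx) is continuous, so Weyl’s equidistribution of the fractional parts of k²α already gives convergence to the space average 1/2. The names `average_along` and `observable` are hypothetical.

```python
import math

ALPHA = math.sqrt(2) - 1  # assumed irrational rotation angle

def observable(x):
    """f(x) = cos^2(2*pi*x); its integral over [0, 1] is 1/2."""
    return math.cos(2 * math.pi * x) ** 2

def average_along(times, x0):
    """Return (1/N) * sum_k f(T^{n_k} x0) for the rotation T(x) = x + ALPHA (mod 1).

    For a rotation, T^n x0 = x0 + n * ALPHA (mod 1), so no iteration is needed.
    """
    values = [observable((x0 + n * ALPHA) % 1.0) for n in times]
    return sum(values) / len(values)

N = 50_000
squares = [k * k for k in range(N)]      # the sampling times n_k = k^2
avg_squares = average_along(squares, x0=0.1)
avg_all = average_along(range(N), x0=0.1)  # ordinary ergodic average, for comparison
print(avg_squares, avg_all)
```

Both printed values should be close to 1/2. The sketch says nothing about discontinuous or merely integrable f, which is where the cited results are needed.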


We all know well that no measuring device is capable of making perfect measurements. Hence, many measurements are somewhat “tainted” or modulated. That is, instead of obtaining the values $f(T^k x)$ along the orbit of the point x, we would be getting values like $a_k f(T^k x)$ for some sequence $\{a_k\}$. Then, we are looking at the modulated averages

$$\frac{1}{n}\sum_{k=0}^{n-1} a_k f(T^k x).$$

For which sequences $\{a_k\}$ does $\lim_n \frac{1}{n}\sum_{k=0}^{n-1} a_k f(T^k x)$ exist a.e.? As in the previous cases, the answer is affirmative for some sequences. The following is a list of sequences $\{a_k\}$ for which a.e. convergence holds, i.e., $\lim_n \frac{1}{n}\sum_{k=0}^{n-1} a_k f(T^k x)$ exists a.e. for $f \in L_1(X)$:

• $a_k = \lambda^k$, where |λ| = 1 (Wiener 1939; Wiener and Wintner 1941),
• $\{a_k\}$ is a bounded Besicovitch sequence (Bellow and Losert 1985),
• $\{a_k\}$ is a sequence having a mean (Bellow and Losert 1985),
• $\{a_k\}$ is a dynamically generated sequence (Çömez et al. 1998).

The first of these two results has been extended to various settings: to the $L_1 - L_\infty$-contraction case (Çömez et al. 1998; Lin et al. 1999), to the case of function-valued sequences (Çömez and Litvinov 2013), and to the setting of noncommutative von Neumann algebras (Litvinov 2012). Since the setup and the statements of these results require the introduction of many technical concepts, we refrain from stating them. Convergence of the time averages also holds in norm; this was proved by J. von Neumann in 1932, around the same time as Birkhoff’s ergodic theorem, and is known as the mean ergodic theorem (von Neumann 1932). In the case of probability spaces, convergence in norm follows from Birkhoff’s ergodic theorem. However, a more general convergence result holds:

Theorem 8 (von Neumann’s mean ergodic theorem). If U is a unitary operator on a Hilbert space H, then for any f ∈ H, $\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} U^k f = f^*$ exists in norm and $f^* \in F$, where F ⊂ H is the subspace of U-invariant elements.

If $H = L_2(X)$ for a probability space X, this is exactly the norm convergence version of Birkhoff’s ergodic theorem. As expected, various extensions and generalizations of the mean ergodic theorem have also been proved, such as the multiparameter version by Dunford and Schwartz (1988), the analogue of Brunel’s theorem for norm convergence by Çömez and Lin (1991), and, recently, a generalization to non-conventional averages by Tao (2008). In his quest to provide an ergodic-theoretic proof of Szemerédi’s theorem, H. Fürstenberg proved in 1977 a generalization of the Poincaré recurrence theorem which proved to have far-reaching consequences.


Theorem 9 (Multiple recurrence theorem (Fürstenberg 1977)). Let (X, μ) be a probability space and T : X → X be a measure-preserving transformation. If k ≥ 1 and μ(A) > 0, then

$$\liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \mu(A \cap T^{-i}A \cap T^{-2i}A \cap \dots \cap T^{-(k-1)i}A) > 0.$$

In particular, for some n ≥ 1, $\mu(A \cap T^{-n}A \cap T^{-2n}A \cap \dots \cap T^{-(k-1)n}A) > 0$. This is a significant strengthening of the Poincaré recurrence theorem. Actually, Fürstenberg proved a more general version, namely, if T is weakly mixing, then for every integer k ≥ 1 and for all $\{f_j\}_{j=1}^{k} \subset L_\infty(X)$,

$$\frac{1}{N}\sum_{n=1}^{N} f_1(T^n x) f_2(T^{2n} x) \cdots f_k(T^{kn} x) \to \int f_1 \int f_2 \cdots \int f_k \quad \text{in } L_2,$$

which can be considered a special multiparameter von Neumann mean ergodic theorem ($\frac{1}{n^k}$ is replaced by $\frac{1}{n}$). Over the years various generalizations of this result have been obtained. The first type of generalization is proving the convergence of multiple ergodic averages along integer sequences. Bergelson, Fürstenberg and Weiss, and Host and Kra obtained results of this type. The most general one so far was proved by Leibman (2005), who showed that if T is an invertible measure-preserving transformation, then for every integer k ≥ 1, for any integer polynomials $p_1, p_2, \dots, p_k$ with $p_i(0) = 0$, and for all $\{f_j\}_{j=1}^{k} \subset L_\infty(X)$,

$$\frac{1}{N}\sum_{n=1}^{N} f_1(T^{p_1(n)} x) f_2(T^{p_2(n)} x) \cdots f_k(T^{p_k(n)} x) \quad \text{converges in } L_2 \text{ as } N \to \infty.$$

The case when $p_m(n) = mn$, 1 ≤ m ≤ k, was obtained by Host and Kra (2005). Also, they proved that if $T^n$ is ergodic for all n ≥ 1 and the $p_i$’s are rationally independent, then the $L_2$-limit is $\int f_1 \cdots \int f_k$. Another way to generalize the Fürstenberg multiple recurrence theorem is by considering simultaneous intersections of iterates of the same set under different measure-preserving transformations. Generalizations of this kind have been studied by Fürstenberg and Katznelson, Conze and Lesigne, and Frantzikinakis and Kra. One of the latest results of this type is due to Tao (2008); he proved that if $T_1, T_2, \dots, T_k$ are commuting invertible measure-preserving transformations, then for all $\{f_j\}_{j=1}^{k} \subset L_\infty(X)$,

$$\frac{1}{N}\sum_{n=1}^{N} f_1(T_1^n x) f_2(T_2^n x) \cdots f_k(T_k^n x) \quad \text{converges in } L_2 \text{ as } N \to \infty.$$


Although Birkhoff’s ergodic theorem guarantees a.e. convergence of the averages $A_n := \frac{1}{n}\sum_{k=0}^{n-1} f(T^k x)$ for $f \in L_1$, it does not provide any information on the rate of convergence. By Abel’s summation by parts formula,

$$\sum_{k=1}^{n} \frac{f(T^k x)}{k} = \frac{S_n}{n} + \sum_{k=1}^{n-1} \frac{S_k}{k(k+1)}, \qquad \text{where } S_k := \sum_{j=1}^{k} f(T^j x) = kA_k + f(T^k x) - f(x),$$

which suggests that one could obtain the rate of convergence once the left side of this equality is known to converge a.e. However, this is far from the case; moreover, it has been shown that nothing can be said about the rate of convergence in the ergodic theorem unless additional conditions are assumed. On the other hand, the left-hand side of this equality is reminiscent of the well-known Hilbert transform. Inspired by this, M. Cotlar proved in 1955:

Theorem 10 (Existence of the ergodic Hilbert transform (Cotlar 1955)). Let (X, μ) be a probability space and T be an invertible measure-preserving transformation. Then, for all $f \in L_1(X)$,

$$\lim_{n\to\infty} \sum_{k=-n,\ k\neq 0}^{n} \frac{f(T^k x)}{k} \quad \text{exists a.e. on } X.$$

As with the other results, this theorem has been extended to various settings; for instance, Campbell and Petersen (1989), Sato (1987), Jones et al. (1998), and others proved that it is also valid in the operator setting when T is a unitary operator on $L_2$ and when T is an $L_1 - L_\infty$-contraction. A modulated version of this result has recently been proved by Akhmedov and Çömez (2015). The statements mentioned in this section constitute a small sample from the vast collection of important results in ergodic theory. For example, some other important types of statements are omitted: ergodic theorems for semigroups of operators, ergodic theorems for group actions, noncommutative ergodic theorems, superadditive ergodic theorems, etc. For a more comprehensive study, the reader is referred to the sources (Eisner et al. 2015; Goldstein and Litvinov 2000; Petersen and Salama 1995). The article Moore (2015) provides a nice historical perspective on the theorems of Birkhoff and von Neumann.
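The symmetric sums in the ergodic Hilbert transform can be computed directly in a simple case. The sketch below is only an illustration under assumptions that are not in the text: T is the rotation T(x) = x + α (mod 1) with α = √2 − 1, and f(x) = sin(2πx). For this choice the limit has a closed form, obtained from the classical series $\sum_{k\ge 1} \sin(k\theta)/k = (\pi - \theta)/2$ for 0 < θ < 2π, namely 2π(1/2 − α) cos(2πx). The function names are hypothetical.

```python
import math

ALPHA = math.sqrt(2) - 1  # assumed irrational rotation angle in (0, 1)

def f(x):
    return math.sin(2 * math.pi * x)

def hilbert_partial_sum(g, x, n):
    """Return the sum over k = -n..n, k != 0, of g(T^k x) / k,
    where T(x) = x + ALPHA (mod 1), pairing the terms k and -k."""
    total = 0.0
    for k in range(1, n + 1):
        forward = g((x + k * ALPHA) % 1.0)    # g(T^k x)
        backward = g((x - k * ALPHA) % 1.0)   # g(T^{-k} x)
        total += (forward - backward) / k
    return total

def closed_form(x):
    """Limit of the partial sums for f(x) = sin(2*pi*x) and this rotation."""
    return 2 * math.pi * (0.5 - ALPHA) * math.cos(2 * math.pi * x)

x0 = 0.1
for n in (10, 1_000, 100_000):
    print(n, hilbert_partial_sum(f, x0, n), closed_form(x0))
```

Pairing the terms k and −k mirrors the symmetric summation in the theorem; the individual one-sided sums of f(T^k x)/k need not behave as well.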

Interdisciplinary Aspects of Ergodic Theory in Mathematics

In this section we explore the interdisciplinary aspect of ergodic theory within mathematics. Ergodic theory is intimately intertwined with topological dynamics and probability theory. Many measurable dynamical concepts can be transferred to the topological setting, sometimes verbatim and in other cases with some modifications. In the same way, sequences of independent random


variables can be obtained using measurable dynamical systems. For example, with T(x) = 2x (mod 1) and the function

$$f(x) = \begin{cases} 1 & \text{if } x \in [0, \tfrac{1}{2}), \\ 0 & \text{if } x \in [\tfrac{1}{2}, 1], \end{cases}$$

the functions $f(T^n(x))$, n = 0, 1, 2, . . . , are independent random variables (as functions of x ∈ [0, 1] with Lebesgue measure). This is nothing but the Bernoulli shift on two letters. Indeed, both the weak law of large numbers and the strong law of large numbers can be deduced from the mean ergodic theorem and Birkhoff’s ergodic theorem, respectively. Naturally, there are many straightforward applications of ergodic theory statements in topological dynamics and probability.

As happens in many fields of mathematics, theorems and methods from other fields were instrumental in the development of ergodic theory. In return, many statements of ergodic theory, particularly the results stated in the previous section, have contributed significantly to other fields of mathematics, in some cases by leading to the creation of a new line of research within that field. In this regard, we will focus on the interaction with number theory, combinatorics, functional analysis and harmonic analysis, and fractal geometry. One could add many other fields with significant ergodic theory impact to this list, but we limit ourselves to those mentioned above.
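The doubling map example at the start of this section can be simulated, with one caveat about arithmetic. The sketch below is an illustration under stated assumptions: the starting point is a randomly chosen dyadic rational x = numerator / 2^BITS stored as an exact integer, standing in for a “typical” point, and the names `doubling_orbit_observations`, `BITS`, and `frequency` are hypothetical. Exact integers are used because in double precision each application of T(x) = 2x (mod 1) discards one bit, so a floating-point orbit collapses to 0 after roughly 53 steps.

```python
import random

def doubling_orbit_observations(numerator, bits):
    """Yield f(T^n x) for n = 0..bits-1, where x = numerator / 2**bits,
    T(x) = 2x (mod 1), and f = 1 on [0, 1/2), f = 0 on [1/2, 1]."""
    modulus = 1 << bits
    half = modulus >> 1
    for _ in range(bits):
        yield 1 if numerator < half else 0
        numerator = (2 * numerator) % modulus  # exact doubling modulo 1

BITS = 10_000
rng = random.Random(0)                  # fixed seed so the run is reproducible
x_numerator = rng.getrandbits(BITS)     # x = x_numerator / 2**BITS
observations = list(doubling_orbit_observations(x_numerator, BITS))
frequency = sum(observations) / BITS    # time average of f along the orbit
print(frequency)
```

The time average `frequency` should be close to the space average 1/2, as the strong law of large numbers (or Birkhoff’s ergodic theorem) predicts. Each observation is simply one minus the corresponding binary digit of x, which is the Bernoulli shift picture in concrete form.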

Number Theory

A sequence of numbers $\{a_n\}$ is called equidistributed (modulo 1) if for any interval [a, b] ⊂ [0, 1]

$$\lim_{n\to\infty} \frac{1}{n}\,|\{k : 1 \le k \le n,\ \langle a_k \rangle \in [a, b]\}| = b - a,$$

where |A| denotes the cardinality of the set A and $\langle x \rangle$ denotes the fractional part of x ∈ R. In 1916 Weyl proved that the sequence $\{\langle n\alpha \rangle\}$ is equidistributed if and only if α is irrational. Later, he also showed that $\{\langle n^2\alpha \rangle\}$ is equidistributed if and only if α is irrational. Equidistribution of $\{\langle p_n\alpha \rangle\}$, where $p_n$ is the n-th prime, was proved by Vinogradov. Let T(x) = x + α (mod 1), and let

$$f(x) = \begin{cases} \chi_{[a,b]}(x) & \text{if } b < 1, \\ \chi_{\{0\}\cup[a,b]}(x) & \text{if } b = 1; \end{cases}$$

then


$$\frac{1}{n}\sum_{k=0}^{n-1} f(T^k(0)) = \frac{1}{n}\sum_{k=0}^{n-1} f(\langle k\alpha \rangle) = \frac{1}{n}\,|\{k : 1 \le k \le n,\ \langle k\alpha \rangle \in [a, b]\}|.$$

By the mean ergodic theorem (uniform version), the limit of these averages exists and is equal to $\int_0^1 f(x)\,dx = b - a$, implying Weyl’s equidistribution theorem. Similarly, the equidistribution of the sequences $\{\langle n^2\alpha \rangle\}$ and $\{\langle p_n\alpha \rangle\}$ and of many other such sequences is obtained by reducing the problem to the convergence of ergodic averages and applying an appropriate version of the mean ergodic theorem mentioned in the previous section.

A number 0 < x < 1 is called normal to the base k if every finite combination of length m of the digits {0, 1, . . . , k − 1} appears in the expansion of x (to the base k) with asymptotic frequency $\frac{1}{k^m}$. The famous theorem of Borel (1909) states that for almost every real number (with respect to Lebesgue measure) the frequency of appearances of 0 (or 1) in its binary expansion is $\frac{1}{2}$. This is equivalent to the statement that, for all m ≥ 1 and l = 0, 1, . . . , $2^m - 1$,

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} \chi_{C_{m,l}}(T^i x) = \frac{1}{2^m},$$

where $C_{m,l} = [\frac{l}{2^m}, \frac{l+1}{2^m}]$ and T(x) = 2x (mod 1). Then Borel’s result is a simple consequence of a.e. convergence of ergodic averages. The more general statement that almost every number in [0, 1] is normal to any base k follows from the ergodic theorem similarly by using the transformation T(x) = kx (mod 1).

Two efficient ways of approximating a real number in [0, 1), in particular an irrational number, are via decimal expansion and continued fraction expansion. Both of these expansions can be studied as dynamical systems ([0, 1), T), where T : [0, 1) → [0, 1) is the ten-fold map T(x) = 10x (mod 1) in the case of decimal expansion, and in the case of continued fraction expansion T : [0, 1) → [0, 1) is the Gauss map

$$T(x) = \begin{cases} \frac{1}{x} - \lfloor \frac{1}{x} \rfloor & \text{if } x \in (0, 1), \\ 0 & \text{if } x = 0, \end{cases}$$

where $\lfloor x \rfloor$ is the floor function. Clearly, these maps have significantly different dynamical behavior; hence, one would be inclined to expect this difference to be reflected in the corresponding expansions of a given irrational number. In 1964 G. Lochs proved that, contrary to this expectation, for almost all irrationals x and large enough n, the first n digits of the decimal expansion of x determine close to the first n digits (partial quotients) of its continued fraction expansion! Let x ∈ (0, 1) be an irrational number with decimal expansion (which is unique)


$x = 0.d_1 d_2 \dots d_n \dots$. For any n ≥ 1, let $x_n = 0.d_1 d_2 \dots d_n$, the rational number determined by the first n decimal digits of x. Let $c(x_n) = [0; c_1, c_2, \dots, c_k]$ be the continued fraction expansion of $x_n$, which is unique if $c_k = 1$ is not allowed. Define k(x, n) as the largest integer for which the first k(x, n) digits of $c(x_n)$ coincide with the digits of the actual continued fraction expansion of x.

Theorem 11 (Lochs 1964). For almost all irrational numbers (with respect to Lebesgue measure)

$$\lim_{n\to\infty} \frac{k(x, n)}{n} = \frac{6 \log 2 \, \log 10}{\pi^2} \approx 0.97027014.$$

Since the limiting value is very close to 1, it follows that for large enough n, knowing n decimal digits of x determines its first k(x, n) partial quotients, where k(x, n) is nearly n. Indeed, Lochs himself showed that the first 1000 decimals of π determine its first 968 partial quotients (Lochs 1963). The proof of Lochs’ theorem follows from the Shannon-McMillan-Breiman theorem, which is, in turn, proved via Birkhoff’s ergodic theorem. It turns out that one can replace continued fractions by other types of expansions, such as the Lüroth expansion and the β-continued fraction expansion, with similar conclusions (Bosma et al. 2006; Dajani and Fieldsteel 2001).
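Lochs’ theorem can be probed numerically with exact integer arithmetic. The following is a rough sketch under assumptions that are not part of the text: a string of 3n pseudo-random decimal digits stands in for a “typical” irrational x, its first n digits give $x_n$, and k(x, n) is approximated by the number of leading partial quotients shared by $x_n$ and the 3n-digit number. The helper names are hypothetical, and a single run only indicates the trend; the theorem itself is an almost-everywhere limit statement.

```python
import math
import random

def partial_quotients(num, den):
    """Partial quotients [c1, c2, ...] of num/den in (0, 1), via the Euclidean algorithm."""
    quotients = []
    num, den = den, num              # work with 1/x first: c1 = floor(den/num)
    while den:
        q, r = divmod(num, den)
        quotients.append(q)
        num, den = den, r
    return quotients

def common_prefix_length(a, b):
    """Number of leading entries on which the two lists agree."""
    count = 0
    for u, v in zip(a, b):
        if u != v:
            break
        count += 1
    return count

LOCHS_CONSTANT = 6 * math.log(2) * math.log(10) / math.pi ** 2

n = 1000
rng = random.Random(0)               # fixed seed so the run is reproducible
x_full = 0
for _ in range(3 * n):               # build the 3n-digit numerator digit by digit
    x_full = 10 * x_full + rng.randrange(10)
x_n = x_full // 10 ** (2 * n)        # numerator of x_n: the first n decimal digits

k = common_prefix_length(partial_quotients(x_n, 10 ** n),
                         partial_quotients(x_full, 10 ** (3 * n)))
print(LOCHS_CONSTANT, k / n)
```

The ratio `k / n` should land near `LOCHS_CONSTANT`, with fluctuations of a few percent at this value of n.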

Combinatorics

As mentioned above, Fürstenberg’s multiple recurrence theorem was obtained in connection with Szemerédi’s theorem. In 1927, van der Waerden proved that in any finite partition of N there is an element of the partition that contains arbitrarily long arithmetic progressions (namely, sequences of the form $\{a + kd\}_{k=0}^{n}$, a, d ∈ N). Knowing this result, Erdős and Turán conjectured that any subset E of N that fails to contain an arithmetic progression of length k must be of density 0 (i.e., $\lim_{n\to\infty} \frac{|E \cap \{1, 2, \dots, n\}|}{n} = 0$). This is the same as stating that any subset of N with positive upper density contains arbitrarily long arithmetic progressions. Roth (1952) proved the conjecture for k = 3, and Szemerédi proved it first for k = 4 (1969) and then for all k ≥ 1 in 1975. Since the proof is very involved and not easily accessible, mathematicians naturally sought simpler arguments yielding the same conclusion. Fürstenberg realized a deep connection between a type of Poincaré recurrence and arithmetic progressions, which led him to the multiple recurrence theorem, from which Szemerédi’s theorem follows (Fürstenberg 1977). The key ingredient in this connection is known as the Fürstenberg correspondence principle:


Let E ⊂ Z with $d(E) = \limsup_{n\to\infty} \frac{|E \cap \{1, 2, \dots, n\}|}{n} > 0$. Then there exist an invertible measure-preserving system (X, μ, T) and a measurable set A with μ(A) = d(E) such that for every finite subset C of Z

$$d\Big(\bigcap_{n\in C} (E - n)\Big) \ge \mu\Big(\bigcap_{n\in C} T^{-n}A\Big).$$

This result, as well as being much more accessible, opened up a new avenue of research in ergodic theory as well as in combinatorics. Simply put, by the correspondence principle, in order to establish a property of arithmetic progressions, all one needs is to show that the right side of the inequality is positive. Since all generalizations and extensions of Fürstenberg’s multiple recurrence theorem yield corresponding extensions of the inequality in the correspondence principle, each of them invites a search for the associated configurations in Z. First of all, Fürstenberg’s multiple recurrence theorem (and correspondence principle) paved the way to a subfield of ergodic theory, termed ergodic Ramsey theory, containing many results of a similar nature. Some notable ones are:

• IP-multiple recurrence theorem. A set $I = \{n_{i_1} + n_{i_2} + \dots + n_{i_k} : k \in N,\ i_1 < i_2 < \dots < i_k\}$, generated by a sequence $(n_i)$ of positive integers, is called an IP-set. Fürstenberg and Katznelson proved an IP-multiple recurrence theorem, which states that the recurrence always takes place along IP-sets (Fürstenberg and Katznelson 1985).
• Polynomial Szemerédi theorem. This is a consequence of the polynomial extension of the multiple recurrence theorem, from which the polynomial van der Waerden theorem is also obtained, by Bergelson and Leibman (2003).

There are other such results, some of which have no counterpart within combinatorial number theory, and these opened up a new challenge of finding the associated structures within Z. Furthermore, proving these new structural theorems solely with combinatorial techniques has been another new task for mathematicians. Another notable outcome is a result of Tao and Ziegler (2008), who proved, via an appropriate ergodic theorem and the correspondence principle, a remarkable theorem which states that the set of primes contains arbitrarily long polynomial progressions, and in particular arbitrarily long arithmetic progressions!
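The combinatorial statements in this section are about infinite sets, but van der Waerden’s theorem has a well-known finite form that can be checked by brute force in the smallest case. The sketch below relies on facts that are not stated in the text: for two colours and progressions of length 3, every colouring of {1, . . . , 9} contains a monochromatic three-term arithmetic progression, while some colouring of {1, . . . , 8} does not. The function names are hypothetical.

```python
from itertools import product

def has_monochromatic_ap(colouring, length=3):
    """colouring[i] is the colour of the integer i + 1.
    Return True if some arithmetic progression of the given length is monochromatic."""
    n = len(colouring)
    for start in range(n):
        max_step = (n - 1 - start) // (length - 1)   # keep the last term inside the range
        for step in range(1, max_step + 1):
            colours = {colouring[start + j * step] for j in range(length)}
            if len(colours) == 1:
                return True
    return False

def every_colouring_has_ap(n, colours=2, length=3):
    """Check all colourings of {1, ..., n} exhaustively (colours**n cases)."""
    return all(has_monochromatic_ap(c, length)
               for c in product(range(colours), repeat=n))

print(every_colouring_has_ap(8))   # some 2-colouring of {1..8} avoids 3-term progressions
print(every_colouring_has_ap(9))   # no 2-colouring of {1..9} does
```

The exhaustive search grows exponentially in n, so it only works for tiny cases. It serves to make the statement tangible and is not a route to the density results discussed above.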

Functional Analysis and Harmonic Analysis

Ergodic theorems are in general difficult to prove, or their proofs are very “involved,” with a few exceptions, like the Poincaré recurrence theorem. Naturally, these proofs employ various theorems and/or techniques from other fields of mathematics, notably from functional analysis and harmonic analysis. One such is the Banach principle, which states that, if a convergence result takes place on a dense subset of a function space and if the (averaging) operators involved in the


process are bounded, then the convergence holds for all functions. This functional analytic result has been utilized, with some necessary modifications, in all a.e. ergodic theorems stated in the previous section. However, it requires boundedness of the (averaging) operators, which is known as a maximal inequality. It turns out that, in the majority of theorems, establishing the necessary maximal inequality is one of the most difficult tasks. Indeed, the proof of this inequality, despite following the same broad pattern, is difficult and challenging in each case. For instance, all the subsequential and modulated ergodic theorems stated above fall into this category. These various types of maximal inequalities have, in turn, become an indispensable tool for many convergence results in harmonic analysis (Eisner et al. 2015; Krengel 1985; Rosenblatt and Wierdl 1992).

Establishing a suitable dense subset of the function space involved has also been fruitful in proving numerous decomposition results. Starting with the mean ergodic theorem, one such decomposition expresses the function space as the sum of the invariant functions and the “coboundaries,” which are the functions of the form f − f ◦ T. This main theme has been extended to more general settings in functional analysis (Eisner et al. 2015; Krengel 1985). Proving maximal inequalities for the ergodic Hilbert transform is at least as difficult as proving those for ergodic averages. This task, both in the local and discrete settings, has been an intense subject of research by numerous ergodic theorists (Campbell and Petersen 1989; Campbell et al. 2003; Petersen 1983). Since the Banach principle is fundamental in proving a.e. convergence, when ergodic theorems in more general settings are considered, for instance, in spaces like $L_1(X) + L_\infty$ or in von Neumann algebras, it is desirable to have a version of the Banach principle in such spaces. This is a nontrivial task. However, the Banach principle was extended to $L_1(X) + L_\infty$ by Bellow and Calderón (1999) and to von Neumann algebras by Goldstein and Litvinov (2000).

Akcoglu’s ergodic theorem (Akcoglu 1975) was a breakthrough that settled a problem open for more than 40 years. One key ingredient of the proof is a dilation argument that extends a.e. convergence from a “simpler” space to a general one containing the former. There were several dilation theorems in functional analysis, particularly in the Hilbert space setting, but they all had a limited scope. Akcoglu’s dilation argument was free of many of these restrictions. Not surprisingly, this result has been generalized to many settings by numerous mathematicians (Akcoglu and Sucheston 1977; Nagel and Palm 1982).

Fractal Geometry

Before closing, an interesting feature of some dynamical systems in connection with fractal geometry is worth mentioning. For that purpose, consider a modified form of the tent map, described below. Let X = [0, 1] with Lebesgue measure and define


$$T(x) = \begin{cases} 3x & \text{if } x \in [0, \tfrac{1}{2}), \\ 3(1 - x) & \text{if } x \in [\tfrac{1}{2}, 1]. \end{cases}$$

We immediately notice that T does not map [0, 1] into itself. When $x \in (\frac{1}{3}, \frac{2}{3})$, Tx > 1; hence, it lies outside the interval [0, 1]. The successive preimages of [0, 1] under T reveal an interesting outcome. If we define $J_1 = \{x \in [0, 1] : Tx > 1\}$, then it is an open interval centered at the point $x = \frac{1}{2}$. Then $[0, 1] \setminus J_1$ consists of two closed intervals, each of which T maps one-to-one and onto [0, 1]. Similarly, define $J_2 = \{x \in [0, 1] : T^2 x > 1\}$. Then $[0, 1] \setminus (J_1 \cup J_2)$ consists of four closed intervals, each of which $T^2$ maps one-to-one and onto [0, 1]. Continuing in this way, we can define $I_n = \{x \in [0, 1] : T^n x \in [0, 1]\}$ and $J_n = \{x \in I_{n-1} : T^n x > 1\}$. Notice that $I_n = [0, 1] \setminus \bigcup_{k=1}^{n} J_k$ consists of $2^n$ disjoint closed intervals, each of which $T^n$ maps one-to-one and onto [0, 1]. Taking the points that remain in all of these sets, define $C = \{x \in [0, 1] : T^n x \in [0, 1],\ \forall n \ge 1\}$. Since T maps $I_n$ into $I_{n-1}$, C is mapped to itself. Hence, the pair (C, T) is a dynamical system. Furthermore, from the construction, we can see that $C = \bigcap_{n\ge 1} I_n$ is the Cantor set, a fractal!

This is just an example of how one can construct fractals via dynamical systems. The most important consequence of such constructions is that one can bring the powerful tools of ergodic theory to the study of the properties of the resulting fractals. Indeed, some fundamental features of fractals (such as box dimension, Hausdorff dimension (Pesin 1997), subfractal structure (Sattler 2017), etc.) have been studied in depth only after the introduction of ergodic theory tools into fractal geometry.
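The construction of the sets $I_n$ can be carried out exactly with rational arithmetic. The sketch below is a minimal illustration, assuming the observation that each interval of $I_{n-1}$ has exactly two preimages inside [0, 1], one under each branch of T; the function name `preimage_intervals` is hypothetical.

```python
from fractions import Fraction

def preimage_intervals(n):
    """Closed intervals making up I_n = {x in [0, 1] : T^k x in [0, 1] for k = 1..n},
    for T(x) = 3x on [0, 1/2) and T(x) = 3(1 - x) on [1/2, 1]."""
    intervals = [(Fraction(0), Fraction(1))]   # I_0 = [0, 1]
    for _ in range(n):
        pulled_back = []
        for a, b in intervals:
            pulled_back.append((a / 3, b / 3))            # preimage under x -> 3x
            pulled_back.append((1 - b / 3, 1 - a / 3))    # preimage under x -> 3(1 - x)
        intervals = sorted(pulled_back)
    return intervals

for n in range(1, 5):
    pieces = preimage_intervals(n)
    total_length = sum(b - a for a, b in pieces)
    print(n, len(pieces), total_length)
```

At each level the number of intervals doubles and the total length is multiplied by 2/3, so the lengths tend to 0. This is the familiar middle-thirds description of the Cantor set recovered from the dynamics of T.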

Interdisciplinary Aspects of Ergodic Theory with Other Disciplines

The central mathematical object in ergodic theory is a self-map T, whether it is a measure-preserving transformation on a measure space or a linear operator on a suitable function space. The dynamics that this self-map creates lead to the consequences discussed in the previous sections. The basic properties involved in all these results are ergodicity, mixing, and recurrence, and the fundamental idea is the equality, in some sense, of space average and time average, which requires the existence of an invariant measure. It is natural to expect that in any discipline which deals with phenomena that change in time or involve a sequential process, the notions and the central idea of ergodic theory can be utilized in one form or another. Such disciplines are widespread: many experimental sciences (physics and astronomy, chemistry, biological sciences, climatology), social sciences (population studies, sociology), engineering (electrical engineering, aerospace engineering, control theory, transportation studies), economics, and medicine (neural networks).


The mathematical universe is an ideal one, in the sense that the axioms, terms, and relations on them are well defined and the statements proved are valid without any doubt. However, this is not the case in the real world. Consequently, there is always some kind of discrepancy, margin of error, or modeling assumption in applications to real-world situations. Therefore, it is not unusual that the application of a mathematical theory to a real-world problem is contested or even controversial. Indeed, as mentioned earlier, the heated debates that took place at the earlier stages of ergodic theory are an excellent testimony to this fact. Similarly, today, in some disciplines where ergodic theory concepts or results are implemented, we witness debates on the necessity or validity of this implementation. Paradoxical as it may seem, these conflicts frequently bring new ideas or methodologies into the discipline or establish the existing one firmly in place. The goal of this chapter is to exhibit existing and working connections between ergodic theory and other disciplines; engaging in any debate about the issues raised by these connections is beyond its scope. Also, for obvious reasons, rather than providing a comprehensive survey, we will bring up a few selected cases of interdisciplinary connections. Since the expertise of the author is in ergodic theory only, these connections will be described in broad terms.

Broadly, interactions of ergodic theory with other disciplines happen in two manners: either the idea of “time average = space average” is utilized in one form or another, or an ergodic property of the map (ergodicity, mixing, exactness) governing the dynamical system at hand is assumed. In the first case, one needs a suitable invariant measure, and in the second case, the property attributed to the map must represent the reality as closely as possible. Both of these can be contentious, because of the limited accuracy of the modeling or because of adherents of the opposite view. In general terms, if a real-world phenomenon under investigation gives rise to a dynamical system, often it is a chaotic system; hence, it is not amenable to making precise long-term predictions along trajectories. On the other hand, in many cases, the estimation or measurement of averages is more feasible, which leads to the derivation of an invariant measure. This outcome, naturally, paves the way to employing all the machinery of ergodic theory in the study. In practice, deriving an invariant measure means finding an algorithm which provides a description of the measure, which, in many cases, takes the form of obtaining the measure as a (finite or countable) discrete measure. The quest for developing such algorithms, in turn, is a new area of research in computer science and applied mathematics. Below, in order to provide a context for the interdisciplinary connections of ergodic theory, we discuss in broad terms its connections with climate science, biological science, and economics.

One can view climate as the expected value of meteorological quantities, such as surface temperature and precipitation, the main contributors to which are the atmosphere, hydrosphere, biosphere, and geosphere. Each of these is a dynamical system which evolves under the action of multiple “transformations,” such as solar heating, rotation of the Earth, geological events, etc. Furthermore, these agents of change are highly interconnected. Therefore, all the dynamical system models of local, regional, or global climatic systems constitute extremely complex systems. The records of

D. Çömez

observations of climatic changes over the years and some extensive research on polar ice and on fossils provide clear evidence of the variability of climatic events both in the short term and the long term. There are weekly, seasonal, and yearly cycles; there are also cycles spanning much longer time intervals, some of which show measurable differences. Clearly, characterizing this variability is fundamental to understanding the predictability of climate and weather over both short and long time intervals. Lorenz (1963) was one of the pioneers who used a dynamical systems approach to understand climatic changes and predictability (Lorenz 1980). His system was induced by three nonlinear ordinary differential equations (known as the Lorenz equations) whose solutions, few exceptions aside, are inherently chaotic. Hence, while this model would provide quite accurate information in the short term, it would be unstable beyond a certain range. Due to the pioneering work of Lorenz and adherents of his ideas, today it is possible to predict atmospheric events within a range of a week to 10 days accurately. It is also accepted that beyond a few weeks, such predictions are unlikely to be accurate. Today, many climatologists agree that Lorenz’s model needs to be improved; hence, the issue is not whether climate should be modeled as a dynamical system, but rather the nature of the dynamical system chosen and the extent to which it represents reality (Lucarini 2009; May 1987; Ozawa et al. 2003; Pietsch and Hasenauer 2005; Thornton 1998). It is a pleasant irony that, besides being a pioneering work in climatology, Lorenz’s work was also inspirational within ergodic theory and dynamical systems. The particular system he developed, in a simplified form given as $Tx = ax^3 + (1 - a)x$, is a non-periodic one with a very complicated attractor to analyze, a strange attractor and a fractal.
It is one of the most studied examples of chaotic dynamical systems; only relatively recently mathematicians have had a firm grasp of its features (Tucker 2002). Furthermore, it turns out that Lorenz equations are not peculiar to the model he intended to study; they also appear (in some modified form, of course) in other models of various phenomena in physics, engineering, and chemistry. Economics may seem an unlikely discipline that would have ergodic theory connections. However, many economical events evolve subject to some agents that drive it, such as time, market fluctuations, political events, etc., many of which are inherently unpredictable. On the other hand, over long time scales, there are some regularities that can be measured. So, it can be assumed that, given this statistical information on the past, the future should be predictable. From this perspective, interpreting past fluctuations of prices as time series, Samuelson (1965) modeled market future pricing as a stochastic series of events, more specifically, a stationary process with an underlying ergodic probability (Samuelson 1965). Here, one can utilize the time series analysis tools that ergodic theory provides. This was a revolutionary approach and opened up both avenues of new means of market studies as well as avenues of criticisms (Blume 1979). Some of the debate focused on the dependence of the probability distribution on the initial distribution. This, on the other hand, is ignorable, by the uniform mean ergodic theorem, if the process

76 Modern Ergodic Theory: From a Physics Hypothesis to a . . .

involved has a unique ergodic part. Today, both the objections to the use of ergodic tools (by classical economists) and study of various economical processes as an ergodic one continue (Jovanovic and Schinckus 2013; Poitras and Heaney 2015). Needless to say, biological science is one of the prime sources of dynamical systems as well as a discipline amenable to applications of such. Biological systems have strong resemblance to those in climate science; often they involve multiple agents of change. Hence, their models can range from a relatively simple system to a highly complex one. The logistic system, being one of the prototypes of an ergodic dynamical system with chaotic behavior, with numerously modified forms, besides providing a model to study some biological systems in depth, also supplies the foundation for developing new ones that apply to predator-prey type processes or genetics or many more (May 1987). As another example, the plasma membrane of living cells is a porous tissue; hence, diffusion through the membrane, being time dependent, can be studied by modeling it as a dynamical system. Further, in this model tracking a single molecule can be interpreted as the trajectory of a point, which naturally gives possibility of both time series analysis of individual trajectories and ensemble averages. It turns out that this is a very complex system that displays seemingly opposing processes, namely, both ergodic and fractal and non-ergodic (Weigel et al. 2011). Besides statistical mechanics, the originating field, many other systems in physics are modeled as a dynamical system in which ergodicity naturally emerges. This is not limited to classical dynamics; it also expands to quantum mechanical systems (Hofmann 2015; Klein 1952). 
In a quantum system, the equality of time and ensemble averages is equivalent to having all invariant operators being a constant multiple of the unit operator, which is essentially von Neumann’s spectral characterization of ergodicity. On the other hand, some research also indicates that, due to the inherent uncertainty within a quantum system, it would be desirable to infuse statistical properties into quantum mechanical systems without requiring or assuming the existence of an external probability measure. Such considerations led to the concept of “breaking ergodicity,” describing systems which are non-ergodic, but whose phase space is not necessarily decomposable into regions on which the restricted system is ergodic (Bel and Barkai 2006; Bouchaud 1992; Turner et al. 2018). This is not the same as the ergodic decomposition of a (non-ergodic measure-preserving) transformation. Ergodic theory emerged from physics and grew into a mature and influential field of mathematics with significant applications to other disciplines (including physics). Does that mean physics has given birth to another ergodic-like concept that would become a new theory?
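As a concrete instance of the “time average = space average” idea invoked throughout this section, the following is a minimal sketch rather than anything taken from the chapter. It assumes the standard logistic map T(x) = 4x(1 − x), whose invariant density 1/(π√(x(1 − x))) is a well-known fact, and compares an orbit average with the corresponding integral. The function names, the starting point 0.2, and the step counts are illustrative choices, and a floating-point orbit only approximates a true one.

```python
import math

def logistic(x):
    """Logistic map T(x) = 4x(1 - x) on [0, 1]."""
    return 4.0 * x * (1.0 - x)

def time_average(f, x0, steps):
    """Average of f along the orbit x0, T(x0), T(T(x0)), ..."""
    total, x = 0.0, x0
    for _ in range(steps):
        total += f(x)
        x = logistic(x)
    return total / steps

def space_average(f, cells=10_000):
    """Integral of f against the invariant density 1 / (pi * sqrt(x(1 - x))).

    With x = sin^2(theta) and theta uniform on [0, pi/2], x has exactly this
    density, so the integral becomes a plain average over theta (midpoint rule).
    """
    total = 0.0
    for i in range(cells):
        theta = (i + 0.5) / cells * (math.pi / 2.0)
        total += f(math.sin(theta) ** 2)
    return total / cells

observable = lambda x: x  # hypothetical observable: the state itself
orbit_mean = time_average(observable, x0=0.2, steps=100_000)
measure_mean = space_average(observable)
print(f"time average  ~ {orbit_mean:.3f}")
print(f"space average ~ {measure_mean:.3f}")
```

Under these assumptions the two numbers should agree to within sampling error; replacing `observable` with another bounded function gives a further check of the same kind.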

References Akcoglu MA (1975) A pointwise ergodic theorem in Lp -spaces. Can J Math 27:1075–1082 Akcoglu MA, del Junco A (1981) Differentiation of n-dimensional additive processes. Canadian J Math 33:749–768 Akcoglu MA, Sucheston L (1977) Dilations of positive contractions on Lp -spaces. Can Math Bull 20:285–292

Akhmedov A, Çömez D (2015) Good modulating sequences for the ergodic Hilbert transform. Turkish J Math 39:124–138 Bel G, Barkai E (2006) Weak ergodicity breaking with deterministic dynamics. Europhys Lett 74:15–21 Bellow A, Calderón AP (1999) A weak-type inequality for convolution products Harmonic analysis and partial differential equations (Chicago, 1996). Chicago lectures in mathematics. University Chicago Press, Chicago, pp 41–48 Bellow A, Losert V (1985) The weighted pointwise ergodic theorem and the individual ergodic theorem along subsequences. Trans Am Math Soc 288:307–345 Bergelson V, Leibman A (2003) Topological multiple recurrence for polynomial configurations in nilpotent groups. Adv Math 175:271–296 Birkhoff GD (1931) Proof of the ergodic theorem. Proc Natl Acad Sci USA 17:656–660 Blume LE (1979) Ergodic behavior of stochastic processes of economic equilibria. Econometrica 47:1421–1432 Boshernitzan M, Wierdl M (1996) Ergodic theorems along sequences and Hardy fields. Proc Natl Acad Sci USA 93:8205–8207 Bosma W, Dajani K, Kraaikamp C (2006) Entropy quotients and correct digits in number-theoretic expansions. Dyn Stoch 48:176–188 Bourgain J (1988a) On the maximal ergodic theorem for certain subsets of the integers. Israel J Math 61:39–72 Bourgain J (1988b) On the pointwise ergodic theorem on Lp for arithmetic sets. Israel J Math 61:73–84 Bourgain J (1989) Pointwise ergodic theorems for arithmetic sets. IHES Publ Math 69:5–45. (With an appendix by the author, H. Fürstenberg, Y. Katznelson and Ornstein DS) Bouchaud JP (1992) Weak ergodicity breaking and aging in disordered systems. J Phys I 2: 1705–1713 Brunel A (1973) Théorème ergodique ponctuel pour un semi-groupe commutatif finiment engendré de contractions de L1 . Ann Inst H Poincaré Sect B 9:327–343 Campbell J, Petersen K (1989) The spectral measure and Hilbert transform of a measure-preserving transformation. 
Trans Am Math Soc 313:121–129 Campbell J, Jones R, Reinhold K, Wierdl M (2003) Oscillation and variation for singular integrals in higher dimensions. Trans Am Math Soc 355:2115–2137 Çömez D, Lin M (1991) Mean ergodicity of L1 -contractions and pointwise ergodic theorems. Almost everywhere convergence, II (Evanston, 1989). Academic, Boston, pp 113–126 Çömez D, Litvinov S (2013) Ergodic averages with vector-valued Besicovitch weights. Positivity 17:27–46 Çömez D, Lin M, Olsen J (1998) Weighted ergodic theorems for mean ergodic L1 –contractions. Trans Am Math Soc 350:101–117 Cotlar M (1955) A combinatorial inequality and its applications to L2 -spaces. Rev Mat Cuyana 1:41–55 Dajani K, Fieldsteel A (2001) Equipartition of interval partitions and an application to number theory. Proc Am Math Soc 129:3453–3460 Dunford N, Schwartz JT (1956) Convergence almost everywhere of operator averages. J Ration Mech Analyse 5:129–178 Dunford N, Schwartz JT (1988) Linear operators-I. Wiley, New York Eisner T, Farkas B, Haase M, Nagel R (2015) Operator theoretic aspects of ergodic theory. Springer-graduate texts in mathematics, 272, Springer, Heidelberg Fava NA (1972) Weak type inequalities for product operators. Stud Math 42:271–288 Fürstenberg H (1977) Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions. J Analyse Math 31:204–256 Fürstenberg H, Katznelson Y (1985) An ergodic Szemeredi theorem for IP-systems and combinatorial theory. J Analyse Math 45:117–168 Goldstein, Litvinov S (2000) Banach principle in the space of τ -measurable operators. Stud Math 143:33–41

Hofmann HF (2015) On the fundamental role of dynamics in quantum physics. Phys Rev A 91:062123 Host B, Kra B (2005) Nonconventional ergodic averages and nilmanifolds. Ann Math 161:397–488 Jones RL, Kaufman R, Rosenblatt JM, Wierdl M (1998) Oscillation in ergodic theory. Ergodic Theory Dyn Syst 18:889–935 Jovanovic F, Schinckus C (2013) Econophysics: a new challenge for financial economics? J Hist Econ Thought 35:319–352 Klein MJ (1952) The ergodic theorem in quantum statistical mechanics. Phys Rev 87:111–115 Kac M (1947) On the notion of recurrence in discrete stochastic processes. Bull Am Math Soc 53:1002–1010 Kolmogorov AN (1954) On conservation of conditionally periodic motions for a small change in Hamilton’s function (Russian). Dokl Akad Nauk SSSR 98:527–530 Krengel U (1985) Ergodic theorems. de Gruyter, Berlin Leibman A (2005) Convergence of multiple ergodic averages along polynomials of several variables. Israel J Math 146:303–315 Lin M, Olsen J, Tempelman A (1999) On modulated ergodic theorems for Dunford-Schwartz operators. Ill J Math 43:542–567 Litvinov S (2012) Uniform equicontinuity of sequences of measurable operators and noncommutative ergodic theorems. Proc Am Math Soc 140:2401–2409 Lochs G (1963) Die ersten 968 Kettenbruchnenner von π. Monatsch Math 67:311–316 Lochs G (1964) Vergleich der Genauigkeit von Dezimalbruch und Kettenbruch. Abh Math Sem Univ Hamburg 27:142–144 Lorenz EN (1963) Deterministic nonperiodic flow. J Atmospheric Sci 20:130–141 Lorenz EN (1980) Attractor sets and quasigeostrophic equilibrium. J Atmos Sci 37:1685–1699 Lucarini V (2009) Thermodynamic efficiency and entropy production in the climate system. Phys Rev E 80:021118 Lyapunov AM (1892) The general problem of the stability of motion (In Russian). Doctoral dissertation, University of Kharkov May RM (1987) Chaos and dynamics of biological populations. Dynamical chaos. Proc R Soc Discuss Meet 27–43 Moore CC (2015) Ergodic theorem, ergodic theory, and statistical mechanics.
Proc Natl Acad Sci USA 112:1907–1911 Nagel R, Palm G (1982) Lattice dilations of positive contractions on Lp -spaces. Can Math Bull 25:371–374 Ornstein DS (1970) The sum of the iterates of a positive operator. In: Ney P (ed) Advances in probability and related topics, vol 2. pp 87–115 Ozawa H, Ohmura A, Lorenz R, Pujol T (2003) The second law of thermodynamics and the global climate system: a review of the maximum entropy production principle. Rev Geophys 41: 1018 Patrascioiu A (1987) The ergodic hypothesis: a complicated problem in mathematics and physics. Los Alamos Science (Special issue) 263–279 Perron O (1929) Über Stabilität und asymptotisches Verhalten der Integrale von Differentialgleichungssystemen (German). Math Z 29:129–160 Pesin YB (1997) Dimension theory in dynamical systems. The University of Chicago Press, Chicago Petersen K (1983) Another proof of the existence of the ergodic Hilbert transform. Proc Am Math Soc 88:39–43 Petersen KE, Salama IA (eds) (1995) Ergodic theory and its connections with harmonic analysis. Mathematical society lecture notes series, vol 205. Cambridge University Press, London Pietsch SA, Hasenauer H (2005) Using ergodic theory to assess the performance of ecosystem models. Tree Physiol 25:825–837 Poitras G, Heaney J (2015) Classical ergodicity and modern portfolio theory. Chin J Math 1–17 Rohlin VA (1967) Lectures on the entropy theory of transformations with invariant measure (Russian). Uspehi Mat Nauk 22:3–56

Rosenblatt J, Wierdl M (1992) A new maximal inequality and its applications. Ergodic Theory Dyn Syst 12:509–558 Roth K (1952) Sur quelques ensembles d’entiers (French). C R Acad Sci Paris 234:388–390 Samuelson PA (1965) Proof that properly anticipated prices fluctuate randomly. Indus Manag Rev 6:41–49 Sato R (1987) On the ergodic Hilbert transform for Lamperti operators. Proc Am Math Soc 99: 484–488 Sattler E (2017) Fractal dimension of subfractals induced by sofic subshifts. Monatsch Math 183:539–557 Sinai Y (1959) On the concept of entropy for a dynamic system (Russian). Dokl Akad Nauk SSSR 124:768–771 Sklar L (1993) Physics and chance: philosophical issues in the foundations of statistical mechanics. Cambridge University Press, Cambridge Tao T (2008) Norm convergence of multiple ergodic averages for commuting transformations. Ergodic Theory Dyn Syst 28:657–688 Tao T, Ziegler T (2008) The primes contain arbitrarily long polynomial progressions. Acta Math 201:213–305 Terrell TR (1971) Local ergodic theorems for n-parameter semigroups of contraction operators. Doctoral dissertation, The Ohio State University Thornton PE (1998) Description of a numerical simulation model for predicting the dynamics of energy, water, carbon and nitrogen in a terrestrial ecosystem. Ph.D. Thesis, University of Montana, Missoula Tucker W (2002) A rigorous ODE solver and Smale’s 14th problem. Found Comput Math 2:53–117 Turner CJ, Michailidis AA, Abanin DA, Serbyn M, Papi Z (2018) Weak ergodicity breaking from quantum many-body scars. Nat Phys 14:745–749 van Leth J (2001) Ergodic theory, interpretations of probability and the foundations of statistical mechanics. Stud Hist Philos Mod Phys 32:581–594 von Neumann J (1932) Proof of the quasi-ergodic hypothesis. Proc Natl Acad Sci USA 18:70–82 Wiener N (1939) The ergodic theorem. Duke Math J 5:1–18 Wiener N, Wintner A (1941) Harmonic analysis and ergodic theory. 
Am J Math 63:415–426 Weigel AV, Simon B, Tamkun MM, Krapf D (2011) Ergodic and nonergodic processes coexist in the plasma membrane as observed by single-molecule tracking. Proc Natl Acad Sci USA 108:6438–6443 Wierdl M (1988) Pointwise ergodic theorem along the prime numbers. Israel J Math 64:315–336 Zygmund A (1951) An individual ergodic theorem for non-commutative transformations. Acta Sci Math Szeged 14:103–110

Two-Way Thermodynamics

77

L. S. Schulman

Contents
Introduction
Some Mathematics
Opposite Arrows
A Paradox
Further Issues
Conclusions
Appendix: Precise Definition of the Modified “Cat”
References

Abstract A model in which two weakly coupled systems maintain opposite running thermodynamic arrows of time is exhibited. Each experiences its own retarded electromagnetic interaction and can be seen by the other. “Time” is thus a statistical concept. Paradoxes are also explored from the standpoint of initial and final value problems.

Keywords Time · Statistical mechanics · Model systems · Paradoxes

L. S. Schulman () Physics Department, Clarkson University, Potsdam, NY, USA e-mail: [email protected] © Springer Nature Switzerland AG 2021 B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_108

Introduction

The arrow of time is perceived by all of us and is made precise by the second law of thermodynamics. Let me explain both assertions of the previous sentence: The perception is simple. The past is fixed and often known; the future is unpredictable and generally unknown. This is an arrow appreciated by all.

Now a precise version. The “second law of thermodynamics” is the statement that entropy increases or stays constant as time moves forward. Entropy has a practical and a theoretical meaning. Its practical meaning is the amount of heat transferred divided by the temperature at which the transfer took place. Thus the entropy increase of a gram of ice that melts is 334 joules (about 80 calories, what it takes to melt the ice), divided by T (about 273 K), where entropy has units of joules per Kelvin (and Kelvin is a measure of temperature above absolute zero). It is thus about 1.22 joules per Kelvin for 1 gram of H₂O at standard pressure. The theoretical meaning of entropy is both as the logarithm of multiplicity (divided by $k_{\mathrm{Boltzmann}}$¹) and as missing information. It can be shown that all definitions of entropy agree (up to a multiplicative constant), and in the following, I’ll mostly use the theoretical definitions.²

Is the “perceived arrow” the same as entropy increase? In this essay I mix the ideas, but some people worry about what perception is, what consciousness is, and I don’t have a good answer for them. Nevertheless, my demonstration in terms of entropy does not depend on defining these terms. In this article I’ll show that you can have an arrow of time in the opposite direction, and in fact both arrows can exist simultaneously (Schulman 1999). This is a consequence of not presupposing an arrow and allowing an arrow to emerge as a consequence of having a large system. It completely rewrites ideas on causality and as such will make some people uncomfortable.
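The melting-ice figure quoted in the Introduction can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text (334 J and 273 K); the function and variable names are illustrative, and the Boltzmann constant is the exact SI value, included to show the size of the same entropy in dimensionless units.

```python
# Worked arithmetic for the melting-ice example; inputs are the values in the text.
LATENT_HEAT_J = 334.0        # heat needed to melt 1 g of ice, in joules
MELTING_POINT_K = 273.0      # approximate melting temperature, in kelvin
K_BOLTZMANN = 1.380649e-23   # J/K, exact SI value

def entropy_change(heat_joules, temperature_kelvin):
    """Entropy change for heat transferred at a fixed temperature: dS = Q / T."""
    return heat_joules / temperature_kelvin

delta_s = entropy_change(LATENT_HEAT_J, MELTING_POINT_K)  # in J/K
delta_s_dimensionless = delta_s / K_BOLTZMANN             # entropy in units of k

print(f"{delta_s:.2f} J/K")
print(f"{delta_s_dimensionless:.2e} (dimensionless)")
```

The first number reproduces the roughly 1.22 joules per Kelvin stated above; the second shows how large the same quantity is once the constant is divided out.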

Some Mathematics

There are two pieces of mathematics that I will require to make my point. The first is the notion of boundary conditions. In space this is easy to understand. Suppose you have a string tied down at its two ends, but allow it to vibrate in one dimension in between. This gives rise to a classic problem in mathematics and leads to Fourier analysis and other properties of the associated partial differential equation. You can also have boundary conditions in time. For example, you might require an oscillator to be at one position at an “initial” time and at another at a “final” time. Depending on the kind of oscillator and the boundary conditions, there could be one, none, or many ways to satisfy these conditions. For example, if you have the simplest kind of oscillator, $d^2x/dt^2 = -\omega^2 x$ with $x(t_{\mathrm{initial}}) = x_i$ and $x(t_{\mathrm{final}}) = x_f$, then

\[
x(t) = \frac{x_i \sin \omega(t_{\mathrm{final}} - t) + x_f \sin \omega(t - t_{\mathrm{initial}})}{\sin \omega(t_{\mathrm{final}} - t_{\mathrm{initial}})}. \tag{1}
\]

Fig. 1 Cat map for an “ideal gas.” 500 points are started in the square (x, y) ∈ [.5, .6] ⊗ [.5, .6], but are otherwise random. Under the cat map, $x' \equiv x + y$ and $y' \equiv x + 2y$ (both mod 1), they become the parallelogram in the next figure. On the next time step (the next figure) the mod 1 action coupled with the stretching has begun to pull the points apart and by time 7 (the last figure) nothing is recognizable

This obviously satisfies the boundary conditions and the equation of motion. (Note that things get complicated for $t_{\mathrm{final}} - t_{\mathrm{initial}} = \pi n/\omega$ for integer $n$, but this is not of current interest.)

Now imagine much more complicated two-time boundary conditions. You have a large number of particles,³ but you don’t give the position of each one. Rather you give macroscopic information; I have yet to give a definition of macroscopic, but I have in mind what’s possible for us limited human beings, information about gross properties. If those conditions at the disparate times are not equilibrium, then you could expect that at intermediate times – times between $t_{\mathrm{initial}}$ and $t_{\mathrm{final}}$ – the system would head for equilibrium (and would reach it if there’s enough time), but before the final time, it would do something strange: it would head away from equilibrium to meet the final conditions. One consequence of the second law (as elaborated above) is that entropy is maximum in equilibrium. This two-time boundary condition has caused entropy to decrease. It has violated the second “law.” Since the two-time boundary condition does not contradict any existing experiment, it remains a possibility.⁴,⁵,⁶

A second piece of mathematics arises when I want to give an example. This involves the “cat map,”⁷ which resembles classical mechanics and is known to equilibrate quickly. It is a map of the unit square into itself: $x' \equiv x + y$ and $y' \equiv x + 2y$, both modulo 1, where $x$ and $y$ are coordinates within the unit square and $x'$ and $y'$ represent the new values after one application of the map. The map is area preserving and, thinking of $x$ and $y$ as periodically defined position and momentum, this obeys a kind of Liouville theorem (equal areas before and after, in 2-dimensions), qualifying it as a kind of classical mechanics. Our system is an ideal gas of particles obeying the cat map. If you start these “particles” in a small region of the unit square they will have low entropy since there are fewer states associated with a bunched configuration than one that is spread out (for space-based entropy, a subject that we will make precise in a moment). After a few iterations of the cat map, the particles are (generally) spread out, an increase of the entropy. In Fig. 1 I show an example.
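The spreading described above is easy to reproduce numerically. The following is a minimal sketch, not the author’s program: it assumes 500 points started uniformly at random in the square [0.5, 0.6] × [0.5, 0.6] (as in the caption of Fig. 1), applies the cat map seven times, and counts how many cells of a 10-by-10 grid contain at least one point. The seed, the helper names, and the choice of a grid count as a measure of spread are illustrative.

```python
import random

def cat_map(x, y):
    """One step of the cat map: x' = x + y, y' = x + 2y, both modulo 1."""
    return (x + y) % 1.0, (x + 2.0 * y) % 1.0

def occupied_cells(points, cells_per_side=10):
    """Number of distinct grid cells that contain at least one point."""
    return len({(int(x * cells_per_side), int(y * cells_per_side))
                for x, y in points})

# The map's matrix is [[1, 1], [1, 2]]; determinant 1 means areas are preserved.
determinant = 1 * 2 - 1 * 1

rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
points = [(0.5 + 0.1 * rng.random(), 0.5 + 0.1 * rng.random())
          for _ in range(500)]

history = [occupied_cells(points)]
for _ in range(7):
    points = [cat_map(x, y) for x, y in points]
    history.append(occupied_cells(points))

print(history)  # occupied-cell count at times 0 through 7
```

The count starts at a single cell and should approach the full grid within a handful of steps, which is the qualitative behavior the figure illustrates.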

Fig. 2 Increase of entropy with time in the cat map. The coarse grains are 1/10 on a side. The line above the circles is the theoretical maximum, log 100 (for this grain size). The equilibration time for this coarse graining is about 5 or 6. The equilibration time only increases like the logarithm of the total number of grains

This mechanical system is headed for equilibrium. To be quantitative, define an information entropy using a coarse graining. As grains, take the 100 squares of side 1/10 contained in the unit square, and only count the number of points in each grain. Thus the observer can only determine which grain a point is in, not the point’s exact coordinates. The entropy is

\[
S = -\sum_k p_k \log p_k, \qquad p_k \equiv n_k/N, \tag{2}
\]

where $k$ runs over the coarse grains, $n_k$ is the number of points in grain $k$, and $N = \sum_k n_k$. Time evolution consists of applying the cat map to each particle individually. I’ll give a few examples to show that all this makes sense, including my earlier declaration concerning the second law. First, I show that the system comes to equilibrium. Suppose you start all points in a 0.1-by-0.1 square (with its lower left at (0.5, 0.5)) and calculate the entropy using squares of side 1/10; then the initial entropy is zero, but it increases quickly. This is illustrated in Fig. 2. In the real world, entropy may not increase so quickly, and, for example, some objects last a long time. They would make even better examples of what I’m about to show, but alas, it is difficult to solve the associated two-time boundary problem.
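The coarse-grained entropy of Eq. (2) can be computed directly from grain counts. The sketch below is an illustration under stated assumptions rather than the code behind Fig. 2: it uses 500 points (the figure’s particle number is not given in the text), a fixed random seed, and hypothetical helper names. Empty grains contribute nothing to the sum, so only occupied grains are counted.

```python
import math
import random
from collections import Counter

def cat_map(x, y):
    """One step of the cat map: x' = x + y, y' = x + 2y, both modulo 1."""
    return (x + y) % 1.0, (x + 2.0 * y) % 1.0

def coarse_entropy(points, grains_per_side=10):
    """S = -sum_k p_k log p_k with p_k = n_k / N, summed over occupied grains."""
    last = grains_per_side - 1
    counts = Counter((min(int(x * grains_per_side), last),
                      min(int(y * grains_per_side), last))
                     for x, y in points)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

rng = random.Random(1)  # arbitrary fixed seed
points = [(0.5 + 0.1 * rng.random(), 0.5 + 0.1 * rng.random())
          for _ in range(500)]

entropies = [coarse_entropy(points)]
for _ in range(10):
    points = [cat_map(x, y) for x, y in points]
    entropies.append(coarse_entropy(points))

s_max = math.log(100)  # theoretical maximum for 100 equally likely grains
for t, s in enumerate(entropies):
    print(f"t = {t:2d}  S = {s:.3f}  (max {s_max:.3f})")
```

With a finite number of points the entropy levels off slightly below log 100, because the sampled grain occupancies are never perfectly uniform.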

Opposite Arrows

There are other things to try once you adopt the two-time boundary condition. For example, a system that comes out of equilibrium – without waiting for a Poincaré recurrence – is illustrated in my book Schulman (1997) (Chap. 4, Sect. 3). Probably the most puzzling is the case of “opposite arrows.” It is obvious that if you demand that all points fall in one square at the end of a run, rather than at the beginning (as in Fig. 2), you would get an arrow that points the “wrong” way. What I mean by “wrong” is that entropy decreases. This is trivial and based on the time symmetry of the map.⁸ Now imagine two “cats,” for one of

Fig. 3 Increase of entropy with time in the cat map, with opposite boundary conditions for two “cats.” The coarse grains are 1/10 on a side. The line above the circles is the theoretical maximum, log 100 (for this grain size). The equilibration time for this coarse graining is about 5 or 6

which the entropy increases in the usual direction and the other has its arrow reversed. Again, no problem. Each goes its own way. But now couple them. You might think the old joke about an anti-person and person meeting would cause the whole thing to blow up.⁹ No, nothing blows up. The two persons, cats, whatever, see a certain amount of noise due to the neighboring system, but each retains an arrow. This is illustrated in Fig. 3, where both entropies are shown. Note that the coupling has the same strength at all times (something not expected in reality), and nevertheless, each retains its arrow. Moreover there is a short period during which they coexist and have opposite arrows. What does this imply? The most important task would be to find experimental, or observational, consequences. An example of this would be a star whose entropy decreases. But it won’t be easy. For one thing, if it’s a star, instead of radiating, it will be absorbing light. And that means you won’t see it. So you’d have to look for something hidden, and that something is decreasing in entropy. It would have a gravitational interaction. Does this make it a candidate for dark matter? Several arguments militate against this possibility, although the properties of opposite arrow material at this point are unknown.

A Paradox

Let’s call this paradox the cat paradox. (With time travel there’s also something known as the grandfather paradox in which you shoot your grandfather before his child, your parent, is born; as a grandfather I find this lacking in amusement. And anyway, I don’t deal with time machines.)¹⁰ For this paradox we have two protagonists, Alice and Bob. And, as usual for physics, there’s a cat, which might or might not get wet. Alice is going in one direction in time and Bob and his cat, in the other. The cat is sunning itself by an open window, and Bob is in the room but not paying attention. Alice, who can see into Bob’s future, notices that it will rain, so she gets Bob’s attention and tells him to close the window – to keep his cat dry. If

he closes it, Alice sees a closed window and no need to signal. Is the window open or closed? A paradox of this sort was considered by Wheeler and Feynman (1949) who resolved it using continuity in nature. With this solution Bob will mostly close the window leaving it slightly open – enough for Alice to tell him about it. Unfortunately, the Wheeler-Feynman solution doesn’t consider other problems (Schulman 2002). How does Alice send her message? Perhaps an inscription on a rock or maybe a note written on paper. I’m reluctant to invoke electromagnetic signals because I don’t know how they behave in time-symmetric situations. So let’s suppose it’s a rock. Alice writes something on the rock, maybe a weak demand for a window closing. Bob may be able to interpret Alice’s writing because of watching earlier (to him) broadcasts in her language (there I go using photons, but let’s not worry about that). But there’s still a problem. Bob thinks he threw the rock. As his subjective time goes forward, the rock is more distant, or at least it left traveling away from him. Even worse, the rock has been with him, inscription and all, before – in his time – it’s sent away. How did the inscription get there? After all, rocks can be smooth or rough, but to have inscriptions in an existing language requires a highly unlikely series of events. So Bob has seen the 2nd law violated. Maybe he didn’t realize it at first because he didn’t know Alice’s language, and maybe it looked like the usual broken patterns of rocks. But if he can read it, it would be an unlikely event, one violating his version of the 2nd law. (And if he can’t read it, there’s no paradox.) That violation would have taken place earlier, possibly before contact with an opposite-arrow system was encountered. 
The bottom line in all this is that you’d have trouble recognizing that you were encountering an opposite arrow system. You’d have more noise than expected, but maybe that’s Murphy’s law – something always goes wrong – and someone trying to interpret the signal might not be convinced that it’s a reversed arrow system.

Further Issues

There are also issues that go beyond the physics calculations I’ve done and might be called philosophy. First and foremost, why have two-time boundary conditions? My answer is that the usual initial condition prejudices the arrow of time. A proper derivation of the arrow does not presuppose it. And if you’d like to prove that expansion of the universe drives the arrow, you should phrase the argument in an unprejudiced manner. The second answer has to do with the uncertainties of cosmology. The Russian physicist Lev Landau said of cosmologists: often in error, never in doubt. Cosmology is difficult, and it takes a certain hubris to make claims about the universe. At this stage there are still serious scientists who say there may be a collapse ahead. Another problem in my work is the use of macroscopic boundary conditions. Presumably, if the world is deterministic, a single boundary, with all detail, would be sufficient. The use of perturbations is a related problem. Let me elaborate. Using the framework of the present article, I’ve shown (Schulman 2001) that “causality is

an effect.” To give this meaning, I first do a two-time boundary value problem, with low entropy at both ends, without perturbation, and with boundary times sufficiently separated to allow equilibration in the middle. Entropy rises, reaches a maximum, and then declines. Then I perturb the system by having a given time step, $t_0$, use a different map, not the cat map, but something faster, such as $M_{\mathrm{fast}} \equiv \begin{pmatrix} 3 & 2 \\ 4 & 3 \end{pmatrix}$ (the determinant is still 1 – it’s area preserving). I then solve the new boundary value problem with the same macroscopic boundary conditions (but different exact paths, since the collection of maps is different). Now there are three possibilities. If the “given time” is during the period when the system is in equilibrium (so $t_0 > 5$ and $t_{\mathrm{final}} - t_0 > 5$ in the examples given), there is no significant change in the entropic history, although the exact paths – and the cryptic constraints – are different. If the time for the perturbation is before the system reaches equilibrium, then the increase in entropy is later than the perturbation, and no macroscopic effect is visible before the perturbation. What’s interesting is that if the perturbation occurs in the period when the arrow of time is opposite (so $t_{\mathrm{final}} - t_0 < 5$ in the example given), i.e., during the last stage of evolution, then the increase in entropy is only before the time of the perturbation. (A word about “before” and “after”: here I use the time of the programmer.) Thus causality, effect following cause, depends on the arrow of time, as reflected in the increase of entropy. The “philosophical” question that I have then is the perturbation – how to justify its use. If the world follows a definite path, then it can’t be perturbed. It is what it is.
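Two claims about the perturbing map can be checked by hand or in a few lines: that its determinant is 1, and that it is “faster” than the cat map. The sketch below assumes the perturbing matrix has rows (3, 2) and (4, 3), and it takes “faster” to mean a larger stretching factor per step (the larger eigenvalue); that reading, and the helper names, are assumptions rather than statements from the text.

```python
import math

CAT = ((1, 1), (1, 2))    # the usual cat map
FAST = ((3, 2), (4, 3))   # assumed entries of the perturbing map

def det2(m):
    """Determinant of a 2x2 matrix given as nested tuples."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def stretch_factor(m):
    """Larger eigenvalue of a 2x2 matrix with determinant 1 and trace > 2.

    For such a matrix the eigenvalues solve lambda^2 - trace*lambda + 1 = 0.
    """
    trace = m[0][0] + m[1][1]
    return (trace + math.sqrt(trace * trace - 4.0)) / 2.0

for name, matrix in (("cat", CAT), ("fast", FAST)):
    print(f"{name}: det = {det2(matrix)}, "
          f"stretch per step = {stretch_factor(matrix):.3f}")
```

Both determinants come out as 1, so both maps preserve area, and the larger stretching factor of the second map is consistent with its equilibrating in fewer steps.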

Conclusions Conceptually there’s nothing preventing opposite arrow matter from existing. It would be difficult to detect, but that’s not a reason to believe it’s not there. One scenario would have a collapsing universe with a piece of the other-time-direction stuff, a piece that dissipates slowly, come close. There would be a gravitational interaction but I doubt if signals could be exchanged. However, the gravitational effect would be accompanied by increased and not otherwise explained dissipation in systems that can be seen. There may be other scenarios, related to the vastness of the universe, which for all we know is infinite. In any case, until there is experimental or observational evidence for this, the idea remains a speculation.

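Appendix: Precise Definition of the Modified “Cat” The usual cat map is

\[
M = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \xi' \equiv M\xi \bmod 1, \qquad \xi = \begin{pmatrix} x \\ y \end{pmatrix}, \quad (x, y \in [0, 1)), \tag{3}
\]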


where the mod 1 means modulo 1, i.e., subtract (or add) integers so the remainder satisfies $0 \le x' < 1$ and $0 \le y' < 1$. A second map, the coupling of A and B, is

\[
S_\alpha = \begin{pmatrix} 1 & \alpha \\ 0 & 1 \end{pmatrix}, \qquad \xi' \equiv S_\alpha \xi \bmod 1, \qquad \xi = \begin{pmatrix} x_A \\ y_B \end{pmatrix} \ \text{or} \ \xi = \begin{pmatrix} x_B \\ y_A \end{pmatrix}. \tag{4}
\]

A single step of the combined operation, acting on a 4 vector $\eta = \begin{pmatrix} \xi_A \\ \xi_B \end{pmatrix}$ (where indices A and B refer to Alice and Bob) is

\[
\eta' = \bar{S}_{\alpha/2}\, \bar{M}\, \bar{S}_{\alpha/2}\, \eta, \tag{5}
\]

where

\[
M \equiv \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix} \quad \text{and} \quad S_\alpha \equiv \begin{pmatrix} 1 & 0 & 0 & \alpha \\ 0 & 1 & 0 & 0 \\ 0 & \alpha & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{6}
\]

and the bars atop S and M indicate that the mod 1 operation is performed after acting with each matrix.

Acknowledgments I thank Frank Avignone, Josi Avron, Daniel ben-Avraham, Richard Creswick, Paolo Facchi, Shmuel Fishman, Bernard Gaveau, Eva Mihóková, Dima Mozyrsky, Charles M. Newman, Amos Ori, Saverio Pascazio, Marco Roncadelli, Antonello Scardicchio, Leonard J. Schulman, Eran Singer, and Jeffrey Weeks for helpful discussions.
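Returning briefly to the Appendix: the combined step of equations (5) and (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used for the results in this chapter. The function names (`coupling`, `apply_mod1`, `step`, `det2`), the ordering of the state as (x_A, y_A, x_B, y_B), and the use of exact rational arithmetic via `fractions.Fraction` are all choices made here for clarity.

```python
from fractions import Fraction

# 4x4 cat map from equation (6); the state is eta = (x_A, y_A, x_B, y_B).
M = [[1, 1, 0, 0],
     [1, 2, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 2]]


def coupling(alpha):
    """Return the 4x4 coupling matrix S_alpha from equation (6)."""
    return [[1, 0, 0, alpha],
            [0, 1, 0, 0],
            [0, alpha, 1, 0],
            [0, 0, 0, 1]]


def apply_mod1(matrix, eta):
    """Act with a matrix, then reduce each component modulo 1 (the 'bar')."""
    return [sum(m * e for m, e in zip(row, eta)) % 1 for row in matrix]


def step(eta, alpha):
    """One combined step, equation (5): S_{alpha/2}, then M, then S_{alpha/2}."""
    half = coupling(Fraction(alpha) / 2)
    return apply_mod1(half, apply_mod1(M, apply_mod1(half, eta)))


def det2(m):
    """Determinant of a 2x2 matrix; a value of 1 means the map preserves area."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Hypothetical starting point, chosen only for illustration.
eta0 = [Fraction(1, 3), Fraction(1, 4), Fraction(1, 5), Fraction(1, 7)]
print(step(eta0, Fraction(1, 10)))
print(det2([[1, 1], [1, 2]]), det2([[3, 2], [4, 3]]))
```

Exact fractions avoid the rounding errors that the stretching of the cat map amplifies quickly. A floating-point version is faster but loses track of an individual orbit after a few dozen steps.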

Notes
1 Boltzmann has been honored by naming a useless constant $k_{\text{Boltzmann}} \approx 1.38 \times 10^{-23}$ J/K after him (which is a shame, considering Boltzmann’s stature). Temperature should be measured in energy units, but in a world in which people cannot agree on Celsius vs. Kelvin vs. Centigrade vs. Fahrenheit, who can demand logic? This gives entropy units of joules per Kelvin, whereas it should be dimensionless. This constant, with or without units, is the source of the ambiguity mentioned in the text.
2 As pointed out by Schrödinger (1944) and as just demonstrated, entropy is not a hazy concept; it’s measurable like temperature or pressure. Yes, you sometimes multiply it by a constant to match theoretical ideas, but once the constant is defined, it’s measurable – and it increases or stays constant in time.
3 This discussion uses classical mechanics to illustrate its points. We could just as well have used quantum mechanics, but it would unnecessarily complicate the discussion.
4 There are two scenarios that I can imagine in which a two-time boundary condition would arise. The first is theoretical studies of the arrow of time. By giving equal weight to both possible arrows, one can derive one or the other. A second more physical possibility is that we live in a universe that will ultimately collapse. Yes, I know that contemporary ideas are against this notion, but I know of enough “contemporary ideas” about cosmology to realize that occasionally they are just plain wrong. Other scenarios may also exist.


5 This is not the only instance of future boundary conditions in physics. There is a famous paper by Dirac (1938) in which he derives radiation reaction through the non-existence of a self-force on an electromagnetic particle. There is trouble however in that the equation governing the charged particle is of third order in time and has runaway solutions. The latter are eliminated by Dirac through a future boundary condition. And Dirac is not at all apologetic about this constraint, calling it “the most beautiful feature of the theory.”
6 Although many patent offices refuse to grant patents on perpetual motion devices on the grounds that they violate the first or second laws of thermodynamics, there are serious scientists continuing to test the second law. Daniel Sheehan of the University of San Diego works in this area (Sheehan et al. 2014). Personally I don’t expect success (except for future boundary conditions) and have not read his cited paper carefully, but I think it’s a good idea to test everything. (Well, almost everything. I don’t think it’s necessary to confirm that the earth is not flat.) Physics is an experimental science and despite Eddington’s objections (Eddington 1935) it’s worth trying.
7 This particular map is called the “cat map” because the mathematician V. I. Arnol’d illustrated its equilibration with the image of a cat, which after a few applications of the mapping (to the image) was completely distorted. See Arnold and Avez (1968).
8 The cat map is not time-symmetric by some definitions, but the dynamics in the backward direction shares properties with the forward motion. (The matrices, $M \equiv \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$ (which governs the cat map) and $M^{-1}$ have the same two eigenvalues.) In particular, demanding a low entropy state at what might be called a final time requires cryptic constraints, but points selected in this way can be anywhere in the unit square at the initial time (provided there’s enough time to equilibrate).
9 The joke has a person and an anti-matter version of the same person meeting. When they touch, they blow up.
10 Time machine-type paradoxes come up among those who consider Gödel universes or wormholes. As far as I can tell, those who worry about such paradoxes don’t take into account either dissipation or possible changes in the arrow of time.

References
Arnold VI, Avez A (1968) Ergodic problems of classical mechanics. Benjamin, New York
Dirac PAM (1938) Classical theory of radiating electrons. Proc R Soc Lond A 167:148–169
Eddington A (1935) The nature of the physical world. E.P. Dutton, New York. The quotation concerning the hopelessness of contradicting the 2nd law of thermodynamics is found on page 37 in my 1948 reprinted edition, published by Cambridge University Press
Schrödinger E (1944) What is life? Cambridge University Press, Cambridge
Schulman LS (1997) Time’s arrows and quantum measurement. Cambridge University Press, New York
Schulman LS (1999) Opposite thermodynamic arrows of time. Phys Rev Lett 83:5419–5422
Schulman LS (2001) Causality is an effect. In: Mugnai D, Ranfagni A, Schulman LS (eds) Time’s arrows, quantum measurements and superluminal behavior, Roma. Consiglio Nazionale delle Ricerche (CNR), pp 99–112, available from arXiv:cond-mat/0011507
Schulman LS (2002) Opposite thermodynamic arrows of time. In: Sheehan DP (ed) Quantum limits to the second law, Melville, New York (2002). American Institute of Physics
Sheehan DP, Mallin DJ, Garamella JT, Sheehan WF (2014) Experimental test of a thermodynamic paradox. Found Phys 44:235–247
Wheeler JA, Feynman RP (1949) Classical electrodynamics in terms of direct interparticle action. Rev Mod Phys 21:425–433

Visualizing Four Dimensions in Special and General Relativity

78

Magdalena Kersting

Contents
Introduction
Mathematics of Space and Time
Four-Dimensional Spacetime and the Special Theory of Relativity
Gravity, Geometry, and the General Theory of Relativity
Black Holes and Numerical Relativity
Revealing Spacetime Through Technology
Imagination and Artistry
Analogies and Metaphors
Spacetime Diagrams
Relativistic Ray Tracing and First-Person Visualizations
Gravitational Lensing and Astrophysical Observations
Numerical Simulations of Gravitational Waves
Virtual, Augmented, and Mixed Reality
Conclusion
Cross-References
References

Abstract Modern physics unfolds on the stage of four-dimensional spacetime. Grappling with century-old ideas of space and time, Albert Einstein revolutionized our understanding of the cosmos by merging space and time into a four-dimensional entity that takes an active role in shaping the laws of physics. While experiments have repeatedly confirmed Einstein’s theories, the abstract character of this physical knowledge contradicts the common sense of many. Based on the physics of relativity and the mathematics of differential geometry, scientists have developed visualizations and representations of spacetime to make Einstein’s ideas more intelligible. This chapter explores the links between the mathematics of space and time and our historic struggle to visualize these concepts. Technology serves as the lens to unpack the fruitful interplay between mathematics, physics, and arts that has shaped our understanding of spacetime. Linking mathematical concepts with physical intuition and artistic vision, imaginative thinkers developed representations that led from simple spacetime diagrams and analogies to powerful numerical simulations and virtual environments that allow exploring the extreme physics of black holes and gravitational waves. Visualizations of spacetime continue to be an active field of research that is driven by interdisciplinary efforts to understand the cosmos.

M. Kersting, Department of Physics, University of Oslo, Oslo, Norway; ARC Centre of Excellence OzGrav, Swinburne University of Technology, Hawthorn, Australia. e-mail: [email protected]
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_120

Keywords Spacetime · Special relativity · General relativity · Numerical relativity · Technology · Visualization · Imagination · Interdisciplinary

Introduction Before dealing further with the special theory of relativity, I want to try to convey to the reader what is involved in the new phrase “spacetime,” because that is, from a philosophical and imaginative point of view, perhaps the most important of all the novelties that Einstein introduced. (Russell, 1925)

For many physicists, the theory of relativity is one of the most beautiful mathematical descriptions of our cosmos. In a stroke of genius, Albert Einstein merged space and time into spacetime, a four-dimensional entity that takes an active role in shaping the laws of physics. Einstein’s field equations govern the dynamics of spacetime physics. The gravitational field influences the flow of time, and massive cosmic collisions let space vibrate and ripple. Yet, the abstract nature of four-dimensional spacetime has challenged the imaginative faculties of scientists and laymen alike. Even though the theory of relativity firmly established a new scientific worldview and had vast philosophical repercussions, Einstein’s ideas of curved space and warped time are still perceived as elusive, counterintuitive, and at times even paradoxical (Kersting and Steier, 2018). In their quest to understand our universe, generations of scientists have faced the challenge of visualizing four-dimensional spacetime. Based on the physics of relativity and the mathematics of differential geometry, scientists have developed representations that illustrate features of a theory whose geometry continues to confound. This chapter explores the links between the mathematics of space and time and our historic struggle to visualize these concepts. To unpack this struggle,


technology serves as the lens through which we view possibilities of visualizations in the domain of relativity. According to Martin Heidegger (1977), technology is a way of seeing and understanding the world. In line with this view, the historic struggle to obtain representations of spacetime translates to revealing unknown features of spacetime in order to push the boundaries of our understanding. This quest has propelled the development of relativity from its historic origins into the age of scientific visualizations and numerical relativity. Investigations into the physical properties of spacetime and its mathematical structure continue to motivate scientists in their wish to understand the Universe.

Mathematics of Space and Time The views of space and time which I want to present to you arose from the domain of experimental physics, and therein lies their strength. Their tendency is radical. From now onwards space by itself and time by itself will recede completely to become mere shadows and only a type of union of the two will still stand independently on its own. (Minkowski, 1909)

In the following, we briefly present the mathematics of spacetime to introduce special and general relativity as the stage where century-old ideas of space and time interact with new technologies. Although Einstein published his theory of relativity more than a century ago, much progress has only been made in recent years. As a system of coupled, nonlinear partial differential equations, Einstein’s equations are notoriously difficult to solve. Consequently, scientists study relativistic phenomena through numerical methods. Numerical relativity provides sophisticated techniques for visualizing dynamic spacetime phenomena such as binary black hole mergers and gravitational waves (Baumgarte and Shapiro, 2010). In line with the historic development of spacetime, our presentation in this section follows a progression from the mathematical foundations of four-dimensional spacetime and special relativity to the formulation of Einstein’s field equations and general relativity before we turn to numerical relativity and present-day challenges of visualizing spacetime. The exposition of this chapter addresses graduate students or advanced undergraduate students in physics or mathematics without prior knowledge of relativity and differential geometry. We only expect readers to be familiar with linear algebra, basic group theory, and classical mechanics. Following the conventional formalism, we provide a brief introduction to the central concepts of special and general relativity. A more detailed introduction to these topics and more examples can be found in numerous textbooks, for example (Baumgarte and Shapiro, 2010; Carroll, 2003; Guidry, 2019; Hartle, 2003). Once the mathematical stage of special and general relativity is set, our focus shifts to visualizations of relativistic phenomena. Specifically, we show how technology and interdisciplinarity have played important roles in understanding and visualizing spacetime.
It is interesting to trace the historic origins of spacetime because the pioneers of relativity were very much led by visual-geometric thinking. Many of these pioneers, among them Hermann Minkowski and Henri Poincaré, were mathematicians who


translated their mathematical insights into a physical theory of space and time (Galison, 1979). Minkowski went so far as to claim that the theory of relativity discovered by Einstein could have been formulated by mathematicians already in the late nineteenth century when it became popular to study geometries via their characteristic groups of transformations. Others agreed with this judgment. In his popular scientific account of relativity, Bertrand Russell (1925) stated that there was little in the theory of relativity that could be regarded as physical laws or physics in the strict sense. Yet, relativity is a branch of physics, not of pure mathematics. Of course, the conclusions and implications of the theory could not have been obtained without the aid of abstract mathematical reasoning (Durell, 1926). Still, Minkowski (1909) repeatedly stressed the importance of experimental physics in deducing that “three-dimensional geometry becomes a chapter in four-dimensional physics.” For Minkowski and others, geometry was not merely an abstraction from physical laws but constituted the very nature of natural phenomena (Petkov, 2014; Walter, 2014). Thus, experimental results and observations played a pivotal role in the development, interpretation, and visualization of spacetime physics: mathematics and physics have always been tightly interwoven in the domain of relativity.

Four-Dimensional Spacetime and the Special Theory of Relativity According to Minkowski (1915), “the world in space and time is, in a certain sense, a four-dimensional non-Euclidean manifold.” The goal of this introduction to the mathematics of space and time is to explain Minkowski’s statement and to supplement it with a precise mathematical formalism. We introduce some necessary notation first. Spacetime is a four-dimensional set in which we combine the time $t$ and the position vector $\mathbf{x} \in \mathbb{R}^3$ into a four-vector $x^\mu = (t, \mathbf{x}) = (x^0, x^1, x^2, x^3)$ that labels a point (sometimes called “event”) in spacetime. We use small Greek letters to denote spacetime indices and small Roman letters to denote spatial indices of such four-vectors, i.e., $\mu, \nu = 0, 1, 2, 3$ and $i, j = 1, 2, 3$. Since most of our equations will be equations between 4 × 4 matrices, it is convenient to adopt such an index notation where superscripts and subscripts label elements of vectors and matrices. We define the spacetime interval, often also called the line element, by

\[
ds^2 = -dt^2 + dx^2 + dy^2 + dz^2. \tag{1}
\]

As is common in relativity, we use natural units in which the speed of light $c$ is set equal to 1. Some authors prefer to define the line element with the c-factor included:

\[
ds^2 = -c^2\,dt^2 + dx^2 + dy^2 + dz^2.
\]

This notation makes explicit that $c$ is a conversion factor between space and time. Thus, in our notation that uses natural units, time is measured in length units. The importance of the spacetime interval in relativity is that it is invariant under changes of inertial reference frames, i.e., reference frames that are either at rest or that move at a constant velocity. One can write the spacetime interval in a more compact form by introducing a 4 × 4 matrix, the Minkowski metric, via

\[
\eta_{\mu\nu} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]

Here, the two lower indices label the components of the metric. Introducing the standard summation convention, in which indices appearing both as superscripts and subscripts are summed over, we can now express the line element as

\[
ds^2 = \eta_{\mu\nu}\, dx^\mu dx^\nu. \tag{2}
\]

Equations (1) and (2) are therefore equivalent; equation (2) just makes use of the conventional summation formalism that one can find in many textbooks. Minkowski space is the four-dimensional real vector space of all points $x^\mu$ equipped with the Minkowski metric. Minkowski space presents a special case of a four-dimensional non-Euclidean manifold. In the next section, we properly introduce the notion of a differentiable manifold. In this section, we take a closer look at the non-Euclidean nature of Minkowski space. The geometry of Minkowski space is determined by the Poincaré group, the group of all transformations that leave the spacetime interval invariant. The Poincaré group is a ten-parameter non-abelian group that consists of translations, rotations, and boosts. Boosts illustrate the non-Euclidean nature of Minkowski space quite well because they can be thought of as rotations between space and time. Thus, in contrast to a four-dimensional Euclidean space, Minkowski space does not feature an absolute notion of simultaneous events. It depends on the observer and the choice of coordinates whether or not two events occur at the same time. Spatial rotations and boosts together make up the so-called Lorentz group under matrix multiplication. If $\Lambda^{\mu'}{}_{\nu}$ is a Lorentz transformation (i.e., a 4 × 4 matrix labeled by two indices $\mu', \nu$) that corresponds to the coordinate transformation

\[
x^\mu \to x^{\mu'} = \Lambda^{\mu'}{}_{\nu}\, x^\nu,
\]

then we can express the condition that $\Lambda^{\mu'}{}_{\nu}$ be a Lorentz transformation as follows:

\[
\eta_{\rho\sigma} = \Lambda^{\mu'}{}_{\rho}\, \Lambda^{\nu'}{}_{\sigma}\, \eta_{\mu'\nu'}. \tag{3}
\]

One simple example of a boost in the x-direction is given by

\[
\Lambda^{\mu'}{}_{\nu} = \begin{pmatrix} \cosh\phi & -\sinh\phi & 0 & 0 \\ -\sinh\phi & \cosh\phi & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{4}
\]

where $\phi$ is the boost parameter. It is a simple exercise for the reader to check that this matrix does, indeed, satisfy equation (3). We return to this specific boost transformation in the next section when we visualize Lorentz transformations via spacetime diagrams. There is a close analogy between the Lorentz group and the rotation group in three-dimensional Euclidean space. Geometrically, Lorentz transformations can be constructed as a rotation about the origin of coordinates in a four-dimensional vector space with three real axes and one imaginary axis. Of course, it is challenging to build intuitions of such a four-dimensional geometry. Yet, from a physics perspective, it is relatively straightforward to interpret Lorentz transformations. Lorentz transformations are those transformations that leave the form of the equation of a propagating light wave invariant. The structure of Minkowski spacetime is therefore a natural consequence of special relativity, which postulates that there exists a fixed velocity $c = 1$ at which electromagnetic waves propagate in vacuum. Publishing his special theory of relativity, Einstein (1905) embraced the implications of the light postulate: while absolute motion, absolute space, and absolute time do not have any physical significance, it is the relation between these items in the form of the spacetime interval that has physical significance. More than anything else, special relativity is therefore a theory of the structure of spacetime.
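The exercise mentioned above can also be checked numerically. The following sketch is an illustration only: the helper names (`boost_x`, `matmul`, `transpose`, `interval`, `max_abs_diff`) and the sample values of the boost parameter and displacement are assumptions made here. It writes condition (3) in matrix form as $\Lambda^{T} \eta \Lambda = \eta$ and confirms that the boost of equation (4) satisfies it and preserves the spacetime interval up to rounding.

```python
import math

# Minkowski metric eta_{mu nu}, signature (-, +, +, +), natural units c = 1.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]


def boost_x(phi):
    """Boost along the x-direction with boost parameter phi, equation (4)."""
    ch, sh = math.cosh(phi), math.sinh(phi)
    return [[ch, -sh, 0, 0],
            [-sh, ch, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def interval(dx):
    """ds^2 = eta_{mu nu} dx^mu dx^nu for a displacement (dt, dx, dy, dz)."""
    return sum(ETA[m][n] * dx[m] * dx[n] for m in range(4) for n in range(4))


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(4) for j in range(4))


phi = 0.7  # sample boost parameter, chosen arbitrarily
L = boost_x(phi)

# Equation (3) in matrix form: Lambda^T eta Lambda should reproduce eta.
residual = max_abs_diff(matmul(transpose(L), matmul(ETA, L)), ETA)

# The interval of an arbitrary displacement is unchanged by the boost.
dx = [2.0, 0.5, -1.0, 3.0]
dx_boosted = [sum(L[m][n] * dx[n] for n in range(4)) for m in range(4)]

print(residual)
print(interval(dx), interval(dx_boosted))
```

The check reduces to the identity $\cosh^2\phi - \sinh^2\phi = 1$, so the residual differs from zero only by floating-point rounding.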

Gravity, Geometry, and the General Theory of Relativity Special relativity replaced the Euclidean geometry of three-dimensional space with the geometry of a four-dimensional non-Euclidean differentiable manifold. In the last section, we already encountered the first examples of non-Euclidean features of spacetime, for example, the observer-dependence of simultaneity. In this section, we turn to the definition of differentiable manifolds and show how gravity can be interpreted as the curved geometry of four-dimensional spacetime. Our introduction follows Carroll (2003) and Wald (1984), and the interested reader can find more details in those two textbooks. The knowledge of the structure of spacetime served


as the point of departure for Einstein’s discovery of general relativity (Petkov and Ashtekar, 2014). In general relativity, four-dimensional spacetime is no longer flat but identified with a curved manifold. Despite the need to introduce a certain amount of mathematical formalism to discuss four-dimensional spacetime curvature in a quantitative way, the basic notion that the curvature of spacetime leads to gravitational phenomena is quite simple and elegant (Carroll, 2003). A differentiable manifold (or just a manifold) generalizes the notion of Euclidean spaces. Manifolds are sets that look locally like flat (Euclidean) space $\mathbb{R}^n$ but that can have a different (non-Euclidean) global geometry. To define manifolds we first formalize the notion of a coordinate system (sometimes also called chart) and an atlas. For a given set $M$, a coordinate system consists of a subset $U \subset M$ and a one-to-one map $\varphi : U \to \mathbb{R}^n$ such that $\varphi(U) \subset \mathbb{R}^n$ is open. A (smooth) atlas is an indexed collection of charts $(U_\alpha, \varphi_\alpha)$ such that

1. the union of all $U_\alpha$ covers $M$,
2. if $U_\alpha \cap U_\beta \neq \emptyset$, then the map $\varphi_\alpha \circ \varphi_\beta^{-1}$ is smooth and takes points in $\varphi_\beta(U_\alpha \cap U_\beta) \subset \mathbb{R}^n$ onto an open set $\varphi_\alpha(U_\alpha \cap U_\beta) \subset \mathbb{R}^n$.

We can think of an atlas as a collection of coordinate systems that describe individual regions of $M$ and that are smoothly related on their overlap. Since a set $M$ can be covered in many different ways using different combinations of coordinate systems, an atlas is not unique. Two atlases of $M$ are equivalent if their union is also an atlas of $M$. We define a maximal atlas as the union of all atlases in the equivalence class of a given atlas under this equivalence relation, i.e., the atlas containing all possible coordinate systems consistent with this atlas. The maximal atlas determined by a given atlas is unique. The notion of a maximal atlas finally allows us to define a manifold. A differentiable manifold is a set $M$ together with a maximal atlas of $M$.
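The two atlas conditions can be made concrete with the simplest curved example, the unit circle. The sketch below is an illustration under assumptions made here: the choice of the circle, the two stereographic charts, and all function names are ours and do not come from the text. It covers the circle with two charts, each omitting one pole, and checks numerically that the transition map on the overlap is $u \mapsto 1/u$, which is smooth wherever $u \neq 0$.

```python
# Two stereographic charts on the unit circle {(x, y) : x^2 + y^2 = 1}.
# chart_north is defined away from the north pole (0, 1),
# chart_south away from the south pole (0, -1); together they cover the circle.


def chart_north(p):
    x, y = p
    return x / (1.0 - y)


def chart_south(p):
    x, y = p
    return x / (1.0 + y)


def chart_south_inverse(u):
    """Point on the circle whose chart_south coordinate is u."""
    d = 1.0 + u * u
    return (2.0 * u / d, (1.0 - u * u) / d)


def transition(u):
    """chart_north composed with the inverse of chart_south, for u != 0."""
    return chart_north(chart_south_inverse(u))


samples = [-3.0, -0.5, 0.25, 2.0]

# The chart is one-to-one: mapping back and forth returns the coordinate.
round_trip = [abs(chart_south(chart_south_inverse(u)) - u) for u in samples]

# On the overlap the transition map agrees with the smooth function 1/u.
errors = [abs(transition(u) - 1.0 / u) for u in samples]

print(max(round_trip), max(errors))
```

The excluded value $u = 0$ corresponds to the north pole, which lies outside the domain of `chart_north`, so the transition map is only required to be smooth on the image of the overlap.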
With this definition we have now formalized Minkowski’s statement that the world in space and time is a four-dimensional non-Euclidean manifold. What makes manifolds and four-dimensional spacetime interesting is of course not their locally flat character but their global geometry that often deviates starkly from flat geometry. We therefore turn to the metric tensor (often just called the metric) as our next object of study. As a generalization of the flat Minkowski metric $\eta_{\mu\nu}$, the metric tensor $g_{\mu\nu}$ is one of the most important objects in general relativity because it completely determines the geometry of spacetime. Among other things, the metric tensor allows defining geodesic curves, i.e., the spacetime analogue of shortest distance paths, and it provides the basic building blocks to construct the curvature tensor. To define the metric, we first introduce tangent spaces which are real vector spaces associated with each point of a given manifold $M$. Intuitively, we can think of tangent spaces as the spaces of all possible directions in which one can tangentially pass through a point $x$ on $M$. Tangent vectors, elements of this vector space, are thought of as the velocity of a curve passing through $x$. However, this intuitive picture makes use of the fact that we often think of a manifold as embedded into an ambient vector space $\mathbb{R}^m$ (Fig. 1). It is desirable, though, to define the notion of a tangent space based solely on intrinsic properties of the manifold. To that end, we define a tangent vector as an equivalence class


Fig. 1 A pictorial representation of the tangent space of a single point on a sphere. Intuitively, the tangent space contains all possible directions in which one can tangentially pass through a point on the sphere. (Credits: Alexwright at English Wikipedia [Public domain])

Fig. 2 The tangent space Tx M of a manifold M at the point x ∈ M is defined via tangent vectors v ∈ Tx M along a curve γ (t) that passes through x. (Credits: derivative work by McSush; the original uploader was TN at German Wikipedia [Public domain])

of curves passing through a point $x \in M$ in the following way. Pick a coordinate chart $\varphi : U \to \mathbb{R}^n$ where $U$ is an open subset of $M$ containing $x$. For two given curves $\gamma_1, \gamma_2 : (-1, 1) \to M$ that satisfy $\gamma_1(0) = x = \gamma_2(0)$ and for which $\varphi \circ \gamma_1, \varphi \circ \gamma_2 : (-1, 1) \to \mathbb{R}^n$ are smooth functions, we say that $\gamma_1$ and $\gamma_2$ are equivalent at 0 if and only if the derivatives of $\varphi \circ \gamma_1$ and $\varphi \circ \gamma_2$ coincide at 0. This defines an equivalence relation on the set of all curves $\gamma : (-1, 1) \to M$ that satisfy $\gamma(0) = x$ and for which $\varphi \circ \gamma : (-1, 1) \to \mathbb{R}^n$ is smooth. We define tangent vectors of $M$ at $x$ to be such equivalence classes of curves and we denote the tangent vector by $\gamma'(0)$. In a next step, we define the tangent space of $M$ at $x$, denoted by $T_xM$, to be the set of all tangent vectors at $x$ (Fig. 2). The vector space operation on $T_xM$ is given by a map $d\varphi_x : T_xM \to \mathbb{R}^n$ that we define via

\[
d\varphi_x(\gamma'(0)) := \left.\frac{d}{dt}(\varphi \circ \gamma)(t)\right|_{t=0}.
\]


One can check that these definitions do not depend on the choice of coordinate chart $\varphi : U \to \mathbb{R}^n$. It is natural to think of tangent vectors as directional derivatives, and indeed, this identification allows us to specify a basis of the tangent space for a given coordinate system. In classical multivariable calculus, for a vector $v \in \mathbb{R}^n$, one defines the directional derivative at a point $x \in \mathbb{R}^n$ by

\[
(D_v f)(x) := \left.\frac{d}{dt}\bigl[f(x + tv)\bigr]\right|_{t=0} = \sum_{i=1}^{n} v^i \frac{\partial f}{\partial x^i}(x) \qquad \forall f \in C^\infty(\mathbb{R}^n).
\]

To extend this notion to more general manifolds, we think of a tangent vector $v \in T_xM$ as the initial velocity of a curve $\gamma$, i.e., $v = \gamma'(0)$. Then we define the directional derivative along $v \in T_xM$ at $x \in M$ by

\[
D_v(f) := (f \circ \gamma)'(0) \qquad \forall f \in C^\infty(M).
\]

Using this identification, we can now construct a basis of the tangent space in the following way. For a given coordinate system $\varphi = (x^1, \dots, x^n) : U \to \mathbb{R}^n$ with $p \in U$, we define a basis $\left\{ \left.\frac{\partial}{\partial x^i}\right|_p \right\}_{i=1}^{n}$ of $T_pM$ by

\[
\forall i \in \{1, \dots, n\},\ \forall f \in C^\infty(M): \quad \left.\frac{\partial}{\partial x^i}\right|_p (f) := \bigl(\partial_i (f \circ \varphi^{-1})\bigr)(\varphi(p)).
\]

Using the coordinate system $\varphi = (x^1, \dots, x^n) : U \to \mathbb{R}^n$, we can write every tangent vector $v \in T_pM$ as a linear combination of the basis tangent vectors $\left.\frac{\partial}{\partial x^i}\right|_p \in T_pM$ in the following way:

\[
v = \sum_{i=1}^{n} v(x^i) \left.\frac{\partial}{\partial x^i}\right|_p.
\]
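The decomposition of a tangent vector into coordinate basis vectors can be tested numerically. The sketch below is a hedged illustration: the manifold (the right half of the plane), the polar chart, the curve `gamma`, the test function `f`, and the finite-difference helper `derivative` are all assumptions introduced here. It compares the action of the tangent vector on `f`, computed along the curve, with the sum of components times coordinate partial derivatives.

```python
import math


def chart(p):
    """Polar coordinates (r, theta) of a point p = (x, y) with x > 0."""
    x, y = p
    return (math.hypot(x, y), math.atan2(y, x))


def chart_inverse(q):
    r, theta = q
    return (r * math.cos(theta), r * math.sin(theta))


def gamma(t):
    """A smooth curve through p = gamma(0) = (1, 1)."""
    return (1.0 + t, 1.0 + 2.0 * t + t * t)


def f(p):
    """A smooth test function on the manifold."""
    x, y = p
    return x * x * y


def derivative(g, t, h=1e-5):
    """Central-difference approximation of g'(t)."""
    return (g(t + h) - g(t - h)) / (2.0 * h)


p = gamma(0.0)
q = chart(p)

# Action of the tangent vector on f: v(f) = (f o gamma)'(0).
lhs = derivative(lambda t: f(gamma(t)), 0.0)

# Components v(x^i) = (x^i o gamma)'(0), one for each coordinate function.
components = [derivative(lambda t, i=i: chart(gamma(t))[i], 0.0)
              for i in range(2)]


def partial(i):
    """Basis vector i acting on f: partial_i of (f o chart^{-1}) at chart(p)."""
    def along_axis(s):
        shifted = list(q)
        shifted[i] += s
        return f(chart_inverse(shifted))
    return derivative(along_axis, 0.0)


rhs = sum(components[i] * partial(i) for i in range(2))

print(lhs, rhs)
```

Both numbers approximate the exact value 4, which follows from the chain rule with velocity (1, 2) and gradient (2, 1) at the point (1, 1). Any other chart around the same point would give different components but the same sum.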

Since tangent spaces are attached to specific points $x \in M$, it is useful to look at sets of tangent vectors with exactly one tangent vector at each point on the manifold. Such vector fields are generalizations of the velocity field of a particle moving in space. A vector field $X$ attaches to every point $x \in M$ a tangent vector $X(x) \in T_xM$ in a smooth manner. More precisely, using the above decomposition of tangent vectors as linear combinations of basis vectors, we say that a vector field $X$ on $M$ is smooth if for any coordinate system $\varphi = (x^1, \dots, x^n) : U \to \mathbb{R}^n$, we have, for any point $a \in U$,

\[
X(a) = \sum_{i=1}^{n} f^i(a) \left.\frac{\partial}{\partial x^i}\right|_a
\]

for some smooth functions $f^i : U \to \mathbb{R}$. We are now finally in the position to define the metric tensor $g$ as a family of bilinear, symmetric, nondegenerate functions

\[
g_x : T_xM \times T_xM \to \mathbb{R}, \qquad x \in M,
\]

such that for every pair of vector fields $X, Y$ on $M$, the map $x \mapsto g_x(X(x), Y(x))$ defines a smooth function $M \to \mathbb{R}$. We write $g(X, Y)(p) = g_p(X(p), Y(p))$ for $p \in M$. For a given coordinate system $(x^1, \dots, x^n)$ on an open set $U \subset M$ and the corresponding basis of vector fields $X_1 = \frac{\partial}{\partial x^1}, \dots, X_n = \frac{\partial}{\partial x^n}$, the metric $g$ has components $g_{ij} = g\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right)$. These $n^2$ functions form the entries of an $n \times n$ symmetric matrix. In the context of relativity where our manifold corresponds to four-dimensional spacetime, the metric is usually denoted $g_{\mu\nu}$ following standard index notation. For the rest of this chapter, we will focus on the specific case of four-dimensional spacetime. Since the metric tensor is nondegenerate, there exists an inverse metric tensor $g^{\mu\nu}$ such that $g^{\mu\nu} g_{\nu\sigma} = \delta^\mu_\sigma$. Here, $\delta^\mu_\sigma$ denotes the Kronecker delta

\[
\delta^\mu_\sigma = \begin{cases} 0 & \text{if } \mu \neq \sigma, \\ 1 & \text{if } \mu = \sigma. \end{cases}
\]

We think of the metric as an inner product on the tangent space at each point of the manifold that varies smoothly from point to point. This construction allows us, in particular, to define local notions of angles or lengths of curves on our manifold. We will soon see how the metric tensor replaces the classical gravitational field in Einstein’s theory of gravity, thus linking differential geometry to the physics of general relativity. However, before we turn to gravity, we first formalize the idea of curvature that is intimately linked to the metric tensor. The basic idea of the construction is to express the intrinsic curvature of a manifold by assigning a curvature tensor to each point of the manifold. This tensor measures the extent to which the metric is not locally isometric to the metric of Euclidean space. The curvature tensor relies on a geometric object that connects nearby tangent spaces by relating tangent vectors in these different spaces. This object is the Christoffel connection $\Gamma^\lambda_{\mu\nu}$ defined by

$$\Gamma^\lambda_{\mu\nu} = \frac{1}{2} g^{\lambda\sigma} \left(\partial_\mu g_{\nu\sigma} + \partial_\nu g_{\sigma\mu} - \partial_\sigma g_{\mu\nu}\right).$$
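
To see the definition of the Christoffel connection at work, the sketch below evaluates it numerically with central finite differences. It assumes the unit two-sphere metric as a stand-in example (the chapter does not use it here), and all function names are hypothetical. For this metric the nonvanishing components are known analytically: $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta$.

```python
import math

def metric(x):
    """Components g_mn of the unit two-sphere in coordinates x = (theta, phi).
    Illustrative assumption, not a metric taken from the chapter."""
    theta = x[0]
    return [[1.0, 0.0],
            [0.0, math.sin(theta) ** 2]]

def inverse_diagonal(g):
    """Inverse of a diagonal metric, which is all this example needs."""
    dim = len(g)
    return [[1.0 / g[a][a] if a == b else 0.0 for b in range(dim)]
            for a in range(dim)]

def metric_derivatives(x, h=1e-5):
    """dg[s][m][n] approximates d_s g_mn by central finite differences."""
    dim = len(x)
    dg = []
    for s in range(dim):
        x_plus, x_minus = list(x), list(x)
        x_plus[s] += h
        x_minus[s] -= h
        g_plus, g_minus = metric(x_plus), metric(x_minus)
        dg.append([[(g_plus[m][n] - g_minus[m][n]) / (2.0 * h)
                    for n in range(dim)] for m in range(dim)])
    return dg

def christoffel(x):
    """Gamma[l][m][n] = 1/2 g^{ls} (d_m g_ns + d_n g_sm - d_s g_mn)."""
    dim = len(x)
    g_inv = inverse_diagonal(metric(x))
    dg = metric_derivatives(x)
    return [[[0.5 * sum(g_inv[l][s] * (dg[m][n][s] + dg[n][s][m] - dg[s][m][n])
                        for s in range(dim))
              for n in range(dim)] for m in range(dim)] for l in range(dim)]

gamma = christoffel([math.pi / 4, 0.3])
print(gamma[0][1][1])  # Gamma^theta_{phi phi}; analytically -sin(theta) cos(theta)
print(gamma[1][0][1])  # Gamma^phi_{theta phi}; analytically cot(theta)
```

Finite differences keep the example dependency-free; a computer algebra system would give the same components in closed form.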

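Using the Christoffel connection, we define the Riemann curvature tensor $R^\rho{}_{\sigma\mu\nu}$ by

$$R^\rho{}_{\sigma\mu\nu} = \partial_\mu \Gamma^\rho_{\nu\sigma} - \partial_\nu \Gamma^\rho_{\mu\sigma} + \Gamma^\rho_{\mu\lambda} \Gamma^\lambda_{\nu\sigma} - \Gamma^\rho_{\nu\lambda} \Gamma^\lambda_{\mu\sigma}.$$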
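
As a small worked example of this definition, one can evaluate a single component for the unit two-sphere. The choice of the two-sphere is an assumption made here for illustration; it does not appear in the chapter. The result is nonzero, which reflects that no coordinate system makes the metric of a sphere constant.

```latex
% Unit two-sphere with coordinates (\theta, \phi): an illustrative choice.
\begin{align*}
ds^2 &= d\theta^2 + \sin^2\theta \, d\phi^2, \\
\Gamma^\theta_{\phi\phi} &= -\sin\theta\cos\theta, \qquad
\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta
\quad \text{(all other components vanish)}, \\
R^\theta{}_{\phi\theta\phi}
  &= \partial_\theta \Gamma^\theta_{\phi\phi}
   - \partial_\phi \Gamma^\theta_{\theta\phi}
   + \Gamma^\theta_{\theta\lambda} \Gamma^\lambda_{\phi\phi}
   - \Gamma^\theta_{\phi\lambda} \Gamma^\lambda_{\theta\phi} \\
  &= \left(\sin^2\theta - \cos^2\theta\right) - 0 + 0
   - \left(-\sin\theta\cos\theta\right)\cot\theta \\
  &= \sin^2\theta .
\end{align*}
```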

The Riemann curvature tensor completely encodes the curvature of spacetime. One can show that if a coordinate system exists in which the components of the metric are constant, the Riemann tensor will vanish. Conversely, if the Riemann tensor vanishes, one can always construct a coordinate system in which the metric components are constant (Carroll, 2003). This relation between the metric and the curvature tensor shows again why the Minkowski metric $\eta_{\mu\nu}$ does indeed describe the flat spacetime of special relativity.

Let us now turn to the curved spacetime of general relativity and to Einstein’s elegant idea that the curvature of spacetime leads to gravitational phenomena. Generally, there are two parts to describe the physics of gravitational interactions: first, how the gravitational field influences the movement of matter and, second, how matter determines the gravitational field. Classically, there are two equations that correspond to each part. One equation describes the acceleration $\mathbf{a}$ of an object in a gravitational potential $\phi$,

$$\mathbf{a} = -\nabla\phi,$$

and the second equation expresses the gravitational potential in terms of the matter density $\rho$ and Newton’s gravitational constant $G$,

$$\nabla^2 \phi = 4\pi G \rho.$$

In what physicist Max Born (1968) described as “the greatest feat of human thinking about nature, the most amazing combination of philosophical penetration, physical intuition, and mathematical skill,” Einstein (1915) identified the curvature of spacetime with the gravitational potential. It is the curvature of spacetime that acts on matter; energy and matter, in turn, influence the geometry of spacetime. Einstein’s field equations describe this dynamic interplay and make up the heart of general relativity:

$$R_{\mu\nu} - \frac{1}{2} R g_{\mu\nu} = 8\pi G T_{\mu\nu}.$$

Here, the left-hand side of the equation encapsulates the curvature of spacetime, where $R_{\mu\nu} = R^\lambda{}_{\mu\lambda\nu}$ is the contraction of the Riemann curvature tensor and $R = g^{\mu\nu} R_{\mu\nu}$ is the trace of this contraction. The right-hand side of the equation describes the energy and matter content of the spacetime region. $T_{\mu\nu}$ is the stress-energy tensor that describes the density and flux of energy and momentum in spacetime. Just as the mass density is the source of the gravitational field in classical physics, the stress-energy tensor is the source of the gravitational field in general relativity. $G$ is again the gravitational constant that also appears in the classical equation of gravity.
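
A short worked step, not spelled out in the text, shows what the field equations imply in regions without matter. Contracting both sides with the inverse metric, and using that $g^{\mu\nu} g_{\mu\nu} = 4$ in four dimensions, gives the trace-reversed form below; the symbol $T = g^{\mu\nu} T_{\mu\nu}$ is introduced here only for this derivation.

```latex
\begin{align*}
g^{\mu\nu}\left(R_{\mu\nu} - \tfrac{1}{2} R g_{\mu\nu}\right)
  &= R - 2R = -R = 8\pi G \, T
  && \text{with } T = g^{\mu\nu} T_{\mu\nu}, \\
R_{\mu\nu}
  &= 8\pi G \left(T_{\mu\nu} - \tfrac{1}{2} T g_{\mu\nu}\right)
  && \text{(substituting } R = -8\pi G \, T\text{)}, \\
T_{\mu\nu} = 0
  \;&\Longrightarrow\; R_{\mu\nu} = 0
  && \text{(vacuum field equations)}.
\end{align*}
```

In vacuum, the contraction $R_{\mu\nu}$ must therefore vanish, while the full Riemann tensor may still be nonzero.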

Black Holes and Numerical Relativity

One of the most exotic and spectacular consequences of general relativity is the existence of black holes (Guidry, 2019). Black holes are regions in spacetime that become so curved that they trap light. The mathematical description of a black hole is surprisingly simple and given by the Schwarzschild metric, the unique spherically symmetric vacuum solution of Einstein’s field equations. In spherical coordinates $\{t, r, \theta, \phi\}$, the Schwarzschild solution reads

$$ds^2 = -\left(1 - \frac{2GM}{r}\right) dt^2 + \left(1 - \frac{2GM}{r}\right)^{-1} dr^2 + r^2 \, d\Omega^2, \tag{5}$$

where $d\Omega^2$ is the metric on a unit two-sphere, $d\Omega^2 = d\theta^2 + \sin^2(\theta) \, d\phi^2$. The Schwarzschild metric describes empty spacetime around a gravitating object such as the Sun or the Earth. In this context, the constant $M$ is interpreted as the mass of this object. The most interesting property of the Schwarzschild solution is that the line element in equation (5) contains two singularities at $r = 0$ and $r = 2GM$, that is, two values of $r$ at which the metric components are not defined. The quantity $R_S = 2GM$ is called the Schwarzschild radius, and it plays a central role in the description of the spacetime geometry around a black hole (Guidry, 2019). The sphere at the Schwarzschild radius is called the event horizon: at this radius, spacetime curvature is so strong that the escape velocity reaches the velocity of light. Since light is trapped inside this radius, the region interior to $R_S$ is called a black hole. While $r = R_S$ is a coordinate singularity that can be removed by choosing different coordinates, $r = 0$ is a physical singularity whose meaning is hard to interpret both mathematically and physically.

By their very nature, black holes are difficult to visualize and pose severe challenges to our ability to find representations of these extreme spacetimes. Moreover, black hole phenomena that are astrophysically realistic such as the spiraling and merging of two black holes are only accessible through numerical simulations. In these scenarios, the exact form of spacetime is no longer known, and all analytical methods break down (Varma, 2019). Studying black holes is one of the main goals of numerical relativity, which is the art and science of developing computer algorithms to solve Einstein’s equations for astrophysically realistic systems (Baumgarte and Shapiro, 2010). In fact, much insight into the physics of space and time has only been gained in the age of numerical computing.
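
To get a feeling for the scale of the Schwarzschild radius introduced above, the following sketch restores the factor of $c^2$ that is hidden by the convention $c = 1$, so that $R_S = 2GM/c^2$ in SI units. The rounded constants and the function name `schwarzschild_radius` are assumptions made for this illustration.

```python
# Rounded physical constants in SI units (assumed values for illustration).
G = 6.674e-11        # gravitational constant in m^3 kg^-1 s^-2
C = 2.998e8          # speed of light in m/s
M_SUN = 1.989e30     # mass of the Sun in kg
M_EARTH = 5.972e24   # mass of the Earth in kg

def schwarzschild_radius(mass_kg):
    """R_S = 2 G M / c^2; with c = 1 this reduces to R_S = 2GM as in the text."""
    return 2.0 * G * mass_kg / C ** 2

rs_sun = schwarzschild_radius(M_SUN)
rs_earth = schwarzschild_radius(M_EARTH)
print(f"Sun:   {rs_sun / 1e3:.2f} km")
print(f"Earth: {rs_earth * 1e3:.2f} mm")
```

The results, roughly 3 km for the Sun and 9 mm for the Earth, lie far inside both bodies. The vacuum solution only applies outside the matter, which is why neither object has an event horizon.
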
For most of the twentieth century, computer technology was not advanced enough to support numerical solutions to Einstein’s equations. By and large, progress in numerical relativity was impeded by lack of computers with sufficient memory and computational power to perform well-resolved calculations of realistic spacetime scenarios (Anninos et al., 1995). Not only are the equations difficult to solve because they are multidimensional, nonlinear, coupled partial differential equations in space and time, but they present additional complications due to the existence of singularities. Even today, numerical-relativity simulations of merging black holes might take months of computational time on powerful supercomputers (Varma et al., 2019).

Historically, first steps toward numerical approaches were taken by Arnowitt et al. (1962), who proposed a decomposition of spacetime back into separated space and time parts. Such a decomposition in the form of sequences of three-dimensional space-like hypersurfaces allows reformulating Einstein’s field equations as an initial value problem suitable for numerical solutions. However, computer technology at that time was still in its infancy, and it was only in the 1980s that Stark and Piran (1985) attempted the first realistic calculations of a rotating black hole. In the following three decades, computers became more powerful, and researchers developed new computational techniques to avoid the interior spacetime singularities in the course of their simulations (Baumgarte and Shapiro, 2010). For example, the so-called excision technique does not evolve portions of spacetime inside the event horizon surrounding the singularity of a black hole; rather, one only solves the equations outside of the event horizon (Alcubierre and Brügmann, 2001). Alternatively, the puncture method factors the solution of Einstein’s equations into an analytical part that contains the black hole singularity and a numerically constructed part which is free of singularities (Brandt and Brügmann, 1997). The breakthrough in numerical relativity came in 2005 when Pretorius (2005) achieved the first successful numerical relativity simulation of the merging of two black holes. Together with an improved version of the puncture method that allowed punctures to move through the coordinate system, which followed shortly afterward, accurate long-term evolutions of two black holes orbiting each other became possible for the first time.
Today, numerical relativity and visualizing the results of such relativistic simulations are fields which still offer many challenging and exciting problems for researchers (Ruder et al., 2008; Varma et al., 2019). Often, sophisticated visualizations of binary black holes serve as a reality check of numerical simulations. For example, field lines of spacetime curvature elucidate the nonlinear dynamics of curved spacetime in merging black-hole binaries (Owen et al., 2011). Nowadays, the use of computers and methods of computer graphics has greatly increased the potential and scope of visualization of spacetime, not only for the purpose of disseminating science but also as a tool for researchers to develop an intuitive understanding of their results (Ruder et al., 2008; Varma et al., 2019).

Revealing Spacetime Through Technology

No man can visualize four dimensions, except mathematically. We cannot even visualize three dimensions. I think in four dimensions, but only abstractly. The human mind can picture these dimensions no more than it can envisage electricity. Nevertheless, they are no less real than electromagnetism, the force which controls our universe, within, and by which we have our being. (Einstein 1929)

Technology has always provided crucial perspectives on how we navigate, experience, and understand the world around us. Thus, it seems natural to take technology as the lens through which we view possibilities of visualizations in the domain of relativity. The point of entrée into our explorations of relativistic visualizations is the observation that technology, at its heart, is a way of seeing and understanding the world. Indeed, according to Heidegger (1977), technology must be understood as “a way of revealing.” In line with Heidegger’s famous analysis of technology, we understand technology as a particular approach of orienting ourselves to the world so that reality is brought forth through the act of revealing. This perspective on technology aligns well with the task of visualizing four-dimensional spacetime to reveal the basic structure of our cosmos. In the regimes of everyday life, the non-Euclidean character of our world does not become apparent. Visualizations can help us reeducate our intuitions with respect to a world that is best described by a four-dimensional curved manifold (Chandler, 1994).

More generally, visualizations can change the way scientists think about the world. In contrast to the common belief that visualizing and analyzing scientific phenomena are separate tasks, these two go hand in hand when scientists explore abstract concepts and phenomena (Goodman, 2012). Often, emerging technologies such as skills, techniques, and knowledge create new access to known concepts, allow for deeper insights, and facilitate physical intuition. By adopting a broad definition of technology as a way of revealing, experiencing, and understanding reality, we explore how century-old ideas of space and time have interacted with evolving tools and technical expertise in relativity. For the remainder of this chapter, we explore some of these attempts with a particular focus on how different visualizations revealed new insights into the structure of space and time.

Imagination and Artistry

Imagination lies at the heart of science and art because it involves interacting with situations that are different from the present reality. Before the age of computers, scientists had to rely on imagination, artistry, and analogies to visualize space and time. Doing so, they took on a technological approach toward understanding four dimensions because they facilitated their ways of seeing by entering into different relations with the abstract concept of spacetime. Orienting themselves to particular spacetime features, scientists, mathematicians, and artists were able to reveal spacetime in new ways by relying on their imagination.

It takes imaginative efforts to explore the physical implications of a mathematical structure that asks us to let go of absolute space and universal time, concepts that seem integral to our understanding of the world (Woodhouse, 2014). Imagining involves efforts to bring forth the unexperienced, immaterial, or non-present by bringing together disparate aspects of the object of imagination into a perceivable whole (Steier and Kersting, 2019). Literature served as an excellent medium to bring together our everyday experience of motion and the existence of a universal speed limit. Relativistic effects become noticeable when one approaches velocities close to the speed of light. An early literary example that explores such relativistic movement to imagine the strange nature of Minkowski space is George Gamow’s popular scientific book “Mr Tompkins in Wonderland” (Gamow, 1940). In this book, the title character Mr Tompkins dreams of an alternative world where the speed of light is set to a lower numerical value. In this dream world, the relativistic concepts of space, time, and motion are accessible to our ordinary senses:

A single cyclist was coming slowly down the street and, as he approached, Mr Tompkins’s eyes opened wide with astonishment. For the bicycle and the young man on it were unbelievably shortened in the direction of the motion, as if seen through a cylindrical lens. The clock on the tower struck five, and the cyclist, evidently in a hurry, stepped harder on the pedals. Mr Tompkins did not notice that he gained much in speed, but, as the result of his effort, he shortened still more and went down the street looking exactly like a picture cut out of cardboard.

Gamow chose a literary approach to explore how an observer perceives relativistic length contraction, the phenomenon that a moving object’s length is measured to be shorter than its length as measured in the object’s own rest frame. Even though Gamow failed to recognize the difference between measuring and visually perceiving relativistic phenomena (Kraus, 2008), he was an early popularizer of the idea that the division of spacetime into space and time is a choice we make for our own purposes, not something intrinsic to the world.

Visualizations of black holes serve as another example of the fruitful interplay between artistic, imaginative, and mathematical explorations of spacetime. As one of the flashier fruits of Einstein’s century-old insights, black holes are deeply embedded in the popular imagination and show up regularly in movies, books, and TV shows (Cole, 2019). Presumably no other scientific concept has fueled the imagination of scientists and artists in the same way because black holes push the known laws of physics to their limits (Fig. 3). According to astrophysicist Chandrasekhar (1992), “the black holes of nature are the most perfect macroscopic objects there are in the universe: the only elements in their construction are our concepts of space and time.” Naturally, artists imagined possibilities of black hole physics long before supercomputers were able to simulate spacetime around black holes. Figures 4 and 5 show artistic illustrations of black holes that reveal several features of our concepts of space and time around these exotic objects. Figure 4 shows an artistic illustration of two spiraling black holes that orbit one another in a plane. These two black holes have different orientations relative to the overall orbital motion of the system. In the artist’s conception, swirling gas around the black holes helps visualize the spiraling motion of the merging process. Figure 5 also uses swirling gas, in the form of an accretion disk, to visualize a supermassive black hole at the core of a young, star-rich galaxy. This illustration also shows how supermassive black holes can distort space around them in a phenomenon called gravitational lensing. We explore visualizations of binary systems and gravitational lensing in more detail in the next section.

Fig. 3 According to physicist John Wheeler (1998), “[the black hole] teaches us that space can be crumpled like a piece of paper into an infinitesimal dot, that time can be extinguished like a blown-out flame, and that the laws of physics that we regard as ‘sacred,’ as immutable, are anything but.” In this visualization, a digital artist has imagined a scene that pushes the known laws of physics to their limit: a black hole is about to swallow a neutron star creating a trail of swirling gas during the process. Credit: Carl Knox/OzGrav, https://www.ozgrav.org

Analogies and Metaphors

Closely related to imagination and artistry is the generation of analogies and metaphors that play a central role in scientific practice, thought, and creativity (Kapon and DiSessa, 2012). By constructing similarities between two objects, analogies and metaphors enable new ways of revealing aspects of the world. Not only does science produce these figures of thought for subsequent development in the arts; the technological importance of analogies and metaphors also lies in their ability to serve as frames through which we perceive and make meaning of the world (Hesse, 1953).

The rubber sheet analogy is one of the most prevailing popular visualizations of spacetime and presumably also one of the oldest representations in the context of general relativity. According to Einstein’s field equations, spacetime tells matter how to move, and matter tells spacetime how to curve (Wheeler, 1998). The rubber sheet analogy captures this basic mechanism of gravitational attraction by comparing the four-dimensional fabric of spacetime to a two-dimensional stretched rubber sheet. The dynamic interplay between the movement of massive objects and the curvature of spacetime is illustrated by heavy objects placed on the rubber sheet that warp the sheet and that influence the movement of other objects on the sheet (Fig. 6).

Fig. 4 An artist’s conception shows two merging black holes which will ultimately spiral together into one larger black hole. In this illustration, the black holes are spinning in a nonaligned fashion, which means they have different orientations relative to the overall orbital motion of the pair. (Credit: Courtesy LIGO/Caltech/MIT/Sonoma State (Aurore Simonnet))

Even though Einstein did not explicitly use the rubber sheet analogy to explain his field equations, he did use the analogy of a soft cloth to illustrate boundary problems in his gravitational theory. In a letter to his colleague Willem de Sitter, he wrote in 1917: “Our problem can be illustrated with a nice analogy. I compare the space to a cloth floating (at rest) in the air, a certain part of which we can observe. This part is slightly curved similarly to a small section of a sphere’s surface.” (Hentschel, 1998)

Although the rubber sheet analogy is simple, it exhibits great explanatory power by revealing the dynamic interplay between gravity, space, and time. Illustrating the geometric and universal nature of gravity, the analogy can visualize orbital motions, curved space, and photon trajectories in intuitive ways (Kersting and Steier, 2018). At the same time, the explanatory scope of the analogy is limited because it simplifies four-dimensional spacetime to a two-dimensional spatial fabric. The analogy suggests that space curves into an unseen dimension while neglecting the role of warped time altogether. Since the human mind cannot visualize four dimensions and much less curvature of a four-dimensional entity, analogies will always have limitations in their ability to reveal the nature of spacetime (Fig. 7).

Fig. 5 This artist’s conception illustrates a supermassive black hole (central black dot) at the center of a young galaxy. As the drawing shows, gas swirls around a black hole in what is called an accretion disk. Space and light around the black hole are distorted which is illustrated by the warped stars behind the black hole. (Credit: Courtesy NASA/JPL-Caltech)

Spacetime Diagrams

In the early days of relativity, imagination and analogies served as one route toward finding useful visualizations of four-dimensional spacetime. Another route employed two-dimensional representations in the form of spacetime diagrams. As one of the earliest scientific visualizations of Minkowski space, spacetime diagrams revealed the causal structure of our four-dimensional world. To this day, spacetime diagrams are a routine way of introducing students to the physics of space and time because they are useful tools to reason with relativistic kinematics.

Historically, Minkowski had to persuade physicists of the value of his spacetime approach since his treatment of spacetime was couched in the abstract language of four-dimensional vector calculus (Walter, 2014). To provide visual aids, Minkowski employed spacetime diagrams as geometrical interpretations that offered diagrammatic readings of the Lorentz transformations and illustrated the light-cone structure of spacetime. The underlying idea is to suppress two spatial dimensions and only consider the time axis $t$ and one spatial axis $x$ at right angles (Fig. 8). The starting point for the development of special relativity was the observation that the speed of light is finite.

Fig. 6 The rubber sheet analogy is a two-dimensional representation that visualizes how our Sun and Earth warp spacetime. The green grid illustrates that spacetime is a malleable fabric on which massive objects move along paths that are determined by the geometry of spacetime. (Credit: LIGO/T. Pyle)

Fig. 7 The rubber sheet analogy is a ubiquitous visualization of spacetime but it has a limited explanatory scope. (Credit: https://xkcd.com/895/, Creative Commons Attribution-NonCommercial 2.5 License)

This very same observation gives guidance when constructing spacetime diagrams. The paths that correspond to travel at the speed of light $c = 1$ are given by $x = \pm t$. If we imagine adding one more spatial coordinate, these two diagonal lines will form a cone (Fig. 9). This light cone describes the set of points that are connected to a single spacetime event by straight lines at 45° angles.

Fig. 8 A spacetime diagram is a two-dimensional representation of four-dimensional spacetime that offers illuminating visualizations of Lorentz transformations that relate the $(t, x)$-coordinates to the $(t', x')$-coordinates. In accordance with the light postulate of special relativity, light cones remain unchanged under these transformations.

Fig. 9 Light cones in spacetime diagrams visualize the causal structure of Minkowski space by dividing the set of spacetime events into timelike, lightlike, and spacelike separated points

Light cones naturally visualize causality and the physics of simultaneity because they are divided into future and past. The set of all points inside the future and past light cones of a given spacetime event are timelike separated from this event, those outside the light cones are spacelike separated, and those on the cones are lightlike separated. Using the definition of the line element in equation (1), we see that the interval between timelike separated points is negative, between spacelike separated points positive, and between null separated points zero.

Spacetime diagrams offer illuminating visualizations of Lorentz transformations. If we return to the previously introduced Lorentz transformation in equation (4) that describes a boost in the $x$-$t$-plane, we see that the transformed coordinates $t'$ and $x'$ are given by

$$t' = t \cosh\phi - x \sinh\phi, \qquad x' = -t \sinh\phi + x \cosh\phi.$$

Therefore, the point defined by $x' = 0$ moves with the velocity

$$v = \frac{x}{t} = \tanh\phi.$$

If we replace $\phi = \tanh^{-1} v$ and set $\gamma = \frac{1}{\sqrt{1 - v^2}}$, we recover the familiar form of Lorentz transformations that are conventionally used to derive length contraction and time dilation:

$$t' = \gamma (t - vx), \qquad x' = \gamma (x - vt).$$

To see how this Lorentz transformation rotates the space and time axes into each other, we just have to look at the new $x'$ axis, which is defined by $t' = 0$ and hence given by $t = x \tanh(\phi)$. Similarly, the $t'$ axis, defined by $x' = 0$, is given by

$$t = \frac{x}{\tanh(\phi)}.$$
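
The equivalence of the rapidity form and the $\gamma$ form of the boost, as well as the position of the transformed axes, can be verified with a few lines of code. This is a minimal sketch in units with $c = 1$; the function names `boost_rapidity` and `boost_velocity` are hypothetical.

```python
import math

def boost_rapidity(t, x, phi):
    """Boost written with the rapidity phi: hyperbolic rotation in the x-t-plane."""
    return (t * math.cosh(phi) - x * math.sinh(phi),
            -t * math.sinh(phi) + x * math.cosh(phi))

def boost_velocity(t, x, v):
    """The same boost written with the velocity v = tanh(phi), in units with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (t - v * x), gamma * (x - v * t))

v = 0.6
phi = math.atanh(v)

# Both forms describe the same transformation.
event = (2.0, 1.0)
print(boost_rapidity(*event, phi))
print(boost_velocity(*event, v))

# A point on the line t = x tanh(phi) lies on the x' axis, so its t' vanishes.
t_prime_on_x_axis, _ = boost_velocity(v * 3.0, 3.0, v)
# A point on the line t = x / tanh(phi) lies on the t' axis, so its x' vanishes.
_, x_prime_on_t_axis = boost_velocity(3.0, v * 3.0, v)
print(t_prime_on_x_axis, x_prime_on_t_axis)
```

Because $\tanh\phi < 1$, the two new axes are tilted by the same angle toward the diagonal $x = t$, one from below and one from above.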

Thus, contrary to our Euclidean intuition, the transformed space and time axes scissor together; it is impossible to say in an observer-independent way whether a point that is spacelike separated from a spacetime event is in the future, past, or simultaneous to that event (Fig. 8). Although different observers can clearly distinguish between space and time, the distinction drawn by one observer is not the same as that drawn by another. What one observer measures as “time,” another one might measure partly in space and partly in time (Durell, 1926).

Attempting to visualize four dimensions in special relativity amounts to finding representations of a geometry that is governed by the Lorentz transformations. The basic postulate of special relativity that the speed of light is constant finds a prominent expression in the fact that Lorentz transformations leave the paths $x = \pm t$ invariant; the paths defined by $x' = \pm t'$ are precisely the same as those defined by $x = \pm t$. Thus, spacetime diagrams are an early example of the usefulness of visual-geometric thinking in special relativity and present a powerful tool to reveal the structure of four-dimensional spacetime.
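
The causal classification and its behavior under boosts can be made concrete in a short sketch. It assumes the two-dimensional interval $s^2 = -t^2 + x^2$ relative to the origin, in line with the sign convention stated above (timelike negative, spacelike positive), and units with $c = 1$. The function names are hypothetical.

```python
import math

def interval(t, x):
    """Squared interval between the origin and the event (t, x), signature (-, +)."""
    return -t * t + x * x

def classify(t, x, tol=1e-12):
    """Causal relation between the origin and the event (t, x)."""
    s2 = interval(t, x)
    if abs(s2) <= tol:
        return "lightlike"
    return "timelike" if s2 < 0 else "spacelike"

def boost(t, x, v):
    """Lorentz boost with velocity v along x, in units with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (t - v * x), gamma * (x - v * t))

# The interval, and with it the classification, is the same for all observers.
for event in [(2.0, 1.0), (1.0, 2.0), (1.0, 1.0)]:
    print(event, classify(*event), classify(*boost(*event, 0.6)))

# An event on the light cone x = t stays on the light cone x' = t'.
t_light, x_light = boost(1.0, 1.0, 0.6)

# For a spacelike separated event, the sign of t' depends on the observer.
spacelike_event = (0.5, 2.0)
t_later, _ = boost(*spacelike_event, -0.6)   # event lies in this observer's future
t_same, _ = boost(*spacelike_event, 0.25)    # event is simultaneous with the origin
t_earlier, _ = boost(*spacelike_event, 0.6)  # event lies in this observer's past
print(t_later, t_same, t_earlier)
```

Only for spacelike separation can a boost change the time ordering; for timelike and lightlike separation, the sign of $t'$ is the same for every observer.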

Relativistic Ray Tracing and First-Person Visualizations

There is a subtle but important difference between attempts to visualize spacetime from an exocentric and an egocentric perspective. Spacetime diagrams are exocentric visualizations because they place the observer outside of the concept that is to be visualized. A more intuitive approach to visualizing four dimensions adopts an egocentric perspective that corresponds to a first-person visualization (Kraus, 2008; Weiskopf et al., 2006). The key idea of egocentric visualizations is to translate the geometry of spacetime into an experienceable relativistic scenario similar to the dream worlds of Mr Tompkins. First-person visualizations of relativistic movement often reveal counterintuitive features of our four-dimensional world that can be perplexing even to experts in the field (Kraus, 2008).

Placing the observer directly into the scene, a key technique to visualize relativistic phenomena is relativistic ray tracing. Based on light propagation through spacetime, the basic idea is to follow light rays that travel backward in time from an observer to the spacetime scene in question. These light rays generate a map that produces an image of objects seen in the scene (James et al., 2015). In the language of general relativity, each light ray follows a geodesic curve through spacetime. Geodesic curves are spacetime generalizations of straight paths. A parameterized curve $x^\mu(\lambda)$ is a geodesic if

$$\frac{d^2 x^\mu}{d\lambda^2} + \Gamma^\mu_{\rho\sigma} \frac{dx^\rho}{d\lambda} \frac{dx^\sigma}{d\lambda} = 0,$$

where $\Gamma^\mu_{\rho\sigma}$ is the Christoffel connection that we defined in the previous section.

One of the most famous examples of relativistic ray tracing is the visualization of a wormhole in the Hollywood blockbuster Interstellar that built science into its very fabric (James et al., 2015). Interstellar was the first movie to correctly depict a wormhole as it would be seen by a nearby observer. To visualize the appearance of objects under the influence of a strong gravitational field, the simulation had to take into account the bending of light through curved spacetime around the wormhole. The team of physicists behind these visualizations wrote an instructional paper in which they explained the main steps of the ray tracing process including its numerical implementation (James et al., 2015). Interstellar provides us with yet another example of how artistic approaches toward exploring spacetime phenomena, in this case film-making, inspired scientists to push the boundaries of our physical knowledge.

Fig. 10 Cubes are set up in a row (bottom). A second row of cubes moves along the first row at 90% of the speed of light (top, motion is from left to right). All cubes, whether moving or at rest, have the same orientation: the face with the “3” is in front, the “4” is on the rear side. The fact that we can see the rear sides of the moving cubes is a consequence of the finite light travel time. (Credit: Ute Kraus, Institute of Physics, Universität Hildesheim, Space Time Travel (https://www.spacetimetravel.org/), Attribution-ShareAlike 2.0 Germany (CC BY-SA 2.0 DE))

Numerical first-person visualizations also allow us to revisit the strange distortions that Mr Tompkins observed when a fast-moving object approached him in his dream world (Galison, 1979). Visual observations arise from the photons that simultaneously reach the eye of an observer. In everyday life, we assume that such photons have left the observed object simultaneously as well. However, this assumption no longer holds at relativistic speeds. If the relative velocity between observer and object is comparable to the speed of light, one has to take into account the finite light travel time that leads to distortions in the direction of light (Woodhouse, 2014).

Kraus (2008) explored the visual distortions of moving objects in great detail. As an illustrative example, she chose a row of cubes all facing toward an observer with the number “3.” A second row of such cubes moves toward the observer at 90% of the speed of light as illustrated in Fig. 10. Quite contrary to our everyday experience, the simulation reveals that the apparent shape of the cubes has changed and that we can see the rear sides of the cubes as well. In this case, it is not the relativistic length contraction but the finite travel time of light signals that explains the surprising visibility of the rear side of the cubes. Photons from the rear are able to reach the observer because the cubes move fast enough “out of the way” of these photons. The team around Kraus (2008) has created many visualizations of relativistic phenomena that are accessible at https://www.spacetimetravel.org. A short movie of the moving cubes is available at the website: https://www.spacetimetravel.org/tompkins/tompkins.html.
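
Returning to the geodesic equation of this section, the following sketch traces a single light ray past a spherical mass. It is a strongly simplified stand-in for a ray tracer, under several assumptions that are not taken from the chapter: the spacetime is Schwarzschild rather than a wormhole, the ray stays in the equatorial plane, and the geodesic equation has been reduced to the standard orbit equation $d^2u/d\phi^2 + u = 3Mu^2$ for $u = 1/r$ in units with $G = c = 1$. The function names and the rounded solar constants are likewise hypothetical.

```python
import math

def escape_angle(r0, mass=1.0, dphi=1e-3):
    """Integrate d2u/dphi2 = -u + 3*mass*u**2 (u = 1/r, units G = c = 1) with RK4,
    starting at the point of closest approach r0, and return the angle phi at
    which the ray reaches infinity (u = 0)."""
    if r0 <= 3.0 * mass:
        raise ValueError("r0 must exceed 3*mass, otherwise the photon does not escape")

    def rhs(u, w):
        # First-order system: du/dphi = w, dw/dphi = -u + 3*mass*u^2.
        return w, -u + 3.0 * mass * u * u

    u, w, phi = 1.0 / r0, 0.0, 0.0
    while True:
        k1u, k1w = rhs(u, w)
        k2u, k2w = rhs(u + 0.5 * dphi * k1u, w + 0.5 * dphi * k1w)
        k3u, k3w = rhs(u + 0.5 * dphi * k2u, w + 0.5 * dphi * k2w)
        k4u, k4w = rhs(u + dphi * k3u, w + dphi * k3w)
        u_new = u + dphi * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
        w_new = w + dphi * (k1w + 2.0 * k2w + 2.0 * k3w + k4w) / 6.0
        if u_new <= 0.0:
            # Linear interpolation to the zero crossing of u.
            return phi + dphi * u / (u - u_new)
        u, w, phi = u_new, w_new, phi + dphi

def deflection_angle(r0, mass=1.0):
    """Total bending of the ray; an undeflected straight line would give zero."""
    return 2.0 * escape_angle(r0, mass) - math.pi

# Weak-field check: the result should be close to 4*mass/r0.
alpha_weak = deflection_angle(1000.0)

# A ray grazing the Sun, using rounded constants (assumed values).
G, C, M_SUN, R_SUN = 6.674e-11, 2.998e8, 1.989e30, 6.957e8
mass_in_meters = G * M_SUN / C ** 2
alpha_sun_arcsec = math.degrees(deflection_angle(R_SUN / mass_in_meters)) * 3600.0
print(alpha_weak, alpha_sun_arcsec)
```

For the Sun, the sketch yields a deflection of about 1.75 arcseconds. A full ray tracer repeats such an integration for every pixel of the observer's image and in a general metric, which is where the Christoffel connection enters explicitly.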

Fig. 11 This picture is a negative of the 1919 solar eclipse taken from the report of Sir Arthur Eddington on the expedition to verify Einstein’s prediction of the bending of light around the Sun. (Credit: F. W. Dyson, A. S. Eddington, and C. Davidson [Public domain])

Computer-generated first-person visualizations provide insights into the structure of spacetime that are complementary to exocentric visualizations such as Minkowski diagrams. Even though both types of visualizations share the common assumption that the speed of light is finite, they reveal quite different consequences: Minkowski diagrams reveal the causal structure of spacetime and illustrate Lorentz transformations; numerical first-person visualizations reveal the distortions that a relativistic observer would perceive.
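
The role of the finite light travel time in first-person visualizations can be illustrated with a small calculation in flat spacetime. The sketch assumes an observer at the origin and a point that moves along $x(t) = x_0 + vt$ on a line at lateral distance $d$, all in the observer's frame and in units with $c = 1$. It solves for the emission time of the photon that arrives at the observer at $t = 0$. The setup and the function names are hypothetical and only loosely inspired by the moving cubes.

```python
import math

def emission_time(x0, v, d):
    """Time t_e < 0 at which a point moving as x(t) = x0 + v*t, at lateral
    distance d from an observer at the origin, must emit a photon that
    arrives at the observer at t = 0 (units with c = 1).
    Solves sqrt(x(t_e)**2 + d**2) = -t_e for the root in the past."""
    return (x0 * v - math.sqrt(x0 * x0 + (1.0 - v * v) * d * d)) / (1.0 - v * v)

def apparent_position(x0, v, d):
    """Position at which the observer sees the point at t = 0."""
    return x0 + v * emission_time(x0, v, d)

v, d = 0.9, 1.0
rear_now, front_now = -0.5, 0.5  # positions of the two ends of a rod at t = 0

t_rear = emission_time(rear_now, v, d)
t_front = emission_time(front_now, v, d)
x_rear = apparent_position(rear_now, v, d)
x_front = apparent_position(front_now, v, d)

print("emission times:", t_rear, t_front)
print("length at t = 0:", front_now - rear_now)
print("apparent length:", x_front - x_rear)
```

The photons that arrive together left the two ends of the rod at very different times, so the image shows the rod stretched far beyond its length at $t = 0$. This is the distinction between measuring and seeing that the text attributes to Kraus (2008).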

Gravitational Lensing and Astrophysical Observations

While relativistic ray tracing provides a numerical procedure to visualize spacetime based on the propagation of light, the bending of light in strong gravitational fields offers real-world visualizations of curved spacetime as well. Telescopes and astrophysical observations reveal intriguing manifestations of general relativity. In fact, the light deflection of distant stars around the Sun that Arthur Eddington measured during a total eclipse in 1919 was the first experimental evidence for general relativity (Fig. 11). Gravitational lensing, the bending of light due to spacetime curvature around massive objects such as our Sun, black holes, or distant galaxies, provides visualizations of the curved geometry of our universe. Light that comes from a distant light source gets deflected around a massive object before getting refocused again (Fig. 12). Today, astronomers use gravitational lensing to obtain information about the gravitational lens or to reconstruct properties of the background objects. Figure 13 shows an image of an Einstein ring that was taken with the Hubble Space Telescope’s Wide Field Camera 3 based on data from the Sloan Digital Sky Survey. Einstein rings are lensing phenomena that occur when the source, the lens, and the observer are aligned in such a way that the distorted light takes the shape of a ring.

Fig. 12 This illustration of gravitational lensing shows how light from a distant source bends around a massive object. The light rays (the gray arrows) from the distant galaxy are bent when passing a large gathering of mass, such as the galaxy cluster symbolized by the ball with blue glow in the center. When the light finally arrives at the Earth, telescopes observe it as coming from a slightly different direction (the red arrow). One might say that the cluster has acted like a giant magnifying glass, or gravitational lens, in space, focusing, magnifying, and distorting the images of the galaxy. (Credit: NASA/the Space Telescope Science Institute (STScI))

In 2019, the Event Horizon Telescope Collaboration revealed the first-ever picture of the shadow of a supermassive black hole (Fig. 14). When surrounded by a transparent emission region, black holes are expected to reveal a dark shadow caused by gravitational light bending and photon capture at the event horizon (Akiyama et al., 2019). Up to this point, only artists and numerical simulations had produced visualizations of gravity in its most extreme limit. Interestingly enough, astronomers compared their images to an extensive library of ray-traced general-relativistic simulations of black holes. Finding that the observed image was

2028

M. Kersting

Fig. 13 Pictured above is the image of a galaxy that is being magnified by the gravity of a massive cluster of galaxies situated in front of it. This phenomenon is called gravitational lensing. Here, the lens alignment is so precise that the background galaxy is distorted into a horseshoe – a nearly complete Einstein ring. (Credit: NASA, ESA, J. Richard (Center for Astronomical Research/Observatory of Lyon, France), and J.-P. Kneib (Astrophysical Laboratory of Marseille, France))

Fig. 14 The Event Horizon Telescope, a planet-scale array of eight ground-based radio telescopes, captured the first direct visual evidence of the supermassive black hole and its shadow. In this image taken on 11 April 2017, the shadow of a black hole is the closest we can come to an image of the black hole itself, a completely dark object from which light cannot escape. (Credit: Event Horizon Telescope [CC BY 4.0 (https://creativecommons.org/licenses/by/4.0)])

78 Visualizing Four Dimensions in Special and General Relativity

2029

Fig. 15 Neutron stars are the remnants of dead stars that were not heavy enough to collapse into black holes. This visualization depicts a binary system of a black hole and a neutron star and the gravitational waves that ripple outward as the two objects spiral toward each other. (Credit: Mark Myers/Ozgrav, https://www.ozgrav.org)

consistent with expectations for the shadow of a rotating black hole as predicted by general relativity, present-day astronomy shows how closely visualizations, numerical simulations, and the mathematics of four-dimensional spacetime interact to produce new knowledge of our cosmos.
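The size of the light deflection measured during the 1919 eclipse can be estimated with the weak-field formula of general relativity, alpha = 4GM/(c^2 b), where b is the closest distance of the light ray to the lens. The sketch below evaluates this formula for a ray grazing the Sun and also gives the angular Einstein radius of a point-mass lens. The rounded physical constants and the function names are assumptions chosen for illustration; the point-mass model is a simplification that does not describe extended lenses such as galaxy clusters.

```python
import math

# Rounded physical constants in SI units (illustrative precision only).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
R_SUN = 6.957e8      # solar radius, m
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0


def deflection_angle(mass_kg: float, impact_parameter_m: float) -> float:
    """Weak-field deflection angle in radians: alpha = 4 G M / (c^2 b)."""
    return 4.0 * G * mass_kg / (C ** 2 * impact_parameter_m)


def einstein_radius(mass_kg: float, d_lens_m: float, d_source_m: float,
                    d_lens_source_m: float) -> float:
    """Angular Einstein radius (radians) of a point-mass lens.

    theta_E = sqrt(4 G M / c^2 * D_LS / (D_L * D_S)). The three distances are
    passed separately because, on cosmological scales, D_LS is in general not
    simply D_S - D_L.
    """
    return math.sqrt(4.0 * G * mass_kg / C ** 2
                     * d_lens_source_m / (d_lens_m * d_source_m))


if __name__ == "__main__":
    grazing = deflection_angle(M_SUN, R_SUN) * RAD_TO_ARCSEC
    print(f"Deflection of starlight grazing the Sun: {grazing:.2f} arcseconds")
```

For a ray grazing the solar limb the result is roughly 1.75 arcseconds, which illustrates how small the effect is that had to be extracted from the eclipse photographs.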

Numerical Simulations of Gravitational Waves

General relativity predicts fluctuations in the metric of spacetime that can propagate at the speed of light as gravitational waves (Guidry, 2019). Gravitational waves are “ripples” in spacetime that are produced by some of the most violent events in the cosmos, such as the collision of binary black holes and neutron stars (Figs. 15 and 16). With the first direct observation of gravitational waves and the subsequent birth of gravitational wave astronomy (Abbott et al., 2016), numerical relativity and visualizations of numerical simulations have gained much impetus in contemporary research. These explorations promise unique insights into the nature of spacetime and allow testing general relativity in the highly dynamic and nonlinear strong-field regime that has been largely unexplored until very recently (Varma et al., 2019). Testing spacetime physics in the strong-gravity limit is crucial because there is always the possibility that general relativity might be valid for weak gravity but break down in not yet explored realms of strong gravity (Guidry, 2019). Modeling gravitational waves is important to interpret experimental observations from the LIGO and Virgo gravitational wave detector networks. Visualizations of these models, in turn, help scientists develop intuitions of the complex dynamics of binary black hole systems. Moreover, visualizations of gravitational waves are instrumental in disseminating gravitational wave discoveries for outreach and educational purposes (Key and Hendry, 2016; Varma et al., 2019).

Fig. 16 Even though the spiraling and merging of cosmic objects are among the most violent events in our Universe, they produce only the tiniest ripples in spacetime. In this illustration, a digital artist has attempted to visualize the elusive nature of gravitational waves. (Credit: Carl Knox/OzGrav, https://www.ozgrav.org)

Yet, the complex dynamics of binary black hole systems make them hard to model. Often, obtaining a single prediction for the waveform of a merging system can take several months of computational time on powerful supercomputers (Varma et al., 2019). In response to these numerical challenges, scientists have started to develop surrogate models that use interpolation to accurately reproduce the results of gravitational wave simulations in fractions of a second (Varma, 2019). One example of such a surrogate model is the binary black hole explorer that offers on-the-fly visualizations of precessing binary black hole systems (Varma et al., 2019). These binary systems are characterized by a misaligned orbital angular momentum similar to the artistic visualization in Fig. 4. The binary black hole explorer offers an interactive scientific tool to visualize the merging of binary black holes, the emitted gravitational waveforms, and the black hole remnant properties. Figure 17 shows such a visualization that can be generated on a laptop using an easy-to-install-and-use Python package that is available at vijayvarma392.github.io/binaryBHexp.

Fig. 17 This visualization of a numerical simulation shows the complex dynamics of a precessing binary black hole merger. The black holes are shown as oblate spheres, with arrows indicating their spins. The orbital angular momentum is indicated by the pink arrow at the origin. Similar to electromagnetic waves, gravitational waves have different polarizations. The colors in the bottom plane encode the values of the plus polarization of the gravitational wave as seen by an observer at that location. In the subplot at the bottom, one can see the plus and cross polarizations as seen from the camera viewing angle. (Credit: Varma et al. (2019), https://vijayvarma392.github.io/binaryBHexp/)

Before the first observation of two merging black holes in 2015, gravitational waves had been the last prediction of general relativity that had not been tested directly. Combining state-of-the-art detector technology, numerical relativity, and scientific visualization techniques, gravitational wave astronomy promises completely new ways of revealing the nature of spacetime and our cosmos.
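The plus and cross polarizations mentioned in the caption of Fig. 17 depend on the angle from which a binary is viewed. As a rough illustration, the sketch below uses the leading-order (quadrupole) expressions for a non-precessing binary on a circular orbit with constant frequency. This is a strongly simplified assumption: it is not the surrogate model of the binary black hole explorer, it ignores the inspiral, merger, and precession shown in Fig. 17, and sign and phase conventions differ between references. The function name and parameters are hypothetical.

```python
import math


def polarizations(t: float, amplitude: float, frequency_hz: float,
                  inclination: float) -> tuple[float, float]:
    """Return (h_plus, h_cross) at time t for a circular binary.

    frequency_hz is the gravitational-wave frequency (twice the orbital
    frequency); inclination is the angle between the orbital angular
    momentum and the line of sight, in radians.
    """
    phase = 2.0 * math.pi * frequency_hz * t
    h_plus = amplitude * 0.5 * (1.0 + math.cos(inclination) ** 2) * math.cos(phase)
    h_cross = amplitude * math.cos(inclination) * math.sin(phase)
    return h_plus, h_cross


if __name__ == "__main__":
    # Compare a face-on view (inclination 0) with an edge-on view (90 degrees)
    # at a quarter of a wave period, where the cross polarization peaks.
    quarter_period = 0.25 / 100.0
    for name, iota in (("face-on", 0.0), ("edge-on", math.pi / 2)):
        hp, hc = polarizations(quarter_period, 1.0, 100.0, iota)
        print(f"{name}: h_plus = {hp:+.3f}, h_cross = {hc:+.3f}")
```

In this simplified picture, an observer looking down the orbital axis receives both polarizations with equal amplitude, whereas an observer in the orbital plane receives only the plus polarization, which is one reason why the viewing angle matters in such visualizations.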


Virtual, Augmented, and Mixed Reality

Ever since the historic origins of spacetime, technology has facilitated attempts to visualize four dimensions. The latest advancements in scientific visualizations build on virtual, augmented, and mixed reality applications that merge real and virtual worlds. By reaching new contexts that are far from our possibilities in the real world, these technologies have the potential to deeply transform the way scientists and the public engage with spacetime phenomena. Instead of being mere observers or interpreters, scientists become able to directly experience and manipulate spacetime. While the scientific use of virtual and augmented reality is still in its infancy, science education and outreach teams around the world have started to capitalize on the unique opportunities to blend physical and digital objects in real time. Often, these teams are highly interdisciplinary and combine the expertise of scientists, programmers, digital artists, and game designers to increase the public’s scientific literacy through immersive technology. For example, the Gravitational Wave Group at the University of Birmingham founded the community interest company Laser Labs that develops educational apps such as Pocket Black Hole and Stretch and Squeeze (Carbone et al., 2012). These apps invite users to add black holes or gravitational waves to their real-world environments (Fig. 18). Using these apps, users can explore distortions of space and time and manipulate effects of gravitational lensing in interactive and immediate ways.

Fig. 18 The app Pocket Black Hole allows users to explore distortions of space and time by placing black holes into their immediate environment. Here, a black hole was placed on top of a walking trail near the Swedish town of Karlstad. (Credit: Pocket Black Hole/Laser Labs, https://www.laserlabs.org)

In a similar vein, the educational app Science in VR (SciVR) invites users to experience spacetime scenes and to explore the virtual universe using either virtual reality cardboards or devices in 3D mode with accompanying audio explanations (Fig. 19). The developers behind SciVR, the Education and Public Outreach Team at the Australian Research Council Centre of Excellence for Gravitational Wave Discovery (OzGrav), combine classic scientific modeling with immersive virtual reality to make gravitational wave physics more accessible to school students and the general public. Using the freedom of virtual reality, the OzGrav team reimagines what it means to reveal spacetime phenomena from a first-person perspective (Fig. 20). The OzGrav virtual reality environments offer different and complementary approaches to the scientific visualizations of gravitational waves in the previous section. Virtual reality headsets detect the movement of the wearer and allow for realistic embodied experiences of gravitational waves. Figures 21, 22, and 23 show screenshots of virtual scenes in which users can freely move around and lean into the spacetime scene to explore gravitational waves. According to the mission of OzGrav, the center wishes “to capitalize on the historic first detections of gravitational waves to understand the extreme physics of black holes and warped spacetime and to inspire the next generation of Australian scientists and engineers through this new window on the Universe.”

Fig. 19 Science in VR contains a universe of virtual reality content, from explorations of our Milky Way and the solar system to black holes colliding. The app offers visualizations of spacetime phenomena that users can enjoy at home with accompanying audio explanations. (Credit: OzGrav, https://www.scivr.com.au)

Fig. 20 Mission Gravity is a virtual reality environment that allows school students to explore spacetime phenomena in interactive and immersive ways. Wearing virtual reality headsets, students can see and interact with a world that would otherwise not be accessible to them. In this scene, students sit inside a virtual spaceship to study the properties of a giant blue star in the background. (Credit: Mark Myers/OzGrav, https://www.ozgrav.org)

Fig. 21 In this virtual reality scene, two spiraling black holes have been placed next to the Earth so that users can observe and experience the squeezing and stretching of our own planet due to the ripples in space. (Credit: Mark Myers/OzGrav, https://www.ozgrav.org)

Fig. 22 Similar to electromagnetic waves, gravitational waves can have different polarizations. In this virtual reality visualization, users can observe these different polarizations from different angles while the gravitational waves move toward or away from the observer. (Credit: Mark Myers/OzGrav, https://www.ozgrav.org)

Fig. 23 This screenshot of a dynamic virtual reality visualization shows a binary system of a black hole and a neutron star that spiral around each other. In the virtual reality environment, the observers can view the scene from different perspectives to explore gravitational waves. (Credit: Mark Myers/OzGrav, https://www.ozgrav.org)

Just as gravitational waves have opened a new window into the Universe that reveals the physics of space and time, so have emerging new technologies opened new possibilities to alter one’s perception of four-dimensional spacetime. Advancements in technology and visualizations of four-dimensional spacetime drive change in each other. Virtual, augmented, and mixed reality applications mediate a completely different quality of spacetime experiences. The promise to provide people with an intuitive feel for the extreme physics of warped spacetime and our four-dimensional reality has truly propelled spacetime explorations into the twenty-first century.

Conclusion

According to the valid observation of Pushkin, imagination is as necessary in geometry as it is in poetry. Everything that requires artistic transformation of reality, everything that is connected with interpretation and construction of something new, requires the indispensable participation of imagination. (Vygotsky, 1998)

Modern physics unfolds on the stage of four-dimensional spacetime. To the non-mathematical mind, the abstract character of such a description may seem unsatisfactory (Russell, 1925). Yet, it is this very abstract character of our physical knowledge that has inspired our scientific, imaginative, and artistic attempts to push the boundaries of what we know. Nowhere does the fruitful interplay between mathematics, sciences, and arts feature as prominently as in the historic struggle to visualize four dimensions as part of the quest to understand our cosmos. Special and general relativity present unique challenges to the human mind because these theories contradict the commonsense understanding of many. To align the perceptual qualities of spacetime physics with our experiential understanding of the world, we must draw on a repertoire of visualizations, representations, and models (Kersting and Steier, 2018). In the course of this chapter, we have explored some of these visualizations through the lens of technology. Since the stage of special relativity is flat Minkowski spacetime, visualizations of special relativistic phenomena are usually restricted to visual perceptions of the distortions of objects due to movement at high speeds. In general relativity, in contrast, the dynamic interplay between spacetime and massive objects allows for a greater variety of spacetimes that can be studied and visualized. Exotic phenomena such as black holes and gravitational waves push the laws of physics to their limits and challenge our imaginative faculties. The historical origins of spacetime and the century-long efforts to visualize relativistic phenomena testify to the power of using technology as a mediator of our understanding. Along with technological progress throughout the twentieth century, computer simulations started to replace artistic illustrations of four-dimensional spacetime with more accurate visualizations based on the mathematics of differential geometry.

In this process, interdisciplinarity has been a driving force and a recurrent theme. The historic origins of spacetime sprang from a fruitful bringing together of mathematical prowess, visual-geometric thinking, and physical insight. Today, interdisciplinary teams with expertise in relativistic physics, computer simulations, user interfaces, and digital artistry come together to develop visualizations that allow probing extreme phenomena of four-dimensional spacetime (James et al., 2015; Weiskopf et al., 2006). It is only through interdisciplinary efforts that imaginative leaps become possible. Linking mathematical concepts with physical intuition and artistic vision, these imaginative leaps have led from simple spacetime diagrams and rubber sheet analogies to sophisticated first-person visualizations and powerful virtual environments that allow exploring the extreme physics of black holes and gravitational waves. In the history of spacetime, interdisciplinarity has been the driving force and mathematics and technology the means through which imagination has explored our world for new knowledge.

Cross-References

Artistic Manifestations of Topics in String Theory
Modern Ergodic Theory: From a Physics Hypothesis to a Mathematical Theory with Transformative Interdisciplinary Impact

References

Abbott BP et al (2016) Observation of gravitational waves from a binary black hole merger. Phys Rev Lett 116(6):061102
Akiyama K et al (2019) First M87 event horizon telescope results. I. The shadow of the supermassive black hole. Astrophys J 875(1):L1
Alcubierre M, Brugmann B (2001) Simple excision of a black hole in 3+1 numerical relativity. Phys Rev D 63(10):104006
Anninos P, Camarda K, Masso J, Seidel E, Suen W-M, Towns J (1995) Three dimensional numerical relativity: the evolution of black holes. Phys Rev D 52(4):2059–2082
Arnowitt R, Deser S, Misner CW (1962) The dynamics of general relativity. In: Witten L (ed) Gravitation: an introduction to current research. Wiley, New York, p 227
Baumgarte TW, Shapiro SL (2010) Numerical relativity. Cambridge University Press, Cambridge
Born M (1968) Bern’s colloquium, 1955. In: Physics in my generation. Springer, New York
Brandt S, Bruegmann B (1997) A simple construction of initial data for multiple black holes. Phys Rev Lett 78(19):3606–3609
Carbone L, Bond C, Brown D, Brückner F, Grover K, Lodhia D, Mingarelli CM, Fulda P, Smith RJ, Unwin R, Vecchio A, Wang M, Whalley L, Freise A (2012) Computer-games for gravitational wave science outreach: Black hole pong and space time quest. J Phys Conf Ser 363(1):012057
Carroll SM (2003) Spacetime and geometry: an introduction to general relativity. Pearson, Chicago
Chandler M (1994) Philosophy of gravity: intuitions of four-dimensional curved spacetime. Sci Educ 3(2):155–176
Chandrasekhar S (1992) A mathematical theory of black holes. Oxford University Press, New York
Cole KC (2019) The simple idea behind Einstein’s greatest discoveries. Quanta Magazine, https://www.quantamagazine.org/einstein-symmetry-and-the-future-of-physics-20190626/
Durell CV (1926) Readable relativity. G. Bell & Sons LTD., London
Einstein A (1905) Zur Elektrodynamik bewegter Körper [On the electrodynamics of moving bodies]. Annalen der Physik 17(10):891–921
Einstein A (1915) Grundgedanken der allgemeinen Relativitätstheorie und Anwendung dieser Theorie in der Astronomie [Fundamental ideas of the general theory of relativity and the application of this theory in astronomy]. Preussische Akademie der Wissenschaften, Sitzungsberichte 1(1):315
Galison P (1979) Minkowski’s spacetime: from visual thinking to the absolute world. Hist Stud Phys Sci 10:85–121
Gamow G (1940) Mr Tompkins in wonderland. Cambridge University Press, Cambridge, reprint 19 edition
Goodman AA (2012) Principles of high-dimensional data visualization in astronomy. Astron Nachr 333(5–6):505–514
Guidry M (2019) Modern general relativity: black holes, gravitational waves, and cosmology. Cambridge University Press, Cambridge
Hartle JB (2003) Gravity: an introduction to Einstein’s general relativity. Addison-Wesley, San Francisco
Heidegger M (1977) The question concerning technology. In: The question concerning technology and other essays. Harper and Row, New York
Hentschel K (ed) (1998) The collected papers of Albert Einstein, Volume 8 (English) The Berlin years: correspondence, 1914–1918. Princeton University Press, Princeton
Hesse M (1953) Models in physics. Br J Philos Sci 4:98–214
James O, von Tunzelmann E, Franklin P, Thorne KS (2015) Visualizing interstellar’s wormhole. Am J Phys 83(6):486–499
Kapon S, DiSessa AA (2012) Reasoning through instructional analogies. Cogn Instr 30(3):261–310
Kersting M, Steier R (2018) Understanding curved spacetime: the role of the rubber sheet analogy in learning general relativity. Sci Educ 27(7):593–623
Key JS, Hendry M (2016) Defining gravity. Nat Phys 12(6):524–525
Kraus U (2008) First-person visualizations of the special and general theory of relativity. Eur J Phys 29(1):1–13
Minkowski H (1909) Raum und Zeit [Space and Time]. Jahresbericht Deutscher Mathematischer Verein 18:75–88
Minkowski H (1915) Das Relativitätsprinzip [The principle of relativity]. Jahresbericht Deutscher Mathematischer Verein 24:372–382
Owen R, Brink J, Chen Y, Kaplan JD, Lovelace G, Matthews KD, Nichols DA, Scheel MA, Zhang F, Zimmerman A, Thorne KS (2011) Frame-dragging vortexes and tidal tendexes attached to colliding black holes: visualizing the curvature of spacetime. Phys Rev Lett 106(15):4–7
Petkov V (2014) Physics as spacetime geometry. In: Springer handbook of spacetime. Springer, Berlin, pp 141–164
Petkov V, Ashtekar A (eds) (2014) Springer handbook of spacetime. Springer, Berlin
Pretorius F (2005) Evolution of binary black-hole spacetimes. Phys Rev Lett 95(12):121101
Ruder H, Weiskopf D, Nollert HP, Müller T (2008) How computers can help us in creating an intuitive access to relativity. New J Phys 10:125014
Russell B (1925) ABC of relativity. Allen & Unwin, London
Stark RF, Piran T (1985) Gravitational-wave emission from rotating gravitational collapse. Phys Rev Lett 55(8):891–894
Steier R, Kersting M (2019) Metaimagining and embodied conceptions of spacetime. Cogn Instr 37(2):145–168
Varma V (2019) Black hole simulations: from supercomputers to your laptop. Ph.D. thesis
Varma V, Stein LC, Gerosa D (2019) The binary black hole explorer: on-the-fly visualizations of precessing binary black holes. Classical and Quantum Gravity 36(9):095007
Vygotsky LS (1998) The collected works of L. S. Vygotsky, 5th edn. Springer, New York
Wald RM (1984) General relativity. The University of Chicago Press, Chicago
Walter S (2014) The historical origins of spacetime. In: Petkov V, Ashtekar A (eds) Springer handbook of spacetime. Springer, Berlin
Weiskopf D, Borchers M, Ertl T, Falk M, Fechtig O, Frank R, Grave F, King A, Kraus U, Müller T, Nollert H-P, Rica Mendez I, Ruder H, Schafhitzel T, Schär S, Zahn C, Zatloukal M (2006) Explanatory and illustrative visualization of special and general relativity. IEEE Trans Vis Comput Graph 12(4):522–34
Wheeler JA (1998) Geons, black holes, and quantum foam: a life in physics. W.W. Norton & Company, New York
Woodhouse NMJ (2014) Relativity today. In: Petkov V, Ashtekar A (eds) Springer handbook of spacetime. Springer, Berlin

Coevolution of Mathematics, Statistics, and Genetics

79

Yun Joo Yoo

Contents
Introduction
Early Contributions
Mendel and His Inheritance Models
Hardy-Weinberg Equilibrium
Wright-Fisher Model
Study of Family History and Pedigrees
Twin Studies
Genetic Linkage Mapping
Exploring Big Genetic Data
Genome-Wide Association Studies
Whole Genome Sequencing
Network-Based Analysis for Genetic Data
Discussion
References

Y. J. Yoo, Department of Mathematics Education, Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea. e-mail: [email protected]
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_28

Abstract

Genetics is the science of heredity, the process of transmitting genetic material from parents to offspring. In genetic studies, hypotheses derived from biological theories and mathematical models are tested with the data from experiments or observations of genetic phenomena using statistical methodologies. Throughout the history of genetics, mathematics and statistics have been extensively used for genetic studies, and genetics, in turn, has influenced many fields of mathematics and statistics. In this chapter, we describe some of the most important mathematical models and statistical methods in the history of genetics. We especially focus on three periods: (1) the early days, when the basic concepts in genetics were established, such as genes, evolution, and inheritance, and mathematical models of such genetic mechanisms were laid out; (2) the period of studying family data from twins or large pedigrees in the mid- to late twentieth century; and (3) the present period of exploring big genetic data by complex modeling and machine learning. We show that various probabilistic models, differential equations, and graph and network theories have been applied to the analysis of genetic data. We also illustrate how statistical issues involved with model fitting, estimation, and hypothesis testing have been raised and resolved in the context of genetic studies, contributing to the field of statistics as well as that of genetics. In the discussion, we suggest some promising mathematical and statistical methods to be applied in future genetic studies.

Keywords

Mathematical genetics · Statistical genetics · Linkage study · Genetic association · Whole genome sequencing

Introduction

Genetics is the study of the mechanisms of inheritance in living organisms at the molecular level or at the population level. Whether an organism is a bacterium or a human, many of its biological characteristics are affected by genetic factors. To solve many important problems in the fields of biology, agriculture, ecology, and medicine, heredity in humans and other species has long been studied using mathematical models and statistical methodologies. In human genetics, genetic traits, especially those related to diseases for which researchers are trying to find causes and cures, have been extensively researched. Mathematics has been a key instrument since the very beginning of genetics, when Gregor Mendel (1822–1884) tried to explain why some traits of plants appear in certain ratios under controlled conditions (Siddartha 2016). Currently, genetic studies depend upon large-scale, high-throughput data such as single nucleotide variations or protein networks. The large-scale genomic data generated by current advanced technology require high-dimensional statistical analysis, advanced machine learning techniques, and complex mathematical modeling, through which researchers discover a specific genetic makeup responsible for certain heritable traits by exploring the billions of possible candidate genetic compositions (Brown 2002).

Early statisticians such as Francis Galton (1822–1911), Karl Pearson (1857–1936), and Ronald A. Fisher (1890–1962), who laid the foundations of modern statistics, were also geneticists. Galton introduced the concepts of correlation and regression and discovered the phenomenon called “regression to the mean” from observations of the heights of parents and their children, where the extreme values found in the parents move toward the average in their children (Stigler 2010). Galton thought it was a general inheritance phenomenon, and the term “regression” originated from these observations (Galton 1886). Currently, the term “regression to the mean” is used to describe the statistical phenomenon in which extreme initial observations tend to be followed by observations closer to the average (Stigler 1997). Later, the regression method became one of the most important and powerful tools in statistics. Pearson developed the correlation coefficient r, the chi-squared test, and hypothesis testing (Biau et al. 2010). Pearson’s chi-squared test was applied to inheritance data, including Mendel’s original experiment results (Magnello 1998). Fisher discovered the F distribution, which is the distribution of the ratio of two independent chi-squared random variables, each divided by its degrees of freedom (Fisher 1924), and introduced the concept of p-values to judge the statistical significance of a test result, which is required to determine whether the evidence is strong enough to support the designated hypothesis in genetic studies (Biau et al. 2010). Fisher also discovered numerous genetic principles and key concepts in population genetics, including the principle of an equal ratio between males and females and the fundamental theorem of natural selection, which states the relationship between fitness and genetic variance (Yates and Mather 1963; Crow 2002). Galton, Pearson, Fisher, and their fellow scholars applied the various mathematical achievements of their time, from probability theory to differential equations, to explaining genetic and evolutionary phenomena, and laid the basic principles and frameworks of statistics, then a relatively new field of science, to provide rational criteria for making scientific judgments based on observed data in genetic experiments.

Currently, scientists say that genetics has entered the era of “omics,” where large-scale comprehensive information is accumulated and analyzed. In this era of omics, the coevolution of mathematics, statistics, and genetics remains strong (Kiechle et al. 2004; Raja et al. 2017; Tian et al. 2011). In earlier times, the cost and time required to generate genetic data restricted the scale of the experiments such that a genetic study usually focused on a limited number of genetic elements. Now, new technologies with reduced cost and increased speed and capacity, such as next-generation sequencing, parallel computing, and improved memory devices, enable geneticists, mathematicians, and statisticians together to conduct large-scale studies using the genetic data from hundreds of thousands of subjects or allow the accumulated data assembled in public databases to be shared with researchers all over the world. For example, genomics is the study of entire genomes to find the causes of certain biological phenomena, using the genomic data of millions and even billions of single nucleotides genotyped by commercial array chips or sequencing machines (Brown 2002). Proteomics is the large-scale study of proteins generated from microarray experiments (Blackstock and Weir 1999). Phenomics is the analysis of the phenotypes (trait configurations) of an organism and their changes in response to genetic variations and interactions with the environment (Freimer and Sabatti 2003; Gerlai 2002). Omics studies require mathematical and statistical methodologies for high-dimensional data. For this purpose, network theory has been actively applied to omics data, and new machine learning methods have been developed to select causal variables amid a vast array of candidate genetic materials (Wu et al. 2014).
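Two of the statistical ideas attributed above to Galton and Pearson can be illustrated with a few lines of code. The first function simulates parent and child heights that are imperfectly correlated, so that the children of unusually tall parents are, on average, closer to the population mean than their parents. The second computes Pearson’s chi-squared statistic for a simple goodness-of-fit test. All names and the simulation parameters (mean height 170 cm, standard deviation 7 cm, correlation 0.5) are assumptions chosen for illustration and are not Galton’s data. The seed counts used in the demo, 5474 round and 1850 wrinkled, are the counts commonly reported from Mendel’s seed-shape experiment, tested here against an expected 3:1 ratio.

```python
import math
import random
import statistics


def simulate_parent_child(n, correlation, mean=170.0, sd=7.0, seed=0):
    """Simulate n (parent, child) height pairs with the given correlation.

    Illustrative model only: both heights are normal with the same mean and
    standard deviation, and the child's standardized height is
    correlation * parent + independent noise.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(1.0 - correlation ** 2)
    pairs = []
    for _ in range(n):
        z_parent = rng.gauss(0.0, 1.0)
        z_child = correlation * z_parent + noise_scale * rng.gauss(0.0, 1.0)
        pairs.append((mean + sd * z_parent, mean + sd * z_child))
    return pairs


def chi_squared_statistic(observed, expected_ratio):
    """Pearson's statistic: sum over categories of (O - E)^2 / E."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(chi2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    pairs = simulate_parent_child(20000, correlation=0.5)
    tall = [(p, c) for p, c in pairs if p > 184.0]  # parents two sd above the mean
    print("mean height of tall parents:", round(statistics.mean(p for p, _ in tall), 1))
    print("mean height of their children:", round(statistics.mean(c for _, c in tall), 1))

    chi2 = chi_squared_statistic([5474, 1850], [3, 1])
    print("chi-squared:", round(chi2, 3), "p-value:", round(p_value_one_df(chi2), 3))
```

With these counts the statistic is small and the p-value is large, so the data give no reason to reject the 3:1 ratio. The closed-form p-value used here holds only for one degree of freedom (two categories); more categories would require the general chi-squared distribution.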


Y. J. Yoo

In this chapter, we introduce and discuss several key ideas for mathematical models and statistical approaches that have enabled the discovery of important findings in genetics and have led to changes in the paradigm of genetic research. First, we show how the beginning of genetics was deeply related to mathematical modeling and early statistical development. Next, we describe how statistical inferences from the inheritance data of complex families with molecular information made it possible to find genes responsible for Mendelian and complex diseases. Then, we also illustrate the more recent genetic research scenarios of using various advanced mathematical and machine learning methods to disentangle the complex mechanisms of genes and outcomes from the big genetic data generated by new technologies.

Early Contributions

Mendel and His Inheritance Models

Gregor Mendel is recognized as the founder of genetics due to his first attempt to systematically model the cross-breeding results of plants and his proposal of the concept of genes (the term he used was actually “factors”) (Siddartha 2016; Chiras 2012). Mendel conducted extensive breeding experiments on pea plants (Pisum sativum) and established several rules of heredity, now called the laws of Mendelian inheritance. His experiments using pea plants were intended to study the inheritance patterns of seven traits including height, flower color, and seed color and shape. He artificially pollinated one type of pea plant with the pollen from another type (or sometimes the same type) and examined the appearances of those seven traits in the resulting offspring. The first thing he did for these experiments was to obtain purebred plants for each type of the seven traits by fertilizing plants that shared the same trait type (self-fertilization) for years. Mendel ensured that the self-fertilization of a plant purebred for a trait led to offspring with the same trait type as that of the parent. This phenomenon of the offspring having the same trait type as its parents is called heredity. The concept of heredity as the result of breeding was widely understood by scholars of agriculture or biology in Mendel’s time, but they lacked a clear explanation for the mechanism of heredity. As a result, accurate prediction or control of the outcomes of breeding was almost impossible (Orel 2009). With these purebred plants in hand, Mendel proceeded to cross-breed different types of purebred plants. The purebred plants used in the cross-breeding experiment are called the P generation, and the offspring, as the outcomes of cross-breeding between different types, are called the F1 generation. He then obtained the offspring of the self-fertilization of the F1 generation, which are called the F2 generation.
When Mendel observed the distributions of trait types in the F1 and F2 generations, he noticed some patterns in terms of the ratios among the different trait types. For example, when he cross-bred violet-flowered pea plants and white-flowered pea plants, he obtained all violet flowers in the F1 generation. Next, when he let the

79 Coevolution of Mathematics, Statistics, and Genetics


Fig. 1 Results of cross-breeding of pure-bred pea plants of violet and white flowers and self-fertilization of the offspring generation (F1). [Diagram: P generation, true-bred violet flower × true-bred white flower; F1 generation, all violet flowers; F2 generation, 705 violet flowers and 224 white flowers.]

F1 generation self-fertilize, suddenly, he observed white flowers appearing in the F2 generation, which were absent in the F1 generation (Fig. 1). In addition, he found that the ratio between violet and white in the F2 generation was close to 3:1 (705 vs. 224). He observed similar phenomena for height (tall and short) and other traits in the pea plants. To explain why this phenomenon happens, Mendel came up with the theory of a pair of hidden factors: one inherited from the father and the other inherited from the mother. This notion of “inheritance factors residing as a pair in an individual” suggested by Mendel led to the concept of the gene, which is now known as the DNA sequences inherited from both parents residing on chromosomes as pairs. Mendel theorized the concept of genes without physically observing the process of meiosis, only by inferring a mathematical model that fits the data. In addition to the concept of genes, he proposed three principles describing inheritance mechanisms. One of these principles is called the law of dominance, which can be seen as a biological model of inheritance. The law of dominance means that the trait type related to one allele (a variant of a gene) is suppressed by the other trait type, which is related to a different allele of that gene, when they coexist in a heterozygous (co-appearing) genotype. The suppressed allele and its


related trait type are said to be recessive, and the suppressing allele and its related trait type are said to be dominant. For example, if Y represents the dominant allele corresponding to violet flowers and y represents the recessive allele connected to white flowers, a plant with a Yy genotype yields violet flowers. The other two principles of inheritance are the law of segregation and the law of independent assortment. These principles are directly connected to probabilistic models of Mendelian inheritance. The law of segregation is the principle that each pair of parental genes (alleles) is randomly separated into sex cells so that the offspring inherits one genetic allele from each parent with a 50:50 chance. When the law of segregation is combined with the law of dominance, the 3:1 ratio of violet flowers versus white flowers in the F2 generation in Mendel’s experiment can be explained. The cross-breeding of two different purebred lines in the P generation means that fertilization occurs between one plant with a YY genotype and another with a yy genotype; this cross-breeding process can be symbolized as YY × yy. According to the law of segregation, only one of the parent gene copies will be inherited randomly by the offspring, so every offspring in the F1 generation will have the genotype Yy (provided that we always write the Y before the y). Next, the self-fertilization of F1 , denoted by Yy × Yy, will yield three possible genotypes, YY, Yy, and yy, in the F2 generation. The self-fertilization results of the heterozygous F1 generation are usually represented by a Punnett square (Fig. 2). Fig. 2 Illustration of the law of segregation and the Punnett square

[Diagram for Fig. 2: P generation YY × yy, contributing gametes Y and y; F1 generation Yy × Yy; F2 generation shown as a Punnett square with cells YY, Yy, Yy, and yy.]


The proportions of YY, Yy, and yy genotypes in the F2 generation should be 25%, 50%, and 25%, according to the law of segregation. If violet flowers represent the dominant phenotype, then 75% of the F2 generation will have violet flowers, fitting the observed ratio of approximately 3:1 in Mendel’s cross-breeding experiments. The law of independent assortment is a principle describing independence in the inheritance of different genes. Mendel investigated two traits of pea plants at the same time: seed color and seed shape. For seed color, there are two types, yellow (Y) and green (y), and for seed shape, round (R) and wrinkled (r). Here, Y and R are dominant alleles, i.e., seeds of genotype YY or Yy will appear yellow, and seeds of genotype RR or Rr will appear round. If we cross-breed pea plants that are purebred in both traits, one with seeds of yellow color and round shape (YYRR) and one with seeds of green color and wrinkled shape (yyrr), we will end up with an F1 generation having round yellow seeds and uniformly heterozygous genotypes (Yy and Rr); such plants are called dihybrids. When Mendel fertilized these dihybrid F1 plants among themselves, he observed four types in the offspring (F2 ) generation: yellow and round, yellow and wrinkled, green and round, and green and wrinkled. These types were observed to appear in a ratio of approximately 9:3:3:1. From this result, Mendel conjectured that the pairing of alleles for these two traits, seed color and seed shape, in the inheritance process is randomly (independently) determined, meaning that the pairs (Y, R), (Y, r), (y, R), and (y, r) all have equal probabilities. This result fit the observed 9:3:3:1 ratio in four phenotype cases (Fig. 3). Based on this observation of the independent determination of two traits, Mendel proposed the law of independent assortment. 
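The 3:1 and 9:3:3:1 ratios described above can be checked by enumerating every cell of the corresponding Punnett square. The following is a minimal sketch, not a reconstruction of any published software; the function names `gametes`, `phenotype`, and `cross` are hypothetical, and a genotype is assumed to be written as a string with two letters per gene (upper case for the dominant allele).

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """Return all equally likely gametes of a genotype such as 'Yy' or 'YyRr'.

    Each consecutive pair of letters is one gene. A gamete receives one
    allele from each pair (law of segregation), and the pairs are combined
    independently of each other (law of independent assortment).
    """
    pairs = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    return ["".join(choice) for choice in product(*pairs)]


def phenotype(gamete1, gamete2):
    """One letter per gene: upper case if at least one dominant allele is present."""
    label = ""
    for a, b in zip(gamete1, gamete2):
        label += a.upper() if (a.isupper() or b.isupper()) else a.lower()
    return label


def cross(parent1, parent2):
    """Count offspring phenotypes over all cells of the Punnett square."""
    return Counter(
        phenotype(g1, g2)
        for g1 in gametes(parent1)
        for g2 in gametes(parent2)
    )


mono = cross("Yy", "Yy")      # monohybrid cross: dominant vs. recessive
di = cross("YyRr", "YyRr")    # dihybrid cross: four phenotype classes
print(dict(mono))
print(dict(di))
```

Because every cell of the square is equally likely under the two laws, counting cells is enough; no random sampling is needed for this check.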
With the data obtained from biological experiments in one hand and the mathematical models that seemed to explain the data in the other hand, the need for a rational decision process to justify models in comparison to data emerged in the field of genetics (Cox 2002). The chi-squared goodness of fit test was suggested by Karl Pearson in this atmosphere (Pearson 1900) and applied to Mendel’s data by Raphael Weldon (Magnello 2004). Weldon discovered that Mendel’s data were too close to the expected values and suggested the possibility of fabrication. Fisher was another person who claimed possible falsification based on a statistical point of view (Fairbanks and Schaalje 2007). The controversy over the results of Mendel’s experiments was deeply related to hypothesis testing and the statistical decision process, which prompted an active discussion of those subjects among statisticians.
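As an illustration of the kind of comparison between model and data discussed above, the sketch below applies Pearson's goodness of fit statistic to the flower-color counts quoted earlier (705 violet, 224 white) against the expected 3:1 ratio. The helper names are hypothetical. With two categories and a fully specified ratio the statistic has 1 degree of freedom, and the upper-tail probability for that case is computed with `math.erfc`, which is exact for 1 degree of freedom only.

```python
import math


def chi_squared_gof(observed, expected_ratio):
    """Pearson goodness of fit statistic for counts against a fixed ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


def p_value_df1(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    For 1 degree of freedom, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


stat, expected = chi_squared_gof([705, 224], [3, 1])
p_value = p_value_df1(stat)
print("expected counts:", expected)
print("chi-squared:", round(stat, 4), "p-value:", round(p_value, 3))
```

A large p-value here only says that this single experiment is compatible with the 3:1 model; the "too close to expectation" argument mentioned above concerns the aggregate of many experiments, which this sketch does not attempt to reproduce.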

Hardy-Weinberg Equilibrium

Mendel’s laws of inheritance are typical examples which show that finding proper biological and mathematical models for inheritance mechanisms that explain the observed data can be the objective of genetic research. In particular, population genetics, the study of the distributions of genetic components and phenotypic variables in the population in relation to various environmental and genetic factors, has actively used mathematical modeling and statistical methods to theorize and prove hypotheses about population genetic phenomena, such as mutation, evolution,

Fig. 3 Illustration of the law of independent assortment. [Diagram: P generation YYRR × yyrr, contributing gametes YR and yr; F1 generation YyRr × YyRr, each parent producing gametes YR, yR, Yr, and yr; F2 generation shown as the Punnett square below.]

        YR      yR      Yr      yr
YR      YYRR    YyRR    YYRr    YyRr
yR      YyRR    yyRR    YyRr    yyRr
Yr      YYRr    YyRr    YYrr    Yyrr
yr      YyRr    yyRr    Yyrr    yyrr

migration, and natural selection, from the early days of genetic research. In its early days, population genetics was also called mathematical genetics due to its extensive use of mathematical theories (Edwards 1977). One of the most famous principles found by the first generation of population geneticists is the Hardy-Weinberg principle established by Godfrey H. Hardy (Hardy 1908) and Wilhelm Weinberg (Weinberg 1908). The Hardy-Weinberg principle states that the allele frequencies and genotype frequencies of one generation will remain unchanged in subsequent generations as long as no genetic interference, such as mutation, sexual and natural selections, or genetic drift, is present, with the assumption of random mating. Hardy was actually a mathematician who had not been interested in genetics prior to this problem. When his cricket buddy Reginald Punnett (1875–1967) (who created the Punnett square) introduced the proposition of the then-famous statistician George U. Yule (1871–1951) that a dominant allele should prevail in a population over successive generations, Hardy solved the problem by applying his mathematical knowledge and commented that it was a very “simple” problem (Edwards 2008).

Table 1 Punnett square for Hardy–Weinberg principle assuming random mating between males and females

Genotype and its frequency
                  Females A (p)    Females a (q)
Males A (p)       AA ($p^2$)       Aa ($pq$)
Males a (q)       Aa ($qp$)        aa ($q^2$)

The mathematical modeling and theoretical explanation for the Hardy-Weinberg principle are as follows. Let the baseline generation be denoted using the index value $t = 0$. Suppose that only two alleles, A and a, exist in the population for a gene. The frequencies of occurrence of alleles A and a are denoted by $p_0(A) = p$ and $p_0(a) = q = 1 - p$, respectively. For any generation $t$, the allele frequencies $p_t(A)$ and $p_t(a)$ can be obtained from the genotype frequencies $p_t(AA)$, $p_t(Aa)$, and $p_t(aa)$ of that generation by:

$$p_t(A) = p_t(AA) + \tfrac{1}{2}\, p_t(Aa)$$
$$p_t(a) = p_t(aa) + \tfrac{1}{2}\, p_t(Aa)$$

To obtain the genotype frequencies of the next generation, we assume random mating in the current generation, which results in the expected frequency table in a Punnett square, as shown in Table 1. For example, the $t = 1$ generation’s allele frequencies are the same as those of the $t = 0$ generation:

$$p_1(A) = p_1(AA) + \tfrac{1}{2}\, p_1(Aa) = p^2 + pq = p = p_0(A)$$
$$p_1(a) = p_1(aa) + \tfrac{1}{2}\, p_1(Aa) = q^2 + pq = q = p_0(a)$$

In this way, the allele frequencies and genotype frequencies remain fixed across generations; this phenomenon is also called Hardy-Weinberg equilibrium (HWE). Given the genotype frequencies of a population, deviations from HWE can be statistically tested using Pearson’s goodness of fit chi-squared test (Wang and Shete 2017). If you have genotype counts $O_{AA}$, $O_{Aa}$, and $O_{aa}$ in the data for genotypes AA, Aa, and aa, respectively, then the frequency $p$ of allele A is obtained by:

$$p = \frac{O_{AA} + \tfrac{1}{2}\, O_{Aa}}{n}$$

where $n = O_{AA} + O_{Aa} + O_{aa}$ is the total number of genotyped subjects, and the frequency of allele a is obtained by $1 - p$.


From this, you can compute the expected genotype counts $E_{AA}$, $E_{Aa}$, and $E_{aa}$ according to HWE such that $E_{AA} = np^2$, $E_{Aa} = 2npq$, and $E_{aa} = n(1 - p)^2$, where $q = 1 - p$ and $n$ is the total number of genotypes (subjects). The Pearson chi-squared test to detect deviation from HWE is then defined as:

$$\chi^2 = \frac{(O_{AA} - E_{AA})^2}{E_{AA}} + \frac{(O_{Aa} - E_{Aa})^2}{E_{Aa}} + \frac{(O_{aa} - E_{aa})^2}{E_{aa}}$$

Under the null hypothesis of no deviation from HWE, $\chi^2$ follows a chi-squared distribution with 1 degree of freedom.
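The test just described can be sketched in a few lines. This is an illustrative sketch only: the function name `hwe_chi_squared` and the genotype counts in the usage lines are hypothetical, not data from the chapter, and the allele frequency is taken as the allele count divided by the number of subjects so that the expected counts sum to $n$. The p-value uses the 1 degree of freedom stated above.

```python
import math


def hwe_chi_squared(o_AA, o_Aa, o_aa):
    """Pearson chi-squared test for deviation from Hardy-Weinberg equilibrium.

    Returns (p, statistic, p_value), where p is the estimated frequency of
    allele A. Assumes 0 < p < 1 so that all expected counts are positive.
    """
    n = o_AA + o_Aa + o_aa
    p = (o_AA + 0.5 * o_Aa) / n          # frequency of allele A
    q = 1.0 - p                          # frequency of allele a
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (o_AA, o_Aa, o_aa)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return p, stat, p_value


# Hypothetical genotype counts for 1,000 subjects.
p, stat, p_value = hwe_chi_squared(298, 489, 213)
print("allele frequency p:", p)
print("chi-squared:", round(stat, 4), "p-value:", round(p_value, 3))
```

Counts that match the HWE proportions exactly, such as 25, 50, and 25, give a statistic of zero, while a sample with no heterozygotes at all gives a very large statistic.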

Wright-Fisher Model

Population genetics research evaluates changes and variations in the genetic composition of populations over time. Factors that affect the population genetic phenomena such as natural selection or mutation are of particular interest. To find evidence for such special evolutionary events, population geneticists must establish a neutral model and investigate departures from that neutral model, as in the case of the test for HWE (Crow 1987). One of the neutral phenomena that mathematical geneticists have studied from the early history of population genetics is genetic drift, the mechanism that causes changes in allele frequencies in the population over generations due to chance (Masel 2011). Technically, any changes in allele frequencies can be called evolution since they can affect the characteristics of the population. In natural selection, allele frequencies may change in order to adapt to environmental pressures. In genetic drift, the change in allele frequencies occurs as a random phenomenon.

An illustration of genetic drift is given as follows. Suppose that the population contains only B and b alleles for a gene, with frequencies of 0.5 and 0.5, respectively, and the ratio of BB, Bb, and bb genotypes in the population is 1:2:1. Let us assume that only a portion of the population has mated, and by chance, the individuals who have mated have only BB and Bb genotypes, in a ratio of 2:3. Then, in the next generation, the allele frequencies become 0.7 and 0.3 for the B and b alleles, respectively. If, by chance, only BB individuals reproduce in the second generation, then the b allele completely disappears from the population, due solely to the random phenomenon of genetic drift (Fig. 4). Sewall Wright (1889–1988) and Ronald A. Fisher (1890–1962) independently suggested similar mathematical models for genetic drift in approximately 1930 (Fisher 1930; Wright 1931).
Fig. 4 Illustration of genetic drift resulting in fixation in the third generation. [Diagram: first generation with genotypes BB, Bb, and bb, p = 0.5 (frequency of allele B) and q = 0.5 (frequency of allele b); second generation after genetic drift, p = 0.7 and q = 0.3; third generation after genetic drift, all BB, fixation with p = 1.0 and q = 0.0.]

In a population of $N$ individuals, for a gene with two alleles B and b with allele frequencies $p_t = \frac{k_t}{2N}$ and $q_t = 1 - p_t$, respectively, in generation $t$, the probability of obtaining $k_{t+1}$ copies of allele B in generation $t + 1$ is:

$$\binom{2N}{k_{t+1}} (p_t)^{k_{t+1}} (q_t)^{2N - k_{t+1}}$$

since $k_{t+1}$ gene copies are drawn from the $2N$ copies of each gene, consisting of $k_t$ copies of allele B and $2N - k_t$ copies of allele b. Then, the expected value and the variance of the allele frequency of B in generation $t + 1$, given the allele frequency in generation $t$, are:

$$E[p_{t+1} \mid p_t] = p_t, \qquad \mathrm{Var}(p_{t+1} \mid p_t) = \frac{1}{2N}\, p_t (1 - p_t)$$

By iterating this process, the mean and variance of the allele frequency of B after s generations from the initial generation, t = 0, can be obtained as follows (Crow and Kimura 1970):

$$E[p_s \mid p_0] = p_0, \qquad \mathrm{Var}(p_s \mid p_0) = p_0 (1 - p_0) \left[ 1 - \left( 1 - \frac{1}{2N} \right)^{s} \right]$$

Fig. 5 The allele frequency changes by genetic drift simulated for three population sizes n = 20, 200 and 2000. [Plot: three panels, one per population size, each showing the B allele frequency (vertical axis, ticks at 0.0, 0.4, and 0.8) against generations (horizontal axis, 0 to 50).]

If we treat time as a continuous quantity and assume $N$ is large, by letting $\Delta t = \frac{1}{2N}$ as a new timescale, we obtain:

$$2N p_{t+\Delta t} \mid p_t \sim B(2N, p_t)$$

where $B(n, p)$ denotes a binomial distribution with sample size $n$ and probability $p$. With a normal approximation to the binomial distribution, we have:

$$p_{t+\Delta t} \mid p_t \sim N\big(p_t,\; p_t (1 - p_t)\, \Delta t\big) \qquad (1)$$

where $N(\mu, \sigma^2)$ indicates the normal distribution with mean $\mu$ and variance $\sigma^2$. Furthermore, Eq. (1) corresponds to a stochastic differential equation:

$$dp_t = \sqrt{p_t (1 - p_t)}\; dw_t$$

where $w_t$ is a standard Brownian motion.

HWE states that allele frequencies and genotype frequencies will remain the same from generation to generation with random coupling of genes and assuming an infinite population (in practice, a sufficiently large population). The Wright-Fisher model states that, in a finite population, chance alone can lead to the disappearance of an existing allele, leaving the other allele at a frequency of 1, a phenomenon called fixation. In Fig. 5, simulated allele frequency changes under genetic drift are given for several population sizes. For each population size (n = 20, 200 and 2000), 10 scenarios of genetic drift are plotted assuming only half of the population participate in mating. When the population size is small, some populations undergo fixation by chance. In contrast, as the population size increases, these cases of fixation or near fixation occur less often in the simulation results.
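The binomial sampling step of the Wright-Fisher model can be simulated directly. The sketch below is an illustration under stated assumptions, not the procedure used to produce Fig. 5: every one of the $2N$ gene copies is resampled each generation (the figure instead assumes that only half of the population mates), the function names and parameter values are hypothetical, and the comparison value is the standard result $\mathrm{Var}(p_s \mid p_0) = p_0(1 - p_0)\left[1 - \left(1 - \frac{1}{2N}\right)^s\right]$.

```python
import random
import statistics


def next_generation(k, two_n, rng):
    """Draw the number of B copies in the next generation: Binomial(2N, k / 2N)."""
    p = k / two_n
    return sum(1 for _ in range(two_n) if rng.random() < p)


def simulate(k0, two_n, generations, rng):
    """Return the allele-frequency trajectory p_0, p_1, ..., p_s."""
    k = k0
    path = [k / two_n]
    for _ in range(generations):
        k = next_generation(k, two_n, rng)
        path.append(k / two_n)
    return path


rng = random.Random(2021)                 # fixed seed for reproducibility
two_n, k0, s, replicates = 100, 50, 10, 2000
finals = [simulate(k0, two_n, s, rng)[-1] for _ in range(replicates)]

p0 = k0 / two_n
theory_var = p0 * (1 - p0) * (1 - (1 - 1 / two_n) ** s)
print("mean of p_s:", statistics.fmean(finals), "expected:", p0)
print("variance of p_s:", statistics.pvariance(finals), "theory:", theory_var)
```

Frequencies 0 and 1 are absorbing in this sketch, since a lost allele can never be drawn again; that is the fixation behaviour described above, and it becomes rarer over a fixed number of generations as `two_n` grows.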

Study of Family History and Pedigrees

Twin Studies

One of the longest-running scientific issues, the nature versus nurture debate, is closely related to the history of genetics. Even before Mendel’s theories on the concept of genes and inheritance became known to the public, Francis Galton tried to explain the nature of heredity by studying how inheritance affects human behavior and characteristics (Galton 1874). To disentangle the effects of nature and nurture, with a belief in the primary role of the former, Galton investigated twins for the first time in history (Waller 2012). Since twins share almost the same environmental factors and genetic makeup, especially identical twins raised in the same family, research on twins was expected to provide the answer to this old debate. Studies on twins that were more scientifically sound than Galton’s attempt, based on the findings on genes and heredity in the early twentieth century, emerged at approximately the same time. This “classical” twin study design involves comparing the traits of monozygotic (MZ) twin pairs (identical twins) and dizygotic (DZ) twin pairs (nonidentical twins). MZ twins share exactly the same genetic variants, while DZ twins have 50% genetic similarity, the same as the genetic similarity between ordinary siblings, since they are derived from different eggs and sperm. The assumption of the classical twin study is that MZ and DZ twins share almost identical family environments, and thus, if a trait is genetically determined, at least to some degree, that trait in MZ twin pairs will be more similar than that in DZ twin pairs. The first twin study to compare MZ and DZ twins was a study on refraction in human eyes published by a German ophthalmologist, Walter Jablonski, in 1922 (Liew et al. 2005). He compared 40 MZ twins and 12 DZ twins of the same sex for


within-pair differences in refractive error and astigmatism. Two other independently designed twin studies also appeared in 1924: one study by Hermann Siemens (Siemens 1924) and another by Merriman (Merriman 1924). In Merriman’s study, the intelligence quotient (IQ) scores of MZ twins showed high correlation (98%) compared to that of the overall twin population (88%). The most commonly used analytical method in classical twin studies is a statistical method called variance component analysis, which is based on a mathematical model of a simple linear relationship between the variable representing the trait phenotype and the variables for the effects of genes and environment. If we denote shared genetic effects between twins as A, shared environmental effects as C and the residual as E, the linear model to account for the standardized trait phenotype Y (the mean is zero and the variance is one) is:

$$Y = A + C + E$$

If A, C, and E are independent within each twin pair, then Var(Y) = Var(A) + Var(C) + Var(E), i.e., the phenotype variance can be decomposed into genetic, environmental, and residual components. Here, Var(E) can be seen as the component representing the environmental influences that are not shared by family members and measurement error, and E is assumed to be independent for the twins in each pair. If Y is affected solely by genes and not by environmental factors, then the correlation between MZ twin pairs should be 1 and the correlation between DZ twin pairs should be 0.5. If Y is affected by only shared environmental factors, then the correlations of the Y values between MZ twin pairs and DZ twin pairs should both be 1. The correlation between MZ twin pairs should be:

$$r_{MZ} = \mathrm{Var}(A) + \mathrm{Var}(C) \qquad (2)$$

since A and C are identical and E is independent between the twins in each pair. Additionally, the correlation between DZ twin pairs should be:

$$r_{DZ} = \tfrac{1}{2}\, \mathrm{Var}(A) + \mathrm{Var}(C) \qquad (3)$$

since only 50% of genes are shared by DZ twin pairs. If we solve (2) and (3) for Var(A), we obtain:

$$\mathrm{Var}(A) = 2\,(r_{MZ} - r_{DZ})$$

which is called Falconer’s formula (Falconer and MacKay 1996) and can be used to obtain the heritability. The heritability is the degree of genetic effect on a specific trait and is mathematically defined as:

$$h^2 = \frac{\mathrm{Var}(A)}{\mathrm{Var}(Y)}$$

If Y is standardized, then h2 = Var(A). If the heritability is high, the trait can be considered to be determined largely by genes, i.e., by nature. If the heritability is close to zero, we can conclude that the trait has nothing to do with genetic influence, i.e., it is the result of the other factors. For example, recent studies of IQ, a phenotypic measurement of mental ability, have used twin studies (Visscher et al. 2008). In twin samples obtained from many studies, the average MZ and DZ correlations for IQ were 0.86 and 0.60, respectively, based on 4,672 MZ and 5,546 DZ twin pairs (Deary et al. 2006). The heritability obtained from these values using Falconer’s formula is 2(0.86–0.60) = 0.52, i.e., 52%. The estimated heritability for IQ has generally been reported as 50–80% by various types of studies (Bartels et al. 2002; McClearn et al. 1997). A recent study examined all twin studies conducted between 1956 and 2012 by meta-analysis, covering 17,804 traits from 14,558,903 twins, and reported that the heritability across all traits was 49%, concluding that the nature versus nurture debate should be settled by admitting that both genetic and environmental factors are equally important to human life (Polderman et al. 2015).
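The arithmetic behind Falconer's formula can be written out as a short sketch. The function name `ace_from_twin_correlations` is hypothetical. Besides Var(A), the sketch also solves Eqs. (2) and (3) for Var(C) and uses Var(Y) = 1 for Var(E); these two extra components follow from the model above but are not quoted in the text, so they should be read as derived values under the same assumptions (standardized Y, independent A, C, and E).

```python
def ace_from_twin_correlations(r_mz, r_dz):
    """Variance components of a standardized trait from twin correlations.

    Var(A) = 2 (r_MZ - r_DZ)      Falconer's formula
    Var(C) = 2 r_DZ - r_MZ        from Eq. (2): Var(C) = r_MZ - Var(A)
    Var(E) = 1 - r_MZ             since Var(Y) = Var(A) + Var(C) + Var(E) = 1
    """
    var_a = 2.0 * (r_mz - r_dz)
    var_c = 2.0 * r_dz - r_mz
    var_e = 1.0 - r_mz
    return var_a, var_c, var_e


# Average IQ correlations quoted above: 0.86 for MZ pairs, 0.60 for DZ pairs.
h2, c2, e2 = ace_from_twin_correlations(0.86, 0.60)
print("heritability:", round(h2, 2))
print("shared environment:", round(c2, 2))
print("residual:", round(e2, 2))
```

The three components sum to 1 by construction. A negative value for Var(C), which happens when $r_{MZ} > 2 r_{DZ}$, would indicate that this simple model does not fit the observed correlations.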

Genetic Linkage Mapping

Genetic linkage is a phenomenon that violates the law of independent assortment. When Mendel discovered the law of independent assortment, he did not know that genes are serially structured in chromosomes. Now, it is known that many genes reside together on each chromosome. For example, humans have approximately 19,000 protein-coding genes on 23 pairs of chromosomes, and fruit flies have approximately 15,000 genes on four pairs of chromosomes (Ezkurdia et al. 2014; Halligan and Keightley 2006). During meiosis, one of each pair of chromosomes is randomly selected to become the single set of chromosomes in a sex cell. For example, a woman’s egg can contain the chromosome 1 copy inherited from her mother and the chromosome 2 copy inherited from her father as a result of random selection. Therefore, if two genes are on different chromosomes, Mendel’s law of independent assortment applies. Genes on the same chromosome are another question. Early geneticists thought that the genes on the same chromosome were inherited together by offspring until Morgan discovered a contradictory phenomenon (Lobo and Shaw 2008). For example, if the genes for seed color and seed shape are on the same chromosome, and the two copies of the chromosome are separated intact during meiosis, then self-fertilization in the F1 generation should involve only two types of allele pairs for these two genes, (Y, R) and (y, r). (Here, Y and R represent the dominant alleles for seed color and shape, while y and r represent the recessive alleles for seed color and shape, respectively, as in the example in section “Mendel and His Inheritance Models.”) However, the actual meiotic mechanism is more complex than simple

Fig. 6 Illustration of cross-over (recombination) between a chromosome pair during meiosis. [Diagram: a chromosome pair carrying alleles A and B on one copy and a and b on the other is duplicated and crosses over; the four products of meiosis are AB (NR), Ab (R), aB (R), and ab (NR), where R denotes a recombinant copy and NR a nonrecombinant copy.]

random selection of chromosomes. During meiosis, the DNA of chromosome pairs can become mixed due to physical proximity, and when they are separated, a chromosome copy in a sex cell can contain some parts inherited from the father and some parts inherited from the mother (Fig. 6). This phenomenon is called crossing over or recombination. Since these recombination events occur at random locations on a chromosome, the alleles of genes positioned closely on a chromosome are more likely to be inherited together, whereas those of genes positioned farther apart are more likely to assort independently, similar to genes on different chromosomes. This phenomenon is called genetic linkage, and genes that are in physical proximity and passed on to gametes together are said to be linked. Genes that assort independently are said to be unlinked. The combination of alleles at different loci (positions) on the same member of a chromosome pair is called the haplotype. The positions where recombinations occur are determined randomly, and the recombination probability between two positions depends on the distance between them. Based on this biological model, the concept of a linkage study to find the location of susceptibility genes that are responsible for a trait was proposed through


the study of pedigree data (Dawn-Teare and Barrett 2005; Pulst 1999). If pedigree data including several generations of family members with their genotypes and phenotypes are available, then the inherited haplotypes at two loci can be logically or statistically inferred, and the occurrence of recombination events between the two loci can be determined or estimated. The number of recombination events among all meioses observed in the offspring generations can be used to estimate the recombination probability, called the recombination fraction and denoted by $\theta$, according to the relative frequency of recombination. If two genes are unlinked, the recombination fraction should be 0.5. If two genes are linked, the recombination fraction should be lower than 0.5. These two statements constitute the hypotheses tested in linkage studies:

$$H_0: \theta = \tfrac{1}{2} \quad \text{vs.} \quad H_1: \theta < \tfrac{1}{2}$$
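One simple way to see how the relative frequency of recombination bears on these hypotheses is an exact one-sided binomial test. This is only a sketch under strong assumptions that the text does not state: every meiosis is assumed to be fully informative, so recombinants can be counted directly, and the meioses are treated as independent. The function name and the counts in the usage lines are hypothetical, and this is not presented as the method used in linkage software.

```python
from math import comb


def recombination_test(recombinants, meioses):
    """Estimate the recombination fraction and test theta = 1/2 against theta < 1/2.

    Under the null hypothesis the number of recombinants is
    Binomial(meioses, 1/2), so the one-sided p-value is the probability of
    observing this many recombinants or fewer.
    """
    theta_hat = recombinants / meioses
    lower_tail = sum(comb(meioses, k) for k in range(recombinants + 1))
    p_value = lower_tail / 2 ** meioses
    return theta_hat, p_value


# Hypothetical data: 2 recombinants observed among 20 informative meioses.
theta_hat, p_value = recombination_test(2, 20)
print("estimated recombination fraction:", theta_hat)
print("one-sided p-value:", p_value)
```

A small p-value supports linkage between the two loci, whereas a count near half of the meioses is what independent assortment would produce.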
T, and $d(\phi(t_j, x_j), x_{j+1}) < \varepsilon$ for $j = 0, \ldots, p$.

In short, a chain recurrent point is a point that can arrive back near itself if arbitrarily small errors or perturbations are allowed as time passes.

Definition: The chain recurrent set for a flow $\phi$ is the set of all chain recurrent points for this flow.

Definition: A subset $A \subset X$ is said to be gradient-like if there is a continuous function that is strictly decreasing on nonconstant solutions. That is, there exists a continuous function $f : A \to \mathbb{R}$ such that for any nonequilibrium point $x \in A$ and times $s < t$ in $\mathbb{R}$, $f(\phi(s, x)) > f(\phi(t, x))$.

We can restate the Fundamental Theorem of Dynamical Systems now as follows:

Theorem (Fundamental Theorem of Dynamical Systems): Any flow on a compact metric space has a chain recurrent set, and on the complement of this set the flow is gradient-like.


W. Basener et al.

A point $x$ is forward asymptotic to a stationary point $y$ (i.e., an equilibrium) if $\phi(t, x) \to y$ as $t \to \infty$. We can generalize this notion of asymptotic behavior with ω-limit sets.

Definition: The ω-limit set of a point $x$ is the set

$$\omega(x) = \bigcap_{T > 0} \overline{\bigcup_{t > T} \phi(t, x)}.$$

Equivalently, $y$ is in the ω-limit set of $x$ if and only if there exists an infinite sequence $\{t_i\}$ with $t_i \to \infty$ such that $\phi(t_i, x) \to y$ as $i \to \infty$. Clearly, if a point $x$ is forward asymptotic to a stationary point $y$ then $\omega(x) = \{y\}$, but an ω-limit set can be a more complicated set, such as a limit cycle or a strange attractor. It is not hard to show that an ω-limit set is compact (in the topological sense) and invariant (if $y \in \omega(x)$ then $\phi(t, y) \in \omega(x)$ for all $t$).

If we make any physically reasonable assumptions about the state space for differential equations modeling evolution (for example, that there is an upper limit on the reproduction rate, even if it is excessively large such as $10^{20}$ births per second, or that there is a limit on the length of a DNA strand, even if it is excessively large such as $10^{100}$), the state space is a compact metric space, being a bounded subset of $\mathbb{R}^n$ for some $n$. Then the theorem of W. Basener (2013) says that every initial condition has an ω-limit set, and this set is chain recurrent. This gives the following theorem:

Theorem: For a differential equation model of evolution, every trajectory is either chain recurrent or is forward asymptotic to an ω-limit set which is chain recurrent.

It is clear that on a chain recurrent set, there is no long-term evolutionary progress. There can be no concept of increasing fitness because you cannot have a Lyapunov function on a chain recurrent set, and up to small perturbations or errors in measurement, every point will eventually return close to itself. A chain recurrent ω-limit set may be an equilibrium, a limit cycle, or a more complicated set where the system exhibits chaos, but there cannot be any long-term evolutionary progress.
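A one-dimensional toy flow may help make the definitions concrete. The sketch below is purely illustrative and is not a model from this chapter: the logistic equation $dx/dt = x(1 - x)$, the forward Euler scheme, the step size, and the function $f(x) = -x$ are all assumptions chosen for simplicity. On the open interval (0, 1) every trajectory is nonconstant, $f$ is strictly decreasing along it, and the trajectory is forward asymptotic to the equilibrium $x = 1$, so $\omega(x_0) = \{1\}$; the equilibria 0 and 1 themselves are the chain recurrent points.

```python
def flow(x0, t_end, dt=0.01):
    """Approximate the trajectory of dx/dt = x (1 - x) with forward Euler steps."""
    steps = int(round(t_end / dt))
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + dt * x * (1.0 - x))
    return xs


def f(x):
    """Candidate Lyapunov-type function, strictly decreasing along solutions in (0, 1)."""
    return -x


trajectory = flow(0.1, t_end=20.0)
values = [f(x) for x in trajectory]

# The trajectory approaches the equilibrium x = 1 (its omega-limit set).
print("final state:", trajectory[-1])
# f decreases strictly along the whole sampled trajectory (gradient-like region).
print("strictly decreasing:", all(a > b for a, b in zip(values, values[1:])))
# Started exactly at an equilibrium, the state never moves.
print("equilibrium stays put:", set(flow(1.0, t_end=5.0)) == {1.0})
```

Nothing in this toy example involves recurrence beyond equilibria; limit cycles or chaotic sets, mentioned above as other possible ω-limit sets, require at least two or three state variables, respectively, in a continuous-time model.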
This describes the general case for differential equations models for evolution, which by their structure assume an infinite population and perfect selection. All trajectories either have no long-term change in fitness (there does not exist a continuous function, i.e., fitness, that increases with time) or they are forward asymptotic to a set on which there is no long-term change in fitness. The trajectories of the first type are those on a chain recurrent set, which can be thought of as a generalization of mutation-selection balance, and those of the second type are in the gradient-like sets and have their ω-limit set in the chain recurrent set. As discussed elsewhere in this chapter, the two factors of finite population size and imperfect selection (for example, due to environmental noise and small heritability of fitness) cause natural selection to be less effective than idealized in differential equations models, and mutation effects thus will cause a decrease in fitness from the equilibrium or other chain recurrent limit set. In such cases, the population can experience a mutational meltdown of decreasing reproduction rate

81 Dynamical Systems and Fitness Maximization in Evolutionary Biology


and the ω-limit corresponds to extinction. This mutational meltdown is faster when the population size is small or the mutation rate is high. The same analysis holds for discrete time models, as both Conley’s Fundamental Theorem of Dynamical Systems and the theorem of W. Basener (2013) have discrete time versions.

Are There Laws in Biology?

Fisher’s work was an attempt to establish laws in biology akin to those in physics, and he famously compared his Fundamental Theorem of Natural Selection to the laws of thermodynamics. While the two cases are parallel in that both use statistics to estimate net behavior from a large number of discrete entities, there are important differences. Gas molecules are far more numerous than biological organisms, and more uniform in function and behavior, even if we put aside philosophical questions of animals having free will. As to whether laws exist in biology comparable to those in physics, and if so, what form those laws take, there are at least as many answers as there are researchers. Even if we were to restrict attention to laws involving fitness, there is no consensus on what fitness is. The usual notion of equating fitness with reproduction rate is fraught with contradictions (are E. coli of much higher fitness than humans?), and Michod (1999) documents 28 different published definitions of fitness. In Rice (2004), the author observes that “Though evolutionary theory is not built on the idea that any quantity is necessarily maximized, the idea that there is such a quantity remains one of the most widely held misconceptions about evolution.” While relatively easy to quantify, reproduction rate is an incomplete measurement of fitness, although the two are often equated. There are many situations where natural selection by reproduction rate has caused a loss of functionality, for example, cave salamanders losing sight. This may select organisms with the highest reproduction rate and is a form of adaptation, but the loss of function should not be considered an increase in fitness. In the author’s opinion, some form of net biological functionality is a more important component of fitness than reproduction rate alone.
In the author’s opinion on laws in biology relating to fitness, there are certainly laws that can be derived for mathematical models, but these “laws” only apply to the extent that the reduction of biology to math is precise. While there are strong general principles, biological organisms, with their engineering principles and genetic information, are too complex, and this complexity, combined with probabilistic events in limited population sizes, does not enable the reduction to determinism that Fisher theorized. Application of a model in population dynamics always involves understanding the model as well as the ways in which biological populations differ from the model. For example, our fundamental theorem of natural selection with mutations gives the contribution to fitness change from selection and mutations for an infinite population where genetic fitness is equal to phenotypic fitness and with the given parameter values, but biological populations are finite, phenotypic fitness is not


W. Basener et al.

equal to genotypic fitness, and probabilistic events act like noise in reproduction, all of which reduce the effectiveness of selection from the ideal formalization. In addition, the parameter values change with time.

A Biological Experiment, Individual Mutations, Adaptation, and Fitness

Having discussed the foundational experiments involving change in genetic information over time from mutation, inheritance, and selection in chapter “Coevolution of Mathematics, Statistics, and Genetics”, and having presented mathematical models for these systems of processes in sections “Historical Development of Natural Selection and Genetics,” “Fisher’s Fundamental Theorem of Natural Selection,” “The Problem of Genetic Mutation, Comprehensive Simulations and Comprehensive Fitness,” and “Evolutionary Models, Dynamical Systems, and Maximization Principles,” we turn in this subsection to current experiments involving mutation, selection, and change in genetic information over time. This allows us to consider how different models contribute to our understanding of reality. Ideally, one would consider a variety of biological situations, comparing mathematical models to large and small populations, organisms with mutation rates ranging from low to high, sexually and asexually reproducing organisms, populations both in controlled lab situations and in wild environments, and the list could go on. Such a comparison is clearly far too comprehensive for the current setting.

The Long-Term Evolutionary Experiment

In this section, we focus on Lenski’s famous Long-Term Evolutionary Experiment (LTEE), which is probably the best documented and most closely monitored process of change in genetic information and reproduction rates over a long term. This experiment began February 15, 1988, when Dr. Lenski and his colleagues established 12 genetically identical E. coli populations from the same ancestral strain, and it continues today. Since then, each population has been grown in its own separate fluid-filled flask with glucose as the carbon source (and with citrate added as a type of buffer). For all this time, each population has been transferred to fresh nutrient media daily. Each day the glucose and other nutrients in the medium are depleted, so 1% of each population must be transferred to a new flask with fresh medium, allowing continued growth. The 12 replicate populations have averaged 6.6 rounds of cell division (generations) each day. Every 500 generations a sample of each population is stored in a freezer, creating what Dr. Lenski calls his “frozen fossil record.” As of 2014, there have been over 60,000 bacterial generations since the experiment began, with frozen samples filling six freezers. At different times throughout the experiment, the original ancestral strain was retrieved from the freezer and was grown together in a mixed culture with each of the 12 continuously growing populations (the so-called “evolving” strains). The purpose of this was to


do head-to-head fitness competition tests to determine if the continuously growing E. coli populations had developed a competitive edge over the ancestral strain (note: “fitness” was always measured based on growth rate compared to the ancestor in the artificial environment). Lenski and collaborators hoped to experimentally demonstrate that all 12 E. coli populations were evolving continuously over time. To monitor this, the genomes of the 12 so-called “evolving” strains were periodically sequenced (at generations 2k, 5k, 10k, 20k, 40k, etc.) and compared to the ancestral genome. This gave Lenski and collaborators a chance to observe and analyze the spontaneous mutations that contributed to adaptation and fitness gain. With respect to the mathematical models, the E. coli represent a population that effectively has an abundant supply of food and spatial room, involves almost no environmental noise in the prepared homogeneous culture flasks, has a low mutation rate of around 10^-3, and has a large effective population size. This provides an optimal situation for natural selection to increase fitness, according to the model and simulations in section “The Problem of Genetic Mutation” and Fig. 8. Importantly, this is not an observation of a population undergoing change in genetics in the natural environment to which it has adapted, but an observation of a population that has been introduced to a novel lab environment, so the expectation under all models should be an initial increase in fitness due to adaptation (due to change in frequency of present alleles and the introduction and fixation of mutations that are beneficial in the new environment but not necessarily in the wild environment). The mutation-limited models predict the rate and level for fixation of new beneficial mutations, if and when they come.
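The figure of about 6.6 generations per day follows directly from the daily 1% transfer: after a 100-fold dilution the population must regrow 100-fold, which takes log2(100) doublings. The short sketch below works through that arithmetic and the resulting generation count. The start date and dilution are taken from the text; the assumption that transfers ran every single day without interruption is a simplification made here, so the total is an upper-end estimate rather than the experiment's recorded count.

```python
import math
from datetime import date

DILUTION_FRACTION = 0.01          # 1% of each population is transferred daily (from the text)
START_DATE = date(1988, 2, 15)    # start date as stated in the text

# After a 100-fold dilution the culture regrows 100-fold before nutrients run out.
regrowth_factor = 1.0 / DILUTION_FRACTION
generations_per_day = math.log2(regrowth_factor)

# A sample is frozen every 500 generations.
days_per_frozen_sample = 500 / generations_per_day

def estimated_generations(on_date):
    """Generations elapsed by on_date, ASSUMING one uninterrupted transfer per day."""
    return (on_date - START_DATE).days * generations_per_day

estimate_2014 = estimated_generations(date(2014, 1, 1))

print(f"generations per day: {generations_per_day:.2f}")
print(f"days between frozen samples: {days_per_frozen_sample:.1f}")
print(f"estimated generations by 2014 (no interruptions): {estimate_2014:.0f}")
```

The estimate is consistent with the statement that over 60,000 generations had elapsed as of 2014.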

Mutation-Selection-Reproduction Experimental Results

Over the years since the experiment began, the results have been published in numerous scientific journals, and a full updated list of publications is available from Lenski’s website (Richard Lenski n.d.). Mean fitness was reported to increase rapidly in the first few thousand generations and continued to improve, though more slowly, for 20,000 generations, with mutations accumulating at a constant rate. During this time, the 12 E. coli populations became better adapted to the glucose medium and experienced a total fitness gain of 67%; this means that after 20,000 generations, the descendants were able to grow 1.67 times faster than the ancestor. After 20,000 generations, fitness had largely leveled off with few adaptive changes until about 31,500 generations into the experiment. At that time, one of the populations gained the ability to take up the citrate within the medium into the cell, to be used as a nutrient. Before that generation, the E. coli utilized only glucose as its carbon source. This new adaptation was caused by several complementary random mutations, which allowed unregulated (continuous) expression of the gene controlling citrate uptake. The change in reproduction rate over time is shown in Fig. 10. Either model could reasonably be used to model this experiment if appropriate parameters for the effects of mutations on fitness are used. Observe that many of the numerical simulations using our mutation-selection model shown in Fig. 8 involve


Fig. 10 The most dramatic fitness gains occurred within the first few thousand generations. After 2000 generations, mean fitness had increased 37% (a fitness of 1.37). By 20,000 generations, mean fitness had increased 67% (a fitness of 1.67) and had largely leveled off, with few adaptive changes thereafter. By 50,000 generations, the further fitness improvement was almost imperceptible (3%). To date, the E. coli populations have improved by a total of 70% relative to the ancestor. Note: The red line represents the best-fit curve using the hyperbolic model, based on the complete data set over the course of 50,000 generations. It shows fitness has reached a near maximum and is not expected to significantly increase. The blue line represents an alternate way of interpreting the data (using the power-law model) that is favored by Lenski and colleagues. It suggests fitness will increase without bound, implying uninterrupted evolutionary advance. The problem with this interpretation is that their definition of fitness is growth rate, and obviously growth rate cannot increase without limit. (Image from Wiser, Science, vol. 342, 2013)
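The two curve families named in the caption differ in their long-run behavior, and this can be checked directly. The sketch below assumes the functional forms usually given for these models, w(t) = 1 + at/(t + b) for the hyperbolic model and w(t) = (bt + 1)^a for the power-law model; the parameter values are hypothetical and chosen only for illustration, not fitted to the LTEE data.

```python
def hyperbolic_fitness(t, a, b):
    """Hyperbolic model (assumed form): w(t) = 1 + a*t/(t + b).
    Bounded above by the asymptote 1 + a."""
    return 1.0 + a * t / (t + b)

def power_law_fitness(t, a, b):
    """Power-law model (assumed form): w(t) = (b*t + 1)**a.
    Increases without bound for a > 0, although ever more slowly."""
    return (b * t + 1.0) ** a

# Hypothetical, illustrative parameters (NOT fitted values).
HYP_A, HYP_B = 0.7, 5000.0
POW_A, POW_B = 0.1, 0.005

for generations in (2_000, 20_000, 50_000, 10**6, 10**9, 10**12):
    w_hyp = hyperbolic_fitness(generations, HYP_A, HYP_B)
    w_pow = power_law_fitness(generations, POW_A, POW_B)
    print(f"t = {generations:>14,}: hyperbolic {w_hyp:.3f}   power law {w_pow:.3f}")
```

Over the range of the data the two curves can be hard to tell apart; they diverge only under extrapolation, which is where the interpretation of the figure is contested.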

an initial adaptation phase with fitness increase (effectively due to optimization of preexisting allele frequencies) followed by either stasis or fitness decline, depending on parameters and which realistic effects are considered. A factor that would alter the adaptation phase in real biological populations such as those of the LTEE is that over time mutation accumulation causes both the mutation rate and the distribution of effects of mutations to change, as studied particularly. This suggests that our mutation-selection model could be more realistic if a mutational-effect distribution is allowed to vary with time, but estimating mutational effects for a population in stasis is sufficiently difficult (see Kibota and Lynch (1996) and Schneider et al. (2011)), and change in mutational-effect distribution is not consistent even in repeated trials with the identical initial population, as specifically studied in A. (2018). This is also evident from comparison between populations in the LTEE; six of the 12 populations have developed defective DNA repair systems and so now are accumulating slightly deleterious mutations at a much higher mutation rate than normal (Pennisi 2013). Thus, it seems practically best to assume biological populations introduced to a new environment can undergo an adaptation phase transitioning to stasis in which our mutation-selection model becomes increasingly applicable. Much of the increase in reproduction rate fitness observed in the LTEE during adaptation documented in Fig. 10 resulted from a small number of mutations that have been documented through gene sequencing. A table listing these mutations


Table 3 The top 10 mutations that have caused almost all of the increase in reproduction rate, along with details for each mutation

| Gene or region | Function | Population(s) | Generation established | Fitness gain (%) | Mode of adaptation |
| --- | --- | --- | --- | --- | --- |
| topA | DNA topoisomerase 1 | 10 of 12 | 2000 | 13.3 | Reduction |
| pykF | Pyruvate kinase | 12 of 12 | 5000 | 11.1 | Inactivation |
| spoT | Stringent response regulator | 8 of 12 | 2000 | 9.4 | Reduction |
| nadR | Transcriptional regulator | 12 of 12 | 5000 | 8.1 | Inactivation |
| glmU promoter | Cell-wall biosynthesis | 1 of 12 | 5000 | 4.9 | Reduction |
| fis | Nucleoid-associated protein | 10 of 12 | 10,000 | 2.9 | Reduction |
| rbs operon | Ribose catabolism | 12 of 12 | 2000 | 2.1 | Deletion |
| malT | Transcriptional regulator | 8 of 12 | 5000 | 0.4 | Deletion/reduction |
| pbpA-rodA | Cell-wall biosynthesis | 6 of 12 | 2000 | – | Reduction |
| citT | Citrate transporter | 1 of 12 | 31,000 | – | Loss of regulation |

along with the gene or region in which the mutation occurred, the function that was affected, the number of populations which experienced this mutation, the generation in which the mutation was first observed, the fitness gain (in the sense of reproduction rate) that was observed, and the mode of adaptation describing how the mutation affects the associated function, is provided in Table 3. The data in this table is from, S. Wielgoss (2013), and Pennisi (2013). Each of the documented mutations that reached fixation caused an increase in reproduction rate and a simultaneous loss of some functionality. For example, within the first 2000 generations, mutations arose in 8 out of 12 of the populations within a gene known as spoT, which is involved in the regulatory control of multiple genes, as detailed in the third row of Table 3. One of the genes controls the expression of the flg operon, which encodes the bacterial flagellum, the tiny whip-like cord that propels bacteria through their aqueous environment. Researchers found that this type of mutation reduced the expression of the flagellum-encoding genes, which turned out to be advantageous to E. coli bacteria growing in shaker flasks. In the Proceedings of the National Academy of Sciences, Cooper and colleagues explain how the advantage was obtained:

“First, the array data show that the spoT mutation lowers the expression of the flagella-encoding flg operons. The ancestral strain used in the evolution experiment was non-motile, the selective environment lacked physical structure, and the production of flagella is known to be costly. Hence, reducing the expression of these genes could be beneficial.”

Since the flagellum-encoding genes were unnecessary for the nonmotile bacteria and burdensome to maintain in the artificial environment, the reduction mutations conferred a 9.4% improvement in fitness, as shown in Table 3.
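The rows of Table 3 can be encoded as a small data structure so that the pattern in the last column is easy to tally. The sketch below simply transcribes the table; the tuple layout and variable names are choices made here for illustration, and `None` stands for the two entries where the table reports no fitness gain. The gains are deliberately not summed, since the table does not say whether the individual effects combine additively.

```python
from collections import Counter

# (gene or region, populations out of 12, generation established,
#  fitness gain in percent or None if not reported, mode of adaptation)
TABLE_3 = [
    ("topA", 10, 2000, 13.3, "Reduction"),
    ("pykF", 12, 5000, 11.1, "Inactivation"),
    ("spoT", 8, 2000, 9.4, "Reduction"),
    ("nadR", 12, 5000, 8.1, "Inactivation"),
    ("glmU promoter", 1, 5000, 4.9, "Reduction"),
    ("fis", 10, 10000, 2.9, "Reduction"),
    ("rbs operon", 12, 2000, 2.1, "Deletion"),
    ("malT", 8, 5000, 0.4, "Deletion/reduction"),
    ("pbpA-rodA", 6, 2000, None, "Reduction"),
    ("citT", 1, 31000, None, "Loss of regulation"),
]

# How often each mode of adaptation appears in the last column.
mode_counts = Counter(mode for *_, mode in TABLE_3)

# Mutated regions found in every one of the 12 replicate populations.
in_all_populations = [gene for gene, pops, *_ in TABLE_3 if pops == 12]

# Rows that report a fitness gain, and the largest single reported gain.
with_reported_gain = [(gene, gain) for gene, _, _, gain, _ in TABLE_3
                      if gain is not None]
largest_gain = max(with_reported_gain, key=lambda pair: pair[1])

print(mode_counts)
print("in all 12 populations:", in_all_populations)
print("largest reported gain:", largest_gain)
```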


The top nine mutations providing increase in reproduction rate, shown in Table 3, all involve a clear loss of function: reduction in gene expression, inactivation, or complete deletion of a gene (last column). The single mutation that appeared to some to be a novel gain in function was in the citT region, which modified the E. coli so that it would utilize the citrate (initially included in the medium as a buffer) aerobically. This is considered to be the most important mutational advance in the LTEE, and “a key innovation” of the experiment. The genes, proteins, and functionality involved are not simple, but the basic concept is that E. coli normally has the ability to utilize citrate, but this utilization is regulated in wild-type E. coli and only invoked when there is a lack of other nutrients, as shown experimentally in Hall (1982). The citT mutation broke this regulation mechanism, and because the E. coli in the experiment were in a medium containing abundant citrate, this deregulation of citrate uptake resulted in an increase in reproduction rate. Thus, this mutation does not generate a new functionality or new genetic information; it is simply deregulation of preexisting functionality. This was described in Minnich et al. (2016), comparing to related experiments, as follows:

“E. coli cannot use citrate aerobically. Long-term evolution experiments (LTEE) performed by Blount et al. (Z. D. Blount, J. E. Barrick, C. J. Davidson, and R. E. Lenski, Nature 489:513–518, 2012, https://doi.org/10.1038/nature11514) found a single aerobic, citrate-utilizing E. coli strain after 33,000 generations (15 years). This was interpreted as a speciation event. Here we show why it probably was not a speciation event. Using similar media, 46 independent citrate-utilizing mutants were isolated in as few as 12 to 100 generations. Genomic DNA sequencing revealed an amplification of the citT and dctA loci and DNA rearrangements to capture a promoter to express CitT, aerobically. These are members of the same class of mutations identified by the LTEE.”

The authors conclude (Minnich et al. 2016):

“We conclude that the rarity of the LTEE mutant was an artifact of the experimental conditions and not a unique evolutionary event. No new genetic information (novel gene function) evolved.”

They further observe that their “findings parallel the conclusions from bacterial starvation studies by Zinser and Kolter (Kolter 2004) in which E. coli adaptations were dominated by changes in the regulation of preexisting gene activities rather than by the generation of new gene activities, de novo.” In fact, it had been shown as early as 1982 in Hall (1982) that E. coli can gain the ability to utilize citrate via mutation, and that in both Hall’s work and the LTEE the process incorporated genes that maintain their normal function but exhibit expanded expression through deregulation.

LTEE Experiment and Mathematical Modeling Conclusion

The LTEE provides important lessons regarding the result of mutation and selection over the long term. First, we see that adaptation via fixation of beneficial mutations does cause an increase in reproduction rate. But these “beneficial” mutations are only


beneficial in a very narrow sense; in every case, they involve a loss of biological function. This type of selection is selecting for a single trait – reproduction rate – and selection for a single trait will, as a general principle, sacrifice other traits for the sake of the optimized one. A good comparison might be optimizing a car for a drag race by stripping out unnecessary weight, such as air conditioning, seats other than the driver’s, etc. The LTEE team observed that deletion mutations caused a reduction in the E. coli genome size in 10 of the 12 clones by amounts ranging from 0.9% to 3.5% of the ancestral genome size (Schneider et al. 2014), concluding: . . . generally, the deleted genes have functions that are not used under the conditions prevailing during the LTEE. These deletions might have conferred higher fitness by eliminating unnecessary and costly gene expression . . .

While these few reproduction-rate-beneficial mutations have been happening, a much larger number of mutations have been accumulating which are not beneficial by any measurement. Most of these non-beneficial mutations should be slightly reproduction-rate-deleterious. Six of the 12 populations have developed defective DNA repair systems and so now are accumulating such slightly deleterious mutations at a much higher rate than normal (Pennisi 2013). Such populations have by now accumulated well over a thousand slightly harmful (deleterious) mutations per cell. Lenski et al. describe this as a growing “genetic load” that is in tension with adaptation (Wielgoss et al. 2013), which parallels the tension between the selection term and the mutation term in our fundamental theorem of natural selection with mutations, Theorem 2 in section “Mutation-Selection Models with More Realistic Factors.”

Maximization of Net Biological Function

We have discussed how, in both mathematical models and experiments, increase (maximization) of reproduction-rate-fitness is difficult, and most populations are in mutation-selection equilibrium, with a possibly observable accumulation of slightly deleterious mutations. Even in the presence of adaptation that increases reproduction-rate-fitness, the mutations that increase reproduction rate do so by causing a loss of biological function that is unnecessary in the new environment, a process called reductive evolution. In this section, we consider what actually seems to be maximized in biological organisms, namely net biological function, which does not appear to be maximized by the mutation-selection process, even when this process acts to increase reproduction rates. The perspective that natural selection cannot globally maximize fitness is increasingly accepted in current mathematical models of genetics and selection, yet there are many life functions that are clearly maximized, or are very nearly optimized. So, what is actually being maximized in biology? In the 2015 Hans Bethe memorial lecture entitled “More Perfect than We Imagined: A Physicist’s View of Life” (Bialek 2015), Princeton biophysicist William Bialek recounted how living systems appear to be astoundingly maximized, not in terms of reproductive success, but in


terms of system performance. He explains that life systems approach the maximum levels allowed by physics. Life systems that involve sensory organs, information processing, and noise reduction routinely approach the absolute limits of what is possible in terms of the laws of physics, given the size of the organism, the materials used, and other constraints. To understand the spirit of what biological maximization is, consider maximization of performance in the man-made realm. If an engineer is given the task of making a communication system that links computers to each other via a copper coaxial cable, there is a theoretical maximum to the number of bits that can be pushed across the copper wire based on the laws of physics. Of course, the engineer could choose another substrate like glass instead of copper, but that is not what the spirit of maximization means. Maximization in this sense means, given certain constraints (such as materials, energy available, size, etc.), a system is constructed in such a way that certain aspects of performance (like signal flow, or sensory detection) reach the peak of what is allowed by the laws of physics. Bialek’s graduate-level biophysics textbook recounts numerous examples of performance maximization that should humble the most talented teams of scientists and engineers. For example, Bialek writes about a bacterium trying to sense concentration gradients of the chemicals in the environment it lives in (Bialek 2012): To reach the observed performance, it has to count nearly every molecule that arrives at its surface. Even with this nearly ideal behavior, it can work only by making comparisons across time, not space, and estimates of time derivatives have to be averaged for a few seconds, not more and not less.

There can hardly be a more precise way for an organism to monitor its environment than for it to count specific molecules that it encounters. In addition to “molecule counting,” in a chapter devoted to the topic, Bialek describes eyes of various creatures that are capable of “photon counting.” Bialek also describes how some animal sensory organs are so well-crafted that they sense tiny vibrations with amplitudes approaching the atomic scale:

“There is a particular species of neotropical frog that exhibits clear behavioral responses to vibrations of the ground that have an amplitude of ∼1 Å. Individual neurons that carry signals from the hair cells in the sacculus to the brain actually saturate in response to vibrations of just ∼10 Å = 1 nm. Although there are controversies about the precise numbers, the motions of our eardrums in response to sounds we can barely hear are similarly on the atomic scale. Invertebrates don’t use hair cells, but they also have mechanical sensors, and many of these respond reliably to motions in the angstrom or even sub-angstrom range.”

The incredible biophysical achievement now being discovered throughout the biological realm is entirely discordant with the crude operation of the mutation/selection process. Natural selection can only act on total (macro) fitness. Selection acts on the level of the whole organism; it does not happen on the nano-scale. Positive selection only sees those extremely rare and grossly beneficial mutations that are impactful enough to significantly improve the total net fitness, taking into account all the functions of the whole organism. Such extremely rare mutations require an immediate survival benefit, and since such mutations arise randomly, it is logical to assume such mutations must be only approximate, never


perfect. This chapter has shown that natural selection is too crude a tool to maximize total (macro) fitness on the scale of the intact individual. While natural selection can sometimes select for rare high-impact beneficial mutations, those mutations need to be in the context of many other nucleotides, which individually have little or no effect but must operate in concert to have any function. Most emphatically, natural selection cannot act on those many nearly neutral mutations that are required to establish multicomponent super-optimal biological systems, which operate on the nano-scale. Almost any engineer involved in the industry of aviation and maritime navigation might wonder how a monarch butterfly can navigate thousands of miles using a magnetic compass system smaller than a pin head. Quantum physicists marvel at a bird’s ability to navigate by sensing magnetic fields by leveraging aspects of quantum spin chemistry that exceed “the best comparable man-made molecular systems” (Gauger et al. 2011). One of the fundamental principles of biological systems, which was observed by Darwin and is orders of magnitude clearer today, is that biological organisms are phenomenally maximized for net biological function. Mathematical models of natural selection are only weakly and locally able to optimize by reproduction rate, act on whole genomes and not individual components, and provide no observable pathway for the optimization of net biological functionality that has so strongly been optimized. This is one of the great problems in understanding genetic change over time with respect to observable biological organisms and processes.

Conclusion

This chapter presented the biological processes generating change in a population’s fitness over time: reproduction, mutation, competition, and genetic inheritance. A historical review of these processes was given along with the associated phenomena experimentally observed in biological populations. Then mathematical models of these underlying and resulting phenomena were presented, comparing the results of the models to observations of biological populations. We showed that an understanding of both the models and the physical biological systems being modeled is essential for the modeling process, and that mathematical models which are simple for the sake of tractability can be misleading because of the complexity of biological function and information. At best, such models describe the behavior of one component of the larger biological system. Broadly speaking, fitness corresponds to the ability, or probability, that an organism will reproduce to pass on its genetic traits, and is often equated with reproduction rate. Differences in fitness among organisms in a population, leading to the more fit members passing on more of their genetic information over time, are called natural selection, or survival of the fittest. This was proposed by Charles Darwin as the underlying mechanism of evolution, including speciation. We reviewed Mendelian genetics, and how this was originally seen as being inconsistent with the Darwinian


view until work in the first half of the twentieth century, prominently by Ronald Fisher but also by Haldane and Wright, unified the view of discrete genetics with the statistical and probabilistic view of traits in large populations. This historically led to Neo-Darwinism, the view that mutations provide genetic variance and natural selection converts variance into fitness increase, known as fitness maximization. We showed that Fisher’s model, and his famous theorem, suggested that this increase should be perpetual, giving support to the Neo-Darwinian view. However, in the second half of the twentieth century, the impact of mutations has become better understood experimentally, and contrary to Fisher’s view, the effect of mutations cannot be considered fitness neutral. We re-derived Fisher’s model, but including mutational effects, and showed that the rate of change in mean fitness (reproduction rate) is the sum of a positive term for selection and a (usually negative) term for mutations. This model is for the infinite ideal case, and suggests a mutation-selection equilibrium, which is consistent with biological observations. Moreover, as realistic factors such as small/finite population size and environmental noise are considered, numerical simulations show the likelihood of fitness decline. A proper view of modeling considers each model as providing insight into only the part of the biological system being modeled. Mutation-limited models provide insight into which mutations might reach fixation through selection, the likely allele frequencies, and the rate at which this will happen, but can only consider one mutation at a time.
Our model for the fundamental theorem of natural selection with mutations (FTNSWM) considers the collective effect of multiple simultaneous mutations (humans, for example, have approximately 100 de novo mutations per birth) modeled by a probability distribution. Numerical simulations can be created from this model that also consider the effects of finite or small populations and environmental noise, which decrease the effectiveness of selection and can lead to genetic meltdown. The LTEE results show that during adaptation, mutations occur that can be modeled by the mutation-limited models because they are rare and increase reproduction rate, but these experiments also show these mutations to be reducing net functionality. At the same time, many small-effect mutations (very slightly deleterious mutations, or VSDMs) accumulate according to our FTNSWM model, creating genetic load that cannot be selected out because the individual mutation effects are too small for selection and organisms accumulate many VSDMs that cannot be selected against individually. Comprehensive numerical simulations like Mendel’s Accountant can consider all these effects, being the most complete models of reality considering any known phenomenology, but at the expense of mathematical tractability.
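The statement that the rate of change in mean fitness splits into a non-negative selection term and a (usually negative) mutation term can be checked on a toy system. The sketch below is not the chapter's FTNSWM model; it assumes a simplified infinite-population model with five discrete fitness classes m_i and frequencies P_i obeying dP_i/dt = P_i (m_i − m̄) + μ (Q_i − P_i), where Q is the distribution after every mutation moves an individual one class down. All numerical values are hypothetical. In this toy model the selection term equals the variance in fitness, and long-run integration settles at a mutation-selection equilibrium.

```python
import math

FITNESS = [0.90, 0.95, 1.00, 1.05, 1.10]   # hypothetical fitness classes m_i
MUTATION_RATE = 0.02                        # hypothetical mutation rate mu

def mean_fitness(freq):
    return sum(m * p for m, p in zip(FITNESS, freq))

def fitness_variance(freq):
    m_bar = mean_fitness(freq)
    return sum(p * (m - m_bar) ** 2 for m, p in zip(FITNESS, freq))

def after_mutation(freq):
    """Distribution Q: each class moves one step down; the lowest class stays put."""
    shifted = freq[1:] + [0.0]
    shifted[0] += freq[0]
    return shifted

def derivative(freq):
    """dP_i/dt = P_i (m_i - mean) + mu (Q_i - P_i): selection plus mutation."""
    m_bar = mean_fitness(freq)
    mutated = after_mutation(freq)
    return [p * (m - m_bar) + MUTATION_RATE * (q - p)
            for m, p, q in zip(FITNESS, freq, mutated)]

start = [0.10, 0.20, 0.40, 0.20, 0.10]

# Decomposition of d(mean fitness)/dt at the starting distribution.
selection_term = fitness_variance(start)                      # always >= 0
mutation_term = MUTATION_RATE * (mean_fitness(after_mutation(start))
                                 - mean_fitness(start))       # <= 0 here
total_rate = sum(m * dp for m, dp in zip(FITNESS, derivative(start)))

# Forward-Euler integration until the mean stops changing.
freq = list(start)
dt = 0.1
for _ in range(50_000):
    freq = [p + dt * dp for p, dp in zip(freq, derivative(freq))]
equilibrium_mean = mean_fitness(freq)

print("selection term:", selection_term)
print("mutation term: ", mutation_term)
print("total rate:    ", total_rate)
print("equilibrium mean fitness:", equilibrium_mean)
```

In this toy setting the mean does not climb to the top class value of 1.10; it stalls where the selection and mutation terms cancel. Finite population size and environmental noise, which the chapter argues weaken selection further, are not included in the sketch.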

Skepticism of Fitness Maximization

This skepticism regarding fitness maximization is common among those studying natural selection in genetics, both biologically and through modeling. Grodwohl, in


his 2016 paper “‘The Theory was Beautiful Indeed’: Rise, Fall and Circulation of Maximizing Methods in Population Genetics (1930–1980),” gives a historical review of the attempts to find fitness maximization in population genetics, and the eventual abandonment of the pursuit (Grodwohl 2017). Birch, in his 2015 paper “Natural selection and the maximization of fitness,” discusses the rejection of fitness maximization in population genetics (Birch 2015). However, he documents the contradiction that researchers in other fields “take it for granted that natural selection can be regarded as a process of fitness maximization.” He furthermore observes that:

“The notion that natural selection is a process of fitness maximization gets a bad press in population genetics, yet in other areas of biology the view that organisms behave as if attempting to maximize their fitness remains widespread.”

Birch provides the following quotes from three textbooks in ecology that explicitly state a fitness maximization principle from natural selection as an unquestioned axiom of the field:

The majority of analyses of life history evolution considered in this book are predicated on two assumptions: (1) natural selection maximizes some measure of fitness, and (2) there exist trade-offs that limit the set of possible [character] combinations. Roff (1992, p. 393)

The second assumption critical to behavioral ecology is that the behavior studied is adaptive, that is, that natural selection maximizes fitness within the constraints that may be acting on the animal. Dodson et al. (1998, p. 204)

Individuals should be designed by natural selection to maximize their fitness. This idea can be used as a basis to formulate optimality models. Davies et al. (2012, p. 81)

Natural selection and presumed fitness maximization have often been cited as the reason why biological systems can maximize their performance, even approaching the absolute limits of physics as discussed in section “Maximization of Net Biological Function.” Mathematical models of natural selection provide no pathway to maximize fitness measured as net biological functionality, and selection operates on the level of the whole organism, and so cannot maximize the performance of biological systems operating just above the atomic level. This problem is profound, and is compounded by the fact that when natural selection is most effective, it is most often reductive in nature, involving loss of function (Behe 2010). Therefore, it seems incredible that natural selection could maximize functionality and performance to the level observed in biological systems. In Darwin’s day, there was no such thing as “biophysics.” Yet even then, Darwin saw the problem. Darwin cited “organs of extreme perfection and complication” (Darwin 1959) as a major difficulty of his theory. Darwin himself said, “to suppose that the eye with all its inimitable contrivances . . . could have been formed by natural selection, seems, I freely confess, absurd in the highest degree.” Historically, one of the fundamental reasons for trying to formally prove fitness maximization was to demonstrate that evolution really could create organs of extreme perfection and complication. Contrary to this goal, mathematical models and experiments suggest that evolution working through mutation and selection provides no such mechanism for optimization or formation.


W. Basener et al.

References

Aris-Brosou S (2019) Direct evidence of an increasing mutational load in humans. Mol Biol Evol 36(12):2823–2829. https://doi.org/10.1093/molbev/msz192
Barton NH, Briggs DE, Eisen JA, Goldstein DB, Patel NH (2007) Evolution. Cold Spring Harbor Press, Cold Spring Harbor
Basener W (2013) Limits of chaos and progress in evolutionary dynamics. Biol Inform New Perspect:87–104. https://www.worldscientific.com/doi/abs/10.1142/9789814508728_0004
Basener WF, Sanford JC (2018) The fundamental theorem of natural selection with mutations. J Math Biol (Springer) 76:1589–1622. https://doi.org/10.1007/s00285-017-1190-x
Bataillon T (2000) Estimation of spontaneous genome-wide mutation rate parameters: whither beneficial mutations? Heredity 84:497–501. https://doi.org/10.1046/j.1365-2540.2000.00727.x. Accessed May 2020
Bateson W, Saunders ER, Punnett RC (1904) Reports to the evolution committee of the Royal Society. Harrison and Sons, London. Printers. https://archive.org/details/RoyalSociety.ReportsToTheEvolutionCommittee.ReportIi.Experimental. Accessed 9 Mar 2020
Baumgardner J, Brewer W, Sanford JC (2013) Can synergistic epistasis halt mutation accumulation? Results from numerical simulation. In: Marks RJ, Behe MJ, Dembski WA, Gordon B, Sanford JC (eds) Biological information – new perspectives. World Scientific, Ithaca, pp 338–368, 312–337
Behe M (2010) Experimental evolution, loss-of-function mutations, and “the first rule of adaptive evolution”. Q Rev Biol 85(4):419–445
Bhattacharjee Y (2014) The vigilante. Science 343(6177):1306–1309
Bialek W (2012) Biophysics: searching for principles. Princeton University Press, Princeton
Bialek W (2015) More perfect than we imagined: a physicist’s view of life. https://www.cornell.edu/video/william-bialek-physicists-view-of-life. Accessed 21 Feb 2020
Birch J (2015) Natural selection and the maximization of fitness. Biol Rev 91(3). https://doi.org/10.1111/brv.12190
Boveri T (1904) Ergebnisse über die Konstitution der chromatischen Substanz des Zellkerns. Gustav Fischer Verlag, Jena
Bowler PJ (1983) The eclipse of Darwinism: anti-Darwinian evolutionary theories in the decades around 1900. Johns Hopkins University Press, Baltimore
Bowler PJ (2003) Evolution: the history of an idea. University of California Press, Berkeley and Los Angeles
Brewer W, Smith F, Sanford J (2013a) Information loss: potential for accelerating natural genetic attenuation of RNA viruses. In: Biological information – new perspectives. World Scientific, Ithaca, pp 369–384
Brewer W, Sanford JC, Baumgardner J (2013b) Using numerical simulation to test the “mutation-count” hypothesis. In: Marks RJ, Behe MJ, Dembski WA, Gordon BL, Sanford JC (eds) Biological information – new perspectives. World Scientific, Ithaca, pp 298–311
Bull JJ, Meyers LA, Lachmann M (2005) Quasispecies made simple. PLoS Comput Biol. https://doi.org/10.1371/journal.pcbi.0010061
Bürger R (1989) Mutation-selection models in population genetics and evolutionary game theory. In: Kurzhanski AB, Sigmund K (eds) Evolution and control in biological systems. Springer, Dordrecht
Carter RC, Sanford JC (2012) A new look at an old virus: patterns of mutation accumulation in the human H1N1 influenza virus since 1918. Theor Biol Med Model 9(42). https://doi.org/10.1186/1742-4682-9-42
Caswell H (1989) Matrix population models. Sinauer Associates, Sunderland
Charlesworth B (2012) The effects of deleterious mutations on evolution at linked sites. Genetics 190(1):5–22. https://doi.org/10.1534/genetics.111.134288

81 Dynamical Systems and Fitness Maximization in Evolutionary Biology


Charlesworth B, Charlesworth D (2009) Darwin and genetics. Genetics 183(3). https://doi.org/10.1534/genetics.109.109991
Conley C (1978) Isolated invariant sets and the Morse index. CBMS regional conference series 38. American Mathematical Society, Providence
Cosens D, Briscoe D (1972) A switch phenomenon in the compound eye of the white-eyed mutant of Drosophila melanogaster. J Insect Physiol 18(4):627–632. https://doi.org/10.1016/0022-1910(72)90190-4. Accessed 9 Mar 2020
Crotty S, Cameron CE, Andino R (2001) RNA virus error catastrophe: direct molecular test by using ribavirin. Proc Natl Acad Sci U S A 98(12):6895–6900. https://doi.org/10.1073/pnas.111085598
Crow JF, Kimura M (1970) An introduction to population genetics theory. Blackburn Press, Caldwell. (Reprint ed.)
Darwin C (1868) The variation of animals and plants under domestication. Print
Darwin C (1959) Origin of species. Harvard Classics, P.F. Collier & Son, New York
Davies NB, Krebs JR, West SA (2012) An introduction to behavioural ecology. Wiley-Blackwell, Hoboken
DeVries H (1889) Intracellular pangenesis. The Open Court Publishing (1910 Translation). http://www.esp.org/books/devries/pangenesis/facsimile/. Accessed 8 Mar 2020
DeVries H (1901–1903) Die Mutationstheorie. Versuche und Beobachtungen über die Entstehung von Arten im Pflanzenreich, zwei Bände. Veit, Leipzig
Diekmann J, Metx A, O. (1986) The dynamics of physiologically structured populations. Springer Verlag, Berlin
Dodson SI, Allen TFH, Carpenter SR, Ives AR, Jeanne RL, Kitchell JF, Langston NE (1998) Ecology. Oxford University Press, Oxford
Dunham I, Kundaje A, Aldred S, Collins PJ, Davis CA, Doyle F (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414). https://doi.org/10.1038/nature11247
Edwards AWF (2007) Maximization principles in evolutionary biology. In: Matthen M, Stephens C (eds) Handbook of the philosophy of science: philosophy of biology. North-Holland, Amsterdam, pp 335–347
Eigen M (1971) Self-organization of matter and the evolution of biological macromolecules. Naturwissenschaften:465–523. https://doi.org/10.1007/BF00623322
Elena SF, Ekunwe L, Hajela N, Oden SA, Lenski RE (1998) Distribution of fitness effects caused by random insertion mutations in Escherichia coli. Genetica 102–103(1–6):349–358
Ewens WJ, Lessard S (2015) On the interpretation and relevance of the fundamental theorem of. Theor Popul Biol 104(2015):59–67. https://doi.org/10.1016/j.tpb.2015.07.002
Eyre-Walker A, Keightley PD (1999) High genomic deleterious mutation rates in hominids. Nature 397(6717):344–347
Falconer DS, Mackay FC (1996) Introduction to quantitative genetics. Longmans Green, Essex
Felsenstein J (1974) The evolutionary advantage of recombination. Genetics 78(2):737–756
Felsenstein J (1989) Mathematics vs. evolution. Science 246(4932):941–942
Felsenstein J (2017) Theoretical evolutionary genetics. Draft, Seattle. http://evolution.genetics.washington.edu/pgbook/pgbook.html. Accessed 28 Feb 2020
Felsenstein J (2018) Fisher memorial lecture 2018 by Joe Felsenstein. March 26. https://www.youtube.com/watch?v=ZF3nIMvBBDw&feature=youtu.be. Accessed 9 Mar 2020
Ferreiro MJ, Perez C, Mariana M, Santiago R, Caputi A, Aguilera P, Barrio R, Cantera R (2017) Drosophila melanogaster white mutant w1118 undergo retinal degeneration. Front Neurosci 11(732). https://doi.org/10.3389/fnins.2017.00732
Fisher RA (1930) The genetical theory of natural selection. Clarendon Press, Oxford, UK. https://doi.org/10.5962/bhl.title.27468. Accessed 9 Mar 2020
Fisher RA (1999 edition) The genetical theory of natural selection – a complete variorum edition. Ed H Bennett. Oxford University Press, Oxford, UK
Frank SA, Slatkin M (1992) Fisher’s fundamental theorem of natural selection. Trends Ecol Evol 7(3):92–95


Gauger EM, Rieper E, Morton JJ, Benjamin SC, Vedral V (2011) Sustained quantum coherence and entanglement in the avian compass. Phys Rev Lett. https://doi.org/10.1103/physrevlett.106.040503
Geritz SAH, Metz JAJ, Nisbet RM (1992) How should we define “fitness” for general ecological scenarios? Trends Ecol Evol 7(6):198–202
Gerrish PJ, Lenski RE (1998) The fate of competing beneficial mutations in an asexual population. Genetica 102–103(1–6):127–144
Graur D (2016) Rubbish DNA. arXiv. Electronic. Comp. Cornell University. Cornell University, Ithaca. 22 Jan. https://arxiv.org/abs/1601.06047. Accessed 28 Feb 2020
Graur D (2013) How to assemble a human genome. Society of Molecular Biology and Evolution/Spanish Society for Evolutionary Biology. https://www.slideshare.net/dangraur1953/update-version-of-the-smbesesbe-lecture-on-encode-junk-dna-graur-december-2013. Accessed 5 Mar 2020
Grodwohl J-B (2017) “The theory was beautiful indeed”: rise, fall and circulation of maximizing methods in population genetics (1930–1980). J Hist Biol 50:571–608
Guijarro-Clarke C, Holland PW, Paps J (2020) Widespread patterns of gene loss in the evolution of the animal kingdom. Nat Ecol Evol
Hald A (1998) A history of mathematical statistics. Wiley, New York
Haldane JB (1932 (1990 reprint)) The causes of evolution. Princeton University Press, Princeton
Hall DG (1982) Chromosomal mutation for citrate utilization by Escherichia coli. J Bacteriol 151:269–273
Harris CL (1981) Evolution: genesis and revelations: with readings from Empedocles to Wilson. State University of New York Press, Albany
Herron JC, Freeman S (2014) Evolutionary analysis, 5th edn. Upper Saddle River, New Jersey, Pearson Prentice Hall
Hofbauer J (1985) The selection mutation equation. J Math Biol:41–53. https://doi.org/10.1007/BF00276557
Hössjer O, Bechly G, Gauger A (2018) Phase-type distribution approximations of the waiting time until coordinated mutations get fixed in a population. In: Silvestrov S, Malyarenko A, Rancic M (eds) Stochastic processes and algebraic structures – from theory towards applications, Springer proceedings in mathematics and statistics, pp 245–313. https://doi.org/10.1007/978-3-030-02825-1_12
Huxley J (1942) Evolution: the modern synthesis. Allen and Unwin, London
Johannsen WL (1903) Om arvelighed i samfund og i rene linier. Oversigt over Det Kongelige Danske Videnskabernes Selskabs Forhandlinger 3:247–270
Kibota TT, Lynch M (1996) Estimate of the genomic mutation rate deleterious to overall fitness in E. coli. Nature 381(6584):694–696. https://doi.org/10.1038/381694a0
Kimura M (1965) Attainment of quasi linkage equilibrium when gene frequencies are changing by natural selection. Genetics 52:875–890
Kimura M (1979) Model of effectively neutral mutations in which selective constraint is incorporated. Proc Natl Acad Sci U S A 76(7):3440–3444. https://doi.org/10.1073/pnas.76.7.3440
Kimura M, Maruyama T (1966) The mutational load with epistatic gene interactions in fitness. Genetics 54(6):1337–1351
Kolter ERZ, R. (2004) Escherichia coli evolution during stationary phase. Res Microbiol:328–336. https://doi.org/10.1016/j.resmic.2004.01.014
Kondrashov A (1995) Contamination of the genome by very slightly deleterious mutations. J Theor Biol 175:583–594
Kondrashov A (2017) Crumbling genome: the impact of deleterious mutations on humans. John Wiley and Sons, Hoboken
Koonin E (2011) The logic of chance: the nature and origin of biological evolution. FT Press, Upper Saddle River
Kot W, Schaffer M, M. (1985) Do strange attractors govern ecological systems? Bioscience 35(6):342–350


Kruuk LEB, Clutton-Brock T, Slate J, Pemberton J, Brotherstone S, Guinness F (2000) Heritability of fitness in a wild mammalian population. PNAS 97(2):698–703
Leroi A (2011) Who is the greatest biologist of all time? https://www.edge.org/conversation/who-is-the-greatest-biologist-of-all-time. Accessed 9 Mar 2020
Lewontin R (2003) Four complications in understanding the evolutionary process. Santa Fe Institute Bulletin (Santa Fe Institute). https://sfi-edu.s3.amazonaws.com/sfi-edu/production/uploads/publication/2016/10/31/winter2003v18n1.pdf. Accessed 28 Feb 2020
Lynch M (2010a) Rate, molecular spectrum, and consequences of human mutation. PNAS 107(3):961–968. www.pnas.org/cgi/doi/10.1073/pnas.0912629107
Lynch M (2010b) Evolution of the mutation rate. Trends Genet 26(8):345–352. https://doi.org/10.1016/j.tig.2010.05.003
Lynch M (2016) The origins of genome complexity. Sinauer Associates, Sunderland
Lynch M, Burger R, Butcher D, Gabriel W (1993) The mutational meltdown in asexual populations. J Hered 84(5):339–344. https://doi.org/10.1093/oxfordjournals.jhered.a111354
Michod RE (1999) Darwinian dynamics: evolutionary transitions in fitness and individuality. Princeton University Press, Princeton
Minnich S, Van Hofwegen D, Hovde C (2016) Rapid evolution of citrate utilization by Escherichia coli by direct selection requires citT and dctA. J Bacteriol 198(7). https://jb.asm.org/content/jb/198/7/1022.full.pdf
Morgan TH, Sturtevant AH, Muller HJ, Bridges CB (1915) The mechanism of Mendelian heredity. Henry Holt, New York. http://www.esp.org/books/morgan/mechanism/facsimile/. Accessed 9 Mar 2020
Muller HJ (1928) The problem of genic modification. In: Proceedings of the 5th International Congress, Supplementband of the Z. indukt. Abstamm.-u. Vererb-Lehre, pp 234–260
Muller HJ (1932) Some genetic aspects of sex. Am Nat 66(703):118–138. https://doi.org/10.1086/280418
Muller H (1950) Our load of mutations. Am J Hum Genet 2:111–176. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1716299/pdf/ajhg00429-0003.pdf. Accessed 28 Feb 2020
Muller H (1964) The relation of recombination to mutational advance. Mutat Res:2–9. https://doi.org/10.1016/0027-5107(64)90047-8
Nachman MW, Crowell SL (2000) Estimate of the mutation rate per nucleotide in humans. Genetics 156(1):297–304
Nelson C, Sanford JC (2011) The effects of low-impact mutations in digital organisms. Theor Biol Med Model 8:9
Nelson CW, Sanford JC (2013) Computational evolution experiments reveal a net loss of genetic information despite selection. In: Marks RJ, Behe MJ, Dembski WA, Gordon BL, Sanford JC (eds) Biological information – new perspectives. World Scientific, Ithaca, pp 338–368
Norton DE (1995) The fundamental theorem of dynamical systems. Comment Math Univ Carol 36(3):585–597. http://emis.matem.unam.mx/journals/CMUC/pdf/cmuc9503/norton.pdf
Nowak MA (2006) Evolutionary dynamics: exploring the equations of life. Harvard University Press, Harvard
Ohno S (1972) So much “junk” DNA in our genome. Brookhaven Symp Biol. http://www.junkdna.com/ohno.html. Accessed 4 Mar 2020
Ohta T (1977) Molecular evolution and polymorphism. Natl Inst Genet Mishima Jpn 76:148–167
Pennisi E (2013) The man who bottled evolution. Science 342(6160):790–793. https://science.sciencemag.org/content/342/6160/790.summary
Phillips PC (2008) Epistasis–the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet 9(11):855–867. https://doi.org/10.1038/nrg2452
Plutynski A (2006) What was Fisher’s fundamental theorem of natural selection and what was it for? Stud Hist Phil Biol Biomed Sci 37(1):59–82
Price GR (1972) Fisher’s “fundamental theorem” made clear. Ann Hum Genet 36(2):129–140
Quammen D (2006) The reluctant Mr. Darwin. Atlas Books, Norton
Queller DC (2017) Fundamental theorems of evolution. Am Nat 189(4):345–353


Razvan MR (2004) On Conley’s fundamental theorem of dynamical systems. Int J Math Math Sci. https://doi.org/10.1155/S0161171204202125
Rice SH (2004) Evolutionary theory: mathematical and conceptual foundations. Sinauer Associates, Sunderland
Roberts HF (1929) Plant hybridization before Mendel. Princeton University Press, Princeton
Roff DA (1992) The evolution of life histories: theory and analysis. Chapman and Hall, New York
Royal Society (2020) The Royal Society, science in the making. https://makingscience.royalsociety.org/s/rs/people/fst00034451. Accessed 9 Mar 2020
Sanford JC (2019) Mendel’s accountant genetic macroevolution simulation. November 10. https://github.com/genetic-algorithms/mendel-go/wiki. Accessed 07 Mar 2020
Sanford J, Nelson C (2012) The next step in understanding population dynamics: comprehensive numerical simulation. In: Carmen Fustú M (ed) Studies in population genetics. ISBN: 978-953-51-0588-6 InTech. https://www.intechopen.com/books/studies-in-population-genetics/the-next-step-in-understanding-population-dynamics-comprehensive-numerical-simulation
Sanford JC, Baumgardner J, Gibson P, Brewer W, ReMine W (2007a) Mendel’s accountant: a biologically realistic forward-time population genetics program. Scalable Comput: Pract Exper 8(2):147–165
Sanford JC, Baumgardner J, Gibson P, Brewer W, ReMine W (2007b) Using computer simulation to understand mutation accumulation dynamics and genetic load. In: Shi Y, van Albada GD, Dongarra J, Sloot PM (eds) International conference on computational science. Springer, Berlin/Heidelberg, pp 386–392
Sanford JC, Baumgardner J, Brewer W (2013) Selection threshold severely constrains capture of beneficial mutations. In: Biological information – new perspectives. World Scientific, Ithaca, pp 264–297
Sanford JC, Brewer W, Smith F, Baumgardner J (2015) The waiting time problem in a model Hominin population. Theor Biol Med Model 12(18). https://doi.org/10.1186/s12976-015-0016-z
Sarraf MA, Woodley MA, Feltham C (2019) Making the case for mutation accumulation. In: Modernity and cultural decline. Palgrave Macmillan, Cham, pp 197–228. https://doi.org/10.1007/978-3-030-32984-6_6
Schneider D, Wielgoss S, Barrick J, Tenaillon O, Cruveiller S, Chane-Woon-Ming B, Médigue C, Lenski R (2011) Mutation rate inferred from synonymous substitutions in a long-term evolution experiment with Escherichia coli. G3 Genomes Genetics 1(3):183–186
Schneider D, Raeside C, Gaffe J, Deatherage DE, Tenaillon O, Briska AM, Ptashkin RN, Cruveiller S, Medigue C, Lenski RE, Barrick JE (2014) Large chromosomal rearrangements during a long-term evolution experiment with Escherichia coli. Am Soc Microbiol 156(2):1–13
Solé V, Sardanyés J, Ricard (2007) Red queen strange attractors in host–parasite replicator gene-for-gene coevolution. Chaos, Solitons Fractals 32(5):1666–1678
Stamhuis IH (2003) The reactions on Hugo de Vries’s intracellular pangenesis; the discussion with August Weismann. J Hist Biol 36(1):119–152
Stamhuis IH, Meijer OG, Zevenhuizen EJ (1999) Hugo De Vries on heredity, 1889–1903. Statistics, Mendelian Laws, Pangenes, mutations. Isis 90(2):238–267. https://doi.org/10.1086/384323. Accessed 8 Mar 2020
Sutton WS (1903) The chromosomes in heredity. Biol Bull 4:231–251. https://www.journals.uchicago.edu/doi/pdfplus/10.2307/1535741. Accessed 8 Mar 2020
Teicher A (2018) Caution, overload: the troubled past of genetic load. Genetics:747–755. https://doi.org/10.1534/genetics.118.301093
Tuljapurkar S (1990) Population dynamics in variable environments. Springer-Verlag, New York
Walsh B, Lynch M (2018) Evolution and selection of quantitative traits. Oxford University Press, Oxford
Wielgoss S, Barrick J, Tenaillon O, Wiser M, Dittmar WJ, Cruveiller S, Chane-Woon-Ming B, Medigue C, Lenski RE (2013) Mutation rate dynamics in a bacterial population reflect tension between adaptation and genetic load. Proc Natl Acad Sci 110(1):222–227


Wilke CO (2005) Quasispecies theory in the context of population genetics. BMC Evol Biol. https://doi.org/10.1186/1471-2148-5-44
Wolf YI, Koonin EV (2013) Genome reduction as the dominant mode of evolution. BioEssays 35(9):829–837
Xiao C, Qiu S, Meldrum Robertson R (2017) The white gene controls copulation success in Drosophila melanogaster. Sci Rep 7. https://doi.org/10.1038/s41598-017-08155-y
Zhang SD, Odenwald WF (1995) Misexpression of the white (w) gene triggers male-male courtship in Drosophila. Proc Natl Acad Sci U S A 92(12):5525–5529. https://doi.org/10.1073/pnas.92.12.5525

Damped Dynamical Systems for Solving Equations and Optimization Problems

82

Mårten Gulliksson, Magnus Ögren, Anna Oleynik, and Ye Zhang

Contents
Introduction
Linear Problems
  Linear Equations
  Linear Eigenvalue Problems
  Linear Least Squares
  Ill-Posed Problems
From Linear to Nonlinear Problems
  Local Linearization Using Optimal Damping and Time Step
  Total Energy as a Lyapunov Function
  Numerical Experiments
Applications
  Image Analysis
  Inverse Problems for Partial Differential Equations
  Applications in Quantum Physics
Conclusions and Future Work
References

Abstract
We present an approach for solving optimization problems with or without constraints, which we call the Dynamical Functional Particle Method (DFPM). The method consists of formulating the optimization problem as a second-order damped dynamical system and then applying a symplectic method to solve it numerically. In the first part of the chapter, we give an overview of the method and provide the necessary mathematical background. We show that DFPM is a stable, efficient, and, given the optimal choice of parameters, competitive method. Optimal parameters are derived for linear systems of equations, linear least squares, and linear eigenvalue problems. A framework for solving nonlinear problems is developed and numerically tested. In the second part, we adapt the method to several important applications such as image analysis, inverse problems for partial differential equations, and quantum physics. At the end, we present open problems and share some ideas for future work on generalized (nonlinear) eigenvalue problems, handling constraints with reflection, global optimization, and nonlinear ill-posed problems.

M. Gulliksson (✉) · M. Ögren
Mathematics, School of Engineering and Technology, Örebro, Sweden
e-mail: [email protected]

A. Oleynik
Department of Mathematics, University of Bergen, Bergen, Norway

Y. Zhang
Faculty of Mathematics, Chemnitz University of Technology, Chemnitz, Germany

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_32

Keywords Optimization · Damped dynamical systems · Convex problems · Eigenvalue problems · Image analysis · Inverse problems · Quantum physics · Schrödinger equation

Introduction

In this chapter we describe the idea of solving optimization problems (with or without constraints) and equations by using a second-order damped dynamical system. In order to introduce the idea, let us consider a simple example from classical mechanics. The harmonic oscillator is described by

$$m\ddot{u} = -ku, \quad k > 0. \tag{1}$$

Here $u(t)$ is the distance from the equilibrium of a mass $m$ on which a force $-ku$, $k > 0$, is acting, and the dot denotes the time derivative. For example, the mass could be attached to a spring, in which case $-ku$ is the restoring force acting on the mass; see Fig. 1. The solution of (1) is

$$u(t) = C_1 \sin\left(\sqrt{\frac{k}{m}}\, t + C_2\right),$$

where the constants $C_i$ are given by the initial position and velocity at, say, $t = t_0$. The mass never comes to rest at the equilibrium unless $u(t_0) = 0$ and $\dot{u}(t_0) = 0$. However, if we in addition assume that there is some friction proportional to the velocity $\dot{u}$, e.g., between the mass and the surface on which it is sliding, we get a damped second-order system

$$m\ddot{u} = -\eta\dot{u} - ku, \tag{2}$$


Fig. 1 A simple oscillating spring-mass system

where $\eta > 0$ is the friction constant. In this case, the solution is given as $u(t) = C_1 e^{\xi_1 t} + C_2 e^{\xi_2 t}$ (for $\xi_1 \neq \xi_2$), where the $C_i$ are determined by the initial conditions and $\xi_i = -\eta/(2m) \pm \sqrt{(\eta/(2m))^2 - k/m}$. Since both $\xi_i$ have negative real part, it is easy to see that $u$ tends to zero, which is the equilibrium position of the mass and the stationary solution to (1). Thus, the system (2) can be used to find the stationary solution of the harmonic oscillator (1). Further, we note that $u = 0$ is also the solution of the (trivial) optimization problem $\min_u ku^2/2$. In fact, the convex function $V(u) = ku^2/2$ is the potential corresponding to the force $F = -ku$, which is conservative since $F = -dV/du$. Of course, in this case it is easier to solve $ku = 0$ or $\min_u ku^2/2$ than to integrate (2). However, this simple idea will be extended to solve more challenging problems.

Let us generalize the mechanical system slightly by imagining masses $m_{ij}$ in a rectangular grid where each mass, except those on the boundary, is connected to its four closest neighbors by springs. Different types of boundary conditions could be introduced, which we do not specify here. The damped dynamical system for this problem is $M\ddot{u} + \eta\dot{u} = -Au$, where $u$ is a vector consisting of the $x$- and $y$-parts of the displacements of the masses in some given order, $M$ is the matrix of masses, and $A$ is a symmetric positive definite matrix containing the spring constants. It is not difficult to show that the stationary solution is the solution to the linear system of equations $Au = 0$. More generally, we realize that we can solve any linear system of equations $Au = b$ with $A$ symmetric positive definite by finding the stationary solution to a damped dynamical system. The potential corresponding to the force $-Au$ is the convex function $V(u) = u^T A u/2$ (for $b \neq 0$, $V(u) = u^T A u/2 - b^T u$), which has a unique minimum corresponding to the solution of the linear system of equations. As before, $F = -Au$ is conservative since $F = -\nabla V$.

The continuous problem corresponding to the last example is an isotropic linear elastic continuum where $u = u(x, y, t)$ is the displacement field.
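As a brief numerical aside on the scalar oscillator (2), the closed-form solution $u(t) = C_1 e^{\xi_1 t} + C_2 e^{\xi_2 t}$ can be evaluated directly to see the decay toward $u = 0$. The following Python lines are a minimal sketch, not part of the original chapter: the function names `damped_roots` and `damped_solution` and the parameter values are illustrative assumptions, and the formula for the constants assumes distinct roots (no critical damping).

```python
import cmath

def damped_roots(m, eta, k):
    """Roots xi_1, xi_2 of m*xi**2 + eta*xi + k = 0, the characteristic equation of (2)."""
    a = eta / (2.0 * m)
    disc = cmath.sqrt(a * a - k / m)  # complex square root also covers the underdamped case
    return -a + disc, -a - disc

def damped_solution(t, u0, v0, m, eta, k):
    """u(t) = C1*exp(xi1*t) + C2*exp(xi2*t) with u(0) = u0 and u'(0) = v0.

    Assumes xi1 != xi2 (hypothetical helper, not from the chapter).
    """
    xi1, xi2 = damped_roots(m, eta, k)
    c1 = (v0 - xi2 * u0) / (xi1 - xi2)  # from u(0) = u0 and u'(0) = v0
    c2 = u0 - c1
    return (c1 * cmath.exp(xi1 * t) + c2 * cmath.exp(xi2 * t)).real

# Illustrative values: an underdamped oscillator released from rest at u = 1.
for t in (0.0, 5.0, 20.0, 40.0):
    print(t, damped_solution(t, u0=1.0, v0=0.0, m=1.0, eta=0.5, k=4.0))
```

For these values both roots have real part $-\eta/(2m) = -0.25$, so the printed displacements shrink roughly like $e^{-0.25 t}$.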
For this continuum, the second-order damped system is the partial differential equation (sometimes called the damped wave equation) $u_{tt} + \eta u_t = \nabla \cdot (k \nabla u)$, where $k = k(x, y) > 0$. The stationary solution is given by the solution of Poisson’s equation $\nabla \cdot (k \nabla u) = 0$, and we note that damped dynamical systems can be used to find the solution of many different partial differential equations.

We now expand our idea into a more general setting and show how to numerically solve the problem in a stable and efficient way. For the sake of brevity, we omit proofs of the theorems that are lengthy or require additional theory. We do, however, provide numerous references to recent publications and to the state of the art, where the reader can find the missing and additional information. Otherwise we try to supply enough details, with all the numerical results being reproducible, so that the reader can expand on the theory and applications. Some of the ideas presented here are work in progress.

Our starting point is the minimization problem

$$\min_{u \in H} V(u), \tag{3}$$

where $H$ is a real Hilbert space and $V : H \to \mathbb{R}$ is a smooth (analytic) convex functional. We use the conventional notation for inner product and norm, $(\cdot, \cdot)$ and $\|\cdot\|$, respectively, with subindices added to specify the underlying spaces if needed. The main idea, already introduced above, for solving (3) is to utilize the fact that the solution to (3) is also a stationary solution, say $u^*$, to the second-order damped dynamical system

$$\ddot{u}(t) + \eta\dot{u}(t) = -\nabla V(u(t)), \quad \eta > 0, \tag{4}$$

and this solution is unique and globally exponentially stable; see, e.g., the references in Begout et al. (2015). We have seen that the problem (4) naturally appears in modeling mechanical systems; an additional relevant example is the heavy ball with friction (HBF) system. In this case (4) describes the motion of a material point with positive mass. The optimization properties of HBF with different friction have been studied in detail in Attouch et al. (2000), Attouch and Alvarez (2000), and Alvarez (2000) and references therein. Throughout the whole chapter, we reserve the dot notation for derivatives with respect to the fictitious time $t$. Moreover, we use $u$ as the unknown everywhere except in section “Inverse Problems for Partial Differential Equations,” where we use $p$. Since the minimum of (3) is the (unique) solution of

$$\nabla V(u) = 0, \tag{5}$$

we can also use (4) to solve equations, such as the linear equations that we will expand upon in the sections to come. After the problem has been formulated as (4), the important question is the following: How does one choose a numerical method for solving (4) that is efficient and fast enough to compete with the existing methods for solving (3) or (5)? We observe that the dynamical system (4) is Hamiltonian with the total energy

$$E(u(t), \dot{u}(t)) = V(u(t)) + \frac{1}{2}\|\dot{u}(t)\|^2, \tag{6}$$

82 Damped Dynamical Systems for Solving Equations and Optimization Problems


which is also a Lyapunov function for (4). Its exponential decrease in time results in an exponential decay of ‖u(t) − u∗‖. Symplectic methods, such as symplectic Runge-Kutta, Störmer-Verlet, etc., are tailor-made for Hamiltonian systems and preserve the energy. This serves as the motivation for our choice of numerical method. Let us rewrite (4) as the first-order system

\[ \dot{u} = v, \qquad \dot{v} = -\eta v - \nabla V(u). \tag{7} \]

Then we apply a one-step explicit symplectic method, such as symplectic Euler or Störmer-Verlet (Hairer et al., 2006), which gives an iterative map of the form

\[ w_{k+1} = F(w_k, \Delta t_k, \eta_k), \qquad w_k = (u_k, v_k), \quad k = 1, 2, \ldots, \tag{8} \]

where the time step Δt_k and the damping η_k may be independent of k. The parameters Δt_k and η_k can be chosen to optimize the performance of the numerical method, which is generally a nontrivial task. We call the approach of finding the solution to (3) or (5) by solving (4) with a symplectic method the dynamical functional particle method (DFPM) (Gulliksson et al., 2013). We would like to emphasize that it is the combination of the damped dynamical system with an efficient (fast, stable, accurate) symplectic solver that makes DFPM a novel and powerful method. Even though the idea of solving minimization problems using (4) with different damping strategies goes far back (see Poljak 1964 and Sandro et al. 1979), it has not been systematically treated with symplectic solvers and optimal parameter choices.

DFPM is readily extended to constrained problems. Consider a minimization problem on a convex constraint set G = {u ∈ H : g(u) = 0}, where g is smooth, i.e.,

\[ \min_{u \in H} V(u) \quad \text{s.t.} \quad g(u) = 0. \tag{9} \]

The problem (9) has a unique solution u∗ if ∇g(u∗) is of full rank. The corresponding dynamical system can be formulated using the Lagrange function L(u, μ) = V(u) + (∇g(u), μ), where μ is the Lagrange parameter. The dynamical system for solving (9) is given by

\[ \ddot{u} + \eta\dot{u} = -\nabla V(u) - (\nabla g(u), \mu) \tag{10} \]

with μ(t) chosen such that u(t) tends to u∗ . For existence and uniqueness of solutions to constrained problems in a more general setting, see McLachlan et al. (2014) and Alvarez et al. (2002) and references therein.


M. Gulliksson et al.

In order to solve (9) using (10), we can choose μ(t) such that u(t) either remains on the constraint set G or approaches it as t grows. We realize these two strategies by using (i) projection and (ii) a fully damped formulation, respectively. The first approach is executed by formulating the iterative map (8) and then projecting w_k onto the set G at each iteration step. This method is generally costly, but there are exceptions, e.g., eigenvalue problems with only normalization constraints (see section “Linear Eigenvalue Problems”). We consider nonlinear ill-posed inverse problems using DFPM with projection in section “Inverse Problems for Partial Differential Equations” and nonlinear eigenvalue problems in section “The Yrast Spectrum for Atoms Rotating in a Ring”. For the fully damped approach, we introduce an additional dynamical system for the constraints g as

\[ \ddot{g}_i + \xi_i \dot{g}_i = -k_i g_i, \qquad \xi_i, k_i > 0. \tag{11} \]

Then g(u(t)) tends to zero exponentially fast, and the equations in (11) can be used to derive explicit expressions for μ(t) in (10). A first attempt in this direction was made in Gulliksson (2017) (see section “Linear Eigenvalue Problems”) and was also applied to a nonlinear Schrödinger equation (see section “Excited States to the Schrödinger Equation”).

Symplectic solvers for undamped Hamiltonian systems have long been known for their excellent behavior when integrating over long time intervals (Hairer et al., 2006). There has been much less research on symplectic methods for the damped system, and even less is known about the use of symplectic methods for the constrained problem (9). However, there are some results (see, e.g., Bhatt et al. 2016 and McLachlan et al. 2006) that indicate a similarly good long-term performance for the damped case, which further supports DFPM. To the best of our knowledge, there are no other symplectic methods developed specifically to attain fast convergence to a stationary solution of (4) and (10) except DFPM for linear problems (see section “Linear Problems” and Edvardsson et al. 2015, Neuman et al. 2015, and Gulliksson 2017) and our work on the constrained problems (see sections “Linear Eigenvalue Problems” and “The Yrast Spectrum for Atoms Rotating in a Ring”, and Sandin et al. 2016 and Gulliksson 2017).

Very recently we have started to develop DFPM for ill-posed problems (see sections “Ill-Posed Problems”, “Image Analysis”, and “Inverse Problems for Partial Differential Equations”), inspired by work on second-order methods for ill-posed linear problems and the work in Attouch and Chbani (2016). One of the main advantages of DFPM as opposed to Tikhonov or other regularization techniques is the simplified choice of regularization. In the linear case, this is manifested by an a posteriori stopping time (see section “Ill-Posed Problems”) and for inverse ill-posed source problems by using a decreasing regularization term (see section “Inverse Problems for Partial Differential Equations”).

A closely related approach for solving (3) that has been studied extensively in, e.g., Smyrlis and Zisis (2004), is the steepest descent method

\[ \dot{u}(t) + \alpha\nabla V(u(t)) = 0, \qquad \alpha > 0. \tag{12} \]


It might seem that (12) should be better than (4), since the exponential decrease toward the stationary solution in (12) can be made arbitrarily large by choosing α large enough. However, this is not true if one takes into account the stability and accuracy of the numerical solver. DFPM has been shown to converge remarkably faster to the stationary solution than any numerical method applied to (12) (see Gulliksson et al. 2012 and Edvardsson et al. 2015).

Finally, we would like to mention some other methods that are based on introducing an extra parameter in order to solve equations. These are the continuation method (Watson et al., 1997), fictitious time methods (Ascher et al., 2007; Tsai et al., 2010), and dynamical system methods developed for solving numerical linear algebra problems (Chu, 2008). Although these methods and DFPM share a common idea, their similarities do not extend further.

The chapter is organized as follows. Linear systems of equations, linear eigenvalue problems, linear least squares, and linear ill-posed problems are treated in section “Linear Problems”. The first three of these parts deal with finite dimensional problems, whereas linear ill-posed problems are considered in an infinite dimensional Hilbert space setting. In section “From Linear to Nonlinear Problems” we present two novel ideas on how to solve nonlinear problems using either a local linearization or the Lyapunov function given by the total energy in (6). The next section contains important applications in image analysis, inverse source problems, and quantum physics, where both linear and nonlinear (ill-posed) problems are treated. Finally, we draw some conclusions and present possible future research in which DFPM can be further developed and analyzed.

Linear Problems

Linear Equations

Consider the linear system of equations

\[ Au = b, \qquad A \in \mathbb{R}^{n \times n}, \quad b \in \mathbb{R}^n, \tag{13} \]

where A is positive definite. It is straightforward to formulate (13) as the dynamical system (4) by letting V(u) = u^T A u/2 − b^T u. We have

\[ \ddot{u} + \eta\dot{u} = b - Au, \qquad u(0) = u_0, \quad \dot{u}(0) = v_0. \tag{14} \]

As already discussed in section “Introduction”, the solution to (14) is globally asymptotically stable. We reformulate (14) as the system

\[ \dot{u} = v, \qquad \dot{v} = -\eta v + (b - Au) \tag{15} \]


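and apply symplectic Euler to obtain

\[ \begin{aligned} v_{k+1} &= (I - \Delta t\,\eta)\,v_k + \Delta t\,(b - Au_k), \\ u_{k+1} &= u_k + \Delta t\,v_{k+1}, \end{aligned} \tag{16} \]

or, equivalently,

\[ w_{k+1} = Bw_k + c, \qquad B = \begin{bmatrix} I - \Delta t^2 A & \Delta t(1 - \Delta t\,\eta) I \\ -\Delta t\,A & (1 - \Delta t\,\eta) I \end{bmatrix}, \qquad c = \begin{bmatrix} \Delta t^2\, b \\ \Delta t\, b \end{bmatrix}, \tag{17} \]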

where w_k = [u_k, v_k]^T. In order to ensure fast convergence of the method, one must choose ‖B‖ < 1 and as small as possible. This choice provides the optimal time step and damping, which can be summarized in the following theorem.

Theorem 1. Consider symplectic Euler (16), where λ_i = λ_i(A) > 0 are the eigenvalues of A. Then the parameters

\[ \Delta t = \frac{2}{\sqrt{\lambda_{\min}} + \sqrt{\lambda_{\max}}}, \qquad \eta = \frac{2\sqrt{\lambda_{\min}\lambda_{\max}}}{\sqrt{\lambda_{\min}} + \sqrt{\lambda_{\max}}}, \tag{18} \]

where λ_min = min_i λ_i and λ_max = max_i λ_i, are the solution to the problem

\[ \min_{\Delta t,\,\eta} \; \max_{1 \le i \le 2n} |\mu_i(B)|, \]

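where μ_i(B) are the eigenvalues of B.

Proof. We outline the idea of the proof and refer to Edvardsson et al. (2015) for details. Let A = UΛU^T be a diagonalization of A. We apply the transformations ũ = U^T u and w̃_k = diag(U^T, U^T) w_k to (14) and (17) and obtain the system of n decoupled oscillators

\[ \ddot{\tilde{u}}_i + \eta\,\dot{\tilde{u}}_i + \lambda_i \tilde{u}_i = \tilde{b}_i \]

and

\[ \tilde{w}_{k+1} = \tilde{B}\tilde{w}_k + \tilde{c}, \qquad \tilde{B} = \begin{bmatrix} U^T & O \\ O & U^T \end{bmatrix} B \begin{bmatrix} U & O \\ O & U \end{bmatrix} = \begin{bmatrix} I - \Delta t^2 \Lambda & \Delta t(1 - \Delta t\,\eta) I \\ -\Delta t\,\Lambda & (1 - \Delta t\,\eta) I \end{bmatrix}, \]

respectively. The eigenvalues of B̃, and hence of B, are then given by

\[ \mu_i(B) = 1 - \frac{\Delta t}{2}(\eta + \Delta t\,\lambda_i) \pm \frac{\Delta t}{2}\sqrt{(\eta + \Delta t\,\lambda_i)^2 - 4\lambda_i}. \]

From Edvardsson et al. (2015), each oscillator has the fastest decay if it is critically damped. This implies η + Δtλ_min = 2√λ_min and η + Δtλ_max = 2√λ_max, which consequently yields (18).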
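As an illustration of Theorem 1, the following minimal sketch applies symplectic Euler to the system (15) with the parameters (18). The function names `optimal_parameters` and `dfpm_linear`, the stopping tolerance, and the random test matrix with prescribed spectrum are assumptions made for this example only; in practice λ_min and λ_max would have to be estimated.

```python
import numpy as np

def optimal_parameters(lam_min, lam_max):
    """Time step and damping from formula (18)."""
    s = np.sqrt(lam_min) + np.sqrt(lam_max)
    return 2.0 / s, 2.0 * np.sqrt(lam_min * lam_max) / s

def dfpm_linear(A, b, dt, eta, tol=1e-10, max_iter=10_000):
    """Symplectic Euler for u'' + eta*u' = b - A u, started at rest from u = 0."""
    u = np.zeros_like(b)
    v = np.zeros_like(b)
    for k in range(max_iter):
        r = b - A @ u                        # force term, equal to the residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return u, k
        v = (1.0 - dt * eta) * v + dt * r    # velocity update
        u = u + dt * v                       # position update with the new velocity
    return u, max_iter

# Hypothetical test problem: SPD matrix with prescribed spectrum in [1, 100].
rng = np.random.default_rng(0)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(1.0, 100.0, n)
A = Q @ np.diag(lam) @ Q.T
b = rng.standard_normal(n)

dt, eta = optimal_parameters(lam[0], lam[-1])
u, iters = dfpm_linear(A, b, dt, eta)

# Spectral radius of the iteration matrix B, to compare with sqrt(1 - eta*dt).
I = np.eye(n)
B = np.block([[I - dt**2 * A, dt * (1.0 - dt * eta) * I],
              [-dt * A, (1.0 - dt * eta) * I]])
rho = np.max(np.abs(np.linalg.eigvals(B)))
print(iters, rho, np.sqrt(1.0 - eta * dt))
```

The explicit matrix B is formed here only to inspect its spectrum; the iteration itself needs one matrix-vector product per step.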




The optimal parameter choice obtained above gives the following important property.

Corollary 1. Symplectic Euler defined by (16) with Δt and η given by (18) is stable and convergent.

Proof. By construction, the eigenvalues μ_i(B) corresponding to the smallest and largest eigenvalues of A have no imaginary part, and all other μ_i(B) are complex (underdamped). By direct calculation we then get

\[ |\mu_i(B)| = \sqrt{\operatorname{Re}(\mu_i(B))^2 + \operatorname{Im}(\mu_i(B))^2} = \sqrt{1 - \eta\,\Delta t}. \]

Substituting the optimal parameters into the equation above yields 0 < 1 − ηΔt < 1 and therefore stability. Since symplectic Euler is consistent, we get convergence by the Lax equivalence theorem.
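The substitution in the proof can be written out explicitly. The following short calculation uses only the parameters of Theorem 1; the symbol κ = λ_max/λ_min for the condition number of A is notation introduced here for illustration.

```latex
\[
1 - \eta\,\Delta t
  = 1 - \frac{4\sqrt{\lambda_{\min}\lambda_{\max}}}
             {\left(\sqrt{\lambda_{\min}} + \sqrt{\lambda_{\max}}\right)^{2}}
  = \left(\frac{\sqrt{\lambda_{\max}} - \sqrt{\lambda_{\min}}}
               {\sqrt{\lambda_{\max}} + \sqrt{\lambda_{\min}}}\right)^{2},
\qquad\text{so}\qquad
|\mu_i(B)| = \sqrt{1 - \eta\,\Delta t} = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}.
\]
```

This is the same factor that appears in the classical convergence bound for the conjugate gradient method, which is consistent with the comparable performance reported for the two methods.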

In Edvardsson et al. (2015) the system (16) was used to solve very large sparse linear systems (up to 10^7 unknowns) using the optimal parameters (18) in symplectic Euler. In particular, DFPM was tested on a linear Poisson equation discretized using finite differences and on a problem where the matrix originates from an s-limit two-electron Hamiltonian in quantum mechanics. For the Poisson problem, DFPM clearly outperformed classical methods such as Gauss-Seidel and Jacobi, as well as the method based on the first-order ODE (12). On the quantum physics problem, DFPM was slightly faster in CPU time than the conjugate gradient method and Chebyshev semi-iterations. It was shown that DFPM is less computationally expensive in each iteration than the other two methods and that there is no benefit in choosing a higher-order symplectic solver. DFPM has shown excellent convergence rates for an indefinite system of equations (see Neuman et al. 2015). The drawback of the method in that case is the difficulty of finding an efficient set of parameters. Finally, the more general system

\[ M\ddot{u} + N\dot{u} = b - Au, \qquad u(0) = u_0, \quad \dot{u}(0) = v_0, \tag{19} \]

with M, N positive definite and A not necessarily positive definite, can be considered instead of (14). This case was analyzed for convergence in Gulliksson et al. (2013). While it is straightforward to apply a symplectic method to solve (19), the choice of the optimal time step and of the parameters in M and N remains an open problem.

Linear Eigenvalue Problems

Consider the eigenvalue problem

\[ Au = \lambda u, \qquad u^T u = 1, \tag{20} \]

where A ∈ R^{n×n} is positive definite. We assume that the eigenvalues are distinct (a non-defective eigenvalue problem) and sorted as 0 < λ_1 < λ_2 < · · · < λ_n. Then the solution of the minimization problem

\[ \min_{u \in \mathbb{R}^n} V(u) = \frac{1}{2}u^T A u \quad \text{s.t.} \quad g(u) = u^T u - 1 = 0, \tag{21} \]

is the eigenvector u_1 corresponding to the smallest eigenvalue λ_1. As discussed in section “Introduction”, the constrained optimization problem can be solved either by (i) projection onto the manifold or by (ii) introducing an additional dynamical system for the constraints. Here we present both approaches.

For the first approach, we formulate the dynamical system corresponding to (21), where we project the gradient of V(u) onto the tangent space of the unit sphere ‖u‖_2 = 1, that is, (I − uu^T)∇V(u) = (I − uu^T)Au. This gives

\[ \ddot{u} + \eta\dot{u} = (u^T A u)u - Au, \qquad \|u\|_2 = 1, \qquad u(0) = u_0, \quad \dot{u}(0) = v_0. \tag{22} \]

Theorem 2. The system (22) is globally exponentially stable.

Proof. The total energy (6) is given by E = u^T A u/2 + ‖u̇‖_2^2/2, where ‖u‖_2 = 1. We then have dE/dt = u̇^T (I − uu^T)Au + u̇^T ü and, using (22), dE/dt = −η‖u̇‖_2^2. Therefore, E is a Lyapunov function, and the system (22) is globally exponentially stable.

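Applying symplectic Euler to solve (22), we get

\[ \begin{aligned} v_{k+1} &= (1 - \Delta t\,\eta)\,v_k - \Delta t\,(I - u_k u_k^T)Au_k, \\ y_{k+1} &= y_k + \Delta t\,v_{k+1}, \qquad u_{k+1} = y_{k+1}/\|y_{k+1}\|_2. \end{aligned} \tag{23} \]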

As the system (23) is nonlinear, the optimal time step and damping are not necessarily constant, as they are for linear systems of equations. However, close to u_1 we can derive locally optimal parameters by considering a linearization. We calculate the Jacobian of F(u) = (u^T A u)u − Au and project it onto the tangent space of the unit sphere to obtain J(u) = (I − uu^T)∂F/∂u = (I − uu^T)((u^T A u)I − uu^T A − Auu^T − A). The equalities u_1^T A u_1 = λ_1 and (I − u_1 u_1^T)u_1 = 0 yield J(u_1) = λ_1 I − A. Hence the first approximation of (22) close to u_1 is given by

\[ \ddot{u} + \eta\dot{u} = (\lambda_1 I - A)u, \qquad \|u\|_2 = 1. \tag{24} \]


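Using Theorem 1 and (24), we derive the locally optimal time step and damping as

\[ \Delta t = \frac{2}{\sqrt{\lambda_2 - \lambda_1} + \sqrt{\lambda_n - \lambda_1}}, \qquad \eta = \frac{2\sqrt{\lambda_2 - \lambda_1}\,\sqrt{\lambda_n - \lambda_1}}{\sqrt{\lambda_2 - \lambda_1} + \sqrt{\lambda_n - \lambda_1}}. \tag{25} \]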

This result is easily extended to the more general problem of finding the eigenvector u_m, assuming u_i, i = 1, …, m − 1, to be known. Indeed, instead of (21) we solve

\[ \min_{u \in \mathbb{R}^n} V(u) = \frac{1}{2}u^T A u \quad \text{s.t.} \quad u_1^T u = 0, \ \ldots, \ u_{m-1}^T u = 0, \quad u^T u = 1. \tag{26} \]

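The locally optimal parameters of the corresponding discretized system are then given by

\[ \Delta t = \frac{2}{\sqrt{\lambda_{m+1} - \lambda_m} + \sqrt{\lambda_n - \lambda_m}}, \qquad \eta = \frac{2\sqrt{\lambda_{m+1} - \lambda_m}\,\sqrt{\lambda_n - \lambda_m}}{\sqrt{\lambda_{m+1} - \lambda_m} + \sqrt{\lambda_n - \lambda_m}}. \tag{27} \]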
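The projection approach for the smallest eigenpair can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `dfpm_smallest_eigenpair`, the random starting vector, the stopping test on the projected gradient, and the test matrix with known spectrum are all hypothetical, and each step is taken from the normalized iterate u_k before renormalizing. The parameters follow (25), which requires λ_1, λ_2, and λ_n; here they are known by construction, whereas in practice they would need to be estimated.

```python
import numpy as np

def dfpm_smallest_eigenpair(A, dt, eta, tol=1e-10, max_iter=20_000, seed=0):
    """Projected symplectic Euler for the smallest eigenpair of a symmetric A."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(A.shape[0])
    u /= np.linalg.norm(u)
    v = np.zeros_like(u)
    for k in range(max_iter):
        Au = A @ u
        rayleigh = u @ Au
        force = rayleigh * u - Au            # equals -(I - u u^T) A u when ||u|| = 1
        if np.linalg.norm(force) <= tol:
            return rayleigh, u, k
        v = (1.0 - dt * eta) * v + dt * force
        y = u + dt * v
        u = y / np.linalg.norm(y)            # project back onto the unit sphere
    return u @ (A @ u), u, max_iter

# Hypothetical test matrix with known eigenvalues between 1 and 50.
rng = np.random.default_rng(1)
n = 40
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(1.0, 50.0, n)
A = Q @ np.diag(lam) @ Q.T
A = 0.5 * (A + A.T)                          # remove rounding asymmetry

a = np.sqrt(lam[1] - lam[0])                 # sqrt(lambda_2 - lambda_1)
c = np.sqrt(lam[-1] - lam[0])                # sqrt(lambda_n - lambda_1)
dt, eta = 2.0 / (a + c), 2.0 * a * c / (a + c)

lam_1, u_1, iters = dfpm_smallest_eigenpair(A, dt, eta)
print(iters, lam_1)
```

The parameters (25) are only locally optimal, so the early iterations far from u_1 may behave differently from the asymptotic rate.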

In Gulliksson et al. (2012) DFPM with symplectic Euler was used for finding the s-limit of the helium ground state and first excited state, giving matrices of size n = 500,000 or larger. It was shown that DFPM outperforms the standard package ARPACK and has a complexity of O(n^{3/2}), i.e., the same order as conjugate gradient methods. For further details and numerical tests, we refer the reader to Gulliksson (2017) and section “Excited States to the Schrödinger Equation”.

For the damped constraint approach, we introduce

\[ g_m = (u^T u - 1)/2 = 0, \qquad g_i = u_i^T u = 0, \quad i = 1, \ldots, m-1. \]

Then we consider the following dynamical system

\[ \ddot{u} + \eta\dot{u} = -Au + \sum_{i=1}^{m-1} \mu_i u_i + \mu_m u, \tag{28} \]

\[ \ddot{g}_i + \eta\dot{g}_i = -k_i g_i, \qquad i = 1, \ldots, m. \tag{29} \]

Observe that ∇g_m = u and ∇g_i = u_i, i = 1, …, m − 1. This method for solving eigenvalue problems was first introduced in Gulliksson (2017), where it was shown that u(t) converges asymptotically to the eigenvector u_m. In Gulliksson (2017) it was also shown that the choice of k_i does not change the local convergence rate if λ_{m+1} − λ_m < k_i < λ_n − λ_m. Given these constraints, the local convergence of the corresponding symplectic Euler method with the parameters as in (25) will be the same as for the projection approach. However, while the two approaches have the same local behavior, it is not known in general which of the two methods is faster for a specific problem.


Linear Least Squares

The linear least squares problem can be formulated as the damped dynamical system (4) using

\[ V(u) = \frac{1}{2}\|Au - b\|_2^2, \qquad A \in \mathbb{R}^{m \times n}, \quad b \in \mathbb{R}^m, \quad m \ge n. \tag{30} \]

We obtain

\[ \ddot{u} + \eta\dot{u} = A^T(b - Au), \tag{31} \]

or, equivalently,

\[ \dot{u} = v, \qquad \dot{v} = -\eta v + A^T(b - Au). \tag{32} \]

Here we assume that A has full rank in order to keep the problem well-posed (see section “Ill-Posed Problems” for the ill-posed case). The most common and efficient iterative methods for (30) are conjugate gradient (CG) methods such as LSQR and LSMR. From Edvardsson et al. (2015) and preliminary results, we know that DFPM in some cases performs as well as conjugate gradient methods and sometimes even outperforms them. This serves as a motivation to further develop DFPM for the least squares problem. Once again we use symplectic Euler to obtain the iterative map

\[ \begin{aligned} u_{k+1} &= u_k + \Delta t\,v_k, \\ v_{k+1} &= (I - \Delta t\,\eta)\,v_k - \Delta t\,A^T(Au_{k+1} - b). \end{aligned} \tag{33} \]

We choose the parameters as in (18), where λ_i(A) is replaced by λ_i(A^T A) = σ_i^2(A), the squares of the singular values σ_i(A) of A. It is well known from the analysis of the conjugate gradient method that explicitly forming the matrix A^T A introduces unnecessary rounding errors. Therefore, we introduce a new vector d = Au − b in (33) and get

\[ \begin{aligned} d_{k+1} &= d_k + \Delta t\,Av_k, \\ v_{k+1} &= (I - \Delta t\,\eta)\,v_k - \Delta t\,A^T d_{k+1}. \end{aligned} \tag{34} \]
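A minimal sketch of the residual form of the iteration is given below. The function name `dfpm_least_squares`, the gradient-based stopping test, and the random test problem are assumptions for illustration; the singular values are computed with a full SVD only because the example is small, whereas a practical code would use estimates of the extreme singular values.

```python
import numpy as np

def dfpm_least_squares(A, b, dt, eta, tol=1e-10, max_iter=50_000):
    """Symplectic Euler in residual form for min ||Au - b||_2, without forming A^T A."""
    u = np.zeros(A.shape[1])
    v = np.zeros(A.shape[1])
    d = -b.astype(float)                     # d = A u - b for the start u = 0
    for k in range(max_iter):
        if np.linalg.norm(A.T @ d) <= tol:   # gradient of V at the current iterate
            return u, k
        u = u + dt * v                       # position update with the old velocity
        d = d + dt * (A @ v)                 # keeps d = A u - b up to rounding
        v = (1.0 - dt * eta) * v - dt * (A.T @ d)
    return u, max_iter

# Hypothetical overdetermined test problem.
rng = np.random.default_rng(2)
A = rng.standard_normal((80, 20))
b = rng.standard_normal(80)

sigma = np.linalg.svd(A, compute_uv=False)   # singular values in descending order
dt = 2.0 / (sigma[-1] + sigma[0])            # (18) with sqrt(lambda_i) = sigma_i
eta = 2.0 * sigma[-1] * sigma[0] / (sigma[-1] + sigma[0])

u, iters = dfpm_least_squares(A, b, dt, eta)
u_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(iters, np.linalg.norm(u - u_ref))
```

Each iteration costs one product with A and one with A^T, the same as one step of a conjugate gradient method for the normal equations.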

The result in Fig. 2 shows the performance of DFPM compared to the conjugate gradient method for a set of random matrices A. No preconditioning was done since its effect is expected to be similar for both methods. Further work will be done to investigate the convergence of DFPM. Of specific interest is the influence of the clustering of eigenvalues which is well known to improve the convergence of conjugate gradient methods.


Fig. 2 Estimated normalized execution time as a function of the number of rows in A for DFPM compared to the conjugate gradient method. The number of columns is n = 1000, the sparsity is 0.0001%, the condition number is 20, and the absolute tolerance is 10^{−4}. The markers in red and green indicate when DFPM performs better or worse, respectively. The curve in the green part is the mean value of 50 random problems. The estimated time for convergence is obtained with MATLAB's tic and toc commands

Ill-Posed Problems

Consider the linear operator equation

\[ Au = y, \tag{35} \]

where A is an injective and compact linear operator acting between two infinite dimensional Hilbert spaces U and Y. A group of examples for (35) are integral equations, which are models with applications in the natural sciences (Wang et al., 2012, 2013; Zhang et al., 2017a), mathematics (Zhang et al., 2013, 2016c), imaging (Yao et al., 2018; Zhang et al., 2015), and engineering (Zhang et al., 2016a). Since A is injective, the operator equation (35) has a unique solution u† ∈ U for every y from the range R(A) of the linear operator A. In this context, R(A) is assumed to be an infinite dimensional subspace of Y. Suppose that, instead of the exact right-hand side y = Au†, we are given noisy data y^δ ∈ Y obeying the deterministic noise model ‖y^δ − y‖ ≤ δ with noise level δ > 0. Since A is compact and dim(R(A)) = ∞, the range of A is not closed, i.e., R(A) ≠ \overline{R(A)}, and the problem (35) is ill-posed. Therefore, regularization methods should be employed for obtaining stable approximate solutions.

Loosely speaking, two groups of regularization methods exist: variational regularization methods and iterative regularization methods. Tikhonov regularization is certainly the most prominent variational regularization method, while the Landweber iteration is the most famous iterative regularization approach (see, e.g., Tikhonov et al. 1998 and Engl et al. 1996). For the linear problem (35), the Landweber iteration is defined by

\[ u_{k+1} = u_k + \Delta t\,A^*(y^\delta - Au_k), \qquad \Delta t \in (0, 2/\|A\|^2), \tag{36} \]

where A∗ denotes the adjoint operator of A. We refer to Engl et al. (1996, § 6.1) for the regularization property of the Landweber iteration. The continuous analogue of (36) can be considered as a first-order evolution equation in Hilbert spaces

\[ \dot{u}(t) + A^*Au(t) = A^*y^\delta \tag{37} \]

if an artificial scalar time t is introduced and Δt → 0 in (36). The formulation (37) is known as Showalter’s method or asymptotic regularization (Tautenhahn, 1994; Vainikko and Veretennikov, 1986), which we introduced earlier in (12). The regularization property of (37) can be analyzed through a proper choice of the terminating time. Moreover, it has been shown that by using Runge-Kutta integrators, all of the properties of asymptotic regularization (37) carry over to its numerical realization (Rieder, 2005). From a computational viewpoint, the Landweber iteration, as well as the steepest descent method and the minimal error method, is quite slow. Therefore, in practice accelerating strategies are usually used; see Kaltenbacher et al. (2008) and Neubauer (2000) and references therein for details.

Over the last few decades, besides the first-order iterative methods, there has been increasing evidence that discrete second-order iterative methods also enjoy remarkable acceleration properties for ill-posed inverse problems. Well-known methods are the Levenberg-Marquardt method, the iteratively regularized Gauss-Newton method, and the Nesterov acceleration scheme (Neubauer, 2017). DFPM can be viewed as a second-order iterative method with the corresponding dynamical system

\[ \ddot{u}(t) + \eta\dot{u}(t) + A^*Au(t) = A^*y^\delta, \qquad u(0) = u_0, \quad \dot{u}(0) = v_0, \tag{38} \]

where u_0, v_0 ∈ U are the prescribed initial data and η is a positive constant damping parameter. Let us assume we have found a stopping time T∗(δ) < ∞ using some criteria to be discussed later in more detail. The rate of convergence of u(T) → u† as T → ∞ in the case of precise data, and of u(T∗(δ)) → u† as δ → 0 in the case of noisy data, can be arbitrarily slow for solutions u† which are not smooth enough (see Schock 1985). In order to prove convergence rates, some kind of smoothness assumption must be imposed on the exact solution. Here we use range-type source conditions, that is, we assume that there exist elements z_0, z_1 ∈ X and numbers p > 0 and ρ ≥ 0 such that

\[ u_0 - u^\dagger = (A^*A)^p z_0 \quad \text{with} \quad \|z_0\| \le \rho \tag{39} \]

and

\[ v_0 = (A^*A)^p z_1 \quad \text{with} \quad \|z_1\| \le \rho. \tag{40} \]

In many cases the source conditions can be interpreted as differentiability of the exact solution, boundary conditions, or similar. For the choice v_0 = 0, the condition (40) is trivially satisfied. However, following the discussions in Zhang and Hofmann (2018), the regularized solutions essentially depend on the value of v_0. A good choice of v_0 provides an acceleration of the regularization algorithm. In practice, one can choose a relatively small value of v_0 to balance the source condition and the acceleration effect. Below we give three theorems on convergence properties of the method and the choice of the regularization parameter. For the sake of brevity, we omit the proofs here. The range of the positive constants γ, γ∗, γ_1, and γ_2 and further details can be found in Zhang and Hofmann (2018).

Theorem 3. (A priori choice of the regularization parameter) If the terminating time T∗ of the second-order flow (38) is selected by the a priori parameter choice

\[ T^*(\delta) = c_0\,\rho^{2/(2p+1)}\,\delta^{-2/(2p+1)} \tag{41} \]

with the constant c_0 = (2γ)^{2/(2p+1)}, then we have the error estimate for δ ∈ (0, δ_0]

\[ \|u(T^*) - u^\dagger\| \le c\,\rho^{1/(2p+1)}\,\delta^{2p/(2p+1)}, \tag{42} \]

where c = (1 + γ∗)(2γ)^{1/(2p+1)} and δ_0 = 2γρη^{2p+1}.

In practice, the stopping rule in (41) is not realistic, since a good terminating time point T∗ requires knowledge of ρ, which is a characteristic of the unknown exact solution. Such knowledge, however, is not necessary in the case of a posteriori parameter choices. For choosing the termination time point a posteriori, we make use of Morozov’s conventional discrepancy principle and the newly developed total energy discrepancy principle (see Zhang and Hofmann 2018). In our setting, Morozov’s conventional discrepancy principle means searching for values T > 0 satisfying the equation

\[ \chi(T) := \|Au(T) - y^\delta\| - \tau\delta = 0, \tag{43} \]

where τ is bounded below by γ_1 ≥ 1.

Lemma 1. If ‖Au_0 − y^δ‖ > τδ, then the function χ(T) has at least one root.


Theorem 4. (A posteriori choice I of the regularization parameter) If the terminating time T∗ of the second-order flow (38) is chosen according to the discrepancy principle (43), we have for any δ ∈ (0, δ_0] and p > 0 the error estimates

\[ T^* \le C_0\,\rho^{2/(2p+1)}\,\delta^{-2/(2p+1)} \tag{44} \]

and

\[ \|u(T^*) - u^\dagger\| \le C_1\,\delta^{2p/(2p+1)}, \tag{45} \]

where δ_0 is defined in Theorem 3, C_0 := (τ − γ_1)^{−2/(2p+1)}(2γ)^{2/(2p+1)}, and C_1 := (τ + γ_1)^{2p/(2p+1)}(γ_1 + γ_2)^{1/(2p+1)} + γ∗(τ − γ_1)^{−1/(2p+1)}(2γ)^{1/(2p+1)}.

The second method searches for roots T > 0 of the total energy discrepancy function, i.e.,

\[ \chi_{te}(T) := \|Au(T) - y^\delta\|^2 + \|\dot{u}(T)\|^2 - \tau_{te}^2\,\delta^2 = 0, \tag{46} \]

where τ_te > γ_1 is a constant.

Lemma 2. The function χ_te(T) is continuous and monotonically non-increasing. If ‖Au_0 − y^δ‖^2 + ‖u̇_0‖^2 > τ_te^2 δ^2, then χ_te(T) = 0 has a unique solution.

Theorem 5. (A posteriori choice II of the regularization parameter) Assume that a positive number δ_1 exists such that for all δ ∈ (0, δ_1], the unique root T∗ of χ_te(T) satisfies the inequality ‖Au(T∗) − y^δ‖ ≥ τ_1 δ, where τ_1 > γ_1 is a constant independent of δ. Then, for any δ ∈ (0, δ_0] and p > 0 we have the error estimates

\[ T^* \le C_0\,\rho^{2/(2p+1)}\,\delta^{-2/(2p+1)}, \qquad \|u(T^*) - u^\dagger\| \le C_1\,\delta^{2p/(2p+1)}, \tag{47} \]

where δ0 is defined in Theorem 3 and C0 and C1 are the same as in Theorem 4.

Numerical Simulations

Roughly speaking, in the language of ill-posed problems, DFPM yields a discrete second-order iterative method. As mentioned before, symplectic integrators are more stable owing to their inherent ability to retain stability while following the exponential decrease in energy caused by the damping. Here, we use the Störmer-Verlet method, which belongs to the family of symplectic integrators, giving at the k-th iteration the scheme

\[ \begin{cases} v^{k+1/2} = v^k + \dfrac{\Delta t}{2}\left(A^*(y^\delta - Au^k) - \eta v^{k+1/2}\right), \\[4pt] u^{k+1} = u^k + \Delta t\,v^{k+1/2}, \\[4pt] v^{k+1} = v^{k+1/2} + \dfrac{\Delta t}{2}\left(A^*(y^\delta - Au^{k+1}) - \eta v^{k+1/2}\right), \\[4pt] u^0 = u_0, \quad v^0 = v_0. \end{cases} \tag{48} \]
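The scheme (48), stopped by Morozov's discrepancy principle (43), can be sketched as follows. Everything specific in this example is an assumption for illustration: the function name `dfpm_stormer_verlet`, the diagonal test operator with decaying singular values, the additive noise scaled to a fixed relative level, and the values of Δt and η. The first half step is implicit in v^{k+1/2}, but since it is linear it can be solved in closed form.

```python
import numpy as np

def dfpm_stormer_verlet(A, y_delta, delta, dt, eta, tau=2.0, max_iter=100_000):
    """Scheme (48) from u^0 = 0, v^0 = 0, stopped by the discrepancy principle."""
    u = np.zeros(A.shape[1])
    v = np.zeros(A.shape[1])
    for k in range(max_iter):
        if np.linalg.norm(A @ u - y_delta) <= tau * delta:
            return u, k
        # Implicit half step, solved exactly because it is linear in v_half.
        v_half = (v + 0.5 * dt * (A.T @ (y_delta - A @ u))) / (1.0 + 0.5 * dt * eta)
        u = u + dt * v_half
        v = v_half + 0.5 * dt * (A.T @ (y_delta - A @ u) - eta * v_half)
    return u, max_iter

# Hypothetical diagonal test operator with singular values (j*pi)^(-2).
n = 50
j = np.arange(1, n + 1)
A = np.diag(1.0 / (j * np.pi) ** 2)
u_true = 1.0 / j**2
y = A @ u_true

# Additive noise with relative level 1% (a simplification chosen for this sketch).
rng = np.random.default_rng(3)
noise = 2.0 * rng.random(n) - 1.0
y_delta = y + 0.01 * np.linalg.norm(y) * noise / np.linalg.norm(noise)
delta = np.linalg.norm(y_delta - y)

dt = 1.0 / np.linalg.norm(A, 2)   # assumed step, below the limit 2/||A||
eta = 0.02                        # assumed damping parameter
u, iters = dfpm_stormer_verlet(A, y_delta, delta, dt, eta)
rel_err = np.linalg.norm(u - u_true) / np.linalg.norm(u_true)
print(iters, rel_err)
```

Stopping early is what regularizes the solution here: running the iteration to convergence would amplify the noise by the inverse of the smallest singular values.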

Our numerical tests are for the following integral equation

\[ Au(s) := \int_0^1 K(s,t)\,u(t)\,dt = y(s), \qquad K(s,t) = s(1-t)\chi_{s \le t} + t(1-s)\chi_{s > t}. \tag{49} \]

If we choose X = Y = L^2[0, 1], the operator A is compact, self-adjoint, and injective. It is well known that the integral equation (49) has a solution u = −y'' if y ∈ H^2[0, 1] ∩ H_0^1[0, 1]. Moreover, the operator A has the eigensystem Aϕ_j = σ_j ϕ_j, where σ_j = (jπ)^{−2} and ϕ_j(t) = √2 sin(jπt). Furthermore, using interpolation theory (see, e.g., Lions and Magenes 1972), it is not difficult to show that for 4p − 1/2 ∈ N

\[ R((A^*A)^p) = \left\{ u \in H^{4p}[0,1] : u^{(2l)}(0) = u^{(2l)}(1) = 0, \ l = 0, 1, \ldots, \lfloor 2p - 1/4 \rfloor \right\}, \]

where ⌊·⌋ denotes the standard floor function.

In general, a regularization procedure becomes numerically feasible only after an appropriate discretization. Here, we apply linear finite elements to solve (49). Let Y_n be the finite element space of piecewise linear functions on a uniform grid with step size 1/(n − 1). Denote by P_n the orthogonal projection operator acting from Y onto Y_n. Define A_n := P_n A and U_n := A_n^* Y_n. Let {φ_j}_{j=1}^n be a basis of the finite element space Y_n; then, instead of the original problem (49), we solve the following system of linear equations

\[ A_n u_n = y_n, \tag{50} \]

where

\[ [A_n]_{ij} = \int_0^1 \left( \int_0^1 K(s,t)\,\varphi_i(s)\,ds \right) \varphi_j(t)\,dt \quad \text{and} \quad [y_n]_j = \int_0^1 y(t)\,\varphi_j(t)\,dt. \]

Uniformly distributed noise with magnitude δ′ is added to the discretized exact right-hand side:

\[ [y_n^\delta]_j := \left(1 + \delta' \cdot (2\,\mathrm{Rand}(u) - 1)\right) \cdot [y_n]_j, \qquad j = 1, \ldots, n, \tag{51} \]

where Rand(u) returns a pseudorandom value drawn from a uniform distribution on [0,1]. The noise level of the measurement data is calculated by δ = ‖y_n^δ − y_n‖_2, where ‖·‖_2 denotes the standard vector norm in R^n.
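To make the setup concrete, the sketch below discretizes the kernel in (49) and generates noisy data following (51). It deliberately replaces the finite element discretization of the text by a simple rectangle-rule quadrature on interior grid points, which is an assumption made to keep the example short; the names `green_kernel` and `delta_prime` and the choice of the constant function u = 2 (for which u = −y'' gives y(s) = s(1 − s)) are likewise illustrative.

```python
import numpy as np

def green_kernel(s, t):
    """Kernel K(s, t) of the integral equation (49)."""
    return np.where(s <= t, s * (1.0 - t), t * (1.0 - s))

# Rectangle-rule quadrature on interior grid points (not the finite elements of the text).
n = 99
h = 1.0 / (n + 1)
grid = h * np.arange(1, n + 1)
S, T = np.meshgrid(grid, grid, indexing="ij")
A = h * green_kernel(S, T)

# Largest eigenvalue of the discretized operator, to compare with sigma_1 = pi^(-2).
sigma_1 = np.max(np.linalg.eigvalsh(A))

# Exact data for the constant function u = 2.
y = A @ np.full(n, 2.0)

# Multiplicative uniform noise as in (51), with relative magnitude delta_prime.
rng = np.random.default_rng(1)
delta_prime = 1e-2
y_delta = (1.0 + delta_prime * (2.0 * rng.random(n) - 1.0)) * y
delta = np.linalg.norm(y_delta - y)
print(sigma_1, 1.0 / np.pi**2, delta)
```

Because the noise in (51) is multiplicative, the resulting noise level δ is at most δ′ times the norm of the exact data.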


To assess the accuracy of the approximate solutions, we define the L^2-norm relative error for an approximate solution u_n(T∗(δ)) as L2Err := ‖u_n(T∗(δ)) − u†‖_{L^2[0,1]} / ‖u†‖_{L^2[0,1]}, where u† is the exact solution to the corresponding model problem. In order to demonstrate the advantages of DFPM over the traditional approaches, we solve the same problems by four well-known regularization methods: the Landweber method, Nesterov’s method, the ν-method, and the conjugate gradient method for the normal equation (CGNE). The Landweber method is given in (36), while Nesterov’s method is defined as (Nesterov, 1983)

\[ z^k = x^k + \frac{k-1}{k+\alpha-1}\,(x^k - x^{k-1}), \qquad x^{k+1} = z^k + \Delta t\,A^*(y^\delta - Az^k), \tag{52} \]

where α > 3 (we choose α = 3.1 in our simulations). Moreover, we select the Chebyshev method as our special ν-method, i.e., ν = 1/2 (Engl et al. 1996, § 6.3). For all four traditional iterative regularization methods, we use Morozov’s conventional discrepancy principle as the stopping rule. We consider the following two different right-hand sides for the integral equation (49).

• Example 1: y(s) = s(1 − s). Then u† = 2, and u† ∈ R((A∗A)^p) for all p < 1/8. This example uses the discretization size n = 50. Other parameters are Δt = 19.4946, η = 2.5648 × 10^{−4}, u_0 = 1, v_0 = 0, τ = 2, p = 0.1125.
• Example 2: y(s) = s^4(1 − s)^3. Then u† = −6t^2(1 − t)(2 − 8t + 7t^2), and u† ∈ R((A∗A)^p) for all p < 5/8. This example uses the discretization size n = 100. Other parameters are Δt = 19.4946, η = 0.0051, u_0 = 0, v_0 = 0, τ = 2, p = 0.5625.

The results of the simulation are presented in Table 1, where k_max = 400,000 is the maximal number of iterations, DP is the discrepancy principle, and TEDP is the total energy discrepancy principle. We can conclude that, in general, DFPM needs fewer iterations and offers a more accurate regularized solution, and that the two discrepancy principles are comparable in both number of iterations and accuracy.
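For reference, the Nesterov iteration (52) used as a baseline can be sketched as follows, with the discrepancy principle as the stopping rule. The function name `nesterov`, the step size Δt = 1/‖A‖^2, the zero starting vectors, and the diagonal test operator with additive noise are assumptions for this illustration and are not taken from the simulations reported in the text.

```python
import numpy as np

def nesterov(A, y_delta, delta, dt, alpha=3.1, tau=2.0, max_iter=100_000):
    """Iteration (52) with x^0 = x^{-1} = 0, stopped by the discrepancy principle."""
    x = np.zeros(A.shape[1])
    x_prev = x.copy()
    for k in range(1, max_iter + 1):
        if np.linalg.norm(A @ x - y_delta) <= tau * delta:
            return x, k - 1
        z = x + (k - 1) / (k + alpha - 1) * (x - x_prev)   # extrapolation step
        x_prev = x
        x = z + dt * (A.T @ (y_delta - A @ z))             # gradient step at z
    return x, max_iter

# Hypothetical diagonal test operator with singular values (j*pi)^(-2).
n = 50
j = np.arange(1, n + 1)
A = np.diag(1.0 / (j * np.pi) ** 2)
y = A @ (1.0 / j**2)

rng = np.random.default_rng(4)
noise = 2.0 * rng.random(n) - 1.0
y_delta = y + 0.01 * np.linalg.norm(y) * noise / np.linalg.norm(noise)
delta = np.linalg.norm(y_delta - y)

dt = 1.0 / np.linalg.norm(A, 2) ** 2   # assumed step size
x, iters = nesterov(A, y_delta, delta, dt)
print(iters, np.linalg.norm(A @ x - y_delta) / delta)
```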

Table 1 Comparisons with the Landweber method, the Nesterov’s method, the Chebyshev method, and the CGNE method

δ | DP: k∗(δ) | DP: CPU (s) | DP: L2Err | TEDP: k∗(δ) | TEDP: CPU (s) | TEDP: L2Err | Landweber: k∗(δ) | Landweber: CPU (s) | Landweber: L2Err
Example 1
1.0400e-2 | 70 | 0.2023 | 0.1494 | 70 | 0.1995 | 0.1494 | 20438 | 52.6428 | 0.2639
1.0380e-3 | 2872 | 8.9239 | 0.0945 | 2871 | 1.9105 | 0.0945 | kmax | 1.6899e3 | 0.1807
1.0445e-4 | 56246 | 177.3387 | 0.0597 | 56246 | 177.9603 | 0.0473 | kmax | 1.8048e3 | 0.1807
Example 2
1.0761e-2 | 15 | 0.0554 | 0.3676 | 25 | 0.0753 | 0.1987 | 1457 | 6.3672 | 0.7032
1.0703e-3 | 93 | 0.2488 | 0.0637 | 132 | 0.3567 | 0.0581 | 23790 | 67.6082 | 0.1767
1.1006e-4 | 453 | 1.4931 | 0.0195 | 531 | 1.8000 | 0.0184 | 187188 | 679.8274 | 0.0509

δ | Nesterov: k∗(δ) | Nesterov: CPU (s) | Nesterov: L2Err | Chebyshev: k∗(δ) | Chebyshev: CPU (s) | Chebyshev: L2Err | CGNE: k∗(δ) | CGNE: CPU (s) | CGNE: L2Err
Example 1
1.0400e-2 | 419 | 1.1384 | 0.2590 | 264 | 0.7378 | 0.2553 | 6 | 0.0351 | 0.2213
1.0380e-3 | 2813 | 8.5986 | 0.1600 | 2229 | 7.6512 | 0.1496 | 18 | 0.0904 | 0.1383
1.0445e-4 | 16642 | 60.1179 | 0.1025 | 17443 | 52.2056 | 0.0897 | 39 | 0.2013 | 0.0894
Example 2
1.0761e-2 | 102 | 0.3018 | 0.7043 | 62 | 0.1768 | 0.7102 | 6 | 0.0148 | 0.4835
1.0703e-3 | 416 | 1.1732 | 0.1676 | 415 | 1.1908 | 0.1190 | 12 | 0.0261 | 0.1514
1.1006e-4 | 1805 | 6.0793 | 0.0280 | 2226 | 7.6967 | 0.0196 | 15 | 0.0309 | 0.0447

From Linear to Nonlinear Problems

As we have mentioned before, the main challenge in making DFPM efficient is the choice of the damping parameter and the time step. Obviously, if V(u) is nonlinear, this choice cannot be made a priori as for linear problems. In this section we discuss some ways of dealing with the choice of damping and time step for nonlinear problems, some of which appear later in the applications (see section “Applications”). We note that this section is a subject of future research, and therefore we do not go into details but rather present the main ideas and illustrate the potential of the method.

Here we assume that V : R^n → R in (3) is convex and V ∈ C^2(R^n). However, we believe that many of the ideas presented here can be extended to more general Hilbert spaces. We distinguish between two ideas that seem possible to develop into efficient and globally convergent algorithms. The first one is to linearize the nonlinear problem and then apply the optimal damping and time step for the symplectic method as in the linear case. The second idea is to use the total energy as a Lyapunov function to determine the time step with either constant or time-dependent damping. These two approaches can be combined for better efficiency.

Local Linearization Using Optimal Damping and Time Step

Let us assume we have some initial approximation $u_1$ of the solution of the convex problem (3) together with an initial velocity $v_1$. By linearizing (4) around $u_1$ and dropping higher-order terms, we get the dynamical system

$$\ddot u(s) + \eta \dot u(s) = -\nabla V(u_1) - H(u_1)(u(s) - u_1), \quad s \ge 0, \tag{53}$$

where $H(u_1)$ is the Hessian of V(u) at $u_1$ with elements $h_{ij} = \partial^2 V / \partial u_i \partial u_j$. The initial conditions are set to $u(0) = u_1$, $v(0) = v_1$. We note that $H(u_1)$ is positive definite due to the convexity of V(u). Then the stationary solution of (53) is given by $-\nabla V(u_1) - H(u_1)(u - u_1) = 0$, i.e., $\tilde u_1 = u_1 - H(u_1)^{-1} \nabla V(u_1)$. Moreover, $\tilde u_1$ is the solution to the quadratic convex minimization problem

$$\min_u \; V(u_1) + \nabla V(u_1)^T (u - u_1) + \frac{1}{2} (u - u_1)^T H(u_1)(u - u_1), \tag{54}$$

and we recognize that $\tilde u_1$ is actually a full Newton step from $u_1$. The system (53) can be solved by a symplectic method, as discussed earlier. Let us now describe the overall algorithm. At iteration k, assume that we have $u_k$ as an approximation of the solution of (3) together with a velocity $v_k$, and define $t_k$ as the time corresponding to iteration k. The linearized problem

$$\ddot u(t) + \eta \dot u(t) = -\nabla V(u_k) - H(u_k)(u(t) - u_k), \quad u(t_k) = u_k, \; v(t_k) = v_k, \tag{55}$$

where $H(u_k)$ is the Hessian of V at $u_k$, then has the stationary solution $\tilde u_k = u_k - H(u_k)^{-1} \nabla V(u_k)$, which is also a Newton step at $u_k$. By applying, e.g., a symplectic Euler method to the dynamical system (55) with time step $\Delta t_k$ and damping $\eta_k$, we get

$$w_{j+1} = B_k(\Delta t_k, \eta_k) w_j + c_k, \quad w_j = (u_j, v_j), \; j = 1, 2, \ldots, \quad w_0 = (u_k, v_k), \tag{56}$$

where $B_k$ is given by B in (17) with $A = H(u_k)$ and $c = c_k = \big( (v(u_k) - H(u_k) u_k) \Delta t_k^2, \; \Delta t_k \big)^T$. Furthermore, we choose $\Delta t_k$, $\eta_k$ according to Theorem 1 using the eigenvalues of $H(u_k)$. To terminate the inner iterations (56) and to attain global convergence, i.e., $\|u_k - u^*\| \to 0$ as $k \to \infty$ for any initial conditions $u_0$, $v_0$, we have chosen to use an Armijo rule, that is, we exit the inner iterations when

$$V(u_{j+1}) < V(u_j) + \gamma (u_j - u_k)^T \nabla V(u_k), \quad \gamma \in (0, 1). \tag{57}$$

If the inner iterations are terminated at j = J, we set $u_{k+1} = u_{J+1}$, $v_{k+1} = v_{J+1}$, and $t_{k+1} = t_k + \Delta t_k$. Since $H(u_k)$ is positive definite, the inner iterations always converge and the corresponding solution defines a Newton step. Therefore, the inequality in (57) will eventually be satisfied, and the algorithm is thus globally convergent. For further details on proving global convergence using the Armijo rule, we refer to Bertsekas (2015). We obtain a variant of the algorithm above if we solve the original nonlinear problem with a symplectic method, using at each step the parameters obtained from the matrix $B_k$ given above. This proves to be a rather promising approach (see section “Numerical Experiments”), even if it is not globally convergent in general. Both approaches are only viable if the largest and smallest eigenvalues of $H(u_k)$ can be estimated cheaply enough. On the other hand, the parameters $\Delta t_k$ and $\eta_k$ do not necessarily have to be calculated in every step k and can be held constant when close to the solution. Moreover, an approximation of $H(u_k)$, together with its minimum and maximum eigenvalue, may be obtained from rank-one updates as in quasi-Newton methods for further efficiency.
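The inner iterations on the linearized system (55) can be sketched as follows. This is a minimal illustration, not the authors' implementation: Theorem 1 is not restated in this section, so the closed forms in `assumed_optimal_parameters` are an assumption standing in for it, and the quadratic test problem, the function names, and the fixed number of inner iterations (instead of the Armijo exit (57)) are hypothetical choices.

```python
import numpy as np


def assumed_optimal_parameters(lam_min, lam_max):
    """Time step and damping from the extreme Hessian eigenvalues.

    ASSUMPTION: these closed forms stand in for Theorem 1, which is not
    restated in this section.
    """
    a, b = np.sqrt(lam_min), np.sqrt(lam_max)
    return 2.0 / (a + b), 2.0 * a * b / (a + b)


def linearized_inner_iterations(grad_k, hess_k, u_k, v_k, n_inner):
    """Symplectic Euler on u'' + eta u' = -grad_k - hess_k (u - u_k)."""
    eigenvalues = np.linalg.eigvalsh(hess_k)  # returned in ascending order
    dt, eta = assumed_optimal_parameters(eigenvalues[0], eigenvalues[-1])
    u, v = u_k.copy(), v_k.copy()
    for _ in range(n_inner):
        force = -grad_k - hess_k @ (u - u_k)
        v = (1.0 - dt * eta) * v + dt * force  # velocity update first
        u = u + dt * v                         # position uses the new velocity
    return u, v


# Hypothetical quadratic test problem V(u) = 0.5 u^T H u - b^T u.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
H = M @ M.T + np.eye(4)  # symmetric positive definite
b = rng.standard_normal(4)
u_k = rng.standard_normal(4)
grad_k = H @ u_k - b

u_inner, v_inner = linearized_inner_iterations(grad_k, H, u_k, np.zeros(4), 400)
u_newton = u_k - np.linalg.solve(H, grad_k)  # full Newton step from u_k
```

For a quadratic V the linearization is exact, so running the inner iterations to convergence reproduces the Newton step, which is the behavior the text describes for many inner iterations.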

Total Energy as a Lyapunov Function

As mentioned in section “Introduction”, the total energy in (6) is a Lyapunov function for the dynamical system (4). We can use this fact in order to find a globally convergent method for solving (3) or (4), following the ideas in Karafyllis and Grüne (2013). Let $w_{k+1}(\Delta t) = F(w_k, \Delta t)$ be the iterative map defined by a symplectic solver applied to the dynamical system of interest (see (7)). Here, as before, $w_{k+1}(\Delta t) = (u_{k+1}(\Delta t), v_{k+1}(\Delta t))$, where we have emphasized the dependence on $\Delta t$. The total energy at iterate k + 1 is then given by $E_{k+1}(\Delta t) = V(u_{k+1}(\Delta t)) + \|v_{k+1}(\Delta t)\|^2 / 2$. The approach here is to determine the next time step $\Delta t_k$ as the solution to $\min_{\Delta t} E_{k+1}(\Delta t)$. However, fully resolving this minimization problem would generally not be efficient. Therefore, we follow the idea used in Karafyllis and Grüne (2013), where $E_{k+1}(\Delta t)$ is approximately minimized using the algorithm below instead.

Algorithm 1 A simplified time step determination based on the total energy
Input: Initial time step $\Delta t = \Delta t_{init}$, $\rho \in (0, 1)$.
while $E_{k+1}(\Delta t) - E_k > -\rho \, \Delta t \, \eta \, \|v_{k+1}(\Delta t)\|_2^2$
    $\Delta t = \Delta t / 2$ and compute $u_{k+1}(\Delta t)$, $v_{k+1}(\Delta t)$, $E_{k+1}(\Delta t)$
end
$\Delta t_k = 2 \Delta t$

Using Theorem 8 in Karafyllis and Grüne (2013), it can be proven that $w_k \to w^* = (u^*, 0)$ as $k \to \infty$, where $u^*$ is the minimum of V(u) in (3). More sophisticated ways of choosing $\Delta t_{init}$ and $\Delta t_k$ in the algorithm above are described in Karafyllis and Grüne (2013); they can improve the efficiency of DFPM and lead to fewer iterations.
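The halving loop of Algorithm 1 can be sketched in a few lines. This is an illustrative reading under stated assumptions: the sketch returns the first time step that passes the energy test (how the accepted value is rescaled afterwards is left out), it restarts from the same hypothetical `dt_init` in every iteration, and it caps the number of halvings so that the loop always terminates in floating point. The test function is the exponential problem of section “Numerical Experiments” with hypothetical coefficients.

```python
import numpy as np


def symplectic_euler_step(u, v, grad, dt, eta):
    """One damped symplectic Euler step: velocity first, then position."""
    v_new = (1.0 - dt * eta) * v - dt * grad(u)
    u_new = u + dt * v_new
    return u_new, v_new


def energy_time_step(u, v, V, grad, eta, dt_init=1.0, rho=0.9, max_halvings=50):
    """Halve dt until the total energy decreases sufficiently."""
    energy_k = V(u) + 0.5 * (v @ v)
    dt = dt_init
    for _ in range(max_halvings):
        u_new, v_new = symplectic_euler_step(u, v, grad, dt, eta)
        energy_new = V(u_new) + 0.5 * (v_new @ v_new)
        if energy_new - energy_k <= -rho * dt * eta * (v_new @ v_new):
            return u_new, v_new, dt
        dt *= 0.5
    # Safeguard (assumption of this sketch): accept the smallest step tried.
    u_new, v_new = symplectic_euler_step(u, v, grad, dt, eta)
    return u_new, v_new, dt


# Hypothetical test data: V(u) = exp(u^T diag(a) u) with a_i in [0.2, 1].
rng = np.random.default_rng(0)
a = rng.uniform(0.2, 1.0, size=10)


def V(u):
    return np.exp(u @ (a * u))


def grad(u):
    return 2.0 * V(u) * a * u


u = rng.standard_normal(10)
u /= np.linalg.norm(u)
v = np.zeros(10)
energies = [V(u) + 0.5 * (v @ v)]
for _ in range(40):
    u, v, dt = energy_time_step(u, v, V, grad, eta=1.0)
    energies.append(V(u) + 0.5 * (v @ v))
```

Every accepted step lowers the total energy by construction, which is the Lyapunov property the convergence argument relies on.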

Numerical Experiments

We consider the problem

$$\min_{u \in \mathbb{R}^n} V(u), \quad V(u) = \exp(u^T \mathrm{diag}(a) u), \quad a \in \mathbb{R}^n, \; a_i > 0.$$

The condition $a_i > 0$ yields convexity of V(u) and the existence of the unique solution $u^* = 0$. We calculate $\nabla V(u)_i = 2 \exp(u^T \mathrm{diag}(a) u) a_i u_i$, and the Hessian H(u) with elements $h_{ij}(u) = \exp(u^T \mathrm{diag}(a) u)(4 a_i u_i a_j u_j + 2 a_i \delta_{ij})$, where $\delta_{ij}$ stands for the Kronecker delta.

Fig. 3 Convergence of the algorithm based on Algorithm 1 (horizontal axis: iterations)

The values $a_i$ are chosen as uniformly random values in (0, 1]. The intention is not to derive the most efficient implementation but to illustrate the behavior of our suggested algorithms. Therefore, we do not report CPU times but only the number of iterations. We have implemented and tested four different algorithms. The first method is a symplectic Euler given as

$$\begin{cases} v_{k+1} = (I - \Delta t_k \eta) v_k - \Delta t_k \nabla V(u_k), \\ u_{k+1} = u_k + \Delta t_k v_{k+1}, \end{cases} \tag{58}$$

where $\Delta t_k$ is chosen according to Algorithm 1 with ρ = 0.9 and η is constant. In order to get a more efficient algorithm, we switch to optimal parameter values based on (18) when $\|\nabla V(u_k)\|_2 < 10^{-2}$, and the parameters are kept constant from then until convergence. A typical convergence behavior is shown in Fig. 3. In the figure we show the decrease of the norm of the gradient $\|\nabla V(u_k)\|$, the total energy, and the norm of the velocity $\|v_k\|$. The vertical line indicates the first iterate where the damping and time step are kept constant. The second algorithm shown in Fig. 4, denoted by $u^{\mathrm{Nesterov}}$, is the well-known Nesterov's method $u_{k+1} = u_k + \gamma (u_k - u_{k-1}) - \omega \nabla V(u_k)$, where we have used $\gamma = (k - 1)/(k + \alpha - 1)$, α = 3, and ω = 0.6.
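The test problem and the momentum iteration quoted above can be written down directly. The sketch below is hypothetical in its function names, random seed, and iteration count; the update uses the gradient at $u_k$ exactly as the formula in the text does, and the initial condition is normalized as in the caption of Fig. 4.

```python
import numpy as np


def V(u, a):
    """Test function V(u) = exp(u^T diag(a) u)."""
    return np.exp(u @ (a * u))


def grad_V(u, a):
    """Gradient with components 2 exp(u^T diag(a) u) a_i u_i."""
    return 2.0 * V(u, a) * a * u


def hess_V(u, a):
    """Hessian with elements V(u) (4 a_i u_i a_j u_j + 2 a_i delta_ij)."""
    g = a * u
    return V(u, a) * (4.0 * np.outer(g, g) + 2.0 * np.diag(a))


def momentum_iteration(u0, a, omega=0.6, alpha=3.0, n_iter=200):
    """u_{k+1} = u_k + gamma_k (u_k - u_{k-1}) - omega grad V(u_k)."""
    u_prev, u = u0.copy(), u0.copy()
    for k in range(1, n_iter + 1):
        gamma = (k - 1.0) / (k + alpha - 1.0)
        u_next = u + gamma * (u - u_prev) - omega * grad_V(u, a)
        u_prev, u = u, u_next
    return u


rng = np.random.default_rng(1)
n = 100
a = 1.0 - rng.random(n)      # uniform values in (0, 1]
u0 = rng.standard_normal(n)
u0 /= np.linalg.norm(u0)     # ||u0||_2 = 1
u_final = momentum_iteration(u0, a)
basis = np.eye(n)            # unit vectors, handy for finite-difference checks
```

Comparing `grad_V` and `hess_V` against central differences of `V` is a cheap way to validate the closed-form derivatives before running any of the four algorithms.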

Fig. 4 Convergence of three different algorithms based on different choices of damping and time step; see the legend in the plot (horizontal axis: iterations). The dimension of the problem is n = 100, and the initial condition $u_0$, $v_0$ is chosen uniformly random with $\|u_0\|_2 = \|v_0\|_2 = 1$ and is the same for all algorithms

The third algorithm, with approximations $u_k^{\mathrm{eig}}$, is based directly on the optimal parameters in (18), say $\Delta t_k$, $\eta_k$, given by the smallest and largest eigenvalue of $H(u_k)$. The symplectic Euler iterations are then given by

$$\begin{cases} v_{k+1} = (I - \Delta t_k \eta_k) v_k - \Delta t_k \nabla V(u_k), \\ u_{k+1} = u_k + \Delta t_k v_{k+1}. \end{cases} \tag{59}$$

In order to improve efficiency, we keep the time step and damping constant when $\|\nabla V(u_k)\| < 10^{-2}$, which is indicated with a vertical line in Fig. 4. Note that this algorithm is not globally stable and did indeed diverge for some starting points. The final algorithm we use here is based on the linearization in section “Local Linearization Using Optimal Damping and Time Step”. We use constant parameters when $\|\nabla V(u_k)\| < 10^{-4}$, after which the iterations are given by (58). In Fig. 5 the error in the solution is shown as a function of the total number of inner iterations given by (56). The stopping criterion for the inner iterations is based on the Armijo rule (57), but for illustration purposes we added a minimal number of inner iterations as described in the caption. Note that with more inner iterations the convergence rate gets closer to quadratic, since the iterations are then very close to Newton iterations. Even so, it is not evident how many inner iterations are the most efficient, since this depends on the cost of calculating the Hessian and its smallest and largest eigenvalue. For comparison we used a Polak-Ribiere-Polyak nonlinear conjugate gradient method constrained by Fletcher-Reeves, with a line search based on the strong Wolfe condition. On average it needed the same number of iterations as the third algorithm presented here. However, the methods cannot be compared in CPU time before we make DFPM more efficient using the suggestions in sections “Local Linearization Using Optimal Damping and Time Step” and “Total Energy as a Lyapunov Function”.

Fig. 5 Convergence of the solution $\|u_k - u^*\|$ for the linearized approach in section “Local Linearization Using Optimal Damping and Time Step” using the Armijo rule (57) where $\gamma = 10^{-4}$. On the horizontal axis is the total number of inner iterations. The numbers of minimal inner iterations (56) are one (marker ‘+’), 5 (marker ‘×’), and 20 (marker ‘*’), respectively

Applications

Image Analysis

Image denoising is a fundamental task in image processing. An essential challenge for image denoising is to remove as much noise as possible without eliminating the most representative characteristics of the image, such as edges, corners, and other sharp structures. Traditional denoising methods assume that some information about the noise is given. The problem of blind image denoising involves computing the denoised image from the noisy one without any a priori knowledge of the noise. The energy functional approach has in recent years been very successful in blind image denoising. The approach is to minimize an energy functional which most often takes the form

$$E(u) = \frac{1}{2} \int_\Omega (u - u_0)^2 \, dx + \alpha \int_\Omega \Phi(|\nabla u|) \, dx. \tag{60}$$

Here $u_0(x)$ is the observed (noisy) image, and $\Omega \subset \mathbb{R}^d$ (d = 2, 3) is a bounded domain with almost everywhere $C^2$ smooth boundary $\partial\Omega$. The first term in (60) is a fidelity term, the second term is a regularization term, and α > 0 is the regularization parameter. The regularization term $\Phi(|\nabla u|)$ is usually assumed to be strictly convex. A well-studied case of E(u) is when the regularization term is the p-Dirichlet energy, i.e., $\Phi(|\nabla u|) = |\nabla u|^p / p$, p ≥ 1. The case p = 1 corresponds to the total variation principle (see Scherzer et al. 2009), and p > 1 has been studied in, e.g., Baravdish et al. (2015). The Euler-Lagrange equation ∂E/∂u = 0 associated with the functional E(u) is given by

$$\begin{cases} u - u_0 + \alpha \cdot \operatorname{div}\left( \dfrac{\Phi'(|\nabla u|)}{|\nabla u|} \nabla u \right) = 0 & \text{in } \Omega, \\ \dfrac{\partial u}{\partial n} = 0 & \text{on } \partial\Omega, \end{cases} \tag{61}$$

where n is the outward normal to the boundary $\partial\Omega$. For the p-Dirichlet energy, let α → 0, and we obtain the first-order flow

$$\begin{cases} \dot u - \Delta_p u = 0 & \text{in } (0, T) \times \Omega, \\ u(x, 0) = u_0(x) & \text{in } \Omega, \\ \dfrac{\partial u}{\partial n} = 0 & \text{on } \partial\Omega, \end{cases} \tag{62}$$

where the p-Laplace operator is defined by $\Delta_p u = \operatorname{div}\left( |\nabla u|^{p-2} \nabla u \right)$. The p-parabolic equation in (62) has been studied intensively in Roubíček (2013) and references therein. Motivated by DFPM we consider the following PDE instead of (62):

$$\begin{cases} \ddot u + \eta \dot u - \Delta_p u = 0 & \text{in } (0, T) \times \Omega, \\ u(x, 0) = u_0(x), \; \dot u(x, 0) = 0 & \text{in } \Omega, \\ \dfrac{\partial u}{\partial n} = 0 & \text{on } \partial\Omega, \end{cases} \tag{63}$$

where again η > 0 is the damping parameter. In order to overcome the ill-posedness of the formulation (63), we introduce regularization. Let $G_\sigma(x)$ be the Gaussian kernel

$$G_\sigma(x) = \frac{1}{(2\pi\sigma)^{N/2}} e^{-\frac{|x|^2}{2\sigma}}, \quad \sigma > 0,$$

and let $\star$ denote the cross-correlation,

$$(f \star g)(x) = \int_{\mathbb{R}^d} \bar f(y) \, g(x + y) \, dy,$$

where $\bar f$ is the complex conjugate of f. Then we consider

$$\begin{cases} \ddot u + \eta \dot u - \operatorname{div}\left( (\varepsilon + |\nabla G_\sigma \star u|^2)^{\frac{p-2}{2}} \nabla u \right) = 0 & \text{in } (0, T) \times \Omega, \\ u(x, 0) = u_0(x), \; \dot u(x, 0) = 0 & \text{in } \Omega, \\ \dfrac{\partial u}{\partial n} = 0 & \text{on } \partial\Omega, \end{cases} \tag{64}$$

where ε > 0 is a fixed small number and

$$|\nabla G_\sigma \star u| = \left( \sum_{j=1}^{N} \left( \frac{\partial G_\sigma}{\partial x_j} \star u \right)^2 \right)^{1/2}.$$

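Theorem 6. Assume that p ∈ [1, 2] and $u_0 \in H^2(\Omega)$. Then a unique weak solution to problem (64) exists if T is sufficiently small, with an upper bound depending on $\|u_0\|_{H^1(\Omega)}$, $G_\sigma$, and Ω.

For the proof we refer to Baravdish et al. (2018). Now, let us consider the numerical algorithm based on equation (64). For simplicity and clarity of statements, we assume Ω is a rectangular region in $\mathbb{R}^2$ and consider a uniform grid $\Omega_{MN} = \{(x_i, y_j)\}_{i,j=1}^{M,N}$ in Ω with the uniform step size $h = x_{i+1} - x_i = y_{j+1} - y_j$. Define $u(t) = [u(x_i, y_j, t)]_{i,j=1}^{M,N}$. Denote by $u^k$ the projection of u(x, y, t) onto the spatial grid $\Omega_{MN}$ at the time point $t = t_k$. We approximate $\operatorname{div}(a^\varepsilon(u) \nabla u)$ by the linear expression $\operatorname{div}(a^\varepsilon(u^{k-1}) \nabla u^k)$, where $a^\varepsilon(u) = (\varepsilon + |\nabla G_\sigma \star u|^2)^{\frac{p-2}{2}}$. Using the central difference discretization rule, we have

$$\begin{aligned} \operatorname{div}\left( a^\varepsilon(u^{k-1}) \nabla u^k \right) &= D_{x, \frac{h}{2}} \left( a^{\varepsilon,k-1}_{i,j} D_{x, \frac{h}{2}} u^k_{i,j} \right) + D_{y, \frac{h}{2}} \left( a^{\varepsilon,k-1}_{i,j} D_{y, \frac{h}{2}} u^k_{i,j} \right) \\ &= D_{x, \frac{h}{2}} \left( a^{\varepsilon,k-1}_{i,j} \frac{u^k_{i+\frac12,j} - u^k_{i-\frac12,j}}{h} \right) + D_{y, \frac{h}{2}} \left( a^{\varepsilon,k-1}_{i,j} \frac{u^k_{i,j+\frac12} - u^k_{i,j-\frac12}}{h} \right) \\ &= \frac{1}{h^2} \Big[ a^{\varepsilon,k-1}_{i-\frac12,j} u^k_{i-1,j} + a^{\varepsilon,k-1}_{i,j-\frac12} u^k_{i,j-1} - \left( a^{\varepsilon,k-1}_{i-\frac12,j} + a^{\varepsilon,k-1}_{i+\frac12,j} + a^{\varepsilon,k-1}_{i,j-\frac12} + a^{\varepsilon,k-1}_{i,j+\frac12} \right) u^k_{i,j} \\ &\qquad\quad + a^{\varepsilon,k-1}_{i,j+\frac12} u^k_{i,j+1} + a^{\varepsilon,k-1}_{i+\frac12,j} u^k_{i+1,j} \Big], \end{aligned} \tag{65}$$

where

$$a^{\varepsilon,k-1}_{i-\frac12,j} = \left( \varepsilon + |\nabla G_\sigma \star u^{k-1}_{i-\frac12,j}|^2 \right)^{\frac{p-2}{2}}, \quad u^{k-1}_{i-\frac12,j} = \frac{u^{k-1}_{i-1,j} + u^{k-1}_{i,j}}{2}, \tag{66}$$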

and $\nabla G_\sigma$ is the projection of the function $\nabla G_\sigma$ on the same grid $\Omega_{MN}$. Given a matrix $u \in \mathbb{R}^{M \times N}$, one can obtain a vector $\vec u \in \mathbb{R}^{MN}$ by stacking the columns of u. This defines a linear operator vec : $\mathbb{R}^{M \times N} \to \mathbb{R}^{MN}$,

$$\mathrm{vec}(u) = (u_{1,1}, u_{2,1}, \ldots, u_{M,1}, u_{1,2}, u_{2,2}, \ldots, u_{M,2}, \ldots, u_{1,N}, u_{2,N}, \ldots, u_{M,N})^T,$$

$$\vec u = \mathrm{vec}(u), \quad \vec u_q = u_{i,j}, \quad q = (j - 1) M + i.$$

This corresponds to a lexicographical column ordering of the components in the matrix u. The symbol array denotes the inverse of the vec operator. That means the following equalities hold:

$$\mathrm{array}(\mathrm{vec}(u)) = u, \quad \mathrm{vec}(\mathrm{array}(\vec u)) = \vec u,$$

whenever $u \in \mathbb{R}^{M \times N}$ and $\vec u \in \mathbb{R}^{MN}$.
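The vec and array operators correspond to column-major reshaping. A minimal NumPy sketch (the function names mirror the operators in the text; the 2 × 3 example matrix is hypothetical) is:

```python
import numpy as np


def vec(u):
    """Stack the columns of an M x N matrix into a vector of length M*N."""
    return u.reshape(-1, order="F")


def array(u_vec, M, N):
    """Inverse of vec: rebuild the M x N matrix from the stacked columns."""
    return u_vec.reshape((M, N), order="F")


u = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
u_vec = vec(u)                  # columns stacked: 0, 3, 1, 4, 2, 5
u_back = array(u_vec, 2, 3)
```

With zero-based indices the entry `u[i, j]` lands at position `j * M + i` of the stacked vector, which is the column ordering described above.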

Based on the above definition, we rewrite (65) in the matrix form $F_{k-1} \vec u^k$, where the matrix $F_{k-1}$ depends only on $\vec u^{k-1}$. Denote $v^k = \dot{\vec u}^k$; then the Störmer-Verlet method for the PDE (64) gives the scheme

$$\begin{cases} v^{k+\frac12} = v^k + \dfrac{\Delta t_k}{2} \left( F_{k-1} \vec u^k - \eta v^{k+\frac12} \right), \\ \vec u^{k+1} = \vec u^k + \Delta t_k v^{k+\frac12}, \\ v^{k+1} = v^{k+\frac12} + \dfrac{\Delta t_k}{2} \left( F_k \vec u^{k+1} - \eta v^{k+\frac12} \right), \\ \vec u^0 = \vec u_0^\delta, \quad v^0 = 0, \end{cases} \tag{67}$$

where $\vec u_0^\delta = \mathrm{vec}(u_0^\delta)$ and $u_0^\delta$ is the projection of $u_0^\delta(x)$ on the grid $\Omega_{MN}$. It has been shown in Baravdish et al. (2018) that all eigenvalues of $F_k$ (k = 1, . . . , MN) are nonpositive. Let $\lambda_{\max}^{(k)} > 0$ be the largest eigenvalue of $-F_k$; then the following result holds.

Theorem 7. The scheme above is convergent provided $\Delta t_k \le \eta / \lambda_{\max}^{(k)}$.

Various stopping criteria exist for an iterative algorithm (Scherzer et al., 2009). In principle, the stopping criterion for image denoising problems should be chosen case by case. In real-world problems, a manual stopping criterion is usually required to obtain a high-quality denoised image, especially for PDE-based denoising techniques. Nevertheless, an automatic stopping criterion can be helpful for selecting a good initial guess of the denoised image. Here, we use a frequency domain threshold method based on the fact that noise is usually represented by high frequencies. Define the energy of the high frequencies by

$$\mathcal{E}_{N_0}(u) = \sum_{i + j \ge N_0} |\mathcal{F}(u)(i, j)|^2,$$

where $\mathcal{F}(u)$ denotes the 2D discrete Fourier transform of an image u and $N_0$ represents the high frequency index. In the simulation, we set $N_0 = 0.6 N^2$. Define the relative denoising efficiency by $\mathrm{RDE}(k) = |\mathcal{E}_{N_0}(u^k) - \mathcal{E}_{N_0}(u^{k-1})| / \mathcal{E}_{N_0}(u^{k-1})$. Then, the value of RDE at every iteration can be used as a stopping criterion. Based on this stopping criterion, an algorithm for image denoising is proposed in Algorithm 2.
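The high-frequency energy and the relative denoising efficiency defined above can be evaluated with a 2D FFT. The sketch below takes the index condition i + j ≥ N0 literally on the zero-based FFT indices, which is an assumption about the intended indexing; the function names and the random test image are hypothetical.

```python
import numpy as np


def high_frequency_energy(u, n0):
    """Sum of |F(u)(i, j)|^2 over the index pairs with i + j >= n0."""
    spectrum = np.fft.fft2(u)
    i, j = np.indices(spectrum.shape)
    return np.sum(np.abs(spectrum[i + j >= n0]) ** 2)


def relative_denoising_efficiency(u_new, u_old, n0):
    """RDE = |E(u_new) - E(u_old)| / E(u_old) for the energy above."""
    energy_old = high_frequency_energy(u_old, n0)
    return abs(high_frequency_energy(u_new, n0) - energy_old) / energy_old


rng = np.random.default_rng(0)
image = rng.random((8, 8))
smoothed = 0.5 * (image + image.mean())  # hypothetical "denoised" iterate
rde_value = relative_denoising_efficiency(smoothed, image, 4)
```

In an iteration loop one would stop as soon as the returned value drops below the tolerance, as Algorithm 2 does.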

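Fig. 6 Example results for DFPM, TV, MTele, and Tele methods (panels: Noisy, SSIM = 0.108; DFPM, SSIM = 0.549; TV, SSIM = 0.549; Original; MTele, SSIM = 0.492; Tele, SSIM = 0.494)

Algorithm 2 DFPM for image denoising
Require: Observed noisy image $u_0^\delta$. Parameters η and p. Tolerance ε.
Ensure: A denoised image $\hat u \leftarrow \mathrm{array}(\vec u^k)$.
1: $\vec u^0 \leftarrow \mathrm{vec}(u_0)$, $v^0 \leftarrow 0$, $\Delta t_0 \leftarrow \eta / \lambda_{\max}^{(0)}$ (the bound of Theorem 7 computed from $F_0$), $F_{-1} \leftarrow F_0$, RDE(0) ← 1, k ← 0
2: while RDE(k) > ε do
3:     $v^{k+\frac12} \leftarrow \left( 1 + \frac{\Delta t_k}{2} \eta \right)^{-1} \cdot \left( v^k + \frac{\Delta t_k}{2} F_{k-1} \vec u^k \right)$
4:     $\vec u^{k+1} \leftarrow \vec u^k + \Delta t_k v^{k+\frac12}$
5:     $v^{k+1} \leftarrow v^{k+\frac12} + \frac{\Delta t_k}{2} \left( F_k \vec u^{k+1} - \eta v^{k+\frac12} \right)$
6:     k ← k + 1
7:     $\mathrm{RDE}(k) \leftarrow |\mathcal{E}_{N_0}(u^k) - \mathcal{E}_{N_0}(u^{k-1})| / \mathcal{E}_{N_0}(u^{k-1})$
8:     $\Delta t_k \leftarrow \eta / \lambda_{\max}^{(k)}$ (computed from $F_k$)
9: end while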

In order to show the advantages of our algorithm over the existing approaches, we solve the same problem by the following methods: total variation (TV), modified telegraph (MTele), and telegraph (Tele). The degraded test image is given in the first picture of Fig. 6, while the denoised images are displayed in the last five pictures. We see that DFPM, together with the TV method, gives the highest Structural Similarity Index (SSIM) value, 0.549. It also gives a smoother result than the other methods. Note that in Fig. 6 we have chosen the terminating time which produces the highest SSIM value. We also note that in general, with an appropriate choice of the damping parameter η, DFPM exhibits an acceleration phenomenon, i.e., with an earlier termination time, the image reconstructed by DFPM has a higher SSIM value than the image denoised by TV. A rigorous theoretical analysis of the acceleration of the damped flow (64) will be addressed in future work.
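One time step of the damped Störmer-Verlet scheme (67), as used in Algorithm 2, can be sketched for a generic pair of matrices. The example is a deliberately simplified stand-in: instead of the image operator $F_k$ it uses a fixed one-dimensional Neumann Laplacian (a hypothetical choice corresponding to a constant diffusion coefficient), and the time step is taken at the bound of Theorem 7.

```python
import numpy as np


def damped_stormer_verlet_step(u, v, F_prev, F_next, dt, eta):
    """One step of (67); F_prev plays the role of F_{k-1}, F_next of F_k."""
    # The half-step velocity is implicit in the damping term; solve for it.
    v_half = (v + 0.5 * dt * (F_prev @ u)) / (1.0 + 0.5 * dt * eta)
    u_new = u + dt * v_half
    v_new = v_half + 0.5 * dt * (F_next @ u_new - eta * v_half)
    return u_new, v_new


# Hypothetical stand-in operator: 1D Laplacian with Neumann ends (h = 1).
n = 16
laplacian = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
laplacian[0, 0] = laplacian[-1, -1] = -1.0

eta = 1.0
dt = eta / np.max(np.linalg.eigvalsh(-laplacian))  # bound of Theorem 7

rng = np.random.default_rng(0)
u = rng.random(n)
initial_mean = u.mean()
v = np.zeros(n)
for _ in range(4000):
    u, v = damped_stormer_verlet_step(u, v, laplacian, laplacian, dt, eta)
```

For this linear stand-in the damped flow relaxes to the constant state with the same mean as the initial data, which makes the step easy to check.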

Inverse Problems for Partial Differential Equations

The interaction between partial differential equations and inverse problems has produced remarkable developments in the last couple of decades, partly due to its importance in applications of mathematics, science, and engineering, such as inverse scattering and inverse spectral problems (Chadan et al., 1997), inverse homogenization problems (Gulliksson et al., 2018), inverse chromatography problems (Cheng et al., 2018; Lin et al., 2018b; Zhang et al., 2016b, 2017b), parameter identification problems (Lin et al., 2018a), etc. Here we consider an inverse source problem (Cheng et al., 2014; Zhang et al., 2018a,b). Let $\Omega \subset \mathbb{R}^d$ (d = 2, 3) be a bounded domain with Lipschitz boundary $\partial\Omega$. Given $g_1$ and $g_2$ on $\partial\Omega$, we are concerned with finding the source p such that the solution u of the boundary value problem (BVP)

$$\begin{cases} -\Delta u + c u = p \chi_{\Omega_0} & \text{in } \Omega, \\ \dfrac{\partial u}{\partial n} = g_2 & \text{on } \partial\Omega, \end{cases} \tag{68}$$

satisfies

$$u = g_1 \quad \text{on } \partial\Omega. \tag{69}$$

Here c is a positive constant, ∂/∂n stands for the outward normal derivative, $\Omega_0 \subset \Omega$ is known as a permissible region of the source function, and χ is the indicator function such that $\chi_{\Omega_0}(x) = 1$ for $x \in \Omega_0$, while $\chi_{\Omega_0}(x) = 0$ for $x \notin \Omega_0$. In order to find p, we introduce the minimization problem $\min_p V(p)$, where the minimum is taken over an admissible set that incorporates a priori information about the source function p. Here, for simplicity, we assume it is given by $L^2(\Omega_0)$. The data fitting term V(p) may have different forms. For instance, in Han et al. (2006),

$$V(p) = \frac{1}{2} \| u(p) - g_1 \|^2_{L^2(\partial\Omega)} \tag{70}$$

with u(p) being the weak solution of (68). Another choice of V(p) is to use Kohn-Vogelius-type functionals, which are expected to give more robust optimization procedures when compared with the boundary fitting formulation (Afraites et al., 2007). In this approach (see, e.g., Song and Huang 2012),

$$V(p) = \frac{1}{2} \| u_1(p) - u_2(p) \|^2_{L^2(\Omega)} \tag{71}$$

with $u_1, u_2 \in H^1(\Omega)$ being the weak solutions of $-\Delta u_{1,2} + c u_{1,2} = p \chi_{\Omega_0}$ with Dirichlet and Neumann data, respectively. Recently, in Cheng et al. (2014), a novel coupled complex boundary method (CCBM) was introduced where the Dirichlet data $g_1$ and the Neumann data $g_2$ are used simultaneously in a single BVP and the data fitting term V(p) takes the form

$$V(p) = \frac{1}{2} \| u_{im}(p) \|^2_{L^2(\Omega)}, \tag{72}$$

with $u = u_{re} + i u_{im}$ ($i = \sqrt{-1}$ is the imaginary unit) solving

$$\begin{cases} -\Delta u + c u = p \chi_{\Omega_0} & \text{in } \Omega, \\ \dfrac{\partial u}{\partial n} + i u = g_2 + i g_1 & \text{on } \partial\Omega. \end{cases} \tag{73}$$

In practice the data $g_1$ and $g_2$ are noisy, and therefore one needs to introduce regularization. Thus, we consider

$$p_\varepsilon = \arg\min_p V_\varepsilon(p), \quad V_\varepsilon(p) := V(p) + \frac{\varepsilon}{2} \| p \|^2_{L^2(\Omega_0)}, \tag{74}$$

where ε > 0 is a regularization parameter. Under certain assumptions, (74) admits a unique stable solution $p_\varepsilon$, which converges to $p^*$, the solution of the original inverse source problem with minimal $L^2$-norm, as ε → 0 (see Han et al. 2006, Cheng et al. 2014, and Song and Huang 2012 for the three forms of V(p) in (70), (71), and (72), respectively). To solve the optimization problem (74), one can use DFPM. To that end, we consider the damped dynamical system

$$\ddot p(t) + \eta \dot p(t) + \nabla V_\varepsilon(p) = 0, \tag{75}$$

where η > 0 is a damping parameter and $\nabla V_\varepsilon$ is the gradient of $V_\varepsilon$. The regularization parameter ε can be chosen constant or as a function of the artificial scalar time t. In the first case, ε must be chosen a priori, which is not always possible. In the latter case, no such information is required. It turns out that, given ε(t) → 0 and some mild assumptions on the monotonicity of ε(t), the solution of (75) satisfies $p(t) \to p^*$ as t → ∞. We call the approach above the damped dynamical regularization method. Below we give more details of the CCBM for the inverse source problem with noisy data. Suppose that, instead of the exact boundary data $\{g_1, g_2\}$, we are only given approximate data $\{g_1^\delta, g_2^\delta\}$ such that $\| g_k^\delta - g_k \|_{L^2(\partial\Omega)} \le \delta$, k = 1, 2, where δ reflects the magnitude of noise in the measurements. Then, $u = u_{re} + i u_{im}$ solves

$$\begin{cases} -\Delta u + c u = p \chi_{\Omega_0} & \text{in } \Omega, \\ \dfrac{\partial u}{\partial n} + i u = g_2^\delta + i g_1^\delta & \text{on } \partial\Omega. \end{cases} \tag{76}$$

It is not difficult to show that the second Fréchet derivative satisfies $V''(p) q^2 = \| u_{im}(q) - u_{im}(0) \|^2_{L^2(\Omega)}$. Hence, V(p), and thus $V_\varepsilon(p)$, is convex. The next theorem allows us to obtain $\nabla_p V(p)$ using an adjoint problem.

Theorem 8. The Fréchet derivative of the convex functional V(p), defined in (72), is the imaginary part of the solution on $\Omega_0$ to the adjoint problem

$$\begin{cases} -\Delta w + c w = u_{im}(p) & \text{in } \Omega, \\ \dfrac{\partial w}{\partial n} + i w = 0 & \text{on } \partial\Omega, \end{cases} \tag{77}$$

where $u_{im}$ is the imaginary part of u, the solution of (76), i.e., $\nabla_p V(p) = w_{im}(p) \chi_{\Omega_0}$.

Let $t \in [t_0, T(\delta)]$ where T(δ) → ∞ when δ → 0. We set T(δ) = 1/δ. However, for other choices of T(δ) the same results hold with slight modifications. We formulate the main results.

Theorem 9. Assume that the dynamical regularization parameter $\varepsilon : [t_0, \infty) \to (0, 1]$ satisfies the following conditions:

(i) ε(t) → 0 as t → ∞,
(ii) $\dot\varepsilon(t) \le 0$ on $[t_0, \infty)$ and $\dot\varepsilon(t) \to 0$ as t → ∞,
(iii) $\int_{t_0}^{\infty} \varepsilon(t) \, dt = \infty$.

Then, the following statements hold:

(a) For each pair $(p_0, \dot p_0) \in L^2(\Omega_0) \times L^2(\Omega_0)$, there exists a unique solution $p^\delta : [t_0, T(\delta)] \to L^2(\Omega_0)$ of the Cauchy problem

$$\begin{cases} \ddot p(x, t) + \eta \dot p(x, t) + w^\delta_{im}(x, t) + \varepsilon(t) p(x, t) = 0, & x \in \Omega_0, \; t \in [t_0, T(\delta)], \\ p(x, t_0) = p_0(x), \; \dot p(x, t_0) = \dot p_0(x), & x \in \Omega_0, \end{cases} \tag{78}$$

where $w^\delta = w^\delta_{re} + i w^\delta_{im}$ is the solution of the adjoint problem with the same t

$$\begin{cases} -\Delta w(x, t) + c w(x, t) = u^\delta_{im}(p^\delta(x, t)), & x \in \Omega, \; t \in [t_0, T(\delta)], \\ \dfrac{\partial w(x, t)}{\partial n} + i w(x, t) = 0, & x \in \partial\Omega, \; t \in [t_0, T(\delta)], \end{cases} \tag{79}$$

and $u^\delta = u^\delta_{re} + i u^\delta_{im}$ is the solution of the BVP

$$\begin{cases} -\Delta u(x, t) + c u(x, t) = p^\delta(x, t) \chi_{\Omega_0}, & x \in \Omega, \; t \in [t_0, T(\delta)], \\ \dfrac{\partial u(x, t)}{\partial n} + i u(x, t) = g_2^\delta(x) + i g_1^\delta(x), & x \in \partial\Omega, \; t \in [t_0, T(\delta)]. \end{cases} \tag{80}$$

(b) For any fixed $t \in [t_0, T(\delta)]$, $p^\delta(t) \to p(t)$ in $L^2(\Omega_0)$ as δ → 0. Here, p(t) denotes the solution of the system (78), (79), and (80) with noise-free boundary data.
(c) Both $\dot p^\delta(T(\delta))$ and $\ddot p^\delta(T(\delta))$ converge to zero in $L^2(\Omega_0)$ when the noise level δ vanishes.
(d) Let $p^*$ be the minimal $L^2(\Omega_0)$-norm solution of problem (72) with noise-free boundary data; then $p^\delta(T(\delta)) \to p^*$ in $L^2(\Omega_0)$ as δ → 0.

The proof of Theorem 9 can be found in Zhang et al. (2018a). A dynamical regularization parameter satisfying the conditions of Theorem 9 can, e.g., be chosen as one of the following functions: ε(t) = C/t, C/log(t), C/(t log(t)), etc., where C is a constant, and we choose $t_0$ large enough so that the value of ε(t) is restricted to (0, 1]. The constant C does not have an effect on the value of the approximate solution. However, it influences the speed of the numerical solver. With an appropriate value of C, one may obtain a good enough approximate solution by our algorithm within a few iterations. By using standard linear finite elements on a triangular mesh with size h on the weak formulations of (79) and (80), we obtain discrete approximations $u^h(x)$, $w^h(x)$ with corresponding imaginary parts $u^h_{im}(x)$, $w^h_{im}(x)$. Further, we assume that $p^h(x)$ is the finite element ansatz in the same finite element space but restricted to $\Omega_0$. The finite element approximations introduced are now all assumed to be time dependent in order to formulate our semi-discrete dynamical system corresponding to (78). By defining $q^h(x, t) = \dot p^h(x, t)$, we can formulate the semi-discrete dynamical system as the first-order system

$$\begin{cases} \dot q^h = -\eta q^h - \varepsilon p^h - w^h_{im} \chi_{\Omega_0}, \\ \dot p^h = q^h, \\ p^h(t_0) = p_0^h, \; q^h(t_0) = \dot p_0^h. \end{cases} \tag{81}$$

We then apply the symplectic Euler method to get

$$\begin{cases} q^h_{k+1} = q^h_k - \Delta t \left( \eta q^h_{k+1} + \varepsilon_k p^h_k + w^h_{im}(p^h_k) \chi_{\Omega_0} \right), \\ p^h_{k+1} = p^h_k + \Delta t \, q^h_{k+1}, \\ q^h(t_0) = \dot p_0^h, \; p^h(t_0) = p_0^h, \end{cases} \tag{82}$$

where $p^h_k = p^h(t_k)$ and $\Delta t$ is a fixed time step size.

Theorem 10. Let $p^h_k$ be the finite element solution of the scheme (82) with noisy boundary data and $p^*$ be the minimal $L^2(\Omega_0)$-norm solution of the noise-free problem (72). If the iteration number of scheme (82) satisfies $k(\Delta t, \delta) \Delta t \delta \to \infty$ as $(\delta, \Delta t) \to 0$, then $\| p^h_{k(\Delta t, \delta)} - p^* \|_{0, \Omega_0} \to 0$ as $(\delta, \Delta t, h) \to 0$. Here δ is the noise level of the boundary data, and $\Delta t$ and h are the discretization parameters for problems (78), (79), and (80) in time and in space, respectively.
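The structure of scheme (82) can be illustrated on a small algebraic stand-in. In the sketch below the finite element solves for $w^h_{im}$ are replaced by the gradient of a hypothetical least-squares functional $\frac12 \|Ap - b\|^2$ with an underdetermined matrix A, so the example only shows the time stepping with a decaying regularization parameter, not the PDE-constrained problem. The matrix, the step size, and the function name are assumptions.

```python
import numpy as np


def damped_dynamical_regularization(A, b, dt=0.1, eta=1.0, C=0.1, t0=2.0,
                                    n_steps=2000):
    """Symplectic Euler for p'' + eta p' + eps(t) p + grad V(p) = 0."""
    p = np.zeros(A.shape[1])
    q = np.zeros_like(p)
    t = t0
    for _ in range(n_steps):
        eps = C / (t * np.log(t))       # one of the admissible choices of eps(t)
        grad = A.T @ (A @ p - b)        # stand-in for w_im^h(p_k) chi
        # The damping is treated implicitly, as in the first line of the scheme.
        q = (q - dt * (eps * p + grad)) / (1.0 + dt * eta)
        p = p + dt * q
        t += dt
    return p


# Hypothetical underdetermined problem: 3 equations, 6 unknowns.
A = np.array([[1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 3.0, 0.0, 0.0, 1.0]])
b = A @ np.ones(6)
p_computed = damped_dynamical_regularization(A, b)
p_min_norm = np.linalg.pinv(A) @ b      # minimal-norm solution for comparison
```

Starting from $p_0 = \dot p_0 = 0$, the iterates approach the minimal-norm solution, mirroring statement (d) of Theorem 9 in this toy setting.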

Numerical Simulations

We let $\Omega = \{(x, y, z) \in \mathbb{R}^3 \mid x^2 + y^2 < 1, \; 0 < z < 2\}$, c = 1, the true source $p^*(x, y, z) = 10(1 + x + y + z)$ be defined on $\Omega_0 = \{(x, y, z) \in \Omega \mid (x - 0.5)^2 + (y - 0.5)^2 + (z - 1)^2 \le 0.3^2\}$, and the noise-free Neumann data be $g_2 = 0$. We solve (73) with $p = p^*$ in order to produce $g_1$. It was computed on a fine mesh with h = 0.1050, 55637 nodes, and 316565 elements. Then uniformly distributed noise with level δ is added to both $g_1$ and $g_2$ to get $g_1^\delta$ and $g_2^\delta$, that is,

$$g_i^\delta(x, y, z) = [1 + \delta \cdot (2 \, \mathrm{rand}(x, y, z) - 1)] \, g_i(x, y, z), \quad (x, y, z) \in \partial\Omega, \quad i = 1, 2,$$

where rand(x, y, z) returns a pseudorandom value drawn from a uniform distribution on [0, 1] for each fixed (x, y, z). With properly chosen parameters η, $\Delta t$, ε(t), the scheme (82) was implemented to get $p^h$, a stable approximation of $p^*$. We chose $t_0 = 2$ and $p_0 = \dot p_0 = 0$, which is far from the true source $p^*$. With our method, the approximate source function $p^h$ is recovered on a mesh with h = 0.4068, 1077 nodes, and 5157 elements for various values of the noise level δ, with $\Delta t = 10$, η = 1, ε(t) = 0.1/(t ln(t)). The approximate source functions are plotted in Figs. 7 and 8. The corresponding relative errors L2Err and iteration numbers are 7.8728e-2, 7.8381e-2, 7.9860e-2, 7.7507e-2, 8.6974e-2, 7.0421e-2, and 1.0757e-1 and 5760, 5936, 6261, 3805, 1651, 456, and 494, respectively. We conclude from these numerical results that the reconstruction is accurate and stable. Moreover, the regularized solution is easily attained without having to use any additional equation for the regularization parameter.
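The multiplicative noise model above is straightforward to reproduce. A minimal sketch, with a hypothetical function name and hypothetical sample boundary values, is:

```python
import numpy as np


def add_relative_uniform_noise(g, delta, rng):
    """Return [1 + delta * (2 * rand - 1)] * g with rand uniform on [0, 1)."""
    return (1.0 + delta * (2.0 * rng.random(g.shape) - 1.0)) * g


rng = np.random.default_rng(0)
g = np.linspace(-1.0, 1.0, 11)  # hypothetical boundary samples
g_noisy = add_relative_uniform_noise(g, 0.05, rng)
g_clean = add_relative_uniform_noise(g, 0.0, rng)
```

Because the perturbation is relative, each sample changes by at most a fraction δ of its own magnitude, and data that vanish (such as $g_2 = 0$) stay unchanged.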

Applications in Quantum Physics

A quantum particle can be described through a complex wave function that solves the Schrödinger equation, which in the DFPM formulation is a time- and space-dependent linear PDE. For many interacting quantum particles, the dimensionality of the corresponding Schrödinger equation normally makes it unsuitable for direct computations. However, in various situations so-called mean-field methods can yield an approximate wave function through the solution of a nonlinear Schrödinger equation. In the following sections, we present three such examples. Note that the notation in this section is not standard in quantum physics, where E usually denotes the total energy and V the potential in the Hamiltonian.

Excited States to the Schrödinger Equation

We start with an example of how to calculate the well-known normalized wave functions (eigenstates) u and energies (eigenvalues) μ to the linear Schrödinger equation with a harmonic potential (here in dimensionless units)

$$-\frac{1}{2} u_{xx} + \frac{1}{2} x^2 u = \mu u, \quad \| u \|_{L^2(\mathbb{R})} = 1. \tag{83}$$

We note that (83) is an infinite dimensional counterpart of the linear eigenvalue problem discussed in section “Linear Eigenvalue Problems”, where the energy level μ corresponds to the unknown positive eigenvalue. That is, we are solving the following optimization problem

$$\min_u V(u) = \frac{1}{2} (u, H u)_{L^2(\mathbb{R})}, \quad H = -\frac{1}{2} \frac{d^2}{dx^2} + \frac{1}{2} x^2, \quad \text{s.t. } \| u \|_{L^2(\mathbb{R})} = 1.$$

We emphasize that even if (83) is a simple problem with an exact solution (see below), it has all the relevant properties of more challenging eigenvalue problems, such as the ones treated in Gulliksson et al. (2012), and is thus excellent for illustrating DFPM.

Fig. 7 Exact source $p^*$ and reconstructed source $p^h$ for δ = 0, 0.005, 0.01

Fig. 8 Reconstructed source $p^h$ for δ = 0.05, 0.1, 0.2, 0.3

Let $u^{(m)}$ denote the m'th eigenstate and $\mu^{(m)} = (u^{(m)}, H u^{(m)})$ the corresponding eigenvalue. To obtain $u^{(m)}$ we use, in addition to (83), the m − 1 orthogonality constraints

$$\int_{\mathbb{R}} \bar u^{(1)} u^{(m)} \, dx = 0, \quad \int_{\mathbb{R}} \bar u^{(2)} u^{(m)} \, dx = 0, \quad \ldots, \quad \int_{\mathbb{R}} \bar u^{(m-1)} u^{(m)} \, dx = 0, \tag{84}$$

where, as before, the bar denotes complex conjugation. Using (83) and (84) we formulate the dynamical system and the m constraints, compactly written using the Kronecker delta, that is,

$$\ddot u^{(m)} + \eta \dot u^{(m)} = \frac{1}{2} u^{(m)}_{xx} - \frac{1}{2} x^2 u^{(m)} + \mu^{(m)} u^{(m)}, \quad g_n = \int_{\mathbb{R}} \bar u^{(n)} u^{(m)} \, dx - \delta_{nm} = 0, \quad n = 1, 2, \ldots, m. \tag{85}$$

Since one needs access to $u^{(n)}$, n = 1, 2, . . . , m − 1, we solve (85) in consecutive order. Here we have used the damped constrained approach described in section “Linear Eigenvalue Problems” with symplectic Euler, where the parameters $\Delta t$ and η were chosen according to (25). Equation (83) possesses the explicit solutions

$$u^{(n)} = \frac{1}{\pi^{1/4} \sqrt{2^{(n-1)} (n-1)!}} H_{n-1}(x) \exp\left( -\frac{x^2}{2} \right), \quad \mu^{(n)} = n - \frac{1}{2}, \quad n = 1, 2, 3, \ldots, \tag{86}$$

where $H_k$ denotes the Hermite polynomial of degree k. In the left plot of Fig. 9 we plot the first five wave functions $u^{(n)}$, n = 1, 2, . . . , 5 for the harmonic potential. The wave functions are sorted according to their energy $\mu^{(n)} = n - 1/2$, which is reflected by the inequality $u^{(1)}(|5|) < \ldots < u^{(5)}(|5|)$. We used −20 ≤ x ≤ 20 and $\Delta x = 10^{-2}$ for the discretization. The parameters η and $\Delta t$ were chosen optimally according to (25). We used the damped constraint approach (see section “Linear Eigenvalue Problems”) with $k_1 = k_2 = \ldots = k_5 = 5$. In the right plot we give examples of the convergence of the corresponding energies $\mu^{(n)}$ as t → ∞.

Fig. 9 Left figure: The five lowest eigenstates $u^{(n)}$, n = 1, 2, . . . , 5 of (83) sorted by their respective energy $\mu^{(n)}$. Dotted lines correspond to the numerical solution and the rings to the analytic solutions (86). The dashed black curve shows the potential $x^2/2$. Right figure: The convergence of the numerically calculated five energies $\mu^{(n)}(t)$, n = 1, 2, . . . , 5
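For the ground state (m = 1) the damped dynamical system (85) can be tried out with a few lines of NumPy. The sketch below is a simplified variant and not the damped constraint approach of section “Linear Eigenvalue Problems”: the normalization is enforced by rescaling after every step, the grid is coarser and shorter than in the text, and the values of `dt` and `eta` are hypothetical hand-picked numbers rather than the optimal ones from (25).

```python
import numpy as np


def harmonic_hamiltonian(x):
    """Finite-difference matrix for H = -0.5 d^2/dx^2 + 0.5 x^2."""
    dx = x[1] - x[0]
    n = x.size
    second_diff = (-2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / dx**2
    return -0.5 * second_diff + np.diag(0.5 * x**2)


def dfpm_ground_state(H, x, dt=0.05, eta=1.5, n_steps=3000):
    """Symplectic Euler on u'' + eta u' = -(H - mu) u with renormalization."""
    dx = x[1] - x[0]
    u = np.exp(-x**2)                    # rough initial guess
    u /= np.sqrt(dx * (u @ u))           # discrete L2 normalization
    v = np.zeros_like(u)
    for _ in range(n_steps):
        mu = dx * (u @ (H @ u))          # current energy estimate (u, H u)
        v = (1.0 - dt * eta) * v - dt * (H @ u - mu * u)
        u = u + dt * v
        u /= np.sqrt(dx * (u @ u))       # simplification: rescale each step
    return u, dx * (u @ (H @ u))


x = np.linspace(-8.0, 8.0, 161)
H = harmonic_hamiltonian(x)
u_ground, mu_ground = dfpm_ground_state(H, x)
u_exact = np.pi ** -0.25 * np.exp(-x**2 / 2.0)  # n = 1 case of the explicit solution
```

The computed energy should approach $\mu^{(1)} = 1/2$ up to the discretization error of the grid, and the wave function should approach the Gaussian given by the explicit solution.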

The Yrast Spectrum for Atoms Rotating in a Ring In a recent development of DFPM, we investigated a one-dimensional nonlinear Schrödinger equation with a rotational term on a ring geometry with radius R = 1, i.e., I = {x ∈ R : −π < x ≤ π} with periodic boundary conditions. The aim is to minimize π |∇u|2 + γ π |u|4 dx,

V = −π

subject to the constraints for normalization and the total angular momentum

(87)

$$g_1 = \int_{-\pi}^{\pi} |u|^2\, dx - 1 = 0, \quad g_2 = -i \int_{-\pi}^{\pi} \bar{u}\, u_x\, dx - \ell = 0. \tag{88}$$

Fig. 10 Yrast curve, i.e., energy vs momentum, with some examples of the density and the phase for the wavefunction $u$ for $\gamma = 7.5$

Together with the normalization and the momentum constraint, the equation is given as (Sandin et al. 2016)

$$\ddot{u} + \eta \dot{u} = u_{xx} - 2\pi\gamma |u|^2 u - i\Omega u_x + \mu u, \quad u \in L^2(I). \tag{89}$$

In this case DFPM was implemented with a modified RATTLE method (Andersen, 1983), in which we solved for the Lagrange parameters $\mu$ and $-\Omega$ in each time step (see Sandin et al. 2016 for details). In Fig. 10 we have plotted the resulting so-called Yrast curve (main figure) with the density and phase of the corresponding complex wave function $u$ for the particularly interesting points $\ell = 0, 0.5, 1$ (inset figures). At integer values of $\ell$ (0 and 1 here), $u$ is a plane wave, $u = \exp(i\ell x)/\sqrt{2\pi}$. At half-integer values (e.g., $\ell = 0.5$), $u$ corresponds to a dark soliton that circulates in the ring (see the upper-right and lower-middle inset figures). In Sandin et al. (2016) it was demonstrated that DFPM outperformed another commonly used first-order method.
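As a small numerical check of the plane-wave statement, the sketch below evaluates the two constraints and the energy on a periodic grid. It assumes the readings $g_1 = \int |u|^2 dx - 1$, $g_2 = -i\int \bar{u} u_x dx - \ell$ and $V = \int (|\nabla u|^2 + \gamma\pi|u|^4) dx$ for (87) and (88), and writes $\ell$ (`ell` in the code) for the total angular momentum; the symbol and the function name `constraints_and_energy` are choices made here for illustration. Under these assumptions a plane wave with integer $\ell$ satisfies both constraints and has energy $\ell^2 + \gamma/2$.

```python
import numpy as np


def constraints_and_energy(u, dx, gamma, ell):
    """Return (g1, g2, V) for a periodic wave function sampled with spacing dx."""
    ux = (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)   # periodic central difference
    g1 = np.sum(np.abs(u) ** 2) * dx - 1.0               # normalization constraint
    g2 = np.real(-1j * np.sum(np.conj(u) * ux) * dx) - ell   # angular momentum constraint
    energy = np.sum(np.abs(ux) ** 2 + gamma * np.pi * np.abs(u) ** 4) * dx
    return g1, g2, energy


n = 2000
dx = 2.0 * np.pi / n
x = -np.pi + dx * np.arange(1, n + 1)    # grid on (-pi, pi]
gamma = 7.5                               # value quoted in the caption of Fig. 10

results = {}
for ell in (0, 1):
    u = np.exp(1j * ell * x) / np.sqrt(2.0 * np.pi)   # plane wave
    results[ell] = constraints_and_energy(u, dx, gamma, ell)
    print(ell, results[ell])
```

The small residual in `g2` for `ell = 1` comes from the finite-difference derivative and shrinks as the grid is refined.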

Phase Separation of Bosonic- and Fermionic-Densities in an Ultracold Atomic Mixture

In a recent work (Abdullaev et al., 2018) DFPM was used to calculate the initial conditions for a real-time propagation of coupled nonlinear Schrödinger-like equations to model a so-called Fermi-Bose mixture. This is an exotic state of matter consisting of two different interacting atomic species at ultracold temperatures. The aim is to minimize

$$V_{tot} = \int \left[ \frac{g_1}{2}|u_1|^4 + \frac{\pi^2}{12}|u_2|^6 + (g_{12} + g_{21})|u_2|^2|u_1|^2 + \sum_{n=1,2} \left( |\nabla u_n|^2 + V_n^{(ext)}|u_n|^2 \right) \right] dx, \tag{90}$$

subject to

$$g_n = \int |u_n|^2\, dx - N_{n,0}, \quad n = 1, 2, \tag{91}$$

where $N_{1,0} = 1000$ and $N_{2,0} = 200$ are the numbers of atoms and $V_{1,2}^{(ext)}$ are the external potentials, which in this example were harmonic, $V_{1,2}^{(ext)} = x^2/4$; finally, we kept the bosonic coupling $g_1 = 1$ and varied the interatomic interaction $g_{12} = g_{21}$ in the numerical example below. In order to solve the problem above, we formulated the coupled dynamical system

$$\begin{aligned}
\ddot{u}_1 + \eta_1 \dot{u}_1 &= \left( \nabla^2 - V_1^{(ext)} - g_1|u_1|^2 - g_{12}|u_2|^2 + \mu_1 \right) u_1, \\
\ddot{u}_2 + \eta_2 \dot{u}_2 &= \left( \nabla^2 - V_2^{(ext)} - \frac{\pi^2}{4}|u_2|^4 - g_{21}|u_1|^2 + \mu_2 \right) u_2,
\end{aligned} \tag{92}$$

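where $\mu_1$ and $\mu_2$ are the Lagrange parameters. Using the dynamical formulation for the constraints (see (11)), we evaluate $\mu_n(t)$, $n = 1, 2$, at each iteration step $t_k$ from

$$\mu_1 = \frac{\displaystyle\int \left( |\nabla u_1|^2 + V_1^{(ext)}|u_1|^2 + g_1|u_1|^4 + g_{12}|u_2|^2|u_1|^2 - |\dot{u}_1|^2 \right) dx - \frac{k_1}{2}\left( \int |u_1|^2\, dx - N_{1,0} \right)}{\displaystyle\int |u_1|^2\, dx}, \tag{93}$$

and

$$\mu_2 = \frac{\displaystyle\int \left( |\nabla u_2|^2 + V_2^{(ext)}|u_2|^2 + \frac{\pi^2}{4}|u_2|^6 + g_{21}|u_1|^2|u_2|^2 - |\dot{u}_2|^2 \right) dx - \frac{k_2}{2}\left( \int |u_2|^2\, dx - N_{2,0} \right)}{\displaystyle\int |u_2|^2\, dx}. \tag{94}$$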
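To make the evaluation of the Lagrange parameter concrete, here is a hedged sketch of how $\mu_1$ could be computed on a grid. It assumes that (93) reads: the integral of $|\nabla u_1|^2 + V_1^{(ext)}|u_1|^2 + g_1|u_1|^4 + g_{12}|u_2|^2|u_1|^2 - |\dot{u}_1|^2$, minus $(k_1/2)(\int|u_1|^2 dx - N_{1,0})$, all divided by $\int |u_1|^2 dx$. The function name `mu_1` and the test states are illustrative and are not taken from the XMDS2 scripts. In the non-interacting limit with $V^{(ext)} = x^2/4$, the stationary state $\propto \exp(-x^2/4)$ gives $\mu_1 = 1/2$ when the norm constraint is met, and $1/2 - k_1/4$ when the norm is twice the target.

```python
import numpy as np


def mu_1(u1, u2, u1_dot, v_ext, dx, g1, g12, k1, n_target):
    """Lagrange parameter for species 1, following the assumed reading of (93)."""
    du1 = np.gradient(u1, dx)                       # second-order finite differences
    norm = np.sum(np.abs(u1) ** 2) * dx
    integrand = (np.abs(du1) ** 2
                 + v_ext * np.abs(u1) ** 2
                 + g1 * np.abs(u1) ** 4
                 + g12 * np.abs(u2) ** 2 * np.abs(u1) ** 2
                 - np.abs(u1_dot) ** 2)
    return (np.sum(integrand) * dx - 0.5 * k1 * (norm - n_target)) / norm


x = np.linspace(-12.0, 12.0, 2401)
dx = x[1] - x[0]
v_ext = x**2 / 4.0                                   # harmonic potential from the text
n_target = 1000.0                                    # N_{1,0}
shape = (2.0 * np.pi) ** -0.25 * np.exp(-x**2 / 4.0) # unit-norm state with (-d2/dx2 + x^2/4) u = u/2
zero = np.zeros_like(x)

# Non-interacting test cases (g1 = g12 = 0, system at rest), with k1 = 1.
mu_on = mu_1(np.sqrt(n_target) * shape, zero, zero, v_ext, dx, 0.0, 0.0, 1.0, n_target)
mu_off = mu_1(np.sqrt(2.0 * n_target) * shape, zero, zero, v_ext, dx, 0.0, 0.0, 1.0, n_target)
print(mu_on, mu_off)
```

The second case mimics the initial normalization $2N_{n,0}$ used later in the text: the constraint term then lowers $\mu_1$, which is what drives the norm back toward its target in the dynamical formulation.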

For the simulations we used a fourth-order Runge-Kutta method in the XMDS2 code generator (www.xmds.org) with $\eta_1 = \eta_2 = k_1 = k_2 = 1$ and $\Delta t_k = 0.01$. We have studied the atomic densities $|u_1|^2$ and $|u_2|^2$ of the mixture for different values of $g_{12} = g_{21} = \mp\pi$ (see Fig. 11). For attractive interatomic forces, $g_{12} = g_{21} < 0$, the densities overlap in order to minimize the total energy (see the left plot in Fig. 11). For repulsive interatomic forces, $g_{12} = g_{21} > 0$, the densities instead (partly) separate from each other in order to minimize the total energy (see the right plot in Fig. 11).


Fig. 11 Ground-state densities for an ultracold Fermi-Bose mixture for two values of the interatomic interaction: the attractive interaction $g_{12} = -\pi$ (left) and the repulsive interaction $g_{12} = \pi$ (right). In both figures the blue curve shows the bosonic density, and the red dashed curve shows the fermionic density

Fig. 12 The left figure shows an example of convergence for the total energy (blue curve) and normalization constraints (red and green dashed curves). The right figure shows a landscape for the inverse number of iterations for different parameter values. The thick black curve corresponds to the result using the projection method (see text)

In Fig. 12 (left) we have plotted the total energy $E_{tot}$ and the dynamical constraints $g_1(t)$ and $g_2(t)$ at each iteration step to demonstrate convergence while keeping the parameters constant, $\eta_1 = \eta_2 = k_1 = k_2 = 1$. We then performed a numerical (sub-)optimization of the parameters $\eta_1 = \eta_2$ and $k_1 = k_2$ to minimize the number of iterations needed for DFPM to reach the ground state. The corresponding landscape for the inverse number of iterations needed to reach $|E_{tot} - E_{ref}| < 10^{-8}$ is shown in the right part of Fig. 12. Here $E_{ref}$ is the ground-state energy, calculated numerically for a large time (i.e., formally $t \to \infty$). The corresponding result using projection instead of dynamic constraints for the two normalization conditions, i.e., with only $\eta_1 = \eta_2$ as the free parameter, was also calculated and is plotted in the right part of Fig. 12. In some sense this projection corresponds to the limit $k_1 = k_2 \to \infty$, and qualitative agreement can be seen, especially in the right part of the figure, corresponding to the overdamped regime. The resolution of the parameters in the right figure is $\Delta\eta_1 = \Delta k_1 = 0.1$. In all cases we used Gaussian initial conditions for $u_n$, $n = 1, 2$, with initial normalization $\int |u_n|^2\, dx = 2N_{n,0}$, $n = 1, 2$ (in order to have any substantial dynamics for the constraints), and interatomic interaction $g_{12} = -\pi$. For this example with only normalization constraints, we cannot conclude that the DFPM version with dynamic constraints has an advantage over DFPM with projection with respect to the number of iterations. However, it is generally known that projection requires a smaller time step to maintain stability.

Conclusions and Future Work

We have described a new approach, DFPM, to solve optimization problems and equations using a second-order damped dynamical system together with symplectic methods. The strength of the method lies in the combination of a globally exponentially stable system with an energy-preserving symplectic method. Based on the work presented here, we believe that DFPM has the capacity to solve a variety of problems more efficiently and accurately than existing methods. This is shown to be true for linear problems such as linear systems of equations, linear eigenvalue problems, and linear least squares problems (see section “Linear Problems”), where it is fairly easy to choose the parameters in DFPM efficiently.

A straightforward extension of the linear eigenvalue problems in section “Linear Eigenvalue Problems” is to consider the generalized eigenvalue problem $Au = \lambda Bu$, where $A$, $B$ are positive definite matrices. Optimal choices of parameters can be derived by a simultaneous transformation of $A$, $B$ to diagonal matrices and then estimated numerically. The corresponding Rayleigh quotient is easily calculated for finding the approximation of the generalized eigenvalue in each iteration. A major challenge here is to approximate the optimal parameters efficiently. However, because of its simplicity, we believe that DFPM might be more useful for nonlinear eigenvalue problems $A(\lambda)u = 0$, where $A(\lambda)$ most often is a polynomial, a rational function, or contains the exponential function.

For ill-posed linear problems, considered in section “Ill-Posed Problems,” the method has the advantage of being highly competitive with other existing iterative methods, as well as not requiring the a priori choice of a regularization parameter that is needed in, e.g., Tikhonov regularization methods. It would be most interesting to further develop DFPM toward nonlinear ill-posed problems following the approach we have developed for the linear case.
It will be a challenge to find good convergence criteria for the iterations as well as an efficient and robust way of choosing the damping parameter.

Regarding inequality constraints, we are currently considering using reflections for simple bound constraints based on the idea of reflecting particles (see Kaufman et al. 2012). A major challenge here is to construct the reflections such that the numerical method for solving the damped dynamical system will conserve the energy without losing accuracy. As a simple example we can take the optimization problem $\min V(u)$ with positivity constraints $u > 0$. In step $k$ of the numerical method for solving (4), we attain an updated position $u_{k+1}$. If any component of $u_{k+1}$, say $u_i^{(k+1)}$, is nonpositive, a new update is defined for that component by finding the reflection at the surface $u_i = 0$, using the direction of $v_i^{(k+1)}$ to define the angle of reflection $\alpha$ (see Fig. 13). The energy conservation is maintained by adjusting the distance between $u_{k+1}$ and the reflection point (however, there are other ways of conserving the energy).

For nonlinear problems we presented in section “From Linear to Nonlinear Problems” two different approaches, based on the Lyapunov function and on a local linearization, that both can be made convergent for convex problems. We showed preliminary numerical results by a simple example. With more future research using ideas from quasi-Newton methods and higher-order symplectic solvers, we think DFPM will be a competitive alternative for these problems.

By using negative damping, it is possible to add energy to the dynamical system, forcing the iterates to leave any vicinity of a minimum. This can be used to solve a global optimization problem. To illustrate the idea, we used DFPM with symplectic Euler to find the two minima of the peaks function

$$V(u) = 3(1 - u_1)^2 e^{-u_1^2 - (u_2 + 1)^2} - 10\left( \frac{u_1}{5} - u_1^3 - u_2^5 \right) e^{-u_1^2 - u_2^2} - \frac{1}{3} e^{-(u_1 + 1)^2 - u_2^2},$$

see Fig. 14. In Fig. 15 the upper curve shows the norm of the gradient and the lower curve the size of the damping (note the negative damping). The starting point is chosen close to the minimum (0.22826, −1.6255), and DFPM converges initially toward that minimum. The damping is then switched to a negative value increasing the norm of the gradient, and the iterates depart from the minimum. When the damping is switched back to a positive value, DFPM converges toward the second local minimum situated approximately at (−1.3474, 0.20450). We have not tried to find any optimal parameters in this simple example which explains the rather slow convergence.
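The sketch below applies a damped second-order iteration with symplectic Euler to the peaks function (written here in its standard form) and checks that, with positive damping only, iterates started near each of the two quoted minima settle on them. It does not reproduce the switching to negative damping shown in Fig. 15; the values of `eta`, `dt`, the starting points, and the finite-difference gradient are assumptions made for illustration, not the settings used in the chapter.

```python
import numpy as np


def peaks(u):
    """The peaks function V(u) for u = (u1, u2)."""
    u1, u2 = u
    return (3.0 * (1.0 - u1) ** 2 * np.exp(-u1**2 - (u2 + 1.0) ** 2)
            - 10.0 * (u1 / 5.0 - u1**3 - u2**5) * np.exp(-u1**2 - u2**2)
            - np.exp(-(u1 + 1.0) ** 2 - u2**2) / 3.0)


def grad(u, h=1e-6):
    """Central finite-difference gradient (keeps the sketch short)."""
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2)
        e[i] = h
        g[i] = (peaks(u + e) - peaks(u - e)) / (2.0 * h)
    return g


def damped_descent(u0, eta=2.0, dt=0.01, steps=5000):
    """Symplectic Euler for u'' + eta u' = -grad V(u), starting at rest."""
    u = np.array(u0, dtype=float)
    v = np.zeros(2)
    for _ in range(steps):
        v += dt * (-eta * v - grad(u))   # velocity update with damping
        u += dt * v                      # position update with the new velocity
    return u


first = damped_descent((0.30, -1.55))    # start near (0.22826, -1.6255)
second = damped_descent((-1.30, 0.25))   # start near (-1.3474, 0.20450)
print(first, peaks(first))
print(second, peaks(second))
```

With a fixed positive damping the iterate stays in the basin where it starts, which is why the chapter's example needs a phase of negative damping to move from one minimum to the other.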

Fig. 13 Reflection on the surface ui = 0 where uk , vk are current approximations of the position and velocity, u(tk ), v(tk ). The updated approximate position and velocity are uk+1 , vk+1 , and α is the angle of reflection with respect to the surface ui = 0


Fig. 14 The peaks function


Fig. 15 The upper curve shows the convergence of DFPM and the lower curve the size of the damping when finding the minima of the peaks function

References

Abdullaev F Kh, Ögren M, Sørensen MP (2018, Submitted) Collective dynamics of Fermi-Bose mixtures with an oscillating scattering length
Afraites L, Dambrine M, Kateb D (2007) Conformal mappings and shape derivatives for the transmission problem with a single measurement. Numer Func Anal Opt 28:519–551
Alvarez F (2000) On the minimizing property of a second order dissipative system in Hilbert spaces. SIAM J Control Opt 38(4):1102–1119
Alvarez F, Attouch H, Bolte J, Redont P (2002) A second-order gradient-like dissipative dynamical system with Hessian-driven damping. J de Mathématiques Pures et Appliquées 81(8):747–779
Andersen HC (1983) Rattle: a velocity version of the shake algorithm for molecular dynamics calculations. J Comput Phys 52:24–34
Ascher U, van den Doel K, Huang H (2007) Artificial time integration. BIT 47:3–25
Attouch H, Alvarez F (2000) The heavy ball with friction dynamical system for convex constrained minimization problems. Lect Notes Econ Math Syst 481:25–35
Attouch H, Chbani Z (2016) Combining fast inertial dynamics for convex optimization with Tikhonov regularization. 39(2). arXiv:1602.01973


Attouch H, Goudou X, Redont P (2000) The heavy ball with friction method, I. The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system. Commun Contemp Math 2(1):1–34
Baravdish G, Svensson O, Åström F (2015) On backward p(x)-parabolic equations for image enhancement. Numer Funct Anal Optim 36(2):147–168
Baravdish G, Svensson O, Gulliksson M, Zhang Y (2018) A damped flow for image denoising. ArXiv e-prints
Begout P, Bolte J, Jendoubi M (2015) On damped second-order gradient systems. J Differ Equ 259(7):3115–3143
Bertsekas DP (2015) Convex optimization algorithms. Athena Scientific, Belmont
Bhatt A, Floyd D, Moore BE (2016) Second order conformal symplectic schemes for damped Hamiltonian systems. J Sci Comput 66(3):1234–1259
Chadan K, Colton D, Paivarinta L, Rundell W (1997) An introduction to inverse scattering and inverse spectral problems. SIAM, Philadelphia
Cheng X, Gong R, Han W, Zheng W (2014) A novel coupled complex boundary method for inverse source problems. Inverse Prob 30:055002
Cheng X, Lin G, Zhang Y, Gong R, Gulliksson M (2018) A modified coupled complex boundary method for an inverse chromatography problem. J Inverse Ill-Posed Prob 26:33–49
Chu MT (2008) Numerical linear algebra algorithms as dynamical systems. Acta Numer 17:1–86
Edvardsson S, Neuman M, Edström P, Olin H (2015) Solving equations through particle dynamics. Comput Phys Commun 197:169–181
Engl H, Hanke M, Neubauer A (1996) Regularization of inverse problems, vol 375. Springer, New York
Gulliksson M (2017) The discrete dynamical functional particle method for solving constrained optimization problems. Dolomites Res Notes Approx 10:6–12
Gulliksson M, Edvardsson S, Persson J (2012) The dynamical functional particle method: an approach for boundary value problems. J Appl Mech 79(2):021012
Gulliksson M, Edvardsson S, Lind A (2013) The dynamical functional particle method. ArXiv e-prints
Gulliksson M, Holmbom A, Persson J, Zhang Y (2018) A separating oscillation method of recovering the g-limit in standard and non-standard homogenization problems. Inverse Prob 32:025005
Hairer E, Lubich C, Wanner G (2006) Geometric numerical integration, 2nd edn. Springer, Berlin/Heidelberg
Han W, Cong W, Wang G (2006) Mathematical theory and numerical analysis of bioluminescence tomography. Inverse Prob 22:1659–1675
Kaltenbacher B, Neubauer A, Scherzer O (2008) Iterative regularization methods for nonlinear ill-posed problems. Walter de Gruyter GmbH & Co. KG, Berlin
Karafyllis I, Grüne L (2013) Lyapunov function based step size control for numerical ODE solvers with application to optimization algorithms. In: Hüper K, Trumpf J (eds) Mathematical system theory – festschrift in honor of Uwe Helmke on the occasion of his 60th birthday. CreateSpace, pp 183–210. http://num.math.uni-bayreuth.de/de/publications/2013/gruene_karafyllis_lyapunov_function_based_step_size_control_2013/index.html
Kaufman D, Pai D (2012) Geometric numerical integration of inequality constrained, nonsmooth Hamiltonian systems. SIAM J Sci Comput 34(5):A2670–A2703
Lin G, Cheng X, Zhang Y (2018a) A parametric level set based collage method for an inverse problem in elliptic partial differential equations. J Comput Appl Math 340:101–121
Lin G, Zhang Y, Cheng X, Gulliksson M, Forssen P, Fornstedt T (2018b) A regularizing Kohn-Vogelius formulation for the model-free adsorption isotherm estimation problem in chromatography. Appl Anal 97:13–40
Lions J, Magenes E (1972) Non-homogeneous boundary value problems and applications, vol I. Springer, Berlin
McLachlan RI, Quispel GRW (2006) Geometric integrators for ODEs. J Phys A 39:5251–5285


McLachlan R, Modin K, Verdier O, Wilkins M (2014) Geometric generalisations of shake and rattle. Found Comput Math J Soc Found Comput Math 14(2):339
Nesterov Y (1983) A method of solving a convex programming problem with convergence rate. Sov Math Doklady 27:372–376
Neubauer A (2000) On Landweber iteration for nonlinear ill-posed problems in Hilbert scales. Numer Math 85:309–328
Neubauer A (2017) On Nesterov acceleration for Landweber iteration of linear ill-posed problems. J Inverse Ill-Posed Prob 25:381–390
Neuman M, Edvardsson S, Edström P (2015) Solving the radiative transfer equation with a mathematical particle method. Opt Lett 40(18):4325–4328
Poljak BT (1964) Some methods of speeding up the convergence of iterative methods. Akademija Nauk SSSR. Zurnal Vycislitel nli Matematiki i Matematicoskoi Fiziki 4:791
Rieder A (2005) Runge-Kutta integrators yield optimal regularization schemes. Inverse Prob 21:453–471
Roubíček T (2013) Nonlinear partial differential equations with applications, vol 153. Springer Science & Business Media, Basel
Sandin P, Ögren M, Gulliksson M (2016) Numerical solution of the stationary multicomponent nonlinear Schrödinger equation with a constraint on the angular momentum. Phys Rev E 93:033301
Sandro I, Valerio P, Francesco Z (1979) A new method for solving nonlinear simultaneous equations. SIAM J Numer Anal 16(5):779–789
Scherzer O, Grasmair M, Grossauer H, Haltmeier M, Lenzen F (2009) Variational methods in imaging. Springer, New York
Schock E (1985) Approximate solution of ill-posed equations: arbitrarily slow convergence vs. superconvergence. Construct Methods Pract Treat Integral Equ 73:234–243
Smyrlis G, Zisis V (2004) Local convergence of the steepest descent method in Hilbert spaces. J Math Anal Appl 300(2):436–453
Song S, Huang J (2012) Solving an inverse problem from bioluminescence tomography by minimizing an energy-like functional. J Comput Anal Appl 14:544–558
Tautenhahn U (1994) On the asymptotical regularization of nonlinear ill-posed problems. Inverse Prob 10:1405–1418
Tikhonov A, Leonov A, Yagola A (1998) Nonlinear ill-posed problems, vol I and II. Chapman and Hall, London
Tsai C-C, Liu C-S, Yeih W-C (2010) Fictitious time integration method of fundamental solutions with Chebyshev polynomials for solving Poisson-type nonlinear PDEs. CMES 56(2):131–151
Vainikko G, Veretennikov A (1986) Iteration procedures in ill-posed problems. Nauka, Moscow (in Russian)
Wang Y, Zhang Y, Lukyanenko D, Yagola A (2012) A method of restoring the aerosol particle size distribution function on the set of piecewise-convex functions. Vychislitelnye Metody i Programmirovanie 13:49–66
Wang Y, Zhang Y, Lukyanenko D, Yagola A (2013) Recovering aerosol particle size distribution function on the set of bounded piecewise-convex functions. Inverse Prob Sci Eng 21:339–354
Watson L, Sosonkina M, Melville R, Morgan A, Walker H (1997) Algorithm 777: HOMPACK90: a suite of Fortran 90 codes for globally convergent homotopy algorithms. ACM Trans Math Softw 23(4):514–549
Yao Z, Zhang Y, Bai Z, Eddy WF (2018) Estimating the number of sources in magnetoencephalography using spiked population eigenvalues. J Am Stat Assoc 113(522):505–518
Zhang Y, Hofmann B (2018) On the second order asymptotical regularization of linear ill-posed inverse problems. Applicable Analysis, pp 1–26. https://doi.org/10.1080/00036811.2018.1517412
Zhang Y, Lukyanenko D, Yagola A (2013) Using Lagrange principle for solving linear ill-posed problems with a priori information. Vychislitelnye Metody i Programmirovanie 14:468–482
Zhang Y, Lukyanenko D, Yagola A (2015) An optimal regularization method for convolution equations on the sourcewise represented set. J Inverse Ill-Posed Prob 23:465–475


Zhang Y, Gulliksson M, Hernandez Bennetts V, Schaffernicht E (2016a) Reconstructing gas distribution maps via an adaptive sparse regularization algorithm. Inverse Prob Sci Eng 24:1186–1204
Zhang Y, Lin G, Forssen P, Gulliksson M, Fornstedt T, Cheng X (2016b) A regularization method for the reconstruction of adsorption isotherms in liquid chromatography. Inverse Prob 32:105005
Zhang Y, Lukyanenko D, Yagola A (2016c) Using Lagrange principle for solving two-dimensional integral equation with a positive kernel. Inverse Prob Sci Eng 24:811–831
Zhang Y, Forssen P, Fornstedt T, Gulliksson M, Dai X (2017a) An adaptive regularization algorithm for recovering the rate constant distribution from biosensor data. Inverse Prob Sci Eng 24:1–26
Zhang Y, Lin G, Forssen P, Gulliksson M, Fornstedt T, Cheng X (2017b) An adjoint method in inverse problems of chromatography. Inverse Prob Sci Eng 25:1112–1137
Zhang Y, Gong R, Cheng X, Gulliksson M (2018a) A dynamical regularization algorithm for solving inverse source problems of elliptic partial differential equations. Inverse Prob 34:065001
Zhang Y, Gong R, Gulliksson M, Cheng X (2018b) A coupled complex boundary expanding compacts method for inverse source problems. J Inverse Ill-Posed Prob, pp 1–20. https://doi.org/10.1515/jiip-2017-0002

Mathematics and Climate Change

83

Gerrit Lohmann

Contents
Introduction
Climate: A Fluid Dynamical System
Mathematical Equations
Nondimensional Parameters: The Reynolds Number
Convection in the Rayleigh-Bénard System
Reduction of Dimensions and the Lorenz System
Scaling in the Climate System
Projection Methods: Coarse Graining and Stable Manifold Theory
Brownian Motion, Weather, and Climate
Climate Variability and Sensitivity
Non-normal Growth of the Climate System
Predictability
Boltzmann Dynamics
Conclusions
Cross-References
References

Abstract

Climate change is one of the most pressing scientific challenges of our times, with transformations that are already becoming apparent in many areas of the world. The demand (from the stakeholders) for clear answers under a wide range of future scenarios has to be addressed (by the scientific community) using our rapidly evolving knowledge of the weather and climate system. Mathematics is one of the essential pillars at the foundation of this knowledge, as it allows us to quantify and predict the effects we observe in nature. In this chapter, we illustrate some crucial mathematical techniques and theoretical approaches, in the context of their application to the climate system. The concepts of critical parameters, dimension reduction, and stochasticity are explored in detail.

G. Lohmann, Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany; University of Bremen, Bremen, Germany
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_145

Keywords Climate models · Fluid dynamics · Dynamical theory · Spatiotemporal scales · Coarse graining · Stability · Predictability

Introduction

Mathematics can in large part be built on (unprovable, but very well founded) axioms by purely logical steps. Physics has owed its great successes since the beginning of modern times to the combination of experiments (which can be repeated at any time, but are limited to measurable data) with mathematical theories and models. Strong abstraction, e.g., in free fall, is unavoidable and at the same time greatly reduces the “overall reality.” Climate science is a rather new subject, describing the nature of its components and quantities like temperatures and currents. Its great success is due to a proper combination of observations, its theoretical foundations in fluid dynamics, and the statistical analysis of data. Unlike in physics, there is no lab in which to repeat measurements; instead, we have just one realization of the climate trajectory. It is still unknown whether the gradual or the catastrophic case is more likely.

Figure 1 shows the Northern Hemisphere temperature evolution of the last 150 years. This period is often called the “instrumental period” since the spatial coverage and the quality of the data are high compared to earlier periods. The current and future climate is subject to significant change and fluctuations; a large part is due to the increasing human influence on the climate system. The extent and the rate of this change are controversial, however. It is therefore necessary to improve the understanding of natural climate variability and trends by searching for their causes at different time scales, such as the multidecadal component in Fig. 1. A further major challenge is to understand the dynamics and potential thresholds of rapid climate changes. The analysis of the current status, of the past, and of driving mechanisms and feedbacks provides a suitable framework to study conditions which are expected to develop in the future.

Fig. 1 Northern Hemisphere near-surface temperature anomaly [K] based on HadCRUT4 (Morice et al., 2012)

A comprehensive modeling strategy designed to address abrupt climate change includes vigorous use of a hierarchy of models, from theory and conceptual models with only a few degrees of freedom, through models of intermediate complexity and high-resolution models of components of the climate system, to fully coupled Earth system models. The simpler models are well suited for developing new hypotheses for abrupt climate change. Model-data comparisons are needed to assess the quality of model predictions. It is important to note that the multiple long integrations of enhanced, fully coupled Earth system models required for this research are not possible with the computer resources available today, and thus these resources are currently being expanded. Since Earth system models have to simplify the system and rely on parameterizations of unresolved processes using present data, paleoclimate records provide a unique tool to validate models for conditions which are different from our present one. Suitable model-data analyses therefore provide a proper basis to estimate and possibly reduce uncertainties of future climate change projections (Lohmann et al., 2020). Furthermore, the model scenarios in conjunction with the long-term data can be used to examine mechanisms for the statistics of regional climate extremes under different boundary conditions.
The mathematical tools described in this chapter are the numerics of partial differential equations and conceptual approaches from fluid mechanics. In the entire climate system, different scales play an important role (Fig. 2). These are the characteristic orders of magnitude in space and time that a system possesses or that are imposed on the system in order to record or observe it. Climate has a spatial and a temporal dimension and fluctuates over a wide range of spatial and temporal scales. Spatial scales vary from local to regional to continental. Time scales vary from seasonal to geological. The spatiotemporal dimension of a complex phenomenon is defined by its typical spatial extension (e.g., the diameter of a high-pressure area), which is linked to a structure whose magnitude can be specified as a spatial scale. The time scale of an atmospheric process is the order of magnitude of its lifetime. In addition, it is also possible to specify the spatial and temporal resolution with which a system is to be viewed. A distinction is made between the scale in space and time that an atmospheric process or an atmospheric system has and the scale of the observation. If the spatial and temporal scales of observation are to capture the system in good resolution, they must be significantly smaller than those describing the overall system. The word “scale” in this context means order of magnitude. The spectrum of atmospheric space and time scales covers many orders of magnitude. A rough classification, which is common in meteorology, is based on the horizontal space

[Figure 2 diagram. Axes: time scale (seconds, days, annual, decadal, centennial, millennia) and spatial scale (m, 100 km, 1000 km, hemispheric, global). Labeled phenomena: turbulence, mixing, etc.; synoptic variations; season; ENSO; quasidecadal; AMO; DO and H-events; Holocene trends; ice ages; internally generated variability.]

Fig. 2 Schematic diagram of the spatio-temporal scales considered. DO: Dansgaard-Oeschger, H: Heinrich events; AMO: Atlantic Multidecadal Oscillation; PDO: Pacific Decadal Oscillation; ENSO: El Niño-Southern Oscillation. The annual and astronomical cycles are externally driven and have quasi-global impact. The dashed line shows a schematic power spectrum with more variability on long time scales

scale L. Basically, climate represents a space-time continuum, so that fixed scale limits do not occur in the real atmosphere. Rather, more or less narrow transition ranges between the scales are the rule. Larger and smaller systems influence each other, so that the transitions between differently scaled weather phenomena are smooth, and the choice of scales is usually guided by the question being asked. In order to illustrate the scaling in the climate system, the procedure of non-dimensional parameters is introduced. Models of planetary motion based on Newton’s models of gravity and motion were astonishingly successful, and this had a profound impact on the way in which people viewed mathematical models and on the way in which we still view them. The eighteenth-century French scientist and mathematician Laplace extended the basic Newtonian model to describe the motion of all the planets in the solar system, including their influences on each other. This model accounted for the motions of the planets as perfectly as they could be measured (including all the small deviations from elliptical orbits caused by the planets’ gravitational effects on each other). This seemed to be such a triumph that the belief grew up that everything in the universe could be described by such models and that in principle the future could be predicted perfectly given such models and accurate measurements of the


state of the universe now. Perhaps not surprisingly, Laplace was a vigorous promoter of this idea, which gained the name determinism. These ideas lent their name to the concept of a deterministic model, that is, a model which, if solved from identical starting conditions, always has the same solution. For much of the nineteenth century, determinism reigned and resulted in a great deal of soul-searching about free will and the like. In the twentieth century, two areas of science comprehensively overturned the idea that everything could be described by deterministic models. These were quantum mechanics and chaos theory. The latter had its origin in meteorology and fluid dynamics. In the 1960s, Edward Lorenz, a mathematician and meteorologist, showed that there are natural limits to the predictability of a nonlinear system such as the atmospheric circulation. He discovered and described the chaotic behavior of large-scale motion patterns in the atmosphere and showed that, despite the determinacy of the system, i.e., although the partial differential equations could be solved at any time, the system itself loses its predictability after a relatively short time. Even the smallest changes in the initial conditions caused different final states after a few iteration steps (calculation steps). The predictability of the system is therefore limited in time, and the nonlinearity is responsible for the finite predictability of atmospheric flow patterns. This insight became known under the term “butterfly effect.”

Climate physicists are constantly trying to learn from observations how to examine causes and effects in such a way that the observations are captured. We write down the insights gained as mathematically formulated laws of nature. We formulate equations of motion in the form of differential equations. Their solutions provide information about what will happen at a later time to a state given at a particular time.
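The sensitivity to initial conditions described above can be illustrated with the three-variable Lorenz (1963) system, which the chapter contents list under “Reduction of Dimensions and the Lorenz System.” The sketch below is only an illustration: the classical parameter values $\sigma = 10$, $\rho = 28$, $\beta = 8/3$, the fourth-order Runge-Kutta integrator, the initial state, and the perturbation size `eps` are choices made here, not taken from the text.

```python
import numpy as np


def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) system."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])


def rk4_step(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lorenz(s)
    k2 = lorenz(s + 0.5 * dt * k1)
    k3 = lorenz(s + 0.5 * dt * k2)
    k4 = lorenz(s + dt * k3)
    return s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def separation(s0=(1.0, 1.0, 1.0), eps=1e-8, dt=0.01, steps=4000):
    """Distance between two trajectories whose initial states differ by eps in x."""
    a = np.array(s0, dtype=float)
    b = a + np.array([eps, 0.0, 0.0])
    dist = np.empty(steps + 1)
    dist[0] = np.linalg.norm(a - b)
    for k in range(steps):
        a = rk4_step(a, dt)
        b = rk4_step(b, dt)
        dist[k + 1] = np.linalg.norm(a - b)
    return dist


d = separation()
print(d[0], d[1000], d.max())
```

Both trajectories obey the same deterministic equations, yet the tiny initial difference grows until the two states are as far apart as two unrelated points on the attractor, which is the finite predictability discussed in the text.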
In this sense, the equations of motion as such provide a causal description par excellence – all equations of motion. Sometimes, however, there seem to be annoying difficulties with causal chains or with predictability, for example, when the destructive path of a hurricane is poorly predicted, or in extreme weather events. We are even more aware of the problems in long-term predictions, such as climate change. So is causality failing here? Of course we do not think so, otherwise we would not be looking for causes. It then also made no sense to derive political decisions from insights into climate evolution. And although word has got around that quantum mechanics is not a causal physics in the classical sense, we will hardly want to blame quantum mechanics for the cases of lack of predictability or undetectable cause-effect relationships. Here, the intersection of mathematical modeling and questions within climate change sciences are elaborated. A systematic description of the mathematics and climate is given (Fig. 3). A general question within the micro-macro dynamic is that of integration between different levels. Two distinctly different levels emerge with different rules governing each, but they then need to be reconciled in some way to create an overall functioning system. Physical, chemical, biological, economic, social, and cultural systems all exhibit this micro-macro dynamic and how the system comes to reconcile it forms a primary determinate in its identity and overall structure. This multidimensional nature to a system that results in the micro-macro dynamic is a product of synthesis and emergence. An approach is coarse graining and projection

G. Lohmann

[Figure 3, flowchart labels: Boltzmann equation/BBGKY hierarchy; Mori-Zwanzig approach: coarse graining, macroscopic moments; fluid dynamics (Navier-Stokes equation etc.); scale analyses and approximations; coarse graining by numerical schemes, parameterizations of unresolved scales; coupling techniques; climate system including its components with characteristic time and spatial scales (atmosphere, ocean, cryosphere, land cover and vegetation, biogeochemical cycles, solid Earth); defining initial and boundary conditions; climate model experiments for past, present, and future scenarios (ensembles); statistical data analyses; validation of climate scenarios on different time scales by using observations and paleoclimates]

Fig. 3 Systematic description of climate and mathematics. Changing the description of the dynamics: from the micro- to macroscales. This is a common problem since we are not able to describe the systems on all temporal and spatial scales

where the underlying dynamics is projected onto the macroscopic dynamics; the other is the statistical physics theory of nonequilibrium statistical mechanics. The Boltzmann equation, coarse graining, and the Brownian motion are the approaches to understand the dynamics on different scales.

Climate: A Fluid Dynamical System

For the present climate state, we are able to directly measure all involved quantities. From measurements we can draw conclusions about physical, chemical, and biological relationships between the variables. Our understanding of the involved processes is far from complete, but nevertheless we derive equations that describe and predict the observed phenomena (Fig. 4).

Mathematical Equations

Our starting point is a mathematical model for the system of interest. In physics a model typically describes the state variables, plus fundamental laws and equations of state. These variables evolve in space and time. For the ocean, fundamental equations are formulated:

Fig. 4 Schematic view on the climate system. Global climate is a result of the complex interactions between the atmosphere, cryosphere (ice), hydrosphere (oceans), lithosphere (land), and biosphere (life), fueled by the nonuniform spatial distribution of incoming solar radiation (Peixoto and Oort, 1992). We know from climate reconstructions using recorders such as ice cores, ocean and lake sediment cores, tree rings, corals, cave deposits, and groundwater that the Earth’s climate has seen major changes over its history. An analysis of the temperature variations patched together from all these data reveals that climate change occurs in cycles with characteristic periods, for example, 200 million, 100 thousand, or 4–7 years. For some of these cycles, particular mechanisms can be identified, for example, forcing by changes in the Earth’s orbital parameters or internal oscillations of the coupled ocean-atmosphere system. However, major uncertainties remain in our understanding of the interplay of the components of the climate system

• State variables: velocity (in each of three directions), pressure, temperature, salinity, and density
• Fundamental laws: conservation of momentum, conservation of mass, and conservation of temperature and salinity
• Equations of state: relationship of density to temperature, salinity, and pressure, and perhaps also a model for the formation of sea ice

The state variables are expressed as a continuum in space and time and the fundamental laws as partial differential equations. Where the atmosphere becomes too thin in the upper levels, a more molecular, statistical description is appropriate. Even at this stage, though, simplifications may be made. For example, it is common to treat seawater as incompressible. Furthermore, equations of state are often specified by empirical relationships or laboratory experiments.

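\[ \frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0 \tag{1} \]

or, using the substantive derivative:

\[ \frac{D\rho}{Dt} + \rho (\nabla \cdot \mathbf{u}) = 0. \tag{2} \]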

A simplification of the resulting flow equations is obtained when considering an incompressible flow of a Newtonian fluid. The assumption of incompressibility rules out sound and shock waves, so this simplification is invalid if these phenomena are important. The incompressible flow assumption typically holds well even when dealing with a “compressible” fluid, such as air at room temperature, at low Mach numbers (even when flowing up to about Mach 0.3). The dynamics of flow are based on the Navier-Stokes equations. This is a statement of the conservation of momentum in a fluid and it is an application of Newton's second law to a continuum; in fact this equation is applicable to any non-relativistic continuum and is known as the Cauchy momentum equation (e.g., Landau and Lifshitz 1959). Taking this into account and assuming constant viscosity, the Navier-Stokes equations read, in vector form:

\[ \underbrace{\rho \Big( \underbrace{\frac{\partial \mathbf{u}}{\partial t}}_{\text{Unsteady acceleration}} + \underbrace{\mathbf{u} \cdot \nabla \mathbf{u}}_{\text{Advective acceleration}} \Big)}_{\text{Inertia (per volume)}} = \underbrace{\underbrace{-\nabla p}_{\text{Pressure gradient}} + \underbrace{\mu \nabla^2 \mathbf{u}}_{\text{Viscosity}}}_{\text{Divergence of stress}} + \underbrace{\mathbf{F}}_{\text{Other body forces}}. \tag{3} \]

Note that only the advection terms are nonlinear for incompressible Newtonian flow. The advective acceleration is caused by a (possibly steady) change in velocity over position, for example, the speeding up of fluid entering a converging nozzle. Though individual fluid particles are being accelerated and thus are under unsteady motion, the flow field (a velocity distribution) will not necessarily be time dependent. The vector field $\mathbf{F}$ represents “other” body forces. Typically this is only gravity but may include other fields (such as electromagnetic). In a non-inertial coordinate system, other “forces” such as those associated with rotating coordinates may be inserted. We note that the Coriolis force will be one of the main contributions in the rotating Earth system. Often, these forces may be represented as the gradient of some scalar quantity. Gravity in the $z$ direction, for example, is the gradient of $-\rho g z$. Since pressure shows up only as a gradient, this implies that a solution obtained without any such body force can be amended to include the body force by modifying the pressure. If temperature effects are also neglected, the only “other” equation (apart from initial/boundary conditions) needed is the mass continuity equation. Under the incompressible assumption, density is a constant and it follows that the equation will simplify to:

\[ \nabla \cdot \mathbf{u} = 0. \tag{4} \]

This is more specifically a statement of the conservation of volume. These equations are commonly used in three coordinate systems: Cartesian, cylindrical, and spherical. While the Cartesian equations seem to follow directly from the vector equation above, the vector form of the Navier-Stokes equation involves some tensor calculus, which means that writing it in other coordinate systems is not as simple as doing so for scalar equations (such as the heat equation). Taking the curl of the Navier-Stokes equation results in the elimination of pressure. This is especially easy to see if two-dimensional Cartesian flow is assumed ($w = 0$ and no dependence of anything on $z$), where the equations reduce, in terms of a stream function $\psi$ and with $D_t \equiv D/Dt$ the substantive derivative, to:

\[ D_t \nabla^2 \psi = \nu \nabla^4 \psi \tag{5} \]

where $\nabla^4$ is the (2D) biharmonic operator and $\nu$ is the kinematic viscosity, $\nu = \mu/\rho$. This single equation together with appropriate boundary conditions describes 2D fluid flow, taking only kinematic viscosity as a parameter. Note that the equation for creeping flow results when the left side is assumed zero. In axisymmetric flow another stream function formulation, called the Stokes stream function, can be used to describe the velocity components of an incompressible flow with one scalar function. The concept of taking the curl of the flow will become very important in climate dynamics (vorticity dynamics). The term $\zeta = \nabla^2 \psi$ is called relative vorticity, and the term $f = 2\Omega \sin\varphi$ is due to the rotating Earth ($\Omega$ is the angular velocity of the Earth's rotation, $\varphi$ the latitude). The dynamics can be described by the barotropic vorticity equation as

\[ D_t (\zeta + f) = \nu \nabla^2 \zeta \tag{6} \]

which is heavily used in climate research.

Nondimensional Parameters: The Reynolds Number

In climate, we are interested in the critical parameters of the system. For the case of an incompressible flow in the Navier-Stokes equations, assuming the temperature effects are negligible and external forces are neglected, the equations consist of conservation of mass

\[ \nabla \cdot \mathbf{u} = 0 \tag{7} \]

and momentum

\[ \partial_t \mathbf{u} + (\mathbf{u} \cdot \nabla) \mathbf{u} = -\frac{1}{\rho_0} \nabla p + \nu \nabla^2 \mathbf{u} \tag{8} \]

where $\mathbf{u}$ is the velocity vector, $p$ is the pressure, and $\nu$ denotes the kinematic viscosity. The equations can be made dimensionless by a length scale $L$, determined by the geometry of the flow, and by a characteristic velocity $U$. For inter-comparison of analytical solutions, numerical results, and experimental measurements, it is useful to report the results in a dimensionless system. This is justified by the important concept of dynamic similarity (Buckingham, 1914). The main goal of using this system is to replace physical or numerical parameters with some dimensionless numbers, which completely determine the dynamical behavior of the system. The procedure for converting to this system requires, first of all, the selection of some representative values for the physical quantities involved in the original equations (in the physical system). For our current problem, we need to provide representative values for velocity ($U$), time ($T$), and distances ($L$). From these, we can also derive scaling parameters for the time derivatives and spatial gradients. Using these values, the values in the dimensionless system (written with subscript $d$) can be defined:

\[ \mathbf{u} = U \cdot \mathbf{u}_d \tag{9} \]

\[ t = T \cdot t_d \tag{10} \]

\[ x = L \cdot x_d \tag{11} \]

with $U = L/T$. From these scalings, we can also derive

\[ \partial_t = \frac{\partial}{\partial t} = \frac{1}{T} \cdot \frac{\partial}{\partial t_d} \tag{12} \]

\[ \partial_x = \frac{\partial}{\partial x} = \frac{1}{L} \cdot \frac{\partial}{\partial x_d} \tag{13} \]

Note furthermore the units $[\rho_0] = \mathrm{kg/m^3}$, $[p] = \mathrm{kg/(m\,s^2)}$, and $[p]/[\rho_0] = \mathrm{m^2/s^2}$. Therefore the pressure gradient term has the scaling $U^2/L$. Furthermore, divide the momentum equation by $U^2/L$, and the scalings vanish completely in front of the terms except for the $\nabla_d^2 \mathbf{u}_d$-term. This gives conservation of mass

\[ \nabla_d \cdot \mathbf{u}_d = 0 \tag{14} \]

and conservation of momentum

\[ \frac{\partial}{\partial t_d} \mathbf{u}_d + (\mathbf{u}_d \cdot \nabla_d) \mathbf{u}_d = -\nabla_d p_d + \frac{1}{Re} \nabla_d^2 \mathbf{u}_d \tag{15} \]

The dimensionless parameter $Re = UL/\nu$ is the Reynolds number and the only parameter left. For large Reynolds numbers, the flow is turbulent. In most practical flows $Re$ is rather large ($10^4$ to $10^8$), large enough for the flow to be turbulent. A large Reynolds number allows the flow to develop steep gradients locally. The typical length scale corresponding to these steep gradients can become so small that viscosity is no longer negligible. So the dissipation takes place at small scales. In this way different length scales are present in a turbulent flow, which range from $L$ down to the Kolmogorov length scale. This length scale is the typical length of the smallest eddy present in a turbulent flow. In the climate system, this dissipation by turbulence is modeled via eddy terms. To evaluate the critical parameters and scales, we implicitly assume such a procedure. A classical example is provided in the next section.
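As a rough illustration of how the Reynolds number is evaluated from the three scales, the following minimal Python sketch computes $Re = UL/\nu$. The function name `reynolds_number` and the pipe-flow values are assumptions chosen for illustration only and do not come from the text; the kinematic viscosity of water is taken as roughly $10^{-6}$ m² s⁻¹.

```python
def reynolds_number(U, L, nu):
    """Return Re = U * L / nu.

    U  : characteristic velocity in m/s
    L  : characteristic length in m
    nu : kinematic viscosity in m^2/s
    """
    return U * L / nu


# Hypothetical example (not from the text): water flowing at 1 m/s
# through a pipe of 0.1 m diameter, with nu ~ 1e-6 m^2/s.
re_pipe = reynolds_number(U=1.0, L=0.1, nu=1.0e-6)

# Doubling the velocity doubles Re; doubling the viscosity halves it.
re_fast = reynolds_number(U=2.0, L=0.1, nu=1.0e-6)
re_viscous = reynolds_number(U=1.0, L=0.1, nu=2.0e-6)

print(f"Re (pipe)            = {re_pipe:.3g}")
print(f"Re (double velocity) = {re_fast:.3g}")
print(f"Re (double nu)       = {re_viscous:.3g}")
```

With these assumed values the pipe flow falls inside the range of large Reynolds numbers quoted above, so it would be expected to be turbulent.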

Convection in the Rayleigh-Bénard System

A system of three ordinary differential equations is introduced whose solutions afford the simplest example of deterministic nonperiodic flow that we are aware of. The system is a simplification of the one derived by Saltzman (1962) to study finite-amplitude convection. Consider the Rayleigh-Bénard circulation (Fig. 5). Rayleigh (1916) studied the flow occurring in a layer of fluid of uniform depth $H$, when the temperature difference between the upper and lower surfaces is maintained at a constant value $\Delta T$.

\[ T(x, y, z = H) = T_0, \qquad T(x, y, z = 0) = T_0 + \Delta T \tag{16} \]

The Boussinesq approximation is used, which results in a buoyancy force term which couples the thermal and fluid velocity fields. Therefore

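[Figure 5: sketch of a Bénard cell of height H and width H/a in the (x, y, z) coordinate system, with gravity g, the low temperature T0 at the upper surface, and the high temperature T0 + ΔT at the lower surface]

Fig. 5 Geometry of the Rayleigh-Bénard system (see text for details)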

\[ \rho \approx \rho_0 = \text{const.} \tag{17} \]

except in the buoyancy term, where:

\[ \rho = \rho_0 (1 - \alpha (T - T_0)) \quad \text{with } \alpha > 0. \tag{18} \]

$\rho_0$ is the fluid density in the reference state. This assumption reflects a common feature of geophysical flows, where the density fluctuations caused by temperature variations are small, yet they are the ones driving the overall flow. Furthermore, we assume that the density depends linearly on temperature $T$. This system possesses a steady-state solution in which there is no motion, and the temperature varies linearly with depth:

\[ u = w = 0, \qquad T_{eq} = T_0 + \left(1 - \frac{z}{H}\right) \Delta T \tag{19} \]

When this solution becomes unstable, convection should develop. In the case where all motions are parallel to the $x$-$z$-plane, and no variations in the direction of the $y$-axis occur, the governing equations may be written (see Saltzman 1962) as:

\[ D_t u = -\frac{1}{\rho_0} \partial_x p + \nu \nabla^2 u \tag{20} \]

\[ D_t w = -\frac{1}{\rho_0} \partial_z p + \nu \nabla^2 w + g (1 - \alpha (T - T_0)) \tag{21} \]

\[ D_t T = \kappa \nabla^2 T \tag{22} \]

\[ \partial_x u + \partial_z w = 0 \tag{23} \]

where $w$ and $u$ are the vertical and horizontal components of the velocity. Furthermore, $\nu = \eta/\rho_0$ and $\kappa = \lambda/(\rho_0 C_v)$ are the momentum diffusivity (kinematic viscosity) and thermal diffusivity, respectively. Now, the pressure is eliminated to derive the vorticity equation $D_t \nabla^2 \psi = \nu \nabla^4 \psi$. Here, it is useful to define the stream function $\Psi$ for the two-dimensional motion, i.e.,

\[ \frac{\partial \Psi}{\partial x} = w \tag{24} \]

\[ \frac{\partial \Psi}{\partial z} = -u. \tag{25} \]

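\[ \frac{\partial}{\partial x}(21) - \frac{\partial}{\partial z}(20): \quad \frac{\partial}{\partial x} D_t w - \frac{\partial}{\partial z} D_t u = D_t \frac{\partial w}{\partial x} - D_t \frac{\partial u}{\partial z} \tag{26} \]

\[ = D_t \frac{\partial^2 \Psi}{\partial x^2} + D_t \frac{\partial^2 \Psi}{\partial z^2} = D_t \nabla^2 \Psi. \tag{27} \]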

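Furthermore, one can introduce the function $\Theta$ as the departure of temperature from that occurring in the state of no convection (19):

\[ T = T_{eq} + \Theta \tag{28} \]

The temperature term in $\frac{\partial}{\partial x}$(21) on the right-hand side becomes

\[ \frac{\partial}{\partial x}\, g (1 - \alpha (T_{eq} + \Theta - T_0)) = -g \alpha \frac{\partial \Theta}{\partial x}. \]

The left-hand side of (22) reads

\[ D_t T = D_t T_{eq} + D_t \Theta = w \cdot \frac{-\Delta T}{H} + D_t \Theta = -\frac{\Delta T}{H} \frac{\partial \Psi}{\partial x} + D_t \Theta. \]

Then, the dynamics can be formulated as

\[ D_t \nabla^2 \Psi = \nu \nabla^4 \Psi - g \alpha \frac{\partial \Theta}{\partial x} \tag{29} \]

\[ D_t \Theta = \frac{\Delta T}{H} \frac{\partial \Psi}{\partial x} + \kappa \nabla^2 \Theta. \tag{30} \]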

Nondimensionalization of the problem yields equations including the dimensionless Prandtl number $\sigma$ and the Rayleigh number $R_a$, which are the control parameters of the problem. One can take the layer thickness $H$ as the unit of length, the time $T = H^2/\kappa$ of vertical diffusion of heat as the unit of time, and the temperature difference $\Delta T$ as the unit of temperature.

Reduction of Dimensions and the Lorenz System

Saltzman (1962) derived a set of ordinary differential equations by expanding $\Psi$ and $\Theta$ in double Fourier series in $x$ and $z$, with functions of $t$ alone for coefficients, and substituting these series into (29) and (30). A complete Galerkin approximation

\[ \Psi(x, z, t) = \sum_{k=1}^{\infty} \sum_{l=1}^{\infty} \Psi_{k,l}(t) \, \sin\left(\frac{k \pi a}{H} x\right) \times \sin\left(\frac{l \pi}{H} z\right) \tag{31} \]

\[ \Theta(x, z, t) = \sum_{k=1}^{\infty} \sum_{l=1}^{\infty} \Theta_{k,l}(t) \, \cos\left(\frac{k \pi a}{H} x\right) \times \sin\left(\frac{l \pi}{H} z\right) \tag{32} \]

yields an infinite set of ordinary differential equations for the time coefficients. He arranged the right-hand sides of the resulting equations in double Fourier series form, by replacing products of trigonometric functions of $x$ (or $z$) by sums of trigonometric functions, and then equated coefficients of similar functions of $x$ and $z$. He then reduced the resulting infinite system to a finite system by omitting reference to all but a specified finite set of functions of $t$. He then obtained time-dependent solutions by numerical integration. In certain cases, all except three of the dependent variables eventually tended to zero, and these three variables underwent irregular, apparently nonperiodic fluctuations. These same solutions would have been obtained if the series had been truncated at the start to include a total of three terms. Accordingly, in this study we shall let

\[ \frac{a}{\kappa (1 + a^2)} \Psi = X \sqrt{2} \, \sin\left(\frac{\pi a}{H} x\right) \sin\left(\frac{\pi}{H} z\right) \tag{33} \]

\[ \pi \frac{R_a}{R_c} \frac{1}{\Delta T} \Theta = Y \sqrt{2} \, \cos\left(\frac{\pi a}{H} x\right) \sin\left(\frac{\pi}{H} z\right) - Z \sin\left(\frac{2\pi}{H} z\right) \tag{34} \]

where $X(t)$, $Y(t)$, and $Z(t)$ are functions of time alone. It is found that fields of motion of this form would develop if the Rayleigh number

\[ R_a = \frac{g \alpha H^3 \Delta T}{\nu \kappa}, \tag{35} \]

exceeds a critical value

\[ R_c = \pi^4 a^{-2} (1 + a^2)^3. \tag{36} \]
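As a quick numerical check of (35) and (36), the following minimal Python sketch evaluates the critical Rayleigh number over a range of aspect ratios $a$ and locates its minimum on a grid. The function names and the grid of $a$ values are assumptions made for this illustration.

```python
import math


def rayleigh_number(g, alpha, H, delta_T, nu, kappa):
    """Rayleigh number Ra = g * alpha * H^3 * delta_T / (nu * kappa), Eq. (35)."""
    return g * alpha * H**3 * delta_T / (nu * kappa)


def critical_rayleigh(a):
    """Critical Rayleigh number Rc = pi^4 * a^-2 * (1 + a^2)^3, Eq. (36)."""
    return math.pi**4 * (1.0 + a * a) ** 3 / (a * a)


# Scan aspect ratios a = 0.01, 0.02, ..., 3.00 (an arbitrary grid chosen
# for this illustration) and pick the one with the smallest Rc.
a_values = [0.01 * k for k in range(1, 301)]
a_min = min(a_values, key=critical_rayleigh)
rc_min = critical_rayleigh(a_min)

# Closed-form comparison: Rc at a^2 = 1/2 equals 27 * pi^4 / 4.
rc_exact = critical_rayleigh(math.sqrt(0.5))

print(f"grid minimum near a = {a_min:.2f}, Rc = {rc_min:.2f}")
print(f"Rc at a^2 = 1/2: {rc_exact:.2f}")
```

Convection of the form (33) and (34) is expected once `rayleigh_number(...)` exceeds `critical_rayleigh(a)` for the aspect ratio in question.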

The minimum value of $R_c$, namely, $27\pi^4/4 = 657.51$, occurs when $a^2 = 1/2$. In fluid mechanics, the Rayleigh number for a fluid is a dimensionless number associated with the relation of buoyancy and viscosity in a flow. When the Rayleigh number is below the critical value for that fluid, heat transfer is primarily in the form of conduction; when it exceeds the critical value, heat transfer is primarily in the form of convection. When the above truncation (33) and (34) is substituted into the dynamics, we obtain the equations (Lorenz model):

\[ \dot{X} = -\sigma X + \sigma Y \tag{37} \]

\[ \dot{Y} = r X - Y - X Z \tag{38} \]

\[ \dot{Z} = -b Z + X Y \tag{39} \]

Here a dot denotes a derivative with respect to the dimensionless time $t_d = \pi^2 H^{-2} (1 + a^2) \kappa t$, while $\sigma = \nu \kappa^{-1}$ is the Prandtl number, $r = R_a/R_c$, and $b = 4 (1 + a^2)^{-1}$. Equations (37), (38), and (39) are called the Lorenz model in the literature (Lorenz, 1960, 1963, 1984; Maas, 1994; Olbers, 2001). The system may give realistic results when the Rayleigh number is slightly supercritical, but its solutions cannot be expected to resemble those of the complete dynamics when strong convection occurs, in view of the extreme truncation. Figure 6 shows the numerical solution in the phase space with the parameters $r = 28$, $\sigma = 10$, and $b = 8/3$. The chaotic nature of this system inspired climate scientists and scientists in general. This phenomenon is probably the greatest impact that climate science has had on mathematics.
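The sensitivity to initial conditions can be explored with a minimal Python sketch that integrates (37), (38), and (39) for the parameters $r = 28$, $\sigma = 10$, and $b = 8/3$. The classical fourth-order Runge-Kutta scheme, the time step, the initial state, and the size of the perturbation are assumptions chosen for this illustration; they are not taken from the text.

```python
import math


def lorenz_rhs(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz model, Eqs. (37)-(39)."""
    x, y, z = state
    return (-sigma * x + sigma * y,
            r * x - y - x * z,
            -b * z + x * y)


def rk4_step(rhs, state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(k, factor):
        return tuple(s + factor * dt * ki for s, ki in zip(state, k))

    k1 = rhs(state)
    k2 = rhs(shifted(k1, 0.5))
    k3 = rhs(shifted(k2, 0.5))
    k4 = rhs(shifted(k3, 1.0))
    return tuple(s + dt / 6.0 * (a + 2.0 * b2 + 2.0 * c + d)
                 for s, a, b2, c, d in zip(state, k1, k2, k3, k4))


def integrate(state, dt=0.01, n_steps=4000):
    """Return the list of states visited, including the initial state."""
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(lorenz_rhs, state, dt)
        trajectory.append(state)
    return trajectory


# Two runs whose initial X values differ by 1e-8 (an assumed perturbation).
reference = integrate((1.0, 1.0, 1.0))
perturbed = integrate((1.0 + 1e-8, 1.0, 1.0))

# Euclidean distance between the two trajectories at each time step.
separation = [math.dist(p, q) for p, q in zip(reference, perturbed)]

print(f"initial separation: {separation[0]:.1e}")
print(f"largest separation: {max(separation):.1e}")
```

Plotting the X and Y components of `reference` against each other should give a picture of the kind shown in Fig. 6, while `separation` illustrates how the two initially indistinguishable runs drift apart.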

Scaling in the Climate System

As we will see now, the Coriolis effect is one of the dominating forces for the large-scale dynamics of the oceans and the atmosphere. It is convenient to work in the rotating frame of reference of the Earth. The equation can be scaled by a length scale $L$, determined by the geometry of the flow, and by a characteristic velocity $U$. One can estimate the relative contributions in units of m/s² in the horizontal momentum equations:

\[ \underbrace{\frac{\partial \mathbf{v}}{\partial t}}_{U/T \,\sim\, 10^{-8}} + \underbrace{\mathbf{v} \cdot \nabla \mathbf{v}}_{U^2/L \,\sim\, 10^{-8}} = \underbrace{-\frac{1}{\rho} \nabla p}_{\delta P/(\rho L) \,\sim\, 10^{-5}} + \underbrace{2 \boldsymbol{\Omega} \times \mathbf{v}}_{f_0 U \,\sim\, 10^{-5}} + \underbrace{\mathrm{fric}}_{\nu U/H^2 \,\sim\, 10^{-13}} \tag{40} \]

where fric denotes the contributions of friction due to eddy stress divergence (usually ∼ ν∇ 2 v). Typical values are given in Table 1. The values have been taken for the ocean. It is furthermore useful to think about the orders of magnitude: Because of the continuity equation U/L ∼ W/H and since the horizontal scales are orders of

Fig. 6 Numerical solution of the Lorenz model, in the X − Y phase space with the parameters r = 28, σ = 10, and b = 8/3

Table 1 Typical scales in the atmosphere and ocean system. Using these orders of magnitude, one can derive estimates of the different terms in (40)

Quantity | Symbol | Atmosphere | Ocean
Horizontal velocity | U | 10 m s⁻¹ | 10⁻¹ m s⁻¹
Vertical velocity | W | 10⁻¹ m s⁻¹ | 10⁻⁴ m s⁻¹
Horizontal length | L | 10⁶ m | 10⁶ m
Vertical length | H | 10⁴ m | 10³ m
Horizontal pressure changes | δP (horizontal) | 10³ Pa | 10⁴ Pa
Mean pressure | P0 | 10⁵ Pa | 10⁷ Pa
Time scale | T | 10⁵ s | 10⁷ s
Gravity (gravitation + centrifugal) | g | 10 m s⁻² | 10 m s⁻²
Earth radius | a | 10⁷ m | 10⁷ m
Coriolis parameter at 45°N | f0 = 2Ω sin ϕ0 | 10⁻⁴ s⁻¹ | 10⁻⁴ s⁻¹
2nd Coriolis parameter at 45°N | f1 = 2Ω cos ϕ0 | 10⁻⁴ s⁻¹ | 10⁻⁴ s⁻¹
Density | ρ | 1 kg m⁻³ | 10³ kg m⁻³
Viscosity (turbulent) | ν | 10⁻⁵ m² s⁻¹ | 10⁻⁶ m² s⁻¹

magnitude larger than the vertical ones, the vertical velocity is very small relative to the horizontal. For small-scale motion (like small-scale ocean convection or cumulus clouds), the horizontal length scale is of the same order as the vertical one and therefore the vertical motion is of the same order of magnitude as the horizontal motion. The time scales are related by $T \sim L/U \sim H/W$. It is essential to think about the relative importance of the different terms in the momentum balance (40). The Rossby number $Ro$ is the ratio of inertial (the left-hand side) to Coriolis (second term on the right-hand side) terms

\[ Ro = \frac{U^2/L}{f U} = \frac{U}{f L}. \tag{41} \]

It is used in the oceans and atmosphere, where it characterizes the importance of Coriolis accelerations arising from planetary rotation. It is also known as the Kibel number. Ro is small when the flow is in a so-called geostrophic balance.
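The order-of-magnitude estimates in (40) and the Rossby number (41) can be reproduced with a few lines of Python. The sketch below uses the values of Table 1, with the viscosity read as a kinematic viscosity in m² s⁻¹; the dictionary layout and the function names are assumptions made for this illustration.

```python
import math

# Typical scales from Table 1 (SI units); nu is read as m^2/s.
scales = {
    "atmosphere": dict(U=10.0, L=1e6, H=1e4, T=1e5, dP=1e3,
                       rho=1.0, f0=1e-4, nu=1e-5),
    "ocean": dict(U=0.1, L=1e6, H=1e3, T=1e7, dP=1e4,
                  rho=1e3, f0=1e-4, nu=1e-6),
}


def momentum_terms(s):
    """Order-of-magnitude size (m/s^2) of each term in Eq. (40)."""
    return {
        "local acceleration U/T": s["U"] / s["T"],
        "advection U^2/L": s["U"] ** 2 / s["L"],
        "pressure gradient dP/(rho L)": s["dP"] / (s["rho"] * s["L"]),
        "Coriolis f0 U": s["f0"] * s["U"],
        "friction nu U/H^2": s["nu"] * s["U"] / s["H"] ** 2,
    }


def rossby_number(s):
    """Rossby number Ro = U / (f0 L), Eq. (41)."""
    return s["U"] / (s["f0"] * s["L"])


for name, s in scales.items():
    print(name)
    for label, value in momentum_terms(s).items():
        print(f"  {label:30s} ~ 1e{round(math.log10(value))}")
    print(f"  Rossby number Ro = {rossby_number(s):.0e}")
```

For both sets of scales the pressure gradient and Coriolis terms come out as the largest entries and the Rossby number is well below one, which is the situation described as geostrophic balance.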

Projection Methods: Coarse Graining and Stable Manifold Theory

The structure of fluid dynamical models and thus climate models is valid for systems with many degrees of freedom, many collisions, and for substances which can be described as a continuum. The transition from the highly complex dynamical equations to a reduced system is an important step since it gives more credibility to the approach and its results. The transition is also necessary since the active entangled processes are running on spatial scales from millimeters to thousands of kilometers, and temporal scales from seconds to millennia (Figs. 2 and 3).

Therefore, the unresolved processes on subgrid scales have to be described. This is a typical problem in statistical physics, addressed by the so-called Mori-Zwanzig approach (Mori, 1965; Zwanzig, 1960, 1980). The basic idea is to describe the evolution of a system through a projection onto a subset (the macroscopically relevant part), where a random term reflects the effects of the unresolved degrees of freedom. A particular example is Brownian motion (Einstein, 1905; Langevin, 1908). Another route for the transition from many degrees of freedom to the macroscopic laws goes back to Boltzmann (1896). The Boltzmann equation, also often known as the Boltzmann transport equation (Bhatnagar et al., 1954; Boltzmann, 1896; Cercignani, 1990), describes the statistical distribution of one particle in a fluid. It is one of the most important equations of nonequilibrium statistical mechanics, the area of statistical mechanics that deals with systems far from thermodynamic equilibrium. It is applied, for instance, when there is an applied temperature gradient or electric field. Both the Mori-Zwanzig and the Boltzmann approaches also play a fundamental role in physics. The microscopic equations show no preferred time direction, whereas the macroscopic phenomena of thermodynamics have a time direction through the entropy. The underlying procedure is that part of the microscopic information is lost through coarse graining in space and time. In order to get a first idea of coarse graining, one may think of the transition from Rayleigh-Bénard convection to the Lorenz system (section Convection in the Rayleigh-Bénard System). In our formulation, the Galerkin approximation (31) and (32) provided a suitable projector: the series is simply truncated at some specified wave number cutoff, giving a low-order system (such as in equations (33) and (34)). The mathematical theory behind this truncation is called center manifold theory (Haken, 1983; Oseledets, 1968).
We could arrive at the slow manifold of the climate system, to which all the faster response variables (e.g., the atmosphere) are attracted. In mathematics, the slow manifold of an equilibrium point of a dynamical system occurs as the most common example of a center manifold. One of the main methods of simplifying dynamical systems is to reduce the dimension of the system to that of the slow manifold; center manifold theory rigorously justifies the modeling (Arnold, 1998; Arnold and Imkeller, 1998; Lorenz, 1986; Roberts, 2008). The Mori-Zwanzig formalism (Mori, 1965; Zwanzig, 1960) and the slow manifold theory provide a conceptual framework for the study of dimension reduction and the parameterization of less relevant variables by a stochastic process. It includes a generalized Langevin theory (Langevin, 1908). Langevin (1908) studied Brownian motion from a different perspective to Einstein's seminal 1905 paper (Einstein, 1905), describing the motion of a single Brownian particle as a dynamic process via a stochastic differential equation, as an Ornstein-Uhlenbeck process (Uhlenbeck and Ornstein, 1930). The Gaussian filtering of hydrodynamic equations that leads to the Smagorinsky equations (Smagorinsky, 1963) is, in its essence, a version of coarse graining. The projection method includes the procedure to describe turbulent energy dissipation in turbulent flows, where the larger eddies extract energy from the mean flow and ultimately transfer some of it to the smaller eddies which, in turn, pass the energy to even smaller eddies, and so on down to the smallest scales, where the eddies convert

the kinetic energy into internal energy of the fluid. At these scales (also known as the Kolmogorov scale), viscous friction dominates the flow (Frisch, 1996).

Brownian Motion, Weather, and Climate

The daily observed maximum and minimum temperatures are often compared to the “normal” temperatures based upon the 30-year average. Climate averages provide a context for something like “this winter will be wetter (or drier, or colder, or warmer, etc.) than normal.” It has been said “Climate is what you expect. Weather is what you get.” What is the difference between weather and climate? This can also be answered by a metaphor from the football league. Predicting the outcome of the next game is difficult (weather), but predicting who will end up as German champion is unfortunately relatively easy (climate). For climate, this transition between the climate and weather scales has been formulated conceptually (Hasselmann, 1976; Leith, 1975) and later re-formulated in a mathematical context (Arnold, 2001; Chorin et al., 1999; Gottwald, 2010). The effect of the weather on climate is seen in red-noise spectra in the climate system, showing one of the most fundamental aspects of climate and serving also as a null hypothesis for climate variability studies. In a stochastic framework of climate theory, one may use an appropriate stochastic differential equation (Langevin equation)

\[ \frac{d}{dt} x(t) = f(x) + g(x) \, \xi, \tag{42} \]

where $\xi = \frac{d}{dt} W(t)$ is a stationary stochastic process and the functions $f, g : \mathbb{R}^n \to \mathbb{R}^n$ describe the climate dynamics. The properties of the random force are described through its distribution and its correlation properties at different times. The process $\xi$ is assumed to have a Gaussian distribution of zero average,

\[ \langle \xi(t) \rangle = 0 \tag{43} \]

and to be $\delta$-correlated in time,

\[ \langle \xi(t) \, \xi(t + \tau) \rangle = \delta(\tau) \tag{44} \]

where $\delta$ is the delta function defined by

\[ \int_{\mathbb{R}} f(x) \, \delta(x - x_0) \, dx = f(x_0). \tag{45} \]

The brackets indicate an average over realizations of the random force. Formally, ξ(t) is a random variable, i.e., ξ(t)(α) with different realizations due to random variable α. The expectation < ξ(t) > is thus the mean over all α :< ξ(t)(α) >α .

Using the ergodic hypothesis, the ensemble average can be expressed as the time average $\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} \cdot \; dt$ of the function. Almost all points in any subset of the phase space eventually revisit the set. For a Gaussian process, only the average and second moment need to be specified since all higher moments can be expressed in terms of the first two. Note that the dependence of the correlation function on the time difference $\tau$ assumes that $\xi$ is a stationary process. $\xi$ is called a white-noise process. Additionally, there might be an external forcing $F(x, t)$ which generally depends on time, on the state variables, and on space. In his theoretical approach, Hasselmann (1976) formulated a linear stochastic climate model

\[ \frac{d}{dt} x(t) = A x + \sigma \xi + F(t), \tag{46} \]

with system matrix $A \in \mathbb{R}^{n \times n}$, constant noise term $\sigma$, and stochastic process $\xi$. Many features of the climate system can be well described by (46), which is analogous to the Ornstein-Uhlenbeck process in statistical physics (Uhlenbeck and Ornstein, 1930). Notice that $\sigma \xi$ represents a stationary random process. The relationship derived above is identical to that describing the diffusion of a fluid particle in a turbulent fluid. In a system with time scale separation, during one slow-time unit the fast uninteresting variables $y$ perform many “uncorrelated” events (provided that the fast dynamics are sufficiently chaotic). The contribution of the uncorrelated events to the dynamics of the slow interesting variables $x$ is a sum of independent random variables. By the weak central limit theorem, this can be expressed by a normally distributed variable. Note that, in the absence of any feedback effects $Ax$, the climate variations would continue to grow indefinitely as a Wiener process. A perturbation in a system with a negative feedback mechanism will be reduced, whereas in a system with positive feedback mechanisms, the perturbation will grow. In the one-dimensional case, $A$ can be rewritten as $-\lambda$. The real part of $\lambda$ then determines the stability of the system and is called the feedback factor.
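A one-dimensional version of (46) without external forcing, $dx/dt = -\lambda x + \sigma \xi$, can be simulated with a few lines of Python. The sketch below is a minimal illustration, not the method of any particular climate model: it assumes the Euler-Maruyama discretization, and the parameter values, the seed, and the function name `simulate_ou` are arbitrary choices made for this example. For this process the stationary variance is $\sigma^2/(2\lambda)$, which the sample variance of a long run should approach.

```python
import math
import random


def simulate_ou(lam, sigma, dt, n_steps, x0=0.0, seed=0):
    """Euler-Maruyama integration of dx = -lam * x dt + sigma dW.

    The Wiener increment dW over a step dt is drawn as a Gaussian
    random number with zero mean and standard deviation sqrt(dt).
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x = x0
    path = [x]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, sqrt_dt)
        x += -lam * x * dt + sigma * dW
        path.append(x)
    return path


# Assumed illustration values: feedback factor 1, noise amplitude 1.
lam, sigma, dt = 1.0, 1.0, 0.01
path = simulate_ou(lam, sigma, dt, n_steps=400_000)

# Discard the first 1000 steps as spin-up, then estimate mean and variance.
sample = path[1000:]
mean = sum(sample) / len(sample)
variance = sum((v - mean) ** 2 for v in sample) / len(sample)

print(f"sample mean       = {mean:.3f}")
print(f"sample variance   = {variance:.3f}")
print(f"sigma^2 / (2 lam) = {sigma**2 / (2.0 * lam):.3f}")
```

Setting `lam=0.0` in the same function removes the feedback term, so the path becomes a discretized Wiener process whose spread keeps growing with time.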

Climate Variability and Sensitivity

Imagine now that the temperature of the ocean mixed layer of depth $h$ is governed by a one-dimensional system

\[ \frac{dT}{dt} = -\lambda T + Q_{net} + f(t), \tag{47} \]

where the air-sea fluxes due to weather systems are represented by a white-noise process with zero average $\langle Q_{net} \rangle = 0$ and $\delta$-correlated in time, $\langle Q_{net}(t) \, Q_{net}(t + \tau) \rangle = \delta(\tau)$. The function $f(t)$ is a time-dependent deterministic forcing. Assume furthermore that $f(t) = c \cdot u(t)$ with $u(t)$ as the unit step or so-called Heaviside step function. Because $\langle Q_{net} \rangle = 0$, $\langle T(t) \rangle$ can be solved using the Laplace transform:

\[ \langle T(t) \rangle = \mathcal{L}^{-1}\{F(s)\}(t) = \mathcal{L}^{-1}\left\{ \frac{\langle T(0) \rangle}{s + \lambda} + \frac{c}{s} \cdot \frac{1}{s + \lambda} \right\} \tag{48} \]

\[ = T(0) \cdot \exp(-\lambda t) + \frac{c}{\lambda} \left(1 - \exp(-\lambda t)\right) \tag{49} \]

because we have $\langle T(0) \rangle = T(0)$. As equilibrium response, we have

\[ \Delta T = \lim_{t \to \infty} \langle T(t) \rangle = \frac{c}{\lambda}. \tag{50} \]
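The closed-form mean response (49) and its limit (50) can be checked against a direct numerical integration of the averaged equation $d\langle T \rangle/dt = -\lambda \langle T \rangle + c$. The following Python sketch is a minimal illustration; the forward Euler scheme, the parameter values, and the function names are assumptions made for this example.

```python
import math


def mean_response(t, T0, lam, c):
    """Analytic mean temperature, Eq. (49)."""
    return T0 * math.exp(-lam * t) + c / lam * (1.0 - math.exp(-lam * t))


def integrate_mean(T0, lam, c, dt, n_steps):
    """Forward Euler integration of d<T>/dt = -lam * <T> + c."""
    T = T0
    for _ in range(n_steps):
        T += dt * (-lam * T + c)
    return T


# Assumed illustration values (arbitrary units).
T0, lam, c = 1.0, 0.5, 2.0
dt, n_steps = 1.0e-3, 10_000          # integrate up to t = 10

numeric = integrate_mean(T0, lam, c, dt, n_steps)
analytic = mean_response(dt * n_steps, T0, lam, c)
equilibrium = c / lam                 # Eq. (50)

print(f"numerical <T>(t=10) = {numeric:.4f}")
print(f"analytic  <T>(t=10) = {analytic:.4f}")
print(f"equilibrium c / lam = {equilibrium:.4f}")
```

The time needed to approach the equilibrium value is set by $1/\lambda$, so a weaker feedback gives both a larger equilibrium response and a slower adjustment.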

The fluctuation can be characterized by the spectrum

\[ S(\omega) = \langle \hat{T} \hat{T}^* \rangle = \frac{1}{\lambda^2 + \omega^2}, \tag{51} \]

and therefore, the spectrum and the equilibrium response are closely coupled (fluctuation-dissipation theorem). In mathematics, this is called the Wiener-Chintschin theorem (Chintchin, 1934; Wiener, 1930). For some energy considerations, it is useful to rewrite equation (47) as

\[ C \frac{dT}{dt} = -\lambda_C T + f_C, \tag{52} \]

with $C = c_p \rho \, dz$ as the heat capacity of the ocean. For a depth of 200 m of water distributed over the globe, $C = 4.2 \cdot 10^3$ W s kg⁻¹ K⁻¹ × 1000 kg m⁻³ × 200 m $= 8.4 \cdot 10^8$ W s m⁻² K⁻¹. The temperature evolution is

\[ T(t) = T(0) \cdot \exp\left(-\frac{\lambda_C}{C} t\right) + \frac{f_C}{\lambda_C} \left(1 - \exp\left(-\frac{\lambda_C}{C} t\right)\right) \tag{53} \]

The left-hand side of (52) represents the heat uptake by the ocean, which plays a central role in the transient response (53) of the system to a perturbation. Typical values are f_C = 4 W m^−2 for a doubling of CO2 and λ_C = 1–2 W m^−2 K^−1. The typical time scale for a mixed layer ocean is then C/λ_C = 13–26 years. Please note that the climate system is simplified here to a slab ocean with homogeneous temperature and heat capacity. This is an approximation, as the heat capacity should vary in time as the perturbation penetrates to deeper oceanic levels. The equilibrium temperature change ΔT is

$$\Delta T = \frac{\Delta f_C}{\lambda_C} = \frac{c}{\lambda} \tag{54}$$

with values of ΔT = 2–4 K. The term CS = 1/λ_C is called the climate sensitivity to a radiative forcing Δf_C:

$$\Delta T = CS \cdot \Delta f_C. \tag{55}$$

In the literature, climate sensitivity is quite often defined as the equilibrium temperature increase for the forcing Δf_C associated with a doubling of CO2. The CS evidently depends on which feedbacks of the system are included, and these are related to the climate components and their respective time scales (e.g., Lohmann 2018). Because the system (46) is in general non-normal, the effective damping may not be directly related to the eigenvalues of the system.
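The numbers quoted for the slab ocean can be reproduced with a few lines of arithmetic. The sketch below uses only the values given in the text (specific heat, density, mixed-layer depth, forcing, and the range of λ_C); the variable and function names, and the choice of 365.25 days per year, are assumptions made for illustration.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed conversion factor

# Values quoted in the text
c_p = 4.2e3      # specific heat of water, W s kg^-1 K^-1
rho = 1000.0     # density, kg m^-3
depth = 200.0    # mixed-layer depth, m
f_C = 4.0        # forcing for a doubling of CO2, W m^-2

C = c_p * rho * depth  # heat capacity per unit area, W s m^-2 K^-1

def slab_temperature(t_years, lam_C, T0=0.0):
    """Temperature anomaly from Eq. (53) after t_years for feedback lam_C."""
    decay = math.exp(-lam_C / C * t_years * SECONDS_PER_YEAR)
    return T0 * decay + f_C / lam_C * (1.0 - decay)

results = {}
for lam_C in (1.0, 2.0):  # W m^-2 K^-1, the range given in the text
    tau_years = C / lam_C / SECONDS_PER_YEAR  # e-folding time C / lam_C
    delta_T = f_C / lam_C                     # equilibrium response, Eq. (54)
    results[lam_C] = (tau_years, delta_T)
    print(f"lam_C = {lam_C}: time scale {tau_years:.1f} yr, "
          f"equilibrium warming {delta_T:.1f} K, "
          f"warming after 10 yr {slab_temperature(10.0, lam_C):.2f} K")
```

A weaker feedback (smaller λ_C) gives both a larger equilibrium warming and a slower approach to it, which is why the transient and equilibrium responses have to be distinguished.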

Non-normal Growth of the Climate System

In the one-dimensional case for x(t) = exp(at), we have the inverse Laplace transform

$$\exp(at) = \mathcal{L}^{-1}\{F(s)\}(t) = \frac{1}{2\pi i} \lim_{T \to \infty} \int_{\gamma - iT}^{\gamma + iT} e^{st} \, \frac{1}{s-a} \, ds, \tag{56}$$

and the entire range of t is controlled by the resolvent |1/(s − a)|. Using the Fourier transformation, (46) with forcing F(t) is transformed to

$$(i\omega I - A)\, \hat{x} = \hat{F} \tag{57}$$

$$\hat{x} = (i\omega I - A)^{-1} \hat{F} \tag{58}$$

where I is the identity. The so-called resolvent operator of the matrix A is R(ω) = (iωI − A)^−1. The behavior of the norms ||exp(At)|| over the entire range of t is controlled by the resolvent norm ||R(ω)||. If A is a normal operator,

$$A A^{+} = A^{+} A, \tag{59}$$

where + denotes the adjoint (complex-conjugate transpose), then

$$\|R(\omega)\| = \frac{1}{\mathrm{dist}(i\omega, \sigma(A))} \tag{60}$$

is completely determined by the spectrum σ(A) alone. The operator dist denotes the shortest distance from iω to the eigenvalues, i.e., to the spectrum σ(A). This explains the success of eigenvalue analysis. In contrast, for non-normal operators the behavior of ||R(ω)|| may deviate from this dramatically, and hence in this context pseudospectral analysis is the appropriate tool. For example, there are problems in fluid mechanics where σ(A) is contained in the left half-plane, which suggests laminar behavior, but the pseudospectrum protrudes strongly into the right half-plane, which implies that ||exp(At)|| has a big hump before decaying exponentially fast to zero (Reddy et al., 1993; Trefethen et al., 1993). More about the dynamics can be learned by examining the pseudospectrum of A in the complex plane. Inspection of many geophysical systems shows that most of them fail the normality condition. The ε-pseudospectrum of the operator A is defined by two equivalent formulations:

$$\Lambda_\epsilon(A) = \{ z \in \mathbb{C} : \|(zI - A)^{-1}\| \ge \epsilon^{-1} \} = \{ z \in \mathbb{C} : \sigma_{\min}(zI - A) \le \epsilon \}, \tag{61}$$

where σ_min denotes the smallest singular value.
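The two formulations in (61), and the contrast between (60) and the non-normal case, can be illustrated with a small example. This is a sketch under stated assumptions: the two 2×2 matrices and the test point z are chosen for illustration only (they share the eigenvalues −1 and −2), the norm is the matrix 2-norm, and the function names are hypothetical.

```python
import numpy as np

def resolvent_norm(A, z):
    """2-norm of (zI - A)^{-1}, computed as 1 / smallest singular value."""
    n = A.shape[0]
    singular_values = np.linalg.svd(z * np.eye(n) - A, compute_uv=False)
    return 1.0 / singular_values[-1]  # singular values are sorted descending

def inverse_distance_to_spectrum(A, z):
    """Right-hand side of Eq. (60): 1 / dist(z, sigma(A))."""
    return 1.0 / np.min(np.abs(z - np.linalg.eigvals(A)))

def in_pseudospectrum(A, z, eps):
    """Membership test for the eps-pseudospectrum, second form of Eq. (61)."""
    return resolvent_norm(A, z) >= 1.0 / eps

# Same eigenvalues (-1 and -2), but only the first matrix is normal
A_normal = np.diag([-1.0, -2.0])
A_nonnormal = np.array([[-1.0, 10.0],
                        [0.0, -2.0]])

z = 0.5j  # a point on the imaginary axis, i.e., a forcing frequency
for name, A in (("normal", A_normal), ("non-normal", A_nonnormal)):
    print(f"{name:>10}: ||R(z)|| = {resolvent_norm(A, z):.3f}, "
          f"1/dist = {inverse_distance_to_spectrum(A, z):.3f}")
```

For the normal matrix the two printed numbers agree, as (60) requires. For the non-normal matrix the resolvent norm is several times larger than the eigenvalues alone would suggest, so the point z belongs to an ε-pseudospectrum for a much smaller ε.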

This set of values z in the complex plane is bounded by contour lines of the resolvent norm ||(zI − A)^−1||. The resolvent determines the system’s response to a forcing, whether supplied by external forcing F(x, t), stochastic forcing g(x)ξ, or initial/boundary conditions. The pseudospectrum reflects the robustness of the spectrum and provides information about instability and resonance. One theorem, derived from the Laplace transformation, states that transient growth is related to how far the ε-pseudospectrum extends into the right half-plane:

$$\sup_{t \ge 0} \|\exp(At)\| \ge \frac{1}{\epsilon} \sup_{z \in \Lambda_\epsilon(A)} \mathrm{Re}(z). \tag{62}$$
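The bound (62) can be explored numerically for a small stable but non-normal matrix. The sketch below rests on several assumptions: the 2×2 matrix is an illustrative choice, the matrix exponential is computed by eigendecomposition (valid here because the matrix is diagonalizable), and the supremum over the pseudospectrum is replaced by a maximum over a finite grid, which can only underestimate it. Every point z with Re(z) > 0 lies in the ε-pseudospectrum for ε = σ_min(zI − A), so Re(z)/σ_min(zI − A) is a lower bound of the type (62).

```python
import numpy as np

# Stable (eigenvalues -1 and -2) but non-normal matrix; illustrative choice
A = np.array([[-1.0, 10.0],
              [0.0, -2.0]])

def expm_diagonalizable(A, t):
    """exp(A t) via eigendecomposition; assumes A is diagonalizable."""
    eigenvalues, V = np.linalg.eig(A)
    return (V * np.exp(eigenvalues * t)) @ np.linalg.inv(V)

def sigma_min(A, z):
    """Smallest singular value of (zI - A)."""
    n = A.shape[0]
    return np.linalg.svd(z * np.eye(n) - A, compute_uv=False)[-1]

# Norm of the propagator as a function of time
times = np.linspace(0.0, 10.0, 1001)
growth = np.array([np.linalg.norm(expm_diagonalizable(A, t), 2) for t in times])

# Grid estimate of the lower bound in Eq. (62), using eps = sigma_min at each z
xs = np.linspace(0.05, 6.0, 120)    # Re(z) > 0
ys = np.linspace(-6.0, 6.0, 121)    # Im(z)
lower_bound = max(x / sigma_min(A, complex(x, y)) for x in xs for y in ys)

print("largest eigenvalue real part:", np.linalg.eigvals(A).real.max())
print("max ||exp(At)|| on the grid: ", growth.max())
print("pseudospectral lower bound:  ", lower_bound)
```

All eigenvalues are negative, yet the propagator norm rises well above 1 before it decays, and the pseudospectral estimate already guarantees that some growth must occur.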

In terms of climate theory, the pseudospectrum indicates resonant amplification. Maximal amplification is at the poles of (zI − A)^−1, characterized by the eigenfrequencies. For a normal matrix A, the system’s response is characterized solely by the proximity to the eigenfrequencies. In the non-normal case, the pseudospectrum shows large resonant amplification for frequencies which are not eigenfrequencies. This transient growth mechanism is important for both initial value and forced problems.

The atmospheric general circulation model PUMA (Fraedrich et al., 2005) is applied to the problem. The model is based on the multilevel spectral model described by Hoskins and Simmons (1975). For our experiments we chose five vertical levels and a T21 horizontal resolution. PUMA belongs to the class of models of intermediate complexity (Claussen et al., 2002); it has been used to understand principal feedbacks (Lunkeit et al., 1998) and dynamics on long time scales (Romanova et al., 2006). For simplicity, the equations are scaled here such that they are dimensionless. The model is linearized about a zonally symmetric mean state providing for a realistic storm track at midlatitudes (Frisius et al., 1998). In a simplified version of the model, and calculating the linear model A with n = 214, one can derive the pseudospectrum. Figure 7 indicates resonances besides the poles (the eigenvalues), which are marked by crosses. The Im(z) axis shows the frequencies and the Re(z) axis the damping/amplification of the modes. Important modes for the climate system are those with −0.5 < Im(z) < 0.5, representing planetary Rossby waves. The basic feature is that transient growth of initially small perturbations can occur even if all the eigenmodes decay exponentially.

Fig. 7 Contours of log10(1/ε). The figure displays resonant structures of the linearized atmospheric circulation model. The modes extend to the right half-plane and are connected through resonant structures, indicating a transient growth mechanism inherent in atmospheric dynamics

Mathematically, an arbitrary matrix A can be decomposed as a sum

$$A = D + N \tag{63}$$

where D is diagonalizable, N is nilpotent (there exists an integer q ∈ N with N^q = 0), and D commutes with N (i.e., DN = ND). This fact follows from the Jordan-Chevalley decomposition theorem. It means that we can compute the exponential of At by reducing to these two cases:

$$\exp(At) = \exp((D + N)\, t) = \exp(Dt) \, \exp(Nt) \tag{64}$$

where the exponential of Nt can be computed directly from the series expansion, as the series terminates after a finite number of terms. Basically, the number q ∈ N is related to the transient growth of the system (q = 1 means no transient growth). The resonant structures are due to the mode interaction: it is not possible to change one variable without the others, because they are not orthogonal. Interestingly, one can also compute the adjoint model A^+, showing the optimal perturbation of a mode through its biorthogonal vector, which is the associated eigenvector of the adjoint A^+. The analysis indicates that non-normality of the system is a fundamental feature of the atmospheric dynamics. This has consequences for the error growth dynamics and instability of the system (e.g., Palmer 1996; Lohmann and Schneider 1999). Similar features are obtained in shear flow systems (Reddy et al., 1993; Trefethen et al., 1993) and other hydrodynamic applications. This transient growth mechanism is important for both initial value and forced problems of the climate system (Farrell and Ioannou, 1996).
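The factorization (64) can be made concrete with a single Jordan block. The sketch below is an illustration under assumptions, not the PUMA calculation: the matrix A = D + N with D = −I and a 2×2 nilpotent N (so q = 2) is chosen by hand, and a truncated Taylor series serves as an independent reference for exp(At). The helper names are hypothetical.

```python
import numpy as np

def expm_taylor(M, terms=60):
    """Reference matrix exponential from a truncated Taylor series."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

def expm_nilpotent(N, t):
    """exp(N t) from the finite series; for an n x n nilpotent N, N^n = 0."""
    n = N.shape[0]
    result = np.eye(n)
    term = np.eye(n)
    for k in range(1, n):
        term = term @ (N * t) / k
        result = result + term
    return result

def expm_diagonal(D, t):
    """exp(D t) for a diagonal matrix D."""
    return np.diag(np.exp(np.diag(D) * t))

# Illustrative decomposition A = D + N with DN = ND and N^2 = 0
D = -np.eye(2)
N = np.array([[0.0, 5.0],
              [0.0, 0.0]])
A = D + N

for t in (0.5, 1.0, 3.0, 8.0):
    propagator = expm_diagonal(D, t) @ expm_nilpotent(N, t)  # Eq. (64)
    print(f"t = {t}: ||exp(At)|| = {np.linalg.norm(propagator, 2):.3f}")
```

Here exp(At) = exp(−t)(I + Nt): the polynomial factor from the nilpotent part produces growth at early times before the exponential decay from D takes over.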

Predictability

In climate science we may ask how well we know the initial state. Climatologists are always uncertain when specifying initial values. There is always a more or less large inaccuracy due to weather and to uncertainties in many quantities that cannot be observed at every time step (e.g., the deep ocean). The Lyapunov exponents λ play an important role in assessing the predictability of a system: the larger they are, the smaller the number of steps for which predictions can be made with a desired accuracy. Consider a trajectory x(t) and a nearby trajectory x(t) + δ(t), where δ(t) is a vector with infinitesimal initial length. As the system evolves, track how δ(t) changes. The maximal Lyapunov exponent of the system is the number λ such that |δ(t)| ≈ |δ(0)| · exp(λt). A classical example is again the Lorenz system (37), (38), and (39), where predictability is limited in large parts of the phase space because initial errors can grow. Every dynamical system has a spectrum of Lyapunov exponents, one for each dimension of its phase space. Like the largest eigenvalue of a matrix, the largest Lyapunov exponent is responsible for the dominant behavior of a system. In the case of weather and climate, this Lyapunov exponent is also time scale dependent. Causality in climate has only a limited range in time and can only be verified in the context of finite errors. Please note that even in classical mechanics, strict causal relationships cannot be verified experimentally! It is true that the classical laws of motion are generally deterministic. But the connection with the real physical world is always possible only with limited accuracy due to the unavoidable measurement errors. Therefore, the actual state can only be given as a probability distribution over ranges of states. In the usual discussion, this important part of physics is often left out; one likes to limit oneself to the equations of motion alone.
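The definition |δ(t)| ≈ |δ(0)| · exp(λt) translates directly into a numerical estimate: follow two nearby trajectories, record how fast their separation grows, and rescale the separation after every step so that it stays small. The sketch below applies this to the Lorenz system. It is an illustration under assumptions: the classical parameter values (10, 28, 8/3), the fourth-order Runge-Kutta integrator, the step size, and the initial condition are standard choices that are not specified in this section.

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system (classical parameters assumed)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, state, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def max_lyapunov(n_steps=20000, dt=0.01, d0=1e-8, n_transient=2000):
    """Estimate the maximal Lyapunov exponent from two nearby trajectories."""
    x = np.array([1.0, 1.0, 1.0])
    for _ in range(n_transient):          # discard the initial transient
        x = rk4_step(lorenz, x, dt)
    y = x + np.array([d0, 0.0, 0.0])      # perturbed companion trajectory
    log_growth = 0.0
    for _ in range(n_steps):
        x = rk4_step(lorenz, x, dt)
        y = rk4_step(lorenz, y, dt)
        d = np.linalg.norm(y - x)
        log_growth += np.log(d / d0)      # growth of the separation this step
        y = x + (y - x) * (d0 / d)        # rescale back to length d0
    return log_growth / (n_steps * dt)

lyap = max_lyapunov()
print("maximal Lyapunov exponent:", lyap)
print("Lyapunov time 1/lambda:   ", 1.0 / lyap)
```

The inverse of the estimated exponent gives the time over which an initial error grows by a factor e, in the nondimensional time units of the model.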
Natural events also have their own time scales, namely, the so-called Lyapunov times t_Lyap (these result from the expansion rates as t_Lyap = λ^−1). For a weather situation, the Lyapunov time is about seconds to days; for climate, years to decades. We now have to compare the two relevant time scales, the humanly relevant time scale t_M and the Lyapunov time t_Lyap. Three cases are possible:

• The Lyapunov time of the climate system under investigation is much larger than the humanly relevant time scale (t_Lyap >> t_M): Then we can imagine the initial state inaccuracy to be smaller and smaller, because this would only increase the prediction time. Even if the prediction time became infinite in our thoughts because we let the measurement inaccuracy go to zero, we would not even notice it. Therefore, in these cases we consider the event (within the accepted accuracy) to be predictable, causal.
• The humanly relevant time scale is much larger than the Lyapunov time of the investigated system (t_M >> t_Lyap): In this case prediction is no longer possible; the actual course is completely different from the expected one. We observe statistical, random behavior, since we do not know the actual initial state.
• The relevant time scale is about as large as the Lyapunov time of the system under study (t_M ≈ t_Lyap): Then approximate, though not exact, predictions are possible; the behavior is also not entirely random. The predictions can even be improved by progress in measurement or by less demanding requirements for prediction accuracy. A good example is weather forecasts, which are only possible to a limited extent; in the short and medium term, they are now quite reliable.

In essence, therefore, causality depends on the time of interest in comparison to the forecast time: we can regard a phenomenon “practically” as causal, as predictable (sometimes in excellent approximation); or the event appears to us to be completely random; or, finally, it lies in the transition area and is therefore experienced as improvable by increasing the accuracy, i.e., as neither causal nor statistical. Climate is not only a differential equation; it must also be coupled to the real world by specifying the initial values with measurement errors and by translating the final values into measurable predictions. Thus it loses its purely mathematical, causal character determined by the solution of differential equations. Besides the initial conditions, uncertainties can appear through the external forcing. A prominent external forcing is the change in atmospheric greenhouse gas concentrations (e.g., Fig. 8), which strongly affects the long-term evolution of the Earth system. Another external forcing is due to changes in insolation caused by the orbital parameters. These parameters vary on multi-millennial time scales (thousand years = ky) and can be calculated by orbital theory.
Milankovitch (1941) suggested that ice sheet growth and decay are triggered by this external forcing. There are several open questions for paleoclimate dynamics. Although the frequency and amplitude characteristics of the orbital parameters, i.e., eccentricity (∼100 ky), obliquity (∼41 ky), and precession (∼21 and ∼19 ky), do not vary (Berger and Loutre, 1991), paleoclimatic records show a pronounced change in the Earth system response: the climate frequency does vary. The uncertainty on long time scales is usually dominated by the external forcing and on short time scales by the initial value problem; the intermediate range of 10–50 years, relevant for the coming decades, is dominated by internal variability and uncertainty in model physics (Hawkins and Sutton, 2009). The uncertainty of global and especially regional temperature estimates on decadal to multidecadal time scales is manifested by large-scale coherent patterns like the AMO, the PDO, and the quasi-decadal mode (cf. Fig. 2). For a while, people tended to think that deterministic models would still always provide the best models in these cases. One line of thought was that deterministic models would be completely adequate for describing the Earth’s atmosphere, which is basically just a layer of gas subject to external heating. As seen here, the phenomenon of stochasticity means that this is not so. As in the Lorenz system, chaos is introduced into the climate system. So, in many cases, for quite fundamental reasons, deterministic mathematical models do not provide adequate models. Statistical models are mathematical models, each replicate realization of which will be different from other realizations with the same model, even under identical conditions. Statistical models are the major means of making sense of the climate dynamics (Fig. 9).

Fig. 8 Different climate scenarios with a coupled Earth system model (Ackermann et al., 2020). Time series of 11-year mean (a) CO2 forcing as concentration of CO2 equivalent in the atmosphere, (b) global near-surface average temperature, (c) sea ice volume in the Northern Hemisphere; shaded areas indicate 1 standard deviation

Fig. 9 The wavelet sample spectrum of long-term climate change. The climate record is based on Lisiecki and Raymo (2005). The wavelet spectrum is calculated using the Morlet wavelet with ω0 = 6. Thin and thick lines surround pointwise and areawise significant patches, respectively

Boltzmann Dynamics

One of the most significant theoretical breakthroughs in statistical physics was due to Ludwig Boltzmann (Boltzmann 1896; see Boltzmann 1995 for a recent reprint of his famous lectures on kinetic theory), who pioneered nonequilibrium statistical mechanics. Boltzmann postulated that a gas was composed of a set of interacting particles, whose dynamics could be (at least in principle) modeled by classical dynamics. Due to the very large number of particles in such a system, a statistical approach was adopted, based on simplified physics composed of particle streaming in space and billiard-like inter-particle collisions (which are assumed elastic). As already mentioned above, a fluid can be described by several physical theories of different granularities. The fact that we can, in principle, recover the phenomena predicted by the coarse-grained theories from solutions of the fine-grained theories also suggests a non-conventional way of constructing numerical algorithms for simulating fluid flows: instead of directly modeling the coarse-grained equations (i.e., the Navier-Stokes equations for human-scale flows), we can construct a simplified model of the fine-grained equations, which will exhibit the same behavior at the larger scales. In the following, an example derived for the Lattice Boltzmann Model (LBM) is shown which is related to the thermohaline circulation.

Water that is dense enough to sink from the surface to the bottom is formed when cold air blows across the ocean at high latitudes in winter in the northern North Atlantic (e.g., in the Labrador Sea and between Norway and Greenland) and near Antarctica. The wind cools and evaporates water. If the wind is cold enough, sea ice forms, further increasing the salinity of the water, because sea ice is fresher than seawater and the salt remains in the water when ice is formed. Bottom water is produced only in these regions, and the deep ocean is affected by these deepwater formation processes. In other regions, cold, dense water is formed, but it is not quite salty enough to sink to the bottom. At mid and low latitudes, the density, even in winter, is sufficiently low that the water cannot sink more than a few hundred meters into the ocean. The only exceptions are some seas, such as the Mediterranean Sea, where evaporation is so great that the water becomes salty enough to sink to intermediate depths in these seas. If these seas can exchange water with the open ocean, the waters formed in winter in the seas spread out to intermediate depths in the ocean. A numerical solution of this model is shown in Fig. 10.
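The streaming-and-collision picture described above can be sketched in a few lines. The following is a minimal isothermal lattice Boltzmann scheme on the standard D2Q9 lattice with a single-relaxation-time (BGK) collision term and periodic boundaries. It is not the model used for Fig. 10, which additionally involves temperature, boundary conditions, and obstacles; the lattice, the relaxation time tau, the grid size, and the shear-wave test case are all assumptions chosen for illustration.

```python
import numpy as np

# D2Q9 lattice: nine discrete velocities and their standard weights
VELOCITIES = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
                       [1, 1], [-1, 1], [-1, -1], [1, -1]])
WEIGHTS = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

def equilibrium(rho, ux, uy):
    """Discrete equilibrium distribution for density rho and velocity (ux, uy)."""
    usq = ux ** 2 + uy ** 2
    feq = np.empty((9,) + rho.shape)
    for i, (cx, cy) in enumerate(VELOCITIES):
        cu = cx * ux + cy * uy
        feq[i] = WEIGHTS[i] * rho * (1.0 + 3.0 * cu + 4.5 * cu ** 2 - 1.5 * usq)
    return feq

def macroscopic(f):
    """Density and velocity as moments of the particle distributions."""
    rho = f.sum(axis=0)
    ux = (f * VELOCITIES[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * VELOCITIES[:, 1, None, None]).sum(axis=0) / rho
    return rho, ux, uy

def step(f, tau):
    """One time step: BGK collision followed by streaming along each velocity."""
    rho, ux, uy = macroscopic(f)
    f = f - (f - equilibrium(rho, ux, uy)) / tau           # collision
    for i, (cx, cy) in enumerate(VELOCITIES):              # streaming
        f[i] = np.roll(f[i], shift=(int(cx), int(cy)), axis=(0, 1))
    return f

# Test case: decay of a sinusoidal shear wave u_x(y) on a periodic grid
n, tau, n_steps = 32, 0.8, 100
y = np.arange(n)
wave = np.sin(2.0 * np.pi * y / n)
ux0 = 0.01 * np.tile(wave, (n, 1))        # array axes are (x, y)
f = equilibrium(np.ones((n, n)), ux0, np.zeros((n, n)))

mass_start = f.sum()
for _ in range(n_steps):
    f = step(f, tau)
mass_end = f.sum()

_, ux, _ = macroscopic(f)
amplitude_ratio = (ux * wave).mean() / (ux0 * wave).mean()
viscosity = (tau - 0.5) / 3.0             # lattice viscosity of the BGK scheme
expected = np.exp(-viscosity * (2.0 * np.pi / n) ** 2 * n_steps)
print("measured decay of the shear wave:", amplitude_ratio)
print("viscous decay exp(-nu k^2 t):    ", expected)
```

Although the scheme only moves and relaxes particle populations, the shear wave decays at approximately the rate expected from a viscous fluid, which is the sense in which the fine-grained model reproduces the coarse-grained behavior.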

[Figure 10: four panels labeled a to d with axes x and y (tick marks from 0.0 to 1.0); the panel titles read "Vorticity time=22320", "Vorticity time=9990", "Vorticity time=7245", and "Vorticity time=7152".]

Fig. 10 Four examples of the ocean flow for different boundary conditions, and fixed Prandtl number = 1 and Rayleigh number = 45000. The contours show lines of constant vorticity; the colors in the background display the temperatures (purple, warm; blue, cold). For the right scenarios, an obstacle representing an oceanic sill is implemented. (a) Two hemisphere temperature. (b) Ridge and a two hemisphere temperature. (c) Linear temperature gradient. (d) Flow including a ridge

Conclusions

Climate change has occurred throughout the history of the Earth, including the tectonic movements over billions of years, and climate has varied between extremes before any anthropogenic action could have arisen. However, anthropogenic action in terms of heavy usage of fossil fuel has the potential to affect the Earth to a point where its habitability is significantly affected. In terms of the time scale, it is noted that we might disturb planetary-scale processes in the course of a few decades. The complication is due to the fact that the climate system has inherent fluctuations (internal climate variability), uncertainties in model formulations, and scenario uncertainties for past and future climate scenarios. Modeling is necessary to produce a useful understanding of abrupt climate processes. Model analyses help to focus research on possible causes of abrupt climate change, such as human activities; on key areas where climatic thresholds might be crossed; and on fundamental uncertainties in climate system dynamics. Improved understanding of abrupt climatic changes that occurred in the past and that are possible in the future can be gained through climate models. For climate science, most fundamental laws were discovered decades ago (Landau and Lifshitz, 1987) (although there are Navier-Stokes existence and smoothness problems in three dimensions, see The Clay Mathematics Institute, http://www.claymath.org/millennium-problems/). The system has specific scales and characteristic numbers. Part of the uncertainty is due to the difficulty of finding a proper description of the system. Since it is hard to disagree with the simple statement “more data are better,” the task here is rather to identify those dimensions in the data space where invested resources may yield a maximum of new information.
In this way, data assimilation techniques could help to estimate not only the state of the system but also its uncertainty (Burgers et al., 1998; Kalman, 1960; Nerger and Hiller, 2013). High-resolution models are required to elucidate the causal chains in the climate system, notably during abrupt transitions of the last deglaciation, and provide a benchmark for future transitions under rapid CO2 increase. Practically, given present high-performance computing capacities and efficient, parallelized model codes, it is now possible to conduct simulations of 50–100 model years per day even with a multi-scale ansatz (Lohmann et al., 2020; Sein et al., 2018). Recent developments have considerably improved the computational efficiency and scalability of unstructured-mesh approaches on high-performance computing systems (Danilov et al., 2017). The surface ocean current in such a high-resolution simulation (Fig. 11) has a completely different structure, including eddies, than that in a coarse-resolution model. Weather and climate extremes cause huge economic damage and harm many lives each year (Franzke, 2017). There is evidence that some types of weather and climate extremes, like heat waves and flooding, have already increased or intensified over the last few decades, and climate projections reveal a further intensification for many types of weather and climate extremes in many regions, though the uncertainties still remain large. Future research may be enhanced along several directions, namely data, statistics, theory, and models, leading to an increase in the current knowledge about the climate evolution. It is crucial that researchers deepen or acquire the ability to integrate all directions into their arsenal of mathematical methods.

Fig. 11 Representation of small-scale features of ocean currents in high-resolution ocean models: simulated velocity field in the North Atlantic at 100 m depth in December 1950 using FESOM with a high-resolution, locally eddy-resolving mesh based on Sein et al. (2018)

Cross-References

• Interdisciplinary Mathematics and Sciences in Schematic Ocean Current Maps in the Seas Around Korea
• Limit Cycles in Planar Systems of Ordinary Differential Equations

References

Ackermann L, Danek C, Gierz P, Lohmann G (2020) AMOC recovery in a multicentennial scenario using a coupled atmosphere-ocean-ice sheet model. Geophys Res Lett 47(16):e2019GL086810
Arnold L (1998) Random dynamical systems. Springer, Berlin/Heidelberg
Arnold L (2001) Hasselmann’s program revisited: the analysis of stochasticity in deterministic climate models, vol 49. Birkhäuser, Boston
Arnold L, Imkeller P (1998) Normal forms for stochastic differential equations. Probab Theory Related Fields 110(4):559–588
Berger A, Loutre MF (1991) Insolation values for the climate of the last 10 million years. Quat Sci Rev 10(4):297–317
Bhatnagar P, Gross EP, Krook MK (1954) A model for collision process in gases. I. Small amplitude processes in charged and neutral one-component system. Phys Rev 94:511
Boltzmann L (1896) Vorlesungen über Gastheorie: 2 volumes (in German). Leipzig 1895/98 UB: O 5262–6.
Boltzmann L (1995) Lectures on gas theory. Dover Publications, New York. ISBN:9780486684550
Buckingham E (1914) On physically similar systems; illustrations of the use of dimensional equations. Phys Rev 4(4):345–376. https://doi.org/10.1103/PhysRev.4.345
Burgers G, Jan van Leeuwen P, Evensen G (1998) Analysis scheme in the ensemble Kalman filter. Mon Weather Rev 126(6):1719–1724
Cercignani C (1990) Mathematical methods in kinetic theory, 2nd edn. Plenum. ISBN:9780306434600
Chintchin A (1934) Korrelationstheorie der stationären stochastischen Prozesse. Math Ann 109(1):604–615
Chorin AJ, Kast AP, Kupferman R (1999) Unresolved computation and optimal predictions. Commun Pure Appl Math 52(10):1231–1254
Claussen M, Mysak L, Weaver A, Crucifix M, Fichefet T, Loutre MF, Weber S, Alcamo J, Alexeev V, Berger A, et al (2002) Earth system models of intermediate complexity: closing the gap in the spectrum of climate system models. Clim Dyn 18(7):579–586
Danilov S, Sidorenko D, Wang Q, Jung T (2017) The finite-volume sea ice–ocean model (FESOM2). Geosci Model Dev 10:765–789
Einstein A (1905) Investigations on the theory of the Brownian movement. Ann der Phys 17:549–560
Farrell BF, Ioannou PJ (1996) Generalized stability theory. Part I: autonomous operators. J Atmos Sci 53(14):2025–2040
Fraedrich K, Jansen H, Kirk E, Luksch U, Lunkeit F (2005) The planet simulator: towards a user friendly model. Meteorol Z 14(3):299–304
Franzke CL (2017) Impacts of a changing climate on economic damages and insurance. Econ Disasters Clim Change 1(1):95–110
Frisch U (1996) Turbulence: the legacy of A.N. Kolmogorov. Cambridge University Press. ISBN:0521-45103-5
Frisius T, Lunkeit F, Fraedrich K, James IN (1998) Storm-track organization and variability in a simplified atmospheric global circulation model. Q J R Meteorol Soc 124(548):1019–1043
Gottwald G (2010) On recent trends in climate dynamics. AMS Gazette 37(5)
Haken H (1983) Synergetics. Springer, Berlin/Heidelberg
Hasselmann K (1976) Stochastic climate models. Part I. Theory. Tellus 6:473–485
Hawkins E, Sutton R (2009) The potential to narrow uncertainty in regional climate predictions. Bull Am Meteorol Soc 90(8):1095–1108
Hoskins B, Simmons A (1975) A multi-layer spectral model and the semi-implicit method. Q J R Meteorol Soc 101(429):637–655
Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng 82(1):35–45. https://doi.org/10.1115/1.3662552
Landau LD, Lifshitz EM (1959) Fluid mechanics, course of theoretical physics, vol 6. Pergamon Press, Oxford
Landau LD, Lifshitz EM (1987) Course of theoretical physics: fluid mechanics, vol 6, 2nd edn. Butterworth-Heinemann. ISBN:978-0750627672
Langevin P (1908) On the theory of Brownian motion. Comptes Rendus 146:530–533
Leith C (1975) Climate response and fluctuation dissipation. J Atmos Sci 32(10):2022–2026
Lisiecki LE, Raymo ME (2005) A Pliocene-Pleistocene stack of 57 globally distributed benthic δ18O records. Paleoceanography 20(1). https://doi.org/10.1029/2004pa001071
Lohmann G (2018) ESD ideas: the stochastic climate model shows that underestimated Holocene trends and variability represent two sides of the same coin. Earth Syst Dyn 9(4):1279–1281
Lohmann G, Schneider J (1999) Dynamics and predictability of Stommel’s box model. A phase-space perspective with implications for decadal climate variability. Tellus A 51(2):326–336
Lohmann G, Butzin M, Eissner N, Shi X, Stepanek C (2020) Abrupt climate and weather changes across time scales. Paleoceanogr Paleoclimatol 35(9). https://doi.org/10.1029/2019pa003782
Lorenz EN (1960) Maximum simplification of the dynamic equations. Tellus 12(3):243–254
Lorenz EN (1963) Deterministic nonperiodic flow. J Atmos Sci 20(2):130–141
Lorenz EN (1984) Irregularity: a fundamental property of the atmosphere. Tellus A 36(2):98–110
Lorenz EN (1986) On the existence of a slow manifold. J Atmos Sci 43(15):1547–1558
Lunkeit F, Fraedrich K, Bauer S (1998) Storm tracks in a warmer climate: sensitivity studies with a simplified global circulation model. Clim Dyn 14(11):813–826
Maas LR (1994) A simple model for the three-dimensional, thermally and wind-driven ocean circulation. Tellus A 46(5):671–680
Milankovitch MK (1941) Kanon der Erdbestrahlung und seine Anwendung auf das Eiszeitenproblem. R Serb Acad Spec Publ 133:1–633
Mori H (1965) Transport, collective motion, and Brownian motion. Prog Theor Phys 33(3):423–455
Morice CP, Kennedy JJ, Rayner NA, Jones PD (2012) Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: the HadCRUT4 data set. J Geophys Res Atmos 117(D8)
Nerger L, Hiller W (2013) Software for ensemble-based data assimilation systems–implementation strategies and scalability. Comput Geosci 55:110–118
Olbers D (2001) A gallery of simple models from climate physics. In: Imkeller P, von Storch J (eds) Stochastic climate models, progress in probability, 49:3–63
Oseledets VI (1968) A multiplicative ergodic theorem. Characteristic Ljapunov, exponents of dynamical systems. Trudy Moskovskogo Matematicheskogo Obshchestva 19:179–210
Palmer T (1996) Predictability of the atmosphere and oceans: from days to decades. In: Decadal climate variability. Springer, Berlin/Heidelberg, pp 83–155
Peixoto JP, Oort AH (1992) Physics of climate. American Institute of Physics, New York
Rayleigh L (1916) On convection currents in a horizontal layer of fluid, when the higher temperature is on the under side. Phil Mag 6:529–546
Reddy SC, Schmid PJ, Henningson DS (1993) Pseudospectra of the Orr–Sommerfeld operator. SIAM J Appl Math 53(1):15–47
Roberts AJ (2008) Normal form transforms separate slow and fast modes in stochastic dynamical systems. Phys A Stat Mech Appl 387(1):12–38
Romanova V, Lohmann G, Grosfeld K, Butzin M (2006) The relative role of oceanic heat transport and orography on glacial climate. Quat Sci Rev 25(7–8):832–845
Saltzman B (1962) Finite amplitude free convection as an initial value problem – I. J Atmos Sci 19:329–341
Sein DV, Koldunov NV, Danilov S, Sidorenko D, Wekerle C, Cabos W, Rackow T, Scholz P, Semmler T, Wang Q, et al (2018) The relative influence of atmospheric and oceanic model resolution on the circulation of the North Atlantic Ocean in a coupled climate model. J Adv Model Earth Syst 10(8):2026–2041
Smagorinsky J (1963) General circulation experiments with the primitive equations: I. The basic experiment. Mon Weather Rev 91(3):99–164
Trefethen LN, Trefethen AE, Reddy SC, Driscoll TA (1993) Hydrodynamic stability without eigenvalues. Science 261(5121):578–584
Uhlenbeck GE, Ornstein LS (1930) On the theory of the Brownian motion. Phys Rev 36(5):823
Wiener N (1930) Generalized harmonic analysis. Acta Math 55:117–258
Zwanzig R (1960) Ensemble method in the theory of irreversibility. J Chem Phys 33:1338
Zwanzig R (1980) Problems in nonlinear transport theory. In: Systems far from equilibrium. Springer, Berlin/Heidelberg, pp 198–225

84 Mathematical Models Can Predict the Spread of an Invasive Species

John G. Alford

Contents
Introduction
Population Growth Models
Dispersal by Diffusion
Conclusion
Cross-References
References

Abstract

Invasive species are nonindigenous plants and animals that have the potential to cause great harm to both the environment and native species. If an invasive species is able to survive and spread throughout the environment, there may be large financial losses to the public. Public policy makers and scientists are responsible for developing and funding programs to control and eradicate invasive species. These programs are founded on the science of invasion ecology and the biological characteristics of the invader. Conclusions drawn from mathematical modeling have contributed to this knowledge base. This chapter presents some of the fundamental mathematical models in population ecology that have been used to predict how an invasive species population can grow and disperse after its introduction. These predictions are shown to be consistent with experimental data. The chapter concludes with a brief discussion of how the basic modeling principles and results that are presented here have contributed to developing strategies for control and eradication efforts.

J. G. Alford, Department of Mathematics and Statistics, Sam Houston State University, Huntsville, TX, USA
© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_52


Keywords Invasive species · Mathematical model · Exponential · Logistic · Allee effect · Reaction-diffusion equation

Introduction

Humans often benefit from their relationship with other species in the plant and animal kingdoms. For example, domesticated animals have been used throughout history to manage farms and ranches, and many plants are prized for their medicinal power (Petrovska 2012). But encounters with other species can be harmful or even perilous. Homeowners across the country lose time and money treating yards that are infested with chinch bugs or homes that are inhabited by termites or rodents. Many people have endured mosquito bites or poison ivy rashes, and occasionally some people die after being bitten or mauled by an animal. Although many harmful species are indigenous, some were unintentionally introduced by unsuspecting agents. These invasive species can cause great economic loss and stress to our environment. Insects offer a sobering example.

• The European gypsy moth (Lymantria dispar) was accidentally introduced from France to Medford, Massachusetts, in the late 1860s by an amateur entomologist (Liebhold and Tobin 2006). Since its introduction, it has defoliated millions of acres of trees in northeastern and mid-western forests, decreasing wildlife habitat and (as of 2008) causing an estimated $3.9 billion in tree losses (Beck et al. 2008).
• The spotted lanternfly (Lycorma delicatula) is native to China and was first detected in Pennsylvania in September 2014. It has since spread to New York, Delaware, and Virginia. The Pennsylvania Department of Agriculture declared that the spotted lanternfly poses a significant threat to its grape, apple, and peach industries, as well as the hardwood industry, which accounts for nearly $17 billion in sales in Pennsylvania (Walker 2018). In response, the US Department of Agriculture is committing $17.5 million in emergency funding to stop the spread of the spotted lanternfly in southeastern Pennsylvania (Walker 2018).
The Executive Summary of the National Invasive Species Management Plan (NISMP) states that an invasive species is one that is “non-native to the ecosystem under consideration and whose introduction causes or is likely to cause economic or environmental harm or harm to human health” (Beck et al. 2008). It has been estimated that only 5–50% of all nonindigenous species become widespread and of those, only (approximately) 0.3% of non-native plants and 25% of non-native animals ultimately cause harm (Keller et al. 2009). But because the harm an invasive species causes can be extreme, the control and eradication of an invasive species requires the well-funded and coordinated efforts of scientists, policy makers, and the general public.


Biological invasions occur over time and space and in a distinct sequence of events: (1) transport of the species to its new location, (2) establishment of the species in its new location, (3) spread of the species to surrounding regions, and (4) impact on the environment and public (Keller et al. 2009; Lockwood et al. 2007). The introduction of a non-native species into a new habitat often occurs via human transport (Beck et al. 2008), and many invaders are simply “hitchhikers” that overcome geographic barriers by traveling over land, sea, and air. For example, non-native aquatic species arrive on ships which dump their ballast water overboard into oceans or lakes (Mills et al. 1994); invading insects are transported in the luggage of unwitting airline passengers or in the wood of lumber shipments (Keller et al. 2009; Liebhold et al. 2006). Once it is introduced, an invasive species must avoid predation, find food, shelter, and mates, and then grow its population to self-sustaining levels. Researchers suggest that only 5% of invading plant species survive their introduction while 50% of invading animal species survive their introduction (Keller et al. 2009). Assuming the invader can survive and is able to establish itself without further introductions of the species, it can then expand its territory by dispersing locally and over long distances using both natural and anthropogenic means (Keller et al. 2009). Examples of natural dispersal mechanisms include active transport such as walking, flying, and swimming or passive transport by wind or current. Anthropogenic dispersal again includes “hitchhiking” on vehicles as plants and animals are transported to markets for consumption or purchase. After expanding its range, a non-native species can cause irreparable harm such as destroying forests and fruit trees or out-competing native species for natural resources.
In 2005 it was estimated that approximately 50,000 nonindigenous species in the United States caused major environmental damage and losses totaling approximately $120 billion per year (Pimentel et al. 2005). The public is constantly at risk of new biological invasions, so there is a continual need for programs aimed at controlling or even eradicating invasive species. These programs are expensive. For example, (as of 2009) controlling the spread of the gypsy moth cost as much as $12 million annually (Keller et al. 2009). But doing nothing can cost much more. Part of the difficulty of developing and maintaining programs to eradicate or control an invasive species is a lack of understanding of the invasion ecology. Because of the inherent danger of the spread of an invasive species, as well as the large time and distance scales at which an invasion may occur, traditional means of experimentation in the field may not be practical. In this regard, mathematical models have proven to be helpful. Lewis et al. (2016) recently stated that “mathematical modeling and computer simulations create a convenient virtual laboratory where the hypotheses can be tested and thus can provide a valuable supplement to field experiments.” Furthermore, they state that “mathematical modeling is nowadays regarded as good practice and has been proven to be an efficient and cost-effective approach.”

This chapter focuses on mathematical models of the growth and spatial spread of populations of invading species. Population growth models have been in use at least since the early thirteenth century when Leonardo of Pisa (1175–1250), more commonly known as Fibonacci, published Liber abaci (Book of Calculation) in which he considered practical applications of the decimal number system (Bacaër 2011). In his book, Fibonacci presented a computational exercise on the population growth of rabbits and deduced that the Fibonacci sequence (1, 1, 2, 3, 5, 8, . . . ) would predict the number of pairs of rabbits at the end of each month when “it is the nature of them in a single month to bear another pair and in the second month those born to bear also.” Johannes Kepler (1571–1630) recognized that the Fibonacci sequence could be expressed in the form \(n_{i+2} = n_i + n_{i+1}\), \(n_1 = 0\), \(n_2 = 1\), resulting in the earliest example of a difference equation to model population growth (Edelstein-Keshet 2005). Difference equations are now commonly used to describe the population dynamics of species that have non-overlapping generations or discrete life-cycle stages such as insects (Edelstein-Keshet 2005).

This chapter will mainly focus on continuous time models of population growth for organisms whose generations overlap. These models are often formulated as ordinary differential equations. In addition, the diffusion equation will be considered, which is a partial differential equation often used to describe the spatial spread of a species. In particular, the rate of spread of an invading species can be computed by mathematical analysis of the various forms of the diffusion equation. Knowing the spread rate has important implications for the control of a biological invasion. Indeed, one of the foremost mathematical ecologists of our time, Alan Hastings, has stated that understanding “how the rate of spread depends on the parameters of a simple, robust, model may help in the design of control measures which would slow the rate of spread” (Hastings 1996).
This chapter is divided into two main sections: “Population Growth Models” and “Dispersal by Diffusion.” The first section begins with a very simple example to illustrate exponential population growth in both the discrete and continuous time cases. This is followed by a description of the continuous time models that account for resource limited (or logistic) population growth and the Allee effect which models the failure of an invading population to establish when its initial size is too low. The second section develops the diffusion model to describe the spatial spread of a population and shows how this model can be used to explain the well known and empirically verified linear rate of spread of a population in time (Hastings 1996; Shigesada and Kawasaki 1997). In some cases, data sets have been obtained from the literature to show how the theoretical predictions of these models correctly describe the population dynamics of invasive species.

Population Growth Models

Once a non-native species is introduced, it must increase in numbers in order to survive and become established. The first population growth models were investigated by the scientists Leonhard Euler (1707–1783) in 1748 and Thomas Malthus (1766–1834) in 1798 (Bacaër 2011). Malthus wrote that “population when unchecked, increases in a geometrical ratio.” Equivalently, populations grow exponentially. To fix this idea, the table below shows a simple example. The population count is N on day t and there are initially 5 individuals.

t    0    1    2    3    4    5    6
N    5   10   20   40   80  160  320

The ratio of any two successive population counts is 2, which is the common ratio. That is, the population doubles every day. This sequence can be expressed as a recursion formula
\[ N_{t+1} = 2N_t, \qquad N_0 = 5. \tag{1} \]

Thus, given the number of individuals on any given day (\(N_t\)), the population on the following day (\(N_{t+1}\)) is twice as large. The initial population is \(N_0\). In order to find the population on the jth day, simply multiply 5 by j factors of the number 2. For example, the population on day 4 is \(N_4 = 5 \cdot 2 \cdot 2 \cdot 2 \cdot 2 = 5 \cdot 2^4 = 80\). The solution of (1) can be concisely written as
\[ N_t = 5 \cdot 2^t, \qquad t = 0, 1, 2, \ldots \tag{2} \]
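As a quick numerical check of (1) and (2), the following minimal Python sketch iterates the recursion and compares each value with the closed form. The function names `iterate_doubling` and `closed_form` are hypothetical and chosen only for this illustration.

```python
def iterate_doubling(n0, days):
    """Iterate the recursion (1), N_{t+1} = 2 * N_t, starting from N_0 = n0."""
    counts = [n0]
    for _ in range(days):
        counts.append(2 * counts[-1])
    return counts


def closed_form(n0, t):
    """Evaluate the closed-form solution (2), N_t = n0 * 2**t."""
    return n0 * 2**t


# Print the day, the count from the recursion, and the count from the closed form.
for day, count in enumerate(iterate_doubling(5, 6)):
    print(day, count, closed_form(5, day))
```

The two columns of counts should agree on every day, which is what it means for (2) to solve (1).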

Note that formula (2) is consistent with the initial condition: \(N_0 = 5 \cdot 2^0 = 5 \cdot 1 = 5\).

Malthus’ reasoning has been criticized because, for example, he predicted exponential population growth based on the growth in North America which (at the time) was largely due to immigration (Seidl and Tisdell 1999). However, Malthus’ theory of exponential growth forms the basis of many mathematical models and was influential in Charles Darwin’s formulation of the theory of evolution. Indeed, Darwin wrote in 1868 that “I saw, on reading Malthus on Population, that Natural Selection was the inevitable result of the rapid increase of all organic beings” (Seidl and Tisdell 1999).

In (1) it is assumed that time is a discrete variable so that t = 0, 1, 2, .... Discrete models are typically used for populations that have non-overlapping generations, which include semelparous organisms (e.g., salmon and agave plants) that reproduce once and then die (Ricklefs and Miller 1999). Many organisms can produce offspring continuously, in which case it is assumed births may occur at any time t. Let t take any (nonnegative) value and N(t) represent the population density at time t. An example of the units that may be used for N is the number of individuals per square kilometer. Consider any time t and some later time \(t^*\) and express the difference in these values as \(\Delta t = t - t^*\) and the corresponding difference in population density as \(\Delta N = N(t) - N(t^*)\). Let us assume the change in population density from t to \(t^*\) is proportional to both the population density at t and \(\Delta t\), so that \(\Delta N = rN(t)\Delta t\), or \(\Delta N/\Delta t = rN(t)\), where r is the constant of proportionality. If t and \(t^*\) are close, then \(\Delta t\) is small and \(\Delta N/\Delta t\) approximates the instantaneous rate of change of N with respect to t, which is a quantity from calculus known as the derivative and denoted dN/dt. Formally, the quantity \(\Delta N/\Delta t\) is replaced with the derivative dN/dt to get the equation
\[ \frac{dN}{dt} = rN(t). \tag{3} \]
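The passage from the difference quotient to the derivative can be illustrated numerically. The sketch below, written under stated assumptions, repeatedly applies the update N ← N + rNΔt with smaller and smaller time increments and compares the result with \(N_0 e^{rt}\), the standard solution of this differential equation. The values \(N_0 = 5\), \(r = \ln 2\), and the final time 6 are hypothetical choices that mirror the doubling table, and `step_population` is a name introduced only for this example.

```python
import math


def step_population(n0, r, t_end, steps):
    """Apply the update Delta N = r * N * Delta t repeatedly over [0, t_end]."""
    dt = t_end / steps
    n = n0
    for _ in range(steps):
        n += r * n * dt  # the change is proportional to N and to Delta t
    return n


# Hypothetical values: r = ln 2 corresponds to one doubling per unit of time.
n0, r, t_end = 5.0, math.log(2.0), 6.0
exact = n0 * math.exp(r * t_end)
for steps in (6, 60, 600, 6000):
    approx = step_population(n0, r, t_end, steps)
    print(steps, round(approx, 3), round(exact - approx, 3))
```

As the number of steps grows (so that Δt shrinks), the gap between the stepped population and the exponential value should shrink as well.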

The per capita growth rate r is a parameter in the model. The dimension of r is \((\text{time})^{-1}\). In mathematical models, parameters typically must be determined experimentally. In this chapter, parameters are considered to be constants that reflect the type of species being modeled. In general, parameters may depend on time as, for example, environmental conditions change. Given an initial population density, \(N(0) = N_0\), the solution of (3) is
\[ N(t) = N_0 e^{rt}, \tag{4} \]
which is the continuous time analog of (2). Here e is the base of the natural exponential function and e ≈ 2.718. Figure 1 shows two examples of invasive species data that exhibit exponential growth. Figure 1a shows data collected for a submersed aquatic weed, Hydrilla verticillata, which forms dense canopies at the water surface. These canopies raise surface water temperatures, change pH, exclude light, and consume oxygen, resulting in indigenous plant displacement and decreased sport fish populations (Beck et al. 2008). Figure 1b shows data collected for the Eurasian Collared Dove (Streptopelia decaocto), a bird native to India which in the early 1900s colonized much of Europe and northwest Africa within a short period of about 50 years (Fujisaki et al. 2010). (Note that for all of the graphs in this chapter that have been adapted from the literature, such as those in Fig. 1, the software Graph Grabber version 2.0 by Quintessa was used to collect data values.)

Exponential growth has been used to model many biological systems (e.g., tumors (Rodriguez-Brenes et al. 2013), tilapia (Santos et al. 2008), and turkeys (McJunkin et al. 2005)) and has been referred to as “the first law of population dynamics” (Turchin 2001). Exponential growth is typical of the early phase of growth when population levels are relatively low and environmental conditions are favorable.

Populations that grow exponentially continue to increase, but it is reasonable to assume there should be an upper bound to population growth as larger and larger numbers of individuals begin to deplete available resources. The idea of a self-limiting population can be traced back to Malthus, who surmised that the ability of humankind to produce food would increase arithmetically rather than geometrically. That is, on any given day the food supply on the following day is determined by adding a constant value (i.e., the common difference).
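Malthus' contrast between geometric and arithmetic growth can be made concrete with a small sketch. All numbers here are hypothetical: a population of 5 that doubles daily, and a food supply that initially feeds 100 individuals and gains enough for 50 more each day. The function name `first_shortfall_day` is likewise invented for the illustration.

```python
def first_shortfall_day(population, ratio, food, increment, max_days=365):
    """Return the first day on which the population exceeds the food supply.

    The population is multiplied by `ratio` each day (geometric growth), while
    the food supply, measured in individuals it can feed, gains `increment`
    each day (arithmetic growth). Returns None if no shortfall occurs within
    max_days.
    """
    for day in range(max_days + 1):
        if population > food:
            return day
        population *= ratio
        food += increment
    return None


# Hypothetical numbers: 5 individuals doubling daily, food for 100 plus 50 more per day.
print(first_shortfall_day(5, 2, 100, 50))
```

However generous the starting food supply and the daily increment, a doubling population eventually overtakes it; only the day on which this happens changes.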
He concluded that since exponential growth far outpaces linear growth, population levels will eventually surpass the amount of food available and there would be “a strong and constantly operating check on population from the difficulty of subsistence” (Bacaër 2011). Self-limited growth has been called the “second foundational principle” of population ecology (Turchin 2001).

Fig. 1 The points in graph (a) are data for the total area covered by Hydrilla verticillata in Gaston Lake, NC as measured for each of the years shown (adapted from Madsen and Owens (2000)). The points in graph (b) are data for the total number of Eurasian Collared Doves in the Netherlands as measured for each of the years shown (adapted from Hengeveld (1992)). The equations for each of the curves are exponential (4) with \(N_0 = 1.83\) and \(r = 0.617\) in (a) and \(N_0 = 17\) and \(r = 0.538\) in (b)

In order to formulate the mathematical model, population density will be the main factor that affects the limits to growth while other factors such as fluctuating environmental conditions will be ignored. In this case, a density-dependent growth rate is assumed and the constant r in (3) is replaced by a function of the population density, denoted f(N), which is the per capita growth rate. The general model is then
\[ \frac{dN}{dt} = N f(N). \tag{5} \]

It is assumed that given an initial population \(N_0\) there is exactly one solution of this equation and no two distinct solutions of (5) can be equal at a single value of t. The conditions on f(N) such that this holds can be found in any elementary textbook on differential equations such as Boyce and DiPrima (2012). The properties of f(N) that will limit the population density will now be examined. It is assumed for biological feasibility that N ≥ 0. Because the derivative dN/dt is the instantaneous time rate of change of N, if f(N) is negative it can be seen from (5) that the population is decreasing, whereas if f(N) is positive the population is increasing. This can be summarized as
\[ f(N) > 0 \Rightarrow N \text{ increases}, \qquad f(N) < 0 \Rightarrow N \text{ decreases}. \]

It is assumed that if there are two values of N, denoted \(N_1\) and \(N_2\), such that \(f(N_2) > 0\) and \(f(N_1) < 0\), there must be a value of N, denoted \(N^*\), between \(N_1\) and \(N_2\) such that \(f(N^*) = 0\). The value \(N^*\) is called an equilibrium solution of (5). For self-limiting population growth there will be an equilibrium \(N^*\) so that f(N) is positive when \(N < N^*\) and negative when \(N > N^*\). Ecologists refer to this equilibrium as the carrying capacity of the system, which will be denoted with the letter K. The carrying capacity may change with environmental conditions and may be both space and time dependent (Ricklefs and Miller 1999). In general, equilibrium solutions of (5) are constant values of N which make the right side of this equation zero. Note that \(N^* = 0\) is an equilibrium solution that represents population extinction. If there are non-zero equilibrium solutions, they must satisfy
\[ f(N^*) = 0. \tag{6} \]

Solving this equation depends on the specific form of the per capita growth rate function f(N) and (when possible) algebra can be used to find its solution. If (6) cannot be solved by hand, a computer or calculator can be used to solve the equation. If the initial population density is at equilibrium, \(N_0 = N^*\), then the solution of (5) is \(N(t) = N^*\) for all t. If \(N_0 \neq N^*\), the solution of (5) will change with time but will typically approach one of the equilibrium solutions. Thus the long term behavior of the model is determined by the equilibrium solutions. One of the earliest models that included a carrying capacity is the logistic equation first proposed by Pierre Verhulst (1804–1839) in 1838 (Bacaër 2011). This model has a per capita growth rate function
\[ f(N) = r\left(1 - \frac{N}{K}\right). \tag{7} \]

Here both r (growth rate) and K (carrying capacity) are parameters. When N is very small relative to K (i.e., N/K is very close to zero), it can be seen that f (N) ≈ r.

Fig. 2 The graph in (a) shows how the population increases according to a per capita logistic growth rate (7) when K = 200. The graph in (b) is the solution of (5) with per capita growth rate (7). The population grows to carrying capacity K for all positive initial populations. Here the growth rate is r = 0.1

It follows that for small population densities the logistic model is (approximately) equivalent to the exponential model (3). Note that f(K) = 0 and there are two equilibrium solutions given by \(N^* = 0\) (extinction) and \(N^* = K\) (carrying capacity). When the population density N is smaller than K, so that N/K < 1, it follows that f(N) > 0 and the population increases. It can be concluded that all positive initial populations \(N_0 < K\) will approach the carrying capacity K. Similarly, when N > K it follows that f(N) < 0 and the population decreases toward K. This is illustrated in Fig. 2a by the upward direction of the arrows when N < K = 200. Figure 2b shows a graph of N(t) approaching K = 200 as t → ∞. An explicit solution to the logistic equation can be found by the separation of variables technique and is shown in “Mathematics and Recurrent Population Outbreaks.”
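The approach to carrying capacity can be checked numerically. The following is a minimal sketch, not the method used to produce Fig. 2: it integrates (5) with the logistic rate (7) using classical fourth-order Runge-Kutta steps (a technique chosen here for convenience) with the caption values r = 0.1 and K = 200. The initial populations 5, 100, and 300 and all function names are hypothetical. The closed form in `logistic_exact` is the standard logistic solution and is included only as a cross-check.

```python
import math


def logistic_rate(n, r=0.1, k=200.0):
    """Right side of (5) with the logistic per capita growth rate (7)."""
    return n * r * (1.0 - n / k)


def integrate(rate, n0, t_end, dt=0.05):
    """Advance dN/dt = rate(N) from t = 0 to t_end with fourth-order Runge-Kutta steps."""
    n = n0
    for _ in range(int(round(t_end / dt))):
        k1 = rate(n)
        k2 = rate(n + 0.5 * dt * k1)
        k3 = rate(n + 0.5 * dt * k2)
        k4 = rate(n + dt * k3)
        n += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return n


def logistic_exact(n0, t, r=0.1, k=200.0):
    """Standard closed-form logistic solution, used here only as a cross-check."""
    growth = math.exp(r * t)
    return k * n0 * growth / (k + n0 * (growth - 1.0))


# Hypothetical initial populations below and above the carrying capacity.
for n0 in (5.0, 100.0, 300.0):
    print(n0, [round(integrate(logistic_rate, n0, t), 2) for t in (0, 50, 100, 250)])
```

Each printed row should drift toward 200, from below for the first two initial populations and from above for the third.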


It is predicted from the logistic model that every invading species will survive and grow to its carrying capacity. But as discussed in the introduction, most invading plant species and about half of all invading animal species fail to survive. The Allee effect, named after the ecologist Warder C. Allee (1885–1955), is often cited as one reason an invading population fails to establish and grow (Taylor and Hastings 2005). The Allee effect “requires that some measurable component of the fitness of an organism (e.g., probability of dying or reproducing) is higher in a large population” (Stephens et al. 1999). Conversely, “an individual of a species that is subject to an Allee effect will suffer a decrease in some aspect of its fitness when conspecific density is low” (Taylor and Hastings 2005). For example, per capita growth rate is reduced when an invading species struggles to find mates (Lockwood et al. 2007) at low population densities. Although mate limitation is the most commonly cited cause of the Allee effect (Kramer et al. 2009; Taylor and Hastings 2005), other mechanisms include cooperative defense, predator satiation, cooperative feeding, dispersal, and habitat alteration (Kramer et al. 2009).

A simple way to model the Allee effect in (5) is to create a per capita growth function f(N) that is negative for values of N that are smaller than some prescribed value known as a threshold population (e.g., see Lewis et al. 2016). These models are said to have a strong Allee effect (Taylor and Hastings 2005). Henceforth, the word strong will be omitted when referring to the Allee effect. To fix the idea, let
\[ f(N) = \beta(\alpha - N)(N - K), \tag{8} \]

where β and α are positive parameters and 0 < α < K. In this case, there are three equilibrium solutions of (5) given by \(N^* = 0\), \(N^* = \alpha\), and \(N^* = K\). The parameter α controls the threshold population. Figure 3a illustrates how f(N) < 0 when N < α = 25 and f(N) > 0 when α < N < K = 200. An initial population density less than α will become extinct. If the initial population is greater than α, the population will grow to its carrying capacity K. Figure 3b shows graphs of solutions of (5) with (8) where one population fails to establish (dashed) while the other grows to its carrying capacity (solid).

In addition to deterministic models, Allee effects have been studied extensively in stochastic models. A stochastic model accounts for random fluctuations in population densities due to either demographic stochasticity (e.g., random fluctuations in birth and death rates) or environmental stochasticity (e.g., random extreme weather events). In Brassil (2001), stochasticity is included in a discrete model of the form
\[ N_{t+1} = N_t + (1 - N_t)(N_t - L) + N_t E. \tag{9} \]

Here \(f(N_t) = (1 - N_t)(N_t - L)\) so that L is the Allee threshold population. Demographic stochasticity can be measured using the positive terms in f for birth rate and the negative terms for death rate. The parameter E controls environmental stochasticity. Here E is a normally distributed random variable with zero mean.

Fig. 3 The graph in (a) shows how the population increases or decreases according to a per capita growth rate with an Allee effect (8) when α = 25 and K = 200. The graph in (b) is the solution of (5) with per capita growth rate (8). The population grows to carrying capacity K (solid curve) when the initial population \(N_0 > \alpha\) and becomes extinct when \(N_0 < \alpha\) (dashed curve). Here the value of β is \(5 \times 10^{-6}\)

In Dennis (2002), a stochastic model is presented as a diffusion process which describes the change in population size at time t as
\[ dN_t = m(N_t)\,dt + \sqrt{v(N_t)}\,dW_t, \tag{10} \]
where dt is the time increment, m is the infinitesimal mean function (growth dynamics), v is the infinitesimal variance function (stochastic noise), and \(dW_t\) has a normal distribution with mean 0 and variance dt (Dennis 2002). The Allee dynamics can be described through the proper choice of the function m, while either demographic or environmental stochasticity (or both) is controlled by the function v.

Repeated simulations of stochastic models yield various trajectories (or realizations) that reflect a probability distribution for the population size at time t. Unlike deterministic models, which predict populations will remain at carrying capacity for all time (assuming the initial population is large enough), stochastic models predict that populations will go extinct if a simulation is performed for a large enough amount of time (Taylor and Hastings 2005). Computations and probability theory can be applied to make predictions about the population dynamics. For example, in Brassil (2001) it is found that increasing the threshold population parameter L in Eq. (9) tends to decrease the mean time to extinction. In Dennis (2002), the probability of extinction is determined from Eq. (10) as a “first passage” probability, which is the probability a population reaches a smaller value a before reaching a larger value b. When considered as a function of initial population size, this probability has an inflection point at the Allee population threshold.
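The threshold behavior of the deterministic Allee model, (5) with per capita growth rate (8), can be explored with a short numerical sketch. It uses the parameter values quoted for Fig. 3 (α = 25, K = 200, β = 5 × 10⁻⁶) and fourth-order Runge-Kutta steps, a technique chosen here for convenience. The initial populations 20 and 30, the time horizon, and the function names are hypothetical choices for illustration, not values taken from the figure.

```python
def allee_rate(n, beta=5e-6, alpha=25.0, k=200.0):
    """Right side of (5) with the Allee per capita growth rate (8)."""
    return n * beta * (alpha - n) * (n - k)


def integrate(rate, n0, t_end, dt=0.1):
    """Advance dN/dt = rate(N) from t = 0 to t_end with fourth-order Runge-Kutta steps."""
    n = n0
    for _ in range(int(round(t_end / dt))):
        k1 = rate(n)
        k2 = rate(n + 0.5 * dt * k1)
        k3 = rate(n + 0.5 * dt * k2)
        k4 = rate(n + dt * k3)
        n += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return n


# Hypothetical initial populations just below and just above the threshold alpha = 25.
for n0 in (20.0, 30.0):
    print(n0, [round(integrate(allee_rate, n0, t), 2) for t in (0, 100, 200, 350, 500)])
```

The population that starts below the threshold should decay toward zero, while the one that starts above it should rise to the carrying capacity. A population placed exactly at α stays there, since α is an equilibrium.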

Dispersal by Diffusion

Equation (5) is an ordinary differential equation and is derived by assuming that at any time the population density of an invading species is the same at every spatial location. When an invading species disperses to surrounding regions, its population density may vary with space as well as time. Consider one dimension and let x denote the spatial coordinate and N(x, t) denote the population density. The diffusion equation is a classic model of animal dispersal given by
\[ \frac{\partial N}{\partial t} = D \frac{\partial^2 N}{\partial x^2}. \tag{11} \]

Here the ordinary time derivative, dN/dt, has been replaced with the partial time derivative, ∂N/∂t. On the right hand side, there is the second partial derivative of N with respect to x. The constant D is called the diffusion coefficient. In their classic book Diffusion and Ecological Problems, Okubo and Levin (2001) state that “when the microscopic irregular motion of each particle gives rise to a regularity of motion of the total particle group (macroscopic regularity), the phenomenon of diffusion arises.” In fact, the diffusion equation can be derived (formally) by assuming organisms move in a random manner. The following derivation can be found in Edelstein-Keshet (2005).

Let \(\tilde{n}(x, t)\) denote the number of organisms at position x and at time t. It is assumed here that time and space are discretized and each time and space interval is constant, where \(\Delta t\) denotes the time increment and \(\Delta x\) denotes the space increment. Thus, if t is the current time and x is the current position, then at time \(t + \Delta t\) an organism may stay at its current position, move to its left a distance \(\Delta x\), or move to its right a distance \(\Delta x\). Then at \(t + \Delta t\) the total number of organisms at position x is given by
\[ \tilde{n}(x, t + \Delta t) = \tilde{n}(x, t) + \lambda_r \tilde{n}(x - \Delta x, t) - \lambda_r \tilde{n}(x, t) + \lambda_l \tilde{n}(x + \Delta x, t) - \lambda_l \tilde{n}(x, t), \tag{12} \]

where \(\lambda_r\) and \(\lambda_l\) are the probabilities (assumed constant) of moving right and left, respectively. Each term on the right represents a possible contribution to the number of organisms at position x at time \(t + \Delta t\). For example, some organisms that were just to the left of x (that is, at position \(x - \Delta x\)) will move with probability \(\lambda_r\) to their right a distance \(\Delta x\) and arrive at position x, thereby increasing the number of organisms at x. On the other hand, organisms that are at position x may move with probability \(\lambda_r\) to their right a distance \(\Delta x\) and arrive at position \(x + \Delta x\), thereby decreasing the number of organisms at x.

To derive the diffusion equation then requires some arguments from calculus. Taylor series expansions yield expressions for \(\tilde{n}\) in terms of its derivatives:
\[ \tilde{n}(x, t + \Delta t) = \tilde{n}(x, t) + \frac{\partial \tilde{n}}{\partial t}\Delta t + \frac{1}{2}\frac{\partial^2 \tilde{n}}{\partial t^2}(\Delta t)^2 + \mathrm{HOT}_t \tag{13} \]
\[ \tilde{n}(x \pm \Delta x, t) = \tilde{n}(x, t) \pm \frac{\partial \tilde{n}}{\partial x}\Delta x + \frac{1}{2}\frac{\partial^2 \tilde{n}}{\partial x^2}(\Delta x)^2 \pm \mathrm{HOT}_x \tag{14} \]
Here \(\mathrm{HOT}_t\) stands for higher order terms in \(\Delta t\) and represents terms in the expansion which are all proportional to \((\Delta t)^k\), k ≥ 3, and \(\mathrm{HOT}_x\) stands for higher order terms in \(\Delta x\) and represents terms in the expansion which are all proportional to \((\Delta x)^k\), k ≥ 3. Now assume \(\lambda_r = \lambda_l = 1/2\) (organisms are equally likely to move left or right), substitute (13) and (14) into (12), divide both sides by \(\Delta t\), and simplify to get
\[ \frac{\partial \tilde{n}}{\partial t} + \frac{1}{2}\frac{\partial^2 \tilde{n}}{\partial t^2}\Delta t + \frac{1}{\Delta t}\cdot \mathrm{HOT}_t = \frac{(\Delta x)^2}{2\Delta t}\frac{\partial^2 \tilde{n}}{\partial x^2} \pm \frac{1}{\Delta t}\cdot \mathrm{HOT}_x. \]
Note that \((1/\Delta t)\cdot \mathrm{HOT}_t\) has terms that are proportional to \((\Delta t)^k\), k ≥ 2. When both \(\Delta t \to 0\) and \(\Delta x \to 0\) in such a way that \((\Delta x)^2/(2\Delta t)\) is constant, the terms that still contain \(\Delta t\) or \(\Delta x\) vanish and this yields the equation
\[ \frac{\partial \tilde{n}}{\partial t} = \frac{(\Delta x)^2}{2\Delta t}\cdot\frac{\partial^2 \tilde{n}}{\partial x^2} \tag{15} \]

which is the diffusion equation for \(\tilde{n}\). It can also be seen by comparing (11) and (15) that D is equivalent to \((\Delta x)^2/(2\Delta t)\). Another standard derivation of (11) uses conservation of mass and Fick’s Law, which assumes that populations will move from areas of high concentration to areas of low concentration (e.g., see Edelstein-Keshet (2005), Kot (2001)). Although these appear to be overly simplistic assumptions about how organisms move, the diffusion equation has been used to accurately describe movement of some organisms (Andow et al. 1990).

The diffusion coefficient D is a parameter. Its value depends on the amount of time that it takes the invading species to disperse from a release point. The dimensions of D are \((\text{length})^2(\text{time})^{-1}\). A biologist can estimate D using a mark-recapture experiment. In this experiment, a random sample of individuals within the target population is caught, marked (e.g., tagged or painted), and then released. At some time later, a random sample of the population is again taken and D is estimated according to the formula \(D = \mathrm{MSD}/(\pi t)\), where MSD is the mean squared distance or the square of the average distance (averaged over all recaptured animal distances) and t is time from release (Shigesada and Kawasaki 1997).

The diffusion equation (11) is a linear equation and it can be solved explicitly. In order to discuss a more general case, consider the two-dimensional model for N(x, y, t), which includes a second spatial coordinate y, where \(D\,\partial^2 N/\partial y^2\) is added to the right side of (11). Assume the release point is located at the coordinates (x, y) = (0, 0) and the initial population is some positive number at (0, 0) and zero everywhere else, which is an idealization of a concentrated release over a small area. In this case, it can be shown that the solution of the two-dimensional form of (11) is given by (Shigesada and Kawasaki 1997)
\[ N(x, y, t) = \frac{N_0}{4\pi D t}\exp\left(-\frac{x^2 + y^2}{4Dt}\right). \tag{16} \]
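The link between the random walk, the value \(D = (\Delta x)^2/(2\Delta t)\), and the mark-recapture formula can be illustrated with a small simulation. The sketch below rests on assumptions that are not part of the chapter: every organism starts at the release point and, at each time step, moves a distance Δx left or right with probability 1/2 in x and, independently, in y. For a density of the form (16) the average distance from the release point is \(\sqrt{\pi D t}\), so the sketch reads MSD as the square of the average recapture distance. The number of walkers, the step sizes, the random seed, and the function names are all hypothetical.

```python
import math
import random


def simulate_release(walkers, steps, dx, rng):
    """Release `walkers` organisms at the origin and return their final distances.

    At each time step every organism moves dx left or right (probability 1/2
    each) in x and, independently, in y.
    """
    distances = []
    for _ in range(walkers):
        x = y = 0.0
        for _ in range(steps):
            x += dx if rng.random() < 0.5 else -dx
            y += dx if rng.random() < 0.5 else -dx
        distances.append(math.hypot(x, y))
    return distances


def estimate_diffusion(distances, elapsed):
    """Mark-recapture style estimate D = (average distance)**2 / (pi * t)."""
    mean_distance = sum(distances) / len(distances)
    return mean_distance**2 / (math.pi * elapsed)


dx, dt, steps = 1.0, 1.0, 200   # hypothetical space step, time step, and duration
rng = random.Random(0)          # fixed seed so the run is repeatable
distances = simulate_release(2000, steps, dx, rng)
d_walk = dx**2 / (2.0 * dt)     # value of D implied by the random walk derivation
d_est = estimate_diffusion(distances, steps * dt)
print(d_walk, round(d_est, 3))
```

The estimate from the simulated recapture distances should land close to, though not exactly on, the value implied by the step sizes, since it is computed from a finite random sample.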

Here the expression “exp” denotes the exponential function with base e, with the power in parentheses. The value of x² + y² is the square of the distance from the release point to the position (x, y), and N0 denotes the initial number of invading organisms at the release point. Note that (16) is not defined for t = 0, but it can be shown that as t → 0, the function N(x, y, t) has properties that mimic a point release of the organism at the origin (Logan 2004). In the two-dimensional case, Eq. (11) has solution (16), which is a model of dispersal only. The formula (16) predicts that the invasion spreads out concentrically from its release point, but due to the inverse relationship with t, the population density N approaches zero as t increases and the invading species will become extinct. In order to model an invading species that survives the invasion, a term that accounts for population growth is included in the diffusion model. Using the same form as the right side of (5) gives the reaction-diffusion equation

\[ \frac{\partial N}{\partial t} = D \left( \frac{\partial^2 N}{\partial x^2} + \frac{\partial^2 N}{\partial y^2} \right) + N f(N). \tag{17} \]
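As a numerical sanity check on the pure dispersal solution (16), the sketch below integrates the density over the plane in polar coordinates and recovers D in two ways: from the square of the mean distance, which for this solution equals πDt, and from the mean of the squared distances, which equals 4Dt. The function names and the parameter values (D = 1, N0 = 20, t = 5) are illustrative assumptions and are not taken from the chapter's data.

```python
import math


def diffusion_density(x, y, t, n0, d):
    """Eq. (16): density at (x, y) at time t > 0 after a point release of n0 animals."""
    return n0 / (4.0 * math.pi * d * t) * math.exp(-(x * x + y * y) / (4.0 * d * t))


def radial_moments(t, n0, d, steps=20000):
    """Midpoint-rule integrals over the plane, using rings of radius r.

    Returns (total population, mean distance, mean squared distance).
    """
    r_max = 12.0 * math.sqrt(d * t)  # the density is negligible beyond this radius
    dr = r_max / steps
    total = first = second = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        ring = diffusion_density(r, 0.0, t, n0, d) * 2.0 * math.pi * r * dr
        total += ring
        first += r * ring
        second += r * r * ring
    return total, first / total, second / total


T_OBS = 5.0  # illustrative observation time
total, mean_r, mean_r2 = radial_moments(t=T_OBS, n0=20.0, d=1.0)
d_from_mean = mean_r ** 2 / (math.pi * T_OBS)  # square of the mean distance
d_from_msd = mean_r2 / (4.0 * T_OBS)           # mean of the squared distances
print(total, d_from_mean, d_from_msd)
```

Both estimates should return the value of D that was put in, and the total population stays at N0, which reflects the fact that diffusion alone redistributes the population without changing its size.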

The reaction-diffusion equation is routinely used to predict the population density of animals that disperse and grow, including insects (Karieva 1983), worms (Anderson et al. 2007), and fish (Bertignac et al. 1998). For the remainder of this chapter, the reaction-diffusion model will be our main focus. Solutions of (17) will now be examined using different growth rate functions including exponential, logistic, and those with an Allee effect. First, consider the exponential model which has per capita growth rate f (N) = r > 0. The solution of (17) is then given by (Shigesada and Kawasaki 1997)

84 Mathematical Models Can Predict the Spread of an Invasive Species

Fig. 4 Plot of the population density function (18) for t = 25 in (a) three dimensions and (b) two dimensions as a cross-section in the xN-plane for y = 0. Here the values of the parameters are D = 1, N0 = 20, and r = 0.3.

\[ N(x, y, t) = \frac{N_0}{4\pi D t} \exp\left( rt - \frac{x^2 + y^2}{4Dt} \right), \tag{18} \]

which is simply (16) with the extra term rt in the exponent to account for growth. Figure 4a shows a plot of this function at t = 25. The function is rotationally symmetric about the release point at (0, 0) in the center of the graph. Figure 4b shows the cross-section of the solution in the xN-plane at y = 0. In practice, there is often a minimum population density, or threshold density, that is necessary in order to detect the invasion (Shigesada and Kawasaki 1997). As an example, suppose the detection threshold is ten. If the population density is less than ten, there is a very low probability that observers in the field will locate the invading species, and the invasion will go undetected. In Fig. 4 the population


J. G. Alford

Fig. 5 Plots of the population density function (18) for only those values of N ≥ 10, in four panels for t = 25, 30, 35, and 40 (horizontal axis x and vertical axis y, each from −50 to 50). The bar on the right of each graph shows the color scale corresponding to the values of N. The value along the edge of each disk is the (assumed) minimum population density of N = 10 at which the invading species may be detected. The invasion front expands as t increases. The values of r, D, and N0 are the same as those in Fig. 4.

density is 10 at a radius of approximately 16, and any observer farther than 16 from (x, y) = (0, 0) will not detect the invasion. The radius at which the population density is exactly ten is called the invasion front, and this front expands outward in time. Figure 5 shows how the front expands over four values of t between 25 and 40. Invading organisms that survive may not immediately disperse and grow. There may be a time lag of several years before the species establishes its population level and begins to disperse (Lewis et al. 2016). Figure 6 shows cross-sections of the solution (18) (just as in Fig. 4b) at four different times. Here it can be seen that the initial population density of 20 has decreased and at t = 5 is less than 2. For t > 5 the population density increases and reaches the detection threshold of 10 between t = 15 and t = 16. Shigesada and Kawasaki (1997) show how (18) can be used to quantify the establishment period of the invasion. To derive the formula, it is first necessary to

Fig. 6 Plot of the population density function (18) at the indicated times in the xN-plane when y = 0. The population density has fallen from its initial density of 20 to below the detection threshold of N = 10. After about 5 time units, the population density disperses away from its release point and increases until it reaches the detection threshold at some time between t = 15 and t = 16.

assume that solutions have radial symmetry and substitute the radial coordinate R in (18), where R² = x² + y². Let R* denote the radius at which the population density is at its detection threshold, denoted n*, and solve (18) for R* to get

\[ R^* = 2\sqrt{rD}\; t \left[ 1 + \frac{1}{rt} \ln\left( \frac{N_0}{4\pi D t\, n^*} \right) \right]^{1/2}. \tag{19} \]
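The numbers quoted for Figs. 4 to 6 can be reproduced directly from (18) and (19). The following sketch uses the parameter values stated in the captions (D = 1, N0 = 20, r = 0.3) and the detection threshold n* = 10; the function names are illustrative, and returning None while no front exists is a convention chosen for this example.

```python
import math


def density_with_growth(x, y, t, n0, d, r):
    """Eq. (18): dispersal solution with exponential growth at rate r."""
    return n0 / (4.0 * math.pi * d * t) * math.exp(r * t - (x * x + y * y) / (4.0 * d * t))


def front_radius(t, n0, d, r, n_star):
    """Eq. (19): radius at which the density equals the detection threshold n_star.

    Returns None when the density is below n_star everywhere, that is,
    when the bracket in (19) is negative.
    """
    bracket = 1.0 + math.log(n0 / (4.0 * math.pi * d * t * n_star)) / (r * t)
    if bracket < 0.0:
        return None
    return 2.0 * math.sqrt(r * d) * t * math.sqrt(bracket)


D, N0, R_GROWTH, N_STAR = 1.0, 20.0, 0.3, 10.0  # values from Figs. 4-6

radius_25 = front_radius(25.0, N0, D, R_GROWTH, N_STAR)
print("front radius at t = 25:", radius_25)
print("front at t = 15:", front_radius(15.0, N0, D, R_GROWTH, N_STAR))
print("front at t = 16:", front_radius(16.0, N0, D, R_GROWTH, N_STAR))
```

With these values the front radius at t = 25 comes out a little under 16, no front exists at t = 15, and one exists at t = 16, in line with the description of Figs. 4 and 6.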

After scaling R* and t, this equation can be written in the form

\[ \rho = 2T \left[ 1 + \frac{1}{T} \ln\left( \frac{\gamma}{4\pi T} \right) \right]^{1/2}, \tag{20} \]

where γ = rN0/(Dn*), ρ = √(r/D) R*, and T = rt. Note that ρ and T are dimensionless, since √(r/D) has dimensions of 1/distance and r has dimensions of 1/time. These formulas show how the threshold radius depends on time. From (20), it can be determined that γ controls the existence and duration of the establishment phase. In order for (20) to be defined for all values of T > 0, the expression inside the square root must be nonnegative. Since T − ln T takes its minimum value 1 at T = 1, simple calculus shows that this requires γ ≥ 4π/e ≈ 4.6. In this case, there will be no establishment phase and the invading species will immediately grow and disperse after it is released. Conversely, if γ < 4π/e, then there will be a time lag before the invading species can disperse and be detected, and the smaller γ is, the longer the time lag. Note that


γ is directly proportional to r and inversely proportional to D. Thus the model predicts that decreasing the growth rate or increasing the diffusion coefficient lengthens the establishment period. In the example depicted in Figs. 5 and 6, γ = 0.6. Inspection of (19) shows that as t → ∞ the threshold radius approaches R* = 2√(rD) t. Similarly, in 1951 J.G. Skellam (1914–1979) used (17) to show that the square root of the invaded area (which is proportional to the threshold radius) increases linearly with time, and derived a formula (10) for the asymptotic spread rate of an invasion:

\[ \sqrt{\text{area}} = 2\sqrt{rD}\; t. \tag{21} \]
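Two statements about the scaled front (20) lend themselves to a quick numerical check: the critical value γ = 4π/e that separates invasions with and without an establishment phase, and the approach of the front speed to 2√(rD), which in scaled variables means ρ/T → 2. The sketch below is one way to do this. The bisection routine and all names are illustrative assumptions; the value γ = 0.6 and the growth rate r = 0.3 are those of the example in Figs. 5 and 6.

```python
import math

GAMMA_CRIT = 4.0 * math.pi / math.e  # about 4.62


def scaled_front(T, gamma):
    """Eq. (20): scaled front radius rho at scaled time T, or None if no front exists."""
    bracket = 1.0 + math.log(gamma / (4.0 * math.pi * T)) / T
    return 2.0 * T * math.sqrt(bracket) if bracket >= 0.0 else None


def establishment_end(gamma, T_hi=1.0e3):
    """Scaled time after which the front exists for good (0.0 if there is no lag).

    The bracket in (20) is nonnegative exactly when
    g(T) = T - ln T + ln(gamma / (4 pi)) >= 0, and g is increasing for T > 1,
    so the last sign change can be located by bisection on [1, T_hi].
    """
    if gamma >= GAMMA_CRIT:
        return 0.0

    def g(T):
        return T - math.log(T) + math.log(gamma / (4.0 * math.pi))

    lo, hi = 1.0, T_hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return hi


T_end = establishment_end(0.6)
t_end = T_end / 0.3                              # back to unscaled time, t = T / r
speed_ratio = scaled_front(1.0e4, 0.6) / 1.0e4   # tends to 2 as T grows
print(GAMMA_CRIT, t_end, speed_ratio)
```

For γ = 0.6 the establishment phase ends at an unscaled time between 15 and 16, matching Fig. 6, and the ratio ρ/T is already within a fraction of a percent of 2 at large T.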

The plot of √area versus t is a straight line with slope c = 2√(rD), which is the speed of the invasion. Note that the dimensions of r are (time)⁻¹ and the dimensions of D are (length)² (time)⁻¹, so the dimensions of c are (length) (time)⁻¹, which are the correct dimensions for speed. Skellam used previously published data for the invasion of a non-native species of muskrat (Ondatra zibethica L.) in central Europe to verify his prediction (21) (Edelstein-Keshet 2005). It is believed that in 1905 a landowner in Czechoslovakia allowed two male and three female muskrats to be released onto his estate (Becker 1972). After they were released, their numbers increased rapidly as they spread out into the surrounding regions. Although the muskrat is valued for its fur, it can do great damage to the embankments of rivers and lakes, which then threatens the surrounding communities (Becker 1972). Skellam’s application of the diffusion equation to invasion ecology was foundational, and linear spread rates have been observed in biological invasions. Two examples of (21) are shown in Fig. 7, one using data from a woody weed invader in Australian wetland ecosystems and the other using data from the gypsy moth.

Skellam used exponential growth in the reaction diffusion Eq. (17) to predict a linear rate of spread. Exponential growth yields the exploding population sizes which can be observed in Fig. 5. In 1937 Ronald Fisher (1890–1962) modeled the spatial propagation of a favorable gene in a population using the one-dimensional version of (17) with logistic growth (7) (Bacaër 2011) (see also Coevolution of Mathematics, Statistics, and Genetics). Since this is a model of self-limiting growth, it can be expected that the solutions will be bounded (i.e., limited by a carrying capacity). Fisher was able to show that there are bounded solutions even though (17) with (7) is nonlinear and there is no known (as yet) formula for its solution (Okubo and Levin 2001).
Furthermore, Fisher showed the existence of traveling wave solutions with wavefronts which advance at a constant speed while maintaining a constant shape. These solutions are particularly relevant to invasive species modeling, as they describe how the invading species disperses from its release point and grows to its carrying capacity. To understand how a traveling wave solution of (17) is determined, first express the equation in one dimension; that is, consider N to be a function of x and t only and neglect y in (17). A traveling wave solution will have the special form N(x, t) = w(x − ct), where w is an unspecified function of one variable and c is

Fig. 7 These graphs show the linear spread rates predicted by (21), plotted against year (1979 to 1987 in (a), 1979 to 1989 in (b)). The plot in (a) is from the woody weed (Mimosa pigra L.) in Australian wetland ecosystems (adapted from Lonsdale (1993)) and the plot in (b) is from the gypsy moth (Lymantria dispar) in the Lower Peninsula of Michigan (adapted from Tobin et al. (2015)). The slope of the line which best fits the data is 279 m/year in (a) and 17.1 km/year in (b); these slopes provide estimates of the rate of spread of the invasion.


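an unknown positive constant which represents the wavespeed (Lewis et al. 2016). When this form is substituted into the reaction-diffusion equation, the result is an ordinary differential equation of the form

\[ \frac{d^2 w}{dt^2} + c\,\frac{dw}{dt} + w f(w) = 0, \tag{22} \]

where the independent variable, written t, now stands for the wave coordinate x − ct.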
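The substitution step can be written out explicitly. The sketch below uses z = x − ct for the wave coordinate and keeps the diffusion coefficient D visible; setting D = 1, or rescaling the wave coordinate, gives the form shown in (22). This is a standard chain-rule computation offered as an illustration, not a quotation from the cited sources.

```latex
\begin{align*}
N(x,t) &= w(z), \qquad z = x - ct, \\
\frac{\partial N}{\partial t} &= -c\,\frac{dw}{dz}, \qquad
\frac{\partial^2 N}{\partial x^2} = \frac{d^2 w}{dz^2}, \\
-c\,\frac{dw}{dz} &= D\,\frac{d^2 w}{dz^2} + w f(w)
\quad\Longrightarrow\quad
D\,\frac{d^2 w}{dz^2} + c\,\frac{dw}{dz} + w f(w) = 0 .
\end{align*}
```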

Now suppose the growth rate function is given by logistic growth (7). Clearly the equilibrium solutions of (7) and (22) are the same and are given by w* = 0 and w* = K. It is easily verified that (22) can be written as a system of two equations in the form

\[ \frac{dw}{dt} = v, \qquad \frac{dv}{dt} = -c\,\frac{dw}{dt} - w f(w) \tag{23} \]

by recognizing that dv/dt = d²w/dt² and then moving the last two terms of (22) to the right side. Plots of the solutions of (23) in the wv-plane are known as phase-plane plots or trajectories. These curves are parameterized by t; that is, for each t there is a corresponding point (w, v) in the phase plane that solves (23). Equilibrium solutions are constant solutions of (22) and are therefore plotted as points in the phase plane. A traveling wave solution that satisfies w(−∞) = K and w(∞) = 0 is a trajectory that connects the two equilibria (K, 0) and (0, 0) in the phase plane. Clearly, the value of the wave speed c will affect trajectories in the phase plane, and the analysis of (22) must involve a determination of c. It has been shown (Lewis et al. 2016) that if c ≥ 2√(rD), a traveling wave solution exists that satisfies these conditions and therefore N(−∞, t) = K and N(∞, t) = 0. These conditions imply that N approaches the carrying capacity K as x approaches −∞ and approaches zero as x approaches ∞. This corresponds to a right-moving wavefront which travels at some speed c ≥ 2√(rD). The existence of a traveling wave solution when the growth dynamics include the Allee effect as in (8) has also been proven. In this case, there is a unique value of c for which a traveling wave solution connects (0, 0) and (K, 0) in the phase plane (Volpert and Petrovskii 2009).

In general, the solution of the reaction diffusion equation depends on the initial condition, and it is essential to know which initial conditions yield traveling wave solutions. For an invading species, it is reasonable to consider an initial population density that is positive on an interval −L < x < L (where L is some positive value) and zero outside this interval. In this case, it has been shown (Lewis et al. 2016) that the solution will evolve into two traveling wave solutions, one which propagates to the left and the other to the right. Each of these traveling waves will have speed c = 2√(rD). Note that the wavespeed obtained for density-dependent growth, where the per capita growth rate depends on the population density (e.g., logistic growth), is the same as for exponential growth, where the per capita growth rate is constant (see Eq. (21)). As explained by Alan Hastings (Hastings 1996), the intuitive reason for this may be that “the rate of spread is determined by dynamics ‘near the front’, and density dependence is unimportant there because population levels are low.” Figure 8 shows an example. Ahead of each

Fig. 8 Traveling wave solutions of (17) restricted to one dimension with logistic growth (7), plotted for −200 ≤ x ≤ 200. The initial population density is N = 30 on the interval −20 ≤ x ≤ 20 and zero otherwise. The solution evolves into two traveling wavefronts which move to the right and to the left at speed c = 2√(rD). Here the parameter values are D = 1, r = 0.1, and K = 200, just as in Fig. 2.

wavefront, which is a steep transition from carrying capacity to zero, the invasion has yet to arrive. After some amount of time (depending on the wavespeed), the population density of the invading species at any point x will grow to the carrying capacity K as the wave passes. In fact, an asymptotic linear rate of spread that is proportional to the per capita growth rate (when the invading species is rare) is robust (Hastings 1996). That is, this prediction holds for various types of modeling scenarios such as two-species models and age-structured models. Traveling wave solutions also exist for the reaction diffusion Eq. (17) with the per capita growth rate function (8) that has an Allee effect. Recall that these growth dynamics include a threshold population, and the parameter α specifies the minimum initial population necessary for survival as depicted in (3). A similar result holds for the reaction diffusion equation with Allee growth dynamics. For example, when the initial population density is positive on an interval −L < x < L and zero outside this interval, a traveling wave can only occur under certain conditions (19). These conditions include that L needs to be sufficiently large and that the population density must be larger than the threshold population α on some part of the interval −L ≤ x ≤ L. If it is assumed that K > 2α (that is, the carrying capacity is more than twice the threshold population), a traveling wave solution will have speed (Lewis et al. 2016)

\[ c = \sqrt{\frac{\beta D}{2}}\; (K - 2\alpha). \tag{24} \]

Figure 9 shows an example of how an Allee effect may inhibit a traveling wave solution to the reaction diffusion equation.
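The front speed quoted earlier for logistic growth can be reproduced with a small explicit finite-difference simulation. The sketch below assumes the logistic per capita growth rate f(N) = r(1 − N/K) for Eq. (7), which is not restated in this part of the chapter, and uses the parameter values and initial condition of Fig. 8. The grid spacing, time step, zero-flux boundaries, and the choice of N = K/2 as the front marker are illustrative assumptions. Because the front approaches its asymptotic speed slowly, the measured value is expected to lie somewhat below 2√(rD) ≈ 0.63.

```python
import math


def right_front(xs, n, level):
    """Rightmost position where the density drops through `level` (linear interpolation)."""
    for i in range(len(xs) - 1, 0, -1):
        if n[i - 1] >= level > n[i]:
            frac = (n[i - 1] - level) / (n[i - 1] - n[i])
            return xs[i - 1] + frac * (xs[i] - xs[i - 1])
    return None


def simulate_logistic_front(d=1.0, r=0.1, k=200.0, half_width=200.0, dx=1.0,
                            dt=0.2, n_init=30.0, l_init=20.0, record=(100.0, 200.0)):
    """Explicit Euler / central-difference scheme for N_t = D N_xx + r N (1 - N/K).

    Returns a dict mapping each recorded time to the right front position.
    """
    lam = d * dt / dx ** 2
    assert lam <= 0.5, "explicit scheme needs D*dt/dx^2 <= 1/2"
    m = int(round(2.0 * half_width / dx)) + 1
    xs = [-half_width + i * dx for i in range(m)]
    n = [n_init if abs(x) <= l_init else 0.0 for x in xs]
    targets = {int(round(t / dt)): t for t in record}
    fronts = {}
    for step in range(1, max(targets) + 1):
        new = n[:]
        for i in range(1, m - 1):
            diffusion = lam * (n[i - 1] - 2.0 * n[i] + n[i + 1])
            growth = dt * r * n[i] * (1.0 - n[i] / k)
            new[i] = n[i] + diffusion + growth
        new[0], new[m - 1] = new[1], new[m - 2]  # zero-flux boundaries
        n = new
        if step in targets:
            fronts[targets[step]] = right_front(xs, n, 0.5 * k)
    return fronts


fronts = simulate_logistic_front()
measured_speed = (fronts[200.0] - fronts[100.0]) / 100.0
predicted_speed = 2.0 * math.sqrt(0.1 * 1.0)
print(measured_speed, predicted_speed)
```

Refining the grid or measuring over a later time window should bring the measured speed closer to the predicted one, at the cost of a longer run.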

Fig. 9 Solutions of the one-dimensional version of (17) using the growth rate function with an Allee effect (8), plotted for −200 ≤ x ≤ 200. In (a), the initial population density is 30 on the interval −20 ≤ x ≤ 20 and zero otherwise. In (b), the initial population density is 25 on the interval −20 ≤ x ≤ 20 and zero otherwise. In (a) the initial population evolves into two opposite traveling wavefronts which travel at speed (24). However, in (b) the invading species is unable to survive the invasion. In each case, D = 1 while β, α, and K have the same values as in Fig. 3.

The preceding analysis applies to the one-dimensional reaction diffusion equation. The two-dimensional reaction diffusion Eq. (17) has rotationally symmetric traveling wave solutions similar to those in Fig. 5, except that they reach a carrying capacity as they expand outward from the release point. It has been shown that these solutions depend on curvature (i.e., on the radial distance from the release point) (Lewis et al. 2016; Volpert and Petrovskii 2009). This is expressed by the formula

Fig. 10 This graph shows how formula (25) accurately predicts the dependence on curvature of rotationally symmetric traveling wave solutions of the reaction diffusion Eq. (17). The data are from the invasion of the house finch in eastern North America (adapted from Shigesada and Kawasaki (1997), Volpert and Petrovskii (2009), and Lewis et al. (2016)). The solid curve is a plot of t versus R in (25) with the parameters c, R0, and D held fixed.

\[ t = \frac{1}{c} \left[ (R - R_0) + \frac{D}{c} \ln\left( \frac{cR - D}{cR_0 - D} \right) \right], \tag{25} \]

where c is the speed of the wave, t is time, D is the diffusion coefficient, R is the radial position of the wave front, and R0 is the radius of the initially invaded region (Volpert and Petrovskii 2009). Figure 10 shows that this formula is consistent with the invasion data of the house finch.
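Formula (25) is straightforward to evaluate, and differentiating it shows what it encodes: dt/dR = R/(cR − D), that is, a front whose radial speed is dR/dt = c − D/R. The front is slower than c while it is strongly curved and approaches c as R grows. The sketch below evaluates (25) and checks this relation with a finite difference. The parameter values are hypothetical and are not the house finch values used for Fig. 10.

```python
import math


def arrival_time(R, c, D, R0):
    """Eq. (25): time for a circular front to expand from radius R0 to radius R."""
    if c * R0 <= D:
        raise ValueError("formula (25) requires c*R0 > D")
    if R < R0:
        raise ValueError("R must be at least R0")
    return ((R - R0) + (D / c) * math.log((c * R - D) / (c * R0 - D))) / c


c, D, R0 = 2.0, 10.0, 10.0  # hypothetical wave speed, diffusion coefficient, initial radius
R, h = 50.0, 1.0e-4

# Central difference for dt/dR, inverted to give the radial speed dR/dt at radius R.
dt_dR = (arrival_time(R + h, c, D, R0) - arrival_time(R - h, c, D, R0)) / (2.0 * h)
radial_speed = 1.0 / dt_dR

print(arrival_time(R, c, D, R0), radial_speed, c - D / R)
```

The arrival time exceeds the flat-front estimate (R − R0)/c, which is the delay caused by curvature near the release point.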

Conclusion

This chapter has focused on mathematical models of invasion ecology. These models show how an invading organism can either go extinct or become established after its introduction. They display exponential and logistic population growth as well as growth subject to the strong Allee effect, in which there is a threshold population density below which a species will go extinct. An invading species that becomes established can then disperse into surrounding regions. Here the reaction diffusion equation has been presented as a model of species dispersal. Analysis of the reaction diffusion equation yields predictions of the rate of spread, the establishment period of the invasion, and species dispersal in the form of a traveling wave. Theoretical results were shown to be consistent with experimental data. Although a brief discussion of stochastic models was presented, many types of models have been neglected, such as integrodifference equations (Neubert and Parker


2004) and those that account for age structure or systems of interacting populations (Hastings 1996). The models presented here can be extended to investigate strategies used for the control and eradication of an invasive species. If it is known that an invading population is subject to an Allee effect, eradication may be achieved by reducing the population to a level that is below its threshold density (Liebhold et al. 2016; Suckling et al. 2012). Population levels can be reduced by various techniques including the application of insecticides or the use of the sterile insect technique whereby sterile insects are released into the environment which then mate with fertile insects to produce offspring that are not viable (Suckling et al. 2012). Mathematical models can be used to investigate these types of control strategies. For example, in (Liebhold and Bascompte 2003) a discrete model with Allee dynamics and environmental stochasticity similar to Eq. (9) is used to examine the eradication of the gypsy moth. A kill rate is included in the model to control the population density (e.g., application of an insecticide). Analysis of the model shows that eradication of isolated gypsy moth populations can be achieved if at least 80% of the moths are eradicated (assuming the initial population is relatively low which may require treatment early in the invasion process) (Liebhold and Bascompte 2003). In the case of an established species that disperses, knowledge of the speed of dispersal is useful for developing control strategies since the danger of invasion increases with spread rate (Neubert and Parker 2004). As discussed in this chapter, analysis of various types of mathematical models has shown that there will be a linear rate of spread that is proportional to per capita growth rate. Thus, reducing the per capita growth rate may be an effective strategy to slow the rate of spread of an invasion (Hastings 1996). 
The total cost of biological invasions in the United States has been estimated at $120 billion annually (Pimentel et al. 2005). Although this chapter has not covered models of the economics of invasive species, many modeling efforts combine ecology and economics to better understand how to develop strategies that minimize the cost of invasive species (Keller et al. 2009). One recent example is a model for control of the European grapevine moth Lobesia botrana (which destroys vineyards) developed in (Picart et al. 2015). This is an age-structured population growth model that uses insecticides and mating disruptions to control moth population density as well as equations that account for harvest losses (euros per hectare). Optimal control theory (Lenhart and Workman 2007) is used to predict how often chemicals should be spread to minimize these harvest losses.

Understanding the impact of invasive species on communities and ecosystems involves the efforts of scientists who develop theory, perform experiments, and collect and analyze data, as well as stakeholders who help formulate policy and make management decisions. This chapter has focused on how mathematical modeling can be useful in this process. Okubo and Levin (2001) classify mathematical models as educational or practical. This chapter was concerned with the former. Experimental studies involving invasive species may be impractical (Liebhold and Bascompte 2003). Practical models of invasive species may involve large numbers of parameters, variables, and assumptions which may prohibit the


analytical results that can be derived from educational models. Thus the usefulness of educational models should not be underestimated. Indeed, Okubo and Levin write that educational models “provide a process for gaining insight, expressing ideas, and eventually extending to more complex models” (Okubo and Levin 2001). The mathematical models that have been discussed in this chapter clearly illustrate this statement.

Cross-References

Coevolution of Mathematics, Statistics, and Genetics
Mathematics and Recurrent Population Outbreaks

References

Anderson JL, Albergotti L, Proulx S, Peden C, Huey RB, Phillips PC (2007) Thermal preference of Caenorhabditis elegans: a null model and empirical tests. J Exp Biol 210:3107–3116
Andow DA, Kareiva PM, Levin SA, Okubo A (1990) Spread of invading organisms. Landsc Ecol 4:177–188
Bacaër N (2011) A short history of mathematical population dynamics. Springer, London
Beck KG, Zimmerman K, Schardt JD, Stone J, Lukens RR, Reichard S, Randall J, Cangelosi AA, Cooper D, Thompson JP (2008) Invasive species defined in a policy context: recommendations from the federal invasive species advisory committee. Invasive Plant Sci Manage 1:414–421
Becker K (1972) Muskrats in central Europe and their control. In: Proceedings of the 5th vertebrate pest conference. University of Nebraska, Lincoln
Bertignac M, Lehodey P, Hampton J (1998) A spatial population dynamics simulation model of tropical tunas using a habitat index based on environmental parameters. Fish Oceanogr 7:326–334
Boyce WE, DiPrima RC (2012) Elementary differential equations and boundary value problems, 10th edn. Wiley, New York
Brassil CE (2001) Mean time to extinction of a metapopulation with an Allee effect. Ecol Model 143:9–16
Dennis B (2002) Allee effects in stochastic populations. Oikos 96:389–401
Edelstein-Keshet L (2005) Mathematical models in biology. SIAM, New York
Fujisaki I, Pearlstine EV, Mazzotti FJ (2010) The rapid spread of invasive Eurasian collared doves Streptopelia decaocto in the continental USA follows human-altered habitats. Ibis 152:622–632
Hastings A (1996) Models of spatial spread: a synthesis. Biol Conserv 78:143–148
Hengeveld R (1992) Potential and limitations of predicting invasion rates. Fla Entomol 75:60–72
Karieva P (1983) Local movement in herbivorous insects: applying a passive diffusion model to mark-recapture field experiments. Oecologia 57:322–327
Keller RP, Lodge DM, Lewis MA, Shogren JF (2009) Bioeconomics of invasive species: integrating ecology, economics, policy and management. Oxford University Press, New York
Kot M (2001) Elements of mathematical ecology. Cambridge University Press, Cambridge, UK
Kramer AM, Dennis B, Liebhold AM, Drake JM (2009) The evidence for Allee effects. Popul Ecol 51:341–354
Lenhart S, Workman JT (2007) Optimal control applied to biological models. Chapman & Hall, London
Lewis MA, Petrovskii SV, Potts JR (2016) The mathematics behind biological invasions, vol 44. Springer, Berlin
Liebhold A, Bascompte J (2003) The Allee effect, stochastic dynamics and the eradication of alien species. Ecol Lett 6:133–140
Liebhold AM, Tobin PC (2006) Growth of newly established alien populations: comparison of North American gypsy moth colonies with invasion theory. Popul Ecol 48:253–262
Liebhold AM, Work TT, McCullough DG, Cavey JF (2006) Airline baggage as a pathway for alien insect species entering the United States. Am Entomol 52:48–54
Liebhold AM et al (2016) Eradication of invading insect populations: from concepts to applications. Annu Rev Entomol 61:335–352
Lockwood JL, Hoopes MF, Marchetti MP (2007) Invasion ecology. Blackwell, Malden
Logan JD (2004) Applied partial differential equations, 2nd edn. Springer, New York
Lonsdale WM (1993) Rates of spread of an invading species Mimosa pigra in northern Australia. J Ecol 81:513–521
Madsen JD, Owens CS (2000) Factors contributing to the spread of Hydrilla in lakes and reservoirs. Aquatic plant control technical notes collection (ERDC TN-APCRP-EA-01). US Army Engineer Research and Development Center, Vicksburg, pp 1–11
McJunkin JW, Zelmer DA, Applegate RD (2005) Population dynamics of wild turkeys in Kansas (Meleagris gallopavo): theoretical considerations and implications of rural mail carrier survey (RMCS) data. Am Midl Nat 154:178–187
Mills EL, Leach JH, Carlton JT, Secor CL (1994) Exotic species and the integrity of the Great Lakes. Bioscience 44:666–676
Neubert MG, Parker IM (2004) Projecting rates of spread for invasive species. Risk Anal 24:817–831
Okubo A, Levin SA (2001) Diffusion and ecological problems: modern perspectives. Springer, New York
Petrovska BB (2012) Historical review of medicinal plants usage. Pharm Rev 6:1–5
Picart D, Milner FA, Thiery D (2015) Optimal treatments schedule in insect pest control in viticulture. Math Popul Stud 22:172–181
Pimentel D, Zuniga R, Morrison D (2005) Update on the environmental and economic costs associated with alien-invasive species in the United States. Ecol Econ 52:273–288
Ricklefs RE, Miller GL (1999) Ecology, 4th edn. W.H. Freeman, New York
Rodriguez-Brenes IA, Komarova NL, Wodarz D (2013) Tumor growth dynamics: insights into somatic evolutionary processes. Trends Ecol Evol 28:597–604
Santos VB, Yoshihara E, Freitas RTF, Reis Neto RV (2008) Exponential growth model of Nile tilapia (Oreochromis niloticus) strains considering heteroscedastic variance. Aquaculture 274:96–100
Seidl I, Tisdell CA (1999) Carrying capacity reconsidered: from Malthus’ population theory to cultural carrying capacity. Ecol Econ 31(3):395–408
Shigesada N, Kawasaki K (1997) Biological invasions: theory and practice. Oxford series in ecology and evolution. Oxford University Press, Oxford
Skellam JG (1951) Random dispersal in theoretical populations. Biometrika 38(1–2):196–218
Stephens PA, Sutherland WJ, Freckleton RP (1999) What is the Allee effect? Oikos 87:185–190
Suckling DM, Tobin PC, McCullough DG, Herms DA (2012) Combining tactics to exploit Allee effects for eradication of alien insect populations. J Econ Entomol 105:1–13
Taylor CM, Hastings A (2005) Allee effects in biological invasions. Ecol Lett 8:895–908
Tobin PC, Liebhold AM, Roberts EA, Blackburn LM (2015) Estimating spread rates of non-native species: the gypsy moth as a case study. In: Venette RC (ed) Pest risk modelling and mapping for invasive alien species. CABI Press, Wallingford, pp 131–144
Turchin P (2001) Does population ecology have general laws? Oikos 94:17–26
Volpert V, Petrovskii S (2009) Reaction-diffusion waves in biology. Phys Life Rev 6:267–310
Walker MS (2018) Spotted lanternfly: states urge citizens to report sightings of invasive insect hitchhiker. In: Entomology Today. Entomological Society of America. https://entomologytoday.org/2018/02/26/spotted-lanternfly-states-urge-citizens-report-sightings-invasive-insect-hitchhiker/. Accessed 9 March, 2019

Mathematics and Recurrent Population Outbreaks

85

Torsten Lindström

Contents

Introduction
The Lotka–Volterra Model
Advantages of the Lotka–Volterra Model
Criticism Against the Lotka–Volterra Model
Gause-Type Models for Population Interaction
What About Real Chemostat Conditions?
References

Abstract

Although outbreaks had been observed in many populations for hundreds of years, it took until the 1920s before the first mechanisms that did not involve human interference were suggested. Only a few mechanisms were included in the first models, and the question remained whether the inclusion of other, very plausible, mechanisms could alter the predictions. In this chapter, we follow the development of models that have been proposed to explain oscillatory population dynamics, from the early models suggested by Lotka (1925) and Volterra (1926) to global dynamical questions that are still open for models incorporating explicit resource dynamics, like the chemostat; cf. Kuang (1989).

Keywords

Global stability · Limit cycle · Lyapunov function · Mechanistic population models · Oscillatory dynamics · Recurrent outbreaks

T. Lindström
Department of Mathematics, Linnaeus University, Växjö, Sweden
e-mail: [email protected]

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_33


Introduction

Outbreaks or plagues of voles and mice have been observed since ancient times, and some of these outbreaks were described already in the Old Testament; cf. Elton (1942). Similar observations concerning infectious diseases exist, cf. Hethcote (2000), but people in ancient times were perhaps not aware of the existence of microorganisms or viral agents. Although observations of population outbreaks have existed for a long time, it was not until the 1920s that mathematical attempts to explain such outbreaks and cycles began to appear. First of all, it was assumed until Elton (1930) that something called “the balance of nature” existed. This balance should prevent population outbreaks if the considered ecologies are not subject to human interference. However, key observations of Hewitt (1921) were made in boreal areas where the key factors, at least at that time, could not be human interference. Before we proceed to mathematical models explaining the possible origin of population outbreaks, we mention two models that appear as building blocks in the models considered below. The first one is the model for exponential growth

\[ \frac{dX}{dT} = RX, \qquad X(0) = X_0. \tag{1} \]

The explicit solutions of this model are given by X(T) = X0 exp(RT); compare, e.g., Hirsch et al. (2013). They can be checked by direct substitution, and since the equation is both linear and separable, they can be obtained by at least two methods. The model predicts exponentially increasing populations for R > 0, X0 > 0. An equilibrium solution is visible at X = 0. Most species cannot increase exponentially forever, and the logistic equation

\[ \frac{dX}{dT} = RX \left( 1 - \frac{X}{K} \right), \qquad X(0) = X_0 \tag{2} \]

has been many authors’ response to this problem (see, e.g., Hirsch et al. 2013; Brauer and Castillo-Chávez 2001). Equation (2) is still separable. Hence, explicit solutions can still be derived and verified. In this case, they are given by

\[ X(T) = \frac{X_0 \exp(RT)}{1 + \dfrac{X_0}{K} \left( \exp(RT) - 1 \right)}. \]

This model has two equilibria, at X = 0 and X = K. Since all solutions with 0 < X < K increase and all solutions with X > K decrease, the equilibrium at X = 0 is unstable, while the equilibrium at X = K attracts all positive initial conditions. We shall later see that even if Eq. (2) formally remedies the problem of ever-increasing solutions, it lacks (as stated) clearly defined birth and death processes and clear interpretations of its parameters in terms of physiological parameters of the individuals of the population under consideration or of the environment. We encounter this equation again when we add equations for the resource level, and we shall then see that its two parameters may be far from independent of each other. We proceed to models that explain population outbreaks in terms of interaction between at least two populations.
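The closed-form solution of the logistic equation (2) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the parameter values (R = 0.8, K = 100) are illustrative assumptions, and the classical fourth-order Runge–Kutta scheme is used only as an independent reference.

```python
import math

def logistic_exact(T, X0, R, K):
    """Closed-form solution of dX/dT = R*X*(1 - X/K), X(0) = X0."""
    e = math.exp(R * T)
    return X0 * e / (1.0 + (X0 / K) * (e - 1.0))

def logistic_rk4(T, X0, R, K, steps=2000):
    """Integrate the logistic equation (2) with the classical RK4 scheme."""
    f = lambda X: R * X * (1.0 - X / K)
    h, X = T / steps, X0
    for _ in range(steps):
        k1 = f(X)
        k2 = f(X + 0.5 * h * k1)
        k3 = f(X + 0.5 * h * k2)
        k4 = f(X + h * k3)
        X += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return X

if __name__ == "__main__":
    # Hypothetical values: one start below K and one above K.
    for X0 in (5.0, 150.0):
        print(X0, logistic_exact(10.0, X0, 0.8, 100.0),
              logistic_rk4(10.0, X0, 0.8, 100.0))
```

Both starting values approach K, in agreement with the statement that X = K attracts all positive initial conditions.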

The Lotka–Volterra Model

The first mathematical attempts to explain population outbreaks were made by Lotka (1925) and Volterra (1926). They assumed that an interaction between predator and prey exists and proposed the following mathematical model

\[ \begin{aligned} \frac{dX}{dT} &= RX - AXY = AX\left(\frac{R}{A} - Y\right), \\ \frac{dY}{dT} &= MAXY - DY = MAY\left(X - \frac{D}{MA}\right). \end{aligned} \tag{3} \]

Here, X stands for prey density, Y for predator density, R for the natural growth rate of the prey, A for the search rate of the predator, M for the conversion factor of the predator, and D for the natural death rate of the predator. In the first form of the equation, the different processes that are taken into account in the model are visible. In the second form, the equation is written in order to make its isoclines (or null-clines) visible. From the second form, we see that the Lotka–Volterra model has two equilibria, the origin and (D/(MA), R/A). Its generic Jacobian matrix takes the form

\[ J(X, Y) = \begin{pmatrix} R - AY & -AX \\ MAY & MAX - D \end{pmatrix} \]

and it follows that the eigenvalues at the origin are R and −D, so it is a saddle, and we have exponential growth along the X-axis and exponential decay along the Y-axis. Since the right-hand sides of Eq. (3) satisfy Lipschitz conditions on bounded domains and solutions along the axes exist, the positive quadrant is invariant. The remaining orbits of Eq. (3) are level curves of the functional

\[ V(X, Y) = \int_{D/(MA)}^{X} \frac{X' - \frac{D}{MA}}{AX'}\,dX' + \int_{R/A}^{Y} \frac{Y' - \frac{R}{A}}{MAY'}\,dY' \]

since

\[ \begin{aligned} \frac{dV}{dT} &= \frac{X - \frac{D}{MA}}{AX}\,AX\left(\frac{R}{A} - Y\right) + \frac{Y - \frac{R}{A}}{MAY}\,MAY\left(X - \frac{D}{MA}\right) \\ &= \left(X - \frac{D}{MA}\right)\left(\frac{R}{A} - Y\right) + \left(Y - \frac{R}{A}\right)\left(X - \frac{D}{MA}\right) = 0. \end{aligned} \]

It is clear that V diverges as the boundaries of the first quadrant are approached. Therefore, the solutions of Eq. (3) in the nonnegative quadrant consist of two constant solutions at the equilibria, exponentially increasing solutions along the X-axis, exponentially decaying solutions along the Y-axis, and closed orbits.
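That V is constant along the orbits of Eq. (3) can also be observed numerically. The sketch below is an illustration under assumed parameter values (R = 1, A = 0.5, M = 0.5, D = 0.4) that do not come from the text; both integrals in V are evaluated in closed form, and the helper names are hypothetical.

```python
import math

R, A, M, D = 1.0, 0.5, 0.5, 0.4      # illustrative parameter values (assumed)
XS, YS = D / (M * A), R / A          # interior equilibrium (D/(MA), R/A)

def field(X, Y):
    """Right-hand side of the Lotka-Volterra model (3)."""
    return R * X - A * X * Y, M * A * X * Y - D * Y

def V(X, Y):
    """The functional V with both integrals evaluated in closed form."""
    return ((X - XS - XS * math.log(X / XS)) / A
            + (Y - YS - YS * math.log(Y / YS)) / (M * A))

def rk4_step(X, Y, h):
    """One classical Runge-Kutta step for the planar system."""
    k1 = field(X, Y)
    k2 = field(X + 0.5 * h * k1[0], Y + 0.5 * h * k1[1])
    k3 = field(X + 0.5 * h * k2[0], Y + 0.5 * h * k2[1])
    k4 = field(X + h * k3[0], Y + h * k3[1])
    return (X + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            Y + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

def drift_of_V(X0=1.0, Y0=1.0, T=20.0, h=0.001):
    """Largest deviation of V from its initial value along a numerical orbit."""
    X, Y, v0, worst = X0, Y0, V(X0, Y0), 0.0
    for _ in range(int(round(T / h))):
        X, Y = rk4_step(X, Y, h)
        worst = max(worst, abs(V(X, Y) - v0))
    return worst

if __name__ == "__main__":
    print("maximal drift of V:", drift_of_V())
```

The drift should stay at the level of the integration error, which is what a first integral predicts.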

Advantages of the Lotka–Volterra Model

The first advantage is that the underlying assumptions of Eq. (3) are simple. The first assumption (I) is that the growth of the prey is proportional to the number of currently existing prey individuals and determined by Eq. (1) in section "Introduction," the second (II) is that the number of prey caught per predator is proportional to prey density, the third (III) is that a fixed proportion of consumed prey biomass is converted to predator biomass, and the fourth (IV) is that the natural death rate of the predator is proportional to the number of currently existing predator individuals. Finally, we build up the model under the assumption that the different processes operate independently on an infinitesimal time-scale so that their contributions simply add to each other (Metz and Diekmann 1986). Therefore, the precise circumstances granting validity of the model are easy to check. They might not hold but are at least known.

The second advantage is that the dynamics of the model is completely known. As a consequence, even in the many situations where one or several assumptions of the model fail, the model or its solutions can still be used for comparison purposes.

The third advantage is that already this simple model provides results that demonstrate nonlinear effects that cannot easily be discerned through verbal arguments. The support for these results is obtained in a precise manner. We take Volterra's principle (Hofbauer and Sigmund 1988, 1998) as an example of such effects. We proved in section "The Lotka–Volterra Model" that a generic solution of Eq. (3) is periodic. We now continue by proving that the time-averages of these periodic solutions remain constant and that they agree with the coordinates of the positive fixed point of this system. Assume that the period of the solution is T and observe that

\[ 0 = \log(X(T)) - \log(X(0)) = \int_0^T \frac{d}{dT}\log(X)\,dT = \int_0^T \frac{1}{X}\frac{dX}{dT}\,dT = \int_0^T (R - AY)\,dT = RT - A\int_0^T Y(T)\,dT \]

and similarly that

\[ 0 = \log(Y(T)) - \log(Y(0)) = \int_0^T \frac{d}{dT}\log(Y)\,dT = \int_0^T \frac{1}{Y}\frac{dY}{dT}\,dT = \int_0^T (MAX - D)\,dT = MA\int_0^T X(T)\,dT - DT. \]

It follows from the above equalities that

\[ \frac{1}{T}\int_0^T X(T)\,dT = \frac{D}{MA} \quad \text{and that} \quad \frac{1}{T}\int_0^T Y(T)\,dT = \frac{R}{A}. \]

We conclude that if we decrease the natural growth rate of the prey (our parameter R) the consequence is that average densities of their predators decrease, whereas the average densities of the prey population remain constant. If, for instance, insecticides are used the insect population remains constant, whereas the bird population decreases. It is not possible to make this quite sharp conclusion in a reliable way by using verbal arguments! The fourth advantage with Eq. (3) is its ability to explain oscillatory population dynamics as a consequence of population interaction; the balance of nature does not exist (Elton 1930). The Lotka–Volterra model (3) explains how population outbreaks can be created in the absence of human interference.
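Volterra's principle can be illustrated by computing time averages along numerical solutions of Eq. (3) for two values of R. This is a sketch under assumptions: the parameter values and the function name are hypothetical, and the averages are taken over a long time window instead of one exact period, so they agree with D/(MA) and R/A only up to a remainder that shrinks like 1/T.

```python
def time_averages(R, A, M, D, X0=1.0, Y0=1.0, T=400.0, h=0.002):
    """Time averages of X and Y over [0, T] for model (3) (RK4 + trapezoidal rule)."""
    def field(X, Y):
        return R * X - A * X * Y, M * A * X * Y - D * Y

    X, Y, sx, sy = X0, Y0, 0.0, 0.0
    for _ in range(int(round(T / h))):
        k1 = field(X, Y)
        k2 = field(X + 0.5 * h * k1[0], Y + 0.5 * h * k1[1])
        k3 = field(X + 0.5 * h * k2[0], Y + 0.5 * h * k2[1])
        k4 = field(X + h * k3[0], Y + h * k3[1])
        Xn = X + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        Yn = Y + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        sx += 0.5 * h * (X + Xn)     # trapezoidal rule for the integral of X
        sy += 0.5 * h * (Y + Yn)     # trapezoidal rule for the integral of Y
        X, Y = Xn, Yn
    return sx / T, sy / T

if __name__ == "__main__":
    # Hypothetical values: lowering R from 1.0 to 0.6 with A, M, D fixed.
    for R in (1.0, 0.6):
        print(R, time_averages(R, 0.5, 0.5, 0.4))
```

With these values the prey average stays near D/(MA) = 1.6 for both runs, while the predator average drops from about R/A = 2 to about 1.2.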

Criticism Against the Lotka–Volterra Model

The most important argument for not using the Lotka–Volterra model is that the model is structurally unstable. By not being structurally stable (Guckenheimer and Holmes 1983), we mean that arbitrarily small modifications in the modeling assumptions might alter the predictions of the model completely. Such changes are topological, meaning that they cannot be undone by continuous coordinate transformations. It is therefore necessary to check the assumptions and analyze the dynamical properties of nearby models carefully.

We begin by criticizing the second assumption of the Lotka–Volterra model: The number of prey caught per predator during a food-gathering period is proportional to prey density. For high prey densities, this assumption cannot be true since the predator must saturate at some prey density. It simply cannot handle all prey gathered. More precisely, let us assume that the length of the food-gathering period is T, the time for handling each prey individual is B, and that N prey individuals were caught during this food-gathering period. As in the Lotka–Volterra model, we assume that the search rate is A. We get

\[ N = AX(T - BN) \]

since a part of the time for searching is now blocked by the handling of the prey individuals actually found by the predator (Holling 1959). We ask how many prey individuals are caught per unit of time and get

\[ \frac{N}{T} = \frac{AX}{1 + ABX}. \tag{4} \]
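The step from the time budget to Eq. (4) is elementary algebra; written out under the same assumptions, it reads as follows. The final limit is not stated in the text but follows directly from Eq. (4): the capture rate cannot exceed one prey individual per handling time.

```latex
\begin{aligned}
N &= AX(T - BN) \\
N + ABXN &= AXT \\
N(1 + ABX) &= AXT \\
\frac{N}{T} &= \frac{AX}{1 + ABX} \;\longrightarrow\; \frac{1}{B}
\quad \text{as } X \to \infty .
\end{aligned}
```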

The right-hand side of this relationship is usually referred to as the Holling (type II) functional response (Nisbet and Gurney 1982). It describes the fact that a specialist predator saturates when operating in an environment abundant with prey. On the basis of this, we modify our model to

\[ \begin{aligned} \frac{dX}{dT} &= RX - \frac{AXY}{1 + ABX} = \frac{AX}{1 + ABX}\left(\frac{R(1 + ABX)}{A} - Y\right), \\ \frac{dY}{dT} &= \frac{MAXY}{1 + ABX} - DY = \frac{YA(M - DB)}{1 + ABX}\left(X - \frac{D}{A(M - DB)}\right) \end{aligned} \tag{5} \]

and the question is what properties Eq. (5) has. We have again expressed the model in one form that explicitly clarifies the processes taking place and one form that exhibits its isoclines. Assume that M − DB > 0; otherwise the predator goes extinct and the model no longer represents population interaction. The model still has two equilibria, the origin and

\[ \left(\frac{D}{A(M - DB)},\; \frac{R}{A}\left(1 + \frac{BD}{M - DB}\right)\right). \]

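We get a global picture of all orbits of this model if we compare them to the orbits of the comparison system (Duff 1953; Ye et al. 1986)

\[ \begin{aligned} \frac{dX}{dT} &= \frac{AX}{1 + ABX}\left(\frac{R}{A}\left(1 + \frac{BD}{M - DB}\right) - Y\right), \\ \frac{dY}{dT} &= \frac{YA(M - DB)}{1 + ABX}\left(X - \frac{D}{A(M - DB)}\right). \end{aligned} \tag{6} \]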

The orbits of Eq. (6) are level curves of the functional

\[ V(X, Y) = \int_{\frac{D}{A(M - DB)}}^{X} \frac{(M - DB)\left(X' - \frac{D}{A(M - DB)}\right)}{X'}\,dX' + \int_{\frac{R}{A}\left(1 + \frac{BD}{M - DB}\right)}^{Y} \frac{Y' - \frac{R}{A}\left(1 + \frac{BD}{M - DB}\right)}{Y'}\,dY'. \]

It is clear that V diverges along the boundaries of the first quadrant; hence, all positive orbits of Eq. (6) are closed orbits. We now compare the orbits of Eq. (5) to the orbits of Eq. (6) by differentiating V along the orbits of Eq. (5). We get

\[ \begin{aligned} \frac{dV}{dT} &= \frac{A(M - DB)\left(X - \frac{D}{A(M - DB)}\right)}{1 + ABX}\left(\frac{R(1 + ABX)}{A} - Y\right) \\ &\quad + \left(Y - \frac{R}{A}\left(1 + \frac{DB}{M - DB}\right)\right)\frac{A(M - DB)}{1 + ABX}\left(X - \frac{D}{A(M - DB)}\right) \\ &= \frac{A(M - DB)}{1 + ABX}\left(X - \frac{D}{A(M - DB)}\right)\left(\frac{R(1 + ABX)}{A} - \frac{R}{A}\left(1 + \frac{DB}{M - DB}\right)\right) \\ &= \frac{RAB(M - DB)}{1 + ABX}\left(X - \frac{D}{A(M - DB)}\right)^2. \end{aligned} \]

We note that dV/dT is positive semidefinite. This means that V increases along the orbits of Eq. (5) as long as they do not reach an invariant set within the set dV/dT = 0 (LaSalle 1960). Solutions of Eq. (5), therefore, diverge towards the boundaries of the first quadrant. This is not realistic. We conclude that instability is a dynamical consequence of saturation.
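The growth of V along orbits of the saturated model (5) can be observed in a short simulation. The sketch below uses assumed parameter values (R = A = M = 1, B = 0.2, D = 0.5, so that M − DB > 0) and a hypothetical helper `sample_V`; V is the first integral of the comparison system (6) with both integrals evaluated in closed form.

```python
import math

R, A, B, M, D = 1.0, 1.0, 0.2, 1.0, 0.5      # illustrative values, M - D*B > 0
XS = D / (A * (M - D * B))                    # interior equilibrium of Eq. (5)
YS = (R / A) * (1.0 + B * D / (M - D * B))

def field(X, Y):
    """Right-hand side of the saturated model (5)."""
    g = A * X / (1.0 + A * B * X)             # Holling type II response, Eq. (4)
    return R * X - g * Y, M * g * Y - D * Y

def V(X, Y):
    """First integral of the comparison system (6), in closed form."""
    return ((M - D * B) * (X - XS - XS * math.log(X / XS))
            + (Y - YS - YS * math.log(Y / YS)))

def sample_V(X0=0.7, Y0=1.0, T=30.0, h=0.001, every=1000):
    """Values of V along an RK4 orbit of Eq. (5), sampled every `every` steps."""
    X, Y, out = X0, Y0, [V(X0, Y0)]
    for n in range(1, int(round(T / h)) + 1):
        k1 = field(X, Y)
        k2 = field(X + 0.5 * h * k1[0], Y + 0.5 * h * k1[1])
        k3 = field(X + 0.5 * h * k2[0], Y + 0.5 * h * k2[1])
        k4 = field(X + h * k3[0], Y + h * k3[1])
        X += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        Y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        if n % every == 0:
            out.append(V(X, Y))
    return out

if __name__ == "__main__":
    values = sample_V()
    print(values[0], values[-1])
```

The sampled values increase from one sample to the next, so the orbit crosses the closed orbits of Eq. (6) outwards.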

Gause-Type Models for Population Interaction

Gause (1934) considered the following general class of predator–prey models

\[ \frac{dX}{dT} = RX - g(X, Y), \qquad \frac{dY}{dT} = G(X, Y). \tag{7} \]

He concluded that the model of Lotka (1925) and Volterra (1926) belongs to this class and made some attempts to develop the expressions for the two-variable functions g and G further. He suggested, for instance,

\[ \frac{dX}{dT} = RX - AY\sqrt{X}, \qquad \frac{dY}{dT} = \begin{cases} MAY\sqrt{X}, & X > 0, \\ -DY, & X = 0. \end{cases} \tag{8} \]


We observe that it was quite difficult to find reasonable expressions for the functions in models like Eq. (7) before Holling (1959). The square root function is still not bounded, so Eq. (8) still models an unsaturated predator. In this context, the parameter A has no interpretation as a physiological parameter of the individuals of the populations, as it has in Eq. (4). Furthermore, the functions do not meet Lipschitz criteria, and consequently the mathematical difficulties may be larger than necessary. Now and then, authors point out mistakes of this type in the literature (Ardito and Ricciardi 1995).

Some models belonging to a subclass of Eq. (7) have been referred to as Gause-type models by many authors (see, e.g., Kuang 1988; Lindström 1993; Ardito and Ricciardi 1995; González-Olivares and Rojas-Palma 2011). We shall now derive a model of that type. Our program is to motivate the forms of the functions involved better and to populate them with parameters that describe either the environment or some physiological properties of the species involved. The most common way of dealing with the instability problem is to assume limited resources, and the simplest way of doing this with some rigor is the assumption of chemostat conditions (Smith and Waltman 1995). This includes taking into account the dynamics of some limiting nutrient that we do not specify in this context. We denote the nutrient (substrate) by S and write

\[ \begin{aligned} \frac{dS}{dT} &= CD - DS - \frac{A_1 SX}{1 + A_1 B_1 S}, \\ \frac{dX}{dT} &= \frac{M_1 A_1 SX}{1 + A_1 B_1 S} - DX - \frac{A_2 XY}{1 + A_2 B_2 X}, \\ \frac{dY}{dT} &= \frac{M_2 A_2 XY}{1 + A_2 B_2 X} - DY. \end{aligned} \tag{9} \]

Here, the parameters C, D, A1, B1, M1, A2, B2, M2 stand for concentration in the reservoir, dilution rate, search rate for the nutrient for the prey, handling time for nutrient for the prey, conversion factor for the prey, search rate for the predator, handling time for the predator, and conversion factor for the predator, respectively. See Fig. 1 for a sketch of the experimental setup. It resembles a nutrient flow through a lake system. We assume that the major cause of mortality for both species is washout through dilution. Now consider the function

\[ W(S, X, Y) = M_1 S + X + \frac{Y}{M_2} - M_1 C. \]

It holds that

\[ \frac{dW}{dT} = M_1 CD - M_1 DS - DX - \frac{DY}{M_2} = -DW, \]

so W decays exponentially, meaning that the surface W = 0 is invariant and attracts all solutions of Eq. (9). A study of Eq. (9) on this surface is equivalent to a study of the predator–prey system

Fig. 1 The experimental setup for a chemostat is illustrated here. It is assumed that the fluids in both reservoirs are kept well mixed. (Labels in the figure: D, S; D, S, X, Y.)

\[ \begin{aligned} \frac{dX}{dT} &= \frac{A_1 X\left(M_1 C - X - \frac{Y}{M_2}\right)}{1 + \frac{A_1 B_1}{M_1}\left(M_1 C - X - \frac{Y}{M_2}\right)} - DX - \frac{A_2 XY}{1 + A_2 B_2 X}, \\ \frac{dY}{dT} &= \frac{M_2 A_2 XY}{1 + A_2 B_2 X} - DY. \end{aligned} \tag{10} \]

This system could in some sense be included in the class (7) of models but is usually not considered to be of Gause type anyway. In order to see how the above assumptions actually reduce population growth, we study the case when the prey is unsaturated with respect to its limiting nutrient, i.e., we put B1 = 0 in order to get rid of the denominator in the first term of the first equation. We get

\[ \begin{aligned} \frac{dX}{dT} &= (A_1 M_1 C - D)\,X\left(1 - \frac{X}{\frac{A_1 M_1 C - D}{A_1}}\right) - \frac{A_1 XY}{M_2} - \frac{A_2 XY}{1 + A_2 B_2 X}, \\ \frac{dY}{dT} &= \frac{M_2 A_2 XY}{1 + A_2 B_2 X} - DY. \end{aligned} \tag{11} \]

Model (11) differs from Eq. (5) in two ways. First, we have encountered a logistic growth rate with growth rate parameter R = A1M1C − D and carrying capacity parameter K = (A1M1C − D)/A1, cf. Eq. (2). We now see that the two parameters appearing in the logistic growth rate are far from independent of each other. Second, we encounter the nonremovable coupling term −A1XY/M2. Just replacing the linear growth rate in Eq. (5) by a logistic growth rate would have left us with a model without this term (Kooi et al. 1998). Model (11) contains a number of parameter combinations. We are not going to extend the model for the moment, so it is better to introduce new parameters

and variables in order to reduce the algebraic work. We assume A1M1C − D > 0 (otherwise the prey goes extinct) and M2 − B2D > 0 (otherwise the predator goes extinct) and put

\[ \begin{aligned} x &= \frac{A_1 X}{A_1 M_1 C - D}, & y &= \frac{A_2 Y}{A_1 M_1 C - D}, & t &= (A_1 M_1 C - D)\,T, \\ a &= \frac{A_1}{M_2 A_2}, & b &= \frac{A_2 B_2 (A_1 M_1 C - D)}{A_1}, & m &= \frac{A_2 (M_2 - B_2 D)}{A_1}, \end{aligned} \]

and

\[ x^{*} = \frac{D A_1}{A_2 (A_1 M_1 C - D)(M_2 - B_2 D)}. \]

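The model takes the form

\[ \begin{aligned} \dot{x} &= x(1 - x) - axy - \frac{xy}{1 + bx}, \\ \dot{y} &= my\,\frac{x - x^{*}}{1 + bx}, \end{aligned} \tag{12} \]

and with

\[ f(x) = ax + \frac{x}{1 + bx}, \qquad F(x) = \frac{(1 + bx)(1 - x)}{1 + a + abx}, \qquad p(x) = m\,\frac{x - x^{*}}{1 + bx}, \]

it takes the isocline form

\[ \dot{x} = f(x)\left(F(x) - y\right), \qquad \dot{y} = y\,p(x). \tag{13} \]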
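The change of variables leading from model (11) to Eqs. (12) and (13) can be cross-checked by comparing right-hand sides at random points. The sketch below assumes only the chain rule, dx/dt = A1 (dX/dT)/(A1M1C − D)² and dy/dt = A2 (dY/dT)/(A1M1C − D)²; the function names and the sampling ranges are illustrative choices that keep A1M1C − D > 0 and M2 − B2D > 0.

```python
import random

def rhs11(X, Y, A1, M1, C, D, A2, B2, M2):
    """Right-hand side of model (11) in the original variables."""
    Rg = A1 * M1 * C - D                    # logistic growth rate parameter
    K = Rg / A1                             # carrying capacity parameter
    g = A2 * X * Y / (1.0 + A2 * B2 * X)    # predation term
    return Rg * X * (1.0 - X / K) - A1 * X * Y / M2 - g, M2 * g - D * Y

def scaled_parameters(A1, M1, C, D, A2, B2, M2):
    """Time scale Rg and the dimensionless parameters a, b, m, x*."""
    Rg = A1 * M1 * C - D
    a = A1 / (M2 * A2)
    b = A2 * B2 * Rg / A1
    m = A2 * (M2 - B2 * D) / A1
    xs = D * A1 / (A2 * Rg * (M2 - B2 * D))
    return Rg, a, b, m, xs

def rhs12(x, y, a, b, m, xs):
    """Right-hand side of Eq. (12)."""
    return (x * (1 - x) - a * x * y - x * y / (1 + b * x),
            m * y * (x - xs) / (1 + b * x))

def rhs13(x, y, a, b, m, xs):
    """Right-hand side of the isocline form (13)."""
    f = a * x + x / (1 + b * x)
    F = (1 + b * x) * (1 - x) / (1 + a + a * b * x)
    p = m * (x - xs) / (1 + b * x)
    return f * (F - y), y * p

def max_mismatch(trials=200, seed=0):
    """Largest discrepancy between (11) after scaling, (12), and (13)."""
    rng, worst = random.Random(seed), 0.0
    for _ in range(trials):
        A1, M1, C = rng.uniform(0.5, 2), rng.uniform(0.5, 2), rng.uniform(2, 4)
        D, A2 = rng.uniform(0.1, 0.4), rng.uniform(0.5, 2)
        B2, M2 = rng.uniform(0.1, 1), rng.uniform(0.5, 2)
        Rg, a, b, m, xs = scaled_parameters(A1, M1, C, D, A2, B2, M2)
        x, y = rng.uniform(0.05, 2), rng.uniform(0.05, 2)
        dX, dY = rhs11(Rg * x / A1, Rg * y / A2, A1, M1, C, D, A2, B2, M2)
        from11 = (A1 * dX / Rg ** 2, A2 * dY / Rg ** 2)   # chain rule
        for other in (rhs12(x, y, a, b, m, xs), rhs13(x, y, a, b, m, xs)):
            worst = max(worst, abs(from11[0] - other[0]), abs(from11[1] - other[1]))
    return worst

if __name__ == "__main__":
    print("largest mismatch:", max_mismatch())
```

A mismatch at the level of rounding error indicates that the three forms describe the same vector field.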

Models with similar structure have been considered to be of Gause (1934) type by many authors (see, e.g., Kuang 1988; Lindström 1993; Ardito and Ricciardi 1995; González-Olivares and Rojas-Palma 2011). We note that (A-I) f, F, and p are continuously differentiable, (A-II) xf(x) is positive definite, (A-III) (1 − x)F(x) is positive definite, (A-IV) (x − x*)p(x) is positive definite, and (A-V) 0 < x* < 1. It is not difficult to prove that if (A-I)–(A-V) hold, then the solutions of Eq. (13) remain positive and bounded, see, e.g., Lindström and Cheng (2015). The system (13) has three equilibria, (0, 0), (1, 0), and (x*, F(x*)), and its generic Jacobian matrix is given by

\[ J(x, y) = \begin{pmatrix} f'(x)\left(F(x) - y\right) + f(x)F'(x) & -f(x) \\ y\,p'(x) & p(x) \end{pmatrix}. \]

We see that the origin is a saddle point with eigenvalues f′(0)F(0) > 0 and p(0) < 0, and that (1, 0) is a saddle point with eigenvalues f(1)F′(1) < 0 and p(1) > 0. For the remaining equilibrium at (x*, F(x*)), we get

\[ \det J(x^{*}, F(x^{*})) = F(x^{*})\,f(x^{*})\,p'(x^{*}) > 0, \qquad \operatorname{Tr} J(x^{*}, F(x^{*})) = f(x^{*})\,F'(x^{*}), \]

meaning that the stability of (x*, F(x*)) depends on the sign of F′(x*). This local stability condition is usually referred to as the Rosenzweig and MacArthur (1963) graphical criterion for stability of predator–prey interactions. The local properties of Eq. (13) are now known. What about global properties? If F′(x*) < 0, we know that (x*, F(x*)) is locally stable, but what about global stability? Comparison with the Lotka–Volterra-type systems that we introduced in section "The Lotka–Volterra Model" gives the following theorem.

Theorem 1. Assume (A-I)–(A-V) and that (F(x) − F(x*))(x − x*) is negative definite. Then (x*, F(x*)) is globally asymptotically stable in the positive quadrant.

Proof. We introduce the Lyapunov function

\[ V(x, y) = \int_{x^{*}}^{x} \frac{p(x')}{f(x')}\,dx' + \int_{F(x^{*})}^{y} \frac{y' - F(x^{*})}{y'}\,dy'. \tag{14} \]

The level curves of this Lyapunov function are solutions of the Lotka–Volterra type system

\[ \dot{x} = f(x)\left(F(x^{*}) - y\right), \qquad \dot{y} = y\,p(x). \]

These solutions are closed orbits, cf. Eq. (6). We differentiate V with respect to time along the solutions of Eq. (13) and get

\[ \dot{V} = \frac{p(x)}{f(x)}\,f(x)\left(F(x) - y\right) + \frac{y - F(x^{*})}{y}\,y\,p(x) = p(x)\left(F(x) - F(x^{*})\right). \]

By (A-IV), p(x) has the sign of x − x*, so the assumption on F makes the right-hand side nonpositive, with equality only for x = x*. The theorem follows.

Remark 1. If additional smoothness is assumed, Dulac's theorem can be used to obtain a similar conclusion. Indeed, if the Dulac function B(x, y) = 1/(yf(x)) is used, we see that no cycles can exist if F′(x) < 0 for 0 < x < 1.

The question is of course how far we can proceed. An ultimate theorem that cannot be improved would state that local stability implies global stability, i.e., that F′(x*) < 0 makes (x*, F(x*)) globally asymptotically stable in the positive quadrant for Eq. (13). This can indeed be proved if Eq. (14) is modified slightly (see Ardito and Ricciardi 1995; Lindström and Cheng 2015) or by the use of Green's theorem (Smith and Waltman 1995). The other global issue is the uniqueness of the limit cycle when F′(x*) > 0. From the Poincaré–Bendixson theorem (Hirsch et al. 2013; Wiggins 2003), we know that at least one limit cycle exists if (A-I)–(A-V) hold and F′(x*) > 0. It turns out that it is possible to prove that this limit cycle is unique by use of Zhang's (1986) theorem, see Kuang and Freedman (1988) and Lindström and Cheng (2015). Hence, all the qualitative properties of Eq. (13) are known in complete detail. Assume (A-I)–(A-V). Then (i) if F′(x*) < 0, then (x*, F(x*)) is globally asymptotically stable in the positive quadrant for Eq. (13), and (ii) if F′(x*) > 0, then Eq. (13) has a unique limit cycle that attracts all initial conditions in the positive quadrant, except the equilibrium itself.
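The dichotomy between a stable equilibrium and a limit cycle can be illustrated by simulating Eq. (13) for the specific functions of model (12). The sketch below is an illustration only: the parameter values a = 0.5, b = 4, m = 1 and the two choices x* = 0.5 and x* = 0.1 are assumptions, and the helper names are hypothetical. For these values the hump of F lies near x ≈ 0.185, so F′(x*) is negative in the first case and positive in the second.

```python
def make_model(a, b, m, xs):
    """Return the functions f, F, p of the isocline form (13) for model (12)."""
    f = lambda x: a * x + x / (1 + b * x)
    F = lambda x: (1 + b * x) * (1 - x) / (1 + a + a * b * x)
    p = lambda x: m * (x - xs) / (1 + b * x)
    return f, F, p

def slope_of_F(F, x, eps=1e-6):
    """Central-difference approximation of F'(x)."""
    return (F(x + eps) - F(x - eps)) / (2 * eps)

def late_excursion(a, b, m, xs, kick=1e-3, T=600.0, h=0.01, window=150.0):
    """Start at (x* + kick, F(x*)), integrate (13) with RK4, and return the
    largest |x - x*| observed during the last `window` time units."""
    f, F, p = make_model(a, b, m, xs)
    field = lambda x, y: (f(x) * (F(x) - y), y * p(x))
    x, y, worst = xs + kick, F(xs), 0.0
    for n in range(int(round(T / h))):
        k1 = field(x, y)
        k2 = field(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
        k3 = field(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
        k4 = field(x + h * k3[0], y + h * k3[1])
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        if (n + 1) * h > T - window:
            worst = max(worst, abs(x - xs))
    return worst

if __name__ == "__main__":
    for xs, kick in ((0.5, 0.2), (0.1, 1e-3)):
        _, F, _ = make_model(0.5, 4.0, 1.0, xs)
        print(xs, slope_of_F(F, xs), late_excursion(0.5, 4.0, 1.0, xs, kick=kick))
```

In the first run a fairly large perturbation dies out, while in the second run a small perturbation grows into sustained oscillations, as the Rosenzweig–MacArthur criterion predicts.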

What About Real Chemostat Conditions?

We have so far mainly studied a limiting case of the chemostat, and obviously this system must contain some information about real chemostat conditions. We therefore return to the study of Eq. (10). In order to avoid technical details, we limit the discussion to the simplest results. Here, too, we introduce new variables

\[ \begin{aligned} x &= \frac{A_1\left(1 - \frac{DB_1}{M_1}\right)}{A_1 M_1 C\left(1 - \frac{DB_1}{M_1}\right) - D}\,X, \\ y &= \frac{A_1\left(1 - \frac{DB_1}{M_1}\right)}{M_2\left(A_1 M_1 C\left(1 - \frac{DB_1}{M_1}\right) - D\right)}\,Y, \\ t &= \left(A_1 M_1 C\left(1 - \frac{DB_1}{M_1}\right) - D\right)T, \end{aligned} \tag{15} \]

that are based on the growth interval of the prey species,

\[ \left[0,\; \frac{A_1 M_1 C\left(1 - \frac{DB_1}{M_1}\right) - D}{A_1\left(1 - \frac{DB_1}{M_1}\right)}\right]. \]

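We also equip (10) with the new parameters

\[ \begin{aligned} a &= \frac{M_2 A_2}{A_1\left(1 - \frac{DB_1}{M_1}\right)}, \\ b &= \frac{A_2 B_2}{A_1\left(1 - \frac{DB_1}{M_1}\right)}\left(A_1 M_1 C\left(1 - \frac{DB_1}{M_1}\right) - D\right), \\ k &= \left(A_1 M_1 C\left(1 - \frac{DB_1}{M_1}\right) - D\right)\cdot\frac{\frac{B_1}{M_1}}{1 - \frac{DB_1}{M_1}}, \\ c &= A_1 B_1 C, \\ m &= \frac{M_2 A_2\left(1 - \frac{DB_2}{M_2}\right)}{A_1\left(1 - \frac{DB_1}{M_1}\right)}, \\ x^{*} &= \frac{D}{\left(A_1 M_1 C\left(1 - \frac{DB_1}{M_1}\right) - D\right)\dfrac{M_2 A_2\left(1 - \frac{DB_2}{M_2}\right)}{A_1\left(1 - \frac{DB_1}{M_1}\right)}}. \end{aligned} \]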

Now, Eq. (10) becomes

\[ \begin{aligned} \dot{x} &= \frac{x(1 - x - y)}{1 + c - k(x + y)} - y\,\frac{ax}{1 + bx}, \\ \dot{y} &= my\,\frac{x - x^{*}}{1 + bx}, \end{aligned} \tag{16} \]

with a, b, c > 0, 0 < m < a, and 0 < x* < 1 (x* < 1 ensures a two-species food-chain). Finally, we have

\[ 0 \le k < c. \tag{17} \]

The special case B1 = 0 gives k = 0 (other parameter combinations giving k = 0 do not make sense) and this case was already considered in section “Gause-Type Models for Population Interaction.” We therefore assume 0 < k < c. For the variables, we have x ≥ 0, y ≥ 0.
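The scaling (15) and the parameter definitions above can be cross-checked by comparing the right-hand sides of Eq. (10) and Eq. (16) at random points. This is a sketch under assumptions: the abbreviations `th` for 1 − DB1/M1 and `rho` for A1M1C(1 − DB1/M1) − D, the function names, and the sampling ranges (chosen so that both quantities are positive) are not part of the text.

```python
import random

def rhs10(X, Y, A1, B1, M1, C, D, A2, B2, M2):
    """Right-hand side of the predator-prey system (10)."""
    u = M1 * C - X - Y / M2                       # equals M1*S on the surface W = 0
    uptake = A1 * X * u / (1.0 + A1 * B1 * u / M1)
    g = A2 * X * Y / (1.0 + A2 * B2 * X)
    return uptake - D * X - g, M2 * g - D * Y

def scaling(A1, B1, M1, C, D, A2, B2, M2):
    """The quantities th, rho and the parameters a, b, c, k, m, x*."""
    th = 1.0 - D * B1 / M1
    rho = A1 * M1 * C * th - D                    # time scale in (15)
    a = M2 * A2 / (A1 * th)
    b = A2 * B2 * rho / (A1 * th)
    k = rho * (B1 / M1) / th
    c = A1 * B1 * C
    m = M2 * A2 * (1.0 - D * B2 / M2) / (A1 * th)
    xs = D / (rho * M2 * A2 * (1.0 - D * B2 / M2) / (A1 * th))
    return th, rho, a, b, c, k, m, xs

def rhs16(x, y, a, b, c, k, m, xs):
    """Right-hand side of Eq. (16)."""
    return (x * (1 - x - y) / (1 + c - k * (x + y)) - a * x * y / (1 + b * x),
            m * y * (x - xs) / (1 + b * x))

def max_mismatch(trials=200, seed=1):
    """Largest discrepancy between (10) after the scaling (15) and (16)."""
    rng, worst = random.Random(seed), 0.0
    for _ in range(trials):
        A1, B1, M1 = rng.uniform(0.5, 2), rng.uniform(0.1, 0.5), rng.uniform(1, 2)
        C, D = rng.uniform(2, 4), rng.uniform(0.1, 0.4)
        A2, B2, M2 = rng.uniform(0.5, 2), rng.uniform(0.1, 1), rng.uniform(0.5, 2)
        th, rho, a, b, c, k, m, xs = scaling(A1, B1, M1, C, D, A2, B2, M2)
        x, y = rng.uniform(0.05, 0.6), rng.uniform(0.05, 0.35)   # inside x + y < 1
        X, Y = rho * x / (A1 * th), M2 * rho * y / (A1 * th)
        dX, dY = rhs10(X, Y, A1, B1, M1, C, D, A2, B2, M2)
        dx = A1 * th * dX / rho ** 2                 # chain rule with t = rho*T
        dy = A1 * th * dY / (M2 * rho ** 2)
        ex, ey = rhs16(x, y, a, b, c, k, m, xs)
        worst = max(worst, abs(dx - ex), abs(dy - ey))
    return worst

if __name__ == "__main__":
    print("largest mismatch:", max_mismatch())
```

For the sampled parameters the mismatch stays at rounding level, and the inequalities m < a and k < c stated with Eqs. (16) and (17) can be checked in the same way.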

With

\[ f(x) = \frac{ax}{1 + bx}, \tag{18} \]

\[ r(x) = \frac{1 + bx}{a}, \tag{19} \]

\[ H(s) = \frac{1 - s}{1 + c - ks}, \tag{20} \]

\[ p(x) = m\,\frac{x - x^{*}}{1 + bx}, \tag{21} \]

it takes the isocline form

\[ \dot{x} = f(x)\left(r(x)H(x + y) - y\right), \qquad \dot{y} = y\,p(x). \tag{22} \]

It is not evident what isocline form to choose when studying the chemostat model. However, some form carrying over key properties of the functions involved must be chosen in order to reduce the algebra needed in the further work. The form suggested here does at least some part of the work. We note that (C-I) f, p, r, and H (for s < (1 + c)/k) are continuously differentiable, (C-II) xf(x) is positive definite, (C-III) H′(s) < 0, H(1) = 0, (C-IV) (x − x*)p(x) is positive definite, (C-V) r(x) > 0, r′(x) > 0, and (C-VI) −f(x) + p(x) < 0 for x* < x < 1. It is not difficult to prove that the choice (18), (19), (20), and (21) meets the criteria (C-I)–(C-VI) and that solutions of Eq. (22) that start in the triangle x ≥ 0, y ≥ 0, x + y ≤ 1 remain there, see Lindström and Cheng (2016). The system (22) has three equilibria, (0, 0), (1, 0), and (x*, y*), with y* implicitly defined by y* = r(x*)H(x* + y*). Its generic Jacobian matrix is given by

\[ J(x, y) = \begin{pmatrix} f'(x)\left(r(x)H(s) - y\right) + f(x)\left(r'(x)H(s) + r(x)H'(s)\right) & f(x)\left(r(x)H'(s) - 1\right) \\ y\,p'(x) & p(x) \end{pmatrix}, \]

with s = x + y as already noted. We see that the origin is a saddle point with eigenvalues f′(0)r(0)H(0) > 0 and p(0) < 0, and that (1, 0) is a saddle point with eigenvalues f(1)r(1)H′(1) < 0 and p(1) > 0. For the last equilibrium at (x*, y*), we get

\[ J(x^{*}, y^{*}) = \begin{pmatrix} f(x^{*})\left(r'(x^{*})H(x^{*} + y^{*}) + r(x^{*})H'(x^{*} + y^{*})\right) & f(x^{*})\left(r(x^{*})H'(x^{*} + y^{*}) - 1\right) \\ y^{*}p'(x^{*}) & 0 \end{pmatrix}. \]

Now, we have

\[ \det J(x^{*}, y^{*}) = -y^{*}p'(x^{*})\,f(x^{*})\left(r(x^{*})H'(x^{*} + y^{*}) - 1\right) > 0, \]

meaning that (x*, y*) cannot be a saddle and that the stability of (x*, y*) is determined by the sign of Tr J(x*, y*) only. Indeed,

\[ \operatorname{Tr} J(x^{*}, y^{*}) = f(x^{*})\left(r'(x^{*})H(x^{*} + y^{*}) + r(x^{*})H'(x^{*} + y^{*})\right). \]

We end by comparing this expression to the derivative of the implicitly defined prey isocline, ỹ(x). From Eq. (22), we have the implicit relation

\[ r(x)H(x + \tilde{y}(x)) - \tilde{y}(x) = 0. \]

The implicit function theorem can be applied and gives

\[ \tilde{y}'(x) = \frac{r'(x)H(x + \tilde{y}(x)) + r(x)H'(x + \tilde{y}(x))}{1 - r(x)H'(x + \tilde{y}(x))}. \]

The denominator of the above expression is positive, and therefore the isocline function is defined for 0 < x < 1. The sign of the derivative of this function is determined by the numerator, and for x = x* it agrees with the sign of Tr J(x*, y*). Consequently, the Rosenzweig and MacArthur (1963) graphical criterion persists under chemostat conditions. Green's theorem can still be used to prove that local stability implies global stability for the chemostat, see Smith and Waltman (1995). However, the uniqueness of limit cycles for the chemostat is still an open question (Kuang 1989). Thus, parameter regimes allowing for multiple limit cycles (or initial value dependent behavior) when the interior equilibrium is unstable might still exist.
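The relation between the slope of the prey isocline and the trace of the Jacobian can be checked numerically for the functions (18)–(21). The sketch below assumes illustrative parameter values (a = 2, b = 3, c = 1, k = 0.5, m = 1, x* = 0.3, which satisfy 0 < k < c and m < a); the bisection solver and all helper names are hypothetical.

```python
a, b, c, k, m, xs = 2.0, 3.0, 1.0, 0.5, 1.0, 0.3    # illustrative values (assumed)

f = lambda x: a * x / (1 + b * x)                   # Eq. (18)
r = lambda x: (1 + b * x) / a                       # Eq. (19)
dr = lambda x: b / a                                # r'(x)
H = lambda s: (1 - s) / (1 + c - k * s)             # Eq. (20)
dH = lambda s: (k - 1 - c) / (1 + c - k * s) ** 2   # H'(s), negative since k < c

def isocline(x, tol=1e-14):
    """Solve r(x)*H(x + y) - y = 0 for y in [0, 1 - x] by bisection."""
    lo, hi = 0.0, 1.0 - x
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if r(x) * H(x + mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def isocline_slope(x):
    """Slope of the prey isocline from the implicit function theorem."""
    s = x + isocline(x)
    return (dr(x) * H(s) + r(x) * dH(s)) / (1 - r(x) * dH(s))

def trace_at_equilibrium():
    """Trace of the Jacobian of (22) at the interior equilibrium (x*, y*)."""
    s = xs + isocline(xs)
    return f(xs) * (dr(xs) * H(s) + r(xs) * dH(s))

if __name__ == "__main__":
    eps = 1e-5
    numeric = (isocline(xs + eps) - isocline(xs - eps)) / (2 * eps)
    print(isocline(xs), numeric, isocline_slope(xs), trace_at_equilibrium())
```

For these values the isocline decreases at x*, and the trace has the same (negative) sign, so the interior equilibrium is locally stable.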

References

Ardito A, Ricciardi P (1995) Lyapunov functions for a generalized Gause-type model. J Math Biol 33:816–828
Brauer F, Castillo-Chávez C (2001) Mathematical models in population biology and epidemiology, volume 40 of Texts in applied mathematics. Springer, New York
Duff GFD (1953) Limit cycles and rotated vector fields. Ann Math 57(1):15–31
Elton CS (1930) Animal ecology and evolution. Clarendon Press, Oxford
Elton CS (1942) Voles, mice and lemmings. Clarendon Press, Oxford
Gause GF (1934) The struggle for existence. The Williams & Wilkins, Baltimore
González-Olivares E, Rojas-Palma A (2011) Multiple limit cycles in a Gause type predator-prey model with Holling type III functional response and Allee effect on prey. Bull Math Biol 73:1378–1397
Guckenheimer J, Holmes P (1983) Nonlinear oscillations, dynamical systems, and bifurcations of vector fields. Springer, Berlin
Hethcote HW (2000) The mathematics of infectious diseases. SIAM Rev 42(4):599–653
Hewitt CG (1921) The conservation of the wild life of Canada. Charles Scribner's Sons, New York
Hirsch MW, Smale S, Devaney RL (2013) Differential equations, dynamical systems, and an introduction to chaos. Academic, Oxford
Hofbauer J, Sigmund K (1988) The theory of evolution and dynamical systems. Cambridge University Press, Cambridge
Hofbauer J, Sigmund K (1998) Evolutionary games and population dynamics. Cambridge University Press, Cambridge
Holling CS (1959) Some characteristics of simple types of predation and parasitism. Can Entomol 91(7):385–398
Kooi BW, Boer MP, Kooijman SALM (1998) On the use of the logistic equation in models of food chains. Bull Math Biol 60:231–246
Kuang Y (1988) Nonuniqueness of limit cycles of Gause-type predator-prey systems. Appl Anal 29:269–287
Kuang Y (1989) Limit cycles in a chemostat related model. SIAM J Appl Math 49(6):1759–1767
Kuang Y, Freedman HI (1988) Uniqueness of limit cycles in Gause-type models of predator–prey systems. Math Biosci 88:67–84
LaSalle JP (1960) Some extensions of Lyapunov's second method. IRE Trans Circuit Theory CT7:520–527
Lindström T (1993) Qualitative analysis of a predator-prey system with limit cycles. J Math Biol 31:541–561
Lindström T, Cheng Y (2015) Uniqueness of limit cycles for a limiting case of the chemostat: does it justify the use of logistic growth rates. Electron J Qual Theory Differ Equ 47:1–14. http://www.math.u-szeged.hu/ejqtde
Lindström T, Cheng Y (2016) A Rosenzweig–MacArthur (1963) criterion for the chemostat. Sci World J 2016:1–6. https://doi.org/10.1155/2016/5626980
Lotka AJ (1925) Elements of physical biology. Williams and Wilkins, Baltimore
Metz JAJ, Diekmann O (1986) The dynamics of physiologically structured populations. Springer, Berlin
Nisbet RM, Gurney WSC (1982) Modelling fluctuating populations. The Blackburn Press, Caldwell
Rosenzweig ML, MacArthur RH (1963) Graphical representation and stability conditions of predator–prey interactions. Am Nat 97:209–223
Smith HL, Waltman P (1995) The theory of the chemostat: dynamics of microbial competition. Cambridge University Press, Cambridge
Volterra V (1926) Variazioni e fluttuazioni del numero d'individui in specie animali conviventi. Mem R Accad Natl Lincei 6(2):31–113
Wiggins S (2003) Introduction to applied nonlinear dynamical systems and chaos, 2nd edn. Springer, New York
Ye Y-Q et al (1986) Theory of limit cycles, 2nd edn. American Mathematical Society, Providence
Zhang Z-f (1986) Proof of the uniqueness theorem of limit cycles of generalized Liénard equations. Appl Anal 29:63–76

86 Limit Cycles in Planar Systems of Ordinary Differential Equations

Torsten Lindström

Contents
Introduction
Planar Linear and Linearized Systems
First Integral Systems and Gradient Systems
Monotone Dynamics
Index Theory
The Complex Plane
The Existence of Limit Cycles
The Liénard (1928) Equation
Theorems for Absence of Limit Cycles
Uniqueness of Limit Cycles
Summary
References

Abstract

The idea of a dynamical system is to predict the future of a given system from some initial conditions. If the dynamical system is formulated as a differential equation, then there is usually a direct relation between the dynamical system and the processes involved. Today, we can easily say that dynamical systems can predict a huge number of phenomena, including chaos. The real question is, therefore, not whether complicated phenomena may occur, but whether restrictions on the possible dynamics exist. In this chapter, we commence with major theorems that are frequently used for justifying phase space analysis. We continue with simple examples that possess limit cycles and with classes of differential equations that never possess limit cycles. We end with the ideas behind two major theorems that bound the number of limit cycles from above: Sansone's (Annali di Matematica Pura ed Applicata, Serie IV 28:153–181, 1949) theorem and Zhang's (Appl Anal 29:63–76, 1986) theorem. Both theorems apply to systems that have a clear mechanistic interpretation. We outline the major arguments behind the quite precise estimates used in these theorems and describe their differences. Our objective is not to formulate these theorems in their most general form, but we give references to recent extensions.

T. Lindström, Department of Mathematics, Linnæus University, Växjö, Sweden
© Springer Nature Switzerland AG 2021. B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_34

Keywords Limit cycle · Global stability · Hamiltonian system · Holomorphic map · Phase portrait

Introduction

Let us consider a system of ordinary differential equations of the form

\[ \dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}), \tag{1} \]

with x ∈ R^n and f ∈ C^1(R^n). The next theorem (Perko 2001) implies that f ∈ C^1 is sufficient for uniqueness and existence of the solutions.

Theorem 1 (Global existence theorem). For f ∈ C^1(R^n) and for each x0 ∈ R^n, the initial value problem

\[ \dot{\mathbf{x}} = \frac{\mathbf{f}(\mathbf{x})}{1 + |\mathbf{f}(\mathbf{x})|}, \quad \mathbf{x}(0) = \mathbf{x}_0 \tag{2} \]

has a unique solution x(t) defined for all t ∈ R, i.e., (2) defines a dynamical system on R^n and, furthermore, (2) is topologically equivalent to (1).

Since this presentation is entirely two-dimensional (n = 2), we also write

\[ \dot{x} = P(x, y), \qquad \dot{y} = Q(x, y) \tag{3} \]

with x ∈ R, y ∈ R, and P, Q ∈ C^1(R^2) instead of (1). The Poincaré–Bendixson theorem (Hirsch et al. 2013) states that the most complicated limiting behavior of (3) is periodic. It is a strong theorem, since it rules out the possibility of chaos. We formulate it as follows.

Theorem 2 (Poincaré–Bendixson). All non-empty compact limit sets of (3) containing no equilibria are closed orbits.

A more precise formulation of the above theorem catalogues the possibilities for limiting behavior that includes fixed points, too; see, e.g., Perko (2001) or Wiggins


(2003). There are extensions of this theorem to other two-dimensional manifolds, e.g., the sphere S^2. It cannot be extended to all two-dimensional manifolds. The torus T^2 (see Wiggins 2003) provides a counterexample. The reason is that objects homeomorphic to a circle do not necessarily divide the torus into two pieces. Some higher-dimensional extensions exist, too. There are, for instance, extensions to three-dimensional monotone dynamical systems (Smith 1995) and even to infinite-dimensional delay equations of monotone cyclic feedback type; see Mallet-Paret and Sell (1996) and Smith (2011).

One may construct planar systems that give rise to limit cycles in order to provide examples. Consider, e.g., the system

\[ \dot{r} = r(1 - r), \qquad \dot{\theta} = 1 \tag{4} \]

in polar coordinates. It is easy to see that the system has an isolated periodic orbit (a limit cycle) at r = 1, which is stable. Since the radial equation is simple enough, separates, and is logistic, the stability of the limit cycle can indeed be verified by explicit computation of the Poincaré map r2π (r0 ) =

r0 e2π 1 − r0 + r0 e2π

and its associated derivative at r0 = 1,  (1) = e−2π < 1. 0 < r2π

(5)

For later purposes, we note that the divergence of the vector field (4) is given by div(r(1 − r), 1) =

1 ∂(r · r(1 − r)) 1 ∂ + 1 = 2 − 3r r ∂r r ∂θ

in polar coordinates. It takes the value −1 along the unit circle. Integrated along the unit circle, it gives the exponent  0



(−1)dt = [−t]2π 0 = −2π.

of the multiplier (5). A method for computing the stability of a given limit cycle is, therefore, to estimate the sign of the divergence integrated along the limit cycle; see, e.g., Grimshaw (1993) and Perko (2001). This technique is directly related to the derivative of the Poincaré map, and the corresponding stability theorem is formulated below. We shall return to this method at a later stage.

Theorem 3 (Poincaré criterion about stability). Assume $n = 2$. Let $\Gamma$ be a periodic solution of (1) with period $T$. Consider the Poincaré map $P$ locally along a straight line normal to the periodic solution $\Gamma$, and let its variable $s$ be the signed distance from $\Gamma$ along the straight line so that for small $s$, $s < 0$ if we are inside $\Gamma$


T. Lindström

and $s > 0$ if we are outside $\Gamma$. Then $P$ has a fixed point at $s = 0$ corresponding to the periodic solution $\Gamma$, and the derivative of the Poincaré map at this fixed point is given by

$$P'(0) = \exp\left(\int_0^T \nabla \cdot \mathbf{f}(\Gamma(t))\,dt\right).$$
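The relation between the multiplier (5) and the divergence integral in Theorem 3 can be checked numerically for the system (4). The following is a minimal sketch using only the Python standard library; the function names `poincare_map`, `multiplier_from_map`, and `multiplier_from_divergence` are illustrative choices and do not come from the text.

```python
import math

def poincare_map(r0, period=2 * math.pi):
    """Closed-form return map of r' = r(1 - r) after one revolution (theta' = 1)."""
    e = math.exp(period)
    return r0 * e / (1.0 - r0 + r0 * e)

def divergence(r):
    """Divergence of the vector field (4) in polar coordinates: 2 - 3r."""
    return 2.0 - 3.0 * r

def multiplier_from_divergence(n_steps=1000):
    """exp of the divergence integrated along the cycle r = 1 over one period."""
    h = 2 * math.pi / n_steps
    integral = sum(divergence(1.0) * h for _ in range(n_steps))
    return math.exp(integral)

def multiplier_from_map(h=1e-6):
    """Central finite difference of the return map at its fixed point r0 = 1."""
    return (poincare_map(1.0 + h) - poincare_map(1.0 - h)) / (2.0 * h)

if __name__ == "__main__":
    print(multiplier_from_divergence(), multiplier_from_map(), math.exp(-2 * math.pi))
```

Both estimates should agree with $e^{-2\pi}$ up to discretization and rounding error, which is smaller than one and hence consistent with a stable limit cycle.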

We end this section by concluding that similar techniques can be used for the analysis of all systems of type

$$\dot{x} = -y + x f\left(\sqrt{x^2 + y^2}\right), \qquad \dot{y} = x + y f\left(\sqrt{x^2 + y^2}\right)$$

since they take the form

$$\dot{r} = r f(r), \qquad \dot{\theta} = 1 \tag{6}$$

in polar coordinates. Indeed, we have $f(r) = 1 - r$ in (4) above. It is, therefore, possible to arrange for an arbitrary number of stable and unstable limit cycles of any multiplicity around the origin for such systems. An explicit computation of the Poincaré map might not be possible or might not provide the information needed, but sign analysis of the involved function $f$ still contains the qualitative information. The key problem here is that, although the class of systems considered here provides complete information about its limit cycles, it usually cannot be used for making statements about the appearance of limit cycles in systems of any practical importance. The class considered is far too special.

We proceed to classes of systems that are well-known, but do not possess limit cycles. The resulting catalogue of classes of systems without limit cycles shows that systems possessing limit cycles have special properties and require special analysis.
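The sign analysis of $f$ for systems of the form (6) can be sketched as follows. This is an illustrative Python fragment, not part of the original presentation: the helper `classify_cycle` and the sample functions are assumptions, and the classification simply compares the sign of $\dot{r} = r f(r)$ just inside and just outside a positive zero $r_*$ of $f$.

```python
def classify_cycle(f, r_star, h=1e-4):
    """Classify the circle r = r_star (f(r_star) = 0, r_star > 0) for r' = r f(r).

    The sign of r' equals the sign of f(r) for r > 0, so it is enough to
    sample f slightly inside and slightly outside the circle.
    """
    inside = f(r_star - h)
    outside = f(r_star + h)
    if inside > 0 > outside:
        return "stable"      # orbits move outward inside and inward outside
    if inside < 0 < outside:
        return "unstable"    # orbits leave the circle on both sides
    return "semistable"      # same direction of motion on both sides

if __name__ == "__main__":
    f = lambda r: (1.0 - r) * (2.0 - r)   # hypothetical f with zeros at r = 1, 2
    print(classify_cycle(f, 1.0), classify_cycle(f, 2.0))
```

With the hypothetical choice $f(r) = (1 - r)(2 - r)$ the circle $r = 1$ is classified as stable and $r = 2$ as unstable, while $f(r) = (1 - r)^2$ yields a semistable cycle of multiplicity two at $r = 1$.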

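Planar Linear and Linearized Systems

All planar linear systems of ordinary differential equations are linearly equivalent to one of the canonical forms

$$\begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \tag{7}$$

$$\begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \quad \text{and} \tag{8}$$

$$\begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \tag{9}$$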


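The cases correspond to (7) existence of two real eigenvectors, (8) existence of a single real eigenvector, and (9) existence of two complex eigenvectors. The solutions of these three cases are given in terms of fundamental matrices by

$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} e^{\lambda_1 t} & 0 \\ 0 & e^{\lambda_2 t} \end{pmatrix} \begin{pmatrix} x_0 \\ y_0 \end{pmatrix},$$

$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} e^{\lambda t} & t e^{\lambda t} \\ 0 & e^{\lambda t} \end{pmatrix} \begin{pmatrix} x_0 \\ y_0 \end{pmatrix}, \quad \text{and}$$

$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} e^{\alpha t} \cos \beta t & -e^{\alpha t} \sin \beta t \\ e^{\alpha t} \sin \beta t & e^{\alpha t} \cos \beta t \end{pmatrix} \begin{pmatrix} x_0 \\ y_0 \end{pmatrix},$$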

respectively. Hence, linear systems cannot produce non-constant periodic solutions unless the real part of the eigenvalues of the defining matrix is equal to zero. In this case all solutions are periodic. There cannot exist any isolated non-constant periodic solutions in linear systems, so there are no limit cycles. We conclude that linearization techniques based on the Hartman-Grobman theorem cannot provide any information about limit cycles either.
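The conclusion for linear systems can be phrased as a trace-determinant test: non-constant periodic solutions require purely imaginary, nonzero eigenvalues, i.e., zero trace and positive determinant, and then every nonzero solution is periodic. A minimal sketch, with the hypothetical helper name `classify_linear` and an exact comparison of the trace with zero that is only appropriate for exactly represented coefficients:

```python
def classify_linear(a, b, c, d):
    """Classify x' = a x + b y, y' = c x + d y with respect to periodic solutions."""
    trace = a + d
    det = a * d - b * c
    if trace == 0 and det > 0:
        # Eigenvalues are +/- i*sqrt(det): a whole family of closed orbits,
        # so no periodic orbit is isolated.
        return "center"
    return "no periodic orbits"

if __name__ == "__main__":
    print(classify_linear(0.0, -1.0, 1.0, 0.0))    # canonical form (9) with alpha = 0
    print(classify_linear(-0.1, -1.0, 1.0, -0.1))  # canonical form (9) with alpha < 0
```

In neither case does a limit cycle occur, in line with the statement above.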

First Integral Systems and Gradient Systems

Another interesting category of systems consists of those possessing a first integral. This class includes, for instance, systems that can be transformed into the Hamiltonian form

$$\dot{x} = \frac{\partial H}{\partial y}(x, y), \qquad \dot{y} = -\frac{\partial H}{\partial x}(x, y), \tag{10}$$

where $H : \mathbb{R}^2 \to \mathbb{R}$ is $C^2(\mathbb{R}^2)$. We now differentiate $H$ with respect to time along the solution curves of (10) to get

$$\dot{H} = \frac{\partial H}{\partial x}\dot{x} + \frac{\partial H}{\partial y}\dot{y} = \frac{\partial H}{\partial x}\frac{\partial H}{\partial y} - \frac{\partial H}{\partial y}\frac{\partial H}{\partial x} = 0.$$

Thus, the $C^2$ function $H$ involved here is a constant of motion or a first integral. It is constant along every solution curve. In this context, it is most important to note that systems possessing first integrals cannot have limit cycles. All exceptions are non-generic.

Theorem 4. Let $H$ be a first integral of a planar system of ordinary differential equations. If $H$ is not constant on any open set, then there are no limit cycles.


Proof. If there is a limit cycle $\Gamma$, then $H$ must be constant, say $H(\Gamma) = c$ along this periodic solution. A limit cycle is an isolated periodic solution, so nearby solutions must approach it either forward in time or backward in time. Since $H$ is continuous, $H$ evaluated along those solutions must equal $c$, too. Thus, $H$ is constant on an open set around $\Gamma$. □

Example 1. The system

$$\dot{x} = -y, \qquad \dot{y} = x$$

is linear and has the first integral $H(x, y) = x^2 + y^2$. It can therefore not have limit cycles by both reasons mentioned so far.

Example 2. The Lotka (1925); Volterra (1926) system

$$\frac{dX}{dT} = X(Y_* - Y), \qquad \frac{dY}{dT} = Y(X - X_*), \tag{11}$$

$X \geq 0$, $Y \geq 0$, is not Hamiltonian, but it has the first integral

$$W(X, Y) = \int_{X_*}^{X} \frac{X' - X_*}{X'}\,dX' + \int_{Y_*}^{Y} \frac{Y' - Y_*}{Y'}\,dY'$$

for $X > 0$, $Y > 0$. However, on $X > 0$, $Y > 0$, the transformation of the independent variable

$$\frac{dT}{dt} = \frac{1}{XY}, \qquad X > 0,\ Y > 0 \tag{12}$$

brings it to the Hamiltonian form

$$\dot{X} = \frac{Y_* - Y}{Y}, \qquad \dot{Y} = \frac{X - X_*}{X}$$

with respect to the continuous function $W(X, Y)$, $X > 0$, $Y > 0$. In general, it is difficult to characterize first integral systems that are not Hamiltonian (see, e.g., Dumortier et al. 2006).
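That $W$ is constant along solutions of (11) can be illustrated numerically. The sketch below integrates (11) with a hand-written classical Runge-Kutta step and monitors the closed form $W(X, Y) = X - X_* - X_* \ln(X/X_*) + Y - Y_* - Y_* \ln(Y/Y_*)$ obtained by evaluating the two integrals. The values $X_* = Y_* = 1$, the initial point, the step size, and all function names are assumptions made for illustration only.

```python
import math

X_STAR, Y_STAR = 1.0, 1.0  # assumed equilibrium coordinates (illustrative)

def rhs(X, Y):
    """Right-hand side of the Lotka-Volterra system (11)."""
    return X * (Y_STAR - Y), Y * (X - X_STAR)

def first_integral(X, Y):
    """Closed form of W(X, Y) for X > 0, Y > 0."""
    return (X - X_STAR - X_STAR * math.log(X / X_STAR)
            + Y - Y_STAR - Y_STAR * math.log(Y / Y_STAR))

def rk4_step(X, Y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(X, Y)
    k2 = rhs(X + 0.5 * h * k1[0], Y + 0.5 * h * k1[1])
    k3 = rhs(X + 0.5 * h * k2[0], Y + 0.5 * h * k2[1])
    k4 = rhs(X + h * k3[0], Y + h * k3[1])
    return (X + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            Y + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

def max_drift(X0=2.0, Y0=1.0, h=1e-3, n_steps=10000):
    """Largest deviation of W from its initial value along the computed orbit."""
    W0 = first_integral(X0, Y0)
    X, Y, drift = X0, Y0, 0.0
    for _ in range(n_steps):
        X, Y = rk4_step(X, Y, h)
        drift = max(drift, abs(first_integral(X, Y) - W0))
    return drift

if __name__ == "__main__":
    print(max_drift())
```

The reported drift should be at the level of the integration error only, which reflects that each orbit stays on a level curve of $W$ and therefore cannot spiral toward an isolated cycle.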


We now assume that our system is defined as the negative gradient of a function $H(\mathbf{x})$ that belongs to $C^2(\mathbb{R}^n)$. Then

$$\dot{\mathbf{x}} = -\nabla H,$$

i.e., the direction of the phase curves is perpendicular to the Hamiltonian flow defined by $H$ (Perko 2001). In order to make general statements concerning the dynamics of this gradient system, we refer to a Lyapunov function argument based on the following theorem.

Theorem 5 (LaSalle’s (1960) invariance principle). Consider (1) and let $V(\mathbf{x})$ be a scalar $C^1$ function. Assume that (i) $V(\mathbf{x})$ is positive definite and that (ii) $\dot{V}(\mathbf{x})$ is negative semidefinite. Let $E$ be the set of points where $\dot{V}(\mathbf{x}) = 0$ and let $M$ be the largest invariant set contained in $E$. Then every solution of (1) bounded for $t \geq 0$ approaches $M$ as $t \to \infty$.

We can then use $H$ as the scalar function mentioned above and get

$$\dot{H} = -\|\nabla H\|^2 \leq 0.$$

This means that $H$ decreases along the orbits of a gradient system. According to LaSalle’s invariance principle (LaSalle 1960), the flow will reach the largest invariant set where $\dot{H} = 0$. But this set consists exactly of the equilibrium points of the gradient system, and we can reverse time in order to get the unstable equilibria. Therefore, a gradient system cannot possess limit cycles. We end this section by mentioning that recent extensions of Lyapunov’s method and LaSalle’s invariance principle are available in Michel et al. (2015).

Monotone Dynamics

Another category of systems that has received a lot of attention recently consists of systems producing monotone dynamics (Smith 1995). That is, they preserve some order relation (for instance, the order relation defined by the positive cone) when we follow the orbits either forward in time (cooperative systems) or, when possible, backward in time (competitive systems). In many cases it is difficult to ensure that such an order preserving property actually holds and, therefore, the Kamke (1930, 1932), Müller (1927a, b), or quasimonotone (Hadeler and Glas 1983) conditions are usually used for ensuring this property. They state that if all the off-diagonal elements of the generic Jacobian are positive, then the given ordinary differential equation generates a monotone flow.


Example 3. Consider (3) in $\mathbb{R}^2$ with $P, Q \in C^1$ and

$$\frac{\partial P}{\partial y} > 0 \quad \text{and} \quad \frac{\partial Q}{\partial x} > 0.$$

Assume now that $\dot{x}(t_0) > 0$, $\dot{y}(t_0) > 0$, and $\dot{x}(t) = 0$ for some $t \geq t_0$. Then

$$\ddot{x}(t) = \frac{\partial P}{\partial x}\dot{x} + \frac{\partial P}{\partial y}\dot{y} > 0,$$

meaning that $\dot{x}$ cannot become negative. Similar arguments apply to the other coordinates and quadrants. The solutions are either monotone or eventually monotone; see Hofbauer and Sigmund (1998). This means that the solutions tend to either infinity or an equilibrium. The system can therefore not have any attracting limit cycles.

The above argument shows that forward monotone (cooperative) dynamical systems cannot have attracting limit cycles and that backward monotone (competitive) dynamical systems cannot have repelling limit cycles. In our two-dimensional case, it is possible to say more: Systems with monotone dynamics can always be reduced to systems with one dimension less than the given system. A two-dimensional system with monotone dynamics is one-dimensional; see, e.g., Smith (1995). Thus, forward monotone and backward monotone two-dimensional ordinary differential equations cannot have limit cycles. By a similar argument, the Poincaré-Bendixson theorem holds for three-dimensional monotone systems of ordinary differential equations as stated earlier; see Smith (1995).

Index Theory

Our examples started from the system (4), and in the next step, we proceeded to more general systems like (6). These systems had just one fixed point at the origin, and it was clear that possible limit cycles had to circumvent that point. If the system has several equilibria or more complicated equilibria, it is useful to have some simple rules for stating whether limit cycles are possible or not. We define the index of a piecewise smooth, simple, closed curve (a Jordan curve) relative to a vector field; see Perko (2001).

Definition 1. The index $I_{P,Q}(\gamma)$ of a Jordan curve $\gamma$ with respect to the $C^1$ vector field (3) on $\mathbb{R}^2$, where (3) has no critical point on $\gamma$, is defined as the integer

$$I_{P,Q}(\gamma) = \frac{\Delta\Theta}{2\pi},$$

where $\Delta\Theta$ is the total change in the angle $\Theta$ that the vector $(P, Q)$ makes with respect to the $x$-axis, i.e., $\Delta\Theta$ is the change in

$$\Theta(x, y) = \arctan \frac{Q(x, y)}{P(x, y)}$$

as the point $(x, y)$ traverses $\gamma$ exactly once in positive direction.

It turns out that
(1) if there are no fixed points in the interior of $\gamma$, then $I_{P,Q}(\gamma) = 0$.
(2) If the interior of $\gamma$ contains just one hyperbolic fixed point, then $I_{P,Q}(\gamma) = -1$ if the fixed point is a saddle and $I_{P,Q}(\gamma) = 1$, otherwise.
(3) $I_{P,Q}(\gamma)$ depends on the number and types of fixed points in the interior of $\gamma$ only; we may talk about indexes of fixed points rather than Jordan curves.
(4) If the interior of $\gamma$ contains a finite number of fixed points, then $I_{P,Q}(\gamma)$ is the sum of the indexes of the fixed points that are contained in $\gamma$.
(5) The index of a limit cycle is 1.
(6) If a finite set of fixed points exist, then the sum of their indexes and the index of the infinity point equals 2 (the Euler characteristic of a sphere).

The fact that the index of a limit cycle is 1 excludes the possibility that limit cycles could surround any constellation of fixed points. In fact, if the limit cycle contains just hyperbolic fixed points, it contains at least one non-saddle, and if it contains saddles, it must contain an additional non-saddle for each contained saddle. An explicit example with a limit cycle surrounding one non-saddle is given by (4); an explicit example with a limit cycle surrounding two non-saddles and one saddle is given in Lindström (1993). The next theorem is quite useful for computing the indexes of fixed points (cf. Jordan and Smith 1990).

Theorem 6. Consider the system (3) and traverse the Jordan curve $\gamma$ once in counterclockwise direction. Let $p$ be the number of changes in $Q(x, y)/P(x, y)$ from $+\infty$ to $-\infty$ along $\gamma$, and let $q$ be the number of changes in $Q(x, y)/P(x, y)$ from $-\infty$ to $+\infty$ along $\gamma$. Then $I_{P,Q} = \frac{1}{2}(p - q)$.

Example 4. We compute the index of the origin of the system

$$\dot{x} = 2xy, \qquad \dot{y} = 3x^2 - y^2. \tag{13}$$

We stretch the theory a bit and use the unit square as a Jordan curve. We begin with the segment $y = 1$, $-1 \leq x \leq 1$, and there we have $Q/P = (3x^2 - 1)/(2x)$. Counterclockwise, we have one transition from $-\infty$ to $+\infty$. A similar transition is found on the segment $y = -1$, $-1 \leq x \leq 1$. For the segment $x = -1$, $-1 \leq y \leq 1$, we have $Q/P = (3 - y^2)/(-2y)$. Thus, one transition from $-\infty$ to $+\infty$ occurs, and a similar transition occurs counterclockwise for the segment $x = 1$, $-1 \leq y \leq 1$. Hence $p = 0$ and $q = 4$ in Theorem 6, and the index of the origin is $\frac{1}{2}(0 - 4) = -2$ for (13).
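The index in Definition 1 can also be estimated numerically as a winding number. The sketch below is an illustration under stated assumptions: it samples the vector field on a circle rather than on the square used in Example 4, accumulates the wrapped increments of the angle of $(P, Q)$, and divides by $2\pi$. The helper name `index_on_circle` and the sampling parameters are hypothetical.

```python
import math

def index_on_circle(P, Q, radius=1.0, n=2000, center=(0.0, 0.0)):
    """Winding number of the field (P, Q) along a counterclockwise circle.

    Assumes that (P, Q) has no zero on the circle and that n is large
    enough for consecutive angle increments to stay below pi in size.
    """
    total = 0.0
    previous = None
    for k in range(n + 1):
        t = 2.0 * math.pi * k / n
        x = center[0] + radius * math.cos(t)
        y = center[1] + radius * math.sin(t)
        angle = math.atan2(Q(x, y), P(x, y))
        if previous is not None:
            increment = angle - previous
            # Wrap the increment into [-pi, pi) to undo jumps of atan2.
            increment = (increment + math.pi) % (2.0 * math.pi) - math.pi
            total += increment
        previous = angle
    return round(total / (2.0 * math.pi))

if __name__ == "__main__":
    # System (13): x' = 2xy, y' = 3x^2 - y^2
    print(index_on_circle(lambda x, y: 2 * x * y, lambda x, y: 3 * x * x - y * y))
```

For the system (13) the routine returns the same value as the hand computation with Theorem 6, and it returns $-1$ for the linear saddle $\dot{x} = x$, $\dot{y} = -y$ and $1$ for the node $\dot{x} = x$, $\dot{y} = y$.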


The Complex Plane

In this section we consider the complex first-order differential equation

$$\dot{z} = f(z), \qquad z \in \mathbb{C},\ t \in \mathbb{R}, \tag{14}$$

where $f$ is analytic in $\mathbb{C}$, except, possibly, at isolated singularities; see Garijo et al. (2007). The family of complex functions that meet these criteria is rather general. Examples are polynomial, rational, entire, and meromorphic functions. Functions with isolated essential singularities are included. The change of variables $z = x + iy$ describes the relation between (14) in the complex plane $\mathbb{C}$ and (3) in the real plane $\mathbb{R}^2$. Our objective is to elucidate the role of the Cauchy-Riemann equations (cf. Saff and Snider 2003) in the context of limit cycles. Indeed, (14) cannot possess limit cycles.

Theorem 7. Consider (14) and assume that $f$ is analytic in $\mathbb{C}$ except at isolated singularities. Let $\Gamma$ be a periodic orbit with period $T$ in (14). Then all orbits in an open set containing $\Gamma$ are periodic, and they share the period $T$.

Proof. We first prove that all periodic orbits in a neighborhood of the assumed periodic orbit inherit its period $T$. Assume that there is a periodic orbit $\Gamma'$ in a neighborhood of $\Gamma$ with period $T'$. We have

$$T' = \int_0^{T'} dt = \int_0^{T'} \frac{\dot{z}(t)}{f(z(t))}\,dt = \oint_{\Gamma'} \frac{1}{f(z)}\,dz = T.$$

The second equality holds by (14), and the last equality holds by the residue theorem (Saff and Snider 2003). Therefore, the periods of possible periodic orbits of (14) are determined by the sum of residues of $1/f(z)$ over the zeroes of $f$ surrounded by $\Gamma$.

We continue proving that there are no isolated periodic orbits in (14). We introduce the new independent variable

$$\frac{dT}{dt} = \frac{1}{f(z)\overline{f(z)}}$$

that brings (14) into the conjugate analytic form

$$\dot{z} = \frac{1}{\overline{f(z)}} \tag{15}$$

wherever the transformation in question is defined. The conjugate Cauchy-Riemann equations imply now that the vector field is a divergence free Hamiltonian vector field and that its Hamiltonian is harmonic (Sverdlove 1979).


If $f$ does not have zeroes, (14) cannot possess limit cycles by index theory, and we can, thus, assume that the right hand side of (15) has at least one singularity that is not removable. Consider now the connected open set (region) in $\mathbb{C}$ where the harmonic Hamiltonian of (15) is defined outside its singularity. It might be multiply defined, so that a branch needs to be chosen. Choosing a branch does not remove connectedness. Since the singularity is non-removable, the harmonic Hamiltonian cannot be constant in a neighborhood of the singularity. The mean value property for harmonic functions (Conway 1973) excludes the possibility that it is constant on any open subset of the region where it was defined. Theorem 4 implies that there are no limit cycles. □

The above exclusion of isolated periodic orbits based on properties of harmonic functions is different from the one in Garijo et al. (2007) that uses properties of Lie brackets instead. We continue with a number of simple examples.

Example 5. We return to Example 1. It can be written as $\dot{z} = iz$ in $\mathbb{C}$, and its corresponding conjugate analytic system is given by $\dot{z} = -1/(i\bar{z})$ or

$$\dot{x} = \frac{-y}{x^2 + y^2}, \qquad \dot{y} = \frac{x}{x^2 + y^2}. \tag{16}$$

The first integral $H(x, y) = x^2 + y^2$ given in Example 1 is not harmonic but $H_*(x, y) = \log(x^2 + y^2)$ is. We get

$$\dot{H}_* = \frac{2x \cdot (-y)}{(x^2 + y^2)^2} + \frac{2y \cdot x}{(x^2 + y^2)^2} = 0$$

along the orbits of (16).

Example 6. We compare the result to the system $\dot{z} = z$ that has the conjugate analytic form

$$\dot{x} = \frac{x}{x^2 + y^2}, \qquad \dot{y} = \frac{y}{x^2 + y^2}. \tag{17}$$

If we choose to place the branch-cut on the ray $y < 0$, $x = 0$, a harmonic Hamiltonian takes in this case the form

$$H_*(x, y) = \begin{cases} \arctan(y/x), & x > 0, \\ \pi/2, & x = 0,\ y > 0, \\ \arctan(y/x) + \pi, & x < 0. \end{cases}$$

[...]

$$(x^2 + y^2)^2 - (x^4 + y^4) = x^4 + 2x^2y^2 + y^4 - x^4 - y^4 = 2x^2y^2 > 0,$$

if $x^2 + y^2 < 1$. Further, $(x^2 - y^2)^2 \geq 0$ implies $x^4 + y^4 \geq 2x^2y^2$. We get $2(x^4 + y^4) \geq (x^2 + y^2)^2 > 2(x^2 + y^2)$ for $x^2 + y^2 > 2$. Therefore, $\dot{V} < 0$ for $x^2 + y^2 > 2$ and the region $1 \leq x^2 + y^2 \leq 2$ contains at least one limit cycle.

The Liénard (1928) Equation

We have so far dealt with more or less artificially constructed equations. We now turn our attention to an equation that is strongly connected to applications. The most widely used system of equations in this context is the Liénard (1928) equation

$$\ddot{x} + f(x)\dot{x} + g(x) = 0. \tag{18}$$

It models a particle subject to friction and a potential and has, thus, an obvious physical meaning. There are two ways to describe this equation in the phase plane. One may use either the standard phase plane

$$\dot{x} = y, \qquad \dot{y} = -f(x)y - g(x), \tag{19}$$

or the Liénard plane given by

$$\dot{x} = y - F(x), \qquad \dot{y} = -g(x) \tag{20}$$

with $F(x) = \int_0^x f(x')\,dx'$. The Liénard plane has a large number of advantages. One of them is that the $C^1$-condition for existence and uniqueness in Theorem 1 is posed on $F$ and not on its derivative $f$. We are going to study (20) under the following general conditions:

(A-I) $F \in C^1(\mathbb{R})$ and $F(0) = 0$.
(A-II) $g \in C^1(\mathbb{R})$ and $xg(x) > 0$, $x \neq 0$, and with $G(x) = \int_0^x g(u)\,du$, we have $G(-\infty) = G(\infty) = \infty$.

Remark 1. If $F$ is defined as $F(x) = \int_0^x f(x')\,dx'$ through the Liénard transformation from (18), then assumption $F(0) = 0$ is redundant. It is needed only when (20) is considered separately from (18).

We first note that we can arrange some additional symmetry in the system by a variable translation, i.e., we replace the potential with a standard one.

Lemma 1. Assume (A-I)-(A-II). System (20) is topologically equivalent to

$$\xi' = \eta - F(\xi), \qquad \eta' = -\xi.$$

Proof. The relevant variable transformation is

$$\xi = \operatorname{sgn}(x)\sqrt{2G(x)}, \qquad \eta = y \qquad \text{and} \qquad \frac{dt}{d\tau} = \frac{\sqrt{2G(x)}}{\operatorname{sgn}(x)g(x)}.$$

It is a homeomorphism because of (A-II). We get

$$\xi' = \frac{d\xi}{dx}\frac{dx}{dt}\frac{dt}{d\tau} = \frac{\operatorname{sgn}(x)g(x)}{\sqrt{2G(x)}}\,(y - F(x))\,\frac{\sqrt{2G(x)}}{\operatorname{sgn}(x)g(x)} = \eta - F(x(\xi)),$$

$$\eta' = \frac{d\eta}{dy}\frac{dy}{dt}\frac{dt}{d\tau} = -g(x)\,\frac{\sqrt{2G(x)}}{\operatorname{sgn}(x)g(x)} = -\xi.$$

The function $x(\xi)$ above is the inverse transformation of the newly introduced variable $\xi$.

□

Lemma 1 implies that all results that are proved for the system

$$\dot{x} = y - F(x), \qquad \dot{y} = -x \tag{21}$$

can be translated to (20), too, provided (A-II) holds. It follows from (A-I) that the origin is the only fixed point of (21). The trace-determinant criterion gives immediately that it is a non-saddle which is stable if $f(0) > 0$ and unstable if $f(0) < 0$. Now we use the Poincaré-Bendixson theorem in order to formulate conditions for closed orbits for the Liénard equation.

Theorem 8 (Dragilev (1952)). Assume (i) $F \in C^1(\mathbb{R})$, (ii) $xF(x) < 0$, $x \neq 0$ in some neighborhood of the origin, and that (iii) there exist constants $X_* > 0$, $K_-$, $K_+$, with $K_- < K_+$ and $F(x) < K_-$ for $x < -X_*$ and $F(x) > K_+$ for $x > X_*$. Then the system (20) has at least one closed orbit.

Remark 2. The most recent extension (to our knowledge) of this theorem is given in Cioni and Villari (2015).

Proof. Consider the Lyapunov functional

$$V(x, y) = \frac{1}{2}(x^2 + y^2). \tag{22}$$

We have $\dot{V} = x(y - F(x)) + y(-x) = -xF(x) > 0$, $x \neq 0$ from condition (ii) in some neighborhood of the origin. This implies that the vector field is directed outward from or along a sufficiently small neighborhood of the origin. We continue now constructing an outer boundary, such that the vector field (21) is directed inward with respect to that boundary. We begin with the region $x < -X_*$ and use the Lyapunov function

$$V_-(x, y) = \frac{1}{2}x^2 + \frac{1}{2}(y - K_-)^2$$

here. We get $\dot{V}_- = x(y - F(x)) + (y - K_-)(-x) = -x(F(x) - K_-) < 0$. Similarly, we have in the region $x > X_*$ with

$$V_+(x, y) = \frac{1}{2}x^2 + \frac{1}{2}(y - K_+)^2,$$

$$\dot{V}_+ = x(y - F(x)) + (y - K_+)(-x) = -x(F(x) - K_+) < 0.$$


We proceed to the region $|x| < X_*$ and consider the level curves

$$V_0(x, y) = \left|-(K_+ - K_-)x + 2X_* y\right|.$$

We have for

$$y > y_+ = \max\left(\max_{|x| \leq X_*} F(x) + \frac{2X_*^2}{K_+ - K_-},\ \frac{K_+ - K_-}{2}\right)$$

that

$$\dot{V}_0 = -(K_+ - K_-)(y - F(x)) + 2X_*(-x) < -(K_+ - K_-)\frac{2X_*^2}{K_+ - K_-} + 2X_*^2 = -2X_*^2 + 2X_*^2 = 0.$$

Similarly, for

$$y < y_- = \min\left(\min_{|x| \leq X_*} F(x) - \frac{2X_*^2}{K_+ - K_-},\ -\frac{K_+ - K_-}{2}\right)$$

we get that

$$\dot{V}_0 = (K_+ - K_-)(y - F(x)) - 2X_*(-x) < -(K_+ - K_-)\frac{2X_*^2}{K_+ - K_-} + 2X_*^2 = -2X_*^2 + 2X_*^2 = 0.$$

For the regions (A) $|x| \leq X_*$ with $|y| > \max(y_+, |y_-|)$ and (B) $|x| > X_*$, it is, therefore, possible to construct a trapping region for the solutions, proving that at least one closed orbit exists according to the Poincaré-Bendixson theorem (Theorem 2). □

We conclude that it is, of course, possible to state additional conditions ensuring that possible closed orbits are isolated. For instance, $F$ analytic is sufficient. Later, we shall deal with uniqueness of limit cycles for (21). If the periodic orbit is unique, then it is isolated and, hence, a limit cycle (Fig. 1).
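A numerical experiment can illustrate the existence statement of Theorem 8. The sketch below assumes the van der Pol choice $F(x) = x^3/3 - x$ in (21), which is not taken from the text; it satisfies (i), (ii) for $0 < |x| < \sqrt{3}$, and (iii) with, e.g., $X_* = 3$, $K_- = -1$, $K_+ = 1$. One orbit is started near the origin and one far away, and the largest value of $|x|$ after a transient is compared. All function names and step sizes are illustrative.

```python
def F(x):
    """Assumed example: van der Pol characteristic F(x) = x^3/3 - x."""
    return x ** 3 / 3.0 - x

def rhs(x, y):
    """Right-hand side of the Lienard system (21)."""
    return y - F(x), -x

def rk4_step(x, y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(x, y)
    k2 = rhs(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
    k3 = rhs(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
    k4 = rhs(x + h * k3[0], y + h * k3[1])
    return (x + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            y + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

def amplitude(x0, y0, h=0.01, t_transient=100.0, t_measure=20.0):
    """Largest |x| observed after discarding an initial transient."""
    x, y = x0, y0
    for _ in range(int(round(t_transient / h))):
        x, y = rk4_step(x, y, h)
    amp = 0.0
    for _ in range(int(round(t_measure / h))):
        x, y = rk4_step(x, y, h)
        amp = max(amp, abs(x))
    return amp

if __name__ == "__main__":
    print(amplitude(0.1, 0.0), amplitude(4.0, 0.0))
```

Both orbits should settle on the same closed orbit, one approaching it from the inside and one from the outside, which is what the trapping region argument predicts qualitatively.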

Theorems for Absence of Limit Cycles

The Poincaré-Bendixson Theorem can be used to prove that closed orbits exist. We have seen that it can be combined with the fact that the vector field may be analytic or other facts excluding degenerate cases in some relevant part of the phase space in order to conclude that limit cycles exist, too. We have already proved that a large number of quite general systems do not exhibit limit cycles at all.

[Figure: the $(x, y)$ plane with the lines $x = -X_*$ and $x = X_*$, the lines $y = K_+$ and $y = K_-$, the curve $y = F(x)$, and level curves $V_0(x, y) = \text{constant}$, $V_-(x, y) = \text{constant}$, and $V_+(x, y) = \text{constant}$]

Fig. 1 The trapping region for the system (21) together with the function F and one closed orbit

However, use of the Poincaré-Bendixson theorem to conclude that limit cycles exist is the easy part in this field. It can, thus, be used for providing lower bounds for the number of limit cycles, but it does not provide any answers regarding upper bounds for the number of limit cycles. It does not limit the complexity of planar dynamical systems in any other way than stating that chaotic dynamics do not exist. The general problem regarding upper bounds for the number of limit cycles is related to Hilbert’s 16th problem and is still open. The finiteness theorem for polynomial vector fields in the real plane (Ilyashenko 1991) and the connection to tropical geometry (Viro 2008) are probably the most interesting achievements so far related to Hilbert’s 16th problem (Ilyashenko 2002). The simplest theorems giving upper bounds for the number of limit cycles are theorems for absence of limit cycles. If limit cycles are not excluded by the type of system considered (it is not linear, first integral, monotone, etc.), then there are mainly two arguments that can be used. Both of them are non-trivial to use and require construction of certain user-defined functions. I divide these arguments into


either divergence-based arguments (cf. Theorem 3) or Lyapunov function based arguments (cf. Theorem 5). The divergence-based argument is in this case called Dulac’s theorem.

Theorem 9 (Dulac’s theorem). Consider (3) and assume that $\rho(x, y) \in C^1(\mathbb{R}^2)$. There are no closed orbits in a simply connected domain on which

$$\frac{\partial(\rho(x, y)P(x, y))}{\partial x} + \frac{\partial(\rho(x, y)Q(x, y))}{\partial y} \tag{23}$$

is of one sign.

Proof. This theorem is a consequence of Green’s theorem. We might start by assuming that (23) is positive in a simply connected region $R$. Let $\Gamma$ be a closed trajectory for (3) in $R$ and let $D$ be the interior of $\Gamma$. We use Green’s theorem in the first equality below and get

$$0 < \iint_D \left(\frac{\partial(\rho(x, y)P(x, y))}{\partial x} + \frac{\partial(\rho(x, y)Q(x, y))}{\partial y}\right) dx\,dy = \oint_\Gamma \left(-\rho(x, y)Q(x, y)\,dx + \rho(x, y)P(x, y)\,dy\right)$$

$$= \oint_\Gamma \rho(x, y)\left(-\dot{y}\,dx + \dot{x}\,dy\right) = \oint_\Gamma \rho(x, y)\left(-\dot{y}\dot{x} + \dot{x}\dot{y}\right) dt = 0.$$

The derived conclusion $0 < 0$ is a contradiction. We can now repeat the argument assuming (23) negative in a simply connected region as well. □

We return to (21). The divergence of the vector field is $-f(x)$, so if $f$ is negative or positive, then (21) has no closed orbits. In this case it suffices to use $\rho(x, y) = 1$ in order to arrive at the conclusion. We can, therefore, state the following theorem.

Theorem 10. Assume (A-I). If $f(x) >$ ( [...] $0$, $x \neq 0$ then the system (21) has no limit cycles.


Proof. Consider the functional (22). It is scalar and $C^1$. It is positive definite, too. Now, we get

$$\dot{V} = x(y - F(x)) + y(-x) = -xF(x) \tag{24}$$

which is negative semidefinite by assumption (ii). It follows that all bounded solutions approach the set $x = 0$, and the only invariant part of this set is the origin. Hence, all bounded orbits approach the origin and no closed orbits exist. □

Note that there are conditions for which both Dulac’s theorem and LaSalle’s invariance principle apply but that there exist cases for which just one of these criteria applies in order to exclude closed orbits. The construction of the Lyapunov functional $V$ above or a Dulac function $\rho$ can be a delicate mathematical problem if no natural or obvious choices exist.
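The Lyapunov argument based on (22) and (24) can be illustrated numerically. The sketch below assumes $F(x) = x^3$ in (21), an example chosen here so that $xF(x) = x^4 > 0$ for $x \neq 0$ and hence $\dot{V} = -x^4 \leq 0$; it records $V$ once per time unit along a computed orbit. Function names, initial data, and step sizes are illustrative assumptions.

```python
def F(x):
    """Assumed example with x F(x) = x^4 > 0 for x != 0."""
    return x ** 3

def V(x, y):
    """Lyapunov functional (22)."""
    return 0.5 * (x * x + y * y)

def rhs(x, y):
    """Right-hand side of the Lienard system (21)."""
    return y - F(x), -x

def rk4_step(x, y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(x, y)
    k2 = rhs(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
    k3 = rhs(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
    k4 = rhs(x + h * k3[0], y + h * k3[1])
    return (x + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            y + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

def sampled_lyapunov_values(x0=1.0, y0=1.0, h=0.01, n_samples=50, steps_per_sample=100):
    """Values of V at t = 0, 1, 2, ... along the orbit through (x0, y0)."""
    x, y = x0, y0
    values = [V(x, y)]
    for _ in range(n_samples):
        for _ in range(steps_per_sample):
            x, y = rk4_step(x, y, h)
        values.append(V(x, y))
    return values

if __name__ == "__main__":
    vals = sampled_lyapunov_values()
    print(vals[0], vals[-1])
```

The sampled values should decrease strictly, so the orbit cannot close up, in agreement with the conclusion that no closed orbits exist under assumption (ii).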

Uniqueness of Limit Cycles

When limit cycles cannot be excluded, the next step is to prove that they exist, and we have already concluded that Poincaré-Bendixson’s theorem combined with some argument excluding degenerate cases suffices for this. However, estimating the number of limit cycles from above was an acknowledged mathematical problem. The only theory that describes the evolution of limit cycles and limit cycle bifurcations in a precise manner is the theory of general rotated vector fields; see, e.g., Duff (1953), Ye et al. (1986), and Perko (1993). The difficulty in applying this theory is finding suitable rotation parameters for systems occurring in applications. At the end of the proof of Theorem 12, we give an example of how this theory can be applied.

Also in this case, the most efficient methods so far are either divergence based or Lyapunov functional based. In both cases we assume that two limit cycles exist and compare either the divergence integrated over these cycles or the changes of the value of the Lyapunov functional when integrated over the assumed cycles. Also here, the Liénard system serves as a good example. It is simple enough to generate strong conclusions, and yet it is an example that is clearly based on real-world applications. We first present the divergence-based method.

Theorem 12 (Zhang (1986)). Consider (21) and assume (i) $F \in C^1(\mathbb{R})$, (ii) $xF(x) < 0$, $x \neq 0$ in some neighborhood of the origin, and (iii) $f/\mathrm{id}$, i.e., $x \mapsto f(x)/x$, is non-decreasing on $\mathbb{R}_-$ and $\mathbb{R}_+$. Then the system (21) has at most one limit cycle, and if it exists, it is stable.
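Condition (iii) of Theorem 12 can be screened numerically before attempting a proof. The sketch below assumes the van der Pol example $f(x) = x^2 - 1$, which is not taken from the text, and tests on a finite grid whether $f(x)/x$ is non-decreasing on each half-line. A grid test of this kind can only falsify the condition or make it plausible; it is not a proof. The helper names and grid parameters are hypothetical.

```python
def f(x):
    """Assumed example: van der Pol damping f(x) = x^2 - 1, so f(x)/x = x - 1/x."""
    return x * x - 1.0

def is_nondecreasing(g, xs, tol=1e-12):
    """Check g(xs[0]) <= g(xs[1]) <= ... up to a small tolerance."""
    values = [g(x) for x in xs]
    return all(b >= a - tol for a, b in zip(values, values[1:]))

def check_condition_iii(f, x_max=5.0, n=2000):
    """Grid test of condition (iii): f(x)/x non-decreasing on x < 0 and on x > 0."""
    ratio = lambda x: f(x) / x
    positive = [x_max * k / n for k in range(1, n + 1)]   # increasing, excludes 0
    negative = [-x for x in reversed(positive)]           # increasing on x < 0
    return is_nondecreasing(ratio, negative) and is_nondecreasing(ratio, positive)

if __name__ == "__main__":
    print(check_condition_iii(f))
```

For the assumed $f$ the test passes, since $f(x)/x = x - 1/x$ has the positive derivative $1 + 1/x^2$ on both half-lines, whereas the sign-reversed function $1 - x^2$ fails it.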


Remark 3. Cherkas and Zhilevich (1970) proved a theorem that at first glance seems more general and easier to prove than the above one. The key problem with their formulation is one of the conditions of their theorem. Symmetry is not assumed, and it is not obvious how to verify that all cycles encircle the prescribed interval on the $x$-axis before using their theorem. In the proof of Theorem 13 later, we give an argument that could be used in order to remedy this problem.

Remark 4. When sufficient smoothness is added, $xF(x) < 0$, $x \neq 0$ implies $f(x) < 0$ in some neighborhood of the origin.

Proof. We consider (22) and conclude that the origin is an unstable non-saddle if $xF(x) < 0$, $x \neq 0$ in some neighborhood of the origin. Index theory gives that all limit cycles must encircle the origin. We assume that two limit cycles exist and baptize them as $\Gamma_i$, $i = 1, 2$. Let their parameter description be given by $(x_i(t), y_i(t))$, $i = 1, 2$. We assume that $\Gamma_1$ is the one that is closest to the origin. Then limit cycle $\Gamma_1$ must be stable from the inside meaning that

$$\oint_{\Gamma_1} f(x_1(t))\,dt \geq 0 \tag{25}$$

according to Theorem 3. The next step is to prove

$$\oint_{\Gamma_2} f(x_2(t))\,dt > \oint_{\Gamma_1} f(x_1(t))\,dt \tag{26}$$

meaning that $\Gamma_2$ is stable. This is a contradiction if we can exclude the possibility for semistable limit cycles. Our plan is to prove the uniqueness of limit cycles by proving inequality (26) first and then removing the possibility of semistable limit cycles.

We first circumvent the problem that we addressed in the theorem of Cherkas and Zhilevich (1970). By continuity, the interior limit cycle $\Gamma_1$ must intersect the graph of the isocline $y = F(x)$ at exactly two points; we call them $Q_1$ and $P_1$. We assume that their horizontal coordinates are $x_{Q_1} < 0$ and $x_{P_1} > 0$ and construct a new function defined as

$$f_*(x) = f(x) - \frac{f(x_{Q_1})}{x_{Q_1}}\,x.$$

We note that $f_*$ has the following properties: First, $f_*(0) = f(0) \leq 0$ follows from continuity and $xF(x)$ negative definite in some neighborhood of the origin. Second,

$$\frac{f_*(x)}{x} = \frac{f(x)}{x} - \frac{f(x_{Q_1})}{x_{Q_1}}$$

is non-decreasing on $\mathbb{R}_-$ and $\mathbb{R}_+$. Third, $f_*(x)(x - x_{Q_1})$ is negative definite for $x < 0$ since it can be reorganized as

$$f_*(x)(x - x_{Q_1}) = x\,\frac{f_*(x)}{x}\,(x - x_{Q_1}).$$


We already knew that $f_*/\mathrm{id}$ was non-decreasing on $\mathbb{R}_-$ and that it was constructed in order to have a zero at $x_{Q_1}$. Similarly, we note that

$$f_*(x)(x - x_N) = x\,\frac{f_*(x)}{x}\,(x - x_N)$$

is positive definite on $x > 0$ for exactly one point $N$ with $x_N > 0$. We have $x_N < x_{P_1}$, and if this does not happen, then we must have

$$\begin{aligned} \oint_{\Gamma_1} f_*(x_1(t))\,dt &= \oint_{\Gamma_1} f(x_1(t))\,dt - \frac{f(x_{Q_1})}{x_{Q_1}} \oint_{\Gamma_1} x_1(t)\,dt \\ &= \oint_{\Gamma_1} f(x_1(t))\,dt + \frac{f(x_{Q_1})}{x_{Q_1}} \oint_{\Gamma_1} dy_1(t) = \oint_{\Gamma_1} f(x_1(t))\,dt < 0 \end{aligned} \tag{27}$$

contradicting (25). From the second equality of (27), we conclude that

$$\oint_{\Gamma_i} f_*(x_i(t))\,dt = \oint_{\Gamma_i} f(x_i(t))\,dt, \qquad i = 1, 2,$$

too. We have now circumvented the problem with the formulation of the Cherkas and Zhilevich (1970) theorem and have derived sufficiently precise statements regarding the position of the limit cycle with respect to the zeros of the constructed function $f_*$ from the conditions stated in the theorem. These statements can now be used for distinguishing between negative and positive contributions in the Poincaré criterion about stability (Theorem 3).

Now consider Fig. 2. We start from the arcs $\widehat{Q_1A_1}$ and $\widehat{F_2A_2}$. Along these arcs, we have $\dot{x} > 0$ meaning that we can represent these arcs as functions $y_i(x)$, $i = 1, 2$, $x_{Q_1} < x < x_N$. We then get

$$\int_{\widehat{F_2A_2}} f_*(x_2(t))\,dt - \int_{\widehat{Q_1A_1}} f_*(x_1(t))\,dt = \int_{x_{Q_1}}^{x_N} \frac{f_*(x)}{y_2(x) - F(x)}\,dx - \int_{x_{Q_1}}^{x_N} \frac{f_*(x)}{y_1(x) - F(x)}\,dx$$

$$= \int_{x_{Q_1}}^{x_N} \frac{f_*(x)(y_1(x) - y_2(x))}{(y_2(x) - F(x))(y_1(x) - F(x))}\,dx > 0.$$

A similar computation verifies

$$\int_{\widehat{C_2E_2}} f_*(x_2(t))\,dt - \int_{\widehat{C_1Q_1}} f_*(x_1(t))\,dt = \int_{x_{Q_1}}^{x_N} \frac{f_*(x)(y_1(x) - y_2(x))}{(y_2(x) - F(x))(y_1(x) - F(x))}\,dx > 0,$$

too.

[Figure: the $(x, y)$ plane with the curves $y = F(x)$, $y = f(x)$, and $y = f_*(x)$, the point $N$, the points $Q_1$, $A_1$, $P_1$, $C_1$ on the inner cycle, and the points $Q_2$, $F_2$, $A_2$, $P_2$, $B_2$, $D_2$, $C_2$, $E_2$ on the outer cycle]

Fig. 2 Two limit cycles in the system (21) together with the functions F , f , and f∗

We then consider the arcs $\widehat{B_2D_2}$ and $\widehat{A_1C_1}$. These arcs can be represented as functions $x_i(y)$, $i = 1, 2$. We get

$$\int_{\widehat{B_2D_2}} f_*(x_2(t))\,dt - \int_{\widehat{A_1C_1}} f_*(x_1(t))\,dt = \int_{y_{C_1}}^{y_{A_1}} \left(\frac{f_*(x_2(y))}{x_2(y)} - \frac{f_*(x_1(y))}{x_1(y)}\right) dy \geq 0$$

since $x_N > 0$ and $f_*/\mathrm{id}$ was non-decreasing on $\mathbb{R}_+$. We also have

$$\int_{\widehat{A_2B_2}} f_*(x_2(t))\,dt \geq 0, \qquad \int_{\widehat{D_2C_2}} f_*(x_2(t))\,dt \geq 0, \qquad \text{and} \qquad \int_{\widehat{E_2F_2}} f_*(x_2(t))\,dt \geq 0,$$

so (26) holds.


The last step is to exclude the possibility for semistable limit cycles, and we use the theory of rotated vector fields (Duff 1953; Perko 1993) to do so. First we take a line x = x > 0 intersecting 1 and construct a new system x˙ = y − F (x), y˙ = −x,

(28)

> 0, where F (x) =

F (x), x < x , F (x) + (x − x )2 , x ≥ x .

Since f (x) =

f (x), x < x , f (x) + 2 (x − x ), x ≥ x .

is continuous and (x − x )/x is strictly increasing; (28) satisfies all conditions of Theorem 12. We have now that



y − F (x) x 0, x < x , 1

= x(F (x) − F (x)) = 2 1 2, x ≥ x

y − F (x) x − ) x( 2 1  2 has fixed sign with respect to 2 > 1 > 0 meaning that (28) is a family of general rotated vector fields. If 1 is semistable for = 0 and stable from the inside, it will split into at least two limit cycles as we get 0 < X∗ and for x = {−X∗ , 0, X∗ } we have F (x)x(x − X∗ )(x + X∗ ) > 0. Then the system (21) has a unique limit cycle, which is stable.


Remark 5. This theorem requires some symmetry regarding the location of $\pm X_*$. Recent work addressing this problem is available; see, e.g., Hayashi et al. (2018).

Proof. Consider (21). Theorem 8 implies that at least one limit cycle exists. The innermost cycle is stable from the inside, and the outermost cycle is stable from the outside. The derivative of (22) with respect to (21) is given by (24). Therefore,

\[
\frac{dV}{dy} = \frac{dV}{dt}\,\frac{dt}{dy} = \frac{-xF(x)}{-x} = F(x),
\]

meaning that $dV = F(x)\,dy$. We commence by estimating the location of the limit cycle by proving that its leftmost point must take a smaller value than $-X_*$ and that its rightmost point must take a larger value than $X_*$, i.e., all limit cycles encircle the interval $[-X_*, X_*]$ on the x-axis. We first assume that no part of the limit cycle $\Gamma$ is outside the region $|x| \le X_*$. Moving along the cycle when $x > 0$, we have $dy < 0$ and $F(x) < 0$, i.e., $dV > 0$. Similarly for $x < 0$, we get $dV > 0$. Therefore,

\[
\oint_{\Gamma} dV > 0.
\]

However, V should return to its original value when moving around any closed curve once. This contradicts the existence of a limit cycle that does not possess any part outside the strip $-X_* < x < X_*$. Next suppose that the leftmost point Q is on the right of $x = -X_*$ and that the rightmost point is on the right of $x = X_*$. Then $\Gamma$ intersects $x = X_*$ at a point P below the horizontal axis, and we must have $OP > X_*$. If we move along $\Gamma$ from P, then $dV > 0$ but we have $OQ < X_*$. A similar argument excludes the possibility that the leftmost point of the limit cycle is on the left of $x = -X_*$ and the rightmost point is on the left of $x = X_*$. We conclude that all limit cycles of (21) must intersect $x = \pm X_*$ at four points. We now assume that two limit cycles $\Gamma_i$, $i = 1, 2$, exist and label their four intersection points with $x = \pm X_*$ as $A_i$, $C_i$, $E_i$, and $G_i$, $i = 1, 2$; see Fig. 3. The idea is now to compare two integrals that are supposed to be zero, i.e.,

\[
\oint_{\Gamma_1} dV \quad \text{and} \quad \oint_{\Gamma_2} dV,
\]

and show that both of them cannot equal zero since they are different. First we have

\[
\int_{\widehat{B_2D_2}} dV = \int_{\widehat{B_2D_2}} F(x)\,dy \le \int_{\widehat{A_1C_1}} F(x)\,dy = \int_{\widehat{A_1C_1}} dV
\]

Fig. 3 Two limit cycles in the system (21) together with F

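since F is positive and non-decreasing on $x > X_*$. Similarly, we have

\[
\int_{\widehat{F_2H_2}} dV = \int_{\widehat{F_2H_2}} F(x)\,dy \le \int_{\widehat{E_1G_1}} F(x)\,dy = \int_{\widehat{E_1G_1}} dV
\]

since F is negative and non-decreasing on $x < -X_*$. Next, we have

\[
\int_{\widehat{G_2A_2}} dV = \int_{\widehat{G_2A_2}} \frac{-xF(x)}{y - F(x)}\,dx < \int_{\widehat{G_1A_1}} \frac{-xF(x)}{y - F(x)}\,dx = \int_{\widehat{G_1A_1}} dV
\]

and

\[
\int_{\widehat{C_2E_2}} dV = \int_{\widehat{C_2E_2}} \frac{-xF(x)}{y - F(x)}\,dx < \int_{\widehat{C_1E_1}} \frac{-xF(x)}{y - F(x)}\,dx = \int_{\widehat{C_1E_1}} dV,
\]

that together with

\[
\int_{\widehat{H_2G_2}} dV < 0, \quad \int_{\widehat{A_2B_2}} dV < 0, \quad \int_{\widehat{D_2C_2}} dV < 0, \quad \int_{\widehat{E_2F_2}} dV < 0,
\]

implies

\[
\oint_{\Gamma_2} dV < \oint_{\Gamma_1} dV.
\]

This means that $\Gamma_2$ and $\Gamma_1$ cannot simultaneously exist. The remaining limit cycle is unique and stable from both sides and, thus, stable.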

Summary

In this chapter we have considered a quite special nonlinear phenomenon: limit cycles, or isolated periodic orbits. Such phenomena are not encountered in many classes of systems that are generally considered well understood. A complete qualitative analysis of systems possessing limit cycles with clear connections to real-world problems thus requires, in many cases, precise estimates and well-selected methods. Our last remark is that many of the methods that we have treated above apply to the generalized Liénard equation

\[
\dot{x} = \varphi(y) - F(x), \quad \dot{y} = -g(x), \tag{29}
\]

too. The usual conditions set on the involved functions are (A-I), (A-II), and

(A-III) $\varphi \in C^1(\mathbb{R})$ and $y\varphi(y) > 0$ for $y \ne 0$, and $\varphi$ is nondecreasing with $\varphi(\pm\infty) = \pm\infty$.

The function $\varphi$ cannot be removed by a transformation similar to the one that removed g in Lemma 1. A natural Lyapunov function still exists and is given by

\[
V(x, y) = \int_0^x g(u)\,du + \int_0^y \varphi(v)\,dv.
\]

This makes it harder to use the various symmetry arguments above. This generalization is necessary for translating results for the generalized Liénard equation into a biological context; see, e.g., Kuang and Freedman (1988), Xiao and Zhang (2003), and Lindström (2019), Mathematics and recurrent population outbreaks, Springer.
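As a short worked check of the Lyapunov function stated for (29), its derivative along solutions can be computed directly. The derivation below uses only the system (29) and the definition of V; no further assumptions on $\varphi$, F, or g are needed for the computation itself.

```latex
\begin{aligned}
\frac{dV}{dt} &= g(x)\,\dot{x} + \varphi(y)\,\dot{y} \\
              &= g(x)\bigl(\varphi(y) - F(x)\bigr) - \varphi(y)\,g(x) \\
              &= -g(x)\,F(x).
\end{aligned}
```

For $g(x) = x$ and $\varphi(y) = y$ this reduces to $dV/dt = -xF(x)$, the expression that appears in the uniqueness proof above.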

References

Álvarez MJ, Gasull A, Prohens R (2010) Topological classification of polynomial complex differential equations with all critical points of centre type. J Differ Equ Appl 16(5–6):411–423
Cherkas LA, Zhilevich LI (1970) Some criteria for the absence of limit cycles and for the existence of a single cycle. Differ Equ 6:891–897
Cioni M, Villari G (2015) An extension of Dragilev’s theorem for the existence of periodic solutions of the Liénard equation. Nonlinear Anal 127:55–70
Conway JB (1973) Functions of one complex variable. Springer, New York/Heidelberg/Berlin
Dragilev AV (1952) Periodic solutions of the differential equation of nonlinear oscillations. Acad Nauk SSSR Prikl Mat Meh 16:85–88
Duff GFD (1953) Limit cycles and rotated vector fields. Ann Math 57(1):15–31
Dumortier F, Llibre J, Artés JC (2006) Qualitative theory of planar differential systems. Springer, Berlin/Heidelberg
Garijo A, Gasull A, Jarque Z (2007) Local and global phase portrait of equation ż = f(z). Discret Contin Dyn Syst 17(2):309–329
Grimshaw R (1993) Nonlinear ordinary differential equations. CRC Press, Boca Raton
Hadeler KP, Glas D (1983) Quasimonotone systems and convergence to equilibrium in a population generic model. J Math Anal Appl 95:297–303
Hayashi M, Villari G, Zanolin F (2018) On the uniqueness of limit cycle for certain Liénard systems without symmetry. Electron J Qual Theory Differ Equ 55:1–10. http://www.math.u-szeged.hu/ejqtde
Hirsch MW, Smale S, Devaney RL (2013) Differential equations, dynamical systems, and an introduction to chaos. Academic, Oxford
Hofbauer J, Sigmund K (1998) Evolutionary games and population dynamics. Cambridge University Press, Cambridge
Ilyashenko Y (1991) Finiteness theorems for limit cycles. Translations of mathematical monographs, vol 94. American Mathematical Society, Providence
Ilyashenko Y (2002) Centennial history of Hilbert’s 16th problem. Bull Am Math Soc 39(3):301–354
Jordan DW, Smith P (1990) Nonlinear ordinary differential equations, 2nd edn. Clarendon Press, Oxford
Kamke E (1930) Über die eindeutige Bestimmtheit der Integrale von Differentialgleichungen. Mat Z 1:101–107
Kamke E (1932) Zur Theorie der Systeme gewöhnlicher Differentialgleichungen II. Acta Math 58:57–85
Kuang Y, Freedman HI (1988) Uniqueness of limit cycles in Gause-type models of predator-prey systems. Math Biosci 88:67–84
LaSalle JP (1960) Some extensions of Lyapunov’s second method. IRE Trans Circuit Theory CT–7:520–527
Liénard A (1928) Étude des oscillations entretenues. Revue Gén Électr 23:906–924
Lindström T (1993) Qualitative analysis of a predator-prey system with limit cycles. J Math Biol 31:541–561
Lotka AJ (1925) Elements of physical biology. Williams and Wilkins, Baltimore
Mallet-Paret J, Sell GR (1996) The Poincaré-Bendixson theorem for monotone cyclic feedback systems with delay. J Differ Equ 125:441–489
Michel AM, Hou L, Liu D (2015) Stability of dynamical systems, 2nd edn. Birkhäuser, Cham
Müller M (1927a) Über das fundamentaltheorem in der theorie der gewöhnlichen differentialgleichungen. Mat Z 26(1):619–645
Müller M (1927b) Über die Eindeutigkeit der Integrale eines systems gewöhnlicher Differentialgleichungen und die Konvergenz einer Gattung von Verfahren zur Approximation dieser Integrale. Sitzungsber Heidelb Akad Wiss Math-Natur Kl 9, 2–38
Perko L (1987) On the accumulation of limit cycles. Proc Am Math Soc 99:515–526
Perko LM (1993) Rotated vector fields. J Differ Equ 103:127–145
Perko L (2001) Differential equations and dynamical systems. Springer, New York
Saff EB, Snider AD (2003) Fundamentals of complex analysis with applications to engineering and science, 3rd edn. Pearson Education, Upper Saddle River
Sansone G (1949) Sopra l’equazione di A. Liénard delle oscillazioni di rilassimento. Ann Mat Pura Appl Ser IV 28:153–181
Smith HL (1995) Monotone dynamical systems: an introduction to the theory of competitive and cooperative systems. American Mathematical Society, Providence
Smith HL (2011) An introduction to delay differential equations with applications to life sciences. Springer, New York
Sverdlove R (1979) Vector fields defined by complex functions. J Differ Equ 34:427–439
Viro O (2008) From the sixteenth Hilbert problem to tropical geometry. Jpn J Math 3:185–214
Volterra V (1926) Variazioni e fluttuazioni del numero d’individui in specie animali conviventi. Memorie della R. Accademia Nationale dei Lincei 6 2:31–113
Wiggins S (2003) Introduction to applied nonlinear dynamical systems and chaos, 2nd edn. Springer, New York
Xiao D-M, Zhang Z-F (2003) On the uniqueness and nonexistence of limit cycles for predator-prey systems. Nonlinearity 16:1185–1201
Ye Y-Q et al (1986) Theory of limit cycles, 2nd edn. American Mathematical Society, Providence
Zhang Z-F (1980) Theorem of existence of exact n limit cycles in |ẋ| ≤ n for the differential equation ẍ + μ sin ẋ + x = 0. Sci Sin 23(12):1502–1510
Zhang Z-F (1986) Proof of the uniqueness theorem of limit cycles of generalized Liénard equations. Appl Anal 29:63–76

Mathematical Models in Neuroscience: Approaches to Experimental Design and Reliable Parameter Determination

87

Denis Shchepakin, Leonid Kalachev, and Michael Kavanaugh

Contents

Introduction
Chemical Kinetics Schemes and the Law of Mass Action
Characteristic Scales and Model Non-dimensionalization
Brief Review of Asymptotic Analysis and Asymptotic Algorithm for Model Reduction
Quasi-Steady-State Approximation and Michaelis–Menten–Henri Kinetics
NMDAR Desensitization: Background Information and General Model
Kinetic Model of NMDAR and Experiment Design
Initial Conditions for NMDAR Experiments
Reduction of the NMDAR Model in Case of Experiments with High Concentration of D-Serine
Reduction of the NMDAR Model in Experiments with High Concentration of L-Glutamate
Reduction of the NMDAR Model in Experiments with High Concentrations of D-Serine and L-Glutamate
Reduction of the NMDAR Model After the Pulse
Reliable NMDAR Model Parameter Estimation
Model Fitting to Data
Conclusion
References

Abstract

Overparameterization of models in natural sciences, including neuroscience, is a problem that is widely recognized but often not addressed in experimental studies. The systematic reduction of complex models to simpler ones for which the parameters may be reliably estimated is based on asymptotic model reduction procedures that take into account the presence of vastly different time scales in the natural phenomena being studied. The steps of the reduction process, which are reviewed here, include basic model formulation (e.g., using the law of mass action applied routinely for problems in neuroscience, biological and chemical kinetics, and other fields), model non-dimensionalization using characteristic scales (of times, species concentrations, etc.), application of an asymptotic algorithm to produce a reduced model, and analysis of the reduced model (including suggestions for experimental design and fitting the reduced model to experimental data). In addition to the review of some classical results and basic examples, we illustrate how the approach can be used in a more complex realistic case to produce several reduced kinetic models for N-methyl-D-aspartate receptors, a subtype of glutamate receptor expressed on neurons in the brain, with models applied to different experimental protocols. Simultaneous application of the reduced models to fitting the data obtained in a series of specially designed experiments allows for a stepwise estimation of parameters of the original conventional model, which is otherwise overparameterized with respect to the existing data.

D. Shchepakin · L. Kalachev · M. Kavanaugh
University of Montana, Missoula, MT, USA

© Springer Nature Switzerland AG 2021
B. Sriraman (ed.), Handbook of the Mathematics of the Arts and Sciences, https://doi.org/10.1007/978-3-319-57072-3_134

Keywords

Neurotransmitter transport · Receptors and transporters · Chemical kinetics schemes · Asymptotic methods · Systems of differential equations · Model reduction

Introduction

Mathematical models are applied extensively to studies of fundamental problems in neuroscience. One of the best known is the Hodgkin–Huxley model, which provided a biophysically accurate, mechanistic account of the generation of the action potential in the squid giant axon (Hodgkin and Huxley, 1952). Later development of a reduced version of this model, known as the FitzHugh–Nagumo model, provided a simplified and useful tool for studying excitable systems (FitzHugh, 1955; Nagumo et al., 1962). In this work, the attention is focused on the analysis of kinetic models describing the action of neurotransmitter binding to receptors and transporters in the brain. Some of the models discussed in this chapter are widely known, comparatively simple, and generic, while others are more realistic but at the same time more complex, e.g., modeling neurotransmitter diffusion and transport in the synaptic cleft and extracellular space in the brain (Rusakov and Kullman, 1998; Savtchenko and Rusakov, 2007). The classical results on model derivation and reduction techniques presented below are of general interest and may be used to handle numerous applications not only in neuroscience but also in biological and chemical kinetics, pharmacokinetics, and other fields of study dealing with complex models and the need for reliable model parameter identification.

The current chapter is organized as follows. First, the well-known law of mass action (Voit et al., 2015) is briefly discussed, and some simple models derived using this law are introduced as an illustration. Then the basic ideas behind model non-dimensionalization are addressed; these ideas are then illustrated via their application to a generic neurotransmitter transporter model mentioned above. Next, the classical results of asymptotic analysis and the steps of a reduction algorithm based on the so-called boundary function method (Vasil’eva et al., 1995) are reviewed. Following this, the asymptotic algorithm is applied to the example of a generic transporter model to produce the so-called Michaelis–Menten–Henri approximation (Henri, 1903; Michaelis and Menten, 1913), and the idea of the quasi-steady state approximation (Segel and Slemrod, 1989; Stiefenhofer, 1998) is briefly discussed. Finally, the formulation of a realistic general model related to the N-methyl-D-aspartate receptor (NMDAR), one of the major subtypes of glutamate receptors on neurons, is presented. This receptor plays a critical role in synaptic plasticity, development, learning, and memory (Traynelis et al., 2010). Disruptions of its function are associated with such disorders as epilepsy, depression, schizophrenia, ischemic brain injury, and others. NMDARs have been a topic of numerous studies; over the last two decades several mathematical models have been proposed and applied to explain the dynamics of the ion currents mediated by the NMDAR ion channel; see, e.g., Benveniste et al. (1990), Nahum-Levy et al. (2001), and Iacobucci and Popescu (2018). However, the conclusions on receptor kinetics based on these models were typically limited due to model overparameterization with respect to the available data. For this more realistic, and thus more complex, example, it is shown how designing the experiments in accordance with model predictions resolves the issue of model overparameterization. Application of the algorithm suggests a series of experiments which have to be performed to reliably estimate model parameters. The boundary function method, one of the asymptotic methods, is used to obtain the simplified versions of the general model corresponding to some particular experimental setups. As a proof of concept, the application of the algorithm to simulated data, which mimics the data from real experiments, is presented. A brief discussion of the approach concludes the chapter.

Chemical Kinetics Schemes and the Law of Mass Action

Consider the following hypothetical kinetic scheme describing a chemical (biochemical) reaction, where A, B, C represent some chemical species:

\[
nA + mB + jC \xrightarrow{\;k^+\;} sA + pB + rC, \qquad
nA + mB + jC \xleftarrow{\;k^-\;} sA + pB + rC, \tag{1}
\]

where n, m, j, s, p, and r are the integers representing the number of molecules of each species taking part in the corresponding forward–reverse reactions with the reaction rate constants $k^+$ (forward) and $k^-$ (reverse); arrows indicate the directions of particular reactions.

Let [A], [B], and [C] represent the concentrations of A, B, C, respectively, in appropriate dimensional units of measurement. Under the assumption that the species are well mixed in a fixed volume, the kinetic scheme (1) describes the time-dependent species transformations (with no spatial dependence) which lead to changes of species concentrations in the mix. The law of mass action states that the rates of change of species concentrations depend on the respective concentrations according to the following formulas:

\[
\frac{d[A]}{dt} = k^+ (s - n)[A]^n [B]^m [C]^j + k^- (n - s)[A]^s [B]^p [C]^r, \tag{2}
\]
\[
\frac{d[B]}{dt} = k^+ (p - m)[A]^n [B]^m [C]^j + k^- (m - p)[A]^s [B]^p [C]^r, \tag{3}
\]
\[
\frac{d[C]}{dt} = k^+ (r - j)[A]^n [B]^m [C]^j + k^- (j - r)[A]^s [B]^p [C]^r. \tag{4}
\]
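The right-hand sides (2)–(4) share one forward and one reverse rate term, so they can be evaluated together. The following is a minimal sketch in plain Python; the function name `mass_action_rates` and its argument layout are illustrative choices, not part of the chapter.

```python
def mass_action_rates(conc, reactant_orders, product_orders, k_plus, k_minus):
    """Evaluate the right-hand sides (2)-(4) for the reversible scheme (1).

    conc            -- concentrations ([A], [B], [C])
    reactant_orders -- stoichiometric integers (n, m, j)
    product_orders  -- stoichiometric integers (s, p, r)
    Returns (d[A]/dt, d[B]/dt, d[C]/dt).
    """
    forward = k_plus    # becomes k+ [A]^n [B]^m [C]^j
    reverse = k_minus   # becomes k- [A]^s [B]^p [C]^r
    for c, nu_in, nu_out in zip(conc, reactant_orders, product_orders):
        forward *= c ** nu_in
        reverse *= c ** nu_out
    # Each species changes by (product order - reactant order) per net forward event.
    return tuple((nu_out - nu_in) * (forward - reverse)
                 for nu_in, nu_out in zip(reactant_orders, product_orders))
```

With the orders (1, 1, 0) and (0, 0, 1), the function returns the three rates of a simple binding–unbinding reaction, d[A]/dt = d[B]/dt = −d[C]/dt.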

Here t stands for time; below, without loss of generality, we will use the terms time, time variable, and independent variable interchangeably. For the case where, e.g., n = 1, m = 1, j = 0, s = 0, p = 0, and r = 1, the following kinetic scheme can be written (compare with (1)):

\[
A + B \xrightarrow{\;k^+\;} C, \qquad A + B \xleftarrow{\;k^-\;} C, \tag{5}
\]

describing the case of a generic receptor if A represents a neurotransmitter, B stands for a free receptor, and C corresponds to a bound receptor, so that the entire scheme simply represents a binding–unbinding reaction. From (5) we obtain (compare with (2), (3), and (4)):

\[
\frac{d[A]}{dt} = \frac{d[B]}{dt} = -\frac{d[C]}{dt} = -k^+ [A][B] + k^- [C], \tag{6}
\]

for which the law of mass action just states that the rates of change of species concentrations during the reactions (5) are proportional to the products of corresponding species concentrations taking part in the reactions under consideration. To solve (6) for the time-dependent concentrations of species, corresponding initial conditions, i.e., the concentrations of species at time t = 0, must be specified. In particular, consider the case where all the receptors at the initial instant of time are unbound:

\[
[A](0) = [A]^*, \qquad [B](0) = [B]^*, \qquad [C](0) = [C]^* = 0. \tag{7}
\]

Problem (6) and (7) can be easily solved. First, a conservation relationship for the receptors, free and bound, may be derived; indeed, from (6), since $d[B]/dt + d[C]/dt = 0$, it follows that $[B](t) + [C](t) = \text{const} = [B]^*$. Then, it can be easily seen that the total concentration of neurotransmitter, free and bound, is also conserved: $d[A]/dt + d[C]/dt = 0$, and thus, it follows that $[A](t) + [C](t) = \text{const} = [A]^*$. These two conservation relationships can be used to replace the system of three differential equations (6) by one equation with the corresponding initial condition:

\[
\frac{d[C]}{dt} = k^+ ([A]^* - [C])([B]^* - [C]) - k^- [C], \qquad [C](0) = 0. \tag{8}
\]

The solution of the constant coefficient Riccati-type differential equation (8) with zero initial condition can be written out. Indeed, the stable steady state of this equation $[C]_{st.}$ is given by the expression $[C]_{st.} = ([A]^* + [B]^* + k^-/k^+)/2 - D > 0$, where $D = \sqrt{([A]^* + [B]^* + k^-/k^+)^2/4 - [A]^*[B]^*}$, and the formula for the time-dependent solution is

\[
[C](t) = [C]_{st.} + \frac{1}{1/(2D) - \bigl(1/[C]_{st.} + 1/(2D)\bigr)\exp(2k^+ D t)}. \tag{9}
\]
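Formula (9) can be checked numerically against a direct integration of (8). The sketch below is an illustration under stated assumptions: it uses a classical fourth-order Runge–Kutta scheme written in plain Python, and the function names and any parameter values passed to them are hypothetical, not taken from the chapter.

```python
import math


def bound_closed_form(t, a0, b0, k_plus, k_minus):
    """Explicit solution (9) for [C](t) with [C](0) = 0.

    a0, b0 are the initial concentrations [A]* and [B]* (both positive).
    """
    s = a0 + b0 + k_minus / k_plus
    d = math.sqrt(s * s / 4.0 - a0 * b0)
    c_st = s / 2.0 - d  # stable steady state
    denom = 1.0 / (2.0 * d) - (1.0 / c_st + 1.0 / (2.0 * d)) * math.exp(2.0 * k_plus * d * t)
    return c_st + 1.0 / denom


def bound_rk4(t_end, a0, b0, k_plus, k_minus, steps=2000):
    """Integrate equation (8) from t = 0 to t_end with fixed-step RK4."""
    def rhs(c):
        return k_plus * (a0 - c) * (b0 - c) - k_minus * c

    h = t_end / steps
    c = 0.0
    for _ in range(steps):
        k1 = rhs(c)
        k2 = rhs(c + 0.5 * h * k1)
        k3 = rhs(c + 0.5 * h * k2)
        k4 = rhs(c + h * k3)
        c += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return c
```

Calling both functions with the same arguments, for instance `bound_closed_form(1.0, 1.0, 0.5, 2.0, 1.0)` and `bound_rk4(1.0, 1.0, 0.5, 2.0, 1.0)`, should give values that agree to within the integration error.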

When $[C](t)$ is known, the expressions for the other species’ concentrations, $[A](t) = [A]^* - [C](t)$ and $[B](t) = [B]^* - [C](t)$, can also be immediately written out. Unfortunately, only the simplest models, e.g., (6), (7), have explicit solutions like (9). Even slight modifications of the kinetic scheme (5) produce models which do not have explicit analytic solutions. Consider the following example, dealing with the kinetics of a generic transporter, where A represents a neurotransmitter (e.g., located outside a cell), B stands for a free transporter located on a cell membrane, and C corresponds to a bound transporter; the scheme below describes a binding–unbinding reaction combined with the transfer of neurotransmitter through a cell membrane:

\[
A + B \xrightarrow{\;k^+\;} C, \qquad A + B \xleftarrow{\;k^-\;} C, \qquad C \xrightarrow{\;\lambda\;} B. \tag{10}
\]

The corresponding system of differential equations for concentrations of species now has the form (here once again the law of mass action is used with an additional, transfer, reaction characterized by the rate constant $\lambda$; note that in (10) the product of the transfer reaction, i.e., the neurotransmitter which had penetrated through the cell membrane and ended up inside a cell, was omitted):

\[
\frac{d[A]}{dt} = -k^+ [A][B] + k^- [C], \tag{11}
\]
\[
\frac{d[B]}{dt} = -k^+ [A][B] + k^- [C] + \lambda [C], \tag{12}
\]
\[
\frac{d[C]}{dt} = k^+ [A][B] - k^- [C] - \lambda [C]. \tag{13}
\]

The system (11), (12), and (13) is supplied with the same initial conditions (7), where now [B] and [C] stand for the concentrations of free and bound transporters. The conservation of transporters can once again be derived from the system (11), (12), and (13) following steps similar to those applied in the case of system (6): $[B](t) + [C](t) = [B]^*$. The neurotransmitters, however, are not conserved anymore. Elimination of [B] from (11), (12), and (13) leads to the following problem:

\[
\frac{d[A]}{dt} = -k^+ [A]([B]^* - [C]) + k^- [C], \tag{14}
\]
\[
\frac{d[C]}{dt} = k^+ [A]([B]^* - [C]) - (k^- + \lambda)[C], \tag{15}
\]
\[
[A](0) = [A]^*, \qquad [C](0) = 0, \tag{16}
\]

which does not have an explicit analytic solution. In the next section the discussion of characteristic time scales is presented, and the topic of model non-dimensionalization is addressed. As a side note, it is important to mention that the law of mass action is also often used in applications dealing with interactions of individuals, such as epidemic models, including the well-known susceptible–infected–recovered (SIR) model describing infectious disease propagation and its numerous generalizations (Murray, 1993).
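Since problem (14)–(16) has no explicit solution, it is typically explored numerically. The following sketch integrates it with a fixed-step fourth-order Runge–Kutta scheme in plain Python. The function names, the step count, and any parameter values used with them are illustrative assumptions, not values from the chapter.

```python
def transporter_rhs(a, c, b_total, k_plus, k_minus, lam):
    """Right-hand sides of (14) and (15); b_total is [B]*."""
    binding = k_plus * a * (b_total - c)
    return -binding + k_minus * c, binding - (k_minus + lam) * c


def integrate_transporter(a0, b_total, k_plus, k_minus, lam, t_end, steps=20000):
    """Integrate (14)-(16) from t = 0 to t_end; returns ([A], [C]) at t_end."""
    h = t_end / steps
    a, c = a0, 0.0  # initial conditions (16)
    args = (b_total, k_plus, k_minus, lam)
    for _ in range(steps):
        k1a, k1c = transporter_rhs(a, c, *args)
        k2a, k2c = transporter_rhs(a + 0.5 * h * k1a, c + 0.5 * h * k1c, *args)
        k3a, k3c = transporter_rhs(a + 0.5 * h * k2a, c + 0.5 * h * k2c, *args)
        k4a, k4c = transporter_rhs(a + h * k3a, c + h * k3c, *args)
        a += h * (k1a + 2.0 * k2a + 2.0 * k3a + k4a) / 6.0
        c += h * (k1c + 2.0 * k2c + 2.0 * k3c + k4c) / 6.0
    return a, c
```

Setting `lam=0` recovers the receptor model, for which [A] + [C] stays equal to [A]*; with a positive `lam` the sum decreases in time, reflecting that the neurotransmitter is no longer conserved.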

Characteristic Scales and Model Non-dimensionalization

Usually complex models of real-life phenomena involve simultaneous description of several processes occurring on vastly different characteristic time scales. To better understand the relationships between different parameters in the original statement of the problem and to find out which parameter combinations are actually important, it is usually advisable to perform model non-dimensionalization before starting the analysis. Natural phenomena in neuroscience commonly involve multiple processes that occur on different time scales: it may happen that some reactions are fast because the reaction rate constants have large numerical values compared to the others, but we may also observe situations where the reaction rate constants are moderate, but the concentrations of some species participating in the reactions are much higher compared to the others. For the model (14), (15), and (16), consider a realistic situation, which may be implemented experimentally, where the initial concentration of neurotransmitter $[A]^*$ is much higher than the initial concentration of free transporters $[B]^*$, i.e., $[A]^* \gg [B]^*$. It is very natural to introduce new units of measurement for the various variables, which are characteristic of the model under consideration. The new non-dimensional variables will be equal to the old dimensional ones divided by the dimensional constants corresponding to the new units of measurement. In particular, the concentration of neurotransmitters can be measured in the units of their initial concentration, and the bound transporters concentration can be naturally measured in the units of the initial free transporters concentration. For the non-dimensional concentrations U and V of neurotransmitters and bound transporters, respectively, as well as for the non-dimensional time $\tilde{t}$, we write (note that now $0 \le U \le 1$ and $0 \le V \le 1$):

\[
U = [A]/[A]^*, \qquad V = [C]/[B]^*, \qquad \tilde{t} = t/\sigma, \tag{17}
\]

where $\sigma$ is a characteristic time scale to be determined. After substitution of (17) into (14), (15), (16), the model will be written as

\[
\frac{dU}{d\tilde{t}} = -k^+ [B]^* \sigma\, U(1 - V) + k^- ([B]^*/[A]^*)\, \sigma V, \tag{18}
\]
\[
\frac{dV}{d\tilde{t}} = k^+ [A]^* \sigma\, U(1 - V) - (k^- + \lambda)\, \sigma V, \tag{19}
\]
\[
U(0) = 1, \qquad V(0) = 0. \tag{20}
\]

The choice of $\sigma$ must lead to fewer model parameter combinations in (18), (19), and (20); it must also reflect the characteristic time of interest, i.e., determine the time interval over which, e.g., the experimental data needs to be collected for model parameter identification. Since in the case of low initial concentration of transporters the availability of free transporters determines the rate of the overall reaction (this is the so-called rate limiting step of the process), the following choice of measurement unit for the time variable can be made: $\sigma = 1/(k^+ [B]^*)$. Then (18) and (19) may be written as

\[
\frac{dU}{d\tilde{t}} = -U(1 - V) + \frac{k^-}{k^+ [A]^*}\, V, \tag{21}
\]
\[
\frac{dV}{d\tilde{t}} = \frac{[A]^*}{[B]^*}\, U(1 - V) - \frac{[A]^*}{[B]^*}\, \frac{k^- + \lambda}{k^+ [A]^*}\, V. \tag{22}
\]

The following non-dimensional so-called small parameter $0 < \varepsilon \ll 1$ and non-dimensional rate constants K and $\gamma$ can now be introduced:

\[
\varepsilon = [B]^*/[A]^*, \qquad K = (k^- + \lambda)/(k^+ [A]^*), \qquad \gamma = \lambda/(k^+ [A]^*). \tag{23}
\]

Under the condition that both non-dimensional rate constants of reactions in (23) are moderate, and thus, the corresponding characteristic reaction times are of the same order of magnitude, the system (21) and (22) is now transformed to

\[
\frac{dU}{d\tilde{t}} = -U(1 - V) + (K - \gamma)V, \tag{24}
\]
\[
\varepsilon\, \frac{dV}{d\tilde{t}} = U(1 - V) - KV. \tag{25}
\]

This system has to be analyzed together with initial conditions (20) on a non-dimensional time interval $0 \le \tilde{t} \le T$, where T is a moderate number (i.e., it does not depend on $\varepsilon$). In what follows, the notation t is once again used for the non-dimensional time instead of $\tilde{t}$. The review of the basic ideas of asymptotic analysis is presented in the next section. This has to be done before the reduction procedure is applied to the original model (24), (25), and (20) with a small parameter (multiplying the derivative in one of the equations) in order to produce a simpler model containing only one differential equation approximating the behavior of the solution of the original problem on a finite time interval.
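The change of variables (17) with σ = 1/(k⁺[B]*) can be verified numerically: integrating the dimensional problem (14)–(16) up to time σT and the non-dimensional problem (24), (25), (20) up to time T should give the same state after rescaling. The sketch below does this in plain Python with a fixed-step Runge–Kutta scheme; all function names and the sample parameter values mentioned afterwards are hypothetical and chosen only for illustration.

```python
def scales(a0, b0, k_plus, k_minus, lam):
    """Return (sigma, eps, K, gamma) as defined in the text and in (23)."""
    sigma = 1.0 / (k_plus * b0)
    eps = b0 / a0
    big_k = (k_minus + lam) / (k_plus * a0)
    gamma = lam / (k_plus * a0)
    return sigma, eps, big_k, gamma


def rk4(rhs, state, t_end, steps):
    """Fixed-step classical Runge-Kutta integration of a small ODE system."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
        state = tuple(s + h * (p + 2.0 * q + 2.0 * r + w) / 6.0
                      for s, p, q, r, w in zip(state, k1, k2, k3, k4))
    return state


def dimensional_solution(a0, b0, k_plus, k_minus, lam, t_end, steps=4000):
    """([A], [C]) at dimensional time t_end from (14)-(16)."""
    def rhs(s):
        a, c = s
        binding = k_plus * a * (b0 - c)
        return (-binding + k_minus * c, binding - (k_minus + lam) * c)
    return rk4(rhs, (a0, 0.0), t_end, steps)


def scaled_solution(eps, big_k, gamma, t_end, steps=4000):
    """(U, V) at non-dimensional time t_end from (24), (25), (20)."""
    def rhs(s):
        u, v = s
        flux = u * (1.0 - v)
        return (-flux + (big_k - gamma) * v, (flux - big_k * v) / eps)
    return rk4(rhs, (1.0, 0.0), t_end, steps)
```

For example, with [A]* = 10, [B]* = 1, k⁺ = 2, k⁻ = 1, and λ = 4 (arbitrary illustrative values), `scales` gives σ = 0.5, ε = 0.1, K = 0.25, and γ = 0.2, and the pair returned by `scaled_solution(eps, K, gamma, 2.0)` should match `dimensional_solution(..., sigma * 2.0)` after dividing [A] by [A]* and [C] by [B]*. Smaller values of ε make the second equation stiffer, so a fixed-step explicit scheme then needs more steps.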

Brief Review of Asymptotic Analysis and Asymptotic Algorithm for Model Reduction

As was illustrated in the previous section, in the presence of different characteristic time scales, the non-dimensionalized models may contain small parameters multiplying different terms in the corresponding equations. As a result, in many cases, such so-called perturbed problems may be reduced to simpler ones containing fewer equations and fewer parameters. Such reductions are often needed to determine the optimal number of parameter combinations which can be reliably estimated from the experimental data collected on a particular time scale related to the duration of an experiment and to the frequency with which the data were collected. Model reduction procedures cannot be applied automatically without additional analysis: some conditions have to be checked, and the reduction is possible only if these conditions are satisfied. It is important to point out that asymptotic results are very practical; they are not just pure mathematical exercises. The possibility of asymptotic model reduction is related to the fact that the actual processes in the complex systems being studied may be comparatively fast or slow. This asymptotic approach provides useful tools allowing one to better understand the behavior of such realistic systems in neuroscience and in numerous other applied fields in the limit where the fast processes are assumed to be happening instantaneously and the slow processes are approximated by ones where changes within characteristic time intervals of interest do not happen at all. First, it is important to introduce a number of definitions. Without loss of generality, some of the definitions, formulations, and terminology presented below are intentionally simplified to make understanding of the material easier. While an exact mathematical definition of regularly and singularly perturbed problems may be given for a general case, a corresponding intuitive notion will be presented here for simple particular situations relevant to the current discussion, where models are formulated in terms of scalar ordinary differential equations or