Understanding the Discrete Element Method: Simulation of Non-Spherical Particles for Granular and Multi-body Systems 9781118567203, 111856720X

Gives readers a more thorough understanding of DEM and equips researchers for independent work and an ability to judge methods related to the simulation of polygonal particles.


English, 484 pages, 2014


Table of contents:
UNDERSTANDING THE DISCRETE ELEMENT METHOD SIMULATION OF NON-SPHERICAL PARTICLES FOR GRANULAR AND MULTI-BODY SYSTEMS
Copyright
Contents
About the Authors
Preface
Acknowledgements
List of Abbreviations
1 Mechanics
1.1 Degrees of freedom
1.1.1 Particle mechanics and constraints
1.1.2 From point particles to rigid bodies
1.1.3 More context and terminology
1.2 Dynamics of rectilinear degrees of freedom
1.3 Dynamics of angular degrees of freedom
1.3.1 Rotation in two dimensions
1.3.2 Moment of inertia
1.3.3 From two to three dimensions
1.3.4 Rotation matrix in three dimensions
1.3.5 Three-dimensional moments of inertia
1.3.6 Space-fixed and body-fixed coordinate systems and equations of motion
1.3.7 Problems with Euler angles
1.3.8 Rotations represented using complex numbers
1.3.9 Quaternions
1.3.10 Derivation of quaternion dynamics
1.4 The phase space
1.4.1 Qualitative discussion of the time dependence of linear oscillations
1.4.2 Resonance
1.4.3 The flow in phase space
1.5 Nonlinearities
1.5.1 Harmonic balance
1.5.2 Resonance in nonlinear systems
1.5.3 Higher harmonics and frequency mixing
1.5.4 The van der Pol oscillator
1.6 From higher harmonics to chaos
1.6.1 The bifurcation cascade
1.6.2 The nonlinear frictional oscillator and Poincaré maps
1.6.3 The route to chaos
1.6.4 Boundary conditions and many-particle systems
1.7 Stability and conservation laws
1.7.1 Stability in statics
1.7.2 Stability in dynamics
1.7.3 Stable axes of rotation around the principal axis
1.7.4 Noether’s theorem and conservation laws
1.8 Further reading
Exercises
References
2 Numerical Integration of Ordinary Differential Equations
2.1 Fundamentals of numerical analysis
2.1.1 Floating point numbers
2.1.2 Big-O notation
2.1.3 Relative and absolute error
2.1.4 Truncation error
2.1.5 Local and global error
2.1.6 Stability
2.1.7 Stable integrators for unstable problems
2.2 Numerical analysis for ordinary differential equations
2.2.1 Variable notation and transformation of the order of a differential equation
2.2.2 Differences in the simulation of atoms and molecules, as compared to macroscopic particles
2.2.3 Truncation error for solutions of ordinary differential equations
2.2.4 Fundamental approaches
2.2.5 Explicit Euler method
2.2.6 Implicit Euler method
2.3 Runge–Kutta methods
2.3.1 Adaptive step-size control
2.3.2 Dense output and event location
2.3.3 Partitioned Runge–Kutta methods
2.4 Symplectic methods
2.4.1 The classical Verlet method
2.4.2 Velocity-Verlet methods
2.4.3 Higher-order velocity-Verlet methods
2.4.4 Pseudo-symplectic methods
2.4.5 Order, accuracy and energy conservation
2.4.6 Backward error analysis
2.4.7 Case study: the harmonic oscillator with and without viscous damping
2.5 Stiff problems
2.5.1 Evaluating computational costs
2.5.2 Stiff solutions and error as noise
2.5.3 Order reduction
2.6 Backward difference formulae
2.6.1 Implicit integrators of the predictor–corrector formulae
2.6.2 The corrector step
2.6.3 Multiple corrector steps
2.6.4 Program flow
2.6.5 Variable time-step and variable order
2.7 Other methods
2.7.1 Why not to use self-written or novel integrators
2.7.2 Stochastic differential equations
2.7.3 Extrapolation and high-order methods
2.7.4 Multi-rate integrators
2.7.5 Zero-order algorithms
2.8 Differential algebraic equations
2.8.1 The pendulum in Cartesian coordinates
2.8.2 Initial conditions
2.8.3 Drift and stabilization
2.9 Selecting an integrator
2.9.1 Performance and stability
2.9.2 Angular degrees of freedom
2.9.3 Force equilibrium
2.9.4 Exploring new fields
2.9.5 ODE solvers unsuitable for DEM simulations
2.10 Further reading
Exercises
References
3 Friction
3.1 Sliding Coulomb friction
3.1.1 A block on a slope
3.1.2 Static and dynamic friction coefficients
3.1.3 Apparent and actual contact area
3.1.4 Roughness and the friction coefficient
3.1.5 Adhesion and chemical bonding
3.2 Other contact geometries of Coulomb friction
3.2.1 Rolling friction
3.2.2 Pivoting friction
3.2.3 Sliding and rolling friction: the billiard problem
3.2.4 Sliding and rolling friction: cylinder on a slope
3.2.5 Pivoting and rolling friction
3.3 Exact implementation of friction
3.3.1 Establishing the difference between dynamic and static friction
3.3.2 Single-particle contact
3.3.3 Frictional linear chain
3.3.4 Higher dimensions
3.4 Modeling and regularizations
3.4.1 The Cundall–Strack model
3.4.2 Cundall–Strack friction in three dimensions
3.5 Unfortunate treatment of Coulomb friction in the literature
3.5.1 Insufficient models
3.5.2 Misunderstandings concerning surface roughness and friction
3.5.3 The Painlevé paradox
3.6 Further reading
Exercises
References
4 Phenomenology of Granular Materials
4.1 Phenomenology of grains
4.1.1 Interaction
4.1.2 Friction and dissipation
4.1.3 Length and time scales
4.1.4 Particle shape, and rolling and sliding
4.2 General phenomenology of granular agglomerates
4.2.1 Disorder
4.2.2 Heap formation
4.2.3 Tri-axial compression and shear band formation
4.2.4 Arching
4.2.5 Clogging
4.3 History effects in granular materials
4.3.1 Hysteresis
4.3.2 Reynolds dilatancy
4.3.3 Pressure distribution under heaps
4.4 Further reading
References
5 Condensed Matter and Solid State Physics
5.1 Structure and properties of matter
5.1.1 Crystal structures in two dimensions
5.1.2 Crystal structures in three dimensions
5.1.3 From the Wigner–Seitz cell to the Voronoi construction
5.1.4 Strength parameters of materials
5.1.5 Strength of granular assemblies
5.2 From wave numbers to the Fourier transform
5.2.1 Wave numbers and the reciprocal lattice
5.2.2 The Fourier transform in one dimension
5.2.3 Properties of the FFT
5.2.4 Other Fourier variables
5.2.5 The power spectrum
5.3 Waves and dispersion
5.3.1 Phase and group velocities
5.3.2 Phase and group velocities for particle systems
5.3.3 Numerical computation of the dispersion relation
5.3.4 Density of states
5.3.5 Dispersion relation for disordered systems
5.3.6 Solitons
5.4 Further reading
Exercises
References
6 Modeling and Simulation
6.1 Experiments, theory and simulation
6.2 Computability, observables and auxiliary quantities
6.3 Experiments, theories and the discrete element method
6.4 The discrete element method and other particle simulation methods
6.5 Other simulation methods for granular materials
6.5.1 Continuum mechanics
6.5.2 Lattice models
6.5.3 The Monte Carlo method
References
7 The Discrete Element Method in Two Dimensions
7.1 The discrete element method with soft particles
7.1.1 The bouncing ball as a prototype for the DEM approach
7.1.2 Using two different stiffness constants to model damping
7.1.3 Simulation of round DEM particles in one dimension
7.1.4 Simulation of round particles in two dimensions
7.2 Modeling of polygonal particles
7.2.1 Initializing two-dimensional particles
7.2.2 Computation of the mass, center of mass and moment of inertia
7.2.3 Non-convex polygons
7.3 Interaction
7.3.1 Shape-dependent elastic force law
7.3.2 Computation of the overlap geometry
7.3.3 Computation of other dynamic quantities
7.3.4 Damping
7.3.5 Cohesive forces
7.3.6 Penetrating particle overlaps
7.4 Initial and boundary conditions
7.4.1 Initializing convex polygons
7.4.2 General considerations
7.4.3 Initial positions
7.4.4 Boundary conditions
7.5 Neighborhood algorithms
7.5.1 Algorithms not recommended for elongated particles
7.5.2 ‘Sort and sweep’
7.6 Time integration
7.7 Program issues
7.7.1 Program restart
7.7.2 Program initialization
7.7.3 Program flow
7.7.4 Proposed stages for the development of programs
7.7.5 Modularization
7.8 Computing observables
7.8.1 Computing averages
7.8.2 Homogenization and spatial averages
7.8.3 Computing error bars
7.8.4 Autocorrelation functions
7.9 Further reading
Exercises
References
8 The Discrete Element Method in Three Dimensions
8.1 Generalization of the force law to three dimensions
8.1.1 The elastic force
8.1.2 Contact velocity and related forces
8.2 Initialization of particles and their properties
8.2.1 Basic concepts and data structures
8.2.2 Particle generation and geometry update
8.2.3 Decomposition of a polyhedron into tetrahedra
8.2.4 Volume, mass and center of mass
8.2.5 Moment of inertia
8.3 Overlap computation
8.3.1 Triangle intersection by using the point–direction form
8.3.2 Triangle intersection by using the point–normal form
8.3.3 Comparison of the two algorithms
8.3.4 Determination of inherited vertices
8.3.5 Determination of generated vertices
8.3.6 Determination of the faces of the overlap polyhedron
8.3.7 Determination of the contact area and normal
8.4 Optimization for vertex computation
8.4.1 Determination of neighboring features
8.4.2 Neighboring features for vertex computation
8.5 The neighborhood algorithm for polyhedra
8.5.1 ‘Sort and sweep’ in three dimensions
8.5.2 Worst-case performance in three dimensions
8.5.3 Refinement of the contact list
8.6 Programming strategy for the polyhedral simulation
8.7 The effect of dimensionality and the choice of boundaries
8.7.1 Force networks and dimensionality
8.7.2 Quasi-two-dimensional geometries
8.7.3 Packings and sound propagation
8.8 Further reading
References
9 Alternative Modeling Approaches
9.1 Rigidly connected spheres
9.2 Elliptical shapes
9.2.1 Elliptical potentials
9.2.2 Overlap computation for ellipses
9.2.3 Newton–Raphson iteration
9.2.4 Ellipse intersection computed with generalized eigenvalues
9.2.5 Ellipsoids
9.2.6 Superquadrics
9.3 Composites of curves
9.3.1 Composites of arcs and cylinders
9.3.2 Spline curves
9.3.3 Level sets
9.4 Rigid particles
9.4.1 Collision dynamics (‘event-driven method’)
9.4.2 Contact mechanics
9.5 Discontinuous deformation analysis
9.6 Further reading
References
10 Running, Debugging and Optimizing Programs
10.1 Programming style
10.1.1 Literature
10.1.2 Choosing a programming language
10.1.3 Composite data types, strong typing and object orientation
10.1.4 Readability
10.1.5 Selecting variable names
10.1.6 Comments
10.1.7 Particle simulations versus solving ordinary differential equations
10.2 Hardware, memory and parallelism
10.2.1 Architecture and programming model
10.2.2 Memory hierarchy and cache
10.2.3 Multiprocessors, multi-core processors and shared memory
10.2.4 Peak performance and benchmarks
10.2.5 Amdahl’s law, speed-up and efficiency
10.3 Program writing
10.3.1 Editors
10.3.2 Compilers
10.3.3 Makefiles
10.3.4 Writing and testing code
10.3.5 Debugging
10.4 Measuring load, time and profiles
10.4.1 The ‘top’ command
10.4.2 Xload
10.4.3 Performance monitor for multi-core processors
10.4.4 The ‘time’ command
10.4.5 The Unix profiler
10.4.6 Interactive profilers
10.5 Speeding up programs
10.5.1 Estimating the time consumption of operations
10.5.2 Compiler optimization options
10.5.3 Optimizations by hand
10.5.4 Avoiding unnecessary disk output
10.5.5 Look up or compute
10.5.6 Shared-memory parallelism and OpenMP
10.6 Further reading
Exercises
References
11 Beyond the Scope of This Book
11.1 Non-convex particles
11.2 Contact dynamics and friction
11.3 Impact mechanics
11.4 Fragmentation and fracturing
11.5 Coupling codes for particles and elastic continua
11.6 Coupling of particles and fluid
11.6.1 Basic considerations for the fluid simulation
11.6.2 Verification of the fluid code
11.6.3 Macroscopic simulations
11.6.4 Microscopic simulations
11.6.5 Particle approach for both particles and fluid
11.6.6 Mesh-based modeling approaches
11.7 The finite element method for contact problems
11.8 Long-range interactions
References
A MATLAB® as Programming Language
A.1 Getting started with MATLAB®
A.2 Data types and names
A.3 Matrix functions and linear algebra
A.4 Syntax and control structures
A.5 Self-written functions
A.6 Function overwriting and overloading
A.7 Graphics
A.8 Solving ordinary differential equations
A.9 Pitfalls of using MATLAB
A.10 Profiling and optimization
A.11 Free alternatives to MATLAB
A.12 Further reading
Exercises
References
B Geometry and Computational Geometry
B.1 Trigonometric functions
B.2 Points, line segments and vectors
B.3 Products of vectors
B.3.1 Inner product (scalar product, dot product)
B.3.2 Orthogonality
B.3.3 Outer product
B.3.4 Vector product
B.3.5 Triple product
B.4 Projections and rejections
B.4.1 Projection of a vector onto another vector
B.4.2 Rejection of one vector with respect to another vector
B.5 Lines and planes
B.5.1 Lines and line segments
B.5.2 Planes
B.6 Oriented quantities: distance, area, volume etc.
B.7 Further reading
References
Index


Understanding the Discrete Element Method
Hans-Georg Matuttis, The University of Electro-Communications, Japan
Jian Chen, RIKEN Advanced Institute for Computational Science, Japan

The aim of this book is to advance the field of granular and multi-body studies while giving readers a more thorough understanding of the discrete element method (DEM). By working through this volume, researchers will be better equipped for independent work and will develop an ability to judge methods related to the simulation of polygonal particles.

When materials are not handled as fluids, they are dealt with mostly in granular form (e.g. cement, sand, grains, powders). Granular materials are characterized by abrupt transitions from loose to dense, from flowing to static states, and vice versa. Many problems in natural disasters (earthquakes, landslides, etc.) are also of a “granular” nature. Continuum methods have been applied in these fields, but lack any intrinsic mechanism to account for the transitions, behavior that is inherently discontinuous. The “natural” approach is to use particle simulation methods, often called the “discrete element method”, where bodies in the physical system and the simulation match one to one. The field of discrete element simulation has changed little since the early 1990s, when simulations predominantly used spherical particles. The aim of this book is to show the practicability and usefulness of non-spherical discrete element simulations. Phenomena from related fields (mechanics, solid state physics, etc.) are discussed, which as test cases are sometimes not applicable, for intriguing reasons. Understanding both the pitfalls and applications will help one to predict the outcome of simulations and use the predictions for the design of future experiments.

• Introduces the discrete element method (DEM) starting from the fundamental concepts (theoretical mechanics and solid state physics), with 2D and 3D simulation methods for polygonal and polyhedral particles
• Explains the basics of coding DEM, requiring little previous knowledge of granular matter or numerical simulation
• Highlights numerical tricks and pitfalls that are usually only recognized after years of experience, using relevant simple experiments to illustrate applications
• Presents a logical approach starting with the mechanical and physical bases, followed by a description of the techniques and their applications
• Written by key authors presenting ideas on how to model the dynamics of angular particles using polygons and polyhedra
• Accompanying website includes MATLAB® programs providing the simulation code for two-dimensional convex polygons

This book is ideal for researchers and graduate students who deal with particle models in areas such as fluid dynamics, multi-body engineering, finite element methods, virtual reality, the geosciences, and multi-scale physics. Computer scientists involved with solid modeling and game programming will also find this book a useful reference for the design of physics engines.

www.wiley.com/go/matuttis


To my wife and my daughter Ayako
Hans-Georg Matuttis

To my wife, parents and younger brother
Jian Chen


UNDERSTANDING THE DISCRETE ELEMENT METHOD
SIMULATION OF NON-SPHERICAL PARTICLES FOR GRANULAR AND MULTI-BODY SYSTEMS

Hans-Georg Matuttis, The University of Electro-Communications, Japan
Jian Chen, RIKEN Advanced Institute for Computational Science, Japan


This edition first published 2014
© 2014 John Wiley & Sons, Singapore Pte. Ltd.

Registered Office: John Wiley & Sons, Singapore Pte. Ltd., 1 Fusionopolis Walk, #07-01 Solaris South Tower, Singapore 138628.

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as expressly permitted by law, without either the prior written permission of the Publisher, or authorization through payment of the appropriate photocopy fee to the Copyright Clearance Center. Requests for permission should be addressed to the Publisher, John Wiley & Sons, Singapore Pte. Ltd., 1 Fusionopolis Walk, #07-01 Solaris South Tower, Singapore 138628, tel: 65-66438000, fax: 65-66438008, email: [email protected].

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The Publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought. Library of Congress Cataloging-in-Publication data has been applied for. ISBN: 978-1-118-56720-3 Set in 10/12pt Times by SPi Publisher Services, Pondicherry, India

1 2014


About the Authors

Hans-Georg Matuttis did his Diploma and PhD on Quantum Monte Carlo methods at the University of Regensburg, Germany, and later worked on granular materials as an assistant to Professor Hans Herrmann at the University of Stuttgart. After a three-year research stay at The University of Tokyo, in 2002 H.-G. Matuttis became Associate Professor at the University of Electro-Communications (Tokyo, Japan).

Jian Chen received his Bachelor's and Master's degrees in Mechatronics Engineering from the University of Electronic Science and Technology of China. He did his PhD research on granular materials with Hans-Georg Matuttis at the University of Electro-Communications and is currently a postdoctoral researcher at the Computational Disaster Mitigation and Reduction Research Unit of RIKEN's Advanced Institute for Computational Science (Kobe, Japan).


Preface

While the discrete element method (DEM) for round particles has been around for decades (more than three if one starts with the algorithms of P. A. Cundall, more than five if one starts with the methods developed by B. Alder), the use of simulation techniques with non-spherical particles has spread only slowly, in spite of the much richer phenomenology and better geometric verisimilitude. Kepler's dictum 'ubi materia, ibi geometria' ('where you have to deal with matter, you have to deal with geometry') is particularly true for the discrete element method. Accordingly, we have focused more on concrete illustrations and motivations than on abstract derivations. This book basically consists of material that we wish we could have had in hand when we started to work with discrete element methods ourselves. We hope that the reader will find it useful.

Particle modeling needs a clear geometrical imagination, much more than modeling with continuum equations, where algebraic methods may also succeed. Geometrical arguments and approaches are given in detail, because we have found that they often don't form part of the curricula in the fields to which this book may be relevant. The reader is assumed to have reasonable familiarity with Newtonian mechanics, linear algebra, ordinary differential equations and at least one procedural programming language. We use MATLAB® as the programming language in this book, as it has proved to be a flexible tool for fast prototyping, even for complex algorithms; we have found that it improves readability and reduces development time compared to more 'traditional' programming languages.

As we intend the book to be accessible to graduates in the physical sciences, engineering and computer science, we have formulated many ideas more explicitly than if the book had been aimed at a single community alone. We have tried to make sure that the exposition is self-contained for the broadest readership we could envision. Nevertheless, depending on the reader's background, some chapters will be more easily understandable and others more difficult. It was the intention of the authors to make the book self-contained, i.e. all the important concepts are explained in the book without the need to use other reference material. In particular, 'the internet' contains some rather dubious resources for the learner.

The ultimate goal of the book is to enable the reader to write and understand DEM codes. Therefore, we have had to include additional material from other fields related to possible test cases, conceptual verification etc.


Misunderstandings about the scope and validity of test cases have derailed the schedules of many researchers for months and even years. We recommend starting with simpler algorithms and lower dimensions; only after gaining experience in these simpler settings should one move to more complex problems (such as polyhedra in three dimensions). We know of several projects which failed because the researchers began with full three-dimensional simulations right away, without first gaining experience with simpler algorithms in lower dimensions.

The exercises are not meant to 'keep the noses of students to the grindstone' but to familiarize the reader with material that is, in all likelihood, unfamiliar. The resulting calculations may also help the reader devise examples to test their programs.


Acknowledgements

We are indebted to Wang Xiao Xing, Dominik Krengel, Shi Han Ng and Robin Tenhagen for reading the chapters and making valuable suggestions. Shi Han Ng is thanked in particular for programming various set-ups and algorithms, as well as for taking photos and recording movies. Chen Wei Shen is thanked for 'giving a hand' in the pictures of the experiments. H.-G. M. would like to thank Professor Christian Lubich of the University of Tübingen for introducing him to Filippov-type solutions for Coulomb friction problems. The authors gratefully acknowledge the pleasant cooperation with Clarissa Lim and James Murphy of John Wiley & Sons and Alistair Smith of Sunrise Setting Ltd, as well as freelance copy-editor Alice Yew.


List of Abbreviations

×  three-dimensional vector product
·  scalar, matrix or quaternion product (depending on context)
•  placeholder for a variable in an operation
∝  proportional to
1  unit operator or identity matrix
ℂ  field of complex numbers
ℝ  field of real numbers
|•|  absolute value
‖•‖  length of a vector or norm of a matrix
•*  conjugate of a complex number or quaternion
•⁻¹  inverse of a matrix or quaternion or number
arctan2(y, x)  two-argument arctangent (first argument is y, not x!)
BDF  backward difference formula (Gear predictor–corrector)
BLAS  basic linear algebra subroutines
d  differential increment in dx, dt, d/dx etc.
DAE  differential algebraic equation
DDA  discontinuous deformation analysis
DEM  discrete element method
ED  event-driven
FEM  finite element method
i  imaginary unit √−1
g(···)  constraint function
MD  molecular dynamics
MPS  moving particle semi-implicit
O  order (Taylor order, Landau notation)
ODE  ordinary differential equation
OpenMP  open multi-processing (application programming interface)
q  a quaternion
q̂  a unit quaternion
qφ  the unit quaternion qφ = (cos φ, v sin φ)
$\mathrm{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases}$
$\mathrm{sgn}(x) \begin{cases} = 1 & \text{if } x > 0 \\ \in [-1, 1] & \text{if } x = 0 \\ = -1 & \text{if } x < 0 \end{cases}$


[…]; see the solid gray curve in Figure 1.15. The critically damped and over-damped cases can be found in Benenson [9]. The exponential decay of the solution, $x(t) \propto \exp(-\delta t)$, leads to a similar exponential decay of the velocities (see Figure 1.16, solid gray line). Continuum materials under vibration usually show viscous damping patterns, too, due to the dissipation mechanisms of kinetic energies in solids. Exponential decay sounds impressive, but is in fact a relatively 'weak' type of decay: the amplitude never actually reaches zero. With Coulomb friction (dry friction or sliding friction), the linear oscillator becomes

$m\ddot{x} + \mu\,\mathrm{sgn}(\dot{x}) + kx = 0,$  (1.97)

where μ is the product of the friction coefficient and the normal force, and we define the sgn function as

$\mathrm{sgn}(a) \begin{cases} = 1 & \text{for } a > 0, \\ \in [-1, 1] & \text{for } a = 0, \\ = -1 & \text{for } a < 0, \end{cases}$  (1.98)

so that the friction force exactly compensates for the external force. Note that this is different from the usual step function definition

$\mathrm{sgn}(a) = \begin{cases} 1 & \text{for } a > 0, \\ 0 & \text{for } a = 0, \\ -1 & \text{for } a < 0. \end{cases}$

Physically, the system corresponds to a spring that is fixed to a wall and connected to a block which slides on the floor nearby; see Figure 1.14(f). In this chapter the discussion will be in a hand-waving fashion; we give the exact solution for v = 0 in Chapter 3. For $\mathrm{sgn}(v) = \mathrm{sgn}(\dot{x}) = \pm 1$, the solution is composed of solutions to one of the inhomogeneous differential equations [10, 11]

$m\ddot{x} + kx = -\mu,$  (1.99)
$m\ddot{x} + kx = +\mu,$  (1.100)

or the amplitude stays constant when −kx is smaller than μ. The solutions to Equations (1.99) and (1.100) have the same periodicity as the solution to (1.92), with $\omega_0 = \sqrt{k/m}$, because for a linear ordinary differential equation, introducing a non-zero term on the right-hand side (inhomogeneity) does not change the general solution.


The effect of damping with Coulomb friction is that the branches of the piecewise solution between sign reversals of the velocity decay in magnitude (for both the amplitude and the velocity) within a linear envelope (the outer dotted lines in Figures 1.15 and 1.16). This means that the relative maxima of the positions and velocities lie along a line, and likewise for the minima, so that after a finite time, the velocity v(t) becomes zero and the amplitude x(t) becomes constant. As can be seen in Figure 1.15, the final amplitude does not have to be zero: when the spring force −kx is smaller than the friction force μ, the amplitude stays fixed, which is why we have to use the inclusion definition for the sign in (1.98). From Figure 1.16 one sees that the velocity, and with it the kinetic energy, goes to zero in finite time, so Coulomb friction is much more effective in damping out energies or vibrations than is velocity-dependent friction, especially at small velocities.

This effect has various applications. Machine parts (e.g. running gears and wheels of trains) are tested by tapping them with a hammer. If everything is in good condition, one hears a nice 'metallic' ringing sound: the sound amplitude is damped out exponentially and decays smoothly. If there are cracks, the contact between ragged surfaces damps the sound much faster due to Coulomb friction, so that it comes out as a short, ugly rattling noise. One can visualize this effect by fixing one end of a ruler on a desk and setting the other end to vibrate; usually there will be a smooth decay in the vibration amplitude, but if the vibrating end is in frictional contact with another object, the decay will be abrupt. Individually, contacts in granular assemblies are equivalent to linear oscillators with Coulomb friction. For this reason, aggregates of granular material are often much better at damping out kinetic energies than a similar piece of continuum material would be. Jugglers use grain-filled balls for practice, because such balls won't roll away when accidentally dropped; sand slopes are used in shooting ranges to catch straying bullets, while sand sacks are used for protection against aimed bullets.
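The different decay laws are easy to check numerically. The following MATLAB sketch is our own illustration, not code from the book; the parameters m, k, delta, mu and the sticking tolerance vtol are arbitrary assumptions. It integrates the viscously damped oscillator and the Coulomb oscillator (1.97) with a simple semi-implicit Euler scheme, so that the exponential envelope of the viscous case and the linear envelope with final arrest of the Coulomb case can be compared directly.

% Sketch: viscous damping versus Coulomb friction for a linear oscillator.
% Viscous case: m*x'' + 2*m*delta*x' + k*x = 0
% Coulomb case: m*x'' + mu*sgn(x') + k*x = 0, cf. Equation (1.97)
m = 1; k = 1; delta = 0.05; mu = 0.15;   % assumed parameters
dt = 1e-3; nt = 60000; t = (0:nt-1)*dt;
vtol = 2*dt*mu/m;                        % sticking tolerance (regularization)
xv = zeros(1,nt); xc = zeros(1,nt);      % positions: viscous, Coulomb
xv(1) = 1; xc(1) = 1; vv = 0; vc = 0;    % initial displacement 1, at rest
for i = 1:nt-1
    % viscous damping: exponential decay of the envelope
    vv = vv + dt*(-k*xv(i) - 2*m*delta*vv)/m;
    xv(i+1) = xv(i) + dt*vv;
    % Coulomb friction: stick when the spring force cannot overcome mu
    if abs(vc) < vtol && abs(k*xc(i)) <= mu
        vc = 0;                          % static friction: the block stays put
        xc(i+1) = xc(i);
    else
        vc = vc + dt*(-k*xc(i) - mu*sign(vc))/m;
        xc(i+1) = xc(i) + dt*vc;
    end
end
plot(t, xv, t, xc); xlabel('t'); ylabel('x(t)')
legend('viscous damping', 'Coulomb friction')

With these values, the Coulomb amplitude freezes at a finite displacement |x| ≤ μ/k after a few oscillations, while the viscous amplitude keeps shrinking but never reaches zero.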

1.4.2 Resonance

Now let us consider what happens when we drive the damped linear oscillator of Equation (1.95) by a periodic force which oscillates with angular frequency ω and maximal amplitude f̃0. To reduce the amount of algebra required for the solution, we write the periodic force in complex exponential form, so that the equation is

$\ddot{x} + 2\delta\dot{x} + \omega_0^2 x = (\tilde{f}_0/m) \exp(\mathrm{i}\omega t),$  (1.101)

where $\omega_0 = \sqrt{k/m}$. We are interested in the absolute value of the amplitude A of the stationary solution

$x(t) = A \exp(\mathrm{i}\omega t).$  (1.102)

Substituting (1.102) into (1.101) allows us to get rid of the time dependence (by canceling out factors of exp(iωt)) and hence obtain⁴

$-A\omega^2 + \mathrm{i}\,2\delta A\omega + \omega_0^2 A = f_0,$  (1.103)

⁴ This is possible because of how we captured the time dependence with a complex exponential; to formulate a solution using only real functions, about two pages of arithmetic and algebraic transformations are necessary; see, for instance, Knudsen and Hjorth [12, § 15.6].


Figure 1.17 Graph of the absolute value of the resonance amplitude, |A|, as a function of ω for the linear oscillator in Equation (1.103) with various values of δ (the curves shown correspond to δ = 0, 0.05ω0, 0.075ω0, 0.15ω0 and 0.5ω0). The dashed line gives the positions of the maxima as in formula (1.106).

with $f_0 = \tilde{f}_0/m$, which then gives

$A = \dfrac{f_0}{-\omega^2 + \mathrm{i}\,2\delta\omega + \omega_0^2}.$  (1.104)

From this, we obtain the absolute value of the complex amplitude A according to (1.42):

$|A| = \dfrac{f_0}{\sqrt{(\omega_0^2 - \omega^2)^2 + 4\delta^2\omega^2}}.$  (1.105)

(The absolute value is also more meaningful in the purely real case with δ = 0, as A(ω) changes sign from +∞ to −∞ at ω = ω0; since we are interested in the magnitude of the amplitude, the sign is not important.) The resulting amplitudes are plotted in Figure 1.17 for several values of δ. For damping 0 ≤ δ < ω0/√2, the maxima of the resonance amplitudes lie on the curve

$A_{\max} = \dfrac{f_0}{2\delta\sqrt{\omega_0^2 - \delta^2}}.$  (1.106)

For δ = 0, the amplitude increases toward infinity, i.e. an undamped system excited at the resonance frequency ω = ω0 would be destroyed, due to unlimited growth of the vibration amplitude. Note that the amplitude increases only linearly in time, so that an infinite amplitude would only be reached after an infinite amount of time; see Exercise 1.3. The right-hand side f0 exp(iωt) of Equation (1.101) contains only a t-dependence, so it is an 'external' force; terms with dependence on x only are the 'internal' forces of the system. In mathematical terminology, systems that depend only on 'x' are said to be autonomous, while those which also have a dependence on 't' are non-autonomous.
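Equation (1.105) can be evaluated directly; the following MATLAB fragment is a sketch of ours (f0 = 1 and ω0 = 1 are assumed for plotting) and reproduces resonance curves like those in Figure 1.17.

% Sketch: resonance amplitude |A(omega)| from Equation (1.105)
% for several damping constants delta, as in Figure 1.17.
f0 = 1; w0 = 1;                              % assumed amplitude and eigenfrequency
w  = linspace(0, 2*w0, 500);
hold on
for delta = [0.05 0.075 0.15 0.5]*w0
    A = f0 ./ sqrt((w0^2 - w.^2).^2 + 4*delta^2*w.^2);
    plot(w, A)                               % one curve per damping constant
end
hold off; xlabel('\omega'); ylabel('|A(\omega)|')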

1.4.3 The flow in phase space

With the results from the previous subsection, we are ready to discuss the flow of the differential equation in phase space, also called the 'attractor' of the system.


(The flow will be used later in Chapter 3 to make mathematically exact distinctions between conditions for static and dynamic friction.) In Figures 1.18–1.20, we visualize the flow in several ways. First, we plot with solid lines the trajectories in time, (x(t), v(t)), of the solution. We can also consider Newton's equation of motion in the form (1.7)–(1.8), written as

$\dfrac{\mathrm{d}}{\mathrm{d}t}\begin{pmatrix} x \\ v \end{pmatrix} = \begin{pmatrix} v \\ F/m \end{pmatrix},$  (1.107)

so that the right-hand side is equivalent to the directions

$\begin{pmatrix} \dot{x}(t) \\ \dot{v}(t) \end{pmatrix} = \begin{pmatrix} \dfrac{x(t+\delta t) - x(t)}{\delta t} \\[1ex] \dfrac{v(t+\delta t) - v(t)}{\delta t} \end{pmatrix}$  (1.108)

of the flow field: these directions are depicted as arrows in Figures 1.18–1.20. Finally, it has become traditional to discuss the transport of a set of initial conditions in phase space from time $t_0$ to time $t$:

$y(t_0) \to y(t).$  (1.109)
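The direction field (1.107)–(1.108) is easy to draw. The MATLAB sketch below is our own (m = k = 1 as in the figures, with the force F = −kx of the undamped oscillator assumed); it produces the kind of arrow plot shown in Figures 1.18–1.20.

% Sketch: direction field of Equation (1.107) for the undamped
% linear oscillator with m = 1, k = 1, i.e. F = -k*x.
m = 1; k = 1;
[x, v] = meshgrid(-1:0.2:1, -1:0.2:1);   % grid of phase-space points
dxdt = v;                                % dx/dt = v
dvdt = -k*x/m;                           % dv/dt = F/m
quiver(x, v, dxdt, dvdt)
xlabel('x'); ylabel('v'); axis equal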

Figure 1.18 Phase portrait (attractor) for the linear oscillator (m = 1, k = 1) without damping: illustration of Liouville's theorem on conservation of phase space volume.


Figure 1.19 Phase portrait for the linear oscillator (m = 1, k = 1) with viscous damping (δ = 0.1): the attractor is a whirl, where the phase space volume shrinks exponentially in spiral-shaped trajectories.

Figure 1.20 Attractors for the linear oscillator (m = 1, k = 1) with Coulomb friction (μ = 0.15): the flow of an initially simply connected phase space volume is split and deposited from above and from below at the singularity g(x, v) = (−μ ≤ x ≤ μ, v = 0).


The initial conditions usually take the shape of a cat's head, which goes back to Arnold's book on mechanics [13], but is probably originally due to Delaunay.⁵

⁵ According to Zdravkovska et al. [14, p. 82], B. N. Delaunay (1890–1980), who taught at Moscow University where Arnold studied, used to visualize affine transformations by 'transformations of a picture of a kitten'.

For the undamped linear oscillator, plotting sine against cosine from the solutions (1.93)–(1.94) gives circular trajectories, as shown in Figure 1.18. We can see that the area of the cat's head does not change, i.e. it is a 'conserved' quantity; this illustrates Liouville's theorem, which says that phase space density is conserved for 'Hamiltonian' mechanical systems, i.e. systems for which Newton's equation of motion can be written as [15]

$\dfrac{\mathrm{d}}{\mathrm{d}t}x = m^{-1}p, \qquad \dfrac{\mathrm{d}}{\mathrm{d}t}p = -\nabla_x V(x),$

where x denotes position, m is mass, p is momentum, and $\nabla_x V(x)$ is the gradient of the position-dependent potential V(x). In mathematics, such systems of ordinary differential equations are said to be 'symplectic'; in physics they are called 'Hamiltonian' or 'canonical' systems [3, 13, 16]. Among other things, these systems exhibit conservation of energy. The direction field is continuous, i.e. the mapping

$\begin{pmatrix} x(t) \\ v(t) \end{pmatrix} \to \begin{pmatrix} x(t+\delta t) + \delta x \\ v(t+\delta t) + \delta v \end{pmatrix}$  (1.110)

with infinitesimal δt, δx and δv is continuous for all initial values of x, v > 0. When damping is introduced, the amplitude in (1.96) decays exponentially; see Figure 1.19. Viscous damping leads to an exponential contraction of the cat's head, i.e. the volume spanned by the initial condition decreases during transport of the coordinates in phase space, but the shape stays basically the same. The exponential decay gives spiral- or vortex-shaped trajectories in phase space, or whirls, as they are called in the field of dynamical systems. As for the energy-conserving system in (1.110), the right-hand side functions in Equation (1.107) are also continuous from one point to another in phase space, and the direction field has no singularity; in other words, the direction change from an arrow at (x(t), v(t)) to a nearby arrow at (x(t + δt) + δx, v(t + δt) + δv) is always smooth, and the singularity (v = 0, x = 0) cannot be reached in finite time, so it is not part of the phase space for the problem.

The situation changes dramatically when we have Coulomb friction; see Figure 1.20. At the beginning the attractor resembles that in the viscous damping case: for (|μ| > x, v ≠ 0), the situation for dynamic friction, the flow is continuous. Along g(x, v) = (−μ ≤ x ≤ μ, v = 0), the flow is non-smooth. In an infinitesimal region around g(x, v) = (−μ ≤ x ≤ μ, v = 0), flow from above or from below can occur: this is the region of static friction, where the tension of the spring at finite displacement in Figure 1.14 is not strong enough to overcome the friction force acting on the block. When the cat's head approaches the line g(x, v) = (−μ ≤ x ≤ μ, v = 0), it splits up: part of the flow is transported into g(x, v) from above, another part from below. No flow is possible on the horizontal axis, either from left to right or from right to left.


This is a consequence of the fact that the right-hand side of the system (1.99)–(1.100) is not smooth: arrows coming from below face upward, arrows coming from above face downward, and along the whole line g(x, v) = (−μ ≤ x ≤ μ, v = 0) arrows have zero length. Note that while across the line the flow is not smooth, the line itself is part of the phase space of the problem and corresponds to the situation in Figure 1.14(f), where the spring is under tension but the block does not move because it is held by the friction force.

A contraction like for viscous damping has been proposed [17] for the phase space evolution of, among other systems, sheared granular materials, which implies a flow as in Figure 1.19. As these materials are, to all intents and purposes, assemblies of solid particles with Coulomb friction (except in the most artificial cases), assumption of a 'damped' Liouville equation

$\dfrac{\partial f(t)}{\partial t} = -\left[\mathrm{i}L + \Lambda\right] f(t) = -\mathrm{i}\tilde{L} f(t)$  (1.111)

to describe the phase space volume, with the solution $f(t) = \exp(-\mathrm{i}\tilde{L}t)f_0$ of Equation (1.111), which is an exponential contraction of the phase space, may be an appropriate local description in some cases; however, globally this approach is inappropriate, even for only a single frictional contact, as a comparison of Figures 1.19 and 1.20 easily shows. Such a description would easily break down for transitions from dynamic to static friction, for example from hopper flow to clogging. This provides a more esoteric justification of why one should use particle models with Coulomb friction: they allow us to access much more exotic flows in phase space than do continuum approaches alone.

While physically g(x, v) = (−μ ≤ x ≤ μ, v = 0) is reached in finite time in Figure 1.20, Filippov theory [18], the standard theory for differential equations with discontinuous right-hand sides, does not allow for singularities g(x, v) = (−μ ≤ x ≤ μ, v = 0) which have the shape of a line in the solution domain. Instead of the attractor in Figure 1.20, for singularities in the flow directions in (1.108), Filippov theory [18, Ch. 4] postulates transport along the line (in our case, along the x-axis with v = 0), but this is clearly impossible in the case of a spring with a block: the block can only change its position if its velocity is finite. If the singularities are reached only as t → ∞, this may be physically meaningful; but for the case of Coulomb friction where singularities are reached after relatively short times, or static friction where the singularity is reached after a finite time span, the mathematical theory is insufficient. Nevertheless, for particles in contact, the linear oscillator with Coulomb friction is the prototype pattern of the flow in phase space.
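The transport (1.109) can also be illustrated numerically. The MATLAB sketch below is ours and uses a circle of initial conditions instead of a cat's head; for m = k = 1 the exact flow map of the undamped oscillator is a rotation in the (x, v) plane, so the transported set changes neither shape nor enclosed area, as Liouville's theorem demands.

% Sketch: transport of a set of initial conditions in phase space
% under the undamped oscillator flow (m = 1, k = 1). The exact flow
% map of x'' = -x is a rotation, which conserves the enclosed area.
th  = linspace(0, 2*pi, 100);
pts = [0.5 + 0.2*cos(th); 0.5 + 0.2*sin(th)];   % circle of initial conditions
hold on
for t = [0 1 2]                                 % three snapshots in time
    R = [cos(t) sin(t); -sin(t) cos(t)];        % x(t) = x0*cos(t) + v0*sin(t), v = x'
    q = R*pts;
    plot(q(1,:), q(2,:))
end
hold off; xlabel('x'); ylabel('v'); axis equal

Replacing the rotation by the flow map of the viscously damped oscillator shrinks the set exponentially, as in Figure 1.19.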

1.5 Nonlinearities

Nonlinearities come up frequently in DEM simulations: even when linear interaction laws are assumed between contacting particles, the transition from non-contacting (zero force) to contacting (linear interaction) is nonlinear. The dynamics of nonlinear oscillators differ in various aspects from the dynamics of linear oscillators. The linear force law in Equation (1.92) contains no dependence of the period on the amplitude. For forces that are nonlinear and which grow more slowly than linearly in the displacement x, as in the case of the mathematical pendulum

$m\ddot{x} + \sin(x) = 0,$  (1.112)


Figure 1.21 Relationship between period and amplitude for: the linear oscillator (dashed lines); the nonlinear oscillator of Equation (1.113) with n = 2 (solid black lines); and the mathematical pendulum of Equation (1.112) (solid gray lines). One curve of each pair has amplitude 2, and the other has amplitude such that the period is 2π.

the frequency decreases with the amplitude; see Figure 1.21. For forces that grow faster than linearly in the displacement x, such as

m ẍ + |x|ⁿ sgn(x) = 0   (n > 1),    (1.113)

the oscillation frequency increases with the amplitude; see Figure 1.21. The forces which result from particles coming into contact and deforming are not exactly of this type: in that case there is only a repulsive part, with n = 2 for wedge-shaped contacts and n = 3/2 for spherically shaped contacts; see Johnson [19]. Still, although the attractive part of the interaction is missing, the frequency dependence over half a period is important to know: when the amplitudes are high (e.g. high collision velocity, large compression), the frequency is higher and the time-scale is smaller; therefore, smaller time-steps have to be used to resolve the corresponding particle contacts. In the same way as the frequency is influenced by the power of the displacement, the contact time for colliding DEM particles will be affected: in a temporary collision, instead of a full sine oscillation, only a single arc of the sine curve will be traversed by the contacting particles. Viewed in phase space, linear differential equations leave the shape of the cat's head as it is (Figures 1.18, 1.19 and the initial flow in Figure 1.20), whereas nonlinear differential equations distort the shape; see Figure 1.22 and the final stage of the flow in Figure 1.20.
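The statements about the period can be checked by direct integration. The sketch below measures the period of (1.112) and of (1.113) with n = 2 (taking m = 1) by integrating from x(0) = A, v(0) = 0 until the velocity crosses zero from below, i.e. over half a period; the amplitudes and tolerances are our own illustrative choices.

% Sketch: period versus amplitude for the pendulum (1.112) and the
% oscillator (1.113) with n = 2; the event stops the integration at
% the first upward zero crossing of v, i.e. after half a period.
pend = @(t,y) [y(2); -sin(y(1))];                % Equation (1.112)
pow2 = @(t,y) [y(2); -abs(y(1))^2*sign(y(1))];   % Equation (1.113), n = 2
opts = odeset('Events', @(t,y) deal(y(2), 1, 1), ...
              'RelTol', 1e-10, 'AbsTol', 1e-12);
amps  = 0.2:0.2:2.4;                             % illustrative amplitudes
Tpend = zeros(size(amps));  Tpow = Tpend;
for k = 1:numel(amps)
    [~, ~, te] = ode45(pend, [0 100], [amps(k); 0], opts);
    Tpend(k) = 2*te(1);                          % full period = twice the half period
    [~, ~, te] = ode45(pow2, [0 100], [amps(k); 0], opts);
    Tpow(k) = 2*te(1);
end
plot(amps, Tpend, 'o-', amps, Tpow, 's-')
xlabel('amplitude A'), ylabel('period T')
legend('pendulum: T grows with A', '|x|^2 force: T shrinks with A')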

1.5.1 Harmonic balance

The graphs in Figure 1.21 were produced via numerical integration. Analytical approaches to computing the amplitude dependence of the frequency are possible via the method of harmonic balance, i.e. by expanding the solution in a Fourier series (a sum of trigonometric functions) and considering the leading terms. For an oscillator with a third-order term (the Duffing oscillator)

m ẍ + k x + k̃ x³ = 0,



Figure 1.22 Phase flow for the mathematical pendulum of Equation (1.112): two phase space volumes chosen as cats’ heads become distorted during their transport through phase space, due to the sin(x) nonlinearity in (1.112).

we first rewrite the equation as

ẍ + ω₀² x + ε x³ = 0,    (1.114)

with ω₀² = k/m and ε = k̃/m,

and then approximate it by ẍ + K̃(x) = 0. For small nonlinearities, we can assume solutions similar to those of the linear oscillator, of the form x̃(t) = A cos ωt, where instead of ω₀ we have to deal with the as-yet-unknown ω. Using the trigonometric identity cos 3θ = 4 cos³θ − 3 cos θ, we obtain the expansion of K̃(x̃(t)) in terms of ω and its multiples as

K̃(x̃(t)) = ω₀² x̃ + ε x̃³ = ( ω₀² + (3/4) ε A² ) A cos ωt + (1/4) ε A³ cos 3ωt.

Neglecting the third harmonic (the term with dependence on 3ωt) and substituting this expression for K̃(x̃(t)) into Equation (1.114) yields the linearization

ẍ + ω₀² ( 1 + (3/4) ε A²/ω₀² ) x = 0.    (1.115)


So, for the nonlinear oscillator of Equation (1.114), the amplitude dependence of the frequency is approximately

ω ≈ ω₀ √( 1 + (3/4) ε A²/ω₀² ).    (1.116)

This agrees with the amplitude–frequency behavior in Figure 1.21 (though, strictly speaking, the nonlinearities in (1.113) with n = 2 and n = 3/2 cannot be expanded with leading terms in x³): forces that grow faster than linearly lead to an increase of the frequency with increasing amplitude, while forces growing at a weaker rate lead to a decrease of the frequency with increasing amplitude. The resulting effect on the collision duration and on the choice of time-step has already been discussed in the previous section.
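The quality of the leading-order estimate can be judged by comparing it with direct integration; in the following sketch, the values of ω₀ and ε are our own illustrative choices, and the numerical frequency is again obtained from the measured half period.

% Sketch: harmonic-balance estimate (1.116) versus the numerically
% measured frequency of x'' + w0^2*x + ep*x^3 = 0.
w0 = 1;  ep = 0.4;                                % illustrative parameters
duff = @(t,y) [y(2); -w0^2*y(1) - ep*y(1)^3];
opts = odeset('Events', @(t,y) deal(y(2), 1, 1), 'RelTol', 1e-10);
A = 0.2:0.2:2.0;
w_num = zeros(size(A));
for k = 1:numel(A)
    [~, ~, te] = ode45(duff, [0 50], [A(k); 0], opts);
    w_num(k) = pi/te(1);                          % omega = 2*pi/(2*half period)
end
w_hb = w0*sqrt(1 + 0.75*ep*A.^2/w0^2);            % Equation (1.116)
plot(A, w_num, 'o', A, w_hb, '-')
xlabel('amplitude A'), ylabel('\omega'), legend('numerical', 'harmonic balance')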

1.5.2 Resonance in nonlinear systems

Resonance in nonlinear systems can be discussed analogously to the linear case in § 1.4.2. We make an ansatz for the solution,

x(t) = A exp(iΩt),    (1.117)

where the frequency Ω is the frequency of the external excitation, and add damping, so that from (1.114) we obtain

ẍ + 2δ ẋ + ω₀² x + ε x³ = f₀ exp(iΩt).    (1.118)

Here, due to the nonlinearity of the system, Ω will depend not only on the fundamental frequency ω₀, the nonlinear coefficient ε and the damping δ, but also on the amplitude A of the solution and the amplitude f₀ of the external excitation. Using the harmonic balance approach of the previous subsection, Equation (1.118) simplifies to

ẍ + 2δ ẋ + ω₀² ( 1 + (3/4) ε A²/ω₀² ) x = f₀ exp(iΩt).    (1.119)

Plugging in the ansatz from (1.117), as in the linear case, we can eliminate the dependence on exp(iΩt) and are left with

−AΩ² + 2δAΩ i + ω₀² ( 1 + (3/4) ε A²/ω₀² ) A = f₀.    (1.120)

Instead of solving for A as a function of Ω, which would lead to a third-order equation, let us solve the second-order equation for Ω in terms of A. Taking absolute values in (1.120) gives the two solutions

Ω₁,₂² = ω₀² + (3/4) ε A² − 2δ² ± √( f₀²/A² + 4δ² ( δ² − ω₀² − (3/4) ε A² ) );    (1.121)



Figure 1.23 Curves of the relation between the amplitude A and the frequency Ω in harmonic balance, for the parameters ε = 0.4ω₀², f₀ = 1.5ω₀² and various values of δ in Equation (1.121). The dashed curve represents the relation between amplitude and frequency for free, undamped oscillations, i.e. the solution to (1.121) with δ = 0 and f₀ = 0. For smaller ε, the curve will be more upright; for negative ε, it will be tilted towards the left.

Figure 1.24 Hysteresis curve for the resonance of the nonlinear oscillator: when the frequency sweeps in a quasi-stationary way from lower to higher Ω, the resonance curve follows the path ABCEF, whereas from higher to lower Ω it follows the path FEDBA.

so the Ω₁,₂ indeed depend on all the other parameters in Equation (1.119). The graph of the real parts of Ω₁,₂ (only these are physically meaningful) is shown in Figure 1.23. Compared with the resonance curve for the linear oscillator, the cusp is tilted to the right for ε > 0 (and it would be tilted to the left for ε < 0). Because solving Equation (1.121) for A would amount to finding the roots of a third-order equation, mathematically there are up to three solution points for a single value of Ω, i.e. there may be several amplitudes for the same frequency; which one of these is assumed by the system depends on the history. In Figure 1.24, possible transitions between the states are sketched: for an increase of Ω from point A to point F, the amplitude will follow the path ABCEF; for a decrease of Ω from point F to point A, the amplitude will follow the path FEDBA. The amplitudes between B and D (gray dashed line in Figure 1.24) will usually not be assumed by the system. The phenomenon of different amplitudes being


selected in the range between Ω_BD and Ω_CE, depending on whether the control parameter is increased or decreased, is called hysteresis. If we assume that a granular system is composed of particles with the nonlinear contacts described by the equations in this section, and if we assume that the whole system inherits the properties of the contacts, then for reasonably strong nonlinearities it becomes likely that certain vibration amplitudes cannot be realized: either too-large or too-small excitations take place. The authors have found such behavior in vibrated granular materials even in experiments: for some vibrated systems, convection was observed only for amplitudes larger or smaller than the actually desired amplitude, at which the system stood still.
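The hysteresis loop of Figure 1.24 can be reproduced by sweeping the driving frequency quasi-stationarily up and then down through the resonance, carrying the oscillator state over from one frequency step to the next. The following sketch does this for the damped Duffing oscillator (1.118), reusing the parameter values quoted in the caption of Figure 1.23 (ε = 0.4ω₀², f₀ = 1.5ω₀², δ = 0.1ω₀); the sweep schedule is our own choice, and the phase of the drive is restarted at each step, which is acceptable for a demonstration.

% Sketch: quasi-stationary frequency sweep through the Duffing
% resonance (1.118); the sweep must be slow enough for the
% oscillator to settle at each frequency.
w0 = 1;  ep = 0.4;  de = 0.1;  f0 = 1.5;
Om = linspace(0.5, 2.5, 50);
hold on
for dirn = [1 -1]                                 % sweep up, then down
    Oms = Om;  if dirn < 0, Oms = Om(end:-1:1); end
    y = [0; 0];  A = zeros(size(Oms));
    for k = 1:numel(Oms)
        rhs = @(t,z) [z(2); -2*de*z(2) - w0^2*z(1) - ep*z(1)^3 ...
                      + f0*cos(Oms(k)*t)];
        [~, Z] = ode45(rhs, linspace(0, 300, 3001), y);
        y = Z(end,:).';                           % carry the state to the next step
        A(k) = max(abs(Z(2001:end,1)));           % amplitude after the transient
    end
    plot(Oms, A, '.-')
end
xlabel('\Omega'), ylabel('A(\Omega)'), legend('sweep up', 'sweep down')

Between the two jump frequencies, the upward and downward sweeps can settle on different branches, which is the hysteresis sketched in Figure 1.24.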

1.5.3 Higher harmonics and frequency mixing

When we investigate physical systems, we input an external influence I (e.g. a force) and look at the response R of the system (e.g. the deformation); in the simplest case, there may be a linear dependence R = aI. For a periodic input I = cos ωt with frequency ω, the displacement will then follow a temporal variation with the same frequency. When we have a nonlinear system, the nonlinear response can be expanded as a Taylor series; for example, to second order,

R = a₁ I + a₂ I².    (1.122)

The response to a periodic input I = B cos ωt can then be rewritten via the trigonometric identity cos²θ = ½(cos 2θ + 1), with θ = ωt, as

R = a₁B cos ωt + a₂B² cos² ωt = a₁B cos ωt + (a₂B²/2) cos 2ωt + a₂B²/2.    (1.123)

In other words, the response will consist of a part with the original frequency ω, another part with the doubled frequency 2ω, and a displacement a₂B²/2 from the original equilibrium. A striking example from optics of second-harmonic generation by frequency doubling is the emission of blue light from an optically active target which is irradiated by a red laser of high intensity. For mechanical systems such as granular materials, we may also obtain an output spectrum that differs from the input spectrum (i.e. different frequencies, different wavelengths). There is another important consequence for disordered granular materials with nonlinear characteristics, i.e. particle contacts that obey nonlinear interaction laws: for sound waves with different finite amplitudes B passing through the same initial configuration, there may be a different reconfiguration of the granular matrix, of magnitude a₂B²/2 in (1.123), which in turn affects the sound propagation; see [20] for a signature of such reordering in a DEM simulation of sound propagation through a system of polyhedral particles. Due to the strong frictional damping in granular materials, it will not be possible to reduce the amplitude too much, or else no output signal R can be measured at all. The generation of


higher harmonics, not only doubling the frequency, can be derived mathematically using the trigonometric identities

cos 3θ = 4 cos³θ − 3 cos θ

for frequency tripling,

cos 4θ = 8 cos⁴θ − 8 cos²θ + 1

for frequency quadrupling, and

cos nθ = cosⁿθ − C(n,2) sin²θ cosⁿ⁻²θ + C(n,4) sin⁴θ cosⁿ⁻⁴θ − · · · + (−1)ᵏ C(n,2k) sin²ᵏθ cosⁿ⁻²ᵏθ + · · ·

for the general case of nth-order harmonics, where C(n,k) denotes the binomial coefficient. If there are oscillations with more than one input frequency, then there will be multiplicative terms; for example, with two input frequencies A cos ω₁t and B cos ω₂t, the I² term in (1.122) will become

(A cos ω₁t + B cos ω₂t)² = A² cos² ω₁t + 2AB cos ω₁t cos ω₂t + B² cos² ω₂t,

so we have a product term cos ω₁t cos ω₂t. Using the trigonometric formula

cos θ cos φ = ½ [ cos(θ + φ) + cos(θ − φ) ],

we obtain 'sum frequency mixing' with (ω₁ + ω₂) and 'difference frequency mixing' with (ω₁ − ω₂). For many materials, nonlinear effects can often be 'argued away' based on small pre-factors. However, for the granular materials that we wish to study with the discrete element method, damping is often considerable, so one cannot work with small amplitudes, even in small laboratory experiments. Apart from that, some granular phenomena, such as landslides and earthquakes, naturally come with large amplitudes. Because of all these nonlinear effects which can modify the original frequency spectrum, it is not possible to rely on runtime experiments with mixed frequency spectra: waves A(t, x, ω) emitted at t₀ with a given frequency into the sample at one end may be damped out and not reach the detector at the other end of the sample at all; on the other hand, waves B(t, x, ω̃) generated in between from other frequencies may reach the detector at times unrelated to t₀; see Shourbagy et al. [21] for a discussion of real data.
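Frequency doubling and mixing can be made visible directly from Equation (1.122) by feeding a two-frequency signal through a quadratic response and looking at the spectrum; in the following sketch, the coefficients a₁, a₂ and the frequencies ω₁, ω₂ are arbitrary illustrative values.

% Sketch: spectrum of the quadratic response (1.122) to a
% two-frequency input; peaks appear at w1, w2 (linear part) and at
% 2*w1, 2*w2, w1+w2 and |w1-w2| (quadratic part).
a1 = 1;  a2 = 0.5;  w1 = 1.0;  w2 = 1.4;          % illustrative values
dt = 0.01;  t = 0:dt:400;
I  = cos(w1*t) + cos(w2*t);                       % two-frequency input
R  = a1*I + a2*I.^2;                              % response, Equation (1.122)
S  = abs(fft(R - mean(R)));                       % spectrum without the DC shift
w  = 2*pi*(0:numel(t)-1)/(numel(t)*dt);           % angular frequency axis
plot(w(w < 4), S(w < 4))
xlabel('\omega'), ylabel('|fft(R)|')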

1.5.4 The van der Pol oscillator

An oscillator that exhibits some of the nonlinear frequency behavior discussed above is the forced van der Pol oscillator

ẍ − μ(1 − x²) ẋ + x = η sin(ωt),    (1.124)



Figure 1.25 Position x(t) (left column), phase portrait (middle column) and power spectrum with maximum value normalized to 1 (right column) for the van der Pol oscillator (1.124) with μ = 0.2, ω = 1.15 and the values of η shown at the left of each row. For computation of the power spectrum, the gray portions of the x(t) curves (up to t ≈ 45) were omitted, and data x(t) with t up to about 1050 was used.

where the x²ẋ term is nonlinear, of third order. The autonomous system (without explicit time dependence, i.e. with η = 0) oscillates with frequency ω = 1 (see Figure 1.25, top row). The graph of x(t) is not exactly sinusoidal, so the peak of the power spectrum, i.e. of the absolute value of the Fourier transform (see § 5.2.2), is broadened around the fundamental frequency. When, for μ = 0.2, the external forcing is increased to η = 0.04, we see in the middle row of Figure 1.25 that another peak appears at a new, higher frequency, as well as a smaller one at ω = 0.85 and a tiny one at ω ≈ 0.7. This indicates the presence of difference frequency mixing. For larger forcing with η = 0.4, the difference mixing spreads out over the whole spectrum, and the peak at the driving frequency nearly reaches the amplitude of the peak at the eigenfrequency ω = 1 of the unforced oscillator (Figure 1.25, bottom row). The Poincaré–Bendixson theorem prohibits the occurrence of chaos (in the exact mathematical sense) in a continuous dynamical system in the plane, so the van der Pol oscillator (which has only two coordinates, x and v) can have only a discrete spectrum. The Fourier transform used in the right column of Figure 1.25 gives a peak in the spectrum for each participating frequency ω. The peak is of finite width, even for the unique stable


trajectory which exists when η = 0. Therefore, the Fourier transform is not the optimal tool for analyzing whether a spectrum is continuous or not. In the next section, we will discuss a model which indeed exhibits a continuum of states, and introduce a method of analysis that does not require use of the Fourier transform.
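A power spectrum like those in the right column of Figure 1.25 can be computed along the following lines; μ = 0.2, ω = 1.15, the discarded transient (up to t ≈ 45) and the record length (t up to about 1050) follow the caption of Figure 1.25, while the sampling interval and the initial condition are our own choices.

% Sketch: power spectrum of the forced van der Pol oscillator (1.124).
mu = 0.2;  om = 1.15;  eta = 0.4;                 % eta as in the bottom row of Figure 1.25
vdp = @(t,y) [y(2); mu*(1 - y(1)^2)*y(2) - y(1) + eta*sin(om*t)];
dt  = 0.05;  tout = 0:dt:1050;
[~, Y] = ode45(vdp, tout, [1; 0]);
x = Y(tout > 45, 1);                              % drop the transient
S = abs(fft(x - mean(x)));  S = S/max(S);         % spectrum, maximum normalized to 1
w = 2*pi*(0:numel(x)-1)/(numel(x)*dt);
semilogy(w(w < 2), S(w < 2))
xlabel('\omega'), ylabel('|fft(x(t))|')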

1.6 From higher harmonics to chaos

1.6.1 The bifurcation cascade

As the strength of the nonlinearity increases, phenomena can occur which are unexpected from the point of view of 'linearized' mechanics. Such phenomena can affect the observables in a computation and the accuracy with which specifications for experiment and simulation have to be given, and there may be considerable scattering of data even when the initial conditions are 'nearly identical' or when the system is perturbed a little so as to have 'slightly different' dynamics. Even if only the time-step of the computation is changed, for large enough particle numbers the configuration may evolve along totally different trajectories. While there are many treatises on chaos in mechanics, few are directly applicable to DEM simulations; here we shall give an overview of phenomena that can actually affect the development of DEM programs. The generation of higher harmonics means that for a given discrete spectrum of input frequencies S_in = {ω₁, ω₂, . . . }, the system could respond with an output spectrum S_out = {ω̃₁, ω̃₂, . . . } that is different but still discrete. Beyond that, there is the possibility of going from discrete to continuous spectra in a bifurcation scenario: as the nonlinearity parameter (called η in the following) increases, the response parameter could split into two branches repeatedly, until a continuum of states ('chaos') is reached; see Figure 1.26.

1.6.2 The nonlinear frictional oscillator and Poincaré maps

We consider the differential equation for the nonlinear friction oscillator ('stick–slip oscillator'),

ẍ + x + a [ μ(1) + μ(ẋ − 1) sgn(ẋ − 1) ] = γ cos(ηt),    (1.125)


Figure 1.26 Bifurcation scenario of a variable x, showing successive (not necessarily symmetric) period-doubling up to a continuum of states as an external nonlinearity parameter η is increased, leading to the development of chaos.



Figure 1.27 A physical system with the behavior of the differential equation (1.125): a mass on a conveyor belt with several couplings, which alternates between sliding and sticking.


Figure 1.28 Graphs of the velocity-dependent friction law μ(v) of (1.126) and different combinations of terms in the Stribeck friction expression in Equation (1.125).

where we use η rather than ω to represent the frequency, indicating that it will be our nonlinearity parameter. Here we take the sign function of (1.98), a = 10, and a velocity-dependent friction law

μ(v) = (μ₀ − μ₁)/(1 + λ₀|v|) + μ₁ + λ₁|v|²,    (1.126)

where the pre-factors of the velocity dependence are λ₀ = 1.42 and λ₁ = 0.01. The coefficient of static friction, μ₀ = 0.4, is larger than the coefficient of dynamic friction, μ₁ = 0.1. The velocity dependence is sketched in Figure 1.28; note that μ(v) is symmetric in v, so the physically correct sign dependence requires the multiplication with sgn(ẋ − 1) in Equation (1.125). Velocity-dependent characteristics similar to μ(v) sgn(v) are sometimes referred to as 'Stribeck friction'; an example is the friction between violin strings and the rosin-coated violin bow [22, p. 284]. A physical system corresponding to Equation (1.125) is depicted in Figure 1.27: a mass connected to a spring with spring constant 1 slips or sticks on a belt, with the mutual friction given by (1.126). The frictional oscillator of Equation (1.125), like the van der Pol oscillator, has only a single position coordinate x and a single velocity coordinate v, but it is not a purely two-dimensional system. Our at-first-glance elusive definition of the sign function in (1.98), which leaves λ in the range −1 ≤ λ ≤ 1 such that the external force can be compensated, in fact includes an additional parameter (the 'Conley index'; see Kunze [23]) which can act as a further dimension of the problem. In § 3.3.2, we will show how the computation of λ can be performed in a 'numerically exact' manner (i.e. with controllable discretization errors,


and without any modeling assumptions). Because of the variation of λ, the system exhibits true chaos without running afoul of the Poincaré–Bendixson theorem, which forbids chaos in purely two-dimensional continuous systems. For discrete systems, chaotic behavior (in the sense of continuous distributions of the xₙ₊₁) is possible even in one dimension, as for the logistic map xₙ₊₁ = ηxₙ(1 − xₙ); see § 1.6.3. The corresponding continuum model, the logistic equation

dx/dt = ηx(1 − x),

has the explicit solution

x(t) = 1 / [ 1 + (1/x₀ − 1) exp(−ηt) ],

which is not chaotic at all. This should serve as a warning to anyone who tries to model the physical behavior of systems of discrete particles with continuum approaches: the same dynamics is not necessarily accessible when one goes from discrete to continuous models in a given dimension. The solutions to the nonlinear friction equation (1.125) vary strongly with η, as can be seen from the equilibrium trajectories in Figure 1.29 (i.e. trajectories omitting the initial part of the solution); depending on η, the solutions may differ considerably, and one might guess that a given solution is periodic, or that it is not. To make it easier to investigate the periodic dynamics of the system and its dependence on the parameter η, instead of looking at the Fourier transform as in § 1.5.4, we will investigate the Poincaré map (or Poincaré cut, as it is obtained as an intersection with the plane at a given η), which is the intersection of the trajectory in phase space with a plane defined at a certain velocity (the Poincaré section); see Figure 1.30. This reduces the effective dimension of the system by 1. Instead of n peaks in the Fourier transform, for suitably chosen (half-)planes (we will choose v = 0


Figure 1.29 Some trajectories for different values of the parameter η in the nonlinear friction oscillator equation (1.125). The cats' heads are not shown, but they would all be contracted onto the lines of the trajectories.



Figure 1.30 Selected trajectories for various values of the parameter η in the nonlinear friction oscillator equation (1.125). Intersection points between the trajectory x(t) and the plane v = 0 are marked by crosses; further (numerically computed) intersection points for this Poincaré map are marked by gray dots and are replotted in Figure 1.31 in two dimensions.


Figure 1.31 Return map (Poincaré map) for the nonlinear friction oscillator of (1.125), obtained from the Poincaré section at v = 0. Values of x(t) ≥ 0 are plotted for different values of η; the two insets display successively magnified phase space volumes to show the fine structure.

for x > 0) one finds n intersection points between the trajectory and the plane. The return map is plotted in Figure 1.31: the nonlinear frictional oscillator alternates between oscillating among a set of discrete values (periodic dynamics) and visiting a range of practically continuously distributed values (chaos) at different values of η. This alternating behavior is called intermittency. The practically continuous spectrum is a sign of mechanical chaos: initially close trajectories can diverge arbitrarily far. Attractors exhibiting this kind of behavior


are also called 'strange attractors'. Chaos is the most highly nonlinear form of nonlinearity: the short-term behavior is predictable, but the long-term behavior is not. Although individual trajectories are unpredictable, there is a definite mathematical structure that allows one to predict in which parameter region chaos will occur. Nevertheless, due to the finite errors which are inherent in modeling a system, one may obtain practically random behavior from systems which are to all intents and purposes deterministic. Note, however, that although the distribution of values in a chaotic system is continuous, it is by no means uniform, as can be seen from the shading in Figure 1.31. Because, moreover, the order in which the continuum is sampled is difficult to predict, Poincaré maps of the chaotic regime cannot be used as, for instance, random number generators, for which there are better alternatives (see, e.g., vol. 2 of [24]).
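A single column of a return map like Figure 1.31 can be generated with MATLAB's event location, which records crossings of the plane v = 0 without terminating the integration. Because the sign function in (1.125) is non-smooth, the sketch below regularizes it as tanh(·/w) with a small width w; this smoothing, the forcing amplitude γ = 0.4 and the tolerances are our own illustrative choices (the 'numerically exact' treatment of λ is the subject of § 3.3.2).

% Sketch: Poincare section at v = 0 for the frictional oscillator
% (1.125) with the friction law (1.126); sign() is smoothed so that
% a standard integrator can be used.
a = 10;  gam = 0.4;  eta = 1.1529;                % gam is an illustrative choice
mu  = @(v) (0.4 - 0.1)./(1 + 1.42*abs(v)) + 0.1 + 0.01*abs(v).^2;  % (1.126)
sg  = @(v) tanh(v/0.01);                          % smoothed sign function
rhs = @(t,y) [y(2); -y(1) - a*(mu(1) + mu(y(2)-1).*sg(y(2)-1)) ...
              + gam*cos(eta*t)];
opts = odeset('Events', @(t,y) deal(y(2), 0, -1), 'RelTol', 1e-8);
[~, ~, te, ye] = ode45(rhs, [0 2000], [2; 0], opts);
keep = te > 200 & ye(:,1) >= 0;                   % drop the transient, keep x >= 0
plot(eta*ones(nnz(keep),1), ye(keep,1), 'k.')     % one column of the return map
xlabel('\eta'), ylabel('x at v = 0')

Looping over a grid of η values and collecting the recorded x values yields the full return map.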

1.6.3 The route to chaos

The sequential growth in complexity of the dynamics with the strength of the nonlinearity, as in the bifurcation scenario, is sometimes called the 'route to chaos'. For a linear system, there is a single mode (e.g. velocity, frequency, wavenumber, position, or a combination of these). When nonlinearity is involved, additional peaks can be observed in the spectrum. Eventually there is a transition from a spectrum of densely positioned peaks to a continuous spectrum—to chaos. Even in the chaotic case, this does not mean that the probability density is the same everywhere, as can be seen from Figure 1.31, which definitely shows structure even in the chaotic region. The classical bifurcation scenario assumes that each stage involves a doubling of peaks, but that is not what we see for our nonlinear frictional oscillator in Figure 1.31. Some return maps are self-similar, or fractal; this is the case for the final (stationary) values of the discrete iteration known as the 'logistic map',

xₙ₊₁ = ηxₙ(1 − xₙ),    (1.127)

with nonlinearity parameter η. Self-similarity means that if one magnifies a portion of the diagram, one sees basically (and in some cases, after transformation of the axes, exactly) the same overall structure as in the original; see Figure 1.32, where successively magnified portions of the map are shown. The Poincaré map of the frictional oscillator in Figure 1.31 is not fractal; the Coulomb friction seems to break the scale-invariance that is inherent to the return maps of many nonlinear systems. This means that one has to be careful when adapting concepts of nonlinear theory to realistic mechanical systems, especially granular materials. Particle size, friction and other physical properties lead to characteristic dynamics at different scales, which may be incompatible with aspects of nonlinear systems such as self-similar return maps. While chaos itself inhibits the computation of individual trajectories in accordance with experimental data, it may actually be an asset for the theoretical prediction of statistical properties of many-particle systems. Molecular chaos, the assumption that the velocities of colliding particles are uncorrelated and independent of position (Boltzmann's 'Stosszahlansatz', or collision-number approach), underlies many analytical methods for collision-dominated particle systems, including granular particles at low densities. In fluid mechanics, chaos is equivalent to turbulence, i.e. a continuous size distribution of vortices from the largest to the smallest length scales. In fluid dynamics, 'routes to chaos' via bifurcation can evolve simultaneously in the same system at different places: for the separation flow in a transitional boundary layer with an impinging shock-wave as external forcing, the spatial and temporal



Figure 1.32 Return map for the logistic map of Equation (1.127). Successive insets show magnified detail of the previous map; while having different scales, all three plots show the same structure, demonstrating the fractal (self-similar) nature of the map.

development of the first vortex at the impinging point towards the vortex field further downstream (see [25]) follows the bifurcation cascade in Figure 1.26. It cannot be excluded that different stages of the development of chaos might occur simultaneously in granular systems simulated with the discrete element method.
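For the logistic map (1.127), a bifurcation diagram like Figure 1.32 takes only a few lines; the numbers of discarded and recorded iterates below are illustrative choices.

% Sketch: bifurcation diagram of the logistic map (1.127).
hold on
for eta = linspace(2.5, 4.0, 1200)
    x = 0.5;
    for n = 1:300,  x = eta*x*(1 - x);  end       % discard the transient
    xs = zeros(1, 150);
    for n = 1:150,  x = eta*x*(1 - x);  xs(n) = x;  end
    plot(eta + 0*xs, xs, 'k.', 'MarkerSize', 1)   % stationary values at this eta
end
xlabel('\eta'), ylabel('x_n')

Zooming into the diagram (e.g. with axis([3.84 3.86 0.44 0.56])) reproduces the self-similar structure of the insets in Figure 1.32.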

1.6.4 Boundary conditions and many-particle systems

The character of the nonlinearity may be not only a matter of the dynamics of the physical system but also of the boundary conditions. In Figure 1.33 we contrast the trajectories in a conventional billiard geometry and in a ‘stadium billiard’ geometry, for constant absolute velocity; the dynamics is that of a single particle which gets reflected at the boundaries. In the conventional billiard case, the trajectories are parallel, whereas in the stadium billiard case they diverge and, for certain types of boundaries, become chaotic [26]. This means that sharpness of corners (here, of the system boundaries) is not in itself a guarantee of the existence of nonlinearities. As the divergence of initially close trajectories may be desired, for example when considering mixing in hoppers, one has to pay proper attention to the shape of the boundaries. For sharp corners rather than flat edges, the character of the nonlinearity can be assumed to increase. Especially for particle systems with low density, boundary and initial conditions will have considerable influence on the dynamics, beyond mere interaction. For the simulation of accretion disks via smoothed particle hydrodynamics (SPH), a symmetric choice of initial positions and velocities has been found to cause axisymmetric stripes at a later stage in the simulations [27], which overlay the inherent instabilities of the system [28]. As SPH uses more interaction partners and stronger averaging than the discrete element method, one has to be even more careful with ‘harder’ (more nonlinear) interactions than in DEM simulations. Chaos can easily occur in mechanical multi-body systems: already the double-pendulum, which has only two degrees of freedom, can exhibit chaos [29]. In dry granular materials, there are several aspects which contribute to the nonlinear character on the level of individual particles: the first is the transition from no interaction for separated particles to a finite interaction



Figure 1.33 Effect of the boundary condition on the nonlinearity of the system (manifested here as the divergence of trajectories): trajectories of a system with rigid reflection when the boundary conditions are shaped according to a conventional billiard table (above) or a ‘stadium billiard’ table (below). The same initial velocity is assumed in both cases; the time evolution of points from the set of initial conditions (cat’s head) is shown in black, while selected trajectories are drawn in gray.

for particles in contact, which may actually be more decisive than the second aspect, which is the detailed nonlinear power of the interaction. Computationally, chaos was discovered by E. Lorenz, who found, in a nonlinear oscillator system with three variables, wildly different solution trajectories from only slightly different initial conditions [30]. One should not be surprised to encounter this behavior in discrete element solutions as well. The Euler equations of motion for rotation, (1.35)–(1.37), are themselves nonlinear; we have observed in polyhedral particle simulations that minimal changes (even a mere reduction in the step-size or choice of a different order in the summation of the forces) could lead to a strong divergence of the orientation of the particles, although at least in the beginning the positions of the center of mass were not affected.

1.7 Stability and conservation laws

Stability is the notion that a system ‘does not change much’ under a perturbation. This means that if we repeat the same experiment (or calculation, or simulation) with slightly different initial conditions, the outcome should also not change much. Here we review some basic ideas



Figure 1.34 Stability and instability for a pendulum resting at different stationary points. After a small displacement δ from the upper stationary point, the bob moves away from the position, which is therefore an unstable state. The bob always returns to the lower stationary point after a small displacement, so this position is a stable equilibrium.

from stability theory, but will not go into details: while the general notion is important for simulations of mechanical systems, almost all DEM systems will turn out to be unstable in the sense of classical stability in mechanics, which was devised more with celestial mechanics in mind than with the aim of describing friction- and dissipation-influenced phenomena on Earth. Nevertheless, the concept (though not the mathematical theory) of stable and unstable quantities is useful in helping us focus on appropriate observables in particle simulations. Further, we outline which conservation laws are suitable for testing the quality of DEM simulations.

1.7.1 Stability in statics

Mechanical stability, or lack thereof, is usually defined with respect to (not necessarily one-dimensional) stationary points x_s (also called equilibrium positions [31, p. 797]) of a physical system. If for x(t) = x_s, v(t) = 0 the position stays at x_s for all times, x_s is said to be a stationary point. If after a small deflection δ from x_s the system stays close to x_s, the stationary point is stable; if the system moves away after a small deflection δ, the stationary point is unstable [32, p. 166]. The bob of the pendulum in Figure 1.34 has two stationary positions, at the highest and lowest points; the upper one is unstable and the lower one is stable. The formal definition of stability is that for any ε > 0 there exists a δ(ε) > 0 such that whenever

|x(0) − x_s| < δ,    (1.128)


we have

|x(t) − x_s| < ε    (1.129)

for all t > 0. This means that solutions which start 'close enough' to the equilibrium (within a distance δ of it) remain 'close' forever (within a distance ε). Note that this must be true for any ε > 0 that one might choose, no matter how small. Equations (1.128) and (1.129) have the same mathematical form as the Weierstrass criterion for continuous functions (epsilon–delta continuity) [31, p. 57]: a function f(t) is continuous at a if for any ε > 0 there exists a δ(ε) > 0 such that whenever |t − a| < δ, we have

|f(t) − f(a)| < ε.    (1.130)

1.7.2 Stability in dynamics

We define stability in dynamics analogously to the stability of points in statics, by generalizing from single positions to entire time-dependent trajectories x(t), i.e. to solutions of an initial value problem for a differential equation. Let t_i denote the initial time and t_f the final time of interest. We write q(t_i) = q_i and q̃(t_i) = q̃_i for two initial states, and q(t_f) = q_f and q̃(t_f) = q̃_f for the final states on the corresponding trajectories. Then we have stability if, for initially close coordinates with

|q_i − q̃_i| < δ,    (1.131)

the separation between the final coordinates is bounded by a function which is a power of the time span:

|q_f − q̃_f| < (t_f − t_i)^p.    (1.132)

If the deviation diverges exponentially,

|q(t) − q̃(t)| = C exp(λt)    (1.133)

for some λ > 0, the system is said to be Lyapunov unstable, and λ is called the Lyapunov exponent; see Figure 1.35. From this follows a definition of stability via 'Lyapunov functions': if a solution can be constructed using exponential functions with positive exponents, then it is unstable. Because of the symmetry between coordinates and their velocities, as mentioned in § 1.4, velocities may be included in the norm (i.e. distance) measurements of (1.131)–(1.133). Strictly speaking, these definitions are valid only for trajectories which correspond to solutions of ordinary differential equations that are autonomous systems without dissipation. We have seen in the previous discussions on resonance that a periodic perturbation can generate infinite amplitudes in the absence of damping, which makes the definition of stability more complicated; see Merkin [33, p. 226ff]. Furthermore, one should really make a distinction between a theory for finite times and one for infinite times (see the Introduction of [33]). Earlier we introduced some other mathematical phenomena that can lead to instability; for example, chaos, whereby the Poincaré map generates a continuum of points, is a type of



Figure 1.35 Neighboring trajectories of: (a) a stable system; (b) an unstable system.

instability. Certain kinds of attractors are also indicators of instability; see the elaborate discussion in Greiner [34, p. 467ff]. If dissipation is added to stable mechanical systems, the stability of otherwise stable structures can be destroyed [33, p. 202], which is rather counterintuitive: one might expect that dissipation, which removes energy from the system and reduces particle motion, would increase stability. However, as demonstrated by the hysteresis jumps in the resonance curve with linear damping of Figure 1.24, damping can indeed lead to a loss of stability. On the other hand, that the usual definitions of stability result in dissipative systems being classified as generally unstable reflects the fact that these ideas of stability originate from celestial mechanics and have only limited applicability to terrestrial mechanics. We might hope for a definition that could discriminate between ‘stable’ and ‘unstable’ slopes in granular materials; however, no such theory exists. There are other aspects of classical stability theory that make its application to particle mechanics problems, such as the discrete element method, difficult. For instance, displacements (or the corresponding perturbations to the systems) are always ‘infinitesimal’ in mathematical stability theory; but for real systems and finite displacements, doubt remains as to whether this mathematical theory can describe appropriately the actual stability or instability of mechanical systems. Arnold [13, p. 121] gave a mathematical proof that while the acrobatic (inverted) pendulum (i.e. the bob positioned at the apex of the trajectory in Figure 1.34) is unstable, the same stationary point becomes stable if the pendulum is vibrated vertically. An experimental realization of this scenario would be balancing a pencil by merely moving it up and down in one’s hand. This would certainly not be sufficient to keep the pencil upright, so in this case mathematical rigor is not the same as physical relevance. As any perturbations in physical experiments are finite, but the mathematical theory assumes infinitesimal perturbations, we would still consider the system in question as being unstable based on everyday experience.
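Despite these caveats, the exponent λ in (1.133) can at least be estimated numerically by following two neighboring trajectories and fitting the growth of their separation. In the sketch below, the driven, damped pendulum and all of its parameter values are our own illustrative choices; the fitted slope is only meaningful over the interval where the separation is still small compared to the size of the attractor.

% Sketch: Lyapunov exponent estimated from the divergence of two
% neighboring trajectories, Equation (1.133).
rhs  = @(t,y) [y(2); -0.1*y(2) - sin(y(1)) + 1.5*cos(t)]; % illustrative system
t    = (0:0.01:40).';
opts = odeset('RelTol', 1e-10, 'AbsTol', 1e-12);
[~, Y1] = ode45(rhs, t, [1;      0], opts);
[~, Y2] = ode45(rhs, t, [1+1e-8; 0], opts);               % slightly perturbed start
d = sqrt(sum((Y1 - Y2).^2, 2));                           % phase space distance
p = polyfit(t, log(d), 1);                                % log(d) ~ lambda*t + const
fprintf('estimated Lyapunov exponent lambda = %g\n', p(1))
semilogy(t, d), xlabel('t'), ylabel('distance of the trajectories')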

1.7.3 Stable axes of rotation around the principal axis

The previous discussions pertained to rectilinear degrees of freedom. For angular motion, where different degrees of freedom are coupled, a slightly different approach than the ε–δ


philosophy is needed. We show here the common analysis of which axes are stable for rotation around the principal axes, i.e. the axes obtained from the coordinate transformation which makes the tensor of inertia diagonal. For the torque-free case, we can rewrite the Euler equations of motion, (1.35)–(1.37), in terms of the angular momenta Lᵢᵇ. To do this, multiply the first Euler equation by J₂J₃, the second by J₁J₃ and the third by J₁J₂ (where J₁, J₂, J₃ are the diagonal elements of the tensor of inertia), then replace each Jᵢωᵢᵇ with Lᵢᵇ to obtain

J₂J₃ (dL₁ᵇ/dt) = (J₂ − J₃) L₂ᵇ L₃ᵇ,    (1.134)
J₁J₃ (dL₂ᵇ/dt) = (J₃ − J₁) L₁ᵇ L₃ᵇ,    (1.135)
J₁J₂ (dL₃ᵇ/dt) = (J₁ − J₂) L₁ᵇ L₂ᵇ.    (1.136)

Now, if we multiply (1.134) by J₁L₁ᵇ, (1.135) by J₂L₂ᵇ and (1.136) by J₃L₃ᵇ, and add up the resulting equations, we obtain

J₁J₂J₃ ( L₁ᵇ (dL₁ᵇ/dt) + L₂ᵇ (dL₂ᵇ/dt) + L₃ᵇ (dL₃ᵇ/dt) ) = 0.

Integration with respect to time then gives

(L₁ᵇ)² + (L₂ᵇ)² + (L₃ᵇ)² = C₁,    (1.137)

where C₁ is a constant, equal to the value of L², the square of the total angular momentum Lᵇ of the object in the body-fixed coordinate system. So we have shown that in the absence of an external torque, the total angular momentum is conserved. If we multiply (1.134) by L₁ᵇ, (1.135) by L₂ᵇ and (1.136) by L₃ᵇ and then add up the resulting equations, we obtain

(d/dt) [ J₂J₃ (L₁ᵇ)² + J₁J₃ (L₂ᵇ)² + J₁J₂ (L₃ᵇ)² ] = 0.

Integrating with respect to time and dividing by J₁J₂J₃ gives

(L₁ᵇ)²/J₁ + (L₂ᵇ)²/J₂ + (L₃ᵇ)²/J₃ = C₂.    (1.138)

Recall that in the context of rectilinear degrees of freedom, for mass m, momentum p and velocity v we have p²/m = mv² = 2T, where T is the kinetic energy; so we see that C₂ must correspond to twice the kinetic energy Tⁱ of the intrinsic rotation, which is therefore also conserved. At the same time, we see that for the individual components 1, 2, 3 of the kinetic energy and the angular momentum, no conservation law can be derived, as the Euler equations of motion couple the three components together. Equation (1.138) has the same functional form as the general equation for an ellipsoid,

x²/a² + y²/b² + z²/c² = 1,



Figure 1.36 Poinsot ellipsoid of constant kinetic energy for the intrinsic angular momentum; stable trajectories around the principal axes 1 and 3 are plotted as solid lines, and the trajectory around the unstable principal axis 2 is plotted as a dashed line.

for which the half-axes of lengths a, b and c are aligned parallel to the Cartesian axes x, y and z, respectively. The ellipsoid described by Equation (1.138) is called the Poinsot ellipsoid, after L. Poinsot, who first proposed an interpretation of rotation as rolling of this ellipsoid on a plane in angular momentum space [35]. Let us order the axes so that J₁ < J₂ < J₃. For fixed Lᵇ, the kinetic energy for rotation around axis 1 is Tⁱ_max = (Lᵇ)²/(2J₁), which is maximal, while the kinetic energy for rotation around axis 3, Tⁱ_min = (Lᵇ)²/(2J₃), is minimal. Accordingly, the kinetic energy Tⁱ is bounded by

Tⁱ_min ≤ Tⁱ ≤ Tⁱ_max.

In Figure 1.36 we plot trajectories of L for various initial conditions, along with periodic ‘trajectories’ of the angular momentum. It turns out that rotations around axis 1 (corresponding to the smallest moment of inertia J1 ) and around axis 3 (the largest moment of inertia J3 ) are stable, whereas rotations around axis 2 (intermediate moment of inertia J2 , where J1 < J2 < J3 ) are unstable. It is also possible to prove the Lyapunov instability analytically; see [36]. The assumption of energy conservation is rather strong—too strong to be valid for many technical applications. If a system is energy dissipating, a reduction in the energy will force the rotation to be around axis 3 with the minimum kinetic energy and maximum moment of inertia. Even satellites have been found to show enough dissipation so that axis 1 with the minimal moment of inertia and therefore the maximal kinetic energy becomes unstable [37, p. 62ff]. This shows that stability proofs for ‘ideal’ (e.g. frictionless) systems are not all that relevant for technical systems, including particle systems and the discrete element method.
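The conservation of (1.137) and (1.138), and the instability of rotations around axis 2, are easy to check numerically; in the sketch below, the moments of inertia and the initial angular momentum (chosen almost, but not exactly, along axis 2) are illustrative values.

% Sketch: torque-free Euler equations (1.134)-(1.136) in the body
% frame; L^2 from (1.137) and 2T from (1.138) should stay constant,
% while a rotation started near the middle axis 2 tumbles away.
J   = [1; 2; 3];                                  % J1 < J2 < J3
rhs = @(t,L) [ (J(2)-J(3))/(J(2)*J(3))*L(2)*L(3);
               (J(3)-J(1))/(J(1)*J(3))*L(1)*L(3);
               (J(1)-J(2))/(J(1)*J(2))*L(1)*L(2) ];
L0   = [1e-4; 1; 1e-4];                           % almost along the unstable axis 2
opts = odeset('RelTol', 1e-10, 'AbsTol', 1e-12);
[t, L] = ode45(rhs, [0 200], L0, opts);
Lsq  = sum(L.^2, 2);                              % invariant (1.137)
twoT = L.^2 * (1./J);                             % invariant (1.138)
fprintf('drift of L^2: %g,  drift of 2T: %g\n', ...
        max(abs(Lsq - Lsq(1))), max(abs(twoT - twoT(1))))
plot(t, L), xlabel('t'), legend('L_1^b', 'L_2^b', 'L_3^b')

Starting instead with L0 almost along axis 1 or axis 3 leaves the components nearly constant, in agreement with Figure 1.36.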

1.7.4 Noether's theorem and conservation laws

Noether’s theorem (see, e.g., [38, p. 359]) asserts that where there is a symmetry in a mechanical system, there is a conserved quantity. From the homogeneity of time (every time interval looks like every other time interval) follows the conservation of total energy. From the homogeneity of space (every spatial interval looks like every other spatial interval) follows the


conservation of momentum. From the isotropy of space (every direction looks like every other direction) follows the conservation of angular momentum. Such conservation laws can be used to test, for example, whether interaction laws have been implemented correctly in models. It will be useful to discuss briefly what quantities can be conserved for discrete element systems. (It is advisable to test such conservation laws first with only two particles in the system, to avoid losing the overall view of the simultaneous interactions.) The total energy in a particle system consists of kinetic energy, both for the rectilinear degrees of freedom, (1.26), and for the angular degrees of freedom, (1.27), and of potential energy. The potential energy for particle systems consists of external potentials, usually gravity, and interaction energy, which for DEM particles would be 'elastic' energy due to overlapping or deformed contacts. While for gravity the potential energy is easy to compute, in the DEM case the potential energy can be difficult to estimate for anything other than linear potentials. Forces F that are 'conservative' (i.e. which conserve the total energy of a particle moving under their influence) are those which can be derived from the gradient of a scalar field Φ, so that F = −∇Φ. Besides gradient potentials, energy conservation holds for rotationally symmetric potentials; for collisions with other potentials, a violation of energy conservation must be expected. In other words, for many discrete element models with non-spherical particles, there is no energy conservation even in the absence of velocity-dependent forces. Imagine the one-dimensional propagation of a particle through a force field which is asymmetric: from x₁ to 0 the force increases linearly, and from 0 to x₂ it decreases linearly, so that it is described by the formula

F(x) = 0 for x < x₁,
F(x) = k₁(x − x₁) for x₁ < x < 0,
F(x) = k₂x − k₁x₁ for 0 < x < x₂,
F(x) = 0 for x > x₂,    (1.139)

where |k₁| ≠ |k₂|, k₁ > 0, k₂ < 0 and |k₁x₁| = |k₂x₂|; see Figure 1.37. Consider a particle moving from a position x₃ < x₁ to another position x₄ > x₂. Because the work performed on the particle is

W = ∫_{x₃}^{x₄} F(x) dx > 0,    (1.140)


Figure 1.37 Sketch of the asymmetric force field given by Equation (1.139), where k1 and k2 are the gradients of the force at different positions.


Figure 1.38 One-dimensional collision of non-spherical particles: the kinetic energy of the rectilinear degree of freedom is greater before the collision than after, even if there are no velocity-dependent damping forces. On approach, the interaction is via a wedge–wedge contact (force proportional to d², where d is the deformation; see § 1.5); on separation, the interaction is linear (proportional to d¹).

the kinetic energy of the particle at x₄ will be different from that at x₃. At x₃ and x₄, the kinetic energy is the same as the total energy, because there is no other potential. So particles which traverse asymmetric force fields may undergo a change of their total energy even in the absence of velocity-dependent dissipative forces. This often happens in particle simulations of collisions involving rotations of non-spherical particles: when the particles turn during the collision, the repulsive force on approach can be different from the force upon separation; see Figure 1.38. This is one reason why verifying conservation of energy is not a useful way to check whether the simulation and time integrator were implemented correctly; another reason is the time integration method itself (see Chapter 2, in particular § 2.4). Only for spherical particles, owing to the rotational symmetry, is the energy conserved with certainty. Nevertheless, for asymmetric forces (without velocity-dependent dissipative forces), time reversal is a suitable test. First, one runs the particle collision forward from initial time t_i to final time t_f:

(x_i, q_i, v_i, ω_i) → (x_f, q_f, v_f, ω_f);

then one runs the process backward and compares the respective increase and loss of energy, which should add up to zero. In practical terms this means running the integrator backward in time (with a negative time-step), or running the integrator with positive time-step but with the final velocities reversed, i.e.

(x_f, q_f, −v_f, −ω_f) → (x_i, q_i, −v_i, −ω_i).

Newton's third law states that the forces between bodies are equal in magnitude and opposite in direction, i.e. 'action = reaction'. This means that if there are no forces which act on the whole system (such as gravity), only interaction forces between particles, then the momentum should always be conserved. Thus, conservation of momentum is generally a more useful test than energy conservation. However, as 'action = reaction' holds only for forces, we cannot invoke such a law for torques when the distance between the force point and the center of mass is different for the two bodies. The total angular momentum should be conserved


unconditionally; it is the sum of all orbital and intrinsic angular momenta, L_total = L_total,o + L_total,i. For all particles, the orbital angular momenta L_k^o, the cross products of the centers of mass r_k with the momenta p_k, are summed:

L_total,o = Σ_k L_k^o = Σ_{k=1}^{n} r_k × p_k = Σ_k m_k r_k × v_k.

The intrinsic angular momenta ('spin') are the products of the tensors of the moments of inertia J_k and the angular velocities ω_k:

L_total,i = Σ_k L_k^i = Σ_k J_k ω_k.

If angular momenta are to be compared for equality at times t1 and t2 (provided no external torques act on the system), the computation has to be done in the same coordinate system. Equality should also be taken in the numerical sense, i.e. with finite precision; see Chapter 2, particularly the section on relative and rounding errors. While many exercises in mechanics favor center-of-mass calculations for multi-body systems, in terms of numerical computation these offer no advantage. Newton’s first law states that if the sum of forces acting on a body is zero, the body’s velocity will be constant: either it stays at rest or it will move in a straight line with constant velocity. In simulations, initial conditions that should lead to zero velocity may actually experience motion due to small noise terms (which arise, for instance, from using a finite time-step or from oscillatory motion generated by insufficiently damped interaction forces). For example, a block on a slope which, according to analytical calculations, should keep its position may slide downhill in a simulation; see § 3.1.
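The bookkeeping for such a test takes only a few lines; in the following sketch (saved as totalPandL.m, a file name of our own choosing), the arrays m, r, v, J and w stand for whatever the DEM code provides, and the totals are to be evaluated at t₁ and t₂ in the same coordinate system and compared with a relative tolerance rather than for exact equality.

function [P, Ltot] = totalPandL(m, r, v, J, w)
% Sketch: total momentum and total angular momentum of n particles.
% m: n-by-1 masses; r, v, w: n-by-3 positions, velocities, angular
% velocities; J: 3-by-3-by-n tensors of the moments of inertia.
mv = repmat(m, 1, 3).*v;                 % momenta p_k = m_k*v_k
P  = sum(mv, 1);                         % total momentum
Lo = sum(cross(r, mv, 2), 1);            % summed orbital angular momenta
Li = zeros(1, 3);
for k = 1:numel(m)
    Li = Li + (J(:,:,k)*w(k,:).').';     % intrinsic ('spin') contributions
end
Ltot = Lo + Li;                          % total angular momentum
end

A sensible acceptance criterion is then, for instance, norm(P2 - P1) <= tol*norm(P1), with tol chosen according to the rounding and truncation errors discussed in Chapter 2.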

1.8 Further reading

A readable introduction to mechanics is the book by R. D. Gregory [38]. Merkin et al. [33] give an overview of mathematical stability theory which is applicable to mechanical systems. A not-too-difficult introduction to Hamiltonian systems and geometrical integration is the book by Leimkuhler and Reich [15]. An extensive analysis of the frictional oscillator and further references can be found in the article by Kunze [23]. Solution methods for various nonlinear oscillations are discussed in Mickens’s book [39]. For a recent monograph on Newton–Euler dynamics, see Ardema [40]. Resonance phenomena in nonlinear systems are treated by Manevich and Manevich [41].

Exercises

1.1 Rotations and complex numbers
a) Compute the eigenvalues of the two-dimensional rotation matrix A_φ of Equation (1.9).


b) Show that multiplication of the two-dimensional vector v = (1, 0)ᵀ by the matrix A_φ is equivalent to rotation of v by the angle φ.

1.2 Quaternions
a) Quaternion product. Derive the rule for quaternion products, Equation (1.92), from the definitions in (1.44)–(1.47).
b) Real representation of quaternion basis elements. Show that the real matrices

B1 = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1],   BI = [0 1 0 0; −1 0 0 0; 0 0 0 1; 0 0 −1 0],
BJ = [0 0 0 1; 0 0 1 0; 0 −1 0 0; −1 0 0 0],   BK = [0 0 1 0; 0 0 0 −1; −1 0 0 0; 0 1 0 0]    (1.141)

satisfy the same commutativity relations as 1, I, J and K in Equations (1.44) and (1.45).
c) Complex representation of quaternion basis elements. Show that the complex matrices

B1 = [1 0; 0 1],   BI = [i 0; 0 −i],   BJ = [0 1; −1 0],   BK = [0 i; i 0]    (1.142)

satisfy the same commutativity relations as 1, I, J and K in Equations (1.44) and (1.45).
d) In quantum physics, to describe objects with multiples of spin ½, the Pauli matrices

σx = [0 1; 1 0],   σy = [0 −i; i 0],   σz = [1 0; 0 −1]    (1.143)

are used. Compute the eigenvalues of the matrices in (1.143) and of the 2 × 2 matrices in (1.142). What is different?
e) Program the elementary operations for quaternions (multiplication, conjugation, inversion) as MATLAB® functions.

1.3 For the undamped case of the driven linear oscillator (1.101), i.e.

ẍ + ω₀² x = (f₀/m) exp(iωt),    (1.144)

check for yourself that at resonance, ω₀ = ω, not only x(t) ∝ sin(ωt) is a solution but also x(t) ∝ t sin(ωt), which means that although mathematically the resonance amplitude can become infinite according to Equation (1.105), its growth is only linear in time, so that the time needed to reach the infinite amplitude is also infinite.


References

[1] M. Nagasawa, Schrödinger Equations and Diffusion Theory. Monographs in Mathematics, Birkhäuser, 1993.
[2] G. Emch and C. Liu, The Logic of Thermo-Statistical Physics. Physics and Astronomy Online Library, Springer, 2002.
[3] H. Goldstein, C. P. Poole, and J. L. Safko, Classical Mechanics, 3rd ed. Pearson, 2001.
[4] M. P. Allen and D. Tildesley, Computer Simulation of Liquids. Oxford University Press, 1987.
[5] J. Myers, Handbook of Equations for Mass and Area Properties of Various Geometrical Shapes. NAVWEPS Report 7827, U.S. Naval Ordnance Test Station, 1962.
[6] J. Wittenburg, Dynamics of Systems of Rigid Bodies. Teubner, 1977.
[7] D. Greenwood, Advanced Dynamics. Cambridge University Press, 2006.
[8] H. Corben and P. Stehle, Classical Mechanics. Wiley, 1960.
[9] W. Benenson, J. Harris, H. Stöcker, and H. Lutz (eds.), Handbook of Physics. Springer, 2002.
[10] J. Hartog, "LXXIII. Forced vibrations with combined viscous and Coulomb damping", Philosophical Magazine Series 7, vol. 9, no. 59, pp. 801–817, 1930.
[11] R. Reissig, "Erzwungene Schwingungen mit zäher und trockener Reibung", Mathematische Nachrichten, vol. 11, no. 6, pp. 345–384, 1954.
[12] J. Knudsen and P. Hjorth, Elements of Newtonian Mechanics: Including Nonlinear Dynamics, 3rd ed. Advanced Texts in Physics, Springer, 2000.
[13] V. I. Arnold, Mathematical Methods of Classical Mechanics, 2nd ed. Graduate Texts in Mathematics, Springer, 1989.
[14] S. Zdravkovska and P. Duren, Golden Years of Moscow Mathematics. History of Mathematics, American Mathematical Society, 2007.
[15] B. Leimkuhler and S. Reich, Simulating Hamiltonian Dynamics. Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, 2004.
[16] E. Hairer, C. Lubich, and G. Wanner, Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations, 2nd ed. Springer, 2006.
[17] S.-H. Chong, M. Otsuki, and H. Hayakawa, "Generalized Green–Kubo relation and integral fluctuation theorem for driven dissipative systems without microscopic time reversibility", Physical Review E, vol. 81, 041130, 2010.
[18] A. Filippov, Differential Equations with Discontinuous Righthand Sides. Mathematics and its Applications, Springer, 1988.
[19] K. Johnson, Contact Mechanics. Cambridge University Press, 1987.
[20] W. C. Cheng, J. Chen, and H.-G. Matuttis, "Granular acoustics of polyhedral particles", AIP Conference Proceedings, vol. 1542, pp. 567–570, 2013.
[21] S. A. E. Shourbagy, S. Okeda, and H. G. Matuttis, "Acoustic of sound propagation in granular materials in one, two, and three dimensions", Journal of the Physical Society of Japan, vol. 77, no. 3, article 034606, 2008.
[22] N. Fletcher and T. Rossing, The Physics of Musical Instruments. Springer, 1998.
[23] M. Kunze, "Rigorous methods and numerical results for dry friction problems", in Applied Nonlinear Dynamics and Chaos of Mechanical Systems with Discontinuities (M. Wiercigroch and B. De Kraker, eds.), World Scientific Series on Nonlinear Science Series A, pp. 207–235, World Scientific, 2000.
[24] D. Knuth, The Art of Computer Programming, Volumes 1–4A. Addison-Wesley, 2011.
[25] Y. Tokura and H. Maekawa, "Direct numerical simulation of impinging shock wave/transitional boundary layer interaction with separation flow", Journal of Fluid Science and Technology, vol. 6, no. 5, pp. 765–779, 2011.
[26] L. Bunimovich, "On the ergodic properties of nowhere dispersing billiards", Communications in Mathematical Physics, vol. 65, pp. 295–312, 1979.
[27] S. T. Maddison, J. R. Murray, and J. J. Monaghan, "SPH simulations of accretion disks and narrow rings", Publications of the Astronomical Society of Australia, vol. 13, pp. 66–70, 1996.
[28] R. Speith and W. Kley, "Stability of the viscously spreading ring", Astronomy and Astrophysics, vol. 399, no. 2, pp. 395–407, 2003.
[29] T. Shinbrot, C. Grebogi, J. Wisdom, and J. A. Yorke, "Chaos in a double pendulum", American Journal of Physics, vol. 60, pp. 491–499, 1992.
[30] E. N. Lorenz, "Deterministic nonperiodic flow", Journal of the Atmospheric Sciences, vol. 20, pp. 130–141, 1963.

i

i i

i

i

i “Matuttis-Driv-1” — 2014/3/24 — 19:18 — page 64 — #64

i

64

i

Understanding the Discrete Element Method

[31] I. N. Bronshtein, K. A. Semendyayev, G. Musiol, and H. Muehlig, Handbook of Mathematics, 5th ed. Springer, 2007. [32] F. R. Gantmaher, Lectures in Analytical Mechanics. MIR Publishers, 1970. [33] D. Merkin, F. Afagh, and A. Smirnov, Introduction to the Theory of Stability. Texts in Applied Mathematics, Springer, 1997. [34] W. Greiner, Classical Mechanics: Systems of Particles and Hamiltonian Dynamics. Classical Theoretical Physics, Springer, 2010. [35] L. Poinsot, Th´eorie nouvelle de la rotation des corps. Bachelier, 1851. [36] J. P. Vinti, “Conservation laws and Liapounov stability of the free rotation of a rigid body”, Celestial Mechanics, vol. 1, pp. 59–71, 1969. [37] M. H. Kaplan, Modern Spacecraft Dynamics & Control. Wiley, 1976. [38] R. Gregory, Classical Mechanics. Cambridge University Press, 2006. [39] R. Mickens, Truly Nonlinear Oscillations: Harmonic Balance, Parameter Expansions, Iteration, and Averaging Methods. World Scientific, 2010. [40] M. Ardema, Newton-Euler Dynamics. Springer, 2006. [41] A. Manevich and L. Manevich, The Mechanics of Nonlinear Systems with Internal Resonances. Imperial College Press, 2005.

i

i i

i

i

i “Matuttis-Driv-1” — 2014/3/24 — 19:21 — page 65 — #1

i

i

2 Numerical Integration of Ordinary Differential Equations

Historically, the numerical solution of ordinary differential equations (ODEs) has been called integration, and the numerical evaluation of integrals has been called quadrature. Therefore, solution methods for ODEs are usually called integrators, or solvers. In this book we focus on methods and concepts, along with the associated terminology. For DEM implementations these are more important than mathematical proofs and discussions of the order of accuracy, which may not be valid anyway for the non-smooth force laws of granular dynamics. Readers who are interested in derivations of the order of accuracy of numerical methods can refer to the numerical analysis literature, but should be aware that such derivations generally assume smooth forces.

2.1 Fundamentals of numerical analysis

2.1.1 Floating point numbers

Integers are represented in the binary system as sequences of zeros and ones (bit patterns); for example,

··· 0 1 1 0 1  ↔  ··· + 0·2⁴ + 1·2³ + 1·2² + 0·2¹ + 1·2⁰.   (2.1)

MATLAB has some convenient tools for conversion between binary and decimal representations, such as the functions bin2dec and dec2bin, but integers are insufficient for representing the much bigger set R of real numbers in mathematics. Fixed point numbers are integers scaled by a constant factor smaller than 1, but as they are very limited in magnitude, they went out of use in scientific computing decades ago. Nowadays, for numerically


intensive calculations, real numbers are approximately represented in computers by floating point numbers. To express a floating point number, a base must first be chosen, which nowadays is usually 2 because of the optimal rounding properties (it can be shown that for base 2 the rounding error is minimal). Therefore, we will give the following explanations for base 2. The exponent of a floating point number is a power p of the base, where p is represented by a binary integer as in (2.1). (Additionally, a 'shift' is specified to allow for exponents both effectively larger and smaller than 1.) The significant digits are represented by the mantissa, which is a sum of powers of 1/2. For example, the bit pattern 010110... for a mantissa translates as follows:

0·(1/2)¹ + 1·(1/2)² + 0·(1/2)³ + 1·(1/2)⁴ + 1·(1/2)⁵ + 0·(1/2)⁶ + ···   (2.2)

The value of the floating point number is the mantissa multiplied by 2 raised to the power of the exponent (plus or minus the shift). The IEEE standard [49] specifies the number of bits reserved for the mantissa and the number reserved for the exponent. For single precision ('4-byte real'), there are 8 bits for the exponent and 23 for the mantissa; this allows the representation of numbers between about 10^−38 and 10^+38 to about 7 or 8 digits of accuracy. For double precision ('8-byte real'), there are 11 bits for the exponent and 52 for the mantissa, which allows the representation of numbers between about 10^−307 and 10^+307, to 15 or 16 digits of accuracy. One bit is necessary to express the sign of the number, and additional bits may get lost in some implementations for representations of infinity or results that are 'not a number'; see Appendix A, in particular § A.9. While a data type may have a certain number of valid digits, internally a processor may work with a higher precision than that of the actual data type to guarantee that the last digit is rounded correctly. For example, when we subtract 3.1415 from π in MATLAB, we obtain

>> pi
ans = 3.141592653589793
>> pi-3.1415
ans = 9.265358979293481e-05

Comparing the answer with π = 3.141592653589793115997963... shows that actually 17 digits were used in the calculation. Besides single and double precision, there are other types of extended precision which are not standardized. On some architectures (Intel, AMD), 'extended precision' is 10-byte; on other architectures (e.g. DEC Alpha), 'extended precision' 16-byte floating point numbers have been implemented.

There are several differences between the real numbers of mathematics and the floating point numbers in computers (even though the latter may be declared as real in program headers). First of all, in computer calculations there is the possibility of overflow (when results are too large for the floating point format) or underflow (when results are too small for the floating point format); the latter is less harmful, as the result will be rounded to zero. Another crucial difference is that real numbers are infinitesimally dense (for any two distinct


Figure 2.1 Spacing between integers (above) compared with the spacing in a floating point system with 3 bits in the mantissa and 2 bits in the exponent, shifted by 1 (below); see Exercise 2.1.b.

real numbers a and b one can always find another real number, e.g. (a + b)/2, between the two), but for floating point numbers there is a finite distance, called the 'machine epsilon', below which two distinct numbers cannot be resolved. Real numbers which cannot be represented as floating point numbers to the necessary degree of precision (due to having too large a number of decimal digits) must be rounded. For this reason, integrators that are able to vary the time-step adaptively (see § 2.3.1) might terminate with an error message if the time increment falls below a critical value (about 10^−14).

While any integer n is separated from the next one, n + 1, by the same distance 1, the distance between successive floating point numbers increases with their magnitude because of their representation via mantissa and exponent; see Figure 2.1. As intermediate results must, in general, be rounded to the nearest floating point number, rounding errors increase with the magnitude of the calculation results. Thus, multiplication by a large number (or division by a small number) will lead to a larger relative error due to rounding, so such operations should be avoided as much as possible. In algorithms, systematic rearrangement of floating point operations to avoid division by small numbers is called pivoting (the term is not related to the 'pivoting friction' discussed in Chapter 3); see Exercise 2.1.c.

By a proper choice of units, one can try to prevent quantities from deviating too much from unity, thus avoiding the problems of working with very large or very small numbers. In planetary simulations, the radius of the Earth's trajectory, the Earth's mass, and its time of revolution around the Sun can be used as convenient reference units; see, for instance, Garcia [1]. Similarly, in granular simulations with mono-disperse round particles, the average particle mass is set to unity, as is the theoretical collision time [2]. Nevertheless, doing this does not necessarily guarantee sufficient accuracy in the calculations, or even favorable rounding properties. Later, in § 2.6.1, we will discuss a case where the time-step has to be set to unity. Apart from these cases, for macroscopic grains it is preferable to use SI units, to avoid mistakes in converting between real-world data and simulation results.

Because of the irregular spacing of floating point numbers, there is no translation invariance in interaction computations with computer arithmetic. Depending on the absolute location of a particle pair in a Cartesian coordinate system, the result may vary even if the relative location is the same. For example, in an overlap computation involving two particles of radius 1, when the center is close to the origin, an overlap of 1/1000 will give about 15 − 3 = 12 valid digits; with the center at (1000, 1000), only 15 − 3 − 3 = 9 valid digits can be expected, so shifting the center may lead to different results.
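As a minimal MATLAB sketch of these magnitude effects (the values quoted in the comments assume IEEE double precision), one can inspect the floating point spacing with the eps function and observe the loss of translation invariance in a one-dimensional overlap computation:

% Spacing of floating point numbers grows with magnitude:
eps(1)       % distance from 1 to the next double, about 2.2e-16
eps(1000)    % spacing near 1000, about 1.1e-13

% Overlap of two particles of radius 1 whose centers are 2 - 1/1000 apart,
% once near the origin and once shifted to 1000:
delta = 1e-3;
x1 = 0;    x2 = x1 + 2 - delta;
overlap_near = (x1 + 1) - (x2 - 1);
x1 = 1000; x2 = x1 + 2 - delta;
overlap_far  = (x1 + 1) - (x2 - 1);
overlap_near - overlap_far   % typically nonzero: translation invariance is lost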

2.1.2 Big-O notation

In the following, we will often take into account only the largest power of a small increment τ < 1 or a large increment n > 1. A term which has its highest power p in a variable τ is


conventionally expressed in big-O notation (also known as Landau notation or Bachmann–Landau notation) as O(τ^p). This notation allows us to ignore constant pre-factors of τ^p, so

O(kτ^p) = O(τ^p).

(There is also a 'little-o' notation, but we do not need it here.) When two functions of orders p and q are added, where p > q, one has

O(τ^p) + O(τ^q) = O(τ^p).

When we are specifically considering the τ-dependence of an error, instead of O(τ^n) we will write ε(τ^n). Another application of big-O notation is in classifying the complexity of algorithms. In that case, the variable is an integer n greater than 1, which indicates the number of elements that the algorithm deals with. As well as integer powers of n, the logarithm O(log(n)) is also useful, as it signifies growth slower than that of O(n). Additionally, O(n log n) often comes up in texts on computer science, but its relevance is rather hypothetical: even for a huge number like n = 10^15, this would only result in O(15n), which to all intents and purposes is O(n). Finally, the exponential order, O(exp(n)), is important, as it grows faster than any polynomial order.

In the context of ordinary differential equations, the term 'order' is used with two totally different meanings. One is the order of the differential equation, which means the highest derivative, so

b (d/dx) y + a y = 0 is a first-order differential equation,
c (d²/dx²) y + b (d/dx) y + a y = 0 is a second-order differential equation,
d (d³/dx³) y + c (d²/dx²) y + b (d/dx) y + a y = 0 is a third-order differential equation,

and so on. Newton's equation of motion F = mẍ (where ẍ = d²x/dt²) is a second-order equation in time. On the other hand, when discussing numerical solutions of ordinary differential equations, approximations with time-step τ may be indicated by the order n and the accuracy O(τ^n). A higher order indicates higher accuracy, at least if certain conditions are met. The order of the approximation and the order of the differential equation are unrelated to each other. One might encounter a first-order differential equation solved using a fourth-order approximation, or a second-order differential equation approximated in second order, and many other combinations. In this chapter, we will discuss the merits of different kinds of approximations, but one should keep in mind that the order of the approximation is not the only quantity which may determine the usefulness of an approximation.


2.1.3 Relative and absolute error

The error is the deviation between an exact value X and an approximated value X̃. The absolute error (irrespective of whether it is due to rounding or truncation) is

ε_abs = |X − X̃|,

and the relative error is

ε_rel = |(X − X̃)/X|.

Both are usually defined with the norm or absolute value |·|, because in general only the magnitude and not the sign (direction) of the deviation is relevant. In efficiency–accuracy diagrams, the effort (computer time or number of function evaluations) is plotted against the error on a double-logarithmic scale, so negative values must be avoided. Nevertheless, in some applications the absolute value may be dropped, for example when two low-order algorithms are combined to obtain a high-order algorithm via error compensation; see the 'Composite Simpson' integrator in Exercise 2.4(d). Integer divisions (where the remainder is truncated) lead to a constant order of the absolute error:

5/7 = 0 + O(1),  50/7 = 7 + O(1),  500/7 = 71 + O(1),  ....

In contrast, floating point divisions lead to a constant relative error:

5/7 = 0.714285714285714  (15 valid digits),
50/7 = 7.142857142857143  (15 valid digits),
500/7 = 71.428571428571431  (15 valid digits),  ....

This vindicates the choice of floating point numbers for the description of scientific problems: the number of valid digits does not change when the magnitude changes, and it is the number of valid digits which is relevant in comparisons with experiments. The absolute error ε_abs and the relative error ε_rel can be specified in so-called adaptive algorithms, which control the accuracy of the numerical evaluation during the computation. When the exact value X is close to zero, prescribing the relative error ε_rel is not advisable. On the other hand, when the result is very large, it is better to prescribe only the relative error.
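A minimal MATLAB sketch of these definitions, reusing the π example from above (idivide performs the truncated integer division):

X      = pi;        % exact value
Xtilde = 3.1415;    % approximated value
err_abs = abs(X - Xtilde)        % absolute error
err_rel = abs((X - Xtilde)/X)    % relative error

% Integer division: constant absolute error of order O(1);
% floating point division: constant number of valid digits:
idivide(int32(500), int32(7))    % truncates the remainder, gives 71
500/7                            % about 15 valid digits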

2.1.4 Truncation error

‘Rounding errors’ occur when in calculations real numbers (which have theoretically infinite precision) are replaced by numbers and operations with only a finite number of digits. Consider the calculation


>> 3/7
ans = 0.428571428571429

The fraction should have periodic decimals, i.e. the pattern ...428571... should keep repeating, but the last digit in the computer output is obviously rounded. 'Truncation errors', on the other hand, arise from expressions in algorithms being shortened, resulting in a loss of accuracy. The exponential function has the following infinite expansion around x = 0:

exp(x) = 1 + x/1! + x²/2! + ··· + xⁿ/n! + ···

When the truncated series

exp(x) ≈ 1 + x/1! + x²/2!

is used in computations, its deviation from the exact representation of exp(x) is the truncation error. While rounding errors affect numbers, truncation errors affect algorithms and representations. When a mathematical process is defined for p (possibly infinite) iterations or terms, but for computational reasons only p̃ < p iterations can be performed, this generates a truncation error (sometimes also called 'discretization error'). The accuracy of the resulting approximation is often described by a Taylor series, as for the Romberg integration in Exercise 2.4. Not for every discretization scheme is it possible to derive a description of the order via Taylor expansions. One notable exception is Gauss quadrature, as implemented in the MATLAB function quadgk (see Exercise 2.4.f), where 'higher order' means more integration points, which nevertheless cannot be expressed by a 'Taylor order'. The accuracy of Ritz–Galerkin ('finite element') methods cannot be described by Taylor expansions either: these methods use piecewise-continuous polynomials between points x_1, ..., x_i, x_{i+1}, .... While the solution is close to exact at the points x_i, around these points no Taylor expansion in polynomials with different coefficients is possible.

To classify the order of accuracy, we use the usual definition from mathematics, namely that a polynomial of degree p in the approximation parameter τ is an approximation of order p. Therefore, if a method f(x + τ) is accurate up to order p, the order of the error will be larger, say p + δ with δ > 0 (often δ = 1 is assumed, but empirically smaller values of δ are found). In practice, it is easier to investigate the order of the error than the order of the method: if we plot the deviation of the result of the numerical method F̃(τ) from the exact value F(τ) for τ ∈ [0, t] on a double-logarithmic (log-log) graph, we can fit a line to the plot and find its slope:

log|F(τ) − F̃(τ)| / log(τ) = p + δ.

This means that the error is proportional to a power of τ with exponent p + δ (where 0 < δ ≤ 1), and we will write this order of the error as ε(τ^{p+δ}) to distinguish it from the order of accuracy, which we express using big-O notation. Thus


f(x + τ) = Σ_{n=0}^{∞} [f⁽ⁿ⁾(x)/n!] τⁿ = Σ_{n=0}^{p} [f⁽ⁿ⁾(x)/n!] τⁿ + ε(τ^{p+δ}).   (2.3)

It is often assumed that δ is 1, but for actual numerical data this is often not the case, except for the most primitive examples (approximation of functions). In the case study in § 2.4.7, we shall present two methods of the same order of accuracy (i.e. derived by maintaining the coefficients for the same order of τ), but different (non-integer) error orders. In general, integrators for ordinary differential equations will lead to different orders of accuracy for different variables; see § 2.4.1 for an integrator which has different orders of accuracy for positions and velocity, and § 2.3 for a class of integrators with the same accuracy for the velocity and positions but lower accuracy for the energy.

Here is a simple example of how different quantities obtained from the same formula can have different orders of accuracy. For the approximation of derivatives in differential equations, we will use, among other approaches, finite differences; for example, a first-order finite difference approximation for the second derivative is

d²f/dx² = (f_{i−1} − 2f_i + f_{i+1})/τ² + ε(τ²).   (2.4)

To make f_{i+1} the subject of Equation (2.4), we multiply through by τ², which raises the order of the error term from ε(τ²) to ε(τ⁴):

f_{i+1} = 2f_i − f_{i−1} + (d²f/dx²) τ² + ε(τ⁴).   (2.5)

Accordingly, the computation of f_{i+1} is accurate to third order. There is another definition of the error, which conflicts with Equation (2.3) and is used, for instance, in the computer algebra package MAPLE [3] as well as in some textbooks (e.g. [4]). It defines the order of a method to be the order of its error, so that a polynomial of degree p would be an approximation of order p + 1 to itself. This terminology is commonly used (according to Saha et al. [5]) with regard to second-order partial differential equations in the engineering and physics literature, but is not prevalent in the mathematics literature. Leonard [6] argued that the origin of the definition of order as error order is a confusion between the errors for finite differences, such as in Equation (2.4) above, and the associated equations for variable computation, such as (2.5) in our discussion.
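A minimal MATLAB sketch of this slope-fitting procedure, using the truncated exponential series from § 2.1.4 as the numerical 'method' (the range of τ values is an arbitrary example choice):

tau    = logspace(-3, -1, 20);     % increments on a logarithmic grid
approx = 1 + tau + tau.^2/2;       % truncated series for exp(tau)
err    = abs(exp(tau) - approx);   % deviation from the exact value
p      = polyfit(log(tau), log(err), 1);
p(1)                               % fitted slope, close to 3 = p + delta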

2.1.5 Local and global error

For Equation (2.3), we defined the error for values of τ ∈ [0, t]. When t is on the order of τ, we call the error in (2.3) the local error; if t is the total length of the interval of interest (which may be much bigger than τ), the error is the global error. For the global error, the effect of rounding errors may be much more noticeable than for the local error. First, let us assume for simplicity that the local error is only a truncation error, as discussed in the previous subsection. In that case, we can divide the total time interval [0, t] into N steps, so that

τ = t/N.



Figure 2.2 Total error: (a) due to accumulation of local errors; (b) with error compensation of the local errors. The actual situation will usually lie between these two extremes.

For accuracy of order p, the truncation error for a single time-step will be

ε_loc,trunc = ε(τ^{p+δ}).   (2.6)

If there are no additional constraints, the most natural assumption is that for a time span t = Nτ, the global error will be N times the accumulated local error. The global error is then

ε_glob = N ε(τ^{p+δ}) = t (N/t) ε(τ^{p+δ}) = t (1/τ) ε(τ^{p+δ}) = t ε(τ^{p−1+δ}),   (2.7)

so the order of accuracy has been reduced to O(τ^{p−1}), as illustrated in Figure 2.2(a). Of course, ideally the local errors in successive time-steps would cancel each other out at least partially, so that overall the order is not lowered due to additional constraints. In that ideal case, we would have

ε_glo,benign = t ε(τ^{p+δ}),   (2.8)

so the global error would be the same as the local error, but with a pre-factor proportional to the integration time, as shown in Figure 2.2(b). In practical case studies, we will see that the error order is usually not an integer, either for the global error or for the local error. Moreover, the global error is not necessarily larger by a whole order, but may lie between the estimates in Equation (2.7) and Equation (2.8). As we shall see in the case studies, the error order represents only the change in the error when the time-step is changed; it cannot tell us whether the integrator represents the behavior of the physical system appropriately. We will encounter a case where an integrator from one family gives a physically more meaningful result than an integrator from a different family of the same order; see § 2.4.7.


Figure 2.3 Global error as the sum of rounding and truncation errors: the upper panel shows the timestep τopt that gives the minimal total error; the lower panel shows how reduction of the minimal total error can be achieved by reducing the truncation error.

Our next consideration pertains to the influence of the rounding error. Obviously, this error can only come from adding up local rounding errors k̃ during the N time-steps:

ε_glo,round = N k̃.

This implies that the rounding error increases with the number of time-steps, so reducing the truncation error by increasing the number of time-steps will also have the effect of increasing the global rounding error. Ideally, the rounding error would only manifest in the last digit. However, for particle simulations, forces in opposite directions of approximately the same magnitude could add up, which may magnify the error beyond the last digit. If the influence of this error becomes significant, it could be reduced by appropriate sorting of the terms; see [7] and Exercise 2.2. The total global error, i.e. the sum of the global rounding error and the global truncation error, will be

ε_glo,total = N k̃ + ε(τ^p) = N k̃ + ε(1/N^p).   (2.9)

Accordingly, the minimal total global error is not reached for minimal τ or maximal number of time-steps N; for a given rounding error and a given order p for the global truncation error, there is a τ_opt which will be optimal in the sense of leading to minimal total global error. Both smaller and larger time-steps will lead to a larger total global error; see the upper panel of Figure 2.3. If the global error is still too large for this optimal τ_opt, the only way to reduce it is to use a discretization with a higher power of the truncation error; see the lower panel of Figure 2.3.

A hand-waving strategy for improving numerical results

In numerical simulations, solutions of mathematical problems involving real numbers and, often, limit processes have to be represented using floating point numbers and finite processes.


As the information contained in floating point numbers and discrete approximations is limited, to obtain numerical solutions that are ‘better’ in terms of accuracy and stability, one needs to use ‘more numbers’. This can mean using more digits in each number (double instead of single precision), smaller step-sizes (so that values are computed at a larger number of time-steps), higher-order approximations (i.e. more coefficients), and so on.

2.1.6 Stability

The minimum requirement for a numerical integrator is consistency, which means that in the limit as the discretization parameter τ tends to zero, the original differential equation should be recovered. For a numerical solution of a differential equation, stability is defined [8, § I.13] in a similar way to the Lyapunov stability of mechanical systems in § 1.7.2: when the numerical method produces a deviation from the exact solution which is not exponential, the method is said to be stable. We will not go into the details of the conventional proofs (which involve computation of the eigenvalues of the linearized Jacobian); the interested reader is referred to the literature on numerical integrators, in particular [8]. In numerical analysis, it is the domain of stability (loosely speaking, the maximal time-step τ_max) which is of greatest interest. For different integrators, proofs of stability have to be conducted in different ways; see, for example, the book by Hairer and Wanner [9], where stability-related issues take up a whole column of the index. For discrete element methods, our main interest is in stability as a property of integrators that will allow us to use larger time-steps while being reassured that the behavior of the simulation stays physically meaningful. As with the accuracy proofs, stability proofs typically assume a smooth and usually differentiable 'right-hand side' function (which is the force in mechanics problems). For this reason, proofs in the mathematical literature generally do not apply exactly to DEM forces, which are finite when the particles are in contact and zero otherwise. When the contact is closing, there is usually a discontinuity at least in the higher derivatives; this may be detected by adaptive integrators of high accuracy, which 'lock' and cannot continue due to inconsistencies in the error estimator; see § 2.7.3.

2.1.7 Stable integrators for unstable problems

In the previous subsection we discussed the numerical stability of integrators for stable physical problems, but the question remains as to what happens when one applies numerically stable integrators to physically unstable problems. In fact, what really matters is whether the instability in the system is related to the observables one wants to compute. For weather forecasts, the observables of interest (e.g. rain or sunshine) are indeed connected to possible instabilities (trajectories of neighboring low- or high-pressure regions), so in that case the results would be unreliable. For granular materials, the trajectories (positions and orientations) of individual particles may indeed be unstable, but in experiments hardly anyone investigates individual trajectories of many thousands of particles anyway (which would also be unstable); instead, the quantities of interest are angles of repose, density distributions etc., which do not depend on individual trajectories but rather on the many-body character of the system, and this, at least in the presence of finite friction, is characterized by quite regular, stable behavior. However, when trajectories of a system with Lyapunov exponents are computed [10], above a certain order (greater than 2 for both positions and velocities) there are no differences in the


methods in terms of accuracy of the trajectories or velocities; while the investigation of other observables, such as conservation of energy, might benefit from using higher-order methods, these are not of any help for the accuracy of the trajectories.

2.2 Numerical analysis for ordinary differential equations

A naive answer to the question ‘What properties should an integrator have?’ is: good accuracy. While this sounds rather natural, we will see that ‘accuracy’ may not be the same as a high order of approximation, and that for particulate systems, stability is a much more desirable property than high approximation order. While the solutions of differential equations with given initial conditions are unique (if they exist), there is nothing unique about the corresponding numerical discretizations with finite time-step τ, as indicated by the sheer extent (over 1800 pages) of the three volumes on the subject by Hairer and co-authors [8, 9, 11]. Here we give a short overview of the different approaches used to construct integrators, but keep in mind that our main selection criterion will be reliable behavior for the not-exactly differentiable interaction laws common in the discrete element method, which are not generally considered in the numerical analysis literature.

2.2.1 Variable notation and transformation of the order of a differential equation

Historically, in mathematical texts, x is used to represent the independent variable while y represents the dependent variable, so that the first derivative is written as

(d/dx) y = y′,   (2.10)

with a prime ′ denoting differentiation. In the physics literature, the independent variable is usually t (time), and x (which is different from the 'x' in (2.10)) is used to represent the dependent variable; the first derivative is then written as

(d/dt) x = ẋ,

with a dot ˙ denoting differentiation. As in more advanced textbooks (e.g. [8]), we will use the same notation for vector quantities as for scalar ones, without arrows or boldface fonts. This allows us (among other things) to denote scalar position by x and also use it as a component in a vector y. With a few notable exceptions (which we will discuss in § 2.4 on symplectic methods and § 2.6.1 on backward difference methods), the majority of methods covered in the numerical analysis literature are for first-order equations. In contrast, in classical mechanics, Newton's equation of motion F = mẍ is of second order and therefore has to be transformed to a first-order system before the methods can be applied. This is done by defining the derivative of the 'old' dependent variable as a 'new' variable; for Newton's equation, we define v = ẋ and obtain two first-order equations

ẋ = v,
v̇ = F/m.


(Note that the number of equations times the order of the system must remain constant, in this case 2.) In the following, x will be used to represent the Cartesian coordinates of the system, x = (x_1, x_2, ...); for many-particle systems, x may contain the positions of all the particles and therefore have more than three coordinates. We use v = (v_1, v_2, ...) to denote the velocities of the respective coordinates in x, and y = (x, v) will stand for the solution vector made up of the positions and velocities, y = (x_1, x_2, ..., v_1, v_2, ...). We will use the hybrid notation of MATLAB, with independent variable t and dependent variable y.
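As a minimal sketch in this notation, the harmonic oscillator mẍ = −kx (with assumed example parameters m and k) can be written as a right-hand side function for y = (x, v):

m = 1; k = 1;                     % assumed example parameters
% y(1) = position x, y(2) = velocity v:
f = @(t, y) [y(2); -k/m * y(1)];  % x' = v,  v' = F/m = -(k/m) x

This function handle will be reused in the integrator sketches below.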

2.2.2 Differences in the simulation of atoms and molecules, as compared to macroscopic particles

The field of molecular dynamics, which studies systems composed of atoms and molecules, is similar to the field of discrete element methods in that both use particles in their simulations. In molecular dynamics, the interaction in the potentials varies smoothly over the diameter of a particle, and often differences in the behavior of various integrators are barely noticeable. On the other hand, for granular particles and other particulate systems in which forces vary strongly within the diameter of a particle, the range over which these macroscopic particles interact is much smaller than the distance between their centers of mass. For the solution of differential equations that model such systems, stability becomes more of an issue than accuracy. Related to this is the issue of stiff ordinary differential equations, to which we give an introduction in § 2.5. For the discrete element method, we recommend the integrators in § 2.6. Readers interested only in the implementation of DEM simulations may skip directly to that section.

2.2.3 Truncation error for solutions of ordinary differential equations

In classical mechanics, the order of accuracy of a numerical method for solving ordinary differential equations is defined for the coordinates and the corresponding velocities. A numerical solution method does not necessarily have the same accuracy for the positions and the velocities. Suppose that in an interval from t to t + τ the error in the coordinates is ε_r = O(τ^p) for some positive integer p; then if the velocities are computed from the positions as

v(t) = [x(t + τ) − x(t) + O(τ^p)] / τ
     = [x(t + τ) − x(t)] / τ + O(τ^{p−1}),

they will be affected (assuming no other conditions hold) with an error of ε_v = O(τ^{p−1}). The error order is defined for the coordinates and velocities and is in principle unrelated to the error in the energy. In general, whether and how the (local and/or global) error in coordinates and velocities affects the error in the energy depends on the particular numerical integration method. For symplectic systems (discussed in § 2.4), the error for the energy is one order higher than the error in the positions (see § 2.4.5), but in general no relations can be given.


2.2.4 Fundamental approaches

There are two obvious approaches to developing numerical solution methods for ordinary differential equations: one is via integration, the other is via differentiation. A first-order differential equation can be written for vectorial y and f(y, t) as

dy/dt = f(y, t).   (2.11)

This means that the rate of change (gradient) of y with respect to time is equal to the right-hand side f(y, t), a notion which is important to the geometric interpretation discussed in the following. Equation (2.11) can be integrated formally to give

y(t) = ∫ f(y, t) dt.   (2.12)

With appropriate initial conditions and a numerical integration scheme for f(y, t), the time evolution of y(t) can be obtained. This formulation is the reason that solvers for ordinary differential equations have traditionally been called 'integrators'. To highlight the features of integrators for particle simulation, however, it is more useful to take the differentiation approach, where one rewrites Equation (2.11) using finite differences with increments symbolized by a capital Δ.

2.2.5 Explicit Euler method

If we replace the differential increments dy and dt in Equation (2.11) with finite differences Δy = y(t_{n+1}) − y(t_n) and Δt = t_{n+1} − t_n = τ over successive time-steps t_n and t_{n+1}, we obtain the Euler method

Δy/Δt = [y(t_{n+1}) − y(t_n)]/τ = f(y(t_n), t_n).   (2.13)

This can be rearranged to give a formula for computing y(t_{n+1}),

y(t_{n+1}) = y(t_n) + τ f(y(t_n), t_n),   (2.14)

which is called the explicit 'Euler step' and forms the basic element of many integrators. Despite its simplicity and widespread use as a building block for numerical methods, the explicit Euler method (2.14) is locally only first-order accurate and globally only zero-order accurate. Due to the possibility of rounding errors accumulating, as outlined in § 2.1.5 on the total error, (2.14) should never be used in 'serious' applications. As the right-hand side f(y(t_n), t_n) in (2.13) is actually the gradient of the solution trajectory at time t_n, Equation (2.14) can be read as: 'The new variables are equal to the old variables plus the old gradient times the time-step τ.' Geometrically, the curved trajectory of the true solution from t_n to t_{n+1} is replaced with a straight line, as sketched in the upper panel of Figure 2.4. Over many time-steps, a curved trajectory will be replaced with a polygonal trajectory. Methods of higher order (than the first-order Euler method) can be thought of as approximations of the true solution trajectory by higher-order curves. When the additional points needed to fit these curves are obtained from time-steps previous to t_n, the methods are called multi-step methods; if the points are generated after the current time-step t_n, the methods are called one-step methods.
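A minimal sketch of the Euler step (2.14) as a loop, applied to the oscillator right-hand side f defined in § 2.2.1 (time-step and step count are arbitrary example choices; given the warning above, this is for illustration only):

tau = 0.01; N = 1000;
y = [1; 0];                  % initial position and velocity
for n = 1:N
    t = (n-1)*tau;
    y = y + tau * f(t, y);   % new = old + old gradient * time-step
end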



Figure 2.4 Schematic representation of the explicit Euler method (above) and the implicit Euler method (below); the explicit scheme uses the 'forward' right-hand side f(t_n) (the slope at t_n), while the implicit scheme uses the 'backward' right-hand side f(t_n + τ) (the slope at t_n + τ).

2.2.6 Implicit Euler method

In the explicit Euler method, one replaces the right-hand side f(y(t), t) of the differential equation (2.11) with f(y(t_n), t_n), evaluated at the discrete time t_n. But we could also have chosen to evaluate f(y(t), t) at t_{n+1} = t_n + τ, i.e. the next time-step. In that case, one obtains the implicit Euler method

y(t_{n+1}) = y(t_n) + τ f(y(t_{n+1}), t_{n+1}),   (2.15)

which is also of first order. It can be read as: 'The new variables are equal to the old variables plus the new gradient times the time-step.' This scheme is illustrated in the lower panel of Figure 2.4. The only problem is that at time t_n, we don't yet know what f(y(t_{n+1}), t_{n+1}) is supposed to be. To get around this obstacle, one could try to solve the difference equation (2.15) as a nonlinear system; alternatively, one could approximate the unknown f(y(t_{n+1}), t_{n+1}) by a preliminary f̃(y(t_{n+1}), t_{n+1}) computed via an explicit 'predictor step', and then use f̃(y(t_{n+1}), t_{n+1}) to obtain a better approximation (the 'corrector step'). The scheme (2.14) is also referred to as 'forward Euler', because one uses information at t_n to compute 'forward' towards t_{n+1}; Equation (2.15), on the other hand, is the 'backward Euler' method. Explicit and implicit methods of the same order, such as the forward and backward Euler methods, usually differ in their stability properties; for implicit methods, larger time-steps can generally be used. However, the computational effort needed for implicit methods is usually higher.
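A minimal sketch of one backward Euler step (2.15) in predictor–corrector form, again for the right-hand side f from § 2.2.1; the choice of three corrector sweeps is an arbitrary assumption rather than a convergence criterion:

tau = 0.01; tn = 0; yn = [1; 0];
ynew = yn + tau * f(tn, yn);              % predictor: explicit Euler step
for it = 1:3
    ynew = yn + tau * f(tn + tau, ynew);  % corrector: use the new gradient
end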


2.3 Runge–Kutta methods

Runge–Kutta methods are among the most commonly used numerical methods for solving ordinary differential equations. But, as will become clear, they are not very suitable for particle simulations. Suppose that at time t_n we have computed the numerical solution y_n; then over a time-step τ, an (explicit) Runge–Kutta method is obtained by taking several Euler steps over sub-steps c_1 τ, c_2 τ, ..., with coefficients 0 ≤ c_i ≤ 1, within the interval [0, τ]. Over these sub-steps, a sequence of gradients of Euler steps is computed:

k_1 = f(t_n, y_n),   (2.16)
k_2 = f(t_n + c_2 τ, y_n + τ b_{2,1} k_1),   (2.17)
k_3 = f(t_n + c_3 τ, y_n + τ (b_{3,1} k_1 + b_{3,2} k_2)),   (2.18)
⋮
k_m = f(t_n + c_m τ, y_n + τ (b_{m,1} k_1 + b_{m,2} k_2 + ··· + b_{m,m−1} k_{m−1})).   (2.19)

Each of these gradients depends on one or several of the previous gradients, weighted by the coefficients b_{i,j}, which can be zero or negative. To advance the solution to the next time-step, we use a weighted average of the gradients k_i, with coefficients a_i:

y_{n+1} = y_n + τ (a_1 k_1 + a_2 k_2 + ··· + a_m k_m).   (2.20)

The a_i can be zero or negative, but must satisfy Σ_i a_i = 1. The coefficients c_i, b_{i,j} and a_i are usually represented in so-called Butcher tableaus, like Table 2.1. Generally speaking, up to order 4, four gradients are necessary; for higher orders, the number of necessary gradients is usually greater than the order. Runge–Kutta methods are one-step methods, as no function evaluations from before t_n are used.

In implicit Runge–Kutta methods, the solution and the gradients depend on each other within the same time-step; see the bottom row of Table 2.1. This means that to obtain the values at the next time-step, it is necessary to solve a nonlinear system involving several variables. While in principle implicit Runge–Kutta methods can be more stable than explicit schemes for forces whose time evolution is a differentiable function, for discrete element methods with their ad-hoc or non-smooth force laws, this stability is often not guaranteed. Thus, we will not discuss implicit Runge–Kutta methods any further.

Runge–Kutta methods are very useful and versatile for computing solutions to mathematically well-posed ordinary differential equations at points (t, y) for which the 'neighboring' point (t + δt, y + δy) always exists. However, in particle simulations, computing and retaining intermediate gradients k_i as well as 'neighboring' positions may actually be unphysical, especially for intermediate gradients k_i with negative pre-factors b_{i,j}. Moreover, in particle simulations one has to deal with changing neighborhoods, which makes the implementation of sub-time-steps additionally cumbersome.
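A minimal sketch of a single step of the classical fourth-order Runge–Kutta method, written out with the coefficients of its Butcher tableau in Table 2.1 (f is again the oscillator right-hand side from § 2.2.1):

tau = 0.01; tn = 0; yn = [1; 0];
k1 = f(tn,         yn);
k2 = f(tn + tau/2, yn + tau/2 * k1);
k3 = f(tn + tau/2, yn + tau/2 * k2);
k4 = f(tn + tau,   yn + tau   * k3);
ynew = yn + tau * (k1/6 + k2/3 + k3/3 + k4/6);   % weights 1/6, 1/3, 1/3, 1/6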

2.3.1 Adaptive step-size control

Equations (2.16)–(2.20) for the Runge–Kutta method make use only of information in the time-step considered, so they are so-called one-step methods. Because at each step no


Table 2.1 Butcher tableaus for various explicit Runge–Kutta ('one-step') methods, as well as for the implicit Euler method and general implicit Runge–Kutta methods. Each tableau lists the nodes c_i (left column), the coefficients b_{i,j} (matrix) and the weights a_i (bottom row(s)).

Euler, explicit (order 1):
  0 |
    | 1

Heun (order 2):
  0  |
  1  | 1
     | 1/2   1/2

Bogacki–Shampine, embedded method (orders 2 and 3):
  0    |
  1/2  | 1/2
  3/4  | 0     3/4
  1    | 2/9   1/3   4/9
       | 2/9   1/3   4/9   0
       | 7/24  1/4   1/3   1/8

Classical Runge–Kutta (order 4):
  0    |
  1/2  | 1/2
  1/2  | 0     1/2
  1    | 0     0     1
       | 1/6   1/3   1/3   1/6

General (explicit) Runge–Kutta (order ≤ m):
  0    |
  c_2  | b_{2,1}
  c_3  | b_{3,1}  b_{3,2}
  ⋮    | ⋮        ⋮        ⋱
  c_m  | b_{m,1}  b_{m,2}  ...  b_{m,m−1}
       | a_1      a_2      ...  a_{m−1}    a_m

Euler, implicit (order 1):
  1 | 1
    | 1

General implicit Runge–Kutta (order ≤ m):
  c_1  | b_{1,1}  b_{1,2}  ...  b_{1,m}
  c_2  | b_{2,1}  b_{2,2}  ...  b_{2,m}
  ⋮    | ⋮        ⋮        ⋱    ⋮
  c_m  | b_{m,1}  b_{m,2}  ...  b_{m,m}
       | a_1      a_2      ...  a_m

information from previous time-steps is needed, one can change the size of the time-step τ from one step to the next. 'Embedded' methods (see Table 2.1) use two different orders, O(p) and O(p + 1), to compute respective solutions y_{n+1}^{O(p)} and y_{n+1}^{O(p+1)}. The deviation between these numerical solutions, y_{n+1}^{O(p)} − y_{n+1}^{O(p+1)}, is used together with the exponents p and p + 1


of the order to estimate the error in the solution. Such an embedded method is said to be of order p or p + 1, depending on which order is used to continue the computation. In methods with adaptive step-size control, one can input the desired upper bound of the error per time-step; according to the error estimate, the time-step is reduced to τ̃ = a_red τ with a_red < 1 to improve the accuracy, or increased to τ̃ = a_inc τ with a_inc > 1 to improve efficiency. Usually, the pre-factors for increasing the time-step are smaller than the inverses of the pre-factors by which time-steps are reduced, i.e. a_inc < 1/a_red by a 'safety margin'. If there is too large a deviation between y(t + τ)^{O(p)} and y(t + τ)^{O(p+1)} from t to t + τ, the solution step is rejected; τ is then replaced with a smaller step-size τ′ < τ, and the computation of y(t + τ′)^{O(p+1)} starts again.

For both accepted and rejected time-steps, function evaluations take place, so obtaining the output of intermediate valid results (i.e. those from accepted time-steps) is not straightforward, and some workarounds are necessary; see Exercise 2.12. While these methods may be unsuitable for many-particle simulations due to the large number of intermediate function evaluations (which on top of that may be rejected), they are very efficient tools for testing force laws and modeling differential equations: if the solution turns out to be unstable, or if the time-step becomes inexplicably small, this is an indication that there might be a problem with the smoothness of the solution; see Exercise 2.13. Other possibilities are that the force law is mathematically inconsistent, or even 'unmathematical', as in Exercise 2.14, where no smooth solution is possible. If such tests give unsatisfactory results, the corresponding force laws should not be used for many-particle simulations.

If the accuracy of a simulation is mostly affected by the velocity, then inputting very stringent error tolerances for the particle positions may not improve the results. Therefore, many ODE integrator packages allow independent specification of the accuracy for each variable. In MATLAB, the relative or absolute accuracy must be either a scalar or a vector of the same dimension as the vector for the initial condition.
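A minimal sketch with MATLAB's adaptive embedded Runge–Kutta solver ode45, specifying a scalar relative tolerance and a per-component absolute tolerance, stricter for the position than for the velocity (the tolerance values are arbitrary example choices, and f is the oscillator right-hand side from § 2.2.1):

opts   = odeset('RelTol', 1e-6, 'AbsTol', [1e-8; 1e-4]);
[t, y] = ode45(f, [0 10], [1; 0], opts);   % columns of y: position, velocity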

2.3.2 Dense output and event location

Pointwise solutions y(t_n), y(t_{n+1}), ..., obtained from Runge–Kutta methods, together with the intermediate gradients k_1, k_2, ..., allow a relatively easy, piecewise-continuous expansion of the solution between the discrete times t_n and t_{n+1}, based on the coefficients of the integrator:

y(t) = a_0 + a_1 t + a_2 t² + ··· + a_i t^i.   (2.21)

This polynomial can be used to obtain intermediate points for graphics, if the original data points are too widely separated to produce curves that look continuous to the eye. This technique is used for trajectories of free flight computed with higher-order Runge–Kutta methods with adaptive time-steps; see Appendix A, in particular § A.8. A more serious application in the area of particle simulation is 'event location': when a certain condition on y(t̃) is met, the computation can be adjusted accordingly, for instance by applying another force function. Such is the case in the ballode example of MATLAB, where a bouncing ball is simulated by free flight until there is contact with a wall between two time-steps t_n and t_{n+1}, identified as a zero-crossing at t_n ≤ t̃ ≤ t_{n+1} of the polynomial for the position as expressed via the dense output (2.21). When this happens, the integration is terminated at time t̃ and restarted with the velocity vector reversed (pointing upward instead


of downward). This approach can in principle be used for any event-driven simulation where analytical prediction of the collision times is not possible due to the potentials in which the particles move.
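A minimal sketch of event location in the style of the ballode example, using the documented 'Events' option of odeset; here the event is the zero-crossing of the height y(1) from above during free fall (the gravity value and initial height are example assumptions):

% Event function returns [value, isterminal, direction]:
events = @(t, y) deal(y(1), 1, -1);   % stop when y(1) crosses 0 downwards
opts   = odeset('Events', events);
fall   = @(t, y) [y(2); -9.81];       % free fall: x' = v, v' = -g
[t, y, te] = ode45(fall, [0 5], [10; 0], opts);
% te is the located impact time; the integration could now be restarted
% with the vertical velocity reversed, as in ballode.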

2.3.3 Partitioned Runge–Kutta methods

Runge–Kutta methods were constructed for first-order differential equations, so Newton’s equation of motion always has to be rewritten in the form of a first-order system as explained in § 2.2.1; then, for each coordinate, two solutions must be computed, one for the position and one for the velocity. The error analysis for Runge–Kutta methods is very advanced, so in recent years there has been a tendency to analyze other methods that are not one-step methods and which treat, e.g., velocities differently from positions in the framework of Runge–Kutta methods. This leads to the concept of ‘partitioned Runge–Kutta methods’ [11], which involve a different error analysis for each different component of the solution.

2.4 Symplectic methods

The same concepts discussed above for time integrators of ODEs can be applied to symplectic systems, but with the additional requirement that the flow of the ODE, and that of the approximation by the time integrator, must be conserved. Thus, Liouville’s theorem (see § 1.4) must hold also for the numerical approximation, within the accuracy of the integrator. Accordingly, symplectic solvers are integrators which conserve the flow of symplectic ODEs even for finite time-steps. While systems of macroscopic particles are actually never symplectic, symplectic solvers have been applied to granular materials (usually inappropriately, as there is no consistent way to include velocity-dependent forces; see the discussion at the end of § 2.4.2), because their use is pretty common in particle modeling on the molecular scale. There are also examples of their use in the context of non-dissipative discrete element methods (e.g. ‘Alder systems’ [12, 13]). Symplectic solvers conserve energy over long time-scales much better than do non-symplectic solvers. However, energy conservation should not be used as a criterion of accuracy for non-symplectic systems; accuracy is defined for the variables of the ordinary differential equation (positions and/or velocities), not for complicated compositions of such variables like the total energy. For the class of Verlet and velocity-Verlet schemes we will outline in this section, transformation of Newton’s equation of motion to a first-order system is not necessary: the methods work directly for second-order differential equations.

2.4.1 The classical Verlet method

The Verlet scheme, also known as the classical Verlet scheme or the Störmer–Verlet scheme (as Störmer [14] had introduced the method half a century before Verlet [15] did), computes the new position x(t_{n+1}) from the current position x(t_n), the current acceleration a(t_n), and the position at the previous time-step, x(t_{n−1}), using the following formula:

x(t_{n+1}) = 2x(t_n) − x(t_{n−1}) + τ² a(t_n).   (2.22)

Since not only values from the current time-step from t_n to t_{n+1} are used but also values from a previous time-step, t_{n−1}, this is a multi-step method. The method is not 'self-starting', because at the initial time-step from t_0 to t_1, the value of x(t_{n−1}) is needed but not known.


Conventionally, to overcome this problem, one first computes x(t_1) from x(t_0) with a one-step method (e.g. a Runge–Kutta method) or a velocity-Verlet method (see § 2.4.2), which are all self-starting. Preferably, a method should be chosen which is of at least the same or higher order than the non-self-starting method, to avoid introducing errors already at the beginning of the integration process. When the necessary number of solution steps have been obtained, one can then continue with the multi-step method. For many-particle simulations, however, the details of the initial conditions are not very important, in which case one could start the Verlet method by setting, for instance, x(t_1) = x(t_0) and then computing x(t_2), x(t_3) and so on. The Verlet method itself gives only the positions, from which velocities can be obtained by interpolation. Using forward differences

v(t_n) = [x(t_n) − x(t_{n−1})]/τ   (2.23)

gives the velocities to first order at time t_n; using centered differences

v(t_n) = [x(t_{n+1}) − x(t_{n−1})]/(2τ)   (2.24)

gives the velocities to second order, but only at time t_{n+1}. As the velocity in Equations (2.23) and (2.24) is known only one step after the acceleration a_n has been used according to Equation (2.22), the classical Verlet scheme cannot be used for velocity-dependent forces if one is interested in maintaining second-order accuracy. Manipulating the Taylor series can lead to the wrong conclusion (see, e.g., the Wikipedia entry [16]) that the local order of the Verlet method is 3; in this case, the error propagation in the acceleration has been neglected. The equally wrong conclusion that the global order of accuracy is 1 (based on the erroneous argument that the global error must be one order higher than the local error) comes from neglecting the fact that the worst-case accumulation of hypothetical errors from a Taylor series is unable to capture the global conservation of the flow by symplectic methods.

In Figure 2.5 we plot the error order for the harmonic oscillator example, computed for the positions and for the velocities according to both Equation (2.23) and Equation (2.24). For both the positions and the velocities computed from centered differences (second order), the error is approximately of second order. For the velocities computed from forward differences (first order), the local error is approximately of first order, while the global error shows a drift (the corresponding graph in Figure 2.5 is slightly curved; for the time-steps considered, the order averages to about 1.4). This shows that theoretical derivations of the order via Taylor approximations should be taken with a grain of salt.
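A minimal sketch of the experiment behind Figure 2.5: the classical Verlet scheme (2.22) for the harmonic oscillator, with the crude start x(t_1) = x(t_0) suggested above and velocities recovered by centered differences (all parameters are arbitrary example values):

tau = 0.01; N = 1000; k = 1; m = 1;
x = zeros(N,1);
x(1) = 1; x(2) = x(1);                    % crude, non-self-starting start
for n = 2:N-1
    a = -k/m * x(n);                      % current acceleration
    x(n+1) = 2*x(n) - x(n-1) + tau^2*a;   % Verlet step (2.22)
end
v = (x(3:N) - x(1:N-2)) / (2*tau);        % centered differences (2.24)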

2.4.2 Velocity-Verlet methods

Integrators of velocity-Verlet type compute alternately the new positions and velocities in sub-steps. For the second-order velocity-Verlet method, the sub-steps are

v(t_n + τ/2) = v(t_n) + (τ/2) a(t_n),   (2.25)
x(t_{n+1}) = x(t_n) + τ v(t_n + τ/2),   (2.26)
v(t_{n+1}) = v(t_n + τ/2) + (τ/2) a(t_{n+1}).   (2.27)


[Figure 2.5 shows the errors on a double-logarithmic scale; the fitted orders are: local error ε(τ^1.9895) in x, ε(τ^1.9888) in v (centered differences), ε(τ^1.0109) in v (forward differences); global error ε(τ^1.9817) in x, ε(τ^1.9812) in v (centered differences), ε(τ^1.4001) in v (forward differences).]

Figure 2.5 Global and local error of the positions and of the velocities (computed with forward and centered differences) in the time evolution of the harmonic oscillator, computed with the classical Verlet method.


Figure 2.6 Sketch of the evaluation order for the velocity-Verlet integrator given by Equations (2.25)–(2.27); both the velocities and the positions are available at each integer time-step t_n, t_{n+1}, t_{n+2}, ....

As the accelerations are position-dependent only (nothing else makes sense for symplectic integrators), the algorithm is explicit; see Figure 2.6. Like the classical Verlet method, the accuracy for the position variable is second order; but, in contrast to the classical Verlet method, the accuracy for the velocity is also of second order, which is a significant advantage. One can compute empirical Lyapunov exponents for multi-body simulations by computing the divergence of the trajectories or the velocities of systems with slightly different initial conditions.


Figure 2.7 Sketch of the evaluation order for the leapfrog integrator given by Equations (2.28)–(2.29); velocities and positions are never available simultaneously.

In this case, the classical Verlet method may give different Lyapunov exponents for computations using only positions as compared to computations that use the velocities [10]; due to the different error orders, the data for the velocities will be influenced by a large truncation error. For the velocity-Verlet method (and its higher-order variants discussed in the next subsection), the data can be expected to be consistent. A further advantage of the velocity-Verlet method (and its variants) is that it is self-starting: for a given initial velocity v(t_0) at position x(t_0), no information from previous time-steps is necessary to start the program.

A variant of the velocity-Verlet method is the leapfrog method (see Figure 2.7),

$$x(t_{n+1}) = x(t_n) + \tau\, v(t_n + \tfrac{1}{2}\tau), \tag{2.28}$$
$$v(t_n + \tfrac{1}{2}\tau) = v(t_n - \tfrac{1}{2}\tau) + \tau\, a(t_n), \tag{2.29}$$

which, in a manner of speaking, lumps together the computations of Equations (2.25) and (2.27) of the second-order velocity-Verlet method. Saving one sub-step for the velocities is traded for the fact that one never has both the velocity and the position at the same time-step, and the algorithm is no longer self-starting. It is clear that the leapfrog method in Equations (2.28)–(2.29) and the classical Verlet method in Equation (2.22) are not suitable for velocity-dependent interaction laws, because there is no way to obtain the velocities with the same accuracy at the same (sub-)time-step as the positions. Slightly less obviously, the velocity-Verlet method (as well as its higher-order variants introduced in the next subsection) is likewise unsuitable: in Equation (2.27), the velocities appear on the left-hand side and the accelerations on the right-hand side for one and the same time-step. Should the accelerations depend on the velocities, the equation would become implicit, and there would be no way to solve it directly. The same implicit structure arises for other formulations that are called 'velocity Verlet' (see, e.g., Allen and Tildesley [34, p. 81]).
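A minimal MATLAB sketch of the leapfrog update, again for a(x) = −x and with variable names of our own choosing, shows that only the half-step velocity is ever stored, and that a separate start-up step is needed:

a=@(x) -x;
tau=0.01; nsteps=1000;
x=1;
vh=0+0.5*tau*a(x);   % start-up: v(tau/2) from v(0)=0 by an Euler half-step
for n=1:nsteps
  x=x+tau*vh;        % Eq. (2.28): drift
  vh=vh+tau*a(x);    % Eq. (2.29): kick from t+tau/2 to t+3*tau/2
end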

2.4.3 Higher-order velocity-Verlet methods

By modifying Equations (2.25)–(2.27) with different sub-steps, one can obtain velocity-Verlet-type methods of higher order. Instead of one sub-step at the middle of the time interval, t_n + ½τ, we could take M_sub sub-steps of lengths γ_i τ and η_i τ for the velocity v and position x, respectively:


$$\begin{aligned}
v(t_n + \gamma_1\tau) &= v(t_n) + \gamma_1\tau\, a(t_n),\\
x(t_n + \eta_1\tau) &= x(t_n) + \eta_1\tau\, v(t_n + \gamma_1\tau),\\
v(t_n + \gamma_1\tau + \gamma_2\tau) &= v(t_n + \gamma_1\tau) + \gamma_2\tau\, a(t_n + \eta_1\tau),\\
x(t_n + \eta_1\tau + \eta_2\tau) &= x(t_n + \eta_1\tau) + \eta_2\tau\, v(t_n + \gamma_1\tau + \gamma_2\tau),\\
&\;\;\vdots\\
v(t_{n+1}) &= v(t_{n+1} - \gamma_{M_{sub}}\tau) + \gamma_{M_{sub}}\tau\, a(t_{n+1} - \eta_{M_{sub}}\tau),\\
x(t_{n+1}) &= x(t_{n+1} - \eta_{M_{sub}}\tau) + \eta_{M_{sub}}\tau\, v(t_{n+1}).
\end{aligned} \tag{2.30}$$

As in the original (second-order) velocity-Verlet method given by (2.25)–(2.27), new velocities and positions are obtained from the old ones by incrementation with the accelerations and velocities, respectively. In Table 2.2 we show some representative velocity-Verlet methods, their orders and their coefficients. We remark here that the error order given in Table 2.2 is valid only for bounded energy terms in the system. Strictly speaking, this excludes systems with singular potentials, such as the gravitational and unscreened Coulomb interactions and Lennard–Jones-type potentials

$$\phi(r) = \sigma\left(\frac{1}{r^{2n}} - \frac{1}{r^{n}}\right) \tag{2.31}$$

(the exponents n and 2n are common because the term with the power n need only be squared to obtain the term with the power 2n). Such potentials have actually been used [17] to model phenomena in granular particles! In principle, the velocities and positions in system (2.30) (but not the order of the γ_i and η_i) could be interchanged. However, the force evaluation is usually the most costly operation; therefore the coefficients in Table 2.2 are arranged so that a vanishing η_i allows us to drop one force evaluation. As for the original velocity-Verlet method, the order of the local accuracy is also the order of the global accuracy. By choosing the coefficients γ_i and η_i appropriately, one can also construct other second-order velocity-Verlet methods besides the original one in (2.25)–(2.27) (e.g. the McLachlan method in Table 2.2). In general, the number of sub-steps is at least as large as the order of the method, but there are methods where the first or last γ_i τ or η_i τ is zero. The coefficients γ_i and η_i must satisfy the relation

$$\sum_{i=1}^{M_{sub}} \gamma_i = \sum_{i=1}^{M_{sub}} \eta_i = 1. \tag{2.32}$$

Some of the γ_i, η_i can be negative; in fact, for methods of order 3 or higher, there must be at least one negative coefficient (see, e.g., the method of Ruth in Table 2.2). The size of the coefficients is not limited either (see, e.g., the method of Tselios and Simos in Table 2.2, which has some coefficients greater than 1). Depending on the derivation, the coefficients may be obtained in closed form or in finite precision, as the solution of linear (in the case of Tselios and Simos's method) or nonlinear systems of equations. If accuracy becomes a problem in a


Table 2.2 Coefficients of velocity-Verlet methods for symplectic and pseudo-symplectic decompositions. Asymmetric methods are indicated by AS; pseudo-symplectic methods are indicated by PS. If very high accuracy is desired, it is more efficient to use higher-order methods rather than low-order methods with small time-steps. Repeated symbols such as γ_1 indicate that the corresponding earlier coefficient is used again.

Original velocity Verlet (order p = 2, M_sub = 2):
  γ = (1/2, 1/2)
  η = (1, 0)

McLachlan [18, 19] (order p = 2, M_sub = 2):
  γ = (1 − 1/√2, 1/√2)
  η = (1/√2, 1 − 1/√2)

Ruth [20] (AS; order p = 3, M_sub = 3):
  γ = (7/24, 3/4, −1/24)
  η = (2/3, −2/3, 1)

Forest & Ruth [21]; Candy & Rozmus [22] (order p = 4, M_sub = 4):
  γ = (1/(2 − 2^{1/3}), 1/(1 − 2^{2/3}), γ_1, 0)
  η = ((2 + 2^{1/3} + 2^{−1/3})/6, (1 − 2^{1/3} − 2^{−1/3})/6, η_2, η_1)

Chambers [23–25] (PS; order p = 4, M_sub = 3):
  γ = ((1 − 1/√3)/2, 1/√3, γ_1)
  η = (1/2, 1/2, 0)

Suzuki 'fractal' [26–28] (order p = 4, M_sub = 6):
  γ = (0.2072453858971879, 0.4144907717943757, −0.1217361576915636, γ_3, γ_2, γ_1)
  η = (0.4144907717943757, 0.4144907717943757, −0.6579630871775028, η_2, η_1, 0)

Tselios & Simos [29] (AS; order p = 5, M_sub = 7):
  γ = (0.4515650720436606, −0.002625517726040550, −0.2887462490910128, 0.4703720043422902, 0.3704466763359328, 0.1934796732533846, −0.1944916591582146)
  η = (1.904232780508446, −1.939586366441925, 0.3960766510231830, 0.5133868104090695, −2.967739460604547, 0.004177409528669316, 3.089452175577104)

Yoshida [30, 31] (order p = 6, M_sub = 8):
  γ = (0.3922568052387786, 0.5100434119184577, −0.4710533854097564, 0.06875316825252009, γ_4, γ_3, γ_2, γ_1)
  η = (0.7845136104775573, 0.2355732133593581, −1.177679984178871, 1.315186320683911, η_3, η_2, η_1, 0.0)


simulation (see § 2.4.5), one can choose higher-order methods, as they generally give much better accuracy than the 'classical' Verlet method; however, if a time-step is used that is close to the border where the simulation becomes unstable, there are no performance advantages.
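As an illustration of how such coefficients enter a program, the following MATLAB sketch advances one time-step with a generic (γ_i, η_i) decomposition of the form (2.30), here with the fourth-order Forest–Ruth/Candy–Rozmus coefficients; we use the ordering published by Candy and Rozmus (leading zero kick), which may be arranged differently from Table 2.2, and the function and variable names are our own:

a=@(x) -x;                            % sample acceleration (harmonic oscillator)
th=1/(2-2^(1/3));
gam=[0, th, 1-2*th, th];              % velocity (kick) coefficients, sum = 1
eta=[th/2, (1-th)/2, (1-th)/2, th/2]; % position (drift) coefficients, sum = 1
tau=0.01; x=1; v=0;
for i=1:4                             % sub-steps as in Eq. (2.30)
  v=v+gam(i)*tau*a(x);                % kick (gamma_1 = 0: first kick vanishes)
  x=x+eta(i)*tau*v;                   % drift
end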

2.4.4 Pseudo-symplectic methods

It can be shown (see [32]) that symplectic methods of order higher than 2 must contain some negative coefficients γ_i, η_i. Negative sub-steps mean that one has to perform force evaluations which do not advance the solution towards the final time, and which must be compensated for in the forward integration. One can get around the problem of negative time-steps in higher-order symplectic methods by using pseudo-symplectic methods; these are derived (see [23, 24]) under the additional assumption that one contribution (the interaction energy) is a small perturbation of the other contributions in the system (the kinetic energy). With this approach, positive values can be assigned to every γ_i and η_i. Up to now, we have not encountered any situation where the theoretical assumption of a small perturbation in the derivation of the coefficients imposes practical restrictions on the application of pseudo-symplectic decompositions to time integration in particle simulations. Nevertheless, as with all symplectic methods, an application to systems with energy dissipation is not possible.

2.4.5 Order, accuracy and energy conservation

The order p given in Table 2.2 is the order of accuracy for the coordinates and velocities; it does not indicate the accuracy of the energy. It can be shown (see [33]) that if a symplectic method has order of accuracy p, then in general the worst-case bound for the order of accuracy of the energy is p − 1, one order less! Whether the order of accuracy of the energy is actually lower than that of the positions and velocities depends on the system. In simulating the harmonic oscillator with the classical Verlet method, we find the same error order for the energy (Figure 2.8) as for the positions and velocities (Figure 2.5). It is not possible to predict in general whether the energy of a symplectic approximation lies below or above the exact value [33].

Figure 2.8 Error order for the Verlet integrator applied to the harmonic oscillator, with the velocities computed by both centered differences (second order) and forward differences (first order). (Fitted orders in the energy: τ^0.99455 and τ^0.99995 with forward differences; τ^1.9872 (local) and τ^1.9996 (global) with centered differences.)

Accordingly, for a second-order method, the local error should be of third order in the positions and of second order in the energy. In other words, when the time-step is reduced by one order of magnitude, the error in the energy should decay by two orders of magnitude. Such a scaling with an exponent of 2 can indeed be seen in plots of the error in the energy (see, for example, Figure 3.3 on p. 83 of [34]); but, owing to the common confusion between the order of accuracy and the error order (see § 2.2.3), it can easily be misinterpreted as a second-order error. Let us consider the magnitude of the error for a hypothetical problem where the energy, positions and velocities are all of order 1. For a method with order of accuracy 1, the error order would be 2; that is, for a time-step τ = 1/10, the error in the positions and velocities would be about (1/10)² = 1%, while the error in the energy would be one order less, (1/10)¹ = 10%.

For time integrators, the work efficiency is essentially determined by the number of force evaluations per simulated time interval. When good energy conservation is needed, higher-order methods are recommended, especially the pseudo-symplectic methods, which we have found to be very efficient. For moderate or low accuracy and large time-steps, it is hard to beat the efficiency of the velocity-Verlet method; in our experience only a few

methods may be more efficient (i.e. for the same quality of energy conservation, fewer force evaluations are needed), two of which are Forest and Ruth's method and Chambers' method in Table 2.2. Generally, methods that have large coefficients or too many negative coefficients cannot be recommended; methods of very high order, such as the sixth-order Yoshida method in Table 2.2, may develop instabilities for some problems.

Symplectic integrators can be symmetric, i.e. the coefficients are symmetric with respect to the integration interval, or asymmetric. Sometimes we require that a numerical solution exhibit not only good energy conservation but also time-reversal symmetry, for example in cases where equilibrium properties are related to time reversibility. In such situations, the use of symmetric methods is preferable. If one performs an integration from x(0) to x(t) forward in time and then from x(t) to x̃(0) backward in time, asymmetric methods (marked with 'AS' in Table 2.2) will yield considerably larger deviations between x(0) and x̃(0) than symmetric methods. This drift is caused principally by the asymmetry of the decompositions rather than by rounding errors [10]. Pseudo-symplectic methods are in general symmetric.

Owing to the correspondence between Hamiltonian systems and their exponential time operator, coefficients of operator decomposition schemes can be used in symplectic integration methods and vice versa. A table more complete than Table 2.2, listing current methods up to order 8, can be found in [35]. We conclude with the remark that there are also symplectic methods which are not Verlet or velocity-Verlet schemes; some are implicit Runge–Kutta methods; see [11] and [36].
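The backward-integration test just described is easy to carry out; the following minimal MATLAB sketch (our own construction) checks the time-reversal symmetry of the velocity-Verlet method by integrating forward, reversing the velocity halfway, integrating the same number of steps again, and comparing with the initial state:

a=@(x) -x; tau=0.01; nsteps=5000;
x=1; v=0; x0=x; v0=v;
for k=1:2*nsteps
  if k==nsteps+1, v=-v; end   % reverse the motion halfway
  vh=v+0.5*tau*a(x);
  x=x+tau*vh;
  v=vh+0.5*tau*a(x);
end
disp([x-x0, -v-v0])            % deviations are tiny, up to rounding errors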

2.4.6 Backward error analysis

In this subsection we make some remarks to clarify how symplectic integrators attain the surprising feat of having the same local and global discretization error. While forward error analysis (as for the truncation error in § 2.2.3) compares the approximation with the exact solution and tries to quantify the error (e.g. by Taylor expansion), backward error analysis (see, e.g., § VII.8 of [9]) tries to identify the problem for which the numerical solution is the exact solution. In the case of symplectic solvers for trajectories of particles described by ODEs, a solution obtained from the solver corresponds to the exact solution of a system whose trajectories lie in the vicinity of the exact solution of the original system, and the extent of this 'vicinity' depends on the order of the integrator; see Figure 2.9.

Figure 2.9 Schematic representation of the meaning of backward error analysis for symplectic integrators: the exact trajectory (thick black line) at energy ε₀, along with one trajectory computed using a symplectic approximation of high accuracy, i.e. high order or small time-step (dotted line), on the constant-energy surface ε₀ + |ε₁| (dark gray narrow tube around the exact trajectory), and another trajectory computed using a symplectic approximation of low accuracy, i.e. low order or large time-step (dashed line), on the constant-energy surface ε₀ + |ε₂| (light gray tube around the dark gray tube); also plotted is the drift away from the exact trajectory due to a symplectic approximation with randomly varying time-step (thin solid line).

Adaptive step-size control is problematic for symplectic methods. With a symplectic integrator, a random variation of the time-step leads to an energy drift in the direction normal to the exact solution, which destroys the symplecticity [37]. Additional considerations are necessary to implement adaptive step-size control while conserving the symplectic properties [11]. We have discussed symplectic methods in much more detail than is necessary for the practical implementation of discrete element methods: it should have become clear that energy considerations in the selection of methods or time-steps, which are popular among physicists, are not very relevant for dissipative systems, in which accuracy can be defined only for the positions and velocities, not for a combination of the variables such as the energy.

2.4.7 Case study: the harmonic oscillator with and without viscous damping

In the narrower sense, the harmonic oscillator refers to the linear differential equation mẍ + kx = 0, which, if we set m = k = 1, has the exact solution x(t) = A cos t + B sin t. We compute the solution with the symplectic velocity-Verlet method (§ 2.4.2) and with the second-order Runge–Kutta 'Heun' method (§ 2.3), and determine the error order for the position and the velocity; we do this for both the local error (where we have used the maximum error over a single period, t ∈ [0, 2π]) and the global error (which we take to be the maximal error after 100 periods, t ∈ [0, 200π]). We chose the integration interval t ∈ [0, 200π] because the global error does not yet 'jump' in this range; for t ∈ [0, 2000π], the numerical solution for the largest time-step is off by half a period, which leads to a freak reduction in the error. We vary the time-step so that τ ∈ {0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.25} and plot the error (the deviation between the exact solution and the numerical solution) on a double-logarithmic scale in Figure 2.10. In the figure legend we give the exponent α of the error order (τ^α), calculated from

$$\alpha = \frac{\log(\epsilon(\tau_{max})) - \log(\epsilon(\tau_{min}))}{\log(\tau_{max}) - \log(\tau_{min})}. \tag{2.33}$$

Figure 2.10 Sketch of the global and local error order for the harmonic oscillator solved with the second-order symplectic velocity-Verlet method and with the second-order Runge–Kutta 'Heun' method. (Fitted orders: Heun, local, τ^1.9993 in x and τ^1.9876 in v; Heun, global, τ^2.0206 in x and τ^2.0226 in v; velocity-Verlet, local, τ^1.9895 in x and τ^1.9888 in v; velocity-Verlet, global, τ^1.9817 in x and τ^1.9812 in v.)

From Figure 2.10, we can make the following observations for the harmonic oscillator:

1. For these two second-order methods, the error order is about the same, as the graphs are all parallel.
2. For both solvers, the error order for the velocity is marginally lower than for the positions.
3. For both solvers, the global error is larger than the local error, in proportion to the integration time.
4. While the order of the global error for the velocity-Verlet method is marginally higher than the order of the local error, surprisingly the opposite is true for the Heun method.

While the error order is nearly the same for both methods, over the time interval under consideration the magnitude of the errors for the Heun method is one order of magnitude larger than for the velocity-Verlet method. In Figure 2.11 it can be seen that the amplitude of the harmonic oscillator trajectory computed with the Heun method actually diverges, but the logarithm in (2.33) hides this bad behavior. This shows that the error order (or order of accuracy), i.e. the power with which the accuracy depends on the time-step, is not by itself a useful quantity for judging numerical methods.

Figure 2.11 Time evolution of the harmonic oscillator computed using the velocity-Verlet and Heun methods with τ = 0.25. While the maximal amplitude of the exact solution should not change, the amplitude of the solution obtained from the second-order Runge–Kutta (Heun) method diverges.
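Equation (2.33) is straightforward to evaluate from two runs with different time-steps; the following minimal MATLAB sketch (our own set-up, using the error at the final time rather than the maximum over the interval, which behaves similarly) estimates the global error order of the velocity-Verlet method for the undamped oscillator:

taus=[1e-3 1e-1]; err=zeros(1,2);
for j=1:2
  tau=taus(j); nsteps=round(200*pi/tau);
  x=1; v=0;                        % exact solution: x(t) = cos(t)
  for n=1:nsteps
    vh=v-0.5*tau*x; x=x+tau*vh; v=vh-0.5*tau*x;
  end
  err(j)=abs(x-cos(nsteps*tau));   % global error in the position
end
alpha=(log(err(2))-log(err(1)))/(log(taus(2))-log(taus(1)))  % close to 2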

2.5 Stiff problems

Originally, the term 'stiff' came from problems with stiff spring constants, which made it necessary to use very small time-steps in the computation, something that was very costly for the computers of the 1960s. The meaning has slowly metamorphosed to refer to ODEs with special properties that make the use of very small time-steps necessary for some solution methods, but not for others, which are therefore called 'stiff solvers'. In fact, it is difficult to formulate a precise definition of a stiff ODE; we should say, rather, that some ODEs exhibit the property of stiffness for certain parameters; see Exercise 2.8. There are various reasons why ODEs become stiff, though in each case it is possible to find counterexamples where a similar equation satisfies the same condition but does not behave in a stiff manner. The following effects may lead to stiff ODEs.

Stability is more important than accuracy. When solving stiff problems with stiff solvers, the solutions do not change much if the time-step is varied; in fact, often they are so stable that they have even been called super-stable; see [38].

Multiple time-scales in the problem. Stiff problems often involve several different time-scales; for example, a system may exhibit slow oscillations with large amplitude, and also


fast oscillations with small amplitude and damping. Stiff solvers may be able to ignore the fast time-scale altogether.

Large variations in the solution over small intervals. A problem can become stiff when there is large variation in the solution (not only towards large absolute values but perhaps also towards very small values, ranging over many orders of magnitude). The large variation may not occur in the 'obvious' variable; for example, for the van der Pol oscillator

$$\ddot{y} = -y + \mu(1 - y^2)\dot{y},$$

the solution y(t) stays fairly bounded (see Exercise 2.8), but the velocity ẏ shows large variations with increasing values of the parameter μ.

Large variation in the eigenvalues of the Jacobian. Some stiff problems can be characterized by large variation in the eigenvalues of the Jacobian of the system, but this is not a useful criterion for particle simulations, where the Jacobian

$$\nabla f(y, t) = \begin{pmatrix}
\frac{\partial f_1}{\partial y_1} & \frac{\partial f_1}{\partial y_2} & \cdots & \frac{\partial f_1}{\partial y_n}\\
\frac{\partial f_2}{\partial y_1} & \frac{\partial f_2}{\partial y_2} & \cdots & \frac{\partial f_2}{\partial y_n}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial f_n}{\partial y_1} & \frac{\partial f_n}{\partial y_2} & \cdots & \frac{\partial f_n}{\partial y_n}
\end{pmatrix}$$

is often not accessible, or its computation may be too costly.

Stiff solvers work better than non-stiff solvers. A pragmatic definition of stiffness is that if the solution of a problem is obtained at lower computational cost with an implicit (stiff) solver than with an explicit (non-stiff) solver, then the problem is stiff. Although this definition may sound recursive or circular, it has long been used in numerical analysis; see, e.g., [9, p. 1].

Even though stiff solvers have many advantages, when details of a solution in the high-frequency range are sought, one still needs to use either a non-stiff solver, which will not neglect the fast time-scale, or a stiff solver with the same time-step as the non-stiff solver. Also, if small time-steps are chosen, or if small error tolerances are specified which lead to small time-steps, then the time-step for stiff solvers will be reduced to the same size as for non-stiff solvers [9, p. 561].
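The difference in cost can be seen directly with MATLAB's standard solvers; the following sketch (our own example, using the van der Pol form given above with a stiff parameter value; the explicit run may take noticeably longer) compares the number of steps taken by the non-stiff ode45 and the stiff ode15s:

mu=100;
vdp=@(t,y) [ -y(2)+mu*(1-y(2)^2)*y(1);   % y(1) = velocity
             y(1) ];                     % y(2) = position
[t1,y1]=ode45(vdp,[0 200],[0;2]);
[t2,y2]=ode15s(vdp,[0 200],[0;2]);
disp([length(t1) length(t2)])            % ode45 needs far more steps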

2.5.1 Evaluating computational costs

For explicit methods, the most costly operation in the numerical solution of ODEs of the form (2.11) is the evaluation of the right-hand-side function f(y, t), which for DEM simulations corresponds to the force computation between the particles. For the work efficiency, one has to take into account the number of force evaluations per simulated time interval, not per time-step: an algorithm which needs one evaluation of f(y, t) with a time-step of τ = 0.1 will in general be less efficient than an algorithm which needs two evaluations of f(y, t) with a time-step of τ = 0.3 (ten evaluations per unit time compared with about 6.7). Small arithmetic effort in an integrator may seem attractive; but if it comes at the cost of small time-steps, the work efficiency will be poor.


2.5.2 Stiff solutions and error as noise

While the error in numerical procedures such as quadrature (the numerical evaluation of definite integrals) is 'only' a deviation from the exact values, for the time evolution of many-particle systems it is advisable to think of the error as noise. This noise may excite responses in the system, such as resonances (as discussed in § 1.4.2), or it may make it necessary to reduce the time-step so that the noise does not grow into an instability of the system. In the worst case, for unstable systems, the noise will drive the system away from the correct trajectory in phase space. For the damped harmonic oscillator, stiff and non-stiff solvers with adaptive time-steps allow us to increase the step-size when the amplitude of the solution becomes small; see Exercise 2.5. In Exercise 2.11, when the ball has stopped bouncing and is at rest on the floor, the physics is practically the same as for the damped harmonic oscillator with vanishing amplitude; but in this situation the non-stiff solvers are not able to increase the time-step. So one way of thinking about stiff solvers is that they are solvers which do not create additional noise in the solution; non-stiff solvers, on the other hand, create noise which can be dealt with (keeping the simulation stable) only by keeping the time-step small.

2.5.3 Order reduction

In the case study of § 2.4.7, we saw that the relation between the local and the global order of accuracy is far from straightforward. For stiff solvers, the situation is more complicated, even for local errors. When a stiff solver is applied to a stiff problem, so that a much larger time-step τ_s is used than would be necessary with a non-stiff method (τ_ns; see Exercise 2.13), it may happen that the order of accuracy is reduced, for algebraic reasons, from O(τ_ns^p) to O(τ_s^{p−δ}), where δ depends on the problem and may be greater than or equal to 1 if constraints are present in the system. This phenomenon is called 'order reduction'; see [9, § IV.15] for details. Order reduction does not occur if stiff solvers are used with time-steps of the size which would be necessary for non-stiff solvers applied to the same equations.

2.6 Backward difference formulae

2.6.1 Implicit integrators of the predictor–corrector formulae

Many stiff solvers are implicit solvers. We have already encountered an implicit formula, namely the implicit Euler method of § 2.2.6. In principle, instead of solving such a system of possibly nonlinear implicit equations, one could try to approximate its solution by first taking an explicit 'predictor' step from time t to time t + τ using the forward Euler method with the gradient f(x, t), and then 'correcting' the result by using the difference of the gradients at the successive times, f(x, t) − f(x + Δx, t + τ). One generalization of the explicit and implicit Euler methods to higher orders gives the backward difference method, in which first a predictor step is taken, then the forces are evaluated, and then the change in the forces relative to the previous time-step is used for the corrector step. Such methods are 'implicit' because they contain the positions as functions not just of the forces but also of all the other derivatives. While it is possible to combine correctors and predictors of different orders, the combination of a predictor of order p and a corrector of order p̃ will be of order min(p, p̃), i.e. the lower order of the two.


Backward difference formulae (BDF) are stable up to order 5; beyond that, they are unconditionally unstable, i.e. they are guaranteed not to converge. For a position r_0 and a time-step τ, a method of order p requires the derivatives up to order p, rescaled by the respective powers of the time-step τ:

$$r_1 = \tau \frac{dr_0}{dt}, \quad r_2 = \frac{\tau^2}{2!} \frac{d^2 r_0}{dt^2}, \quad r_3 = \frac{\tau^3}{3!} \frac{d^3 r_0}{dt^3}, \quad \ldots, \quad r_p = \frac{\tau^p}{p!} \frac{d^p r_0}{dt^p}.$$

These variables are usually collected into the so-called Nordsieck vector

$$\mathbf{r}(t) = [r_0(t), r_1(t), r_2(t), r_3(t), r_4(t), r_5(t)] \tag{2.34}$$

(where we use row and column vectors interchangeably). The predictor, i.e. the Nordsieck vector for the next time-step t + τ, computed under the assumption that the force does not change, is

$$\begin{pmatrix} r_0^p(t+\tau)\\ r_1^p(t+\tau)\\ r_2^p(t+\tau)\\ r_3^p(t+\tau)\\ r_4^p(t+\tau)\\ r_5^p(t+\tau) \end{pmatrix} =
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1\\
0 & 1 & 2 & 3 & 4 & 5\\
0 & 0 & 1 & 3 & 6 & 10\\
0 & 0 & 0 & 1 & 4 & 10\\
0 & 0 & 0 & 0 & 1 & 5\\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} r_0(t)\\ r_1(t)\\ r_2(t)\\ r_3(t)\\ r_4(t)\\ r_5(t) \end{pmatrix}. \tag{2.35}$$

This means that the position at time t + τ is obtained under the assumption that the force is constant. Instead of giving the derivation, which can be found in [39], we explain here the general idea behind it. The evolution of a function f(x, t) to f(x + h, t + τ) can be written formally using the operator exponential of the derivatives in space and time as

$$f(x + h, t + \tau) = \exp\left(h \frac{\partial}{\partial x} + \tau \frac{\partial}{\partial t}\right) f(x, t). \tag{2.36}$$

This time evolution expressed with the operator exponential can then be formally expanded in a power series,

$$f(x + h, t + \tau) = \sum_{k=0}^{\infty} \frac{1}{k!} \left(h \frac{\partial}{\partial x} + \tau \frac{\partial}{\partial t}\right)^k f(x, t). \tag{2.37}$$

Retaining only a finite number of terms p and collecting terms of the same power according to the binomial formula leads to the pre-factors from the Pascal triangle

1
1 1
1 2 1
1 3 3 1

as the first entries: each column of the matrix in Equation (2.35) contains one row of the Pascal triangle, which makes the relationship between the matrix and the Pascal triangle obvious.
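Because of this structure, the predictor matrix of Equation (2.35) need not be typed in by hand; it can be generated for any order from binomial coefficients. A minimal MATLAB sketch (our own helper, not from the original code):

% build the (p+1)x(p+1) Nordsieck predictor matrix of Eq. (2.35):
% entry (i,j) is the binomial coefficient C(j-1, i-1)
p=5; P=zeros(p+1);
for i=1:p+1
  for j=i:p+1
    P(i,j)=nchoosek(j-1,i-1);   % columns are rows of the Pascal triangle
  end
end
r=[1; 0.1; 0.005; 0; 0; 0];     % example Nordsieck vector (scaled derivatives)
rp=P*r;                         % predicted Nordsieck vector at t+tau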


For the time integration, it is best to work with a time-step of τ = 1, for the sake of numerical stability: with time-steps smaller than 1, there is a risk that the higher-order terms in the predictor equation (2.35) drop out due to a lack of valid digits. For DEM simulations, it is therefore convenient to work with two sets of variables. One set consists of the positions, velocities and forces which are accessible by the main program; these should be in SI units, to avoid extra work (and errors) in converting the input and output data, so that the units of intermediate results need no post-processing. Within the predictor–corrector module, on the other hand, there should be a set of variables with units scaled so as to make the internal time-step τ = 1. The transformation is only a scalar multiplication, which is computationally much cheaper than the computation of the predictor equation (2.35) itself. With the pre-computation of dt2=dt*dt, the first three predicted derivatives X0P, X1P, X2P of the coordinates in dimensionless units can be computed from the physical positions, velocities and accelerations X_phys, VX_phys, AX_phys in SI units, using a loop over all particles with loop variable i_particle:

X0P(i_particle) = X_phys(i_particle)            % r_0: position
X1P(i_particle) = VX_phys(i_particle)*dt        % r_1 = tau*(dr_0/dt)
X2P(i_particle) = 0.5d0*AX_phys(i_particle)*dt2 % r_2 = (tau^2/2)*(d^2 r_0/dt^2)
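After the corrector step of § 2.6.2, the corrected dimensionless variables (called here X0C, X1C, X2C by analogy with the predictor names above; this naming is our own assumption) have to be transformed back to SI units for the observable computation. The back-transformation is again only a scalar multiplication:

X_phys(i_particle)  = X0C(i_particle)            % position in SI units
VX_phys(i_particle) = X1C(i_particle)/dt         % velocity: divide r_1 by tau
AX_phys(i_particle) = 2.0d0*X2C(i_particle)/dt2  % acceleration: 2*r_2/tau^2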

2.6.2 The corrector step

The corrector step acts on the predicted coordinates calculated as above; the scaled coordinates with τ = 1 must be used. The corrector step is then

$$\begin{pmatrix} r_0^c(t+\tau)\\ r_1^c(t+\tau)\\ r_2^c(t+\tau)\\ r_3^c(t+\tau)\\ r_4^c(t+\tau)\\ r_5^c(t+\tau) \end{pmatrix} =
\begin{pmatrix} r_0^p(t+\tau)\\ r_1^p(t+\tau)\\ r_2^p(t+\tau)\\ r_3^p(t+\tau)\\ r_4^p(t+\tau)\\ r_5^p(t+\tau) \end{pmatrix} +
\begin{pmatrix} c_0\\ c_1\\ c_2\\ c_3\\ c_4\\ c_5 \end{pmatrix} \Delta r, \tag{2.38}$$

with coefficients from Table 2.3; one uses different coefficients for different orders of the differential equation and of the approximation. The corresponding lower-order approximations are obtained by using the respective coefficients while retaining only the upper equations of (2.35) and (2.38). The BDF predictor–corrector formulae are available for both first- and second-order ODEs, which means that Newton's equation of motion can be implemented directly, without needing to be transformed into a first-order system as discussed in § 2.2.1. In (2.38), Δr is the difference between the predicted and corrected pth-order derivatives for a pth-order differential equation (with p = 1 or 2). So, for first-order ODEs, Δr is the difference between the predicted and corrected first derivatives,

$$\Delta r = r_1^c - r_1^p;$$

and for second-order ODEs, Δr is the difference between the predicted and corrected second derivatives (accelerations),

$$\Delta r = r_2^c - r_2^p, \tag{2.39}$$


Table 2.3 Corrector coefficients for Gear predictor–corrector methods, with the time-step scaled to τ = 1. Sets of coefficients are given for three- to six-value methods (with order of accuracy 2 to 5) and for first-order and second-order differential equations. For second-order differential equations (lower half of the table), in the case of velocity-independent forces (ÿ = f(y)) the coefficients 19/120 and 3/20 have to be used for the five- and six-value methods, respectively, while for velocity-dependent forces (ÿ = f(y, ẏ)) the coefficients 19/90 and 3/16 (in parentheses) have to be used.

Order 1 (first-order ODEs):
  3 values: c0 = 5/12,    c1 = 1, c2 = 1/2
  4 values: c0 = 3/8,     c1 = 1, c2 = 3/4,   c3 = 1/6
  5 values: c0 = 251/720, c1 = 1, c2 = 11/12, c3 = 1/3,   c4 = 1/24
  6 values: c0 = 95/288,  c1 = 1, c2 = 25/24, c3 = 35/72, c4 = 5/48, c5 = 1/120

Order 2 (second-order ODEs):
  3 values: c0 = 0,              c1 = 1,       c2 = 1
  4 values: c0 = 1/6,            c1 = 5/6,     c2 = 1, c3 = 1/3
  5 values: c0 = 19/120 (19/90), c1 = 3/4,     c2 = 1, c3 = 1/2,   c4 = 1/12
  6 values: c0 = 3/20 (3/16),    c1 = 251/360, c2 = 1, c3 = 11/18, c4 = 1/6, c5 = 1/60

i.e. between the predicted force and the actual force, scaled by the mass and the dimensionless time-step. The predicted forces/accelerations are obtained from the predictor step, and the corrected forces/accelerations are obtained from the force computation in the simulation. A common beginner's mistake in programming the corrector is to use the 'corrected force' r_2^c directly, without subtracting r_2^p.
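In code, the corrector amounts to one subtraction and one scaled vector addition per particle. The following MATLAB sketch (our own notation, for the five-value method of order 2 from Table 2.3 with velocity-independent forces) applies Equation (2.38) to a hypothetical predicted Nordsieck vector rp:

c=[19/120; 3/4; 1; 1/2; 1/12];   % coefficients from Table 2.3
tau=0.01;
rp=[1.0; 0; -0.5*tau^2; 0; 0];   % example predicted Nordsieck vector
a_new=-rp(1);                    % force evaluation at the predicted position
dr=0.5*tau^2*a_new-rp(3);        % Delta r = r_2^c - r_2^p, NOT r_2^c alone
rc=rp+c*dr;                      % corrected Nordsieck vector, Eq. (2.38)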

2.6.3 Multiple corrector steps

The corrector step can be applied repeatedly. In that case, for each iteration i a force computation is necessary, and for i ≥ 2 one has

$$\Delta r^{(i)} = r_2^{c,(i)} - r_2^{c,(i-1)},$$

with

$$\Delta r^{(1)} = r_2^c - r_2^p.$$

In principle, the corrector should provide only a 'small' modification of the predictor step. One can implement a comparison of the correction with a threshold, and enforce additional corrector steps until the change Δr^{(i)} falls below a certain threshold for all particles. In general, it is more efficient to choose a smaller time-step, so as to reduce the number of force evaluations per time unit. On the other hand, it can happen that in some rare events collisions occur at such high particle speeds that the simulation would become unstable with only a single corrector iteration. In such cases, adaptive multiple iterations (undertaken only if the deviation between successive corrector results is too large) can prevent the simulation from becoming unstable without unduly increasing the computer time for 'ordinary' collisions.


Figure 2.12 Program flow for Gear predictor–corrector algorithms: force computation with the predicted values, and observable computation with the corrected values.

2.6.4 Program flow

The predictor computes the new positions under the assumption that the forces (and torques) do not change. With these predicted positions (and orientations), the new particle outlines are computed. From the new particle outlines, one obtains the potentially new interaction partners, and the interactions (forces, torques) are computed next. Only the deviations from the old forces and torques are used in the corrector step. The resulting corrected variables are the ones which have to be used as the 'physical' variables in the observable computation; see Figure 2.12.
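As a minimal, self-contained sketch of this cycle (the set-up and names are our own, for a single degree of freedom with a(x) = −x rather than a full DEM force computation), one obtains:

% predictor-corrector cycle: 5-value Gear method of order 2
P=zeros(5);
for i=1:5, for j=i:5, P(i,j)=nchoosek(j-1,i-1); end, end  % matrix of Eq. (2.35)
c=[19/120; 3/4; 1; 1/2; 1/12];    % Table 2.3, order 2, 5 values
tau=0.01; nsteps=1000;
r=[1; 0; -0.5*tau^2; 0; 0];       % Nordsieck start: x=1, v=0, a=-x
for n=1:nsteps
  rp=P*r;                         % predictor: assume unchanged force
  a_new=-rp(1);                   % force evaluation at the predicted position
  dr=0.5*tau^2*a_new-rp(3);       % deviation from the predicted force
  r=rp+c*dr;                      % corrector; r now holds the 'physical' values
end
disp([r(1) cos(nsteps*tau)])      % corrected position vs exact solution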

2.6.5 Variable time-step and variable order

For the backward difference formulae, Gear developed the subroutine DIFSUB, which is able to change both the time-step and the order. Its functionality is available in an up-to-date form (see [40]) in the MATLAB function ode15s. However, one should bear in mind that for variable step-size methods, the error estimators work under the assumption that the variation of the forces is smooth, which is often not the case with the ad hoc force laws in DEM modeling. While the integrator parts of ode15s and DIFSUB may be immaculate, as in any other case where time-step adaptation is implemented, additional conditions are assumed which are often not fulfilled for DEM simulations.
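For reference, a typical call of ode15s looks as follows (shown here as a sketch with the damped-oscillator function harmos_damp from Exercise 2.5 and tolerance settings of our own choosing):

opts=odeset('RelTol',1e-6,'AbsTol',1e-9);
[t,y]=ode15s(@harmos_damp,[0 80],[0; 2],opts);
plot(t(1:end-1),diff(t))   % inspect how the solver adapts the time-step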

2.7 Other methods

2.7.1 Why not to use self-written or novel integrators

The exposition in this chapter is rather conservative, and the use of integrators which have been around for some time is recommended. When one tries to be clever in devising new methods, a lot can go wrong, sometimes in interesting and unforeseen ways, and not necessarily at the beginning of the integration process. For example, in Exercise 2.4 we show that the error of the midpoint rule for quadrature is only half that of the trapezoidal rule. Based on the trapezoidal quadrature rule, one can construct a time integrator, such as ode23t in MATLAB. The temptation is then irresistible to try out the midpoint rule in a time integrator as well; see Program 2.1.

The result of Program 2.1 is plotted in Figure 2.13: while initially the numerical result fits well with the analytical solution, at around t = 3 the numerical solution starts to develop oscillations around the true value; these oscillations then increase in amplitude in a periodic manner.


Program 2.1 Integrator based on the midpoint rule, which leads to the ghost solutions in Figure 2.13, even if started with the exact initial conditions for t1 = 0 and t2 = dt.

clear
format compact
x0=1
D=0.1
dt=0.1
omega0=1;
omega_d=sqrt(omega0^2-D^2);
y(1,2)=x0                                  % Position
y(1,1)=x0*(-D*exp(-D*0).*cos(omega_d*0)); % Velocity
% Euler-Step
n=1
y(n+1,2)=y(n,2)+dt*y(n,1);
y(n+1,1)=y(n,1)+dt*(-omega0^2*y(n,2)-2*D*y(n,1));
n=2
t_max=40
t(2)=dt
while (t(n)<t_max)
  % midpoint (two-step) rule: y(n+1) = y(n-1) + 2*dt*f(y(n))
  y(n+1,2)=y(n-1,2)+2*dt*y(n,1);
  y(n+1,1)=y(n-1,1)+2*dt*(-omega0^2*y(n,2)-2*D*y(n,1));
  n=n+1;
  t(n)=t(n-1)+dt;
end

[…] For β > 0, the method is obviously implicit. Therefore, Newmark methods are widely used in the field of finite element methods, where the problems require the solution of nonlinear systems anyway. For DEM simulations, on the other hand, the associated effort is prohibitive (and the assumption of a coupling parameter between time-steps like β is not very confidence-inspiring either).

Symplectic methods are usually unsuitable for DEM simulations, as most of these systems are dissipative; there are explicit dependencies on the velocity which cannot be taken into account in the formulation of many of these methods, including the Verlet class of methods discussed in § 2.4. In the few cases where energy conservation is needed (i.e. no dissipation and 'symmetric' interactions, as in Alder systems), the velocity-Verlet methods listed in Table 2.2 are preferable: in particular, the second-order scheme if 'approximate' energy conservation is sufficient, and Forest and Ruth's scheme if very high accuracy is desirable. We repeat here the warning that the quality of integrators should never be tested via conservation of the energy if the system is not energy-conserving anyway: the resulting time-steps are much smaller than those which would be chosen based on arguments about the accuracy of the positions and velocities.

Runge–Kutta methods which use the gradients f(y, t) at intermediate times between t and t + τ are not efficient for DEM simulations, where configuration changes may be unphysical for some steps (in particular those taken 'backward' in time) or may create additional overhead due to changes in neighborhood tables, for instance. This includes adaptive step-size methods with an accept–reject structure, as discussed in § 2.3.1. For very high precision, extrapolation methods may have their uses; but for most DEM problems, the necessary accuracy is much too low and the force laws are much too noisy.

2.10 Further reading

In this chapter we have given an overview of numerical methods that are convenient to implement for discrete element and rigid-body systems, so our focus is different from that of typical numerical analysis books. Nonetheless, we have introduced the conventional terminology, so that readers who might wish to explore the topic further can more easily see the connections with the numerical analysis literature.


Issues relating to machine accuracy, in the context of the IEEE standards, are discussed in [49]. A monograph on accuracy and stability in general is [50]. The confusion between the order of accuracy and the order of the error is elucidated in [6]. A history of the solution of ordinary differential equations is outlined in [51], along with a history of stiff problems and BDF methods. The standard references on integrators are written by E. Hairer and coauthors: [8] on non-stiff methods, [9] on stiff methods, and [11] on symplectic methods. For the application of the projection method to DEM-related mechanical systems, see [52]. The explicit construction of Taylor methods can be found in [53]. Extensive Butcher tableaus for explicit Runge–Kutta methods and tables of coefficients for Adams–Bashforth methods are given by Engeln-Müllges and Uhlig, in Fortran [54] and in C [55]; their Butcher tableaus are arranged a bit differently from Table 2.1 here. The ODE integrators of MATLAB are described in [40].

A very readable introduction to step-size control is [1]. More on event location can be found in [56]. Coefficients for symplectic and pseudo-symplectic integrators of velocity-Verlet type are collected in the appendix of [35] and can be copied and pasted from the PDF file. Even if the main part of that paper is concerned not with ODE problems but with the approximation of operator exponentials, the formal relationship between the two is obvious from Equations (2.36) and (2.37) in this chapter. A classical paper on DAEs is [46], a classical textbook is [57], and two newer texts are [58] and [59]. The time integration of partial differential equations by ODE solvers, the so-called 'method of lines', may be of interest in the DEM context when external fields have to be coupled with the particle simulation; see [60] and [61].

Internet searches have become a mixed blessing for the scientific community, the main problem being the abundance of irrelevant or misleading information. Nowadays, students can quickly find numerical algorithms on the internet which seem easy to implement. Unfortunately, simpler algorithms often come at the price of lesser stability and reduced accuracy, especially in the field of numerical solutions of differential equations. Another problem is that flawed search strategies yield useless results. Many students looking for material on the discrete element method enter in the search engine

Discrete element method

which will yield a list of documents containing the words 'Discrete', 'element' or 'method' anywhere in the text. To obtain more meaningful results, one should enter the exact word sequence within quotation marks:

"Discrete element method"

Sometimes it helps to specify alternative terms, for example:

"Distinct element method"

To search for information on numerical integrators, it is also helpful to know the alternative names of the methods. For example, instead of 'Runge–Kutta method', 'one-step method' could be used; similarly, rather than 'Predictor corrector' one could look for 'Adams–Bashforth'. To exclude unwanted documents, use a minus sign. For example:

"Discrete element method" -"rock mechanics"


Alternatively, add other distinctive expressions, such as

"Discrete element method" "polygons"

until one has narrowed the search down to a point where the retrieved documents are most relevant to one's needs.

Exercises

If there is a large number of similar programs in a directory, which will be the case if related tasks are programmed as in the following exercises, it is advisable to choose filenames so that the files for the ODEs and their main programs (or drivers) can easily be associated with one another. For example, for a function harmos.m, it will be convenient to call the driving program dr_harmos.m. Most of the following problems are representative of the behavior of particle contacts with various interactions. While realistic problems will have many such contacts, it is instructive to study their behavior (with respect to noise, damping, etc.) first in isolated systems with single contacts.

The following programs are only minimal examples. They can be improved considerably by, for example, the addition of comments. Further, defining the same parameters for the drivers and the ODEs (e.g. using MATLAB's global variables, so that they can be set from the driver instead of in the ODE function) will make the programs considerably more user-friendly. Unless mentioned otherwise, the default accuracy will be sufficient.

2.1 Floating point numbers
a) Use MATLAB's eps function for the so-called machine epsilon to find out how the 'gaps' between the actual floating point numbers change: eps(1), eps(2), eps(3), eps(4), eps(5), . . . , as schematically illustrated in Figure 2.1. (Note that eps without an argument is the same as eps(1).) In a programming language that does not have a function for the machine epsilon, the next-largest number greater than 1 can be computed with the following algorithm:

clear
format compact
% compute machine-epsilon
myeps=1.
myepsp1=myeps+1.
while (myepsp1>1)
  myeps=0.5*myeps;
  myepsp1=1+myeps;
end
myeps

b) Program your 'own' set of floating point numbers as the product of a mantissa with 3–4 bits, according to Equation (2.1), and an exponent part with 2–3 bits, according to Equation (2.2). The easiest program structure uses several nested loops from 0 to 1. Be aware that some floating point numbers with different mantissas and exponents may actually represent the same floating point value. Convince yourself that the


density of these floating point numbers is higher around zero, and the gaps between these numbers are largest between the largest numbers.
c) Find out which MATLAB functions profit from pivoting (i.e. the reordering of operations so that division by small numbers can be avoided) by typing

lookfor pivoting -all

d) Use MATLAB's realmin and realmax to find the smallest and largest numbers that can be represented; then divide and multiply them to see what happens in the cases of underflow and overflow, respectively.
e) Depending on the version and platform, even with format long, MATLAB and OCTAVE may not display all digits.

GNU Octave, version 3.6.3
Copyright (C) 2012 John W. Eaton and others.
octave:1> format long
octave:2> pi
ans = 3.14159265358979
octave:3> pi-3.14159265358979
ans = 3.10862446895044e-15

On the other hand:

< M A T L A B (R) >
Copyright 1984-2011 The MathWorks, Inc.
R2011b (7.13.0.564) 64-bit (maci64)
August 13, 2011
>> pi
ans = 3.141592653589793
>> pi-3.141592653589793
ans = 0

2.2 Reducing catastrophic cancellation in summations
Generate a sequence of random numbers with randomly assigned plus and minus signs. Round them to single precision (using single in MATLAB) and add them up in single precision (by calling single after each addition):
a) in random order;
b) in order sorted from smallest to largest;
c) in order sorted from largest to smallest;
d) in sorted order, by alternately adding a positive number to a negative number.
Compare the accuracy of the sums with the result in the original double precision. Which summation method is the most accurate?

2.3 Inverses
MATLAB has several commands for producing inverses of matrices that are well known to be difficult to invert (which means that the inverse matrix is computed


very inaccurately). Such a set of difficult-to-invert matrices are the Hilbert matrices hilb(n), for which the exact inverses invhilb are also available in MATLAB.
a) Compute the deviation between the computed and the exact inverses,

inv(hilb(n))-invhilb(n)

for n from 1 to 10. In the same way, compute the absolute error

norm(inv(hilb(n))-invhilb(n))

and the relative error

norm(inv(hilb(n))-invhilb(n))/norm(invhilb(n))

b) Convince yourself that for n=8, the residual inv(hilb(8))*hilb(8)-eye(8) is relatively close to the zero matrix zeros(8), while the deviation between inv(hilb(8)) and invhilb(8) is considerable. This shows that the residual is not a meaningful parameter for determining the accuracy of an algorithm, especially with regard to matrix inversion.
c) Instead of looking at the residual, an algebraically more meaningful way to assess the accuracy of a matrix inversion is to check whether the eigenvalues of a matrix A are equal to the reciprocal eigenvalues of its inverse matrix A⁻¹. Compare the eigenvalues of hilb(n) with the reciprocals 1./eig(inv(hilb(n))) as well as 1./eig(invhilb(n)).

2.4 Romberg quadrature and other methods
A common experience when doing calculus by hand is that differentiation is easier to perform than integration. If a function f(x) is given explicitly, then explicit computation of the derivative is always possible (provided the derivative exists). On the other hand, even for simple functions such as exp(−x²), one cannot express the integral in closed form in terms of elementary functions. Instead, the integral can be used to define a new type of function, the so-called error function

$$\operatorname{erf}(a) = \frac{2}{\sqrt{\pi}} \int_0^a \exp(-x^2)\, dx. \tag{2.50}$$

In cases where the integrand function is readily available but its integral is not, quadrature (the numerical evaluation of the integral) becomes attractive. In fact, numerically, integration (quadrature) can be performed with much smaller error than differentiation. The error in the numerical evaluation of the integral in (2.50),

$$\int_0^a \exp(-x^2)\, dx, \tag{2.51}$$

with a suitable choice of the upper integration limit (we will use a = 1), can easily be computed by comparing the quadrature result with the following value obtained from MATLAB's erf function:

$$\frac{\sqrt{\pi}}{2}\operatorname{erf}(1) = 0.746824132812427. \tag{2.52}$$



Figure 2.17 Romberg methods for numerical quadrature of ∫₀¹ exp(−x²) dx with equidistant intervals: (a) left and right rectangle methods; (b) midpoint method, where in each subinterval one part of the graph is below and the other part above the midpoint value, which leads to error compensation; (c) trapezoidal method, which underestimates the integral value over the integration interval.

(Integration limits a ≤ 1 would also be suitable, but when one plots the graph of exp(−x²) one sees that, due to the changing curvature, at some values error compensation may come into play, which would make the discretization error more obscure.) Computing the exact result ∫₀¹ exp(−x²) dx = (√π/2) erf(1) is easy, which makes this a suitable test case for demonstrating the peculiarities of quadrature methods and the error order. The methods explored here that have integration points a constant distance apart are called Romberg quadrature methods. Gauss quadrature methods (e.g. MATLAB's quadgk), which have integration points at variable distances (see part f) of this exercise), will always be preferable for practical problems, as they can achieve much higher accuracy with far fewer function evaluations than the Romberg methods with equidistant integration points. In parts b) to d), the integration points have to be chosen carefully, as an error of half an interval can lead to errors so large that the order of accuracy becomes much lower than for the correctly implemented method.

a) Left and right Riemann sum methods
The Riemann integral is derived by approximating the area under the integrand function by rectangles and then taking the limit as the width of the rectangles goes to zero. Using such a 'Riemann sum' with constant interval width, approximate the integral in (2.51) with upper integration limit a = 1 by a hundred rectangles. Use both the 'left rectangle' method I_left with points at 0, 0.01, . . . , 0.99 and the 'right rectangle' method I_right with points at 0.01, 0.02, . . . , 1.0; see Figure 2.17(a). The two methods should have the same accuracy, i.e. the same number of valid digits as determined from a comparison with sqrt(pi)/2*erf(1)=0.746824132812427 (in MATLAB notation).

b) Midpoint method
Instead of the left or right function values, the function value at the middle of each of the 100 intervals (see Figure 2.17(b)), i.e. at the points 0.005, 0.015, . . . , 0.995, can be used to obtain higher accuracy. Convince yourself that with this method I_mid, the number of valid digits more than doubles.

c) Trapezoidal method
Rather than approximating the integral by rectangles, one can use trapezoids, which are quadrilaterals with one pair of parallel sides; see Figure 2.17(c). The trapezoidal rule I_tra essentially averages the function values at the left and right endpoints of each


interval [0, 0.01], [0.01, 0.02], . . . , [0.99, 1.0]. Check for yourself that for functions with non-vanishing curvature, the results from the trapezoidal rule differ from those of the midpoint rule. The trapezoidal method can be implemented with 101 function evaluations, but the number of valid digits obtained is slightly worse than for the midpoint rule, because of error compensation. (Over the interval [0, 1], the top edges of the trapezoids always lie below the graph of the function.)

d) Composite Simpson method
Upon analyzing the error of the midpoint and trapezoidal methods, one finds (see [62, p. 86]) that at point x_i, the respective errors of the integrals are

$$\epsilon_{midpoint} = \frac{1}{24} h^3 f''(x_i), \tag{2.53}$$
$$\epsilon_{trapezoidal} = \frac{1}{12} h^3 f''(x_i), \tag{2.54}$$

where f''(x_i) is the curvature of the integrand function at point x_i. By looking at the graph of f(x) = exp(−x²), one sees that over the interval [0, 1] the trapezoidal rule underestimates the integral, while the midpoint rule overestimates it. A higher-order method, the composite² Simpson rule I_CS, is a weighted average of the trapezoidal and midpoint rules; to balance the errors so that they compensate each other, the method with higher accuracy is given more weight:

$$I_{CS} = \frac{2}{3} I_{mid} + \frac{1}{3} I_{tra}. \tag{2.55}$$

By taking such a weighted average, the lowest-order error terms (2.53) and (2.54) of the midpoint and trapezoidal rules can be eliminated (they cancel exactly). The 'ordinary' Simpson method, which uses constant intervals and a parabolic curve to approximate the function over each interval, is slightly less accurate. An adaptive Simpson method with successive refinement of the integration intervals is implemented in MATLAB and will be used below in part f).

e) Cost–performance diagram
Vary the number of steps and draw a cost–performance diagram, i.e. plot the number of function evaluations on the abscissa and the deviation from the exact value on the ordinate of a double-logarithmic plot. For the composite Simpson rule, you should be able to reach the number of steps beyond which the rounding error does not allow any further improvement of the result, i.e. you obtain 14 or 15 valid digits with fewer than 10 000 steps; if not, the integration points have been chosen wrongly. For the left and right rectangle methods, the midpoint rule and the trapezoidal rule, it is better to extrapolate the necessary number of steps.

f) Gauss integration
The adaptive quadrature methods in MATLAB, namely the Gauss method quadgk and the Simpson method quad, require the integrand as input. A suitable m-file for this purpose is

² In this case 'composite' means that the method is composed of two different rules; sometimes 'composite' refers to the quadrature of several adjoining intervals.


function fout=expmx2(xin)
% evaluate the function exp(-x*x)
disp(length(xin))   % display number of points for integration
fout=exp(-xin.*xin);
return

which is programmed with elementwise function evaluations and also tells us the number of function evaluations performed. When expmx2 is called several times by an adaptive method, the numbers of points are displayed multiple times; to obtain the actual number of function calls, these numbers must all be added up. Compare the number of function evaluations from part e) with the corresponding numbers for MATLAB's adaptive methods.
The adaptive Simpson rule in MATLAB is called by

quad(@expmx2,0,1,1e-12)

or

quad(@expmx2,0,1,1e-13)

to reach, respectively, 14 or 15 digits of precision for the value in (2.52) with 200 to 500 function evaluations. The Gauss–Lobatto method can be called by

quadl(@expmx2,0,1,1e-10)

to reach the full accuracy of 14 to 15 digits with fewer than 50 function evaluations. This demonstrates the superiority of methods with an adaptive choice of integration points.

2.5 Harmonic oscillator with viscous damping
Use MATLAB®'s ODE solvers ode23, ode45, etc. in a driver like the one shown below:

clear all
format compact
tspan=[0 80];
initialcondition=[0   % v(0)
                  2]; % x(0)
[t,y] = ode23('harmos_damp',tspan,initialcondition);
%[t,y] = ode45('harmos_damp',tspan,initialcondition);
subplot(2,1,1)
plot(t,y(:,2),'+')           % plot the position
subplot(2,1,2)
plot(t(1:end-1),diff(t),'+') % plot the time-step
return


Use the driver on the following MATLAB® function for a viscously damped linear oscillator:

function [dydt]=harmos_damp(t,y);
%harmonic oscillator
%y(1)=v, y(2)=x
k=1;
gamma=0.4;
dydt(1,1)=-k*y(2)-gamma*y(1);
dydt(2,1)=y(1);
return;

a) Observe how for large times (greater than 40), the step-size τi (which can be obtained from the vector t in MATLAB® as diff(t)) increases as the amplitude becomes smaller and smaller. Only the last time-step is smaller, so that the computation terminates exactly at the value of the final time.
b) Look at the plot of the time-steps for gamma=0.01. Convince yourself that the time-step is reduced not only in regions where the curvature (the variation in the rate of change) of the position is large (i.e. at the extrema) but also in regions where the curvature of the velocity is large.
c) Observe how for large damping, for which the solution is exponential decay, the 'stiff' solvers ode23s and ode15s can use a much larger time-step than the 'non-stiff' solvers ode23 and ode45.

2.6 Resonance
Resonance is the reinforcement of oscillations due to external influences. While in previous exercises the ODE did not have explicit time dependence, so that the input parameter t was not used, this exercise provides an example program where not only y but also t is used in the function. With an external sinusoidal force of amplitude f0, the behavior of the harmonic oscillator under resonance can be investigated using this program. For finite damping constant gamma, either beats (periodic increases and decreases in the amplitude; see § 5.3.1) occur in the solution or, if f0 is large enough, the solution amplitude follows the external force f0*sin(t). For vanishing damping gamma=0, the solution is a sine wave with linear increase of the amplitude. In the case of beats, you may have to increase the final time to see the whole evolution of the solution.

function [dydt]=harmos_resonance(t,y);
%harmonic oscillator with external sinusoidal forcing
%y(1)=v, y(2)=x
k=1;
gamma=0.1;
f0=0.1;
dydt(1,1)=-k*y(2)-gamma*y(1)+f0*sin(t);
dydt(2,1)=y(1);
return;
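To run this function, the driver from Exercise 2.5 can be reused with the function name replaced; a minimal sketch (the time span and initial conditions are our choices):

tspan=[0 200];                % long enough to see several beat periods
[t,y]=ode45('harmos_resonance',tspan,[0; 2]);
plot(t,y(:,2))                % the position over time shows the beats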


2.7 Inconsistent equations: the harmonic oscillator with dry friction
The step-size control of standard solvers assumes that the right-hand side of the ODE system is smooth, so that reducing the time-step leads to a reduced change in the solution and hence to better accuracy (smaller error). This is not the case if there is a jump built into the right-hand side. In the following piece of code, the viscous damping of Exercise 2.5 is replaced with damping due to Coulomb friction: the dependence on the velocity, -y(1), is replaced by a dependence on the sign of the velocity, -sign(y(1)). Run the program up to t = 18. Be aware that for too-long integration times at near-zero amplitudes, the numerical solution will take a long time due to the reduction of the time-step. When the velocity amplitudes are close to zero, there is a jump in the right-hand side. This non-smoothness forces small time-steps for some solvers, while for other solvers the time-step is reduced to practically zero and the integration fails altogether. The physically meaningful regularization of this problem is discussed in § 7.1.1.

function [dydt]=harmos_dry_fric(t,y);
%harmonic oscillator with dry (Coulomb) friction
%y(1)=v, y(2)=x
k=1;
mu=0.2;
dydt(1,1)=-k*y(2)-mu*sign(y(1));
dydt(2,1)=y(1);
return;

a) Observe how the time-step is reduced (and the computing time increased) considerably for the explicit solvers ode23, ode45 and ode113 as the amplitude approaches zero. This is the range where, on physical grounds, static friction with absolute value smaller than mu should set in, which numerically is mimicked very badly by alternating the values ±mu.
b) Observe how the stiff solvers ode23t and ode15s reduce the time-step to numerical zero (approximately 10⁻¹⁵) and then stop the integration with a warning

Unable to meet integration tolerances without reducing the step size below the smallest value allowed

Nevertheless, the program outputs the result up to the time at which the time-step collapsed to zero (t ≈ 16).
c) Observe how the stiff solver ode23s computes the solution with time-steps similar to those of the corresponding explicit solver (ode23). In this case the stiff solver has no advantage, due to the inconsistent alternation of ±mu. The consistent way of representing static friction in the ODE will be introduced in § 3.3.
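The behavior described in a)–c) can be reproduced with the driver from Exercise 2.5; a minimal sketch (the initial conditions are our choices):

tspan=[0 18];
[t,y]=ode45('harmos_dry_fric',tspan,[0; 2]);
semilogy(t(1:end-1),diff(t),'+')  % the time-step collapses as the amplitude decays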

2.8 Van der Pol equation
An interesting feature of the van der Pol oscillator

ÿ = −y + μ(1 − y²)ẏ   (2.56)


is that for μ = 0 it is equivalent to the harmonic oscillator, and as μ increases the solutions become more asymmetric and the equation eventually becomes stiff. Equation (2.56) can be rewritten as two coupled first-order equations:

d/dt y⁽¹⁾ = y⁽²⁾,
d/dt y⁽²⁾ = μ(1 − (y⁽¹⁾)²) y⁽²⁾ − y⁽¹⁾.

This first-order system should be used with the MATLAB® ODE solvers.
a) Take μ to be 1, 10, 100. Observe how the period increases as μ increases. Choose the time-span accordingly (between 20 and 4μ), and find the range of μ for which the equation becomes stiff (i.e. the time-step decreases drastically for the non-stiff solvers ode23, ode45 etc., but not for stiff solvers such as ode23s and ode15s).
b) Plot the velocity, and verify the statement that the time-step decreases as the curvature in the graph of the variables over time increases.
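A corresponding m-file in the style of the previous exercises might look as follows (a sketch; the function name and the value of mu are our choices):

function [dydt]=vanderpol(t,y);
%van der Pol oscillator, Equation (2.56)
%y(1)=position, y(2)=velocity, as in the first-order system above
mu=10;
dydt(1,1)=y(2);
dydt(2,1)=mu*(1-y(1)^2)*y(2)-y(1);
return;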

2.9 Non-symplectic solvers with symplectic systems
Observe that the energy does not remain constant for the harmonic oscillator in Exercise 2.5 with gamma=0 (no damping) when the time is increased up to hundreds, thousands, tens of thousands . . . of periods. Figure out for which solvers the energy increases and for which solvers the energy decreases. Since the growth or decay of the energy varies depending on the type of solver, its order and the time-step, it is not possible to 'fudge' energy conservation by modifying gamma. (For some solvers that would mean negative, unphysical gamma values!)
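The energy drift can be monitored with a few lines; a minimal sketch (assuming gamma has been set to 0 in harmos_damp, and with k = 1 and mass 1):

[t,y]=ode45('harmos_damp',[0 2000],[0; 2]);
E=0.5*y(:,1).^2+0.5*y(:,2).^2;   % kinetic plus potential energy for k=1
plot(t,E)                        % a drift up or down reveals the solver's bias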

2.10 Velocity-Verlet methods and matrix exponentials
Convince yourself that the coefficients in Table 2.2 on page 87 can be used to approximate matrix exponentials

E = exp(θ(A + B))   (2.57)

by the symplectic approximations

S = ∏_{i=1}^{n} exp(τγi A) exp(τηi B),   nτ = θ.   (2.58)

The corresponding operator exponentials form the basis for deriving symplectic and pseudo-symplectic integrators. Matrix exponentials are the mathematical expressions for the time evolution of mechanical systems, where A and B represent the kinetic and potential energies, respectively. Set up symmetric matrices by using code such as


l=8
A=rand(l)
A=A+A'
B=rand(l)
B=B+B'

Convince yourself that the 'numerically exact' matrix exponential (2.57) can be approximated by the decomposition (2.58). The matrix exponential (2.57) can be computed using the MATLAB® command expm(theta*(A+B)) (θ can be a number larger than 1, but not too large, as the exponential grows very fast). Note that the matrix exponentials for A and B must also be computed with the numerical matrix exponential function expm, not with the elementwise scalar exponential exp. A sketch of the comparison is given after this exercise.
a) Compare the relative error ‖E − S‖/‖S‖ for various values of n and τ. Here ‖·‖ is the matrix norm, which can be computed with norm in MATLAB®.
b) Whether the approximated energy is above or below the exact value depends on the commutators [33]

C(A, B) = [A, B] = AB − BA,

the eigenvalues of C(A, B) and, depending on the order of the approximation, also the higher-order commutators

[A, [A, B]], [B, [A, B]], [[A, B], A], [[A, B], B], . . . ,

which can be evaluated for the above matrices directly.
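The comparison of part a) can be sketched as follows (our own code; the second-order coefficients γ = (1/2, 1/2), η = (1, 0) of a Strang-type velocity-Verlet splitting are assumed here, rather than taken from Table 2.2):

l=8;
A=rand(l); A=A+A';               % symmetric matrices, as above
B=rand(l); B=B+B';
theta=0.5; n=100; tau=theta/n;   % n steps of size tau, with n*tau=theta
E=expm(theta*(A+B));             % 'numerically exact' exponential (2.57)
step=expm(tau/2*A)*expm(tau*B)*expm(tau/2*A); % one second-order step
S=step^n;                        % composition over n steps
relerr=norm(E-S)/norm(S)         % relative error of part a)

Varying n (and hence tau) should show the second-order decrease of the relative error until rounding errors dominate.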


2.11 A bouncing ball program
It is always safer to develop a program by successive modifications than by writing 'from scratch'. In this programming exercise, the simulation of a bouncing ball is obtained from modifications of the program for the harmonic oscillator.
a) Modify the program for the harmonic oscillator

function [dydt]=harmos(t,y);
%harmonic oscillator with spring constant k
%y(1)=v, y(2)=x
k=1;
dydt(1,1)=-k*y(2);
dydt(2,1)=y(1);
return;

with spring constant k by introducing gravitation:

function [dydt]=harmos_gravity(t,y);
%harmonic oscillator with spring constant k,
% mass 1 and gravity g
%y(1)=v, y(2)=x
k=1;
g=9.81;
dydt(1,1)=-k*y(2)-g;
dydt(2,1)=y(1);
return;

Verify that this does not change the oscillation frequency (plot the result using MATLAB®'s grid command), in accordance with the theory of nonhomogeneous linear differential equations; see, for example, [63].
b) To model the interaction between a floor at z = 0 and a particle of radius r = 1, assume that there is only an interaction force if the center of the particle at y(2) is below 1, or else only gravity acts. The corresponding m-file looks like

function [dydt]=bounce(t,y);
% particle with radius r, mass 1, gravity g
% and spring constant k bouncing on a floor
%y(1)=v, y(2)=x
k=1;
g=9.81;
r=1;
if (y(2)<r)
  dydt(1,1)=-k*(y(2)-r)-g; % contact: elastic repulsion from the floor plus gravity
else
  dydt(1,1)=-g;            % free flight: only gravity acts
end
dydt(2,1)=y(1);
return;

2.12 Accepted and rejected time-steps
When an adaptive solver evaluates the right-hand side at some time t, it is not yet clear whether the corresponding step will be accepted; only for later evaluation times t′ > t is it clear that a step-size has actually been accepted (or not). For explicit integrators, one can reconstruct the values y of accepted step-sizes from the fact that the times t1, t2, t3, . . . for the evaluation have to be monotonic, i.e. t1 ≤ t2 ≤ t3 ≤ · · · . Take a program which can be guaranteed to change the time-step, such as the bouncing ball program from Exercise 2.11. Output the time and variables of every time-step into a file. Write a function which reads in the data and eliminates the redundant data from the rejected time-steps, based on the fact that a new time t′ computed after an old time t with t′ < t indicates that the step which reached t was rejected; a sketch of such a filter is given below. Such a program is convenient for comparing the function evaluations of adaptive solvers with those of solvers that use a constant step-size, and for tracing back the development of instabilities as well as the behavior of the time-step adaptation part.
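One possible filter in MATLAB® (a sketch with our own function name, assuming the logged data are stored row-wise as [t y(1) y(2) ...]):

function out=accepted_steps(data)
% keep only the rows that survive as a monotonically increasing
% time sequence; rows invalidated by a later restart at a smaller
% time stem from rejected trial steps and are dropped
n=0;
out=zeros(size(data));
for i=1:size(data,1)
  while n>0 && out(n,1)>=data(i,1)
    n=n-1;                 % discard rows from rejected trial steps
  end
  n=n+1;
  out(n,:)=data(i,:);
end
out=out(1:n,:);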


2.13 A mathematically inconsistent model: bouncing ball with naive damping
We now introduce velocity-dependent damping into the bouncing ball program via the constant gamma. Be aware that the following program has a jump in the force both where the particle comes into contact with the wall and where it separates, which is where the relative velocities are largest. This yields unpredictable results and behavior, depending on the initial conditions and the solver, i.e. on the time within a time-step at which the contact closes or separates.

function [dydt]=bounce_damping_bad1(t,y);
% particle with radius r, mass 1, gravity g
% and spring constant k bouncing on a floor,
% with velocity dependent damping
% CAVEAT: This ODE is mathematically inconsistent
%y(1)=velocity, y(2)=position
k=1e4;
gamma=0.1*sqrt(1e4);
g=9.81;
r=1;
if (y(2)<r)
  dydt(1,1)=-k*(y(2)-r)-gamma*y(1)-g; % contact: elastic force, damping and gravity
else
  dydt(1,1)=-g;                       % free flight
end
dydt(2,1)=y(1);
return;

A block on a slope (Figure 3.2) slides downhill if tan(α) > μ. For tan(α) = μ, theoretically either sliding or sticking is possible, but in simulations, the noise due to the initial positioning and the time integration will usually lead to a downhill slide at constant velocity. When tan(α) < μ, in naive implementations of particle simulations the block will also slide downhill if the friction force is computed as −μFn sgn(v), because the friction will overcompensate for the downhill force, and this causes sliding in the uphill direction; then, in the next step, the friction will act downhill and pull the block downward.

Figure 3.2 A block on a slope, showing the normal force Fn , the downhill force Fdh , and the weight (gravitational force) mg, in the normal–tangential coordinate system, which is depicted by the arrows on the right. The weight is drawn as acting on the center of the block, and the friction force Ff is drawn as acting on the midpoint of the line of contact.

In the following step, the friction will work in the uphill direction again, and so on. The net motion will be downhill, while the actual dynamics will depend on the time integrator. The exact implementation of static friction so that the block stays in place, based on sliding velocity and external forces, will be discussed in § 3.3.

The older literature on friction from the 19th century, by Morin [2], Conti [3] and later Galton [4] (all originally military engineers), helped to cement the opinion that static friction coefficients should be larger than those for dynamic friction. However, there is also manipulated 'evidence', such as lecture room experiments where a block was pulled with a force meter, which dutifully showed a larger reading when the block started to move than afterwards—this was of course due to the force necessary for the acceleration, and had nothing to do with the static coefficient of friction. (The processes which influence static friction will be discussed further below.) Already in the early 20th century, Klein and Sommerfeld considered Morin's experiments unreliable 'due to the circumstances such experiments depend on' [1].

In contrast to many other material parameters, the friction coefficient is of the order of 1, independent of the material strength; see Figure 3.3. 'Very large' coefficients (greater than 1) can be found for materials with both very low (e.g. polyurethane) and very high Young's modulus (e.g. platinum). Home-made experiments may turn out differently from tabulated data on friction due to the experimental conditions—for example, laboratory experiments involving metallic surfaces are performed in a vacuum or in an atmosphere of inert gases to avoid oxidization of the contacts.

As the particle size decreases towards the atomic scale, it is the character of the friction that determines whether a particle can be considered 'granular' or whether it is a molecule. For polymer surfaces on the nano-scale, the velocity dependence is linear [5] for very small velocities, so one has to conclude that such macromolecules (at molecular weights ranging from 10³ to 80 × 10³ g/mol) do not behave as 'solid grains'. They are heavier than some nano-powders, which behave reasonably like granular materials with Coulomb friction, allowing the construction of heaps. On the other hand, for solids on the nano-scale, Coulomb friction, with its characteristic jump at v = 0 and its proportionality to the normal force in accordance with Equation (3.1), has been measured using atomic force microscopy [6].
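The spurious alternation for the naive friction force −μFn sgn(v) can be reproduced in a few lines; a minimal sketch in the style of the exercises of Chapter 2 (the function name and parameter values are our choices, with tan(α) < μ):

function [dydt]=block_slope_naive(t,y);
% block on a slope with naive Coulomb friction -mu*Fn*sign(v)
%y(1)=v (tangential velocity), y(2)=x (position along the slope)
g=9.81; alpha=0.1; mu=0.3;   % tan(alpha) < mu: the block should stick
dydt(1,1)=g*sin(alpha)-mu*g*cos(alpha)*sign(y(1));
dydt(2,1)=y(1);
return;

Integrated from v(0) = 0, the computed velocity alternates around zero while the block creeps downhill, with details depending on the integrator, as described above.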



Figure 3.3 Coefficients of sliding friction for contacts between the same materials, demonstrating their independence of Young's modulus. The materials are: acrylonitrile butadiene styrene (ABS), polymethylacrylate (ACR), aluminum (Al), gold (Au), boron nitride (BN), beeswax (BW), brick (Br), brass (Bra), copper (Cu), diamond (Di), epoxy (Ep), fluorinated ethylene propylene (FEP), iron (Fe), monocrystalline graphite (GR), glass (Gl), granite (Granite), iridium (Ir), dry limestone (Lim), mica (Mi), polyamide (Nylon), oak along grain (OA), polycarbonate (PC), polyethylene (PE), polyimide (PI), polypropylene (PP), polystyrene (PS), teflon (PTFE), polyurethane (PU), paraffin wax (PW), phenol-formaldehyde (Phen), platinum (Pt), rhodium (Rh), rubber (Rubber), silicone (SIL), steel (Ste), tungsten carbide (TC) and tungsten (W).

3.1.2 Static and dynamic friction coefficients

In the 19th century, the view became established that the dynamic friction coefficient μd for dry surfaces is smaller than the static friction coefficient μs. In the second half of the 20th century, the possibility of better controlling experimental conditions such as air humidity led to the conclusion that for contacts which do not change chemically or mechanically over time, the dynamic friction coefficient is the same as the static friction coefficient. In a recent table [7] giving two coefficients of friction, with μd < μs, the reference sources for the static friction coefficients differ from those for the dynamic friction coefficients—in other words, the values of the coefficients originate from different laboratories, using different samples and taking measurements on different machines. While this table upholds the idea that static coefficients should be larger than dynamic ones, the outcome is a result of the choice of publication sources, not of physical necessity: even for the same material, friction coefficients may vary due to experimental conditions (e.g. vacuum or controlled air humidity, use of outgassed or unprepared samples, etc.). Newer tables in tribology (see [8–10]) give only one coefficient for static and dynamic friction, except in the case of polymers,¹ where the static coefficient of friction is smaller than the dynamic coefficient of friction, μs < μd (see [8, p. 547ff])!

¹ Polymers are in many respects different from other materials (crystalline or polycrystalline solids); for example, a rubber band will contract when it is warmed up [11, p. 39], in contrast to most other solids, which will expand; so it is not beyond imagination that the friction properties are also somehow exotic.



Figure 3.4 (a) Dependence of static friction on the time of contact. (b) Dependence of dynamic friction on the velocity.

The coefficient of friction increases with the time of contact (except for measurements in vacuum), albeit only weakly; the relationship for the friction force is given by Rabinowicz [12, p. 72] as

Fs(t) = f0 + k t^(1/10)   (3.3)

(see Figure 3.4(a)), and by Popov [13, p. 137] as

F̃s(t) = a + b ln(t + t0).   (3.4)

Experiments in rock mechanics show that when the air humidity is controlled, the coefficient governing the increase (k in Equation (3.3), b in Equation (3.4)) is proportional to the air humidity [14]. The same dependence on air humidity is exhibited by the angle of repose for granular materials in a rotated drum [15]: the longer the waiting time and the higher the air humidity, the higher is the angle of repose which can be obtained. While classical mechanics treats frictional contacts as inert, adsorption (usually of water molecules) and various chemical reactions take place on the surface, which change the nature of the contact. Nevertheless, for DEM simulations of many particles, the disorder in the normal forces usually guarantees variation in the tangential forces, so that it is not necessary to introduce an additional variation of the friction coefficient in time to obtain a distribution of inter-particle tangential forces.

As with the time dependence, there is a similar weak dependence of the friction force on the velocity. Different functional dependencies have been proposed: Rabinowicz [12, p. 72] gives

Fd(v) = c v^(−1/10)   (3.5)

with parameter c (see Figure 3.4(b)), while Dunaevsky [8, p. 448] gives

F̃d(v) = (ã + b̃v) exp(c̃v) + d̃,   (3.6)


with parameters ã, b̃, c̃ and d̃. Note that the formulae for the weak dependence on time t and velocity v, Equations (3.3)–(3.6), have different functional forms and are expressed for the friction force, not for the friction coefficient. We also point out that there is a strong influence of the surface chemistry. Conventional material surfaces have a relatively complicated layered structure; for metals, over the original metal substrate there may be a work-hardened layer (e.g. obtained by forging in the case of iron), then an oxide layer, and above that adsorbed gas (usually water molecules) and contaminants (e.g. skin fat if the object has been touched with bare hands). For longer durations of sliding contact, the layers may experience abrasion without actual macroscopic wear becoming visible, and this may account for the large discrepancies in the data on friction coefficients in the literature. The distribution of the friction coefficient due to such material inhomogeneity is usually larger than the range which could be reached by a velocity dependence like Equation (3.5) or (3.6) for the velocities obtainable in discrete element simulations. As a side note, to reduce the influence of the aforementioned layering due to laboratory conditions, scratching and mechanical handling, some experimental groups in powder mechanics throw glass beads away after using them only once [16].

As the velocity dependence is logarithmic, only a few materials allow measurements over a large range of velocities without exhibiting wear at the contact surface, due to too-large sliding velocities or to deformations over too-long times for too-small sliding velocities. Rabinowicz [12] gives an example of titanium on titanium where the coefficient of friction varies from about 0.4 to about 0.6 as the velocity varies from 10⁻⁷ mm/s to 10³ mm/s.

For the sake of completeness, we also mention friction between solid bodies and Newtonian fluids (fluids for which the strain rates, i.e. the flow velocities, are proportional to the stresses; an example of a non-Newtonian fluid would be ketchup). For low flow velocities, viscosity dominates, so the friction force will be proportional to the velocity, as modeled by the damped harmonic oscillator in Equation (1.95). For large Reynolds numbers (large flow velocities), inertia dominates, so the friction force ('drag') will be proportional to the square of the velocity. Where fluid and solid friction occur simultaneously, as in the case of lubricated friction, hybrid friction laws are used, such as the Stribeck friction introduced in § 1.6.2.

3.1.3 Apparent and actual contact area

Friction is independent of the apparent contact area but depends on the normal force Fn. With the same material, for larger normal forces, larger areas are in contact (see Figure 3.5), so that more surface electrons contribute to the adhesion. For very large compression or very soft materials (e.g. copper on copper, or lead on lead), when the surfaces deform plastically, higher friction coefficients can be measured: for copper, the friction coefficient can vary from 0.5 to over 1.5 (see [12]) as the normal force is increased. However, these are effects of the adhesion, as can be seen from comparison with the load dependence of steel on aluminium, where the friction coefficient is practically unchanged: there the adhesion is poor because the electron affinity of the two materials is so low that they cannot even be alloyed. The reason that the friction coefficient is of the order of 1 is that shear and normal stresses are related via the bulk shear strength [12, p. 74]; this allows the microscopic rearrangement of surfaces so that the number of contacting surface electrons becomes proportional to the load.



Figure 3.5 Cross-section of the apparent contact area (whole length of the contacting bodies) and the actual contact area (marked by thick lines) for: (a) a certain normal force Fn ; (b) twice this normal force, 2Fn . The doubling of the friction force is due to doubling of the actual contact area and the resulting adhesion.

3.1.4 Roughness and the friction coefficient

For centuries there was a dispute between the 'roughness theory' and the 'adhesion theory' of friction. In the 1950s this was resolved in favor of the adhesion theory, which asserts that friction is caused mainly by 'unemployed' electrons at the surface of two contacting solids. As a result of adhesion, smooth surfaces can exhibit rather large friction coefficients; in his book Friction and Wear of Materials [17], Rabinowicz cites as an example surfaces of atomically smooth mica plates that give friction coefficients of nearly 1. The relation between friction and surface electrons manifests itself most spectacularly in the form of triboelectricity: when materials of different electron affinity are rubbed together, for many material combinations electrons will leave one material and move to the other, with rather high voltages building up. 'Popular' combinations include glass with leather or cat fur, and (usually unintentionally, in winter) cotton shirts with polyester sweaters. The effect is by far not marginal: high-voltage generators, such as the Wimshurst machine and the Van de Graaff generator, have been based on it.

Rabinowicz laments in the first (1965) edition of his book [17] on the recognition of the adhesion theory of friction that this 'development has penetrated rather slowly' into the field of mechanics, where 'smooth' is still equated with 'frictionless'. When the second edition [12] appeared in 1995, the lament was left in, and even in recent years frictional laws are still being derived (e.g. [18]) based on surface roughness, in some cases assuming an increase of the friction coefficient with the velocity [19], the opposite of the behavior found experimentally. In fact, for rougher surfaces, the friction coefficients tend to be lower than for very smooth surfaces, such as copper on copper; see [12, Figure 4.14]. Recently, a powerful and cheap tool has become available in the form of Gel Gems®, which allow one to 'play around' with adhesion effects. The extremely soft Gel Gems® indeed stick much better on smooth than on rough surfaces.

A striking effect can be obtained by rubbing lead pellets with flat surfaces against each other, as shown in Figure 3.6. The pellets will stick to each other, so that the lower plate can be lifted by the force from the upper plate. This shows that adhesion alone can produce forces equivalent to the contact pressure of a body under its own weight, as postulated by the adhesion theory of friction. It is the adhesion of metal surfaces that is responsible for this effect: if the surfaces are dirty or contaminated, it will be difficult to make them stick, and wiping the surfaces often helps to improve the sticking. The rubbing must be able to 'smooth' the surface, at least at some contacts—for hard lead the experiment will not work, as the material is too hard.


Figure 3.6 (a) Two circular lead pellets with scratched flat surfaces; the lighter color indicates metallic regions, and the darker color indicates oxidized regions. (b) The surfaces are rubbed against each other under pressure. (c) The surfaces stick together: the lower pellet can be lifted with the upper pellet; the candle flame indicates the upward direction.


3.1.5 Adhesion and chemical bonding

The four main kinds of chemical bonding are ionic bonding, covalent bonding, metallic bonding and hydrogen bonding. Quantum mechanical interactions take place between states of similar energy, which explains cum grano salis why metal surfaces are lubricated with oil composed of molecules with covalent bonds: the difference between the energy states (covalent electrons on one side and metal electrons on the other) leads to poor electron affinity and hence to good lubrication. Likewise, the friction coefficient between teflon and metals is very low. The friction coefficient will change significantly when, at high temperatures, the surface electrons become chemically activated or the surface chemistry changes (by a factor of nearly three for copper on carbon in [12, p. 104]). Unrelated to this influence of high temperatures are mechanical effects due to other temperature-induced changes of the surface chemistry. The variations in the mechanical properties reported by Rueche et al. [20] are probably due to changes in the surface humidity [21], which lead to a change in the adhesion force between particles. Experimental control of these effects is extremely cumbersome: drying granular materials so that the surface humidity is in equilibrium with the humidity of the surrounding air may take weeks [16], because the water molecules must be allowed to diffuse through the granular pore space.

3.2 Other contact geometries of Coulomb friction

In § 3.1 we dealt with the pure sliding case, where the velocity v is the relative velocity of the two surfaces at the contact area, and at each contact point this velocity is the same. The mathematical form of Coulomb friction, (3.1)–(3.2), holds also for other contact situations, namely rolling and pivoting, but with different magnitudes of the coefficients; see Figure 3.7.



Figure 3.7 Velocity at the contact point for a particle on a plane: (a) pure sliding friction; (b) rolling friction; (c) pivoting friction. For sliding friction, the velocity at the contact point is the same everywhere; for rolling friction, it is zero; and for pivoting friction it increases in proportion to the distance from the axis of rotation.


Figure 3.8 Exaggerated sketches of the deformation at the contact point, depending on the relative elasticity of the bodies which causes rolling friction: (a) deformed ground and undeformed rolling body; (b) deformed rolling body and undeformed ground; (c) deformation of both the ground and the rolling body. Panel (d) shows the actual shape of a very soft ground (gel) when a hard body (stainless steel cylinder) is pressed into it.

For particles which move only rectilinearly, without rotation, the relative velocity at the contact is the same as the relative velocity of the respective centers of mass. If additionally there is rotational motion of the contacting particles, we have additional torques due to the contact force distribution. For simplicity let us consider a single particle on a plane which is symmetric with respect to the axis of rotation. If the axis of rotation is parallel to the plane, we have rolling friction; if the axis of rotation is normal to the plane, we have pivoting friction.

3.2.1 Rolling friction

For pure rolling friction of a particle on a plane, the rotation axis is parallel to the plane as in Figure 3.7(b), and for rigid bodies the relative velocity at the contact would be zero. For actual materials, rolling friction results from the deformation at the contact point; see Figure 3.8. This means that in contrast to sliding friction, which is also meaningful for rigid bodies, rolling


friction is a property of elastic bodies. In discrete element simulations, the deformation at the contact point is not computed, so rolling friction can only be modeled via parameters. When we consider the normal force acting on the contact and the force that is necessary to move the body forward, we again have a Coulomb-type friction law. For a single round (spherical or cylindrical) particle, it is convenient to define the friction as the force which is needed to displace the particle from its position; in analogy to sliding friction, rolling friction has the form

Frol = μF,rol Fn   (3.7)

with dimensionless μF,rol (for the static case, an inequality analogous to (3.2) holds). This simplifies comparison of the magnitudes of sliding and rolling friction: coefficients of rolling friction for smooth bearing balls (1 × 10⁻³) and bearing cylinders (4 × 10⁻³) [22, p. 150] are around two orders of magnitude smaller than the usual coefficients of sliding friction in Figure 3.3, and still one order of magnitude smaller than the sliding friction between teflon and metals. Comparing such a dimensionless coefficient makes sense only for round particles of the same diameter. For many-particle contacts, it is necessary to write the friction in terms of torque:

τrol = μτ,rol Fn,   μτ,rol = r μF,rol,   (3.8)

where μF,rol is the coefficient of rolling friction from Equation (3.7) and r is the radius of the round particle (or the distance between the center of mass as the axis of rotation and the contact point). Thus, μτ,rol has the dimension of [m]. For this definition, the size of the objects has to be given. In discrete element simulations, for polygons or polyhedra with many corners, the rolling friction may be of the order of the dissipation (due to normal damping of the up–down motion of changing corner–side contacts and numerical dissipation of the integrators). Attempts have been made to model the behavior of assemblies of non-spherical particles with round particles and huge rolling friction coefficients. However, the strength of a bulk made up of many particles depends basically on the competition between rolling and sliding of the particles, which itself depends on the particle shape. Increasing the rolling friction coefficient does nothing to improve verisimilitude: even a square standing on an edge can have a small rolling friction coefficient, but when it falls on a side, it basically will not roll.

3.2.2 Pivoting friction

Hardly any general tracts exist (in particular not in English) that go into detail about pivoting friction, which is also called drilling friction or boring friction (in the sense of a rotational contact without wear). A longer discussion can be found in volume III of The Theory of the Top by Klein and Sommerfeld [1], which has recently been translated into English [23, p. 546ff]. Further, there is the treatment by Contensou [24, pp. 201–216] in French. For pivoting friction, a Coulomb-like friction law with torques,

τpiv = μτ,piv Fn,   (3.9)

similar to the one for rolling in Equation (3.8), can be assumed (based on the sliding of surface sections with the same velocity), where μτ,piv has dimension [m]. Contacts of different size


and the same shape will have different friction coefficients. A basic problem in practice is the surface roughness, which is not an issue for rolling friction (gears can be in practically perfect rolling contact even though they are 'ideally rough'). With pivoting friction, a freely spinning top on a plane may get stuck at surface asperities. Klein and Sommerfeld [1] warned that in this case, any theory for a freely moving top breaks down. So, for axially mounted contacts, surface asperities can easily lead to artifacts. For measurements of the coefficient of pivoting friction for contacts between half-spheres and planes, we have found that the surface roughness of planes made with planar rotating machining tools gave different coefficients for clockwise and counterclockwise rotation, although no anisotropies were found in similar surfaces for the coefficient of sliding friction; only with additional polishing did the plane surfaces give consistent measurements for both directions of rotation [25].

Pivoting friction (or, equivalently, its energy dissipation) is 'weaker' than sliding friction: it is easier to move an elongated object upright by a pivoting motion than by a sliding motion. Typically, for wood-splitting, one uses a chopping block, a piece of tree trunk which is heavy enough not to fall over. The block may be too heavy to carry, and still too heavy to tilt on its side and roll; moreover, its conical shape does not allow it to roll in a straight trajectory. It can, however, be moved by rocking it along its edge in a cycloid motion, using a combination of pivoting and rolling friction; see Figure 3.10. In another sense, pivoting friction is 'larger' than rolling friction: with no special mechanism to suppress rolling, a spinning egg will spin by rolling, not by pivoting alone; see Figure 3.9. Accordingly, for non-spherical particles and the discrete element method, pivoting friction may be more important than rolling friction.


Figure 3.9 (a) Sketch of a wooden egg-shaped spheroid, alternately colored in sections around the axis of symmetry. In (b)–(h), such a spheroid is shown to spin (i.e. the orientation of the long axis changes) by rolling (i.e. the orientation along the long axis changes, too); successive frames are taken at intervals of 0.51 s. At the beginning, in (b), an uncolored section faces up; after about one turn, in (h), a black section is facing up. Thus, from (b) to (h), the orientation of the long axis has changed by about 2π, while the orientation around this axis has changed by about π/2.


Figure 3.10 Rocking a chopping block which is too heavy to carry or to tilt for rolling makes use of the fact that the force needed to overcome the combination of rolling friction and pivoting friction is smaller than that needed to overcome sliding friction.


Figure 3.11 Transition from perfect sliding (no rotation) to perfect rolling (zero contact velocity) at tpr .

3.2.3 Sliding and rolling friction: the billiard problem

There are not many models that can be used to simultaneously study sliding, which is dissipative, and rolling, which in a first approximation can be treated as non-dissipative. A convenient test case for particle simulations is the 'billiard ball problem', where a circular object with radius r and symmetrically distributed mass, i.e. symmetric moment of inertia I, is in contact with the ground, without rolling friction; see Figure 3.11. Initially there is perfect sliding (no rotation, the velocity of the center of mass is equal to the contact velocity), and the orientation φ of the body does not change. Then the torque due to the sliding friction sets the body rotating, until it is eventually in a state of perfect rolling with angular velocity ω = φ̇ (no sliding, the contact velocity is zero). The normal force at the contact due to the gravitation g and mass m of the body is

Fn = −mg,


and the resulting friction, as long as the contact is sliding, is Ff = −μmg (with friction coefficient μ). As there is rotation along only one axis, the nonlinear terms ωiωj in the Euler equations (1.35)–(1.37) drop out, and the resulting torque rFf increases the angular momentum L = Iω(t) so that, analogously to Newton's equation of motion,

dL/dt = I φ̈ = Ff r.   (3.10)

As an example, consider a sphere with moment of inertia I = (2/5)mr² (an analogous calculation is possible for any other circular object with symmetrically distributed mass, such as a cylinder or circular disk, for which we would use I = (1/2)mr²). For φ we obtain

r φ̈ = (5/2) μg.   (3.11)

For the rectilinear degree of freedom, the mass is also decelerated by the friction, so that we have

ẍ = −μg.   (3.12)

Many beginners in multi-body mechanics have trouble with the concept that the force simultaneously causes a torque T at the contact point and performs frictional work Wf at the center of mass. It helps to write down the equations in vectorial form to understand why there is no double counting of forces. For the torque

T = Ff × r,   (3.13)

we use the vector (cross) product ×, while for the work

Wf = ∫ Ff · dx   (3.14)

we have the inner (dot) product ·, so different components of the force enter into different computations. When we integrate Equations (3.11) and (3.12) with the initial conditions x(0) = 0, φ(0) = 0, ẋ(0) = v0 and φ̇(0) = 0, we obtain

ẋ = v0 − μgt,   (3.15)
r φ̇ = (5/2) μg t,   (3.16)

which is valid as long as there is sliding. The contact velocity

vc = rω − ẋ   (3.17)


relates the angular velocity ω = φ̇ and the velocity of the center of mass ẋ. The latter decreases due to the torque of Equation (3.13), as kinetic energy of the rectilinear degree of freedom is shifted towards the angular degree of freedom when the body is set in rotation. We can integrate these equations until tpr, when the motion transitions from mixed sliding and rolling to pure rolling, such that vc(tpr) = 0. As the torque is constant, we can write Equation (3.16) as φ̇ = (5/2) μg tpr/r and substitute (3.15) into (3.17) to obtain

(5/2) μg tpr = v0 − μg tpr,   (3.18)

a condition for when perfect rolling occurs. Therefore, perfect rolling occurs at and after

tpr = 2v0/(7μg);   (3.19)

see Figure 3.12. After tpr, not only the velocity (3.17) at the contact point is zero, but also the frictional force and the resulting torque at the contact. According to Equation (3.19), the distance covered from perfect sliding through mixed sliding and rolling to perfect rolling (obtained by substituting tpr into x(tpr) = v0 tpr − μg tpr²/2) will be

xpr = 12v0²/(49μg).   (3.20)

What is most instructive about this case is that for an ordinary differential equation from classical mechanics, a piecewise solution in time is necessary due to the transition from dynamic to static friction.


Figure 3.12 Billiard problem, computed with ode23 of MATLAB®, with m = 1, g = 9.81 and initial velocity v0 = 4. (a) Plots of the velocity v and the position x; (b) plots of the angle φ and the angular velocity ω. The maximal time-step was limited to 0.01 via odeset; for larger time-steps or higher orders, the integrator may miss the exact results (dotted) from Equations (3.19) and (3.20).
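The right-hand side for such a computation can be written with an explicit case distinction between sliding and rolling; the following is a minimal sketch (the function name, the friction coefficient and the switching threshold are our choices):

function [dydt]=billiard(t,y);
% billiard problem: sphere first sliding, then rolling on a plane
%y(1)=v (center-of-mass velocity), y(2)=x, y(3)=omega, y(4)=phi
m=1; g=9.81; r=1; mu=0.2;
I=2/5*m*r^2;             % moment of inertia of a sphere
vc=r*y(3)-y(1);          % contact velocity, Equation (3.17)
if abs(vc)>1e-8          % mixed sliding and rolling
  Ff=mu*m*g*sign(vc);    % sliding friction, opposing the contact motion
  dydt(1,1)=Ff/m;        % decelerates the center of mass, cf. (3.12)
  dydt(3,1)=-Ff*r/I;     % spins up the rotation, cf. (3.10)-(3.11)
else                     % perfect rolling: no friction force, no torque
  dydt(1,1)=0;
  dydt(3,1)=0;
end
dydt(2,1)=y(1);
dydt(4,1)=y(3);
return;

Integrated with ode23 and a maximal time-step limited via odeset, as in the caption, this should reproduce the transition to perfect rolling at tpr.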


The change of the kinematic state from perfect sliding, where the torque and frictional force are maximal, to perfect rolling occurs according to Gauss's principle of least constraint (see, e.g., Sommerfeld [26, p. 210]). Simply speaking, the system tries to minimize the constraint—the deviation between the accelerations and the forces, rescaled by the respective moments of inertia. This meaning of 'physical constraint' as a dynamic quantity should not be confused with the meaning of the constraint conditions discussed in Chapter 2, § 2.8.

3.2.4 Sliding and rolling friction: cylinder on a slope

The example of a block on a slope can be extended to a rolling body as in Figure 3.13. In contrast to the previous section, here we will always assume perfect rolling, without slip, i.e. not too large angles α of the slope, or not in the initial phase of sliding. For a body with rotational symmetry and moment of inertia I on a slope, the normal force is Fn = mg cos α, as for the block. The friction force then satisfies the two equations

m ẍ = mg sin α − Ff,   (3.21)
r Ff = I φ̈,   (3.22)

where φ is the angular orientation of the cylinder. With ẍ = r φ̈, the tangential acceleration is coupled with the angular acceleration. The component of the force which accelerates the angular degree of freedom is 'missing' from the acceleration of the rectilinear coordinate, hence the '−' in Equation (3.21). Plugging Equation (3.21) into Equation (3.22) yields


Figure 3.13 A cylinder on a slope, showing the normal force Fn , the downhill force Fdh , and the weight (gravitational force) mg, in the normal–tangential coordinate system, which is depicted on the right. The weight is drawn as acting on the center of the cylinder, and the friction force Ff is drawn as acting on the contact point. One can imagine the torque on the cylinder as the result of the weight mg acting vertically downward at the center, shifted from the actual contact point by a distance a, which acts as the force arm.


Ff = mg sin α / (1 + mr²/I)   (3.23)

as the force opposing the downhill force Fdh, which is different from the mg sin α of the sliding block. While the condition of rolling is easy to impose theoretically, the experimental realization may need some fine-tuning. With perfect rolling, a hollow cylinder should take a longer time to move down a slope than a massive one. In practice, if one takes a light hollow cylinder and a heavy massive cylinder, the hollow cylinder can bounce more easily and, with partial sliding, move faster than the massive cylinder. Likewise, for a discrete element simulation, where the normal force may be affected by bouncing, the initial conditions have to be chosen carefully to obtain rolling without bouncing.

The correction 1 + mr²/I in the denominator of Equation (3.23) allows the interpretation of

mt = m / (1 + mr²/I)

as tangential mass. When one goes from a particle–ground contact to two interacting particles of masses m1 and m2, their reduced mass must be introduced for interactions in the normal direction:

m* = m1 m2 / (m1 + m2).

For interactions in the tangential direction, such as with ideal rolling, if the moments of inertia are I1, I2 and the distances between the contact point and the centers of mass are r1, r2, then for the tangential motion there is additionally the tangential reduced mass [27]

m*t = m1 m2 / (m1 + m2 + m1 r1²/I1 + m2 r2²/I2).

3.2.5 Pivoting and rolling friction

Usually, phenomena such as a spinning egg that rises if it is cooked but does not rise if it is raw are treated theoretically under the assumption of pivoting or even sliding friction. We have found experimentally that solid eggs or ellipsoids spin by pivoting only at very slow spinning velocities, which may be due to locking of surface asperities; at higher velocities, which are not even sufficient for rising, it is not pivoting that occurs at the contact in the spinning motion but rolling, as can be seen from the sequence shown in Figure 3.9. The color pattern visible on the top face of the egg changes continuously, i.e. the color which was initially on the underside becomes visible, whereas for pure pivoting one would only see rotation of the pattern of the first frame (Figure 3.9(b)).

3.3 Exact implementation of friction

For single-particle contacts or one-dimensional systems, the friction force can be derived exactly so that the condition (3.2) for static friction is satisfied; the derivation is based on


constraint relations, i.e. differential algebraic systems. In the mathematical formalism, we follow Hairer et al. [28, p. 196ff].

3.3.1 Establishing the difference between dynamic and static friction

To implement Coulomb friction in the exact sense, we first need a mathematical criterion to distinguish between dynamic friction, where friction acts as a dissipative force and the direction is reversed at v = 0, and static friction, where the frictional force has to be implemented as a constraint force. That the velocity v equals zero is not by itself a sufficient criterion for static friction: a particle in oscillatory motion may reverse its velocity repeatedly, each time passing through a state with v = 0, but the motion is purely dynamic; see Figure 3.15. Furthermore, we need a criterion that is practical for numerical implementation; as the velocity will hardly ever be 'exactly zero' (i.e. zero up to the full 15 digits for double precision), we need a criterion which is robust with respect to the discretization error and other noise. Let us return to the phase portrait of the linear oscillator with dry friction from § 1.4.1,

m ẍ + μ sgn(ẋ) + kx = 0,   (3.24)

where we take m = 1 and k = 1, i.e. we drop the dimensions in the following analysis. We can study the scalar product of the force and the velocity in the two regions I, where v > 0, and II, where v < 0, in Figure 3.14. Writing fI and fII for the sum of all forces (including the dynamic friction), we can define their scalar products with the corresponding velocities as

aI = v · fI   (v > 0),   (3.25)
aII = −v · fII   (v < 0).   (3.26)

We have plotted the signs of aI and aII in Figure 3.15. The following cases can be distinguished:
1. For aI > 0 and aII < 0, the flow traverses the field from region II to region I (to the left of x = −μ); the friction is dynamic friction.
2. For aI < 0 and aII > 0, the flow traverses the field from region I to region II (to the right of x = μ); the friction is dynamic friction.


Figure 3.14 Flow in phase space for the linear oscillator with dry friction around the region with static friction, for μ = 0.2. For static friction, the flow lines for v > 0 and v < 0 in the region −0.2 ≤ x ≤ 0.2 point against each other, whereas for dynamic friction the flow has the same direction for v > 0 and v < 0 in the region |x| > 0.2.



Figure 3.15 (a) Flow in phase space of the linear oscillator with friction, for μ = 0.2, where the piecewise indicator functions aI and aII for above and below v = 0, respectively, can be lumped into a = − sgn(v)x − μ and plotted in the z-direction: (b) the regions for dynamic and static friction; (c) the region for static friction where the flows for velocities with different signs push against each other.

3. For aI < 0 and aII < 0, in the region −μ ≤ x ≤ μ, flows with different signs push against each other, into the constraint, as shown in Figure 3.15(c); this is the case of static friction where the actual friction forces are −μ ≤ Fs ≤ μ.


4. For aI > 0 and aII > 0, the flow would pull away from the constraint; this does not happen for the time evolution in Figure 3.14. However, if we try to choose an initial condition when unaware that we are in the regime of static friction, we end up with a contradiction, as the following argument shows.
a) We work, as usual, under the premise that friction decelerates the motion.
b) We assume that we are in the region −μ < x < μ; accordingly, the absolute value of the elastic force is smaller than the dynamic friction force.
c1) Let us assume that the particle moves to the right. Because of b), the dynamic friction works against the velocity; hence we have mẍ + kx − μ = 0, so that mẍ = (−kx + μ) > 0. Therefore the particle moves to the right and is accelerated towards the right, which is a contradiction to the premise of friction as a decelerating force in a).
c2) If we instead assume that the particle moves to the left, we will find that the friction accelerates the particle towards the left, which is again a contradiction to a).
This corresponds to the situation where the flow 'pulls away' in an opposite direction from the constraint manifold (v = 0, −μ ≤ x ≤ μ); see Figure 3.16. Of course, what is wrong here is the assumption that the static friction must have the magnitude μ, as in the case of dynamic friction. This is a case of inconsistent initial conditions, as explained in § 2.8.2. Such examples have been discussed in the literature under the name of the 'Painlevé paradox', usually with more degrees of freedom, so that the corresponding motion in phase space is more difficult to fathom. Already in the 1940s, Hamel re-examined several such paradoxes and found that some could be solved by demanding continuity of the solution [29, p. 636], while for others [29, p. 549] unique solutions could always be obtained, '. . . but we had to give up the assumption that every initial condition is realizable. Rather, we had to demand that only such initial conditions are allowed which can be produced from the initial position via an appropriately chosen force. Then everything paradoxical vanishes to which


Figure 3.16 An unphysical flow in phase space for the linear oscillator with dry friction—a case of the Painlevé paradox: with the full value μ for dynamic friction and external forces smaller than the friction at zero velocity, the flow pulls away from the constraint manifold at v = 0.


Painlevé took such exception that he thought he had to state the impossibility of the Coulomb–Morin friction laws' [our translation, slanted text retained from the original]. In other words, Hamel demanded compliance with consistent initial conditions for constraint systems, a requirement which in the field of differential algebraic equations (see § 2.8) is nowadays part of the general theory [30]. For simulations with Coulomb friction, choosing initial conditions with v = 0 and arbitrary (often inconsistent) x is very tempting to the user of a simulation, so such initial conditions with aI, aII > 0 must be explicitly prohibited in the program.

3.3.2 Single-particle contact

We model the stick–slip oscillator in Figure 3.17 by the differential equation

ẍ + γẋ + μ sgn(ẋ) + x = A cos(ωt).   (3.27)

To determine what happens for ẋ = v = 0, we define the switching function

g(x, v) = sgn(v).   (3.28)

As the choice of the function name g indicates, this will turn out to be the constraint function. We have left an implicit dependence on x in the argument because—as we saw in the previous section—depending on whether the friction is static or dynamic for different x, different courses of action have to be taken. We can now separate the solution of Equation (3.27) into two branches, depending on the sign of the velocity:

f̃ = ẋ = fI(y) = (A cos(ωt) − ẍ − μ − x)/γ   if g(v) > 0,
f̃ = ẋ = fII(y) = (A cos(ωt) − ẍ + μ − x)/γ   if g(v) < 0.   (3.29)

The γẋ term has been included in Equation (3.27) to enable us to write down a case-by-case analysis explicitly; however, it will turn out later that this term is not strictly necessary for the final result. Note that f̃ is discontinuous at the manifold S = {g(x, v) = 0, |x| < μ}, where we have fII − fI = 2μ/γ. We will now refine our condition g(x, v) = 0. Instead of the discontinuous f̃, we use a smooth interpolation f(v, λ) as the 'convex hull' of fI and fII:

f(v, λ) = (1 − λ) fI(v) + λ fII(v).

Figure 3.17 The ‘stick–slip’ oscillator described by Equation (3.27).


The interpolation parameter λ will turn out to be the Lagrange multiplier for the constraint problem.² Equation (3.29) now reads

ẋ = f(v, λ).   (3.30)

Let us designate x0 and v0 as the system variables in the case of static friction (dynamic friction is determined by the sgn function anyway and does not have to be considered here). So we have to solve the constraint equation

g(x0, v0) = 0.   (3.31)

As in the DAE approach to the pendulum problem in § 2.8, we derive the necessary equations by differentiation. Applying the chain rule

d/dt g(x, v) = (∂g(x, v)/∂x)(∂x/∂t) + (∂g(x, v)/∂v)(∂v/∂t)

to Equation (3.31), we obtain

ġ(x0, v0) = (∂g(x, v)/∂v) v̇ = 0

(the derivative term in x drops out, as Equation (3.28) does not contain an explicit dependence on x). Therefore v̇ must be 0, which, by Equation (3.30), leads to f(v, λ) = (1 − λ)fI(v) + λfII(v) = 0. Thus we obtain λ as

λ = fI / (fI − fII).   (3.32)

What is remarkable is that the solution is given by a linear equation: although Coulomb friction has a jump at v = 0 and is therefore highly nonlinear, it is not necessary to solve any nonlinear equations. For the parameters given in the caption of Figure 3.18, the model exhibits stick–slip, with the slipping intervals becoming shorter and shorter as the spring relaxes. Hairer et al. [28, p. 198ff] computed this example with a Runge–Kutta method and dense output. They detected the transitions through v = 0, stopped the integration when the condition for static (or dynamic) friction changed, and then restarted the integration with the appropriate constraint equation. Such stopping and restarting would correspond to an event-driven method for rigid particles with collision dynamics; see § 9.4.1. (This is the numerical equivalent to

² In contrast to the DAEs discussed in Chapters 1 and 2, which were defined via equalities and are therefore called 'bilateral' constraints, for static friction the constraints have to be determined from inequalities, and so one speaks of 'unilateral constraints'.


Figure 3.18 Solution of the stick–slip oscillator in Equation (3.27) for the parameters γ = 0.05, A = 2, ω = π and the initial conditions x(0) = 3, v(0) = 4. To be consistent with Hairer et al. [28, p. 198ff] we used μ = 4, which is rather large from the physical standpoint. Plotted are the time evolution traces for the position (+) and the velocity (×): (a) the solution with the ‘numerically exact’ static friction from Equation (3.27); (b) the result of the same calculation with dynamic friction only; (c) the result obtained with dynamic friction and adaptive time-steps.


a piecewise solution for a friction problem, of which an analytical example is the billiard problem discussed in § 3.2.3.) For many-particle simulations, it is tedious to restart the integration whenever one of the many tangential velocities becomes zero, as the many additional interruptions will decrease the effective time-step. Instead of stopping and restarting, we used a third-order Runge–Kutta method with constant step-size to compute the solution plotted in Figure 3.18(a). We get practically the same result as in Figure 6.4 of [28], although one should bear in mind that the coarsened time resolution reduces the accuracy. When Coulomb friction is simulated with constraints, additional provisions have to be made in the function file for the equation of motion: when the condition for static friction is fulfilled, the velocity has to be set to zero by hand (which corresponds to stabilization by projection, as described in § 2.8.3). With the same integrator, the naive approach of using only dynamic friction obviously fails, as can be seen from Figure 3.18(b): there are no sticking intervals, only intervals with noise around zero. Using dynamic friction with MATLAB's ode23 adaptive integrator somehow gives the correct sticking, but the solution is computed with very small effective time-steps and a very large rejection ratio, so the computational effort is prohibitive. Additionally, stabilization by projection can be used on the coordinates; this would be necessary, in particular, if the particles were set on an inclined surface with a constant downward force.
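To make the program flow concrete, here is a minimal sketch (our illustration, not the code used for Figure 3.18; the function name stickslip_rhs, the threshold vtol standing in for v = 0 and the parameter values are our assumptions) of a MATLAB right-hand side for Equation (3.27) that switches between dynamic friction and the constraint treatment of static friction via the multiplier λ of Equation (3.32):

function dydt = stickslip_rhs(t, y)
% Right-hand side of Equation (3.27) with constraint-based static friction.
gamma = 0.05; A = 2; omega = pi; mu = 4;   % parameters as in Figure 3.18
vtol = 1e-8;                               % numerical stand-in for v == 0
x = y(1); v = y(2);
fext = A*cos(omega*t) - x;                 % driving force minus spring force
if abs(v) > vtol                           % sliding: dynamic friction
    a = fext - gamma*v - mu*sign(v);
else                                       % candidate sticking state
    aI  = fext - mu;                       % acceleration branches
    aII = fext + mu;
    lambda = aI/(aI - aII);                % Equation (3.32)
    if lambda >= 0 && lambda <= 1          % static friction is consistent
        a = 0;                             % constraint force balances the load
    else                                   % slip sets in on the consistent branch
        a = fext - mu*sign(fext);
    end
end
dydt = [v; a];
end

With a constant step-size integrator, the caller additionally sets v to zero whenever the sticking branch was active, which is the projection step mentioned above.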

3.3.3 Frictional linear chain

With the method explained in the previous subsection, we can also compute the case of a chain of particles sliding on a floor with the possibility of static friction, as in Figure 3.19; this example can serve as a coarse model for studying the influence of Coulomb friction on a linear chain of many masses connected with springs. To investigate the influence of static friction on the acceleration, we can compute the sound velocity (group velocity) as the speed which the chain needs in order to pick up the external excitation. In Figure 3.20, the spreading of the wavefront can be seen from the amplitudes, plotted transversally above the average positions. In the simulation we use 40 blocks, with m = 1 and spring constant k = 400; the excitation is by a force A sin(ωt), with A = √k and ω = √(k/m). The theoretical sound velocity c_th = a√(k/m) is given by the thick gray line in Figure 3.20. For both μ = 0.5, shown in Figure 3.20(a), and μ = 0.05, shown in Figure 3.20(b), the sound velocity is consistent with the theoretical value for the frictionless chain. The deviation between the theoretical and the numerically computed wavefront at larger distances from the excitation is due to the degeneration of the wave's shape; the actual spreading velocity is consistent: friction has no effect on the sound velocity, only on the decay of the sound amplitude. (A code sketch of the corresponding right-hand side follows Figure 3.19.)


Figure 3.19 Frictional linear chain with friction coefficient μ and equilibrium distance a between the particles, subject to external excitation.
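The same branch logic as in the previous subsection carries over block by block; the following is a minimal sketch (ours: the function name, the normal force m·g per block and the threshold vtol are assumptions not spelled out in the text) of a right-hand side for the chain of Figure 3.19:

function dydt = chain_rhs(t, y, n, k, m, a, mu, A, omega)
% Chain of n blocks with springs k, spacing a and Coulomb friction mu;
% block 1 is driven by the force A*sin(omega*t).
g = 9.81;                               % gravitational acceleration (assumed)
x = y(1:n); v = y(n+1:2*n);
f = zeros(n, 1);
for i = 1:n-1                           % spring forces between neighbors
    fs = k*(x(i+1) - x(i) - a);
    f(i)   = f(i)   + fs;
    f(i+1) = f(i+1) - fs;
end
f(1) = f(1) + A*sin(omega*t);           % external excitation
fmax = mu*m*g;                          % friction bound from the weight
vtol = 1e-8;
for i = 1:n                             % Coulomb friction block by block
    if abs(v(i)) > vtol
        f(i) = f(i) - fmax*sign(v(i));  % dynamic friction
    elseif abs(f(i)) <= fmax
        f(i) = 0;                       % static friction balances the load
    else
        f(i) = f(i) - fmax*sign(f(i));  % slip sets in
    end
end
dydt = [v; f/m];
end

As in the single-particle case, a driver with constant step-size would additionally zero the velocities of the sticking blocks after each step.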


Figure 3.20 Spreading of a longitudinal wave through a chain of particles (the particle index is given by the column of numbers on the right, and the amplitude is plotted in the vertical direction) for a frictional linear chain excited by A sin ωt: (a) with friction coefficient μ = 0.5; (b) with friction coefficient μ = 0.05. In both systems, the sound velocity (thick gray line) is the same. The insets show the amplitudes magnified five times; it can be seen that for μ = 0.05 the vibration reaches the last particle, whereas for μ = 0.5 the particles with index 34 and higher do not move any more and are constrained by the Coulomb friction. The amplitudes have varying size, due to the formation of a standing wave along the length of the chain in Figure 3.19.

3.3.4 Higher dimensions

The exact friction can easily be implemented for single particles also in higher-dimensional geometries. For example, in the case of a particle on a slope, instead of the scalar f and v, the corresponding vectorial quantities (f_x, f_y) and (v_x, v_y) must be used in the inner products in


expressions (3.25) and (3.26) for a_I and a_II. Instead of the expression (3.32) for λ in terms of f_I and f_II, we have to use

λ = a_I/(a_I − a_II).    (3.33)

For the one-dimensional chain of the previous subsection, the implementation is straightforward. For higher dimensions and arbitrary contact geometries, to date there is no process that would give a smooth solution in agreement with classical mechanics. Moreau's sweeping process [31] can give a solution which is mathematically unique and satisfies the inequality constraint for static friction given in Equation (3.2), but that whole approach is outside the framework of conventional classical mechanics: non-smooth changes in velocities are possible in Moreau's 'contact mechanics' [32, 33], so the corresponding accelerations, as well as the forces leading to them, including partial forces such as the tangential contact force, can also vary in a non-smooth manner. However, this is at odds with Newtonian kinematics (where the accelerations are time derivatives of smoothly varying velocities) and with the whole approach of a soft-particle discrete element method with equations of motion which can be solved by conventional ODE solvers, so we do not consider it here.

3.4 Modeling and regularizations

3.4.1 The Cundall–Strack model

Because the coefficient of friction is of order 1, friction cannot be treated as a small perturbation: a block on a slope inclined at less than the critical angle atan μ will not slide; but if the inclination is greater than the critical angle, the block will slide. Likewise, in a DEM simulation, a heap of many particles can be constructed on a flat surface (represented by a single particle with straight edges) without the need to add extra roughness. When the friction is 'switched off', the heap will dissolve as if it were a viscous fluid. Friction can also have an effect on the speed of simulations: for low densities ('granular gas'), the increased energy dissipation leads to clustering; more particles are in contact, so more interactions must be evaluated, and the necessary CPU time increases compared to a simulation without friction. For dense systems ('granular solid'), when the friction coefficient is larger, the packing densities will be lower, so that particles have fewer neighbors; this reduces the number of necessary force evaluations, and therefore the simulation speed of the code will be higher. As there is currently no exact computation method available for many-particle friction, we have to make do with modelizations. Let us assume that the contact point and the relative tangential velocity v_t are known. Then the best verisimilitude is obtained with the Cundall–Strack model [34], where the tangential force f_t at time t is incremented from the previous time-step as long as there is sliding:

f_t(t) = f_t(t − τ) − k_t,1 v_t τ,    (3.34)

where v_t is the tangential velocity, and the direction of f_t is opposite to v_t. The appropriate choice of k_t,1, the 'tangential stiffness', will be discussed below. If the resulting tangential force exceeds the product of the normal force and the friction coefficient, it is


truncated at the maximal possible value μf_n, i.e. the product of the friction coefficient μ and the magnitude of the normal contact force f_n:

f_t(t) = sgn(f_t(t)) · μ f_n(t)   if |f_t(t)| > μ f_n.    (3.35)

Since the direction of the tangential force is obtained from the tangential velocity, the Cundall–Strack model uses scalar increments of magnitude, though the particle geometry is two-dimensional. As k_t must have the dimension of a spring constant, the Cundall–Strack model is sometimes referred to as a model of 'breaking tangential springs'. Unfortunately, the behavior of the model is oscillatory: when we divide Equation (3.34) by an infinitesimal dt, we obtain

df_t/dt = −k_t,1 v_t.    (3.36)

If we integrate this equation with respect to t, we obtain essentially the time evolution of the harmonic oscillator. This means that the tangential force in this modelization does not always act strictly opposite to the actual velocity; due to the inertia of the 'harmonic oscillator', it may even act in the direction of the actual velocity. Only if the tangential friction reaches the value for sliding friction (the condition before Equation (3.35)) is energy dissipated. We can reduce the oscillations by introducing a damping term −k_t,2 v_t in addition to the hysteretic force of Equation (3.34), so that we have

f_t,act(t) = f_t(t) − k_t,2 v_t.    (3.37)

While the damping may reduce the oscillations, one also sees that the Cundall–Strack model leads to a tangential friction whose grip is delayed relative to the ‘exact’ friction; see Figure 3.21. Nevertheless, beyond the time-scale of the oscillation, the results are satisfying: particles on a slope will come to rest if there is damping, and heaps can be modeled stably even if they are constructed on smooth surfaces. Note, however, that we have replaced a constraint force by an additional degree of freedom, which can store energy. For systems with strong vibrations (e.g., the simulation of fluidization due to vibration), an energy release from the ‘frictional springs’ may be triggered when particle contacts are suddenly loosened.
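Collecting Equations (3.34), (3.35) and (3.37), one update of a two-dimensional Cundall–Strack contact can be sketched as follows (our code; the function name and argument list are assumptions, and the history variable ft must be stored per contact and reset when the contact opens):

function [ft, ft_act] = cundall_strack(ft, vt, fn, mu, kt1, kt2, tau)
% One update of the tangential force for a persisting contact.
% ft: tangential force from the previous time-step (history variable)
ft = ft - kt1*vt*tau;            % increment the tangential spring, Eq. (3.34)
if abs(ft) > mu*fn               % truncate at the Coulomb limit, Eq. (3.35)
    ft = sign(ft)*mu*fn;
end
ft_act = ft - kt2*vt;            % acting force with damping, Eq. (3.37)
end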


Figure 3.21 Static friction of a block on a slope: behavior of exact friction (thick solid line), actual oscillatory behavior of the Cundall–Strack model (thin solid line), and behavior of the Cundall–Strack model without a cut-off at the product of the normal force Fn and the friction coefficient μ (dotted line). The gripping is delayed for the Cundall–Strack model.

3.4.2 Cundall–Strack friction in three dimensions

In two-dimensional simulations, the tangential surface of a particle is a one-dimensional line, so we had to retain a one-dimensional quantity in Equation (3.34). In three-dimensional simulations, the particle surface (r₁(t), r₂(t)) is two-dimensional, while the relative tangential velocity (v_x(t), v_y(t), v_z(t)) and the trajectory of the contact point (x(t), y(t), z(t)) are three-dimensional. To maintain a friction direction opposite to the velocity, we have to relate the contact manifold (r₁(t), r₂(t)) to the contact trajectory (x(t), y(t), z(t)), and at the same time we want to retain the incremental feature of the Cundall–Strack model in two dimensions, where the magnitude of the tangential friction f_t(t − dt) from the previous time-step is used irrespective of a possible shift in the direction at the new time-step (except if this direction is backward). This can be achieved in the following steps (collected in a code sketch at the end of this subsection):

1. Projection onto the new tangential plane. During the advance from time t − τ to time t, with new contact normal n̂(t) and new tangential velocity v_t(t), we project the old tangential force f_t(t − τ) onto the new tangential plane:

   f_t(t − τ)_p = f_t(t − τ) − (f_t(t − τ) · n̂(t)) n̂(t).

2. Rescaling to the old magnitude. We then rescale f_t(t − τ)_p to the magnitude of the previous tangential force f_t(t − τ):

   f_t(t − τ)_r = ‖f_t(t − τ)‖ · f_t(t − τ)_p / ‖f_t(t − τ)_p‖.

3. Vectorial addition of the new increment. The rescaled projection f_t(t − τ)_r is then incremented with the new tangential force:

   f_t(t) = f_t(t − τ)_r − k_t v_t(t) τ.    (3.38)

4. Application of a cut-off, if necessary. Finally, a cut-off is applied if the result of the previous vector addition exceeds the maximal friction allowed (the dynamic friction), so the dynamic friction becomes

   f_t(t) = μ f_n(t) · f_t(t)/‖f_t(t)‖   if ‖f_t(t)‖ > μ f_n,    (3.39)

the vectorial analogue of the cut-off in Equation (3.35).

It should be noted that ft (t) may not be anti-parallel to the current (relative tangential) velocity v(t); nor was it in the two-dimensional case with its oscillations around the equilibrium.
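The four steps can be collected in a few lines of MATLAB; the following sketch is ours (the names and the guard against a vanishing projection are assumptions):

function ft = cundall_strack3d(ft_old, nhat, vt, fn, mu, kt, tau)
% ft_old: tangential force at t-tau; nhat: unit contact normal at time t
ftp = ft_old - dot(ft_old, nhat)*nhat;   % 1. project onto the tangential plane
if norm(ftp) > 0
    ftr = norm(ft_old)*ftp/norm(ftp);    % 2. rescale to the old magnitude
else
    ftr = zeros(3, 1);                   %    (degenerate case: projection vanishes)
end
ft = ftr - kt*vt*tau;                    % 3. add the new increment, Eq. (3.38)
if norm(ft) > mu*fn                      % 4. cut off at dynamic friction, Eq. (3.39)
    ft = mu*fn*ft/norm(ft);
end
end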

3.5 Unfortunate treatment of Coulomb friction in the literature

Klein and Sommerfeld, in volume III of their monumental work Theory of the Top ([1] and various reprints; all translations in this book are our own, which deviate in several respects from the translation [23] published by Birkhäuser), treat friction in their Chapter VIII, 'Theory and reality: Influence of friction . . . '. The first section is titled 'The contradiction between


rational and physical or celestial and terrestrial mechanics'. The authors complain that, due to the discrepancies between theories based on rational (parameter-free) mechanics and the experiments, the 'physical' (phenomenologically influenced) mechanics would better be considered as 'rational', while 'rational' (in the sense of parameter-free) mechanics 'in truth is highly unphysical and irrational'. The same can be said of particle modeling. Still, many recent books on finite element contact mechanics contain sophisticated nonlinear continuum treatments of the bulk, while the contacts are assumed to be either in perfect friction (no sliding possible) [35] or totally frictionless [36, 37]. The treatment of friction is also lacking in books on DEM simulations [38]. Due to the neglect of friction, the results of large portions of the mechanics literature are indeed of only limited validity for actual physical systems.

3.5.1 Insufficient models

We have already seen in Figure 3.18(b) that neglecting the character of static friction and instead using only dynamic friction leads to practically useless solutions. When the Coulomb friction in Figure 3.22(a) is dealt with by 'regularizations', many approaches are immediately recognized to be unphysical, as they are not able to reproduce the simplest test case for static friction, i.e. to keep a block on a slope from sliding. This is the case for the approach in Figure 3.22(b), where for v = 0 the static friction is simply set to zero. In practice, this corresponds to the use of only dynamic friction, as v = 0 can hardly be reached in a 'numerically exact' sense. A block on a slope inclined at an angle below the critical φ = atan(μ) will alternately be pulled upward and downward, with the average motion being downhill, as the model cannot arrive at a force equilibrium; see Figure 3.23. Despite such unrealistic predictions, this approach has been discussed seriously in the mathematical literature [39], which shows that mathematical existence theorems don't necessarily mean anything for physical relevance. Next comes a wide class of models where the friction around v = 0 is 'regularized' via a viscous force, so that the friction force is proportional to the tangential velocity. Haff and Werner [40] proposed the friction law

F_HW = −sgn(v_t) · min(γ_t |v_t|, μ|F_n|)    (3.40)


Figure 3.22 Three models of friction: (a) Coulomb friction; (b) setting the friction force to 0 for v = 0; (c) Haff–Werner model with viscous regularization around v = 0, indicated by dotted lines.



Figure 3.23 Particle sliding down a slope inclined at angle α = 20◦ , assuming the force law from Figure 3.22(b) and computed using the classical Runge–Kutta method with τ = 0.05: (a) velocity in the tangential direction; (b) height and position in the tangential direction; (c) the total force.

Figure 3.24 Heap of base width about 5 cm, built on a mirror. The macroscopic smoothness does not prevent the formation of a high angle of repose.

for relative tangential velocity v_t, normal force F_n and a parameter γ_t. In that case, a block on a slope will always slide downward, with a constant velocity which depends on the angle and γ_t. The friction law is then purely dissipative, so the nature of static friction as a non-dissipative constraint gets lost. The pattern in the sign of the force plotted in Figure 3.23(c) depends on the angle α, the time-step τ and the integration method. For smaller time-steps and higher-order methods, velocity values closer to zero are obtained, but the particle will slide downhill all the same. Thus, with such friction laws, granular heaps cannot be stable either. Attempts to 'stabilize' a heap with the above or similar tangential force laws by building it on a rough surface [41] lead to the transmission of only normal forces to the floor, which does not correspond to the physical situation: heaps can even be built on mirrors, which are macroscopically as flat as one can get; see Figure 3.24. One could argue that there might be situations where the model in Equation (3.40) is sufficient; however, one would only be able to justify that by comparison with more realistic simulation models, in which case one could


just use those models right away. Even for single-particle models, the fast energy dissipation of the Coulomb friction cannot be mimicked by switching to a viscous dissipation. Finally, when static friction acts, it will constrain the relative motion of neighboring particles so that, instead of a cloud of particles, a fixed cluster moves; this phenomenon cannot be modeled by energy dissipation alone.
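The failure of the regularization of Figure 3.22(b) can be reproduced in a few lines; the following sketch (ours; explicit Euler and the parameter values are our simplifications, not the classical Runge–Kutta method used for Figure 3.23) integrates a block on a slope below the critical angle, where the velocity oscillates around zero while the position drifts downhill instead of sticking:

% Block on a slope with the friction of Figure 3.22(b): zero at v = 0.
alpha = 20*pi/180; mu = 0.5; g = 9.81;   % below critical: tan(alpha) < mu
tau = 0.05; v = 0; x = 0;
for step = 1:200
    if v == 0                            % exactly zero only at the start
        fric = 0;                        % the flawed regularization
    else
        fric = -mu*g*cos(alpha)*sign(v); % dynamic friction otherwise
    end
    v = v + tau*(g*sin(alpha) + fric);   % explicit Euler step
    x = x + tau*v;                       % x keeps growing: downhill drift
end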

3.5.2 Misunderstandings concerning surface roughness and friction

In the traditional terminology of the Cambridge University Tripos [42, 43], questions involving 'perfectly rough particles' indicate a constraint of zero relative velocity for the contacting surfaces; so vanishing friction is associated with smoothness, and large friction with roughness. But, as shown in Figure 3.24, even polished surfaces of mirrors can have reasonably large friction coefficients; and, as Rabinowicz [17] pointed out, atomically smooth mica surfaces have the rather high friction coefficient of nearly μ = 1. So roughness should not be associated with large friction coefficients.

3.5.3 The Painlevé paradox

The insufficiency of Coulomb's friction laws was alleged by P. Painlevé in the late 19th century, who took into account only the equation for dynamic friction. A refutation was published in 1910 by F. Klein [44], who also mentioned a growing number of other refutations in the literature. In textbooks, the problem has been treated at least since the first edition of Hamel's book on theoretical mechanics [29]. Understanding has spread rather slowly, partly through wrong citations and misunderstandings of the literature. For example, Hamel, who emphasized that some initial conditions are not permissible, was incorrectly described even recently in [45] as having 'joined the point of view by L. Lecornu about failure of the rigid body model'. Recently, Painlevé paradoxes have been discussed in connection with rigid impacts. As this kind of 'non-smooth mechanics' [32] makes it necessary to allow jumps in the velocity, which is at odds with Newtonian mechanics, we will not dwell on the subject but stick to the case of elastic particles.

3.6 Further reading

In many respects we have followed Rabinowicz's book [12], which gives a clear and concise treatment of tribology from the perspective of the adhesion theory of friction and is also useful for particle modeling. The book by Johnson [46] presents a wealth of experimental facts for contacts with friction. Under compression with additional oscillating forces, frictional contacts tend to slip, and for large compression the contact area gets damaged ('fretting'); see Johnson [46, p. 224ff]. For rolling, additionally micro-slip and creep can occur [46, p. 242ff]. Vibration can be generated by the noise which is induced by granular flow. For slopes below granular chute flow, such creep was induced with a peculiar type of time- and depth-dependence [47]. Friction in connection with impacts is discussed in Stronge's book [48]. Unilateral constraints in general are treated in [49].


Exercises

3.1 Program the problem of the billiard ball initially in a state of perfect sliding (from § 3.2.3) using ode23 and ode45 in MATLAB. Verify how well the values from Equations (3.19)–(3.20) can be observed depending on the numerical accuracy. Be aware that the transition from mixed rolling and sliding to perfect rolling is non-smooth, i.e. the higher-order derivatives don't exist, so that higher-order integrators (or at least their step-size control) may run into problems. This may lead to oscillations in the solutions which are on the order of the absolute tolerance.

References
[1] F. Klein and A. Sommerfeld, Über die Theorie des Kreisels. III. Die störenden Einflüsse. Astronomische und geophysikalische Anwendungen, 2nd ed. B.G. Teubner, 1923.
[2] A. J. Morin, Nouvelles expériences sur le frottement: faites à Metz en 1831 [–1833]. Bachelier, 1834.
[3] P. Conti, "Sulla resistenza di attrito", Atti della Reale Accademia dei Lincei, Serie 2, vol. 272, no. 2, pp. 16–200, 1875.
[4] D. Galton, The Effect of Brakes Upon Railway Trains. Westinghouse Air Brake Company, 1894.
[5] J. A. Hammerschmidt, B. Moasser, W. L. Gladfelter, G. Haugstad, and R. R. Jones, "Polymer viscoelastic properties measured by friction force microscopy", Macromolecules, vol. 29, pp. 8996–8998, 1996.
[6] R. W. Stark, G. Schitter, and A. Stemmer, "Velocity dependent friction laws in contact mode atomic force microscopy", Ultramicroscopy, vol. 100, no. 3–4, pp. 309–317, 2004.
[7] B. Bhushan and B. Gupta, Handbook of Tribology: Materials, Coatings, and Surface Treatments. McGraw-Hill, 1991.
[8] E. Booser, ed., Tribology Data Handbook: An Excellent Friction, Lubrication, and Wear Resource. Taylor & Francis, 2010.
[9] P. Blau, Friction Science and Technology: From Concepts to Applications, 2nd ed. Taylor & Francis, 2010.
[10] B. Bhushan, Modern Tribology Handbook, Two Volumes. Mechanics & Materials Science, Taylor & Francis, 2010.
[11] U. Gedde, Polymer Physics. Springer, 1995.
[12] E. Rabinowicz, Friction and Wear of Materials, 2nd ed. Wiley, 1995.
[13] V. Popov, Contact Mechanics and Friction: Physical Principles and Applications. Springer, 2010.
[14] J. H. Dieterich and G. Conrad, "Effect of humidity on time- and velocity-dependent friction in rocks", Journal of Geophysical Research: Solid Earth, vol. 89, no. B6, pp. 4196–4202, 1984.
[15] L. Bocquet, E. Charlaix, S. Ciliberto, and J. Crassous, "Moisture-induced ageing in granular media and the kinetics of capillary condensation", Nature, vol. 396, no. 6713, pp. 735–737, 1998.
[16] T. Akiyama. Personal communication.
[17] E. Rabinowicz, Friction and Wear of Materials. Wiley, 1965.
[18] Y. Sang, M. Dubé, and M. Grant, "Dependence of friction on roughness, velocity, and temperature", Physical Review E, vol. 77, article 036123, Mar 2008.
[19] T. Pöschel and H. Herrmann, "A simple geometrical model for solid friction", Physica A: Statistical Mechanics and its Applications, vol. 198, no. 3, pp. 441–448, 1993.
[20] E. Rouèche, E. Serris, G. Thomas, and L. Périer-Camby, "Influence of temperature on the compaction of an organic powder and the mechanical strength of tablets", Powder Technology, vol. 162, no. 2, pp. 138–144, 2006.
[21] V. Gorodnichev and G. Borisov, "Influence of temperature on the equilibrium moisture content of medicinal granulations", Pharmaceutical Chemistry Journal, vol. 11, pp. 1410–1412, 1977.
[22] H. Jones and D. Scott, eds., Industrial Tribology: The Practical Aspects of Friction, Lubrication and Wear. Elsevier Science, 1983.
[23] F. Klein and A. Sommerfeld, The Theory of the Top. Volume III: Perturbations, Astronomical and Geophysical Applications. Birkhäuser Boston, 2012.


[24] H. Ziegler, ed., Kreiselprobleme / Gyrodynamics: Symposion Celerina, 20–23 August 1962. Springer, 1963.
[25] Y. Sakamoto, Experimental Investigation of Drilling Friction for Axial Symmetric Bodies with Planes (in Japanese). Master's thesis, The University of Electro-Communications, 2008.
[26] A. Sommerfeld, Mechanics. Lectures on Theoretical Physics, Academic Press, 1964.
[27] F. Radjai, Dynamique des rotations et frottement collectif dans les systèmes granulaires. PhD thesis, University Paris Orsay, 1995.
[28] E. Hairer, S. P. Nørsett, and G. Wanner, Solving Ordinary Differential Equations I: Nonstiff Problems, 2nd ed. Vol. 8 of Springer Series in Computational Mathematics, Springer, 1993.
[29] G. Hamel, Theoretische Mechanik: eine einheitliche Einführung in die gesamte Mechanik. Springer, 1945.
[30] U. Ascher and L. Petzold, Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations. Society for Industrial and Applied Mathematics, 1998.
[31] M. Kunze and M. Marques, "An introduction to Moreau's sweeping process", in Impacts in Mechanical Systems, B. Brogliato, ed., vol. 551 of Lecture Notes in Physics, pp. 1–60. Springer, 2000.
[32] J. J. Moreau and P. D. Panagiotopoulos, eds., Nonsmooth Mechanics and Applications. Vol. 302 of CISM Courses and Lectures, Springer, 1988.
[33] J. J. Moreau, P. D. Panagiotopoulos, and G. Strang, eds., Topics in Nonsmooth Mechanics. Birkhäuser, 1988.
[34] P. A. Cundall and O. D. L. Strack, "A discrete numerical model for granular assemblies", Géotechnique, vol. 29, no. 1, pp. 47–65, 1979.
[35] Z. Zhong, Finite Element Procedures for Contact-Impact Problems. Oxford Science Publications, Oxford University Press, 1993.
[36] P. Wriggers, Computational Contact Mechanics. John Wiley & Sons, 2002.
[37] T. Laursen, Computational Contact and Impact Mechanics: Fundamentals of Modeling Interfacial Phenomena in Nonlinear Finite Element Analysis. Engineering Online Library, Springer, 2003.
[38] A. Munjiza, The Combined Finite-Discrete Element Method. John Wiley & Sons, 2004.
[39] V. Matrosov and I. Finogenko, "The solvability of the equations of motion of mechanical systems with sliding friction", Journal of Applied Mathematics and Mechanics, vol. 58, no. 6, pp. 945–954, 1994.
[40] P. Haff and B. Werner, "Computer simulation of the mechanical sorting of grains", Powder Technology, vol. 48, no. 3, pp. 239–245, 1986.
[41] D. Zhao, E. G. Nezami, Y. M. Hashash, and J. Ghaboussi, "Three-dimensional discrete element simulation for granular materials", Engineering Computations, vol. 23, no. 7, pp. 749–770, 2006.
[42] L. Pars, A Treatise on Analytical Dynamics. Heinemann, 1965.
[43] S. L. Loney, An Elementary Treatise on the Dynamics of a Particle and of Rigid Bodies. Cambridge University Press, 1930.
[44] F. Klein, "Zu Painlevés Kritik der Coulombschen Reibungsgesetze", Zeitschrift für Mathematik und Physik, no. 58, pp. 704–709, 1910.
[45] Z. P. Wiercigroch M., "On the Painlevé paradoxes", in Proceedings of the XXVII Summer School 'Nonlinear Oscillations in Mechanical Systems', St. Petersburg, Russia, pp. 89–111, 2000.
[46] K. L. Johnson, Contact Mechanics. Cambridge University Press, 1987.
[47] T. S. Komatsu, S. Inagaki, N. Nakagawa, and S. Nasuno, "Creep motion in a granular pile exhibiting steady surface flow", Physical Review Letters, vol. 86, pp. 1757–1760, Feb 2001.
[48] W. Stronge, Impact Mechanics. Cambridge University Press, 2000.
[49] F. Pfeiffer and C. Glocker, Multibody Dynamics with Unilateral Contacts. Wiley Interscience, 1997.


4 Phenomenology of Granular Materials

The field of granular materials, from powder engineering to rock mechanics, is an important area of application for discrete element methods. Granular materials consist of solid particles which are definitely larger than atoms. In general, the particles' deformations under shear are insignificant relative to the dimensions of the particles, and there are normal forces when particles are in contact. Apart from the usual materials that are treated as granular, it has been proposed that even foams can be considered (a very soft limit of) granular materials, with the 'grains' being single bubbles [1]. In contrast, the analogy between granular materials and fluids [2] is much weaker than that between granular materials and foam: there are no normal forces between fluid elements at rest. Accordingly, the respective particle modeling approaches are different: while the discrete element method for granular materials is based on velocity-independent normal forces, smoothed particle hydrodynamics [3–5] for fluids is based on tangential forces proportional to the velocities of the fluid elements. In this chapter, we focus on aspects of granular materials which are accessible by DEM simulations and where, at the same time, the outcome depends on the shape of the particles.

4.1 Phenomenology of grains

4.1.1 Interaction

If the particles are larger than about 1 mm in diameter, the interaction between the surfaces in the normal direction of the contact is mostly due to elastic deformation, and the interaction in the tangential direction is mainly due to Coulomb friction. For particles of diameter less than 1 mm, in humid environments cohesion due to agglomeration of water molecules at the contact may have a significant influence. For even smaller particles, cohesion effects from the surface


itself will play an important role. In dry environments, if there is relative motion between the particles or between the particles and the surrounding walls, electrostatic effects will play a role. In other words, it is the quality of the ‘surface interactions’ between the particles which is characteristic of granular materials. A related phenomenon is fracturing, whereby the particle shape is destroyed, but which nevertheless leads to a system of granular particles on a smaller scale. The opposite effect would be sintering, where smaller particles aggregate to produce larger ones. We will discuss how these phenomena can be treated in DEM simulations in § 7.3.5 of Chapter 7. In contrast to atomic forces, which are usually treated as central forces (i.e. the normal direction is along the line connecting the centers of mass), the forces between grains are in general not central.

4.1.2 Friction and dissipation

Apart from their contribution to particle interaction, friction and energy dissipation lead to macroscopic behavior which is different from that of atomic and molecular systems. Even for low-density systems, there does not seem to be an 'equilibrium state' at which the particle densities are homogeneous, except for specially prepared initial conditions. When the system evolves under the granular dynamics with friction and energy dissipation, clustering occurs, and the initial homogeneous density is destroyed. This happens in purely theoretically conceived systems [6] as well as in both simulations and experiments involving monolayers of particles which are allowed to roll [7]. In short, even one of the most common assumptions of statistical mechanics, homogeneous density, is not necessarily fulfilled for granular materials, depending on the processes in the system. The discrete element method allows us to at least model the actual density distributions and the breakdown of the density homogeneity.

4.1.3 Length and time scales

The length scales of grain diameters range from nanometers in powders to perhaps hundreds of meters in the case of rock mechanics. System sizes of interest may range from five or six particles in linear extent, when clogging is considered, to many hundreds of kilometers, when considering debris-filled earthquake fault lines as granular phenomena. On astronomical scales, Saturn's rings and the even larger asteroid belt between Mars and Jupiter can be included in the consideration of 'granular systems'. When we consider the grains as mesoscopic mechanical bodies, and exclude considerations on the atomic level, the fastest time-scales are the collisions between particles; in that case, the 'grip' from friction is instantaneous. The largest time-scales we may have to deal with in the laboratory are weeks for careful drying of material. In an hourglass, many time-scales are present simultaneously: there are the collision times between the more or less freely falling particles; there are avalanches, which occur when the granular material starts to flow and is then deposited again; and the longest time-scale is the time the material takes to move from the upper to the lower bulb. Such considerations are important for estimating the number of time-steps needed (largest time-scale divided by smallest time-scale, multiplied by the number of time-steps necessary to resolve the smallest time-scale). Together with estimation of the system size (in number of particles), this will determine the feasibility of the simulation.


Therefore, the computer time for the simulation depends also on the real time of the granular processes. All things (stiffness constant for the material, particle mass, number of grains, etc.) being equal, the simulation of heap formation with fast outflow from a hopper will take less computational time than the same scenario with slow outflow. However, fast outflow will lead to more noise (as kinetic energy), so systems created with different speeds may not have the same packing structure and other properties.
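As an illustration of the estimate above, consider an hourglass-like scenario (the numbers are ours and purely illustrative):

t_smallest = 1e-4;  % smallest time-scale: duration of one collision [s]
t_largest  = 60;    % largest time-scale: emptying the upper bulb [s]
n_resolve  = 50;    % time-steps needed to resolve one collision
n_steps = t_largest/t_smallest*n_resolve  % about 3e7 time-steps in total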

4.1.4 Particle shape, and rolling and sliding

The researcher may want to take the intuitive approach of making the constituents of a particle simulation as regular as possible: in many physical and engineering problems, regularity leads to symmetry, and symmetry reduces the number of variables which one has to deal with. For granular simulations, this leads to the use of round particles, but unfortunately they are not able to resist rolling, so systems made up of round particles lead to instability; see Figure 4.1. While the heap made of hexagonal particles is stable, constructing a heap from the round cylinders is impossible. The temptation is great to model such systems by using artificially large coefficients of rolling friction; but what would be the worth of a simulation which gives a result guaranteed to be unobtainable by experiment? The competition between rolling and sliding also determines the strength of granular assemblies: different shape and size distributions result in different material strengths, and assemblies of convex particles can be expected to be weaker than assemblies of non-convex particles with possible interlocking. While the instability of round particles is less obvious if walls are present, the effect on the strength of the assembly is the same as for heaps without walls.


Figure 4.1 (a) Two-dimensional heap constructed from hexagonal prisms (with 13 mm distance between parallel edges) made of Duralumin (on a Duralumin surface). (b)–(d) Failure to construct a heap with cylinders (of 13 mm diameter): while the two layers in (b) are stable, adding a cylinder in (c) leads to collapse of the structure in (d), and the cylinders on the left are still rolling. The decomposition of the heap occurs because of slipping and rolling along the diagonal direction, similar to 'slip planes' for dislocations in crystals.

4.2 General phenomenology of granular agglomerates

In an influential article, P. K. Haff [2] argued in favor of a fluid-mechanical approach for granular materials and, based on that, continuum-mechanical modeling of granular materials. While his arguments are based on analogies in the interaction, Coulomb friction is not mentioned at all in the paper. In this section, we will outline that part of granular phenomenology which does not occur in fluids and which is mainly due to normal forces that do not vanish with vanishing velocity, as well as Coulomb friction. From the modeling point of view, these phenomena gave rise to the development of the discrete element method in the first place.

4.2.1 Disorder

Disorder in regard to granular materials is important in the context of statistical physics: on the particle scale, which is accessible via DEM simulations, the variation of quantities (densities, forces, etc.) is not necessarily smooth and regular. Such variation is an outcome of the physical situation, not a result of sloppy implementations (or, on the experimental side, careless measurements); it tells us that there is something in the physical situation which prevents smooth progression of the data. Such fluctuations are not limited to granular materials. For example, while in good weather one may experience a smooth ride on a plane, turbulence will make for a bumpy flight, and the pressure variations that shake the plane, which are due to the vorticity of the flow field around the plane, do not result from 'bad experimentation'. Smoothing out the fluctuations is not possible in the physical situation, and therefore smoothing out fluctuations in the corresponding simulation (by using a viscosity which is much too large) would not be desirable, as it would suppress the crucial feature of the physical system. But while disorder in flow fields is commonly an effect of the dynamics of the system, large fluctuations are frozen into granular systems also in static configurations, so that averaging may actually erase the significant characteristics. For the mock data in Figure 4.2, seven data sets with pressure minima are averaged, yielding an average that has no pressure minimum.


Figure 4.2 (a)–(g) Mock data of pressure distributions, each with a pressure minimum near the middle; (h) the average of the data from (a)–(g), which lacks a pressure minimum.


So one should be careful about the use of averages: in disordered systems, it may happen that one averages away the effect one wants to investigate. Obtaining the 'correct' distribution of elementary data is one thing; comparing simulation data with experimental data is another matter. Simulation and experimental data are in general averages over many length scales of particles or many time-steps, or have been smeared out by a finite size of gauge or a finite reaction time. Accordingly, for spatial distributions, one has to average over a certain width by 'binning' data at adjoining points. Instead of choosing the bins side by side, one can obtain smoother data by using 'moving' averages, which shift a measurement range of length l over intervals which are smaller than l; see § 7.8.2.
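As a small illustration of such 'moving' averages (a sketch of ours; the window length l, the shift and the variable names are arbitrary), suppose measurement positions z and values p are given:

% Moving average: a window of length l is shifted in steps smaller than l.
l = 0.5; shift = 0.1;                 % window length and shift, shift < l
centers = min(z):shift:max(z);        % overlapping window centers
pavg = zeros(size(centers));
for i = 1:numel(centers)
    in = abs(z - centers(i)) <= l/2;  % data points inside the current window
    pavg(i) = mean(p(in));            % average over the window
end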

4.2.2 Heap formation

From Figure 4.6(d), another phenomenon that does not occur in fluids is obvious: the formation of heaps. While surface tension, adhesion and cohesion allow fluids to form surfaces which are not flat, they don't allow the formation of heaps with straight slopes. All other parameters being the same, the angle of repose depends on the particle shape; spherical particles won't form heaps at all if there are no provisions to keep them from rolling away. Roughness of the ground is not necessary for physical heap formation; heaps can even be built upon mirrors, which are as smooth as one can reasonably prepare a surface; see Figure 3.24 in Chapter 3 and the associated discussion. Nevertheless, roughness and the friction coefficient may influence the angle of repose. If sliding between grains and ground is easily realized, macroscopic slipping will play a role in the heap formation; otherwise, the heap is basically formed by avalanches on the surface. Cohesion will inhibit rolling and therefore increase the angle of repose, which is why small glass beads (with radii on the order of 1 mm or smaller) form heaps with straight slopes, while marbles or soft air-gun beads (with a radius on the order of 4 mm or larger) won't form proper heaps; see Figure 4.3.


Figure 4.3 (a)–(c) Heap formation with plastic polyhedral particles; (d)–(f) failure of heap formation with plastic spheres. In both cases, the floor is inclined by 2.1◦ .


If free heaps don’t form on a plane in a DEM simulation, something is seriously amiss. In the case of round particles, the reason is the particles’ shape. While it is possible to set the rolling friction coefficient to unrealistically high values, or switch off rolling altogether [8], we don’t consider this kind of modeling to be consistent with classical mechanics. If simulation codes do not produce realistic angles of repose, the results will also be unreliable for particle systems which are kept between walls: there is a natural balance between sliding and rolling for a given grain mixture, which depends on the particle shape and the Coulomb friction. If this balance is unrealistic, then also, e.g., the stress fields of aggregates between fixed boundaries will not be given correctly.

4.2.3 Tri-axial compression and shear band formation

For solids, such as blocks of concrete, uni-axial compression is a conventional testing method, where a cylindrical sample is loaded from above under controlled pressure and advancing rate. For granular materials, the strength of an agglomerate depends on the strength which holds the agglomerate together: a sand pile can disintegrate under its own weight if it is vibrated, without any external vertical pressure. Thus, if the external pressure on the walls cannot be controlled, 'uni-axial' compression of a granular sample between fixed walls will not give much information on the strength of the sample, as it is unclear how much of the strength is due to the granular sample and how much is due to the forces caught up at the fixed boundaries. Therefore, the testing method of choice for samples of granular materials is tri-axial compression. In this method, the pressures in the x- and y-directions are fixed, while the sample is compressed from below (in the z-direction) with a small, constant velocity, to give a 'quasi-stationary' compression. The force (or pressure) in the z-direction is measured as the parameter of the stress, and the displacement of the floor is measured as the parameter of the strain. The behavior of the density during tri-axial compression depends on the density of the initial configuration; during the initial compression, the density may increase further due to settling in the sample. As the compression continues, there is a region where the stress is proportional to the strain; this looks like a Hooke material, but the behavior is not elastic: there will be no restoring force towards the initial state, and during the whole compression the processes in the material are highly dissipative. Looking at Figure 4.4(b), initially the stress is proportional to the strain; then the curve begins to flatten, i.e. the material has reached the 'plastic' region. When the maximal density (and maximum coordination number) is reached, the stress will peak, i.e. the strength of the agglomerate becomes maximal. After that, due to Reynolds dilatancy (see § 4.3.2), the density will decay, and so will the stress, which is a measure of the material strength. Following that, in the 'failure region' of the material, the stress becomes constant. As illustrated in Figure 4.2, when one averages a few data sets in DEM simulations, it may actually happen that the maximum is averaged away; for this reason, careful choice of the point of zero strain ε = 0 (see [9]) is necessary. Stress–strain curves for a given material are not universal, but depend on the external pressure. Further, in experiments, both 'drained samples' (no fluid in the pore space) and 'undrained samples' are common. For round (spherical or, in 2D, cylindrical) particles, the stress–strain curve has no maximum in the simulation or in the experiment, at least when the walls are held at constant pressure and allowed to move symmetrically. Moreover, there is no proper linear regime but, rather, an increase



Figure 4.4 (a) Set-up of tri-axial compression; the constant pressure is usually realized by a rubber membrane with external water pressure, and the constant velocity is realized by a linear actuator. (b) Typical stress–strain curve for a drained dilatant (dense) soil: ragged black line shows realistic fluctuations from experiments, and thick grey line is the idealized curve. (c) Plot of the corresponding volume (inverse density).


Figure 4.5 (a) Closest packing, and (b) shear band formation after application of external stresses, with reduction of the density (Reynolds dilatancy).

similar to an arctan curve, up to a saturation value. Maxima in the stress can occur when the walls are manipulated: when one wall is fixed, shear bands (see Figure 4.5) develop in the system which show stress–strain characteristics similar to those of homogeneous tri-axial compression, i.e. first a region with a linear stress–strain relation, then a maximum, followed by a decay of the stress. However, this has nothing to do with homogeneous compression of the sample, but is instead related to asymmetries introduced in the packing. Typically, stress–strain diagrams are sketched as smooth curves, in accordance with continuum-mechanical assumptions. The raw experimental data, in contrast, show strong fluctuations, where variations in the stress of 30% or more are not uncommon, e.g. in the time evolution of stresses in compression experiments [10] or in neighboring gauge readings of the pressure distributions [11].

4.2.4 Arching

Arching refers to the ability of granular materials to deflect downward forces horizontally toward the sides. Arches have been in use in architecture for millennia, and they can be amazingly stable: in ruins from buildings of the middle ages in Europe, arches are often the most high-rising remains. (Admittedly, this may partly be due to the fact that the stones for arches were often cut in such a way that makes them less useful for quarrying than stones in other parts of the building, so that the arches may have been spared when other parts of the building were ‘recycled’.) Arching is also the reason that clogged hoppers don’t get unstuck if the pressure on the grains is increased: arches become stronger when the pressure from above increases, as long as the support on the sides is stable. Discussion of arching for granular materials started in the 19th century [12–16]. Due to arching, there is no hydrostatic pressure (which increases linearly with depth) in silos, but a part of the weight is deflected towards the walls and carried by the walls. We will not dwell on this subject, as we think that the derivation of formulae assuming unique relations which depend only on a friction parameter and are independent of particle size and deposition history is at odds with the well-established history effects discussed in the next section. Analytical formulae for silo pressure have their uses more as worst-case scenarios for industrial standards than as an adequate representation of the actual physical situation. Forces in granular materials are not distributed continuously: there is a discrete force network along which the force ‘paths’ propagate. For two-dimensional simulations of round particles, such force networks have the structure of a net, where within closed ‘meshes’ of strong forces nets of smaller forces are embedded. For non-spherical particles, there is no clear mesh structure; force chains start out as weak forces, become stronger in some parts of the granular matrix, and can then become weaker again.

4.2.5 Clogging

When a fluid is poured into a hopper, it either flows through the hopper or it does not; the latter occurs if the capillary or cohesion forces are too large. For granular materials, when the size of the hopper outlet is about five particle diameters, there will be clogging: while at the beginning particles will flow, the flow may suddenly stop because a stable plug of particles has formed at the outlet of the hopper. The clogging itself is due to arching, and its erratic occurrence is due to the disorder in the system. Clogging occurs in experiments in three dimensions, as well as in simulations in two and three dimensions, provided the Coulomb friction is modeled correctly; see Figure 4.6. Time-steps that are too large or other sources of noise in the simulation may prevent or delay the clogging, as will any vibration of the hopper or the granular material in the experiment.

4.3 History effects in granular materials

In the 19th century, James Clerk Maxwell proposed measuring how the pressure of embankments on walls depends on the filling method. While constitutive theories (by Boussinesq, among others) all pointed to unique results, Maxwell suspected a 'historical element',



Figure 4.6 DEM simulation of the flow through a hopper with particles of size approximately 1.0 cm × 1.2 cm and friction coefficient μ = 0.3: (a) at the start, t = 0.0; (b) at t = 0.8; (c) at t = 1.75; (d) at t = 2.5, when the hopper clogs and the heap is stable.

i.e. an influence of the construction history. In fact, the pressure differences turned out to be up to 30% [17], a result which was later confirmed by Terzaghi [18]. While the experiments demonstrating that different filling methods lead to different pressure distributions are rather old, the prevailing theories in geotechnics deal with unique, history-independent equations for the earth pressure, a fact which is deplored even in the field of geotechnics itself [19].

4.3.1 Hysteresis

Hysteresis is the dependence not only on the current state but also on previous states. The oldest studies of hysteresis are related to magnetism: starting from an unmagnetized state, a piece of iron is magnetized in an outer constant field B; but when the field is reversed to −B, one does not reach the original unmagnetized state again. Hysteresis is also found in granular materials: when one shears a soil sample forward and then backward again, one does not necessarily reach the same state as the original, i.e. the void ratio, material strength etc. may be different. Hysteresis is notoriously difficult to model with conventional continuum methods, as the conventional continuum approach is via partial differential equations where the left-hand side at a given time t depends on the right-hand side at the same time t. If the stationary case is treated, then along with the time dependence, the


dependence on any initial state drops out of the equation; this is not a good mathematical basis from which to deal with history-dependent phenomena. In a recent book [20], Gerd Gudehus discusses at length phenomena which are essentially hysteresis effects, and proposes the time evolutions of the stress–strain diagrams of soil as necessary conditions for the proper modeling of soil.¹ From the point of view of the discrete element method, hysteresis is obtained via two effects. One stems from the character of solid friction: both the Cundall–Strack model (see § 3.4.1) and the exact solution for one-dimensional systems (see § 3.3.2) are hysteretic; in both cases, the loads and tangential velocity before the particles come to relative rest determine the finite value of the static friction. The second effect is the reordering of the particle configuration; for this, realistic particle shapes and particle size dispersions are necessary. Assemblies of non-elongated particles of approximately the same size tend to order in hexagonal structures (in two dimensions, or the corresponding structures in three dimensions); hardly any reordering is possible and, accordingly, arching effects and pressure minima under heaps are scarcely visible, either in simulations [21] or in experiments (with rape seed in [22] and glass beads in [11]).

4.3.2 Reynolds dilatancy

When granular materials are sheared in a dense state (deformable walls are necessary for this), in most cases the resulting state is less dense, a phenomenon which is called 'Reynolds dilatancy' [23]; see Figure 4.5. This makes for baffling experiments. For example, when a plastic flask filled with a dense packing of equal-sized glass beads is compressed at the sides, water will not spurt out but rather be sucked in; see Figure 5.10. Another effect is that while one can insert a stick into a bottle containing loosely piled granular material and pull it out again, if the bottle is tapped after insertion of the stick, then the bottle can be lifted by the stick, which will not slide out; see Figure 4.7. However, there is a difference between the above two examples in regard to the optimal grains: for the plastic flask filling, spherical particles are preferable because they allow faster reordering; for the stick in the bottle, angular grains will give a better grip on the stick. Thus, also for the demonstration of Reynolds dilatancy, there are shape effects. It goes without saying that this is a manifestation of history effects: the care taken in the preparation of the packing determines the outcome (and sometimes failure) of the experiment. For DEM simulation, the lesson is that one should consider very carefully the choice of particle shape and initial preparation of the state. Reynolds dilatancy is also an example where one has to be careful with the analogy between atoms and grains: whereas for atomic crystals the thermal motion of atoms favors one packing or another, so that the 'wrong' (energetically high) packing may undergo a transition to the 'right' (energetically low) packing with time, a granular system is strictly under the influence of mechanical forces, so that no relaxation into other states is possible if the system is left to itself.

¹ This is all the more remarkable because Gudehus has been famous for a particular continuum modeling approach for soils (the ‘hypoplastic’ continuum) through most of his career.

Figure 4.7 (a) Vessel filled with ceramic beads, weighing 590 g in total, together with a plastic pipe. (b) The plastic pipe is inserted into the vessel, and the vessel can then be lifted by the pipe after the granular material has been compacted by tapping.

4.3.3 Pressure distribution under heaps

There is no hydrostatic pressure in granular assemblies because, due to arching, pressures do not propagate purely downward but may be deflected horizontally. While it may seem simple to conjecture a pressure distribution under heaps, it is not possible to justify any particular distribution as the only possible one: depending on how one envisages the propagation in a ‘representative’ volume, different pressures may result. There are ‘good’ reasons to postulate characteristics which lead to either a pressure maximum, a flat pressure, or a pressure minimum in the middle even for the simplest models; see Figure 4.8(a)–(c). If one additionally allows manipulation of the angle of repose, even more variations become possible; see Figure 4.8(d)–(e). Needless to say, each ‘linear combination’ (or stochastic combination) can also be envisaged. What is shown via representative elements in Figure 4.8(a)–(d) could also be formulated via partial differential equations: mathematically more impressive, but equally lacking in physical validity. Because it should be possible to construct such heaps with suitable blocks experimentally, theoretical modeling will not give an answer about the realistic behavior; no universal pressure distribution can be postulated for heaps as long as it cannot be determined which blocks (or which arrangement) are the valid representation of a volume of granular material. The experimental situation is rather complex, with some measurements indicating pressure minima (mostly from powder mechanics [11, 22, 24]) and others (mostly from civil engineering [25, 26]) suggesting constant pressures in the middle. In the 1990s the discussion became rather heated, to the point that the validity of measurements with pressure minima was called into question [27]. That the material, particle size and heap size varied immensely, and that data fluctuations within the same article were considerable, did nothing to make the problem more transparent.

Figure 4.8 Conjectured behavior of the distribution of weight (vertical component of the arrow count in a given direction) of square elementary volumes of heaps onto lower layers, together with the corresponding pressure distributions on the ground. In (a)–(c) the heaps are built with a 45° angle of repose; in (d) and (e) it is shown that with additional manipulations of the angle of repose, it is possible to obtain more realistic pressures (vanishing at the ends of the heap).

Discrete element simulations in two dimensions, where the material parameters could be well controlled, allowed researchers to identify the building history as relevant [21]: building the heap from a point source, which is common in powder mechanics (e.g. flow from hoppers), favors heaps with pressure minima, while building the heap layer-wise, which is common in civil engineering, does not give a pressure minimum. These results were later corroborated in three-dimensional experiments [28]. As a structural feature of heaps with pressure minima, regions of higher density in the middle were found in [29]; this explained the absence of pressure minima under heaps of large, round particles (such as glass beads and rape seed), for which the density is mostly homogeneous, as the particle positions equilibrate via rolling. Additionally, it explained why integration of the experimentally found pressure for a selective choice of gauge positions did not yield the weight of the heap, which had led to doubts about the experiment earlier on [27]: the assumption of a homogeneous density was not justified. Recently, consistent results between experimental measurements and three-dimensional simulations were obtained for different building histories [30]. The above example is instructive for the use of the discrete element method in several respects. While theories must be based on assumptions (e.g. homogeneous density), discrete element simulations allow us to test these assumptions themselves. While experimental results are available for different materials (sometimes without mention of crucial influences such as air humidity), discrete element simulations allow us to recreate different settings with exactly the same ‘material’; moreover, the material can be varied to match the experimental conditions. Further, discrete element simulations enable preliminary studies to be used to design meaningful set-ups for experiments, which are much more difficult to reconfigure than simulations.

4.4 Further reading

A very readable, informal text which bridges the gap between material science and architecture, between molecules and ‘grains’, but without overstraining the analogy, is the book by Gordon [31].

References

[1] D. Weaire, V. Langlois, M. Saadatfar, and S. Hutzler, “Foam as granular matter”, in Granular and Complex Materials, T. Aste, T. Di Matteo, and A. Tordesillas, eds., World Scientific Lecture Notes in Complex Systems, pp. 1–26, World Scientific, 2008.
[2] P. K. Haff, “Grain flow as a fluid-mechanical phenomenon”, Journal of Fluid Mechanics, vol. 134, pp. 401–430, 1983.
[3] R. Gingold and J. Monaghan, “Smoothed particle hydrodynamics: theory and application to non-spherical stars”, Monthly Notices of the Royal Astronomical Society, vol. 181, pp. 375–389, 1977.
[4] S. Koshizuka and Y. Oka, “Moving-particle semi-implicit method for fragmentation of incompressible fluid”, Nuclear Science and Engineering, vol. 123, no. 3, pp. 421–434, 1996.
[5] G. R. Liu and M. B. Liu, Smoothed Particle Hydrodynamics: A Meshfree Particle Method. World Scientific, 2003.
[6] I. Goldhirsch and G. Zanetti, “Clustering instability in dissipative gases”, Physical Review Letters, vol. 70, pp. 1619–1622, Mar 1993.
[7] D. Krengel, S. Strobl, A. Sack, M. Heckel, and T. Pöschel, “Pattern formation in a horizontally shaken granular submonolayer”, Granular Matter, vol. 15, no. 3, pp. 377–387, 2013.
[8] J. Lee and H. Herrmann, “Angle of repose and angle of marginal stability: molecular dynamics of granular particles”, Journal of Physics A, vol. 26, no. 2, pp. 373–383, 1993.
[9] S. A. M. El Shourbagy, S. Morita, and H.-G. Matuttis, “Simulation of the dependence of the bulk-stress–strain relations of granular materials on the particle shape”, Journal of the Physical Society of Japan, vol. 75, no. 10, article 104602, 2006.
[10] T. Doanh, M. Hoang, J.-N. Roux, and C. Dequeker, “Stick-slip behaviour of model granular materials in drained triaxial compression”, Granular Matter, vol. 15, no. 1, pp. 1–23, 2013.
[11] R. Brockbank, J. Huntley, and R. Ball, “Contact force distribution beneath a three-dimensional granular pile”, Journal de Physique II, vol. 7, no. 10, pp. 1521–1532, 1997.
[12] M. Huber-Burnand, “Über das Ausfliessen und den Druck des Sandes”, Annalen der Physik, vol. 92, no. 6, pp. 316–328, 1829.
[13] G. H. L. Hagen, “Über den Druck und die Bewegung des trockenen Sandes”, Monatsberichte der königlich Preußischen Akademie der Wissenschaften zu Berlin, p. 35, Jan 1852.
[14] F. Engesser, “Über den Erddruck gegen innere Stützwände”, Deutsche Bauzeitung, 1882.
[15] P. Forchheimer, “Über Sanddruck und Bewegungserscheinungen im inneren trockenen Sandes”, Zeitschrift des österreichischen Ingenieurs- und Architekten-Vereins, 1883.
[16] H. A. Janssen, “Versuche über Getreidedruck in Silozellen”, Zeitschrift des Vereines deutscher Ingenieure, vol. 39, no. 35, pp. 1045–1049, 1895.
[17] G. Darwin, On the Horizontal Thrust of a Mass of Sand. Institution of Civil Engineers, 1883.
[18] C. Terzaghi, “Old earth-pressure theories and new test results”, Engineering News-Record, vol. 85, no. 14, pp. 632–637, 1920.
[19] G. Gudehus, “Earth pressure determination”, in Geotechnical Engineering Handbook. Volume 1: Fundamentals, pp. 407–436. Wiley, 2002.
[20] G. Gudehus, Physical Soil Mechanics. Springer, 2010.
[21] H.-G. Matuttis, “Simulation of the pressure distribution under a two-dimensional heap of polygonal particles”, Granular Matter, vol. 1, pp. 83–91, 1998.
[22] T. Jotaki and R. Moriyama, “On the bottom pressure distribution of the bulk materials piled with the angle of repose”, Journal of the Society of Powder Technology, Japan, vol. 16, no. 4, pp. 184–191, 1979.
[23] O. Reynolds, “On the dilatancy of media composed of rigid particles in contact”, Philosophical Magazine Series 5, vol. 20, pp. 469–482, 1885.


[24] J. Šmid and J. Novosad, “Pressure distribution under heaped bulk solids”, International Chemical Engineering Symposium Series, vol. 63, pp. D3/V/1–12, 1981.
[25] I. K. Lee and J. Herrington, “Stresses beneath granular embankments”, Proceedings of the First Australian–New Zealand Conference on Geomechanics, Melbourne, vol. 1, pp. 291–296, August 1971.
[26] B. Lackinger, Das Tragverhalten von Staudämmen mit membranartigen Dichtungen. PhD thesis, Mitteilungen des Instituts für Bodenmechanik, Felsmechanik und Grundbau an der Fakultät für Bauingenieurwesen und Architektur der Universität Innsbruck, 1980.
[27] S. Savage, “Modelling and granular material boundary value problems”, in Physics of Dry Granular Media, H. J. Herrmann, J.-P. Hovi, and S. Luding, eds., vol. 350 of NATO Advanced Science Institutes Series E, Kluwer Academic Publishers, 1998.
[28] L. Vanel, D. Howell, D. Clark, R. P. Behringer, and E. Clement, “Memories in sand: Experimental tests of construction history on stress distributions under sandpiles”, Physical Review E, vol. 60, no. 5, pp. R5040–R5043, 1999.
[29] A. Schinner, H.-G. Matuttis, J. Aoki, S. Takahashi, K. M. Aoki, T. Akiyama, and K. Kassner, “Towards a micromechanic understanding of the pressure distribution under heaps”, in Mathematical Aspects of Complex Fluids II, vol. 1184 of Kokyuroku (Kyoto University), pp. 123–139, Research Institute for Mathematical Sciences, 2001.
[30] J. Chen and H.-G. Matuttis, “Study of quasi two dimensional granular heaps”, Theoretical and Applied Mechanics Japan, vol. 60, pp. 225–238, 2012.
[31] J. Gordon and P. Ball, The New Science of Strong Materials: Or Why You Don’t Fall Through the Floor. Alix G. Mautner Memorial Lectures, Princeton University Press, 2006.

5 Condensed Matter and Solid State Physics

Condensed matter and solid state physics offer important methodological insights into the application of the discrete element method: insights that can only be developed by transcending the concepts of classical continuum mechanics and which are based on the premise that the constituents of matter are discrete. Conversely, the discrete element method can be used to investigate phenomena that are not accessible by continuum approaches but which are closer to atomic systems in many respects. Condensed matter physics teaches us about the emergence of properties of the agglomerate which are not inherited directly from the single constituents; for example, neither pure carbon (graphite) nor pure iron is hard, but mixing them produces hard iron. The division of the periodic table into groups, comprising elements with similar properties, and periods, consisting of elements with very different properties, tells us there may be single-particle properties which greatly influence the properties of an agglomerate, while some variations of such properties have hardly any influence. On the other hand, single-particle properties may lose their relevance altogether; for instance, chlorine as a gas is green and poisonous, but as a chemical compound within table salt it is neither; similarly, the detailed mathematical form of the normal damping for single particles, which determines the trajectory of bouncing motion, may become irrelevant in an agglomerate where the contacts are permanent. Therefore, many concepts from solid state physics apply in some way or other to granular systems, with the appropriate modifications for friction, dissipation and disorder effects. Mechanisms from solid state physics help us to predict macroscopic changes in simulations due to changes in microscopic simulation parameters (size dispersion, particle shape, etc.). This, up to a point, helps us to estimate the outcome of computer simulations. Classically, the states of matter are divided according to their phases into solids (with definite shape and definite volume), liquids (with definite volume but no definite shape) and gases (for which neither volume nor shape is definite). Solids and liquids together are called condensed matter, while



liquids and gases together are called fluids. The parameters that control the transition between phases are temperature and pressure. Since the melting temperatures of conventional materials under normal conditions are well known, one usually knows the phase of such a material. For macroscopic granular particles, one can define granular solids, liquids and gases according to the above criteria for volume and shape (i.e. permanent or non-permanent neighborhoods between particles), but the transition conditions are far from clear. First of all, there is no homogeneous pressure, as individual collisions and disorder lead to large inhomogeneities. Depending on the conditions, external vibrations can either fluidize or compactify grains, i.e. make the packing either looser or denser. We start with a discussion of crystal structures, which also have applications to classifying the order in granular agglomerates such as heaps or fillings. Solids are classified into crystals (where there is regular positional ordering of molecules or atoms), glasses (with no ordering) and ceramics (which are inhomogeneous mixtures of crystals and glasses). Crystals may be built from a single kind of atom (e.g. diamond or silicon), several kinds of atoms (ion crystals, like salts) or molecules (e.g. ice, composed of water molecules connected via hydrogen bonds).

5.1 Structure and properties of matter

5.1.1 Crystal structures in two dimensions

Crystal structures, sometimes also called point groups, are periodic, space-filling, non-self-intersecting partitionings of space. In two dimensions, a crystal structure would be a tiling which could be continued periodically to infinity and which would cover the whole plane. In Figure 5.1(a) a few examples of such crystal-like tilings are shown; the pattern which is repeated is the ‘unit cell’. Figure 5.1(b) shows a pattern which cannot be part of a crystal, as it is not space-filling. The pattern in Figure 5.1(c) also cannot be part of a crystal tiling, because it is self-intersecting. The structure in Figure 5.1(d) is called a ‘quasi-crystal’: from one pentagon to the next there is ‘near order’ (the pentagons are joined edge-to-edge), but no ‘far order’ exists since the whole structure is not periodic (and, of course, not space-filling either). Figure 5.2 shows the elementary crystal structures (Bravais lattices), their unit cells and the elementary vectors in two dimensions; these elementary structures are the parallelogram, rectangular, rhombic, square and triangular lattices. (We ignore different terminologies for two and three dimensions, and talk about ‘cells’ in both cases rather than, e.g., ‘meshes’ in two dimensions.) Neglecting internal structures, the unit cell and lattice vectors determine the symmetries of the (infinite) pattern. While the rhombic lattice can be rotated only by 180° to obtain the original structure, the triangular lattice can be rotated by 60° or any multiple of it. As the triangular lattice has more possible symmetry operations than the rhombic lattice, it is said to have ‘higher’ symmetry. Apart from rotational symmetries (‘point symmetry’ is a symmetry with respect to rotation by 180°), there are also mirror (reflection) symmetries. The rectangular lattice has point symmetry for each lattice point, as well as two mirror symmetries (with respect to the horizontal and vertical crystal planes). Parallelogram, rectangular and rhombic lattices have crystal planes at different distances, which can be realized by anisotropic molecules or interactions; for round granular particles they are not so relevant, because such packings would result in layers ‘hanging in the air’.


Figure 5.1 (a) Tilings with structure resembling crystals, i.e. periodic space-filling partitionings which can be continued to infinity, where the unit cells are shaded in gray. (b)–(d) Partitionings which are not crystal structures because they are: (b) non-space-filling; (c) self-overlapping; or (d) non-periodic (‘quasi-crystal’).


Figure 5.2 The two-dimensional space lattices, with particles represented by filled black circles; from left to right, these are the parallelogram, rectangular, rhombic, square and triangular lattices. The elementary translations (lattice vectors) are shown as dark gray arrows. The unit cell is marked in light gray, and for the square and triangular lattices, alternative unit cells are drawn which are also primitive cells. Light gray dotted lines indicate the orientation of the crystal planes.

The same is true for the three-dimensional equivalents of these packings. The two-dimensional structure with the densest packing is the triangular lattice, which is sometimes referred to as ‘hexagonal’, although the hexagon is not the primitive cell, i.e. the cell of minimal size. The lattice vectors for two-dimensional crystals are usually denoted by a1 and a2 (a1, a2, a3 in three dimensions; in one dimension there is just the ‘lattice constant’ a). ‘Proper’ crystal structures are constructed with the symmetry, i.e. the unit cell, and the basis, i.e. an internal structure with an arrangement of particles in the unit cell; this construction is sometimes written in mock-equation form as

    lattice + basis = crystal structure.

The same unit cell may be combined with different bases. The triangular lattice (rightmost pattern in Figure 5.2), the honeycomb structure in Figure 5.4(a) and the Kagome lattice (Japanese for ‘basket-eye’, a pattern that emerges from basket-weaving) in Figure 5.4(b) all have the same triangular unit cell, but the bases are different. In the field of crystallography, unit cells are said to be primitive if there are no smaller unit cells which could be used to form the lattice.


Figure 5.3 (a) A square lattice, showing the primitive cell (a simple square) of lattice constant 1 together with an area-centered square of lattice constant √2 which is not a primitive cell. (b) A triangular lattice, showing the primitive unit cell (a triangle) and a unit cell (hexagon) which is not primitive. (c) A Bethe lattice, which is not a crystal lattice at all.


Figure 5.4 (a) The honeycomb lattice (left) and its realization with particles (right). (b) The Kagome lattice (left) and the corresponding particle packing (right), called ‘trihexagonal packing’. The elementary lattice vectors are shown in dark gray and the elementary cell in light gray.

The term ‘elementary cell’ is usually used in the sense of ‘primitive cell’, i.e. as a cell spanned by elementary lattice vectors which could not be chosen smaller, while the term ‘unit cell’ means that the lattice can be built up by translation of the cell, but the cell is not necessarily minimal. In Figure 5.3(a), the area-centered square is not primitive, as the same lattice can be built by repeating a smaller simple square; the hexagonal structure in Figure 5.3(b) is not a ‘proper’ primitive unit cell, as the same lattice can be constructed via the triangular grid. Other lattices, such as the Bethe lattice in Figure 5.3(c), may not be crystal lattices at all, as its other name, ‘Cayley tree’, suggests (but note that a Bethe lattice is generally taken to be infinite, whereas a Cayley tree is finite). The Bethe lattice is defined by its connectivities, which result in symmetries, but it is not space-filling and there is no unit cell. As a defining characteristic of a crystal is that the periodicity can be continued into infinity, there must be no special point; but the Cayley tree has a singular point, namely the one from which the construction starts.
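The ‘lattice + basis’ construction can be made explicit in a few lines of MATLAB; the following sketch (with an assumed lattice constant of 1) generates the honeycomb packing of Figure 5.4(a) from the triangular lattice vectors and a two-point basis:

    % Honeycomb structure = triangular lattice + two-point basis (lattice constant 1).
    a1 = [1, 0];                         % elementary lattice vectors of the
    a2 = [1/2, sqrt(3)/2];               % triangular lattice
    basis = [0, 0; (a1 + a2)/3];         % two particles per unit cell
    X = [];
    for m = -3:3
        for n = -3:3
            for b = 1:size(basis,1)
                X(end+1,:) = m*a1 + n*a2 + basis(b,:);
            end
        end
    end
    plot(X(:,1), X(:,2), 'ko'), axis equal
    % A single-point basis recovers the triangular lattice itself; the
    % three-point basis [0 0; a1/2; a2/2] gives the Kagome lattice.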

5.1.2 Crystal structures in three dimensions

In three dimensions, there are fourteen ‘Bravais lattices’, but we won’t list all of them, as most of the time there is a very low chance of observing them in granular materials with equal-sized particles, due to the same lack of isotropy as for the rectangular and rhombic lattices in two dimensions. Cubic structures can be realized in the variants simple cubic, body-centered cubic and face-centered cubic; see Figure 5.5. The Wigner–Seitz cell (see next subsection) is a primitive cell for the simple cubic cell, but not for the body-centered and face-centered cubic cells.


Figure 5.5 The family of cubic lattices for spheres in three dimensions: (a) simple cubic; (b) body-centered cubic; (c) face-centered cubic. In each diagram the elementary cell is shown as solid and the remainder of the spheres as transparent.


Figure 5.6 Densest packings of spheres in three dimensions: (a) construction of the face-centered cubic lattice; (b) actual orientation of the elementary cell for the face-centered cubic lattice; (c) construction of the hexagonal closed packed lattice; (d) actual orientation of the elementary cell for the hexagonal closed packing. In (b) and (d), the particles are reduced in size compared with (a) and (c) to keep the elementary cell (drawn with bold lines) visible; the particles which form the elementary cell are drawn as solid, while the others are transparent. In each diagram the thin line represents the symmetry axis of the construction.

Unlike for circular disks in two dimensions, where there is only one structure with the highest density, for spheres in three dimensions there are two lattice structures which have the highest packing density: the hexagonal closed packing and the cubic face-centered lattice. Their constructions are similar (see Figure 5.6), but for the hexagonal closed packing two different layers are needed, while with the same construction scheme the cubic face-centered lattice needs three different layers.
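The two construction schemes can be compared in a MATLAB sketch which stacks close-packed layers of spheres of diameter 1; the layer types A, B, C differ only by a horizontal offset, and only the stacking sequence distinguishes the two packings (this is a schematic of the layer sequence, not a reproduction of the elementary cells of Figure 5.6):

    % Close-packed layers: hcp repeats A B A B ..., fcc repeats A B C A B C ...
    a1 = [1, 0];  a2 = [1/2, sqrt(3)/2];       % in-layer triangular lattice
    offset = [0, 0; (a1+a2)/3; 2*(a1+a2)/3];   % horizontal shifts of layers A, B, C
    h = sqrt(2/3);                             % vertical distance between layers
    seq = [1 2 3 1 2 3];                       % fcc; use [1 2 1 2 1 2] for hcp
    centers = [];
    for k = 1:numel(seq)
        for m = 0:3
            for n = 0:3
                centers(end+1,:) = [m*a1 + n*a2 + offset(seq(k),:), (k-1)*h];
            end
        end
    end
    plot3(centers(:,1), centers(:,2), centers(:,3), 'o'), axis equal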


5.1.3 From the Wigner–Seitz cell to the Voronoi construction

A special way of constructing a unit cell gives the Wigner–Seitz cell, the region around a lattice point which is closer to this lattice point than to any other lattice point. It is obtained by drawing the vectors to all neighboring particles and taking the normal at the middle of each vector; the region enclosed by these normals is the Wigner–Seitz cell. In two dimensions, the normals are lines; in three dimensions, the normals are planes. The Wigner–Seitz cell is a unit cell but not necessarily a primitive cell. The Wigner–Seitz cell for the triangular lattice, shown in Figure 5.7(a), is a different primitive cell than the ones in Figure 5.2. The concept of the Wigner–Seitz cell can be generalized to random lattices, where the resulting tessellation is referred to as a ‘Voronoi construction’; see Figure 5.7(b). The Voronoi cells for the points on the inside are convex, but for points at the boundary they are unbounded regions that extend towards infinity. Voronoi tessellations have been used to obtain polygonal particles as irregular decompositions of domains to mimic fracturing [1]. Related to the Voronoi lattice is the Delaunay triangulation; see Figure 5.7(c): the vertices of the Voronoi lattice are the centers of the circumcircles of the Delaunay triangles. While the Voronoi construction is ‘safe’ (i.e. unique and stable, without large changes for small changes in the positions of the underlying point pattern), the Delaunay triangulation may be influenced by rounding errors, so that the triangulation of a regular spacing turns out to be not regular at all; see Exercise 5.1.b. When Delaunay constructions have to fulfill additional conditions (e.g. sides of triangles should not cross certain lines or boundaries), one speaks of ‘constrained’ Delaunay triangulations; however, in such cases, the circumcircle centers of the Delaunay triangles may not lie on the corresponding Voronoi lattice any more, so they are no longer ‘proper’ Delaunay triangulations. Constrained Delaunay constructions can be used to triangulate the pore space between polygonal particles, in order to simulate flow in the pore space using finite element methods [2, 3]. Voronoi and Delaunay constructions can be generalized to higher dimensions.
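In MATLAB, the Voronoi tessellation and the Delaunay triangulation of a point set can be obtained directly with the built-in functions voronoi and delaunay; the following lines are a minimal sketch for random points:

    % Voronoi tessellation and Delaunay triangulation of random points.
    rng(1);                            % fix the random seed for reproducibility
    x = rand(30,1);  y = rand(30,1);   % random points in the unit square
    subplot(1,2,1), voronoi(x,y), axis equal   % cells of boundary points are unbounded
    subplot(1,2,2)
    tri = delaunay(x,y);               % list of triangles (indices into x, y)
    triplot(tri,x,y), axis equal
    % The vertices of the Voronoi diagram are the circumcircle centers of the
    % Delaunay triangles; nearly degenerate triangles may flip under tiny
    % perturbations of the points, which is the rounding-error sensitivity
    % mentioned above.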


Figure 5.7 (a) For the point at the center, the Wigner–Seitz cell is the gray area up to the normals (thin lines) through the middle of the connecting vectors to the neighboring points; points which are farther away (along or beyond the dashed normal through the middle of the dashed arrow, and similarly all around the cell) have no influence on the construction. (b) The Voronoi tessellation for random points. (c) The Delaunay triangulation (thick lines) along with the Voronoi tessellation (thin lines) for the points in the middle graph, with circles drawn around selected vertices of the Voronoi lattice, which are the circumcircles of the Delaunay triangles around them.


Table 5.1 Truncated packing fractions for equal-sized circles in two dimensions and equal-sized spheres without overlap in three dimensions (tcp, triangular closest packed; fcc, face-centered cubic; hcp, hexagonal closed packed; bcc, body-centered cubic; sc, simple cubic)

    Particles          Circles (2D)                Spheres (3D)
    Lattice            tcp          square         fcc and hcp   bcc          sc
    Packing fraction   π√3/6        π/4            π/(3√2)       π√3/8        π/6
                       = 0.90689…   = 0.785398…    = 0.74048…    = 0.68017…   = 0.52359…

For particle simulation methods, Voronoi constructions have been used in [4, 5] to obtain fast detection algorithms for the intersection of faces and edges of polyhedral rigid bodies. In Table 5.1, we give packing fractions for circular particles in two dimensions and spherical particles in three dimensions. Kepler conjectured that the highest densities would be obtained for the face-centered cubic packing, but recent decades have seen a quest for even higher packing ratios, and densities of up to 0.77836… have been discussed [6] based on arguments involving packings of polyhedra and their inscribed spheres. The upper limit of the packing fraction is

    √18 (arccos(1/3) − π/3) = 0.779635… .

The number of contacts of a particle in a packing is called the ‘coordination number’. By construction, one can obtain statically stable packings with coordination number six (triangular lattice, rightmost pattern in Figure 5.2), four (the Kagome lattice in Figure 5.4(b) and the square lattice in Figure 5.2), and even three (honeycomb lattice, Figure 5.4(a)), at least if the boundaries are fixed appropriately. In a DEM simulation, however, obtaining the thinned-out configurations (with particles removed in a regular pattern) of the honeycomb and Kagome lattices with granular particles via conventional processes (flow, random deposition) is highly unlikely. The coordination number is also influenced by the particle shape. For a circular particle on the surface of a two-dimensional assembly, the minimal number of contacts for a stable position is two; but for polygons with friction, even coordination number one may be stable, as in the case of a block on a slope that is inclined below the critical friction angle. If granular particles are frictional, the densities should in principle turn out smaller than without friction, but if spheres or circles are used, rolling will allow them to reach very dense configurations easily. If, instead, polygonal or polyhedral approximations to round particles are used (i.e. shapes with many corners), packings of frictional particles should indeed yield lower densities. Therefore we abstain from giving ‘lowest densities’: the arches in gothic cathedrals are stable packings and have been so for centuries, but we are not very interested in ‘constructed’ structures which have no possibility of occurring within ordinary processes involving granular materials. If regular polyhedra or polygons with few corners are used, the density may in fact be above those for ‘densest packings’; with squares and cubes, space-filling packings can be obtained. No matter what the theoretical density is for infinite packings, in practice there will always be influences from the boundary, so that usually the
density for finite packings will be smaller. It is a good idea to keep an eye on the density in one’s simulations: if the particles are supposed to model a solid, but no matter how one treats the system they always order in a closest packing, the whole system may be fluidized due to a too-small time-step or other numerical sources of noise. Below we will encounter examples where this is not so easy to see with the naked eye in the presence of disorder, which is the interesting case for granular materials.
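The entries of Table 5.1 and the upper limit quoted above can be reproduced by evaluating the closed-form expressions in MATLAB (a plain evaluation, not a packing simulation):

    % Packing fractions of Table 5.1 and the upper limit of the text.
    tcp    = pi*sqrt(3)/6              % 0.90689...  triangular closest packing (2D)
    square = pi/4                      % 0.78539...  square lattice (2D)
    fcc    = pi/(3*sqrt(2))            % 0.74048...  fcc and hcp (3D)
    bcc    = pi*sqrt(3)/8              % 0.68017...  body-centered cubic (3D)
    sc     = pi/6                      % 0.52359...  simple cubic (3D)
    upper  = sqrt(18)*(acos(1/3) - pi/3)   % 0.779635... upper limit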

5.1.4 Strength parameters of materials

There are many parameters which characterize the strength of continuum materials. Young’s modulus

    Y = σt / εt

is the ratio of (usually tensile) stress σt to tensile strain εt. The bulk modulus

    K = −V dP/dV

is the resistance to uniform compression from all directions, i.e. the infinitesimal pressure increase due to a decrease of the volume. For solids, these strength parameters should show a proportionality: when a material has a high Young’s modulus, the bulk and shear moduli can also be expected to be high; see Figure 5.9, which plots Young’s modulus versus the bulk modulus for a variety of materials. These data for the Young’s modulus and the bulk modulus are taken from sources where measurements were performed for both (though not all values were collected from the same experiments). When the strength parameters are high, the melting point will be high too, as can be seen from Figure 5.8, which plots Young’s modulus versus the melting point (in kelvins). Young’s modulus Y and the bulk modulus K are commonly related in textbooks via the formula

    K = Y / (3(1 − 2ν)),    (5.1)

where ν is the Poisson ratio; when a material is compressed in one direction, this indicates how it expands in the other two directions. The values of ν are not less than −1 and no greater than 0.5. Together with Equation (5.1), this means that Young’s modulus, the shear modulus and the bulk modulus are all positive. Further, there is the yield strength (the limit where a deformation is not elastic any more but becomes plastic), the hardness (the resistance to deformation when a force is applied, usually defined for scratching or indentation), and so on. These, of course, result from inter-atomic or inter-molecular forces: if the forces are weak, the particles are displaced easily (Young’s modulus), separate under mechanical stress (yield strength) and disintegrate under thermal excitations (melting, burning).
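As a quick numerical illustration of Equation (5.1), here is a sketch with assumed, order-of-magnitude material values (steel-like, not measured data):

    % Bulk modulus from Young's modulus and Poisson ratio via Equation (5.1).
    Y  = 2e11;                % Young's modulus [Pa], steel-like magnitude (assumed)
    nu = 0.3;                 % Poisson ratio, must lie between -1 and 0.5
    K  = Y/(3*(1 - 2*nu))     % bulk modulus [Pa], here about 1.7e11
    % As nu approaches 0.5 the denominator vanishes and K diverges, i.e. the
    % material becomes incompressible; this is the problematic limit
    % discussed below.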


Figure 5.8 Young’s modulus versus melting point for (in alphabetical order of the abbreviations): acrylonitrile butadiene styrene (ABS), polymethylacrylate (ACR), aluminum (Al), gold (Au), boron nitride (BN), beeswax (BW), basalt (Bas), bismuth (Bi), brass (Bra), cadmium (Cd), chromium (Cr), copper (Cu), fluorinated ethylene propylene (FEP), iron (Fe), monocrystalline graphite (GR), gneiss (Gn), granite (Granite), iridium (Ir), lithium (Li), dry limestone (Lim), molybdenum (Mo), sodium (Na), polyamide (Nylon), osmium (Os), white phosphorous (P), polycarbonate (PC), polyethylene (PE), polypropylene (PP), polystyrene (PS), polyurethane (PU), teflon (PTFE), paraffin wax (PW), lead (Pb), rhodium (Rh), styrofoam (SF), tin (Sn), tungsten carbide (TC), tungsten (W) and zinc (Zn).


Figure 5.9 Young’s modulus versus bulk modulus for (in alphabetical order of the abbreviations): polymethylacrylate (ACR), aluminum (Al), gold (Au), beeswax (BW), basalt (Bas), bismuth (Bi), brass (Bra), cadmium (Cd), chalk (Chalk), copper (Cu), diamond (Di), iron (Fe), glass (Gl), granite (Granite), potassium (K), lithium (Li), dry limestone (Lim), molybdenum (Mo), sodium (Na), polyamide (Nylon), osmium (Os), white phosphorous (P), polycarbonate (PC), polyethylene (PE), polyester (PEs), polystyrene (PS), teflon (PTFE), polyurethane (PU), paraffin wax (PW), lead (Pb), rubber (Rubber), sandstone (San), shale (Sha), steel (Ste), tungsten carbide (TC), tungsten (W) and zinc (Zn).


Figure 5.10 Plastic flask with filling that consists of compactified, ordered monodisperse glass beads of about 3 mm diameter in water.

So if one makes one’s own beads out of beeswax (or synthetic wax) or ‘free plastic™’ (from Daicel FineChem Ltd), which can be melted in a hot water bath, one cannot achieve high mechanical strength: wax and ‘free plastic’ melt and break easily. It is typical of many organic materials that structural modifications and chemical disintegration occur before any melting sets in. Likewise, human tissue is not very hard, and can already be burned by hot water. There are various problems with Equation (5.1): materials with a Poisson ratio ν approaching 0.5 (among them many polymers) would be incompressible, which is unrealistic from the atomistic viewpoint of matter. Both theory and experiments have to be dealt with in a more subtle way [7, 8]. For granular materials, the problems are worse. First, the materials are more inhomogeneous; for rock, experimentally there are deviations in the strength parameters of up to 20% [9]. Further, for granular assemblies, there is of course no tensile strength. The positivity of the bulk modulus is another problem: there are various experiments showing that the volume of a granular assembly can increase when external pressure is applied; see Figure 5.10 (‘Reynolds dilatancy’ for dense granular materials; see § 4.3.2). There is a ‘generalized Hooke’s law’ which goes beyond Equation (5.1) and the equations related to it. It connects different elastic parameters in different directions, but these definitions of strain for the continuum cannot be easily transferred to granular assemblies: while there is microscopic deformation at the grain contacts, the ‘granular continuum’ would be a homogenization of the grain and pore space. Grains can slide and turn and, when pushed in one direction, wedge other particles into the orthogonal direction. Therefore, the directions for the strain in a granular matrix cannot be clearly separated in the same way as for continuum theories. For the formation of shear bands, one cannot even rely on the fact that the variation of stress and strain is continuous. For DEM simulations, as well as for experiments, the fluctuation of the stress in stress–strain curves during continuous compression may be on the order of 30%. So relations between strength parameters should be understood as approximations for averages rather than exact relations. For this reason, we use Young’s modulus throughout the text, to be consistent with the rest of the literature on the discrete element method, even if in some cases the bulk modulus would be the more appropriate quantity.

5.1.5 Strength of granular assemblies

Applying the logic from the previous subsection to assemblies of granular materials, one can conclude that, because in a given granular material the particle interactions are either weak or strong, the strength parameters for the material will be correlated accordingly. From a mixture of particles that gives a low angle of repose, we cannot expect a high yield strength or good stability against vibration either. By comparing granular materials with metals, one can draw rather illuminating conclusions about the material strength. If one compares atoms in solids with granular materials, three of the four binding mechanisms (ionic, covalent and hydrogen) are very different from the purely repulsive interaction of dry grains. Ionic bindings imply different kinds of particles, covalent bonds are so directed that they don’t resemble dry grains at all, and hydrogen bonds are much too anisotropic to have any explanatory value. Metal ions in solids are similar to dry grains in that they are purely repulsive and need external compression (from the surrounding electron gas as glue) to hold them together, in the same way as granulates need gravitation or walls. Why are some metals hard and others soft? Lead is famous for being soft, because the varieties commonly in use are very pure. If we imagine the atoms as spheres in a crystalline ordering, a strip made of lead is bent by inducing slip of neighboring crystal planes; because all the spheres are the same size, there are no obstacles. Something similar happens when one tries to make a heap from smooth cylinders (see Figure 4.1): after a certain height is reached, the heap collapses under its own weight due to slip along the ‘crystal axes’. Pure aluminum is similar to lead: it is soft as there is only one size of atom. Adding impurities (atoms which as a rule have a different diameter than those in the original material) increases the strength, because now there are obstacles to slip. Duralumin contains copper (atomic radius ≈ 128 pm) impurities in the aluminum (atomic radius ≈ 120 pm) matrix, and lead (atomic radius ≈ 175 pm) can be hardened by mixing it with antimony atoms (radius ≈ 140 pm). Iron (atomic radius ≈ 150 pm) is hardened by the addition of carbon atoms (atomic radius ≈ 70 pm). While our use of atomic diameters is a bit dubious from the viewpoint of the chemistry of solids (as would be the use of the van der Waals radius, or the monoatomic lattice constant), because the electron structures between free atoms and between atoms in a crystal are different, qualitatively, at least, they give a clear picture of the geometric effect. Admixtures of both smaller and larger atoms increase the strength of a crystal. Especially striking is the effect on iron, which without impurities has a tensile strength of 10 MPa as a single crystal; but iron with carbon admixtures, though much inferior to steel, already has a tensile strength of 140 MPa. Another soft substance is the copper used in electric wires. It has to be pure: impurities increase the electric resistance, as they would ‘get in the way’ of the conduction electrons which form the electric current; impurities would also block the slip of atomic planes under bending. However, there is another possible way of making copper harder: remove the insulation around a copper wire of 1–2 mm diameter, and bend the wire repeatedly; you will feel how the bending gets more difficult, as the copper gets ‘harder’. Bending induces dislocations, but bending back to the original conformation does not heal the dislocations; rather, it induces new ones in different places and in other directions. In that respect, dislocations are a mechanism allowing matrix atoms instead of foreign atoms to ‘get in the way’. Hardening steel usually means inducing dislocations by hammering, folding etc., and then preserving those dislocations from healing under the thermal motion of the atoms by fast cooling of the hot metal.


In the same way, we can ‘heal’ dislocations in a granular material by shaking or vibrating it at large amplitudes, and we can induce dislocations by shearing. Thus, the greatest strength of granular materials may not be obtained from densest packings; highly symmetric crystals can be deformed more easily than materials with dislocations. On the other hand, packings that are too loose are not stable either, because lower density implies lower coordination numbers. Therefore, in regard to the realism of packings made up of spherical particles (especially ones of equal or nearly equal diameter) and the possibility of reproducing the physical behavior of general granular materials, the conclusions are damning. Such particles don’t form proper heaps, and if one does not do anything about the boundary conditions, in simulations they have a tendency to order in the densest packings; if there are dislocations, it is difficult to prevent reordering due to rolling. Of course, one can try to increase the strength of spherical packings by introducing unphysical parameters (such as rolling friction coefficients which are so high that they are never found in nature, or even switching off the rolling altogether, ridiculing three hundred years of theoretical mechanics), but what goes around comes around, so one would then end up with artifacts also for other observables.

5.2 From wave numbers to the Fourier transform

5.2.1 Wave numbers and the reciprocal lattice

When we deal with waves, rather than the wavelength λ it is often more convenient to use the wave number k = 2π/λ. The analogue in higher dimensions is the wave vector, a vector of wave numbers which sometimes correspond to different λi in the different dimensions. For a one-dimensional lattice with lattice spacing a, the ‘reciprocal lattice’ (or ‘inverse lattice’) has an elementary cell of length 2π/a, corresponding to the wave number of the reciprocal lattice. For discrete lattices, the wave vectors are multiples of the reciprocal lattice vectors. The ‘larger’ the elementary cell of the original lattice is, the ‘smaller’ will be the elementary cell of the reciprocal lattice. Working with wave numbers on a lattice corresponds to sampling a wave train at different discrete points. At a set of discrete points, the wave vector k for describing the wave train is not necessarily unique, as can be seen in Figure 5.11, but the points will lie on wave trains with wave vectors k′ which are equivalent to k plus integer multiples of π. Suppose that the analytical data are taken from a curve x(k, x) = cos(kx). Then discrete sampled data with sampling interval δ will be

    xn(k) = cos(knδ),    n ∈ ℤ.

As k increases, the oscillations of the cosine curve become faster; for k = π/δ, we have xn(π/δ) = cos(πn) = (−1)ⁿ. It is obvious that a faster change than between +1 and −1 as one goes from each n to the next (n + 1) is not possible. Let us continue to increase k beyond π/δ, into the range π/δ < k < 2π/δ; set k̃ = 2π/δ − k, so that 0 < k̃ < π/δ.
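This equivalence of k and k̃ on the sampling points can be checked numerically; a minimal MATLAB sketch with an assumed sampling interval δ = 1:

    % Aliasing: on the sampling points, cos(k*n*delta) coincides with
    % cos(ktilde*n*delta) for ktilde = 2*pi/delta - k.
    delta  = 1;                        % sampling interval (assumed)
    k      = 1.2;                      % wave number of the original signal
    ktilde = 2*pi/delta - k;           % aliased wave number
    n      = -7:7;                     % sample indices, as in Figure 5.11
    max(abs(cos(k*n*delta) - cos(ktilde*n*delta)))   % zero up to round-off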

Figure 5.11 Aliasing: discrete points xn for n = −7, …, 7 (circles) sampled from the curve cos(kx) with k = 1.2 (bold line) also lie on the curves cos((2π − k)x) (thin line) and cos((3π − k)x) (dotted line).

Then we have

    xn(k) = cos(knδ)
          = cos((2π/δ − k̃)nδ)    (5.2)
          = cos(2πn − k̃nδ)
          = cos(k̃nδ),            (5.3)

where from Equation (5.2) to Equation (5.3) we have made use of the facts that cos(2nπ + x) = cos(x) and cos(−x) = cos(x). So xn(k) can be reformulated with a dependence not on k but on k̃. Analogous relations are valid for the sine curve. Thus, discretely sampled periodic functions can be represented with different frequencies, which is called aliasing. (The optical illusion of wheels turning forward fast looking as if they were running slowly backward is such a phenomenon.) In Figure 5.11, one can see how the points x0(k), x±1(k), x±2(k), … take the same values for k = 1.2 as for 2π − 1.2 and 3π − 1.2. For periodic systems with period a, we usually work only with wave vectors from −π/a to π/a, a range which is called the ‘first Brillouin zone’. Recall that the Wigner–Seitz cell describes the ‘closest space’ for a point on a lattice; the ‘first Brillouin zone’ is the Wigner–Seitz cell of the reciprocal lattice, which identifies the wave vectors that do not contain redundant multiples of π. In solid state physics, processes with wave vectors larger than those in the first Brillouin zone, which are not affected by ‘aliasing’, are called ‘second-order processes’ and will not concern us here in our discussion of granular materials. In two and three dimensions, besides a wave number we have to additionally take the direction into account. In three dimensions, for a lattice with lattice vectors a1, a2, a3, the corresponding reciprocal lattice vectors b1, b2, b3 can be computed as follows:

    b1 = 2π (a2 × a3) / (a1 · (a2 × a3)),
    b2 = 2π (a3 × a1) / (a1 · (a2 × a3)),
    b3 = 2π (a1 × a2) / (a1 · (a2 × a3)).

The reciprocal two-dimensional lattices are obtained by simply choosing a3 to be the unit vector in the z-direction.
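The reciprocal lattice vectors are easily computed numerically; as a sketch, the following MATLAB lines apply the formulas to the primitive vectors of the body-centered cubic lattice (in units of the cubic lattice constant) and recover primitive vectors of a face-centered cubic lattice, in line with the statement below:

    % Reciprocal lattice vectors from the lattice vectors a1, a2, a3.
    % Example: primitive vectors of the bcc lattice (cubic lattice constant 1).
    a1 = [ 0.5  0.5 -0.5];
    a2 = [-0.5  0.5  0.5];
    a3 = [ 0.5 -0.5  0.5];
    V  = dot(a1, cross(a2,a3));        % volume of the elementary cell
    b1 = 2*pi*cross(a2,a3)/V           % = 2*pi*[1 1 0]
    b2 = 2*pi*cross(a3,a1)/V           % = 2*pi*[0 1 1]
    b3 = 2*pi*cross(a1,a2)/V           % = 2*pi*[1 0 1]
    % These are primitive vectors of an fcc lattice, i.e. the reciprocal
    % lattice of bcc is fcc.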


The units of the reciprocal lattice are the reciprocal units of the original lattice. The reciprocal lattice of the reciprocal lattice is just the original lattice. The reciprocal lattice inherits the symmetry class (e.g. hexagonal, cubic) of the original lattice, but not necessarily the flavor of the elementary cell; for example, the reciprocal lattice of the body-centered cubic lattice is the face-centered cubic lattice, and vice versa.

5.2.2 The Fourier transform in one dimension

For the above mathematical framework involving reciprocal vectors, an infinite lattice with exact periodicity is needed. However, in granular materials as well as in many other systems, we have to deal with disorder and boundaries, both of which can destroy the mathematical exactness. A mathematical method that allows us to analyze the periodicity in the underlying data is the Fourier transform. A spatially periodic structure fL with extent L can be represented by a series of sines and cosines as

    fL(x) = Σ_{n=1}^{∞} [ an cos(kn x) + bn sin(kn x) ],    (5.4)

where an, bn are the Fourier coefficients and kn = 2nπ/L are the wave numbers. Unfortunately, in many practical applications, the structures are not strictly periodic, so the series in Equation (5.4) must be replaced with the Fourier integral. The Fourier integral is valid for continuous k, and can also be applied to non-periodic structures. The most common and versatile implementation is via the complex exponential, exp(ikx) = cos(kx) + i sin(kx). The resulting Fourier transform F(k) of a function f(x) in space gives a relation between the spatial variable x and the continuous wave vector k by

    F(k) = (1/√(2π)) ∫_{−∞}^{∞} f(x) exp(−ikx) dx,    (5.5)

    f(x) = (1/√(2π)) ∫_{−∞}^{∞} F(k) exp(ikx) dk.     (5.6)

The factors in front of the integrals come from a convention: the product of the pre-factors in the expressions for F(k) and f(x) must be 1/(2π). We have chosen the symmetric convention here, but other texts may use 1/(2π) for F(k) and 1 for f(x), or vice versa. For the analytical treatment of, and mathematical theorems on, the continuous Fourier transform (5.5) and its inverse (5.6), there are many texts available. Here we focus on the discrete Fourier transform (as one of its variants, the ‘fast Fourier transform’, is the numerically most feasible implementation of the Fourier transform) and discuss how analytical theorems have to be understood for discrete input data. For a wave number k, the discrete Fourier transform (DFT) X(k) and its inverse x(n) for N data points are defined as

    Xν = Σ_{n=1}^{N} exp(−i 2π(ν − 1)(n − 1)/N) · xn ,          1 ≤ ν ≤ N;    (5.7)

    xn = (1/N) Σ_{ν=1}^{N} exp(i 2π(ν − 1)(n − 1)/N) · Xν ,     1 ≤ n ≤ N.    (5.8)


We start the summation with index 1, not 0, because MATLAB® does not allow indices to be 0. As in the case of the continuous Fourier transform, there are conventions about pre-factors. Here we follow the convention in MATLAB®, where 1/N multiplies the spatial components and the factor 1/(2π) in (5.5)–(5.6) is dropped entirely. From the index ν we obtain the wave vector k as k = 2πν/N. Equations (5.7)–(5.8) take the form of matrix–vector products: Xν = Σn Tnν xn and xn = Σν T̃nν Xν, respectively. For an N × N matrix, such products would require O(N²) operations. Note, however, that exp(−i2π(ν − 1)(n − 1)/N) is not an ordinary matrix with arbitrary coefficients; due to the occurrence of the product (ν − 1)(n − 1) in the exponent, many of the matrix elements are the same. For n = ν = 8, with the shorthand s = √2/2, the matrix Tnν is

    [ 1    1         1    1         1    1         1    1
      1    s − is   −i   −s − is   −1   −s + is    i    s + is
      1   −i        −1    i         1   −i        −1    i
      1   −s − is    i    s − is   −1    s + is   −i   −s + is
      1   −1         1   −1         1   −1         1   −1
      1   −s + is   −i    s + is   −1    s − is    i   −s − is
      1    i        −1   −i         1    i        −1   −i
      1    s + is    i   −s + is   −1   −s − is   −i    s − is ]

The regular multiple entries can be grouped so that some submatrices are applied to the vectors Xν and xn in Equations (5.7) and (5.8); this can then be repeated for sub-submatrices, and so on, reducing the overall computational effort. This strategy of ‘divide and conquer’ (where the sum of all sub-problems is more efficiently dealt with than with the original problem) allows the product to be computed in O(N log N ) operations. Thus it is called the ‘fast’ Fourier transform (FFT). It works best when the size of the data set is a power of 2; for other data sizes, either one needs to use other divide-and-conquer algorithms, or the data must be padded with zeros until the next power of 2 is reached.
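The matrix form of Equations (5.7)–(5.8) and its agreement with the built-in fft can be verified in a few lines; in this MATLAB sketch, T is constructed explicitly, so the comparison contrasts the O(N²) matrix product with the O(N log N) FFT:

    % Naive DFT as a matrix-vector product, compared with the built-in fft.
    N = 8;
    [n, nu] = meshgrid(0:N-1, 0:N-1);  % exponents (n-1) and (nu-1) of the text
    T  = exp(-2i*pi*nu.*n/N);          % DFT matrix; for N = 8 as printed above
    x  = rand(N,1);                    % arbitrary real test data
    X1 = T*x;                          % Equation (5.7), O(N^2) operations
    X2 = fft(x);                       % FFT, O(N log N) operations
    max(abs(X1 - X2))                  % agreement up to round-off
    xback = (T'*X1)/N;                 % Equation (5.8); T' is the conjugate
    max(abs(xback - x))                % transpose, so this inverts the DFT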

5.2.3 Properties of the FFT

We start by reviewing a few properties of the FFT, to familiarize ourselves with the effect of the number of discrete points and the different normalization compared to the analytical Fourier transform. Fourier-transformed data are called the ‘Fourier spectrum’, and analyzing the Fourier spectrum is sometimes called ‘spectral analysis’ (other mathematical ‘spectra’ include the ‘eigenvalue spectra’ of matrices in linear algebra). The properties listed in the following can also be verified numerically; see the sketch at the end of this subsection.

• Sine and cosine. The Fourier transform of a full period of the sine curve is a peak in the second point of the imaginary component; see Figure 5.12(a1)–(a3). The Fourier transform of a full period of the cosine curve is a peak in the second point of the real component; see Figure 5.12(b1)–(b3).

• Number of data points. If one uses more sampling points for the input signal (equivalent to using smaller discretization steps), the amplitude of the transform becomes larger, as conventionally no normalizations are performed in the FFT; compare (b1)–(b3) with (c1)–(c3) in Figure 5.12.


Figure 5.12 Effect of the sampling rate (number of discrete points) and boundary condition on the result of the Fourier transform. (a) Sine function: (a1) curve sampled with 101 points; (a2) real part of FFT; (a3) imaginary part of FFT. (b) Cosine curve sampled with 101 points, together with the real and imaginary parts of the FFT. (c) Cosine curve sampled with 11 points between 0 and 2π inclusive. (d) Cosine curve sampled with 10 points (where the point at 2π has been removed). (e) Cosine curve sampled with only 9 points. For the FFT plots (middle and right columns), the scale of the horizontal axis is the number of values sampled. With the right number of points, the amplitude of the Fourier transform of a single cosine wave only has a non-trivial amplitude at the correct wavelength, as in (d2); when one uses one point too many, as in (c2) and (c3), or one point too few, as in (e2) and (e3), there are also other non-trivial amplitudes. With more data points, as in (a2)–(a3) and (b2)–(b3), the effect of using one point too many or too few is less marked.

• Information and number of points. For N input points, the FFT gives N output points in the real channel and N output points in the imaginary channel, as can be seen in Figure 5.12. As information cannot be generated out of thin air, these output points are actually symmetric: mirror symmetric for the real spectrum (as the cosine components are even) and point symmetric for the imaginary spectrum (as the sine components are odd) with respect to the center of the axis; see Figure 5.12. Only for the first output point, which indicates the deviation between the first and last points of the input signal, is there no symmetric point.

• Periodicity and endpoints. For a periodic signal, the first point in a period should not be repeated in the sample. The FFT for 11 points in Figure 5.12(c), where the first point is repeated redundantly at the right end of the interval, gives spurious deviations from 0 compared to the FFT using 10 points (with the right endpoint left out) shown in Figure 5.12(d).

• Effect of the first point. A jump in the input signal between the first and last points, i.e. a deviation from periodicity, leads to a non-zero amplitude for the first point of the real part of the Fourier transform. In Figure 5.12(c1), the rightmost value is higher than in Figure 5.12(d1), so for the real amplitude the first entry is positive. In Figure 5.12(e1), the rightmost value is lower than in Figure 5.12(d1), so for the real amplitude the first entry is negative.

Depending on whether the signal is even or odd, we get non-zero amplitudes in the cosine (real) or sine (imaginary) channel. Our input signals in Figure 5.12(a1) and (b1), namely sin(x) and cos(x), differ only by a phase shift. For random phase shifts, the FFT will distribute the information into both the real and imaginary channels. In such cases, when the phase information is unclear, it is better to take the absolute value of the FFT. Then one finds the following properties.

• Neutral curve. The absolute value of the FFT of a Gaussian curve is again a Gaussian curve, or at least two half Gaussians, one from the left and the other from the right of the spectrum; see Figure 5.13(a2). In that case, it is convenient to shift the extremes of the interval into the middle, as shown in Figure 5.13(a3); in MATLAB® this is achieved by using the command fftshift.

Figure 5.12 Effect of the sampling rate (number of discrete points) and boundary condition on the result of the Fourier transform. (a) Sine function: (a1) curve sampled with 101 points; (a2) real part of FFT; (a3) imaginary part of FFT. (b) Cosine curve sampled with 101 points, together with the real and imaginary parts of the FFT. (c) Cosine curve sampled with 11 points between 0 and 2π inclusive. (d) Cosine curve sampled with 10 points (where the point at 2π has been removed). (e) Cosine curve sampled with only 9 points. For the FFT plots (middle and right columns), the scale of the horizontal axis is the number of values sampled. With the right number of points, the amplitude of the Fourier transform of a single cosine wave only has a non-trivial amplitude at the correct wavelength, as in (d2); when one uses one point too many, as in (c2) and (c3), or one point too few, as in (e2) and (e3), there are also other non-trivial amplitudes. With more data points, as in (a2)–(a3) and (b2)–(b3), the effect of using one point too many or too few is less marked.



Figure 5.13 Absolute value of the FFT (power spectrum) for: (a) a wide Gaussian; (b) a narrow Gaussian. It can be seen that the Fourier transform of a wide Gaussian is a narrow Gaussian, and vice versa, when one looks at the shifted power spectrum of (a3) instead of the unshifted one in (a2). In the power spectrum of stair-like functions, oscillations occur; the steeper the stairs are, the more oscillatory the power spectrum is. As one moves from the smooth variation shown in (c1) to the steeper stair in (d1) and then to the step function of (e1), the oscillations in the power spectrum, shown in (c2)–(e2), increase.


• Fourier reciprocity. Wide Gaussians are transformed into narrow Gaussians by the Fourier transform; see Figure 5.13(a). Conversely, narrow Gaussians are transformed into wide Gaussians, as shown in Figure 5.13(b). This is similar to the transformation of large lattice constants a into small wave vectors 2π/a in § 5.2.1.

• Oscillations from steps. Jumps in the original signal lead to oscillations in the Fourier spectrum. If we increase the steepness of the 'stair' from (2/π) arctan(x/2) in Figure 5.13(c1), through (2/π) arctan(x) in (d1), and then to the sign-like step function sgn(x) in (e1), the oscillations in the Fourier spectra increase, as can be seen from Figure 5.13(c2), (d2) and (e2). This means that we have to avoid jumps in our input data, or else we will end up with noise all over the Fourier spectrum.
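The reciprocity in the first item can be checked numerically with a few lines; this is our own sketch, with arbitrarily chosen widths:

% Fourier reciprocity: the wider the Gaussian, the narrower its spectrum (sketch)
x=linspace(-6,6,101);
for sigma=[0.5 2]                  % narrow and wide input Gaussian
  p=fftshift(abs(fft(exp(-(x/sigma).^2))));
  width=sum(p>0.5*max(p));         % number of points above half the maximum
  fprintf('sigma=%3.1f: spectral half-width %d points\n',sigma,width)
end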

5.2.4

Other Fourier variables

Fourier transforms exist not only between k and x but also between other 'conjugate variables', for which the units in the exponents cancel. In the above we considered the wave vector k and the position variable x, but another often-used pair of conjugate Fourier variables are t and ω. A common definition of the frequency-dependent Fourier transform H(ω) of the time-varying signal h(t) is

H(ω) = (1/√(2π)) ∫_{−∞}^{∞} h(t) exp(iωt) dt,    (5.9)

h(t) = (1/√(2π)) ∫_{−∞}^{∞} H(ω) exp(−iωt) dω.    (5.10)

Note that, in this convention, the signs in the exponent are defined opposite to the Fourier transform for k and x in Equations (5.5)–(5.6). Later, when we discuss waves, we will usually study ones that travel to the right, so the dependency on k and ω will be something like cos(ωt − kx). If instead of ω the frequency f is used, the exponential must be modified to exp(±i2πf t).
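As a minimal illustration of the conjugate pair t and ω (our own sketch, with assumed values for the signal frequency f0 and the sampling rate fs), one can sample a cosine and read its frequency back off the position of the peak in the discrete spectrum:

% recover the frequency of a sampled signal from its FFT (sketch)
f0=5;                       % signal frequency in Hz (assumed)
fs=100;                     % sampling rate in Hz (assumed)
t=(0:1/fs:1-1/fs);          % one second of data, endpoint dropped
H=abs(fft(cos(2*pi*f0*t)));
[~,idx]=max(H(1:end/2));    % peak in the first half of the spectrum
f_rec=(idx-1)*fs/length(t)  % recovered frequency: 5 Hz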

5.2.5

The power spectrum

The Fourier spectrum (or the real and imaginary components in the case of the complex Fourier transform) gives full information about the wavelengths, relative amplitudes and relative phases. Often, one is only interested in the wavelengths and their corresponding amplitudes, without the phase information. In that case, it suffices to evaluate the power spectrum

P(k) = |F(k)| = | (1/√(2π)) ∫_{−∞}^{∞} f(x) exp(−ikx) dx |,    (5.11)

the absolute value of the Fourier transform. Plotting and visualization is also simpler, as there are no imaginary components, but small details of the Fourier transform may get lost. In the same way, one can take absolute values for the discrete and fast Fourier transforms. The name 'power spectrum' derives from the fact that the frequency distribution of energy and power in many phenomena is proportional to the square of the absolute value of the Fourier transform.


For the continuous case, one speaks of ‘power spectral density’. The power spectrum of white noise symmetric around the origin is a constant, overlaid with some noise.

5.3

Waves and dispersion

In this section, we work out how fast signals may propagate in granular materials; it will turn out that the signal is slower than an ordinary plane wave. For a signal which is periodic in time with period T , we have the frequency f = 1/T and the angular frequency ω = 2πf.

5.3.1

Phase and group velocities

Waves are characterized by the oscillation frequency ω of the amplitude and the wave number k = 2π/λ (the 'inverse' of the wavelength λ). The dimension of velocity is the dimension of the frequency ω divided by the dimension of the wave vector k. Accordingly, one can define two kinds of velocity for a wave. The phase velocity is the quotient

v_ph = ω/k,    (5.12)

which indicates the velocity with which a node (zero-point) or crest of the wave propagates. The group velocity is the derivative

v_g = ∂ω/∂k,    (5.13)

and describes the velocity with which a wave packet propagates. The group velocity can be derived from 'beats', oscillations generated by the superposition of two close frequencies ω1 and ω2; see Figure 5.14. From the trigonometric identity

sin α + sin β = 2 sin((α + β)/2) cos((α − β)/2),

we see that a superposition of two oscillations sin(ω1 t) and sin(ω2 t) can be described by two frequencies: the average frequency

ω_av = (ω1 + ω2)/2

and the modulation frequency

ω_mod = (ω1 − ω2)/2;

see Figure 5.15. The wave packet has an envelope which propagates as

A_mod = A sin( ((k1 − k2)/2) x − ((ω1 − ω2)/2) t ).
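The beats of Figure 5.14 can be reproduced with a few lines; the following sketch (ours, not a numbered program) uses the same frequencies ω1 = 1 and ω2 = 1.1:

% beats from the superposition of two close frequencies (sketch)
om1=1; om2=1.1;
t=linspace(0,150,3000);
s=sin(om1*t)+sin(om2*t);        % superposition of the two oscillations
env=2*cos((om1-om2)/2*t);       % slowly varying envelope
plot(t,s,'k-',t,env,'k--',t,-env,'k--')
xlabel('t')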



Figure 5.14 Beats, superpositions of wave trains with frequencies which are close to each other, lead to periodic amplitude modulations. (a) Two individual oscillations sin(ω1 t) with ω1 = 1 (thick line) and sin(ω2 t) with ω2 = 1.1 (thin line). (b) The superposition sin(ω1 t) + sin(ω2 t) (thick solid line), the oscillation with the average frequency, sin((ω1 + ω2)t/2) (thin solid line), and the envelope 2 cos((ω1 − ω2)t/2) (dashed line).

Figure 5.15 Superposition of two waves with frequencies ω1 = 1 and ω2 = 1.1, which results in beats. In this case the group velocity (propagation velocity of the dashed envelope, marked by the half-amplitude) is the same as the phase velocity (propagation velocity of the crests of the wave train): the line through the four graphs at successive times follows both the wave crest and the half-height of the wave packet.


For the wave to be stationary (i.e. not changing its shape), the argument of the sine must be constant:

((k1 − k2)/2) x − ((ω1 − ω2)/2) t = constant.

Expressing x in terms of t, k1, k2, ω1, ω2 and taking the derivative with respect to t gives the velocity of the wave packet as the group velocity,

v_g = dx/dt = (ω1 − ω2)/(k1 − k2),

which for small differences between ω1 and ω2 and between k1 and k2 is equivalent to Equation (5.13). For 'dispersion-free' characteristics, where ω = kc, the phase velocity ω/k = c is independent of the wave number, and the group velocity is also c; see Figure 5.15. If the phase velocity is different for different ω and k, the group velocities v_g will be slower than the phase velocities v_ph, as illustrated in Figure 5.16.

Figure 5.16 Dispersion for the superposition of two plane waves with unequal wave velocities. In this example the group velocity (propagation velocity of the dashed envelope, marked by the half-amplitude) is not the same as the phase velocity (propagation velocity of the crests of the wave train). Over the same time, the wave crest (represented by the left oblique line) moves farther than the half-height of the wave packet (represented by the right oblique line).

5.3.2

Phase and group velocities for particle systems

For granular particles, the 'natural' polarization will be longitudinal, i.e. like a sound wave: there is compression in the direction of the propagation velocity of the wave, normal to the



contact area of the particles, due to the elastic interaction; see the sketch in Figure 5.17(a). We have not found conclusive evidence for the existence of transversal waves inside the bulk of granular materials, apart from in discussions relating to continuum theory. Transversal waves would be based on restoring shear forces between the particles parallel to the contact area; but as the dominant forces there are frictional forces, which are not elastic, we will stick to longitudinal waves in this subsection. On the surface, transversal waves for whole layers exist, mediated by normal contacts, but they are a result of the boundary geometry and the resonance of the material with external excitations, which necessitates a different kind of analysis from that offered by solid state physics.

Figure 5.17 (a) Equilibrium configuration of particles of diameter 2r (circles), with centers at a regular distance a < 2r apart; compression is indicated by gray shading for some particles. (b) Displacements x((n−1)a), x(na), x((n+1)a) of the particles from the equilibrium configuration. (c) The forces acting on particle n from its contacts with particles n − 1 and n + 1 (gray), and expansion of the sum of these forces around the origin, giving a linear relation.

Let us compute the relation between k and ω for waves on a 'linear chain' of N particles, each with mass m and radius r. The spring constant between the masses is usually denoted by K (upper case), and one should be careful not to confuse it with the wave number k (lower case). The chain is pre-stressed as shown in Figure 5.17(a), such that the particles are all in equilibrium positions with their centers of mass at locations 0, a, 2a, . . . , where a < 2r, i.e. the separation distance between the centers of mass is smaller than the particle diameter. The displacement of particle n from its equilibrium position at na can be expressed by an amplitude x(na). In the following, we will assume that the displacement is smaller than the overlap 2r − a, or else no analytical treatment (via smoothly varying functions) is possible. Particle n interacts with particle n − 1 with a force F_{n,n−1} = F(x(na) − x((n − 1)a)); it also interacts with particle n + 1 with a force F_{n+1,n} = F(x((n + 1)a) − x(na)). The overall force F_n(δx)



on particle n depends on its deviation δx from the force equilibrium between particles n − 1 and n + 1, and the interactions with particles n − 1 and n + 1 are symmetric, so F_n(δx) = F_{n,n−1} + F_{n+1,n}. As this force is an odd function, i.e. F_n(−δx) = −F_n(δx), the corresponding potential V(x) = −∫ F(x) dx must be an even function and thus depend only on even powers of δx:

V(δx) = V0 + K1 δx² + K2 δx⁴ + ⋯ .

In the configuration for mechanical equilibrium, the particles sit at the minima of the potential, so that ∂V(x)/∂x = 0, as in Figure 5.17(c). Accordingly, we can expand the force in a Taylor series about the equilibrium:

F(x) = −2K1 x − 4K2 x³ − ⋯ .

Up to now we have made use of only the symmetry and the equilibrium position, so the approximation is also valid for solid state physics, where the quantum mechanical potentials show much wider variations and dependencies than in granular mechanics. Assuming that the overlap 2r − a is considerably smaller than the radius of the particles, we can neglect the K2 term, so that the force around the equilibrium position is approximately linear; we set 2K1 = K to simplify the notation. For a chain of particles as in Figure 5.18(a), the equation of motion for the nth particle is then

m ü(na) = −K [2u(na) − u((n − 1)a) − u((n + 1)a)].    (5.14)

Figure 5.18 (a) Periodic linear chain of masses m with spring constants K between them. (b) The resulting sine-shaped dispersion relation for the linear force law, with the dispersion relation in the first Brillouin zone drawn as a thick curve.

A feasible way to obtain the time-dependent part of the solution to ü(na) = ⋯ u(na) is by using a complex exponential function exp(i(∓ωt)). To satisfy the spatial relation in Equation (5.14), let us try a wave train exp(i(±κξ ∓ ωt)). In principle, to cover the full problem (both traveling and standing waves for any boundary condition), we would have to work with a wave function

A exp(i(+κξ − ωt)) + B exp(i(−κξ + ωt)),


where A and B would be determined by the boundary conditions. For standing waves with fixed boundaries, with the ends held at 0, imaginary A = B would lead to sine waves; with periodic boundary conditions, real A = B would lead to cosine waves. To simplify the derivation, we work only with a wave traveling towards the right, exp(i(κξ − ωt)). Next, we have to determine physically reasonable values for the wave number κ and the position variable ξ. The chain of N particles allows only N different wave numbers, resulting in

k = (2π/a) (n/N).    (5.15)

The same result is obtained when we consider the degrees of freedom of the system: N particles correspond to N degrees of freedom in real space, and consequently there can only be N degrees of freedom in momentum space ('normal modes'). With Equation (5.15), proper choices for κ and ξ in exp(i(κξ − ωt)) will be κ = k and ξ = na for the nth particle. Therefore, we have the equation

−mω² e^(i(kna−ωt)) = −K (2 − e^(−ika) − e^(ika)) e^(i(kna−ωt)),    (5.16)

where e^(−ika) + e^(ika) = 2 cos(ka). After canceling the complex exponential from both sides, we are left with −mω² = −2K(1 − cos ka), which yields

ω = √(2K(1 − cos ka)/m) = 2 √(K/m) |sin(ka/2)|.    (5.17)

When we draw the dispersion curve as in Figure 5.18(b), we see that for k > π/a we leave the Brillouin zone, so we shift the whole curve into the interval [−π/a, π/a]. In solid state physics texts this curve is referred to as the 'phonon dispersion relation', but despite the word 'phonon', there is no quantum mechanics involved. These 'phonons' are the vibration modes of a purely classical system, and up to here there is no element which needs a quantum mechanical treatment. Thus, these relations hold also for one-dimensional chains of DEM particles. Even if the particle interactions are nonlinear, as long as the particles move around a force equilibrium, the functional form given in Equation (5.17) is still valid, although pre-stressing of the chain may lead to modification of the 'spring constant' K. The spring constant K will depend on Young's modulus, as well as on the shape of the particle contact and the pressure on the granular assembly. For granular assemblies with vertical extension, in lower layers the sound velocity may be higher due to the higher load, especially if the Young's modulus is low [10].
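For comparison with the numerical results of § 5.3.3, the analytic dispersion relation (5.17) can be plotted directly; in this sketch (ours, not a numbered program), K, m and a are set to 1 as in the programs below:

% analytic dispersion relation (5.17) for K=m=a=1 (sketch)
K=1; m=1; a=1;
k=linspace(-pi/a,pi/a,200);
omega=2*sqrt(K/m)*abs(sin(k*a/2));
plot(k,omega,'k-')
xlabel('k'), ylabel('\omega')
axis([-pi/a pi/a 0 2.2])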

5.3.3

Numerical computation of the dispersion relation

Equations (5.14) and (5.16) are actually equivalent to an eigenvalue equation Ax = λEx, with some general matrix A and an identity matrix E scaled by the eigenvalues λ, for the 'wave function' u. We can therefore use linear algebra software to compute the dispersion relation for a finite number of degrees of freedom.



Figure 5.19 (a) Dispersion relation generated with Program 5.1. (b) Density of states computed with Program 5.2.

For a simulation with 100 degrees of freedom (100 particles), mass=1 and force constant K=1, Program 5.1 sets up the matrix D and scales it to a = 1, so that the wave number k will be in the interval [−π, π] and the frequency spectrum for ω will be between 0 and 2√(K/m) = 2. As in the analytic solution (5.17) and Figure 5.18, the dispersion is sinusoidal; see Figure 5.19(a). For wave vectors close to ±π, the group velocity will be nearly zero, i.e. the wave will not propagate. As for the analytic solution, the dispersion is not linear, and the sound velocity is largest for the longest wavelengths (i.e. the smallest wave numbers k) and decays with increasing k. This means that although the force law is linear (force ∝ K × amplitude), the wave velocity is not constant, even for the simplest setting, which assumes linear interaction, the same mass for all particles and one dimension!

5.3.4

Density of states

In higher dimensions and with more complicated interactions, dispersion relations can become more complex due to different dependencies in different directions. To be able to analyze the whole spectrum at once, instead of the relation between k and ω one considers the number of states in an interval from ω to ω + Δω. The 'spectrum' (probability distribution) for all states is called the 'density of states'. For the numerical solution, we can just add up the number of data points in a given interval, as is done by Program 5.2, with the results shown in Figure 5.19(b). One sees that at the upper end of the spectrum, where the branch becomes horizontal, there is the highest number of states per interval Δω ('van Hove singularity'). Nevertheless, what determines the dynamics are not the possible states but the actually excited states. For atomic and molecular systems, the value of ω up to which vibrations are actually excited depends on the temperature.


Program 5.1 Code to calculate the dispersion relation (dependence of the frequency ω on the wave number k) for a one-dimensional linear chain with 100 modes (particles) via eigenvalue decomposition.

clear % Numerical dispersion
format compact
n=100
randn('seed',4)
%        +---+---+---+---+---+
% mass   1   2   3   4   5   6
% spring 1   2   3   4   5   6   7
mass=ones(n+1,1);
K=ones(n+1,1); % Spring Constant
K(n+1)=K(1); % periodicity
for i=1:n
  ip=i+1;
  if (ip>n) % periodicity
    ip=ip-n;
  end
  D(i,i)=(K(i)+K(ip))/mass(i);
  D(ip,i)=-(K(i))/sqrt(mass(i)*mass(ip));
  D(i,ip)=-(K(ip))/sqrt(mass(i)*mass(ip));
end
[U,Deigval]=eig(D);
kvec(1:2:n)=pi*[1:2:n]/n;
kvec(2:2:n)=-pi*[2:2:n]/n;
fullk1=[kvec'];
fullomega=[sqrt(diag(Deigval))'];
clf
plot(fullk1,fullomega,'+')
xlabel('k')
ylabel('\omega')
axis tight
axis([-pi pi 0 2.2])
return


Program 5.2 Code to combine the states obtained from Program 5.1 into the density of states, without normalization.

[n,omega]=hist(fullomega,[0:.10:2.05])
barh(omega,n)
axis([0 1.15*max(n) 0 2.2])
xlabel('A(\omega)')
a=ylabel('\omega')

For the analysis of the dynamics of granular materials via the density of states, as in [11], it is much easier to determine the possible than the actually excited vibration states. In the (stationary) finite element analysis of mechanical systems, which neglects even damping, in general only very few eigenfunctions for the lowest (vibration) energies are calculated. Translated into the formalism for the density of states, this would mean that only the first few k-vectors would be relevant. As in granular materials damping and solid friction may further suppress vibrations with higher k-vectors, it is not clear a priori whether the formalism for the density of states is applicable to granular materials. DEM simulations can show which vibration modes are actually excited.

5.3.5

Dispersion relation for disordered systems

The real reason for studying the dispersion relation numerically is the possibility of incorporating disorder; this allows us to study conditions which are much closer to those found in actual granular materials. Conventional lattice dynamics assumes central forces and vanishing bending moments for inter-particle actions; however, for macroscopic grains, additionally we have friction in the tangential force, and for particles which have a contact of finite width, bending moments are also possible. For a start, we can set up the linear chain of § 5.3.2 and add disorder to the masses and the force constants; the disorder parameter should be limited to values which do not reverse the sign of the masses and the spring constants. In the spectrum of the system without disorder, calculated in Program 5.1, the eigenvectors are automatically ordered; but for the system with disorder, additional ordering of the eigenvectors according to wavelength must be introduced, as is done in Program 5.3. The result is shown in Figure 5.20: one can see that for small k, the dispersion curve is practically unchanged compared with Figure 5.19(a), but towards the end of the Brillouin zone (i.e. near k = π), the data scatter, with the degree of scattering being proportional to the disorder in the masses and the spring constants. When disorder is introduced, the wave numbers may change, so that the spectrum of the wave numbers for a given eigenvector in matrix D of Program 5.3 becomes more complex. Therefore, the wave number is calculated from the Fourier transform to assign the k-vectors accordingly. Figure 5.20 shows that the curve near k = 0 is hardly affected by the introduction of disorder, i.e. the sound velocity for large wavelengths does not change much. On the other hand, the dispersion relation near k = π shows considerable scattering, which increases with the disorder parameter (the pre-factor used with the random numbers). In higher dimensions and for different crystal symmetries, there are additional 'crystal directions' (e.g. [1, 1] in two dimensions, not just the [0, 1] direction, like the elementary crystal vectors in § 5.1).


Program 5.3 Code to calculate the dispersion relation (dependence of the frequency ω on the wave number k ∈ [0, π]) for a one-dimensional linear chain with 200 modes (particles) via eigenvalue decomposition, with disorder Ar in the masses mass and the spring constants K. Because the data scatter, it is better to use more modes than for the case without disorder. Sometimes, it may be convenient to vary the disorder for the masses and the disorder for the spring constants independently.

clear
format compact
n=200
clf
axes('position',[ 0.07 .45 .67 .5 ])
randn('seed',4)
%        +---+---+---+---+---+
% mass   1   2   3   4   5   6
% spring 1   2   3   4   5   6   7
Ar=.3
mass=ones(n+1,1)+Ar*(randn(n+1,1)-.5); %
K=ones(n+1,1)+Ar*(rand(n+1,1)-.5); % Spring Constant
K(n+1)=K(1);
for i=1:n
  ip=i+1;
  if (ip>n) % periodicity
    ip=ip-n;
  end
  D(i,i)=(K(i)+K(ip))/mass(i);
  D(ip,i)=-(K(i))/sqrt(mass(i)*mass(ip));
  D(i,ip)=-(K(ip))/sqrt(mass(i)*mass(ip));
end
[U,Deigval]=eig(D);
for i=1:n % Sort according to the maximal wavelength
  absfft=abs(fft(U(:,i)));
  [f,j]=max(absfft(1:n/2));
  kvec(i)=2*pi*j/n;
end
fullk1=[kvec'];
fullomega=[sqrt(diag(Deigval))'];
plot(fullk1,fullomega,'+')
xlabel('k')
ylabel('\omega')



Figure 5.20 Graphs for the dispersion relation generated with Program 5.3: (a) with disorder parameter Ar=0.15; (b) with Ar=0.3. While the curve near k = 0 is barely affected by the disorder, scattering of the data increases towards k = π proportional to Ar.

When disorder is introduced so that the lattice order is destroyed, the dispersion relation for the different lattice directions collapses into a single dimension, which is just the inverse of the distance.

5.3.6

Solitons

Up to now we have dealt with 'linear' waves, for which the velocity is independent of the amplitude. Nevertheless, the dispersion relation was not a linear function: as the wavelength becomes smaller (on the order of the particle diameter), the group velocity decreases. Linear waves are obtained from linear force laws; that is, for a deformation δ between two particles, the force is

F = −kδ.    (5.18)

When we have nonlinear interactions, e.g. if we can expand the force as

F = −k1δ − k2δ|δ|    (5.19)

or

F = −k1δ − k3δ³,    (5.20)

the wave velocity depends on the amplitude when δ exceeds some critical value (for small values of δ, the term with −k1 δ will dominate and the phenomena are essentially linear). In the nonlinear case, similar to the nonlinear oscillators in § 1.5, solutions are not independent of the amplitude any more, and ‘everything depends on everything else’. Typical waves in the nonlinear regime are solitons—groups of waves which travel together, some preserving their shape, while others change shape in characteristic ways. In general, these are ‘singular waves’ (with wave trains of limited length), not ‘plane waves’ (sine- or cosine-shaped of unlimited length). Solitonic phenomena occur in a wide range of fields, from mechanics and


hydrodynamics to lattice dynamics and electrodynamics, in situations where dispersion and nonlinearities are present. While for the advection equation

∂u(x, t)/∂t + c ∂u(x, t)/∂x = 0    (5.21)

and the linear wave equation

∂²u(x, t)/∂t² = c² ∂²u(x, t)/∂x²    (5.22)

(which can be derived from (5.21) under some assumptions) the traveling wave velocity c is independent of the amplitude, nonlinear modifications of Equations (5.21) and (5.22) exhibit much more varied behavior. The Korteweg–de Vries equation

∂u(x, t)/∂t + ∂³u(x, t)/∂x³ + 6u(x, t) ∂u(x, t)/∂x = 0    (5.23)

can have a solution where an initially step-like wave develops oscillating wave crests; see Exercise 5.4. There are several other classes of 'typical' solitons. With 'envelope solitons', only the long-wavelength outline of the moving wave is considered, not the oscillations with shorter wavelength. 'Breather solitons' are localized to a narrow region, where due to spatial nonlinearities the wave cannot break out but rather oscillates on the spot. In granular materials, they have been found as 'oscillons', single Gaussian-shaped waves localized near the surface of bronze beads [12, 13]. Traveling waves with amplitude-dependent velocities faster than the 'linear' wave velocity have been studied in DEM simulations. While the mathematically rigorous study of solitons is mostly limited to one-dimensional equations, solitonic phenomena can easily be observed in two- and three-dimensional discrete element simulations of granular materials [14–16]. Because the particle interaction depends on the compression, there is an additional dependence on the pressure. For discrete element systems (and granular materials), both dispersion and nonlinearity are easily realized. For many interaction laws between DEM particles, the repulsive force grows faster than linearly in the deformation δ: in Equation (5.19) or Equation (5.20), for large amplitudes neighboring particles are accelerated more strongly, and the resulting wave will travel faster. However, granular and DEM systems contain dissipation, which is usually not considered in the theory of solitons. This means that there may be nonlinear waves which propagate with slowly decaying velocity, or nonlinear waves that suddenly turn into linear waves with much smaller velocity. This may distort the results of time-of-flight measurements of the sound velocity between an emitter and a detector. Due to dissipation, a certain amplitude at the emitter is necessary to excite not only linear but also solitonic waves. The solitonic waves may reach the detector faster than the linear waves; or, if they decay to linear waves, these linear waves will reach the detector faster than the linear waves coming directly from the emitter; see Figure 5.21. In either case, the result can be misinterpreted as a too-high sound velocity if the wave fronts are not discriminated.



Figure 5.21 (a) Emission of linear (thin black line) and nonlinear waves (thick black line) from a point O; when the nonlinear wave is damped sufficiently, it will propagate further as a linear wave (gray line). (b) In practice, when a wave is emitted from a point source, there will be cone-like spreading and then cone-like damping, after [10]. In reality, all three kinds of waves may overlap.

5.4

Further reading

The standard texts on conventional solid state physics are still those by Ashcroft and Mermin [17] and Kittel [18], which cover lattice symmetries, unit cells, phonon dispersion relations, Burgers vectors, etc. However, most of the concepts are developed assuming central potentials. Amorphous materials, which are in many respects similar to granular materials, are treated by Elliott [19]. A good overview of the discrete and fast Fourier transforms, including theorems and algorithms, can be found in the 'Numerical Recipes' books in various programming languages [20–22]. Nice examples of the fast Fourier transform and a more detailed explanation of aliasing, along with MATLAB® examples, can be found in Garcia's book [23]. Further treatment of waves in general is provided in the Berkeley physics course [24] on an elementary level, and in Pain's book [25] at a more advanced level. An introduction to solitons which covers the phenomenology, the underlying equations, and the application to mechanical models and particle chains is the book by Remoissenet [26]. A readable discussion of the numerical treatment of the Korteweg–de Vries equation can be found in the text by Landau et al. [27], though the finite difference treatment there has its limitations due to the noise it generates. Analytical treatment of nonlinear chains is given in Manevich and Manevich's book [28]. Dispersion (the deformation of wave packets due to different propagation speeds of components with different wavelengths) is not limited to mechanical systems: such phenomena arise also in numerical solutions of partial differential equations, due to the finite grid spacing; see [29].

Exercises

5.1 Voronoi construction and Delaunay triangulations.
a) Create a 'hexagonal' grid with MATLAB®'s meshgrid function, taking the distance between the crystal planes in the x-direction to be dx and the distance in the


y-direction to be dy. Compute the Voronoi construction and the Delaunay triangulation using MATLAB®'s built-in functions (learn how to use them by typing help voronoi and help delaunay). Plot the crystal points and the grid. Don't forget to use axis image or axis equal to avoid having the length of one axis distorted relative to the other. A possible starting sketch is given after part (b) below.
b) The Friedrichs–Keller grid (or finite element grid; see the left diagram below) is the Delaunay triangulation of the square grid. However, actual Delaunay functions like the one available in MATLAB® will produce rather random orientations of the diagonals (middle diagram below), due to the symmetry of the square grid and small rounding errors in the last digits of the distance computation for the neighboring grid. To remove this degeneracy due to the equal distance between the diagonals of a square, it is sufficient to compute the tri-structure from MATLAB®'s voronoi function. The direction of the diagonals can be enforced by skewing the positions of the lattice points upward or downward, rightward or leftward. A grid where the y-coordinates are skewed is shown in the rightmost diagram below.
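A possible starting point for part (a) is sketched below; the spacings dx and dy and the shift of every other row are our own choices for a hexagonal arrangement, not prescribed by the exercise:

% starting point for Exercise 5.1(a): hexagonal grid, Voronoi, Delaunay (sketch)
dx=1; dy=sqrt(3)/2;               % assumed crystal-plane distances
[X,Y]=meshgrid(0:dx:8,0:dy:6);
X(2:2:end,:)=X(2:2:end,:)+dx/2;   % shift every other row: hexagonal order
x=X(:); y=Y(:);
tri=delaunay(x,y);
triplot(tri,x,y), hold on
voronoi(x,y)
plot(x,y,'k.')
axis equal                        % avoid distorting one axis relative to the other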

5.2 Fourier transform: boundary values of an input signal. In the example code below, the Fourier transform of a sine curve is computed. In the code as it is, the sine curve is computed from 0 to 2π. If the % sign at the beginning of the sixth line is deleted, the last point in the data set will be removed. Vary the number of points by setting l to be 50, 200, 500, etc., and vary the periodicity by changing the function y=sin(x) to sin(2x), sin(4x), etc. Investigate which variant gives more meaningful results with respect to the period of the signal.

clear all
format compact
l=50
x=linspace(0,2*pi,l);
%removes the end point:
% x=x(1:end-1)
y=sin(x);
ffty=fft(y);
subplot(1,3,1)
plot(x,y,'*')
axis tight
subplot(1,3,2)
plot(real(ffty))
axis tight
subplot(1,3,3)
plot(imag(ffty))
axis tight
return

clear all format compact l=50 x=linspace(0,2*pi,l); %removes the end point: % x=x(1:end-1) y=sin(x); ffty=fft(y); subplot(1,3,1) plot(x,y,’*’) axis tight subplot(1,3,2) plot(real(ffty)) axis tight subplot(1,3,3) plot(imag(ffty)) axis tight return

i

i i

i

i

i “Matuttis-Driv-1” — 2014/3/24 — 19:26 — page 208 — #34

i

208

i

Understanding the Discrete Element Method

5.3 Dispersion relation with gap.
a) Take Program 5.3 and introduce alternating masses m1 = m and m2 = 2m as in Figure 5.22. This can be done by replacing the line
mass=ones(n+1,1)+Ar*(randn(n+1,1)-.5);
with
mass=ones(n+1,1)+A1*(randn(n+1,1)-.5);
mass(1:2:end)=2*mass(1:2:end);
b) Compute the dispersion relation. Because the vibration of masses m1 = m and m2 = 2m corresponds to k = π, the horizontal axis must be rescaled and the 'upper branch' shifted into the first Brillouin zone as shown in Figure 5.23.
c) Observe that the density of states will have a 'gap', i.e. there will be a range of ω values for which there are no states. This means that in a physical system with such a gap, no waves can propagate with frequencies in the gap region.
d) Convince yourself that the sound velocity vg = ∂ω/∂k does not increase due to the introduction of heavier particles. Be aware that the sound propagation is due to the lower branch, where the light and heavy particles swing 'together'; the upper branch describes the dispersion of the light and heavy particles swinging 'against each other'.

Figure 5.22 Periodic linear chain with alternating masses m1 = m and m2 = 2m and spring constant K.


Figure 5.23 Dispersion relations for alternating masses m1 = m = 1 and m2 = 2m = 2, with spring constant K = 1: (a) as calculated by the program written for equal masses; (b) rescaled and shifted into the first Brillouin zone.


Program 5.4 Numerical solution of the Korteweg–de Vries equation: small changes in the parameters may have a large effect on the solution, due to the nonlinearity.

% Evolution of a Korteweg-De-Vries soliton
clear all, format compact
ntime=9000    % number of time-steps
npoints=131   % number of gridpoints
dt=0.025      % size of the time-step
mu=0.1        % Prefactor for the term with the third derivative
eps=0.2       % Prefactor for the term with the gradient
ds=0.4        % Grid-size
u(:,1)=0.5*(1-tanh(.2*ds*([1:npoints]-1)-5)); % Initial state
u(1,2)=1.0; u(1,3)=1.0;     % Endpoints (Boundaries)
u(end,2)=0.0; u(end,3)=0.0;
fac=mu*dt/(ds^3.0)
time=dt
for i=2:npoints-1 % First time-step
  a1=eps*dt*(u(i+1,1)+u(i,1)+u(i-1,1))/(ds*6);
  if ((i>2)&(i<npoints-1))
    % ...
  end
  % ...
end
% ...

% Fragment of the ODE function for the bouncing ball WITH viscous
% damping, the damped counterpart of Program 7.2 (cf. Figure 7.2):
if (y(2)>=0)
  dy=[g y(1)];
else
  dy=[g-k*y(2)-D*y(1) y(1)];
end
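The main loop of such KdV solvers is the Zabusky–Kruskal leapfrog scheme (used, e.g., in the treatment by Landau et al. [27]); the following lines are a hedged sketch of that scheme in the variable names of Program 5.4, not the original continuation of the listing above:

% hedged sketch of the Zabusky-Kruskal main loop for the KdV equation;
% an illustration, NOT the original lines of Program 5.4
for j=2:ntime-1                     % leapfrog: steps j-1, j -> j+1
  for i=2:npoints-1
    a1=eps*dt*(u(i+1,j)+u(i,j)+u(i-1,j))/(ds*3);  % nonlinear term
    if ((i>2)&(i<npoints-1))        % third derivative needs i-2 ... i+2
      a2=u(i+2,j)-2*u(i+1,j)+2*u(i-1,j)-u(i-2,j);
    else
      a2=u(i-1,j)-u(i+1,j);         % reduced stencil at the boundaries
    end
    u(i,j+1)=u(i,j-1)-a1*(u(i+1,j)-u(i-1,j))-fac*a2;
  end
end

With parameters like those of the listing, the initially step-like state steepens and develops the oscillating wave crests described for Equation (5.23).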


Program 7.1 Soft particle DEM program for bouncing ball without dissipation.

clear, format compact
global k, k=100;
global g, g=-9.81;
x0=4
v0=1
tspan=[0 10]
[t,y]=ode23('bouncing_ball',tspan,[v0 x0]);
plot(t,y(:,2),'k*')
% uncomment to obtain ''continuous trajectories'':
%hold on
%tspan=[0:.1:10]
%[t2,y2]=ode23('bouncing_ball',tspan,[v0 x0]);
%plot(t2,y2(:,2),'k-')
return

Program 7.2 ODE function for bouncing ball without dissipation, with soft particle DEM.

function [dy]=bouncing_ball(t,y)
% bouncing ball without dissipation
global g
global k
if (y(2)>=0)
  dy=[g y(1)];
else
  dy=[g-k*y(2) y(1)];
end
return


Figure 7.1 Trajectories for the bouncing ball, with the spring constant taken to be: (a) k = 10²; (b) k = 10⁴; (c) k = 10⁶. The insets show the trajectory magnified around the contact (where the height is below zero).


Figure 7.2 Trajectories for the bouncing ball with m = 1 and k = 10⁴, with damping D = 0.2√(k/m) (black curve) and without damping (gray curve). The inset shows that the size of the time-steps is reduced during approach to and separation from the contact.


Figure 7.3 Force evolution for an elastic force with velocity-proportional damping: (a) direct addition of the elastic force (with sine-like time evolution) and the damping (with cosine-like time evolution); (b) regularization to avoid spurious cohesive (attractive) behavior at separation; (c) the force resulting from too-large impact velocities, which is difficult to integrate numerically.

The trajectory of the bouncing ball with damping is plotted in Figure 7.2; the inset shows that during the approach to and separation from contact, the time-step is much smaller than at the extremal penetration, in contrast to the case without damping in Figure 7.1, where the time-step stays constant. The reason for this is the non-smooth evolution of the damping force: if the elastic force corresponds to a sine-like shape between 0 and π, then the damping corresponds to a cosine over the same interval (both with decaying amplitude due to energy loss), with a jump at approach and at separation; see Figure 7.3(a). Both jumps are absent in the linear oscillator, where the attractive part of the interaction guarantees smooth evolution of the force; for dry granular materials, however, we have to demand that the forces be only repulsive or zero. The jump at approach has some justification, as impacts are non-smooth processes which trigger sound and damage at the surface; but the attractive (i.e. cohesive) force at separation is totally unphysical. The resulting jump in the force may make it necessary for the adaptive time integrators to reduce the time-step to very small values; nevertheless, 'explosions' may result in particle clusters with multiple contacts if several particles separate in an unfavorable manner. BDF integrators can deal with the jump if the impact velocity is not too large; but the attractive part is not only non-smooth but also non-monotonic, and sufficient noise is generated that the simulation is destabilized. Such noise is behind the 'detachment effect' [4] and perhaps also the 'brake failure' [5] (with influences of the friction modeling), as well as the need for significantly more than ten time-steps (a hundred time-steps in [6]) to resolve the contact.


Program 7.3 Function for bouncing ball with dissipation, with soft particle DEM, when the unphysical attraction is eliminated.

f_el=-k*x
f_damp=-D*v
f_tot=f_el+f_damp
% if the total force would become attractive (cohesive), limit the
% damping force to the magnitude of the elastic force, so that the
% total force is only repulsive or zero (condition reconstructed)
if (sign(f_tot*f_el)<0)&(abs(f_damp)>abs(f_el))
  f_damp=sign(f_damp)*abs(f_el)
end

The previous discussion remains qualitatively the same if, instead of linear force laws, nonlinear powers of the penetration depth are assumed. Many studies use a Hertzian force law (∝ x^{3/2}) and the corresponding damping (∝ v^{3/2}, called the Kuwabara–Kono force law [8]), but summing these two forces without additional precautions for free collisions leads to the same problem as for the linear force law discussed above. In dense systems, where particles are in permanent contact with their neighbors, the force equilibrium allows linearization of the particle interactions anyway. As we are interested in modeling particles with different shapes, we need shape-dependent force laws in any case.

7.1.2

Using two different stiffness constants to model damping

As mentioned in Chapter 1, § 1.7.4, if potentials are not symmetric, the energy in a system will not be conserved. Making use of this effect, Walton and Braun [9] proposed a dissipative force law

F(x) = −k1 x   for approach,
F(x) = −k2 x   for separation,


where k1 > k2 . As long as the contacts are collisive, i.e. the particles separate again, this force law can be used, although adaptive integrators may reduce the time-step considerably due to the non-smooth change of the spring constant at the transition between approach and separation. One criticism of this model is that the dissipation is independent of the collision velocity and depends only on the ratio between k1 and k2 , whereas in experiments the dissipation actually increases with the collision velocity. The most serious drawback of Walton and Braun’s force law is that it does not allow equilibrium positions to be dealt with. When the velocities are close to zero at the position where the relative velocity is reversed, the force will vary by ±|k1 − k2 | times the penetration depth at that position. As the computation of ‘zero relative velocity’ is additionally affected by discretization errors in the time integrator, nominally static configurations will always exhibit considerable noise.
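A minimal sketch of this force law, in the sign conventions of Programs 7.1–7.2 (penetration for x < 0, approach while the velocity is negative), may look as follows; the function name and arguments are our own illustration, not one of the book's listings:

% Walton-Braun force: stiffer loading (k1) than unloading (k2 < k1); sketch
function f=walton_braun(x,v,k1,k2)
if (v<0)          % approach: loading branch with the larger stiffness
  f=-k1*x;
else              % separation: unloading branch with the smaller stiffness
  f=-k2*x;
end

Since the loading branch stores the energy (1/2)k1 x_max² and the unloading branch returns only (1/2)k2 x_max², the fraction of energy dissipated is 1 − k2/k1, independent of the impact velocity, which is exactly the criticism mentioned above.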

7.1.3

Simulation of round DEM particles in one dimension

The majority of simulations in the discrete element field use round particles. The appeal of such an approach is obvious: for one to three dimensions, the overlap computation becomes a one-dimensional geometrical problem; from the distance between particles and the radii alone, the magnitude of the force can be computed. But already with a single particle, we can see that shape has a crucial influence on the outcome. If one puts marbles on a slightly inclined surface, they will roll away, while dice won't. So shape matters, or, as Kepler put it: where there is matter, you have to deal with geometry ('Ubi materia ibi geometria' [10]). Not even the central forces of planets can be dealt with adequately by considering only circular trajectories. The use of round DEM particles introduces central forces into physical systems where none are common in nature. Round particle simulations do have their uses as test cases during program development for non-spherical particle simulations. For instance, a first implementation to test the interplay between integrator, force law and neighborhood algorithm can be done with round particles. (If the implementation does not work properly with round particles, it also will not work with other shapes.) Program 7.4 is a driver program for simulating a vertical column of particles, each with mass 1 and diameter 1, under the influence of gravity. It calls the MATLAB® integrator ode113 with the force computation function DEMround1D (Program 7.5), and plots the trajectories as shown in Figure 7.4(a). The simulation time, number of particles, etc. can be modified easily. Program 7.5 contains the actual interaction computations. The nested loop computes the interactions between all particles with index i_part and all other particles with index j_part. For a large number of particles, this double loop will run over many non-interacting pairs, and the simulation becomes inefficient, a problem which will be dealt with in § 7.5 on neighborhood algorithms. Computation of the magnitude and the direction of the force are separated to obtain a code which can be rewritten easily for two dimensions.

7.1.4

Simulation of round particles in two dimensions

The code for the one-dimensional simulation from the previous subsection can easily be generalized to two dimensions, as is done in Programs 7.6–7.7. A horizontal coordinate is introduced, and the magnitude of the force is computed from the particles’ distances and radii in the same way as for the one-dimensional case, except that the computation of the direction must be adapted to two dimensions. The code is, however, still unphysical, as it includes neither rotation nor friction. A single frame of the graphical output is shown in Figure 7.4(b).


Program 7.4 Driver program that calls the function DEMround1D (Program 7.5) and produces graphical output of the trajectory.

clear all
format compact
n_part=5
% initialize radius and mass
global rad, rad(1:n_part)=0.5;
global m, m(1:n_part)=1;
global E, E=1000; % Young's modulus
global lmax, lmax=2*n_part+2;
global lmin, lmin=0;
global g, g=-9.81;
% initialize positions and velocities=0
r0=2*[1:n_part];
v0=r0*0;
y0(1:2:2*n_part-1)=r0;
y0(2:2:2*n_part)=v0;
t_end=4
[t,y]=ode113('DEMround1D',[0 t_end],y0);
hold on
for i=1:n_part
  plot(t,y(:,2*i-1),'ko-')
end
axis([0 max(t) lmin-.5 lmax+.5])
return

The dynamics of (dense) granular materials is governed by a competition between rolling and sliding. If the particle shapes make rotation impossible, the dynamics will be governed by sliding alone. For round DEM particles, however, rotation is possible with relatively small mechanical resistance and much lower energetic cost. Even regular polygons do not behave exactly like circles: the finite length of their edges always produces a finite torque needed for rolling.

7.2

Modeling of polygonal particles

7.2.1

Initializing two-dimensional particles

Using two-dimensional particles like those in Figure 7.5(a) may at first glance seem to be a makeshift approach, compared with the three-dimensional reality. Nevertheless, if in the


Program 7.5 Force computation function DEMround1D to be called with the driver Program 7.4.

function [dydt]=DEMround1D(t,y);
global m rad E lmax lmin g
n_part=length(m);
if length(y)~=2*length(m)
  error('length of y must be twice the length of m')
end
if length(rad)~=length(m)
  error('length of r must be twice the length of m')
end
a=zeros(1,n_part);
for i_part=1:n_part
  x1=y(2*i_part-1);   % position of first particle
  rad1=rad(i_part);
  % Particle-Particle Interaction
  for j_part=i_part+1:n_part
    x2=y(2*j_part-1); % position of second particle
    rad2=rad(j_part);
    if (abs(x2-x1) < rad1+rad2) % particles overlap
      % ...
    end
  end
end

% (tail of Program 8.2, which tests the vertices Vi of polyhedron P1
%  against the faces of polyhedron P2; n, d: point-normal form of the
%  face plane, condition reconstructed:)
    if n · Vi > d
      Exit the inner loop for faces % Vi is outside of P2
    end
  end
  Record Vi in the list of vertices inherited from P1
end

A numerical tolerance is included in the algorithm; for a chosen precision, e.g. an absolute error ε = 10⁻¹⁴, a point whose distance to a plane is within ±ε will be regarded as being on the plane. We only treat those vertices which penetrate into the other polyhedron further than ε as being inherited vertices. Upon running Program 8.2, we obtain a list of the vertices of P1 inside P2 and a list of the vertices of P2 inside P1, and from these we can obtain not only the coordinates of the inherited vertices but also the topological information about the faces on which those vertices are located.

8.3.5

Determination of generated vertices

The generated vertices of P0, indicated by stars in Figure 8.14, are the intersection points of the triangular faces of P1 and P2. To compute them, we have to resort to the triangle intersection algorithm introduced in § 8.3.2. In the current code, we call Program 8.1, which uses the point–normal form to represent a plane. We can compute the generated vertices by brute force, i.e. by first computing the intersections of all the faces of P1 with all the faces of P2. For two polyhedra with n_f faces each, this involves O(n_f²) operations, as can be seen from Program 8.3. Then we index the intersection points as the generated vertices, which is done in Program 8.4. From Program 8.3 we obtain a list of pairs of intersection points, which is then used by Program 8.4 to determine each generated vertex and its coordinates. In addition, we also get a list of pairs of intersecting faces, contact_face_pair in Program 8.3, which will be used to determine the faces of the overlap polyhedron and the contact line. We need to index the intersection points as the generated vertices, or else each generated vertex would enter the list of intersection point pairs (intersect_point_pair in line 11 of Program 8.3) at least twice. Most of the time, an intersection point of two triangles would come from the intersection of an edge of one triangle with the interior of the other triangle, as in Figure 8.11(e). For a polyhedron, each edge is always shared by two triangular faces. Thus, if an edge of one face of polyhedron P1 intersects a face of polyhedron P2, the face of P1 which shares that edge would also intersect the same face of P2 and report the same intersection point again, as shown in Figure 8.15.


Program 8.3 Program to compute all the generated vertices by 'brute force', i.e. by computing the intersections of all the faces of one polyhedron with all the faces of the other polyhedron.

% Compute generated vertices: Part I
num_int_pair = 0 % number of pairs of intersecting faces
forall faces F1i of polyhedron P1
  forall faces F2k of polyhedron P2
    call compute_triangle_intersection(F1i, F2k) % defined in Program 8.1
    if two intersection points Vint1 and Vint2 exist
      num_int_pair = num_int_pair + 1
      % Record (F1i, F2k) in a list of contact face pairs
      contact_face_pair(1:2,num_int_pair) = (F1i, F2k)
      % Record the two points in a list of pairs of intersection points
      intersect_point_pair(1:2,num_int_pair) = (Vint1, Vint2)
    end
  end
end

Program 8.4 Program to index the intersection points as generated vertices.

% Compute generated vertices: Part II
% Assign the first two intersection points as the first two generated vertices
vert_gen(1:2) = intersect_point_pair(1:2,1)
% Assign the indices of the generated vertices for the intersection point pairs
intersect_point_pair_idx(1:2,1) = (1:2)
vert_idx=2 % initialize the counter for the generated vertices
for i=2:num_int_pair
  for j=1:2
    Vtrial = intersect_point_pair(j,i)
    if Vtrial is not in the list of generated vertices vert_gen
      vert_idx=vert_idx+1
      vert_gen(vert_idx) = Vtrial
      intersect_point_pair_idx(j,i) = vert_idx
    end
  end
end

This is the usual case in our polyhedral intersection computation, but exceptional cases may also occur, i.e. edge–edge intersections (in the discussion of Program 8.1; see also Figure 8.16). If an intersection point comes from an edge–edge intersection, it may enter the intersect_point_pair list four times. We will refer to such cases as degenerate cases for overlap computation (although they would not be exceptional from the point of view of triangle intersection computation), which would necessitate additional arrangements to index the generated vertices.


Figure 8.15 The case where an intersection point will be recorded twice in the list of intersection points in the overlap polyhedron computation. (a) The triangle–plane intersection case of Figure 8.11(e) is checked again for the overlap computation; the edge E of a triangular face F1 intersects with the shaded triangle of the other polyhedron at Vi . (b) The face F2 shares the edge E with face F1 and intersects the shaded triangle also at Vi . Thus the intersection point Vi will be recorded twice in the loop for computing the triangle intersections of the two polyhedra.


Figure 8.16 The degenerate cases in the overlap polyhedron computation: one intersection point of the two triangular faces F1 and F2 comes from the single edge-edge intersection on the left, and two points come from the two edge–edge intersections on the right. The circles are intersection points obtained from edge–plane intersections, while the black dots are from edge–edge intersections. As can be seen from Figure 8.15, each edge is shared by two triangular faces, which means that each intersection point obtained from an edge–edge intersection would be recorded twice for the edge of F1 and twice for the edge of F2 .

The reason will usually be a penetration of two particles which can be regarded as unphysical, caused by, e.g., a too-large time step or wrong initialization of particle positions. Therefore, we need to index the generated vertices in the intersection point pair list intersect_point_pair to identify the generated vertices and record their coordinates. Simultaneously, we also obtain a list of segments in terms of the generated vertex indices, the intersect_point_pair_idx list in Program 8.4, which is used to determine the contact line. The brute-force approach (Program 8.3) has computational complexity O(n_f²), which means that for polyhedra P_i with n_{f,i} faces (i = 1, 2), in total the computation runs over n_{f,1}·n_{f,2} triangle pairs to look for intersection points, and returns no intersection most of the time.


Figure 8.17 The vertices of the overlap polyhedron (magnified on the left) obtained after computing the inherited vertices (circles) and generated vertices (stars) from the two intersecting polyhedra on the right (same as the polyhedra in Figure 8.14). The vertices obtained are points scattered in space, and we need to find the topological relations among them, namely the faces, to determine the overlap polyhedron.

In § 8.4, we discuss algorithms that decrease the simulation time by significantly reducing the number of triangle pairs considered when computing both inherited and generated vertices. For the time being, with the results given by brute-force methods (Program 8.2 for inherited vertices and Programs 8.3–8.4 for generated vertices), we obtain all the vertices of the overlap polyhedron, as in Figure 8.17.

8.3.6

Determination of the faces of the overlap polyhedron

When all vertices, inherited and generated, of the overlap polyhedron P0 have been computed as scattered points in space, we need to determine the topological relations among these vertices to obtain the faces of the overlap polyhedron. As soon as the vertex coordinates and the faces in terms of vertex indices are known, we can proceed to compute the volume and center of mass of the overlap polyhedron P0 as described in § 8.2.3. Similar to the vertices, which are partly inherited from P1 and P2 and partly generated from triangular face intersections, the faces of P0 can also be classified into generated faces and inherited faces. The inherited faces are those faces of P1 whose three vertices are all inside P2, or vice versa; see Figure 8.18(a). The generated faces are parts of the original faces of P1 and P2 which are bounded by generated vertices, or by generated vertices together with inherited vertices; see Figure 8.18(b) for an example. In contrast to generated vertices, which all originate from the intersection, generated faces are not 'totally new' but are parts of the intersecting faces of P1 and P2.


Figure 8.18 Example of an inherited face Fi and a generated face Fg for the overlap polyhedron of the two tetrahedra P1 and P2. (a) Since the three vertices Vi of the dark-gray triangular face of P1 all lie inside P2, that face is an inherited face of the overlap polyhedron. (b) Only one vertex Vi of P1 lies inside P2, so there are no inherited faces; the gray triangle in the face F of P2, which consists of three generated vertices Vg, is a generated face of the overlap polyhedron, produced by the intersections of F with the faces of P1 which meet at Vi. Faces which have both generated vertices Vg and inherited vertices Vi are also generated faces.

With the list of the intersecting face pairs (contact_face_pair in Program 8.3) obtained from computing the generated vertices, what remains to be determined for those faces are the indices of the generated vertices located on them. For the inherited faces, instead of finding them directly by checking their vertices, we make use of the VERTEX_FACE_TABLE array (which stores for each vertex all the faces it is located on, as described in § 8.2.1). For each inherited vertex Vk, we check all its faces in VERTEX_FACE_TABLE: if a face has already been registered as a face of P0, we register Vk as a vertex of that face of P0; if a face has not been registered as a face of P0, we register that face as a new entry in the face list of P0 and register Vk as an inherited vertex for this newly registered face. In this way, we not only register the inherited vertices on the generated faces they may belong to, but also find the inherited faces. The algorithm for finding the faces of the overlap polyhedron P0 that come from P1 (respectively, P2) is summarized in Program 8.5.

Although the faces of the original polyhedra P1 and P2 are triangles, generated faces are not necessarily triangular, as can be seen from the generated face formed by two inherited vertices Vi and two generated vertices Vg in Figure 8.18(a). Since our formulae (and the corresponding subroutines in the DEM code) for computing the physical properties of a polyhedron are based on triangular faces, to obtain the volume and center of mass of the overlap polyhedron we need to triangulate those generated faces with more than three vertices. For this purpose, we have devised two algorithms to determine the relative orientations of the vertices of a generated face and to order them counterclockwise: one method uses the centroid of the generated face (Figure 8.19), and the other uses an edge (Figure 8.20).

For the method which uses the centroid, we first need to set up a reference system for ordering the orientations of the vertices to be sorted. We choose the origin to be the centroid C(Cx, Cy, Cz) of the generated face. If the face has k vertices Vi(Vix, Viy, Viz), the centroid is given by the arithmetic mean of the vertex coordinates:

    C = (1/k) Σ_{i=1}^{k} Vi.    (8.39)


Program 8.5 Algorithm to determine the faces of the overlap polyhedron P0 which come from faces of polyhedron P1; the same operations have to be performed with the faces of polyhedron P2.

% faces generated from intersections with P1
forall faces Fi of polyhedron P1
  if face Fi has an intersection with polyhedron P2
    register Fi as a face of P0
    find the entry of Fi in the list contact_face_pair
    find the vertex indices for all intersection points of Fi
      from the corresponding entries in intersect_point_pair_idx
    register all these vertices as vertices of P0
  end
end
% inherited faces from P1
forall inherited vertices Vk of P1
  forall faces Fi in VERTEX_FACE_TABLE of the vertex Vk of P1
    if Fi is in the list of faces of P0
      register vertex Vk for the face Fi
    else
      register Fi as a new entry in the face list of P0
      register Vk as a vertex of the face Fi
    end
  end
end


Figure 8.19 Ordering of vertices of a generated face with respect to the centroid: Vi is the entry in the vertex index list of the generated face before ordering, and V'_i is the entry in the list after ordering; C, the centroid, is the average of the vertex coordinates; nf is the normal to the face; a unit vector from C to the first vertex V1 is selected as the base unit vector Vb; an auxiliary unit vector Va is defined to be nf × Vb. The angle between Vb and the vector from C to Vi is θi in Equation (8.42). The vertices Vi are ordered according to the values of θi. For the ordered list V'_i, we can obtain a triangulation of the face with triangles (C, V'_i, V'_{i+1}) for i = 1, ..., 5 and (C, V'_6, V'_1).



Figure 8.20 Ordering of vertices of a generated face with respect to an edge: the first two entries V1 and V2 in the vertex list of the generated face are used to define the unit base vector Vb; then cos(θi) for the remaining vertices Vi are computed from Equation (8.43). The larger cos(θi) is, the closer Vi is to the edge V1V2. A triangulation is obtained automatically from the ordered list, with triangles (V1, V'_i, V'_{i+1}) for i = 2, ..., 5.

Next, we define the w-axis to be the normal nf to the generated face, which can be either found in the FACE_EQUATION array (see § 8.2.1) or computed from three non-collinear vertices (see Equation (8.5)). We choose the u-axis to be the unit vector Vb from C to the first vertex in the list, V1:

    Vb = (V1 − C)/‖V1 − C‖.    (8.40)

Then we choose an auxiliary vector Va as the v-axis, defined by

    Va = nf × Vb.    (8.41)

We denote by θi the angle between Vb and the vector from C to Vi. Then sin(θi) and cos(θi) can be computed from vector inner products as follows:

    sin(θi) = (Vi − C) · Va,    cos(θi) = (Vi − C) · Vb.


Using the atan2(y, x) function⁵ in FORTRAN, we can obtain the angle θi of each Vi with respect to the unit base vector Vb:

    θi = atan2(sin(θi), cos(θi)),    θi → 2π + θi if θi < 0.    (8.42)

Finally, we sort the Vi according to the angles θi, and obtain a list of vertices ordered counterclockwise, as shown in Figure 8.19. The above algorithm is self-sufficient, i.e. no additional information is necessary besides the coordinates of the vertices to be ordered. It can therefore be used in other applications as a general method for ordering a set of disordered points on a plane.

Our overlap computation, however, provides other convenient information which can facilitate the ordering process, namely the information on the intersection segments (the intersect_point_pair_idx array in Program 8.4), which are the edges of the overlap polyhedron. Our second ordering method, illustrated in Figure 8.20, makes use of this edge information. When we register the vertex indices of the intersection points for a generated face, the first two vertices V1 and V2 are always recorded in the list as a pair, which means that for a generated face we know at least one edge V1V2 from the first two vertices in its vertex index list; we define our base vector Vb as the unit vector in the direction of the edge from V1 to V2:

    Vb = (V2 − V1)/‖V2 − V1‖.

Since the angles of all the other vertices Vi (i = 3, ..., k) with respect to Vb cannot be larger than π, instead of computing the angles θi it suffices to compute only their cosine values:

    cos(θi) = (Vi − V1) · Vb / ‖Vi − V1‖    (8.43)

for i = 3, ..., k. Then we sort the Vi according to these cos(θi) values. The result from this method can be either a counterclockwise or a clockwise ordering, since the normal of the face has not been taken into account. Triangulating the generated face with an ordered vertex list is then trivial; see Figure 8.20. The two ordering methods produce two different triangulations, as shown in Figures 8.19 and 8.20, with the triangles formed by either C or V1 together with two successive vertices after ordering. The latter method, i.e. the one which makes use of the edge information, is implemented in our DEM simulations.
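For illustration, a minimal Python sketch (ours, not the book's code) of the centroid-based ordering of Equations (8.39)–(8.42) and the resulting fan triangulation might look as follows; the unit face normal nf is assumed to be known:

import numpy as np

def order_and_triangulate(verts, nf):
    """Order the vertices of one generated face and fan-triangulate it."""
    verts = np.asarray(verts, dtype=float)
    C = verts.mean(axis=0)              # centroid, Equation (8.39)
    Vb = verts[0] - C
    Vb /= np.linalg.norm(Vb)            # u-axis, Equation (8.40)
    Va = np.cross(nf, Vb)               # v-axis, Equation (8.41)
    d = verts - C
    theta = np.arctan2(d @ Va, d @ Vb)  # Equation (8.42); the common
    theta[theta < 0] += 2.0 * np.pi     # scale of sin and cos cancels
    ordered = verts[np.argsort(theta)]
    k = len(ordered)
    # fan triangulation (C, V'_i, V'_{i+1}); the index wraps around
    triangles = [(C, ordered[i], ordered[(i + 1) % k]) for i in range(k)]
    return ordered, triangles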

⁵ For a point (x, y) in any of the four quadrants, the atan2(y, x) function gives the angle in radians that the position vector makes with the positive x-axis. The resulting angle is positive (counterclockwise) when y > 0 and negative (clockwise) when y < 0.


Once the coordinates of the vertices and the triangular faces are determined, the volume and center of mass of the overlap polyhedron can be computed using Equation (8.15) and Equation (8.17), based on the polyhedron decomposition method introduced in § 8.2.3.
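Equations (8.15) and (8.17) themselves are given in § 8.2.3; as a stand-in, the following sketch (ours) computes the same quantities with the standard signed-tetrahedron decomposition, under the assumption that the triangle vertices are ordered so that the face normals point outward:

import numpy as np

def volume_and_center(triangles):
    """Volume and center of mass from outward-oriented triangles."""
    vol, weighted = 0.0, np.zeros(3)
    for A, B, C in triangles:
        A, B, C = map(np.asarray, (A, B, C))
        v = np.dot(A, np.cross(B, C)) / 6.0  # signed volume of (O, A, B, C)
        vol += v
        weighted += v * (A + B + C) / 4.0    # centroid of that tetrahedron
    return vol, weighted / vol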

8.3.7 Determination of the contact area and normal

The contact area is the triangulated surface determined by the segments of the intersection line (the line along which the surfaces of the polyhedra P1 and P2 intersect) connected to the center of mass of the overlap polyhedron. The normal direction for the collision is then defined as a weighted average of the normals to the triangles in the triangulated surface. During the computation of the generated vertices (see § 8.3.5), the intersection point pairs from the triangular face intersections are registered in the intersect_point_pair array (in Program 8.3), and the vertex indices (of the overlap polyhedron) for those points are stored in a list, the array intersect_point_pair_idx (in Program 8.4). The pairs in the list form the intersection segments of the two intersecting polyhedra. The intersection segments are connected with each other (which leads to the same point being entered two or more times in intersect_point_pair, as discussed earlier) and form closed paths traversing the surfaces of the two contacting particles; we call these closed paths 'contact lines' (see Figure 8.21).

In DEM simulations of granular materials, it is always assumed that the deformation at the contacts is small: the largest overlap between two intersecting polyhedra is limited to a few percent (depending on the Young's modulus) of the particle diameters. So cases like the one shown in Figure 8.21(a), where there are two or more contact lines, should not occur; only cases like the one in Figure 8.21(b), with a single contact line, are expected in DEM simulations. The existence of more than one contact line would result from an edge of one polyhedron piercing two faces of the other. More than two entries for one intersection point could be caused by an edge–edge intersection (which is treated as a degenerate case for the overlap computation; see, e.g., Figure 8.16), so that two contact lines are joined by the intersection point from the edge–edge intersection.


Figure 8.21 Sketches of the contact lines (thick dashed lines) of two intersecting tetrahedra P1 and P2 (the same situations as shown in Figure 8.18); Vi are the vertices of the overlap polyhedron P0 . (a) There are two contact lines, (V1 , V2 , V3 , V1 ) and (V4 , V5 , V6 , V4 ), which results from an ‘unphysical’ overlap of granular particles in a DEM simulation. (b) There is only one contact line, (V1 , V2 , V3 , V1 ), which is the expected (if exaggerated) case for DEM granular particles.


Program 8.6 Algorithm to determine the points which make up the contact line: go through the list of vertex pairs and connect them in a closed path.

% Copy the first point pair into the contact line array
contact_line(1:2)=intersect_point_pair_idx(1:2,1)
intersect_point_pair_idx(1:2,1) = 0 % clear the entry
V(end)=contact_line(2) % current end vertex of the contact line
for i=2 to num_int_pair % loop over all intersection points
  for j=i to num_int_pair % loop over the remaining intersection points
    Find V(end) as entry k in the list of pairs intersect_point_pair_idx
  end
  Assign the other vertex of this pair to V(end)
  Assign V(end) as a vertex of the contact line, contact_line(i+1)=V(end)
  Eliminate the entry k in the list, intersect_point_pair_idx(1:2,k) = 0
end
if contact_line(1) ≠ V(end)
  error('contact line not closed; degenerate case!')
end

The vertex computation and face determination methods discussed previously in this chapter are valid in general for finding the overlap geometry of any two intersecting polyhedra. In contrast, for our discussion of the contact line and the contact area, we assume hereafter that the overlap of the polyhedra is small and that the edges of one polyhedron do not pierce the other. Thus, we assume that in the intersection point pair list (the intersect_point_pair array) each point appears only twice. To obtain the contact line, we need to connect the contact segments defined by the vertex indices in the intersect_point_pair_idx array; this is done by Program 8.6.

After obtaining the center of mass C0 of the overlap polyhedron and the contact line, we can connect C0 with each vertex on the contact line; this gives us the contact triangles (see Figure 8.22(b)), which define the contact area for the two intersecting polyhedra (Figure 8.22(c)). To define a unique normal direction for the contact area, we first compute the area-weighted normals (not unit vectors!) of the contact triangles (C0, Vi1, Vi2) as follows:

    ni = (1/2) (Vi1 − C0) × (Vi2 − C0).

The directions of the normal vectors are chosen to point towards one of the intersecting polyhedra, say P1; in this case we use the vector from C0 to the centroid CP1 of P1, and specify that

    if ni · (CP1 − C0) < 0, then ni → −ni.

We then take the average of the normals of all k contact triangles, weighted by the areas Ai of the triangles, to obtain the normal for the contact area:

    nc = Σ_{i=1}^{k} Ai ni / ‖Σ_{i=1}^{k} Ai ni‖.
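A compact sketch of this normal computation (our own function and variable names, not the book's code) could read:

import numpy as np

def contact_normal(C0, CP1, contact_line_pts):
    """Area-weighted contact normal from the closed contact line."""
    C0, CP1 = np.asarray(C0), np.asarray(CP1)
    n_sum = np.zeros(3)
    k = len(contact_line_pts)
    for i in range(k):                           # closed contact line
        V1 = np.asarray(contact_line_pts[i])
        V2 = np.asarray(contact_line_pts[(i + 1) % k])
        ni = 0.5 * np.cross(V1 - C0, V2 - C0)    # area-weighted normal n_i
        if np.dot(ni, CP1 - C0) < 0:             # orient towards P1
            ni = -ni
        Ai = np.linalg.norm(ni)                  # triangle area A_i
        n_sum += Ai * ni                         # sum of A_i n_i
    return n_sum / np.linalg.norm(n_sum)         # unit contact normal n_c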


Figure 8.22 (a) Two overlapping polyhedra, showing the contact line (thick black line) and the generated vertices (stars). (b) The overlap polyhedron, together with the generated vertices (stars), the inherited vertices (circles), the centroid C0 , and the triangles formed by C0 and the segments of the contact line. (c) The magnified and rotated contact area, bounded by the contact line (thick line) and showing the contact triangles along with their centroids (crosses) and normals (arrows), which are scaled by the areas of the triangles.

In summary, the overlap computation proceeds in the following steps:

1. Find the inherited vertices and compute the generated vertices from the intersections of the triangular faces of the two polyhedra.
2. Determine the generated faces and inherited faces based on the face intersection information.
3. With the vertex coordinates and the vertex index list for the faces, compute the volume of the overlap polyhedron and its centroid.
4. Join the intersection line segments to determine the contact line, and then determine the contact area by constructing the contact triangles from the centroid of the overlap polyhedron and each pair of successive vertices on the contact line.

8.4 Optimization for vertex computation

The brute-force way of computing the inherited vertices (Program 8.2) and generated vertices (Program 8.3) would mean that for two polyhedra each with n_f faces and n_v vertices, n_f · n_v intersections have to be computed to determine the inherited vertices, and n_f² triangle intersections need to be computed to determine the generated vertices. For particles with few features, the impact of all these operations on the simulation time might not be so severe; but for particles with many features (e.g. if one should attempt to approach the 'limit' of spherical particles by increasing the number of faces), the necessary computational effort becomes prohibitive. Therefore, there is an incentive to optimize the vertex computation. One way to reduce the time consumption is to identify regions where there is an actual physical overlap due to the coordinates of features, and work only in the neighborhood of such regions.


8.4.1 Determination of neighboring features

The principles of the neighborhood algorithms (Chapter 7, § 7.5, in particular the section on bounding boxes, § 7.5.2.1) for identifying adjacent particles according to their bounding boxes can be extended to the features of polyhedral particles. Similar to the scenario in contact detection, where only neighboring particles are involved in contact, for a pair of intersecting polyhedra only those features of one polyhedron which are in the 'neighborhood' of features of the other polyhedron can intersect and form the overlap polyhedron. We outline two methods to identify such 'neighboring' features: the overlap bounding box method and the extremal vertex projection method. Both make use of the information from the neighborhood algorithm (see § 7.5.2) and of the projection algorithm of § 8.5.3, which refines the contact particle pair list obtained from the neighborhood algorithm.

The overlap of two polyhedra must be computed if their bounding boxes overlap; Figure 8.23 shows an equivalent example in two dimensions. The overlap polyhedron, if it exists, is located inside the overlap region of the two bounding boxes, which we call an overlap bounding box; see Figure 8.23(a). We treat those vertices which are located inside the overlap bounding box as neighboring vertices, and those faces which have at least one vertex inside the overlap bounding box as neighboring faces. In Figure 8.23(a), the shaded area is the overlap bounding box within which the neighboring vertices (circles) are located, and the neighboring edges (thick lines) have at least one of their endpoints inside the shaded area. In the neighborhood algorithm, only when the bounding boxes of the two polyhedra overlap in the x-, y- and z-directions will the pair be registered in the contact list. Thus, to obtain the overlap bounding box, we have to record the overlaps along the x-, y- and z-axes explicitly. From the vertices of a polyhedron we can obtain a list of neighboring vertices, and from the VERTEX_FACE_TABLE array the faces on which the neighboring vertices are located can be found and then registered as neighboring faces.

An alternative way to define the overlap region involves 'recycling' the projections of vertices used for refining the contact list (see § 8.5.3).


Figure 8.23 Two methods to determine neighboring features; in each case the overlap region is shown as a shaded area, the neighboring vertices are indicated by circles, and the neighboring edges are drawn as thick lines. (a) In the overlap bounding box method, the overlap region is defined to be the overlap of the two bounding boxes. (b) In the projection method, the overlap region is confined by the extremal projections onto the line connecting the centers of mass of the two particles.


The vertices are projected along the vector connecting the centers of mass of the two particles; see Figure 8.23(b). The maximal projection from one polyhedron and the minimal projection from the other determine an overlap region; only those vertices whose projections lie inside this region are considered to be neighboring vertices and are relevant to the overlap computation. The neighboring faces (or edges in 2D) are those faces which have at least one vertex falling inside the overlap region. After checking the projections of the vertices against the overlap region formed by the extremal projections, we obtain the lists of neighboring vertices and neighboring faces as well.

Provided the projections of the vertices are known, a vertex in the projection method requires only two comparisons, while with the overlap bounding box method it would need six comparisons, three times as many (in the worst case). For simple geometries, the two methods give the same neighboring features, as we can see from Figure 8.23; for complicated geometries, however, the resulting features may differ, so it becomes advantageous to use a combination of the two methods: only those vertices yielded by both methods are considered neighboring vertices, and edges or faces must have at least one vertex inside both regions. If the methods are used in combination, it is better to obtain a list of vertices from the projection method first and then refine the list by the bounding box method. In our current code, the projection method is used; the overlap bounding box method and its combination with the projection method are still under development.
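A sketch of the overlap bounding box filter (our data layout, not the book's: boxes as (min corner, max corner) arrays, faces as vertex index triples) might look like this:

import numpy as np

def neighboring_features(verts, faces, box1, box2):
    """Neighboring vertices and faces from the overlap bounding box."""
    lo = np.maximum(box1[0], box2[0])  # corners of the overlap
    hi = np.minimum(box1[1], box2[1])  # bounding box
    verts = np.asarray(verts)
    inside = np.all((verts >= lo) & (verts <= hi), axis=1)
    nb_vertices = np.nonzero(inside)[0]
    # a face is 'neighboring' if at least one of its vertices is inside
    nb_faces = [f for f in faces if any(inside[i] for i in f)]
    return nb_vertices, nb_faces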

8.4.2 Neighboring features for vertex computation

When we have a list of neighboring vertices and a list of neighboring faces for each polyhedron, we can check just the neighboring vertices of one polyhedron against the neighboring faces of the other one. Thus, Program 8.2 for the computation of the inherited vertices can be optimized, and we obtain Program 8.7. For the generated vertices, instead of computing the intersection points for all pairs of triangular faces of the polyhedra (Program 8.3), we compute only the local triangular face pairs, as in Program 8.8. The computational effort of the optimized algorithms is O(m²), where m is the number of neighboring features. For polyhedra with a small number of vertices and faces, the improvement can be expected to be insignificant.

Program 8.7 Algorithm to determine the vertices of P0 which are inherited from P1 via neighboring features; the same procedure must be applied for the vertices inherited from P2.

Define d as the distance between the features which can still be resolved
forall neighboring vertices Vi of P1
  forall neighboring faces Fk of P2
    if distance(Vi, Fk) > d
      Exit the inner loop % vertex Vi is outside P2
    end
  end
  Record Vi in the list of inherited vertices from P1
end


Program 8.8 Algorithm to compute all the intersection points between polyhedron P1 and polyhedron P2 via neighboring features.

Set num_int_pair = 0 % number of pairs of intersecting faces
forall neighboring faces F1i of polyhedron P1
  forall neighboring faces F2k of polyhedron P2
    Call compute_triangle_intersection(F1i, F2k) % Program 8.1
    if two intersection points Vint1 and Vint2 exist
      num_int_pair = num_int_pair + 1
      % record the pair (F1i, F2k) in a list of contacting faces
      contact_face_pair(1:2,num_int_pair) = (F1i, F2k)
      % record the two points in a list of intersection point pairs
      intersect_point_pair(1:2,num_int_pair) = (Vint1, Vint2)
    end
  end
end

However, for polyhedra with a large number of vertices and faces, with small deformations at contact, the number of neighboring features would be merely a small fraction of the total number of features; therefore, the effort for the overlap computation would be greatly reduced compared to the brute-force algorithms of Programs 8.2 and 8.3, as well as to [10].

8.5 The neighborhood algorithm for polyhedra

In three dimensions, there are more possible neighbors than in two dimensions, so the overlap computation is more expensive. Accordingly, some further considerations are necessary.

8.5.1 'Sort and sweep' in three dimensions

For the 'sort and sweep' algorithm in three dimensions, the situation is analogous to that in two dimensions, except that now there are more cases which may lead to double entries in the contact list. The algorithm works as in one dimension, except for the following cases: in Figure 8.24(a)–(c) the new particle pair (i, j) would be entered only once in the neighborhood list for the x-coordinate, in (d)–(f) it would be entered twice, and in a situation like (g) it could even be entered three times. Searching the lists for double or triple entries is even more inconvenient than in two dimensions. Again, we can make use of the information about the old bounding boxes from the previous time-step, as in the following piece of pseudo-code:

1. If there is a new overlap in the x-direction for a pair of particles, the pair is added to the list of pairs; this includes cases (a), (e), (f) and (g) in Figure 8.24.
2. If there is a new overlap in the y-direction for a pair of particles, the pair is added to the list of pairs only if there was an overlap of the bounding boxes in the x-direction in the previous time-step; this is true for cases (b) and (d) in Figure 8.24.
3. If there is a new overlap in the z-direction for a pair of particles, the pair is added to the list of pairs only if there was an overlap of the bounding boxes in the x- and y-directions in the previous time-step; this covers case (c) in Figure 8.24.



Figure 8.24 Relative movement of bounding boxes in three dimensions: (a)–(c) the candidate overlap pair (i, j) would be entered only once in the contact list; (d)–(f) the pair would be entered twice; (g) the pair would be entered three times, if no precautions are taken.

With this scheme, no double entry of pairs can occur in three dimensions. In principle, the subroutines for constructing the contact list in the x-, y- and z-directions via sorting can be computed in parallel, which makes it suitable for implementations with shared-memory parallelism [11].
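In set notation, the three rules might be sketched as follows (our data layout, not the book's: for each axis, the sets of pairs whose bounding boxes overlap along that axis in the previous and the current time-step):

def register_new_pairs(old, new):
    """old, new: dicts axis -> set of particle index pairs (i, j)."""
    candidates = set()
    for pair in new['x'] - old['x']:           # rule 1: new x-overlap
        candidates.add(pair)
    for pair in new['y'] - old['y']:           # rule 2: new y-overlap
        if pair in old['x']:                   # x overlapped before
            candidates.add(pair)
    for pair in new['z'] - old['z']:           # rule 3: new z-overlap
        if pair in old['x'] and pair in old['y']:
            candidates.add(pair)
    # only pairs overlapping along all three axes are actual
    # bounding-box overlaps
    return candidates & new['x'] & new['y'] & new['z']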

8.5.2 Worst-case performance in three dimensions

For physically plausible configurations (particles in disorder, all particles of different sizes), 'sort and sweep' behaves very economically; compared with the polyhedral overlap computation, its time consumption is negligible. There are, however, some artificial cases which can lead to a considerable downgrading of performance. If, instead of polyhedral particles, round particles are simulated, the relative time consumption of the overlap computation is reduced. If all particles are of the same diameter and the simulation uses a box-like geometry, with particle centers on a square grid as shown in Figure 8.25, 'lattice vibrations' may occur. For long-wavelength lattice vibrations, a rather large proportion of particles may change their relative positions in a comparatively short time.


Figure 8.25 Worst-case performance of the sort and sweep algorithm: (a) In two dimensions, with particles on an ordered l_x × l_y grid, for long-wavelength oscillations of the configuration in the y-direction, up to l_x particles may change position in each row of the contact list for the y-direction. (b) In three dimensions, with particles on an ordered l_x × l_y × l_z grid, for long-wavelength oscillations normal to the x-z plane, up to l_x × l_z particles may change position in each row of the contact list for the y-direction.

While this is not too problematic in two dimensions—where for square systems of N particles, about √N bounding boxes would be affected, as in Figure 8.25(a)—in three dimensions with cubic geometries a larger fraction, up to N^(2/3) particles, might be affected, as in Figure 8.25(b). Nevertheless, for such particle geometries, neighborhood tables would work anyway.

8.5.3 Refinement of the contact list

The 'sort and sweep' algorithm will yield some particle pairs which are close but do not actually overlap, especially when features are aligned diagonally with the axes. In that case, one still wishes to avoid the full overlap computation by undertaking some computationally inexpensive pre-processing of the pairs: particle pairs which cannot have an intersection due to additional geometric constraints should not be passed to the intersection computation. This can be achieved either by creating a 'reduced contact list', where non-intersecting particle pairs from the original contact list have been eliminated, or by calling additional functions immediately before the intersection computation. Note that the original contact list should not be manipulated: if particle pairs are eliminated from that list erroneously, and these particles begin to have contact in a later time-step, they cannot be recovered any more, as the 'sort and sweep' algorithm deals with them only in the particular time-step where their bounding boxes start to overlap. The following approaches can help to speed up the simulation:

Comparison of bounding boxes of the features. Faster than performing an intersection computation for the triangular faces is to compare the bounding boxes of the features (vertices, edges, triangular faces). If there is no possible overlap, the intersection computation can be skipped.


Projection of extremal vertices. If there are features with possible overlaps, in the next step one computes the projections of their extremal vertices onto the line which connects the centers of mass of the particles. For two polyhedra P1 and P2 with centers of mass C1 and C2, the following steps are needed (see the sketch after this list):

1. Compute the unit vector u for the connection between C1 and C2:

    u = (C2 − C1)/‖C2 − C1‖.

2. Compute the projections of the vertices of P1 and of P2 onto u. The simultaneous computation of the projection of several vectors onto another vector can be performed efficiently by using a matrix–vector product.
3. Find the maximally protruding projection of the vertices of P1 (in the direction of u), max_projection_P1. Find the minimum of the projections of the vertices of P2 (in the opposite direction to u), min_projection_P2.
4. If max_projection_P1 > min_projection_P2, pass the particle pair to the intersection computation; otherwise, there is no overlap.

As can be seen in Figure 8.26(b), the projection algorithm can deal with particles of very elongated shape, though the overlap of the projections is only a necessary, not a sufficient, condition.
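A minimal sketch of steps 1–4 (our function names; verts1 and verts2 are (n, 3) vertex arrays, c1 and c2 the centers of mass):

import numpy as np

def may_overlap(verts1, verts2, c1, c2):
    """Necessary (not sufficient) projection test for a particle pair."""
    u = (c2 - c1) / np.linalg.norm(c2 - c1)  # step 1: unit vector
    p1 = np.asarray(verts1) @ u              # step 2: matrix-vector
    p2 = np.asarray(verts2) @ u              #         products
    return p1.max() > p2.min()               # steps 3-4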


Figure 8.26 Refinement of the contact detection results via projection in two dimensions: (a) The maximal projection of polygon 1 and the minimal projection of polygon 2 overlap, so this pair is kept in the contact particle pair list. (b) There is no overlap of the maximal projection of polygon 1 and the minimal projection of polygon 2, so this pair is removed from the contact list, and the reduced contact list is passed to the computation of the overlap polygon.



Figure 8.27 Refinement of the contact detection results via bounding circles in two dimensions (the analog of bounding spheres in three dimensions): (a) The bounding circles (dotted lines) intersect, the bounding boxes (dashed lines) overlap, and the polygonal particles also overlap. (b) The bounding circles are non-intersecting, the bounding boxes overlap, but the polygonal particles do not overlap. (c) The bounding circles intersect, the bounding boxes overlap, but the polygonal particles, which are elongated, do not overlap. Neither bounding boxes nor bounding circles are good indicators of the overlap between particles.

There are various other obvious approaches, but they turn out not to be efficient:

'Bounding spheres' (as in Figure 8.27). In this approach, when spheres circumscribed around the particle vertices do not overlap, the particle pair is treated as non-overlapping. This approach turns out to be inefficient because of the increase in volume compared to the original polyhedron: particles which are indicated as overlapping by their bounding boxes will often also be indicated as overlapping by the bounding spheres.

'Separating planes'. One can construct a sequence of planes touching the nearest corners of the neighboring particles. If one succeeds in constructing a plane such that all vertices of one polyhedron are on one side of the plane and all vertices of the other polyhedron are on the opposite side, there is no overlap of the polyhedra. However, the ratio of detected non-overlapping pairs for a fast (not exact) algorithm [11] turned out to be lower than for the projection algorithm, while the computational effort for this method was in itself so great that it was inefficient to use it to pre-process particle pairs before applying the projection algorithm.

8.6 Programming strategy for the polyhedral simulation

For a start, getting some experience with two-dimensional polygonal simulations is advised: the amount of geometrical information one has to deal with considerably exceeds that for spherical particle simulations. What has been said in Chapter 7 about modularization and program flow for two-dimensional simulations applies also to three-dimensional simulations. One should be aware that any problem that has not been dealt with at an earlier programming stage (exception handling, error messages, confusing function interfaces) will come back to haunt one in the later stages to a much greater extent than for two-dimensional or spherical particle codes—trying to muddle through is a bad idea. In particular, the remarks on program development in Chapter 10 should be taken seriously.


As always in computer simulations, one should begin with a few particles, and increase the number of particles only slowly. Carrying out the programming tasks in the following order guarantees that only minimal modifications will be needed to obtain the next stage of functionality in the simulation:

1. Starting from a polygonal simulation (e.g. using squares), first introduce appropriate three-dimensional graphics as a debugging tool, so that the polygons move in a plane in three dimensions; see Figure 8.28(a).
2. Now the data structures for polyhedra can be added, with their corresponding graphics displayed 'over' the polygons. When the functionality for polyhedra is added, it should be in a module different from that for the corresponding two-dimensional functionality. To begin with, it is advisable to use simple polyhedra, such as regular octahedra with one square base aligned with the two-dimensional simulation plane. (Tetrahedra are not so suitable due to their sharp corners, which may lead to problems in later stages when intersection and overlap algorithms are being programmed.)
3. With the code so far, three-dimensional octahedra will move in a plane as in Figure 8.28(b). Choose a suitable angle, viewpoint and line resolution for the graphics of this 'pseudo-3D simulation'.
4. At this point, one is ready to start programming the functions for the intersection computation. First the edge–triangle intersection and then the edge–edge intersection should be programmed in functions which are not part of the simulation code (but which can still be visualized with the graphics). Test and debug these functions using test triangles which are explicitly assigned in the code, to easily reproduce intermediate results during debugging (choosing shapes randomly is not advised at this stage of the program development), together with the data structures of the simulation code for simple portability.
5. When the intersection computation for individually initialized edges and triangles works satisfactorily, one can introduce a third, i.e. z, coordinate for the polyhedra: each polyhedron should move at a different z-height, as shown in Figure 8.28(c), to avoid situations of degenerate geometry when the intersection functions are implemented.


Figure 8.28 Stages of program development for the three-dimensional simulation: (a) polygons in two dimensions; (b) the polygons are augmented to octahedra in the same plane; (c) the polyhedra are shifted to different heights; (d) spheres are inscribed in the polyhedral shapes to be used for the interaction computation; (e) full polyhedral simulation.


6. Next, implement the intersection functions in the pseudo-3D simulation for the whole polyhedra, and display the edge–face and edge–edge intersections. To obtain larger (more visible) intersection segments, the Young's modulus of the two-dimensional interaction should be reduced.
7. If the intersection computation works satisfactorily, one can implement the overlap computation, i.e. the assembly of the contact line and the overlap polyhedron, with the 'brute force' approach (intersection of every face of one polyhedron with every face of the other polyhedron). The numerical values for the overlap volume and the completeness of the contact line, as well as their continuous variation, should be verified 'by eye' and by controlling the corresponding data that are output to a file.
8. More complicated polyhedra can now be introduced to check that the data structures and algorithms work also in the general case.
9. The two-dimensional interaction can then be eliminated, and one can introduce a three-dimensional interaction of elastic spheres inscribed in the polyhedra, as shown in Figure 8.28(d).
10. Further, the restriction of each polyhedron to a single z-height can be dropped, so that the particles can move fully in three dimensions with spherical interaction (but still without rotation). This will allow the testing of overlap situations different from those for purely two-dimensional movement.
11. Implement the elastic force law for the polyhedral overlap (without using it for the time integration) and visualize the output, for example by drawing the vectors which would act between the particles at the appropriate force points. The situation should be observed and checked for plausibility. Monitor the variation of the force for colliding particles that has been output to files.
12. If the variation of the force is smooth in both magnitude and direction, the interaction of the spheres can be replaced with the interaction law for the polyhedra (starting with a small Young's modulus).
13. If the simulation runs without problems, the parts that are still missing—rotation with its equations of motion, damping and friction—can now be implemented. Finally, increase the number of particles and optimize the CPU-intensive parts of the code (intersection computation, neighborhood algorithms etc.).

The issues with time integration for the linear degrees of freedom are the same as in the two-dimensional case; see Chapter 7, in particular § 7.6. However, some additional issues arise which relate to the use of quaternions for the angular degrees of freedom. First of all, the unit quaternions q should be normalized again after the integration, as the integrators usually do not conserve the norm. Further, the time derivatives q̇ of the quaternions q were assumed to be orthogonal to the quaternions in Chapter 1, § 1.3.9 and § 1.3.10, but this orthogonality is lost during the time integration if no precautions are taken. After normalization of the qi (for each particle i), their respective time derivatives q̇i should be orthogonalized. The normalization and orthogonalization steps⁶ make the dynamics of the quaternions a constraint dynamics in the sense of DAEs (see § 2.8).

⁶ An equivalent approach to normalization and orthogonalization was proposed in [12], but in later texts (e.g. [13]) this advice was lost.
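A sketch of this constraint step for one particle (our notation; q and its time derivative qdot as length-4 arrays) could be:

import numpy as np

def constrain_quaternion(q, qdot):
    """Restore |q| = 1 and the orthogonality q . qdot = 0."""
    q = q / np.linalg.norm(q)           # renormalize the unit quaternion
    qdot = qdot - np.dot(qdot, q) * q   # remove the component of qdot
                                        # along q
    return q, qdot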


8.7 The effect of dimensionality and the choice of boundaries

8.7.1 Force networks and dimensionality

In general, two straight lines will nearly always (provided they are not parallel) intersect in two dimensions, and almost never (except when they lie in the same plane) in three dimensions. The consequence for force networks is that force chains with strong forces may easily meet in two-dimensional particle systems and lead to shear bands, destruction of heaps, and so on—but not in three dimensions. Accordingly, all things (friction coefficients, particle elongation, cross-sections) being equal, we may find greater stability of packings, larger angles of repose etc. in three dimensions than in two dimensions. That force concentrations don’t occur so easily in three dimensions might also be the reason that up to now we have not encountered penetration of vertices of one polyhedron through another polyhedron, while in two dimensions such penetration cases are not at all rare for polygons.

8.7.2 Quasi-two-dimensional geometries

Two-dimensional geometries in the x-z plane are often mimicked in experiments by parallel walls spaced narrowly in the y-direction. However, the physical situation is not two-dimensional at all: due to the Janssen effect, narrowly spaced walls take up much more stress in the y-direction than distant walls would, as indicated in Figure 8.29(a) and (b). Therefore, such an attempt to reduce the influence of the third (i.e. y) direction may on the contrary magnify it. Another issue arises with monolayers of particles. In that case, the walls should be rather close, as in Figure 8.29(c) and (d); if the walls are spaced too widely, the forces will be deflected easily towards the walls, or the particles may tilt, as shown in Figure 8.29(e) and (f). These issues are particularly relevant when two-dimensional or quasi-two-dimensional results are to be compared with three-dimensional simulations.


Figure 8.29 Influence of walls for monolayers. For narrowly spaced walls as in (a), arching is stronger than for distant walls as in (b). For monolayers as in (c) and (d), narrowly spaced walls are better able to keep the particles in line, so that the influence of the walls is smaller. For more widely spaced walls as in (e) and (f), dislocation and tilting may deflect forces towards the walls and make their influence stronger.


8.7.3 Packings and sound propagation

When sound waves are produced by the impact of particles on a granular agglomerate, it should not come as a surprise that the sound velocity will in general be higher than the velocity of the impacting particle. Before the impact, the whole mass m of the impacting particle moves with velocity v; but after the impact, the particles which propagate the sound wave move only a tiny fraction of their diameter, so due to momentum conservation, the velocity of propagation of this tiny motion can be much higher. This is the reason for the eye-watering effect of watching a Newton's cradle.

The sound velocity (group velocity of the propagating wave) depends on the packing density of a granular agglomerate and on the contact situation. 'Flat contacts' (edge–edge contacts in two dimensions, or face–face contacts in three dimensions) will lead to a faster propagation speed than 'sharp' contacts (edge–vertex and vertex–vertex contacts in two dimensions, or face–edge and edge–edge contacts in three dimensions). Because in one- and two-dimensional simulations with polygons the particles are effectively rods, the sound velocity will be faster than in three-dimensional simulations with arbitrarily oriented non-elongated particles.

8.8 Further reading

Beyond the references for Chapter 7, a valuable resource for dealing with polyhedra is [3]. Lin [4] discusses optimal searching strategies for rigid polyhedra in contact, but the algorithms need modifications for actually overlapping polyhedra. Parallelization of the threedimensional sort and sweep algorithm is explained in [11]. The sound velocity for polygonal particles in one, two and three dimensions is dealt with in [14, 15].

References

[1] D. Zhao, E. G. Nezami, Y. M. Hashash, and J. Ghaboussi, "Three-dimensional discrete element simulation for granular materials", Engineering Computations, vol. 23, no. 7, pp. 749–770, 2006.
[2] P. Cundall, "Formulation of a three-dimensional distinct element model—Part I. A scheme to represent contacts in a system composed of many polyhedral blocks", International Journal of Rock Mechanics and Mining Sciences & Geomechanics, vol. 25, no. 3, pp. 107–116, 1988.
[3] F. Preparata and M. I. Shamos, Computational Geometry: An Introduction. Springer, 1985.
[4] M. C. Lin, Efficient Collision Detection for Animation and Robotics. PhD thesis, University of California, Berkeley, 1993.
[5] C. B. Barber, D. P. Dobkin, and H. T. Huhdanpaa, "The quickhull algorithm for convex hulls", ACM Transactions on Mathematical Software, vol. 22, no. 4, pp. 469–483, 1996. http://www.qhull.org.
[6] A. Schinner, "Numerische Simulationen für granulare Medien", Master's thesis, University of Regensburg, 1995.
[7] G. H. Golub and C. F. van Loan, Matrix Computations, 3rd ed. Johns Hopkins Studies in Mathematical Sciences, Johns Hopkins University Press, 1996.
[8] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, LAPACK Users' Guide. Society for Industrial and Applied Mathematics, 1992.
[9] S. McConnell, Code Complete: A Practical Handbook of Software Construction. Microsoft Press, 1993.
[10] J. Chen, Discrete Element Method for 3D Simulations of Mechanical Systems of Non-Spherical Granular Materials. PhD thesis, The University of Electro-Communications, 2011.
[11] J. Chen and H.-G. Matuttis, "Optimization and OpenMP parallelization of a discrete element code for convex polyhedra on multi-core machines", International Journal of Modern Physics C, vol. 24, no. 2, article 1350001, 2013.


[12] M. P. Allen, "A molecular dynamics simulation study of octopoles in the field of a planar surface", Molecular Physics, vol. 52, no. 3, pp. 717–732, 1984.
[13] M. P. Allen and D. Tildesley, Computer Simulation of Liquids. Oxford University Press, 1987.
[14] S. A. El Shourbagy, S. Okeda, and H.-G. Matuttis, "Acoustic of sound propagation in granular materials in one, two, and three dimensions", Journal of the Physical Society of Japan, vol. 77, no. 3, article 034606, 2008.
[15] W. S. Cheng, J. Chen, and H.-G. Matuttis, "Granular acoustics of polyhedral particles", in Proceedings of the 7th International Conference on Micromechanics of Granular Media, Sydney, Australia, A. Yu, K. Dong, R. Yang, and S. Luding, eds., vol. 1542 of American Institute of Physics Conference Series, pp. 567–570, American Institute of Physics, 2013.


9 Alternative Modeling Approaches

This book focuses mainly on polygonal and polyhedral simulations, but in this chapter we give an overview of alternative modeling approaches for simulating non-spherical particles.

9.1 Rigidly connected spheres

Just as for polygons (respectively, polyhedra), where several edges (respectively, faces) are rigidly fixed relative to a center of mass and to each other, for clusters of connected spheres there are single spheres ('monomers') fixed with respect to each other. One advantage of using rigidly connected spheres [1] (or disks in two dimensions) is that the overlap computation needs only the computation for the monomers, i.e. geometrically the problem is still one-dimensional (a function of the center distance). This approach is known by various names in the literature, such as 'connected spheres', 'multi-spheres' etc. For polygons or polyhedra, neighborhood algorithms and overlap computations have to be programmed for the whole particle; for clusters of spheres they are programmed for the monomers. As with other three-dimensional particle geometries, the Newton–Euler equations of motion must be implemented for the whole particles.

A drawback of working with connected spheres is the modeling of smooth surfaces. If few spheres are used, the surfaces will be rather ragged, and interlocking between particles can occur easily. This makes it difficult to verify, for instance, the correct implementation of the friction law and the actual value of the friction coefficient. If a great many spheres are used, one obtains a very smooth approximation of the surfaces. Especially for the modeling of walls, there may be no choice other than to use many spheres, or else the ratio of the tangential force (due to friction) to the normal force would be distorted. However, considerable computer time would be needed to update the positions of the monomers at each time-step.
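As an illustration of how simple the monomer-level overlap test is, here is a sketch (ours; a cluster is assumed to be given as a list of (center, radius) monomers in world coordinates):

import numpy as np

def monomer_contacts(cluster1, cluster2):
    """All overlapping monomer pairs between two rigid clusters."""
    contacts = []
    for c1, r1 in cluster1:
        for c2, r2 in cluster2:
            d = np.linalg.norm(np.asarray(c2) - np.asarray(c1))
            delta = r1 + r2 - d      # penetration depth, 1D in the
            if delta > 0:            # center distance
                contacts.append((c1, c2, delta))
    return contacts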


One reason to use clusters of round monomers is to save computer time compared with using mathematically more complex shapes, due to the reduced number of operations required in the overlap computation; however, this advantage will be lost if many monomers have to be used. For two-dimensional simulations of polygons, we have found that from about 128 corners (or edges) onward, the most costly part of the program is the updating of the corner positions (costlier than the overlap computation, the neighborhood algorithm, or the BDF5 integrator). Accordingly, it can be expected that if one increases the number of connected particles to obtain smooth edges, 128 monomers would be the limit in two dimensions, above which clusters become computationally more costly than polygons with few edges. Massive numbers of particles [2] have been used on general-purpose graphics processing units (GPGPU) to obtain smooth surfaces. While single-instruction multiple-data (SIMD) parallelization is relatively easy for round particles, it is impracticable for polygons and polyhedra due to the different conditional operations which have to be executed. Nevertheless, in the case of polyhedra, smooth surfaces can be realized with much smaller numbers of particles.

The implications for stability of using connected clusters of particles are still poorly understood. While simulations with single particles produce single contacts, the use of clusters may produce single or multiple contacts. For single contacts, it is easy to estimate the penetration depth, the minimal contact time (or the relative error in the overlap computation due to the finite time-step) and the necessary time-step. If multiple contacts are possible, especially if 'smooth' straight surfaces should be modeled¹ so that whole rows of particles may be in contact, the penetration depth will be reduced. It should not be forgotten that the numerically computed positions are 'noisy', affected by the integrator and the force and inertia terms computed at discrete times: a time-step which is barely large enough to resolve the evolution of two particles with a single contact may be too small for multiple contacts between the monomers. It is not necessarily the rectilinear degrees of freedom which are affected first, but rather the nonlinear equations for the rotation. If such problems are encountered, possible remedies are to use integrators with better stability properties (e.g. BDF) and, if necessary, smaller time-steps (with consequences for the performance in the latter case).

Another issue with connected particles is the same for all non-spherical models: particles can interlock and may, under external strain, store considerable elastic energy; this energy can then be released by particle motion on a much faster time-scale than the usual oscillations due to Young's modulus and mass. A possible remedy for instabilities due to such sudden releases of stress may be to use BDF methods with multiple corrector iterations.

9.2 Elliptical shapes

If one wants to go beyond circular shapes and investigate the effect of particle elongation, ellipses are the natural choice, but the resulting overlap computations are not without problems. When we deal with polygons, the overlaps are again polygons; when we deal with polyhedra, the overlaps are again polyhedra. But when we deal with ellipses, the overlaps are not ellipses—the conceptual simplicity gets lost when shapes with curved boundaries are used.

¹ For polygons (respectively, polyhedra), the penetration depth will also be smaller for contacts along the edges (respectively, faces). However, for single polygons and polyhedra, the error is relative to the size of the whole particles, whereas for connected clusters the error is relative to the smaller monomers.


In textbooks, ellipses are usually represented in Cartesian coordinates as

    x²/a² + y²/b² = 1,

where a and b are the lengths of the half-axes. This representation describes only ellipses whose half-axes are parallel to the Cartesian axes. For ellipses in general orientation, the equation will contain additional x, y and xy terms, as one can see when one rotates the Cartesian coordinate system by an angle φ:

    x → x cos φ − y sin φ,    y → y cos φ + x sin φ.

The transformed equation is

    (x cos φ − y sin φ)²/a² + (y cos φ + x sin φ)²/b² = 1,

which gives mixed terms when the brackets are multiplied out.
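For later reference, the coefficients of such an implicit form can be assembled directly from the half-axes, the rotation angle and the center; the following helper is our own sketch, not a routine from the book:

import math

def ellipse_coefficients(ra, rb, phi, x0=0.0, y0=0.0):
    """Coefficients (a, b, c, d, e, f) of
    C(x, y) = a x^2 + b x y + c y^2 + d x + e y + f
    for an ellipse with half-axes ra, rb, rotation angle phi and
    center (x0, y0); points inside satisfy C(x, y) < 0."""
    co, si = math.cos(phi), math.sin(phi)
    # quadratic terms from multiplying out the rotated equation
    A = co**2 / ra**2 + si**2 / rb**2
    B = 2.0 * si * co * (1.0 / rb**2 - 1.0 / ra**2)
    C = si**2 / ra**2 + co**2 / rb**2
    # linear and constant terms from shifting the center to (x0, y0)
    D = -2.0 * A * x0 - B * y0
    E = -B * x0 - 2.0 * C * y0
    F = A * x0**2 + B * x0 * y0 + C * y0**2 - 1.0
    return A, B, C, D, E, F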

9.2.1 Elliptical potentials

Elliptical potentials (in closed form, without the need to compute actual overlap areas) have been proposed [3–5], but such potentials can only mimic normal forces and maybe torques. We see no possibility of using them to model dry friction adequately, as no force point can be defined which would allow computation of the tangential velocity. This means that static friction—one of the most essential elements for the modelization of macroscopic bodies— cannot be implemented appropriately.

9.2.2 Overlap computation for ellipses

In the following, we will discuss the approach needed to adapt the force laws for polygons (from Chapter 7) to elliptic shapes, together with the overlap computation and force points. We can represent two general ellipses via the functions

    C1(x, y) = a1 x² + b1 xy + c1 y² + d1 x + e1 y + f1,    (9.1)
    C2(x, y) = a2 x² + b2 xy + c2 y² + d2 x + e2 y + f2,    (9.2)

so that the ellipses are the curves given by

    C1(x, y) = 0,    (9.3)
    C2(x, y) = 0.    (9.4)


Points (x0, y0) in the two-dimensional plane which lie inside the ellipse described by C1(x, y) will satisfy C1(x0, y0) < 0, and points outside will satisfy C1(x0, y0) > 0; analogously for the ellipse described by C2(x, y). We can try to find intersection points of the two ellipses by eliminating a variable, say y, by solving Equation (9.4) for y in terms of x and then substituting the expression into Equation (9.3). This yields a quartic equation

    Ã x⁴ + B̃ x³ + C̃ x² + D̃ x + Ẽ = 0,    (9.5)

which is usually simplified to

    x⁴ + A x³ + B x² + C x + D = 0    (9.6)

(with leading coefficient 1). One can then try to obtain the four roots x1, x2, x3, x4 of (9.6) using Ferrari's formula (the numerically stable variant from Numerical Recipes [6–8]), and calculate the yi corresponding to the xi. Those (xi, yi) pairs which are real will be the intersection points of the ellipses in the plane.

But the bad news is that this procedure will not work in a numerical implementation with double precision. When we count the information contained in the coefficients in Equations (9.3)–(9.4), we find that two times six coefficients means 2 × 6 × 8 = 96 bytes. The quartic equation (9.5), on the other hand, has only five coefficients (if we count the 1 in front of x⁴ as a coefficient), which means only 5 × 8 = 40 bytes, less than half the information contained in the original equations. Thus, in deriving Equation (9.5), more than half the information has been lost. This means that for floating point computations in conventional simulations, where the number of digits is fixed, the information loss leads to a loss of accuracy in the intersection computation. Owing to the need to take third powers of A when Ferrari's formula for fourth-order equations is applied to Equation (9.6) (even in the numerically stable form of [6–8]), considerable rounding errors will be introduced. Test cases with axes-parallel ellipses (which are usually the first examples one will try) may actually be computable without larger errors. In such cases Equations (9.3)–(9.4) simplify to

    a1 x² + c1 y² + f1 = 0,    (9.7)
    a2 x² + c2 y² + f2 = 0,    (9.8)

and therefore the information loss is only 8 of 48 bytes. Nevertheless, for ellipses in arbitrary orientation, the numerical error is prohibitive. A handwaving approximation is that for an mass on the order of 1 and Young’s modulus on the order of 106 , one would obtain an overlap of approximately 10−3 of the radius. Thus, the overlap has to be resolved in fractions of one-thousandths of a radius to obtain a time integration. For the test cases we have computed, the accuracy of the solution was absolutely insufficient. The problem lies not with the solution of Equation (9.6)—it is already present in the transformations from Equations (9.3)–(9.4) to Equation (9.6), which introduces the error; so even an accurate solution of Equation (9.6) gives the wrong answer, as it is not related to the original problem.

9.2.3 Newton–Raphson iteration

Instead of finding the intersections of two ellipses by computing the roots of the corresponding fourth-order equation (9.6), we can try to directly find the solutions (x, y) of the simultaneous second-order equations (9.3)–(9.4); see [9]. The solutions of such nonlinear equations are usually computed by Newton–Raphson iteration. We can find the roots of a function C(x) (i.e. the values of x for which C(x) becomes zero) by applying the following iteration with suitable starting values x0:

$$x_{n+1} = x_n - k \cdot \frac{C(x_n)}{\nabla C(x_n)},$$

where we have written the first derivative C'(x) suggestively with the ∇ symbol, as we will be using the equivalent formula for higher dimensions. For k = 1, we have the original Newton–Raphson iteration; for 0 < k < 1, we get the 'damped' Newton–Raphson iteration, which converges more slowly but does not overshoot as strongly as the method with k = 1. Geometrically, we choose tangents ∇C(xn) to the curve C(x) whose intersection points with the x-axis get successively closer to the root. The convergence is in general quadratic, i.e. the number of correct digits doubles with each iteration. (If C(x) does not cross the x-axis but merely touches it, the convergence is only linear; and if C(x) does not have a root, the algorithm will not converge at all.) If the starting value x0 is unsuitable, the iteration diverges to infinity very fast. For higher-dimensional systems the formula is analogous, but instead of the derivative the Jacobian is used. To find the solutions of the nonlinear system (9.3)–(9.4), i.e. the simultaneous equations C1(x, y) = 0, C2(x, y) = 0, Newton–Raphson iteration needs the derivatives of each function with respect to x and y (i.e. the Jacobian):

$$\nabla C = \begin{pmatrix} \dfrac{\partial C_1(x,y)}{\partial x} & \dfrac{\partial C_1(x,y)}{\partial y} \\[2ex] \dfrac{\partial C_2(x,y)}{\partial x} & \dfrac{\partial C_2(x,y)}{\partial y} \end{pmatrix} = \begin{pmatrix} 2a_1 x + b_1 y + d_1 & b_1 x + 2c_1 y + e_1 \\ 2a_2 x + b_2 y + d_2 & b_2 x + 2c_2 y + e_2 \end{pmatrix}.$$

The equation for the iteration is then

$$\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} x_n \\ y_n \end{pmatrix} - K \cdot \left[\nabla C(x_n, y_n)\right]^{-1} \begin{pmatrix} C_1(x_n, y_n) \\ C_2(x_n, y_n) \end{pmatrix}.$$

If the ellipses are not too elongated, one can use as starting values (x0, y0) the coordinates of the intersection points of circles with radii equal to the longer half-axes of the respective ellipses. However, when the ellipses are very elongated (i.e. when the ratio of the longer half-axis ra to the shorter half-axis rb is greater than 3), the iterations may not converge when the ellipses are unfavorably oriented. In that case one can perform intermediate iteration steps using dummy ellipses with half-axes of more similar lengths. Nevertheless, if the original purpose of using ellipses was to obtain a less complex algorithm than for polygons, at this point the advantage of using ellipses starts to become doubtful. For force laws analogous to the ones we have given for polygons in Chapter 7, the areas of ellipse sectors and ellipse segments will be needed.
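A minimal MATLAB sketch of this damped two-dimensional Newton–Raphson iteration (the function name and the convergence parameters are our own choices; c1 and c2 are coefficient vectors [a b c d e f] of Equations (9.1)–(9.2)):

    function [p, converged] = ellipse_intersect_newton(c1, c2, p0, k, tol, maxit)
    % ELLIPSE_INTERSECT_NEWTON  damped Newton-Raphson iteration for a common
    % point of the conics C1(x,y)=0, C2(x,y)=0; p0 is the starting guess
    C = @(c,p) c(1)*p(1)^2 + c(2)*p(1)*p(2) + c(3)*p(2)^2 ...
             + c(4)*p(1) + c(5)*p(2) + c(6);
    p = p0(:); converged = false;
    for n = 1:maxit
        F = [C(c1,p); C(c2,p)];
        J = [2*c1(1)*p(1)+c1(2)*p(2)+c1(4), c1(2)*p(1)+2*c1(3)*p(2)+c1(5);
             2*c2(1)*p(1)+c2(2)*p(2)+c2(4), c2(2)*p(1)+2*c2(3)*p(2)+c2(5)];
        dp = J\F;                  % solve instead of inverting the Jacobian
        p  = p - k*dp;             % k=1: plain Newton; 0<k<1: damped
        if norm(dp) < tol, converged = true; return; end
    end
    end

A typical call would be [p, ok] = ellipse_intersect_newton(c1, c2, [x0; y0], 0.8, 1e-12, 50), with the starting point taken from the circle–circle intersection described above.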


Table 9.1 Formulae for the areas A of ellipse segments and ellipse sectors, for an axes-parallel ellipse with half-axes a (along x) and b (along y) centered at O, with points P1(x1, y1), P2(x2, y2) on the circumference, P3(x3, y3) the mirror image of P2 below the x-axis, B the vertex on the positive x-axis and C the vertex on the positive y-axis. In some cases and for some values of the parameters, the formulae will give not the area of the smaller segment but the area of the whole ellipse minus the area of the smaller segment.

Whole ellipse: $A = \pi a b$

Ellipse sector P2 O P3 B P2, with P2 P3 vertical: $A = ab \arccos\dfrac{x_2}{a}$

Ellipse sector P1 O P2 C P1: $A = \dfrac{ab}{2}\left(\arcsin\dfrac{x_2}{a} - \arcsin\dfrac{x_1}{a}\right)$

Ellipse segment P2 P3 B P2, with P2 P3 vertical: $A = ab \arccos\dfrac{x_2}{a} - x_2 y_2$

Ellipse segment P1 P2 C P1: $A = \dfrac{1}{2}(x_1 y_2 - x_2 y_1) + \dfrac{ab}{2}\left(\arcsin\dfrac{x_2}{a} - \arcsin\dfrac{x_1}{a}\right)$

For ellipses centered at the origin with half-axes parallel to the Cartesian axes, these formulae are given in Table 9.1; for ellipses that are not axes-oriented, a rotation of the coordinate system must be used for points on the circumference.
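For example, the segment formula in the last row of Table 9.1 translates directly into a one-line MATLAB function (our own helper name; the ellipse must be axes-parallel and centered at the origin, and (x1,y1), (x2,y2) must lie on the circumference):

    function A = ellipse_segment_area(a, b, x1, y1, x2, y2)
    % ELLIPSE_SEGMENT_AREA  area of the segment P1 P2 C P1 from Table 9.1
    A = (x1*y2 - x2*y1)/2 + a*b/2*(asin(x2/a) - asin(x1/a));
    end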

9.2.4 Ellipse intersection computed with generalized eigenvalues

The algorithms for ellipse potentials don't supply the force points necessary for the computation of static friction. Here we propose a very general method for computing a unique point inside the overlap of two ellipses. (It does not work for the case where one ellipse penetrates through the other.) The algorithm makes use of the solution of the generalized eigenvalue problem

$$(A - \lambda B)\,\mathbf{v} = 0, \qquad (9.9)$$

where A and B are square matrices; this differs from the conventional eigenvalue problem, where B is the identity matrix. Using the symmetric matrices

$$A = \begin{pmatrix} a_1 & \tfrac{1}{2}b_1 & \tfrac{1}{2}d_1 \\ \tfrac{1}{2}b_1 & c_1 & \tfrac{1}{2}e_1 \\ \tfrac{1}{2}d_1 & \tfrac{1}{2}e_1 & f_1 \end{pmatrix}, \qquad B = \begin{pmatrix} a_2 & \tfrac{1}{2}b_2 & \tfrac{1}{2}d_2 \\ \tfrac{1}{2}b_2 & c_2 & \tfrac{1}{2}e_2 \\ \tfrac{1}{2}d_2 & \tfrac{1}{2}e_2 & f_2 \end{pmatrix}$$

(with the entries arranged so that the coefficients of Equations (9.1)–(9.2) are recovered in the quadratic forms below)


and the vector

$$\mathbf{v} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},$$

we can write the curves C1(x, y) and C2(x, y) from Equations (9.1) and (9.2) as quadratic forms of v:

$$C_1(x,y) = \mathbf{v}^{\mathrm{T}} A\, \mathbf{v}, \qquad C_2(x,y) = \mathbf{v}^{\mathrm{T}} B\, \mathbf{v}.$$

A common point on both ellipses would be given by a solution of the generalized eigenvalue problem (9.9). Note that while (9.9) is equivalent to

$$A B^{-1} - \lambda \mathbb{1} = 0, \qquad (9.10)$$

solutions with better numerical stability can be obtained from the formulation in (9.9). (The issues here are similar to those in the computation of the pseudo-inverse via matrix multiplication and inversion of the product of two rectangular matrices, versus the computation via singular values as in Appendix A, Exercise 1.6.) One can construct the bilinear form of a 'joint solution',

$$C(x,y) = \mathbf{v}^{\mathrm{T}} (A - \lambda B)\, \mathbf{v}. \qquad (9.11)$$

The extremum of this relation is obtained by taking the derivative in the vectorial sense, $\nabla_{\mathbf{v}}$, and setting it to zero, which yields

$$(A - \lambda B)\, \mathbf{v} = 0, \qquad (9.12)$$

and this is exactly the generalized eigenvalue problem (9.9) with eigenvector v. The meaning of this minimization can be seen in Figure 9.1. The two ellipses in the figure are actually cross-sections of paraboloid surfaces in the x-y plane. The minimization procedure yields that point in the x-y plane for which the sum of the z-coordinates of the two surfaces is minimal, in this case zero. For non-intersecting ellipses, there will be no point in the x-y plane satisfying this extremal condition. (The generalized eigenvalue problem $(A + \lambda B)\mathbf{v} = 0$, where the sign in front of the matrix B is reversed, gives the same point as solution; in that case, there are two paraboloid surfaces opening upward, and the point at which the sum of their z-coordinates is zero is the same.) The three (in general complex) eigenvectors

$$\mathbf{v}^{(1)} = \begin{pmatrix} v_1^{(1)} \\ v_2^{(1)} \\ v_3^{(1)} \end{pmatrix}, \qquad \mathbf{v}^{(2)} = \begin{pmatrix} v_1^{(2)} \\ v_2^{(2)} \\ v_3^{(2)} \end{pmatrix}, \qquad \mathbf{v}^{(3)} = \begin{pmatrix} v_1^{(3)} \\ v_2^{(3)} \\ v_3^{(3)} \end{pmatrix} \qquad (9.13)$$



Figure 9.1 Two ellipses in the x-y plane, with the common point (marked by ⊕) computed from the generalized eigenvalue problem (9.12).

for the generalized eigenvalue problem (9.12) describe the intersection points in the vector space with

$$\mathbf{v} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \qquad (9.14)$$

as base vector. Therefore, if two points

$$\mathbf{R}_1 = \begin{pmatrix} v_1^{(i)}/v_3^{(i)} \\ v_2^{(i)}/v_3^{(i)} \end{pmatrix}, \qquad \mathbf{R}_2 = \begin{pmatrix} v_1^{(j)}/v_3^{(j)} \\ v_2^{(j)}/v_3^{(j)} \end{pmatrix}, \qquad i, j \in \{1, 2, 3\},\; i \neq j, \qquad (9.15)$$

in two-dimensional real space can be found such that R1 and R2 fall together, they will be the Cartesian coordinates at which C(x, y) from (9.11) is extremal (minimal) and the two ellipses overlap. (One cannot tell beforehand which of the i for v^(i) will yield the coordinates, as there is no canonical order in which software packages compute eigenvalues and eigenvectors.) To decide whether ellipses are in contact or not, Vieillard-Baron [10] gave a criterion involving determinants which is equivalent to our approach if, instead of the generalized eigenvalue problem, the corresponding characteristic polynomial for λ is evaluated.



Figure 9.2 Computation of the extension (e_x, e_y) of the intersection point (r_x, r_y) to the circumference of the ellipse.

However, the numerical computation of determinants is notoriously unstable, whereas in our method, more coefficients are dealt with in the eigenvalue problem than were present in the original problem, and thus no input information about the location of the ellipses is lost as in the transformation to Equation (9.5). For overlapping ellipses, the coordinates of R1 and R2 will be identical up to rounding errors, and the rounding errors can be estimated by the eigenvalue condition number (e.g. condeig in MATLAB). For ellipses which are close to touching, R1 and R2 will also be close. Whether or not there is an overlap can be checked by inserting R1 and R2 into Equations (9.3)–(9.4) and seeing whether they give negative values for C1(x, y) and C2(x, y). Codes for the computation of the generalized eigenvalue problem are available in standard libraries, such as DSPGV (for symmetric matrices A and B) of LAPACK [11]. Besides the computation of a force point for potentials as in § 9.2.1, additional geometrical data and interaction laws analogous to the ones for overlapping polygons in § 7.3.1 can be constructed. For the vector from the centroid of one of the overlapping ellipses to the force point R = (r_x, r_y), one can compute its extension e to the circumference of the ellipse; see Figure 9.2. The angle of inclination φ of e can be computed with the atan2 function:

$$\varphi = \operatorname{atan2}(r_y - c_y,\; r_x - c_x).$$

For an ellipse with half-axes a, b and inclination θ, the distance between the centroid and the circumference at angle φ is

$$d(a, b, \varphi, \theta) = \frac{ab}{\sqrt{(b \cos(\theta - \varphi))^2 + (a \sin(\theta - \varphi))^2}}.$$

Then, the extension e of the line connecting the center of the ellipse and the force point to the circumference of the ellipse can be computed as

$$\begin{pmatrix} e_x \\ e_y \end{pmatrix} = \begin{pmatrix} c_x \\ c_y \end{pmatrix} + d(a, b, \varphi, \theta) \begin{pmatrix} \cos\varphi \\ \sin\varphi \end{pmatrix}.$$

To deal with ellipsoids in three dimensions, the above formalism can be generalized to a description of the surfaces via quadratic forms involving 4 × 4 matrices and the vector v = (x, y, z, 1)^T.
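As a concrete illustration, here is a minimal MATLAB sketch of this force-point computation (the function names are our own; the selection of the eigenvector is simplified: instead of locating the coinciding pair R1 = R2, the candidate with the smallest imaginary part is taken and then verified with Equations (9.3)–(9.4)):

    function [R, overlap] = ellipse_force_point(c1, c2)
    % ELLIPSE_FORCE_POINT  force point of two overlapping ellipses from the
    % generalized eigenvalue problem (9.9); c1, c2 are coefficient vectors
    % [a b c d e f] of Eqs. (9.1)-(9.2)
    A = conic_matrix(c1);
    B = conic_matrix(c2);
    [V, ~] = eig(A, B);                       % columns: generalized eigenvectors
    P = V(1:2,:) ./ repmat(V(3,:), 2, 1);     % scale the third component to 1
    % candidates with v3 close to zero correspond to points at infinity and
    % would have to be excluded in a production code
    [~, i] = min(max(abs(imag(P)), [], 1));   % most nearly real candidate
    R = real(P(:, i));
    % overlap test: the point must lie inside both ellipses, Eqs. (9.3)-(9.4)
    Cv = @(c,p) c(1)*p(1)^2 + c(2)*p(1)*p(2) + c(3)*p(2)^2 ...
              + c(4)*p(1) + c(5)*p(2) + c(6);
    overlap = (Cv(c1,R) < 0) && (Cv(c2,R) < 0);
    end

    function M = conic_matrix(c)
    % symmetric 3x3 matrix of the quadratic form C(x,y) = v'*M*v, v = (x,y,1)'
    M = [ c(1),   c(2)/2, c(4)/2;
          c(2)/2, c(3),   c(5)/2;
          c(4)/2, c(5)/2, c(6)  ];
    end

    function e = extend_to_circumference(cx, cy, a, b, theta, R)
    % extension of the line from the centroid (cx,cy) through the force
    % point R to the circumference of the ellipse with half-axes a, b and
    % inclination theta, using the formulae above
    phi = atan2(R(2)-cy, R(1)-cx);
    d   = a*b/sqrt((b*cos(theta-phi))^2 + (a*sin(theta-phi))^2);
    e   = [cx; cy] + d*[cos(phi); sin(phi)];
    end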


Figure 9.3 (a) Contact line (thick solid line) of two cylinders crossed at right angles. (b) The same contact line drawn for two crossed ellipsoids; the dashed line indicates the inclination of the contact line for cylinders, from which the contact between the ellipsoids obviously deviates.

9.2.5 Ellipsoids

In two dimensions, two ellipses can have at most four intersection points, and for the shallow overlaps relevant here only two, so the characterization of their overlap is geometrically quite simple. The overlap of ellipsoids in three dimensions is rather more difficult to deal with. Even compared to two cylinders crossed at right angles, as shown in Figure 9.3(a), the contact line between two intersecting ellipsoids is more complicated; see Figure 9.3(b). So both the determination of the overlap region and the definition of the contact direction are less straightforward. Therefore, while the use of ellipses in two-dimensional simulations is feasible, generalizing the associated force laws to ellipsoids is problematic; in contrast, generalizations from polygons to polyhedra are fairly straightforward (even if the computational effort and the algorithmic complexity are considerably greater in three dimensions).

9.2.6 Superquadrics

From ordinary quadrics with half-axes a and b, such as the ellipse

$$\left(\frac{x}{a}\right)^2 + \left(\frac{y}{b}\right)^2 = 1, \qquad (9.16)$$

'superquadrics' are obtained by manipulating the exponent away from 2. Super-ellipses with exponent n are given by

$$\left|\frac{x}{a}\right|^n + \left|\frac{y}{b}\right|^n = 1, \qquad (9.17)$$

and the corresponding three-dimensional shapes, the super-ellipsoids, are obtained by rotation around the z-axis; see Figure 9.4. For overlap computations with super-ellipses or super-ellipsoids, the same problem arises as for ellipses. Even if the shapes are convex (i.e. n ≥ 1), only iterative algorithms can be used for the intersection computation. The absolute values in Equation (9.17) make the iterations even more difficult, as it is necessary to distinguish between several cases for the signs. Although superquadrics can be problematic to use in DEM simulations, when they are used in computer graphics, their outlines or penetrations must be traced only pixelwise, which imposes much less stringent conditions on the accuracy.
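For illustration, the outline of a super-ellipse can be generated in MATLAB with the standard parametrization x = a sign(cos t)|cos t|^(2/n), y = b sign(sin t)|sin t|^(2/n), which satisfies Equation (9.17) identically (a sketch; the values of a, b and n are chosen to match Figure 9.4):

    % outline of a super-ellipse, Eq. (9.17)
    a = 1; b = 2; n = 4;                        % half-axes and exponent
    t = linspace(0, 2*pi, 400);
    x = a .* sign(cos(t)) .* abs(cos(t)).^(2/n);
    y = b .* sign(sin(t)) .* abs(sin(t)).^(2/n);
    plot(x, y); axis equal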


Figure 9.4 Super-ellipsoids (rotated super-ellipses) with exponents n = 0.5, 0.7, 1, 2, 2.5 and 4, with a = 1 and b = 2. The shape for n = 2 is a conventional spheroidal ellipsoid.

9.3 Composites of curves

If one considers polygons or polyhedra inconvenient, one should be aware that composites of curves share the same inconvenient aspects, but additionally have the problems associated with elliptic shapes, as well as a few of their own.

9.3.1 Composites of arcs and cylinders

The use of piecewise curves, such as circle segments, leads to issues similar to those encountered with ellipses or ellipsoids, as the contact lines are more complicated than for intersections between straight edges. Another common issue arises with the composite primitives, as there is a need to decide during the overlap computation where each segment ends. Arcs of circles have been used [12, 13], and even (non-convex) shavings of hollow cylinders have been implemented [14]. Surprisingly, splines seem not to have been used in granular or discrete element simulations, perhaps due to the lack of reliable overlap computation methods. We will discuss splines because their mathematical form allows us to highlight some possible problems better than other composite curves.

9.3.2 Spline curves

There is a difference between spline functions and spline curves; see Figure 9.5(a). For MATLAB's spline functions, each x-value has a unique y-value corresponding to it; MATLAB's interp1(...,'spline') produces a spline function, not a spline curve. For spline curves, given a set of support points, the ordering of their x-values is not necessarily the order in which the curve goes through the points. Spline curves are uniquely defined between their support points by the order of the curve as well as by the boundary values. If we want to model discrete elements with spline curves, we need periodic boundaries for the curve so that the curve is closed. Additionally, the gradient of the spline must be smooth from a region between one pair of successive support points to the adjoining region between the next pair of support points. Because of this smoothness requirement, the lowest possible order for splines is three. There is no ambiguity with spline curves: if one fits a spline curve to a set of support points and then rotates it, one obtains the same curve as if one had rotated the support points first and then computed the spline curve.
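The difference between the two objects can be made visible with a few lines of MATLAB (a sketch with hypothetical support points; note that interp1's 'spline' option uses not-a-knot end conditions, so the closure of the curve below is only approximately smooth; a truly periodic spline curve can be obtained, e.g., with csape(...,'periodic') from the Curve Fitting Toolbox):

    % spline function vs. closed spline curve through the same support points
    px = [0 1 2 1.5 0.5 -0.5]; py = [0 0.3 1 1.8 1.5 0.8];
    % spline function: y as a function of x (requires ordered x-values)
    [xs, i] = sort(px);
    xf = linspace(min(px), max(px), 200);
    yf = interp1(xs, py(i), xf, 'spline');
    % closed spline curve: repeat the first point and parametrize by index
    t  = 1:numel(px)+1;
    tt = linspace(1, numel(px)+1, 400);
    cx = interp1(t, [px px(1)], tt, 'spline');
    cy = interp1(t, [py py(1)], tt, 'spline');
    plot(xf, yf, '--', cx, cy, '-', px, py, 'o'); axis equal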



Figure 9.5 (a) For the same support points (circles), the third-order spline function is the wavy curve and the third-order spline curve with periodic support points is the closed curve around the gray region. (b) Configuration for the intersection computation of two splines: for each spline curve, the polygon given by the support points (circles or diamonds) is drawn with dashed or dotted lines, and the corresponding spline is a closed curve drawn with solid lines.


Figure 9.6 (a) Original data (crosses) interpolated with a cubic spline function. (b) The same data first rotated and then interpolated with a cubic spline function (thick black line), or first interpolated with a cubic spline function and then rotated (thick gray line).

The shape of closed spline curves does not change under rotation. Spline functions are a different matter, however; see Figure 9.6. Nevertheless, despite their many applications, spline curves do not seem to have been used to model discrete elements in simulations with large numbers of particles. This is hardly surprising if one thinks about the algorithmic effort required, even in two dimensions and for purely convex shapes. In principle, the computational effort to locate the intervals between the support points of a spline is of the same complexity as the intersection computation for polygons. Then, the intersection of the curve segments must be computed, the computational effort of which is comparable to the intersection computation for ellipses. Because we have to deal with spline curves, not spline functions, so that the support points can have arbitrary relative orientation, probably only Newton iterations can be used to compute the intersection points; when convergence is obtained, one must then verify whether the point lies in the interval between the support points, or outside it. Moreover, the polygon defined by the support points may have an intersection between a certain pair of support points while the corresponding spline curve intersects one segment further.


As can be seen from the shaded area in Figure 9.5(b), the edge between support points 4 and 5 of Curve 1 has an intersection with the edge between support points 6 and 7 of Curve 2, but the spline curves themselves intersect between support points 5 and 6 of Curve 1 and support points 6 and 7 of Curve 2. These issues increase the complexity of identifying the intersection points between the curves.

9.3.3 Level sets

Level-set methods were originally developed to model the contours of fronts in flow problems. For an underlying square grid, level sets describe an approximation of a curved surface over the grid. Grids in flow problems usually don't move; the representation is Eulerian (see Chapter 1, page 4), and accordingly the level set depends on the orientation of the grid axes. While this is not a drawback if contours in the solution of partial differential equations must be traced, a shape which is defined as a contour with respect to support points on a grid structure will change subtly as it rotates relative to the grid. Obtaining the accuracy necessary for discrete element methods without introducing additional noise therefore seems to be rather problematic for the rotational motion of the particles. Figure 9.6(b) shows that, with the same original data, interpolated rotated data and rotated interpolated data do not necessarily match. The overlap computation has the same issues with computational complexity as for splines: one first has to identify the neighborhood of the support points where the overlap occurs, and then the actual overlap of the curved surfaces must be found.

9.4 Rigid particles

Discrete element modeling with rigid particles is appealing, as contact occurs only on the surfaces, so one can do without the computational effort to determine the overlap. Moreover, while soft particle simulations have to resolve collisions over several time-steps, rigid particle simulations seem to be able to deal with a collision in a single time-step, which would reduce the overall number of time-steps needed. Nevertheless, the rigidity of particles can lead to some serious drawbacks.

9.4.1 Collision dynamics ('event-driven method')

In general, the event-driven method (ED) or discrete event simulation refers to a type of simulation method where some process A is simulated, and when a certain 'event' is detected, another process B is executed, after which process A is usually continued [15]. For granular materials, the event-driven method is a discrete element method in which process A corresponds to the free flight of the particles, while process B is the collision of two rigid particles (computed based on the conservation of momentum and the conservation or loss of energy). The particles fly along their trajectories (which would be parabolic under gravity) until a contact between the outlines of two particles occurs. At this 'event' (hence 'event-driven'), all particles are stopped at their current positions; then the velocities of the colliding particles are dealt with (e.g. for the simplest case of a frontal collision of two particles with the same mass and opposite velocities, the velocities would be reversed).


For systems of low density, such as granular gases, the process is very fast for round particles; but as each collision dissipates energy, the density in the system will increase. With physical coefficients of restitution, the system may soon become too dense to be dealt with via two-particle collisions. The larger the system, the shorter the interval between collisions, so the effective time-step becomes smaller and the method less efficient for larger systems. A (physical) remedy for this problem is not to treat the whole system as one unit but to partition it into subsystems, each with a 'local clock' [16], so that the time is advanced according to the collisions of neighboring particles, not according to the next collision in the global system. Closed formulae for the collision times can be given for systems of round particles traveling along parabolic trajectories under constant gravity, but not when various additional potentials (e.g. electrostatic interactions) are present. For the event-driven method, in which the particles are practically 'never' in contact except at delta-like events, the sound velocity depends on the time to the next collision, i.e. on the particle density [17], as for gas molecules. The event-driven method is the simplest example of a rigid-body discrete element method; nevertheless, it needs finite relative velocities between collision partners: it cannot deal with particles at rest. For systems of very low density ('granular gases'), the shape effects become negligible.
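The central geometric ingredient, the time until two round particles touch, follows from the quadratic equation |Δr + Δv t| = R1 + R2; a minimal MATLAB sketch for force-free flight (no gravity; the function name is ours, and positions and velocities are column vectors):

    function t = disc_collision_time(r1, v1, R1, r2, v2, R2)
    % DISC_COLLISION_TIME  time until two discs (or spheres) with centers
    % r1, r2, velocities v1, v2 and radii R1, R2 touch; inf if they never do
    dr = r2 - r1;  dv = v2 - v1;
    a  = dv.'*dv;  b = dr.'*dv;  c = dr.'*dr - (R1+R2)^2;
    disc = b^2 - a*c;                % discriminant of a*t^2 + 2*b*t + c = 0
    t = inf;
    if b < 0 && disc >= 0            % approaching, and real contact time exists
        t = (-b - sqrt(disc))/a;     % smaller root = first contact
    end
    end

In an event-driven code, the minimum of these times over all candidate pairs determines the next 'event'.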

9.4.2 Contact mechanics

Contact mechanics [18, 19] (unrelated to the field of 'contact dynamics' for the modeling of contacting solids with the finite element method [20–22]) is a simulation method which can deal with both loose and dense configurations of rigid particles, as well as with the effects of static friction, by realizing a dynamics with unilateral constraints. For contact mechanics, J. J. Moreau introduced the 'sweeping process' [23], an iteration which simultaneously satisfies the equations for the unilateral constraints of the volume exclusion ('normal force') and for Coulomb friction ('tangential force'). The solution by the 'sweeping process' is unique and well-defined, and the iteration of the tangential and normal forces occurs simultaneously. However, this simultaneity contradicts the physical principle of Coulomb friction as a reactive force, which must depend on the normal forces. Thus, there is no proof that the method conforms to any physical principle of mechanics (e.g. Gauss's principle of least constraint). Contact mechanics allows for a non-smooth variation of the velocities and is therefore a generalization (in the mathematical sense, though not necessarily in the physical sense) of classical physics. The Newtonian kinematics, where accelerations are the derivatives of differentiable velocity functions, has been abandoned. Already in the event-driven approach, delta-like forces are inherent, but in that method they don't have to be dealt with explicitly because the simulation is stopped and then restarted. In contrast, contact mechanics needs time integrators which can deal with the non-smooth variation of the velocities, which is at odds with most derivations of ODE solvers in numerical analysis. Contact mechanics has been implemented in two and three dimensions with various particle shapes: with round particles and polygons in [24] and with polyhedra in [25]. As only touching contacts and no overlaps have to be considered, the possible contact geometries in three dimensions (face–face, face–edge, face–vertex, edge–edge) are simpler to handle than with comparable soft particle discrete element methods. However, as it is practically impossible to obtain 'exactly' touching contacts, due to rounding errors for finite time-steps of the integrator, it is necessary to include numerical tolerances for the criteria for touching.


Although the aforementioned problems have all been dealt with in some way, there is a more fundamental problem with the actual results of the simulation, due to the rigidity of the particles. For all configurations with resting contacts, the sound velocity will be infinite, independent of the density of the system. Therefore, phenomena related to shock or sound propagation cannot be investigated with rigid particle approaches. Further, there is no 'linear regime' in the stress–strain diagram for small strains: at zero strain, the stresses will be zero, while for minimal finite strain, the stress will jump to a finite value in a discontinuous manner [26]. The fact that there is no elasticity parameter in contact mechanics (or other rigid body approximations) is often misinterpreted as meaning that its results would be 'universal', i.e. independent of the Young's modulus. In fact, as the rigid body limit in mechanics is just the limit of vanishing strains, the corresponding results are only valid for vanishing external stresses. If finite deformations are relevant in modeling a phenomenon, soft body simulations are necessary.

9.5 Discontinuous deformation analysis

In discontinuous deformation analysis (DDA) [27, 28] and its variants, the contact situation between the discrete element particles is transformed into a stiffness matrix, similar to the matrices used in the finite element method (FEM), which is appealing to researchers with a background in that field. However, as well as having to deal with equations of motion in matrix form (and the concomitant extra effort which has to be invested in linear algebra), there is the disadvantage that velocity-dependent forces (normal damping and friction) are difficult to model in this approach. The focus on elastic forces in DDA leads to results which are not equivalent to those obtained from discrete element methods that solve Newton's equation of motion for the particles. Whole (inertia-related) time-scales are missing from the dynamics, so that only a coarsened representation of the dynamics is obtained, as demonstrated by comparisons of experiments and DDA simulation results [29]. Originally, DDA was designed for modeling in rock mechanics, where rebounds and vibrational dynamics are less important; but for granular media, the results of DDA simulations resemble those obtained with the 'zero-order integrators' discussed in § 2.7.5.

9.6 Further reading

For a readable introduction to the sweeping process, see [30]. Modelization with curves is treated in [31–34]. A comprehensive introduction to splines can be found in [35]. The standard references for level-set methods are [36] and [37]. Many aspects of the event-driven method are treated in [17].

References

[1] S. Sokolowski and J. A. C. Gallas, "Grain non-sphericity effects on the angle of repose of granular materials", International Journal of Modern Physics B, vol. 7, no. 9–10, pp. 2037–2046, 1993.
[2] D. Negrut, A. Tasora, H. Mazhar, T. Heyn, and P. Hahn, "Leveraging parallel computing in multibody dynamics", Multibody System Dynamics, vol. 27, no. 1, pp. 95–117, 2012.


[3] J. W. Perram and M. S. Wertheim, "Statistical mechanics of hard ellipsoids. I. Overlap algorithm and the contact function", Journal of Computational Physics, vol. 58, pp. 409–416, 1985.
[4] J. W. Perram, J. Rasmussen, E. Præstgaard, and J. L. Lebowitz, "Ellipsoid contact potential: Theory and relation to overlap potentials", Physical Review E, vol. 54, pp. 6565–6572, Dec 1996.
[5] R. Everaers and M. R. Ejtehadi, "Interaction potentials for soft and hard ellipsoids", Physical Review E, vol. 67, article 041710, Apr 2003.
[6] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C++: The Art of Scientific Computing, 3rd ed. Cambridge University Press, 2002.
[7] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C, 2nd ed. Cambridge University Press, 1992.
[8] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in Fortran: The Art of Scientific Computing, 2nd ed. Cambridge University Press, 1992.
[9] H. G. Matuttis, N. Ito, H. Watanabe, and K. M. Aoki, "Vectorizable overlap computation for ellipse-based discrete element method", in Powders & Grains 2001, Y. Kishino, ed., pp. 173–176, Balkema, 2001.
[10] J. Vieillard-Baron, Thèse de Doctorat d'État. PhD thesis, Faculté des Sciences d'Orsay, 1970.
[11] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, LAPACK Users' Guide. Society for Industrial and Applied Mathematics, 1992.
[12] A. V. Potapov and C. S. Campbell, "A fast model for the simulation of non-round particles", Granular Matter, vol. 1, pp. 9–14, 1998.
[13] P. Fu, O. R. Walton, and J. T. Harvey, "Polyarc discrete element for efficiently simulating arbitrarily shaped 2D particles", International Journal for Numerical Methods in Engineering, vol. 89, no. 5, pp. 537–670, 2012.
[14] S. Rémond, J. L. Gallias, and A. Mizrahi, "Simulation of the packing of granular mixtures of non-convex particles and voids characterization", Granular Matter, vol. 10, pp. 157–170, 2008.
[15] G. S. Fishman, Principles of Discrete Event Simulation. Wiley, 1978.
[16] D. C. Rapaport, The Art of Molecular Dynamics Simulation. Cambridge University Press, 2004.
[17] S. Luding, Die Physik trockener granularer Medien (Habilitation thesis, in German). Logos Verlag, 1998.
[18] J. J. Moreau, "Unilateral contact and dry friction in finite freedom dynamics", in Nonsmooth Mechanics and Applications, J. J. Moreau and P. D. Panagiotopoulos, eds., vol. 302 of CISM Courses and Lectures, pp. 1–82, Springer, 1988.
[19] M. Raous, M. Jean, and J. J. Moreau, eds., Contact Mechanics. Plenum, 1995.
[20] T. Laursen, Computational Contact and Impact Mechanics: Fundamentals of Modeling Interfacial Phenomena in Nonlinear Finite Element Analysis. Engineering Online Library, Springer, 2002.
[21] G. Zavarise and P. Wriggers, Trends in Computational Contact Mechanics. Lecture Notes in Applied and Computational Mechanics, Springer, 2011.
[22] P. Wriggers and T. Laursen, eds., Computational Contact Mechanics. Vol. 498 of CISM International Centre for Mechanical Sciences, Springer, 2008.
[23] J. J. Moreau, "Evolution problem associated with a moving convex set in a Hilbert space", Journal of Differential Equations, vol. 26, pp. 347–374, 1977.
[24] J. J. Moreau, "Application de la méthode 'contact dynamics' à des collections de solides polygonaux", in 4ème Réunion annuelle du Réseau de Laboratoires G.E.O, Aussois, France, 24–28 Novembre, 1997.
[25] E. Azéma, F. Radjai, R. Peyroux, V. Richefeu, and G. Saussine, "Short-time dynamics of a packing of polyhedral grains under horizontal vibrations", The European Physical Journal E, vol. 26, no. 3, pp. 327–335, 2008.
[26] E. Azéma, F. Radjai, and F. Dubois, "Packings of irregular polyhedral particles: Strength, structure, and effects of angularity", Physical Review E, vol. 87, article 062203, Jun 2013.
[27] G.-H. Shi, "Discontinuous deformation analysis: A new numerical model for the statics and dynamics of deformable block structures", Engineering Computations, vol. 9, pp. 157–168, 1992.
[28] G. Shi, Block System Modeling by Discontinuous Deformation Analysis. Topics in Engineering, Computational Mechanics Publications, 1993.
[29] A. Aikawa and F. Urakawa, "Modeling techniques for three-dimensional discrete element analysis of a conventional ballasted railway track and its application" (in Japanese), Technical Report 2, Railway Technical Research Institute, Kunitachi, Tokyo, Japan, 2009.
[30] M. Kunze and M. Marques, "An introduction to Moreau's sweeping process", in Impacts in Mechanical Systems, B. Brogliato, ed., vol. 551 of Lecture Notes in Physics, pp. 1–60, Springer, 2000.
[31] W. Boehm and H. Prautzsch, Geometric Concepts for Geometric Design. A. K. Peters, 1994.


[32] C. Gibson, Elementary Geometry of Algebraic Curves: An Undergraduate Introduction. Cambridge University Press, 1998.
[33] R. Bix, Conics and Cubics: A Concrete Introduction to Algebraic Curves. Springer, 2006.
[34] M. Mortenson, Geometric Modeling, 3rd ed. Industrial Press, 2006.
[35] J. Hoschek and D. Lasser, Fundamentals of Computer Aided Geometric Design. A. K. Peters, 1993.
[36] S. Osher and R. Fedkiw, Level Set Methods and Dynamic Implicit Surfaces. Applied Mathematical Sciences, Springer, 2003.
[37] J. Sethian, Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, 1999.


10 Running, Debugging and Optimizing Programs

Programming projects for discrete element simulations can turn out to be more complex than projects with other simulation methods. This is because a wide variety of fields are involved: classical mechanics, numerical analysis, computational geometry, computer algorithms, etc., in addition to software tools such as compilers and visualization libraries. Therefore, one has to think more deeply about the organization of one's work, convenient tools and safe programming strategies than for projects with lattice methods or partial differential equations, where standards are long established. Most of the principles and ideas in this chapter are just common sense. Unfortunately, it is a 'common sense' that often takes a few years to develop. As the programmer can't afford to get things wrong too often, in this chapter we give a summary of tips and pointers which we hope will be helpful for the self-preservation of the programmer.

10.1 Programming style

A basic mistake of many researchers is the wrong choice of priorities. First, a code should run correctly and have a manageable structure; then, if it uses too much computer time, one should try to speed it up. It is dangerous to muddle along with a partially running code that lacks some computationally costly modifications which nevertheless would guarantee proper functionality in the long run. To do so results in codes which run only for some initial conditions, or crash during long program runs shortly before the data need to be extracted for a paper with an urgent deadline. Further, in scientific computing, where program malfunction may become obvious only after hours and days of runtime, a defensive programming style is perhaps even more necessary than in commercial applications.

• Input variables should be checked to verify that they lie in a meaningful range (see the sketch after this list).

• 'Risky' variable names which can lead to mistyping or confusion with other variables (e.g. the use of vzero and v0 in the same program part) should be avoided.


• Error-prone language constructs should be avoided in favor of less error-prone ones whenever possible. Dynamic memory allocation 'by hand' is dangerous in practically all programming languages; for simulations, usually arrays with a fixed maximal size can be compiled into the program. While-loops, which can easily lead to 'infinite loops', are more tricky to use than ordinary do-loops or for-loops, so they should be avoided wherever possible.

• If one is not absolutely sure how a language treats operator precedence for the arithmetic operations +, -, *, /, ^ and logical operations such as and, or, not (e.g. whether a*b^c means (a*b)^c or a*(b^c)), one should use brackets to be on the safe side.

Aiming for safety in one's programming practice is paramount. Just because a feature is provided by a language standard does not mean that it is necessarily 'safe'. For example, FORTRAN's implicit declaration (where the data type depends on the first letter of the variable) is a feature that has probably ruined more than one academic career through the mistyping of a variable; so implicit none should be used in every FORTRAN program and function header. C and C++ still allow an if-condition with an assignment using '=', while JAVA allows only the boolean comparison '==' for equality. It is best not to use pre-processors: the minimum requirement for healthy programming should be that one can at least read the whole source code, and pre-processors to all intents and purposes change the code from what the programmer sees to what the compiler sees, which can easily lead to intractable bugs and opaque code. It is no wonder that this has led to the creation of 'The International Obfuscated C Code Contest', a world championship for writing unreadable C code [1].
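As an example of the first point in the list above, a range check at the top of a function costs a few lines and can save days of runtime (a MATLAB sketch with hypothetical names):

    function dt = check_timestep(dt)
    % defensive check of an input parameter at the start of a function
    if ~isnumeric(dt) || ~isscalar(dt) || ~isfinite(dt) || dt <= 0
        error('check_timestep:range', ...
              'the time-step dt must be a positive finite scalar')
    end
    end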

10.1.1 Literature

Books on software engineering are often intended for programmers who are not concerned with scientific computations. The problems with floating point computations are hardly mentioned, and there is an assumption that the necessary algorithms already exist and are widely available, while in many scientific applications the algorithms must be developed together with the programs and the computation of the results. Further, most of these books assume that finishing the programming (and debugging) is the main task, whereas with scientific computations the goal is the actual result of the computation. Therefore, when reading software engineering books, one should be alert as to whether the principles explained are really suitable for the program development of scientific simulations. There are researchers in computer science who are unconcerned about the needs of scientific computing; the separation between computer science and numerical analysis dates back to the year 1971, when the programming language PASCAL appeared [2]. PASCAL contained only a minimal set of mathematical functions: sine and cosine were defined, as well as arc tangent, but arc sine, arc cosine and tangent were missing, based on the argument that they could be obtained by combining other functions. This is fine for analytical manipulations, but problematic for floating point computations with rounding errors, and certainly disastrous from the point of view of performance. But after all, PASCAL was intended as a tool for learning how to program.


This shows that programming methodologies developed from the computer science perspective may have unwanted and disadvantageous side effects for users who come from the scientific computing community. The opposite is also true: the basic training that scientists and engineers receive in linear algebra makes many constructions in MATLAB easy for them to understand and use; but to computer scientists, MATLAB's implicit assumption of familiarity with linear algebra may be rather an obstacle in learning the language. Programming is an art, different from computer science or physics. Despite the title of Knuth's series The Art of Computer Programming [3], these books are actually about algorithms, not about programming at all. There are many books on programming [4–6], the related time management and tools [7, 8], and general strategies for organizing one's work or the work of groups [9], which cover concepts that are also helpful in the development of computer simulations. The anecdotal and statistical evidence can usually be taken at face value; for example, the productivity (the output of properly functioning code) can differ by orders of magnitude between programmers [10, p. 548]. Further (see [7]), there is Brooks's law, which states that adding manpower to a late project makes it even later (because the already overworked veteran programmers have to additionally instruct the newly added programmers), and there are observations on the actual costs of object-oriented programming. Nevertheless, there are differences between software engineering and scientific programming. Software is usually something that is used interactively, and therefore it should continue running even if bugs and problems occur. Computer simulations often run in the background, without direct observation, and therefore should stop as soon as something dubious happens, or else one could wait for days just to obtain a meaningless result (e.g. infinities due to division by zero). In contrast to commercial software, where the user is allowed as many actions as possible, for computer simulations it is sometimes more important what the user (who may be different from the programmer) is not allowed to do (e.g. set time-steps to unphysical values, use meaningless simulation geometries with particles falling towards infinity, etc.), and this should also be taken into account at the programming stage.

10.1.2 Choosing a programming language

Often, one does not have much choice in the programming language: the targeted hardware platform, the availability of compilers for existing legacy codes, and the preferences of the leader of the research group usually determine the language for the final code. Nevertheless, at least for the development of algorithms, some languages are more efficient than others. MATLAB, with its built-in graphics and high-level numerical routines, allows fast program development and immediate inspection of visualized data, which is the reason we have chosen it as the programming language for this book. For compiler languages, one first needs to choose a graphics library and then write the interfaces for the existing graphical routines; however, they offer higher performance for the production code. Some programming languages are safer than others; it is not coincidental that the Pentagon made the use of Ada compulsory for new software in its aerospace engineering projects. In general, one is safer with a programming language one knows than with an unfamiliar language which may have promising features (i.e. features promised by somebody who does not have to write the program).

10.1.3 Composite data types, strong typing and object orientation

Composite data types are a mixed blessing. It may be that in commercial applications, professional programmers are able to fathom out which data should be combined at the beginning of the programming stage and to structure their data and programs accordingly. In scientific programming, however, one usually has fewer data types but more data: enough to keep a single processor, multiple cores or even hundreds of computing nodes busy for days and longer. Introducing sophisticated (and maybe unnecessary) data constructs at the beginning of the programming stage may turn out to be burdensome later, when one realizes that half of the data structure is not needed and the other half must be modified. This is especially the case with 'strong typing', e.g. declaring a data type Euler_angles, rather than just using double precision arrays ('weak typing') to represent the data. In commercial applications, where one is usually not concerned with the choice of representation, such strong typing may have its uses; but in scientific computing, one may find in the course of program development that for the orientation of particles, Euler angles have to be abandoned and replaced by quaternions, or that instead of real-space coordinates, some data have to be held as Fourier components. In such cases, besides the wasted programming effort one has additional overhead due to the definitions of data structures and interfaces.

Object orientation comes at different levels. On the one hand, there is full object orientation (C++ and JAVA), where objects (data structures and the functions working on them) are first defined and then instantiated (i.e. initialized with data and used), with the possibility to inherit properties. On the other hand, there are modules (as in FORTRAN90 and later versions), in which data are grouped together with the functions working on them; but each module is a single entity, without the possibility to produce multiple copies. The whole code for a module is usually contained in a single file and compiled together. Data hiding (i.e. preventing unwanted access from other modules) and interface definition are possible, as for full object orientation, and it is also possible to share commonly used data within the module without passing them in argument lists. The authors have written their DEM code for polygonal particles in modules: features of 'full object orientation' were not missed.

Data hiding (be it in modules or objects) also has drawbacks. Usually, one should write out intermediate configurations of a long-running simulation, to avoid the necessity of a full restart from scratch if execution of the program should be interrupted (e.g. due to a power failure or a janitor switching off the 'idle' computer while cleaning). A checkpointing function which writes out all the data needed for a restart (and the corresponding function that reads in the data at the program start) may not have access to necessary but hidden data in other modules. In that case, it is better to have the checkpointing function call a non-hidden function in each of the modules which is responsible for writing out that module's data, and which has access to all of its own module data since it belongs to the module. Together with checkpointing, it can be implemented that the simulation reads a file (or verifies the existence or non-existence of a file with a specific name) as a signal either to output specific data or graphics or to terminate the program prematurely. This is also useful for programs which have been started in 'batch mode' (in the background or in a queuing system), where no interactive control is possible.
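In MATLAB, a checkpointing function can be as simple as the following sketch (the struct state and its field layout are hypothetical):

    function write_checkpoint(state, step)
    % WRITE_CHECKPOINT  write out all data needed for a restart; 'state' is
    % a struct which collects the simulation data; one file per checkpoint,
    % so that earlier checkpoints are not overwritten
    fname = sprintf('checkpoint_%08d.mat', step);
    save(fname, '-struct', 'state')   % each field becomes a variable in the file
    end

Inside the time loop, a test like if exist('STOP_SIMULATION','file') can then trigger a final checkpoint and a clean premature termination, which implements the signal-file mechanism mentioned above.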

10.1.4 Readability

Readability, i.e. the possibility to understand the code by just reading through it, without additional analysis, is crucially important for picking up the work after enforced interruptions, for efficient debugging and for the reuse of subroutines; in other words, for work efficiency.


There is a certain ideal density of information. Too high a density, e.g. due to cramming unrelated commands into the same line, does not improve readability. But too low an information density, e.g. due to empty lines or unnecessary spaces before and after mathematical operations, is also unhelpful; such measures may have improved the readability of code printed on drum and chain printers in the 1970s, but do nothing to give the reader an overview of the program on today's laptop computers. Comments are best written in English, even if it is not one's native language. Anything could happen to comments containing non-standard characters when one views the files on a different operating system or moves to another country. Recently the first author received a MATLAB program with comments in German, where the ä, ö and ü characters had been replaced by Japanese characters in a MATLAB editor with Japanese localization. The problem is even worse with filenames; for example, not all operating systems tolerate spaces in filenames, and MATLAB does not allow hyphens in file and variable names, so it is safer to use underscores _ instead.

10.1.5 Selecting variable names

The right choice of variable names can greatly help to improve programming efficiency. Variable names should be self-explanatory; in the case of physical formulae, this is usually not difficult to achieve. When one reads

    F=m*a

one automatically assumes that the line contains Newton's equation of motion. Of course, if the line means something different, the choice of variables is utterly foolish. The imperative to use self-explanatory variable names can also be taken to the extreme: when the first author was a PhD student, a junior student came to him and said he needed a larger monitor (he already had a 17-inch monitor), as his program did not fit on the screen. Inspection of the code revealed the following variable names (no joke):

    this_is_the_variable_for_x
    this_is_the_variable_for_v
    this_is_the_variable_for_F

The resulting astonishment was countered with the remark: 'But you said variables should be self-explanatory . . . .'

In more technical subroutines, the choice of variable names is more tricky. For example, in the predictor–corrector algorithm, one needs both predicted and corrected variables along with their derivatives up to order five; for the x-coordinates, using X0 (zeroth derivative) for the original variable, X1 for the first derivative, X2 for the second derivative, etc. is a minimalist approach. However, there is the danger that somewhere in the program initial conditions might also be called X0. To distinguish between predicted and corrected variables, one could use X0_pred, X0_corr, etc., and accordingly the other coordinates would be Y0_pred, Z0_pred and so on. While there is a certain internal logic for variables derived from physical formulae, it is often difficult to find catchy names for variables in list manipulations. If the names get too descriptive and long, the code again becomes unreadable.


If similar formulae are formatted close to each other, it is advantageous to use the same number of letters in each variable name, as this allows us to spot typing mistakes more easily. For example, instead of phi (for φ), the single character f fits much better with x, so

    Fx=m*xdd
    Fy=m*ydd
    Ff=I*fdd

would be an economical way of writing the two-dimensional equations of motion for the forces Fx and Fy in the x- and y-directions, as well as the torque Ff. Here the second time derivatives are called xdd, ydd, fdd, to stand for 'x dot dot', 'y dot dot' and 'f dot dot', which is reasonable at least for short programs. In long programs, m for the mass and I for the moment of inertia can easily lead to problems, as 'm' and 'I' are such tempting variable names that one is likely to use them again in another context, which can at best cause compiler errors arising from repeated variable declarations, and in the worst case lead to repeated and wrong reallocations in programming languages such as MATLAB, where variable declarations are not necessary. If one has to use longer variable names and, with them, full words, one should stick to the correct spelling. Code like

    fors=mass*akseleration

will generate error messages very easily, and anyway who would remember such arbitrary spellings over dozens of pages of code or months of programming? Also, one should avoid using variable names which sound similar but are written differently; e.g. with

    F=f*Ff

there can be problems discriminating between F, Ff and f, so one may type the wrong variable when one spells out the intended code in one's head; for functions, the existence of sin and sign already leads to such problems sometimes. Further, using uppercase and lowercase versions of the same letter to represent different variables ruins the portability between programming languages, some of which are case-sensitive while others are not. Variables like 'omega one' can be written as omega1 or omega_1, so it would be good to decide on a unique system to use throughout one's programming career; unfortunately, one will often have to deal with legacy code or code written by co-workers where a different convention is used. When the value of a variable is modified, it is good practice to introduce a new variable name instead of continuing to use the old one; see the following example of a confusing array initialization:

    l=15
    a(l)=22
    l=l+1
    a(l)=23
    l=2*l
    a(l)=46


The code below, which performs the same initialization but uses the variables lp1 ('l plus 1') and two_l ('2l'), is much easier to understand:

    l=15
    a(l)=22
    lp1=l+1
    a(lp1)=23
    two_l=2*l
    a(two_l)=46

If a value different from the value of the original variable is to be used in a program, a new variable name should be defined, or else the resulting code will be nearly impossible to debug. So an inverse should be given a different name than the original variable, and vectors should have different names than their normalized counterparts (the unit vectors). Once a student came to the first author and complained that his results were off by ten orders of magnitude (exactly a factor of 10^10); the student was sure that there was no mistake in his program, and so concluded that the underlying algorithm (taken from a paper) must be wrong. Suspiciously, the discretization parameter τ = 10^-5 was exactly one over the square root of the error factor. A quick glance at the code revealed

    tau=10^-5
    tau=1/tau

in the initialization and then, separated by nearly the whole height of the computer screen, the frequency computation

    omega=[1:l]/tau

within a loop. The student, remembering that multiplication is faster than division, knew that he would need to multiply by the reciprocal of tau later on, so he computed it beforehand but unfortunately overwrote the original variable tau by using the same name for the reciprocal. Further down, he forgot this and divided where he should have multiplied. Had he used proper variable names which expressed more precisely the content of the variable, he would not have wasted a lot of time with fruitless debugging. Developing sensible naming conventions can take time. According to McConnell [10, p. 764], a 'guru programmer' is someone whose code is 'crystal clear' and well documented.

10.1.6 Comments

Meaningful comments are not just decorations; they are essential for understanding the more complex goings-on in a program. Every year, the first author emphasizes to his class the importance of comments and the dangers awaiting programmers who don't comment their programs (one of which is getting no credit for the project); then, he finds a line like the following in some student's code:

    % This is a comment


The student will usually justify including such a useless line by the fact that the professor required comments. It is futile to write comments just for the sake of having some comments in the code; the point is to include meaningful comments. The comment in

a=x*y % multiply x by y and assign it to a

is redundant, because that's clearly what the code says anyway. Something like

a=x*y % area is obtained by multiplying x by y

is much more informative about the purpose of the code; but if one had used different variable names, such as

area=length_x*length_y

then the code would be sufficiently self-explanatory that no comments are necessary at all. More complex functions should have a header comment containing the following minimum amount of information (and, when one cannibalizes a subroutine and rewrites it, one should not forget to also update the comments); an example header in this style is sketched after the list.

PURPOSE: Function names should be catchy and short, to avoid overly long command lines and the resulting unreadable code. If the purpose of the function is more complex than can be captured by the function name, it should be documented here.

USAGE: Sometimes, it is necessary to document where and how a subroutine should be used, or how input data should be prepared. In the case of a simulation of polygonal particles, the updating of the outline of the polygon has to occur after the predictor step, and the overlap computation has to come after the updating of the outline of the polygons, etc. If particular units or coordinate systems are assumed in the input, this should also be mentioned here.

ALGORITHM: In cases where the algorithm does not become clear from looking at the source code, a description of the algorithm should be included in the header. For example, for the intersection computation between a line and a plane, it should be mentioned whether the point–normal or point–direction form is being used to represent planes; for sorting algorithms, the proper name of the method (e.g. insertion sort, quick sort) should be given.

REFERENCES: If an algorithm or a formula is implemented according to a particular convention, it is good to write down where the formula has been taken from. In the case of the angular degrees of freedom, there are various ways to implement the equations of motion, so the particular alternative used in the code should be specified. The reference (book title, edition, page numbers) from which the actual implementation is taken should be cited, in case the algorithm or formula has to be looked up again for debugging purposes.

CAVEATS: If there are conditions under which the algorithms will fail or lead to dubious results or excessively long runtimes, these should be mentioned under this heading. Bubble sort may take considerable time to sort bounding boxes that are initially in the wrong order, so it would be better to order the particle coordinates first with a faster sorting algorithm such as 'quick sort'; such comments belong in the caveat section. Also, idiosyncratic definitions which deviate from ordinary conventions should be mentioned here.


For example, Cartesian coordinates are usually defined in the order x, y, z in a right-handed coordinate system; if for any reason one uses a left-handed coordinate system, in which volumes computed from a vector product will have their sign reversed, such features should be explained here. Of course, ideally one would rewrite the program so that no caveats are necessary, but often time constraints do not permit one to do so.

TO DO: When one programs a complex project, one may initially have to work with preliminary versions of algorithms. The modifications needed in the future should be mentioned here, especially if one's programming activity is frequently interrupted for longer periods.

REVISION HISTORY: The current state of the function (along with the date) should be recorded in the header, so that anyone reading it will not need to go through the whole code. The programmer's name also belongs here. If one develops different function versions in different directories, or has different program versions in various stages of development, one should be able to easily identify the version from these comments. Ideally, one would move newly programmed features from the TO DO comment directly into the revision history. Not updating this comment section will lead to confusion in later stages of the programming project.
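As an illustration, a header along these lines for a hypothetical polygon-overlap function in MATLAB might look as follows (all names, references and dates are invented for the example):

% PURPOSE:   Compute the overlap area of two convex polygonal particles.
% USAGE:     Call after the polygon outlines have been updated for the
%            current predictor step; corners in counter-clockwise order,
%            coordinates in meters.
% ALGORITHM: Clipping of polygon A against the half-planes of the edges
%            of polygon B (point-normal form for the edges).
% REFERENCES: [fictitious geometry textbook], 2nd ed., pp. 123-130.
% CAVEATS:   Fails for non-convex outlines; corner-on-corner contacts
%            may return zero area.
% TO DO:     Extension to periodic boundary conditions.
% REVISION HISTORY: 2014-03-25 first working version, N.N.
function [area]=polygon_overlap(corners_a,corners_b)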

10.1.7 Particle simulations versus solving ordinary differential equations

In the established programming style of the numerical analysis field, the main (driver) program for ODE codes typically looks like

%Main program:
Initialize variables
[output]=solver(input,parameters)
%End of main program

In other words, the main program does not contain anything except the initialization and the call to the solver. This obeys the rule that long main programs should be avoided. Accordingly, everything which in 'spaghetti code' would have been written in the main program now sits in the solver function. This may work fine for ordinary differential equations which are defined with continuous functions; but for particle simulations which contain additional stages such as neighborhood routines and updates to the geometry, it only creates problems in accessing the functions. We think that the most manageable main program structure for particle simulations is the following:

%Main program:
Initialize variables
for i=1:tmax
  predictor
  particle_update
  neighborhood_computation
  overlap_computation
  force_computation
  corrector
  if (i==appropriate)
    graphical_output
    observable_computation
    checkpointing
  end
end
writeout_final_configuration
%End of main program

Although the structure looks rather conservative, with the proper data structures and function definitions it will not be long, while at each stage in the time integration it will allow access to all the data which are necessary for observable computation and debugging.
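In MATLAB, this driver could be fleshed out as in the following skeleton, where every called function is a placeholder that has to be implemented for the particular model (the output interval is an assumption of the example):

tmax=1000; output_interval=100; % number of time-steps, output frequency
% ... initialize particle positions, velocities and material parameters ...
for i=1:tmax
  predictor;                   % extrapolate the configuration
  particle_update;             % update the particle outlines
  neighborhood_computation;    % find possible contact partners
  overlap_computation;         % overlaps of the contacting particles
  force_computation;           % forces and torques from the overlaps
  corrector;                   % correct the predicted configuration
  if mod(i,output_interval)==0 % one concrete choice for 'i==appropriate'
    graphical_output;
    observable_computation;
    checkpointing;             % save a restartable state
  end
end
writeout_final_configuration;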

10.2 Hardware, memory and parallelism

Many novice programmers believe that cramming more instructions into a single command line will give a faster program. This is not true—the processor will never ‘see’ the source code, only the executable that the compiler generates from it, and the compiler will partition the work in such a way that there are never more instructions at any given time than the functional units of the processor can handle (see Figure 10.1). Nevertheless, to write efficient code, a rudimentary understanding of the hardware is necessary.

10.2.1 Architecture and programming model

The fundamental programming model for most of today's hardware is the 'von Neumann architecture'. In this architecture, a program may contain instructions in any order, and the processor will execute the instructions in the given order. In 'superscalar' or 'pipelined' execution, operations which use different functional units of a processor will overlap.


Figure 10.1 Typical CPU architecture, with the dispatch for the program control and the branching (if-conditions), and integer and floating point units (FPUs). The latter may be able to execute additions and subtractions, or multiplications, or combinations of both preceding types of operation, and possibly also the evaluation of higher functions and divisions; not all FPUs in a processor necessarily have the same abilities. Additionally, the load and store units are in charge of the data transfer. High-end architectures (gray) differ from low-end models (black) by the number of functional units and by the memory bandwidth.



Figure 10.2 Temporal order of the inherent parallel execution of a loop over i for u(i)= v(i)+x(i)*y(i) in various architectures. ‘l:’ indicates the loading of the data from memory, and ‘s:’ indicates the writing back of the data to memory. The diagram for the superscalar execution assumes that only one floating point operation can be realized per cycle; the diagram for the vectorized execution assumes that two floating point operations can be realized per cycle. An empty rectangle indicates that in that cycle no unit is free to execute an operation. The actual degree of parallelism (number of operations which can be executed in parallel, or length of subvectors which can be dealt with at once) depends on the actual hardware. The execution is given in effective cycles, i.e. with respect to the time an operation is completed.

Other programming models include 'vectorization', where a certain set of operations is executed successively on whole vectors, and SIMD (single instruction, multiple data) parallelization, where a set of operations is executed simultaneously (i.e. in parallel) on multiple data points. In Figure 10.2, the execution of the loop

for i=1:N
  u(i)=v(i)+x(i)*y(i);
end

is represented in the four different architectures mentioned above.


Figure 10.3 Memory hierarchy, together with the typical order of the size and the bandwidth (rate of data transfer possible) between the levels for current hardware: CPU registers, O(10²) words; up to 100 GB/s to and from the level 1 cache, O(10 KB–0.1 MB); up to 50 GB/s to and from the level 2 cache, O(0.1–1 MB); up to 30 GB/s to and from the level 3 cache, O(1–8 MB); up to 10 GB/s to and from the main memory, O(1–10 GB); 10–400 MB/s to and from the mass storage (hard disk), O(TB). 1 word is 4–10 bytes, depending on the manufacturer and data type. Writing is usually slower than reading, and caches usually consist of data and instruction caches. If several programs are executed at the same time, the effective data transfer rates may be considerably lower.

The instructions in Figure 10.2 themselves consist of various stages (instruction fetch, instruction decode, register fetch, etc.), the understanding of which is not so important for scientific programming, and whose meanings can be found in books such as [11]. Pipelining allows use of the output from one operation directly as input to another operation, without writing it back to memory and loading it again. Whereas in former times 'vectorization'(1) and SIMD parallelism were limited to supercomputers, with the 'Streaming SIMD Extensions' of various levels (SSE1 to SSE4 currently), such hardware features have reached the mass market. General purpose graphics processing units (GPGPUs) are basically SIMD computers on a single board.

10.2.2 Memory hierarchy and cache

In the previous subsection and Figure 10.2, we did not specify where the data actually came from. In fact, there is a whole hierarchy of memory, ranging from cheap and large memory (the hard disk, which may actually be used in program execution if the amount of data is so large that the main memory cannot hold it), over the main memory and several layers of caches (for which a higher level means larger size but also slower access), to the registers, which are the memory locations in the CPU that it can access directly. All other data must be loaded from the lower levels in the memory hierarchy; see Figure 10.3. As can be seen, both the size of the memory at the different levels and the bandwidth for the transfer between levels vary by orders of magnitude. Nowadays caches are on-chip, i.e. the circuits of the cache are on the same die as the CPU itself. All things being equal, price differences for main-boards are usually due to different bandwidths between memory and CPU. Similarly, price differences for USB memory sticks (or SD cards) of the same size usually reflect different bandwidths.

(1) This is different from 'vectorization' for MATLAB programs, which refers to writing a code using implicit loops only, so that the execution is speeded up as no explicit handling of single indices is necessary for the compiler.
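For the loop of Figure 10.2, this MATLAB sense of 'vectorization' amounts to a single line with implicit loops (a minimal sketch; u, v, x, y as in the loop above):

u=v+x.*y; % element-wise multiplication and addition over whole vectors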


10.2.3 Multiprocessors, multi-core processors and shared memory

For several years, the clock-rate of CPUs has not improved significantly, due to limitations arising from the intrinsic properties of silicon. The need to further increase computing power has therefore led to the production of multi-core architectures, where several CPUs like the one in Figure 10.1 are integrated on a single chip, connected with some additional core interface, and all access the same memory; see Figure 10.4. The idea itself is not new and was already anticipated by Cray Research between the 1970s and 1990s: the Cray X-MP was basically a composition of two Cray-1 computers (its predecessor), and the X-MP's successor, the Cray Y-MP, was basically a composition of two X-MP computers. What is new is that the hardware is now available to private users, not only supercomputer centers. Further, for the high-end market, there are multiprocessor machines, so that several processors (with perhaps fewer cores) all work with the same memory; see Figure 10.4. For the past few decades, faster hardware was associated in the minds of many with a faster clock-rate of the CPU, although the faster execution was in fact limited by the memory: CPUs with high clock-rates but low memory throughput due to cheap main-boards don't perform very well. During that era, the clock-rate of the memory was considerably slower than that of the CPU; in recent years this gap has been closing, and the clock-rate of the memory has become comparable to that of the CPU. DDR2 memory for O(2–3 GHz) processors was clocked with 666 MHz; DDR3 memory is clocked with 1.3 GHz or more. Nevertheless, due to multiple threads and cores, the computing power has also increased, so memory throughput is still the main obstacle ('bottleneck') to obtaining better performance. Processors with larger cache size may give better performance, as long as all the data of the program fit in the cache, which may actually be the case for DEM simulations with thousands, though not tens of thousands, of degrees of freedom.

10.2.4 Peak performance and benchmarks

The peak performance is the hardware limit for the computation speed. In a single cycle, a functional unit can produce a single result with pipelining. A clock-rate of 1 GHz corresponds to 10⁹ cycles per second, so if such a processor has two floating point units, it is able to produce 2 × 10⁹ floating point results per second. But if there are cache misses (i.e. more operations could be executed than necessary data delivered to the functional units), the performance will be lower. Therefore, a tongue-in-cheek definition of peak performance is 'the performance you are guaranteed not to reach'. Programs which are used to measure the actual performance are called benchmarks. To estimate the speed of one's own application on a certain architecture, a benchmark has to be comparable to the application. The LINPACK benchmark [12] computes a matrix inversion [13] with a kernel (an inner loop which consumes most of the computer time)

y=y+a*x

This routine is called the DAXPY (double precision A X plus Y) kernel. Time integrators in particle simulations have a similar structure; however, they are usually not the most time-consuming parts of the program, so the validity of the benchmark results for particle simulations is very limited.



Figure 10.4 Shared memory configurations with: (a) four single processors; (b) two double-core processors with proprietary third-level caches; (c) a single processor with quadruple-core, where the third-level cache can be used for inter-core communication.

Moreover, inversions of matrices with l rows and columns take O(l³) operations, while the DAXPY arrangement for time integrators will be proportional to the number of particles. LINPACK benchmarks are published with the corresponding compiler options, which is convenient for helping the user learn about additional compiler optimization possibilities. Unfortunately, benchmarks almost never monitor the accuracy of the solution, so without additional information it is difficult to judge whether stunning performance is due to hardware superiority or ruthless rounding. A LINPACK benchmark usually performs below the hardware limit: for two floating point operations, its DAXPY routine will need two load operations from, and one store operation to, the memory, so that the memory bandwidth (the maximal transfer rate) will be the limit.


In contrast, the computation of an inner product (DDOT)

ddot=0
for i=1:n
  ddot=ddot+x(i)*y(i)
end

needs, on average, only two load operations from memory; ddot can be held in a CPU register and has to be written to memory only at the end of the loop. So with DDOT kernels in loops with large n, it is easier to come close to the hardware limit than with DAXPY kernels. Unfortunately, one cannot write realistic particle simulations so that most of the computation is performed in DDOT routines alone: divisions and function evaluations, if-conditions, as well as the necessity to use relatively short loops, will lead to far fewer floating point computations per cycle than the peak performance. At its conception, the LINPACK benchmark used 100 × 100 matrices, considered huge at that time; later, 1000 × 1000 matrices were used. As time went by, hardware manufacturers implemented cache sizes and compiler options which would lead to favorable LINPACK benchmarks. For that reason, benchmarks have been devised which are guaranteed to use more than the cache full of data; an example is the Himeno benchmark [14], with computational effort depending linearly on the amount of data, which is the standard situation in simulations for classical mechanics, whether with grid methods or with particle methods.
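A minimal MATLAB sketch for timing the two kernels on one's own machine (the vector length is arbitrary; note that MATLAB itself calls optimized routines, so the comparison is only indicative):

n=1e7; a=0.5;
x=rand(n,1); y=rand(n,1);
tic; y=y+a*x; t_daxpy=toc % DAXPY: two loads and one store per element
tic; d=x'*y;  t_ddot=toc  % DDOT: two loads per element, one scalar result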

10.2.5 Amdahl's law, speed-up and efficiency

If a program is executed on several cores in parallel, there are several concepts that can be used to evaluate how 'well' the parallelization is going.

Speed-up: If the execution time is t_s on a single core and t_n on n cores, the speed-up S is defined as

S = t_s / t_n.

An honest measurement would imply determination of the runtime without any parallelization overhead or parallelization option for the compiler. Later (in § 10.4.4) we will encounter an example of a code on a single core which was already slower when it was compiled with a parallelization option.

Efficiency: The efficiency E is defined as the speed-up divided by the number of processors:

E = S / n = t_s / (n t_n).

Parallelization overhead (i.e. the additional time needed for communication, waiting for tasks to complete on other cores, etc.) reduces the efficiency from the ideal value of '1'.


Serial execution on 1 core

Parallel execution on 8 cores

Parallelizable Parallel code finished par.

serial

Core 1

par.

serial

Core 2

par.

serial

Core 3

par.

serial

Core 4

par.

serial

Core 5

par.

serial

Core 6

par.

serial

Core 7

par.

serial

Core 8

0T

1T

2T

serial Serial code finished

8T

9T

Figure 10.5 Amdahl’s law: if a serial part in a computation cannot be parallelized, the execution time on a parallel computer will be longer than the serial execution time divided by the number of processors; in this case, a program which takes 9T in serial execution can be reduced to 2T.

On cache-based machines, efficiencies larger than 1 are possible, if the use of more processors leads to the use of more cache and higher memory bandwidth [15]; this is more likely for multiprocessor configurations than for single-processor multi-core configurations.

Amdahl's law: For a parallel fraction f_p and a non-parallel (serial) fraction f_np, such that f_p + f_np = 1, Amdahl's law predicts a speed-up of

S_max = 1 / (f_np + f_p/n)

on n processors; see Figure 10.5. The efficiency is correspondingly limited to

E_max = 1 / (n f_np + f_p).     (10.1)

Ideal conditions are assumed, i.e. absolutely equal distribution of work and no parallelization overhead. The latter condition, at least, is not true for an example we will encounter later (in § 10.4.4), where the overhead is proportional to the number of cores. Typical speed-ups and efficiencies for several values of f_np and numbers of cores are plotted in Figures 10.6 and 10.7. A high efficiency according to Equation (10.1) does not by itself imply a high speed of the program. There is a case where, for comparable processors with similar clock-rates, 40% efficiency on 32 processors [16] and over 90% efficiency on 256 processors [17] were reported for the same physical system, but the execution times of the programs were the same. The reason is the speed of the scalar code: slow programs will give better efficiencies than faster programs for the same amount of parallelization overhead.

Load balancing: Amdahl's law predicts the speed-up under the assumption of ideal load balancing, i.e. each processor is supposed to be finished with its sub-task in exactly the same time-span, as shown in Figure 10.5. If there are fluctuations in the load balancing, i.e. if some cores take longer than others, the efficiency will decrease accordingly.
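The curves of Figures 10.6 and 10.7, for 50%, 25%, 10% and 1% of non-parallelizable code, can be reproduced with a few lines of MATLAB (a minimal sketch):

f_np=[0.5 0.25 0.1 0.01]; % non-parallelizable fractions
n=1:20;                   % number of cores
for k=1:length(f_np)
  S=1./(f_np(k)+(1-f_np(k))./n); % speed-up from Amdahl's law
  E=S./n;                        % efficiency, Equation (10.1)
  subplot(2,1,1); plot(n,S); hold on % speed-up over cores
  subplot(2,1,2); plot(n,E); hold on % efficiency over cores
end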


Figure 10.6 Speed-up as predicted by Amdahl’s law. For 30% non-parallelizable content in a program, the maximal parallel speed-up is below 5, no matter how great the number of processors is.


Figure 10.7 Efficiency as predicted by Amdahl’s law. Already for 1% non-parallelizable content in a program, the deviation from 100% efficiency is clearly visible.

Updates per second: All things being equal, programs with more particles and more time-steps need a longer execution time than those with few particles. To compare the speed of programs with N_p particles, N_t time-steps and a total execution time of t, one computes the updates per second (UPS),

U = N_p N_t / t.

Ideally, one would obtain the same U for programs with small and with large particle numbers. In practice, programs with larger particle numbers will have smaller U due to the increased incidence of cache misses with the amount of data used. Using more primitive time integrators with fewer operations per time-step, or less accurate computation methods, may increase U. Nevertheless, if due to the increased noise level a smaller time-step has to be used, the actual execution time for the simulation over a given time-span may go up.
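In the main program structure of § 10.1.7, U can be recorded with a few extra lines (a sketch; Np and Nt stand for the particle and time-step numbers of the run):

tic
% ... time integration loop over Nt time-steps for Np particles ...
t=toc;
ups=Np*Nt/t % updates per second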

10.3 Program writing

An important part of program writing is time management. Program components which are more complex and difficult (time integrators, parts that involve computational geometry and exception handling) should be written when one has better concentration and can expect fewer interruptions than parts of the code where errors will become obvious more easily (data input and output, graphics). Additionally, the right choice of tools can make life easier.


10.3.1 Editors

It is better to get used to an editor that is available on many platforms than to become specialized in one which may not be available when one changes the software platform. Some proprietary editors seem to be designed with the intention of making it difficult for users to switch to other software. Some properties of editors are more suitable for programming, and others are less so. Humans see by contrast, which is why the boring black-letters-on-white-background prevails as the default setting; with white letters on a black background, much less light reaches the eyes. Syntax highlighting is in principle a good thing, but nearly every color will produce less contrast than black and white, so colors should be used with care. Code will look more regular if shown in monospaced type, which is the default of many editors; but often the font may mimic that of a typewriter, where the letter 'l' and the number '1', as well as the letter 'O' and zero '0', are difficult to distinguish. Playing a bit with the font settings may result in much more readable source code. If one finds programming tiring to the eyes, it sometimes helps just to switch to a larger or more easily readable font, or to a combination of background and letter colors that produces better contrast.

The peculiarities of some languages may require special treatment. For example, in MATLAB, variables are not declared, so adding a letter unintentionally due to key repetition may create a new variable name without warning; to avoid such errors, in some configurations of MATLAB's built-in editor, key repetition is turned off. For scientific programming, where the same operations may be repeated with several variables, successive lines of the program may contain similar code. It can be easier to edit such code in an editor which manipulates text (i.e. cut and paste, copy, delete) not only line-wise but also column-wise. For debugging and analysis, the commenting and uncommenting of larger portions of code is convenient with editors that allow this action by means of single key-combinations.

Until the 1990s, small monitors with bad resolution usually could allow only a single window or application on the screen. Then, graphics resolution got sharper and screens became larger, so working with multiple windows on one screen became common. Nowadays, with cheap laptops and sub-notebooks, we are back to using small screens. Usually, during program writing, one will need about four application windows at the same time: the editor, the window showing the compiler messages, the graphics output window, and another window to view the data output. Both more and fewer open windows can make it difficult to access the necessary information simultaneously.

10.3.2 Compilers

During writing and debugging, it is usually advantageous to have more than one compiler available, as the messages from some compilers are more helpful than others. Moreover, if bugs occur, the different behavior of different compilers for the same language may help to identify the problem. Some compilers and some programming languages are more helpful than others for debugging. Very often, errors with indices lead to problems. Historically, FORTRAN compilers have been able to verify whether indices were in the allowed range at runtime via a compiler option like -C (capital C, not lowercase -c, which is for the generation of object files), or -fcheck=bounds in the gfortran compiler.


Although for production runs the code should be compiled without this option, as it can slow down the program execution considerably, it is a valuable tool during the debugging phase. Programming styles which rely heavily on the use of pointers instead of indices are at a disadvantage here. In many programming languages of the 1970s (C, FORTRAN77), data are passed to subroutines only by the initial pointer. This may be faster than the actual copying of the data, which is what happens in MATLAB, but it is quite unsafe, as integer data can be passed to subroutines which expect floating point data, and the meaning of the bits will be totally different; see Chapter 2, § 2.1.1. Although this was sometimes used intentionally (e.g. to generate random floating point numbers from integer arithmetic: overflow due to multiplication with large numbers leads to truncated bit-patterns which are fairly uniformly distributed when interpreted as floating point numbers), there is considerable danger that in large codes, some data mismatches will remain undiscovered. Newer programming languages like ADA will check the data types, so that the types and dimensions of data in calling and called functions must match. This has become available to FORTRAN programs with the FORTRAN90 standard (and later versions), but such 'module variables' are programmed in a slightly different way than the old FORTRAN77 variables, which are still available. As for any software, there are quality differences between compilers. Compilers produced by different manufacturers may lead to different program performance; they also differ in the verbosity of their error and warning messages. Newer compilers usually use novel features of newer processors more efficiently than do older compilers.

10.3.3 Makefiles

If one uses various modules, libraries and compiler options, these can be specified in ‘makefiles’ [18] (under Unix; other operating systems offer similar features). After modifications of the program, only the modified program parts have to be recompiled, which saves time in the later stages of programming or debugging when a code has assumed considerable length. Unfortunately, the concept is a bit involved, and makefiles can be arbitrarily complicated (e.g. for automatic compilation on different platforms, automatic checks of compilers can be included). Instead of writing a makefile from scratch, it is better to modify an existing example makefile. The makefile should be simple enough so that one can understand it, and it should be for one’s target language. Different languages have different dependencies—they need or produce different files; for example, C needs .h header files, FORTRAN90 produces .mod files for interface descriptions, and so on. ‘Make’ also allows one to set a rule to ‘clean’ files if the compilation should be done from scratch; this is necessary when, for instance, one has changed the compiler options for debugging (‘make’ only recompiles those files which have been modified since the last compilation, not the files which have been compiled with different compiler options). Also, when one changes to a different compiler, it is safer to delete the files produced by other compilers. While FORTRAN compilers can in principle use each other’s .o files, they don’t understand each other’s .mod files, and the error messages that result when one compiler uses the leftover .mod files of another are totally unintelligible. A frequent source of errors with makefiles is that they require the use of tabs, which are easily confused with spaces when reading program examples. Only some, but not all, platforms are helpful in explicitly pointing out that one has used spaces instead of tabs by mistake.


10.3.4 Writing and testing code

Coding usually consists of three stages: writing the code, testing it, and debugging it. There are several principles which one has to internalize while one educates oneself to become a programmer of scientific problems.

Think first, do the coding later: Before starting to code, one should first work out the necessary development stages away from a computer. Having a keyboard under one's fingers usually clouds the thinking, because one wants to type the first thing which comes to mind, and rash implementations may make costly refactoring necessary at a later programming stage, when the implementation that seemed so straightforward earlier on turns out to be unfeasible for the problem at hand. One's favorite doping substances (coffee, tea, chocolate or other sweets, caffeinated lemonades and, in extreme cases, vitamin-C-rich fruit juices) are as effective in the planning stage as during the actual coding (and, on the downside, an overdose may ruin one's digestion, sleep patterns, power of concentration and working ability to the same extent).

Graphics: Particle simulations have considerable complexity, and onscreen visualizations are an efficient tool for monitoring the correct implementation of boundary conditions, particle initializations and 'physicality'. To discriminate errors in the initialization from those due to computation, it is useful to call the graphics already before entering the 'main loop' over the time-steps.

Input data verification: One should always check whether input variables are in a permissible range; even innocuous-looking functions like asin or acos can lead to trouble if they are called with input values greater than 1, even if the excess is only in the fifteenth digit. Often, it will not be the programmer alone who uses the final code, and the next user might attribute any malfunction to the original programmer's error rather than their own meaningless data initializations, as in the following (unfortunately true) anecdote:

Undergraduate to thesis advisor: 'The program does not work. It crashes.'
(Hectic activity of thesis advisor at student's terminal, with suppressed swearing about the stylish screen setting of dark lilac letters on black background. After five minutes . . . )
Thesis advisor to undergraduate: 'What did you think when you set the diameter of the sand grains to two meters?'
Undergraduate to thesis advisor: 'Nothing.'

Some variables are 'safer' than others, and can be left unverified; for example, masses are usually initialized with positive values. Moments of inertia, on the other hand, must be computed, and careless handling of vectorial directions can lead to negative entries. If one intends to use data in a certain range, at least during the phases of writing, testing and debugging one should verify that the data remain in the permissible range. The authors have made it a habit to put walls around the domain where the particles should be; that way, if anything goes wrong, the particles will at least still lie within a guaranteed range.

Only integer data types can be compared for equality; double precision results will be affected by rounding errors, and analytically identical results may differ in floating point arithmetic. While analytically a = tan(π) and b = tan(0) should both be identical to zero, MATLAB gives


>> a=tan(pi)
a = -1.2246e-16
>> b=tan(0)
b = 0

So, for floating point numbers, rather than checking for equality, one should verify that the absolute value of the deviation between the two numbers is below a certain tolerance, which the programmer should be able to supply. For the above a and b, we have

>> a==b
ans = 0
>> my_epsilon=1e-14;
>> abs(a-b)<my_epsilon
ans = 1

Table 10.1 Typical load patterns displayed by Xload (each horizontal stroke marks a multiple of 100% load):
A short period of load > 100%, e.g. due to program compilation, followed by a longer period of load > 100%, e.g. due to the running of the program.
Sudden increase of the load to > 400%, due to the start of a single multi-threaded job consisting of four single-threaded jobs at once, followed by a sudden decay, e.g. due to a program crash, and then gradual increase of the load due to the start of another job and yet another.

Xload is a relatively old graphical X11 tool for visualizing the load. If the load increases beyond multiples of 100%, a new horizontal stroke is added; see Table 10.1. Because Xload also displays the name of the host it is running on, it is convenient for monitoring several machines at once via remote login. Xload is also useful for checking whether programs started without problems: if the load increases suddenly and then decreases immediately for a program which was supposed to run for hours, the program must have either crashed or been started with an unintended small number of time-steps. On the other hand, if one had intended to submit a chain of 40 jobs to be run successively in the background, a rise of the load towards 40 horizontal strokes indicates that the jobs were started synchronously instead.

10.4.3 Performance monitor for multi-core processors

Xload does not display information about the available number of threads, only the overall load. More recent performance monitors separate the load according to whether the processes are user processes or system processes, with less time resolution. In Table 10.2 we display a few patterns in a design similar to the ‘Activity Monitor’ of Mac OS X.

10.4.4 The 'time' command

One can obtain the most rudimentary understanding of a program's behavior by measuring the execution time and the turnaround time (the real time the program needs to finish). The time command in Unix can be used from the command line in front of any executable which is started. It measures the real time (the actual 'wall clock time' elapsed from the start to the end of the job), the user time (the amount of time that the CPU executed processes owned by the user), and the system time (the amount of time that the CPU executed processes owned by the system, root etc.).


Table 10.2 Characteristic load patterns for a machine with eight cores: light gray rectangles indicate user time, dark gray rectangles indicate system time, while black areas indicate that the cores are idle; two rectangles correspond to the maximal load of a single core.
Interactive use of the computer, with various applications run for short times. Depending on the owner of the application, the CPU activity is shown as user time (light gray) or system time (dark gray).
A single job is started, which does not use much system time, and then terminates. If interactive profiling tools were used during the execution of the job, the system load would increase.
A first job is started, and shortly afterwards a second job is started. The increase in the system time (dark gray rectangles) indicates that running both jobs at the same time strains the resources, probably due to swapping of the memory between the jobs.
The execution of a parallel program is regularly interrupted, with the load being reduced to that of a single core. There is some heavy graphical or file output taking place at regular intervals, which slows down the program execution.
A multi-threaded job is started with an initialization which is partially scalar and partially parallel. During the simulation, the load increases slightly over time with the internal parallelism (e.g. as the loop with the contacting particles becomes longer, so that the time consumption for the interaction computation increases).

The following data were obtained for a program for simulating polyhedral particles [15] with (thread-based) shared memory parallelization, compiled for scalar execution and with OpenMP, and executed with 1, 2, 4, 6 and 8 threads with Intel multi-threading on up to four cores. The output of time a.out for the executable a.out of the scalar code is

1555.79 real  1550.77 user  1.57 sys


This means that the job takes 1556 seconds, or about 26 minutes, to finish; the system time is negligible, so that the user time is nearly the same as the wall clock time. Compiled with the OpenMP option and executed on a single thread, the output is

1636.00 real  1633.05 user  0.94 sys

which is basically the same as before, except that due to the different memory management for OpenMP, the total time has increased. For an execution with two threads, we get

1041.41 real  1952.78 user  36.48 sys

i.e. the real time is reduced, as the work has been jointly undertaken by two threads which were executed in parallel. Note that the user time has become greater than the real time, as the separate execution for the two threads is 'billed together'. The actual user time has increased compared to the execution in a single thread, as 'forking' (distributing the jobs onto the different threads) and 'joining' (collecting the results) during the parallel execution increases the work for the CPU. The system time has increased too, due to the additional 'administrative effort' associated with multi-threaded execution. With four threads, the times are

798.61 real  2660.66 user  99.95 sys

The trend in the previous examples continues, i.e. with additional threads the real time is further reduced (the program finishes faster than for the execution on fewer threads), while the user time becomes higher. With six threads,

857.95 real  3481.17 user  249.70 sys

Now the parallelization overhead has increased to such an extent that the real time has become greater than for the execution on four threads. With eight threads, we get

922.60 real  3858.49 user  405.01 sys

The efficiency is even worse than for six threads, even though the machine had been 'emptied' before running the program, i.e. other programs which were expected to slow down the execution (mail, internet browser etc.) had been shut down. The increase in the system time is purely due to parallelization overhead, which for n processors is approximately proportional to n − 1, i.e. O(n); this is less favorable than the assumption of a constant serial part in Amdahl's law of § 10.2.5.

For performance analysis on other architectures and systems, it is important to choose the right function. On multi-core architectures, if parallelization is used, the user time will in general be greater than the real time. In MATLAB, the function cputime would give the user time, while the pair tic and toc would give the real time. Before measuring the runtime of a program, one should not only shut down all unrelated programs but also reboot the machine; otherwise, there may be memory regions which are usable only in small fractions. Such 'memory fragmentation' results when programs have been shut down but the memory was not released properly and is not available to the operating system any more. The use of such memory regions during runtime may slow down the program execution considerably. Of course, for production runs of programs which run for a long time, the machines should also be freshly rebooted before the programs are started.
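In the notation of § 10.2.5, and taking the scalar run as the honest baseline, these measurements correspond to a speed-up of S = 1555.79/1041.41 ≈ 1.49 (efficiency E ≈ 0.75) on two threads and S = 1555.79/798.61 ≈ 1.95 (E ≈ 0.49) on four threads. A minimal sketch of the corresponding measurement in MATLAB (the loop is only a placeholder workload):

t0=cputime; tic
s=0; for i=1:1e7, s=s+sqrt(i); end % placeholder workload
t_real=toc        % real ('wall clock') time
t_user=cputime-t0 % user (CPU) time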


10.4.5 The Unix profiler

Program profiling is the analysis of how much time a program and individual functions within it take, be it for optimization purposes or to figure out the CPU demands of applications. Usually, on Unix systems, additional compiler options such as -pg or -pgprof (depending on the distribution) have to be specified with the compiler. The execution is then interrupted at random points, and a ‘tick’ is added for the function which was executed at that moment. These ticks are written into a file ‘mon.out’ or ‘gmon.out’. The file can then be read with the command prof, which creates a table of the functions ordered according to name or CPU consumption, depending on the option specified. For profiling, one should take care that the program reaches ‘equilibrium’, i.e. that functions which are used only during initialization do not contribute too much to and distort the information on CPU usage. As the length of ‘mon.out’ files can be considerable, and the interruption during profiling may increase the runtime by up to 30%, the final code for production runs should never be compiled with the profiling option.

10.4.6 Interactive profilers

Nowadays, some commercial software comes with interactive profilers, so that the profile can be viewed while the program is running, or the profiling can be switched on after the initialization. The necessary compiler options and associated information have to be extracted from the man-pages, and sometimes the information may be ambiguous; for example, when a function A calls a function B, the time taken for allocation of memory in function B may be added to the CPU time consumption of B, while the time used for deallocation may be added to the CPU time consumption of A. Waiting time is conventionally the time for which cores are idle in parallelized parts of a program, but some systems may also allocate the idle time in non-parallelized program parts to the waiting time.

10.5 Speeding up programs

Trying to speed up a program that has to run for only a day is not worthwhile; but when the same program has to be rerun 50 times, and the total runtime will exceed a month, it is worth thinking about optimizing it. If the program contains only one subroutine which takes up the bulk of the computing time, say 50% or more, optimization is simple. If the CPU time is divided evenly between dozens of functions, optimization is more difficult. Discrete element simulations are usually very costly in terms of CPU time, while lattice simulations in fluid mechanics are often more memory-intensive than CPU-intensive. This means that with lattice methods, when larger simulations cannot be performed it is mainly due to lack of memory, whereas the limiting factor for discrete element simulations is usually the runtime.

10.5.1 Estimating the time consumption of operations

Particle simulation methods are usually quite computing-intensive. During the programming process one should get an idea of how much time each routine will take.


The relevant unit is the cycle; e.g. for a 1 GHz machine, 10⁹ cycles are executed per second. With pipelining, i.e. by overlapping different stages of the execution of commands, it is possible to execute one multiplication, addition or subtraction per cycle, for either integer or floating point numbers. Function evaluations are more costly: divisions and evaluations of transcendental functions usually take about ten times as long as multiplications or additions (or subtractions), as they have to be composed of the latter operations. Branching (if-conditions) may increase the computational costs: depending on the result of the logical decision, one or the other operation has to be effectuated, so it is impossible to overlap different operations, and the average computation rate in results per cycle decreases. Calls of self-written functions usually make it necessary to push the data of the calling program part onto the stack, and these data then have to be retrieved after the return from the called function. While in cache-oblivious programming a single function for a single operation is common, that usually does nothing for the transparency of the program, let alone the performance.
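One can get a feeling for these cost ratios with a small timing experiment (a MATLAB sketch; absolute numbers depend on the hardware, and the interpreter and its internal parallelism add overhead):

x=rand(1e7,1)+1; b=1.5;
tic; y=x*b;    t_mult=toc % one multiplication per element
tic; y=x/b;    t_div=toc  % one division per element
tic; y=tan(x); t_tan=toc  % one transcendental function per element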

10.5.2 Compiler optimization options

Here optimization usually means optimizing the speed of the executable. As mentioned above, particle simulations are limited more by the computing speed than by the amount of available memory, so we will not deal with optimization in the sense of reducing the necessary amount of memory. Usually, the code which is compiled and executed is not 'optimal', in that with some adjustments a higher execution speed could be obtained. Although MATLAB is not a compiler language, we will use it in the following examples to be consistent with the program examples in the rest of the book. The following list gives several measures which are available to the compiler to speed up programs.

Different layout of data: The loading of variables which are in unsuitable relative positions in memory may lead to cache misses (additional waiting time due to reloading of data from the memory into the cache, or into the hierarchy of caches). A more advantageous data layout in the memory may speed up the program execution.

Pipelining: The result of one operation is directly piped as input into the next operation, instead of being written into memory or cache and then read from there again.

Reordering of code: This means that either operations on the same data are executed together, or loops may be separated or fused.

Parallel execution: Processors may have several units which can perform the same operation, e.g. a floating point multiplication. So, for a loop like

for i=1:k
  a(i)=b*c(i)
  d(i)=e*f(i)
end

where two multiplications are executed in each iteration, the multiplication by b might be executed on one processor unit and the multiplication by e on the other unit.


Extraction of loop invariants: In the following loop, it is not necessary to compute the arc tangent every time, as its value is always the same:

for i=1:k
  b=atan(4)
  a(i)=2*b*c(i)
end

Such 'loop invariants' can be computed outside the loop for greater efficiency:

b=2*atan(4)
for i=1:k
  a(i)=b*c(i)
end

Removal of if-conditions: As mentioned above, branching, i.e. the execution of if-conditions, may inhibit the program flow, as pipelining (overlapping the execution of instructions) will be interrupted. For simple if-conditions like

for i=1:k
  if (a(i)>b(i))
    c1(i)=d(i);
  else
    c1(i)=e(i);
  end
end

it may be possible to replace if-conditions with arithmetic operations which yield the same result but are executed faster:

for i=1:k
  fak1=.5*(sign(a(i)-b(i))+1); % 1 if a(i)>b(i), else 0
  fak2=1-fak1;                 % 1 if a(i)<b(i), else 0
  c2(i)=d(i)*fak1+e(i)*fak2;
end

This code can further be vectorized in MATLAB (written with implicit loops) to

fak1=.5*(sign(a-b)+1); % 1 if a(i)>b(i), else 0
c3=d.*fak1+e.*(1-fak1);

This vectorized variant indeed performs faster, at least for vector lengths greater than 10 000. For compiler languages, the second code version should be faster.

Code modification which does not change the result of the program: For example, instead of


for i=1:k
  cosa=cos(i)
  sina=sin(i)
  a(i)=sina/cosa
end

which requires the evaluation of two transcendental functions and a division, a direct call to the tangent function would be faster:

for i=1:k
  a(i)=tan(i)
end

Elimination of code which does not lead to program output: This means that the compiler analyzes the data flow and eliminates all variables and operations which do not lead to any output. This can be dangerous if one wants to profile an operation by writing it in a loop which repeats the operation a large number (thousands or tens of thousands) of times in order to obtain accurate time measurements. Therefore, one should have an approximate idea of the time consumption of operations and the performance of the processor, so that one is not surprised by timings which are one-tenth or less of what the processor is actually able to achieve; in such cases, the compiler may have simply eliminated that part of the code whose execution time one wanted to measure (as experienced by the first author on a Cray Y-MP in the mid-1990s).

Loop unrolling: Loops introduce an overhead (additional cost) compared to the operations which are actually executed in the loop. In

for i=1:k
  a(i)=b*c(i)
end

the for command means that in each iteration of the loop, an index variable has to be incremented and compared in an if-condition to check whether it is larger than the upper index k; if not, the multiplication with b is executed. The overhead associated with the index variable can be reduced by loop unrolling, i.e. by changing the increment of the index:

for i=1:4:k
  a(i)  =b*c(i)
  a(i+1)=b*c(i+1)
  a(i+2)=b*c(i+2)
  a(i+3)=b*c(i+3)
end

Loop reordering: Indices should access the memory in the order in which the variables are held in storage. In MATLAB and FORTRAN, where arrays of variables are stored column-wise (column-major order), the first index (the row number) should change fastest.


In C, where arrays are stored row-wise (row-major order), the last index should change fastest to reduce cache misses. A good compiler may change the execution order of the loops in

for i=1:l
  for j=1:k
    a(i,j)=b(i,j)*c(i,j)
  end
end

so that the loop over i is rewritten as the inner loop.

Function inlining: Like loops, the calling of functions induces an overhead: variables from the calling program have to be pushed onto the stack, new temporary variables must be initialized, and so on. Inlining means that the compiler will in principle write the function's source code into the lines of the calling program to avoid this overhead.

Code replacement with optimized machine code: Some compilers are able to recognize the functionality of loops (and other constructs) and replace them with optimized machine code. The most prominent example is the BLAS (basic linear algebra subroutines) suite [20], originally written in FORTRAN but nowadays also available in other languages from Netlib [21], along with a lot of other software. BLAS1 consists of vector operations (inner product, scaling of vectors, etc.), BLAS2 includes matrix–vector operations, and BLAS3 matrix–matrix operations. As the routines are usually provided by the processor vendor, who knows how to use all the features of the processor, the performance is generally considerably faster than that of self-written source code.

Mathematical transformations: As divisions are much more costly than multiplications (by a factor of five to ten), code like

for i=1:k
  a(i)=c(i)/b
end

for large k will execute more slowly (by a factor of five to ten) than

ib=1/b
for i=1:k
  a(i)=c(i)*ib
end

However, for floating point numbers, division by a number and multiplication by its inverse may not give results that are identical up to the last bit, so such optimizations are not standard.

Automatic parallelization: If one works on a multi-core or multiprocessor architecture, the compiler may be able to distribute the work over several processors or cores.

Execution with lower precision: The IEEE standard states that operations should be rounded to the last digit. This last digit is sometimes very costly to obtain, maybe even taking as much time as the whole operation for the other digits. In that case, a compiler may skip the final accuracy refinement of the last bits.


Table 10.3 Evolution of various computer architectures from single precision (4-byte, 32-bit) arithmetic (SPA) to double precision (8-byte, 64-bit) arithmetic (DPA). Years in brackets are approximate, due to the common ambiguity in the 'release date' (date of announcement, finished prototype, production start or commercial availability).

Mini computer. Earlier: IBM 1130 (1965): no DPA. Later: DEC PDP-11 (1970): coprocessor with DPA optional.
Vector supercomputer. Earlier: Control Data 6600 (1964): SPA slower than DPA. Later: Cray-1 (1976): only DPA, no SPA.
RISC (Reduced Instruction Set Computer) workstation. Earlier: SUN SPARC (1987), Intel i860 (1989): DPA slower than SPA. Later: IBM RS6000 (1990), DEC ALPHA (1992): DPA as fast as SPA.
CPU with integrated communication unit. Earlier: Inmos Transputer T414 (1986): SPA. Later: Inmos Transputer T800 (1987): DPA.
SIMD (Single Instruction Multiple Data) computer. Earlier: Connection Machine CM2 (1987): DPA considerably slower than SPA. Later: Connection Machine CM-200 (1991): DPA speed improved.
Graphic processors for general purpose computing. Earlier: Cell (2005), NVIDIA CUDA Geforce-8 (2006): DPA slower than SPA. Later: NVIDIA Geforce-400 Fermi (2010): DPA as fast as SPA.
SSE4 (Streaming SIMD Extensions 4). Earlier: Intel Penryn (2007), AMD K10 (2007): only SPA. Later: Intel Sandy Bridge (2011), AMD Bulldozer (2011): also DPA.

While this last point perhaps does not have serious consequences for discrete element codes, another possibility is much more dangerous: on some computer platforms, single precision floating point operations are executed faster than double precision operations. In fact, it is a long-standing pattern in hardware history that earlier models appeared with single precision arithmetic executing faster than double precision; when it turned out that such hardware was not useful for scientific computation, later versions were built with faster double precision arithmetic; see Table 10.3. This pattern has recurred for so long that it will in all likelihood continue into the future. For this reason, one has to be aware of compiler optimization options which, for the sake of performance, may reduce the computational accuracy. For particle simulations, this may make the overlap computation too inaccurate to be useful. The reduction in accuracy, though not necessarily to single precision, may be to a precision level which introduces enough noise to reduce the stability of algorithms or granular packings. As one of the authors found on an Intel Pentium 4 processor in the early 2000s, enforcing full precision led to an increase of the computing time by a factor of three. If one suspects that a performance gain has led to a decrease in numerical precision (one indication is that the strength parameters for granular assemblies, such as stress-strain curves and angles of repose, are lower with optimization than without), there is usually an option that allows one to invoke the higher optimization levels but without a reduction of accuracy; this option may be called -IEEE, -mp ('maintain precision') or -assume accuracy_sensitive. There is no standard governing which optimizations are performed by which compiler option, so it is advisable to study the man pages of the compiler. Compiler options which lead to faster code usually start with -O (uppercase 'O'; lowercase 'o' is usually reserved for the


creation of object files). Common directives for higher-level optimization are -O1, -O2 and -O3; the higher the numerical value, the faster (in general) the resulting code, but also the higher the risk. Numbers above 3, i.e. levels -O4 or -O5, usually indicate 'aggressive' optimization, which may not be trouble-free. If no optimization options higher than level 3 are available, -O3 will in all likelihood perform inlining; if higher optimization levels are available, or if there is an option -fast, inlining is usually performed with them. For some compilers, inlining must be selected by a special -inline option, or functions can even be inlined selectively. Processors come in families, i.e. there are several generations of each processor based on the same 'generic' architecture, the first architecture in the family. Usually, one works not on the generic model of a family but on a newer member which is considerably faster, has larger cache sizes (see § 10.2.2) or has more functional units or multiple cores (see Figure 10.1). Compilers usually have a switch like -generic, so that the code will run on all processors of a given family; this is often the default option. Options like -arch native or -arch host optimize the code for the particular machine on which it is compiled, which is preferable when speed is of the essence. With some compilers, setting the cache size by hand or enforcing the alignment of data along certain boundaries in memory (again, reading the man pages of the compiler is recommended) may also lead to performance improvements.

10.5.3 Optimizations by hand

Of course, some of the strategies in the previous subsections can already be incorporated into one's coding practice, as long as readability is not diminished, so that one does not have to rely on the compiler to obtain a code with minimum overhead. Other approaches require the programmer's understanding of the programming language and its handling of data. For example, the definition of variables along rows or columns should be done in such a way as to avoid unnecessary cache misses; in FORTRAN and MATLAB®, the most frequent data accesses in multi-dimensional arrays should be via the first index, while in C and its derivatives C++ and JAVA the last array index should be used. In the context of particle simulations, corner data in MATLAB® should be defined with the ordering

corner(icorner,iparticle)

where icorner runs over the corners and iparticle over the particles, to make sure that the data for all corners of a particle lie in successive memory locations. Some operations, especially in linear algebra, can be replaced with a 'higher BLAS level', so that instead of several vector operations a single matrix operation is used. For

v1=A*w1
v2=A*w2
...

grouping the vectors before performing the matrix multiplication,

W=[w1 w2 ...]
V=A*W


may give better performance (depending on the length of the vectors). In the same way, instead of solving several linear systems with the same matrix and different right-hand sides,

v1=A\w1
v2=A\w2
...

the simultaneous solution

W=[w1 w2 ...]
V=A\W

may be faster, as only a single LU decomposition of A is necessary.
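The claimed savings can be checked with a small experiment; the following sketch uses arbitrary example sizes, and the actual speed-up will depend on the matrix dimension and the MATLAB® version:

n=500; m=200;                   % example sizes
A=rand(n)+n*eye(n);             % a well-conditioned example matrix
W=rand(n,m);                    % m right-hand sides stored as columns
tic
V1=zeros(n,m);
for i=1:m
   V1(:,i)=A\W(:,i);            % one decomposition per right-hand side
end
tloop=toc;
tic
V2=A\W;                         % one decomposition for all right-hand sides
tgrouped=toc;
[tloop tgrouped]
max(max(abs(V1-V2)))            % both variants agree up to rounding errors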

10.5.4 Avoiding unnecessary disk output

Of all the memories in the computer, mass storage (the hard disk) is the slowest. When programs provide the possibility to write out data for generating graphics or recording individual trajectories, the data are usually written to disk. If one is not interested in these data, they should not be written out. If the hardware monitor (see § 10.4.3) does not indicate approximately 100% load for an executable, the reason is often that the program has to wait for data to be written to disk. Usually, an operating system does not write data from a program directly to the disk; data are written into buffers, which are flushed to disk when they exceed a certain amount of storage in main memory, so there is no harm in a few write operations per time-step. Nevertheless, if all the coordinates are needlessly flushed to disk in every time-step, there will be a considerable delay in the program execution, not to mention the extra wear and tear on the hard disk.
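A minimal sketch of such output throttling (all names and the output interval are example choices; the actual integration step is only indicated by a stand-in):

nt=1000; nout=100;              % number of time-steps, output interval
x=zeros(1,3);                   % example 'coordinates'
fid=fopen('trajectory.dat','w');
for it=1:nt
   x=x+rand(1,3);               % stand-in for the actual integration step
   if mod(it,nout)==0           % write only every nout-th time-step
      fprintf(fid,'%g %g %g\n',x);
   end
end
fclose(fid);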

10.5.5 Look up or compute

Sometimes, information can either be computed by brute force or be looked up. For the overlap of two polygons, one can write a loop which computes the intersections of all edge pairs. If the contact also existed in the previous time-step, one can save the indices of the neighboring corners and look for overlap only near the corresponding edges; this simplification is based on the fact that large relative motion is not possible from one time-step to the next, or else the time integration will blow up anyway.
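A sketch of the bookkeeping, with hypothetical names (edges_intersect stands for whatever edge-edge intersection test the code uses; the window of one edge on either side of the stored pair (iold,jold) is an example choice; n and m are the edge counts of polygons P and Q):

if haveOldContact                   % contact known from the previous step:
   irange=mod((iold-2:iold),n)+1;   % test only edges iold-1, iold, iold+1
   jrange=mod((jold-2:jold),m)+1;   % (periodic wrap-around of the indices)
else
   irange=1:n; jrange=1:m;          % fall back to the brute-force loop
end
for i=irange
   for j=jrange
      if edges_intersect(P,i,Q,j)   % hypothetical intersection test
         iold=i; jold=j; haveOldContact=true;
      end
   end
end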

10.5.6 Shared-memory parallelism and OpenMP

The principle of using shared-memory parallelism for multiple threads, cores or processors is the following: computers execute their tasks as threads, which are combinations of data and instructions operating on those data; a parallelizing compiler can issue several threads instead of one, over which the data, and also the execution, are distributed. When the compiler finds no dependencies between the data, 'automatic parallelization' is possible without additional information; but if the independence of the data is not clear, the programmer has


Table 10.4 Kinds of data and the corresponding attribute for the declaration in OpenMP.

Usage of data | Attribute in OpenMP
Data a thread receives only as input data | firstprivate
Data a thread produces as output data | lastprivate
Data accessible only to a thread | private
Data accessible to all threads | shared

to specify additional information so that the compiler can distribute the work. This is done via 'parallelization directives', which are usually written as comments so that the same code can be used without parallelization on a single core. Some vendors have their own proprietary sets of directives; the most common portable standard is OpenMP [22]. Threads can exchange data via the memory or, depending on the hardware, via the (usually third-level) cache, which is faster. In principle, the computing power of four cores may be provided by four independent processors, by two processors with two cores each, or by a single processor with four cores; for a given clock rate, which configuration gives the better performance will depend on the size of the caches and the bandwidth of the main board. Cores may be capable of executing only a single thread or several threads at a time. The transistor count for cores which can execute two threads is about twice that of a single-threaded core, so apart from the control hardware, for scientific programming, double-threaded cores can be thought of as two cores. When the compiler is not able to parallelize the code automatically, the programmer has to specify at least the data attributes; see Table 10.4. A wrong choice of data attributes may inhibit parallelization or lead to wrong computational results.

10.6 Further reading

The reason that books are still written and sold is that information is often much better structured in books than in documents found on the internet. Besides, one can still flip through a book when the computer screen is already crammed full of other information. During a complex programming project, apart from the literature referenced in other chapters, one should also have access to books on the fundamentals of computer science, such as the one by Aho and Ullman [23], as well as to books on algorithms. Reference and user manuals for programming languages and libraries, and introductory tutorial books if one has to learn the languages or packages while using them, also belong on one's desk. Apart from the aforementioned series by Knuth [3], some less monumental books include the one by Cormen et al. [24], which is independent of programming languages, and Algorithms by Sedgewick, with [25, 26] or without [27] specific programming languages. Debugging the Development Process [9] can be very helpful in stimulating one to rethink one's general working strategies. Death March [8] may offer comfort by recounting what happened to other people, though the circumstances described in that book are to be avoided, not mimicked. The Pragmatic Programmer [5] describes a collection of behavior patterns that are useful when developing programs; it is well worth reading and for the most part relevant to scientific computing. Code Complete [4, 10] is a valuable text containing a lot of inspiring information and many references on programming practice, although the first edition may be more useful for programmers in


scientific computing, as the rewriting of the example code pieces in JAVA has not improved the diversity of information. Two complementary books on OpenMP are those by Chapman et al. [28] and Chandra et al. [29].

Exercises

10.1 Amdahl's law is derived under the assumption of no parallelization overhead. Derive its analogue for constant overhead and for overhead proportional to the number of cores, and plot graphs analogous to those in Figures 10.6 and 10.7 with suitably chosen magnitudes. Up to what number of cores does parallelization make sense?

10.2 Find out how many floating point units your processor has. Apart from technical reports and data sheets, image searches on the internet may yield a block diagram or even an annotated photograph of the processor's die.

10.3 Save the following MATLAB® program (which does nothing but call a few ODE demos) in the MATLAB® editor:

clear
format compact
vdpode(20)
ballode
orbitode
return

Run the profiler (in the Tools menu) on this program. Try to understand the output of the profiler.

References

[1] The International Obfuscated C Code Contest, http://www.ioccc.org, last visited December 2013.
[2] N. Wirth, "The programming language Pascal", Acta Informatica, vol. 1, pp. 35-63, 1971.
[3] D. Knuth, The Art of Computer Programming, Volumes 1-4A. Addison-Wesley, 2011.
[4] S. McConnell, Code Complete, 2nd ed. Microsoft Press, 2009.
[5] A. Hunt and D. Thomas, The Pragmatic Programmer: From Journeyman to Master. Pearson Education, 1999.
[6] J. Bentley, Programming Pearls, 2nd ed. ACM Press Series, Prentice Hall, 2000.
[7] F. P. Brooks, Jr, The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition, 2nd ed. Pearson Education, 1995.
[8] E. Yourdon, Death March. Yourdon Press Computing Series, Prentice Hall, 2004.
[9] S. Maguire, Debugging the Development Process: Practical Strategies for Staying Focused, Hitting Ship Dates, and Building Solid Teams. Microsoft Press, 1994.
[10] S. McConnell, Code Complete, 1st ed. Microsoft Press, 1993.
[11] L. Null and J. Lobur, The Essentials of Computer Organization and Architecture. Jones & Bartlett Learning, 2010.


[12] HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers, http://www.netlib.org/benchmark/hpl/, last visited December 2013.
[13] J. Dongarra, J. Bunch, G. Moler, and G. Stewart, LINPACK Users' Guide. Society for Industrial and Applied Mathematics, 1987.
[14] R. Himeno, Himeno Benchmark, http://openbenchmarking.org/test/pts/himeno, last visited December 2013.
[15] J. Chen and H.-G. Matuttis, "Optimization and OpenMP parallelization of a discrete element code for convex polyhedra on multi-core machines", International Journal of Modern Physics C, vol. 24, no. 2, article 1350001, 2013.
[16] R. Hackl, H.-G. Matuttis, J. M. Singer, T. Husslein, and I. Morgenstern, "Parallelization of the 2D Swendsen-Wang algorithm", International Journal of Modern Physics C, vol. 4, no. 6, pp. 1117-1130, 1993.
[17] M. Flanigan and P. Tamayo, "A parallel cluster labeling method for Monte Carlo dynamics", International Journal of Modern Physics C, vol. 3, no. 6, pp. 1235-1249, 1992.
[18] R. Mecklenburg, Managing Projects with GNU Make. O'Reilly Media, 2009.
[19] Cygwin, http://www.cygwin.com, last visited December 2013.
[20] I. S. Duff, M. A. Heroux, and R. Pozo, "An overview of the sparse basic linear algebra subprograms: The new standard from the BLAS technical forum", ACM Transactions on Mathematical Software, vol. 28, no. 2, pp. 239-267, 2002.
[21] BLAS (Basic Linear Algebra Subprograms), http://www.netlib.org/blas/, last visited December 2013.
[22] The OpenMP API specification for parallel programming, http://openmp.org/wp/, last visited December 2013.
[23] A. Aho and J. Ullman, Foundations of Computer Science: C Edition. Principles of Computer Science Series, W. H. Freeman, 1994.
[24] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms. MIT Press, 2001.
[25] R. Sedgewick, Algorithms in C++: Graph Algorithms. Addison-Wesley, 2002.
[26] R. Sedgewick, Algorithms in Java, Parts 1-4: Fundamentals, Data Structures, Sorting, Searching. Prentice Hall, 2003.
[27] R. Sedgewick and K. Wayne, Algorithms. Prentice Hall, 2011.
[28] B. Chapman, G. Jost, and R. van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming. Scientific and Engineering Computation Series, MIT Press, 2008.
[29] R. Chandra, L. Dagum, D. Kohr, D. Maydan, J. McDonald, and R. Menon, Parallel Programming in OpenMP. Academic Press, 2001.


11 Beyond the Scope of This Book

Several topics relating to non-spherical particles can be treated only marginally in this book: partly due to space considerations, partly because they are still very much topics of ongoing research, and partly because the continuum nature of the coupled systems is difficult to integrate with a treatment of particle systems.

11.1 Non-convex particles

There are many possible ways to simulate non-convex particles, and some approaches are more efficient than others. Allowing a non-convex shape for the particles and adapting the interaction computation for the polygons or polyhedra accordingly leads to much more complicated and time-consuming algorithms, so we do not consider this a feasible alternative, either from the algorithmic perspective (we know of no existing interaction algorithm for non-convex polyhedra) or from the performance point of view. Therefore, treating non-convex particles as composites of convex particles is the more practical approach. Connecting DEM particles with springs is not very efficient for granular simulations (in the case of fracture mechanics the argument is slightly different; see § 11.4): it introduces additional degrees of freedom between the particles, which have to be integrated, and damping forces must additionally be included in the forces which act between the particles. This leads either to smaller time-steps, if very stiff springs are selected, or to very wobbly particles, if soft springs are selected, for which the principle of 'hard particles, soft springs' must be abandoned. It is computationally more efficient to connect the particles rigidly; this is basically what is done for clusters of round particles, and for the 'clusters' of lines which constitute polygons.

11.2 Contact dynamics and friction

Johnson [1] gives a comprehensive treatment of the mechanical behavior at contacts, including the classical cases of linear, Hertzian and wedge-shaped contacts, the analysis of stress


distributions under various contacts, and the behavior of contacts under vibration, together with experimental results. Complementary to Johnson, with more focus on friction, is the book by Popov [2]. We reiterate here that there is ambiguity in the term 'contact dynamics' (or 'contact mechanics'): it can be used to refer to a discrete element method with rigid particles [3-5]; however, much of the content of newer texts that have 'contact mechanics' in the title (e.g. [6, 7]) is unrelated to discrete element methods and rather deals with the modeling of contacts between discretized particles. These books focus mostly on FEM solutions for surfaces in contact, drawing heavily on nonlinear finite element methods, but leave out friction at the contact.

11.3 Impact mechanics

In principle, impact problems (which range from single particles hitting planes to the collision of dominoes) can also be treated with the discrete element method. However, the problem of unphysical jumps for velocity-dependent forces (see § 7.1.1) has to be dealt with appropriately. While for many-particle simulations the singularity must be removed at least at the separation of the contact, for impact problems with high velocities the singularity at the closing of the contact must also be dealt with. Comprehensive theoretical, computational and experimental treatments of impact problems which fit well with the discrete element method are provided by Brach [8] and Stronge [9].

11.4 Fragmentation and fracturing

Modeling sintering with the discrete element method is easy: cohesive forces (as in § 7.3.5) are implemented as permanent interactions or with a time dependence. Accordingly, fragmentation can be computed by joining particles with elastic interactions and releasing an interaction when a critical stress or strain in the agglomerate is exceeded. Interactions connecting the centers of mass have been used [10, 11], but defining interactions along the surface, as for cohesive particles, is also feasible.
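A minimal sketch of such a release criterion, with hypothetical names and example data (a chain of particles; bonds is an nb-by-2 list of bonded pairs, L0 the unstrained bond lengths and eps_crit an assumed critical strain), not the implementation of [10, 11]:

np=10;                          % example number of particles
x=rand(np,2);                   % particle positions
bonds=[(1:np-1)' (2:np)'];      % example bond list: a chain
nb=size(bonds,1);
L0=0.1*ones(nb,1);              % unstrained bond lengths (example value)
eps_crit=0.05;                  % assumed critical strain
active=true(nb,1);
for ib=1:nb
   d=norm(x(bonds(ib,1),:)-x(bonds(ib,2),:));
   strain=(d-L0(ib))/L0(ib);
   if strain>eps_crit
      active(ib)=false;         % bond broken: interaction released for good
   end
end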

11.5 Coupling codes for particles and elastic continua

In the context of fragmentation, and also for large-scale simulations, the coupling of continuum approximations and particle simulations looks attractive: particle modeling is desirable for the small-scale structures, to retain the physical behavior, while continuum methods with coarse grids are desirable 'far away', to save computing time. Nevertheless, there are some physical limitations to this approach, which have to do with the propagation of perturbations between regions with different discretizations. While continuum models work with grids, on which amplitudes are defined as degrees of freedom, for spring models the degrees of freedom and the mass points to which the springs connect are one and the same. In this context, both grid models and spring models must be discussed together.


A fundamental parameter in this respect is the wave resistance, also called the 'mechanical impedance', which for elastic waves in a continuum is

I = \sqrt{Y \rho},

where Y is Young's modulus and ρ is the density. (For waves in electric circuits, the impedance, i.e. the resistance that electric waves experience during propagation, turns out to be the electrical resistance R.) It deviates from the continuum sound velocity c = \sqrt{Y/\rho} by a factor of ρ, the density. When a signal (wave) moves from material a with wave resistance I_a to material b with wave resistance I_b, and I_a = I_b, the transmission will be 100%: the whole wave moves into domain b. If I_a ≠ I_b, the wave will be partially reflected, i.e. only part of the wave is transmitted. The reflected and transmitted amplitudes (see, e.g., [12, p. 117] for derivations, which are the same for longitudinal waves as in the transversal case) are as follows:

reflected amplitude / incident amplitude = (I_a - I_b) / (I_a + I_b),
transmitted amplitude / incident amplitude = 2 I_a / (I_a + I_b).

Therefore, if the impedances are not well matched, the processes one wants to investigate in the granular phase will send out elastic waves which are at least partially reflected at the interface, adding noise (and reducing the reliability of the simulation) in the granular region. In the general theory of elastic waves, 'impedance matching' is done by inserting a region of a given length l_c and impedance I_c to minimize the reflection. The impedance for the granular region is obtained from the sound velocity in the corresponding particle region and the bulk density, i.e. the density of the particles with the voids included. While the analytical derivations (see [12, p. 121]) are relatively straightforward, the idea 'works' because there is no dispersion, i.e. the wave resistance is independent of the wavelength. In the practical situation of discontinuous particles interacting with (likewise discontinuous) grids, with different dispersions (i.e. signal propagation velocities which depend on wavelength and frequency), it is not clear to what extent 'impedance matching' is possible.

Apart from the impedance, there is another, purely geometrical, issue: mechanical waves can only be transmitted into another medium if the wavelength can actually be represented in the new medium. While particles with diameter d can swing in opposite directions, transferring such oscillations to a neighboring grid (even if it is made of springs, which would then not be isotropic) requires a lattice constant which is not too different from d. Thus, at least at the interface with the particles, the grid points must have the same density as the centers of mass of the particles. Reducing the point density of the grid as the distance from the particles increases may also lead to wavelengths which cannot be taken up by the grid, and therefore to waves that are reflected back towards the particles.

Because the movement of the grid will follow the movement of the particles only very roughly, due to the different interactions, the use of explicit integrators will not make sense, at least not for the particles. Implicit integrators such as the Gear predictor-corrector method will increase the stability of the simulation considerably [13].
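For illustration, the amplitude ratios follow directly from the formulae above; the impedance values here are arbitrary examples:

Ia=1.0; Ib=4.0;                 % example impedances, arbitrary units
r=(Ia-Ib)/(Ia+Ib)               % reflected/incident amplitude: -0.6
t=2*Ia/(Ia+Ib)                  % transmitted/incident amplitude: 0.4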

11.6 Coupling of particles and fluid

In the previous section we discussed some problems associated with coupling discrete elements and continua; there, the continuum was still solid, like the discrete element particles, and also linear. When the nonlinear equations of fluids are introduced, more difficulties are to be expected. If one needs to simulate particles in fluids, from the modeling point of view one has two options: either one demands that the particles form the exact boundaries of the fluid, or one relaxes this constraint. In the latter case, the fluid at least partially goes 'through' the particles, which is called a 'macroscopic' simulation; in the former case, one has a 'microscopic' simulation. In these two frameworks, we treat mesh-based (Eulerian) formulations of fluid simulations with particles. A third approach, where the fluid is simulated using particles (Lagrangian formulation), allows both macroscopic and microscopic formulations. Further, we comment briefly on 'novel' approaches to simulating fluids, and conclude with a remark on the simulation of surfaces.

11.6.1 Basic considerations for the fluid simulation

From the start, one has to consider which properties of the fluid part are relevant to obtaining a realistic flow simulation. If the fluid part is not treated with sufficient rigor, the physical outcome of the simulation as a whole becomes dubious. Too much noise will destabilize the fluid part and prevent the formation of static configurations in the granular part.

11.6.2 Verification of the fluid code

One problem in comparing fluid simulations with analytical results is that most analytical results are derived by assuming boundaries at infinity; for the corresponding quantities in systems with finite boundaries, other methods or reference data must be used. For low flow velocities (Reynolds number Re < 2) in the 'Stokes regime', where the flow lines are basically parallel to the obstacles' surfaces, it is common to compute the drag force. For narrow channels, often the wall correction factor (the multiple of the drag force for boundaries at infinity) is computed as a test case. Both drag force and wall correction factor are often available only for circles [14] or spheres; in that case, one has to use a large number of corners for the approximation, and the orientation of the corners will also play a role [15].

Intuitively, one might be tempted to verify a fluid code, or the interaction of the fluid code with a particle, for larger flow velocities as well, via the drag force on a stationary particle. However, the outcome may be rather ambiguous: the drag force depends crucially on the properties of the surface, and small changes in the surface may lead to rather large changes in the drag. This fact is exploited by baseball pitchers, who throw the ball with 'two seams' or 'four seams', depending on the curve they want to give the ball. This is more than mere sports folklore: the variation of drag coefficients with the orientation and spin of the ball has also been scientifically established [16, 17]. Conventionally, in drag flow simulations it will be difficult to resolve the surface roughness of seams on a baseball, so the deviations in the drag force due to different discretizations or meshes will not be negligible; the same is true for 'ideal' spheres.


A parameter which is more stable with respect to the underlying discretization than the drag is the Strouhal number

Sr = fL/v

for a body with diameter L, flow velocity v and frequency f of the shedding of vortices in a Kármán vortex street.
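For illustration, with made-up values: a cylinder of diameter L = 0.01 m in a flow with v = 0.1 m/s and a typical Sr ≈ 0.2 sheds vortices at f = Sr · v/L ≈ 2 Hz.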

11.6.3 Macroscopic simulations

In macroscopic simulations [18, 19], where there is no 'excluded volume', particles do not act as boundaries of the fluid; rather, the fluid can go through the particles. The interaction is then computed from assumed interaction laws between the particles and the fluid. Apart from the problem that the assumptions may not be valid in the parameter region of the simulation, the volume exclusion and the resulting blocking effects of particles cannot be modeled with that approach.
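As an illustration of such an assumed interaction law, a minimal sketch using Stokes drag on a sphere (all values are examples, not taken from [18, 19]; uf would be the fluid velocity interpolated to the particle position):

mu=1e-3;                        % dynamic viscosity of water, in Pa s
d=1e-3;                         % particle diameter, in m
uf=[0.1 0]; vp=[0 0];           % fluid and particle velocities, in m/s
F=3*pi*mu*d*(uf-vp)             % Stokes drag force acting on the particle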

11.6.4 Microscopic simulations

In microscopic simulations, the flow goes around the particles. Generally, the fluid is simulated as a Newtonian fluid with the incompressible Navier-Stokes equations, which gives rise to the Stokes paradox: no solution of the low-Reynolds-number (slow flow) Navier-Stokes equations can be found which satisfies the boundary conditions both at the surface and at infinity. As the problem arises at infinity, one could just shrug this off, were it not for the fact that the Stokes paradox situation can be used to construct a proof, with the Navier-Stokes equations in differential form ('strong formulation'), that two fluid-immersed particles cannot collide. On the other hand, in the 'weak formulation' (cum grano salis, with one spatial integration over the Navier-Stokes equations), a proof can be constructed where collision is possible (see [20] and references therein). This contradiction arises in a well-known equation which has been studied for a long time, in a regime of low Reynolds numbers conventionally considered to be unproblematic, and for a rheological regime which is not totally unlike that of flowing granular materials. So one should be aware that formulating a problem via partial differential equations does not necessarily lead to unambiguous solutions.

Conventionally, in finite element (FEM) simulations, 'weak solutions' are understood as finite element solutions of a problem. However, finite volume and even finite difference discretizations can also be classified in the finite element formalism, so they can be considered weak solutions too [21]. While finite difference models are mathematically intuitive (derivatives are simply replaced by finite differences), they have several drawbacks. In general, they have to be formulated on rectangular grids, which are incompatible with practically any particle shape. Also, finite difference methods are not translation invariant: as can be seen in Figure 11.1, for the same configuration of particle pairs, whether flow between the particles is possible or not will depend on the positions of the particles relative to the grid. While it is possible to have particles that overlap the underlying grid and to work with the extrapolated boundary values set to zero (in which case the flow on the inside of the particles


Figure 11.1 Artifacts of grid generation: for the same size as well as the same relative position and orientation, no flow is possible between particles P1 and P2, but flow is possible between P3 and P4.

Figure 11.2 Moving particle (indicated by the gray hexagons) overlaid with a grid, and interpolation of the boundary value of the particle for one grid point: the flow is zero on the particle boundary (circle); accordingly, the flow at the grid-points (crosses) must vary between positive and negative values.

must be negative; see Figure 11.2), this does not necessarily lead to practicable simulations: for a (two-dimensional) discretization of a circle on a grid, the noise (non-smooth variation of the force on the particle while it reaches its maximal sinking velocity) is much larger for finite difference formulations [22] than for the corresponding finite element simulations [23]. Common to all methods is the problem that when the mesh changes, the forces on the particles may change too, and in a relatively non-smooth fashion; this applies to changes both due to movement of the particle and due to changes of the grid. A discretization in which tens of mesh rectangles cover the area of a particle will give solutions which are not smooth enough to guarantee smooth forces on the particles. This makes the use of implicit integrators necessary, both for the simulation of the particles and for the simulation of the fluid.

11.6.5 Particle approach for both particles and fluid

There are several Lagrangian approaches to simulating fluids, i.e. the fluid is modeled with particles so that the collective behavior of these particles reproduces the pressures, viscosities,


Figure 11.3 Possible artifacts in simulations of a fluid by Lagrangian methods: blocking of flow (left) and shot noise (right).

flow fields etc. A crucial difference from granular and other systems of solid particles is that the interaction is tangential, so that the assembly models 'viscous' behavior. While in DEM simulations without friction the decay of a heap will look rather viscous, the normal interaction prevents, e.g., the sinking of a body of high density placed on particles of lower density, even if the modulus of elasticity is very low; switching off the Coulomb friction will not help either. Due to the particle character, fluid surfaces are generated automatically: the fluid boundary is where the particle density goes to zero.

Smoothed particle hydrodynamics (SPH) originated in astrophysics [24], to study the transfer of material in solar systems between various celestial bodies. SPH is a simulation method for compressible fluids (conventionally, one speaks of compressible flow if the flow velocities exceed 10% of the sound velocity) which is able to mimic the wave-like propagation of density changes. The particles used in SPH are point particles (i.e. no rotational motion is taken into account, and forces between particles are due to the relative motion of the particle coordinates) with a certain smoothing radius over which the interaction of the particle is smoothed out. The interaction radius, the strength of the viscous force and the particle density together determine the viscosity: a given viscosity can be mimicked with a given density and a given viscous force, or with half the density and twice the viscous force. The pressures are computed from the time evolution of the density. There is also an incompressible variant of this approach, the 'moving particle semi-implicit' (MPS) method [25], where additional equations are employed to control the density variation. Similar approaches have been developed by Gauger et al. [26] using the 'finite mass method', with the explicit aim of reshaping the particles with different density to conserve the accuracy [27].

A general drawback of particle methods is that it is difficult to balance the particle densities: regions with reduced particle density (e.g. wakes behind an obstacle; see [28]) lead to increased fluctuations and reduced accuracy, although there have been proposals to improve the balance of the particle distributions and verify the accuracy gains [29]. Blocking of flow in two dimensions (see the left part of Figure 11.3) is even more of a problem than with grid methods, as it is difficult to reconcile the smoothing radius of the SPH interaction with the width of the particle shadow. Another drawback of particle methods is that even stationary states have to be modeled by actual dynamical systems. While for discretizations of the fluid equations at least the stationary flow solutions lead to stationary (constant) forces on the particles, there may be


'shot noise' due to the motion of discrete fluid particles (see the right part of Figure 11.3). For transient problems with material transport, where fluid-dynamical details can be expected to be negligible, the approach is certainly attractive. A comprehensive introduction to the SPH method is the book by Liu and Liu [30]. For an example of how DEM and SPH can be used with free surfaces in a geoscience application, see Cleary and Prakash [31].

11.6.6 Mesh-based modeling approaches

Besides the particle approaches described in § 11.6.5, there are several 'relatively' novel fluid simulation approaches which do not resort to the Navier-Stokes equations and which are grid-based. They all allow the implementation of relatively complicated boundaries of moving particles, though they have not yet found their way into the mainstream of fluid dynamics simulations, mostly because their validity for higher flow velocities is under debate. We mention them here for completeness and because they allow relatively easy implementation of complicated boundaries; on the other hand, owing to the use of grids, like finite difference methods they are not necessarily translation invariant.

Frisch, Hasslacher and Pomeau [32] designed rules for a cellular automaton so that 'integer' particles moving on a hexagonal (triangular) grid recover the flow of the two-dimensional Navier-Stokes equation. Grid density and particle density determine the viscosity. The drawbacks for dealing with particles or porous flow are the same as for the grid models and the Lagrangian methods in § 11.6.5. For three spatial dimensions, a four-dimensional grid must be used to guarantee isotropy. Instead of using automata, the newer lattice Boltzmann method (see [33] and references therein) uses continuous amplitudes and corresponding generalizations of the collision rules. While it is easy to construct even complicated boundary conditions, in mainstream computational fluid dynamics there is still a certain reserve towards this method. This is partially due to the fact that the collision operator which is necessary for this approach depends on the lattice, and the dependence on the lattice inhibits the formulation of a Galilei-invariant approach.

11.7 The finite element method for contact problems

When contact mechanics problems involving several bodies are investigated with the finite element method (FEM), the interaction between the bodies is often implemented via a penalty method: the force between the bodies is chosen proportional to their overlap. As the bodies are usually discretized with elements equivalent to polygons or polyhedra, the implementation of the penalty method with the polygonal or polyhedral force laws explained in this book seems a natural approach. For dynamic simulations in which damping occurs, the damping should be modeled within the FEM solid, i.e. the penalty contacts should be modeled without dissipation, 'fully elastic'. To reduce the penetration between contacting FEM grids, the 'Young's modulus' for the penalty part can be chosen higher than for the FEM part, as it is an unphysical 'penalty constant' anyway. An alternative to the penalty method would be an approach with FEM grids in touching contact; for the usually nonlinear surface deformations, this may be impractical due to the necessity of very small time-steps to realize the contacts numerically as constraints with the necessary precision.
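A sketch of the penalty force evaluation, with hypothetical names (overlap_area stands for whatever polygonal overlap computation the code provides; k_pen is the unphysical penalty constant mentioned above, and its value is an arbitrary example):

k_pen=1e9;                      % penalty constant, an example value
Aov=overlap_area(Pa,Pb);        % hypothetical overlap routine for bodies Pa, Pb
Fn=k_pen*Aov;                   % fully elastic: no damping in the contact force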

11.8 Long-range interactions

There are two kinds of long-range interactions which occur for granular materials: Newtonian gravitation (for granular material in asteroids) and the Coulomb interaction (for electrostatically charged grains). The exact computation of the forces would involve a loop over O(N²) interacting pairs for N particles. To avoid this prohibitive effort, several alternative approaches have been developed over the years. What is special about gravitation and the electrostatic Coulomb force is that their 1/r² dependence allows the construction of centers of mass or of charge concentration, respectively; this property can be used to reduce the computational effort. In the following we adopt the language of gravitational interaction.

The particle-particle particle-mesh method. In this so-called P³M method (Hockney and Eastwood [34]), the long-range and short-range parts of the forces are separated. The long-range forces are added up approximately on an underlying grid. The particle interactions are then computed with this grid for the long-range part, and directly for the short-range part of the interaction forces.

Ewald sums. Like the P³M method, the Ewald sum approach splits the interaction into a short-range and a long-range part. Additionally, the masses or charges are approximated by Gaussians, to obtain favorable summation schemes. The resulting sums for the long-range part can then be computed via, e.g., particle-mesh approaches [35, 36] or the FFT [37]. In [36, 37] the force terms are given explicitly, whereas many other papers give only the sums for the energy or the potential.

Tree codes. In this approach, hierarchies of particles are constructed in tree structures. First, the particles are assigned to domains by recursive subdivision of space, until only a single particle is contained in each domain. The subdivisions can be described by a tree-like structure. Between the closest domains, the centers of mass are computed, then the joint centers of mass, and so on. The information about the centers and the corresponding masses is communicated upward in the tree structure. From the masses, which have to be communicated downward again to the interaction partners, the respective forces on the objects are computed. The tree structure must support the neighborhood relation of the corresponding dimension: for two dimensions, 'quad-trees' (where each node branches up to four-fold) should be used, and for three dimensions, 'oct-trees' (where each node branches up to eight-fold) must be used.

Multipole methods. These methods work with hierarchies similar to those of tree codes, but instead of summing the respective terms, they use series expansions in the k-th moments,

M_k = \sum_{i=1}^{N} m_i x_i^k,

for N particles of masses m_i in the subdivisions; see Schlick [38] and references therein.
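A minimal sketch of the moment computation for a one-dimensional example distribution (N, kmax and the random data are arbitrary choices):

N=1000; kmax=4;
mi=rand(N,1);                   % particle masses
xi=rand(N,1);                   % particle coordinates, one dimension
M=zeros(kmax+1,1);
for k=0:kmax
   M(k+1)=sum(mi.*xi.^k);       % M_k = sum_i m_i x_i^k
end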

References

[1] K. Johnson, Contact Mechanics. Cambridge University Press, 1987.
[2] V. Popov, Contact Mechanics and Friction: Physical Principles and Applications. Springer, 2010.


[3] J. J. Moreau, "Unilateral contact and dry friction in finite freedom dynamics", in Nonsmooth Mechanics and Applications, J. J. Moreau and P. D. Panagiotopoulos, eds., CISM Courses and Lectures, vol. 302, pp. 1-82, Springer, 1988.
[4] J. J. Moreau and P. D. Panagiotopoulos, eds., Nonsmooth Mechanics and Applications, CISM Courses and Lectures, vol. 302, Springer, 1988.
[5] E. Azéma, F. Radjai, R. Peyroux, V. Richefeu, and G. Saussine, "Short-time dynamics of a packing of polyhedral grains under horizontal vibrations", The European Physical Journal E, vol. 26, no. 3, pp. 327-335, 2008.
[6] T. Laursen, Computational Contact and Impact Mechanics: Fundamentals of Modeling Interfacial Phenomena in Nonlinear Finite Element Analysis. Engineering Online Library, Springer, 2003.
[7] P. Wriggers, Computational Contact Mechanics. John Wiley & Sons, 2002.
[8] R. M. Brach, Mechanical Impact Dynamics: Rigid Body Collisions. Brach Engineering, 2007.
[9] W. Stronge, Impact Mechanics. Cambridge University Press, 2000.
[10] F. Kun and H. J. Herrmann, "Transition from damage to fragmentation in collision of solids", Physical Review E, vol. 59, pp. 2623-2632, 1999.
[11] B. Behera, F. Kun, S. McNamara, and H. J. Herrmann, "Fragmentation of a circular disc by impact on a frictionless plate", Journal of Physics: Condensed Matter, vol. 17, no. 24, article S2439, 2005.
[12] H. J. Pain, The Physics of Vibrations and Waves, 6th ed. John Wiley & Sons, 2005.
[13] M. Fuhr, Hybrid FE-DE Simulation of Notched Bar Impact Testing, Master's thesis, Swiss Federal Institute of Technology (ETH), 2008.
[14] A. B. Richou, A. Ambari, M. Lebey, and J. Naciri, "Drag force on a circular cylinder midway between two parallel plates at Re ≪ 1. Part 2: moving uniformly (numerical and experimental)", Chemical Engineering Science, vol. 60, no. 10, pp. 2535-2543, 2005.
[15] S. H. Ng and H.-G. Matuttis, "Two-dimensional microscopic simulation of granular particles in fluid", Theoretical and Applied Mechanics Japan, vol. 60, pp. 105-115, 2012.
[16] R. G. Watts and R. Ferrer, "The lateral force on a spinning sphere: Aerodynamics of a curveball", American Journal of Physics, vol. 55, no. 1, pp. 40-44, 1987.
[17] T. Taniguchi, T. Miyazaki, T. Shimizu, and R. Himeno, "Measurement of aerodynamic forces exerted on baseball using a high-speed video camera", in The Impact of Technology on Sport: Proceedings of the Asia-Pacific Congress on Sports Technology, A. Subic and S. Ujihashi, eds., pp. 269-279, Australasian Sports Technology Alliance Pty, 2005.
[18] Y. Pan, T. Tanaka, and Y. Tsuji, "Direct numerical simulation of particle-laden rotating turbulent channel flow", Physics of Fluids, vol. 13, no. 8, pp. 2320-2337, 2001.
[19] K. Höfler and S. Schwarzer, "Navier-Stokes simulation with constraint forces: Finite-difference method for particle-laden flows and complex geometries", Physical Review E, vol. 61, pp. 7146-7160, 2000.
[20] M. Hillairet, "Do Navier-Stokes equations enable to predict contact between immersed solid particles?", in Analysis and Simulation of Fluid Dynamics, C. Calgaro, J.-F. Coulombel, and T. Goudon, eds., pp. 109-127, Advances in Mathematical Fluid Mechanics, Birkhäuser, 2007.
[21] P. Gresho and R. Sani, Incompressible Flow and the Finite Element Method, Volume Two: Isothermal Laminar Flow. John Wiley & Sons, 2000.
[22] G. H. Ristow, "Wall correction factor for sinking cylinders in fluids", Physical Review E, vol. 55, pp. 2808-2813, 1997.
[23] S. H. Ng and H.-G. Matuttis, "Adaptive mesh generation for two-dimensional simulation of polygonal particles in fluid", Theoretical and Applied Mechanics Japan, vol. 59, pp. 323-333, 2011.
[24] R. Gingold and J. Monaghan, "Smoothed particle hydrodynamics: theory and application to non-spherical stars", Monthly Notices of the Royal Astronomical Society, vol. 181, pp. 375-389, 1977.
[25] S. Koshizuka and Y. Oka, "Moving-particle semi-implicit method for fragmentation of incompressible fluid", Nuclear Science and Engineering, vol. 123, no. 3, pp. 421-434, 1996.
[26] C. Gauger, P. Leinen, and H. Yserentant, "The finite mass method", SIAM Journal on Numerical Analysis, vol. 37, pp. 1768-1799, 2000.
[27] H. Yserentant, "The convergence of the finite mass method for flows in given force and velocity fields", in Meshfree Methods for Partial Differential Equations, M. Griebel and M. A. Schweitzer, eds., Lecture Notes in Computational Science and Engineering, vol. 26, Springer, 2003.
[28] A. V. Potapov, M. L. Hunt, and C. S. Campbell, "Liquid-solid flows using smoothed particle hydrodynamics and the discrete element method", Powder Technology, vol. 116, pp. 204-213, 2001.


[29] M. S. Shadloo, A. Zainali, S. H. Sadek, and M. Yildiz, "Improved incompressible smoothed particle hydrodynamics method for simulating flow around bluff bodies", Computer Methods in Applied Mechanics and Engineering, vol. 200, pp. 1008-1020, 2011.
[30] G. R. Liu and M. B. Liu, Smoothed Particle Hydrodynamics: A Meshfree Particle Method. World Scientific, 2003.
[31] P. W. Cleary and M. Prakash, "Discrete-element modelling and smoothed particle hydrodynamics: potential in the environmental sciences", Philosophical Transactions of the Royal Society A, vol. 362, pp. 2003-2030, 2004.
[32] U. Frisch, B. Hasslacher, and Y. Pomeau, "Lattice-gas automata for the Navier-Stokes equation", Physical Review Letters, vol. 56, pp. 1505-1508, 1986.
[33] S. Succi, The Lattice Boltzmann Equation: For Fluid Dynamics and Beyond. Numerical Mathematics and Scientific Computation, Clarendon Press, 2001.
[34] R. Hockney and J. Eastwood, Computer Simulation Using Particles. Adam Hilger, 1988.
[35] T. Darden, D. York, and L. Pedersen, "Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems", Journal of Chemical Physics, vol. 98, pp. 10089-10092, 1993.
[36] U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee, and L. G. Pedersen, "A smooth particle mesh Ewald method", Journal of Chemical Physics, vol. 103, pp. 8577-8593, 1995.
[37] D. York and W. Yang, "The fast Fourier Poisson method for calculating Ewald sums", Journal of Chemical Physics, vol. 101, no. 4, pp. 3298-3300, 1994.
[38] T. Schlick, Molecular Modeling and Simulation: An Interdisciplinary Guide. Interdisciplinary Applied Mathematics, Springer, 2010.


A MATLAB® as Programming Language

MATLAB® is not only a useful programming language but also a convenient tool for fast prototyping of algorithms for use in other programming languages. In this appendix we give a brief overview of MATLAB®, aimed at programmers who have already mastered an algorithmic language. The uses of specific functions are explained or illustrated in the chapters where they are relevant. In the exercises, we have provided a whole introduction to linear algebra, as eigenvalues turn up as a tool of analysis in various aspects of DEM modeling and simulation.

A.1 Getting started with MATLAB®

For learning MATLAB®, its built-in help function is sufficient; other books or documentation will be needed only if one wants to study particular algorithms or the many possibilities of its graphics features. An explanation of how to use help can be obtained by typing help help. The command help on its own displays the scope of the language; help followed by the name of a specific topic will list the individual commands under that topic. The help texts also contain references to functions with similar functionality, which is a helpful aid in exploring the capabilities of MATLAB®.

MATLAB® can be used in command-line mode as a scientific calculator. If no variables are defined, the result of the most recent calculation is automatically assigned to the variable ans (for 'answer'), which can be used in further calculations:

>> 4+5
ans = 9
>> ans*2
ans = 18


(In this chapter, we have suppressed blank lines and spaces in the original MATLAB® output to save space.) MATLAB® comes into its real strength with programs, which can be either script files (sequential pieces of code) or functions (with interfaces). Filenames for both functions and scripts must have the extension .m. Here we give a brief overview of MATLAB® to help beginners find their way faster, and to show more experienced MATLAB® users which subsets of the language we are using.

MATLAB® is an interpreted language, i.e. in principle the program files are read in at runtime and then translated and executed line by line; by contrast, with compiled languages the whole program is translated in one go by the compiler, so that the execution is not slowed down. As of release 2012, for performance reasons MATLAB® internally compiles program parts which are longer than single lines. An advantage of developing programs with interpreters instead of compiled languages is that when the program stops due to an error, the whole memory content is still accessible and can be used for debugging; moreover, for debugging, the same syntax and graphics can be used as for regular MATLAB® code. In contrast, debuggers for compiled languages have to be linked (often making recompilation necessary), can change the outcome of the program, and may operate with a different syntax than the original language. Also, for compiled languages, at least the supporting graphics (if any) will be system-dependent, whereas for debugging MATLAB® programs, MATLAB®'s own graphics features can be used.

A.2 Data types and names

MATLAB® was originally developed as a MATrix LABoratory, a tool to teach numerical analysis students complex algorithms for numerical linear algebra. Accordingly, the elementary data structures are matrices. Therefore, numerical (floating point) calculations can be written much more concisely than in MAPLE® and MATHEMATICA®, which started out as computer algebra programs, with the corresponding abundance of data structures and symbolic functions. The elementary data type of MATLAB® is the complex matrix in double precision (approximately 16 digits). Variables do not have to be declared, but can be assigned 'on the fly'. Assignments are performed with =, and all conventional representations can be used, so

a=1, a=1., a=1.e0, a=1.E0, a=1.d0, a=1.D0

are all equivalent. (Not only the E-format but also the D-format from FORTRAN is recognized, so that test data can be exchanged freely between programs written in the two languages.) In MATLAB®, indices start at 1, not 0 as in C, and no negative or zero indices (as in FORTRAN) are allowed. Global variables that one wants to use in different functions without passing them through the argument lists of functions are declared via global a. The function who can be used to obtain an overview of all the initialized variables, while whos displays the variables along with the size and data type of each.

Variable names may contain numbers, but not as the first character in the name; so var1 is allowed, but not 1var. MATLAB® is case-sensitive, i.e. it makes a distinction between uppercase and lowercase letters, but we advise against giving different variables names which are made up of the same letters and differ only in capitalization. To learn more about a function name, type help name, with name in lowercase letters; in the resulting help text,


however, the function name will appear in capital letters for highlighting purposes. The help text will often contain program examples which can be copied and pasted into the MATLAB® window and executed. If one is not able to guess the name of a function, one can search for related terms in help by typing lookfor term; this browses help for occurrences of term. If typed without any options, lookfor term returns those functions which contain term in their title. For example:

>> lookfor Gauss
quadgk - Numerically evaluate integral, adaptive Gauss-Kronrod quadrature.

If one wants to search for occurrences of term anywhere in the help text, not just in function titles, use the -all option, as in

>> lookfor Gauss -all

which gives too many hits to list here. The escape sequence to the operating system is ! . So one can obtain the current directory listing by typing either the MATLAB® command ls or (under Unix) !ls. For actions such as copy and remove, both the commands of the operating system and MATLAB®'s copyfile and delete can be used.

A command line in MATLAB® spans the whole line. Three dots ... at the end of a line indicate that the statement on that line is continued on the next line. Any text following a percentage sign % is treated as comment text. Most of the names of functions and constants are quite self-explanatory, and the abbreviations are intuitive. Some pre-defined constants are pi for π and i or j for the imaginary unit √-1. Standard functions are usually abbreviated with three letters, as in other programming languages, such as exp, sin, cos, min and max. A very convenient trigonometric function is atan2 which, given the y- and x-components of a vector as arguments, computes the angle φ ∈ [-π, π] such that tan(φ) = y/x (if (x, y) is in quadrant 1 or quadrant 2, φ will be positive; and if (x, y) is in quadrant 3 or quadrant 4, φ will be negative). Functions with four-letter names include atan and ceil (rounding to the next larger integer). Longer names are also in use, e.g. round and floor (rounding to the next smaller integer), and some special functions from mathematics, such as the Γ function gamma and the various Bessel functions besselh, besseli, besselj and besselk.

A.3 Matrix functions and linear algebra

Matrices can be assigned directly by putting numbers within square brackets [ ]:

A=[ 1 3
    2 4]

Rows and columns must be consistent in length, or an error message will be issued. When one assigns a matrix element outside the index range of the current indices, e.g. A(3,3)=5, the size of the matrix is incremented (to three rows and three columns in this case), and any remaining unassigned entries are filled with zeros; for the above example, we would have

A=[ 1 3 0
    2 4 0
    0 0 5]

However, in MATLAB all the data are actually copied into a new matrix, which is then given the old name; this means that repeated assignments of uninitialized matrix elements will slow down the performance, compared to an initialization via zeros.

There are some special matrices that can be used to initialize or construct other matrices. One such matrix is ones: ones(5) will give a 5 × 5 matrix with 1 as every entry, and ones(5,1) will give a matrix with five rows and one column (i.e. a column vector of length 5) whose elements are all equal to 1. Similar matrices can be obtained whose entries are all zeros (zeros), uniformly distributed random numbers (rand), or Gaussian distributed random numbers (randn). The identity matrix is generated with the function eye. Most functions operate on matrices elementwise, except for matrix functions such as inv (matrix inverse), norm (matrix norm), eig (eigenvalues), expm (exponential defined in the matrix sense, i.e. for the eigenvalues of the matrix) and logm (matrix logarithm). A two-dimensional version of linspace to create matrix pairs with x- and y-dependencies is meshgrid; see Exercise 1.1. MATLAB evaluates products of arrays in the sense of matrix algebra, in the usual row-by-column order. Accordingly, a scalar (inner) product is the result of multiplying a row vector by a column vector:

>> [1 2 3]*[1 2 3]'
ans =
    14

If the order of multiplication is reversed, so that the column vector is on the left and the row vector on the right, the result is the outer product (which is not a vector product):

>> [1 2 3]'*[1 2 3]
ans =
     1     2     3
     2     4     6
     3     6     9

If one wants to compute a scalar product of two non-conforming vectors, the dot function can be used:

>> a=[1 2 3]
a =
     1     2     3
>> b=[1 2 3]


b =
     1     2     3
>> dot(a,b)
ans =
    14

For elementwise multiplication, division or exponentiation, instead of *, / or ^ the operators .*, ./ or .^ must be used. These elementwise variants should also be used to write functions which can operate elementwise on matrices. Because n × n matrices are different from vectors of length n², it is sometimes necessary to change the dimensions of a matrix with the reshape command. Division by a matrix from the left, using \, or from the right, using /, corresponds to multiplication by the inverse of the matrix (in the following, a is a suitable right-hand side, here a column vector of length 2):

>> B=hilb(2)
B =
    1.0000    0.5000
    0.5000    0.3333
>> inv(B)*a-B\a
ans =
   1.0e-14 *
   -0.3553
    0.7105

As can be seen, inv(B)*a and B\a are not exactly the same due to rounding errors. For large matrices, \ and / will be faster than computation of the inverse, as only an LU-decomposition of the matrix into a product of a lower-triangular matrix L and an upper-triangular matrix U is needed,

Ax = LUx = b,

which can be achieved by using MATLAB's lu function. From this product of triangular matrices, the linear system is solved by backward-substitution. On the other hand, to form the inverse of a matrix A with N rows and N columns, the system

A x̃_i = LU x̃_i = e_i    (A.1)


must be solved for all N column vectors e_i of the identity matrix,

e_1 = (1, 0, 0, ..., 0)ᵀ,  e_2 = (0, 1, 0, ..., 0)ᵀ,  e_3 = (0, 0, 1, ..., 0)ᵀ,  ...,  e_N = (0, 0, 0, ..., 1)ᵀ.

The inverse A⁻¹ is then the matrix which has the solutions x̃_i to Equation (A.1) as its columns:

A⁻¹ = ( x̃_1  x̃_2  x̃_3  · · ·  x̃_N ).

The overhead of computing N solution vectors x̃_i of Equation (A.1) and then multiplying the resulting A⁻¹ by b will cost about twice as much CPU time as the operation x=A\b.

For non-square matrices, (Moore–Penrose) pseudo-inverses can be defined as

A⁺ = (AᵀA)⁻¹Aᵀ    (A.2)

or

A⁺ = Aᵀ(AAᵀ)⁻¹.    (A.3)

Depending on the shape of the matrix, the variant (A.2) or (A.3) is chosen according to whether AᵀA or AAᵀ has the smaller size (this size will be the smaller of the number of rows and the number of columns of A, i.e. min(n_rows, n_columns)). In MATLAB, the function for numerical computation of the pseudo-inverse is pinv. For (A.2) or (A.3), the product A⁺A or AA⁺, respectively, will be the identity matrix of rank(A) and size min(n_rows, n_columns). The pseudo-inverse A⁺ is either a right-inverse or a left-inverse; see Exercise 1.6. For a square matrix, the result of applying pinv is equivalent (i.e. very close, but not the same up to the last bit) to the inverse. Left- or right-division (\ or /) by non-square matrices is interpreted in MATLAB as solution with pseudo-inverses; this means that, depending on the dimensions of the matrix, a division might give an error message or a numerical result:

>> pi/([1 2 3])
Error using /
Matrix dimensions must agree.
>> ([1 2 3])\pi
ans =
         0
         0
    1.0472

Often, what is intended when one performs division with non-square matrices is elementwise division:

>> pi./([1 2 3])
ans =
    3.1416    1.5708    1.0472

Pseudo-inverses and the operations \ and / for non-square matrices are at the core of techniques such as least-squares fitting, available in MATLAB via polyfit. Owing to the importance of least-squares fitting in data processing, we have given an introduction to pseudo-inverses here. For the associated accuracy issues, see Exercise 1.7.
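As a minimal sketch of such a least-squares fit (the data, the noise level and the polynomial degree below are our own arbitrary choices, not taken from the text), a straight line can be fitted with polyfit and evaluated with polyval:

x=(0:0.1:1)';
y=2*x+1+0.01*randn(size(x)); % synthetic data: a line with small Gaussian noise
p=polyfit(x,y,1)             % p(1) approximates the slope 2, p(2) the intercept 1
yfit=polyval(p,x);           % evaluate the fitted polynomial at the data points
plot(x,y,'o',x,yfit,'-')     % compare data and fit graphically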

A.4 Syntax and control structures

At the beginning of a program, it is advisable to reset the memory and erase pre-existing variables by using clear or, when global variables have been used, clear all. With the format command, one can control the length of the displayed output. Usually, MATLAB inserts blank lines between results, and these can be suppressed by format compact. The number of digits displayed is controlled by format short (5 digits), format long (all 16 digits) or format + (only the sign is displayed). Usually, format short is sufficient for monitoring the program flow; only the output is shortened, while the computation is still performed with the full double precision (of about 16 decimal digits). Trailing zeros after the decimal point are suppressed:

>> 1.0000000
ans =
     1

If the result contains a string of zeros after the decimal point, it means that there are non-zero digits further along:

>> 1+1e-7
ans =
    1.0000

When the elements of a vector are of vastly diverging size, the representation is rescaled so that the largest element is of order one; this could lead to a display of only zeros for the smallest element:

>> [1e+4 5 1e-6]
ans =
   1.0e+04 *
    1.0000    0.0005    0.0000

In MATLAB, each line generally contains a single command; if more commands need to be written on a single line, they have to be separated by commas. The semicolon has no syntactic function as in C, but it suppresses the output of the result of an operation: without semicolons, the result of every operation will be echoed on the screen. This means that in MATLAB, output operations don't have to be specified explicitly; nevertheless, they are available via the disp and fprintf commands.

Round brackets, i.e. parentheses ( ), can be put around arithmetic expressions to signify precedence of arithmetic operations. They are also used to enclose indices of arrays,


so A(k,l) is the element in the kth row and the lth column of matrix A. Parentheses are used to enclose the input arguments of functions as well, so exp(1.) yields the Euler number e. Square brackets [ ] are used for array constructions. A row vector can be assigned by v=[1 2 3 4] or v=[1,2,3,4] (i.e. the elements can be separated by spaces or commas). Column vectors are assigned by v=[1;2;3;4] (i.e. semicolons are used to separate the rows of an array); alternatively, instead of semicolons one can use newlines:

v=[1
2
3
4]

Array construction works recursively, i.e. the elements inside square brackets can themselves be vectors, matrices or matrix-construction functions like zeros, ones and eye. Square brackets are also used to receive the output of some functions. Each single-variable output can be enclosed in square brackets, as in [mypi]=4*atan(1), and an output consisting of multiple variables must be contained in square brackets.

The if command is terminated with an end:

if (a==6)
  a=4
end

Equality and inequality are tested with the operators == and ~=; 'true' is represented by 1, and 'false' by 0. The comparison works elementwise in the vector sense:

>> a=[1 2 3 4]
a =
     1     2     3     4
>> b=[1 3 3 0]
b =
     1     3     3     0
>> a==b
ans =
     1     0     1     0

Unlike in C, it is syntactically not possible to use by mistake an assignment = instead of the 'equals' symbol == in a comparison. Logical operators are & (and), | (or) and ~ (not), which work also with integer results from floating point operations.

In MATLAB, the loop command for, like the if command, is terminated by end, not by enddo or endif as in FORTRAN. For example,

for k=1:5
  a(k)=2*k
end

Instead of the above explicit loop, an implicit loop a=2*[1:5] with the colon operator : can be used. If a step-size is set for the loop, the order is lower index:step-size:upper index.
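As a small sketch of this equivalence (the variable names are our own), the explicit loop can be replaced by the implicit form, and isequal confirms that both produce the same vector:

a=zeros(1,5);   % preallocation avoids repeated resizing inside the loop
for k=1:5
  a(k)=2*k;
end
b=2*[1:5];      % implicit loop with the colon operator
isequal(a,b)    % returns 1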


Loops can be exited prematurely by using the break statement. Implicit loops work also for negative step-sizes and indices, as in

>> a=[9:-1:1]
a =
     9     8     7     6     5     4     3     2     1
>> a(1:2:8)
ans =
     9     7     5     3

Non-integer step-sizes are possible but not advisable, as even with small rounding errors (in the 14th digit in the following example) the upper bound may not be reached:

>> step=1/3
step =
    0.3333
>> 0:step:1
ans =
         0    0.3333    0.6667    1.0000
>> step2=step+1e-15
step2 =
    0.3333
>> 0:step2:1
ans =
         0    0.3333    0.6667

If the number of elements in the implicit loop and the upper and lower bounds are known, it is usually safe to create the vector via the linspace command:

>> linspace(1,pi,3)
ans =
    1.0000    2.0708    3.1416

The end operator provides a convenient way to access the last element in a vector, and can be manipulated arithmetically, though rounding is usually advisable:

>> a=1:5
a =
     1     2     3     4     5
>> a(end)
ans =
     5
>> a(end/2)
Subscript indices must either be real positive integers or logicals.
>> a(round(end/2))
ans =
     3

A.5 Self-written functions

There are basically three kinds of functions in MATLAB. Those which are part of the MATLAB kernel are the built-in functions; then there are the m-files in the MATLAB directory, which are functions programmed entirely in MATLAB; finally, there are self-written functions. By typing which name, you can find out whether the function name is built-in


or, if it is an m-file, the directory path in which it can be found. MATLAB's m-files can be read to learn about programming style in MATLAB, or modified (and saved under a new name in a different directory) if a different functionality is desired.

A MATLAB function is a file with extension .m which has the basic structure

function [out1,out2,...]=functionname(in1,in2,...)
% Comment echoed in the help function
[Body of the function]
return

A function's name may deviate from the name of the file it is contained in; however, the function is called by the filename. A single output argument can be written without brackets, but multiple output arguments have to be enclosed in square brackets. The input arguments in1, in2, etc. are transferred to the function by a copy of the original value ('call by value'), so if they are overwritten within the function, this has no effect on their value in the calling program. Because in MATLAB the type is part of each variable, the explicit dimensions of vectors or arrays don't have to be passed into the function; if necessary, they can be queried within the function by the command size or, when only the largest dimension of a matrix or vector is needed, length. If a modification of an input variable is desired, it has to appear in both the input and the output arguments:

function [inout]=functionname(inout)

The return command is not necessary in MATLAB, but when it is used to terminate a function, the interpreter is able to issue more useful error messages in cases where the control structure is defective, e.g. due to missing end statements. One can also add return to the end of the main program and store unused code pieces after it. To terminate the execution of a program at any point, the error command can be used.

MATLAB looks for m-files either in the current directory or in the directories listed in the function matlabpath. If one wants to use a self-written function stored in another directory, one has to either change to that directory or add the function's directory path to matlabpath. The first comments inside a function (just below the line with the function's name and input and output arguments) are read in and displayed by MATLAB's help function. So, for a self-written function, help functionname will display the explanation one wrote at the beginning of the function.

We like to use the following naming convention, taken from [1]: a MATLAB script which has no other purpose than to call or test a MATLAB function name will be called dr_name (driver for name). This allows one to identify immediately the main programs in a directory, and which files belong together.
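As an illustration of this structure and of the naming convention (the function name sumprod, its contents and the driver dr_sumprod are our own hypothetical examples), one could store in a file sumprod.m:

function [s,p]=sumprod(a,b)
% SUMPROD returns the sum s and the product p of the two inputs
s=a+b;
p=a*b;
return

and in the driver script dr_sumprod.m:

% dr_sumprod: driver to test the function sumprod
clear
[s,p]=sumprod(2,3)   % should yield s=5, p=6

After these files are saved, help sumprod displays the comment line below the function statement.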

A.6 Function overwriting and overloading

It is possible to overwrite the names of most built-in constants and functions. For example, if one uses pi as the name of a variable in one's computations, MATLAB's built-in pi will be overwritten and therefore unavailable for comparisons. The old functionality can be recovered


with the command clear name, where name is the name of the overwritten function. Only for a few control statements, such as end, will an attempt to overwrite them lead to an error message. The authors routinely redefine i and j as loop indices, since their original meaning, the imaginary unit, can always be obtained as sqrt(-1).

Many functions in MATLAB are 'overloaded', i.e. they give different output depending on the number of output arguments, or have different functionality according to the input arguments. A function which is overloaded to the hilt is the hist function. It creates histograms and is a very versatile tool for gaining an overview of distributions, both for debugging and for data analysis. For data arranged in a vector a, hist(a) will plot the data in a histogram with ten bins; hist(a,n) will use n bins. If, instead of the number of bins, a vector x (with monotonically increasing entries) is specified, hist(a,x) will use x to define the bins' positions; the elements of x don't have to be equally spaced. If one specifies output arguments, e.g. [nx,x]=hist(a), the graphical output will be suppressed and instead the positions of the histogram bins will be written to the vector x and the heights of the bars to the vector nx. This is useful if a plotting style different from the default (e.g. semi-logarithmic) is desired. If one wants to write one's own overloaded function, it is well worth studying the (not easily readable) source code of hist.m, as it provides an instructive example of a function that responds differently depending on the type and number of input and output arguments.

Another useful overloaded function is eig. The call eig(A) gives the same result as eig(A,E), where E is the identity matrix with the same dimensions as A, i.e. it will give the λ values that solve the eigenvalue problem det(A − λE) = 0. The call eig(A,B), where B is a general matrix with the same dimensions as A, solves the generalized eigenvalue problem det(A − λB) = 0. The call D=eig(A) returns the eigenvalues of A as a vector D. The call [V,D]=eig(A) will return the eigenvalues of A as the diagonal elements of the diagonal matrix D, and the eigenvectors will be given as the columns of matrix V; this functionality is useful for diagonalizing stress and strain tensors obtained from DEM computations.
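A minimal sketch of the overloaded behavior of hist (the Gaussian test data are our own choice):

a=randn(10000,1);    % Gaussian test data
hist(a,50)           % no output arguments: plot a histogram with 50 bins
[nx,x]=hist(a,50);   % output arguments: no plot; bar heights nx, bin positions x
semilogy(x,nx,'o')   % replot the same histogram with a logarithmic y-axis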

A.7 Graphics

Graphics in MATLAB are not only for the display of final output; monitoring critical quantities graphically during the programming and debugging phases helps to speed up program development considerably. The graphics window can be cleared with the command clf. Values of vectors can be plotted alone or against other vectors via the plot command in two dimensions or the plot3 command in three dimensions. Matrices are visualized with the mesh command. Grid lines can be added to the plot with the grid command to guide the eye. Polyhedra can be drawn by using patch and fill. If several graphs should be displayed together, subplot(n,m,l) will partition the graphics window into n times m sub-windows (arranged in n rows and m columns); the argument l selects the lth sub-window for the current plot. Sub-windows can have different sizes, e.g.

subplot(1.3,2,1)
subplot(2,2,2)
subplot(4,2,6)
subplot(4,1,4)


but if any of them overlap, the older sub-windows will be erased. New sub-windows at arbitrary positions which don't erase old ones (e.g. for insets) can be placed using the axes command. The axis command (with 'i' instead of 'e') is used to set axis annotations, ranges etc. New graphics windows can be opened with the figure command. Issuing a new plot or mesh command for the same graphics window or sub-window will overwrite the previous plot, but hold on allows the plotting of several graphs in the same window. Note, however, that a figure generated by plotting several data sets with hold on in between may deviate in appearance from one where all the data are plotted with a single command, especially for three-dimensional plots. After plotting, the plots can be manipulated (e.g. changing the fonts of labels) by using the toolbar above the display window. Nevertheless, if multiple graphs have to be manipulated in the same way several times, it is more efficient to use commands from the MATLAB language. For example, the viewing perspective can be set with the command view, and to produce graphics files one can type print -depsc2 name.eps (eps for encapsulated postscript file, c for color, and 2 for level-2 postscript, which allows compression). Changes of the line width in graphs are also possible, e.g. plot(a,'Linewidth',2) will plot the data in a with twice the usual line width.
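The following sketch combines the commands from this section (the data and the file name myfig.eps are our own arbitrary illustrations):

clf
t=linspace(0,2*pi,100);
subplot(2,1,1), plot(t,sin(t)), grid     % upper sub-window with grid lines
subplot(2,1,2), plot(t,cos(t),'Linewidth',2)
hold on, plot(t,sin(t)), hold off        % second graph in the same sub-window
print -depsc2 myfig.eps                  % write a color level-2 eps file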

A.8 Solving ordinary differential equations

For the numerical solution of ordinary differential equations (ODEs), we need to define a function which contains the differential equation to be solved. Suppose that the ODE is written in first-order form

dx/dt = f(x, t)

(see § 2.2.1; here x and f may be vectors). The first-order derivative dx/dt must be assigned as the output of a MATLAB function which describes the right-hand side f(x, t) of the ODE and takes as input the variables t and x. For example, to solve the differential equation ẋ = −x, we write the function

function [dxdt]=expfun(t,x)
dxdt=-x;
return

The independent variable t is the first input argument, and the dependent variable x is the second input argument. Parameters can also be specified, but in general it is more convenient


to pass parameters as global variables. The name of the above function expfun is then passed as a string (i.e. enclosed in single quotation marks) to MATLAB's integrator:

clear, format short, format compact
xinitial=1, timespan=[0 3]
[t,x]=ode23('expfun',timespan,xinitial);
return

For new programming projects, we recommend the use of ode23, as it is of low order and more robust against programming mistakes in the differential equation than higher-order methods. The selection of suitable solvers is discussed in Chapter 2, in particular § 2.9. Besides the 'right-hand side' function, further input arguments for ODE solvers are the time-span and the initial condition. Output arguments are the vector of values for the independent variable t and the vector (or matrix) of computed values for the dependent variable x, which can then be post-processed, plotted etc. The values of the independent variable are selected within the interval timespan by the integrator according to the desired accuracy. If a different accuracy is desired (higher for better accuracy, or lower for faster execution speed), the relative and/or absolute error (see § 2.1.3) can be specified in the options list of the odeset command. Relative tolerances (RelTol), absolute tolerances (AbsTol), maximal time-steps (MaxStep) and many other variables can be adjusted via the odeset function. Capitalization is not obligatory: relTol and reltOl will do just as well. The resulting code that computes, with two different accuracies, the exponential function exp(−t) from the differential equation ẋ = −x looks like

clear, format short, format compact
xinitial=1, timespan=[0 3]
options1=odeset('RelTol',1e-1,'AbsTol',1e-1);
[t1,x1]=ode23('expfun',timespan,xinitial,options1);
options2=odeset('RelTol',1e-4,'AbsTol',1e-4);
[t2,x2]=ode23('expfun',timespan,xinitial,options2);
subplot(1.8,1.8,1)
plot(t1,x1,'*--',t2,x2,'o-')
legend('RelTol=10^{-1},AbsTol=10^{-1}',...
       'RelTol=10^{-4},AbsTol=10^{-4}')

The output is plotted in Figure A.1, where one can see that higher accuracy leads to denser solution points; in this simple example, the solution trajectories obtained with different accuracies are not visibly different. For the time-span variable, besides the initial and final times, sub-steps can be specified, as in timespan2 of the following example:


[Figure A.1: solution points for RelTol = AbsTol = 10⁻⁴, for RelTol = AbsTol = 10⁻¹, and for the interpolated low-accuracy solution.]

Figure A.1  Exponential function x = exp(−t) computed as the solution of the differential equation ẋ = −x, with high accuracy and small step-size (◦) and with low accuracy and large step-size (∗); the crosses (×) are points obtained by interpolating the low-accuracy solution between the asterisks.

clear
format short
format compact
xinitial=1
timespan1=[0 3]
timespan2=[0:0.1:3]
[t1,x1]=ode23('expfun',timespan1,xinitial);
[t2,x2]=ode23('expfun',timespan2,xinitial);
subplot(1.8,1.8,1)
plot(t1,x1,'*',t2,x2,'o')
legend('computed solution','interpolated solution')

However, the solution [t2,x2] is not computed at the times given in timespan2; it is actually computed at the same time-steps as for [t1,x1] (because the same integrator is used for both), and then interpolated at the values in timespan2 (via 'dense output', the internal representation of the solution as a power series between two time-steps). Therefore, the accuracy of the solution is not improved by setting sub-time-steps; but for graphics and data analysis, 'smoother curves' with intermediate values can be produced.
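If genuinely computed (rather than interpolated) intermediate points are needed, the integrator's internal step-size can be limited via MaxStep, at the price of more right-hand-side evaluations; a minimal sketch, reusing expfun from above:

options=odeset('MaxStep',0.05);          % restrict the internal step-size
[t,x]=ode23('expfun',[0 3],1,options);   % all points in t are now genuinely computed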

A.9 Pitfalls of using MATLAB

In this section, we summarize several kinds of unexpected behavior that might be encountered when using MATLAB; some are due to unsuitable input data, and others arise from using notation similar to that of other programming languages but which gives different results in MATLAB. All of the following cases have been known to lead to a great deal of time being wasted, even by experienced programmers.

Inadvertent deletion of end: Because both loops and if-conditions are terminated by end, careless deletion of end statements (or insertion of for or if statements while forgetting the corresponding end statements) upsets the whole program flow.



Figure A.2  Two-dimensional array (left) and the corresponding data layout in MATLAB's memory (which is the same as in FORTRAN): the data are stored column-wise in memory.

Use of the wrong data array dimensions: To assign a vector of nm elements to an n×m matrix, the shape of the array must be changed explicitly using MATLAB's reshape function. Nevertheless, for a matrix which has been assigned via

A=[1.1 2.2 3.3
   4.2 5.3 6.4
   7.3 8.4 9.5]

the elements, which are conventionally accessed by two indices as A(k,l), can also be accessed using a single index:

>> A(3)
ans =
    7.3000
>> A(5)
ans =
    5.3000
>> A(8)
ans =
    6.4000

In this case A(n) means the nth element of array A in storage, and in MATLAB arrays are stored column-wise; see Figure A.2. This layout is the same as in FORTRAN, whereas in C the data layout in memory is row-wise. Therefore, in MATLAB nested loops ordered as

for i=1:l
  for j=1:n
    .....a(j,i)....
  end
end


can be executed faster (with fewer cache misses) than

for j=1:n
  for i=1:l
    .....a(j,i)....
  end
end

It is possible to access the elements of A with a one-dimensional index, according to the position in the 'unrolled' array on the right of Figure A.2. For example,

A=[1.1 2.2 3.3
   4.2 5.3 6.4
   7.3 8.4 9.5]
I=[1 3 6]
J=[1 2
   3 4]

gives

>> A(I)
ans =
    1.1000    7.3000    8.4000
>> A(J)
ans =
    1.1000    4.2000
    7.3000    2.2000

The output is always displayed in the shape of the input matrices (I and J here). Assignments of scalars to arrays are done elementwise. In the same way, comparisons of arrays with scalars are performed elementwise: if matrix A is compared for equality with a scalar c, the condition A==c will only be true as a whole (e.g. in an if statement) if all elements of A are the same and equal to c.

Wrong order of specifying the lower bound, upper bound and step-size: In many programming languages (including C, BASIC and FORTRAN), the order for specifying loop variables is lower bound, upper bound, step-size; but in MATLAB, the step-size must be specified in the middle.

Using a comma instead of a colon: In FORTRAN, the lower and upper bounds for loop variables are separated by a comma. In MATLAB, if one types a comma instead of a colon :, as in the following example,

>> for k=1,4
k
end
ans =
     4
k =
     1

the result is a syntactically valid statement; however, the loop variable k will assume only the value 1, because MATLAB interprets the above code as


>> for k=1:1
4
k
end
ans =
     4
k =
     1

The comma is used to separate statements in MATLAB, so the 4 after the comma is treated as a different statement following the statement for k=1.

Obtaining extremal elements of arrays with max and min: The functions max and min operate on different data types as well as on single variable names. For a vector a, max(a) will return the largest element of a; but for a matrix A, max(A) will return the row vector of the column-wise maxima, not a single maximal element. The maximal element of matrix A can be obtained by max(A(:)).

Undesired program continuation for complex results: In compiler languages, an unintentional change of data type from real to complex will lead to a program crash if the variables involved were declared as real. As MATLAB knows no variable declarations, it will continue with complex arithmetic. If the program should be stopped when values become complex, one should examine intermediate results using the real and imag commands. A common situation in computational geometry where complex results arise is the evaluation of inverse trigonometric functions, e.g. acos(arg), where arg has been obtained from floating point computations which are contaminated by tiny rounding errors:

>> atan(1)
ans =
    0.7854
>> acos(1)
ans =
     0
>> acos(1+1e-15)
ans =
        0 + 4.7122e-08i

In cases where the input arguments of the inverse trigonometric functions are computed and may contain rounding errors, they should be checked to see whether they are in a valid range; and if not, they need to be rounded to the largest (or smallest) permissible value (see the sketch at the end of this section).

Undesired plotting of complex results with spurious imaginary components: A single vector a can be plotted with plot(a); but when at least one entry of a is complex, the imaginary components of a will be plotted over the real components. The graphs may look rather puzzling if the results should be real and the imaginary components are just due to rounding errors.

Dividing by zero: While on many compilers and operating systems a division by zero will lead to a program crash, MATLAB works (partially) with the IEEE-754 standard, so infinities are obtained by

>> 1/0
ans =
   Inf
>> -1/0
ans =
  -Inf


Further, 'not a number' NaN is supported, which is obtained, for example, by

>> Inf/Inf
ans =
   NaN
>> Inf-Inf
ans =
   NaN

Testing for equality with NaN: The infinities Inf and -Inf behave under comparison as expected:

>> Inf==Inf
ans =
     1

i.e. infinity is always equal to infinity; but NaN is always unequal to NaN:

>> NaN==NaN
ans =
     0

To check whether a variable (or an element of an array) has the value NaN, the function isnan must be used:

>> isnan([1 Inf NaN -15])
ans =
     0     0     1     0

Computation with more data points instead of higher accuracy: The ODE solvers in MATLAB determine the (usually irregular, problem-dependent) step-size adaptively, but the user can also specify the output with a smaller (usually regular) step-size. This does not mean that the accuracy of the computation is increased, as mentioned at the end of § A.8.
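As announced above, here is a sketch for truncating the argument of acos into its permissible range (the test vectors a and b are our own choice):

a=[1 2 3]; b=[2 4 6];                % parallel test vectors
carg=dot(a,b)/(norm(a)*norm(b));     % may leave [-1,1] due to rounding errors
carg=max(-1,min(1,carg));            % truncate into the permissible range [-1,1]
alpha=acos(carg)                     % guaranteed to be real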

A.10 Profiling and optimization

Profiling (the analysis of which parts of a program consume how much time) can be done on several levels. One can use the pair of commands tic and toc to measure the time consumption of program parts sandwiched between them. Finer-grained profiles can be obtained from the 'Tools' menu, upon selecting 'Open Profiler' and then 'Start Profiling'. For many applications, MATLAB's performance will be satisfactory, at least after inner explicit loops have been rewritten as implicit loops. However, this may not be possible for inner loops involving many if-conditions. In that case, MEX-files (MATLAB EXternal files) can be used: FORTRAN or C files can be combined with MATLAB executables for easier porting and faster execution, especially of code containing many if-statements in loops, which perform very slowly in interpreter mode. Nevertheless, some rewriting will be needed due to differences between the languages in the handling of array structures in function calls. To obtain faster execution, a MATLAB compiler also exists.
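A minimal sketch of tic and toc, here used to compare an explicit loop with its implicit (vectorized) equivalent (the vector length is an arbitrary choice):

n=1e6; x=rand(n,1);
tic                  % time the explicit loop
s=0;
for k=1:n
  s=s+x(k)^2;
end
toc
tic                  % time the implicit (vectorized) form
s2=sum(x.^2);
toc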

A.11 Free alternatives to MATLAB

There are several free packages which offer a subset of MATLAB's functionality.

GNU Octave [2] offers the best MATLAB compatibility of all the clones, at least in Unix environments (which include Mac OS X). Octave's compatibility has improved much from only a few years ago, when even the size function gave a result different from MATLAB's. Octave's output has to be scrolled through by hand when the screen is full, whereas in MATLAB the scrolling is done automatically. Currently, a notable difference is that Octave issues a warning if a function name is different from the corresponding filename; this may stall program execution for multiple calls when scrolling by hand becomes necessary. In any case, it is better programming practice to have consistent function names and filenames. Error messages in Octave are less concise than in MATLAB, so learning and debugging are better done in MATLAB. The subset of MATLAB functions that we use in this book should run without modification in the newer versions of Octave. To obtain reasonable execution speeds, it is even more crucial in Octave than in MATLAB to use implicit rather than explicit loops. Octave's 'native' ODE solver lsode is the implementation of the well-regarded package of the same name [3], but its interface is different from that of MATLAB's integrators. Integrators compatible with the ones in MATLAB are supplied with the ODEPKG package, which is not a standard part of Octave and must be installed separately. ODEPKG contains considerably more integrators than MATLAB does. As the versions of Octave change relatively fast, one needs to make sure that ODEPKG is installed with the current Octave version, not an older one, and is updated accordingly.

In principle, one can alternately use Octave and MATLAB and exchange subroutines between them, if one is careful with some compatibility issues, mostly related to how text is handled. MATLAB allows only ... to indicate the continuation of a statement on another line, while Octave also allows the backslash \. MATLAB allows only single quotation marks (to enclose strings), but Octave allows either single or double quotation marks. While MATLAB uses the exclamation mark ! as the escape sequence to the operating system, in Octave ! is an alternative expression for ~ (logical 'not').

Scilab [4] is a MATLAB clone developed mainly by INRIA, the French national institute for research in computer science and control. It differs in feel from MATLAB and Octave, as many libraries have to be explicitly loaded and are then sort of pre-compiled. This improves performance but leads to black boxes, so we don't recommend Scilab for programming projects where program transparency is paramount. While scripts written in MATLAB will perform identically in Octave, Scilab is much less compatible; notably, the argument list of the plot command is already different. We can't recommend alternating the use of MATLAB and Scilab, as the interfaces for more complex commands are too different.

Finally, there is FreeMat [5], which was originally developed for Windows platforms. It is closer to MATLAB than is Scilab, but less compatible than Octave. Already the mesh command is missing; the same functionality can be obtained via use of its plot command, which is not MATLAB-conforming.

A.12 Further reading

The most readily available and therefore most efficient reference is of course MATLAB's own help function (in ASCII), or the more graphically oriented doc function.


Information beyond function documentation can be obtained by searching the support pages of the MathWorks website [6]. Additional functions, not supported directly by MathWorks, can be found on MATLAB's file exchange page [7]. While in Mathematica the particular algorithms implemented are not indicated, the algorithms used in MATLAB are given in the reference manual (though computer centers sometimes don't make them available) or mentioned on the MATLAB support site.

On the website of MathWorks, there is a list of books [8] on MATLAB from various fields, or on other topics where MATLAB is used as the programming language; a notable example is Golub and van Loan's 'bible' of numerical linear algebra [9]. Borse [10] gives an extensive introduction to numerical algorithms, similar to the Numerical Recipes series for compiler languages [11–13]. As for Numerical Recipes, one should not rely unconditionally on the validity of the algorithms, but rather use them as a reference on a wide range of methods. For getting an overview of the field of numerical algorithms, [10] is a good starting point. From the vast amount of resources available on MATLAB, here we will just make special mention of the book [14] by Cleve Moler, the original creator of MATLAB (which is why sometimes the why function will respond with 'Cleve made me do it'), who developed it as a tool with which his students could have fun learning numerical analysis.

Exercises

The following exercises are best written as script files, and we recommend clearing the memory at each program start with clear all.

1.1 Generating data with mesh
Set up a grid for the complex plane:

n=100
x=linspace(-2*pi,2*pi,n)
y=x
[X,Y]=meshgrid(x,y)
Z=X+sqrt(-1)*Y

Then compute the complex exponential by exp(Z) and visualize it with the mesh command. Don't forget that you will have to choose whether you want to look at the real part, the imaginary part or the absolute value.

1.2 Try to understand the following code, which generates the coordinates of a regular polyhedron (or the plot of a circle) via MATLAB's cylinder function:

[X,Y,Z]=cylinder([2 1])
figure(1)
plot(X(1,:),Y(1,:))
figure(2)
plot(X(2,:),Y(2,:))

1.3 Random walk
Compute a two-dimensional random walk by summing Gaussian-distributed random numbers cumulatively, and visualize the walk with the comet function:

n=100
randx=randn(n,1), randy=randn(n,1)
x=cumsum(randx), y=cumsum(randy)
comet(x,y)

a) Compute the above example for three dimensions, using comet3.
b) Compute the random walk on a grid, allowing only motion on integer positions with nearest-neighbor steps (i.e. movement is permitted only to one of the next lattice sites; no diagonal steps), using random numbers, rounding and appropriate conditions.

1.4 Eigenvalues
MATLAB is an excellent tool for verifying 'by numerical experiment' theorems one has (or should have) learned in linear algebra; it is especially instructive with regard to eigenvalues (using MATLAB's eig function). Unless stated otherwise, perform the following exercises with square matrices of various sizes whose entries are real random numbers (use MATLAB's rand and randn). Be aware that the results can only be correct 'up to rounding errors'. Convince yourself of the following facts 'by numerical experiment':
a) For real matrices, the eigenvalues are either real or pairwise complex conjugate.
b) The determinant of a matrix (det) is equal to the product of its eigenvalues.
c) The eigenvalues of a matrix A are the inverses of the eigenvalues of the inverse matrix A⁻¹.
d) The rank of a matrix (the number of linearly independent rows or columns) is equal to the number of non-zero eigenvalues of the matrix. Define some row or column vectors and form their sums and multiples, so that you know the dependencies; then compute the eigenvalues of the resulting matrices, as in

v1=[1 1 0]
v2=[1 -2 0]
A=[v1
   v2
   (v2+v1)]
eig(A)
B=[v1
   2*v1
   v1+v2]
eig(B)


e) The determinant of singular (or rank-deficient) matrices, which have at least one eigenvalue equal to zero, is zero.
f) Rank-deficient matrices cannot be inverted; the execution of inv should terminate with an error message and return a matrix with infinite entries.

1.5 Eigenvectors, diagonalization and nullspaces
In MATLAB, the eigenvectors of a matrix A can be computed together with the eigenvalues by calling [V,D]=eig(A).
a) Notice that the eigenvectors are given as the columns of the matrix V, and the eigenvalues are given as the diagonal elements of the diagonal matrix D. Compare the execution of [V,D]=eig(A) with the computation of the eigenvalues alone, E=eig(A), where the eigenvalues are given as the elements of a vector E.
b) Convince yourself that the matrix V of eigenvectors can be used to form a transformation matrix, so that inv(V)*A*V gives the diagonal matrix D, which has the eigenvalues of A on its diagonal. This transformation is called 'diagonalization'.
c) Set up a triangular matrix, and compute its eigenvalues (they will be the diagonal elements of the matrix) and eigenvectors. The first eigenvector (first column of matrix V) is the eigenvector corresponding to the first eigenvalue (first element of matrix D), and so on.
d) For a scalar equation a × b = 0, if a ≠ 0 we know that b must be zero. For a matrix A and a vector v, from

Av = 0    (A.4)

(where the 0 on the right-hand side is the zero vector of the same length as v), if A is non-zero one cannot conclude that v is the zero vector. Use the matrices A and B from Exercise 1.4.d and compute their eigenvalues and eigenvectors. In each case, multiply the original matrix by the eigenvector corresponding to the eigenvalue zero; the result will be the zero vector (up to rounding errors).
Remark: When n eigenvalues of a matrix A are zero, there will be n vectors v_{0,i}, i = 1, ..., n, such that A v_{0,i} = 0. The linear combination Σ_i a_i v_{0,i} with real coefficients a_i defines the so-called nullspace or kernel of A.

1.6 Pseudo-inverse: left- and right-inverses
Convince yourself that for an m×n matrix A with m < n, MATLAB's pinv command will produce a matrix B which is a right-inverse, such that A*B is equal to the m × m identity matrix eye(m); on the other hand, B*A is a rank-deficient n × n square matrix, i.e. rank(B*A) will give m, which is smaller than the number of rows or columns.
a) Try this out with matrices created by rand(m,n) or randn(m,n); choose different m and n less than 10.
b) Now consider m × n matrices A with m > n. In this case, pinv will produce a matrix B which is a left-inverse, such that B*A is equal to the n × n identity matrix eye(n), while A*B is a rank-deficient m×m square matrix (the number of non-zero eigenvalues is smaller than the number of rows or columns).
c) For A with m = n, pinv and inv are equivalent, i.e. apart from rounding errors, the results will be the same.


1.7 Pseudo-inverses: numerical accuracy
Analytically (with infinite accuracy), one of the Moore–Penrose pseudo-inverses defined in (A.2) and (A.3) should be the same as the result produced by MATLAB's pinv. In practice, due to the finite accuracy, the formulae in (A.2) and (A.3) should never be used, as MATLAB's implementation pinv is numerically more accurate. For a numerically computed matrix inverse A⁻¹ (which can be a pseudo-inverse A⁺), one can define the residual

R = A⁻¹A − 1_R,    (A.5)

where 1_R stands for the identity matrix with the same dimensions as A⁻¹A. The residual can be used as a crude estimator of the accuracy, but in general it is not a good estimator. Some algorithms are designed to give a small residual (such as the LU-decomposition implemented in MATLAB's lu function), while the actual error has to be defined in terms of the error in the eigenvalues and eigenvectors. At least for the computation of pseudo-inverses of square matrices, the residual can demonstrate how many more digits are lost in computing (A.2)–(A.3) than in the computation with pinv:

clear
format compact
l=15
A=rand(l);
pinv(A)*A-eye(l)
A*pinv(A)-eye(l)
pinvl=inv(A'*A)*A';
pinvl*A-eye(l)
pinvr=A'*inv(A*A');
A*pinvr-eye(l)
pinvr-pinvl

1.8 Singular values
Singular values are (simply speaking) a generalization of eigenvalues to non-square matrices, or a definition of a set of real numbers for real matrices which have properties similar to eigenvalues. Use random matrices to convince yourself of the following facts.
a) Trying to use eig to compute eigenvalues for non-square matrices terminates in an error message, but not if one computes singular values with svd.
b) Singular values of a matrix A are the square roots of the eigenvalues of AᵀA or AAᵀ, whichever of these two square matrices has the smaller dimensions. As a


consequence, the singular values of a real square matrix are real and non-negative even if the eigenvalues are complex.
c) The inverses of the singular values of a rectangular matrix A are the singular values of the pseudo-inverse of A.

Interpolation R Program the interpolation which is given in MATLAB ’s help function, Compute and plot the solution for the interpolation with ’nearest’, ’linear’ and ’spline’.

1.10 Erroneously complex solutions The solution of the differential equation x˙ = 1−tx 1/3 with initial value x0 = 1 is strictly real in the interval [0, 5]. Use ode23 to compute the numerical solution with the default accuracy, and observe how the solution turns complex as the accuracy is reduced. R ’s default accuracy may not always (The conclusion is not so much that MATLAB be sufficient, but that one should always check that the numerical output of a function makes mathematical sense.) 1.11 Harmonic oscillator R ’s solvers are written for first-order differential equations, so the secondMATLAB order harmonic oscillator x¨ = −x has to be rewritten as a first-order system v˙ = −x, x˙ = v. The vector of dependent variables is usually called y, so to solve this system we set up a vector y = (y1 , y2 ) = (v, x) and write a suitable function for the right-hand side of the ODE system: function [dydt]=harmos(t,y); %harmonic oscillator %y(1)=v %y(2)=x dydt(1,1)=-y(2); dydt(2,1)=y(1); return; Then one calls an ODE solver with this function as its first input argument. The analytical solution of the harmonic oscillator equation is x(t) = A sin(t) + B cos(t), where the amplitudes A and B (which depend on the initial conditions) are constant. Convince R ’s solvers will conserve the amplitudes A, B and thereyourself that none of MATLAB fore the energy. Solvers which are built to conserve the energy are called ‘symplectic’ and are discussed in Chapter 2, § 2.4.

References

[1] E. Hairer, S. P. Norsett, and G. Wanner, Solving Ordinary Differential Equations I: Nonstiff Problems, 2nd ed. Vol. 8 of Springer Series in Computational Mathematics, Springer, 1993.
[2] GNU Octave, http://www.gnu.org/software/octave/, last visited December 2013.


[3] K. Radhakrishnan and A. C. Hindmarsh, "Description and use of LSODE, the Livermore Solver for Ordinary Differential Equations", Report UCRL-ID-113855, Lawrence Livermore National Laboratory, 1993.
[4] Scilab, www.scilab.org, last visited December 2013.
[5] FreeMat, http://freemat.sourceforge.net, last visited December 2013.
[6] The MathWorks, MATLAB support site, http://www.mathworks.com/support/, last visited December 2013.
[7] The MathWorks, MATLAB Central File Exchange, http://www.mathworks.com/matlabcentral/fileexchange/, last visited December 2013.
[8] The MathWorks, MATLAB and Simulink Based Books, http://www.mathworks.com/support/books, last visited December 2013.
[9] G. H. Golub and C. F. van Loan, Matrix Computations, 3rd ed. Johns Hopkins Studies in Mathematical Sciences, Johns Hopkins University Press, 1996.
[10] G. Borse, Numerical Methods with Matlab. International Thomson Publishing, 1997.
[11] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C++: The Art of Scientific Computing, 3rd ed. Cambridge University Press, 2002.
[12] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in Fortran: The Art of Scientific Computing, 2nd ed. Cambridge University Press, 1992.
[13] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, 2nd ed. Cambridge University Press, 1992.
[14] C. B. Moler, Numerical Computing with Matlab. Society for Industrial and Applied Mathematics, 2004.


B Geometry and Computational Geometry

Many geometrical concepts are necessary or useful for the programming of non-spherical particles and their interactions. In this appendix we give a brief outline of these concepts, including some characteristics to do with the computational implementation, and mention related physical concepts.

The use of inner products is usually safe in terms of numerical evaluation, but we warn against the use of formulations involving 3 × 3 determinants (they are alternatives to some of the formulae presented here), as the numerical stability is much less satisfactory. In any case, determinants should be numerically evaluated by LU decomposition (see MATLAB's documentation in help lu), not with the rule of Sarrus.

B.1 Trigonometric functions

The trigonometric functions sin, cos, tan and cot can be used to compute the sides of triangles with given angles; see Figure B.1(a). When using identities such as

cos 2α = cos²α − sin²α,    (B.1)

keep in mind that for floating point numbers, the results of evaluating the left- and right-hand sides will be different due to rounding errors. For |α| ≈ π/4, at which cos α ≈ sin α ≈ √2/2, catastrophic cancellation may occur, so that the left-hand side of Equation (B.1) may be much more accurate than the right-hand side.

In DEM simulations, computations of angles from given points are often needed, so the inverse trigonometric functions arcsin, arccos and arctan, such that

arcsin(sin(α)) = α,



Figure B.1  (a) Representation of the trigonometric functions sin, cos, tan and cot with respect to the unit circle (circle of radius 1 centered at the origin), shown for a point in the first quadrant. (b) Results of the computation of angles with atan2 (rounded to two or three digits for readability).

arccos(cos(α)) = α,
arctan(tan(α)) = α,

may in fact be used more frequently than sin, cos and tan themselves. (In this book we use the arcsin(x), arccos(x) and arctan(x) notation for inverse trigonometric functions in preference to sin⁻¹(x), cos⁻¹(x) and tan⁻¹(x), which can easily be confused with 1/sin(x), 1/cos(x) and 1/tan(x).) In programming languages the inverse trigonometric functions are usually abbreviated as asin, acos and atan. Conventionally, arcsin, arccos and arctan are defined for regions where their function values are real and monotonic, so

θ = arcsin(x) is defined for −1 ≤ x ≤ 1, with output values −π/2 ≤ θ ≤ π/2;
θ = arccos(x) is defined for −1 ≤ x ≤ 1, with output values 0 ≤ θ ≤ π;
θ = arctan(x) is defined for −∞ < x < ∞, with output values −π/2 < θ < π/2.

−∞ < x < ∞, with output values −π/2 < θ < π/2.

However, when x is computed to be passed as input to asin or acos, it may be that instead of, e.g., x = 1, due to rounding errors something like x = 1 + ε, with ε ≈ 10⁻¹⁵, is obtained, which is outside the allowed range of input values. In that case, different compilers may react in different ways. If the error is obvious at compilation time, the compiler may simply reject the compilation. If there is actually an attempt to evaluate the function, it may lead to a program crash, or Not-a-Number (NaN) may be returned. MATLAB, with its automatic use of complex numbers, will just continue with the (correctly evaluated) complex value of the function. So, before passing critical data into inverse trigonometric functions, one should use an if-condition to make sure that the input value is in the appropriate range, and truncate it otherwise.

Very often, one has to compute angles from the relative positions of points. With only asin, acos and atan available, it is rather tedious to identify the proper quadrant for the angle from the signs of cos(α) and sin(α); see Figure B.1(a). Further care is necessary because divisions by zero must be avoided, and these situations arise easily if the angles are multiples of π/2. For this reason, the atan2 function was developed, originally in FORTRAN, and


later also in MATLAB and many other programming languages. Given a point (x, y) in the coordinate plane, atan2(y,x) takes as its first argument the y-value and as its second argument the x-value (at least in FORTRAN and MATLAB; other languages may differ) and outputs the angle as a number between −π and π. If one requires angles to be between 0 and 2π, the output of atan2(y,x) must be post-processed, i.e. by adding 2π when the result is negative.
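A sketch of this post-processing (the point (x, y) is an arbitrary example of our own):

x=-1; y=-1;
phi=atan2(y,x);      % result in (-pi,pi], here -3*pi/4
if (phi<0)
  phi=phi+2*pi;      % map into [0,2*pi), here 5*pi/4
end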

B.2 Points, line segments and vectors

Names of points are usually capital letters, e.g.

S = (2, 3)ᵀ,  T = (5, −2)ᵀ,  U = (4, 5, 6)ᵀ,  V = (2, −2, 2)ᵀ.    (B.2)

Here S and T are points in the plane, i.e. two-dimensional space, while U and V are points in three-dimensional space. The coordinates of a point are denoted by a subscripted lowercase letter corresponding to its name; for example, the coordinates of U would be written as u with subscripts. When we use x, y, z as subscripts, e.g. a_x, a_y, a_z, we imply a space-fixed Cartesian coordinate system; if the subscripts are numbers, e.g. a_1, a_2, a_3, we mean a body-fixed coordinate system which may not have the same orientation. A line segment between points U and V will be written as (U, V). As a line segment, (U, V) is the same as (V, U), i.e. the direction does not matter; but, when considered as vectors, (U, V) and (V, U) have opposite directions. The distance between U and V will be written as the norm of the line segment between U and V, i.e. ‖(V, U)‖. While points have only coordinates, their respective position vectors, which are denoted by bold lowercase letters, additionally have a direction (see Figure B.2):

s = (2, 3)ᵀ,  t = (5, −2)ᵀ,  u = (4, 5, 6)ᵀ,  v = (2, −2, 2)ᵀ.    (B.3)


Figure B.2 Points and their corresponding position vectors from Equations (B.2) and (B.3): (a) in the plane (two-dimensional space); (b) in three-dimensional space.


Both points and vectors can be written in the same component form, but when bold lowercase letters are used, we mean vectors that have directions; and when capital letters are used, we mean points which do not have directions. We will follow the convention of linear algebra (and MATLAB) and distinguish vectors v from their transposes vᵀ. For the column vector s in Equation (B.3), its transpose is a row vector sᵀ = (2, 3). A vector pointing from the origin to a point S will be called the position vector of S, designated by s. A vector from point S to point U will be written as

(S, U) = u − s = (u_1 − s_1, u_2 − s_2, u_3 − s_3)ᵀ.

The length of a vector is designated by the letter for the vector without boldface, or with norm brackets ‖ · ‖; so the length of vector v is written as v or ‖v‖. Vectors of length 1 are called unit vectors, and are designated with a hat: ‖â‖ = 1. The unit vector â in the direction of vector a is obtained from dividing a by its norm:

â = a/‖a‖.

B.3 Products of vectors

B.3.1 Inner product (scalar product, dot product)

In introductory mathematics and physics courses, the inner product of two vectors a and b is called the scalar product or dot product and is given by

s = a · b = a b cos α,    (B.4)

where a is the length of a, b is the length of b (i.e. a = ‖a‖, b = ‖b‖) and α is the angle between the two vectors. Often we need to compute the scalar product from the components of the two vectors:

s = a b = a_1 b_1 + a_2 b_2 + · · ·  for a row vector a = (a_1, a_2, ...) and a column vector b = (b_1, b_2, ...)ᵀ.

Note that here a is a row vector and b is a column vector, so that the usual matrix multiplication gives the scalar product. If c and d are two column vectors, their scalar product will be written as s = cᵀd; if e and f are two row vectors, their scalar product will be written as e fᵀ; see Figure B.3. With the scalar product, the vector norms of a row vector a and a column vector b can be written as


Figure B.3 Vectors with: (a) the same orientation and positive inner product; (b) opposite orientations and negative inner product; (c) orthogonal orientation and zero inner product.

a = ‖a‖ = √(a aᵀ),   b = ‖b‖ = √(bᵀ b).

Therefore, using Equation (B.4), one can compute the cosine of the angle α between two column vectors c and d as

cos α = cᵀd / (‖c‖ ‖d‖).    (B.5)

Accordingly, if cos α = 1, the two vectors are parallel; and if cos α = 0, the two vectors are perpendicular. Equation (B.5) involves only the scalar product and the norms, which are defined in arbitrary dimensions. Thus, the angle α between two vectors can be computed in any dimension.

A typical physical quantity which is computed with the inner product is the work

W = ∫ F · dx,    (B.6)

where the scalar product of the vectorial quantities F and dx turns up even under the integral sign. The work is maximal when F and dx are parallel.
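As a sketch of Equation (B.5) (the four-dimensional test vectors are our own choice, to emphasize that the formula is not restricted to two or three dimensions):

c=[1;0;0;0]; d=[1;1;0;0];            % two column vectors in four dimensions
cosalpha=(c'*d)/(norm(c)*norm(d));   % Equation (B.5)
alpha=acos(cosalpha)                 % pi/4, i.e. 0.7854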

B.3.2 Orthogonality

Vectors at right angles are said to be orthogonal; their inner product vanishes. Remember that for floating point numbers, rounding errors have to be taken into account when comparing the result of a calculation with zero; rather than check for equality, one should set a tolerance and check whether the result is within that tolerance. For example, the following code uses the inner product to decide whether two vectors are orthogonal:

if ( abs(dot(a,b))