
Sergey I. Martynenko Numerical Methods for Black-Box Software in Computational Continuum Mechanics

Also of Interest The Robust Multigrid Technique. For Black-Box Software Sergey I. Martynenko, 2017 ISBN 978-3-11-053755-0, e-ISBN (PDF) 978-3-11-053926-4, e-ISBN (EPUB) 978-3-11-053762-8

High Performance Parallel Runtimes. Design and Implementation Michael Klemm, Jim Cownie, 2021 ISBN 978-3-11-063268-2, e-ISBN (PDF) 978-3-11-063272-9, e-ISBN (EPUB) 978-3-11-063289-7

Computational Physics. With Worked Out Examples in FORTRAN® and MATLAB® Michael Bestehorn, 2023 ISBN 978-3-11-078236-3, e-ISBN (PDF) 978-3-11-078252-3, e-ISBN (EPUB) 978-3-11-078266-0

Optimal Control. From Variations to Nanosatellites Adam B. Levy, 2023 ISBN 978-3-11-128983-0, e-ISBN (PDF) 978-3-11-129015-7, e-ISBN (EPUB) 978-3-11-129050-8

Numerical Analysis on Time Scales Svetlin G. Georgiev, Inci M. Erhan, 2022 ISBN 978-3-11-078725-2, e-ISBN (PDF) 978-3-11-078732-0, e-ISBN (EPUB) 978-3-11-078734-4

Sergey I. Martynenko

Numerical Methods for Black-Box Software in Computational Continuum Mechanics

Parallel High-Performance Computing

Mathematics Subject Classification 2020 Primary: 65N55, 65M55, 65F08; Secondary: 65F50, 65M22 Author Dr. Sergey I. Martynenko Federal Research Center of Problems of Chemical Physics and Medicinal Chemistry Russian Academy of Sciences Department of Combustion and Explosion Ac. Semenov avenue 1 Moscow district, Chernogolovka 142432 Russian Federation [email protected]

ISBN 978-3-11-131729-8 e-ISBN (PDF) 978-3-11-131956-8 e-ISBN (EPUB) 978-3-11-131975-9 Library of Congress Control Number: 2023940862 Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2023 Walter de Gruyter GmbH, Berlin/Boston Cover image: Stocktrek Images / Stocktrek Images / Getty Images Typesetting: VTeX UAB, Lithuania Printing and binding: CPI books GmbH, Leck www.degruyter.com

Preface

Science in the name of peace and progress

Mathematical modeling of physical and chemical processes has always been an important activity in science and engineering. Scientists and engineers, however, cannot understand all the details of the mathematical models, numerical algorithms, parallel computing technologies, and parallel supercomputer architectures. This fact motivates the development of black-box software. To some extent, attempts to automate mathematical modeling have already been exploited in the first black-box computational fluid dynamics (CFD) code. In 1978, British scientist Brian Spalding¹ conceived the idea of a single CFD code capable of handling all fluid-flow processes. Consequently, the company Concentration Heat and Momentum Ltd (CHAM) abandoned the policy of developing individual application-specific CFD codes, and in late 1978 began creating the world’s first general-purpose CFD code PHOENICS, which is an acronym for Parabolic, Hyperbolic, Or Elliptic Numerical Integration Code Series. The initial creation of PHOENICS was largely the work of Brian Spalding and Harvey Rosten, and the code was launched commercially in 1981. Thus, for the first time, a single CFD code was to be used for all thermo-fluid problems. Clearly, each general-purpose code should be based on a robust computational technique for solving a wide class of nonlinear (initial-)boundary value problems. CFD software has become an essential modeling tool to study and validate flow problems in engineered systems. Many computer-aided engineering (CAE) programs for CFD are available with different capabilities, making it difficult to select the best program for a specific application. In fact, modern codes (Fluent, Star-CCM+, COMSOL’s CFD Module, Altair’s AcuSolve, and others) are collections of common building blocks and diagnostic tools helping users to develop their own application without having to start from scratch. Users will, therefore, need a basic knowledge of numerical methods. We define software as black-box if it does not require any additional input from the user apart from a specification of the physical problem consisting of the domain geometry, boundary and initial conditions, source terms, the enumeration of the equations to be solved (heat conductivity equation, Navier–Stokes equations, Maxwell equations, etc.), and media. The user does not need to know anything about numerical methods or high-performance and parallel computing. Unfortunately, a true black-box solver (optimal algorithmic complexity and full parallelism without problem-dependent components) cannot be constructed. For example, a general-purpose stopping criterion for iterative algorithms cannot be proposed and developed, i. e., each iterative algorithm has at least one problem-dependent component. It is possible to develop various close-to-black-box solvers based on a reasonable compromise between the number of problem-dependent components, algorithmic complexity, and parallelism.

1 https://en.wikipedia.org/wiki/Brian_Spalding

Obviously, close-to-black-box solvers need black-box optimization to tailor their problem-dependent components to the problem at hand in order to obtain the highest possible efficiency for the solution process. The aim of this book is to relate our experience in the development of a robust (the lowest number of problem-dependent components), efficient (close-to-optimal algorithmic complexity), and parallel (faster than the fastest sequential algorithm) computational technique for a black-box (de)coupled solution of multidimensional nonlinear (initial-)boundary value problems. The goal of this activity is to develop a black-box computational technique for the numerical solution of a wide class of applied problems, starting with the Poisson equation up to systems of nonlinear strongly coupled partial differential equations (multiphysics simulation) in domains with complex geometry, which we already know how to solve. In other words, the purpose of this book is, therefore, to share the author’s personal opinion with the readers about formalizing scientific and engineering computations. It is recommended to read the Preface and Conclusions first to see if the contents of the book are of interest. The organization of the material is presented as follows. The first chapter is introductory in nature. Chapter I discusses general requirements on grid generation, discretizations, iterative methods, and computational accuracy in black-box software. Chapter II describes the sequential robust multigrid technique (RMT), which is developed as a general-purpose solver in black-box codes. Chapter III provides a description of the parallel RMT. Theoretical aspects of the algorithms used for solving multidimensional problems are discussed in Chapter IV. This book generalizes the RMT presented in [21]. This book is of an applied nature and is intended for specialists in the field of computational mathematics, mathematical modeling, and for software developers engaged in the simulation of physical and chemical processes for aircraft and space technologies and for power, chemical, and other branches of engineering. To make this book useful to as many practitioners as possible, a minimal amount of mathematical background has been assumed. It is expected that the readers are familiar with the fundamentals of computational mathematics and are acquainted with the material presented in the authoritative textbook Multigrid [35]. It should be emphasized that our definitions of black-box software and robustness (Definition I.1, p. 3) differ from the standard ones.

Sergey Martynenko
Moscow, 24 February 2023

Acknowledgments

The author expresses his sincere gratitude to Professors R. P. Fedorenko, V. S. Ryaben’kii, M. P. Galanin, V. T. Zhukov, S. V. Polyakov, and V. A. Bakhtin of the Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences for their interest in the given work and for a critical discussion of the obtained results. The author is very grateful to Professors I. B. Badriev, M. M. Karchevsky, R. D. Dautov, A. V. Lapin, O. A. Zadvornov, V. V. Banderov, and V. Yu. Chebakova of Kazan Federal University, who organized the International Conference Mesh Methods for Boundary Value Problems and Applications, where the results of this book were discussed. In the preparation of this text, I have benefited from the advice and assistance of many people. I am especially grateful to Professor M. A. Olshanskii of the University of Houston and to Professors S. P. Kopyssov, A. K. Novikov, and A. S. Karavaev of Udmurt State University for their constant advice and helpful contributions. I wish to give special thanks to Professors A. V. Antonik and P. D. Toktaliev for their extremely careful and competent help in the preparation of the manuscript. I express my deepest gratitude to my school mathematics teacher Tamara Rabochaya, who gave me a love for computational mathematics. This activity is a part of the projects “Supercomputer simulation of physical and chemical processes in the high-speed direct-flow propulsion jet engine of the hypersonic aircraft on solid fuels” supported by the Russian Science Foundation (project 15-11-30012) and “Supercomputer modeling of hypervelocity impact on artificial space objects and Earth planet” supported by the Russian Science Foundation (project 21-72-20023). The main part of this work has been performed at the Federal Research Center of Problems of Chemical Physics and Medicinal Chemistry of the Russian Academy of Sciences (Chernogolovka, Moscow district, Russia).


Contents

Preface · V
Acknowledgments · VII
Notation · XI

I Forward to black-box software · 1
1 Formalization of scientific and engineering computations · 1
2 Mathematical models in the continuum mechanics · 3
3 Finite volume discretization · 6
4 Smoothers · 7
5 An introduction to the black-box solver · 11
6 Segregated and coupled solvers for systems · 18
7 Nonlinear two-grid algorithm · 25
8 Summary · 35

II Sequential robust multigrid technique · 37
1 Multiple coarse grid correction strategy · 37
2 Analytic part of the technique · 38
3 Computational part of the technique · 39
3.1 Generation of the finest grid · 39
3.2 Triple coarsening · 40
3.3 Finite volume discretization on the multigrid structure · 44
4 Multigrid cycle · 47
5 1D numerical illustration · 49
6 Computational work · 65
7 Black-box optimization on the multigrid structure · 66
8 Remarks on the index mapping · 69
9 Summary · 73

III Parallel robust multigrid technique · 74
1 Introduction to parallel RMT · 74
2 Geometric parallelism · 77
3 Algebraic parallelism · 82
4 Parallel cycle for middle-thread parallelization · 84
5 Parallel solution of time-dependent problems · 85
6 Summary · 101

IV Background in iterative methods · 103
1 Basic notation and definitions · 103
2 Basic direct and iterative methods · 105
3 Computational grids · 109
4 Auxiliary space preconditioner · 118
5 Remarks on the smoothing and approximation property · 124

Conclusions · 127
Bibliography · 133
Index · 135

Notation

A · coefficient matrix (1.11)
E · parallel algorithm efficiency (3.7)
G_1^0 · the finest grid (p. 39), (3.1)
G_k^l · kth grid of level l (p. 39)
ℒ · linear (elliptic) differential operator
L_3^+ · the coarsest level (2.7)
L_3^* · the dynamical coarsest level (p. 98)
ℳ𝒮(G_k^l) · multigrid structure generated by the grid G_k^l (p. 42)
𝒩 · nonlinear differential operator
N · the number of unknowns
N_M · the number of model equations (1.15)
𝒫_{A→0} · prolongation operator (2.41)
ℛ_{0→A} · restriction operator (2.39)
Re · Reynolds number (p. 22)
S · (smoothing) iteration matrix (4.5), (4.23)
S · parallel algorithm speed-up (3.7)
T · time
V · finite volume (2.8) (p. 78, 88)
W · splitting matrix (4.4)
𝒲 · algorithmic complexity (p. 13)
c · coarse grid correction
d · dimension (d = 2, 3)
f · the right-hand side function (2.1), (3.3), (3.8)
h · mesh size (p. 12)
l · the grid level (p. 42)
ℓ · the dynamic finest level
n · discretization parameter (p. 12)
n_b · the number of unknown blocks (p. 15)
q · multigrid iteration counter
p · the number of computing units (threads)
r · residual
û · approximation to the solution
x_i^f · grid point abscissa (p. 109), (4.9b)
x_i^v · grid point abscissa (p. 109), (4.9a)
Δ · Laplace operator (1.16)
Ω · domain
∂Ω · domain boundary
ν · (smoother) iteration counter
ρ · average reduction factor (4.6), (4.27)
τ · relaxation parameter (1.6), (4.4)
‖·‖_∞ · Chebyshev (max) norm (p. 20), (4.1)
‖e‖_∞ · Chebyshev (max) norm of the error vector



Subscripts

{⋅} · index mapping
0 · related to the finest grid G_1^0
A · auxiliary
O · original
a · analytic solution
h · grid
opt · optimal
s · space
t · time
x · X-direction
y · Y-direction
z · Z-direction

Superscripts

h · grid
T · transposition
0 · related to the finest grid (G_1^0)
(q) · related to qth multigrid iteration
(ν) · related to νth smoothing iteration
(0) · starting guess

Abbreviations

AMG · algebraic multigrid method
ASM · auxiliary space method
BBS · black-box software
BVP · boundary value problem
CFD · computational fluid dynamics
FAS · full approximation storage
FEM · finite element method
FVD · finite volume discretization
FVM · finite volume method
IBVP · initial-boundary value problem
ℳ𝒮 · multigrid structure
PDE · partial differential equation
PSMG · parallel superconvergent multigrid method
SLAE · system of linear algebraic equations
SOR · successive overrelaxation
RMT · robust multigrid technique
ao · arithmetic operations

I Forward to black-box software

This introductory chapter represents a theoretical analysis of the computational algorithms for a numerical solution of the basic equations in continuum mechanics. In this chapter the general requirements for computational grids, discretization, and iterative methods for black-box software are examined. Finally, a concept of a two-grid algorithm for the (de)coupled solution of multidimensional nonlinear (initial-)boundary value problems in continuum mechanics (multiphysics simulation) in complex domains is presented.

1 Formalization of scientific and engineering computations

In the early days of computers, every machine computation constituted a notable scientific event. Since the early 1980s, the development of high-speed digital computers has had a great impact on the activities of engineers, physicists, chemists, and other non-mathematicians solving time-consuming engineering and scientific problems. It immediately became clear that specialized software would significantly reduce the work of coding and debugging in scientific and engineering applications. Several industries and engineering and consulting companies worldwide use commercially available general-purpose CFD codes for the simulation of fluid flow, heat, and mass transfer and combustion in aerospace applications (Fluent, Star-CCM+, COMSOL’s CFD Module, Altair’s AcuSolve, and others). Also, many universities and research institutes worldwide apply commercial codes, besides using those developed in-house. Today open-source codes such as OpenFOAM are also freely available. Other important issues are the description of complex domain geometries and the generation of suitable grids. However, to successfully apply such codes and to interpret the computed results, it is necessary to understand the fundamental concepts of computational methods. A promising and challenging trend in numerical simulation and scientific computing is to devise a single code to handle all problems already solved. As a rule, the mathematical modeling consists of the following stages:
1) The formulation of the mathematical model for the studied physical and chemical processes in the form

   𝒩(u) = f.   (1.1)

2) The approximation of the space-time continuum (generation of a computational grid G).
3) The approximation of the differential problems (1.1) on the grid G to obtain a discrete analogue of the mathematical model

   𝒩_h(u_h) = f_h.   (1.2)

4) A numerical solution of the (non)linear discrete equations

   u_h = 𝒩_h^{−1} f_h   (1.3)

   on a sequential or parallel computer.
5) The visualization and analysis of computational results.

Here 𝒩(u) = f is a system of (non)linear (integro-)differential equations and (initial-)boundary conditions (mathematical model), 𝒩_h(u_h) = f_h is the resulting system of (non)linear algebraic equations (the discrete analogue of the mathematical model), and u_h = 𝒩_h^{−1} f_h is the numerical solution. Unfortunately, each stage of the mathematical modeling is a very complex problem, which has not yet been solved robustly. The most time-consuming step in execution is the numerical solution of the (non)linear discrete equations (1.3). The determining conditions for the software development are the wishes of potential users. However, what do the engineers, physicists, chemists, and other non-mathematicians want? Non-mathematicians want to operate with convenient and customary objects (domain geometry, initial and boundary conditions, working environments, source terms, fundamental equations, etc.) and get computational results in their obvious form, and they do not want to know anything about grids, approximations, linearizations, iterative methods, parallel computations, computer architectures, and other details of numerical experiments. Users need a very easy-to-use and powerful tool that will significantly expand their capabilities without the need to learn computational mathematics and programming, i. e., black-box software. It is natural to want to minimize execution time for solving real-life problems using simple computers. Our definition of black-box software is given in the Preface. All problems caused by developing black-box software for solving real-life applications can be subdivided into “physical”, “mathematical”, and “computer” subproblems:
– “Physical” subproblems arise from the difficulties of mathematically describing complex physical and chemical processes (1.1), such as hydrodynamics, heat and mass transfer in multiphase reacting flows, turbulent combustion, etc.
– “Mathematical” subproblems arise from the formalization complexity of the main stages of computational experiments: generation of computational grids, approximations of the governing system of nonlinear (integro-)differential equations (1.1), and the efficient solution of systems of nonlinear algebraic equations (1.2) on a sequential or parallel computer (1.3).
– “Computer” problems arise from the compatibility difficulties of frequently updated software and hardware.
“Physical” subproblems are a consequence of the variety of modeled physical and chemical processes and the accuracy required for their mathematical description, whereas the imperfection of numerical methods for solving the governing nonlinear PDEs leads to “mathematical” subproblems. The idea behind robust multigrid algorithms is to choose the components independently of a given problem to match as large a class of problems as possible. The robust approach is often used in software packages where attaining the highest efficiency for a single problem is not so important [35]. Further, it is supposed that the number of problem-dependent components defines the robustness of the algorithm:

Definition I.1. Let there be a set of algorithms for solving some problem. An algorithm from this set is called robust if it has the lowest number of problem-dependent components.

This book focuses on “mathematical” subproblems. To overcome the problem of robustness, a two-grid algorithm is developed for black-box software, but nonlinear problems are not transferred between these grids. The basic solver consists of two ingredients, smoothing on the original grid and correction on the auxiliary grid. The main feature of the proposed approach is the application of the Robust Multigrid Technique (RMT) for computing the correction on the auxiliary structured grid (Chapter II). The RMT is a single-grid (pseudo-multigrid) algorithm with the lowest number of problem-dependent components, close-to-optimal complexity, and high parallel efficiency. Our considerations for the development of black-box software are summarized in the last section.

2 Mathematical models in the continuum mechanics

Continuum mechanics, a large branch of mechanics, is devoted to the study of the motion of gaseous, liquid, and rigid deformable bodies. In continuum mechanics, with the aid of and on the basis of methods and observations developed in theoretical mechanics, motions of material bodies that fill the space in a continuous manner and the distances between the points of which change during motion are examined. Besides ordinary material bodies, such as water, air, or iron, special media are also investigated in continuum mechanics, i. e., fields of electro-magnetism, radiation, gravitation, etc. [32]. These are the basic hypotheses of continuum mechanics:
1) The concept of a continuous medium must first of all be defined. All bodies consist of individual particles; however, any volume essential for the ensuing studies contains very many such particles, and it is possible to assume that a medium fills the space in a continuous manner. Water, air, iron, etc. will be considered to be bodies that completely fill a part of space. Such a hypothetical continuous matter is termed a continuum. Not only common material bodies can be considered continuous media, but also fields, for instance, electromagnetic fields. Such an idealization is necessary to employ the apparatus of continuous functions, differential and integral calculus for studying the motion of deformable bodies [32].

2) Only Euclidean spaces will be considered, as experience shows that within not very large scales, a real physical space can be assumed to be Euclidean with a high degree of accuracy.
3) Only absolute time will be used for the description of reality, if the effects of the theory of relativity can be neglected.

Mathematical modeling of physical and chemical processes can be represented as a “model–algorithm–software” triad [30, 31]. In the first stage the system of equations (1.1) expressing in mathematical form the most important properties of the modeled object or process is chosen or constructed. The general technique for obtaining the governing equations of continuum mechanics is to consider a small finite volume V in a fixed coordinate system. The size of the volume V is small compared to the volume of the medium, but large compared to the molecular distances. Mass, energy, and other physical quantities should be conserved in the volume V through which the medium moves. For the given volume V, a phenomenological approach based on general empirical hypotheses is used to formulate the fundamental conservation laws. Taking the limits as h_t → 0 and V → 0, partial differential equations can be derived from these conservation laws. In fact, the limit as h_t → 0 and V → 0 conflicts with the hypothesis of continuity. In a study of the mechanics of deformable media, we wish to rely on differential and integral calculus. Therefore assume that the functions entering into the laws of motion of a continuum are continuous and possess continuous partial derivatives with respect to all variables. This assumption is general enough, but it severely limits the class of phenomena that may be studied [32]. In general, the results of differential equations analysis may not be valid for the discrete equations and vice versa. The above-mentioned considerations lead to the following conclusions:
1) Mathematical modeling in continuum mechanics is a chain of approximations:
   – Difference schemes¹ approximate the governing differential equations (1.1).
   – Differential equations (1.1) approximate the fundamental conservation laws of continuum mechanics.
   – A continuous medium approximates a real one.
2) The finite volume method (FVM) seems to be the most natural way to approximate the governing equations of continuum mechanics [37]. Moreover, the FVM allows us to obtain discrete analogues of the fundamental conservation laws without differential equations formulated in the limits as h_t → 0 and V → 0.
3) Since any chemical reaction is the result of intermolecular interactions, modeling chemical processes in continuum mechanics is only possible by using empirical hypotheses and experimental data to approximate the quantum nature of these intermolecular interactions.

¹ A difference scheme is a finite system of algebraic equations replacing some differential problem.


Each mathematical model describes only the most important properties of the modeled objects or phenomena. Often, some physical and chemical processes are described in a simplified way to limit the spatial and/or temporal scales of the studied phenomena. Sometimes the model parameters, domain geometry, boundary conditions, medium properties, and source terms are not given accurately enough. The accuracy of the mathematical model formulation (1.1) determines the type and order of the PDE approximation, the choice of the numerical methods, and the error of the results obtained (1.3). The second stage consists in choosing or developing an algorithm for computer implementation of the model, i. e., a finite set of precisely defined rules for solving the model equations. The model is presented in a form convenient for numerical solution, and a sequence of computational and logical operations is defined to find the desired solution with given accuracy. Numerical methods are an approximate approach, and they also put a computational error into the simulation results. The computational error must be consistent with the model error, because it is pointless to use high-precision numerical methods with a simplified mathematical description of physical and chemical processes. The algorithms should not distort the basic properties of the models, but they should be economical and adaptable to the problems solved and computers used. In the third step, programs to “translate” the model and algorithm into a computer language are created. They must also be cost-effective and adaptive. The complexity of modern mathematical models of continuum mechanics makes their theoretical analysis too difficult, so often physical considerations are a more useful tool for developing numerical methods than useless theorems proved in ideal settings. If the simulation error has a dominant physical nature, then such problems should be assigned to computational physics – a branch of physics associated with solving inaccurately posed physical problems with numerical methods. As a rule, these nonlinear problems are very complicated (e. g., turbulent combustion of multiphase media), but the methods for their numerical solution are very simple: second-order finite volume discretization (Section 3, Chapter I), some linearization, Gaussian elimination, Gauss–Seidel iterations, and interpolation. In computational physics, the solution accuracy is determined by comparing computational and experimental data, which are also approximate. In the following, we will mainly consider numerical methods for such problems. If the simulation error has a mathematical nature, then such problems should be assigned to computational mathematics – a branch of mathematics associated with the development and theoretical investigation of numerical methods. As a rule, these linear problems are very simple (Poisson equation, Stokes problem, etc.), but algorithms for their solution are complex and constructed to achieve high accuracy and/or optimal algorithmic complexity (finite element method, discontinuous Galerkin discretization, classic multigrid methods, etc.). In computational mathematics, the solution accuracy is determined by comparing the analytical and approximate solutions.

Conclusion. As a rule, the mathematical description errors of physical and chemical processes in the approximation of continuum mechanics have a physical nature (inaccurate initial and/or boundary conditions, equation of state errors, approximate description of the turbulent transport and chemical reactions, etc.), and they exceed the discretization errors of the governing (integro-)differential equations. In many cases the second-order accurate finite volume discretization does not damage the discrete solution accuracy of the mathematical model equations required for practical applications. However, for reasons of robustness, advanced software can use a high-order discretization without significant changes in the computational algorithm.

3 Finite volume discretization

The finite volume method (FVM) was proposed by the Soviet academician A. A. Samarskii [29]. The basis of the FVM is an integral conservation law, and the essential idea is to divide a domain into many finite volumes (cells) and approximate the conservation law on each volume. In other words, FVM is one of the approaches for approximating partial differential equations that uses the values of the conserved variables averaged across the volume. Since the mid-1950s, the discretization approach has been actively used in computational practice for the numerical solution of various applied problems. Many general-purpose codes for fluid flows, including the majority of commercial CFD programs, are based on the finite volume approach. The background of the FVM is given in [27, 37]. We summarize the advantages of FVM:
1) Conservative discretization of physical laws
Modern physics is based on the fundamental conservation laws, which state that the physical quantities in a closed system do not change over time. In FVM the conservation of mass, momentum, and energy is ensured at the level of each cell/finite volume. It is always better to use governing equations in their conservative form with the finite volume approach to solve any problem that ensures the conservation of all the properties in each cell/finite volume. It is natural to require that difference schemes should express the discrete conservation laws on a computational grid. Such schemes are called conservative ones. FVM is the most elegant approach for the construction of the conservative difference scheme.
2) The possibility of solving equations with discontinuous coefficients
There are many important physical phenomena that can be modelled using PDEs with discontinuous coefficients. Previously, several approaches have been developed for the numerical solution of problems with discontinuous coefficients. One of them is based on the explicit use of conjugation conditions. For these purposes, the grid points must coincide with the coefficient discontinuity lines or surfaces. The second approach consists in constructing so-called homogeneous schemes. From the black-box point of view, these schemes seem to be preferable for having the same form for problems with continuous and discontinuous coefficients (a small illustration is given at the end of this section).
3) Application in geometric multigrid methods
At present, multigrid methods have become the dominant set of algorithms in software packages used to model physical and chemical processes. One problem-dependent component of multigrid methods is the approximation of original differential problems on the sequence of nested grids. If the coarser grids are generated by the agglomeration of finite volumes of the finer grids, then FVM is the most convenient discretization approach (Chapter II).

Conclusion. The finite volume method is the best way to discretize the (initial-)boundary value problems in continuum mechanics in black-box software. However, advanced software should use finite element schemes or other discretization approaches without significant modification of the computational algorithm.
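To make the homogeneous-scheme idea of item 2 concrete, the following sketch (not taken from the book) discretizes the 1D model diffusion problem −(k(x)u′)′ = f(x) on (0, 1) by a vertex-centred finite volume scheme in Python; the model problem, the piecewise-constant coefficient, and all names in the code are assumptions made only for this illustration. A common choice is the harmonic mean of the nodal coefficients at the cell faces, which keeps the discrete flux continuous across a coefficient jump, so the same formulas serve continuous and discontinuous coefficients alike.

import numpy as np

def fvm_1d(k, f, n, u_left=0.0, u_right=1.0):
    # Vertex-centred finite volume scheme for -(k(x) u')' = f(x) on (0, 1):
    # n cells, n + 1 nodes, harmonic mean of k at the cell faces.
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    kn = k(x)                                            # coefficient at the nodes
    kf = 2.0 * kn[:-1] * kn[1:] / (kn[:-1] + kn[1:])     # harmonic mean at the faces
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0                              # Dirichlet boundary rows
    b[0], b[n] = u_left, u_right
    for i in range(1, n):                                # flux balance per finite volume
        A[i, i - 1] = -kf[i - 1] / h**2
        A[i, i + 1] = -kf[i] / h**2
        A[i, i] = (kf[i - 1] + kf[i]) / h**2
        b[i] = f(x[i])
    return x, np.linalg.solve(A, b)

# piecewise-constant coefficient with a jump at x = 0.5
x, u = fvm_1d(k=lambda x: np.where(x < 0.5, 1.0, 100.0),
              f=lambda x: np.ones_like(x), n=64)

Because the flux kf[i](u[i+1] − u[i])/h leaving one volume is exactly the flux entering its neighbour, the scheme is conservative by construction, which is the first advantage listed above.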

4 Smoothers

One of the key components of multigrid is the smoother, which aims at reducing high-frequency errors on each grid [12, 35, 38]. To illustrate the smoothing effect used in multigrid, we start with the 1D Poisson equation u″ = −f(x) with the Dirichlet boundary conditions

u(0) = α,   u(1) = β

in the unit interval (0, 1). The points (nodes or vertices) x_i of the 1D uniform grid are given by

x_i = (i − 1)h = (i − 1)/n,   i = 1, 2, ..., n + 1,

where n is a discretization parameter, and h = 1/n is the mesh size. The standard three-point finite-difference approximation of the 1D Poisson equation leads to the discrete BVP

υ_1 = α,   (1.4a)
(υ_{i−1} − 2υ_i + υ_{i+1})/h² = −f(x_i),   i = 2, 3, ..., n,   (1.4b)
υ_{n+1} = β,   (1.4c)

where the grid function υ_i is the discrete analogue of the continuous function u(x): υ_i = u(x_i). The goal here is to compute υ_i, i = 2, ..., n.

For simplicity, we assume that f(x) = 0 and α = β = 0 (consequently, u = 0), and therefore the discrete Poisson equation becomes

υ_{i−1} − 2υ_i + υ_{i+1} = 0,   i = 2, 3, ..., n.

The Jacobi iterations are given by

υ_i^(ν+1) = (1/2)(υ_{i−1}^(ν) + υ_{i+1}^(ν)),   (1.5)

where ν is the Jacobi iteration counter. More generally, we consider the damped Jacobi method defined by introducing the parameter τ:

υ_i^(ν+1) := υ_i^(ν) + τ(υ_i^(ν+1) − υ_i^(ν)),   (1.6)

where τ is a relaxation parameter, and := denotes assignment “left equals right”. Substitution of (1.5) into (1.6) gives the recurrent formula

υ_i^(ν+1) = (1 − τ)υ_i^(ν) + (τ/2)(υ_{i−1}^(ν) + υ_{i+1}^(ν)).   (1.7)

An oscillatory starting guess

υ_i^(0) = (1 + (−1)^i)/2 = { 0 if i odd, 1 if i even }

will be used to illustrate the convergence of the damped Jacobi method depending on τ for n = 10. The simplicity of the discrete BVP (1.4) makes it possible to obtain approximations to the exact solution υ_i = 0 in analytical form:
1) First damped Jacobi iteration (ν = 1):

υ_i^(1) = (1 − τ)υ_i^(0) + (τ/2)(υ_{i−1}^(0) + υ_{i+1}^(0))
        = (1 − τ)(1 + (−1)^i)/2 + (τ/2)((1 + (−1)^{i−1})/2 + (1 + (−1)^{i+1})/2)
        = (1 − τ)(1 + (−1)^i)/2 + (τ/2)((1 − (−1)^i)/2 + (1 − (−1)^i)/2)
        = (1 − τ)(1 + (−1)^i)/2 + τ(1 − (−1)^i)/2.

The first approximation to the exact solution υ_i = 0 becomes

υ_i^(1) = { 1 − τ if i even, τ if i odd }.

2) Second damped Jacobi iteration (ν = 2 and 2 < i < n):

υ_i^(2) = (1 − τ)υ_i^(1) + (τ/2)(υ_{i−1}^(1) + υ_{i+1}^(1)) = ((1 − τ)² + τ²)(1 + (−1)^i)/2 + 2τ(1 − τ)(1 − (−1)^i)/2.

The second approximation to the solution υ_i = 0 becomes

υ_i^(2) = { (1 − τ)² + τ² if i even, 2τ(1 − τ) if i odd }.

i

The convergence of the iterations means that ‖e(ν) ‖∞ → 0 as ν → ∞ (4.1). For τ = 1 (Jacobi method), we have 0 󵄩󵄩 (0) 󵄩󵄩 (0) 󵄩󵄩e 󵄩󵄩∞ = max υi = max { i i 1

if i odd if i even

1−τ 󵄩󵄩 (1) 󵄩󵄩 (1) 󵄩󵄩e 󵄩󵄩∞ = max υi = max { i i τ

= 1,

if i even if i odd

(1 − τ)2 + τ 2 󵄩󵄩 (2) 󵄩󵄩 (2) 󵄩󵄩e 󵄩󵄩∞ = max υi = max { i i 2τ(1 − τ)

= 1,

if i even if i odd

= 1,

whereas for τ = 1/2 (damped Jacobi method), we have 󵄩󵄩 (0) 󵄩󵄩 󵄩󵄩e 󵄩󵄩∞ = 1,

󵄩󵄩 (1) 󵄩󵄩 󵄩 (2) 󵄩 󵄩󵄩e 󵄩󵄩∞ = 󵄩󵄩󵄩e 󵄩󵄩󵄩∞ = 1/2.

Obviously, during the first iterations, the Jacobi method with τ = 1/2 reduces errors much more effectively than that with τ = 1, as shown in Figure 1.1. After a few iteration steps, the error of the approximation to solution υ(ν) becomes smooth if τ = 1/2 i independently of h = 1/n. Note that the Jacobi method with τ = 1 is faster than that with τ = 1/2 for this problem (Figure 1.2). Highly efficient multigrid methods use the fast convergence of smoothers during the first iterations (independent of the mesh size h) on a coarse grid hierarchy. A smooth error term is well approximated on coarse grids. We continue with the Gauss–Seidel iterations (4.8) υ(ν+1) − 2υ(ν+1) + υ(ν) = 0, i−1 i i+1

i = 2, 3, . . . , n,

(1.8)

where ν is a counter of the Gauss–Seidel iterations. The first approximation υ(1) to the exi act solution υi = 0 becomes i 1 (1) 1 (1) 1 − (−1) (0) υ(1) = (υ + υ ) = (υ + ), i i+1 2 i−1 2 i−1 2

or

10 � I Forward to black-box software

Figure 1.1: Convergence of the Jacobi method (1.7) during the first iterations.

Figure 1.2: Convergence of the (damped) Jacobi method. 2 1 (1) 1 − (−1) (υ + )= υ(1) = 2 2 1 2 3 1 (1) 1 − (−1) υ(1) = (υ + )= 3 2 2 2 4 1 (1) 1 − (−1) υ(1) )= 4 = 2 (υ3 + 2 5 1 (1) 1 − (−1) υ(1) )= 5 = 2 (υ4 + 2 .. .

1 (0 + 0) = 0, 2

1 1 (0 + 1) = , 2 2 1 1 1 ( + 0) = , 2 2 4 1 1 5 ( + 1) = . 2 4 8


Figure 1.3: Convergence of the Gauss–Seidel method (1.8) during the first iterations.

Figure 1.3 demonstrates the zero, first, and second approximations υ_i^(ν), ν = 0, 1, 2, to the solution υ_i = 0 and the h-independent convergence during the first iterations (h = 1/n). Classical iteration methods such as damped Jacobi or Gauss–Seidel-type iterations are often called relaxation methods (or smoothing methods or smoothers) if they are used for error smoothing [35]. The Gauss–Seidel iteration (or, more generally, an appropriate smoother) on different grids gives rapid reduction of the corresponding high-frequency components of the error, and as this process covers all frequencies, a rapid reduction of the overall error can be achieved [35]. A theoretical explanation of this smoothing effect is given in [12, 35, 38]. The main purpose of this theory is to show the h-independent fast convergence of multigrid iterations. An efficient smoother is a key problem-dependent component of multigrid methods. We will consider only Gauss–Seidel-type smoothers and the possibility of black-box optimization of the smoothing procedures on multigrid structures (Section 7, Chapter II). Remember that the parallelization properties of the smoothers depend on the pattern of the coefficient matrix (Chapter III).
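The smoothing behaviour described in this section is easy to reproduce. The sketch below (not part of the book; plain Python with NumPy, written only for this illustration) applies the damped Jacobi recurrence (1.7) and the Gauss–Seidel sweep (1.8) to the oscillatory starting guess with f = 0 and α = β = 0 and prints the error norm ‖e^(ν)‖_∞ after each sweep; for τ = 1 the norm stays at 1, for τ = 1/2 it drops to 1/2 and the error becomes smooth, exactly as in Figures 1.1–1.3.

import numpy as np

def damped_jacobi(v, tau, sweeps):
    # recurrence (1.7); the boundary values v[0] and v[-1] are kept fixed
    for _ in range(sweeps):
        w = v.copy()
        w[1:-1] = (1.0 - tau) * v[1:-1] + 0.5 * tau * (v[:-2] + v[2:])
        v = w
        print(np.max(np.abs(v)))          # ||e||_inf, since the exact solution is 0
    return v

def gauss_seidel(v, sweeps):
    # lexicographic sweep (1.8): new values are used as soon as they are available
    for _ in range(sweeps):
        for i in range(1, len(v) - 1):
            v[i] = 0.5 * (v[i - 1] + v[i + 1])
        print(np.max(np.abs(v)))
    return v

n = 10
i = np.arange(1, n + 2)                   # grid indices i = 1, ..., n + 1
v0 = (1.0 + (-1.0) ** i) / 2.0            # oscillatory starting guess (zero at the boundaries)

damped_jacobi(v0.copy(), tau=1.0, sweeps=3)   # prints 1.0, 1.0, 1.0
damped_jacobi(v0.copy(), tau=0.5, sweeps=3)   # prints 0.5, 0.5, 0.5
gauss_seidel(v0.copy(), sweeps=3)             # error decreases and becomes smooth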

5 An introduction to the black-box solver

We continue our discussion with the Poisson equation since the linear analysis is particularly easy and illustrative. The 2D Poisson equation

∂²u/∂x² + ∂²u/∂y² = −f(x, y)   (1.9)

with the Dirichlet boundary conditions

u(0, y) = u(1, y) = u(x, 0) = u(x, 1) = 0

is the prototype of an elliptic boundary value problem (BVP). The points (nodes or vertices) (x_i, y_j) of the uniform grid in the unit square [0, 1] × [0, 1] are given by

x_i = (i − 1)h = (i − 1)/n,   i = 1, 2, ..., n + 1,
y_j = (j − 1)h = (j − 1)/n,   j = 1, 2, ..., n + 1,

where n is the discretization parameter, and h = 1/n is the mesh size. The standard five-point finite-difference approximation of the Poisson equation (1.9) becomes

(u_{i−1j}^h − 2u_{ij}^h + u_{i+1j}^h)/h² + (u_{ij−1}^h − 2u_{ij}^h + u_{ij+1}^h)/h² = −f(x_i, y_j),   i, j = 2, 3, ..., n,   (1.10)

where the function u_ij^h is the discrete analogue of the function u(x, y), i. e., u_ij^h = u(x_i, y_j). When a system of linear equations such as (1.10) is expressed in matrix form

Aυ = b,   (1.11)

it is implied that a correspondence between equations and unknowns exists and that an ordering of the unknowns has been chosen [14]. In writing the matrix form, if the kth unknown in the vector υ is u_ij^h, then we assume that the kth row of A is obtained from the difference equation (1.10) corresponding to the mesh point (x_i, y_j). Independent of the ordering of the unknowns u_ij^h, this correspondence between equations and unknowns determines the diagonal entries of A. The number of unknowns is N = (n − 1)² ≈ n² = h^{−2}. The natural point or block ordering of unknowns is used to write discrete BVPs in matrix form (1.11). The ordering of unknowns usually determines the sequence in which the unknowns are updated in the iterative process. For block iterative methods, blocks or groups of unknowns are updated simultaneously. Forward or lexicographic point ordering defines the components of the vector of unknowns υ as

υ_{i+(n−1)j+1−2n} = u_ij^h,   i, j = 2, 3, ..., n,

i. e., the unknowns υ_k, k = 1, 2, ..., (n − 1)², are associated with the points of a computational grid as shown in Figure 1.4. Finally, the discrete BVP (1.10) is written as a resulting system of linear algebraic equations (SLAE) (1.11) with a structured sparse (n − 1)² × (n − 1)² coefficient matrix A. The five-point finite-difference approximation of the Laplace operator on the left-hand side of (1.10) means that the matrix A has a regular sparsity pattern consisting of five nonzero diagonals. There are two classes of methods for solving SLAEs. In direct methods the exact solution of a SLAE (1.11) can be obtained by performing a finite number of arithmetic operations in the absence of round-off errors (Section 2, Chapter IV). The strategy of Gaussian elimination is to reduce the full linear system (1.11) to a triangular system using elementary row operations.


Figure 1.4: Lexicographic ordering of the unknowns υ_k, k = 1, 2, ..., (n − 1)², (•) (n = 5).
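A minimal sketch of how the lexicographically ordered SLAE (1.11) can be assembled for the five-point scheme (1.10) is given below (Python with NumPy; not part of the book, and the dense storage and all names are assumptions made only for this small illustration; a real code would store only the five nonzero diagonals):

import numpy as np

def poisson_2d_slae(n, f):
    # Assemble A and b for the five-point scheme (1.10) with the lexicographic
    # point ordering: the unknown u_ij (i, j = 2, ..., n) gets the 1-based index
    # k = i + (n - 1) j + 1 - 2n, i.e. the 0-based index (i - 2) + (n - 1)(j - 2).
    h = 1.0 / n
    N = (n - 1) ** 2
    A = np.zeros((N, N))
    b = np.zeros(N)
    idx = lambda i, j: (i - 2) + (n - 1) * (j - 2)
    for j in range(2, n + 1):
        for i in range(2, n + 1):
            row = idx(i, j)
            A[row, row] = -4.0 / h**2
            for ii, jj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 2 <= ii <= n and 2 <= jj <= n:
                    A[row, idx(ii, jj)] = 1.0 / h**2
                # neighbours on the boundary carry u = 0 and drop out of the row
            b[row] = -f((i - 1) * h, (j - 1) * h)
    return A, b

A, b = poisson_2d_slae(n=5, f=lambda x, y: 1.0)   # the 16 x 16 matrix of Figure 1.4
u = np.linalg.solve(A, b)                          # direct solution, O(N^3) ao

Printing the sparsity pattern of A (for example with print((A != 0).astype(int))) shows the five nonzero diagonals mentioned above.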

In general, the Gaussian elimination does not exploit the pattern of the coefficient matrix A. A major drawback of the direct methods is the impractical amount of computations. We define the algorithmic complexity 𝒲 as the total number of arithmetic operations needed to solve a given SLAE by the selected algorithm, i. e., 𝒲 = 𝒲(N) arithmetic operations (ao), where N is the number of unknowns. In general, the number of arithmetic operations needed to solve the SLAE (1.11) by the direct methods is proportional to N³, i. e., 𝒲 = O(N³) ao. From a practical point of view, one of our goals is to develop close-to-optimal algorithms for many PDEs, where the number of arithmetic operations needed to solve a discrete problem is proportional to N log N for the problem considered. In contrast to direct methods, iterative methods rarely produce the exact answer after a finite number of steps but decrease the error by some fraction after each step (Section 2, Chapter IV). The Gauss–Seidel method begins with the decomposition of the coefficient matrix A,

A = L + D + U,

in which D is the diagonal of A, and L and U are its strict lower and upper parts, respectively [28]. The Gauss–Seidel iterations are defined by

(L + D)(υ^(ν+1) − υ^(ν)) = b − Aυ^(ν),   ν = 0, 1, 2, ...,   (1.12)

where ν is the Gauss–Seidel iteration counter. The convergence of these iterations means Aυ^(ν) → b, i. e., υ^(ν) → A^{−1}b as ν → +∞.

For the above-mentioned lexicographic point ordering of the unknowns u_ij^h, the Gauss–Seidel iteration for (1.10) can be represented as

do j = 2, n
   do i = 2, n
      u_ij^h := (1/4)(u_{i−1j}^h + u_{i+1j}^h + u_{ij−1}^h + u_{ij+1}^h) + (h²/4) f(x_i, y_j)
   end do
end do

where := denotes assignment “left equals right”. Appropriately applied to many discrete elliptic problems, the Gauss–Seidel method has a strong smoothing effect on the error of any approximation, because this iterative solver reduces nonsmooth (or rough) parts of the error with a small number of iterations (independent of the mesh size h = 1/n) (Section 4, Chapter I) [35]. As a result, the multigrid methods use this relaxation approach for error smoothing. The algorithmic complexity of the Gauss–Seidel method strongly depends on the ordering of equations and unknowns in many applications. Also, the possibilities of vectorized and parallel computing depend strongly on this ordering. For theoretical analysis, we consider a block ordering, where the unknowns u_ij^h, u_{i+1j}^h, u_{ij+1}^h, and u_{i+1j+1}^h form a 2 × 2 block as shown in Figure 1.5.

Figure 1.5: Block 2 × 2 with the unknowns u_ij^h, u_{i+1j}^h, u_{ij+1}^h, and u_{i+1j+1}^h.

Such block ordering is used for the coupled solving of systems of PDEs (so-called Vanka-type iterations [36]). Each discrete BVP can be written in the general form

a_ij^w u_{i−1j}^h + a_ij^e u_{i+1j}^h + a_ij^s u_{ij−1}^h + a_ij^n u_{ij+1}^h + a_ij^p u_ij^h = b_ij,

where the superscripts w, e, s, n, and p denote the west, east, south, north, and pole vertex of the five-point stencil, respectively. The idea of block relaxation is to solve not only all the equations at one grid point collectively, but all the equations at a set of grid points (block). These blocks may or may not be overlapping [35]. Iterations of the block Gauss–Seidel method are based on the solution of systems of linear algebraic equations associated with each block of the unknowns. This means that all the unknowns forming the block are updated collectively. Formation of the SLAE associated with each 2 × 2 block of the unknowns (Figure 1.5)

( a_ij^p        a_ij^e         a_ij^n         0             ) ( u_ij^h       )   ( b_1 )
( a_{i+1j}^w    a_{i+1j}^p     0              a_{i+1j}^n    ) ( u_{i+1j}^h   ) = ( b_2 )
( a_{ij+1}^s    0              a_{ij+1}^p     a_{ij+1}^e    ) ( u_{ij+1}^h   )   ( b_3 )
( 0             a_{i+1j+1}^s   a_{i+1j+1}^w   a_{i+1j+1}^p  ) ( u_{i+1j+1}^h )   ( b_4 )

where

b_1 = b_ij − a_ij^w u_{i−1j}^h − a_ij^s u_{ij−1}^h,
b_2 = b_{i+1j} − a_{i+1j}^e u_{i+2j}^h − a_{i+1j}^s u_{i+1j−1}^h,
b_3 = b_{ij+1} − a_{ij+1}^w u_{i−1j+1}^h − a_{ij+1}^n u_{ij+2}^h,
b_4 = b_{i+1j+1} − a_{i+1j+1}^e u_{i+2j+1}^h − a_{i+1j+1}^n u_{i+1j+2}^h,

has been demonstrated in [21]. Solution of the SLAE by the Gaussian elimination with (partial) pivoting gives the updated values u_ij^h, u_{i+1j}^h, u_{ij+1}^h, and u_{i+1j+1}^h forming the block of unknowns. The Gauss–Seidel method with the block ordering of unknowns depends weakly on the stencil used for discretization of the differential operators. This block Gauss–Seidel method will serve as a model Vanka-type iterative solver for systems of PDEs for the experimental study of the execution time (Section 6, Chapter I). For estimating the algorithmic complexity, we assume that a block ordering of the unknowns is used, i. e., the number of unknowns becomes N = n_b N_b, where n_b is the number of blocks, N_b is the number of unknowns forming each block, and N ≈ n^d, d = 2, 3. The computational cost of each block Gauss–Seidel iteration is

𝒲_1 = C n_b N_b³ = C n_b^{−2} N³

arithmetic operations (ao). The number of iterations is estimated as

Θ = n^ϰ = N^{ϰ/d},

where the parameter ϰ depends on the condition number of the coefficient matrix A (Section 2, Chapter IV), and d = 2, 3. Then the algorithmic complexity of the block Gauss–Seidel method (4.8) becomes

𝒲 = Θ 𝒲_1 = C n_b^{−2} N^{3+ϰ/d} ao.   (1.13)

Use of the uniform grid in the above-mentioned linear analysis makes it possible to obtain expression (1.13) for estimating the computational work. If n_b = 1, then the block Gauss–Seidel iteration coincides with the Gaussian elimination: ϰ = 0 ⇒ 𝒲 = CN³ ao, i. e., the complexity 𝒲 is ϰ-independent. The Gaussian elimination is implemented without problem-dependent components, but its large complexity (𝒲 = O(N³) ao) allows this direct method to be used only for solving small SLAEs. The point ordering of unknowns corresponds to n_b = N. In this case the algorithmic complexity of the Gauss–Seidel method becomes

𝒲 = C N^{1+ϰ/d} ao.

It is expected that the point Gauss–Seidel method will be faster than the Gaussian elimination for sufficiently large N, i. e., for ϰ < 2d. The parameter ϰ depends on the coefficient matrix A: ϰ → 0 for well-conditioned problems, and then the point Gauss–Seidel method has almost optimal algorithmic complexity, i. e., 𝒲 → CN ao. As a rule, it is not useful to accelerate a highly efficient solver. The extra effort does not pay off. Note that the coefficient matrix of the resulting SLAE (1.11) has a deteriorating condition number cond(A) = O(h^{−2}) as h → 0 (Section 2, Chapter IV), i. e., A is an ill-conditioned matrix. Thus the simplest problem of constructing a black-box iterative algorithm for solving linear (initial-)boundary value problems on a uniform grid can be formulated as follows:
1) If A in (1.11) is a well-conditioned coefficient matrix (ϰ → 0), then the black-box iterative algorithm must coincide with the Gauss–Seidel method.
2) If A in (1.11) is an ill-conditioned coefficient matrix (0 ≤ ϰ < 2d), then it is necessary to add the lowest number of problem-dependent components to the Gauss–Seidel method in order to:
   a) Reduce the algorithmic complexity (1.13) down to a close-to-optimal value

      𝒲 = C n_b^{−2} N³ log N ao,   1 ≪ n_b ≤ N,   (1.14)

      in sequential implementation.
   b) Ensure that the parallel algorithm is faster than the fastest sequential solver.
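To give (1.13) and (1.14) a rough numerical feel (the value of ϰ and all numbers below are chosen only for the sake of this example): for a 2D Poisson-type problem (d = 2) the condition number behaves like cond(A) = O(h^{−2}), and the Gauss–Seidel iteration count grows like n², i. e., ϰ ≈ 2. For N = 10⁶ unknowns and point-wise blocks (n_b = N), estimate (1.13) gives 𝒲 = CN^{1+ϰ/d} = CN² ≈ C · 10¹² ao, Gaussian elimination would need about CN³ = C · 10¹⁸ ao, while the close-to-optimal target (1.14) is of the order of CN log N ≈ C · 10⁷ ao. Closing the gap between the first and the last of these figures with the least number of problem-dependent components is exactly the task assigned to the multigrid correction of Chapter II.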


In general, the development of a black-box algorithm is more difficult than that for solving a linear BVP on a uniform grid. We summarize these considerations as follows:
– As a rule, systems of nonlinear strongly coupled PDEs in complex domains (multiphysics simulation) need to be solved in a (de)coupled manner for industrial applications, so a theoretical analysis of the algorithmic complexity such as (1.13) becomes more difficult.
– The simplicity of Gauss–Seidel iterations makes this algorithm attractive for smoothing in low-memory sequential or parallel multigrid. For real-life applications, it is far from trivial to choose optimal components uniformly for a large class of problems. In many cases, the Krylov subspace methods may have advantages.
Therefore each iterative algorithm for the numerical solution of nonlinear (initial-)boundary value problems has at least three problem-dependent components: the ordering of unknowns, (de)coupled iterations for a locally/globally linearized discrete problem, and a stopping criterion. In Chapters II and III, we will return to the estimation of the algorithmic complexity (1.14) in a parallel algorithm analysis and in solving well-conditioned problems. As a result, a black-box solver requires black-box optimization (i. e., the optimal choice of the problem-dependent components for the given problem without user control).

Definition I.2. A black-box solver is a black-box-optimized robust algorithm.

Black-box optimization will be considered in Section 7, Chapter II. In general, the problem of constructing a black-box algorithm for solving nonlinear (initial-)boundary value problems remains unchanged:
1) If the sequential Newton–Gauss–Seidel iterations converge slowly, then the convergence should be accelerated up to a close-to-optimal value using the least number of extra problem-dependent components.
2) The parallel nonlinear algorithm should be faster than the fastest sequential algorithm.
The development of black-box solvers for multidisciplinary applications based on the solution of the “robustness–efficiency–parallelism” problem is a new challenge for scientific computing. Obviously, it is impossible to construct a highly parallel optimal solver without problem-dependent components. Therefore we are forced to reduce the mutually exclusive requirements for the desired algorithm: “the least number of extra problem-dependent components” instead of “without problem-dependent components” and “close-to-optimal complexity” instead of “optimal complexity”. This will lead to different approaches to constructing a solver for black-box software; we expect that all approaches will be fruitful, but no single best code will emerge.


6 Segregated and coupled solvers for systems

There are two main methods for generalizing iterative algorithms to systems of PDEs. We consider the application of the Gauss–Seidel iterations to solve the model system of PDEs

Δu^(ς) − α(N_M − 1)u^(ς) + α Σ_{m=1, m≠ς}^{N_M} u^(m) = −f,   ς = 1, 2, ..., N_M,   (1.15)

in the unit cube Ω = (0, 1)³, where Δ is the Laplace operator

Δ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²,   (1.16)

N_M is the number of PDEs, and α ⩾ 0 is a parameter. Assuming that the exact solution of (1.15) is given by

u_a^(ς)(x, y, z) = exp(x + y + z),   ς = 1, 2, ..., N_M,   (1.17)

substitution of (1.17) into (1.15) gives the right-hand side function f(x, y, z) = −3 exp(x + y + z) and the Dirichlet boundary conditions. If α → 0, then system (1.15) is quite close to N_M independent equations Δu^(ς) = −f, so that some iterative solver efficient for the Poisson equation Δu = −f will be efficient for (1.15). It means that all equations of system (1.15) are weakly coupled. As an example, we describe segregated (decoupled) point Gauss–Seidel iterations. If α ≫ 0, then the equations of system (1.15) are strongly coupled, and all grid functions should be updated collectively. A characteristic feature of the model system (1.15) is that the solution (1.17) is independent of the parameter α. In other words, α and the mesh size h strongly affect the convergence rate of the iterative algorithms, but the accuracy of the numerical solution is α-independent. In practice, two main methods are used for solving systems of (integro-)differential equations. In segregated (decoupled) algorithms, each ςth equation of the system is solved by some numerical method to update the approximation to the solution u^(ς) with fixed (i. e., taken from the previous iteration) values of the other functions u^(m), m ≠ ς. The standard seven-point approximation of system (1.15) becomes

(û_{i−1jk}^(ς) − 2û_{ijk}^(ς) + û_{i+1jk}^(ς))/h² + (û_{ij−1k}^(ς) − 2û_{ijk}^(ς) + û_{ij+1k}^(ς))/h² + (û_{ijk−1}^(ς) − 2û_{ijk}^(ς) + û_{ijk+1}^(ς))/h² − α(N_M − 1)û_{ijk}^(ς) + α Σ_{m=1, m≠ς}^{N_M} û_{ijk}^(m) = −f(x_i, y_j, z_k),   (1.18)


where the grid function û^(ς) is a discrete analogue of the function u^(ς), i. e., û_{ijk}^(ς) = u^(ς)(x_i, y_j, z_k), and h = 1/n is the mesh size of the uniform grid. We rewrite (1.18) as

(û_{i−1jk}^(ς) + û_{i+1jk}^(ς))/h² + (û_{ij−1k}^(ς) + û_{ij+1k}^(ς))/h² + (û_{ijk−1}^(ς) + û_{ijk+1}^(ς))/h² − (6/h² + α(N_M − 1)) û_{ijk}^(ς) = −f(x_i, y_j, z_k) − α Σ_{m=1, m≠ς}^{N_M} û_{ijk}^(m).

Then the point Gauss–Seidel iteration of the segregated algorithm can be represented as follows:

do ς = 1, N_M
   do k = 2, n
      do j = 2, n
         do i = 2, n
            û_{ijk}^(ς) := (1/D) ( f(x_i, y_j, z_k) + α Σ_{m=1, m≠ς}^{N_M} û_{ijk}^(m)
                                   + (û_{i−1jk}^(ς) + û_{i+1jk}^(ς))/h²
                                   + (û_{ij−1k}^(ς) + û_{ij+1k}^(ς))/h²
                                   + (û_{ijk−1}^(ς) + û_{ijk+1}^(ς))/h² )
         end do
      end do
   end do
end do

where := denotes assignment “left equals right”, and

D = 6/h² + α(N_M − 1).   (1.19)
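A direct Python transcription of the segregated sweep above is sketched next (not part of the book; NumPy with explicit loops for readability, and the storage layout u[ς, i, j, k] is an assumption made only for this illustration; the boundary layers of u are supposed to hold the Dirichlet data):

import numpy as np

def segregated_sweep(u, f, alpha, h):
    # One segregated point Gauss-Seidel sweep for the discrete system (1.18).
    # u has shape (N_M, n + 1, n + 1, n + 1); index 0 corresponds to x = 0.
    N_M = u.shape[0]
    n = u.shape[1] - 1
    D = 6.0 / h**2 + alpha * (N_M - 1)            # definition (1.19)
    for s in range(N_M):                           # equation counter (corresponds to ς)
        for k in range(1, n):
            for j in range(1, n):
                for i in range(1, n):
                    coupling = alpha * (u[:, i, j, k].sum() - u[s, i, j, k])
                    u[s, i, j, k] = (f(i * h, j * h, k * h) + coupling
                                     + (u[s, i - 1, j, k] + u[s, i + 1, j, k]) / h**2
                                     + (u[s, i, j - 1, k] + u[s, i, j + 1, k]) / h**2
                                     + (u[s, i, j, k - 1] + u[s, i, j, k + 1]) / h**2) / D
    return u

For the model problem one would call it with f = lambda x, y, z: -3.0 * np.exp(x + y + z) and with the boundary layers of u filled with exp(x + y + z); ten such sweeps per equation form one segregated iteration of the experiment described below.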

It is expected that the segregated algorithms work fine for solving the weakly coupled discrete system (1.18), i. e., if 6 ≫ α(N_M − 1)h². A natural extension of the point Gauss–Seidel method is collective iterations in the system case. In coupled algorithms the block ordering of the unknowns makes it possible to take into account the relationship of the individual equations in the system. We rewrite (1.18) as

−D û_{ijk}^(ς) + α Σ_{m=1, m≠ς}^{N_M} û_{ijk}^(m) = −f(x_i, y_j, z_k) − (û_{i−1jk}^(ς) + û_{i+1jk}^(ς))/h² − (û_{ij−1k}^(ς) + û_{ij+1k}^(ς))/h² − (û_{ijk−1}^(ς) + û_{ijk+1}^(ς))/h²

with the parameter D (1.19). Furthermore, we form the vector of unknowns u = (û_{ijk}^(1) û_{ijk}^(2) ... û_{ijk}^(N_M))^T and the right-hand side vector b = (b_1 b_2 ... b_{N_M})^T with components

b_ς = −f(x_i, y_j, z_k) − (û_{i−1jk}^(ς) + û_{i+1jk}^(ς))/h² − (û_{ij−1k}^(ς) + û_{ij+1k}^(ς))/h² − (û_{ijk−1}^(ς) + û_{ijk+1}^(ς))/h²,   ς = 1, 2, ..., N_M.

Then the SLAE associated with each grid point (x_i, y_j, z_k) becomes

( −D    α     α    ⋅⋅⋅    α  )  ( û_{ijk}^(1)    )   ( b_1     )
(  α   −D     α    ⋅⋅⋅    α  )  ( û_{ijk}^(2)    )   ( b_2     )
(  α    α    −D    ⋅⋅⋅    α  )  ( û_{ijk}^(3)    ) = ( b_3     )   (1.20)
(  ⋮    ⋮     ⋮     ⋱     ⋮  )  (      ⋮         )   (   ⋮     )
(  α    α     α    ⋅⋅⋅   −D  )  ( û_{ijk}^(N_M)  )   ( b_{N_M} )

All unknowns û_{ijk}^(ς), ς = 1, 2, ..., N_M, in each grid point are relaxed simultaneously by solving (1.20) using Gaussian elimination (if the number of equations in (1.20) is relatively small: N_M ≲ 30). The solution of the SLAE (1.20) associated with each grid point (x_i, y_j, z_k) defines the iteration of the coupled algorithm. To illustrate these (de)coupled iterations, we solve the system of PDEs (1.15) (N_M = 20) for the standard seven-point discretization (1.18) on the 61 × 61 × 61 uniform grid. The values û^(ς) = ς, ς = 1, 2, ..., N_M, are taken to be a starting guess. The segregated iteration consists of ten Gauss–Seidel iterations applied to each equation of system (1.15). The relative Chebyshev (max) norm (4.1) of the residual vector

‖r‖_∞ = max_{ς=1,2,...,N_M} max_{ijk} |r_{ijk}^(ς)| / min_{ς=1,2,...,N_M} max_{ijk} |r̃_{ijk}^(ς)|,

where

r_{ijk}^(ς) = −f(x_i, y_j, z_k) − (û_{i−1jk}^(ς) − 2û_{ijk}^(ς) + û_{i+1jk}^(ς))/h² − (û_{ij−1k}^(ς) − 2û_{ijk}^(ς) + û_{ij+1k}^(ς))/h² − (û_{ijk−1}^(ς) − 2û_{ijk}^(ς) + û_{ijk+1}^(ς))/h² + α(N_M − 1)û_{ijk}^(ς) − α Σ_{m=1, m≠ς}^{N_M} û_{ijk}^(m),

r̃_{ijk}^(ς) is the residual of the starting guess û^(ς) = ς, ς = 1, 2, ..., N_M, and the Chebyshev norm (4.1) of the error vector

‖e‖_∞ = max_{ς=1,2,...,N_M} max_{ijk} |û_{ijk}^(ς) − exp(x_i + y_j + z_k)|

are taken as convergence criteria.

6 Segregated and coupled solvers for systems

� 21

Figure 1.6 (left) illustrates the convergence of the segregated and coupled iterations: the Chebyshev norms of the residual vectors ‖r DS ‖∞ and the error vectors ‖eDS ‖∞ after 1000 segregated iterations (Decoupled Solver – DS) for various values of α. It is easy to see that the residual vector norm ‖r DS ‖∞ reduces slowly for α ≳ 300, since the equations of system (1.15) are strongly coupled. In addition, the figure represents similar convergence criteria of the coupled iterations (Coupled Solver – CS). The results of the computational experiment show that the segregated algorithm is preferable for α ≲ 700, whereas the coupled algorithm is preferable for α ≳ 700. Figure 1.6 (right) shows the convergence of the segregated and coupled algorithms for α = 0 and α = 2000. Since the algorithmic complexity of (de)coupled iterations is different, execution time is plotted on the x-axis.

Figure 1.6: Convergence of the segregated and coupled iterations.

The main practical result of the above-mentioned computational experiment is that the main components of an iterative solver (ordering of unknown, choice of (de)coupled iterations, and stopping criterion) are, in general, no longer known in advance but have to be obtained from the black-box optimization of the algorithm (Section 7, Сhapter II). The problems of computational fluid dynamics (CFD) is an example for which the difference of segregated and coupled algorithms becomes clear. Large nonlinear systems of the saddle point type arise in a wide variety of applications throughout computational science and engineering. Because of their indefiniteness, such nonlinear systems represent a significant challenge for the developers of solvers [3].

22 � I Forward to black-box software The incompressible Navier–Stokes equations for a constant property flow without body forces are given by [4, 6, 33]: a) X-momentum 2 𝜕p 𝜕u 𝜕(u ) 𝜕(uυ) 𝜕(uw) 1 𝜕2 u 𝜕2 u 𝜕2 u + + + = − + ( 2 + 2 + 2 ). 𝜕t 𝜕x 𝜕y 𝜕z 𝜕x Re 𝜕x 𝜕y 𝜕z

b) Y -momentum 2 𝜕p 1 𝜕2 υ 𝜕2 υ 𝜕2 υ 𝜕υ 𝜕(uυ) 𝜕(υ ) 𝜕(υw) + + + = − + ( 2 + 2 + 2 ). 𝜕t 𝜕x 𝜕y 𝜕z 𝜕y Re 𝜕x 𝜕y 𝜕z

c)

Z-momentum 2 𝜕p 1 𝜕2 w 𝜕2 w 𝜕2 w 𝜕w 𝜕(wu) 𝜕(wυ) 𝜕(w ) + + + = − + ( 2 + 2 + 2 ). 𝜕t 𝜕x 𝜕y 𝜕z 𝜕z Re 𝜕x 𝜕y 𝜕z

d) Continuity equation 𝜕u 𝜕υ 𝜕w + + = 0. 𝜕x 𝜕y 𝜕z Here 3D incompressible Navier–Stokes equations are given in conservative form using the velocities u, υ, w and the pressure p as primary variables (the so-called primitive variable formulation). The common practice is to write these equations in a nondimensional form using dimensionless quantities obtained through the use of proper characteristic scales. The use of nondimensional variables allows for the reduction of the number of appropriate parameters for the problem being considered. The Reynolds number Re may be interpreted as a measure of the relative importance of advection (inertia) and diffusion (viscous) momentum fluxes. We will not discuss the development of stable (non)staggered discretizations, stable iteration schemes, and other algorithm components but focus on some practical aspects of pressure p computations. The main problem of numerical solution of the incompressible Navier–Stokes equations is the lack of an equation for computing the pressure. The momentum equations are used to compute velocity components, and the pressure is determined in such a way that the velocity components satisfy the continuity equation. Segregated algorithms are iterative solvers, which treat the momentum equations and a “pressure equation” separately in an outer iteration. Within this iteration, the pressure is updated using a Poisson-type equation with dubious boundary conditions. For simplicity, to avoid formal complications, we use a uniform grid hs = hx = hy = hz , where the mesh sizes satisfy the limitation 1 1 ht < hs < . 2 Re

(1.21)

The condition hs < 2 Re−1 allows using central differences to approximate convective terms, and in the case 2ht < hs the nonstationary terms in the momentum equations are dominant.

6 Segregated and coupled solvers for systems

� 23

First, for convenience, we rescale the pressure p=2

hs p,̃ ht

where p̃ is the rescaled pressure. Let us use the Crank–Nicolson scheme to approximate the nonstationary 3D Navier– Stokes equations on a uniform staggered grid [27]: a) a discrete analogue of the X-momentum (n+1) uijk = −p̃ (n+1) + p̃ (n+1) + Â ijk , ijk i−1jk

(1.22)

where (n) Â ijk = uijk + A(n) + A(n+1) − p̃ (n) + p̃ (n) , ijk ijk ijk i−1jk

ht 𝜕(u2 ) 𝜕(uυ) 𝜕(uw) h 1 𝜕2 u 𝜕2 u 𝜕2 u ( + + ) + t ( 2+ 2+ 2) , 2 𝜕x 𝜕y 𝜕z ijk 2 Re 𝜕x 𝜕y 𝜕z ijk (m)

A(m) =− ijk

(m)

m = n, n + 1,

b) a discrete analogue of the Y -momentum υ(n+1) = −p̃ (n+1) + p̃ (n+1) + B̂ ijk , ijk ijk ij−1k

(1.23)

where B̂ ijk = υ(n) + B(n) + B(n+1) − p̃ (n) + p̃ (n) , ijk ijk ijk ijk ij−1k h 1 𝜕2 υ 𝜕2 υ 𝜕2 υ ht 𝜕(uυ) 𝜕(υ2 ) 𝜕(wυ) ( + + ) + t ( 2+ 2+ 2) , 2 𝜕x 𝜕y 𝜕z ijk 2 Re 𝜕x 𝜕y 𝜕z ijk (m)

B(m) =− ijk c)

(m)

m = n, n + 1,

a discrete analogue of the Z-momentum (n+1) wijk = −p̃ (n+1) + p̃ (n+1) + Ĉ ijk , ijk ijk−1

(1.24)

where (n) Ĉ ijk = wijk + C(n) + C(n+1) − p̃ (n) + p̃ (n) , ijk ijk ijk ijk−1

h 1 𝜕2 w 𝜕2 w 𝜕2 w ht 𝜕(uw) 𝜕(υw) 𝜕(w2 ) ( + + ) + t ( 2+ 2+ 2) , 2 𝜕x 𝜕y 𝜕z ijk 2 Re 𝜕x 𝜕y 𝜕z ijk (m)

C(m) =− ijk

A discrete analogue of the continuity equation (n+1) (n+1) (n) (n) ui+1jk − uijk 1 ui+1jk − uijk ( + ) 2 hs hs

(m)

m = n, n + 1.

24 � I Forward to black-box software (n+1) (n+1) υ(n) − υ(n) 1 υij+1k − υijk ij+1k ijk + ) + ( 2 hs hs

(n+1) (n+1) (n) (n) wijk+1 − wijk 1 wijk+1 − wijk + ( + )=0 2 hs hs

can be rewritten as (n+1) (n+1) (n+1) (n+1) − ui+1jk + uijk − υ(n+1) + υ(n+1) − wijk+1 + wijk = 0. ij+1k ijk

(1.25)

Assuming that the discrete analogues of the velocity components exactly satisfy the discrete analogue of the continuity equation at each time step n, we have (n) (n) (n) (n) ui+1jk − uijk + υ(n) − υ(n) + wijk+1 − wijk = 0. ij+1k ijk

Equations (1.22), (1.23), (1.24), and (1.25) can be rewritten in the matrix form

1 0 (0 ( (0 ( (0 0 (1

0 1 0 0 0 0 −1

0 0 1 0 0 0 1

0 0 0 1 0 0 −1

0 0 0 0 1 0 1

0 0 0 0 0 1 −1

(n+1) uijk

p̃ (n+1) + Â ijk i−1jk

(n+1) 1 ui+1jk −p̃ (n+1) + Â i+1jk i+1jk ) ( ) ( −1 ) ( (n+1) ) ( (n+1) ( υijk ) ( p̃ ij−1k + B̂ ijk ) ) ) ( 1 )( ) ) ( (n+1) ) ( (n+1) ) ) = (−p̃ (υ ̂ −1) + B ij+1k ) . ) ( ij+1k ) ( ij+1k ) ( ) 1 )( ) (w(n+1) ) ( p̃ (n+1) + Ĉ ( ijk ) ( ijk−1 ijk ) ) ) ( ( −1 (n+1) wijk+1 + Ĉ ijk+1 −p̃ (n+1) 0) ijk+1

( p̃ ijk

(n+1)

)

(

0

(1.26)

)

(n+1) (n+1) (n+1) (n+1) (n+1) (n+1) Here uijk , ui+1jk , υijk , υij+1k , wijk , and wijk+1 are discrete analogues of the velocity components on faces of the finite volume used for the approximation of the continuity equation, and p̃ (n+1) is a discrete analogue of the pressure in this finite volume center. ijk The basic idea of block relaxation is to solve the discrete Navier–Stokes equations locally cell by cell involving all the discrete equations located in the cell [36]. The exact solution of the resulting SLAE (1.26) is (n+1) uijk (n+1) ui+1jk

( ) ( (n+1) ) ( υijk ) ( ) ( ( (n+1) ) 1 ( (υ )= ( ( ij+1k ) 6 ( ( (n+1) ) ( (w ) ( ijk ) ( ) (n+1) wijk+1 ( (n+1) ̃ p ( ijk )

5 1 −1 1 −1 1 1

1 5 1 −1 1 −1 −1

−1 1 5 1 −1 1 1

1 −1 1 5 1 −1 −1

−1 1 −1 1 5 1 1

1 −1 1 −1 1 5 −1

p̃ (n+1) + Â ijk i−1jk −p̃ (n+1) + Â i+1jk ( i+1jk ) ) (n+1) )( ̂ ( ) ̃ p + B ) ( ij−1k ijk ) )( ), ) ( (n+1) ) (−p̃ ij+1k + B̂ ij+1k ) ) ( ) (n+1) p̃ + Ĉ )

ijk−1 (n+1) ̃ ( −pijk+1

ijk

+ Ĉ ijk+1 )

7 Nonlinear two-grid algorithm

� 25

i. e., all unknowns are relaxed simultaneously. As compared to the separate algorithms, the coupled Vanka-type solvers (special block Gauss–Seidel iterations) do not use a Poisson-type equation with dubious boundary conditions, i. e., the Navier–Stokes equations are solved in the original form. Since such ordering of unknowns for the Vanka-type iterations needs a geometric input caused by assignment of discrete velocity components and pressure to the finite volume, the coupled algorithms are difficult to use in algebraic multigrid solvers (AMG). The simplicity of these coupled iterations (1.26) is the result of condition (1.21), and generally the coefficient matrix in (1.26) is dense. The general form of the coupled iterations for solving the Navier–Stokes equations is considered in [21]. These block Gauss–Seidel iterations are slowly convergent due to the ellipticparabolic nature of nonstationary incompressible Navier–Stokes equations, so convergence acceleration techniques are needed. It should be emphasized that the more robust coupled iterations are much more expensive in computational sense than the segregated ones. Conclusion. Each iterative algorithm has the problem-dependent components. The optimal choice of the ordering of unknowns, iterations, and their (de)coupled implementations and others is not known in advance. All problem-dependent components should be defined automatically during the solution process without the user’s control (Section 7, Сhapter II).

7 Nonlinear two-grid algorithm In this section, we consider the (non)linear elliptic NM × NM system of PDEs 𝒩 (u) = f

(1.27a)

on a domain Ω ∈ ℝd together with a set of appropriate boundary conditions 𝒩𝜕Ω (u) = f 𝜕Ω

(1.27b)

at the domain boundary 𝜕Ω. Here 𝒩 is a nonlinear elliptic operator, f is a known function, and u = (u1 u2 . . . uN )T is the desired solution. M A discrete analogue of system (1.27a) with eliminated boundary conditions (1.27b) can formally be written in the form h

h

𝒩 (u)̂ = f ,

where û is the discrete approximation to the solution u. General nonlinear iterations can be represented in the form W (s) (û

(s+1)

− û ) = f h − 𝒩 h (û ), (s)

(s)

s = 0, 1, 2, . . . ,

(1.28)

26 � I Forward to black-box software (s) where the matrix W (s) (û ) defines a basic nonlinear algorithm. Further, we will assume that this iterative solver is a smoother, i. e., it reduces the nonsmooth (or rough) part of the error with a small number of iterations. As before, if the nonlinear solver converges slowly, then we try to accelerate the convergence using the lowest number of extra problem-dependent components. For this given purpose, the solution u of (1.27) can be represented as the sum of two functions

u = û + c,

(1.29)

where û is the known approximation to the solution, and c is аn unknown correction. This representation is called a Σ-modification of the solution u, and substitution of (1.29) into (1.27) leads to the Σ-modified system 𝒩 (û + c) = f

(1.30a)

together with a set of appropriate Σ-modified boundary conditions 𝒩𝜕Ω (û + c) = f 𝜕Ω .

(1.30b)

We abbreviate the discrete analogue of system (1.30a) with eliminated boundary conditions (1.30b) as h

h

h

h

𝒩 (û + c ) = f .

Our goal is to determine the correction ch , i. e., the difference of the exact solution uh h h and known approximation to the exact solution û since ch = uh − û . Let a computational grid GO be generated in the domain Ω ∪ 𝜕Ω, where NG is O the number of grid points. Figures 4.6, 4.5 and 4.2 show possible unstructured, blockstructured and structured grid GO . Further, the subscript “O” (original) will denote affilh iation to the original grid GO . Omitting the superscript h in û and ch , some discretization of system (1.30) on the grid GO can be denoted as h

h

𝒩O (û O + cO ) = f O ,

(1.31)

where 𝒩Oh , û O , cO , and f hO are discrete analogues of 𝒩 , u,̂ c, and f on the original grid GO . Unstructured grids become quite popular because unstructured automatic mesh generation is much easier than the generation of (block-)structured grids for very complicated 3D domains. One fundamental difference between the structured and unstructured meshes is that the latter has the provision of a variable number of neighboring cell vertices, unlike the former, which has a fixed number of neighboring cell vertices for all internal cells. From the multigrid point of view, unstructured grids are a complication. For a given unstructured grid, it is usually easy to define a sequence of finer grids, but it may be difficult to define a sequence of reasonable coarser grids [35]. The algebraic multigrid method (AMG) constructs a hierarchy of coarse grids automatically and

7 Nonlinear two-grid algorithm

� 27

is thus particularly well suited for problems on unstructured grids. However, constructing AMG components with desirable complementary properties can be complicated for strongly coupled nonlinear PDE systems. Our goal is to construct a two-grid algorithm with the lowest number of problemdependent components for numerical solving (1.27) starting with (non)linear BVP (NM = 1) up to a system of strongly coupled nonlinear PDEs (NM > 1). The two-grid algorithm will be based on the auxiliary space method (ASM). The ASM is a (non)nested two-level preconditioning technique based on a simple relaxation scheme (smoother) and an auxiliary space (here a structured grid is the auxiliary space). The basic idea of the ASM is to use an auxiliary nonlinear problem in an auxiliary space, where it is simpler to solve [39]. The solution of the auxiliary problem (auxiliary grid correction) is then transferred back to the original space. The mismatch between the auxiliary and original spaces is corrected by a few smoothing iterations. Theoretical analysis of linear ASM is given in Section 4, Сhapter IV. Let an auxiliary computational grid GA be generated, where NG (≈ NG ) is the A 0 number of auxiliary grid points. Figures 4.1, 4.5 and 4.2 show possible Cartesian, blockstructured and structured grid GA . Further, the subscript “A” (auxiliary) will denote affiliation to the auxiliary grid GA . The second-order finite volume discretization of (1.27) on GA becomes h

h

𝒩A (û A + cA ) = f A .

(1.32)

Equations (1.31) and (1.32) are independent of each other, and therefore we rewrite these equations in the form h

h

h

h

𝒩O (û O + cO ) − 𝒩O (û O ) = f O − 𝒩O (û O ), h 𝒩A (û A

+ cA ) −

h 𝒩A (û A )

=

f hA



h 𝒩A (û A ),

(1.33a) (1.33b)

by subtracting 𝒩Oh (û O ) and 𝒩Ah (û A ). The full approximation scheme (FAS) approach is based on the following assumptions for coupling (1.33) [35]: a) A change of the right-hand side function in (1.33b) by their value transferred from GO to GA f hA − 𝒩Ah (û A ) = ℛO→A (f hO − 𝒩Oh (û O )),

(1.34)

where ℛO→A is a restriction operator transferring the residual r hO = f hO − 𝒩Oh (û O ) from the original grid GO to the auxiliary grid GA . b) A change of the approximation to the solution in (1.33b) by their value transferred from GO to GA û A = ℛ̂ O→A û O , where ℛ̂ O→A is the transfer operator (in general, ℛ̂ O→A ≠ ℛO→A ).

(1.35)

28 � I Forward to black-box software Taking (1.34) and (1.35) into consideration, we continue with modified Eq. (1.33b), h

h

h

h

𝒩A (ℛ̂ O→A û O + cA ) − 𝒩A (ℛ̂ O→A û O ) = ℛO→A (f O − 𝒩O (û O )),

(1.36)

where unknown cA is an auxiliary grid correction. Nonlinear iterations of FAS can be represented as follows: (q) (q) 1. Transfer of the residual f hO − 𝒩Oh (û O ) and the approximation to the solution û O , where q is a counter of the intergrid iterations. 2. Solution of the auxiliary system by some nonlinear algorithm h

h

h

h

𝒩A (ℛ̂ O→A û O + cA ) − 𝒩A (ℛ̂ O→A û O ) = ℛO→A (f O − 𝒩O (û O )) ⇒ cA .

3.

(q)

(q)

(q)

Transfer of the correction cA to the original grid GO cO = 𝒫A→O cA ,

4.

where 𝒫A→O is a prolongation operator transferring the correction to GO . Computation of the starting guess for the smoothing iterations (q) (0) û O = cO + û O .

5.

Smoothing iterations on the original grid GO W (s) (û O

(s+1)

6.

− û O ) = f hO − 𝒩Oh (û O ), (s)

(s)

s = 0, 1, 2, . . . ,

where s is the smoothing iteration counter. Updating the approximation to the solution (q+1) (s+1) û O = û O ,

7.

where q is the intergrid iteration counter. Check convergence, repeat if necessary.

Figure 1.7 represents all stages of the above-mentioned nonlinear iterations. The results of the linear analysis show that if the smoothing and approximation properties hold, then the number of intergrid iterations does not depend on the size of the problem (Section 4, Сhapter IV). In addition, the two-grid algorithm is a variant of the defect correction approach [35]; it offers a general possibility to employ the second-order FVD on the auxiliary grid GA and to obtain high-order accuracy on the original grid GO . The grid generation in complex domains, which usually requires manual interaction and expertise in modern software, has become a major bottleneck of the mathematical modeling. This means that an efficient solver is required to compute the correction (q) cA (1.36) and to evaluate quickly the basic properties of the approximation û O + 𝒫A→O cA to the solution uO (1.31). High-precision correction computations (1.36) are generally not necessary at this stage, the Cartesian grid methods (or the immersed boundary methods

7 Nonlinear two-grid algorithm

� 29

Figure 1.7: A nonlinear FAS-based two-grid algorithm.

or cut-cell methods) can be seen as a good compromise between the accuracy of the correction cA (1.36) and how quickly it can be obtained by geometric multigrid. Hereinafter, only the structured auxiliary grid GA (i. e., the grid-generating multigrid structure, Figure 2.6) will be used to compute the correction cA (1.36). First, we compare the number of problem-dependent components in the basic algorithm (1.28) and in the two-grid one: a) A nonnested case: let GO and GA be the unstructured and structured grids, respectively (Section 3, Сhapter IV). The two-grid algorithm has three extra problemdependent components: transfer operators ℛO→A , ℛ̂ O→A , and 𝒫A→O (Figure 1.8).

Figure 1.8: 2D transfer operators for the nonnested two-grid algorithm.

b) A nested case: let GO and GA be block-structured grids (Section 3, Сhapter IV). In this case, we suppose that GA = GO ⇒ ℛO→A = ℛ̂ O→A = 𝒫A→O = I (Figure 2.9) in the absence of smoothing on the original grid and the two-grid algorithm has extra problem-dependent components (interblock interpolation). c) A nested case: let GO and GA be structured grids (Section 3, Сhapter IV). In this case, we suppose that GA = GO ⇒ ℛO→A = ℛ̂ O→A = 𝒫A→O = I (Figure 2.9) in the absence of smoothing on the original grid and the two-grid algorithm has no extra problemdependent components.

30 � I Forward to black-box software Example 1. To illustrate the above-mentioned (nonnested) two-grid algorithm, we consider the 1D nonlinear BVP u′′ − eu = f (x),

u(0) = 0, u(1) = 1.

(1.37)

Of course, 1D problems do not require the application of iterative methods, since for the algebraic systems resulting from discretization, a direct solution is efficient, but it can be analyzed by elementary methods in one-dimension iterative algorithms, and the convergence of the iterations and computational efforts are easily demonstrated. Let u(x) = x 4 be the exact solution, which leads to f (x) = 12x 2 − exp(x 4 ). First, we consider the Newton–Gauss–Seidel iteration for numerical solving (1.37). The uniform original grid GO in the domain [0, 1] is a set of points given by xi =

i−1 , n

i = 1, 2, . . . , n + 1.

The finite difference analogue of BVP (1.37) becomes υi−1 − 2υi + υi+1 − eυi = f (xi ), h2

(1.38)

where υi is a discrete analogue of the function u(x): υi = u(xi ) and h = 1/n is the mesh size. The Newton linearization is eυi ≈ eυi + ̄

𝜕eῡ 󵄨󵄨󵄨󵄨 ῡ ῡ ῡ 󵄨 (υ − ῡ i ) = e i + e i (υi − ῡ i ) = e i (υi − ῡ i + 1), 𝜕ῡ 󵄨󵄨󵄨x i

(1.39)

i

where ῡ i is the previous iterant value. The linearized form of (1.38) is υi−1 − 2υi + υi+1 ̄ ̄ − eυi υi = f (xi ) + eυi (1 − ῡ i ). h2 The Newton–Gauss–Seidel iteration can be represented as do i = 2, n υi :=

1 (υ + υi+1 − h2 (f (xi ) + eυi (1 − υi ))) 2 + h2 eυi i−1

(1.40)

end do where := denotes assignment «left equals right». Figure 1.9 demonstrates the results of computations. Starting with υ(0) = xi , the i (1) (2) (3) 4 approximations υi , υi , and υi to the exact solution υi = xi are shown in Figure 1.9 (left). We define the stopping criterion as 󵄨󵄨 υ − 2υ + υ 󵄨󵄨 󵄨󵄨 󵄨 υi −6 i i+1 − e − f (x ) ‖r‖∞ = max󵄨󵄨󵄨 i−1 i 󵄨󵄨󵄨 < 10 . i 󵄨󵄨 h2 󵄨

7 Nonlinear two-grid algorithm

� 31

Figure 1.9: Convergence of the Newton–Gauss–Seidel iterations (1.40) (n = 10).

The numerical experiment illustrates that the number of iterations Θ is proportional to n2 , i. e., Θ ∼ n2 , as shown in Figure 1.9 (right). Example 2. The next step is to illustrate the FAS two-grid algorithm. First, the auxiliary (nonnested) grid GA is generated as i − 0.5 1 , xiv = (xi + xi+1 ) = 2 n

1 i v xif = (xiv + xi+1 )= , 2 n

where xiv are the vertices of the auxiliary grid GA , xif are the finite volume faces, and the finite volume Vi is given by f Vi = {x | xi−1 ≤ x ≤ xif }.

The generated auxiliary grid GA is shown in Figure 1.10. It is clear that v f xi+1 − xiv = xif − xi−1 =h=

1 , n

i = 1, 2, . . . , n.

In addition, xi ≠ xiv , it means that the original grid GO simulates a unstructured one, and the auxiliary grid GA is a structured one.

Figure 1.10: Original and auxiliary grids for the two-grid algorithm.

32 � I Forward to black-box software Eq. (1.37) has the modified form (c + u)̂ ′′ − ec+u = f , ̂

where û and c = u−û are the approximation to the solution u and correction respectively. In the FAS approach, the differential analogue of (1.36) becomes ̂ ̂ ̂ (c + u)̂ ′′ − ec+u − (u)̂ ′′ + eu = f − (u)̂ ′′ + eu .

This equation can rewritten as ̂ ̂ c′′ − eu (ec − 1) = f − (u)̂ ′′ + eu .

Integration of this equation over the volume Vi leads to xif

xif

xif

f xi−1

f xi−1

f xi−1

1 1 1 ̂ ̂ ∫ c′′ dx − ∫ eu (ec − 1) dx = ∫ (f − (u)̂ ′′ + eu ) dx. h h h

(1.41)

Evaluation of the integrals define the finite volume discretization. The first integral on the left-hand side of (1.41) is approximated as xif

h ch − 2cih + ci+1 1 , ∫ c′′ dx ≈ i−1 h h2 f xi−1

where ch is the discrete analogue of the correction c: cih = c(xiv ). The midpoint rule is used to approximate the second integral on the left-hand side of (1.41): xif

1 ̂ 󵄨 ∫ eu (ec − 1) dx ≈ exp(û h 󵄨󵄨󵄨x v )(exp(cih ) − 1), i h f xi−1

where û h is the discrete analogue of the approximation to solution u.̂ Using Newton linearization (1.39) and linear interpolation 1 󵄨 h û h 󵄨󵄨󵄨x v = (û ih + û i+1 ), i 2 we obtain xif

1 1 ̂ h ))(exp(cīh )(cih − cīh + 1) − 1), ∫ eu (ec − 1) dx ≈ exp( (û ih + û i+1 h 2 f xi−1

where cīh is the previous iterant value.

7 Nonlinear two-grid algorithm

� 33

The trapezoidal rule is used to approximate the integral on the right-hand side of (1.41) xif

1 1 ̂ ∫ (f − (u)̂ ′′ + eu ) dx ≈ (ri + ri+1 ), h 2 f xi−1

where ri is the residual computed on the original grid GO : ri = f (xi ) −

h h û i−1 − 2û ih + û i+1 + exp(û ih ). h2

(1.42)

In the case of second-order equations with Dirichlet boundary conditions, which means h h that û 1h = 0 and û n+1 = 1, the residuals in the boundary points are zero: r1h = rn+1 = 0. Finally, Eq. (1.41) becomes h h ci−1 − 2cih + ci+1 1 1 h − exp( (û ih + û i+1 ))(exp(cīh )(cih − cīh + 1) − 1) = (ri + ri+1 ) 2 2 2 h

or h h ci−1 − ai cih + ci+1 = bi ,

where the coefficient ai and the right-hand side bi are 1 h ) + cih ), ai = 2 + h2 exp( (û ih + û i+1 2

(1.43a)

bi =

(1.43b)

h2 1 h (r + ri+1 ) + h2 exp( (û ih + û i+1 ))(exp(cih )(1 − cih ) − 1). 2 i 2

Figure 1.10 shows that the boundary vertices x1v and xnv of the auxiliary grid GA are not located on the boundaries of the domain [0, 1]. The Dirichlet conditions û 1h = 0 and h û n+1 = 1 lead to corrections ch from boundary vertices x0f and xnf , which are assumed to be 0 in this case. The formulas for quadratic interpolation of corrections can be applied near the domain boundaries to eliminate the corrections ch at the ghost vertices2 (ξ = 1/2) (2.16): 1 c0h = −2c1h + c2h , 3 1 h h cn+1 = −2cnh + cn−1 . 3 2 Such grid points outside the computational domain are called ghost points.

34 � I Forward to black-box software Using these interpolation formulas, it is also possible to eliminate the corrections ch at the ghost points, for example, 4 1 b1 = c0h − a1 c1h + c2h = −2c1h + c2h − a1 c1h + c2h = −(2 + a1 )c1h + c2h . 3 3 1. 2. 3.

Iterations of the FAS two-grid algorithm take the form: Starting guess: û ih = xi , cih = 0. Computation of the residual r on the original grid GO (1.42). i=1 (a) Computation of the coefficient a1 (1.43a) and the right-hand side b1 (1.43b). (b) Computation of the correction c1h =

4.

1 L+3 + 1. 2d − 1

If the coarsest grids used in RMT have approximately three points in each spatial direction (2.7), then 3L3 +1 ≈ n10 , +

and the maximum discretization parameter becomes p

nmax = 3

2d

2d −1

.

If n10 < nmax , then the parallel RMT will be more efficient than the sequential V-cycle. For 3D BVPs (d = 3), we have d=3 ⇒

2d 8 = ≈ 1.143 ⇒ nmax ≈ 31.143p . d 2 −1 7

For three-thread implementation (p = 3), nmax ≈ 33 , which means that the parallel RMT will be slower than the sequential V-cycle for p = 3. For nine-thread implementation (p = 9), nmax ≈ 310 . Using (3.2), it is possible to estimate the speed-up as 𝒮̃ = 9

23 1 10.286 ≈ + . + 3 2 − 1 L3 + 1 L3 + 1

2 Geometric parallelism �

77

Considering a linear discrete BVP with a structured sparse 5013 × 5013 (⇒ L+3 = 5, n10 = 500 < nmax ≈ 310 , h = 1/500) coefficient matrix, we have 𝒮̃ ≈

10.286 10.286 = = 1.7. L+3 + 1 6

Taking into account the roughness of the complexity analysis, we can conclude that the execution time of the parallel RMT for p = 9 and the sequential V-cycle are approximately the same. The speed-up (3.2) for p = 27 can be estimated as 𝒮 ̃ = 27

23 1 ≈ 5, 23 − 1 5 + 1

i. e., a noticeable speed-up is expected. Conclusion. An efficient parallelization of RMT can be achieved by distributing the set of discrete tasks over p = 3κ , κ = 1, . . . , d, computing units. Theoretically, the execution time of the parallel RMT implemented over nine computing units is approximately equal to the execution time of the sequential V-cycle. To achieve high parallel efficiency of RMT, it is necessary to consider two special cases: 1) The number of grids is larger than or equal to the number of computing units (geometric parallelism). 2) The number of grids is less than the number of computing units (algebraic parallelism).

2 Geometric parallelism Section 5, Chapter II illustrated FVD on the coarse grids, the transfer operators, and the coarse grid solutions for 1D BVP. Figure 2.10 shows that the discrete BVPs on three coarse grids G11 , G21 , and G31 can be solved in parallel. A starting guess for the finer grid solution assembled from the solutions on coarser grids is sufficiently accurate (2.48). It begins with the generalization of 1D results for 3D BVP 𝜕2 u 𝜕2 u 𝜕2 u + + = −f (x, y, z) 𝜕x 2 𝜕y2 𝜕z2

(3.3)

in the unit cube Ω = (0, 1)3 . If the exact solution of (3.3) is given by ua (x, y, z) = exp(x + y + z),

(3.4)

then substitution of (3.4) into (3.3) gives the right-hand side function f (x, y, z) = −3 exp(x+ y + z) and the Dirichlet boundary conditions. In fact, this BVP is system (1.15) for NM = 1.

78 � III Parallel robust multigrid technique Let the uniform grid G10 be generated in the domain [0, 1]3 as a product of three 1D grids (3.1), where (ψvm , ψfm ) ∈ G10 , ψ

ψvm =

m−1 , ψ n0 1

1 ψfm = (ψvm +ψvm+1 ), 2

m = 1, 2, . . . , ψ n10 +1,

ψ = (xyz)T ,

and ψvm and ψfm are the vertices and finite volume faces, respectively. The finite volume Vi is defined as f Vijk = {(x, y, z) | xi−1 ≤ x ≤ xif , yfj−1 ≤ y ≤ yfj , zfk−1 ≤ z ≤ zfk }.

The standard seven-point FVD of (3.3) on the uniform grid G10 is abbreviated as Δh uh = −f h , where Δh , uh , and f h are the discrete analogues of the Laplace operator, the solution u, and the right-hand side function f , respectively. The BVP is solved by RMT with x n10 = y n10 = z n10 = n10 = 100 (hx = hy = hz = 1/100) using the stopping criterion 󵄨 󵄨 max󵄨󵄨󵄨Δh uh + f h 󵄨󵄨󵄨 < 10−6 .

(3.5)

ijk

The error of the numerical solution is defined by comparison of the exact and approximated solutions 󵄨 h 󵄨󵄨 ‖e‖∞ = max󵄨󵄨󵄨ua (xiv , yvj , zvk ) − uijk 󵄨󵄨,

(3.6)

ijk

where ua and uh are the exact (3.4) and numerical solutions, respectively. The finest grid G10 generates the multigrid structure ℳ𝒮 (G10 ). All coarse grids Gk1 , k = 1, 2, . . . , 3d , d = 2, 3 of the first level (l = 1) generate their own multigrid structures ℳ𝒮 (Gk1 ). According to Property 1 (p. 41), l l Gnl ∩ Gm = ⌀ ⇒ ℳ𝒮 (Gnl ) ∩ ℳ𝒮 (Gm ) = ⌀,

n ≠ m, l ≠ 0.

Figure 3.1 demonstrates the multigrid structures ℳ𝒮 (G10 ) and ℳ𝒮 (Gk1 ). Let the finest grid G10 be omitted, i. e., 3d independent multigrid structures ℳ𝒮 (Gk1 ) are used for solving (3.3) instead of the single structure ℳ𝒮 (G10 ) (Figure 3.2). In other words, 3d independent SLAEs with ≈ (n10 + 1)d /3d × (n10 + 1)d /3d coefficient matrix are solved instead of the single SLAE with (n10 + 1)d × (n10 + 1)d coefficient matrix. It is clear that the 3d SLAEs can be solved in parallel independently of the iterative solver used. Figure 3.3 represents the error of the numerical solutions ‖e‖∞ (3.6) obtained on the multigrid structures ℳ𝒮 (G10 ) and ℳ𝒮 (Gk1 ), k = 1, . . . , 3d , starting with the iterant zero on the 101 × 101 × 101 uniform finest grid G10 (n10 = 100, h = 1/100). Taking into account the stopping criterion (3.5), the iterative solution of the model BVP on the

2 Geometric parallelism

� 79

Figure 3.1: Multigrid structures ℳ𝒮(Gk1 ) generated by the coarse grids Gk1 , k = 1, . . . , 3d of the first level (l = 1).

Figure 3.2: Geometric parallelism of RMT: parallel solution on 3d independent discrete BVPs on the multigrid structures ℳ𝒮(Gk1 ), k = 1, . . . , 3d .

finest grid G10 is a reduction of the error of the zero starting guess (‖e(0) ‖∞ ≈ e3 ≈ 20) down to ‖e‖∞ ≈ 10−6 . A starting guess to the exact solution (3.4) assembled from 27 (d = 3) coarse grid solutions obtained on the independent multigrid structures ℳ𝒮 (Gk1 ), k = 1, . . . , 3d , is sufficiently accurate: estimation (2.48) predicts that the difference does not exceed ℓ significant digits for the second-order discretization, where ℓ is the serial number of the level of the coarse grids generated by the independent multigrid structures (here ℓ = 1). Figure 3.3 illustrates the accuracy of the starting guess to the finest grid solution assembled from the solutions obtained in parallel on coarser grids. Now the influence of parallel architectures on the efficiency of the parallel RMT is illustrated using the measures of parallelism: Definition III.2. The speed-up S and efficiency E of a parallel algorithm are S = pE =

T(1) , T(p)

(3.7)

80 � III Parallel robust multigrid technique

Figure 3.3: Error of the numerical solutions obtained on the multigrid structures ℳ𝒮(G10 ) and ℳ𝒮(Gk1 ), k = 1, . . . , 27.

where T(1) is the execution time for a single computing units, and T(p) is the execution time using p computing units [26]. It is clear that the numerical solutions of 27 independent discrete BVPs (Figure 3.3) should have full parallelism: E = 1. Personal computers (Intel(R) Core(TM) i7-4790 [email protected] GHz) and computer cluster2 K-60 are used for the computational experiments. The parallel OpenMP-based computer program for the numerical solution of these BVPs on the multigrid structures ℳ𝒮 (Gk1 ), k = 1, . . . , 27, is given in [23]. Figure 3.4 represents the obtained efficiency of the parallel Vanka-type smoother [24]. Inefficient memory access is one of the most common performance problems in the parallel programs. The speed of loading data from memory traditionally lags behind the speed of being processed. The trend of placing more and more cores on a chip means that each core has a relatively narrower channel for accessing shared memory resources. This results in the reduction of the parallel program efficiency for the 27 threads. The parallel solution of discrete BVPs on the multigrid structures ℳ𝒮 (Gk1 ), k = 1, . . . , 3d , is a useful test for a parallel computer to measure the real efficiency of parallel RMT. Values E ≈ 1 correspond to the expected full parallelism. In addition, the execution time depends on the arrangement of the unknowns in the memory, so a superlinear speed-up (E > 1) is observed for three-thread implementation.

2 Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences, www.kiam.ru/MVS/ resourses/k60.html

2 Geometric parallelism

� 81

Figure 3.4: Parallel solution efficiency on the multigrid structures ℳ𝒮(Gk1 ), k = 1, . . . , 27.

Typically, specific results of the parallel efficiency of an algorithm are obtained by measurements on a particular parallel computer. Often, not only an algorithm is evaluated, but different parallel computer architectures are also implicitly included in the comparison [35]. The most elegant large-scale granular3 geometric parallelization of RMT can be achieved by distributing the 3d discrete tasks over 3d computing units. In this case, the private arrays for the coarse grid correction can be declared in each thread for OpenMP technology. Approximation to the solution is stored in a shared array due to Property 1 (p. 41). The parallel cycle of RMT is shown in Figure 3.5. In the first multigrid iteration, it is assumed that the finest grid G10 is omitted and the coarse grids Gk1 , k = 1, . . . , 3d , of the first level (l = 1) are considered the finest grids (Figure 3.2). The discrete BVPs are solved on the multigrid structures ℳ𝒮 (Gk1 ), k = 1, . . . , 3d , generated by the coarse grids Gk1 , k = 1, . . . , 3d , of the first level (l = 1). In the following, the coarse grids, which are considered the finest grids in the solution process, will be called dynamic finest grids (Figure 3.5). The first multigrid iteration stopped after smoothing on the finest grid (update the approximation to the solution, reset to zero the coarse grid correction, and check the stopping criterion). Linear analysis of the iteration will be given in Section 4, Chapter IV. The subsequent multigrid iterations for the low-thread parallelization are performed in the traditional manner by implementing the geometric parallelism on coarser levels and algebraic parallelism on finer levels (Section 3, Chapter III). 3 Large-scale granularity means large tasks that can be performed independently in parallel [26].

82 � III Parallel robust multigrid technique

Figure 3.5: The parallel cycle of RMT for solving BVPs (low-thread parallelization): ∘ – geometric parallelism, ∙ – algebraic parallelism.

Conclusion. The geometric parallelism of RMT is based on nonoverlapping the finest grid partition for distribution of 3d , d = 2, 3, independent tasks over p = 3κ , κ = 1, 2, . . . , d, computing units in the low-thread implementation. The assemblage of the numerical solutions obtained on the multigrid structures generated by the dynamic finest grids results in the starting guess for the finest grid solution. The difference between this starting guess and the finest grid numerical solution does not exceed ℓ significant digits for the second-order FVD, where ℓ is the serial number of the dynamic level (Figure 3.5). The geometric parallelism does not require parallelization of the iterative/direct solvers. In addition to solving the discrete BVPs, the main goal of the first multigrid iteration is the black-box optimization of the computational algorithm (Section 7, Chapter II): 1) Determination of the optimal ordering of the unknowns. 2) Determination of the optimal smoother and the (segregated or coupled) type of smoothing iterations for systems. 3) Determination of the dynamic coarsest level for time-dependent-problems (Section 5, Chapter III). 4) Adaptation of all grids to the behavior of the solution. Theoretically, full parallelism is expected (E ≈ 1 (3.7); see [21] for details), but practically everything depends on the architectures of parallel computers (Figure 3.4). A description of the parallel OpenMP-based software for RMT is given in [23].

3 Algebraic parallelism The algebraic parallelism of RMT is based on the multicolor ordering of unknowns to parallelize the smoothing iterations on the finer grids where the number of finer grids is less than the number of computing units. Typically, the coloring is chosen to decouple unknowns of the same color, thus allowing these unknowns to be updated

3 Algebraic parallelism � 83

simultaneously. This property makes multicolor iterative methods especially attractive on vector or parallel computers [26]. As the simplest example, we consider 1D discrete BVP ai ui−1 − bi ui + ci ui+1 = di ,

i = 1, 2, . . . , n + 1,

where ai ≥ 0, bi > 0, ci ≥ 0, and di are known coefficients (a1 = cn+1 = 0), and ui is the desired function. The lexicographic ordering of unknowns ui (Figure 3.6) makes it possible to write the sequential Gauss–Seidel iteration in the form ui :=

1 (a u + ci ui+1 − di ), bi i i−1

i = 1, 2, . . . , n + 1.

Two-color ordering allows unknowns of the same color to be updated in parallel (Figure 3.6). For example, ∘-unknowns can be updated on two computing units as follows: 1) First computing unit ui := 2)

1 (a u + ci ui+1 − di ), bi i i−1

i = 1, 3.

1 (a u + ci ui+1 − di ), bi i i−1

i = 5, 7.

Second computing unit ui :=

Parallel coupled solvers use multicolor block ordering of unknowns. The software for parallel plane 3D Vanka-type smoother is described in [23]. Figure 3.7 represents the efficiency E (3.7) of this parallel smoother for solving (3.3). The Vanka-type smoother is more expensive than point ones. As a rule, the efficiency of the parallel Vanka-type smoother is higher, since more arithmetic operations are performed in parallel. In general, algebraic parallelism is a crucial RMT component with respect to the overall one.

Figure 3.6: Orderings of unknowns: a) lexicographic ordering for the sequential implementation, b) twocolor ordering for the parallel implementation.

84 � III Parallel robust multigrid technique

Figure 3.7: Efficiency (3.7) of the parallel plane 3D Vanka-type smoother.

Conclusion. Algebraic parallelism of RMT is based on the multicolor orderings of unknowns (or block of unknowns) to parallelize the smoothing iterations on the finer levels where the number of grids is less than the number of computing units. Small-scale granular algebraic parallelism is grid-independent.

4 Parallel cycle for middle-thread parallelization Figure 3.8 presents the overall efficiency (3.7) of the parallel RMT for p = 27 (low-thread parallelization). Using appropriate hardware, software, operating system, and compiler, an almost full parallelism of the RMT-based algorithm is expected. In Section 1, Chapter III the application of p = 3κ , κ = 1, . . . , d, threads is classified as the low-thread parallelization of RMT. If the parallel computer allows almost full geometric and algebraic parallelism, then more threads can be used to OpenMP-based implement RMT in parallel. The number of threads p = 3d+κ , κ = 1, . . . , d, defines the middle-thread parallelization of RMT. Figure 3.9 shows the parallel cycle of RMT: algebraic parallelism is used on the finer grid of the zero and first levels for smoothing in parallel. The middle-thread parallelization uses p = 3d+κ threads for parallel smoothing on the finest grid and p = 3κ threads for parallel smoothing on the multigrid structures generated by the dynamic grids of the first level. Theoretical estimations of the speed-up and efficiency of parallel RMT are given in [21]. The degree of parallelism of a classic multigrid differs by grid levels (i. e., it is small on coarse grids). On coarse grids, the relation of computation and communication be-

5 Parallel solution of time-dependent problems � 85

Figure 3.8: Overall efficiency (3.7) of the parallel RMT for solving BVP (3.3) (the geometric and algebraic parallelisms).

comes worse than on fine grids [35]. The absence of coarse grids in the single-grid RMT avoids a decrease in parallelization efficiency on very coarse grids. In any specific case the efficiency of parallel RMT depends on the hardware and on the finest grid smoother which of these parameters is crucial. The PSMG and RMT refer to massively parallel computing.

5 Parallel solution of time-dependent problems The initial-boundary value problems (IBVPs) for the time-dependent PDEs of mathematical physics and the efficient algorithms for their numerical solution are of considerable scientific and practical interest. The following are reasons for this: 1) Some physical processes (for example, turbulence) are nonstationary 3D phenomena. 2) The IBVPs are particularly useful if the solution shows an unsteady behavior, which is not known in advance. A steady-state solution can be obtained through pseudotime marching. Using semi-implicit or fully implicit discretizations, large and adaptable time steps can be used, and parallel processing across space and/or time is feasible. 3) The systems of strongly coupled nonlinear PDEs describe real-life problems. In this case the time step can be used as an underrelaxation factor for convergence control of the nonlinear iterations.

86 � III Parallel robust multigrid technique

Figure 3.9: Parallel cycle of RMT for solving BVPs (the middle-thread parallelization): ∘ – geometric parallelism, ∙ – algebraic parallelism.

Therefore a time-dependent formulation of PDEs is more preferable for black-box implementation. The computational algorithm must be efficient for different values of mesh sizes in time and space. Problems involving time t as one independent variable usually lead to parabolic or hyperbolic PDEs. Here only an implicit time discretization is used, and RMT is applied to each of the discrete problems that must be solved at each time step. An attempt is made to develop RMT for the parallel solution of BVPs and IBVPs in a unified manner. Second-order parabolic PDEs are most often used to describe nonstationary heat and mass transfer processes. Let us consider a heat conductivity phenomenon in some absolutely solid body of volume V and boundary B consisting of a homogeneous material with constant density ρ, specific heat capacity c, and thermal conductivity coefficient λ. Then the heat equation has the form ut′ = aΔu + f ,

(3.8)

where u is the temperature, f is the volumetric density of heat sources, a = λ/(ρc) is the temperature diffusivity coefficient of the material, and Δ is the Laplace operator (1.16). Let the temperature ub be given on the boundary B, i. e., u(t, P) = ub (t, P),

P ∈ B, t ≥ 0.

(3.9)

In addition, the body temperature u0 is known at some initial moment t = 0: u(0, M) = u0 (0, M),

M ∈ V.

(3.10)

The classical solution of the IBVP (3.8)–(3.10) is a continuous function u having the first spatial derivatives in a closed cylinder [0, T] × V , the first-order time-continuous derivatives, and the second-order continuous spatial derivatives in the open cylinder (0, T]×V and satisfying Eq. (3.8), the boundary condition (3.9), and the initial condition (3.10).

5 Parallel solution of time-dependent problems

� 87

Assuming that some approximation û to the desired solution u is known, substitution of the approximation û into Eq. (3.8) leads to û t′ = aΔû + f + r, where r is some residual. A correction c is added to the approximation û for deleting the residual r: ct′ = aΔc + f − û t′ + aΔu.̂

(3.11)

In the multigrid literature a discrete analogue of Eq. (3.11) is called a defect equation [35]. The boundary condition (3.9) and the initial condition (3.10) take the form c(t, P) = 0,

P ∈ B, t ≥ 0,

and c(0, M) = 0,

M ∈ V.

As in Section 7, Chapter I, the solution u will not be found, but the correction c, i. e., the difference of a solution u and its approximation u:̂ c = u − u.̂ Let uniform computational grid be generated in the domain [0, T]×V . The standard solution approach is to fully discretize Eq. (3.11), obtaining a discrete BVP at each time step when an implicit scheme is used for the approximation of the time derivative. One of the most popular approaches is the Crank–Nicolson scheme (n+1) (n) cijk − cijk

ht

=

a 1 (Δh ch(n) + Δh ch(n+1) ) − (rh(n) + rh(n+1) ), 2 2

where Δh is a discrete analogue of the Laplace operator (1.16), and r = û t′ − aΔû − f and ht are the residual and mesh size in time, respectively. The discretization accuracy is O(ht2 + hx2 + hy2 + hz2 ) for the Crank–Nicolson scheme. (n+1) Using some ordering on the unknowns cijk , the discrete equation can be written in matrix form

A(n+1) c(n+1) = B(n+1) c(n) + b(n+1) .

(3.12)

(n) Since the value of û ijk is known, c(n) = 0. In standard time-stepping methods, the multigrid can be used as an iterative solver for the discrete BVPs arising at each time step, (n) i. e., for the sequential/parallel computing û ijk . The parallelism of this elliptic solver limits the efficiency of the parallel algorithm, since the time dimension is treated strictly sequentially. It is more attractive to use not only parallelism in space, but also parallelism in time. For the given purpose, the resulting SLAE (3.12) is written at several new time steps:

88 � III Parallel robust multigrid technique A(n+nt ) c(n+nt ) = B(n+nt ) c(n+nt −1) + b(n+nt ) , A(n+nt −1) c(n+nt −1) = B(n+nt −1) c(n+nt −2) + b(n+nt −1) , .. .

A(n+1) c(n+1) = b(n+1) , where nt is the number of time steps treated in parallel. The resulting SLAE can be rewritten in matrix form A(n+nt ) ( (

−B(n+nt ) A(n+nt −1)

(n+nt −1)

−B A(n+nt −2)

−B(n+nt −2) .. .

(

b(n+nt ) c(n+nt ) (n+nt −1) b(n+nt −1) c (n+n −2) t ) (b(n+nt −2)) ) (c ) )=( )( .. .. . . A(n+1)) ( c(n+1) ) ( b(n+1) )

with the block two-diagonal coefficient matrix. Our goal is to develop a highly parallel RMT-based algorithm for solving such SLAEs. First, we represent a formal description of the RMT-based parallel algorithm and then discuss the features of its implementation. Let the computational domain be [0, T]× Ω, where Ω = [0, 1]d and [0, T] are the d-dimensional cube and time interval, respectively. As in Section 1, Chapter III, the 3D structured grid G10 = {((xiv , xif ), (yvj , yfj ), (zvk , zfk )) | i = 1, 2, . . . , x n10 + 1, j = 1, 2, . . . , y n10 +1, k = 1, 2, . . . , z n10 +1} is formed by the product of three 1D grids G10x , G10y , and G10z (in general, x n1 ≠ y n1 ≠ z n1 ), i. e., G10 = G10x × G10y × G10z ,

(xiv , xif ) ∈ G10x ,

(yvj , yfj ) ∈ G10y ,

(zvk , zfk ) ∈ G10z .

The uniform grid GT = {tn = ht n̄ | n̄ = 0, 1, . . . , nt + 1} is used for time discretization, where ht is the time step size. To unify the parallel RMT-based algorithm for BVPs and IBVPs, only spatial coarsening will be used. The finest grid GT × G10 = {tn , (xiv , xif ), (yvj , yfj ), (zvk , zfk )} generates the multigrid structure GT × ℳ𝒮 (G10 ), where each level l of the structure consists of 3lx +ly +lz grids (Section 1, Chapter III). The finite volume Vi is defined as (n+1) f f Vijk = {(t, x, y, z) | tn ≤ t ≤ tn+1 , x{i−1} ≤ x ≤ x{i} , yf{j−1} ≤ y ≤ yf{j} , zf{k−1} ≤ z ≤ zf{k} },

5 Parallel solution of time-dependent problems � 89

i. e., each point (tn , xiv , yvj , zvk ) of the grid is a vertex. Integration of (3.11) over this finite volume leads to the following equation: 1

∫ ∫ ∫

lx +ly +lz

ht hx hy hz 3

=

zf{k}

∫ ct′ dz dy dx dt

tn x f yf zf {i−1} {j−1} {k−1} f tn+1 x{i}

1

ht hx hy hz 3lx +ly +lz

+

yf{j}

f tn+1 x{i}

1

yf{j}

zf{k}

∫ aΔc dz dy dx dt

∫ ∫ ∫

tn x f zf yf {i−1} {j−1} {k−1} f tn+1 x{i}

lx +ly +lz

ht hx hy hz 3

yf{j}

zf{k}

∫ (f − û t′ + aΔu)̂ dz dy dx dt.

∫ ∫ ∫

tn x f zf yf {i−1} {j−1} {k−1}

Application of the trapezoidal and the midpoint rules for the integral approximation tn+1

1 1 ∫ α(t) dt = (α(tn ) + α(tn+1 )) + O(tn2 ), ht 2 tn

f x{i}

1 v ) + O(hx2 32lx ) ∫ β(x) dx = β(x{i} hx 3lx f x{i−1}

defines the Crank–Nicolson scheme (n+1) (n) c{ijk} − c{ijk}

ht

=

a (n+1) (Δ c(n) + Δh ch(n+1) ) + ⟨r{ijk} ⟩, 2 h h

(3.13)

where Δh ch(m) =

(m) (m) (m) c{i−1jk} − 2c{ijk} + c{i+1jk}

hx2 32lx

+

(m) (m) (m) c{ij−1k} − 2c{ijk} + c{ij+1k}

hy2 32ly

+

(m) (m) (m) c{ijk−1} − 2c{ijk} + c{ijk+1}

hz2 32lz

is a discrete analogue of the Laplace operator on the multigrid structure, and

(n+1) ⟨r{ijk} ⟩=

1

ht hx hy hz 3lx +ly +lz

f tn+1 x{i}

yf{j}

∫ ∫ ∫

zf{k}

∫ (f − û t′ + aΔu)̂ dz dy dx dt

tn x f yf zf {i−1} {j−1} {k−1}

(n+1) are the residuals averaged over the finite volume Vijk .

90 � III Parallel robust multigrid technique Recall that the residual on the finest grid GT × G10 (n+1) (n) û ijk − û ijk 1 a (n+1) ⟨rijk ⟩ = (f (tn , xiv , yvj , zvk )+f (tn+1 , xiv , yvj , zvk ))− + (Δh û h(n) +Δh û h(n+1) ) (3.14) 2 ht 2

defines the averaging on the multigrid structure GT × ℳ𝒮 (G10 ) [21], where Δh û h(m) =

(m) (m) (m) û i−1jk − 2û ijk + û i+1jk

hx2

+

(m) (m) (m) û ij−1k − 2û ijk + û ij+1k

hy2

+

(m) (m) (m) û ijk−1 − 2û ijk + û ijk+1

hz2

.

The discrete IBVPs (3.13) coupled with the initial and the boundary conditions can be written in the above-mentioned matrix form A(n+1) c(n+1) = Bl(n+1) c(n) + ℛ0→l r (n+1) , 0 l l l

l = 0, 1, . . . , L+3 ,

where ℛ0→l is the problem-independent restriction operator of RMT transferring the residual (3.14) from the finest grid (l = 0) to the coarse level l. To treat the time dimension in parallel, we write the resulting SLAE on nt time steps Al

(n+nt )

(n+nt )

Al

(n+nt −1)

( ( ( (

cl

(n+nt )

−Bl

(n+nt −1)

t c ) )( l ) ( (n+nt −2) ) ) ) (cl ) )( .. .

(n+n −1)

−Bl Al

(n+nt −2)

(n+nt −2)

−Bl

..

.

(n+1) A(n+1) ) ) ( cl l

(

r0

(n+nt )

ℛ0→l

( =( (

r0 t ( ) ( (n+nt −2) ) ), ) (r 0 ) .. . (n+n −1)

ℛ0→l ℛ0→l

..

.

ℛ0→l )

l = 0, 1, . . . , L+3 ,

(n+1) ( r0 )

or briefly Al cl = R0→l r 0 ,

l = 0, 1, . . . , L+3 .

(3.15)

As for the BVPs (Section 2, Chapter III, Figure 3.5), we omit the finest grid GT × G10 (geometric parallelism or parallelism in space) and solve the discrete IBVP (3.15) on the 1 1 multigrid structures GT∗ × ℳ𝒮 (Gm ), m = 1, . . . , 3d , generated by the coarse grids GT∗ × Gm , d ∗ m = 1, . . . , 3 , of the first (dynamic) level (l = 1) and GT = {tn+n̄ , n̄ = 1, 2, . . . , nt }. Since l l Gnl ∩ Gm = ⌀ ⇒ GT∗ × ℳ𝒮 (Gnl ) ∩ GT∗ × ℳ𝒮 (Gm ) = ⌀,

n ≠ m, l ≠ 0

5 Parallel solution of time-dependent problems

� 91

(Property 1, p. 41), FVD of (3.11) on the coarse grids of the first level results in 3d independent IBVPs that can be solved in parallel. Figure 3.10 represents the parallel solution process of three independent IBVPs on the coarse grids of the first level and on the five time steps (d = 1, p = 3, nt = 5).

Figure 3.10: Geometric parallelism or parallelism in space: parallel solution of independent 1D IBVPs on the coarse grids G11 , G21 , and G31 of the first level (d = 1, p = 3, nt = 5). 1 The numerical solution of (3.15) obtained on the multigrid structures GT∗ × ℳ𝒮 (Gm ), d m = 1, . . . , 3 , can be abbreviated as

c1 = A−1 1 R0→1 r 0 . The problem-independent prolongation operator P1→0 transfers the solution (coarse grid correction) to the finest grid GT∗ × G10 , where it is used as a starting guess

92 � III Parallel robust multigrid technique

(c0 )

(0)

= P1→0 c1 = P1→0 A−1 1 R0→1 r 0 .

(3.16)

As opposed to elliptic problems, it is difficult to estimate the accuracy of the correction computation. If the spatial discretization error significantly exceeds the time discretization error, then a close-to-BVP estimation is expected: the difference of the starting guess (c0 )(0) (3.16) and the finest grid solution A−1 0 r 0 does not exceed ℓ significant digits, where ℓ is the serial number of the dynamic finest level (Figure 3.5). If the time discretization error significantly exceeds the spatial discretization error, then the triple coarsening does not affect the accuracy of the correction computations. Since R0→0 = I, parallel smoothing iterations on the finest grid GT∗ × G10 are defined by (ν) W0 (c(ν+1) − c(ν) 0 0 ) = r 0 − A0 c 0 ,

ν = 0, 1, . . . .

As an example, we consider the heat equation 𝜕2 u 𝜕2 u 𝜕2 u 𝜕u = a( 2 + 2 + 2 ) + f (t, x, y, z) 𝜕t 𝜕x 𝜕y 𝜕z

(3.17)

in the domain (0, T) × [0, 1]3 . The exact solution of (3.17) u(t, x, y, z) = sin(2πt) + sin(2πmx) sin(2πmy) sin(2πmz)

(3.18)

defines the right-hand side function f (t, x, y, z) = 2π cos(2πt) + 12aπ 2 m2 sin(2πmx) sin(2πmy) sin(2πmz),

(3.19)

boundary conditions u(t, 0, y, z) = u(t, 1, y, z) = u(t, x, 0, z) = u(t, x, 1, z) = u(t, x, y, 0) = u(t, x, y, 1) = sin(2πt), and initial conditions u(0, x, y, z) = sin(2πmx) sin(2πmy) sin(2πmz). Here m = 1, 2, 3, . . . is an integer parameter, and spatial discretization error depends sensitively on m. To illustrate the geometric parallelism (Figure 3.10), the heat equation (3.17) is approximated (i. e., discrete analogue (3.13) with û = 0) and solved on the multigrid structures GT∗ × ℳ𝒮 (G10 ) and GT∗ × ℳ𝒮 (G11 ). Recall that ℳ𝒮 (G11 ) is the multigrid structure generated by the first grid G11 of the dynamic level l = 1. The results of computations (nt = 15, ns = 152 (hx = hy = hz = 1/ns = 1/152), m = 2, 3) are shown in Figure 3.11, where the error of the numerical solution is computed by

5 Parallel solution of time-dependent problems � 93

󵄨 󵄨 (n+1) − u(t (n+1) , x{i} , y{j} , z{k} )󵄨󵄨󵄨 ‖e‖∞ = max󵄨󵄨󵄨u{ijk} {ijk}

on nt + 1 time steps.

Figure 3.11: Error of the numerical solution obtained on the multigrid structures nt × ℳ𝒮(G10 ) and nt × ℳ𝒮(G11 ).

Obviously, the heat equation (3.17) tends to the Cauchy problem ut′ = f as a → 0. In this case the error of numerical solution depends weakly on the spatial discretization errors (i. e., on the spatial mesh size hx = hy = hz and m). If a ≫ 0, then the error estimation for BVPs (2.48) can be used: the difference of the initial guess formed by the dynamic level solutions and the finest grid solution does not exceed one significant digit for the second-order discretizations (Figure 3.11). As for BVPs (Section 3, Chapter III, Figure 3.6), the multicolored Gauss–Seidel method can be used for the parallel smoothing on the finest grid. Figure 3.12 shows this parallel smoothing process for 1D IBVP (algebraic parallelism or parallelism in time). For a completely consistent smoother (Section 2, Chapter IV), we have ‖b0 − A0 û 0 ‖ (1)

‖b0 − A0 û 0 ‖ (0)

≤ CA η(ν) < 1

if the smoothing property (4.28) and approximation property (4.29) hold (Section 4, Chap(0) (1) ter IV). Here û 0 is an initial approximation to the solution, and û 0 is an approximation to the solution after the first multigrid iteration. Reduction of the residual norm means the convergence of the multigrid iterations.

94 � III Parallel robust multigrid technique

Figure 3.12: Algebraic parallelism or parallelism in time: parallel 1D two-colored block smoothing on the finest grid (d = 1, p = 3, nt = 5): a) updating of the corrections forming white blocks of the unknowns (∘); b) updating of the corrections forming black blocks of the unknowns (∙).

Let the nt × (ns + 1)3 uniform computational grid (nt = 15, ns = 152, ht = hx = hy = hz =

1/152) be generated in the domain (0, T) × [0, 1]3 . The Crank–Nicolson scheme becomes (n+1) (n) uijk − uijk

ht

=

a h (n) 1 (n) (n+1) (Δ u + Δh u(n+1) ) + (fijk + fijk ), 2 2

(n+1) where Δh is the discrete Laplace operator (3.13). The unknowns uijk , ns = 0, nt , i, j = 2, n − 1, (nb − 1)25 + 2 ≤ k ≤ nb 25 + 1 form two-colored blocks of unknowns: white nb = 1, 3, 5 and black nb = 2, 4, 6 blocks as shown in Figure 3.12. The Crank–Nicolson scheme can be rewritten as (n+1) uijk =

a ht (n+1) a ht (n+1) (n+1) (n+1) (n+1) (n+1) (u − 2uijk + ui+1jk )+ (u − 2uijk + uij+1k ) 2 hx2 i−1jk 2 hy2 ij−1k +

a ht (n+1) (n+1) (n+1) (n) (u − 2uijk + uijk+1 ) + Sijk , 2 hz2 ijk−1

where the source term becomes

5 Parallel solution of time-dependent problems � 95

h (n) a (n+1) ht Δu(n) + t (fijk + fijk ) 2 2 a ht (n) a ht (n) (n) (n) (n) (n) (n) = uijk + (ui−1jk − 2uijk + ui+1jk )+ (u − 2uijk + uij+1k ) 2 2 hx 2 hy2 ij−1k

(n) (n) Sijk = uijk +

+

h (n) a ht (n) (n) (n) (n+1) (u − 2uijk + uijk+1 ) + t (fijk + fijk ). 2 hz2 ijk−1 2

The smoothing iteration ν takes the form do Color = 1, 2 do nb = Color, pColor, 2

do k = (nb − 1)25 + 2, nb 25 + 1 do j = 2, ns − 1

do i = 2, ns − 1 do t = 0, nt

(n+1+t) uijk

(n+1+t) (n+1+t) (n+1+t) (n+1+t) (n+1+t) (n+1+t) := ℏ̂ x (ui−1jk + ui+1jk ) + ℏ̂ y (uij−1k + uij+1k ) + ℏ̂ z (uijk−1 + uijk+1 )

+

1 S (n+t) , 1 + 2(ℏx + ℏy + ℏz ) ijk

(3.20)

end do end do end do end do end do end do where := denotes assignment “left equals right”, Color = 1, 2 for this two-colored block ordering, p = 3 is the number of threads, and ℏ̂ x = ℏx =

ℏx , 1 + 2(ℏx + ℏy + ℏz )

a ht , 2 hx2

ℏ̂ y = ℏy =

ℏy

1 + 2(ℏx + ℏy + ℏz )

a ht , 2 hy2

,

ℏ̂ z = ℏz =

ℏz , 1 + 2(ℏx + ℏy + ℏz ) a ht . 2 hz2

Figure 3.13 represents the convergence of iterations (3.20) in $n_t + 1$ time steps ($n_t = 15$), where the residual norm is given by

$$\|r^{(\nu)}\|_\infty = \max_{ijk}\Bigl| \frac{u^{(n+1)}_{ijk} - u^{(n)}_{ijk}}{h_t} - \frac{a}{2}\bigl(\Delta^h u^{(n)} + \Delta^h u^{(n+1)}\bigr) - \frac{1}{2}\bigl(f^{(n)}_{ijk} + f^{(n+1)}_{ijk}\bigr) \Bigr|,$$

and the error is given by

$$\|e^{(\nu)}\|_\infty = \max_{ijk}\bigl|u^{(n+1)}_{ijk} - u(t^{(n+1)}, x_i, y_j, z_k)\bigr|,$$

where $u(t^{(n+1)}, x_i, y_j, z_k)$ is the exact solution (3.18).

Figure 3.13: Convergence of iterations (3.20) in nt + 1 time steps (nt = 15, n = 152, nb = 6, m = 1, p = 1).

The parallel iterations (3.20) are stopped by the stopping criterion $\|r\|_\infty < 10^{-6}$. Figure 3.14 represents the efficiency E (3.7) of parallel iterations (3.20) for solving (3.17); full parallelism is achieved: E ≈ 1.

Figure 3.14: Efficiency E (3.7) of parallel iterations (3.20) for solving (3.17) (nt = 15, n = 152, nb = 6, m = 1, p = 3).


Figure 3.15 demonstrates the overall efficiency (3.7) of the parallel RMT in solving the model IBVP (3.17) (p = 3, 27, a = $10^{-2}$). It should be emphasized that the first difference between parallel RMT-based algorithms for solving BVPs and IBVPs is the number of time steps: one step for BVPs ($n_t = 1$) and several steps for IBVPs ($n_t > 1$, the memory requirements being approximately proportional to the number of time steps).

Figure 3.15: Overall efficiency (3.7) of the parallel RMT for solving IBVP (3.17), (3.18) (geometric and algebraic parallelisms).

The second difference is the dependence of the RMT-based algorithm on the IBVP being solved. FVD of time-dependent problems on the coarse levels leads to resulting SLAEs with a well-conditioned coefficient matrix. Therefore it is necessary to perform a comparative analysis of the algorithmic complexity of the single-grid Gauss–Seidel method and RMT. The number of single-grid Gauss–Seidel iterations is $\Theta = (n_1^0)^\varkappa$, where $\varkappa$ depends on the condition number of the coefficient matrix A and d = 2, 3 (Section 5, Chapter I). On the coarse levels the discretization parameter is

$$n_k^l \approx \frac{n_1^0}{3^l}, \qquad l = 0, 1, 2, \ldots, L_3^+,$$

so the number of Gauss–Seidel iterations becomes

$$\Theta = \bigl(n_k^l\bigr)^\varkappa \approx \Bigl(\frac{n_1^0}{3^l}\Bigr)^\varkappa.$$

Taking (2.7) into account, the number of Gauss–Seidel iterations on the coarse grids of level l can be estimated by

$$\Theta_l = \bigl(n_k^l\bigr)^\varkappa \approx \Bigl(\frac{n_1^0}{3^l}\Bigr)^\varkappa \approx \Bigl(\frac{3^{L_3^+ + 1}}{3^l}\Bigr)^\varkappa = 3^{\varkappa(L_3^+ - l + 1)}.$$

Assume that three Gauss–Seidel iterations should be performed on the coarse grids of level $L_3^*$ to reach a stopping criterion. Then

$$\Theta_{L_3^*} = 3 \;\Rightarrow\; 3 = 3^{\varkappa(L_3^+ - L_3^* + 1)} \;\Rightarrow\; 1 = \varkappa\bigl(L_3^+ - L_3^* + 1\bigr). \tag{3.21}$$

If three smoothing iterations are needed to obtain a numerical solution on level $L_3^*$, then it makes no sense to perform smoothing iterations on the coarser level $L_3^* + 1$. In the following, some coarse grids, which are considered to be the coarsest grids in the solution process, will be called the dynamic coarsest grids. These grids form the dynamic coarsest level $L_3^*$: $0 \le L_3^* \le L_3^+$. Comparison of the algorithmic complexity of the single-grid Gauss–Seidel method (1.13)

$$\mathcal{W}_{\mathrm{GS}} = C n_b^{-2} N^{3 + \varkappa/d} \ \text{ao}$$

and RMT with the Gauss–Seidel smoother and the dynamic coarsest level (Section 6, Chapter II)

$$\mathcal{W}_{\mathrm{RMT}} = C q \nu n_b^{-2} N^{3} \bigl(L_3^* + 1\bigr) \ \text{ao}$$

leads to

$$\frac{\mathcal{W}_{\mathrm{GS}}}{\mathcal{W}_{\mathrm{RMT}}} = \frac{C n_b^{-2} N^{3 + \varkappa/d}}{C q \nu n_b^{-2} N^{3} (L_3^* + 1)} = \frac{1}{q\nu}\,\frac{(n_1^0)^\varkappa}{L_3^* + 1} = \frac{1}{q\nu}\,\frac{3^{\varkappa(L_3^+ + 1)}}{L_3^* + 1},$$

where q and ν are the numbers of the multigrid and smoothing iterations, respectively. Using (3.21) for the elimination of the parameter ϰ, we achieve the final ratio of the complexities (speed-up)

$$\frac{\mathcal{W}_{\mathrm{GS}}}{\mathcal{W}_{\mathrm{RMT}}} = \frac{1}{q\nu}\,\frac{3^{\frac{L_3^+ + 1}{L_3^+ - L_3^* + 1}}}{L_3^* + 1}$$

and can consider the following particular cases:
a) ϰ = 2d (the algorithmic complexity of the Gauss–Seidel method is equal to the algorithmic complexity of the Gaussian elimination):

$$\Theta = \bigl(n_1^0\bigr)^{2d} \;\Rightarrow\; L_3^* = L_3^+ \;\Rightarrow\; \frac{\mathcal{W}_{\mathrm{GS}}}{\mathcal{W}_{\mathrm{RMT}}} = \frac{1}{q\nu}\,\frac{3^{2d(L_3^+ + 1)}}{L_3^+ + 1} \gg 1,$$

i. e., an impressive benefit of RMT is expected (Figure 3.17).


b) ϰ = 1:

$$\Theta = n_1^0 \;\Rightarrow\; L_3^* = L_3^+ \;\Rightarrow\; \frac{\mathcal{W}_{\mathrm{GS}}}{\mathcal{W}_{\mathrm{RMT}}} = \frac{1}{q\nu}\,\frac{3^{L_3^+ + 1}}{L_3^+ + 1} \gg 1,$$

i. e., the benefit of RMT is reduced as compared to ϰ = 2d (Figure 3.17). The condition $L_3^* = L_3^+$ (3.21) means that the multigrid schedule shown in Figure 3.5 should be used for this case.
c) ϰ = 1/2:

$$\Theta = \sqrt{n_1^0} \;\Rightarrow\; L_3^* = L_3^+ - 1 \;\Rightarrow\; \frac{\mathcal{W}_{\mathrm{GS}}}{\mathcal{W}_{\mathrm{RMT}}} = \frac{1}{q\nu}\,\frac{3^{\frac{L_3^+ + 1}{2}}}{L_3^+} > 1,$$

i. e., the benefit of RMT is reduced as compared to ϰ = 1 (Figure 3.17). The condition $L_3^* = L_3^+ - 1$ (3.21) means that the multigrid schedule shown in Figure 3.16 (low-thread parallelization) should be used for this case.

Figure 3.16: Parallel cycle of RMT for solving IBVPs ($\Theta = \sqrt{n_1^0}$): ∘ – geometric parallelism, ∙ – algebraic parallelism.

d) ϰ = 1/3:

$$\Theta = \sqrt[3]{n_1^0} \;\Rightarrow\; L_3^* = L_3^+ - 2 \;\Rightarrow\; \frac{\mathcal{W}_{\mathrm{GS}}}{\mathcal{W}_{\mathrm{RMT}}} = \frac{1}{q\nu}\,\frac{3^{\frac{L_3^+ + 1}{3}}}{L_3^+ - 1} > 1,$$

i. e., the benefit of RMT is reduced as compared to ϰ = 1/2 (Figure 3.17). The condition $L_3^* = L_3^+ - 2$ (3.21) means that the multigrid schedule shown in Figure 3.18 (low-thread parallelization) should be used for this case.


Figure 3.17: Dynamic coarsest level $L_3^*$.

Figure 3.18: Parallel cycle of RMT for solving IBVPs ($\Theta = \sqrt[3]{n_1^0}$): ∘ – geometric parallelism, ∙ – algebraic parallelism.

e) ϰ = 1/$L_3^+$:

$$\Theta = \bigl(n_1^0\bigr)^{1/L_3^+} \;\Rightarrow\; L_3^* = 1 \;\Rightarrow\; \frac{\mathcal{W}_{\mathrm{GS}}}{\mathcal{W}_{\mathrm{RMT}}} = \frac{1}{q\nu}\,\frac{3^{\frac{L_3^+ + 1}{L_3^+}}}{2} \approx 1,$$

i. e., the algorithmic complexity of RMT is equivalent to that of a single-grid smoother.
If the number of Gauss–Seidel iterations on the finest grid is $\Theta = (n_1^0)^\varkappa$, $\varkappa \in (1/L_3^+, 1)$, then the discrete IBVP is a well-conditioned problem (obviously, the smaller the time steps, the stronger the diagonal dominance, and the better the convergence rate), and it is an ill-conditioned problem for $\varkappa \in [1, 2d)$. For the above example, the ratio of the Gauss–Seidel/RMT complexities (speed-up) depends on the constant $\varkappa \in (1/L_3^+, 2d)$ as follows:


the smaller ϰ, the smaller the speed-up; for ϰ → 0 we have $L_3^* \to 0$, and RMT transforms into the single-grid Gauss–Seidel method. In practice, the dynamic coarsest level needed to solve (non)linear IBVPs can be determined by black-box optimization on the multigrid structure (Section 7, Chapter II). Starting from the same initial guess, the number of smoothing iterations ($\Theta_l$) can be determined experimentally on several coarse grids of the same level $l = L_3^+, L_3^+ - 1, \ldots, 1$. Only a few smoothing iterations are needed for convergence on the dynamic coarsest level $L_3^*$. The amount of work for the black-box optimization on the multigrid structure is negligible.
Conclusion. RMT can be used for the unified parallel solution of BVPs and IBVPs; the main differences lie in the number of time steps treated in parallel (one time step for BVPs and several time steps for IBVPs) and in the number of extra grid levels ($L_3^+$ for ill-conditioned problems and $L_3^*$ for well-conditioned problems). As a rule, if the time steps become very small, even simple smoothing methods may have good convergence properties ($\Theta = n^\varkappa$, ϰ < 1) and may be comparable to (ϰ < 1), or even more efficient than (ϰ ≪ 1), RMT. The larger the time steps (ϰ ∈ [1, 2d)), the more is gained by RMT. The dynamic coarsest level can be determined by black-box optimization (Section 7, Chapter II).
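The speed-up estimate can be illustrated numerically. The following short sketch (not from the book; the values of q, ν, and $L_3^+$ are assumed and chosen only for this illustration) evaluates the final ratio for the dynamic coarsest levels of cases b)–e):

    # speed-up W_GS / W_RMT = 3**((L3p + 1)/(L3p - L3s + 1)) / (q * nu * (L3s + 1))
    L3p = 8          # assumed number of coarse levels L_3^+
    q, nu = 1, 1     # assumed numbers of multigrid and smoothing iterations

    def speedup(L3s):
        return 3.0 ** ((L3p + 1) / (L3p - L3s + 1)) / (q * nu * (L3s + 1))

    for kappa, L3s in [(1.0, L3p), (0.5, L3p - 1), (1.0 / 3.0, L3p - 2), (1.0 / L3p, 1)]:
        print(f"kappa = {kappa:.3f}: L3* = {L3s}, W_GS/W_RMT = {speedup(L3s):.2f}")

For these assumed values the printed ratios decrease monotonically with ϰ and approach unity for ϰ = 1/$L_3^+$, in line with cases b)–e) above.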

6 Summary

A parallel system needs to have p = 3^κ or p = 3^{d+κ} computing units for the low- and middle-unit parallelization of RMT, respectively (κ = 1, . . . , d). There are two parallel implementations of RMT:
1) Large-scale granular geometric parallelism: the number of computing units is less than the number of grids of any coarse level. The smoother-independent geometric parallelism is based on the nonoverlapping finest grid partition for the distribution of discrete tasks over p computing units (Section 2, Chapter III).
2) Small-scale granular algebraic parallelism: the number of computing units is greater than the number of grids of any fine level. The grid-independent algebraic parallelism is based on the multicolor ordering of unknowns for the distribution of the discrete problems arising in the smoothing iterations over p computing units (Section 3, Chapter III).

The parallel cycle has a modified first multigrid iteration (Figure 3.5). The basic idea of this modification is to use the multigrid structures generated by the dynamic finest grids, where all discrete BVPs are independent and can be solved in full parallel (Section 4, Chapter III). The assemblage of the numerical solutions obtained on these multigrid structures is used as the starting guess for the finest grid solution. The difference between the starting guess and the desired numerical solution on the finest grid does not exceed ℓ significant digits for the second-order discretization, where ℓ is the serial number of the dynamic finest level (Figure 3.5). In addition to solving the discrete BVPs, the first multigrid iteration makes it possible to optimize the computational algorithm in the black-box manner for
1) determination of the optimal ordering of the unknowns;
2) determination of the optimal smoother and the (segregated or coupled) type of smoothing iterations for systems;
3) determination of the dynamic coarsest level for time-dependent problems;
4) adaptation of all grids to the behavior of the solution.

RMT can be used for the unified parallel solution of BVPs and IBVPs; the main differences lie in the number of time steps treated in parallel and in the number of grid levels (one time step for BVPs ($n_t = 1$) and several time steps for IBVPs ($n_t > 1$), Section 5, Chapter III). The number of levels and the efficiency of parallel RMT decrease for well-conditioned IBVPs.

For a given architecture, the hardware, software, memory/cache organization, operating system, and compiler have an essential impact on the overall efficiency of the parallel RMT-based algorithms. Development of parallel RMT-based software should be organized as follows:
a) Testing a subroutine that implements geometric parallelism. Since independent discrete BVPs or IBVPs are solved in parallel, almost full parallelism should be achieved.
b) Testing a subroutine that implements algebraic parallelism. Parallel smoothing on the finer levels is a crucial RMT component with respect to overall parallelism. As a rule, Vanka-type smoothers give much better parallel efficiencies than point ones since more operations can be carried out in parallel.

IV Background in iterative methods

This chapter contains the theoretical aspects of the algorithms used for the numerical solution of the resulting systems of linear algebraic equations obtained from discrete multidimensional (initial-)boundary value problems. It should be emphasized that the results of linear convergence analysis are completely useless for solving nonlinear applied problems. For a detailed description, we refer to [1, 11, 13, 14, 28, 30].

1 Basic notation and definitions

Let ℝ denote the set of real numbers. A square matrix has the same number of rows as columns:

$$A \in \mathbb{R}^{N \times N} \;\Rightarrow\; A = \{a_{ij}\} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{pmatrix}, \qquad a_{ij} \in \mathbb{R}.$$

The space of all N-by-N real matrices will be denoted as $\mathbb{R}^{N \times N}$. If a capital letter is used to denote a matrix (A, B, C), then the corresponding lower-case letter with subscript ij refers to the (i, j) entry $a_{ij}$, $b_{ij}$, $c_{ij}$.

Definition IV.1. A vector norm on a vector space 𝕏 is a real-valued function x → ‖x‖ on 𝕏 that satisfies the following three conditions:
1. ‖x‖ ⩾ 0 ∀x ∈ 𝕏, and ‖x‖ = 0 iff x = 0.
2. ‖αx‖ = |α| ‖x‖ ∀x ∈ 𝕏, ∀α ∈ ℝ.
3. ‖x + y‖ ⩽ ‖x‖ + ‖y‖ ∀x, y ∈ 𝕏.

The most commonly used vector norms in numerical linear algebra are particular cases of the Hölder norms

$$\|x\|_p = \Bigl(\sum_{i=1}^{n} |x_i|^p\Bigr)^{1/p}.$$

The cases p = 1, p = 2, and p = ∞ lead to the most important norms in practice:

$$\|x\|_1 = \sum_{i=1}^{n} |x_i|, \qquad \|x\|_2 = \Bigl(\sum_{i=1}^{n} |x_i|^2\Bigr)^{1/2}, \qquad \|x\|_\infty = \max_{i=1,\ldots,n} |x_i|. \tag{4.1}$$

All vector norms are equivalent, that is, if ‖⋅‖_α and ‖⋅‖_β are two vector norms on $\mathbb{R}^N$, then there are positive constants $c_1$ and $c_2$ such that $c_1 \|x\|_\alpha \le \|x\|_\beta \le c_2 \|x\|_\alpha$ for all $x \in \mathbb{R}^N$.

Definition IV.2. For a matrix $A \in \mathbb{R}^{N \times N}$, the following special set of p-norms is defined:

$$\|A\|_p = \max_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p}.$$

A fundamental property of the p-norm is that $\|AB\|_p \le \|A\|_p \|B\|_p$ and $\|Ax\|_p \le \|A\|_p \|x\|_p$. Matrix norms that satisfy the above property are sometimes called consistent. A result of this consistency is that for any square matrix A,

$$\|A^k\|_p \le \|A\|_p^k.$$

All matrix norms are equivalent, that is, if ‖⋅‖_α and ‖⋅‖_β are two matrix norms on $\mathbb{R}^{N \times N}$, then there are positive constants $c_1$ and $c_2$ such that $c_1 \|A\|_\alpha \le \|A\|_\beta \le c_2 \|A\|_\alpha$ for all $A \in \mathbb{R}^{N \times N}$. This often results in a subscript-free norm notation ‖⋅‖.

Basic matrix operations for the matrices $A \in \mathbb{R}^{N \times N}$, $B \in \mathbb{R}^{N \times N}$ and a vector $c \in \mathbb{R}^N$ include
a) Transposition: $B = A^T \Rightarrow b_{ij} = a_{ji}$.
b) Addition: $C = A + B \Rightarrow c_{ij} = a_{ij} + b_{ij}$.
c) Scalar–matrix multiplication (α ∈ ℝ): $C = \alpha A \Rightarrow c_{ij} = \alpha a_{ij}$.
d) Matrix–matrix multiplication: $C = AB \Rightarrow c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$.
e) Matrix–vector multiplication: $b = Ac \Rightarrow b_i = \sum_{k=1}^{n} a_{ik} c_k$.

The inverse of a matrix $A \in \mathbb{R}^{N \times N}$, when it exists, is a matrix B such that BA = AB = I, where I is the identity matrix. The inverse of A is denoted by $A^{-1}$. If $A^{-1}$ exists, then A is considered nonsingular. Otherwise, A is singular.

Definition IV.3. A complex (real) scalar λ is called an eigenvalue of the square matrix A of size N if there exists a nonzero vector $\upsilon \in \mathbb{R}^N$ such that Aυ = λυ. The vector υ is called an eigenvector of A associated with λ. The set of all the eigenvalues of A is called the spectrum of A and is denoted by σ(A).

Definition IV.4. Two matrices A and B are considered similar if there is a nonsingular matrix X such that $A = XBX^{-1}$. The mapping B → A is called a similarity transformation. If two matrices are similar, then they have exactly the same eigenvalues.
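The last statement of Definition IV.4 is easy to verify numerically; the following few lines are an illustration only (the random test matrices are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 4))
    X = rng.standard_normal((4, 4))          # almost surely nonsingular
    A = X @ B @ np.linalg.inv(X)             # similarity transformation
    print(np.sort_complex(np.linalg.eigvals(A)))
    print(np.sort_complex(np.linalg.eigvals(B)))   # same eigenvalues up to round-off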

2 Basic direct and iterative methods

Let us seek the solution of the system of linear algebraic equations (SLAE)

$$\begin{cases} a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1N} x_N = b_1, \\ a_{21} x_1 + a_{22} x_2 + a_{23} x_3 + \cdots + a_{2N} x_N = b_2, \\ a_{31} x_1 + a_{32} x_2 + a_{33} x_3 + \cdots + a_{3N} x_N = b_3, \\ \qquad \vdots \\ a_{N1} x_1 + a_{N2} x_2 + a_{N3} x_3 + \cdots + a_{NN} x_N = b_N, \end{cases} \tag{4.2}$$

or

$$\sum_{j=1}^{N} a_{ij} x_j = b_i, \qquad 1 \le i \le N,$$

where xj (j = 1, 2, . . . , N) are unknown variables, aij (i, j = 1, 2, . . . , N) are the coefficients, and bi (i = 1, 2, . . . , N) are the nonhomogeneous terms. The first subscript i identifies the row of the equation, the second subscript j identifies the column of the system of equations, and N is the number of unknowns.

System (4.2) is written in the matrix notation as

$$Ax = b, \tag{4.3}$$

where

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \ldots & a_{1N} \\ a_{21} & a_{22} & a_{23} & \ldots & a_{2N} \\ a_{31} & a_{32} & a_{33} & \ldots & a_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & a_{N3} & \ldots & a_{NN} \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_N \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_N \end{pmatrix}.$$

The matrix $A \in \mathbb{R}^{N \times N}$ is called the coefficient matrix of system (4.2), b is a right-hand side vector, and x is a vector of unknowns. For a vector x, the vector of residuals denoted by r is r = b − Ax. All vectors are in bold italics. If A is a nonsingular N × N matrix and $b \in \mathbb{R}^N$ is given, then the solution of (4.3), $x = A^{-1} b$, is to be found.

In direct methods the exact solution of SLAE (4.3) can be obtained by performing a finite number of arithmetic operations in the absence of round-off errors. The classical direct method is the Gauss elimination process, which successively replaces (4.3) with an upper triangular system to be solved through backward substitution. As a rule, the number of arithmetic operations needed to solve SLAE (4.3) by direct methods is proportional to $N^3$, i. e., the algorithmic complexity of the direct methods is $\mathcal{W} = O(N^3)$ arithmetic operations. Therefore direct methods are used for solving small-sized SLAEs. Other examples of such direct methods include Gauss–Jordan elimination, the matrix inverse method, and LU (lower–upper) factorization [11].

Iterative methods approach the solution asymptotically with an iterative procedure, starting from an initial guess. The computations continue until a sufficiently accurate approximation to the exact solution $x = A^{-1} b$ is obtained. All iterative methods considered in this book may be expressed in the form

$$W\,\frac{x^{(\nu+1)} - x^{(\nu)}}{\tau} = b - A x^{(\nu)}, \qquad \nu = 0, 1, 2, \ldots, \tag{4.4}$$

where $x^{(0)}$ is a starting guess, and W and τ are a matrix and a relaxation parameter, respectively. In this expression the superscript ν denotes an iteration counter, and the nonsingular matrix W is called a splitting matrix. Let iteration (4.4) be rewritten as

$$x^{(\nu+1)} = \bigl(I - \tau W^{-1} A\bigr) x^{(\nu)} + \tau W^{-1} b = S x^{(\nu)} + \tau W^{-1} b, \tag{4.5}$$




107

where the matrix S = I − τW −1 A is called the an iteration matrix for method (4.4). The iterative method (4.5) converges to a limit if lim x (ν) = x = A−1 b.

ν→∞

The iterative method (4.5) whose related system (I − S)x = τW −1 b has a unique solution x, which is the same as the solution of (4.3), is said to be completely consistent. The basic iterative method (4.4) is always assumed to be completely consistent since this property seems essential for any reasonable method. Definition IV.5. The residual after ν iterations is r (ν) = b − Ax (ν) . Definition IV.6. The exact error after ν iterations is x − x (ν) . Definition IV.7. The iteration error after ν iterations is x (ν+1) − x (ν) . The rigorous matrix analysis gives the following convergence theorem. Theorem IV.1. Let S be a square matrix such that maxλ∈σ(S) |λ| < 1. Then I −S is nonsingular, and iteration (4.5) converges for any b and x (0) . Conversely, if iteration (4.5) converges for any b and any starting vector x (0) , then maxλ∈σ(S) |λ| < 1. Theorem IV.2 (Sufficient condition for convergence). Let S be a square matrix such that ‖S‖ < 1 for any matrix norm ‖ ⋅ ‖. Then I − S is nonsingular, and iteration (4.5) converges for any b and starting vector x (0) . For the completely consistent iterative methods, we have x = Sx + τW −1 b. Subtracting (4.5), we obtain the equation x − x (ν+1) = S(x − x (ν) ), or x − x (ν) = S ν (x − x (0) ),

ν = 1, 2, 3, . . . .

Multiplying the equation by the matrix A gives Ax − Ax (ν) = AS ν (x − x (0) ) = AS ν A−1 (Ax − Ax (0) ). Since b = Ax (4.3), we have the following equation: ν

r (ν) = AS ν A−1 r (0) = (ASA−1 ) r (0) ,

108 � IV Background in iterative methods where r (ν) = b − Ax (ν) is the residual after ν iterations (see Definition IV.5). This means that the residual norm is estimated by 󵄩󵄩 (ν) 󵄩󵄩 󵄩󵄩 −1 󵄩ν 󵄩 (0) 󵄩 󵄩󵄩r 󵄩󵄩 ≤ 󵄩󵄩ASA 󵄩󵄩󵄩 󵄩󵄩󵄩r 󵄩󵄩󵄩. Let us introduce the average reduction factor of the residual ρν = (

1/ν

‖r (ν) ‖ ) ‖r (0) ‖

(4.6)

,

which shows the averaged reduction of the residual over ν iterations. Finally, we have the estimate 󵄩 󵄩 ρν ≤ 󵄩󵄩󵄩ASA−1 󵄩󵄩󵄩.

(4.7)

The inequality ρν ≤ ‖ASA−1 ‖ < 1 means the convergence of iterations (4.4). The goal of this linear analysis based on (4.7) is to prove that ‖ASA−1 ‖ < 1. Inequality (4.7) is more convenient for a theoretical analysis of convergence of the two-grid algorithm and RMT (Section 4, Сhapter IV). Recall that the matrices S and ASA−1 have the same spectrum (see Definition IV.4). Many iterative methods begin with the decomposition of the coefficient matrix A (4.3) as a sum or product of easily invertible matrices, for example A = L + D + U, in which D is the diagonal of A, L its strict lower part, and U is its strict upper part [28]. It is always assumed that the diagonal entries of A are all nonzero. The Gauss–Seidel iterations are defined by W = L + D and τ = 1 in (4.4), i. e., (L + D)(x (ν+1) − x (ν) ) = b − Ax (ν) ,

ν = 0, 1, 2, . . . .

(4.8)

The Gauss–Seidel method is used in RMT as a smoother (Section 4, Chapter I). Often, the damped Jacobi iterations defined by W = D and τ < 1 are used for theoretical convergence analysis. For any nonsingular matrices A, the condition number cond(A) of the matrix A is defined by 󵄩 󵄩 cond(A) = ‖A‖ ⋅ 󵄩󵄩󵄩A−1 󵄩󵄩󵄩. A problem with low condition number is said to be well conditioned, whereas a problem with high condition number is said to be ill conditioned.

3 Computational grids



109

3 Computational grids Computational grid generation is the most important stage in the mathematical modeling of physical and chemical processes. As a rule, the grids determine the computational efforts and accuracy of the numerical solution. Numerous requirements on the grid properties make the grid generation in complex domains a very difficult mathematical problem: 1) Consistency with the physical domain boundaries In many cases the grid points (vertices or finite volume faces) must be located on the physical domain boundaries (so-called boundary-fitted grids), but it is difficult to formalize and automate 3D boundary-fitted grid generation in black-box software. 2) Consistency with the physical parameters and functions Some mathematical model results in a strong gradient in their solution (so-called boundary layers). Boundary layers may require the use of extremely fine and anisotropic grids near certain boundaries of the computational domain. To resolve such thin boundary layers, highly stretched cells need to be employed. This results in the use of adaptive grids, which are constructed automatically during the solution process according to an error estimator that takes the behavior of the solution into account. 3) Simplicity and accuracy of differential operator approximations Highly stretched cells complicate the approximation of the governing differential equations and can lead to an additional approximation error. Therefore it is very important to control the cell shape in the process of the computational grid generation. Here we give a rough survey of globally structured grids, locally structured grids, and unstructured grids that are used in RMT. The 1D uniform grid for FVD of BVPs on the unit interval Ω = [0, 1] is the first to be discussed. The most elementary partition in subintervals is the one where the step h is constant. Having chosen the number of subintervals (discretization parameter) nx , hx = 1/nx is posed, and two sets of points xiv and xif are introduced: xiv = xif =

i−1 = (i − 1)hx , nx

v xiv + xi+1 2i − 1 = hx , 2 2

i = 1, 2, . . . , nx + 1,

(4.9a)

i = 1, 2, . . . , nx .

(4.9b)

For the uniform grid, we have v f xi+1 − xiv = xif − xi−1 = hx = const.

The points xiv and xif can be vertices or finite volume faces, respectively. It is hard to say which configuration of the finite volumes is best in general. Often, this choice depends on

110 � IV Background in iterative methods the type of boundary conditions and on the differential problem. Figure 2.2 represents an example of 1D grid generation for nx = 8. The simplest grid for solving multidimensional problems is a Cartesian grid, which is composed of squares (cubes in 3D) and aligned with the Cartesian coordinate axes. If the domain boundaries do not coincide with the regular mesh lines, Cartesian grids can be generated easily and with low computational effort even for geometrically complex domains. In general, the weakness of Cartesian grids can be the solution accuracy at the domain boundaries (Figure 4.1). This problem becomes especially serious in the case of RANS simulations, where highly stretched and boundary orthogonal cells are required. In the following, such Cartesian grids will be used for computations of the coarse grid corrections in a two-grid algorithm.

Figure 4.1: A Cartesian grid in a geometrically complex domain.

Structured boundary-fitted grids can be generated if the given (physical) domain can be mapped to a rectangular (computational) domain. Figure 4.2 shows how to transform a nonrectangular region [A, B, C, D] in the physical plane into a square uniformly spaced grid in the computational plane: a) Direct mapping [A, B, C, D] → [0, 1] × [0, 1]: x , xD y(x) ȳ = . δ(x)

x̄ =

b) Inverse mapping [0, 1] × [0, 1] → [A, B, C, D]: x = xD x,̄

y = δ(x)y.̄

3 Computational grids

� 111

In the context of interdomain mappings, there are two different approaches. In the first, coordinate transformations are used to obtain simple domains and correspondingly simple (rectangular) grids. Here the differential (and/or the discrete) equations are transformed into the new curvilinear coordinates. In the second approach, the computations are performed in the physical domain with the original (nontransformed) equations [35].

Figure 4.2: Interdomain mapping.

It is well known that the accuracy of the solution is degraded by grid distortion. For high accuracy, the grid should be orthogonal or near-orthogonal. Assume that two functions U(x, y) and V (x, y) are defined in some 2D domain (Figure 4.3). A computational grid will be generated as an intersection of these function isolines. Let F(x0 , y0 ) be some point of intersection of the isolines U(x, y) = const and V (x, y) = const. The tangents Y U (x) and Y V (x) to the isolines U(x, y) = const and V (x, y) = const become Y U (x) = y0 − Y V (x) = y0 −

Ux′ (x0 , y0 ) (x − x0 ), Uy′ (x0 , y0 )

Vx′ (x0 , y0 ) (x − x0 ). Vy′ (x0 , y0 )

The isoline perpendicularity condition takes the form (Figure 4.4) Ux′ (x0 , y0 ) Vx′ (x0 , y0 ) = −1. Uy′ (x0 , y0 ) Vy′ (x0 , y0 ) An arbitrary choice of the point F(x0 , y0 ) leads to the orthogonality condition of the computational grid

112 � IV Background in iterative methods Ux′ Vx′ + Uy′ Vy′ = 0.

(4.10)

Assume that these functions U(x, y) and V (x, y) are related to a function Ψ(x, y) by U = Ψ′x + Ψ′y , V=

Ψ′x



Ψ′y .

(4.11a) (4.11b)

Substitution of (4.11) into (4.10) leads to the following lemma.

Figure 4.3: 2D grid generation.

Figure 4.4: The isoline perpendicularity.

′′ ′′ ′′ Lemma IV.1. If a function Ψ(x, y) satisfies to Ψ′′ xx − Ψyy = 0 or Ψxx + Ψyy = 0, then the ′ ′ ′ ′ isolines of the functions U = Ψx + Ψy and V = Ψx − Ψy form an orthogonal grid.

Unfortunately, this approach cannot be used in the 3D case. It is easy to see that the grid generation approach based on interdomain mapping can be used if the domain geometry is simple enough. More generally, all the types of grids mentioned above can be used in the context of overlapping grids [35]. A typical situation for the use of overlapping grids is where an overall Cartesian grid is combined with a local boundary-fitted grid. An example of this approach is shown in Figure 4.5.

3 Computational grids



113

Figure 4.5: Overlapping boundary-fitted grids.

Unstructured automatic mesh generation is much easier than the generation of block-structured grids for very complicated domains. From the multigrid point of view, unstructured grids are a complication. For a given unstructured grid (Figure 4.6), it is usually not difficult to define a sequence of finer grids, but it can be difficult to define a sequence of reasonable coarser grids [35].

Figure 4.6: An unstructured grid in the geometrically complex domain.

Thus unstructured grids are easier to generate in complex domains, but it is more difficult to develop an efficient algorithm for solving (initial-)boundary value problems. In contrast, the structured grids are much more difficult to generate, but they allow the development of an efficient geometric multigrid algorithm for solving (initial-)boundary value problems. The auxiliary space method (Section 7, Сhapter I) is one of attempts to combine the advantages of both approaches. The numerical solution of PDEs strongly depends on the computational grid, so the grid must be adapted to the features of the desired numerical solution, i. e., the grid is a problem-dependent component of the iterative algorithm. In an adaptive method the grid and/or the discretization (type, order) are adapted to the behavior of the solution to solve a given problem more efficiently and/or more accurately [35]. The most popular

way to generate an “optimal” grid is to use the self-adaptive grid refinement approach, where the grid refinements are carried out dynamically during the solution process, controlled by some appropriate adaptation criteria. We will show the main idea of this approach using the 1D singularly perturbed equation

$$u' + \varepsilon u'' = 0, \qquad u(0) = 0, \quad u(1) = 1, \tag{4.12}$$

where ε > 0 is a parameter. The exact solution of this problem,

$$u(x) = \frac{1 - \exp(-x/\varepsilon)}{1 - \exp(-1/\varepsilon)},$$

has a sharp gradient (boundary layer) near the boundary x = 0 for ε ≪ 1. The layer thickness depends on the parameter ε. First-order upwind discretization on the uniform grid (4.9) leads to the discrete analogue of the BVP (4.12) φ(0) 1 = 0, a(0) φ(0) i−1

+

b(0) φ(0) i

+

c(0) φ(0) i+1 φ(0) n(0) +1

(4.13a)

= 0,

(4.13b)

= 1,

(4.13c)

where a(0) =

ε + ϖ − 1, h(0)

b(0) = −

2ε − 2ϖ + 1, h(0)

c(0) =

ε + ϖ, h(0)

1 ε ϖ = max( ; 1 − (0) ), 2 h

h(0) = 1/n(0) is the mesh size, and φ(0) is the discrete analogue of u: φ(0) = u(xiv ) [20, 27, 35]. i i Figure 4.7a shows the numerical solution of the resulting SLAE (4.13) for n(0) = 100 and ε = 10−3 . Self-adaptive refinements are particularly useful if the solution shows a certain singular behavior that is not (exactly) known in advance but detected during the solution process [35]. Typically, adaptive grid refinement techniques start with a grid covering the whole computational domain, here the superscript “0” denotes affiliation to the initial (uniform) grid. It is easy to see that the numerical solution of (4.13) becomes less accurate in the sharp gradient subdomain (boundary layer) near the boundary x = 0. Generation of the adaptive grids requires some refinement criterion on the basis of which the algorithm can work and, in particular, also automatically terminate. It is very difficult to propose a robust refinement criterion; therefore the simplest considerations will be used. Since the exact solution u(x) is a differentiable function, it is expected that the discrete left- and right-hand derivatives should be sufficiently close to each other: |φ(0) − φ(0) | i i−1 h(0)



|φ(0) − φ(0) | i+1 i h(0)

as h(0) → 0.

3 Computational grids



115

Figure 4.7: Numerical solution of BVP (4.12) on an adaptive grid.

As a result, the refinement criterion can be defined as 󵄨󵄨 |φ(0) − φ(0) | |φ(0) − φ(0) | 󵄨󵄨 󵄨 󵄨 max󵄨󵄨󵄨 i+1 (0) i − i (0) i−1 󵄨󵄨󵄨 < ϵ, 󵄨󵄨 i 󵄨󵄨 h h where the parameter ϵ depends on 󵄨󵄨 |φ(0) − φ(0) | |φ(0) − φ(0) | 󵄨󵄨 󵄨 󵄨 min󵄨󵄨󵄨 i+1 (0) i − i (0) i−1 󵄨󵄨󵄨. 󵄨󵄨 i 󵄨󵄨 h h The numerical solution φ(0) in the second vertex (0) x2v demonstrates a sharp change i of the first derivatives:

116 � IV Background in iterative methods (0) |φ(0) 2 − φ1 |

h(0)



(0) |φ(0) 3 − φ2 |

h(0)

,

i. e., the initial grid should be refined in the region (0, (0) x3v ). For the given purpose, a local fine subgrid is generated: (1) v xi

= (0) x3v

i−1 , n(1)

i = 1, 2, . . . , n(1) + 1,

where n(1) = 6 (triple refinement). The discrete BVP on the fine subgrid becomes φ(1) 1 = 0, aφ(1) i−1

+

bφ(1) i

+ cφ(1) i+1 (0) φn(1) +1

= 0, =

φ(0) 2 .

(4.14a) (4.14b) (4.14c)

The numerical solution $\varphi_i^{(1)}$ shown in Figure 4.7b demonstrates a sharp change of the first derivatives in the region $(0, {}^{(1)}x_3^v)$. Continuing the grid refinement (Figure 4.7c), a satisfactory boundary layer resolution is obtained (Figure 4.7d). This approach can be used to approximate the coarse grid correction computation in multigrid methods (Figure 4.8). The combination of a Cartesian grid and the adaptive mesh refinement technology is an effective way to handle complex geometry and solve real-life flow problems.
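A compact sketch of one adaptation step is given below. It is an illustration only: the upwind coefficients follow the discretization (4.13) described above, while the refinement test is the simple comparison of left- and right-hand discrete derivatives from the text with a hypothetical threshold eps_ref.

    import numpy as np

    def solve_upwind(n, eps):
        """Solve the upwind-discretized BVP (4.13) on a uniform grid with n cells."""
        h = 1.0 / n
        w = max(0.5, 1.0 - eps / h)                # the parameter of (4.13)
        a = eps / h + w - 1.0
        bco = -2.0 * eps / h - 2.0 * w + 1.0
        c = eps / h + w
        A = np.zeros((n + 1, n + 1)); rhs = np.zeros(n + 1)
        A[0, 0] = 1.0                              # phi_1 = 0
        A[n, n] = 1.0; rhs[n] = 1.0                # phi_{n+1} = 1
        for i in range(1, n):
            A[i, i - 1], A[i, i], A[i, i + 1] = a, bco, c
        return np.linalg.solve(A, rhs), h

    phi, h = solve_upwind(100, 1.0e-3)
    # flag vertices where the discrete left and right derivatives differ strongly
    jump = np.abs((phi[2:] - phi[1:-1]) / h - (phi[1:-1] - phi[:-2]) / h)
    eps_ref = 0.1 * jump.max()                     # hypothetical refinement threshold
    refine = np.where(jump > eps_ref)[0] + 1       # vertex indices to refine around
    print("refine near vertices:", refine[:5], "...")

For ε = 10^{-3} the flagged vertices cluster next to x = 0, so the local fine subgrid is generated exactly in the boundary layer region, as in Figure 4.7.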

Figure 4.8: Adaptively refined grid.

The most fruitful way of constructing a robust algorithm is to combine the basic ingredients (grid generation, approximation of the differential problem, and solution of the resulting grid equations) into a unified computational technique. In the two-grid algorithm (Section 7, Сhapter I), the grid GO for computing the approximation to solu-

3 Computational grids

� 117

tion û (1.29) should be constructed using the auxiliary grid GA for computing the correction c (1.29) and accurate boundary treatment. For these purposes, A. S. Karavaev and S. P. Kopysov developed an original adaptive hexahedral grid generator using a priori information about the solution obtained on the auxiliary grid GA [15, 16]. The goal of this activity is not only to provide numerical solutions of partial differential equation with high accuracy, but also to minimize the difference between the grids GO and GA for more robust problem-dependent intergrid operators ℛO→A (1.34), ℛ̂ O→A (1.35), and 𝒫A→O . To illustrate this process, we consider the hexahedral grid generation in the domain shown in Figure 4.9. An auxiliary grid GO for computing the correction c (1.29) is presented in Figure 4.10. A hexahedral grid GA for computing the approximation to solution û (1.29) is sketched in Figure 4.11.

Figure 4.9: Physical domain.

Another example the hexahedral grid generation is shown in Figure 4.12. Grid generation has been discussed in the traditional numerical literature [2, 9, 10, 17, 18, 19, 25, 34].

118 � IV Background in iterative methods

Figure 4.10: Auxiliary grid GA for computing the correction c (1.29).

4 Auxiliary space preconditioner The solution u(x) = (u(1) (x), u(2) (x), . . . , u(NM ) (x))T of 3D BVP for the following system of linear elliptic partial differential equations needs to be found: NM

∑ ℒΩ u(j) (x) = fΩ(i) (x), (ij)

j=1

NM

(i) (x), ∑ ℒ𝜕Ω u(j) (x) = f𝜕Ω j=1

(ij)

x ∈ Ω, x ∈ 𝜕Ω,

i = 1, 2, . . . , NM ,

(4.15a)

i = 1, 2, . . . , NB .

(4.15b)

Here Ω ∈ ℝ3 is any bounded domain with boundary 𝜕Ω, x = (x1 , x2 , x3 )T , ℒΩ and ℒ𝜕Ω (i) are the given functions in the doare linear elliptic differential operators, and fΩ(i) and f𝜕Ω main Ω and its boundary 𝜕Ω (NM ⩽ NB ). For brevity, this boundary value problem is denoted as ℒu = f , and we assume that it has a unique solution u = ℒ−1 f . (ij)

(ij)

4 Auxiliary space preconditioner

Figure 4.11: Hexahedral grid GO for computing the approximation to solution û (1.29).

Figure 4.12: Example of the hexahedral grid.

� 119

120 � IV Background in iterative methods The solution u(x) of the system ℒu = f (4.15) can be represented as the sum of two functions u = û + c,

(4.16)

where û is an approximation to the solution, and c is a correction. This representation is called Σ-modification of the solution u, and substitution of (4.16) into (4.15) leads to the Σ-modified system ℒc = f − ℒu.̂

(4.17)

Let a computational grid G0 be generated in the domain Ω ∪ 𝜕Ω, where NG is the 0 number of grid points. Below, the subscript “0” (original) will denote affiliation to the original grid G0 . Some discretization of system (4.17) on the grid G0 can be denoted as h h

h

h h

ℒ0 c0 = f 0 − ℒ0 û 0 .

Some ordering of the unknowns makes it possible to construct the resulting SLAE A0 c0 = b0 − A0 φ0 , where c0 and φ0 are the vectors of correction and approximation to the solution. Solution of the SLAE requires the same computational effort as for the original system A0 φ0 = b0 . (q) h Let φ0 be the approximation to the solution û 0 after q iterations of the algorithm A0 c0 = b0 − A0 φ0 . (q)

(4.18)

It is necessary to formulate an auxiliary system to reduce the total amount of computations. Let an auxiliary computational grid GA be generated, where NG (≈ NG ) is the A 0 number of auxiliary grid points. Below, the subscript “A” (auxiliary) will denote affiliation to the auxiliary grid GA . Second-order FVD of (4.17) on the auxiliary grid GA becomes h h

h

h h

ℒA cA = f A − ℒA û A .

Some ordering of the unknowns makes it possible to form the auxiliary SLAE AA cA = bA − AA φA , (q)

h

(4.19)

where φA is the approximation to the solution û 0 on the auxiliary grid GA after q iterations. It should be emphasized that systems (4.18) and (4.19) are independent of each other. To use the auxiliary SLAE (4.19) for reduction of the computational work required to solve the SLAE (4.18), we assume that (q)

4 Auxiliary space preconditioner � 121

bA − AA φA = ℛ0→A (b0 − A0 φ0 ), (q)

(q)

where ℛ0→A is a restriction operator transferring the residual b0 − A0 φ0 from the original grid G0 to the auxiliary grid GA , i. e., replace the right-hand vector of (4.19) with the right-hand side vector (4.18) transferred interpolated from the original grid G0 to the auxiliary grid GA . Then the equation for correction (4.19) becomes (q)

AA cA = ℛ0→A (b0 − A0 φ0 ).

(4.20)

(q)

We assume that this auxiliary SLAE is solved exactly and the correction cA is cA = A−1 A ℛ0→A (b0 − A0 φ0 ). (q)

Interpolation of the correction cA from the auxiliary grid GA to the original grid G0 gives −1 c(0) 0 = 𝒫A→0 c A = 𝒫A→0 AA ℛ0→A (b0 − A0 φ0 ), (q)

(4.21)

where 𝒫A→0 is a prolongation operator. Note that the intergrid operators ℛ0→A and 𝒫A→0 put an error to the correction c(0) 0 if G0 ≠ GA . As a result, the computed correction c(0) is not the same as the exact value c0 , i. e., 0 −1 c(0) 0 ≠ c 0 = A0 b0 − φ0 . (q)

The error introduced by the ℛ0→A and 𝒫A→0 operators can be reduced with a small number of iterations of a basic iterative method (smoother) on the original grid G0 (ν) W0 (c(ν+1) − c(ν) 0 0 ) = b0 − A0 φ0 − A0 c 0 , (q)

ν = 0, 1, 2, . . . ,

(4.22)

where W0 is the splitting matrix of the iterative (Gauss–Seidel) method. The correction c(0) = 𝒫A→0 cA interpolated from the auxiliary grid GA is used as the starting guess 0 on the original grid G0 . To get the iteration matrix of this method explicitly, we rewrite (4.22) as −1 c(ν+1) = (I − W0−1 A0 )c(ν) 0 0 + W0 (b0 − A0 φ0 ), (q)

ν = 0, 1, 2, . . . .

(4.23)

We assume that the matrix of smoothing iterations S0 = I −W0−1 A0 satisfies the sufficient convergence condition (see Theorem IV.2) 󵄩 󵄩 ‖S0 ‖ = 󵄩󵄩󵄩I − W0−1 A0 󵄩󵄩󵄩 ⩽ ω0 < 1, where ω0 depends on the mesh size of the original grid G0 . (q) If the exact solution c0 = A−1 0 b0 −φ0 SLAE (4.18) is the fixed point of iterations (4.23), then the result is

122 � IV Background in iterative methods ν (0) ν c(ν) 0 − c 0 = S (c 0 − c 0 ) = S (𝒫A→0 c A − c 0 ).

Taking (4.21) into account, the correction c(ν) 0 becomes ν (0) c(ν) 0 = c 0 + S0 (c 0 − c 0 )

ν −1 −1 = A−1 0 b0 − φ0 + S0 (𝒫A→0 AA ℛ0→A (b0 − A0 φ0 ) − A0 (b0 − A0 φ0 )) (q)

(q)

(q)

ν −1 −1 = A−1 0 (b0 − A0 φ0 ) + S0 (𝒫A→0 AA ℛ0→A − A0 )(b0 − A0 φ0 ) (q)

(q)

ν −1 −1 = [A−1 0 + S0 (𝒫A→0 AA ℛ0→A − A0 )](b0 − A0 φ0 ). (q)

Adding the correction c(ν) 0 to the old approximation to the solution φ0 gives a new ap(q)

proximation to the solution

(q+1) φ0

φ0

(q+1)

= c(ν) 0 + φ0

(q)

or b0 − A0 φ0

(q+1)

= M(b0 − A0 φ0 ), (q)

(4.24)

where the iteration matrix of the linear two-grid algorithm is −1 M = A0 S0ν (A−1 0 − 𝒫A→0 AA ℛ0→A ).

(4.25)

The two-grid algorithm (4.24) is shown in Figure 1.7. To analyze the two-grid convergence, Eq. (4.24) should be rewritten as b0 − A0 φ0 = M q (b0 − A0 φ(0) 0 ),

q = 0, 1, 2, . . . ,

(q)

which leads to the estimation as (4.7), 󵄩 󵄩 󵄩 󵄩󵄩 −1 ρq ≤ ‖M‖ ≤ 󵄩󵄩󵄩A0 S0ν 󵄩󵄩󵄩 ⋅ 󵄩󵄩󵄩A−1 0 − 𝒫A→0 AA ℛ0→A 󵄩 󵄩,

(4.26)

where ρq is an average reduction factor of the residual, ρq = (

‖b0 − A0 φ0 ‖

1/q

(q)

‖b0 − A0 φ(0) 0 ‖

)

.

(4.27)

The factor ρq is similar to ρν (4.6), i. e., ρq shows the averaged reduction of the residual over q intergrid iterations. The classical multigrid theory is based on the approximation and smoothing property as introduced by Hackbusch [12]: 1. Smoothing property: there exists a monotonically decreasing function η(ν) : ℝ+ → ℝ+ such that η(ν) → 0 as ν → ∞ and

4 Auxiliary space preconditioner

󵄩󵄩 ν󵄩 󵄩󵄩A0 S0 󵄩󵄩󵄩 ⩽ η(ν)‖A0 ‖. 2.

� 123

(4.28)

Approximation property: there exists a constant CA > 0 such that 󵄩󵄩 −1 󵄩 −1 −1 󵄩󵄩A0 − 𝒫A→0 AA ℛ0→A 󵄩󵄩󵄩 ⩽ CA ‖A0 ‖ .

(4.29)

The smoothing property states, in principle, that the smoother reduces the highfrequency components of the error (without amplifying the low-frequency components). The approximation property requires the coarse grid correction to be reasonable [12]. The basis of this theory is a splitting of the two-grid iteration matrix M (4.25) as (4.26). Theorem IV.3. Assuming that the smoothing property (4.28) and approximation property (4.29) hold, the h-independent estimation of ρq (4.27) follows immediately for ν large enough. Proof. Using (4.26), we have 󵄩 󵄩 󵄩 󵄩󵄩 −1 ρq ≤ 󵄩󵄩󵄩A0 S0ν 󵄩󵄩󵄩 ⋅ 󵄩󵄩󵄩A−1 0 − 𝒫A→0 AA ℛ0→A 󵄩 󵄩 ≤ CA η(ν) < 1. This theorem predicts that the average reduction factor of the residual ρq depends on the number of smoothing iterations, but the number of intergrid iterations (G0 󴀘󴀯 GA ) is independent of the mesh size of the original grid G0 and the auxiliary grid GA . However, the fact that a certain iterative method has an h-independent average reduction factor says nothing about its efficiency as long as the computational work is not taken into account. If SLAE (4.20) is solved by RMT, then the total effort is the sum of work needed for smoothing on the original grid G0 and on the auxiliary grid GA , i. e., 𝒲 = q(𝒲0 + 𝒲A ) ao,

where q is the number of intergrid (G0 󴀘󴀯 GA ) iterations. As described in Section 6, Сhapter II, for point smoother, we have 𝒲0 = C0 ν0 N0 ao,

1 d

𝒲A = CA νA qA NA [log3 NA ] ao.

If both grids have approximately the same number of points (N0 ≈ NA ≈ N), then the algorithmic complexity of the two-grid algorithm becomes 𝒲 ≤ C0 ν0 qN(1 + C̄ A

νA qA log3 N) ao. ν0 d

For sufficiently fine grids (N → ∞), the two-grid algorithm has close-to-optimal complexity

124 � IV Background in iterative methods 𝒲 ≤ CA

νA qqA N log3 N ao. d

Linear iterations of RMT can be written as (4.24) with the iteration matrix M = A0 Q0 , where the matrix Ql is defined in a recursive form ν

Sl l (dl ℛ0→l + 𝒫l+1→l Ql+1 ),

Ql = {

ν

Sl l dl ℛ0→l ,

l = 0, 1, 2, . . . , L+3 − 2, l = L+3 − 1,

and −1 dl = A−1 l − 𝒫l+1→l Al+1 ℛl→l+1 .

If the smoothing property (4.28) and approximation property (4.29) hold, then the h-independent estimation of ρq (4.27) follows immediately for ν large enough [21]. Remark. The original system A0 φ0 = b0 can be rewritten as P−1 A0 φ0 = P−1 b0 . This system, which has the same solution as the original system A0 φ0 = b0 , is called a preconditioned system, and P = (I − M)−1 A0 is the preconditioning matrix or preconditioner. The matrix M is defined by (4.25). In other words, a relaxation scheme is equivalent to a fixed-point iteration on a preconditioned system.

5 Remarks on the smoothing and approximation property Although the multigrid algorithms have been successfully applied to various nonsymmetric problems (A ≠ AT ), there is no satisfactory convergence theory to date for the nonsymmetric case. To illustrate the limitations of smoothing analysis, a damped iteration is considered, i. e., τ = (1 + ω)−1 in (4.4): (1 + ω)W (φ(ν+1) − φ(ν) ) = b − Aφ(ν) with parameter ω ⩾ 0. The iteration matrix S(ω) (4.5) becomes S(ω) = I −

1 W −1 A. 1+ω

The goal of the smoothing analysis is to estimate the norm of the matrix AS ν (ω) = (1 + ω)W (I − S(ω))S ν (ω) = (1 + ω)W

1 ν (I − S(0))(ωI + S(0)) ν+1 (1 + ω)

5 Remarks on the smoothing and approximation property

� 125

as a function of the parameter ω. The proof of the smoothing property for the nonsymmetric iterative method defined by the splitting matrix W can be reduced to the following subproblems. Subproblem 1: Proof of ν󵄩 󵄩󵄩 ̂ ω). 󵄩󵄩(I − S(0))(ωI + S(0)) 󵄩󵄩󵄩 ⩽ η(ν,

Subproblem 2: Proof of ‖W ‖ ⩽ c‖A‖, where c is a constant. The solution of the first subproblem is based on the following lemmas. Lemma IV.2. If ‖S(0)‖ = ‖I − W −1 A‖ < 1, then for any ω ⩾ 0, we have ‖S(ω)‖ < 1. Proof in [21]. Lemma IV.3. If ‖S(0)‖ < 1 and 3 − 2√2 ⩽ ω ⩽ 3 + 2√2, then we have 1 1 ν󵄩 󵄩󵄩 , 󵄩(I − S(0))(ωI + S(0)) 󵄩󵄩󵄩 ⩽ √eων (1 + ω)ν+1 󵄩

for ν = 1, 2, . . . .

Proof in [21]. If both subproblems have been solved, then the smoothing property (4.28) holds: 1 1+ω ν󵄩 󵄩󵄩 ν 󵄩󵄩 󵄩󵄩 ̄ ω)‖A‖, ‖A‖ = η(ν, 󵄩󵄩AS (ω)󵄩󵄩 ⩽ (1 + ω)‖W ‖ 󵄩(I − S(0))(ωI + S(0)) 󵄩󵄩󵄩 ⩽ c √eων (1 + ω)ν+1 󵄩 ̄ ω) is given by where the function η(ν, ̄ ω) = c η(ν,

1+ω . √eων

In a nonsymmetric case the smoothing property can be proved only for the nonsymmetric iterative method with ω ∈ [3 − 2√2, 3 + 2√2]. In practice, the results of the numerical experiments may differ from what is predicted by the above-mentioned convergence study. Often, local Fourier analysis or other tools can be used for the quantitative research of multigrid methods [35]. Now we will illustrate the approximation property (4.29) used in the multigrid convergence analysis. Let a discretization of some linear boundary value problem on the finest grid G0 and on the coarse grids of the first level be abbreviated as the resulting SLAEs: A0 u0 = b0 ,

(4.30)

126 � IV Background in iterative methods and A1 u1 = b1 .

(4.31)

Recall that the right-hand side vectors of (4.30) and (4.31) are related by b1 = ℛ0→1 b0 , where ℛ0→1 is the restriction operator. The finest grid solution u0 = A−1 0 b0 (4.30) and the coarse grid solution u1 = −1 A1 b1 (4.31) interpolated from the coarser grids of the first level to the finest grid G0 are compared, i. e., 𝒫1→0 u1 = 𝒫1→0 A1 ℛ0→1 b0 , −1

where 𝒫1→0 is the prolongation operator. This comparison can be written as −1 u0 − 𝒫1→0 u1 = (A−1 0 − 𝒫1→0 A1 ℛ0→1 )b0 .

If the approximation property (4.29) holds, then −1 ‖u0 − 𝒫1→0 u1 ‖ ‖(A−1 󵄩 󵄩󵄩 −1 −1 0 − 𝒫1→0 A1 ℛ0→1 )b0 ‖ = ⩽ 󵄩󵄩󵄩A−1 0 − 𝒫1→0 A1 ℛ0→1 󵄩 󵄩 ⩽ CA ‖A0 ‖ . ‖b0 ‖ ‖b0 ‖

If ‖A0 ‖−1 = Chδ , δ > 0, and ‖b0 ‖ < Cb , then ‖u0 − 𝒫1→0 u1 ‖ = O(hδ ). A detailed explanation of this theoretical approach based on the multigrid grid matrix factorization is given in [12]. Unfortunately, it hardly seems possible to acquire useful aids for solving applied problems from theoretical analysis.

Conclusions At present, many numerical methods have been developed for solving the real-life problems of heat transfer, fluid dynamics, elasticity, electromagnetism, and other issues in complicated geometries. A characteristic feature of these real-life problems is an approximate mathematical description of physical and chemical phenomena, i. e., the simulation error has predominantly physical nature (Section 2, Сhapter I). It can be argued that we are able to solve these problems with the required accuracy, although some theoretical tasks remain unsolved. From a practical point of view, the next step is to formalize the mathematical modeling to reduce the effort for coding and debugging. The best solution of the problem of formalizing computation is to develop a single computational algorithm to solve all the applied tasks that are currently solvable. This single algorithm will be implemented in black-box software,1 which can be used to solve a large class of applied problems. In other words, the problem of formalizing computation is related to the solution of the tasks already solved by well-known algorithms and not to the development of new numerical methods for solving new tasks. However, numerous computational methods developed for various applied problems are very difficult to combine into a single black-box algorithm. As a rule, all algorithms have problem-dependent components (such as the computational grid, ordering of the unknowns, the stopping criterion, and others), their optimal values are not known in advance, and these components must be determined during the solution process to achieve the required accuracy with the least effort. Optimization of the problemdependent components is a very difficult problem; therefore black-box algorithm should have the least number of problem-dependent components and efficient tools for their black-box optimization. Let the real-life problems be described by the systems of nonlinear PDEs with (initial and) boundary conditions (1.1). These systems can be solved in a segregated or coupled manner in complicated domain geometries, but it is not known in advance which method is better. We can use a structured, block-structured, or unstructured grid and finite-difference, finite volume, finite element, or another method to discretize the given real-life problem with kth-order discretization (k ≥ 2). Since the simulation error has a physical nature, some discretization of governing PDEs (1.2) with an accuracy of less or equal to the second order does not damage the discrete solution accuracy required for practical applications (Section 2, Сhapter I). However, advanced blackbox software should use high-order discretizations without significant changes in the computational algorithm. To develop the black-box solver, it is necessary to consider the most difficult problem, assuming that this algorithm will be effective for all simpler problems. Unstructured irregular grids and corresponding finite volume or finite element discretizations

1 Our definition of black-box software is given in the Preface.

128 � Conclusions seem to be preferable for many real-life problems. These grids are more flexible and can easily be adapted to the boundary of a general domain [35]. A numerical solution (1.3) of the resulting system of nonlinear algebraic equations (1.2) arising from differential problem (1.1) discretization on the unstructured irregular grid is the most time-consuming component of the black-box solver. Coupled iterations use special block ordering of the unknowns, local linearization of the nonlinear algebraic equations, and solution of linearized equations using the direct or iterative method (Section 6, Сhapter I). The main requirements for the black-box solvers are: а) Robustness (the least number of problem-independent components). b) Efficiency (close-to-optimal algorithmic complexity). c) Parallelism (a parallel robust algorithm should be faster than the fastest sequential one). They are discussed in Chapter I. The problem of black-box solver development is formulated in Section 5, Сhapter I. Multigrid methods are one of the fastest solution techniques known today. Unfortunately, unstructured grids cannot be handled by a multigrid as easily as structured ones [35]. Generating a hierarchy of coarse grids is not a trivial task for a general unstructured grid. To avoid problems caused by the generation of an unstructured grid hierarchy, algebraic multigrid methods (AMG) were proposed and developed. In classic AMG the “coarse grid” variables are regarded as a subset of “fine grid” ones. The iteration of AMG uses a transfer of the discrete problems between fictitious grids without geometric information. It complicates the coupled solution of the systems of nonlinear PDEs. In addition, the large communication overhead and the idling processes on the coarse grids reduce the efficiency of parallel classic multigrid methods. To avoid the transfer of the systems of discrete nonlinear PDEs between grids or levels, the auxiliary space method (ASM) was developed for solving real-life problems (Section 7, Сhapter I). The basic idea of the ASM is to use an auxiliary structured grid GA , where the nonlinear problems are more easily solved in the coupled manner [39]. To implement ASM without transferring discrete problems between the original and the auxiliary grids, it is necessary to represent the desired solution as a sum or product of the correction and the approximation to the solution before discretization. Various approaches can be used for the discretization of nonlinear problems on the original grid GO , which can lead to discrete solutions with at least a second order of accuracy. The FVD is used on the auxiliary structured grid, where the correction should be computed with an accuracy of no more than a second order. This two-grid algorithm can be considered as a variant of the defect correction approach (Section 7, Сhapter I), where the second-order FVD on the auxiliary grid is used to obtain high-order solution accuracy on the original grid. In general, a residual (operator ℛO→A ) and an approximation to the solution (operator ℛ̂ O→A ) are transferred from the original grid to the auxiliary one, and an auxiliary grid correction (operator 𝒫A→O ) is interpolated back (the so-called FAS scheme, Section 7, Сhapter I). Some nonlinear problems (for example,

Conclusions


the Navier–Stokes equations) can be solved without FAS [21]. The ASM makes it possible to combine the advantages of unstructured grids (simplicity of generation in complex domain geometry) and structured grids (opportunity to solve nonlinear BVPs and IBVPs by very efficient geometric multigrid methods). Figure 1 represents the parallel FAS/RMT-based two-grid algorithm for solving nonlinear BVPs, i. e., RMT is used for computing the correction cA as shown in Figure 1.7. A characteristic feature of the iterative approach is that sufficiently accurate correction can be obtained on the auxiliary grid GA in parallel as combination of solutions of independent discrete problems on the dynamic level. In addition, it is possible to blackbox optimize the problem-dependent components and to adapt the dynamic grids to the solution features. Then the parallel smoothing iterations on the auxiliary grid GA are stopped, the sufficiently accurate correction and the adaptively refined auxiliary grid GA are obtained and used for the original grid generation GO . The original grid GO should be refined in the same subdomains as the auxiliary grid GA . The most attractive approach for generating GO consists in preserving the structured pattern of GA as much as possible in order to simplify the correction prolongation operator 𝒫A→O cA , the structured grid in the interior part of the domain as shown in Section 3, Сhapter IV. The smoothing procedure used on GO should be close to that on GA (for example, if the coupled smoothing is used on GA as a result of black-box optimization, then the coupled smoothing should be used on GO ). Since the correction cA computed in the first intergrid iteration is sufficiently accurate, the following intergrid iterations are implemented without black-box optimization of the problem-dependent components and additional grid refinement as shown in Figure 1.

Figure 1: Nonlinear parallel FAS/RMT-based intergrid iterations of the two-grid algorithm: ∘ – smoothing on the auxiliary grid GA , ∙ – smoothing and updating on the original grid GO , ⋄ – generation of the original grid, × – check convergence of the intergrid iterations.

130 � Conclusions A comparison of the basic algorithm (1.28) and the two-grid one gives the minimum number of extra problem-dependent components: а) A nonnested case: let GO and GA be the unstructured and structured grids, respectively (Section 3, Сhapter IV). The two-grid algorithm has three extra problemdependent components: transfer operators ℛO→A , ℛ̂ O→A , and 𝒫A→O . b) A nested case: let GO and GA be the block-structured grids (Section 3, Сhapter IV). In this case, GA = GO ⇒ ℛO→A = ℛ̂ O→A = 𝒫A→O = I, and the two-grid algorithm has an extra problem-dependent component (interblock interpolation). c) A nested case: let GO and GA be the structured grids (Section 3, Сhapter IV). In this case, GA = GO ⇒ ℛO→A = ℛ̂ O→A = 𝒫A→O = I, and the two-grid algorithm has no extra problem-dependent components. In general, each intergrid iteration consists of the restriction of the residual and the approximation to the solution, the computation of the auxiliary grid correction, the prolongation of the correction, and a few smoothing steps (a sweep over all (blocks of) unknowns) on the original grid GO . Theoretical analysis predicts that the number of intergrid iterations is independent of the size of the linear problems (Section 4, Сhapter IV). Since basic idea of ASM is to use the auxiliary grid for more computational work, the total number of problem-dependent components, total algorithmic complexity and total parallelism of the two-grid algorithm critically depend on the solver used on the auxiliary grid. Robust multigrid technique is based on triple coarsening, FVD, the low-order Shortley–Weller-type discretizations, and adaptive grid refinement is used for computing the correction on the auxiliary grid (Chapter II). The features of RMT are: a) Triple coarsening based on the agglomeration of three finite volumes on the finer grids makes it possible to accurately approximate the integrals over this united volume and its faces to form the coarse grid problem independently on mesh size. b) A multiple coarsening strategy coupled with the FVD (Section 3, Сhapter I) leads to the problem-independent transfer operators and the coarse grid operators, high parallel efficiency, and making the task of the smoother the least demanding. In fact, RMT is a single-grid algorithm based on the essential multigrid principle (2.43). c) An extra problem-dependent component of RMT is the number of smoothing iterations. d) The computational cost of linear iteration of the RMT is O(N log N) ao, where N is the number of unknowns, and the number of the linear multigrid iterations is independent of the mesh size [21]. Close-to-optimal complexity is a result of the least number of the problem-dependent components. e) Black-box optimization on the multigrid structures makes it possible to optimize some problem-dependent components by comparatively analyzing results of computational experiments performed on grids of the same level (for example, choosing between segregated and coupled (Vanka-type) iterations for systems). The amount of work for this optimization is negligible (Section 7, Сhapter II).


In the simplest case, the efficient parallelization of RMT is achieved by distributing the set of discrete tasks over p = 3^{ℓκ}, κ = 2, . . . , d, computing units, where ℓ = 1, 2, . . . is the dynamic level. Theoretically, the execution time of the parallel RMT implemented on nine computing units is approximately equal to the execution time of the sequential V-cycle (Section 1, Chapter III). The usual classification of RMT-based parallelism depends on the ratio of the number of grids to the number of computing units:
1) Geometric parallelism: the number of grids is larger than or equal to the number of computing units (3^{dℓ} ≥ p = 3^{ℓκ}, ℓ = 1, 2, . . . , d = 2, 3).
2) Algebraic parallelism: the number of grids is less than the number of computing units (3^{dℓ} < p = 3^{ℓκ}, ℓ = 1, 2, . . . , d = 2, 3).

For ℓ = 1, the geometric parallelism of RMT is based on a nonoverlapping partition of the finest grid for the distribution of 3^d, d = 2, 3, independent tasks over p = 3^κ, κ = 1, 2, . . . , d, computing units in the low-thread implementation (Section 2, Chapter III); a code sketch of this task distribution is given at the end of this chapter. The numerical solutions of these independent tasks form a sufficiently accurate starting guess for the numerical solution on the finest grid (Figure 3.3). The algebraic parallelism of RMT is based on multicolor orderings of the unknowns (or blocks of unknowns) to parallelize the smoothing iterations on the finer grids (Section 3, Chapter III).

As shown in Section 5, Chapter III, RMT can be used for the unified parallel solution of time-(in)dependent problems, where the main differences lie in the number of time steps treated in parallel (one time step for BVPs and several time steps for IBVPs) and in the number of extra grid levels (L_3^+ for the ill-conditioned problems and L_3^* ≤ L_3^+ for the well-conditioned problems).

Finally, the most general case is that of the parallel numerical solution of a system of nonlinear time-dependent PDEs on N_t time steps, where the original grid G_O, the auxiliary grid G_A, and the smoother are not given in advance. The solution process starts on the multigrid structures generated by the uniform dynamic grids (Figure 3.5) with a zero starting guess (i. e., the correction coincides with the approximation to the solution). By solving these 3^d independent IBVPs on N_t time steps, we choose the ordering of the unknowns, the smoother, and the type of smoothing iterations (segregated or coupled), and we adapt the dynamic grids to the behavior of the solution through black-box optimization (Section 7, Chapter II). Remember that the optimization is based on extra computations with different smoothers in a (de)coupled manner and on grid refinement on one or more of the coarse grids of each level. The problem-dependent components are not optimized on the dynamic grids; instead of the black-box optimization, the values extrapolated from the coarse levels are used. Then the finest grid is generated for algebraic parallelism using the information on the adaptive coarse grid refinement. The problem-dependent components are optimized in the black-box manner only during the first multigrid iteration and on the first time step; afterwards, these optimized components are used for all time steps. The black-box optimization on the first multigrid iteration does not conflict with geometric parallelism. The subsequent optimization can be performed on the (N_t + 1)-th time step to minimize the computational effort. The final convergence rate of the algorithm depends strongly on the (smoothing, refinement, etc.) criteria used to optimize these problem-dependent components.

In the worst (nonnested) case, the two-grid algorithm has four problem-dependent components: the intergrid transfer operators ℛ_{O→A}, ℛ̂_{O→A}, and 𝒫_{A→O}, and the number of smoothing iterations. The two-grid RMT-based algorithm can be considered as an alternative approach to algebraic multigrid methods. The results of numerical experiments illustrating the robustness and the efficiency of RMT are given in [21]. Sequential and parallel software are presented in [22, 23].
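The code sketch announced above illustrates the static task distribution behind the geometric parallelism: the 3^{dℓ} independent grid problems of dynamic level ℓ are dealt out to p = 3^{ℓκ} computing units, κ ≤ d, so that every unit receives the same number of tasks. It shows only the bookkeeping (no actual parallel execution), and the function name is illustrative.

# Sketch of the static task distribution behind geometric parallelism:
# the 3**(d*l) independent grid problems of dynamic level l are dealt out
# cyclically to p = 3**(l*kappa) computing units; kappa <= d gives the
# geometric-parallelism case 3**(d*l) >= p, and every unit then receives
# the same number 3**(l*(d - kappa)) of tasks.
def distribute(d, l, kappa):
    n_tasks = 3 ** (d * l)          # independent grids (tasks) on level l
    p = 3 ** (l * kappa)            # computing units
    schedule = {unit: [] for unit in range(p)}
    for task in range(n_tasks):
        schedule[task % p].append(task)
    return schedule

for kappa in (1, 2, 3):
    sched = distribute(d=3, l=1, kappa=kappa)
    sizes = {len(tasks) for tasks in sched.values()}
    print(f"d=3, l=1, kappa={kappa}: p={len(sched)} units, "
          f"tasks per unit = {sizes}")
# Expected: 27 tasks over 3, 9, or 27 units -> 9, 3, or 1 tasks per unit,
# i.e. a perfectly balanced load over the independent level-l problems.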

Bibliography

[1] Axelsson O. Iterative Solution Methods. Cambridge University Press, Cambridge, 1994.
[2] Bern M., Plassmann P. Mesh Generation. Handbook of Computational Geometry. Elsevier Science, North Holland, 2000.
[3] Benzi M., Golub G. H., Liesen J. Numerical solution of saddle point problems. Acta Numer., 14:1–137, 2006.
[4] Cebeci T., Bradshaw P. Physical and Computational Aspects of Convective Heat Transfer. Springer-Verlag, New York, 1988.
[5] Fedorenko R. P. A relaxation method for solving elliptic difference equations. USSR Comput. Math. Math. Phys., 1:1092–1096, 1962.
[6] Fletcher C. A. J. Computational Techniques for Fluid Dynamics, Volume I: Fundamental and General Techniques. Springer-Verlag, Berlin, 1988.
[7] Frederickson P. O., McBryan O. A. Parallel superconvergent multigrid. In: Multigrid Methods: Theory, Applications and Supercomputing (ed. S. McCormick), pp. 195–210. Marcel Dekker, New York, 1988.
[8] Frederickson P. O., McBryan O. A. Recent developments for the PSMG multiscale method. In: Multigrid Methods III, Proceedings of the 3rd International Conference on Multigrid Methods (eds. W. Hackbusch and U. Trottenberg), pp. 21–40. Birkhauser, Basel, 1991.
[9] Frey P., George P. L. Mesh Generation. Wiley, New York, 2010.
[10] George P. L. Automatic Mesh Generation. Wiley, New York, 1991.
[11] Golub G. H., van Loan C. F. Matrix Computations, 3rd edition. The Johns Hopkins University Press, Baltimore, MD, 1996.
[12] Hackbusch W. Multi-Grid Methods and Applications. Springer, Berlin, 1985.
[13] Hackbusch W. Iterative Solution of Large Sparse Systems. Springer, Berlin, 1994.
[14] Hageman L. A., Young D. M. Applied Iterative Methods. International Series of Numerical Mathematics. Academic Press, New York, 1981.
[15] Karavaev A. S., Kopysov S. P. A modification of the hexahedral mesh generator based on voxel geometry representation. Vestn. Udmurtsk. Univ. Mat. Mekh. Komp. Nauk., 30(3):468–479, 2020 (in Russian). https://doi.org/10.35634/vm200308.
[16] Karavaev A. S., Kopysov S. P. Hexahedral mesh generation using voxel field recovery. In: Numerical Geometry, Grid Generation and Scientific Computing. Lecture Notes in Computational Science and Engineering, vol. 143, pp. 295–305, 2020. https://doi.org/10.1007/978-3-030-76798-3_19.
[17] Knupp P., Steinberg S. Fundamentals of Grid Generation. CRC Press, Boca Raton, 1993.
[18] Liseikin V. D. Grid Generation Methods. Springer, Berlin, 1999.
[19] Liseikin V. D. Layer Resolving Grids and Transformations for Singular Perturbation Problems. VSP, Utrecht, 2001.
[20] Martynenko S. I. Multigrid Technology: Theory and Applications. Fizmatlit, Moscow, 2015 (in Russian).
[21] Martynenko S. I. The Robust Multigrid Technique: For Black-Box Software. De Gruyter, Berlin, 2017.
[22] Martynenko S. I. Sequential Software for Robust Multigrid Technique. Triumph, Moscow, 2020 (in Russian). http://github.com/simartynenko/Robust_Multigrid_Technique_2020.
[23] Martynenko S. I. Parallel Software for Robust Multigrid Technique. Triumph, Moscow, 2021 (in Russian). https://github.com/simartynenko/Robust_Multigrid_Technique_2021_OpenMP.
[24] Martynenko S., Zhou W., Gökalp İ., Bakhtin V., Toktaliev P. Parallelization of Robust Multigrid Technique Using OpenMP Technology. In: Parallel Computing Technologies. PaCT 2021 (ed. Malyshkin V.). Lecture Notes in Computer Science, vol. 12942. Springer, Cham, 2021. https://doi.org/10.1007/978-3-030-86359-3_15, https://link.springer.com/book/10.1007/978-3-030-86359-3.
[25] Oevermann M., Klein R. A Cartesian grid finite volume method for elliptic equations with variable coefficients and embedded interfaces. J. Comput. Phys., 219:749–769, 2006.
[26] Ortega J. Introduction to Parallel and Vector Solution of Linear Systems. Plenum Press, New York, 1988.
[27] Patankar S. V. Numerical Heat Transfer and Fluid Flow. Hemisphere Publishing Corporation, New York, 1980.
[28] Saad Y. Iterative Methods for Sparse Linear Systems. PWS Publishing Company, Boston, 1995.
[29] Samarskii A. A. Equations of parabolic type and difference methods for their solution. In: Proceedings of the All-Union Conference on Differential Equations, pp. 148–160. Izd. Academy of Sciences of the ArmSSR, Yerevan, 1958 (in Russian).
[30] Samarskii A. A., Nikolaev E. S. Numerical Methods for Grid Equations. Vol. I/II. Birkhäuser-Verlag, Basel–Boston–Berlin, 1989.
[31] Samarskii A. A. The Theory of Difference Schemes. Marcel Dekker Inc., New York, 2001.
[32] Sedov L. I. A Course in Continuum Mechanics. Vol. 1. Wolters-Noordhoff, Groningen, 1971.
[33] Tannehill J. C., Anderson D. A., Pletcher R. H. Computational Fluid Mechanics and Heat Transfer, 2nd edition. Taylor & Francis Ltd., Oxfordshire, 1997.
[34] Thompson J. F., Soni B. K., Weatherill N. P. Handbook of Grid Generation. CRC Press, Boca Raton, 1998.
[35] Trottenberg U., Oosterlee C. W., Schüller A. Multigrid. Academic Press, London, 2001.
[36] Vanka S. P. Block-implicit multigrid solution of Navier–Stokes equations in primitive variables. J. Comput. Phys., 65(1):138–158, 1986.
[37] Versteeg H. K., Malalasekera W. An Introduction to Computational Fluid Dynamics – The Finite Volume Method. Longman Scientific & Technical, New York, 1996.
[38] Wesseling P. An Introduction to Multigrid Methods. Wiley, Chichester, 1992.
[39] Xu J. The auxiliary space method and optimal multigrid preconditioning techniques for unstructured grids. Computing, 56:215–235, 1996.

Index

Σ-modification 26, 39, 120
algorithmic complexity 13, 15
AMG 26
ASM 27
basic algorithm 26
black-box software V, 38, 64, 73, 127
Chebyshev norm 20, 62
condition number 16, 108
correction 26, 120
Crank–Nicolson scheme 23, 87, 94
defect correction 28
difference scheme 4
discretization parameter 7, 40, 44
equation
– Navier–Stokes 22, 47, 129
– Poisson 7, 11, 18, 47
FAS 27, 36, 128
finite volume 31, 44, 49, 78, 88
FVM 6
Gaussian elimination 15, 20
ghost points 33, 51
grid
– auxiliary 27, 36, 120
– hierarchy 42
– level 41
– original 26, 36, 120
– points 40
– staggered 23, 40
– structured 43
index mapping 43
iterations
– counter 106
– coupled 19
– Gauss–Seidel 13, 14, 83, 100, 108
– Jacobi 8
– damped 8, 108
– segregated 18
– Vanka 14, 83
Laplace operator 18, 78, 86
matrix
– coefficient 12, 16, 55, 78, 106
– dense 25
– ill-conditioned 16, 108
– non-singular 105
– similar 105
– singular 105
– sparse 12
– splitting 58, 106, 121
– well-conditioned 16, 108
matrix norm 104
– consistent 104
mesh size 7, 18
multigrid structure 42, 75, 78, 88
operator
– prolongation 121
– restriction 121
ordering of unknowns
– block 12, 128
– lexicographic 12
– multicolour 82
– point 12, 66
relaxation 11
residual 107
Reynolds number 22
robustness 3, 38, 128
Shortley–Weller discretization 47, 130
spectrum 105
speed-up 76, 79
V-cycle 47, 76, 131
vector norm 103