Semi-empirical Neural Network Modeling and Digital Twins Development [1 ed.] 0128156511, 9780128156513

Table of contents :
Cover
SEMI-EMPIRICAL NEURAL NETWORK MODELING AND DIGITAL TWINS DEVELOPMENT
Copyright
About the authors
Preface
Acknowledgments
Introduction
References
1. Examples of problem statements and functionals
Problems for ordinary differential equations
A stiff differential equation
The problem of a chemical reactor
The problem of a porous catalyst
Differential-algebraic problem
Problems for partial differential equations for domains with fixed boundaries
The Laplace equation on the plane and in space
The Poisson problem
The Schrödinger equation with a piecewise potential (quantum dot)
The nonlinear Schrödinger equation
Heat transfer in the vessel-tissue system
Problems for partial differential equations in the case of the domain with variable borders
Stefan problem
Problem formulation
The problem of the alternating pressure calibrator
Problem statement
Inverse and other ill-posed problems
The inverse problem of migration flow modeling
The problem of the recovery of solutions on the measurements for the Laplace equation
The problem for the equation of thermal conductivity with time reversal
The problem of determining the boundary condition
The problem of continuation of the temperature field according to the measurement data
Construction of a neural network model of a temperature field according to experimental data in the case of an interval sp ...
The problem of air pollution in the tunnel
The conclusion
References
Further reading
2. The choice of the functional basis (set of bases)
Multilayer perceptron
Structure and activation functions of multilayer perceptron
The determination of the initial values of the weights of the perceptron
Networks with radial basis functions-RBF
The architecture of RBF networks
Radial basis functions
Asymmetric RBF-networks
Multilayer perceptron and RBF-networks with time delays
References
3. Methods for the selection of parameters and structure of the neural network model
Structural algorithms
Methods for specific tasks
Methods of global non-linear optimization
Methods in the generalized definition
Methods of refinement of models of objects described by differential equations
References
Further reading
4. Results of computational experiments
Solving problems for ordinary differential equations
Stiff form of differential equation
Chemical reactor problem
The problem of a porous catalyst
Differential-algebraic problem
Solving problems for partial differential equations in domains with constant boundaries
Solution of the Dirichlet problem for the Laplace equation in the unit circle
Solving boundary value problems for the Laplace equation in the unit square
The Laplace equation in the L-region
The Poisson problem
Schrödinger equation with a piecewise potential (quantum dot)
Nonlinear Schrödinger equation
Heat transfer in the tissue-vessels system
Solving problems for partial differential equations for domains with variable boundaries
Stefan problem
The problem of the variable pressure calibrator
Solving inverse and other ill-posed problems
Comparison of neural network and classical approaches to the problem of identification of migration processes
The problem of the recovery solutions of the Laplace equation on the measurements
Problem for heat conduction equation with time reversal
The problem of determining the boundary conditions
The problem of continuing the temperature field according to measurement data
Construction of a neural network model of a temperature field in the case of an interval specified thermal conductivity co ...
The problem of air pollution in a tunnel
References
5. Methods for constructing multilayer semi-empirical models
General description of methods
Explicit methods
Implicit methods
Partial differential equations
Application of methods for constructing approximate analytical solutions for ordinary differential equations
Comparison of methods on the example of elementary functions
Results of computational experiments 1: The exponential function
Error analysis
Comparison with Maclaurin series with the same number of operations
Results of computational experiments 2: The cosine function
Error analysis
Comparison with the Maclaurin series
Search of period
Stiff differential equation
Mathieu equation
Nonlinear pendulum equation
Results of computational experiments for the segment [0;1]
Results of computational experiments for the segment [0;4]
The problem of modeling processes in the porous catalyst granule
Multilayer methods for a model equation with delay
Application of approximate multilayer methods for solving differential equations in the problem of stabilizing an inverted ...
Application of multilayer methods for partial differential equations
Heat equation
Comparison of multilayer methods for solving the Cauchy problem for the wave equation
Problems with real measurements
The problem of sagging hemp rope
Simulation of the load deflection of the real membrane
Semi-empirical models of nonlinear bending of a cantilever beam
References
Index
Back Cover


SEMI-EMPIRICAL NEURAL NETWORK MODELING AND DIGITAL TWINS DEVELOPMENT

SEMI-EMPIRICAL NEURAL NETWORK MODELING AND DIGITAL TWINS DEVELOPMENT DMITRIY TARKHOV Professor, Department of Higher Mathematics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russian Federation

ALEXANDER VASILYEV Professor, Department of Higher Mathematics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russian Federation

Academic Press is an imprint of Elsevier 125 London Wall, London EC2Y 5AS, United Kingdom 525 B Street, Suite 1650, San Diego, CA 92101, United States 50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom © 2020 Elsevier Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/ permissions. This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein). Notices Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein. Library of Congress Cataloging-in-Publication Data A catalog record for this book is available from the Library of Congress British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library ISBN: 978-0-12-815651-3 For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mara Conner Acquisition Editor: Chris Katsaropoulos Editorial Project Manager: Mariana L. Kuhl Production Project Manager: Nirmala Arumugam Cover Designer: Greg Harris Typeset by SPi Global, India

About the authors

Alexander Vasilyev was born in St. Petersburg (Leningrad) on 10 August 1948. After graduating from mathematical school №239 with a gold medal and from the Physics Faculty of Leningrad State University (LSU) with honors, he defended his Ph.D. thesis, New boundary value problems for ultrahyperbolic and wave equations, at the Leningrad branch of the Steklov Mathematical Institute in 1978. Working since 1980 at the Department of Higher Mathematics of Peter the Great St. Petersburg Polytechnic University, as an Associate Professor and since 2007 as a Professor, he has taught advanced courses and electives in various areas of modern mathematics and has led seminars. In 2011 he defended his doctoral thesis, Mathematical modeling of systems with distributed parameters based on neural network technique, for the degree of Doctor of Technical Sciences in the specialty 05.13.18, "Mathematical modeling, numerical methods, and software." Professor Vasilyev's scientific interests lie in the field of differential equations, mathematical physics, ill-posed problems, meshless methods, neuro-mathematics, neural network modeling of complex systems with approximately specified characteristics, heterogeneous data, digital twins, deep learning, global optimization, evolutionary algorithms, big transport systems, environmental problems, and educational projects. He is the author or co-author of three monographs, chapters in a reference book, chapters in two collective monographs, and a textbook (with the Ministry of Education stamp), all in Russian; he has published about 180 works devoted to neural network modeling; he holds an Honors Diploma of the Ministry of Education of the Russian Federation and Diplomas and Awards of the Polytechnic University Board. Professor Vasilyev has served as chairman and as a member of the organizing committees of conferences; he is the head, principal executive, or participant of projects supported by grants of the Russian Federation, and a member of the editorial board of the journal Mathematical Modeling and Geometry. He is fond of painting and graphics.


Dmitry Tarkhov was born on 14 January 1958 in St. Petersburg. In 1981 he graduated with honors from the Faculty of Physics and Mechanics of the Leningrad Polytechnic Institute, majoring in Applied Mathematics, and entered graduate school at the Department of Higher Mathematics. After graduation, he worked at the Department as an assistant, then as an Associate Professor, and continues to work at present as a Professor. In 1987 he defended the thesis "Straightening of trajectories on the infinite-dimensional torus," for which he was awarded the degree of Ph.D. in Physical and Mathematical Sciences. In 1996, while working as a part-time chief systems analyst at the St. Petersburg Futures Exchange, he began studying neural networks. He has published more than 200 scientific papers on this topic. In 2006 he defended the doctoral thesis "Mathematical modeling of technical objects on the basis of structural and parametrical adaptation of artificial neural networks," for which he was awarded the degree of Doctor of Technical Sciences.

Preface

The fourth industrial revolution is taking place before our eyes; it poses urgent problems for modern science. The adoption of cyber-physical systems in manufacturing requires an adequate reflection of the physical world in cybernetic objects. In the computing nodes that control technological lines, robots, and complex technical objects (airplanes, cars, ships, etc.), twins of the corresponding objects should function. Such a twin cannot be loaded into the computing node once and left unchanged, as the simulated object changes during operation. The virtual twin of the real object should change according to the information coming from the sensors, and algorithms for such changes should be implemented in the computing node itself. Industry 4.0, the transition to which is currently underway, is not possible without solving the above problems. However, presently accepted methods of mathematical modeling are poorly adapted to their solution. In our opinion, the creation of suitable mathematical and algorithmic tools for solving, in a uniform way, the wide range of problems arising in the transition to Industry 4.0 is a crucial step in the development of Industry 4.1. By Industry 4.1, we mean an industry that has the same production processes as Industry 4.0, but whose implementation is based on unified and cheaper technologies. We believe that one of these key technologies is the neural network technique. Currently, neural networks are actively used in the problems of Big Data, image processing, pattern recognition, complex system control, and other tasks of artificial intelligence. The creation of cyber-physical systems requires adequate means of mathematical modeling, and we offer neural networks as such tools. Quite a lot of publications are devoted to the application of neural networks to mathematical modeling problems; we give a brief review of them in the introduction to this book. At the moment, it is necessary to move from solving individual problems to a single methodology for solving them, and it is to this methodology that we have devoted this monograph. Our methodology is based on three simple steps. The first step is to characterize the quality of the mathematical model in the form of a functional. The first chapter demonstrates how to do this on a large set of problems. The second step is to choose the type of neural network that is most suitable for solving the problem.


In the second chapter, we describe the types of neural networks that are most useful from our point of view and give recommendations concerning the choice of a particular type of neural network depending on the characteristics of the problem being solved. The third step is to train the neural network, by which we mean minimizing the target functional. Algorithms for such training are considered in the third chapter. A significant part of the algorithms involves the simultaneous adaptation of the neural network parameters and the selection of its structure. In the fourth chapter, we present the results of computational experiments on the problems formulated in the first chapter. The fifth chapter is devoted to the construction of approximate multilayer solutions of differential equations based on classical numerical methods.

To understand our approach, we must point out that the transition to Industry 4.1 requires a paradigm shift in mathematical modeling. Traditional mathematical modeling of a real object is performed in two steps. The first step is to describe the object by a differential equation or a system of such equations (ordinary or partial) with accompanying conditions. The second step is the numerical solution of these equations with the maximum possible accuracy, the construction of a control system based on the differential model, etc. This numerical calculation is the basis for the design of the object, the choice of its optimal parameters, conclusions about its performance, and so on. If subsequent observations of the object and measurements of its parameters come into conflict with the calculations, then an additional study of the processes taking place in the object is carried out. According to the results of these studies, the differential model of the object is refined, and the computational studies of this model are repeated. Such an approach requires a lot of time and intellectual resources.

To improve the efficiency of mathematical modeling, we propose to change the point of view on the differential model of the object. We consider it not as exact initial information for subsequent studies, but as approximate information about the object alongside the measurement data. According to this information, we build a set of mathematical models of the object. At the same time, we foresee and provide for the possibility of changing the parameters of these models during the operation of the object, so that we can choose from this set the model best suited to the object at the given stage of its life cycle. Thus, we use the known formulas of numerical methods for solving differential equations not to generate tables of numerical solutions, but to create a set of adaptive functional solutions. We have a number of problems with real measurements for which our models reflect the object more accurately than the exact solutions of the original differential equations; several such problems are given in the fifth chapter.
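To illustrate this idea with a minimal, hypothetical sketch (not code from the book): the recurrence of an explicit numerical scheme can be unrolled symbolically so that the result is a function of the evaluation point and of uncertain model parameters, which can then be fitted to measurements. Here the right-hand side f, the rate parameter k, and the measurement values are invented purely for the example.

import numpy as np

def euler_functional_solution(f, t0, u0, n=8):
    """Build an approximate *functional* solution of du/dt = f(t, u, params):
    n explicit Euler steps are unrolled from t0 to an arbitrary point t,
    so the result is a formula in t (and in the model parameters),
    not a table of values on a fixed grid."""
    def u_approx(t, params):
        h = (t - t0) / n              # the step adapts to the evaluation point
        u, s = u0, t0
        for _ in range(n):
            u = u + h * f(s, u, params)
            s = s + h
        return u
    return u_approx

# Hypothetical example: a decaying process du/dt = -k*u with an uncertain rate k.
f = lambda t, u, k: -k * u
u_model = euler_functional_solution(f, t0=0.0, u0=1.0)

# Fit the single parameter k to (invented) measurements by a crude least-squares scan.
t_meas = np.array([0.2, 0.5, 1.0])
u_meas = np.array([0.80, 0.60, 0.36])
ks = np.linspace(0.1, 3.0, 300)
losses = [sum((u_model(t, k) - um) ** 2 for t, um in zip(t_meas, u_meas)) for k in ks]
k_best = ks[int(np.argmin(losses))]
print("fitted k ≈", round(float(k_best), 3))

The point of the sketch is that u_model is an analytical, adaptive object: its parameters can be re-fitted whenever new measurements arrive, without regenerating a numerical table.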


Acknowledgments

We want to express our gratitude for the useful discussions and support of our colleagues, the Professors and Experts of Peter the Great St. Petersburg Polytechnic University, the Moscow State University, the Moscow Aviation Institute, and other institutions and organizations: Evgeniy Alekseev, Alexander Belyaev, Elena Budkina, Vladimir Gorbachenko, Vladimir Kozlov, Boris Kryzhanovsky, Evgenii Kuznetsov, Tatiana Lazovskaya, Sergey Leonov, Nikolai Makarenko, Galina Malykhina, Yuri Nechaev, Vladimir Osipov, Alexander Petukhov, Dmitry Reviznikov, Vladimir Sukhomlin, Sergey Terekhov, Valery Tereshin, Yuri Tiumentsev, Tatiana Shemyakina, Lyudmila Zhivopistseva. We would also like to express our gratitude to our students at Polytechnic University for their assistance. Among them, we wish to mention Yaroslav Astapov, Anastasia Babintseva, Alexander Bastrakov, Serafim Boyarskiy, Dmitry Dron, Daniil Khlyupin, Kirill Kiselev, Anna Krasikova, Polina Kvyatkovskaya, Ilya Markov, Danil Minkin, Arina Moiseenko, Anastasia Peter, Nikita Selin, Roman Shvedov, Gleb Poroshin, Dmitry Vedenichev. We express our sincere gratitude to the Elsevier Team that worked on our book for their valuable support and assistance: our Senior Acquisitions Editor, Chris Katsaropoulos, our Senior Editorial Project Manager, Mariana Kühl Leme, our Project Manager, Nirmala Arumugam, and our Copyrights Coordinator, Kavitha Balasundaram. This book is based on research carried out with the financial support of the grant of the Russian Science Foundation (project №18-19-00474).

Dmitry Tarkhov Alexander Vasilyev


Introduction

At the moment, the need has matured for a transition to the use of a simulation model (digital twin) at all stages of the work of a complex technical object. At the first stage, during the design process, the structure and parameters of the object are determined according to the requirements using a mathematical model. Next, a series of problems is solved to test the operation of the future object in various situations. After its manufacture, it is necessary to refine the model using measurements made on the real object and to evaluate its performance. During operation, the object is exposed to external influences and wear, which lead to changes in its characteristics. These characteristics need to be refined by measurements and observation of the object in the course of its work, making the necessary changes in its operation modes. The simulation model allows predicting the possibility of destruction of the object and the need to stop its operation for repair or for the end of its life cycle. Using a single model greatly simplifies the conduct of the entire simulation cycle. This book is devoted to the methods of creating models of this sort (digital twins).

The classical approach to the mathematical modeling of real objects consists of two stages. At the first stage, a mathematical model is made on the basis of studying physical or other processes, most often in the form of differential equations (ordinary or partial) and additional conditions (initial, boundary, etc.). At the second stage, the model is examined: numerical solutions of the indicated equations are built, etc. In the modeling of complex technical objects, computer-aided engineering (CAE) packages are usually used, based on various variants of the finite element method (FEM): ANSYS, ABAQUS, etc. However, modeling a real object with them encounters several fundamental difficulties. Firstly, to apply FEM, it is essential to know the differential equations describing the behavior of the object. Precise information on the equations is usually lacking due to the complexity of describing the physical processes which occur in the simulated object. Secondly, to apply FEM, it is necessary to know the initial and boundary conditions, information about which for real objects is even less accurate and complete. Thirdly, during the operation of the real object, its properties and the details of the physical processes taking place in it may vary. Such functioning requires appropriate adaptation of the model, which is difficult to implement with a model built on the basis of FEM. We believe that another approach is more promising, in which, at the second stage, an adaptive model is built that can be refined and rearranged in accordance with the observations of the object. The monograph is devoted to the presentation of our methodology for constructing such a model, illustrated with numerous examples.

As the main class of mathematical models, the class of artificial neural networks (ANN) is used, which have proven to perform well in complex data processing tasks. At the moment, the neural network technique is one of the most dynamically developing areas of artificial intelligence. It has been successfully used in various applied areas, such as:
1. Forecasting various financial market and economic indicators (exchange rates and stocks, credit and other risks, etc.).
2. Biomedical applications (diagnosis of different diseases, identification of personality).
3. Control systems.
4. Pattern recognition, identity recognition.
5. Geology (prediction of the availability of mineral resources).
6. Ecology and environmental sciences (weather forecasting and prediction of various cataclysms).

Before describing the features of neural network modeling, we formulate the task of mathematical modeling in the most general form. By the mathematical model of the researched object (system), we will understand a mapping F: (X, Z) → Y that establishes the relationship between the set of input data X, which defines the conditions for the functioning of the object, the set of state parameters Z, which characterizes the condition of the elements (components) of the model, and the set of output data Y. The mapping F in this approach is characterized by its structure (the type of the elements and the connections between them) and a set of parameters. Most often, researchers limit the development of the model to the adjustment of the parameters with a fixed mapping structure, which is selected based on "physical considerations," experimental data, etc., but the simultaneous selection of the structure and parameters of the model seems more adequate. Such a selection of parameters (or parameters and structure) is performed based on the minimization of a certain set of functionals {J} of error, quality, etc., which determine the degree to which the model fulfills its purpose.

In data processing tasks, a finite set of input parameters X and a set of corresponding outputs Y is generally given. In this case, the error functional shows how much the output of the model for a given input differs from the output Y known from experience. In the problems where the construction of the mathematical model is carried out based on differential data, the functional shows how well the desired function satisfies the differential equation, which is assumed to be known. In such problems, additional relations are traditionally used in the form of specified boundary and initial conditions, although our approach allows us to easily consider problems in which, in addition to the differential equation, approximately known and, generally speaking, replenishable experimental data are specified. Meanwhile, additional relationships related to the nature of the described object or reflecting the model features can be considered: symmetry requirements, conservation laws, solvability conditions for the arising problem, etc. In fact, the methods we offer for solving these problems differ little from each other; we only need to properly select a set of functionals {J} and a class of functions F in which the model is sought. The proposed approach to modeling in this form is consonant with the idea expressed by the great L. Euler: "Every effect in the universe can be explained as satisfactorily from final causes, by the aid of the method of maxima and minima, as it can from the effective causes."

In this book, models are sought in several standard functional classes, usually referred to as neural networks. Currently, both in Russia and abroad, a wealth of experience has been gained in applying certain types of neural networks to numerous practical tasks. The need has matured to establish a unified methodology for the development of algorithms for the construction and training of various types of neural networks, as applied to solving a wide class of modeling problems. There is a fairly wide range of tasks in mathematical modeling (related generally to the description of systems with distributed parameters) which lead to the study of boundary value problems for partial differential equations (or integro-differential equations). The main methodological mistake in the building of mathematical models of real objects is that the partial differential equation (along with the boundary conditions) is taken as the modeling object, from which its approximate model is built as the solution found through one or another numerical method.

It is more accurate to look at the differential equations (together with the accompanying initial and boundary conditions) as an approximate model containing information about the modeled object, from which one can move to a more convenient model (for instance, a functional one), using the equations and other available information. Even more accurate is the consideration of a hierarchy of models of different accuracy and areas of applicability, which can be refined as new information becomes available.

Only a small number of problems in mathematical physics, usually possessing symmetry, allow an exact analytical solution. Existing approximate solution methods either allow us to obtain only a pointwise approximation, as grid methods do (obtaining an analytical expression from a pointwise solution is a separate problem), or impose specific requirements on the set of approximating functions and require solving the important auxiliary problem of partitioning the original region, as in the finite element method. The existing neural network approaches for solving problems of mathematical physics are either highly specialized or use variants of the collocation method with fixed neural network functions, which can lead to notable errors between the nodes. Among the publications devoted to the use of neural networks for solving partial differential equations (generally, these are networks of a special type with an inherent adjustment method), we note the following. Our goal was not to present an exhaustive review of the literature on the application of artificial neural networks for building approximate solutions of differential equations; any review of this sort quickly becomes outdated, and such a goal is better served by review publications in scientific journals. Here we indicate those works that we consider important for the indicated research direction, those publications that impressed us and helped us in determining our approach to the construction of mathematical models of real objects from heterogeneous data.

In his fundamental work [1], devoted to the 20th anniversary (1968–88) of the multiquadric-biharmonic method (MQ-B), R. Hardy not only addresses the history of the origin and development of this original method but also summarizes the main ideas and advantages associated with its application. Failures in topographic applications of various modifications of trigonometric and polynomial series for constructing curves or surfaces from scattered data led to the emergence of a new approach. To approximate a function of, for example, two independent variables, instead of piecewise approximators, the expansion (in Hardy's notation)

$$H(x, y) = \sum_{j=1}^{n} \alpha_j Q_j(x, y; x_j, y_j, \Delta)$$

by multiquadrics of the form $Q_j(x, y; x_j, y_j, \Delta) = [(x - x_j)^2 + (y - y_j)^2 + \Delta^2]^{1/2}$ is used. The advantages of such expansions are noted: their greater efficiency in comparison, for instance, with expansions in spherical harmonics; the multiquadrics $Q_j$, and consequently functions of type H, are infinitely differentiable, which makes it possible to "fit" the values of functions, derivatives, etc. An overview of the applications of this approach is given in geodesy, geophysics, cartography, and terrestrial survey; in photogrammetry, remote sensing, measurement, control and recognition, signal processing; in geography and digital terrain modeling; in hydrology.

A number of significant results in the application of neural networks with radial basis functions, called RBF, for solving differential and integral equations were obtained by Edward Kansa. We would highlight articles [2–5]. In the online publication [4], he gives an overview of different approaches and the motivation for using RBF for solving partial differential equations, outlining prospects and identifying problem areas. Kansa notes that the numerical solution of partial differential equations is dominated by finite difference, finite element, or finite volume methods, which use local interpolation schemes. For local approximations, these methods require the construction of a grid, which in the case of large dimensions represents a nontrivial problem. An asymmetric collocation method for partial differential equations is considered using the example of elliptic equations. In the case of parabolic and hyperbolic equations, the use of RBF expansions is carried out only for spatial coordinates, while the time dependence is treated by the method of lines (which leads to the consideration of ordinary differential equations). In the elliptic case, the differential equations and boundary conditions are, for simplicity, assumed to be linear, and the boundary value problem is assumed to be well posed. The action of the operators comes down to their effect on the basis functions, and a collocation problem arises, leading to the solution of a linear system for the coefficients of the expansion in the basis functions. Kansa points out that more interesting problems for partial differential equations require a well-thought-out arrangement of nodes to cover a fairly wide range of examples of important physical phenomena. The use of MQ for solving Burgers' equation with viscosity revealed that, with an increase in the Reynolds number, adaptive selection of a subset of nodes was required to describe the discontinuity front adequately.
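As an illustration only (a sketch of ours, not an example from the cited works), the following code solves a one-dimensional model problem u''(x) = f(x), u(0) = u(1) = 0, with a multiquadric expansion, forming a least-squares system from equation residuals at interior collocation points and weighted boundary residuals, in the spirit of the asymmetric collocation described above. The node positions, the shape parameter, the boundary weight, and the test problem itself are arbitrary choices made for the demonstration.

import numpy as np

# Multiquadric basis phi_j(x) = sqrt((x - c_j)^2 + delta^2) and its second derivative.
def mq(x, c, delta):
    return np.sqrt((x - c) ** 2 + delta ** 2)

def mq_xx(x, c, delta):
    return delta ** 2 / ((x - c) ** 2 + delta ** 2) ** 1.5

# Model problem: u'' = f on (0, 1), u(0) = u(1) = 0, exact solution u = sin(pi*x).
f = lambda x: -np.pi ** 2 * np.sin(np.pi * x)

centers = np.linspace(0.0, 1.0, 15)        # basis-function centers c_j
delta = 0.15                               # shape parameter (arbitrary choice)
xi = np.linspace(0.0, 1.0, 40)             # interior collocation (test) points
xb = np.array([0.0, 1.0])                  # boundary points
w_bnd = 10.0                               # weight of the boundary residuals

# Least-squares system: rows are equation residuals at xi plus weighted boundary residuals.
A_eq = mq_xx(xi[:, None], centers[None, :], delta)
A_bc = w_bnd * mq(xb[:, None], centers[None, :], delta)
A = np.vstack([A_eq, A_bc])
b = np.concatenate([f(xi), w_bnd * np.zeros(2)])

alpha, *_ = np.linalg.lstsq(A, b, rcond=None)

# Check the approximation against the exact solution on a fine grid.
xt = np.linspace(0.0, 1.0, 201)
u_approx = mq(xt[:, None], centers[None, :], delta) @ alpha
print("max error:", np.max(np.abs(u_approx - np.sin(np.pi * xt))))

Solving the system in the least-squares sense, rather than by square collocation, already hints at the global-optimization reading of the problem discussed next.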

The most important place in Kansa's work is occupied by his remark on defining the problem from the perspective of global optimization, bypassing collocation methods, which are essentially ill-conditioned. The works by Yefim Galperin et al. [6, 7], developing this approach, stand out: in them, the initial conditions, the boundary conditions, and the equations themselves are introduced through functionals, the weighted sum of which defines the final global functional. The vector of adjustable parameters (nodes, shape parameters, expansion coefficients) of MQ RBF networks is varied until the residuals reach the required level of accuracy. This approach decreases the number of functions used. Thus, not only are the ill-conditioning issues bypassed that arise from the symmetric or asymmetric collocation problem for solving partial differential equations, but ill-posed, so-called incorrect, problems are also brought into the circle of considered problems; such problems represent physical reality even though an "exact" mathematical solution may not exist. Galperin and Kansa [8] succeeded in using this approach to numerically solve weakly singular Volterra integral equations: adding MQ basis functions one at a time and optimizing the parameters at each step using a three-parameter optimization procedure, they found that for their problem they required from 4 to 7 basis functions for convergence with an error not exceeding 5 × 10⁻⁷. A similar approach was used in an article written by E. Kansa et al. [5], in which elliptic equations with Dirichlet and Neumann boundary conditions were considered; two-dimensional problems for the Laplace, Poisson, and biharmonic equations were taken as test problems, and good agreement between the exact and calculated solutions was found. In the work by Yefim Galperin and Quan Zeng [7], a new approach based on η-equivalent solutions (approximate solutions for which the error functional does not exceed η) is given for treating incorrect and overdetermined problems for partial differential equations and problems for which a solution does not exist. A new method is developed, based on global optimization, for solving and controlling the processes described by partial differential equations. Following [7], we describe the concept of an η-equivalent solution in more detail.

Definition. For a given η > 0, a function is called an η-equivalent solution (η-solution) to a system of equations (including differential equations, boundary and other conditions) if and only if, upon its substitution into these equations, the left and right parts differ by less than η.

Obviously, two η-solutions do not have to be close to each other. They are "close" with respect to the conditions which are included in the specific concept of η-equivalence.
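To make this concrete, one possible formalization (our own reading, consistent with how η-solutions are used later in this introduction, where the discrete error functional J is required to satisfy J(u_η) < η) is:

$$J(u_\eta) = \lambda_1 \,\|A(u_\eta) - g\|^2_{\Omega} + \lambda_2 \,\|B(u_\eta) - h\|^2_{\Gamma} < \eta, \qquad \lambda_1, \lambda_2 > 0,$$

where the weights λ₁, λ₂ fix which conditions enter the specific concept of η-equivalence and with what importance.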

The class of η-equivalent solutions includes all exact solutions; a particular η-solution can approximate one, several, or none of the exact solutions. Moreover, η-equivalent solutions may exist when no exact solution exists. Thus, η-equivalent solutions defined in this way are natural in view of the random inaccuracies in the specification of the model equations and of the impurities and environmental pollution amid which physical problems are considered. Moreover, it is exactly η-equivalent solutions (or their approximations) that are obtained by calculations.

The numerical method for solving differential equations proposed by Dissanayake et al. [9] is also based on the neural network approach. The construction of a neural network using the point collocation method transforms the numerical problem of solving a partial differential equation into an unconstrained minimization problem. The technique is illustrated using two numerical examples.

In the works by Vladimir Gorbachenko [10–16], cellular neural networks (CNN) were used to solve partial differential equations. They combine the features of cellular automata and neural networks. The proximity of the mathematical description of this class of neural networks to systems of difference equations approximating partial differential equations argues for the naturalness and promise of their application. There is an opinion that CNN allow one to describe more complex nonlinear physical phenomena than the well-known partial differential equations do. In the monograph by V. I. Gorbachenko [10], the possibilities of using neurocomputers for solving boundary value problems of field theory were examined in depth: the structures and algorithms of analog and digital cellular neuro-like networks for solving discrete analogues of partial differential equations were considered; the results of the study of learning algorithms and of the structure of modeling neural-like networks are given for solving problems of field theory (mainly problems described by equations of elliptic and parabolic types), a number of nonlinear problems, internal and external problems (boundary integral equations in connection with the latter), and problems of plate theory; neural networks for solving the partial eigenvalue problem were considered; and an analysis of the structure of neurocomputers based on analog and digital networks is given. In the recent works by Gorbachenko et al. (for example, [13]), the neural network approach is used for ill-posed problems: for solving the coefficient inverse problem of mathematical physics, in which the coefficients of the equation (or functions entering the initial and boundary conditions) are recovered from a set of solution values at certain points of the domain [17, 18]. In other works [14, 15], an attempt was made to use a different type of neural networks (radial basis function networks) to solve partial differential equations. The theoretical results are illustrated by numerical experiments for the case of a homogeneous Dirichlet problem in the square for the Poisson equation with a special right-hand side; the advantage of the gradient weighting method in comparison with the traditional collocation method was noted.

The article by Mai-Duy [19] presents mesh-free procedures for solving linear differential equations (ordinary ones and partial differential equations of elliptic type), based on MQ-RBF neural networks and using the methods of approximation of functions and their derivatives proposed by the author. Of interest is S. A. Terekhov's work [20], in which, on test problems, neural networks are considered as variational trial solutions of boundary and initial value problems for the equations of mathematical physics.

In the work by I. Lagaris et al. [21], the solution of a boundary value problem is sought as the sum of two terms, one of which satisfies the given initial and boundary conditions and does not contain adjustable parameters, while the other satisfies homogeneous initial and boundary conditions and includes a feedforward neural network with adjustable parameters. The neural network solution constructed in this way is adjusted to satisfy the differential equation. Such an approach applies to an ordinary differential equation, a partial differential equation, or a system of differential equations, but, unfortunately, only in the case of linear problems. The method is illustrated by solving a set of model problems for domains with simple geometry, such as a segment or a square, and a comparison of neural network solutions with the solutions obtained on the basis of Galerkin's finite element method is conducted. In the work by I. Lagaris et al. [22], the implementation of the method [21] on a hardware platform with two digital signal processors is described, and comparative results against a PC-based implementation of the method are provided. We should also note his other significant works on the application of ANNs to solving PDEs. The paper [23] studies the solution of eigenvalue problems for differential and integro-differential operators using ANNs. The well-known equations of quantum mechanics, such as the Schrödinger equation for the Morse potential, the Schrödinger and Dirac equations for the muonic atom, and the non-local integro-differential Schrödinger equation, are considered as model problems. In the case of two dimensions, the well-studied Hénon-Heiles Hamiltonian is treated, and in three dimensions a model problem for three coupled anharmonic oscillators is considered.

In all cases, the method was highly accurate, robust, and efficient. The neural network approach is a promising tool for solving problems of higher complexity and dimension. PDEs with boundary conditions (Dirichlet or Neumann) defined on boundaries with simple geometry have been successfully treated using ANNs. The article [24] deals with the case of complex boundary geometry, where the boundary is determined by a number of points that belong to it and are closely located, so as to offer a reasonable representation. Two networks are employed: a multilayer perceptron and a radial basis function network. The RBFN is used to account for the exact satisfaction of the boundary conditions. The method has been successfully tested on two-dimensional and three-dimensional PDEs and has yielded accurate results. A novel method for solving ordinary and partial differential equations, based on grammatical evolution, is presented in [25]. The technique forms generations of trial solutions expressed in a closed analytical form. Several examples are worked out, and in most cases, the exact solution is recovered. When the solution cannot be expressed in a closed analytical form, the proposed method produces an approximation with a controlled level of accuracy. Results on several problems are reported to illustrate the potential of this approach. We also mention the noteworthy publications by I. Lagaris on optimization, parallel approximation, and infinite-dimensional neural networks.

In the work by C. E. Burnaev et al. [26], a method for solving differential equations using cellular neural networks, similar in spirit to the works [10, 11], is presented. They take a finite-difference approximation as the initial formulation. The method is illustrated by solving several simple examples for the Laplace equation and the heat equation in rectangular domains with a known analytical solution. S. V. Belikov's publication [27] proposes the use of the neural network methodology not only for identifying the coefficients of a known pattern for solving a problem of mathematical physics but also for choosing the solution method and the type of solution, which is a non-trivial task. Neural automata, which extend the concepts of hierarchical neural networks and the mixture-of-experts model, combine neural networks into a graph, which is considered as the transition graph of a finite-state automaton. The neural network decides to shift from one state to another or to stop. The solution to the problem is a path in the transition graph; an incomplete path is interpreted as a part of the solution or an approximation to it. The article by R. Masuoka [28] presents the results of experiments on the training of neural networks based on restrictions in the form of differential data (including partial differential equations), and a corresponding algorithm has been developed. A. V. Shobukhov [16] made the first steps in the research of a neural network variant of the spline-collocation method for the numerical solving of nonlinear equations of mathematical physics (the Burgers equation).

At this time, neural networks are quite widely used to solve differential equations, both ODEs and PDEs. There have been a lot of works in applied areas and in computational mathematics and physics where the use of neural networks is considered as a numerical meshless method. The range of applications is expanding, and specific issues are being studied. Thus, fully connected multilayer perceptrons are used to obtain numerical solutions of partial differential equations in the publication [29]. The solution of systems of ordinary differential equations is considered in the work [30], where the neural networks used are structurally similar to recurrent neural networks but differ in the number and activation functions of the neurons. In the paper [31], neural networks with radial basis functions (RBFN) with asymmetric placement are applied to the numerical solution of high-order ordinary differential equations. A general and simple numerical method for solving delay differential equations (DDEs) based on radial basis functions with a special arrangement is proposed in the article [32]. A combination of RBF and the finite difference method (RBF-FD) was proposed in the publication [33] to reduce computational complexity and improve the accuracy of solving differential equations; the authors investigated the RBF-FD method on the example of elliptic PDEs. In the article [34], compact RBF networks, which allowed the authors to increase the speed of training, are used for the solution of some known classes of astrophysics problems. The article [35] presents a method for solving classes of Lane-Emden type equations, which are nonlinear ordinary differential equations on the semi-infinite region [0, ∞); the technique is based on RBF differential quadrature (RBF-DQ). The article [36] is devoted to the solution of PDEs using an artificial neural network; in this paper, the numerical solution of elliptic PDEs was obtained for the first time using the Chebyshev neural network. RBF-based methods are widely used in numerical analysis and statistics. For example, the article [37] proposes a solution to PDEs using RBFs which have different shape parameters. The study included the determination of the optimal value of the shape parameter depending on the localization of the RBF functions.

High-order numerical methods used to solve differential equations are generally quite sensitive to data perturbations. In order to improve the solution, it is proposed in the article [38] to use integrated multiquadrics (IMQ) as an RBF network for solving boundary value problems. A notable review and monograph by Yadav et al. [39, 40] can be considered a useful base for beginners in the field of neural network methods for differential equations who want to start quickly and easily.

Recently, there has been growing interest in deep learning neural networks, and in this regard, there is a lot of work on the use of such networks to solve differential equations. Deep learning methods are promising for solving problems of high dimension. In [41], the researchers built a deep learning neural network to solve the Laplace problem in two and higher dimensions. In the paper [42], deep feedforward artificial neural networks are used to approximate solutions of partial differential equations in complex geometries. It is shown how to modify the backpropagation algorithm to compute the partial derivatives of the network output with respect to the space variables, which is needed to approximate the differential operator. The method is based on an ansatz for the solution which requires nothing but feedforward neural networks and an unconstrained gradient-based optimization method such as gradient descent or a quasi-Newton method. An example is given where classical mesh-based methods cannot be used, and neural networks can be seen as an attractive alternative. The work highlights the benefits of deep compared to shallow neural networks and convergence-enhancing techniques. The work [43] establishes connections between non-convex optimization methods for training deep neural networks (DNNs) and the theory of partial differential equations (PDEs). In particular, it focuses on relaxation techniques initially developed in statistical physics. The researchers employ the underlying stochastic control problem to analyze the geometry of the relaxed energy landscape and its convergence properties, thereby confirming empirical evidence. This paper opens non-convex optimization problems arising in deep learning to ideas from the PDE literature. An elegant algorithm is constructed that outperforms state-of-the-art methods. It is shown that these algorithms scale well in practice and can effectively tackle the high dimensionality of modern neural networks. The paper [44] proposes an approach to the construction of a fast iterative solver adapted to the geometry of a particular domain. This goal is achieved by modifying an existing solver using a deep neural network. After training on one geometry, the model generalizes to a wide range of geometries and boundary conditions and achieves a speedup of two to three times compared to modern solvers.

In the noteworthy article [45], the researchers propose to solve high-dimensional PDEs by approximating the solution with a deep neural network which is trained to satisfy the differential operator, the initial condition, and the boundary conditions. The algorithm is mesh-free, which is essential in the case of higher dimensions. Instead of forming a mesh, the neural network is trained on batches of randomly sampled time and space points. The algorithm is tested on a class of high-dimensional free boundary PDEs, which can be accurately solved in up to 200 dimensions, as well as on some other examples: a high-dimensional Hamilton-Jacobi-Bellman (HJB) PDE and Burgers' equation. The deep learning algorithm approximates the general solution to Burgers' equation for a continuum of different boundary conditions and physical conditions (which can be viewed as a high-dimensional space). The algorithm is called the "Deep Galerkin Method (DGM)" since it is similar in spirit to Galerkin methods, with the solution approximated by a neural network instead of a linear combination of basis functions. Also, a theorem is proved regarding the approximation power of neural networks for a class of quasilinear parabolic PDEs.

Note also the publications [46–48]. The work [46] puts forth a deep learning approach for discovering nonlinear partial differential equations from scattered and potentially noisy observations in space and time. Specifically, two deep neural networks approximate the unknown solution and the nonlinear dynamics. The first network acts as a prior on the unknown solution and essentially makes it possible to avoid numerical differentiations, which are inherently ill-conditioned and unstable. The second network represents the nonlinear dynamics and helps to distill the mechanisms that govern the evolution of a given spatiotemporal data set. The effectiveness of the approach is tested on several benchmark problems spanning a number of scientific domains, and it is demonstrated how the proposed framework can help to learn the underlying dynamics accurately and forecast future states of the system. In particular, the Burgers', Korteweg-de Vries (KdV), Kuramoto-Sivashinsky, nonlinear Schrödinger, and Navier-Stokes equations are studied. The recent papers [47, 48] develop physics-informed deep learning models. They estimate deep neural network models which merge data observations with PDE models.

This allows for the estimation of physical models from limited data by leveraging the a priori knowledge that the physical dynamics should obey a class of PDEs. Their approach solves PDEs in one and two spatial dimensions using deep neural networks.

The authors of this book have proposed a unified approach that allows applying almost identical algorithms to completely different tasks [49–52]. To date, the authors have published about 200 papers (in Russian and English) and 6 monographs (in Russian) concerning the construction of approximate neural network models of real complex systems with lumped and distributed parameters according to heterogeneous, replenishable data. This book is the first monograph of the authors in English and is a synthesis of the results and a presentation of new findings, ideas, generalizations, and perspectives. In particular, it gives an original approach to the construction of semi-empirical deep neural network models based on multilayer functional systems.

In 2003, in the work [49], the authors of this book formulated a program for applying the methodology of neural networks to constructing approximate solutions of problems of mathematical physics. The main stages are:
(1) Consideration of a simple problem that has a well-known analytical solution, with which the solution found using neural networks is compared, and the extension of the methodology for solving this problem to a sufficiently wide class of practically important problems.
(2) The solution of several more complex problems, for which the well-known numerical approaches encounter difficulties that are not insurmountable but require the use of various kinds of artificial techniques.
(3) Solving problems for which the standard methods are not applicable.
(4) The construction of neural network algorithms that, even for known problems, will be more effective than classical algorithms such as the method of grids or finite elements.
(5) Proofs of convergence theorems for certain classes of problems, obtaining constructive estimates of the number of required functions, etc.
(6) Creating a self-learning intellectual system for solving a fairly wide class of problems in mathematical physics.

A significant part of the declared program was subsequently implemented in a series of articles. We present the results of this work in this monograph, which summarizes our activity on the implementation of the program from 2003 to the present, taking into account the accumulated experience and summarizing certain results and findings.


The classical formulation of an applied problem in such areas as aerohydromechanics, heat and mass transfer, combustion theory, etc., usually includes partial differential equations, initial-boundary conditions, and conditions at the junctions of subdomains, as well as other conditions responsible for conservation laws, symmetry, phase transitions, etc. Exact solutions of such problems can be built extremely rarely, so it is usually necessary to build an approximate model. In order to assess the quality of the model, it is necessary to move from the initial formulation to a functional (or set of functionals) that characterizes the accuracy with which the equations, boundary, and initial conditions are satisfied, along with other requirements for the approximate solution or the designed structure. One should not forget that the differential equations, boundary conditions, etc. correspond to the simulated object, phenomenon, or process only approximately. In addition to these, observations are usually made, which must also be taken into account in the modeling process. Accounting for this information can be carried out by modifying the above-mentioned functionals. These functionals can include terms responsible for experimental data, which can be accumulated in the process of building a model, whereas the simulation results can be used when planning experiments.

There are many traditional, well-studied approaches to constructing an approximate solution of such problems: various modifications of the Galerkin method, grid methods, finite elements, boundary integral equations, asymptotic expansions, and others. Most of them are numerical methods of various kinds that allow us to obtain a pointwise or local approximation of the solution. However, building, for example, an analytic expression from a pointwise solution is a separate task. Sometimes there is a need to solve additional problems (for example, the triangulation of a domain) whose consideration is required not by the initial formulation of the problem, but only by the method of its solution. Classical methods consider a mathematical model in the form of differential equations, boundary conditions, etc. as the source object. If such a model describes a real object fairly accurately and fully, then such an idealization is justified, but for most technical applications this is not the case. At present, there is a need to consider the existing methods from a unified position that allows all the available information to be taken into account when building a mathematical model, a model that may be incomplete and replenished in the modeling process and that can be refined and rebuilt in the process of observing the real functioning object.


From this point of view, the application of artificial neural networks for modeling turns out to be justified. The neural network approach allows one to get a solution immediately as a function that satisfies the required smoothness conditions and, if necessary, has a given behavior at infinity. What is more important is the stability of neural networks with respect to data errors and the natural parallelization of computations, which, in conjunction with the application of the evolution-type algorithms discussed further below, allows the proposed methods to be implemented in the case of complex geometry of the region in which the solution is sought. This approach can also be successfully applied when constructing a series (hierarchy) of solutions in accordance with the specified data. Inverse problems of mathematical physics often belong to the class of problems that are ill-posed in the classical sense, in contrast to correctly posed problems, for which the solution exists, is unique, and depends continuously on its parameters. If the incorrectness is associated with the lack of a solution of the problem formalized in the form of equations and boundary conditions, while the real solution (the function describing the modeled object) exists, then it is necessary to change the concept of the solution. If the solution of a formalized problem is not unique, then the question arises of identifying the solution that corresponds to the actual process. If the solution of a formalized problem does not depend continuously on its parameters, then it is necessary to figure out whether this is true for the modeled object and to build the corresponding functional model. It is necessary to mention an important class of problems in which it is required to construct a model not for fixed values of some parameters, but on the whole interval of their change. If we act in a standard way, building a model on a discrete grid of parameter values, then the calculations increase manifold. It is possible to take the parameters as input variables and apply some kind of finite element method, but it is clear that this does little to improve the situation. Experience has shown that the use of neural networks is more promising for such tasks, and it is better to use perceptron-type basis functions for parameter approximation. Let us explain the essence of our neural network approach on the example of the simplest boundary value problem

$$A(u) = g, \quad u = u(x), \quad x \in \Omega \subset \mathbb{R}^p, \qquad B(u)|_{\Gamma} = h, \qquad (1)$$

here A(u) is some differential operator, i.e., an algebraic expression which contains derivatives of an unknown function u, B(u) is an operator that allows setting boundary conditions, Γ is the boundary of a domain Ω.


The well-known Ritz–Galerkin method, variants of which include the FEM, consists in seeking an approximate solution of the problem (1) as a sum

$$u(x) = \sum_{i=1}^{N} c_i v_i(x). \qquad (2)$$

Expression (2) can be interpreted as an expansion over a functional basis $\{v_i(x)\}_{i=1}^{N}$. The elements of this basis are called basis functions, and their form is determined by the method used. Thus, the approximate solution (2) can be interpreted as an approximation of the sought element of the functional space (the solution of the problem) by an element of its finite-dimensional subspace. We suggest seeking an approximate solution of the problem (1) in the form of the output of an artificial neural network (ANN) of a given architecture

$$u(x, w) = \sum_{i=1}^{N} c_i v(x, a_i). \qquad (3)$$

Here $w = (w_1, \dots, w_N)$ is the vector of weights, $w_i = (c_i, a_i)$. Thus, the basis functions become dependent on parameters, $v_i(x) = v(x, a_i)$. In this case, the parameters $a_i$, together with the parameters $c_i$, are selected in the process of solving the problem. In the literature on neural networks, this process is called training (learning). Thus, the learning process of a neural network is represented as the search for an optimal element on a finite-dimensional manifold in a given functional space. Since our algorithms allow a change in the structure of the network in the learning process (see Chapter 3), the manifold mentioned above is also adjusted. The basis function is defined by the choice of the activation function. Usually, functions $\varphi$ of one real variable are used as activation functions: for example, in the case of a plane $x = (x_1, x_2)$, choosing the activation function $\varphi(t) = \exp(-t^2)$ and the corresponding quadratic form, we come to an ellipsoidal Gaussian neuroelement of the form

$$c\,v(x, a) = c \exp\left(-a_1 (x_1 - a_4)^2 - a_2 (x_1 - a_4)(x_2 - a_5) - a_3 (x_2 - a_5)^2\right), \quad a = (a_1, a_2, a_3, a_4, a_5).$$

A separate addend $c_i v(x, a_i)$ in the sum (3) is what we call a neuroelement.
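To make the representation (3) concrete, the following short Python sketch (our own illustration, not code from the book) evaluates the output of a small network of ellipsoidal Gaussian neuroelements on the plane; the number of neurons, the random initialization, and the constraint keeping the quadratic form positive definite are assumptions made only for the example.

```python
import numpy as np

# Minimal sketch of Eq. (3) with ellipsoidal Gaussian neuroelements on the plane.
# The number of neurons and the initialization below are illustrative assumptions.
rng = np.random.default_rng(0)
n_neurons = 10

c = rng.normal(size=n_neurons)                      # linear weights c_i
a = np.empty((n_neurons, 5))                        # nonlinear weights a_i = (a1, ..., a5)
a[:, 0] = rng.uniform(0.5, 2.0, n_neurons)          # a1 > 0
a[:, 1] = rng.uniform(-0.2, 0.2, n_neurons)         # small cross term a2 keeps the form positive definite
a[:, 2] = rng.uniform(0.5, 2.0, n_neurons)          # a3 > 0
a[:, 3:] = rng.uniform(-1.0, 1.0, (n_neurons, 2))   # centers (a4, a5)

def neuroelement(x1, x2, ai):
    """Ellipsoidal Gaussian v(x, a) centered at (a4, a5)."""
    d1, d2 = x1 - ai[3], x2 - ai[4]
    return np.exp(-(ai[0] * d1**2 + ai[1] * d1 * d2 + ai[2] * d2**2))

def u(x1, x2):
    """Network output u(x, w) = sum_i c_i v(x, a_i) from Eq. (3)."""
    return sum(ci * neuroelement(x1, x2, ai) for ci, ai in zip(c, a))

print(u(0.3, -0.2))   # value of the approximation at a single point
```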

Sometimes it is advisable to use heterogeneous ANNs with different basis functions $v_i$. Usually, such networks are used when the desired solution behaves in a fundamentally different way in different subdomains, for example, in problems with phase transitions, jumps, etc. The weights of the ANN—the linear input parameters $c_i$ and the nonlinear input parameters $a_i$—are found in the process of step-by-step network training, built in the general case on minimizing some error functional. The error functional could be taken as

$$\int_{\Omega} |A(u) - g|^2 \, d\Omega + \delta \int_{\Gamma} |B(u) - h|^2 \, d\Gamma.$$

However, this functional can be calculated analytically only in exceptional cases, so it is more convenient to use its discrete form

$$\sum_{j=1}^{M} \left| A(u(\xi_j)) - g(\xi_j) \right|^2 + \delta \sum_{k=1}^{K} \left| B(u(\xi'_k)) - h(\xi'_k) \right|^2.$$

It is necessary to emphasize that we do not seek the minimum of the functional $J(w)$ itself, but rather a point $u_\eta$ on the manifold of neural network functions of the given architecture—an $\eta$-solution: $J(u_\eta) < \eta$, where the number $\eta > 0$ is small enough for the constructed model to be considered sufficiently accurate. Our numerical experiments showed that the use of a fixed set of test (trial) points $\{\xi_j\}$ is impractical, since in this case the smallness of the functional can be accompanied by large errors at other points of the domain $\Omega$. We proposed and used a solution to this problem, which consists in using periodically regenerated sets of test points $\{\xi_j\}$ in the domain $\Omega$. Sometimes we also used the regeneration of the points $\{\xi'_k\}$ on its boundary $\Gamma$; usually this had to be done for boundaries of complex shape or multi-dimensional boundaries. Regeneration of test points after a certain number of steps of the network learning process makes it more stable. In fact, a sequential minimization of a set of functionals is considered, each of which is obtained by a specific choice of test points and is not fully minimized (only a few steps of the selected minimization method are performed between regenerations of the test set). Such an approach, in particular, makes it possible to circumvent the problem of getting stuck in a local extremum, which is characteristic of most nonlinear optimization methods.

Test points can be selected within the domain $\Omega$ (or in a wider set $\tilde{\Omega} \supseteq \Omega$) and on the boundary $\Gamma$ in a regular manner, for example, uniformly in the case of a bounded region or according to the normal law if the region is unbounded. Sometimes the learning process is based on a targeted arrangement of trial points. However, in most situations a random distribution of points, regenerated after a certain number of learning epochs (optimization steps) using a constant (or other) probability density, is more appropriate, since it ensures a more stable course of learning. This approach allows us to control the quality of learning using standard statistical procedures. In some cases, we found it advisable to use a non-uniform distribution of test points—thickening them near singularities (discontinuity boundaries, corners, etc.) or in the areas with large errors. We note that it is possible to choose a different type of error functional, for example, the energy one for some statements of the problem. The right-hand sides of the functional representations may include addends that are also responsible for other conditions of the formulation: symmetry, boundary conditions, matching conditions at the interfaces, the equation of state, etc. So, for $A = \Delta$, $g = 0$, the sought solution can be obtained by minimizing the functional

$$\int_{\Omega} |\nabla u|^2 \, d\Omega + \delta \int_{\Gamma} |B(u) - h|^2 \, d\Gamma.$$
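The following Python sketch (our illustration, under the assumptions stated in the comments) shows how the discrete error functional and the periodic regeneration of test points can be organized for a one-dimensional model problem; the specific equation, the finite-difference residual, and all names are chosen only for the example.

```python
import numpy as np

# Sketch of the discretized error functional
#   J = sum_j |A(u)(xi_j) - g(xi_j)|^2 + delta * sum_k |B(u)(xi'_k) - h(xi'_k)|^2
# with periodically regenerated test points. The model problem u''(x) = g(x) on (0, 1)
# with u(0) = u(1) = 0, the central-difference residual, and delta are assumptions.
rng = np.random.default_rng(1)
delta, step = 10.0, 1e-3

def g(x):
    return -np.pi**2 * np.sin(np.pi * x)      # corresponds to the exact solution sin(pi*x)

def J(u, xi, xi_b):
    # equation residual A(u) - g with A(u) = u'' approximated by central differences
    d2u = (u(xi + step) - 2.0 * u(xi) + u(xi - step)) / step**2
    j1 = np.sum((d2u - g(xi))**2)
    # boundary residual B(u) - h with B(u) = u and h = 0 (Dirichlet condition)
    j2 = np.sum(u(xi_b)**2)
    return j1 + delta * j2

u_trial = lambda x: 0.9 * np.sin(np.pi * x)   # stand-in for the neural network output u(x, w)

for epoch in range(3):
    xi = rng.uniform(0.0, 1.0, size=50)       # regenerated test points in Omega
    xi_b = np.array([0.0, 1.0])               # test points on the boundary Gamma
    print(f"epoch {epoch}: J = {J(u_trial, xi, xi_b):.4f}")
    # ...a few optimization steps on the network weights would be taken here...
```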

The requirements for the model (experimental data, symmetry properties, unknown boundaries, conservation laws, etc.), as mentioned above, can be implemented as additional terms in the error functional

$$\sum_{t=1}^{M} |A(u) - g|^2(\xi_t) + \delta \sum_{k=1}^{K} |B(u) - h|^2(\xi'_k) + \cdots$$

We note that in some problems it is convenient to use sets of ANNs

$$u_s(x, w_s) = \sum_{i=1}^{N_s} c_{i,s}\, v_s(x, a_{i,s})$$

(usually in the case of composite domains) or an ANN system. Examples of such problems will be given later in Chapters 1 and 4. In our approach, we treat non-stationary problems (in particular, initial-boundary value problems) by changing the dimension of the space—including time among the variables. But there are other ANN approaches, in particular, the use of dynamic neural networks.

In practice, it is often necessary to investigate the behavior of a solution depending on a certain parameter, to identify the value of a parameter from measurement data, or to find an approximate solution in a situation when the characteristics defining the system being simulated are not known precisely, but are given by values distributed over certain intervals, i.e., by interval parameters. When considering such problems, we use the neural network approach we are developing to construct stable approximate models of complex systems. Let us explain the essence of this approach on the simplest (generally speaking, nonlinear) boundary value problem (1). The modification of the task lies in the fact that its formulation includes parameters $r = (r_1, \dots, r_k)$ varying over some intervals $r_i \in (r_i^-; r_i^+)$, $i = 1, \dots, k$:

$$A(u, r) = g(r), \quad u = u(x, r), \quad x \in \Omega(r) \subset \mathbb{R}^p, \qquad B(u, r)|_{\Gamma(r)} = f(r), \qquad (4)$$

where $A$ is some differential operator, that is, an algebraic expression containing ordinary or partial derivatives of an unknown function $u$; $B$ is an operator that allows setting boundary conditions; $\Gamma$ is the boundary of the domain $\Omega$. The representation of the approximate solution of the problem also changes. We look for an approximate solution of the problem (4) in the form of the output of an artificial neural network of a given architecture:

$$u(x, r) = \sum_{i=1}^{N} c_i v_i(x, r, a_i), \qquad (5)$$

the weights of which—the linear incoming parameters $c_i$ and the nonlinear incoming parameters $a_i$—are determined in the process of gradual network training based on minimizing an error functional of the form

$$\sum_{j=1}^{M} \left| A(u(x_j, r_j)) - g(x_j, r_j) \right|^2 + \delta \sum_{j=1}^{M'} \left| B(u(x'_j, r'_j)) - f(x'_j, r'_j) \right|^2. \qquad (6)$$

Here, as before, $\{x_j, r_j\}_{j=1}^{M}$ are periodically regenerated trial points in the domain $\Omega(r_j)$, $r_j \in \prod_{i=1}^{k}(r_i^-; r_i^+)$; $\{x'_j, r'_j\}_{j=1}^{M'}$ are trial points on its boundary $\Gamma(r'_j)$; $\delta$ is a positive penalty parameter.
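As a concrete (and deliberately simplified) illustration of Eqs. (5)–(6), the Python sketch below treats the parameter r as an extra network input and regenerates the trial points (x_j, r_j) jointly; the toy problem u' = r·u with u(0, r) = 1, the Gaussian-type basis, and all names are our assumptions, not the authors' code.

```python
import numpy as np

# Sketch of the parametric formulation (5)-(6): the parameter r is an extra input of the
# network, and trial points (x_j, r_j) are regenerated jointly in (0, 1) x (r-, r+).
# Toy problem (assumed): u'(x) = r * u(x), u(0, r) = 1.
rng = np.random.default_rng(2)
r_lo, r_hi, delta, h = 0.5, 2.0, 10.0, 1e-3

n = 8
c = rng.normal(size=n)                                          # linear weights c_i
centers = rng.uniform([0.0, r_lo], [1.0, r_hi], size=(n, 2))    # basis centers in (x, r) space

def u(x, r, w):
    # network output u(x, r) = sum_i w_i * exp(-|(x, r) - center_i|^2)
    d = (x[:, None] - centers[None, :, 0])**2 + (r[:, None] - centers[None, :, 1])**2
    return np.exp(-d) @ w

def J(w):
    x = rng.uniform(0.0, 1.0, size=64)                          # regenerated trial points (x_j, r_j)
    r = rng.uniform(r_lo, r_hi, size=64)
    du = (u(x + h, r, w) - u(x - h, r, w)) / (2.0 * h)          # residual of A(u, r): u' - r*u
    j1 = np.mean((du - r * u(x, r, w))**2)
    r0 = rng.uniform(r_lo, r_hi, size=16)                       # boundary points: x = 0, r in (r-, r+)
    j2 = np.mean((u(np.zeros(16), r0, w) - 1.0)**2)             # residual of B(u, r): u(0, r) - 1
    return j1 + delta * j2

print("initial J =", J(c))   # the weights c (and the centers) would then be trained
```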


The result of our work is a methodology for building a hierarchy of neural network models of complex processes that can be updated as new information becomes available. Based on this methodology, a unified process for solving complex problems of mathematical modeling has been developed. Its main stages are:
1. Characterization of the quality of the model in the form of a functional (a set of functionals). This stage is based on the information about the models of the phenomena being studied (these models can be refined in the process of constructing a solution and of designing and operating an object) and can be implemented by a specialist in the subject area. This stage is discussed in detail in Chapter 1 with numerous examples.
2. The choice of a functional basis (bases). This stage can be performed by a subject matter expert on the basis of information about the nature of the phenomena being modeled, or automatically, using the evolutionary algorithms developed by the authors (see the following chapters). Neural network bases have shown their effectiveness in solving various problems and depend only weakly on the characteristics of the problem itself. A description of the neural network basis functions that we used in solving specific problems is given in Chapter 2.
3. The choice of methods for selecting the parameters and structure of the model. This stage can be fully automated and does not require the mandatory intervention of a specialist in the subject area, although the approximate information such a specialist has about the behavior of the object can easily be taken into account when building the model. A description of the methods we used to solve problems is given in Chapter 3.
4. Implementation of methods for refining models of objects in the process of their operation and the corresponding adjustment of their control algorithms. The principles for constructing such methods are based on the application of the methods that we set out in Chapters 1–3.
5. Updating the database of models, algorithms, and programs.

The use of neural networks as a new methodology for solving both old—classical—and new—non-classical—tasks is based on several special properties of neural networks. The proposed neural network approach allows one to obtain a solution immediately in an analytical (or piecewise analytic) form—as a function that satisfies the required smoothness and behavior-at-infinity conditions: for example, the Gaussian neural network basis function allows one to obtain an infinitely differentiable solution that decreases at infinity faster than any power of the argument. Obviously, by using neural networks, one can calculate linear functions, nonlinear functions of one variable, as well as their possible superpositions—composite functions obtained by the cascade connection of networks. But how great are the possibilities of neural networks? Two natural questions arise:
1. What functions can be calculated exactly: can, for example, an arbitrary continuous function of n variables be obtained using addition, multiplication, and superposition operations from continuous functions of a smaller number of variables?
2. What functions can be arbitrarily closely approximated with the help of neural networks, and what are the requirements for the neural network functions?


The answer to the first fundamental question turned out to be positive. A. N. Kolmogorov [53, 54] and V. I. Arnold [55, 56] proved that any continuous function of n variables can be obtained using the operations of addition, multiplication, and superposition from continuous functions of one variable. It should be noted that the continuity condition cannot be significantly strengthened: there are analytic functions of several variables that do not admit a representation by means of a superposition of analytic functions of two variables. Moreover, A. G. Vitushkin showed that not all k times continuously differentiable functions of three variables can be represented as superpositions of functions of two variables, each of which is differentiable [2k/3] times and all partial derivatives of which of order [2k/3] satisfy the Lipschitz condition. Let us present the formulation of the Kolmogorov theorem, which completed this series of studies for continuous functions and served as a fundamental result in substantiating neural network applications: each continuous function of n variables defined on the unit cube of n-dimensional space can be represented as

$$f(x_1, x_2, \dots, x_n) = \sum_{j=1}^{2n+1} h_j\!\left(\sum_{i=1}^{n} \varphi_j^i(x_i)\right),$$

where the continuous functions $h_j(\cdot)$ depend on $f(\cdot)$, while the continuous (but nonsmooth) functions $\varphi_j^i(\cdot)$ are, moreover, standard, i.e., do not depend on the choice of the function $f(\cdot)$. In particular, for example, any continuous function of two variables can be represented as $f(x_1, x_2) = \sum_{j=1}^{5} h_j\!\left(\varphi_j^1(x_1) + \varphi_j^2(x_2)\right)$. See also [57].

The answer to the second question—about the possibility and conditions of neural network approximation—is also very important, especially in connection with practical applications. The problem of approximating an arbitrary function from a given class with the help of functions selected from a certain “narrow” family has a rich history and is characterized by numerous remarkable results. The well-known Weierstrass theorem on the approximation of functions by polynomials asserts that a continuous function of several variables on a closed bounded set $\Omega$ can be uniformly approximated by a sequence of polynomials. Approximation by rational functions (Padé approximation), as well as approximation by nonlinear finite-parametric manifolds, has recently received powerful development, especially in connection with physical applications [58–62].


Weierstrass's theorem has a far-reaching generalization—Stone's theorem. Instead of a closed bounded set $\Omega \subset \mathbb{R}^p$, we consider a compact space $X$ and the algebra $C(X)$ of continuous real-valued functions on $X$. We give a variant of the formulation of Stone's theorem.

Theorem 1. Let $E \subset C(X)$ be a closed subalgebra in $C(X)$, $1 \in E$, and let the functions from $E$ separate points in $X$ (that is, for any pair of distinct points $x, y \in X$ there is a function $g \in E$ such that $g(x) \neq g(y)$); then $E = C(X)$.

It is important that not only functions of many variables are brought into consideration, but also that the set of approximating “narrow” families is significantly enriched. A ring of polynomials in any set of functions that separates points can act as a dense set, and not just the family of polynomials in the coordinate functions, as in the Weierstrass theorem. Consequently, the set of trigonometric polynomials is dense, as is the set of linear combinations of radial basis functions—ellipsoidal Gaussians of the form $\exp\{-Q(x - x_0)\}$, where $Q$ is a positive definite quadratic form, etc.

Neural networks can be considered as universal approximators [63–68]. This conclusion follows from Theorem 1 for RBF-networks and from the following generalized Stone approximation theorem for networks of another kind—multilayer perceptrons (see Section 2.2). Let $E \subset C(X)$ be a closed linear subspace of the space of continuous functions on a compact set $X$, let $C(\mathbb{R})$ be the linear space of continuous functions on the real axis $\mathbb{R}$, and let $\varphi \in C(\mathbb{R})$ be some nonlinear function such that $\varphi \circ g \in E$ for any $g \in E$. In this case, we say that the subspace $E$ is closed with respect to the nonlinear unary operation $\varphi$. The generalized Stone theorem is formulated as follows.

Theorem 2. Let $E \subset C(X)$ be a closed linear subspace in $C(X)$, $1 \in E$, let the functions from the set $E$ separate points in $X$, and let $E$ be closed with respect to a nonlinear unary operation $\varphi \in C(\mathbb{R})$; then $E = C(X)$.

Stone's theorem is interpreted as a statement about the universal approximation capabilities of an arbitrary nonlinearity: using linear operations and cascade connection with an arbitrary nonlinear element (these are exactly the operations carried out in neurocomputing—superpositions of simple functions of one variable and their linear combinations), it is possible to calculate any continuous function with any predetermined accuracy. Several theorems on the approximation of continuous multivariable functions by neural networks were proved using an almost arbitrary continuous function of one variable (see the bibliography in [63, 64]).


To date, many results have been obtained on approximations based on neural networks in various functional spaces. This also includes questions of approximation by nonlinear finite-dimensional manifolds—Brudny, Temlyakov, and others [58–62]. Very important for the motivation of the chosen approach is the stability of the neural network model with respect to errors in the data—inaccuracies in specifying the coefficients of the equations and of the boundary and initial conditions, boundary disturbances, and calculation errors. The neural network approach in the proposed form depends only weakly on the shape of the domain and can be applied to problems with a complex geometry of the region; it allows one to take into account discontinuities and a change of the type of equation in subregions, and to treat nonlinearities [49–52]. Another principal feature of the neural network approach is the parallelization of the problem and the possibility of using a set of networks, which is essential when building models of systems with piecewise specified parameters. Also, this approach allows the use of well-developed neural network techniques for searching for the optimal structure using clustering, genetic algorithms (for example, procedures of the GMDH multi-row algorithm), a team of expert networks [69], and others. For more on these algorithms, see Chapter 2.

We do not propose to abandon the use of classical approaches completely. If these approaches work successfully, that is great. Apparently, the most effective algorithms will be obtained by a combination of classical and neural network approaches. In a number of non-classical problem statements for modeling systems with distributed parameters, nonstandard equations are considered and there are no classical ways of specifying initial-boundary conditions; instead, for example, point data are known, as a rule only approximately (see paragraphs 1.4 and 4.4), which leads to the need to build a series of refined models. The neural network approach is also effective in this case; see examples of such problems in paragraph 4.4.

The development of intelligent computing and, in particular, neurocomputing is, in essence, nothing more than an attempt to answer the challenge posed by the limited capabilities of modern computing technology based on the von Neumann architecture. Separate issues are the parallelizability and scalability of the considered algorithms, which are a necessary prerequisite for their effective implementation on a supercomputer and in a distributed mode (using the cloud). Often, problems of parallelization of the computational process are solved at the programming stage, although the development and implementation of inherently parallelizable methods and algorithms seem to be more efficient. The undoubted advantage of the neural network approach is the multiscale parallelism of the resulting algorithms. At the lowest level, the application of the learning algorithm requires calculations of the values of the neural network function; as a rule (if gradient optimization algorithms are used), it is also required to calculate the derivatives of the neural network function with respect to its weights, and the use of higher-order algorithms may require calculations of the corresponding higher derivatives. For neural network functions, these operations are parallelized in a natural way, which makes them scale well on multi-core computing systems. The possibility of adapting the network structure to the configuration of the computing system should be particularly emphasized. At the next level, methods of parallelization of optimization algorithms are used; this question has been studied quite well, so we will not dwell on it. At the third level, there are evolutionary algorithms that allow combining the selection of the weights of a network with the adaptation of its structure. Several such algorithms are discussed in the book. They, as a rule, not only parallelize well but can also be implemented quite effectively in distributed systems.

The currently used neural network methods have significant drawbacks, such as the long training procedure and the single-layer nature of the neural networks built by methods similar to FEM. We have developed an approach that allows us to quickly generate a fairly accurate multilayer neural network solution of a differential equation. In contrast to the previous approaches, the method is based not on FEM but on the finite difference method. The strength of the method is the automatic inclusion of the parameters of the problem in the final formula, which allows us to do without repeated solutions when it is necessary to investigate the effect of the parameters on the result. This way of including parameters is especially important for building an individual model of a specific object, taking into account its unique features; we discuss this method in Chapter 5.

The presentation is based on illustrating the work of the individual stages of the general methodology (unified approach) on the example of a large number of specific problems—both model problems and problems of practical interest. Chapter 1 is devoted to the problem statements and a detailed consideration of the first stage of the unified process mentioned above for solving complex problems of mathematical modeling.


On the example of numerous and various problems for differential equations (ordinary and partial), it is shown how to formalize the requirements for the quality of their solution using a functional or a set of functionals. This chapter will be useful not only in the study of neural network modeling, but also in other areas of mathematical modeling, as the above functionals are determined by the formulation of the problem and not by the class of functions in which the solution is sought. In the first paragraph, we consider several different problems for ordinary differential equations—the Cauchy problem for a stiff first-order differential equation, the problem of modeling the processes in a chemical reactor and in a porous catalyst granule, and a differential-algebraic problem. A characteristic feature of these problems is the presence of parameters in their formulation, on which the approximate solution that we want to build should depend. Despite the differences between the problems, we evaluate the quality of their solutions in the form of functionals in a similar way, which allows the reader to perform the same operation for their own problems with ease.

In the second paragraph, the estimation of the quality of the solution of problems for partial differential equations in domains with constant boundaries is formulated in the form of functionals. Boundary value problems are considered for standard equations, such as the Laplace equation on the plane and in space and the Poisson equation. The properties of solutions of elliptic boundary value problems are very similar to the properties of the solution of the Laplace equation, for which there is an explicit integral representation, so it is possible to test the results. In addition, we have given functionals for problems interesting from the point of view of applications—the Schrödinger equation with a piecewise potential (quantum dot), the nonlinear Schrödinger equation, and the plane and spatial heat transfer problems in the “vessels–tissues” system: venous and arterial vessels are surrounded by muscle tissue in which heat is released. We assume that the heat transfer in the vessels is carried out mainly by convection, while in the tissues it is carried out by conduction. A boundary value problem arises that is associated with a change of the type of the equations and of the boundary conditions: the temperature field T in the tissue satisfies the Poisson equation (elliptic type), in the vessels—the heat transfer equation (parabolic type); on the part of the boundary including the inlets of the vessels there is a Dirichlet condition, on the part of the boundary corresponding to the tissue—a homogeneous Neumann condition, and on the joint sections there are matching conditions in the form of continuity of the field and of its normal derivative to the section (the heat conductivity coefficients of blood and tissues are almost equal).


section (heat conductivity coefficients of blood and tissues are almost equal). The third paragraph deals with two problems for partial differential equations for regions with variable boundaries. The first problem with phase transitions is examined on the example of the one-dimensional Stefan problem, the solution to which is known and can be used to control the proposed neural network approach. The following natural approaches to the Stefan problem from the point of view of our methodology are considered: 1. The approximation of the temperature fields for both phases using a single model. 2. The construction of a heterogeneous network, which includes models describing the temperature regimes for each of the phases along with a model that specifies the front of the phase transition. The second problem differs from the first one in that the variable boundary in it is determined not by the natural physical conditions, but by the optimality condition chosen by us. The model verification unit of alternating pressure is considered, the measuring working cavity of which is symmetrical both with respect to the axis of rotation and to the plane which is perpendicular to this axis. The cavity is filled with a viscous liquid. On the cylindrical part of the cavity boundary, there is a piezoelectric source of harmonic oscillations. It puts variable pressure on the present constant pressure. We believe that the acoustic wave field in the measuring chamber is time-harmonic, axisymmetric, and even with respect to the plane of symmetry. On the axis of symmetry, there are two pressure sensors—standard and testable. It is necessary to choose the shape of the part of the border containing the sensor so that the pressure on it is maximum. In the fourth paragraph, we have considered several inverse and ill-posed problems, which are especially relevant from the point of view of practical applications. Many of the applied problems lead to the necessity for building an approximate solution to the differential equation (or set of equations), highlighting this solution among others not by the initial boundary conditions, as is customary in the classical formulations of problems of mathematical physics, and, for example, by a certain set of experimental data. Note that in such an unconventional statement, the problems become incorrect and, generally speaking, may not have a solution. The neural network approach that we propose allows for designing approximate solutions in such non-standard situations. As an example of a non-classical formulation, we study the problem of finding a function for which the equation is known


The proposed method allows working not only with equations of elliptic type. On the example of the heat equation for a string, it is applied to evolutionary equations—the problem of continuation of unsteady fields according to point observations. In addition, the inverse problem of modeling migration flows and the problem of air pollution in a tunnel, which are of practical interest, were considered. Once again, we draw attention to the fact that we construct the functionals for this variety of problems using a single methodology.

In Chapter 2, we describe the main types of neural networks that were useful in our work on mathematical modeling. This chapter is not a complete review of the well-known neural network architectures; for a detailed review, we refer the reader to [63, 64]. Nonetheless, we are certain that the neural network architectures considered in Chapters 2 and 5 are sufficient for most problems of mathematical modeling. In the first paragraph of the second chapter, we consider one of the most commonly used types of neural networks—the perceptron with an arbitrary number of hidden layers. Simple recurrent relations are given that allow calculating the network output and the derivatives of the network output with respect to the weights, which are required in order to apply gradient learning methods. An analysis of the approaches to determining the initial values of the weights of the multilayer perceptron is conducted. In the second paragraph of this chapter, neural networks with radial basis functions (RBF-networks) are considered. We present the most commonly used activation functions and the formulas for finding the derivatives of the output of the RBF-network with respect to the weights, which are required for the application of gradient learning methods. As an example, we give the expression of the functional for the Dirichlet problem for the Laplace equation on the unit disk. Also in this paragraph we consider asymmetric RBF-networks, useful for solving anisotropic problems. In the third paragraph of this chapter, we consider the multilayer perceptron and RBF-networks with time delays. This particular type of neural network is useful in tasks that involve time. The first form of time-delay RBF-networks allows simulating situations where a certain event occurs whose action weakens with time. The second form is designed to simulate the dynamics of systems with several interacting centers.


In Chapter 3, methods for parameter adjustment and structure selection for neural networks, oriented toward mathematical modeling tasks, are considered. Recall that in the first chapter we reduce a wide range of mathematical modeling problems to the problem of optimizing the corresponding functional or set of functionals. Fixing the form of the desired solution as one of the neural networks from Chapter 2 reduces this task to a problem of global nonlinear optimization. However, our experience has shown that in most problems the methods that combine the adjustment of the weights with the selection of the network structure turn out to be the most effective. The third chapter is dedicated to the consideration of such methods. It is worth recalling that we assume not the optimization of a fixed functional, but rather of a sequence of functionals obtained by the regeneration of test points. This feature of the algorithms eliminates the advantages of second-order algorithms, such as the Newton method. In this chapter, we focus our attention on the algorithms that turned out to be the most useful in this situation.

In the first paragraph of the third chapter, the most useful algorithms are given that allow selecting the structure of neural networks during the learning process. In addition to general algorithms, algorithms for individual problems are also considered. In particular, three structural algorithms are given for solving the Dirichlet problem for the two-dimensional Laplace equation in the L-domain, which is a union of two rectangles Π1 and Π2.
Approach A. In this approach, the ideology of GMDH—the Group Method of Data Handling—is used. Several variants of the multi-row algorithm for selecting the best functions are built.
Approach B. Modifications of genetic algorithms for constructing a neural network are proposed, using the learning of two network ensembles. Genetic operations (mutations, translocations, crossover) are specified in terms of neural network characteristics, and not in terms of binary codes.
Approach C. A team of expert networks is trained—the resulting group of networks gives a local representation of the solution of the problem in the whole domain, i.e., every network gives a solution in its own subdomain.
The procedure of domain decomposition, on which the above-mentioned approaches are based, can also be carried out in the case of domains of more complex shape, when the domain is divided into a greater number of components. The implementation of the approaches in this case is described. A comparative analysis of the results of the calculations has shown that the use of these evolutionary approaches leads to a significant reduction (from 4 to 10 times) in the number of neurons required to achieve a given accuracy.


In the second paragraph of the third chapter, many well-known and new algorithms are considered that make it possible to approach the global extremum. These algorithms were tested in neural network learning tasks and have proven to be quite effective when the number of selected variables (network weights) ranges from hundreds to several thousand. In the third paragraph of the third chapter, general methods for constructing approximate neural network models using heterogeneous information (differential equations and data) are given. The proposed algorithms, which replace the traditional two-stage model building method, consider a hierarchy of models, both differential and functional, including all the available initial information, allowing the evolution of models at any level, and capable of including newly received information into the consideration. Along this way, it is also possible to construct regularizations of the solutions of ill-posed and non-classical problems. In the fourth paragraph of the third chapter, we discuss possible modifications of the algorithms considered in the previous paragraphs for the situation when the simulated object is changing. In accordance with the changes in the original object, the model should also change. The modifications of the algorithms depend on the frequency of the arrival of new data and on the rate of the changes in the simulated object.

Chapter 4 contains some of the results of the simulation experiments on the application of our methodology to the problems whose formulation is given in the first chapter. The purpose of this chapter is to allow readers to compare their results for analogous tasks and to conclude whether the quality of those results is in line with what should be expected.

In Chapter 5, with the use of our modification of well-known formulas of numerical methods, approximate analytical solutions are built for differential equations. The usual accuracy estimates for the original classical methods show that arbitrarily accurate solutions can be obtained in this way. In this case, it is possible to avoid the use of a very resource-intensive learning procedure. We consider this approach as a new version of deep learning. In paragraph 1 of Chapter 5, we give a general formulation of the proposed procedure for constructing multi-layer models. In paragraph 2 of Chapter 5, examples of the construction of the proposed multi-layer models for ordinary differential equations are considered. Paragraph 3 of Chapter 5 is devoted to the application of these methods to partial differential equations.


Paragraph 4 of Chapter 5 presents the results of solving three problems of modeling real objects using differential equations and additional data. As a result of the application of our methods, approximate models are obtained that correspond better to real measurements than the exact solutions of the original differential equations. In conclusion, the main results are presented, and promising areas of research are outlined.

References [1] R.L. Hardy, Theory and applications of the multiquadric-biharmonic method, Comput. Math. Appl. 19 (8/9) (1990) 163–208. [2] E. Kansa, Multiquadrics—a scattered data approximation scheme with applications to computational fluid dynamics I: surface approximations and partial derivative estimates, Comput. Math. Appl. 19 (8/9) (1990) 127–145. [3] E. Kansa, Multiquadrics—a scattered data approximation scheme with applications to computational fluid dynamics II: solutions to parabolic, hyperbolic and elliptic partial differential equations, Comput. Math. Appl. 19 (8/9) (1990) 147–161. [4] E.J. Kansa, Motivation for Using Radial Basis Functions to Solve PDEs, Lawrence Livermore National Laboratory and Embry-Riddle Aeronatical University, 1999, http://www.rbf-pde.uah.edu/kansaweb.ps. [5] M. Sharan, E.J. Kansa, S. Gupta, Application of the multiquadric method to the numerical solution of elliptic partial differential equations, Appl. Math. Comput. 84 (1997) 275–302. [6] E. Galperin, Z. Pan, Q. Zheng, Application of global optimization to implicit solution of partial differential equations, Comput. Math. Appl. 25 (10/11) (1993) 119–124. [7] E. Galperin, Q. Zheng, Solution and control of PDE via global optimization methods, Comput. Math. Appl. 25 (10/11) (1993) 103–118. [8] E.A. Galperin, E.J. Kansa, Application of global optimization and radial basis functions to numerical solutions of weakly singular Volterra integral equations, Comput. Math. Appl. 43 (2002) 491–499. [9] M.W.M.G. Dissanayake, N. Phan-Thien, Neural-network-based approximations for solving partial differential equations, Commun. Numer. Methods Eng. 10 (3) (1994) 195–201. [10] V.I. Gorbachenko, Neurocomputers in Solving Boundary Value Problems of the Field Theory, Radiotekhnika, Moscow, 2000, 336 p. (in Russian). [11] V.I. Gorbachenko, Methods of solving partial differential equations on cellular neural networks, in: “Neurocomputers”: Development, Application, No. 3–4, 1998, pp. 5–14 (in Russian). [12] V.I. Gorbachenko, S.N. Katkov, Neural network methods for solving problems of thermoelasticity, in: “Neurocomputers”: Development, Application, No. 3, 2001, pp. 31–37 (in Russian). [13] V.I. Gorbachenko, S.A. Moskvitin, Neural network approach to solving the coefficient inverse problem of mathematical physics, in: Proceedings of the VII All-Russian Scientific and Technical Conference “Neuroinformatics2005”, Part 1, MEPhI, Moscow, 2005, pp. 60–68 (in Russian). [14] V.I. Gorbachenko, E.V. Yanichkina, Solution of partial differential equations using radial basis neural networks, in: Proceedings of the VI International

Scientific and Technical Conference “Computer Modeling 2005”, St. Petersburg State Polytechnical University, St. Petersburg, SPb., 2005, pp. 101–105 (in Russian).
[15] V.I. Gorbachenko, E.V. Yanichkina, Solution of elliptic partial differential equations using radial basis neural networks, in: Proceedings of the VIII All-Russian Scientific and Technical Conference “Neuroinformatics-2006”, Part 3, MEPhI, Moscow, 2006, pp. 15–21 (in Russian).
[16] Neuromathematics, Book 6, Under General Editorship by A. I. Galushkin, IPRZHR, Moscow, 2002, 448 p. (in Russian).
[17] A.A. Samarskii, P.N. Vabischevich, Numerical Methods for Solving Inverse Problems of Mathematical Physics, Editorial URSS, Moscow, 2004, 480 p. (in Russian).
[18] A.A. Samarskii, P.N. Vabishchevich, Numerical Methods for Solving Inverse Problems of Mathematical Physics, Inverse and Ill-Posed Problems Series, Walter de Gruyter, Berlin, New York, 2007, 438 p.
[19] N. Mai-Duy, T. Tran-Cong, Numerical solution of differential equations using multiquadric radial basis function networks, Neural Netw. 14 (2001) 185–199.
[20] S.A. Terekhoff, N.N. Fedorova, Cascade neural networks in variational methods for boundary value problems, in: IJCNN’99. International Joint Conference on Neural Networks. Proceedings (Cat. No. 99CH36339), Washington, DC, USA, vol. 3, 1999, pp. 1507–1510.
[21] I.E. Lagaris, A. Likas, D.I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE Trans. Neural Netw. 9 (5) (1998) 987–1000.
[22] K. Valasoulis, D.I. Fotadis, I.E. Lagaris, A. Likas, Solving differential equations with neural networks: implementation on a DSP platform, in: 14th International Conference on Digital Signal Processing, vol. 2, 2002, pp. 1265–1268.
[23] I.E. Lagaris, A. Likas, D.I. Fotiadis, Artificial neural network methods in quantum mechanics, Comput. Phys. Commun. 104 (1–3) (1997) 1–14.
[24] I.E. Lagaris, A.C. Likas, D.G. Papageorgiou, Neural-network methods for boundary value problems with irregular boundaries, IEEE Trans. Neural Netw. 11 (5) (2000) 1041–1049.
[25] I.G. Tsoulos, I.E. Lagaris, Solving differential equations with genetic programming, Genet. Program. Evolvable Mach. 7 (2006) 33–54.
[26] K.E. Burnayev, N.I. Korsunov, Technique of solving partial differential equations based on the use of cellular neural networks, in: Proceedings of the VIII All-Russian Scientific and Technical Conference “Neuroinformatics-2006”, Part 3, MEPhI, Moscow, 2006, pp. 67–75 (in Russian).
[27] S.V. Belikov, Application of neural automata in problems of mathematical physics, in: Proceedings of the VI International Conference on Nonequilibrium Processes in Nozzles and Jets—NPNJ-2006, University Book, St. Petersburg, Moscow, 2006, pp. 64–66 (in Russian).
[28] R. Masuoka, Neural networks learning differential data, IEICE Trans. Inf. Syst. E83-D (8) (2000) 1291–1299.
[29] V.I. Avrutskiy, Neural networks catching up with finite differences in solving partial differential equations in higher dimensions, IEEE Trans. Neural Netw. Learn. Syst., 2017, https://arxiv.org/pdf/1712.05067.pdf.
[30] T. Hagge, P. Stinis, E. Yeung, A.M. Tartakovsky, Solving differential equations with unknown constitutive relations as recurrent neural networks, 2017, https://arxiv.org/pdf/1710.02242.pdf.
[31] N. Mai-Duy, Solving high order ordinary differential equations with radial basis function networks, Int. J. Numer. Methods Eng. 62 (6) (2005) 824–852.
[32] F. Bernal, G. Gutiérrez, Solving Delay Differential Equations Through RBF Collocation, Universidad Carlos III, Universidad Pontificia Bolivariana, Medellín, Madrid, 2017, https://arxiv.org/pdf/1701.00244v1.pdf.


[33] S. Yensiri, R.J. Skulkhu, An investigation of radial basis function-finite difference (RBF-FD) method for numerical solution of elliptic partial differential equations, Mathematics 5 (4) (2017) 54, https://www.mdpi.com/2227-7390/5/4/54.
[34] K. Parand, M. Hemami, Numerical Study of Astrophysics Equations by Meshless Collocation Method Based on Compactly Supported Radial Basis Function, 2016, https://arxiv.org/pdf/1509.04326.pdf.
[35] K. Parand, S. Hashemi, RBF-DQ method for solving non-linear differential equations of Lane-Emden type, Ain Shams Eng. J. 9 (4) (2018) 615–629.
[36] S. Mall, S. Chakraverty, Single layer Chebyshev neural network model for solving elliptic partial differential equations, Neural Process. Lett. 45 (3) (2017) 825–840.
[37] M. Mongillo, Choosing Basis Functions and Shape Parameters for Radial Basis Function Methods, 2011, https://www.siam.org/Portals/0/Publications/SIURO/Vol4/Choosing_Basis_Functions_and_Shape_Parameters.pdf?ver=2018-04-06-103239-587.
[38] S.E. Huber, M.R. Trummer, Radial basis functions for solving differential equations, Comput. Math. Appl. 71 (1) (2016) 319–327.
[39] M. Kumar, N. Yadav, Multilayer perceptrons and radial basis function neural network methods for the solution of differential equations: a survey, Comput. Math. Appl. 62 (2011) 3796–3811.
[40] N. Yadav, A. Yadav, M. Kumar, An Introduction to Neural Network Methods for Differential Equations, SpringerBriefs in Applied Sciences and Technology Computational Intelligence, Springer, 2015, 114 p.
[41] X. Kailai, S. Bella, Y. Shuyi, Deep Learning for Partial Differential Equations (PDEs), CS230: Deep Learning, Winter, Stanford University, CA, 2018.
[42] J. Berg, K. Nyström, A unified deep artificial neural network approach to partial differential equations in complex geometries, Neurocomputing 317 (2018) 28–41.
[43] P. Chaudhari, A. Oberman, S. Osher, S. Soatto, G. Carlier, Deep relaxation: partial differential equations for optimizing deep neural networks, 2017, arXiv:1704.04932v1 [cs.LG].
[44] J.-T. Hsieh, S. Zhao, S. Eismann, L. Mirabella, S. Stefano, Learning neural PDE solvers with convergence guarantees, in: Published as a Conference Paper at ICLR, 2019.
[45] J. Sirignano, K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, 2018, arXiv:1708.07469v5 [q-fin.MF].
[46] M. Raissi, Deep hidden physics models: deep learning of nonlinear partial differential equations, J. Mach. Learn. Res. 19 (2018) 1–24.
[47] M. Raissi, P. Perdikaris, G.E. Karniadakis, Physics informed deep learning (part I): data-driven solutions of nonlinear partial differential equations, 2017, arXiv:1711.10561v1 [cs.AI].
[48] M. Raissi, P. Perdikaris, G.E. Karniadakis, Physics informed deep learning (part II): data-driven discovery of nonlinear partial differential equations, 2017, arXiv:1711.10566v1 [cs.AI].
[49] A.N. Vasilyev, D.A. Tarkhov, Application of neural networks to non-classical problems of mathematical physics, in: Proceedings of the International Conference on Soft Computing and Measurements—SCM’2003, vol. 1, 2003, pp. 337–340 (in Russian).
[50] A. Vasilyev, D. Tarkhov, G. Guschin, Neural networks method in pressure gauge modeling, in: Proceedings of the 10th IMEKO TC7 International Symposium on Advances of Measurement Science, Saint-Petersburg, Russia, vol. 2, 2004, pp. 275–279.


[51] D.A. Tarkhov, A.N. Vasilyev, New neural network technique to the numerical solution of mathematical physics problems. I: simple problems, Opt. Memory Neural Netw. (Inform. Optics) 14 (1) (2005) 59–72. [52] D.A. Tarkhov, A.N. Vasilyev, New neural network technique to the numerical solution of mathematical physics problems. II: complicated and nonstandard problems, Opt. Memory Neural Netw. (Inform. Optics) 14 (2) (2005) 97–122. [53] A.N. Kolmogorov, On the representation of continuous functions of several variables by superpositions of continuous functions of fewer variables, Dokl. Akad. Nauk SSSR 108 (2) (1956) 179–182 (in Russian). [54] A.N. Kolmogrov, On the representation of continuous functions of several variables by superpositions of continuous functions of one variable and addition, Am. Math. Soc. Transl. 28 (1963) 55–59. [55] V.I. Arnol’d, On the representation of continuous functions of three variables by superpositions of continuous functions of two variables, Math. Collect. (Sbornik) 48 (1) (1959) 3–74 (in Russian). [56] V.I. Arnol’d, Representation of continuous functions of three variables by the superposition of continuous functions of two variables, Amer. Math. Soc. Transl. 28 (2) (1963) 61–147. [57] A.N. Gorban’, Approximation of continuous functions of several variables by an arbitrary nonlinear continuous function of one variable, linear functions, and their superpositions, Appl. Math. Lett. 11 (3) (1998) 45–49. [58] P. Petrushev, V. Popov, Rational approximations of real functions, in: Encyclopedia Math., Appl., vol. 28, Cambridge University Press, Cambridge, 1987. [59] V.N. Temlyakov, Nonlinear Methods of Approximation, IMI-Preprint Ser, University of South Caroline, 2001, pp. 1–57. [60] Y. Brudny, Adaptive approximation of functions with singularities, Proc. Moscow Math. Soc., 55 (1994) 149–242 (in Russian). [61] Y. Brudny, Nonlinear N-term approximation by refinable functions, St. Petersburg Math. J. 16 (1) (2005) 143–179. [62] Y. Brudny, N. Krugljak, Interpolation functors and interpolation spaces, Vol. 1, North-Holland Math. Library, vol. 47, North-Holland, Amsterdam, 1991. [63] S. Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice Hall, 1998, 842 p. [64] S. Haykin, Neural Networks and Learning Machines, third ed., Prentice Hall, 2009, 906 p. [65] G. Cybenko, Approximation by superposition of a sigmoidal function, Math. Control Signals Syst. 2 (1989) 303–314. [66] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Netw. 2 (1989) 359–366. [67] K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Netw. 4 (1991) 251–257. [68] F. Scarselli, A.C. Tsoi, Universal approximation using feedforward neural networks: a survey of some existing methods, and some new results, Neural Netw. 11 (1) (1998) 15–37. [69] L.A. Rastrigin, R.H. Erenstein, Method of Collective Recognition, Energoizdat, Moscow, 1981, 80 p. (in Russian).


Chapter 1. Examples of problem statements and functionals

Before looking for an approximate solution of a certain problem, it is necessary to introduce a characteristic of its quality. In other words, it is necessary to characterize by a number (or a set of numbers) how well a certain function (the approximate solution found by us) corresponds to the conditions of the problem. Let us explain this with the example of the simplest boundary value problem

$$A(u) = g, \quad u = u(x), \quad x \in \Omega \subset \mathbb{R}^p, \qquad B(u)|_{\Gamma} = f, \qquad (1.1)$$

here $A(u)$ is a differential operator, i.e., an algebraic expression containing derivatives of an unknown function $u$, $B(u)$ is an operator allowing one to set boundary conditions, and $\Gamma$ is the boundary of the region $\Omega$. Since it is possible to find the exact solution of problem (1.1) only in individual particular cases, we must search for its approximate solution. As a quality characteristic of such an approximate solution $u$, we will use the so-called error functional $J$. For problem (1.1) it is natural to choose $J = J_1 + \delta \cdot J_2$, where the term $J_1$ expresses the degree to which the equation is satisfied and the term $J_2$ expresses an estimate of the satisfaction of the boundary condition. If necessary, we add a term $\tilde{\delta} \cdot J_3$ to the functional $J$. The term $J_3$ can contain addends corresponding to other conditions to which the desired solution of the problem must correspond: symmetry, boundary conditions, relations at the interfaces, equations of state, experimental data, etc. In this way,

$$J(w) = J_1 + \delta \cdot J_2 + \tilde{\delta} \cdot J_3. \qquad (1.2)$$

Here, as before, penalty factors $\delta > 0$, $\tilde{\delta} > 0$ are used. It seems natural to choose $J_1 = \int_{\Omega} (A(u) - g)^2 \, d\Omega$, $J_2 = \int_{\Gamma} (B(u) - f)^2 \, d\Gamma$, but it is possible to calculate the corresponding integrals analytically only in exceptional cases, so it is more convenient to use a discrete form for the functionals, for example,


$$J_1 = \sum_{j=1}^{M} \left( A(u(\xi_j)) - g(\xi_j) \right)^2, \quad J_2 = \sum_{k=1}^{K} \left( B(u(\xi'_k)) - f(\xi'_k) \right)^2. \qquad (1.3)$$

Functionals are computed on sets of test points: $\Xi = \{\xi_j\}_1^M$ in the domain $\Omega$ and $\Xi' = \{\xi'_k\}_1^K$ on the border $\Gamma$ of the domain. We note that instead of the functionals (1.3) we can use the functionals

$$J_1 = \sum_{j=1}^{M} \left| A(u(\xi_j)) - g(\xi_j) \right|, \quad J_2 = \sum_{k=1}^{K} \left| B(u(\xi'_k)) - f(\xi'_k) \right| \qquad (1.4)$$

or

$$J_1 = \max_{1 \le j \le M} \left| A(u(\xi_j)) - g(\xi_j) \right|, \quad J_2 = \max_{1 \le k \le K} \left| B(u(\xi'_k)) - f(\xi'_k) \right|. \qquad (1.5)$$
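As a small numerical illustration of the three choices (1.3)–(1.5) (our own sketch; the residual values are assumed), the following Python snippet evaluates them on the same vector of equation residuals:

```python
import numpy as np

# Residuals A(u)(xi_j) - g(xi_j) at the test points (assumed values for illustration).
residual = np.array([0.02, -0.10, 0.03, 0.50, -0.01])

j1_sq  = np.sum(residual**2)        # form (1.3): sum of squares
j1_abs = np.sum(np.abs(residual))   # form (1.4): sum of absolute values
j1_max = np.max(np.abs(residual))   # form (1.5): maximum absolute value

print(j1_sq, j1_abs, j1_max)        # the boundary term J2 is treated in the same way
```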

Moreover, formula (1.4) works better than formula (1.3) in the situation when condition (1.1) can be violated on individual subsets of small measure. Formula (1.5) is better suited to the situation when it is required to ensure uniform satisfaction of condition (1.1) in the whole region under consideration. Another interpretation of the choice of functionals (for the statistical interpretation, see [1]) is connected with the fact that the choice (1.3) corresponds to normal errors, the choice (1.4)—to errors that obey the Laplace distribution (changing conditions), and the choice (1.5)—to a uniform distribution of errors.

Let us discuss the approaches to the choice of the parameters $\delta$ and $\tilde{\delta}$. It is easiest to fix them in advance, starting from an a priori preset significance of the satisfaction of the equation, the boundary condition, and the additional conditions; however, there is too much arbitrariness in this approach. More appropriate is to choose, before starting the training, such $\delta$ and $\tilde{\delta}$ that all three terms in the sum (1.2) have the same value or correlate in a predetermined proportion. In the learning process, it makes sense to recalculate these parameters in the same way, and this recalculation should be performed less often than the regeneration of the test points. It must be emphasized that we are not interested in the function at which the functional $J$ reaches its minimum; we consider a function $u_\eta$ for which the inequality $J(u_\eta) < \eta$ holds. Such a function is called an $\eta$-solution of the problem (1.1). This point of view is one of the differences between our approach to mathematical modeling and the traditional approach. The reason is that conditions (1.1) describe the real object approximately, so the number $\eta > 0$ is chosen so that the constructed model can be assumed to be sufficiently accurate.
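A possible implementation of the rebalancing of δ and δ̃ described above is sketched below (our own Python illustration; the function name, the tolerance, and the sample values of the terms are assumptions):

```python
def rebalance(j1, j2, j3, eps=1e-12):
    """Choose penalty factors so that the three terms of J = J1 + delta*J2 + delta_tilde*J3
    currently contribute comparable amounts (one of the strategies described above)."""
    delta = j1 / (j2 + eps)
    delta_tilde = j1 / (j3 + eps)
    return delta, delta_tilde

# Current values of the terms J1, J2, J3 (assumed for the example).
j1, j2, j3 = 4.0e-2, 8.0e-4, 2.0e-1
delta, delta_tilde = rebalance(j1, j2, j3)
print(delta, delta_tilde)   # recalculated occasionally, less often than the test points are regenerated
```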


Computational experiments have shown that the use of a fixed set of test points in the expressions (1.3) is inexpedient, since in this case the smallness of the functional $J$ at the trial points can be accompanied by large errors at other points of the region $\Omega$ or of the boundary $\Gamma$. The solution to this problem was the use of periodically regenerated sets of test points $\{\xi_j\}$ in the domain $\Omega$ and, if necessary, regenerated sets of test points $\{\xi'_k\}$ on its boundary $\Gamma$. The regeneration of test points after a certain number of steps of the iterative selection of $u_\eta$ makes the process more stable. In fact, there is a consistent minimization of a set of functionals, each of which is obtained by a specific choice of test points and is not fully minimized (between the regenerations of the test set, we perform only a few steps of the selected minimization method). This approach, in particular, makes it possible to circumvent the problem of falling into a local extremum, which is characteristic of most methods of nonlinear optimization. The test points can be chosen inside the domain $\Omega$ and on the boundary regularly, for example, uniformly in the case of a bounded domain or according to the normal law if the domain is unbounded. In some cases, it is advisable to use an uneven distribution law for the test points—thickening them near singularities (boundaries of discontinuity, corners, etc.) or in areas with large errors. It is also possible to regenerate only subsets of the set of test points, for example, those corresponding to a certain fraction of the smallest errors. Corresponding examples are given in Chapter 4, where the results of computational experiments with specific problems are considered. Note that it is possible to choose an error functional of a different type, for example, some energy functional for those statements of problems for which it can be constructed. Thus, for $A = \Delta$ (the Laplace operator), one can take $J_1 = \int_{\Omega} (\nabla u)^2 \, d\Omega$. The use of such a functional reduces the requirements on the smoothness of the desired function but does not allow us to estimate the accuracy of the fulfillment of condition (1.1). The choice of the sets of functions among which an approximate solution of problem (1.1) will be sought is analyzed in Chapter 2, where we discuss the corresponding class—artificial neural networks (ANN)—and consider the remarkable properties inherent in these functions. Often the formulation of the problem includes parameters, for example, density, thermal conductivity, elasticity, etc., which for a real environment may not be known exactly.


The size of the object, the ambient temperature, and other parameters can vary in different tasks. The problem of constructing parametric models arises when it is required to investigate the behavior of a solution as a function of a certain parameter, to identify the value of a parameter from measurement data, or when the characteristics defining a simulated system are given by values distributed in certain intervals, by interval parameters. The classical approach assumes in a similar situation a numerical solution of the problem for a sufficiently representative set of parameters. The approach suggested by us allows us to look for a solution in the form of a neural network function, for which these parameters are among the arguments. The result is an approximate solution that is convenient for further work—researching the behavior of the model, plotting the graphs, optimizing the parameters of the object, constructing the control system for it, etc. Let us explain the essence of this approach with the boundaryvalue problem (1.1). The modification of the problem consists in the fact that its formulation includes parameters r ¼ (r1, …, rk), + which vary on certain intervals ri 2 (r i ; ri ), i ¼ 1, …, k. Thus, condition (1.1) takes the form  Aðu, rÞ ¼ g ðrÞ, u ¼ uðx, rÞ, x 2 ΩðrÞ  Rp , Bðu, rÞΓðrÞ ¼ f ðrÞ: (1.6) The terms of the error functional (1.3) are replaced by the following expressions J1 ¼

$$J_1 = \sum_{j=1}^{M} \Bigl(A\bigl(u(x_j, r_j)\bigr) - g(x_j, r_j)\Bigr)^2, \qquad J_2 = \sum_{j=1}^{M_0} \Bigl(B\bigl(u(x'_j, r'_j)\bigr) - f(x'_j, r'_j)\Bigr)^2. \tag{1.7}$$

The functionals (1.4) or (1.5) are modified similarly. In representation (1.7), $\{x_j, r_j\}_{j=1}^{M}$ are periodically regenerated test points in the domain $\Omega(r_j) \times \prod_{i=1}^{k}(r_i^-; r_i^+)$, and $\{x'_j, r'_j\}_{j=1}^{M_0}$ are test points on its boundary $\Gamma(r'_j)$.
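As an illustration of how such periodically regenerated test sets can be produced in practice, the following minimal Python sketch draws pairs $(x_j, r_j)$ uniformly from a box-shaped domain and from the parameter intervals; the function and variable names are assumptions for this example, not part of the original text. In training, the sampling routine is simply called again every few optimization steps so that a different member of the family of functionals is minimized on each stretch of the iteration.

```python
import numpy as np

def sample_test_set(m, x_bounds, r_bounds, rng):
    """Return m pairs (x_j, r_j) drawn uniformly from Omega x Prod(r_i^-, r_i^+)."""
    x_lo, x_hi = map(np.asarray, x_bounds)
    r_lo, r_hi = map(np.asarray, r_bounds)
    x = x_lo + (x_hi - x_lo) * rng.random((m, x_lo.size))
    r = r_lo + (r_hi - r_lo) * rng.random((m, r_lo.size))
    return x, r

rng = np.random.default_rng(0)
# e.g., a two-dimensional domain (0, 1)^2 and one interval parameter r in (0.5, 2.0)
x_pts, r_pts = sample_test_set(50, ([0.0, 0.0], [1.0, 1.0]), ([0.5], [2.0]), rng)
```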

1.1 Problems for ordinary differential equations

It is more convenient to start the demonstration of the possibility of describing the quality of the solution of problem (1.1) and its generalizations with the help of functionals of the form (1.2) in the case of ordinary differential equations (ODE). Thus, an approximate solution of the classical Cauchy problem
$$y'(x) = F(x, y), \qquad y(x_0) = y_0 \tag{1.8}$$

on an interval can be obtained by minimizing the error functional with terms (1.3) of the form
$$J_1 = \sum_{j=1}^{M} \Bigl(y'(x_j) - F\bigl(x_j, y(x_j)\bigr)\Bigr)^2 \quad \text{and} \quad J_2 = \bigl(y(x_0) - y_0\bigr)^2, \tag{1.9}$$

where $\{x_j\}_{j=1}^{M}$ is a set of test points in the interval (a; b), which usually changes in the learning process. The necessity of applying neural networks to this classical problem in such a formulation is not obvious, since there are many standard methods for solving it. Most of them are numerical methods leading to a pointwise approximation. The transition from a solution defined on a finite set of points to an analytic expression is a separate problem, especially if we take into account the adequacy of the initial equation and the stability with respect to the errors present in the initial data and the errors arising in the computation process. The neural network approach makes it possible to obtain a solution immediately as a function satisfying the required smoothness conditions. More important is the stability of neural networks with respect to data errors. If it is necessary to increase this stability, then we can add random perturbations to the expressions (1.9) by choosing
$$J_1 = \sum_{j=1}^{M} \Bigl(y'(x_j) - F\bigl(x_j, y(x_j)\bigr) - \xi_j\Bigr)^2, \qquad J_2 = \bigl(y(x_0) - y_0 - \xi_0\bigr)^2.$$

Here $\xi_0$ and $\xi_j$, $j = 1, \dots, M$, are random additives whose distribution law and amplitudes are determined by the assumed error properties. It makes sense to change these additives simultaneously with the test points $\{x_j\}_{j=1}^{M}$. If the point $x_0$ does not belong to the interval (a; b), then a nonclassical problem is already obtained. More important for applications is the ability to build approximate solutions of knowingly overdetermined tasks. The simplest example is the modification of problem (1.8) consisting in replacing (or supplementing) the equality $y(x_0) = y_0$ with a set of conditions (usually the results of observations) $y(\tilde{x}_1) = f_1, \dots, y(\tilde{x}_p) = f_p$. We note that in the general case there is no exact solution to such a problem. Such a modification makes sense in a situation where the object of modeling is not the problem (1.8) itself but some real object for which the statement (1.8) is an approximate model.


When implementing our approach, a third term of the error functional (1.2) is added, namely
$$J_3 = \sum_{k=1}^{p} \bigl(y(\tilde{x}_k) - f_k\bigr)^2,$$
while small errors in these data have little effect on the constructed approximate solution. Note that some of the points $\tilde{x}_k$ may not belong to the interval (a; b). If the measurements do not have equal accuracy, then the additional term can be represented in the form
$$J_3 = \sum_{k=1}^{p} \delta_k \bigl(y(\tilde{x}_k) - f_k\bigr)^2,$$

where more reliable observations enter with larger weights $\delta_k > 0$.

Much more difficult is the problem of finding the function F(x, y) from the results of observations. In this case, we build the equation and its solution in parallel: both of the modeling stages discussed above are combined. Another approach is possible, in which, based on the observation data $y(\tilde{x}_1) = f_1, \dots, y(\tilde{x}_p) = f_p$, a neural network interpolation of the solution is constructed by the usual methods [2]. After this, from the obtained function y(x), we look for the form of the equation; the form of the function F(x, y) is chosen in such a way that the error functional reaches a minimum value. In this case (with the appropriate modification of the functional), one can also satisfy certain additional conditions, for example, the symmetry or smoothness of the function F(x, y). A third approach is to combine the approximation of F(x, y) with some classical method for the numerical solution of a differential equation of the Runge-Kutta type (see, for example, [3]) or with the use of a multilayer model (see Chapter 5) to construct an approximate solution of the differential equation.

Our approach allows natural generalizations to systems of ordinary differential equations and higher-order differential equations. For example, the solution of the Dirichlet boundary value problem for the stationary Schrödinger operator on the interval $(x_0; x_1) \subset (a; b)$
$$y''(x) + q(x)y(x) = 0, \qquad y(x_0) = y_0, \quad y(x_1) = y_1$$
can be sought by minimizing the error functional (1.2), whose terms have the form
$$J_1 = \sum_{j=1}^{M} \Bigl(y''(x_j) + q(x_j)\,y(x_j)\Bigr)^2, \qquad J_2 = \bigl(y(x_0) - y_0\bigr)^2 + \bigl(y(x_1) - y_1\bigr)^2, \qquad J_3 = 0.$$

As was noted earlier, the first term can also be given in the form
$$J_1 = \sum_{j=1}^{M} \Bigl[\bigl(y'(x_j)\bigr)^2 + q(x_j)\,y^2(x_j)\Bigr].$$

The above approach easily allows us to consider the case when the points $x_0$ and $x_1$ do not belong to the interval (a; b). Just as before, we can search for a solution that takes given values at individual points, $y(\tilde{x}_1) = f_1, \dots, y(\tilde{x}_p) = f_p$, or satisfies more complex conditions such as conservation laws, spectral properties of the solution, and so on. Even more interesting for applications is the problem of determining the potential q(x) from observational data. The potential is sought by minimizing the functional (1.2) into which the expression for the function q is substituted in the form of the output of a neural network. In this process, the potential q(x) and the solution y(x) are selected simultaneously. Similarly, we can consider other equations of the second and higher orders. Equations that are not resolved with respect to the highest derivative are treated without special changes. For example, in the case of a first-order equation F(x, y, y') = 0, it suffices to choose
$$J_1 = \sum_{j=1}^{M} F^2\bigl(x_j, y(x_j), y'(x_j)\bigr).$$

In this section, these and some other statements are illustrated by specific problems.
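Before turning to the specific problems, the following minimal sketch (illustrative code, not the authors' implementation) shows how the functional (1.9) can be minimized in practice for the simple example y' = -y, y(0) = 1 on (0, 1), assuming a one-hidden-layer tanh network as the trial solution and SciPy's BFGS optimizer, with the test points regenerated between optimization epochs.

```python
import numpy as np
from scipy.optimize import minimize

F = lambda x, y: -y          # example right-hand side of (1.8)
x0, y0 = 0.0, 1.0            # initial condition
n_hidden = 8                 # number of tanh neurons (illustrative)

def unpack(w):
    return np.split(w, 3)    # weights a, b and output coefficients c

def y_net(w, x):
    a, b, c = unpack(w)
    return np.tanh(np.outer(x, a) + b) @ c

def dy_net(w, x):            # analytic derivative of the network output
    a, b, c = unpack(w)
    t = np.tanh(np.outer(x, a) + b)
    return (1.0 - t**2) @ (a * c)

def J(w, xs, penalty=10.0):  # discrete functional (1.9)
    j1 = np.sum((dy_net(w, xs) - F(xs, y_net(w, xs)))**2)
    j2 = (y_net(w, np.array([x0]))[0] - y0)**2
    return j1 + penalty * j2

rng = np.random.default_rng(1)
w = 0.1 * rng.standard_normal(3 * n_hidden)
for epoch in range(20):                      # regenerate test points between epochs
    xs = rng.uniform(0.0, 1.0, size=40)
    w = minimize(J, w, args=(xs,), method="BFGS", options={"maxiter": 30}).x

print(y_net(w, np.array([0.0, 0.5, 1.0])))   # compare with exp(-x)
```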

1.1.1 A stiff differential equation

As the first problem on which the application of our methods will be demonstrated, we consider the parametrized Cauchy problem from [4]:
$$y' = -\alpha(y - \cos x), \qquad y(0) = 0. \tag{1.10}$$
For parameter values $\alpha \geq 50$ problem (1.10) is stiff in the sense that its solution by the Euler method is unstable in a neighborhood of zero. Therefore, to demonstrate the work of our approach under conditions of stiffness, it suffices to consider $x \in [0; 1]$, $\alpha \in [\alpha_{\min}; \alpha_{\max}]$, where $50 \leq \alpha_{\max}$, i.e., for some values of the parameter from the interval under consideration the problem is stiff. According to the above general approach, we seek an approximate solution of problem (1.10) by minimizing the error functional (1.2) with terms


$$J_1 = \sum_{j=1}^{m} \Bigl(y'(\xi_j, \alpha_j) + \alpha_j\bigl(y(\xi_j, \alpha_j) - \cos\xi_j\bigr)\Bigr)^2, \qquad J_2 = \sum_{j=1}^{m} y^2(0, \alpha_j), \qquad J_3 = 0.$$

Pairs of test points are regenerated as random variables uniformly distributed in the range of variation of the variable $x \in [0; 1]$ and the parameter $\alpha \in [\alpha_{\min}; \alpha_{\max}]$. If there are additional data, then they are taken into account by adding the term $J_3$ to the error functional. If there is information about the solution at individual points, $y(x_j, \alpha_j) = y_j$, this term has the form
$$J_3 = \sum_{j=1}^{m} \bigl(y_j - y(x_j, \alpha_j)\bigr)^2.$$

Such information can be obtained, for example, from a numerical solution of problem (1.10) for some fixed values of the parameter α. Let us dwell in more detail on the model where, as additional data, we use information in the form of equations obtained directly from the conditions of the problem being solved. We note that for sufficiently small values of the parameter α the following third-order asymptotic expansion of the solution in the parameter α holds [5]:
$$y \simeq A_1(x, \alpha) = \alpha\sin x + \alpha^2(\cos x - 1) - \alpha^3(\sin x - x).$$
This asymptotic condition will be taken into account in the error functional by choosing the term $J_3 = \sum_{k=1}^{K}\bigl(y(x_k, \alpha_k) - A_1(x_k, \alpha_k)\bigr)^2$, where $\alpha_k \in [\alpha'_{\min}; \alpha'_{\max}]$, $\alpha'_{\max} \ll \alpha_{\max}$. It should be noted that when the asymptotic condition is used with $\alpha'_{\min} < \alpha_{\min}$, the problem becomes nonclassical, since we impose the condition $y \simeq A_1(x, \alpha)$ outside the solution search area. For large values of the parameter α, we can consider $1/\alpha$ as a small parameter. In this case, the asymptotic expansion of the solution takes the form
$$y \simeq \cos x + \frac{\sin x}{\alpha} - \frac{\cos x}{\alpha^2} - \frac{\sin x}{\alpha^3}. \tag{1.11}$$

We note that the right-hand side of expression (1.11) does not satisfy the boundary condition of problem (1.10): y(0) = 0. We correct condition (1.11) as follows. Considering the variable x itself as a small parameter, we obtain in a neighborhood of x = 0 a new decomposition
$$y \simeq \alpha x - \frac{(\alpha x)^2}{2} + \bigl(\alpha^3 - \alpha\bigr)\frac{x^3}{6}.$$
Then instead of condition (1.11) we obtain the following condition
$$y \simeq A_2(x, \alpha) = \min\left(\cos x + \frac{\sin x}{\alpha} - \frac{\cos x}{\alpha^2} - \frac{\sin x}{\alpha^3},\;\; \alpha x - \frac{(\alpha x)^2}{2} + \bigl(\alpha^3 - \alpha\bigr)\frac{x^3}{6}\right).$$

These asymptotic decompositions are used in constructing the neural network model by choosing the additional term
$$J_3 = \tilde{\delta}_1 \sum_{k=1}^{K} \bigl(y(x_k, \alpha_k) - A_1(x_k, \alpha_k)\bigr)^2 + \tilde{\delta}_2 \sum_{l=1}^{L} \bigl(y(x_l, \alpha_l) - A_2(x_l, \alpha_l)\bigr)^2. \tag{1.12}$$

In expression (1.12), the first term is computed for small values of the parameter α, and the second term for large values of the parameter α.

Usually, in the process of optimization, there is a complete regeneration of the test points. If it is not carried out, then the neural network method can be considered an analog of the collocation method, in which the approximation is constructed for a set of fixed points. We introduced an algorithm for partial regeneration of test points in the work [6]. A parameter $dt \in [0; 1]$ reflecting the fraction of points fixed from one iteration to another is introduced. The condition dt = 0 means complete regeneration, i.e., all points are newly selected at random before each iteration; the condition dt = 1 corresponds to the method of collocations. For intermediate values of the parameter, the following rule is used: the $dt \cdot m$ points (out of all m test points) with the largest values of the corresponding terms in $J_1$ are fixed, and the remaining points are regenerated randomly, as in the sketch below. The results of the experiments given in Chapter 4 confirmed the hypothesis that for a small number of test points it is advisable to use their total regeneration (dt = 0).
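A minimal sketch of this partial regeneration rule might look as follows; the names and interface here are illustrative assumptions and are not taken from [6].

```python
import numpy as np

def regenerate(points, residuals, dt, low, high, rng):
    """Keep the dt*m points with the largest residual terms, redraw the rest."""
    m = len(points)
    n_keep = int(dt * m)
    keep_idx = np.argsort(residuals)[-n_keep:] if n_keep > 0 else np.array([], dtype=int)
    kept = points[keep_idx]
    fresh = rng.uniform(low, high, size=(m - n_keep,) + points.shape[1:])
    return np.concatenate([kept, fresh])

# dt = 0 -> complete regeneration; dt = 1 -> fixed points (collocation analog).
```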

1.1.2 The problem of a chemical reactor

In this section, we consider the macrokinetic model of a nonisothermal chemical reactor [7]. The processes of thermal explosion, autoignition, ignition, propagation of combustion waves, etc. occurring in the reactor are described by a system of quasilinear (parabolic, elliptic) partial differential equations or by ordinary differential equations. Next, we study the stationary problem of a thermal explosion in the plane-parallel case [7] under the assumption that the reaction is one-stage, irreversible, not accompanied by phase transitions, and proceeds in a stationary medium.


An approximate solution of the following boundary value problem is sought:
$$\frac{d^2\theta}{dx^2} + \alpha\exp(\theta) = 0, \qquad \frac{d\theta}{dx}(0) = 0, \quad \theta(1) = 0. \tag{1.13}$$

This problem is interesting because the exact solution, the existence domain of the solution, and the parameter values for which the solution of the problem does not exist ($\alpha > \alpha^* \approx 0.878458$) are known, see [8]. The error functional (1.2) is constructed from the residuals in the satisfaction of the equation
$$J_1 = \sum_{j=1}^{M}\left(\frac{d^2\theta}{dx^2}(x_j, \alpha_j) + \alpha_j\exp\bigl(\theta(x_j, \alpha_j)\bigr)\right)^2,$$
in the satisfaction of the boundary conditions
$$J_2 = \sum_{j=1}^{M}\left(\left(\frac{d\theta}{dx}(0, \alpha_j)\right)^2 + \theta^2(1, \alpha_j)\right),$$
and in the satisfaction of additional conditions in the form
$$J_3 = \sum_{i=1}^{m}\bigl(\theta(x'_i, \alpha'_i) - \theta_i\bigr)^2.$$

Here $x_j$ are periodically regenerated test points from the segment [0; 1]; $\alpha_j$ are test points from the segment $[\alpha_{\min}; \alpha_{\max}]$; $\theta_i$ are the known values of the desired function at fixed points, $\theta(x'_i, \alpha'_i) = \theta_i$. Our approach was applied both without taking additional information into account and with it (the hybrid method). The additional information had the form of a rough numerical solution of Eq. (1.13), the values of which were considered as additional data for solving the problem. For this problem, the situation is interesting when the solution is constructed in a domain of parameter variation, part of which contains no exact solution. In this way we check the behavior of the error functional in such a situation. This is relevant for problems in which we do not know in advance the range of the parameter values for which an exact solution of the problem exists.
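A minimal sketch of how the three discrete terms above can be evaluated for a trial function θ(x, α), with the parameter α treated as an extra input, is given below; the derivatives are approximated by central differences purely for illustration, and the data points of the hybrid variant are placeholders rather than the book's data.

```python
import numpy as np

def d_dx(theta, x, a, h=1e-4):
    return (theta(x + h, a) - theta(x - h, a)) / (2.0 * h)

def d2_dx2(theta, x, a, h=1e-4):
    return (theta(x + h, a) - 2.0 * theta(x, a) + theta(x - h, a)) / h**2

def reactor_terms(theta, xs, alphas, data=None):
    """Discrete residual terms J1, J2, J3 for problem (1.13)."""
    j1 = np.sum((d2_dx2(theta, xs, alphas) + alphas * np.exp(theta(xs, alphas))) ** 2)
    j2 = np.sum(d_dx(theta, np.zeros_like(alphas), alphas) ** 2
                + theta(np.ones_like(alphas), alphas) ** 2)
    j3 = 0.0
    if data is not None:                    # hybrid variant: rough numerical solution
        x_d, a_d, theta_d = data
        j3 = np.sum((theta(x_d, a_d) - theta_d) ** 2)
    return j1, j2, j3
```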

1.1.3 The problem of a porous catalyst

Analysis of the balance of heat and mass in the granule of a porous catalyst during a catalytic chemical reaction leads, in dimensionless variables, to the study of the following nonlinear boundary value problem [7, 9]: it is required to find a solution y(x) of the ordinary differential equation


$$\frac{d^2y}{dx^2} + \frac{p}{x}\frac{dy}{dx} = \alpha(1 + y)\exp\left(-\frac{\gamma\beta y}{1 - \beta y}\right), \tag{1.14}$$
satisfying the boundary conditions
$$\frac{dy}{dx}(0) = 0, \qquad y(1) = 0.$$
The parameter p takes into account the geometry of the pellet: for a spherical particle p = 2, for a cylindrical particle p = 1. In this example, we consider the case of a flat granule, p = 0.

In [10], from the materials of the VI International Conference NPNJ'2006, two methods for the numerical solution of a discrete analog of the posed problem (its difference approximation) are presented: the Lahaye method and the method of discrete continuation with respect to the best parameter. The results of calculations by these original methods, unfortunately, are not given, but it is asserted that they coincide with the results obtained by the method of integral equations presented in the well-known monograph of Professor Na [11].

We construct the error functional (1.2) for this problem. In our case, the domain where the solution is sought is the interval Ω = (0; 1) with the boundary Γ = ∂Ω = {0, 1}; the terms of the error functional are given by
$$J_1 = \sum_{j=1}^{M}\left(\frac{d^2y}{dx^2}(x_j) - \alpha\bigl(1 + y(x_j)\bigr)\exp\left(-\frac{\gamma\beta\, y(x_j)}{1 - \beta\, y(x_j)}\right)\right)^2, \qquad J_2 = \left(\frac{dy}{dx}(0)\right)^2 + \bigl(y(1)\bigr)^2. \tag{1.15}$$
More interesting is the problem of constructing the solution not for fixed values of the parameters but for values from certain intervals. In this case, these parameters are included among the input variables along with the variable x. As such a parameter, we could choose α, since the dependence on it is the most interesting for applications. However, it is more tempting to introduce all three interval parameters: $\alpha \in (\alpha^-; \alpha^+)$, $\beta \in (\beta^-; \beta^+)$, and $\gamma \in (\gamma^-; \gamma^+)$. The terms of the error functional become

$$J_1 = \sum_{j=1}^{M}\left(\frac{d^2y}{dx^2}(x_j, \alpha_j, \beta_j, \gamma_j) - \alpha_j\bigl(1 + y(x_j, \alpha_j, \beta_j, \gamma_j)\bigr)\exp\left(-\frac{\gamma_j\beta_j\, y(x_j, \alpha_j, \beta_j, \gamma_j)}{1 - \beta_j\, y(x_j, \alpha_j, \beta_j, \gamma_j)}\right)\right)^2,$$
$$J_2 = \sum_{j=1}^{M}\left(\left(\frac{dy}{dx}(0, \alpha_j, \beta_j, \gamma_j)\right)^2 + y^2(1, \alpha_j, \beta_j, \gamma_j)\right).$$
We also applied a hybrid variant of the method, in which we used additional data, adding to the functional the term
$$J_3 = \sum_{i=1}^{m}\bigl(y(x'_i, \alpha'_i) - y_i\bigr)^2,$$
composed of the discrepancies at the points

at which the values of the desired function are known. As these values, we took the results of an inaccurate solution of the difference approximation to the original boundary-value problem.

1.1.4 Differential-algebraic problem

Solving nonstandard boundary value problems for differential equations and systems of differential equations usually encounters difficulties of various kinds. In this case, the direct application of classical methods usually does not yield a result, and to obtain an approximate solution of acceptable accuracy it is necessary to use artificial techniques or combinations of several approaches, the choice of which depends on the specific task. The paper [12] considers a class of differential-algebraic problems that are proposed to be solved using a combination of the shooting method and the method of the best parametrization. The authors of [12] demonstrated the successful application of this combination of methods to specific problems. To test our approach, we consider one of these problems, a boundary value problem for a singularly perturbed differential-algebraic system
$$\varepsilon y'' = y - z^2, \qquad y^2 = z, \qquad y(0) + y'(0) = 0, \qquad y(1) = 1/2. \tag{1.16}$$

In this case, the authors of [12] showed that, depending on the value of the parameter ε, problem (1.16) has a different number of solutions. We will apply our approach to this problem. The most straightforward version of the error functional (1.2), which we used to search for the functions y(t) and z(t), includes the components
$$J_1 = \sum_{i=1}^{M}\Bigl(\varepsilon y''(t_i) - y(t_i) + z(t_i)^2\Bigr)^2, \qquad J_2 = \bigl(y(1) - 1/2\bigr)^2 + \bigl(y(0) + y'(0)\bigr)^2, \qquad J_3 = \sum_{i=1}^{M}\Bigl(z(t_i) - y(t_i)^2\Bigr)^2, \tag{1.17}$$


where the test points $\{t_i\}_{i=1}^{M} \subset [0; 1]$ are chosen evenly distributed on the segment [0; 1] and, as before, are periodically regenerated during the process of minimizing the functional (1.2). As another variant of the solution, we introduce an unknown parameter p similar to the shooting method from the work [12]. Then the boundary conditions of problem (1.16) take the form
$$y(0) = p, \qquad y'(0) = -p, \qquad z(0) = p^2, \qquad y(1) = 1/2.$$
In this case, we look for y(t, p) and z(t, p) in the process of minimizing the error functional (1.2), where the terms
$$J_1 = \sum_{i=1}^{M}\Bigl(\varepsilon y''(t_i, p_i) - y(t_i, p_i) + z(t_i, p_i)^2\Bigr)^2,$$
$$J_2 = \sum_{i=1}^{M}\Bigl(\bigl(y(1, p_i) - 1/2\bigr)^2 + \bigl(y(0, p_i) + y'(0, p_i)\bigr)^2 + \bigl(y(0, p_i) - p_i\bigr)^2\Bigr), \tag{1.18}$$
$$J_3 = \sum_{i=1}^{M}\Bigl(z(t_i, p_i) - y(t_i, p_i)^2\Bigr)^2$$

are calculated on the set of test points $\{t_i, p_i\}_{i=1}^{M} \subset [0; 1] \times [0; 1]$. We assume that the number of local minima of the functional (1.2) with terms (1.18), considered as a function J(p) of the parameter, corresponds to the number of solutions of the system (1.16). Our assumption is based on the fact that the exact solutions of problem (1.16) correspond to minima of the given functional that are exactly equal to 0. We seek specific solutions by minimizing the functional obtained from relations (1.18) by replacing $p_i$ with the values of the parameter corresponding to the local extrema of J(p).
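The parameter scan suggested above can be sketched as follows; here J(p) denotes the value of the functional (1.18) minimized over the network weights with the shooting parameter p held fixed (that inner minimization is not shown), and the synthetic curve in the example is purely illustrative.

```python
import numpy as np

def local_minima(p_grid, j_values):
    """Interior local minima of the sampled curve J(p): candidate solutions of (1.16)."""
    j = np.asarray(j_values)
    idx = [i for i in range(1, len(j) - 1) if j[i] < j[i - 1] and j[i] < j[i + 1]]
    return [(p_grid[i], j[i]) for i in idx]

# Example with a synthetic J(p); in practice J(p) comes from the inner training loop.
p_grid = np.linspace(-1.0, 1.0, 41)
j_values = (p_grid**2 - 0.25)**2            # toy curve with two minima at p = +/-0.5
print(local_minima(p_grid, j_values))
```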

1.2 Problems for partial differential equations for domains with fixed boundaries

In this section, we show how the general methodology can be applied to the simplest classical problems of mathematical physics, the solution of which is well known, and to certain nonclassical problems. First, we consider the Dirichlet problem for the Laplace equation, on the example of which we illustrate both general methods of constructing the error functional and particular methods applicable, as a rule, to linear problems. We note that


some complications of the problem, such as other options for specifying boundary conditions, complex geometry, nonlinear equations, the effect of dimension, multicomponent domains, nonclassical statements, etc., do not create difficulties for our approach. Various complications of this kind are discussed below on specific examples.

1.2.1 The Laplace equation on the plane and in space

The properties of solutions of elliptic boundary value problems are in many respects similar to those of the solution of the Dirichlet problem for the Laplace operator in the unit disc. The solution of this problem is well known: there is an explicit integral representation, which makes it possible to test the results obtained. By the Riemann theorem, any simply connected domain can be conformally mapped onto a disc; from the conformal invariance of the Laplace operator it follows that the solution set is invariant: solutions of the equation go over into solutions (see, for example, the book [13]).

Let (x, y) be a point on the plane $\mathbb{R}^2$, let $r, \varphi$ be its polar coordinates, let $\Omega: r < 1$ be the unit disc, and let the function u(x, y) be the solution of the problem Δu = 0 inside the disc (r < 1) and u = f on the boundary of the disc (at r = 1). Here
$$\Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}$$
is the Laplace operator. It is known [14, 15] that the solution of this problem is represented by the Poisson integral, which in polar coordinates has the form
$$u(r, \varphi) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\phi)\,\frac{1 - r^2}{r^2 - 2r\cos(\varphi - \phi) + 1}\,d\phi.$$

It is possible to calculate this integral analytically only in exceptional cases (for example, when the boundary function f(φ) is a rational function of cos φ and sin φ); therefore, as a rule, we have to find it numerically. To solve this problem, we apply the method based on the error functional (1.2). For the Dirichlet problem, as the term $J_1$ we can choose the expression
$$\iint_\Omega \left[\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2\right] dx\,dy$$
or its discrete representation
$$\sum_{j=1}^{M}\left[\left(\frac{\partial u}{\partial x}(x_j, y_j)\right)^2 + \left(\frac{\partial u}{\partial y}(x_j, y_j)\right)^2\right].$$
We usually use another functional, the first term of which has the form
$$J_1 = \sum_{j=1}^{M}\bigl(\Delta u(x_j, y_j)\bigr)^2.$$
This kind of functional $J_1$ is useful in cases

where it is difficult to associate a variational principle (an analog of the Dirichlet integral) with the equation, for example, when considering nonlinear equations, problems with complex-valued coefficients, etc.

In most computational experiments, we used a random distribution of the test points $\{(x_j, y_j)\}_{j=1}^{M}$. These points are regenerated after a certain number of learning epochs (optimization steps) with the help of a constant (or other) probability density, which provides a more stable course of learning. We also used a combination of regular and random distributions of control points. For example, in order to estimate the normal derivative of the solution at the boundary more accurately, it turned out to be useful to add a regular set of points sufficiently close to the boundary to a random collection of points. Just as it was done in the paper [16], one can search for a solution in the form of two summands: one satisfies the boundary condition and does not contain the adjusted parameters, and the other satisfies the equation with the first term taken into account and contains the adjusted parameters. This method is suitable only for linear problems.

As a second example, consider the following problem. We seek a solution u = u(x, y) of the two-dimensional Laplace equation Δu = 0 in the region L: 0 < x, y < a, min(x, y) < d < a (the L-region), which is the union of two rectangles $\Pi_1: 0 < x < a,\ 0 < y < d$ and $\Pi_2: 0 < x < d,\ 0 < y < a$; the solution satisfies the Dirichlet conditions $u|_{\Gamma_k} = f_k$ on the parts $\{\Gamma_k\}_{k=1}^{6}$ of the domain boundary (Fig. 1.1). The solution of this model problem is expressed in explicit form through the solutions in the canonical subdomains $\Pi_1$, $\Pi_2$ obtained, for example, by the method of separation of variables. For the selected boundary conditions $f_1(x) = \sin(\pi x/a)$, $f_2(y) = \sin(\pi y/a)$, $f_3 = f_4 = f_5 = f_6 = 0$, this solution $\tilde{u}$, extended by zero to the square $Q_{d,a} = (d; a) \times (d; a)$, has the form
$$\tilde{u}(x, y) = \sin(\pi x/a)\,\frac{\operatorname{sh}\bigl(\pi(d - y)/a\bigr)}{2\operatorname{sh}(\pi d/a)}\bigl(1 + \operatorname{sign}(d - y)\bigr) + \sin(\pi y/a)\,\frac{\operatorname{sh}\bigl(\pi(d - x)/a\bigr)}{2\operatorname{sh}(\pi d/a)}\bigl(1 + \operatorname{sign}(d - x)\bigr)$$
(Fig. 1.2).


Fig. 1.1 The area L in which the solution is sought (boundary parts Γ₁, …, Γ₆; subdomains Π₁, Π₂).

Fig. 1.2 The plot of the exact solution ũ (a = 1, d = 0.2).

In this case, the components of the minimized functional (1.2) become
$$J_1 = \sum_{j=1}^{M_1}\bigl(\Delta u(x_{j,1}, y_{j,1})\bigr)^2 + \sum_{j=1}^{M_2}\bigl(\Delta u(x_{j,2}, y_{j,2})\bigr)^2 + \sum_{j=1}^{M_3}\bigl(\Delta u(x_{j,3}, y_{j,3})\bigr)^2,$$
$$J_2 = \sum_{k=1}^{6}\sum_{j_k=1}^{m_k}\bigl(u(x_{j_k}, y_{j_k}) - f_k(x_{j_k}, y_{j_k})\bigr)^2, \qquad J_3 = 0.$$


The sets of control points are chosen as follows: $\{(x_{j,k}, y_{j,k})\}_{j=1}^{M_k}$ within the regions $\Pi_1\setminus\Pi_2$, $\Pi_2\setminus\Pi_1$, and $\Pi_1 \cap \Pi_2$, respectively, and $\{(x_{j_k}, y_{j_k})\}_{j_k=1}^{m_k}$ on the parts $\Gamma_k$, k = 1, …, 6, of the boundary of the region L. It is possible to use the error functional with another choice of the term $J_1$:
$$J_1 = \sum_{j=1}^{M_1}\bigl(\nabla u(x_{j,1}, y_{j,1})\bigr)^2 + \sum_{j=1}^{M_2}\bigl(\nabla u(x_{j,2}, y_{j,2})\bigr)^2 + \sum_{j=1}^{M_3}\bigl(\nabla u(x_{j,3}, y_{j,3})\bigr)^2.$$

We also looked for a solution in the form of a set of three functions $u_1$, $u_2$, $u_3$ inside the domains $\Pi_1\setminus\Pi_2$, $\Pi_2\setminus\Pi_1$, and $\Pi_1 \cap \Pi_2$, respectively; then
$$J_1 = \sum_{j=1}^{M_1}\bigl(\Delta u_1(x_{j,1}, y_{j,1})\bigr)^2 + \sum_{j=1}^{M_2}\bigl(\Delta u_2(x_{j,2}, y_{j,2})\bigr)^2 + \sum_{j=1}^{M_3}\bigl(\Delta u_3(x_{j,3}, y_{j,3})\bigr)^2,$$
$$J_2 = \sum_{k=1}^{8}\sum_{j_k=1}^{m_k}\bigl(u_{l(k)}(x_{j_k}, y_{j_k}) - f_k(x_{j_k}, y_{j_k})\bigr)^2,$$
$$J_3 = \sum_{k=1}^{2}\sum_{j_k=1}^{m'_k}\bigl(u_{p(k)}(x'_{j_k}, y'_{j_k}) - u_{q(k)}(x'_{j_k}, y'_{j_k})\bigr)^2.$$

Here l(k) is the number of the subdomain corresponding to the given part of the boundary, $\{(x'_{j_k}, y'_{j_k})\}_{j_k=1}^{m'_k}$ are test points at the junctions between neighboring subdomains, and p(k) and q(k) are the numbers of the subdomains corresponding to the given junction.

The above reasoning easily extends to the case of a higher dimension. Consider, for example, an elliptic boundary value problem: in a domain $\Omega \subset \mathbb{R}^3$ with a smooth boundary ∂Ω = Γ it is required to find a harmonic function u(x, y, z), a solution of the Laplace equation
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = 0,$$
satisfying the boundary conditions $\left.\left(\alpha\dfrac{\partial u}{\partial n} + \beta u\right)\right|_{\Gamma} = f$, where $\dfrac{\partial u}{\partial n}$ is the derivative along the normal to the boundary Γ. The Dirichlet condition is obtained for α = 0, β = 1; the Neumann condition for α = 1, β = 0; the third boundary condition for α = 1, β ≠ 0. It is known from the method of potentials that the solution of the internal Dirichlet problem (or the external Neumann problem) is unique, exists, and can be represented in the form


$$u(x, y, z) = \int_\Gamma G(x, y, z; \xi, \eta, \zeta)\,f(\xi, \eta, \zeta)\,d\Gamma.$$
One can usually find the function G for domains that have a symmetry: for example, for the unit ball $\Omega \subset \mathbb{R}^3$ the solution u of the Dirichlet problem is represented by the Poisson integral [15]
$$u(x, y, z) = \frac{1}{4\pi}\int_\Gamma \frac{1 - x^2 - y^2 - z^2}{\bigl[(x - \xi)^2 + (y - \eta)^2 + (z - \zeta)^2\bigr]^{3/2}}\,f(\xi, \eta, \zeta)\,d\Gamma_{\xi,\eta,\zeta}.$$
Calculating such an integral analytically is possible only in exceptional cases; therefore, as a rule, we have to find it numerically. In accordance with the above reasoning, we seek a minimum of the functional (1.2). For the Dirichlet problem the first term has the form
$$\sum_{j=1}^{M}\bigl(\Delta u(x_j, y_j, z_j)\bigr)^2$$
or, alternatively,
$$\sum_{j=1}^{M}\left[\left(\frac{\partial u}{\partial x}(x_j, y_j, z_j)\right)^2 + \left(\frac{\partial u}{\partial y}(x_j, y_j, z_j)\right)^2 + \left(\frac{\partial u}{\partial z}(x_j, y_j, z_j)\right)^2\right],$$
where the test points $\{(x_j, y_j, z_j)\}_{j=1}^{M} \subset \Omega$, as before, are regenerated periodically as random quantities uniformly distributed in the region Ω. The term $J_2$ is written in an obvious way for each kind of boundary condition. For example, in the case of the Dirichlet boundary condition, this term has the form
$$J_2 = \sum_{j=1}^{m}\bigl(u(x_j, y_j, z_j) - f(x_j, y_j, z_j)\bigr)^2,$$
where the set of test points $\{(x_j, y_j, z_j)\,|\,x_j^2 + y_j^2 + z_j^2 = 1\}_{j=1}^{m} \subset \Gamma$ is randomly selected on the boundary of the unit ball; this set is also regenerated.
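One simple way to draw such regenerated test sets is sketched below (an illustrative example, not the authors' code): points uniform in the unit ball for the first term and points uniform on the unit sphere for $J_2$.

```python
import numpy as np

def sample_ball(m, rng):
    """Points uniformly distributed in the unit ball."""
    v = rng.standard_normal((m, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # uniform directions
    r = rng.random(m) ** (1.0 / 3.0)                # radii with density ~ r^2
    return v * r[:, None]

def sample_sphere(m, rng):
    """Points uniformly distributed on the unit sphere."""
    v = rng.standard_normal((m, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

rng = np.random.default_rng(0)
interior = sample_ball(200, rng)    # for the first term of the functional
boundary = sample_sphere(100, rng)  # for the term J2
```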

1.2.2 The Poisson problem

Consider the following boundary value problem for the two-dimensional Laplace operator: let $\Omega \subset \mathbb{R}^2$ be a bounded domain with a piecewise-smooth boundary ∂Ω and let $D \subset \Omega$ be its strictly internal subdomain. It is required to find a solution u(x, y) of the homogeneous Dirichlet problem for the Poisson equation $\Delta u = u_{xx} + u_{yy} = g$, where g(x, y) = 0 for $(x, y) \in \Omega\setminus D$ and $u|_{\partial\Omega} = 0$.


Fig. 1.3 The domain Ω with the subdomain D considered in the Poisson problem.

For computational experiments, we have chosen $\Omega: x^2 + y^2 < 1$, $D: (x - x_0)^2 + (y - y_0)^2 < r^2$, $x_0 = 0.4$, $y_0 = 0$, $r = 0.4$, $g = A = 10$ for $(x, y) \in D$, $g = 0$ for $(x, y) \in \Omega\setminus D$ (Fig. 1.3). The first term of the error functional (1.2) can be chosen as
$$\int_\Omega (\Delta u - g)^2\,dx\,dy = \int_D (\Delta u - A)^2\,dx\,dy + \int_{\Omega\setminus D}(\Delta u)^2\,dx\,dy \tag{1.19}$$
or, alternatively, as the Dirichlet integral
$$\iint_\Omega\left[\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2 + 2gu\right]dx\,dy. \tag{1.20}$$

Since these integrals can be calculated analytically only in rare special cases, in real calculations we usually use their discrete analogs, which for the given problem have the form
$$\sum_{j=1}^{M}\left(\frac{\partial^2 u}{\partial x^2}(x_j, y_j) + \frac{\partial^2 u}{\partial y^2}(x_j, y_j) - g(x_j, y_j)\right)^2 \quad \text{for (1.19)}$$
and
$$\sum_{j=1}^{M}\left[\left(\frac{\partial u}{\partial x}(x_j, y_j)\right)^2 + \left(\frac{\partial u}{\partial y}(x_j, y_j)\right)^2 + 2g\,u(x_j, y_j)\right] \quad \text{for (1.20)},$$
respectively. Homogeneous boundary conditions are taken into account by the choice of the term
$$J_2 = \int_0^{2\pi}\bigl(u(\cos\vartheta, \sin\vartheta)\bigr)^2\,d\vartheta \tag{1.21}$$
or its discrete analog.


In the case when an approximate solution is constructed from two functions, one for each of the subdomains, a compatibility condition must also be introduced into the functional. Denoting by $u^+(x, y)$ and $u^-(x, y)$ the approximations of the solution in the domain D and in the complement $\Omega\setminus D$, respectively, we obtain the following representation for the components of the error functional:
$$J_1 = \int_D (\Delta u^+ - A)^2\,dx\,dy + \int_{\Omega\setminus D}(\Delta u^-)^2\,dx\,dy = \int_{\Omega\setminus D}(\Delta u^-)^2\,dx\,dy + \int_D (\Delta u^+)^2\,dx\,dy - 2A\int_D \Delta u^+\,dx\,dy + \pi r^2 A^2$$
for the functional (1.19),
$$J_1 = \iint_D\left[\left(\frac{\partial u^+}{\partial x}\right)^2 + \left(\frac{\partial u^+}{\partial y}\right)^2\right]dx\,dy + \iint_{\Omega\setminus D}\left[\left(\frac{\partial u^-}{\partial x}\right)^2 + \left(\frac{\partial u^-}{\partial y}\right)^2\right]dx\,dy + 2A\int_D u^+\,dx\,dy$$
for the functional (1.20),
$$J_2 = \int_0^{2\pi}\bigl(u^-(\cos\vartheta, \sin\vartheta)\bigr)^2\,d\vartheta, \tag{1.22}$$
$$J_3 = \int_0^{2\pi}\bigl(u^+(0.4 + 0.4\cos\varphi,\; 0.4\sin\varphi) - u^-(0.4 + 0.4\cos\varphi,\; 0.4\sin\varphi)\bigr)^2\,d\varphi. \tag{1.23}$$

Conditions of smoother gluing of the solutions $u^{\pm}$ can be investigated similarly.
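A discrete analog of the matching term (1.23) can be sketched as follows; the trial functions u_plus and u_minus are placeholders for the two network outputs, and the quadrature is a simple midpoint rule over the interface circle of D.

```python
import numpy as np

def matching_term(u_plus, u_minus, x0=0.4, y0=0.0, r=0.4, n=64):
    """Mean-square mismatch of the two trial solutions on the circle of D, times 2*pi."""
    phi = 2.0 * np.pi * np.arange(n) / n
    x = x0 + r * np.cos(phi)
    y = y0 + r * np.sin(phi)
    return np.mean((u_plus(x, y) - u_minus(x, y)) ** 2) * 2.0 * np.pi

# Example with dummy trial functions:
print(matching_term(lambda x, y: x * y, lambda x, y: x * y + 0.01))
```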

1.2.3 The Schrödinger equation with a piecewise potential (quantum dot)

The problem of constructing stable approximate mathematical models of nanoobjects (quantum dots, wires) is very relevant for both theory and practice. Numerous publications testify to this, among which we single out a monograph [17] and articles [18, 19] devoted to individual problems of this kind. An exact solution of the problem can be obtained in the case of high symmetry and an extremely simplified model; in the case when there is no


symmetry, or it is required to describe the behavior of the modeled object more accurately, an approximate solution must be constructed. The problem of constructing a model of such a nanostructure as a quantum dot belongs to the problems of the type indicated above. As a model, we consider the following boundary value problem: in the compound domain $\Omega = \Omega_1 \cup \Omega_2$, where $\Omega_1$ is a simply connected strictly interior subdomain of Ω with boundary $\partial\Omega_1 = \Gamma_{12}$ and $\Omega_2 = \Omega\setminus\Omega_1$ is a doubly connected subdomain of Ω with the complete boundary $\partial\Omega_2 = \Gamma_{12} \cup \Gamma$, it is required to find a solution of the stationary Schrödinger equation (Fig. 1.4)
$$\nabla\cdot(p\nabla u) + (q - \lambda)u = 0$$
in the case of piecewise constant coefficients $p|_{\Omega_j} = p_j$, $q|_{\Omega_j} = q_j$, $j = 1, 2$, with the matching condition $p_1\,\partial u_1/\partial n|_{\Gamma_{12}} = p_2\,\partial u_2/\partial n|_{\Gamma_{12}}$ at the interface segment $\Gamma_{12}$ (the BenDaniel-Duke interface condition) for a discontinuous coefficient p: $p_1 \neq p_2$, and with the Dirichlet boundary condition $u_2|_\Gamma = 0$ on the boundary segment Γ. The subdomain $\Omega_1$ corresponds to the quantum dot, and the subdomain $\Omega_2$ corresponds to the surrounding matrix. In the given model of a quantum dot, the coefficients $p_j$ are rational functions of the spectral parameter λ:
$$p_j = K_j^2\left(\frac{1}{\lambda + E_j - q_j} + \frac{2}{\lambda + E_j - q_j + \Delta_j}\right);$$
the coefficients $K_j$, $E_j$, $\Delta_j$ and the potentials $q_j$ are assumed to be known.

Fig. 1.4 The domain Ω (subdomains Ω₁ and Ω₂, interface Γ₁₂, outer boundary Γ).


The spectral parameter λ enters the equation nonlinearly, which complicates the solution. With a known value of the parameter, obtained from an experiment or calculated in simple problems with symmetry via the nonlinear characteristic equation, the assigned task is embedded in the general scheme. The problem of determining the admissible values of the spectral parameter is far from trivial [18, 19]. It is not considered here, being a subject of separate research.

In the case of a one-dimensional problem, the domain Ω is a system of two nested segments; the ordinary differential equation
$$u''(x) + \frac{q - \lambda}{p(\lambda)}\,u(x) = 0 \tag{1.24}$$
for each subdomain $\Omega_1 = [-d; d]$ and $\Omega_2 = [-m; -d] \cup [d; m]$ is solved explicitly. The matching conditions at the junction and the requirement that the solution decrease as $x \to \pm m$ lead to a transcendental relation for the values of the spectral parameter λ characterizing the bound states. When $q_1 = 0$ and the potential $q_2$ is finite, there is a finite number of such values. The eigenfunctions are also computed explicitly in a piecewise analytical form. We build the computing process by minimizing the error functional assembled as in (1.2) from the first term in the form of energy and the terms of coordination on the border of the subdomains

$$J_1 = \sum_{m_1=1}^{M_1}\Bigl(p_1\bigl(u_1'(x_{m_1})\bigr)^2 + (q_1 - \lambda)\bigl(u_1(x_{m_1})\bigr)^2\Bigr) + \sum_{m_2=1}^{M_2}\Bigl(p_2\bigl(u_2'(x_{m_2})\bigr)^2 + (q_2 - \lambda)\bigl(u_2(x_{m_2})\bigr)^2\Bigr), \tag{1.25}$$
$$J_2 = \bigl(u_1(-d) - u_2(-d)\bigr)^2 + \bigl(u_1(d) - u_2(d)\bigr)^2,$$
$$J_3 = \bigl(p_1 u_1'(-d) - p_2 u_2'(-d)\bigr)^2 + \bigl(p_1 u_1'(d) - p_2 u_2'(d)\bigr)^2.$$

Here, in accordance with the general approach we proposed earlier, $\{x_{m_j}\}_{1}^{M_j}$, $j = 1, 2$, is a set of test points on $\Omega_j$, which are regenerated after a certain number of steps of the learning process. The multidimensional case is considered similarly. As before, the solution of the problem, the wave function, is approximated piecewise in each of the subdomains $\Omega_j$ via minimization of the error functional (1.2), for which the summands are


$$J_1 = \sum_{m_1=1}^{M_1}\Bigl(p_1\|\nabla u_1(x_{m_1})\|^2 + (q_1 - \lambda)u_1^2(x_{m_1})\Bigr) + \sum_{m_2=1}^{M_2}\Bigl(p_2\|\nabla u_2(x_{m_2})\|^2 + (q_2 - \lambda)u_2^2(x_{m_2})\Bigr),$$
$$J_2 = \sum_{m_{12}=1}^{M_{12}}\bigl(u_1(x_{m_{12}}) - u_2(x_{m_{12}})\bigr)^2, \tag{1.26}$$
$$J_3 = \sum_{m'_{12}=1}^{M'_{12}}\Bigl(p_1\,n(x'_{m'_{12}})\cdot\nabla u_1(x'_{m'_{12}}) - p_2\,n(x'_{m'_{12}})\cdot\nabla u_2(x'_{m'_{12}})\Bigr)^2 + \sum_{m=1}^{M}u_2^2(x_m).$$

Here n is a unit normal vector of the boundary $\Gamma_{12}$. The sets of test points $\{x_{m_j}\}_{1}^{M_j}$, $j = 1, 2$, in the subdomains $\Omega_j$, as well as $\{x_{m_{12}}\}_{1}^{M_{12}}$ and $\{x'_{m'_{12}}\}_{1}^{M'_{12}}$ on the interface $\Gamma_{12}$ and on the boundary Γ, are changed after a certain number of steps in the process of minimizing the functional (1.2).
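For the one-dimensional terms (1.25), a minimal evaluation sketch is given below; u1 and u2 are placeholders for the trial functions on the quantum dot and the matrix, and the derivatives are taken by central differences purely for illustration.

```python
import numpy as np

def d_dx(u, x, h=1e-5):
    return (u(x + h) - u(x - h)) / (2.0 * h)

def dot_functional(u1, u2, lam, p1, p2, q1, q2, d, m, rng, n=100):
    """Terms J1, J2, J3 of (1.25) on regenerated test points in [-d, d] and [-m,-d] U [d, m]."""
    x1 = rng.uniform(-d, d, n)
    x2 = np.concatenate([rng.uniform(-m, -d, n // 2), rng.uniform(d, m, n // 2)])
    j1 = np.sum(p1 * d_dx(u1, x1) ** 2 + (q1 - lam) * u1(x1) ** 2) \
       + np.sum(p2 * d_dx(u2, x2) ** 2 + (q2 - lam) * u2(x2) ** 2)
    j2 = (u1(-d) - u2(-d)) ** 2 + (u1(d) - u2(d)) ** 2
    j3 = (p1 * d_dx(u1, -d) - p2 * d_dx(u2, -d)) ** 2 \
       + (p1 * d_dx(u1, d) - p2 * d_dx(u2, d)) ** 2
    return j1, j2, j3
```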

1.2.4 The nonlinear Schrödinger equation

As another model equation, we considered the nonstationary nonlinear Schrödinger equation of the form
$$i\Psi_t + \Delta\Psi + \nu|\Psi|^2\Psi = G(x, t) = g(x)\exp\bigl[i(k\cdot x - \omega t)\bigr], \quad k = (k_x, k_y) \in \mathbb{R}^2, \quad x = (x, y) \in \mathbb{R}^2, \quad k\cdot x = k_x x + k_y y.$$
This equation is widely used to describe nonlinear wave processes: light beams in waveguides, oscillations in plasma, effects in the theory of superconductivity, etc. If we look for its solution in the form of a plane wave $\Psi(t, x) = u(x)\exp[i(k\cdot x - \omega t)]$, we obtain a stationary Schrödinger equation with a cubic nonlinearity for the complex function $u = u(x) = u(x, y)$:
$$\Delta u - \bigl(|k|^2 - \omega\bigr)u + 2ik\cdot\nabla u + \nu|u|^2 u = g. \tag{1.27}$$
To this problem we applied the previously stated approach. Here the first summand of the functional (1.2) has the form
$$\sum_{j=1}^{M}\Bigl|\Delta u(x_j, y_j) - \bigl(|k|^2 - \omega\bigr)u(x_j, y_j) + 2ik\cdot\nabla u(x_j, y_j) + \nu\bigl|u(x_j, y_j)\bigr|^2 u(x_j, y_j) - g(x_j, y_j)\Bigr|^2.$$

Consider two types of conditions, i.e., two variants of the problem statement. First, we can look for a solution in a region Ω in the plane (as a model region we chose the circle $\Omega: x^2 + y^2 < 1$) and set a condition on the boundary of the region (the circle). If $g \neq 0$, we can


set this condition to be homogeneous. Here the summand $J_2$ of the functional (1.2) is formed in the same way as before. In the computational experiments we used functions g of two types of different smoothness with supports in a small domain $D: (x - x_0)^2 + (y - y_0)^2 < r^2$, $x_0 = 0.4$, $y_0 = 0$, located inside the initial circle Ω. In the first case, the function g is equal to a constant in the circle D, that is, it is a cylindrical step: $g = A = 10$ for $(x, y) \in D$, $g = 0$ for $(x, y) \in \Omega\setminus D$. In the second case, g is a smooth function with a smooth vertex: $g = 10\bigl\{1 - \bigl[\bigl((x - x_0)^2 + (y - y_0)^2\bigr)/r^2\bigr]^2\bigr\}$ for $(x, y) \in D$, $g = 0$ for $(x, y) \in \Omega\setminus D$.

Secondly, we can look for the solution of the equation in the entire plane; in this case the requirement of boundedness or of a qualified tendency to zero usually serves as the boundary condition at infinity. When minimizing the functional in this case, we took some of the test points as uniformly distributed in a neighborhood of the singularity and the rest as normally distributed over the entire plane. Similarly, we constructed an approximation of the solution u on the entire plane in the case of a smooth right-hand side g in the form of a Gaussian packet $g(x, y) = A\exp\{-(x - x_0)^2 - (y - y_0)^2\}$.
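The mixed test-point distribution mentioned above can be sketched as follows; all numbers and names are illustrative assumptions, with a share of the points drawn uniformly from a box around the support of g and the rest normally distributed over the whole plane.

```python
import numpy as np

def sample_plane(m, frac_local=0.5, center=(0.4, 0.0), half_width=0.5, sigma=2.0, seed=0):
    """Part of the points near the singularity (uniform box), the rest normal on the plane."""
    rng = np.random.default_rng(seed)
    m_local = int(frac_local * m)
    local = np.array(center) + rng.uniform(-half_width, half_width, size=(m_local, 2))
    tail = sigma * rng.standard_normal((m - m_local, 2))
    return np.vstack([local, tail])

points = sample_plane(200)
```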

1.2.5 Heat transfer in the vessel-tissue system

Construction of neural network mathematical models of multicomponent systems in the case of composite domains with complex geometry is of undoubted interest. The solution of such problems, characterized by multi-scale processes in the components, by traditional methods is substantially complicated when the type of the equation changes in the transition from one component of the domain to another under the condition of sufficiently smooth matching of the corresponding solutions. As a model, we consider the following plane task of heat transfer in the vessel-tissue system: venous and arterial vessels are surrounded by muscle tissue, in which heat is released. We assume that the heat transfer in the vessels is mainly due to convection, and in the tissues due to conduction [20, 21]. We will seek the temperature field in a certain neighborhood of the vessels, see Fig. 1.5. We denote by $T_v$ the temperature field in the venous vessel and by $\bar{T}_v$ the temperature field in the adjacent tissue, and by $T_a$ and $\bar{T}_a$ the respective temperature fields for the artery and the adjacent tissue. The blood flow velocities in the vein, $u_v$, and in the artery, $u_a$, taking into account the no-slip condition u = 0 on the walls of the blood vessels, have the form

Chapter 1 EXAMPLES OF PROBLEM STATEMENTS AND FUNCTIONALS

z za

Γ3

Γ5 Γ4 Γ12

Γ2

Tv

Γ1

Tv

xv

Γ34

Γ23

Ta

Ta

Γ8 xva xa

Γ6

Γ7

xa

X

Fig. 1.5 The domain of modeling the temperature field (plane task).

$$u_v(x, z) = 4\tilde{u}_v(x, z)\,\frac{(x - x_v)(x_{va} - x)}{(x_{va} - x_v)^2}, \qquad u_a(x, z) = 4\tilde{u}_a(x, z)\,\frac{(x - x_{va})(x_a - x)}{(x_{va} - x_v)^2},$$
where $\tilde{u}_v$, $\tilde{u}_a$ are functions weakly dependent on the variable x, with small random additive perturbations (in the model task we consider them constant), and $x_v$, $x_{va}$ and $x_{va}$, $x_a$ are the coordinates of the vein and artery boundaries, respectively. Let us denote by q the heat release density in the muscle tissue, by c its heat capacity, by b the coefficient of thermal diffusivity, by ρ the density, and by β(x, z) a small random variable with an estimate determined experimentally. We obtain the following boundary value problem related to a change in the type of the equations and of the boundary conditions:

the temperatures $\bar{T}_v$ and $\bar{T}_a$ in the tissues satisfy the Poisson equation (elliptic type)
$$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial z^2} = -\frac{q}{c\rho b};$$
the temperatures $T_v$ and $T_a$ in the vessels satisfy the heat transfer equation (parabolic type)
$$b\,\frac{\partial^2 T}{\partial x^2} + \beta\,\frac{\partial T}{\partial x} - u\,\frac{\partial T}{\partial z} = 0,$$
where in the vein $T = T_v$ and $u = u_v$, and in the artery $T = T_a$ and $u = u_a$;

the boundary conditions have the form: on the parts of the border $\Gamma_1$, $\Gamma_7$, and $\Gamma_8$, the Dirichlet condition $T = T_0$; on the parts $\Gamma_3$, $\Gamma_4$, and $\Gamma_5$, the Dirichlet condition $T = T_1$; on the parts $\Gamma_2$ and $\Gamma_6$, the Neumann condition $\dfrac{\partial T}{\partial x} = 0$;

the matching conditions on the subdomain junctions have the form: for the interface $\Gamma_{12}$, $\bar{T}_v = T_v$, $\dfrac{\partial \bar{T}_v}{\partial x} = \dfrac{\partial T_v}{\partial x}$; for the interface $\Gamma_{23}$, $T_v = T_a$, $\dfrac{\partial T_v}{\partial x} = \dfrac{\partial T_a}{\partial x}$;


and for the interface $\Gamma_{34}$, $T_a = \bar{T}_a$, $\dfrac{\partial T_a}{\partial x} = \dfrac{\partial \bar{T}_a}{\partial x}$.

We pass from the formulation of the task presented above to the functional (1.2). In this case, the term $J_1 = J_{11} + J_{12}$ is responsible for satisfying the differential equations, the term $J_2 = J_{21} + J_{22} + J_{23}$ is responsible for satisfying the boundary conditions, and the term $J_3 = J_{31} + J_{32} + J_{33}$ is responsible for satisfying the matching conditions on the subdomains. In the case under consideration,

$$J_{11} = \sum_{j=1}^{M}\left(\frac{\partial^2 T(\xi_j)}{\partial x^2} + \frac{\partial^2 T(\xi_j)}{\partial z^2} + \frac{q(\xi_j)}{c\rho b}\right)^2, \qquad J_{12} = \sum_{j=1}^{M'}\left(b\,\frac{\partial^2 T(\tilde\xi_j)}{\partial x^2} + \beta\,\frac{\partial T(\tilde\xi_j)}{\partial x} - u(\tilde\xi_j)\,\frac{\partial T(\tilde\xi_j)}{\partial z}\right)^2.$$
Here $\xi_j$ are test points in the tissues and $\tilde\xi_j$ are test points in the vessels.
$$J_{21} = \sum_{k=1}^{K_1}\bigl(T(\xi'_k) - T_0\bigr)^2, \qquad J_{22} = \sum_{k=1}^{K_2}\bigl(T(\xi''_k) - T_1\bigr)^2, \qquad J_{23} = \sum_{k=1}^{K_3}\left(\frac{\partial T(\xi'''_k)}{\partial x}\right)^2.$$

Here $\xi'_j$ are test points on the parts $\Gamma_1$, $\Gamma_7$, and $\Gamma_8$; $\xi''_j$ on $\Gamma_3$, $\Gamma_4$, and $\Gamma_5$; $\xi'''_j$ on $\Gamma_2$ and $\Gamma_6$.
$$J_{31} = \sum_{k=1}^{N_1}\left[\bigl(\bar T_v(\tilde\xi'_k) - T_v(\tilde\xi'_k)\bigr)^2 + \alpha\left(\frac{\partial \bar T_v(\tilde\xi'_k)}{\partial x} - \frac{\partial T_v(\tilde\xi'_k)}{\partial x}\right)^2\right],$$
$$J_{32} = \sum_{k=1}^{N_2}\left[\bigl(T_v(\tilde\xi''_k) - T_a(\tilde\xi''_k)\bigr)^2 + \alpha\left(\frac{\partial T_v(\tilde\xi''_k)}{\partial x} - \frac{\partial T_a(\tilde\xi''_k)}{\partial x}\right)^2\right],$$
$$J_{33} = \sum_{k=1}^{N_3}\left[\bigl(T_a(\tilde\xi'''_k) - \bar T_a(\tilde\xi'''_k)\bigr)^2 + \alpha\left(\frac{\partial T_a(\tilde\xi'''_k)}{\partial x} - \frac{\partial \bar T_a(\tilde\xi'''_k)}{\partial x}\right)^2\right].$$
Here $\tilde\xi'_j$ are test points on the interface $\Gamma_{12}$, $\tilde\xi''_j$ on the part $\Gamma_{23}$, and $\tilde\xi'''_j$ on the part $\Gamma_{34}$.

It should be noted that the coefficients of thermal diffusivity of tissue and blood are approximately equal. In the case of capillaries, whose walls consist of a single layer of endothelial cells, the


thickness of this layer is so small that heat transfer is carried out freely through the walls of the vessels, and this allows the conditions specified above to be posed on the boundaries of the vessels between themselves and on the boundaries of the vessels with the tissues: the equality of the fluxes leads to the equality of the derivatives. In the case of vessels of another type, thick-walled or slagged, the coupling conditions at the interfaces can be more complicated. The plane task in the case of a system of several dissimilar vessels surrounded by muscle tissue is formulated similarly.

We also considered variants of the formulation of this task which include perturbations bringing it closer to real conditions. In the case of a plane task, perturbations of two types were considered: vessels with curved walls and vessels with parietal plaques. Even with these complications, our approach allows building sufficiently accurate solutions of the perturbed tasks; see Chapter 4 for details. Note that in all these cases the error functional is written similarly.

It is of interest to extend the considered statement of the problem to the case of three variables: we need to find the temperature field T(x, y, z) in the domain $\Pi = (0; x_a) \times (0; y_a) \times (0; z_a)$ that satisfies, in its subdomains the vein $\Pi_v = (x_v; x_{va}) \times (y_v; y_a) \times (0; z_a) \subset \Pi$ and the artery $\Pi_a = (x_{va}; x_a) \times (y_v; y_a) \times (0; z_a) \subset \Pi$, the equation of parabolic type with the variable coefficient λ, which describes the adhesion condition λ = 0 on the vessel wall:
$$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} - \lambda\,\frac{\partial T}{\partial z} = 0,$$
in the domain $\Pi_v$:
$$T = T_v, \qquad \lambda = \lambda_v(x, y) = 16\lambda_1\,\frac{(x - x_v)(x_{va} - x)(y - y_v)(y_a - y)}{(x_{va} - x_v)^2 (y_a - y_v)^2},$$
in the domain $\Pi_a$:
$$T = T_a, \qquad \lambda = \lambda_a(x, y) = 16\lambda_2\,\frac{(x - x_{va})(x_a - x)(y - y_v)(y_a - y)}{(x_{va} - x_v)^2 (y_a - y_v)^2}.$$
In the tissues that surround the vessels, in the subdomain $\Pi_t = \Pi\setminus(\Pi_v \cup \Pi_a) \subset \Pi$, the function $T = T_t$ satisfies the equation of elliptic type


Fig. 1.6 The domain of the temperature field (the spatial problem).

$$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} = Q.$$
On the mutual pieces of the interface of the components, the function T is continuous, and its normal derivative is continuous as well; on the side part of the boundary the function $T = T_t$ satisfies the Dirichlet condition $T = T_0$; on the top part of the surface, the function satisfies the Dirichlet condition $T = T_1$, except for the piece $\Gamma_1 = (x_{va}; x_a) \times (y_v; y_a) \times \{z_a\}$. The three-dimensional formulation of the problem is illustrated in Fig. 1.6. The functional (1.2) is formed analogously to the two-dimensional problem in this case.

1.3 Problems for partial differential equations in the case of the domain with variable borders

Problems in which some interfaces or parts of the boundary of the domain are unknown beforehand and are determined or selected during the solution cause much more difficulty for classical approaches than the problems with fixed boundaries considered above. Problems with an unknown border can be divided into two big categories. To the first one belongs the problem of modeling a multicomponent system, in which the interface is unknown and is determined during the solution. A typical problem of this kind is the Stefan problem, in which the unknown partition


boundary is the phase transition front. We consider the simplest variant of such a problem in Section 1.3.1. To the second category we can relate the problems in which control of the domain's border is implemented, when, for example, we need to pick out from a family of boundary value problems the one whose solution delivers an extremum to some given functional. In this case, during the building of the model we have to use two functionals: with the help of the first one we construct the solution of the boundary value problem, and with the second one we define the optimal border of the domain. The problem considered in Section 1.3.2, constructing a mathematical model of a standard gauge system, the calibrator of alternating pressure, is one of such problems. Earlier [22] we studied it with the help of the traditional methods of mathematical physics. In the works [23, 24], the neural network approach was applied to the solution of this problem. It appears to be adequate due to several obvious advantages: noise stability (the result changes little when the input data change a little: boundary conditions, medium properties, temporal instabilities); there is no need to set up the network again if a series of problems (including nonlinearly perturbed ones) needs to be solved; we can use an already trained network for input data that are close enough and, if necessary, train the network up to the level of the required accuracy.

1.3.1 Stefan problem

Problem formulation

As the model problem, let us consider a one-dimensional (in the spatial variable) nonlinear problem of the theory of heat conduction related to a phase transition, the Stefan problem, whose solution is known and can be used to check the proposed approach. Let us describe the two-phase system in the following way: in the rectangle $\Pi = (0; T) \times (0; 1) = \Pi^+ \cup \Pi^-$, where $\Pi^+ = \{(t, x) \in \Pi \mid 0 < t < T,\; 0 < x < \xi(t)\}$ and $\Pi^- = \{(t, x) \in \Pi \mid 0 < t < T,\; \xi(t) < x < 1\}$, we need to find solutions of the heat conduction equations for each phase
$$\frac{\partial u^{\pm}}{\partial t} = a_{\pm}^2\,\frac{\partial^2 u^{\pm}}{\partial x^2}, \qquad (t, x) \in \Pi^{\pm}.$$


Here $a_{\pm}^2$ are the coefficients of thermal diffusivity of the respective phases and $u^{\pm}(t, x)$ are the temperatures of these phases, which satisfy the initial condition $u(0, x) = u_0(x) \le 0$, the boundary conditions $u^+(t, 0) = \varphi(t) \ge 0$, $u^-(t, 1) = \psi(t) \le 0$, and the condition on the free surface, the phase transition front γ. This interface is given by an unknown smooth function $x = \xi(t)$, $t \ge 0$, which needs to be determined during the solution process according to the conditions
$$u^+\big|_{x=\xi-0} = u^-\big|_{x=\xi+0} = 0, \qquad k_+\frac{\partial u^+}{\partial x}\bigg|_{x=\xi-0} - k_-\frac{\partial u^-}{\partial x}\bigg|_{x=\xi+0} = q\,\frac{d\xi}{dt},$$
where $k_{\pm}$ are the coefficients of heat conductivity and q is the heat of the phase transition. To calculate $\dfrac{d\xi}{dt}$, we can use the expression
$$\frac{d\xi}{dt} = -\frac{\dfrac{\partial u}{\partial t}(t, \xi)}{\dfrac{\partial u}{\partial x}(t, \xi)}.$$
Notice that the last expression, obtained by differentiating the isotherm condition $u(t, \xi(t)) = 0$ with respect to the time t, is written for the limit points of the components on the front, so it can be viewed as
$$\frac{d\xi}{dt} = -\frac{\dfrac{\partial u^{\pm}}{\partial t}(t, \xi)}{\dfrac{\partial u^{\pm}}{\partial x}(t, \xi)}.$$
Due to the equality of these ratios for the speed of the front's movement, it is fair to write
$$\frac{d\xi}{dt} = -\frac{\dfrac{\partial u^+}{\partial t}(t, \xi) + \dfrac{\partial u^-}{\partial t}(t, \xi)}{\dfrac{\partial u^+}{\partial x}(t, \xi) + \dfrac{\partial u^-}{\partial x}(t, \xi)}.$$
Uncomplicated modifications of the approaches listed below allow us to consider the cases when the functions $u_0(x)$, φ(t), and ψ(t) change sign and the front splits into several components. The multidimensional case does not require a change of the approach either and does not lead to a catastrophic increase of the computation time. We have implemented the following approaches to the Stefan problem, which are natural from the point of view of neural network methodology: (1) approximation of the temperature field for both phases with one function; (2) building a system of functions which includes, along with the functions that describe the temperature fields of each phase, the function ξ(t) which defines the front γ.


The first approach is the simplest one to implement and differs little from its analogs for other problems of mathematical physics. The second approach is not much more complicated, but it better matches the peculiarities of the problem and allows achieving the required accuracy using simpler functions; it also allows a natural parallelization of the problem, in which the solution for each of the phases and the phase interface is constructed in its own computational flow.

In the first approach, the solution of the problem is reduced to the minimization of the error functional (1.2). In this case, the component $J_1 = J_{11} + J_{12}$ is responsible for the fulfillment of the equations, the component $J_2 = J_{21} + J_{22} + J_{23}$ accounts for the initial and boundary conditions, and the component $J_3$ is responsible for the condition on the phase transition line:
$$J_{11} = \sum_{j=1}^{\tilde{M}_+}\left(\frac{\partial u(t_j^+, x_j^+)}{\partial t} - a_+^2\,\frac{\partial^2 u(t_j^+, x_j^+)}{\partial x^2}\right)^2, \qquad J_{12} = \sum_{j=1}^{\tilde{M}_-}\left(\frac{\partial u(t_j^-, x_j^-)}{\partial t} - a_-^2\,\frac{\partial^2 u(t_j^-, x_j^-)}{\partial x^2}\right)^2,$$
where $\{(t_j^{\pm}, x_j^{\pm})\}_{j=1}^{\tilde{M}_{\pm}}$ are test points inside the regions $\Pi^{\pm}$;

$$J_{21} = \sum_{j_0=1}^{M_0}\bigl(u(0, x_{j_0}) - u_0(x_{j_0})\bigr)^2, \qquad J_{22} = \sum_{j_+=1}^{M_+}\bigl(u(t_{j_+}, 0) - \varphi(t_{j_+})\bigr)^2, \qquad J_{23} = \sum_{j_-=1}^{M_-}\bigl(u(t_{j_-}, 1) - \psi(t_{j_-})\bigr)^2,$$
where $\{(0, x_{j_0})\}_{j_0=1}^{M_0}$, $\{(t_{j_+}, 0)\}_{j_+=1}^{M_+}$, $\{(t_{j_-}, 1)\}_{j_-=1}^{M_-}$ are test points on the corresponding parts of the boundary of the region;
$$J_3 = \sum_{j_b=1}^{M_b}\left(k_+\frac{\partial u\bigl(t_{j_b}, \xi(t_{j_b}) - 0\bigr)}{\partial x} - k_-\frac{\partial u\bigl(t_{j_b}, \xi(t_{j_b}) + 0\bigr)}{\partial x} - q\,\frac{d\xi(t_{j_b})}{dt}\right)^2.$$

This sum is calculated in a neighborhood of the free boundary, which is defined as the line on which u(t, x) = 0. Points with the minimum absolute value of u are used to estimate the term $J_3$, and the derivative $\dfrac{d\xi}{dt}$ is estimated by linear regression on these points in a sufficiently small


subdomain. We can also estimate the derivative by the above formula, i.e., we replace $\dfrac{d\xi}{dt}$ by the expression
$$-\frac{\dfrac{\partial u^+}{\partial t}\bigl(t_{j_b}, \xi(t_{j_b})\bigr) + \dfrac{\partial u^-}{\partial t}\bigl(t_{j_b}, \xi(t_{j_b})\bigr)}{\dfrac{\partial u^+}{\partial x}\bigl(t_{j_b}, \xi(t_{j_b})\bigr) + \dfrac{\partial u^-}{\partial x}\bigl(t_{j_b}, \xi(t_{j_b})\bigr)}.$$
For an unbounded domain, the algorithm described above requires a simple modification. We cannot take test points distributed uniformly to estimate the sums: in a neighborhood of infinity, the density of the distribution of these points should tend to zero. The normal distribution appears to be the most appropriate for the modification under consideration, although other options are also possible. In particular, it is possible to choose the distribution depending on the rate of change of temperature, i.e., the density of the point distribution should be maximal in a region where the temperature gradient is maximal.

The second approach differs from the first one in that we use different functions to define the solution components $u^+$ and $u^-$. Also, it is necessary to add the matching condition $u^+|_{x=\xi-0} = u^-|_{x=\xi+0}$ to the error functional. We can search for the solutions in both subdomains simultaneously, or we can use different functionals to find the solution in each subdomain and to define the interface. For finding the solution $u^+$, we used the functional (1.2) with the following components:
$$J_1 = \sum_{j=1}^{\tilde{M}_+}\left(\frac{\partial u^+(t_j, x_j)}{\partial t} - a_+^2\,\frac{\partial^2 u^+(t_j, x_j)}{\partial x^2}\right)^2,$$
$$J_2 = \sum_{j_+=1}^{M_+}\bigl(u^+(t_{j_+}, 0) - \varphi(t_{j_+})\bigr)^2 + \sum_{j_b=1}^{M_b}\bigl(u^+(t_{j_b}, \xi(t_{j_b}))\bigr)^2;$$

for finding the solution $u^-$, we used the functional (1.2) with the following components:
$$J_1 = \sum_{j=1}^{\tilde{M}_-}\left(\frac{\partial u^-(t_j, x_j)}{\partial t} - a_-^2\,\frac{\partial^2 u^-(t_j, x_j)}{\partial x^2}\right)^2,$$
$$J_2 = \sum_{j_0=1}^{M_0}\bigl(u^-(0, x_{j_0}) - u_0(x_{j_0})\bigr)^2 + \sum_{j_-=1}^{M_-}\bigl(u^-(t_{j_-}, 1) - \psi(t_{j_-})\bigr)^2 + \sum_{j_b=1}^{M_b}\bigl(u^-(t_{j_b}, \xi(t_{j_b}))\bigr)^2;$$


the interphase boundary ξ(t) was sought by minimizing the following functional:
$$J_3 = \sum_{j_b=1}^{M_b}\bigl(u^+(t_{j_b}, \xi(t_{j_b})) - u^-(t_{j_b}, \xi(t_{j_b}))\bigr)^2 + \sum_{j_b=1}^{M_b}\left(k_+\frac{\partial u^+\bigl(t_{j_b}, \xi(t_{j_b})\bigr)}{\partial x} - k_-\frac{\partial u^-\bigl(t_{j_b}, \xi(t_{j_b})\bigr)}{\partial x} - q\,\frac{d\xi(t_{j_b})}{dt}\right)^2.$$

In the process of optimization, we alternated several steps of minimizing the first two functionals at a fixed phase boundary γ with several steps of minimizing the third functional at fixed solutions $u^+$ and $u^-$. In this case, simpler functions can be used, but the computational process is less stable.
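The front-tracking step of the first approach, locating ξ(t) as the point where the trial temperature is closest to the isotherm u = 0 and estimating dξ/dt by a linear fit over a few nearby times, can be sketched as follows; u here is a placeholder for the trained trial solution, and the demonstration function is chosen only so that the example runs.

```python
import numpy as np

def front_position(u, t, x_grid):
    """Point of the grid closest to the isotherm u(t, x) = 0."""
    values = u(t, x_grid)
    return x_grid[np.argmin(np.abs(values))]

def front_speed(u, t, x_grid, dt=1e-2, n=5):
    """Estimate d(xi)/dt by linear regression of xi over a few nearby times."""
    ts = t + dt * (np.arange(n) - n // 2)
    xis = np.array([front_position(u, s, x_grid) for s in ts])
    slope, _ = np.polyfit(ts, xis, 1)
    return slope

# Example with a trial field whose zero level moves like sqrt(t):
u_demo = lambda t, x: x - np.sqrt(t + 1e-9)
xg = np.linspace(0.0, 1.0, 401)
print(front_position(u_demo, 0.25, xg), front_speed(u_demo, 0.25, xg))
```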

1.3.2 The problem of the alternating pressure calibrator

Problem statement

We consider a standard verification gauge, the calibrator of alternating pressure. The measuring working chamber of the gauge is symmetric about both the rotation axis and the plane perpendicular to the rotation axis. The chamber is filled with a viscous liquid, for example, transformer oil. A constant high pressure is created in the chamber to suppress cavitation processes. On the cylindrical part of the cavity boundary, there is a piezoelectric source of harmonic oscillations, which imposes a variable pressure on the existing constant pressure. We assume that the acoustic wave field in the measuring chamber is time-harmonic, axisymmetric, and even with respect to the plane of symmetry. On the axis of symmetry, there are two pressure sensors, a standard one and a certifiable one. It is necessary to choose the shape of the part of the border containing the sensor so that the pressure on the sensor is maximal. The sensors are assumed to be quite small: we assume that the sensors are not point-like, they have a finite size (although significantly smaller than the characteristic size of the working cavity of the calibrator), and the field on the sensor is practically unchanged. We will use the following notation: V is the oscillation velocity in the medium, p is the pressure, η is the density, ν is the kinematic viscosity, c is the velocity of sound in the medium, x = (x, y, z) is the position vector, and (ρ, φ, z) are cylindrical point coordinates.


A linear approximation of the acoustic equations leads to the equation for the alternating pressure p = p(x, t)
$$\frac{\partial^2 p}{\partial t^2} = c^2\Delta p + \nu\,\frac{\partial}{\partial t}\Delta p.$$
We will look for the solution of this equation in a bounded domain $\Omega \subset \mathbb{R}^3$, the working chamber of the calibration device. We choose special vibrational conditions on the boundary of the domain Γ = ∂Ω (boundary conditions): the impermeability of the walls of the working chamber, the piezoelectric generator on the cylindrical part of the chamber boundary; we take into account possible heterogeneities (sensors, measuring windows). The Fourier expansion leads to the study of solutions of the form $p(x, t) = p(x, \omega)\exp(i\omega t)$. For the Fourier component $u = u(x, y, z) = p(x, \omega)$ we have the Neumann boundary value problem for the Helmholtz equation
$$\bigl(\Delta + k^2\bigr)u = 0, \qquad k^2 = \omega^2/\bigl(c^2 + i\nu\omega\bigr), \qquad \frac{\partial u}{\partial n}\bigg|_{\Gamma} = f. \tag{1.28}$$
Here u is the pressure in Ω, Δ is the Laplace operator, ω is the cyclic frequency, $\partial\Omega = \Gamma = \Gamma_0 \cup \tilde\Gamma$ is the boundary of the domain Ω, $f|_{\Gamma_0} = f_0$, $f|_{\tilde\Gamma} = 0$, where $\tilde\Gamma = \tilde\Gamma^+ \cup \tilde\Gamma^-$ is the part of the boundary that needs to be optimized, taking into account the functional I[u] describing the wave field at the sensor location. Known results on the solvability of the task (1.28) are contained, for example, in the book [25].

With fixed data k, $f_0$, it is possible to solve the task (1.28) in domains Ω differing in the choice of the component $\tilde\Gamma$ of the border Γ. The set of such domains {Ω} will be parameterized below by a functional parameter ς. The symmetry of the problem leads to an axisymmetric solution even in the variable z, $u(x, y, z) = u(\rho, z) = u(\rho, -z)$, in a flat closed domain $\hat\Omega^+ \subset \mathbb{R}^2$, the part of the axial section of Ω. Further, for this part of the section, the previous notation Ω will be used. In the new variables ρ, z, where $\rho = \sqrt{x^2 + y^2}$, this domain is defined as $\Omega: -\varsigma(\rho) \le z \le \varsigma(\rho)$, $0 \le \rho \le a$; the components of the boundary Γ are given in the form $\Gamma_0: \rho = a$, $-H \le z \le H$; $\tilde\Gamma^{\pm}: z = \pm\varsigma(\rho)$, $0 \le \rho \le a$, $\varsigma(a) = H$, $\varsigma(0) = h > 0$, $\dot\varsigma(0) = 0$. The specified function $z = \varsigma(\rho) > 0$ sets the parameterization of the boundary part $\tilde\Gamma$ of


We obtain the corresponding Neumann boundary conditions in the following form:

u'_\rho(a, z) = f_0(z), \quad -H \le z \le H; \qquad u'_\rho(0, z) = 0, \quad -h \le z \le h;
u'_z(\rho, \varsigma(\rho)) - \dot{\varsigma}(\rho)\, u'_\rho(\rho, \varsigma(\rho)) = 0, \qquad u'_z(\rho, 0) = 0, \quad 0 \le \rho \le a.

The function f₀(z) is also assumed to be even in the z-coordinate and independent of the angle φ ∈ [0; 2π). Here we have introduced the notation ς̇(ρ) = dς(ρ)/dρ.
The original method of optimization of the boundary Γ and calculation of the alternating pressure u based on boundary integral equations (with allowance for the symmetry of the problem) was presented in the publication [22]. The disadvantages of this approach include a dependence on the geometry (a significant complication of the calculations in the absence of symmetry, and rather strong requirements for the smoothness of the boundary of the domain); a dependence on the equation (the essential use of the linearity of the problem and of constant equation coefficients); and the impossibility of solving a series of problems, for example, perturbed ones, since the problem must be solved anew when the input data change.
The functional by which the function u defining the pressure field is sought is chosen in the form (1.2), where the term J1 expresses the satisfaction of the equation,

J_1 = \sum_{j=1}^{M} \left[ \Delta u(\rho_j, z_j) + k^2 u(\rho_j, z_j) \right]^2,

and the term J2 estimates the satisfaction of the boundary conditions,

J_2 = \sum_{\tilde{j}=1}^{\tilde{M}} \left[ u'_z - \dot{\varsigma}\, u'_\rho \right]^2 (\tilde{\rho}_{\tilde{j}}, \tilde{z}_{\tilde{j}}) + \sum_{j_0=1}^{M_0} \left[ u'_\rho - f_0 \right]^2 (\rho_{j_0}, z_{j_0}).

Here three sets of test points are used: {(ρ_j, z_j)}_{j=1}^{M} within the domain Ω, {(ρ̃_{j̃}, z̃_{j̃})}_{j̃=1}^{M̃} on the boundary part Γ̃, and {(ρ_{j_0}, z_{j_0})}_{j_0=1}^{M_0} on the boundary part Γ₀.


Among the various ways of describing the optimization conditions for the simulation of the pressure sensor G ("a large field on a small flat sensor"), the following was chosen: we consider curves z = ς(ρ) that are flat in a small neighborhood of the sensor and orthogonal to the axis of symmetry z. The functional I describing the requirements imposed on the pressure sensor has the form

I = \delta_1 \cdot |u(0, h)|^2 + \delta_2 \cdot (\varsigma(a) - H)^2 + \delta_3 \cdot \sum_{i=1}^{m_G} (\varsigma(\rho_i) - h)^2,

where δ_i > 0, i = 1, 2, 3, are penalty coefficients; the additional requirements on the function ς(ρ) are described by the third term of the functional I through a set of points {(ρ_i, ς(ρ_i))}_{i=1}^{m_G} at the sensor location G. We obtain the variational problem I → max with the constraint (coupling condition) in the form of the boundary-value elliptic Neumann problem (1.28). We will be interested in non-zero values of the extrema.

1.4 Inverse and other ill-posed problems

The study of real objects very often leads to the construction of mathematical models in the form of problems for differential (or other) equations in formulations far from the classical ones. Take, for example, the task of searching for minerals using geophysical methods. Mathematically, it is usually posed as an inverse problem of mathematical physics, and one or another regularization method is used to solve it. In doing so, the complex nature of the earth's interior, described in this approach by a simple equation, is not taken into account. It seems advisable to consider this equation as a first approximation and to refine its structure in the process of investigation, together with the geometric and other properties of the layers lying in the depths of the earth. Another known attempt to construct a complex, practically interesting model is connected with weather forecasting. It is very likely that the insufficient accuracy of the forecasts is determined not only by the instability of the corresponding system but also by the adoption of many a priori assumptions that are not refined during further calculations and observations. Similarly, we can criticize the model of the "vessels-tissues" system considered in Section 1.2. Obviously, the chosen model is oversimplified both for the vessels and for the tissues and can be used only as a first approximation, to be elaborated in further calculations and observations.


The same applies to practically any model of a real technical or biological object. According to the proposed method, a model can be elaborated, yielding new scientific and practical results; furthermore, this elaboration can be automated without any principal obstacles. Many applied real-world tasks make it necessary to build an approximate solution of a differential equation (or system of equations) in some class of functions, selecting this solution not by initial-boundary conditions, as is traditionally done in the classical statements of mathematical physics problems, but, for example, by a set of experimental data. Note that in such an unconventional statement the problems become ill-posed and, generally speaking, may not have a solution. The proposed approach is an approximate analytical method for studying mathematical models; in particular, it makes it possible to construct approximate solutions at the initial stage of modeling and in such unconventional situations. As a rule, in real problems the equation coefficients and parameters are set inaccurately: for example, a parameter changes in a certain interval, the center of which characterizes its average value. Our approach allows constructing an approximate solution of such a problem using a single function.

1.4.1 The inverse problem of migration flow modeling

There are two models (see the book [25]):

\begin{cases} \dfrac{dx}{dt} = \sinh(kx + k_1 y) - x\,\cosh(kx + k_1 y),\\[4pt] \dfrac{dy}{dt} = \sinh(k_2 x + ky) - y\,\cosh(k_2 x + ky); \end{cases} \qquad (1.29)

\begin{cases} \dfrac{dx}{dt} = (k - 1)x + k_1 y,\\[4pt] \dfrac{dy}{dt} = k_2 x + (k - 1)y. \end{cases} \qquad (1.30)

Here Eq. (1.29) is the initial nonlinear model, and system (1.30) is its linearization near the zero equilibrium point. Based on these models (see [25]), we can calculate a prediction of migration dynamics. For this, it is necessary to determine the coefficients of the models from the available data. Let us consider three methods to identify the coefficients of these models:


The first method involves solving the system (1.30) analytically and then determining the coefficients according to the best fit of the solution to the data. This approach has two drawbacks: the impossibility of extending it to the case of the nonlinear system (1.29) and the need to solve the system anew when extending it to higher-order models.
The second method is a discretization of the system (1.30) and determination of its coefficients by the formulas of two-dimensional linear regression. This approach can be extended to higher-order systems, but the case of the nonlinear system (1.29) requires the construction of a nonlinear regression, which also reduces its universality. We consider this approach in Chapter 4.
We applied the third method. It involves the use of a neural network. For this, we numerically solved the system (1.29) or (1.30) for a sufficiently large set of parameters {k(i), k_1(i), k_2(i)}_{i=1}^{N} with a fixed initial condition y(x_0) = y_0. After this, the models of the dependence of these parameters on the values (x(T), y(T), x(2T), y(2T)) were built using minimization of the error functionals

\sum_{i=1}^{M} \left( k(x_i(T), y_i(T), x_i(2T), y_i(2T)) - k(i) \right)^2,

\sum_{i=1}^{M} \left( k_1(x_i(T), y_i(T), x_i(2T), y_i(2T)) - k_1(i) \right)^2,

\sum_{i=1}^{M} \left( k_2(x_i(T), y_i(T), x_i(2T), y_i(2T)) - k_2(i) \right)^2.

After that, the calculation of the necessary coefficients of the model from observations reduces simply to substituting the data (x(T), y(T), x(2T), y(2T)) into the created model. This approach has certain advantages, for example, greater versatility: its application in the linear and nonlinear cases is practically the same.
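A minimal sketch of this third, neural-network-based identification method, assuming Python with NumPy and scikit-learn; the parameter ranges, the fixed initial state (0.1, 0.1), the Euler integration, and the network size are illustrative choices rather than the settings used by the authors.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def simulate(k, k1, k2, x0=0.1, y0=0.1, T=1.0, steps=200):
    """Integrate the nonlinear system (1.29) with the explicit Euler method
    and return the state at times T and 2T."""
    dt = 2 * T / steps
    x, y = x0, y0
    out = []
    for n in range(1, steps + 1):
        dx = np.sinh(k * x + k1 * y) - x * np.cosh(k * x + k1 * y)
        dy = np.sinh(k2 * x + k * y) - y * np.cosh(k2 * x + k * y)
        x, y = x + dt * dx, y + dt * dy
        if n in (steps // 2, steps):          # t = T and t = 2T
            out.extend([x, y])
    return out                                 # [x(T), y(T), x(2T), y(2T)]

# Build the training set {(x(T), y(T), x(2T), y(2T)) -> (k, k1, k2)}.
params = rng.uniform(0.0, 0.5, size=(2000, 3))
features = np.array([simulate(*p) for p in params])

# Any regressor can play the role of the "third method"; here a small MLP.
model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
model.fit(features, params)

# Identification now reduces to substituting observed data into the model.
observed = np.array([simulate(0.2, 0.1, 0.3)])
print(model.predict(observed))                 # estimate of (k, k1, k2)
```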

1.4.2 The problem of the recovery of solutions on the measurements for the Laplace equation

As an example of a non-classical statement, the problem of finding a function that is a solution of a given equation was studied. For this function, the equation itself is known only in a part of the domain; in addition, the function values in some set of points are given (for example, as a result of measurements). Thus, we will search in the domain Ω = Ω₁ ∪ Ω₂ for a function u(x), x ∈ Rⁿ, that satisfies the conditions A(u) = 0, x ∈ Ω₂, where A is a known differential operator; u(x_j) = z_j, x_j ∈ Ω₁, j = 1, …, m₁, x_j ∈ Ω₂, j = m₁ + 1, …, m₁ + m₂. Here, boundary conditions are not used at all. We characterize the quality of the solution using the error functional (1.2) with the terms

J_1 = \sum_{k=1}^{M} \left( A(u)(x_k) \right)^2, \qquad J_3 = \sum_{j=1}^{m_1 + m_2} \left( u(x_j) - z_j \right)^2,

where {x_k}_{k=1}^{M} is a set of test points in the subdomain Ω₂.
As a specific example, we consider one of the simplest equations of elliptic type, the Laplace equation on the plane

\Delta u \equiv \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.

We will look for its solutions in the domain Ω. Let us denote by U the set of these solutions. How do we select a solution from the set U? Suppose, for example, that the domain Ω is a disc. If we set the solution values at the boundary of the disc, i.e., on the circle ∂Ω, then, as is well known, the element u ∈ U with the boundary condition u|_{∂Ω} = f can be uniquely found by the Poisson formula. However, in practice we rarely encounter such a way of selecting a solution from the set U. By U_P we denote the subset of functions u from U taking values z_k at the points of some finite set P, which can be located both inside and outside the domain Ω. Usually, such values are known as results of observations (possibly with some error). Note that setting the conditions (selecting the solution from U) at the points of some set P may also include specifying boundary conditions on some subset of points of the boundary ∂Ω. The problem becomes even more complicated if the Laplace equation Δu = 0 is satisfied not in the whole domain Ω but only in some subdomain of Ω, or changes to a different differential relation in the complement of that subdomain. Rejecting the uniqueness of the solution and moving to classes of equivalent solutions of given accuracy, we construct, using the proposed general approach, an approximation u_N of the solution u ∈ U_P based on the minimization of the error functional (1.2) with the terms

J_1 = \sum_{j=1}^{M} \left( \Delta u(x_j, y_j) \right)^2, \qquad J_3 = \sum_{k=1}^{M_p} \left( u(\tilde{x}_k, \tilde{y}_k) - z_k \right)^2,

where {(x_j, y_j)}_{j=1}^{M} is a set of test points and M_p is the number of points (\tilde{x}_k, \tilde{y}_k) in the set P.
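The following sketch illustrates one way to minimize J1 + λJ3 for the Laplace equation when the approximation is a linear combination of Gaussian basis functions with fixed centers and widths, so that the problem reduces to linear least squares; NumPy is assumed, and the measured function, the weight λ, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Gaussian RBF basis v_i(x, y) = exp(-a_i * r_i^2); its Laplacian is analytic:
# Delta v_i = 4 * exp(-a_i r_i^2) * (a_i^2 r_i^2 - a_i).
centers = rng.uniform(-1.0, 1.0, size=(30, 2))
widths = np.full(30, 2.0)

def basis(pts):
    r2 = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-widths * r2)

def basis_laplacian(pts):
    r2 = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return 4.0 * np.exp(-widths * r2) * (widths ** 2 * r2 - widths)

# J1: equation residual at test points; J3: mismatch with measured values z_j.
test_pts = rng.uniform(-1.0, 1.0, size=(400, 2))
meas_pts = rng.uniform(-1.0, 1.0, size=(25, 2))
z = meas_pts[:, 0] * meas_pts[:, 1]           # "measurements" of the harmonic function xy

lam = 10.0                                    # weight of the data term
A = np.vstack([basis_laplacian(test_pts), np.sqrt(lam) * basis(meas_pts)])
b = np.concatenate([np.zeros(len(test_pts)), np.sqrt(lam) * z])
c, *_ = np.linalg.lstsq(A, b, rcond=None)     # linear coefficients minimize J1 + lam*J3

u = lambda pts: basis(pts) @ c
print(np.abs(u(meas_pts) - z).max())          # fit quality at the measurement points
```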


1.4.3 The problem for the equation of thermal conductivity with time reversal

In this problem, we are trying to find the initial distribution of the temperature (or of the concentration of some substance) from the final distribution. Such a task is a well-known ill-posed problem.
Direct problem. We are to find a function u(x, t), x ∈ [0; 1], t ∈ [0; T], meeting the conditions

u_t = u_{xx}, \quad (x, t) \in \Omega = (0; 1) \times (0; T),
u(x, 0) = \varphi(x), \quad x \in (0; 1),
u(0, t) = 0, \quad t \in [0; T],
u(1, t) = 0, \quad t \in [0; T].

Inverse problem. We search for a function u(x, t), x ∈ [0; 1], t ∈ [0; T], satisfying the conditions

u_t = u_{xx}, \quad (x, t) \in (0; 1) \times (0; T),
u(x, T) = f(x), \quad x \in (0; 1),
u(0, t) = 0, \quad t \in [0; T],
u(1, t) = 0, \quad t \in [0; T].

The function u(x, 0) = φ(x), x ∈ (0; 1), in this statement is unknown and is to be found. The following equivalent modification of the statement, obtained by reversing time, is possible:

u_t = -u_{xx}, \quad (x, t) \in (0; 1) \times (0; T),
u(x, 0) = f(x), \quad x \in (0; 1),
u(0, t) = 0, \quad t \in [0; T],
u(1, t) = 0, \quad t \in [0; T];

the function u(x, T) = φ(x), x ∈ (0; 1), is to be found.
The quality of the solution is characterized by the error functional (1.2), whose components in this problem look as follows:
J_1 = \sum_{j=1}^{N} \left( u_t(\xi_j, \tau_j) - u_{xx}(\xi_j, \tau_j) \right)^2 is the term corresponding to the differential equation;
J_2 = \sum_{j=1}^{N_b} \left[ u^2(0, \tau_j) + u^2(1, \tau_j) \right] is the term corresponding to the boundary conditions;
J_3 = \sum_{j=1}^{N_d} \left( u(x_j, T) - f(x_j) \right)^2 is the term corresponding to the values of the temperature at the final time point T.
In the terms J1 and J2, periodically regenerated test points are used: {(ξ_j, τ_j)}_{j=1}^{N} in the domain Ω and {(0, τ_j), (1, τ_j)}_{j=1}^{N_b} on parts of the boundary.
It is not possible to obtain a stable approximation in such a statement without additional information about the solution. Here are two ways of obtaining such information. The first way is to replace several summands in the term J3 with summands corresponding to the initial condition; the computational experiment has shown that it is enough to use one value, i.e., it is assumed that we know the initial condition at one point. The second way is to use, instead of a known point in the initial condition, one or more random points within the domain (measurements at intermediate moments of time).
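A rough sketch of how the terms J1, J2, J3 (with the single stabilizing initial-condition value) can be assembled for an arbitrary trial solution u(x, t); central finite differences stand in for the derivatives, and the final-time data and the anchor value are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1.0

def residual_terms(u, n_in=200, n_b=50, h=1e-3):
    """Evaluate the terms J1, J2, J3 of the error functional for a trial
    solution u(x, t), using central finite differences for u_t and u_xx
    and freshly regenerated random test points."""
    # J1: equation residual u_t - u_xx at interior test points
    xi, tau = rng.uniform(h, 1 - h, n_in), rng.uniform(h, T - h, n_in)
    u_t = (u(xi, tau + h) - u(xi, tau - h)) / (2 * h)
    u_xx = (u(xi + h, tau) - 2 * u(xi, tau) + u(xi - h, tau)) / h**2
    J1 = np.mean((u_t - u_xx) ** 2)
    # J2: homogeneous boundary conditions at x = 0 and x = 1
    tb = rng.uniform(0, T, n_b)
    J2 = np.mean(u(np.zeros(n_b), tb) ** 2 + u(np.ones(n_b), tb) ** 2)
    # J3: final-time data u(x, T) = f(x) plus one known initial value,
    # which stabilizes the time-reversed (ill-posed) statement
    xd = rng.uniform(0, 1, 30)
    f = np.sin(np.pi * xd) * np.exp(-np.pi**2 * T)     # illustrative data
    J3 = np.mean((u(xd, np.full(30, T)) - f) ** 2)
    J3 += (u(np.array([0.5]), np.array([0.0]))[0] - 1.0) ** 2  # anchor point
    return J1, J2, J3

# Any parameterized trial solution can be plugged in; here an exact mode
# of the heat equation is used only to show the call signature.
u_trial = lambda x, t: np.sin(np.pi * x) * np.exp(-np.pi**2 * t)
print(residual_terms(u_trial))
```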

1.4.4 The problem of determining the boundary condition

In this paragraph, we consider the problem of determining the law of temperature change at the end of a rod with an insulated opposite end, given the initial temperature and the law of temperature change at the insulated end or at an intermediate point.
Direct problem. We are to find a function u(x, t), x ∈ [0; 1], t ∈ [0; T], meeting the conditions

u_t = u_{xx}, \quad (x, t) \in (0; 1) \times (0; T),
u(x, 0) = \varphi(x), \quad x \in (0; 1),
u_x(0, t) = 0, \quad t \in [0; T],
u(1, t) = q(t), \quad t \in [0; T].

Inverse problem.

u_t = u_{xx}, \quad (x, t) \in (0; 1) \times (0; T),
u(x, 0) = \varphi(x), \quad x \in (0; 1),
u_x(0, t) = 0, \quad t \in [0; T],
u(0, t) = f(t), \quad t \in [0; T].


The function u(1, t) = q(t) in this statement is unknown and is to be found. As a model solution, we use the function

R(x, t) = \frac{\exp\left( -\dfrac{k^2 x^2}{t - t_0} \right)}{\sqrt{t - t_0}}.

The quality of the solution is characterized by the error functional (1.2), whose components in this problem look as follows:
J_1 = \sum_{j=1}^{N} \left( u_t(\xi_j, \tau_j) - u_{xx}(\xi_j, \tau_j) \right)^2 is the term corresponding to the differential equation;
J_2 = \sum_{j=1}^{N_b} \left[ u_x^2(0, \tau_j) + \left( u(x_j, 0) - \varphi(x_j) \right)^2 \right] is the term corresponding to the boundary and initial conditions;
J_3 = \sum_{j=1}^{N_d} \left( u(0, \tau_j) - f(\tau_j) \right)^2 is the term corresponding to the required boundary conditions.
Just as for the previous problem, instead of the points on the boundary on which the solution is sought, we can specify the solution at random points within the domain. The following modification of the statement is possible:

u_t = u_{xx}, \quad (x, t) \in (0; 1) \times (0; T),
u(x, 0) = \varphi(x), \quad x \in (0; 1),
u_x(0, t) = 0, \quad t \in [0; T],
u(x_0, t) = f(t), \quad t \in [0; T], \quad x_0 \in (0; 1).

The function u(1, t) = q(t) in this statement is again unknown and is to be found. For this problem, the required modifications are minimal: only the summand J3 of the error functional should be replaced with the sum \sum_{j=1}^{N_d} \left( u(x_0, \tau_j) - f(\tau_j) \right)^2.

1.4.5 The problem of continuation of the temperature field according to the measurement data

Let u(x, t), where x ∈ [0; 1], t ∈ [0; T], be a solution of the homogeneous heat conduction equation u_t − u_xx = 0 that satisfies the boundary conditions u(0, t) = u(1, t) = 0 and the initial condition u(x, 0) = φ(x).


We replace the initial condition with the relations u(\tilde{x}_k, \tilde{t}_k) = \varphi_k, k = 1, …, p, at points of some set P. Here φ_k are experimental data obtained with some error. Such a task is typical in practice: part of the conditions (initial or boundary ones) may be unknown, but data from observations of the object can be available instead. Note that this problem is ill-posed [26, 27]. The constructed approximation of the solution can be considered as its regularization.
The error functional (1.2), characterizing the quality of the solution, is written in the standard way by specifying the components. Here
J_1 = \sum_{j=1}^{M} (u_t - u_{xx})^2(\xi_j, \tau_j) is the term defined by the correspondence of the solution to the differential equation,
J_2 = \sum_{j_0=1}^{M_0} u^2(0, \tau_{j_0}) + \sum_{j_1=1}^{M_1} u^2(1, \tau_{j_1}) is defined by the boundary conditions,
J_3 = \sum_{k=1}^{p} \left( u(\tilde{x}_k, \tilde{t}_k) - \varphi_k \right)^2 is defined by the measurement data;
{(\xi_j, \tau_j)}_{j=1}^{M}, {(0, \tau_{j_0})}_{j_0=1}^{M_0}, {(1, \tau_{j_1})}_{j_1=1}^{M_1} are sets of test points inside the domain and on the left and right parts of the boundary.
The statement extends without difficulty to the two-dimensional (and multidimensional) case. We seek a solution u(x, y, t) of the heat equation

\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}

for the case of two spatial variables x, y in the domain Ω: 0 < x < 1, 0 < y < 1, 0 < t < T. The boundary conditions are homogeneous Dirichlet conditions:

u(0, y, t) = u(1, y, t) = u(x, 0, t) = u(x, 1, t) = 0.

However, the initial condition is absent; instead of it, a set of "experimentally measured" values of the function u (for example, data from sensors) at some points (x_j, y_j, t_j) is given: u(x_j, y_j, t_j) = f_j, j = 1, …, N_d. The points (x_j, y_j, t_j) belong to the specified domain Ω. It is assumed that the data f_j are known with some error. The quality of the solution is characterized by the error functional (1.2), whose components in this problem look as follows:
J_1 = \sum_{j=1}^{N} \left\{ u_t(\xi_j, \eta_j, \tau_j) - u_{xx}(\xi_j, \eta_j, \tau_j) - u_{yy}(\xi_j, \eta_j, \tau_j) \right\}^2 is the term corresponding to the differential equation;
J_2 = \sum_{j=1}^{N_b} \left\{ u^2(0, \eta_j, \tau_j) + u^2(1, \eta_j, \tau_j) + u^2(\xi_j, 0, \tau_j) + u^2(\xi_j, 1, \tau_j) \right\} is the term corresponding to the boundary conditions;
J_3 = \sum_{j=1}^{N_d} \left( u(x_j, y_j, t_j) - f_j \right)^2 is the term corresponding to the experimental data.


Here, as before, in the terms J1 and J2 we used periodically regenerated test points: {(ξ_j, η_j, τ_j)}_{j=1}^{N} in the domain Ω and {(0, η_j, τ_j), (1, η_j, τ_j), (ξ_j, 0, τ_j), (ξ_j, 1, τ_j)}_{j=1}^{N_b} on parts of the boundary.
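A short sketch of the "periodically regenerated test points" idea in this two-dimensional setting: the interior and boundary points are redrawn every fixed number of optimization steps; the sampling sizes and the regeneration period are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_points(n_in=500, n_b=100, T=1.0):
    """Draw a fresh set of interior and boundary test points for the
    two-dimensional heat-continuation problem."""
    interior = np.column_stack([rng.uniform(0, 1, n_in),
                                rng.uniform(0, 1, n_in),
                                rng.uniform(0, T, n_in)])
    eta, xi, tau = rng.uniform(0, 1, n_b), rng.uniform(0, 1, n_b), rng.uniform(0, T, n_b)
    boundary = np.concatenate([
        np.column_stack([np.zeros(n_b), eta, tau]),   # x = 0
        np.column_stack([np.ones(n_b),  eta, tau]),   # x = 1
        np.column_stack([xi, np.zeros(n_b), tau]),    # y = 0
        np.column_stack([xi, np.ones(n_b),  tau]),    # y = 1
    ])
    return interior, boundary

# "Periodic regeneration": resample the test points every few optimization
# steps so the functional is not over-fitted to one fixed point set.
for step in range(1000):
    if step % 50 == 0:
        interior, boundary = sample_points()
    # ... evaluate J1 on `interior`, J2 on `boundary`, J3 on the sensor data,
    #     and make one optimizer step on the model parameters ...
```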

1.4.6 Construction of a neural network model of a temperature field according to experimental data in the case of an interval-specified thermal diffusivity

In this section, we consider the problem of restoring the temperature field from experimental data, provided that the cooling samples have different coefficients of thermal diffusivity; the initial temperature distribution is the same for all samples. The formal statement of the problem is as follows:

u_t = r\, u_{xx}, \quad (x, t) \in \Omega = (0; 1) \times (0; T), \quad r \in [r_{\min}; r_{\max}],
u(x, 0, r) = \varphi(x), \quad x \in (0; 1),
u(0, t, r) = 0, \quad t \in [0; T],
u(1, t, r) = 0, \quad t \in [0; T],
u(x_i, t_i, r_i) = f_i, \quad i = 1, \ldots, p.

The quality of the solution is determined by the error functional (1.2), where
J_1 = \sum_{j=1}^{N} \left\{ u_t(\xi_j, \tau_j, \eta_j) - \eta_j\, u_{xx}(\xi_j, \tau_j, \eta_j) \right\}^2 is the term corresponding to the differential equation;
J_2 = \sum_{j=1}^{N_b} \left\{ u^2(0, \tau_j, \eta_j) + u^2(1, \tau_j, \eta_j) \right\} is the term corresponding to the boundary conditions;
J_3 = \sum_{j=1}^{N_d} \left( u(x_j, t_j, r_j) - f_j \right)^2 is the term corresponding to the "experimentally obtained" data.
Here, in the terms J1 and J2, we used periodically regenerated test points: {(ξ_j, τ_j, η_j)}_{j=1}^{N} in the domain Ω and {(0, τ_j, η_j), (1, τ_j, η_j)}_{j=1}^{N_b} on parts of the boundary.
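A sketch of how the interval-specified diffusivity can be treated as an additional input of the model: test points carry the coordinate triple (x, t, r), and the equation residual uses the sampled r. Finite differences again stand in for the derivatives of whatever model represents u, and the interval bounds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
r_min, r_max, T = 0.5, 2.0, 1.0

def sample_test_points(n_in=400, n_b=80):
    """Test points carry three coordinates (x, t, r): the diffusivity r is
    sampled from its interval exactly like a spatial variable."""
    interior = np.column_stack([rng.uniform(0, 1, n_in),
                                rng.uniform(0, T, n_in),
                                rng.uniform(r_min, r_max, n_in)])
    t_b = rng.uniform(0, T, n_b)
    r_b = rng.uniform(r_min, r_max, n_b)
    left = np.column_stack([np.zeros(n_b), t_b, r_b])
    right = np.column_stack([np.ones(n_b), t_b, r_b])
    return interior, np.concatenate([left, right])

def equation_residual(u, pts, h=1e-3):
    """Residual u_t - r*u_xx of the parametric heat equation, with the
    derivatives approximated by central finite differences."""
    x, t, r = pts[:, 0], pts[:, 1], pts[:, 2]
    u_t = (u(x, t + h, r) - u(x, t - h, r)) / (2 * h)
    u_xx = (u(x + h, t, r) - 2 * u(x, t, r) + u(x - h, t, r)) / h**2
    return u_t - r * u_xx

# A single trained model u(x, t, r) then serves every admissible diffusivity:
# substituting a measured r yields the temperature field of that sample.
```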

1.4.7 The problem of air pollution in the tunnel

Air pollution in the human environment shortens life expectancy and leads to a variety of serious diseases. In professional activities, a person is exposed to an even greater number of harmful production factors, which constantly requires the development of additional protective measures; such measures cannot be designed without research and design work. In the construction of underground structures and tunnels for various purposes (for example, underground roads and railways), one of the most important tasks in ensuring the safety of the ongoing work is to monitor the state of the air environment, which can be contaminated by sources of different origin.


Normalization of the air environment is mainly achieved by the use of a rational ventilation system. However, at the stage of construction and operation it may be necessary to carry out some works, for example repairs, in the absence of ventilation or during its shutdown. The study of the problem of the distribution of harmful substances from a source of pollution located in the depth of the tunnel showed that information on the concentration of the harmful substance at the outlet makes it possible to determine the distribution of zones of hazardous concentrations of such a substance along the tunnel.
We consider the identification problem arising in the ecological forecasting of the state of the air environment in a tunnel, using the diffusion equation in a moving medium as a basis. Let the concentration of a pollutant Φ = Φ(x, t) at the point x of the region I = [0; L] ⊂ R at the time t ∈ (0; T) satisfy the following initial-boundary value problem:

\frac{\partial \Phi}{\partial t} + u \frac{\partial \Phi}{\partial x} - \sigma \frac{\partial^2 \Phi}{\partial x^2} + \tau \Phi = P, \qquad \Phi(x, 0) = \Phi_0(x), \qquad \Phi(0, t) = \Phi_1(t);

here the function P(x, t) = \sum_{s=1}^{k} p_s(t)\, \delta_s(x; x_s) characterizes the power of the pollution sources, p_s(t) is the power of the s-th pollution source, distributed in a neighborhood of x_s with density δ_s(x; x_s), u is the speed of the medium, σ is the turbulent diffusion coefficient, and τ > 0 is a parameter that determines the intensity of absorption of the pollutant due to its entrainment, deposition, chemical reactions, etc. We assume that there is no initial contamination throughout the tunnel, Φ₀(x) = 0, 0 ≤ x ≤ L, and no pollution at the entrance to the tunnel, Φ₁(t) = 0, 0 ≤ t ≤ T.
We note that via the replacement Φ(x, t) = exp(ax + bt)U(x, t), P(x, t) = exp(ax + bt)Q(x, t), where a = u/(2σ), b = −τ − u²/(4σ), the differential equation can be transformed into the heat equation

\frac{\partial U}{\partial t} - \sigma \frac{\partial^2 U}{\partial x^2} = Q

with the corresponding initial-boundary conditions on the function U: U(x, 0) = U₀(x) = 0, U(0, t) = U₁(t) = 0. We also note that specifying a boundary condition of the form \left( \frac{\partial \Phi}{\partial x} - \frac{u}{2\sigma} \Phi \right)(L, t) = 0 at the end of the tunnel would lead, for the function U, to the homogeneous Neumann condition \frac{\partial U}{\partial x}(L, t) = 0. Next, we will consider the transformed problem for the function U. The coordinates x, t are normalized in such a way that L = 1, T = 1. The problems in the proposed statement, when a part of the boundary conditions at the end of the tunnel and some power sources are not specified, are ill-posed: they need additional data.


Suppose that instead of the missing conditions the data of experimental observations U(\tilde{x}_j, \tilde{t}_j) = \varphi_j, j = 1, …, m, obtained from a certain set of sensors, are known. The problem consists in finding an approximate solution of the problem, the function U, as well as in restoring the function Q. Another variant of setting up the identification task, restoring the initial conditions, was considered above in Sections 1.4.3 and 1.4.4. Our proposed approach allows us to combine heterogeneous information about the system in a neural network model and to use the regularizing properties of neural networks in solving inverse and ill-posed problems. An approximate solution of the identification problem can be found as the outputs of a system of two artificial neural networks of a given architecture based on minimizing the error functional (1.2) of the form

J = \sum_{j=1}^{M} \left[ \frac{\partial U}{\partial t} - \sigma \frac{\partial^2 U}{\partial x^2} - Q \right]^2 (x_j, t_j) + \lambda_0 \sum_{j=1}^{M_0} U^2(x_j, 0) + \lambda_1 \sum_{j=1}^{M_1} U^2(0, t_j) + \lambda \sum_{j=1}^{m} \left[ U(\tilde{x}_j, \tilde{t}_j) - \varphi_j \right]^2.

Here {(x_j, t_j)}_{j=1}^{M} are periodically regenerated test points in the area [0; 1] × [0; 1], {(x_j, 0)}_{j=1}^{M_0} and {(0, t_j)}_{j=1}^{M_1} are test points on the boundary sections; λ₀, λ₁, λ > 0 are penalty parameters. In the considered problem of modeling air pollution in tunnels, estimating the concentration in the main part of the tunnel is more important than identifying the source, so it is possible to find a neural network approximation to the function U by minimizing the functional J without the source Q in the first term. In this case, the first sum is taken over the test points from that part of the domain of the variables for which it is known that Q = 0.
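A sketch of the change of variables and of the functional J for the pair of models (U, Q); the physical parameters, the penalty weights, and the finite-difference derivatives are illustrative stand-ins.

```python
import numpy as np

# Physical parameters of the transport equation (illustrative values).
u_speed, sigma, tau = 1.0, 0.1, 0.05

# Change of variables that reduces the convection-diffusion-absorption
# equation to the heat equation: Phi = exp(a*x + b*t) * U.
a = u_speed / (2.0 * sigma)
b = -tau - u_speed**2 / (4.0 * sigma)

def phi_from_u(U, x, t):
    """Recover the pollutant concentration from the transformed unknown U."""
    return np.exp(a * x + b * t) * U(x, t)

def loss(U, Q, sensors, lam=(1.0, 1.0, 10.0), n=300, h=1e-3,
         rng=np.random.default_rng(5)):
    """Error functional J for the pair of models (U, Q): equation residual,
    homogeneous initial and inlet conditions, and sensor data U(x_j, t_j) = phi_j."""
    lam0, lam1, lam_d = lam
    x, t = rng.uniform(h, 1 - h, n), rng.uniform(h, 1 - h, n)
    U_t = (U(x, t + h) - U(x, t - h)) / (2 * h)
    U_xx = (U(x + h, t) - 2 * U(x, t) + U(x - h, t)) / h**2
    J = np.mean((U_t - sigma * U_xx - Q(x, t)) ** 2)
    xb, tb = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
    J += lam0 * np.mean(U(xb, np.zeros(n)) ** 2)      # U(x, 0) = 0
    J += lam1 * np.mean(U(np.zeros(n), tb) ** 2)      # U(0, t) = 0
    xs, ts, phis = sensors
    J += lam_d * np.mean((U(xs, ts) - phis) ** 2)     # measurement data
    return J
```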

The conclusion
An interesting generalization of the approach described above to solving the Dirichlet problem is obtained if we use not only spatial variables but also boundary values at a certain set of points as input variables of the desired solution. The function constructed in this way makes it possible to obtain a solution of the Dirichlet problem not for fixed but for arbitrary boundary conditions (the boundary condition is given in tabular form at this set of points). Similarly, it is possible to pose and solve inverse problems of various kinds, for example, to determine the boundary conditions from the solution given at a certain set of points. The construction is done as follows:
1. We find a solution of the problem (1.1) at a given set of points of the domain under different boundary conditions f.


This creates a data set for training the network, and this set should be representative enough.
2. We look for a function (or a set of functions) whose values are the boundary conditions at the specified set of points (or the boundary condition at a fixed point). The search is carried out by minimization of some functional, for example, the sum of the squared deviations of the network outputs from the prescribed values mentioned in item 1.
3. The resulting function can produce boundary conditions f for the required values of the solution u at a given set of points within the domain.
Instead of the solution at a set of points, we can use other conditions: a minimum of some auxiliary functional (or a system of functionals), restriction of the solution by certain additional conditions, and so on. As noted, multi-criteria optimization methods can be used here. If the solution is known on a part of the boundary, then the corresponding values of f can be included among the network inputs, leaving the values of f at the points of the remaining part of the boundary as the outputs.
It is obvious that our methodology depends only weakly on the equation, the form of the domain, and the type of boundary conditions. Equations and boundary conditions can also be nonlinear; it is sufficient to associate with them a minimized functional of the type (1.2). If the domain has features, for example sharp corners, we can take more test points in their neighborhood. In the same way, combined problems are posed and solved, when the equation under consideration has a different form in different subdomains. Moreover, the form of the region itself can be considered as a variable to be determined. In this case, the boundary can be defined by a certain set of points or regarded as an element of some parametric family, the parameters of which are to be determined. We consider solutions u_ς of a family of boundary value problems (ς is a functional parameter)

A(u_\varsigma(x)) = \lambda\, u_\varsigma(x), \quad x \in \Omega_\varsigma \subset R^p, \quad \lambda \in \mathbb{C}, \qquad B(u_\varsigma)\big|_{\Gamma_\varsigma} = f.

The simplest option is to look for the function u_ς(x) such that the solution of the boundary value problem is obtained, and then to find the parameter ς from the extremum condition for the functional (1.2). In this way, for every λ and f, its own function is constructed. Next, we can build a mapping whose arguments are λ and the values of the function f at some fixed set of points and whose output is ς. In this case, it will be necessary to collect the original database, i.e., to solve the problem for a fairly representative set of parameters.


Similarly, the problems of reconstructing an equation (or a boundary condition) as an element of a certain parametrized family are solved. Our approach to solving inverse problems of this kind can be considered as one of the methods of their regularization. In the standard regularization procedure (see [26]), some solution is selected from the possible set of solutions in an artificial way, for example, the solution having the minimum norm. This may not correspond to the informal requirements of the problem being solved. More promising is the creation of a set of samples with the help of an expert making an informal choice, and the subsequent construction of a function (or functions) not only by minimizing the functional (or functionals) (1.2) but also in accordance with these samples.

References
[1] V.N. Vapnik, The Restoration of Dependencies From Empirical Data, M.: Science, 448 p. (in Russian).
[2] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999, 823 p.
[3] J.C. Butcher, Numerical Methods for Ordinary Differential Equations, John Wiley & Sons, New York, 2008.
[4] E. Hairer, G. Wanner, Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems, second revised ed., vol. XV, Springer Verlag, Berlin, Heidelberg, New York, 1996, 614 p.
[5] F.J. Olver, Asymptotics and Special Functions, Academic Press, New York and London, 1974, 584 p.
[6] T.V. Lazovskaya, D.A. Tarkhov, Fresh approaches to the construction of parameterized neural network solutions of a stiff differential equation, St. Petersburg Polytech. Univ. J. Phys. Math. (2015), https://doi.org/10.1016/j.spjpm.2015.07.005.
[7] V. Hlavacek, M. Marek, M. Kubicek, Modelling of chemical reactors. Part X, Chem. Eng. Sci. 23 (1968).
[8] S.I. Hudyaev, Threshold Phenomena in Nonlinear Equations, M.: PHYSMATLIT, 2003, 272 p. (in Russian).
[9] M. Kubicek, V. Hlavacek, Solution of nonlinear boundary value problems. Part VIII, Chem. Eng. Sci. 29 (1974) 1695–1699.
[10] S.S. Dmitriev, E.B. Kuznetsov, Heat and mass transfer in porous catalyst, in: Proceedings of the VI International Conference on Nonequilibrium Processes in Nozzles and Jets (NPNJ-2006), Saint Petersburg, 2006, pp. 159–160, M.: University book (in Russian).
[11] C. Na, Computational Methods for Solving Applied Boundary Value Problems, The Publishing World, Singapore, 1982.
[12] E.M. Budkina, E.B. Kuznetsov, Solving of boundary value problem for differential-algebraic equations, in: Proceedings of the XIX International Conference on Computational Mechanics and Modern Applied Software Systems, MAI Publishing House, Moscow, 2015, pp. 44–46 (in Russian).
[13] Yu.V. Sidorov, M.V. Fedoryuk, M.I. Shabunin, Lectures on the Theory of Functions of a Complex Variable, M.: Science, 1982, 488 p. (in Russian).
[14] S.K. Godunov, Equations of Mathematical Physics, M.: Science, 1979, 392 p. (in Russian).
[15] V.P. Mikhailov, Partial Differential Equations, second ed., M.: Science, 1983, 424 p. (in Russian).
[16] I.E. Lagaris, A. Likas, D.I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE Trans. Neural Netw. 9 (5) (1998) 987–1000.
[17] Zh.I. Alferov, Physics and Life, second ed., M.; SPb.: Science, 2001, 288 p. (in Russian).
[18] H. Voss, Numerical calculation of the electronic structure for three-dimensional quantum dots, Comput. Phys. Commun. 174 (2006) 441–446.
[19] W. Wang, T.-M. Hwang, J.-C. Jang, A second-order finite volume scheme for three dimensional truncated pyramidal quantum dot, Comput. Phys. Commun. 174 (2006) 371–385.
[20] V.I. Antonov, A.N. Vasilyev, A.I. Zagainov, Yu.I. Luchakov, Study of heat exchange in the capillaries of warm-blooded, in: Proceedings of the 11 National Conference on Science and Higher School Problems "Basic Research and Innovations in Technical Universities", St. Petersburg, 2007, pp. 101–106 (in Russian).
[21] V.I. Antonov, A.N. Vasilyev, D.A. Tarkhov, Neural network approach to the modeling of heat transfer in the system of vessels-tissues, in: Proceedings of the International Conference "Intellectual and Multiprocessing Systems", Taganrog-Donetsk-Minsk, vol. 2, 2005, pp. 223–227 (in Russian).
[22] A.N. Vasilyev, N.G. Kuznetsov, On some extremal problems arising in acoustics, in: Boundary Value Problems for Nonclassical Equations of Mathematical Physics, Proceedings of the All-Union School "Nonclassical Equations of Mathematical Physics", Novosibirsk, 1989, pp. 94–98 (in Russian).
[23] A.N. Vasilyev, D.A. Tarkhov, G. Gushchin, Modeling of the calibrator of alternating pressure through a system of neural networks, in: Collection of Papers of the International Conference on Soft Computing and Measurements (SCM'2004), SPb, vol. 1, 2004, pp. 304–308 (in Russian).
[24] A. Vasilyev, D. Tarkhov, G. Guschin, Neural networks method in pressure gauge modeling, in: Proceedings of the 10th IMEKO TC7 International Symposium on Advances of Measurement Science, Saint-Petersburg, Russia, vol. 2, 2004, pp. 275–279.
[25] W. Weidlich, Sociodynamics: A Systematic Approach to Mathematical Modeling in the Social Sciences, Harwood Academic, Amsterdam, 2000.
[26] A.N. Tikhonov, Solutions of Ill-Posed Problems, Winston, New York, 1977.
[27] A.A. Samarskii, P.N. Vabishchevich, Numerical Methods for Solving Inverse Problems of Mathematical Physics, Inverse and Ill-Posed Problems Series, Walter de Gruyter, Berlin, New York, 2007, 438 p.

Further reading
[28] J. Hadamard, Sur les problèmes aux dérivées partielles et leur signification physique, 1902, pp. 49–52.


2. The choice of the functional basis (set of bases)
Co-author: Galina Malykhina

It would seem that, before choosing a basis, it is necessary to choose between the classical methods (grids and finite elements) and the neural network approach. A comparison of the various variants of grid methods in the classical setting with finite element methods and neural network methods is hampered by the different form of the models obtained. The discrete type of model is typical for the grid method and for the method of cellular neural networks; the functional type of model is typical for the finite element method and for the method based on multilayer perceptrons (MLP) and networks with radial basis functions (RBF). However, this difference is not so drastic, for the following reasons: firstly, from a discrete solution we can pass, using interpolation, to a functional representation and compare the functional model with the result of interpolation of the discrete model; secondly, it is possible to build a mixed model, discrete in one subdomain and functional in another subdomain of its definition, depending on the requirements for the model (if the functional model is needed only in a subdomain, if the asymptotics of the solution is known, etc.). The difference between the types of models becomes even smaller if one notices that a construction of functional models can often be seen behind the grid methods. At their core, grid, finite element, and quasi-linear neural network expansions are similar. However, in the neural network approach it is possible to adjust the vector parameters included in the elements of the functional basis, while in the other approaches the basis elements are assumed to be specified from the start. This fact gives certain advantages to the approach based on artificial neural networks.


The neural network approach generalizes the known Galerkin expansions, allows reducing the number of parameters used, leads to the construction of approximate solutions that are stable with respect to perturbations, and provides the possibility of parallel and distributed implementations of the algorithms for adjusting the parameters and the structure of the neural network functional basis.
In our experience, RBF-networks and a perceptron with one hidden layer turned out to be quite effective in most tasks. In some problems, we used more sophisticated variants of perceptron functions with multipliers in the form of polynomials or Gaussians. In the case of composite domains, we used two approaches. The first approach uses a single error functional and a single basis for the entire domain. The second approach uses a set of error functionals and bases for individual subdomains. In some problems, it is advisable to use a mixed basis, changing in the process of solving the problem.

2.1 Multilayer perceptron
2.1.1 Structure and activation functions of multilayer perceptron

This type of neural network is characterized by the fact that linear combinations of its inputs (the independent variables of the desired solution) are fed to the first layer of neurons, which are one-dimensional nonlinear functions. Linear combinations of the outputs of the neurons of each layer except the last one are fed to the next layer, and linear combinations of the outputs of the neurons of the last layer form the network output. Formally, such a network is described by the recurrent relations

y^{(l)} = W^{(l)} x^{(l)}, \qquad x^{(l+1)} = \varphi\!\left( y^{(l)} \right),

where x^{(l)} is the input vector and y^{(l)} is the output vector of the summators of the l-th layer, x^{(l+1)} is the output vector of the neurons of the l-th layer, which is simultaneously the input vector of the (l + 1)-th layer, and φ(·) is the vector function acting coordinate-wise. For clarity, in the literature on neural networks such networks are usually represented in the form of a so-called signal transmission graph (signal-flow graph, see S. Haykin [1]). Such a graph for one neuron is shown in Fig. 2.1. A graph is a network with many nodes connected by directional links. Links between nodes are characterized by transfer functions that specify the signal conversion between nodes. We assume that the connections have linear transfer functions given by the weights w_{i,j}. Neurons are characterized by one-dimensional nonlinear functions φ_j(y), which are called activation functions. The output signal of a neuron is the output of a single-layer network or the input of the next layer of a multi-layer network.

Fig. 2.1 A graph for one neuron (signal-flow graph).

The output signal of the l-th layer of neurons is determined by the relation

x_j^{(l+1)} = \varphi\!\left( \sum_{i=0}^{m^{(l)}} w_{j,i}^{(l)} x_i^{(l)} \right), \quad j = 1, \ldots, m^{(l+1)}, \qquad (2.1)

where x_j^{(l+1)} are the input signals of the neurons of the following (l + 1)-th layer. Let us consider the signal graph between two sequential hidden layers l, l + 1 of a multi-layer perceptron. Let the l-th layer consist of m^{(l)} neurons and the (l + 1)-th layer consist of m^{(l+1)} neurons, as shown in Fig. 2.2. The activation functions of the l-th and (l + 1)-th hidden layers are φ^{(l)}(y_j), j = 1, …, m^{(l)}, and φ^{(l+1)}(y_i), i = 1, …, m^{(l+1)}, respectively. Linear combinations of neuron outputs are fed to the next layer, and linear combinations of the neuron outputs of the last layer form the network output.

Fig. 2.2 Graph of signal transmission between two hidden layers of the perceptron.


The input of the l-th hidden layer receives the signals of the (l − 1)-th layer of neurons:

x_i = \varphi_i^{(l)}\!\left( \sum_{j=0}^{m_l} w_{j,i}^{(l)} \, \varphi_j^{(l-1)}\!\left( \sum_{k=0}^{m_{l-1}} w_{k,j}^{(l-1)} x_k \right) \right), \quad i = 1, \ldots, m^{(l)}, \qquad (2.2)

where w_{k,j}^{(l-1)}, k = 1, …, m^{(l-1)}, j = 1, …, m^{(l)}, are the weights of the l-th hidden layer of neurons, and w_{j,i}^{(l)}, j = 1, …, m_l, i = 1, …, m^{(l+1)}, are the weights of the (l + 1)-th hidden layer of neurons. Here an additional input is added to the l-th and the (l + 1)-th layers of the neural network; its value x₀ is always equal to one. Its meaning is the same as adding an offset (bias) in linear regression, i.e., actually subtracting a constant value. The amount of displacement is determined by the weights w_{0,k}^{(l-1)}, w_{0,j}^{(l)}, which are selected together with the rest of the weights in the learning process. Experience has shown that this greatly improves the properties of the network.
In the case of solving differential equations using neural networks, the quality of the solution depends on the choice of the type of activation functions. The type of activation function determines the amount of computation, the smoothness of the solution, the number of neurons in the neural network, and the amount of data required for training. Usually, the perceptron activation function approximates the function x_j = sign(y_j), where y_j = \sum_{i=0}^{m} w_{i,j} x_i. Its meaning is to divide the space into two half-spaces by the hyperplane (w, x) = 0 if w = [w₀, w₁, …, w_m]^T, x = [1, x₁, …, x_m]^T, or (w, x) + w₀ = 0 if w = [w₁, …, w_m]^T, x = [x₁, …, x_m]^T. In one half-space the neuron output is +1, in the other it is −1. If we take not one neuron but a layer of neurons, then the whole space is divided into subsets by hyperplanes; each of the subsets corresponds to its own set of outputs of the neurons of this layer. In this case, the network output is obtained as a piecewise constant function. Usually, instead of the function sign(y), smooth functions are used, which makes it possible to calculate derivatives and apply gradient optimization methods to train the network. In this case, the above division into half-spaces is fuzzy, i.e., the neighborhood of the separating hyperplane becomes a transition zone. It is desirable that the calculation of the derivatives of the activation function require a minimum of additional operations. For such functions, the network no longer gives a piecewise constant function but a smoothed dependence.
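A minimal NumPy sketch of the forward pass just described, with the constant input x0 = 1 prepended at every layer and a linear output layer; the tanh activation and the layer sizes are illustrative choices.

```python
import numpy as np

def mlp_forward(x, weights, activation=np.tanh):
    """Forward pass of a multilayer perceptron in the spirit of (2.1)-(2.2):
    each layer prepends a constant input x0 = 1 (the bias) and applies an
    affine map followed by a coordinate-wise activation; the last layer is linear."""
    out = np.asarray(x, dtype=float)
    for l, W in enumerate(weights):
        out = np.concatenate([[1.0], out])       # additional input x0 = 1
        out = W @ out                            # summators y^(l)
        if l < len(weights) - 1:                 # hidden layers only
            out = activation(out)
    return out

# A 2-10-1 network with random weights; the shapes include the bias column.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(10, 3)), rng.normal(size=(1, 11))]
print(mlp_forward([0.3, -0.7], weights))
```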


We now give some examples of activation functions. The piecewise linear symmetric activation function with saturation is determined by the relations

\varphi(y) = \begin{cases} -1, & y < -1/2,\\ 2y, & -1/2 \le y \le 1/2,\\ 1, & y > 1/2, \end{cases} \qquad \varphi'(y) = \begin{cases} 0, & y < -1/2,\\ 2, & -1/2 \le y \le 1/2,\\ 0, & y > 1/2; \end{cases}

a piecewise linear positive function with saturation is defined analogously (it vanishes for negative arguments and saturates at the value 1 for y > 1). Piecewise linear functions have discontinuities of the first derivative, which complicates the application of many neural network training methods; moreover, when the argument of such a function goes beyond the saturation interval, the derivative of the activation function turns to 0. A smoother activation function, recommended in [2] and other papers by A.N. Gorban and his disciples, has the form φ(y) = y/(1 + |y|); its derivative is φ'(y) = 1/(1 + |y|)² = (1 − |φ(y)|)². In the world literature on neural networks, the activation function φ(y) = tanh(y) is used most often, with φ'(y) = 1/cosh²(y) = 1 − φ²(y). Sometimes an asymmetric version φ(y) = e^y/(1 + e^y) with the derivative φ'(y) = e^y/(1 + e^y)² = φ(y)(1 − φ(y)) is used. Other functions are also employed, e.g., φ(y) = arctan(y) with φ'(y) = 1/(1 + y²).
Unlike piecewise linear functions, sigmoid functions are close to their asymptotes in most of their domain of definition: they approach 1 for large positive y and approach −1 (or zero, for the asymmetric version) as y tends to minus infinity. Sigmoid functions have high sensitivity only in the vicinity of zero. Because of the saturation of sigmoidal functions, gradient learning in the saturation zones is very difficult, so their use in the hidden layers of feedforward neural networks is not always effective.
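Several of the activation functions listed above, written together with their derivatives expressed through the function values; a small NumPy sketch for illustration.

```python
import numpy as np

# Activation functions and their derivatives; each derivative reuses the
# already computed function value where that is convenient.

def gorban(y):                      # phi(y) = y / (1 + |y|)
    f = y / (1.0 + np.abs(y))
    return f, (1.0 - np.abs(f)) ** 2

def tanh_act(y):                    # phi(y) = tanh(y)
    f = np.tanh(y)
    return f, 1.0 - f ** 2

def logistic(y):                    # phi(y) = e^y / (1 + e^y)
    f = 1.0 / (1.0 + np.exp(-y))
    return f, f * (1.0 - f)

def arctan_act(y):                  # phi(y) = arctan(y)
    return np.arctan(y), 1.0 / (1.0 + y ** 2)

y = np.linspace(-3, 3, 7)
for act in (gorban, tanh_act, logistic, arctan_act):
    f, df = act(y)
    print(act.__name__, np.round(f, 3), np.round(df, 3))
```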


Let us describe the functioning of a multilayer perceptron in vector-matrix form. We define the input vector of the l-th layer as y^{(l-1)} and its output vector as x^{(l)}. Then the network is described by the recurrent relations

y^{(l)} = W^{(l)} x^{(l)}, \qquad (2.3)

x^{(l)} = \varphi^{(l)}\!\left( y^{(l-1)} \right), \qquad (2.4)

where x^{(0)} = x is the network input, y^{(L)} = f(x) is the network output, W^{(l)} is the weight matrix of the l-th layer, and φ^{(l)} is an activation function acting coordinate-wise. Formula (2.4) assumes that the activation functions of the neurons of the l-th layer coincide; if this condition is not met, the functions φ^{(l)} should be replaced by the functions φ_j^{(l)}. In particular, x_i^{(0)} = x_i are the input coordinates and y_i^{(L)} are the output coordinates of the network.
The use of gradient methods for training the network, i.e., for minimizing the error functional (1.2), requires the computation of the derivatives of the network output with respect to the weights. These derivatives are easy to obtain using the rule for differentiating a composite function according to the following algorithm.
Step 1. Calculate the network output using formulas (2.3), (2.4). This step is called direct functioning.
Step 2. Calculate the values

z_j^{(l)} = \varphi'^{(l)}\!\left( y_j^{(l-1)} \right). \qquad (2.5)

Calculations by formulas (2.3)–(2.5) are carried out sequentially for all l. The procedure defined by Steps 1 and 2 is referred to as the direct operation of the network.
Step 3. Denote by Z^{(l)} the matrix φ'^{(l)}(y^{(l-1)}); note that it is a diagonal matrix with the elements z_i^{(l)} on the i-th place of the diagonal. To determine the desired derivatives, we perform calculations using the formula

\frac{\partial y^{(L)}}{\partial w_{i,j}^{(l)}} = W^{(L)} Z^{(L)} W^{(L-1)} Z^{(L-1)} \cdots Z^{(l+1)} \frac{\partial W^{(l)}}{\partial w_{i,j}^{(l)}} x^{(l)}. \qquad (2.6)

Thus the matrices Z^{(l)} are computed along with the outputs of the network. According to Eq. (2.6), the calculations begin at the last layer and move to the first one, so this procedure is called the inverse operation. To obtain formula (2.6), it is enough to apply the rule for differentiating a composite function to Eqs. (2.3) and (2.4):

\frac{\partial y^{(l)}}{\partial w_{i,j}^{(l)}} = \frac{\partial W^{(l)}}{\partial w_{i,j}^{(l)}} x^{(l)}, \qquad (2.7)

where the element of the matrix \partial W^{(l)} / \partial w_{i,j}^{(l)} in place (i, j) is 1 and the remaining elements are zero. Let l < s ≤ L; then

\frac{\partial y^{(s)}}{\partial w_{i,j}^{(l)}} = W^{(s)} \frac{\partial x^{(s)}}{\partial w_{i,j}^{(l)}}, \qquad (2.8)

\frac{\partial x^{(s)}}{\partial w_{i,j}^{(l)}} = \varphi'^{(s)}\!\left( y^{(s-1)} \right) \frac{\partial y^{(s-1)}}{\partial w_{i,j}^{(l)}}. \qquad (2.9)

The next step is to apply formula (2.7). Similarly, we calculate the derivative with respect to the inputs of the network:

\frac{\partial y^{(L)}}{\partial x} = W^{(L)} Z^{(L)} W^{(L-1)} Z^{(L-1)} \cdots W^{(0)}.

This derivative allows us to evaluate the significance of the inputs and, if necessary, discard the least significant ones.
Some optimization methods, such as the Newton method, require the computation of second derivatives. Formula (2.8) gives

\frac{\partial^2 y^{(l)}}{\partial w_{i,j}^{(l)} \partial w_{p,q}^{(l)}} = 0.

From Eq. (2.6) we have

\frac{\partial^2 y^{(s)}}{\partial w_{i,j}^{(l)} \partial w_{p,q}^{(s)}} = \frac{\partial W^{(s)}}{\partial w_{p,q}^{(s)}} \frac{\partial x^{(s)}}{\partial w_{i,j}^{(l)}}

for l < s ≤ L and

\frac{\partial^2 y^{(s)}}{\partial w_{i,j}^{(l)} \partial w_{p,q}^{(l')}} = W^{(s)} \frac{\partial^2 x^{(s)}}{\partial w_{i,j}^{(l)} \partial w_{p,q}^{(l')}}

for l, l' < s ≤ L. Differentiating Eq. (2.9), we obtain

\frac{\partial^2 x^{(s)}}{\partial w_{i,j}^{(l)} \partial w_{p,q}^{(l')}} = \varphi'^{(s)}\!\left( y^{(s-1)} \right) \frac{\partial^2 y^{(s-1)}}{\partial w_{i,j}^{(l)} \partial w_{p,q}^{(l')}} + \varphi''^{(s)}\!\left( y^{(s-1)} \right) \frac{\partial y^{(s-1)}}{\partial w_{i,j}^{(l)}} \frac{\partial y^{(s-1)}}{\partial w_{p,q}^{(l')}}.

Here φ''^{(s)}(y^{(s-1)}) is a bilinear diagonal form, so

\frac{\partial^2 x_k^{(s)}}{\partial w_{i,j}^{(l)} \partial w_{p,q}^{(l')}} = \varphi'^{(s)}\!\left( y_k^{(s-1)} \right) \frac{\partial^2 y_k^{(s-1)}}{\partial w_{i,j}^{(l)} \partial w_{p,q}^{(l')}} + \varphi''^{(s)}\!\left( y_k^{(s-1)} \right) \frac{\partial y_k^{(s-1)}}{\partial w_{i,j}^{(l)}} \frac{\partial y_k^{(s-1)}}{\partial w_{p,q}^{(l')}}.

If we want to calculate the error gradient in one term of the functional (1.3), i.e., the numbers \partial e_n / \partial w_{i,j}^{(l)}, where e_n = (A(y^{(L)}(\xi_n)) - g(\xi_n))^2, for all network weights, then we construct a dual network, which instead of the numbers x_j^{(l)} and y_j^{(l)} operates with the numbers x'^{(l)}_j and y'^{(l)}_j. For the dual network, the movement is in the reverse order, from the outputs to the inputs, in accordance with the formulas

x'^{(l)}_j = \sum_{i=0}^{m^{(l+1)}} w_{i,j}^{(l)} \, y'^{(l)}_i, \qquad (2.10)

y'^{(l-1)}_j = z_j^{(l)} \, x'^{(l)}_j, \qquad (2.11)

y'^{(L)} = 2 \left( A\!\left( y^{(L)}(\xi_n) \right) - g(\xi_n) \right) \frac{\partial A\!\left( y^{(L)}(\xi_n) \right)}{\partial y}. \qquad (2.12)

If another formula is used instead of Eq. (1.3), the expression (2.12) changes accordingly. Herewith

\frac{\partial e_n}{\partial w_{i,j}^{(l)}} = x_j^{(l)} \, y'^{(l)}_i. \qquad (2.13)
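A from-scratch check of the dual-network gradient formulas (2.10)–(2.13) on a one-hidden-layer perceptron with a tanh hidden layer, a linear output, and the squared error of a single test point; the comparison against a finite difference is included only as a sanity check, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

def forward(x, W0, W1):
    """Direct operation: store the summator outputs needed by the dual pass."""
    x_aug = np.concatenate([[1.0], x])
    y0 = W0 @ x_aug                      # summators of the hidden layer
    x1 = np.tanh(y0)                     # hidden-layer outputs
    x1_aug = np.concatenate([[1.0], x1])
    y1 = W1 @ x1_aug                     # network output (linear last layer)
    return x_aug, y0, x1_aug, y1

def backward(x_aug, y0, x1_aug, y1, W1, target):
    """Dual (inverse) operation in the spirit of (2.10)-(2.13) for the
    squared error e = (y1 - target)^2 of a single test point."""
    y1_dual = 2.0 * (y1 - target)                    # analogue of (2.12)
    x1_dual = W1[:, 1:].T @ y1_dual                  # analogue of (2.10)
    y0_dual = (1.0 - np.tanh(y0) ** 2) * x1_dual     # analogue of (2.11)
    dW1 = np.outer(y1_dual, x1_aug)                  # analogue of (2.13)
    dW0 = np.outer(y0_dual, x_aug)
    return dW0, dW1

# Check the hand-written gradient against a finite difference.
x, target = rng.normal(size=2), np.array([0.3])
W0, W1 = rng.normal(size=(5, 3)), rng.normal(size=(1, 6))
grad0, _ = backward(*forward(x, W0, W1), W1, target)

eps, (i, j) = 1e-6, (2, 1)
Wp = W0.copy(); Wp[i, j] += eps
e = lambda A: (((forward(x, A, W1)[3] - target) ** 2).sum())
print(grad0[i, j], (e(Wp) - e(W0)) / eps)            # should nearly coincide
```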


Thus, the calculations proceed by formulas (2.10)–(2.11) from the network output to the network input, starting with the error (see Eq. 2.12) and obtaining the gradient values along the way in accordance with formula (2.6). Reaching the end, we obtain the derivatives with respect to the inputs. Once again, recall that all the calculations should be done for each term in Eq. (1.3), and the results are summed.
The algorithm for calculating the derivative of the error functional with respect to the weights of the network is as follows:
1. Calculate the network output by formulas (2.3), (2.4) and the auxiliary variables z_j^{(l)} by formula (2.5), moving from the first layer l = 0 to the last one l = L.
2. Perform calculations using formulas (2.10)–(2.13), moving from the last layer to the first one.
3. Carry out the calculations of items 1 and 2 for all summands of (1.3), adding the derivatives obtained by formula (2.13).
A good procedure for the calculation of the gradient makes optimization methods that use the gradient particularly effective. Usually, the calculation of the gradient by the specified algorithm requires only about three times more time than the calculation of the network output [2], and not a number of times equal to the number of adjustable weights, as one might assume. If a zero-order method is used, the computation of the gradient and related quantities is not performed. Usually this is necessary if some of the activation functions or the error functional are non-differentiable, although in this case a subdifferential can be used instead of a derivative [3].
In addition to linear mappings, quadratic forms or even higher-degree polynomials can be used in formula (2.1). For the quadratic case, formula (2.3) is replaced by the expression written in coordinates

y_i^{(l)} = \sum_{j=0}^{m^{(l)}} w_{i,j}^{(l)} x_j^{(l)} + \sum_{j, j_1 = 0}^{m^{(l)}} w_{i,j,j_1}^{(l)} x_j^{(l)} x_{j_1}^{(l)}, \quad l = 0, 1, \ldots, L,

and the formula for calculating the gradient (2.5) is changed to

\frac{\partial y_i^{(l)}}{\partial w_{i,j}^{(l)}} = x_j^{(l)}, \qquad \frac{\partial y_i^{(l)}}{\partial w_{i,j,j_1}^{(l)}} = x_j^{(l)} x_{j_1}^{(l)}.

If l < s ≤ L, then we get

\frac{\partial y_k^{(s)}}{\partial w_{i,j}^{(l)}} = \sum_{p=1}^{m^{(l)}} \left( w_{i,p}^{(s)} + 2 \sum_{j_1=1}^{m^{(l)}} w_{i,p,j_1}^{(s)} x_{j_1}^{(s)} \right) \frac{\partial x_p^{(s)}}{\partial w_{i,j}^{(l)}}.

Formula (2.8) is preserved with the corresponding changes in the matrix W^{(l)}. Similar changes occur in Eqs. (2.10) and (2.13). In matrix notation we have y^{(l)} = W^{(l)} x^{(l)} + W^{(l)}(2)(x^{(l)}, x^{(l)}), where W^{(l)}(2) is a vector-valued quadratic form corresponding to a three-dimensional matrix. Formula (2.7) takes the form

\frac{\partial y^{(s)}}{\partial w_{i,j}^{(l)}} = W^{(l)} \frac{\partial x^{(s)}}{\partial w_{i,j}^{(l)}} + 2\, W^{(l)}(2)\!\left( x^{(l)}, \frac{\partial x^{(s)}}{\partial w_{i,j}^{(l)}} \right). \qquad (2.14)

Obviously, these formulas are generalized to the case of higher degrees, for example,

y^{(l)} = \sum_{p=1}^{P} W^{(l)}(p)\left( \underbrace{x^{(l)}, x^{(l)}, \ldots, x^{(l)}}_{p} \right)

and

\frac{\partial y^{(s)}}{\partial w_{i,j}^{(l)}} = \sum_{p=1}^{P} W^{(l)}(p)\left( \underbrace{x^{(l)}, \ldots, x^{(l)}}_{p-1}, \frac{\partial x^{(s)}}{\partial w_{i,j}^{(l)}} \right).

Formula (2.6) remains valid if we replace W^{(l)} by \sum_{p=1}^{P} p\, W^{(l)}(p)\left( \underbrace{x^{(l)}, \ldots, x^{(l)}}_{p-1} \right).

2.1.2 The determination of the initial values of the weights of the perceptron

In order to start the learning process of the network, it is necessary to assign some initial values to the weights. The easiest way would be to take them equal to zero, but then, in accordance with Eq. (2.6), the derivatives will also be equal to zero, i.e., gradient methods cannot be started. We recommend taking random numbers as initial weights, for example, choosing w_{i,j}^{(l)} as independent values uniformly distributed on the segment [−1/m^{(l)}; 1/m^{(l)}] or on the segment [−1/\sqrt{m^{(l)}}; 1/\sqrt{m^{(l)}}]. The output weights can be initialized to zero. Another option is to take the weights as independent normally distributed values with zero expectation and variance 1/m^{(l)}. A fundamentally different way of determining the initial weights is to form a network for which the error is small enough from the start. To do this, we propose to use the methods that we have outlined in Chapter 5.
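A short sketch of the recommended random initializations; the layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def init_uniform(m_in, m_out, scale="sqrt"):
    """Random initial weights for one layer: uniform on [-1/m, 1/m] or on
    [-1/sqrt(m), 1/sqrt(m)], where m is the number of inputs of the layer."""
    bound = 1.0 / np.sqrt(m_in) if scale == "sqrt" else 1.0 / m_in
    return rng.uniform(-bound, bound, size=(m_out, m_in + 1))  # +1 for the bias

def init_normal(m_in, m_out):
    """Independent normal weights with zero mean and variance 1/m."""
    return rng.normal(0.0, 1.0 / np.sqrt(m_in), size=(m_out, m_in + 1))

hidden = init_uniform(2, 10)          # hidden layer of a 2-10-1 network
output = np.zeros((1, 11))            # output weights may start from zero
```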


2.2 Networks with radial basis functions (RBF)
2.2.1 The architecture of RBF networks

The main difference between this type of network and the perceptron is that in the RBF-network each neuron is responsible for a local approximation in the neighborhood of some point, and not for the difference between the values of the function in half-spaces, as in the case of the perceptron. It can be concluded that the perceptron is better suited for extrapolation, which continues the function into the area where its values are not set, and for modeling problems with jumps (discontinuities), such as phase transitions or shock waves. RBF-networks are preferred in problems characterized by a smooth change of the solution.

2.2.2 Radial basis functions

In accordance with the Ritz-Galerkin method, the solution of the problem (1.1) is sought in the form

u = \sum_{i=1}^{m} c_i v_i(x, a_i),

where w_i = (c_i, a_i) is a vector of weight coefficients and v is a basis function. One example of such a basis function is a function of the form v_i(x) = v(‖x − x_i^0‖, α_i) with centers x_i^0. Such basis functions are called radial basis functions. The selection of the coefficients (c_i, α_i, x_i^0), where c_i = (c_{i,1}, c_{i,2}, …, c_{i,m_1}), in order to minimize the error functional is called training the network. Most often, functions of the form v_i(x) = φ(α_i‖x − x_i^0‖) are used, where φ is a function of a single variable called an activation function. In this case, the network output is determined by the formula

u = \sum_{i=1}^{m} c_i \varphi\!\left( \alpha_i \left\| x - x_i^0 \right\| \right). \qquad (2.15)

A more general approach is to use a function of the form v_i(x) = v(x − x_i^0, a_i), where a_i is a vector of nonlinear input parameters. Since the basis functions have several parameters which, like the weights, ought to be optimized in the learning process, it is advisable to introduce the concept of neuro elements. A neuro element c·v, where v is the simplest neural network basis function (for an RBF-network or a perceptron with one hidden layer), is determined by an activation function φ of one variable t ∈ R: v(x; a) = φ(t), t = τ(x; a), a ∈ R^m. In the case of artificial neural networks with a special architecture, it is possible to use different types of neuro elements with different activation functions. Some problems use more complex types of neuro elements, for example, Gaussians with polynomial coefficients.


The most common activation function of the RBF-network is the Gauss function φ(x) = e^{−x²}; its derivative is φ'(x) = −2x e^{−x²}. The Cauchy function φ(x) = 1/(1 + x²), with derivative φ'(x) = −2x/(1 + x²)², has a similar graph but decreases more slowly than the Gauss function when moving away from the center. To solve differential equations more effectively and to reduce the number of neurons in the hidden layer, RBF functions with shape parameters are sometimes used. An example of such a function is the inverse multiquadric of Hardy (MQ) φ(x) = (x² + x̂²)^{−0.5β} [4], whose form depends on the parameter β; its derivative has the form φ'(x) = −βx(x² + x̂²)^{−0.5β−1}. The family of generalized multiquadrics allows using two parameters, of scale and of shape, to select the type of neuro elements of the hidden layer of neurons. The generalized multiquadric is determined by the relation φ(x) = (1 + (εx)²)^{−β}, and its derivative is equal to φ'(x) = −2βε²x(1 + (εx)²)^{−β−1}.
Using gradient methods for training RBF-networks requires the calculation of the derivatives with respect to the parameters. We give specific formulas for the case v_j(x) = exp(−α_j‖x − x_j^0‖²), where ‖x − x_j^0‖² = \sum_{i=1}^{k} (x_i − x_{j,i}^0)², k being the dimension of x:

\frac{\partial v_i(x)}{\partial \alpha_i} = -v_i(x) \left\| x - x_i^0 \right\|^2 = -\exp\!\left( -\alpha_i \left\| x - x_i^0 \right\|^2 \right) \sum_{s=1}^{k} (x_s - x_{i,s}^0)^2;

\frac{\partial v_i(x)}{\partial x_{i,j}^0} = 2 \alpha_i v_i(x) \left( x_j - x_{i,j}^0 \right).
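A sketch of a Gaussian neuro element returning its value together with the derivatives with respect to the width and the center, checked against a finite difference; the numerical values are illustrative.

```python
import numpy as np

def gaussian_rbf(x, alpha, center):
    """v(x) = exp(-alpha * ||x - center||^2) together with its derivatives
    with respect to the width alpha and the center coordinates."""
    diff = np.asarray(x) - np.asarray(center)
    r2 = np.dot(diff, diff)
    v = np.exp(-alpha * r2)
    dv_dalpha = -v * r2                       # formula for d v / d alpha
    dv_dcenter = 2.0 * alpha * v * diff       # formula for d v / d x^0
    return v, dv_dalpha, dv_dcenter

# Quick finite-difference check of the derivative with respect to alpha.
x, alpha, c, eps = np.array([0.4, -0.2]), 1.5, np.array([0.1, 0.3]), 1e-6
v, dva, dvc = gaussian_rbf(x, alpha, c)
v_eps, _, _ = gaussian_rbf(x, alpha + eps, c)
print(dva, (v_eps - v) / eps)                 # should nearly coincide
```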

xs  xi,s ; ∂αi s¼1   ∂vi ðx Þ ¼ 2αi vi ðx Þ xj  xi,0 j : ∂xi,0 j

In the anisotropic case, we can take vi ¼ vi(ρ, v), where ρ ¼ kx  x0ik, v is a vector on a unit sphere that can be parametrized by the corresponding spherical coordinates. An explicit expression of the error functional through the parameters (weights) of the neural network greatly simplifies the process of minimization—network configuration. As an example, we find the functional expression for the model problem formulated in Section 1.2.1. To do this, we present an approximate solution of the Dirichlet problem for the Laplace equation in a unit circle as the output of ANN, constructed on the basis of N Gaussian neuro elements: N n h io X ci exp αi ðx  xi Þ2 + ðy  yi Þ2 ¼ uðx, y Þ ¼ i¼1

¼

N X

 ci exp αi r 2 + ri2  2rr i cos ðυ  υi Þ

i¼1

where x ¼ r cos ϑ, y ¼ r sin ϑ, xi ¼ ri cos ϑi, yi ¼ ri sin ϑi.

61

62

Chapter 2 THE CHOICE OF THE FUNCTIONAL BASIS (SET OF BASES)

As a minimized functional we choose the Dirichlet integral; we introduce a boundary condition into the functional in the form of a summand with a positive penalty factor δ ð ðð " 2  2 # ∂u ∂u + dxdy + δ ½u  f 2 dϑ J ð uÞ ¼ ∂x ∂y Ω

∂Ω

Using polar coordinates and integrating over the angular variable in the first integral, we obtain a representation of the functional in the form (1.2), where
$$
\begin{aligned}
J_1=\sum_{i,k=1}^{m} 8\pi c_i c_k a_i a_k
\exp\!\left\{-\left[a_i\!\left(x_i^2+y_i^2\right)+a_k\!\left(x_k^2+y_k^2\right)\right]\right\}
\int_0^1 \exp\!\left[-(a_i+a_k)\rho^2\right]\times\\
\left\{\left(\rho^2+x_i x_k+y_i y_k\right)
I_0\!\left(2\rho\sqrt{(a_i x_i+a_k x_k)^2+(a_i y_i+a_k y_k)^2}\right)\right.\\
\left.+\frac{\left[(x_i+x_k)(a_i x_i+a_k x_k)-(y_i+y_k)(a_i y_i+a_k y_k)\right]\rho}
{\sqrt{(a_i x_i+a_k x_k)^2+(a_i y_i+a_k y_k)^2}}\,
I_1\!\left(2\rho\sqrt{(a_i x_i+a_k x_k)^2+(a_i y_i+a_k y_k)^2}\right)\right\}\rho\,d\rho
\end{aligned}
\qquad (2.16)
$$

where $I_n(z)$ is the modified Bessel function of order $n$, and
$$J_2=\int_0^{2\pi}\left\{\sum_{i=1}^{m}c_i\exp\!\left[-a_i\!\left(1+x_i^2+y_i^2-2\sqrt{x_i^2+y_i^2}\,\cos(\vartheta-\vartheta_i)\right)\right]-f(\vartheta)\right\}^2 d\vartheta.$$

The integral (2.16) can be represented as three summands, which reduce to two integrals depending on two combined parameters:
$$
\begin{aligned}
J_1=\sum_{i,k=1}^{m} 8\pi c_i c_k a_i a_k
\exp\!\left\{-\left[a_i\!\left(x_i^2+y_i^2\right)+a_k\!\left(x_k^2+y_k^2\right)\right]\right\}
\Big\{A_{i,k}\exp\!\left[-(a_i+a_k)\right]
I_0\!\left(2\sqrt{(a_i x_i+a_k x_k)^2+(a_i y_i+a_k y_k)^2}\right)\\
+B_{i,k}\,K_1\!\left(a_i+a_k,\ \sqrt{(a_i x_i+a_k x_k)^2+(a_i y_i+a_k y_k)^2}\right)
+C_{i,k}\,K_3\!\left(a_i+a_k,\ \sqrt{(a_i x_i+a_k x_k)^2+(a_i y_i+a_k y_k)^2}\right)\Big\},
\end{aligned}
$$
where
$$A_{i,k}=\frac{(x_i+x_k)(a_i x_i+a_k x_k)-(y_i+y_k)(a_i y_i+a_k y_k)}{2\left[(a_i x_i+a_k x_k)^2+(a_i y_i+a_k y_k)^2\right]},\qquad (2.17)$$

$$B_{i,k}=\left(x_i x_k+y_i y_k\right)-2A_{i,k},\qquad (2.18)$$
$$C_{i,k}=1+2(a_i+a_k)A_{i,k},\qquad (2.19)$$
$$K_p(\alpha,\beta)=\int_0^1 \exp\!\left(-\alpha\rho^2\right) I_0(2\beta\rho)\,\rho^p\,d\rho,\qquad p=1,3.\qquad (2.20)$$

In order not to calculate these integrals at each step of the iterative process, they can be computed on a sufficiently representative set of points and then interpolated, for example, using a separate RBF-network. The penalty term can also be simplified and converted to the form
$$
\begin{aligned}
J_2=2\pi\sum_{i,k=1}^{m}c_i c_k\exp\!\left\{-\left[a_i\!\left(1+x_i^2+y_i^2\right)+a_k\!\left(1+x_k^2+y_k^2\right)\right]\right\}
I_0\!\left(2\sqrt{(a_i x_i+a_k x_k)^2+(a_i y_i+a_k y_k)^2}\right)\\
-2\sum_{i=1}^{m}c_i\exp\!\left[-a_i\!\left(1+x_i^2+y_i^2\right)\right]\int_0^{2\pi}\exp\!\left[2a_i\left(x_i\cos\vartheta+y_i\sin\vartheta\right)\right]f(\vartheta)\,d\vartheta
+\int_0^{2\pi}\left[f(\vartheta)\right]^2 d\vartheta.
\end{aligned}
$$

Note that the governing equations for the coefficients $c_i$ will be linear in the case of the quadratic functional. For a linear problem, the action of the operator reduces to calculating it for each of the basis functions $v$: $\Delta u=\sum_{i=1}^{N}c_i\,\Delta v\!\left(\alpha_i\,\|y-\hat{y}_i\|\right)$. The expression for the Laplacian in the case of the Gauss function has the form
$$\Delta u=4\sum_{i=1}^{N}c_i\exp\!\left\{-a_i\!\left[(x-x_i)^2+(y-y_i)^2\right]\right\}\left\{a_i^2\!\left[(x-x_i)^2+(y-y_i)^2\right]-a_i\right\},$$
or, in the case of the Cauchy function,
$$\Delta u=4\sum_{i=1}^{N}c_i\left\{a_i\!\left[(x-x_i)^2+(y-y_i)^2\right]+1\right\}^{-3}\left\{a_i^2\!\left[(x-x_i)^2+(y-y_i)^2\right]-a_i\right\}.$$
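A minimal numerical check of the Gaussian case, written in Python with NumPy (helper names are ours, not from the book): it evaluates the network output and its Laplacian from the closed-form expression above.

```python
import numpy as np

def rbf_net_and_laplacian(x, y, c, xc, yc, a):
    """u(x, y) and its Laplacian for a Gaussian RBF network.
    c, a: (N,) coefficients and widths; xc, yc: (N,) centers."""
    r2 = (x - xc) ** 2 + (y - yc) ** 2              # squared distances to the centers
    g = np.exp(-a * r2)
    u = np.sum(c * g)
    lap = 4.0 * np.sum(c * g * (a ** 2 * r2 - a))   # closed-form Laplacian
    return u, lap

# Example with three neuro elements
u, lap = rbf_net_and_laplacian(0.2, 0.1,
                               c=np.array([1.0, -0.5, 0.3]),
                               xc=np.array([0.0, 0.5, -0.4]),
                               yc=np.array([0.0, -0.2, 0.3]),
                               a=np.array([2.0, 1.0, 3.0]))
```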

Another approach should be noted, in which we choose as the functions $v$ the fundamental solutions of a linear differential operator with centers outside the region where the solution is sought. In this way the differential equation is satisfied exactly, and the training of the network reduces to satisfying the boundary conditions. In the case of our model problem,
$$u(x,y)=\sum_{i=1}^{N}c_i\ln\!\left[(x-x_i)^2+(y-y_i)^2\right]=\sum_{i=1}^{N}c_i\ln\!\left[r^2+r_i^2-2rr_i\cos(\vartheta-\vartheta_i)\right],$$


and as the minimized functional one can take, for example, the following variant:
$$J(u)=\int_0^{2\pi}\left[\sum_{i=1}^{N}c_i\ln\!\left(1+r_i^2-2r_i\cos(\vartheta-\vartheta_i)\right)-f(\vartheta)\right]^2 d\vartheta.$$

It is known that such functions can approximate the solution as accurately as necessary; the corresponding estimates and a detailed discussion are given in [5]. From here we can see another way of constructing an RBF-network: we approximate the boundary data (for example, in the mean-square sense) by radial basis functions $v$, and then compute the solution of the Dirichlet problem $u_N(x, y)$ by Poisson's formula, choosing the following approximation as the boundary condition:
$$f_N(\phi)=\sum_{j=1}^{N}c_j\,\nu\!\left(1-2r_j\cos(\phi-\vartheta_j)+r_j^2\right).$$

Simple calculations show that in the case of Gaussian basis functions the solution $u_N$ has the form
$$u_N(r,\vartheta)=\sum_{k=1}^{N}c_k\exp\!\left[-a_k\!\left(1+r_k^2\right)\right]\sum_{n=0}^{\infty}\delta_n\,r^{n}\,I_n(2a_k r_k)\cos\!\left(n(\vartheta-\vartheta_k)\right),$$
where $I_n(z)$ is the modified Bessel function of order $n$, $\delta_0=1$, $\delta_n=2$ for $n\ge 1$.
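A minimal sketch of evaluating this series numerically (Python with SciPy; the truncation order n_max and the function name are our assumptions): the infinite sum is simply cut off once the higher-order terms become negligible.

```python
import numpy as np
from scipy.special import iv  # modified Bessel function I_n

def u_series(r, theta, c, a, rk, thetak, n_max=40):
    """Truncated Fourier-Bessel series for the Dirichlet solution with Gaussian
    boundary data; c, a, rk, thetak are (N,) arrays of network parameters."""
    u = 0.0
    for ck, ak, rho_k, th_k in zip(c, a, rk, thetak):
        pref = ck * np.exp(-ak * (1.0 + rho_k ** 2))
        s = iv(0, 2.0 * ak * rho_k)                 # n = 0 term, delta_0 = 1
        for n in range(1, n_max + 1):               # delta_n = 2 for n >= 1
            s += 2.0 * r ** n * iv(n, 2.0 * ak * rho_k) * np.cos(n * (theta - th_k))
        u += pref * s
    return u
```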

We can also obtain an approximation for the solution not in the form of a Fourier series but as a finite sum of rather cumbersome expressions composed of elementary functions, if we approximate a Gaussian package of the form
$$v=\exp\!\left\{-\alpha\left[(x-x_c)^2+(y-y_c)^2\right]\right\}\approx\left\{1-\alpha\left[(x-x_c)^2+(y-y_c)^2\right]\right\}_{+}.$$
This approximation of the solution has the form
$$
\begin{aligned}
u_N(r,\vartheta)=\pi^{-1}\left(1-r^2\right)\sum_{i=1}^{N}c_i a_i r_i
\left\{\frac{\sin\beta_i}{r}\left[\alpha_i\,\frac{-2r\cos\alpha_i+\left(1+r^2\right)\cos\beta_i}{\left|1-r^2\right|}
+\ln\frac{1-2r\cos(\alpha_i-\beta_i)+r^2}{1-2r\cos(\alpha_i+\beta_i)+r^2}\right]\right.\\
\left.-\,\frac{2\left(1-r^2\right)\left(1+2r\cos\beta_i+r^2\right)}
{\left|1-r^2\right|^2+4r^2\sin^2\beta_i-\left(1+2r\cos\beta_i+r^2\right)^2}\,
\operatorname{arctg}\!\left[\frac{a_i^{-1}-\left(r_i-1\right)^2}{\left(r_i+1\right)^2-a_i^{-1}}\,\operatorname{tg}\!\left(\frac{\alpha_i}{2}\right)\right]\right\},
\end{aligned}
$$
where $x_i=r_i\cos\vartheta_i$, $y_i=r_i\sin\vartheta_i$, $\alpha_i=\arccos\dfrac{1+r_i^2-a_i^{-1}}{2r_i}$, $(1-r_i)^2\le a_i^{-1}\le(1+r_i)^2$, and $\beta_i=\vartheta-\vartheta_i$.

An even more convenient result is obtained by using the Cauchy kernel: in this case the Poisson integral is computed explicitly, so that by approximating the boundary condition we immediately obtain an approximation to the solution of the Dirichlet problem (in the same notation)
$$u_N(r,\vartheta)=\sum_{i=1}^{N}c_i\left\{A_i\,\operatorname{sign}\!\left(1-r^2\right)+B_i\,\frac{1-r^2}{\sqrt{a_i^4+2a_i^2\left(1-r_i^2\right)+\left(1-r_i^2\right)^2}}\right\},$$
where
$$A_i=\frac{r\left[r\left(a_i^2+1+r_i^2\right)-r_i\left(1+r^2\right)\cos\beta_i\right]}
{r_i^2\left(1+r^2\right)^2-2r_i r\left(1+r^2\right)\left(a_i^2+1+r_i^2\right)\cos\beta_i+r^2\left(a_i^2+1+r_i^2\right)^2-4r_i^2 r^2\sin^2\beta_i},$$
$$B_i=\frac{r_i\left[r_i\left(1+r^2\right)-r\left(a_i^2+1+r_i^2\right)\cos\beta_i\right]}
{r_i^2\left(1+r^2\right)^2-2r_i r\left(1+r^2\right)\left(a_i^2+1+r_i^2\right)\cos\beta_i+r^2\left(a_i^2+1+r_i^2\right)^2-4r_i^2 r^2\sin^2\beta_i}.$$

In some cases, usually associated with the symmetry of the problem and a special kind of boundary data, it is possible to obtain explicit formulas for the error functional or for the approximate neural network solution itself, which, of course, accelerates the neurocomputing process. Just as it is done in [6], it is possible to look for the solution in the form of two summands: one satisfies the boundary condition and does not contain the adjustable parameters, and the other satisfies the equation taking into account the first summand and contains the adjustable parameters. This technique is suitable only for linear problems. The combination of this approach with a neural network approximation of the boundary data $f$ leads to another learning algorithm involving two networks: first, we construct an RBF (or other) neural network approximating the Dirichlet data, $f\approx f_N=\sum_{i=1}^{N}c_i v_i$; the function $f_N$ generates an approximation $u_N$ for the solution in the domain $\Omega$, and for $h=u-u_N$ we obtain a homogeneous Dirichlet problem for the Poisson equation $\Delta h=-\Delta u_N$, $h|_{\Gamma}=0$ (since only the boundary conditions are satisfied, not the Laplace equation itself).


We construct a second network that approximates $g=-\Delta u_N$, $g\approx\sum_{i=1}^{K}d_i\psi_i$. Solving a series of special homogeneous Poisson problems $\Delta h_i=\psi_i$, $h_i|_{\Gamma}=0$, $i=1,\dots,K$, we get $h\approx\sum_{i=1}^{K}d_i h_i$ and, therefore, $u=u_N+h$.

The solution of each such problem for a particular RB-function is given by explicit formulas:
$$h(r,\vartheta)=\frac{1}{2\pi}\int_{\Omega}\ln\frac{r^2-2r\rho\cos(\vartheta-\phi)+\rho^2}{r^2\rho^2-2r\rho\cos(\vartheta-\phi)+1}\,\psi(\rho,\phi)\,\rho\,d\rho\,d\phi,$$
where
$$\psi(\rho,\phi)=\exp\!\left[-a_i\!\left(\rho^2+r_i^2-2\rho r_i\cos(\phi-\vartheta_i)\right)\right]
\quad\text{or}\quad
\psi(\rho,\phi)=\frac{1}{a_i\!\left(\rho^2+r_i^2-2\rho r_i\cos(\phi-\vartheta_i)\right)+1}.$$
For the problem of Section 1.2.2, taking into account its symmetry, we consider the following implementations of the proposed neural network approaches to the solution. An RBF-network with Gaussian basis functions was used as a single neural network:
$$u(x,y)=\sum_{i=1}^{N}c_i\exp\!\left\{-a_i\!\left[(x-x_i)^2+(y-y_i)^2\right]\right\}.$$

The network learning algorithm is a random search method with regeneration of the set of test points after each training stage. The Dirichlet integral (as one of the possible variants) is chosen as the summand $J_1$ of the error functional (1.3):
$$J_1=\iint_{\Omega}\left[\left(\frac{\partial u}{\partial x}\right)^2+\left(\frac{\partial u}{\partial y}\right)^2+2gu\right]dx\,dy.$$
Homogeneous boundary conditions are taken into account by introducing a penalty term into the error functional:
$$J_2=\int_0^{2\pi}u^2(\cos\vartheta,\sin\vartheta)\,d\vartheta.$$

Using the symmetry of the problem and the special form of the function $g$, we reduce $J_1$ to the form
$$
\begin{aligned}
J_1=\sum_{i,k=1}^{m} 8\pi c_i c_k a_i a_k
\exp\!\left\{-\left[a_i\!\left(x_i^2+y_i^2\right)+a_k\!\left(x_k^2+y_k^2\right)\right]\right\}
\Big\{A_{i,k}\exp\!\left[-(a_i+a_k)\right]
I_0\!\left(2\sqrt{(a_i x_i+a_k x_k)^2+(a_i y_i+a_k y_k)^2}\right)\\
+B_{i,k}\,K_1\!\left(a_i+a_k,\ \sqrt{(a_i x_i+a_k x_k)^2+(a_i y_i+a_k y_k)^2}\right)
+C_{i,k}\,K_3\!\left(a_i+a_k,\ \sqrt{(a_i x_i+a_k x_k)^2+(a_i y_i+a_k y_k)^2}\right)\Big\}\\
+4\pi A r^2\sum_{k=1}^{N}c_k\exp\!\left\{-a_k\!\left[(x_k-x_0)^2+(y_k-y_0)^2\right]\right\}
K_1\!\left(a_k r^2,\ a_k r\sqrt{(x_k-x_0)^2+(y_k-y_0)^2}\right),
\end{aligned}
$$

where $A_{i,k}$, $B_{i,k}$, $C_{i,k}$, $K_p(\alpha,\beta)$ are defined by formulas (2.17)–(2.20). The penalty term $J_2$ can also be simplified and converted to the form
$$J_2=2\pi\sum_{i,k=1}^{N}c_i c_k\exp\!\left\{-\left[a_i\!\left(1+x_i^2+y_i^2\right)+a_k\!\left(1+x_k^2+y_k^2\right)\right]\right\}
I_0\!\left(2\sqrt{(a_i x_i+a_k x_k)^2+(a_i y_i+a_k y_k)^2}\right).$$
Thus, the functional is reduced to a finite number of summands which, along with the modified Bessel function $I_0$, include only the two integrals $K_1$ and $K_3$ depending on two combined parameters. These integrals need not be calculated at each step of the iterative process; they can be computed on a sufficiently representative set of points and interpolated, for example, using a separate RBF-network. Having explicit formulas simplifies the network setup process.
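The following sketch (Python with SciPy; the grid ranges and names are our assumptions) illustrates this tabulate-and-interpolate idea for the integrals $K_1$ and $K_3$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import iv
from scipy.interpolate import RectBivariateSpline

def K_p(alpha, beta, p):
    """K_p(alpha, beta) = int_0^1 exp(-alpha*rho^2) * I_0(2*beta*rho) * rho^p d(rho)."""
    val, _ = quad(lambda rho: np.exp(-alpha * rho ** 2) * iv(0, 2 * beta * rho) * rho ** p,
                  0.0, 1.0)
    return val

# Tabulate K_1 on a grid of the two combined parameters and build an interpolant,
# so the optimizer never re-evaluates the quadrature during training.
alphas = np.linspace(0.1, 20.0, 60)
betas = np.linspace(0.0, 10.0, 60)
table = np.array([[K_p(a, b, p=1) for b in betas] for a in alphas])
K1_interp = RectBivariateSpline(alphas, betas, table)

approx = K1_interp(3.7, 2.2)[0, 0]   # cheap surrogate used inside the error functional
```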

2.2.3 Asymmetric RBF-networks

The main feature of the basis functions $v_j$ considered above is that all directions from the center are equivalent. If the desired solution has this property, it is a virtue; if not, it is a disadvantage. When not all directions are equal, other functions are more attractive. In the simpler variant, when the anisotropy manifests itself only in the inequality of the coordinates, we recommend taking
$$v_j=\exp\!\left(-\sum_{i=1}^{k}\alpha_{j,i}\left(x_i-x_{j,i}^0\right)^2\right).\qquad (2.21)$$

Differentiating (2.21), we obtain
$$\frac{\partial v_i(x_n)}{\partial \alpha_{i,j}}=-v_i(x_n)\left(x_{n,j}-x_{i,j}^0\right)^2;$$
the rest of the formulas for the derivatives remain unchanged, only with $\alpha_j$ replaced by $\alpha_{j,i}$.


A more complex situation occurs if the characteristic directions are at an angle to the original coordinate system. In this case, it makes sense to use
$$v_j=\exp\!\left(-\sum_{s=1}^{k}\sum_{i=1}^{k}\alpha_{j,i,s}\left(x_i-x_{j,i}^0\right)\left(x_s-x_{j,s}^0\right)\right).\qquad (2.22)$$

In this case, as usual, we consider $\alpha_{j,i,s}=\alpha_{j,s,i}$ or, doubling the coefficients of the nondiagonal terms, take the sum in the exponent only over indices satisfying $s\le i$. Differentiating (2.22), we obtain
$$\frac{\partial v_i(x_n)}{\partial \alpha_{i,j,s}}=-v_i(x_n)\left(x_{n,j}-x_{i,j}^0\right)\left(x_{n,s}-x_{i,s}^0\right);\qquad
\frac{\partial v_j(x_n)}{\partial x_{j,i}^0}=2v_j(x_n)\sum_{s=1}^{k}\alpha_{j,i,s}\left(x_{n,s}-x_{j,s}^0\right).$$

It is possible to use functions with other types of asymmetry. In problems with anisotropy, it is important to account for the behavior of the function along the rays emanating from the center $x_i^0$. In the two-dimensional case, as basis functions we take $v_i=v_i(\rho,\psi)$, where $\{\rho,\psi\}$ are the polar coordinates of the vector $x-x_i^0$ and $x(r,\phi)$ is the current point. The basis functions typical for the finite element method are obtained as a special case of the RBF-network with an appropriate choice of the function $v_i$. If the position of the element boundary relative to its center is characterized by some function $\rho=a_i(\psi)$, then we take $v_i=\left(1-\rho/a_i(\psi)\right)_{+}$, where, as usual, $w_{+}=w$ for $w\ge 0$ and $w_{+}=0$ for $w<0$. For a polygonal boundary we thereby obtain a piecewise linear basis function, for which $a_i(\psi)$ is obtained from the polar equations of the corresponding lines, i.e., on the corresponding intervals of $\psi$ we have $a_i(\psi)=\rho_i/\cos(\psi-\psi_i)$. If we want the basis function to have a zero derivative on the boundary, we choose $v_i=\left(1-\rho/a_i(\psi)\right)^2_{+}$. The basis function of the form $v_i=\left(1+2\rho/a_i(\psi)\right)\left(1-\rho/a_i(\psi)\right)^2_{+}$ gives a smooth vertex.

A further complication makes it possible to obtain a smooth surface with a polygonal basis, but we will not go into it here.


In the multidimensional case, we take $v_i=v_i(\rho,\nu)$, where $\rho=\|x-x_i^0\|$ and $\nu$ is a vector on the unit sphere, which can be parametrized by the corresponding coordinates.

2.3 Multilayer perceptron and RBF-networks with time delays

The solution of problems whose statements include time as a variable requires corresponding basis functions. As mentioned earlier, time can be viewed as an additional dimension similar to the spatial ones. However, special dynamic neural network functions, two types of which are described below, are sometimes more effective for this class of problems. For a multilayer perceptron with time delays, the connections go not only from one neuron to another at a fixed time but also from the outputs of the neurons corresponding to an earlier time to the neurons of the next layer corresponding to later moments. The formulas (2.3) and (2.4) are replaced by the relations

$$y_j(n)=\sum_{i=0}^{m-1}\sum_{t=0}^{p}w_{i,j}(t)\,x_i(n-t),\qquad (2.23)$$
$$x_j(n)=v_j\!\left(y_{j-1}(n)\right).\qquad (2.24)$$
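A minimal sketch of one such time-delay layer (Python with NumPy; the array shapes and the helper name are our assumptions, and inputs at negative times are simply treated as zero).

```python
import numpy as np

def time_delay_layer(x, w, phi=np.tanh):
    """x: (m, T) input signals over discrete time; w: (m, n_out, p+1) delay weights w[i, j, t].
    Implements y_j(n) = sum_i sum_t w[i, j, t] * x_i(n - t), followed by the activation."""
    m, T = x.shape
    _, n_out, p1 = w.shape
    y = np.zeros((n_out, T))
    for n in range(T):                       # output time step
        for t in range(min(p1, n + 1)):      # delay index, truncated at the start of the record
            y[:, n] += w[:, :, t].T @ x[:, n - t]
    return phi(y)

# Example: 2 input signals, 3 delayed taps, 4 output neurons, 50 time steps
out = time_delay_layer(np.random.randn(2, 50), np.random.randn(2, 4, 3))
```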

Time-delayed RBF networks could be thought of as ordinary RBF-functions with an additional coordinate, time, were it not for two circumstances. Firstly, for problems of constructing a solution taking experimental data into account, the observations $\{x_n(t), y_n(t)\}$ arrive at discrete moments, while the remaining coordinates $x$ are considered real and continuous. This is not the biggest problem, since these discrete moments can be considered to be located on an additional coordinate axis. Secondly, if we consider real-time network training, i.e., additional training of the network as new observations arrive, the time axis will differ significantly from the others. To address these issues, we propose the following forms of temporal RBF-networks. The first form of temporal RBF-networks allows us to simulate situations where certain events occur whose effect weakens over time. In this case, formula (2.21) is converted to
$$y=\sum_{j=1}^{m}c_j\,v_j\!\left(x-\hat{x}_j,\,t-\hat{t}_j\right).\qquad (2.25)$$


Here, as a basis function $v_j$ one can take, for example, the Gauss function
$$v_j(x,t)=\exp\!\left(-\alpha_j\left\|x-\hat{x}_j\right\|^2-\beta_j\left(t-\hat{t}_j\right)^2\right)\qquad (2.26)$$
or the Cauchy function
$$v_j(x,t)=\frac{1}{1+\alpha_j\left\|x-\hat{x}_j\right\|^2+\beta_j\left(t-\hat{t}_j\right)^2}.\qquad (2.27)$$
A mixed version is also possible:
$$v_j(x,t)=\frac{\exp\!\left(-\alpha_j\left\|x-\hat{x}_j\right\|^2\right)}{1+\beta_j\left(t-\hat{t}_j\right)^2};\qquad
v_j(x,t)=\frac{\exp\!\left(-\beta_j\left(t-\hat{t}_j\right)^2\right)}{1+\alpha_j\left\|x-\hat{x}_j\right\|^2}.$$

In this case, the meaning of $\hat{t}_j$ is determined by the features of the problem:
1. The $\hat{t}_j$ are known moments. In this case, we find the remaining coefficients using conventional optimization procedures or in another way typical for RBF-networks.
2. We find the moments $\hat{t}_j$ and centers $\hat{x}_j$ using a clustering procedure.
3. We find the weights $\hat{t}_j$, $\hat{x}_j$, $\alpha_j$, $\beta_j$, $c_j$ with the help of some algorithm minimizing the error functional.
All three options have one feature in common: the influence of the event extends not only forward in time but also backward. If the event under consideration is caused by the internal prerequisites of the system under study, such a model is justified and even allows us to predict the culmination of the event before its occurrence. The third variant of interpreting the time parameters is especially well adapted for this, since it does not prevent one or several moments $\hat{t}_j$ from being later than the entire sample under consideration. If the event is caused by external factors that did not manifest themselves before its onset, and the impact of this event fades with time, then we recommend using the basis function
$$v_j=\begin{cases}\exp\!\left(-\alpha_j\left\|x-\hat{x}_j\right\|^2-\beta_j\left(t-\hat{t}_j\right)^2\right), & t\ge\hat{t}_j,\\ 0, & t<\hat{t}_j,\end{cases}\qquad (2.28)$$
or the basis function


$$v_j=\begin{cases}\exp\!\left(-\alpha_j\left\|x-\hat{x}_j\right\|^2-\beta_j\left(t-\hat{t}_j\right)\right), & t\ge\hat{t}_j,\\ 0, & t<\hat{t}_j.\end{cases}\qquad (2.29)$$

The latter function decreases slowly with time. Even more slowly decreases the Cauchy function, which, depending on the problem, we use either only in time,
$$v_j=\begin{cases}\dfrac{\exp\!\left(-\alpha_j\left\|x-\hat{x}_j\right\|^2\right)}{1+\beta_j\left(t-\hat{t}_j\right)^2}, & t\ge\hat{t}_j,\\[1ex] 0, & t<\hat{t}_j,\end{cases}\qquad (2.30)$$
or both in space and time,
$$v_j=\begin{cases}\dfrac{1}{1+\alpha_j\left\|x-\hat{x}_j\right\|^2+\beta_j\left(t-\hat{t}_j\right)^2}, & t\ge\hat{t}_j,\\[1ex] 0, & t<\hat{t}_j.\end{cases}\qquad (2.31)$$
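A minimal sketch of such one-sided space-time basis functions (Python with NumPy; the function names are ours), covering the Gaussian variant (2.28) and the time-Cauchy variant (2.30).

```python
import numpy as np

def v_onesided_gauss(x, t, x_hat, t_hat, alpha, beta):
    """(2.28): Gaussian in space and time, switched on only for t >= t_hat."""
    r2 = np.sum((np.asarray(x) - np.asarray(x_hat)) ** 2)
    return np.exp(-alpha * r2 - beta * (t - t_hat) ** 2) if t >= t_hat else 0.0

def v_onesided_time_cauchy(x, t, x_hat, t_hat, alpha, beta):
    """(2.30): Gaussian in space, Cauchy decay in time, switched on only for t >= t_hat."""
    r2 = np.sum((np.asarray(x) - np.asarray(x_hat)) ** 2)
    return np.exp(-alpha * r2) / (1.0 + beta * (t - t_hat) ** 2) if t >= t_hat else 0.0

# Example: an event centered at t_hat = 1.0 does not influence earlier times
print(v_onesided_gauss([0.2, 0.1], 0.5, [0.0, 0.0], 1.0, alpha=2.0, beta=0.5))  # 0.0
```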

These one-way basis functions allow the same three options for working with the parameters $\hat{t}_j$ as the two-way functions presented above. The only difference is that the moments $\hat{t}_j$ are associated not with the centers of the clusters in question but with their beginnings. For problems with pronounced anisotropy, it is advisable to use a set of basis functions adapted for them. If the anisotropy manifests itself only in the inequality of the coordinates, then instead of (2.26) we take as a basis function
$$v_j=\exp\!\left(-\sum_{i=1}^{k}\alpha_{j,i}\left(x_i-\hat{x}_{j,i}\right)^2-\beta_j\left(t-\hat{t}_j\right)^2\right),$$
and if the characteristic directions are not directed along the coordinate axes, then we use basis functions of the form
$$v_j=\exp\!\left(-\sum_{i,s=1}^{k}\alpha_{j,i,s}\left(x_i-\hat{x}_{j,i}\right)\left(x_s-\hat{x}_{j,s}\right)-\beta_j\left(t-\hat{t}_j\right)^2\right).$$
In the same way, we transform formulas (2.28)–(2.30). The second form of temporal RBF-networks is designed to simulate the dynamics of systems with several interacting centers. For it, formula (2.15) is replaced by

$$u=\sum_{j=1}^{m}c_j(t)\,v_j\!\left(\left\|x-\hat{x}_j(t)\right\|,\,\alpha_j(t)\right),\qquad (2.32)$$
where we take the basis function, for example, in the form
$$v_j=\exp\!\left(-\alpha_j(t)\left\|x-\hat{x}_j(t)\right\|^2\right).\qquad (2.33)$$


It is also possible to use the other basis functions listed above, modifying them accordingly. Neural networks with different architectures are used to solve differential equations; we mainly used RBF-networks and perceptrons. The use of delays at the input of the neural network and of delays between layers allows us to obtain dynamic networks that take time into account. The use of neural networks with sufficiently smooth basis functions in solving differential equations makes it possible to obtain as an approximate solution a function with the desired smoothness. If the set of neural network architectures considered in this chapter is not sufficient to solve some problem, we recommend referring to the fundamental monograph [1] and to contemporary publications on the application of neural networks to differential equations.

References
[1] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999, 823 p.
[2] A.N. Gorban, D.A. Rossiev, Neural Networks on a Personal Computer, Nauka, Novosibirsk, 1996, 276 p. (in Russian).
[3] J.-B. Hiriart-Urruty, C. Lemaréchal, Fundamentals of Convex Analysis, Springer, 2001, 418 p.
[4] R.L. Hardy, Theory and applications of the multiquadric-biharmonic method, Comput. Math. Appl. 19 (8/9) (1990) 163–208.
[5] M.A. Aleksidze, Fundamental Functions in Approximate Solutions of Boundary Value Problems, Nauka, Moscow, 1991, 352 p. (in Russian).
[6] I.E. Lagaris, A. Likas, D.I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE Trans. Neural Netw. 9 (5) (1998) 987–1000.

3 Methods for the selection of parameters and structure of the neural network model

After the problem has been formulated as functional optimization and the type of the basis functions has been chosen, it is necessary to select an algorithm for finding the decomposition coefficients of the approximate neural network solution of problem (1.1) with respect to the chosen basis, together with a method for restructuring the neural network. If necessary, the type of the basis elements can also be fitted. Justifying the network structure implies choosing the basis elements, the number of layers, and the number of elements in every layer. The selected algorithms should allow efficient parallel and distributed implementation. In this chapter, we create a complex of methods and algorithms for finding a neural network dependence y = f(x, w) from heterogeneous data. We suppose this data can be specified using differential equations or other relationships. We focus on finding the structure of the function f in conjunction with searching for the weight vector w. In the overwhelming majority of publications dedicated to the application of neural networks to mathematical modeling problems, only the weights are found, without selecting a structure, leaving the most important question of choosing a structure to imperfect user intuition. Our experience has shown that the implementation of evolutionary algorithms for selecting the network structure according to the problem being solved allows us to obtain a significantly more compact model without loss of modeling accuracy. Reducing the size of the model (the number of layers, the number of neurons, etc.) speeds up computations with the model and improves its predictive properties.



The general scheme of the algorithms that we offer for training neural networks includes the following sequence of steps:
1. Statement of the problem. The problem is formulated as a system of functionals; the type of the required model (we assume a neural network), the initial structure of the neural network, and other initial parameters are fixed. Most often, instead of a single initial structure, a set of structures is used.
2. Adaptation. The coefficients of the model (or models) are adapted in accordance with the selected algorithm (or algorithms) for optimizing the objective functional (or system of functionals).
3. Structure justification. The structure of the model (or models) is varied according to certain rules; usually another criterion (objective) functional is used here, which differs from the one used in the previous step.
4. Iteration. Steps 2 and 3 are repeated until the required quality of the model is achieved, or are performed throughout the period to which the adaptive model of the functioning object belongs.
For the implementation of the above algorithm, the initial sample is usually randomly divided into two parts, for training and testing. The training sample is used to form the error functional from which the network weights are estimated. The testing sample is used to build an auxiliary functional (or set of functionals) that is used to justify the structure. To control the quality of the network, a third sample, called the verification sample, is often used. When solving problems of mathematical physics, the training, testing, and verification samples are selected from generated sets of points, which protects against the insufficient sample size encountered in data processing tasks.
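A minimal sketch of such a three-way split for a (re)generated set of collocation points (Python with NumPy; the fractions and names are our assumptions).

```python
import numpy as np

def split_points(points, frac_train=0.6, frac_test=0.2, rng=np.random.default_rng(0)):
    """Randomly split generated collocation points into training, testing, and verification sets."""
    idx = rng.permutation(len(points))
    n_tr = int(frac_train * len(points))
    n_te = int(frac_test * len(points))
    return points[idx[:n_tr]], points[idx[n_tr:n_tr + n_te]], points[idx[n_tr + n_te:]]

train, test, verify = split_points(np.random.rand(1000, 2))  # e.g., points in the unit square
```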

3.1 Structural algorithms

In this section, we consider algorithms that determine the modification of the neural network structure. The simplest way to organize a structural algorithm is to add neurons as needed, creating the so-called growing network method. In this case, the errors in satisfying the equation, the boundary conditions, and the matching conditions at the junctions of subdomains, obtained in the course of network training, can be used to find the centers of clusters. It is then advisable to take the cluster centers as the initial approximation for the centers of the RBF network, training the network further if necessary. A similar procedure can be applied to the perceptron, but in this case the role of the


centers is played by the lines of maximum steepness (of switching, in the case of a discontinuous activation function). In the method of growing networks, the number of terms $N$ in the neural network decomposition is not constant but increases from step to step of the algorithm. The optimization task is performed by a sequence of independent steps. Note that this approach can be applied to heterogeneous networks with different elements $v=v_i$. Let the required level of accuracy $\eta_0>0$ be given (see $\eta$-solution in the Introduction and Chapter 1).
Step 1. Take $N=1$; then $u_\eta^1(x,t;w_1)=c_1 v_1(x,t;a_1)$. This expression for an approximate solution generates an optimization problem with only one vector parameter $w_1$. (At the first step, it is possible to reduce the dimension of the minimization set by specially selecting some components.) Having performed the minimization, we obtain
$$\eta_1=\min_{w_1}J\!\left(u_\eta^1\right)=\min_{w_1}E(w_1).$$

Obviously, this problem has a solution: $\eta_1\ge 0$, $w_1^0=(c_1^0,a_1^0)=\arg\min\eta_1$, implying $u_1^0(x,t)=c_1^0 v_1(x,t;a_1^0)$. If the required level of accuracy is achieved, $\eta_1\le\eta_0$, then the calculation process ends; otherwise we jump to Step 2.
Step 2. Take $N=2$. We use the solution $u_1^0(x,t)$ found at the first step; the function considered at the second step is then $u_\eta^2(x,t)=u_1^0(x,t)+c_2 v_2(x,t;a_2)$, where the second term enters the right-hand side with an undetermined weight $w_2=(c_2,a_2)$, with the parameters $c_2$, $a_2$ to be determined at this step. The minimization problem is then
$$\eta_2=\min_{w_2}J\!\left(u_\eta^2\right)=\min_{w_2}E(w_2),$$
where $E(w_2)$ is the result of substituting $u_\eta^2(x,t)$ into the general expression for the functional $J$. As a result, we obtain a new solution $\eta_2\ge 0$, $w_2^0=(c_2^0,a_2^0)=\arg\min\eta_2$, and the value $u_2^0(x,t)=c_2^0 v_2(x,t;a_2^0)$. It is obvious that $\eta_2\le\eta_1$. If for $N=2$ the result, expressed as the level of deviation, is acceptable, $\eta_2\le\eta_0$, then the process finishes; if not, we go to Step 3, and so on.
Step i (and subsequent iterations). We take $N=i$ and use the solutions $u_j^0(x,t)$ found at the previous steps; the function under investigation at Step $i$


looks as follows:
$$u_\eta^i(x,t)=\sum_{j=1}^{i-1}u_j^0(x,t)+c_i v_i(x,t;a_i),$$
where the last term on the right-hand side is a function of the same type as before (or of a given type), but with unknown parameters $w_i=(c_i,a_i)$. The weight vector $w_i$ is to be determined at this step. The minimization problem is
$$\eta_i=\min_{w_i}J\!\left(u_\eta^i\right)=\min_{w_i}E(w_i),$$
where $E(w_i)$ is the result of substituting $u_\eta^i(x,t)$ into the general expression for the functional $J$. It is obvious that $\eta_i\le\eta_{i-1}$. Repeating the process, we get a monotone sequence of deviations $\eta_1\ge\eta_2\ge\eta_3\ge\dots\ge\eta_m\ge\dots\ge 0$ and the corresponding sequence of weights $w_1^0=(c_1^0,a_1^0)$, $w_2^0=(c_2^0,a_2^0)$, ..., $w_m^0=(c_m^0,a_m^0)$, $m=3,4,\dots$. If at the current step the result (deviation) is not acceptable, the step is repeated until the current value $\eta_m$ becomes sufficiently small: $\eta_m\le\eta_0$. In this case, the iterative process is completed and an $\eta_m$-equivalent solution of the problem has been found. The structure of the weight matrix is shown below:
$$W=\begin{pmatrix}
w_1^0 & 0 & 0 & \cdots & 0 & 0\\
w_1^0 & w_2^0 & 0 & \cdots & 0 & 0\\
w_1^0 & w_2^0 & w_3^0 & \cdots & 0 & 0\\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots\\
w_1^0 & w_2^0 & w_3^0 & \cdots & w_{m-1}^0 & 0\\
w_1^0 & w_2^0 & w_3^0 & \cdots & w_{m-1}^0 & w_m^0
\end{pmatrix}$$
Note that the above arguments do not take into account the regeneration of the set of test points; however, if it is representative enough, such regeneration does not affect the computational process significantly. Note also that these approaches to setting up the ANN weights allow the following modification: they are applied to optimizing the entire set of parameters up to a certain limit (for example, a certain number of test point regenerations), and then the set of optimized parameters is narrowed, so that only some of them are tuned. For example, the parameters $c_i$ that enter the neural network representation of the solution linearly (and the functional quadratically, for linear problems) are found from a system of linear equations. Further in this section we present a few general structural algorithms for specific types of networks. When solving each problem, the details of the algorithm must be specified in accordance with its features and the available human and computing resources.
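A minimal sketch of this greedy growth loop (Python with SciPy; the callback J_partial, which evaluates the error functional of the frozen partial sum plus one candidate term, and all names are our assumptions, not the book's code).

```python
import numpy as np
from scipy.optimize import minimize

def grow_network(J_partial, dim_w, eta0, max_terms=30, restarts=5, rng=np.random.default_rng(0)):
    """Add one neuro element at a time; earlier terms stay frozen while the new one is fitted."""
    frozen = []                                   # weight vectors w_i = (c_i, a_i) of accepted terms
    eta = np.inf
    for _ in range(max_terms):
        best = None
        for _ in range(restarts):                 # several starts guard against local minima
            w0 = rng.standard_normal(dim_w)
            res = minimize(lambda w: J_partial(frozen, w), w0, method="BFGS")
            if best is None or res.fun < best.fun:
                best = res
        frozen.append(best.x)
        eta = best.fun
        if eta <= eta0:                           # required accuracy reached
            break
    return frozen, eta
```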


Genetic algorithm for adjusting the structure of a multilayer perceptron. The genetic algorithm implements the following steps.
1. Determination of the size of the initial population, the initial structure of the networks, the types of neurons (their activation functions), and other parameters of the algorithm.
2. Calculation of the principal components of the input sample collection [1, 2]. For each instance from the network population, we take as many principal components as the corresponding network has neurons in the first layer.
3. The principal components are fed to the input of the first layer of neurons.


layers of the connected networks must be the same, the layer of one network is connected to the next layer according to the number of another, and there should be no empty layers. Cross-links are endowed with weights in accordance with the following procedures for adding neurons. You can choose to cross the most dissimilar pairs of the best. At the same time, “dissimilarity” can be defined both by the dissimilarity of the structure, weights, and by the difference of sets correctly (with an error not more than a given level) of the examples being solved. More precisely, a pair is selected for crossing, for which the union of sets of correctly solvable examples is maximal. 12. Repeat steps 7 and 8. 13. Repeat steps 7–12 until the stop conditions are met. One of these conditions is the presence of a network that gives a sufficiently small error. Note that both the main and the auxiliary functional can vary from step to step. Let us discuss some of the details of the above algorithm and its possible modifications, which may allow, if necessary, to increase the efficiency of the algorithm in a particular situation. Some of steps 9 and 11 can be omitted. Typically, the parameters are chosen so that the total number of networks after steps 8–12 remains unchanged, although populations of varying size can be considered. At the same time, in the course of the algorithm operation, it makes sense to reduce the population size simultaneously with an increase in the average network size (the average arithmetic number of neurons per population). To determine the initial weights of the network, it is not necessary to apply the method of principal components, the disadvantage of which is the limitation of the number of neurons of each layer by the dimension of the input. If the initial network structure is chosen from other considerations, which follow from the characteristics of the problem being solved, then to start the learning process, it is necessary to assign any initial values to its weights. The easiest way would be to take them to be zero, but then, in accordance with formula (2.8), the derivatives of the output of the network concerning the corresponding weights will also be equal to zero, i.e., gradient methods for the selection of weights cannot be run. You can get away from this problem by starting to select weights, not from the gradient method, but using one of the methods of zero order. Another approach is possible—take small random numbers as initial weights [3, 4]. It is recommended to choose the weights of the neurons of the l-th layer w(l) ij equal to independent random

Chapter 3 METHODS FOR THE SELECTION OF PARAMETERS

numbers uniformly distributed on the segment $\left[-\frac{1}{\sqrt{m_l}};\,\frac{1}{\sqrt{m_l}}\right]$, where $m_l$ is the number of neurons in the corresponding layer. The output weights are initialized with zeros under this procedure. An alternative approach is to take the weights equal to independent normally distributed numbers with zero expectation and variance $\frac{1}{m_l}$. A third approach recommended in the literature on neural networks is to give up independence when choosing the scale and impose the condition $\sum_{i,j}\left(w_{ij}^{(l)}\right)^2=1$. The number of principal components included in the processing does not have to be set a priori; it is recommended to choose this number during processing, fixing the share of the variance reflected by the principal components [1, 2]. If the error on the training sample decreases too slowly and does not become sufficiently small in the learning process, then neurons are added, or the likelihood of such an operation is increased in the process of mutations. Suppose we want to add one neuron to the $l$-th layer. In this case, it is necessary to determine the input and output weights of this neuron. The simplest way is to take the input weights $w_{i,m_l+1}^{(l-1)}$ equal to random numbers uniformly distributed on a segment, and the output weights equal to random numbers uniformly distributed on the segment $[-\beta_j;\,\beta_j]$, where $\beta_j=\min_{1\le i\le m_l}\left|w_{ij}^{(l)}\right|$.
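A minimal sketch of these initialization recipes (Python with NumPy; the names are ours).

```python
import numpy as np

def init_weights_uniform(m_prev, m_l, rng=np.random.default_rng(0)):
    """Independent weights uniform on [-1/sqrt(m_l), 1/sqrt(m_l)] for a layer with m_l neurons."""
    b = 1.0 / np.sqrt(m_l)
    return rng.uniform(-b, b, size=(m_prev, m_l))

def init_weights_normal(m_prev, m_l, rng=np.random.default_rng(0)):
    """Independent zero-mean normal weights with variance 1/m_l."""
    return rng.normal(0.0, 1.0 / np.sqrt(m_l), size=(m_prev, m_l))

def init_weights_normalized(m_prev, m_l, rng=np.random.default_rng(0)):
    """Dependent weights rescaled so that the sum of their squares over the layer equals 1."""
    w = rng.standard_normal((m_prev, m_l))
    return w / np.linalg.norm(w)
```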

In addition to adding individual neurons, one can add whole layers to the constructed network. In order to embed a new layer between two existing ones, it is required to determine the number of neurons in it, as well as its input and output weights. The number of neurons of the added layer is taken equal to the number of neurons of the next layer. The input weights of the added layer are obtained by dividing the weights of the connections between the separated layers by a given number $A$ (for example, $A=100$ or the maximum weight modulus). If the derivative of the activation function of the neurons of the added layer at zero is $\varphi'(0)$, then the weights at the output of the added layer are $w_{ii}=A\left(\varphi'(0)\right)^{-1}$, and $w_{ij}$ for $i\ne j$ are independent numbers uniformly distributed on the segment $\left[-\left(\varphi'(0)\right)^{-1};\,\left(\varphi'(0)\right)^{-1}\right]$. If $\varphi'(0)=0$ or does not exist, then this operation is not applicable. If the error on the test sample begins to grow despite the decrease of the error on the training sample, then neurons are removed, or the likelihood of such an operation is increased in the process of mutations.


Neuron removal can also be performed at the end of training, when the training and testing errors have become quite small. This reduction in the number of neurons is made to reduce the size of the network; after this procedure, the network usually needs to be retrained. Similarly, network inputs can be excluded, especially if their number is too large. Different neuron removal procedures are applied; the simplest is to remove the neuron that has the minimum sum of absolute values of its output coefficients (or the sum of their squares). Note that the above genetic algorithm for the selection of the structure of a multilayer perceptron is not the only one possible; without special difficulties, even more sophisticated evolutionary procedures applied to a population of multilayer perceptrons can be implemented instead. Determining the structure of RBF networks is even more pressing than determining the structure of a multilayer perceptron, since usually more RB-functions are required for the same approximation accuracy. Below are algorithms for setting up the structure of this type of network using clustering and genetic operations; we also discuss some new types of neural networks and their possible application areas. In accordance with the previous chapter, let us write the formula of the RBF-network

$$y=\sum_{j=1}^{N}c_j\,v_j\!\left(x,a_j\right),\qquad (3.1)$$

where $c_j$ are the linearly entering parameters of the neuro element $v_j$, $a_j=(\alpha_j, x_j^0)$ are the nonlinearly entering parameters, $\alpha_j$ is a symmetric positive-definite matrix, and $x_j^0$ is the vector specifying the center (see also Eq. (2.15)). Here $\varphi(t)=e^{-|t|^2}$ was chosen as the activation function. Thus, we study neural networks using elliptical Gaussian packets:
$$v_j\!\left(x,a_j\right)=\exp\!\left(-\left\langle \alpha_j\!\left(x-x_j^0\right),\,x-x_j^0\right\rangle\right)
=\exp\!\left(-\sum_{s=1}^{k}\sum_{l=1}^{k}\alpha_{jls}\left(x_l-x_{jl}^0\right)\left(x_s-x_{js}^0\right)\right).$$
Training such a network reduces to the selection of weights $w_j=(c_j,a_j)=(c_j,\alpha_j,x_j^0)$ that are optimal with respect to a given functional or set of functionals. We present an efficient algorithm for selecting these weights for linear problems (1.1) and the quadratic functional (1.2).


An algorithm for constructing an RBF network based on clustering of errors and principal components (a code sketch of this construction is given below, after the population algorithm):
1. We build the initial solution of (1.1) and calculate the discrepancies $\delta_n$ of the equation or of the boundary condition.
2. We carry out clustering of the sample $\{x_n, \delta_n\}$.
3. As the vectors $x_j^0$, whose coordinates are the parameters $x_{jl}^0$, we take the centers of the clusters.
4. For each cluster, we build the principal components.
5. From the spread of the cluster points along the principal components, we determine the coefficients $\alpha_{jls}$.
6. We find $c_j$ by minimizing the functional (1.2). Owing to the linearity of the problem (1.1) and the quadraticity of the functional (1.2), determining $c_j$ reduces to a system of linear algebraic equations.
7. If necessary, we refine all the weights of the neural network using some selected nonlinear optimization algorithm.
8. Repeat the process until the errors become small enough.
The above algorithm is attractive because it allows one to circumvent the problems of multidimensional nonlinear optimization (the presence of ravines, false local extrema, etc.) and to choose a suitable network structure. Consider an analog of the genetic algorithm for adjusting the structure of a population of RBF networks, obtained by modifying the genetic algorithm for adjusting the structure of a multilayer perceptron so as to take into account the features of the structure of RBF-networks.
Algorithm for building a population of RBF-networks:
1. We obtain the initial solution of (1.1) and calculate the discrepancies $\delta_n$ of the equation or of the boundary condition.
2. We carry out clustering of the sample $\{x_n, \delta_n\}$.
3. In the same way as in the algorithm based on the clustering of errors and on principal component analysis, we build an approximation of each cluster by a single basis function, i.e., from the points related to the $j$-th cluster we define $c_j$, $\alpha_j$ ($\alpha_{jl}$ or $\alpha_{jls}$), and $x_j^0$.
4. From the basis functions, we form a population of RBF networks, summing up basis functions characterized by the proximity of their cluster centers. In this case, one basis function can be included in several networks.
5. To the subset of data corresponding to a particular RBF-network, we assign the points of $\Omega$ and $\Gamma$ for which it gives a minimal error. If a network receives too small a subset, we remove it from the population, redistributing the corresponding subset of data.


6. We train the networks, each by minimizing the functional (1.2) obtained by restricting the original error functional to the set of points assigned to that network in the previous step.
7. We perform recombination (also called crossover) and mutation: we add and exclude variables and points of clusters. We implement translocations: clusters exchange points, and networks exchange variables and the corresponding coefficients. We carry out crossover: the networks corresponding to two groups of clusters are joined and, if necessary, trained on the common data set.
8. We discard the networks that are worse in terms of the auxiliary functional.
9. For points having no network with a sufficiently small error, we apply steps 1–6.
10. Step 7 is repeated until all sample points have been exhausted.
11. We create an auxiliary network that determines to which of the RBF networks an arbitrary point of the region containing the sample belongs.
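As promised above, here is a minimal sketch of the error-clustering construction (Python with scikit-learn and NumPy): a simplified, axis-aligned variant in which the widths come from the per-coordinate spread of each cluster rather than from full principal-component matrices; all names are our assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def rbf_from_error_clusters(x, delta, n_clusters=10):
    """x: (n, k) collocation points, delta: (n,) residuals of the current solution."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(
        np.column_stack([x, delta]))
    centers, alphas = [], []
    for j in range(n_clusters):
        xj = x[labels == j]
        centers.append(xj.mean(axis=0))                     # cluster center -> RBF center
        spread = np.maximum(xj.var(axis=0), 1e-8)           # per-coordinate spread
        alphas.append(1.0 / (2.0 * spread))                 # width from the spread
    return np.array(centers), np.array(alphas)

def gaussian_design(x, centers, alphas):
    d2 = (x[:, None, :] - centers[None, :, :]) ** 2         # (n, N, k)
    return np.exp(-(alphas[None, :, :] * d2).sum(axis=-1))  # (n, N)

# Linear coefficients c_j: least-squares fit of the residual by the new basis functions
# c, *_ = np.linalg.lstsq(gaussian_design(x, centers, alphas), delta, rcond=None)
```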

3.1.1 Methods for specific tasks

For the Laplace equation in the L-domain (see Fig. 1.1 and the corresponding problem definition), we use the following elliptical Gaussian packets as the neural network basis functions:
$$C_j\,v_j\!\left(x,y;\alpha_j,\beta_j,\gamma_j,x_j^0,y_j^0\right)=C_j\exp\!\left[-\alpha_j\left(x-x_j^0\right)^2-2\gamma_j\left(x-x_j^0\right)\left(y-y_j^0\right)-\beta_j\left(y-y_j^0\right)^2\right].$$



0 xj αj γ j , x 0j ¼ . γ j βj yj0 Approach A. This approach is based on the ideology of the Group Method of Data Handling (GMDH) [5]. One of its simplest variants can be the following multi-row algorithm: 1. For each canonical subregion Π1, Π2, we take a set of regularly (or randomly) specified network centers (xi0 , y i0 ) and parameters (αi, βi, γ i) i ¼ 1, …, N associated with this subregion. 2. Consider N(N  1)/2 paired linear combinations ψ ij, 1  i < j  N, of neural network functions ψ i ¼ ν(x, y; ai): ψ ij ¼ Ciψ i + Cjψ j. We adjust the weights Ci, Cj for each pair combination ψ i,j such that the condition of a minimum of an error functional J for each subregion (rectangle Π) separately; the

Here the weight parameters $w_j$ are $C_j$, $\alpha_j=\begin{pmatrix}\alpha_j & \gamma_j\\ \gamma_j & \beta_j\end{pmatrix}$, and $x_j^0=\begin{pmatrix}x_j^0\\ y_j^0\end{pmatrix}$.

Chapter 3 METHODS FOR THE SELECTION OF PARAMETERS

Dirichlet integral with the penalty term is chosen as the error functional here ð n ð o ð∂u=∂xÞ2 + ð∂u=∂y Þ2 dΩ + δ  J ð uÞ ¼ ju  f j2 dΓ ¼ Π

∂Π

¼ hu, ui + δ  ðu, uÞ  2δ  ð f , uÞ + δ  ð f , f Þ: From N(N  1)/2 the pairwise combinations ψ ij with adjusted coefficients, we select the N best ones, which give the smallest value of the error functional in another subdomain. 3. Repeat steps 2 and 3 with the resulting functions. Let’s write in more detail steps 2 and 3 of the iterative process— for this we give an expression for the error functional through weights Ci, Cj   J ψ ij ¼ Jii Ci2 + 2Jij Ci Cj + Jjj Cj2  2J0i Ci  2J0j Cj + J00 , here the following notation was used: Jkl ¼ Αkl + δ  Βkl , Αkl ¼ hψ k , ψ l i, Βkl ¼ ðψ k , ψ l Þ, similarly, we specify J0i ¼ δ  Β0i ¼ δ  ð f , ψ i Þ, J00 ¼ δ  Β00 ¼ δ  ð f , f Þ In this case, the elements of the matrix J are functions ν of special type like Gaussian distributions and a simple type of subdomains Π (rectangles) may be computed in explicit form. They can be expressed with cumbersome formulas. Notice that in general conditions integrals driving Jkl can be calculated using interpolating formulas or Monte-Carlo method, being more effective for multidimensional integrals. ∗ ∗ The optimal weights C i , C j for pair combinations ψ ij and the minimum value J on the corresponding pair combination are found by the formulas: 0 1

1 ∗ J0i @ C i A ¼ Jii Jij ∗ Jij Jjj J0j Cj or 8∗ < C ¼ Δ1 J J + Δ1 J J , jj 0i ij 0j i Δ ¼ Jii Jjj  Jij 2 ;   :∗ 1 1 C j ¼ Δ J  ij J0i + Δ Jii J0j ,  1 2 2 Jjj J0i  2Jij J0i J0j + Jii J0j Jmin ¼ Δ + J00 : We calculate the values of the error functional eJ in another area by ∗



the functions ψ ij ¼Ci ψ i + Cj ψ j :




  ∗ ∗ ∗ ∗ ∗ ∗ eJ ψ ij ¼ eJ ii C 2 + 2eJ ij C i C j + eJ jj C 2  2eJ 0i C i  2eJ 0j C j + eJ 00 , i j and N functions ψ i1j1, …, ψ iN jN that give the smallest value  select  eJ ψ ij . Repeating step 2, we compose new pairwise linear combinations of these functions ψ is jsit jt ¼ Cis jsψ is js + Cit jtψ it jt, where, 1  s < t  N and we calculate again the error functional:   J ψ is js it jt ¼ Jis js is js Ci2s js + 2Jis js it jt Cis js Cit jt + Jit jt it jt Ci2t jt  2J0is js Ci  2J0it jt Cit jt + J00 , There of it is easy to derive the relationship between the matrix elements of this and the previous iteration ∗



















Jis js is js ¼ Jis is C 2is + 2Jis js C is C js + Jjs js C 2js , ∗









Jis js it jt ¼ Jis it C is C it + Jjs it C js C it + Jis jt C is C jt + Jjs jt C js C jt , ∗

Jit jt it jt ¼ Jit it C 2it + 2Jit jt C it C jt + Jjt jt C 2jt , ∗







J0is js ¼ J0is C is + J0js C js , J0it jt ¼ J0it C it + J0jt C jt : With using these numbers, we determine new optimal coeffi∗



cients C is js , C it jt are, etc. (Moreover, at each time, as before, to find new weights, it is necessary, to solve a system of two linear equations with two unknown variables.) If the number of iterations is greater than log2N, then all the neural network functions ψ i of this population will be involved in the calculation process. The given formulas and explicit expressions for the initial parameters allow us to simplify and to accelerate calculations. Note that the following modification of the algorithm is also possible: when pairing linear combinations in step 2 of the algorithm, the neural network functions that correspond to different subdomains are chosen, and when selecting pairs in step 3, for example, the pairs giving the minimal mismatch in the intersection of subdomains are considered as the best ones. In a more sophisticated version of this approach, it is possible to select not only coefficients C but also other weights of the network. Note that, using different types of functions ν, we can build heterogeneous networks of various architectures on this path. The implementation of approach A in the case of splitting a region L into a larger number of subdomains can be carried out in various ways; we describe the simplest of them. As usual at each step, it is possible to choose paired combinations of functions. At the same time, either each pair of functions must correspond to a


pair of intersecting regions, then the best neural networks are selected by the minimum of the intersection functional, or the regions are not required to intersect, then the minimum is taken for the region over which the coefficients were not selected. It is possible to use linear combinations with a larger number of terms, the most reasonable is to take the terms as many as the subdomains are chosen, on which the coefficients are selected. Approach B. Using the obvious domain decomposition, we implement a specialized genetic algorithm for a neural network building: 1. We train two teams of K networks in each. We train the first team in the region Π1, the second—in the region Π2, i.e., the weights of each network are selected by minimizing the functional (1.2). In the first term of the functional, we use the points only of the corresponding rectangle, and in the second term, we use the points of the part of its boundary that belongs to the boundary of the area L. 2. For the further work from each team we choose the best neural networks in number K1 < K, based on the minimum error in another area, i.e., from the networks that were trained by minimizing the functional J1(u), we choose the networks having a minimum value of the functional J2(u) and vice versa. Another variant of this step is the selection of K1 networks that give the minimum value to the functional that was used for their training, and of these K2 < K1 networks, on which the second functional is minimal. 3. We produce random mutations of networks, which probability is the greater, the greater errors in their own and another area (for example, the number of errors). These mutations can be of different types: the removal of the term in the sum (4.3) for a given network with a minimum coefficient ci or removal of a randomly selected term; adding a function vi with a random center (xi0 , yi0 ) and parameters αi, βi, γ i; random change of centers (xi0 , yi0 ) and parameters αi, βi, γ i a certain value, etc. The functions vi may be different, which leads to the construction of heterogeneous networks. 4. We do random translocations (crossover)—for example, symmetrically reflecting centers (xi0 , yi0 ) or exchanging coefficients ci for two subsets of functions. 5. We crossover the best K2 < K1 networks (in the sense of the minimum of the corresponding functional), choose two networks from them and take some of the functions from one network, some from another network. As a result, we obtain a new network that we affix to the set of training networks and call the descendant. This operation is repeated with some set of such networks. At the same time, a part of descendants are produced




from networks of the same population (networks that trained in the same rectangle), and some from networks of different populations. With the resulting descendants, we supplement each population to the same number K or some other, if a population of variable size is considered. 6. Repeat the previous steps a certain number of times or until the error J1,2(u) is sufficiently small. As an approximate solution, in the rectangle Π1\Π2, the output of the best network trained in the area Π1 is taken, in the rectangle Π2\Π1, the output of the best network trained in the area Π2, and in the square Π1 \ Π2, the half-sum of these outputs. We can modify this algorithm by adding to it the following error clustering procedure. In step 1, we choose a fairly small number of basis functions. After step 3, we carry out the following procedure: Modification of the genetic algorithm. 1. We calculate errors zj ¼ u(xj, yj)  fk(xj, yj) on some set of points {(xj, yj)}m j¼1 of the boundary ∂ Ll from [ Γk and the Laplacian zj ¼ Δu(xj, yj) in a random set of points {(xj, yj)}m+M j¼m+1 inside the corresponding region Ll. 2. We cluster the points {(xj, yj, zj)}m+M j¼1 in the corresponding threedimensional space. 3. A certain set of clusters corresponding to the maximum average errors in the cluster are approximated by the corresponding neural network functions. In this case, the local character of the RB-functions is essentially used. 4. We add the resulting functions to the main set as one of the types of mutations. 5. We repeat the main procedure as many times as necessary. It is easy to see that Approach B is practically independent of the type of equation and the boundary conditions. Replacing the Laplace equation Δu ¼ 0 with an equation A(u) ¼ 0 that may be non-linear differential, integro-differential, etc., only leads to the replacement of the first term in error functional. A similar remark is valid for boundary conditions. The represented algorithm is easily generalized to the case of more complex regions; it is enough to decompose the region to do this. The simplest approach is to select s teams of networks, training members of each team in their subdomain, and testing in neighboring subdomains. It is also reasonable to crossover within an identical team of networks or with the best networks from neighboring teams. Approach C. With this approach, the team of networks is trained: 1. The domain is decomposed. Decomposition L includes some finite system of subdomains covering it. For example, in the test problem in question, these are rectangles Π1\Π2 and,


Π2\Π1 as well as a square. We can also divide the area into smaller partitions. 2. A set of RBF-networks is trained on the basis of minimization of the functional (1.2), i.e., each network gives an approximate solution for the entire region L. 3. For each subdomain, we select a network that gives the best result, in the sense of the functional value calculated by the formula (1.2), while the sums of Eq. (1.3) include only points from the corresponding subdomains, and for each functional, we choose the network having minimal it’s value. 4. Selected networks complete training in their subdomains, taking into account the approximation of solutions in neighboring subdomains. At the same time, the functional that is used to train each network is updated with the term responsible for the mismatch of the values at the junctions. Another variant of this step consists of the joint training of the entire team, which makes it possible to reduce the error, but impede parallel calculations. The resulting team of networks gives a local view for solving the problem in the whole domain, i.e., each network gives a solution in its subdomain. Approach C unchanged extends to more complex areas. If a system of equations is considered, then not a single best network is assigned to each subdomain, but a whole set of them—one network for each unknown function; therefore, the search procedure is complicated. If we train K networks for each of r the unknown functions, then for each subdomain we will have to search the Kr options, i.e., calculate the value of the corresponding functional and choose a set of networks that gives it a minimum value. Even with not too large values K and r direct search is not feasible, so you have to look for other methods, one of which can be selected genetic algorithm. Such an algorithm differs from that presented in Approach B in that the evolving individuals are teams of neural networks, and not the networks themselves.

3.2 Methods of global non-linear optimization

The above approaches for the selection of the structure of the neural network model involve the use of some non-linear optimization algorithm [6–8] (minimization of the selected error functional (1.3)) to search for its parameters. If the number of parameters is small and the initial approximation is good enough, then we can use the Newton method or some quasi-Newton method, such as BFGS. With a large number of matched




parameters, conjugate gradient methods are more efficient. If the initial approximation is not very good, then it is better to start the process with a non-local method. Since the error functional is in most cases multiextremal, finding its minimum requires a procedure capable of locating the global minimum. The problem is simplified by the fact that we are interested not in the global minimum itself but in a point at which the value of the error functional is small enough. We describe several approaches to such a global search.
Restart methods. We can start methods of local extremum search from several starting points.
Restart method (sequential version).
1. We select the initial values of the fitting variables $w_0$ (for example, randomly in a given set), the local optimization algorithm, and its parameters.
2. We perform steps in accordance with the selected local algorithm until a stopping condition is met. These conditions include reaching the required error value, performing a specified number of steps, smallness of the step size, a low rate of error decrease, growth of the error on the testing subset, etc.
3. If the required error value is not reached, we fix the final value of the vector $w_k$ and the corresponding value of the error functional $J(w_k)$.
4. Repeat point 1, choosing another vector $w_0$. In this case, it is possible to replace the local algorithm as a whole or to change its parameters.
5. Repeat point 2, adding proximity to the values $w_k$ achieved in previous starts as a stopping condition.
6. Repeat point 3, fixing the point with the minimum value of $J(w_k)$.
7. Repeat steps 4–6 until the required error value is achieved, the allotted computation time expires, or a specified number of restarts has been performed.
Restart method (parallel version).
1. Choose a set of initial vectors $\{w_0(i)\}_{i=1}^{M}$ (for example, randomly and independently of each other in a given set), the local optimization algorithm, and its parameters.
2. We carry out the steps of the selected local algorithm until the stopping condition is satisfied in each of the processes.
3. If all the iterative processes converge to a single point, then it can be considered the global minimum point; if not, the point $w_k$ with the minimum value $J(w_k)$ can be taken as the global minimum.
It is possible to apply a more sophisticated version of this algorithm: after a certain number of steps of the local algorithm


A more sophisticated version of this algorithm is possible: after a certain number of steps of the local algorithm (step 2), a clustering procedure is carried out, after which we can proceed in one of three ways:
(A) From each cluster we take the one point with the minimum value of J(w_k), add randomly or regularly selected points to restore the previous number of points, and execute local descent again, etc.
(B) In addition to the best point, we take the center of each cluster and continue in the same way as in (A).
(C) We approximate J(w) on each cluster by a quadratic function (coordinate-wise or along the principal components), find its minimum, and calculate the value J(w_k) at this minimum; if this value is less than the minimum value among the cluster points, we take it for further optimization, and if not, we take both the best cluster point and the minimum point of the quadratic form. Next, we add new points in the same way as in (A) and perform local descent from these points.

Cloud methods. These algorithms are similar to the previous one (the parallel version of the restart method) in that several points move instead of one. The difference is that the points do not move independently: their movements are interconnected.

The method of virtual particles.
1. Select the initial values of the fitting variables w_0 (for example, randomly in a given set), the local optimization algorithm, and its parameters.
2. Generate a set of points distributed not over the whole region of parameter variation but in a sufficiently small neighborhood of the point w_0, for example, in a ball of sufficiently small radius.
3. At each point calculate the gradient, average these gradients, and shift the points of the cloud in accordance with the gradient descent method (or some other local algorithm).
4. Generate a new cloud centered at the best point (or shift the old one) and repeat step 3 as many times as necessary. The cloud size can be reduced from step to step according to some predetermined law, in accordance with the decrease of the error or depending on the step size.
A particularly successful variant combines this algorithm with the Rprop method: each point of the cloud moves in accordance with the signs of the coordinates of the gradient summed over all points of the cloud.

The dense cloud method.
1. Select the initial values of the fitting variables w_0 (for example, randomly in a given set), the local optimization algorithm, and its parameters.
2. Generate an initial set of points distributed in a sufficiently small neighborhood of the point w_0.


3. Calculate the error gradient at each point of the cloud and compute shifts of each point in the direction of each of these gradients; from n points we obtain n^2 new ones.
4. From the resulting points select the n best ones and repeat step 3 as many times as required.
The disadvantage of this approach is that the points can stick together. To address this, several options for step 4 are possible:
(a) Build n clusters and take their centers as the new points. The disadvantage of this choice is that the method becomes less stable.
(b) Discard a certain proportion of the worst points, then cluster the remaining points of the cloud and continue as in (a).
(c) Discard all points with an increased error. If too few points remain, replenish the cloud with new points (for example, generated randomly) until their number again equals n.
(d) Take as the new center the mean of the best cluster (in the sense of the average error), then generate a new cloud around this center.

We can also combine the well-known polyhedron (simplex) method with gradient descent from each vertex. If, in addition, the polyhedron is not contracted, i.e., the worst point is reflected symmetrically with respect to the mean of the others, we obtain a non-local method.

Modified polyhedron method.
1. Select the initial values of the parameters w_0 and the edge length l of the simplex.
2. Build a regular simplex around w_0:

w_{i,j} = w⁰_j,                          for j < i − 1,
w_{i,j} = w⁰_j + l·√( j / (2(j + 1)) ),  for j = i − 1,
w_{i,j} = w⁰_j − l / √( 2j(j + 1) ),     for j > i − 1.

3. Calculate the value of the minimized function J(w) at the vertices of the simplex.
4. Carry out several steps of some gradient descent method from each vertex.
5. Reflect the point with the maximum value of J(w) with respect to the mean c of the other vertices:

w_j^max(k + 1) = 2 c_j − w_j^max(k),   where c_j = (1/n) Σ_{i ≠ i_max} w_{i,j}(k).


6. Repeat steps 3–5 the required number of times.
If, in addition, the worst point (step 5) is from time to time replaced by a random point selected in the parameter region with a distribution under which any point of the region can be chosen, the method becomes global.

A distributed version of the polyhedron method.
1. On each computer, a local extremum search process is started from its own point.
2. Each computer in the set intended for network training sends its results, namely the network weights and the training error, to another computer of the set (for example, a randomly selected one). Technical details of the data communication, such as addresses, protocols, data formats, etc., are left aside.
3. The computer receiving these data recalculates the average value of the weights and sends the result further, adding the error value achieved by its own local method.
4. Step 3 is repeated. Each new computer node in this chain receives the average value of the network weights and the set of error functional values achieved by the previous computers. The computer that received the data can continue this chain, i.e., recalculate the average value of the weights according to the well-known recurrent formula c_N = c_{N−1} + (1/N)(w_N − c_{N−1}). If obsolescence of the data is taken into account, the multiplier 1/N can be replaced by a number α with 0 < α < 1.
5. If certain conditions are met (for example, the error at the given node is greater than the specified errors at the previous nodes), the weights are reflected about the average value as in the polyhedron method, and the local minimization method is continued from the resulting point. The data transfer chain may then break, continue further, or start again.
6. If the network is unreliable and a computer does not receive such "messages" from other nodes for a long time, it initiates a new chain.

Such a computational process has several useful properties. First, the global nature of the search is preserved, provided that not all local processes become attached to a single point of local extremum. Reflection about the average value for different sets of vertices, together with the change of the vertices themselves by local optimization methods, prevents the method from cycling. Second, a global minimum once found cannot be lost, whereas local minima, especially those with large errors, will sooner or later be "thrown out" by the local process.


Third, the loss of some of the nodes affects the efficiency of the optimization process only slightly. Loss of a significant part of the packages only leads to their more frequent generation and has a weak effect on the overall course of the optimization process.

The idea of the next approach is to replace the condition of strict error decrease by the condition J(w_{k+1}) < J(w_k) + ε_k, where ε_k is some sequence converging to zero. Thus, the error may slightly increase, which allows the sequence of approximations to escape from a shallow local extremum.

The method of the jumping ball.
1. Select the initial values of the variables w_0, the local optimization algorithm, its parameters, and the method of choosing the sequence ε_k.
2. Carry out a step of the optimization algorithm (for example, gradient descent).
3. Move to the resulting vector w_{k+1} if J(w_{k+1}) < J(w_k) + ε_k.
4. Change ε_k: for example, if the inequality is satisfied, we decrease ε_k; if it is noticeably violated, we increase it. A more sophisticated modification of the algorithm includes the following ideas: use the angle between successive gradients to change the step size, estimate the errors and predict the further movement of the iterative sequence of weights, and generate ε_k optimally on the basis of this prediction.
Obviously, the method makes it possible to overcome shallow local minima (with depth less than ε_k) and the gently sloping flat parts of ravines. The effectiveness of the method depends strongly on a successful law for forming ε_k.

Each of the algorithms described above has its advantages and disadvantages, whose manifestation in a specific problem is difficult to predict in advance. In order to obtain a more computationally robust result, several algorithms can be combined as follows.

Competition of optimization methods.
1. Select the initial values of the variables w_0, several local optimization algorithms, and their parameters.
2. Run the selected algorithms from the point w_0 for a certain period of time (one step of different algorithms requires different time costs, so it makes no sense to run the algorithms for a fixed number of steps).
3. Choose the best result and start all the algorithms from this point again.
4. Steps 2 and 3 are repeated until the stopping condition is met.
If a global method is among the selected algorithms, then the combined algorithm also turns out to be global.


If not all algorithms are restarted but only a randomly selected subset, a more advanced version of the algorithm is obtained. In this case, the selection probabilities should differ and depend on the performance of the algorithms at the previous steps: for the algorithm that was best at the previous stage the probability of being launched again increases, and for the worst it decreases. It is reasonable to reuse at the next stage the best algorithm obtained at the previous one. Even more advanced versions combine the basic algorithms with their "mutations" having modified parameters; the least successful algorithms should mutate more intensively. Since information about the performance of the ensemble of algorithms can be accumulated over the course of solving different problems, a program implementing such an algorithm can self-organize over time, which makes it more likely to choose the best algorithm. Based on the accumulated information, including the dimension of the problem, the sample size, the rate of error decrease, etc., a secondary neural network can be trained to choose the optimal subset of algorithms and their parameters.

The global optimization algorithms discussed above proved to be quite effective in training neural networks in cases where the dimension of the vector of optimized parameters (the network weights) is in the hundreds and thousands. Under these conditions, the well-known methods of global optimization lose their effectiveness. The stability of the computational process can be increased by applying a genetic algorithm to the choice of the optimization method. The evolutionary approaches presented in the next section, primarily the second and the third, make it possible to use models built for typical similar problems as elements of the initial population.
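A minimal sketch of the competition scheme under illustrative assumptions: each competing optimizer is a function that improves a weight vector within a given time budget; the optimizers, the time budget, and the stopping tolerance are placeholders.

import time

def compete(J, w0, optimizers, budget_sec=1.0, n_rounds=20):
    """Alternate several local optimizers, restarting all of them from the best point found."""
    w = w0
    for _ in range(n_rounds):
        candidates = []
        for opt in optimizers:                    # each opt: (J, w, seconds) -> w_new
            t0 = time.perf_counter()
            w_new = opt(J, w, budget_sec)
            candidates.append((J(w_new), w_new))
            _ = time.perf_counter() - t0          # per-method timing could drive selection probabilities
        best_err, w = min(candidates, key=lambda c: c[0])
        if best_err < 1e-8:                       # stopping condition (placeholder tolerance)
            break
    return w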

3.3

Methods in the generalized definition

Successful applications of neural networks to problems of mathematical physics inspire confidence that a much wider range of problems can be solved in the same way. The task of selecting a model from experimental data admits the following generalization, which includes many standard and non-standard problems. Let a set of conditions {A_q(u_1, u_2, …, u_r)|_{Ω_q} = 0}, q = 1, …, Q, be given, where Ω_q is the set on which the corresponding condition must be fulfilled and u_s are unknown functions. The operators A_q specify equations as well as boundary and other conditions, for example,


conservation laws, state equations, requirements of experimental data symmetry, and other requirements on the solution of the problem. In the process of training, the operators A_q may be changed, for example, to take new incoming information into account. We search for each unknown function as the expansion

u_s(x) = Σ_{i=1}^{N_s} c_{i,s} v_s(x; a_{i,s}),   s = 1, …, r,

adjusting the weights, i.e., the parameters a_{i,s} and c_{i,s}, by minimizing an error functional composed of terms

Σ_{j=1}^{M_q} [ A_q(u_1, u_2, …, u_r)(x_{j,q})|_{Ω_q} ]².

Each term is included in the sum with a weighting factor δ_q > 0, usually fixed in advance or recalculated from time to time according to a certain procedure. (We leave aside the possible procedures for selecting the weight factors, mentioning only the simplest one: to adjust them so that the terms in the functional are approximately equal, recalculating δ_q from time to time.) The considerations given in the introduction make it possible to recommend re-selecting the sets of test points after a certain number of steps of the optimization algorithm. The terms in the functional do not have to be quadratic; they can be taken in another form.

To solve this problem, several fundamentally different variants of organizing the algorithm can be used. First, one can create a single functional using all the conditions at once and search for all the parameters simultaneously by minimizing this functional. This version is very demanding on computing resources. Second, it is possible to create several functionals based on different sets of conditions and adjust the corresponding parts of the weights by minimizing each of them in turn. With a rational organization of the calculations, this option allows us to increase the computational speed, but the problem of a reasonable choice of the structure of the models remains. Third, one can apply an algorithm that combines the adjustment of the parameters and of the structure of the model. This option allows us to obtain the most accurate and adequate solution of the problem. Five approaches of this kind are discussed below.

I. The generalized error clustering algorithm (a code sketch follows the description of this approach).
1. We look for a set of functions of the form u_s(x) = Σ_{i=1}^{N_s} c_{i,s} v_s(x; a_{i,s}) by performing several steps of minimization of the functional

J(u_1, u_2, …, u_r) = Σ_{q=1}^{Q} δ_q Σ_{j=1}^{M_q} [ A_q(u_1, u_2, …, u_r)(x_{j,q})|_{Ω_q} ]².

In this case, the numbers N_s can be taken not too large.
2. For each set, we calculate the errors z_{j,q} = A_q(u_1, u_2, …, u_r)(x_j), where the set of test points x_j from Ω = ∪_{q=1}^{Q} Ω_q can be changed compared to the previous step; the main requirement is that it be sufficiently representative. These test sets are regenerated upon completion of each stage of training (a certain number of steps of minimization of the functional). If the value A_q is not defined at some point x_j, we set z_{j,q} = 0.
3. We cluster the points {(x_j, z_{j,1}, z_{j,2}, …, z_{j,Q})} in the corresponding space.
4. We take the clusters (they can intersect) and construct for each of them an appropriate approximation that gives the minimum error for the functional J(u_1, u_2, …, u_r) calculated over the test points of the cluster under consideration.
5. We add the terms constructed at the previous stage to the sought functions and repeat step 1 with the resulting set.
6. If the functional is not sufficiently small, we replenish the population of sets, refining the clusters, and apply steps 1–5 to the new sample.
It should be noted that the proposed method does not impose special requirements either on the form of the region (simple connectivity, the possibility of decomposition) or on the equation (linearity, real coefficients). However, complicating the shape of the domain Ω and of the equation makes the choice of the initial values of the model parameters more difficult, increases the number of functions required to achieve a given accuracy, and correspondingly slows down the nonlinear optimization process.
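A minimal sketch of steps 2 and 3 of Approach I under illustrative assumptions: the residuals of each condition are available as callables, and the clustering of the augmented points is done with scikit-learn's KMeans; all names are placeholders.

import numpy as np
from sklearn.cluster import KMeans

def cluster_errors(residuals, test_points, n_clusters=5):
    """Cluster test points together with their residuals (step 3 of Approach I).

    residuals   -- list of callables z_q(x) returning the residual of condition q at points x
    test_points -- array of shape (M, dim) with the current test points
    Returns the cluster labels and the index of the worst cluster (largest mean squared residual),
    where additional basis functions should be concentrated.
    """
    z = np.column_stack([r(test_points) for r in residuals])     # (M, Q) residual matrix
    augmented = np.column_stack([test_points, z])                # points (x_j, z_{j,1}, ..., z_{j,Q})
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(augmented)
    cluster_err = [np.mean(z[labels == k] ** 2) for k in range(n_clusters)]
    return labels, int(np.argmax(cluster_err))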

II. The generalized Schwarz method. It is essential here that the initial domain can be decomposed into subdomains that intersect only along parts of their boundaries: Ω = ∪_{p=1}^{P} Ω′_p.
1. Just as in Approach I for the whole domain Ω, we construct an approximation of the solution u_{1,p}, u_{2,p}, …, u_{r_p,p} in each subdomain Ω′_p, using the part of the conditions corresponding to the subdomain Ω′_p to specify the error functional J_p:

J_p = Σ_{q=1}^{Q_p} δ_{q,p} Σ_{j=1}^{M_p} [ A_q(u_{1,p}, u_{2,p}, …, u_{r_p,p})(x_{j,q,p})|_{Ω_q ∩ Ω′_p} ]².

The boundary conditions are taken into account only on the part of the boundary Ω_q ∩ Ω′_p; for the time being, test points on the boundary are taken only where boundary conditions are known.
2. After a certain number of training stages of the set of neural networks, approximations for the unknown part of the boundary conditions arise on the boundaries of each subdomain.
3. Data are exchanged: additional terms are introduced into each error functional, reflecting the information about the solution on the part of the boundary of Ω′_p where no conditions were specified; this information is the solution built in a neighboring subdomain.
4. The calculation procedure is repeated a specified number of times or until the required accuracy is reached.
A more interesting modification of the Schwarz method arises when the subdomains Ω′_p not only share a common interface but also intersect in sets of nonzero measure. In this case, in stage 3 of the decomposition algorithm, information about the mismatch of the solutions is entered into the error functionals J_p in the form

Σ_{q ≠ p} ε_{q,p} Σ_{j=1}^{M′_{q,p}} Σ_{i=1}^{r_p} [ u_{i,p} − u_{i,q} ]²(x_{j,q,p}),

where the test points x_{j,q,p} are taken in the intersection Ω′_q ∩ Ω′_p (which gives a smoother junction); a code sketch of this mismatch penalty is given below. The calculation of approximate solutions for the subdomains and the data exchange when building the solution in the whole domain can be implemented within the framework of grid technologies. In this case, the solution for each subdomain Ω′_p is adjusted on its own computer, taking into account the approximations of the solutions on the intersections with the neighboring subdomains; this information is sent from time to time from the corresponding computers.
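A minimal sketch of the interface-mismatch penalty from the modified Schwarz method, under illustrative assumptions: the two subdomain models and the set of points sampled in their overlap are placeholders.

import numpy as np

def interface_mismatch(model_p, model_q, overlap_points, eps=1.0):
    """Penalty term forcing two subdomain models to agree on their overlap.

    model_p, model_q -- callables returning the approximate solution at given points
    overlap_points   -- points x_{j,q,p} sampled in the intersection of the two subdomains
    eps              -- weighting factor epsilon_{q,p}
    """
    diff = model_p(overlap_points) - model_q(overlap_points)
    return eps * np.sum(diff ** 2)

# During training of subdomain p, this value is added to J_p so that the locally
# trained solutions join smoothly across subdomain boundaries.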


III. The approach based on the Group Method of Data Handling (GMDH) ideology [5,10,14–17].
1. For each condition A_q(u_1, u_2, …, u_r)|_{Ω_q} = 0 we select a set of basis functions ψ_{q,s}(x) = v_s(x; a_{q,s}), q = 1, …, Q, s = 1, …, r. At the first step, k = 1, we consider linear combinations of the basis functions, not only pairwise combinations y_{i,s}(1) = c_{q,s}(1)ψ_{q,s} + c_{t,s}(1)ψ_{t,s}, q = q(i, 1), t = t(i, 1), but also combinations of more terms; in any case, the volume of enumeration should be limited. We adjust the coefficients by minimizing the functional

Σ_{j=1}^{M_p} [ A_p(u_1, u_2, …, u_r)(x_{j,p})|_{Ω_p} ]².

2. We choose the best of the resulting functions in the sense of the value of a functional of the form

Σ_{q ∈ Q_p} δ_{q,p} Σ_{j=1}^{M_p} [ A_q(u_1, u_2, …, u_r)(x_{j,q})|_{Ω_q} ]²,

where Q_p is the set of indices of the domains intersecting (close to) Ω_p. Several such functionals can be used for the selection.
3. At step number k: when creating a single-layer neural network (for example, an RBF-network), we consider linear combinations of the functions obtained at the previous step (only pairwise combinations y_{i,s}(k) = c_{q,s}(k) y_{q,s}(k−1) + c_{t,s}(k) y_{t,s}(k−1), q = q(i, k), t = t(i, k), or larger ones) and adjust the coefficients by minimizing the same functional as in step 2. When creating a multilayer neural network, we consider combinations of the form y_{i,s}(k) = c_{q,s}(k) y_{q,s}(k−1) + c_{t,s}(k) y_{t,s}(k−1) + c_{p,s}(k) φ(y_{p,s}(k−1)), where φ is the activation function and q = q(i, k), t = t(i, k), p = p(i, k), again minimizing the same functional.
4. We then repeat the previous steps, adjusting the parameters of the models with one functional and selecting the best models with the other, until the error becomes sufficiently small.
Note that the following modification of the algorithm is possible: when composing the pairwise linear combinations in step 2, we select functions corresponding to different subdomains, and when selecting pairs in step 3, we consider the best pairs to be, for example, those that give the minimal mismatch at the intersection of subdomains. A sketch of the layer-building loop is given below.
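A minimal sketch of one GMDH-type selection layer under illustrative assumptions: each candidate function is a callable, the pair-fitting routine adjusts the coefficients against the training functional, and the selection score is the second functional; all names are placeholders.

from itertools import combinations

def gmdh_layer(candidates, fit_pair, select_score, n_keep=10):
    """One selection layer of a GMDH-type scheme.

    candidates   -- list of current functions (callables)
    fit_pair     -- fit_pair(f, g) -> new callable c1*f + c2*g with coefficients
                    adjusted by minimizing the training functional
    select_score -- select_score(h) -> value of the selection functional (lower is better)
    n_keep       -- how many offspring survive to the next layer
    """
    offspring = [fit_pair(f, g) for f, g in combinations(candidates, 2)]
    offspring.sort(key=select_score)
    return offspring[:n_keep]

# Layers are stacked until the selection score stops improving; the parameters are
# adjusted with one functional and the survivors chosen with the other, as in the text.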


IV. A genetic algorithm for constructing a team of neural networks defining the solution on the basis of domain decomposition. For each subdomain Ω′_p we construct a population of K approximations of the solution u_{1,p}, u_{2,p}, …, u_{r_p,p}, using the part of the conditions corresponding to Ω′_p when specifying the functional

J_p = Σ_{q=1}^{Q_p} δ_{q,p} Σ_{j=1}^{M_p} [ A_q(u_{1,p}, u_{2,p}, …, u_{r_p,p})(x_{j,q,p})|_{Ω_q ∩ Ω′_p} ]².

As in the second approach, the boundary conditions are used only on the part of the boundary Ω_q ∩ Ω′_p: the sets of test points are taken on the boundary only where boundary conditions are known.
1. For further work we choose from each population the K_1 < K best models, based on the minimum of the error functional on the domain Ω∖Ω′_p. Another variant of this step is to choose K_1 models that give the minimum value of the functional that was used for their training and, among them, K_2 < K_1 models for which another functional of the same type is minimal. Yet another option is to rank all models by their error on each of the subsets Ω′_p and to choose for further work the models with the minimum total rank. If a model is ranked among the worst with respect to its own functional but among the best with respect to another one, it can be moved to the corresponding population.
2. We perform random mutations of the models, with a probability that is the larger, the larger the errors in their own and in the other domains (for example, the sum of the errors). These mutations can be of different types: deleting from the sum u_s(x) = Σ_{i=1}^{N_s} c_{i,s} v_s(x; a_{i,s}) of the given model the term with the minimum coefficient c_i, or deleting a randomly selected term; adding a function v with a random parameter vector a_i; randomly changing the parameters a_i by some amount, etc. The functions v_s can also be of different types v_{i,s}, which leads to the construction of heterogeneous models.
3. We perform random mutations of another kind, for example, by exchanging the coefficients c_i of two subsets of functions.
4. We carry out crossing: we take the best K_2 < K_1 models (in the sense of the minimum of the corresponding functional), choose two of them, and take part of the terms from one model and part from the other; the result is a new model, called a descendant, which is added to the set of selected models (a minimal code sketch of this operation is given after the description of this approach). This operation is repeated with some set of such pairs of models. Some descendants are produced from models of the same population (models trained on the same set Ω′_p) and some from models of different populations. With the resulting descendants we replenish each population to the same number K, or to some other number if a population of variable size is considered.
5. We repeat the previous steps a certain number of times or until the error

J(u_1, u_2, …, u_r) = Σ_{q=1}^{Q} δ_q Σ_{j=1}^{M_q} [ A_q(u_1, u_2, …, u_r)(x_{j,q})|_{Ω_q} ]²

for a set of models becomes sufficiently small.
Algorithms of the evolutionary type are modified quite easily, which allows them to be adapted to the features of a specific task. One variant of the genetic algorithm under consideration is obtained if, instead of the functionals J_p, we use separate terms of the sum J(u_1, u_2, …, u_r). In step 3, the models that are worst with respect to each of the considered functionals are discarded; in this case, it is reasonable to subject to crossing in step 5 the models that give the minimum value to different functionals from this set. The algorithm can also be modified by adding the error clustering procedure (Approach I); the resulting functions are added to the basis set as one of the types of mutations in step 3.
This genetic approach is easy to adapt to distributed computing. The most natural option arises if each population of models is adjusted on its own node. In this case, only individual models (a set of parameters and information about the structure) or parts of models intended for crossing need to be transferred between the nodes, together with some auxiliary information such as the values of the optimized functionals. If there are few nodes, several populations can be trained on one node, and it is reasonable to perform more intensive crossing between them than between populations from different nodes. If there are many nodes, then part of a population, or even only individual models, can be placed on each of them. This does not affect the speed of the calculations too much, especially if the models are large, since the most time-consuming operation is the local selection of the parameters.
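A minimal sketch of the crossover and mutation operations from Approach IV, assuming each model is stored as a list of expansion terms (c_i, a_i) with at least two terms; the split points and the random source are illustrative.

import random

def crossover(parent_a, parent_b, rng=random):
    """Produce a descendant by combining terms of two parent models.

    Each parent is a list of terms (c_i, a_i) of the expansion u(x) = sum_i c_i * v(x; a_i);
    the child takes part of its terms from one parent and the rest from the other.
    """
    cut_a = rng.randint(1, len(parent_a) - 1)
    cut_b = rng.randint(1, len(parent_b) - 1)
    return parent_a[:cut_a] + parent_b[cut_b:]

def mutate(model, new_term, rng=random):
    """Simplest mutations: drop a random term or add a freshly generated one."""
    if rng.random() < 0.5 and len(model) > 1:
        model = model[:]
        model.pop(rng.randrange(len(model)))
        return model
    return model + [new_term]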

V. Training a team of expert models. We select K sets of functions of the form u_s(x) = Σ_{i=1}^{N_s} c_{i,s} v_s(x; a_{i,s}), minimizing the single functional

J(u_1, u_2, …, u_r) = Σ_{q=1}^{Q} δ_q Σ_{j=1}^{M_q} [ A_q(u_1, u_2, …, u_r)(x_{j,q})|_{Ω_q} ]².

In this case, the numbers N_s can be taken not too large.
1. For each subdomain Ω_q we select the set of models for which the error

J_q(u_1, u_2, …, u_r) = Σ_{j=1}^{M_q} [ A_q(u_1, u_2, …, u_r)(x_{j,q})|_{Ω_q} ]²

corresponding to the given subdomain is minimal.
2. We train each selected set on its own subset Ω_q, including in the minimized functional the terms responsible for the mismatch at the junctions (see Approach II).
As a result of training, an approximate solution arises that in each subdomain is given by the corresponding model. This algorithm admits a modification in which the decomposition of the region is not set a priori but arises naturally in the course of the algorithm.
1. We select K sets of functions of the form u_s(x) = Σ_{i=1}^{N_s} c_{i,s} v_s(x; a_{i,s}), minimizing the functional

J(u_1, u_2, …, u_r) = Σ_{q=1}^{Q} δ_q Σ_{j=1}^{M_q} [ A_q(u_1, u_2, …, u_r)(x_{j,q})|_{Ω_q} ]².

In this case, the numbers N_s are not too big.
2. For each set, we calculate the errors z_{j,q} = A_q(u_1, u_2, …, u_r)(x_j), where the set of test points x_j from Ω = ∪_{q=1}^{Q} Ω_q can be changed compared to the previous step; the main requirement is that it be sufficiently representative. If the value A_q is undefined at some point x_j, we set z_{j,q} = 0.
3. We cluster the points {(x_j, z_{j,1}, z_{j,2}, …, z_{j,Q})}.
4. We take the clusters (they may intersect), build the corresponding covering of the set Ω, and choose for each resulting subset the set of functions that gives the minimal error.
5. We modify each set on its own subset, including in the minimized functional the terms responsible for the mismatch at the junctions (see Approach II). The subsets themselves may change during the operation of the algorithm.
Just like the previous one, Approach V allows a fairly simple and efficient distributed implementation. The simplest version of such an implementation is to train on each node its own set of functions, namely the one that turned out to be the best on some subset. In this case, only information about the behavior of the approximation at the junctions needs to be sent, and only to those nodes that correspond to the subdomains adjoining the subdomain of the given node. It is also possible to have one set of functions per node. If there are many nodes, one set can be trained on several nodes by implementing some distributed algorithm.
The evolutionary algorithms presented above generalize the approaches formulated earlier for the L-shaped region. Their specific form is determined by the characteristics of the problem being solved.

3.4

Methods of refinement of models of objects described by differential equations

As mentioned in the introduction, real objects usually change over time. In accordance with the changes of the original objects, their models (digital twins) should also change. In this regard, there is a need to make adjustments to the model of the object during its operation. Such an update should be made on the basis of the object observations. At first glance, it is not difficult to do this within the framework of our methodology. It is enough to


add terms corresponding to the new observations to the component J3 of the functional (1.2). It might seem that this change of the functional (1.2) introduces significant changes into the optimization process. However, this functional, namely its components J1 and J2, already changes during optimization because of the regeneration of test points, so no fundamental change is made to the process of building the model. The details of the learning algorithms depend on how essentially the object changes and, accordingly, on how well the new data about the object agree with the model built on the old data. If the changes are large, it may be necessary not only to re-adjust the model parameters but also to change its structure. Such a dynamic change of structure can be achieved by using one of the evolutionary algorithms given earlier in this chapter. The difficulty is that an evolutionary algorithm is a rather slow procedure, which is especially inconvenient if new observations arrive periodically and need to be included in the processing. In this respect, not only the time interval δT between the arrivals of new data but also the characteristic time ΔT of restructuring of the object under study is of significant importance. By ΔT we mean the time over which the simulated object changes so much that the error of the neural network model constructed from the old data is no longer smaller than the error of a similar neural network model with its initially specified weights. If the time in which the algorithm converges is greater than δT, its application is still possible, but the new observations have to be included in the processing. If the convergence time of the algorithm is longer than ΔT, the algorithm should be considered unusable. Let us discuss the different situations and the choice of algorithms for them in more detail.
• Algorithms based on approaches I–V of Section 3.3 can be applied directly if their convergence time is less than δT.
• If the convergence time of the above algorithms is longer than δT but less than ΔT, these algorithms can be used to adapt the weights of the networks, with new data added to the component J3 of the functional (1.2) and obsolete terms removed from it. For the selection of models in this situation, we recommend using the accuracy of the short-term forecast (over 1–3 δT).
• If the convergence time of algorithms I–V of Section 3.3 is longer than ΔT, steps should be taken to reduce the enumeration or to make the process more adaptive. A sharp change in the situation can be tracked by the accuracy of the short-term forecast. In this case, the time interval over which observations are used to calculate the coefficients of the models should not be much longer than the time interval over


which the forecast accuracy stays within the required bounds. For genetic algorithms, adaptability increases if measures are taken against degeneracy, for example, increasing the probability of mutations or providing a diverse selection of individuals for breeding, which allows us to move quickly to a satisfactory model in the changed situation. When different situations can recur, it is also possible to apply the "resurrection" of chromosomes that were the best in some period of time and became extinct in a changed situation but can again be useful after a new change of the observed object. A sketch of the data-refresh step for J3 is given after this list.
• If ΔT is even smaller, then to reduce the search one should use a multi-row GMDH algorithm, the Group Method of Data Handling (Approach III), with pairwise combinations in each selection series (this dramatically reduces the volume of coefficient calculations) and with the inclusion of variables from the previous series in the search. If at some point the optimal model is not from the last selection series, at the next step the models are searched no further than the following series.
• If the rate of arrival of new data increases still further, different subsets of the dataset can be used to generate different models, and the search for models can be reduced to the sequential addition or removal of terms one by one.
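A minimal sketch of the sliding-window refresh of the J3 data-fit component, assuming observations arrive as (x, value) pairs and obsolete terms are dropped automatically; the window length and weight are placeholders.

from collections import deque

class ObservationTerm:
    """Maintains the J3 data-fit component of the functional over a sliding window of observations."""

    def __init__(self, model, max_len=200, weight=1.0):
        self.model = model                  # callable: model(x) -> predicted value
        self.obs = deque(maxlen=max_len)    # old observations fall out automatically
        self.weight = weight

    def add(self, x, value):
        self.obs.append((x, value))

    def value(self):
        """Current contribution of the observations to the error functional."""
        return self.weight * sum((self.model(x) - v) ** 2 for x, v in self.obs)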

References
[1] I.T. Jolliffe, Principal Component Analysis, second ed., Springer Series in Statistics, Springer, New York, 2002. XXIX, 487 p.
[2] A.N. Gorban, B. Kegl, D. Wunsch, A.Y. Zinovyev (Eds.), Principal Manifolds for Data Visualisation and Dimension Reduction, Lecture Notes in Computational Science and Engineering 58, Springer, Berlin, Heidelberg, New York, 2007. XXIV, 340 p.
[3] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999. 823 p.
[4] S. Osovsky, Neural Networks for Information Processing, Finance and Statistics, Moscow, 2002. 344 p. (in Russian).
[5] H.R. Madala, A.G. Ivakhnenko, Inductive Learning Algorithms for Complex Systems Modeling, CRC Press, 1994. 368 p.
[6] J.F. Bonnans, J.C. Gilbert, C. Lemaréchal, C.A. Sagastizábal, Numerical Optimization: Theoretical and Practical Aspects, Springer-Verlag, Berlin, 2006. xiv+490 p.
[7] A. Ruszczyński, Nonlinear Optimization, Princeton University Press, Princeton, NJ, 2006. xii+454 p.
[8] M. Minoux, Mathematical Programming: Theory and Algorithms, Wiley-Interscience, John Wiley & Sons, Ltd, Chichester, 1986. xxviii+489 p.

Further reading
[9] J. Schmidhuber, Deep learning in neural networks: an overview, Neural Netw. 61 (2015) 85–117. arXiv:1404.7828. https://doi.org/10.1016/j.neunet.2014.


[10] A.G. Ivakhnenko, Inductive Method of Self-Organization of Models of Complex Systems, Naukova Dumka, Kyiv, 1981 (in Russian).
[11] H.R. Madala, A.G. Ivakhnenko, Inductive Learning Algorithms for Complex Systems Modeling, CRC Press, 1994.
[12] V.V. Strizhov, Methods of Inductive Generation of Regression Models, CC RAS, Moscow, 2008. 55 p. (in Russian).
[13] V. Strijov, E.A. Krymova, The Methods of Choice of Regression Models, CC RAS, Moscow, 2010. 60 p. (in Russian).
[14] A.G. Ivakhnenko, Heuristic self-organization in problems of engineering cybernetics, Automatica 6 (1970) 207–219.
[15] A.G. Ivakhnenko, Polynomial theory of complex systems, IEEE Trans. Syst. Man Cybern. SMC-1 (4) (1971) 364–378.
[16] S.J. Farlow, Self-Organizing Methods in Modelling: GMDH Type Algorithms, Marcel Dekker, Inc., New York, Basel, 1984. 350 p.
[17] M. Mottaghitalab, Genetic Algorithms and Group Method of Data Handling-Type Neural Networks Applications in Poultry Science, in: Real-World Applications of Genetic Algorithms, University of Guilan, 2012. https://doi.org/10.5772/37551.


4

Results of computational experiments

In previous chapters, we outlined the main stages of the universal process of building neural network models from data, including differential equations and other conditions. In this chapter, we present the results of computational experiments demonstrating the working capacity and effectiveness of the proposed methods. To facilitate understanding, we preserve the set and the order of the problems described in Chapter 1. Before reading this chapter, we recommend re-reading the relevant section of Chapter 1.

4.1

Solving problems for ordinary differential equations

4.1.1

Stiff form of differential equation

In this section, we present the results of computational experiments on the construction of a parametric neural network model for solving problem (1.10) [1,2]. We chose α ∈ [5; 50] or α ∈ [0.5; 50] and x ∈ [0; 1] as the intervals of parameter variation. When this problem is solved by the explicit Euler method with α = 10, a critical value of the grid step equal to 1/25 arises; when it is exceeded, the approximate solution becomes unstable, with strong oscillations (Fig. 4.1A) [1]. A smaller step, in turn, still gives too large an error. We focused our attention on several fundamental issues. First, we studied the possibility of extending the parameter interval within a single neural network model without loss of accuracy, that is, the possibility of increasing the set of simultaneously solvable problems. Second, we investigated the influence of a new approach to choosing the test points on the accuracy of the solution. Third, we conducted a study aimed at refining the solution using additional heterogeneous data. As such data, we used point



Fig. 4.1 Solutions of the stiff form of the differential equation (1.10) obtained by the explicit Euler method (points) and analytically (lines); grid spacing is 1/25; (A) α = 50 and (B) α = 0.5.

data on the desired function, including inaccurate data, as is often the case in real models. A convenience of the problem under consideration is that it has an analytical solution; comparison of our approximate solutions with it allows us to evaluate the obtained results objectively. The stiffness of the problem manifests itself for the variable x in the neighborhood of zero, which is the reason for choosing the corresponding interval. Trial computations showed that the quality of the neural network result is preserved for larger intervals of variation of the variable x. An approximate solution is sought as the output of an artificial neural network of a given architecture,

y_n(x, α) = Σ_{i=1}^{n} c_i v(x, α, a_i),

whose weights {c_i, a_i}, i = 1, …, n, are determined in the process of minimizing the error functional (1.2) with

J1 = Σ_{j=1}^{m} [ y′(ξ_j, α_j) + α_j ( y(ξ_j, α_j) − cos ξ_j ) ]²,   J2 = Σ_{j=1}^{m} y²(0, α_j).

We assess the quality of the obtained solution using the exact analytical solution of problem (1.10), which has the form

y(x, α) = [ α²( cos x − exp(−αx) ) + α sin x ] / ( α² + 1 ).   (4.1)
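A minimal sketch of the training loop for this parametric problem, under explicit assumptions: the equation residual is taken as y′ + α(y − cos x), consistent with the exact solution (4.1); the basis is of the asymmetric Gaussian type with squared widths to keep the exponents negative; the x-derivative is approximated by a finite difference; and crude finite-difference gradient descent stands in for the RProp/cloud optimizer used in the text. All numerical settings are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def y_trial(x, alpha, c, a):
    """Trial solution y_n(x, alpha); the factor x enforces the initial condition y(0, alpha) = 0."""
    terms = x * np.exp(-(a[:, 0] ** 2) * (x - a[:, 1]) ** 2
                       - (a[:, 2] ** 2) * (alpha - a[:, 3]) ** 2)
    return np.dot(c, terms)

def functional(params, pts, n, h=1e-4):
    """J1 + J2 from the text; dy/dx is taken by a central finite difference for brevity."""
    c, a = params[:n], params[n:].reshape(n, 4)
    J = 0.0
    for x, alpha in pts:
        dydx = (y_trial(x + h, alpha, c, a) - y_trial(x - h, alpha, c, a)) / (2 * h)
        J += (dydx + alpha * (y_trial(x, alpha, c, a) - np.cos(x))) ** 2   # equation residual
        J += y_trial(0.0, alpha, c, a) ** 2                                # initial condition (zero for this basis)
    return J

n = 10
params = rng.normal(size=5 * n)
for step in range(100):
    if step % 3 == 0:                                   # regenerate the random test points
        pts = np.column_stack([rng.uniform(0, 1, 20), rng.uniform(0.5, 50, 20)])
    grad = np.zeros_like(params)
    for k in range(params.size):                        # finite-difference gradient (slow but simple)
        e = np.zeros_like(params); e[k] = 1e-5
        grad[k] = (functional(params + e, pts, n) - functional(params - e, pts, n)) / 2e-5
    params -= 1e-3 * grad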

To search for an approximate solution, we used neural networks with different basic functions v(x, α, a) with different numbers of


neurons in the network. In this section, we present the results of calculations for the two types of functions that turned out to be the best. The first type is sigmoids,

v(x, α, a) = tanh[a(x − d)] tanh[a₁(α − d₁)],   a = (a, d, a₁, d₁),

and the second type is asymmetric Gaussians,

v(x, α, a) = x exp[−a(x − d)²] exp[−a₁(α − d₁)²],   a = (a, d, a₁, d₁),

which satisfy the initial condition. The optimization of the error functional was performed using an algorithm that combines the Rprop method and the cloud method (see Section 3.2 and Ref. [2]); the test points were randomly regenerated every three steps, and the cloud consisted of three particles. The optimization process is complicated by the fact that the optimized functional changes after each regeneration of test points. In this way we avoid obtaining a good approximation on a fixed set of points and a poor one at other points of the considered region, which can occur when the collocation method is applied. It should be emphasized that the computational procedure contains double stochasticity: in addition to the random regeneration of points, the initial weights of the neural network are chosen randomly. We also studied model-building algorithms that use additional data about the desired solution and estimated the effect of such refinement for different types of basis functions and different numbers of neurons in the network. As such data, we used the correspondence of the sought solution to the approximations already found with the explicit Euler method for the parameter values α = 5 and α = 50. Note that for α = 5 the equation is no longer stiff and is solved fairly accurately; a "bad" solution at α = 50 allows us to study the model's reaction to inaccurate data. The new information is introduced into the model by adding to the functional (1.2) an additional term of the form

δ₁ Σ_{j=1}^{m} [ f(x_j) − y(x_j) ]²,

where f(x_j) is the pointwise solution obtained by the Euler method; the weight δ₁ can be varied taking into account the accuracy of the available data. In the first modification of the algorithm for constructing the neural network model, we used the correspondence of the


obtained solutions to the approximations already found by the explicit Euler method for the parameter value α = 50. Some results of computational experiments, in terms of the values of the error functional for the two types of neural networks, are given in Table 4.1. We considered networks with different numbers of neurons; the number of iterations was 200. Evidently, in the absence of additional data, the variant with basis functions satisfying the initial condition (asymmetric Gaussians) performed best. Involving additional information improves accuracy only when universal basis functions (sigmoids) are used, whereas in the case of a network with functions adapted to the initial condition the error increases. The effect of using the data is especially noticeable for networks with a relatively large number of neurons (n = 50).

Table 4.1 Values of the error functional for different data sets.

                Basic model        1st modification        2nd modification
                (δ1 = 0)           (data for α = 50)       (data for α = 5 and α = 50)
 n      Sigmoids   Gaussians   Sigmoids   Gaussians    Sigmoids   Gaussians
 5      4.078      1.503       2.176      3.746        1.561      3.376
 20     4.312      0.932       2.781      1.226        1.673      2.074
 50     8.811      1.787       4.482      1.587        1.260      1.556

Notations: n is the number of neurons, α is the parameter, δ1 is the weight of the additional data; for non-zero δ1, the additional data are used for the various values of n. The number of iterations is 200.

The large error in the column δ1 = 0 (i.e., in the absence of additional data) for the network with sigmoid basis functions can be explained by the sensitivity of the functional to the sharp growth of the solution; increasing the number of neurons does not change the situation. In the absence of data, or when only a few data are available, a large number of neurons only slows down learning and worsens the result; it is important to note that in this case the results can be improved by a significant increase in the number of iterations. Thus, refinement of the neural network solution is possible even when inaccurate data are used as additional information,


for example, approximate numerical solutions obtained by classical methods, including such weak ones as the explicit Euler method.
We now use the new procedure for the regeneration of test points. We introduce a parameter dt that takes values from 0 to 1 and specifies the fraction of points kept from one iteration to the next (a sketch of this rule is given below). The condition dt = 0 means complete regeneration, i.e., all test points are re-selected at random (uniformly over the interval in question) before each iteration; dt = 1 means that the points are fixed at the first iteration and no longer change. For intermediate values of the parameter dt, the following rule is used: of all m test points, the dt·m points with the largest values of the error functional terms are kept, and the remaining points are regenerated randomly. At the first iteration, for every value of dt, the points are chosen randomly and uniformly over the region under consideration. In the experiments, a single-layer perceptron-type network was used, which in the previous study turned out to be the most receptive to additional information about the model. The number of neurons in the network was chosen to be twenty (n = 20); the data on the correspondence of the desired solution to the approximate solution obtained by the explicit Euler method were used at α = 5 and α = 50; the number of iterations was 300. As an objective assessment of the results, we introduce the following characteristic. Since our equation has the explicit solution (4.1), we can compare the solution built by the neural network with it. As the characteristic we used the standard deviation s computed over 100 thousand pairs of values (α, x), with α and x uniformly distributed over the corresponding intervals. A series of 10 tests was conducted for different values of the parameter dt, and the quality of the solutions built by the neural network was determined using the root-mean-square estimate just described. Table 4.2 presents the results of the experiments in the form of the standard deviation s of the obtained sample from the corresponding values of the exact solution; the number of iterations is 300 and the number of test points is m = 20. The results obtained allow us to conclude that in this case regeneration by the above rule with dt = 0.3 and dt = 0.5 gives a more stable result of neural network modeling than the other options for choosing the parameter dt.
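A minimal sketch of the partial-regeneration rule for the test points, assuming per-point residuals are available; the sampling ranges are illustrative.

import numpy as np

def regenerate(points, residuals, dt, x_range=(0.0, 1.0), a_range=(0.5, 50.0),
               rng=np.random.default_rng()):
    """Keep the fraction dt of test points with the largest residuals, redraw the rest at random.

    points    -- array of shape (m, 2) with current test points (x_j, alpha_j)
    residuals -- array of shape (m,) with the error-functional terms at these points
    dt        -- fraction of points to keep (dt = 0: full regeneration, dt = 1: fixed points)
    """
    m = len(points)
    n_keep = int(round(dt * m))
    keep_idx = np.argsort(residuals)[::-1][:n_keep]          # worst points are retained
    fresh = np.column_stack([rng.uniform(*x_range, m - n_keep),
                             rng.uniform(*a_range, m - n_keep)])
    return np.vstack([points[keep_idx], fresh])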


Table 4.2 Evaluation of the quality of the neural network model with partial regeneration of points.

 Deviation   dt = 0   dt = 0.3   dt = 0.5   dt = 0.7   dt = 1
 10²·s       0.96     0.59       0.48       0.92       1.25

In the next series of computational experiments, we study the possibility of refining the neural network model using the asymptotic condition. For this purpose, the term (1.12) was added to the functional (1.2):

J3 = δ̃₁ Σ_{k=1}^{K} ( y(x_k, α_k/M) − A₁(x_k, α_k/M) )² + δ̃₂ Σ_{l=1}^{L} ( y(x_l, α_l + M) − A₂(x_l, α_l + M) )².

Here the α_j are the same points as in the functionals J1 and J2, and M is a sufficiently large fixed positive number. We also took into account the data on the correspondence of the desired solution to the approximate solution obtained by the Euler method at the parameter value α = 50; thus, heterogeneous data were used in constructing the model. In addition, we continued to study the effect of the test-point regeneration procedure described above. The approach was investigated for a network with 20 basis functions (n = 20) and with 20 and 50 test points (m = 20 and m = 50). The asymptotic condition was applied with M = 50, M = 100, and M = 200. Note that the values of the parameter in Eq. (1.12) lie beyond the interval on which the conditions of problem (1.10) are assumed to be satisfied, so we may speak of a non-classical statement. For each set of parameters, a series of tests was carried out, and the quality of the solutions built by the neural network was determined using the root-mean-square estimate mentioned above. The results of the experiments are given in Table 4.3. For the model with 50 test points, only the case of the asymptotic condition with M = 50 is presented, since for M = 100 and M = 200 no significant differences in the results were found. Evidently, for large values of the parameter M the problem becomes essentially non-classical, which manifests itself as a deterioration of the results under full regeneration for m = 20. In these cases, the method with partial regeneration of test points works better, and the collocation method gives the largest error.


It is important that for the model with fixed points at m = 50 we obtain the same good result as with complete regeneration of points at m = 20; in this case, however, the training time increases greatly (it depends linearly on the number of test points). Thus, other things being equal, complete regeneration of points makes it possible to reduce the run time of the algorithm by reducing the number of test points while maintaining the accuracy of the result.

Table 4.3 Quality assessment 10²·s of the constructed neural network models taking into account the asymptotic condition, for various values of the parameter M.

          M     dt = 0   dt = 0.3   dt = 0.5   dt = 0.7   dt = 1
 m = 20   50    0.41     0.99       0.32       0.83       1.24
          100   0.68     1.25       1.70       1.10       1.35
          200   1.49     0.96       1.27       1.01       1.95
 m = 50   50    0.95     0.46       1.18       0.71       0.27

An increase in the error under complete regeneration of 50 test points indicates some over-fitting of the network; that is, the method and the number of points must be selected in accordance with the conditions of the problem. Partial regeneration can be successfully applied to a non-classical problem, thereby allowing the model to be refined at "difficult" points. Thus, the neural network successfully solves non-classical problems, and the asymptotic model gives a more uniform approximation over the entire interval of parameter variation.
We also investigated another, hybrid approach, in which the neural network approximation y_n(x, α) of the solution of problem (1.10) is used to modify the implicit second-order Adams method for solving a differential equation y′ = f(x, y):

y_{i+1} = y_i + (h/2) ( f(x_i, y_i) + f(x_{i+1}, y_{i+1}) ).

In the classical approach, y_{i+1} is given implicitly, and at each step some method of solving this non-linear equation has to be applied. Most often the "predictor-corrector" method is used, consisting of a two-stage calculation at each step. At the first stage, the Euler method is applied:

ŷ_{i+1} = y_i + h f(x_i, y_i),


and the second stage is the Adams formula:

y_{i+1} = y_i + (h/2) ( f(x_i, y_i) + f(x_{i+1}, ŷ_{i+1}) ).

We use the neural network approximation y_n(x, α) to replace the two formulas by one:

y_{i+1} = y_i + (h/2) ( f(x_i, y_i) + f(x_{i+1}, y_n(x_{i+1}, α)) ).

This approach was applied and analyzed for neural network models with a perceptron network and with an RBF-network with Gaussian basis functions, for different numbers of neurons, and in the presence and absence of additional data. We describe the most interesting results. As expected, in the absence of additional data, the best result for the considered hybrid method is given by the neural network with Gaussians. The number of neurons n = 5 is insufficient, and for n = 50 over-fitting occurs. For n = 20 there is a significant decrease in the error at α = 50, and thus the hybrid method improves the result of the predictor-corrector method in the case of the stiff form of the differential equation. For the perceptron network, the effect of applying the hybrid method manifests itself in the model that uses additional data for the parameter values α = 5 and α = 50. Already for a network with a small number of neurons, n = 5, the result is improved for the stiff equation with α = 50 compared with the neural network and classical methods, although this is not observed for small values of the parameter; apparently, the number of neurons is insufficient, since for n = 20 and n = 50 the best result for the stiff case is preserved and the accuracy of the classical methods is reached for small values of the parameter α. As a result of the study, it was established that the neural network approach to building a mathematical model from heterogeneous data including differential equations allows the error functional to be modified to take into account additional conditions of various types without a significant change in the algorithm. The use of neural networks makes it possible to solve non-classical problems, as well as the problem of building models from inaccurate data, which is often encountered in real applications. In the considered problem, it turned out that the additional information is better utilized by the perceptron with sigmoid basis functions.
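A minimal sketch of the hybrid step, assuming the right-hand side f and a trained network approximation y_net(x, alpha) are available as Python callables; the names and settings are illustrative.

def hybrid_adams_step(f, y_net, x_i, y_i, h, alpha):
    """One step of the second-order (trapezoidal) Adams rule with the neural network
    approximation used in place of the Euler predictor."""
    y_pred = y_net(x_i + h, alpha)                      # network plays the role of the predictor
    return y_i + 0.5 * h * (f(x_i, y_i) + f(x_i + h, y_pred))

def integrate(f, y_net, y0, x0, x1, n, alpha):
    """Integrate y' = f(x, y) on [x0, x1] with n hybrid steps."""
    h, x, y = (x1 - x0) / n, x0, y0
    ys = [y0]
    for _ in range(n):
        y = hybrid_adams_step(f, y_net, x, y, h, alpha)
        x += h
        ys.append(y)
    return ys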


The proposed algorithm for the regeneration of test points saves training time in simple problems and allows the model to be refined in the presence of complex additional information. When neural networks that provide reasonably accurate approximations are used, a hybrid algorithm that employs approximate neural network solutions in classical implicit methods is effective; such an algorithm makes it possible to improve significantly the accuracy of the approximate solution on a discrete set of points.

4.1.2

Chemical reactor problem

This section presents the results of computational experiments for solving problem (1.13). This problem can be solved exactly using the standard method of order reduction [3]. A feature of the problem is the absence of an exact solution for α > α* ≈ 0.878458. The existence of an exact solution θ(x, α) for α ≤ α* allows us to estimate the error of the method objectively (Fig. 4.2).


Fig. 4.2 Graph of the parameter α as a function of the value y(0).

The minimization of the functional (1.2) was carried out using the RProp method [2] with regeneration of M = 100 test points every five steps of the algorithm; 200 regenerations were performed. A detailed analysis of the results can be found in the publications [4, 5].


Numerical experiments showed the following. First, the methods for constructing neural network models that we have proposed allow the neural network solution to be improved significantly if additional information is used; such information can be obtained, for example, from approximations based on traditional numerical methods (even not very accurate ones) or from asymptotic expansions. This refinement effect is lost when approaching the critical value of the parameter. Second, the neural network allows us to construct an approximate solution of the parametric problem in the form of a function, one of whose arguments is the parameter. In this case, the parameter can take values for which the exact solution of the problem does not exist; the absence of a solution manifests itself as a sharp increase in the error of satisfying the equation. For this problem, we also carried out a comparative analysis of the effect of the regeneration of points on the error. Below are the results of computational experiments for a neural network with 100 neurons (Table 4.4). Sigmoids were used as basis functions:

v(x, α, a) = tanh[a(x − d)] tanh[a₁(α − d₁)],   a = (a, d, a₁, d₁).

The number of test points (x_j, α_j) at which the error of satisfying equation (1.13) was calculated is 20. Two cases were considered. In the first case, the number of iterations was chosen equal to 200, with regeneration of the test points (x_j, α_j) every five steps; the points were chosen uniformly distributed at random in the region α ∈ [0.4; 1], x ∈ [0; 1]. In the second case, the points were not regenerated; fixed points were located in the same region at regular intervals over the variation interval of each of the parameters, and 1000 optimization steps were performed.

Table 4.4 Maximum error of the neural network model as a function of the parameter, with full regeneration of points and without regeneration.

 Parameter value           α = 0.4    α = 0.5    α = 0.8
 With full regeneration    0.084      0.0048     0.064
 Without regeneration      0.072      0.12       0.46


Fig. 4.3 Dependence of the first term of the error functional (1.2) on the parameter calculated at 20 test points in the following cases: (A) complete regeneration of points, (B) without regeneration of points.

The results obtained allow us to draw the following conclusions. First, the regeneration of points allowed the error to be reduced radically: it decreased by more than an order of magnitude on the interval α ∈ [0.5; 0.8]. Note that the regeneration of points is a much faster operation than the calculation of the value of the solution and of its derivatives with respect to the parameters at the test points; therefore, the regeneration of points practically does not slow down the calculations. Second, in Fig. 4.3A a sharp increase of J1 is seen for α > α*, which may serve as an indirect sign of the absence of an exact solution. In Fig. 4.3B there is no such effect; moreover, the dependence on the parameter α resembles a quadratic one, which corresponds to substituting an arbitrary smooth function θ(x) into the functional. Thus, in that case, the character of the dependence of the solution on the parameter cannot be recovered. Third, in Fig. 4.3A a significant increase is seen at the left end of the segment. This feature is typical of other problems as well, which allows us to recommend building the neural network model on a slightly wider range of parameter variation than the one of interest.

4.1.3 The problem of a porous catalyst

This section presents the results of computational experiments for the problem (1.14). We considered the case of a flat granule, which corresponds to the choice p = 0. In order to compare our results with the results of [6], in the first series of experiments the calculations were carried out for the same parameter values: α* = 0.1, β* = 0.5, γ* = 1. We looked for a solution in the form of an RBF network with the following basic neuro elements:

v(x, a₁, a₂) = exp{−a₁(x − a₂)²}.

As the method of global minimization used to adjust the parameters of the approximate neural network solution y(x), a modified polyhedron method was chosen. Already for a network with six neuro elements it was possible to build an approximate neural network solution of the problem with a root-mean-square error of order 4·10⁻⁵ (relative error not exceeding 0.08%), stable with respect to perturbations of its parameters; the solution is presented in analytical form, and its values at the control points coincided with the data given in the monograph by Na [6].

In the second series of computational experiments, we constructed a neural network that gives a solution to the problem not for fixed values of the parameters, but for values from some intervals. In this case, each parameter was considered as a network input along with the variable x. As such a parameter, we could choose the parameter α alone, since the dependence on it is the most interesting from the standpoint of applications. However, we decided to introduce all three parameters: α ∈ (αmin; αmax), β ∈ (βmin; βmax), and γ ∈ (γmin; γmax). The minimized error functional in this case is given in Section 1.1.3. Gaussians could be taken as basis functions,

v(x, α, β, γ, a₁, a₂, …, a₈) = exp{−a₁(x − a₂)² − a₃(α − a₄)² − a₅(β − a₆)² − a₇(γ − a₈)²},

but it turned out to be more effective to use a heterogeneous neural network with basic neuro elements of the form

v(x, α, β, γ, a₁, a₂, …, a₈) = exp{−a₁(x − a₂)²}·tanh{a₃(α − a₄)}·tanh{a₅(β − a₆)}·tanh{a₇(γ − a₈)}.

The calculations were carried out for the following intervals of parameter variation: α ∈ (0.05; 0.15), β ∈ (0.4; 0.6), γ ∈ (0.8; 1.2). The optimal weights of the approximate neural network solution y(x, α, β, γ) were selected by minimizing the functional (1.2) using both the modified polyhedron method and the dense cloud method, which in this case turned out to be more efficient. For a network of 30 neuro elements with the following values of the parameters: cloud size ε = 0.03, penalty factor δ = 1, number of test points M = 100, the obtained approximate solution of the problem at the control points differs from the data indicated in [6] by less than 2%.
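A minimal sketch of the heterogeneous neuro element written out above, one Gaussian factor in the spatial variable combined with tanh factors in the three parameters. The outer coefficients c and the parameter layout A are assumptions of this sketch, not the authors' implementation.

import numpy as np

def neuro_element(x, alpha, beta, gamma, a):
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    return (np.exp(-a1 * (x - a2) ** 2)
            * np.tanh(a3 * (alpha - a4))
            * np.tanh(a5 * (beta - a6))
            * np.tanh(a7 * (gamma - a8)))

def network(x, alpha, beta, gamma, c, A):
    """c: outer coefficients of the n elements; A: n rows of inner parameters (a1, ..., a8)."""
    return sum(ci * neuro_element(x, alpha, beta, gamma, ai) for ci, ai in zip(c, A))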

In the third series of computational experiments, we limited ourselves to the interval problem for one parameter, significantly expanding the range of its variation. Following the approach outlined above, we look for a parameterized neural network solution on the interval of parameter variation α ∈ (0; 0.25). As basic neuro elements, we use functions of the form

v(x, α, a₁, a₂, a₃, a₄) = exp{−a₁(x − a₂)²}·tanh{a₃(α − a₄)}.

The neural network was trained by minimizing the error functional using the RProp method with the regeneration of 100 test points (xⱼ, αⱼ) every five steps of the algorithm. We obtained the results below after 200 regenerations. We conducted three sub-series of computational experiments. In the first sub-series, the above approach was applied without changes. In the second and third sub-series, a hybrid method was used. To do this, we constructed an approximate pointwise solution on a uniform one-dimensional grid for α = 0.01 and included its discrepancy with the network output as a term with a penalty (weight) multiplier in the error functional, considering these values of the pointwise solution as additional input data for the neural network solution. In the second sub-series of numerical experiments, the weighting factor of the term corresponding to the difference between the network output and the pointwise solution did not change during the learning process. In the third sub-series, we reduced the weight of the term responsible for the mismatch between the neural network model and the discrete approximation, multiplying this weight by 0.95 at each regeneration of test points. Table 4.5 shows the root-mean-square errors at the test points for the differential equation, σ₁, and for the boundary conditions, σ₂.
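A sketch, under assumed names, of the hybrid error functional of the third sub-series: the usual equation and boundary terms are augmented by a penalty tying the network to the coarse pointwise solution, and the penalty weight is multiplied by 0.95 at every regeneration of test points. The optimizer and the residual functions are placeholders supplied by the caller.

import numpy as np

def hybrid_loss(eq_res, bc_res, data_mismatch, delta):
    """J = J1 + J2 + delta*J3: equation and boundary terms plus the penalty tying
    the network output to the coarse pointwise solution on the alpha = 0.01 grid."""
    return np.mean(eq_res ** 2) + np.mean(bc_res ** 2) + delta * np.mean(data_mismatch ** 2)

def train_hybrid(weights, rprop_steps, residuals, n_regen=200, delta0=1.0, decay=0.95):
    """rprop_steps(weights, loss_fn) performs a few optimizer steps; residuals(weights)
    returns (eq_res, bc_res, data_mismatch) at freshly regenerated test points."""
    delta = delta0
    for _ in range(n_regen):
        loss_fn = lambda w, d=delta: hybrid_loss(*residuals(w), d)
        weights = rprop_steps(weights, loss_fn)
        delta *= decay        # third sub-series: the penalty weight shrinks at each regeneration
    return weights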

Table 4.5 Values of the error functional with different sets of data.

        First sub-series      Second sub-series     Third sub-series
n       σ1        σ2          σ1        σ2          σ1        σ2
10      0.041     0.0064      0.014     0.0026      0.031     0.0057
30      0.0055    0.00047     0.0027    0.00031     0.0021    0.00017
100     0.023     0.0021      0.0037    0.00031     0.0023    0.00015

Notations: n is the number of neurons, σ1 is the root-mean-square error at the test points for the differential equation, σ2 is the root-mean-square error at the test points for the boundary conditions. The number of iterations is 200.

Consider some features of the simulation results.

For the first sub-series, in the case of a network of 10 neurons, the error at the parameter value α = 0.1 can be considered acceptable, which cannot be said about the cases α = 0.01 and α = 0.25. Although the absolute error is rather small, the character of the solution for small values of the parameter α is not reflected accurately by the neural network model. A natural assumption is that the problem lies in an insufficient number of neurons and can be solved simply by enlarging the neural network. For a network of 30 neurons, at the parameter values α = 0.25 and α = 0.1 it was indeed possible to achieve a significant decrease in the error; for α = 0.01 the problem remains. When a network of 100 neurons is used, the errors increase, which is caused by the insufficiency of 200 iterations for training such a network.

For the second sub-series, if a network of 10 neurons is used, the results of the calculations show that the network size is not enough to assimilate all the available information: the equation, the boundary conditions, and the pointwise approximation to the solution. When a network of 30 neurons is used, the results are significantly better than for a network of 10 neurons. The error of the approximate neural network solution at α = 0.01 approximately corresponds to the error of the pointwise approximation used. At the same time, taking this additional information at α = 0.01 into account also significantly increases the accuracy at the parameter value α = 0.1. In the case of a network of 100 neurons, the use of the approximate solution made it possible to train the neural network much better at small values of the parameter, increasing the accuracy of the approximation at α = 0.1 as well. For other values of the parameter α ∈ (0; 0.25), the neural network gives an approximation of similar quality. It should be especially noted that at the parameter value α = 0.01 the error of the neural network is smaller than the error of the additional data used.

For the third sub-series, when a network of 10 neurons is used, the considered approach does not increase the accuracy of the obtained neural network approximation. In the case of a network of 30 neurons, the result is noticeably better than in the previous approach (with a constant weighting factor). This is explained by the fact that at the initial stage of training the error of the neural network is significantly higher than that of the pointwise approximation, and the latter accelerates the training of the network. At the stage when the neural network gives an error comparable to the error of the pointwise approximation, the use of the latter becomes impractical, and its effect on the learning process is reduced by the above adjustment of the weight of the corresponding term in the error functional. In the case of a network of 100 neurons, the results are similar.

Fig. 4.4 Graphs of the neural network solution, the pointwise solution, and the solution found in the Mathematica development environment at α = 0.01 for the third sub-series when using a neural network of 100 neurons.

In conclusion, we can point out that the considered hybrid algorithm allows us to obtain a neural network approximation that is significantly better not only than the neural network approximation constructed without additional information but also than the pointwise approximation itself. The result obtained (see Fig. 4.4) shows that 100 neurons are sufficient to assimilate all the information used in training: the equation, the boundary conditions, and the pointwise approximation. The explicitly constructed approximate neural network solution very accurately simulates the joint processes of heat and mass transfer in a porous catalyst particle not only for specific parameter values but also on the intervals of variation of these parameters, and the model is defined by a single neural network. Note that a neural network trained in this way can be used to determine parameters from measurement data by minimizing the discrepancy between the measurement data and the output of the neural network for these parameters. The results obtained allow us to suggest that such hybrid algorithms can be effective for a fairly representative class of problems of constructing approximate solutions of ordinary differential equations and partial differential equations.

4.1.4 Differential-algebraic problem

In this section, we present the results of testing two neural network approaches to solving the problem (1.16). We look for the solution, a pair of functions y(t) and z(t), in the form of RBF-networks with basic elements of the form v(x, a₁, a₂) = exp{−a₁(x − a₂)²}. The main difficulty in solving this problem is its non-uniqueness: there may be several such pairs of functions for a given value of the parameter ε. The first approach is to restart the process of optimizing the functional (1.2) with components (1.17) from random sets of initial weights. We used networks of 10 neurons and performed 200 restarts. The second approach allows us to estimate the number of approximate solutions of the problem and the corresponding values of the parameter p for a fixed value of ε. To do this, we optimize the error functional (1.2) with components (1.18) on neural network functions with 50 basic neuro elements of the form

v(x, p, a₁, a₂, a₃, a₄) = exp{−a₁(x − a₂)²}·tanh{a₃(p − a₄)}.

After that, we consider the graph of the dependence of the error functional (1.2) on the parameter p, replacing pᵢ in Eq. (1.18) with a variable p. We assume that the local minima of this graph correspond to approximate solutions of problem (1.16). Fixing the corresponding values of the parameter p, we obtain an approximate neural network solution of the problem, which can be refined by optimizing the functional (1.2) with the components (1.17). In [5, 7], the results of computational experiments for these two approaches for the parameter values ε = 0.1 and ε = 0.04, which confirmed the efficiency of the approaches, are presented. This section presents the results of computational experiments for the parameter values ε = 0.4 and ε = 0.01. As samples for comparison, we used the solution of the problem obtained by excluding the function z(t) from Eq. (1.16):

εy″ = y − y⁴, −y(0) + y′(0) = 0, y(1) = 1/2.   (4.2)

We solved this problem by the adjustment (shooting) method, i.e., we solved the Cauchy problem

εy″ = y − y⁴, y(0) = y′(0) = p,   (4.3)

selecting the parameter p so that the condition y(1) = 1/2 is met.

We solved the Cauchy problem (4.3) by the Störmer method [3] with 10,000 steps. Below are the results for the parameter value ε = 0.4 (Fig. 4.5).

Fig. 4.5 The dependence on the parameter p of the difference y(1) − 0.5 between the solution of problem (4.3) at the right end of the interval and the required value; the solution was found by the Störmer method with 10,000 steps.
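A sketch of the shooting procedure, assuming the reduced problem in the form reconstructed in (4.2) and (4.3): the equation εy″ = y − y⁴ is integrated by the Störmer (leapfrog) scheme from y(0) = y′(0) = p, and the discrepancy y(1) − 1/2 is scanned over p to bracket candidate solutions. The scan range is chosen to roughly match Fig. 4.5 and is an assumption of this sketch.

import numpy as np

def shoot(p, eps=0.4, n=10_000):
    """Integrate eps*y'' = y - y^4 on [0, 1] with y(0) = y'(0) = p; return y(1) - 1/2."""
    h = 1.0 / n
    f = lambda y: (y - y ** 4) / eps
    y_prev = p
    y = p + h * p + 0.5 * h * h * f(p)          # first step from the initial data
    for _ in range(n - 1):                       # Stoermer step: y_{k+1} = 2 y_k - y_{k-1} + h^2 f(y_k)
        y_prev, y = y, 2 * y - y_prev + h * h * f(y)
    return y - 0.5

ps = np.linspace(0.1, 1.4, 200)
vals = np.array([shoot(p) for p in ps])
brackets = ps[:-1][np.sign(vals[:-1]) != np.sign(vals[1:])]   # sign changes bracket the roots
print(brackets)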

This graph suggests the existence of two solutions, which is confirmed by the solution of the problem in a wider range of variation of the parameter p. The graphs of these two solutions are presented below (Fig. 4.6).

Fig. 4.6 Graphs of approximate solutions of problem (4.2), found by a combination of the shooting method and the Störmer method for the parameter value ε = 0.4.


These solutions are fairly easy to find using both the first approach described above (the restart method) and the second approach (the neural network analog of the shooting method).

Fig. 4.7 Graph of the dependence of the error functional (1.2) on the parameter p obtained by replacing pᵢ in Eq. (1.18) with a variable p, for the parameter value ε = 0.4.

The graph in Fig. 4.7 shows two local minima, which are close to the roots on the graph in Fig. 4.5.

Fig. 4.8 Graphs of approximate solutions of problem (4.2), found using the second approach for the parameter value ε = 0.4.


These solutions in Fig. 4.8 are qualitatively close to the solutions depicted in Fig. 4.6. They allow refinement by optimizing the same functional as in the first approach. With the parameter value ε = 0.01, the situation is much more complicated (Figs. 4.9 and 4.10).

Fig. 4.9 The dependence on the parameter p of the difference between the solution of problem (4.3) at the right end of the interval and the required value; the solution was found by the Störmer method with 10,000 steps for the parameter value ε = 0.01.

From this graph, one might get the incorrect impression that the problem has three solutions (Fig. 4.10).

Fig. 4.10 The graphs of the first three approximate solutions of problem (4.2), found by a combination of the shooting method and the Störmer method for the parameter value ε = 0.01.


A more thorough study of the left and right ends of the interval of parameter variation shows that this impression is incorrect (Figs. 4.11–4.14).

Fig. 4.11 The dependence on the parameter p, for its small values, of the difference between the solution of problem (4.3) at the right end of the interval and the required value, for the parameter value ε = 0.01; the solution was found by the Störmer method with 10,000 steps.

Fig. 4.12 The graphs of the fourth and fifth approximate solutions of problem (4.2), found by a combination of the shooting method and the Störmer method for the parameter value ε = 0.01.

Fig. 4.13 The dependence on the parameter p, for its maximum values (for which a solution exists), of the difference between the solution of problem (4.3) at the right end of the interval and the required value, for the parameter value ε = 0.01; the solution was found by the Störmer method with 10,000 steps.

Fig. 4.14 The graphs of the last three approximate solutions of problem (4.2), found by a combination of the shooting method and the Störmer method for the parameter value ε = 0.01.

A feature of the solution set for ε = 0.01 is a drastic change in the character of the solutions under minor changes in the parameter p. This change creates great difficulties for the two proposed neural network approaches.


The second approach faces great difficulties. From the graph in Fig. 4.15, it can be seen that not all approximate solutions are immediately accessible to this method. Reducing the interval of variation of the parameter p and increasing the number of neurons improves the situation.

Fig. 4.15 Graph of the dependence of the error functional (1.2) on the parameter p obtained by replacing pᵢ in Eq. (1.18) with a variable p, for the parameter value ε = 0.01.

The considered computational experiments allow us to conclude that the proposed neural network approach does not work well for finding unstable solutions. The situation can be improved by using additional information about the simulated object, as will be shown later when considering other problems.

4.2 Solving problems for partial differential equations in domains with constant boundaries

4.2.1 Solution of the Dirichlet problem for the Laplace equation in the unit circle

In this section, we present the results of computational experiments for the Dirichlet problem for the Laplace operator in the unit circle, which formulation was described in Section 1.2.1. Problem-solving is in the form of an RBF-network (2.15). We fit the solution by optimizing the error functional (1.2).


We used standard Gaussians as basis functions:

v(x, y, aᵢ, xᵢ, yᵢ) = φ(aᵢ√((x − xᵢ)² + (y − yᵢ)²)), φ(t) = exp(−t²).

In addition to the general methods stated earlier, special methods were used. We called the following special methods for fitting the parameters "compensation methods."

(a) Method of "inner circles." The arrangement (selection) of the centers of the basis functions begins from the boundary of the region, for sufficiently large values of the parameters aᵢ and values cᵢ providing the satisfaction of the boundary conditions, with further movement into the region. In this case, the positions of the centers of the next layer are selected at the points of maximum of the value of the Laplace operator on the solutions of the previous step, and the values of the parameters cᵢ are chosen so as to compensate for these values. Next, the minimization of the functional is carried out, accompanied by a decrease in the values aᵢ at each step of the iterative process.

(b) Method of "outer circles." This method consists in choosing centers that are far enough away and selecting the parameter a in such a way as to zero the Laplacian in the region in which the solution is sought.

(c) The combined method allows uniting the arrangement of centers both inside and outside the region.

Another approach should be noted. In this approach, we choose as the functions v the fundamental solutions of the linear differential operator with centers outside the region where the solution is sought. In this case, the differential equation is certainly satisfied, and network training is reduced to satisfying the boundary conditions. In the case of our model problem,

u(x, y) = Σ_{i=1}^{N} cᵢ ln[(x − xᵢ)² + (y − yᵢ)²],

and the term J1 in the error functional (1.2) automatically becomes 0.

The results of the calculations. As one would expect, the fastest methods are those that take the specific features of the problem into account to the greatest extent. The method of "inner" circles immediately gives a relatively good approximation, but the process of its iterative refinement turned out to be unstable. The method of "outer" circles was even more effective than expected and showed rapid convergence. The method based on the use of fundamental solutions (logarithms) also converges rapidly. Taking the features of the problem into account has a reverse side: these methods are difficult to extend to more complex cases, for example, to nonlinear problems. More general methods, for example, simple approximation of the solution by a linear combination of Gaussian packets, converge more weakly but extend without special problems to other domains and equations, including nonlinear ones. As for the approach with approximation by "elliptic" exponents, their advantage over ordinary Gaussians manifests itself mainly in complex regions; in our case, the gain from using this approach was not very significant. A detailed description of the results is given in [8].
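A small sketch of the fundamental-solution approach just described: logarithmic sources are placed on a circle outside the unit disk and the coefficients cᵢ are fitted by least squares to boundary data. The boundary function g used here is an illustrative assumption; the book's actual boundary condition is the one given in Section 1.2.1.

import numpy as np

N = 40                                        # number of logarithmic sources
R = 1.5                                       # radius of the "outer circle" of centers (assumed)
th_src = 2 * np.pi * np.arange(N) / N
xs, ys = R * np.cos(th_src), R * np.sin(th_src)

M = 200                                       # collocation points on the unit circle
th = 2 * np.pi * np.arange(M) / M
xb, yb = np.cos(th), np.sin(th)
g = np.cos(2 * th)                            # assumed boundary data for the demonstration

A = np.log((xb[:, None] - xs) ** 2 + (yb[:, None] - ys) ** 2)   # basis values on the boundary
c, *_ = np.linalg.lstsq(A, g, rcond=None)

def u(x, y):
    """Harmonic inside the disk by construction, so the term J1 in (1.2) vanishes identically."""
    return np.log((x - xs) ** 2 + (y - ys) ** 2) @ c

print(u(0.3, -0.2))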

4.2.2 Solving boundary value problems for the Laplace equation in the unit square

This section presents the results of computational experiments for the Dirichlet problem for the Laplace equation ∂²u/∂x² + ∂²u/∂y² = 0 in the unit square [0; 1] × [0; 1]. As boundary conditions, we chose u = 0 on the sides of the square with x = 0 or y = 0, and u = 1 on the sides with x = 1 or y = 1. We took boundary conditions that are discontinuous at the points (0; 1) and (1; 0) in order to test our algorithms on a difficulty of this kind. We fit the solution by optimizing the error functional (1.2). As the first term J1, responsible for the satisfaction of the equation, we used

J1 = Σ_{j=1}^{M} [Δu(x′_j, y′_j)]²,   (4.4)

or

J1 = Σ_{j=1}^{M} [(∂u/∂x(x′_j, y′_j))² + (∂u/∂y(x′_j, y′_j))²],   (4.5)

where (x′_j, y′_j) are sample points from the square [0; 1] × [0; 1]. As the second term J2, responsible for the fulfillment of the boundary conditions, we used

J2 = Σ_{j=1}^{M1} [u(x″_j, 0)]² + Σ_{j=1}^{M1} [u(0, y″_j)]² + Σ_{j=1}^{M1} [u(x″_j, 1) − 1]² + Σ_{j=1}^{M1} [u(1, y″_j) − 1]²,   (4.6)


where the test points at which the values of the function are calculated are chosen on the boundaries of the square. The solution, as in Section 4.2.1, is sought in the form of an RBF-network (2.15). We used standard Gaussians as basis functions,

v(x, y, aᵢ, xᵢ, yᵢ) = φ(aᵢ√((x − xᵢ)² + (y − yᵢ)²)), φ(t) = exp(−t²),

and second-order basic splines

v(x, y, aᵢ, xᵢ, yᵢ) = φ(aᵢ(x − xᵢ))·φ(aᵢ(y − yᵢ)),
φ(t) = 0.75 − t² for |t| ≤ 0.5; φ(t) = 0.5(1.5 − |t|)² for 0.5 < |t| ≤ 1.5; φ(t) = 0 for |t| > 1.5.   (4.7)
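A compact sketch of how the functional of this section can be evaluated for a Gaussian RBF network: J1 is the sum of squared Laplacian values (4.4) at random interior test points and J2 is the squared boundary mismatch (4.6); the Laplacian of a Gaussian is used in closed form. The weight layout and the random initialization are assumptions of the sketch.

import numpy as np

rng = np.random.default_rng(0)
n = 10
c, a = rng.normal(size=n), rng.uniform(1.0, 3.0, n)
cx, cy = rng.uniform(0, 1, n), rng.uniform(0, 1, n)

def u(x, y):
    r2 = (x[:, None] - cx) ** 2 + (y[:, None] - cy) ** 2
    return np.exp(-a ** 2 * r2) @ c

def lap_u(x, y):
    # Laplacian of exp(-a^2 r^2) is exp(-a^2 r^2) * (4 a^4 r^2 - 4 a^2)
    r2 = (x[:, None] - cx) ** 2 + (y[:, None] - cy) ** 2
    return (np.exp(-a ** 2 * r2) * (4 * a ** 4 * r2 - 4 * a ** 2)) @ c

def J(M=10, M1=10):
    xi, yi = rng.uniform(0, 1, M), rng.uniform(0, 1, M)      # interior test points
    s = np.linspace(0, 1, M1)                                # boundary test points
    J1 = np.sum(lap_u(xi, yi) ** 2)
    J2 = (np.sum(u(s, np.zeros(M1)) ** 2) + np.sum(u(np.zeros(M1), s) ** 2)
          + np.sum((u(s, np.ones(M1)) - 1) ** 2) + np.sum((u(np.ones(M1), s) - 1) ** 2))
    return J1 + J2

print(J())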

Optimization was carried out using the RProp method [9]. The results were compared with an approximate solution to the problem [10],

w(x, y) = (4/π) Σ_{i=1}^{100} [sh(π(2i − 1)x) sin(π(2i − 1)y) + sh(π(2i − 1)y) sin(π(2i − 1)x)] / [sh(π(2i − 1))·(2i − 1)],

based on the application of the Fourier method. We carried out a comparative test of our algorithms in the following situations:
(i) Choosing the term J1 in the form (4.4) or (4.5) as the component of the error functional responsible for the satisfaction of the equation. The results for the first option are shown in Tables 4.6–4.8 and 4.10, for the second option in Table 4.9.
(ii) Using a different number of neurons m (terms in the sum (2.15)). We considered the cases m = 10, m = 30, and m = 100.
(iii) Various ways to regenerate the test points (x′_j, y′_j), including the absence of regeneration (points are chosen randomly with a uniform distribution in the square [0; 1] × [0; 1]), full regeneration (points are regenerated, uniformly distributed in the square [0; 1] × [0; 1], every five steps of the nonlinear optimization algorithm), and partial regeneration (with a fraction dt of non-regenerated points, as in Section 4.1.1).
(iv) Various ways of forming the network structure: a fixed number of neurons, adding neurons one at a time with additional training of the entire network, and adding neurons one at a time with additional training of the last added neuron only. The results for the first option are shown in Tables 4.6, 4.9, and 4.10, for the second option in Table 4.7, and for the third option in Table 4.8.
(v) A different number of test points.
(vi) Various basis functions.
The quality of the neural network model was assessed using the functional J1 calculated at 10,000 random test points inside the square and the functional J2 calculated at 10,000 test points distributed at regular intervals on each of its boundaries.

The first series of computational experiments. In this series, the functional with the first term (4.4) is used, and the number of neurons N is constant. Errors were evaluated after 1000 steps of the RProp method.

Table 4.6 Evaluation of the quality of the neural network model when training the entire network at once with a fixed number of neurons.

                              M = M1 = 10                 M = M1 = 100
N    dt            10⁴J1        10⁴J2/4       10⁴J1         10⁴J2/4
10   0    Min      1.50346      0.010043      0.535972      0.0100638
          Average  5.191937     0.020631      0.8535074     0.01727334
     0.5  Min      2.45477      0.013496      0.683068      0.0114621
          Average  32.21212     0.024794      1.36162211    0.01866244
     1    Min      8.91144      0.013612      0.58785       0.0134612
          Average  136.9011     0.021124      0.86259503    0.01912194
30   0    Min      1.12195      0.018525      0.395031      0.0109505
          Average  34.60807     0.021805      0.7311151     0.01281265
     0.5  Min      2.72336      0.016623      0.437046      0.0103951
          Average  28.42257     0.020994      1.0236731     0.01282435
     1    Min      46.3502      0.013689      0.437676      0.0100392
          Average  215.229      0.016629      2.37419       0.01220732
100  0    Min      1.57796      0.018168      0.453604      0.0109912
          Average  490.3426     0.042897      0.7433044     0.02621479
     0.5  Min      4.33921      0.017015      0.801245      0.0108334
          Average  438.294      0.02112       2.36012046    0.02309024
     1    Min      48.2489      0.013939      0.394715      0.0090717
          Average  257.7467     0.016415      3.3334859     0.02953327


Analysis of the results of the computational experiments presented in Table 4.6 leads to the following conclusions:
(1) With a small number of test points (M = 10) at which the operator is calculated, their regeneration can reduce the residual of the equation significantly (many times over). With a large number of test points (M = 100), their regeneration has little effect.
(2) With an increase in the number of neurons, the error decreases weakly, which indicates the insufficiency of the applied number of steps for training neural networks with the number of neurons N ≥ 30. With a further increase in the number of optimization steps, the error continued to decrease, but this greatly increased the training time.
(3) The error in the fulfillment of the boundary condition depends weakly on both the number of test points on the boundary and the number of neurons.
(4) With a large number of neurons and test points, the regeneration of test points leads to greater stability of the learning process, since the average and minimum errors differ little. Thus, the learning process of a neural network with regeneration of test points can be started once, whereas for a network without regeneration of test points the training process has to be launched more than once in order to obtain the same result.

We illustrate the results with several graphs.

Fig. 4.16 Comparison of the neural network solution u(x, x) for N = 3, M = M1 = 10, built with dt = 0, and the approximate solution w(x, x).


From the graph (Fig. 4.16), we conclude that even a network with a small number of neurons allows us to get the right qualitative picture of the behavior of the solution.

Fig. 4.17 Comparison of the neural network solution u(x, 1) for N = 3, M = M1 = 10, built with dt = 0, and the approximate solution w(x, 1).

From the graph (Fig. 4.17), we can see that the approximation of the boundary condition is significantly worse than the approximation of the solution inside the region. This poor approximation was to be expected, since the boundary condition is discontinuous at the point (0; 1). The solution constructed using the Fourier method approximates the boundary condition much better, but oscillates quickly, which is difficult to expect when modeling real objects.

Fig. 4.18 Comparison of the neural network solution u(x, x) for N = 3, M = M1 = 10, obtained with dt = 1, and the approximate solution w(x, x).

From the graph (Fig. 4.18), we conclude that, without regeneration of the test points, the approximation is noticeably worse, although the network still provides the correct qualitative understanding of the solution's behavior.

Fig. 4.19 Comparison of the neural network solution u(x, 1) for N = 3, M = M1 = 10, obtained with dt = 1, and the approximate solution w(x, 1).

From the graph (Fig. 4.19), we see that, without regeneration of the test points, the approximation of the boundary condition turned out noticeably worse than with regeneration.

Fig. 4.20 Comparison of the neural network solution u(x, x) for N = 10, M = M1 = 10, obtained with dt = 0, and the approximate solution w(x, x).


From the graph (Fig. 4.20), we conclude that the network with the number of neurons N = 10 allows us to obtain a good approximation of the solution inside the region.

Fig. 4.21 Comparison of the neural network solution u(x, 1) for N = 10, M = M1 = 10, built with dt = 0, and the approximate solution w(x, 1).

From the graph (Fig. 4.21), we see that the approximation of the boundary condition becomes somewhat better, but not dramatically, with an increase in the number of neurons to N = 10.

Fig. 4.22 Comparison of the neural network solution u(x, x) for N = 10, M = M1 = 10, obtained with dt = 1, and the approximate solution w(x, x).

From the graph (Fig. 4.22), we conclude that without regeneration of the test points, the approximation turned out to be significantly worse than with regeneration.

Fig. 4.23 Comparison of the neural network solution u(x, 1) for N = 10, M = M1 = 10, obtained with dt = 1, and the approximate solution w(x, 1).

A comparison of Figs. 4.21 and 4.23 shows that, without regeneration of the test points in the region [0; 1] × [0; 1], the approximation of the boundary condition turned out to be somewhat better than with regeneration. This is because the boundary points themselves were not regenerated.

The second series of computational experiments. In this series, the functional with the first term (4.4) is used, and neurons are added one by one. After adding a neuron, 250 learning steps of the entire network were performed using the RProp method. Errors were evaluated after the addition of the neuron with index N.

Table 4.7 The quality assessment of the neural network model with the sequential addition of neurons one at a time and training of the entire network at once.

                              M = M1 = 10                 M = M1 = 100
N    dt            10⁴J1        10⁴J2/4       10⁴J1         10⁴J2/4
10   0    Min      1.12437      0.0190266     0.636236      0.0123691
          Average  7.546195     0.0249316     0.8868568     0.01814682
     0.5  Min      2.84753      0.0187088     0.68108       0.0143824
          Average  19.985071    0.025771      1.408333      0.01980869
     1    Min      8.9184       0.0170643     0.655234      0.0129098
          Average  141.9396     0.0222022     1.427212      0.01854509
30   0    Min      33.5568      0.0132936     0.299308      0.0083889
          Average  478.68432    0.0163435     32.3059913    0.01084514
     0.5  Min      9.19747      0.0133931     0.360398      0.00753512
          Average  362.06276    0.0172382     269.864234    0.00943272
     1    Min      180.072      0.011423      0.465636      0.00549309
          Average  1245.9837    0.0165221     25.2227232    0.00844393
100  0    Min      37.3804      0.0122037     0.234528      0.00615361
          Average  704.72347    0.0167627     1.6299688     0.00702222
     0.5  Min      10.2741      0.0172789     0.346248      0.00208466
          Average  339.26067    0.0200427     3289.46954    0.00536033
     1    Min      690.382      0.0087997     3.40314       0.0032832
          Average  1683.3929    0.0130646     137.957773    0.00527263

Analysis of the results of the computational experiments presented in Table 4.7 leads to the following conclusions:
(1) The first and second conclusions from Table 4.6 are also preserved in this case.
(2) The error in the fulfillment of the boundary condition depends weakly on the number of neurons but can be significantly reduced by increasing the number of test points on the boundary.
(3) With a large number of neurons and test points, the regeneration of test points leads to a significant decrease in both the average and minimum errors in satisfying the equation. In this case, partial regeneration of test points can significantly reduce the error in the fulfillment of the boundary condition.
(4) The computation time in the second series of computational experiments is significantly longer than in the first, which makes this way of organizing the computations unjustified.

The third series of computational experiments. In this series, the functional with the first term (4.4) is used, and neurons are added one by one. After adding a neuron, 250 steps of training its weights were performed using the RProp method. Errors were estimated after the addition of the neuron with index N.

Table 4.8 The quality assessment of the neural network model with the sequential addition of neurons one by one and learning of the weights of the last neuron only.

                              M = M1 = 10                 M = M1 = 100
N    dt            10⁴J1        10⁴J2/4       10⁴J1         10⁴J2/4
10   0    Min      1.24483      0.0330869     0.709983      0.0268538
          Average  1.943577     0.04542019    1.0517298     0.04388407
     0.5  Min      0.695721     0.0329726     0.812818      0.0303641
          Average  4.3012431    0.04459741    1.238377      0.04017467
     1    Min      0.0387453    0.0228246     0.689476      0.0235465
          Average  37.409902    0.04358381    3.7309356     0.04232399
30   0    Min      0.730198     0.0256905     0.816593      0.0195414
          Average  3.7559288    0.03744267    1.1164618     0.03203485
     0.5  Min      0.670894     0.0219016     0.655584      0.0215338
          Average  15.991006    0.04446444    1.3127814     0.02593248
     1    Min      1.94203      0.0202368     0.730069      0.0213206
          Average  66.857042    0.03922225    1.6610091     0.03037779
100  0    Min      2.88166      0.0259495     0.484643      0.0175792
          Average  20.250062    0.07519275    0.9476574     0.02532041
     0.5  Min      3.35531      0.0361489     0.755501      0.0155587
          Average  31.269307    0.19731299    1300.1118     3.31541737
     1    Min      7.55644      0.0274263     1.10541       0.0164017
          Average  36.675575    0.05369863    103.40072     0.19754137

Analysis of the results of the computational experiments presented in Table 4.8 allows drawing the following conclusions:
(1) The findings from Table 4.6 are also preserved in this case.
(2) Since this method is computationally significantly less expensive than the previous two, we recommend using it in linear problems.


The fourth series of computational experiments. In this series, we use the functional with the first term (4.5), and the number of neurons N was constant. Errors were estimated after 1000 steps of the RProp method.

Table 4.9 The quality assessment of the neural network model when training the entire network at once using the energy functional.

                              M = M1 = 10                 M = M1 = 100
N    dt            10⁴J1        10⁴J2/4       10⁴J1         10⁴J2/4
10   0    Min      75.1589      0.015186      15.4035       0.00739925
          Average  152.5317     0.017836      146.42966     0.00913999
     0.5  Min      151.549      0.014755      31.2173       0.00830978
          Average  293.7417     0.017345      126.65599     0.00999224
     1    Min      111.93       0.014952      42.4635       0.00610112
          Average  352.2608     0.017504      438.70353     0.00860041
30   0    Min      110.632      0.01494       53.85         0.00697124
          Average  330.8709     0.017582      243.66644     0.00859789
     0.5  Min      137.564      0.014119      39.4318       0.00609196
          Average  443.864      0.017052      258.2448      0.00783993
     1    Min      210.217      0.010674      91.7742       0.00491224
          Average  1011.48      0.017868      839.07612     0.00705614
100  0    Min      159.774      0.015101      43.8676       0.00754138
          Average  1001.124     0.022083      136.95634     0.00967942
     0.5  Min      220.91       0.015879      80.0465       0.00709269
          Average  2085.591     0.02241       242.78757     0.00842608
     1    Min      728.782      0.015789      47.3603       0.00703687
          Average  2177.974     0.01926       687.69984     0.00902301

A detailed analysis of Table 4.9 allows us to draw conclusions similar to those for Table 4.6, but the significant increase in the error compared with the case of the functional (4.4) does not allow us to recommend its use in conjunction with basis functions of high smoothness. The only advantage of the functional (4.5) is the possibility of using non-smooth basis functions, for example, piecewise linear functions.

The fifth series of computational experiments. In this series, we used the functional with the first term (4.4); the number of neurons N is constant, and the basis functions are the splines (4.7). Errors were estimated after 1000 steps of the RProp method.

Table 4.10 Evaluation of the quality of the neural network model when training the entire network at once using basic splines.

                              M = M1 = 10                 M = M1 = 100
N    dt            10⁴J1        10⁴J2/4       10⁴J1         10⁴J2/4
10   0    Min      8.44614      0.035921      4.65418       0.031536
          Average  12.2981      0.041431      6.211137      0.036267
     0.5  Min      14.2969      0.030122      6.18003       0.028138
          Average  20.48641     0.035416      8.555389      0.034856
     1    Min      52.1881      0.023738      6.66204       0.028138
          Average  118.5783     0.029391      13.29541      0.033969
30   0    Min      17.021       0.028611      5.79705       0.026416
          Average  23.04674     0.039657      7.117371      0.031264
     0.5  Min      22.2959      0.028991      6.41039       0.026713
          Average  32.70329     0.033974      8.044745      0.02896
     1    Min      93.326       0.024032      9.58485       0.023037
          Average  146.1737     0.02612       13.49212      0.02471
100  0    Min      45.7117      0.032815      19.8224       0.025768
          Average  71.71628     0.049847      24.76891      0.030875
     0.5  Min      88.504       0.034962      26.1421       0.024986
          Average  103.179      0.045688      30.47873      0.02912
     1    Min      172.488      0.022387      42.3532       0.01933
          Average  355.4236     0.025813      52.42333      0.021555

A detailed analysis of Table 4.10 allows us to draw conclusions similar to those for Table 4.6. However, the significant increase in the error compared with the case of Gaussian basis functions does not allow us to recommend the spline basis for this problem. Computational experiments performed for other tasks have shown that similar conclusions hold for them as well. Below are just some of the most interesting results.

4.2.3 The Laplace equation in the L-region

We will look for a solution u = u(x, y) of the two-dimensional Laplace equation Δu = 0 in the region L: 0 < x, y < a, min(x, y) < d < a, which is the union of two rectangles Π₁: 0 < x < a, 0 < y < d and Π₂: 0 < x < d, 0 < y < a; on the parts {Γ_k}, k = 1, …, 6, of the boundary of the region, the solution satisfies the Dirichlet conditions u|_{Γ_k} = f_k (Fig. 4.24).


Fig. 4.24 The region in which the solution is sought.

The solution of this model problem for the L-domain can be found explicitly through solutions in the canonical subdomains Π₁, Π₂, obtained, for example, using the method of separation of variables. For the selected boundary conditions f₁(x) = sin(πx/a), f₂(y) = sin(πy/a), f₃ = f₄ = f₅ = f₆ = 0, this solution ũ, continued by zero to the square Q_{d,a} = (d; a) × (d; a), has the following form:

ũ(x, y) = sin(πx/a)·[sh(π(d − y)/a)/(2 sh(πd/a))]·(1 + sign(d − y)) + sin(πy/a)·[sh(π(d − x)/a)/(2 sh(πd/a))]·(1 + sign(d − x)).
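A small utility, assuming the explicit solution in the form reconstructed above, for evaluating ũ(x, y); it can serve as the reference against which the neural network approximations of this section are compared. The values of a and d are illustrative and are not taken from the text.

import numpy as np

def u_exact(x, y, a=1.0, d=0.5):          # a and d are assumed values for the demonstration
    s = 2.0 * np.sinh(np.pi * d / a)
    term1 = np.sin(np.pi * x / a) * np.sinh(np.pi * (d - y) / a) / s * (1.0 + np.sign(d - y))
    term2 = np.sin(np.pi * y / a) * np.sinh(np.pi * (d - x) / a) / s * (1.0 + np.sign(d - x))
    return term1 + term2

print(u_exact(0.25, 0.75))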

In solving this problem, we developed and then tested two types of algorithms. The first type of algorithm is associated with the application of fixed-size neural networks; there were two such algorithms. In the first of them, an approximate solution of the problem was found using a single neural network built from "elliptic" exponents. Network training (adjustment of its weights) was carried out according to the cloud method (see Section 3.2). The accuracy of the approximate solution u_n and the training time of the neural network increase with the number of neurons n; for n = 128 the maximum error does not exceed 0.1.

In the second algorithm based on a fixed-size network, we use the possibility of decomposing the original domain: for the model problem, we represent the region L as a union of the rectangular regions Π₁ and Π₂ with the non-empty intersection Q_{0,d}. The type of the equation and the structure of the region allow the problem to be parallelized using the following modification of the well-known Schwarz method (a schematic sketch of this iteration is given at the end of this section):
(1) Just as was done in the first algorithm for the entire complex region L, in each of the subregions Π₁ and Π₂ we construct its own neural network approximation of the solution, u1N1 and u2N2, respectively, using the boundary conditions only on the part of the boundary where they are given to specify the corresponding error functionals J1 and J2; the sets of test points on the boundary are taken only where the boundary conditions are known.
(2) After a certain number of stages of training each of the neural networks, approximations arise for the unknown part of the boundary conditions on the boundaries of each subregion.
(3) Data are exchanged: additional terms are introduced into each of the error functionals using the information about the solution on that part of the boundary of the subregions Π₁ and Π₂ on which no conditions were given (the classical version of the Schwarz method in the neural network interpretation); this information is the solution built on the other subregion.
(4) The calculation procedure is repeated a specified number of times or until the required accuracy is reached.

The following modification of the algorithm seems to be more interesting: in step 3, information on the mismatch of the solutions u1N1 and u2N2 in Q_{0,d} = Π₁ ∩ Π₂ (which gives a smoother junction) is introduced into the error functionals J1 and J2. We used just this approach in the numerical experiment. When using two networks with the numbers of neurons N1 = N2 = 32, the maximum error does not exceed 0.1.

In addition to these two algorithms of the first type, we also considered algorithms of the second type, which implemented three evolutionary approaches; similar approaches are described in Section 3.1. We obtained the best results using an approach based on the training of an ensemble of two networks. This approach allowed achieving an accuracy of 0.1 when using networks with 12 neurons.
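A schematic sketch (not the authors' implementation) of the modified Schwarz iteration described above: two networks are trained on the overlapping rectangles, and an overlap-mismatch term is added to each error functional at every stage. The training routine, the base functionals, the mismatch measure, and the weight kappa are all placeholders.

def schwarz_training(net1, net2, train_stage, J1, J2, mismatch, n_outer=10, kappa=1.0):
    """train_stage(net, loss_fn) performs one stage of training and returns the updated network;
    mismatch(n1, n2) measures the disagreement of the two networks on the overlap Q_{0,d}."""
    for _ in range(n_outer):
        net1 = train_stage(net1, lambda n: J1(n) + kappa * mismatch(n, net2))
        net2 = train_stage(net2, lambda n: J2(n) + kappa * mismatch(net1, n))
    return net1, net2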

4.2.4 The Poisson problem

When solving the Poisson problem from Section 1.2.2, the following parameters were chosen for the numerical calculations: Ω: x² + y² < 1, D: (x − x₀)² + (y − y₀)² < r², x₀ = 0.4, y₀ = 0, r = 0.4; g = A = 10 for (x, y) ∈ D and g = 0 for (x, y) ∈ Ω∖D. We performed comparative testing of two approaches. In the first approach, we constructed a solution using a single neural network composed of 20 linear elements with coefficients that are single-layer perceptrons with the activation function γ(s) = tanh(s), or RBF-functions in the form of Gaussian packets. In the second approach, we used two networks of 10 neurons, the first for the subdomain D and the second for the subdomain Ω∖D. The learning outcomes are similar, but in the second case a given level of learning is achieved faster. The calculation of the solution of this problem on the basis of the finite element method using a standard FEM package yielded the same results. The results of the calculations showed that a single neural network does not adequately describe the piecewise nature of the solution in the case of a violation of smoothness (in this case, the second-order derivatives have a discontinuity). This fact was noted earlier when comparing the two neural network approaches in the general situation. A detailed description of the results is given in [8].

4.2.5 Schrödinger equation with a piecewise potential (quantum dot)

In this section, we present the results of computational experiments on the problem of modeling a quantum dot in the one-dimensional and two-dimensional cases, stated in Section 1.2.3. In the case of the one-dimensional problem, the ordinary differential equation (1.24) for each of the subdomains Ω₁ = [−d; d], Ω₂ = [−m, −d] ∪ [d, m] is solved explicitly. The matching conditions at the junction and the requirement that the solution decrease as the spatial variable x → ±m lead to a transcendental relation for the values of the spectral parameter λ characterizing the bound states. In the case of the potential q₁ = 0 and a finite potential q₂, we have a finite number of such values of the parameter λ. The eigenfunctions are also computed explicitly in piecewise analytic form. The calculations were performed for d = 10 nm, m = 40 nm and the following parameter values: K₁ = 0.8503, E₁ = 0.42, Δ₁ = 0.48; K₂ = 0.8878, E₂ = 1.52, Δ₂ = 0.34, with the potentials q₁ = 0, q₂ = 0.7. We present the corresponding results; they will allow evaluating the quality of the neural network solution of the problem. We seek the wave functions in the piecewise form

u(x) = C exp(√((q₂ − λ)/p₂)·x) + D exp(−√((q₂ − λ)/p₂)·x), x ∈ (−∞; −d);
u(x) = A cos(√(λ/p₁)·x) + B sin(√(λ/p₁)·x), x ∈ [−d; d];
u(x) = E exp(√((q₂ − λ)/p₂)·x) + F exp(−√((q₂ − λ)/p₂)·x), x ∈ (d; +∞).

First, we consider even solutions u = u₊(x) = u₊(−x). The decay conditions of the solution at infinity, parity, and matching at x = d lead to the relations

A cos(√(λ/p₁)·d) = C exp(−√((q₂ − λ)/p₂)·d),
A p₁ √(λ/p₁) sin(√(λ/p₁)·d) = C p₂ √((q₂ − λ)/p₂) exp(−√((q₂ − λ)/p₂)·d),
B = D = E = 0, F = C.

Nontrivial even solutions of the problem (A ≠ 0, C ≠ 0) exist only for some specific values of the parameter belonging to the spectrum of the problem; these values are found from the condition

det [ cos(√(λ/p₁)·d),  exp(−√((q₂ − λ)/p₂)·d);  p₁√(λ/p₁) sin(√(λ/p₁)·d),  p₂√((q₂ − λ)/p₂) exp(−√((q₂ − λ)/p₂)·d) ] = 0.

The last relation can be rewritten as

tg(√(λ/p₁(λ))·d) = (p₂(λ)/p₁(λ))·√((q₂ − λ)/λ).

This transcendental equation for the parameter λ has three different real roots λ₀ < λ₂ < λ₄ for the given parameters:

λ₀ = 0.03333135654541547, λ₂ = 0.23125766492176728, λ₄ = 0.4896336310029232.

Odd solutions u = u₋(x) = −u₋(−x) are treated similarly. The same requirements on the solution at infinity, in combination with the conditions of oddness and matching at the junction of the subdomains at x = d, lead to the system of equations

B sin(√(λ/p₁)·d) = C exp(−√((q₂ − λ)/p₂)·d),
B p₁ √(λ/p₁) cos(√(λ/p₁)·d) = −C p₂ √((q₂ − λ)/p₂) exp(−√((q₂ − λ)/p₂)·d),
A = D = E = 0, F = −C.


The points of the spectrum corresponding to antisymmetric wave functions are determined from the appropriate characteristic equation

det [ sin(√(λ/p₁)·d),  −exp(−√((q₂ − λ)/p₂)·d);  p₁√(λ/p₁) cos(√(λ/p₁)·d),  p₂√((q₂ − λ)/p₂) exp(−√((q₂ − λ)/p₂)·d) ] = 0,

which can also be represented as

tg(√(λ/p₁(λ))·d) = −(p₁(λ)/p₂(λ))·√(λ/(q₂ − λ)).

With the chosen values of the parameters, this equation also has three different real roots λ₁ < λ₃ < λ₅:

λ₁ = 0.1183375658839791, λ₃ = 0.3575647706695449, λ₅ = 0.6214300827986622.

Thus, in the energy spectrum σ of stable states there is an alternation of values λ₀ < λ₁ < λ₂ < λ₃ < λ₄ < λ₅ corresponding to states with symmetric and antisymmetric wave functions: 0.03333135654541547, 0.1183375658839791, 0.23125766492176728, 0.3575647706695449, 0.4896336310029232, 0.6214300827986622. Here is a graph of the wave function u₊(x) for the smallest eigenvalue λ₀ = 0.03333135654541547 (Fig. 4.25).

Fig. 4.25 The exact solution graph for the smallest eigenvalue.
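A sketch of how the spectral values λ can be extracted from transcendental relations of the kind written above by bracketing sign changes and bisection. The functions p1(λ) and p2(λ) are placeholder constants here, not the book's coefficients; their actual dependence on λ (through K, E, Δ) is defined in Section 1.2.3, so the roots printed by this sketch will not reproduce the values quoted in the text.

import numpy as np

d, q2 = 10.0, 0.7
p1 = lambda lam: 0.8503     # placeholder constant, NOT the book's coefficient p1(lambda)
p2 = lambda lam: 0.8878     # placeholder constant, NOT the book's coefficient p2(lambda)

def even_eq(lam):
    return np.tan(np.sqrt(lam / p1(lam)) * d) - (p2(lam) / p1(lam)) * np.sqrt((q2 - lam) / lam)

def bisect(f, a, b):
    fa = f(a)
    for _ in range(200):
        m = 0.5 * (a + b)
        if f(m) * fa <= 0:
            b = m
        else:
            a, fa = m, f(m)
    return 0.5 * (a + b)

grid = np.linspace(1e-4, q2 - 1e-4, 4000)
vals = np.array([even_eq(g) for g in grid])
cand = [bisect(even_eq, x0, x1)
        for x0, x1, v0, v1 in zip(grid[:-1], grid[1:], vals[:-1], vals[1:]) if v0 * v1 < 0]
roots = [r for r in cand if abs(even_eq(r)) < 1e-6]   # drop sign changes caused by the poles of tan
print(roots)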


In order to find the wave function u within the neural network approach, a system of neural networks is used, approximating it piecewise in each of the subregions Ωⱼ with an RBF-network:

uⱼ(x) = Σ_{i=1}^{Nⱼ} c_{ij} exp{−a_{ij}(x − b_{ij})²}, j = 1, 2.

We found the network weights in the process of training based on minimizing the error functional (1.25). In this simple case, the functional (1.25) can also be computed explicitly, which greatly facilitates the process of training the networks; due to the bulkiness of this expression, it is not given here. With certain tricks mentioned above, namely the "smearing" of the matching conditions, we could manage with a single neural network, but only the second approach was involved in the numerical experiment. Calculations in the neural network version were also carried out for d = 10 nm, m = 40 nm and the following parameter values: K₁ = 0.8503, E₁ = 0.42, Δ₁ = 0.48; K₂ = 0.8878, E₂ = 1.52, Δ₂ = 0.34 and potentials q₁ = 0, q₂ = 0.7, which led to the six values of the spectral parameter λ (the energy values of the bound states) listed above. A set of two networks based on Gaussian packets was used with the following values of the parameters: N₁ = 16 elements for the region Ω₁ corresponding to the quantum dot, and N₂ = 8 elements for the matrix, the region Ω₂ surrounding the dot. The numerical experiment showed that the neural network approximation coincides well with the known exact solutions (Fig. 4.26).

Fig. 4.26 Neural network solution graph for the least eigenvalue.


Note that in the multidimensional situation, the neural network approach retains the advantages that it shows in the simplest one-dimensional situation. Fig. 4.27 shows a graph of the approximate neural network solution of the problem in the two-dimensional case for the minimum value of the spectral parameter (energy level). This approximation is constructed using a set of two neural networks based on Gaussian packets, with N₁ = 36 neuro elements for the region Ω₁ corresponding to the quantum dot and N₂ = 12 neuro elements for the matrix, the region Ω₂ surrounding the quantum dot. Numerical experiments showed good agreement of the approximations with exact solutions (in simple cases) and with solutions obtained by other methods [11, 12].

Fig. 4.27 Approximation of the neural network solution for λmin.

It should be noted that the positive results obtained at the initial stage of constructing a model of a nano-object give reason to apply the neural network method when creating a hierarchy of models for a wide range of related problems. The following studies may be of undoubted interest for these problems:
• study of various types of quantum dots and wires (other variants of the geometry of the region),
• consideration of another choice of the error functional (in particular, in connection with another type of dependence of the coefficients p, q on the spectral parameter λ),
• perturbations of the problem parameters,
• use of different neural network basis sets,
• comparison of one-network and two-network approximations of the solution of the problem,
• application and comparative analysis of evolutionary learning algorithms (adjustment of weights and selection of the structure of neural networks), leading to a decrease in the number of neural network functions used,
• consideration of the nonlinear spectral problem in the framework of the neural network approach,
• comparison of the results obtained on the basis of the neural network approach with calculations carried out using standard mathematical packages that allow modeling physical processes in regions with complex geometry,
• consideration of quantum dots in an external magnetic field.

4.2.6 Nonlinear Schrödinger equation

In this section, we discuss the results of computational experiments on solving the nonlinear Schrödinger equation (1.27) (see Section 1.2.4). We solved two types of problems.

In a problem of the first type, we looked for a solution of the Schrödinger equation in the circle Ω: x² + y² < 1, setting a condition on the boundary of the region (the circle). In the calculations, an RBF network of 25 Gaussian packets was used. As the right-hand side of Eq. (1.27), we used functions of two types of smoothness with support in a small circle D: (x − x₀)² + (y − y₀)² < r², x₀ = 0.4, y₀ = 0, r = 0.4, located inside the original circle Ω. One function is constant in the circle D and equals zero outside it. In this case, quite acceptable results are obtained if we exclude the neighborhood of the boundary of this step (or choose a special distribution law for the test points when training the ANN). The second function g is a smooth function with a smooth vertex:

g = 10[1 − ((x − x₀)² + (y − y₀)²)/r²]², (x, y) ∈ D; g = 0, (x, y) ∈ Ω∖D.

For obvious reasons, the results for such a function are much better.

In a problem of the second type, we looked for a solution of the equation in the entire plane, with boundedness or vanishing at infinity required as the boundary condition. The considered class of RBF-networks satisfies this condition automatically; moreover, the resulting functions tend to zero at infinity quickly enough. When training the neural network in this case, part of the test points was taken uniformly distributed in the neighborhood of the singularity point, and part normally distributed over the whole plane.


We also constructed an approximation of the solution on the entire plane in the case of a smooth right-hand side in the form of a Gaussian packet g(x, y) = A exp{−(x − x₀)² − (y − y₀)²}. For the calculations, we chose the parameters A = 100, x₀ = 0.4, y₀ = 0, kₓ = −1, k_y = 1, ω = 20, ν = 10. We trained a Gaussian RBF network of 10 elements. In this case, the error in satisfying Eq. (1.27) did not exceed 2%. Details can be found in [13].

4.2.7 Heat transfer in the tissue-vessels system

In this section, we discuss computational experiments on building a neural network solution of the problem whose formulation is given in Section 1.2.5. To estimate the quality of the neural network training algorithms, we compared the neural network solution with a model solution in explicit form. The following temperature field was used as such a solution:

T̄_v(x, y) = A + ΔA/2 − Q x_v (x − x_v) − (Q/2)(x − x_v)² + Ez,
T_v(x, y) = A + ΔA/2 − Q x_v (x − x_v) + (λ_v E/2)(x − x_v)² + Ez,
T_a(x, y) = A − ΔA/2 + Q(x_α − x_a)(x − x_a) + (λ_a E/2)(x − x_a)² + Ez,
T̄_a(x, y) = A − ΔA/2 + Q(x_α − x_a)(x − x_a) − (Q/2)(x − x_a)² + Ez,

where the following notations are introduced:

Q = q/(cρa), λ_v = u_v/a, λ_a = u_a/a, x₀ = x_va,
E = Q[x_v + (x_α − x_a)] / [λ_v(x₀ − x_v) − λ_a(x₀ − x_a)],
ΔA = Q[x_v(x₀ − x_v) + (x_α − x_a)(x₀ − x_a)] + (E/2)[λ_a(x₀ − x_a)² − λ_v(x₀ − x_v)²].

Neurocomputing results showed that, for example, in the case of the parameter values A = 0, Q = 0.128, λ_v = 93.75, λ_a = 83.75, x_v = 0.85, x₀ = 1, x_a = 1.15, x_α = 2, z_α = 4.5, the mismatch between the model solution and its neural network approximation is less than 5%. Numerical calculations have shown that the neural network approximation correctly reproduces the qualitative behavior of the solution of the problem in both the plane and the spatial case.


Fig. 4.28 Laplacian solutions in the neighborhood of vessels (spatial problem).

Fig. 4.28 illustrates the experimentally observed nonzero value of the Laplacian of the temperature field in the vessels and in the neighborhood of the vessels on the lower boundary of the region Π, which corresponds to the formulation of the problem. Similar results were obtained for vessels with a more realistic curved shape and for vessels with parietal plaques. Details can be found in [12].

4.3 Solving problems for partial differential equations for domains with variable boundaries

4.3.1 Stefan problem

In this section, we present some results of calculations for the Stefan problem formulated in Section 1.3.1. For the model problem implementation, we chose the following parameter values: a₊ = 1.2, a₋ = 1, T = 3, k₊ = 1.2, k₋ = 1, q = 1, boundary conditions φ = t − 1, ψ = −1, and initial condition u₀ = −1. The first approach, based on a single neural network, gave fairly good results. When we selected a neural network of 10 linear elements with coefficients in the form of a single-layer perceptron with the activation function tanh(), the maximum error was 5%. For a neural network of another architecture, for example, an RBF network composed of 50 Gaussian packets, we obtained a maximum error of 9%. As expected, networks created on the basis of perceptrons are easier to train and allow us to obtain better


approximate solutions of problems with discontinuous coefficients than smooth RBF networks. Increasing the number of neurons in the approximating neural network allows us to increase the accuracy but significantly increases the training time. When we applied the second approach, in which separate neural networks are used to approximate the temperature field for each phase and the phase transition line, the results did not improve significantly. An itemized presentation of the results of these computational experiments is given in [9].

4.3.2 The problem of the variable pressure calibrator

In this section, we present some results of solving the problem of modeling the variable pressure calibrator, defined in Section 1.3.2. The essential difference between this and the previous problem is that we are looking for the unknown boundary based on an optimality condition. The process of neural network training (fitting the parameters to minimize the functionals I and J) is organized as follows (a schematic sketch of this loop is given after the list):
(1) The initial state of the boundary Γ̃ is defined as the output of the perceptron with such an initialization of the coefficients (weights) that the boundary is close to a horizontal line.
(2) The centers of the RBF network are initially distributed randomly (according to a uniform or other probability distribution) or regularly (according to a certain rule) in a rectangle Π ⊇ Ω (or in some neighborhood of it). The coefficients c_i and a_i are considered independent, uniformly distributed random parameters.
(3) We select test points on the boundary (at a given distance from each other) and inside the region (distributed randomly and uniformly) and calculate the functional J.
(4) Several steps of an iterative method are taken to minimize the functional J, which leads to a variation of the coefficients (weights) of the RBF network. In our study, we successfully applied the random search method.
(5) The perceptron weights and the corresponding part of the boundary Γ̃ change while minimizing the functional I.
(6) Random test points inside the region are generated anew, taking into account the changes in the boundary.
(7) Steps 5 and 6 of the iterative process are repeated a specified number of times or until the required accuracy is achieved: the functional J becomes less than a certain preassigned value, and the change in the functional I is quite small.
In order to implement this algorithm, a program was written in the C++ programming language.
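The following minimal Python sketch (not the code used in the study) illustrates the alternating structure of this loop. The functionals J and I, the network sizes, and the rectangle Π are illustrative stand-ins only: J is a toy residual functional, I a toy boundary-optimality functional, and "random search" is a simple accept-if-better perturbation.

```python
# Illustrative sketch of the alternating training loop (steps 1-7 above).
# All functionals, sizes, and the rectangle Pi = [0,1]^2 are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)

n_rbf = 20
c  = rng.uniform(-1, 1, n_rbf)            # RBF output weights c_i
a  = rng.uniform(1, 5, n_rbf)             # RBF widths a_i
xc = rng.uniform(0, 1, (n_rbf, 2))        # RBF centers spread over Pi = [0,1]^2  (step 2)

pw = rng.normal(0, 0.1, 5)                # boundary perceptron: Gamma(x) = 0.5 + sum_j pv_j tanh(pw_j x + pb_j)
pb = rng.normal(0, 0.1, 5)
pv = np.zeros(5)                          # step 1: zero output weights -> boundary close to y = 0.5

def gamma(x, pv):
    return 0.5 + np.tanh(np.outer(x, pw) + pb) @ pv

def u(pts, c):
    d2 = ((pts[:, None, :] - xc[None, :, :]) ** 2).sum(-1)
    return np.exp(-a * d2) @ c

def J(c, pv):                             # stand-in residual functional: u = 1 inside, u = 0 on Gamma
    xb = np.linspace(0, 1, 30)
    on_b = np.column_stack([xb, gamma(xb, pv)])
    inside = rng.uniform(0, 1, (200, 2))
    inside = inside[inside[:, 1] < gamma(inside[:, 0], pv)]   # steps 3/6: points under the boundary
    return np.mean(u(on_b, c) ** 2) + np.mean((u(inside, c) - 1.0) ** 2)

def I(c, pv):                             # stand-in optimality functional of the boundary
    xb = np.linspace(0, 1, 30)
    return np.var(u(np.column_stack([xb, gamma(xb, pv)]), c))

for outer in range(30):                                   # step 7: outer iterations
    for _ in range(100):                                  # step 4: random search on the RBF weights
        dc = rng.normal(0, 0.05, n_rbf)
        if J(c + dc, pv) < J(c, pv):
            c = c + dc
    dpv = rng.normal(0, 0.02, 5)                          # step 5: perturb the boundary perceptron
    if I(c, pv + dpv) < I(c, pv):
        pv = pv + dpv
    # step 6: the random "inside" test points are regenerated at the next call of J

print("final J =", J(c, pv), " final I =", I(c, pv))
```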


In the publication [13], we presented the results of calculations for the following parameter values: M = 1536, M̃ = M₀ = 100, m_G = 8; δ₁ = 0.5, δ_i = 1, i = 2, …, 6. The neural network approach to solving the described problem has the following obvious advantages: noise immunity, adaptation facilities, and applicability to nonlinear, non-classical problems. Noise immunity is the stability of solutions under small changes in the input data and boundary conditions and under instability of the medium properties. The possibility of adaptation means that we need not train the network anew; it is enough to use an already trained network for sufficiently close input data or to upgrade the network to the required accuracy (post-train the neural network if necessary). So, for example, one can implement the following actions:
• clarify the model of the measuring system, using the system of acoustic equations for calculating the wave field in the working chamber, not limited to the linear approximation;
• improve the model, refining its structure and coefficients based on a series of known approximate experimental data in accordance with the ideas outlined in Sections 3.3 and 4.4, in particular, solving the ill-posed identification problem with the zero-free optical method for measuring the parameters of the medium filling the working chamber of the variable pressure installation;
• modify the permissible boundary operator, taking into account the interaction of waves with the chamber walls, and investigate the effect of the sensor on the wave field in the working chamber;
• consider a working chamber with complex geometry without symmetry, systems of composite resonators with acoustically interacting elements, and take into account the effects (influence of bubbles, gaseous media, acoustic flows, etc.) arising in the working chamber of the installation.
The study of each of these problems with the help of traditional approaches runs into intractable questions.

4.4 Solving inverse and other ill-posed problems

In this section, we present the results of solving a range of problems that are particularly relevant for practice. The fact is that when modeling real objects, it is difficult to obtain accurate data specifying the processes taking place in them and to formulate the problem as a classical correctly posed one, including a differential equation (or equations) and boundary conditions. Usually, equations


and (or) conditions are known only inaccurately or incompletely, but there are results of observations. In this section, we show that our methods, outlined in the previous chapters, allow us to construct a completely adequate model based on such data.

4.4.1 Comparison of neural network and classical approaches to the problem of identification of migration processes

In this section, we consider the results of solving the inverse problem of modeling migration flows, formulated in Section 1.4.1. We carried out a comparative analysis of the classical approach (based on quantization and linear regression) and the approach based on neural networks. We considered two migration models from [14]. The first model has the form of a system of two nonlinear differential equations (1.29). The second model (1.30) is obtained from the first one by linearization in the neighborhood of the zero equilibrium position. This approach is often used in solving applied problems, but away from the equilibrium position the accuracy of the results leaves much to be desired, as we will show later. Based on the second model, a forecast of migration dynamics was constructed in [14]. In order to construct such a forecast, it is necessary to determine the coefficients of the models from the available data. Consider three methods for identifying the coefficients of the models under consideration.
The first method involves solving the system (1.30) analytically and obtaining the coefficients for the best fit between the solution and the data. Obviously, this method allows one to obtain an arbitrarily accurate solution of the problem (1.30) in the presence of a sufficient amount of data of appropriate accuracy, but it does not allow approaching the solution of system (1.29) in a situation where its solutions are far from the exact solutions of system (1.30). Also, this method is much more time consuming than the second, so it was not tested.
The second method is based on the quantization of system (1.30). The quantization is the segmentation of the time interval [0; T] into equal intervals and the replacement of the derivative by a difference quotient. Such a conversion transforms the system (1.30) into a linear system of two equations, whose coefficients can be determined by the formulas for a two-dimensional linear regression. The values {x(i), y(i)} at the sample points are considered known. In the test calculations, we took them from the exact solution of the


system. In modeling real processes, they must be taken from observations.
The third method involves a neural network. As the neural network, we used a perceptron with one hidden layer. To create a data set for neural network training, we solved the system (1.29) numerically for a sufficiently large set of parameters {k, k₁, k₂}. Next, we trained a neural network to determine each of them from the data using the functionals given in Section 1.4.1. In this case, the calculation of the coefficients {k, k₁, k₂} from the observations amounts to the substitution of the observational data into the neural network.
We present some results of comparative testing of the second and third methods. The forecast period was taken five times longer than the period for which the model was developed. We present the results of two series of computational experiments with different time intervals.
In the first series, we consider the case when the period for which the model is being developed is the interval [0; 1]. The time interval over which the forecast is carried out is the segment [0; 5]. We present the results of applying the second and third methods for the coefficient values k = 1.2; k₁ = −0.5; k₂ = 0.5.
(1) The second method (linear regression). With three sampling points (data are used at times 0, 0.5, and 1), we got a strong discrepancy between the calculated model and the theoretical one (the maximum error on the time interval [0; 5] was 15%), but the shape of the graph is not distorted [15]. With an increase in the number of input points to 10 in the neighborhood of the origin, we have a more accurate approximation, but the error is still large (5%). Further, by increasing the number of sampling points to 100, we achieve excellent accuracy: the error does not exceed 1%.
(2) The method of neural network modeling. Networks were trained on a set of 300 randomly selected, uniformly distributed parameter vectors from the region k ∈ [0; 2], k₂ ∈ [0; 1], k₁ ∈ [−1; 0]. To begin with, we chose the number of neurons equal to five. Such a neural network model, constructed on three time points from the segment [0; 1] for the system (1.30) (data for the time points 0, 0.5, and 1 are used), reflects the main trend; however, there is a significant error (up to 50%). If we use the system (1.29), that is, the nonlinear model, the accuracy is much better (the error is no more than 7%). This higher level of accuracy is an advantage of neural networks: the nonlinear model, which is more accurate, allows one to obtain a better approximation. By increasing the number of neurons to 15, good accuracy is achieved (the error does not exceed 4%) for the case of


a nonlinear model, especially in the neighborhood of the origin (see Ref. [15]). Increasing the size of the neural network to 25 neurons and using the linear system (1.30), we saw that it approximates the theoretical model very well. The result is comparable to the result of the linear regression method with one hundred points, i.e., the error does not exceed 1% (see Ref. [15]).
In the second series of computational experiments, we used the time interval [0; 10] to construct the model. The coefficient values used for testing, k = 1.2; k₁ = −0.5; k₂ = 0.5, remained the same.
(1) The second method (linear regression). We begin with the case of 100 sampling points distributed on the interval [0; 10] with a step of 0.1. In this case, the error becomes very large (up to 30%), which means that with 100 points it is not possible to predict the migration trend well for a sufficiently long period of time (the forecast period [0; 50] was considered), and an increase in the number of points is necessary to achieve an acceptable approximation. With an increase in the number of points to 1000, the accuracy became rather high (an error of no more than 5%). However, in such problems, it is difficult to obtain such a large amount of input data.
(2) The method of neural network modeling. The model is constructed on 3 points (using data at times 0, 5, and 10). Consider the case of a neural network consisting of 15 neurons; the system (1.29) was used to build the network. The accuracy of the approximation decreases compared with the first series of computational experiments, but at the same time, the neural network model reproduces well the shape of the curve of the theoretical model (an error of no more than 15%). Let us increase the size of the neural network to 25 neurons. For the case of the linear model, the result is several times better than the linear regression obtained on a hundred points, but worse than on a thousand points; the error is quite large, but this is not so important, since the linear model itself has significant inaccuracy (it is worse than the nonlinear model (1.29) by three orders of magnitude). When applying the nonlinear model, even with the increase of the period to [0; 10] in comparison with the interval [0; 1] of the first series of computational experiments, we get a good approximation, and in all areas the error does not exceed 2% (see Ref. [15]).
Based on the results of the computational experiments, we can draw several conclusions:


(1) Neural networks with a sufficient number of neurons (in this problem, greater than or equal to 25) make it possible to restore the model well. In order to achieve the same accuracy using linear regression, a large amount of input data is needed. Also, the regression method causes significant difficulties in the case of the nonlinear model, which is more accurate.
(2) Neural networks can work both with the linear model corresponding to the system (1.30) and with the nonlinear model corresponding to the system (1.29). The accuracy of the approximation depends significantly on the number of neurons and only weakly on the amount of input data. This property gives neural networks a great advantage over the linear regression method because real demographic data are usually known only at fairly large intervals.
As noted earlier, neural networks have several significant advantages compared with other methods of approximation and forecasting: universality, accuracy, and the ability to learn. Neural networks are already a universal method for solving many complex applied problems; these results only confirm this. Details on this problem can be found in [15].
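A minimal sketch of the baseline "quantization plus linear regression" identification (the second method above) is given below. The true model (1.30) is not reproduced here; a generic linear two-component system with hypothetical coefficients is assumed purely for illustration, and the sampled trajectory plays the role of the observations {x(i), y(i)}.

```python
# Sketch of the second method: replace the derivative by a difference quotient and
# recover the coefficients of an assumed linear system z' = A z by least squares.
import numpy as np
from scipy.linalg import expm

A_true = np.array([[1.2, -0.5],
                   [0.5, -0.3]])                 # hypothetical coefficients to recover

T, n = 1.0, 100
t = np.linspace(0.0, T, n + 1)
z0 = np.array([1.0, 0.5])
Z = np.array([expm(A_true * ti) @ z0 for ti in t])   # exact trajectory, shape (n+1, 2)

# Quantization: (Z[k+1] - Z[k]) / h  ~  A @ Z[k], fitted by linear regression.
h = t[1] - t[0]
dZ = (Z[1:] - Z[:-1]) / h
A_est, *_ = np.linalg.lstsq(Z[:-1], dZ, rcond=None)  # solves Z[:-1] @ A^T ~ dZ
A_est = A_est.T

print("recovered coefficients:\n", A_est)
```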

4.4.2 The problem of the recovery of solutions of the Laplace equation from measurements

This section contains the results of calculations for the problem whose formulation we presented in Section 1.4.2. This is the problem of finding a function for which an equation is known only in a part of the region, while its values at a certain set of points were obtained (for example, as a result of measurements); it is an example of the problem of recovering a model of a real object from partial information, which is often encountered in practical applications. Recall that the specific problem is that a function u(x), x ∈ ℝ^p, is sought in a region Ω = Ω₁ ∪ Ω₂, satisfying a differential equation for x ∈ Ω₂, and its values are known at a certain set of points from the sub-domains Ω₁ and Ω₂. We looked for the neural network approximation of the solution in the form of an RBF network of Gaussian functions

$$u(x) = \sum_{i=1}^{N} c_i \exp\left\{-a_i \left\lVert x - x_i^0 \right\rVert^2\right\},$$

where ‖·‖ is the Euclidean norm in ℝ^p. For the calculations, we chose as the region the circle Ω: x² + y² < 1 and the sub-domains Ω₁: x² + y² < 1, x < 0, and Ω₂: x² + y² < 1, x > 0. We assumed that the "measurable"


data {z_j} substituting the boundary conditions are known with an error that is a random variable uniformly distributed over the interval [−ε; ε]. For the computational experiments, we chose ε = 0.1. As a test, we took the function u = xy. If, as the initial requirements for a solution, in addition to satisfying the Laplace equation in the semicircle Ω₂, we set the solution values with random error ε at m₂ = 3 random points in Ω₂ and take the solution values with the same error at m₁ = 7 random points in the semicircle Ω₁, then the network allows us to find a solution with the same mean square error ε. At the same time, no boundary conditions are specified! These results are given for a Gaussian RBF network of 30 functions and the following parameter values: the number of test points M = 50 and the penalty factor δ = 100.
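A minimal sketch (not the authors' code) of the error functional used in such a recovery is shown below. The Laplacian of a Gaussian RBF network is available in closed form, so the functional combines equation residuals at test points in Ω₂ with a penalized mismatch to the noisy point "measurements"; the optimizer, point counts, and penalty value are illustrative choices.

```python
# Sketch: recover u from the Laplace equation in Omega_2 plus noisy point values,
# using a Gaussian RBF network; all sizes and the optimizer are illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, M, delta, eps = 30, 50, 100.0, 0.1

def rbf_and_laplacian(p, pts):
    """Values and Laplacians of u = sum_i c_i exp(-a_i ||x - x0_i||^2) at 2D points."""
    c, a, x0 = p[:N], np.exp(p[N:2*N]), p[2*N:].reshape(N, 2)   # a_i > 0 via exp
    d2 = ((pts[:, None, :] - x0[None, :, :]) ** 2).sum(-1)
    e = np.exp(-a * d2)
    u = e @ c
    lap = (e * (4 * a**2 * d2 - 4 * a)) @ c                      # Laplacian of each Gaussian
    return u, lap

def sample_disk(n, half=None):
    pts = rng.uniform(-1, 1, (8 * n, 2))
    pts = pts[(pts ** 2).sum(1) < 1]
    if half == "left":  pts = pts[pts[:, 0] < 0]
    if half == "right": pts = pts[pts[:, 0] > 0]
    return pts[:n]

u_true = lambda p: p[:, 0] * p[:, 1]                             # test harmonic function u = xy
meas_pts = np.vstack([sample_disk(7, "left"), sample_disk(3, "right")])
meas_val = u_true(meas_pts) + rng.uniform(-eps, eps, len(meas_pts))
test_pts = sample_disk(M, "right")                               # collocation points in Omega_2

def functional(p):
    _, lap = rbf_and_laplacian(p, test_pts)
    u_m, _ = rbf_and_laplacian(p, meas_pts)
    return np.mean(lap ** 2) + delta * np.mean((u_m - meas_val) ** 2)

p0 = np.concatenate([rng.normal(0, 0.1, N), np.zeros(N), rng.uniform(-1, 1, 2 * N)])
res = minimize(functional, p0, method="L-BFGS-B", options={"maxiter": 300})
print("functional value:", res.fun)
```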

4.4.3 Problem for the heat conduction equation with time reversal

This section discusses the classic ill-posed problem set in Section 1.4.3. The problem is to restore the initial temperature distribution of a rod of unit length from the temperature distribution measured at a specified time T, which was chosen equal to 1. The function sin(πx) exp(−π²t) was chosen as the model solution. A stable approximation in such a formulation cannot be obtained without invoking additional information about the solution. We present the results of two ways of attracting such information.
The first method consists of adding to the functional J₃ one or several terms corresponding to the initial condition. The computational experiment showed that it is enough to use one value, i.e., it is assumed that we know the initial condition at one point. For network training, we used the method of growing networks with the addition of neurons one at a time and rejection of an unsuccessfully added one (when the error after training the entire network with the added neuron on the test set is larger than the error without it). As the nonlinear optimization algorithm, a combination of the cloud method and RProp was used (see Chapter 3). Let us give some results of calculations for the case when the number of test points at which the differential operator is calculated is N = 200, the number of test points in time at which the boundary conditions are checked is N_b = 50, and the number of points at which the condition is checked at the final time is N_d = 50.
In the first experiment, the number of attempts to add a neuron is 10 and the number of neurons is 11. In this case, the maximum error was 8%.


In the second experiment, the number of attempts to add a neuron is 20 and the number of neurons is 21. In this case, the maximum error was 2.5%. In the third experiment, the number of attempts to add a neuron is 50 and the number of neurons is 48. In this case, the maximum error was 1%.
The second method, instead of a known point in the initial condition, uses one or several random points inside the region (in effect, measurements at intermediate times). Below, five such points were used, the number of attempts to add a neuron is 1559, and the number of neurons is 51. In this case, the maximum error was 1%.
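A minimal sketch of the growing-network strategy mentioned above is shown below. Neurons are added one at a time, the whole network is retrained, and a newly added neuron is rejected if the error on a regenerated test set does not improve. The toy target function, optimizer, and sizes are illustrative assumptions; the real method minimizes the full error functional of the boundary value problem rather than a plain regression error.

```python
# Illustrative growing Gaussian RBF network with an add-and-reject loop.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
target = lambda x: np.sin(np.pi * x) * np.exp(-x)          # toy target to be approximated

def net(params, x):
    """Gaussian RBF network: params = [c_1, a_1, x0_1, c_2, a_2, x0_2, ...]."""
    c, a, x0 = params[0::3], params[1::3], params[2::3]
    return np.exp(-np.abs(a) * (x[:, None] - x0) ** 2) @ c

def train_error(params, x):
    return np.mean((net(params, x) - target(x)) ** 2)

params = np.array([])                                      # start from an empty network
for attempt in range(30):
    x_train = rng.uniform(0, 1, 100)
    candidate = np.concatenate([params, [rng.normal(), 5.0, rng.uniform(0, 1)]])
    candidate = minimize(train_error, candidate, args=(x_train,), method="BFGS").x
    x_test = rng.uniform(0, 1, 200)                        # regenerated test points
    old = train_error(params, x_test) if params.size else np.inf
    if train_error(candidate, x_test) < old:               # keep the neuron only if it helps
        params = candidate

print("neurons kept:", params.size // 3,
      "test RMSE:", np.sqrt(train_error(params, rng.uniform(0, 1, 500))))
```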

4.4.4 The problem of determining the boundary conditions

This section deals with the ill-posed problem set in Section 1.4.4. The problem is to determine the temperature distribution of a rod with a heat-insulated end according to its distribution at the initial moment, as well as the temperature value at the heat-insulated end. The function

$$\frac{1}{\sqrt{t+1}}\exp\left(-\frac{x^2}{4(t+1)}\right)$$

was chosen as the model solution. For training the network, the method of growing networks was used, with the addition of neurons one at a time and rejection of an unsuccessfully added one (when the error after training the entire network with the added neuron on the test set is larger than the error without it). As the nonlinear optimization algorithm, a combination of the cloud method and RProp was used (see Chapter 3).
We present some calculation results for the case when the number of test points at which the differential operator is calculated is N = 200, the number of test points in time at which the insulation condition is checked and the number of points at which the initial condition is checked is N_b = 20, and the number of points in time at which the required temperature is checked at the insulated end is N_d = 50.
If the error in setting the temperature at the insulated end is a random variable uniformly distributed in the interval [−0.001; 0.001], the number of attempts to add a neuron is 50, and the total number of neurons is 31, then we restore the boundary condition at the end at which it is unknown with an error not exceeding 0.03; the root-mean-square error in determining the solution was 0.003.
If, just as in the previous subsection, we take a point on the right border, the number of attempts to add a neuron is 50, and the number of neurons is 41, then the maximum error of restoring


the boundary condition decreased to 0.015, and the root-mean-square error in determining the solution was 0.0005. If we take three points on the right border, then with zero error in setting the experimental data, the number of attempts to add a neuron equal to 50, and the number of neurons equal to 41, the maximum error of restoring the boundary condition decreased to 0.005, and the root-mean-square error in determining the solution was 0.0001. If in the previous variant the required temperature was set with a random error uniformly distributed in the interval [−0.1; 0.1], then the boundary condition is restored with a corresponding error: the maximum error of restoring the boundary condition was 0.13, and the root-mean-square error in determining the solution was 0.02.
Instead of points on the border where a solution is sought, one can specify the solution at random points inside the region. Let five such points be given, the error of specifying the experimental data be zero, the number of attempts to add a neuron be 50, and the number of neurons be 37; then the maximum error of restoring the boundary condition was 0.03, and the root-mean-square error in determining the solution was 0.001. If in the previous variant we take 10 points and set the temperature with a random error uniformly distributed over the interval [−0.1; 0.1], the results are obtained with a corresponding error: the maximum error of restoring the boundary condition was 0.3, and the root-mean-square error in determining the solution was 0.03.
A modification of the setting was also considered in which, instead of the temperature at the insulated end, the temperature is set at some internal point of the rod. The results of the calculations are similar. Here are some of them for the case when the temperature is set in the middle of the rod, the error in specifying the experimental data (the solution at random points inside the region) is a random variable uniformly distributed on the interval [−0.1; 0.1], the number of attempts to add a neuron is 50, and the number of neurons is 38. We used the prior methods for determining the network structure and for nonlinear optimization. Under these conditions, the maximum error in the restoration of the boundary condition was 0.05, and the root-mean-square error in determining the solution was 0.012.
As the experimental data, we take the value of the desired function at a point on the right border, with its error a random variable uniformly distributed over the interval [−0.001; 0.001]. In this computational experiment, we choose the number of attempts equal to 50 and the number of neurons equal to 22. For these initial data, we obtained the following results: the maximum error of


restoring the boundary condition was 0.003, and the root-mean-square error in determining the solution was 0.0002.
As prospects for further research, the following are of undoubted interest:
• a comparative analysis of the use of various neural network functional bases (in particular, heterogeneous ones);
• consideration of non-standard neural network basis elements (for example, solutions generated by neural network decompositions of Cauchy data, etc.);
• application of evolutionary algorithms and natural problem parallelization for simultaneous adjustment of the weights and structure of neural networks;
• consideration of other approaches and their corresponding algorithms when setting up a neural network functional basis;
• study of other problem statements: other equations, other types of boundary conditions, the case of large dimensions, domains of complex form, recovery of coefficients, etc.
Similar constructions can be made to isolate the sets of solutions of integral, integro-differential, and other equations.

4.4.5 The problem of continuing the temperature field according to measurement data

This section presents the results of calculations for the spatially one-dimensional problem formulated in Section 1.4.5. As a model solution (defined in a similar formulation), the function u(x, t) = exp(−π²t) sin(πx) was used, the values of which were set with an error ε on the set of points {(x̃_k, t̃_k)}_{k=1}^p = P̃ ⊂ Ω = [0; 1] × [0; 1]. Cases of different numbers of "experimental" points, p = 50 and p = 24, as well as options for setting the "experimental data" with different accuracy, ε = 0.001, ε = 0.01, and ε = 0.1, were considered. As an approximate solution, a neural network of N = 36 "circular" Gaussian exponentials was considered ("elliptical" Gaussians were also used but did not yield any gain). The dense cloud method was used to train the network, which proved to be better in this problem than the modified polyhedron method (see Chapter 3). The radius of the cloud was ρ = 0.04. Different options were considered for initializing the original values of the parameters of the neural network to be trained. The result was as expected: the closer the original untrained network is to the desired solution, the faster (in a smaller number of learning periods) the approximate neural network solution of the problem


of this accuracy is built, but failure to choose a good initial approximation can be compensated by a large number of iterations; an increase in the number N of functions used increases both the number of iterations needed to achieve the prescribed accuracy and the time of each of them (networks of 49, 64, and 100 Gaussians were considered). A change in the accuracy of the "experimental data" assignment from ε = 0.001 to ε = 0.01 (and even to ε = 0.1) did not lead to a significant change in the quality of the constructed neural network solution.
As a result of computational experiments for the values N = 36, p = 50, ε = 0.01, the maximum error in restoring the initial condition and the solution was 13%. For N = 49, p = 24, ε = 0.1, the maximum error in restoring the initial condition and the solution was 48%. In this case, the general form of the initial solution is still tracked; it is restored with a root-mean-square error of less than 10%. The approximation thus constructed for the solution of the problem can be regarded as its regularization. An adjustment of the weights of the RBF network of N = 64 Gaussian neuro-elements after the fifth regeneration of test points resulted in a maximum error of 2.4%. Similar results for the RBF network of N = 64 neuro-elements with an activation function of the spline type

$$\varphi(t) = \begin{cases} 0, & t \in (-\infty; -2], \\ (t+2)^3/6, & t \in (-2; -1], \\ (-3t^3 - 6t^2 + 4)/6, & t \in (-1; 0], \\ (3t^3 - 6t^2 + 4)/6, & t \in (0; 1], \\ (2 - t)^3/6, & t \in (1; 2], \\ 0, & t \in (2; \infty) \end{cases} \tag{4.8}$$

are significantly worse: the maximum error was 26%. We note a significant dependence of the error on the number of neuro-elements in the network. Using too small a network leads to a big error, while using too large a network requires an unnecessarily lengthy learning procedure.
Some computational results for growing networks are given below. For artificial neural networks (ANN) based on Gaussian packets, with an increase in the number of neurons up to N = 80 and a degenerate (one-element) cloud, the maximum error was 2%.


If we increased the number of neurons up to N = 60, choosing a cloud of radius ρ = 0.01, then the maximum error was 10%. Similar results were obtained for a cloud with radius ρ = 0.005. When using the spline-based ANN, the error is significantly larger; the maximum error is 20%.
A more accurate result is provided by an approach that uses non-standard neural network basis elements, namely fundamental solutions generated by neural network expansions of the Cauchy data: for example, for Gaussian packets, these are neural network functions of the form

$$c\,\exp\left\{-\frac{\alpha (x - x_0)^2}{1 + 4\alpha t}\right\}\Big/\sqrt{1 + 4\alpha t},\qquad \alpha > 0.$$

So, when using fundamental solutions as basis functions, for N = 64 the maximum error is 4%. With this approach, the only terms in the error functional are those for the Cauchy data and the boundary conditions. The quality of the approximation with this particular approach is very good; still, its applicability is limited by the possibility of constructing fundamental solutions.
The complex of methods considered in this section can be applied to other ill-posed problems of this kind. Similar constructions can be made to isolate the solution sets of integral, integro-differential, and other equations, and the advantages of the neural network approach can be used in full in the case of problems with complex geometry, with nonlinearities, with piecewise data, when solving a series of problems with a refined problem statement, etc.
Generalization to the two-dimensional case in the spatial variables. As is known, with a given initial condition u(x, y, 0) = ϕ(x, y) and the chosen zero boundary condition, the analytical solution (obtained by the Fourier method of separation of variables) has the form

$$u_a(x, y, t) = \sum_{l=0}^{\infty}\sum_{m=0}^{\infty} A_{l,m}\, \sin(\pi l x)\, \sin(\pi m y)\, e^{-\pi^2 (l^2 + m^2) t},$$

where $A_{l,m} = 4\int_0^1\!\!\int_0^1 \phi(x, y)\, \sin(\pi l x)\, \sin(\pi m y)\, dx\, dy$. Further, this analytical solution is used to generate the "measurement data" and to compare them with the obtained approximate neural network solution. In this study, we selected the function ϕ(x, y) as ϕ(x, y) = A sin(πlx) sin(πmy) with some natural l, m and amplitude A. Then the analytical solution has the form

$$u_a(x, y, t) = A\, \sin(\pi l x)\, \sin(\pi m y)\, e^{-\pi^2 (l^2 + m^2) t}.$$
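A minimal sketch of generating such "measurement data" from this analytical solution is given below; the particular values of A, l, m, T, the number of points, and the error bounds are illustrative placeholders, not the values used in the experiments.

```python
# Sketch: generate "measured" values f_j from the analytical solution
# u_a(x, y, t) = A sin(pi*l*x) sin(pi*m*y) exp(-pi^2 (l^2 + m^2) t).
import numpy as np

rng = np.random.default_rng(3)
A, l, m, T = 0.3, 1, 1, 1.0
Nd, eps_l, eps_r = 100, 0.0, 0.01

def u_a(x, y, t):
    return A * np.sin(np.pi * l * x) * np.sin(np.pi * m * y) \
             * np.exp(-np.pi**2 * (l**2 + m**2) * t)

# Test points (x_j, y_j, t_j) uniform in Omega = (0,1) x (0,1) x (0,T),
# "measurement errors" uniform in [-eps_l; eps_r].
xj, yj = rng.uniform(0, 1, Nd), rng.uniform(0, 1, Nd)
tj = rng.uniform(0, T, Nd)
err = rng.uniform(-eps_l, eps_r, Nd)

fj_additive = u_a(xj, yj, tj) + err          # f_j = u_a(x_j, y_j, t_j) + eps
fj_relative = u_a(xj, yj, tj) * (1 + err)    # f_j = u_a(x_j, y_j, t_j)(1 + eps)
print(fj_additive[:5])
```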


In most of the numerical experiments, the "measured" values f_j were chosen according to the formula f_j = u_a(x_j, y_j, t_j) + ε, and in some experiments according to the formula f_j = u_a(x_j, y_j, t_j)(1 + ε), where the test points and "measurement errors" x_j, y_j, t_j, ε are random variables uniformly distributed in some intervals. The points x_j, y_j, t_j were chosen from the above domain Ω: 0 < x < 1, 0 < y < 1, 0 < t < T, and ε ∈ [−ε_l; ε_r]. Next, we present the results of computational experiments that use fixed-size networks and growing networks.
Networks of constant size. We were looking for the solution as an RBF network with basis functions in the form of ellipsoidal Gaussians,

$$u(x, y, t) = \sum_{i=1}^{n} c_i \exp\left[-k_i (x - x_i)^2 - m_i (y - y_i)^2 - l_i (t - t_i)^2\right],\qquad w_i = (c_i, k_i, m_i, l_i, x_i, y_i, t_i),$$

and the value n was fixed. As the method of local optimization, we used the conjugate gradient method (Fletcher–Reeves). We chose a floating step and used the restart method as a global optimization method. Once again, we note that the set of test points (the points (ξ_j, η_j, τ_j), etc., in the terms J₁, J_b) was not fixed but changed after a certain number of steps. It is clear that this made it possible not to choose the parameters N, N_b too large (which would lead to excessively computation-intensive work) and at the same time not to fear that the solution would be acceptable only at certain points and would "behave" badly between them.
A large number of calculations were performed for various parameters. A large number of numerical experiments is needed to find the optimal values of some parameters, such as the penalty factors, the number of neurons n, the numbers of test points N and N_b, and the region W from which the initial weight vector is randomly selected. Note that the most important of the listed parameters is the region W; its correct choice leads to a significant acceleration of the algorithm. It was necessary either to choose the initial centers of the Gaussians x_i, y_i, t_i from a region similar to Ω but several times larger, the exponents in the Gaussians based on the three-sigma rule, and the coefficients c_i based on the scale of the experimental data, or to rely on the results of calculations already performed. The penalty factors should provide a balance between the three terms included in the error functional. Numerical experiments showed that the root-mean-square error corresponding to the term J₁ in the error functional is usually an order of magnitude larger than


each of the root-mean-square errors for the other two terms. Hence the formulas for the penalty multipliers: δ_b = 25 N/N_b, δ_d = 100 N/N_d.
The number of neurons n and the numbers N and N_b are somewhat less significant. Of course, one wants to make them as large as possible, but this leads to an undesirable complexity of calculations. However, experience has shown that even for n ≈ 100, N ≈ 300, N_b ≈ 50 the result is quite satisfactory. We present one of the results. We considered the parameter values n = 20, N = 100, N_b = 20, N_d = 100, A = 0.3, l = m = 1, ε_l = ε_r = 0.0001, T = 1. Under these conditions, the maximum error was 5%.
The method of growing networks has proved to be much more efficient. In the course of this study, a growing network was implemented with the addition of neurons one at a time, the entire network being trained after each addition. We note an important point: in the process of performing the minimization steps (more specifically, after each step), a check is made as to whether the neuron being trained is "bad." A "bad" neuron is one that contributes to the minimization of the error less than a predetermined small number or increases the error functional on a test sample (that is, one calculated using regenerated test points). For this method, all the arguments given above about the selection of the parameters of the algorithm and their degree of importance are also valid. The formulas for calculating the penalty factors are the same. We present some results.
Computational experiment 1. The values of the parameters of the algorithm were n = 50, N = 350, N_b = 40, N_d = 100, A = 0.3, l = m = 1, ε_l = 0, ε_r = 0.01, T = 1. With these parameters, the maximum error was 2%.
Computational experiment 2. The parameters were the same, except for the error: ε_l = ε_r = 0.001. With these parameters, the maximum error was 4%. Let us pay attention to an unexpected result: the solution obtained from the data with the larger error turned out to be better than the solution where the error was smaller. This strange phenomenon can be associated with the quality of the "experimental" data. The solution is close to zero already for t > 0.3, while the result is constructed for the values 0 < t < 1. If a significant part of the data falls into the region t > 0.3 (recall that the data were generated randomly and uniformly in the interval 0 < t < 1), then the solution will be restored badly, which, we suppose, happened in this case.
Of course, an interesting question is how the approximate solution depends on the number N_d, that is, on the number of


point measurements. In this study, solutions were also constructed for N_d = 50 (which is two times less than in computational experiments 1 and 2) using the algorithms described above. Oddly enough, the results practically do not differ from those already given.
Computational experiment 3. The most interesting case is the restoration of higher modes. Using the method described in this section yielded the following results. The parameter values are n = 200, N = 350, N_b = 40, N_d = 100, A = 0.3, l = 1, m = 2, ε_l = ε_r = 0.001, T = 0.2. With these parameters, the maximum error was 15%. At the same time, the character of the solution can be traced, and it would seem natural that, by continuing to train the network with this algorithm, the solution would approach the true one. However, in practice this was not the case: the solution did not get any better even after long network training. It became clear that the algorithm had to be modified.
Computational experiment 4. Such a modification was the solution, at a certain step of the algorithm, of the linear system for the weights c_i (since they enter the functional quadratically):

$$\frac{\partial J}{\partial c_i} = 0,\quad i = 1, \ldots, n, \tag{4.9}$$

for finding the coefficients c_i. Thus, we solve the system

$$A c = B. \tag{4.10}$$

Here are the formulas for the matrix of this system and its right-hand side:

$$A_{ik} = \sum_{j=1}^{N} \alpha_i(\xi_j, \eta_j, \tau_j)\,\alpha_k(\xi_j, \eta_j, \tau_j) + \delta_b \sum_{j=1}^{N_b} \Bigl[ \psi_i(0, \eta_j, \tau_j)\psi_k(0, \eta_j, \tau_j) + \psi_i(1, \eta_j, \tau_j)\psi_k(1, \eta_j, \tau_j) + \psi_i(\xi_j, 0, \tau_j)\psi_k(\xi_j, 0, \tau_j) + \psi_i(\xi_j, 1, \tau_j)\psi_k(\xi_j, 1, \tau_j) \Bigr] + \cdots$$
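A minimal sketch of this idea is given below. Since the output weights c enter the functional quadratically, they can be found at any step from the linear system Ac = B. For simplicity the sketch uses only a data-mismatch term as the functional (so A_ik = Σ_j φ_i(p_j)φ_k(p_j), B_i = Σ_j φ_i(p_j) f_j); the basis functions, data, and sizes are illustrative assumptions rather than the study's actual functional.

```python
# Sketch of computational experiment 4's modification: solve the normal equations
# for the linear output weights c with the nonlinear parameters held fixed.
import numpy as np

rng = np.random.default_rng(4)
n, n_data = 25, 200

centers = rng.uniform(0, 1, (n, 3))            # fixed nonlinear parameters (centers)
widths  = np.full(n, 10.0)                     # fixed widths

def basis(pts):                                # phi_i evaluated at points of shape (N, 3)
    d2 = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-widths * d2)                # shape (N, n)

pts = rng.uniform(0, 1, (n_data, 3))
f = 0.3 * np.sin(np.pi * pts[:, 0]) * np.sin(np.pi * pts[:, 1]) \
        * np.exp(-2 * np.pi**2 * pts[:, 2])    # synthetic "measurements"

Phi = basis(pts)
A = Phi.T @ Phi                                # A_ik = sum_j phi_i(p_j) phi_k(p_j)
B = Phi.T @ f                                  # B_i  = sum_j phi_i(p_j) f_j
c = np.linalg.solve(A + 1e-8 * np.eye(n), B)   # small ridge term for numerical stability

print("data RMSE:", np.sqrt(np.mean((Phi @ c - f) ** 2)))
```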

…> α, and G(t_i) = 0 otherwise (formula (4.12)), where ε is the measurement error, ξ is a random variable uniformly distributed on the segment [0; 1], α is the sensitivity threshold, and t_i belongs to the segment [0; 1]. The value β was chosen equal to 2α. This second formulation of the problem corresponds to the situation when we can determine the fact of the presence or absence of contamination only with some accuracy.
The error functional is minimized using the cloud method in conjunction with the RProp method (see Chapter 3). To solve this problem, we used two types of basis functions. As the first type, we chose a single-layer perceptron with the activation function tanh(). As the second type, we chose special basis functions, which are solutions of the diffusion equation:

$$u(x, t, x_c, t_c, a, c) = c\,\frac{1}{\sqrt{t - t_c}}\exp\left(-\frac{0.25\,(x - x_c)^2}{t - t_c}\right) + a. \tag{4.13}$$
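The following quick numerical check (with arbitrary, illustrative parameter values) confirms that functions of the form (4.13) satisfy the one-dimensional diffusion equation u_t = u_xx (with unit diffusion coefficient) for t > t_c, which is why the equation-residual term of the error functional vanishes for such a basis.

```python
# Check that u = c / sqrt(t - tc) * exp(-0.25 (x - xc)^2 / (t - tc)) + a
# satisfies u_t = u_xx for t > tc (parameter values are arbitrary).
import numpy as np

xc, tc, a, c = 0.3, -0.5, 0.7, 2.0

def u(x, t):
    return c / np.sqrt(t - tc) * np.exp(-0.25 * (x - xc) ** 2 / (t - tc)) + a

x = np.linspace(0.0, 1.0, 201)
t = np.linspace(0.1, 1.0, 201)
X, T = np.meshgrid(x, t, indexing="ij")
h = 1e-4

u_t  = (u(X, T + h) - u(X, T - h)) / (2 * h)                    # central differences
u_xx = (u(X + h, T) - 2 * u(X, T) + u(X - h, T)) / h ** 2

# Prints a value close to zero (only finite-difference truncation error remains).
print("max |u_t - u_xx| =", np.max(np.abs(u_t - u_xx)))
```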

In the latter case, the first term J₁ of the error functional disappears, since the diffusion equation is satisfied automatically.
For comparison, we present the results of applying a neural network to solve the problem in which the data are generated using the function R₁(x, t). The number of neurons for the perceptron network and for the network with special functions in the first computational experiment was taken equal to 10. The result is represented by the lines of equal concentration level 0.05 of harmful substances for the true function R₁(x, t) and for the constructed neural network approximations. For the perceptron, the solutions differed by no more than 7%, and for the above special basis functions, by no more than 2%. In this model case, a line of equal concentration level is an example of a critical level, the excess of which is dangerous for humans. With an increase in the number of neurons to 20, the perceptron gives a good approximation of the function R₁(x, t) in the region located at the exit of the tunnel; toward the far dead end of the tunnel, error accumulates.
In the case of choosing R₂(x, t) as the concentration function, the problem is solved less successfully. A significant increase in


the number of neurons (up to 100) allows us to construct a fairly accurate approximation of the concentration function in the left part of the tunnel. A similar result for the function R₁(x, t) is obtained already with a network of 20 neurons.
We also obtained a neural network solution of the problem for the data generated by formula (4.12). As parameters, we considered ε = α = 0.03 and β = 2α. Given the nature of the data and the chosen error, the neural network in which the special functions (4.13) were used as basis functions showed a good result. Let us consider the result of training a network with five neurons. The lines of equal level of the true function R₁(x, t) and of the neural network approximations differed by no more than 12% at the concentration level of harmful substances 0.05, and by no more than 7% at the concentration level 0.02. Note that if the number of neurons is increased further, the error grows because of the effect of network overfitting.
In conclusion of the section, we note the clearly manifested universality of the neural network method of constructing stable models of objects using the heterogeneous data received. When solving a series of problems with a refined formulation (updated data), along with the construction of an approximate solution of the problem, the coefficients included in its formulation can be determined. The method can be used in constructing regularizations for solving ill-posed problems. The neural network approach also works successfully in problems related to the optimization of the shape of the region where a solution is sought, as well as in problems with multicomponent systems in which the boundary between the media is sought.

References
[1] E. Hairer, G. Wanner, Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems, Springer, 2010, p. 614.
[2] M. Riedmiller, H. Braun, A direct adaptive method for faster backpropagation learning: the RPROP algorithm, in: Proceedings of the IEEE International Conference on Neural Networks, IEEE Press, 1993, pp. 586–591.
[3] E. Hairer, S.P. Norsett, G. Wanner, Solving Ordinary Differential Equations I: Nonstiff Problems, Springer-Verlag, Berlin, 1987, xiv + 480 pp.
[4] T.A. Shemyakina, D.A. Tarkhov, A.N. Vasilyev, Neural network technique for processes modeling in porous catalyst and chemical reactor, in: L. Cheng, et al. (Eds.), ISNN 2016, LNCS 9719, Springer International Publishing, Switzerland, 2016, pp. 547–554.
[5] E.M. Budkina, E.B. Kuznetsov, T.V. Lazovskaya, D.A. Tarkhov, T.A. Shemyakina, A.N. Vasilyev, Neural network approach to intricate problems solving for ordinary differential equations, Opt. Mem. Neural Netw. 26 (2) (2017) 96–109.


[6] C. Na, Computational Methods for Solving Applied Boundary Value Problems, The Publishing World, Singapore, 1982.
[7] E.M. Budkina, E.B. Kuznetsov, T.V. Lazovskaya, S.S. Leonov, D.A. Tarkhov, A.N. Vasilyev, Neural network technique in boundary value problems for ordinary differential equations, in: L. Cheng, et al. (Eds.), ISNN 2016, LNCS 9719, Springer International Publishing, Switzerland, 2016, pp. 277–283.
[8] D.A. Tarkhov, A.N. Vasilyev, New neural network technique to the numerical solution of mathematical physics problems. I: Simple problems, Opt. Mem. Neural Netw. (Inform. Opt.) 14 (1) (2005) 59–72.
[9] D.A. Tarkhov, A.N. Vasilyev, New neural network technique to the numerical solution of mathematical physics problems. II: Complicated and nonstandard problems, Opt. Mem. Neural Netw. (Inform. Opt.) 14 (2) (2005) 97–122.
[10] H. Voss, Numerical calculation of the electronic structure for three-dimensional quantum dots, Comput. Phys. Commun. 174 (2006) 441–446.
[11] W. Wang, T.-M. Hwang, J.-C. Jang, A second-order finite volume scheme for three-dimensional truncated pyramidal quantum dot, Comput. Phys. Commun. 174 (2006) 371–385.
[12] V. Antonov, D. Tarkhov, A. Vasilyev, Unified approach to constructing the neural network models of real objects. Part 1, Math. Models Methods Appl. Sci. 41 (18) (2018) 9244–9251.
[13] A. Vasilyev, D. Tarkhov, G. Guschin, Neural networks method in pressure gauge modeling, in: Proceedings of the 10th IMEKO TC7 International Symposium on Advances of Measurement Science, Saint-Petersburg, Russia, vol. 2, 2004, pp. 275–279.
[14] W. Weidlich, Sociodynamics: A Systematic Approach to Mathematical Modeling in the Social Sciences, Harwood Academic, Amsterdam, 2000.
[15] D.A. Tarkhov, I.K. Shanshin, D.O. Shakhanov, The reversed problem of migration streams modeling, J. Phys. Conf. Ser. 772 (1) 012034, http://iopscience.iop.org/1742-6596/772/1/012034.


5 Methods for constructing multilayer semi-empirical models

5.1 General description of methods

Our experience in solving various tasks (see Chapter 4) showed that neural network models are resistant to errors in the initial data, allow us to include problem parameters among the inputs of a neural network, can be based on heterogeneous information (in particular, containing the results of experiments), and have a number of other merits. At the same time, our approach, which was stated in Chapters 1–4, has a number of disadvantages compared with the classical methods of grids (finite differences), finite elements, etc.
Firstly, the training of the neural network remains a very resource-intensive procedure. Secondly, the required size of the neural network and the time of its training increase rapidly with increasing requirements on the accuracy of the model. This problem is relevant in a situation where the model in the form of a boundary value problem displays the properties of the simulated real object with high accuracy. Thirdly, the neural network approach has not yet been implemented as a standard, well-established software package and is not built into any standard package like ANSYS.
In this connection, the topical task is to combine the neural network and classical approaches. We assume that new algorithms will be spared (at least partially) from the limitations of each of the approaches while maintaining their merits. We tried three options for such a union.
The first option is that the results of applying classical methods are used as additional data when training a neural network [1–3].
The second option is to transform implicit grid methods into explicit ones using a previously trained neural network. For ordinary differential equations, this approach was tested in [2].


The third option leads to the construction of multilayer functional approximations based on classical numerical methods [1]. The result is an analog of deep learning [4–6]. In this chapter, we focus on the implementation of the third option.
In previous chapters, we repeatedly noted, as one of the advantages of neural network modeling over classical approaches to the construction of approximate solutions of differential equations (such as the grid method), that the neural network approach allows one to obtain a solution in the form of an analytical formula rather than a set of numerical values [7–14]. This book shows that this observation can be formulated more precisely: analytically given approximations for the solution can be obtained on the basis of more general approaches, in which the neural network approach is included as a special case. By modifying the well-known formulas of numerical methods such as the Euler method, approximate analytical solutions of differential equations are constructed. The usual estimates of the accuracy of the original classical methods show that arbitrarily exact solutions can be obtained in this way; such approaches allow us to give convenient estimates of the accuracy of the approximations obtained. The resulting models, if necessary, can be refined using the methods outlined in the previous chapters, introducing into consideration the functional constructed as shown in Chapter 1 and selecting the model parameters with one of the methods proposed in Chapter 2.
We will explain our method on the example of the classical explicit Euler method. Consider the Cauchy problem for a system of ordinary differential equations

$$y'(x) = f(x, y(x)),\qquad y(x_0) = y_0 \tag{5.1}$$

on the segment D = [x₀; x₀ + a]. Here x ∈ D ⊂ ℝ, y ∈ ℝ^p, f : ℝ^{p+1} → ℝ^p. The classic Euler method consists in dividing the segment D into n parts, x₀ < x₁ < … < x_k < x_{k+1} < … < x_n = x₀ + a, and applying the iterative formula

$$y_{k+1} = y_k + h_k f(x_k, y_k), \tag{5.2}$$

where h_k = x_{k+1} − x_k and y_k is an approximation to the exact value of the desired solution y(x_k). A known estimate of the resulting approximations has the form

$$\left\lVert y(x_k) - y_k \right\rVert \le C \max_k (h_k), \tag{5.3}$$

where the constant C depends on the estimates of the function f and its derivatives in the region in which the solution is sought [15]. In order to get an approximate solution as a function from


the obtained pointwise approximations, a polyline (the Euler polyline) or a spline is usually drawn. We offer a fundamentally different approach for constructing an approximate solution in the form of a function. Using formula (5.2), we construct an approximate solution of problem (5.1) on the interval D̃ = [x₀; x] with a variable upper limit x ∈ [x₀; x₀ + a]. In so doing, h_k = h_k(x), y_k = y_k(x), y₀(x) = y₀. We propose to use y_n(x) as the desired approximate solution. The simplest version of the algorithm is obtained by a uniform partition of the segment with the step h_k(x) = (x − x₀)/n.
This idea allows for development in many directions. Here are some of them.
The first direction is to replace the Euler method with other methods: Runge–Kutta, Adams, etc. [15]. Our modification of these methods is quite simple, as will be shown below.
The second direction of development is associated with using in formula (5.2) (and other similar formulas) not the function f(x, y) itself but its neural network approximation. A similar option may occur, for example, when the function f(x, y) in problem (5.1) is given in tabular form or is obtained by solving some other problem for which it is advisable to search for the solution in the class of neural network functions. As a result, even for single-layer neural network functions f(x, y), we obtain the solution in the form of a multilayer neural network. In order to speed up the computations for repeatedly solved problems with different initial conditions and parameters, this solution can be implemented as a neurochip.
The third direction is obtained by optimizing the arrangement of the points x_k based on the minimization of an appropriate error functional. This direction can be developed by replacing the numerical values in the approximate analytical solutions obtained above with parameters and selecting these parameters by minimization of the error functional, using the initial numerical values as the initial approximations. As a result of this approach, for the above neural network we obtain the usual learning procedure.
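A minimal sketch of the basic construction is given below: for every x, n Euler steps of size (x − x₀)/n produce an approximate solution y_n(x) defined as a function of x. The test equation y′ = −y, y(0) = 1 (exact solution exp(−x)) is a hypothetical illustration, not an example from the book.

```python
# Sketch of the multilayer construction: n explicit Euler "layers" with the
# variable step h_k(x) = (x - x0)/n give a function y_n(x), not a table of values.
import numpy as np

def y_n(x, f, x0, y0, n):
    """Approximate solution of y' = f(x, y), y(x0) = y0, evaluated at x (vectorized)."""
    x = np.asarray(x, dtype=float)
    h = (x - x0) / n                 # variable step h_k(x) = (x - x0)/n
    y = np.full_like(x, y0)
    xk = np.full_like(x, x0)
    for _ in range(n):               # n layers: y_{k+1} = y_k + h f(x_k, y_k)
        y = y + h * f(xk, y)
        xk = xk + h
    return y

f = lambda x, y: -y
xs = np.linspace(0.0, 2.0, 9)
for n in (4, 16, 64):
    err = np.max(np.abs(y_n(xs, f, 0.0, 1.0, n) - np.exp(-xs)))
    print(f"n = {n:3d}: max error = {err:.2e}")   # error decreases roughly like 1/n
```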

5.1.1 Explicit methods

For the numerical solution of the Cauchy problem (5.1) on the segment [x₀; x₀ + a], a wide palette of numerical methods has been developed [15]. A significant part of them consists in dividing this interval by points x_k into intervals of length h_k, k = 1, …, n, and applying a recurrent formula

$$y_{k+1} = y_k + F\left(f, h_k, x_k, y_k, y_{k+1}\right). \tag{5.4}$$

In a similar way, we consider methods with memory, in which the dependence of the solution at the next point is built not only


on the solution at the previous point but also on the solutions at several previous points: y_{k+1} = y_k + F(f, h_k, x_k, y_{k−p}, …, y_k, y_{k+1}). Here the operator F defines a specific method. The method is called explicit if the operator F(f, h_k, x_k, y_k, y_{k+1}) does not depend on the variable y_{k+1}. In this case, we apply the formula (5.4) n times on the interval with a variable upper limit [x₀; x] ⊆ [x₀; x₀ + a] (wherein h_k = h_k(x), y₀(x) = y₀, y_k = y_k(x)). As a result, we obtain a function y_n(x) that can be considered an approximate solution of problem (5.1). For a uniform estimate of the accuracy of such a formula, the usual estimates of the accuracy of the corresponding numerical method are applicable. In the simplest case of the uniform partition, we have h_k = (x − x₀)/n, x_k = x₀ + k(x − x₀)/n.
For the simplest explicit numerical methods, we obtain the following formulas. For the explicit Euler method, we have F(f, h_k, x_k, y_k) = h_k f(x_k, y_k). More accurate formulas arise when applying second-order methods [15], for which the estimate (5.3) is replaced by the estimate ‖y(x_k) − y_k‖ ≤ C max(h_k)². One of the methods of this type is the improved Euler method, for which the formula

$$y_{k+1} = y_{k-1} + 2 h_k f(x_k, y_k) \tag{5.5}$$

replaces formula (5.2); herewith, one can use the following formula to start the algorithm:

$$y_1 = y_0 + h_1 f\left(x_0 + \frac{h_1}{2},\; y_0 + \frac{h_1}{2} f(x_0, y_0)\right).$$

The midpoint method is obtained by applying the formula (5.4) with

$$F\left(f, h_k, x_k, y_k, y_{k+1}\right) = h_k f\left(x_k + \frac{h_k}{2},\; y_k + \frac{h_k}{2} f(x_k, y_k)\right). \tag{5.6}$$

Another similar method is the Heun method, for which

$$F\left(f, h_k, x_k, y_k, y_{k+1}\right) = \frac{h_k}{2}\left[ f(x_k, y_k) + f\left(x_k + h_k,\; y_k + h_k f(x_k, y_k)\right)\right]. \tag{5.7}$$

The modified Euler method works in accordance with the formula

$$F\left(f, h_k, x_k, y_k, y_{k+1}\right) = h_k \left[ f(x_k, y_k) + \frac{h_k}{2}\left( f'_x(x_k, y_k) + f'_y(x_k, y_k)\, f(x_k, y_k) \right) \right]. \tag{5.8}$$

Differential equations of higher orders are reduced to a system of higher dimension [15], but sometimes there are more efficient methods for them; for example, for a second-order equation of the form y″(x) = f(x, y), this is the Störmer method


$$y_{k+1} = 2 y_k - y_{k-1} + h_k^2 f(x_k, y_k).$$

Quite often in practice, there are cases when the formulation of the problem (5.1) includes parameters:

$$y'(x) = f(x, y(x), \mu),\qquad y(x_0) = y_0(\mu). \tag{5.9}$$

Here μ is the vector of the mentioned parameters. Usually, the problem requires investigating the behavior of the solutions depending on these parameters, choosing the parameters so that the solution is optimal in a certain sense, etc. In this situation, either some asymptotic expansions are usually used [16], giving an expression for the solution from a rather narrow class (as a rule, power expansions are used), or the problem (5.9) is solved numerically for a fairly representative set of parameters. Our approach allows us to obtain immediately a recurrent formula of the form y_{k+1} = B(f, h_k, x_k, y_k, μ), the use of which automatically gives an approximate version of the required dependence in the form y_n(x, μ).
Another common complication of the problem (5.1) is the boundary value problem, which has the form

$$y'(x) = f(x, y(x)),\qquad u(x_0) = u_0,\quad v(x_0 + a) = v_0. \tag{5.10}$$

Here the coordinates of the vectors u, v are composed of the coordinates of the vector y; their total dimension is equal to the dimension of the vector y. The boundary value problem can be reduced to the problem with a parameter:

$$y'(x) = f(x, y(x)),\qquad u(x_0) = u_0,\quad w(x_0) = \mu. \tag{5.11}$$

The vector w contains the coordinates of the vector y that are not in the vector u. Just as before, we construct a multilayer solution y_n(x, μ) of the problem (5.11). This technique allows us to obtain, from the conditions at the right end in the formulation of problem (5.10), the equation v_n(x_0 + a, μ) = v_0, solving which we find μ. This approach can be considered as a functional version of the shooting method.
Often a slightly different approach gives a more accurate solution. We choose an arbitrary point t ∈ (x_0; x_0 + a) and, in the same way as before, build a solution to the problem

$$y'(x) = f(x, y(x)),\qquad y(t) = \mu. \tag{5.12}$$


In this case, the parameter μ is selected from the conditions

$$u(x_0) = u_0,\qquad v(x_0 + a) = v_0. \tag{5.13}$$

We select the point t for the convenience of the solution, for example, in the middle of the interval. One way to improve the accuracy of the solution without increasing the number n is to select several such points, build a solution to the problem (5.12) for each of them, and then choose the parameters μ of each such problem so as to obtain a solution of (5.11) and the smoothest possible joining of the solutions at these points.
When modeling real objects, of particular importance is the problem of constructing a model using heterogeneous information, including differential equations and measurement data. Our approach allows us to simplify the solution of this problem dramatically. Let us consider the problem (5.9), assuming that the parameters are unknown and are to be determined on the basis of the measurement data y(x_i) = y_i, i = 1, …, N. We propose to find an approximate solution y_n(x, μ) by the above method and then find the parameter μ by minimizing the functional

(5.14)

i¼1

If necessary, the functional (5.14) can be replaced by another functional, better taking into account the peculiarities of the problem. Because there is a large arbitrariness in the formation yn(z, μ)—the choice of the number n, various numerical methods, on the basis of which this function is built, the method of forming steps hk, etc., we have the opportunity to choose such a function that best fits the measurement data. Due to the fact that task (5.1) corresponds approximately to the real object being modeled, one of the resulting functions yn(x, μ) likely corresponds to measurement data better than the exact solution of problem (5.9). Three such practical examples are discussed further in Section 5.4.

5.1.2

Implicit methods

If the function F in Eq. (5.4) depends on yk+1 , then equality (5.4) can be considered as an equation for yk+1. This equation may allow an exact solution, then instead of Eq. (5.4) we get the following relation:   (5.15) y k + 1 ¼ B f, y k , hk , xk :

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

Formula (5.15), as before, allows us to calculate yn(x) and use it as an approximate solution of problem (5.1). If Eq. (5.4) cannot be solved exactly relatively yk+1, then to obtain a formula of the form (5.15) we can use some approximate method (like the Newton method) or a specially trained neural network, as it was done in [2]. Several variants of this approach are used further in Section 5.4 when considering specific applied problems. Most often of the implicit methods we used • Euler’s implicit method     F f, hk , xk , y k , y k + 1 ¼ hk f xk + 1 , y k + 1 , • one-step Adams method (trapezoid method)        F f, hk , xk , y k , y k + 1 ¼ 0:5hk f xk , y k + f xk + 1 , y k + 1 :

5.1.3

Partial differential equations

We consider an evolutionary equation of the form ∂ uðx, t Þ ¼ F ðuðx, t Þx, t Þ, ∂t where u(x, t) is a sufficiently smooth function with respect to variables (x, t) 2 ℝp  ℝ+, F is, generally speaking, a nonlinear mapping, for example, a differential operator. We assume that the solution u(x, t) must satisfy the initial conditions u(x, 0) ¼ φ(x). Additional conditions—belonging to a certain functional space, boundary conditions, etc.—in this context, we can omit them, it makes sense to discuss them for a specific task. We propose to construct approximations for solving the time problem for a variable segment [0; t], using for this the implicit Euler method uk+1(x, t) ¼ uk(x, t) + hF(uk+1(x), x, tk+1) with a constant step h ¼ t/n. Other methods also can be used, for example, the Runge-Kutta method. As an approximation for the solution, we propose to use un(x, t). Denoting the value uk(x, t) of the solution on the layer with the number k (with tk ¼ k t/n), we get a series of equations for the approximations uk(x, t). 1 1 uk + 1 ðx, t Þ  F ðuk + 1 ðx, t Þ, x, tk + 1 Þ ¼ uk ðx, t Þ, k ¼ 0, …,n  1, h h u0 ðx, t Þ ¼ φðxÞ:

179

180

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

To solve these equations, we use some suitable method, for example, the Newton method. As a result, we get a recurrent dependence of the form uk+1(x, t) ¼ G(uk(x, t), x, tk+1), which can be interpreted as a procedure for the transition from layer to layer of a neural-like structure. At the same time, the similarity with neural networks increases if the operator Ð G manages to be presented in an integral form G(u, x, t) ¼ K(u, x, t, y, s)dyds. Thus, we obtain a neural network with a neuron continuum. The resulting neural-like structure can be viewed as a generalization of a multilayer feedforward network or a Hopfield recurrent network. Such an analogy makes it possible to apply known methods, for example, the back-propagation error algorithm, to refine the solution obtained (operator G or kernel K). In the case when the mapping F is represented as F ðuðx, t Þx, t Þ ¼ Luðx, t Þ + f ðx, t Þ, where L is a linear operator: for example, a linear differential operator with respect to space variables x, we arrive at the following series of approximations   1 1 I  L uk + 1 ðx, t Þ ¼ uk ðx, t Þ + fk + 1 ðx, t Þ, k ¼ 0, …, n  1: h h Here I is the identity mapping, fk+1(x) ¼ f(x, tk+1) is a known function. The value u0(x, t) ¼ φ(x) is still considered set. If the number λ ¼ 1/h does not belong to the spectrum of the operator: L: λ 62 σ(L),   1 1 exists, and approximathen the resolvent R ¼ RðLÞ ¼ L  I h tions are determined 1 u1 ðx, t Þ ¼  RφðxÞ  Rf 1 ðx, t Þ, uk + 1 ðx, t Þ h 1 ¼  Ruk ðx, t Þ  Rf k + 1 ðx, t Þ, k ¼ 0, …, n  1: h In the case when the resolvent is recorded in an integral form, just as before, the result can be interpreted as a neural network with a continuum of neurons, and it is possible using the corresponding learning algorithms to refine the solution. We will be interested in the case of an ordinary differential operator L on a semi-axis ∂ uðx, t Þ ¼ Luðx, t Þ, ∂t here (x, t) 2 ℝ+  ℝ+, L is a linear differential operator with respect to a spatial variable x. As a boundary condition, we consider the

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

boundedness of the solution at infinity. The series of relations will take the form 1 1 Luk + 1 ðx, t Þ  uk + 1 ðx, t Þ ¼  uk ðx, t Þ: h h The task of constructing a resolvent for such a case is solved exhaustively. Second-order equation   ∂2 ∂ u ð x, t Þ, x, t u ð x, t Þ ¼ F u ð x, t Þ, ∂t 2 ∂t can be reduced to a first-order system by substitution ∂ vðx, t Þ ¼ uðx, t Þ. Here F is the differential operator concerning ∂t the spatial variable. To the received system 8 ∂ > < uðx, t Þ ¼ vðx, t Þ, ∂t > : ∂ uðx, t Þ ¼ F ðuðx, t Þ, vðx, t Þ, x, t Þ, ∂t we can apply the above approach. Similarly, it is possible to consider equations of higher order and the case of several spatial variables.

5.2 5.2.1

Application of methods for constructing approximate analytical solutions for ordinary differential equations Comparison of methods on the example of elementary functions

This section is devoted to a description of the beginning of the study of multilayer methods for constructing approximate solutions of differential equations. Here we test their work on simple problems with well-known analytical solutions. Such a check allows us to find out their real error, and not its upper estimate, which follows from the general convergence theorems of the corresponding numerical methods. When we use these methods, we obtain new asymptotic expansions for ex and cos(x) and compare them with the known expansions according to the Maclaurin formula.

Results of computational experiments 1: The exponential function We will consider the solution of the Cauchy problem for the model differential equation y 0 ¼ y on the segment [0; l], where l 2 [0.25; 1], by using our modification of the well-known numerical

181

182

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

methods considered in paragraph 5.1, implying their application to the interval [0; x] with a variable upper limit. The following are the results of computational experiments for the condition y(0) ¼ 1. In this case, the solution is the exponent ex. We consider a partition of the interval to be uniform, that is hk ¼ x/n. We will investigate the dependence of the error of the number n of partitions of the segment. Based on formulas (5.4)–(5.8), it is easy enough to write formulas describing the solution for any number n. For example: • for the explicit Euler method yn(x) ¼ n n(x + n)n; • for the implicit Euler method yn(x) ¼ (1)n(x  n) nnn; • for the one-step Adams method yn(x) ¼ (1)n(x  2n) n(x + 2n)n; • for the second-order Runge-Kutta method yn(x) ¼ 2 n(n2) n (2n2 + 2nx + x2)n. From the known estimates of type (5.3), it follows that all functional sequences converge to ex uniformly on the segment [0; l]. Thus, we obtain the solution of the model differential equation y 0 ¼ y with indefinitely great accuracy, increasing the number of partitions n. For this problem, the Adams method has the smallest error, the Runge-Kutt method has the slightly larger error, the explicit Euler method has the significantly larger error, and the implicit Euler method has the largest one. This trend continues with increasing of the interval length l and with increasing of the number n. Error analysis We will evaluate the error by the value at the right end of the segment. Let us analyze the error for approximate solutions obtained for n ¼ 10 and n ¼ 100. Through err1 let us denote the modulus of the difference between the value of the exact solution at the point at the right end of the interval and the value of the approximate solution at the point at the right end of the interval at n ¼ 10, through err2 accordingly we will denote the modulus of the same difference when n ¼ 100 (Table 5.1). Table 5.1 Maximum error of the methods for the differential equation y 0 5 y on the segment [0; l ]. l 5 0.25

l 5 0.5

l51

Method

err1

err2

err1

err2

err1

err2

Euler’s explicit method Euler’s implicit method Adams’s method Runge-Kutta’s method

0.0039 0.004 0.00001 0.00003

0.0004 0.0004 0.0000001 0.0000003

0.01 0.02 0.0001 0.0003

0.002 0.002 0.000001 0.000003

0.1 0.1 0.002 0.004

0.01 0.01 0.00002 0.00004

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

The data in the table confirm the well-known general provisions—the Adams and Runge-Kutta methods are significantly more accurate than the explicit and implicit Euler methods. Comparison with Maclaurin series with the same number of operations We will compare the accuracy of the formula, that we obtain using the Adams method, with the partial sum of the Maclaurin series. We perform the comparison for the same number of addition/subtraction, multiplication/division operations fulfilled. To obtain exponential expansion to a degree n: x2 x3 xn , it is necessary to carry out: n add/subtract 1+x+ + +⋯+ 2 6 n! operations, 2n  2 multiplication/division operations. Total: 3n  2 operations. To calculate by the general formula of the Adams method:   2n  x n , it is required: 2 addition/subtraction operations, 2n + x n + 1 multiplication/division operations. Total: n + 3 operations. As an example, we will consider the error that occurs when applying these formulas, if we can perform seven operations, which will be necessary for the standard Taylor expansion with n ¼ 3, and in Adams’s method at n ¼ 4 (Fig. 5.1). y 0.00007 0.00006 0.00005 0.00004

|exp (x) – Adams (x) |

0.00003

|exp (x) – Taylor (x) |

0.00002 0.00001 0.05

0.10

0.15

0.20

x

Fig. 5.1 Graphs of the errors of approximate solutions that arise with the standard Taylor expansion with n ¼ 3 and in the Adams method with n ¼ 4.

A similar situation remains with an increase in the number of operations, that is, the Adams method becomes more accurate than the standard expansion in a series with a sufficiently large x.

Results of computational experiments 2: The cosine function We consider the solution of the Cauchy problem for a model differential equation y 00 + y ¼ 0 on the segment [0; l], where l 2 [0.5; 3], using the application of well-known numerical methods to a

183

184

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

segment with a variable upper limit [0; x]. The following are the results of computational experiments for the initial condition y(0) ¼ 1, y 0 (0) ¼ 0. In this case, the solution of the problem is the function cos(x). We consider partitioning the segment to be uniform, that is hk ¼ x/n. To apply the four methods, that we had described in the first part, we transform the second order equation in question into a system of first order equations. To solve this model equation, we will apply our modification of a specialized method € rmer method, for solving second-order equations—the Sto the essence of which is

to use a recurrent formula yk + 1 ¼ 2yk  yk1 + hk 2 f ðxk , yk Þ ¼ 2  hk 2 yk  yk1 :

Computational experiments have shown that in this case too, the error of the explicit and implicit Euler method substantially exceeds € rmer methods. Among the error of the Adams, Runge-Kutta, and Sto € rmer’s method has the highest accuracy. these methods, Sto The calculations demonstrate a rather interesting error behavior, different from the behavior in the case of ex, discussed earlier. The error does not grow monotonously, so for numerical analysis of the error, it is not enough to take the value of the error at the right end of the considered segment. Error analysis The error will be estimated by the greatest deviation of the approximate solution from the true solution for the entire interval. Let us analyze the error for approximate solutions obtained for n ¼ 10 and n ¼ 100. Through err1 we denote the magnitude of the largest deviation of the selected method on a given interval at n ¼ 10, through err2 accordingly, we denote the modulus of the same difference when n ¼ 100 (Table 5.2). Table 5.2 The error of the methods for the differential equation y 00 + y 5 0 for initial condition y(0) 5 1, y 0 (0) 5 0. l 5 0.25

l51

l52

Method

err1

err2

err1

err2

err1

err2

Euler’s explicit method Euler’s implicit method Adams’s method Runge-Kutta’s method St€ormer’s method

0.01 0.01 0.00004 0.0000009 0.00002

0.001 0.001 0.0000004 0.0000009 0.0000007

0.03 0.02 0.0006 0.001 0.0003

0.002 0.002 0.000007 0.000001 0.000003

0.06 0.09 0.006 0.01 0.003

0.008 0.008 0.00006 0.0001 0.00005

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

€ rmer method, developed The data in the table show that the Sto specifically for solving second-order differential equations, turned out to be more accurate than other methods. Comparison with the Maclaurin series Let us make a comparison of the accuracy of the formula € rmer method with the accuracy given obtained by applying the Sto by the partial sum of the Maclaurin series (Fig. 5.2). y 0.8

0.6 | cos (x) – Stermer (x) | 0.4

| cos (x) – Taylor (x) |

0.2

0.5

1.0

1.5

2.0

2.5

3.0

x

Fig. 5.2 Graphs of cosine and approximations obtained by the St€ ormer method at n ¼ 2 and on Maclaurin’s expansion at n ¼ 4.

The graph below shows that on the segment [0; 2] both the € rmer solution and the partial sum of the Maclaurin series Sto approximate the cosine fairly well. On the segment [2; 3] the € rmer approximation is more accurate. Sto With an increase in the number of terms in both approxima€ rmer method becomes more accurate tions, the solution by the Sto than the partial sum of the Maclaurin series for sufficiently large values of the variable x. Search of period It is interesting to try to find a period of an approximate solution. To do this, we build on the phase plane {y, y 0 } a curve corre€ rmer sponding the approximate solution yn(x), found by the Sto method (Fig. 5.3).

185

186

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

y′ 1.0

0.5

–1.0

–0.5

0.5

1.0

y

–0.5

–1.0

Fig. 5.3 Graph of a point on the phase plane {y, y 0 } for the approximation obtained by the St€ormer method at n ¼ 7.

To

determine the period, we find the value qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2 0 ðyn ðxÞ  1Þ + y n ðxÞ, the argument corresponding this r ¼ min 5 < ¼ z, dx (5.28) > > : dz ¼ f ðy Þ: dx Initial conditions have the form z(0) ¼ 0, y(0) ¼ p. Let us write down four variants of the beginning of the construction of the solution from the left end of the interval. (1) The first step is done by the Euler method y1 ¼ p + hz0 ¼ p, z1 ¼ zð0Þ + hf ðpÞ ¼ hf ðpÞ:

(5.29)

The second step is done according to Euler’s method y2 ¼ y1 + hz1 ¼ p + h2 f ðpÞ, z2 ¼ z1 + hf ðy1 Þ ¼ 2hf ðpÞ: x2 In this case h ¼ x/2, from where y2 ðxÞ ¼ p + f ðpÞ. 4

(5.30)

(2) The first step is done by the Euler method. The second step is done according to the refined Euler method [8] y2 ¼ p + 2hz1 ¼ p + 2h2 f ðpÞ, z2 ¼ z0 + 2hf ðy1 Þ ¼ 2hf ðpÞ: x2 from where y2 ðxÞ ¼ p + xz1 ðxÞ ¼ p + f ðpÞ. 2

(5.31)

195

196

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

(3) We do two steps according to the Euler method with h ¼ x/4, x from where z2 ðxÞ ¼ f ðpÞ. 2 Then we take the step of the refined Euler method y3 ðxÞ ¼ p + 4hz2 ðxÞ ¼ p + 8h2 f ðpÞ ¼ p +

x2 f ðpÞ: 2

(4) We implement the second option with h ¼ x/4, from where x z2 ðxÞ ¼ f ðpÞ. 2 Then we take the step of the refined Euler method y3 ðxÞ ¼ p + 4hz2 ðxÞ ¼ p + 8h2 f ðpÞ ¼ p +

x2 f ðpÞ: 2

We can see that the formulas for the second, third, and fourth options are the same. For the “cultivation” of the solution from the right end of the interval, we make the replacement x ¼ 1  t; however, the form of equation (1.14) does not change; again we reduce it to the 8 dy > > < ¼ z, dt system > > : dz ¼ f ðy Þ: dt Boundary conditions have the form z(0) ¼ q, y(0) ¼ 0. We realize the same four options. (1) The first step is done by the Euler method y5 ¼ y4 + hq ¼ hq, z5 ¼ z4 + hf ðqÞ ¼ q + hf ð0Þ:

(5.32)

The second step is done according to Euler’s method y6 ¼ y5 + hz5 ðt Þ ¼ 2hq + h2 f ð0Þ, z6 ¼ z5 + hf ðy5 Þ ¼ q + hf ð0Þ + hf ðhqÞ:

(5.33)

In this case h ¼ t/2, from where t2 ð1  x Þ2 f ð0Þ. y6 ðxÞ ¼ tq + f ð0Þ ¼ ð1  xÞq + 4 4 (2) The first step is done by the Euler method. The second step is done according to the refined Euler method y6 ¼ y4 + 2hq ¼ 2hq + 2h2 f ð0Þ, z6 ¼ q + 2hf ðy5Þ ¼ q +2hf ðhqÞ: (5.34) from where y6 ðt Þ ¼ tq +

t2 ð1  x Þ2 f ð0Þ ¼ ð1  xÞq + f ð0Þ. 2 2

(3) We do two steps according to  theEuler method with h ¼ t/4, t t tq . from where z6 ¼ q + f ð0Þ + f 4 4 4

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

Then we take the step of the refined Euler method   t2 t 2 tq y7 ¼ y4 + 4hz6 ¼ tq + f ð0Þ + f 4 4 4    ð1  x Þ2 ð1  xÞq f ð0 Þ + f ¼ ð1  xÞq + : 4 4 (4) We implement   the second option with h ¼ x/4, from where t tq . z6 ¼ q + f 2 4 Then we take the step of the refined Euler method     t 2 tq ð1  xÞ2 ð1  xÞq f ¼ ð1  xÞq + : y7 ¼ y4 + 4hz6 ¼ tq + f 2 2 4 4 In further calculations, we will test the first two variants for the join of solutions. Option 1. We join the solutions in the middle of the segment y2(0.5) ¼ y6(0.5), y20 (0.5) ¼ y60 (0.5), from where p+

1 1 1 1 1 f ðpÞ ¼ q + f ð0Þ, f ðpÞ ¼ q  f ð0Þ: 16 2 16 4 4

From these equalities, we obtain an equation to determine p p+

3 1 f ðqÞ + f ð0Þ ¼ 0: 16 16

(5.35)

We consider that f(0) ¼ α, then for the first half of the segment, we get the solution x2 (5.36) y2 ðxÞ ¼ p  ðα + 16pÞ, 12 and for the second half of the segment, we get the solution y 6 ðx Þ ¼

ðx  1Þ ð3xα  α  16pÞ: 12

(5.37)

Option 2. We join the solutions in the middle of the segment y2(0.5) ¼ y6(0.5), y20 (0.5) ¼ y60 (0.5), from where 1 1 1 1 1 p + f ðpÞ ¼ q + f ð0Þ, f ðpÞ ¼ q  f ð0Þ: 8 2 8 2 2 From these equalities, we obtain an equation to determine p 3 1 (5.38) p + f ðpÞ + f ð0Þ ¼ 0: 8 8 We consider that f(0) ¼ α, then for the first half of the segment, we get the solution of the form x2 (5.39) y2 ðxÞ ¼ p  ðα + 8pÞ, 6

197

198

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

and for the first half of the gap, we get the solution y 6 ðx Þ ¼

ð1  x Þ ð1  x Þ2 ðx  1 Þ α¼ ðα  4pÞ + ð3xα  α  8pÞ: (5.40) 2 3 6

We investigate the dependence between the values of standard deviation errors in test points for the equation and boundary conditions on the number of layers corresponding to the multilayer formula and the number of neurons in the network approximating the initial condition. At the same time, compared with Section 4.1.3, the range of variation of the parameters α,β,γ was dramatically expanded; each of them varied now on the interval (0; 2). When implementing the first method to approximate the solution y(x, p, α, β, γ), the number of layers in which is two, we get the expression

0

p + 0:25x2 α

pβγ @e 1pβ ð1 + p Þ +

2 6 6 exp6 6 4

pβγ  1pβ

!

3

1

! ð1 + pÞx2 α βγ 7 pβγ 7  1pβ 2 7 ! 7ð1 + pÞ 1 + 0:125e x α pβγ 5  1  p + 0:125e 1pβ ð1 + pÞx2 α β p + 0:125e

A:

The smallest error in the boundary conditions was achieved with the maximum number of neurons considered 15, it was 0.0012, and for the test points for the equation, the smallest standard deviation error 0.0025 was obtained with the same number of neurons. When we perform calculations for the three-layer and four-layer formulas, the time for constructing an approximate solution and calculating root-mean-square errors increases significantly. The calculation results for errors of computing for three-layer formulas showed that the most accurate result is achieved with five neurons in the network that approximates the initial condition. For test points for the equation, the smallest root-mean-square error was 0.0025. Subsequent calculations in which four layers were considered corresponding to a multilayer formula showed that there is a direct relationship between the increase in the accuracy of the approximate calculation and the number of layers since the smallest error was detected precisely at the maximum number of layers considered in this work. Thus, using an approximate network including five neurons, the values of the RMS errors were obtained for test points of equation 0.00202 and boundary conditions 0.00082, respectively. These values showed the minimum error of the approximate solution.

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

Thus, the solution constructed by this method, as compared with the solution corresponding to the previous approach described in Section 4.1.3, retains its accuracy in a much broader set of problem parameters with a smaller number of network weights being selected. For the second method, the results of applying the two variants of the algorithm differ significantly. For the first option (formulas 5.35–5.37), the error is too large. An increase in the number of neurons does not lead to a significant decrease in the error. The second variant of the second method (formulas 5.38–5.40) leads to better results (Table 5.6).

Table 5.6 The root-mean-square errors for the second variant of the second method—formulas (5.38–5.40). The number of neurons in approximation network

The range of variation of parameters a, b, g 2 [0; 1] at training the network

The range of variation of parameters a, b, g 2 [0; 2] at training the network

2 5 15

0.00317 0.00232 0.00141

0.0156 0.00648 0.00389

A comparison of the new approach with that described in Section 4.1.3 shows its superiority in several aspects: • The application of the new approach allows us to expand drastically the area of change of parameters for which the model is being built. Moreover, the second variant of the second method allows the model to be extrapolated to a wider area of variation of parameters than that used in the construction of the model. • A new approach allows us to build much less complex models while maintaining accuracy. The model constructed using the second version of the second method with two neurons is a convenient approximate analytical model that can be used not only for computer calculations. • A new approach allows us to create a wider palette of approximate formulas, which is especially relevant in a situation where the original equation poorly describes a real object. In such a situation, it is required to choose a model that most accurately reflects the observational data on an object; therefore a wider set of models makes it possible to find a more appropriate model among them.

199

200

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

5.2.6

Multilayer methods for a model equation with delay

In this section, we test the above methods for constructing approximate multilayer solutions of differential equations on a model problem with delay ( y 0 ðxÞ ¼ e y ðx  1Þ, (5.41) y ðxÞ ¼ e x ,  1  x  0: To find a solution on the segment [0; 1], we apply the recurrence relation of the explicit Euler method yk+1 ¼ yk + hk f(xk, yk) for the equation y 0 (x) ¼ f(x, y) on the segment [0; x] with a variable upper limit x 2 [0; 1]. With a uniform partition of the segment [0; x] x kx with step hk ¼ , xk ¼ , we get n n x x kx yk + 1 ðxÞ ¼ yk ðxÞ + e y ðxk  1Þ ¼ yk ðxÞ + e n . As y0(x) we take e0 ¼ 1. n n As a result, we obtain the approximate solution n1 xX x ex  1 n e ¼1+ x n k¼0 n n e 1 kx

yn ðxÞ ¼ 1 +

on the segment [0; 1]. Below we provide graphs of exact and approximate solutions (explicit Euler’s method) for n ¼ 10.

Fig. 5.5 Graphs of the exact solution and the approximate solution of the problem (5.41), constructed using the explicit Euler method for n ¼ 10.

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

If for the solution to the problem to use the implicit Euler method yk+1 ¼ yk + hk f(xk, yk+1) under the same conditions, we will obtain the recurrent relation   kx x kx x  1 ¼ yk1 ðxÞ + e n , from where we yk ðxÞ ¼ yk1 ðxÞ + e y n n n get an approximate solution kx x x Xn x n ex  1 n y n ðx Þ ¼ 1 + e ¼ 1 + e : x k¼1 n n en  1 Below we provide graphs of exact and approximate solutions (implicit Euler’s method) for n ¼ 10.

Fig. 5.6 Graphs of the exact solution and the approximate solution of the problem (5.41), constructed using the implicit Euler method for n ¼ 10.

As can be seen from Figs. 5.5 and 5.6, both Euler’s methods— explicit and implicit ones—lead to similar results. If we use the trapezoid method yk+1 ¼ yk + hk(f(xk, yk) + f(xk+1, yk+1))/2 under the same conditions,  x  x x e 1 n we get an approximate solution yn ðxÞ ¼ 1 + e +1 x , 2n en  1 which has significantly higher accuracy. Below we give graphs of the difference between exact and approximate solutions (the trapezoid method) for n ¼ 10 (Fig. 5.7).

201

202

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

y 0.2

0.4

0.6

0.8

1.0

x

–0.0002 –0.0004 –0.0006 –0.0008 –0.0010 –0.0012 –0.0014

Fig. 5.7 Graphs of the difference between the exact solution and the approximate solution of the problem (5.41), constructed using the trapezium method for n ¼ 10.

The above-presented methods can be applied to find the continuation of a solution—approximate solutions—on subsequent intervals. In this case, in order to find a solution on the interval m  x  m + 1, we use the solution on the interval m  1  x  m. As an example, let us consider the trapezoid method yk + 1 ¼ yk + hk ðf ðxk , yk Þ + f ðxk + 1 , yk + 1 ÞÞ=2 on the interval [m; x] with a variable upper limit x 2 [m; m + 1]. We suppose that the partition of the interval [0; x] is uniform hk ¼ x/n, xk ¼ m + kx/n. We get yk + 1 ðx, m + 1Þ ¼ yk ðx, m + 1Þ +

xm

xm xm + e yn m  1 + k, m + yn m  1 + ðk + 1Þ, m : 2n n n Here yk(x, m + 1)is the k-th layer of an approximate solution on the interval m  x  m + 1, yn(x, m) is the final approximate solution in the previous interval m  1  x  m. We assume that y0(x, m + 1) ¼ yn(m, m). As a result, we obtain the formula yn ðx, m + 1Þ ¼ yn ðm, mÞ + n



x  mX xm xm +e yn m 1+ ðk  1Þ, m + yn m  1 + k, m : 2n k¼1 n n We present graphs of the exact solution and the approximate solution found by the trapezium method for n ¼ 3 (Fig. 5.8).

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

Fig. 5.8 Graphs of the exact solution and the approximate solution of the problem (5.41), constructed on the segment [5; 6] using the trapezoid method for n ¼ 3.

5.2.7

Application of approximate multilayer methods for solving differential equations in the problem of stabilizing an inverted pendulum

Application of approximate multilayer methods for solving differential equations in the problem of inverted pendulum stabilization. The task of controlling a pendulum in a neighborhood of unstable equilibrium point is a simple example of the stabilization of a nonlinear dynamical system in an unstable state. In this section, we will compare two approaches to system control. The first approach is based on the transition to a linearized model. In the second approach, we use the approximate solutions, based on the application of our methods proposed and discussed in the previous paragraphs. We simulate the behavior of the pendulum by a differential equation: φ€ ¼ a sin φ + bu, (5.42) where φ is a deflection angle of the pendulum from the vertical, u is the moment of the force applied, a, b are coefficients depending on the parameters of the object. The challenge is to choose such a

203

204

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

control u that the conditions φ ! 0, φ_ ! 0 when t ! + ∞ are met. We consider the model case with a ¼ b ¼ 1. Let us move in the _ Eq. (5.42) to the coordinates on the phase plane x ¼ φ, y ¼ φ:  x_ ¼ y, (5.43) y_ ¼ sin x + u: We consider two problem statements. The first statement is using approximate solutions to search for control that brings the system in question closer to the desired equilibrium position according to a given exponential law. Let us compare two approaches. In accordance with the first approach commonly used in such problems, we will choose u ¼  k1x(t)  k2y(t), where x(t), y(t) are solutions of the linearized system  x_ ¼ y, (5.44) y_ ¼ x + u: We choose factors k1, k2 so that the corresponding system  x_ ¼ y, (5.45) _y ¼ ð1  k1 Þx  k2 y had given characteristic numbers. In this model example, select them λ1 ¼ λ2 ¼  1. At the same time, we get k1 ¼ k2 ¼ 2. As a result, we have the law of motion x(t) ¼ (x0 + (x0 + y0)t)e t, y(t) ¼ (y0  (x0 + y0)t)e t and the control u(t) ¼  2(x0 + y0)e t enforcing this law, where x0 ¼ x(0), y0 ¼ y(0). For specified functions x(t) and y(t), we can find control from the system (5.43):   uðt Þ ¼ ððx0 + y0 Þt  x0  2y0 Þe t  sin e t ððx0 + y0 Þt + x0 Þ : We will compare all other solutions with this exact one. The second approach uses approximate solutions of the system (5.43), constructed using the methods proposed by us in this chapter. For this approach, we will test three methods. The first method uses the implicit Euler method with a step h ¼ t(see Section 5.1.2)  x1 ¼ x0 + ty 1 , (5.46) y1 ¼ y0 + t ð sin x1 + uÞ: Substituting the second equality (5.46) into the first, we get: x1 ¼ x0 + ty 0 + t 2 ð sin x1 + uÞ:

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

Substituting the expression (x0 + (x0 + y0)t)e t instead of x1, we find the required control:      u ¼  sin ðx0 + ðx0 + y0 Þt Þe t + x0 ð1 + t Þe t  1 =t 2 + y0 e t  1 =t: The second method is based on the implicit trapezoid method (see Section 5.1.2) with a step h ¼ t  x1 ¼ x0 + 0:5t ðy0 + y1 Þ, (5.47) y1 ¼ y0 + 0:5t ð sin x0 + uð0Þ + sin x1 + uðt ÞÞ: Substituting the second equality (5.47) into the first, we get: x1 ¼ x0 + ty 0 + 0:25t 2 ð sin x0 + uð0Þ + sin x1 + uðt ÞÞ: Substituting the expression (x0 + (x0 + y0)t)e t instead of x1, we find the required control:  uðt Þ + uð0Þ ¼  sin ðx0 + ðx0 + y0 Þt Þe t       sin ðx0 Þ + 4x0 ð1 + t Þe t  1 =t 2 + 4y0 e t  1 =t: Going to the limit at t ! 0, we get u(0) ¼  sin(x0)  x0  2y0, from where    uðt Þ ¼  sin ðx0 + ðy0 + x0 Þt Þe t + 4x0 ð1 + t Þe t  1 =t 2 +   + 4y0 e t  1 =t + x0 + 2y0 : The third method is an explicit, two-step one. The first step is done in accordance with the corrected Euler method (see Section 5.1.1) with a step h ¼ t/2, for which x1 ¼ x0 + 0:5ty 0 + 0:125t 2 ð sin x0 + uð0ÞÞ: € rmer method We take the second step in accordance with the Sto (see Section 5.1.1) with a step h ¼ t/2 x2 ¼ 2x1  x0 + 0:25t 2 ð sin x1 + uð0:5t ÞÞ ¼ x0 + ty 0 + 0:25t 2 ð sin x0 + uð0Þ + sin x1 + uð0:5t ÞÞ: Substituting instead of x2 the expression (x0 + (y0 + x0)t)e t, we get:   uð0Þ + uðt Þ ¼  sin x0  sin x1 + x0 ð1 + 2t Þe 2t  1 =t 2   + 2y0 e 2t  1 =t, from where, going to the limit at t ! 0, we get u(0) ¼  sin(x0)  x0  2y0, from where we find the required control:   uðt Þ ¼  sin x0 + ty 0  0:5t 2 ðx0 + 2y0 Þ +     + x0 ð1 + 2t Þe 2t  1 =t 2 + 2y0 e 2t  1 =t + x0 + 2y0 :

205

206

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

Computational experiments have shown that for a given formulation of the problem, the control formed by the second method of the second approach turns out to be the most accurate. In this case, an approximate solution in the vicinity of the minimum is more efficient than the exact one. The second statement is the search for bounded control, and it is based on the Pontryagin maximum principle. We will solve the problem of stabilizing the pendulum in the upper equilibrium position under the condition j u j  u0 for the minimum time. In accordance with the principle of maximum, we make up the Hamilton function H ¼ φy + ψ(sin x + u). We have a system for the functions φ and ψ  φ_ ¼ ψ cos x, (5.48) ψ_ ¼ φ: In accordance with the principle of maximum, control is matched by maximizing the Hamilton function, i.e., u ¼ u0 sign ψ. Accurate research and analytical control construction are difficult; therefore, we apply approximate methods. In the future, we will assume that u0 ¼ 1. The first approach is to build approximate solutions on the interval Δt with u ¼ 1 and select control mark based on the minimum of the function x2 + y2 at the end of the specified time interval. Then, a transition to a new point is made in accordance with the solution of system (5.43) at this interval, and the selection of the control sign is repeated. For finding the approximate solution, we used four methods. The first method uses the solution of the linearized system (5.44). As a second method, the aforementioned corrected Euler method is used: xðt Þ ¼ x0 + ty 0 + 0:5t 2 ð sin x0 + uÞ: In this case, to find the second phase coordinate, we use the derivative of the first one y(t) ¼ y0 + t(sinx0 + u). The third method uses the third method from the first statement (two-step one), i.e., the formula x(t) ¼ x0 + ty0 + 0.25t2(sinx0 + sin(x0 + 0.5ty0 + 0.125t2(sinx0 + u)) + 2u) is applied; for finding y(t) the corresponding derivative is used. The fourth method differs from the second one in that the corrected Euler method is applied to find y(t): y(t) ¼ y0 + t(sinx0 + u) + 0.5t2y0 cos x0.

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

The second approach is to take two steps with the controls of different signs. The execution time of these steps is chosen in such a way that, as a result, the system moves to the desired equilibrium position. For this, we use the implicit Euler method. The motion in the first step is defined by the system (5.46); on the second step, it is defined by the system:  x2 ¼ x1 + τy2 , y2 ¼ y1 + τð sin x2  uÞ: Since we require compliance with  the conditions x2 ¼ y2 ¼ 0, then x1 ¼ 0, from the above system we get  y1 ¼ τu: x0 + tτu ¼ 0, Substituting these conditions into (5.46) gives y0 ¼ ðτ  t Þu that is, with x0 6¼ 0 we get u ¼  sign (x0). Wherein t and τ we find  x0 u + ty 0 u + t 2 ¼ 0, from the equations: τ ¼ y0 u + t:

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 8 < t ¼ 0:5 y02 + 4jx0 j  y0 u ,

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi The result is: : τ ¼ 0:5 y02 + 4jx0 j + y0 u : Next, we calculate the point to which the pendulum passes in accordance with the solution (5.43) on the interval of length t + τ when implementing the specified control. Further, the control selection is repeated. Computational experiments for the first approach allow us to conclude that with sufficiently large initial deviations, the fourth method is the most effective. The second and fourth methods turned out to be more effective than the first, based on linearization. We started the process from several dozen randomly selected starting points. For most of them, the second approach turns out to be much more efficient, and a two-step solution based on the implicit Euler method turned out to be more effective in most cases than a two-step solution based on linearization. The performed computational experiments have shown that the modifications of classical numerical methods proposed by us can be successfully applied to control problems, including problems related to control in conditions of instability. These methods may be particularly useful in situations where the original mathematical model of a controlled object is inaccurate and can be refined in the process of controlling it as data on processes in the system modeled are accumulated.

207

208

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

Especially promising is the last of the considered methods. Its development, in particular, the application of more accurate variants of modifications of numerical methods (for example, the trapezoid method) requires solving a nonlinear equation at each step, which can be carried out using a previously trained neural network. It is also possible to use methods with a larger number of steps instead of a two-step method.

5.3 5.3.1

Application of multilayer methods for partial differential equations Heat equation

Let us consider an example of applying the methods of Section 5.1.3 to the classical heat equation that we write in a dimensionless form ∂uðx, t Þ ∂2 uðx, t Þ ¼ , ∂t ∂x2

(5.49)

where x 2 ℝ and t 2 ℝ+. We will solve the initial-boundary problem with a piecewise-given initial condition  1, x 0; uðx, 0Þ ¼ ϑðxÞ ¼ (5.50) 0, x < 0: The function ϑ(x) is one of the typical activation functions that is widely used in neural networks. The linearity of the problem allows us to reduce to this case a problem with an initial condition, for which a function u(x, 0) can be represented as a neural network with one hidden layer, the neurons of which have an activation function ϑ(x). We know [20] that we can give the approximation of any function from L2(ℝ) by the corresponding weighted sum of shifts of functions of the form (5.50) with a given accuracy. Moreover, instead of functions of the form (5.50) in initial conditions, we can consider sigmoidal or other functions that we use as basic elements in the approximation of a given function by an artificial neural network with one hidden layer. Also, we will preserve the general scheme for solving the problem (5.49) for such an approximation of the initial condition. We can also take into account and calculate external influences on each of the layers using Duhamel’s integrals, which allows us to interpret the resulting approximation as a neural network with a continuum of neurons in each layer.

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

We are looking for a solution among the functions bounded at infinity. We note that the exact solution of problem (5.49)–(5.50) is ð   1 1 z uðx, t Þ ¼  pffiffiffi exp t 2 dt: π 0 2 We will construct an approximate solution based on the implicit Euler method on a variable time interval [0; t] (taking into account the initial condition (5.50)). Using the implicit Euler method with respect to the time variable t, we obtain a secondorder linear equation for the variable x 1 1 u00k + 1 ðxÞ  uk + 1 ðxÞ ¼  uk ðxÞ, h h

(5.51)

where 0  k  n. We derive the recurrent formulas for solving an equation of the type (5.51), for simplicity, by making a change to a variable pffiffiffi z ¼ x= h. The equation for the new variable is u00k + 1 ðzÞ  uk + 1 ðzÞ ¼ uk ðzÞ:

(5.52)

At each step, we get a task of the form  y 00 ðzÞ  y ðzÞ ¼ A + Pm+ ðzÞexp ðzÞ + Pm ðzÞ exp ðzÞ:

(5.53)

Here, y(z) ¼ uk+1(z), P+m(z), P m(z) are polynomials of degree m: m P i Pm ð z Þ ¼ p i z i¼0

Let us find a particular solution for the term P+m(z) exp(z) by the method of undetermined coefficients in the form m P + y∗ðzÞ ¼ zQm ðzÞ exp ðzÞ ¼ exp ðzÞ qi+ zi + 1 . i¼0

As a result + qm ¼

+ pm 1 i+2 ; qi+ ¼ p +  qi++ 1 , i ¼ 0, …, m  1: (5.54) 2ðm + 1Þ 2ði + 1Þ i 2

For the term P m(z) exp( z) on the right side of the equation, we similarly have an idea for solving m P i+1 y∗ðzÞ ¼ zQ q : m ðz Þ exp ðz Þ ¼ exp ðz Þ i z i¼0

The method of undetermined coefficients leads to relations q m¼ 

p 1 i+2 m ; q ¼  p  + q ,i ¼ 0, …, m  1: (5.55) i+1 2ðm + 1Þ i 2ði + 1Þ i 2

The general view of the solution obtained after using n layers looks like

209

210

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

! ! 8 > x x > + > ffi exp  pffiffiffiffiffiffiffiffi , x 0; > < 1  Pn1 pffiffiffiffiffiffiffi t=n t=n ! ! un ðx, t Þ ¼ > x x >  > > : Pn1 pffiffiffiffiffiffiffiffi exp pffiffiffiffiffiffiffiffi ,x  0: t=n t=n As the number of layers grows n, the error decreases. So when n ¼ 3 a maximum error is 0.23, with an increase in the number of layers n to 20, the maximum error decreases to 0.0035, with an increase in the number of layers n to 1000, it decreases to 0.000072. We want to mention that empirical evaluation takes place |un(x, t)  u(x, t) | < 10(1+[n/10]). It should be noted that we should consider the given formulas as the direct functioning of a neural network with initially defined weights. If the accuracy obtained is not enough, then we can apply the usual training procedure (for example, the error back propagation method) by changing the numerical parameters of formulas (5.54)–(5.55) as weights, minimizing the appropriate error functional.

5.3.2

Comparison of multilayer methods for solving the Cauchy problem for the wave equation

This section deals with the solution of a one-dimensional wave equation. It is the simplest example of a linear hyperbolic partial differential equation. Let us consider a homogeneous one-dimensional wave equation in dimensionless variables ∂2 y ∂2 y ¼ : ∂t 2 ∂x2

(5.56)

Using the linearity of the problem for y ¼ y(t, x), in the first part of the section we solve the Cauchy problem with the initial conditions ∂y ð0, xÞ ¼ 0, (5.57) y ð0, xÞ ¼ f ðxÞ, ∂t where t 2 ℝ+ is time, x 2 ℝ is spatial variable. In the second part of the section, we consider the initial con∂y ditions of the form y(0, x) ¼ 0, ð0, xÞ ¼ g ðxÞ, and we present the ∂t results of the corresponding calculations.

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

We obtain the solution of the problem with the initial condi∂y tions of general form y(0, x) ¼ f(x), ð0, xÞ ¼ g ðxÞ by summing ∂t the two solutions found. Eq. (5.56) with such initial conditions has a well-known solution in analytical form   ðx + t 1 f ðx + t Þ + f ðx  t Þ + g ðsÞds : y ∗ ðt, xÞ ¼ 2 xt We will use the comparison with this solution further to evaluate the results. For the application of our methods, which we describe in this chapter of the book, we rewrite an Eq. (5.56) as an equation concerning a variable t in the operator form ∂2 y ¼ Aðy Þ, ∂t 2

(5.58)

∂2 y : ∂x2 We also write the Eq. (5.58) as a system of first-order differential equations for the functions y and z 8 ∂y > < ¼ z, ∂t (5.59) > : ∂z ¼ Aðy Þ, ∂t where Aðy Þ ¼

where the auxiliary function z(t, x) satisfies all necessary conditions. We know [20] that we can approximate arbitrary functions from a wide class using linear combinations of shifts and extensions of standard functions of neural network bases (sigmoids, Gaussians, and others). Therefore, we considered the initialboundary problem with the condition (5.57), in which   f ðxÞ ¼ exp x2 : (5.60) The function (5.60) is the simplest radial basic function (RBF) (see Section 2.3). In the general case, we can approximate an initial condition of a fairly general form by f ðx Þ ffi

N X

2 ci exp a2i ðx  bi Þ :

(5.61)

i¼1

Suppose we have constructed an approximate solution of Eq. (5.56) u(t, x) for the initial condition (5.60). Then, for the initial condition of the form (5.61), the general form of the solution of N P Eq. (5.56) will look like U ðt, xÞ ffi ci uðt, ai ðx  bi ÞÞ. i¼1

211

212

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

Throughout, the number n means the number of layers of the final multilayer approximation yn(t, x), on which including the step size depends h(t) ¼ t/n. We tested the following methods from Section 5.1.1 1. The Euler method, which for this problem has the form: 8 < ∂2 yk ðt, xÞ , zk + 1 ðt, xÞ ¼ zk ðt, xÞ + hðt Þ  (5.62) ∂x2 : yk + 1 ðt, xÞ ¼ yk ðt, xÞ + hðt Þ  zk ðt, xÞ, where k ¼ 0, …, n  1. 2. Refined Euler’s method: 8 < ∂2 yk ðt, xÞ zk + 1 ðt, xÞ ¼ zk1 ðt, xÞ + 2hðt Þ  , ∂x2 : yk + 1 ðt, xÞ ¼ yk1 ðt, xÞ + 2hðt Þ  zk ðt, xÞ,

(5.63)

where k ¼ 0, …, n  1. 3. Corrected Euler’s method: 8 > ∂2 yk ðt, xÞ 1 2 ∂2 zk ðt, xÞ > < zk + 1 ðt, xÞ ¼ zk ðt, xÞ + hðt Þ  + ð t Þ  , h ∂x2 2 ∂x2 (5.64) 2 > 1 ∂ yk ðt, xÞ > : yk + 1 ðt, xÞ ¼ yk ðt, xÞ + hðt Þ  zk ðt, xÞ + h2 ðt Þ  , 2 ∂x2 where k ¼ 0, …, n  1. € rmer method for the problem: 4. The Sto yk + 1 ðt, xÞ ¼ 2yk ðt, xÞ  yk1 ðt, xÞ + h2 ðt Þ 

∂2 yk ðt, xÞ , ∂x2

(5.65)

where k ¼ 0, …, n  1. The initial condition for the first step is the following y1(t, x) ¼ y0(t, x). € rmer method with the first step according to the cor5. The Sto rected Euler method: ∂2 yk ðt, xÞ , (5.66) yk + 1 ðt, xÞ ¼ 2yk ðt, xÞ  yk1 ðt, xÞ + h2 ðt Þ  ∂x2 where k ¼ 0, …, n  1. The initial condition for the first step is y1(t, x) ¼ exp( x2)(1 +(2x2  1)h(t)). For different values of the time variable from close t ¼ 0 to t ¼ 3, we calculated the maximum approximation error—the maximum error modulus at 1000 equally spaced points relative to the analytical solution y∗ðt, xÞ of problem (5.56): emax ðt Þ ¼ max fjy∗ðt, xÞ  y ðt, xÞj; x ¼ i  15=1000, i ¼ 1, …, 1000g:

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

213

Table 5.7 Maximum error emax at the point t 5 0.1.

Euler’s method Refined Euler’s method Corrected Euler’s method St€ormer’s method Euler’s corrected + St€ormer’s method

n55

n 5 10

n 5 20

n 5 50

0.0055 0.0004 0.0000066 0.0019 0.00156

0.0045 0.038 0.0000018 0.00096 0.00091

0.0018 0.0015 0.00000045 0.00049 0.00042

0.00155 0.00245 0.000000073 0.00019 0.00108

Table 5.8 Maximum error emax at the point t 5 1.

Euler’s method Refined Euler’s method Corrected Euler’s method St€ormer’s method Euler’s corrected + St€ormer’s method

n55

n 5 10

n 5 20

n 5 50

0.21 0.033 0.0164 0.067 0.053

0.205 0.221 0.00345 0.036 0.0343

0.0065 0.0064 0.00084 0.0182 0.0152

0.0081 0.0165 0.000136 0.0076 0.0382

Table 5.9 Maximum error emax at the point t 5 3.

Euler’s method Refined Euler’s method Corrected Euler’s method St€ormer’s method Euler’s corrected + St€ormer’s method

n55

n 5 10

n 5 20

n 5 50

1.65 17.1 3.43 6.2 1.48

1.68 8.55 0.162 0.056 0.045

0.6 0.17 0.023 0.028 0.024

0.29 0.0535 0.0035 0.0123 0.0063

As we see from Tables 5.7–5.9, the corrected Euler method shows the best results. In the case when it is necessary to approximate the solution in the neighborhood of the initial point (for example, when t ¼ 0.1), we can use the usual Euler method by choosing a sufficient number of layers.

214

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

The corrected Euler method is the most accurate of all considered. Its labor intensity is also greatest, but the increase in labor intensity, in this case, pays off by increasing accuracy. € rmer method are significantly The results of applying the Sto better than those of the refined Euler method, with similar complexity, which allows us to recommend it for such tasks. We present the results of computational experiments for the   ∂y initial condition y(0, x) ¼ 0, ð0, xÞ ¼ exp x2 : ∂t Table 5.10 Maximum error emax at the point t 5 0.1.

Euler’s method Refined Euler’s method Corrected Euler’s method St€ormer’s method Euler’s corrected + St€ormer’s method

n55

n 5 10

n 5 20

n 5 50

0.00172258 0.00001294 0.00001304 0.00001313 0.00001319

0.00009266 0.00001313 0.00000324 0.00000328 0.00000329

0.00004789 0.00000328 0.000000809 0.000000820 0.000000824

0.00001954 0.000000525 0.000000129 0.000000131 0.000000132

Table 5.11 Maximum error emax at the point t 5 1.

Euler’s method Refined Euler’s method Corrected Euler’s method St€ormer’s method Euler’s corrected + St€ormer’s method

n55

n 5 10

n 5 20

n 5 50

0.0969773 0.0071519 0.0053022 0.0028264 0.0049492

0.0419807 0.0028264 0.0013861 0.0006968 0.0012288

0.0195704 0.0006968 0.0003568 0.0001737 0.0003067

0.0075341 0.0001111 0.0000581 0.0000278 0.00004904

Table 5.12 Maximum error emax at the point t 5 3.

Euler’s method Refined Euler’s method Corrected Euler’s method St€ormer’s method Euler’s corrected + St€ormer’s method

n55

n 5 10

n 5 20

n 5 50

1.32383 8.39208 0.87565 3.11714 0.03359

0.62337 3.11714 0.066003 0.01317 0.00672

0.195054 0.013172 0.0115796 0.003093 0.001625

0.0480651 0.0019675 0.00180301 0.0004877 0.0002578

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

As we can see from Tables 5.10–5.12, for little time values and a sufficient number of layers, we can use the classical Euler method. € rmer method, with the first step according to the corrected The Sto Euler method, gives the best result among the presented, showing errors similar to the refined and corrected Euler methods at short € rmer method shows good times. At long times, the modified Sto results.

5.4

Problems with real measurements

When modeling real objects, it is often impossible (or very difficult) to sufficiently accurately describe the physical processes occurring in them. As a result, an arbitrarily exact solution of differential equations (which are mathematical models of the above-mentioned physical processes) often does not allow constructing an adequate mathematical model of the object under study. Also, each real object has its characteristics, which are not always possible to take into account because of general considerations. Under these conditions, it is reasonable to refine the model of the object, based on data from observations of it. Refining the physical models and the corresponding differential equations is a difficult task. It is not always possible to obtain a sufficiently accurate model, modifying only the coefficients of the equation, but without changing its structure. We offer a fundamentally different approach, which consists of two stages. At the first stage by methods as described in the preceding sections of this chapter, we construct an approximate solution of differential equations under consideration in the form of the function for which the task settings are input variables. At the second stage, we refine this function according to observations, and the refinement process can continue when new data is received. Our approach differs from the approach [21–23], so we replace with a semi-empirical (for example, neural network) model not a part of the system, that is difficult to model with differential equations, but the entire system, including differential equations. Our approach is preferable in a situation where the accuracy of the description of an object using differential equations is low. Below we provide the results of solving three specific problems of this kind.

215

216

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

5.4.1

The problem of sagging hemp rope

In this section, we illustrate the approach by the problem of calculating the hemp rope sagging line, which is difficult to solve by standard methods (Fig. 5.9).

Fig. 5.9 Photograph of the real rope, for which we built a sagging model.

Let us consider a free-hanging, non-stretchable thread of length l, fixed at their ends at the same level. Fig. 5.10 shows the design scheme with all the symbols. Here L is a distance between supports, s is a length of the portion of the curve, θ is an angle of inclination of the tangent, which is measured from the direction of the horizontal axis x counterclockwise, A and B are horizontal and vertical force components of the support reactions, A and B are vector’s lengths, q is distributed load caused by the weight of the thread. We take into account that mg mg ,q¼ , where m is a mass of the object and g is the value B¼ 2 l of the acceleration of gravity, we also introduce the notation mg s μ¼ and z ¼ . A l

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

If the thread is thin, then we can neglect the internal cutting force ðB  qsÞcos θ + A sin θ ¼ 0:

(5.67)

Fig. 5.10 Estimated rope sagging pattern.

We obtain the ratio (5.67) in the new variables θ, μ, z   1 tg θ ¼ μ  z 2

(5.68)

To find the equation of a curve for a free-hanging rope, it suffices to define a function θ(z). Indeed, in this case, the Cartesian coordinates of the curve— the sagging line of the thread—are determined by integrating the equalities 8 dx > < ¼ cos θ, ds > : dy ¼ sin θ: ds

(5.69)

x y dx dξ In the notation ξ ¼ and η ¼ with subject to equalities ¼ , l l ds dz dy dη ¼ , we rewrite the relations (5.69) ds dz 8 dξ > < ¼ cos θðzÞ, dz > : dη ¼ sin θðzÞ: dz

217

218

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

We integrate these equations taking into account the ratio (5.69) ðz

ðz

dζ ξðzÞ ¼ cos θðζ Þdζ ¼ sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  2ffi 1 0 0 1 + μ2  ζ 2    

1 μ 1 arcsh : ¼  arcsh μ  z μ 2 2   1 μ  ζ dζ 2 ηðzÞ ¼ sin θðζ Þdζ ¼  sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  2ffi 1 0 0 2 1+μ ζ 2 sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  2ffi sffiffiffiffiffiffiffiffiffiffiffiffiffi 1 1 1 1 ¼ + + : z  2 μ 2 μ2 4

(5.70)

ðz

ðz

(5.71)

L Parameter μ can be determined from the condition ξð1Þ ¼ . l Substituting this condition in the formula (5.70), we obtain the

μ L 2 equation for finding μ(l, L): ¼ arcsh . For the explicit analytl μ 2 ical representation of dependencies μ(l, L), we can use a simple approximation with an error of no more than 10%

μ 2 1 arcsh : (5.72) μ 2 1 + 0, 1μ Then we come to the relation

Consequently,

L 1 ¼ : l 1 + 0, 1μ

(5.73)

  mg l ¼ 10  1 : μ¼ A L

(5.74)

Returning in Eqs. (5.70), (5.71) to the original coordinate notation, we write the coordinate equations for line sagging the thread in the form:     8

μ l 1 s > > arcsh ,  arcsh μ  x ðs Þ ¼ > > > μ 2 2 l < 2sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 3 (5.75)  2 rffiffiffiffiffiffiffiffiffiffiffiffiffi > 1 1 s 1 1 > > 4 5 > y ðs Þ ¼ l   + + : > : μ2 2 l μ2 4

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

Fig. 5.11 shows the experimental and calculated curves for l ¼ 1 meter, L ¼ 0.3 meter. The differences in the graphs are most likely to be determined by the rough approximation (5.72) and the neglect of the bending stiffness (flexural rigidity) of the thread.

Fig. 5.11 Experimental curve (dashed line) and calculated curve with a 10% error (solid line) for a free-hanging thread.

We will refine the simple model constructed. Due to symmetry, we restrict ourselves to the right half of the line. To account for flexural rigidity, we use the equation EJ

d2 θ ds2

¼ ðB  qsÞ cos θ + Asin θ

(5.76)

219

220

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

with boundary conditions: θ(l/2) ¼ 0, θ(l) ¼ 0 and system of equations (5.69). Here E is Young’s modulus of filament material, J is a moment of inertia of the cross-section. Substituting the above parameter values B and q, we give d 2 θ mg A ð0:5  s=l Þcos θ + sin θ. Assuming Eq. (5.76) to the form 2 ¼ EJ EJ ds that t ¼ 2s/l  1. In this case, Eq. (5.76) takes the form d2 θ dt 2

¼ a t cos θ + b sin θ:

(5.77)

Constants—parameters a and b—are unknown and meant to be determined by experimental data. Eq. (5.77) is supplemented with boundary conditions θ(0) ¼ θ(1) ¼ 0. The study of the behavior of the solutions of this equation showed that observable form of the thread could not be obtained by selecting parameters a and b. Further, we use described earlier in this chapter the variants of a multilayer approach for constructing approximate solutions of ordinary differential equations. We tested both explicit and implicit methods. In the first series of methods, the starting point is at the left end of the segment [0; 1]. In this case, Eq. (5.77) is rewritten as a system 8 >
: dw ¼ f ðt, θÞ, dt

(5.78)

f ðt, θÞ ¼ at cos θ + bsin θ:

(5.79)

where

From the left side boundary conditions are in the form: w(0) ¼ w0, θ(0) ¼ 0. We write the implementation of the six options. (1) The first step is done by Euler’s method θ1 ¼ θ0 + hw 0 ¼ hw 0 , w1 ¼ w ð0Þ + hf ð0, 0Þ ¼ w0 : The second step is also done by Euler’s method θ2 ¼ θ1 + hw 1 ¼ 2hw 0 , w2 ¼ w1 + hf ðt1 , θ1 Þ ¼ w0 + hf ðh, hw 0 Þ: In this case h ¼ t/2, from which w2(x) ¼ tw0.

Chapter 5 METHODS FOR CONSTRUCTING MULTILAYER SEMI-EMPIRICAL MODELS

(2) The first step is done by Euler's method as in option (1). The second step is done by the refined Euler method: θ2 = θ0 + 2h w1 = 2h w0, w2 = w0 + 2h f(t1, θ1) = w0 + 2h f(h, h w0), from which θ2(t) = t w0.
(3) We take two steps by Euler's method with h = t/4, so that w2(t) = w0 + (t/4) f(t/4, (t/4) w0). Next, we take the step of the refined Euler method:
θ3(t) = θ0 + 4h w2(t) = t w0 + (t²/4) f(t/4, (t/4) w0) = t w0 + (t²/4) b sin((t/4) w0) − a (t³/16) cos((t/4) w0).
(4) We realize the second option with h = t/4, from which w2(t) = w0 + (t/2) f(t/4, (t/4) w0). Next, we take the step of the refined Euler method:
θ3(t) = θ0 + 4h w2(t) = t w0 + (t²/2) f(t/4, (t/4) w0) = t w0 + (t²/2) b sin((t/4) w0) − a (t³/8) cos((t/4) w0).
(5) The first step is done by the corrected Euler method:
θ1 = θ0 + h w0 + (f(0, 0)/2) h² = h w0,
w1 = w(0) + h f(0, 0) + ((f′t(0, 0) + f′θ(0, 0) w(0))/2) h² = w0 + ((b w0 − a)/2) h².
The second step is carried out by the refined Euler method: θ2 = θ0 + 2h w1 = 2h w0 + (b w0 − a) h³. In this case h = t/2, from which θ2(t) = t w0 + ((b w0 − a)/8) t³. Taking into account that θ2(1) = 0, we get w0 = a/(8 + b), from which θ2(t) = (a/(8 + b)) (t − t³).
(6) The first step is done by the corrected Euler method. The second step is done by the Störmer method: θ2 = 2θ1 − θ0 + h² f(h, θ1) = 2h w0 + h² f(h, h w0).


In this case h = t/2, so that θ2(t) = t w0 + (t²/4) f(t/2, (t/2) w0) = t w0 + (t²/4) (b sin((t/2) w0) − a (t/2) cos((t/2) w0)).
In the second series of methods, the starting point is at the right end of the segment [0; 1]. We make the substitution τ = 1 − t and again reduce Eq. (5.77) to the system

dθ/dτ = w,  dw/dτ = f(1 − τ, θ) = f1(τ, θ),

where f1(τ, θ) = −a (1 − τ) cos θ + b sin θ. The boundary conditions have the form w(0) = w4, θ(0) = 0. We implement the same six options.
(1) The first step is done by Euler's method: θ5 = θ4 + h w4 = h w4, w5 = w4 + h f1(0, 0). The second step is done by Euler's method: θ6 = θ5 + h w5 = 2h w4 + h² f1(0, 0), w6 = w5 + h f1(h, θ5) = w4 + h f1(0, 0) + h f1(h, h w4). In this case h = τ/2, from which
θ6(t) = τ w4 + (τ²/4) f1(0, 0) = (1 − t) w4 + ((1 − t)²/4) f(1, 0) = (1 − t) w4 − a (1 − t)²/4.
Since θ6(0) = 0, we get w4 = a/4 and θ6(t) = a t (1 − t)/4.
(2) The first step is done by Euler's method as in option (1). The second step is done by the refined Euler method: θ6 = θ4 + 2h w5 = 2h w4 + 2h² f1(0, 0), w6 = w4 + 2h f1(h, θ5) = w4 + 2h f1(h, h w4), from which θ6(t) = τ w4 + (τ²/2) f1(0, 0) = (1 − t) w4 + ((1 − t)²/2) f(1, 0) = (1 − t) w4 − a (1 − t)²/2.
(3) We take two steps by Euler's method with h = τ/4, from which
w6 = w4 + (τ/4) f1(0, 0) + (τ/4) f1(τ/4, (τ/4) w4).


Next, we take the step of the refined Euler method:
θ7 = θ4 + 4h w6 = τ w4 + (τ²/4) f1(0, 0) + (τ²/4) f1(τ/4, (τ/4) w4) = (1 − t) w4 − ((1 − t)²/4) (a + a ((3 + t)/4) cos((1 − t) w4/4) − b sin((1 − t) w4/4)).
(4) We implement the second option with h = τ/4, from which w6 = w4 + (τ/2) f1(τ/4, (τ/4) w4). Next, we take the step of the refined Euler method:
θ7 = θ4 + 4h w6 = τ w4 + (τ²/2) f1(τ/4, (τ/4) w4) = (1 − t) w4 − ((1 − t)²/2) (a ((3 + t)/4) cos((1 − t) w4/4) − b sin((1 − t) w4/4)).
(5) The first step is done by the corrected Euler method; the second step is done by the refined Euler method. Considering the condition θ6(0) = 0, we get w4 = 3a/(8 + b), from which
θ6(t) = (1 − t) w4 − a (1 − t)²/2 + ((1 − t)³/8) (a + b w4) = (a t (1 − t)/(2(8 + b))) ((4 − b) + (2 + b) t).
(6) The first step is done by the corrected Euler method. The second step is done by the Störmer method. In this case
θ6 = (1 − t) w4 − a (1 − t)²/4 + ((1 − t)²/4) (b sin((1 − t) w4/2 − a (1 − t)²/8) − a ((1 + t)/2) cos((1 − t) w4/2 − a (1 − t)²/8)).
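The algebra above is easy to reproduce with a computer algebra system. The following minimal sketch (our illustration, not code from the book) repeats option (5) of the first series symbolically: a corrected Euler first step, a refined Euler second step with h = t/2, and the condition θ2(1) = 0 that fixes w0.

```python
# Symbolic check of option (5), first series; f(t, θ) = -a t cos θ + b sin θ,
# so f(0,0) = 0, f't(0,0) = -a, f'θ(0,0) = b.
import sympy as sp

t, a, b, w0 = sp.symbols('t a b w0')
h = t / 2
theta1 = h * w0                                  # θ1 = θ0 + h w0 + f(0,0) h²/2
w1 = w0 + (b * w0 - a) / 2 * h**2                # corrected Euler step for w
theta2 = sp.expand(2 * h * w1)                   # refined Euler: θ2 = θ0 + 2 h w1
w0_val = sp.solve(sp.Eq(theta2.subs(t, 1), 0), w0)[0]   # w0 = a/(8 + b)
print(sp.simplify(theta2.subs(w0, w0_val)))      # a*(t - t**3)/(b + 8), up to form
```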

The application of explicit methods to this problem did not lead to a solution corresponding to the experimental data with acceptable accuracy. By increasing the number of steps, it is possible to obtain approximate solutions that correspond to Eq. (5.77) with arbitrary accuracy. At the same time, the agreement with the experimental data does not improve, because Eq. (5.77) describes the thread inaccurately. A more exact match is obtained when we use implicit methods.


Let us apply the implicit Euler method with one step to the construction of the approximate model of the thread. The implicit one-step Euler method for the system (5.78) has the form θ = θ0 + t w, w = w0 − a t² cos θ + b t sin θ. As a result, we get θ = t w0 − a t³ cos θ + b t² sin θ. The boundary condition θ(1) = 0 leads to w0 = a, whence we get the equation for the angle

θ = a t (1 − t² cos θ) + b t² sin θ.   (5.80)

The equations for the coordinates of the line are obtained by integrating Eq. (5.69) by the Simpson method:

x(s, a, b) = (s/(6M)) (1 + cos θ(s, a, b) + 4 Σ_{i=1..M} cos θ((s/(2M)) (2i − 1), a, b) + 2 Σ_{i=1..M−1} cos θ((s/M) i, a, b)),

y(s, a, b) = (s/(6M)) (sin θ(s, a, b) + 4 Σ_{i=1..M} sin θ((s/(2M)) (2i − 1), a, b) + 2 Σ_{i=1..M−1} sin θ((s/M) i, a, b)).

Identification of the parameters a and b is carried out by minimizing the error functional

J = Σ_{i=1..M} (x(si, a, b) − xi)² + Σ_{i=1..M} (y(si, a, b) − yi)²,   (5.81)

which includes the observational data {(xi, yi)}, i = 1, …, M. For this, we use an approximate solution of Eq. (5.80) constructed in one of three ways.
In the first method, we use the approximate equalities cos θ ≈ 1, sin θ ≈ θ. From Eq. (5.80) we then get

θ1(t, a, b) = a t (1 − t²)/(1 − b t²).

In this case, despite the general fit of the curve to the observed data, the errors remain too large.
In the second method, we use the approximate equalities cos θ ≈ 1 − θ²/2, sin θ ≈ θ. From Eq. (5.80) we then get

θ2(t, a, b) = 2 θ1(t, a, b)/(1 + √(1 + θ1²(t, a, b))).

In this case, the agreement with the experiment is much better; the relative root-mean-square (RMS) error was 0.05.
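The identification procedure described above is straightforward to prototype. The sketch below (our illustration under stated assumptions, not the authors' code) uses the second approximation θ2(t, a, b), the composite Simpson rule for the coordinates, and a least squares solver for the functional (5.81); the measurement arrays are hypothetical placeholders, and the normalized arc coordinate on [0; 1] is identified with t.

```python
# A sketch of identifying a and b from measured thread points (x_i, y_i):
# θ2(t, a, b) from Eq. (5.80), composite Simpson integration for x(s), y(s),
# and least squares for the functional (5.81). Data arrays are placeholders.
import numpy as np
from scipy.optimize import least_squares

M = 20  # number of Simpson subinterval pairs, as in the formulas above

def theta2(t, a, b):
    # second approximation of Eq. (5.80): cos θ ≈ 1 - θ²/2, sin θ ≈ θ
    th1 = a * t * (1.0 - t**2) / (1.0 - b * t**2)
    return 2.0 * th1 / (1.0 + np.sqrt(1.0 + th1**2))

def xy(s, a, b):
    # composite Simpson rule for x(s) = ∫ cos θ dσ, y(s) = ∫ sin θ dσ over [0, s]
    ends = theta2(np.array([0.0, s]), a, b)
    mids = theta2(s * (2 * np.arange(1, M + 1) - 1) / (2 * M), a, b)
    ints = theta2(s * np.arange(1, M) / M, a, b)
    w = s / (6.0 * M)
    x = w * (np.cos(ends).sum() + 4 * np.cos(mids).sum() + 2 * np.cos(ints).sum())
    y = w * (np.sin(ends).sum() + 4 * np.sin(mids).sum() + 2 * np.sin(ints).sum())
    return x, y

def residuals(p, s_data, x_data, y_data):
    a, b = p
    pts = np.array([xy(s, a, b) for s in s_data])
    return np.concatenate([pts[:, 0] - x_data, pts[:, 1] - y_data])

# hypothetical measured points, arc coordinate normalized to [0, 1]
s_data = np.linspace(0.1, 1.0, 10)
x_data = np.zeros_like(s_data)  # placeholders: substitute real measurements here
y_data = np.zeros_like(s_data)

fit = least_squares(residuals, x0=[1.0, 0.1], args=(s_data, x_data, y_data))
a_hat, b_hat = fit.x
```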


The third method consists of two stages. At the first stage, Eq. (5.80) is solved using a neural network. For doing so, it is rewritten as θ = α cos θ + β sin θ, and the neural network θ3(α, β) is selected by minimizing the sequence of functionals

Σ_{i=1..m1} (θ3(αi, βi) − αi cos θ3(αi, βi) − βi sin θ3(αi, βi))².
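A minimal sketch of this first stage is given below (our illustration, not the authors' code): a single-hidden-layer tanh network trained on the residual of the rewritten equation, with the test points regenerated every few optimization steps as described next. The bias term c0, the number of points per round, and the use of a generic BFGS optimizer are assumptions.

```python
# A sketch of the first stage of the third method: a small tanh network
# θ3(α, β) trained on the residual of θ = α cos θ + β sin θ, with test points
# regenerated during minimization. Network details and optimizer are assumptions.
import numpy as np
from scipy.optimize import minimize

n_neurons = 6
rng = np.random.default_rng(0)

def theta3(params, alpha, beta):
    # single-hidden-layer network with tanh activations (bias c0 assumed)
    c0 = params[0]
    c, w1, w2, d = params[1:].reshape(4, n_neurons)
    return c0 + np.tanh(np.outer(alpha, w1) + np.outer(beta, w2) + d) @ c

def residual_loss(params, alpha, beta):
    th = theta3(params, alpha, beta)
    return np.sum((th - alpha * np.cos(th) - beta * np.sin(th)) ** 2)

params = 0.1 * rng.standard_normal(1 + 4 * n_neurons)
for _ in range(200):
    # regenerate test points uniformly on [0, 10] x [0, 1]
    alpha = rng.uniform(0.0, 10.0, 50)
    beta = rng.uniform(0.0, 1.0, 50)
    res = minimize(residual_loss, params, args=(alpha, beta),
                   method="BFGS", options={"maxiter": 5})
    params = res.x
```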

The test points are regenerated after several steps of the minimization process (in the experiment, every five steps) as uniformly distributed on a set that we chose as [0; 10] × [0; 1]. At the second stage, the parameters a and b are selected by minimizing the functional (5.81). The best result at the second stage was obtained when, at the first stage, we used a neural network of six neurons with activation functions of the form tanh(x). In this case, the agreement with the experimental results is even better: the relative RMS error was 0.02.
The analysis of the solution of this problem allows us to draw the following conclusions. The construction of a mathematical model of an object or process is usually associated with an approximate solution of differential equations with inaccurately given coefficients using heterogeneous data, which may also include experimental data. In a good situation (when the equations are chosen successfully), the refinement of the solution of the problem leads both to a decrease in the error in satisfying the equation and to a decrease in the error in satisfying the measurement data. However, with an unsuccessful choice of equations, it is still possible to improve the model by selecting parameters tuned to fit the experimental data. This improvement was also observed in our model problem. An increase in the number of neurons makes it possible to increase the accuracy of the solution of Eq. (5.80), which, apparently, does not give the best description of the thread sagging. At the same time, the agreement with the experimental data deteriorates. This result is due both to the approximate correspondence of Eq. (5.78) to the modeled thread and to the inaccurate transition from Eq. (5.78) to Eq. (5.80), which is caused by the insufficient number of layers of the multilayer approach. The expediency of increasing the number of layers is doubtful because of the inaccuracy of Eq. (5.78). Refining both the thread sag equation itself and the numerical method used in passing from Eq. (5.78) to Eq. (5.80) would lead to a more accurate mathematical model.


5.4.2 Simulation of the load deflection of the real membrane

In this section, using the example of a specific problem of the deflection of a loaded membrane, we show that with our methods, described in Sections 5.1–5.3, we can obtain a more accurate model than the one given by the exact solution of the differential equation. For this work, we took measurements of the deflection of a round loaded membrane of radius R = 0.5 meter made of "Oxford 600" fabric, fixed at sixteen points on a round table. Loads of circular shape and different weights were placed in turn at the center of the membrane. The deflection was determined at seven points located at different distances from the center of the membrane along several radii, with the results averaged over points at the same distance from the center. The distance was calculated in the projection of the points onto the plane of the unloaded membrane. The membrane is assumed to be weightless. The area of contact of the load with the membrane was a circle, whose radius is further denoted by a; the tension is assumed to be isotropic (the same in all directions).
This section compares the exact solution of the differential equation, the approximate solution obtained by our modification of the two-step Euler method, and the approximate neural network solution constructed, using the methods outlined in Chapters 1–3, for consistency with the experimental data. For each of the first two approaches, we estimate two natural parameters (they are listed below) from the experimental data using the least squares method. For the neural network solution, we separate the sample into a training part and a test part. This separation is caused by the larger number of weights of the neural network compared with the natural parameters selected by the two other methods. We conduct the neural network learning process until the standard (RMS) error on the test part of the sample begins to grow.
Let u(r) be the deviation of the membrane from the equilibrium position depending on the distance from the center of the membrane. We will use the nonhomogeneous equation to describe the deflection:

u″rr + (1/r) u′r = f(r),   (5.82)


whose left-hand side is the Laplacian in polar coordinates applied to a function u(r, φ) = u(r); the sought function u depends only on the distance r of the point from the center of the membrane. Here

f(r) = −B for r ∈ [0; a],  f(r) = 0 for r ∈ (a; R],

B = A/T, where A is the weight of the load and T is the magnitude of the tensile force applied to the edge of the membrane. For further comparison with the approximate solution, we write down the exact solution of Eq. (5.82):

u(r) = −(1/2) B a² ln(a/R) + u0 − (1/4) B (r² − a²) for r ∈ [0; a],
u(r) = −(1/2) B a² ln(r/R) + u0 for r ∈ (a; R].   (5.83)

Here u0 = u(R). The solution (5.83) is obtained taking into account the continuity of u(r) at the point r = a and the boundedness of u(r) at the center r = 0. The parameter B is selected using the least squares method so as to minimize the value Σ_{i=1..10} (u(ri) − ui)². Here ri are the values of the radius r at which we measured the deflection, ui are the results of the corresponding measurements, and u(ri) are the values of the function u(r) found by formula (5.83). Having found the value B, we also know the corresponding value z0 = u′(R). Thus, knowing the load weight from the experiment and determining the value B, we find u(r).
To apply the methods outlined earlier in this chapter, we reduce Eq. (5.82) to a normal system of differential equations:

u′ = z,  z′ = −z/r + f(r).   (5.84)

Here f(r) is the right-hand side of Eq. (5.82). For r ∈ (a; R], solving the system (5.84) by the two-step Euler method from r = R, we get

u(r) = u0 − (R − r) z0 − ((R − r)²/(4R)) z0.   (5.85)

The value u0, as before, is taken from the experiment. Let us solve the system (5.84) for r ∈ [0; a] using the same method, considering the value of the deflection ũ0 at r = 0 unknown and setting the value of the derivative u′ to zero at r = 0. We then get u(r) = ũ0 − r² B/4. From the continuity of u and its derivative at the point r = a, we find the expressions of the parameters ũ0 and B through the value of z0.


The value of z0 is then determined using the least squares method by minimizing the error Σ_{i=1..10} (u(ri) − ui)². Now all the parameters of the approximate solution are found, and we can compare it with the exact solution. In addition, the methods of constructing a neural network model from a differential equation and experimental data described in Chapters 1–3 were applied.
We compared the exact and approximate solutions for three loads of different masses. The mass of the first load is 500 grams and its radius is 4 cm; the mass of the second is 1000 grams and its radius is 6 cm; the mass of the third load is 2000 grams and its radius is 6 cm. In all three cases, the exact solution deviates much more from the experiment than the solution obtained with the help of the approximate model. For the load of 500 grams, the maximum deviation of the exact solution (5.83) from the experimental data is 15 mm, while the maximum deviation of the approximate solution (5.85) is 5 mm. For the load of 1000 grams, the maximum deviation of the exact solution (5.83) is 14 mm, and that of the approximate solution (5.85) is 2 mm. For the load of 2000 grams, the maximum deviation of the exact solution (5.83) is 13 mm, and that of the approximate solution (5.85) is 1.5 mm.
As a third way to solve this problem, we built a neural network with one hidden layer and hyperbolic tangent activation functions, u(r) = c0 + Σ_{i=1..n} ci tanh(αi r + bi). The neural network was trained by minimizing the error functional Σ_{i=1..m} (u(ri) − ui)² with the cloud method (see Chapter 3). Here m is the number of points in the training sample. Table 5.13 gives the maximum errors for the training and test samples in millimeters.

Table 5.13 Maximum errors (mm) of neural networks with different numbers of neurons for the training and test samples.

Number of neurons | 500 g: training | 500 g: test | 1000 g: training | 1000 g: test | 2000 g: training | 2000 g: test
1 | 0.728  | 3.81  | 0.720  | 1.01  | 0.591  | 0.701
2 | 0.176  | 3.49  | 0.284  | 0.755 | 0.453  | 0.645
3 | 0.210  | 3.36  | 0.0627 | 0.238 | 0.0325 | 0.728
4 | 0.0456 | 0.734 | 0.270  | 0.432 | 0.0456 | 0.734


From the above table, we see that for the load of 500 grams, an adequate representation of the experimental data requires at least four neurons. For the load of 1000 grams, the neural network with three neurons is optimal. For the load of 2000 grams, the network with two neurons has the smallest test error, since a further increase in the number of neurons leads to an increase in the error. Thus, the neural network model reflects the experimental data more accurately but requires the selection of a larger number of parameters, which reduces its potential prognostic capabilities.
The above simulation results for the membrane deflection allow us to draw the following conclusions. For all three values of the mass of the load, the exact solution deviates more strongly from the experimental results than the approximate one. In this case, for both the approximate and the exact solutions, the same parameter (the derivative of the deflection at the edge of the membrane) is selected from the experimental data, whence the magnitude of the tensile force is determined. The lack of accuracy of the solution (5.83) suggests that the model (5.82) requires refinement. Such refinement can be achieved by refining the physical model of the membrane. The most obvious way is to take into account the weight of the membrane itself; however, it is doubtful that this weight could explain the large deviation of the experimental results from formula (5.83), since the weight of the membrane is small compared to the weight of the loads.
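The parameter selection just described can be prototyped as follows. This is a minimal sketch (our illustration, not the authors' code): it assumes hypothetical measurement arrays and the sign conventions of formulas (5.83) and (5.85) as written above, fits B for the exact solution and z0 = u′(R) for the approximate one by least squares, and prints the maximum deviations.

```python
# A sketch of the comparison described in this section: fit B in the exact
# solution (5.83) and z0 = u'(R) in the approximate model (5.85) by least squares.
# Measurement arrays and numerical values are hypothetical placeholders.
import numpy as np
from scipy.optimize import least_squares

R, a = 0.5, 0.06      # membrane radius and load radius, m
u0 = 0.0              # deflection at the edge, taken from the experiment
r_data = np.linspace(0.05, 0.45, 10)   # hypothetical measurement radii
u_data = np.zeros_like(r_data)         # placeholders for measured deflections

def u_exact(r, B):
    # exact solution (5.83) of Eq. (5.82)
    inner = -0.5 * B * a**2 * np.log(a / R) + u0 - 0.25 * B * (r**2 - a**2)
    outer = -0.5 * B * a**2 * np.log(r / R) + u0
    return np.where(r <= a, inner, outer)

def u_approx(r, z0):
    # two-step Euler model: (5.85) outside the load, u~0 - r²B/4 inside,
    # with B and u~0 expressed through z0 by continuity of u and u' at r = a
    outer = u0 - (R - r) * z0 - (R - r)**2 / (4 * R) * z0
    B = -2 * z0 / a * (1 + (R - a) / (2 * R))
    u_a = u0 - (R - a) * z0 - (R - a)**2 / (4 * R) * z0
    u_tilde0 = u_a + a**2 * B / 4
    inner = u_tilde0 - r**2 * B / 4
    return np.where(r <= a, inner, outer)

B_fit = least_squares(lambda p: u_exact(r_data, p[0]) - u_data, x0=[1.0]).x[0]
z0_fit = least_squares(lambda p: u_approx(r_data, p[0]) - u_data, x0=[-0.1]).x[0]
print(np.max(np.abs(u_exact(r_data, B_fit) - u_data)),
      np.max(np.abs(u_approx(r_data, z0_fit) - u_data)))
```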

5.4.3 Semi-empirical models of nonlinear bending of a cantilever beam

This section presents the results for the third applied problem, in which our approximate methods gave a more accurate simulation result than the exact solution of the original model equation. The measurements were performed on the following experimental setup. A straight aluminum alloy rod, 940 mm long, of circular cross-section with a diameter of 8 mm and a weight of 126 grams was taken. One end of the rod was clamped in a vice, and loads of 100 grams, 200 grams, and so on, up to 1900 grams, were attached in turn to the other end. The rod was photographed after attaching and after removing each load. Behind the experimental setup, at a distance of 100 mm, there was a vertical screen with a millimeter grid. The horizontal and vertical lines of the grid were checked with a bubble level.


The error of the grid alignment did not exceed 3 mm per 1000 mm of length. In front of the experimental setup, at a distance of 1500 mm horizontally, there was a camera connected to a computer. The middle of the unloaded rod was in the center of the image. The Wolfram Alpha program was used to recalculate the coordinates from the pixel space to the millimeter grid space to compensate for optical distortion. Besides, to compensate for most of the systematic error of the measuring system, two calibration functions depending on the two coordinates of the pixel space were constructed from a sufficiently large number of points. The total measurement error did not exceed 5 mm.
To derive an approximate differential model, we assume the rod is infinitely thin, uniform, linearly elastic, and straight in the initial position. These assumptions are not fully supported by the measurements, but our methods allow us to construct, on this basis, mathematical models whose error does not exceed the error of the original data. As the differential equation, we used the equation of large static deflection of such a rod under the action of distributed and concentrated forces, projected on the tangent to the large-deflection line [24]:

d²θ/dz² = a (μi + z) cos θ,   (5.86)

where a = mgL²/D, μi = mi/m, D is the constant flexural rigidity, L is the rod length, θ is the angle of inclination of the tangent to the curve describing the rod, z = 1 − s/L, where s is the natural coordinate of the curved axis of the rod measured from the clamped end, m is the rod mass, and mi is the load mass. In the performed experiment, the distributed and concentrated forces were the weights of the rod and of the load at its end. The boundary conditions in this formulation are dθ/dz|z=0 = 0, θ|z=1 = θ1. However, experiments have shown that the angle θ1 is not constant; i.e., with the addition of a load, the rod turns slightly in the clamp. Thus, a single boundary condition is an unknown parameter and must be determined from the experimental data. As such a parameter, it is more convenient for us to use θ|z=0 = θ0, solving not a boundary value problem but a Cauchy problem with the unknown value θ0. The angle θ is related to the coordinates of points on the rod by the equalities

dx/ds = cos θ,  dy/ds = sin θ.   (5.87)


Eq. (5.86) describes the process of bending of the rod with a large error: the rod has a yield zone near the clamped end, and the equation has both geometric and physical inaccuracies. This statement was confirmed by numerical experiments. To build a spectrum of models, we move from the system of Eqs. (5.86), (5.87) to its approximate parametric solutions x(s, θ0, a) and y(s, θ0, a), using the methods discussed earlier in this chapter. From the angle θ(z, θ0, a) we go back to the original Cartesian coordinates, integrating Eq. (5.87) by the Simpson formula over an interval of variable length. The parameters θ0, a are found via the least squares method by minimizing the error functional

Σ_{i=1..N} ((x(si, θ0, a) − xi)² + (y(si, θ0, a) − yi)²).   (5.88)

Here N is the number of points where measurements were taken, and {(xi, yi)} are the measured coordinates of points on the rod located at distances si from the fixed end. In this case, the coordinates si are not known in advance. In order to find them, the length of the rod is divided into 100 parts, and as si we take the value corresponding to the minimum of the expression (x(si, θ0, a) − xi)² + (y(si, θ0, a) − yi)², i.e., the point (x(s, θ0, a), y(s, θ0, a)) of the curve located at the minimum distance from the experimental point (xi, yi). Because the numbers si are not known in advance, the minimization of the functional (5.88) was carried out using the following variant of the random search algorithm (a code sketch is given after the list):
1. We fix the parameters of the algorithm: the initial step η0, γ1 > 1, γ2 < 1, β < 1, the type of distribution of the random vectors, etc.
2. We select the initial values of the parameters w0 and p0 = q0.
3. Making the next step w_{k+1} = w_k + ηk p_k with the step size ηk, we check whether it is successful, i.e., whether the error decreases after it.
4. If the error decreases, then as the new direction we take p_{k+1} = β p_k + q_k, where q_k is a new random vector, and set the step size η_{k+1} = γ1 ηk.
5. If the error increases, then as the new direction we take p_{k+1} = −p_k with the same step size.
6. If the error increases in this case as well, then we take as the new direction p_{k+1} = q_k, where q_k is a new random vector, and set the step size η_{k+1} = γ2 ηk.
7. We repeat steps 3–6 the required number of times.
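The following sketch implements the variant of random search listed above (our illustration, not the authors' code). One detail left open in the description, whether unsuccessful trial steps are reverted, is resolved here by accepting only improving steps; the toy loss in the usage line stands in for the functional (5.88).

```python
# A sketch of the random search variant listed above. The error functional (5.88)
# is represented by a generic callable `loss`; the toy quadratic at the end is a
# stand-in. Unsuccessful trial steps are simply not accepted (an assumption).
import numpy as np

def random_search(loss, w0, eta0=1e-4, gamma1=2.0, gamma2=0.9, beta=0.9,
                  n_steps=10000, seed=0):
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    new_vec = lambda: w * rng.uniform(-1.0, 1.0, size=w.shape)  # random vector q_k
    p = new_vec()
    eta = eta0
    best = loss(w)
    for _ in range(n_steps):
        trial = w + eta * p
        f = loss(trial)
        if f < best:              # step 4: success, mix in a new random direction
            w, best = trial, f
            p = beta * p + new_vec()
            eta *= gamma1
            continue
        p = -p                    # step 5: try the opposite direction, same step
        trial = w + eta * p
        f = loss(trial)
        if f < best:
            w, best = trial, f
        else:                     # step 6: both directions failed
            p = new_vec()
            eta *= gamma2
    return w

# toy usage with a quadratic loss in place of the functional (5.88)
w_opt = random_search(lambda w: np.sum((w - 1.0) ** 2), w0=[0.5, 0.5, 0.5])
```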


For the implementation of the algorithm, we chose the parameters η0 = 0.0001, γ1 = 2, γ2 = 0.9, β = 0.9. As the random vector qk, we took a vector of the same dimension as the vector of required parameters (θ0, a, L); each of its components was obtained by multiplying the current approximation of the corresponding parameter by a random variable uniformly distributed over the interval [−1; 1]. The initial values of the parameters were chosen as a = 0.055, L = 900 mm, and the initial value of θ0 was determined from the two experimental points closest to the free end. As a result, we obtain the dependencies x(s, θ0, a) and y(s, θ0, a). The parameters θ0, a, as mentioned above, are found by minimizing expression (5.88). We present the results of calculations for four variants of this approach.
The first option is to apply to Eq. (5.86) our modification of the Störmer method mentioned above. As a result, we obtain an approximate solution of the form

θ1(z) ≈ θ0 + (z²/4) a (μi + z) cos(θ0 + (z²/16) a (μi + z)),   (5.89)

where θ0 = θ(0) is the angle at the end of the rod. Substituting Eq. (5.89) into Simpson's formulas allows us to obtain the dependencies x1(s, θ0, a) and y1(s, θ0, a). Formally, there are no restrictions on the parameters θ0 and a. The accuracy of this solution is higher, the smaller the parameter a is; however, because of the approximate nature of Eq. (5.86) we are interested not in the smallness of the error of solving this equation but in the accuracy of matching the measurement data.
The second and third options are to use the implicit Euler method with one step and with two steps, respectively. In the second variant, we obtain the approximate equality θ(z) ≈ θ0 + z² a (μi + z) cos(θ(z)), from which we get the approximate solution

θ2(z) ≈ 2 (θ0 + z² a (μi + z)) / (1 + √(1 + 2 z² a (μi + z) (θ0 + z² a (μi + z)))).   (5.90)

Substituting Eq. (5.90) into Simpson's formulas allows obtaining the dependencies x2(s, θ0, a) and y2(s, θ0, a). Here, we do not give these dependencies because of the cumbersomeness of the corresponding formulas. In the third variant, we get the system of two equations

θ̃(z) ≈ θ0 + (z²/4) a (μi + z) cos θ̃(z),
θ(z) ≈ 2 θ̃(z) − θ0 + (z²/4) a (μi + z) cos θ(z),


from which we get the approximate solution

θ3(z) ≈ 2 (2 θ̃2(z) − θ0 + 0.25 z² a (μi + z)) / (1 + √(1 + 0.5 z² a (μi + z) (2 θ̃2(z) − θ0 + 0.25 z² a (μi + z)))),   (5.91)

where

θ̃2(z) = 2 (θ0 + 0.25 z² a (μi + z)) / (1 + √(1 + 0.5 z² a (μi + z) (θ0 + 0.25 z² a (μi + z)))).

Substituting (5.91) into Simpson's formulas allows obtaining the dependencies x3(s, θ0, a) and y3(s, θ0, a). Here, we also do not give these dependencies because of the cumbersomeness of the corresponding formulas.
The fourth option is to apply our modification of the trapezoid method. As a result, we obtain the approximate equality θ4(z) ≈ θ0 + 0.25 z² a (μi cos θ0 + (μi + z) cos(θ4(z))), from which we get the approximate solution

θ4(z) ≈ θ0 + z² a (2μi + z) cos θ0 / (4 − z² a (μi + z) sin θ0).   (5.92)
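As an illustration of how any of these formulas is turned into a curve for comparison with the measured points, here is a minimal sketch (our illustration, not the authors' code) based on formula (5.90) and composite Simpson integration of Eq. (5.87); the numerical values are placeholders.

```python
# A sketch of building the curve (x2(s), y2(s)) from formula (5.90) and composite
# Simpson integration of Eq. (5.87). The rod length and other values are placeholders.
import numpy as np

L = 0.94  # rod length, m

def theta2(z, theta0, a, mu_i):
    # approximate solution (5.90) of Eq. (5.86)
    c = z**2 * a * (mu_i + z)
    q = theta0 + c
    return 2.0 * q / (1.0 + np.sqrt(1.0 + 2.0 * c * q))

def rod_curve(s, theta0, a, mu_i, M=50):
    # coordinates (x, y) of the point with arc length s, composite Simpson rule
    nodes = np.linspace(0.0, s, 2 * M + 1)
    th = theta2(1.0 - nodes / L, theta0, a, mu_i)   # z = 1 - s/L
    w = np.ones(2 * M + 1)
    w[1:-1:2] = 4.0
    w[2:-1:2] = 2.0
    h = s / (2 * M)
    return (h / 3.0) * (w * np.cos(th)).sum(), (h / 3.0) * (w * np.sin(th)).sum()
```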

We present the results of calculations for three values of the mass of the load: m1 = 0, m2 = 800 grams, and m3 = 1900 grams. For m1 = 0 grams, the standard deviation of the measurement results from the theoretical curve {x1(s, θ0, a), y1(s, θ0, a)} is 2.30 mm; from the theoretical curves {x2(s, θ0, a), y2(s, θ0, a)} and {x3(s, θ0, a), y3(s, θ0, a)}, 2.25 mm; from the theoretical curve {x4(s, θ0, a), y4(s, θ0, a)}, 2.26 mm. These errors are somewhat larger than the measurement error because of the initial curvature of the rod. For m2 = 800 grams, the standard deviation of the measurement results from the theoretical curve {x1(s, θ0, a), y1(s, θ0, a)} is 2.85 mm; from {x2(s, θ0, a), y2(s, θ0, a)}, 2.29 mm; from {x3(s, θ0, a), y3(s, θ0, a)}, 2.47 mm; from {x4(s, θ0, a), y4(s, θ0, a)}, 2.30 mm. These errors are within the measurement error. For m3 = 1900 grams, the standard deviation of the measurement results from the theoretical curve {x1(s, θ0, a), y1(s, θ0, a)} is 4.35 mm; from {x2(s, θ0, a), y2(s, θ0, a)}, 2.46 mm; from {x3(s, θ0, a), y3(s, θ0, a)}, 2.52 mm.


The standard deviation of the measurement results from the theoretical curve {x4(s, θ0, a), y4(s, θ0, a)} is 2.73 mm. These errors are within the measurement error. Note that we obtained similar results for the other values of the mass of the load.
We note in particular that an increase in the accuracy with which an approximate formula matches the differential equation (5.86) does not lead to greater accuracy in matching the experimental data. On the contrary, formula (5.90), which has the lowest accuracy with respect to the differential equation, reflects the experimental data most accurately.
It is interesting to estimate the possibility of predicting the deflection of the rod depending on the mass of the load. To do this, we studied the dependence of the parameters θ0, a, and L on the variable μi. The results of the calculations showed that the parameters a and L remain practically unchanged over almost the entire range of load masses, while the angle θ0 varies linearly. The exceptions are the two smallest values of the load mass for the parameter a (which is caused by the initial bending of the rod) and the outlier in the value of L for the load mass m2 = 800 grams, which is a consequence of measurement error. For the load mass of 1600 grams, a comparison of the experimental data with the results of calculations by formula (5.90), with a and L taken from the dependence constructed for the load mass of 200 grams and θ0 calculated from the linear dependence constructed from the first two points (for masses of 0 grams and 100 grams), shows that the discrepancy between the theoretical curve and the experimental data is within the measurement error. We obtain similar results for the other formulas and values of the load mass.
The methods presented above allow us to construct simple but fairly accurate models from differential equations and experimental data. We have by no means exhausted the possibilities for refining the model. If necessary, we can increase the number of steps of the iterative formula used. There are other possibilities; for example, when using the implicit Euler method with two steps, the steps themselves can be made unequal. This version leads to the appearance in formula (5.91) of an additional parameter that can be selected from the measurement data along with θ0 and a. Besides, it is possible to replace some of the numerical coefficients in formulas (5.89)–(5.92) with adjustable parameters, which can reflect the experimental data even more accurately.


Quite typical in practice is the situation when the results of observations of a real object contradict the mathematical model obtained by attempting to apply known physical laws. In this situation, researchers often seek to refine the physical model of the object and obtain differential equations that reflect the processes occurring in it more accurately. In particular, they try to refine the coefficients of the equations. To solve such problems, there are several approaches, one of which is to use neural networks, as is done in Chapters 1–3. Sometimes no selection of parameters allows the experimental data to be reproduced with acceptable accuracy; then the structure of the model has to be changed, which can lead to a dramatic complication of the differential equations, and it is not always possible to build an adequate mathematical model in a reasonable time. These difficulties often lead to the object model being built empirically, using interpolation of the experimental data. Our method allows us to apply an intermediate approach, which consists in obtaining approximate semi-empirical formulas based on an inaccurate differential model and the measurement results. The well-known theorems on the error of numerical methods allow us to state that we can obtain an arbitrarily accurate approximation to the solution of a differential equation by using a partition into a sufficiently large number of intervals. Without using interpolation, our approach allows us to obtain formulas that can be refined using experimental data.
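For reference, the classical global error bound for the explicit Euler method mentioned here has the following form (see, e.g., [15]); the notation below is ours: h is the step size, L a Lipschitz constant of the right-hand side, and M a bound on the second derivative of the solution:

‖y(tn) − yn‖ ≤ (M/(2L)) (e^{L(tn − t0)} − 1) h,

so the global error decreases in proportion to h as the number of intervals grows, and analogous bounds of order h^p hold for one-step methods of order p.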

References
[1] T. Lazovskaya, D. Tarkhov, Multilayer neural network models based on grid methods, IOP Conf. Ser. Mater. Sci. Eng. 158 (2016). http://iopscience.iop.org/article/10.1088/1757-899X/158/1/01206.
[2] T.V. Lazovskaya, D.A. Tarkhov, Fresh approaches to the construction of parameterized neural network solutions of a stiff differential equation, St. Petersburg Polytechnical University Journal: Physics and Mathematics (2015). https://doi.org/10.1016/j.spjpm.2015.07.005.
[3] A. Vasilyev, D. Tarkhov, T. Shemyakina, Approximate analytical solutions of ordinary differential equations, in: Selected Papers of the XI International Scientific-Practical Conference Modern Information Technologies and IT-Education (SITITO 2016), Moscow, Russia, November 25–26, 2016, pp. 393–400. http://ceur-ws.org/Vol-1761/paper50.pdf.
[4] J. Schmidhuber, Deep learning in neural networks: an overview, Neural Netw. 61 (2015) 85–117.
[5] L. Deng, D. Yu, Deep learning: methods and applications, Found. Trends Signal Process. 7 (3–4) (2014) 1–199.
[6] Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn. 2 (1) (2009) 1–127.


[7] A. Vasilyev, D. Tarkhov, G. Guschin, Neural networks method in pressure gauge modeling, in: Proceedings of the 10th IMEKO TC7 International Symposium on Advances of Measurement Science, Saint-Petersburg, Russia, vol. 2, 2004, pp. 275–279.
[8] D. Tarkhov, A. Vasilyev, New neural network technique to the numerical solution of mathematical physics problems. I: Simple problems, Opt. Memory Neural Netw. (Inform. Optics) 14 (2005) 59–72.
[9] D. Tarkhov, A. Vasilyev, New neural network technique to the numerical solution of mathematical physics problems. II: Complicated and nonstandard problems, Opt. Memory Neural Netw. (Inform. Optics) 14 (2005) 97–122.
[10] N.U. Kainov, D.A. Tarkhov, T.A. Shemyakina, Application of neural network modeling to identification and prediction problems in ecology data analysis for metallurgy and welding industry, Nonlinear Phenom. Complex Syst. 17 (1) (2014) 57–63.
[11] A. Vasilyev, D. Tarkhov, Mathematical models of complex systems on the basis of artificial neural networks, Nonlinear Phenom. Complex Syst. 17 (2) (2014) 327–335.
[12] T.N. Lazovskaya, D.A. Tarkhov, A.N. Vasilyev, Parametric neural network modeling in engineering, Recent Pat. Eng. 11 (1) (2017) 10–15. http://www.eurekaselect.com/148182/article.
[13] E.M. Budkina, E.B. Kuznetsov, T.V. Lazovskaya, D.A. Tarkhov, T.A. Shemyakina, A.N. Vasilyev, Neural network approach to intricate problems solving for ordinary differential equations, Opt. Memory Neural Netw. 26 (2) (2017) 96–109. https://link.springer.com/article/10.3103/S1060992X17020011.
[14] V. Antonov, D. Tarkhov, A. Vasilyev, Unified approach to constructing the neural network models of real objects. Part 1, Math. Models Methods Appl. Sci. 41 (18) (2018) 9244–9251.
[15] E. Hairer, S.P. Norsett, G. Wanner, Solving Ordinary Differential Equations I: Nonstiff Problems, Springer-Verlag, Berlin, 1987, xiv + 480 pp.
[16] N. Bleistein, R. Handelsman, Asymptotic Expansions of Integrals, Dover, New York, 1975.
[17] E. Hairer, S.P. Norsett, G. Wanner, Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems, Springer-Verlag, Berlin, 1996, 614 pp.
[18] N.W. McLachlan, Theory and Application of Mathieu Functions, Clarendon Press, Oxford, 1947.
[19] D. Tarkhov, E. Shershneva, Approximate analytical solutions of Mathieu's equations based on classical numerical methods, in: Selected Papers of the XI International Scientific-Practical Conference Modern Information Technologies and IT-Education (SITITO 2016), Moscow, Russia, November 25–26, 2016, pp. 356–362. http://ceur-ws.org/Vol-1761/paper46.pdf.
[20] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999, 823 pp.
[21] Y.S. Prostov, Y.V. Tiumentsev, A hysteresis micro ensemble as a basic element of an adaptive neural net, Opt. Memory Neural Netw. (Inform. Optics) 24 (2) (2015) 116–122.
[22] M.V. Egorchev, Yu.V. Tiumentsev, Learning of semi-empirical neural network model of aircraft three-axis rotational motion, Opt. Memory Neural Netw. (Inform. Optics) 24 (3) (2015) 210–217.
[23] D.S. Kozlov, Y.V. Tiumentsev, Learning of semi-empirical neural network model of aircraft three-axis rotational motion, Opt. Memory Neural Netw. (Inform. Optics) 24 (4) (2015) 279–287.
[24] Y.P. Artyukhin, Arbitrary bending of a cantilever beam by a conservative force, Scientific Notes of Kazan University, Physics and Mathematics Series 2 (2013) 144–157.

Index
Note: Page numbers followed by f indicate figures and t indicate tables.

A Activation functions multilayer perceptron, 52–59 radial basis function networks, 60–61 Adams method, 112, 179, 182–184, 183f, 187 Air pollution in tunnel computational experiments, results of, 168–170 statements and functionals, 44–48 Alternating pressure calibrator problem computational experiments, results of, 150–151 statements and functionals, 29, 33–36 Artificial neural networks (ANN), xvi, xviii, xxix–xxxii, 3, 51–52, 60, 106, 160 Asymmetric collocation method, xix Asymmetric radial basis function networks, 67–69

B Back-propagation error algorithm, 180 Basis functions, xxx, xxxiv. See also Radial basis function (RBF) networks Burgers’ equation, xxvi

C Cantilever beam, nonlinear bending of, 229–235 Cauchy function, 61, 63, 70 Cauchy problem, xxxix, 120 cosine function, 183–184

exponential function, 181–182 for wave equation, multilayer methods for, 210–215 Cellular neural networks (CNN), xxi Chemical reactor problem computational experiments, results of, 113–115 statements and functionals, 9–10 Cloud methods, 89, 107 Compensation methods, 127 Cosine function, 183–187

D Deep feedforward artificial neural networks, xxv Deep Galerkin Method (DGM), xxvi Deep learning methods, xxv–xxvi Dense cloud method, 89–90 Differential-algebraic problem computational experiments, results of, 120–126 statements and functionals, 12–13 Direct functioning, 56 Direct operation, of the network, 56 Domain decomposition, xlii, 85–86

E Elementary functions cosine function, 183–187 exponential function, 181–183 Elliptic equations with Dirichlet and Neumann boundary conditions, xx Error analysis

cosine function, 184–185, 184t exponential function, 182–183, 182t Error functional, xxxi Error functionals, 1, 58 calculations with respect to weights of the network, 58 results of computational experiments, 107–108, 108t root-mean-square error, 117, 117t η-equivalent solution, xx–xxi Euler’s method, 107, 174, 191 corrected, 205–206, 212–215, 221, 223 explicit, 105, 106f, 107–109, 174, 176, 182, 184, 188, 200, 200f implicit, 179, 182, 184, 201, 201f, 204, 207, 209, 224, 232, 234 improved, 176 modified, 176, 188, 192–193 Evolutionary algorithms, xxxviii, 73, 98–102. See also Genetic algorithms Explicit methods, 175–178 Exponential function, 181–183 External circles method, 127–128

F Finite difference method, xxxviii Finite element method (FEM), xv–xvi, xxix, 51, 68, 141–142

G Gauss function, 61, 70 Gaussian function, xxxiv


Generalized error clustering algorithm, 94–95 Generalized multi-quadrics, 61 Generalized Schwarz method, 95–97 Genetic algorithms, xlii modification, 86 multilayer perceptron, 77–78 neural network building, 85–86 neural network team construction, 97–99 Gradient methods, 56, 61, 78 Granular porous catalyst approximate analytical solutions for, 194–199 computational experiments, results of, 115–120 statements and functionals, 10–12 Grid methods, 51 Group Method of Data Handling ideology, xlii, 82–85, 96–97, 102 Growing networks method, 74–75, 156–157, 163

H Heat conduction equation with time reversal, 156–157 Heat equation, multilayer methods for, 208–210 Heat transfer in tissue-vessel system, xxxix computational experiments, results of, 148–149 statements and functionals, 24–28 Heterogeneous data, 46, 51, 105–106, 110, 112, 170, 173, 178, 225 Heun method, 176, 193

I Implicit methods, 178–179 Inner circles method, 127 Interval specified thermal conductivity coefficient computational experiments, results of, 166–167

statements and functionals, 44 Inverse and ill-posed problems, xl–xli, 151–152 air pollution in tunnel, 44–48, 168–170 continuation of temperature field, 42–44, 159–166 heat conduction equation with time reversal, 156–157 interval specified thermal conductivity coefficient, 44, 166–167 migration flow modeling, 37–38, 152–155 recovery solutions of Laplace equation, 38–39, 155–156 temperature change of rod at isolated end, 41–42 temperature distribution of heat-insulated rod, 157–159 thermal conductivity equation with time reversal, 40–41 weather forecast model, 36 Inverse problems, xxix Inverted pendulum stabilization problem, 203–208

J Jumping ball method, 92

K Kolmogorov theorem, xxxv

L Laplace equation Dirichlet problem for in the unit circle, 126–128 in the unit square, 128–139 in the L-region, 139–141 on the plane and in space, xxxix, 14–18 recovery solutions of, 38–39, 155–156 Learning network, 60 Linear regression method, 153–154 Loaded membrane deflection, simulation of, 226–229

M Maclaurin series cosine function, 185 exponential function, 183 Mathieu equation, 189–191 Midpoint method, 176, 192–193 Migration flow modeling computational experiments, results of, 152–155 statements and functionals, 37–38 Multilayer perceptrons (MLP) genetic algorithm, 77–78 initial weights, determination of, 59 vs. radial basis functionnetworks, 60 structure and activation functions, 52–59 with time delays, xli, 69–72 in vector-matrix form, 55–56 Multilayer semi-empirical models, xliii explicit methods, 175–178 implicit methods, 178–179 loaded membrane, deflection of, 226–229 nonlinear bending of cantilever beam, 229–235 ordinary differential equations, approximate solutions for elementary functions, 181–187 inverted pendulum stabilization problem, 203–208 Mathieu equation, 189–191 model problem with delay, 200–202 nonlinear pendulum equation, 191–194 porous catalyst granule problem, 194–199 stiff differential equation, 187–189 partial differential equations, 179–181 Cauchy problem for wave equation, 210–215


heat equation, 208–210 sagging hemp rope problem, 216–225 Multiquadric-biharmonic method (MQ-B), xviii

N Network learning algorithm, 65–66 Neural network applications, xvi modeling method, 153–154 team construction, 97 team training, 87, 99 training process, xxx as universal approximators, xxxvi Neural network model approximation, xxxv–xxxvi cloud methods, 89 dense cloud method, 89–90 domain decomposition, 85–86 drawbacks, xxxviii generalization, 93–100 generalized error clustering algorithm, 94 generalized Schwarz method, 95 genetic algorithm modification, 86 neural network building, 85–86 neural network team construction, 97–99 Group Method of Data Handling ideology, 82–85, 96–97, 102 jumping ball method, 92 maximum error of, 114, 114t neural network team construction, 97 neural network team training, xlii, 87, 99 non-linear optimization methods, 87–93 optimization methods, 92–93 parallelization of the problem, xxxvii pointwise approximation, 119, 119f

polyhedron method, 90–91 quality assessment of, 110–111, 111t, 132, 135–136t, 138t quality evaluation of, 109, 110t, 130t, 131, 139t refinement methods, 100–102 restart methods, 88–89 Rprop method, 89 stability of, xxxvii structural algorithms, 74–87 training algorithms, 74, 86–87, 99–100 virtual particles method, 89 Neurocomputing, xxxvii, 65, 96 Neuro elements, 60–61 Neuron removal procedures, 79–80 Newton method, xlii, 57, 87–88, 180 Non-convex optimization methods, xxv Non-isothermal chemical reactor, macrokinetic model of, 9–10 Nonlinear bending of cantilever beam, 229–235 Non-linear optimization methods, 87–93 Nonlinear pendulum equation, approximate solutions for, 191–194 € dinger equation, Nonlinear Schro xxxix computational experiments, results of, 147–148 statements and functionals, 23–24

O One-dimensional quantum dot problem, 142–145 Optimization methods, 92–93 Ordinary differential equations chemical reactor problem, 9–10 differential-algebraic problem, 12–13 elementary functions, 181–187


inverted pendulum stabilization problem, 203–208 Mathieu equation, 189–191 model problem with delay, 200–202 nonlinear pendulum equation, 191–194 porous catalyst granule problem, 10–12, 194–199 problems for, 4–13 solutions for, 105–126 stiff differential equation, 7–9, 187–189 Outer circles method, 127

P Partial differential equations, xxv–xxviii, 179–181 Cauchy problem for wave equation, 210–215 for domains with fixed (constant) boundaries boundary value problems, solution of, 128–139 Dirichlet problem, solution of, 126–128 heat transfer in tissuevessels system, xxxix, 24–28, 148–149 Laplace equation in L-region, 139–141 Laplace equation on the plane and in space, 14–18 € dinger nonlinear Schro equation, xxxix, 23–24, 147–148 Poisson problem, 18–20, 141–142 € dinger equation with a Schro piecewise potential, xxxix, 20–23, 142–147 for domains with variable boundaries alternating pressure calibrator problem, 29, 33–36 Stefan’s problem, 28–33, 149–150


Partial differential equations (Continued) variable pressure calibrator problem, 150–151 heat equation, 208–210 problems for, 13–28 radial basis functions (RBF), xix Period of an approximate solution, 185–187 Poisson problem computational experiments, results of, 141–142 statements and functionals, 18–20 Polyhedron method distributed version, 91–92 modified, 90–91 Pontryagin maximum principle, 206 Porous catalyst granule problem approximate analytical solutions for, 194–199 computational experiments, results of, 115–120 statements and functionals, 10–12 Predictor-corrector method, 111

Q Quantization method, 152–153 Quantum dot computational experiments, results of, 142–147 statements and functionals, 20–23

R Radial basis function differential quadrature (RBF-DQ) method, xxiv Radial basis function (RBF) networks, xxiii–xxiv, xli, 51 algorithms for building a population of, 81–82

for constructing, 81 anisotropic case, 61 architecture of, 60 asymmetric, 67–69 Cauchy function, 63 Dirichlet integral, 62 formula, 80 Gauss function, 61, 63 with Gaussian basis functions, 66 gradient methods, 61 inverse multiquadrics, 61 modified Bessel function, 62 vs. multilayer perceptrons, 60 network learning algorithm, 65–66 neuro elements, concept of, 60 Poisson integral, 65 with time delays, xli, 69–72 training, 87 Refinement methods, 100–102 Restart methods parallel version, 88–89 sequential version, 88 Ritz-Galerkin method, xxx, 60 Rprop method, 89, 107, 113, 117, 129, 135, 137–138, 156–157, 169 Runge-Kutta method, 179, 182–184, 188

S Sagging hemp rope problem, 216–225 € dinger equation, xxii–xxiii Schro nonlinear, xxxix, 23–24, 147–148 with piecewise potential, xxxix, 20–23, 142–147 Schwartz method, 140–141 Sigmoid functions, 55 Sigmoids, 106–108, 112, 114 Signal transmission graph, 52–53, 53f

Simpson method, 224 Stefan’s problem, xl, 149–150 statements and functionals, 28–33 Stiff differential equations approximate analytical solutions for, 187–189 computational experiments, results of, 105–113 statements and functionals, 7–9 Stone’s theorem, xxxvi € rmer method, 121, 121f, Sto 123–125f, 176–177, 184–185, 185–186f, 187, 190, 190f, 195, 205, 212, 214–215, 221, 223

T Temporary radial basis function networks, 69–71 Time-delayed radial basis function networks, 69–72 Training of neural networks, 173 Trapezoid method, 179, 201–202, 203f, 205, 208, 233 Two-dimensional quantum dot problem, 146–147

V Variable pressure calibrator problem computational experiments, results of, 150–151 statements and functionals, 29, 33–36 Virtual particles method, 89 Volterra integral equations, xx

W Weather forecast model, 36 Weierstrass theorem, xxxv–xxxvi

Z Zero order method, 58