Advances in Neural Computation, Machine Learning, and Cognitive Research III: Selected Papers from the XXI International Conference on Neuroinformatics, October 7-11, 2019, Dolgoprudny, Moscow Region, Russia [1st ed. 2020] 978-3-030-30424-9, 978-3-030-30425-6

This book describes new theories and applications of artificial neural networks.


Table of contents:
Front Matter ....Pages i-xvii
Front Matter ....Pages 1-1
Deep Learning a Single Photo Voxel Model Prediction from Real and Synthetic Images (Vladimir V. Kniaz, Peter V. Moshkantsev, Vladimir A. Mizginov)....Pages 3-16
Tensor Train Neural Networks in Retail Operations (Serge A. Terekhov)....Pages 17-24
Semi-empirical Neural Network Based Modeling and Identification of Controlled Dynamical Systems (Yury Tiumentsev, Mikhail Egorchev)....Pages 25-42
Front Matter ....Pages 43-43
Photovoltaic System Control Model on the Basis of a Modified Fuzzy Neural Net (Ekaterina A. Engel, Nikita E. Engel)....Pages 45-52
Impact of Assistive Control on Operator Behavior Under High Operational Load (Mikhail Kopeliovich, Evgeny Kozubenko, Mikhail Kashcheev, Dmitry Shaposhnikov, Mikhail Petrushan)....Pages 53-61
Hierarchical Actor-Critic with Hindsight for Mobile Robot with Continuous State Space (Staroverov Aleksey, Aleksandr I. Panov)....Pages 62-70
The Hybrid Intelligent Information System for Music Classification (Aleksandr Stikharnyi, Alexey Orekhov, Ark Andreev, Yuriy Gapanyuk)....Pages 71-77
The Hybrid Intelligent Information System for Poems Generation (Maria Taran, Georgiy Revunkov, Yuriy Gapanyuk)....Pages 78-86
Front Matter ....Pages 87-87
Is Information Density a Reliable Universal Predictor of Eye Movement Patterns in Silent Reading? (Valeriia A. Demareva, Yu. A. Edeleva)....Pages 89-94
Bistable Perception of Ambiguous Images – Analytical Model (Evgeny Meilikov, Rimma Farzetdinova)....Pages 95-105
Video-Computer Technology of Real Time Vehicle Driver Fatigue Monitoring (Y. R. Muratov, M. B. Nikiforov, A. S. Tarasov, A. M. Skachkov)....Pages 106-115
Consistency Across Functional Connectivity Methods and Graph Topological Properties in EEG Sensor Space (Anton A. Pashkov, Ivan S. Dakhtin)....Pages 116-123
Evolutionary Minimization of Spin Glass Energy (Vladimir G. Red’ko, Galina A. Beskhlebnova)....Pages 124-130
Comparison of Two Models of a Transparent Competitive Economy (Zarema B. Sokhova, Vladimir G. Red’ko)....Pages 131-137
Spectral Parameters of Heart Rate Variability as Indicators of the System Mismatch During Solving Moral Dilemmas (I. M. Sozinova, K. R. Arutyunova, Yu. I. Alexandrov)....Pages 138-143
The Role of Brain Stem Structures in the Vegetative Reactions Based on fMRI Analysis (Vadim L. Ushakov, Vyacheslav A. Orlov, Yuri I. Kholodny, Sergey I. Kartashov, Denis G. Malakhov, Mikhail V. Kovalchuk)....Pages 144-150
Ordering of Words by the Spoken Word Recognition Time (Victor Vvedensky, Konstantin Gurtovoy, Mikhail Sokolov, Mikhail Matveev)....Pages 151-156
Front Matter ....Pages 157-157
A Novel Avoidance Test Setup: Device and Exemplary Tasks (Alexandra I. Bulava, Sergey V. Volkov, Yuri I. Alexandrov)....Pages 159-164
Direction Selectivity Model Based on Lagged and Nonlagged Neurons (Anton V. Chizhov, Elena G. Yakimova, Elena Y. Smirnova)....Pages 165-171
Wavelet and Recurrence Analysis of EEG Patterns of Subjects with Panic Attacks (Olga E. Dick)....Pages 172-180
Two Delay-Coupled Neurons with a Relay Nonlinearity (Sergey D. Glyzin, Margarita M. Preobrazhenskaia)....Pages 181-189
Brain Extracellular Matrix Impact on Neuronal Firing Reliability and Spike-Timing Jitter (Maiya A. Rozhnova, Victor B. Kazantsev, Evgeniya V. Pankratova)....Pages 190-196
Contribution of the Dorsal and Ventral Visual Streams to the Control of Grasping (Irina A. Smirnitskaya)....Pages 197-203
Front Matter ....Pages 205-205
The Simple Approach to Multi-label Image Classification Using Transfer Learning (Yuriy S. Fedorenko)....Pages 207-213
Application of Deep Neural Network for the Vision System of Mobile Service Robot (Nikolay Filatov, Vladislav Vlasenko, Ivan Fomin, Aleksandr Bakhshiev)....Pages 214-220
Research on Convolutional Neural Network for Object Classification in Outdoor Video Surveillance System (I. S. Fomin, A. V. Bakhshiev)....Pages 221-229
Post-training Quantization of Deep Neural Network Weights (E. M. Khayrov, M. Yu. Malsagov, I. M. Karandashev)....Pages 230-238
Deep-Learning Approach for McIntosh-Based Classification Of Solar Active Regions Using HMI and MDI Images (Irina Knyazeva, Andrey Rybintsev, Timur Ohinko, Nikolay Makarenko)....Pages 239-245
Deep Learning for ECG Segmentation (Viktor Moskalenko, Nikolai Zolotykh, Grigory Osipov)....Pages 246-254
Competitive Maximization of Neuronal Activity in Convolutional Recurrent Spiking Neural Networks (Dmitry Nekhaev, Vyacheslav Demin)....Pages 255-262
A Method of Choosing a Pre-trained Convolutional Neural Network for Transfer Learning in Image Classification Problems (Alexander G. Trofimov, Anastasia A. Bogatyreva)....Pages 263-270
The Usage of Grayscale or Color Images for Facial Expression Recognition with Deep Neural Networks (Dmitry A. Yudin, Alexandr V. Dolzhenko, Ekaterina O. Kapustina)....Pages 271-281
Front Matter ....Pages 283-283
Use of Wavelet Neural Networks to Solve Inverse Problems in Spectroscopy of Multi-component Solutions (Alexander Efitorov, Sergey Dolenko, Tatiana Dolenko, Kirill Laptinskiy, Sergey Burikov)....Pages 285-294
Automated Determination of Forest-Vegetation Characteristics with the Use of a Neural Network of Deep Learning (Daria A. Eroshenkova, Valeri I. Terekhov, Dmitry R. Khusnetdinov, Sergey I. Chumachenko)....Pages 295-302
Depth Mapping Method Based on Stereo Pairs (Vasiliy E. Gai, Igor V. Polyakov, Olga V. Andreeva)....Pages 303-308
Semantic Segmentation of Images Obtained by Remote Sensing of the Earth (Dmitry M. Igonin, Yury V. Tiumentsev)....Pages 309-318
Diagnostics of Water-Ethanol Solutions by Raman Spectra with Artificial Neural Networks: Methods to Improve Resilience of the Solution to Distortions of Spectra (Igor Isaev, Sergey Burikov, Tatiana Dolenko, Kirill Laptinskiy, Sergey Dolenko)....Pages 319-325
Metaphorical Modeling of Resistor Elements (Vladimir B. Kotov, Alexandr N. Palagushkin, Fedor A. Yudkin)....Pages 326-334
Semi-empirical Neural Network Models of Hypersonic Vehicle 3D-Motion Represented by Index 2 DAE (Dmitry S. Kozlov, Yury V. Tiumentsev)....Pages 335-341
Style Transfer with Adaptation to the Central Objects of the Scene (Alexey Schekalev, Victor Kitov)....Pages 342-350
The Construction of the Approximate Solution of the Chemical Reactor Problem Using the Feedforward Multilayer Neural Network (Dmitriy A. Tarkhov, Alexander N. Vasilyev)....Pages 351-358
Linear Prediction Algorithms for Lossless Audio Data Compression (L. S. Telyatnikov, I. M. Karandashev)....Pages 359-364
Front Matter ....Pages 365-365
Approach to Forecasting Behaviour of Dynamic System Beyond Borders of Education (A. A. Brynza, M. O. Korlyakova)....Pages 367-374
Towards Automatic Manipulation of Arbitrary Structures in Connectivist Paradigm with Tensor Product Variable Binding (Alexander V. Demidovskij)....Pages 375-383
Astrocytes Organize Associative Memory (Susan Yu. Gordleeva, Yulia A. Lotareva, Mikhail I. Krivonosov, Alexey A. Zaikin, Mikhail V. Ivanchenko, Alexander N. Gorban)....Pages 384-391
Team of Neural Networks to Detect the Type of Ignition (Alena Guseva, Galina Malykhina)....Pages 392-397
Chaotic Spiking Neural Network Connectivity Configuration Leading to Memory Mechanism Formation (Mikhail Kiselev)....Pages 398-404
The Large-Scale Symmetry Learning Applying Pavlov Principle (Alexander E. Lebedev, Kseniya P. Solovyeva, Witali L. Dunin-Barkowski)....Pages 405-411
Bimodal Coalitions and Neural Networks (Leonid Litinskii, Inna Kaganowa)....Pages 412-419
Building Neural Network Synapses Based on Binary Memristors (Mikhail S. Tarkov)....Pages 420-425
Back Matter ....Pages 427-428


Studies in Computational Intelligence 856

Boris Kryzhanovsky · Witali Dunin-Barkowski · Vladimir Redko · Yury Tiumentsev
Editors

Advances in Neural Computation, Machine Learning, and Cognitive Research III Selected Papers from the XXI International Conference on Neuroinformatics, October 7–11, 2019, Dolgoprudny, Moscow Region, Russia

Studies in Computational Intelligence Volume 856

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. The books of this series are submitted for indexing to Web of Science, EI-Compendex, DBLP, SCOPUS, Google Scholar and SpringerLink.

More information about this series at http://www.springer.com/series/7092

Boris Kryzhanovsky · Witali Dunin-Barkowski · Vladimir Redko · Yury Tiumentsev
Editors

Advances in Neural Computation, Machine Learning, and Cognitive Research III
Selected Papers from the XXI International Conference on Neuroinformatics, October 7–11, 2019, Dolgoprudny, Moscow Region, Russia

Editors

Boris Kryzhanovsky
Scientific Research Institute for System Analysis of Russian Academy of Sciences, Moscow, Russia

Witali Dunin-Barkowski
Scientific Research Institute for System Analysis of Russian Academy of Sciences, Moscow, Russia

Vladimir Redko
Scientific Research Institute for System Analysis of Russian Academy of Sciences, Moscow, Russia

Yury Tiumentsev
Moscow Aviation Institute (National Research University), Moscow, Russia

ISSN 1860-949X  ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-030-30424-9  ISBN 978-3-030-30425-6 (eBook)
https://doi.org/10.1007/978-3-030-30425-6

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The international conference “Neuroinformatics” is the annual multidisciplinary scientific forum dedicated to the theory and applications of artificial neural networks, the problems of neuroscience and biophysics systems, artificial intelligence, adaptive behavior, and cognitive studies. The scope of the conference is wide, ranging from theory of artificial neural networks, machine learning algorithms, and evolutionary programming to neuroimaging and neurobiology.

Main topics of the conference cover theoretical and applied research from the following fields:

Neurobiology and neurobionics: cognitive studies, neural excitability, cellular mechanisms, cognition and behavior, learning and memory, motivation and emotion, bioinformatics, adaptive behavior and evolutionary modeling, brain–computer interface;

Neural networks: neurocomputing and learning, paradigms and architectures, biological foundations, computational neuroscience, neurodynamics, neuroinformatics, deep learning networks, neuro-fuzzy systems, hybrid intelligent systems;

Machine learning: pattern recognition, Bayesian networks, kernel methods, generative models, information theoretic learning, reinforcement learning, relational learning, dynamical models, classification and clustering algorithms, self-organizing systems;

Applications: medicine, signal processing, control, simulation, robotics, hardware implementations, security, finance and business, data mining, natural language processing, image processing, and computer vision.

More than 100 reports were presented at the Neuroinformatics 2019 Conference. Of these, 50 papers were selected, including 3 invited papers, for which articles were prepared and published in this volume.

Boris Kryzhanovsky
Witali Dunin-Barkowski
Vladimir Redko
Yury Tiumentsev

Organization

Editorial Board

Boris Kryzhanovsky: Scientific Research Institute for System Analysis of Russian Academy of Sciences
Witali Dunin-Barkowsky: Scientific Research Institute for System Analysis of Russian Academy of Sciences
Vladimir Red’ko: Scientific Research Institute for System Analysis of Russian Academy of Sciences
Yury Tiumentsev: Moscow Aviation Institute (National Research University)

Advisory Board

Prof. Alexander N. Gorban (Tentative Chair of the International Advisory Board)
Department of Mathematics, University of Leicester, Leicester LE1 7RH, UK
Email: [email protected]
Homepage: http://www.math.le.ac.uk/people/ag153/homepage/
Google scholar profile: http://scholar.google.co.uk/citations?user=D8XkcCIAAAAJ&hl=en
Tel. +44 116 223 14 33

Prof. Nicola Kasabov
Professor of Computer Science and Director, KEDRI
Phone: +64 9 921 9506
Email: [email protected]
http://www.kedri.info

Physical Address: KEDRI, Auckland University of Technology, AUT Tower, Level 7, Corner Rutland and Wakefield Street, Auckland
Postal Address: KEDRI, Auckland University of Technology, Private Bag 92006, Auckland 1142, New Zealand

Prof. Jun Wang, PhD, FIEEE, FIAPR
Chair Professor of Computational Intelligence
Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong
+852 34429701 (tel.)
+852-34420503 (fax)
[email protected]

Program Committee of the XXI International Conference “Neuroinformatics-2019”

General Chair
Vedyakhin A. A.: Sberbank and Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region

Co-chairs
Kryzhanovskiy Boris: Scientific Research Institute for System Analysis, Moscow
Dunin-Barkowski Witali: Scientific Research Institute for System Analysis, Moscow
Gorban Alexander Nikolaevich: University of Leicester, Great Britain


Program Committee

Ajith Abraham: Machine Intelligence Research Labs (MIR Labs), Scientific Network for Innovation and Research Excellence, Washington, USA
Anokhin Konstantin: National Research Centre “Kurchatov Institute,” Moscow
Baidyk Tatiana: The National Autonomous University of Mexico, Mexico
Balaban Pavel: Institute of Higher Nervous Activity and Neurophysiology of RAS, Moscow
Borisyuk Roman: Plymouth University, UK
Burtsev Mikhail: National Research Centre “Kurchatov Institute,” Moscow
Cangelosi Angelo: Plymouth University, UK
Chizhov Anton: Ioffe Physical Technical Institute, Russian Academy of Sciences, St. Petersburg
Dolenko Sergey: Skobeltsyn Institute of Nuclear Physics, Lomonosov Moscow State University
Dolev Shlomi: Ben-Gurion University of the Negev, Israel
Dosovitskiy Alexey: Albert-Ludwigs-Universität, Freiburg, Germany
Dudkin Alexander: United Institute of Informatics Problems, Minsk, Belarus
Ezhov Alexander: State Research Center of Russian Federation “Troitsk Institute for Innovation and Fusion Research,” Moscow
Frolov Alexander: Institute of Higher Nervous Activity and Neurophysiology of RAS, Moscow
Golovko Vladimir: Brest State Technical University, Belarus
Hayashi Yoichi: Meiji University, Kawasaki, Japan
Husek Dusan: Institute of Computer Science, Czech Republic
Ivanitsky Alexey: Institute of Higher Nervous Activity and Neurophysiology of RAS, Moscow
Izhikevich Eugene: Brain Corporation, San Diego, USA
Jankowski Stanislaw: Warsaw University of Technology, Poland
Kaganov Yuri: Bauman Moscow State Technical University
Kazanovich Yakov: Institute of Mathematical Problems of Biology of RAS, Pushchino, Moscow Region
Kecman Vojislav: Virginia Commonwealth University, USA
Kernbach Serge: Cybertronica Research, Research Center of Advanced Robotics and Environmental Science, Stuttgart, Germany
Koprinkova-Hristova Petia: Institute of Information and Communication Technologies, Bulgaria

Kussul Ernst: The National Autonomous University of Mexico, Mexico
Litinsky Leonid: Scientific Research Institute for System Analysis, Moscow
Makarenko Nikolay: The Central Astronomical Observatory of the Russian Academy of Sciences at Pulkovo, Saint Petersburg
Mishulina Olga: National Research Nuclear University (MEPhI), Moscow
Narynov Sergazy: Alem Research, Almaty, Kazakhstan
Nechaev Yuri: Honored Scientist of the Russian Federation, Academician of the Russian Academy of Natural Sciences, St. Petersburg
Pareja-Flores Cristobal: Complutense University of Madrid, Spain
Prokhorov Danil: Toyota Research Institute of North America, USA
Red’ko Vladimir: Scientific Research Institute for System Analysis of Russian Academy of Sciences, Moscow
Rudakov Konstantin: Dorodnicyn Computing Centre of RAS, Moscow
Rutkowski Leszek: Czestochowa University of Technology, Poland
Samarin Anatoly: A. B. Kogan Research Institute for Neurocybernetics, Southern Federal University, Rostov-on-Don
Samsonovich Alexei: George Mason University, USA
Sandamirskaya Yulia: Institute of Neuroinformatics, UZH/ETHZ, Switzerland
Shumskiy Sergey: P. N. Lebedev Physical Institute of the Russian Academy of Sciences, Moscow
Sirota Anton: Ludwig Maximilian University of Munich, Germany
Snasel Vaclav: Technical University Ostrava, Czech Republic
Terekhov Serge: JSC Svyaznoy Logistics, Moscow
Tikidji-Hamburyan Ruben: Louisiana State University, USA
Tiumentsev Yury: Moscow Aviation Institute (National Research University)
Trofimov Alexander: National Research Nuclear University (MEPhI), Moscow
Tsodyks Misha: Weizmann Institute of Science, Rehovot, Israel
Tsoy Yury: Institut Pasteur Korea, Republic of Korea
Ushakov Vadim: National Research Centre “Kurchatov Institute,” Moscow
Velichkovsky Boris: National Research Centre “Kurchatov Institute,” Moscow
Vvedensky Viktor: National Research Centre “Kurchatov Institute,” Moscow
Yakhno Vladimir: The Institute of Applied Physics of the Russian Academy of Sciences, Nizhny Novgorod
Zhdanov Alexander: Lebedev Institute of Precision Mechanics and Computer Engineering, Russian Academy of Sciences, Moscow

Contents

Invited Papers

Deep Learning a Single Photo Voxel Model Prediction from Real and Synthetic Images . . . 3
Vladimir V. Kniaz, Peter V. Moshkantsev, and Vladimir A. Mizginov
Tensor Train Neural Networks in Retail Operations . . . 17
Serge A. Terekhov
Semi-empirical Neural Network Based Modeling and Identification of Controlled Dynamical Systems . . . 25
Yury Tiumentsev and Mikhail Egorchev

Artificial Intelligence

Photovoltaic System Control Model on the Basis of a Modified Fuzzy Neural Net . . . 45
Ekaterina A. Engel and Nikita E. Engel
Impact of Assistive Control on Operator Behavior Under High Operational Load . . . 53
Mikhail Kopeliovich, Evgeny Kozubenko, Mikhail Kashcheev, Dmitry Shaposhnikov, and Mikhail Petrushan
Hierarchical Actor-Critic with Hindsight for Mobile Robot with Continuous State Space . . . 62
Staroverov Aleksey and Aleksandr I. Panov
The Hybrid Intelligent Information System for Music Classification . . . 71
Aleksandr Stikharnyi, Alexey Orekhov, Ark Andreev, and Yuriy Gapanyuk
The Hybrid Intelligent Information System for Poems Generation . . . 78
Maria Taran, Georgiy Revunkov, and Yuriy Gapanyuk

Cognitive Sciences and Brain-Computer Interface, Adaptive Behavior and Evolutionary Simulation

Is Information Density a Reliable Universal Predictor of Eye Movement Patterns in Silent Reading? . . . 89
Valeriia A. Demareva and Yu. A. Edeleva
Bistable Perception of Ambiguous Images – Analytical Model . . . 95
Evgeny Meilikov and Rimma Farzetdinova

Video-Computer Technology of Real Time Vehicle Driver Fatigue Monitoring . . . 106
Y. R. Muratov, M. B. Nikiforov, A. S. Tarasov, and A. M. Skachkov
Consistency Across Functional Connectivity Methods and Graph Topological Properties in EEG Sensor Space . . . 116
Anton A. Pashkov and Ivan S. Dakhtin
Evolutionary Minimization of Spin Glass Energy . . . 124
Vladimir G. Red’ko and Galina A. Beskhlebnova
Comparison of Two Models of a Transparent Competitive Economy . . . 131
Zarema B. Sokhova and Vladimir G. Red’ko
Spectral Parameters of Heart Rate Variability as Indicators of the System Mismatch During Solving Moral Dilemmas . . . 138
I. M. Sozinova, K. R. Arutyunova, and Yu. I. Alexandrov
The Role of Brain Stem Structures in the Vegetative Reactions Based on fMRI Analysis . . . 144
Vadim L. Ushakov, Vyacheslav A. Orlov, Yuri I. Kholodny, Sergey I. Kartashov, Denis G. Malakhov, and Mikhail V. Kovalchuk
Ordering of Words by the Spoken Word Recognition Time . . . 151
Victor Vvedensky, Konstantin Gurtovoy, Mikhail Sokolov, and Mikhail Matveev

Neurobiology and Neurobionics

A Novel Avoidance Test Setup: Device and Exemplary Tasks . . . 159
Alexandra I. Bulava, Sergey V. Volkov, and Yuri I. Alexandrov
Direction Selectivity Model Based on Lagged and Nonlagged Neurons . . . 165
Anton V. Chizhov, Elena G. Yakimova, and Elena Y. Smirnova
Wavelet and Recurrence Analysis of EEG Patterns of Subjects with Panic Attacks . . . 172
Olga E. Dick

Two Delay-Coupled Neurons with a Relay Nonlinearity . . . 181
Sergey D. Glyzin and Margarita M. Preobrazhenskaia
Brain Extracellular Matrix Impact on Neuronal Firing Reliability and Spike-Timing Jitter . . . 190
Maiya A. Rozhnova, Victor B. Kazantsev, and Evgeniya V. Pankratova
Contribution of the Dorsal and Ventral Visual Streams to the Control of Grasping . . . 197
Irina A. Smirnitskaya

Deep Learning

The Simple Approach to Multi-label Image Classification Using Transfer Learning . . . 207
Yuriy S. Fedorenko
Application of Deep Neural Network for the Vision System of Mobile Service Robot . . . 214
Nikolay Filatov, Vladislav Vlasenko, Ivan Fomin, and Aleksandr Bakhshiev
Research on Convolutional Neural Network for Object Classification in Outdoor Video Surveillance System . . . 221
I. S. Fomin and A. V. Bakhshiev
Post-training Quantization of Deep Neural Network Weights . . . 230
E. M. Khayrov, M. Yu. Malsagov, and I. M. Karandashev
Deep-Learning Approach for McIntosh-Based Classification Of Solar Active Regions Using HMI and MDI Images . . . 239
Irina Knyazeva, Andrey Rybintsev, Timur Ohinko, and Nikolay Makarenko
Deep Learning for ECG Segmentation . . . 246
Viktor Moskalenko, Nikolai Zolotykh, and Grigory Osipov
Competitive Maximization of Neuronal Activity in Convolutional Recurrent Spiking Neural Networks . . . 255
Dmitry Nekhaev and Vyacheslav Demin
A Method of Choosing a Pre-trained Convolutional Neural Network for Transfer Learning in Image Classification Problems . . . 263
Alexander G. Trofimov and Anastasia A. Bogatyreva
The Usage of Grayscale or Color Images for Facial Expression Recognition with Deep Neural Networks . . . 271
Dmitry A. Yudin, Alexandr V. Dolzhenko, and Ekaterina O. Kapustina

Applications of Neural Networks

Use of Wavelet Neural Networks to Solve Inverse Problems in Spectroscopy of Multi-component Solutions . . . 285
Alexander Efitorov, Sergey Dolenko, Tatiana Dolenko, Kirill Laptinskiy, and Sergey Burikov
Automated Determination of Forest-Vegetation Characteristics with the Use of a Neural Network of Deep Learning . . . 295
Daria A. Eroshenkova, Valeri I. Terekhov, Dmitry R. Khusnetdinov, and Sergey I. Chumachenko
Depth Mapping Method Based on Stereo Pairs . . . 303
Vasiliy E. Gai, Igor V. Polyakov, and Olga V. Andreeva
Semantic Segmentation of Images Obtained by Remote Sensing of the Earth . . . 309
Dmitry M. Igonin and Yury V. Tiumentsev
Diagnostics of Water-Ethanol Solutions by Raman Spectra with Artificial Neural Networks: Methods to Improve Resilience of the Solution to Distortions of Spectra . . . 319
Igor Isaev, Sergey Burikov, Tatiana Dolenko, Kirill Laptinskiy, and Sergey Dolenko
Metaphorical Modeling of Resistor Elements . . . 326
Vladimir B. Kotov, Alexandr N. Palagushkin, and Fedor A. Yudkin
Semi-empirical Neural Network Models of Hypersonic Vehicle 3D-Motion Represented by Index 2 DAE . . . 335
Dmitry S. Kozlov and Yury V. Tiumentsev
Style Transfer with Adaptation to the Central Objects of the Scene . . . 342
Alexey Schekalev and Victor Kitov
The Construction of the Approximate Solution of the Chemical Reactor Problem Using the Feedforward Multilayer Neural Network . . . 351
Dmitriy A. Tarkhov and Alexander N. Vasilyev
Linear Prediction Algorithms for Lossless Audio Data Compression . . . 359
L. S. Telyatnikov and I. M. Karandashev

Neural Network Theory, Concepts and Architectures

Approach to Forecasting Behaviour of Dynamic System Beyond Borders of Education . . . 367
A. A. Brynza and M. O. Korlyakova

Towards Automatic Manipulation of Arbitrary Structures in Connectivist Paradigm with Tensor Product Variable Binding . . . 375
Alexander V. Demidovskij
Astrocytes Organize Associative Memory . . . 384
Susan Yu. Gordleeva, Yulia A. Lotareva, Mikhail I. Krivonosov, Alexey A. Zaikin, Mikhail V. Ivanchenko, and Alexander N. Gorban
Team of Neural Networks to Detect the Type of Ignition . . . 392
Alena Guseva and Galina Malykhina
Chaotic Spiking Neural Network Connectivity Configuration Leading to Memory Mechanism Formation . . . 398
Mikhail Kiselev
The Large-Scale Symmetry Learning Applying Pavlov Principle . . . 405
Alexander E. Lebedev, Kseniya P. Solovyeva, and Witali L. Dunin-Barkowski
Bimodal Coalitions and Neural Networks . . . 412
Leonid Litinskii and Inna Kaganowa
Building Neural Network Synapses Based on Binary Memristors . . . 420
Mikhail S. Tarkov

Author Index . . . 427

Invited Papers

Deep Learning a Single Photo Voxel Model Prediction from Real and Synthetic Images

Vladimir V. Kniaz 1,2(B), Peter V. Moshkantsev 1,3, and Vladimir A. Mizginov 1

1 State Research Institute of Aviation Systems (GosNIIAS), Moscow, Russia
{vl.kniaz,vl.mizginov}@gosniias.ru, [email protected]
2 Moscow Institute of Physics and Technology (MIPT), Moscow, Russia
3 Moscow Aviation Institute, Moscow, Russia

Abstract. Reconstruction of a 3D model from a single image is challenging. Nevertheless, recent advances in deep learning methods have demonstrated exciting progress toward single-view 3D object reconstruction. However, successful training of a deep learning model requires an extensive dataset with pairs of geometrically aligned 3D models and color images. While manual dataset collection using photogrammetry or laser scanning is challenging, 3D modeling provides a promising method for data generation. Still, a deep model should be able to generalize from synthetic to real data. In this paper, we evaluate the impact of the synthetic data in the dataset on the performance of the trained model. We use the recently proposed Z-GAN model as a starting point for our research. The Z-GAN model leverages generative adversarial training and a frustum voxel model to provide state-of-the-art results in single-view voxel model prediction. We generated a new dataset with 2k synthetic color images and voxel models. We train the Z-GAN model on synthetic, real, and mixed images and compare the performance of the trained models on real and synthetic images. We provide a qualitative and quantitative evaluation in terms of the Intersection over Union between the ground truth and predicted voxel models. The evaluation demonstrates that the model trained only on synthetic data fails to generalize to real color images. Nevertheless, a combination of synthetic and real data improves the performance of the trained model. We made our training dataset publicly available (http://www.zefirus.org/SyntheticVoxels).

Keywords: Generative adversarial networks · Deep learning · Voxel model prediction · 3D object reconstruction

1 Introduction

Prediction of a 3D model from an image requires an estimation of the camera pose relative to the object and reconstruction of the object's shape. While traditional multi-view stereo approaches [22,23,25] provide a robust solution for 3D reconstruction, prediction of a 3D model from a monocular camera is required in such applications as mobile robotics, augmented reality for smartphones, and reconstruction of lost cultural heritage [21].

Fig. 1. Results of our image-to-voxel translation based on a generative adversarial network (GAN) and a frustum voxel model. Input color image (left). Ground truth frustum voxel model slices colored as a depth map (middle). The voxel model output (right).

Single-image 3D reconstruction is ambiguous. Firstly, a single image doesn't provide enough data to estimate the distance to the object's surface. Secondly, back surfaces are not visible in a single photo. Therefore, a priori knowledge about the object's shape is required for an accurate single-view reconstruction.

Recent advances in deep learning methods have demonstrated impressive progress in single-view 3D reconstruction [13,35,41,43]. Modern voxel model prediction methods fall into two categories: object-centered and view-centered [35]. Object-centered methods [13,41] predict the same voxel model for any camera pose relative to an object. They aim to recognize the object class in the input photo and to predict its voxel model in the object-centered coordinate system. For example, an object-centered method will generate the same voxel model for a front-facing car and a car captured from the rear side.

In contrast to object-centered methods, view-centered models provide different outputs for different camera poses. They aim to generate a voxel model of the object in the camera's coordinate system. While a training dataset for an object-centered method requires only a single voxel model for all images of a single object class, each image in the training dataset for a view-centered approach requires a geometrically aligned voxel model. Thus, generation of a view-centered dataset is challenging. Nevertheless, view-centered methods generally outperform object-centered methods [24,35].

A research project has recently been started by the authors with the aim of developing a low-cost driver assistance system with a monocular camera. An efficient training dataset generation technique is required to train a single-view 3D reconstruction model successfully. The technique should provide means for modeling various traffic and weather conditions.

Recently a new kind of view-centered 3D object representation was proposed [24]. It is commonly called a frustum voxel model (fruxel model). Unlike ordinary voxel models with cubic elements, fruxel models have trapezium-shaped

elements that represent slices of the camera's frustum. Each fruxel is aligned with a pixel of the input color image (see Fig. 1). Fruxel models facilitate robust training of a view-centered model, as the contour alignment between the input image and the fruxel model is preserved.

To the best of our knowledge, there are no results in the literature regarding view-centered voxel model dataset generation using synthetic images and 3D modeling. In this paper, we explore the impact of synthetic data on the performance of a view-centered model. We use the recently proposed generative adversarial model Z-GAN [24] as a starting point for our research. We prepared an extensive SyntheticVoxels dataset with 2k synthetic images of three object classes and corresponding ground truth fruxel models. We made our dataset publicly available. We compare the performance of the Z-GAN model trained on real, synthetic, and mixed data. The results of joint training on synthetic and real data are encouraging and show that synthetic data allows the model to generalize to previously unseen objects. The developed view-centered dataset generation technique allows modeling challenging 3D object configurations and traffic situations that cannot be reconstructed online using laser scanning or similar approaches.

2 Related Work

Generative Adversarial Networks. The development of a new type of neural networks known as Generative Adversarial Networks (GANs) [14] made it possible to provide a mapping from a random noise vector to a domain of desired outputs (e.g., images, voxel models). GANs have received a lot of scholarly attention in recent years. These networks provide inspiring results in such tasks as image-to-image translation [20] and voxel model generation [42].

Single-Photo 3D Model Reconstruction. Accurate 3D reconstruction is challenging if only a single color image is used as an input. This problem has been intensively studied recently [10,31,32], and some authors have proposed new methods that leverage deep learning [7,13,19,33,35,39,41,42,45]. Although some methods were proposed for prediction of unobserved voxels from a single depth map [12,37,46–48], prediction of the voxel model of a complex scene from a single color (RGB) image is more ambiguous. The 3D shape of an object should be known for the accurate performance of the method. Therefore, the solution of the problem proceeds in two steps: object recognition and 3D shape reconstruction.

In [13] a deep learning method for single-image voxel model reconstruction was proposed. The method leverages an auto-encoder architecture for voxel model prediction. The method showed encouraging results, but the resolution of the model was only 20 × 20 × 20 elements. A combined method for 3D model reconstruction was proposed in [7]. In [33] a new voxel decoder architecture was proposed that uses voxel tube and shape layers to increase the resulting voxel model resolution. A comparison of surface-based and volumetric 3D model prediction is performed in [35].


Methods that leverage a latent space for 3D shape synthesis were developed recently [5,13,42]. Wu et al. proposed a GAN model [42] for voxel model generation (3D-GAN). It made it possible to predict models with a resolution of 64 × 64 × 64 elements from a randomly sampled noise vector. The developed method was used for single-image 3D reconstruction using an approach proposed in [13]. Despite the fact that 3D-GAN increased the number of elements in the model compared to [13], the generalization ability of this method was low, especially for previously unseen objects.

3D Shape Datasets. Several 3D shape datasets were designed [6,27,38,44] for deep learning. Semantic segmentation was made for the Pascal VOC dataset [11] to align a set of CAD models with color photos. The extended dataset was named Pascal 3D+ [44]. However, the models trained with this dataset showed only a rough match between a 3D model and a photo. The ShapeNet dataset [6] was used to solve the problem of 3D shape recognition and prediction. However, ShapeNet provides only synthetic images, and the exact reconstruction of the model using a single image is possible only with synthetic data. Hinterstoisser et al. generated a large Linemod dataset [15] with aligned RGB-D data. The Linemod dataset was intensively used for training 6D pose estimation algorithms [1–4,8,17,18,26,28,30,36,40]. In [16] a large dataset for 6D pose estimation of texture-less objects was developed. The MVTec ITODD dataset [9] addresses the challenging problem of 6D pose prediction in industrial applications.

3 Method

The aim of the present research is to compare the performance of a single-photo voxel model prediction method trained on synthetic, real, and mixed data. In our research we use a generative adversarial network, Z-GAN [24], that performs color image-to-voxel model translation. The Z-GAN model uses a special kind of voxel model that is aligned with the input image. While a depth map presents distances only to the object surface from a given viewpoint, a voxel model includes information about the entire 3D scene. The frustum voxel model combines features of a depth map and a voxel model. We use a hypothesis made in [41] as the starting point for our research. To provide the aligned voxel model, we combine the depth map representation with a voxel grid. We term the resulting 3D model a Frustum Voxel model (Fruxel model).

3.1 Frustum Voxel Model

The main idea of the fruxel model is to provide a precise alignment of voxel slices with contours of a color image. Such alignment can be achieved with a common voxel model if the camera has an orthographic projection and its optical axis coincides with the Z-axis of the voxel model. As the frustum of a perspective camera no longer corresponds to cubic voxel elements, we use sections of a pyramid.


Fruxel model representation provides multiple advantages. Firstly, each XY slice of the model is aligned with some contours in a corresponding color photo (some parts of them can be invisible). Secondly, a fruxel model encodes the shape of both visible and invisible surfaces. Hence, unlike a depth map, it contains complete information about the 3D shapes. In other words, the fruxel model imitates perspective space. It is important to note that all slices of the fruxel model have the same number of fruxel elements (e.g., 128 × 128 × 1).

A fruxel model is characterized by the following set of parameters: $\{z_n, z_f, d, \alpha\}$, where $z_n$ is the distance to the near clipping plane, $z_f$ is the distance to the far clipping plane, $d$ is the number of frustum slices, and $\alpha$ is the field of view of the camera.

The fruxel model is a special kind of voxel model optimized for the training of conditional adversarial networks. However, a fruxel model can be converted into three common data types: (1) a voxel model, (2) a depth map, (3) an object annotation. A voxel model can be generated from the fruxel model by scaling each consequent layer slice by the coefficient $k$ defined as

$$k = \frac{z_n}{z_n + s_z}, \qquad (1)$$

where $s_z = \frac{z_f - z_n}{d}$ is the size of the fruxel element along the Z-axis.

To generate a depth map $P$ from the fruxel model, we multiply the indices of the frontmost non-empty elements by the step $s_z$:

$$P(x, y) = \operatorname{argmin}_i \left[ F(x, y, i) = 1 \right] \cdot s_z + z_n, \qquad (2)$$

where $P(x, y)$ is an element of the depth map and $F(x, y, i)$ is an element of the fruxel model at slice $i$ with coordinates $(x, y)$. An object annotation is equal to the product of all elements with given $x, y$ coordinates:

$$A(x, y) = \prod_{i=0}^{d} F(x, y, i). \qquad (3)$$
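As a worked illustration of Eqs. (1)–(3), the NumPy sketch below implements the three conversions. The function names, the binary occupancy encoding, and the treatment of empty columns are our assumptions, not part of the paper:

```python
import numpy as np

def slice_scale(z_near, z_far, d):
    """Eq. (1): in-plane scale factor k between consecutive fruxel slices."""
    s_z = (z_far - z_near) / d
    return z_near / (z_near + s_z)

def fruxel_to_depth(F, z_near, z_far):
    """Eq. (2): depth map from the index of the frontmost occupied fruxel.

    F is a binary (w, h, d) occupancy array; empty columns are mapped
    to the far plane (our assumption, the paper leaves this case open).
    """
    d = F.shape[2]
    s_z = (z_far - z_near) / d
    occupied = F > 0
    depth = np.argmax(occupied, axis=2) * s_z + z_near  # first occupied slice
    depth[~occupied.any(axis=2)] = z_far
    return depth

def fruxel_to_annotation(F):
    """Eq. (3): per-pixel product of all fruxel elements along Z."""
    return np.prod(F, axis=2)
```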

3.2 Conditional Adversarial Networks

Generative adversarial networks generate a signal $\hat{B}$ for a given random noise vector $z$, $G : z \to \hat{B}$ [14,20]. A conditional GAN transforms an input image $A$ and the vector $z$ to an output $\hat{B}$, $G : \{A, z\} \to \hat{B}$. The input $A$ can be an image that is transformed by the generator network $G$. The discriminator network $D$ is trained to distinguish "real" signals from the target domain $B$ from the "fakes" $\hat{B}$ produced by the generator. Both networks are trained simultaneously. The discriminator provides the adversarial loss that enforces the generator to produce "fakes" $\hat{B}$ that cannot be distinguished from "real" signals $B$.

We train a generator $G : \{A\} \to \hat{B}$ to synthesize a fruxel model $\hat{B} \in \mathbb{R}^{w \times h \times d}$ conditioned by a color image $A \in \mathbb{R}^{w \times h \times 3}$.
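The paper does not spell out the adversarial objective; for reference, the standard conditional-GAN objective of pix2pix [20], on which Z-GAN builds, is

$$\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{A,B}\big[\log D(A, B)\big] + \mathbb{E}_{A,z}\big[\log\big(1 - D(A, G(A, z))\big)\big],$$

optionally mixed with an L1 reconstruction term $\lambda\, \mathbb{E}_{A,B,z}\big[\lVert B - G(A, z)\rVert_1\big]$ as in [20].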

Fig. 2. The architecture of the generator: a 2D convolutional encoder (input 256 × 256 × 3, 4 × 4 kernels) connected through "copy inflate" skip connections to a 3D deconvolutional decoder (4 × 4 × 4 kernels, output 128 × 128 × 128).

3.3 Z-GAN Framework

We use the pix2pix [20] framework as a base to develop our Z-GAN model. We keep the encoder part of the generator unchanged. We replace the 2D deconvolution layers with 3D deconvolution layers to encode the correlation between neighboring slices along the Z-axis. We keep the skip connections between layers of the same depth that were proposed in the U-Net model [34]. We believe that the skip connections help to transfer high-frequency components of the input image to the high-frequency components of the 3D shape.
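A hypothetical PyTorch fragment illustrating this substitution; the channel counts and normalization choices are assumptions following the pix2pix convention, not taken from the paper:

```python
import torch.nn as nn

# a pix2pix-style encoder block stays 2D ...
enc_block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2, inplace=True),
)

# ... while the corresponding decoder block becomes 3D
dec_block = nn.Sequential(
    nn.ConvTranspose3d(1024, 512, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm3d(512),
    nn.ReLU(inplace=True),
)
```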

3.4 Z-GAN Model

The main idea of our volumetric generator $G$ is to use the correspondence between silhouettes in a color image and slices of a fruxel model. The original U-Net generator leverages skip connections between convolutional and deconvolutional layers of the same depth to transfer fine details from the source to the target domain effectively. We made two contributions to the original U-Net model. Firstly, we replaced the 2D deconvolutional filters with 3D deconvolutional filters. Secondly, we modified the skip connections to provide the correspondence between the shapes of 2D and 3D features. The outputs of the 2D convolutional filters in the left (encoder) side of our generator are tensors $F_{2D} \in \mathbb{R}^{w \times h \times c}$, where $w, h$ are the width and the height of a feature map and $c$ is the number of channels. The outputs of the 3D deconvolutional filters in the right (decoder) side are tensors $F_{3D} \in \mathbb{R}^{w \times h \times d \times c}$.


Fig. 3. Synthetic dataset generation technique: (a) virtual camera, (b) slice of fruxel model, (c) cutting plane, (d) low-poly 3D model, (e) synthetic color image.

We use $d$ copies of each channel of $F_{2D}$ to fill the third dimension of $F_{3D}$. We term this operation "copy inflate". The architecture of the generator is presented in Fig. 2.
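A minimal PyTorch sketch of this "copy inflate" operation; the function name, the channels-first tensor layout, and the usage fragment are our assumptions:

```python
import torch

def copy_inflate(f2d: torch.Tensor, d: int) -> torch.Tensor:
    """Replicate a 2D feature map d times along a new depth axis:
    (N, C, H, W) -> (N, C, D, H, W), without copying memory."""
    return f2d.unsqueeze(2).expand(-1, -1, d, -1, -1)

# hypothetical use in a skip connection of the generator's decoder:
# skip = copy_inflate(enc_feat, dec_feat.shape[2])
# dec_in = torch.cat([dec_feat, skip], dim=1)
```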

3.5 Synthetic Dataset Generation Technique

We developed a synthetic dataset generation technique to create our SyntheticVoxels dataset (see Fig. 3). We use low-poly 3D models of objects both to render realistic synthetic images and to generate frustum voxel models. We use 360° panoramic textures to provide a variety of realistic backgrounds. For each object, we sample random points on a hemisphere around the object and use them as virtual camera (a) locations. For each frame, we point the camera's optical axis at the object and select a random background texture. We randomly select the color of the object's texture for each frame.

When camera locations and background textures are prepared for all frames, we perform dataset generation in two stages. Firstly, we render a synthetic color image. Secondly, we move a cutting plane object (c) normal to the camera optical axis from the distance $z_n$ to the distance $z_f$ of the target fruxel model with the step $s_z$. Therefore, for each synthetic color image, we render $d$ slices of the fruxel model. We use a Boolean intersection between the cutting plane and the low-poly 3D model to get all slices (b) of the fruxel model. Such an approach allows us to keep contours in color images (e) and slices (b) geometrically aligned. We stack all $d$ slices along the camera's optical axis to obtain the resulting fruxel model with dimensions $w \times h \times d$.

We generate our dataset using the Blender 3D creation suite. We automate background and object color randomization, the camera movement, and the cutting plane movement using the Blender Python API (a minimal sketch of this automation is given at the end of this section). We use an additional ground plane to provide realistic object shadows. We render the plane with shadows separately and use alpha compositing to obtain the final synthetic image.

Fig. 4. Examples of color images and corresponding fruxel models (off-road vehicle class) from our SyntheticVoxels dataset. Fruxel models are presented as depth maps in pseudo-colors.

SyntheticVoxels Dataset. Examples of synthetic images with ground truth fruxel models from our SyntheticVoxels dataset are presented in Figs. 4 and 5. The dataset includes images and fruxel models for four object classes: car, truck, off-road vehicle, and van.
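The generation scripts themselves are not published in the paper; the fragment below is a hypothetical sketch of the camera-placement step in the Blender Python API. The object names, hemisphere radius, and frame count are illustrative assumptions:

```python
import math
import random
import bpy  # Blender Python API

scene = bpy.context.scene
cam = bpy.data.objects["Camera"]   # assumed scene object names
obj = bpy.data.objects["Car"]

def random_hemisphere_point(radius):
    """Sample a random virtual-camera location on a hemisphere."""
    theta = random.uniform(0.0, 2.0 * math.pi)  # azimuth
    phi = random.uniform(0.0, 0.5 * math.pi)    # elevation
    return (radius * math.cos(theta) * math.cos(phi),
            radius * math.sin(theta) * math.cos(phi),
            radius * math.sin(phi))

for frame in range(2000):
    cam.location = random_hemisphere_point(8.0)
    # point the camera's optical axis at the object
    direction = obj.location - cam.location
    cam.rotation_euler = direction.to_track_quat('-Z', 'Y').to_euler()
    scene.render.filepath = f"//renders/{frame:05d}.png"
    bpy.ops.render.render(write_still=True)
```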

4 Experiments

4.1 Network Training

Our Z-GAN framework was trained on the VoxelCity [24] and SyntheticVoxels datasets using the PyTorch library [29]. We use independent test splits of the SyntheticVoxels and VoxelCity datasets for evaluation, with fruxel model parameters $\{z_n = 2, z_f = 12, d = 128, \alpha = 40°\}$. The training was performed on an NVIDIA 1080 Ti GPU and took 20 hours for the whole framework. For network optimization, we use minibatch stochastic gradient descent with an Adam solver. We set the learning rate to 0.0002 with momentum parameters $\beta_1 = 0.5$, $\beta_2 = 0.999$, similar to [20].
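In PyTorch, the optimizer configuration above amounts to the following sketch; `G` and `D` stand for the generator and discriminator modules, which are assumptions here:

```python
import torch

# Adam with the reported hyperparameters, applied to both networks
opt_G = torch.optim.Adam(G.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=0.0002, betas=(0.5, 0.999))
```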

4.2 Qualitative Evaluation

We show results of single-view voxel model generation in Figs. 6 and 7. We use three object classes: car, off-road vehicle, and van. The Z-GAN model trained only on synthetic data fails to generalize to real images. Nevertheless, it successfully predicts realistic fruxel models for synthetic input. The real data from the VoxelCity dataset [24] contain images of only nine car models. Therefore, the Z-GAN model trained only on real data fails to predict fruxel models for cars with a new 3D shape or color. The Z-GAN model trained on the union of real and synthetic data produces voxel models of complex objects with fine details.

Fig. 5. Examples of color images and corresponding fruxel models (van and car classes) from our SyntheticVoxels dataset. Fruxel models are presented as depth maps in pseudo-colors.

Fig. 6. Qualitative evaluation on synthetic images from the SyntheticVoxels dataset (columns: input, GT, and predictions of the models trained on real, synthetic, and real+synthetic data). Fruxel models are presented as depth maps in pseudo-colors.

Fig. 7. Qualitative evaluation on real images from the VoxelCity dataset (columns: input, GT, and predictions of the models trained on real, synthetic, and real+synthetic data). Fruxel models are presented as depth maps in pseudo-colors.

4.3 Quantitative Evaluation

We present the results of the quantitative evaluation in terms of Intersection over Union (IoU) in Table 1. The Z-GAN model predicts the probability $p$ of each element of the fruxel model being occupied by an object. We use a threshold $p > 0.99$ to compare a predicted fruxel model with the ground truth model. The Z-GAN model trained on synthetic and real data provides the best IoU for all object classes except the van class. Most images for the van class in our SyntheticVoxels dataset do not provide backgrounds similar to the van in the VoxelCity dataset. We believe that this is the reason for the slightly lower performance on the van class. Nevertheless, the Z-GAN model trained on synthetic and real data provides the highest mean IoU.
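A minimal sketch of this thresholded IoU evaluation; the function name and array conventions are our assumptions:

```python
import numpy as np

def fruxel_iou(prob, gt, thr=0.99):
    """IoU between a predicted occupancy-probability volume and a
    binary ground-truth fruxel model, both of shape (w, h, d)."""
    pred = prob > thr                       # the p > 0.99 threshold
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0
```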

Table 1. IoU metric for different object classes for the Z-GAN model trained on real, synthetic, and mixed data.

Method                  | Car  | Van  | Off-road vehicle | Mean
Z-GAN synthetic         | 0.06 | 0.15 | 0.07             | 0.34
Z-GAN real              | 0.71 | 0.84 | 0.53             | 0.73
Z-GAN real + synthetic  | 0.76 | 0.79 | 0.79             | 0.78

5 Conclusions

We demonstrated that augmentation of the dataset with synthetic data improves the performance of an image-to-frustum voxel model translation method. While methods trained on purely synthetic data fail to generalize to real images, joint training on synthetic and real images allows our model to achieve a higher IoU and to generalize to previously unseen objects. Our main observation is that the variety of background textures aids the model's generalization ability.

In our experiments, we use the Z-GAN generative adversarial network. To train the Z-GAN model, we generated a new SyntheticVoxels dataset with 2k synthetic images of three object classes and view-centered frustum voxel models. We developed a technique for the automatic generation of a view-centered dataset using low-poly 3D models and 360° panoramic background textures. Our technique and dataset can be used to train single-view 3D reconstruction models. The Z-GAN model trained on our SyntheticVoxels dataset achieves state-of-the-art results in single photo voxel model prediction.

Acknowledgments. The reported study was funded by the Russian Foundation for Basic Research (RFBR) according to the project No 17-29-04410, and by the Russian Science Foundation (RSF) according to the research project No 19-11-11008.


References

1. Balntas, V., Doumanoglou, A., Sahin, C., Sock, J., Kouskouridas, R., Kim, T.: Pose guided RGBD feature learning for 3D object pose estimation. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October 2017, pp. 3876–3884 (2017). https://doi.org/10.1109/ICCV.2017.416
2. Balntas, V., Doumanoglou, A., Sahin, C., Sock, J., Kouskouridas, R., Kim, T.K.: Pose guided RGBD feature learning for 3D object pose estimation. In: The IEEE International Conference on Computer Vision (ICCV) (2017)
3. Brachmann, E., Krull, A., Nowozin, S., Shotton, J., Michel, F., Gumhold, S., Rother, C.: DSAC - differentiable RANSAC for camera localization. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
4. Brachmann, E., Rother, C.: Learning less is more - 6D camera localization via 3D surface regression. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
5. Brock, A., Lim, T., Ritchie, J., Weston, N.: Generative and discriminative voxel modeling with convolutional neural networks, pp. 1–9 (2016). https://nips.cc/Conferences/2016. Workshop contribution, Neural Information Processing Conference: 3D Deep Learning, NIPS, 05–12 Dec 2016
6. Chang, A.X., Funkhouser, T.A., Guibas, L.J., Hanrahan, P., Huang, Q.X., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., Yu, F.: ShapeNet: an information-rich 3D model repository (2015). CoRR arXiv:abs/1512.03012
7. Choy, C.B., Xu, D., Gwak, J., Chen, K., Savarese, S.: 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Proceedings of the European Conference on Computer Vision (ECCV) (2016)
8. Doumanoglou, A., Kouskouridas, R., Malassiotis, S., Kim, T.: Recovering 6D object pose and predicting next-best-view in the crowd. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016, pp. 3583–3592 (2016). https://doi.org/10.1109/CVPR.2016.390
9. Drost, B., Ulrich, M., Bergmann, P., Hartinger, P., Steger, C.: Introducing MVTec ITODD - a dataset for 3D object recognition in industry. In: The IEEE International Conference on Computer Vision (ICCV) Workshops (2017)
10. El-Hakim, S.: A flexible approach to 3D reconstruction from single images. In: ACM SIGGRAPH, vol. 1, pp. 12–17 (2001)
11. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2009)
12. Firman, M., Mac Aodha, O., Julier, S., Brostow, G.J.: Structured prediction of unobserved voxels from a single depth image. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
13. Girdhar, R., Fouhey, D.F., Rodriguez, M., Gupta, A.: Learning a predictable and generative vector representation for objects, chap. 34, pp. 702–722. Springer, Cham (2016)
14. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
15. Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., Navab, N.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Asian Conference on Computer Vision, pp. 548–562. Springer, Heidelberg (2012)

16. Hodaň, T., Haluza, P., Obdržálek, Š., Matas, J., Lourakis, M., Zabulis, X.: T-LESS: an RGB-D dataset for 6D pose estimation of texture-less objects. In: IEEE Winter Conference on Applications of Computer Vision (WACV) (2017)
17. Hodan, T., Haluza, P., Obdrzálek, S., Matas, J., Lourakis, M.I.A., Zabulis, X.: T-LESS: an RGB-D dataset for 6D pose estimation of texture-less objects. In: 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017, Santa Rosa, CA, USA, 24–31 March 2017, pp. 880–888 (2017). https://doi.org/10.1109/WACV.2017.103
18. Hodaň, T., Matas, J., Obdržálek, Š.: On evaluation of 6D object pose estimation. In: European Conference on Computer Vision Workshops (ECCVW) (2016)
19. Huang, Q., Wang, H., Koltun, V.: Single-view reconstruction via joint analysis of image and shape collections. ACM Trans. Graph. 34(4), 87:1–87:10 (2015)
20. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976. IEEE (2017)
21. Kniaz, V.V., Remondino, F., Knyaz, V.A.: Generative adversarial networks for single photo 3D reconstruction. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W9, 403–408 (2019). https://doi.org/10.5194/isprs-archives-XLII-2-W9-403-2019
22. Knyaz, V.: Deep learning performance for digital terrain model generation. In: Proceedings SPIE Image and Signal Processing for Remote Sensing XXIV, vol. 10789, p. 107890X (2018). https://doi.org/10.1117/12.2325768
23. Knyaz, V.A., Chibunichev, A.G.: Photogrammetric techniques for road surface analysis. ISPRS - Int. Arch. Photogram. Remote Sens. Spatial Inf. Sci. XLI(B5), 515–520 (2016)
24. Knyaz, V.A., Kniaz, V.V., Remondino, F.: Image-to-voxel model translation with conditional adversarial networks. In: Leal-Taixé, L., Roth, S. (eds.) Computer Vision - ECCV 2018 Workshops, pp. 601–618. Springer, Cham (2019)
25. Knyaz, V.A., Zheltov, S.Y.: Accuracy evaluation of structure from motion surface 3D reconstruction. In: Proceedings SPIE Videometrics, Range Imaging, and Applications XIV, vol. 10332, p. 103320 (2017). https://doi.org/10.1117/12.2272021
26. Krull, A., Brachmann, E., Nowozin, S., Michel, F., Shotton, J., Rother, C.: PoseAgent: budget-constrained 6D object pose estimation via reinforcement learning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
27. Lim, J.J., Pirsiavash, H., Torralba, A.: Parsing IKEA objects: fine pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2013)
28. Ma, M., Marturi, N., Li, Y., Leonardis, A., Stolkin, R.: Region-sequence based six-stream CNN features for general and fine-grained human action recognition in videos. Pattern Recogn. 76, 506–521 (2017)
29. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)
30. Rad, M., Lepetit, V.: BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October 2017, pp. 3848–3856 (2017). https://doi.org/10.1109/ICCV.2017.413
31. Remondino, F., El-Hakim, S.: Image-based 3D modelling: a review. Photogram. Rec. 21(115), 269–291 (2006)
16

V. V. Kniaz et al.

32. Remondino, F., Roditakis, A.: Human figure reconstruction and modeling from single image or monocular video sequence. In: Fourth International Conference on 3-D Digital Imaging and Modeling, 2003 (3DIM 2003), pp. 116–123. IEEE (2003) 33. Richter, S.R., Roth, S.: Matryoshka networks: predicting 3D geometry via nested shape layers. arXiv.org (2018) 34. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer, Cham (2015) 35. Shin, D., Fowlkes, C., Hoiem, D.: Pixels, voxels, and views: a study of shape representations for single view 3d object shape prediction. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018) 36. Sock, J., Kim, K.I., Sahin, C., Kim, T.K.: Multi-task deep networks for depth-based 6D object pose and joint registration in crowd scenarios. arXiv.org (2018) 37. Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene completion from a single depth image. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017) 38. Sun, X., Wu, J., Zhang, X., Zhang, Z., Zhang, C., Xue, T., Tenenbaum, J.B., Freeman, W.T.: Pix3d: dataset and methods for single-image 3d shape modeling. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018) 39. Tatarchenko, M., Dosovitskiy, A., Brox, T.: Multi-view 3D Models from single images with a convolutional network. arXiv.org (2015) 40. Tejani, A., Kouskouridas, R., Doumanoglou, A., Tang, D., Kim, T.: Latent-class hough forests for 6 DoF object pose estimation. IEEE Trans. Pattern Anal. Mach. Intell. 40(1), 119–132 (2018). https://doi.org/10.1109/TPAMI.2017.2665623 41. Wu, J., Wang, Y., Xue, T., Sun, X., Freeman, W.T., Tenenbaum, J.B.: MarrNet: 3D shape reconstruction via 2.5D sketches. arXiv.org (2017) 42. Wu, J., Zhang, C., Xue, T., Freeman, B., Tenenbaum, J.: Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: Advances in Neural Information Processing Systems, pp. 82–90 (2016) 43. Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., Xiao, J.: 3D ShapeNets: a deep representation for volumetric shapes. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition, Princeton University, Princeton, United States, pp. 1912–1920. IEEE (2015) 44. Xiang, Y., Mottaghi, R., Savarese, S.: Beyond pascal: a benchmark for 3d object detection in the wild. In: IEEE Winter Conference on Applications of Computer Vision (WACV) (2014) 45. Yan, X., Yang, J., Yumer, E., Guo, Y., Lee, H.: Perspective transformer nets: learning single-view 3d object reconstruction without 3d supervision. papers.nips.cc (2016) 46. Yang, B., Rosa, S., Markham, A., Trigoni, N., Wen, H.: 3D object dense reconstruction from a single depth view. arXiv preprint arXiv:1802.00411 (2018) 47. Yang, B., Wen, H., Wang, S., Clark, R., Markham, A., Trigoni, N.: 3D object reconstruction from a single depth view with adversarial learning. In: The IEEE International Conference on Computer Vision (ICCV) Workshops (2017) 48. Zheng, B., Zhao, Y., Yu, J.C., Ikeuchi, K., Zhu, S.C.: Beyond point clouds: scene understanding by reasoning geometry and physics. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)

Tensor Train Neural Networks in Retail Operations

Serge A. Terekhov

SC Svyaznoy, Moscow, Russian Federation
[email protected]

Abstract. A neural network generalization of the Tensor Train decomposition for multidimensional datasets of censored Poisson counts is presented. The model is successfully applied to two important classes of retail operations: the sales process under controlled stock distribution over the retail network, and the optimization of active retailer decisions, such as pricing policy, marketing actions, and discounts. The advantage of the proposed Tensor Train Neural Network model lies in its ability to capture non-linear relations between similar retail stores and similar consumer goods, as well as to jointly estimate the sales potential of commodities with a wide dynamic range of popularity.

Keywords: Tensor Train Neural Network · Statistical estimation · Poisson counts process · Censored samples · Context bandits · Retail operations

1 Introduction

Stable statistical estimation of performance indicators and activity responses is critical for control tasks in modern retail operations. Corporate data is not only very noisy because of the intrinsic stochasticity of market processes; observations are also subject to truncation and censoring, e.g. due to endogenous control decisions and stock availability. Decision makers need truthful, disturbance-free measures both for the operations (such as sales) under the current conditions and for the potential value of business in some new or alternative contexts. In the case of retail network operations, a living example is the estimation of prospective sales of a certain commodity in alternative stores, where it was not offered before. This paper addresses two important classes of retail operations: the process of sales over the retail network, and the optimization of active decisions, such as pricing policy, marketing actions, and discounts. Optimal sales require control of the stock distribution, while actions need a suitable mix of their parameters. The resulting value of an operation depends on several context dimensions, which will be treated as exogenous factors. For example, the estimated intensity of the buyer flow is identified for a certain location (retail store), for a particular item to be sold, and for a certain period of time. These discrete context variables are


considered as dimensions of multi-dimensional tables, or modes of a tensor [1]. The number of tensor modes is usually limited, but the size of the discrete dictionary along each mode is rather high. The total count of data cells can quickly reach several billion and more. The data is sparse and self-similar, so it can be modelled using low-rank tensor decompositions [1-4]. Among them, the Tensor Train decomposition [4] is the focus of our research, since this formulation allows us to introduce useful generalizations. As a complement to the standard Tensor Train formulation [2,4], this paper presents the following extensions:

(1) The sparse data tensor contains realizations of random event counts rather than well-defined numerical values. Each value is observed for its own combination of indices of the context tensor modes.
(2) Observations follow right-censored Poisson distributions, with individual rates β and upper censoring bounds. The censoring indicators are reported together with the counts truncated at their upper bounds.
(3) Basic Tensor Train matrix multiplications are generalized to layered neural operations. For each modelled tensor cell, a set of neural layers is applied layer by layer instead of chaining linear multiplications matrix by matrix. Elements of the original Tensor Train matrices (i.e. reduced 3-mode tensors) serve as adjustable synaptic weights for the neural units.

The neural extension of the Tensor Train model, called the Tensor Train Neural Network (TTNN), was proposed in the author's lectures [6-8]. In this paper the TTNN is treated as a non-parametric variational probe function that estimates the de-censored Poisson distribution parameter β. The TTNN is a specific neural network architecture comprising a large set of small neural layers dynamically combined for each set of tensor indices. This approach avoids training one huge neural network with a sparse representation of the input context variables. On the other hand, it preserves the full richness of nonlinear neural approximations, not available e.g. in recent linear variable grouping methods [21]. The idea of representing a regular deep neural network layer by a Tensor Train decomposition was proposed earlier [5]. Here, in contrast, the neural architecture is ab initio designed as a Tensor Train set of small elements, with no assumption about any underlying large neural model. In the described retail operations the TTNN model represents a statistical estimate of Poisson event rates from passively observed censored counts. The application of the TTNN model to reward estimation for actively collected samples in the contextual bandit formulation is also discussed.

2 Formulation

Consider the basic operation of daily sales of a commodities portfolio in the stores of a large retail network over an extended period. The collected data are counts of items sold for each commodity, at each store, and on every business day. Each store sells just a fraction of the whole list of available commodity types, limited both physically


by the store capacity and by the marketing plan. The practical problem is to estimate the intensity of sales of the total set of portfolio items over the whole set of stores. Daily sales in a particular context are usually limited by the availability of stock; thus, some of the observed counts are censored by truncation. The probability of observing a ≥ 0 counts at the end of a time period t, given the initial available stock r > 0, is defined by the truncated Poisson distribution:

$$P(a \mid r, \beta, t) = \frac{(\beta t)^a}{a!}\, e^{-\beta t}, \quad a = 0, 1, \ldots, r - 1,$$

$$P(a = r \mid r, \beta, t) = 1 - \sum_{k=0}^{r-1} P(k \mid r, \beta, t),$$

where β is the intensity of the Poisson flow. This distribution can easily be derived from the general queueing birth-death process [16]. The observations can be represented as a tensor with d = 3 modes for the time periods, the set of stores, and the portfolio items. To simplify the notation, let us introduce the tensor enumeration index s = (i_1, i_2, ..., i_d). The set of index combinations of all available observations in the data set A = {A(s)} is denoted as S. Each observation is then independently generated from the distribution with an individual unknown parameter β = β(s).
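As an illustration only, this distribution can be written down in a few lines of Python; the function name and the SciPy-based layout are ours, not part of the paper:

```python
import numpy as np
from scipy.stats import poisson

def truncated_poisson_pmf(a, r, beta, t):
    """P(a | r, beta, t) for right-censored Poisson sales counts.

    a    -- observed count, 0 <= a <= r
    r    -- initial available stock (censoring bound), r > 0
    beta -- intensity of the Poisson flow
    t    -- length of the observation period
    """
    lam = beta * t
    if a < r:                               # ordinary Poisson probability
        return poisson.pmf(a, lam)
    return 1.0 - poisson.cdf(r - 1, lam)    # a == r: censored tail mass
```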

Fig. 1. Schematic representation of the Tensor Train Neural Network assembly. For each set of tensor indices (filled), the linked chain of corresponding neural layers is composed to compute the output $\hat{\beta}$.

Obtaining stable estimates $\hat{\beta}$ of the complete tensor from sparse, noisy and censored data is a challenging task [1], in some sense similar to the tensor design of recommender systems [9]. We will consider the variational approach, where $\hat{\beta}$ is approximated with a member of a non-parametric low-rank Tensor Train model, in which all matrix elements are treated as free variational parameters. In the classical formulation [2]:

$$\hat{\beta}(i_1, \ldots, i_d) \approx \sum_{j_1 \ldots j_{d-1}} g(i_1, j_1) \cdot G(j_1, i_2, j_2) \cdot \ldots \cdot G(j_{d-2}, i_{d-1}, j_{d-1}) \cdot g(j_{d-1}, i_d)$$


with a log statistical link function for the unrestricted model, or an identity link with the restriction of non-negativity of the matrix elements. The algebraic multiplications $h(j) = g(i_1,:) \cdot G(:, i_2, j)$ are a special case of the more general neural layer functions $h(i_2, j) = F_j(h(i_1,:) \cdot G(:, i_2, j))$, where $h(i_1,:) = g(i_1,:)$ and $F$ is a vector of sigmoids. The neural transformations are applied mode by mode, with a single neuron for the last index $i_d$. The resulting model is called a Tensor Train Neural Network [6-8]. Conceptually, the diagram of TTNN functioning is shown in Fig. 1. Instead of using a single large neural network, the TTNN model comprises many tiny neural networks, with the number of units defined by the tensor decomposition rank. To estimate every element $\hat{\beta}(i_1, i_2, \ldots, i_d)$, the chain of neural layers with indices $(i_1, i_2, \ldots, i_d)$ is dynamically composed. The gradient of the modelled function is computed via the standard backpropagation chain rule.
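The following hedged Python sketch illustrates this dynamic composition for a single index tuple; the array shapes, the choice of the log link, and all names are our assumptions for illustration, not the author's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ttnn_forward(idx, g_first, G_mid, g_last):
    """Evaluate beta_hat(i1, ..., id) by chaining small neural layers.

    idx     -- tuple of tensor indices (i1, ..., id)
    g_first -- (n1, R) matrix; row i1 gives the initial hidden vector
    G_mid   -- list of (R, n_m, R) cores for modes 2..d-1; the slice
               G[:, i_m, :] acts as the weight matrix of one tiny layer
    g_last  -- (R, n_d) matrix; column i_d feeds the single output neuron
    """
    h = g_first[idx[0], :]                    # h(i1, :) = g(i1, :)
    for m, G in enumerate(G_mid, start=1):
        h = sigmoid(h @ G[:, idx[m], :])      # neural layer replaces matrix product
    z = h @ g_last[:, idx[-1]]                # single neuron for the last mode
    return np.exp(z)                          # log link: beta_hat = exp(z)
```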

3 Estimation

The pattern of observed data samples follows some stable distribution defined by retail operational practice. The samples are independent, since at every location and time period the outcome is produced by different customers. The logarithm of the observed data likelihood depends on the variational tensor parameters:

$$L(A \mid \beta) = \sum_{s \in S} \log P(A(s) \mid \beta(s), r(s))$$

where s is the tensor index and r are the censoring indicators. The target Poisson intensities β are given by the TTNN neural model, as described above.
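A minimal sketch of this likelihood, assuming observations stored as flat arrays of counts, stocks, and matching TTNN rate estimates; the names and the array layout are ours:

```python
import numpy as np
from scipy.stats import poisson

def censored_poisson_loglik(counts, stocks, betas, t=1.0):
    """Log-likelihood L(A | beta) for right-censored Poisson observations.

    counts -- observed sales A(s) for every index s in S (flat array)
    stocks -- initial stock r(s); cells with counts == stocks are censored
    betas  -- TTNN rate estimates beta_hat(s) aligned with the same indices
    """
    counts = np.asarray(counts)
    lam = np.asarray(betas) * t
    censored = counts >= np.asarray(stocks)
    tail = np.clip(1.0 - poisson.cdf(counts - 1, lam), 1e-12, None)
    ll = np.where(censored, np.log(tail), poisson.logpmf(counts, lam))
    return ll.sum()
```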

Fig. 2. Likelihoods (arbitrary units) of predictions from bagging ensembles versus varying model rank (decomposition matrix dimension, the same for all tensor modes). Circles: off-sample; dots: training samples for each committee member.


The estimates of the variational matrices g(i_1), G(i_2), ..., G(i_{d-1}), g(i_d) could be obtained by direct maximization of the total likelihood. But it should be noticed that elements with purely censored observations are formally unbounded (as follows from the truncated Poisson distribution). Also, different tensor modes can be re-scaled with proper multipliers satisfying $\sqrt[d]{C_1 C_2 \cdots C_d} = 1$, leading to a non-uniqueness similar to the one in the canonical CP tensor decomposition [1]. This justifies the use of a Bayesian formulation, where the random model parameters are regularized with L1/L2 prior terms, and the random Poisson rate β is enriched with a Gamma prior $L_g = \xi\beta - \eta \log(\beta)$ with small, low-informative parameters. The resulting maximization of the posterior is carried out by means of stochastic gradient heuristics (Adam [10], ADADELTA [11]). Also, the more traditional Rprop [12] was used with extended random batch pages (1M samples and more). The attraction of Rprop is the clearer evidence of convergence, with no external step-size-controlling schedules. There are no known sequential tensor update methods (like the classic [13]) for censored likelihoods. Applications of this kind are scaled up to 100 × 5000 × 5000 tensors with hundreds of millions of sparse observations.

All estimation models are intended to be used for new, unseen combinations of tensor indices. Thus, in the discussed retail application, some commodity sales are planned for alternative stores. A statistical measure of the uncertainty of the estimated β is required, with low and high quantiles for risk assessment. To meet these requirements, it is proposed to apply the bagging committee methodology [14]. A bootstrap ensemble of TTNN models is trained in parallel, with control of on- and off-sample likelihoods. The resulting estimates of β are aggregated into one median estimate, with individual quantile ranges for each estimated value. The learning curve for a bagging ensemble with incremented model complexity (TT matrix rank) is shown in Fig. 2. The lowest off-sample negative log-likelihood indicates the best ensemble rank. Bagging estimators have many attractive properties. In the context of tensor data, they correctly report large IQR [16] uncertainties for rare index combinations. The accuracy is constantly validated using new field data available after each business day. Table 1 represents a typical example of a report row from an overnight computation.

Table 1. Example of a report row with β estimates.

index i0   index i1   index i2   beta.med   beta.lo   beta.hi
91         3053       471        0.138      0.1       0.145

It reads that at the particular store 3053, SKU 471 on date 91 is expected to sell at a rate of one item per 7 ≈ 1/0.138 days, with a 25% risk of selling as slowly as one item per 10 days or slower.

The adequate complexity of the tensor neural model can also be assessed with ordered rank statistics of false neighbors [15]. Let us consider a series of models with growing


tensor matrix dimensions M = 1, 2, .... For a particular tensor mode (the last one, with indices i_d, is picked in our applications), pairwise distances between all vectors g(i_d) are computed, and the ordered neighbor set U(i_d) for each vector is collected. The set of neighbors tends to stabilize when the model approaches the correct dimension. Rank correlations between these sets for decompositions of varying complexity M and M + 1 are compared, and the recommended model complexity is the one with a low number of false neighbors. Formal rank-based hypothesis testing criteria can be utilized [16]; a minimal sketch of the neighbor-stability check follows.
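The paper relies on rank correlations of the ordered neighbor sets; the sketch below uses a simpler set-overlap proxy for the same idea and is our illustration, not the author's procedure:

```python
import numpy as np

def neighbor_sets(g_last, k=5):
    """k nearest neighbors of each last-mode factor vector g(i_d)."""
    D = np.linalg.norm(g_last[:, None, :] - g_last[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    return np.argsort(D, axis=1)[:, :k]

def false_neighbor_fraction(g_rank_M, g_rank_M1, k=5):
    """Fraction of neighbors that change between ranks M and M + 1;
    values close to zero suggest the decomposition rank is adequate."""
    a = neighbor_sets(g_rank_M, k)
    b = neighbor_sets(g_rank_M1, k)
    overlap = [len(set(a[i]) & set(b[i])) for i in range(a.shape[0])]
    return 1.0 - np.mean(overlap) / k
```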

4 Active Operations and Context Bandits

The estimation problem discussed in the previous section assumes a fixed exogenous distribution of data samples collected in a passive observation regime. This is the usual situation for routine retail operations. No special treatment of data, except regular data quality checking, is required in this case.

Another important class of applications is active operations that change the data generation conditions. These may include varying pricing decisions, marketing and discount actions, and other dynamic control technologies common in retail practice. In these cases the data generation process becomes partly endogenous, i.e. it starts depending on the performance under previously taken decisions and on internal system variables. This phenomenon is somewhat similar to the "self-selection bias" known in the economics literature.

A well-established way to optimize retail system performance is balanced stratified sampling under the conditions of randomized designed experiments, such as Latin hypercubes [17]. This approach, developed mostly for technical systems, is of very limited use in retail practice, where any necessary experimentation under non-profitable conditions should be justified in advance by additional gains. Tensor decomposition models lead to optimal utilization of all designed data, since experimentation with different actions can be performed at different locations (and even with different commodities). The collected data is then fused into one reward-estimating tensor model.

In a purely active setting, the problem reduces to the common contextual bandit formulation [18]. Consider a set of available controls or actions V, also called "bandit arms". In retail applications these are pricing level decisions, discount packages, or corporate KPIs targeted at sales optimization. The actions are applied in different contexts defined by the tensor modes. Only one particular action can be tried for particular tensor indices s = (i_1, i_2, ..., i_d), and the feedback estimate is revealed only for the chosen option.

Let us extend the set of tensor modes with an additional mode {i_{d+1} ∈ V} for the available actions v ∈ V, including the no-action option. The tensor TTNN model is still directly applicable to the estimation of the event flow intensities β, provided that the selection of actions is randomized. To actively choose a more profitable action for each context, the exploration process [18,19] is used, with a constant exploration factor γ. Given the context s, the estimates of the potentials $\hat{\beta}_s$ are computed from the TTNN model for each action v


(last coordinate). Let v*(s) be the best believed action for s, estimated from the TTNN ensemble median or from an upper confidence bound (UCB), as defined by the estimated quantiles. The probability to select an action v in context s is given by:

$$P(v \in V, s) = (1 - \gamma) \cdot \delta_{v, v^*(s)} + \frac{\gamma}{|V|}$$

where δ is the Kronecker delta and |V| is the total count of available actions. In the case of active selection, the TTNN model is trained on samples weighted [20] by the inverse propensity, $w_s \sim 1/P(v, s)$. The optimal action selection is guided by the periodically updated tensor neural model. The exploration rate γ is usually limited by economic considerations. A practical rule, e.g. for pricing decisions, is to confine the potential loss $\beta \cdot \gamma \cdot (v_{max} - v^*)$ within the experimentation budget.
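A minimal sketch of this randomized selection rule; the function and variable names are ours, and the reward rate estimates are assumed to be precomputed by the TTNN ensemble:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(beta_hat_s, gamma):
    """Randomized action choice for one context s, following the rule above.

    beta_hat_s -- TTNN estimates of the event rate for every action v in V
    gamma      -- constant exploration factor
    Returns the chosen action and its propensity P(v, s); the inverse
    propensity is later used as the training weight w_s ~ 1 / P(v, s).
    """
    n = len(beta_hat_s)
    v_star = int(np.argmax(beta_hat_s))      # best believed action v*(s)
    probs = np.full(n, gamma / n)            # gamma / |V| for every arm
    probs[v_star] += 1.0 - gamma             # extra mass on v*(s)
    v = int(rng.choice(n, p=probs))
    return v, probs[v]
```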

5 Conclusion

Contemporary tensor decomposition models and algorithms have been the subject of intensive study for more than a decade, both from the point of view of theory and of academic proof-of-concept applications. Over the recent couple of years, "early birds" of industrial and commercial applications have become more frequent. This paper is one of them; it discusses two important classes of operations research problems in the very traditional offline retail industry.

For the case of sales planning and control, the general sales potential estimation problem has been considered. It is shown that the available sales and stock statistics can be utilized to predict the potential sales of new commodities and in new locations, including estimates of the probabilistic risk of the operation outcome. The advantage of the proposed Tensor Train Neural Network model lies in its ability to capture non-linear relations between similar stores and similar goods, as well as to jointly estimate the sales potential of commodities with a wide dynamic range of popularity.

In the area of active operations, such as marketing actions and pricing decisions, neural tensor models are extremely helpful in aggregating actively collected information into common self-consistent estimates. In the bandit setting, different control actions can be applied to different contexts, such as different discounts in different stores. Tensor estimates of the sales potential then guide the selection of the most profitable actions for each store location.

References

1. Acar, E., Dunlavy, D.M., Kolda, T.G., Mørup, M.: Scalable tensor factorizations with missing data (2010). http://www.cs.sandia.gov/dmdunla/publications/AcDuKoMo10.pdf
2. Oseledets, I.V., Tyrtyshnikov, E.E.: TT-cross approximation for multidimensional arrays. Linear Algebra Appl. 432, 70-88 (2010)


3. Tensor Decompositions: Applications and Efficient Algorithms at SIAM CSE 2017. http://perso.ens-lyon.fr/bora.ucar/tensors-cse17/index.html. Accessed 10 Oct 2019
4. Oseledets, I.V.: Tensor-train decomposition. SIAM J. Sci. Comput. 33, 2295-2317 (2011)
5. Novikov, A., Podoprikhin, D., Osokin, A., Vetrov, D.: Tensorizing neural networks. In: Advances in Neural Information Processing Systems 28, NIPS, pp. 442-450 (2015)
6. Terekhov, S.A.: Tensor decompositions in statistical estimation. In: XIX International Conference Neuroinformatics-2017, Moscow, 2-6 October 2017 (2017). (in Russian)
7. Terekhov, S.A.: Tensor decompositions in estimation and statistical decision making. In: Conference OpenTalks.ai, Moscow, 7-9 February 2018 (2018). (in Russian)
8. Terekhov, S.A.: Tensor decompositions in statistical decisions. In: Conference on Artificial Intelligence Problems and Approaches, Moscow, 14 March 2018, pp. 53-58 (2018). http://raai.org/library/books/Konf II problem--2018/book1 intellect.pdf. Accessed 10 Oct 2019. (in Russian)
9. Frolov, E., Oseledets, I.: Tensor methods and recommender systems. arXiv:1603.06038 [cs.LG], 19 March 2016
10. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 [cs.LG] (2014)
11. Zeiler, M.D.: ADADELTA: an adaptive learning rate method. arXiv:1212.5701 [cs.LG] (2012)
12. Igel, C., Hüsken, M.: Improving the Rprop learning algorithm. In: 2nd ICSC International Symposium on Neural Computation, NC 2000, pp. 115-121. ICSC Academic Press (2000). https://pdfs.semanticscholar.org/df9c/6a3843d54a28138a596acc85a96367a064c2.pdf
13. Cichocki, A., Zdunek, R., Amari, S.: Hierarchical ALS algorithms for nonnegative matrix and 3D tensor factorization. In: Davies, M.E., James, C.J., Abdallah, S.A., Plumbley, M.D. (eds.) Independent Component Analysis and Signal Separation, pp. 169-176. Springer, Heidelberg (2007)
14. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123-140 (1996)
15. Rhodes, C., Morari, M.: The false nearest neighbors algorithm: an overview. Comput. Chem. Eng. 21, S1149 (1997). https://doi.org/10.1016/S0098-1354(97)87657-0
16. Ivchenko, G.I., Medvedev, Yu.I.: Mathematical Statistics. URSS, Moscow (2018). (in Russian)
17. Montgomery, D.C.: Design and Analysis of Experiments, 9th edn. Wiley, New Jersey (2017)
18. Langford, J., Zhang, T.: The epoch-greedy algorithm for contextual multi-armed bandits. In: Advances in Neural Information Processing Systems 20, NIPS, pp. 1096-1103 (2008)
19. Allesiardo, R., Féraud, R., Bouneffouf, D.: A neural networks committee for the contextual bandit problem. arXiv:1409.8191 [cs.NE], 29 September 2014
20. Chu, W., Li, L., Reyzin, L., Schapire, R.E.: Contextual bandits with linear payoff functions. In: 14th International Conference on Artificial Intelligence and Statistics, AISTATS, Fort Lauderdale, FL, USA (2011)
21. Tay, J.K., Friedman, J., Tibshirani, R.: Principal component-guided sparse regression. arXiv:1810.04651 [stat.ME], 24 October 2018

Semi-empirical Neural Network Based Modeling and Identification of Controlled Dynamical Systems

Yury Tiumentsev and Mikhail Egorchev

Moscow Aviation Institute (National Research University), Moscow, Russia
[email protected]

Abstract. One of the critical elements of the process of creating new engineering systems is the formation of mathematical and computer models that support the creation and use of such systems. Typical for such systems is a high level of complexity of the objects and processes being modeled, their multidimensionality, non-linearity and non-stationarity, and the diversity and complexity of the functions implemented by the simulated object. The solution of modeling problems for objects of this kind is significantly complicated by the fact that the corresponding models have to be formed in the presence of multiple and diverse uncertainties, such as incomplete and inaccurate knowledge of the characteristics and properties of the object being modeled, as well as of the conditions in which the object will operate. Besides, during operation, the properties of the object being modeled may change, sometimes sharply and significantly, for example, due to equipment failures and/or structural damage. An approach to the formation of gray box models (semi-empirical models) for systems of this kind, based on combining theoretical knowledge about the object of modeling with the methods and tools of neural network modeling, is considered. As an example, we demonstrate the formation of a model for the longitudinal angular motion of a maneuverable aircraft, as well as the identification of the aerodynamic characteristics of the aircraft included in this model.

Keywords: Nonlinear dynamical system · Semi-empirical model · Grey box model · Neural network · Aircraft motion simulation

1 Introduction

In the processes of development and operation of technical systems, including aircraft, a significant place is occupied by the solution of such problems as the analysis of the behavior of dynamical systems, the synthesis of control algorithms for them, and the identification of their unknown or inaccurately known characteristics. A crucial role in solving the problems of these three classes belongs to mathematical and computer models of dynamical systems [1,2].

Traditional classes of mathematical models for technical systems are ordinary differential equations (for systems with lumped parameters) and partial


differential equations (for systems with distributed parameters). As applied to controlled dynamical systems, ordinary differential equations are most widely used as a modeling tool [3-5]. Methods of forming and using models of the traditional type are by now well developed and successfully used to solve a wide range of tasks. However, for modern and advanced technical systems, some problems arise whose solution cannot be provided by the traditional methods. These problems are caused by the presence of various and numerous uncertainties in the properties of the corresponding system and in its operational conditions, which can be countered only if the system in question has the property of adaptability [6-8], i.e., there are means of operational adjustment of the system and its model to the changing current situation.

As experience shows [9,10], a modeling tool adequate to this situation is an approach based on the concept of an artificial neural network (ANN). We can consider such an approach as an alternative to traditional methods of modeling dynamical systems, which provides, among other things, the possibility of obtaining adaptive models. At the same time, conventional neural network models of dynamical systems, in particular, models of the NARX and NARMAX classes [10], which are most often used for the simulation of controlled dynamical systems, do not fully meet the requirements.

One of the most important reasons for the insufficient efficiency of traditional-type ANN-models for the class of problems under consideration is that they are purely empirical models (that is, models of the black box type), which should cover all the nuances of the behavior of the dynamical system. For this, it is necessary to build an ANN-model of a sufficiently high dimension (that is, with a large number of adjustable parameters). At the same time, we know from the experience of ANN-modeling that the larger the dimension of the ANN-model, the larger the amount of training data required to configure it. As a result, with the amount of experimental data that we can obtain for complex technical systems, it is not possible to train such models to a given level of accuracy.

We propose a combined approach to overcome such difficulties, which are typical for traditional models both in the differential equation and in the ANN-model form [11-16]. We base this approach on ANN-modeling because only in this variant can we obtain adaptive models. Theoretical knowledge about the object of modeling, existing in the form of ordinary differential equations (these are, for example, traditional models of aircraft motion), is embedded into an ANN-model of the combined type (a so-called semi-empirical ANN-model). A part of the ANN-model is thus formed based on the available theoretical knowledge and does not require further adjustment (learning). Only those elements that contain uncertainties, such as the aerodynamic characteristics of the aircraft, are subject to adjustment and/or structural reorganization in the learning process of the generated ANN-model.

The result of this approach is semi-empirical ANN-models, which allow solving problems inaccessible to traditional ANN-methods: they dramatically reduce the dimension of the ANN-model, which enables us to achieve the required accuracy


from it using training sets that are insufficient for traditional ANN-models, and they provide the ability to identify characteristics of dynamical systems described by nonlinear functions of many variables (for example, the coefficients of aerodynamic forces and moments). The following sections discuss the implementation of this approach, as well as an example of its application for modeling aircraft motion and identifying the aerodynamic characteristics of the aircraft.

2 Dynamical System as an Object of Study

Let there be some dynamical system S, which is the object of modeling (Fig. 1). Its general structure is described by

$$y = \Phi(u, \xi, \zeta) = G(F(u, \xi), \zeta).$$

Fig. 1. General structure of the simulated dynamical system

The system S perceives controlled effects u(t) and uncontrolled effects ξ(t). Under these influences, S changes its state x(t) according to the transformation (mapping) F(u(t), ξ(t)). At the initial time instant t = t_0, the state of the system S takes the value x(t_0) = x_0. The state x(t) is perceived by a sensor (observer) implementing the transformation G(x(t), ζ(t)) and is given as the output of the system S, i.e. as the result of observation y(t) of its state x(t). The imperfection of the state sensors of the system S is taken into account by introducing an additional uncontrolled effect ζ(t) ("measurement noise"). The composition of the mappings F(·) and G(·) describes the relationship between the controlled input u(t) ∈ U of the system S and its output y(t) ∈ Y, taking into account the influence of the uncontrolled effects ξ(t) and ζ(t) on the system under consideration: y = Φ(u(t), ξ(t), ζ(t)) = G(F(u(t), ξ(t)), ζ(t)).

Let N_P observations be made for the system S:

$$\{y_i\} = \Phi(u_i, \xi, \zeta), \quad i = 1, \ldots, N_P, \tag{1}$$

each of which recorded the current value of the controlled input u_i = u(t_i) and the corresponding output y_i = y(t_i). The results y(t_i), t_i ∈ [t_0, t_f], of these observations together with the corresponding values of the controlled inputs u_i form a set of N_P ordered pairs:

$$\{u_i, y_i\}, \quad u_i \in U, \; y_i \in Y, \; i = 1, \ldots, N_P. \tag{2}$$


It is required to find, using the data (2), an approximation $\hat{\Phi}(\cdot)$ to the mapping Φ(·) implemented by the system S, such that the condition

$$\|\hat{\Phi}(u(t), \xi(t), \zeta(t)) - \Phi(u(t), \xi(t), \zeta(t))\| \leqslant \varepsilon, \quad \forall u(t_i) \in U, \; \forall \xi(t_i) \in \Xi, \; \forall \zeta(t_i) \in Z, \; t \in [t_0, t_f], \; x(t_0) = x_0 \tag{3}$$

is fulfilled. Thus, as follows from (3), it is necessary that the sought approximate mapping $\hat{\Phi}(\cdot)$ has the required accuracy not only when reproducing the observations (2), but also for all valid values of u_i ∈ U and for all valid initial conditions x(t_0) = x_0. We will call this property of the mapping $\hat{\Phi}(\cdot)$ generalizing. The entries ∀ξ(t_i) ∈ Ξ and ∀ζ(t_i) ∈ Z in (3) mean that the approximation $\hat{\Phi}(\cdot)$ will have the required accuracy provided that at any time instant t ∈ [t_0, t_f] the uncontrolled impacts ξ(t) on S and the measurement noises ζ(t) do not exceed the permissible limits.

The mapping Φ(·) corresponds to the considered modeling object (the dynamical system S), and the mapping $\hat{\Phi}(\cdot)$ will further be called the model of this object. We will also further assume that for the system S we have data of the form (2), and possibly some knowledge of the "design" of the mapping Φ(·) implemented by the considered system. The presence of data of this type is required in any case: at the least, they are required to test the $\hat{\Phi}(\cdot)$ model being created. Knowledge about the mapping Φ(·) may be unavailable, or it may be available but not used in the formation of the model $\hat{\Phi}(\cdot)$.

Since the available number of experiments generating the set (2) is finite, the norm ‖·‖ in the expression (3) will be treated as the mean square deviation of the form

$$\|\hat{\Phi}(u, \xi, \zeta) - \Phi(u, \xi, \zeta)\| = \frac{1}{N_P} \sum_{i=1}^{N_P} [\hat{\Phi}(u_i, \xi, \zeta) - \Phi(u_i, \xi, \zeta)]^2 \tag{4}$$

or of the form

$$\|\hat{\Phi}(u, \xi, \zeta) - \Phi(u, \xi, \zeta)\| = \sqrt{\frac{1}{N_P} \sum_{i=1}^{N_P} [\hat{\Phi}(u_i, \xi, \zeta) - \Phi(u_i, \xi, \zeta)]^2}. \tag{5}$$

Testing of the mapping $\hat{\Phi}(\cdot)$ to evaluate its generalizing properties is performed on a set of ordered pairs similar to (2):

$$\{\tilde{u}_j, \tilde{y}_j\}, \quad \tilde{u} \in U, \; \tilde{y} \in Y, \; j = 1, \ldots, N_T; \tag{6}$$

this requires that the condition $u_i \neq \tilde{u}_j$, ∀i ∈ {1, ..., N_P}, ∀j ∈ {1, ..., N_T}, is met, i.e. all pairs in the sets

$$\{u_i, y_i\}_{i=1}^{N_P}, \qquad \{\tilde{u}_j, \tilde{y}_j\}_{j=1}^{N_T}$$

should be non-matching.


The error on the test set (6) is calculated in the same way as for the training set (2):

$$\|\hat{\Phi}(\tilde{u}, \xi, \zeta) - \Phi(\tilde{u}, \xi, \zeta)\| = \frac{1}{N_T} \sum_{j=1}^{N_T} [\hat{\Phi}(\tilde{u}_j, \xi, \zeta) - \Phi(\tilde{u}_j, \xi, \zeta)]^2; \tag{7}$$

it can also be represented as

$$\|\hat{\Phi}(\tilde{u}, \xi, \zeta) - \Phi(\tilde{u}, \xi, \zeta)\| = \sqrt{\frac{1}{N_T} \sum_{j=1}^{N_T} [\hat{\Phi}(\tilde{u}_j, \xi, \zeta) - \Phi(\tilde{u}_j, \xi, \zeta)]^2}. \tag{8}$$

Now we can formulate the problem of forming a model of the dynamical system S. We need to build a model $\hat{\Phi}(\cdot)$ which reproduces, with the required level of accuracy, the mapping Φ(·) realized by the system S, i.e., a model $\hat{\Phi}(\cdot)$ for which the magnitude of the modeling error (7) or (8) on the test set (6) does not exceed the permissible value ε specified in (3). This formation should be based on the data (2) used to train the model, on the data (6) used to test it, and, possibly, on knowledge about the system S. A minimal sketch of the resulting adequacy check appears below.
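For concreteness, here is a minimal Python sketch of this check; the names are ours, as the paper itself prescribes no implementation:

```python
import numpy as np

def rmse(y_model, y_system):
    """Test error in the form (8): root of the mean squared deviation."""
    y_model = np.asarray(y_model, dtype=float)
    y_system = np.asarray(y_system, dtype=float)
    return np.sqrt(np.mean((y_model - y_system) ** 2))

def model_is_adequate(y_model_test, y_test, eps):
    """Check of condition (3) evaluated on the test set (6)."""
    return rmse(y_model_test, y_test) <= eps
```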

3 The Main Problems that Need to Be Solved in the Formation of a Dynamical System Model

When developing a model of a dynamical system, several problems arise that need to be solved. Namely, we need to form:

– a set of quantities characterizing the object being modeled;
– a class (family) of models which includes the desired model;
– a representative (informative) set of experimental data for the formation and testing of the model;
– a tool for selecting a specific model from the given class (a criterion of model adequacy and an algorithm for its search).

Briefly, we can characterize these problems as follows.

Formation of a Set of Quantities Characterizing the Modeled Object. The first thing that needs to be done when forming a model of a dynamical system is to reveal the set of quantities characterizing the system under consideration. This problem refers to the statement of the modeling problem and is not considered further. We assume that the task has already been stated, i.e., a decision has already been made as to which quantities should be taken into account in the simulation.

Formation of a Family of Models that Includes the Desired Model. To solve the problem of modeling a dynamical system, we first have to form some set of variants (a family) $\hat{\Phi}^{(F)} = \{\hat{\Phi}_j(\cdot)\}$, j = 1, 2, .... Then we need to choose the best, in a certain sense, variant of the model $\hat{\Phi}_*(\cdot)$. As already noted, when solving this part of the problem of modeling a dynamical system, it is necessary to answer the following two questions:


– What is the desired family of variants $\hat{\Phi}^{(F)} = \{\hat{\Phi}_j(\cdot)\}$, j = 1, 2, ...?
– How to choose from the family $\hat{\Phi}^{(F)}$ the variant $\hat{\Phi}_*(\cdot)$ that satisfies the condition $\|\hat{\Phi}(u(t), \xi, \zeta) - \Phi(u(t), \xi, \zeta)\| \leqslant \varepsilon$, t ∈ [t_0, t_f], ∀u ∈ U, ξ ∈ Ξ, ζ ∈ Z?

The main ideas which are further used to answer these questions are as follows:

– when forming the set of options $\hat{\Phi}^{(F)}$, the key is an efficient structurization and parameterization of this family of models;
– when choosing some variant $\hat{\Phi}_*(\cdot)$ from the family $\hat{\Phi}^{(F)}$, the key is machine learning.

Generation of a Representative Set of Experimental Data for the Formation and Testing of the Model. One of the essential components of the process of forming a dynamical system model is the acquisition of a data set which completely characterizes the behavior of the considered system. The success of solving a simulation problem depends to a considerable extent on how informative the existing training set is.

Forming a Tool for Selecting a Particular Model from the Given Class. After a family of models has been formed for the dynamical system under consideration, as well as a representative set of data describing its behavior, it is necessary to define a tool that allows us to "extract" from this family a specific model that satisfies a prescribed set of requirements. As such a tool, within the framework of the approach under consideration, it is quite natural to use the means of neural network learning.

4 Neural Network Semi-empirical Models of Controllable Dynamical Systems

As already noted, empirical ANN-models have severe limitations on the complexity level of the problems to be solved. We propose to address this problem in the class of modular semi-empirical dynamic models combining the capabilities of theoretical and neural network modeling. The formation and step-by-step adjustment of such ANN-models is discussed in more detail in [11,12], which also provide a comparison of the accuracy characteristics of semi-empirical and empirical models.

The Relationship Between Empirical and Semi-empirical Models of Dynamical Systems. Purely empirical models (black box models) are based only on experimental data obtained by observing the behavior of the simulated system [10,17,18]. This approach is typical for traditional neural network modeling. In some cases, it may be the only possible one, if there is no a priori knowledge about the nature of the system being modeled or about the mechanisms of its functioning. However, this kind of knowledge is often present. In particular, there are numerous models of motion for objects of various types (aircraft, ships, cars, etc.) based on the laws of mechanics and, in some cases, on laws from other fields of science. For example, when modeling the motion of aircraft at high


supersonic speeds, when thermal phenomena begin to play a significant role, we need to include in the motion model not only relations based on the laws of mechanics but also relations from the fields of thermodynamics and heat transfer. Models of this kind, especially those derived directly from the fundamental laws of nature ("from first principles"), play a crucial role in all areas of science and technology.

The formation of such models, however, is associated with severe difficulties. We need to have appropriate knowledge about the object being modeled, which is not always possible. Besides, even if such a model exists, for example, an aircraft motion model, it may be unsuitable for solving some specific task. Firstly, this model may contain quantities and dependencies whose values carry significant uncertainties, which accordingly prevents obtaining an accurate and reliable solution. Secondly, even if the model is fully formed and there are no uncertainties in it, it may be unsuitable for solving real-world applied problems. For example, if we want to simulate the motion of some object in real time with high accuracy, the traditional model of motion in the form of a system of differential equations (ordinary or partial), which is solved using appropriate methods of numerical integration, may require an unacceptably long time to obtain solutions. As shown above, approximate empirical models are formed to overcome these difficulties. One of the most effective ways to obtain such models is the neural network approach.

The models called theoretical ("white box") are directly opposed to purely empirical models ("black box") in the principles of their formation. Empirical data are involved in the process of obtaining a theoretical model only indirectly, as a source of information about the system and the nature of its behavior. This information makes it possible to choose the appropriate class of relationships that describe the modeled system behavior, but the empirical data themselves are not used when forming these relationships. In contrast, empirical models are based solely on experimental data. They are formed in such a way as to fit this data in the best possible way, i.e., to reproduce it with the least error.

Empirical models, while overcoming the difficulties associated with theoretical models, cause, in turn, new challenges that did not exist for models of the theoretical type. In particular, learning these models requires appropriate training data sets, the acquisition of which can be a complicated task. In the case when there are no data on the object being modeled other than experimental observations of its behavior, nothing remains but to try to obtain an empirical model of the object. But when, in addition to empirical data, there is also some knowledge about the object, for example, in the form of equations of motion, albeit with uncertain factors in them, we should try to use the theoretical knowledge as complementary to the available experimental data. Such a combined, compromise approach can be called semi-empirical simulation (gray box simulation) [19-22]. In comparison with a purely theoretical approach, its application allows one to increase the accuracy of modeling due to the fact that the negative impact of elements of a theoretical model that cannot be


adequately described, due to a lack of relevant knowledge, can be compensated by converting this model into a semi-empirical form and refining it by training on the available experimental data. As applied to purely empirical models, taking existing theoretical knowledge into account through the transition to semi-empirical models allows us to simplify the process of forming models that meet the specified requirements and also, which is very important, to reduce the amount of experimental data required to train the model. Moreover, the more theoretical knowledge is involved, the less experimental data is necessary.

The General Scheme of the Formation of Semi-empirical ANN-Models. The proposed approach consists of using theoretical knowledge about the simulated dynamical system to improve the model being formed, jointly with structural transformations and learning of the theoretical model transformed into a neural network form.

We take into account theoretical knowledge of two types: about the object of modeling and about appropriate computational methods. Model refinement is performed through neural network learning. As a result, we form a dynamic ANN-model whose architecture takes into account the existing knowledge about the object of modeling. Traditional neural network models, as has been repeatedly noted, are purely empirical (black box); they are based only on experimental data on the behavior of the system [23]. The dynamic modular networks considered below, which rely both on experimental data and on the available theoretical knowledge, can be classified as semi-empirical models (gray box) [19,20].

The formation of dynamic networks with a modular architecture in the form of semi-empirical ANN-models consists of the following stages [11,12]:

(1) formation of a theoretical model with continuous time for the studied dynamical system, and acquisition of the available experimental data on the behavior of this system;
(2) evaluation of the accuracy of the theoretical model of the dynamical system on the available data; in case of insufficient accuracy, hypothesizing the reasons for this and possible ways to eliminate them;
(3) conversion of the source model with continuous time to a model with discrete time;
(4) formation of a neural network representation for the obtained model with discrete time;
(5) learning of the neural network model;
(6) assessment of the accuracy of the trained neural network model;
(7) adjustment, in case of insufficient accuracy, of the neural network model by introducing structural changes into it.

The issues of structural formation of semi-empirical ANN-models, as well as the comparison of their properties with the properties of traditional (empirical) ANN-models, are discussed in more detail in [24].

An Example of a Semi-empirical ANN-Model. An assessment of the performance of the ANN-model under consideration was carried out for the


aircraft angular longitudinal motion, which is described using a mathematical model traditional for flight dynamics [25,26]:

$$\dot{\alpha} = q - \frac{\bar{q}S}{mV}\, C_L(\alpha, q, \varphi) + \frac{g}{V},$$
$$\dot{q} = \frac{\bar{q}Sc}{J_y}\, C_m(\alpha, q, \varphi), \tag{9}$$
$$T^2 \ddot{\varphi} = -2T\zeta\dot{\varphi} - \varphi + \varphi_{act},$$

where α is the angle of attack, deg; q is the pitch angular velocity, deg/sec; φ is the deflection angle of the elevator or all-moving horizontal tail, deg; C_L is the lift coefficient; C_m is the pitching moment coefficient; m is the mass of the aircraft, kg; V is the airspeed, m/sec; $\bar{q} = \rho V^2/2$ is the airplane dynamic pressure; ρ is the mass air density, kg/m³; g is the acceleration of gravity, m/sec²; S is the wing area of the aircraft, m²; c is the mean aerodynamic chord, m; J_y is the pitching moment of inertia, kg·m²; T and ζ are the time constant and relative damping factor of the elevator actuator; φ_act is the command signal value for the elevator (or all-moving horizontal tail) actuator, limited to ±25°.
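As a sketch only, the right-hand side of (9) can be written out as follows; the parameter dictionary, the function names, and the glossing-over of degree/radian conversions are our simplifications:

```python
import numpy as np

def rhs(state, phi_act, CL, Cm, p):
    """Right-hand side of the motion model (9); a sketch, not flight-grade code.

    state   -- (alpha, q, phi, phi_dot); degree/radian handling is glossed over
    phi_act -- actuator command signal, limited to +/-25 deg
    CL, Cm  -- lift and pitching moment coefficients as functions of (alpha, q, phi)
    p       -- dict of aircraft parameters: m, V, rho, g, S, c, Jy, T, zeta
    """
    alpha, q, phi, phi_dot = state
    qbar = 0.5 * p["rho"] * p["V"] ** 2            # dynamic pressure
    phi_act = np.clip(phi_act, -25.0, 25.0)
    alpha_dot = q - qbar * p["S"] / (p["m"] * p["V"]) * CL(alpha, q, phi) + p["g"] / p["V"]
    q_dot = qbar * p["S"] * p["c"] / p["Jy"] * Cm(alpha, q, phi)
    phi_ddot = (-2.0 * p["T"] * p["zeta"] * phi_dot - phi + phi_act) / p["T"] ** 2
    return np.array([alpha_dot, q_dot, phi_dot, phi_ddot])
```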

Fig. 2. Structure of the semi-empirical ANN-model for the dynamical system (9) according to the Euler difference scheme

In the model (9), the values α, q, φ and φ̇ are the states of the controlled object, and the variable φ_act is the control. We consider the F-16 maneuverable aircraft as an example of a specific object of modeling. The source data for this aircraft were taken from [27]. A block diagram of a semi-empirical model based on (9) is shown in Fig. 2. Here, the Euler method of integrating ordinary differential equations was used


Fig. 3. Structure of an empirical NARX-type ANN-model for the dynamical system (9)

to transform the original model with continuous time into a model with discrete time. For comparison, Fig. 3 shows, for the same model, a block diagram based on the NARX network. In both of these schemes, the links whose synaptic weights are the adjustable parameters of the model are highlighted in red.
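Reusing the rhs sketch above, one Euler step of the semi-empirical model can be outlined as follows; here CL_net and Cm_net stand for the trainable modules, and this is our illustration, not the Matlab implementation used by the authors:

```python
import numpy as np

def euler_step(state, phi_act, CL_net, Cm_net, p, dt=0.02):
    """One step of the discrete-time semi-empirical model (cf. Fig. 2).

    CL_net and Cm_net are the trainable "black box" modules standing in for
    CL(alpha, q, phi) and Cm(alpha, q, phi); the rest of the step is the fixed
    theoretical structure of (9) and is not adjusted during learning.
    """
    return state + dt * rhs(state, phi_act, CL_net, Cm_net, p)

def simulate(x0, controls, CL_net, Cm_net, p, dt=0.02):
    """Roll the discrete model over a command sequence, as the ANN-model does."""
    traj = [np.asarray(x0, dtype=float)]
    for u in controls:
        traj.append(euler_step(traj[-1], u, CL_net, Cm_net, p, dt))
    return np.stack(traj)
```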

5 Generation of Training Sets for ANN-Modeling of Dynamical Systems

To obtain training data, we use an approach based on a set of specially organized test control actions applied to the dynamical system. With this approach, the actual motion of the dynamical system (x(t), u(t)) consists of the program motion (x*(t), u*(t)) (the test maneuver) caused by the control signal u*(t), and the motion (x̃(t), ũ(t)) generated by the additional excitation ũ(t):

$$x(t) = x^*(t) + \tilde{x}(t), \quad u(t) = u^*(t) + \tilde{u}(t). \tag{10}$$

As examples of test maneuvers for the aircraft, we can name:

– straight horizontal flight at a constant speed;
– flight with a monotonically increasing angle of attack;
– turn in a horizontal plane;
– ascending/descending spiral.

The type of test maneuver (x*(t), u*(t)) in (10) determines the resulting ranges of values of the state and control variables; the type of excitation ũ(t) specifies the variety of examples within these ranges.


Fig. 4. Test disturbances as functions of time used in studying the dynamics of controlled systems: (a) a random signal; (b) a polyharmonic signal. Here φ_act is the actuator command signal for the all-moving horizontal tail of the maneuverable aircraft from the example (9).

As was shown in the work of Schröder [28] (see also [29,30]), in this case it is advisable to use a polyharmonic signal as the excitation. An example of such a signal is shown in Fig. 4b. The mathematical model of such a signal u_j acting on the j-th control is a harmonic polynomial

$$u_j = \sum_{k \in I_k} A_k \sin\left(\frac{2\pi k t}{T} + \varphi_k\right), \quad I_k \subset K, \; K = \{1, 2, \ldots, M\}, \tag{11}$$

which is a finite linear combination of the main harmonic A_1 sin(ωt + φ_1) and the higher-order harmonics A_2 sin(2ωt + φ_2), A_3 sin(3ωt + φ_3), etc. If the phase angles φ_k in (11) are randomly selected in the interval (−π, π], then the individual harmonic components, being summed up, can give at certain points t(i) an amplitude of the total signal u_j(i) that violates the conditions of proximity of the perturbed motion to the reference one. This undesirable phenomenon is prevented by an appropriate selection of the phase shift values φ_k.

One more typical excitation signal is a random one; an example of it is shown in Fig. 4a. The values of this signal are kept constant on the time intervals [t_i, t_{i+1}), i = 0, 1, ..., n − 1. At the time instants t_i, these values may change randomly.
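A hedged sketch of generating such a signal follows; the Schroeder phase recipe shown here is one common low-peak choice and is our addition, since the paper only requires phases that keep the total amplitude small:

```python
import numpy as np

def polyharmonic(t, T, amplitudes, phases):
    """Excitation signal (11): a finite sum of harmonics of the base period T."""
    t = np.asarray(t, dtype=float)
    A = np.asarray(amplitudes, dtype=float)[:, None]
    ph = np.asarray(phases, dtype=float)[:, None]
    k = np.arange(1, A.shape[0] + 1)[:, None]
    return np.sum(A * np.sin(2.0 * np.pi * k * t[None, :] / T + ph), axis=0)

def schroeder_phases(M):
    """One common low-peak choice of phase angles (Schroeder phases)."""
    k = np.arange(1, M + 1)
    return -np.pi * k * (k - 1) / M

t = np.arange(0.0, 20.0, 0.02)                   # 20 s record, dt = 0.02 s
u = polyharmonic(t, T=20.0, amplitudes=np.ones(10), phases=schroeder_phases(10))
```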

6 Algorithms for Learning ANN-Models

A number of problems arise when learning dynamic ANN-models in the form of recurrent neural networks. The main sources of difficulties are the following:

– bifurcations of the operating modes of the network when the values of the model's tunable parameters (synaptic weights, biases, internal parameters of neurons) change in the process of learning the ANN-model [31];
– the presence of long-term dependencies of the network outputs on the inputs and states of the ANN-model at previous time instants [32,33];
– a very complicated landscape of the error function, carved by numerous deep, narrow and curved gorges, and often having plateaus [34].

Bifurcation of Network Dynamics. In the theory of nonlinear dynamical systems, a bifurcation is a qualitative restructuring of the functioning modes of a dynamical system under a small change of its parameters [35]. A bifurcation of the network dynamics is a qualitative change in the dynamic properties and behavior of the ANN-model under small changes of its adjustable parameters (synaptic weights, biases and, in some cases, internal parameters of neurons). In terms of neural network learning, this means that the landscape of the error function changes abruptly and significantly.

Long-Term Dependencies. When learning dynamic networks, the so-called problem of long-term dependencies arises, because the output of the ANN-model depends on its inputs and states at previous time instants, including those far from the current point in time. Gradient methods of searching for the minimum of the error function behave unsatisfactorily in this case. The reason for this behavior is clarified by the analysis of the asymptotic behavior of the learning error and its gradient in the backpropagation process [32,33], which shows that the values of these quantities decrease rapidly (exponentially, as a rule).

Complicated Landscape of the Error Function. One of the most important reasons for the difficulties in learning dynamic ANN-models is the very complicated relief of the error function, carved by numerous deep, narrow, and curved gorges. This reason is the most difficult one for the implementation of the ANN-model learning process. In this case, the determining factor is the number of examples in the training set, and we can only rely on an approach to working with the training data which allows the number of examples used to be increased consistently. The problem of learning a recurrent ANN-model, taking into account the complicated relief of the error function, can be solved in various ways [36-38]. These methods include the following:

– regularization J(w) = SSE + α · SSW, where SSE is the total mean square error of the network and SSW is the sum of the squares of the weights;


– random variation of the starting weights;
– combination of regular and genetic search;
– segmentation of the training sequence (changing the input network data changes the location of the gorges on the relief of the error function).

Of the above approaches, in complex problems (a nonlinear multi-parameter mapping implemented by the network, combined with a large training set characterizing the complex behavior of a dynamical system), only the last one, based on the segmentation of the training sequence, has sufficient efficiency. The essence of this approach is as follows. Due to the very complicated relief of the error function, a global minimum can be found by gradient optimization methods only for a small set of initial values of the network parameters. If we proceed to solve the problem of finding initial parameter values that are sufficiently close to the minimum, then we can regard them as solutions of similar problems. That is, we need to generate a sequence of tasks such that:

– the first task is quite simple, and we can find its solution for any initial parameter values;
– each subsequent task is similar to the previous one, i.e. their solutions are close in the parameter value space;
– the sequence converges to the original, required task.

The solution of the individual subtasks from this sequence can be performed using the algorithms most often used for learning dynamic networks [17,18,39], such as:

– Back Propagation Through Time (BPTT);
– Real-Time Recurrent Learning (RTRL);
– Extended Kalman Filter (EKF).

The main features of these algorithms are analyzed in [17]. This approach demonstrated high efficiency in a series of computer experiments and was successfully applied to solve several problems of modeling and identification of dynamical systems. We discuss an example of such an application in the next section; a minimal sketch of the segmentation loop is given first.
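The sketch below assumes a generic train_fn standing in for any of the learners listed above; all names here are ours, not a real library API:

```python
def segment_curriculum(inputs, targets, params, train_fn, segment_lengths):
    """Sketch of training-sequence segmentation: a chain of similar problems
    with a growing horizon, each warm-started from the previous solution.

    train_fn(params, u, y) -- an assumed callable wrapping any learner for
    recurrent models (BPTT-, RTRL- or EKF-based); not a real library API.
    """
    for L in segment_lengths:                  # e.g. [50, 100, 200, ..., N]
        params = train_fn(params, inputs[:L], targets[:L])
    return params
```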

7 Neural Network Semi-empirical Modeling of Aircraft Motion

In this section, using the example of the longitudinal angular motion of a maneuverable aircraft, the high efficiency of semi-empirical ANN-models (gray box models) in solving applied problems is demonstrated. The theoretical model is the corresponding traditional aircraft motion model in the form of the system of ordinary differential equations (9). The semi-empirical ANN-model formed in this particular example includes two elements of the "black box" type (see Fig. 2), describing the dependencies of the lift and pitching moment coefficients on the state variables (angle of attack, pitch angular velocity, and deflection angle of the controlled stabilizer). We need to restore these coefficients based on the available experimental data for the observed state variables of the dynamical system.

Fig. 5. Estimation of the restoration accuracy for the dependencies C_L(α) and C_m(α) based on the results of testing the ANN-model (point mode, identification and testing using a polyharmonic signal). The output values of the object (9) and the ANN-model are shown by a blue line and a green line, respectively.

As a specific simulation object, the F-16 maneuverable aircraft was considered; its source data were taken from [27]. A computational experiment with the model (9) to obtain a training set was conducted for the time interval t ∈ [0, 20] sec with discretization step Δt = 0.02 sec for a partially observable state vector y(t) = [α(t); ω_z(t)]^T, with additive white noise with a standard deviation of σ = 0.01 affecting the output of the system y(t). The test excitations used in studying the dynamics of the system (9) are shown in Fig. 4. As the target value of the simulation error, we use the standard deviation of the additive noise acting on the system output. Training on the sample {y_i}, i = 1, …, N, obtained with the source model (9), is carried out in Matlab for networks in the form of LDDN (Layered Digital Dynamic Networks) using the Levenberg-Marquardt algorithm with the mean square error of the model as the criterion. The Jacobian matrix is computed using the RTRL (Real-Time Recurrent Learning) algorithm [27]. An extensive series of computational experiments was carried out comparing the effectiveness of test signals for two types of test maneuvers: straight-line horizontal flight at a constant speed and flight with a monotonically increasing angle of attack. As a typical example, Fig. 5 shows how accurately the unknown


dependencies (the nonlinear functions CL(α) and Cm(α)) are restored. The accuracy of the obtained semi-empirical ANN-model using these dependencies was evaluated in comparison with the original system (9), which used the exact representations of the functions CL(α) and Cm(α). The results for the two models are so close that the curves in the graphs almost coincide. Numerical estimates of the accuracy of the obtained models are given in Table 1 (accuracy on the training set) and Table 2 (the generalizing properties of the ANN-model); both tables also compare the obtained models with NARX models. Table 3 compares the values of the simulation error for the considered semi-empirical model of the longitudinal angular motion of the aircraft depending on the type of excitation signal. The corresponding results for the empirical NARX model are much less accurate; in particular, for the polyharmonic signal, RMSEα = 1.3293 and RMSEq = 2.7445.

Table 1. Simulation error on the training set (polyharmonic signal)

Problem            Point mode (RMSEα / RMSEq)    Monotonic mode (RMSEα / RMSEq)
Adjusting CL       1.02·10⁻³ / 1.24·10⁻⁴         1.02·10⁻³ / 1.24·10⁻⁴
Learning CL        1.02·10⁻³ / 1.23·10⁻⁴         1.02·10⁻³ / 1.24·10⁻⁴
Learning CL, Cm    1.02·10⁻³ / 1.19·10⁻⁴         1.02·10⁻³ / 1.27·10⁻⁴
NARX simulation    1.85·10⁻³ / 3.12·10⁻³         1.12·10⁻³ / 7.36·10⁻⁴

Table 2. Simulation error on the test set (polyharmonic signal)

Problem            Point mode (RMSEα / RMSEq)    Monotonic mode (RMSEα / RMSEq)
Adjusting CL       1.02·10⁻³ / 1.59·10⁻⁴         1.02·10⁻³ / 1.17·10⁻⁴
Learning CL        1.02·10⁻³ / 1.59·10⁻⁴         1.02·10⁻³ / 1.17·10⁻⁴
Learning CL, Cm    1.02·10⁻³ / 1.32·10⁻⁴         1.02·10⁻³ / 1.59·10⁻⁴
NARX simulation    2.32·10⁻² / 4.79·10⁻²         3.16·10⁻² / 5.14·10⁻²

Table 3. Simulation error on the test set for the semi-empirical model and three types of excitation signals

Signal         Point mode (RMSEα / RMSEq)    Monotonic mode (RMSEα / RMSEq)
Doublet        0.0202 / 0.0417               8.6723 / 34.943
Random         0.0041 / 0.0071               0.0772 / 0.2382
Polyharmonic   0.0029 / 0.0076               0.0491 / 0.1169


In Tables 1, 2 and 3, straight-line horizontal flight at a constant speed is denoted as the point mode, and flight with a monotonically increasing angle of attack as the monotonic mode. In addition, the term "learning" for the corresponding aerodynamic coefficients denotes the problem of restoring the corresponding unknown function from scratch, i.e., assuming no prior information about the possible values of these coefficients. The term "adjusting" denotes the task of refining the values of a coefficient known, for example, from the results of wind tunnel tests.

8 Conclusions

The obtained results allow us to conclude that the methods of semi-empirical neural network modeling, which combine knowledge and experience from the relevant subject area with the tools of traditional computational modeling, are a powerful and promising tool potentially suitable for solving complex problems of describing and analyzing the controlled motion of aircraft. Comparison of the results obtained within the framework of the semi-empirical approach with those obtained by traditional ANN-modeling (NARX-type models) shows the undeniable advantages of semi-empirical models.

Acknowledgments. This research is supported by the Ministry of Science and Higher Education of the Russian Federation as Project No. 9.7170.2017/8.9.

References

1. Hangos, K.M., Bokor, J.: Analysis and Control of Nonlinear Process Systems. Springer, Berlin (2004)
2. Kulakowski, B.T., Gardner, J.F., Shearer, J.L.: Dynamic Modeling and Control of Engineering Systems, 3rd edn. Cambridge University Press, Oxford (2007)
3. Scott, L.R.: Numerical Analysis. Princeton University Press, New Jersey (2011)
4. Hairer, E., Norsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I: Nonstiff Problems, 2nd edn. Springer, Berlin (2008)
5. Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems, 2nd edn. Springer, Berlin (2002)
6. Tao, G.: Adaptive Control Design and Analysis. Wiley, New York (2003)
7. Ioannou, P.A., Sun, J.: Robust Adaptive Control. Prentice Hall, New Jersey (1995)
8. Astolfi, A., Karagiannis, D., Ortega, R.: Nonlinear and Adaptive Control with Applications. Springer, Berlin (2008)
9. Nelles, O.: Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models. Springer, Berlin (2001)
10. Billings, S.A.: Nonlinear System Identification: NARMAX Methods in the Time, Frequency and Spatio-temporal Domains. Wiley, New York (2013)
11. Egorchev, M.V., Kozlov, D.S., Tiumentsev, Yu.V., Chernyshev, A.V.: Neural network based semi-empirical models for controlled dynamical systems. J. Comput. Inf. Technol. 9, 3–10 (2013). [in Russian]


12. Egorchev, M.V., Kozlov, D.S., Tiumentsev, Yu.V.: Neural network adaptive semi-empirical models for aircraft controlled motion. In: Proceedings of the 29th Congress of the International Council of the Aeronautical Sciences, vol. 4 (2014)
13. Egorchev, M.V., Tiumentsev, Yu.V.: Learning of semi-empirical neural network model of aircraft three-axis rotational motion. Opt. Mem. Neural Netw. 24(3), 201–208 (2015)
14. Kozlov, D.S., Tiumentsev, Yu.V.: Neural network based semi-empirical models for dynamical systems described by differential-algebraic equations. Opt. Mem. Neural Netw. 24(4), 279–287 (2015)
15. Egorchev, M.V., Tiumentsev, Yu.V.: Semi-empirical neural network based approach to modelling and simulation of controlled dynamical systems. Procedia Comput. Sci. 123, 134–139 (2018)
16. Egorchev, M.V., Tiumentsev, Yu.V.: Neural network semi-empirical modeling of the longitudinal motion for maneuverable aircraft and identification of its aerodynamic characteristics. In: Advances in Neural Computation, Machine Learning, and Cognitive Research. Studies in Computational Intelligence, vol. 736, pp. 65–71 (2018)
17. Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall PTR, New Jersey (2006)
18. Hagan, M.T., Demuth, H.B., Beale, M.: Neural Network Design. PWS Publishing Company, New Orleans (1996)
19. Oussar, Y., Dreyfus, G.: How to be a gray box: dynamic semi-physical modeling. Neural Netw. 14(9), 1161–1172 (2001)
20. Dreyfus, G.: Neural Networks - Methodology and Applications. Springer, Berlin (2005)
21. Bohlin, T.: Practical Grey-Box Identification: Theory and Applications. Springer, Berlin (2006)
22. Chen, Z., Wei, J., Jiang, R.: A gray-box neural network based model identification and fault estimation scheme for nonlinear dynamic systems. Int. J. Neural Syst. 23(6), 1–15 (2013)
23. Rivals, I., Personnaz, L.: Black-box modeling with state-space neural networks. In: Zbikowski, R., Hint, K.J. (eds.) Neural Adaptive Control Technology, pp. 237–264. World Scientific (1996)
24. Brusov, V.S., Tiumentsev, Yu.V.: Neural Network Based Modeling of Aircraft Motion. The MAI Publishing House, Moscow (2016). [in Russian]
25. Cook, M.V.: Flight Dynamics Principles. Elsevier, Amsterdam (2007)
26. Hull, D.G.: Fundamentals of Airplane Flight Mechanics. Springer, Berlin (2007)
27. Nguyen, L.T., Ogburn, M.E., Gilbert, W.P., Kibler, K.S., Brown, P.W., Deal, P.L.: Simulator study of stall/post-stall characteristics of a fighter airplane with relaxed longitudinal static stability. Technical Report TP-1538, NASA, December 1979
28. Schroeder, M.R.: Synthesis of low-peak-factor signals and binary sequences with low autocorrelation. IEEE Trans. Inf. Theory 16(1), 85–89 (1970)
29. Morelli, E.A., Klein, V.: Real-time parameter estimation in the frequency domain. J. Guidance Control Dyn. 23(5), 812–818 (2000)
30. Smith, M.S., Moes, T.R., Morelli, E.A.: Flight investigation of prescribed simultaneous independent surface excitations for real-time parameter identification. In: AIAA Paper No. 2003-5702 (2003)
31. Doya, K.: Bifurcations in the learning of recurrent neural networks. IEEE Int. Symp. Circuits Syst. 6, 2777–2780 (1992)
32. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)


33. Schaefer, A.M., Udluft, S., Zimmermann, H.-G.: Learning long-term dependencies with recurrent neural networks. Neurocomputing 71(13–15), 2481–2488 (2008)
34. De Jesus, O., Horn, J.M., Hagan, M.T.: Analysis of recurrent network training and suggestions for improvements. In: Proceedings of IJCNN, vol. 4, pp. 2632–2637 (2001)
35. Seydel, R.: From Equilibrium to Chaos: Practical Bifurcation and Stability Analysis. Elsevier, Amsterdam (1988)
36. Phan, M.C., Hagan, M.T.: Error surface of recurrent neural networks. IEEE Trans. Neural Netw. 24(11), 1709–1721 (2009)
37. Horn, J., De Jesus, O., Hagan, M.T.: Spurious valleys in the error surface of recurrent networks: analysis and avoidance. IEEE Trans. Neural Netw. 20(4), 686–700 (2009)
38. Elman, J.L.: Learning and development in neural networks: the importance of starting small. Cognition 48(1), 71–99 (1993)
39. Mandic, D.P., Chambers, J.A.: Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability. Wiley, New York (2001)

Artificial Intelligence

Photovoltaic System Control Model on the Basis of a Modified Fuzzy Neural Net

Ekaterina A. Engel and Nikita E. Engel

Katanov State University of Khakassia, Shetinkina 61, 655017 Abakan, Russian Federation
[email protected]

Abstract. This paper presents a photovoltaic system control model on the basis of a modified fuzzy neural net. Based on the photovoltaic system condition, the modified fuzzy neural net provides maximum power point tracking under random perturbations. The architecture of the modified fuzzy neural net was evolved using a neuro-evolutionary algorithm. The validity and advantages of the proposed photovoltaic system control model are demonstrated using numerical simulations. The simulation results show that the proposed control model achieves real-time control speed and competitive performance, as compared to a classical control scheme with a PID controller based on the perturbation & observation or incremental conductance algorithm.

Keywords: Modified fuzzy neural net · Random perturbations · Photovoltaic system · Maximum power point tracking


1 Introduction

The Republic of Khakassia is one of the most promising regions for the development of solar power systems in the Russian Federation. The annual average solar insolation for the town of Abakan is about 1450 kWh/sq.m [1], which exceeds the values for the European part of the Russian Federation (about 1200–1450 kWh/sq.m). But photovoltaic (PV) systems are not stable, due to the complex dynamics of solar irradiance fluctuations. Therefore, maximum power point tracking (MPPT) algorithms play an important role in solar power generation. We consider a non-linear MPPT problem for PV systems. A PV system is non-linear and commonly suffers from restrictions imposed by sudden variations in the solar irradiance level. Within the research literature, a whole array of differing MPPT algorithms has been proposed [2]. Among them, the perturbation & observation (P&O) and incremental conductance (IC) algorithms are the most common due to their simplicity and easy implementation. But PV system controllers based on the P&O or IC algorithm have slow response times to changing reference commands, take considerable time to settle down from oscillating around the target reference state, and must often be designed by hand. Moreover, the PV system control model should be robust to different environmental conditions in order to reliably generate maximum power. Therefore, automatic intelligent algorithms such as fuzzy neural networks are promising alternatives [3].


Real-life PV systems have complex dynamics due to random variation of the system parameters and fluctuation of the solar irradiance. Thus, neural-network-based solutions have been proposed to approximate these complex dynamics [3]. But the neural network needs to become more adaptive. Adaptive behavior can be enabled by modifying the network into a recurrent neural network with fuzzy units. This forms the motivation for the development of a PV system control model on the basis of a modified fuzzy neural net (MFNN), as presented in this paper. Compared to existing fuzzy neural nets, including ANFIS, the MFNN includes recurrent neural networks and fuzzy units. The function approximation capabilities of a neural net are exploited to approximate a membership function.

2 The PV System Control on the Basis of a MFNN

In this article, the function approximation capabilities of a MFNN are exploited to approximate a nonlinear control law of a PV system. This paper considers the development of an effective maximum PV power point tracking algorithm on the basis of a MFNN that remains easy to implement. The proposed modified fuzzy neural net is capable of handling uncertainties both in the PV system parameters and in the environment.

2.1 Mathematical Modelling of a PV System

We design and simulate in the Octave environment a 20 kW PV module by implementing the following mathematical models of its electrical characteristics. The open-circuit voltage is the extreme voltage offered by a PV cell at zero current. We calculate the open-circuit voltage as

$$V = \frac{NKT}{Q}\,\ln\left(\frac{I_L - I_o}{I_o} + 1\right), \tag{1}$$

where $V$ is the open-circuit voltage, $N$ is the diode ideality constant, $K$ is the Boltzmann constant ($1.381 \times 10^{-23}$ J/K), $T$ is the temperature in Kelvin, $Q$ is the electron charge ($1.602 \times 10^{-19}$ C), $I_L$ is the light-generated current, same as $I_{ph}$ (A), and $I_o$ is the saturation diode current (A). We calculate the light-generated current as

$$I_L = \frac{G}{G_{ref}}\left(I_{Lref} + \alpha_{Isc}(T_c - T_{c\,ref})\right), \tag{2}$$

where $G$ is the radiation (W/m²), $G_{ref}$ is the radiation under standard conditions (1000 W/m²), $I_{Lref}$ is the photoelectric current under standard conditions (0.15 A), $T_{c\,ref}$ is the module temperature under standard conditions (298 K), and $\alpha_{Isc}$ is the temperature coefficient of the short-circuit current (0.0065 A/K). We calculate the reverse saturation current as

$$I_o = I_{or}\left(\frac{T}{T_{ref}}\right)^{3} \exp\left(\frac{Q E_g}{K N}\left(\frac{1}{T_{ref}} - \frac{1}{T}\right)\right), \tag{3}$$


where $I_{or} = I_{sh}/\exp(V_{ocn}/(N V_{tn}))$ is the saturation current, $I_o$ is the reverse saturation current, $N$ is the ideality factor (1.5), and $E_g$ is the band gap for silicon (1.10 eV). We calculate the short-circuit current as

$$I_{sh} = I_L - I_o\left(\exp\left(\frac{Q(V - I R_S)}{NKT}\right) - 1\right). \tag{4}$$

This PV module provides 20 kW under standard conditions (irradiance 1000 W/m², temperature 20 °C). This Octave model uses an MPPT system with a duty cycle that generates the required voltage to extract maximum power. A minimal numerical sketch of this electrical model is given below.
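The following Python sketch implements Eqs. (1)–(4) (the paper's own model is implemented in Octave). The physical constants are taken from the text; the series resistance `r_s` and the reference temperature used in Eq. (3) are illustrative assumptions.

```python
import numpy as np

K = 1.381e-23          # Boltzmann constant, J/K
Q = 1.602e-19          # electron charge, C
N = 1.5                # diode ideality factor
E_G = 1.10             # band gap for silicon, eV
G_REF = 1000.0         # standard irradiance, W/m^2
IL_REF = 0.15          # photoelectric current at standard conditions, A
TC_REF = 298.0         # module temperature at standard conditions, K
ALPHA_ISC = 0.0065     # temperature coefficient of short-circuit current, A/K

def light_current(g, t_c):
    """Eq. (2): light-generated current I_L."""
    return (g / G_REF) * (IL_REF + ALPHA_ISC * (t_c - TC_REF))

def reverse_saturation_current(i_or, t, t_ref=TC_REF):
    """Eq. (3): reverse saturation current I_o (Q*E_G converts eV to J)."""
    return i_or * (t / t_ref) ** 3 * np.exp(Q * E_G / (K * N) * (1 / t_ref - 1 / t))

def cell_current(v, i, i_l, i_o, t, r_s=0.01):
    """Eq. (4): diode-law current balance; r_s is an assumed series resistance."""
    return i_l - i_o * (np.exp(Q * (v - i * r_s) / (N * K * t)) - 1.0)

def open_circuit_voltage(i_l, i_o, t):
    """Eq. (1): open-circuit voltage at zero current."""
    return N * K * t / Q * np.log((i_l - i_o) / i_o + 1.0)
```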

2.2 The PV System Control Model on the Basis of a MFNN

The MFNN is trained on the data

$$Z^i = \left(x^i = (Ir^i, V^i, P^i, dI/dV^i),\; s^i = (\Delta Ir^i, dI/dV^i),\; D^i\right), \tag{5}$$

where $i \in \{1, \ldots, 10^6\}$; $I$ and $V$ represent the current and voltage, respectively; $D^i$ is the duty cycle of the boost converter; $dI$ and $dV$ represent the current error and voltage error before and after the increment, respectively; $Ir$ represents the solar irradiance; $\Delta Ir^i = Ir_0^i - Ir_1^i$, where $Ir_0^i$ is the irradiance before the increment and $Ir_1^i$ is the irradiance after the increment; $P$ is the PV system power; $x^i$ is the input signal of the MFNN; and $D^i$ is the control signal. The data (5) comprise a training set of 8 × 10⁵ examples and a test set of 2 × 10⁵ examples.

The construction of the MFNN can briefly be described as follows.

Step 1. All samples $s^i$ of the data (5) were classified into two groups according to the speed of change of the PV system conditions: $A_1$ is a sudden change ($C_1^i = 1$); $A_2$ is a smooth change ($C_2^i = 1$). This classification generates a vector with elements $C^i$.

Step 2. We trained a two-layer network $Y(s^i)$ (with 2 hidden neurons). The vector $s^i$ was the network's input; the vector $C^i$ was the network's target. We formed the membership functions $\mu_j(s)$ based on the two-layer network $Y(s^i)$ as follows (a small sketch of this construction is given after the equation):

$$\mu_1(s^i) = \begin{cases} Y(s^i), & Y(s^i) \ge 0 \\ 0, & Y(s^i) < 0 \end{cases} \qquad \mu_2(s^i) = \begin{cases} |Y(s^i)|, & Y(s^i) < 0 \\ 0, & Y(s^i) \ge 0 \end{cases} \tag{6}$$
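A minimal sketch of the membership construction (6), assuming `Y` is a callable wrapping the trained two-layer classifier and returning a scalar:

```python
def memberships(Y, s):
    """Eq. (6): membership degrees of A1 (sudden change) and A2 (smooth change).

    Y is an assumed callable wrapping the trained two-layer network.
    """
    y = Y(s)
    mu1 = y if y >= 0 else 0.0        # degree of membership in A1
    mu2 = abs(y) if y < 0 else 0.0    # degree of membership in A2
    return mu1, mu2
```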

This step provides the fuzzy sets $A_j$ ($A_1$ is a sudden change of the PV system conditions, $A_2$ is a smooth change of the PV system conditions) with membership functions $\mu_j(s)$, $j = 1..2$.

Step 3. We created the MFNN based on the data (5). The MFNN includes two recurrent neural networks $F_j$ (number of delays is 2), $j = 1..2$. The parameters of the MFNN architecture (the number of nodes in the hidden layer and the corresponding weights and biases) have been coded into particles $X$. The dimension component of a particle $X$ is $d_h = 12h + 2 \in \{D_{min} = d_1 = 14, \ldots, D_{max} = d_{10} = 122\}$. For the PV system control to become adaptive, it needs to have some idea of how the actual PV system behavior differs from its expected behavior, so that the recurrent neural network $F_j$ can recalibrate its behavior intelligently at run time and try to eliminate the constant tracking error. We therefore give the recurrent neural network $F_j(\mu_j(s), x)$ an extra input $\mu_j(s)$, which corresponds to the value of the membership function $\mu_j(s)$. This input signal of the recurrent neural networks $F_j(\mu_j(s), x)$ gives useful feedback for providing the maximum PV power under dynamically changing PV system conditions. This control approach provides a more intelligent algorithm for generating the control signal $u$ on the basis of a MFNN. We evaluated the fitness function as

$$f(D, u) = \frac{1}{H} \sum_{l=1}^{H} |D - u|, \tag{7}$$

where $H$ is the number of evaluated samples. We used a modified ALO, presented in [4], as the optimization algorithm, with the function (7) as its fitness function. This step provides the trained MFNN $best(d_h)$, which generates the control signal $u(best(d_h))$, where $best(d_h)$ is the best solution $X$ created by the modified ALO. The if-then rules are defined as

$$P_j: \text{IF } x \text{ is } A_j \text{ THEN } u = F_j(\mu_j(s), x), \quad j = 1..2. \tag{8}$$
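A compact sketch of the fitness (7) and of the rule dispatch (8); `F1` and `F2` stand for the trained recurrent networks and are assumed callables, not the authors' Octave code.

```python
def fitness(D, u):
    """Eq. (7): mean absolute deviation between target duty cycles and outputs."""
    return sum(abs(d - v) for d, v in zip(D, u)) / len(D)

def control_signal(x, mu1, mu2, F1, F2):
    """Eq. (8): fire the rule whose antecedent has the larger membership degree."""
    return F1(mu1, x) if mu1 >= mu2 else F2(mu2, x)
```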

Simulation of the trained MFNN can briefly be described as follows.

Step 1. Aggregation of the antecedents of the rules (8) maps the input data $x$ into their membership functions and matches the data with the conditions of the rules. These mappings then activate the $k$-th rule, which indicates the $k$-th PV system mode and the corresponding $k$-th recurrent neural network $F_k(\mu_k(s), x)$, $k \in 1..2$.

Step 2. According to the $k$-th mode, the corresponding $k$-th recurrent neural network $F_k(\mu_k(s), x)$ (trained on the data (5)) generates the control signal $u = F_k(\mu_k(s), x)$.

2.3 Simulation and Results

We revisited the numerical examples from Subsects. 2.1 and 2.2 to illustrate the benefits of the proposed photovoltaic system control model based on a modified fuzzy neural net. All the simulations for this study are implemented in Octave. Figure 1 shows the solar irradiance during the simulation time. For the purposes of this simulation study, four solar irradiance scenarios were adopted:

(1) from time = 0 s to 0.4 s, the graph demonstrates a slowly varying shadow cast by an obstacle, which causes a smooth change of irradiation;
(2) from time = 0.5 s to 1 s, the graph demonstrates a smooth and steady decline in solar irradiance, which simulates cloud cover;
(3) from time = 1.1 s to 2.1 s, the irradiation changes to the exact target values with a smooth change;
(4) from time = 2.1 s to 2.5 s, the graph demonstrates a sudden change in irradiation from sunshine conditions.

We built the MFNN on the training set of the data (5) and trained it using the modified ALO [4]. To obtain statistical results, we performed 120 modified ALO runs with the following parameters: $n = 50$ (we use 50 ants and ant lions), $T = 100$ (we terminate at the end of 100 iterations), and dimension $d_h = 12h + 2 \in \{D_{min} = d_1 = 14, \ldots, D_{max} = d_{10} = 122\}$. The vector $f(best(d_h)) = (4.2 \cdot 10^{-3}, 3.7 \cdot 10^{-4}, 1.5 \cdot 10^{-5}, 1.1 \cdot 10^{-5}, 2.5 \cdot 10^{-6}, 1.1 \cdot 10^{-7}, 3.4 \cdot 10^{-8}, 4.2 \cdot 10^{-7}, 1.7 \cdot 10^{-5}, 2.4 \cdot 10^{-4})$ shows that only one MFNN architecture, with $d_7 = 86$, achieves a fitness (7) below $4 \cdot 10^{-8}$ on the data (5).

Fig. 1. Plot of solar irradiance.

This MFNN includes two recurrent neural networks $F_k(\mu_k(s), x)$, $k = 1..2$. These recurrent networks are two-layered networks with seven hidden neurons. In this comparison study, the performance of the proposed PV system control model on the basis of a MFNN is compared against the standard model with a PID controller (based on the P&O or IC algorithm) under the same conditions. Figures 2 and 3 show the simulation results.

Fig. 2. Plot of the PV system power provided by the control model with the PID controller based on the P&O algorithm and by the control model on the basis of a MFNN, respectively.


According to Fig. 3, the response time using the IC algorithm is no better than that of the proposed algorithm in the first 0.5 s, and the control signal created by the IC algorithm in the transient mode overshoots. From time = 2.2 s to 3 s, the PV system energy produced by the control model with the PID controller based on the IC algorithm drops to zero.

Fig. 3. Plot of the PV system power provided by the control model with the PID controller based on the IC algorithm and by the control model on the basis of a MFNN, respectively.

The proposed PV system control model is more robust and provides more power (Figs. 2 and 3) in comparison with the control models with the PID controller (based on the P&O or IC algorithm). Figure 2 shows the misjudgment phenomenon of the P&O algorithm when the solar irradiance continuously increases ($t \in T = [0.3\,\text{s}, 0.4\,\text{s}] \cup [0.8\,\text{s}, 1\,\text{s}] \cup [1.7\,\text{s}, 2.1\,\text{s}]$). In such situations, the proposed PV system control model, which is based on a modified fuzzy neural net, produces on average 8.6% more energy than the standard model based on the perturbation and observation algorithm:

$$100\% \cdot \left(\sum_{t \in T} P_{MFNN}^t \Big/ \sum_{t \in T} P_{P\&O}^t - 1\right) = 8.6\%,$$

where $P_{MFNN}$ is the energy provided by the proposed PV system control model based on a modified fuzzy neural net and $P_{P\&O}$ is the energy provided by the standard model based on the P&O algorithm. During $t \in [1.1\,\text{s}, 1.3\,\text{s}] \cup [1.5\,\text{s}, 1.7\,\text{s}] \cup [2.2\,\text{s}, 3\,\text{s}]$, the PID controller based on the IC algorithm generates a huge numerical value of the control signal ($u \in [5.0706 \cdot 10^{32}, 5.6385 \cdot 10^{33}]$) as a result of sudden fluctuations in the solar irradiance, while the proposed PV system control model provides the maximum PV power (Figs. 1, 3 and 4).


Fig. 4. Plot of the control signal provided by the PID controller based on the IC algorithm.

The MFNN provides a more suitable approach to the MPPT problem with good tracking accuracy. Extensive simulation studies on the Octave model have been carried out with different initial conditions, different disturbance profiles, and variation of the photovoltaic system and solar irradiation level parameters. The results show that consistent performance has been achieved by the proposed PV system control model, with good stability and robustness as compared with the standard model with a PID controller.

3 Conclusions

It is shown that the PV system control model on the basis of a MFNN is robust to PV system uncertainties. Unlike popular approaches to nonlinear control, the MFNN is used to approximate the control law and not the system nonlinearities, which makes it suitable over a wide range of nonlinearities. Compared to standard MPPT algorithms, including P&O and IC, the PV system control model on the basis of a MFNN produces good response time, low overshoot, and, in general, good performance. Simulation comparison results for a PV system demonstrate the effectiveness of the PV system control model on the basis of a MFNN as compared with the standard model with a PID controller (based on the P&O or IC algorithm). It is our contention that the proposed modified fuzzy neural net architecture can have generic control applications to other kinds of systems and provide a competitive alternative to neural networks and PID controllers.

Acknowledgement. The reported study was funded by RFBR and the Republic of Khakassia according to the research project no. 19-48-190003.

References

1. Beta-energy official page. https://www.betaenergy.ru/insolation/abakan. Accessed 27 Apr 2019
2. Tavares, C.A.P., Leite, K.T.F., Suemitsu, W.I., Bellar, M.D.: Performance evaluation of PV solar system with different MPPT methods. In: 35th Annual Conference of IEEE Industrial Electronics IECON 2009, pp. 719–724 (2009)


3. Kumar, A., Chaudhary, P., Rizwan, M.: Development of fuzzy logic based MPPT controller for PV system at varying meteorological parameters. In: 2015 Annual IEEE India Conference (INDICON), New Delhi, pp. 1–6 (2015)
4. Engel, E.A., Engel, N.E.: Temperature forecasting based on the multi-agent adaptive fuzzy neuronet. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V. (eds.) Advances in Neural Computation, Machine Learning, and Cognitive Research. Neuroinformatics 2018. Studies in Computational Intelligence, vol. 736. Springer, Cham (2019)

Impact of Assistive Control on Operator Behavior Under High Operational Load

Mikhail Kopeliovich, Evgeny Kozubenko, Mikhail Kashcheev, Dmitry Shaposhnikov, and Mikhail Petrushan

Research Center of Neurotechnologies, Southern Federal University, Rostov-on-Don, Russia
[email protected]

Abstract. This work describes the impact on the operator's performance of an artificial assistant that corrects the operator's actions in case of unsafe or ineffective behavior. In order for assistive control to be effective, a method to evaluate and predict operator performance should be applied. This paper presents a model of operator activity based on histograms of the distribution of reaction times to particular stimuli. The model is then applied to the task of monitoring operator activity in a controlled environment designed to emulate certain actions performed by an aircraft pilot. For each subject, an individual behavioral portrait is built. Then, performance changes under high operational load conditions and the impact of assistive control are evaluated.

Keywords: Adaptive behavior · Assistive control · Model of operator activity

1 Introduction

Safety and performance of various processes (driving, flying, manufacturing, construction and others) depend on the efficiency of the operator's behavior and the accuracy of their actions. Therefore, the operator's activity should be monitored to forecast process performance [1,7]; in the case of unsafe behavior, the activity should be corrected by an artificial assistant. The problem is considered with a focus on the activity of an aircraft pilot, although it is formulated in a universal manner that allows the monitoring and control methods to be transferred to other types of operator activity. A universal model (Fig. 1) of operator activity is formalized, including components for monitoring and controlling the performance and safety of behavior. This model is applied to the task of controlling operator activity in a developed test experiment, where particular elements emulate certain piloting actions. In particular, one of the tasks in the test experiment is "alignment of the artificial horizon" (see Sect. 3). To implement the procedure for monitoring and controlling operator activity, it is necessary to determine the criteria of the effectiveness and safety of


behavior and methods of their evaluation. The following approach was implemented. Step 1: a list of events and situations is determined that may occur in the process of operator activity and that require a specific operator response, consisting of a specific set of actions. Step 2: the list of expected actions and the methods of their registration are determined; actions are characterized by a number of parameters that can be recorded. Step 3: the action parameters are registered. A video surveillance system and feedback from the onboard system of the experimental setup (from the object of operator activity) are used to capture the operator's actions and their parameters. From the onboard system, information is obtained about the presence (or absence) of an action (pressing a key, switching a toggle switch, etc., see Sect. 3) and about the characteristics of the action (its latency relative to the event for which this action is expected). Some actions or characteristics allow a direct evaluation of the performance and safety of behavior. For example, the absence of a response to a particular stimulus, or a long reaction time (RT), can be interpreted by the control system as a failed (or inefficiently performed) task. Such control constitutes the "first control layer". If the RT (or another parameter of the behavioral response) falls within the permissible range, the behavior characteristics are analyzed in the "second control layer" (the "differential control layer"), where the currently observed behavior is compared with the behavior typical for the operator at the specific event. The variation of the characteristics of the operator's actions within the acceptable range depends on his physiological and psychological capabilities, on his current state and on distractions, and it is difficult to directly interpret particular values of such characteristics in terms of effectiveness and safety of behavior. The project verifies the hypothesis that deviations of the characteristics of the operator's actions from those typical for a particular event are a correlate of the efficiency and safety of operator activity and (or) allow a forecast of the effectiveness and safety of future behavior. Safety of behavior can be formalized as the probability of making a critical control error; efficiency, as the number of non-critical errors per unit of time, possibly weighted by their significance. We identify deviations from typical behavior by trying to classify the feature vector comprised of behavioral parameters: if it fails to be classified as belonging to the "typical behavior" class of the particular operator, we treat this as a possible correlate of non-optimal performance (see Sect. 4). The overall scheme of the operational cycle with assistive control is presented in Fig. 1. In general, it is similar to the schemata of control systems in [2–4,8–10]. Assistive support components are described in [3,11] and are implemented in our approach in a similar manner. This work is a summary of selected results of the project no. 2.955.2017/4.6, supported by the Russian Ministry of Science and Higher Education.


[Fig. 1 block diagram: the Artificial Assistant (context perception and recognition; assistance; data acquisition and storing; performance estimation; behavioral correlates of performance; behavior model recognition; personalized behavioral models; behavior model building; surveillance systems) interacts with the Operational Cycle (operator's decisions; context changes; operator's actions; uncontrollable events); arrows denote causal relationships and data transfer.]

Fig. 1. Operational cycle with assistive control

2 Problem Statement

We evaluated changes in the operator's behavior under high-operational-load conditions, which we model by positioning stimuli densely in time, each stimulus requiring a certain response. A particular case of assistance is tested which involves: (a) recognition of potentially non-optimal performance by classification of the feature vector comprised of behavioral parameters, and (b) adding latency to the visualization of particular stimuli. To select the characteristics that make up the "individual portrait of behavior" (a schema of actions and a list of ranges of their characteristics under certain events), their variability is analyzed in a series of experiments. An analysis of the characteristics of behavioral reactions makes sense only after the skill is established. The skill is considered established when the initial stage of learning a new type of operator activity ends and the performance indicators reach a quasi-constant level. It is assumed (and confirmed in our test experiment) that operational failure is caused by perception conflict; thus, serialization of the visualization of quasi-simultaneous stimuli may lead to more optimal performance, despite the fact that the artificial latency of stimulus visualization itself increases the RT.

3 Methods

We applied the general model of operational activity with assistive control (Fig. 1) to a particular test experiment that involves high-load operator activity. The task for a test subject is to react, as quickly and precisely as possible, to stimuli from the output devices (Fig. 2).


Fig. 2. Experimental setup. Output devices: left (1) and right (2) monitors, LED panel (3), speakers (4). Input devices: switch panel (5), keyboard (6), joystick (7)

According to Fig. 2, the stimuli come from the monitors (1) and (2) in front of the subject, the LED panel (3) and the speakers (4). The image on the left monitor (1) imitates an artificial horizon (the white line at the center), continuously changing its tilt and height in random directions at a frequency of 10 Hz. On the right monitor (2) there are a timer (upper-left corner) that starts from 15 s and restarts after a correct reaction is received, a penalty counter (bottom-left corner), and a shape (circle, square, triangle, pentagonal star or hexagonal star) that changes randomly at 15–20 s intervals. The LED panel contains 6 rows of 3 diodes each. During the experiment, a pattern of 1 to 9 randomly chosen diodes is active, changing after random intervals of 2–12 s. A sound signal of about 100 ms rings after random intervals of 5–5.3 s. The subjects were asked to react to the stimuli using the following input devices: a switch panel (5) with 5 toggles corresponding to the possible shapes appearing on the right monitor (2), a computer keyboard (6) and a Thrustmaster T.16000M joystick (7). During the experiment, the subjects were charged to react quickly to the four following stimuli:

1. Timer: when the timer on the right monitor (2) ends, press a key on the keyboard corresponding to the number of active diodes on the LED panel (3).
2. Shape: switch the toggles on the switch panel (5) corresponding to the shape on the right monitor.
3. Sound: press the joystick (7) trigger on the sound signal.
4. Horizon: hold the artificial horizon (1) aligned with two black horizontal bars using the joystick (7), while the horizon line randomly changes the height of its center with a speed of 0%–2.5% of the monitor width per second and randomly changes its rotation angle with a speed of 0–5° per second.

The experiment goes on for 3 min (success) or until a threshold of 100 penalty points is reached (failure). Table 1 lists the permissible RT to the stimuli and the corresponding penalties.


Table 1. Permissible reaction time to external stimuli and penalty points in case of a late/erroneous reaction

Stimulus   Permissible RT (s)   Penalty for late reaction (per second)        Penalty for wrong reaction
Timer      3                    5                                             3
Shape      3                    5                                             2
Sound      2                    5                                             3
Horizon    0                    Proportional to the deviation of the horizon  None

Five male subjects aged from 23 to 40 took part in the experiments. During one session, each subject took part in 5 experiments; each subject participated in one or more sessions, with a 1- to 2-week interval in between. The Bioethics Committee of SFedU approved the experimental protocol, and each volunteer signed an agreement to participate in the experiment. After the experiments described above (further referred to as Normal Mode), the subjects took part in two types of experiments with increased operational load caused by an increased temporal density of stimuli (Hard Mode 1 and 2). In Hard Mode 1, the Timer, Shape and Sound stimuli occur 30% to 60% more often; in addition, there is a 50% chance for the shape to change about 0–2 s before or after the Timer ends. Hard Mode 2 features the same changes as Hard Mode 1; in addition, when the shape changes 0–1.5 s before or 0–2 s after the Timer ends, the new shape is shown only after a correct reaction to the Timer stimulus or after 5 s have passed. The Hard Mode experiments were performed starting with Hard Mode 1 and consisted of 3 sessions of 4 to 5 experiments each, with a 1-week interval between sessions. We assume that the histogram of the distribution of the RT may serve as the basis of a model of the subject's behavior that allows identifying the subject by matching reaction characteristics and detecting deviations of the behavior from the subject's typical one in subsequent experiments. Reaction time is widely used for the evaluation of operator performance, for failure prediction [5] and for modeling human behavior [6]. In this study, the histogram of the RT distribution for a certain stimulus is considered as a model of the subject's behavior; each stimulus is considered independently. The problem of verifying the subject's identity is based on an analysis of the RT distributions obtained in one or more experiments. The identity verification algorithm is as follows:

1. Create a model of the subject's response to a certain stimulus: from the data available in the dataset, calculate the histogram of the distribution of the RT for the stimulus.
2. Get a subsample of RT values of the subject's reactions from one or more experiments.


3. Calculate the distance (or measure of proximity) between the subsample and the subject's response model. The subject is considered successfully verified if the distance is less than a fixed threshold (or greater, in the case of a proximity measure).

To determine the thresholds for each stimulus and each subsample size, a histogram of the RT distribution of the subject is chosen from the available dataset of experimental results and compared with the histogram-model of the subject (the d1 value) and with the histogram-models of the other subjects (the d2 values). Subsamples are generated by randomly selecting K values from the subject's RTs to the stimulus; the considered subsample sizes K are 4, 8, 16, 32, 64 and 128. The set of RT values of the subject used for generating a subsample is contained in the set of values used for generating the model of the subject, which may affect the results of comparing the subsample with the model; this is discussed further below. For each of the five subjects, 100 independent calculations are performed for each sample size, resulting in a set of 500 d1 values (five subjects, 100 comparisons with their own model) and 2000 d2 values (five subjects, 100 comparisons with the model of each of the other four subjects). The following histogram comparison functions are considered: chi-square and correlation. The chi-square distance function is

$$d_{chi}(H_1, H_2) = \sum_I \frac{(H_1(I) - H_2(I))^2}{H_1(I)}, \tag{1}$$

where $H_1$ is the histogram of the model distribution and $H_2$ is the histogram of the subsample distribution. The correlation function is equal to the correlation coefficient:

$$d_{corr}(H_1, H_2) = \frac{\sum_I (H_1(I) - \bar{H}_1)(H_2(I) - \bar{H}_2)}{\sqrt{\sum_I (H_1(I) - \bar{H}_1)^2 \sum_I (H_2(I) - \bar{H}_2)^2}}, \tag{2}$$

where $\bar{H}_1$, $\bar{H}_2$ are the average values of the corresponding histograms. The value of the chi-square function represents a distance: the smaller its value, the "closer" the corresponding histograms are. The value of the correlation function represents a proximity: the greater its value, the "closer" the corresponding histograms are. The identity verification algorithm requires a threshold for histogram comparison. The choice of the threshold in real applications should be made taking into account the specifics of the task. In this paper, we use the threshold that leads to no more than 5% False Rejection Rate (FRR) and minimizes the False Acceptance Rate (FAR).
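A small Python sketch of this verification step using Eqs. (1) and (2); the bin edges and the threshold value are illustrative assumptions.

```python
import numpy as np

def rt_histogram(rt_values, bins):
    """Normalized histogram of reaction times over fixed bin edges."""
    h, _ = np.histogram(rt_values, bins=bins, density=True)
    return h

def d_chi(h1, h2, eps=1e-12):
    """Chi-square distance, Eq. (1): smaller means closer."""
    return np.sum((h1 - h2) ** 2 / (h1 + eps))

def d_corr(h1, h2):
    """Correlation proximity, Eq. (2): greater means closer."""
    a, b = h1 - h1.mean(), h2 - h2.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))

def verify(model_hist, subsample, bins, chi_threshold=1.0):
    """Accept the identity claim if the chi-square distance is below threshold."""
    return d_chi(model_hist, rt_histogram(subsample, bins)) < chi_threshold
```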

4 Results

The FRR (for the operator identification task) for the correlation and chi-square functions decreases as the subsample size increases. The FRR at the selected thresholds for the Timer and Shape stimuli (Table 2) is low compared to the Sound and Horizon stimuli. This is due to two factors: first, the number of RT


values for these stimuli in the dataset for each test is about 300, making a subsample of 128 values close to the full model of the subject and substantially reducing the distance (or increasing the proximity) between them; second, the correct response to these stimuli is more difficult for the subject than to the Sound and Horizon stimuli, which can lead to visible differences in the behavior of the subjects.

Table 2. The FRR for the Timer and Shape stimuli calculated for the FAR of 5%

Subsample size   Chi-square (Timer / Shape)   Correlation (Timer / Shape)
4                93% / 84%                    67% / 54%
8                91% / 85%                    51% / 29%
16               83% / 73%                    33% / 20%
32               73% / 53%                    12% / 5%
64               12% / 12%                    0% / 1%
128              0% / 0%                      0% / 0%

We define the subject's error as the average penalty per second within an experiment. The average error is the error averaged over all experiments in a certain Mode. Table 3 gives the subject's average error in the different scenarios and the portion of average error reduction, calculated as the proportion of experiments in Hard Mode 2 in which the subject's error was lower than the average error in Hard Mode 1. It can be seen that for most subjects, the addition of a visual delay in the appearance of the stimulus on average increased efficiency. There are special cases: Subject 3, whose efficiency increased significantly, and Subject 4, whose efficiency, on the contrary, decreased. Such changes indicate the individual nature of the perception of simultaneous stimuli and the different behaviors that are optimal for different subjects.

Table 3. Average errors and error reduction coefficients when adding a visual delay to the Sound stimulus. Explanations in the text

Subject   Average error, Hard Mode 1   Average error, Hard Mode 2   Portion of average error reduction
1         0.81                         0.78                         0.73
2         0.63                         0.60                         0.71
3         1.11                         0.87                         1.00
4         0.58                         0.65                         0.33
5         0.70                         0.70                         0.71


For the Normal Mode experiments, the correlation coefficients are calculated between the set of error values of all subjects in all experiments and the set of values of the distances (or proximities) of the subsamples obtained from the relevant experiments to the model of the subject.

Table 4. The correlation coefficients calculated for the metrics under consideration between the error and the correspondence to the subject's model

Metrics       Sound    Timer    Horizon   Shape
Chi-square    0.35     0.14     0.25      0.21
Correlation   −0.26    −0.18    −0.08     −0.29

Table 4 indicates low correlation for every metric and stimulus. The highest values of the coefficient are achieved on the Sound stimulus. The chi-square function represents a distance, so the correlation is positive: the greater the distance to the subject's own model, the greater the error. The negative correlation for the correlation function is explained similarly. Normal Mode experiments in which the subject's error was more than twice their average error are referred to as failed experiments (experiments with critical errors). There were three such experiments: one for each of Subjects 3, 4 and 5. We consider the hypothesis that the model of the subject's behavior in these experiments differs significantly from the subject's general model. To test this hypothesis, an analysis similar to the subject recognition problem is carried out, except that only the three subjects participate in the analysis and only the RT values from the selected experiments are used as subsamples.

5 Conclusion

A behavioral model of the operator was built based on histograms of the distribution of reaction times (RT) to particular stimuli in a test experiment that imitates certain piloting actions. It was shown that such an RT distribution is unique to an individual, and therefore the operator can be identified based on behavioral model matching alone. The accuracy of such identification depends on the number of reaction times registered for the person being identified. For example, with 64 measurements of RT for the Timer or Shape stimuli, the FAR was 0% at a fixed FRR of 5% when identifying a particular operator among 5 possible ones. According to our results, the deviation of the RT in a particular experiment from the distribution typical for the operator is a weak correlate of performance; a strong deviation may indicate worse performance. Failure of behavioral model identification of a particular operator (besides indicating possible operator replacement) is a strong indicator of unsafe or inefficient behavior, especially for the Shape stimulus. A particular case of assistive control was considered which involves adding latency to stimuli visualization for suppression of the perceptive or cognitive conflict


which occurs if an operator's reaction is required for multiple stimuli appearing simultaneously. This assistive control had an individual impact on operator performance; it increased the performance of most of the subjects in the test experiment. The behavior model of a subject in a failed experiment differs from the typical behavior model by some metrics. For example, only 8% of such samples were attributed to the general model of the operator's behavior when using correlation as the proximity measure; hence, in 92% of cases the behavior would be recognized as atypical by this metric.

Acknowledgments. This work is supported by the Russian Ministry of Science and Higher Education, project no. 2.955.2017/4.6.

References

1. Aloui, Z., Ahamada, N., Denoulet, J., Pierre, F., Rayrole, M., Gatti, M., Granado, B.: Embedded real-time monitoring using SystemC in IMA Network. In: SAE 2016 Aerospace Systems and Technology Conference, September 2016, Hartford, United States, pp. 1–4 (2016)
2. Didactic, F.: Process Control. Pressure, Flow, and Level. Legal Deposit - Library and Archives Canada (2010)
3. Dittmeier, C., Casati, P.: Evaluating internal control systems. In: IIARF (2014)
4. Fotopoulos, J.: Process Control and Optimization Theory. Application to Heat Treating Processes. Air Products and Chemicals Inc, Allentown (2006)
5. Kim, B., Bishu, R.: On assessing operator response time in human reliability analysis (HRA) using a possibilistic fuzzy regression model. Reliab. Eng. Syst. Saf. 52(1), 27–34 (1996)
6. Mahmud, J., Chen, J., Nichols, J.: When will you answer this? Estimating response time in Twitter. In: Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media, pp. 697–700 (2013)
7. Martin, S., Vora, S., Yuen, K., Trivedi, M.: Dynamics of driver's gaze: explorations in behavior modeling and maneuver prediction. IEEE Trans. Intell. Veh. 3(2), 141–150 (2018)
8. O'Connor, D.: A Process Control Primer. Honeywell, Charlotte (2000)
9. Olum, Y.: Modern management theories and practices. In: East African Central Banking Course, vol. 1, no. 11, pp. 5–6 (2004)
10. Rao, G.P.: Basic elements of control system. Control Syst. Robot. Autom. 1 (2009)
11. Stouffer, K., Pillitteri, V., Lightman, S., Abrams, M., Hahn, A.: Guide to industrial control systems (ICS) security (2015)

Hierarchical Actor-Critic with Hindsight for Mobile Robot with Continuous State Space

Staroverov Aleksey¹ and Aleksandr I. Panov²,³

¹ Bauman Moscow State University, Moscow, Russia
[email protected]
² Artificial Intelligence Research Institute, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, Moscow, Russia
[email protected]
³ Moscow Institute of Physics and Technology, Moscow, Russia

Abstract. Hierarchies are used in reinforcement learning to increase learning speed in sparse-reward tasks. In this kind of task, the main problem is the time required for the initial policy to reach the goal during the first steps. Hierarchies can split a problem into a set of subproblems that can be reached in less time. In order to implement this idea, Hierarchical Reinforcement Learning (HRL) algorithms need to be able to learn the multiple levels within a hierarchy in parallel, so that these smaller subproblems can be solved at the same time. Most well-known existing HRL algorithms that can learn multi-level hierarchies are not able to efficiently learn levels of policies simultaneously, especially in environments with continuous state and action spaces. To address this problem, we analyzed the newest existing framework, Hierarchical Actor-Critic with Hindsight (HAC), tested it in a simulated mobile robot environment, and determined the optimal configuration of parameters and ways to encode information about the environment states.

Keywords: Hierarchical Actor-Critic · Hindsight Experience Replay · Reinforcement learning

1 Introduction

Hierarchy has the potential to greatly accelerate reinforcement learning in sparse reward tasks because hierarchical agents can decompose problems into smaller subproblems. In order to take advantage of the sample efficiency benefits of multi-level hierarchies, an HRL algorithm must be able to learn the multiple levels within the hierarchy in parallel. That is, hierarchical agents need to be able to simultaneously learn both the appropriate subtasks and the sequences of primitive actions that achieve each subtask. Yet the existing HRL algorithms that are capable of automatically learning hierarchies in continuous domains [1–5] do not efficiently learn the multiple levels within the hierarchy in parallel [10–12]. Instead, these algorithms often resort to learning the hierarchy one level at a time. HAC can learn multiple levels of policies in parallel. The hierarchies produced by the HAC framework are comprised of a set of nested,


goal-conditioned policies that use the state space to decompose a task into short subtasks. The authors demonstrate experimentally in both grid-world and simulated robotics domains that the HAC approach can significantly accelerate learning relative to other non-hierarchical and hierarchical methods. Thus, the HAC framework [6] is the first to successfully learn 3-level hierarchies in parallel in tasks with continuous state and action spaces (Fig. 1).

Fig. 1. Results of the HAC framework with one, two and three levels of hierarchy.

As a more realistic example, we tested the HAC method using a simulated mobile robot. The simulated environment and robot are shown in Fig. 2. The environment has dimensions of 10 units by 10 units, with walls and two visible rooms. The robot has nine sonar sensors, a simple vision system with a 1-D 'retina' having a 135° field of view, and a gripper with a sensor that signals when a sphere is being gripped. The vision system was modeled as a pinhole camera. Each sonar had a 15° field of view. The sonar reading is the distance to the closest object within that 15° field of view, but it does not contain the exact heading to the object. This is consistent with the observed behavior of sonars on physical robotic platforms.

Fig. 2. Simulated mobile robot environment (left) and HAC hierarchy (right).


The primitive action set consists of five actions: moving a distance forward, moving forward with a rotation to the left or to the right, and rotating left or right without moving forward. If the robot reaches the yellow sphere, the simulation is over and the goal is achieved.

2 Hierarchical Actor-Critic Algorithm

The hierarchies produced by the HAC framework have a specific architecture consisting of a set of nested, goal-conditioned policies that use the state space as the mechanism for breaking down a task into subtasks. The hierarchy of nested policies works as follows. The highest-level policy takes as input the current state and the goal state provided by the task and outputs a subgoal state. This state is used as the goal state for the policy at the next level down. The policy at that level takes as input the current state and the goal state provided by the level above and outputs its own subgoal state for the next level below to achieve. This process continues until the lowest level is reached. The lowest level then takes as input the current state and the goal state provided by the level above and outputs a primitive action (Fig. 2). Further, each level has a certain number of attempts to achieve its goal state. When the level either runs out of attempts or achieves its goal state, execution at that level ceases and the level above outputs another subgoal.

The purpose of the HAC framework is to efficiently learn a k-level hierarchy Π_{k−1} consisting of k individual policies π_0, …, π_{k−1}, in which k is a hyperparameter chosen by the user. In order to learn π_0, …, π_{k−1} in parallel, the HAC framework transforms the original Universal Markov Decision Process (UMDP), U_original = (S, G, A, T, R, γ), into a set of k UMDPs U_0, …, U_{k−1}, in which U_i = (S_i, G_i, A_i, T_i, R_i, γ_i). In our example with the mobile robot, we chose HAC with 3 levels of policies (π_3, π_2, π_1) as the most successful method tested in the original article. The green sphere is the goal of the level 2 policy (π_2); the purple sphere is the goal of the level 1 policy (π_1) (Fig. 2). When the algorithm is fully trained, the agent must first reach the purple sphere within at most 20 ticks of time, then reach the green sphere within at most 20 attempts of generating a purple sphere, and finally reach the yellow goal sphere within at most 20 attempts of generating a green sphere.

The HAC approach enables agents to learn multiple policies in parallel using only sparse reward functions thanks to two types of hindsight transitions. An example of such transitions can be given for the following simple toy environment (Fig. 3).

Fig. 3. An example episode trajectory of the toy environment.


The tick marks along the trajectory show the next states of the robot after each primitive action is executed. The pink circles show the original subgoal actions. The gray circles show the subgoal states reached in hindsight after at most H actions of the low-level policy. Hindsight action transitions help agents learn multiple levels of policies simultaneously by training each subgoal policy with respect to a transition function that simulates the optimal lower-level policy hierarchy. For the toy example, the action transitions for the states s0 and s1 would be:

• [initial state = s0, action = s1, reward = −1, next state = s1, goal = yellow flag, discount rate = γ]
• [initial state = s1, action = s2, reward = −1, next state = s2, goal = yellow flag, discount rate = γ]

The second type of hindsight transition, hindsight goal transitions, helps each level learn a goal-conditioned policy in sparse reward tasks by extending the idea of Hindsight Experience Replay [7] to the hierarchical setting. The hindsight goal transition created by the fifth primitive action, which achieved the hindsight goal, would be:

• [initial state = 4th tick mark, action = joint torques, reward = 0, next state = s1, goal = s1, discount rate = 0]

Assuming the last state reached, s5, is used as the hindsight goal, the first and the last hindsight goal transitions for the high level would be:

• [initial state = s0, action = s1, reward = −1, next state = s1, goal = s5, discount rate = γ]
• [initial state = s4, action = s5, reward = 0, next state = s5, goal = s5, discount rate = 0]

Hindsight goal transitions should significantly help each level learn an effective goal-conditioned policy because they guarantee that after every sequence of actions, at least one transition will be created that contains the sparse reward (in our case a reward and discount rate of 0). These transitions containing the sparse reward will in turn incentivize the UVFA critic function to assign relatively high Q-values to the (state, action, goal) tuples described by these transitions. The UVFA can then potentially generalize these high Q-values to the other actions that could help the level solve its tasks. Technically, HAC builds off three techniques from the reinforcement learning literature [2]:

• the Deep Deterministic Policy Gradient (DDPG) learning algorithm [8];
• Universal Value Function Approximators (UVFA) [9];
• Hindsight Experience Replay (HER) [7].
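Before detailing these three components, the two hindsight-transition types described above can be summarized in a short Python sketch. The tuple layout and the sparse reward follow the bullet examples, while the helper names are hypothetical.

```python
def hindsight_action_transition(s, s_reached, goal, gamma):
    """Replace the proposed subgoal action by the state actually reached,
    as if the lower level had executed the subgoal optimally."""
    done = s_reached == goal
    return (s, s_reached, 0.0 if done else -1.0, s_reached,
            goal, 0.0 if done else gamma)

def hindsight_goal_transitions(episode, gamma):
    """Relabel a whole episode with the final state as the goal, guaranteeing
    at least one transition carrying the sparse reward (reward = 0).
    episode is a list of (state, action, reward, next_state, goal, discount)."""
    hindsight_goal = episode[-1][3]            # the last next_state reached
    relabeled = []
    for s, a, _, s_next, _, _ in episode:
        done = s_next == hindsight_goal
        relabeled.append((s, a, 0.0 if done else -1.0, s_next,
                          hindsight_goal, 0.0 if done else gamma))
    return relabeled
```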

2.1 DDPG: An Actor-Critic Algorithm

DDPG serves as the key learning infrastructure within Hierarchical Actor-Critic. It is an actor–critic algorithm and thus uses two neural networks to enable agents to learn from experience. The actor network learns a deterministic policy that maps states to actions:

π : S → A    (1)

The critic network approximates the Q-function, or action-value function, of the current policy:

Q^π(s_t, a_t) = E[R_t | s_t, a_t]    (2)

where R_t is the discounted sum of future rewards. Thus, the critic network maps (state, action) pairs to the expected long-term reward:

Q : S × A → ℝ    (3)

The agent first interacts with the environment for a period using a noisy policy π(s) + N(0, 1). The transitions experienced are stored as (s_t, a_t, r_t, s_{t+1}, g_t). The agent then updates its approximation of the Q-function of the current policy by performing mini-batch gradient descent on the loss function:

L = (Q(s_t, a_t) − y_t)²    (4)

with the target

y_t = r_t + γ Q(s_{t+1}, π(s_{t+1}))    (5)

where y_t is the Bellman estimate of the Q-function. The agent then modifies its policy based on the updated approximation of the action-value function. The actor network is trained by moving its parameters in the direction of the gradient of Q with respect to the actor's parameters. The hierarchical policy is composed of multiple goal-conditioned policies, or actor networks (Fig. 4).
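For concreteness, a minimal PyTorch sketch of one DDPG update implementing Eqs. (4)–(5) is given below; the `actor`, `critic`, optimizers, and `batch` are assumed to exist elsewhere, and target networks are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_opt, critic_opt, batch, gamma=0.98):
    s, a, r, s_next = batch  # mini-batch sampled from the replay buffer

    # Critic step: regress Q(s, a) onto y = r + gamma * Q(s', pi(s')), Eqs. (4)-(5)
    with torch.no_grad():
        y = r + gamma * critic(s_next, actor(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor step: move parameters along the gradient of Q w.r.t. the actor
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```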

2.2 Universal Value Function Approximator

Value functions are a core component of reinforcement learning systems. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ. The main idea of a Universal Value Function Approximator (UVFA) [9], compared to an ordinary value function approximator, is to generalize not just over states s but also over goals g. UVFAs improve learning by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors (Fig. 5).


Fig. 4. Actor-Critic networks for hierarchical policy with 1 sub-goal layer.

Fig. 5. Diagram of the function approximation architectures.

On the left side of the figure is the concatenated architecture. In the center is the two-stream architecture with two separate sub-networks combined at h. On the right is shown a decomposed view of the two-stream architecture when trained in two stages, where target embedding vectors are formed by matrix factorization (right sub-diagram) and the two embedding networks are trained with those as multi-variate regression targets (left and center sub-diagrams). Thus, instead of Q^π(s_t, a_t) = E[R_t | s_t, a_t], we use Q^π(s_t, a_t, g_t) = E[R_t | s_t, a_t, g_t].
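A minimal sketch of the simplest (concatenated) variant, in which the critic simply receives the goal as an extra input, might look as follows; all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class GoalConditionedCritic(nn.Module):
    """Estimates Q(s, a, g) by concatenating state, action, and goal."""
    def __init__(self, state_dim, action_dim, goal_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + goal_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, goal):
        return self.net(torch.cat([state, action, goal], dim=-1))
```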

2.3 Hindsight Experience Replay

As the toy robot example illustrates, it can be difficult for any level in our framework to receive the sparse reward. In the actor-critic algorithm, a buffer of past experiences is used to stabilize training by decorrelating the training examples in each batch used to update the neural network. This buffer records past states, the actions taken at those states, the reward received, the next state that was observed, and the goal we wanted to achieve. As we have seen, the data in the experience replay buffer can originate from an exploration policy, which raises an interesting possibility: what if we could add fictitious data by imagining what would happen had the circumstances been different? This is exactly what Hindsight Experience Replay (HER) [7] does. Even though an agent may have failed to achieve its given goal in an episode, the agent did learn a sequence of actions to achieve a different objective in hindsight – the state in which the agent finished. Thus, learning how to achieve different goals in the goal space should help the agent better determine how to achieve the original goal. HER does this by creating a separate copy of the transitions that occurred in an episode and replacing:

• the original goal with the goal achieved in hindsight
• the original reward with the appropriate value given the new goal.
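A minimal sketch of this relabeling (with transitions stored as dictionaries and a hypothetical sparse reward of 0 on success, −1 otherwise) could be:

```python
def her_relabel(episode):
    """Copy an episode, substituting the final achieved state as the goal."""
    achieved_goal = episode[-1]["next_state"]  # where the agent actually ended up
    relabeled = []
    for t in episode:
        new_t = dict(t, goal=achieved_goal)
        # Sparse reward recomputed for the new goal: 0 on success, -1 otherwise
        new_t["reward"] = 0.0 if new_t["next_state"] == achieved_goal else -1.0
        relabeled.append(new_t)
    return relabeled
```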

3 Experiment

For the mobile robot environment, we chose as features the x, y position, the angle of rotation, the x, y components of velocity, and the sensor data. The combination of all of these features gives better performance than the x, y components and rotation angle alone. Sensor data alone leads to serious problems, because the sensors can return the same readings at different agent positions; as a result, the higher levels of the hierarchy fail to give a proper goal to the lower level. Without the velocity vector, the agent also has trouble learning, because the next state starts to depend on the previous state, which breaks the MDP assumption (Fig. 6).

Fig. 6. Comparison of the performance of HAC with sensor data features (right) and without sensor data features (left). The charts show the average success rate.


4 Results

Hierarchy has the potential to accelerate learning in sparse reward tasks because it can decompose tasks into short-horizon subtasks, and the HAC framework can solve those simpler subtasks simultaneously. For our mobile robot environment with a continuous state space, HAC outperforms the baseline algorithms by 20–30% for a relatively small environment area. As the size of the environment increases, this gap should continue to grow thanks to hindsight actions.

One issue with this approach is that we cannot set rewards other than the ones used in hindsight. Because of that, we cannot penalize actions that lead the agent into wall collisions, which could be harmful if we trained the algorithm in the real world with a physical agent. The biggest advantage of this approach is definitely the hierarchical neural network structure: we can transfer the weights of the higher-level networks to another agent or another environment, which dramatically decreases training time.

Acknowledgements. The reported study was supported by RFBR, research Project No. 17-29-07079.

References

1. Schmidhuber, J.: Learning to generate sub-goals for action sequences. In: Kohonen, T., Mäkisara, K., Simula, O., Kangas, J. (eds.) Artificial Neural Networks, pp. 967–972. Elsevier Science Publishers B.V., North-Holland (1991)
2. Konidaris, G.D., Barto, A.G.: Skill discovery in continuous reinforcement learning domains using skill chaining. Adv. Neural Inf. Process. Syst. 22, 1015–1023 (2009)
3. Bacon, P.-L., Harb, J., Precup, D.: The option-critic architecture. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 1726–1734 (2017)
4. Vezhnevets, A., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., Kavukcuoglu, K.: FeUdal networks for hierarchical reinforcement learning. In: Proceedings of the 34th International Conference on Machine Learning, pp. 3540–3549 (2017)
5. Nachum, O., Gu, S., Lee, H., Levine, S.: Data-efficient hierarchical reinforcement learning. Adv. Neural Inf. Process. Syst. 31, 3303–3313 (2018)
6. Levy, A., Konidaris, G., Platt, R., Saenko, K.: Learning multi-level hierarchies with hindsight. arXiv:1712.00948 [cs.AI], March 2019
7. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., Zaremba, W.: Hindsight experience replay. Adv. Neural Inf. Process. Syst. 30, 5048–5058 (2017)
8. Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. CoRR (2015). arXiv:1509.02971
9. Silver, D., Schaul, T., Horgan, D., Gregor, K.: Universal value function approximators. In: International Conference on Machine Learning (July 2015)
10. Shikunov, M., Panov, A.I.: Hierarchical reinforcement learning approach for the road intersection task. In: Samsonovich, A.V. (ed.) Biologically Inspired Cognitive Architectures 2019. Springer, Cham (2019)


11. Kuzmin, V., Panov, A.I.: Hierarchical reinforcement learning with options and united neural network approximation. In: Abraham, A., Kovalev, S., Tarassov, V., Snasel, V., Sukhanov, A. (eds.) Proceedings of the Third International Scientific Conference "Intelligent Information Technologies for Industry" (IITI 2018), pp. 453–462. Springer, Cham (2018)
12. Ayunts, E., Panov, A.I.: Task planning in "Block World" with deep reinforcement learning. In: Samsonovich, A.V., Klimov, V.V. (eds.) Biologically Inspired Cognitive Architectures (BICA) for Young Scientists, pp. 3–9. Springer, Cham (2017)

The Hybrid Intelligent Information System for Music Classification

Aleksandr Stikharnyi, Alexey Orekhov, Ark Andreev, and Yuriy Gapanyuk
Bauman Moscow State Technical University, Moscow, Russia
[email protected]

Abstract. The article proposes an approach to the music classification problem using hybrid intelligent information systems (HIIS). The HIIS consists of two main components: the subconsciousness module and the consciousness module. The subconsciousness module is implemented as a set of binary classifiers based on the LSTM network. The output of the subconsciousness module is the metadata stored in the metadata buffer. The consciousness module is implemented using the decision trees approach. The implementation is based on the CART algorithm from the scikit-learn library. The output of the consciousness module is the predicted class of the music classification problem. The experiments were conducted using a custom dataset. Algorithms of three levels of complexity were used for the experiments: the logistic regression approach (the simplest model), the multilayer perceptron approach (the model of medium complexity), and the HIIS approach (the model of high complexity). The results of the experiments confirm the validity of the proposed HIIS approach.

Keywords: Music classification problem · Hybrid intelligent information system (HIIS) · Subconsciousness module · Consciousness module · LSTM · Decision trees

1 Introduction

Recently, the use of machine learning in the field of music processing has been increasing significantly. One of the tasks of such processing is the problem of music classification. A review of existing methods for this problem is given in detail in [1,2]. We propose an approach based on the concept of hybrid intelligent information systems (HIIS). In this article, we consider the details of the HIIS-based system implementation for music classification and discuss the results of experiments.

2 The HIIS-Based Approach for Music Classification

The music classification problem may be described as follows. Let S be an arbitrary set of musical compositions consisting of vectors x ∈ S; {1, 2, . . . , N } is a


set of N user classes. Then the classification problem is reduced to the construction of the mapping algorithm: f ∗ : {x|x ∈ S} ⇒ {1, 2, . . . , N },

(1)

consistent with the real users of the system. In other words, it is necessary to build an algorithm that assigns to each music track of an arbitrary set one of the predefined class labels according to the users' real music preferences. For this purpose, the HIIS-based approach is used. The HIIS approach is described in detail in [3]. According to [3], the HIIS consists of two main components: the subconsciousness module (MS) and the consciousness module (MC). The subconsciousness module is related to the environment in which a HIIS operates. Because the environment can be represented as a set of continuous signals, the data processing techniques of the MS are mostly based on neural networks, fuzzy logic, and combined neuro-fuzzy methods. The consciousness module performs logical processing of information. It may be based on traditional programming or workflow technology; in particular, the rule-based programming approach is gaining popularity. In the proposed approach, both the MS and the MC are based on machine learning algorithms. The generalized structure of the intelligent system is represented in Fig. 1.

Fig. 1. The generalized structure of the intelligent system

It should be noted that the hybrid system as a whole is implemented as an intelligent agent using the experience replay approach [4,5]. The metadata buffer is used for replay purposes. This approach allows us to move from sequential to parallel training: after receiving the first metadata, we generate mini-batches from the metadata buffer and train the subconsciousness and the consciousness modules at the same time. The implementations of the subconsciousness and the consciousness modules are discussed in detail in the following sections.
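A minimal sketch of such a buffer is shown below; the capacity and batch size are illustrative, not taken from the paper.

```python
import random
from collections import deque

class MetadataBuffer:
    """Replay-style buffer for the metadata produced by the MS."""
    def __init__(self, capacity=10000):
        self._items = deque(maxlen=capacity)

    def add(self, metadata):
        self._items.append(metadata)

    def sample(self, batch_size=32):
        # Mini-batch used to train the MS and the MC in parallel
        return random.sample(list(self._items), min(batch_size, len(self._items)))
```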

3 The Implementation of the Subconsciousness Module

The subconsciousness module is implemented as a set of binary classifiers [6] based on the LSTM network [7]. The classifier receives as input a sequence of vectors describing small time intervals. To produce them, the audio is broken down into N snapshots of 4096 samples each, and each snapshot is converted to its amplitude-frequency representation using the FFT [8]. This transformation, in effect, turns the time series into a set of magnitudes of the different frequencies that make up the time series. This snapshot length provides optimal resolution in both frequency and time: for a file with a sampling frequency of 44,100 Hz, it corresponds to about 11 Hz and 93 ms. The set of values obtained after the transformation is converted into a vector using a convolution function (moving average) and statistical estimates. As a rule, intervals of 1–10 s are used as the working interval. Since it is important for us to trace the dynamics of changes in the parameters, each training vector includes sequences of these parameters in time. It is worth noting that we have a restriction on the length of the training vector dictated by computational capabilities; for this reason, increasing the time resolution requires reducing the frequency resolution. Thus, each vector describes 20 frequency corridors (500 Hz each) and their dynamics over an interval of 60 s, taking overlaps into account. The structure of the LSTM network is represented in Fig. 2. Thus, the output of the subconsciousness module is the metadata stored in the metadata buffer.
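A minimal NumPy sketch of this feature extraction (the smoothing window is illustrative, and statistical estimates are omitted) could look like this:

```python
import numpy as np

def spectral_features(signal, sr=44100, frame=4096, n_bands=20):
    """FFT each 4096-sample snapshot, smooth, and average into 500 Hz bands."""
    max_bin = int(n_bands * 500 * frame / sr)     # keep the 0-10 kHz range
    features = []
    for i in range(len(signal) // frame):
        chunk = signal[i * frame:(i + 1) * frame]
        mag = np.abs(np.fft.rfft(chunk))                      # amplitude spectrum
        mag = np.convolve(mag, np.ones(8) / 8, mode="same")   # moving average
        bands = np.array_split(mag[:max_bin], n_bands)        # ~500 Hz corridors
        features.append([band.mean() for band in bands])
    return np.asarray(features)   # shape: (n_snapshots, n_bands)
```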

4 The Implementation of the Consciousness Module

The goal of the consciousness module is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the metadata (stored in the metadata buffer). The metadata is received from a large number of binary classifiers and is a set of relative probabilities that an object belongs to two randomly selected classes. We assume that any object belongs to all classes, but with different probabilities. The task is then to "fit" a mixture of distributions to the data and to determine the probabilities of the observations belonging to each class. Obviously, an observation should be assigned to the class for which this probability is higher. To solve this task, we propose to use decision trees as a kind of rule-based approach. The practical implementation of the system is based on the CART algorithm [9] from the scikit-learn library [10]. The Gini impurity was used as the impurity measure [11]:

H(X_m) = Σ_k p_mk (1 − p_mk)    (2)

where p_mk is the proportion of class k observations in node m.


Fig. 2. The structure of the LSTM network

Fig. 3. The structure of the consciousness decision tree

The structure of the consciousness decision tree is represented in Fig. 3. Thus, the output of the consciousness module is the predicted class of the music classification problem.
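As an illustration, here is a minimal scikit-learn sketch of this setup; the metadata matrix is random placeholder data, not the real buffer contents.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def gini(p):
    """Gini impurity of Eq. (2): H(X_m) = sum_k p_mk * (1 - p_mk)."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))

X_meta = np.random.rand(200, 6)    # placeholder: pairwise class probabilities
y = np.random.randint(0, 3, 200)   # placeholder: target class labels

tree = DecisionTreeClassifier(criterion="gini").fit(X_meta, y)
predicted = tree.predict(X_meta[:1])  # the predicted music class
```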

5 The Experiments

For our research, we used a custom dataset, which includes 1378 tracks divided into three classes, with a sampling frequency of 44,100 Hz; the average track length was 137.3 s. Since we used a custom dataset, we did not have the opportunity to compare the obtained quality metrics with the quality metrics obtained by other researchers. Therefore, to ensure the validity of the proposed approach, we conducted experiments with algorithms of three levels of complexity: the logistic regression approach (the simplest model); the multilayer perceptron approach (the model of medium complexity) [12]; the HIIS approach (the model of high complexity). Precision, Recall and F1-score (F-Measure) were used as classification metrics [13]. The experiment results are represented in Table 1.

Table 1. The experiment results (TP/FP/FN/TN counts per class and totals).

Approach                      Class0  Class1  Class2  Total  Precision  Recall  F1-score
The HIIS approach        TP   129     108     140     377    0.984      0.926   0.95
                         FP   1       0       5       6
                         FN   7       20      3       30
                         TN   1       3       3       7
The multilayer           TP   115     98      133     346    0.797      0.813   0.8
perceptron approach      FP   27      29      32      88
                         FN   19      35      34      80
                         TN   2       3       1       6
The logistic             TP   119     103     127     349    0.751      0.691   0.72
regression approach      FP   17      42      56      115
                         FN   47      71      38      156
                         TN   0       4       2       6

The results of the experiments turned out as expected. The logistic regression approach (the simplest model) shows the worst results. The multilayer perceptron approach (the model of medium complexity) shows medium results. The HIIS approach (the model of high complexity) shows the best results. Thus, the results of the experiments confirm the validity of the proposed approach.

To assess the quality of the classifier (the HIIS model), ROC curves were built [14]. The ROC function can also be used in multiclass classification if the predicted outputs have been binarized. For this reason, ROC curves are plotted for each class. There are then a number of ways to average binary metric calculations across the set of classes, each of which may be useful in some scenario (we use micro-average and macro-average metrics). AUC (Area Under Curve) is an aggregated quality characteristic of a classification, independent of the cost ratio of errors. The higher the AUC value, the better the classification model. The following results were obtained for the HIIS model: AUC_micro-average = 0.87, AUC_macro-average = 0.86. The ROC curves for the HIIS model are represented in Fig. 4.

Fig. 4. ROC curves for the HIIS model
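A minimal sketch of how micro- and macro-averaged AUC can be computed with scikit-learn after binarizing the labels (on placeholder data):

```python
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_auc_score

y_true = np.random.randint(0, 3, 100)   # placeholder true classes
y_score = np.random.rand(100, 3)        # placeholder per-class scores

y_bin = label_binarize(y_true, classes=[0, 1, 2])  # one-vs-rest binarization
auc_micro = roc_auc_score(y_bin, y_score, average="micro")
auc_macro = roc_auc_score(y_bin, y_score, average="macro")
```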

6 Conclusions

The article proposes an approach to the music classification problem using hybrid intelligent information systems (HIIS). The hybrid system as a whole is implemented as an intelligent agent using the experience replay approach. The subconsciousness module is related to the environment in which a HIIS operates. Because the environment can be represented as a set of continuous signals, the data processing techniques of the MS are mostly based on neural networks, fuzzy logic, and combined neuro-fuzzy methods. In the proposed approach, it is implemented as a set of binary classifiers based on the LSTM network. The consciousness module performs logical processing of information. It may be based on traditional programming, workflow technology, or rule-based programming. In the proposed approach it is implemented using decision trees. The experiments were conducted using a custom dataset. The results of the experiments confirm the validity of the proposed approach.


References

1. Fu, Z., Lu, G., Ting, K.M., Zhang, D.: A survey of audio-based music classification and annotation. IEEE Trans. Multimedia 13(2), 303–319 (2011). https://doi.org/10.1109/TMM.2010.2098858
2. Goienetxea, I., Martínez-Otzeta, J.M., Sierra, B., Mendialdua, I.: Towards the use of similarity distances to music genre classification: a comparative study. PLoS ONE 13(2), e0191417 (2018). https://doi.org/10.1371/journal.pone.0191417
3. Chernenkiy, V., Gapanyuk, Yu., Terekhov, V., Revunkov, G., Kaganov, Y.: The hybrid intelligent information system approach as the basis for cognitive architecture. Procedia Comput. Sci. 145, 143–152 (2018). http://www.sciencedirect.com/science/article/pii/S187705091832307X
4. Zhang, S., Sutton, R.S.: A deeper look at experience replay. arXiv preprint arXiv:1712.01275 (2017)
5. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., Zaremba, W.: Hindsight experience replay. arXiv preprint arXiv:1707.01495 (2017)
6. Koyejo, O.O., Natarajan, N., Ravikumar, P.K., Dhillon, I.S.: Consistent binary classification with generalized performance metrics. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, vol. 2, pp. 2744–2752 (2014)
7. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
8. Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)
9. Rokach, L., Maimon, O.: Data Mining with Decision Trees: Theory and Applications, 2nd edn. World Scientific Publishing Co., New Jersey (2014)
10. The Scikit-Learn Library: Decision trees. https://scikit-learn.org/stable/modules/tree.html. Accessed 24 May 2019
11. Modarres, R., Gastwirth, J.L.: A cautionary note on estimating the standard error of the Gini index of inequality. Oxf. Bull. Econ. Stat. 68(3), 385–390 (2006)
12. Haykin, S.: Neural Networks: A Comprehensive Foundation. Macmillan, New York (1994)
13. Powers, D.: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2(1), 37–63 (2011)
14. Pepe, M.S.: The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, Oxford (2004)

The Hybrid Intelligent Information System for Poems Generation

Maria Taran, Georgiy Revunkov, and Yuriy Gapanyuk
Bauman Moscow State Technical University, Moscow, Russia
[email protected]

Abstract. Any generated text must have a "form" and a "content" component. The "content" is the main component of the generated text, but the "form" component is no less important. It may be necessary to generate texts in different linguistic styles, among them the poetic linguistic style. The article proposes an approach to the poems generation problem using hybrid intelligent information systems (HIIS). The HIIS consists of two main components: the subconsciousness module and the consciousness module. In the case of poems generation, the subconsciousness module consists of two submodules: the stress placement module and the rhyme and rhythm module. These modules use machine learning techniques. The consciousness module includes the poem synthesis module, which is rule-based. The stress placement module is based on a convolutional neural network; on the test dataset, the accuracy of the classifier is 97.66%. The rhyme and rhythm module is based on neural networks with a depth of 5–7 layers; on the test dataset, the accuracy of the classifier is 91.63%.

Keywords: Natural-Language Generation (NLG) · Hybrid Intelligent Information System (HIIS) · Subconsciousness module · Consciousness module · LSTM · Convolutional neural network

1 Introduction

According to Gartner's report on BI Tools 2018 [1], "By 2020, natural language generation and artificial intelligence will be a standard feature of 90% of modern business intelligence platforms". Thus, taking into account the needs of the industry, natural-language generation (NLG) is a very important area of software engineering development [2]. The business intelligence platform may be considered a special case of an intelligent assistant agent. The concepts of such assistants have long been developed, and there is no doubt that such software systems or hardware-software devices will be used more and more. The natural-language generation (NLG) module should also be part of such assistants. It should be noted that the area of text-based speech synthesis is also actively developing at present. Therefore, we can assume that by solving the problem of generating text, we simultaneously create both a writing and a speaking agent.


Any generated text must have a "form" and a "content" component. There is no doubt that the "content" is the main component of the generated text. But if such assistants are to be widely used, the question of the "form" in which the text is conveyed becomes no less important. In the process of generating text, the assistant should be guided by the age of the interlocutor, their level of knowledge, and other aspects. Depending on the context of the situation and the peculiarities of the interlocutor, it may be necessary to generate texts in different linguistic styles. This article is devoted to text generation in the poetic linguistic style.

On the one hand, the task of poems generation from the point of view of industry needs can be viewed as a "toy" one. Indeed, it is difficult to imagine that even in the distant future financial statements will be formed in poetic form. But on the other hand, the task of poems generation is simply a special case of the task of generating texts in different "forms".

According to [3], traditional approaches to the generation of poems include:

1. Template-based poetry generation: templates of poetry forms are filled with words that suit the defined constraints (either syntactic, rhythmic, or both).
2. Generate-and-test approaches: random word sequences are produced according to formal requirements, which may involve metric, other formal, and semantic constraints.
3. Case-based reasoning approaches: existing poems are retrieved, considering a targeted message provided by the user, and are then adapted to fit the required content.
4. Evolutionary approaches: poetry generation is based on evolutionary computation.

Obviously, only the evolutionary approach takes full advantage of the methods of artificial intelligence. Within the framework of the evolutionary approach, one of the most detailed works is the dissertation [4]. Nowadays, according to [5], methods of generating poems increasingly use artificial intelligence techniques, especially deep neural networks. An example of such an approach is the interactive poetry generation system "Hafez" [6, 7]. The "Hafez" system generates poems in three steps:

1. Search for related rhyme words given a user-supplied topic.
2. Create a finite-state acceptor (FSA) that incorporates the rhyme words and controls meter.
3. Use a recurrent neural network (RNN) to generate the poem string, guided by the FSA.

The distinctive features of the "Hafez" system are that it is, first, focused on dialogue with the user and, second, generates poems in English. The proposed approach does not involve a dialogue with the user and is focused on the Russian language. Thus, despite many years of effort, the generation of poems remains an open problem, and the authors hope that this article will be a small step in the direction of poems generation.


2 The HIIS-Based Approach for Poems Generation

To solve the poems generation problem, we propose to use the approach based on the hybrid intelligent information system (HIIS). The HIIS-based approach is described in detail in [8]. In this section, we briefly review the HIIS-based approach and consider its application to poems generation. The generalized structure of a hybrid intelligent information system is represented in Fig. 1.

Fig. 1. The generalized structure of a hybrid intelligent information system

According to [8], the HIIS structure consists of two main components: the subconsciousness module (MS) and the consciousness module (MC). The MS (subconsciousness module) is related to the environment in which a HIIS operates. Because the environment can be represented as a set of continuous signals, the data processing techniques of the MS are mostly based on neural networks, fuzzy logic, combined neuro-fuzzy methods, and machine learning techniques. The MC (consciousness module) is traditionally based on conventional data and knowledge processing, which may rely on traditional programming, workflow technology, or the rule-based programming approach. The advantages of a rule-based approach include flexibility: the program is not hardcoded but forward-chained with rules based on the data. The disadvantages include the possibility of rule cycling and the complexity of processing a large set of rules. Nowadays, the Rete algorithm and its modifications are used for processing large sets of rules. To build the consciousness module, it is also possible to use machine learning techniques, for example, building a set of rules in the form of a decision tree.


From the interaction point of view, the following options or their combinations are possible in a HIIS:

1. Interaction is implemented through the environment. The MS reads the data from the environment, converts them, and transmits them to the MC. The MC performs logic processing and returns the results to the MS (if transformation is required) or directly to the environment. The MS transforms the results and writes them into the environment, where they can be read by another HIIS.
2. The MI (Module of Interaction) is used for interaction with another HIIS. Depending on the tasks to be solved, the MI can interact with the MC (which is typical for conventional information systems) or with the MS (which is typical for systems based on soft computing).
3. User interaction can be carried out using the MC (which is typical for conventional information systems) or through the MS (which can be used, for example, in automated simulators).

In the case of poems generation, the subconsciousness module consists of two submodules: the stress placement module and the rhyme and rhythm module. These modules use machine learning techniques. The consciousness module includes the poem synthesis module, which is rule-based. The generalized structure of the HIIS for poems generation is represented in Fig. 2.


Fig. 2. The generalized structure of the HIIS for poems generation

The interaction is implemented through the environment. In this case, the text in prose or poetic form is considered the environment. The proposed approach is implemented for the Russian language. The implementation of the modules is discussed in detail in the following sections.

3 The Stress Placement Module

The input of the module is a Russian word without stress, and the output is the same word with the stress marked.


The module is built on a hybrid approach, combining both rule processing (for simple cases) and machine learning (for more complex cases). The module operation algorithm contains the following steps:

1. The input word is converted to the required format, morphological analysis is performed, and the initial dataset is formed for further processing.
2. In order to detect simple cases, the generated data is processed using a set of rules. An example of such a rule: the Russian letter "Ё" is always stressed.
3. If none of the rules fires, then the machine learning model is used.
4. At the output of the module, a stressed word is formed in a human-readable format as well as in the form of a dataset for further processing.

From the point of view of machine learning, the stress placement problem may be considered a multi-class classification problem. The features of the model are the word itself and additional data extracted after morphological analysis. The target feature is the position of the stressed letter in the word.

Fig. 3. The neural network architecture for the stress placement module

Experiments with a convolutional neural network and an LSTM network were carried out during the construction of the classifier. With comparable quality, the model based on the convolutional neural network is trained much faster. Categorical cross-entropy was used as the loss function, and accuracy was used as the metric.
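A minimal Keras sketch of a character-level convolutional classifier of this kind is given below; all sizes (padded word length, alphabet size, filters) are illustrative and do not reproduce the architecture of Fig. 3.

```python
from tensorflow import keras
from tensorflow.keras import layers

MAX_LEN, ALPHABET = 30, 35   # padded word length; Russian letters + padding

model = keras.Sequential([
    layers.Embedding(ALPHABET, 16, input_length=MAX_LEN),   # characters -> vectors
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(MAX_LEN, activation="softmax"),  # position of the stressed letter
])
model.compile(loss="categorical_crossentropy",
              optimizer="adam", metrics=["accuracy"])
```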


The neural network architecture was chosen experimentally. The final architecture is shown in Fig. 3. On the test dataset, the accuracy of the classifier was 97.66%. The neural network was trained for 16 epochs. The results are shown in Fig. 4.

Fig. 4. The metrics for the stress placement module

An example of the stress placement module output (in Russian, stressed letters are capitalized): "на оснОве Этих дАнных трЕбуется восстановИть неЯвную завИсимость то есть пострОить алгорИтм спосОбный для любОго возмОжного вхОдного объЕкта вЫдать достАточно тОчный классифицИрующий отвЕт" ("based on these data it is required to recover an implicit dependency, that is, to build an algorithm capable of producing a sufficiently accurate classifying answer for any possible input object").

4 The Rhyme and Rhythm Module

One or several sentences can be submitted to the module input, depending on the total number of words. This behavior results from training on four-line stanzas. First, the input text is divided into words, and the stress placement module is used to determine the stress for each word. Then the features for the machine learning models are created. These features include the selected syllables and stresses, as well as the last few letters of the words. Different machine learning methods were used to determine the appropriate words for rhyme, meter, the presence or absence of alliteration, and other target features. A separate model was trained to predict each individual target feature. The search for rhyme words is performed on the basis of a pre-formed dictionary. The dictionary contains both possible word endings and their alternations. Based on empirically selected rules, only the most probable word sequences for a given text are kept in the dictionary. Neural networks with a depth of 5–7 layers were used to determine the other target features.


A separate task is the formation of a dataset for model training. A dataset was prepared with poems by well-known authors that satisfy specific conditions: each verse contains four lines, and blank (unrhymed) verse is removed from the corpus. Since the resulting dataset contained several thousand examples, it was decided to set the values of the target features automatically. For this purpose, dimensionality reduction methods [9] (the PCA algorithm) and hierarchical clustering [10] were used. As a result, seven separate clusters were identified. The visualization of the clusters obtained by the t-SNE algorithm [11] is shown in Fig. 5.
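A minimal scikit-learn sketch of this automatic labeling pipeline (the feature matrix is a random placeholder for the real stanza features):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
from sklearn.manifold import TSNE

X = np.random.rand(500, 40)   # placeholder stanza features

X_reduced = PCA(n_components=10).fit_transform(X)          # dimensionality reduction
labels = AgglomerativeClustering(n_clusters=7).fit_predict(X_reduced)  # 7 clusters
X_2d = TSNE(n_components=2).fit_transform(X_reduced)       # for visualization
```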

Fig. 5. The clusters visualization results

On the test dataset, the accuracy of the classifier was 91.63%. The neural network was trained for ten epochs. The results are shown in Fig. 6.

Fig. 6. The metrics for the rhyme and rhythm module


To improve the quality of classification, it is planned to enhance the quality of the dataset and work out the neural network architecture in more detail.

5 The Poem Synthesis Module

The poem synthesis module is rule-based. The module input is the input prose text and the set of features received from the rhyme and rhythm module. The module output is the generated text in poetic form. A stanza is not formed if the deviation from the template is two or more. Declension and conjugation of words are also not performed in the current version, which leaves room for further improvement in the quality of the system's work. An example of the module output (in Russian; the input text is a fragment of a cookbook): "Необходим сыр похожий на брынзу. / Отлично подойдет пармезан. / Его надо порезать кусочками. / Все смешать и заправить маслом" ("A cheese similar to brynza is needed. / Parmesan will do perfectly. / It should be cut into pieces. / Mix everything and dress with oil").

6 Conclusions

The article proposes an approach to the poems generation problem using hybrid intelligent information systems (HIIS). A HIIS consists of two main components: the subconsciousness module (MS) and the consciousness module (MC). In the case of poems generation, the subconsciousness module consists of two submodules: the stress placement module and the rhyme and rhythm module. These modules use machine learning techniques. The consciousness module includes the poem synthesis module, which is rule-based. The stress placement module is based on a convolutional neural network; on the test dataset, the accuracy of the classifier is 97.66%. The rhyme and rhythm module is based on neural networks with a depth of 5–7 layers; on the test dataset, the accuracy of the classifier is 91.63%. The task of poems generation is simply a special case of the task of generating texts in different "forms". The proposed approach allows generating text in poetic form from prose text.

References

1. Gartner Report on BI Tools 2018. https://systelligent.com/gartner-report-on-bitools-2018. Accessed 24 May 2019
2. Khurana, D., Koli, A., Khatter, K., Singh, S.: Natural language processing: state of the art, current trends and challenges. arXiv preprint arXiv:1708.05148 (2017)
3. Gervas, P.: Exploring quantitative evaluations of the creativity of automatic poets. In: Workshop on Creative Systems, Approaches to Creativity in Artificial Intelligence and Cognitive Science, 15th European Conference on Artificial Intelligence (2002)
4. Manurung, H.M.: An evolutionary algorithm approach to poetry generation. Ph.D. thesis, Institute for Communicating and Collaborative Systems, School of Informatics, University of Edinburgh (2003)


5. Pandya, M.: NLP based poetry analysis and generation. Technical report (2016). https://doi.org/10.13140/rg.2.2.35878.73285
6. Ghazvininejad, M., Shi, X., Choi, Y., Knight, K.: Generating topical poetry. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1183–1191 (2016). https://doi.org/10.18653/v1/d16-1126
7. Ghazvininejad, M., Shi, X., Priyadarshi, J., Knight, K.: Hafez: an interactive poetry generation system. In: Proceedings of ACL 2017, System Demonstrations, pp. 43–48 (2017). https://doi.org/10.18653/v1/p17-4008
8. Chernenkiy, V., Gapanyuk, Y., Terekhov, V., Revunkov, G., Kaganov, Y.: The hybrid intelligent information system approach as the basis for cognitive architecture. Procedia Comput. Sci. 145, 143–152 (2018). http://www.sciencedirect.com/science/article/pii/S187705091832307X
9. Maaten, L.V., Postma, E.O., Herik, J.V.: Dimensionality reduction: a comparative review. J. Mach. Learn. Res. 10(66–71), 13 (2009)
10. Mishra, H., Tripathi, S.: A comparative study of data clustering techniques. Int. Res. J. Eng. Technol. (IRJET) 4(5), 1392–1398 (2017)
11. Linderman, G.C., Steinerberger, S.: Clustering with t-SNE, provably. arXiv preprint arXiv:1706.02582 (2017)

Cognitive Sciences and Brain-Computer Interface, Adaptive Behavior and Evolutionary Simulation

Is Information Density a Reliable Universal Predictor of Eye Movement Patterns in Silent Reading?

Valeriia A. Demareva¹ and Yu. A. Edeleva²
¹ Lobachevsky State University, Nizhny Novgorod 603950, Russia, [email protected]
² Technical University of Braunschweig, 38106 Brunswick, Germany

Abstract. The role of information density as a reliable universal predictor of eye movement patterns in silent reading is considered. Density differences between Russian and English are taken to explain the difference in eye movement patterns for readers with Russian as a native language compared to English-speaking readers. An empirical eye tracking study shows that only one of four expectations was confirmed. Supposedly, the eye-movement pattern observed for Russian could be influenced by some additional language-specific properties of Russian other than information density. We conclude that a universal algorithm predicting eye movement patterns during silent reading on the basis of language density alone is unlikely to exist.

Keywords: Eye movements · Information density · Reading · Prediction · Modeling

1 Introduction

Today, many studies in the field of computer vision focus, among other things, on eye movement recognition [1, 2]. As a result, many computational models of eye movements have appeared that help not only to make inferences about the linguistic processes involved in reading, but also to diagnose neurocognitive disorders [3, 4]. These models are usually developed for specific languages. To make a model cross-linguistically applicable, it is important to define universal factors that determine eye-movement reading patterns. This paper investigates information density as such a universal factor.

Variability between languages remains a key issue in psychology and linguistics, as an understanding of universal patterns of reading can feed models of information processing. Frost et al. speak of the necessity to define independent cross-linguistic parameters that underlie theory-motivated models of reading [5], yet a number of scholars deny universality in reading patterns across languages [6]. If such universals exist, they represent general principles by which information from print is extracted by the writing processing system. Moreover, the most obvious prediction that can be made based on the Universality Theory of Frost et al. is that there are different ways of encoding visual information in different writing systems. However, the time it takes to extract


encoded meaning should remain comparable across languages regardless of the type of encoding. Humans with an unimpaired visual system sample their environment by making a series of fixations and saccades [7]. During fixations, information intake from the encoded input takes place, while saccades do not supply any useful data. However, eye movements are largely under cognitive control, and the analysis of the temporal and spatial characteristics of saccades during reading can reflect cognitive processing [8]. A number of studies show that during reading the upcoming visual input is partially pre-processed in the parafovea [9, 10]. Thus, saccadic movements are common to all humans irrespective of language and culture; saccadic sampling and retinal make-up determine the speed at which visual information is encoded and made available to the linguistic processor.

Any language makes use of reading to extract information from written texts. However, the reading process itself may differ widely across languages at different levels, from single words to phrases and the text as a whole. For example, while reading the same text translated into different languages, participants show a range of eye tracking patterns that vary in the number and length of fixations and the length of forward saccades [11]. Thus, cross-language differences may affect the eye-tracking patterns observed in reading [12, 13].

One such peculiarity is the so-called language density reported in various studies [14, 15]. Density in a language is the amount of information conveyed by one structural unit, for instance, a word or a character. Different types of density are distinguished in the literature. Lexical density is defined by the number of lexical items such as nouns, verbs, adjectives, and adverbs used in the text [16]. It can be used to estimate lexical variability in L2 speech production [15, 16] or to study properties of texts from different corpora [17]. Semantic density is the number of semantic features associated with one verb; it is used in studies on language development and language impairment in aphasia. Neighborhood density is defined as the number of words that differ from a given word in only one phoneme in any word position [19]. This type of language density can significantly influence oral language decoding [20]. Propositional density is a measure of content richness in language production [21].

For the study of written language decoding, two more kinds of density may be informative. Visual density is the amount of visual information that is available per unit of text [12]. Another type of density, information density, represents the amount of information per word, depending on research goals and context [12]. Information density is exploited to define the difference between languages, for instance, German and English. Visual density has been shown to influence the length of forward saccades, and information density influences fixation durations cross-linguistically [12]. Thus, visual and information density may account for cross-linguistic differences in the patterns observed for written language decoding.

Letters, phonemes, and syllables cross-linguistically have different information density. For example, single letters or syllables of English, except for single-letter pronouns (I), articles (a), or inflectional morphology (-s; -ed; -ing), are not usually syntactically informative. In a language like Russian, however, which has a relatively


transparent orthography and a rich inflectional paradigm, letters and syllables, especially at word offset, bear semantic and syntactic information. As a result, when it comes to higher-level processing, cross-linguistic differences may emerge in the relative utility of allocating attention to various features of the input [13]. To this extent, words of equal length can be considered visually denser in Russian than in English.

Based on the assumption that writing systems differ as to their density, Liversedge et al. [12] defined universal and language-specific eye-movement patterns for Finnish, English, and Chinese. However, no such investigation has yet been made with regard to the Russian language. To address the issue of language universality in reading, the present study replicates the experiment in [12] for Russian, whose writing system differs from English as to a number of parameters (alphabet, agglutination, etc.). A systematic comparative analysis of how information density affects the reading pattern for Russian and English might be revealing for cross-linguistic modelling of reading patterns.

Both Russian and English are alphabetic languages that have vowels and consonants. Therefore, their information density can be compared directly: words in Russian are longer than in English, so the information density should be greater for English than for Russian. In line with the study [12], we should expect the number of fixations and the saccade size to be greater, and the fixation length to be shorter, for Russian texts that are equivalent to the original English texts used in [12].

2 Methods

To test the participants' proficiency in their native language (Russian), a C-test was compiled. The C-test makes it possible to assess different types of linguistic knowledge at the micro- and macrolevel and requires mastery of grammar and vocabulary. It can thus be used to assess "global" language proficiency [22]. The C-test is also reported to be highly reliable and valid [23]. The C-test used in the study was a version of the story "Who is called Mowgli?" that was adapted according to the guidelines in [24]. The C-test included 40 words whose second half was deleted and had to be restored by the participants. Every response was scored on a scale from 0 to 3: 3 points for a correctly recovered word; 2 points if the base of the word is correctly chosen but the word form is erroneous; 1 point if the base of the word is correctly chosen but the initial form is used; 0 points for a wrong word base or no answer. The maximum score is 120.

27 Russian-speaking students took part in the experiment. All of them scored high on the C-test (more than 95% of the maximum score). The eye movements were recorded with an SMI Hi-Speed 1250 tracker. The sampling rate was set to 500 Hz. The experiment began with a 9-point calibration. After that, the participants had to read eight texts in Russian and answer comprehension questions. The texts used in the study were translated from the stimuli used in [12]. They were split into 2 to 4 slides, so that each slide contained 1–8 sentences. The Courier New font was used, with each character subtending 0.46° of visual angle.


English and Russian text corpora are compared in Table 1.

Table 1. Stimulus descriptives: total number of words; average number of words in a sentence; average word length (in characters).

Descriptives                               Russian   English [12]
Total number of words                      1676      1762
Average number of words in a sentence      11.72     14.68
Average word length (in characters)        6.1       5.63

The sentence was selected as the unit of analysis. Four measures reflecting global properties of eye movements were computed: (1) Total Sentence Reading Time, (2) Average Number of Fixations, (3) Average Forward Saccade Size, and (4) Average Fixation Duration. Data points beyond three standard deviations, as well as fixations shorter than 60 ms or longer than 800 ms, were removed before statistical analysis. Statistical analysis was performed in MS Excel and Statistica 10.0 using one-way analysis of variance. The study design and procedures were approved by the Ethical Committee of Lobachevsky State University, and all participants provided written informed consent in accordance with the Declaration of Helsinki.
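A minimal sketch of this outlier-removal step (NumPy; the thresholds are the ones stated above):

```python
import numpy as np

def clean_fixations(durations_ms, low=60, high=800, n_sd=3):
    """Drop fixations outside [60, 800] ms and beyond 3 SD of the mean."""
    d = np.asarray(durations_ms, dtype=float)
    d = d[(d >= low) & (d <= high)]
    return d[np.abs(d - d.mean()) <= n_sd * d.std()]
```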

3 Results and Discussion

Global eye movement measures for the Russian texts obtained in this study, as well as for the English texts, are provided in Table 2.

Table 2. Global eye-movement measures: total sentence reading time (in ms), average number of fixations, average forward saccade size (in characters), and average fixation duration (in ms). Standard deviations are given in parentheses.

Eye-movement measure            Russian       English [12]
Total sentence reading time     4302 (1865)   3093 (777)
Average number of fixations     8.6 (2.38)    14.81 (2.93)
Average forward saccade size    7.78 (1.79)   8.53 (1.55)
Average fixation duration       195 (23)      207 (32)

Compared to the results of [12], our results for Russian texts show longer average reading times, a smaller number of fixations, shorter forward saccades, and shorter fixations. The texts themselves had a significant influence on the eye movement measures. For instance, there was a significant effect of text type on the total sentence reading duration (F(6, 3701) = 27.5, p < 0.001), which could partially account for the observed results [25].


The study fully reproduced the experimental design and analysis algorithm used in [12]; however, only one of the expectations (shorter fixation durations) was confirmed. Supposedly, the eye-movement pattern observed for Russian could be influenced by some additional language-specific properties of Russian other than information density. Russian and English both belong to the Indo-European family [26]: Russian belongs to the East-Slavic group [27], and English is a language of the West-Germanic group [26]. Compared to English, Russian is a highly inflectional language [28]. English orthography, with its 26 letters, is considered irregular and morphophonemic, in that a word's sound pattern depends on its meaning. Russian uses the Cyrillic script; the alphabet contains 33 letters. Compared to English, Russian orthography is considered fairly regular [29]. Moreover, different types of linguistic density (lexical [15, 16], semantic [18]) should also be considered.

4 Conclusion

In this paper, we studied information density as a possible universal predictor of eye movement patterns during silent reading. Based on the results of the study, we conclude that a universal algorithm predicting eye movement patterns on the basis of information density alone is unlikely to exist. Additional factors that underlie such predictions should be investigated.

Acknowledgment. This work was supported by the Russian Foundation for Basic Research (grant No. 18-013-01169).

References

1. Leroux, M., Raison, M., Adadja, T., Achiche, S.: Combination of eye tracking and computer vision for robotics control. In: Proceedings of the 2015 IEEE International Conference on Technologies for Practical Robot Applications (TePRA), Woburn, pp. 1–6 (2015)
2. George, A., Routray, A.: Fast and accurate algorithm for eye localization for gaze tracking in low-resolution images. Comput. Vis. 10(7), 660–669 (2016)
3. Beltrán, J., García-Vázquez, M.S., Benois-Pineau, J., Gutierrez-Robledo, L.M., Dartigues, J.-F.: Computational techniques for eye movements analysis towards supporting early diagnosis of Alzheimer's disease: a review. Computational and Mathematical Methods in Medicine 2018. https://www.hindawi.com/journals/cmmm/2018/2676409/cta/. Accessed 26 May 2019
4. Heinzle, J., Aponte, E.A., Stephan, K.E.: Computational models of eye movements and their application to schizophrenia. Curr. Opin. Behav. Sci. 11, 21–29 (2016)
5. Frost, R.: Towards a universal model of reading. Behav. Brain Sci. 35(5), 263–279 (2012)
6. Coltheart, M., Crain, S.: Are there universals of reading? We don't believe so. Behav. Brain Sci. 35(5), 20–21 (2012). Invited commentary on "Towards a universal model of reading"
7. Findlay, J.M., Gilchrist, I.D.: Active Vision: The Psychology of Looking and Seeing. Oxford University Press, Oxford (2003)
8. Liversedge, S.P., Findlay, J.M.: Saccadic eye movements and cognition. Trends Cogn. Sci. 4(1), 6–14 (2000)


9. McConkie, G.W., Rayner, K.: The span of the effective stimulus during a fixation in reading. Percept. Psychophysics 17, 578–586 (1975)
10. Rayner, K.: Eye movements and attention in reading, scene perception, and visual search. Q. J. Exp. Psychol. 62, 1457–1506 (2009). The thirty-fifth Sir Frederick Bartlett Lecture
11. Rahaman, J., Agrawal, H., Srivastava, N., Chandrasekharan, S.: Recombinant enaction: manipulatives generate new procedures in the imagination, by extending and recombining action spaces. Cogn. Sci. 42, 370–415 (2018)
12. Liversedge, S.P., Drieghe, D., Li, X., Yan, G., Bai, X., Hyönä, J.: Universality in eye movements and reading: a trilingual investigation. Cognition 147(3), 1–20 (2016)
13. Stoops, A., Christianson, K.: Parafoveal processing of inflectional morphology on Russian nouns. J. Cogn. Psychol. 29(6), 653–669 (2017)
14. Crocker, M.W., Demberg, V., Teich, E.: Information density and linguistic encoding (IDeaL). Künstl. Intell. 30, 77 (2016)
15. Gregori-Signes, C., Clavel-Arroitia, B.: Analyzing lexical density and lexical diversity in university students' written discourse. Procedia – Soc. Behav. Sci. 198, 546–556 (2015)
16. Reza, K., Gholami, J.: Lexical complexity development from dynamic systems theory perspective: lexical density, diversity, and sophistication. Int. J. Instr. 10(4), 1–18 (2017)
17. Méndez, D., Ángeles, A.: Titles of scientific letters and research papers in astrophysics: a comparative study of some linguistic aspects and their relationship with collaboration issues. Adv. Lang. Literary Stud. 8(5), 128–139 (2017)
18. Borovsky, A., Ellis, E.M., Evans, J.L., Elman, J.L.: Semantic structure in vocabulary knowledge interacts with lexical and sentence processing in infancy. Child Dev. 87(6), 1893–1908 (2016)
19. Nair, V., Biedermann, B., Nickels, L.: Understanding bilingual word learning: the role of phonotactic probability and phonological neighborhood density. J. Speech Lang. Hear. Res. 60(12), 1–10 (2017)
20. Rispens, J., Baker, A., Duinmeijer, I.: Word recognition and nonword repetition in children with language disorders: the effects of neighborhood density, lexical frequency, and phonotactic probability. J. Speech Lang. Hear. Res. 58(1), 78–92 (2015)
21. Smolík, F., Stepankova, H., Vyhnálek, M., Nikolai, T., Horáková, K., Matejka, Š.: Propositional density in spoken and written language of Czech-speaking patients with mild cognitive impairment. J. Speech Lang. Hear. Res. 56(6), 1461–1470 (2016)
22. Eckes, T., Grotjahn, R.: A closer look at the construct validity of C-tests. Lang. Test. 23(3), 290–325 (2006)
23. Babaii, E., Ansary, H.: The C-test: a valid operationalization of reduced redundancy principle? System 29, 209–219 (2001)
24. Cook, S.V., Pandža, N.B., Lancaster, A.K., Gor, K.: Fuzzy nonnative phonolexical representations lead to fuzzy form-to-meaning mappings. Front. Psychol. 7, 1–17 (2016)
25. Demareva, V.A., Polevaia, A.V., Kushina, N.V.: The influence of language density on eye movements in silent reading: an eye tracking study in Russian vs. English. Int. J. Psychophysiol. 131S, S75–S76 (2018)
26. Baldi, P.H.: Indo-European languages. In: International Encyclopedia of the Social and Behavioral Sciences, 2nd edn. Pergamon, Oxford (2015)
27. Zaprudski, S.: In the grip of replacive bilingualism: the Belarusian language in contact with Russian. Int. J. Sociol. Lang. 183, 97–118 (2007)
28. Maučec, M.S., Donaj, G.: Morphology in statistical machine translation from English to highly inflectional language. Inf. Technol. Control 47(1), 63–74 (2018)
29. Boulware-Gooden, R., Joshi, R.M., Grigorenko, E.: The role of phonology, morphology, and orthography in English and Russian spelling. Dyslexia 21(2), 142–161 (2015)
47(1), 63–74 (2018) 29. Boulware-Gooden, R., Joshi, R.M., Grigorenko, E.: The role of phonology, morphology, and orthography in English and Russian spelling. Dyslexia 21(2), 142–161 (2015)

Bistable Perception of Ambiguous Images – Analytical Model

Evgeny Meilikov and Rimma Farzetdinova

National Research Centre “Kurchatov Institute”, 123182 Moscow, Russia
Moscow Institute of Physics and Technology, 141707 Dolgoprudny, Russia
[email protected]

Abstract. Viewing an ambiguous image leads to bistability of its perception, which oscillates randomly between two possible interpretations. The corresponding evolution of the neural system is usually described by an equation of its “movement” over a nonuniform energy landscape under the action of a stochastic force. We utilize an alternative approach, assuming that the system is in a quasi-stationary state described by the Arrhenius equation. The latter determines the probability of the dynamical variation of the image (for example, the left and right Necker cubes [1]) along one scenario or another. The probabilities of transitions from one perception to the other are defined by the barriers separating the corresponding wells of the energy landscape and by the relative value of the noise influencing this process. The mean noise value can be estimated from experimental data. The model predicts a logarithmic dependence of the perception hysteresis width on the period of cyclic sweeping of the parameter controlling the perception (for instance, the contrast of the presented object). This agrees with experiment and allows one to estimate the time interval between two different perceptions.

Keywords: Ambiguous images · Bistable perception

1 Introduction

Bistable perception is manifested when an ambiguous image, admitting two interpretations, is presented to a subject. In that case the perception of the image oscillates randomly in time between the two possible interpretations [2]. Such bistability arises for different modalities [3] – ambiguous geometrical figures [Necker, 1832], figure-ground processes [4], etc. (cf. [5,6]). Why do these oscillations occur? The concrete “microscopic” mechanism of this phenomenon is not known (see [7]), but various formal models have been suggested, based mainly on the idea of competition between distinct neuron populations (engrams) [8]. A fundamental attribute of most such models is the existence of fluctuations (noise), which lead to random switching between different perceptions. We exploit the popular model according to which the dynamical process of bistable recognition might be reduced to the traveling of a ball along the energy


landscape in the presence of sufficiently strong “noise” [8]. Relatively deep wells of that landscape correspond to old neuronal patterns (“long stored” in memory), while new images subjected to identification correspond to shallower wells. Image recognition is analogous to moving the ball into the nearest deeper well corresponding to some known engram. The possible perception bistability is then due to the fact that the probabilities of transitions into different wells, corresponding to different images, differ only weakly, while in the usual situation (unambiguous image recognition) one of these probabilities significantly outweighs the other. The main problem now is to establish which details of the system dynamics define the characteristics of bistable image recognition.

2 Energy Function

Due to fluctuations, the system state changes randomly, which results in the perception bistability. It is suggested [3] that two neuron populations (two different neuron graphs, or two engrams) represent the two possible interpretations of the stimulus. These two populations “compete” with one another, changing the activity of their neurons. Such a model is based on introducing some energy function U with two local minima, corresponding to the two different image perceptions, and a barrier between these two states. The temporal evolution of the neuron system is usually described by the equation of its “movement” over the nonuniform energy landscape under the action of a stochastic force representing noise perturbations [9,10]. We utilize the alternative (and simpler) approach, assuming that the system is in a quasi-stationary state which can be described by the Arrhenius equation [11]. That would be true if the average energy Φ of noise fluctuations were less than the height of the barrier separating the two system states. Below we will see that this assumption is valid. It is the aim of this work to show that this “limited” model, though much simpler, gives no less (in some cases, more) information for describing bistable perception than the more complicated models of the [3] type. In addition, our approach is analytical, while other models yield only numerical calculations and results. Usually, the energy function is written, by analogy with the phenomenological theory of phase transitions [12], in the form of a power function of some state parameter whose change corresponds to the dynamic transition of the system from one state to another. However, such a power form is justified only by the possibility of expanding the function U, in the neighborhood of its minima, in powers of the state parameter. Therefore, the form of that function can be selected arbitrarily (mainly for convenience) from the class of preferably simple functions that describe the needed evolution of the two-well potential with changing state parameter. Specifically, we write that function in the form

\( U(\theta) = -U_0\,(\sin^2\theta + J\theta), \)   (1)

where θ is the generalized coordinate of the system state (the dynamical variable, or the order parameter) and U0 is the typical system “energy”. Here J(t) is the control parameter, generally time-dependent, that defines the system state. For instance, in the case of the Necker cube (see below) the image contrast could


Fig. 1. Extrema of the energy function (1) at J = −0.2.

play the role of such a control parameter. We will be interested in the interval of the parameter θ that corresponds to those minima of the function U(θ) which are closest to the point θ = 0. At J = 0 these extrema are located at the points θ1 = −π/2, θ2 = π/2 (minima) and θ0 = 0 (maximum). If J ≠ 0, the maximum shifts to the point where sin 2θ0 = J, and the minima to the points θ1 = −π/2 + θ0, θ2 = π/2 + θ0 (see Fig. 1). With rising parameter J, the tilt of the energy landscape changes – the first minimum becomes shallower, the second one deeper, and the barrier between them diminishes. Let, for instance, J = −1 in the original state, with the system residing in the first, deep minimum. Then, as the control parameter J rises, the system will move (due to fluctuations) from the state θ1 (where it existed at J = −1) to the state θ2, clearing the reduced barrier whose top is at the point θ0. That barrier disappears completely at J = +1 (see Fig. 2).

Fig. 2. Energy landscape U = U(θ) at various values of the control parameter J. Arrows indicate the system evolution under cyclic changing of that parameter within the limits −1 < J < 1. The cycle corresponds to the hysteresis loop shown in the inset. It is assumed that jumps between states occur at J = ±0.5 by crossing a barrier of height 0.5U0.


Under cyclic variation of the parameter J, the system does not have time to follow it, and, due to such “inertia”, the hysteretic dependence θ(J) arises, shown in the inset of Fig. 2 and associated with system transitions from one well to another over the separating barrier of finite height. In the example shown, the transition occurs at J = ±0.5. The barrier heights Δ12, Δ21, obstructing system transitions from the minimum θ1 to the minimum θ2 and back, are readily found from Eq. (1):

\( \Delta_{12}/U_0 = \sqrt{1-J^2} + J\,(\arcsin J - \pi/2), \qquad \Delta_{21}/U_0 = \sqrt{1-J^2} + J\,(\arcsin J + \pi/2). \)   (2)

In the linear approximation, Δ12/U0 ≈ 1 − πJ/2 and Δ21/U0 ≈ 1 + πJ/2.

The dependencies Δ12(J), Δ21(J) of those barriers on the control parameter are shown in Fig. 3, which demonstrates that under the monotonic variation (J = −1) → (J = +1) they are also monotonic and cross at the point J = 0. Somewhere in the vicinity of that point the transition from one minimum to another occurs. This is a phase transition with hysteresis whose width, as usual, depends on the relation between the time T of sweeping the control parameter and the characteristic time τ (see Eq. (3)) of the phase transition.
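The barrier expressions (2) are easy to verify numerically. The following sketch (ours, not from the original paper; plain NumPy/SciPy with illustrative values) locates the stationary points of Eq. (1) by root finding and compares the resulting barrier heights with Eq. (2):

```python
import numpy as np
from scipy.optimize import brentq

U0 = 1.0  # characteristic "energy" scale; arbitrary units

def U(theta, J):
    """Energy function (1): U(theta) = -U0*(sin(theta)**2 + J*theta)."""
    return -U0 * (np.sin(theta) ** 2 + J * theta)

def dU(theta, J):
    """Derivative of Eq. (1): dU/dtheta = -U0*(sin(2*theta) + J)."""
    return -U0 * (np.sin(2 * theta) + J)

J = -0.2  # the value used in Fig. 1
# Stationary points near theta = 0 (maximum) and theta = -pi/2, +pi/2 (minima)
theta_max = brentq(dU, -np.pi / 4, np.pi / 4, args=(J,))
theta_min1 = brentq(dU, -3 * np.pi / 4, -np.pi / 4, args=(J,))
theta_min2 = brentq(dU, np.pi / 4, 3 * np.pi / 4, args=(J,))

# Numerical barrier heights; they reproduce Eq. (2)
d12 = U(theta_max, J) - U(theta_min1, J)
d21 = U(theta_max, J) - U(theta_min2, J)
print(d12 / U0, np.sqrt(1 - J**2) + J * (np.arcsin(J) - np.pi / 2))
print(d21 / U0, np.sqrt(1 - J**2) + J * (np.arcsin(J) + np.pi / 2))
```

Both printed pairs coincide (≈1.334 and ≈0.531 for J = −0.2), confirming the analytical expressions.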

Fig. 3. Dependencies of the barrier heights for transitions θ1 → θ2 and θ2 → θ1 on the control parameter J.

Instead of explicitly accounting for the noise influence, we use the well-known Arrhenius–Kramers formula [13] for the mean lifetime τ of the system in a certain quasi-stationary state, which is determined by the relation between the height Δ of the “energy” barrier and the mean value Φ of the noise fluctuation energy (that value could be called the chemical temperature)¹:

\( \tau = \tau_0 \exp(\Delta/\Phi), \)   (3)

where τ0 is a constant to be estimated below; by its general sense, it is the time between two successive attempts to clear the barrier. In fact, this relationship defines the probability of the system transition into one or another state. The chemical, or noise, temperature Φ is the chemical analog of the temperature of thermal fluctuations (to which the thermal energy corresponds in chemical kinetics).

¹ By fluctuations we mean deviations of ion or neurotransmitter concentrations in synaptic contacts; that is why we call this noise chemical. The term is purely phenomenological, and different processes could group together under this heading. Nevertheless, the electric potential of a membrane does fluctuate in a random manner (see [14]).

3 Hysteresis

To estimate the width of the hysteresis loop of the dependence θ(J) (for instance, when the control parameter J(t) varies with time), we proceed from the assumption that the transitions θ1 → θ2 and θ2 → θ1 between the minima of the energy U(θ) occur not at the moment when the barrier between these two states disappears, but once the lifetime τ of the current state (see Eq. (3)) diminishes (due to the reducing barrier height) so that it becomes much less than the time T of sweeping the J parameter, that is, under the condition

\( \tau = \tau_0 \exp(\Delta/\Phi) = \gamma T, \quad \text{where} \quad \gamma \ll 1. \)   (4)

It follows from Eqs. (2), (4) that the transition θ1 → θ2 occurs at

\( J = J_{1\to2} = (2/\pi)\,[\,1 - (\Phi/U_0)\ln(\gamma T/\tau_0)\,]. \)   (5)

By the symmetry of the system with respect to the transitions θ1 → θ2 and θ2 → θ1, the reverse transition occurs at J = J2→1 = −J1→2, so that the whole width of the hysteresis loop equals

\( h = J_{1\to2} - J_{2\to1} = (4/\pi)\,[\,1 - (\Phi/U_0)\ln(\gamma T/\tau_0)\,]. \)   (6)
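For orientation, Eq. (6) is easy to evaluate. The sketch below is our illustration, using the parameter values Φ/U0 ≈ 0.4, γ = 0.3, τ0 ≈ 1 s that are estimated from experiment in Sect. 5; it shows the logarithmic decrease of the loop width with the sweep period T:

```python
import numpy as np

# Example parameter values (illustrative only; they are estimated
# from experiment in Sect. 5): Phi/U0 = 0.4, gamma = 0.3, tau0 = 1 s.
phi_over_U0, gamma, tau0 = 0.4, 0.3, 1.0

def hysteresis_width(T):
    """Loop width h of Eq. (6) for sweep period T (in seconds)."""
    return (4 / np.pi) * (1 - phi_over_U0 * np.log(gamma * T / tau0))

for T in (10.0, 30.0, 100.0):
    print(f"T = {T:5.0f} s  ->  h = {hysteresis_width(T):+.3f}")
# h decreases logarithmically with T and changes sign when
# Phi/U0 > 1/ln(gamma*T/tau0), cf. Eq. (12) below.
```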

4 Necker Cube: Perception Bistability

In the experiment [10], the Necker cube was presented as an ambiguous figure (see Fig. 4), with the contrast of the three neighboring cube edges meeting at its left middle corner serving as the control parameter −1 < J < 1. The values J = −1 and J = +1 correspond, respectively, to luminosities j = 0 and j = 255 of the pixels of those edges in an 8-bit gray scale. Thus, the contrast J (the control parameter) was defined by the relation J = 2j/255 − 1, where j is the luminosity of those lines on the given scale. In this case, the contrast of the three middle cube edges meeting at the right middle corner equals 1 − 2J, and the contrast of the six visible outer cube edges equals 1. In the symmetrical case J = 0, so the parameter J defines the deviation from symmetry. For the purely left cube J = −1, and for the purely right cube J = 1.


Fig. 4. Images of Necker cubes with different contrasts (J = −1, −0.42, 0, 0.56, 1) defined by the control parameter J [10].

In the course of the experiment, cube images with N random values Ji of the control parameter (i = 1, 2, …, N) were presented many times. Subjects were requested to press buttons on the control panel according to their initial impression – whether the cube is “left” (Fig. 4a) or “right” (Fig. 4e). Each cube with a fixed value Ji of the control parameter was randomly presented many times. For each value Ji of the control parameter, the probability

\( P_L(J_i) = l(J_i)/[\,l(J_i) + r(J_i)\,] \)   (7)

of observing the left cube was calculated. Here l(Ji) and r(Ji) are, respectively, the numbers of presses of the left or the right button after presenting cubes with the value Ji of the control parameter. The experimental results shown in Fig. 5 are qualitatively similar for all subjects but differ quantitatively. For some observers, the perception of images as left cubes transforms steeply into their perception as right cubes (near the “symmetry point” J = 0, where PL = 0.5; see the upper panel of Fig. 5), while for others this conversion is smeared (see the lower panel of Fig. 5). In [10] those results are associated with the competition of different neuron populations near the cusp point of catastrophe theory with noise included [15]. Our approach is much simpler – we use the Arrhenius relation (3) for the system lifetime in a metastable state, which permits describing correctly not only the dependence PL(J) but also the hysteresis of the image perception under cyclic variation of the control parameter (see below). We identify the memorized patterns of the left and the right cubes with long-formed wells of the energy landscape, and the new image to be recognized with a virtual (recently formed) well. Recognizing the image in this model is the transfer of the system from the new well of the energy landscape, corresponding to the presented image, into one of the two other wells, corresponding in our case to the engrams of the left and the right cubes. The direction of such a (to some extent random) transfer is defined by the fact that the barriers between the initial and the two final wells have different heights. The barrier between the wells of more similar images is lower, and this leads to a preferred transfer from the well of the presented image into the well of the more similar memorized one. Let ΔL and ΔR be the heights of the indicated barriers. If the presented image is more similar to the left cube image, then ΔL < ΔR, and conversely. Clearly, the more the contrast of the presented cube differs from the zero contrast


Fig. 5. Typical experimental dependencies [10] of the probability PL(J) to perceive the image as a left cube on the control parameter J (points in the three panels relate to three different observers). Solid curves are theoretical dependencies (10) with the c-parameter values specified in each panel.

of the symmetrical image (J = 0, where ΔL = ΔR), the greater the difference between the barriers. Then the simplest linear relation between the barrier heights and the contrast J of the new image has the form

\( (\Delta_L - \Delta_R)/\Phi = c\,J, \)   (8)

where c is an individual constant to be determined experimentally. The probability PL of recognizing the cube as the left one (or the right one) depends on the probabilities pL (pR) of transferring from the well corresponding to the presented image into the well of the left (right) cube. According to (3),

\( p_L \propto \exp(-\Delta_L/\Phi), \qquad p_R \propto \exp(-\Delta_R/\Phi). \)   (9)

Hence, the total probability to see the left cube equals

\( P_L = p_L/(p_L + p_R) = 1/[\,1 + \exp(cJ)\,]. \)   (10)

Figure 5 shows (together with the experimental data of [10]) the theoretical dependencies PL(J) calculated by Eq. (10), which evidently match the experiment well when the numerical value of the parameter c is properly chosen. The latter varies within the limits from c ≈ 20 (the upper panel of Fig. 5) down to c ≈ 2 (the lower panel of Fig. 5). It follows that in the first case the noise is rather weak, Φ/(ΔL − ΔR) = 1/cJ ∼ 0.1, while in the second case the noise intensity is high enough, Φ/(ΔL − ΔR) ∼ 1, and is comparable with the barrier heights.
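The individual constant c can be extracted from data of the Fig. 5 type by fitting Eq. (10). A sketch of such a fit (the data array here is hypothetical, chosen only to illustrate the procedure; SciPy's curve_fit is one possible tool):

```python
import numpy as np
from scipy.optimize import curve_fit

def P_left(J, c):
    """Eq. (10): probability of perceiving the left cube."""
    return 1.0 / (1.0 + np.exp(c * J))

# Hypothetical measured points (contrast J_i, observed fraction P_L)
J_data = np.array([-1.0, -0.5, -0.2, 0.0, 0.2, 0.5, 1.0])
P_data = np.array([0.99, 0.95, 0.80, 0.50, 0.20, 0.05, 0.01])

(c_fit,), _ = curve_fit(P_left, J_data, P_data, p0=[5.0])
print(f"fitted c = {c_fit:.1f}")  # c ~ 20 (steep) ... c ~ 2 (smeared) in [10]
```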

5 Necker Cube: Hysteresis of Perception

In [9], experiments with the Necker cube are discussed which concern the statistics of switching between the two possible perceptions of the image as the control parameter varies in time. At first, that parameter was gradually changed (over the time T) in the forward direction (from J = −1 to J = 1), and then, over the same time, in the reverse direction (from J = 1 to J = −1); the time T of the contrast sweep was varied. The moments τf, τb (for the forward and backward sweeping of the control parameter, respectively) were registered at which the observer first switched from one image perception to the other. In a bistable system, such switching takes place twice – at the forward and at the backward variation of the parameter. Such a hysteresis phenomenon depends on the rate of variation of the control parameter and is observed in different bistable systems. Hysteresis is the property of a system whereby, under varying external conditions, the system state differs, more or less, from the state that would be in equilibrium under the current conditions. The latter state is the one that could be reached in infinite time after the onset of certain (further unchanged) conditions. In reality, to arrive at a state close enough to the equilibrium one, a finite characteristic relaxation time τ is needed, so that the existence (or nonexistence) of hysteretic phenomena is defined by the relation of two times – the relaxation time and the experiment duration T: there is hysteresis if T ≪ τ, and hysteresis is absent if T ≫ τ. The hysteresis (more exactly, the hysteresis width) can be conveniently characterized by the parameter

\( h = (\tau_f + \tau_b)/T - 1, \)   (11)

which goes to zero (and even becomes negative) when τf, τb < T/2, and is distinct from zero at low T, when τf, τb > T/2 and h > 0. The hysteresis loops for these two cases are traversed in opposite directions – clockwise (h < 0) and anticlockwise (h > 0). As is seen from (6), the case h < 0 is realized under the condition

\( \Phi/U_0 > 1/\ln(\gamma T/\tau_0), \)   (12)

which corresponds to a high enough (other factors being equal) intensity of fluctuations, Φ/U0 ≲ 1, provoking “advanced” transitions between energy minima over high barriers. The logarithmic dependence predicted by our model agrees with the experiment [9], which allows estimating some model parameters numerically. Figure 6 presents two typical experimental dependencies of the hysteresis width h on T (for two different subjects), which are well approximated by straight lines on a logarithmic scale. For numerical estimates, it is convenient to introduce the dimensional constant τ1 = 1 s and rewrite Eq. (6) in the dimensionless form

\( h = A - B\ln(T/\tau_1), \quad \text{where} \quad A = (4/\pi)\{1 - (\Phi/U_0)\ln(\gamma\tau_1/\tau_0)\}, \quad B = 4\Phi/(\pi U_0). \)   (13)


Fig. 6. Typical experimental dependencies (points) of hysteresis width (11) on the duration T of scanning the control parameter [9]. Straight lines are linear fits.

Fig. 7. Experimental dependencies τf, b(T) (• – τf, ◦ – τb) [9].

Parameters A and B are determined from the linear dependencies of Fig. 6. For example, for the upper of those dependencies A ≈ 1, B ≈ 0.5, from which it follows at once that

\( \Phi/U_0 \approx 0.4, \qquad \tau_0/\tau_1 \approx 2\gamma. \)   (14)

Thus, the relative intensity of fluctuations (for this particular tested person) is rather high. For comparison, the lower dependency in Fig. 6 gives Φ/U0 ≈ 0.15. We see that in all cases the noise is still smaller than the barrier height, and hence the Arrhenius equation may be used. As for the time τ0, or the parameter γ directly coupled with it (see (14)): if one chooses, for instance, γ = 0.3, then τ0 ∼ 1 s.

One could also consider a simpler model of transferring the system from one state to another, by assuming that the transition always occurs (independently of the sweeping time T) at the moment when the difference between the initial contrast (J = −1 at the moment t = 0) and the contrast at the switching moment (t = tf) reaches some critical value Jc. For linear sweeping in the forward direction J = −1 + t/T, so that Jc = tf/T, or tf = Jc·T, which corresponds to the simple rule: the switching time is proportional to the sweeping time. This rule is to some extent confirmed by the experiment [9] – see Fig. 7, where the experimental dependencies τf, b(T) are presented. One can see that, in spite of the large data scatter, these dependencies can in fact be considered linear. They correspond to the value Jc ≈ 0.15. Hence, in this model the switching should happen every time the contrast difference reaches ∼15%. However, this oversimplified model predicts a constant hysteresis width h ≈ −0.7 (see (11)), which contradicts the experiment.

6 Conclusions

The bistability models described above consider, in effect, the dynamical processes of switching between different perceptions of an ambiguous image and the hysteresis of such perception. In contrast, no dynamical equations (such as θ̇ = −∂U/∂θ) are used in our scheme, which is based on the Arrhenius–Boltzmann relation (3) defining the probability of a dynamical change of the perceived image type under one scenario or another. In the considered model, the probabilities of transitions from one perception type to another are calculated. They are determined by the barriers separating the respective wells (of depth U0) of the energy landscape and by the noise level influencing the process. The latter is represented by the parameter Φ, whose relative value can be estimated from experimental data: Φ/U0 ≈ 0.1–1 (individually for various observers). The logarithmic dependence, predicted by the model, of the perception hysteresis width on the period of cyclic sweeping of the parameter controlling the perception (for instance, the contrast of the presented image) agrees with the experiment and allows estimating the time τ of switching between the two potentially possible perceptions of the ambiguous image: τ ∼ 1 s for T = 30 s. Thus, in the framework of the described “non-dynamical” approach, one can obtain definite conclusions about the dynamical characteristics of the bistable perception of ambiguous images.

References

1. Necker, L.: Observations on some remarkable phenomenon which occurs on viewing a figure of a crystal of geometrical solid. London Edinb. Philos. Mag. J. Sci. 3, 329–337 (1832)
2. Huguet, G., Rinzel, J., Hupé, J.-M.: Noise and adaptation in multistable perception: noise drives when to switch, adaptation determines percept choice. J. Vis. 14(3), 19 (2014)
3. Moreno-Bote, R., Rinzel, J., Rubin, N.: Noise-induced alternations in an attractor network model of perceptual bistability. J. Neurophysiol. 98, 1125–1139 (2007)
4. Pressnitzer, D., Hupé, J.M.: Temporal dynamics of auditory and visual bistability reveal common principles of perceptual organization. Curr. Biol. 16, 1351–1357 (2006)
5. Leopold, D.A., Logothetis, N.K.: Multistable phenomena: changing views in perception. Trends Cogn. Sci. (Regul. Ed.) 3, 254–264 (1999)
6. Long, G.M., Toppino, T.C.: Enduring interest in perceptual ambiguity: alternating views of reversible figures. Psychol. Bull. 130, 748–768 (2004)
7. Sterzer, P., Kleinschmidt, A., Rees, G.: The neural bases of multistable perception. Trends Cogn. Sci. 13(7), 310–318 (2009)
8. Haken, H.: Principles of Brain Functioning. Springer, Cham (1996)
9. Pisarchik, A.N., Jaimes-Reátegui, R., Magallón-García, C.D.A., Castillo-Morales, C.O.: Critical slowing down and noise-induced intermittency in bistable perception: bifurcation analysis. Biol. Cybern. 108(4), 397–404 (2014). https://doi.org/10.1007/s00422-014-0607-5
10. Runnova, A.E., Hramov, A.E., Grubov, V.V., Koronovskii, A.E., Kurovskaya, M.K., Pisarchik, A.N.: Chaos, Solitons Fractals 93, 201–206 (2016)
11. Stiller, W.: Arrhenius Equation and Non-Equilibrium Kinetics. BSB B.G. Teubner Verlagsgesellschaft, Leipzig (1989)
12. Toledano, J.-C., Toledano, P.: The Landau Theory of Phase Transitions. World Scientific, Singapore (1987)
13. Kramers, H.A.: Brownian motion in a field of force and the diffusion model of chemical reactions. Physica 7, 284–304 (1940)
14. Burns, B.D.: The Uncertain Nervous System. Edward Arnold (Publishers) Ltd., London (1968)
15. Poston, T., Stewart, I.: Catastrophe Theory and Its Applications. Pitman, London (1978)

Video-Computer Technology of Real Time Vehicle Driver Fatigue Monitoring

Y. R. Muratov, M. B. Nikiforov, A. S. Tarasov, and A. M. Skachkov

Ryazan State Radio Engineering University Named After V.F. Utkin, Ryazan, Russia
[email protected]

Abstract. This article is devoted to the actual problem of monitoring human fatigue and reduced attention concentration in transport. The authors consider the most efficient, in their opinion, method of assessing a person's psycho-emotional condition: video monitoring based on eye condition analysis. It is based on a convolutional neural network with its own topology. The problem of choosing the optimal network depth for real-time operation, and the problem of achieving high accuracy on a single-board computer with an ARM processor architecture, were analyzed. As a research result, a prototype of the software and hardware complex is presented. This prototype allows detecting human fatigue by means of eye video image analysis. The system allows reducing the number of car accidents associated with the vehicle driver falling asleep. In conclusion, the short-term project development prospects are proposed. Fatigue of the person who performs control, management or decision-making, and a decrease of attention concentration on the object, can lead to critical consequences. The most efficient control of a person's physiological state is video monitoring based on eye condition analysis. An algorithm based on a convolutional neural network, and its hardware implementation, providing face search in the image, eye detection and analysis of the eye condition by the “open–closed” principle, is proposed.

Keywords: Convolutional neural network · Single-board computer · Human fatigue control · Accident reduction


1 Introduction

The development of automated operator monitoring systems is an integral part of reliability improvement in the “human–machine” system. Factors such as emotional stress and fatigue can lead to a performance decrease, described by the following main characteristics: attention state and readiness for emergency action [1]. Most important is the state control of operators whose erroneous actions pose a direct threat to human lives: nuclear power plant operators [2], military equipment operators [3], and public and private transport drivers [4]. One of the actual problems associated with decreasing attention concentration is the driver falling asleep while driving. In 2017, Ford conducted a survey among Russian drivers. According to its statistics, 32% of respondents had fallen asleep while driving, and 3.8% of them



admitted that they woke up after a collision or after leaving the roadway. About 20% of all accidents are caused by falling asleep while driving [5]. A sleepy driver, like a drunk one, is extremely dangerous on the road. Every year the number of car accidents caused by drivers falling asleep increases worldwide. A survey conducted in Norway found that 1 of 12 car drivers fell asleep at least once while driving during a year. The main signs of decreasing driver concentration are the following:

• difficulties in focusing vision;
• frequent eye blinking;
• a feeling of heavy eyelids;
• difficulty keeping the head straight;
• frequent yawning;
• the driver can hardly remember the last traveled kilometers;
• the driver passes road signs without paying attention to them;
• the car often moves out of its lane;
• difficulties in keeping distance;
• the car touches a noise strip on the road side.

In addition, the physiological reactions of the cardiovascular, respiratory, and central nervous systems change in a state of fatigue and drowsiness. Behavioral indicators such as yawning, blinking, head tilting, and a look distracted from the road for a long time are often used to detect the signs of decreasing driver attention automatically [13]. Observation and video processing systems integrated into a vehicle can significantly improve transport safety. The most efficient sign of decreasing concentration is the dynamics of the eye condition [6]; developments of this type implemented in hardware can be mentioned [2]. Despite the apparent simplicity of detecting open and closed eyes in a video frame, the existing systems are far from perfect. Eye detection difficulties arise when the driver turns his head, for example, looking at the side window or rear-view mirror, or at night, with variable road illumination and oncoming lights. There are also many other factors that complicate the operation of the proposed systems [7]. Recently, driver falling-asleep monitoring systems based on face and eye analysis in the video camera image have begun to appear actively. Such systems include CoDriver made by Jungo, devices integrated into the steering wheel provided by Johnson Safety Systems, the Driver Alert Sleep Warning Device, and others. Existing developments assume that the camera is positioned so as to “see” the whole face. This arrangement can be inconvenient and also makes it difficult to integrate such systems into many vehicles where such an optical sensor placement is not acceptable due to design features. A serious problem for many systems is the use of eyeglasses: specialized lenses can create flares and distort the appearance of the eyes, making the operation of such systems impossible.


The developed complex should also be capable of rapid adjustment in case of sharp brightness variation (front lighting, driving in a tunnel, light reflections, etc.), have a broad range of working temperatures, low power consumption, and a reasonable price.

2 Investigation of Face and Eye Detection Methods in Images

The system requires determining the position of the driver's eyes in the image. Before the system begins to detect the eye position, it is necessary to localize the driver's face. One of the best-known methods of face detection is the Viola–Jones method [3, 4]. Its basic principle is the representation of the image in integral form, which allows counting the total brightness of any rectangle in the image. The integral characteristics are used to calculate features based on Haar primitives [9], and the output is obtained by boosting [10]. Training proceeds very slowly, while the search for an object (face) is very quick, but it is insufficiently correct for some head positions. Another relevant image search algorithm is the Single Shot MultiBox Detector (SSD), based on convolutional neural networks such as MobileNet. Such algorithms have higher accuracy than Viola–Jones (more than 90%); however, the implementation of a convolutional network on an ARM CPU shows poor performance. One more way of detecting objects in the image is the histogram of oriented gradients (HOG). The method is based on calculating the directions of the image brightness gradients and on finding the area where the majority of them match a template. In other words, it is necessary to find the section of the image whose HOG representation is most similar to the HOG representation of a face. HOG allows detecting faces with the ability to trade off between performance and accuracy; for example, the authors of the DLib library [11] achieved a detection accuracy of 99.38%. Under the constraints imposed by the ARM processor architecture and camera angles, HOG was chosen. The result of face localization by the HOG algorithm is the coordinates of a square frame containing the whole face or a large part of it. Eye detection is also possible with several different algorithms. The first and most commonly used are Haar cascades. The algorithm gives correct results in 80% of cases when the face is frontal. In poor lighting and night driving conditions, the algorithm works unsatisfactorily; low performance is another disadvantage of Haar cascades. The most productive of all the algorithms is the analytical algorithm for determining landmark facial points. Versions of this algorithm using different datasets are able to determine from 5 to 68 facial points at a speed of about 7000 FPS on the classical x86-64 CPU architecture. Figure 1 presents the results of both methods, Haar cascades and the analytical one, in a case when the eyes were not localized by Haar cascades. Eyes selected by Haar cascades are highlighted by black rectangles; data obtained by the analytical algorithm are highlighted by grey rectangles.


Fig. 1. Search result of eyes on faces
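The production complex is implemented in C++ (see below), but the HOG-plus-landmarks stage is convenient to prototype in Python. The sketch below is our illustration, assuming dlib's standard frontal face detector and the publicly available 68-point landmark model; the file names are hypothetical:

```python
import cv2
import dlib

# HOG + linear SVM face detector and the standard 68-point landmark model
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("driver_frame.png")          # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

for face in detector(gray):                     # square face frames
    shape = predictor(gray, face)
    # In the 68-point scheme the eyes are points 36-41 and 42-47:
    # six points per eye, as in the paper's implementation.
    right_eye = [(shape.part(i).x, shape.part(i).y) for i in range(36, 42)]
    left_eye = [(shape.part(i).x, shape.part(i).y) for i in range(42, 48)]
```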

To detect the eye condition, either open or closed, one can use the analysis of the eye point coordinates obtained by the analytical algorithm; in our implementation, each eye is described by six points. However, the accuracy of this analysis depends on the size of the eyes and eyelids of people of different races, as well as on lighting conditions and head positions. For certain head positions the analytical algorithm gives a frontal eye point that does not belong to the eye. Therefore, it was decided to use a neural network algorithm to detect the state of the eye. Keras was used as the training platform [8]; the models obtained with this framework have small weight and integrate well with OpenCV. The input of the neural network is the image of an eye and its neighborhood of several pixels, scaled to size 96×96 (this image size proved optimal). The output has three values:

1. the probability that the image contains an open eye;
2. the probability that the image contains a closed eye;
3. the probability that the image contains no eye at all.

The third value allows excluding false operation of the complex when the eye (face) was not found, for example, when the head is turned. One of the main problems was the choice of the network architecture. Small networks have low accuracy, but they make a decision very quickly; large networks, such as Xception [7, 12], on the contrary, work slowly but give high accuracy. The optimal implementation was reached by creating a new variant of the network architecture that has the highest accuracy among those capable of working in real-time mode on the RockChip RK3399 CPU (less than 30 ms per frame). Eight variants of neural network architectures were analyzed; the experimental results are shown in Fig. 2. As a result, the most optimal network has the structure P3C32P2C64P2C128P2C256P2D1024D3, where:

Cn – a convolution operation of the image with selection of n features;
Pm – a subsampling operation (max pooling) with an m×m core size;
Dk – a fully connected layer of the neural network with k neurons.


Fig. 2. Comparison of different neural network models depending on speed and accuracy

As a result of the research, the following architecture was obtained (Fig. 3):

Fig. 3. Neural network architecture

This model works with square color images of the eye area of size 96×96 and represents a consecutive execution of convolution and pooling operations, increasing the number of characteristics until an array consisting of only 1024 characteristics is obtained. After that, the received characteristics go to a fully connected layer of 1024


neurons, after which the images are separated into the three required classes: BAD, OPEN and CLOSE. The ReLU function was selected as the activation function:

\( f(x) = \begin{cases} 0, & x < 0 \\ x, & x \ge 0 \end{cases} \)   (1)
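For illustration, the P3C32P2C64P2C128P2C256P2D1024D3 topology can be written down in Keras as follows. The 3×3 convolution kernels, the "same" padding and the softmax output layer are our assumptions – the paper fixes only the filter counts, the pooling sizes and the ReLU activation:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.MaxPooling2D(3, input_shape=(96, 96, 3)),            # P3
    layers.Conv2D(32, 3, padding="same", activation="relu"),    # C32
    layers.MaxPooling2D(2),                                     # P2
    layers.Conv2D(64, 3, padding="same", activation="relu"),    # C64
    layers.MaxPooling2D(2),                                     # P2
    layers.Conv2D(128, 3, padding="same", activation="relu"),   # C128
    layers.MaxPooling2D(2),                                     # P2
    layers.Conv2D(256, 3, padding="same", activation="relu"),   # C256
    layers.MaxPooling2D(2),                                     # P2
    layers.Flatten(),                                           # 2*2*256 = 1024
    layers.Dense(1024, activation="relu"),                      # D1024
    layers.Dense(3, activation="softmax"),                      # D3: BAD/OPEN/CLOSE
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

With these assumptions the flatten step yields exactly the 1024 characteristics mentioned above (the 96×96 input shrinks to 32 → 16 → 8 → 4 → 2 spatial cells, and 2·2·256 = 1024).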

Training was performed on a specially labeled set of more than 150,000 images of open and closed eyes. The network was trained for 6 iterations. Figure 4 shows the dependence of false positives on the training iterations.

Fig. 4. Quality of work depending on the iteration of training

The training was performed on our own datasets, which included samples of eye images of university students of different nationalities, under different lighting conditions and with different facial attributes. This ensured high-quality operation regardless of conditions. The result of the trained network is shown in Fig. 5. The test sample shows that the presented model demonstrates high accuracy under various conditions: glasses, glare, sudden changes in brightness, etc.

Fig. 5. The eye condition detection result

The algorithms are implemented in C++, using the OpenCV library under the ARM architecture.


3 Experiment Results

To assess the quality of the proposed complex, an experiment was conducted. During the experiment, about 10,000 different images prepared in different situations were analyzed:

• different lighting conditions and camera locations;
• different age, ethnic and sex composition;
• presence of limiting factors such as headdress, mustache, beard, etc.;
• a driver wearing glasses: sunglasses, anti-reflective, correcting hyperopia and myopia.

In Figs. 6 and 7 the results of face detection by the HOG algorithm and by the neural network are presented. Experimentally, the face detection algorithms demonstrated a high detection rate (98.21%). A small percentage of incorrect results occurs only in the case of bright lighting of half the face and a large head rotation angle of more than 50°. However, errors at large angles are compensated by the presence of two cameras that complement each other in conditions when the driver looks in the side mirrors.

Fig. 6. The result of the face selection module using HOG

In Fig. 8 the result of the neural network algorithm for detecting the eye state is presented. The presented group of algorithms allows finding and selecting the eye areas regardless of the shooting conditions and the presence of glasses. Testing of the system prototype made it possible to confidently determine the moments of closing and opening the eyes with the following shooting parameters:

• daylight and head positions ±45° horizontally;


Fig. 7. The result of the face extraction module using a neural network

Fig. 8. The eye detection result. Eyes closed on the left and open on the right

• night mode (illumination by IR diodes with a wavelength of 840 nm) and head positions ±35° horizontally;
• glasses with diopters ±5, day and night lighting, for cases when the glasses' temple does not cover the eyes in the image (head rotation angles ±35° horizontally);
• safety glasses with a light degree of shading, daylight, for cases when the glasses do not cover the eye in the image (head rotation angles ±35° horizontally).

4 Conclusion

The proposed video-computer technology includes two successive stages of determining the approaching-sleep driver state: first of all, the facial area is searched for and selected in the frame. Once this operation is performed, the algorithm uses a neural network that


determines the state of both eyes. The hardware implementation is focused on a low-cost evaluation board based on the SoC RK3399, which includes a CPU with big.LITTLE architecture (dual-core Cortex-A72 and quad-core Cortex-A53) and a Mali-T864 GPU. One or two video cameras are used as image registration sensors. The solution is not tied to a specific location of the camera; on the contrary, it allows the driver to choose the place of attachment. Testing showed that in the case of one camera, the best place to install it is the space above the dashboard between the windshield and the driver's face; in the case of two cameras, the best places are the front side pillars. Two cameras allow monitoring the driver, e.g., when he looks in the side mirrors. The immediate prospects for the project development are the addition of new driver fatigue assessment metrics; the ability to measure heart rhythm from the video stream is proposed as one of them. This addition will increase the accuracy of driver fatigue determination. In addition, the developed complex makes it possible to set a limit on continuous driving of the car. This feature is highly relevant for vehicles requiring the installation of tachographs, which have a common disadvantage – the possibility of cheating the device. When the developed complex is used as a “smart” tachograph, there is no opportunity to cheat it, because the device “remembers” the driver's face. Thus, the developed complex will reduce the number of road transport incidents that occur due to lack of concentration caused by driver fatigue or distraction.

References

1. Dushkov, B.A., et al.: Fundamentals of Engineering Psychology, p. 576. Moscow–Yekaterinburg (2002)
2. Alyushin, M.V., Alyushin, A.V., Belopolsky, V.M., Kolobashkina, L.V., Ushakov, V.L.: Optical technologies for monitoring systems of the current functional state of the operational composition of the management of nuclear power facilities. Global Nucl. Saf. 6, 9–77 (2003)
3. Melnik, O.V., Demidova, K.A., Nikiforov, M.B., Ustyukov, D.I.: Continuous monitoring of blood pressure of the vehicle crew and decision makers. Defense Technol. Sci. Tech. Collect./FSUE “NIISU” 9, 77–80 (2016)
4. Sahayadhas, A., Sundaraj, K., Murugappan, M.: Detecting driver drowsiness based on sensors: a review. Sensors 12(12), 16937–16953 (2012)
5. Ovcharenko, M.S.: Analysis and forecast of the state and level of accidents on the roads of the Russian Federation and ways to reduce it. Sci. Methodical Electron. J. Concept 15, 1661–1665 (2002)
6. Dimov, I.S., Derevyanko, R.E., Kotin, D.A.: Automated system for preventing the driver from falling asleep while driving. Vestn. MGTU 20(4), 659–664 (2017)
7. Kostyashkin, L.N., Nikiforov, M.B. (eds.): Image Processing in Aviation Vision Systems, p. 240. Fizmatlit, Moscow (2016)
8. Chollet, F.: Keras. https://github.com/fchollet/keras. Accessed 21 Nov 2015
9. Viola, P., Jones, M.: Robust Real-Time Object Detection. Cambridge Research Laboratory, Cambridge (2001)
10. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the Conference on Computer Vision and Pattern Recognition, vol. 1, pp. I-511–I-518 (2001)
11. King, D.E.: Dlib-ml: a machine learning toolkit. J. Mach. Learn. Res. 10, 1755–1758 (2009)
12. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: CVPR (2017)
13. Furman, G., Baharav, A., Cahan, C., Akselrod, S.: Early detection of falling asleep at the wheel: a heart rate variability approach. Comput. Cardiol. 35, 1109–1112 (2008)

Consistency Across Functional Connectivity Methods and Graph Topological Properties in EEG Sensor Space

Anton A. Pashkov and Ivan S. Dakhtin

South Ural State University (National Research University), Chelyabinsk, Russia
[email protected]

Abstract. One of the most widely used topological properties of brain graphs is small-worldness. However, different functional connectivity methods can generate quantitatively different results, particularly when applied in EEG sensor space. In this manuscript, we sought to evaluate the consistency of values derived from pairwise correlations between selected functional connectivity methods. We show that the alpha band yielded the maximal correlation coefficients between small-worldness indices obtained with different methods. In contrast, the delta and gamma bands demonstrated the least consistent results.

Keywords: EEG · Brain graphs · Functional connectivity · Small-world network

1 Introduction

Recent progress in neuroscience has made it possible to frame brain functioning in terms of graph theory. There are many metrics for evaluating the topological features of complex networks. Watts and Strogatz defined a generative model for graphs with two key properties: clustering coefficient and characteristic path length [1]. The generated graphs having hybrid properties – short path length and high clustering coefficient – were called small-world networks [2]. Their characteristic, small-worldness (SW), was found to be ubiquitous and universal across both living and non-living complex systems (e.g. the C. elegans connectome, social networks, the Internet) [2]. The mainstream standpoint in neuroscience is that these complex brain networks are organized through the synchronization of multiple brain areas. Neural oscillations may play a causal role in forming brain activity and behavior [3]. Functional connectivity is intended to characterize such patterns of synchronization. It has repeatedly been demonstrated that topological properties of EEG-based brain graphs can be useful as novel biomarkers of psychiatric and neurological disorders [4–6]. However, the ultimate results of SW coefficient computations are highly dependent on the method being used. For example, M. Lai and colleagues, comparing scalp- and source-based measures of functional connectivity, found a strong correlation of global connectivity between the scalp and source levels, but argued that network topology was only weakly correlated [7].


Thus, it is of critical importance to evaluate the differences between FC methods and to determine the influence these differences impose on the final outcome. In this study, taking the first step toward this aim, we compared different functional connectivity methods (and the topological properties of the graphs they give).

2 Methods

One hundred and seven healthy volunteers participated in the experiment. High-density EEG recordings in the resting state with eyes open were analyzed. These recordings are part of a publicly available EEG dataset [8–10]. The EEG was recorded from 64 electrodes as per the international 10-10 system (excluding electrodes Nz, F9, F10, FT9, FT10, A1, A2, TP9, TP10, P9, and P10). We defined the frequency ranges of EEG activity according to the conventional division: delta (1–3.5 Hz), theta (4–7.5 Hz), alpha (8–12.5 Hz), beta (13–29.5 Hz), gamma (30–45 Hz). Two reference electrodes were positioned at the left and right mastoids. The data were re-referenced offline to the common average reference. In the present study, we used six functional connectivity measures.

1. Coherence [11],

\( \mathrm{Coh} = \frac{|E[S_{xy}]|}{\sqrt{E[S_{xx}]\,E[S_{yy}]}} \)   (1)

A widely used FC method estimating the relation between two signals.

2. Imaginary part of coherency [12],

\( \mathrm{iCoh} = \frac{\mathrm{Im}(E[S_{xy}])}{\sqrt{E[S_{xx}]\,E[S_{yy}]}} \)   (2)

A method in which the imaginary part of the cross-spectral density is taken instead of the magnitude. The method is considered to be insensitive to volume conduction bias.

3. PLI [13],

\( \mathrm{PLI} = \big|E[\mathrm{sign}(\mathrm{Im}(S_{xy}))]\big| \)   (3)

Phase lag index, a method reducing the influence of common sources.


4. wPLI [14],

\( \mathrm{wPLI} = \frac{|E[\mathrm{Im}(S_{xy})]|}{E[|\mathrm{Im}(S_{xy})|]} \)   (4)

An extension of PLI, the weighted PLI. The weight is the magnitude of the imaginary part of the cross-spectral density. The method is less sensitive to small perturbations in phase lag.

5. PLV [15],

\( \mathrm{PLV} = \left| E\!\left[ \frac{S_{xy}}{|S_{xy}|} \right] \right| \)   (5)

Phase locking value, another method measuring phase synchrony. Originally tailored to study evoked activity, it can still be applied to the resting state.

6. PPC [16],

\( \mathrm{PPC} = \frac{2}{N(N-1)} \sum_{j=1}^{N-1} \sum_{k=j+1}^{N} \cos(\theta_j - \theta_k) \)   (6)

Pairwise phase consistency, an unbiased estimator of the squared PLV. Here Sxy, Sxx, Syy are the cross-spectral and autospectral densities of the signals, E[·] denotes averaging over epochs, N is the epoch count, and θj, θk are the relative phases at the jth and kth epochs.
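For reference, measures (3)–(6) can be computed directly from per-epoch cross-spectra with a few lines of NumPy. This sketch is our illustration of the definitions above, not the code actually used in the study:

```python
import numpy as np

def phase_sync_metrics(sxy):
    """Phase-synchronization measures (3)-(6) for one channel pair at one
    frequency, given the per-epoch cross-spectral densities S_xy.

    sxy : complex ndarray of shape (n_epochs,)
    """
    n = len(sxy)
    pli = np.abs(np.mean(np.sign(sxy.imag)))                      # Eq. (3)
    wpli = np.abs(np.mean(sxy.imag)) / np.mean(np.abs(sxy.imag))  # Eq. (4)
    plv = np.abs(np.mean(sxy / np.abs(sxy)))                      # Eq. (5)
    theta = np.angle(sxy)                                         # relative phases
    ppc = 2.0 / (n * (n - 1)) * sum(                              # Eq. (6)
        np.cos(theta[j] - theta[k])
        for j in range(n - 1)
        for k in range(j + 1, n)
    )
    return pli, wpli, plv, ppc
```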

The raw data were bandpass filtered from 1 to 45 Hz and epoched (5-s segments) in the open-source Python software MNE [17]. Then we ran automated artifact rejection using the Autoreject library [18]. Next, the preprocessed epochs were used to compute all-to-all connectivity matrices; these matrices were thresholded (the threshold being the mean value) and set as adjacency matrices of graphs. Then the largest (by node count) connected component of every graph was taken into consideration, and for each the average clustering coefficient C and average minimum path length L were computed. The same computations were then applied to a set of ten random graphs with the same number of nodes and edges, giving average values Cr and Lr, respectively. The SW values were calculated as

\( \mathrm{SW} = \frac{C/C_r}{L/L_r} \)   (7)
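A minimal sketch of this graph pipeline (our illustration using NetworkX; the study reports only the procedure, not the code) could look as follows:

```python
import networkx as nx
import numpy as np

def small_worldness(conn, n_random=10, seed=0):
    """SW index of Eq. (7) from a connectivity matrix: mean-value threshold,
    largest connected component, ten size-matched random graphs.
    Taking the largest component of the random graphs too is our choice.
    """
    rng = np.random.default_rng(seed)
    adj = conn > conn.mean()              # threshold at the mean value
    np.fill_diagonal(adj, False)          # remove self-connections
    g = nx.from_numpy_array(adj.astype(int))
    g = g.subgraph(max(nx.connected_components(g), key=len))
    c = nx.average_clustering(g)
    l = nx.average_shortest_path_length(g)
    n, m = g.number_of_nodes(), g.number_of_edges()
    cr, lr = [], []
    for _ in range(n_random):             # random graphs with the same n, m
        r = nx.gnm_random_graph(n, m, seed=int(rng.integers(1 << 30)))
        r = r.subgraph(max(nx.connected_components(r), key=len))
        cr.append(nx.average_clustering(r))
        lr.append(nx.average_shortest_path_length(r))
    return (c / np.mean(cr)) / (l / np.mean(lr))
```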

and stacked into a 1 × 107 array, in accordance with the number of participants. Statistical analysis was performed using IBM SPSS version 25 (IBM Corp., Armonk, NY, USA). Normality of the data distribution was assessed with the Kolmogorov–Smirnov test. As a proxy measure of consistency, the values of the correlation coefficients between different functional connectivity metrics across different frequency bands were used.


3 Results

The assumption of a normal distribution was violated, so the non-parametric Spearman's rank correlation coefficient was computed. The main results are graphically displayed in Figs. 1 through 6. In the delta frequency range, the only statistically significant correlation for wPLI was with PLI (ρ = 0.72, p < 0.01). The phase lag index, in turn, showed negative correlations with three of the five other functional connectivity methods, namely coherence, PLV and PPC (ρ = −0.28, ρ = −0.22, ρ = −0.22, respectively). wPLI, like PLI, demonstrated a null correlation coefficient with the iCoh metric. Coherence correlated maximally with PLV and PPC, at levels of ρ = 0.77 and ρ = 0.82, respectively. The maximal positive correlation in the delta range was observed in the relationship between PLV and PPC (ρ = 0.96, p < 0.01) (Figs. 2, 3, 4 and 5).

Fig. 1. Correlation coefficients of small-worldness values between wPLI-to-all other functional connectivity methods across different EEG frequency bands

Fig. 2. Correlation coefficients of small-worldness values between PLI-to-all other functional connectivity methods across different EEG frequency bands


Fig. 3. Correlation coefficients of small-worldness values between Coherence-to-all other functional connectivity methods across different EEG frequency bands

Fig. 4. Correlation coefficients of small-worldness values between PLV-to-all other functional connectivity methods across different EEG frequency bands

Fig. 5. Correlation coefficients of small-worldness values between iCoh-to-all other functional connectivity methods across different EEG frequency bands


Fig. 6. Correlation coefficients of small-worldness values between PPC-to-all other functional connectivity methods across different EEG frequency bands

In a similar vein, the analysis of the methods' consistency in the theta frequency range resulted in a high positive correlation between wPLI and PLI (ρ = 0.84, p < 0.01). The maximal correlation coefficient was found in the PLV–PPC pair (ρ = 0.97, p < 0.01). The imaginary part of coherency had moderate to low correlation coefficient values, none of which exceeded ρ = 0.55. In the EEG alpha range, wPLI and PLI had a correlation strength of 0.938 (p < 0.01). Coherence values correlated highly with PLV and PPC (ρ = 0.87 for both, p < 0.01); no correlation coefficient between coherence and the other methods fell below ρ = 0.6. Strong correlations were observed in the pairs iCoh–wPLI (ρ = 0.79, p < 0.01) and iCoh–PLI (ρ = 0.78, p < 0.01). PLV had maximal correlations with Coh (ρ = 0.87, p < 0.01) and PPC (ρ = 0.98, p < 0.01); the peak value in this frequency range was in the PPC–PLV pair (ρ = 0.98, p < 0.01). The wPLI–PLI correlation coefficient in the beta range was ρ = 0.93 (p < 0.01). The minimal correlation was found between wPLI and coherence (ρ = 0.29, p < 0.01). Coherence, in turn, had maximal correlation values with PLV (ρ = 0.72, p < 0.01) and PPC (ρ = 0.83, p < 0.01). PLV correlated highly with PPC (ρ = 0.96, p < 0.01). Coherence and the imaginary part of coherency had a correlation coefficient of 0.39 (p < 0.01). The results in the gamma range showcase an absence of statistically significant correlation between wPLI and coherence, while demonstrating a strong link between wPLI and PLI (ρ = 0.85, p < 0.01). wPLI had a low correlation with PPC (ρ = 0.2, p < 0.05). The correlation between coherence and PPC took the value of 0.75 (p < 0.01). Apart from PPC and PLV, the other methods were shown to have correlations with coherence not significantly different from zero. PLV had a strong relationship with PPC values (ρ = 0.9, p < 0.01). The imaginary part of coherency had no correlation values with other methods exceeding the level of ρ = 0.46.


4 Discussion and Conclusions

In this paper, we strove to provide a brief and concise illustration of how consistent measures of functional connectivity are across different EEG frequency ranges. The major finding of the study is that the alpha range gives the highest correlation coefficients and, therefore, allows one to obtain the most similar estimations of the topological properties of brain graphs across the functional connectivity methods tested. The predominance of activity in the alpha band is a distinguishing feature of the brain's resting state. Moreover, EEG studies have shown that alpha power fluctuations in brain areas directly point to the level of inhibition a region is exposed to [19]. Thus, the alpha band, being a conspicuous and reproducible feature of brain activity at rest, provides us with the most consistent measures of the topological properties of brain networks. The least consistent values of correlation strength between FC methods were found in the delta and gamma frequency ranges. The delta and gamma bands are extreme examples of the EEG frequency continuum, representing different modes of neural information processing, with delta mostly involved in coordinating distantly located areas, while the gamma rhythm is engaged in local information processing [3]. However, it is currently unclear to what extent this relates to the results observed in this paper. wPLI has high correlation values with PLI in all frequency ranges. This may be attributed to the fact that wPLI is an extension of PLI: both measures are insensitive to volume conduction, which represents the major issue for FC computed on EEG sensor-space data. The significance of this issue for functional connectivity analysis may also be evidenced by considering the iCoh–Coh pair: the correlation values between iCoh and Coh did not surpass ρ = 0.39 (except for the alpha range, with ρ = 0.7), indicating the possible presence of volume conduction effects. It is worth noticing, however, that our study has a number of limitations. Firstly, we used sensor- but not source-space data for analyzing the SW of brain graphs; therefore, the obtained results should be taken with caution. Secondly, we did not verify the results on directed and weighted graphs, which may also give a different pattern of results. Finally, all computations in sensor space are reference-dependent, which implies the need to reexamine these results using different reference techniques. As a possible extension of the current paper, correlations between different connectivity approaches [20], namely time-domain and frequency-domain methods, may be considered. Space limitations prevent us from including an exhaustive list of all pairwise comparisons between the selected functional connectivity methods. In conclusion, taking into account all the abovementioned issues with the extant data, it is highly warranted to direct our efforts toward a critical and thorough revision of the currently used brain graph topological metrics and their clinical applications.

References 1. Watts, D.J., Strogatz, S.H.: Collective dynamics of “small-world” networks. Nature 393 (6684), 440–442 (1998) 2. Fornito, A., Zalesky, A., Bullmore, E.T.: Fundamentals of brain network analysis, p. 476. Academic press, Cambridge (2016)


3. Thut, G., Miniussi, C., Gross, J.: The functional importance of rhythmic activity in the brain. Curr. Biol. 22(16), R658–R663 (2012)
4. Jhung, K., Cho, S.-H., Jang, J.-H., Park, J.Y., Shin, D., Kim, K.R., An, S.K.: Small-world networks in individuals at ultra-high risk for psychosis and first-episode schizophrenia during a working memory task. Neurosci. Lett. 535, 35–39 (2013)
5. Stam, C., Jones, B., Nolte, G., Breakspear, M., Scheltens, P.: Small-world networks and functional connectivity in Alzheimer's disease. Cereb. Cortex 17(1), 92–99 (2006)
6. Wei, L., Li, Y., Yang, X., Xue, Q., Wang, Y.: Altered characteristic of brain networks in mild cognitive impairment during a selective attention task: an EEG study. Int. J. Psychophysiol. 98(1), 8–16 (2015)
7. Lai, M., Demuru, M., Hillebrand, A., Fraschini, M.: A comparison between scalp- and source-reconstructed EEG networks. Sci. Rep. 8(1), 12269 (2018)
8. Goldberger, A.L., Amaral, L.A.N., Glass, L., Hausdorff, J.M., Ivanov, P.C., et al.: PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23), e215–e220 (2000)
9. Schalk, G., McFarland, D.J., Hinterberger, T., Birbaumer, N., Wolpaw, J.R.: BCI2000: a general-purpose brain-computer interface (BCI) system. IEEE Trans. Biomed. Eng. 51(6), 1034–1043 (2004)
10. http://www.schalklab.org/research/bci2000
11. Bowyer, S.M.: Coherence a measure of the brain networks: past and present. Neuropsychiatr. Electrophysiol. 2(1), 1 (2016)
12. Nolte, G., et al.: Identifying true brain interaction from EEG data using the imaginary part of coherency. Clin. Neurophysiol. 115(10), 2292–2307 (2004)
13. Stam, C.J., et al.: Phase lag index: assessment of functional connectivity from multi-channel EEG and MEG with diminished bias from common sources. Hum. Brain Mapp. 28(11), 1178–1193 (2007)
14. Vinck, M., et al.: An improved index of phase-synchronization for electrophysiological data in the presence of volume-conduction, noise and sample-size bias. NeuroImage 55(4), 1548–1565 (2011)
15. Lachaux, J.P., et al.: Measuring phase synchrony in brain signals. Hum. Brain Mapp. 8(4), 194–208 (1999)
16. Vinck, M., et al.: The pairwise phase consistency: a bias-free measure of rhythmic neuronal synchronization. NeuroImage 51(1), 112–122 (2010)
17. Gramfort, A., Luessi, M., Larson, E., Engemann, D., Strohmeier, D., et al.: MEG and EEG data analysis with MNE-Python. Front. Neurosci. 7, 267 (2013)
18. Jas, M., Engemann, D., Bekhti, Y., Raimondo, F., Gramfort, A.: Autoreject: automated artifact rejection for MEG and EEG data. NeuroImage 159, 417–429 (2017)
19. Bazanova, O.M., Vernon, D.: Interpreting EEG alpha activity. Neurosci. Biobehav. Rev. 44, 94–110 (2014)
20. Bastos, A.M., Schoffelen, J.-M.: A tutorial review of functional connectivity analysis methods and their interpretational pitfalls. Front. Syst. Neurosci. 9, 175 (2016)

Evolutionary Minimization of Spin Glass Energy

Vladimir G. Red'ko and Galina A. Beskhlebnova

Scientific Research Institute for System Analysis, Russian Academy of Sciences, Moscow 117218, Russia
[email protected], [email protected]

Abstract. The current work describes a model of evolutionary minimization of the energy of spin glasses. A population of agents (modeled organisms) is considered. The genotypes of the agents are coded by a large number of spins of the spin glass. The energy of the spin glass is calculated in accordance with the Sherrington-Kirkpatrick model; this energy determines the fitness of the agents. The process of evolutionary minimization of the spin glass energy is analyzed by means of computer simulation. Several properties of spin glasses related to the model of evolutionary search are analyzed; in particular, the global energy minima of spin glasses and the variation of energy at a one-spin mutation are estimated. The process of gradual decrease of the spin glass energy, performed by sequential changes of the signs of separate spins, is also analyzed. The computer simulation demonstrates that evolutionary optimization finds essentially deeper energy minima than the gradual decrease. The rate and efficiency of evolutionary minimization of the energy of spin glasses have been estimated and checked by computer simulation.

Keywords: Evolutionary optimization · Energy of spin glass · Agents · Rate and efficiency of evolutionary process

1 Introduction

The current work develops our previous article [1]. The new features of the current paper are the following: we consider a more detailed model of the evolutionary minimization of spin glass energy and additionally analyze several properties of spin glasses related to the considered evolutionary search. This additional analysis includes: (1) estimation of the global energy minima of spin glasses by computer simulation, (2) estimation of the energy variation at changing the sign of one spin (this variation can be considered a one-spin mutation), (3) a study of the gradual decrease of the spin glass energy. The gradual decrease is performed by the following method: the spins of the spin glass are changed sequentially, and the changes that decrease the energy are fixed. The analysis is performed by means of computer simulation.



The most essential result of the current work is the analytical estimation of the rate and efficiency of evolutionary minimization of the spin glass energy; we have checked this analytical estimation using computer simulation. Our evolutionary model is similar to the quasispecies model [2, 3]. In the current article, we use an analogy with the quasispecies model and our previous estimations for the quasispecies model [4, 5] with the Hamming distance between agent genotypes.

2 Model of Evolutionary Minimization of Spin Glass Energy

2.1 Formal Model of Spin Glass

Using the well-known Sherrington-Kirkpatrick model of spin glasses [6, 7], we can construct an evolutionary model whose fitness function has a very large number of local maxima. The spin-glass model describes a system of pairwise interacting spins; the interactions between the spins are random. The formal model of the spin glass is the following. (1) There is a system S of spins S_i, i = 1,…,N (the number of spins N is supposed to be large, N >> 1), S_i = +1 or −1. (2) The exchange interactions between spins are random. The energy of the spin system is defined as

E(S) = \sum_{i,j=1,\ i<j}^{N} J_{ij} S_i S_j,   (1)

where J_ij are the elements of the exchange interaction matrix. The J_ij are normally distributed random values with probability density

P(J_{ij}) = \sqrt{\frac{N-1}{2\pi}} \exp\left[ -\frac{J_{ij}^2 (N-1)}{2} \right].   (2)

The model (1), (2) has been intensively investigated. For further consideration, the following spin-glass features are essential. The number of local energy minima M is very large [8]:

M \sim e^{\alpha N}, \quad \alpha \approx 0.2.   (3)

A local energy minimum is defined as a spin glass state S_L at which the change of sign of any one spin (S_i → −S_i) increases the energy E. The global energy minimum E_0 equals approximately −0.8N [9]:

E_0 \approx -0.8 N.   (4)
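A minimal sketch of this setup (our illustration, not the authors' code): sampling the couplings of Eq. (2) and evaluating the energy (1) for a random spin configuration.

```python
import numpy as np

def make_couplings(N, rng):
    """Sample J_ij ~ N(0, 1/(N-1)), Eq. (2); keep only the i < j terms."""
    J = rng.normal(0.0, 1.0 / np.sqrt(N - 1), size=(N, N))
    return np.triu(J, k=1)

def energy(J, S):
    """E(S) = sum_{i<j} J_ij S_i S_j, Eq. (1); S is a vector of +/-1 spins."""
    return S @ J @ S

rng = np.random.default_rng(1)
N = 100
J = make_couplings(N, rng)
S = rng.choice([-1, 1], size=N)
E = energy(J, S)   # averaged over random S this is ~0, cf. Eq. (5) below
```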


From (1), (2) one can obtain that the mean value of the spin-glass energy is zero,

\langle E \rangle = 0,   (5)

and that the root-mean-square value of the energy variation at the change of sign of any one spin (S_i → −S_i) is of the order of 1 [1]:

\langle \Delta E \rangle = \sqrt{8/\pi}.   (6)

Using computer simulation, we have checked the estimations (4) and (6). Figure 1 shows the dependence of the global energy minimum E_0 on the number of spins N in the spin glass. Almost all results are averaged over different numbers of independent calculations; the numbers of independent calculations n_av are as follows: n_av = 10^6 for N = 5 and 10, n_av = 10^4 for N = 15, n_av = 10^3 for N = 20, and n_av = 10 for N = 25. For N = 30, there was only a single calculation.


Fig. 1. The dependence of the global energy minimum E0 on the number of spins N.

We also calculated the root-mean-square value of the energy variation at the change of sign of any one spin (S_i → −S_i), averaged over 10000 independent calculations. The calculated estimate was approximately 1.60. The results of these calculations agree with the estimations (4) and (6).

2.2 Model of Evolutionary Process

Let us construct the spin-glass model of evolution. We suppose that the genotype of an agent (modeled organism) is the set of N spins of the spin-glass system S. The fitness of the agent with genotype S_k is

f(S_k) = e^{-\beta E(S_k)},   (7)

where \beta is the parameter of selection intensity, \beta > 0.


The population is the set of n agents with genotypes S_k, k = 1,…,n. We suppose that (1) the evolutionary process consists of consecutive generations, and (2) new generations are obtained by the selection and mutation of agents. An agent is selected into the population of the new generation in accordance with the fitness (7). At mutations, the signs of the genotype symbols are changed (S_ki → −S_ki) with the probability P_m for each symbol. The selection of agents into the new population is probabilistic: an agent is selected into the new population with a probability proportional to its fitness f(S_k); namely, the well-known roulette wheel (fitness proportionate) selection is used. The genotypes of the agents of the initial population are random.

Similar to the quasispecies model with the Hamming distance between the genotypes of agents [4, 5], we suppose the following natural relationships between the parameters of the model: N, n >> 1; 2^N >> n; \beta ≳ P_m N; P_m N ≲ 1. The relation 2^N >> n means that the evolutionary process is essentially stochastic: the number of possible genotypes is much larger than the population size, so some kinds of genotypes S are absent in the population. The relation \beta ≳ P_m N means that the intensity of selection is sufficiently large. The relation P_m N ≲ 1 means that, on average, no more than about one spin per genotype mutates in one generation.

2.3 Estimation of the Rate and Efficiency of Evolutionary Search

At a sufficiently large intensity of selection and a sufficiently large population size n (when the role of neutral selection is small), the total number of generations of the evolutionary process G_T can be estimated as follows. The emergence of new agents with lower energy in the population is the result of mutations; these agents are then selected into the population of the new generation. The characteristic number of generations G_{-1}, during which the average energy P in the population decreases by 1, can be estimated as

G_{-1} \sim \frac{G_M + G_S}{\Delta E},   (8)

where \Delta E is the characteristic value of the variation of energy at one mutation, G_M \sim (N P_m)^{-1} is the characteristic number of generations required for a single mutation in a genotype, G_S \sim (\beta \Delta E)^{-1} is the typical number of generations during which agents with the energy P − \Delta E replace agents with the energy P in the population, and P_m is the probability of one mutation. According to expression (6), \Delta E \sim 1.


From these relations, we have

G_{-1} \sim \frac{1}{\Delta E} \left( \frac{1}{N P_m} + \frac{1}{\beta \Delta E} \right), \quad \Delta E \sim 1.   (9)

The total change of the energy in the population during the evolutionary search of energy minima is, according to (4), (5), of order N; hence the characteristic number of generations of the whole process of evolutionary minimization of the spin glass energy is G_T \sim G_{-1} N. Therefore, we have

G_T \sim \frac{1}{P_m} + \frac{N}{\beta}.   (10)

The total number of agents involved in the evolution is n_total = n G_T. Let us estimate the values G_T and n_total at a sufficiently high intensity of selection (when it is possible to neglect the second term in (10)) and for a sufficiently large population size (when the role of neutral selection is small). Similar to the model of quasispecies with the Hamming distance between genotypes [4, 5], we suppose that P_m \sim N^{-1} and n \sim N. Finally, we obtain

G_T \sim N, \quad n_{total} \sim N^2.   (11)

The expressions (11) characterize the main results of our estimations. These expressions have been checked by means of computer simulation.

2.4 Checking Estimations of the Rate and Efficiency of Evolutionary Search
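As a reference point for this check, a minimal sketch of one generation of the evolutionary process of Sect. 2.2 (our reading, not the authors' code); energy() and J are reused from the earlier sketch:

```python
import numpy as np

def next_generation(pop, J, beta, Pm, rng):
    """pop: (n, N) array of +/-1 genotypes; returns the next generation."""
    E = np.array([energy(J, s) for s in pop])
    # Fitness of Eq. (7); subtracting E.min() only rescales the roulette
    # probabilities and avoids numerical overflow.
    f = np.exp(-beta * (E - E.min()))
    idx = rng.choice(len(pop), size=len(pop), p=f / f.sum())  # roulette selection
    new_pop = pop[idx].copy()
    flips = rng.random(new_pop.shape) < Pm   # per-symbol mutation S_ki -> -S_ki
    new_pop[flips] *= -1
    return new_pop

rng = np.random.default_rng(2)
n, N, beta, Pm = 100, 100, 1.0, 0.01          # parameters as in Fig. 2
pop = rng.choice([-1, 1], size=(n, N))
for generation in range(200):
    pop = next_generation(pop, J, beta, Pm, rng)
```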

The process of evolutionary search was analyzed by means of computer simulation. The number of spins in the spin glass in the computer simulation was sufficiently large: N = 100.

Fig. 2. The dependence of the spin glass energy of agents E on the generation G of the evolutionary search. 1 – the average energy of agents in the population, 2 – the minimal energy of agents in the population. The parameters of the simulation were the following: the number of spins N = 100, the population size n = N = 100, the mutation intensity P_m = N^{-1} = 0.01, the parameter of selection intensity \beta = 1. Results are averaged over 1000 different calculations.


Figure 2 shows the dependence of the spin glass energy of agents on the generation of the evolutionary search. One can see that the characteristic number of generations of the evolutionary search G_T is of the order of the number of spins N; this is in accordance with the estimations (11). It should be underlined that the evolutionary search results in one of the local energy minima of the spin glass; these minima are rather close to the global energy minimum.

We also considered the gradual decrease of the spin glass energy, which is performed as follows: the signs of the spins are changed sequentially (S_i → −S_i, i = 1,…,N), and only the successful sign changes (those resulting in a decrease of the spin glass energy) are fixed. This sequential search needs a smaller number of participants than the evolutionary search. Using computer simulation, we have analyzed the sequential search; the process of energy minimization at the sequential search is characterized by Fig. 3.


Fig. 3. The dependence of the spin glass energy E on the searching time t at the sequential search. Results are averaged over 1000 different calculations.
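For completeness, a minimal sketch of this sequential (greedy) single-spin descent, reusing energy() from the earlier sketch (our illustration, not the authors' code):

```python
def sequential_descent(J, S, n_sweeps=10):
    """Sweep the spins in order; keep a flip S_i -> -S_i only if it
    lowers the energy (only successful sign changes are fixed)."""
    S = S.copy()
    E = energy(J, S)
    for _ in range(n_sweeps):
        for i in range(len(S)):
            S[i] *= -1                 # trial flip
            E_new = energy(J, S)
            if E_new < E:
                E = E_new              # successful change is fixed
            else:
                S[i] *= -1             # revert unsuccessful change
    return S, E
```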

Comparison of Figs. 2 and 3 shows that the evolutionary search provides significantly deeper local energy minima E_L than the sequential search, because in the evolutionary process different valleys of the energy landscape are explored simultaneously while approaching the energy minima. Moreover, the evolutionary search ensures the finding of sufficiently deep local minima that are close to the global minimum (see expression (4) and Fig. 1, which characterize the value of the global minimum quantitatively). Therefore, in the spin-glass case, the evolutionary search has a definite advantage over the sequential search: the evolutionary minimization finds deeper energy minima.


3 Conclusion

Thus, a model of evolutionary minimization of spin glass energy has been developed. The rate and efficiency of evolutionary minimization of the energy of spin glasses have been analytically estimated and checked by computer simulation. It has been demonstrated that the evolutionary search ensures the finding of sufficiently deep local energy minima that are close to the global minimum.

Acknowledgments. The work was financially supported by the State Program of SRISA RAS, project no. 0065-2019-0003 (AAA-A19-119011590090-2).

References

1. Red'ko, V.G.: Spin glasses and evolution. Biofizika (Biophys.) 35(5), 831–834 (1990). (in Russian)
2. Eigen, M.: Molekulare Selbstorganisation und Evolution (Selforganization of matter and the evolution of biological macromolecules). Naturwissenschaften 58(10), 465–523 (1971)
3. Eigen, M., Schuster, P.: The Hypercycle: A Principle of Natural Self-Organization. Springer, Berlin (1979)
4. Red'ko, V.G., Tsoy, Y.R.: Estimation of the efficiency of evolution algorithms. Doklady Math. (Rep. Math.) 72(2), 810–813 (2005)
5. Red'ko, V.G.: Modeling of Cognitive Evolution. Toward the Theory of Evolutionary Origin of Human Thinking. KRASAND/URSS, Moscow (2018)
6. Sherrington, D., Kirkpatrick, S.: Solvable model of spin-glass. Phys. Rev. Lett. 35(26), 1792–1796 (1975)
7. Kirkpatrick, S., Sherrington, D.: Infinite range model of spin-glass. Phys. Rev. B 17(11), 4384–4403 (1978)
8. Tanaka, F., Edwards, S.F.: Analytic theory of the ground state of a spin glass: I. Ising spin glass. J. Phys. F: Metal Phys. 10(12), 2769–2778 (1980)
9. Young, A.P., Kirkpatrick, S.: Low-temperature behavior of the infinite-range Ising spin-glass: exact statistical mechanics for small samples. Phys. Rev. B 25(1), 440–451 (1982)

Comparison of Two Models of a Transparent Competitive Economy

Zarema B. Sokhova and Vladimir G. Red'ko

Scientific Research Institute for System Analysis, Russian Academy of Sciences, Moscow 117218, Russia
[email protected], [email protected]

Abstract. The article compares two models of a transparent competitive economy. In both models, the interaction between investors and producers is considered. In the first model, the producers do not take into account their own contributions to their capitals; in the second model, the producers take into account their contributions to their own capitals, i.e. the producers themselves play the role of investors. An analysis of these two models by computer simulation was performed. It is shown that in the first model, when the producers give half of their profits to investors, the capital in the producer community is redistributed by investors more efficiently.

Keywords: Autonomous agents · Transparent competitive economy · Investors · Producers

1 Introduction

This paper develops our previous works [1–3], in which the basic model of interaction between two communities of agents was constructed and investigated. The basic model considers producer agents and investor agents. In the basic model, the producers do not take into account their contributions to their own capitals at the distribution of profits. In this paper, in addition to the basic model, a new model is constructed, in which producers take their own contributions to their capitals into account at the distribution of their profits; this means that the producers can be considered as a kind of investors that invest capital in themselves. By computer simulation, the results obtained in these two models are compared for two regimes: (1) without taking into account the producers' own contributions (the basic model) and (2) taking into account the producers' own contributions (the new model).

2 Description of Models

2.1 Basic Model

In the basic model, two communities of agents are considered: investor agents and producer agents [1–3]. The number of investors is N, the number of producers is M, and their capitals are K_inv and K_pro, respectively. The agents function during N_T periods.


At the end of each period T, the investors determine the contributions that they will make to producers in the next period T + 1. To find these values, t_max iterations are performed. During the iterations, the investors and producers exchange information by means of light agents: searching agents and intention agents. These light agents are similar to those used in [4, 5].

At the beginning of the period, the i-th producer has the capital C_i:

C_i = C_{i0} + \sum_{j=1}^{N} C_{ij},   (1)

where C_{i0} is the own initial capital of the i-th producer and C_{ij} is the capital invested by the j-th investor in the i-th producer at the beginning of the period. The dependence of the i-th producer's profit on its capital C_i is determined by the formula

P_i(C_i) = k_i F(C_i),   (2)

where the function F(x) is the same for all producers and the coefficient k_i characterizes the efficiency of the i-th producer. The function F(x) has the form

F(x) = \begin{cases} a x, & \text{if } x \le Th, \\ Th, & \text{if } x > Th, \end{cases}   (3)

where a is a positive parameter and Th is the threshold of the function F(x), Th > 0.

where a is the positive parameter, Th is the threshold of the function FðxÞ; Th [ 0: At the end of the period, the producer returns to investors their invested capital. In addition, the producer pays investors a portion of their profits. At this payment, the j-th investor obtains the part of the profit that is proportional to the investment made by this investor into the i-th producer: Pinv;ij ¼ krepay Pi ðCi Þ

Cij ; N P Cil

ð4Þ

l¼1

where Ci is the current capital (at the beginning of the period) of the i-th producer, krepay is the payment parameter that characterizes the part of profits paid to investors, 0\krepay \1: Note that in this basic model, the producers do not take into account the size of their own contribution Ci0 and give the part of their profits to the investors according to the parameter krepay (see the expression (4)). The producer itself obtains the remaining part of the profit: Ppro;i ¼ Pi ðCi Þ 

N X

Pinv;ij :

ð5Þ

j¼1

Let us characterize the iterative process during which the contributions of investors to producers are determined. At the first iteration, the investors send searching agents to all producers and determine the current capital of each producer. Then the investors estimate the values A_ij, which characterize the profit expected from the i-th producer in the period:

A_{ij} = d_{ij} P^{0}_{inv,ij} = d_{ij} k_{repay} k_i F(C_i^{0}) \frac{C_{ij}}{\sum_{l=1}^{N} C_{il}},   (6)

where d_ij is the current degree of confidence of the j-th investor in the i-th producer, C_il is the capital invested by the l-th investor in the i-th producer, and C_i^0 is the capital of the i-th producer at the beginning of the period (at the first iteration, the investments of other investors are not taken into account). The current degree of confidence d_ij is equal to d_test or d_untest, d_test > d_untest > 0. The parameters d_test and d_untest take into account the fact that the investor prefers tested producers. In the computer simulation, we set d_test = 1, d_untest = 0.5. Then the j-th investor forms the intention to distribute its capital K_inv,j among the producers proportionally to the values A_ij; namely, it is planned that the contribution of the j-th investor to the i-th producer will be equal to

C_{ij} = K_{inv,j} \frac{A_{ij}}{\sum_{l=1}^{M} A_{lj}}.   (7)

At the second iteration, each investor sends intention agents to all producers and informs them about the planned values of capital investments C_ij. Based on these data, the producers estimate the new capitals they expect after receiving capital from all investors; these capitals are calculated in accordance with expression (1). Then the investors again send searching agents to all producers and evaluate the new capitals of the producers C_i^0 (now taking into account the planned investments C_ij of the other investors), as well as the sums \sum_{l=1}^{N} C_il. The investors estimate new values A_ij in accordance with expression (6), which now takes into account the sum of the intended contributions of all investors. Further, each investor forms a new intention to distribute the capital K_inv,j according to expression (7); the investors then send intention agents to the producers and inform them about the new intended contributions C_ij. After a sufficiently large number of such iterations, each investor makes the final decision on investments for the next period: the final contributions are equal to the values C_ij obtained at the last iteration.

At the end of each period, the capitals of the producers are reduced: K_pro(T+1) = k_amr K_pro(T), where k_amr is the amortization coefficient (0 < k_amr ≤ 1). Investor capitals are reduced analogously: K_inv(T+1) = k_inf K_inv(T), where k_inf is the inflation coefficient (0 < k_inf ≤ 1). If the capital of an investor or producer becomes larger than a certain large threshold Th_max,inv or Th_max,pro, and the number of agents in the community is less than the


possible maximum, then this investor or producer is divided into two agents. At the division, the "parent" gives half of its capital to the "descendant". The "producer-child" inherits the efficiency k_i of its parent; the "investor-child" inherits the confidence factors d_ij of the parent investor. The confidence factor d_ij for the "descendant" of a producer is set equal to d_untest, since this new producer has not been tested yet. If the capital of an investor or producer becomes less than a certain small threshold Th_min,inv or Th_min,pro, then this investor or producer dies.

2.2 New Model

In the basic model described above, the distribution of profits by producers and the estimation of expected profits (see expressions (4), (6)) do not consider the contribution of the producer: independently of the producer's contribution C_{i0}, the profit is distributed between the producer and the investors according to the payment parameter k_repay. In the new model, we take the contribution of the producer into account at the distribution of profits and modify expressions (4) and (6) as follows:

P_{inv,ij} = P_i(C_i) \frac{C_{ij}}{\sum_{l=1}^{N} C_{il} + C_{i0}},   (8)

A_{ij} = d_{ij} P^{0}_{inv,ij} = d_{ij} k_i F(C_i^{0}) \frac{C_{ij}}{\sum_{l=1}^{N} C_{il} + C_{i0}}.   (9)

Thus, when profits are distributed, each agent (both the producer and the investor) receives a profit that is proportional to the contribution of this agent. The other elements of the new model are the same as in the basic model.
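A minimal sketch of the iterative planning of contributions in the new model (our reading of expressions (7) and (9), not the authors' code; the first iteration is initialized from the producers' own capitals, following the description above, and the confidence factors are set to 1 for simplicity):

```python
import numpy as np

def F(x, a=0.1, Th=100.0):
    """Profit function of Eq. (3)."""
    return np.where(x <= Th, a * x, Th)

def plan_contributions(K_inv, C0, k, d, t_max=10):
    """K_inv: (N,) investor capitals; C0, k: (M,) producer capitals and
    efficiencies; d: (M, N) confidence factors d_ij. Returns C of shape (M, N)."""
    # First iteration: investments of other investors are not yet taken
    # into account, so A_ij is estimated from the producers' own capitals.
    A = d * (k * F(C0))[:, None]
    C = K_inv[None, :] * A / A.sum(axis=0, keepdims=True)      # Eq. (7)
    for _ in range(t_max - 1):
        cap = C0 + C.sum(axis=1)                               # expected capitals
        share = C / (C.sum(axis=1, keepdims=True) + C0[:, None])
        A = d * (k * F(cap))[:, None] * share                  # Eq. (9)
        C = K_inv[None, :] * A / A.sum(axis=0, keepdims=True)  # Eq. (7)
    return C

# Toy case from Sect. 3: one investor (K_inv = 0.54), two producers.
C = plan_contributions(np.array([0.54]), np.array([0.48, 0.26]),
                       np.array([0.34, 0.94]), np.ones((2, 1)))
# The contributions concentrate on the more efficient producer (k_2 = 0.94),
# consistent with the behavior of the new model reported below.
```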

3 Results of Computer Simulation

In the computer simulation, we compared the basic model and the new model. The main parameters of the simulation were the following: the number of periods N_T = 1 or 100; the maximal number of iterations within the period t_max = 10; the maximal capital thresholds for investors and producers Th_max,inv = 100.0, Th_max,pro = 100.0; the minimal capital thresholds for investors and producers Th_min,inv = 0.01, Th_min,pro = 0.01; the maximal possible numbers of producers and investors in the community M_max = 2 or 100 and N_max = 1 or 100; the initial numbers of producers and investors M_0 = 2 or 100, N_0 = 1 or 100; the maximal number of producers in which an investor can invest its capital m = 2 or 100; the parameter of the profit function a = 0.1; the threshold of the profit function Th = 100 (see expression (3)); the payment parameter k_repay = 0.5; the amortization and inflation coefficients k_amr = 1.0, k_inf = 1.0; the characteristic value of the random variation of producer efficiency at the transition to a new period Δk = 0.01.

For a clearer understanding of how the scheme used by the producer influences the process of capital investment, a separate simulation was carried out for the particular case of one investor and two producers. The efficiencies of the producers were k_1 = 0.34, k_2 = 0.94; the capital of the investor was K_inv = 0.54; the capitals of the producers were K_pro,1 = 0.48, K_pro,2 = 0.26. Figure 1 shows the redistribution of the capital by the investor during the iterations for the two considered models.

Fig. 1. Distribution of the investor's contributions during iterations in the period T = 0.

Figure 1 demonstrates that in the basic model the investor makes contributions to both producers, while in the new model the investor selects only one, the most efficient, producer. That is, in the basic model, when planning contributions, the investor pays attention to both the efficiency and the capital of the producers, whereas in the new model the investor takes into account only the efficiency of the producers (see also expressions (6), (9) and (2), (3)).

Let us consider the case of a large community: N = M = 100. The simulation results for the considered models are presented in Fig. 2. Analysis of the results for this case shows that in the basic model, when the producers pay half of their profits to investors (k_repay = 0.5), the capital of the producer community is redistributed by the investors more effectively: in the next period, the investor gives the obtained capital to more efficient producers. This is an important effect of the basic model: the efficient redistribution of capital within the producer community (by means of the investors). Indeed, in the basic model, the total profit (and the total capital) of the producer community is greater than in the new model (Fig. 2).


Fig. 2. Dynamics of the total capital of investors and producers in the two models, N = M = 100 (the lines for producers and investors in the basic model coincide).

On the other hand, the regime of the new model is more profitable for investors. In this model, investors choose the most efficient producers, and the profit depends only on the size of the investments and the efficiency of the producer. The following point of the new model should be noted: the investor uses the efficiency of the producer and receives the main part of the profit, which corresponds to the investor's contribution, while the producer receives only a rather small part of the profit, corresponding to the producer's contribution. Therefore, in the new model, the profits of producers grow more slowly than in the basic model (Fig. 2). From an economic point of view, the regime of the new model is rather unnatural, since the intensive growth of investors' contributions is not very useful for producers; the interaction between agents is rather ineffective in the new model. Thus, the regime of the basic model is more interesting for further research.

4 Conclusion

It can be concluded that the behavior of the investors depends on the rules for estimating and distributing profits. Although the regime of the new model is beneficial for the investor community, it is not profitable for producers: the producer community develops more efficiently under the regime of the basic model. Thus, the regime of the basic model is more effective for the development of the whole economic community.

Acknowledgments. The work was financially supported by the State Program of SRISA RAS, project no. 0065-2019-0003 (AAA-A19-119011590090-2).


References

1. Red'ko, V.G., Sokhova, Z.B.: Model of collective behavior of investors and producers in decentralized economic system. Procedia Comput. Sci. 123, 380–385 (2018)
2. Red'ko, V.G., Sokhova, Z.B.: Iterative method for distribution of capital in transparent economic system. Opt. Mem. Neural Netw. (Inf. Opt.) 26(3), 182–191 (2017)
3. Sokhova, Z.B., Red'ko, V.G.: Agent-based model of interactions in the community of investors and producers. In: Samsonovich, A.V., Klimov, V.V., Rybina, G.V. (eds.) Biologically Inspired Cognitive Architectures (BICA) for Young Scientists. Proceedings of the First International Early Research Career Enhancement School (FIERCES 2016), pp. 235–240. Springer, Switzerland (2016)
4. Claes, R., Holvoet, T., Weyns, D.: A decentralized approach for anticipatory vehicle routing using delegate multiagent systems. IEEE Trans. Intell. Transp. Syst. 12(2), 364–373 (2011)
5. Holvoet, T., Valckenaers, P.: Exploiting the environment for coordinating agent intentions. In: Environments for Multi-Agent Systems III. Lecture Notes in Artificial Intelligence, vol. 4389, pp. 51–66. Springer, Berlin (2007)

Spectral Parameters of Heart Rate Variability as Indicators of the System Mismatch During Solving Moral Dilemmas

I. M. Sozinova, K. R. Arutyunova, and Yu. I. Alexandrov

Moscow State University of Psychology and Education, Moscow, Russia
Institute of Psychology, Russian Academy of Sciences, Moscow, Russia
Department of Psychology, National Research University Higher School of Economics, Moscow, Russia
[email protected]

Abstract. Variability in beat-to-beat heart activity reflects the dynamics of heart-brain interactions. From the positions of the system evolutionary theory, any behaviour is based on the simultaneous actualization of functional systems formed at different stages of phylo- and ontogenesis. Each functional system is comprised of neurons and other body cells whose activity contributes to achieving an adaptive outcome for the whole organism. In this study we hypothesized that the dynamics of the spectral parameters of heart rate variability (HRV) can be used as an indicator of the system mismatch observed when functional systems with contradictory characteristics are actualized simultaneously. We presented 4–11-year-old children (N = 34) with a set of moral dilemmas describing situations where an in-group member achieved optional benefits by acting unfairly and endangering the lives of out-group members. The results showed that the LF/HF ratio of HRV was higher in children with developed moral attitudes of fairness toward out-groups than in children who showed preference for in-group members despite the unfair outcome for the out-group. Thus, the system mismatch in situations with a moral conflict is shown to be reflected in the dynamics of heart activity.

Keywords: System evolutionary theory · Heart brain interactions · Spectral parameters of heart rate variability · Moral dilemmas · In-group · Out-group

1 Introduction

Changes in heart rate variability (HRV) reflect brain–heart interactions (e.g., [10, 14, 22, 24]). HRV indexes have previously been considered indicators of changes in brain activation [24]. The baseline HRV differs between people in a state of coma and healthy people, and some authors have suggested that HRV can serve as an indicator of the intensity of brain activity [17]. Thayer and colleagues [23] argued that changes in HRV reflect the hierarchy in the organization of an organism and are usually observed in response to indeterminacy and mismatch. The authors suggested that HRV could indicate the "vertical" integration of the brain mechanisms controlling an


organism. It was noted that research into the relationship between heart and brain activity could open new horizons for the study of the psychophysiological bases of individual behaviour [12].

Considered from the positions of the system evolutionary theory [2, 5, 21], any behaviour is based on the simultaneous actualization of functional systems [3] formed at different stages of phylo- and ontogenesis. Each functional system is comprised of neurons and other body cells, including those of the heart, whose joint activity contributes to achieving an adaptive outcome for the whole organism. From these positions, "HRV originates in cooperation of the heart with the other components of actualized functional systems" and reflects the system organization of behaviour (see [6]: p. 2).

Our previous studies have found that, in the process of individual development, children gradually shift from supporting in-group members, even when they behave unfairly towards out-group members, to prioritizing fairness towards all other individuals, irrespective of what group they belong to [19, 20]. We argued that learning to support fairness towards out-groups is associated with forming new functional systems enabling this more complex behaviour. However, fairness towards out-groups can contradict the earlier formed unconditional in-group preference. Situations like this can be described as a system mismatch, when functional systems with contradictory characteristics are actualized simultaneously. Here we hypothesize that in a situation of a conflict between in- and out-group members, fairness towards out-groups would predetermine the occurrence of a system mismatch reflected in HRV. To test this hypothesis, we analyzed the spectral parameters of HRV in children solving moral dilemmas with a conflict between in- and out-group members.

2 Materials and Methods

Thirty-four children participated in the study: 4–5-year-old pre-schoolers (N = 19; Mean = 5.14; Med = 5; S.D. = 0.43; 25% = 4.48; 75% = 5.35) and 10–11-year-old school children (N = 15; Mean = 10.62; Med = 10.92; S.D. = 0.52; 25% = 10; 75% = 11). The experimental protocols were approved by the Ethics Committee of the Institute of Psychology, Russian Academy of Sciences. The parents of all participants were provided with detailed information about the procedures of the study and signed informed consent forms to allow their children to participate.

Each child was individually interviewed in a separate room. All children were presented with a set of moral dilemmas describing situations where a limited resource was essential for the survival of an out-group member and beneficial, but not vital, for the well-being of an in-group member. In each dilemma, an in-group member took away the resource, putting an out-group member's life at risk, and children had to choose whom to support in this situation.

Heart rate was recorded during the entire experiment using a photoplethysmograph RB-16CPS (Neurolab) and a wireless sensor Zephyr HxM BT. BMInput (A.K. Krylov) and HR-reader (V.V. Kozhevnikov) software were used. Pulsograms were converted into sequences of RR intervals by the "Neuru" program (A.K. Krylov). The spectral parameters of HRV were calculated using the RRv7 software (I.S. Shishalov) (window length — 100 s; step — 10 s). We analysed the following


spectral parameters of HRV: low frequency power (LF), high frequency power (HF), total power (TP), and the LF/HF ratio [13]. Responses to dilemmas were coded as "1" if a child chose to support an out-group member and "0" if a child chose to support an in-group member. Average scores characterising individual responses to all dilemmas were also calculated. For the analyses, all participants were subdivided into two groups: those who supported out-group members in more than half of the dilemmas ("out-group supporters") and those who supported in-group members in more than half of the dilemmas ("in-group supporters"). Statistical analyses were performed with IBM SPSS Statistics 17; the significance level was set at p < 0.05.
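For illustration, a minimal sketch (not the study's RRv7 code) of how LF, HF and the LF/HF ratio can be computed from a sequence of RR intervals; the band limits are the conventional ones [13], and a single Welch estimate is used here instead of the study's 100-s sliding window:

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch

def lf_hf(rr_ms, fs=4.0):
    """LF, HF and LF/HF from RR intervals (ms), one value per beat."""
    t = np.cumsum(rr_ms) / 1000.0                    # beat times, s
    grid = np.arange(t[0], t[-1], 1.0 / fs)
    rr_even = interp1d(t, rr_ms)(grid)               # evenly resampled series
    f, psd = welch(rr_even - rr_even.mean(), fs=fs,
                   nperseg=min(256, len(grid)))
    lf_band = (f >= 0.04) & (f < 0.15)               # standard LF band, Hz
    hf_band = (f >= 0.15) & (f < 0.40)               # standard HF band, Hz
    lf = np.trapz(psd[lf_band], f[lf_band])
    hf = np.trapz(psd[hf_band], f[hf_band])
    return lf, hf, lf / hf

rr = 800.0 + 50.0 * np.random.default_rng(3).standard_normal(600)  # toy RR, ms
lf, hf, ratio = lf_hf(rr)
```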

3 Results

Average scores characterising individual responses to all dilemmas differed between pre-schoolers and school-age children, with pre-schoolers supporting out-group members less often (Mann-Whitney U test: U = 73.5, z = −2.43; p = 0.015). No significant difference between the "in-group supporters" and "out-group supporters" was observed in LF, HF or TP. Higher values of the LF/HF ratio were shown in "out-group supporters" as compared to "in-group supporters" (Mann-Whitney U test: U = 1.0, z = −2.939; p = 0.003 for 4–5-year-olds; and U = 4.0, z = −2.66; p = 0.0008 for all children). No difference in the LF/HF ratio was observed between the groups of 4–5-year-old and 10–11-year-old children within the subgroup of "out-group supporters" (see Fig. 1).

Fig. 1. Higher values of the LF/HF ratio in children supporting out-group members as compared to children supporting in-group members in situations with a conflict where out-group members were treated unfairly by in-group members. * Mann-Whitney U test, p < 0.05.


There was an insufficient number of "in-group supporters" among the 10–11-year-old children for such a statistical comparison.

4 Discussion

In this study we tested the hypothesis that in a situation of a conflict between in- and out-group members, fairness towards out-groups would predetermine the occurrence of a system mismatch, which is observed when functional systems with contradictory characteristics are actualized simultaneously, and that such a mismatch would be reflected in HRV. As mentioned above, any behaviour, including moral dilemma solving, is supported by the simultaneous actualization of functional systems formed at different stages of individual development. Our previous work [19, 20] demonstrated that young pre-school age children tended to exhibit unconditional in-group preference, which is considered a behavioural strategy based on the actualization of functional systems formed early in individual development, including those associated with parochial altruism (unconditional in-group preference with aggressive behaviour toward out-groups [1, 9, 11]). Older children were shown to develop a more complex behavioural strategy of supporting those treated unfairly, including members of out-groups, which requires the actualisation of later-formed functional systems. This is consistent with the view that reciprocal altruism toward out-group members requires higher cognitive complexity [16]. It is possible that the whole structure of individual experience is reorganised through the formation of "new" systems enabling a new type of behaviour, which may require some time. The development of moral attitudes towards out-groups occurs gradually and requires the accumulation of a sufficient number of episodes associated with the "new" moral behaviour. The conflict between the earlier and later formed systems activated simultaneously can be described as an instance of the system mismatch, because these systems have contradictory characteristics.

The results of this study showed that in situations involving a conflict where out-group members are treated unfairly by in-group members, the decision to support out-group members was associated with higher values of the LF/HF ratio of HRV. Higher values of the LF/HF ratio are usually observed during stress [7, 8, 15, 18], which is also considered a situation of the system mismatch [4]. Thus, the results of this study indicate that characteristics of social behaviour and its development, as observed in the case of moral attitudes toward in- and out-group members, can be manifested in the dynamics of individual psychophysiological states.

Acknowledgements. The reported study was funded by RFBR, research project № 18-31320003_mol_a_ved.


References

1. Abbink, K., Brandts, J., Herrmann, B., Orzen, H.: Parochial altruism in inter-group conflicts. Econ. Lett. 117(1), 45–48 (2012)
2. Alexandrov, Yu.I.: How we fragment the world: the view from inside versus the view from outside. Soc. Sci. Inf. 47(3), 419–457 (2008)
3. Alexandrov, Yu.I.: Cognition as systemogenesis. In: Anticipation: Learning from the Past, pp. 193–220. Springer, Cham (2015)
4. Alexandrov, Yu.I., Svarnik, O.E., Znamenskaya, I.I., Kolbeneva, M.G., Arutynova, K.R., Krylov, A.K., Bulava, A.I.: Regression as a stage of development [Regressiya kak etap razvitiya]. Institute of Psychology RAS, Moscow (2017). (in Russian)
5. Alexandrov, Yu.I., Grechenko, T.N., Gavrilov, V.V., Gorkin, A.G., Shevchenko, D.G., Grinchenko, Y.V., Bodunov, M.V.: Formation and realization of individual experience. Neurosci. Behav. Physiol. 27(4), 441–454 (1997)
6. Anokhin, P.K.: Biology and Neurophysiology of Conditioned Reflex and Its Role in Adaptive Behavior, 1st edn. Pergamon Press, Oxford (1974)
7. Bakhchina, A.V., Arutyunova, K.R., Sozinov, A.A., Demidovsky, A.V., Alexandrov, Y.I.: Sample entropy of the heart rate reflects properties of the system organization of behaviour. Entropy 20(6), 449 (2018)
8. Bakhchina, A.V., Shishalov, I.S., Parin, S.B., Polevaya, S.A.: The dynamic cardiovascular markers of stress. Int. J. Psychophysiol. 94(2), 230 (2014)
9. Bernhard, H., Fischbacher, U., Fehr, E.: Parochial altruism in humans. Nature 442(7105), 912 (2006)
10. Billman, G.E.: The effect of heart rate on the heart rate variability response to autonomic interventions. Front. Physiol. 4, 222 (2013)
11. Choi, J.K., Bowles, S.: The coevolution of parochial altruism and war. Science 318(5850), 636–640 (2007)
12. Lane, R.D., Wager, T.D.: The new field of brain-body medicine: what have we learned and where are we headed? NeuroImage 47(3), 1135–1140 (2009)
13. Lombardi, F.: Clinical implications of present physiological understanding of HRV components. Card. Electrophysiol. Rev. 6(3), 245–249 (2002)
14. McCraty, R., Atkinson, M., Tomasino, D., Bradley, R.T.: The coherent heart: heart-brain interactions, psychophysiological coherence, and the emergence of system-wide order. Integr. Rev. A Transdisc. Transcult. J. New Thought Res. Prax. 5(2) (2009)
15. Polevaya, S.A., Eremin, E.V., Bulanov, N.A., Bakhchina, A.V., Kovalchuk, A.V., Parin, S.B.: Event-related telemetry of heart rate for personalized remote monitoring of cognitive functions and stress under conditions of everyday activity. Sovremennye Tekhnologii v Meditsine 11(1) (2019)
16. Reznikova, Z.: Altruistic behavior and cognitive specialization in animal communities. In: Encyclopedia of the Sciences of Learning, pp. 205–208 (2012)
17. Riganello, F., Candelieri, A., Quintieri, M., Conforti, D., Dolce, G.: Heart rate variability: an index of brain processing in vegetative state? An artificial intelligence, data mining study. Clin. Neurophysiol. 121, 2024–2034 (2010)
18. Runova, E.V., Grigoreva, V.N., Bakhchina, A.V., Parin, S.B., Shishalov, I.S., Kozhevnikov, V.V., Nekrasova, M.M., Karatushina, D.I., Grigoreva, K.A., Polevaya, S.A.: Vegetative correlates of conscious representation of emotional stress. CTM 5(4), 69–77 (2013)
19. Sozinova, I.M., Znamenskaya, I.I.: Dynamics of Russian children's moral attitudes toward out-group members. In: The Sixth International Conference on Cognitive Science, p. 94 (2014)


20. Sozinova, I.M., Sozinov, A.A., Laukka, S.J., Alexandrov, Yu.I.: The prerequisites of prosocial behavior in human ontogeny. Int. J. Cogn. Res. Sci. Eng. Educ. (IJCRSEE) 5(1), 57–63 (2017)
21. Shvyrkov, V.B.: Behavioral specialization of neurons and the system-selection hypothesis of learning. In: Human Memory and Cognitive Capabilities, pp. 599–611. Elsevier, Amsterdam (1986)
22. Stefanovska, A.: Coupled oscillators: complex but not complicated cardiovascular and brain interactions. In: 2006 International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 437–440. IEEE (2006)
23. Thayer, J.F., Lane, R.D.: Claude Bernard and the heart–brain connection: further elaboration of a model of neurovisceral integration. Neurosci. Biobehav. Rev. 33, 81–88 (2009)
24. Van der Wall, E.E., Van Gilst, W.H.: Neurocardiology: close interaction between heart and brain. Netherlands Heart J. 21(2), 51–52 (2013)

The Role of Brain Stem Structures in the Vegetative Reactions Based on fMRI Analysis

Vadim L. Ushakov, Vyacheslav A. Orlov, Yuri I. Kholodny, Sergey I. Kartashov, Denis G. Malakhov, and Mikhail V. Kovalchuk

National Research Center "Kurchatov Institute", Moscow, Russia
National Research Nuclear University "MEPhI", Moscow, Russia
Bauman Moscow State Technical University, Moscow, Russia
[email protected]

Abstract. This work was aimed at studying the role of brain stem structures in vegetative responses upon presentation of self-significant stimuli (the personal name) using the functional MRI method. Based on the data of an MRI-compatible polygraph, the subjects were divided into three groups with different degrees of vegetative reactions to personality-related stimuli: with strong galvanic skin reactions (GSR) only—7 subjects; with medium GSR and cardiovascular response (CR)—6 subjects; and with low reactivity of GSR and CR—5 subjects. The obtained statistical maps of brain neural network activity showed high activation of the brain stem structures upon presentation of personality-related stimuli in the second group (medium GSR and CR); low activation of the stem structures in the first group (strong GSR); and complete absence of activation of the stem structures in the subjects of the third group (low reactivity of GSR and CR). It was shown that the use of an MRI-compatible polygraph for selecting fMRI data for subsequent statistical analysis is effective.

Keywords: MRI compatible polygraph · fMRI · Vegetative reactions · Traces of memory

1 Introduction

In studying the operation of brain neural networks and determining their exact spatial-temporal characteristics, objective monitoring of the current condition of subjects during functional magnetic resonance imaging (fMRI) is necessary. For this purpose, an MRI-compatible polygraph (MRIcP) has been developed at NRC "Kurchatov Institute", which allows monitoring the dynamics of human vegetative reactions during an MRI examination (earlier, we used an MRI-compatible electroencephalograph [1] and an eye-tracker [2–4] for this purpose). The data obtained with the MRIcP could serve as correlates of important neurophysiological processes in the brain and could be used to determine the activation of the neural networks involved in these processes.

In this work, a study was carried out using the MRIcP to reveal the relationship between the dynamics of vegetative reactions—galvanic skin response (GSR) and


cardiovascular response (CR)—in response to presentation of stimuli that are personality-related for subjects, and activity of the brain stem areas potentially responsible for the regulation of the human cardiovascular system.

2 Materials and Methods

Experiments were performed on a homogeneous group of 20 healthy subjects (men aged 22–25 years). The study was approved by the ethics committee of the National Research Centre Kurchatov Institute, ref. no. 5 (from April 5, 2017). All subjects signed an informed consent for participation in the study.

The experiment was conducted using a Siemens Magnetom Verio 3T MRI scanner based at NRC Kurchatov Institute. To obtain anatomical MRI images, a three-dimensional T1-weighted sequence was used in the sagittal plane with high spatial resolution (176 slices, TR = 2530 ms, TE = 3.31 ms, thickness = 1 mm, flip angle = 7°, inversion time = 1200 ms and FOV = 256 × 256 mm²). Functional data were obtained using a standard echo-planar sequence (32 slices, TR = 2000 ms, TE = 24 ms and isotropic 2 × 2 × 2 mm³ voxels).

Preprocessing of the MRI data was carried out using the freely distributed software package SPM8 [5] and specially adapted and developed terminal scripts for the MacOS system. The coordinate centers of the structural and functional data were brought to the anterior commissure. Then motion artifacts were calculated and corrected. With the help of separately recorded magnetic field inhomogeneity maps, the functional data were corrected in order to remove magnetic susceptibility artifacts. Structural and functional MRI volumes were normalized to the MNI (Montreal Neurological Institute) space. In order to remove random outliers, a Gaussian filter with a 6 × 6 × 6 mm³ kernel was applied to the functional data. The preprocessing procedure was carried out according to the above scheme for each of the 20 subjects. Student's t-test was used for statistical analysis. For the calculation of brain area connectivity, the SPM CONN toolbox was used.

During the experiments, the so-called "test with a concealed name" (TCN), widely used in forensic studies using a polygraph (SUP), was applied: the person under study (hereinafter referred to as the subject) concealed his own name from the polygraph examiner among five other names; the series of names was presented to the subject five times during the test. With the exception of one name, which stood under the number "0", all the names were presented in a random order unknown to the subject, with the phrase "Your passport name is…". The names were presented by the experimenter with an interval of about 20 s, with obligatory account of the current dynamics of the physiological parameters recorded using the MRIcP. The cumulative graphical representation of the physiological parameters during the TCN was visualized on the computer screen in the form of a polygram. The dynamics of the electrical properties of the skin, i.e., galvanic skin reactions, as well as reactions of the cardiovascular system manifested in changes of the heart rate and narrowing of the blood vessels of the fingers (the so-called vascular spasm), were analyzed. The registered physiological reactions were expertly evaluated on a 3-point scale widely used in SUP practice [6].
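For illustration, a minimal sketch of the spatial smoothing step described above (not the study's SPM8 pipeline). We assume the 6 × 6 × 6 mm³ kernel refers to the FWHM of the Gaussian, as is conventional in SPM, and that voxels are 2 mm isotropic, as in the acquisition:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_volume(vol, fwhm_mm=6.0, voxel_mm=2.0):
    """Gaussian spatial smoothing: convert FWHM (mm) to sigma in voxels."""
    sigma_vox = (fwhm_mm / voxel_mm) / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return gaussian_filter(vol, sigma=sigma_vox)

vol = np.random.default_rng(0).standard_normal((96, 96, 32))  # toy EPI volume
smoothed = smooth_volume(vol)
```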


This test and the principle of classification of the subjects (20 persons) into subgroups—high-reactive subjects (15 persons) and low-reactive subjects (5 persons)—are described in detail in [7]. In the subgroup of high-reactive subjects (15 persons), the degree of GSR was in the range of 60–100% (that is, according to the GSR in the TCN, the subjects scored from 6 to 10 points out of 10 possible). In the low-reactive subjects (5 persons), the degree of GSR was 40% or less (i.e., the subjects scored 4 or fewer points out of 10 possible). It should be noted that the subgroup of 15 high-reactive subjects, according to the MRIcP data, also turned out to be heterogeneous (as described in [7]) and was divided into two parts. Thus, based on the MRIcP data, the subjects were divided into three groups with different degrees of autonomic reactions to personality-related stimuli: with strong GSR only—7 subjects (group 1); with medium GSR and CR (measured by the photoplethysmogram signal)—6 subjects (group 2); and with low reactivity of GSR and CR—5 subjects (group 3). Two people were excluded from the analysis because they showed no signs of this gradation. The obtained statistical maps of brain neural network activity (see below) showed high activation of brain stem structures upon personality-related stimuli presentation in the second group (medium GSR and CR), low activation of stem structures in the first group (strong GSR), and total absence of stem structure activation in the subjects of the third group (low reactivity of GSR and CR). The first group included the subjects in whom only GSR was highly informative in identifying the concealed name, while the subjects of the second group had GSR and vascular spasm (Fig. 1) as informative parameters.

Fig. 1. Polygram of TCN of a highly reactive subject. 8 channels correspond to: 1—sound of presented stimuli; 2—sound of subject responses (along with the sound of MRI scanner); 3— subject head movement; 4,5—upper and lower pneumogram sensors, 6—GSR; 7—HR; 8— photoplethysmogram.


Figure 1 shows the fifth and last presentation of the TCN. Concealing meaningful information (the subject's own name, Alexander, highlighted by a rectangle in Fig. 1) elicits the subject's maximal GSR (channel 6), a decrease in heart rate (channel 7; the moving "lens" shows 85 beats per minute) and a pronounced narrowing of the vessels of the fingers, minimal in this presentation (channel 8).

For low-reactive subjects, it was very difficult to isolate the concealed name from the reactions recorded by the MRIcP, due to their physiological characteristics: low reactivity and instability of GSR, heart rate and vascular spasm (Fig. 2).

Fig. 2. Polygram of TCN of a low-reactive subject. 8 channels correspond to: 1—sound of presented stimuli; 2—sound of subject responses (along with the sound of MRI scanner); 3— subject head movement; 4,5—upper and lower pneumogram sensors, 6—GSR; 7—HR; 8— photoplethysmogram.

Figure 2 shows the chaotic appearance of the GSR during the third (out of five) presentation of the TCN. Concealing his own name (Andrew, highlighted by a rectangle) among the other names elicits only a very weak GSR (channel 6), which is not accompanied by a drop in heart rate (channel 7) or narrowing of the vessels of the fingers (channel 8).

3 Results

Figure 3 shows the fMRI results obtained for the three groups of subjects, divided on the basis of the MRIcP data: with strong GSR only (group 1); with medium GSR and CR (group 2); with low reactivity of GSR and CR (group 3).


Fig. 3. The results of the group statistical analysis (p < 0.001) for the comparison of personality-related stimuli perception relative to neutral stimuli. The figure shows a group statistical map underlaid with a high-resolution T1 image at levels x = −8, −6, −4: A—group 1; B—group 2; C—group 1 with removal of some of the fMRI samples of perception of neutral names in the cases when there was high reactivity in the MRIcP signal; D—group 2 with removal of some of the fMRI samples of perception of neutral names in the cases when there was high reactivity in the MRIcP signal; E—group 3; F—group 3 with removal of some of the fMRI samples of perception of neutral names in the cases when there was high reactivity in the MRIcP signal.


On the basis of the obtained data on brain stem activation upon presentation of self-significant stimuli, the connectivity between this zone and other parts of the brain was reconstructed separately for the group with pronounced physiological reactions (15 subjects) and the group with low physiological reactions (5 subjects). As a result, it was shown that for the group of subjects with pronounced physiological reactions, a statistically significant (p < 0.001) negative correlation was observed between the activity of the brain stem and the hippocampus when perceiving personality-related stimuli relative to neutral ones.

4 Discussion

As can be seen from the results shown in Fig. 3, a pronounced activation of the brain stem structures upon presentation of self-significant stimuli is observed in the group with mean GSR and CR changes (see Fig. 3B and D), a significantly lower level of activity in the group with strong GSR (see Fig. 3A and C), and a complete absence of stem activations in the group with low reactivity of the GSR and CR (see Fig. 3E and F). When the neutral words were removed from the sample of fMRI signals in the cases when high reactivity was observed in the MRIcP data, more extensive activity was observed in the brain stem in groups 1 and 3, which is consistent with the operation of autonomic regulation systems [8]. Thus, we can conclude that the MRIcP is effective for selecting fMRI data for subsequent statistical analysis. The revealed hidden negative correlation between the activity of the brain stem and the hippocampus in the perception of personality-related stimuli relative to neutral ones shows the promise of using the method of constructing connectomes to visualize the processes of interaction between neural networks, which will be used in further work.


The experiments confirmed the promise of the joint use of fMRI technology and SUP for studying neurocognitive processes. In the course of the study, a criterion for classifying subjects according to the dynamics of their vegetative reactions was discovered; this criterion allows for a more focused approach to the study of neurocognitive processes and may contribute to improving the quality of fMRI research for various purposes.

Acknowledgements. This study was partially supported by the National Research Centre Kurchatov Institute (MRI-compatible polygraphy), by RFBR grant ofi-m 17-29-02518 (the cognitive-affective structures of the human brain), and by the Russian Foundation for Basic Research, grant RFBR 18-29-23020 mk (methods and approaches for fMRI analyses). The authors are grateful to the MEPhI Academic Excellence Project for providing computing resources and facilities to perform experimental data processing.

References

1. Dorokhov, V.B., Malakhov, D.G., Orlov, V.A., Ushakov, V.L.: Experimental model of study of consciousness at the awakening: fMRI, EEG and behavioral methods. In: BICA 2018, Proceedings of the Ninth Annual Meeting of the BICA Society. Advances in Intelligent Systems and Computing, vol. 848, pp. 82–87 (2019)
2. Korosteleva, A., Mishulina, O., Ushakov, V.: Information approach in the problems of data processing and analysis of cognitive experiments. In: BICA 2018, Proceedings of the Ninth Annual Meeting of the BICA Society. Advances in Intelligent Systems and Computing, vol. 848, pp. 180–186 (2019)
3. Korosteleva, A., Ushakov, V., Malakhov, D., Velichkovsky, B.: Event-related fMRI analysis based on the eye tracking and the use of ultrafast sequences. In: BICA for Young Scientists, Proceedings of the First International Early Research Career Enhancement School on BICA and Cybersecurity (FIERCES 2017). Advances in Intelligent Systems and Computing, vol. 636, pp. 107–112 (2017)
4. Orlov, V.A., Kartashov, S.I., Ushakov, V.L., Korosteleva, A.N., Roik, A.O., Velichkovsky, B.M., Ivanitsky, G.A.: “Cognovisor” for the human brain: towards mapping of thought processes by a combination of fMRI and eye-tracking. In: Advances in Intelligent Systems and Computing, vol. 449, pp. 151–157. Springer (2016)
5. Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.B., Frith, C.D., Frackowiak, R.S.: Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp. 2, 189–210 (1995)
6. The accuracy and utility of polygraph testing (Department of Defense, DC). Polygraph 13, 1–143 (1984)
7. Orlov, V.A., Kholodny, Y.I., Kartashov, S.I., Malakhov, D.G., Kovalchuk, M.V., Ushakov, V.L.: Application of registration of human vegetative reactions in the process of functional magnetic resonance imaging. In: Advances in Intelligent Systems and Computing (2019, in press)
8. Sclocco, R., Beissner, F., Bianciardi, M., Polimeni, J.R., Napadow, V.: Challenges and opportunities for brainstem neuroimaging with ultrahigh field MRI. NeuroImage 168, 412–426 (2018)

Ordering of Words by the Spoken Word Recognition Time

Victor Vvedensky¹, Konstantin Gurtovoy², Mikhail Sokolov², and Mikhail Matveev³

¹ NRC Kurchatov Institute, Moscow, Russia
[email protected]
² Children's Technology Park of NRC Kurchatov Institute, Moscow, Russia
³ Moscow State Institute of International Relations, Moscow, Russia

Abstract. We measured the time needed to recognize spoken words in a group of 12 subjects. Recognition time varies for words with the same sound duration, so the words can be ordered from the word perceived most quickly to the “slowest” one. Every subject “generates” his own ordered list of the 24 words used. The individual lists are similar to some extent, so a robust average list can be compiled. Presumably, this list reflects the distribution of word representations in the cortex, and the time required to retrieve a word depends on its position.

Keywords: Spoken word recognition time · Word ordering · Network science

1 Introduction

Selecting operators for voice control of freely moving devices, we encountered a phenomenon that has not been explicitly reported in spoken word recognition studies [1, 2]. Despite decades of intensive research, the field of spoken word recognition still remains open for the study of the underlying cognitive and linguistic processes. With new technologies available, it is worth revisiting the simple experimental approaches used to explore human perception of spoken words. Before setting up a complex study of speech perception by humans, which involves the use of sophisticated equipment such as functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG) or brain-computer interfaces (BCI), one has to select proper linguistic material for the experiments. This requires a set of compact preliminary tests which, on the one hand, can assess the ability of candidate subjects to perform the proposed task smoothly and, on the other hand, can sort the suggested linguistic material. One has to select those words or word combinations which allow lucid interpretation of the experimental data. We believe that the spoken word should be selected as the basic stimulus, since visual presentation of words implies the study of the language of the literate. The latter presumably involves other brain mechanisms than the “language of the illiterate”, or the basic language, does. We hope that cortical processes engaging a smaller number of different activities will be easier to describe and maybe even to understand. Starting from this background, we designed the experiments described below.


2 Methods

24 Russian nouns were presented in random order, each word three times. The words were pronounced by the same male speaker. The age range of our 12 listeners (5 women) was quite broad: 10, 16, 17, 31, 32, 45, 61, 61, 62, 63, 70, 80 years. All subjects gave informed consent to participate in the experiments. The study was approved by the local ethics committee for biomedical research of the NRC Kurchatov Institute. Each session lasted about 20 min. The subjects were instructed to press “Enter” on the keyboard at the moment they recognized the word they heard. Before the next trial, they repeated the word heard. The task is reasonably simple, so practically no errors occur. This is the list of the words used: эффект, кулак, песок, мост, спорт, глаз, книжка, народ, порог, вагон, жизнь, вход, живот, сапог, мастер, мечта, костюм, осень, группа, село, время, жена, число, трубка (in English: effect, fist, sand, bridge, sport, eye, book, people, door-step, carriage, life, entrance, belly, boot, master, dream, suit, autumn, group, village, time, wife, number, pipe). The sound duration of the words is nearly equal despite the different number of letters (4 to 6) in the selected words.

3 Results

The scatter of recognition times is shown in Fig. 1 for three subjects; the others display the same behavior. The scatter is considerable and at first glance looks noise-like. One should not think that such a large scatter is in any way special to just the experiment with words.

Fig. 1. Time when the subjects pressed the key, indicating that they understood the word they heard. 24 words, with two repetitions for each, were presented in random order. The average reaction time for these subjects is somewhat different. In this plot, reaction time is referenced to the sound offset.


Quite the opposite: this phenomenon always complicates measurements of the reaction time to simple stimuli, which is especially relevant for pilots and sportsmen. In our case, however, the stimulus is quite complex and different each time. We analyze human reactions to different words separately. Recognition time is referenced to the sound offset point, since the majority of key presses fall in the post-word period. It turns out that the recognition times for different words of the same sound duration can be ordered, so that each listener generates an ordered list of the 24 perceived words. Two examples are shown in Fig. 2.

Fig. 2. 24 words heard by two listeners (Subjects 1 and 12 in Fig. 4) and ordered by their recognition times. In this plot, reaction time is referenced to the sound onset. The time scale is in milliseconds. Each word was presented three times. The ends of the scatter bars correspond to the longest and shortest recognition times, while the third time lies in the middle. One can see the similarity of these ordered word lists.

It is difficult to compare the performance of different people using reaction time, because it is highly variable. One needs more robust characteristics describing the experimental data. The test object for our subjects is the list of words. Each subject perceives the list in his own manner: some words quickly, some words slowly. We see that these individual lists are similar to some extent. We ascribe a rank to each word in the individual list, transforming it into a vector with 24 components. The vectors for different subjects can then be compared. Figure 3 displays the average word list for the 12 subjects. The plot also indicates the scatter of each word's position in the individual lists. This scatter reflects the individuality of each listener and can be used to assess the ability of the subjects to perform the task. Each one recognizes words in a slightly different way. It turns out that simply the correlation of the individual rank vector with the average one specifies each listener in a sufficiently robust way. This correlation is shown in Fig. 4.


Fig. 3. List of 24 Russian words ordered by 12 listeners. The word at the bottom is recognized most quickly, while the recognition time gradually increases for the words above. Each listener generates a personal ordered list of the words with gradually growing recognition time. The ordered lists are basically similar across subjects, and the error bar represents the standard deviation of the rank for each word.

Linear order emerging in a group of subjects performing some cognitive task is common; the most obvious example is the ranking of chess players. Ranking in the same group is not universal, though, but depends on the specific task: in the same group of tennis players, the rankings for singles and doubles can differ considerably. It is worth mentioning that words also tend to be ordered into linear lists, the Zipf law being the most spectacular example. Earlier we observed the same ordering of both nouns and listeners for another group of 24 words: каша, леди, пони, мина, груша, туша, сито, пиво, сети, тема, кома, вилы, бусы, муха, тина, зона, стая, лоси, дура, уши, дама, доля, сажа, лыжи (in English: porridge, lady, pony, mine, pear, carcass, sieve, beer, net, theme, coma, hayfork, beads, fly, ooze, zone, flock, moose, fool, ears, dame, share, soot, ski). These words are presented in the order of decreasing recognition time. In this early experiment, another group of listeners was tested.
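A minimal sketch of this rank-vector comparison is given below, assuming hypothetical recognition-time data; the array shapes, variable names and random values are illustrative only, not the authors' data or software.

```python
import numpy as np

# Hypothetical data: times[s, w] = mean recognition time (ms) of word w
# for subject s (12 listeners, 24 words).
rng = np.random.default_rng(0)
times = rng.normal(600, 80, size=(12, 24))

# Rank the words for each listener: 0 = fastest word, 23 = slowest.
ranks = np.argsort(np.argsort(times, axis=1), axis=1)

# Average list: mean rank of each word across listeners, with the scatter
# (standard deviation) of its position in the individual lists.
mean_rank = ranks.mean(axis=0)
rank_sd = ranks.std(axis=0)

# Correlate each individual rank vector with the average list; this is
# essentially a Spearman-type correlation with the mean ordering.
for s in range(ranks.shape[0]):
    r = np.corrcoef(ranks[s], mean_rank)[0, 1]
    print(f"subject {s + 1:2d}: r = {r:.2f}")
```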


Fig. 4. Correlation of ranked lists of 24 words, generated by 12 listeners, with the average list. Trend line demonstrates ranking of the subjects.

4 Discussion

We analyze only a small group of words out of the several thousand used in the language. However, this is a common feature of all linguistic experiments. We intend to develop an approach which, in an evolutionary way, will select proper groups of words for a particular linguistic task. The choice of a proper group of listeners is also quite important, since different people use variable strategies in speech communication; this is how dialects emerge. Our data show the directions in which we shall proceed. We have to generate new lists of words around the “quick” and “slow” words in the analyzed list; there are plenty of words in the thesaurus. The same list has to be presented to several clearly distinct groups of listeners, which emerge from previous experimentation. In this way we expect to cover a considerable part of the language thesaurus and to find directions where the experimental data will produce crucial information for the understanding of speech perception. Neuroimaging data on the perception of words indicate a broad scatter of cortical activity related to individual words over a considerable part of both cerebral hemispheres [3]. Locations for different word groups are detected using fMRI machines, so that “across the cortex, semantic representation is organized along smooth gradients that seem to be distributed systematically” [4].


It seems likely that we see these local gradients in our experiments with groups of words. The observed linearity is certainly local (for just a group of words), though we believe that these linear segments can be woven into the complete network of words, which could be similar to a fishnet. We believe that our simple though careful testing of the word groups which can be represented in the same cortical area can shed light on the mechanisms people use for language communication. The tests described here can easily be combined with MEG measurements, which have long shown that a heard word evokes neuronal activity in many places throughout the cortex [5]. The author VLV is supported by the Russian Fund for Basic Research, grant 18-0000575 comfi.

References

1. Pisoni, D.B., McLennan, C.T.: Spoken word recognition: historical roots, current theoretical issues, and some new directions. In: Neurobiology of Language, Chap. 20. Elsevier Inc., Amsterdam (2016). https://doi.org/10.1016/B978-0-12-407794-2.00093-6
2. Vitevitch, M.S., Luce, P.A.: Phonological neighborhood effects in spoken word perception and production. Annu. Rev. Linguist. 2(7), 1–7.20 (2016)
3. Huth, A.G., de Heer, W.A., Griffiths, T.L., Theunissen, F.E., Gallant, J.L.: Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532(7600), 453–458 (2016). PMID: 27121839
4. Huth, A.G., Nishimoto, S., Vu, A.T., Gallant, J.L.: A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron 76, 1210–1224 (2012). https://doi.org/10.1016/j.neuron.2012.10.014
5. Vvedensky, V.L., Korshakov, A.V.: Observation of many active regions in the right and left hemispheres of the human brain which simultaneously and independently respond to words. In: Proceedings Part 1, XV Russian Conference Neuroinformatics-2013, MEPhI, Moscow, pp. 43–52 (2013). (in Russian)

Neurobiology and Neurobionics

A Novel Avoidance Test Setup: Device and Exemplary Tasks

Alexandra I. Bulava¹, Sergey V. Volkov²,³, and Yuri I. Alexandrov¹,⁴,⁵

¹ Shvyrkov Lab of Neuronal Bases of Mind, Institute of Psychology, Russian Academy of Sciences, Moscow, Russia
[email protected]
² Lab for Behaviour of Lower Vertebrates, Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow, Russia
³ Ocean Acoustics Lab, Shirshov Institute of Oceanology, Russian Academy of Sciences, Moscow, Russia
⁴ Moscow State University of Psychology and Education, Moscow, Russia
⁵ Department of Psychology, National Research University Higher School of Economics, Moscow, Russia

Abstract. This paper presents a novel rodent avoidance test. We have developed a specialized device and procedures that expand the possibilities for exploring the processes of learning and memory in a psychophysiological experiment. The device consists of a current-stimulating electrode-platform and custom software that allows one to control and record real-time experimental protocols as well as to reconstruct animal movement paths. The device can be used to carry out typical footshock-avoidance tests, such as passive, active, modified active and pedal-press avoidance tasks. It can also be utilized in studies of prosocial behavior, including cooperation, competition, emotional contagion and empathy. The novel footshock-avoidance test procedure allows flexible current-stimulating settings. In our work, we have used a slow-rising current. A test animal can choose between the current-rise and time-out intervals as a signal for action in footshock-avoidance tasks. This represents a choice between escape and avoidance. The method can be used to explore individual differences in decision-making and in the choice of avoidance strategies. It has been shown previously that a behavioral act, for example pedal-pressing, is ensured by motivation-dependent brain activity (avoidance or approach). We have created an experimental design based on tasks of instrumental learning: pedal-pressing in an operant box results in a reward, which is either a piece of food in a feeder (food-acquisition behavior) or an escape-platform (footshock-avoidance behavior). Data recording and analysis were performed using custom software; the open source Accord.NET Framework was used for real-time object detection and tracking.

Keywords: Engineering · Learning · Footshock · Avoidance task · Appetitive task · Approach/Withdrawal · Behavioral analysis





1 Introduction

Animal models are used by researchers all over the world. Rodent passive/active avoidance tests are typical models not only in experimental psychology but also in clinical psychology, psychiatry and behavioral neuroscience. Recent years have brought rapid advances in our understanding of the brain processes involved in avoidance learning, along with their clinical implications for anxiety disorders, PTSD, etc. [7, 10]. Avoidance behavior in rodents has predominantly been studied using the lever-press signaled avoidance task, which requires animals to press a lever upon presentation of a warning signal in order to prevent or escape punishment [10]. The development of new techniques capable of modeling multidimensional cognitive activity could be a valuable contribution to psychophysiological studies. The system organization of human and animal behavior, including the processes of systemogenesis, can be studied in a variety of situations, such as learning and performing behavioral tasks, acute/chronic stress, psychotrauma, alcohol intoxication, etc. This paper presents a novel rodent avoidance test designed to expand the possibilities for exploring learning and memory processes.

2 Device

The device we developed consists of a current-stimulating electrode-platform and custom software that allows one to control and record the real-time behavioral protocol, which can be used to reconstruct trajectories of the animal's movement. The size and number of the electrodes provide stable contact with the animal's skin (see Fig. 1e). The device can be used for the typical footshock-avoidance tests, including passive, active and modified active (see Fig. 1a–c). This is achieved by combining separate sectors of electrodes (Patent RU2675174C1, Fig. 1). The device is completed with partitions and sound/light signals, which make it possible to implement a broad range of behavioral tasks in various situations and conditions, such as learning, helplessness, and stress in studies of anxiety, stress disorders, memory, etc. Finally, the device can be used to study prosocial behavior in rodents, including cooperation, competition, willingness to help a conspecific, emotional contagion and empathy. For instance, we have used a previously established model of emotional contagion [4, 6] in which an animal observes a conspecific experiencing painful electroshocks. This model is illustrated in Fig. 1d. It is known that the electrical resistance of rodent skin depends on such factors as age, sex and weight. Indeed, experiments revealed wide differences in the skin resistance of animals [5, 8]. In addition, our study showed that skin resistance in rats decreases after 5 min of electrostimulation. Therefore, we have applied an electrical circuit of a voltage-controlled current source to compensate for this change in the operation of the device. A user can apply automatic settings for task-dependent stimulation or control stimulation manually, using both AC (alternating current) and DC (direct current). Slow-rising stimulation can be regulated by a microcontroller. Elimination of impulse noise (artifacts) is provided by the alternating current.


Fig. 1. Typical footshock-avoidance tests: (a) passive, (b) active, (c) modified active, (d) “emotional contagion” - observer (left) and pain-demonstrator (right). (e) Device controller (left) and a photograph illustrating the stable contact between electrodes (the arrow indicates one of the electrodes) and animal’s skin.

3 A Novel Avoidance Test Procedure

The novel footshock-avoidance test procedure allows flexible current-stimulating settings with variable trial durations, currents (from 0 to 3 mA) and intervals between trials. In our work we have used a slow-rising current. A typical trial consists of three intervals: (1) current rise; (2) maximum value; (3) time-out (pause between trials). In order to avoid the footshock, a test animal learns to press a pedal during either the current-rise period or the time-out period. This experimental procedure, sketched in code after Fig. 2, makes it possible to explore individual differences in decision-making and in the choice of avoidance strategies, when an animal makes a choice between escape and avoidance. Figure 2 illustrates the “learned helplessness” experiment, when an unavoidable high-intensity footshock is applied to an animal.

Fig. 2. An example of real-time protocol of footshock-avoidance behavior. Footshock is applied in all 4 sectors (A, B, C, D). Three trials are illustrated here. The rat is given a current of 0 to 1 mA, interval settings: current rise from 0 to 5 s, followed by the maximum value from 5 to 10 s, and current stops after 10 s (bottom, right). The next trial begins. Top right corner shows the real-time video recording.
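To make the trial structure concrete, here is a minimal sketch of the timing described above, using the example settings from Fig. 2 (rise 0–5 s, maximum 5–10 s, then time-out); the constants, function names and interval labels are our illustrative assumptions, not the device's actual software.

```python
import numpy as np

# Example trial settings from Fig. 2 (hypothetical names; the real device
# allows 0-3 mA and fully adjustable intervals).
RISE_S, MAX_S, TIMEOUT_S = 5.0, 5.0, 10.0
I_MAX_MA = 1.0

def trial_current(t):
    """Current (mA) at time t (s) within one slow-rising trial."""
    if t < RISE_S:                    # interval 1: current rise
        return I_MAX_MA * t / RISE_S
    if t < RISE_S + MAX_S:            # interval 2: maximum value
        return I_MAX_MA
    return 0.0                        # interval 3: time-out

def press_interval(t_press):
    """Interval in which a pedal press falls; presses during the rise or the
    time-out correspond to the escape/avoidance choice discussed in the text."""
    if t_press < RISE_S:
        return "current rise"
    if t_press < RISE_S + MAX_S:
        return "maximum value"
    return "time-out"

t = np.arange(0.0, RISE_S + MAX_S + TIMEOUT_S, 0.1)
waveform = [trial_current(tk) for tk in t]      # one trial's current profile
print(press_interval(3.2), "|", press_interval(12.0))
```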


4 Exemplary Tasks of Instrumental Learning

4.1 Approach/Withdrawal Paradigm

The most general division of behavior is considered to be that into approach and withdrawal. Studies have demonstrated motivation-dependent brain activity (avoidance- or approach-goal) during behavioral acts such as pedal pressing [1–3, 9]. A typical model of approach behavior is a food-acquisition task, while the typical model of withdrawal behavior is an avoidance task. We have created an experimental design based on tasks of instrumental learning. The operant box is equipped with automated feeders, an escape-platform and pedal bars located in the opposite corners of the box. Pedal-pressing results in a reward, which is either a piece of food in a feeder (food-acquisition behavior, see Fig. 3b) or an escape-platform (footshock-avoidance behavior, see Fig. 3a). The action of pedal-pressing is the same in both cases, but its result is variable: escape-platform or feeder.

Fig. 3. (a) Instrumental footshock-avoidance behavior. (b) Instrumental food-acquisition behavior. (c) Movement paths of a representative rat. (d) Exemplary learning curve during appetitive bar-pressing behavior.

4.2 Behavioral Data Recording and Analysis

Data recording and analysis were performed using custom software developed by S.V. Volkov. Figure 4 shows an exemplary real-time protocol for behavioral analysis (provided by the device).


Fig. 4. Exemplary real-time protocol for behavioral analysis (food-acquisition task). The behavioral cycle: 1—pedal (bar) pressing; 2—start of the feeder motor; 3—lowering of the rat's head and taking food from the feeder. Frame from the actual video recording during operant food-acquisition behavior (right). The object is identified (rectangle), and its coordinates are recorded on a PC.

The food-acquisition behavioral cycle was divided into several acts (Fig. 4, left): pedal (bar) pressing (mechanosensor); moving to the pedal corner; lowering the head (photosensor) and taking food from the feeder. The moving object is identified (Fig. 4, right, rectangle) by the custom software using the open source Accord.NET Framework [11]. The signal coordinates are recorded on a PC, and the animals' movement paths are restored from these coordinates (see Fig. 3c). The Accord.NET Framework is a .NET machine learning framework combined with audio and image processing libraries, completely written in C#. It provides real-time object detection and tracking, as well as general methods for detection and tracking, and is conveniently open source.
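The segmentation of the recorded event stream into behavioral cycles can be illustrated with a small sketch; the event names, timestamps and log format below are hypothetical stand-ins for the device's actual protocol records.

```python
from dataclasses import dataclass

@dataclass
class Event:
    t: float     # time, s
    kind: str    # 'pedal', 'feeder_motor', 'head_down' (illustrative labels)

def split_cycles(events):
    """Group a time-ordered event log into behavioral cycles,
    each cycle starting at a pedal press."""
    cycles, current = [], []
    for e in events:
        if e.kind == "pedal" and current:   # a new press closes the old cycle
            cycles.append(current)
            current = []
        current.append(e)
    if current:
        cycles.append(current)
    return cycles

# A toy log of two food-acquisition cycles.
log = [Event(1.2, "pedal"), Event(1.4, "feeder_motor"), Event(3.0, "head_down"),
       Event(9.8, "pedal"), Event(10.0, "feeder_motor"), Event(11.7, "head_down")]

for i, cycle in enumerate(split_cycles(log), 1):
    print(f"cycle {i}: " + " -> ".join(e.kind for e in cycle))
```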

5 Conclusion

We have compiled and debugged a novel rodent avoidance task procedure that makes it possible to obtain a new type of data on individual differences in decision-making and in the choice of avoidance strategies. For example, experiments with the active non-instrumental avoidance test (see Fig. 1b) showed that female rats choose to minimize risk and avoid the shock during the low-intensity current (a signal for avoidance), while male rats do so during the pause between trials, which avoids the shock completely but carries a risk of a high-intensity shock on rare occasions. We have created an experimental design based on tasks of instrumental learning that makes it possible to explore motivation-dependent brain activity (avoidance or approach). The novel rodent avoidance test that we developed expands the possibilities for exploring learning and memory processes.


Acknowledgments. This research was performed within the framework of the state assignment of the Ministry of Science and Higher Education of Russia (No. 0159-2019-0001 to the Institute of Psychology RAS—learning procedures; No. 0149-2019-0011 to the Shirshov Institute of Oceanology RAS—device design).

References

1. Alexandrov, Y.I., Sams, M.: Emotion and consciousness: ends of a continuity. Cogn. Brain Res. 25, 387–405 (2005)
2. Bulava, A.I., Grinchenko, Y.V.: Patterns of hippocampal activity during appetitive and aversive learning. Biomed. Radioelectron. 2, 5–8 (2017)
3. Bulava, A.I., Svarnik, O.E., Alexandrov, Y.I.: Reconsolidation of the previous memory: decreased cortical activity during acquisition of an active avoidance task as compared to an instrumental operant food-acquisition task. In: 10th FENS Forum of Neuroscience, Abstracts P044609, p. 3493 (2016)
4. Carrillo, M., Han, Y., Migliorati, F., Liu, M., Gazzola, V., Keysers, C.: Emotional mirror neurons in the rat's anterior cingulate cortex. Curr. Biol. 29(8), 1301–1312 (2019)
5. Cheng, N., Van Hoof, H., Bockx, E., Hoogmartens, M.J., et al.: The effects of electric currents on ATP generation, protein synthesis, and membrane transport in rat skin. Clin. Orthop. 171, 264–272 (1982)
6. Keum, S., Shin, H.-S.: Rodent models for studying empathy. Neurobiol. Learn. Mem. 135, 22–26 (2016)
7. Krypotos, A.-M., Effting, M., Kindt, M., Beckers, T.: Avoidance learning: a review of theoretical models and recent developments. Front. Behav. Neurosci. 9, 189 (2015)
8. Muenzinger, K.F., Mize, R.H.: The sensitivity of the white rat to electric shock: threshold and skin resistance. J. Comp. Psychol. 15(1), 139–148 (1933)
9. Shvyrkova, N.A., Shvyrkov, V.B.: Visual cortical unit activity during feeding and avoidance behavior. Neurophysiology 7, 82–83 (1975)
10. Urcelay, G.P., Prevel, A.: Extinction of instrumental avoidance. Curr. Opin. Behav. Sci. 26, 165–171 (2019)
11. Accord.NET Framework. http://accord-framework.net/index.html. Accessed 14 May 2019

Direction Selectivity Model Based on Lagged and Nonlagged Neurons

Anton V. Chizhov¹,², Elena G. Yakimova³, and Elena Y. Smirnova¹,²

¹ Ioffe Institute, Politekhnicheskaya str., 26, 194021 St.-Petersburg, Russia
[email protected]
² Sechenov Institute of Evolutionary Physiology and Biochemistry of RAS, Torez pr., 44, 194223 St.-Petersburg, Russia
³ Pavlov Institute of Physiology, Makarova emb. 6, 199034 St.-Petersburg, Russia

Abstract. Direction selectivity (DS) of visual cortex neurons is modelled with a filter-based description of the retino-thalamic pathway and a conductance-based population model of the cortex as a 2-d continuum. The DS mechanism is based on a pinwheel-dependent asymmetry of projections from lagged and non-lagged thalamic neurons to the cortex. The model realistically reproduces responses to drifting gratings and reveals the role of the cortex in sharpening DS while keeping interneurons non-selective.

Keywords: Visual cortex · Direction selectivity · Lagged and non-lagged cells · Conductance-based refractory density model

1 Introduction

Primary visual cortex neurons are selective to various characteristics of the stimulus: orientation, direction of motion, color, etc. [1]. Most DS models include a time delay between the spatially separated inputs into a cortical cell [2]. The physiological mechanism of this delay formation was revealed in [3] and further, in more detail, in [4], where intracellular in vivo recordings demonstrated that the lateral geniculate nucleus (LGN) neurons fall into two classes, lagged and non-lagged cells, and that the delay of the lagged neurons is determined by the effects of the inhibitory-excitatory synaptic complexes formed on the synaptic axonal terminals of retinal ganglion cells in the LGN. In [5], a complex schematic model of DS was proposed, based on specific convergent projections of the signals from lagged and non-lagged LGN cells, as well as on intracortical interactions. Later, a reduced rate model of a hypercolumn was proposed that exploits lagged and non-lagged LGN cells and feedforward inhibition [6]. However, no detailed and comprehensive model has been reported yet. In our biophysically detailed model of V1 we use the conductance-based refractory density (CBRD) approach [7], which allows us to benefit from the advantages of population models while keeping the precision of biophysically detailed models.

2 Methods

Lag-Nonlag Mechanism of Direction Selectivity. The LGN neurons differ in their delayed reaction to visual stimuli and split into two populations of lagged and non-lagged cells (Fig. 1). These populations are equally and homogeneously distributed across the LGN (Fig. 1, middle). The lagged/non-lagged cells have round, center-surround receptive fields (RF) (Fig. 1, left). We consider only so-called on-cells: they respond strongly to a bright stimulus in the center of the RF and are inhibited in the surround of the RF. The center-surround structure is described by an axisymmetric difference of Gaussians (DOG), as in [8], with the RF's temporal component set as a double-exponential function. The firing rate of an LGN neuron at any given time is expressed as a convolution of the RF with the stimulus, rectified at zero. The model of the LGN cells is described in detail in [9]. Lagged cell activity is delayed by 40 ms, according to estimations from [4].
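As a minimal illustration of this LGN stage, the sketch below builds a DOG receptive field and computes a rectified spatial convolution with a stimulus frame; the temporal RF component is omitted, and the grid size and Gaussian widths are our illustrative assumptions, not the model's parameters.

```python
import numpy as np
from scipy.signal import convolve2d

def dog_kernel(size=21, sigma_c=1.0, sigma_s=3.0):
    """Axisymmetric difference-of-Gaussians RF of an on-center cell."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    center = np.exp(-r2 / (2 * sigma_c**2)) / (2 * np.pi * sigma_c**2)
    surround = np.exp(-r2 / (2 * sigma_s**2)) / (2 * np.pi * sigma_s**2)
    return center - surround

def lgn_rate(stimulus, rf):
    """Firing rate: convolution of the RF with the stimulus, rectified at zero."""
    return np.maximum(convolve2d(stimulus, rf, mode="same"), 0.0)

# A bright spot in the RF center excites the on-cell.
frame = np.zeros((41, 41))
frame[20, 20] = 1.0
print(lgn_rate(frame, dog_kernel())[20, 20])   # positive response
```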

Fig. 1. Schematic representation of the proposed model reproducing direction selectivity in V1.

V1 consists of orientation hypercolumns (Fig. 1, right). V1 neurons receive inputs from LGN cells that are located within the elongated footprint (Fig. 1, middle). The elongation determines the orientation preference (horizontal). Neurons near the border of two neighboring hypercolumns prefer the same orientation but opposite directions and have similar footprints. A preferred direction depends on the asymmetry of connections with lagged and non-lagged cells, i.e. the footprint (Fig. 1, middle), which is split into two halves along the axis of elongation. V1 neurons preferring a certain direction (upward) receive non-lagged input from one (top) side of the footprint and lagged input from the other (bottom) side, and vice versa for V1 neurons preferring the opposite (downward) direction.


The mathematical description of the LGN-to-V1 projection is expressed in terms of firing rates and convolutions. The kernel expression that determines the direction selectivity bias is the thalamic input into a V1 neuron, φ_Th,E(x, y, t), which is given below. The pinwheels with clockwise progression of orientation columns are adjacent to the ones with counterclockwise progression. The pinwheel centers are distributed on a rectangular grid with the pinwheel radius R and indexed by i_PW and j_PW. Adjacent columns belonging to different pinwheels have the same orientation preferences. The coordinates of a pinwheel center are x_PW = (2 i_PW − 1) R, y_PW = (2 j_PW − 1) R. The orientation angle for the point (x, y) of V1 which belongs to the pinwheel (i_PW, j_PW) is defined as θ(x, y) = arctan((y − y_PW)/(x − x_PW)). The progression is determined by the factor (−1)^{i_PW + j_PW}. Finally, the input firing rate is

φ_Th,E(x, y, t) = ∫∫ dx̃ dỹ D_LGN−V1(x, y, x̃, ỹ) L_LGN(x̃, ỹ, t − δ(x, y, x̃, ỹ)),

where

D_LGN−V1(x, y, x̃, ỹ) = (1/(π σ_pref σ_orth)) exp(−x′²/σ²_pref − y′²/σ²_orth),
x′ = (x̃ − x_cf) cos θ − (ỹ − y_cf) sin θ,
y′ = (x̃ − x_cf) sin θ + (ỹ − y_cf) cos θ,
δ(x, y, x̃, ỹ) = 40 ms if (−1)^{i_PW + j_PW} x′ > 0, and 0 otherwise.

Here D_LGN−V1(x, y, x̃, ỹ) is the LGN-to-V1 footprint with the width σ_pref across the preferred orientation and the width σ_orth across the orthogonal orientation; δ(x, y, x̃, ỹ) is the delay that determines the contributions of either lagged or non-lagged cells.

Biophysically Detailed Mathematical Model of V1. V1 is modeled as a continuum in 2-d cortical space. Each point contains 2 populations of neurons, excitatory (E) and inhibitory (I), connected by AMPA-, NMDA- and GABA_A-mediated synapses for recurrent interactions and only AMPA and NMDA for the LGN input. The strengths of the external connections correspond to the pinwheel architecture; thus neurons receive inputs according to their orientation and direction preferences. The strengths of the intracortical connections, i.e. the maximum conductances, are isotropic and distributed according to the locations of pre- and postsynaptic populations. The modeled area of the cortex was as large as 1 mm × 1.5 mm and included 6 orientation hypercolumns. The mathematical description of each population is based on the CBRD approach [10, 11], where neurons within each population are distributed according to their phase variable, the time elapsed since their last spikes, t*. Single-population dynamics is governed by the equations for the neuronal density, the mean (over noise realizations) voltage, and the gating variables. The CBRD for interacting adaptive regular-spiking pyramidal cells and fast-spiking interneurons is given in [7, 12]. The model of an E-neuron takes into account two compartments and a set of voltage-gated ionic currents, including the adaptation currents.
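A discrete version of the delayed thalamic input above is easy to sketch for a single V1 point; the grid sizes, Gaussian widths, time step and placeholder LGN activity below are illustrative assumptions, not the model's actual parameters.

```python
import numpy as np

def thalamic_input(L, t_idx, theta, x_cf, y_cf, sign,
                   sigma_pref=2.0, sigma_orth=6.0, lag_steps=4):
    """phi_Th,E for one V1 point: Gaussian footprint over the LGN sheet,
    with the lagged (40 ms) input taken from one half of the footprint.
    L: LGN firing rates, shape (time, ny, nx); lag_steps ~ 40 ms at 10 ms/step;
    sign stands for the pinwheel factor (-1)**(i_PW + j_PW)."""
    _, ny, nx = L.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    xr = (xx - x_cf) * np.cos(theta) - (yy - y_cf) * np.sin(theta)
    yr = (xx - x_cf) * np.sin(theta) + (yy - y_cf) * np.cos(theta)
    D = np.exp(-xr**2 / sigma_pref**2 - yr**2 / sigma_orth**2)
    D /= D.sum()                                  # normalized footprint
    lagged = sign * xr > 0                        # half giving the lagged input
    return np.sum(D * np.where(lagged, L[max(t_idx - lag_steps, 0)], L[t_idx]))

L = np.random.rand(100, 32, 32)                   # placeholder LGN activity
print(thalamic_input(L, t_idx=50, theta=0.3, x_cf=16.0, y_cf=16.0, sign=1))
```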

3 Results

We have tested the mechanism of DS by comparing spatio-temporal activity patterns (Fig. 2) in response to horizontal gratings moving up (a) and down (b) with a temporal frequency of 8 Hz and a spatial frequency of 0.25 cycles/degree. The bright spots correspond to high activity. They appear in columns that prefer an orientation similar to that of the stimulus. The patterns are not symmetrical with respect to the central vertical axis, which is due to DS, i.e. different direction preferences for neurons of the left and right columns with the same orientation preferences, as is clear from the activity of E-neurons averaged over the first 1600 ms (Fig. 2c). The peaks of the E-cell activity are located in different hypercolumns, depending on the direction of the grating movement. The plots of the excitatory firing rate (Fig. 2c) are comparable to optical imaging data, for example those obtained in cat visual cortex [13] (see their Figs. 4A-B). For the location marked in Fig. 2c, the LGN input, mean voltage, synaptic conductances, firing rate, voltage-sensitive dye (VSD) signal and voltage of representative neurons are shown in Fig. 2d, e. These simulated signals are similar to experimental recordings, for instance those from [14] (their Fig. 5). The firing rates of the E and I populations correlate in time. The amplitude of the firing rate oscillations strongly depends on the direction of the grating movement (compare panels d and e). The VSD signal (bottom trace) was calculated as the sum of three quarters of the E mean voltage and one quarter of the I mean voltage. It is comparable to experimentally recorded VSD signals, for instance from [15]. The input signals for the neurons of the populations are the synaptic conductances (Fig. 2d, e). Modulations in time of the excitatory and inhibitory components are in phase. When comparing with experiments, it should be noted that we present separate AMPA, NMDA and GABA conductances, whereas known experimental studies reported anti-phase estimates of the summed (AMPA+NMDA) and inhibitory conductances [14, 16–18]; these should not be directly compared because of an underestimation inherent in the experimental method, which was recently revealed [19]. That is why our observation of in-phase modulations of the AMPA and GABA conductances should not be considered untrue when compared with experimental estimates of anti-phase excitatory and inhibitory conductances. The CBRD model enables one to reconstruct the behavior of a representative neuron, given the input variables of a population. As seen from the voltage traces, such a representative E-neuron generates spikes when the direction of the grating movement is the preferred one. When the direction is opposite, only sub-threshold depolarization is observed. As for an I-neuron, it shows weaker direction specificity. The voltage traces recorded in response to moving gratings are consistent with the ones presented in electrophysiological works in vivo, such as [14, 18], when comparing the shape and amplitude of the voltage oscillations. The mean voltage shown in Fig. 2d, e is the mean across noise realizations and across input weights. Membrane potentials of individual neurons generally differ from this mean voltage due to an individual input weight obeying the lognormal distribution, different noise realizations and different refractory states t*, as seen from the example for the representative neuron.

Fig. 2. Activity of the V1 domain in response to the moving gratings. (a, b) Distribution of the E-population firing rate (bottom) across the modeled area of V1 at different time moments in response to the gratings (top) moving up (a) and down (b). The modeled area of V1 includes 6 orientation hypercolumns with the centers marked by small white dots. The white circle is the location of the representative population. (c) The firing rate of the E-populations, averaged over 1600 ms. (d, e) Activity characteristics at one point of the modeled V1 area (big white point in b) in response to the grating stimuli moving up (d) and down (e): the LGN input to the E-population; the mean voltage of the E (solid line) and I (dashed) populations; the AMPA (solid), NMDA (long-dashed) and GABA (dashed) recurrent synaptic conductances; the firing rate of the E (solid) and I (dashed) populations; the voltage of representative E (solid) and I (dashed) neurons; the voltage-sensitive dye (VSD) signal.

4 Discussion

In our model, the average activity patterns (Fig. 2c) are comparable with the optical imaging data [13]. The scales and contrast of the modeled and experimental spots of activity are similar, as is the displacement of the spots after the change of the stimulus direction. We have found that E-neurons are directionally selective whereas I-neurons are not, for two reasons: I-neurons do not receive direct LGN input, and the characteristic length of E-cell connections to I-cells is 5 times larger than that of E-to-E connections. The voltage traces registered in [14, 17, 18] have the same degree of DS as our model. The Lag-Nonlag mechanism is principally similar to that based on transient and sustained cells [20]. Alternatively, recently reported experimental data obtained with the help of optogenetics [21] and multielectrode electrophysiological recordings [22] suggest that DS in V1 is determined by a displacement of the on- and off-subzones of the receptive fields of V1 neurons. Here we did not take the off-signals into account; instead, we considered only on-center off-surround neurons in the LGN and their purely excitatory projections to V1. Introduction of feedforward inhibition and/or off-center on-surround LGN neurons and on-off separation at the level of V1 is expected to produce stronger DS. This issue is to be considered in our future study. Concluding, the proposed model is quite realistic in construction and behavior. Simulations confirm that the suggested mechanism is consistent with known experimental constraints.

Acknowledgment. The reported study was supported by the Russian Foundation for Basic Research (RFBR) research project 19-015-00183.

References

1. Hubel, D.H., Wiesel, T.N.: Receptive fields of single neurones in the cat's striate cortex. J. Physiol. 148, 574–591 (1959)
2. Adelson, E.H., Bergen, J.R.: Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 2, 284–299 (1985)
3. Cai, D., DeAngelis, G.C., Freeman, R.D.: Spatiotemporal receptive field organization in the lateral geniculate nucleus of cats and kittens. J. Neurophysiol. 78(2), 1045–1061 (1997)
4. Vigeland, L.E., Contreras, D., Palmer, L.A.: Synaptic mechanisms of temporal diversity in the lateral geniculate nucleus of the thalamus. J. Neurosci. 33(5), 1887–1896 (2013)
5. Saul, A.B., Humphrey, A.L.: Evidence of input from lagged cells in the lateral geniculate nucleus to simple cells in cortical area 17 of the cat. J. Neurophysiol. 68(4), 1190–1208 (1992)
6. Ursino, M., La Cara, G.E., Ritrovato, M.: Direction selectivity of simple cells in the primary visual cortex: comparison of two alternative mathematical models. I: response to drifting gratings. Comput. Biol. Med. 37(3), 398–414 (2007)


7. Chizhov, A.V.: Conductance-based refractory density model of primary visual cortex. J. Comput. Neurosci. 36, 297–319 (2014)
8. Dayan, P., Abbott, L.F.: Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. The MIT Press, Cambridge (2001)
9. Yakimova, E.G., Chizhov, A.V.: Experimental and modeling studies of orientational sensitivity of neurons in the lateral geniculate nucleus. Neurosci. Behav. Physiol. 45(4), 465–475 (2015)
10. Chizhov, A.V., Graham, L.J., Turbin, A.A.: Simulation of neural population dynamics with a refractory density approach and a conductance-based threshold neuron model. Neurocomputing 70(1), 252–262 (2006)
11. Chizhov, A.V., Graham, L.J.: Population model of hippocampal pyramidal neurons, linking a refractory density approach to conductance-based neurons. Phys. Rev. E 75, 011924 (2007)
12. Chizhov, A., Amakhin, D., Zaitsev, A.: Computational model of interictal discharges triggered by interneurons. PLoS ONE 12(10), e0185752 (2017)
13. Shmuel, A., Grinvald, A.: Functional organization for direction of motion and its relationship to orientation maps in cat area 18. J. Neurosci. 16, 6945–6964 (1996)
14. Monier, C., Fournier, J., Fregnac, Y.: In vitro and in vivo measures of evoked excitatory and inhibitory conductance dynamics in sensory cortices. J. Neurosci. Methods 169, 323–365 (2008)
15. Grinvald, A., Lieke, E.E., Frostig, R.D., Hildesheim, R.: Cortical point-spread function and long-range lateral interactions revealed by real-time optical imaging of macaque monkey primary visual cortex. J. Neurosci. 14(5), 2545–2568 (1994)
16. Anderson, J.S., Carandini, M., Ferster, D.: Orientation tuning of input conductance, excitation, and inhibition in cat primary visual cortex. J. Neurophysiol. 84(2), 909–926 (2000)
17. Priebe, N.J., Ferster, D.: Direction selectivity of excitation and inhibition in simple cells of the cat primary visual cortex. Neuron 45(1), 133–145 (2005)
18. Baudot, P., Levy, M., Marre, O., Monier, C., Pananceau, M., Fregnac, Y.: Animation of natural scene by virtual eye-movements evokes high precision and low noise in V1 neurons. Front. Neural Circ. 7, 206 (2013)
19. Chizhov, A.V., Amakhin, D.V.: Method of experimental synaptic conductance estimation: limitations of the basic approach and extension to voltage-dependent conductances. Neurocomputing 275, 2414–2425 (2017)
20. Lien, A.D., Scanziani, M.: Cortical direction selectivity emerges at convergence of thalamic synapses. Nature 558, 80–86 (2018)
21. Adesnik, H., Bruns, W., Taniguchi, H., Huang, J., Scanziani, M.: A neural circuit for spatial summation in visual cortex. Nature 490, 226–231 (2012)
22. Kremkow, J., Jin, J., Wang, Y., Alonso, J.: Principles underlying sensory map topography in primary visual cortex. Nature 533(7601), 52–57 (2016)

Wavelet and Recurrence Analysis of EEG Patterns of Subjects with Panic Attacks

Olga E. Dick

Pavlov Institute of Physiology of RAS, St. Petersburg, Russia
[email protected]

Abstract. The task of analyzing the reactive patterns of the electroencephalogram (EEG) in individuals with panic attacks before and after non-drug therapy associated with the activation of artificial stable functional connections of the human brain is considered. The quantitative measures of the photic driving reaction at the suggested frequency are estimated from the increase in the energy of the wavelet spectrum during photostimulation and from the parameters of the joint recurrence plot of the light stimulus and the EEG pattern.

Keywords: EEG · Panic attacks · Wavelet analysis · Joint recurrence plot

1 Introduction

Panic attacks include a complex of symptoms characterized by paroxysmal fear [1, 2]. The importance of the problem of treating this disorder is due to the insufficient effectiveness of drug therapy, which is why there is still a need to find safe non-drug therapies. One of these methods is the activation of artificial stable functional connections (ASFC) of the human brain. The ASFC method is based on the intracerebral phenomenon of long-term memory, which is a special kind of functional connection of the brain that is formed under conditions of activation of subcortical structures and impulse stimulation, and is associated with the regulatory systems of the brain [3–5]. The aim of this work is to show the ability to identify quantitative indicators of the improvement of the functional state of the brain in patients with panic attacks after ASFC trials.

2 Materials and Methods

Artifact-free EEG patterns were analyzed in 10 patients aged from 26 to 45 years with a disease duration of 10 years on average and a diagnosis of panic disorder. The course of correction was performed at the clinic of the Institute of the Human Brain of the Russian Academy of Sciences and consisted of 10 trials of the formation of ASFC. Each trial included 6 series of photostimulation with a frequency of 20 Hz and a duration of 10 s against the background of ethimizol medication; the intervals between the stimuli were 60 s. The photostimulation was carried out using the functional brain activity simulator “Mirage” (St. Petersburg). This device has proven itself in programs of non-drug correction in earlier studies [3–5].


Before and after these trials, the brain bioelectrical activity was recorded on a 21-channel electroencephalograph with a sampling rate of 256 Hz. The study was approved by the local Ethics Committee. Written informed consent was obtained from all the subjects. The stimulation lasted 10 s for each frequency, with a resting interval of 30 s between frequencies. Since the signals reproducing the light rhythm have maximal amplitude in the occipital lobes, the patterns at the O1, Oz and O2 sites were estimated. The photic driving reaction in the EEG patterns was estimated by the continuous wavelet transform method [6] and by the method of joint recurrence analysis [7]. In the first method, the complex Morlet wavelet was used as the basic wavelet:

ψ₀(t) = π^{−1/4} exp(−0.5 t²) exp(iω₀t),

where the value ω₀ = 2π gives the simple relation between the scale a of the wavelet transform and the real frequency f of the analyzed signal [6]:

f = (ω₀ + √(2 + ω₀²)) / (4πa) ≈ 1/a.

Due to this relation between a and f, the continuous wavelet transform of the signal x(t) is determined by the function

W(f, t₀) = π^{−1/4} √f ∫_{−∞}^{+∞} x(t) exp(−0.5 (t − t₀)² f²) exp(i2π(t − t₀)f) dt,

where t₀ gives the shift of the wavelet function along the time axis. The value |W(f, t₀)|² determines the instantaneous distribution of the energy over frequencies f, and the integral

E(f) = ∫_{t₁}^{t₂} |W(f, t₀)|² dt₀

describes the global wavelet spectrum, i.e., the integral distribution of the wavelet spectrum energy over frequencies on the time interval [t₁, t₂]. The light time series was approximated by a sequence of k Gaussian impulses following each other with frequency f_C:

p(t) = Σ_{j=0}^{k−1} (0.5 / (σ₀√π)) exp(−(t − tⱼ)² / (4σ₀²)),

where σ₀ = 10 ms is the width of the impulse and tⱼ are the centers of the impulses: tⱼ = t_A + j/f_C, j = 0, …, k − 1; t_A is the time of the beginning of the first impulse in the sequence [8].
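As an illustration of these wavelet measures, the sketch below computes W(f, t₀) with the Morlet atom defined above, the global spectrum E(f), and the driving coefficient kR as the ratio of E maxima during and before stimulation (defined in the next paragraph); the sampling rate follows the text (256 Hz), while the function names, discretization and interval handling are our assumptions.

```python
import numpy as np

FS = 256.0                                   # sampling rate from the text, Hz

def morlet_cwt(x, f, i0):
    """W(f, t0) at sample i0: correlation of x with the Morlet atom at frequency f."""
    t = (np.arange(len(x)) - i0) / FS        # t - t0, in seconds
    atom = np.pi**-0.25 * np.sqrt(f) * np.exp(-0.5 * (t * f)**2) \
           * np.exp(1j * 2 * np.pi * f * t)
    return np.sum(x * atom) / FS

def global_spectrum(x, f, i1, i2):
    """E(f): integral of |W(f, t0)|**2 over the interval [t1, t2]."""
    return sum(abs(morlet_cwt(x, f, i))**2 for i in range(i1, i2)) / FS

def driving_coefficient(x, fc, stim, rest, df=0.5, nf=11):
    """kR: ratio of E maxima in [fc - df, fc + df] during vs. before stimulation;
    stim and rest are (start, stop) sample indices of the two intervals."""
    freqs = np.linspace(fc - df, fc + df, nf)
    e_stim = max(global_spectrum(x, f, *stim) for f in freqs)
    e_rest = max(global_spectrum(x, f, *rest) for f in freqs)
    return e_stim / e_rest

eeg = np.random.randn(int(20 * FS))          # placeholder EEG pattern
print(driving_coefficient(eeg, fc=20.0,
                          stim=(int(10 * FS), int(11 * FS)),
                          rest=(0, int(1 * FS))))
```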


The wavelet transform of the light series p(t) was found in the form [9]:

W(f, t₀) = π^{−1/4} (√f/√g) Σ_{j=0}^{k−1} exp( −(f²/g)[ (tⱼ − t₀)²/2 + (2πσ₀)² ] + i (2πf/g)(tⱼ − t₀) ),

where g = 1 + 2(σ₀f)². The presence of the photic driving reaction was estimated by the value of the coefficient of photic driving (kR) in the narrow range [f_C − Δf, f_C + Δf] around each applied stimulation frequency f_C, where Δf = 0.5 Hz [9]. The coefficient of photic driving (kR) was determined as the ratio of the maxima of the global wavelet spectra during the photic stimulation and before it. The value kR < 1 means that the energy of the global wavelet spectrum during the light stimulation is less than the energy of the spectrum before stimulation, i.e., the absence of the photic driving reaction at the given frequency.

The second method of analysis of the photic driving reaction in the EEG patterns is connected with the construction of joint recurrence plots of the EEG and the light series. A joint recurrence plot is a graphical representation of the matrix

R_{i,j}(ε) = 1 if ‖y_i − y_j‖ ≤ ε and ‖z_i − z_j‖ ≤ ε, and 0 otherwise,

in which the values 1 and 0 correspond to black and white points, a black point meaning a recurrence and a white point a non-recurrence, respectively [7]. A joint recurrence, within the accuracy of the error ε, is defined as the return of the state y_j of the EEG phase trajectory to the state y_i with the simultaneous return of the state z_j of the light signal phase trajectory to the state z_i [7]. The phase trajectories of the states z(t) and y(t) were obtained from the initial time series {x(t)} and {p(t)} by using the delay coordinate embedding method [10]:

y(t) = (x(t), x(t + d), …, x(t + (m − 1)d)),

where d is the delay time and m is the embedding dimension, i.e. the minimal dimension of the space in which the recovered trajectory reproduces the properties of the initial trajectory. The optimal time delay d was fitted on the basis of the first minimum of the mutual information function [11]. The optimal embedding dimension m was found by the false nearest neighbors method [12]. Signal extraction in the narrow band of frequencies around the photostimulation frequency allowed us to find the value of the optimal embedding dimension m < 5. The value of ε was equal to 1% of the standard deviation of the analyzed signal. Using the recurrence analysis, we determined quantitative measures of the joint recurrence plots such as


(1) the mean length L of the diagonal lines in the joint recurrence plot;

(2) the recurrence time τ, which is the time necessary for the signal value to return into the ε-neighborhood of a previous point, measured as the vertical distance between the onset and end of subsequent recurrence structures in the recurrence plot;

(3) the recurrence rate

RR = (1/N²) Σ_{i,j=1}^{N} R_{i,j}(ε);

(4) the measure of determinism of the signal, DET, defined as the ratio of the recurrence points that form diagonal structures of at least length l_min to all recurrence points:

DET = Σ_{l=l_min}^{N} l P(ε, l) / Σ_{i,j=1}^{N} R_{i,j}(ε),

where P(ε, l) = {l_i; i = 1, …, N_l} is the frequency distribution of the diagonal lines of length l in the recurrence plot and N is the number of all the diagonal lines.

To compare the mean parameters obtained for the different electrode sites of one tested subject, the non-parametric Friedman ANOVA test (p < 0.05) was applied. The averaging was performed over five trials of EEG recordings for each subject. To examine the differences between the mean values of the parameters in the group of patients obtained before and after ASFC, the non-parametric Mann-Whitney test (p < 0.05) was used.
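The joint recurrence measures above can likewise be sketched directly; the embedding parameters and the 1% ε threshold follow the text, while the function names and the toy usage are illustrative assumptions.

```python
import numpy as np

def embed(x, m, d):
    """Delay-coordinate embedding: y(t) = (x(t), x(t+d), ..., x(t+(m-1)d))."""
    n = len(x) - (m - 1) * d
    return np.column_stack([x[i * d:i * d + n] for i in range(m)])

def joint_recurrence(x, p, m=3, d=3):
    """Joint recurrence matrix of the EEG pattern x and the light series p,
    with eps equal to 1% of the standard deviation of each signal."""
    y, z = embed(x, m, d), embed(p, m, d)
    ry = np.linalg.norm(y[:, None] - y[None, :], axis=2) <= 0.01 * x.std()
    rz = np.linalg.norm(z[:, None] - z[None, :], axis=2) <= 0.01 * p.std()
    return (ry & rz).astype(int)

def diagonal_lengths(R):
    """Lengths of all diagonal lines (runs of recurrence points)."""
    n = R.shape[0]
    lengths = []
    for k in range(-(n - 1), n):
        run = 0
        for v in np.diagonal(R, k):
            if v:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return np.array(lengths)

def rr_det(R, l_min=2):
    """Recurrence rate RR and determinism DET of a (joint) recurrence plot."""
    rr = R.sum() / R.size
    ls = diagonal_lengths(R)
    det = ls[ls >= l_min].sum() / R.sum() if R.sum() else 0.0
    return rr, det

# Toy usage on short random signals.
x, p = np.random.randn(300), np.random.randn(300)
print(rr_det(joint_recurrence(x, p)))
```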

3 Results

In the background EEG of 69% of the patients, high-amplitude activity of the θ range dominated, while the EEG of 31% of the patients showed low-amplitude polymorphic activity in the δ, θ and α ranges before the ASFC trials. The ASFC trials resulted in a significant decrease in the amplitude of the θ activity, the disappearance of the polymorphic activity and an increase in the activity in the α range. The reactive EEG patterns before the ASFC trials were characterized by asymmetry of the responses of the occipital lobes of the brain to the photostimulus. It was manifested in different values of the maxima of the local wavelet spectra of the EEG patterns recorded at the O1 and O2 sites (Fig. 1a, c). After 10 trials of ASFC, all patients reported a significant decrease or complete disappearance of panic attacks and a decrease in general and situational anxiety. The asymmetry of the photic driving reaction decreased (Fig. 1b, d). Table 1 shows the average values of the photic driving coefficient (kR) for the reactive EEG patterns before and after the ASFC trials. For 9 out of 10 patients with panic attacks, the value of the photic driving coefficient kR < 1 for frequencies of the θ range, which means the absence of the photic driving reaction of the given rhythm.

Fig. 1. A decrease of the maxima of the local wavelet spectra of EEG patterns at the O1 and O2 sites after the ASFC trials (panels: a, O1 before ASFC; b, O1 after ASFC; c, O2 before ASFC; d, O2 after ASFC; axes: time t (s) and frequency f (Hz)). The beginning and end of the photostimulation is indicated by arrows.

A minor photic driving reaction is revealed for frequencies of the α-range (kR = 1.9 ± 0.2 for 12 Hz and kR = 1.1 ± 0.1 for 8 Hz). A large reaction is found for frequencies of the β-range, for example, kR = 101 ± 11 for 20 Hz. At the same time, the value of kR for the O2 site is almost five times higher than that for the O1 site. Thus, there are statistically significant differences in the mean values of the coefficient kR calculated for the occipital sites O1 and O2 (p < 0.05), which testifies to the asymmetry of the photic driving reaction for the β-range in most of the patients tested. After the ASFC trials, the asymmetry of the responses of the occipital lobes of the brain becomes statistically insignificant (p > 0.05), and kR < 1 for the α-range. The photic driving reaction in the β-range decreases significantly (kR = 5.5 ± 0.5 for the O1 site at 20 Hz).

The dynamics of the rhythm driving in the EEG patterns of patients with panic attacks after the ASFC trials was also confirmed by a change in the simultaneous recurrences in the joint recurrence plots of these patterns and the light time series. Examples of such plots are presented in Fig. 2b and d. The plots are constructed at 20 Hz for the delay time d = 3 and the embedding dimension m = 3; the neighborhood size ε is equal to 1% of the standard deviation of the analyzed time series. The corresponding EEG patterns during photostimulation at this frequency are shown in Fig. 2a with a bold line, and the photostimulus with a thin dash-dotted line. The left recurrence plot (Fig. 2b) has recurrent structures containing long diagonal lines, which testifies to the emergence of simultaneous recurrences in the EEG pattern and the light signal. During the increase in the amplitude of the brain response to photostimulation at the proposed frequency (within the range of nL values from 600 to 1800), the number of simultaneous recurrences increases, which is reflected in an increase in the length of the diagonal lines in the recurrence plot.


Table 1. The mean values of the photic driving coefficient (kR), the recurrence rate (RR) and the recurrence time (τ) in joint recurrence plots of the EEG patterns and the light time series (N = 9 out of 10) before and after the ASFC trials (columns: f (Hz), O1, O2).

… in the coupling between oscillators. The reasons for choosing system (7) are as follows. Firstly, the general qualitative character of a synaptic link is preserved when passing from (5) to (7), because in both cases the corresponding coupling terms γ s_{j−1}(u_{j−1})(u_∗ − u_j) and γ s(u_{j−1}) u_j ln(u_∗/u_j) (j = 1, 2; u_0 = u_2) change their sign from plus to minus as the potentials u_j increase and cross the critical value u_∗. Secondly, and most importantly, there exists a well-defined limit object for system (7), namely a relay-type delay system. Indeed, after the passage to the new variables

x_j = (1/λ) ln u_j,   j = 1, 2,   (9)

and, as the parameter λ tends to infinity, system (7) can be represented in the form

ẋ_1 = −1 + αR(x_1(t − 1)) − βR(x_1) + γ(c − x_1) H(x_2(t − h)),
ẋ_2 = −1 + αR(x_2(t − 1)) − βR(x_2) + γ(c − x_2) H(x_1(t − h)),   (10)

where

R(x) := 1 for x ≤ 0, 0 for x > 0;   H(x) := 0 for x ≤ 0, 1 for x > 0.   (11)
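To see the dynamics of (10) numerically, a minimal Euler integration with a stored history can be used (a sketch only; the parameter values in the example call are illustrative, not taken from the paper):

```python
import numpy as np

def R(x): return 1.0 if x <= 0 else 0.0   # relay function from (11)
def H(x): return 0.0 if x <= 0 else 1.0   # Heaviside-type function from (11)

def simulate(alpha, beta, gamma, c, h, t_end, dt=1e-3, history=(-0.5, -1.5)):
    """Euler scheme for the relay delay system (10), constant initial history."""
    lag1 = int(round(1.0 / dt))           # delay 1 in the self-feedback terms
    lagh = int(round(h / dt))             # delay h in the coupling terms
    buf = max(lag1, lagh)
    n = int(round(t_end / dt))
    x = np.zeros((buf + n, 2))
    x[:buf] = history
    for k in range(buf, buf + n - 1):
        x1, x2 = x[k]
        dx1 = (-1 + alpha * R(x[k - lag1, 0]) - beta * R(x1)
               + gamma * (c - x1) * H(x[k - lagh, 1]))
        dx2 = (-1 + alpha * R(x[k - lag1, 1]) - beta * R(x2)
               + gamma * (c - x2) * H(x[k - lagh, 0]))
        x[k + 1] = (x1 + dt * dx1, x2 + dt * dx2)
    return x[buf:]

# Illustrative call (beta = alpha - 2, as assumed later in the paper):
traj = simulate(alpha=3.0, beta=1.0, gamma=1.0, c=-20.0, h=10.0, t_end=100.0)
```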


As it turned out, system (10) has rather complex dynamics. As will be shown in the next section, by introducing a delay into the chain of relations between the equations, two fundamentally important phenomena can be achieved in this system at once. The first consists in the coexistence of several stable periodic regimes in system (10); in this case, a mechanism for increasing the number of such regimes can be indicated. This phenomenon is often called multistability. The second important property of the solutions of system (10) is that they have some preassigned number of consecutive positive sections, followed by a large section of negativity. Taking into account the change of variables (9), such cycles of system (7) correspond to periodic solutions with the same number of consecutive asymptotically high spikes, alternating with a section where the potentials u_j(t) are close to zero. Periodic solutions with this property are called bursting cycles (see [2,3,10,11]).

The proof of the theorem on the correspondence between the solutions of system (7) and the limit system (10) is a technically rather complicated task (see, for example, [12,13]), since it requires the construction of asymptotic approximations of the solutions of (7). To avoid this, one can replace system (7) with the relay-type system

u̇_1 = λ[−1 + αF(u_1(t − 1)) − βG(u_1) + γG(u_2(t − h)) ln(u_∗/u_1)] u_1,
u̇_2 = λ[−1 + αF(u_2(t − 1)) − βG(u_2) + γG(u_1(t − h)) ln(u_∗/u_2)] u_2,   (12)

where

F(u) := 1 for 0 < u ≤ 1, 0 for u > 1;   G(u) := 0 for 0 < u ≤ 1, 1 for u > 1.   (13)

Note that the substitution (9) turns the relay functions (13) into (11); in particular, F(exp(λx)) = R(x) and G(exp(λx)) = H(x) for λ > 0. Thus, all the properties of the relay system (10) automatically carry over to system (12).

2 Relay Model Analysis

We need the following definitions in the sequel. Let us fix a sufficiently small constant σ > 0 and consider the space E = C([−h − σ, −σ]; R²) of continuous vector functions ϕ(t) = colon(ϕ_1(t), ϕ_2(t)) defined for t ∈ [−h − σ, −σ]. We set the norm in E in the usual way, i.e. by the formula

||ϕ|| = max_{j=1,2} max_{−h−σ ≤ t ≤ −σ} |ϕ_j(t)|.   (14)

Let us introduce the constants

ξ := exp(−γα),   η := (α − β − 1)/(1 − ξ) + c.   (15)


Further, in order to determine the set of initial functions S^(m) ⊂ E, we fix a natural number N; constants q_1, q_2 such that q_1 ∈ (0, σ), q_2 > σ; and index-m-dependent constants q_3, q_4 such that

q_3 ∈ (0, (⌊N/2⌋ − m)T_0 + ξη + σ),   q_4 > (⌊N/2⌋ − m)T_0 − ξη + σ.

Here m = 1, ..., ⌊N/2⌋, where ⌊·⌋ denotes the integer part of a number. We define two function sets:

S_1 := {ϕ_1 ∈ C[−h − σ, −σ] : ϕ_1(−σ) = −σ, −q_2 ≤ ϕ_1(t) ≤ −q_1 for all t ∈ [−h − σ, −σ]},
S_2^(m) := {ϕ_2 ∈ C[−h − σ, −σ] : ϕ_2(−σ) = −d − σ, −q_4 ≤ ϕ_2(t) ≤ −q_3 for all t ∈ [−h − σ, −σ]},

where

(n − m)T_0 + ξη ≤ d ≤ (n − m)T_0 − ξη,   m = 1, ..., n.   (16)

Firstly, let us consider the single relay equation obtained for x_j from (10) when γ = 0:

ẋ = −1 + αR(x(t − 1)) − βR(x).   (17)

The following statement was proved in [12].

Lemma 1 ([12]). Let α > β + 1 and σ < β + 1. Then equation (17) with an initial function ϕ_1 ∈ S_1 for t ∈ [−1 − σ, −σ] admits a unique stable periodic solution given by

x_0(t) :=
  (α − 1)t,              t ∈ [0, 1],
  −t + α,                t ∈ [1, α],            (18)
  −(β + 1)(t − α),       t ∈ [α, α + 1],
  (α − β − 1)(t − T_0),  t ∈ [α + 1, T_0],

x_0(t + T_0) ≡ x_0(t),   T_0 := α + 1 + (β + 1)/(α − β − 1).   (19)

Secondly, we consider an auxiliary problem.

Lemma 2. For any l ∈ N and τ ∈ [(l − 1)T_0 + α + 1, lT_0], the solution of the problem

ẋ = −1 + α − β + γ(c − x) H(x_0(t)),   x|_{t=0} = x_0(τ)   (20)

is described, for k ∈ {0} ∪ N, by the formula

y_0(τ, t) =
  ξ^k (x_0(τ) − η) exp(−γ(t − kT_0)) + η,                 t ∈ [kT_0, α + kT_0],
  (α − β − 1)(t − α − kT_0) + ξ^{k+1} (x_0(τ) − η) + η,   t ∈ [α + kT_0, (k + 1)T_0].

Now let us formulate for (10) a theorem about the coexistence of bursting cycles. The set of initial functions for (10) is defined as follows:

S^(m) := S_1 × S_2^(m),   m = 1, ..., n.   (21)


For simplicity, suppose further that β = α − 2. Taking into account the exponential dependence of ξ on α and γ (see (15)), suppose that

η + α − 1 < 0,   η + N T_0/2 < 0,   (N/2 − 1)T_0 + α + 1 − ξη ≤ h ≤ N T_0/2 + ξη.   (22)

By definition, put

x_1^(m)(t) :=
  x_0(t),                  t ∈ [0, h + d_∗],
  y_0(α + h, t − d_∗ − h), t ∈ [h + d_∗, h + d_∗ + α + (m − 1)T_0],   (23)
  t − T_1^(m),             t ∈ [h + d_∗ + α + (m − 1)T_0, T_1^(m)],

x_2^(m)(t) :=
  t − d_∗,                 t ∈ [0, d_∗],
  x_0(t),                  t ∈ [d_∗, h],
  y_0(h − d_∗, t − h),     t ∈ [h, h + α + (N − m − 1)T_0],   (24)
  t − T_2^(m),             t ∈ [h + α + (N − m − 1)T_0, T_1^(m)],

where

T_1^(m) := h + d_∗ + α + (m − 1)T_0 − ξ^m (h + d_∗ − (n − m)T_0 − η) − η,   (25)

T_2^(m) := h + α + (N − m − 1)T_0 − ξ^{N−m} (h − d_∗ − mT_0 − η) − η,   (26)

d_∗ := [ (N − 2m)T_0 + ξ^m (h − (N − m)T_0 − η) − ξ^{N−m} (h − mT_0 − η) ] / (2 − ξ^m − ξ^{N−m}).   (27)

Theorem 1. Let β = α − 2 and let γ, h satisfy (22). Then there exists σ > 0 such that system (10) with an initial condition from (21) admits N − 1 periodic modes

colon(x_1^(m)(t), x_2^(m)(t))   (m = 1, ..., N − 1).

Here x_1^(m)(t) and x_2^(m)(t) are T_1^(m)-periodic functions which have N − m and m relatively short alternating segments of positivity and negativity, following a long enough segment on which the function values are negative. A possible view of such a periodic mode is illustrated in Fig. 1.

The following statement concerns the stability of the solutions from Theorem 1.

Theorem 2. The solutions of (10) described in Theorem 1 are asymptotically orbitally stable.

The proof scheme is the same as, for example, in [10,12–14]. Let us introduce some notation for its presentation. Denote a function from S^(m) by ϕ := colon(ϕ_1, ϕ_2), where ϕ_1 ∈ S_1, ϕ_2 ∈ S_2^(m). For an arbitrary function ϕ(t) from (21), denote by x(t) := colon(x_1(t), x_2(t)) the solution of (10) such that x_1(t) ≡ ϕ_1(t), x_2(t) ≡ ϕ_2(t) for t ∈ [−h − σ, −σ]. Suppose that the equation

x_1(t − σ) = −σ   (28)


Fig. 1. A solution of (10). Here N = 8, m = 1.

has 2N − 2m or more positive roots. We denote the root with number 2N − 2m by T_1^(m). Finally, we define the Poincaré operator Π : S → S by the formula

Π(ϕ) := x(t + T_1^(m)),   −h − σ ≤ t ≤ −σ.   (29)

The first step of the proof is the construction of the solution on the segment [−σ, T_1^(m)]. It can be shown that here the solution is described by (23), (24); we skip the technical details. Similarly to T_1^(m), denote the root of x_2(t − σ) = −σ with number 2m + 1 by T_2^(m). From the construction of the solution it follows that T_1^(m) equals (25) and T_2^(m) is described by (26). By (22), (25) and (26), the distance between the (2N − 2m − 1)-th and (2N − 2m)-th roots of (28) is greater than the length of the segment on which S^(m) is defined. Hence the operator Π is defined on the set S^(m) and transforms it into itself. Thus, for any m = 1, ..., n there exists a periodic solution (23), (24) of the relay system. From the explicit formulas (23), (24) it follows that all functions from S^(m) are mapped to the same function; therefore Π is a contraction operator. By the contraction mapping principle, Π has a unique fixed point in S^(m); thus the periodic solution of (10) with an initial condition from S^(m) is unique, and its period is (25). Moreover, the contraction property of Π means that the stability spectrum of the periodic solution contains a multiplier μ_2 ≠ 0 in addition to μ_1 = 1; all other multipliers equal zero. At the same time, the multiplier μ_2 is the multiplier of the map −d → −d̄, where d̄ is the number such that

x_2(T_1^(m)(ϕ) − σ) = −d̄ − σ.   (30)

Let us find d̄. By (22), the value T_1^(m) − σ belongs to the segment [h + (N − m − 1)T_0 + α, T_2^(m)], where x_2(t) = t − T_2^(m). Hence, using (25), (26) and (22), we obtain

d̄ = T_2^(m) − T_1^(m) = (−1 + ξ^m + ξ^{N−m}) d + (N − 2m)T_0 − ξ^{N−m} (h − mT_0 − η) + ξ^m (h − (N − m)T_0 − η).   (31)


The fixed point of this map is (27). Formula (31) implies that μ_2 = −1 + ξ^m + ξ^{N−m}. Thus, we have proved the following statement about the multipliers of the periodic solution of (10).

Lemma 3. The solution (23), (24) of (10) has a countable set of zero multipliers, one unit multiplier μ_1 = 1 and the multiplier

μ_2 = −1 + ξ^m + ξ^{N−m}.   (32)

Lemma 3 implies Theorem 2.

Fig. 2. A solution of (7). Here N = 8, m = 2.

The theorems proved above state that (10) has N − 1 asymptotically orbitally stable solutions with a total of N ∈ N positivity segments per period. Moreover, the first oscillator has m segments of solution positivity and the second one has N − m (m = 1, ..., N − 1) segments of solution positivity per period. By (9), the segments with positive solution values correspond to spikes of the solutions of systems (7) and (12). The spike amplitudes are of order exp(λ). One of the stable solutions of (7) and (12) is illustrated in Fig. 2 for the case N = 8, m = 1.

3 Conclusion

We have proposed and studied a mathematical model of a pair of synaptically coupled impulse neurons with a relay nonlinearity and a delay in the coupling chain. Let us point out the most important results. The first important feature is that system (12) is an independent phenomenological model of two synaptically coupled neurons. The presented approach allows us to work directly with the relay system (12), which has a well-defined biological meaning. This avoids the laborious proof of correspondence theorems that is required when the right-hand sides of (12) are continuous and the parameter λ is large (see, for example, [6,10,12–14]). Secondly, the analysis of (12) shows that introducing a delay into the coupling between oscillators produces new effects that are not typical for systems without delay. In particular, for any even N we find a mechanism for the occurrence of N − 1 stable relaxation periodic regimes. The components of the solutions have a total of N spikes per period. Thus, both the multistability phenomenon and the bursting effect are present. Finally, the set of coexisting attractors of (12) contains not only the solutions described in the present paper; for example, there are antiphase and impulse-refractory modes which are not considered here.

The reported study was funded by RFBR according to the research project 18-29-10055.

References

1. Hodgkin, A.L., Huxley, A.F.: A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952)
2. Izhikevich, E.: Neural excitability, spiking and bursting. Int. J. Bifurcat. Chaos 10(6), 1171–1266 (2000). https://doi.org/10.1142/S0218127400000840
3. Rabinovich, M.I., Varona, P., Selverston, A.I., Abarbanel, H.D.I.: Dynamical principles in neuroscience. Rev. Mod. Phys. 78, 1213–1265 (2006). https://doi.org/10.1103/RevModPhys.78.1213
4. Kashchenko, S.A., Maiorov, V.V., Myshkin, I.Y.: Wave distribution in simplest ring neural structures. Matem. Mod. 7(12), 3–18 (1995). http://mi.mathnet.ru/mm1392
5. Kashchenko, S.: Models of Wave Memory. Springer, Switzerland (2015). https://doi.org/10.1007/978-3-319-19866-8
6. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: On a method for mathematical modeling of chemical synapses. Differ. Equ. 49(10), 1193–1210 (2013). https://doi.org/10.1134/S0012266113100017
7. Somers, D., Kopell, N.: Rapid synchronization through fast threshold modulation. Biol. Cybern. 68, 393–407 (1993). https://doi.org/10.1007/BF00198772
8. Somers, D., Kopell, N.: Anti-phase solutions in relaxation oscillators coupled through excitatory interactions. J. Math. Biol. 33, 261–280 (1995). https://doi.org/10.1007/BF00169564
9. Terman, D.: An introduction to dynamical systems and neuronal dynamics. In: Tutorials in Mathematical Biosciences I: Mathematical Neuroscience, pp. 21–68. Springer, Berlin (2005). https://doi.org/10.1007/978-3-540-31544-5_2
10. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: Modeling the bursting effect in neuron systems. Math. Notes 93(5), 676–690 (2013). https://doi.org/10.1134/S0001434613050040
11. Chay, T.R., Rinzel, J.: Bursting, beating, and chaos in an excitable membrane model. Biophys. J. 47(3), 357–366 (1985). https://doi.org/10.1016/S0006-3495(85)83926-6
12. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: Relaxation self-oscillations in neuron systems: I. Differ. Equ. 47(7), 927–941 (2011). https://doi.org/10.1134/S0012266111070020
13. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: Relaxation self-oscillations in neuron systems: II. Differ. Equ. 47(12), 1697–1713 (2011). https://doi.org/10.1134/S0012266111120019
14. Glyzin, S.D., Kolesov, A.Y., Rozov, N.K.: Discrete autowaves in neural systems. Comput. Math. Math. Phys. 52(5), 702–719 (2012). https://doi.org/10.1134/S0965542512050090

Brain Extracellular Matrix Impact on Neuronal Firing Reliability and Spike-Timing Jitter Maiya A. Rozhnova(B) , Victor B. Kazantsev, and Evgeniya V. Pankratova Lobachevsky State University of Nizhni Novgorod, 23 Gagarin Ave., 603950 Nizhny Novgorod, Russia [email protected]

Abstract. In this work, the role of the brain extracellular matrix (ECM) in signal processing by a neuronal system is examined. For excitatory postsynaptic currents in the form of a Poisson signal, we study the changes of the interspike interval duration, spike-timing jitter and coefficient of variation in the presence of background noise of varied intensity. Without ECM impact, the noise-delayed spiking phenomenon, reflecting a worsening of both reliability and precision of signal processing, is revealed. It is shown that the ECM-neuron feedback mechanism allows enhancing the robustness of neuronal firing in the presence of noise.

Keywords: Brain extracellular matrix · Neuronal activity · Reliability and precision of signal transmission

1 Introduction

Information about any changes in the external environment is transmitted by neuronal systems via changes of their membrane potential activity. Despite the presence of a huge number of background noise sources, a lot of experimental data show that repeated identical signals provoke outputs with similar characteristics [1,2]. This amazing neuronal ability to process signals with high reliability and precision is still poorly understood and is therefore of particular interest. Recently, based on experimental observations, a new mathematical model for neuronal activity in the presence of ECM was introduced in [3], where the authors studied the role of the activation of ECM-neuron feedback mechanisms in sustaining homeostatic balance in a firing neuronal network, as well as its possible role in the implementation of memory function. In this study, within the frame of this model, we discuss one possible mechanism for neuronal activity regulation that can enhance the reliability and precision of signal transmission in the presence of background noise.


2 Mathematical Model

2.1 Postsynaptic Neuronal Dynamics

We assume that the membrane potential of the postsynaptic cell evolves according to the following current balance equation of the Hodgkin-Huxley model:

C V̇ = I_app − I_ion + I_syn,   (1)

where I_ion = I_Na + I_K + I_l is the sum of the transmembrane currents: I_Na = g_Na m³(V) h(V) (V − E_Na) and I_K = g_K n⁴(V) (V − E_K) are the sodium and potassium ionic currents passing through the cell membrane, and I_l = g_l (V − E_l) is the current through an unspecific leakage channel. The dynamics of the potential-dependent gating variables is described by the following kinetic equations:

ẋ = α_x(V)(1 − x) − β_x(V) x,   (2)

where x stands for m(V, t), h(V, t) (responsible for the activation and inactivation of the Na⁺ current) or n(V, t) (which controls the K⁺ current activation). The mean transition rates α_x(V), β_x(V) and the parameters of the model are taken as in the classical work of Hodgkin and Huxley [4].

2.2 Synaptic Currents Modeling

We assume that the synaptic current is I_syn = I_EPSCs(k) + ξ(t), where the first term is determined as follows:

I_EPSCs(k) = A if t_j < t < t_j + τ, and 0 otherwise,   (3)

where t_j is the occurrence time of a pulse with amplitude A in the input signal. These times follow a Poisson distribution with average time interval τ_in between subsequent pulses. The duration of each pulse in the input is assumed to be constant, with τ = 1 ms. For each pulse the amplitude A is a random value drawn from the probability distribution

P(A) = (2A/b²) exp(−A²/b²)   (4)

with the scaling factor b = b_0(1 + γ_Zb Z), where γ_Zb is the gain parameter that modifies the amplitude of I_EPSCs [3]. The second term of I_syn is white Gaussian noise with zero mean, ⟨ξ(t)⟩ = 0, and with the correlation function ⟨ξ(t) ξ(t + τ_G)⟩ = D δ(τ_G).

Additionally, we assume that I_app = I_dc(1 + γ_Z Z), where γ_Z is the feedback gain parameter that modifies the applied current. Thus, both currents at the input of the neuron (1) depend on the variable Z, whose value is taken from the following system of equations describing the ECM dynamics:

Ż = −(α_Z + γ_P P) Z + β_Z [ Z_0 − (Z_0 − Z_1)/(1 + exp(−(Q − θ_Z)/k_Z)) ],
Ṗ = −α_P P + β_P [ P_0 − (P_0 − P_1)/(1 + exp(−(Q − θ_P)/k_P)) ],   (5)

where Q is an average neuronal activity variable that changes in time in accordance with the differential equation

Q̇ = −α_Q Q + β_Q / (1 + exp(−V/k_Q)).   (6)

In Eqs. (5) and (6), α_Z = 0.001 ms⁻¹, γ_P = 0.1, β_Z = 0.01 ms⁻¹, Z_0 = 0, Z_1 = 1, θ_Z = 1.1, k_Z = 0.15, α_P = 0.001 ms⁻¹, β_P = 0.01 ms⁻¹, P_0 = 0, P_1 = 1, θ_P = 6, k_P = 0.05, α_Q = 0.0001 ms⁻¹, β_Q = 0.01 ms⁻¹, k_Q = 0.01. In our numerical calculations, to avoid the influence of transients, the analysis of interspike interval durations is carried out for t > 5 s. For all averagings, n = 10000 sampling values were used.
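A sketch of generating the input pulse train (3) with amplitudes drawn from (4) might look as follows (an illustration with placeholder parameter values; inverse-CDF sampling is one possible way to realize the distribution P(A)):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsc_train(t_end, tau_in, tau=1.0, b=3.0, dt=0.01):
    """Poisson pulse train (3): exponential inter-pulse intervals with mean
    tau_in, rectangular pulses of width tau, and amplitudes drawn from
    P(A) = (2A/b^2) exp(-A^2/b^2) as in (4)."""
    current = np.zeros(int(t_end / dt))
    t = rng.exponential(tau_in)
    while t < t_end:
        amp = b * np.sqrt(-np.log(rng.uniform()))   # inverse CDF of P(A)
        i0 = int(t / dt)
        i1 = int(min(t + tau, t_end) / dt)
        current[i0:i1] = amp                        # rectangular pulse
        t += rng.exponential(tau_in)
    return current

current = epsc_train(t_end=1000.0, tau_in=4.0)      # times in ms
```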

3 Neuronal Firing Without Impact of ECM

The dynamics of the membrane potential depends on the characteristics of the input current. In Fig. 1, two time series V(t) for low- and high-frequency Poisson input with random amplitudes are shown. To study the role of the input signal parameters in neuronal activity, we further calculate the mean interspike interval duration

⟨τ_id⟩ = (1/n) Σ_{i=1}^{n} τ_id^(i),   (7)

Fig. 1. (a), (b) EPSCs-Poisson pulse trains for two values of the interpulse duration, τ_in = 10 ms and τ_in = 2 ms, and (c), (d) evoked oscillations of the membrane potential in the absence of ECM; D = 0, I_dc = 5.7 μA/cm², b_0 = 3.

the spike-timing jitter (the mean square deviation of τ_id^(i)) as

σ = [ (1/n) Σ_{i=1}^{n} (τ_id^(i))² − ⟨τ_id⟩² ]^{1/2}   (8)

and the coefficient of variation β = σ/⟨τ_id⟩, which illustrates the degree of coherence in the neuronal output.
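All three output characteristics, (7), (8) and β, can be computed directly from recorded spike times, e.g. (a sketch with synthetic spike times):

```python
import numpy as np

def isi_statistics(spike_times):
    """Mean ISI (7), spike-timing jitter (8) and coefficient of variation."""
    isi = np.diff(np.asarray(spike_times, dtype=float))
    mean_isi = isi.mean()
    jitter = np.sqrt(np.mean(isi ** 2) - mean_isi ** 2)   # Eq. (8)
    return mean_isi, jitter, jitter / mean_isi

mean_isi, sigma, beta = isi_statistics([12.1, 27.9, 44.3, 60.0, 76.4])
```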

3.1 Output ISI-Statistics in the Absence of Gaussian Noise

For D = 0, three parameters define the change of the input, namely I_dc, τ_in and b_0. Since for the Hodgkin-Huxley (HH) model the parameter I_dc can lead to one of three possible regimes of neuronal behavior, we focus on three of its values: (a) I_dc = 5 μA/cm² (for the dc-injected HH model this corresponds to a monostable regime with a stable steady state), (b) I_dc = 7 μA/cm² (bistable regime with co-existence of a stable steady state and a limit cycle) and (c) I_dc = 10 μA/cm² (monostable regime with a stable limit cycle) [5]. As seen from Fig. 2, in all of these cases the decrease of the input frequency (as well as the decrease of I_dc) leads to an increase of all the considered characteristics of the output, while the increase of b_0 for large I_dc currents can lead either to an increase (for large τ_in) or a decrease (for small τ_in) of ⟨τ_id⟩.

Fig. 2. The mean interspike interval duration, the spike-timing jitter and the coefficient of variation as functions of the input interpulse duration for three values of the parameter b_0 without ECM, for (a) I_dc = 5 μA/cm², (b) I_dc = 7 μA/cm² and (c) I_dc = 10 μA/cm².

3.2 Neuronal Firing in the Presence of Noise: Noise-Delayed Spiking

For D ≠ 0, fluctuations can either suppress some of the spikes or, on the contrary, provide the appearance of additional pulses in the output. For the considered values of I_dc, Fig. 3(a) shows that almost all curves have a similar non-monotonic behavior with a maximum at some value of the noise intensity. Notably, a small amount of fluctuations impedes the spiking: noise with small intensity D provokes an increase of the mean interspike interval duration. Such a noise-delayed spiking phenomenon was observed in [7–13] for the mean latency time. Here, we demonstrate that this phenomenon also takes place for the interspike intervals: the neuronal cell sensitivity to noise is particularly high within a certain interval of noise intensities (where the increase of ⟨τ_id⟩ is observed). The degree of such sensitivity to noise also depends on b_0 and τ_in. From Figs. 3(b), (c) it follows that the increase of b_0 as well as the decrease of τ_in lead to a decrease of noise sensitivity: the maximum becomes less pronounced. As we can see, noise-delayed spiking is observed only for large enough values of I_dc. For I_dc = 7 μA/cm² (blue curve in Fig. 3(a), upper panel) we observe another dependence: for this parameter, in the noise-free case the system spends a lot of time near the resting state, which leads to the appearance of large values in the τ_id^(i) statistics. Fluctuations drive the system out to the oscillatory mode and lead to a decrease of ⟨τ_id⟩. Obviously, a similar behavior can also be observed for any I_dc < 7 μA/cm². Taking the above-mentioned differences into account, we further focus on two cases (I_dc = 7 μA/cm² and I_dc = 8.5 μA/cm²) and consider the role of ECM in the cell's sensitivity to external fluctuations.

Fig. 3. White Gaussian noise-induced changes: the mean interspike interval duration, the spike-timing jitter and the coefficient of variation as functions of the noise intensity D for (a) four values of I_dc, τ_in = 4 ms, b_0 = 1; (b) three values of b_0, I_dc = 8.5 μA/cm², τ_in = 4 ms; (c) three values of the input interpulse duration τ_in, I_dc = 8.5 μA/cm², b_0 = 1.

4 ECM-Induced Changes of Neuronal Firing

Dynamical regimes of ECM activity within the considered model (5) were studied in detail in [14]. It was shown that various bistable modes (when switching between two different steady states is possible, or a stable stationary level co-exists with oscillations) and monostable modes can be observed for various parameters. In this study, the average activity variable Q is assumed to change in time in accordance with (6). The parameters of the ECM model provide the transition to some stationary level of the concentration of ECM molecules Z as a result of a high level of averaged neuronal activity Q. The resulting gain of I_EPSCs (Fig. 4(a)) and I_app (Fig. 4(b)) due to the establishment of the Z-level leads to an increase of the system's reliability: ECM-induced elimination of the noise-delayed spiking effect is observed.

Fig. 4. ECM-induced changes: the mean interspike interval duration, the spike-timing jitter and the coefficient of variation as functions of the noise intensity D for two values of I_dc (I_dc = 7 μA/cm², blue curves; I_dc = 8.5 μA/cm², green curves) for (a) γ_Zb = 0.3, (b) γ_Z = 0.1; τ_in = 4 ms, b_0 = 1.

5 Conclusions

Neuronal firing activity was studied within the framework of the Hodgkin-Huxley model driven by synaptic currents, accounting for background noise and for the impact of the ECM, whose concentration of molecules can be modified via the feedback mechanism of neuron-ECM interaction. In the absence of ECM, the phenomenon of noise-delayed spiking is observed: both reliability and precision of signal transmission in the presence of noise become worse. Introducing ECM impacts into the model eliminates this negative noise-induced effect, allowing more reliable and precise signal processing by neuronal systems.


Acknowledgments. The work was supported by the Ministry of Education and Science of Russia (Project No. 14.Y26.31.0022).

References

1. Rodriguez-Molina, V.M., Aertsen, A., Heck, D.H.: Spike timing and reliability in cortical pyramidal neurons: effects of EPSC kinetics, input synchronization and background noise on spike timing. PLoS ONE 2(3), e319 (2007). https://doi.org/10.1371/journal.pone.0000319
2. Tiesinga, P., Fellous, J.-M., Sejnowski, T.J.: Regulation of spike timing in visual cortical circuits. Nat. Rev. Neurosci. 9(2), 97–107 (2008). https://doi.org/10.1038/nrn2315
3. Kazantsev, V., Gordleeva, S., Stasenko, S., Dityatev, A.: A homeostatic model of neuronal firing governed by feedback signals from the extracellular matrix. PLoS ONE 7(7), e41646 (2012). https://doi.org/10.1371/journal.pone.0041646
4. Hodgkin, A.L., Huxley, A.F.: A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952)
5. Lee, S.-G., Neiman, A., Kim, S.: Coherence resonance in a Hodgkin-Huxley neuron. Phys. Rev. E 57(3), 3292–3297 (1998). https://doi.org/10.1103/PhysRevE.57.3292
6. Parmananda, P., Mena, C.H., Baier, G.: Resonant forcing of a silent Hodgkin-Huxley neuron. Phys. Rev. E 66, 047202 (2002). https://doi.org/10.1103/PhysRevE.66.047202
7. Pankratova, E.V., Polovinkin, A.V., Mosekilde, E.: Resonant activation in a stochastic Hodgkin-Huxley model: interplay between noise and suprathreshold driving effects. Eur. Phys. J. B 45(3), 391–397 (2005). https://doi.org/10.1140/epjb/e2005-00187-2
8. Gordeeva, A.V., Pankratov, A.L.: Minimization of timing errors in reproduction of single flux quantum pulses. Appl. Phys. Lett. 88, 022505 (2006)
9. Pankratova, E.V., Belykh, V.N., Mosekilde, E.: Role of the driving frequency in a randomly perturbed Hodgkin-Huxley neuron with suprathreshold forcing. Eur. Phys. J. B 53(4), 529–536 (2006). https://doi.org/10.1140/epjb/e2006-00401-9
10. Ozer, M., Graham, L.J.: Impact of network activity on noise delayed spiking for a Hodgkin-Huxley model. Eur. Phys. J. B 61, 499–503 (2008). https://doi.org/10.1140/epjb/e2008-00095-y
11. Gordeeva, A.V., Pankratov, A.L., Spagnolo, B.: Noise induced phenomena in point Josephson junctions. Int. J. Bifurcat. Chaos 18, 2825–2831 (2008)
12. Uzuntarla, M., Ozer, M., Ileri, U., Calim, A., Torres, J.J.: Effects of dynamic synapses on noise-delayed response latency of a single neuron. Phys. Rev. E 92(6), 062710 (2015). https://doi.org/10.1103/PhysRevE.92.062710
13. Uzuntarla, M.: Inverse stochastic resonance induced by synaptic background activity with unreliable synapses. Phys. Lett. A 377(38), 2585–2589 (2013). https://doi.org/10.1016/j.physleta.2013.08.009
14. Lazarevich, I.A., Stasenko, S.V., Rozhnova, M.A., Pankratova, E.V., Dityatev, A.E., Kazantsev, V.B.: Dynamics of the brain extracellular matrix governed by interactions with neural cells. arXiv:1807.05740

Contribution of the Dorsal and Ventral Visual Streams to the Control of Grasping Irina A. Smirnitskaya(&) Scientific Research Institute for System Analysis, Russian Academy of Sciences, Nakhimovsky Prospect, 36/1, Moscow 117218, Russia [email protected]

Abstract. Since Ungerleider and Mishkin's 1982 paper on the different roles of the dorsal and ventral visual streams, the former as "where" and the latter as "what", there has been no consensus on what these pathways really do or whether they really exist. This review discusses the contribution of the parietal, premotor and prefrontal cortical regions to the control of grasping in the context of the existence of two visual streams. There is evidence that each of the two streams consists of two subdivisions. The roles of the subdivisions in the control of grasping, such as memorizing the features of the object to be grasped, calculating the value of the object for grasping, controlling the movement's precision, retaining the movement's goal in working memory, and so on, are analyzed. The complementarity of the dorsal and ventral regions of the visual pathways in motion control is shown. A separate problem is the coherency of the execution of all these tasks: each of the pathways performs its part by interchanging signals and ensuring coordinated execution of the work.

Keywords: Dorsal visual stream · Ventral visual stream · Grasping · Premotor area · Prefrontal area · Value of action

1 Introduction

In 1982 the article by Ungerleider and Mishkin [1] introduced the "space versus object" principle in the interpretation of the functions of different visual areas during perception. The authors discovered that the processing of visual information, starting in visual areas V1 and V2, divides into two streams: the first, dorsal stream goes to the posterior parietal regions through visual areas V5 and V6; the second, ventral stream proceeds to the temporal lobe through area V4. The dorsal pathway is responsible for space perception, and the ventral pathway is related to object perception. The authors called them the "Where" and "What" systems. The results were obtained in monkeys, but the division of information flow holds for humans too [2]. Let us take the well-studied process of grasping as an example of manipulative actions to determine the roles of the different visual streams and their interconnections.



2 The Separation of the Dorsal Visual Stream into the "Dorso-Dorsal" and "Dorso-Ventral" Sub-streams

Patients with posterior parietal cortex lesions can omit some operations of grasping [3]. A patient with optic ataxia has difficulty directing his arm towards an object to be grasped: he can see the object and tell its location, but fails to get hold of it at once, finding it as if by chance. A patient suffering from neglect has another type of malfunction: he cannot see an object at all, but keeps implicit perception of it [4]. The difference is that in the first case the lesion is in the superior parietal lobule, while in the second the lesion is centered in the inferior parietal lobule. Both the superior parietal lobule (Brodmann area 5, SPL) and the inferior parietal lobule (Brodmann area 7, IPL) belong to the dorsal visual pathway and lie along the superior and inferior banks of the intraparietal sulcus (IPS) (see Fig. 1). The SPL receives a visual signal from visual area V5 and sends its output to the dorsal premotor area; this area is responsible for directing the hand and the eyes towards the object. The IPL receives a signal from visual area V6, its motor-region destination being the ventral premotor area, which controls the grasp motions of the hand and fingers. The authors of [3] proposed a model of visual information processing in the dorsal visual stream that highlights two parts in it: the dorso-dorsal stream, which goes through the SPL to the dorsal premotor areas, and the dorso-ventral stream, which runs through the IPL to the ventral premotor areas.

3 What Generally Should Be Done Before and During Grasping

1. Sight the object and determine whether it is familiar or not. The latter implies scanning the whole dataset of images stored in memory.
2. If the thing is familiar, determine its value by finding the object in another dataset which stores the values of objects. If the object has a negative value, i.e., it is dangerous, it should not be touched; instead it may be best to act quite differently, e.g. to run away or to freeze. If the value is positive, it is necessary to examine more general behavioral characteristics of the subject before grasping the object. Specifically, it is necessary to decide whether to stop the current action and occupy oneself with another job (e.g. leave the meal and attend to a toy). In the latter case the grasping starts.
3. If the object is novel, the grasping program is triggered to investigate the object and to memorize its sensory characteristics and values.

3.1 The Different Behavioral Tasks of the Subdivisions of the Dorsal and Ventral Visual Streams

Figure 1 gives a rough delineation of the ventral and dorsal visual streams. The common pathway starts in the occipital visual areas. Then it parts: the dorsal stream goes to the


parietal regions and proceeds to the motor, premotor and prefrontal areas of the neocortex; the ventral stream runs to the inferior temporal areas and finally to the ventrolateral prefrontal cortex [2], which is considered the destination of the ventral pathway. A detailed inspection of the pathways from inferotemporal area TE to the prefrontal, orbitofrontal and medial temporal regions points to the engagement of TE with the network related to behavioral choice [6], determined by the values of objects and possible actions. The ventral visual pathway decides whether to respond to the input stimulus or not. This means that it solves two problems: (a) it determines the value of the stimulus and (b) it memorizes the stimulus's sensory representation. To cope with the first problem, the interpretation of the visual signal is made in the temporal and prefrontal areas. As a result, the object value is computed and the behavioral choice is made. For this purpose, the inferior temporal area TE interchanges signals with the amygdala, the orbitofrontal cortex and the hippocampal formation. In turn, the amygdala and the orbitofrontal and insular cortical areas are interconnected [7, 8] and jointly calculate the value of objects [9]. The destination of the ventral pathway is the ventrolateral prefrontal cortex holding the response pattern.

Fig. 1. Two ways of interpreting the visual signal: the ventral way (bottom part of the figure) and the dorsal way (top part of the figure). The dorsal pathway divides into two ways: the dorso-dorsal and the dorso-ventral way. V1-V6 are the occipital visual areas; TEO and TE stand for the inferior temporal areas; PMd and PMv are the dorsal and ventral premotor regions. Areas 46d and 46v are the dorsolateral and ventrolateral prefrontal cortical regions.

The second problem, memorization, is solved by the network consisting of the inferior temporal area, the hippocampus, and the perirhinal, postrhinal and entorhinal cortical areas.


The dorsal visual stream is the system that controls the action: it is responsible for reaching the object by the arm and grasping by the fingers.

4 The Dorsal Pathway. The Role of the Parietal, Motor and Premotor Areas in the Control of Grasping

The visual and somatosensory features of objects are represented in the parietal cortex. It consists of the primary somatosensory region S1 and higher-order areas that store a combined visual and somatosensory representation of the object. These representations are transmitted to the motor and premotor areas [5] to perform the action. The somatosensory information that comes to the parietal cortex from the thalamus is of two types: tactile and proprioceptive. The tactile information arrives from cutaneous mechanoreceptors embedded in the skin, which convert the mechanical deformation of the skin into neural signals. The proprioceptive information comes from deep receptors reporting the degree of compression and stretching of muscles, tendons, ligaments and joints. For a motion that has already started, both types of information are feedback signals; that is, with respect to the somatosensory signal, the visual signal is the primary signal that triggers the motion. As the action proceeds, the motion is corrected; the tactile characteristics of the object, such as its form, texture and weight, are analyzed and memorized; the patterns of joint activity of the hand and finger muscles that secure proper grasp motions are also stored. For these purposes all areas participating in the initiation and execution of the motion (both primary and higher-order areas) send feedforward and feedback projections.

Four subareas can be distinguished in the primary somatosensory area S1, called Brodmann's areas 1, 2, 3a and 3b. Area 3b is the primary area for tactile reception; area 3a is the primary proprioceptive area. Area 1 is secondary for tactile reception: its removal turns off texture recognition. Area 2 has equal amounts of tactile and proprioceptive secondary inputs; it deals with the coordination of fingers in grasping and with recognition of the form and size of objects being grasped. The higher-order parietal areas form two clusters: the lateral parietal areas and the posterior parietal areas.

In the previous section it was pointed out that the ventral visual pathway determines the value of the object and the value of manipulations with the object, while the dorsal pathway is responsible for arranging the action. Examination of the pathways for sensory information in the parietal cortex (the dorsal pathway) shows that the somatosensory characteristics of the object received during manipulation arrive at the secondary somatosensory area S2 (the cluster of lateral parietal areas), and this area sends signals to the insular cortex [10], which is a part of the network storing the values of objects and interacting with the ventral stream. We see the joint activity of the dorsal and ventral pathways here.

The posterior parietal areas serve as the beginning of the dorso-dorsal (SPL) and dorso-ventral (IPL) pathways [3]. The dorso-ventral pathway starts in the inferior parietal lobule (Brodmann's area 7), which sends signals to the motor area M1 and the ventral premotor area PMv. As a result of the interaction with S1, M1 and PMv, a distributed representation of the sensory signals


initiating the grasping is formed in the posterior parietal areas: the visual object to be grasped, the direction towards the object, and the handling characteristics of the object (form, size, weight and texture) found by referring to previously investigated and accumulated information [11]. Additionally, these areas interchange information with the inferior temporal area TEa/m, which is a part of the ventral pathway: this area sends a permission to act to the posterior parietal areas. That is, the interaction of the dorsal and ventral pathways also occurs in this place. The dorso-dorsal pathway, originating in the superior parietal lobule (Brodmann's area 5), sends signals to the dorsal premotor area PMd and is responsible for the direction of the eyes and the arm towards the object. Though it does not interchange signals with the inferior temporal areas and does not receive signals about the value of the object from them, the end point of the prefrontal cortex that receives the signal from the dorso-dorsal pathway is the dorsolateral area 46d, which is considered to be the center of working memory. So, the dorso-dorsal pathway holds the holistic representation of the current motor task.

5 The Pattern of Visual Pathways Interaction for the Control of Grasping

In its formalized treatment of trial-and-error learning, the classical textbook Reinforcement Learning by Sutton and Barto [12] begins with the description of a gambling machine which has n options with different winning probabilities (multi-armed bandits). The game with this timeless device comes down to the repetition of the same event: each time we come, as though anew, to the machine, activate an arm and hope for a win. Only our memory keeps the different outcomes, adding a one-time outcome to the sequence of previous results. Having used this static example to introduce the concepts of the value function and the prediction problem (see the sketch at the end of this section), the authors quickly turn to the main objective: a sequence of actions where each step can be different and where the desire to get the greatest reward necessitates the optimization of the whole sequence.

If the transformation of external sensory (visual) signals into motor commands is regarded as either a discrete action or a succession of actions, then grasping is a discrete act, while reaching with the arm followed by grasping an object with the fingers is a sequence of actions. As regards the necessary calculation of the action value, the grasping value is equal to the value of the object to be grasped. The value of the action sequence consisting of stretching out the hand and grasping the object is also equal to the value of the object. However, there are many exceptions: e.g., when experimenters put different obstacles in the path of the hand, the values of the action sequences vary.

The ventral visual stream calculates the value of actions (in the discrete case it is equal to the value of the object being manipulated), while for an action sequence the value can be found differently. It is important that the ventral visual stream engages the hippocampus, which is responsible for remembering new objects and the corresponding action sequences. In dealing with sequences, working memory plays an important role. The dorsolateral prefrontal cortex (area 46d) is regarded as a substratum of this kind of memory. This cortical area interacts with the hippocampus. It is widely accepted that the ventral stream endpoint is the ventrolateral prefrontal cortex. This is true for discrete actions which, as mentioned above, are governed by the dorso-ventral stream. The dorso-dorsal stream, whose endpoint is the dorsolateral cortex, is responsible for the representation of sequences of actions, in other terms, the holistic representation of the motor task.
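For readers who want the static bandit example in executable form, here is a minimal sample-average value-estimation sketch (an illustration only; the epsilon-greedy choice rule and all values are assumptions, and none of this comes from the reviewed literature):

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(true_means, steps=1000, epsilon=0.1):
    """n-armed bandit: estimate each arm's value by a sample average and
    pick the greedy arm, exploring with probability epsilon."""
    n = len(true_means)
    q = np.zeros(n)        # estimated action values
    counts = np.zeros(n)   # pulls per arm
    for _ in range(steps):
        a = rng.integers(n) if rng.random() < epsilon else int(np.argmax(q))
        reward = true_means[a] + rng.normal()        # noisy one-time outcome
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]          # incremental sample mean
    return q

print(run_bandit([0.2, 0.5, 0.9]))
```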

Fig. 2. The dorsal and ventral visual streams and their subsystems (diagram nodes: visual areas V1, V2, V4; parietal cortex with the SPL (dorso-dorsal) and IPL (dorso-ventral); inferotemporal areas TE, TEa/m; motor and premotor areas PMd, PMv; prefrontal cortex with DLPFC, VLPFC and OFC; insula; hippocampus and entorhinal cortex; value calculation within the ventral stream).

6 Conclusion

There are four visual pathways (Fig. 2): the dorso-dorsal and dorso-ventral pathways, which belong to the dorsal stream, and, within the ventral stream, the pathway from the visual areas to the inferior temporal area TE, which splits to go to the orbitofrontal cortex and to the hippocampal areas. Though these pathways execute their own tasks, they interact and function conjointly.

Acknowledgement. The review was done within the 2019 state task 0065-2019-0003 Research into Neuromorphic Big-Data Processing Systems and Technologies of Their Creation.


References

1. Ungerleider, L.G., Mishkin, M.: Two cortical visual systems. In: Ingle, D.J., Goodale, M.A., Mansfield, R.J.W. (eds.) Analysis of Visual Behavior, pp. 549–586. MIT Press, Cambridge (1982)
2. Kravitz, D.J., Saleem, K.S., Baker, C.I., Ungerleider, L.G., Mishkin, M.: The ventral visual pathway: an expanded neural framework for the processing of object quality. Trends Cogn. Sci. 17(1), 26–49 (2013)
3. Rizzolatti, G., Matelli, M.: Two different streams form the dorsal visual system: anatomy and functions. Exp. Brain Res. 153, 146–157 (2003)
4. Rizzolatti, G., Berti, A., Gallese, V.: Spatial neglect: neurophysiological bases, cortical circuits and theories. In: Boller, F., Grafman, J., Rizzolatti, G. (eds.) Handbook of Neuropsychology, 2nd edn, vol. I, pp. 503–537. Elsevier Science, Amsterdam (2000)
5. Delhaye, B.P., Long, K.H., Bensmaia, S.J.: Neural basis of touch and proprioception in primate cortex. Compr. Physiol. 8(4), 1575–1602 (2019)
6. Murray, E.A., Rudebeck, P.H.: The drive to strive: goal generation based on current needs. Front. Neurosci. 7, Article 112 (2013)
7. Höistad, M., Barbas, H.: Sequence of information processing for emotions through pathways linking temporal and insular cortices with the amygdala. Neuroimage 40(3), 1016–1033 (2008)
8. Ghashghaei, H.T., Hilgetag, C.C., Barbas, H.: Sequence of information processing for emotions based on the anatomic dialogue between prefrontal cortex and amygdala. Neuroimage 34(3), 905–923 (2007)
9. Smirnitskaya, I.A.: How the cingulate cortex, basolateral amygdala and hippocampus contribute to retraining. In: Proceedings of the XV All-Russia Conference Neuroinformatics (2013)
10. Friedman, D.P., Murray, E.A., O'Neill, J.B., Mishkin, M.: Cortical connections of the somatosensory fields of the lateral sulcus of macaques: evidence for a corticolimbic pathway for touch. J. Comp. Neurol. 252, 323–347 (1986)
11. Borra, E., Gerbella, M., Rozzi, S., Luppino, G.: The macaque lateral grasping network: a neural substrate for generating purposeful hand actions. Neurosci. Biobehav. Rev. 75, 65–90 (2017)
12. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge (2018)

Deep Learning

The Simple Approach to Multi-label Image Classification Using Transfer Learning Yuriy S. Fedorenko(&) Bauman Moscow State Technical University, Baumanskaya 2-Ya, 5, 105005 Moscow, Russia [email protected]

Abstract. The article deals with the problem of image classification on a relatively small dataset. Training a deep convolutional neural net from scratch requires a large amount of data. In many cases, the solution to this problem is to use a network pretrained on another big dataset (e.g. ImageNet) and fine-tune it on the available data. In this article, we apply this approach to classify advertising banner images. Initially, we reset the weights of the last layer and change its size to match the number of classes in our dataset. Then we train the whole network, but the learning rate for the last layer is several times larger than for the other layers. We use the Adam optimization algorithm with some modifications. Firstly, applying weight decay instead of L2 regularization (for Adam they are not the same) improves the result. Secondly, dividing the step by the running maximum of the accumulated squared-gradient average instead of its current value makes the training process more stable. Experiments have shown that this approach is appropriate for classifying relatively small datasets. The metrics used and test time augmentation are discussed. In particular, we find the confusion matrix very useful, because it gives an understanding of how to modify the train set to increase model quality. Keywords: Image recognition · Transfer learning · Adam · One cycle policy · Weight decay · Amsgrad · Test time augmentation · Confusion matrix

1 Introduction

Deep convolutional neural networks are very effective for solving the image classification task. However, training such networks from scratch (with random initialization) is not always possible because it requires a large amount of data. Therefore transfer learning has become common in many applied tasks [1]. Deep learning frameworks already provide common convolutional neural networks (VGG [2], ResNet [3], Inception [4]) pretrained on ImageNet, so there is no need to train models on this dataset yourself. But in practice there are several issues that need to be solved. The first problem is connected with proper learning rate selection: too small a value may result in a very long training process which stops in a flat valley, while too large a value may lead to learning a suboptimal set of weights. Besides, the learning rate on the last layers of the network should be greater than on the first layers, because the earlier layers of the network hold generic features that may be useful in many tasks. The second problem is connected with an unstable training process when using the Adam algorithm.


2 Problem Definition

In this article, we consider the classification of advertising banner images. The user's interest in a banner depends on the banner image, so it is important to determine the banner image topic. The banner image is fed to the input of the model; the model output is one or several classes of the image in our specialized taxonomy. But there are several problems. Firstly, the number of labeled images is relatively small: it is measured in hundreds, not thousands, of samples. This amount of data is not enough to train the model from scratch. Secondly, images of advertising banners are specific enough that we cannot use a model pretrained on ImageNet directly. And thirdly, each image can belong to several classes. For example, it may be an advertisement for a mobile application to call a taxi; in such a case, the model should detect two classes: mobile app and taxi.

3 The Training Procedure

To deal with the first two problems we use transfer learning. We take a pretrained neural network, reset the last layer weights and change the last layer size to match the number of classes in our taxonomy. We train the whole network, but for the last layer the learning rate is five times greater than for the other layers (see the sketch below). Also, we use an adaptive learning rate [5]. Initially, the upper limit of the learning rate is searched for. To find it, we increase the learning rate step by step from a small value and train the neural net on each step. The whole procedure takes only about 10-20 epochs, so the classical overfitting after multiple passes through the training set does not have time to happen. The minimum learning rate at which the validation set error starts to increase is the required upper limit. The example is presented in Fig. 1. After each epoch, the learning rate was increased by one step (0.0001), and the loss value on the training and validation sets was marked on the graph.
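A minimal PyTorch sketch of this setup (num_classes and base_lr are placeholder values, not taken from the paper; the 5x multiplier for the new head follows the text):

```python
import torch
import torchvision

num_classes = 50    # size of the banner taxonomy (hypothetical value)
base_lr = 1e-4      # e.g. upper limit / 10, found with the range test above

model = torchvision.models.resnet18(pretrained=True)
# Replace and re-initialize the last layer to match the taxonomy size.
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

# Discriminative learning rates: the fresh head trains 5x faster than the body.
head = list(model.fc.parameters())
body = [p for name, p in model.named_parameters() if not name.startswith("fc.")]
optimizer = torch.optim.AdamW(
    [{"params": body, "lr": base_lr},
     {"params": head, "lr": 5 * base_lr}],
    weight_decay=1e-2, amsgrad=True)
```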

Fig. 1. Searching for the upper limit of learning rate


We start training with a learning rate of 1/10 of the upper limit value. Then the learning rate gradually increases to the upper limit, after which it decreases back (Fig. 2). This method, called the one cycle policy, has a simple motivation. At the start, the small learning rate provides more accurate convergence. Then, when the optimizer traverses a flat valley, increasing the learning rate allows training to speed up. In the final stages, the optimizer falls into a local minimum, and the learning rate is again reduced to provide more accuracy. Besides, it is argued that a relatively high learning rate in the middle of the training process is a form of regularization, because it helps the network avoid steep areas of the loss function which correspond to overfitted configurations [6].
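This schedule can be sketched as a simple triangular function of the step number (a simplified variant; real one-cycle implementations often add a short final annealing phase below the starting rate):

```python
def one_cycle_lr(step, total_steps, lr_max, div=10.0):
    """Ramp from lr_max/div up to lr_max over the first half of training,
    then back down over the second half."""
    lr_min = lr_max / div
    half = total_steps / 2.0
    if step <= half:
        return lr_min + (lr_max - lr_min) * step / half
    return lr_max - (lr_max - lr_min) * (step - half) / half
```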

Fig. 2. One cycle for learning rate

For training, we use the Adam algorithm with modifications. Many researchers became disappointed in Adam after its introduction in 2014, claiming that SGD with momentum performs better. But in 2017 the AdamW algorithm was proposed in [7]. It uses weight decay instead of L2 regularization. As known, L2 regularization implies adding the sum of the squared model weights to the loss function:

J_r = J + (c/2) Σ_{k=1}^{n} x_k²,

where J is the loss function, J_r is the loss function with the regularization term, c is the regularization coefficient and x_k are the weights of the neural net. For simple SGD this leads to weight decay, because the update rule is as follows:

x_k = x_k − a (∂J/∂x_k) − a c x_k,

where a is the learning rate. But for more sophisticated optimizers such as Adam this is not true, because the regularization term in the loss function affects the value of the accumulated gradients and squared gradients. So, Adam with L2 regularization and Adam with weight decay (AdamW) are two different approaches. In [7] the authors argue that we should use AdamW instead of the Adam with L2 regularization implemented in classic deep learning frameworks. Our experiments show that AdamW leads to a better result, so we have used it.
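The decoupled update can be written out explicitly (a numpy sketch of one AdamW step with illustrative default hyperparameters; with L2 regularization the `wd * w` term would instead be added to `grad` before the moment updates, which changes m and v):

```python
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=1e-2):
    """One AdamW step: weight decay acts on the weights directly and is
    decoupled from the gradient moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v
```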


One more modification is the Amsgrad technique. In the article [8] an error was found in the Adam update rule which could cause the algorithm to converge to a suboptimal point. The problem is that the convergence proof of Adam requires that the step size a/(√(E[g²]) + ε) does not increase over the training process. But this is not satisfied in many cases, because the exponential moving average of the squared gradients, E[g²], may decrease in the last epochs of training. So, the authors of Amsgrad suggested using the running maximum of this quantity, which makes the step size guaranteed non-increasing. In practice, the effect of this modification is controversial, but in our experiments using Amsgrad allows achieving better and more stable results compared to plain Adam. So, we use Adam with weight decay and the Amsgrad technique. As mentioned above, each sample may belong to multiple classes; in such a case, the sample is passed to the model several times, separately with each label. This allows handling multi-label images in a simple way. Also, we use data augmentation during training to improve network generalization.
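Continuing the sketch above, the Amsgrad change keeps the running maximum of the second-moment estimate (as a sketch we follow the common convention of dropping the bias correction of v in this variant):

```python
import numpy as np

def amsgrad_step(w, grad, m, v, v_max, t, lr=1e-3, beta1=0.9,
                 beta2=0.999, eps=1e-8):
    """One AMSGrad step: divide by sqrt(max of v) instead of sqrt(v)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    v_max = np.maximum(v_max, v)          # guaranteed non-decreasing
    m_hat = m / (1 - beta1 ** t)
    w = w - lr * m_hat / (np.sqrt(v_max) + eps)
    return w, m, v, v_max
```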

4 Experiments

In the experimental analysis we use a ResNet18 network pretrained on ImageNet. We use the ResNet18 model because it is simple relative to other deep convolutional neural networks (so it requires less memory and training time), while its results are comparable with more complex models. We use the ImageNet dataset because it contains a wide variety of classes; it is also convenient in practice that deep learning frameworks ship models pretrained on this dataset. We split our dataset into train, validation and test parts in the ratio 3:1:1 and train the model as described above. To further improve the result, we use test time augmentation (TTA) [9]. The main idea of this approach is to perform random transformations on the test set: images from the test set are augmented several times, predictions are calculated for each of them, and then the predictions are averaged. This technique works because the errors are averaged as well: an error on one augmented sample that leads to a wrong answer may disappear after averaging over several samples, since the errors on different samples differ while the correct answer stands out. For evaluating model quality we use the confusion matrix and precision–recall graphs. The first is a matrix that shows mutual errors between classes. By analyzing it, one can see which classes have many false positives or false negatives. This gives insight into how to modify the train and validation sets (in our task we prepare the dataset ourselves), and proper dataset preparation has a strong effect on the result. The second shows the precision and recall values for each class (for better readability, 1 − precision is shown instead of precision). It allows visually identifying well-recognized classes and classes with many false positives or false negatives. A fragment of the confusion matrix and the precision–recall graphs are shown in Figs. 3 and 4, respectively.
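TTA reduces to a few lines; the sketch below assumes a `model.predict` returning class probabilities and an `augment` function applying the same random transformations as in training:

```python
import numpy as np

def predict_with_tta(model, image, augment, n_samples=10):
    """Average predictions over several randomly augmented copies of an image."""
    preds = [model.predict(augment(image)) for _ in range(n_samples)]
    return np.mean(preds, axis=0)   # per-sample errors average out
```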


Fig. 3. Confusion matrix for “Auto” category

Fig. 4. Precision–recall graphs for "Auto" category (the top chart without TTA, the bottom chart with TTA over 10 samples)

Thus, we can see that using test time augmentation slightly improves the result. Examples of correct and wrong image classification are presented in Fig. 5.


Fig. 5. Examples of images from “Auto” category with model answers

5 Conclusion

Thus, concrete image classification tasks can be solved with transfer learning. It addresses the problem of a relatively small dataset and eliminates the computationally expensive procedure of training a model from scratch. Using the Adam optimization algorithm with its recent modifications, along with proper learning rate selection, improves the training process and makes it more stable. Dataset preparation is also crucial: analyzing the confusion matrix and viewing misclassified samples gives an understanding of how to modify the training dataset. Several iterations of dataset enhancement usually yield an acceptable practical result.

References

1. Karpathy, A.: Convolutional neural networks for visual recognition. https://cs231n.github.io/transfer-learning/. Accessed 1 Apr 2019
2. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2015). arXiv preprint, arXiv:1409.1556v6 [cs.CV]
3. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 770–778. IEEE, New Jersey (2016)


4. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 2818–2826. IEEE, New Jersey (2016)
5. Smith, L.: Cyclical learning rates for training neural networks. In: IEEE Winter Conference on Applications of Computer Vision, WACV, pp. 464–472. IEEE, New Jersey (2017)
6. Gupta, A.: Super-convergence: very fast training of neural networks using large learning rates. https://towardsdatascience.com/https-medium-com-super-convergence-very-fast-training-of-neural-networks-using-large-learning-rates-decb689b9eb0. Accessed 10 Apr 2019
7. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization (2019). arXiv preprint, arXiv:1711.05101v3 [cs.LG]
8. Reddi, S., Kale, S., Kumar, S.: On the convergence of adam and beyond. In: International Conference on Learning Representations, ICLR, Vancouver, BC, Canada, pp. 186–208 (2018)
9. Ayhan, M., Berens, P.: Test-time data augmentation for estimation of heteroscedastic aleatoric uncertainty in deep neural networks. In: Medical Imaging with Deep Learning Conference, MIDL, Amsterdam, Netherlands, pp. 278–286 (2018)

Application of Deep Neural Network for the Vision System of Mobile Service Robot

Nikolay Filatov¹, Vladislav Vlasenko¹, Ivan Fomin¹, and Aleksandr Bakhshiev¹,²

¹ The Russian State Scientific Center for Robotics and Technical Cybernetics, Tikhoretsky Prospect 21, 194064 Saint-Petersburg, Russia
[email protected]
² Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya 29, 195251 Saint-Petersburg, Russia

Abstract. The solution of the object detection task is valuable in many fields of robotics. However, the application of neural networks on mobile robots requires high-performance architectures with low power consumption. In search of a suitable model, a comparative analysis of the YOLO and SqueezeDet architectures was conducted. The task of detecting wooden cubes with the camera of a mobile robot, with the aim of collecting them, was solved. A specific dataset was constructed for training purposes. The applied SqueezeDet neural network reached a precision of 89% and a recall of 82% for IOU ≥ 0.5.

Keywords: Convolutional neural network · SqueezeDet · Object detection · Service robot

1 Introduction

With the development of deep neural networks for the classification, segmentation and detection of objects, the area of their application is also growing [1]. The use of neural networks to increase the level of autonomy of vehicles is a popular and pressing task, and neural network methods are often used to improve the orientation accuracy of mobile robots in the environment [2]. In general, the task of object detection in video images is extremely promising in robotics: its solution allows scout robots to increase their level of autonomy when searching for objects of interest, which is important when working in extreme conditions. It will also be useful to apply these technologies in the service robotics industry to create more intelligent systems capable of finding certain items. The main limitation for the implementation of neural network algorithms is the high requirements on computing hardware. This problem is being widely addressed by the community, and at the moment there is a set of methods that provide improved speed. It is relevant to compare and integrate these methods in real robotics tasks.



2 SqueezeDet and YOLO Architectures Comparison

Single-stage neural network detectors, in which hypotheses about the location of objects and the probabilities of their belonging to certain classes are produced simultaneously by one convolutional neural network, have the highest speed. Such neural networks are YOLO [3] and SqueezeDet [4]. The principle of operation is to extract multidimensional feature maps from the image and use them to train one (or more) layers whose output is a tensor containing the estimated coordinates of objects and the indices of their classes. In the case of SqueezeDet, feature maps are extracted with the high-performance SqueezeNet neural network [5]. Coordinates are predicted relative to a fixed sampling grid and object templates, the anchors. The templates are deformed and shifted relative to the sampling grid, and each is assigned a confidence value, according to which they are then filtered using non-maximum suppression. The SqueezeDet and YOLO networks share the same principle of operation, but the SqueezeDet architecture was created specifically to be embedded in low-power platforms, which causes differences in the structure and performance of these neural networks. Consider the layers responsible for object detection from the input feature maps. In SqueezeDet the detection layer is a convolutional layer called ConvDet; for simplicity, we denote the block responsible for detection in YOLO as FcDet, since it consists of two fully connected layers. Assume that the input feature map width is W_f, its height is H_f, and the number of input channels is Ch_f. Denote the ConvDet filter width as F_w and its height as F_h. With proper striding, the output of ConvDet keeps the initial size of the input feature map. Thus, to compute K·(4 + 1 + C) outputs for each cell of the reference grid, ConvDet requires F_w·F_h·Ch_f·K·(5 + C) parameters (Fig. 1).

Fig. 1. ConvDet layer.

Using the same notation and denoting the number of neurons in the first layer of the FcDet block as F_fc1, the number of parameters in the first fully connected layer is W_f·H_f·Ch_f·F_fc1. The second fully connected layer, which generates C class probabilities and K·(4 + 1) bounding box coordinates and confidences for the W_o × H_o sampling grid, contains F_fc1·W_o·H_o·(5K + C) parameters (Fig. 2). The total number of parameters in these two fully connected layers is F_fc1·(W_f·H_f·Ch_f + W_o·H_o·(5K + C)).


Fig. 2. FcDet layers.

In YOLO, a 7 × 7 × 1024 tensor is taken as the input feature map, F_fc1 = 4096, K = 2, C = 20, W_o = H_o = 7. Thus, the total number of parameters required for the two fully connected layers is approximately 212 × 10⁶. If the same configuration parameters are used for a 3 × 3 ConvDet, it requires only 3 × 3 × 1024 × 2 × 25 ≈ 0.46 × 10⁶ parameters, which is 460 times smaller than FcDet. A small number of parameters certainly lets the network take less space in memory and provides higher speed. However, due to the different computational complexity of the layers, the speed of an architecture is not directly proportional to its size; therefore, it is important to check the speed of the studied architectures on identical hardware. For the YOLO detector there is a lightweight version, tiny-YOLO; the sizes of the SqueezeDet, YOLOv3 and tiny-YOLO architectures are shown in Table 1.

Table 1. Comparison of model sizes for selected architectures.

Architecture   Memory, MB
SqueezeDet     7
Tiny-YOLO      34
YOLOv3         243
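These counts are easy to verify; the following lines simply reproduce the arithmetic above (the formulas come from the text, the function names are ours):

```python
def convdet_params(fw, fh, chf, k, c):
    """Parameters of the ConvDet convolutional detection layer."""
    return fw * fh * chf * k * (5 + c)

def fcdet_params(wf, hf, chf, ffc1, wo, ho, k, c):
    """Parameters of the two fully connected detection layers in YOLO."""
    return ffc1 * (wf * hf * chf + wo * ho * (5 * k + c))

# YOLO configuration from the text: 7x7x1024 input map, Ffc1=4096, K=2, C=20
print(fcdet_params(7, 7, 1024, 4096, 7, 7, 2, 20))   # 211542016, ~212e6
print(convdet_params(3, 3, 1024, 2, 20))             # 460800, ~0.46e6
```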

The speed of a neural network detector also depends on the size of the input image, which makes it possible to adjust the size of the processed image in order to reach the optimal balance of accuracy and processing speed. Two series of experiments were performed for three resolutions using different hardware. In the first experiment (Table 2) computations were carried out without a graphics processing unit, on an AMD A10 9600p CPU (2.4 GHz, 4 cores). In the second experiment (Table 3) computations were performed using an Nvidia GeForce GTX 1070 graphics processor (8 GB, 1683 MHz) and an Intel Core i7 8700 CPU (3.2 GHz, 6 cores). Taking into account the speed and model sizes of the compared architectures, the SqueezeDet neural network was chosen for the object detection task of the mobile robot.


Table 2. Comparing the speed of neural networks to detect objects using CPU AMD A10 9600p.

Input image resolution, pix   Frame processing time, s
                              SqueezeDet   Tiny-YOLO   YOLOv3
320 × 240                     0.10         0.32        1.99
640 × 480                     0.39         1.04        5.72
1280 × 1024                   1.78         5.08        32.12

Table 3. Comparing the speed of neural networks to detect objects using GPU Nvidia GeForce GTX 1070, CPU Intel Core i7 8700.

Input image resolution, pix   Frame processing time, s
                              SqueezeDet   Tiny-YOLO   YOLOv3
320 × 240                     0.006        0.016       0.050
640 × 480                     0.009        0.026       0.088
1280 × 1024                   0.027        0.054       0.253

3 Problem Statement and Dataset Construction

It is required to develop a vision system for a service robot that can collect objects of interest recognized in video camera images. The collected objects are wooden cubes with a side of 33 mm and magnetic inserts in the centers of the faces. For the application of the neural network detector, datasets were made whose image annotations contain the coordinates of the cubes. The constructed data can be divided into two sets: "office" and "hall". The first contains 640 images of cubes in various scenes inside office premises with arbitrary shooting angles. The hall dataset consists of photographs obtained directly from the mobile robot in a large hall convenient for experiments. The resolution of all images in the datasets was limited to 640 × 480 pixels to ensure high speed of the neural network. It was decided to test the vision system in the hall for a better specification of the task; thus the hall dataset was the main one, and the office dataset was made for initial and additional experiments.

4 Experimental Research

We studied the effect of adding non-target scenes to the training set, as well as the effect of the choice of anchor boxes on the detection range and the localization accuracy of objects. The correct setting of anchors is crucial in the SqueezeDet detector, since they serve as templates and initial approximations of the objects of interest. It is recommended to find the anchor values by clustering the annotations with the k-means method [6]. However, due to problems with multiple detections of an object, a second set of


anchors was obtained by increasing the scale of the first set. Denote the anchors obtained by clustering as "precise" and the others as "enlarged", and consider the inference peculiarities of the neural network when using these anchors. The values of the anchor boxes are shown in Table 4.

Table 4. Anchor boxes used in training.

Name       Anchor 1   Anchor 2   Anchor 3
Precise    20 × 20    36 × 36    64 × 64
Enlarged   36 × 36    64 × 64    100 × 100

A typical prediction error when using the "precise" anchors is multiple detection of a single object, which leads to additional errors, since the extra bounding boxes usually have a low intersection-over-union (IOU) with the annotation. A good feature is the detection of small-scale objects (Fig. 3b). In contrast, with the "enlarged" anchors repeated detections occur rarely, but objects at long distances are not detected (Fig. 3a).

Fig. 3. Typical inference errors when using different anchors: (a) small-scale object is not detected, enlarged anchors; (b) multiple detection of a single object, precise anchors.

Such properties can be explained by the fact that large objects stand out from the background more strongly and the loss function for them converges faster; therefore, bounding boxes based on small anchors can acquire relatively high confidence on fragments of a large object. An experiment comparing the precision and recall of three trained models was conducted. Key features of the model learning processes are shown in Table 5, and the experimental results are shown in Fig. 4. In all cases, the weights obtained by training on the Kitti dataset [7] were used as the initial weights of the neural network.


An erroneous detection is any bounding box that intersects with the annotation less than the specified IOU threshold.

Table 5. Description of experiments.

Name                   Anchors    Train dataset                Test dataset
Hall                   enlarged   Hall, 910 images             Hall, 220 images
Hall + office          enlarged   Hall + office, 1400 images   Hall, 220 images
Hall precise anchors   precise    Hall, 910 images             Hall, 220 images

Analyzing these graphs, it is clear that, despite periodic multiple detections, the model with precise anchors has better characteristics. It is also seen that the stable omission of distant objects leads to a decrease of recall for the "hall" and "hall + office" models. At the same time, the characteristics of the last two models are almost the same, but the model trained on hall and office photos may be considered better because it works well in a larger variety of scenes. Despite the high accuracy of one of the models, this quality assessment cannot be final, because the model allows multiple detections of a single object, which is unacceptable when planning a route for a mobile robot. To exclude multiple object detections, an additional stage of filtering the predictions was added. The implemented algorithm keeps only the bounding box with the greatest confidence in the area of one detection. The recalculated quality metrics with the additional filtration are shown in Fig. 5. The additional filtering not only made the technical vision system convenient to use, but also improved the F1 score defined as:

$$F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall} \qquad (1)$$

The maximum value of F1 before filtering was 0.80, and the maximum value of F1 after filtering is 0.84.
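The paper does not give the filtering code; a plausible greedy sketch of "keep only the highest-confidence box per detection area" (essentially class-agnostic non-maximum suppression with an IOU threshold we choose ourselves) could look like this:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

def filter_detections(boxes, scores, iou_thr=0.3):
    """Greedily keep the most confident box in each overlapping group."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in kept):
            kept.append(i)
    return kept
```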

Fig. 4. Precision–recall curves.


Fig. 5. Precision–recall curves after additional filtration.

5 Conclusion

A high-performance neural network detector suitable for a wide range of low-power hardware platforms has been applied. For the task of searching and collecting wooden cubes by a mobile robot, training datasets were created. The features of the trained neural network were analyzed, and the trained model achieved high precision and recall on the test dataset. A probable direction for further development is the analysis of ways to increase the object detection range for a given camera and neural network detector, as well as research on the detection precision for small-scale objects depending on the resolution of the input image and the applied preprocessing.

Acknowledgment. This work was done as the part of the state task of the Ministry of Education and Science of Russia No. 075-00924-19-00 "Cloud services for automatic synthesis and validation of datasets for training deep neural networks in pattern recognition tasks".

References

1. Nielsen, M.A.: Neural Networks and Deep Learning, vol. 25. Determination Press, San Francisco (2015)
2. Asadi, K., et al.: Real-time scene segmentation using a light deep neural network architecture for autonomous robot navigation on construction sites. arXiv preprint arXiv:1901.08630 (2019)
3. Redmon, J., et al.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
4. Wu, B., et al.: Squeezedet: unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 129–137 (2017)
5. Iandola, F.N., et al.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360 (2016)
$$a(t) = \left(1 - \frac{\Delta t}{\tau_a}\right) a(t - \Delta t) + \frac{s(t)}{\tau_a}$$
$$h(t) = \left(1 - \frac{\Delta t}{\tau_h}\right) h(t - \Delta t) + \frac{s(t)}{\tau_h} \qquad (2)$$

Here V_j is the membrane potential of neuron j, τ_m is the characteristic membrane time, I_j is the input current of neuron j, V_th,j is the threshold potential of neuron j, ΔV_th is the constant increment of the threshold, N is the total number of neurons in the layer, τ_th is the decay constant of the threshold, a(t) and h(t) are the instant and average activities of the neuron, s(t) is a binary spike variable equal to 1 if a spike occurs at the moment t and 0 otherwise, and τ_a and τ_h are the decay parameters for a(t) and h(t) (τ_a ≪ τ_h).


Here we introduce the Convolutional Recurrent Spiking Neural Network (CRSNN) architecture. It is a three-layer architecture, where the first, feature extracting layer is convolutional, the second layer is a fixed spike-pooling operator and the third layer is the classifier.

Convolutional Feature Extraction. There are 25 different convolutional kernels with 2 input channels. The size of each convolutional filter is 16 × 16. Each filter is applied with stride 4 without padding. This convolutional structure produces 25 feature maps of size 4 × 4 (400 hidden neurons in total). Neurons corresponding to the same feature map share weights.

Spike-Pooling. To reduce the number of weights in the subsequent fully connected layer, we apply a pooling operation to the 4 × 4 feature maps obtained with the convolutional filters. All 16 neurons of each feature map are connected to one LIF neuron with equal constant weights, which performs an operation similar to average pooling in rate-based artificial neural networks. The difference is that in our case the pooling operation integrates input spikes over time and produces one spike only after integrating several input spikes that occur within a short period of time; otherwise the potential of the pooling LIF neuron does not reach the threshold. We call this operation spike-pooling. Spike-pooling neurons can be regarded as hubs for the corresponding convolutional filters. Following [6], we insert inhibitory connections into the hidden layer. In the proposed CRSNN architecture, competition is introduced between spike-pooling neurons but not between convolutional layer neurons. This (1) allows competition only between different convolutional filters (there is no competition between neurons of the same feature map) and (2) reduces the total number of inhibitory connections from 400 × 399 to 25 × 24 (self-inhibition is not allowed).

Classifier. To classify input images into 10 classes, we add a 25 × 10 fully connected layer at the end of the CRSNN. An additional supervised current is injected into the corresponding classifying neuron while an input image of the given class is presented. Learnable inhibitory weights are also introduced between classifying neurons to provide a distinguishable output of the network during the test phase (supervised currents are removed in the test and validation phases). Reciprocal learnable weights are added from the classifying to the hidden layer to provide a backward signal from the classifying neurons to the hidden spike-pooling layer. These backward weights allow sending a reinforcing signal to the hidden layer from highly activated classifying neurons; this type of connection is especially useful during supervised learning.

Initialization. All forward weight values are initialized from the uniform distribution on [0, 1] and clipped between 0 and 1 during training. All inhibitory weights are initialized as −1 and clipped between −1 and 0 during training. Initial reciprocal connection weights are set to 0.
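As an illustration, a minimal simulation of one spike-pooling LIF neuron could look as follows (a sketch assuming simple leaky integration of weighted input spikes with a fixed threshold and reset; parameter values are illustrative, not those of the paper):

```python
import numpy as np

def spike_pool(in_spikes, w=0.1, tau_m=20.0, v_th=1.0, dt=1.0):
    """Leaky integration of 16 input spike trains into one output spike train.

    in_spikes: array of shape (T, 16) with 0/1 spikes of one feature map.
    Returns a length-T array of output spikes: a spike is emitted only when
    several input spikes arrive close enough in time to reach the threshold.
    """
    v, out = 0.0, np.zeros(len(in_spikes))
    for t, spikes in enumerate(in_spikes):
        v = (1.0 - dt / tau_m) * v + w * spikes.sum()  # leak + weighted input
        if v >= v_th:
            out[t] = 1.0
            v = 0.0                                    # reset after spike
    return out
```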

2.3 Learning Rules

In our recent work [6] we proposed a local learning rule called Family-Engaged Execution and Learning of Induced Neuron Groups, or FEELING, which provides competitive learning of recurrent spiking neural networks. The core idea of the learning rule is that every single neuron strives to maximize its activity in competition with other neurons in order to justify its biological role in the whole network. The FEELING rules are summarized as a system of differential equations (3) for the weight updates dw_ij/dt, dw_kj/dt, dw_jj'/dt and dw_kk'/dt.

Here w_ij are the forward weights, w_kj the reciprocal weights, w_jj' the inhibitory weights between classifying neurons, and w_kk' the inhibitory weights between neurons in the spike-pooling layer. The weights in the convolutional layer are shared, so the final update of w_ij is averaged over the connections within each convolutional filter. In the equations, δ(·) stands for Dirac's delta-function, meaning that the weight updates occur only at spikes of the corresponding neurons; α, β, γ, η are the learning rate parameters, and the last terms serve as weight decay with time constant τ to forget inactive patterns.

3 Results

In this section we report our results of training the CRSNN with the FEELING learning rules. We compare our CRSNN to the RSNN proposed in [6] and obtain better accuracy in both supervised and semi-supervised training regimes, with 56 times fewer learnable parameters. We also analyze the learned convolutional filters and plot maximizing images [9] of the classifying neurons for both the normal and inverse input channels.

3.1 Learning Curves

Here we compare the convergence of the CRSNN to the RSNN proposed in [6]. Both architectures use the same data encoding method with inverse images at the input. The number of hidden layer neurons is 400 for both architectures (400 fully connected hidden neurons for the RSNN and 25 × 4 × 4 for the CRSNN).

Supervised Learning. Learning curves for the CRSNN and RSNN are presented in Fig. 1. The RSNN converges faster, but the CRSNN provides better accuracy: 97.25% against 96.40% for the RSNN. We also analyze the impact of the reciprocal weights in the CRSNN and RSNN. Removing the reciprocal weights from the training setup drops the CRSNN accuracy to 96.33% (against 94.15% for the RSNN).


Fig. 1. Learning curves for CRSNN and RSNN on MNIST dataset.

Semi-supervised Learning. Learning curves in the semi-supervised mode for the CRSNN and RSNN are presented in Fig. 2. Unsupervised learning starts after the first 400 training MNIST digit images have passed through the network with the teacher current (in supervised mode). Unsupervised learning with the convolutional network achieves better accuracy (76.9%) than with the fully connected network (72.1%).

Fig. 2. Learning curves for supervised and semi-supervised modes. RSNN converges faster but CRSNN provides better accuracy for both supervised and semi-supervised modes.

3.2 Weight Visualization

Weight visualization is useful for interpreting how the neural network processes the input information. Here we analyze the convolutional filters and inhibitory weights and plot maximizing images for every classifying neuron.

Convolutional Filters. The convolutional layer consists of 25 filters of size 16 × 16 for each input channel (50 filters in total). The filters for the first and the second input channels are presented in Fig. 3A and B, respectively. Filters in the same row and column of Fig. 3A and B correspond to the same output feature map. We emphasize that the visualizations of the paired normal and inverse filters almost do not overlap and, moreover, correspond to each other like the parts of one puzzle.


Fig. 3. Convolutional filters obtained after training the CRSNN with the FEELING learning rule. (A) 25 filters on the left side stand for the first (normal) input channel. (B) 25 filters on the right side stand for the second (inverse) input channel. Note that areas with high weight values in the first channel have small values of the corresponding weights in the second channel.

Inhibitory Weights. The inhibitory weights between spike-pooling neurons can be viewed as a 25 × 25 matrix, as demonstrated in Fig. 4. Self-inhibition is not allowed, so all diagonal elements are set to 0. Despite the fact that all inhibitory weights were initialized to −1, the final weight distribution is far from total inhibition. Strong inhibition is essential mostly at the very early stages of training to ensure that different filters are learned. However, after some period of simulation time, some filters start to cooperate (yellow points at non-diagonal elements). For example, filters 24 and 25 have very weak inhibitory connections because they have quite similar convolutional weights (see the last two filters in Fig. 3A and B).

Fig. 4. Inhibitory weights obtained while training with the FEELING learning rule. These weights provide competition between different convolutional filters in the CRSNN. Note that the final inhibitory weight matrix looks symmetric, as should be anticipated naturally.

Maximizing Images. In this work we have also applied our original method of reconstructing maximizing images [9] to the convolutional spiking architecture. The main idea of this method is that for each classifying neuron we (1) select an initial image that strongly activates this neuron, (2) compute the gradient of the activity of this neuron with respect to the input, (3) perform one step in the direction of the gradient, (4) iteratively repeat steps 2 and 3 for a fixed number of epochs, and (5) pass


the result through a threshold filter to binarize the maximizing image. The resulting maximizing images are presented in Fig. 5 for both the normal and inverse input channels.
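A schematic version of this gradient ascent is given below, assuming a hypothetical `grad` function that returns the gradient of the neuron's activity with respect to the input (the actual deconvolutional optimization of [9] is more involved):

```python
import numpy as np

def maximizing_image(init_image, grad, n_epochs=100, step=0.1, thr=0.5):
    """Iteratively push an image toward higher activity of one output neuron."""
    img = init_image.copy()
    for _ in range(n_epochs):
        img += step * grad(img)            # one gradient-ascent step
    return (img > thr).astype(np.float32)  # threshold to binarize the result
```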

Fig. 5. Reconstructed maximizing images of the output neurons. The first row corresponds to the normal input channel, the second row corresponds to the inverse input channel.

3.3 Comparison with Other Methods

We compare our results with other training methods for different SNN architectures in Table 1. The CRSNN architecture gains 0.85% accuracy over our previous work [6] with the FEELING learning rule, while having 55 times fewer parameters than the RSNN.

Table 1. Recognition accuracies of different algorithms on the MNIST dataset.

Architecture                 Hidden layers            Learning rule   Accuracy (%)
Spiking RBM [10]             500-500                  CD              91.3
Fully connected SNN [4]      100                      STDP            82.9
Fully connected SNN [4]      6400                     STDP            95.0
Fully connected SNN [2]      800                      Back-prop       98.56
Convolutional SNN [11]       Convolutional coding     Tempotron       91.3
Convolutional SNN [2]        conv(20)-conv(50)-200    Back-prop       99.3
Convolutional SNN [5]        conv(30)-conv(100)-100   STDP            98.4
RSNN [6]                     100                      FEELING         95.40
RSNN [6]                     400                      FEELING         96.40
CRSNN (this work)            conv(25)                 FEELING         97.25
CRSNN + LogReg (this work)   conv(25)                 FEELING         98.35

To compare our results with those of [5], we trained a non-spiking linear classifier on top of the spike-pooling layer. We recorded the activities of the 25 pooling neurons, fed them to the input of a logistic regression classifier and obtained an accuracy of 98.35%. So, we achieved the same accuracy with a much shallower architecture, using a convolutional feature extractor trained with FEELING instead of STDP. Deeper architectures trained with a back-propagation technique adapted to SNNs [2] still outperform our results by 0.95% at best, while having approximately 3 times more trainable parameters than the proposed CRSNN.
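This readout step is standard; a sketch with scikit-learn, assuming `pooled_train` and `pooled_test` hold the recorded activities of the 25 spike-pooling neurons for each image:

```python
from sklearn.linear_model import LogisticRegression

# pooled_train: (n_train, 25) recorded pooling activities, y_train: labels
clf = LogisticRegression(max_iter=1000)
clf.fit(pooled_train, y_train)
print("test accuracy:", clf.score(pooled_test, y_test))
```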


4 Conclusion

The introduced CRSNN is a lightweight architecture that can be trained with the FEELING rules using only 20000 MNIST images and provides high accuracy. An important advantage of the FEELING as well as the STDP rules is that they are local, i.e. they use only locally accessible data (the activities and weight values of interconnected neurons). This property is believed to be the key to successful hardware realizations of learning algorithms in prospective high-performance and energy-efficient neuromorphic systems.

Acknowledgements. This work has been carried out using computing resources of the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC "Kurchatov Institute", http://ckp.nrcki.ru/. Development of the convolutional spiking architecture and learning experiments has been supported by Russian Science Foundation grant No. 17-71-20111; research and development of learning rules for spiking convolutional layers has been supported by scientific grant of NRC "Kurchatov Institute" No. 1713.

References

1. Merolla, P.A., et al.: A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345(6197), 668–673 (2014)
2. Lee, J.H., Delbruck, T., Pfeiffer, M.: Training deep spiking neural networks using backpropagation. Front. Neurosci. 10, 508 (2016)
3. Bi, G., Poo, M.: Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci. 18(24), 10464–10472 (1998)
4. Diehl, P., Cook, M.: Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Front. Comput. Neurosci. 9, 99 (2015)
5. Kheradpisheh, S.R., Ganjtabesh, M., Thorpe, S.J., Masquelier, T.: STDP-based spiking deep convolutional neural networks for object recognition. Neural Netw. 99, 56–67 (2018)
6. Demin, V., Nekhaev, D.: Recurrent spiking neural network learning based on a competitive maximization of neuronal activity. Front. Neuroinf. 12, 79 (2018)
7. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
8. Maass, W., Bishop, C.M.: Pulsed Neural Networks, p. 275. MIT Press, Massachusetts (1999)
9. Nekhaev, D., Demin, V.: Visualization of maximizing images with deconvolutional optimization method for neurons in deep neural networks. Procedia Comput. Sci. 119, 174–181 (2017)
10. O'Connor, P., Neil, D., Liu, S., Delbruck, T., Pfeiffer, M.: Real-time classification and sensor fusion with a spiking deep belief network. Front. Neurosci. 7, 178 (2013)
11. Bo, Z., et al.: Feedforward categorization on AER motion events using cortex-like features in a spiking neural network. IEEE Trans. Neural Netw. Learn. Syst. 26, 1963–1978 (2015)

A Method of Choosing a Pre-trained Convolutional Neural Network for Transfer Learning in Image Classification Problems

Alexander G. Trofimov and Anastasia A. Bogatyreva

National Research Nuclear University "MEPhI" (Moscow Engineering Physics Institute), Kashirskoye Hwy 31, Moscow 115409, Russian Federation
[email protected]

Abstract. A method of choosing a pre-trained convolutional neural network (CNN) for transfer learning on a new image classification problem is proposed. The method can be used for a quick estimation of which of the CNNs trained on the ImageNet dataset (AlexNet, VGG16, VGG19, GoogLeNet, etc.) will be the most accurate after fine tuning on the new sample of images. It is shown that there is a high correlation (ρ ≈ 0.74, p < 0.01) between the characteristics of the features obtained at the output of the pre-trained CNN's convolutional part and its accuracy on the test sample after fine tuning. The proposed method can be used to make recommendations for researchers who want to apply the pre-trained CNN and transfer learning approach to their own classification problems and do not have sufficient computational resources and time for multiple fine tunings of the available free CNNs with subsequent choice of the best one.

 Convolutional neural network  ImageNet 

1 Introduction After the tremendous success of convolutional neural networks at the international competitions ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) since 2012 more and more researchers tend to apply them to solve their image classification problems. In each case the researcher must make a choice – either build his own CNN from scratch or take some pre-trained model as a starting point and adapt it to solve his problem. The CNN design from scratch is a difficult and time-consuming task involving the choice of sequence and types of network’s layers, number and dimension of convolutional layers, parameters of convolutions, pooling, transfer functions, etc. Due to the high computational complexity of the CNNs their design must take into account the available computational capabilities and RAM. Many articles are dedicated to practical recommendations for the CNN design. In [1], a method for estimating the resources required for the CNN under design is proposed. In [2, 3], it was proposed to use reinforcement learning and in [4] genetic algorithms in CNN design. © Springer Nature Switzerland AG 2020 B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2019, SCI 856, pp. 263–270, 2020. https://doi.org/10.1007/978-3-030-30425-6_31

264

A. G. Trofimov and A. A. Bogatyreva

Another approach to image classification with deep neural networks is transfer learning [5–8]. This approach consists in choosing some pre-trained deep model as a starting point and its additional training (fine tuning) on the new sample. The key advantage of transfer learning in image classification is the ability to apply deep models with small sample sizes [9]. CNNs obtained as a result of transfer learning were used in many practical applications such as image classification in medicine [10], text recognition [11], object classification within x-ray imagery at airports [12], etc. At the moment several dozen of CNNs trained for image classification are available in open access (GoogLeNet, Alexnet, VGG16, VGG19, ResNet, etc.). Most of them are trained on ImageNet dataset images [13]. For the new image classification problem the question arises whether the transfer learning is applicable to solve it and if so, which of the pre-trained models is best suited for this? In [14], it is noted that the efficiency of transfer learning depends on the distance between the samples of features formed by the CNN’s convolutional part for the original and new problems. In [15, 16], it was shown that the features of CNN trained on ImageNet dataset can be successfully transferred to solve a significantly different problem (biological data classification). Despite the fact that most of the free pre-trained CNNs were trained on ImageNet dataset images they all have different characteristics both in accuracy and in size and hence in required computational resources and performance time. According to the official results of the ILSVRC 2012–2015 competition there is a direct relationship between the classification accuracy and the number of CNN layers [17]. However, there is no reason to believe that the more complex the pre-trained network is, the more accurate it will be for a new classification problem after fine tuning especially if the new problem is very different from the original ImageNet problem. In this paper we propose an approach to choose a pre-trained CNN for fine tuning on the new sample of images. The approach is based on the estimation of pre-trained CNN’s features separability for the considered image classification problem and the choosing CNN with the greatest one.

2 Problem Statement Let C1 ; . . .; CM be the convolutional parts of pre-trained CNNs (AlexNet, VGG16, etc.), M is the number of considered deep models. The convolutional part Cm receives an image x and forms the corresponding feature map at the output, which is a tensor that can be transformed into Lm-dimensional vector zm ¼ ðzm1 ; . . .; zmLm ÞT : zm ¼ Cm ðxÞ, m ¼ 1; M. The dimension Lm is determined by the CNN architecture. Let D ¼  ðiÞ ðiÞ   x ; r ; i ¼ 1; n be a sample of n labeled images, xðiÞ is the i-th image, rðiÞ is the corresponding class label, rðiÞ 2 f1; . . .; Kg, i ¼ 1; n, K is the number of classes. We pose the problem of determining which of the models C1 ; . . .; CM is most suitable for the transfer learning, i.e. for its fine tuning on the sample D. One of the approaches to solve this problem is exhaustive search consisting in the fine tuning of each of the models C1 ; . . .; CM on the sample D and then the selection of the best


accurate model. However, this approach has an obvious drawback in its high computational complexity: fine tuning of just one model can take up to several hours or even days on modern GPUs. At the same time, a single run of the models C_1, …, C_M on the sample D is a much less computationally expensive procedure. In this paper we propose an approach based on estimating the separability of the features formed by the models C_1, …, C_M. It is assumed that the more separable the data observed at the output of a pre-trained CNN's convolutional part on some sample of images, the more accurate the CNN will be after fine tuning on this sample. This assumption is based on the fact that the fully connected layers located after the convolutional part adapt the most during fine tuning, while the CNN's convolutional layers, particularly the earlier ones, adapt their weights much less or insignificantly [18]. In other words, the accuracy of the CNN after fine tuning is largely determined by the quality of the features formed by the pre-trained CNN's convolutional part. Let D_m = {(z_m^(i), r^(i)), i = 1, …, n} be the labeled sample of CNN features, where the L_m-dimensional vector z_m^(i) = C_m(x^(i)) is obtained at the output of the CNN's convolutional part C_m, m = 1, …, M, as a result of its simulation on the image x^(i) from sample D. We estimate the separabilities c_1, …, c_M of the data in samples D_1, …, D_M and select the model characterized by the highest separability. This model is assumed to be the most suitable for transfer learning, since it has a priori the most efficient features among the considered pre-trained CNNs for the given image classification problem.

3 Estimation of CNN's Features Quality

A direct method of estimating data separability is to train some classifier on the data; the accuracy of the trained classifier on a test sample then serves as the measure of separability. The CNN's fully connected layers could be chosen as such a classifier. The drawbacks are the dependency on the initial weights, the training method and its hyperparameters, the high computational cost and possible overfitting. For this reason we use robust and fast indirect estimation methods. Existing metrics of data separability are usually based on the assumption that the data of one class are spatially close while the classes themselves are far from each other, i.e. that the classes form clusters; thus, some cluster indices (Dunn index, Davies-Bouldin index, etc.) are used as measures of class separability [19]. But in practice the assumption of class compactness can be violated. We propose a "naive" method for assessing the quality of CNN features. Its idea is to assess the separability of each feature independently and construct an overall separability index from the separabilities of single features. It is known that the binary separability of one-dimensional data is characterized by the ROC curve, and the ROC AUC can be used as a separability measure. The micro-averaged and macro-averaged ROC AUC are generalizations of the ROC AUC to multiclass data [20]. Since in practice these multiclass measures are usually very similar to each other, we use the macro-averaged ROC AUC as the simpler one to calculate.

Let z_jm^(1), …, z_jm^(n) be the sample obtained at the j-th output of model C_m, j = 1, …, L_m, m = 1, …, M, as a result of its simulation on the images x^(1), …, x^(n) from sample D. This sample is characterized by the macro-averaged ROC AUC a_jm calculated using the corresponding class labels r^(1), …, r^(n). Thus, the outputs of model C_m are characterized by a vector of macro-averaged ROC AUCs a_m = (a_1m, …, a_L_m m)^T, m = 1, …, M. The overall quality measure of model C_m's features is some function f of the vector a_m: c_m = f(a_m), m = 1, …, M. It is argued that the model with the highest quality measure will be the most accurate after fine tuning on sample D. In order for this statement to be valid, it is necessary to find a transformation f that maximizes the correlation ρ between the model's quality measure and its accuracy after fine tuning:

$$\rho = \mathrm{corr}\big((c_1, \ldots, c_M),\ (p_1, \ldots, p_M)\big) \to \max_f, \qquad (1)$$

where p_m is the accuracy of the m-th CNN after fine tuning on the sample D, m = 1, …, M. Problem (1) is a variational problem in the space of functions, and its exact solution can be very difficult. We calculate some statistics (in particular, the mean, variance, etc. of the vector a_m's elements) as the quality c_m and choose the statistic that provides the maximum correlation ρ. The statistics used in this paper are discussed in Sect. 4.
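Computing the vector a_m reduces to one ROC AUC per scalar feature; below is a sketch with scikit-learn (the one-vs-rest macro averaging here is our reading of the macro-averaged ROC AUC used in the paper):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def feature_aucs(Z, r):
    """Macro-averaged ROC AUC of every scalar feature.

    Z: (n, L_m) feature matrix from one CNN's convolutional part,
    r: (n,) class labels. Returns the vector a_m of length L_m.
    """
    classes = np.unique(r)
    aucs = []
    for j in range(Z.shape[1]):
        # one-vs-rest AUC per class, then macro average over classes
        per_class = [roc_auc_score((r == c).astype(int), Z[:, j])
                     for c in classes]
        aucs.append(np.mean(per_class))
    return np.array(aucs)
```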

4 Experimental Results

We used the convolutional parts of CNNs trained on the ImageNet dataset (AlexNet, VGG19, GoogLeNet, ResNet18, etc.) as the models C_1, …, C_M; in total, M = 10 models. A sample of n = 5000 images of handwritten digits from the MNIST test dataset [21] was used as the sample D; the number of classes is K = 10. The features observed at the output of each model C_1, …, C_M were calculated for the images from sample D, the corresponding samples D_1, …, D_M were constructed and the vectors a_1, …, a_M of macro-averaged ROC AUCs were calculated. Each of the models C_1, …, C_M was fine tuned on the training part of the MNIST sample and the test accuracies p_1, …, p_M were obtained. The stopping criterion was reaching 99% classification accuracy on the training sample or early stopping. To improve the reliability of the results, the training of each model was carried out 10 times, resulting in 10 different samples p_1, …, p_M; below, for simplicity, the mean accuracy over the training runs is used. Figure 1 shows box-and-whisker diagrams of the samples a_1m, …, a_L_m m, m = 1, …, M, and the mean accuracy p_m achieved after fine tuning together with its standard deviation. It can be noted in Fig. 1 that the relationship between the statistics of the sample a_1m, …, a_L_m m and the corresponding accuracy p_m, m = 1, …, M, is unclear. The SqueezeNet and AlexNet networks have the best accuracies after fine tuning, but the statistical characteristics of their features' AUCs are not the highest. Moreover, the average AUC of the features formed by the convolutional part of SqueezeNet is lower than that of all the other networks. The maximum AUC (0.74) is observed for a feature formed by DenseNet201; however, this network does not demonstrate the best accuracy after fine tuning.


Fig. 1. Box-and-whisker diagrams of samples a_1m, …, a_L_m m, m = 1, …, M, and accuracies achieved after fine tuning of each model C_1, …, C_M.

Figure 2 shows scatter plots on the plane (c, p). Different statistical characteristics of the samples a_1m, …, a_L_m m, m = 1, …, M, were used to calculate the measures c_1, …, c_M. The highest correlation (ρ ≈ 0.5) is observed between the fine-tuned CNN's test accuracy and the maximum AUC of its features. In addition, there is an interesting relation between the accuracy and the AUC averaged over all of a CNN's features: as the averaged AUC grows, the accuracy at first decreases and then begins to increase. At the same time, the networks with the minimum and maximum averaged AUCs (SqueezeNet and AlexNet, respectively) have almost the same classification accuracy after fine tuning (97.5%).

Fig. 2. Scatter plots on the plane (c, p). Different statistical characteristics of the samples a_1m, …, a_L_m m, m = 1, …, M, were used to calculate the measures c_1, …, c_M: mean (left), standard deviation (center), maximum value (right). Each point corresponds to some CNN.


Other statistical characteristics of the samples a_1m, …, a_L_m m, m = 1, …, M (asymmetry coefficient, kurtosis, number of principal components, etc.) were also calculated, but the corresponding correlation coefficient ρ for them was less than 0.5. Choosing the CNN to fine tune based only on the single greatest AUC of its features seems unreliable. It is known that a more robust statistic can be obtained by some averaging, so we average over the q greatest AUCs to calculate c_m:

$$c_m(q) = \frac{1}{q} \sum_{j=1}^{q} a_{mj}, \quad q \le L_m, \quad m = 1, \ldots, M, \qquad (2)$$

where the AUCs a_m1, …, a_mL_m are sorted in descending order; the greatest AUC is max{a_m1, …, a_mL_m} = c_m(1), m = 1, …, M. Figure 3 shows the dependency of the correlation coefficient ρ(q) = corr(c(q), p) on the number q of CNN features used in the averaging in (2).
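A sketch of this computation, assuming `aucs` is the list of vectors a_1, …, a_M from the snippet above and `p` the vector of fine-tuned accuracies:

```python
import numpy as np

def c_q(a_m, q):
    """Average of the q greatest AUCs of one model, Eq. (2)."""
    return np.sort(a_m)[::-1][:q].mean()

def rho_of_q(aucs, p, q):
    """Correlation between the quality measure c(q) and accuracy p."""
    c = np.array([c_q(a_m, q) for a_m in aucs])
    return np.corrcoef(c, p)[0, 1]

# e.g. scan q to reproduce the curve in Fig. 3 (left):
# best_q = max(range(1, 200), key=lambda q: rho_of_q(aucs, p, q))
```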

Fig. 3. A plot of the correlation coefficient ρ(q) versus the number q of CNN features used in the averaging (left), and a scatter plot on the plane (c, p) for q = 100 (right).

The plot shows that the maximum correlation (ρ_max ≈ 0.74, p-value < 0.01) corresponds to a number of features q ≈ 100. This means that the accuracy of the fine-tuned network can be predicted more precisely from the AUC averaged over the 100 best features formed by the pre-trained CNN.

5 Conclusion

It is shown that the accuracy of a fine-tuned CNN on test images is strongly correlated with the AUC averaged over the features that are formed by the CNN's convolutional part and have the greatest discriminative capability. This makes it possible to predict the accuracy of a fine-tuned CNN on a new sample of images before carrying out the expensive fine tuning procedure. The proposed method can be used to make recommendations for researchers who want to apply the pre-trained CNN and transfer learning to solve their own


classification problems and do not have sufficient computational resources for multiple fine tunings of the available free CNNs and choosing the best one. A possible direction for further research is the construction of more precise characteristics of CNN features to estimate their capability for transfer learning, i.e. to predict the CNN's error after fine tuning on a new sample of images more accurately. Another, more ambitious direction is the development of a method for quickly assessing the capability of a CNN for transfer learning based only on descriptors of the given sample of images, without the calculation and statistical analysis of the features formed by the pre-trained CNN.

References

1. Ma, N., et al.: Shufflenet v2: practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
2. Baker, B., et al.: Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167 (2016)
3. Mortazi, A., Bagci, U.: Automatically designing CNN architectures for medical image segmentation. In: International Workshop on Machine Learning in Medical Imaging, pp. 98–106. Springer, Cham (2018)
4. Sun, Y., et al.: Automatically designing CNN architectures using genetic algorithm for image classification. arXiv preprint arXiv:1808.03818 (2018)
5. Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724 (2014)
6. Huang, Z., Pan, Z., Lei, B.: Transfer learning with deep convolutional neural network for SAR target classification with limited labeled data. Remote Sens. 9(9), 907 (2017)
7. Weiss, K., Khoshgoftaar, T.M., Wang, D.: A survey of transfer learning. J. Big Data 3(1), 9 (2016)
8. Kulik, S.: Neural network model of artificial intelligence for handwriting recognition. J. Theor. Appl. Inf. Technol. 73(2), 202–211 (2015)
9. Larsen-Freeman, D.: Transfer of learning transformed. Lang. Learn. 63, 107–129 (2013)
10. Ghafoorian, M., et al.: Transfer learning for domain adaptation in MRI: application in brain lesion segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 516–524. Springer, Cham (2017)
11. Tang, Y., Peng, L., Xu, Q., Wang, Y., Furuhata, A.: CNN based transfer learning for historical Chinese character recognition. In: 2016 12th IAPR Workshop on Document Analysis Systems (DAS), pp. 25–29 (2016)
12. Akcay, S., et al.: Transfer learning using convolutional neural networks for object classification within x-ray baggage security imagery. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 1057–1061 (2016)
13. ImageNet. http://www.image-net.org
14. Yosinski, J., et al.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems, pp. 3320–3328 (2014)
15. Zhang, W., et al.: Deep model based transfer and multi-task learning for biological image analysis. IEEE Trans. Big Data 99, 1 (2016)


16. Trofimov, A.G., Velichkovskiy, B.M., Shishkin, S.L.: An approach to use convolutional neural network features in eye-brain-computer-interface. In: International Conference on Neuroinformatics, pp. 132–137. Springer, Cham (2017)
17. He, K., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
18. Reyes, A.K., Caicedo, J.C., Camargo, J.E.: Fine-tuning deep convolutional networks for plant recognition. CLEF (Working Notes), p. 1391 (2015)
19. Desgraupes, B.: Clustering Indices. University of Paris Ouest-Lab Modal'X, Paris (2013)
20. Tsoumakas, G., Vlahavas, I.: Random k-labelsets: an ensemble method for multilabel classification. In: European Conference on Machine Learning, pp. 406–417. Springer, Berlin, Heidelberg (2007)
21. The MNIST database. http://yann.lecun.com/exdb/mnist/

The Usage of Grayscale or Color Images for Facial Expression Recognition with Deep Neural Networks

Dmitry A. Yudin¹, Alexandr V. Dolzhenko², and Ekaterina O. Kapustina²

¹ Moscow Institute of Physics and Technology (National Research University), Institutsky Per. 9, Dolgoprudny, Moscow Region 141700, Russia
[email protected]
² Belgorod State Technological University named after V.G. Shukhov, Kostukova Str. 46, Belgorod 308012, Russia

Abstract. The paper describes the usage of modern deep neural network architectures such as ResNet, DenseNet and Xception for the classification of facial expressions on color and grayscale images. Each image may contain one of eight facial expression categories: "Neutral", "Happiness", "Sadness", "Surprise", "Fear", "Disgust", "Anger", "Contempt". AffectNet was used as the dataset. The most accurate architecture is Xception: it gave a classification accuracy of 97.65% on the training sample, 57.48% on the cleaned testing sample, and a top-2 accuracy of 76.70% on the cleaned testing sample. The "Contempt" category is recognized worst by all the considered neural networks, which indicates its ambiguity and similarity to other types of facial expressions. The experimental results show that for the considered task it does not matter whether a color or a grayscale image is fed to the input of the algorithm. This fact can save a significant amount of memory when storing datasets and training neural networks. The computing experiments were performed on a graphics processor using NVidia CUDA technology with the Keras and TensorFlow deep learning frameworks and showed that the average processing time of one image varies from 4 ms to 30 ms for the different architectures. The obtained results can be used in software for neural network training in face recognition systems.

Keywords: Image recognition · Classification · Facial expression · Emotion · Face · Deep learning · Convolutional neural network

1 Introduction

Currently, significant progress has been made in creating efficient image recognition algorithms based on deep neural networks [1–3]. As a rule, such algorithms require a large number of images obtained under different lighting and noise conditions, and they need huge amounts of memory for storage as well as for training. There are subject areas for which it is advisable to study the possibility of using grayscale images instead of color ones when training recognition algorithms: this can reduce the need for RAM or hard disk space by a factor of three.


One of these areas is the task of facial expression recognition. The standard for determining the type of facial expression is the Emotional Facial Action Coding System (EMFACS-7), proposed by Friesen and Ekman in 1983 [4]. This generally accepted standard identifies seven basic types of emotions: (1) anger, (2) contempt, (3) disgust, (4) fear, (5) happiness, (6) sadness, (7) surprise; additionally, a neutral facial expression is considered. At the initial stage, methods for recognizing emotions in human face images relied on manually selected features: Gabor wavelets [5], local binary patterns [6], geometric deformation features on image sequences [7], 3D surface features [8], etc. Modern approaches are based on automatic generation of image features by deep convolutional neural networks; some of them use a prior alignment technique [9], and some recognize facial expressions on images as they are [1, 10], including tuning networks pre-trained on the face identification task [11]. Deep spatial-temporal networks have also been proposed for emotion recognition on video sequences [12], and deep learning is actively used to analyze facial expressions from three-dimensional face models [13]. There are a number of commercial services that implement proprietary emotion recognition methods: the Face API from Microsoft Azure [14], Amazon Emotion API [15], Affectiva Emotion SDK [16], etc. However, the recognition of facial expressions in images under complex conditions of variable light, noise and an inconvenient viewing angle is still an important topic for further research. For studying neural network approaches there are many datasets, which differ in shooting conditions, the variety of people photographed, and the number of images per class. Some popular datasets and their features are listed in Table 1:

– Cohn-Kanade AU-Coded Expression Database [17] (CK); the statistics are shown for the case of the first two and last two frames of the image sequences from the database,
– The Japanese Female Facial Expression Database [5] (JAFFE),
– Facial Expression Recognition Challenge [18] (FER2013),
– Facial expressions Repository [19] (FE),
– SoF dataset [20] (SoF),
– AffectNet [21].

The largest of them is the AffectNet dataset (a total of more than 1 million images). In addition to manually labeled data, it contains automatically annotated images that researchers or developers can label and check on their own if necessary. This paper considers facial expression recognition on static images using modern deep learning methods, as well as the choice of the input data format. On the one hand, color images provide additional information about a person's face; on the other hand, grayscale images reduce the effect of shooting conditions: light level, type of light source, etc. To choose one of these image representations, it is necessary to conduct experiments with various neural network architectures and different sizes of input images.


Table 1. Datasets for facial expression recognition

| Database details | CK | JAFFE | FER2013 | FE | SoF | AffectNet |
|---|---|---|---|---|---|---|
| Image size | 640×490–720×480 | 256×256 | 48×48 | 23×29–355×536 | 640×480 | 129×129–4706×4706 |
| Image style | Portrait | Portrait | Cropped face | Cropped face | Portrait | Cropped face |
| Image type | Grayscale, color | Grayscale | Grayscale | Grayscale, color | Color | Color |
| Facial expression categories: | | | | | | |
| Neutral | 324 | 30 | 6194 | 6172 | 667 | 75374 |
| Happy | 138 | 31 | 8989 | 5693 | 1042 | 134915 |
| Sad | 56 | 31 | 6077 | 220 | 237 (sad/anger/disgust) | 25959 |
| Surprise | 166 | 30 | 4002 | 364 | 145 (surprise/fear) | 14590 |
| Fear | 50 | 32 | 5121 | 21 | 0 | 6878 |
| Disgust | 118 | 29 | 547 | 208 | 0 | 4303 |
| Anger | 90 | 30 | 4953 | 240 | 0 | 25382 |
| Contempt | 36 | 0 | 0 | 9 | 0 | 4250 |
| Total: | 978 | 213 | 35883 | 12927 | 2091 | 291651 |

2 Task Formulation

In this paper we solve the task of assigning one of eight facial expression categories (“Neutral”, “Happiness”, “Sadness”, “Surprise”, “Fear”, “Disgust”, “Anger”, “Contempt”) to grayscale or color images with cropped faces, see Fig. 1. We took the modern and largest open-source dataset, AffectNet, which contains 287651 images as the training sample and 4000 images (500 images per class) as the testing sample [21]. The samples include images of different sizes, from 129×129 to 4706×4706 pixels, obtained with different cameras under different shooting conditions.


Fig. 1. Examples of labeled images with facial expressions from AffectNet Dataset: 0 – Neutral, 1 – Happiness, 2 – Sadness, 3 – Surprise, 4 – Fear, 5 – Disgust, 6 – Anger, 7 – Contempt


To solve the task, it is necessary to develop several variants of deep neural network architectures and to test them on the available dataset with 1-channel (grayscale) and 3-channel (color) image representations. We must determine which image representation is better suited to facial expression recognition, and select the architecture that provides the best performance and the highest image classification quality measures: accuracy, precision and recall [22].
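As a concrete illustration of these three measures, the sketch below computes them with scikit-learn on hypothetical label arrays; the tool choice and the arrays are assumptions, not the authors' evaluation code.

```python
# Minimal sketch: accuracy, per-class precision and per-class recall for the
# eight facial expression categories (0..7). y_true/y_pred are hypothetical.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([0, 1, 2, 3, 4, 5, 6, 7, 1, 1])  # ground-truth classes
y_pred = np.array([0, 1, 2, 3, 4, 5, 6, 0, 1, 2])  # classifier output

acc = accuracy_score(y_true, y_pred)                                   # overall share of correct answers
prec = precision_score(y_true, y_pred, average=None, zero_division=0)  # one value per class
rec = recall_score(y_true, y_pred, average=None, zero_division=0)      # one value per class
print(acc, prec, rec)
```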

3 Dataset Preparation

AffectNet [21] was chosen as the main dataset; it is one of the largest modern datasets for facial expression recognition. However, it contains relatively few images in the “Fear”, “Disgust” and “Contempt” categories compared to the others. To train the neural networks, image augmentation was carried out and a balanced training sample was formed with 10,000 images per class. The augmentation consists of 5 sequential steps:
1. Coarse dropout – setting rectangular areas within the image to zero. We generated a dropout mask at 2 to 25 percent of the image's size; in that mask, 0 to 2 percent of all pixels were dropped (random per image).
2. Affine transformation – rotation of the image by a random angle from −15 to 15 degrees.
3. Flipping of the image about the vertical axis with probability 0.9.
4. Addition of Gaussian noise with a standard deviation of the normal distribution from 0 to 15.
5. Cropping away (cutting off) a random number of pixels on each side of the image, from 0 to 10% of the image height/width.

The results of this augmentation procedure are shown in Fig. 2.

Fig. 2. Examples of augmented images for Training sample 2 (balanced)
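The five augmentation steps above map naturally onto the imgaug library; the following is a minimal sketch under the assumption that imgaug (which the paper does not name) is the tool used, with the parameter ranges taken from the list above.

```python
import imgaug.augmenters as iaa

seq = iaa.Sequential([
    # 1. Coarse dropout: 0-2% of pixels dropped via a mask at 2-25% of image size
    iaa.CoarseDropout(p=(0.0, 0.02), size_percent=(0.02, 0.25)),
    # 2. Affine transformation: rotation by a random angle from -15 to 15 degrees
    iaa.Affine(rotate=(-15, 15)),
    # 3. Flip about the vertical axis with probability 0.9
    iaa.Fliplr(0.9),
    # 4. Additive Gaussian noise with standard deviation from 0 to 15
    iaa.AdditiveGaussianNoise(scale=(0, 15)),
    # 5. Crop away 0-10% of height/width on each side
    iaa.Crop(percent=(0, 0.10)),
])

# `batch`: hypothetical uint8 array of shape (N, H, W, C)
augmented = seq(images=batch)
```

Each operator draws its parameters at random per image, so repeated application of `seq` to the same face yields the varied crops, rotations and noise patterns visible in Fig. 2.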

Like most open-source datasets, AffectNet contains incorrect ground truth labels for cropped faces (Fig. 3). We cleaned the testing sample for a more reliable evaluation of the classifiers; the resulting Testing sample 2 contains 3210 images.

Fig. 3. Examples of wrong ground truth labels in the testing sample of the AffectNet Dataset (panels show errors in the “Neutral”, “Happiness”, “Sadness”, “Surprise”, “Fear”, “Disgust”, “Anger” and “Contempt” categories)

Details of the datasets used in this research are given in Table 2.

Table 2. Training and testing samples of the used dataset

| Facial expression category | Training sample 1 | Training sample 2 (balanced) | Testing sample 1 | Testing sample 2 (cleaned) |
|---|---|---|---|---|
| 0 – Neutral | 74874 | 10000 | 500 | 490 |
| 1 – Happiness | 134415 | 10000 | 500 | 451 |
| 2 – Sadness | 25459 | 10000 | 500 | 473 |
| 3 – Surprise | 14090 | 10000 | 500 | 453 |
| 4 – Fear | 6348 | 10000 | 500 | 477 |
| 5 – Disgust | 3803 | 10000 | 500 | 359 |
| 6 – Anger | 24882 | 10000 | 500 | 351 |
| 7 – Contempt | 3749 | 10000 | 500 | 156 |
| Total: | 287621 | 80000 | 4000 | 3210 |

4 Classification of Emotion Categories Using Deep Convolutional Neural Networks

To solve the formulated task, we investigate the application of deep convolutional neural networks of three architectures:
– The ResNetM architecture, inspired by ResNet [23] and implemented by the authors in previous works [24]. It has an input tensor of 120×120×3 for color images and 120×120×1 for grayscale images. Its structure is shown in Fig. 4 and contains 3 convolutional blocks, 5 identity blocks, 2 max pooling layers, 1 average pooling layer and one output dense layer. The first 11 layers and blocks provide automatic feature extraction, and the last, fully connected, layer maps the extracted features to one of the eight image classes corresponding to the input image. ResNetM was trained on the full Training sample 1.


Fig. 4. ResNetM architecture.

– The DenseNet architecture is based on the DenseNet169 model [25] with an input tensor of 224×224×3 for color images and 224×224×1 for grayscale images. Its structure uses alternating Dense and Transition blocks (Fig. 5). A dataset from Training sample 2 containing 4000 images per class was prepared for DenseNet training.

Fig. 5. DenseNet architecture.

– The Xception architecture [26] with the input tensor changed to 120×120×3 for color images and 120×120×1 for grayscale images. This structure is a development of Inception [27] and is based on separable convolution blocks (see Fig. 6). Xception was trained on the balanced Training sample 2.


Fig. 6. Xception architecture.

The output layer in all architectures has 8 neurons with the softmax activation function. All input images are pre-scaled to a size of 60×60 pixels for the ResNetM architecture, 120×120 pixels for the Xception architecture and 224×224 pixels for DenseNet169. The neural networks work with color (three-channel) and grayscale (one-channel) images. To train the neural networks we used the categorical cross-entropy loss function and Stochastic Gradient Descent (SGD) with a learning rate of 0.001. Accuracy is used as the classification quality metric during training. Each batch consists of 5 images. The training process of the deep neural networks is shown in Fig. 7. The training experiment was carried out for 50 learning epochs using our software tool implemented in the Python 3.5 programming language with the Keras + TensorFlow frameworks [28]. We can see that the DenseNet and Xception networks have similar speed and accuracy, while ResNetM achieves much lower accuracy on the test samples. The calculations were performed using NVidia CUDA technology on the GPU of a GeForce GTX 1060 graphics card with 6 GB of memory, an Intel Core i5-8300H CPU (4 cores at 2.3 GHz) and 24 GB of RAM.
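A minimal sketch of this training setup for the Xception variant is given below, using the Keras + TensorFlow frameworks named in the paper. The placeholder arrays and the single placeholder epoch are assumptions; the authors' own ResNetM and DenseNet code is not reproduced here.

```python
import numpy as np
from tensorflow import keras

channels = 1  # 1 for grayscale input, 3 for color

# Xception with the input tensor changed to 120x120xchannels and an 8-way softmax head
base = keras.applications.Xception(weights=None, include_top=False,
                                   input_shape=(120, 120, channels), pooling="avg")
outputs = keras.layers.Dense(8, activation="softmax")(base.output)
model = keras.Model(base.input, outputs)

# SGD with learning rate 0.001, categorical cross-entropy, accuracy metric (per the text)
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Placeholder data standing in for the balanced Training sample 2
x_train = np.random.rand(20, 120, 120, channels).astype("float32")
y_train = keras.utils.to_categorical(np.random.randint(0, 8, 20), 8)

# Batch size 5 as in the paper; the paper trains for 50 epochs, 1 here for brevity
model.fit(x_train, y_train, batch_size=5, epochs=1)
```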


Fig. 7. Training of deep neural networks with ResNetM, DenseNet and Xception architectures.

Table 3 shows the results of facial expression recognition on the training and test samples with color or grayscale images using the ResNetM, DenseNet169 and Xception architectures. Analysis of the obtained results shows that the Xception architecture with grayscale input images has the highest accuracy on all samples: 97.65% on the training sample, 57.48% on testing sample 2 and a top-2 accuracy of 76.70%. It also has the highest and most balanced values of precision and recall for almost all categories (classes) of facial expression except “Anger” and “Contempt”. ResNetM is significantly faster than all other architectures: about 4 ms for processing a single image against 12 ms for Xception and 30 ms for DenseNet. This architecture also has the highest recognition recall for the “Happiness” category. DenseNet surpasses all other architectures in recognizing the “Anger” category and is better in terms of recognition recall for the “Fear” and “Contempt” categories; it also has the highest precision for the “Neutral” category. The category “Contempt” is poorly recognized by all the considered types of neural networks, which speaks primarily of its ambiguity and similarity to other types of facial expressions, in particular “Neutral”.


Table 3. Quality of facial expression recognition on the AffectNet Dataset

| Metric | ResNetM, color | ResNetM, grayscale | DenseNet, color | DenseNet, grayscale | Xception, color | Xception, grayscale |
|---|---|---|---|---|---|---|
| Accuracy on train sample | 0.9283 | 0.9139 | 0.9168 | 0.9428 | 0.9686 | 0.9765 |
| Accuracy on test sample 2 | 0.4844 | 0.4781 | 0.5520 | 0.5427 | 0.5654 | 0.5748 |
| Top-2 acc. on test sample 2 | 0.6748 | 0.6766 | 0.7467 | 0.7371 | 0.7355 | 0.7670 |
| Classif. time per image, s | 0.0042 | 0.0047 | 0.0305 | 0.0299 | 0.0120 | 0.0123 |
| Weights number | 2613392 | 2607120 | 12656200 | 12649928 | 20877872 | 20877296 |
| Size of model, MB | 10.654 | 10.629 | 51.933 | 51.908 | 83.826 | 83.823 |
| Size of train sample on HDD, MB | 26373.7 | 9520.9 | 3432.7 | 1210.6 | 7624.1 | 2670.2 |
| Size of train sample in operative memory, MB | 12425.2 | 4141.7 | 19267.6 | 6422.5 | 13824.0 | 4608.0 |

Quality metrics on test sample 2 (cleaned):

| Metric | ResNetM, color | ResNetM, grayscale | DenseNet, color | DenseNet, grayscale | Xception, color | Xception, grayscale |
|---|---|---|---|---|---|---|
| Neutral (0): precision | 0.375 | 0.4083 | 0.4838 | 0.5644 | 0.5422 | 0.5223 |
| Neutral (0): recall | 0.6061 | 0.5000 | 0.5490 | 0.3755 | 0.4592 | 0.5735 |
| Happiness (1): precision | 0.5214 | 0.5325 | 0.7363 | 0.7701 | 0.7363 | 0.7973 |
| Happiness (1): recall | 0.9468 | 0.9268 | 0.7428 | 0.7428 | 0.8049 | 0.7849 |
| Sadness (2): precision | 0.5184 | 0.4103 | 0.6070 | 0.5617 | 0.5221 | 0.6099 |
| Sadness (2): recall | 0.4165 | 0.5370 | 0.4735 | 0.4715 | 0.6490 | 0.4693 |
| Surprise (3): precision | 0.4810 | 0.4708 | 0.4977 | 0.5177 | 0.5455 | 0.5000 |
| Surprise (3): recall | 0.3907 | 0.3377 | 0.4966 | 0.5475 | 0.5033 | 0.6137 |
| Fear (4): precision | 0.5880 | 0.5951 | 0.5867 | 0.5665 | 0.6181 | 0.5864 |
| Fear (4): recall | 0.3501 | 0.3542 | 0.5744 | 0.5898 | 0.5597 | 0.5765 |
| Disgust (5): precision | 0.6510 | 0.5679 | 0.5287 | 0.5600 | 0.5912 | 0.6655 |
| Disgust (5): recall | 0.2702 | 0.3259 | 0.6156 | 0.5070 | 0.5599 | 0.5097 |
| Anger (6): precision | 0.4645 | 0.4550 | 0.5552 | 0.4456 | 0.4941 | 0.4802 |
| Anger (6): recall | 0.5413 | 0.4900 | 0.4587 | 0.5954 | 0.4758 | 0.5869 |
| Contempt (7): precision | 0.3333 | 0.5384 | 0.3103 | 0.2827 | 0.3065 | 0.3407 |
| Contempt (7): recall | 0.0192 | 0.0448 | 0.4038 | 0.5128 | 0.3654 | 0.2949 |

As for network size, the smallest amount of memory is occupied by the weights of ResNetM (about 10.6 MB), and the largest by the weights of the Xception network (83.8 MB). For all considered types of neural networks, representing the input images in grayscale instead of color did not lead to any significant difference in accuracy, top-2 accuracy, processing time per image, or number of weights. Thus, it can be concluded that for the facial expression recognition task it does not matter whether a color or a grayscale image is fed to the algorithm. This can save a significant amount of memory when storing datasets (about 65% of HDD space) and when training neural networks (about 67% of operative memory).
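The scale of this saving follows directly from the channel count; as a worked check, under the assumption that storage is dominated by raw pixel data:

\[
1 - \frac{1\ \text{channel}}{3\ \text{channels}} = \frac{2}{3} \approx 67\%,
\]

which agrees with the observed 67% in operative memory and is close to the 65% on disk, where file compression shifts the figure slightly.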


5 Conclusions

It follows from Table 3 that the applied deep neural network architectures for facial expression recognition on the AffectNet dataset show high quality on the training set but significantly worse results on the testing sample. This can be explained by the ambiguity of certain emotions on a person's face, the variety of shooting angles and the presence of conflicting data in the training sample.

The most accurate architecture is Xception. It gave a classification accuracy of 97.65% on the training sample, 57.48% on testing sample 2 and a top-2 accuracy of 76.70% on testing sample 2. The category “Contempt” is recognized worst by all the considered types of neural networks, which indicates its ambiguity and similarity to other types of facial expressions.

Experimental results show that for the considered task it does not matter whether a color or a grayscale image is fed to the input of the algorithm. This fact can save a significant amount of memory when storing datasets and training neural networks.

An important aspect for the further application of the considered approaches is the average classification time per image. It varies from 4 ms for ResNetM to 30 ms for DenseNet, which suggests that all the described approaches can be integrated into real-time face recognition software.

In further studies on the topic of this paper, it is necessary to expand the training and test samples to cover more images in the “Fear”, “Disgust” and “Contempt” categories. It will also be promising to explore emotion recognition on images with face alignment based on key points, in order to reduce the impact of the bounding box choice made by face detection algorithms.

Acknowledgment. The research was made possible by the Government of the Russian Federation (Agreement No. 075-02-2019-967).

References
1. Zeng, N., Zhang, H., Song, B., Liu, W., Li, Y., Dobaie, A.M.: Facial expression recognition via learning deep sparse autoencoders. Neurocomputing 273, 643–649 (2018)
2. Yudin, D., Knysh, A.: Vehicle recognition and its trajectory registration on the image sequence using deep convolutional neural network. In: The International Conference on Information and Digital Technologies, pp. 435–441 (2017)
3. Yudin, D., Naumov, A., Dolzhenko, A., Patrakova, E.: Software for roof defects recognition on aerial photographs. J. Phys. Conf. Ser. 1015(3), 032152 (2018)
4. Friesen, W., Ekman, P.: EMFACS-7: emotional facial action coding system. Unpublished manuscript, University of California, San Francisco 2(36), 1 (1983)
5. Lyons, M.J., Akemastu, S., Kamachi, M., Gyoba, J.: Coding facial expressions with Gabor wavelets. In: 3rd IEEE International Conference on Automatic Face and Gesture Recognition, pp. 200–205 (1998)
6. Shan, C., Gong, S., McOwan, P.W.: Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis. Comput. 27(6), 803–816 (2009)
7. Kotsia, I., Pitas, I.: Facial expression recognition in image sequences using geometric deformation features and support vector machines. IEEE Trans. Image Process. 16(1), 172–187 (2006)


8. Wang, J., Yin, L., Wei, X., Sun, Y.: 3D facial expression recognition based on primitive surface feature distribution. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006) (2006)
9. Lopes, A.T., de Aguiar, E., De Souza, A.F., Oliveira-Santos, T.: Facial expression recognition with convolutional neural networks: coping with few data and the training sample order. Pattern Recogn. 61, 610–628 (2017)
10. Mollahosseini, A., Chan, D., Mahoor, M.H.: Going deeper in facial expression recognition using deep neural networks. In: IEEE Winter Conference on Applications of Computer Vision (WACV) (2016)
11. Ding, H., Zhou, S.K., Chellappa, R.: FaceNet2ExpNet: regularizing a deep face recognition net for expression recognition. In: 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017) (2017)
12. Zhang, K., Huang, Y., Du, Y., Wang, L.: Facial expression recognition based on deep evolutional spatial-temporal networks. IEEE Trans. Image Process. 26(9), 4193–4203 (2017)
13. Zhang, T., Zheng, W., Cui, Z., Zong, Y., Yan, J., Yan, K.: A deep neural network-driven feature learning method for multi-view facial expression recognition. IEEE Trans. Multimedia 18(12), 2528–2536 (2016)
14. Face API from Microsoft Azure. https://azure.microsoft.com/ru-ru/services/cognitive-services/face/#detection. Accessed 26 May 2019
15. Amazon Emotion API. https://docs.aws.amazon.com/rekognition/latest/dg/API_Emotion.html. Accessed 26 May 2019
16. Affectiva Emotion SDK. https://www.affectiva.com/product/emotion-sdk/. Accessed 26 May 2019
17. Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The extended Cohn-Kanade dataset (CK+): a complete expression dataset for action unit and emotion-specified expression. In: Proceedings of the Third International Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB 2010), pp. 94–101 (2010)
18. Carrier, P.-L., Courville, A.: Challenges in representation learning: facial expression recognition challenge (2013). https://www.kaggle.com/c/challenges-in-representationlearning-facial-expression-recognition-challenge/data. Accessed 26 May 2019
19. Facial expressions. A set of images for classifying facial expressions. https://github.com/muxspace/facial_expressions. Accessed 26 May 2019
20. Afifi, M., Abdelhamed, A.: AFIF4: deep gender classification based on an AdaBoost-based fusion of isolated facial features and foggy faces. J. Vis. Commun. Image Represent. 62, 77–86 (2019)
21. Mollahosseini, A., Hasani, B., Mahoor, M.H.: AffectNet: a database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 10(1), 18–31 (2017)
22. Olson, D.L., Delen, D.: Advanced Data Mining Techniques, 1st edn. Springer, Cham (2008)
23. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv:1512.03385 (2015)
24. Yudin, D., Kapustina, E.: Deep learning in vehicle pose recognition on two-dimensional images. Adv. Intell. Syst. Comput. 874, 434–443 (2019)
25. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. CVPR 2017. arXiv:1608.06993 (2017)
26. Chollet, F.: Xception: deep learning with depthwise separable convolutions. CVPR 2017. arXiv:1610.02357 (2017)
27. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. arXiv:1512.00567 (2016)
28. Chollet, F.: Keras: deep learning library for Theano and TensorFlow. https://keras.io/. Accessed 26 May 2019

Applications of Neural Networks

Use of Wavelet Neural Networks to Solve Inverse Problems in Spectroscopy of Multi-component Solutions

Alexander Efitorov, Sergey Dolenko, Tatiana Dolenko, Kirill Laptinskiy, and Sergey Burikov

D.V. Skobeltsyn Institute of Nuclear Physics, M.V. Lomonosov Moscow State University, Moscow 119991, Russia
Physical Department, M.V. Lomonosov Moscow State University, Moscow 119991, Russia
[email protected], [email protected]

Abstract. Wavelet neural networks (WNN) are a family of approximation algorithms that use wavelet functions to decompose the approximated function. They are more flexible than conventional multi-layer perceptrons (MLP), but they are more computationally expensive and require more effort to find optimal parameters. In this study, we solve the inverse problems of determination of concentrations of components in multi-component solutions by their Raman spectra. The results demonstrated by WNN are compared to those obtained by MLP and by the linear partial least squares (PLS) method. It is shown that properly used WNN are a powerful method to solve multi-parameter inverse problems.

Keywords: Wavelet neural networks · Inverse problems · Raman spectroscopy · Partial least squares · Multi-layer perceptron

1 Introduction

The inverse problems of determining the concentrations of components in multi-component solutions by processing optical spectra have been successfully solved using multilayer perceptrons (MLP) [1–3]. Such a solution is based on the properties of the MLP as a universal approximator, and the resulting solution is an approximation of a multi-parameter inverse function that maps the spectrum to the set of determined parameters of the problem. Approximation is carried out by decomposing the approximated function over the basis of transfer functions of the hidden layer of the MLP, with adjustment of the parameters of the basis functions in the process of network training. Initially, all basis functions have approximately the same parameters, including the same characteristic scale in the space of input features. With a sufficient amount of training patterns, the parameters will eventually be adjusted optimally; in particular, they will take into account the characteristics of the data – for example, the width of the spectral bands characteristic of certain components of the object of study. However, for real spectroscopic problems, the amount of available data


is usually too small (at best, several thousand patterns, with an input data dimension of the order of hundreds). This means that when training the network, it is almost inevitable that a local (rather than global) minimum of the error functional will be found, even if it is deep enough, and the solution found will be only quasi-optimal.

A possible remedy is a change of the decomposition basis such that functions of multiple different scales in the space of input features are present in the basis from the start. Such a basis is provided by wavelet neural networks (WNN) [4]. At the same time, there is reason to believe [5, 6] that in the transition to the wavelet basis the shape of the error functional changes in such a way that the local minima become deeper and approach the global minimum in depth, which decreases the average error of the approximate solution of the desired inverse-function modeling problem. In this case, the network is able to work more efficiently with data that simultaneously include spectral bands of multiple different widths.

The classical approach to the formation and training of WNN has already been worked out in detail, since historically it appeared earlier. In particular, error backpropagation by stochastic gradient descent (SGD) was used for training [7], as well as its combinations with the least squares method [8], the Kalman filter [9], and genetic algorithms (GA) [10]. In addition to a comparative analysis of the use of GA and SGD backpropagation, it is interesting to consider the new popular optimization methods Adam [11] and AdaGrad [12]. This analysis should be carried out for WNN with linear and nonlinear activation functions in the output layer, as well as with various families of wavelets.
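For concreteness, the decomposition referred to above can be written in the standard wavelet-network form of Zhang and Benveniste [4]; the scalar per-wavelon scale shown here is a simplification — classical WNN variants may also use per-dimension scales and an affine output term:

\[
\hat{f}(\mathbf{x}) = \sum_{j=1}^{N_w} w_j\,\psi\!\left(\frac{\mathbf{x}-\mathbf{b}_j}{a_j}\right),
\]

where \(\psi\) is the mother wavelet, and the shift \(\mathbf{b}_j\) and scale \(a_j\) of each of the \(N_w\) wavelons are trained together with the output weights \(w_j\), so that basis functions of multiple different scales coexist from initialization onwards.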

2 Statement of Inverse Problem and Experimental Data

The inverse problem (IP) considered in this study is the determination of the types and concentrations of components in a multi-component water solution of inorganic salts by Raman spectra. The possibility of solving this problem in principle is due to the fact that the bands of the Raman spectrum of an aqueous solution are very sensitive to the presence of dissolved salts/ions (Fig. 1). Complex ions (SO₄²⁻ – sulphates, NO₃⁻ – nitrates, CO₃²⁻ – carbonates, etc.) have their own Raman lines in the region of 500–1500 cm⁻¹ (the area of the so-called “fingerprints”), which makes it possible to uniquely determine the type of an ion and its concentration. The presence of simple ions that do not have their own Raman lines also manifests itself in the Raman spectra of aqueous solutions. Ions such as Cl⁻, I⁻, Br⁻, Na⁺, K⁺ etc. affect the shape and position of the most intense band of the spectrum – the band of stretching vibrations of water molecules in the region 3000–4000 cm⁻¹ [13–15] – and different ions have different effects. In addition, the behavior of the quantitative characteristics of the Raman spectral bands of water depends significantly on the state of the solutes: the presence of associates, contact and non-contact ion pairs, etc. in the solution also appears in its spectrum.

This problem is inherently a complex IP. Its nature involves determining the concentrations of a large set of simultaneously dissolved substances over a wide range of concentrations – from tenths to units of mole per liter. Such tasks are


Fig. 1. Raman spectra of distilled water and of multi-component water solutions of inorganic salts (left – “fingerprint” area, right – Raman valence band of water). 1 – distilled water; 2 – KNO3 – 0.6 M, Li2SO4 – 0.75 M; 3 – NaCl – 0.5 M, NH4Br – 1.75 M, CsI – 0.25 M; 4 – NaCl – 0.2 M, NH4Br – 0.2 M, Li2SO4 – 0.4 M, KNO3 – 1 M, CsI – 0.6 M.

relevant in the diagnostics of wastewater and process water, mineral water, and sea and river reservoirs. Obviously, the components of the solution interact both with the solvent molecules and with each other, and these interactions are of a complex nonlinear nature; the formation of associates, ion pairs, etc. is also possible. This makes it impossible to create a model that adequately describes the molecular interactions in the solution. In addition, it should be borne in mind that the information content of different spectral channels varies. The spectral regions containing the lines of complex ions and the valence band of water are obviously the most sensitive to the type and concentration of dissolved substances, whereas the area of the deformation band of water (1600–1700 cm⁻¹) and the area of the associative band (2000–2400 cm⁻¹) are much less informative. These factors make the dependence of the signal intensity in different spectral channels on the concentration of solutes significantly nonlinear. The situation is complicated by the fact that the spectral bands that need to be analyzed simultaneously differ significantly from each other both in intensity (for example, the valence band of water is about 100 times more intense than the deformation band) and in width (for example, the width of the lines of nitrate anions is units of cm⁻¹, while the width of the valence band of water at half-height is about 500 cm⁻¹). In addition, the specificity of spectroscopic methods from the data processing point of view is that it implies solving inverse problems to extract the necessary information from high-dimensional data, since the recorded spectra contain thousands of channels.

Previous experience of the authors showed that such multi-parameter IPs are quite effectively solved with the help of MLP. The developed methods, together with several methods for reducing the dimension of the input data, allowed simultaneous determination of the concentrations of 5 salts in water – NaCl, NH4Br, Li2SO4, KNO3,


CsI – with an average error of 0.02 M in concentration measurement when operating in the concentration range 0…2.5 M [1]. (Here we shall call this IP the “5 salts problem”.) However, at present, many applications require greater accuracy of salt identification and concentration determination in multicomponent media with an increasing number of components in solution, for example, diagnostics of process water and wastewater. Therefore, it is necessary to develop new methods and approaches that take into account the increasing complexity of interactions in solutions and the specifics of spectroscopic methods.

The 5 salts problem provided the determination of individual concentrations of components in a solution of inorganic salts, treating the component salt as a whole (cation and anion together). Clearly, at concentrations far from the solubility limit, a salt in solution is in a completely dissociated state, and cations and anions exist in the solution independently of each other and have independent effects on the Raman spectrum. That is, it is correct to understand as a component of the solution not the salt as a whole but a particular ion. Thus, a more correct formulation of the problem is the identification and determination of the concentration of individual ions in multicomponent solutions. In this case, the number of components to be determined increases dramatically. Moreover, the molarity of one ion does not always exactly match the molarity of the counter-ion, as it did in the 5 salts problem, where salts with non-repeating ions were dissolved. Therefore, the problem of diagnostics of multicomponent ion solutions is much more complicated. This second IP (for solutions of 10 salts with 10 repeated ions Na⁺, NH₄⁺, Li⁺, K⁺, Cs⁺, Cl⁻, Br⁻, SO₄²⁻, NO₃⁻, I⁻; we shall call it the “10 ions problem”) was solved with the help of MLP [3]. The accuracy of determination was 10⁻⁴ M for complex ions and 10⁻³ M for simple ions. However, while such accuracy is quite satisfactory for monitoring discharge and formation waters, the diagnostics of e.g. mineral waters requires higher accuracy – down to 10⁻⁵–10⁻⁶ M. The 5 salts and 10 ions problems were first compared from the point of view of their solution by MLP in [16]. The data array for the 5 salts problem consisted of 9144 patterns (spectra) with 1535 input features (channels); for the 10 ions problem, of 4445 patterns with 1824 features.

3 Results: Feature Extraction

First we present the results of solving the 10 ions problem with the partial least squares (projection to latent structures, PLS) method [17] and with MLP. The dataset was randomly divided into training, validation and test sets in a ratio of 70:20:10. PLS and MLP were applied both to the initial data and to data processed by various compression methods. Data compression is used to reduce the dimension of the input data: in inverse problems of spectroscopy, the spectra contain thousands of channels, making any approximation method prone to overtraining. At the same time, not all spectral channels are equally informative, and reducing the input dimension very often increases the accuracy of the solution. In this case, only the most informative input features remain, and the PLS model is built, or the MLP is trained, on patterns with a smaller number of input features extracted by some algorithm.
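A sketch of this PLS baseline with the stated 70:20:10 split is shown below, using scikit-learn (which the paper names for its experiments); the placeholder arrays and the number of latent components are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Placeholder data with the 10-ions-problem dimensions (4445 spectra x 1824 channels,
# 10 target concentrations); real spectra would replace these arrays.
X = np.random.rand(4445, 1824)
Y = np.random.rand(4445, 10)

# 70% train, then 20% validation / 10% test from the remaining 30%
X_train, X_rest, Y_train, Y_rest = train_test_split(X, Y, test_size=0.30, random_state=0)
X_val, X_test, Y_val, Y_test = train_test_split(X_rest, Y_rest, test_size=1/3, random_state=0)

pls = PLSRegression(n_components=20)  # number of latent structures: an assumption
pls.fit(X_train, Y_train)
print(mean_absolute_error(Y_test, pls.predict(X_test)))  # averaged over the 10 outputs
```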


The input data can be compressed in different ways. The simplest method is the aggregation of spectral channels, which consists in summing the intensities over some number of neighboring channels and averaging over these channels. In this study, in addition to channel aggregation, input data compression using the discrete and continuous wavelet transforms (DWT and CWT) was used. In this case, the initial spectrum is considered as a scale space with the best resolution, and for a given basis of orthogonal functions there is a set of subspaces with less detail. The DWT was calculated in the R language using the wavethresh library (Wavelet Statistics and Transforms) [18]; wavelets of the Daubechies 10 family [19] were used. The CWT was calculated using our own code implementation in Python, supporting parallel computations on GPU through the use of the tensorflow library [20]. Computational experiments with MLP training were carried out in Python on the basis of the machine learning libraries scikit-learn [21] and tensorflow. Construction of the PLS model was stopped when convergence was achieved on the training set.

The results of the application of the PLS method are shown in Fig. 2. As algorithms for compression of the input data, we used aggregation over 8 adjacent input features, DWT at the 4th, 5th, 6th and 7th levels, and CWT with convolution widths of 8, 16, 32 and 64 channels.
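The DWT compression step can be sketched in Python with the PyWavelets package as an equivalent of the R wavethresh computation described above (an assumption — the authors used R; `'db10'` stands in for the Daubechies 10 family):

```python
import numpy as np
import pywt

def dwt_features(spectrum, level=5, wavelet="db10"):
    """Compress a spectrum: keep the approximation and the coarsest detail coefficients."""
    # wavedec returns [cA_level, cD_level, cD_level-1, ..., cD_1]
    coeffs = pywt.wavedec(spectrum, wavelet, level=level)
    approx, coarsest_detail = coeffs[0], coeffs[1]
    return np.concatenate([approx, coarsest_detail])

# Placeholder spectrum with 1824 channels (the 10 ions problem dimension)
features = dwt_features(np.random.rand(1824))
print(features.shape)  # far fewer input features than channels
```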

Fig. 2. Application of the PLS method to data with different compression of input features: mean absolute error on the test dataset. The methods used are: Aggr – aggregation, DWT – discrete wavelet transform, CWT – continuous wavelet transform; the number of input features is separated by a space.

As can be seen, some methods of input data compression improve the result of the PLS method on the 10 ions IP compared with the initial data. In the case of DWT, the best result is achieved at level 5 (32 approximation and 32 detail coefficients). Aggregation over 8 features provides a result


better than DWT. The best result is achieved when using the CWT with a window 16 channels wide (190 input features). On average, the best accuracy of salt concentration determination is 0.034 M.

For solving the 10 ions IP using MLP, a perceptron with two hidden layers (120 neurons in the first hidden layer and 60 in the second) was used. Each network was trained 5 times with different initial weights, and the results of all 5 networks were averaged. The results of applying the MLP are shown in Fig. 3. The input data used were the same as in the PLS method.
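A sketch of this perceptron and the 5-restart averaging follows, using scikit-learn's MLPRegressor; the activation, solver and iteration budget are assumptions the paper does not fix, and the arrays are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Placeholder data: compressed spectra (e.g. 190 CWT features) -> 10 ion concentrations
X_train, Y_train = rng.random((3100, 190)), rng.random((3100, 10))
X_test = rng.random((450, 190))

preds = []
for seed in range(5):  # 5 networks with different initial weights
    mlp = MLPRegressor(hidden_layer_sizes=(120, 60), random_state=seed, max_iter=500)
    mlp.fit(X_train, Y_train)
    preds.append(mlp.predict(X_test))

Y_pred = np.mean(preds, axis=0)  # average the outputs of the 5 networks
```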

Fig. 3. Application of the MLP to data with different compression of input features: mean absolute error on the test dataset. The legend is the same as in Fig. 2.

The results of applying MLP without compression of the input data are worse than the results obtained using DWT. In this case, DWT gives a greater error than aggregation, and the best result is provided by CWT. The smallest error is achieved using CWT with a window 32 channels wide (94 input features). On average, the mean absolute error of salt concentration determination is 0.023 M.

Of the three methods of informative feature extraction considered above, CWT is the best: it demonstrates the lowest values of the mean absolute error on the test dataset whether MLP or PLS is used. MLP shows significantly better results than PLS, indicating a significant nonlinearity of the problem.

4 Results: Use of Wavelet Neural Networks

As a possible alternative to the two methods described above, we investigated the classical wavelet neural network trained by gradient methods using the error backpropagation algorithm. We created a software implementation of classical wavelet neural networks using modern parallel programming techniques.


First, an implementation of the classical WNN scheme was created on the basis of the Python programming language and a number of its libraries. This initial implementation was classic Python code within object-oriented programming. It allowed us to work through all the computational operations performed during training and application of the WNN, and to observe the evolution of the parameters throughout training. The following problems were identified: saturation of weights, and the exit of their values (via the shift parameter) out of the domain of definition of the wavelet functions during training. In combination with the multiplication procedure inside the wavelon, this causes the wavelon to output zero both in the forward pass and in the backpropagation of the error, which makes further adjustment of the weights by a gradient method impossible. The second parameter whose values also need to be artificially limited is the scale parameter: if its value is very large, the domain of definition suffers again, the function effectively degenerating into a delta function, with negative consequences similar to those mentioned above. At the same time, these parameters are interrelated, so simply imposing hard constraints on their values restricts the domain of definition too much and often prevents finding optimal solutions by stochastic gradient descent (SGD). The main way to deal with these problems was the use of special, effective approaches to setting the initial values of the WNN weights (parameters).

In this study, new gradient descent algorithms, Adam and Adadelta, were tested in WNN training in comparison with the classical SGD. As expected, SGD demonstrated slower convergence and a high degree of dependence on weight initialization, and, with unlucky initialization, the problems described above: going beyond the domain of definition of the wavelet functions and the need to interrupt training. However, when SGD was run many times, it was usually possible to obtain a model comparable in properties to one trained by the Adam algorithm. The Adadelta method did not allow obtaining the best solutions; however, it should be noted that it often tended toward large values of the learning rate parameters, and this method may require a more thorough search for the optimal parameters of the training algorithm. Note that across the problems tested, and the three dimension-reduction methods for each of them, SGD surpassed Adam in only one scenario; in all other cases it was WNN trained by Adam that showed the best results. Therefore, the most effective approach has been the combination of setting limits on the values taken by the parameters and using the Adam method for training.

The next stage was the implementation of WNN training and application on the basis of the tensorflow high-performance machine learning library, which allowed the use of multithreaded calculations on CPU and GPU, greatly reducing calculation time. Writing control scripts for a heterogeneous computing cluster allowed us to run calculations simultaneously on more than 150 processor cores, managing all data storage and processing procedures from the cluster control terminal.

Finally, the results of solving the 5 salts and 10 ions IPs using the classical WNN were compared with the results obtained using the classical MLP and the PLS method. The comparison is presented in Fig. 4 (5 salts) and Fig. 5 (10 ions).
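A minimal sketch of a wavelon layer with the safeguards discussed above — clipping that keeps shifts inside the wavelet's effective support, a positive scale via an exponential parameterization, and Adam as the optimizer — is given below. The Mexican-hat mother wavelet, the clipping bounds and the layer sizes are assumptions for illustration; the authors' exact WNN scheme is not reproduced.

```python
import tensorflow as tf

class WaveletLayer(tf.keras.layers.Layer):
    """One layer of wavelons: product of per-dimension Mexican-hat wavelets."""
    def __init__(self, n_wavelons):
        super().__init__()
        self.n_wavelons = n_wavelons

    def build(self, input_shape):
        d = int(input_shape[-1])
        # Shift b and log-scale per wavelon and input dimension; initialization matters
        self.b = self.add_weight("b", shape=(self.n_wavelons, d),
                                 initializer=tf.keras.initializers.RandomUniform(-1.0, 1.0))
        self.log_a = self.add_weight("log_a", shape=(self.n_wavelons, d),
                                     initializer="zeros")

    def call(self, x):
        # Clip parameters so wavelons stay inside their effective support
        # (illustrative bounds) and gradients do not vanish to zero
        b = tf.clip_by_value(self.b, -3.0, 3.0)
        a = tf.exp(tf.clip_by_value(self.log_a, -2.0, 2.0))  # exp keeps scales positive
        u = (x[:, None, :] - b) / a                  # shape (batch, wavelons, d)
        psi = (1.0 - u**2) * tf.exp(-0.5 * u**2)     # Mexican-hat mother wavelet
        return tf.reduce_prod(psi, axis=-1)          # product over input dimensions

# 32 wavelons and a linear output layer for 10 ion concentrations, trained with Adam
model = tf.keras.Sequential([WaveletLayer(32), tf.keras.layers.Dense(10)])
model.compile(optimizer="adam", loss="mae")
```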


Fig. 4. Comparison of the results of solving the 5 salts problem by the algorithms of WNN, MLP and PLS on the initial data and after compression by PCA, DWT and CWT methods with the best parameters (Figs. 2, 3). The configuration optimal for WNN in all cases was: 32 wavelons, Adam.

Fig. 5. Comparison of the results of solving the 10 ions problem by the algorithms of WNN, MLP and PLS on the initial data and after compression by PCA, DWT and CWT methods with the best parameters (Figs. 2, 3). The optimal configurations for WNN were the following: 32 wavelons, SGD (PCA); 16 wavelons, Adam (DWT); 32 wavelons, Adam (CWT).


On the basis of the performed experiments, it can be concluded that at this stage WNN occupies an intermediate position between MLP and PLS, in some scenarios even surpassing the result of MLP. This result can be considered partly successful, since there are directions for improving the WNN training technology. As mentioned above, WNN has difficulties working with high-dimensional data. For this reason, although the obtained results were somewhat worse than expected, they also showed good potential and prospects for WNN. At the same time, the problem of working with high-dimensional data remains urgent and requires further study in this direction.

Finally, this study has confirmed the results of our preceding studies regarding the comparison of the two IPs. The 10 ions IP is much more complex and nonlinear, requiring the maximum of available information to achieve the best results. Therefore, feature selection worsens the result for the 10 ions IP in all cases, and MLP turns out to be the ML algorithm providing the best results for any number of input features.

5 Conclusions

In this study, we considered the use of wavelet neural networks to solve the inverse problems of determining the composition of multi-component solutions of inorganic salts by Raman spectroscopy combined with machine learning. The results of WNN were compared to those demonstrated by multi-layer perceptrons and by the method of partial least squares (projection to latent structures). As WNN is very sensitive to the number of input features, the solution of the studied problems was preceded by feature extraction. The best result among the feature extraction methods was demonstrated by the continuous wavelet transform. At the present stage of research, WNN usually performs better than the linear PLS algorithm, but worse than an MLP; however, it has several problems in performing efficient training. Directions for possible improvement of the WNN training algorithm have been formulated.

Acknowledgement. This study has been performed with financial support from the Russian Foundation for Basic Research, projects 17-07-01479 and 19-01-00738.

References 1. Burikov, S.A., Dolenko, S.A., Dolenko, T.A., Persiantsev, I.G.: Application of artificial neural networks to solve problems of identification and determination of concentration of salts in multi-component water solutions by Raman spectra. Opt. Mem. Neural Netw. (Inf. Opt.) 19(2), 140–148 (2010) 2. Dolenko, S.A., Burikov, S.A., Dolenko, T.A., Persiantsev, I.G.: Adaptive methods for solving inverse problems in laser Raman spectroscopy of multi-component solutions. Pattern Recogn. Image Anal. 22(4), 551–558 (2012)


3. Efitorov, A., Dolenko, T., Burikov, S., Laptinskiy, K., Dolenko, S.: Neural network solution of an inverse problem in Raman spectroscopy of multi-component solutions of organic salts. In: Samsonovich, A.V., et al. (eds.) FIERCES 2016. Advances in Intelligent Systems and Computing, vol. 449, pp. 273–279. Springer, Heidelberg (2016)
4. Zhang, Q., Benveniste, A.: Wavelet networks. IEEE Trans. Neural Netw. 3(6), 889–898 (1992)
5. Li, S., Chen, S.: Function approximation using robust wavelet neural networks. In: 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2002), Proceedings, Washington, DC, USA, pp. 483–488 (2002)
6. Bellil, W., Ben Amar, C., Alimi, A.: Comparison between beta wavelet neural networks, RBF neural networks and polynomial approximation for 1D, 2D functions approximation. Int. J. Appl. Sci. Eng. Technol. 13, 33–37 (2006)
7. Zhang, J., Walter, G., Miao, Y.: Wavelet neural networks for function learning. IEEE Trans. Signal Process. 43(6), 1485–1496 (1995)
8. Zhang, Q.: Using wavelet network in nonparametric estimation. IEEE Trans. Neural Netw. 8, 227–236 (1997)
9. Sui, Q., Gao, Y.: A stepwise updating algorithm for multiresolution wavelet neural networks. In: International Conference on Wavelet Analysis and its Applications (WAA), Proceedings, Chongqing, China, pp. 633–638 (2003)
10. Lim, C.G., Kim, K., Kim, E.: Modeling for an adaptive wavelet network parameter learning using genetic algorithms. In: Fifteenth IASTED International Conference on Modeling and Simulation, Proceedings, California, USA, pp. 55–59 (2004)
11. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. arXiv (2015). https://arxiv.org/pdf/1412.6980v8.pdf. Accessed 09 June 2019
12. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
13. Rull, F., De Saja, J.A.: Effect of electrolyte concentration on the Raman spectra of water in aqueous solutions. J. Raman Spectrosc. 17(2), 167–172 (1986)
14. Dolenko, T.A., Churina, I.V., et al.: Valence band of liquid water Raman scattering: some peculiarities and applications in the diagnostics of water media. J. Raman Spectrosc. 31, 863–870 (2000)
15. Burikov, S.A., Dolenko, T.A., Velikotnyi, P.A., Sugonyaev, A.V., Fadeev, V.V.: The effect of hydration of ions of inorganic salts on the shape of the Raman stretching band of water. Opt. Spectrosc. 98(2), 235–239 (2005)
16. Efitorov, A., Dolenko, T., Burikov, S., Laptinskiy, K., Dolenko, S.: Solution of an inverse problem in Raman spectroscopy of multi-component solutions of inorganic salts by artificial neural networks. In: Villa, A.E.P., et al. (eds.) ICANN 2016, Part II. LNCS, vol. 9887, pp. 355–362. Springer, Heidelberg (2016)
17. Esbensen, K.H.: Multivariate Data Analysis—In Practice. An Introduction to Multivariate Data Analysis and Experimental Design, 5th edn. CAMO Software AS, US (2006)
18. Wavelet Statistics and Transforms. https://cran.r-project.org/package=wavethresh. Accessed 09 June 2019
19. Daubechies, I.: Ten Lectures on Wavelets. SIAM, Philadelphia (1992)
20. TensorFlow: An open source machine learning framework for everyone. https://www.tensorflow.org/. Accessed 09 June 2019
21. scikit-learn: Machine Learning in Python. http://scikit-learn.org/stable/index.html

Automated Determination of Forest-Vegetation Characteristics with the Use of a Neural Network of Deep Learning

Daria A. Eroshenkova, Valeri I. Terekhov, Dmitry R. Khusnetdinov, and Sergey I. Chumachenko

Bauman Moscow State Technical University (BMSTU), Moscow, Russia
[email protected], [email protected], [email protected]

Abstract. The article proposes a method for the automated determination of the species composition, stock coefficient and other characteristics of forest plantations with the use of deep learning. An analysis of existing approaches to forest inventory, including the use of LiDAR systems and machine learning methods, is carried out. An algorithm for solving this problem is proposed and the features of its implementation are given. The problem of combining “dense cloud” data with lidar survey data is considered, and a possible solution is proposed. The problem of segmenting tree crowns among the many other objects in these data is also considered. For crown segmentation, it is proposed to use the PointNet deep learning neural network, which segments objects given a point cloud as input. The architecture and the main features of using the neural network are briefly described, and the direction of further research is determined.

Keywords: Forest inventory · Unmanned aerial vehicle · LiDAR · Segmentation · Deep learning · Neural network · PointNet



1 Introduction

Recently, there has been intensive development of various unmanned vehicles (robots, cars, aircraft and underwater vehicles, etc.) intended for automating work processes in human activity. Their use eliminates the human factor, increases productivity, and solves the corresponding tasks most effectively. Unmanned vehicles are already used in such areas as transportation of goods, medicine, agriculture, construction, communications, weapons, monitoring of objects, etc. In some cases, unmanned vehicles carry lidar systems (Light Detection and Ranging – LiDAR) [1], which are used to automatically construct a three-dimensional map (scene) of the surrounding space and to orient the device spatially. Such systems generate huge arrays of data that can be used not only to visualize the resulting image, but also to analyze individual objects in the image, for example, using machine learning methods [2–6].



Using this approach, it is possible to improve already working solutions and to develop more effective ways of solving existing problems in various fields. Based on this, the article considers a method that determines the species composition, planting stock and other coefficients of forest plantations using machine learning methods applied to lidar data. This direction is extremely important, since the use of unmanned vehicles in forestry is becoming more popular. This is primarily because an unmanned aerial vehicle (UAV) with a LiDAR system can make long-distance flights to photograph hard-to-reach forest areas, monitor large territories, and obtain data on the characteristics of forest stands in a short time. By analyzing the data obtained by various methods, including machine learning, it is possible to assess the dynamics of the development of the forest fund in the studied area.

There are a number of studies and ready-made solutions for similar problems. For example, the Finnish company Arbonaut Oy Ltd [7] specializes in developing solutions for geographic information systems and remote sensing data processing for different areas. It uses LiDAR systems for forest inventory with a method based on sparse Bayesian regression for modeling forest characteristics [8]. This method is superior in accuracy to traditional inventory methods based on field measurements. However, it should be noted that forest plantations in Finland have a fairly strict order and a homogeneous structure, and the variety of species is not large. Such plantings are easier to analyze, unlike forest plantations in the Russian Federation, where the order and structure of forests are more chaotic and the diversity of species is much greater [9, 10]. Therefore, the solution proposed by Arbonaut Oy Ltd is not suitable for the inventory of forest plantations in the Russian Federation. The task of forest inventory is extremely relevant and requires speedy resolution, taking into account the development of big data technologies, artificial intelligence, robotics, and the complex digital transformation of the economy and social sphere of the Russian Federation by 2024.

2 Problem Definition and Algorithm Description

We analyzed the known approaches and methods for forest inventory and their implementations and noted the absence of ready-made solutions of acceptable quality [11]. In this regard, we need to develop our own method based on UAV survey data, LiDAR systems and deep learning, implemented with a neural network as one of the methods of machine learning. For this, we propose the following algorithm:

1. Combine the LiDAR data (Fig. 1a) and the «dense cloud» data (Fig. 1b) [12], which is a kind of terrain plan on a precise geodetic basis:

\[
A \cup B = C, \tag{1}
\]

where \(A = \{(a_{11}, a_{12}, \ldots, a_{1m}), \ldots, (a_{n1}, a_{n2}, \ldots, a_{nm})\}\) is the LiDAR data set, \(B = \{(b_{11}, b_{12}, \ldots, b_{1k}), \ldots, (b_{p1}, b_{p2}, \ldots, b_{pk})\}\) is the «dense cloud» data set, \(C = \{(c_{11}, c_{12}, \ldots, c_{1r}), \ldots, (c_{(n+p)1}, c_{(n+p)2}, \ldots, c_{(n+p)r})\}\) is the combined data set, \(a_{ij}\), \(b_{ij}\) and \(c_{ij}\) are the attributes of the points of the three-dimensional scene according to the data specification of each type of survey, including positional values; \(n\) is the number of points in set \(A\), \(p\) is the number of points in set \(B\), and \(m\), \(k\) and \(r\) are the numbers of point attributes.

Fig. 1. Survey data: a – lidar survey of a strip of forest, b – forest «dense cloud»

This operation is performed in the ArcMap software [13] by spatial reference. ArcMap allows you to create, view, edit, and publish maps. When using the spatial reference function, it is necessary to find clearly expressed objects in the image – crown tops. The result of this alignment is a data set containing point clouds. In addition to the positional values x, y and z, the system also stores additional information. The following attributes are recorded and saved for each laser pulse of the LiDAR system: intensity, reflection number, number of reflected signals, point classification values, extreme points of the flight line, RGB values, GPS time, scan angle and scan direction. A detailed description of each attribute can be found in the lidar data specification given in [14].

2. Segmentation of tree crowns in the combined images. Crown segmentation means that each point in the picture must be assigned to a particular tree, if the point is indeed a point of a tree, since other objects may also be present in the pictures. Thus, it is necessary to solve the segmentation problem:


\[
F(C) = \{(c_{11}, c_{12}, \ldots, c_{1r}, l_1), \ldots, (c_{(n+p)1}, c_{(n+p)2}, \ldots, c_{(n+p)r}, l_{(n+p)})\},
\]

where \(F\) is the segmentation function, \(C\) is the result of operation (1), \(c_{ij}\) are point attributes, and \(l_i\) is the variable assigning a point to a certain tree. To solve this problem, we carried out a review of existing 3D segmentation methods [15], based on the results of which we propose to use the PointNet convolutional neural network (CNN) [16].

3. After segmentation of the tree crowns, it is necessary to find the diameter of the crowns. From the crown diameter, using known dependencies [17], it is possible to determine the diameter of the stem. This parameter is important when analyzing the tree stands of the studied area.

4. Summarizing the results of items 1–3, one can determine the characteristics of forest plantations, such as the predominant species, tree species in the studied area, the height of the tree stands, the crown diameter and the stem diameter. The values of these parameters can be used to calculate the fullness and stock of plantings in a given territory. The result of the work will be a forest plantation map with a database attached to it.

3 Features of the Implementation of the Proposed Approach

The advantage of using LiDAR in the task of determining the species composition, stock coefficient and other characteristics of forest plantations is that the obtained data give the correct heights of plantations, which can be used in further analysis. The drawbacks of LiDAR data are sparse measurements (about 30 points/m²) and the absence of a color scale (it is impossible to visually distinguish forestland species). The «dense cloud» data is different. Its advantages are the high resolution of the system, i.e. a dense arrangement of points (up to 1000 points/m²), RGB images, and the presence of an infrared channel, which is used for additional studies of forests. Its limitation is the inaccurate measurement of the heights of forest plantations. When combining LiDAR and «dense cloud» data we get:
– a correct coefficient of the height of forest stands;
– a dense arrangement of points;
– the presence of an infrared channel.

We should note that it is difficult to combine two scenes into one without common points: due to the different survey positions, the angles of the points, their slant ranges and other parameters differ. One possible solution is to shoot from a UAV equipped with two LiDAR systems. Knowing the fixed distance between the cameras, we obtain the difference in the locations of the points of the two scenes relative to each other; taking this distance into account, the two scenes can be combined into one.
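A sketch of this merging step for two rigidly mounted LiDAR units is given below, reducing the known sensor offset to a pure translation (a simplification — a full rigid-body transform would also include rotation). The baseline value and array shapes are hypothetical.

```python
import numpy as np

# Fixed offset between the two sensors in metres (assumed value)
baseline = np.array([0.50, 0.0, 0.0])

def merge_scenes(cloud_a: np.ndarray, cloud_b: np.ndarray) -> np.ndarray:
    """Shift scene B into scene A's frame, then take the union as in (1).

    Points are (x, y, z) rows; extra attribute columns would be carried
    alongside in the same way.
    """
    cloud_b_aligned = cloud_b + baseline     # account for the known sensor offset
    return np.vstack([cloud_a, cloud_b_aligned])

# e.g. a sparse lidar patch (~30 points/m2) and a dense-cloud patch (~1000 points/m2)
merged = merge_scenes(np.random.rand(30, 3), np.random.rand(1000, 3))
```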


An important point in this work is the use of the PointNet CNN [16]. PointNet is a unified deep learning network architecture that learns both global and local point features, providing a simple, effective approach for solving a number of 3D recognition tasks, such as classification, part segmentation, and semantic segmentation. The network architecture is shown in Fig. 2. The network has three key modules:
– a max pooling layer (max pool) as a symmetric function for aggregating information from all points;

Fig. 2. PointNet neural network architecture

– a structure combining information on local and global point features (global feature);
– two integrated alignment networks (T-Net) [18], which are used in the input transform and feature transform blocks. T-Net aligns the feature space of the input points and the point features under geometric transformations; thus, the studied set of points remains invariant to these transformations.

The Classification Network accepts n points as input, applies transformations to the input data and features using the feature transform layer, and then aggregates the point features by max pooling; the output is a classification score over k output classes. The Segmentation Network is an extension of the Classification Network: it combines global and local point features with the per-category output estimates of the Classification Network. Here mlp denotes a multilayer perceptron, with the numbers in brackets denoting the layer dimensions. Batch normalization is used for all layers with the ReLU activation function, and dropout layers are used for the last mlp of the Classification Network (denoted mlp (512, 256, k) in Fig. 2).

PointNet directly uses unordered sets of points from Euclidean space as input; these sets have the following three properties.
1. Unordered nature. Unlike arrays of image pixels, a point cloud is a set without a specific order. In other words, a network that receives as input a set of N points in three-dimensional space must be invariant to the N! permutations of the input set in the order of data input.
2. Interaction between points: the points are not isolated and form subsets with adjacent points.


3. Invariance under transformations. The studied set of points must be invariant to certain transformations. For example, neither the global category of the point cloud nor the segmentation of points should change when the points are rotated or translated together.

A cloud of points is represented as a set of points of three-dimensional space $\{P_i \mid i = 1, \ldots, n\}$, where each point $P_i$ is a vector of coordinates $(x, y, z)$ with additional feature channels, such as color, normal, etc. For semantic segmentation, the input can be a single object intended for segmentation of the object's details, or an area from a three-dimensional scene to be segmented into objects. The model outputs $n \times m$ estimates for each of the n points and each of the m semantic subcategories. Today, the PointNet creators offer to test the network on a data set that represents point clouds of pieces of furniture and other interior items (table, door, sofa, board, ceiling, floor, etc.) in the auditoriums of Stanford University [16]. The data sets are stored in HDF5 files [19]; therefore, in order to submit user data to the network input, it is necessary to generate files of this type. Before proceeding to the PointNet learning phase, we should additionally perform the algorithms for preprocessing, dividing, and forming a data set. Metrics and additional information about the course of training and testing are recorded in log files. After learning on the training and test samples, network recognition accuracy reaches 86% [16]. This result is encouraging and is sufficient to solve the problem of determining the species composition and the stock coefficient of forest plantations.
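As a minimal illustration of the HDF5 packing step mentioned above, the following sketch uses h5py; the dataset names "data" and "label" and the fixed 2048-point cloud size are assumptions modeled on the PointNet reference code, not requirements stated in this paper.

```python
import numpy as np
import h5py

def subsample(cloud, n_points=2048, seed=0):
    """Randomly subsample (or pad by repetition) a point cloud to a fixed size."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(cloud), size=n_points, replace=len(cloud) < n_points)
    return cloud[idx]

def save_h5(path, clouds, labels):
    """clouds: (N, 2048, 3) float array; labels: (N,) integer class per cloud."""
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=np.asarray(clouds, dtype=np.float32))
        f.create_dataset("label", data=np.asarray(labels, dtype=np.int64))

# Example: two synthetic "tree" clouds with hypothetical class labels 0 and 1.
clouds = np.stack([subsample(np.random.rand(5000, 3)),
                   subsample(np.random.rand(3000, 3))])
save_h5("trees_train.h5", clouds, np.array([0, 1]))
```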

4 Conclusion

The article proposes a method for the automated determination of the species composition, stock coefficient, and other characteristics of forest plantations. It includes shooting from a UAV with a LiDAR system installed on it and using the PointNet deep learning neural network to process the received data. We described the working procedure and the algorithm for solving this problem. The article contains the description and experimental results obtained by the authors of PointNet. It is shown that the method proposed in the article is promising, but poses a difficult scientific and technical challenge. This is because the main difficulties in analyzing the obtained data are caused by the following problems:
1. Incompleteness and possible distortions of information about objects of interest due to the different types of surveys – LiDAR and "dense cloud";
2. The lack of a correct method for combining data from a LiDAR survey and a "dense cloud" survey into one scene;
3. The impossibility of training the neural network without a sufficient number of labeled data sets.
Based on this, in the next stage of the work it is planned to create a labeled set of data representing point clouds of trees. Then it is necessary to split the data into sets of HDF5 files and train the CNN PointNet with their help. Depending on the obtained


learning results, the network may be modified so as to achieve the required accuracy of tree crown segmentation. After solving this problem, it is necessary to successively solve the problems of determining the diameters of crowns and stems, as well as other related parameters used in the analysis of forest stands of the studied area. The result of this work is a map of forest plantations of the region with a database of the main characteristics of forest stands attached to it.

References
1. Weitkamp, C. (ed.): Lidar: Range-Resolved Optical Remote Sensing of the Atmosphere, vol. 102. Springer (2006)
2. Chernenkiy, V., Gapanyuk, Y., Revunkov, G., Kaganov, Y., Fedorenko, Y.: Metagraph approach as a data model for cognitive architecture. In: Biologically Inspired Cognitive Architectures Meeting, pp. 50–55. Springer, Cham (2018)
3. Lychkov, I.I., Alfimtsev, A.N., Sakulin, S.A.: Tracking of moving objects with regeneration of object feature points. In: 2018 Global Smart Industry Conference (GloSIC), pp. 1–6. IEEE (2018)
4. Neusypin, K.A., et al.: Algorithm for building models of INS/GNSS integrated navigation system using the degree of identifiability. In: 2018 25th Saint Petersburg International Conference on Integrated Navigation Systems (ICINS), pp. 1–5. IEEE (2018)
5. Serov, V.A., Voronov, E.M.: Evolutionary algorithms of stable-effective compromises search in multi-object control problems. In: Smart Electromechanical Systems, pp. 19–29. Springer, Cham (2019)
6. Knyazev, B., Barth, E., Martinetz, T.: Recursive autoconvolution for unsupervised learning of convolutional neural networks. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 2486–2493. IEEE (2017)
7. https://www.arbonaut.com/en/
8. Tipping, M.E., et al.: Fast marginal likelihood maximisation for sparse Bayesian models. In: AISTATS (2003)
9. Alexeyev, V.A., et al.: Statistical data on forest fund of Russia and changing of forest productivity in the second half of XX century. St. Petersburg Forest Ecological Center, p. 272 (2004)
10. http://www.iiasa.ac.at/web/home/research/researchPrograms/EcosystemsServicesandManagement/RussianForests.en.html
11. Hyyppä, J., et al.: Review of methods of small-footprint airborne laser scanning for extracting forest inventory data in boreal forests. Int. J. Remote Sens. 29(5), 1339–1366 (2008)
12. Thrower, N.J.W., Jensen, J.R.: The orthophoto and orthophotomap: characteristics, development and application. Am. Cartogr. 3(1), 39–56 (1976)
13. https://desktop.arcgis.com/en/arcmap/
14. Heidemann, H.K.: Lidar base specification. US Geol. Surv. (11-B4) (2012)
15. Nguyen, A., Le, B.: 3D point cloud segmentation: a survey. In: 2013 6th IEEE Conference on Robotics, Automation and Mechatronics (RAM), pp. 225–230. IEEE (2013)
16. Qi, C.R., et al.: PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017)


17. Chumachenko, S.I., et al.: Simulation modelling of long-term stand dynamics at different scenarios of forest management for coniferous–broad-leaved forests. Ecol. Model. 170(2–3), 345–361 (2003)
18. Ishiguro, H., Miyashita, T., Tsuji, S.: T-net for navigating a vision-guided robot in a real world. In: Proceedings of 1995 IEEE International Conference on Robotics and Automation, vol. 1, pp. 1068–1073. IEEE (1995)
19. Folk, M., et al.: An overview of the HDF5 technology suite and its applications. In: Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases, pp. 36–47. ACM (2011)

Depth Mapping Method Based on Stereo Pairs

Vasiliy E. Gai, Igor V. Polyakov, and Olga V. Andreeva

Nizhny Novgorod State Technical University n.a. R.E. Alekseev, Minin St., 24, Nizhny Novgorod, Russia
[email protected]

Abstract. The paper proposes a new method for solving the problem of constructing a depth map from a stereo pair of images. The recovered depth information can be used to capture the reference points of objects in the film industry when creating special effects, as well as in computer vision systems used on vehicles to warn the driver about a possible collision. The proposed method consists in using the theory of active perception at the stages of segmentation and image matching. To implement the proposed method, a software product in the C# language was developed. The developed algorithm was tested on various sets of input data. The results obtained during the experiment indicate the correct operation of the proposed method in solving the problem of constructing a depth map. The accuracy of depth mapping using the described method turned out to be comparable with the accuracy of the methods considered in the review, which suggests that the method is competitive and usable in practice.

Keywords: Theory of active perception · Depth map · Stereo pair of images

1 Introduction

One of the important tasks of computer vision is the transformation of a stereo pair of images into a three-dimensional scene. As a result of this process, the depth information of each image point is restored; obtaining an accurate depth map is the ultimate goal of three-dimensional image recovery. The recovered depth information can be used in many areas: for example, depth maps are used to capture the reference points of objects in film production when creating special effects, as well as in computer vision systems used on vehicles to warn the driver about a possible collision. Based on this, we can conclude that the development of new models and methods for solving the problem of constructing a depth map from a stereo pair is relevant.

2 The General Principle of the Methods for Constructing Depth Maps Using a Stereo Pair

The general algorithm of depth mapping using stereo images includes the following steps [1]: camera calibration, image rectification, image segmentation, search for matches between points of a pair of images, and conversion of a discrepancy map into a


depth map. In this paper, the first two stages are not considered, since they are simple geometric transformations and are solved at the hardware level in most computer vision systems. When analyzing the algorithms that implement the steps described above, the following problems were identified. The problem of segmentation: some segmentation algorithms do not have sufficient accuracy, so multiple errors associated with incorrect segmentation arise at the stage of the search for correspondences; other algorithms provide sufficient accuracy but have high computational complexity. There is also the problem of correlating the segments of two images [2]. The problem of finding matches: matching algorithms are imperfect, as a result of which the accuracy of depth map construction is reduced [3]. The problem of handling errors after matching: usually, after the stage of the search for correspondences, the discrepancy map contains a number of erroneously determined points, and their additional processing is necessary.

3 Methods of Depth Mapping

The proposed method for solving the problem of depth mapping applies the theory of active perception (TAP) at the stages of segmentation and the search for correspondence of points [4]. To solve the problem of depth mapping, the following algorithm is proposed in this paper (a pipeline sketch is given after the list):
1. Image input – receiving images from cameras or from files;
2. Pre-processing – converting images to a brightness function;
3. Segmentation – the selection of objects in the first image in order to reduce the search area later;
4. Search for matching segments – search for segments of the left image in the right image;
5. Discrepancy mapping – the formation of a matrix containing information on how much each point of the first image differs in its position in space from the same point in the second image;
6. Depth mapping – the final stage of the restoration of depth information, with subsequent visualization of the results.
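As a toy end-to-end sketch (in Python for brevity; the paper's implementation is in C#), the six stages can be chained as below. The helper bodies are deliberately trivial stand-ins and do not reproduce the actual algorithms of Sects. 4–7.

```python
import numpy as np

def to_brightness(img):                       # 2. pre-processing: RGB -> brightness
    return img.mean(axis=2) if img.ndim == 3 else img

def segment(img, n=4):                        # 3. segmentation: horizontal strips only
    h = img.shape[0] // n
    return [(i * h, (i + 1) * h) for i in range(n)]

def match_segments(segments, right):          # 4. matching: epipolar strips match 1:1
    return [(s, s) for s in segments]

def discrepancy_map(pairs, left, right):      # 5. stub: per-pixel |left - right|
    return np.abs(left - right)

def depth_map(disparity):                     # 6. normalize to 0..255 grayscale
    return (255 * disparity / max(disparity.max(), 1e-9)).astype(np.uint8)

left = np.random.rand(64, 64, 3)              # 1. image input (synthetic stereo pair)
right = np.roll(left, 2, axis=1)
L, R = to_brightness(left), to_brightness(right)
print(depth_map(discrepancy_map(match_segments(segment(L), R), L, R)).shape)
```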

4 Segmentation

The next step is to divide the image into segments. This is necessary to reduce the search area at the matching stage. Since matching points are sought on the same objects, the best solution is to divide the image into a set of objects, thereby narrowing the search area to the inner area of the objects. In addition, the two source images are epipolar, which allows the image to be divided into horizontal segments without loss of accuracy.


Based on this, it was decided to perform segmentation in two stages:
1. Division into horizontal segments;
2. Selection, within the horizontal segments, of segments based on the boundaries of objects.
Since the images fed to the system are epipolar, all the horizontal lines of one image coincide with the horizontal lines of the other. Therefore, the image can be divided into horizontal segments without compromising the accuracy of the search. After the image is divided into horizontal segments, it is necessary to select the boundaries of objects inside each horizontal segment. For this, we use filters that find the change in brightness in different directions (see Fig. 1).

Fig. 1. Filters used to select borders.

Filter F1 is used to select vertical borders, F2 – horizontal borders, F3 – diagonal borders. These filters are applied to each point in the horizontal segment. From the obtained values, the one greatest in absolute value is selected, i.e., the one denoting the greatest difference in brightness (see Fig. 2). A sketch of this step appears after Fig. 2.

Fig. 2. Segments
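A minimal sketch of the border-selection step is given below. The actual F1–F3 kernels are defined only in Fig. 1, so the 4 × 4 sign-pattern masks used here are illustrative stand-ins.

```python
import numpy as np
from scipy.signal import convolve2d

# Illustrative stand-ins for F1 (vertical borders), F2 (horizontal borders),
# and F3 (diagonal borders); the true kernels are those shown in Fig. 1.
F1 = np.array([[-1, -1, 1, 1]] * 4)           # brightness change left -> right
F2 = F1.T                                     # brightness change top -> bottom
F3 = np.tril(np.ones((4, 4))) * 2 - 1         # brightness change across the diagonal

def border_response(strip):
    """For each point of a horizontal strip, keep the filter response
    greatest in absolute value, i.e. the strongest brightness change."""
    responses = np.stack([convolve2d(strip, f, mode="same") for f in (F1, F2, F3)])
    idx = np.abs(responses).argmax(axis=0)
    return np.take_along_axis(responses, idx[None], axis=0)[0]

strip = np.random.rand(16, 64)                # one horizontal segment
print(border_response(strip).shape)           # (16, 64)
```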

For subsequent use it is necessary to form a segment model. It consists of the following elements:
1. The starting point of the segment and its description with the help of TAP.
2. The end point of the segment and its description using TAP.
TAP filters are used to describe the points. The description of a point is formed by applying all 16 TAP filters to it.


5 Segment Matching

At this point, one image is divided into segments. The next step is to search for the segments of the first image in the second one. To do this, the second image is searched for the points most similar to the beginning and end of each segment using the following algorithm (a sketch follows the list):
1. The response for the reference point is calculated for all 16 filters.
2. A 4 × 4 window is passed over the pixels of the horizontal segment of the second image; the current pixel is the coordinate of the upper left corner of the window. As the window passes over the image, the response is calculated for all 16 filters.
3. The absolute difference ("delta") of each response from the reference response found at the beginning is computed.
4. All sixteen differences are summed and saved together with the coordinates of the current position of the window.
5. Among all the obtained sums, the minimum is found, which determines the minimum difference of the found point from the original one.
6. This point is set in correspondence with the original one.
This algorithm is performed for the starting and ending points of the segment. Thus, pairs of segments of the first and second images are formed.
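A sketch of this matching step follows. The 16 TAP filters themselves come from [4] and are not listed in the paper, so 4 × 4 Walsh–Hadamard masks are used here as an illustrative stand-in; the search is also run over the whole epipolar row rather than within one segment, for brevity.

```python
import numpy as np
from scipy.linalg import hadamard

# 16 orthogonal 4x4 masks built from Hadamard rows -- a stand-in for the TAP filters.
H = hadamard(4)
FILTERS = np.stack([np.outer(H[i], H[j]) for i in range(4) for j in range(4)])

def responses(img, y, x):
    """All 16 filter responses for the 4x4 window with top-left corner (y, x)."""
    win = img[y:y + 4, x:x + 4]
    return (FILTERS * win).sum(axis=(1, 2))

def match_point(left, right, y, x):
    """On the same epipolar row of the right image, find the point whose 16
    responses are closest (in summed absolute difference) to the reference."""
    ref = responses(left, y, x)
    deltas = [np.abs(responses(right, y, xr) - ref).sum()
              for xr in range(right.shape[1] - 3)]
    return int(np.argmin(deltas))             # x-coordinate of the best match

left = np.random.rand(32, 64)
right = np.roll(left, 3, axis=1)              # synthetic 3-pixel disparity
print(match_point(left, right, y=10, x=20))   # expected: 23
```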

6 Discrepancy Mapping

The first main stage of the algorithm is discrepancy mapping – building a matrix containing information about how much each point of the first image differs in its position in space from the same point in the second image. For each point of a segment, the corresponding point is searched for in the second image; the scope of the search is limited by the size of the segment. When the desired point is found, its discrepancy is calculated by the formula

$$D = |X_1 - X_2|, \qquad (1)$$

where $X_1$ is the coordinate of the point in the first image and $X_2$ is the coordinate of the point in the second image.

7 Depth Mapping

Depth mapping is the final stage of solving the problem. At this stage, the discrepancy map is converted to a depth map. It is also necessary to handle possible errors made at the stage of the search for matches. Therefore, it was decided to apply the following formula to all points of the discrepancy map:

$$D_{x,y} = \begin{cases} D_{x,y}, & D \le Max, \\ \frac{1}{n} \sum_{i = x - n/2}^{x + n/2} D_{i,y}, & D > Max, \end{cases} \qquad (2)$$

where $D_{x,y}$ is the depth map value at a point, $Max$ is the maximum possible depth map value, and $n$ is the size of the area over which the average value is calculated. This formula is a filter: if the value of a point is greater than expected, its value is replaced with the average value of the neighboring points. This completes the depth recovery. The following formula is used to visualize the results:

$$G_{x,y} = 255 \cdot \left( D_{x,y} / D_{\max} \right), \qquad (3)$$

where $D_{\max}$ is the maximum depth map value and $G_{x,y}$ is the point value in grayscale.
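A straightforward NumPy reading of formulas (2) and (3) is sketched below; the clipping of the averaging window at image borders is an implementation choice not specified in the text.

```python
import numpy as np

def filter_discrepancy(D, max_val, n):
    """Formula (2): replace implausibly large values by the horizontal average
    over a window of size n (window clipped at borders -- an assumption)."""
    out = D.copy()
    w = D.shape[1]
    for y, x in zip(*np.where(D > max_val)):
        lo, hi = max(0, x - n // 2), min(w, x + n // 2 + 1)
        out[y, x] = D[y, lo:hi].mean()
    return out

def to_grayscale(D):
    """Formula (3): scale depth values into the 0..255 grayscale range."""
    return (255 * D / max(D.max(), 1e-9)).astype(np.uint8)

D = np.random.rand(32, 32) * 100
print(to_grayscale(filter_discrepancy(D, max_val=80, n=5)).max())
```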

8 Computational Experiment

To conduct a computational experiment, a database of stereo images was formed. The database consists of 2000 different pairs of images. For each pair of images in the database, there is also a reference depth map (see Fig. 3).

Fig. 3. An example of images used in a computational experiment (left, right and depth map)

During the experiment, each point of the reference depth map is compared with the corresponding point of the depth map obtained by the algorithm proposed in this paper. The proposed method for solving the problem of depth mapping has a set of input parameters. Therefore, in the course of the experiment, different sets of values of the input parameters of the algorithm were investigated in order to identify the set that allows depth mapping with the greatest accuracy. As a result of combining all the specified values of the input parameters, nine launch configurations were obtained. For each launch configuration of the algorithm, the following values were obtained: the accuracy of depth mapping and the average processing time of a single image. The test results of the algorithm are given in Table 1.

Table 1. Algorithm testing results

Maximum number of segments | Minimum size of segments | Accuracy, % | Average processing time, s
1 | 10 | 82.4 | 10
1 | 50 | 83.7 | 9
1 | 70 | 82.8 | 9
4 | 10 | 90.2 | 7
4 | 50 | 90.4 | 6
4 | 70 | 90.6 | 6
8 | 10 | 90.3 | 5
8 | 50 | 90.7 | 5
8 | 70 | 90.7 | 5

Table 2 presents the results of known methods for depth mapping [1].

Table 2. The results of the known methods of depth mapping

Method | Accuracy, %
SAD without segmentation | 87.6
MeanShift and SAD | 90.7
Trust Distribution Algorithm and SSD | 91.8

Comparing the data from Table 2 with the obtained results of testing the algorithm (see Table 1), we can conclude that the developed method has a depth map construction accuracy quite comparable with that of the known methods considered. As a result of testing the algorithm under normal conditions, a depth map construction accuracy of 90.7% was obtained.

References
1. Kamencay, P., Breznan, M., Jarina, R., Lukac, P., Zachariasova, M.: Improved depth map estimation from stereo images based on hybrid method. Radioeng. J. 21(1), 70–78 (2012)
2. Comaniciu, D., Meer, P.: Mean shift: a robust approach towards feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
3. Hisham, M.B.: Template matching using sum of squared difference and normalized cross correlation. In: 2015 IEEE Student Conference on Research and Development (SCOReD) (2015)
4. Utrobin, V.A.: Physical interpretations of the elements of image algebra. Uspekhi Fizicheskikh Nauk (UFN) 174(10), 1089–1104 (2004)

Semantic Segmentation of Images Obtained by Remote Sensing of the Earth

Dmitry M. Igonin and Yury V. Tiumentsev

Moscow Aviation Institute (National Research University), Moscow, Russia
[email protected], [email protected]

Abstract. In the last decade, computer vision algorithms, including those related to the problem of image understanding, have advanced considerably. One of the tasks within this problem is semantic segmentation of images, which provides the classification of objects in the image at the pixel level. This kind of segmentation is essential as a source of information for the behavior control systems of robotic UAVs. One type of picture used in this case is imagery obtained by remote sensing of the earth's surface. A significant number of neuroarchitectures based on convolutional neural networks have been proposed for solving problems of semantic segmentation of images. However, for various reasons, not all of them are suitable for working with pictures of the earth's surface obtained by remote sensing. We identify neuroarchitectures that are potentially suitable for solving the problem of semantic segmentation of images of the earth's surface and carry out a comparative analysis of their effectiveness as applied to this task.

Keywords: Earth remote sensing · Aerial and satellite imaging · 2D image · Semantic segmentation · Convolutional neural networks · Comparative analysis

1 Introduction

One of the most challenging scientific and applied problems of our time is the development of behavior control systems for highly autonomous robotic unmanned aerial vehicles (UAVs) that can perform complex missions under uncertainty [1, 2]. Such a control system, to support decision-making processes, requires information about the current situation in which the UAV operates. In obtaining such information, the most crucial role belongs to computer vision, an interdisciplinary scientific and applied area focused on problems related to the perception, analysis, and understanding of images [3–5]. It should be emphasized that it is precisely the understanding of images that provides the information necessary for decision making when controlling the behavior of a UAV. In the last decade, computer vision techniques have been actively developed, including image understanding methods based on deep learning and deep neural networks, in particular convolutional neural networks (CNN) [3, 6–10].



Concerning various types of images and problems to be solved, a significant number of CNN-based neuroarchitecture varieties have been proposed [6, 10]. One of the kinds of images that is of great importance in solving the problems mentioned above is imagery obtained by remote sensing of the earth's surface [11, 12, 20]. Working with such images, especially in real time, places severe demands on both the hardware and the algorithms. For this reason, not all neuroarchitectures based on convolutional networks are suitable for on-line semantic segmentation of images obtained by remote sensing of the earth's surface. In this regard, this article attempts to eliminate neuroarchitectures that are, for one reason or another, unsuitable for the problem considered, and to conduct a comparative analysis of the effectiveness of the potentially suitable neuroarchitectures. We discuss the results of this analysis in the following sections.

2 Semantic Segmentation as a Part of the Image Understanding Problem

We can solve the task of image understanding at several levels of granularity [3–5]:
1. Image classification. In this case, we assume that the image contains a single object (the "main object") that needs to be assigned to one of a finite set of prescribed classes. The answer, in this case, is the label of the corresponding class.
2. Object classification and localization. In addition to classifying an object, as at the previous granularity level, it is also required to localize it in the image. Such localization is carried out by enclosing the object in a bounding box. The answer is the label of the corresponding class together with the parameters of the bounding box.
3. Object detection. The task is similar to the one solved at the previous level, but for the case when there is more than one classified object in the image. The answer is a set of class labels combined with a set of bounding-box parameters for all objects detected in the image.
4. Semantic segmentation. In this case, we solve the problem at the pixel level of the analyzed image, assigning a label of the corresponding class to each pixel of the given image. In general, the answer at this granularity level is an image of the same size as the original, with the corresponding class labels assigned to each pixel. For clarity, the image areas corresponding to different classes are marked with different conditional colors.
5. Instance segmentation. This level provides additional granularity compared to semantic segmentation. Here we require not only to mark each image pixel with a corresponding label, but also to select individual instances of each of the recognized classes in the image, as in the "object detection" task. In this case, different conditional colors are assigned not to separate classes of objects, but to separate instances of these classes. For example, in the semantic segmentation task, pixels that correspond to all objects of the "person" class will be


marked with the same color, whereas in the case of instance segmentation each of the found objects of this type will be marked with its own conditional color.

The following sections discuss the solution of one of these tasks, namely, the problem of semantic image segmentation, which is critical for providing the UAV behavior control system with source data. A tool that has proven itself in solving problems of semantic image segmentation, including under conditions of uncertainty, is the convolutional neural network (CNN) in combination with deep learning methods [6, 7, 10]. During the last decade, a significant number of neuroarchitectures based on this class of neural networks have been proposed. As experience in solving semantic image segmentation problems shows, CNN-based neuroarchitectures such as U-Net [14], SegNet [15], and MultiNet [16] demonstrate the best results. There have been attempts to use other networks for semantic segmentation, in particular DenseNet [17], DeepLab [8], ICNet [18], FRRN [19], and several others. These networks, however, for various reasons do not meet the requirements arising when working with images obtained by remote sensing methods; the analysis of these reasons is beyond the scope of this article. The following sections provide a comparative analysis of the U-Net, SegNet, and MultiNet neuroarchitectures in terms of their efficiency in solving problems of semantic segmentation of images obtained by remote sensing of the earth's surface. This analysis is carried out using source data from the WorldView-3 image gallery [11].

3 The Source Data Used for Analysis

The training data required to solve the problem of semantic segmentation were formed using the gallery of multispectral images obtained by the WorldView-3 satellite [11]. This database contains tagged images of the earth's surface that can be used to recognize objects of various types. Examples of such images can be seen in Fig. 1. The classes of objects labelled in the WorldView-3 database are presented in Table 1. All images in the gallery are provided in GeoTiff format [12] in three- and 16-band versions. Using the 16-band pictures, we obtained 25 color images in RGB format with a resolution of 3396 × 3349 pixels. We plan to use the multispectral nature of the photos in the WorldView-3 gallery in our future research as a source of additional information about the objects in these images. To reduce the required computational resources, we divided each of these 25 images into images of 128 × 128 pixels; examples of the reduced pictures can be seen in Fig. 2. As a result of this operation, about 1.6 × 10⁴ patterns were obtained: 9752 training patterns, 1300 validation patterns, and 5202 test patterns. These sets of patterns are sufficient, as shown by the results of computational experiments, for training the analyzed convolutional networks.
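The tiling step can be sketched as follows (a minimal NumPy illustration; dropping the partial tiles at the right and bottom edges is an assumption, since the paper does not state how the non-divisible remainder of a 3396 × 3349 image is handled):

```python
import numpy as np

def tile_image(img, mask, size=128):
    """Cut an RGB image and its label mask into non-overlapping size x size
    patches; partial edge tiles are dropped (an assumption)."""
    h, w = img.shape[:2]
    patches, masks = [], []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            patches.append(img[y:y + size, x:x + size])
            masks.append(mask[y:y + size, x:x + size])
    return np.stack(patches), np.stack(masks)

img = np.random.rand(3396, 3349, 3)
mask = np.random.randint(0, 11, (3396, 3349))   # 11 classes, as in Table 1
X, Y = tile_image(img, mask)
print(X.shape, Y.shape)                          # (676, 128, 128, 3), (676, 128, 128)
```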


Fig. 1. Examples of images of the earth’s surface from the WorldView-3 image gallery

Fig. 2. Training patterns and their masks obtained using images from WorldView-3 gallery


As is known [13], there is a rigid relationship between the number of tunable parameters of a neural network and the number of training examples required for its training. Bearing this factor in mind, we can say that the number of examples in the generated training set is sufficient to train neuroarchitectures such as U-Net, SegNet, and MultiNet.

Table 1. Object classes tagged on images from the WorldView-3 database

Class number | Content | Label | Percentage in database
0 | Background | Background | 0.53
1 | Large building, residential, non-residential, fuel storage facility, fortified building | Buildings | 0.05
2 | Misc manmade structures | Misc | –
3 | Roads | Road | –
4 | Poor, dirt, cart track, footpath, trail | Track | –
5 | Woodland, hedgerows, groups of trees, standalone trees | Trees | 0.09
6 | Contour ploughing, cropland, grain (wheat) crops, row (potatoes, turnips) crops | Crops | 0.28
7 | River, channel et al. | Waterway | 0.007
8 | Lake, pond, pool | Standing water | 0.04
9 | Large vehicle (e.g. lorry, truck, bus), logistics vehicle | Vehicle Large | 0.0002
10 | Small vehicle (car, van), motorbike | Vehicle Small | –

4 Comparative Efficiency Analysis of Selected Neuroarchitectures

The neuroarchitectures selected for a comparative analysis of their effectiveness in solving the problem of semantic segmentation of remote sensing images can be briefly described as follows. MultiNet (Fig. 3a) [16] is a neuroarchitecture consisting of an encoder, a decoder, and a segmentation decoder; this architecture was developed for use as part of a behavior control system for unmanned cars. SegNet [15] is an autoencoder based on a convolutional neural network. The SegNet network architecture (Fig. 3b) consists of consecutive blocks, each of which contains convolution layers, UpSampling layers, ReLU activation layers, and BatchNorm normalization layers.

Fig. 3. Neuroarchitectures: (a) – MultiNet; (b) – SegNet; (c) – U-Net
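To make the encoder–decoder pattern behind these diagrams concrete, here is a minimal PyTorch sketch of one SegNet-style stage, where the max-pooling indices saved in the encoder drive the unpooling in the decoder. It illustrates the general scheme only and does not reproduce the exact layer stacks of Fig. 3.

```python
import torch
import torch.nn as nn

class SegNetStage(nn.Module):
    """One encoder block and its mirrored decoder block, SegNet-style."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                                 nn.BatchNorm2d(c_out), nn.ReLU())
        self.pool = nn.MaxPool2d(2, return_indices=True)
        self.unpool = nn.MaxUnpool2d(2)
        self.dec = nn.Sequential(nn.Conv2d(c_out, c_in, 3, padding=1),
                                 nn.BatchNorm2d(c_in), nn.ReLU())

    def forward(self, x):
        z = self.enc(x)
        z, idx = self.pool(z)          # encoder: downsample, remember indices
        z = self.unpool(z, idx)        # decoder: upsample using saved indices
        return self.dec(z)

x = torch.randn(1, 3, 128, 128)        # one 128 x 128 RGB training patch
print(SegNetStage(3, 64)(x).shape)     # torch.Size([1, 3, 128, 128])
```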


Fig. 4. Learning curves of the selected neuroarchitectures: (a) – MultiNet; (b) – SegNet; (c) – U-Net

U-Net (Fig. 3c) [14] is a standard CNN architecture for image segmentation tasks. In the ISBI competition in 2015, U-Net took first place by a large margin. The U-Net architecture has yielded the best results in biomedical applications, as well as in problems for which only a limited amount of source data is available. The quality of training is checked on the validation set, which was not involved in the learning (Fig. 4). The recognition results for test examples are presented in the form of probability matrices (Fig. 5). The value of each element of such a matrix is a probabilistic assessment of how often a class is recognized as itself (the diagonal values) or confused with another class (the off-diagonal values).
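Such a probability matrix can be computed as a row-normalized confusion matrix over the per-pixel labels; a minimal NumPy sketch (normalizing by true-class pixel counts is an assumption consistent with the description above):

```python
import numpy as np

def probability_matrix(y_true, y_pred, n_classes=11):
    """Row-normalized confusion matrix: entry (i, j) estimates the probability
    that a pixel of true class i is predicted as class j."""
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (y_true.ravel(), y_pred.ravel()), 1)
    return cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)

y_true = np.random.randint(0, 11, (128, 128))
y_pred = np.random.randint(0, 11, (128, 128))
M = probability_matrix(y_true, y_pred)
print(M.diagonal().mean())   # mean of the diagonal, as used in the conclusions
```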


Fig. 5. Test patterns results: (a) – MultiNet; (b) – SegNet; (c) – U-Net


5 Conclusions

With a fixed size of the training set, neuroarchitectures with a smaller number of adjustable parameters have an advantage, due to the tight connection between this number and the number of training examples. Under these conditions, the best results were shown by the SegNet network, for which the average value of the diagonal elements of the probability matrix is higher than for the MultiNet and U-Net networks. It should be noted, however, that the recognition of objects belonging to the Vehicle class, which is essential for the applications in question, is a difficult task for all the analyzed networks.

Acknowledgement. This research is supported by the Ministry of Science and Higher Education of the Russian Federation as Project No. 9.7170.2017/8.9.

References
1. Finn, A., Scheding, S.: Developments and Challenges for Autonomous Unmanned Vehicles. Springer, Heidelberg (2010)
2. Valavanis, K.P.: Advances in Unmanned Aerial Vehicles: State of the Art and the Road to Autonomy. Springer, Netherlands (2007)
3. Favorskaya, M.N., Jain, L.C. (eds.): Computer Vision in Control Systems. Aerial and Satellite Image Processing, vol. 3. Springer, Heidelberg (2018)
4. Szeliski, R.: Computer Vision: Algorithms and Applications. Springer, London (2011)
5. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 2nd edn. Prentice-Hall, New Jersey (2002)
6. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press, Cambridge (2017)
7. Zhao, Z.-Q., et al.: Object detection with deep learning: a review. arXiv:1807.05511v2 [cs.CV]. Accessed 16 Apr 2019
8. Chen, L.-C., et al.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv:1606.00915v2 [cs.CV]. Accessed 12 May 2017
9. Hu, R., et al.: Learning to segment everything. arXiv:1711.10370v2 [cs.CV]. Accessed 27 Mar 2018
10. Gu, J., et al.: Recent advances in convolutional neural networks. arXiv:1512.07108v6 [cs.CV]. Accessed 19 Oct 2017
11. WorldView-3 Satellite Imagery, DigitalGlobe, Inc. (2017)
12. Qu, J.J., et al.: Earth Science Satellite Remote Sensing: Data, Computational Processing, and Tools, vol. 2. Springer, Heidelberg (2006)
13. Haykin, S.: Neural Networks and Learning Machines, 3rd edn. Pearson Prentice Hall, New York (2009)
14. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. arXiv:1505.04597v1 [cs.CV]. Accessed 18 May 2015
15. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. arXiv:1511.00561v3 [cs.CV]. Accessed 10 Oct 2016
16. Teichmann, M., et al.: MultiNet: real-time joint semantic reasoning for autonomous driving. arXiv:1612.07695v2 [cs.CV]. Accessed 8 May 2018


17. Huang, G., et al.: Densely connected convolutional networks. arXiv:1608.06993v5 [cs.CV]. Accessed 28 Jan 2018
18. Zhao, H., et al.: ICNet for real-time semantic segmentation on high-resolution images. arXiv:1704.08545v2 [cs.CV]. Accessed 20 Aug 2018
19. Pohlen, T., et al.: Full-resolution residual networks for semantic segmentation in street scenes. arXiv:1611.08323v2 [cs.CV]. Accessed 6 Dec 2016
20. Cheng, G., Han, J., Lu, X.: Remote sensing image scene classification: benchmark and state of the art. Proc. IEEE 105(10), 1865–1883 (2017)

Diagnostics of Water-Ethanol Solutions by Raman Spectra with Artificial Neural Networks: Methods to Improve Resilience of the Solution to Distortions of Spectra

Igor Isaev¹,², Sergey Burikov¹,², Tatiana Dolenko¹,², Kirill Laptinskiy¹,², and Sergey Dolenko¹

¹ D.V. Skobeltsyn Institute of Nuclear Physics, M.V. Lomonosov Moscow State University, Moscow, Russia
[email protected], [email protected]
² Physical Department, M.V. Lomonosov Moscow State University, Moscow, Russia

Abstract. In this study, we consider adding noise during training of a neural network as a method of improving the stability of its solution to noise in the data. We tested this method in solving the inverse problem of Raman spectroscopy of aqueous ethanol solutions, for a special type of distortion caused by changes in the laser pump power, leading to compression or stretching of the spectrum. In addition, we tested the method on the spectra of real alcoholic beverages.

Keywords: Neural networks · Inverse problems · Raman spectroscopy · Water-ethanol solutions

1 Introduction

The problem of quality control of alcoholic beverages considered in this paper is to detect toxic impurities (methanol, fusel oils, etc.) and to determine their concentrations. The methods for solving this problem must be accurate, cheap, fast, and non-contact. Currently, there are a number of methods that allow solving this problem with sufficiently high accuracy: chromatography [1], NMR [2, 3], and chemical methods. However, they are expensive and time-consuming, and are not contactless, i.e., they require opening the container and extracting a certain amount of sample. As an alternative, the method of Raman spectroscopy [4–6] was proposed, which is fast and non-contact, and does not require complex sample preparation or expensive reagents. Unfortunately, there is currently no analytical solution for the inverse problem (IP) of Raman spectroscopy, and empirical methods based on measuring the intensity of characteristic lines [4] are not applicable in the case of a large number of components. Therefore, machine learning methods are actively used to solve the IP in spectroscopy. (This study was performed at the expense of the Russian Science Foundation, project no. 19-11-00333.)


For example, artificial neural networks (ANN) have been successfully used to determine the concentrations of salts dissolved in water by Raman spectra [5, 6], for rapid determination of wine components by absorption spectra [7], and to determine the content of glucose in urine by IR absorption spectra [8]. The IP considered in this paper, like many other IPs, is characterized by ill-posedness and poor conditioning, resulting in high sensitivity of the solution to noise in the data. Although ANNs by themselves have the ability to work with noisy data, in the case of IPs this ability is not enough, requiring the development of special approaches to improve the stability of the neural network solution. In previous studies of the authors [9–11], it was proposed to use the addition of noise during training to improve the stability of the neural network solution of an IP. The basis for this is a number of studies where it was shown that this method could improve the generalizing capabilities of the network [12, 13], prevent overtraining [14–16], and increase the speed of training [17], and that its use is equivalent to Tikhonov regularization [18]. In this paper, this method was tested on the IP of spectroscopy of aqueous ethanol solutions. A special type of distortion affecting the entire spectrum at once was considered.

2 Problem Statement

2.1 Data Preparation

Experimental Setup. A spectrometer consisting of an argon laser (wavelength 488 nm, power 200 mW), a monochromator, and a CCD detector was used. The spectra were recorded in the range of 200–3800 cm⁻¹ with a resolution of 2 cm⁻¹. For each sample, 10 spectra were taken and then averaged. The fundamental possibility of determining the concentrations of ethanol and the impurities rests on the fact that each of the components has specific lines in the ranges 200–1600 cm⁻¹ and 2600–3800 cm⁻¹ (Fig. 1). Concentrations in single-component aqueous solutions can be determined from the intensity of these lines [4]. However, for multi-component solutions this approach is not applicable, since the lines of the components under consideration overlap (Figs. 1 and 2, left).

Simulated Alcohol Drinks Set. Alcoholic beverages of different strength were modeled, and the following ethanol concentrations were considered: 35, 38, 40, 42, 45, 49, 53, 57%. Impurity concentrations varied from zero to the lethal dose and were as follows: methanol – 0, 0.05, 0.14, 0.4, 1.1, 3.1, 8.6, 24%; fusel oil – 0, 0.025, 0.07, 0.22, 0.66, 2, 6, 18%; ethyl acetate – 0, 0.17, 0.35, 0.7, 1.4, 2.8, 5.6, 11.2%. Fusel oil was modeled by a mixture of isoamyl and isopropyl alcohols in a ratio of 70/30. 4043 spectra with various combinations of the considered components were recorded (Fig. 2).

Real Alcohol Drinks Set. To test the results of the work, a data set containing spectra of 69 real alcoholic beverages – vodka, gin, tequila, liqueurs, etc. (Fig. 2, right) – was


Fig. 1 Raman spectra of pure substances.

Fig. 2 Raman spectra of water-ethanol solutions.

recorded. In addition, spectra of pure alcohol and distilled water were also included, giving 73 patterns in total.

2.2 Description of Distortions

Experimental data of the spectroscopy IP are subject to distortions of the following types:
(a) Deviations in the concentrations of the solution components due to inaccuracies in the preparation of solutions (the "true" concentrations of the components for each spectrum were not measured by an alternative method, but were set during the preparation of each sample).


(b) Random errors in determining the intensity of the spectral channels by the CCD detector.
(c) Frequency shift of the spectral channels, which may be caused by an uncontrolled change in the adjustment of the experimental setup when replacing the sample.
(d) Spectral distortions caused by a change in the laser power or in the absorption coefficient of the sample container, which leads to stretching or contraction of the spectrum.
(e) A variable pedestal caused by light scattering on inhomogeneities of the medium density (Fig. 2).
The purpose of this study was to verify the applicability of the previously developed methods of improving the resilience of the neural network solution of the IP to noise in the data to the problem of spectroscopy of aqueous ethanol solutions, in relation to the fourth type of distortion (stretching/contraction).

3 Solving the Problem

3.1 Data Preprocessing

To compensate for distortions such as stretching/contraction, normalization is usually used. For example, in the problem of ion concentration determination [11], Raman spectra are normalized to the area (or maximum) of the valence band of water. The basis for this is the assumption that the water concentration is approximately the same in all samples. In the case of aqueous ethanol solutions, this assumption does not hold: with an increase in the proportion of ethanol in the solution, the proportion of water decreases, due to which the intensity of the ethanol bands increases and the intensity of the valence band of water decreases (Fig. 2, center). In view of this, normalization was not performed in the present study, thus increasing the complexity of the problem being solved.

3.2 Using Neural Networks

In this study, we used a multilayer perceptron with 32 neurons in a single hidden layer. The activation function was logistic in the hidden layer and linear in the output layer. Training was carried out by the method of stochastic gradient descent. Each network was trained 5 times with various initializations of the weights, and the statistics of the application of these 5 networks were averaged. To prevent overtraining, the early stopping method was used: the initial array of spectra was randomly divided into training, validation, and test sets, which contained 2799, 779, and 445 patterns, respectively, and the training was stopped after 500 epochs without improvement of the result on the validation set.
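A minimal sketch of this network and stopping rule in PyTorch follows (an illustration of the setup above; the learning rate, full-batch updates, and loss function are assumptions, since the paper does not specify them):

```python
import copy
import torch
import torch.nn as nn

def make_mlp(n_in, n_out):
    """32 logistic (sigmoid) hidden neurons, linear output, as described above."""
    return nn.Sequential(nn.Linear(n_in, 32), nn.Sigmoid(), nn.Linear(32, n_out))

def train(model, x_tr, y_tr, x_val, y_val, patience=500, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr)   # gradient descent
    loss_fn = nn.MSELoss()
    best, best_state, stale = float("inf"), None, 0
    while stale < patience:                # early stopping on the validation set
        opt.zero_grad()
        loss_fn(model(x_tr), y_tr).backward()
        opt.step()
        val = loss_fn(model(x_val), y_val).item()
        if val < best:
            best, best_state, stale = val, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
    model.load_state_dict(best_state)
    return model

# Toy dimensions: 100 spectral channels, 4 target concentrations.
model = train(make_mlp(100, 4), torch.randn(64, 100), torch.randn(64, 4),
              torch.randn(16, 100), torch.randn(16, 4), patience=50)
```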

3.3 Method of Training with Noise Addition

In [9] it was shown that the optimal method of training was the one in which the ANN was trained on a training set with added noise, while the stopping of the


training was controlled by a validation set without noise. In this case, the quality of the solution was higher and the training time shorter. This approach was used in the present study. The type of distortion (stretching/contraction) considered in this paper was modeled as multiplicative noise. Two noise statistics were considered – Gaussian and uniform. The noise levels considered were 1, 3, 5, 10, and 20%. Thus, including the initial data sets without noise, 11 training sets and 11 test sets, as well as 1 validation set, were used. Each initial pattern of the training and test sets was presented in 10 noise realizations. Networks trained on a training set with a certain noise level were applied to test sets of all noise levels of the same statistics.
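A sketch of this noise-injection scheme is given below (multiplicative noise, Gaussian or uniform statistics, 10 realizations per pattern); interpreting the "noise level" as the relative standard deviation (Gaussian) or half-width (uniform) of the multiplicative factor is an assumption.

```python
import numpy as np

def noisy_realizations(spectra, level, statistics="gaussian", n_real=10, seed=0):
    """Multiplicative noise: every channel is scaled by a random factor around 1.
    `level` is the relative noise level, e.g. 0.05 for 5%."""
    rng = np.random.default_rng(seed)
    shape = (n_real,) + spectra.shape
    if statistics == "gaussian":
        factors = rng.normal(1.0, level, shape)
    else:                                          # uniform statistics
        factors = rng.uniform(1.0 - level, 1.0 + level, shape)
    return (spectra[None] * factors).reshape(-1, spectra.shape[-1])

spectra = np.random.rand(2799, 1800)               # training spectra (channel count assumed)
train_5pct = noisy_realizations(spectra, 0.05)
print(train_5pct.shape)                            # (27990, 1800): 10 realizations each
```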

4 Results

4.1 Simulated Alcohol Drinks Set

For the first data set, which modeled alcoholic beverages, the results for ethanol are shown in Fig. 3. One can see that the resilience of the solution to distortions in the data is higher for distortions with uniform statistics (Fig. 3, right) than for distortions with Gaussian statistics (Fig. 3, left).

Fig. 3 The dependence of the quality of the solution (mean absolute error, MAE) for ethanol on the distortion level in the test set for various distortion statistics: left – multiplicative Gaussian distortion (mgd), right – multiplicative uniform distortion (mud). Various lines represent various distortion levels in the training set.

For the method of adding noise during training, one can see that the higher the noise level in the training set, the slower the deterioration of the solution as the noise level in the test set increases. For the other components under consideration, the nature of the dependencies is completely similar. The low level of error may indirectly indicate that the dataset is representative.

4.2 Real Alcohol Drinks Set

To solve the problem on the data containing spectra of real alcoholic beverages, the basic version of the ANN was trained without adding distortions to the training set. In this


case, the results for almost the entire data set went into saturation – they showed the lower or upper limit of the concentrations in the training sample (Fig. 4, left). This fact indicates a high degree of difference between the sets.

Fig. 4 Results of application of neural networks to determine ethanol concentration in real alcohol drinks. Left – network trained without adding distortions, right – networks trained with addition of 20% Gaussian distortions to training set. Markers represent the determined concentrations of ethanol; lines represent concentrations declared by drink producers.

Therefore, in the second case, the networks were trained at the maximum (20%) level of Gaussian noise. The results are shown in Fig. 4, right. When using the networks trained with noise, the determined concentrations were close to those stated by the manufacturers. The average deviation was 2.07% vol.

5 Conclusion

The following conclusions can be drawn from the results of the work:
• When using this method, the following effect has been confirmed: the higher the noise level in the training set, the slower the solution quality decreases with increase of the noise level in the test set.
• The resilience of the solution to distortions in the data is higher for distortions with uniform statistics than for distortions with Gaussian statistics.
• The dataset of spectra of real alcoholic beverages differs significantly from the dataset simulating alcoholic beverages. As a result, the networks trained without adding noise failed to give reasonable results.
• Networks trained with the addition of Gaussian noise at the level of 20% showed an average deviation of 2.07% vol.
Thus, the effectiveness of the method of training with noise for improving the resilience of the neural network solution of the inverse problem of spectroscopy of aqueous ethanol solutions was confirmed.


References
1. Leary, J.: A quantitative gas chromatographic ethanol determination. J. Chem. Educ. 60(8), 675 (1983)
2. Isaac-Lam, M.: Determination of alcohol content in alcoholic beverages using 45 MHz benchtop NMR spectrometer. Int. J. Spectrosc. 2016(2526946), 8 (2016)
3. Zuriarrain, A., Zuriarrain, J., Villar, M., Berregi, I.: Quantitative determination of ethanol in cider by 1H NMR spectrometry. Food Control 50, 758–762 (2015)
4. Boyaci, I., Genis, H., et al.: A novel method for quantification of ethanol and methanol in distilled alcoholic beverages using Raman spectroscopy. J. Raman Spectrosc. 43(8), 1171–1176 (2012)
5. Dolenko, S., Burikov, S., et al.: Adaptive methods for solving inverse problems in laser Raman spectroscopy of multi-component solutions. Pattern Recogn. Image Anal. 22(4), 551–558 (2012)
6. Dolenko, S., Burikov, S., et al.: Neural network approaches to solution of the inverse problem of identification and determination of partial concentrations of salts in multi-component water solutions. LNCS, vol. 8681, pp. 805–812 (2014)
7. Martelo-Vidal, M., Vázquez, M.: Application of artificial neural networks coupled to UV–VIS–NIR spectroscopy for the rapid quantification of wine compounds in aqueous mixtures. CyTA J. Food 13(1), 32–39 (2015)
8. Liu, W., Wang, W., et al.: Use of artificial neural networks in near-infrared spectroscopy calibrations for predicting glucose concentration in urine. LNCS, vol. 5226, pp. 1040–1046 (2008)
9. Isaev, I.V., Dolenko, S.A.: Training with noise as a method to increase noise resilience of neural network solution of inverse problems. Opt. Mem. Neural Netw. (Inf. Opt.) 25(3), 142–148 (2016)
10. Isaev, I.V., Dolenko, S.A.: Adding noise during training as a method to increase resilience of neural network solution of inverse problems: test on the data of magnetotelluric sounding problem. Studies in Computational Intelligence, vol. 736, pp. 9–16 (2018)
11. Isaev, I., Burikov, S., Dolenko, T., Laptinskiy, K., Vervald, A., Dolenko, S.: Joint application of group determination of parameters and of training with noise addition to improve the resilience of the neural network solution of the inverse problem in spectroscopy to noise in data. LNCS, vol. 11139, pp. 435–444. Springer, Cham (2018)
12. Holmstrom, L., Koistinen, P.: Using additive noise in back-propagation training. IEEE Trans. Neural Netw. 3(1), 24–38 (1992)
13. Matsuoka, K.: Noise injection into inputs in back-propagation learning. IEEE Trans. Syst. Man Cybern. 22(3), 436–440 (1992)
14. An, G.: The effects of adding noise during backpropagation training on a generalization performance. Neural Comput. 8(3), 643–674 (1996)
15. Zur, R.M., Jiang, Y., Pesce, L.L., Drukker, K.: Noise injection for training artificial neural networks: a comparison with weight decay and early stopping. Med. Phys. 36(10), 4810–4818 (2009)
16. Piotrowski, A.P., Napiorkowski, J.J.: A comparison of methods to avoid overfitting in neural networks training in the case of catchment runoff modeling. J. Hydrol. 476, 97–111 (2013)
17. Wang, C., Principe, J.C.: Training neural networks with additive noise in the desired signal. IEEE Trans. Neural Netw. 10(6), 1511–1517 (1999)
18. Bishop, C.M.: Training with noise is equivalent to Tikhonov regularization. Neural Comput. 7(1), 108–116 (1995)

Metaphorical Modeling of Resistor Elements

Vladimir B. Kotov, Alexandr N. Palagushkin, and Fedor A. Yudkin

Scientific Research Institute of System Analysis, Moscow, Russia
[email protected]

Abstract. Variable resistors that change their resistance during functioning may become the basis for the creation of neural network elements (synapses, neurons, etc.). The processes leading to the resistance change are extremely complicated and are not yet amenable to correct description. To master the possibilities of using variable resistors, it is reasonable to use metaphorical modeling, i.e., to replace a complex physical system with a simple mathematical system with a small number of parameters that reproduces the important features of the real system's behavior. A simple (elementary) resistor element whose state is determined by a single scalar variable is considered as the modeling unit. The equations describing the change of the state variable are written down. The choice of functions and parameters in the equations, as well as the methods of combining such elements with traditional electronic components (fixed resistors, capacitors, diodes, etc.), are discussed. The selection of these functions from a small set and the adjustment of several parameters allow us to obtain characteristics close to real ones. The scheme for measuring the "volt-ampere characteristics" is considered. An example of a specific selection of the functions determining the resistor element behavior is given.

Keywords: Variable resistor · State of resistor · Equation of the state change · Volt-ampere characteristics

1 Introduction

One of the most promising directions in the development of the elemental base of neuromorphic devices is mastering the possibilities of applying variable resistors [1, 2]. Such resistors change their resistance in the process of functioning and can become the basis for creating analogs of neural network elements (synapses, neurons, etc.) [2]. The special name "memristors" was even invented for these elements. However, different authors understand this name differently. Moreover, the term itself implies the presence of non-volatile memory in "memristors", which is not necessary at all for the implementation of neural elements. Therefore, in order to avoid misunderstandings, we will not use this term. The functioning of variable resistors is based on various physical processes [3–6], which are not yet fully understood due to their complexity. The constructed "physical" models are not actually physical and require the adjustment of parameters. From the point of view of the practical development of neuromorphic devices, it would be much more useful to have the simplest model reproducing the main features of the behavior, although unable


to approximate the characteristics of devices with high accuracy because of its small number of parameters. At the same time, the model construction is based on general principles; its specification is aimed at maximum simplification (provided that the required characteristic features are preserved).

2 Equations and Assumptions

The resistor element obeys Ohm's law

$$U = RI, \qquad (1)$$

where $U$ is the voltage on the resistor, $I$ the current flowing through the resistor, and $R$ the resistance. As a result of the current flow (and/or under the action of the voltage), changes occur in the resistor, and this is expressed in a change of its resistance. The state of the resistor can be described by state variables. We assume that one scalar variable $x$ is enough to describe the state, so $R = R(x)$. Then the general equation describing the change of state has the form

$$\frac{dx}{dt} = F(x, U, I, t). \qquad (2)$$

Under common conditions, the dependence of the right-hand side of the equation on time can be ignored. By means of Ohm's law, it is possible to exclude one of the quantities $U$ or $I$. As a result, we obtain the equation

$$\frac{dx}{dt} = F(x, I) \qquad (3)$$

or a similar equation with I → U. Which of these equations to use is a matter of convenience. In many cases it can be assumed that the change in resistance is mainly due to the flowing current, the current-dependent function then having a simpler form. Let us assume that the state variable lies between 0 and 1: 0 ≤ x ≤ 1. This can always be achieved by a change of variable. We also assume that in the state x = 0 the resistor has the maximum resistance, and in the state x = 1 the minimum resistance. The equations of the state change are written for 0 < x < 1. To avoid leaving the range of permissible values, the right-hand side of the equations should be considered equal to zero at x ≤ 0 and x ≥ 1. The accepted assumptions do not fix the choice of the state variable; it can be made in different ways. We can bind the state variable to the resistance R:

R(x) = R0 − ΔR·x, where R0 = R(0) > 0, ΔR = R(0) − R(1) > 0, (4)

or to the conductance G = 1/R:


G(x) = G0 + ΔG·x, where G0 = G(0) > 0, ΔG = G(1) − G(0) > 0. (5)

Let us proceed to the concretization (simplification) of the function F from Eq. (3). We use the expression

F(x, I) = Fx+(x)·FI+(I) + Fx−(x)·FI−(I) + F0(x). (6)

The first term describes the effect of a positive current on the change of the resistor state; the effect of a negative current is described by the second term. The splitting into positive and negative parts (relative to the current direction) is due to the fact that for the most interesting types of resistors the processes under positive and negative currents are different. Usually, currents of different directions tend to change the state variable in opposite directions. This is true in particular for structures of the metal-dielectric/semiconductor-metal type [6]. In this case it is convenient to define the current direction in accordance with the resistor orientation: a positive current tends to increase the state variable x, and a negative current to decrease it. We can assume that the functions FI+(I), FI−(I), describing the dependence of the rate of change of x on the magnitude of the positive and negative currents, have the following properties:

FI+(I) = 0 at I ≤ 0; FI+(I) > 0, dFI+(I)/dI > 0 at I > 0;
FI−(I) = 0 at I ≥ 0; FI−(I) > 0, dFI−(I)/dI > 0 at I < 0. (7)

We note that the properties (7) are not universal. Thus, if the state variable is the normalized temperature, heating occurs regardless of the current direction, and both summands have the same sign. However, this case is not very interesting for practice. In addition, in this case one may restrict attention to a current of one direction, so it is sufficient to keep only the first term on the right-hand side of Eq. (6). As functions with the properties (7) we can take the family of power functions on a semiaxis, that is,

FI+(I) = B+·I^(β+) at I > 0; FI−(I) = B−·(−I)^(β−) at I < 0, (8)

where B+, B− are positive coefficients and β+, β− are positive exponents. Many models in use assume unit exponents [1, 4], but other exponents may also be useful. For example, sufficiently high exponents (actually not less than 2) provide functions that serve well as replacements for threshold functions. The functions Fx+(x), Fx−(x) describe the x-inhomogeneity of the rate of the state change. It is natural to assume that

Fx+(x) ≥ 0, Fx−(x) ≤ 0 at 0 < x < 1. (9)

These functions, like the function F0(x), depend on how the state variable is defined.


The function F0(x) describes the evolution of the resistor state in the absence of current. The change of state has the character of an approach to a stationary state, which either coincides with one of the boundary states (x = 0 or x = 1) or corresponds to a zero of the function F0(x). For definiteness, let us assume that there is only one stationary (basic) state, x = 0; this is the most typical case. Then we must have

F0(x) < 0 at 0 < x < 1. (10)

The most convenient are power functions

F0(x) = −f0·x^α (f0 > 0, α ≥ 0). (11)

Equation (3) for I = 0 with a function F0(x) of the form (11) has the solution

x(t) = x(t0) − f0(t − t0) for α = 0;
x(t) = x(t0)·exp{−f0(t − t0)} for α = 1;
x(t) = [x(t0)^(1−α) + (α − 1)f0(t − t0)]^(1/(1−α)) for α ≠ 0, 1, (12)

where t0 is the initial time. At α < 1, the basic state is reached in the finite time t − t0 = x(t0)^(1−α)/((1 − α)f0), after which the state no longer changes. At α = 1, the variable x tends to zero exponentially; although the basic state is not reached, the approach to it is very fast. In both cases it makes no sense to speak of long-term memory. At α > 1, the approach to the basic state follows the power law (t − t0)^(1/(1−α)); the higher the index α, the slower the relaxation proceeds, and the memory of the initial state is retained long enough. Hence, the function (11) with a sufficiently high index α allows us to model long-term memory. If one needs to model memory with an infinite storage time, the function F0(x) should vanish on a continuous interval of x.
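The three relaxation regimes are easy to check numerically. The following minimal sketch (the values of f0 and the initial state are illustrative assumptions) integrates dx/dt = −f0·x^α directly:

```python
import numpy as np

# Relaxation (12) at zero current: dx/dt = -f0 * x**alpha.
# f0 and the initial state x0 are illustrative assumptions.
def relax(alpha, f0=1.0, x0=0.5, t_end=20.0, steps=200000):
    dt = t_end / steps
    x = x0
    for _ in range(steps):
        x = max(x - dt * f0 * x**alpha, 0.0)   # clamp at the basic state
    return x

for a in (0.5, 1.0, 3.0):
    print(a, relax(a))
# alpha < 1: the basic state x = 0 is reached in finite time;
# alpha = 1: exponential decay; alpha > 1: slow power-law decay,
# so the initial state is "remembered" for a long time.
```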

3 Circuit to Measure the Electrical Characteristics

To measure the electrical characteristics of a variable resistor, a fixed resistor with resistance r is connected in series with it, and a given voltage u(t) is applied to the resulting pair. Measuring the voltage on the fixed resistor allows one to determine the current through the resistors, the voltage on the variable resistor, and its resistance. The fixed resistor is also needed to limit the current: at R(1) ≪ R(0), the dynamic range of the current is very large, and in the absence of the fixed resistor the high current at x → 1 could damage the variable resistor. The resistance r is usually chosen according to the conditions R(1) ≪ r ≪ R(0).


For this circuit

I = u/(R + r), (13)

therefore Eq. (3) with the representation (6) takes the form

dx/dt = Fx+(x)·FI+(u/(R(x) + r)) + Fx−(x)·FI−(u/(R(x) + r)) + F0(x). (14)

At u ≤ 0, the first term on the right-hand side of Eq. (14) is equal to zero, and the two other summands are negative in view of the properties (7), (9), (10). This means that an accelerated relaxation to the ground state x = 0 takes place; a negative voltage u can thus be used for fast erasure of information. The second term on the right-hand side of Eq. (14) vanishes at u ≥ 0. The remaining summands have opposite signs, and their sum can be either positive or negative depending on x and u. Considering the right-hand side F(x, I) of Eq. (14) as a function of x and u, we obtain a partition of the region of permissible values 0 ≤ x ≤ 1, u ≥ 0 into regions F > 0 and F < 0. At F > 0, the state variable x increases over time; at F < 0 it decreases. The regions F > 0 and F < 0 are separated by the curve F = 0: above this curve (i.e., at larger u) lies the region F > 0, below it the region F < 0. The equation F = 0 at a given value of u determines the stationary point x_st(u) corresponding to the stationary state of the resistor at a constant source voltage u. A stationary point is a stable equilibrium point (a stable stationary point) if to its left (x < x_st) we have F > 0 and to its right F < 0; otherwise the stationary point is unstable. Besides the stationary points determined by the equation F = 0, boundary stationary points are possible. The point x = 0 is a stable stationary point if F < 0 at small positive x; the point x = 1 is a stable stationary point when F > 0 to its left. It is the stable stationary points that play the determining role under a direct voltage u, since in this case Eq. (14) describes the approach to a stationary point. In most cases the approach to the stationary point is fast enough (exponential), or the stationary point is even reached in finite time. The conclusions for the direct-voltage case can be extended to the case of a quasi-stationary voltage change, when the state of the resistor has time to adjust to the current voltage. In this case (i.e., at u > 0), the equation of the curve F = 0 can be written in the form

u = P(x), (15)

where

P(x) = (R(x) + r)·h(−F0(x)/Fx+(x)), (16)

and h(z) is the function inverse to FI+(I).

Fig. 1. Increasing function P(x) and one-dimensional trajectories at different u

Fig. 2. Nonmonotonous function P(x) and one-dimensional trajectories

Provided that the inequalities (7) hold and that the function FI+(I) is unbounded as I → +∞, the function h(z) maps the positive semi-axis onto the positive semi-axis one-to-one and monotonically. Obviously, P(x) > 0 at 0 < x < 1. Let us denote by Ps and Pi the exact upper and lower bounds of the function P(x) at 0 < x < 1; if the function P(x) is unbounded, we set Ps = ∞. For u < Pi, Eq. (15) has no solutions for x, and the only stationary (stable) point is the boundary point x = 0. At u > Ps, Eq. (15) also has no solutions, and the only stationary point is the boundary point x = 1. For Pi < u < Ps, Eq. (15) has at least one solution. If the function P(x) is increasing, then the solution of Eq. (15) is unique and determines the sole stationary (stable) point. Figure 1 shows such a curve F = 0 together with one-dimensional trajectories of the imaging point at different voltages u. If the function P(x) is monotonically decreasing, then the unique solution of Eq. (15) determines an unstable stationary point, while both boundary points x = 0 and x = 1 are stable stationary points. For a nonmonotonous function P(x), in a certain range of voltages u Eq. (15) has more than one solution. A solution x_st corresponding to a positive slope of the curve u = P(x) provides a stable stationary point, and if the slope of the curve at x = x_st is negative, we get an unstable stationary point. Additional stable stationary points can be located at the interval boundaries. For a given value of u, the number of stable stationary points must exceed the number of unstable ones by one. In typical cases there may be two stable stationary points and one unstable point (Fig. 2).


Fig. 3. Switching branches of the function x_st(u) under a quasistationary change of the voltage u

Fig. 4. Function P(x) from formula (17) with one maximum and one-dimensional trajectories

If Eq. (15) has several roots, the dependence of the stationary (stable) point on the voltage, x_st(u), is multivalued (usually double-valued) within a certain range of voltages u. Under a quasi-stationary change of the source voltage, the change of the resistor state corresponds to movement along one of the branches of the function x_st(u). If this branch ends, a transition to another branch inevitably occurs (Fig. 3). A sharp change of the resistor state, accompanied by sharp changes of the resistance, current, and voltage of the variable resistor, is the most obvious manifestation of multistability (bistability in the case of two stable stationary states).

4 Example

Let us take FI+(I) in the form (8), F0(x) as in (11), R(x) as in (4), and Fx+(x) = 1. Then

P(x) = (f0/B+)^(1/β+)·(r + R0 − ΔR·x)·x^(α/β+). (17)

In this case Pi = P(0) = 0, Ps < ∞. The function P(x) on the positive semi-axis has a maximum at

x = x_m ≡ (α/(α + β+))·((r + R0)/ΔR). (18)
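Formulas (17)–(18) are straightforward to verify numerically. In the sketch below the parameter values (f0, B+, r, R0, ΔR) are illustrative assumptions, not the authors' settings:

```python
import numpy as np

# P(x) from (17) and the maximum position (18); all parameter values
# here are illustrative assumptions.
alpha, beta, f0, B_plus = 2.0, 1.0, 1.0, 1.0
r, R0, dR = 0.02, 1.0, 0.999          # r << R0 and dR close to R0

def P(x):
    return (f0 / B_plus)**(1.0 / beta) * (r + R0 - dR * x) * x**(alpha / beta)

xs = np.linspace(1e-6, 1.0, 100001)
x_num = xs[np.argmax(P(xs))]                       # numerical maximum
x_m = alpha / (alpha + beta) * (r + R0) / dR       # formula (18)
print(x_num, x_m, x_m < 1.0)          # x_m < 1 signals possible bistability
```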

If x_m ≥ 1, then the function P(x) is monotonically increasing at 0 < x < 1, and Eq. (15) has exactly one solution at 0 < u < Ps = P(1), representing the stable stationary state. At u ≥ P(1), the only (stable) stationary state is the boundary state x = 1. If instead x_m < 1, the maximum lies within the permissible range of the variable x (Fig. 4). For 0 < u < P(1), Eq. (15) has a single solution representing the single stationary state. For P(1) < u < Ps = P(x_m), Eq. (15) has two solutions: the smaller corresponds to a stable stationary state and the larger to an unstable one.


Fig. 5. "Volt-ampere characteristic" at triangular voltage feeding

Fig. 6. Time dependence x(t) together with the graph of the normalized source voltage

The second stable stationary state is the boundary state x = 1; the same boundary state is the only stable state at u > P(x_m). Thus, the inequality x_m < 1 is the condition for the presence of bistability. Taking into account that usually ΔR ≈ R0 and r ≪ R0, so that the second factor on the right-hand side of (18) is of the order of one, we find that fulfilling the bistability condition is quite realistic. However, if the inequality x_m < 1 holds with an insufficient margin, the bistability range becomes rather narrow, and the bistability is difficult to detect. In practice, a periodically changing voltage of standard form (triangular, notched, sinusoidal) is used as the source voltage. The quasistationarity condition is often not met: the state of the resistor does not have time to get close enough to the "stationary" state for the current value of the voltage u, so the state of the resistor tends to a "stationary" state which is itself constantly changing. The resistance jumps arising due to the bistability can be strongly smoothed because of the incomplete relaxation towards the stationary state. Figure 5 shows the "volt-ampere characteristic" (more precisely, the trajectory of the point with coordinates U, I) for a triangular voltage u (of positive polarity), obtained by numerically solving Eq. (14) at α = 2, β+ = 1, R0/r = 50, R0/(R0 − ΔR) = 1000. The three loops correspond to three periods of the source voltage. The difference between the loops is explained by the fact that at the completion of a period of the voltage change, the state variable does not return to its initial value. This is clearly seen in Fig. 6, where the graph of the dependence x(t) is presented along with the graph of the normalized source voltage.
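The experiment of Figs. 5–6 can be reproduced in outline by integrating Eq. (14) under a triangular voltage. In the sketch below, f0, B+, the voltage amplitude, and the period are illustrative assumptions (plotting is omitted):

```python
import numpy as np

# Eq. (14) with Fx+ = 1, FI+(I) = B*I**beta (8), F0(x) = -f0*x**alpha (11),
# R(x) = R0 - dR*x (4), driven by a triangular positive voltage.
# f0, B, u_max and the period are illustrative assumptions.
alpha, beta = 2.0, 1.0
R0 = 1.0
r = R0 / 50.0                      # R0/r = 50, as in the text
dR = R0 - R0 / 1000.0              # R(0)/R(1) = 1000
f0, B, u_max, period = 1.0, 200.0, 0.1, 15.0

def u_src(t):                      # triangular source voltage
    s = (t % period) / period
    return u_max * (2.0 * s if s < 0.5 else 2.0 * (1.0 - s))

dt, x = 1e-4, 0.0
U, I = [], []                      # voltage on and current through the resistor
for k in range(int(3 * period / dt)):      # three periods, as in Fig. 5
    t = k * dt
    R = R0 - dR * x
    i = u_src(t) / (R + r)
    U.append(i * R)
    I.append(i)
    dx = B * i**beta - f0 * x**alpha       # right-hand side of (14), u >= 0
    x = min(max(x + dt * dx, 0.0), 1.0)
# Plotting I against U traces loops analogous to Fig. 5.
```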

5 Instead of Conclusion: Combinatorics

The considered model of a simple resistor element can explain a lot and predict something, but not everything, and this is natural: the world of resistor elements is diverse and cannot be covered by one simple model. Nevertheless, the capabilities of the model can be significantly expanded if we do not limit ourselves to one element but build various combinations on its basis.


A real variable resistor has two poles (contacts), and either contact can be a source of variable (controlled) resistance. This is in any case true for metal-dielectric-metal, metal-semiconductor-metal, and other similar structures. To model such structures, it is necessary to use not one resistor element but two antiparallel (oppositely directed) resistor elements. The resulting combination has much richer capabilities (and is more complex) than a single resistor element. At the contact points of different materials (for example, a metal and a dielectric), diverse diode structures characterized by nonlinear volt-ampere characteristics may arise. If such a characteristic can be considered constant, with no memory, then the effect of the structure reduces to a series connection of a diode or another similar nonlinear element. If the volt-ampere characteristic depends on previous events, then for such an additional diode with memory one can use a model similar to the one considered above, but with a nonlinear Ohm's law. In many cases it is convenient to treat the diode and the resistor element as a single element. A simple resistor element can act as a memory element, an analog of a synapse, if the rate of the resistor relaxation is somehow limited, for example, if the index α in formula (11) is large enough. The resistor element can also be used as a nonlinear element, an analog of a neuron, since the resulting characteristics of the resistor are essentially nonlinear, and even bistability is possible. If a capacitor is connected in parallel to a resistor element (simple or combined), it is possible to implement the "leaky integration" found in many neural networks. The considered model is not only useful for the representation of existing resistor elements; it can also indicate the direction for the improvement of perspective elements.

Funding. The work was financially supported by the State Program of SRISA RAS No. 0065-2019-0003 (AAA-A19-119011590090-2).

References
1. Adamatzky, A., Chua, L.: Memristor Networks. Springer, Heidelberg (2014)
2. Vaidyanathan, S., Volos, C.: Advances in Memristors, Memristive Devices and Systems. Springer, Heidelberg (2017)
3. Yang, J.J., Strukov, D.B., Stewart, D.R.: Memristive devices for computing. Nat. Nanotechnol. 8, 13 (2013)
4. Radwan, A.G., Fouda, M.E.: On the Mathematical Modeling of Memristor, Memcapacitor and Meminductor. Springer, Heidelberg (2015)
5. Yang, Y., Lu, W.: Nanoscale resistive switching devices: mechanisms and modeling. Nanoscale 4, 10076 (2013)
6. Palagushkin, A.N., et al.: Aspects of the a-TiOx memristor active medium technology. J. Appl. Phys. 124, 205109 (2018)

Semi-empirical Neural Network Models of Hypersonic Vehicle 3D-Motion Represented by Index 2 DAE

Dmitry S. Kozlov (1,2) and Yury V. Tiumentsev (1)

1 Moscow Aviation Institute (National Research University), Moscow, Russia
[email protected], [email protected]
2 Federal State Unitary Enterprise "State Research Institute of Aviation Systems", Moscow, Russia

Abstract. We consider a problem of mathematical modeling and computer simulation of nonlinear controlled dynamical systems represented by differential-algebraic equations of index 2. The solution of the problem is proposed within the framework of a neural network based semi-empirical approach that combines theoretical knowledge of the modeled object with training tools applied to artificial neural networks. We propose semi-empirical models of a particular form, implementing implicit Runge-Kutta integration formulas inside the activation function. The training of the semi-empirical model makes it possible to refine the models of aerodynamic coefficients implemented as a part of it. We present a semi-empirical model that uses as theoretical knowledge the equations of a full model of hypersonic vehicle motion in a specific phase of descent in the atmosphere. The simulation results are presented for the problem of identifying the aerodynamic coefficient implemented as an ANN module of a semi-empirical model of the motion of a hypersonic vehicle.

Keywords: Dynamical system · Differential-algebraic equations · Semi-empirical model · Neural network based simulation

1 Introduction

The semi-empirical approach assumes the generation of gray-box models using theoretical knowledge about the simulated object in the form of a system of ordinary differential equations (ODE) [1]. We transform the initial theoretical model into a semi-empirical one, taking into account the methods of integrating the ODE, so that neural network methods can modify parts of the model. In [1], we present simulation results confirming the high efficiency of the semi-empirical approach compared with traditional black-box dynamic neural network models, such as NARX (Nonlinear AutoRegressive network with eXogeneous inputs). The difference between the semi-empirical approach and


the NARX approach lies in the fact that in the first case, when generating a model, some of the connections between the state variables and control variables of the source ODE system are embedded into the model without change. This allows us to reduce the number of adjustable parameters of the model and improves its generalization properties. In [2], Runge-Kutta neural networks (RKNN) are proposed for building models of dynamic systems represented in the form of ODE. This approach also assumes the use of theoretical knowledge about the modeled object, in the form of explicit Runge-Kutta integration formulas implemented in the network architecture. An RKNN has layers that, taking into account the connections between state variables, implement the right-hand sides of the ODE system; when training an RKNN, the models of the right-hand side are refined. In some problems, in addition to the ODE, the theoretical model includes algebraic equality-type constraints, that is, a system of differential-algebraic equations (DAE) is the basis of the theoretical model. For DAE systems, the concept of the index of the DAE system [3] is introduced. An example of such a problem is controlling a vehicle descending in the upper atmosphere. In [1,4], the semi-empirical approach based on explicit, conditionally stable methods of numerical integration is considered. It is not possible to use this approach directly for modeling systems represented by DAE; a modification is needed that takes into account the specific character of DAE systems.

2 Semi-empirical Models for DAE Systems

Let us examine a system of differential-algebraic equations of index 2 in the semi-explicit form

ẏ = f(t, y, z, u), 0 = g(t, y), (1)

where y = y(t) is the vector of state variables of the system, z = z(t) is the state variable which is the algebraic variable of the DAE (1), and u = u(t) are the control variables. We reduce the index of the system (1) by differentiating the algebraic constraint [3]; the new algebraic constraint takes the form 0 = g̃(t, y, z, u) ≡ ∂g/∂t + (∂g/∂y)·f(t, y, z, u). For index 1 DAE systems, the use of one-step s-stage methods of numerical integration is promising; the implicit Runge-Kutta (IRK) method is often used. We propose to use the IRK method based on the Radau IIA quadrature formula [3,5,6]. Applying the IRK method to the DAE system, we get (2)–(3). Using an implicit scheme involves solving the system of nonlinear equations (2) by Newton's method at each step of integration:

Y_ni = y_n + h·Σ_{j=1..s} a_ij·f(t_n + c_j·h, Y_nj, Z_nj), 0 = g̃(t_n + c_i·h, Y_ni, Z_ni), (2)

y_{n+1} = y_n + h·Σ_{j=1..s} b_j·f(t_n + c_j·h, Y_nj, Z_nj), z_{n+1} = R(∞)·z_n + Σ_{i,j=1..s} b_i·ω_ij·Z_nj, (3)


Fig. 1. The structural scheme of the semi-empirical model

where h is the integration step, (a_ij), b_i, c_j are the Butcher table coefficients, y_n is the vector of the state variables at step t_n, ω_ij are the elements of the matrix inverse to (a_ij), and R(∞) = 1 − Σ_{i,j=1..s} b_i·ω_ij. The structural scheme of the semi-empirical model is shown in Fig. 1; we described the network structure in [6]. The neural network is trained using the RTRL (Real-Time Recurrent Learning) algorithm. We form the training set as a sequence of observed outputs for a given control and initial conditions, using a random input control signal U(t) of a specific type [5,6]. In contrast to [1,4], which implement the integration scheme in the network structure, we propose an approach whereby the procedure containing the integration scheme is specified inside the activation function of the network layer [7]. This approach allows us to implement both explicit and implicit integration schemes in a model. Let us consider the RKNN [2], which implements, to simplify the calculations, the 2-stage Heun method. The method is explicit, and the network architecture implements a cascade scheme with a single input and output (4); the value at each stage of the method is calculated using the value of the previous stage, followed by their composition:

K_0 = N_f(y_n, W), K_1 = N_f(y_n + h·K_0, W), y_{n+1} = y_n + (h/2)(K_0 + K_1), (4)

where y_n, y_{n+1} are the network input and output, respectively, and N_f, W are the ANN module implementing the right-hand sides of the ODE system and its weights. When training the network, the delta rule (5) is used to modify the weights of the ANN module. The derivatives are calculated by the chain rule, considering the fact that the network error propagates through the cascade circuit to the network input and the same ANN module is used at each stage of the method:

∂E/∂w_j = −2(o_{n+1} − y_{n+1})·∂y_{n+1}/∂w_j, ∂y_{n+1}/∂w_j = (h/2)(∂K_0/∂w_j + ∂K_1/∂w_j),
∂K_0/∂w_j = (∂N_f(y_n, W)/∂W)·(∂W/∂w_j), (5)
∂K_1/∂w_j = h·(∂N_f(y_n + h·K_0, W)/∂y)·(∂K_0/∂w_j) + (∂N_f(y_n + h·K_0, W)/∂W)·(∂W/∂w_j).
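The cascade (4) is easy to express in code. The following minimal sketch (an illustration, not the authors' implementation) realizes the RKNN forward pass with a small one-hidden-layer module as N_f; the network sizes and weights are assumptions, and the delta-rule training (5) is omitted:

```python
import numpy as np

# Forward pass of the RKNN cascade (4): the same one-hidden-layer module
# N_f plays the role of the ODE right-hand side at both stages of Heun's method.
rng = np.random.default_rng(0)
W1, b1 = 0.3 * rng.standard_normal((8, 2)), np.zeros(8)
W2, b2 = 0.3 * rng.standard_normal((2, 8)), np.zeros(2)

def N_f(y):
    return W2 @ np.tanh(W1 @ y + b1) + b2

def rknn_step(y_n, h):
    K0 = N_f(y_n)                    # first stage
    K1 = N_f(y_n + h * K0)           # second stage reuses the same module
    return y_n + 0.5 * h * (K0 + K1)

y = np.array([1.0, 0.0])
for _ in range(100):                 # roll the one-step network out in time
    y = rknn_step(y, 0.01)
print(y)
```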


For the proposed semi-empirical models that realize implicit integration schemes, it is not possible to implement a cascade scheme in the network architecture. When calculating the derivative in accordance with equation (3), the error propagates only through h·b_j·f(t_n + c_j·h, Y_nj, Z_nj); we perform the calculations taking into account that Y_ni, Z_ni are already known. Since the same ANN modules are used at each stage, such a delta rule yields several values for each weight coefficient; for the modification, we use the smallest of them.
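For the implicit case, one step of the scheme (2)–(3) can be sketched as follows. The example uses the 2-stage Radau IIA method (order 3), which is stiffly accurate, so y_{n+1} = Y_2 and z_{n+1} = Z_2 (consistent with (3), since R(∞) = 0 here); the toy right-hand side and constraint are assumptions, and scipy's fsolve stands in for the Newton iteration:

```python
import numpy as np
from scipy.optimize import fsolve

# One step of the 2-stage Radau IIA method for a semi-explicit index-1 DAE
# y' = f(y, z), 0 = g(y, z); the stage system (2) is solved by a Newton-type
# iteration (scipy's fsolve here).
A = np.array([[5/12, -1/12], [3/4, 1/4]])    # Butcher matrix, c = (1/3, 1)

def f(y, z):
    return -y + z                             # toy right-hand side (assumption)

def g(y, z):
    return z - y**2                           # toy constraint, dg/dz nonsingular

def radau_step(y_n, z_n, h):
    def residual(v):
        Y, Z = v[:2], v[2:]
        res = [Y[i] - y_n - h * sum(A[i, j] * f(Y[j], Z[j]) for j in range(2))
               for i in range(2)]
        res += [g(Y[i], Z[i]) for i in range(2)]
        return np.array(res)
    v = fsolve(residual, np.array([y_n, y_n, z_n, z_n]))
    return v[1], v[3]       # stiffly accurate: y_{n+1} = Y_2, z_{n+1} = Z_2

y, z = 0.5, 0.25
for _ in range(100):
    y, z = radau_step(y, z, 0.01)
print(y, g(y, z))           # the algebraic constraint stays satisfied
```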

3 Simulation Results

The proposed semi-empirical models can be used in trajectory-prediction algorithms for an aircraft descending in the upper atmosphere. During the descent, the flight trajectory can be divided into separate parts; the motion along each part can be performed when the state variables satisfy a specific constraint in the form of an algebraic equality [5–7]. The training of the semi-empirical model allows refining the models of the aerodynamic coefficients implemented in it as separate artificial neural network (ANN) modules. Let us consider the identification task for the aerodynamic pitching moment coefficient Cm within a model of the hypersonic vehicle motion. The hypersonic vehicle model from [8] and the standard model of the atmosphere are used in the simulation. A full model of the vehicle motion is considered, containing differential equations describing the trajectory and angular motion, as well as the equations of the actuators of the control surfaces (6)–(7); this model is for the zero-thrust phase of the flight. We use the right/left elevons and the rudder as control surfaces.

μ̇ = V cos γ sin ψ_W / (r cos λ), λ̇ = V cos γ cos ψ_W / r, sin φ_W = (cos θ sin φ + sin γ sin β)/(cos γ cos β),
Ḣ = V sin γ, V̇ = −f_xW/m − g sin γ − ω_E² r cos λ (sin λ cos ψ_W cos γ − cos λ sin γ),
ψ̇_W = f_yW/(m V cos γ) + (V/r) cos γ sin ψ_W tan λ − 2ω_E (cos λ cos ψ_W tan γ − sin λ) + ω_E² r cos λ sin λ sin ψ_W / (V cos γ),
γ̇ = −f_zW/(m V) + (V/r − g/V) cos γ + 2ω_E cos λ sin ψ_W + (ω_E² r cos λ / V)(sin λ cos ψ_W sin γ + cos λ cos γ),
T_r² d̈_r = −2T_r ξ_r ḋ_r − d_r + d_r,act, T_a² d̈_a = −2T_a ξ_a ḋ_a − d_a + d_a,act, T_e² d̈_e = −2T_e ξ_e ḋ_e − d_e + d_e,act,
I_x ṗ + (I_z − I_y) r q = L̄, I_y q̇ + (I_x − I_z) p r = M̄, I_z ṙ + (I_y − I_x) q p = N̄, M = B·M_E,
[ψ̇; θ̇; φ̇] = [0, sin φ/cos θ, cos φ/cos θ; 0, cos φ, −sin φ; 1, sin φ tan θ, cos φ tan θ]·[p − M_x; q − M_y; r − M_z],
[M_xE; M_yE; M_zE] = [cos λ; 0; −sin λ]·(ω_E + μ̇) + [0; −1; 0]·λ̇, (6)

α̇ = q − tan β (p cos α + r sin α) + (G_WZ + m·a_cWZ)/(m V cos β), d_a,act = d_pitch + d_roll,
β̇ = p sin α − r cos α + (G_WY + m·a_cWY)/(m V), d_e,act = d_pitch − d_roll,
f_xW = −D, f_yW = Y cos φ_W + L sin φ_W, f_zW = Y sin φ_W − L cos φ_W, (7)

where μ is the longitude, λ is the geocentric latitude, γ is the relative flight path angle, ψ_W is the relative azimuth, H is the altitude, r is the distance from the Earth center to the center of mass of the vehicle, V is the relative velocity, φ_W is the bank angle, α is the angle of attack, β is the angle of sideslip, [ψ, θ, φ] are the Euler angles, [p, q, r]ᵀ are the components of the angular velocity vector, B is the matrix transforming vectors from the vehicle-carried local Earth reference frame to the body-fixed frame, D, L, Y are the total aerodynamic drag, lift, and side forces, respectively, L̄, M̄, N̄ are the aerodynamic rolling, pitching, and yawing moments, respectively, I_x, I_y, I_z are the roll, pitch, and yaw moments of inertia, respectively, d_a, d_e, d_r are the deflections of the right and left elevons and the rudder, d_a,act, d_e,act, d_r,act are the control signals for the right and left elevon and rudder actuators, d_pitch, d_roll are the pitch and roll motion control signals, T = 0.02 s are the time constants of the right/left elevon and rudder actuators, ξ = 0.707 are the right/left elevon and rudder actuator damping ratios, ω_E is the Earth rotational rate, g is the geopotential function, m = 191902 lb is the mass of the vehicle, and a_cW, G_W are the vectors of the Coriolis acceleration and the force of gravity in the wind-axes reference frame, respectively. In the DAE system, H, μ, λ, V, ψ_W, γ, ψ, θ, φ, p, q, r, α, β, d_a, d_e, d_r are state variables and d_roll is an algebraic variable. The pitch and roll motion control signals d_pitch, d_roll are control variables. The rudder control law is given in [6]. We calculate the values of d_roll at each step of the numerical integration of the DAE system following the (α–φ_W)-technique for the control of aircraft descending in the upper atmosphere [5–7]. To ensure movement along a given trajectory, the model (6)–(7) is closed by an algebraic equality (8) describing the variation of the relative flight path angle γ in the range [−4.2385°, −10°]. The resulting system of equations belongs to the class of index-2 DAE systems. Equation (8) is transformed for calculations: for the variable γ, the index reduction by differentiation and the substitution of the right-hand sides (6)–(7) are performed:

0 = γ + 4.2385 + 9(t/200)², 0 = γ̇ + 18t/40000. (8)

We generate the semi-empirical model in the form of a modular neural network. During the simulation, the ANN module implementing the pitching moment coefficient Cm is retrained. As the new moment coefficient, the Cm model is used for a Mach number exceeding that of the performed maneuver by 5 (M + 5). In the training set, d_pitch sequences of a particular form [5,6] are used as the input data; the values of the pitch rate q are used as the output data. The weight coefficients are changed to reproduce the new relationship during the training procedure. We used the MATLAB system and the Neural Network Toolbox package to implement the semi-empirical models and carry out the computer simulations. We used t0 = 419 s, and 1000 iterations were performed with the integration


Fig. 2. The semi-empirical model output for values from the test set (panels: d_pitch in deg, error E_q, pitch rate q in rad/sec, and d_roll in deg versus t in sec)

step Δt = 0.2 s. The initial values were H = 1.272e+5 ft, V = 6.922e+3 ft/sec, γ = −4.2385°, ψ_W = 55.316°, μ = 183.8°, λ = 34.4°, ψ = 69.767°, θ = 9.64°, φ = 46.69°, α = 20°, β = 0°, ω = 0 rad/sec, d_roll = 0°, ḋ = 0, d_a = d_e = 0°, d_r = 1°. The hypersonic vehicle characteristics I_x, I_y, I_z, x_cg, S, c̄, b and the aerodynamic force and moment coefficient models (D, L, Y, L̄, M̄, N̄) are taken from [8]. To implement the model of the hypersonic vehicle motion, a semi-empirical model was used that realizes the order 3 IRK method of numerical integration based on the Radau IIA quadrature formulas. A perceptron-type network with 12 neurons in the hidden layer was used as the ANN module for Cm. In Fig. 2 we show the values of the pitch control signal d_pitch from the test set, the values of the pitch rate q calculated using the semi-empirical model, the values of the algebraic variable d_roll, and the corresponding absolute error E_q of the q values reproduced by the semi-empirical model. The root mean square deviations for the training, validation, and test sets are 6.6207e−4, 7.8975e−4, and 0.0014, respectively.

4 Conclusions

The semi-empirical model was implemented using the equations of the full model of the hypersonic vehicle motion in the specific part of the descent in the atmosphere as theoretical knowledge. We present this system of equations as a DAE system of index 2. The aerodynamic pitching moment coefficient implemented as an ANN-module of a semi-empirical model has been identified to verify the training properties of this model. The obtained results demonstrate the efficiency of the semi-empirical approach for neural network modeling of complex dynamical systems.


Acknowledgments. This research is supported by the Ministry of Science and Higher Education of the Russian Federation as Project No. 9.7170.2017/8.9.

References
1. Egorchev, M.V., Kozlov, D.S., Tiumentsev, Y.V., Chernyshev, A.V.: Neural network based semi-empirical models for controlled dynamical systems. J. Comput. Inf. Technol. 9, 3–10 (2013). (in Russian)
2. Wang, Y.J., Lin, C.T.: Runge-Kutta neural network for identification of dynamical systems in high accuracy. IEEE Trans. Neural Netw. 9(2), 294–307 (1998)
3. Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems, 2nd edn. Springer, Heidelberg (2002)
4. Egorchev, M.V., Tiumentsev, Y.V.: Learning of semi-empirical neural network model of aircraft three-axis rotational motion. Opt. Mem. Neural Netw. (Inf. Opt.) 24(3), 201–208 (2015)
5. Kozlov, D.S., Tiumentsev, Y.V.: In: Proceedings of 8th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2017, vol. 128, pp. 252–257 (2018)
6. Kozlov, D.S., Tiumentsev, Y.V.: Neural network based semi-empirical models of 3D-motion of hypersonic vehicle. In: Advances in Neural Computation, Machine Learning, and Cognitive Research II, pp. 196–201. Springer, Cham (2019)
7. Kozlov, D.S., Tiumentsev, Y.V.: Neural network based semi-empirical models for dynamical systems described by differential-algebraic equations. Opt. Mem. Neural Netw. (Inf. Opt.) 24(4), 279–287 (2015)
8. Shaughnessy, J.D., et al.: Hypersonic vehicle simulation model: winged-cone configuration. Technical report, NASA (1990)

Style Transfer with Adaptation to the Central Objects of the Scene

Alexey Schekalev (1) and Victor Kitov (1,2)

1 Lomonosov Moscow State University, Moscow, Russia
[email protected]
2 Plekhanov Russian University of Economics, Moscow, Russia
[email protected]
https://victorkitov.github.io

Abstract. Style transfer is the problem of rendering an image with some content in the style of another image, for example a family photo in the style of a painting by some famous artist. The drawback of the classical style transfer algorithm is that it imposes style uniformly on all parts of the content image, which perturbs the central objects of the content image (such as the face and body in a picture of a person) and makes them unrecognizable. This work proposes a novel style transfer algorithm which automatically detects the central objects of the content image, generates a spatial importance mask, and imposes style non-uniformly: central objects are stylized less to preserve their recognizability, and the other parts of the image are stylized as usual to preserve the style. Three methods of automatic central object detection are proposed and evaluated qualitatively and via a user evaluation study. Both comparisons demonstrate a higher quality of stylization compared to the classical style transfer method.

Keywords: Computer vision · Image processing · Style transfer · Image classification

1 Introduction

Non-photorealistic rendering, or image stylization [5], is a classical problem in computer vision, where the task is to render a content image in a given style. Early methods [3,7,9] reproduce specific styles (e.g., oil paintings or pencil drawings) and use hard-coded features and algorithms for that. Style transfer is the problem of transferring an arbitrary style, represented by a style image, to any content image, as shown in Fig. 1. It was found by Gatys et al. [2] that this task can be performed surprisingly well using deep convolutional neural networks. Their main idea is to find, in the space of images, a picture semantically reflecting the content of the content image and the style of the style image. These two contradicting goals are balanced by simultaneously minimizing a content loss and a style loss:

y = argmin_x {L_content(x, x_c, α) + L_style(x, x_s)}, (1)


Fig. 1. Style transfer task

where x_c is the content image, x_s is the style image, y is the resulting stylized image, and the parameter α is a weight factor (multiplier) inside the content loss function, controlling the strength of stylization (Fig. 2a): lower α imposes more style and vice versa. The shortcoming of this approach is that style is imposed uniformly onto the whole content image, distorting the important central objects of the image which are critical for perception. For example, it is hard to say what kind of birds sit on the tree (Fig. 2b), because the small details of the bird silhouettes are lost during stylization.

Fig. 2. (a) Style transfer for different α. (b) Problem case

One may improve the preservation of content by increasing the coefficient α in (1). However, this solution decreases the stylization strength globally, thus giving a less expressive stylization. The paper proposes a new solution to this problem. First, central objects are detected and selected using an automatically generated spatial importance mask for the content image. Next, this mask is used to impose style with spatially varying strength, controlled by the importance mask. This allows achieving two contradictory goals: stylization is gentle on the central objects of the image, critical for perception, such as human faces, houses, cars, etc., and stylization is strong for the rest of the image, thus expressing a vivid style. The paper is organized as follows. Section 2 gives a description of the proposed method and provides qualitative comparisons with the baseline stylization method of Gatys et al. [2]. Section 3 provides the details of the user evaluation study and summarizes its results, highlighting the superiority of the proposed solution. Section 4 concludes.

2 Method

2.1 Non-uniform Stylization

Consider the loss function in the optimization problem (1). In the original paper [2], the content loss is formalized as follows:

L_content(x, x_c, α) = α·Σ_{i,j,c} (F^l_{i,j,c}(x) − F^l_{i,j,c}(x_c))², (2)

where F^l(z) ∈ R^(W_l × H_l × C_l) denotes the inner tensor representation of image z on the l-th layer of the convolutional neural network, (i, j) are spatial coordinates, and c is the number of the channel. Instead of using a scalar α, we propose to use a matrix α_{i,j} ∈ R^(W_l × H_l) with different values for each spatial location (i, j):

L̃_content(x, x_c, α) = Σ_{i,j,c} α_{i,j}·(F^l_{i,j,c}(x) − F^l_{i,j,c}(x_c))². (3)

Making α spatially varying allows spatial control of the stylization strength. In particular, it allows imposing less style on the central objects of the scene, critical for perception, and more style on all other areas of the image.
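A minimal sketch of the weighted loss (3) in PyTorch (our illustration; the tensor shapes and names are assumptions):

```python
import torch

# Spatially weighted content loss (3): alpha is a (W_l, H_l) importance map
# broadcast over the channel dimension.
def weighted_content_loss(F_x, F_c, alpha):
    # F_x, F_c: features of shape (C_l, W_l, H_l) from layer l
    return (alpha.unsqueeze(0) * (F_x - F_c) ** 2).sum()

F_x = torch.rand(64, 32, 32, requires_grad=True)   # features of the image x
F_c = torch.rand(64, 32, 32)                       # features of the content
alpha = torch.rand(32, 32)                         # importance map
weighted_content_loss(F_x, F_c, alpha).backward()  # gradients reach the image
```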

2.2 Automatic Central Objects Detection

Consider a convolutional neural network pre-trained for image classification; we use VGG [8]. Such a model takes an input image of size 224 × 224 × 3 (images of a different size are rescaled) and outputs a probability distribution over the classes of the ImageNet dataset [1]. We detect central objects by filling different parts of the input image with a uniform color and measuring the change in the output class probabilities. If a key object of the image is filled, one observes a drastic change in the resulting class probabilities; on the contrary, if the background is changed, the class probabilities change only slightly. Overall, the magnitude of the change of the class probabilities determines the importance of the filled region. This approach was used to visualize convolutional neural networks in classification problems [10], but in the domain of style transfer, to our knowledge, it is used for the first time. We split the whole image into a set of regions and fill each region one by one, evaluating its importance using the above principle. In this way we construct an importance map α_{i,j} measuring the semantic significance of each location (i, j) on the image. This importance map is used as the matrix α in the spatially aware content loss (3) of the style transfer algorithm (1).

Fig. 3. (a) The probability distribution for the input. (b) The change of the probability distribution when a patch is overwritten

Fixed Patch-Based Mask Generation. In this approach we propose to divide the image by a uniform grid into regular square patches p_1, ..., p_K (like the input image in Fig. 3b). Denote by I the input image and by I_k the input image with the k-th patch filled with a constant color. We use a pretrained classification convolutional neural network cnn(·) that takes an image as input and outputs the vector of class probabilities corresponding to the image. We estimate the importance of each patch k by calculating the L2 distance between the vectors of class probabilities for the original and modified images: ‖cnn(I) − cnn(I_k)‖. Visualization of the results shows that the proposed algorithm can find the central object of the scene and separate it from the background: the muzzle of a dog in Fig. 4a. We rescale the map of patch importances to the spatial size W_l × H_l of the intermediate image representation in the convolutional neural network on the layer l where the content loss (3) is calculated, to obtain the weights α_{i,j}. Next we apply the style transfer procedure (1) with the spatially varying content loss (3). Figures 4b and c show the difference between the baseline approach (style transfer with the standard content loss (2)) and the proposed model, respectively. There are many small details on the dog's muzzle that are lost in the baseline approach and preserved by our algorithm.
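A minimal sketch of this occlusion procedure follows; the patch size, the fill value, and the use of torchvision's VGG-16 are illustrative assumptions:

```python
import numpy as np
import torch
import torchvision

# Occlusion-based patch importance: fill each square patch with a constant
# color and measure the L2 shift of the classifier's probability vector.
model = torchvision.models.vgg16(pretrained=True).eval()

def importance_map(img, patch=32, fill=0.5):
    # img: tensor (1, 3, 224, 224), normalized as the classifier expects
    with torch.no_grad():
        p0 = torch.softmax(model(img), dim=1)
        n = img.shape[-1] // patch
        imp = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                occ = img.clone()
                occ[..., i*patch:(i+1)*patch, j*patch:(j+1)*patch] = fill
                p = torch.softmax(model(occ), dim=1)
                imp[i, j] = torch.linalg.norm(p - p0).item()
    return imp   # rescale to (W_l, H_l) to obtain the weights alpha_{i,j}
```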

Fig. 4. (a) Patch importance. (b) Baseline. (c) Patch stylization

Average Patch-Based Mask Generation. It was found that a fixed patch grid produces a step-like boundary, consisting of horizontal and vertical edges, which is not flexible enough to surround a central object of arbitrary shape. For example, in Fig. 4a the important patches cover not only the central object but also the background in a step-like manner. To extend the shape of the boundary, we additionally propose to run the fixed patch grid algorithm for different positions of the grid mesh and combine the results by pixel-wise averaging, as shown in Fig. 5a. The resulting stylizations for the baseline and the proposed method are shown in Figs. 5b and c. Averaging the different matrices yields a smooth distribution of weights with a gradual boundary of elliptical shape.

Superpixel-Based Mask Generation. If the central objects have complicated, especially non-convex, boundaries, the method proposed above becomes unsuitable. To improve the results, instead of using a uniform patch grid, we suggest splitting the image into superpixels [6]. A superpixel extraction algorithm divides the image into small segments (superpixels) whose boundaries are the regions of sharp color change, which reflect very accurately the true boundaries of the objects in the image (Fig. 6a). The importance of each superpixel is approximated by the average importance of the square patches belonging to the superpixel, which in turn can be estimated using the fixed or average patch-based mask generation algorithm described above. The superpixel algorithm has two main parameters responsible for the number of segments and the shape of the boundaries. We run the algorithm over a set of typical values of these parameters and then average the obtained masks for better quality, see Fig. 6b; a sketch is given below.
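The following minimal sketch uses the SLIC implementation from scikit-image; the parameter settings and the upsampling of the patch map to full resolution are assumptions:

```python
import numpy as np
from skimage.segmentation import slic

# Superpixel-based mask: average the pixel-level importance inside each SLIC
# segment, then average over several SLIC settings.
def superpixel_mask(image, importance, settings=((100, 10), (200, 20))):
    # image: (H, W, 3) array in [0, 1]; importance: (H, W) map
    masks = []
    for n_segments, compactness in settings:
        segments = slic(image, n_segments=n_segments, compactness=compactness)
        mask = np.zeros_like(importance)
        for label in np.unique(segments):
            sel = segments == label
            mask[sel] = importance[sel].mean()   # constant inside a segment
        masks.append(mask)
    return np.mean(masks, axis=0)
```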

Fig. 5. (a) Averaging α matrices. (b) Baseline. (c) Averaging patch stylization

Fig. 6. (a) Superpixels. (b) Averaging α matrices

Figure 7 shows the qualitative difference between uniform stylization (a), averaging patch-based (b), and superpixel-based (c) spatially varying stylization. The boundaries of the central object, the glass, are non-convex; thus the superpixel-based approach extracts the boundary of such an object better, which improves the quality of the final stylization.

Segmentation-Based Mask Generation. Deep learning models are good at image segmentation tasks [11]. To select the boundaries of central objects more accurately, we can split the image into semantic segments. The importance of each segment is approximated by the average importance of the square patches belonging to the segment, which in turn can be estimated by the fixed or average patch-based mask generation algorithm described above. This approach allows increasing the quality of stylization when it is easy to separate the central object from the background using segmentation algorithms. The illustrative example in Fig. 8 shows that the stylization algorithm with segmentation locates the car exactly along its border, which allows building an accurate importance map that is very consistent with the actual border of the central object. In contrast, the superpixel-based algorithm affects some pixels near the car, which makes the final style transfer less sharp along the border of the central object.


Fig. 7. (a) Baseline. (b) Averaging patch stylization. (c) Averaging superpixel stylization

Fig. 8. (a) Superpixel stylization. (b) Segmentation stylization

3 User Evaluation Study

To evaluate quantitatively the advantage of the proposed methods compared to the algorithm of Gatys et al. [2], we conduct user evaluation studies. In a study, a user is shown a pair of stylizations, one by our method and one by the baseline method, and is asked to select the stylization he likes more. Stylizations are shown in random order to eliminate location bias. This procedure is repeated for a set of six users and a representative set of content and style images, forming together twenty-nine stylization outputs. We conduct three surveys, comparing the baseline stylization algorithm of Gatys et al. [2] with our method with average patch-based, superpixel-based, and segmentation-based importance mask generation. The results, reporting how often our method with each kind of modification is preferred in comparison to the uniform baseline, are shown in Table 1. It can be seen that our method in all its modifications outperforms the baseline stylization method. The image segmentation modification gives the maximum benefit, which can be attributed to the fact that it extracts the boundaries of the central objects more accurately.


Table 1. Frequencies with which each of the proposed methods is preferred compared to the baseline of Gatys et al. [2]

Patch-based importance generation: 66%
Superpixel-based importance generation: 72%
Segmentation-based importance generation: 80%

4 Conclusion

A new style transfer method with spatially varying strength is proposed in this work. The stylization strength is controlled for each pixel by an automatically generated importance mask, and three methods (patch-based, superpixel-based, and segmentation-based) are proposed to generate this mask. Qualitative comparisons and the conducted user evaluation studies demonstrate the superiority of the proposed method compared to the classical style transfer method of Gatys et al. [2], due to the strong and expressive style transfer for the background and the more gentle style transfer for the central objects of the content image, which allows minimizing the distortions of the important details. Among the three proposed importance mask generation approaches, the segmentation-based method showed the highest quality, which may be attributed to a more accurate boundary estimation of the central objects of the image.

References
1. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
2. Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2414–2423 (2016)
3. Gooch, B., Gooch, A.: Non-photorealistic Rendering. AK Peters/CRC Press, Natick (2001)
4. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
5. Adobe Research: Image stylization: history and future. https://research.adobe.com/news/image-stylization-history-and-future/. Accessed 2 July 2019
6. Rosebrock, A.: Segmentation: a SLIC superpixel tutorial using Python. https://www.pyimagesearch.com/2014/07/28/a-slic-superpixel-tutorial-using-python/. Accessed 2 July 2019
7. Rosin, P., Collomosse, J.: Image and Video-Based Artistic Stylisation, vol. 42. Springer, Heidelberg (2012)
8. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
9. Strothotte, T., Schlechtweg, S.: Non-photorealistic Computer Graphics: Modeling, Rendering, and Animation. Morgan Kaufmann, Burlington (2002)
10. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, pp. 818–833. Springer, Cham (2014)
11. Zhou, B., Zhao, H., Puig, X., Xiao, T., Fidler, S., Barriuso, A., Torralba, A.: Semantic understanding of scenes through the ADE20K dataset. Int. J. Comput. Vis. 127(3), 302–321 (2019)

The Construction of the Approximate Solution of the Chemical Reactor Problem Using the Feedforward Multilayer Neural Network

Dmitriy A. Tarkhov and Alexander N. Vasilyev

Peter the Great St. Petersburg Polytechnic University, 29 Politechnicheskaya Street, 195251 St. Petersburg, Russia
[email protected], [email protected]

Abstract. A significant proportion of phenomena and processes in physical and technical systems is described by boundary value problems for ordinary differential equations. Methods of solving these problems are the subject of many works on mathematical modeling. In most works, the end result is a solution in the form of an array of numbers, which is not the best form for further research: one then has to pass from the table of numbers to more suitable objects, for example, functions based on interpolation, graphs, etc. We believe that such an artificial division of the problem into two stages is inconvenient. We and some other researchers have used the neural network approach to construct the solution directly as a function. This approach is based on finding an approximate solution in the form of an artificial neural network trained by minimizing some functional that formalizes the conditions of the problem. The disadvantage of this traditional neural network approach is the time-consuming procedure of neural network training. In this paper, we propose a new approach that allows users to build a multilayer neural network solution without the use of time-consuming training procedures based on the above-mentioned functional. The method is based on a modification of classical formulas for the numerical solution of ordinary differential equations, which consists in their application to an interval of variable length. We demonstrate the efficiency of the method by the example of solving the problem of modeling processes in a chemical reactor.

Keywords: Ordinary differential equations · Boundary value problems · Multilayer neural networks · Chemical reactor model


1 Introduction

Our version of the neural network approach to solving differential equations turned out to be quite universal [1–8]. At the same time, it was not devoid of several drawbacks compared to the classical methods of meshes, finite elements, etc. First, neural network training is a very resource-intensive procedure. Secondly, the required size of the neural network and the time of its training increase dramatically as the accuracy requirements of the model are strengthened. In this paper, we consider the methods of forming multilayer functional approximations proposed by us in [9] without a time-consuming learning procedure. The result is an analog of deep learning [10, 13, 14]. The essence of the approach is to apply the known recurrent formulas of numerical integration of differential equations [11] to an interval with a variable upper limit. The result is an approximate solution in the form of a function of this upper limit.

2 Materials and Methods

Let us consider the Cauchy problem for a system of ordinary differential equations

y′(x) = f(x, y(x)), y(x0) = y0 (1)

on the interval D = [x0, x0 + a]. Here x ∈ D ⊂ R, y ∈ R^p, f: R^(p+1) → R^p. For the numerical solution of the Cauchy problem (1) on the interval [x0, x0 + a], a wide palette of numerical methods has been developed [11]. A significant part of them consists in dividing the given interval by points x_k into subintervals of length h_k, k = 1, ..., n, and applying the recurrent formula

y_{k+1} = y_k + F(f, h_k, x_k, y_k). (2)

Here the operator F defines a specific method. To obtain an approximate solution in the form of a function, a polyline (Euler's polyline) or a spline is drawn through the obtained point approximations. We propose to apply the formula (2) n times on an interval with a variable upper limit, [x0, x] ⊆ [x0, x0 + a] (herewith h_k = h_k(x), y0(x) = y0, y_k = y_k(x)). The result is a function y_n(x), which can be considered as an approximate solution of Eq. (1). In the simplest case of a uniform partition, we obtain h_k = (x − x0)/n, x_k = x0 + k(x − x0)/n. For the explicit Euler method, we have F(f, h_k, x_k, y_k) = h_k·f(x_k, y_k). For the resulting approximations, the estimate

‖y(x_k) − y_k‖ ≤ C·max(h_k) (3)

is known, where the constant C depends on the bounds of the function f and its derivatives in the region in which the solution is sought [11]. More accurate formulas are obtained by applying second-order methods [11], for which the estimate (3) is replaced by ‖y(x_k) − y_k‖ ≤ C·max(h_k)². One such method is the corrected Euler method, which works according to the formula

F(f, h_k, x_k, y_k, y_{k+1}) = h_k·[f(x_k, y_k) + (h_k/2)(f′_x(x_k, y_k) + f′_y(x_k, y_k)·f(x_k, y_k))]. (4)
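The multilayer construction described above is easy to express in code. A minimal sketch for the explicit Euler operator under the uniform-partition assumption (the test equation is an illustration):

```python
import numpy as np

# Formula (2) with the explicit Euler operator applied n times on [x0, x]
# with h = (x - x0)/n: the result y_n is a function of the upper limit x.
def multilayer_euler(f, x0, y0, n):
    def y_n(x):
        h = (x - x0) / n
        xk, yk = x0, np.asarray(y0, dtype=float)
        for _ in range(n):
            yk = yk + h * f(xk, yk)
            xk += h
        return yk
    return y_n

# Test equation (an assumption): y' = -y, y(0) = 1, so
# y_n(x) = (1 - x/n)**n, which tends to exp(-x) as n grows.
y5 = multilayer_euler(lambda x, y: -y, 0.0, 1.0, 5)
print(y5(1.0), np.exp(-1.0))
```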


For a second-order equation of the form y″(x) = f(x, y), the Störmer method is even more accurate [11]:

y_{k+1} = 2y_k − y_{k−1} + h_k²·f(x_k, y_k). (5)

Quite often in practice, the formulation of the problem (1) includes parameters:

y′(x) = f(x, y(x), μ), y(x0) = y0(μ). (6)

Here the vector of the mentioned parameters is denoted by μ. In this situation, the problem (6) is usually solved numerically for a sufficiently representative set of parameter values. Our approach automatically gives an approximate version of the required dependence, for which y_n(x, μ) is taken. Another common complication of the problem (1) is the boundary value problem, which has the form

y′(x) = f(x, y(x)), u(x0) = u0, v(x0 + a) = v0.

Here the vectors u, v are composed of coordinates of the vector y; their total dimension is equal to the dimension of the vector y. The boundary value problem can be reduced to a problem with a parameter:

y′(x) = f(x, y(x)), u(x0) = u0, w(x0) = μ. (7)

The vector w contains the coordinates of the vector y that are not included in the vector u. As before, we construct a multilayer solution y_n(x, μ) of the problem (7). This yields an equation from the conditions at the right end of the interval, v_n(x0 + a, μ) = v0; solving this equation, we find μ. This approach of ours can be considered a functional variant of the shooting method. Next, we consider its application to a specific applied problem. A promising direction for the development of our approach is connected with using a neural network approximation of the function f(x, y) in formula (2) rather than the function itself. As a result, even for single-layer neural network approximations of f(x, y), we obtain a solution in the form of a multilayer neural network. We have obtained such a solution for the specific task mentioned above. We consider the stationary problem of thermal explosion in the plane-parallel case [12] under the assumption that the reaction is one-stage, irreversible, not accompanied by phase transitions, and occurs in a stationary medium.


We have built an approximate solution of the boundary value problem

  d²y/dx² + δ exp(y) = 0, (dy/dx)(0) = 0, y(1) = 0.    (8)

This problem is interesting because we know the exact solution, the domain of existence of the solution, and the parameter values at which the solution of the problem does not exist (δ > δ* ≈ 0.878458).

3 Calculation

According to the above considerations, at the first step we approximate the exponent from Eq. (8) by the perceptron exp(y) ≈ 4.09 − 3.71 tanh[1.19 − 0.794y] on the interval [0, 1] (it is known [12] that the sought solution lies in this interval). In constructing the multilayer solution, we used our modification of the corrected Euler method (4) for the first step and our modification of the Störmer method (5) for the next steps. For two layers, we obtain the approximate solution

  y₂(x, δ) ≈ y₀ − 2.04x²δ + 0.92x²δ tanh[1.19 − 0.794y₀] + 0.928x²δ tanh[1.19 − 0.794(y₀ − 0.464x²δ(1.10 − tanh[1.19 − 0.794y₀]))].

Here y₀ is the unknown initial value of the desired function at the left end of the interval [0, 1]. To define the parameter y₀, we use the condition at the right end of the interval, y(1) = 0, acting in one of two ways. The first method is to determine the value y₀ for fixed values of the parameter δ. The maximum difference between the exact solution and the approximate solution y₂(x, δ) was 0.00041 at the parameter value δ = 0.1, 0.0046 at δ = 0.5, and 0.14 at δ = 0.8 (Fig. 1).
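A minimal sketch of this first method, assuming the form of y₂ reconstructed above; SciPy's brentq root finder recovers y₀ from the right-end condition. The bracket [0, 1] is an assumption justified by the known range of the solution.

```python
import numpy as np
from scipy.optimize import brentq

def y2(x, delta, y0):
    """Two-layer approximate solution of problem (8), as given in the text."""
    t0 = np.tanh(1.19 - 0.794 * y0)
    y1 = y0 - 0.464 * x**2 * delta * (1.10 - t0)   # inner (first-step) value
    return (y0 - 2.04 * x**2 * delta
            + 0.92 * x**2 * delta * t0
            + 0.928 * x**2 * delta * np.tanh(1.19 - 0.794 * y1))

delta = 0.5
y0 = brentq(lambda v: y2(1.0, delta, v), 0.0, 1.0)  # condition y(1) = 0
print(y0)   # recovered initial value y(0) for delta = 0.5
```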

Fig. 1. The exact solution y(x, δ) and the approximate two-layer solution y₂(x, δ) at the parameter values (a) δ = 0.5, (b) δ = 0.8.


The results showed that for small values of δ the approximate solution is close to the exact one. However, as the parameter δ approaches the value δ*, the accuracy deteriorates significantly. For three layers, we obtain the approximate solution

  y₃(x, δ) ≈ y₀ − 2.04x²δ + 0.619x²δ tanh a₀ + 0.825x²δ tanh[a₀ + 0.180x²δ − 0.164x²δ tanh a₀] + 0.413x²δ tanh[a₀ + 0.722x²δ − 0.328x²δ tanh a₀ − 0.328x²δ tanh(1.19 − 0.795(y₀ − 0.206x²δ(1.1 − tanh a₀)))],

where a₀ = 1.19 − 0.795y₀. The exact solution and the approximate three-layer solution y₃(x, δ) at δ = 0.1 and at δ = 0.5 practically merge, so we do not give the corresponding graphs. The maximum difference between the exact solution and the approximate solution y₃(x, δ) was 0.00037 at the parameter value δ = 0.1, 0.0016 at δ = 0.5, and 0.026 at δ = 0.8. As the number of layers increases, the accuracy improves, but the formulas become more cumbersome. The maximum difference between the exact solution and the approximate four-layer solution y₄(x, δ) was 0.00032 at δ = 0.1, 0.00044 at δ = 0.5, and 0.015 at δ = 0.8. We present graphs of the exact solution together with the approximate three-layer solution y₃(x, δ) and four-layer solution y₄(x, δ) at the parameter value δ = 0.8 in Fig. 2.

Fig. 2. The exact solution and the approximate solution at the parameter value δ = 0.8: (a) three-layer y₃(x, δ), (b) four-layer y₄(x, δ).

The second way to determine the parameter y₀ is to build a neural network dependence y₀(δ). To do this, we use the condition at the right end, yₙ(1, δ) = 0, minimizing the functional

  Σ_{i=1}^{m} yₙ²(1, δᵢ).    (9)


Further, we present a result obtained with the three-layer solution. Optimizing the functional (9) for m = 100 and δᵢ = iδ*/m, we got the dependence y₀(δ) = 1.52 − 1.65 tanh[1.54 − 1.28δ]. In this case, we obtain the approximate solution y*(x, δ) by substituting this dependence into the three-layer solution, i.e. y*(x, δ) = y₃(x, δ) with y₀ = 1.52 − 1.65 tanh[1.54 − 1.28δ]; the quantity a₀ = 1.19 − 0.795y₀ in the tanh arguments thereby becomes −0.0202 + 1.31 tanh[1.54 − 1.28δ]. The maximum difference between the exact solution and the approximate solution y*(x, δ) was 0.0055 at δ = 0.1, 0.0069 at δ = 0.5, and 0.014 at δ = 0.8. To illustrate the accuracy of the obtained solution, we give the following graphs (Fig. 3).
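A minimal sketch of this second method as nonlinear least squares over the parameters of the one-neuron dependence y₀(δ) = c₀ − c₁ tanh(c₂ − c₃δ). For compactness, the two-layer solution y₂ is used in place of the paper's three-layer one, and the initial guess x0 is an assumption.

```python
import numpy as np
from scipy.optimize import least_squares

def y2(x, delta, y0):
    t0 = np.tanh(1.19 - 0.794 * y0)
    y1 = y0 - 0.464 * x**2 * delta * (1.10 - t0)
    return (y0 - 2.04 * x**2 * delta + 0.92 * x**2 * delta * t0
            + 0.928 * x**2 * delta * np.tanh(1.19 - 0.794 * y1))

m, d_star = 100, 0.878458
deltas = d_star * np.arange(1, m + 1) / m        # delta_i = i * delta* / m

def residuals(c):
    """y_n(1, delta_i) with y0 = c0 - c1*tanh(c2 - c3*delta): functional
    (9) is the sum of squares of these residuals."""
    y0 = c[0] - c[1] * np.tanh(c[2] - c[3] * deltas)
    return y2(1.0, deltas, y0)

fit = least_squares(residuals, x0=[1.5, 1.6, 1.5, 1.3])
print(fit.x)   # fitted parameters of the dependence y0(delta)
```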

Fig. 3. The exact solution and the approximate solution y*(x, δ) at the parameter value (a) δ = 0.5, (b) δ = 0.8.

We compared the results with the classical method of obtaining approximate solutions of differential equations, namely, with the expansion in powers of the parameter δ. We present the results for the expansion up to the third degree:

  u₃(x, δ) = δ(1 − x²)/2 + δ²(5/24 − x²/4 + x⁴/24) + δ³(127/720 − 11x²/48 + x⁴/16 − 7x⁶/720).


The maximum difference between the exact solution and the approximate solution u₃(x, δ) was 0.000035 at the parameter value δ = 0.1, 0.0048 at δ = 0.5, and 0.12 at δ = 0.8. To illustrate the accuracy of the obtained solution, we give the following graphs (Fig. 4).

y

y x,0.5

u3 x,0.8

y

y x,0.8

0.30 0.6

0.25 0.20

0.4 0.15 0.10

0.2

0.05 0.2

0.4

0.6

a)

0.8

1.0

x

0.2

0.4

0.6

0.8

1.0

x

b)

Fig. 4 The exact solution and approximate solution u3 ðx; dÞ at the parameter value: (a) d ¼ 0:5 and (b) d ¼ 0:8.

As we expected, our method gives a more uniform approximation over the entire interval of variation of the parameter δ.

4 Conclusion

We have studied new methods for constructing approximate neural network solutions of differential equations. The methods do not require resource-intensive training procedures and allow building solutions with guaranteed accuracy. As a test problem, we considered the boundary value problem (8), which simulates the processes in a chemical reactor [12]. As a result, we obtained the above explicit solutions, which are more accurate than the approximate solutions of [3], where a network with 100 neurons was used.

Acknowledgment. This paper is based on research carried out with the financial support of a grant of the Russian Science Foundation (project №18-19-00474).

References
1. Tarkhov, D., Vasilyev, A.: New neural network technique to the numerical solution of mathematical physics problems. I: Simple problems. Opt. Mem. Neural Netw. (Inf. Opt.) 14, 59–72 (2005)
2. Tarkhov, D., Vasilyev, A.: New neural network technique to the numerical solution of mathematical physics problems. II: Complicated and nonstandard problems. Opt. Mem. Neural Netw. (Inf. Opt.) 14, 97–122 (2005)


3. Shemyakina, T.A., Tarkhov, D.A., Vasilyev, A.N.: Neural network technique for processes modeling in porous catalyst and chemical reactor. In: Cheng, L., et al. (eds.) Advances in Neural Networks – ISNN 2016. Lecture Notes in Computer Science, vol. 9719, pp. 547–554. Springer, Cham (2016)
4. Budkina, E.M., Kuznetsov, E.B., Lazovskaya, T.V., Leonov, S.S., Tarkhov, D.A., Vasilyev, A.N.: Neural network technique in boundary value problems for ordinary differential equations. In: Cheng, L., et al. (eds.) Advances in Neural Networks – ISNN 2016. Lecture Notes in Computer Science, vol. 9719, pp. 277–283. Springer, Cham (2016)
5. Lozhkina, O., Lozhkin, V., Nevmerzhitsky, N., Tarkhov, D., Vasilyev, A.: Motor transport related harmful PM2.5 and PM10: from on-road measurements to the modeling of air pollution by neural network approach on street and urban level. J. Phys.: Conf. Ser. 772 (2016). http://iopscience.iop.org/article/10.1088/1742-6596/772/1/012031
6. Kaverzneva, T., Lazovskaya, T., Tarkhov, D., Vasilyev, A.: Neural network modeling of air pollution in tunnels according to indirect measurements. J. Phys.: Conf. Ser. 772 (2016). http://iopscience.iop.org/article/10.1088/1742-6596/772/1/012035
7. Lazovskaya, T.V., Tarkhov, D.A., Vasilyev, A.N.: Parametric neural network modeling in engineering. Recent Pat. Eng. 11(1), 10–15 (2017)
8. Antonov, V., Tarkhov, D., Vasilyev, A.: Unified approach to constructing the neural network models of real objects. Part 1. Math. Methods Appl. Sci. 41(18), 9244–9251 (2018)
9. Lazovskaya, T., Tarkhov, D.: Multilayer neural network models based on grid methods. IOP Conf. Ser.: Mater. Sci. Eng. 158 (2016). http://iopscience.iop.org/article/10.1088/1757-899X/158/1/01206
10. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
11. Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I: Nonstiff Problems, xiv, p. 480. Springer, Berlin (1987)
12. Hlavacek, V., Marek, M., Kubicek, M.: Modelling of chemical reactors, Part X. Chem. Eng. Sci. 23 (1968)
13. Deng, L., Yu, D.: Deep learning: methods and applications. Found. Trends Signal Process. 7(3–4), 1–199 (2014)
14. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)

Linear Prediction Algorithms for Lossless Audio Data Compression L. S. Telyatnikov(&) and I. M. Karandashev Scientific Research Institute for System Analysis of Russian Academy of Sciences, Moscow, Russia [email protected], [email protected]

Abstract. The paper considers the use of linear prediction algorithms such as LPC, FLPC, and Wise-LPC for lossless audio data compression. In addition to the prediction methods, the problems of best coding and optimal sampling window selection are investigated. The Wise-LPC algorithm is shown to allow a 1–5% improvement of audio signal compression over the conventional LPC and FLPC approaches. The prediction error has a Laplace distribution, its variance decreasing smoothly and reaching "saturation" with growing window width.

Keywords: LPC · FLPC · Codec · Sampling · Compression

1 Introduction

Neural net algorithms provide new tools for different fields of science and technology. They have recently helped to make a breakthrough in pattern and speech recognition, text translation, and intellectual multi-move games such as Go, chess, etc. On the other hand, data compression, storage and transmission still use the algorithms developed in the 80s and 90s, or in the early 2000s at best. These are such well-known lossless data compression algorithms and data formats as zip, png, flac, exe, and many lossy compression techniques, e.g. mp3, jpeg, mpeg. Here we would like to elaborate on the FLAC data format [1–3] once again. Today this data format is the most popular one in lossless audio data compression. Article [2], which gives the basics of the algorithm, was taken as a starting point for further consideration. The FLAC format is the combination of linear predictive coding (LPC) [4] and Huffman-Golomb prediction error coding [5, 6]. Below we discuss the features of the prediction and compression algorithms and present the experimental results.

2 Setting the Problem

2.1 Linear Predictive Coding (LPC)

We consider the amplitude xₜ of an audio signal taken at an instant of time t. In the LPC method, a value x̃ₜ, which is a linear combination of p readings at preceding instants,


  x̃ₜ = Σ_{i=1}^{p} aᵢ x_{t−i} = a₁x_{t−1} + a₂x_{t−2} + … + a_p x_{t−p}    (1)

is formed to estimate the amplitude xₜ. Instead of storing the signal amplitudes it is sufficient to keep the coefficients of the linear model and the corresponding prediction errors:

  eₜ = xₜ − x̃ₜ.    (2)

The nearer to zero the value of the error (2) is, the fewer data bits are needed for storage. For this reason the unknown coefficients {aᵢ}, i = 1, …, p, are determined by minimizing the mean square deviation of the estimate from the actual amplitude:

  E = Σ_{t=0}^{w} ( xₜ − Σ_{i=1}^{p} aᵢ x_{t−i} )²    (3)

where xₜ are the signal amplitudes at the moments t ∈ [0, w], and w is the sample length. Though the sample length w is not defined strictly and is often a mere standard requirement, the usual number of readings in the sample w is much larger than the order p of the linear model (w ≫ p). For example, the standard LPC10 used in speech compression has the prediction order p = 10 and the number of readings w = 120. It can be shown [4] that the minimization of (3) reduces to a set of p linear equations with a Toeplitz matrix consisting only of the autocorrelation coefficients

  R_l = Σ_{t=0}^{w} xₜ x_{t−l}.    (4)

The Levinson-Durbin algorithm [4] with computational complexity O(p²) can be used to solve the set. Linear with respect to the sample length and quadratic with respect to the model complexity, the computational complexity of the whole LPC algorithm, O(wp²), is mostly determined by the computation of the autocorrelation coefficients R_l. The greater the sample length is, the more accurately the autocorrelation coefficients R_l can be computed and the better the compression results are.
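A minimal sketch of this pipeline, assuming SciPy's Levinson-type Toeplitz solver in place of a hand-written Levinson-Durbin recursion; the test signal is an illustrative placeholder.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coeffs(x, p):
    """LPC coefficients a_1..a_p: autocorrelation coefficients (4) plus
    the Toeplitz normal equations solved in O(p^2)."""
    w = len(x)
    R = np.array([np.dot(x[:w - l], x[l:]) for l in range(p + 1)])  # R_l, Eq. (4)
    return solve_toeplitz(R[:p], R[1:p + 1])

def prediction_errors(x, a):
    """Errors (2) of the linear prediction (1) for t >= p."""
    p = len(a)
    return np.array([x[t] - np.dot(a, x[t - p:t][::-1])
                     for t in range(p, len(x))])

x = np.sin(0.1 * np.arange(4096)) + 0.01 * np.random.randn(4096)
e = prediction_errors(x, lpc_coeffs(x, p=10))
print(e.var())   # lower error variance means fewer bits to store
```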

2.2 Fixed Linear Predictive Coding (FLPC)

The FLPC algorithm is another method for determining the coefficients {aᵢ}. Here the coefficients are not calculated; they are constants derived from the expansion of the signal in its derivatives. The first three linear estimates of the signal are:

  x̃₁(t) = x_{t−1},
  x̃₂(t) = 2x_{t−1} − x_{t−2},
  x̃₃(t) = 3x_{t−1} − 3x_{t−2} + x_{t−3}.    (5)


The FLPC algorithm has an advantage over the LPC method in that it does not require computing the autocorrelation coefficients (4). Since all the coefficients {aᵢ} are fixed in the FLPC algorithm, there is no need to code and store anything but the errors.

2.3 Wise-LPC

It can be easily shown that FLPC of the p-th order gives the p-th derivative of the input signal. We suggest a new algorithm, Wise-LPC, which is a combination of the FLPC and LPC algorithms. The idea is to determine how many derivatives of the signal (the order of FLPC) should be taken before the use of the LPC method. The Wise-LPC algorithm includes three steps (see the sketch below):

1. Consecutive differentiation of the signal and computation of the error.
2. If the variance of the error for the n-th derivative is smaller than that for the (n+1)-th derivative, the process is stopped and the n-th derivative is chosen.
3. The application of the p-th-order LPC to the n-th derivative.

The time complexity remains linear when the Wise-LPC method is used.
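A minimal sketch of steps 1–2, assuming that np.diff(x, n) plays the role of the n-th derivative (its variance equals the error variance of order-n FLPC); max_n is an illustrative cap, not from the paper.

```python
import numpy as np

def choose_derivative(x, max_n=6):
    """Wise-LPC steps 1-2: keep differentiating while the error variance
    of the next derivative is still smaller than that of the current one."""
    n = 1
    while n < max_n and np.diff(x, n + 1).var() < np.diff(x, n).var():
        n += 1
    return n

x = np.cumsum(np.cumsum(np.random.randn(10000)))   # smooth test signal
n = choose_derivative(x)
# Step 3: apply the p-th-order LPC (Sect. 2.1) to np.diff(x, n).
```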

2.4 Coding of the Remainder

A simple Huffman code [5, 6] is used to encode the errors. It includes the following stages:

1. The sign of the number is determined: if it is positive, the code starts with 0, otherwise with 1.
2. The variance of the error is used to choose the parameter N (see formula (6)). The N least significant bits of the number are written to the code.
3. The remaining bits define how many zeros are to be written to the code.
4. A 1 is written at the end of the code.

When the number of bits in the binary representation is less than N, zeros are added on the left to make up the N-bit binary. Decoding engages the same operations as encoding, but in the reverse order. This kind of simple Huffman code features the unnecessity of the frequency table used in the usual Huffman code. It is necessary to determine the appropriate value of N to make the simple Huffman code work effectively. As we show below, the error has the Laplace distribution. This assumption (as shown in [5]) gives us the formula for the optimal value of N:

  N = ⌈log₂(σ ln 2) − 0.5⌉    (6)

where σ is the error variance.

3 Results

3.1 Window Width w

The first conclusions have to do with the window width (sample width) w. The division of the signal into samples is most important in the implementation of different codecs.


Such division is always made before processing and compression of the signal. The smaller the sample length is, the simpler the transmission of this portion of the signal is and the smaller the risk that it gets distorted or lost in transmission. On the other hand, as mentioned in Sect. 2.1, the realization of the LPC algorithm requires that the sample length should not be too small, because it affects the precision of determining the autocorrelation coefficients. As of now there is no mathematically proven recommendation as to which window width is best for which kind of signal. Figure 1 illustrates the spread of the error for p = 3 when the window width varies. Figure 2a shows the relation between the variance of the error and the degree of approximation and window width w for the same audio signal. It is seen that with p = 3 the widening of the window beyond w = 4096 makes no sense because it does not lead to notable improvement, i.e., saturation sets in.

Fig. 1. The spread of the error with varying window width w and p = 3.

3.2 Comparing the LPC, FLPC and Wise-LPC Algorithms in Signal Compression

The compression results of LPC, FLPC and Wise-LPC are shown in Fig. 2b and Table 1. The compression efficiency is evaluated as C = I_p/I, where I is the size of the original file (wav file) in bytes and I_p is the size of the compressed file.

Fig. 2. (a) The relation between the variance of the error and the degree of approximation p and window width w; (b) comparison of audio signal compression using the LPC, FLPC and Wise-LPC methods. The optimal degree of differentiation is n = 2; the order of LPC changes from 0 to 9.


We discovered that the best differentiation degree in Wise-LPC depends on the signal spectrum. The higher the upper frequency of the signal is, the lower the differentiation degree. The upper frequency F_upper is determined by a threshold of 20 dB. It is seen from Table 1 that for high-frequency signals (F_upper = 12…20 kHz) the differentiation degree is n = 1 or n = 2, while for low-frequency signals (F_upper = 0…10 kHz) it is n = 3 or n = 4. The compression results for low-frequency signals are significantly better. In particular, the Wise-LPC algorithm should work well in compression of speech, because the human speech frequency spectrum extends from 0.3 to 3.4 kHz.

Table 1. The compression ratio for fifteen different audio files

№  | LPC compression of order p = 9 | FLPC compression of order n = n* | Best degree of differentiation n* | Wise-LPC compression with n = n* and p = 9 | F_upper (kHz)
1  | 0.502 | 0.407 | 4 | 0.370 | 3
2  | 0.421 | 0.409 | 3 | 0.351 | 3
3  | 0.509 | 0.424 | 4 | 0.387 | 5
4  | 0.544 | 0.449 | 3 | 0.395 | 5
5  | 0.606 | 0.611 | 3 | 0.597 | 8
6  | 0.602 | 0.68  | 3 | 0.565 | 10
7  | 0.542 | 0.56  | 2 | 0.519 | 12
8  | 0.729 | 0.796 | 2 | 0.725 | 13
9  | 0.634 | 0.656 | 2 | 0.607 | 13
10 | 0.684 | 0.779 | 1 | 0.669 | 15
11 | 0.742 | 0.810 | 2 | 0.725 | 15
12 | 0.693 | 0.811 | 1 | 0.679 | 16
13 | 0.747 | 0.786 | 1 | 0.739 | 18
14 | 0.719 | 0.726 | 1 | 0.718 | 20
15 | 0.749 | 0.782 | 1 | 0.748 | 20

4 Conclusions

The research allows the following conclusions. The variance of the error smoothly falls and the width of the Laplace distribution approaches "saturation" when the window width grows. The Wise-LPC algorithm permits better compression while retaining the linear time complexity. On average, the Wise-LPC algorithm improves the compression by 1–5% for broadband high-frequency signals and by 5–10% for low-frequency signals. This allows the conclusion that the algorithm should work well in speech encoding. The FLAC format involves the combination of linear prediction and Huffman-Golomb error coding. Note that the division of the compression procedure into two unrelated stages is a popular trick in compression algorithms.


First the extrapolation algorithm is generated, and then a second algorithm is built that takes the remnants (prediction errors) and stores them in a compact form. The approach is also popular in modern neural-net-based compression techniques, where neural nets are usually used only in the first stage (data prediction) [7]. We hope to witness soon the advent of end-to-end systems where neural nets are engaged in both stages concurrently [8]. This kind of system is our next goal.

Acknowledgements. The research was supported by the State Program SRISA RAS No. 0065-2019-0003 (AAA-A19-119011590090-2).

References
1. FLAC format. https://xiph.org/flac/format.html
2. Robinson, T.: SHORTEN: simple lossless and near-lossless waveform compression. Technical Report 156, Cambridge University Engineering Department, Trumpington Street, Cambridge, CB2 1PZ, UK, December 1994
3. Hans, M., Schafer, R.W.: Lossless compression of digital audio. IEEE Signal Process. Mag. 18(4), 21–32 (2001). https://doi.org/10.1109/79.939834
4. Collomb, C.: Linear prediction and Levinson-Durbin algorithm (2009). https://www.academia.edu/8479430/Linear_Prediction_and_Levinson-Durbin_Algorithm_Contents
5. Golomb, S.W.: Run-length encodings. IEEE Trans. Inf. Theory 12, 399–401 (1966)
6. Rice, R.F.: Some practical universal noiseless coding techniques. Technical Report 79/22, Jet Propulsion Laboratory (1979)
7. Kleijn, W.B., Lim, F.S.C., Luebs, A., Skoglund, J., Stimberg, F., Wang, Q., Walters, T.C.: WaveNet based low rate speech coding. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018). https://arxiv.org/abs/1712.01120
8. Kankanahalli, S.: End-to-end optimized speech coding with deep neural networks. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2521–2525 (2018). https://doi.org/10.1109/icassp.2018.8461487. https://arxiv.org/abs/1710.09064

Neural Network Theory, Concepts and Architectures

Approach to Forecasting Behaviour of Dynamic System Beyond Borders of Education A. A. Brynza(&) and M. O. Korlyakova Bauman Moscow State Technical University - Kaluga Branch, Kaluga 248000, Russia [email protected]

Abstract. The problem of forecasting the behavior of a complex dynamic system is considered. We analyze approaches that, under limited information and parametric uncertainty about the character of the behavior, allow predicting with high accuracy the behavior of systems in situations where the values of the control parameters go beyond the limits of the training set used. The forecasting results are evaluated, the corresponding graphs are presented, and conclusions are drawn.

Keywords: Training of neural networks · Forecasting · Recurrent neural networks · Decision trees · LSTM networks

1 Introduction

In the diagnostics of technical systems, forecasting of an object's condition is carried out based on telemetry data obtained during operation. The obtained information is analyzed, and the changes arising over time under the influence of external factors and irreversible wear processes in different components of a system are identified. Forecasting the development of defects and timely assessment of the technical condition over an approximate period allow increasing the control efficiency of systems in general [1, 2]. Thus, it is necessary to provide mechanisms for predicting the development of system conditions under various operating states and taking into account the specific features of each system.

2 Methods of Prediction of a System Behavior

Nowadays one of the effective methods of forecasting in sophisticated systems is the creation of their "digital twins". Such models are effectively used in different applied processes and in systems with different functions. An example is modeling the time-dependent CO2 emissions in a system with many sources generating energy over time [3]; there, a model of a partially equilibrium balanced power system can be used as a digital twin. The main task to be solved when carrying out the analysis of the information consists in determining the dynamics of changes in the functioning of the formed information model.


This makes it possible to describe the behavior of the objects that make up a complex system in the present and future [4–6]. Thus, it is necessary to create models of systems which will allow predicting the behavior of difficult technical objects in conditions of stable and changeable environments, at rated loads and beyond them. For difficult technical objects it is possible to use various approaches to creating mathematical models with different degrees of detail [7]:

– creation of nominal functional descriptions of a system (static or dynamic), which demands understanding of the processes taking place in the system;
– creation of simulation models on the basis of the known properties and functions of a system (the nature of the connections between input and output parameters);
– creation of models on the basis of training and analysis of experimental data without a known functional connection, which requires a huge number of examples of system work states.

The purpose of any variant of modeling consists in a sufficiently exact description of the processes taking place in the modeled object for predicting the consequences. However, it should be noted that the nominal settings are usually well studied, while the emergencies have no full description. This leads to the fact that the formed model of the object has to provide forecasting of behavior not only within nominal situations but also beyond their boundaries. Let us consider possible ways of solving the task of modeling systems based on training methods using examples. Among them it is possible to single out neural network models [3], which allow one not only to construct, from multiple examples, the connections between the input and dependent parameters, but also to estimate the structure of these connections to a certain degree. Let us review several examples of models of dynamic systems to predict their behavior beyond the borders of the training set.

2.1 Example 1. Vibration Gyroscope

We use a model of a vibration gyroscope with a control system [8] based on the principle of adaptive control in real time, where the quasistationary angular speed of an oscillatory gyroscope is considered as an unknown parameter and must be estimated. The inputs — the operating impacts (forces) on both axes of the gyroscope — are calculated in such a way that the dynamics of the gyroscope reaches the quality set by the reference model for the internal coordinates (x, y, ẋ, ẏ, ẍ, ÿ) and control impacts (u, v). The model is limited to the range of stabilization of angular speeds Ω_z from 3 to 7 rad/s (see Fig. 1(a)). Angular speeds outside the specified range lead to the appearance of an oscillatory process (see Fig. 1(b)) which is not stabilized by the control system.


Fig. 1. (a) Stabilization of the angular velocity, (b) There is no stabilization.

Let’s consider as the object of modeling the process of occurrence of such phenomena, i.e. we actually will be able to predict whether the process of stabilization of angular speed is successful by the behavior of the model. Thus we solve a problem of classification of the following types: – predict the distance of the prediction window ðnwÞ from the current time moment ðtÞ, whether the angular velocity (i.e. analysis of the behavior of the system in moments from t  n to t þ nw); – the input dataset is the values of the gyroscope state vector at the time of t and n previous states X ¼ hxðt  nÞ; yðt  nÞ; x_ ðt  nÞ; y_ ðt  nÞ; €xðt  nÞ; €yðt  nÞ; uxðt  nÞ; uyðt  nÞ; . . .xðtÞ; yðtÞ; x_ ðtÞ; y_ ðtÞ; €xðtÞ; €yðtÞ; uxðtÞ; uyðtÞi; – the dependent variable T 2 f1; 1g. We believe that T ¼ 1 (class ‘off’ on Fig. 2) the lack of stabilization of the angular velocity and T ¼ 1 (class ‘on’ on Fig. 2) for stabilization areas, the difference between the reference model and the results of the adaptation and control loop operation on the interval is actually estimated ½t; t þ nw. Determination of the signal type refers to problems of classification temporary signals (TSC) for the solution of which offered most various approaches on the basis of use of classical feedforward and recurrent networks [4], networks of convolutional type [5] and LSTM [6] networks. We teach qualifiers for the Xz area = [3, 7] rad/sec. As a result of training of several types of qualifiers best results shows model based on LSTM network [9]. Quality of decisions in the area of nominal values of angular speed for test selection made 94% of correctly assessed situations. Application of this model for the test is shown in Fig. 1a. Besides, modeling at Xz 2 ½7; 10 rad/sec and Xz 2 ½0; 3 rad/sec shows high quality of prediction (70–75%) for examples behind the borders Xz used for training. The generated classifier allows to specify the area of stabilization for nw ¼ 0:004 seconds up to stable model entrance into this zone.

Table 1. Assessment of the quality of training of networks

Network type | The size of the window n, s | Training time, min | Time of one prediction, s | Type I error in the range [0, 3] and [7, 10] rad/s, %
2-layer perceptron, 10 neurons | Δt = 0.004 | 10 | 0.013 | 49.60
2-layer perceptron, 100 neurons | Δt = 0.004 | 30 | 0.014 | 73.5
LSTM – 10 neurons | Δt = 0.004 | 0.6 | 0.041 | 29.0
LSTM – 100 neurons | Δt = 0.004 | 6 | 0.043 | 25.5

The results of modeling are given in Fig. 2(a) for the area inside the training range and in Fig. 2(b) beyond the borders of training. Practically all time points in Fig. 2(b) correspond to the lack of stabilization of the angular speed, T = −1 (class 'off'); the area of type I errors is highlighted with color. The assessment of the quality of training of several types of networks is given in Table 1 (sample size: 4000 examples). The efficiency of the LSTM network can be explained by the formation of a model of temporal behavior, whereas the perceptron-class network produced only a description of the known part of the data. Thus, when training an LSTM, a dynamic system model appears that is more suitable for the formation of the digital twin.

Fig. 2. Classifier solution 'on' (T = 1) / 'off' (T = −1): (a) for the nominal area of the model, (b) beyond the borders of the nominal area.

2.2 Example 2. Synchronous Motor

Let’s consider the second object, prediction of development process on the model of the valve DC motor [10], which belongs to the class of prediction of a time series. Valve motor represents the brushless synchronous motor with three stator windings and rotor with permanent polar magnets, creating magnetic field. A mathematical model of the motor with introduction of variable statuses: ð~x1 ~x2 ~x3 ÞT ¼ ðid iq xÞT and ð~u1 ~u2 ÞT ¼ ðud uq ÞT , allows for [10] to obtain a system of equations in dimensionless variables: x_ 1 ¼ x1 þ x2 x3 þ u1 x_ 2 ¼ x2  x1 x3 þ cx3 þ u2 x_ 3 ¼ rðx2  x3 Þ  MH  signðx3 Þ

ð1Þ

Parameters u1 ; u2 ; MH are control parameters. The development of the process in time relative to the coordinates x1 ; x2 ; x3 takes the form shown in Fig. 3.

Fig. 3. Changing the state of a system.

The area of stability of the attractor is reached in the interval of the control parameter u₁ ∈ [0.1, 9.3]. For forecasting we use the sequence of system states at the previous n modeling steps. The predicted value is set nw modeling steps ahead. Therefore, the examples for training contain the following elements:

– the input dataset is the values of the state vector at the time point t and n previous states, X = ⟨x₁(t−n), x₂(t−n), x₃(t−n), …, x₁(t), x₂(t), x₃(t)⟩;
– the dependent variable is T = ⟨x₁(t+nw), x₂(t+nw), x₃(t+nw)⟩.

We have considered several different architectures for solving the forecasting task: an ensemble of decision trees and a multilayer nonlinear perceptron. Besides, for the regression of the target coordinates (x₁, x₂, x₃), different methods of forming the training pairs were applied. For models within the training range, all the created solvers show good quality of description of the system trajectory.


Let us change the parameters u₁, u₂, M_H beyond the training borders and consider the result of generating the trajectory of the system state vector, presented in Fig. 4. A two-layer network with sigmoidal neurons in the hidden layer and linear output neurons is used. The learning algorithm is Levenberg-Marquardt. The training sample is fed in the form of a vector of size 3n × 1, where the intervals ⟨(1, n), (n+1, 2n), (2n+1, 3n)⟩ are put in correspondence with the target values of the coordinates x₁, x₂, x₃ at time t + 1. The network is thereby built for 3 outputs (coordinate values), i.e. the vector modeling mode is realized. When the control parameter reaches the instability zone, proceeding from the results of modeling, it is possible to conclude that the network is capable of building a high-quality forecast of the attractor behavior only in the neighborhood of the control parameter used for training. When going beyond the limits of stability, the network is not capable of building a reliable forecast of the behavior.

Fig. 4. The initial trajectory of the system at u₁ = 6 and the result of forecasting using (a) decision trees, (b) the output vector of the neural network, (c) serial output of the neural network.

The most successful model of trajectory forecasting was obtained after a radical revision of the way the training pairs are described. The training set is formed as a mix of the pairs (⟨x₁(t−n), …, x₁(t)⟩, x₁(t+1)), (⟨x₂(t−n), …, x₂(t)⟩, x₂(t+1)), (⟨x₃(t−n), …, x₃(t)⟩, x₃(t+1)), where the examples are placed consecutively one after another and are put in correspondence with the target values of the corresponding coordinate at the time point t + 1. The network is built for one output (a coordinate value), but in fact all coordinates are passed through the sample sequentially, as three examples per time step (a sketch is given below). As can be seen from Fig. 5, such a network is able to take into account the patterns of behavior even without having in the training sample any data on the trajectory of motion at the moment of loss of stability, and at the same time to build a forecast of very high quality. To assess the quality of the generated models, we consider the errors (MSD with respect to the target trajectory) on the analysis interval of the control parameter u₁ ∈ [1, 13] in Fig. 6, where experiment 1 is the ensemble of trees, experiment 2 is the vector neural network, and experiment 3 is the sequential neural network.
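A minimal sketch of this "experiment 3" training-pair formation and single-output network. Adam is substituted for Levenberg-Marquardt (which Keras does not provide), the hidden-layer size is illustrative, and the trajectory is a random placeholder.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def sequential_pairs(traj, n):
    """For every coordinate x_i, form the pair (<x_i(t-n),...,x_i(t)>,
    x_i(t+1)); the pairs for x1, x2, x3 follow one another, so a
    single-output network sees all coordinates."""
    X, T = [], []
    for t in range(n, traj.shape[0] - 1):
        for i in range(traj.shape[1]):
            X.append(traj[t - n:t + 1, i])
            T.append(traj[t + 1, i])
    return np.array(X), np.array(T)

traj = np.random.randn(1000, 3)          # placeholder (x1, x2, x3) trajectory
X, T = sequential_pairs(traj, n=10)

model = keras.Sequential([
    layers.Dense(20, activation='tanh', input_shape=(X.shape[1],)),
    layers.Dense(1),                     # single output, cf. experiment 3
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, T, epochs=5, verbose=0)
```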


Fig. 5. (a) The initial trajectory of the system at u₁ = 11 (unstable behavior); (b) the result of prediction using a feed-forward network.

Fig. 6. Error graphs of the considered experiments over the range of values of the control parameter u₁ ∈ [1, 13].

Based on the graphs (see Fig. 6), the best forecasting quality is shown by the network from experiment 3. The network from experiment 2 achieves good prediction quality only in the vicinity of the control parameter at which the training was performed. The ensemble of trees is able to predict the pattern of behavior, however, not as smoothly as the other approaches.


3 Conclusions

The formation of digital twins of real objects allows solving the problem of predicting the behavior of complex objects, but it is necessary to take into account that the available dataset cannot reflect all the features of the operation of a real system — only a number of key patterns. As the preliminary experiments with computer modeling showed, predicting the behavior of a difficult technical system with good quality on the basis of preliminary training is possible if the modeling not only captures the input-output reactions but also, in one form or another, forms a model of the dynamic system. This fact was noted for both classification and time series forecasting. Separately, it is worth mentioning the opportunity of additional training of a digital twin in the course of operation, which will eventually allow adjusting the forecasts of operability.

References
1. Bazhenov, Yu., Kaleno, V.P.: Prediction of residual life of electronic engine control systems. Gazette SibADI 2(56) (2017)
2. Tonoyan, S., Baldin, A., Eliseev, D.: Forecasting of the technical condition of electronic systems with adaptive parametric models. Gazette BMSTU, Series "Instrumentation" 6(111) (2016)
3. Ripp, C., Steinke, F.: Modeling time-dependent CO2 intensities in multi-modal energy systems with storage. https://arxiv.org/pdf/1806.04003.pdf
4. Katsuba, Yu., Grigorieva, L.: Application of artificial neural networks to predict the technical condition of products. Int. Res. J. 3(45), 19–21 (2016)
5. Fawaz, H.I., Forestier, G., Weber, J., Idoumghar, L., Muller, P.-A.: Transfer learning for time series classification. In: 2018 IEEE International Conference on Big Data (2018)
6. Rußwurm, M., Körner, M.: Temporal vegetation modelling using long short-term memory networks for crop identification from medium-resolution multi-spectral satellite images. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2017)
7. Chucheva, I.: Models and methods of prediction. In: Mathematical Bureau. Forecasting on OREM (2011)
8. Myshlyaev, Y., Finoshin, A., Myo, T.Y.: Sliding mode with tuning surface control for MEMS vibratory gyroscope. In: 6th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (2014)
9. Tai, K.S., et al.: Improved semantic representations from tree-structured long short-term memory networks. arXiv:1503.00075 [cs.CL] (2015)
10. Chu, J., Hu, W.: Control chaos for permanent magnet synchronous motor based on adaptive backstepping of error compensation. Int. J. Control Autom. 9(3), 163–174 (2016)

Towards Automatic Manipulation of Arbitrary Structures in Connectivist Paradigm with Tensor Product Variable Binding Alexander V. Demidovskij(&) Higher School of Economics, ul. Bolshaya Pecherskaya 25/15, Nizhny Novgorod, Russia [email protected]

Abstract. Building a bridge between the symbolic and connectionist levels of computation requires constructing a full pipeline that accepts symbolic structures as an input, translates them to a distributed representation, performs manipulations with this representation equivalent to symbolic manipulations, and translates it back to a symbolic structure. This work proposes a neural architecture that is capable of joining two structures, which is an essential part of the structure manipulation step in the connectionist pipeline. Verification of the architecture demonstrates the scalability of the solution; a set of recommendations for engineering practitioners is elaborated.

Keywords: Connectionism · Tensor computations · Neural networks · Unsupervised learning

1 Introduction

For a long period, the Artificial Intelligence (AI) community has been investigating two important computation paradigms: the symbolic and the sub-symbolic, or connectionist, approaches. Although these two ideas can be considered drastically different, they are likely to become partners rather than competitors. The symbolic level is defined by methods that manipulate symbols and explicit representations. The connectionist approach [1, 2] is built around the idea of massive parallelism and is mostly characterized by artificial neural networks. The potential symbiosis of the two paradigms can bring robust and flexible solutions that produce understandable results that are easy to validate. Symbolic structures can be encoded in distributed representations by many means: First-Order Logics (FOLs) [3, 4], Holographic Reduced Representations (HRRs), Binary Spatter Codes, and so on [5]. One of the key contributions to the field is presented in the Tensor Product Variable Binding approach proposed by Smolensky [6] and further applied in Vector Symbolic Architectures (VSA) [7]. Distributed representations obtained by this method are used in multiple domains, especially in Natural Language Processing (NLP) [8], where a sentence plays the role of a structure. In order to describe the task and the proposed solution it is essential to give several key definitions of Tensor Product Variable Binding (TPVB).



Definition 1. Filler – a particular instance of the given structural type.

Definition 2. Role – a function that a filler performs in a structure.

Definition 3. Tensor multiplication – an operation over two tensors a of rank x and b of rank y that produces a tensor z of rank x + y consisting of the pairwise products of all elements of a and b.

Definition 4. Tensor product of a structure. A structure is perceived as a set of pairs of fillers {fᵢ} and roles {rᵢ}, and its tensor product is found as (1):

  ψ = Σᵢ fᵢ ⊗ rᵢ    (1)

There are already solutions that can translate simple structures to tensor representations and back to symbolic structures [9]. However, there is a gap in performing operations over structures at the tensor level. Indeed, there are multiple routine operations over structures: adding or removing nodes, joining structures together, etc. In this paper the task of joining structures together is considered and thoroughly analyzed.

2 Task Description

There is a structure S presented in Fig. 1. It consists of two levels of nesting (the root is not considered as a first level). This structure contains 3 fillers — A, B, C — and only two elementary roles: r₀ (left child) and r₁ (right child). Each filler and role should be transformed to a vector representation. There is only one strong requirement: the fillers, defined on a vector space V_F, should be linearly independent of each other, as should the roles, defined on a vector space V_R. At the same time, the assignment for fillers and roles can be arbitrary as long as the aforementioned condition is satisfied (2):

  A = [8 0 0], B = [0 15 0], C = [0 0 10], r₀ = [10 0], r₁ = [0 5]    (2)

According to Definition 4 the given structure S can be translated to the distributed representation (3).

Fig. 1. Sample structure


  ψ = Σᵢ fᵢ ⊗ rᵢ = A ⊗ r₀ ⊗ r₀ + C ⊗ r₁ ⊗ r₀ + B ⊗ r₁    (3)

It is easier to first calculate the compound roles (4) and then apply them in (3) in order to find the corresponding tensor representation (5):

  r₀₀ = r₀ ⊗ r₀ = [10 0] ⊗ [10 0] = [[100, 0], [0, 0]],
  r₁₀ = r₁ ⊗ r₀ = [0 5] ⊗ [10 0] = [[0, 0], [50, 0]]    (4)

  ψ = A ⊗ r₀₀ + C ⊗ r₁₀ + B ⊗ r₁ = [8 0 0] ⊗ [[100, 0], [0, 0]] + [0 0 10] ⊗ [[0, 0], [50, 0]] + [0 15 0] ⊗ [0 5]    (5)

Here the first two summands are rank-3 tensors whose only nonzero elements are 800 and 500, respectively, and the third summand is the rank-2 tensor [[0, 0], [0, 75], [0, 0]].

It is extremely important to note that the resulting tensor representation contains tensors of different ranks that cannot be summed as plain matrices. Instead, there is a direct sum operation. The idea is that a tensor of rank N can be represented as a list of tensors of ranks 1..N, with the tensors of ranks 1..N−1 simply filled with zeros. Therefore, when a sum of tensor representations is performed, tensors are summed according to their rank. At this moment, it is clear how to build a binary tree of a predefined height using sub-symbolic operations. In order to better understand the requirements of the task of the paper, it is necessary to analyze the algorithm that is used to construct the considered example (Fig. 2).

Fig. 2. Possible stages of building a structure from subtrees. (a) There are independent fillers. (b) A and C are joined as left and right children of the root, respectively; B is still an independent filler. (c) The subtree from (b) is taken as the left subtree and the free filler B is taken as the right subtree.
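A minimal NumPy sketch of the binding computations (3)–(5), keeping the direct sum as a rank-indexed list; the filler-first index ordering is one consistent convention, matching (3)–(5) above.

```python
import numpy as np

A, B, C = np.array([8, 0, 0]), np.array([0, 15, 0]), np.array([0, 0, 10])
r0, r1 = np.array([10, 0]), np.array([0, 5])

def bind(filler, *roles):
    """Tensor-product binding: filler (x) role_1 (x) ... (x) role_k."""
    out = filler
    for r in roles:
        out = np.tensordot(out, r, axes=0)   # outer (tensor) product
    return out

# Tensor representation of S, kept as a rank-indexed list (direct sum):
psi = {2: bind(B, r1),                        # B at role r1
       3: bind(A, r0, r0) + bind(C, r1, r0)}  # A, C at compound roles
print(psi[3][0, 0, 0], psi[3][2, 1, 0], psi[2][1, 1])   # 800 500 75
```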


From Fig. 2 it is clear that building a structure inherently means joining subtrees. In the case of a binary tree there are one or two subtrees that can be joined. Also, it is vital that at the beginning each filler is considered as a separate tree that can participate in the joining procedure. This brings us to the formulation of the task. The target task of the current paper is to propose a robust neural architecture for performing dynamic construction of the tensor representation of an arbitrary structure via joining subtrees, and to investigate engineering aspects of its implementation.

3 Theoretical Method of Building the Shift Matrix

Joining two subtrees as direct children of a new root and thereby constructing a new tree is by nature a simple operation that makes a whole subtree play a new role in terms of Tensor Product Variable Binding. This is extremely clear from Fig. 2b, where instead of big trees there are only two fillers that play the roles of left and right subtree correspondingly. In order to achieve the same result at the tensor level it is enough to perform a tensor multiplication of the filler and the corresponding role. Generalizing to the case when instead of a filler there is a representation of a tree, there is still a need to perform a tensor multiplication of the tree's distributed representation and the assigned role. The complexity in this case lies in the fact that the tensor representation of a structure is a multi-component list of tensors of different depths, and it is no longer a plain vector-vector multiplication.

Definition 5. Joining operation cons(p, q) – an action over two structures (trees) such that the tree p slides as a whole 'down to the left', so that its root is moved to the left-child-of-the-root position, and the tree q slides 'down to the right'. The operation cons can be expressed for binary trees as

  cons(p, q) = p ⊗ r₀ + q ⊗ r₁, cons₀(p) ≡ cons(p, ∅), cons₁(q) ≡ cons(∅, q),    (6)

where r₀ and r₁ are roles and ∅ is the empty tree. It was proved [10] that this operation can be expressed in matrix form, given that it operates over the tensor representation of structures (7):

  cons(p, q) = W_cons0 · p + W_cons1 · q    (7)

The matrix exposes a shifting mechanism over a tensor representation of a structure that contains tensors of different ranks. Technically, to shift the tree 'down to the left' ('down to the right') means to apply the role r₀ (r₁) to each tensor of the tensor representation. This is what the matrices W_cons0 (W_cons1) perform. These matrices take symbols at depth d in p and put them at depth d + 1. The form of these matrices is the following: all elements are zeros except the elements under the main diagonal. This is true because the cons operation just shifts the tree one level down.


As both matrices are constructed in the same manner, only W_cons0 is considered in this section. The matrix is computed from the role vector and identity matrices (8):

  W_cons0 = 1_A ⊗ r₀ + 1_R ⊗ 1_A ⊗ r₀ + 1_R^⊗2 ⊗ 1_A ⊗ r₀ + … + 1_R^⊗d ⊗ 1_A ⊗ r₀ + …    (8)

where d is the depth of the representation, 1_A is an identity matrix whose width and height equal the number of elements in the filler vector, and 1_R is an analogous identity matrix with size depending on the role vector. The key point in constructing the matrix is to keep the order of the tensor multiplications. This is not so obvious, because the way the tensor representation is treated in TPVB is rather unbounded — TPVB only requires that the resulting tensor contain all products of the input tensors' elements. However, for W_cons0 it is very important to keep the dimensions of the roles first. Finally, for depth = 2, a role vector with 2 elements, and a filler vector with 3 elements, we get the block matrix

  W_cons0 = [ 0, 0, 0 ; 1_A ⊗ r₀, 0, 0 ; 0, 1_R ⊗ 1_A ⊗ r₀, 0 ],    (9)

whose only nonzero blocks, built from the components r₀⁰ and r₀¹ of the role vector, stand under the block main diagonal: the 6 × 3 block 1_A ⊗ r₀ sends the depth-0 segment to depth 1, and the 12 × 6 block 1_R ⊗ 1_A ⊗ r₀ sends the depth-1 segment to depth 2. During the computation phase the matrix is flattened and does not contain the block structure shown in (9); the blocks are shown for better visualization of the matrix structure.

4 Proposed Neural Architecture

The overall scheme of the proposed neural architecture for joining structures is demonstrated in Fig. 3. The neural network is designed to accept multiple inputs of two types, constant and variable ones; they are described below.


After accepting the inputs, each subtree representation is flattened to a vector format, while a shift matrix is prepared based on the role chosen for this subtree. Finally, each subtree vector representation is multiplied by the shift matrix, and all the resulting vectors are summed; thereby the tensor representation of the structure that contains the input structures as direct children of the new root is produced. All the layer details are covered below.

Input Layers. As was stated in (4), a tensor representation is by definition a list of tensors. The number of elements in the list hugely depends on the depth of the structures that should be joined. Each variable input corresponds to the tensor of a particular rank. Also, there can be multiple structures that we are going to join, which is why the number of inputs can grow drastically with the demands of the original task. The second type of inputs is constant inputs. Those inputs are filled with role vectors. In Fig. 2 only two roles are taken, for simplicity of description. In reality there can be plenty of roles, and the neural network is designed to be easily extended to a larger case.

Fig. 3. Overall scheme of the neural architecture

Reshaping Layers. These layers are part of the subtree flattening branch (Fig. 4) and exist for input tensors of rank 1 and 2. This is a technical requirement of the implementation in the Keras¹ framework, due to the fact that the Flatten layer can work only with tensors of rank bigger than two. So, the Reshaping layers expand the dimensions of such inputs with a fake dimension of 1 to satisfy the Flatten layer requirements.

¹ https://keras.io/


Flattening Layers. These layers are part of the subtree flattening branch (Fig. 4) and exist for all input tensors. They transform tensors of different ranks to a simple vector format according to the ordinary rules of flattening multi-dimensional tensors.

Concatenate Layers. These layers are part of the subtree flattening branch (Fig. 4). They join the vectors that correspond to each level of the tensor representation into one vector. The order is very important here: from the vectors representing the zero depth level up to depth N.

Transpose Layers. These layers are part of the subtree flattening branch (Fig. 4). Since the next operation is a matrix-vector multiplication, the vector must be transformed into a column vector. The Transpose layers close the subtree flattening branch, and their output is used in the final part of the network.

Fig. 4. Subtree flattening branch of the proposed architecture

ShiftMatrix Layers. These layers are part of the role propagating branch (Fig. 5). The primary and only purpose of this layer is the production of the shift matrix discussed in the section "Theoretical Method of Building the Shift Matrix". In practice it is a tensor of rank 2, i.e. an ordinary matrix. It is interesting to estimate its dimensions.


The width of the matrix (the shift operator) equals the size of the vector representing the tree that should be assigned to the given role, while the height of the matrix equals the size of the vector representing the structure assigned to the new role.

Fig. 5. Role propagating branch

MulVec Layers. These layers are part of the neural network tail (Fig. 3). They perform an ordinary matrix-vector multiplication, and the resulting vector contains the tensor representation of the current subtree assigned to its new role.

Add Layer. This layer is the output of the network (Fig. 3). All the subtrees are now assigned to new roles, and it is required to join them together; the sum vector represents the resulting structure after joining all the subtrees at the tensor level.
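A minimal Keras functional-API sketch of such a network, not the authors' implementation: modern Flatten accepts rank-2 inputs, so the Reshape workaround described above is omitted, and the shift matrices are baked in as constants rather than produced by dedicated ShiftMatrix layers.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

m, rho, depth = 3, 2, 2                       # filler dim, role dim, max depth
sizes = [m * rho**k for k in range(depth + 1)]
offs = np.cumsum([0] + sizes)

def shift_matrix(role):
    """Contents of a ShiftMatrix layer, cf. (8)-(9)."""
    W = np.zeros((offs[-1], offs[-1]), dtype=np.float32)
    for k in range(depth):
        W[offs[k + 1]:offs[k + 2], offs[k]:offs[k + 1]] = np.kron(
            np.eye(sizes[k], dtype=np.float32), role.reshape(-1, 1))
    return W

def branch(tag):
    """Subtree-flattening branch: one input per rank, Flatten, Concatenate
    in depth order (rank 0 .. depth)."""
    ins = [keras.Input(shape=(m,) + (rho,) * k, name=f'{tag}{k}')
           for k in range(depth + 1)]
    return ins, layers.Concatenate()([layers.Flatten()(t) for t in ins])

(p_in, p), (q_in, q) = branch('p'), branch('q')
W0 = tf.constant(shift_matrix(np.array([10., 0.], dtype=np.float32)).T)
W1 = tf.constant(shift_matrix(np.array([0., 5.], dtype=np.float32)).T)
p_shift = layers.Lambda(lambda v: tf.matmul(v, W0))(p)   # MulVec for role r0
q_shift = layers.Lambda(lambda v: tf.matmul(v, W1))(q)   # MulVec for role r1
model = keras.Model(p_in + q_in, layers.Add()([p_shift, q_shift]))
model.summary()   # the output vector encodes cons(p, q), Eq. (7)
```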

5 Conclusion

A novel neural architecture that solves the task of joining structures was proposed and implemented in the Keras framework. The implementation is open-source and available online². Several conceptual gaps of the original works devoted to the same topic were closed, in particular the mechanics of building the shift matrix. The elaborated network is robust and is designed to work with an arbitrary number of roles and with existing tensor representations of different depths. This result provides an essential brick in the bridge between the symbolic and sub-symbolic levels of computation. However, there is still an open question of performing other operations over arbitrary structures at the tensor level, for example adding or removing nodes or moving nodes to other positions in the structure. Also, the current proposal requires an initial definition of the maximum structure depth, which can be an obstacle in edge cases, as well as the construction of the shift matrix depending on the number of roles.

² https://github.com/demid5111/ldss-tensor-structures


Thus, there is a clear direction for further development of Tensor Product Variable Binding methods.

References
1. Rumelhart, D.E., Hinton, G.E., McClelland, J.L.: A general framework for parallel distributed processing. Parallel Distrib. Process.: Explor. Microstruct. Cogn. 1, 26 (1986)
2. Rumelhart, D.E., McClelland, J.L., PDP Research Group: Parallel Distributed Processing, 1st edn, p. 184. MIT Press, Cambridge (1988)
3. Serafini, L., Garcez, A.D.A.: Logic tensor networks: deep learning and logical reasoning from data and knowledge. arXiv preprint arXiv:1606.04422 (2016)
4. Teso, S., Sebastiani, R., Passerini, A.: Structured learning modulo theories. Artif. Intell. 244, 166–187 (2017)
5. Browne, A., Sun, R.: Connectionist inference models. Neural Netw. 14(10), 1331–1355 (2001)
6. Smolensky, P.: Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artif. Intell. 46(1), 159–216 (1990)
7. Gallant, S.I., Okaywe, T.W.: Representing objects, relations, and sequences. Neural Comput. 25(8), 2038–2078 (2013)
8. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: International Conference on Machine Learning, pp. 1188–1196 (2014)
9. Demidovskij, A.: Considering selected aspects of tensor product variable binding in connectionist systems. In: Proceedings of the 2019 Intelligent Systems Conference (IntelliSys), pp. 5–6. Springer, Cham (2019)
10. Smolensky, P., Legendre, G.: The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar (Cognitive Architecture), 1st edn. MIT Press, Cambridge (2006)

Astrocytes Organize Associative Memory Susan Yu. Gordleeva1(&), Yulia A. Lotareva1, Mikhail I. Krivonosov1, Alexey A. Zaikin1,2, Mikhail V. Ivanchenko1, and Alexander N. Gorban1,3 1

Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia [email protected] 2 University College London, London, UK 3 University of Leicester, Leicester, UK

Abstract. We investigate one aspect of the functional role played by astrocytes in the neuron-astrocyte networks present in the mammalian brain. To highlight the effect of neuron-astrocyte interaction, we consider simplified networks with bidirectional neuron-astrocyte communication and without any connections between neurons. We show that the mere fact that an astrocyte covers several neurons, together with the different time scale of calcium events in the astrocyte, can lead to the appearance of neural associative memory. Without any doubt, this mechanism makes neural networks more flexible for learning and, hence, may contribute to explaining why astrocytes have been evolutionarily needed for the development of the mammalian brain.

Keywords: Astrocyte · Associative memory · Neural network

1 Introduction

The functional role of astrocytic calcium signaling in brain information processing has been intensely debated in recent decades. Astrocytes play crucial roles in brain homeostasis and are emerging as regulatory elements of neuronal and synaptic physiology by responding to neurotransmitters with Ca²⁺ elevations and releasing gliotransmitters that activate neuronal receptors [1]. The characteristic times of calcium signals (1–2 s) are three orders of magnitude longer than the duration of spikes in neurons (1 ms). It was shown that an astrocyte can act as a temporal and spatial integrator, hence detecting the level of spatio-temporal coherence in the activity of the accompanying neuronal network. A currently actively discussed hypothesis is that the astrocytic calcium activity can induce spatial synchronization in neuronal circuits defined by the morphological territory of the astrocyte [2–4]. In other words, one can draw an analogy with the Hopfield network. Calcium events in astrocytes that induce synchronization in surrounding neural ensembles work as a temporal Hopfield network and, hence, can be interpreted as an associative memory model. In this paper, we consider one of the simplest models of a neuron-astrocyte network (NAN), where we implement a kind of Hopfield network with forgetting. There are just a few previous works studying the role of astrocytes in learning tasks.

Astrocytes organize associative memory

385

rule to train deep learning networks in data classification and found that the neuronastrocyte networks were able to outperform identical networks without astrocytes in all classification tasks they implemented [5–7]. In the presented studies they taken into account only temporal features of astrocytic modulation of the signal transmission in neural network. In contrast to this approach, we concentrate on the local spatial synchronization organized by astrocyte, which, due to its different time scale, work as a kind of neural associative memory.

2 Model and Architecture of Neuron-Astrocyte Network

The proposed neuron-astrocyte network consists of two layers: a first layer of neurons with dimensions 40 × 40 and a second layer of astrocytes with dimensions 13 × 13. To focus only on associative learning, the elements in each layer are not interconnected. We consider bidirectional neuron-astrocytic communication between the layers. Each astrocyte interacts with a neuronal ensemble of dimensions 4 × 4, overlapping with its neighbors by one row (see Fig. 1). Experiments show that astrocytes and neurons communicate via a special mechanism modulated by neurotransmitters from both sides. The model is designed so that when the calcium level inside an astrocyte exceeds a threshold, the astrocyte releases a neuromodulator (e.g., glutamate) that may affect the release probability (and thus the synaptic strength) at neighboring connections in a tissue volume. A single astrocyte can regulate the synaptic strength of several neighboring synapses. The membrane potential of a single neuron is described by the Izhikevich model and evolves according to the following equations [8]:

\begin{cases} \dfrac{dV}{dt} = 0.04V^2 + 5V + 140 - U + I_{app} + I_{astro}, \\ \dfrac{dU}{dt} = a(bV - U). \end{cases} \quad (1)

If V ≥ 30 mV, then V → c, U → U + d. We use the following parameter values: a = 0.1, b = 0.25, c = −65, d = 2. The applied current I_app simulates the input signal: I_app = 5 if an input signal is presented. The astrocytic modulation of synaptic activity is modeled by the current I_astro, which takes the value I_astro = 30 if the Ca2+ level in the astrocyte exceeds 0.15 μM and more than 50% of the neurons corresponding to this astrocyte are activated.
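The following minimal sketch (an illustrative re-implementation, not the authors' code) shows how Eq. (1) with its spike-and-reset rule can be integrated with a simple Euler step; the 1 ms time step and the function name are assumptions.

```python
def izhikevich_step(v, u, i_app, i_astro, dt=1.0,
                    a=0.1, b=0.25, c=-65.0, d=2.0):
    """Advance membrane potential v and recovery variable u of Eq. (1) by dt ms."""
    dv = 0.04 * v * v + 5.0 * v + 140.0 - u + i_app + i_astro
    du = a * (b * v - u)
    v, u = v + dt * dv, u + dt * du
    fired = v >= 30.0
    if fired:                      # spike-and-reset rule from the paper
        v, u = c, u + d
    return v, u, fired

# Example: a neuron driven by the input current I_app = 5 for 100 ms
v, u = -65.0, 0.25 * (-65.0)
for _ in range(100):
    v, u, fired = izhikevich_step(v, u, i_app=5.0, i_astro=0.0)
```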


Calcium dynamics in the astrocyte is described by the Li–Rinzel model. The state variables of each cell include the IP3 concentration IP3, the Ca2+ concentration Ca, and the fraction of activated IP3 receptors h. They evolve according to the following equations [9]:

\begin{cases} \dfrac{dCa}{dt} = I_{er} - I_{pump} + I_{leak}, \\ \dfrac{dh}{dt} = \dfrac{h_{\infty} - h}{\tau_h}, \\ \dfrac{dIP_3}{dt} = \dfrac{IP_3^{*} - IP_3}{\tau_r} + I_{plc} + I_{neuro}, \end{cases} \quad (2)

I_{er} = c_1 v_1 \left(\dfrac{IP_3}{IP_3 + d_1}\right)^{3} \left(\dfrac{Ca}{Ca + d_5}\right)^{3} h^{3} \left(\dfrac{c_0 - Ca}{c_1} - Ca\right), \qquad
I_{leak} = c_1 v_2 \left(\dfrac{c_0 - Ca}{c_1} - Ca\right),

I_{pump} = v_3 \dfrac{Ca^2}{Ca^2 + k_3^2}, \qquad
h_{\infty} = \dfrac{Q_2}{Q_2 + Ca}, \quad Q_2 = d_2 \dfrac{IP_3 + d_1}{IP_3 + d_3}, \qquad
\tau_h = \dfrac{1}{a_2 (Q_2 + Ca)}, \qquad
I_{plc} = v_4 \dfrac{Ca + (1 - \alpha) k_4}{Ca + k_4}.

The biophysical meaning of all parameters in Eq. (2) and their experimentally determined values can be found in Ref. [6]. For our purposes we use the following parameter values [6]: c_0 = 2.0 μM; c_1 = 0.185; v_1 = 6 s^−1; v_2 = 0.11 s^−1; v_3 = 2.2 μM s^−1; v_5 = 0.025 μM s^−1; v_6 = 0.2 μM s^−1; k_1 = 0.5 s^−1; k_2 = 1.0 μM; k_3 = 0.1 μM; a_2 = 0.14 μM^−1 s^−1; d_1 = 0.13 μM; d_2 = 1.049 μM; d_3 = 0.9434 μM; d_5 = 0.082 μM; α = 0.8; τ_r = 7.143 s; IP_3^* = 0.16 μM; k_4 = 1.1 μM.

The current I_neuro describes the production of IP3 due to the synaptic activity of neighboring neurons. It is modeled as a rectangular pulse with amplitude 5 μM and duration 60 ms; I_neuro ≠ 0 if more than 50% of the neurons interacting with this astrocyte are activated. Note that the time unit in the neuronal model, Eq. (1), is 1 ms. Due to the slower timescale, all empirical constants in the astrocytic model, Eq. (2), are given in seconds. When integrating the joint system of differential equations, the astrocytic model time is rescaled so that the units in both models match up.
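As an illustration, a sketch of the right-hand side of Eq. (2) is given below (not the authors' code). The constant v4 appears in I_plc but is absent from the parameter list above, so the value used here is a placeholder assumption.

```python
P = dict(c0=2.0, c1=0.185, v1=6.0, v2=0.11, v3=2.2, v4=0.3,  # v4 is assumed
         a2=0.14, d1=0.13, d2=1.049, d3=0.9434, d5=0.082,
         alpha=0.8, tau_r=7.143, ip3s=0.16, k3=0.1, k4=1.1)

def li_rinzel_rhs(ca, h, ip3, i_neuro, p=P):
    """Time derivatives (per second) of Ca, h and IP3 from Eq. (2)."""
    ca_er = (p['c0'] - ca) / p['c1']                 # ER calcium concentration
    m_inf = ip3 / (ip3 + p['d1'])
    n_inf = ca / (ca + p['d5'])
    i_er = p['c1'] * p['v1'] * (m_inf * n_inf * h) ** 3 * (ca_er - ca)
    i_leak = p['c1'] * p['v2'] * (ca_er - ca)
    i_pump = p['v3'] * ca**2 / (ca**2 + p['k3']**2)
    q2 = p['d2'] * (ip3 + p['d1']) / (ip3 + p['d3'])
    tau_h = 1.0 / (p['a2'] * (q2 + ca))
    i_plc = p['v4'] * (ca + (1 - p['alpha']) * p['k4']) / (ca + p['k4'])
    dca = i_er - i_pump + i_leak
    dh = (q2 / (q2 + ca) - h) / tau_h
    dip3 = (p['ip3s'] - ip3) / p['tau_r'] + i_plc + i_neuro
    return dca, dh, dip3
```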


Fig. 1. Network structure. Input images of 40 × 40 pixels are fed into the neuronal network containing 40 × 40 neurons. Red fields correspond to astrocyte territories, which overlap by a one-neuron-wide layer.

3 Results

We used as input signals black-and-white images of the digits 0 and 1 with a size of 40 × 40 pixels, as shown in Fig. 2. The training set included 10 samples for each image, with 10% salt-and-pepper noise added to every sample fed into the NAN (see Fig. 3a).

Fig. 2. Patterns for network training.

A 40 × 40 pixel input is processed by a 40 × 40 neuron layer (1600 neurons), producing the applied currents I_app in Eq. (1) for each input, which are further converted into spikes. The neural response, shown in Fig. 3b, is the membrane potential map, further converted into spike trains. Each sample was presented to the network for 4 ms, with a period of 40 ms between samples. Figure 4 shows the membrane potential changes. During training, each astrocyte monitored the activity of its 16 associated neurons within a time window of 400 ms. If more than 8 neurons were spiking and the spiking frequency exceeded 17.5 s^−1, the astrocyte received an input signal I_neuro (see Eq. (2)), inducing an increase in the intracellular calcium concentration (see Fig. 3c).
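A minimal sketch of this activation criterion is given below (an illustration, not the authors' code); `spike_times` is a hypothetical list of spike-time arrays in milliseconds, one per neuron of the 4 × 4 ensemble.

```python
def astrocyte_input(spike_times, t_now, window=400.0, rate_thr=17.5, n_thr=8):
    """Return the I_neuro amplitude if the ensemble activity criterion is met."""
    active = 0
    for times in spike_times:
        recent = [t for t in times if t_now - window <= t <= t_now]
        rate = len(recent) / (window / 1000.0)   # spike rate in 1/s
        if rate > rate_thr:
            active += 1
    return 5.0 if active > n_thr else 0.0        # pulse amplitude from the text
```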


Fig. 3. (a) The training sample with 10% salt-and-pepper noise. (b) The response of the neuronal network; the values of the membrane potentials are shown. (c) The intracellular Ca2+ concentrations in the astrocytic layer.

After training, our neuron-astrocyte network remembers the pattern for a period of time determined by the duration of the calcium pulse in the astrocyte. The testing sample was presented to the network for 20 ms. While the Ca2+ concentration in an astrocyte exceeded the threshold of 0.15 μM and more than 8 neurons were still active, feedback from astrocytes to neurons was turned on. This feedback is determined by the biophysical mechanisms of astrocytic modulation of synaptic transmission and is modeled as the additional current I_astro in Eq. (1). An example of this test is shown in Fig. 5.

Fig. 4. (a–c) Membrane potentials of neurons during and after training. (a) A neuron in the target pattern interacting with an active astrocyte. (b) A neuron not in the target pattern interacting with an active astrocyte. (c) A neuron not in the target pattern interacting with a quiet astrocyte. (d) The intracellular Ca2+ concentration in an active astrocyte.


Fig. 5. The testing sample with 40% salt-and-pepper noise (a). The response of the neuronal network after an input of 4.4 ms (b) and 11.6 ms (c) duration. (d) The intracellular Ca2+ concentrations in the astrocytic layer.

Tests showed that the network can not only clean noise inside the target pattern (Fig. 5b), as expected, but can also separate the pattern and the surrounding noise in time (Fig. 5c). The latter is due to the fact that the neuronal spiking frequency is proportional to the value of the applied current.

Fig. 6. The dependence of the accuracy on the noise level. The dotted line corresponds to a manually selected accuracy threshold.


To test the robustness of the proposed network to noise, we calculated the dependence of the accuracy on the noise level (see Fig. 6). The accuracy does not reach 100% even for an ideal, noise-free sample because the resolution of our system is determined by the interaction radius of astrocytes with neurons. The capacity of the proposed network is determined by the orthogonality of the images, the number of astrocytes, and the radius of overlap between astrocyte territories. In Fig. 7 we present an example of training the proposed network on two patterns, represented by the digits 1 and 0.

Fig. 7. (a) and (d) The training samples with 10% salt-and-pepper noise. (b) and (e) The responses of the neuronal network; the values of the membrane potentials are shown. (c) and (f) The intracellular Ca2+ concentrations in the astrocytic layer. (g) The testing sample with 40% salt-and-pepper noise. The response of the neuronal network after the 4.4 ms (h) and 11.6 ms (j) input.

4 Conclusions

In this paper, we describe a simple neuron-astrocyte network architecture with the capability of associative memory. The proposed neuron-astrocyte network works as a temporal Hopfield network. The effect occurs because of the local spatial synchronization organized by the astrocyte, which works on a different time scale; no links between cells are required. Astrocytic modulation of the activity of nearby neurons during an elevation of calcium concentration imitates a temporary Hebbian synapse. In the future, the proposed neuron-astrocyte network will be developed by incorporating a Hebbian learning algorithm. As we know from working with artificial intelligence algorithms, the flexibility of learning strongly depends on the complexity of the network. As we have demonstrated, astrocytes increase the complexity of the neural network through the coordination induced by calcium events, and this mechanism alone can lead to the organization of neural associative memory. It would be extremely interesting to investigate how this learning mechanism works together with deep learning. Another important direction of future research is the identification of conceptual markers of malfunction associated with either age-related disease or growth disorders. In both of these situations the brain loses the ability to learn properly; hence, the question arises whether we could model these processes with our simple conceptual model and, possibly, shed light on a methodology for identifying pathology markers in real medical applications.

Acknowledgments. This work was supported by the Ministry of Science and Education of the Russian Federation (Grant No. 075-15-2019-871).

References
1. Verkhratsky, A., Butt, A.: Glial Neurobiology. Wiley, Chichester (2007)
2. Bazargani, N., Attwell, D.: Astrocyte calcium signaling: the third wave. Nat. Neurosci. 19(2), 182–189 (2016)
3. Araque, A., Carmignoto, G., Haydon, P.G., Oliet, S.H., Robitaille, R., Volterra, A.: Gliotransmitters travel in time and space. Neuron 81, 728–739 (2014)
4. Gordleeva, S.Y., Ermolaeva, A.V., Kastalskiy, I.A., Kazantsev, V.B.: Astrocyte as spatiotemporal integrating detector of neuronal activity. Front. Physiol. 10, 294 (2019)
5. Porto-Pazos, A.B., Veiguela, N., Mesejo, P., Navarrete, M., Alvarellos, A., Ibáñez, O., Pazos, A., Araque, A.: Artificial astrocytes improve neural network performance. PLoS ONE 6(4), e19109 (2011)
6. Alvarellos-González, A., Pazos, A., Porto-Pazos, A.B.: Computational models of neuron-astrocyte interactions lead to improved efficacy in the performance of neural networks. Comput. Math. Methods Med. (2012)
7. Mesejo, P., Ibáñez, O., Fernández-Blanco, E., Cedrón, F., Pazos, A., Porto-Pazos, A.B.: Artificial neuron–glia networks learning approach based on cooperative coevolution. Int. J. Neural Syst. 25(4), 1550012 (2015)
8. Izhikevich, E.: Simple model of spiking neurons. IEEE Trans. Neural Netw. 14(6), 1569–1572 (2003)
9. Li, Y.X., Rinzel, J.: Equations for InsP3 receptor-mediated [Ca2+]i oscillations derived from a detailed kinetic model: a Hodgkin-Huxley like formalism. J. Theor. Biol. 166(4), 461–473 (1994)

Team of Neural Networks to Detect the Type of Ignition

Alena Guseva1 and Galina Malykhina1,2

1 Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, St. Petersburg, Russia
[email protected]
2 Russian State Scientific Center for Robotics and Technical Cybernetics, 21 Tikhoretsky Prospect, St. Petersburg, Russia
[email protected]
https://www.spbstu.ru/

Abstract. This article describes the development of a modern multisensory fire system with sensors for temperature, CO concentration and smoke concentration. The presence of several different types of sensors makes it possible to determine the type of ignition source and thus to automatically select the means of fire extinguishing at the very beginning of the ignition process. The study was carried out on the basis of simulation results obtained in a supercomputer center: ignition processes in ship rooms were simulated for various fire sources, namely paper, household waste containing plastic, gasoline, alcohol-containing substances and electrical cables. As the study showed, a good result can be obtained with the help of a team of specially organized neural networks. A team of neural networks divided into two levels is proposed to solve this problem: at the first level, neural networks with partial (semi-supervised) training are used; at the second level, a probabilistic neural network. The fire system is highly flexible at the hardware level because its wireless interface allows quick reconfiguration. The software of the fire system is likewise flexible, allowing simple expansion, contraction or modification of software modules as the possible ignition sources in a room change.

Keywords: Fire system · Source of ignition · Team of neural networks · Semi-supervised learning · Bayesian NN

1 Introduction

The ship's premises present different fire hazards. Moreover, even inside a single room, for example an engine room or a room with electrical equipment, the probabilities and types of ignition can differ significantly. Automatic extinguishing means can eliminate a fire most quickly, especially if they are applied locally. To use these tools, one needs to know what substance is ignited and where the fire is located. In this case, local application of a suitable fire extinguishing agent is possible. The considered multi-sensor fire system, with its sensors for temperature, CO concentration and smoke concentration, makes it possible to determine the type of fire. With a sufficient number of sensors and their optimal placement, it is also possible to determine the area of ignition. A better result can be obtained using neural networks or a team of neural networks. Therefore, the aim of our study is to develop a neural network data processing algorithm for a multisensory fire system with the goal of the most rapid fire detection, localization and classification.

2 Simulation of Fire

Consider the following sources of ignition and their respective classes in accordance with the classification given in the NFPA 10 (National Fire Protection Association) standard [1], listed in Table 1. Depending on the type of ignition, the readings of the three types of sensors (temperature, carbon monoxide concentration and smoke concentration) vary with time, as shown in Fig. 1. The data were obtained using simulations on a supercomputer in the FDS environment [2]. The inertia of the sensors was taken into account at the preprocessing stage using the impulse response of each sensor. The analysis of the dependencies in Fig. 1 showed that the ignition source affects the change in the fire factors received from the sensors.

Table 1. Sources of fire and their corresponding class

Ignition source                    | Class
Paper                              | A1
Household waste containing plastic | A2
Gasoline                           | B1
Alcohol-containing substances      | B2
Electrical cable                   | E

Fig. 1. Changes in fire factors for five ignition sources, with the fire starting at time zero: (a) data from the temperature sensor; (b) data from the carbon monoxide concentration sensor; (c) data from the smoke concentration sensor.


This distinction can be used to identify ignition sources. The recognition result depends on the number of sensors and their relative location. Three sensors were selected for the investigated room measuring 5 by 7 m, which corresponds to the SP 5.13130.2009 standard [3]. The location of the sensors was optimized using a variant of the genetic algorithm proposed by the authors in [4–6]. The importance of temporal dependencies for the recognition problem leads to the application of temporal signal processing using dynamic ANNs with short-term memory. Short-term memory is implemented as a delay line at the input of the ANN. Data from the sensors are received once per second.

3 Architecture of the Fire Type Detection System

The fire system is built on the basis of a wireless interface and allows quick reconfiguration at the hardware level when the situation changes in a particular room or when moving from one room to another. The software of the fire system should also be highly flexible, allowing simple expansion, contraction or modification as the fire conditions in the room change. A team of neural networks divided into two levels is proposed to solve this problem: at the first level, neural networks with partial (semi-supervised) training are used; at the second level, a probabilistic neural network. Let us consider each part of the algorithmic support in more detail (Fig. 2).

Fig. 2. The team of neural networks for fire type recognition

– Input parameters. The number of input parameters n depends on the number of sensors located in the room and on the length d of the short-term memory delay line: n = qdK, where K is the number of individual sensors included in the multisensor and q is the number of values in a data record from the multisensor. In our case q = 3, since sensors of temperature, carbon monoxide concentration and smoke concentration are used (smoke concentration is measured indirectly, based on visibility).
– Delay lines. Since the type of fire is characterized by the dynamics of changes in the fire factors, the current and several previous readings are fed to the NN input. To determine the required amount of short-term memory, we analyzed the effect of the length of the delay line, which was selected from the series 3, 5, 10 (see the sketch after Fig. 3).


– Neural networks trained with partial involvement of a teacher (semi-supervised learning). Five first-level neural networks with the same architecture are organized, each of which is designed to identify one type of fire. The input data for the neural networks have the same form, so it is advisable to use structurally identical neural networks. Having several identical neural networks reduces the total number of parameters to be trained; less data is required for training and, under equal conditions, the network is less prone to overfitting.
– Bayesian neural network. The neural network of the second level is designed to estimate the probability of each of the five types of ignition. The structure of the five identical first-level networks is shown in Fig. 3: X is the data vector of size n obtained from the multisensors, and V is the output value in the range from 0 to 1. The input layer of each of these neural networks receives the current and delayed normalized data from the sensors. Normalization maps the data to the interval [0, 1]. The neural networks have an input, a hidden and an output layer; sigmoid activation functions are used for the hidden and output layers. During the partial training of each network, only the data related to the corresponding type of fire are labeled, while the remaining data are treated as "other". Training was performed using the Levenberg–Marquardt backpropagation algorithm. The amount of data is 25,025 samples for training and 5,005 for testing.

Fig. 3. The structure of identical neural networks.
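The sketch below illustrates, under stated assumptions, how the delay-line input of size n = qdK can be assembled and passed through one of the five identical first-level networks; the array shapes, helper names and parameters are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
import numpy as np

def make_input(history, d=5):
    """history: array of shape (T, K, q), one record per second per multisensor."""
    window = history[-d:]                  # current sample plus d - 1 delayed ones
    x = window.reshape(-1).astype(float)   # flatten to n = q*d*K components
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo + 1e-9)     # normalization to [0, 1]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def first_level_net(x, w_hid, b_hid, w_out, b_out):
    """One of the five identical sigmoid networks; returns V in (0, 1)."""
    h = sigmoid(w_hid @ x + b_hid)
    return sigmoid(w_out @ h + b_out)

# Example with placeholder sizes: K = 3 sensors, q = 3 channels, d = 5 delays
history = np.random.rand(60, 3, 3)
x = make_input(history)                              # n = 45 inputs
w_hid = np.random.randn(20, x.size); b_hid = np.zeros(20)
w_out = np.random.randn(1, 20); b_out = np.zeros(1)
v = first_level_net(x, w_hid, b_hid, w_out, b_out)
```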

4 Evaluation of Ignition Source Recognition Results

The results of the verification of the neural networks are shown in Table 2. Analysis of the effect of short-term memory, taking into account the training time and network accuracy, showed that the number of delayed samples of the sensor signals can be taken equal to 5; a number of delays greater than 5 leads to an increase in training time and a decrease in accuracy. The Bayesian network is used to calculate the probability that a fire belongs to the appropriate class. The structure of the Bayesian network, shown in Fig. 4, consists of an input layer, hidden layers and an output layer. The input layer receives the outputs of the five neural networks designed to detect each type of fire. As the activation function of the hidden layer, a normalized exponent (a generalization of the logistic function) was used.

Table 2. Training results for the twin first-level networks with 3, 5 and 10 delayed samples at the input (values in the last column are the obtained recognition accuracies)

Artificial neural network (fire type) | Number of delays | Training time, minutes | Accuracy, %
Electrical cable                      | 3  | 01:48 | 87.1
Electrical cable                      | 5  | 13:10 | 93.5
Electrical cable                      | 10 | 19:18 | 44.5
Paper                                 | 3  | 02:05 | 91.6
Paper                                 | 5  | 12:17 | 95.8
Paper                                 | 10 | 20:24 | 50.1
Gasoline                              | 3  | 02:09 | 97.7
Gasoline                              | 5  | 12:24 | 98.8
Gasoline                              | 10 | 20:51 | 52.4
Alcohol-containing substances         | 3  | 02:07 | 95.4
Alcohol-containing substances         | 5  | 12:43 | 97.6
Alcohol-containing substances         | 10 | 21:02 | 51.8
Household waste containing plastic    | 3  | 02:10 | 87.5
Household waste containing plastic    | 5  | 12:29 | 87.6
Household waste containing plastic    | 10 | 20:47 | 43.8

Fig. 4. The structure of the Bayesian network

The output layer of the Bayesian network represents the probabilities that one of the five types of fire occurs or that there is no fire at all; the sum of all values of the output vector equals one. As a result of training the Bayesian neural network, the resulting accuracy of determining the type of fire was 93.7%, with most of the residual error originating from the preceding five neural networks. The time required for training is 18 s.
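A minimal sketch of this second-level combiner is given below (the weights are hypothetical placeholders, and the code is illustrative rather than the authors' implementation): the five first-level outputs are mapped through the normalized exponent (softmax) to six probabilities, one per fire type plus "no fire", summing to one.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # normalized exponent used in the network
    return e / e.sum()

def second_level(v, w, b):
    """v: the 5 first-level outputs; returns 6 class probabilities."""
    return softmax(w @ v + b)

v = np.array([0.9, 0.1, 0.2, 0.05, 0.1])    # example first-level outputs
w = np.random.randn(6, 5); b = np.zeros(6)  # placeholder parameters
p = second_level(v, w, b)                   # p.sum() == 1.0
```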

5 Conclusion

The proposed two-tier architecture has several advantages:
– It allows a simple restructuring of the system when some ignition sources are absent from the room. To do this, one removes the first-level neural network responsible for detecting that type of source and reduces the number of neurons in the hidden layer of the Bayesian neural network.
– It allows a simple expansion of the number of fire types in a given room. To add a new type of ignition, it is enough to add a first-level neural network and train it to recognize the new type. In the Bayesian network, one neuron is added and only this network is retrained, for which the training time is very short.
– It allows a quick reconfiguration of the fire system when its units are moved to another room. To do this, the optimal location of the sensors must be determined and the system changed so that it can detect the types of ignition sources possible in the new room. The set of first-level neural networks is changed as follows: the networks responsible for fires whose sources are absent in the new room are removed, and networks responsible for detecting new types of fires are added and trained.

References
1. NFPA 10: Standard for Portable Fire Extinguishers. https://www.nfpa.org/codes-and-standards/all-codes-and-standards/list-of-codes-and-standards/detail?code=10
2. McGrattan, K., Hostikka, S., Floyd, J., Baum, H., Rehm, R., Mell, W., McDermott, R.: Fire Dynamics Simulator (Version 5) Technical Reference Guide. National Institute of Standards and Technology, Gaithersburg (2010). http://code.google.com/p/fds-smv
3. SP 5.13130.2009 Fire protection systems: Installation of fire alarm and fire extinguishing automatics. Norms and rules of design (with Amendment N 1). http://docs.cntd.ru/document/1200071148
4. Malykhina, G.F., Guseva, A.I., Militsyn, A.V., Nevelskii, A.S.: Developing an intelligent fire detection system on the ships. In: Sukhomlin, V., Zubareva, E., Shneps-Shneppe, M. (eds.) The International Scientific Conference on II Convergent Cognitive Information Technologies (Convergent'2017), vol. 2064, pp. 289–296. Moscow, Russia (2017)
5. Militsyn, A.V., Malykhina, G.F., Guseva, A.I.: Early fire prevention in the plant. In: International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM), Saint Petersburg, Russia, vol. 2, pp. 1–4. IEEE Xplore (2017)
6. Guseva, A.I., Malykhina, G.F., Nevelskiy, A.S.: Neural network based algorithm for the measurements of fire factors processing. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds.) Neural Computation, Machine Learning, and Cognitive Research II. Studies in Computational Intelligence, vol. 799, pp. 160–166. Springer, Cham (2019)

Chaotic Spiking Neural Network Connectivity Configuration Leading to Memory Mechanism Formation

Mikhail Kiselev

Chuvash State University, Cheboksary, Russia
[email protected]

Abstract. A chaotic spiking neural network serves as the main component (the "liquid") in liquid state machines (LSM) – a very promising approach to applying neural networks to the online analysis of dynamic data streams. The LSM's ability to recognize complex dynamic patterns is based on the "memory" of its liquid component – the prolonged reaction of its neural network to input stimuli. A generalization of the LSM called the self-organizing LSM (an LSM whose spiking neural network has synaptic plasticity switched on) is studied. It is demonstrated that memory appears in such networks under certain locality conditions on their connectivity. A genetic algorithm is used to determine the parameters of the neuron model, the synaptic plasticity rule and the connectivity that are optimal from the point of view of memory characteristics.

Keywords: Spiking neural network · Liquid state machine · Chaotic neural network · Synaptic plasticity · Neural network self-organization · Memory mechanism

1 Introduction

The recently proposed neural network paradigms such as spiking neural networks (SNN) and convolutional and deep learning networks are considered by many researchers as a potential basis for the breakthrough IT technologies of the near future. Since SNNs are complex non-linear dynamic systems, their specific application area is the processing of dynamic signals such as video streams, sensory data in robotics or signals from technological sensors. The most common form of SNN architecture used for the solution of this kind of problem is the so-called liquid state machine (LSM) [1]. The LSM is a computational model consisting of two main parts. The first part is a large chaotic spiking neural network. It is chaotic in the sense that it has no predefined structure (layers, etc.); instead, its connectivity is random – the presence of a synaptic connection between two given neurons, the weight of this connection and its delay are random variables obeying certain statistical distributions. Input data streams represented in the form of spike sequences (recall that spiking neurons communicate by spikes – short pulses of constant amplitude and negligible duration) are injected into the network via special afferent synapses. The network responds to stimulation by the complex activity of its neurons, which may depend on the recent history of the input signal. The activity of the neurons (in the form of spike counts in equal time intervals) is monitored by the second part of the LSM, the read-out mechanism. This mechanism implements supervised learning – it learns to use the LSM neuron activity data to classify input stimuli, to make predictions, to recognize exceptional situations and to perform other data analysis and prediction tasks. The nature of the read-out mechanism may be very diverse. It may be any suitable data mining algorithm – logistic regression, support vector machine, decision tree, naïve Bayesian classifier or anything else – it is only required to be fast and able to work with very high-dimensional data. It is assumed that valuable predictive features are hidden in the multi-dimensional and diverse reaction of the large SNN to the input signal, and the job of the read-out layer is to mine them in the seeming chaos of the SNN activity. In the original version of the LSM, which is used now by the majority of researchers, neurons are not plastic – synaptic plasticity is switched off. However, there are many reasons to believe that it can play a positive role. Indeed, the strong feature of the LSM is its randomness. It makes it possible to implement all kinds of computations on input data (provided that the SNN is sufficiently large). But at the same time, randomness is an evident weakness of the LSM concept – the small number of useful circuits in the network is neighbored by plenty of random network subsets performing senseless or trivial operations. Thus, there is a tempting opportunity to preserve the computational generality provided by chaotic connectivity while eliminating senseless circuits in a process of guided self-organization implemented in the form of synaptic plasticity. This leads us to the concept of the self-organizing LSM (SOLSM). Testing this hypothesis and creating a practically usable SOLSM are among the aims of the research project ArNI (Artificial NeuroIntelligence). The crucial feature of the LSM explaining its efficiency for the processing of dynamic data is its memory ability (the transient working memory is meant here, not to be confused with the constant long-term memory fixed in synaptic weights). If the spatio-temporal pattern to be recognized spans a significant time interval, the network should memorize its beginning until its final part is presented. This is true for the SOLSM as well. However, the appearance of the memory mechanism in an evolving chaotic SNN is a very poorly explored process. Some of the earlier works of the author were devoted to this subject [2, 3]; however, structured SNNs were studied in those works. At present, the majority of working memory models in SNNs are based on short-term plasticity, an additional process modifying synaptic weights which acts together with the conventional long-term STDP plasticity (see, for example, [4]). Different researchers include this mechanism in their models in different forms. For example, in the pioneering work of Izhikevich [5], short-term plasticity enables the formation of so-called polychronous neuronal groups (PNG), whose sporadic activation indicates the recent appearance of the stimulus specific to the given PNG. Other approaches utilize the notion of attractors [6], meta-stable states of the network preserving the information expressed by the attractor in time. A further extension of this idea, called continuous attractors, explains how continuous values can be stored in memory [7].
However, most of these approaches cannot be directly applied to the SOLSM because they either cannot be implemented in chaotic networks (like continuous attractors) or use complicated synaptic plasticity models (especially keeping in mind that the LSM does not use synaptic plasticity at all).
However, most of these approaches cannot be directly applied to SOLSM because either cannot be implemented in chaotic networks (like continuous attractors) or use complicated synaptic plasticity models (especially, keeping in mind that LSM does not use synaptic plasticity at all).

400

M. Kiselev

Thus, our aim is to study how working memory can appear in chaotic SNN with Hebbian long-term synaptic plasticity.

2 Model of Neuron and Synaptic Plasticity The simplest but functional leaky integrate-and-fire (LIF) neuron model with currentbased excitatory synapses and conductance-based inhibitory synapses was used in this study. Upon receiving a spike at the moment tijþ , the i-th excitatory synapse instantly increments the neuron membrane potential u by a value equal to its weight wiþ . The k-th inhibitory synapse receiving spike instantly increments inhibitory membrane conductance c by the value of its weight w k . In the absence of input spikes, u and c decay to 0 with time constants su and sc, respectively. When u reaches a threshold value, the neuron emits a spike. After that, the neuron cannot emit new spike during the refractory period sR. Values of membrane potential are selected such that its resting value equals 0 and its threshold value equals 1. While the value of c is not equal to zero, the membrane potential falls exponentially to the inverse inhibitory potential UI (which is negative) with the time constant 1/c. Thus, the used neuron model is described by the following equations: 8  P þ  þ du u > > ¼   c ð u  U Þ þ w d t  t I < dt i ij su i;j   P dc c  > > w i d t  tij : dt ¼  sc þ

ð1Þ

i;j

and the condition that if u > 1 and t > Ta + sR, where Ta is the moment when this neuron fired last time, then the neuron fires and u is reset to 0. The plasticity rule used in this work is based on the spike timing dependent plasticity (STDP) model. As in our previous works [8, 9], the lower and upper limits (wmin and wmax) on synaptic weight values are set by using the so called synaptic resource W, whose value depends monotonically on the weight value w in accordance with the following formula: w ¼ wmin þ

ðwmax  wmin Þmax ðW; 0Þ : wmax  wmin þ max ðW; 0Þ

ð2Þ

Each long-term potentiation (LTP) or long-term depression (LTD) act increases or decreases W by a certain value, but the value of w always remains in the interval [w_min, w_max). LTP occurs when the neuron fires a short time after the arrival of a presynaptic spike and has its classic form W → W + Δw₊ exp(−Δt/τ_w), where Δt is the time interval between the post- and pre-synaptic spikes, and Δw₊ and τ_w are constants (we select τ_w equal to τ_u). The rule for LTD is different and much simpler – the synapse is depressed by the value Δw₋ every time it receives a spike. Besides that, the total value of W over all synapses of one neuron is kept constant – when some synapse is depressed or potentiated, all the rest are modified correspondingly by an equal value so that the total is preserved.


All postsynaptic connections of excitatory neurons (E) have non-negative weights, while the weights of postsynaptic connections of inhibitory neurons (I) are negative (and constant in time).
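A minimal sketch of this plasticity rule is given below (illustrative, not the project's ArNI code); the constants are placeholders rather than the GA-optimized values reported later.

```python
import math

def weight_from_resource(W, w_min=0.0, w_max=1.0):
    """Map synaptic resource W to a weight in [w_min, w_max), Eq. (2)."""
    Wp = max(W, 0.0)
    return w_min + (w_max - w_min) * Wp / (w_max - w_min + Wp)

def ltp(W, dt, dw_plus=0.05, tau_w=14.0):
    """Potentiation when the neuron fires dt ms after a presynaptic spike."""
    return W + dw_plus * math.exp(-dt / tau_w)

def ltd(W, dw_minus=0.01):
    """Depression applied every time the synapse receives a spike."""
    return W - dw_minus

def rebalance(resources, total):
    """Keep the summed resource of one neuron constant, as described above."""
    shift = (total - sum(resources)) / len(resources)
    return [W + shift for W in resources]
```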

3 External Signal and Memory Ability Tests

Now let us describe how the memory ability of the SNN is evaluated. The informational input of the SNN is represented by a certain number of nodes – sources of spikes (in our experiments this number was 600). These nodes emit low-intensity Poissonian noise (mean spike frequency 0.1 Hz). Besides that, every 100 ms some group of input nodes begins to emit high-intensity (100 Hz) Poissonian noise. This high-frequency signal lasts 40 ms (below, it will also be called a pattern). These groups do not intersect. We used 30 groups (patterns), with 20 nodes per group. The order in which these groups became active was random. The task was to predict which group was active during the preceding time interval using the network activity (spike counts of each neuron) measured in the current interval. Successful prediction would mean that the network memorizes properties of the input signal for at least 60 ms and that this memory is sufficiently stable – it is not destroyed immediately by the activity of the next input node group. The random forest data mining algorithm [10] was chosen as the read-out mechanism because of its speed and stability in the case of very numerous predictors. In the described series of experiments, the whole simulation lasted 1600 s. It was assumed that during the first 800 s the network reaches a certain equilibrium state; if it really does, then during the last 800 s no significant synaptic weight modifications should be observed. In this case, the second half of the simulation period was used for measuring the previous-pattern prediction accuracy, as described above. Interneuron connections have non-zero delays; inhibitory connections are always fast (they have a 1 ms delay).

4 Network Connectivity

Since it is not clear a priori which connectivity configuration could lead to the formation of memory in an SNN, the three following variants were tested:
• Neural gas. All neurons have identical numbers of synapses of each kind – excitatory, inhibitory and afferent (connecting a neuron with the input nodes; these are always excitatory) – but the set of presynaptic neurons is selected randomly for every neuron. Synaptic weights and delays are also random and are drawn from the same distribution law for all neurons, but different for the connection types E → E, E → I, I → E and I → I.
• Bottleneck. The same as above, but only a small fraction of all neurons have afferent links.
• Sphere. Let us imagine that all neurons correspond to randomly selected points of a sphere with radius equal to 1. The synaptic delays of excitatory links are proportional to the lengths of the links. The network connectivity obeys the "small world" law – all neurons have the same numbers of long and short links. Long links are created by the same rule as in the two previous schemes. Postsynaptic neurons for short links are selected using the probability distribution p(r) ∝ exp(−(r − a)²/(2b²)), where r is the distance to the postsynaptic neuron and a and b are constants (for excitatory links a = 0).
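The sketch below illustrates one possible reading of the "sphere" wiring rule (an assumption-laden illustration, not the ArNI implementation); the constants are placeholders scaled for the small example rather than the optimized values quoted in Sect. 6.

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere_points(n):
    """n random points on the unit sphere."""
    p = rng.normal(size=(n, 3))
    return p / np.linalg.norm(p, axis=1, keepdims=True)

def short_links(points, pre, n_links, a=0.0, b=0.1):
    """Draw n_links local postsynaptic targets with p(r) ~ exp(-(r-a)^2/(2 b^2))."""
    r = np.linalg.norm(points - points[pre], axis=1)
    w = np.exp(-(r - a) ** 2 / (2.0 * b ** 2))
    w[pre] = 0.0                       # exclude self-connection
    w /= w.sum()
    return rng.choice(len(points), size=n_links, replace=False, p=w)

pts = sphere_points(1000)
targets = short_links(pts, pre=0, n_links=6)   # e.g., 6 local excitatory links
```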

5 Genetic Algorithm Finding the Network with the Best Memory

Thus, three kinds of chaotic SNNs were explored. Each one is characterized by 30+ parameters (the constants entering the neuron model and the plasticity rule, and the structural properties of the network). The criterion for evaluating their memory ability was described in Sect. 3. Therefore, finding the best SNN is an optimization problem. This type of optimization problem is solved efficiently by the genetic algorithm (GA), and it was selected as the optimization technique in this study. Optimization was performed for networks of the same size (10000 neurons). The population size in all cases was 300; the mutation probability per individual was 0.5; elitism – 10%. Optimization was stopped when 3 consecutive populations had shown no progress.

6 Results

The GA optimization performed in this study showed that the connectivity configurations "neural gas" and "bottleneck" show almost no signs of an emerging memory mechanism. The best accuracy obtained for "neural gas" was 6.34%; for "bottleneck" – 7.25%. This accuracy is too low, close to the baseline lazy-classifier accuracy, which equals approximately 3.3% for 30 equally frequent patterns. Interestingly, synaptic plasticity was found to be a definitely positive factor – without it the accuracy fell to 4.29%. At the same time, the formation of a memory mechanism in a "sphere" SNN was reliably demonstrated (accuracy 25.7%). The best network is characterized by very sparse and local connectivity – excitatory neurons have 7 excitatory synapses, such that 6 of them are connections with the closest neurons and only 1 link is "far". The number of inhibitory synapses is only 3, and all inhibitory links are "local" (a = 0.00653, b = 0.00315). The optimal percentage of inhibitory neurons was 7.82%. Another interesting feature of the best network is the significant difference of the time constant τ_u for excitatory and inhibitory neurons (14/4 ms). The dependence of the accuracy on the network size was studied (for fixed optimal values of the other parameters); it is shown in Fig. 1. We see that it is almost linear on a logarithmic scale. The computations were performed on three GPU servers using the high-performance SNN simulation package ArNI. An SNN consisting of 100000 neurons is simulated 7 times slower than real time on a powerful PC with 4 NVIDIA TITAN Xp cards provided for this project by Kaspersky Lab.


Fig. 1. Dependence of the previous-pattern determination accuracy on the SNN size (accuracy, %, versus network size: 6000, 24000, 96000 neurons).

7 Conclusion

The results obtained in this work let us make the following conclusions:
• The connectivity scheme used in the traditional LSM is not optimal from the viewpoint of LSM memory characteristics and therefore may limit its ability to produce valuable predictive features from dynamic data. To reach higher performance, the "small world" connectivity scheme described above should be used.
• The SOLSM (an LSM with plastic neurons) can outperform the traditional LSM due to a fuller usage of network resources (restructuring silent or constantly active neuronal groups).
• Network size is very important. It is possible that the power of the SOLSM will be unveiled in full only in the case of very large SNNs, still unavailable on commonly used hardware platforms (such as GPU servers).
The type of SNNs studied in this work is very hard to explore theoretically and empirically; this scientific problem requires significant research effort. The presented results, while significant and valid, should still be considered preliminary. A systematic study of the SOLSM is being carried out now as a part of the research project ArNI supported by Kaspersky Lab; its results will be reported in further publications.

Acknowledgements. I would like to thank Andrey Lavrentyev and Artyom Nechiporuk for valuable discussion. I am grateful to Kaspersky Lab for the powerful GPU computer provided.

References
1. Maass, W.: Liquid state machines: motivation, theory, and applications. In: Computability in Context: Computation and Logic in the Real World, pp. 275–296. World Scientific (2011)
2. Kiselev, M.: Self-organization process in large spiking neural networks leading to formation of working memory mechanism. In: Rojas, I., Joya, G., Cabestany, J. (eds.) Proceedings of IWANN 2013. LNCS, vol. 7902, Part I, pp. 510–517 (2013)
3. Kiselev, M.: Self-organized short-term memory mechanism in spiking neural network. In: Proceedings of ICANNGA 2011, Part I, Ljubljana, pp. 120–129 (2011)
4. Fiebig, F., Lansner, A.: A spiking working memory model based on Hebbian short-term potentiation. J. Neurosci. 37(1), 83–96 (2016)
5. Szatmary, B., Izhikevich, E.: Spike-timing theory of working memory. PLoS Comput. Biol. 6(8), e1000879 (2010)
6. Lansner, A., Marklund, P., Sikström, S., Nilsson, L.-G.: Reactivation in working memory: an attractor network model of free recall. PLoS ONE 8(8), e73776 (2013). https://doi.org/10.1371/journal.pone.0073776
7. Seeholzer, A., Deger, M., Gerstner, W.: Stability of working memory in continuous attractor networks under the control of short-term plasticity. PLoS Comput. Biol. 15(4), e1006928 (2019). https://doi.org/10.1371/journal.pcbi.1006928
8. Kiselev, M.: Rate coding vs. temporal coding – is optimum between? In: Proceedings of IJCNN-2016, pp. 1355–1359 (2016)
9. Kiselev, M., Lavrentyev, A.: A preprocessing layer in spiking neural networks – structure, parameters, performance criteria, accepted for publication. In: Proceedings of IJCNN-2019 (2019)
10. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). https://doi.org/10.1023/A:1010933404324

The Large-Scale Symmetry Learning Applying Pavlov Principle

Alexander E. Lebedev, Kseniya P. Solovyeva, and Witali L. Dunin-Barkowski

Scientific Research Institute for System Analysis of Russian Academy of Sciences, Moscow, Russia
[email protected]

Abstract. The symmetry detection task in the domain of 100-dimensional binary vectors is considered. This task is characterized by a practically infinite number of training samples. We train an artificial neural network with binary neurons to solve the symmetry detection task. The weight changes of hidden neurons are performed according to the Pavlov Principle: in the presence of an error, synaptic weights are adjusted using a matrix of random weights. After training on a relatively small number of data samples, our network acquired generalization ability and detects symmetry in data not present in the training set. The obtained average percentage of correct recognition of our network is better than that of a classic perceptron with fixed weights of the hidden-layer neurons. We also compare the performance of different modifications of the architecture, including different numbers of hidden layers, different numbers of neurons in the hidden layer, and different numbers of synapses per neuron.

Keywords: Symmetry detection · Pavlov Principle · Artificial neural nets · Feedback alignment · Biologically plausible learning

1 Introduction

1.1 History of the Symmetry Detection Task

The symmetry detection task is a traditional benchmark in the field of Artificial Intelligence and Artificial Neural Networks research. In [1], one of the first works introducing backpropagation for training artificial neural nets, mirror symmetry detection was one of the first tasks used to test the algorithm. In this example the net had six input neurons divided into two groups. The answer was considered positive if the activity value (which can be either 0 or 1) of each input neuron of the first group was equal to the value of the corresponding neuron of the second group. The learning required about 100000 presentations of input vectors, with the weights being adjusted on the basis of the accumulated gradient after each sweep. Another example of symmetry detection was presented in [2]. In this work a Boltzmann machine was utilized. The task was formulated as recognizing whether a binary square image is symmetric; different types of symmetry were considered, such as horizontal, vertical or diagonal. The input data were represented by 4 × 4 and 10 × 10 binary images, corresponding to input vector sizes of 16 and 100, respectively. A Boltzmann machine, trained on a set of randomly selected such images, obtained 98.2% accuracy on the 4 × 4 problem and 90% accuracy on the 10 × 10 problem.

1.2 Pavlov Principle and Biologically Plausible Learning Algorithms

Since its introduction in 1986, backpropagation of errors has become the dominant algorithm for training multi-layer neural nets. However, many neuroscientists argue that it is impossible in a real brain for several reasons. For example, it seems unrealistic that axons have mechanisms to propagate back the complex information needed to calculate all the corresponding derivatives. After the pioneering work [3] of Timothy Lillicrap, many studies proposed alternative algorithms for training neural networks that were intended to be more biologically plausible. In [3], a result comparable in efficiency to deep learning by backpropagation was achieved without backpropagation of derivative-based errors; instead, error signals are transmitted to previous layers using random feedback weights. The intuition gained from such experiments and neurobiological studies can be generalized. In 2015 the Pavlov principle was introduced. It was formulated in [4] as follows: PAVLOV PRINCIPLE (PP): The network of neurons, such that the strength of each of the connections between neurons is gradually changing as a function of locally available error signal components and activity states of the neurons connected, comes in the process of network functioning to error-free operation. One of the closest implementations resembling this principle (although probably unaware of it) was performed by Nokland [5]. In this study a random matrix of weights was used to propagate error information to the neurons of hidden layers. This architecture was efficient enough to successfully solve the MNIST problem.

2 The Model

2.1 Our Formulation of the Symmetry Problem

In this work we investigate the symmetry problem, applying the Pavlov principle; the task was briefly introduced in [4]. Our formulation of the symmetry recognition task is the following. The input vector consists of a fixed even number of binary variables divided into two subsets. Each variable in the first subset has a corresponding "pair" variable in the second subset. The input data sample is considered symmetric if the values of the variables in each "pair" are equal to each other. If a data sample does not fit the symmetry class, it is considered non-symmetric, or belonging to class 0. In our experiments we consider input vectors of size 100. In this case there are 2^100 possible different data samples, which is a practical infinity, and for each symmetry class only 2^50 of them are symmetric, so the number of non-symmetric samples is vastly greater. This makes including every possible data sample in the training set in fact impossible; the classifier should have a strong generalization ability to successfully solve this problem. To generate a data sample we use the following strategy. First, the class number is selected randomly from the given set of classes, including the symmetric classes and the non-symmetric class.


For a symmetry class, we assign a random binary value to each variable of the first subset and the same value to the corresponding variable of the second subset. This guarantees that the data sample is symmetric in the chosen way. If the data sample should belong to the non-symmetric class, each variable is assigned a random value; we then check the sample for symmetry, and if it happens to belong to the symmetric class, we randomly select one variable and change its value to the opposite. Since symmetric vectors never have an odd number of 0s or 1s, this procedure is guaranteed to generate a non-symmetric data sample. However, such a precaution is practically unnecessary, since there is only a 2^−50 chance that a randomly generated 100-digit binary vector will accidentally happen to be symmetric.
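The following short sketch implements this generation strategy for one symmetry class (an illustration under the assumed pairing in which the first 50 positions are paired in order with the last 50; the paper's exact pairing may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sample(symmetric: bool, n: int = 100):
    """Generate one 100-bit sample that is (non-)symmetric by construction."""
    half = rng.integers(0, 2, n // 2)
    if symmetric:
        return np.concatenate([half, half])      # every pair of variables equal
    x = rng.integers(0, 2, n)
    if np.array_equal(x[:n // 2], x[n // 2:]):   # astronomically unlikely
        x[rng.integers(n)] ^= 1                  # flip one bit to break symmetry
    return x
```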

2.2 Learning Procedure and Experiments

To solve the symmetry recognition problem we train a neural net applying the Pavlov principle. The network consists of an input layer, one or several hidden layers and an output layer. The usage of a hidden layer is crucial, because the activity of an individual input unit, considered alone, provides no evidence about symmetry or non-symmetry of the whole input vector, so simply adding up the evidence from the individual input units is insufficient. The interpretation of the Pavlov Principle used in this study consists in using error signals, derived from the comparison of the actual output values (taken from the neurons of the output layer) with the desired ones, explicitly to train the neurons of the hidden layers. As in [4], these signals are weighted by randomly chosen but fixed weights. However, since we do not need to compute derivatives of an error function, we use binary McCulloch–Pitts-like neurons with a threshold activation function. More formally, the neurons of the hidden and output layers perform the following operation:

Y(t) = S\left( \sum_{i=1}^{N} w_i(t)\, X_i(t) - b \right) \quad (1)

Here X_i(t) are the components of the binary input vector X at step t, presented at the i-th synapse, and Y(t) is the corresponding output value. Since the architecture used in our study does not imply recurrent connections, the layer-wise successive computation of the neurons' outputs can be viewed as performed within the same step. S is a binary input-output threshold activation function: it equals 1 if its argument is greater than 0, and 0 otherwise. w_i(t) is the weight of the input variable with index i, and b is a threshold value. The learning rule of a hidden neuron can be formalized as follows:

w_i(t+1) = w_i(t) + \varepsilon \cdot F\left( \sum_{k=1}^{K} E_k(t)\, e_{k,i},\; Y(t),\; X(t) \right) \quad (2)

Here, ε is a learning rate factor which determines the speed of weight changing. E is a K-component error vector, where K is the number of output values multiplied by 2. Each component o_j(t) of the output vector corresponds to two error components, E_{2j}(t) and E_{2j+1}(t). They both equal 0 if o_j(t) matches the desired value. E_{2j}(t) equals 1 only if o_j(t) is greater than its desired value (i.e., it equals 1 when 0 is desired), and E_{2j+1}(t) equals 1 only if o_j(t) is less than desired (i.e., it equals 0 when 1 is desired). e_{k,i} is a fixed weight associated with the k-th error component and propagated to the i-th synapse. The function F sets the learning rule. In most cases in this study we use the following learning formula:

F\left( \sum_{k=1}^{K} E_k(t)\, e_{k,i},\; Y(t),\; X(t) \right) = \left( \sum_{k=1}^{K} E_k(t)\, e_{k,i} \right) \cdot (Y(t) - 0.5) \cdot (X_i(t) - 0.5) \cdot 4 \quad (3)

For training output neurons we use a similar formula, but the e_{k,i} are not selected randomly: they are chosen so that only the error components of the corresponding output class j contribute, while the other e-factors are equal to 0, so the total impact of the error is equal to the difference between the desired output value and the actual output. This makes the learning rule of the output neurons similar to the delta rule for the classic perceptron: it increases the weights of inputs that are positively correlated with the desired output and decreases the weights of inputs that are negatively correlated with the desired output.
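A compact sketch of the hidden-neuron update of Eqs. (2)–(3) is shown below (illustrative; the shapes, the ±1 error-projection weights and the learning rate are assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, eps = 100, 4, 0.01               # inputs, error components, learning rate

w = rng.normal(0.0, 0.1, n)            # weights of one hidden neuron
e = rng.choice([-1.0, 1.0], (K, n))    # fixed random error-projection weights

def update(w, x, y, E):
    """x: binary inputs; y: binary output; E: K-component binary error vector."""
    err = E @ e                        # per-synapse projected error, shape (n,)
    F = err * (y - 0.5) * (x - 0.5) * 4.0
    return w + eps * F

x = rng.integers(0, 2, n).astype(float)
y = float(w @ x > 0.0)                 # threshold activation, Eq. (1), b = 0
E = np.array([0.0, 1.0, 0.0, 0.0])     # example: first output below its target
w = update(w, x, y, E)
```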

Fig. 1. The history of the percentage of correct symmetry recognition for different values of the learning rate (0.001 and 0.01) and for a perceptron with fixed weights of the hidden-layer neurons (learning rate 0.001, dotted line). The vertical axis is the percentage of correct answers; the horizontal axis is the number of training steps (in thousands).


We tested our neural net on the symmetry detection problem with different settings. In the primary one we used one hidden layer with 400 neurons and 2 output neurons, corresponding to the symmetric and non-symmetric classes. Each neuron was connected to each neuron of the previous layer. In Fig. 1 we present two examples of the history of the average percentage of correct answers for the symmetry class during training (averaged over the last 1000 steps). The learning process lasted 1000000 steps in this experiment. Since the learning process stabilizes after 500000 steps for learning rate 0.01, we reduced the number of steps to this value for the next experiments. The final average percentage of correct answers was 94.80%. We compared the obtained results with the performance of a classic perceptron whose hidden-layer neurons have fixed random weights. With a similar configuration (400 neurons in one hidden layer, 1000000 training steps, learning rate 0.001), its average performance was 59.38% of correct symmetry recognition, which is only slightly better than a random guess. The history of the percentage of correct answers for the symmetry class for the perceptron with fixed weights in the hidden layer is also shown in Fig. 1 with a dotted line. Next we investigated architectures with more than one hidden layer. We tested configurations with 1, 2, 3 and 5 hidden layers. Table 1 shows the percentage of correct answers for the symmetry class obtained after 500000 steps of training. These percentages were measured during a special test phase with fixed weights lasting 10000 steps. The results were averaged over several independent runs. The obtained accuracy decreases with the increase of the number of hidden layers; however, it was still better than that of a perceptron with fixed weights in the hidden layer.

Table 1. Comparison of performance of configurations with different numbers of hidden layers.

Number of hidden layers | Correct answers for symmetry class (averaged over several runs) | Standard deviation | Number of runs
1 | 94.80% | 1.41% | 5
2 | 82.14% | 2.20% | 3
3 | 74.45% | 2.37% | 3
5 | 62.23% | 7.76% | 3

We also investigated the impact of the number of neurons in the hidden layer. We tested configurations with 200, 400 and 800 neurons in the hidden layer (with only one hidden layer). As can be observed from Table 2, increasing the number of neurons in the hidden layer increases the performance of the neural network.


Table 2. Comparison of performance of configurations with different numbers of neurons in the hidden layer.

Number of neurons in hidden layer | Correct answers for symmetry class (averaged over several runs) | Standard deviation | Number of runs
200 | 88.82% | 5.86% | 3
400 | 94.80% | 1.41% | 5
800 | 98.28% | 0.53% | 3

We also investigated architectures where the network was not fully connected. Instead, each neuron in the hidden layer was randomly connected to a fixed number of neurons in the previous layer; the neurons of the output layer remained connected to all neurons of the previous layer. Table 3 presents the obtained performance for different numbers of connections.

Table 3. Comparison of performance of configurations with different numbers of connections of hidden-layer neurons.

Number of connections per neuron | Correct answers for symmetry class (averaged over several runs) | Standard deviation | Number of runs
25 | 89.48% | 3.69% | 3
50 | 95.92% | 0.77% | 4

The learning procedure of the experiments mentioned above included no weight normalization. We performed a series of separate experiments to investigate the impact of different normalization strategies. We examined normalization by sum (which adjusts weights to preserve the sum of the weights) and normalization by sum of squares (which adjusts weights to preserve the sum of the squares of the weights). The results are presented in Table 4. These experiments were carried out with the basic configuration parameters, i.e., 400 neurons in one hidden layer; the training lasted for 500000 steps. As can be seen, normalization by preserving the sum of weights completely destroys the learning process, showing 50.32% accuracy, which is not better than a random guess, whereas normalization by preserving the sum of squares slightly improved the performance. In the latter setting we also used a lower learning rate, since the normalization procedure leads to the appearance of weights with low absolute values, and a high learning rate can influence them too much.


Table 4. Comparison of performance with different normalization strategies.

Normalization strategy                                           Correct answers for symmetry class    Standard deviation    Number of runs
No normalization                                                 94.80%                                1.41%                 5
Normalization preserving sum of weights                          50.32%                                0.69%                 3
Normalization preserving sum of squares (learning rate 0.001)    98.76%                                0.80%                 3
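One plausible reading of these two strategies is a multiplicative rescaling of a layer's weight matrix after each update, as sketched below; the function names and the choice of reference values are assumptions.

```python
import numpy as np

def renorm_sum(W, s0):
    """Rescale so that sum(W) returns to its reference value s0."""
    return W * (s0 / W.sum())

def renorm_sum_sq(W, q0):
    """Rescale so that sum(W**2) returns to its reference value q0
    (an L2-norm constraint on the layer)."""
    return W * np.sqrt(q0 / np.sum(W ** 2))

# illustrative use after a training step, per layer:
W = np.random.default_rng(0).normal(size=(400, 100))
q0 = np.sum(W ** 2)              # reference taken at initialization
W = renorm_sum_sq(W + 0.01, q0)  # re-impose the constraint after an update
```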

3 Conclusion

In this work we investigated one possible implementation of the Pavlov Principle and applied it to the symmetry detection problem in the domain of 100-dimensional binary vectors. Our implementation uses error signals from the output neurons to adjust the weights of hidden neurons, weighting these signals by fixed random weights. Unlike [5], we use binary neurons with a threshold activation function. Although only a tiny fraction of all possible data samples was used for training, the network acquired the ability to generalize and detected the symmetry of previously unseen samples. This supports the general plausibility of the Pavlov Principle. However, the performance did not improve when extra hidden layers were added. We suggest that modifications of the learning algorithm will make it possible to overcome this problem and further improve the performance while remaining consistent with the Pavlov Principle. Eventually, this approach may lead to more biologically plausible learning algorithms and to their use in creating general artificial intelligence.

Acknowledgements. The work is financially supported by State Program of SRISA RAS No. 0065-2019-0003 (AAA-A19-119011590090-2).

References

1. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
2. Sejnowski, T.J., Kienker, P.K., Hinton, G.E.: Learning symmetry groups with hidden units: beyond the perceptron. Phys. D Nonlinear Phenom. 22(1–3), 260–275 (1986)
3. Lillicrap, T., Cownden, D., Tweed, D.B., Akerman, C.J.: Random feedback weights support learning in deep neural networks. arXiv:1411.0247 (2014)
4. Dunin-Barkowski, W.L., Solovyeva, K.P.: Pavlov principle and brain reverse engineering. In: 2018 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, Saint Louis, Missouri, USA, 30 May–2 June 2018, vol. 37, pp. 1–5 (2018)
5. Nokland, A.: Direct feedback alignment provides learning in deep neural networks. arXiv:1609.01596 (2016)

Bimodal Coalitions and Neural Networks

Leonid Litinskii and Inna Kaganowa

Scientific Research Institute for System Analysis RAS, Nakhimov Ave 36-1, Moscow 108840, Russia
[email protected]

Abstract. We give an account of the Axelrod–Bennett model that describes the formation of a bimodal coalition. We present its initial formalism and applications and reformulate the problem in terms of the Hopfield model. This allows us to analyze a system of two homogeneous groups of agents that interact with each other. We obtain a phase diagram describing the dependence of the bimodal coalition on an external parameter.

Keywords: Bimodal coalition · Hopfield model · Homogeneous groups

1 Introduction

In the early 1990s R. Axelrod and D. Bennett proposed an approach to the formal description of the splitting of a set of interacting agents into two competing groups [1, 2]. Their results have found applications in the social, political, and management sciences. Galam [3] then reformulated this approach in terms of the Ising model; afterwards he extended the initial scheme and proposed a number of new models (see the references in [4]). The subsequent development of this approach contributed to the emergence of econophysics and sociophysics. In this paper, we solve the same problem using the ideas and concepts of the discrete dynamics of the Hopfield model. We analyze analytically an idealized case of two equally interacting homogeneous groups of agents and construct a phase diagram that completely describes how the decomposition of the agents into two groups depends on the intra-group interaction and the cross-interaction between the groups. Following tradition, the decomposition of the agents into two groups will be called a bimodal coalition.

2 Bimodal Coalition Problem

1. The original setting of the problem. We have $n$ agents connected with each other. By $w_i$, $i = 1, \ldots, n$, we denote the weight of the $i$-th agent. The connections between the agents are interpreted in terms of their mutual propensities, which are assumed to be symmetric:

$$p_{ij} \begin{cases} > 0, & \text{if agents } i \text{ and } j \text{ are prone to cooperate}, \\ < 0, & \text{if agents } i \text{ and } j \text{ are prone to conflict}, \end{cases} \qquad p_{ij} = p_{ji}.$$

Two lists $A$ and $\tilde{A}$ define a bimodal coalition $C = (A, \tilde{A})$ or, in other words, a decomposition into two groups. Each list contains the numbers of the agents assigned to the given group:

$$A = \{i_1, i_2, \ldots, i_p\}, \qquad \tilde{A} = I \setminus A, \quad \text{where } I = \{1, 2, \ldots, n\} \text{ is the full list}.$$

Each grouping $C = (A, \tilde{A})$ provides a proximity relation $d_{ij}$ between the agents:

$$d_{ij}(C) = \begin{cases} 1, & \text{if agents } i \text{ and } j \text{ belong to the same list}, \\ 0, & \text{if agents } i \text{ and } j \text{ belong to different lists}. \end{cases}$$

Let us define the productivity of the grouping $C = (A, \tilde{A})$ for the $i$-th agent as

$$U_i(C) = \sum_{j=1}^{n} w_j \, p_{ij} \, d_{ij}(C).$$

The productivity of the grouping $C$ for the $i$-th agent is maximal if all the agents with which it is prone to cooperate belong to its group and the group contains no agents with which it is prone to conflict. The Axelrod–Bennett model states that a system of agents tends to the grouping for which the weighted sum of the productivities is maximal:

$$U(C) = \sum_{i=1}^{n} w_i \, U_i(C) \to \max. \qquad (1)$$
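For illustration, the sketch below evaluates $U(C)$ for a toy system and finds the best split by brute force; excluding the diagonal term $p_{ii}$, and all names, are my assumptions.

```python
import numpy as np

def coalition_score(p, w, in_A):
    """U(C) from Eq. (1): weighted sum of agent productivities for the split
    defined by the boolean mask in_A (True -> agent belongs to list A)."""
    d = (in_A[:, None] == in_A[None, :]).astype(float)   # proximity d_ij(C)
    np.fill_diagonal(d, 0.0)                             # assumed: no self-term
    U_i = (w[None, :] * p * d).sum(axis=1)               # productivity of agent i
    return (w * U_i).sum()

# toy example: agents {0, 1} cooperate, {2, 3} cooperate, the pairs conflict
w = np.ones(4)
p = np.array([[ 0,  1, -1, -1],
              [ 1,  0, -1, -1],
              [-1, -1,  0,  1],
              [-1, -1,  1,  0]], dtype=float)
best = max(np.ndindex(2, 2, 2, 2),
           key=lambda m: coalition_score(p, w, np.array(m, bool)))
print(best)   # -> (0, 0, 1, 1): the two cooperating pairs separate
```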

2. Applications. The described approach was applied to the analysis of the composition of the belligerent coalitions during World War II. The agents were 17 European countries. The weight of each country was defined by an integral index calculated as a combination of various demographic, industrial, and military characteristics. The mutual propensities were calculated using data for 1936 and criteria that included ethnic conflicts, religion, frontier incidents, political regime, and so on. The maximization of the sum $U(C)$ yielded two maxima. The global maximum corresponded to the following decomposition:

A = {Britain, France, USSR, Czechoslovakia, Yugoslavia, Greece, and Denmark};
Ã = {Germany, Italy, Poland, Romania, Hungary, Portugal, Finland, Latvia, Lithuania, and Estonia}.

We see that only Poland found itself in the improper camp (as did Portugal, which remained neutral during the war). Let us clarify that the bloc to which a given country was assigned was determined by taking into account who occupied the country or who declared war on it.


In another paper the same authors used this method to describe alliances of producers of UNIX operating system standards. Nine companies involved in UNIX production were regarded as agents: AT&T, Sun, Apollo, DEC, HP, Intergraph, SGI, IBM, and Prime. In the course of cumbersome calculations of the connections $p_{ij}$, some parameters of the problem played the role of weight coefficients. By varying the parameters within reasonable limits, the authors discovered only a weak dependence of the result on their values. They found two decompositions that provided the same global maximum of the functional (1):

• {Sun, DEC, HP} and {AT&T, Apollo, Intergraph, SGI, IBM, Prime};
• {Sun, AT&T, IBM, Prime} and {DEC, HP, Apollo, Intergraph, SGI}.

The second grouping corresponded to the existing associations of the companies in UNIX International and the Open Software Foundation; only IBM was identified incorrectly.

3. Ising model. In the second half of the 1990s Serge Galam recognized that it was convenient to formulate the Axelrod–Bennett model in terms of the Ising model. Let us introduce a matrix

$$J = (J_{ij}), \qquad J_{ij} = p_{ij} \, w_i w_j (1 - \delta_{ij}), \quad i, j = 1, \ldots, n,$$

where $\delta_{ij}$ is the Kronecker delta, so the diagonal elements of $J$ are equal to zero. To each bimodal coalition $C$ we assign a configuration vector $s = (s_1, s_2, \ldots, s_n)$:

$$C = (A, \tilde{A}) \;\Longleftrightarrow\; s = (s_1, s_2, \ldots, s_n), \qquad \begin{cases} s_i = +1, & i \in A, \\ s_i = -1, & i \in \tilde{A}. \end{cases}$$

Then the maximization of the sum (1) is equivalent to finding the state $s$ corresponding to the global minimum of the energy $E(s)$:

$$E(s) = -(Js, s) = -\sum_{i,j=1}^{n} J_{ij} s_i s_j \to \min. \qquad (2)$$

Problem (2) is the well-known minimization problem for a quadratic form of binary variables, which arises in various scientific fields.

4. Hopfield model. Let us analyze the described system in terms of a neural network of the Hopfield type. As the context may require, in what follows we refer to the binary variables $s_i = \pm 1$ as binary agents or spins. The state of the system is described by a configuration vector $s = (s_1, s_2, \ldots, s_n)$. Let us introduce the dynamic procedure on which the Hopfield model is based. Let $s(t)$ be the state of the system at time $t$. At this moment a local field $h_i(t) = \sum_{j=1}^{n} J_{ij} s_j(t)$ acts on the $i$-th spin. At the next moment $t + 1$ the state of the spin changes if its sign does not coincide with the sign of the field $h_i(t)$, and remains unchanged otherwise:

$$s_i(t+1) = \begin{cases} s_i(t), & s_i(t)\,h_i(t) \ge 0, \\ -s_i(t), & s_i(t)\,h_i(t) < 0, \end{cases} \;\Longleftrightarrow\; s_i(t+1) = \operatorname{sign}\!\big(h_i(t)\big). \qquad (3)$$

In what follows, an unsatisfied spin is a spin whose sign does not coincide with the sign of the field acting on it. If the state of the $i$-th spin changes, its contribution to the local fields acting on the other spins also changes; as a result, the states of some other spins can change as well, and so on. The evolution of the system consists of subsequent flips of unsatisfied spins. Each step of the evolution is accompanied by a decrease of the energy of the state, and sooner or later the system reaches a state corresponding to an energy minimum (possibly a local one). At that moment the evolution stops, since all the spins are satisfied. However, according to the setting of the problem, we have to find the global minimum; for this purpose improved minimization procedures can be used [5, 6]. The formulation of problem (2) in terms of neural networks allows us to illustrate the problem of bimodal coalition formation. Concluding this section, let us note that all the energies are two-fold degenerate: $E(s) = E(-s)$. To remove this degeneracy an external field is needed.
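A minimal sketch of this asynchronous dynamics, using an illustrative two-group connection matrix of the kind analyzed in the next section (parameter values and names are assumptions):

```python
import numpy as np

def hopfield_descent(J, s, rng):
    """Asynchronous dynamics (3): repeatedly flip unsatisfied spins.
    Each flip lowers E(s) = -(Js, s), so it stops at a (possibly local) minimum."""
    while True:
        h = J @ s                          # local fields
        unsat = np.flatnonzero(s * h < 0)  # spins misaligned with their field
        if unsat.size == 0:
            return s                       # all spins satisfied: a minimum
        i = rng.choice(unsat)              # flip one unsatisfied spin
        s[i] = -s[i]

# two homogeneous groups, p = q = 5, a = 0.5, b = -0.3 (illustrative values)
p, q, a, b = 5, 5, 0.5, -0.3
J = np.block([[a * np.ones((p, p)), b * np.ones((p, q))],
              [b * np.ones((q, p)), np.ones((q, q))]])
np.fill_diagonal(J, 0.0)
rng = np.random.default_rng(1)
s0 = rng.choice([-1.0, 1.0], size=p + q)
print(hopfield_descent(J, s0.copy(), rng))   # conflicting groups end up opposed
```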

3 Homogeneous Groups of Agents

1. One homogeneous group. A homogeneous group is a group in which all the agents interact identically. In this case the interaction matrix has the form

$$J = \begin{pmatrix} 0 & a & \cdots & a \\ a & 0 & \cdots & a \\ \vdots & \vdots & \ddots & \vdots \\ a & a & \cdots & 0 \end{pmatrix}, \qquad a > 0. \qquad (4)$$

The network with such a connection matrix has a single global energy minimum $s_0 = (1, 1, \ldots, 1)$, and there are no other minima. (We do not count the second minimum that appears due to the equality $E(s) = E(-s)$.) In other words, the states of all the agents are the same; it can be said that all the agents behave "as one person." Turning to Eq. (2), we see that for a system with the connection matrix (4) it is not a bimodal coalition but a consolidation of all the agents into one group that is profitable.

2. Two homogeneous groups. Let us examine a spin system consisting of two homogeneous groups. We suppose that the first group contains $p$ agents and the interactions between these agents are identical and equal to $A$. The interactions between the remaining $q$ agents (constituting the second group) are also identical and equal to $C$, and all interactions between agents from the first and second groups are equal to $B$. We assume that $C$ is positive and larger than $A$ and $B$, and factor out $C$. The connection matrix then takes the form


$$J = \begin{pmatrix} 0 & a & \cdots & a & b & b & \cdots & b \\ a & 0 & \cdots & a & b & b & \cdots & b \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\ a & a & \cdots & 0 & b & b & \cdots & b \\ b & b & \cdots & b & 0 & 1 & \cdots & 1 \\ b & b & \cdots & b & 1 & 0 & \cdots & 1 \\ \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\ b & b & \cdots & b & 1 & 1 & \cdots & 0 \end{pmatrix}, \qquad a = A/C, \quad b = B/C,$$

where the first $p$ rows and columns correspond to the first group and the last $q$ to the second. For a neural network with such a connection matrix we describe the dependence of the set of minima on the parameters $a$, $b$, $p$, and $q$, where $p + q = n$. One can show that if a configuration corresponds to a minimum of the energy (2), its last $q$ coordinates have to be identical:

$$s = (s_1, s_2, \ldots, s_p, 1, 1, \ldots, 1). \qquad (5)$$

Consequently, there are at most $2^p$ configurations that can be minima of the functional (2), and the last $q$ coordinates of all of them are identical. Let us divide this set of $2^p$ configurations into classes $R_k$ such that a configuration belongs to $R_k$ if exactly $k$ of its first $p$ coordinates are equal to $-1$. Since $k$ takes the values $k = 0, 1, 2, \ldots, p$, there are $p + 1$ such classes. These classes can be written in explicit form, indicating how many configurations belong to each class (see Table 1).

Table 1. Classes $R_k$ and the numbers of configurations in these classes


It turns out that for given values of the parameters, the minimum of the functional (2) is provided simultaneously by all the configurations of a single class $R_k$. All configurations from $R_k$ have the same energy (2); in other words, they are all minima, and if the inequality $0 < k < p$ holds there are no other local minima of the functional (2). Figure 1 shows the partition of the $(a, b)$-plane into regions where one or another class of configurations $R_k$ provides a minimum of the functional (2). Below we interpret this diagram in terms of bimodal coalitions.

Fig. 1. The phase diagram for the problem (2).
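The phase diagram can be checked numerically. Restricting attention to states of the form (5), the energy of any configuration in class $R_k$ depends only on $k$; the closed-form expression in the sketch below is my own derivation from the block matrix above, and the sampled parameter values are illustrative.

```python
import numpy as np

def minimizing_classes(a, b, p, q, tol=1e-12):
    """Energy of any state in class R_k, for states of the form (5):
    E(k) = -( a*(S1**2 - p) + 2*b*S1*q + (q**2 - q) ),  S1 = p - 2k.
    Returns the list of k whose class attains the global minimum."""
    E = np.array([-(a * ((p - 2 * k) ** 2 - p)
                    + 2 * b * (p - 2 * k) * q
                    + (q ** 2 - q)) for k in range(p + 1)])
    return list(np.flatnonzero(E <= E.min() + tol))

# coarse scan of the (a, b)-plane, p = q = 5
for a in (0.5, -1.0):
    for b in (0.5, 0.0, -0.5):
        print(f"a={a:+.1f}, b={b:+.1f}: R_k for k = {minimizing_classes(a, b, 5, 5)}")
```

For $a > 0$ this reproduces the picture described below: $R_0$ for $b > 0$, $R_p$ for $b < 0$, and both degenerate at $b = 0$.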

3. Sensible interpretation. From Eq. (5) for the coordinates of the global minimum it follows that the second group always acts "as one person": the last $q$ spins are equal to $+1$. First, let us examine the case when the agents of the first homogeneous group are prone to cooperate with each other ($a > 0$). Then they also act "as one person" (see the upper half-plane of the diagram). If the two groups are prone to cooperate with each other, that is, when $b > 0$, all the agents of the first group are in the same state as the agents of the second group, i.e., the first $p$ coordinates of the vector $R_0$ are equal to $+1$. However, if the groups conflict with each other, that is, $b < 0$, all the agents of the first group are in the state opposite to that of the agents of the second group.

Let us summarize. When all the agents inside each group are prone to cooperate ($a > 0$), the sign of the cross-interaction defines the state of the whole system. If $b > 0$, so that the groups are prone to cooperate with each other, it is more profitable for them to be together; in this case the vector $R_0$ provides the global minimum. If $b < 0$ and the groups conflict, it is more profitable for them to be separate: the global minimum then corresponds to $R_p$.

Inside the symmetric strip along the ordinate axis, both configurations $R_0$ and $R_p$ are minima simultaneously. This strip is the unique region of the plane where the functional (2) has both a global and a local minimum simultaneously. To the right of the ordinate axis, where $b > 0$, the vectors $R_0$ and $R_p$ provide the global and the local minimum, respectively; to the left of it, $R_p$ corresponds to the global minimum and $R_0$ to the local one.

It is easy to explain why such quasi-instability takes place. Suppose the cross-interaction between the groups is equal to zero, $b = 0$; in other words, the two groups of agents are completely independent. Then problem (2) has two equivalent solutions, $R_0$ and $R_p$, corresponding to the same energy. When $|b|$ increases slightly, the second configuration at first continues to be a minimum, but now a local one; when $|b|$ becomes sufficiently large, this additional local minimum disappears. The narrow strip along the ordinate axis is thus the result of removing the accidental degeneracy of the global minimum at $b = 0$. It is interesting to ask whether local minima always appear for this reason, or whether there are other mechanisms for their appearance.

Finally, let us briefly discuss the situation when the agents inside the first group conflict with each other ($a < 0$). The lower half of the phase diagram shows that in this case the first group of agents splits into two opposing subgroups. This conclusion is quite reasonable; other interpretations here are more speculative.

4 Conclusions

We have shown that in a system with a great number of interacting binary agents, the known problem of the formation of two competing groups, i.e., the bimodal coalition problem, can be formulated in terms of neural networks of the Hopfield type. The neural network dynamics is convenient for describing the influence of the agents on one another. We analyzed theoretically an idealized case of interaction between two homogeneous groups of agents, and the obtained results allowed us to give a sensible interpretation of the bimodal coalition problem. We determined one mechanism of formation of local minima of the energy functional; it would be interesting to find out whether there are other possibilities for their appearance. We believe that our analysis is promising and deserves further examination.

Acknowledgement. The work was financially supported by State Program of SRISA RAS No. 0065-2019-0003 (AAA-A19-119011590090-2). We are grateful to Ben Rozonoer for his help in preparation of this paper.

References 1. Axelrod, R.M., Bennett, D.S.: A landscape theory of aggregation. Brit. J. Polit. Sci. 23(2), 211–233 (1993) 2. Axelrod, R.M., Mitchell, W., Thomas, R.E., Bennett, D.S., Bruderer, E.: Coalition formation in standard-setting alliances. Manag. Sci. 41(9), 1493–1508 (1995)


3. Galam, S.: Fragmentation versus stability in bimodal coalitions. Phys. A 230(1–2), 174–188 (1996)
4. Galam, S.: Sociophysics. Springer, New York (2012)
5. Houdayer, J., Martin, O.C.: Renormalization for discrete optimization. Phys. Rev. Lett. 83, 1030–1033 (1999)
6. Karandashev, I.M., Kryzhanovsky, B.V.: Matrix transformation method in quadratic binary optimization. Opt. Mem. Neural Netw. (Inf. Opt.) 24(2), 67–81 (2015)

Building Neural Network Synapses Based on Binary Memristors

Mikhail S. Tarkov

Rzhanov Institute of Semiconductor Physics SB RAS, Novosibirsk, Russia
[email protected]

Abstract. The design of an analog multilevel memory cell based on resistors and binary memristors is proposed. This design provides a greater number of resistance levels with fewer elements than well-known multilevel memory devices. The cell is intended for setting synapse weights in hardware-implemented neural networks: the weight vector of a neuron can be represented by a crossbar of binary memristors and a set of resistors. An algorithm is proposed for mapping neuron weights to the proposed multilevel memory cell. The approach is illustrated by the construction of a neuron that partitions a set of vectors into two classes.

Keywords: Neural networks · Binary memristors · Multilevel memory cell · Crossbar · LTSPICE

1 Introduction

Hardware implementation of a neural network requires a lot of memory to store the weight matrices of the neuron layers, and such memory is expensive. The problem is simplified by using a device called a memristor (a resistor with memory) as a memory cell. The memristor was predicted theoretically in 1971 by Leon Chua [1]; its first physical realization was demonstrated in 2008 by the Hewlett-Packard laboratory as a thin-film TiO2 structure [2]. The memristor behaves like a synapse: it "remembers" the total electric charge that has passed through it. Memory based on memristors can reach an integration density of 100 Gbit/cm², several times higher than that of flash memory technology. These unique properties make the memristor a promising device for creating massively parallel neuromorphic systems.

Binary memristors realize two conductivity values. Multilevel memristors realize a set of discrete conductivity levels (the number of levels can reach tens or hundreds). Binary and multilevel memristors [3–8] are based on the filament switching mechanism and are more widespread than analog memristors, whose conductivities can be changed continuously; materials for analog memristors are encountered much less often and require a more complex fabrication process. Multilevel memristors are also more robust to statistical fluctuations than analog ones. The use of binary memristors to set the weight coefficients of neural networks makes it important to create multilevel memory cells based on them.


2 Adjustable Multilevel Memory Cell Based on Binary Memristors

The memory cell consists of parallel-connected circuits, each containing a binary memristor and a resistor connected in series (Fig. 1). The cell output is a current whose value is proportional to the product of the cell's input voltage and its conductivity. Each of the parallel-connected circuits implements one binary digit of the cell. The high resistance $M$ of the memristor corresponds to the value zero of the corresponding binary digit of the cell weight, and the low resistance $m \ll M$ corresponds to the value one. The resistance of the resistor is determined by the position of the binary digit: the lowest digit corresponds to the maximum resistance $R$, and the resistance then decreases according to the law $R/2^i$, $i = 0, 1, \ldots, n-1$, where $n$ is the number of binary digits. In contrast to [6, 7], we do not require quantizing the input signal. The resistance $R$ must satisfy the constraints

$$m \ll R/2^n, \qquad R \ll M/n.$$

For $n$ binary digits, the cell resistance takes $2^n$ values, from $R \big/ \sum_{i=0}^{n-1} 2^i$ (we neglect the value $m$) up to approximately $M/n \gg R$. For example, we get 32 values using 10 elements: 5 memristors and 5 resistors with resistances $R_i = R/2^i$, $i = 0, 1, \ldots, 4$ (Fig. 1). For comparison, in the cell proposed in [9], 27 resistance values were obtained using 15 elements (3 memristors and 12 resistors).

Fig. 1. Example of multilevel memory cell
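To illustrate the level structure, the sketch below computes the cell resistance for every digit pattern of a 5-digit cell, modeling each branch as a binary memristor in series with its resistor and the branches in parallel, as described above. The element values are those used later in the neuron example; the code itself is mine, not from the paper.

```python
import numpy as np
from itertools import product

def cell_resistance(bits, R=20e3, m=100.0, M=1e6):
    """Resistance of the cell for a given digit pattern.
    Branch i: memristor (m if bit on, M if off) in series with R / 2**i;
    branches are connected in parallel."""
    g = 0.0
    for i, bit in enumerate(bits):
        g += 1.0 / ((m if bit else M) + R / 2 ** i)
    return 1.0 / g

# all 2**5 = 32 levels of a 5-digit cell
levels = sorted(cell_resistance(b) for b in product([0, 1], repeat=5))
print(f"{len(set(np.round(levels, 1)))} distinct levels, "
      f"from {levels[0]:.0f} ohm to {levels[-1]:.0f} ohm")
```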

3 Specifying an Array of Weights

To compute the neuron activation, an array of weights must be stored. As the number of memory cells in such an array increases, only the number of binary memristors grows; the number of resistors remains the same, since they are common to all cells (Fig. 2). One can say that these resistors form a basis for the cell resistances, and the binary memristors realize the decomposition coefficients of each cell resistance in this basis. The set of binary memristors forms a crossbar in which the number of rows equals the number of decomposition digits $n$ and the number of columns equals the number of neuron inputs. In the general case, realizing the weight vector requires two crossbars: the first for the positive weights and the second for the negative weights.

Fig. 2. Construction of a neuron weight array

Fig. 3. Circuit for setting the memristor resistance

Figure 2 does not show the circuits that set the resistances of the crossbar memristors; the corresponding scheme is presented in Fig. 3. It allows setting the resistance of an arbitrary crossbar memristor to the minimum value $m$ or the maximum value $M$, depending on the sign of a voltage fed to the input In that significantly exceeds the binary memristor's switching threshold. While the memristor resistance is being set, the transistor T is opened by the voltage source V; in the crossbar's operating mode this transistor is closed.


4 The Weight Array Binary Digits Calculation

Suppose that the neural network is trained, i.e., the network weights have been calculated. To implement the neuron weights based on the multilevel memory cell, we propose the following algorithm (a code sketch follows the list).

1. Among the neuron weight coefficients $w_1, w_2, \ldots, w_L$ ($L$ is the number of weights), choose a coefficient $w_{\min} \ne 0$ such that $|w_{\min}| \le |w_i|$ for all $i = 1, \ldots, L$. Put the coefficient $w_{\min}$ in correspondence with the resistor of minimum conductivity $R^{-1}$, where $R \gg m$ and $R \ll M$.
2. Normalize the weights: $w_i \leftarrow w_i / |w_{\min}|$, $i = 1, \ldots, L$.
3. Set the number of binary digits $n = 1$.
4. For the normalized weights $w_i$, $i = 1, 2, \ldots, L$, select a set of binary coefficients $k_{ji} \in \{0, 1\}$ providing a minimum of the sum

$$S_n = \sum_{i=1}^{L} \Big( |w_i| - \sum_{j=0}^{n-1} k_{ji} \cdot 2^j \Big)^2.$$

5. If $S_n > \varepsilon$, where $\varepsilon$ is the permissible error, increase the number of digits ($n \leftarrow n + 1$) and go to step 4.
6. End.
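Below is a minimal sketch of this algorithm. It assumes that the per-weight minimization in step 4 reduces to rounding each normalized magnitude to the nearest value representable with $n$ digits; the function name, the error threshold, and the digit cap are illustrative.

```python
import numpy as np

def quantize_weights(w, eps=0.25, n_max=16):
    """Steps 1-6: map trained weights to binary digit patterns k[j, i]."""
    w = np.asarray(w, dtype=float)
    w_min = np.min(np.abs(w[w != 0]))            # step 1: smallest nonzero |w_i|
    w_norm = np.abs(w) / w_min                   # step 2: normalize magnitudes
    for n in range(1, n_max + 1):                # steps 3 and 5: grow digit count
        # step 4: best n-digit approximation of each |w_i| is plain rounding,
        # clipped to the representable range [0, 2**n - 1]
        levels = np.clip(np.rint(w_norm), 0, 2 ** n - 1)
        S_n = np.sum((w_norm - levels) ** 2)
        if S_n <= eps:
            break
    # binary expansion: k[j, i] is digit j (weight 2**j) of weight i
    k = (levels.astype(int)[None, :] >> np.arange(n)[:, None]) & 1
    return k, np.sign(w), n

k, signs, n = quantize_weights([3.0, 1.0, -1.0])
print(n, k, signs)   # n = 2; k columns: 3 = 2+1, 1, 1
```

The sign array routes each weight to the positive or negative crossbar, as described in Sect. 3.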

5 A Neuron Example with Synapses Based on Binary Memristors

Consider an example of developing a neuron with a crossbar on binary memristors. Let the vectors $x_1 = (1, 1, 1)$, $x_2 = (1, 1, 0)$, $x_3 = (1, 0, 0)$ belong to the first class (the required neuron output is $d_i = 1$, $i = 1, 2, 3$), and the vectors $x_4 = (0, 0, 0)$, $x_5 = (0, 0, 1)$, $x_6 = (0, 1, 1)$ belong to the second class ($d_i = -1$, $i = 4, 5, 6$). Then, according to the mass center method, the neuron weight vector equals

$$w = \sum_{i=1}^{3} x_i - \sum_{i=4}^{6} x_i = (3, 1, -1). \qquad (1)$$

We assume that the neuron activation function is

$$f(a) = \begin{cases} 1, & a > 0, \\ -1, & a \le 0, \end{cases} \qquad a = (w, x), \qquad (2)$$

where $x$ is the neuron input vector. The activation function (2) can be implemented with an operational amplifier operating in comparator mode.
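A quick numeric check of Eqs. (1) and (2) on the six vectors (a sketch; the array names are mine):

```python
import numpy as np

X = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0],
              [0, 0, 0], [0, 0, 1], [0, 1, 1]], dtype=float)
d = np.array([1, 1, 1, -1, -1, -1])              # required outputs

w = X[:3].sum(axis=0) - X[3:].sum(axis=0)        # mass center method, Eq. (1)
f = np.where(X @ w > 0, 1, -1)                   # threshold activation, Eq. (2)
print(w, (f == d).all())                         # -> [ 3.  1. -1.] True
```

Note that $x_4$ and $x_6$ yield $a = 0$ exactly, which is why the hardware comparator needs the small negative bias mentioned below.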


The weight $w_1 = 3$ (see (1)) can be represented as the sum $w_1 = 2 + 1$. This means that it can be realized by the conductivity $g_1 = 2/R + 1/R$ of two parallel resistors with resistances $R/2$ and $R$, respectively. The weight $w_2 = 1$ is set by the conductivity $g_2 = 1/R$, and the weight $w_3 = -1$ by the conductivity $g_3 = 1/R$ of a resistor connected to the negative input of the operational amplifier. Assuming $R = 20$ kΩ, $m = 100$ Ω, and $M = 1000$ kΩ, we get the neuron circuit shown in Fig. 4.

Fig. 4. A neuron example based on binary memristors

For clarity, only memristors in the "on" state are shown, i.e., memristors with the minimal resistance $m = 100$ Ω. Table 1 shows the results of the experiment in the LTSPICE modeling system [10]. The output voltage 3.2 V means that the vectors $x_1, x_2, x_3$ belong to the first class, and −3.2 V means that the vectors $x_4, x_5, x_6$ belong to the second class (supply voltage $V = 5$ V).

Table 1. Results of the experiment in LTSPICE.

Input               x1      x2      x3      x4      x5      x6
Output, in volts    3.2     3.2     3.2     −3.2    −3.2    −3.2

In order for the operational amplifier to implement the activation function (2) at the input $x_4 = (0, 0, 0)$, a small negative bias provided by the source V2 is added to the circuit. The input value 0 corresponds to zero voltage, and the input value 1 corresponds to a voltage of 0.3 V, which does not change the memristor resistance.


6 Conclusion

An analog multilevel memory cell based on resistors and binary memristors has been proposed. The design provides a greater number of resistance levels with fewer elements than the previously proposed cell. The cell is intended for setting neuron synapse weights in hardware-implemented neural networks: the neuron weights can be represented by a crossbar of binary memristors and a set of resistors, and the number of resistors does not depend on the number of weights. An algorithm has been proposed for mapping neuron weights to the multilevel memory cells with binary memristors. The approach is illustrated by the construction of a neuron that partitions a set of vector patterns into two classes; the example is implemented in the LTSPICE simulation environment.

References

1. Chua, L.: Memristor – the missing circuit element. IEEE Trans. Circ. Theor. 18, 507–519 (1971)
2. Strukov, D.B., Snider, G.S., Stewart, D.R., Williams, R.S.: The missing memristor found. Nature 453, 80–83 (2008)
3. He, W., Sun, H., Zhou, Y., Lu, K., Xue, K., Miao, X.: Customized binary and multi-level HfO2−x-based memristors tuned by oxidation conditions. Sci. Rep. 7, 10070 (2017)
4. Yu, S., Gao, B., Fang, Z., Yu, H., Kang, J., Wong, H.-S.P.: A low energy oxide-based electronic synaptic device for neuromorphic visual systems with tolerance to device variation. Adv. Mater. 25, 1774–1779 (2013)
5. Tarkov, M.S.: Crossbar-based Hamming associative memory with binary memristors. In: Huang, T., Lv, J., Sun, C., Tuzikov, A. (eds.) Advances in Neural Networks – ISNN 2018. Lecture Notes in Computer Science, vol. 10878. Springer, Cham (2018). https://link.springer.com/chapter/10.1007/978-3-319-92537-0_44. Accessed 25 Apr 2019
6. Truong, S.N., Ham, S.-J., Min, K.-S.: Neuromorphic crossbar circuit with nanoscale filamentary-switching binary memristors for speech recognition. Nanoscale Res. Lett. 9(629), 1–9 (2014)
7. Nguyen, T.V., Vo, M.-H.: New binary memristor crossbar architecture based neural networks for speech recognition. Int. J. Eng. Sci. Invent. 5(5), 1–7 (2016)
8. Yakopcic, C., Taha, T.M., Subramanyam, G., Pino, R.E.: Memristor SPICE model and crossbar simulation based on devices with nanosecond switching time. In: Proceedings of International Joint Conference on Neural Networks, Dallas, Texas, USA, 4–9 August 2013, pp. 158–160. IEEE (2013). https://ieeexplore.ieee.org/abstract/document/6706773. Accessed 25 Apr 2019
9. Irmanova, A., James, A.P.: Neuron inspired data encoding memristive multi-level memory cell. Analog Integr. Circ. Sign. Process. 95, 429–434 (2018)
10. LTspice XVII. http://www.linear.com/designtools/software/#LTspice

Author Index

A: Aleksey, Staroverov, 62; Alexandrov, Yu. I., 138; Alexandrov, Yuri I., 159; Andreev, Ark, 71; Andreeva, Olga V., 303; Arutyunova, K. R., 138
B: Bakhshiev, A. V., 221; Bakhshiev, Aleksandr, 214; Beskhlebnova, Galina A., 124; Bogatyreva, Anastasia A., 263; Brynza, A. A., 367; Bulava, Alexandra I., 159; Burikov, Sergey, 285, 319
C: Chizhov, Anton V., 165; Chumachenko, Sergey I., 295
D: Dakhtin, Ivan S., 116; Demareva, Valeriia A., 89; Demidovskij, Alexander V., 375; Demin, Vyacheslav, 255; Dick, Olga E., 172; Dolenko, Sergey, 285, 319; Dolenko, Tatiana, 285, 319; Dolzhenko, Alexandr V., 271; Dunin-Barkowski, Witali L., 405
E: Edeleva, Yu. A., 89; Efitorov, Alexander, 285; Egorchev, Mikhail, 25; Engel, Ekaterina A., 45; Engel, Nikita E., 45; Eroshenkova, Daria A., 295
F: Farzetdinova, Rimma, 95; Fedorenko, Yuriy S., 207; Filatov, Nikolay, 214; Fomin, I. S., 221; Fomin, Ivan, 214
G: Gai, Vasiliy E., 303; Gapanyuk, Yuriy, 71, 78; Glyzin, Sergey D., 181; Gorban, Alexander N., 384; Gordleeva, Susan Yu., 384; Gurtovoy, Konstantin, 151; Guseva, Alena, 392
I: Igonin, Dmitry M., 309; Isaev, Igor, 319; Ivanchenko, Mikhail V., 384
K: Kaganowa, Inna, 412; Kapustina, Ekaterina O., 271; Karandashev, I. M., 230, 359; Kartashov, Sergey I., 144; Kashcheev, Mikhail, 53; Kazantsev, Victor B., 190; Khayrov, E. M., 230; Kholodny, Yuri I., 144; Khusnetdinov, Dmitry R., 295; Kiselev, Mikhail, 398; Kitov, Victor, 342; Kniaz, Vladimir V., 3; Knyazeva, Irina, 239; Kopeliovich, Mikhail, 53; Korlyakova, M. O., 367; Kotov, Vladimir B., 326; Kovalchuk, Mikhail V., 144; Kozlov, Dmitry S., 335; Kozubenko, Evgeny, 53; Krivonosov, Mikhail I., 384
L: Laptinskiy, Kirill, 285, 319; Lebedev, Alexander E., 405; Litinskii, Leonid, 412; Lotareva, Yulia A., 384
M: Makarenko, Nikolay, 239; Malakhov, Denis G., 144; Malsagov, M. Yu., 230; Malykhina, Galina, 392; Matveev, Mikhail, 151; Meilikov, Evgeny, 95; Mizginov, Vladimir A., 3; Moshkantsev, Peter V., 3; Moskalenko, Viktor, 246; Muratov, Y. R., 106
N: Nekhaev, Dmitry, 255; Nikiforov, M. B., 106
O: Ohinko, Timur, 239; Orekhov, Alexey, 71; Orlov, Vyacheslav A., 144
P: Palagushkin, Alexandr N., 326; Pankratova, Evgeniya V., 190; Panov, Aleksandr I., 62; Pashkov, Anton A., 116; Petrushan, Mikhail, 53; Polyakov, Igor V., 303; Preobrazhenskaia, Margarita M., 181
R: Red’ko, Vladimir G., 124, 131; Revunkov, Georgiy, 78; Rozhnova, Maiya A., 190; Rybintsev, Andrey, 239
S: Schekalev, Alexey, 342; Shaposhnikov, Dmitry, 53; Skachkov, A. M., 106; Smirnitskaya, Irina A., 197; Smirnova, Elena Y., 165; Sokhova, Zarema B., 131; Sokolov, Mikhail, 151; Solovyeva, Kseniya P., 405; Sozinova, I. M., 138; Stikharnyi, Aleksandr, 71
T: Taran, Maria, 78; Tarasov, A. S., 106; Tarkhov, Dmitriy A., 351; Tarkov, Mikhail S., 420; Telyatnikov, L. S., 359; Terekhov, Serge A., 17; Terekhov, Valeri I., 295; Tiumentsev, Yury, 25; Tiumentsev, Yury V., 309, 335; Trofimov, Alexander G., 263
U: Ushakov, Vadim L., 144
V: Vasilyev, Alexander N., 351; Vlasenko, Vladislav, 214; Volkov, Sergey V., 159; Vvedensky, Victor, 151
Y: Yakimova, Elena G., 165; Yudin, Dmitry A., 271; Yudkin, Fedor A., 326
Z: Zaikin, Alexey A., 384; Zolotykh, Nikolai, 246