Hot Topics in Linear Algebra
LCCN 2020015306 (print), 2020015307 (ebook); ISBN 9781536177701 (hardcover), 9781536177718 (PDF)

Table of contents:
HOT TOPICS IN LINEAR ALGEBRA
CONTENTS
PREFACE
Chapter 1. COMPUTING GENERALIZED INVERSES USING GRADIENT-BASED DYNAMICAL SYSTEMS
Abstract
1. INTRODUCTION
2. GNN DYNAMICS FOR SOLVING MATRIX EQUATIONS
2.1. GNN for Solving the Matrix Equation AXB = D
3. GNN MODELS FOR COMPUTING GENERALIZED INVERSES
3.1. GNN for Regular Inverse
3.2. GNN for Computing the Moore-Penrose Inverse
3.3. GNN for Computing the Weighted Moore-Penrose Inverse
3.4. GNN for Computing Outer Inverses
4. RNN MODELS ARISING FROM GNN MODELS
5. FURTHER RESULTS ON CONVERGENCE PROPERTIES
6. SYMBOLIC IMPLEMENTATION OF THE GNN DYNAMICS
7. EXAMPLES
7.1. Examples in Matlab
7.2. Examples in Mathematica
CONCLUSION
ACKNOWLEDGMENTS
REFERENCES
Chapter 2. CRAMER'S RULES FOR SYLVESTER-TYPE MATRIX EQUATIONS
Abstract
1. INTRODUCTION
2. PRELIMINARIES
2.1. Elements of the Theory of Row-Column Determinants
2.2. Determinantal Representations of the Moore-Penrose Inverses with Applications to the Two-Sided Matrix Equation
3. DETERMINANTAL REPRESENTATIONS OF THE GENERAL SOLUTION TO THE QUATERNION SYLVESTER MATRIX EQUATION (1.1)
4. DETERMINANTAL REPRESENTATIONS OF THE GENERAL AND (SKEW-)HERMITIAN SOLUTIONS TO (1.2)
5. DETERMINANTAL REPRESENTATIONS OF η-HERMITIAN AND η-SKEW-HERMITIAN SOLUTIONS TO THE SYLVESTER-TYPE QUATERNION MATRIX EQUATION
6. AN EXAMPLE
REFERENCES
Chapter 3. BICR ALGORITHM FOR COMPUTING GENERALIZED BISYMMETRIC SOLUTIONS OF GENERAL COUPLED MATRIX EQUATIONS
Abstract
1. INTRODUCTION
2. MAIN RESULTS
HS version of the BiCR algorithm
3. NUMERICAL EXPERIMENTS
CONCLUSION
REFERENCES
Chapter 4. SYSTEM OF MIXED GENERALIZED SYLVESTER-TYPE QUATERNION MATRIX EQUATIONS
Abstract
1. INTRODUCTION
2. PRELIMINARIES
3. MAIN RESULT
4. SOME SIMPLIFICATIONS OF THE RESULT FORMULAS
ALGORITHM WITH A NUMERICAL EXAMPLE
CONCLUSION
REFERENCES
Chapter 5. HESSENBERG MATRICES: PROPERTIES AND SOME APPLICATIONS
Abstract
1. INTRODUCTION
2. PRELIMINARY DEFINITIONS AND RESULTS
2.1. Hessenberg Matrices
2.2. Permanents and Determinants
2.3. Ordered Partitions of a Positive Integer
2.4. Sets (n)
2.5. Definition of Triangular Tables and Their Functions
2.6. On Connection of Parafunctions of Triangular Tables
3. RELATIONSHIP OF THE DETERMINANT OF A HESSENBERG MATRIX AND THE PARADETERMINANT
4. NORMALIZATION OF A GENERAL HESSENBERG MATRIX
5. POLYA TRANSFORMATION AND HESSENBERG MATRICES
6. ALGORITHMS FOR CALCULATING DETERMINANTS AND PERMANENTS OF THE HESSENBERG MATRIX
6.1. Calculating Functions of a Hessenberg Matrix Using Ordered Partitions and Sets (n)
6.2. Reducing the Order of Hessenberg Matrices
6.3. Decomposition of Hessenberg Matrices by Elements of a Last Row
7. APPLICATIONS OF HESSENBERG MATRICES
7.1. Some Recurrent Relations and Hessenberg Matrices
7.2. Some Combinatorial Identities Using Hessenberg-Toeplitz Matrices
7.3. Some Fibonacci-Lucas Identities Using Generalized Brioschi's Formula
REFERENCES
Chapter 6. EQUIVALENCE OF POLYNOMIAL MATRICES OVER A FIELD
Abstract
1. INTRODUCTION
2. SEMI-SCALAR EQUIVALENCE OF NONSINGULAR MATRICES
2.1. Preparatory Notations and Results
2.2. Main Results
2.3. The Illustrative Example
3. NORMAL FORMS OF A MATRIX PENCIL WITH RESPECT TO SEMI-SCALAR EQUIVALENCE
3.1. Auxiliary Statements
3.2. The First Normal Form of a Matrix Pencil with Respect to Semi-Scalar Equivalence
3.3. The Second Normal Form of a Matrix Pencil with Respect to Semi-Scalar Equivalence
REFERENCES
Chapter 7. MATRICES IN CHEMICAL PROBLEMS MODELED USING DIRECTED GRAPHS AND MULTIGRAPHS
ABSTRACT
1. INTRODUCTION
2. MODELING MPS USING DIRECTED GRAPHS AND MULTIGRAPHS
3. TWO DIFFERENT EXAMPLES OF MP-MATRICES
4. SPECIAL GRAPH STRUCTURES AND THEIR CORRESPONDING MP-MATRICES
4.1. Complete Graphs
4.2. Cycles
4.3. Wheels
4.4. n-Cubes
4.5. Complete Bipartite Graphs
4.6. Full Binary Tree
5. THEOREMS AND COROLLARIES ABOUT MP-MATRICES
6. EXAMPLES, OTHER RESULTS AND CONJECTURES ABOUT MP-MATRICES
CONCLUSION
ACKNOWLEDGMENTS
REFERENCES
Chapter 8. ENGAGING STUDENTS IN THE LEARNING OF LINEAR ALGEBRA
ABSTRACT
INTRODUCTION
DIFFICULTIES WHEN LEARNING LINEAR ALGEBRA
Ways of Thinking in Linear Algebra
SEMIOTIC REPRESENTATION REGISTERS
CONCEPT OF SKILLS
The Development of Mathematical Skills
Bloom's Taxonomy
VISUAL TOOLS
CUSTOMIZED VISUAL TOOLS FOR SELECTED TOPICS
Calculating and Visualizing Eigenvalues and Eigenvectors
Calculating and Visualizing Coordinates of Points in Two 2D Bases
Visualizing Linear Transformations in the Plane
USE OF THE DESIGNED TOOLS IN THE CLASSROOM
Activity 1
Activity 2
Activity 3
Activity 4
Activity 5
Activity 6
Activity 7
CONCLUSION
REFERENCES
ABOUT THE EDITOR
INDEX


MATHEMATICS RESEARCH DEVELOPMENTS

HOT TOPICS IN LINEAR ALGEBRA

No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in rendering legal, medical or any other professional services.

MATHEMATICS RESEARCH DEVELOPMENTS Additional books and e-books in this series can be found on Nova’s website under the Series tab.

MATHEMATICS RESEARCH DEVELOPMENTS

HOT TOPICS IN LINEAR ALGEBRA

IVAN I. KYRCHEI EDITOR

Copyright © 2020 by Nova Science Publishers, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. We have partnered with Copyright Clearance Center to make it easy for you to obtain permissions to reuse content from this publication. Simply navigate to this publication’s page on Nova’s website and locate the “Get Permission” button below the title description. This button is linked directly to the title’s permission page on copyright.com. Alternatively, you can visit copyright.com and search by title, ISBN, or ISSN. For further questions about using the service on copyright.com, please contact: Copyright Clearance Center Phone: +1-(978) 750-8400 Fax: +1-(978) 750-4470 E-mail: [email protected].

NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the Publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS. Additional color graphics may be available in the e-book version of this book.

Library of Congress Cataloging-in-Publication Data Names: Kyrchei, Ivan I., editor. Title: Hot topics in linear algebra / Ivan Kyrchei, editor. Identifiers: LCCN 2020015306 (print) | LCCN 2020015307 (ebook) | ISBN 9781536177701 (hardcover) | ISBN 9781536177718 (adobe pdf) Subjects: LCSH: Algebras, Linear. Classification: LCC QA184.2 .H68 2020 (print) | LCC QA184.2 (ebook) | DDC 512/.5--dc23 LC record available at https://lccn.loc.gov/2020015306 LC ebook record available at https://lccn.loc.gov/2020015307

Published by Nova Science Publishers, Inc. † New York

CONTENTS

Preface vii
Chapter 1. Computing Generalized Inverses Using Gradient-Based Dynamical Systems (Predrag S. Stanimirović and Yimin Wei) 1
Chapter 2. Cramer's Rules for Sylvester-Type Matrix Equations (Ivan I. Kyrchei) 45
Chapter 3. BiCR Algorithm for Computing Generalized Bisymmetric Solutions of General Coupled Matrix Equations (Masoud Hajarian) 111
Chapter 4. System of Mixed Generalized Sylvester-Type Quaternion Matrix Equations (Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram, Ilyas Ali and Abdul Shakoor) 137
Chapter 5. Hessenberg Matrices: Properties and Some Applications (Taras Goy and Roman Zatorsky) 163
Chapter 6. Equivalence of Polynomial Matrices over a Field (Volodymyr M. Prokip) 205
Chapter 7. Matrices in Chemical Problems Modeled Using Directed Graphs and Multigraphs (Victor Martinez-Luaces) 233
Chapter 8. Engaging Students in the Learning of Linear Algebra (Marta G. Caligaris, Georgina B. Rodríguez and Lorena F. Laugero) 267
About the Editor 293
Index 295

PREFACE

Linear algebra is the branch of mathematics concerning vector spaces and linear mappings between such spaces. Systems of linear equations with several unknowns are naturally represented using the formalism of matrices and vectors, which leads to matrix algebra. Linear algebra is central to almost all areas of mathematics. Many ideas and methods of linear algebra have been generalized to abstract algebra. Functional analysis studies the infinite-dimensional version of the theory of vector spaces. Combined with calculus, linear algebra facilitates the solution of linear systems of differential equations. Linear algebra is also used in most sciences and engineering areas, because it allows modeling many natural phenomena and computing efficiently with such models.

"Hot Topics in Linear Algebra" presents original studies in some areas of the leading edge of linear algebra. Each article has been carefully selected in an attempt to present substantial research results across a broad spectrum. Topics discussed herein include recent advances in the analysis of various dynamical systems based on the Gradient Neural Network; Cramer's rules for quaternion generalized Sylvester-type matrix equations; matrix algorithms for finding the generalized bisymmetric solution pair of general coupled Sylvester-type matrix equations; explicit solution formulas for some systems of mixed generalized Sylvester-type quaternion matrix equations; new approaches to studying the properties of Hessenberg matrices by using triangular tables and their functions; the study of polynomial matrices over a field with respect to semi-scalar equivalence; mathematical modeling problems in chemistry involving mixing problems and their associated MP-matrices; and some visual apps, designed in Scilab, for the learning of different topics of linear algebra.

In Chapter 1, dynamical systems and recurrent neural networks are applied as a powerful tool for solving many kinds of matrix algebra problems. In particular, for computing generalized inverse matrices, RNN models that are dedicated to finding zeros of equations or to minimizing nonlinear functions, and which represent optimization networks, are used. Convergence properties and exact solutions of the considered models are investigated as well.

In the following three chapters, matrix equations, one of the most famous subjects of linear algebra, are studied. The well-known Cramer's rule is an elegant formula for the solution of a system of linear equations that has both theoretical and practical importance. It is a consequence of the unique determinantal representation of the inverse matrix by the adjoint matrix with the cofactors in the entries. Is it possible to solve by Cramer's rule the generalized Sylvester matrix equation

AXC + BYD = E,   (1)

moreover when this equation has quaternionic coefficient matrices? Chapter 2 gives the answer to this question. In this chapter, Cramer's rules for Eq. (1) and for the quaternionic generalized Sylvester matrix equations with ∗- and η-Hermicities are derived within the framework of the theory of noncommutative column-row determinants previously introduced by the author. Algorithms for finding solutions are obtained in both cases, with complex and with quaternionic coefficient matrices.

In Chapter 3, the Hestenes-Stiefel (HS) version of the biconjugate residual (BiCR) algorithm for finding the generalized bisymmetric solution pair (X, Y) of the general coupled matrix equations

∑_{i=1}^{f} (A_i X B_i + C_i Y D_i) = M,
∑_{j=1}^{g} (E_j X F_j + G_j Y H_j) = N,

is established; it terminates in a finite number of iterations in the absence of round-off errors.

Some necessary and sufficient conditions for the constrained mixed-type generalized Sylvester quaternion matrix equations

A_3 X = C_3, Y B_3 = C_4, Z B_4 = C_5, A_4 Z B_5 = C_6, A_1 X − Y B_1 = C_1, A_2 X − Z B_2 = C_2

to have a solution are derived in Chapter 4. The solution is expressed in terms of Moore-Penrose inverses, and its determinantal representations by noncommutative column-row determinants are used in an example.

In Chapter 5, new approaches to studying the properties of Hessenberg matrices and effective algorithms for calculating the determinants and permanents of such matrices are considered. The theory of new subjects of linear algebra, triangular tables and their functions (paradeterminants and parapermanents, which are analogs of the determinant and permanent), is used in this chapter.

Polynomial matrices over a field are studied with respect to semi-scalar equivalence in Chapter 6. The necessary and sufficient conditions for the semi-scalar equivalence of nonsingular matrices A(λ) and B(λ) over a field F of characteristic zero are given in terms of solutions of a homogeneous system of linear equations, and canonical forms with respect to semi-scalar equivalence are obtained for an n × n matrix pencil A(λ) = A_0 λ + A_1 with nonsingular A_0 over an arbitrary field F.

In Chapter 7, mathematical modeling problems in chemistry, namely mixing problems (MPs), are explored. These problems lead to systems of linear ordinary differential equations (ODEs), for which the associated matrices (so-called MP-matrices) have different structures depending on the internal geometry of the system. Graph theory provides useful tools to characterize the geometrical properties of MPs; in particular, directed graphs and multigraphs are widely utilized in this chapter for that purpose. The main objective of this chapter is to analyze MP-matrices, focusing on their algebraic properties, which involve eigenvectors, eigenvalues and their algebraic and geometric multiplicities.

The main objective of Chapter 8 is to present some visual apps, designed in Scilab, for the learning of different topics of linear algebra. These apps are far-reaching resources that give new didactical and pedagogical possibilities.

In: Hot Topics in Linear Algebra
Editor: Ivan I. Kyrchei
ISBN: 978-1-53617-770-1
© 2020 Nova Science Publishers, Inc.

Chapter 1

COMPUTING GENERALIZED INVERSES USING GRADIENT-BASED DYNAMICAL SYSTEMS

Predrag S. Stanimirović^{1,∗} and Yimin Wei^{2,†}
^{1}University of Niš, Faculty of Sciences and Mathematics, Niš, Serbia
^{2}Fudan University, Shanghai, P. R. China

Abstract

The present chapter is a survey and further theoretical and computational analysis of various dynamical systems based on the Gradient Neural Network (GNN) evolution design for solving matrix equations and computing generalized inverses. For that purpose, different types of dynamic state equations corresponding to various outer and inner inverses are considered. In addition, some dynamical systems arising from GNN models have been proposed and used in computing generalized inverses. Convergence properties and exact solutions of the considered models are investigated. Simulation results are obtained using a Matlab Simulink implementation and using Matlab programs. Also, an algorithm for generating the exact solution of some dynamic state equations is stated. Implementation of that algorithm in the Computer Algebra System (CAS) Mathematica gives an efficient software for symbolic computation of outer inverses of matrices. The domain of the Mathematica program includes constant matrices whose entries are integers or rational numbers, as well as one-variable or multiple-variable rational or polynomial matrices. Illustrative examples are presented using a symbolic implementation in the package Mathematica.

∗Corresponding author's email: [email protected].
†Corresponding author's email: [email protected].

Keywords: Gradient Neural Network (GNN), dynamical system, dynamic state equation, convergence, computer algebra

1. INTRODUCTION

According to the traditional notation, C_r^{m×n} (resp. R_r^{m×n}) denotes the set of all complex (resp. real) m × n matrices of rank r. The identity matrix of an appropriate order is denoted by I. Furthermore, the notations A^∗, R(A), N(A), rank(A) and σ(A) stand for the conjugate transpose, the range, the null space, the rank, and the spectrum of the matrix A ∈ C^{m×n}, respectively. Similarly, R[X] (resp. R(X)) denotes the polynomials (resp. rational functions) with real coefficients with respect to the unknown variables X = x_1, ..., x_k. The set of m × n matrices with elements in R[X] (resp. R(X)) is denoted by R[X]^{m×n} (resp. R(X)^{m×n}). Further, O denotes a zero matrix of proper dimensions.

The problem of generalized inverses computation leads to the so-called Penrose equations

(1) AXA = A,  (2) XAX = X,  (3) (AX)^∗ = AX,  (4) (XA)^∗ = XA.

The set of all matrices satisfying the equations determined by the set S is denoted by A{S}. Any matrix from A{S} is called an S-inverse of A and is denoted by A^{(S)}. For any matrix A there exists a unique element of A{1, 2, 3, 4}, called the Moore-Penrose inverse of A, which is denoted by A^†. The Drazin inverse of a square matrix A ∈ C^{n×n} is the unique matrix X ∈ C^{n×n} which fulfills the matrix equation (2) in conjunction with

(1^k) A^{l+1} X = A^l, l ≥ ind(A),  (5) AX = XA,

and it is denoted by X = A^D. The notation ind(A) denotes the index of a square matrix A, which is defined by ind(A) = min{ j : rank(A^j) = rank(A^{j+1}) }. In the case ind(A) = 1, the Drazin inverse becomes the group inverse X = A^#.

The right inverse of A ∈ R_n^{m×n} will be denoted by A_R^{−1} = (A^T A)^{−1} A^T, while the left inverse of A ∈ R_m^{m×n} will be denoted by A_L^{−1} = A^T (A A^T)^{−1}. The outer generalized inverse of A ∈ C^{m×n} with prescribed range T and null space S is denoted by A^{(2)}_{T,S} and defined as the matrix X ∈ C^{n×m} which satisfies

XAX = X, R(X) = T, N(X) = S.   (1.1)

If A ∈ C_r^{m×n}, T is a subspace of C^n of dimension t ≤ r and S is a subspace of C^m of dimension m − t, then A has a {2}-inverse X such that R(X) = T and N(X) = S if and only if AT ⊕ S = C^m, in which case X is unique and is denoted by A^{(2)}_{T,S}.

The Moore-Penrose inverse A^†, the weighted Moore-Penrose inverse A^†_{M,N}, the Drazin inverse A^D and the group inverse A^# can be derived by means of appropriate choices of T and S (see, for example, [1]):

A^† = A^{(2)}_{R(A^∗),N(A^∗)},
A^†_{M,N} = A^{(2)}_{R(A^♯),N(A^♯)}, A^♯ = N^{−1} A^∗ M,
A^D = A^{(2)}_{R(A^k),N(A^k)}, k ≥ ind(A),   (1.2)
A^# = A^{(2)}_{R(A),N(A)}, ind(A) = 1.

The matrices M, N which determine A^†_{M,N} are positive-definite and of appropriate dimensions. For other important properties of generalized inverses see [1, 2, 3].

There are three general approaches to computing generalized inverses.

1. Classical numerical algorithms, defined as a complete set of procedures for finding an approximate solution of a problem, together with computable error estimates. Numerical algorithms can be divided into two categories: direct and iterative methods. The singular value decomposition (SVD) algorithm is the best known among the direct methods [2]. Other types of matrix factorizations have also been exploited in the computation of generalized inverses, such as the QR decomposition [4, 5] and the LU factorization [6]. Methods based on the application of the Gauss-Jordan elimination process to an appropriate augmented matrix were investigated in [7, 8]. Algorithms for computing the inverse of a constant nonsingular matrix A ∈ C^{n×n} by means of the Leverrier-Faddeev algorithm

were presented in [9, 10]. A more general finite algorithm for computing the Moore-Penrose generalized inverse of a given rectangular or singular constant matrix A ∈ C^{m×n} originated in [11]. Two variants of the finite algorithm for computing the Drazin inverse were introduced in [12, 13, 14]. Finite algorithms for the nontrivial A^{(2)}_{T,S} inverse were established in [15]. Greville's partitioning method, originated in [16], has been very popular in recent years. Iterative methods, such as the orthogonal projection algorithms, the Newton iterative algorithm, and the higher-order convergent iterative methods, are more suitable for implementation. The Newton iterative algorithm has a fast convergence rate, but it requires an initial condition for its convergence. All iterative methods, in general, require initial conditions which are ultimate, rigorous and sometimes cannot be fulfilled easily. A number of iterative methods were proposed in [8, 17, 18, 19, 20, 21, 22, 23, 24] and many other references.

2. Computer algebra, also called symbolic computation or algebraic computation, is a part of computational mathematics that refers to algorithms and software for manipulating mathematical expressions and other mathematical objects. An overview of computer algebra algorithms was given in [25]. More details can be found in the references [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44].

3. Continuous-time recurrent neural network (RNN) algorithms, based on dynamical systems.

A dynamical system is a system in which the movement of some points in a geometrical space is defined by a time-dependent function. Dynamical systems and recurrent neural networks are a powerful tool for solving many kinds of matrix algebra problems because of: (a) their parallel distributed nature; (b) the possibility to ensure a response within a predefined time-frame in real-time applications; (c) the convenience of hardware implementation. We consider RNN models dedicated to finding zeros of equations or to minimizing nonlinear functions. These models represent optimization networks. Optimization RNN models can be divided into two classes: Gradient Neural Networks (GNN) and Zhang Neural Networks (ZNN). GNN models are explicit and aimed at solving time-invariant problems. On the other hand, ZNN models can be implicit and are able to solve time-varying problems.

Recently, a number of nonlinear and linear continuous-time dynamical systems, and the recurrent neural network models they initiated, have been developed for the purpose of numerical evaluation of the matrix inverse and the pseudoinverse of full-row or full-column rank rectangular matrices (for more details, see [45, 46, 47]). Also, various recurrent neural networks for computing generalized inverses of rank-deficient matrices were designed in [48, 49]. The most general GNN design evolution for solving the matrix equation AXB = D was investigated in [50]. In [51] the authors proposed conditions for the existence and representations of {2}-, {1, 2}-, and {1}-inverses; a new computational framework for computing these generalized inverses was also proposed there. The computational algorithms of [51] are defined using GNN models for solving matrix equations. It is known that the multiplication of the right-hand side of the classical ZNN design by an appropriate positive definite matrix generates a new neural design with an improved convergence rate. The goal in [52] is to apply similar principles to the GNN design. Appropriate combinations of GNN and ZNN models for solving the matrix equations BX = D and XC = D in the time-invariant case were developed in [53]. Two gradient-based recurrent neural networks for computing the W-weighted Drazin inverse of a real constant matrix were presented in [54].

The global organization of the sections is as follows. Various dynamical systems for solving matrix equations are investigated in Section 2. Section 3 is devoted to GNN models for computing generalized inverses. RNN models arising from GNN models are considered in Section 4. Convergence properties and exact solutions of the considered models are investigated in Section 5. Section 6 investigates symbolic computation of outer inverses based on finding exact solutions of the underlying dynamic state equations. Illustrative simulation examples are presented in Section 7.


2. GNN DYNAMICS FOR SOLVING MATRIX EQUATIONS

The dynamics of the GNN models for solving a matrix equation is defined using the error matrix E(t), which is obtained by replacing the unknown matrix in the considered matrix equation by the time-varying activation state variables matrix V(t). The goal function is the scalar-valued norm-based error function

ε(t) = ε(V(t)) = (1/2) ‖E(t)‖_F^2,   (2.3)

where ‖A‖_F := √(Tr(A^T A)) denotes the Frobenius norm of the matrix A and Tr(·) denotes the trace of a matrix. The general design formula is defined as the dynamical system with evolution along the negative gradient −∂ε(V(t))/∂V of ε(V(t)), as follows:

V̇(t) = dV(t)/dt = −γ F(∂ε(V(t))/∂V).   (2.4)

The left-hand side, V̇(t), in (2.4) represents the time derivative of V(t). The scaling real parameter γ in (2.4) is an inductance parameter or the reciprocal of a capacitance parameter, and could be chosen as large as possible in order to accelerate the convergence. Further, F(C) is an odd and monotonically increasing function array, applied element-wise to the elements of a real matrix C = (c_ij) ∈ R^{m×n}, i.e., F(C) = (f(c_ij)), i = 1, ..., m, j = 1, ..., n, wherein f(·) is an odd and monotonically increasing function.

Remark 2.1. The following real-valued monotonically increasing odd functions are widely used.

Linear function:
f(x) = x.   (2.5)

Bipolar-sigmoid function:
f(x) = ((1 + exp(−q))/(1 − exp(−q))) · ((1 − exp(−qx))/(1 + exp(−qx))), q > 2.   (2.6)

Power-sigmoid function:
f(x) = x^p if |x| ≥ 1, and f(x) = ((1 + exp(−q))/(1 − exp(−q))) · ((1 − exp(−qx))/(1 + exp(−qx))) otherwise, q > 2, p ≥ 3.   (2.7)

Smooth power-sigmoid function:
f(x) = (1/2) x^p + ((1 + exp(−q))/(1 − exp(−q))) · ((1 − exp(−qx))/(1 + exp(−qx))), p ≥ 3, q > 2.   (2.8)
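For convenience, the activation functions (2.5)-(2.8) can be written down directly in Mathematica. The following is a minimal sketch; the names linF, bipSigF, powSigF, smoothPowSigF and arrayF are ours, introduced only for this illustration:

(* Activation functions (2.5)-(2.8); q > 2 and p >= 3 are design parameters *)
linF[x_] := x;  (* linear function (2.5) *)
bipSigF[x_, q_] := (1 + Exp[-q])/(1 - Exp[-q])*(1 - Exp[-q x])/(1 + Exp[-q x]);  (* bipolar-sigmoid (2.6) *)
powSigF[x_, q_, p_] := If[Abs[x] >= 1, x^p, bipSigF[x, q]];  (* power-sigmoid (2.7) *)
smoothPowSigF[x_, q_, p_] := (1/2) x^p + bipSigF[x, q];  (* smooth power-sigmoid (2.8) *)
(* the array activation F(C) applies f elementwise, as required in (2.4) *)
arrayF[f_, mat_] := Map[f, mat, {2}];

With p an odd integer, each of these scalar functions is odd and monotonically increasing on the real line, so any of them can serve as f(·) in the design (2.4).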

In general, the GNN models considered so far are defined on the basis of an appropriate error matrix and the derivative of an appropriate matrix norm. The most general linear matrix equation is given in the form AXB = D. It is useful to consider a solution to this matrix equation and its particular cases. Clearly, the GNN models considered so far can be defined upon one of the particular cases of the matrix equation AXB = D.

2.1. GNN for Solving the Matrix Equation AXB = D

The most general gradient-based neural dynamics is aimed at solving the general linear matrix equation AXB = D. The corresponding model was defined and investigated in [50]. The model is based on the matrix-valued error function E(t) = D − AV(t)B. Consequently, the scalar-valued goal function is ε(t) = ε(V(t)) = (1/2) ‖D − AV(t)B‖_F^2, whose derivative with respect to V(t) is equal to

∂ε(V(t))/∂V = (1/2) ∂‖D − AV(t)B‖_F^2 / ∂V = −A^T (D − AV(t)B) B^T.

Using the general evolution design (2.4), the nonlinear GNN design for solving the linear matrix equation AXB = D can be defined as follows:

dV(t)/dt = V̇(t) = γ A^T F(D − AV(t)B) B^T.   (2.9)

The model (2.9) was defined in [50] and termed GNN(A, B, D).

Remark 2.2. It is important to emphasize the difference between

V̇(t) = γ A^T F(D − AV(t)B) B^T

and

dV(t)/dt = V̇(t) = γ F(A^T (D − AV(t)B) B^T).

The right-hand side of (2.9) can be related with the ZNN design

Ė(t) = −γ F(E(t))

as follows:

V̇(t) = γ A^T F(E(t)) B^T = −A^T Ė(t) B^T.
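To make the design (2.9) concrete, the following minimal Mathematica sketch integrates GNN(A, B, D) with the linear activation F(C) = C; the sample matrices and the gain γ = 100 are our own illustrative assumptions:

(* GNN(A,B,D) with linear activation: V'(t) = γ A^T (D - A V(t) B) B^T *)
A = {{1., 2.}, {3., 4.}}; B = {{2., 0.}, {0., 1.}};
X = {{1., 1.}, {0., 2.}};   (* a known solution, used to build a consistent D *)
D0 = A.X.B;                 (* right-hand side chosen so that AXB = D0 is solvable *)
sol = NDSolve[{V'[t] == 100*Transpose[A].(D0 - A.V[t].B).Transpose[B],
    V[0] == ConstantArray[0., {2, 2}]}, V, {t, 0, 1}];
Norm[A.(V[1] /. First[sol]).B - D0, "Frobenius"]  (* residual near zero *)

Since V(0) = O and A, B are nonsingular here, the state matrix tends to A^{-1} D0 B^{-1} = X, in agreement with the equilibrium (2.18) discussed below.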


The generalized GNN model (GGNN model) is applicable in both the time-varying and the time-invariant case, and it can operate with time-varying coefficient matrices A(t), B(t), D(t):

V̇(t) = −γ A(t)^T F(D(t) − A(t)V(t)B(t)) B(t)^T.   (2.10)

But gradient-based numerical algorithms and neural-dynamic schemes are substantially designed for solving problems with static-in-time coefficient matrices [55].

Convergence properties of the GNN(A, B, D) design are considered in [50].

Theorem 2.1. [50] Assume that the real matrices A ∈ R^{m×n}, B ∈ R^{p×q} and D ∈ R^{m×q} satisfy the constraint

A A^{(1)} D B^{(1)} B = D   (2.11)

for some inner inverses A^{(1)} and B^{(1)}. If an odd and monotonically increasing array activation function F(·), based on the elementwise application of an appropriate odd function f(·), is used, then the neural state matrix V(t) ∈ R^{n×p} of the GNN(A, B, D) model (2.9) asymptotically and globally converges to the theoretical solution of the matrix equation AXB = D, i.e., AV(t)B → D as t → +∞, for an arbitrary initial state matrix V(0).

Proof. Firstly, the validity of (2.11) for some A^{(1)} ∈ A{1} and B^{(1)} ∈ B{1} ensures the solvability of AVB = D. The substitution V̄(t) = V(t) − A^{(1)} D B^{(1)}, i.e., V(t) = V̄(t) + A^{(1)} D B^{(1)}, transforms the dynamics (2.9) into the equivalent form

dV̄(t)/dt = dV(t)/dt = γ A^T F(D − AV(t)B) B^T = γ A^T F(D − A(V̄(t) + A^{(1)} D B^{(1)})B) B^T.

According to (2.11), it follows that

dV̄(t)/dt = γ A^T F(D − A A^{(1)} D B^{(1)} B − A V̄(t) B) B^T = −γ A^T F(A V̄(t) B) B^T.   (2.12)

The Lyapunov function candidate, which measures the convergence performance, is of the same form as in [56, Theorem 3.1]:

L(V̄(t), t) = (1/2) ‖V̄(t)‖_F^2 = (1/2) Tr(V̄(t)^T V̄(t)).   (2.13)

Evidently, the inequality L(V̄(t), t) ≥ 0 holds, with equality only for V̄(t) = 0. According to (2.13), assuming (2.12) and using dTr(X^T X) = 2 Tr(X^T dX) in conjunction with basic properties of the matrix trace function, one can express the time derivative of L(V̄(t), t) as follows:

dL(V̄(t), t)/dt = Tr(V̄(t)^T dV̄(t)/dt) = −γ Tr(V̄(t)^T A^T F(A V̄(t) B) B^T) = −γ Tr((A V̄(t) B)^T F(A V̄(t) B)).   (2.14)

Since the scalar-valued function f(·) is odd and monotonically increasing, generalizing the strategy from [56] one can verify, with W = A V̄(t) B = (w_ij), the following:

dL(V̄(t), t)/dt = −γ Tr(W^T F(W)) = −γ ∑_{i,j} w_ij f(w_ij), which is < 0 if A V̄(t) B ≠ 0 and = 0 if A V̄(t) B = 0.   (2.15)

Observing that W(t) = A V̄(t) B = A(V(t) − A^{(1)} D B^{(1)}) B = AV(t)B − D, one can verify

dL(V̄(t), t)/dt < 0 if W(t) ≠ 0, and dL(V̄(t), t)/dt = 0 if W(t) = 0.   (2.16)

This further implies:
- dL(V̄(t), t)/dt < 0 at any non-equilibrium state V(t) satisfying W(t) = AV(t)B − D ≠ 0;
- dL(V̄(t), t)/dt = 0 at the equilibrium state V(t) satisfying W(t) = AV(t)B − D = 0.

According to the Lyapunov stability theory, W(t) = AV(t)B − D globally converges to the zero matrix from an arbitrary initial value V(0).

It is worth mentioning that the constraint (2.11) is not caused by the GNN evolution design. This condition is just the general condition for the solvability of the general linear matrix equation AXB = D. Since A^{(1)} and B^{(1)} are arbitrary, the most convenient choice is A^† and B^†. This choice is used in Theorem 2.2.


Theorem 2.2. [50] Assume that the real matrices A ∈ R^{m×n}, B ∈ R^{p×q} and D ∈ R^{m×q} satisfy the condition

A A^† D B^† B = D.   (2.17)

Then the activation state variables matrix V(t) of the model GNN(A, B, D), defined by (2.9), is convergent as t → +∞ and has the equilibrium state

V(t) → Ṽ = A^† D B^† + V(0) − A^† A V(0) B B^†   (2.18)

for every initial state matrix V(0) ∈ R^{n×p}.

Proof. In view of (2.9), the matrix V_1(t) = A^† A V(t) B B^† satisfies

dV_1(t)/dt = A^† A (dV(t)/dt) B B^† = γ A^† A A^T F(E(t)) B^T B B^†.

According to the basic properties of the Moore-Penrose inverse, it follows that B^T B B^† = B^T and A^† A A^T = A^T, which further implies

dV_1(t)/dt = γ A^T F(E(t)) B^T = dV(t)/dt.

Consequently, V_2(t) = V(t) − V_1(t) satisfies dV_2(t)/dt = 0, which implies

V_2(t) = V_2(0) = V(0) − V_1(0) = V(0) − A^† A V(0) B B^†, t ≥ 0.   (2.19)

Furthermore, according to Theorem 2.1, AV(t)B → D, so V_1(t) = A^† (AV(t)B) B^† converges to A^† D B^† as t → +∞. Accordingly, having in mind (2.19), V(t) is convergent and its equilibrium value is

V(t) = V_1(t) + V_2(t) → Ṽ = A^† D B^† + V_2(0) = A^† D B^† + V(0) − A^† A V(0) B B^†,

which is just a confirmation of (2.18).

According to Theorem 2.2, the limiting value Ṽ of V(t) is determined by the choice of V(0). For this purpose, it will be denoted by Ṽ_{V(0)}. Corollary 2.1 follows immediately from (2.18), taking into account (2.17). Also, it is directly implied by Theorem 2.1.

Corollary 2.1. [50] Assume that the real matrices A ∈ R^{m×n}, B ∈ R^{p×q} and D ∈ R^{m×q} satisfy (2.17). Further, let an odd and monotonically increasing function f(·) be used to define the array activation function F(·), and let γ > 0. Let Ṽ_{V(0)} denote the limit value Ṽ = lim_{t→∞} V(t) corresponding to the initial state V(0). Then the equilibrium state matrices Ṽ_{V(0)} of GNN(A, B, D) satisfy

{ A Ṽ_{V(0)} B | V(0) ∈ R^{n×p} } = {D},   (2.20)

i.e., A Ṽ_{V(0)} B = D for each V(0) ∈ R^{n×p}.

Remark 2.3. (a) According to [57, 58], it is known that

‖AXB − D‖_2 ≥ ‖A A^† D B^† B − D‖_2,

where the equality is valid if and only if

X = A^† D B^† + Y − A^† A Y B B^†,   (2.21)

wherein Y in (2.21) is an arbitrary matrix of appropriate dimensions. Also, A^† D B^† is the unique minimizer of minimal ‖·‖_2 norm among the least squares minimizers:

‖A^† D B^†‖_2 ≤ ‖A^† D B^† + Y − A^† A Y B B^†‖_2.

(b) The solution (2.18) of the GNN model (2.9) coincides with the general solution A^{(1)} D B^{(1)} + Y − A^{(1)} A Y B B^{(1)} of the matrix equation AXB = D, wherein Y is an arbitrary matrix and A^{(1)}, B^{(1)} are arbitrary inner inverses of A and B, respectively.

(c) Also, it is important to mention the following details:
- The arbitrary matrix Y from (2.21) is replaced by the initial state V(0) in (2.18).
- In general, the solutions (2.18) are the least squares solutions of AXB = D.
- The zero initial condition V(0) = O produces the Moore-Penrose solution A^† D B^† of the matrix equation AXB = D, i.e., the least squares solution of minimal norm with respect to the ‖·‖_2 matrix norm.
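The zero-initial-state case in Remark 2.3(c) can be checked directly with PseudoInverse; a small sketch, with A, B, D of our own choosing so that (2.17) holds:

(* With V(0) = O, the limit of (2.9) should be the Moore-Penrose solution A† D B† *)
A = {{1, 2}, {2, 4}};  (* rank-deficient *)
B = IdentityMatrix[2];
D0 = A.{{1, 0}, {1, 1}}.B;  (* D0 constructed so that AXB = D0 is consistent *)
Vlimit = PseudoInverse[A].D0.PseudoInverse[B];  (* predicted limit by (2.18) *)
A.PseudoInverse[A].D0.PseudoInverse[B].B == D0  (* condition (2.17): True *)

Here Vlimit is the minimal-norm least squares solution, matching the last item of Remark 2.3(c).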

3. GNN MODELS FOR COMPUTING GENERALIZED INVERSES

It is known that representations of generalized inverses are closely related to the solution of appropriate matrix equations. In this section, we exploit this possibility and derive dynamical systems for computing the main generalized inverses.

3.1. GNN for Regular Inverse

Wang in [47] proposed the dynamic equation of a linear recurrent neural network for computing the inverse of a nonsingular matrix A. This dynamics solves the matrix equation AX = I and is initiated by the error matrix E(t) = AV(t) − I. In this case, the objective scalar-valued function is defined as ε(V(t)) = (1/2) ‖AV(t) − I‖_F^2. Further,

∂ε(V(t))/∂V = (1/2) ∂‖AV(t) − I‖_F^2 / ∂V = A^T (AV(t) − I).

Now, using the general design rule (2.4) with the linear activation function, the corresponding GNN dynamics can be defined as follows:

dV(t)/dt = −γ A^T A V(t) + γ A^T = −γ A^T (AV(t) − I), V(0) = V_0.   (3.1)

The entries of V(t) are unknown activation state variables which approximate the inverse A^{−1} over time, and γ is a positive gain parameter. It was proven in [47] that the GNN model (3.1) is asymptotically stable in the large and that the steady-state matrix of the recurrent neural network is equal to the inverse of A, i.e., lim_{t→∞} V(t) = A^{−1}, for arbitrary V(0).
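Since (3.1) is linear, its trajectory with V(0) = O admits the closed form V(t) = (I − e^{−γ A^T A t}) A^{−1} for nonsingular A, which is easy to test; the matrix and the gain below are our own illustration:

(* Closed-form state of (3.1) with V(0) = O and nonsingular A *)
A = {{2., 1.}, {0., 1.}}; gamma = 10; id = IdentityMatrix[2];
V[t_] := (id - MatrixExp[-gamma*t*Transpose[A].A]).Inverse[A];
Norm[A.V[1] - id, "Frobenius"]  (* tiny residual: V(1) is close to A^{-1} *)

Differentiating V(t) confirms V'(t) = −γ A^T (A V(t) − I), i.e., this expression is indeed a solution of (3.1).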

3.2. GNN for Computing the Moore-Penrose Inverse

The recurrent neural network defined in (3.1) can be used for computing the right inverse A^† = A_R^{−1} = (A^T A)^{−1} A^T of a full-column rank rectangular matrix A ∈ R_n^{m×n}, by simply allowing the activation state matrix V(t) to be rectangular. In the full-row rank case A ∈ R_m^{m×n}, the recurrent neural network

dV(t)/dt = −γ (V(t)A − I) A^T   (3.2)

can be used for approximating the left inverse A^† = A_L^{−1} = A^T (A A^T)^{−1}. The closed-form solution of the state matrices from (3.1) and (3.2) can be described as follows (see [48]):

V(t) = e^{−γ A^T A t} V(0) + γ e^{−γ A^T A t} ∫_0^t e^{γ A^T A τ} A^T dτ, m ≥ n,
V(t) = V(0) e^{−γ A A^T t} + γ A^T e^{−γ A A^T t} ∫_0^t e^{γ A A^T τ} dτ, m < n.

In the case of a full-row rank or full-column rank rectangular matrix A, Wang in [48] derived the following representation of A^†, which is independent of V(0):

A^† = lim_{t→∞} γ e^{−γ A^T A t} ∫_0^t e^{γ A^T A τ} A^T dτ, m ≥ n,
A^† = lim_{t→∞} γ A^T e^{−γ A A^T t} ∫_0^t e^{γ A A^T τ} dτ, m < n.   (3.3)

Wang in [48] also proposed three recurrent neural networks for computing the Moore-Penrose inverse of rank-deficient matrices. The first recurrent neural network has the dynamic equation

dV(t)/dt = −M V(t) A^T A + M A^T, V(0) = 0, m ≥ n,
dV(t)/dt = −V(t) A A^T M + A^T M, V(0) = 0, m < n,   (3.4)

where M is a positive diagonal matrix satisfying M ∈ R^{n×n} if m ≥ n and M ∈ R^{m×m} if m < n.

The representation of the Moore-Penrose inverse given in (3.3) corresponds to the following integral representation of the Moore-Penrose inverse for bounded linear operators, introduced in [59]:

A^† = ∫_0^∞ e^{−A^T A τ} A^T dτ.   (3.5)

The global exponential convergence of the gradient neural network (3.2) in the case when A is nonsingular, as well as its global stability when A is singular, is verified in [60].
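The integral representation (3.5) can be verified symbolically for a small full-column rank matrix; the matrix below is our own illustration:

(* Verify (3.5): A† equals the integral of e^{-A^T A τ} A^T over [0, ∞) *)
A = {{1, 0}, {1, 1}, {0, 1}};  (* 3x2, full column rank *)
intRep = Integrate[MatrixExp[-Transpose[A].A*tau].Transpose[A], {tau, 0, Infinity}];
intRep == PseudoInverse[A]  (* True *)

For full column rank the integral reduces to (A^T A)^{−1} A^T, which is exactly the right inverse A_R^{−1} discussed above.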

3.3. GNN for Computing the Weighted Moore-Penrose Inverse

Wei in [49] introduced the following dynamic state equation of the first recurrent neural network (called NN1) for computing the weighted Moore-Penrose inverse of a rank-deficient matrix:

dV(t)/dt = −D A^♯ A V(t) + D A^♯, V(0) = 0, m ≥ n,
dV(t)/dt = −V(t) A A^♯ D + A^♯ D, V(0) = 0, m < n,   (3.6)

where D is a positive diagonal matrix of proper dimensions and A^♯ = N^{−1} A^T M for appropriately chosen positive definite matrices M and N. The simplest choice for D is the constant diagonal matrix D = γI, where γ > 0 is a large real scalar value [49].

The corresponding integral representation of the weighted Moore-Penrose inverse of a linear operator between Hilbert spaces was introduced in [61]:

A^†_{M,N} = ∫_0^∞ e^{−A^♯ A τ} A^♯ dτ.   (3.7)
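The weighted representation (3.7) can be checked in the same way; a sketch with our own small full-rank example and diagonal weights (N1 stands for the weight matrix N, renamed to avoid the built-in Mathematica symbol N):

(* Verify (3.7) and the weighted Penrose conditions for A†_{M,N} *)
A = {{1, 0}, {1, 1}, {0, 1}};
M = DiagonalMatrix[{1, 2, 1}]; N1 = DiagonalMatrix[{2, 1}];  (* positive definite weights *)
Asharp = Inverse[N1].Transpose[A].M;  (* A♯ = N^{-1} A^T M *)
Amn = Simplify[Integrate[MatrixExp[-Asharp.A*tau].Asharp, {tau, 0, Infinity}]];
Simplify[{A.Amn.A == A, Amn.A.Amn == Amn,
  Transpose[M.A.Amn] == M.A.Amn, Transpose[N1.Amn.A] == N1.Amn.A}]  (* all True *)

The last two identities are the M- and N-symmetry conditions characterizing the weighted Moore-Penrose inverse.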

3.4. GNN for Computing Outer Inverses

The main property of the outer inverse A^{(2)}_{R(G),N(G)}, which is the basis of the dynamic state equation and later of the induced gradient-based recurrent neural network (GNN), is given in Lemma 3.1.

Lemma 3.1. Let A ∈ C_r^{m×n} be given and let G ∈ C_s^{n×m} be an arbitrarily chosen matrix satisfying 0 < s ≤ r. Assume that X := A^{(2)}_{R(G),N(G)} exists. Then the matrix equations

GAX = G, XAG = G   (3.8)

are satisfied.

The GNN model for solving the matrix equation AXB = D was investigated in [50] and termed GNN(A, B, D). Since GNN(A, B, D) is aimed at solving the matrix equation AXB = D, it is based on the error matrix E(t) = AV(t)B − D. On the other hand, the GNN(GA, I, G) model is defined for solving the matrix equation GAX = G, where A ∈ R_r^{m×n} is given and G ∈ R_s^{n×m}, 0 < s ≤ r, is an appropriately chosen matrix. This model is applicable in generating the outer inverse A^{(2)}_{R(G),N(G)}. The first matrix equation in (3.8) is more appropriate in the case m ≥ n and the second in the case m < n. The matrix equations in (3.8) define the error matrix E(t) in one of the following two dual forms:

E_{GA}(t) = GAV(t) − G,  E_{AG}(t) = V(t)AG − G,   (3.9)

where V(t) ∈ R^{n×m} denotes the unknown time-varying matrix to be solved for. Clearly, the time-varying matrix V(t) corresponds to the outer inverse with prescribed range and null space, X := A^{(2)}_{R(G),N(G)}. Our intention is to solve one of the equations included in (3.9) with respect to the unknown matrix V(t) using the dynamic-system approach in conjunction with symbolic data processing. For this purpose, the generally adopted rule is to use one of the following two dual scalar-valued error functions, defined as the residual Frobenius norm:

ε(t) = (1/2) ‖GAV(t) − G‖_F^2, if m ≥ n,
ε(t) = (1/2) ‖V(t)AG − G‖_F^2, if m < n.   (3.10)

A computational scheme for computing the minimum of ε(V(t)) is defined along the gradient descent direction of ε(t). The derivative of ε(t) with respect to V is equal to

∂ε(t)/∂V = (GA)^T (GAV(t) − G), if m ≥ n,
∂ε(t)/∂V = (V(t)AG − G) (AG)^T, if m < n.   (3.11)

According to the general GNN design (2.4), we obtain the following GNN(GA, I, G) dynamics:

dV_GA(t)/dt = −γ (GA)^T (GAV_GA(t) − G), if m ≥ n,   (3.12)

and GNN(I, AG, G) dynamics:

dV_AG(t)/dt = −γ (V_AG(t)AG − G) (AG)^T, if m < n.   (3.13)

The Simulink implementation of GNN(GA, I, G) is illustrated in Figure 1. The convergence of GNN(GA, I, G) in the case when A, G are constant matrices in the limiting case t → ∞ was investigated in [50]. The results of Corollary 3.1 arise from Theorem 2.2.

Corollary 3.1. [50] Assume that the real matrices A ∈ R_r^{m×n}, G ∈ R_s^{n×m} satisfy 0 < s ≤ r and ind(GA) = 1. Then the following statements hold.

(i) The activation state variables matrix V_GA(t) of the model GNN(GA, I, G), defined by (3.12), is convergent as t → +∞ and has the limit value

Ṽ_GA(V_GA(0)) = (GA)^† G + V_GA(0) − (GA)^† GA V_GA(0)   (3.14)

for every initial matrix V_GA(0) ∈ R^{n×m}.

(ii) In particular, V_GA(0) = O initiates

Ṽ_GA(O) = (GA)^† G = A^{(2,4)}_{R((GA)^T),N(G)}.

Figure 1. Simulink implementation of GNN(GA, I, G) (block diagram omitted).

Proof. According to the rank assumption rank(GA) = rank(G), it follows that

(GA)(GA)^† G = G.   (3.15)

So, according to Theorem 2.2, GNN(GA, I, G) is convergent and the general solution is given by (3.14).

The convergence of GNN(I, AG, G) in the case when A, G are constant matrices was investigated in [50].

Corollary 3.2. [50] Assume that the real matrices A ∈ R_r^{m×n}, G ∈ R_s^{n×m} satisfy 0 < s ≤ r and ind(AG) = 1. Then the following statements hold.

(i) The activation state variables matrix V_AG(t) of the dynamical model GNN(I, AG, G), defined by (3.13), is convergent as t → +∞ and has the limit value

Ṽ_AG(V_AG(0)) = G(AG)^† + V_AG(0) − V_AG(0) AG (AG)^†   (3.16)

for every initial matrix V_AG(0) ∈ R^{n×m}.

(ii) In particular, for V_AG(0) = O, it follows that

Ṽ_AG(O) = G(AG)^† = A^{(2,3)}_{R(G),N((AG)^T)}.
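The limit values in Corollaries 3.1 and 3.2 are directly computable; the following sketch, with a sample A and G of our own choosing, confirms that both candidates satisfy the outer-inverse property XAX = X:

(* Limit values (GA)†G and G(AG)† of Corollaries 3.1 and 3.2 *)
A = {{1, 0}, {0, 1}, {1, 1}};  (* rank 2 *)
G = {{1, 0, 0}, {0, 0, 0}};    (* rank 1, so 0 < s <= r *)
X1 = PseudoInverse[G.A].G;     (* (GA)†G *)
X2 = G.PseudoInverse[A.G];     (* G(AG)† *)
{X1.A.X1 == X1, X2.A.X2 == X2}  (* {True, True} *)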

4. RNN MODELS ARISING FROM GNN MODELS

The authors of [62] omitted the constant term (GA)^T (resp. (AG)^T) from (3.12) (resp. (3.13)), and considered two dual linear RNN models, defined as follows:

dV_R(t)/dt = −γ (GAV_R(t) − G), V_R(0) = O, if m ≥ n,
dV_L(t)/dt = −γ (V_L(t)AG − G), V_L(0) = O, if m < n.   (4.17)

The dynamical evolution (4.17) will be termed GNNATS2. Practically, it is possible to consider two RNN models for computing outer inverses A^{(2)}_{R(G),N(G)}. The more effective model in the case m ≥ n is termed GNNATS2R ≡ RNN(GA, I, G) and is defined by

dV_R(t)/dt = −γ (GAV_R(t) − G).   (4.18)

The more effective model in the case m < n is termed GNNATS2L ≡ RNN(I, AG, G) and is defined by

dV_L(t)/dt = −γ (V_L(t)AG − G).   (4.19)

The Simulink implementation of GNNATS2R is illustrated in Figure 2. The application of the dynamic evolution design (4.17) assumes that the real parts of the eigenvalues of GA or AG are nonnegative [62]:

σ(GA) ⊂ {z : Re(z) ≥ 0}, m ≥ n,
σ(AG) ⊂ {z : Re(z) ≥ 0}, m < n.   (4.20)

More precisely, the RNN(GA, I, G) model fails in the case when Re(σ(GA)) contains negative values. Clearly, the model (4.17) is simpler than the models (3.12) and (3.13), but it loses global stability. Two approaches can be used to generate the solution in the case when (4.20) is not satisfied:

(i) An approach proposed in [62], based on the replacement of G by G_0 = G(GAG)^T G in the GNNATS2 evolution (4.17).

Figure 2. Simulink implementation of GNNATS2R (block diagram omitted).

(ii) The second possibility is to use the GNN(GA, I, G) dynamics instead of the RNN(GA, I, G) dynamics.

The recurrent neural network defined above is a linear dynamic system in matrix form. According to linear systems theory [63], the closed-form solution of the state matrix can be described as follows:

Ṽ_G(t) = Ṽ_R(t) := e^{−γ GA t} V_R(0) + γ e^{−γ GA t} ∫_0^t e^{γ GA τ} G dτ, m ≥ n,
Ṽ_G(t) = Ṽ_L(t) := V_L(0) e^{−γ AG t} + γ G e^{−γ AG t} ∫_0^t e^{γ AG τ} dτ, m < n.   (4.21)

To analyze the convergence and stability of a neural network, it is of major interest to know the eigenvalues of the matrix GA (or AG in the dual case). Using the principles from [62], it can easily be shown that the term lim_{t→∞} e^{−γ GA t} vanishes if the matrix GA has nonnegative eigenvalues, i.e.,

lim_{t→∞} e^{−γ GA t} = O.   (4.22)

Now, (4.22) in conjunction with (4.21) implies the following result for lim_{t→∞} V_G(t) = V̄_G:

V̄_G = lim_{t→∞} γ e^{−γ GA t} ∫_0^t e^{γ GA τ} G dτ, m ≥ n,
V̄_G = lim_{t→∞} γ G e^{−γ AG t} ∫_0^t e^{γ AG τ} dτ, m < n.   (4.23)

Theorem 4.1. [62] Let A ∈ R_r^{m×n} be a given matrix, G ∈ R_s^{n×m} be an arbitrary matrix satisfying 0 < s ≤ r, and σ(GA) = {λ_1, λ_2, ..., λ_n} be the spectrum of GA. Suppose that the condition

Re(λ_j) ≥ 0, j = 1, 2, ..., n   (4.24)

is satisfied. Then the limiting expression (4.23) produces the outer inverse A^{(2)}_{R(G),N(G)}, i.e.,

V̄_GA = A^{(2)}_{R(G),N(G)}.   (4.25)

Remark 4.1. An analogous statement can be verified when the outer inverse A^{(2)}_{R(G),N(G)} is generated using the dual equation XAG = G in (3.8). This situation is more efficient in the case m < n.

According to Theorem 4.1, the application of the dynamic equation (4.18) is conditioned by the restriction (4.20) on the spectrum of the matrix GA or AG. More precisely, the first RNN approach used in [62] fails in the case when Re(σ(GA)) contains negative values.

The neural network used in our implementation is composed of a number of independent subnetworks, in a similar way as has already been discussed in [47, 48]. Specifically, the number of subnetworks is m if m ≥ n, or it is equal to n if m < n. The connection weight matrix W of the neurons is identical in each subnetwork and defined as

W = −γ GA, m ≥ n,
W = −γ AG, m < n.

Particularly, a simplification of the GNN model for computing the Drazin inverse A^D was proposed in [64]. This model can be derived by removing the first constant term in GNN(A^k, A, A^k), k ≥ ind(A), and it is defined as

dV_D(t)/dt = −γ (A^{k+1} V_D(t) − A^k), k ≥ ind(A), V_D(0) = 0.   (4.26)

Accordingly, an application of the model (4.26) is conditioned by

Re(λ_j^{k+1}) ≥ 0, j = 1, ..., n,   (4.27)

where σ(A) = {λ_1, ..., λ_n} is the spectrum of A and k ≥ ind(A) [64]. One method to resolve the limitation (4.27) was proposed in [64]; it is based on the possibility of finding an appropriate power k such that (4.27) holds.


There are several cases, depending on the eigenvalues of A, for selecting the power k which ensures that the matrix A^{k+1} has eigenvalues with nonnegative real parts. These cases are discussed in Theorem 4.2. Before the main results, we present several supporting facts in Lemma 4.1 and some notation.

Lemma 4.1. [64] Let A ∈ R^{n×n} be a given matrix, λ_j ∈ σ(A), and let k be a fixed positive integer. Then the condition Re(λ_j^k) ≥ 0 is ensured in the following cases:

C1. λ_j ∈ R_+.

C2. λ_j ∈ R and k is even.

C3. λ_j and k satisfy

φ_j = Arg λ_j = ±π/2, k ∈ {4l : l ∈ N_+}.   (4.28)

C4. λ_j satisfies

λ_j ∈ C ∩ {z = r_j e^{iφ_j} : 0 < |φ_j| < π}   (4.29)

and the parameter k satisfies

(4s − 1)π/2 ≤ k φ_j ≤ (4s + 1)π/2 for some integer s,

so that cos(k φ_j) ≥ 0. Theorem 4.2 [64] then asserts that, for k + 1 selected according to these cases, the nonzero spectrum of the matrix A^{k+1} will lie in the open right half of the complex plane.
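Whether a given pair (A, k) satisfies (4.27) can be tested by inspecting the spectrum, in the spirit of Lemma 4.1; the matrix below is our own illustration:

(* Find powers k for which Re(λ^(k+1)) >= 0 holds for every λ in σ(A) *)
A = {{0, 1, 0}, {-1, 0, 0}, {0, 0, 1}};  (* σ(A) = {I, -I, 1} *)
check[k_] := AllTrue[Eigenvalues[A], Re[#^(k + 1)] >= 0 &];
Select[Range[8], check]  (* the admissible powers among 1..8 *)

For this spectrum the test fails exactly when k + 1 ≡ 2 (mod 4), since then I^{k+1} = −1.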

5. FURTHER RESULTS ON CONVERGENCE PROPERTIES

Before a fixed (or equilibrium) point can be analysed, it is desirable to determine it. This highlights the importance of some classical computer algebra problems (such as exact solutions to differential or algebraic equations) in the study of dynamical systems. Exact solutions of some dynamical systems are investigated in [65, 66]. The following result from [65, Appendix B.2] will be useful.

Proposition 5.1. The matrix differential equation

Ẋ = AX(t) + X(t)B + C(t), X(t_0) = X_0,

has the exact solution

X(t) = e^{(t−t_0)A} X_0 e^{(t−t_0)B} + ∫_{t_0}^{t} e^{(t−τ)A} C(τ) e^{(t−τ)B} dτ.
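Proposition 5.1 is easy to confirm symbolically in a small constant-coefficient instance; the matrices below are our own illustrative data:

(* Verify Proposition 5.1 for constant C(t) = C0 and t0 = 0 *)
Amat = {{-1, 0}, {0, -2}}; Bmat = {{-3, 0}, {0, -1}};
C0 = {{1, 2}, {3, 4}}; X0 = {{0, 0}, {0, 0}};
Xexact[t_] := MatrixExp[t*Amat].X0.MatrixExp[t*Bmat] +
   Integrate[MatrixExp[(t - tau)*Amat].C0.MatrixExp[(t - tau)*Bmat], {tau, 0, t}];
Simplify[D[Xexact[t], t] - (Amat.Xexact[t] + Xexact[t].Bmat + C0)]  (* zero matrix *)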

Theorem 5.1. Let A ∈ R(X)^{m×n} be a given multivariate rational matrix of variables X such that t ∉ X, and let G ∈ R(X)_s^{n×m} be an arbitrary matrix satisfying 0 < s ≤ r and ind(GA) = ind(AG) = 1. Assume that the spectrum σ(GA) = {λ_1, ..., λ_n} satisfies the constraints

Re(λ_j) ≥ 0, j = 1, 2, ..., n.   (5.1)

(a) The exact solution to the GNNATS2R evolution (4.18) is equal to

V_R(G, t, γ, V_R(0)) = e^{−γt GA} V_R(0) + (I − e^{−γt GA}) A^{(2)}_{R(G),N(G)}.   (5.2)


(b) If the initial approximation is the zero matrix, V(0) = O, then the exact solution to GNNATS2R is

V_R(G, t, γ, O) = (I − e^{−γt GA}) A^{(2)}_{R(G),N(G)}.   (5.3)

(c) If γ > 0, then the limits of V_R(G, t, γ, O) are equal to

V̄_R(G, t, γ, O) = lim_{t→∞} V_R(G, t, γ, O) = lim_{γ→∞} V_R(G, t, γ, O) = A^{(2)}_{R(G),N(G)} = G(GAG)^† G.   (5.4)

Proof. The assumption ind(GA) = ind(AG) = 1 implies the existence of the outer inverse A^{(2)}_{R(G),N(G)} = (GA)^# G = G(AG)^#. Further, according to the linear dynamical systems theory [63], the closed form of the state matrix V(t) of the GNNATS2R evolution (4.18) is equal to

V_R(G, t, γ, V_R(0)) = e^{−γt GA} V_R(0) + γ e^{−γt GA} ∫_0^t e^{γ GA τ} G dτ.   (5.5)

(a) The replacement G = GA A^{(2)}_{R(G),N(G)} leads to

γ e^{−γt GA} ∫_0^t e^{γ GA τ} G dτ
= e^{−γt GA} (∫_0^t e^{γ GA τ} (γ GA) dτ) A^{(2)}_{R(G),N(G)}
= e^{−γt GA} (∫_0^t d(e^{γ GA τ})) A^{(2)}_{R(G),N(G)}.

Further, using several elementary transformations and the basic properties (3.8) of the outer inverse, one can obtain

V_R(G, t, γ, V_R(0)) = e^{−γt GA} V_R(0) + e^{−γt GA} [e^{γ GA τ}]_{τ=0}^{τ=t} A^{(2)}_{R(G),N(G)}
= e^{−γt GA} V_R(0) + e^{−γt GA} (e^{γt GA} − I) A^{(2)}_{R(G),N(G)}
= e^{−γt GA} V_R(0) + (I − e^{−γt GA}) A^{(2)}_{R(G),N(G)}.

This part of the proof is completed.


(b) The zero initial state V(0) = O annihilates the term e^{−γt GA} V_R(0), and the proof is implied by part (a).

(c) It suffices to use the known fact from [62], where the authors showed that lim_{t→∞} e^{−γt GA} = O if the matrix GA has eigenvalues with nonnegative real parts, in conjunction with equation (5.2).

Theorem 5.2. Let A ∈ R(X)_r^{m×n} be a given matrix such that t ∉ X, and let G ∈ R(X)_s^{n×m} be an arbitrary matrix satisfying 0 < s ≤ r and ind(AG) = 1. Assume that the spectrum σ(AG) = {λ_1, ..., λ_m} satisfies the constraints

Re(λ_j) ≥ 0, j = 1, 2, ..., m.   (5.6)

(a) The exact solution to the GNNATS2L evolution (4.19) is equal to

V_L(G, t, γ, V_L(0)) = V_L(0) e^{−γt AG} + A^{(2)}_{R(G),N(G)} (I − e^{−γt AG}).   (5.7)

(b) If the initial approximation is the zero matrix, V(0) = O, then the exact solution to GNNATS2L is

V_L(G, t, γ, O) = A^{(2)}_{R(G),N(G)} (I − e^{−γt AG}).   (5.8)

(c) If γ > 0, then the limits of V_L(G, t, γ, O) are equal to

V̄_L(G, O) = lim_{t→∞} V_L(G, t, γ, O) = lim_{γ→∞} V_L(G, t, γ, O) = A^{(2)}_{R(G),N(G)} = G(GAG)^† G.   (5.9)

The convergence of the GNN(GA, I, G) dynamics in the case A ∈ R(X)_r^{m×n}, G ∈ R(X)_s^{n×m} is investigated in Theorem 5.3.

24

Predrag S. Stanimirovi´c and Yimin Wei

(b) The exact solution to the GN N (GA, I, G) dynamics (3.12) with the zero initial stage V (0) = O is equal to   T VGA (G, t, γ, O) = I − e−γt (GA) GA (GA)†G   (5.11) T (2,4) = I − e−γt (GA) GA AR(GA)T ,N (G) . (c) The exact solution to the GN N (GA, I, G) dynamics (5.11) converges to V˜GA (G, t, γ, O) = lim V GA (G, t, γ, O) = lim V GA (G, t, γ, O) γ→∞

t→∞

= (GA)†G =

(5.12)

(2,4) AR(GA)T ,N (G) .

Proof. (a) Using known results from [63], the closed-form solution of the state matrix VGA (t) of the GN N (GA, I, G) dynamics is equal to VGA (G, t, γ, VGA(0)) = e−γt(GA)

T GA

VGA (0) + γe−γt (GA)

T GA

Z

t

eγ(GA)

T GAτ

(GA)T G dτ.

(5.13)

0

Since ind(GA) = 1, it follows that rank(GA) = rank(G), which further implies GA(GA)†G = G. Now, it follows that −γt (GA)T GA

Z

t

T

eγ(GA) GAτ (GA)TG dτ 0   Z t T −γt (GA) GA γ(GA)T GAτ T = e e (γ(GA) GA) dτ (GA)†G 0  Z t   T T = e−γt (GA) GA d eγ(GA) GAτ (GA)†G.

γe

0

Further, using several elementary transformations, we conclude VGA (G, t, γ, VGA(0))

  τ =t  −γt (GA)T GA γ (GA)T GAτ =e VGA (0)+ e e (GA)†G τ =0 h  i T T T = e−γt(GA) GA VGA (0) + e−γt (GA) GA eγt (GA) GA −I (GA)†G  T = e−γt(GA) GA VGA (0) + I − e−γtGA (GA)†G. −γt(GA)T GA

Computing Generalized Inverses Using Gradient-Based ...

25

(2,4)

Now, the proof of this part can be easily completed using AR(GA)T ,N (G) = (GA)†G. (b) The zero initial state VGA (0) = O vanishes the first term of (5.13), and the proof follows from the part (a). (c) Since the matrix GA has nonnegative real parts of eigenvalues, using T lim e−γt(GA) GA = O, the proof follows from (5.11). t→∞

In Theorem 5.4 we investigate the convergence of symbolic GN N (I, AG, G) dynamics in the case when A ∈ R(X )m×n is a given r matrix and G ∈ R(X )n×m , 0 < s ≤ r. s Theorem 5.4. Let A ∈ R(X )m×n be given matrix such that t ∈ / X ; let G ∈ n×m R(X )s be arbitrary matrix satisfying 0 < s ≤ r and ind(AG) = 1. Then following statements hold. (a) The exact solution to the GN N (I, AG, G) dynamics (3.12) is equal to   T T VAG (G, t, γ, VAG(0)) = e−γtAG(AG) VAG (0) + I −e−γt AG(AG) G(AG)†   T T (2,3) = e−γtAG(AG) VAG (0) + I − e−γt AG(AG) AR(G),N (AG)T . (5.14) (b) The exact solution to the GN N (I, AG, G) dynamics with the zero initial stage VAG (0) = O is defined by   T VAG (G, t, γ, O) = G(AG)† I − e−γt AG(AG) . (5.15) (c) If t > 0 and γ > 0, then the exact solution (5.15) converges to

V˜AG (G, O) = lim VAG (G, t, γ, O) = lim VAG (G, t, γ, O) t→∞

γ→∞



(5.16)

= G(AG) .
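Formulas (5.11) and (5.12) can be reproduced mechanically in Mathematica; the following sketch checks them on a small sample A and G of our own choosing:

(* Check (5.11)-(5.12): trajectory of GNN(GA,I,G) with V(0) = O and its limit *)
A = {{1, 0}, {0, 1}, {1, 1}}; G = {{1, 0, 0}, {0, 0, 0}};
W = Transpose[G.A].(G.A);  (* (GA)^T GA *)
Vexact[t_] := (IdentityMatrix[2] - MatrixExp[-gamma*t*W]).PseudoInverse[G.A].G;
Limit[Vexact[t], t -> Infinity, Assumptions -> gamma > 0]
(* → (GA)†G, the {2,4}-inverse with range R((GA)^T) and null space N(G) *)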

6. SYMBOLIC IMPLEMENTATION OF THE GNN DYNAMICS

The traditional implementation of GNN and ZNN models assumes two possible numerical approaches: a Matlab Simulink implementation and an implementation by a Matlab program. The implementation in a Matlab program requires a vectorization of the system of matrix differential equations into vector form (mass matrix), and then solving the vector of differential equations by means of one of the Matlab solvers, such as ode45, ode15s, ode23. This requires repeated application of one of these solvers at unpredictable time instants inside the predefined time interval [0, t_f] s, where s stands for the time in seconds.

Now we can define exactly our "symbolic solver" or "symbolic Simulink". Namely, they are defined as one of the symbolic matrices defined in (5.2), (5.11) or (5.15). The exact outer inverses can be generated by the limiting expressions lim_{t→∞} V_GA(G, t, γ, V_0), γ > 0, or lim_{γ→∞} V_GA(G, t, γ, V_0), t > 0.

We want to define a Mathematica program for the symbolic implementation of the dynamic state equations (4.17), (3.12) and (3.13). The expression generated in symbolic form can replace the Matlab Simulink model presented in Figure 1 or the corresponding Matlab program. An overview of continuous-time and discrete-time dynamical systems supported by the aid of Mathematica was presented in [67]. As a continuation of that research, we consider the symbolic computation of generalized inverses by means of dynamical systems. The main idea is to solve the matrix differential equations in symbolic form. The solution given in symbolic form needs to be generated only once. After that, the symbolic expression derived as the solution of the system of differential equations defining the GNN(GA, I, G), GNN(I, AG, G) or GNNATS2 dynamics can be used as a "symbolic solver", able to produce the value of V(t) at each time instant simply by replacing the variable t by an arbitrary time instant t_0 ∈ [0, t_f]. Also, there is the possibility to investigate some limiting properties of the symbolic solver by means of the Mathematica function Limit.

According to the previous discussion, we present the corresponding Algorithm 6.1 for computing outer inverses of the input matrix A ∈ R(X)_r^{m×n} by finding exact solutions of the dynamic state equation (4.17) or (3.12). The algorithm assumes the choice of a matrix G ∈ R(X)_s^{n×m}, 0 < s ≤ r. If the entries of A are real numbers, then the output of the algorithm is an approximation of the outer inverse A^{(2)}_{R(G),N(G)}. In the case when the elements of A are rational numbers, or one-variable or multiple-variable rational expressions, Algorithm 6.1 returns the exact outer inverse A^{(2)}_{R(G),N(G)}. It is assumed that the variables x_1, ..., x_k are different from the symbol t representing the time, which is denoted as t ∉ X.

Computing Generalized Inverses Using Gradient-Based ...

27

Algorithm 6.1 Computing outer inverse of a given matrix A ∈ R(X )rm×n . Require: Time-invariant matrices A ∈ R(X )rm×n and G ∈ R(X )sn×m , 0 < s ≤ r. 1: Construct the dynamic state equations contained in (3.12) or (4.17) in symbolic form. Step 1:1: Construct symbolic matrix V (t) = vij (t), t ∈ / X. Step 1:3: Construct the matrix −γ (GA)T (GAV (t) − G) for GN N (GA, I, G) dynamics or −γ (GAV (t) − G) for GNNATS2 dynamics. Step 1:4: Define the symbolic matrix equation eqnstate, defined as V˙ (t) − γ (GA)T (GAV (t) − G) = O or V˙ (t) − γ (GAV (t) − G) = O. 2: Define the initial state V (0) = O. 3: Solve (3.12) or (4.17) symbolically. Denote the output by V (t). Step 3:1: Join vectorized lists eqnstate and V (0) in the list eqns. Step 3:2: Solve the system of differential equations eqns with respect to variables vij and the time t. 4: Compute V = lim V (t). t→∞

5:

(2)

Return the outer inverse V = AR(G),N (G).

The implementation of Algorithm 6.1 is performed by the following code in the algebraic programming language Mathematica [68]. Below we give the code which solves the problem (3.12). Let us mention that the Mathematica function DSolve can be applied to generate a solution in Step 3:2 of Algorithm 6.1. More precisely, DSolve finds symbolic, “pure function”, solutions for functions vij which are included in differential equations that appear in (3.12), (3.13) or (4.17). Also, the Mathematica function Limit can be applied to generate an exact solution in Step 5 of Algorithm 6.1.

28

Predrag S. Stanimirovi´c and Yimin Wei

SymNNInv[A_, G_] := Module[ {dimsA,V,derV,E,eqnState,eqnInit,eqns,vars,ret,prodGA,eigs}, (*Compute the matrix product GA*) prodGA = G.A; (*Compute dimensions of A*) dimsA = Dimensions[A]; (*Step 1: Construct the dynamic state equation in symbolic form*) (*Step 1:1:*) V = Table[Subscript[v,i,j][t], {i, dimsA[[2]]}, {j, dimsA[[1]]}]; (*Step 1:2:*) derV = Table[Subscript[v,i,j]’[t], {i,dimsA[[2]]}, {j,dimsA[[1]]}]; (*Step 1:3:*) eqnEvol = -\[Gamma]*(prodGA.V-G); (*Step 1:4:*) eqnState = Flatten[Table[ derV[[i,j]]-eqnEvol[[i,j]]==0, {i,dimsA[[2]]}, {j,dimsA[[1]]}]]; (*Step 2: Define the zero initial state*) eqnInit = Flatten[Table[ Subscript[v,i,j][0]==0, {i,dimsA[[2]]}, {j,dimsA[[1]]}]]; (*Step 3: Solve the dynamic state equation in symbolic form*) (*Step 3:1:*) eqns = Join[eqnState, eqnInit]; vars = Flatten[V]; (*Step 3:2:*) ret = DSolve[eqns, vars, t] // Simplify; ret = vars /. Sort[Flatten[ret]]; (*Return the outer inverse*) Return[Table[ ret[[(i-1)*dimsA[[1]]+j]], {i,dimsA[[2]]}, {j,dimsA[[1]]} ]]; ];

Denote by SymNNInvGA[A , G ] the Mathematica function wherein the expression eqnEvol in SymNNInv[A , G ] is defined by eqnEvol = -\[Gamma]*Transpose[prodGA].(prodGA.V-G);

In dual case, SymNNInvAG[A , G ] denotes the Mathematica function where the expression eqnEvol in SymNNInv[A , G ] is defined by prodAG = A.G; eqnEvol = -\[Gamma]*(V.prodAG-G).Transpose[prodAG];

Computing Generalized Inverses Using Gradient-Based ...

29

If the entries of A are constant real numbers then the output of the algorithm (2) is an approximation of the outer inverse AR(G),N (G) . In the cases when the elements of A = A(X ) are rational numbers or rational expressions with respect to variables included in X such that t ∈ / X , Algorithm 6.1 returns the exact outer (2) inverse A(X )R(G),N (G). Denote shortly one of the function SymNNInvGA[A,G] or SymNNInvGA[A,G] or SymNNInvAG[A,G] by SymGNNInv[A,G]. What is the main advantage of the symbolic processing in solving dynamical system with respect to numerical approach, used so far? (1) The solution V (t) given by V[t]:=SymGNNInv[A,G] is in symbolic form with respect to the variable t representing the time. After generating V (t), arbitrary value V [t0] of the generalized inverse in a time instant t0 ∈ [0, tf ]s, tf > 0 can be simply generated only the replacement SymGNNInv[A,G]/.t->t0. This possibility avoids necessity to compute values of V (t) repeatedly. (2) The symbolic expression generated by SymGNNInv[A,G] will be termed as “symbolic Simulink”. Construction of such expression is a time consuming job. But, it can be generated only once and used later efficiently only by means of simple replacements. In order to obtain a solution at one single time instant, an arbitrary real-time solver requires the information from previous time instants. On the other hand, a “symbolic solver” is a ready for use symbolic expression prepared for simple replacements and finding limiting values. (3) As a consequence, each replacement SymGNNInv[A,G]/.t->t0 avoids permanent time consuming and possibly numerically instable job in finding numerical solutions in subsequent time instances t0 ∈ [0, tf ]. (4) In addition, it is possible to find limiting values of generated symbolic expressions using possibility to obtain limiting values in computer algebra systems such as Mathematica. These limiting values can be generated by the expressions SymGNNInv[A,G]/.t->∞.

7.

E XAMPLES

The first part oh this section gives several examples produced by the Matlab Simulink implementation. The second part gives illustrative examples in symbolic form, generated by the programming language Mathematica.

30

7.1.

Predrag S. Stanimirovi´c and Yimin Wei

Examples in Matlab

Example 7.1. Consider the input matrix  1 2  1 3   2 3 A=  3 4   4 5 6 6

3 4 4 5 6 7

4 6 5 6 7 7

1 2 3 4 6 8

       

and the joined matrix 

  G=  

0 1 2 3 0

0 2 3 5 1

0 1 2 3 0

0 2 3 5 1

0 1 2 3 0

0 2 3 5 1



  .  

The model GN N (GA, I, G) with γ = 108 in the case VR (0) = O generates the result X = (GA)†G   −0.2812 0.2307 −0.2812 0.2307 −0.2812 0.2307  −0.0761 0.0719 −0.0761 0.0719 −0.0761 0.0719     = 0.1290 −0.0868 0.1290 −0.0868 0.1290 −0.0868    0.3341 −0.2455 0.3341 −0.2455 0.3341 −0.2455  −0.3294 0.2723 −0.3294 0.2723 −0.3294 0.2723

within the time interval [0, 10−5]s. The initial state VR (0) = AT in GN N (GA, I, G) under the same environment initiates the result X = (GA)†G + AT − (GA)†GAAT   −0.0493 0.0565 −0.3276 0.3234 −0.4667 0.3121  0.0606 0.3282 −0.1034 0.1266 −0.1855 −0.2390    = 0.2099  .  0.1706 −0.4001 0.1207 −0.0702 0.0957  0.2806 −0.1284 0.3448 −0.2669 0.376932 −0.3413  −0.5969 0.3579 −0.2759 0.1653 −0.1153 0.2937

31

Computing Generalized Inverses Using Gradient-Based ... On the other hand, the solution to GN N AT S2R is (2)

AR(G),N (G) = G(GAG)† G  0 0  −0.1207 0.1092  −0.2644 =  0.3448  0.2241 −0.1552 −0.5862 0.4828

7.2.

0 −0.1207 0.3448 0.2241 −0.5862

0 0.1092 −0.2644 −0.1552 0.4828

0 −0.1207 0.3448 0.2241 −0.5862

 0 0.1092   −0.2644  . −0.1552  0.4828

Examples in Mathematica

The dynamical system (4.17) is simpler for symbolic handling than (3.12). So, it is preferable to use (4.17) in the case when the condition (4.20) holds. In the opposite case, the recommended choice is (3.12). Example 7.2. In this example we consider the input matrix from [69]   1 −1 0 0 0 0  −1 1 0 0 0 0     −1 −1 1 −1 0 0    A=  −1 −1 −1 1 0 0    −1 −1 −1 0 2 −1  −1 −1 0 −1 −1 2

and choose various appropriate matrices G. A) The following result in the convenient matrix form can be obtained by the commands G = A.A and VGD = SymNNInv[A, G]: 2

VGD

6 6 6 6 6 =6 6 6 6 4

1 1 − e−8tγ 4 ` ´ 1 −1 + e−8tγ 4

`

´

0

1 4 1 4

´ −1 + e−8tγ ` ´ 1 − e−8tγ

`

0

0 0

0

0

`

1 12 1 12

` `

−5 + 2e

1 −27tγ −3e−tγ 6 4−e ` ´ −27tγ 1 −3e−tγ 6 2+e

`

− 3e

0 0 0

0 ´

´

+ 6e−tγ

−7 − 2e−27tγ + 3e−8tγ + 6e−tγ

0 0 0

1 −27tγ +3e−8tγ +6e−tγ 12 −7− 2e ` ´ −27tγ 1 −3e−8tµ +6e−tµ 12 −5+2e

`

0 1 1 − e−8tγ 4 ` ´ 1 −1 + e−8tγ 4 −27tγ −8tγ

0

0

0 0 ` ´ 1 −1 + e−8tγ 4 ` ´ 1 −8tγ 4 1−e

0

´

´ ´ 3

7 7 7 7 7. 0 7 7 ` ´ 1 −27tγ −tγ 7 − 3e 6 2+e 5 ` ´ −27tγ −tγ 1 4−e −3e 6

32

Predrag S. Stanimirovi´c and Yimin Wei

From the matrix VGD it is possible to “read” the exponential convergence of the GNN model. The matrix VGD can be used as “symbolic solver”. Further, D D the limit expressions V G can be examined. Firstly V G = Limit[VGD /.γ → 103 , t → ∞] produces the following exact Drazin inverse of A:  1  1 0 0 0 0 4 −4  1  1 0 0 0 0   −4 4   1  0 − 14 0 0  0  4  D VG = A =  . 1 0 − 41 0 0  0  4   5 7 2 1  0 0 − 12 − 12 3 3    7 5 1 2 − 12 0 0 − 12 3 3 −8 The same result can be obtained using Limit[VGD /.t → 10  ,γ → ∞ ]. 3 D −γtA It is important to mention that the appearance VG = I − e AD of the statement (5.2) can be verified by the Mathematica expression

G=A.A; AD = G.PseudoInverse[G.A.G].G; (IdentityMatrix[6] - MatrixExp[-\[Gamma]*t*G.A]).AD //Simplify//MatrixForm

The positive real scaling constant γ should be chosen as large as possible in order to achieve the convergence for smaller values of t. An approximation of AD can be obtained by means of Limit[VGD /.t → 10−8 , γ → 109 ], and it is equal to   0.25 −0.25 0 0 0 0   −0.25 0.25 0 0 0 0     0 0 0.25 −0.25 0 0 .   0  0 −0.25 0.25 0 0    0 0 −0.416667 −0.583333 0.666667 0.333333  0 0 −0.583333 −0.416667 0.333333 0.666667 The state trajectories of the elements xij = (VAT )ij 6= 0 in the case γ → 109 , t ∈ [0, 10−8]s are presented in Figure 3. B) Particularly, G = A.A; SymNNInvGA[A,G] generates the “symbolic

Computing Generalized Inverses Using Gradient-Based ...

33

0.6 0.4 0.2

2. ´ 10-9

4. ´ 10-9

6. ´ 10-9

8. ´ 10-9

1. ´ 10-8

- 0.2 - 0.4 - 0.6

Figure 3. The state trajectories of xij = (VA2 )ij in the case γ → 109 , t ∈ [0, 10−8]s in Example 7.2.

Simulink” for computing 

1 4 − 14

   0 3 † 2 (A ) A =   0   0 0

− 41

0 0

0 0 0 0

1 2

1 4

0 − 61 − 31

0 0 0 1 2 − 13 − 16

0 0 − 14 − 14 5 12 1 12

0 0 − 14 − 14 1 12 5 12



   .   

One approach is to use the the limit Limit[SymNNInvGA[A,A.A]/.γ → 103 , t → ∞]. Another possibility is to achieve the same result with arbitrary small time interval, for example in [0, 10−8 ]s using the limit Limit[SymNNInvGA[A,A.A]/.t→ 10−8 , γ → ∞]. C) The expression X = VAT =SymNNInv[A,Transpose[A]] is the “symbolic Simulink” for computing the Moore-Penrose inverse of A. The limit expression V AT =Limit[VAT /.γ → 103 , t → ∞] produces the

34

Predrag S. Stanimirovi´c and Yimin Wei

following exact Moore-Penrose inverse of A:  1 1 1 4 −4 −4  1 1 1  −4 4 −4  1  0 0 2  † V AT = A =  0 0  0  1  0 0 −6  0 0 − 31

− 14

0

− 14

0

0 − 14 1 2 − 13 − 16

− 14 5 12 1 12

0



 0   − 14   . − 14   1  12  5 12

D) Now, we want to to compute the pseudoinverse twice and verify (A† )† = A. For this purpose, it suffices to compute A1= SymNNInv[X,Transpose[X]] and then verify that Limit[A1 /.γ → 103 , t → ∞] coincides with A. Example 7.3. Consider the matrix  1  1   2 A=  3   4 6

2 3 3 4 5 6

3 4 4 5 6 7

4 6 5 6 7 7

1 2 3 4 6 8



   ,   

choose the following matrices P ∈ R5×2 and Q ∈ R2×6   0 0  2 1      0 1 0 1 0 1   P =  3 2 , Q = 1 0 1 0 1 0  5 3  1 0

(7.1)

(7.2)

and compute G = P Q. The spectrum of GA contains only nonnegative real values: σ(GA) = {266.346716180717, 0.653283819282910, 0., 0., 0.}. Using the function Xp = SymNNInv[A, G] we obtain the exact outer inverse of

Computing Generalized Inverses Using Gradient-Based ...

35

A corresponding to G: V G = Limit[Xp/.γ → 103 , t → ∞]  0 0 0 0 0 0 19 7 19 7 19  −7  58 174 − 58 174 − 58 174  10 23 10 23 10 23 − − − = 87 29 87 29 87  29 13 9 13 9  13 − 9 − − 58 58 58 58 58  58 14 17 14 17 14 − 17 29 29 − 29 29 − 29 29



   .   

In order to verify V G , it is possible to use the full-rank representation (2) AR(P ),N (Q) = P (QAP )−1 Q (see [70]). The state trajectories of the elements (VG )ij in the case γ →109 , t ∈ [0, 10−8] are presented in Figure 4. 0.4

0.2

2. ´ 10-9

4. ´ 10-9

6. ´ 10-9

8. ´ 10-9

1. ´ 10-8

- 0.2

- 0.4

- 0.6

Figure 4. The state trajectories of (VG )ij in the case γ → 109 , t ∈ [0, 10−8] in Example 7.3.

Example 7.4. The tridiagonal square test matrix 

1 −1

 2 −1  −1  −1 2 −1  Bn =  .. ..  . .    −1

0

0



     ..  .   2 −1  −1 1

36

Predrag S. Stanimirovi´c and Yimin Wei

of the order n of rank rank(Bn ) = n − 1 is considered. In the case n = 6 let us choose G6 = B6T . Now, the expression X6= SymNNInv[B6,G6 ] generates the exact Moore-Penrose inverse of A: V AT = Limit[X6 /.γ → 103 , t → ∞]  55 25 1 17 36 36 36 − 36  25 31 7 11  36 36 36 − 36  1 7 19 1   36 36 36 36 = A† =  17 1 19  − 36 − 11 36 36 36  29 23 11 7  − − −  36 36 36 36 29 17 1 − 35 36 − 36 − 36 36

− 29 36

− 35 36

− 23 36

 − 29 36    − 17 36  1 . 36   25  36 

11 − 36 7 36 31 36 25 36



55 36

The state trajectories of the elements (VAT )ij in the case γ → 1010 , t ∈ [0, 10−8]s are presented in Figure 5. Clear and fast convergence to exact equilibrium points of each trajectory is observable. 1.5

1.0

0.5

2. ´ 10-9

4. ´ 10-9

6. ´ 10-9

8. ´ 10-9

1. ´ 10-8

- 0.5

- 1.0

Figure 5. The state trajectories of (VAT )ij in the case γ → 1010 , t ∈ [0, 10−8] in Example 7.4.

Since ind(A) = 1, the expression Y6 = SymNNInv[B6, B6 ] can be used in deriving the exact group inverse of A: V A = Limit[Y6 /.γ → 103 , t → ∞] = A# = A† .

Computing Generalized Inverses Using Gradient-Based ...

37

Example 7.5. It is worth mentioning that the function SymNNInv[A ,G ] is applicable for matrices whose entries are polynomial or rational expressions. For example, let us observe the case where A=H[n] is the Hessenberg matrix of the order n and G=Transpose[H[n]]: H[n_]:=Table[If[j>=i+2,0,\[Alpha]ˆ(i-j+1)],{i,n},{j,n}]; G[n_]:=Transpose[H[n]];

Then MPH[n] can be generated in symbolic form using the expression MPH[n ]:=SymNNInv[H[n], G[n]] and H[n]† = lim M P H[n], for each fixed t→∞ value of the parameter γ.

C ONCLUSION The present paper is a contribution to both numeric and symbolic computations of outer generalized inverses of matrices. Some dynamical systems for solving matrix equations and computing generalized inverses are proposed. Considered dynamical models are based on GNN design. Convergence properties and exact solutions of considered models are investigated. Two different implementations of considered models are developed. Simulation results are obtained using Matlab Simulink implementation and using Matlab programs. Another kind of implementation is based on the algorithm for generating exact solution to some dynamic state equations. Corresponding program is developed in Computer Algebra System (CAS) Mathematica, and it represents an efficient software for symbolic computation of outer inverses of rational matrices. The domain of the Mathematica program includes constant matrices whose entries are integers, rational numbers as well as one-variable or multiple-variable rational or polynomial matrices. Examples on rational matrices are developed in the package Mathematica. In general, derived results are a contribution to the application of computer algebra in solving dynamical systems. The introduced algorithm is based on the exact solutions to systems of differential equations which appear in dynamic state equations included in the GNN modelling of generalized inverses. The hearth of our algorithm is the possibility to solve the system of ordinary differential equations. This problem is a part of the scientific research known as Computer Algebra and Differential Equations. Here, we used the possibility of the standard Mathematica function DSolve. Clearly, many different approaches are available and could be exploited.

38

Predrag S. Stanimirovi´c and Yimin Wei

Moreover, it is worth to mention that the approach and algorithm proposed in the current paper are applicable to all dynamical systems which exploit ordinary differential equations (ODEs) in describing evolution of a state space in a time. It is known that the capabilities of computer algebra packages in finding explicit solutions of differential equations are limited. Thereafter, numerical approach remains important. In general, symbolic and numeric computations could interact with each other. Possible further research includes the following: 1. Extend the symbolic algebra approach to the extended ZNN design, possibly with the presence of noise. 2. Consider the application of computer algebra in the discrete case, in solving the system of difference equations.

ACKNOWLEDGMENTS Predrag Stanimirovi´c is supported by the Ministry of Education, Science and Technological Development, Republic of Serbia, grant no. 174013. Yimin Wei is supported by the National Natural Science Foundation of China under grant 11771099 and Innovation Program of Shanghai Municipal Education Commission. Predrag Stanimirovi´c and Yimin Wei are supported by the bilateral project between China and Serbia “The theory of tensors, operator matrices and applications (no. 4-5)”.

R EFERENCES [1] Wang, G. R., Wei, Y. and Qiao, S. Z. (2018). Generalized Inverses: Theory and Computations, Second edition, Developments in Mathematics, 53. Springer, Singapore; Science Press Beijing, Beijing. [2] Ben-Israel, A., Greville, T. N. E. (2003). Generalized inverses: Theory and Applications, Second Ed., Springer-Verlag, New York, Berlin, Heidelberg. [3] Wei, Y., Stanimirovi´c, P. S., Petkovi´c, M. D. (2018). Numerical and symbolic computations of generalized inverses, World Scientific Publishing

Computing Generalized Inverses Using Gradient-Based ...

39

Co. Pte. Ltd., Hackensack, NJ, DOI 10.1142/10950 (ISBN: 978-981-323866-4). [4] Katsikis, V. N., Pappas, D. and Petralias, A. (2011). An improved method for the computation of the Moore-Penrose inverse matrix, Appl. Math. Comput. 217: 9828–9834. [5] Stanimirovi´c, P. S., Pappas, D., Katsikis, V. N. and Stanimirovi´c, I. P. (2012). Full-rank representations of outer inverses based on the QR decomposition, Appl. Math. Comput. 218: 10321–10333. [6] Stanimirovi´c, P. S. and Tasi´c, M. B. (2008). Computing generalized inverses using LU factorization of matrix product, International Journal of Computer Mathematics 85: 1865–1878. [7] Guo, W. B. and Huang. T. Z. (2010). Method of elementary transformation to compute Moore-Penrose inverse, Appl. Math. Comput 216: 1614–1617. [8] Stanimirovi´c, P. S. and Petkovi´c, M. D. (2013). Gauss-Jordan elimination method for computing outer inverses, Appl. Math. Comput. 219: 4667– 4679. [9] Barnett, S. (1989). Leverrier’s algorithm: a new proof and extensions, SIAM J. Matrix Anal. Appl. 10: 551–556. [10] Faddeev D. K. and Faddeeva, V. N. (1963). Computational Methods of Linear Algebra, Freeman, San Francisko. [11] Decell, H. P. (1965). An application of the Cayley-Hamilton theorem to generalized matrix inversion, SIAM Rev. 7: 526–528. [12] Grevile, T. N. E. (1973). The Souriau-Frame algorithm and the Drazin pseudoinverse, Linear Algebra Appl. 6: 205–208. [13] Hartwig R. E. (1976). More on the Souriau-Frame algorithm and the Drazin inverse, SIAM J. Appl. Math. 31: 42–46. [14] Ji, J. (1994). An alternative limit expression of Drazin inverse and its applications, Appl. Math. Comput. 61: 151–156. [15] Chen, Y. L. (1995). Finite Algorithms for the (2)-Generalized Inverse (2) AT ,S , Linear and Multilinear Algebra 40: 61–68.

40

Predrag S. Stanimirovi´c and Yimin Wei

[16] Greville, T. N. E. (1960). Some applications of the pseudo-inverse of matrix, SIAM Rev. 3: 15–22. [17] Chen, Y. L. and Chen, X. (2000). Representation and approximation of the (2) outer inverse AT ,S of a matrix A, Linear Algebra Appl. 308, 85–107. [18] Djordjevi´c, D. S., Stanimirovi´c, P. S. and Wei, Y. (2004). The representation and approximation of outer generalized inverses, Acta Math. Hungar. 104: 1–26. [19] Li, W. G. and Li, Z. (2010). A family of iterative methods for computing the approximate inverse of a square matrix and inner inverse of a nonsquare matrix, Appl. Math. Comput. 215: 3433–3442. [20] Petkovi´c, M. D. and Stanimirovi´c, P. S. (2014). Two improvements of the iterative method for computing Moore-Penrose inverse based on Penrose equations, J. Comput. Appl. Math. 267: 61–71. [21] Stanimirovi´c, P. S. and Cvetkovi´c-Ili´c, D. S. (2008). Successive matrix squaring algorithm for computing outer inverses, Appl. Math. Comput. 203: 19–29. [22] Stanimirovi´c, P. S. and Soleymani, F. (2014). A class of numerical algorithms for computing outer inverses, J. Comput. Appl. Math. 263: 236– 245. [23] Wei, Y. and Wang, G. R. (1999). Approximate methods for the generalized (2) inverse AT ,S , J. Fudan Univ. (Natural Science) 38: 234–249. [24] Wei, Y. and Wu, H. (2003). The representation and approximation for the (2) generalized inverse AT ,S , Appl. Math. Comput. 135: 263–276. [25] Stanimirovi´c P. S., Wei, Y., Kolundˇzija, D., Sendra, J. R. and Sendra, J. (2019). An Application of Computer Algebra and Dynamical Systems, ´ c M. Droste, J.-E. Pin (eds.): CAI 2019. Lecture Chapter No 19 in: M. Ciri´ Notes in Computer Science, vol 11545. Springer, Cham., pp. 225-236. https://doi.org/10.1007/978-3-030-21363-3 19. [26] Fragulis, G., Mertzios, B. G. and Vardulakis, A. I. G. (1991). Computation of the inverse of a polynomial matrix and evaluation of its Laurent expansion, Int. J. Control 53: 431–443.

Computing Generalized Inverses Using Gradient-Based ...

41

[27] Jones, J., Karampetakis, N. P. and Pugh, A. C. (1998). The computation and application of the generalized inverse via Maple, J. Symbolic Computation 25: 99–124. [28] Karampetakis, N. P. (1997). Computation of the generalized inverse of a polynomial matrix and applications, Linear Algebra Appl. 252: 35–60. [29] Karampetakis, N. P. (1997). Generalized inverses of two-variable polynomial matrices and applications, Circuits Systems Signal Processing 16: 439–453. [30] Karampetakis, N. P. and Vologiannidis, S. (2003). DFT calculation of the generalized and Drazin inverse of a polynomial matrix, Appl. Math. Comput. 143: 501–521. [31] Krishnamurthy, E. V. (1985). Symbolic iterative algorithm for generalised inversion of rational polynomial matrices, J. Symbolic Computation 1: 271–281. [32] McNulty, S. K. and Kennedy, W. J. (1985). Error-free computation of a reflexive generalized inverse, Linear Algebra Appl. 67: 157–167. [33] Petkovi´c, M. D., Stanimirovi´c, P. S. and Tasi´c, M. B. (2008). Effective partitioning method for computing weighted Moore-Penrose inverse, Comput. Math. Appl. 55: 1720–1734. [34] Petkovi´c, M. D. and Stanimirovi´c, P. S. (2005). Symbolic computation of the Moore-Penrose inverse using partitioning method, Int. J. Comput. Math. 82: 355–367. [35] Rao, T. M., Subramanian, K. and Krishnamurthy, E. V. (1976). Residue arithmetic algorithms for exact computation of g-Inverses of matrices, SIAM J. Numer. Anal. 13: 155–171. [36] Sendra, J. R. and Sendra, J. (2016). Symbolic computation of Drazin inverses by specializations, J. Comput. Appl. Math. 301: 201–212. [37] Schmidt, K. (2009). On the Computation of the Moore-Penrose Inverse of Matrices with Symbolic Elements, In: Schipp B., Kr¨aer W. (eds) Statistical Inference, Econometric Analysis and Matrix Algebra. Physica-Verlag HD, 349–358.

42

Predrag S. Stanimirovi´c and Yimin Wei

[38] Schmidt, K. (2008). Computing the Moore-Penrose inverse of a matrix with a computer algebra system, International Journal of Mathematical Education in Science and Technology 39: 557–562. [39] Sendra, J. R. and Sendra, J. (2017). Computation of Moore-Penrose generalized inverses of matrices with meromorphic function entries, Appl. Math. Comput. 313: 355–366. [40] Stanimirovi´c, P. S., Pappas, D., Katsikis, V. N. and Stanimirovi´c, I. P. (2) (2012). Symbolic computation of AT ,S -inverses using QDR factorization, Linear Algebra Appl. 437: 1317–1331. [41] Stanimirovi´c, P. S. and Tasi´c, M. B. (2004). Partitioning method for rational and polynomial matrices, Appl. Math. Comput. 155: 137–163. [42] Stanimirovi´c, I. P. and Tasi´c, M. B. (2012). Computation of generalized inverses by using the LDL∗ decomposition, Appl. Math. Lett. 25: 526– 531. [43] Tasi´c, M. B., Stanimirovi´c, P. S. and Petkovi´c, M. D. (2007). Symbolic computation of weighted Moore-Penrose inverse using partitioning method, Appl. Math. Comput. 189: 615–640. [44] Yu, Y. and Wang, G. (2009). DFT calculation for the {2}-inverse of a polynomial matrix with prescribed image and kernel, Appl. Math. Comput. 215: 2741–2749. [45] Luo, F. L. and Bao, Z. (1992). Neural network approach to computing matrix inversion, Appl. Math. Comput. 47: 109–120. [46] Wang, J. (1993). Recurrent neural networks for solving linear matrix equations, Comput. Math. Appl. 26: 23–34. [47] Wang, J. (1993). A recurrent neural network for real-time matrix inversion, Appl. Math. Comput. 55: 89–100. [48] Wang, J. (1997). Recurrent neural networks for computing pseudoinverses of rank-deficient matrices, SIAM J. Sci. Comput. 18: 1479–1493. [49] Wei, Y. (2000). Recurrent neural networks for computing weighted Moore-Penrose inverse, Appl. Math. Comput. 116: 279–287.

Computing Generalized Inverses Using Gradient-Based ...

43

[50] Stanimirovi´c, P. S. and Petkovi´c, M. D. (2018). Gradient neural dynamics for solving matrix equations and their applications, Neurocomputing 306: 200–212. ´ c, M., Stojanovi´c, I. and Gerontitis, D. (2017). [51] Stanimirovi´c, P. S., Ciri´ Conditions for existence, representations and computation of matrix generalized inverses, Complexity, Volume 2017, Article ID 6429725, 27 pages, https://doi.org/10.1155/2017/6429725. [52] Stanimirovi´c, P. S. and Petkovi´c, M. D. (2019). Improved GNN models for constant matrix inversion, Neural Processing Letters, 50: 321-339. [53] Stanimirovi´c, P. S., Katsikis, V. N. and Li, S. (2018). Hybrid GNN-ZNN models for solving linear matrix equations, Neurocomputing 316: 124– 134. [54] Wang, X.-Z., Ma, H. and Stanimirovi´c, P. S. (2017). Recurrent neural network for computing the W-weighted Drazin inverse, Appl. Math. Comput. 300: 1–20. [55] Guo, D., Yi, C. and Zhang, Y. (2011). Zhang neural network versus gradient-based neural network for time-varying linear matrix equation solving, Neurocomputing 74: 3708–3712. [56] Stanimirovi´c, P. S., Petkovi´c, M. D. and Gerontitis, D. (2018). Gradient neural network with nonlinear activation for computing inner inverses and the Drazin inverse, Neural Processing Letters 48: 109–133. [57] Maher, P. J. (1990). Some operator inequalities concerning generalized inverses, Illinois J. Math. 34: 503–514. [58] Penrose, R. (1956). On a best approximate solutions to linear matrix equations, Proc. Cambridge Philos. Soc. 52: 17–19. [59] Groetsch, C. W. (1977). Generalized Inverses of Linear Operators: Representation and Approximation, Marcel Dekker, New York-Basel. [60] Zhang, Y., Shi, Y., Chen, K. and Wang, C. (2009). Global exponential convergence and stability of gradient-based neural network for online matrix inversion, Appl. Math. Comput. 215: 1301–1306.

44

Predrag S. Stanimirovi´c and Yimin Wei

[61] Wei, Y. (2003). The representation and approximation for the weighted Moore-Penrose inverse in Hilbert space, Appl. Math. Comput. 136: 475– 486. ˇ [62] Zivkovi´ c, I., Stanimirovi´c, P. S. and Wei, Y. (2016). Recurrent neural network for computing outer inverses, Neural Computation 28: 970–998. [63] Kailath, T. (1980). Linear Systems, Prentice-Hall, Englewood Cliffs, NJ. ˇ [64] Stanimirovi´c, P. S., Zivkovi´ c, I., Wei, Y. (2015). Recurrent neural network for computing the Drazin inverse, IEEE Transactions on Neural Networks and Learning Systems, 26: 2830–2843. [65] Helmke, U. and Moore, J. B. (1994). Optimization and Dynamical Systems, Springer-Verlag, London, 1994, 3rd printing. [66] Tavakkoli, V., Chedjou, J.C. and Kyamakya, K. 2019 A novel recurrent neural network-based ultra-fast, robust, and scalable solver for inverting a “time-varying matrix”, Sensors, 19, 4002, doi:10.3390/s19184002. [67] Lynch, S. (2017). Dynamical Systems with Applications Using Mathematicar, 2nd Edition, Birkhauser, Springer International Publishing, Boston. [68] Wolfram Research (2015). Inc., Mathematica, Version 10.0, Champaign, IL. (2)

[69] Wei, Y. (2003). Integral representation of the generalized inverse AT ,S and its applications, In Recent Research on Pure and Applied Algebra, Nova Science Publisher, New York, pp. 59–65. [70] Sheng, X. and Chen, G. (2007). Full-rank representation of generalized (2) inverse AT ,S and its applications, Comput. Math. Appl. 54: 1422–1430.

In: Hot Topics in Linear Algebra Editor: Ivan I. Kyrchei

ISBN: 978-1-53617-770-1 c 2020 Nova Science Publishers, Inc.

Chapter 2

C RAMER ’ S R ULES FOR S YLVESTER -T YPE M ATRIX E QUATIONS Ivan I. Kyrchei∗ Pidstryhach Institute for Applied Problems of Mechanics and Mathematics, NAS of Ukraine, Lviv, Ukraine

Abstract Because of the non-commutativity of the quaternion algebra, difficulties arise already in defining the quaternion determinant. There are several possible ways to define the determinant with noncommutative entries. But all previously introduced noncommutative determinants do not retain the same properties as the usual determinant for matrices with commutative entries. Moreover, if all functional properties of a determinant over a noncommutative ring are satisfied, then it takes on a value in its commutative subset only. This can be avoided thanks to the theory of row-column determinants introduced by the author. Generalized inverses are useful tools used to solve matrix equations. Using determinantal representations of the Moore-Penrose inverse previously obtained within the framework of the theory of noncommutative column-row determinants, we get explicit determinantal representation formulas of solutions (analogs of Cramer’s rule) to the quaternion generalized Sylvester matrix equation AXC + BYD = E. ∗

Corresponding Author’s Email: [email protected].

(0.1)

46

Ivan I. Kyrchei Cramer’s rules of general, Hermitian or η-Hermitian solutions, (η ∈ {i, j, k}), to the Sylvester-type matrix equations involving ∗-Hermicity or η-Hermicity, when, respectively, C = A∗ and D = B∗ , or C = Aη∗ and D = Bη∗ in Eq. (0.1), are derived in this chapter as well.

1.

INTRODUCTION

Let Hm×n and Hrm×n stand for the set of all m × n matrices and for its subset of matrices with rank r, respectively, over the quaternion skew field H = {a0 + a1 i + a2 j + a3 k | i2 = j2 = k2 = ijk = −1, a0 , a1 , a2 , a3 ∈ R}, where R is the real number field. For any quaternion a = a0 +a1 i+a2 j+a3 k ∈ H, its conjugate is a = a0 − a1 i − a2 j − a3 k ∈ H such that aa = aa = a20 + a21 + a22 + a23 and a = a. For any two quaternions a, b ∈ H, we have in general ab 6= ba and ab = b · a. For A ∈ Hm×n , the symbol A∗ stands for the (quaternion) conjugate transpose (Hermitian adjoint) of A. A matrix A ∈ Hn×n is Hermitian when A∗ = A. The Moore-Penrose inverse of A ∈ Hm×n is called the unique matrix X ∈ Hn×m satisfying the following four equations 1. AXA = A, 2. XAX = X, 3. (AX)∗ = AX, 4. (XA)∗ = XA. It is denoted by A†. A matrix A satisfying the conditions (i), (j), . . . is called an {i, j, . . .}inverse of A, and is denoted by A(i,j,...) . In particular, A(1) is the inner inverse, A(2) is the outer inverse, A(1,2) is the reflexive inverse, and A(1,2,3,4) is the Moore-Penrose inverse, etc. Another generalized inverses also have been introduced but we will not use them in this chapter. The two-sided generalized Sylvester matrix equation, AXB + CYD = E,

(1.1)

has been well-studied in matrix theory. Baksalary and Kala [1] derived the general solution to Eq. (1.1) expressed in terms of generalized inverses that has been extended to an arbitrary division ring and any regular ring with identity in [2,3]. Ranks and independence of solutions to Eq. (1.1) were explored in [4]. In [5], expressions, necessary and sufficient conditions were given for the existence of the real and pure imaginary solutions to the consistent quaternion

Cramer’s Rules for Sylvester-Type Matrix Equations

47

matrix equation (1.1). Liao et al. [6] established a direct method for computing its approximate solution using the generalized singular value decomposition and the canonical correlation decomposition. Efficient iterative algorithms were presented to solve a system of two generalized Sylvester matrix equations in [7] and to solve the minimum Frobenius norm residual problem for a system of Sylvester-type matrix equations over generalized reflexive matrices in [8]. The high research activities on quaternion Sylvester-type matrix equations can be observed lately. For instance, Huang [9] obtained necessary and sufficient conditions for the existence of solutions to Eq. (1.1) with X = Y over the quaternion skew field. Systems of periodic discrete-time coupled Sylvester quaternion matrix equations [10], systems of quaternary coupled Sylvester-type quaternion matrix equations [11], and optimal pole assignment of linear systems by the Sylvester matrix equations [12] have been explored. Some constraint generalized Sylvester matrix equations [13, 14] were studied recently. Special solutions to Sylvester-type quaternion matrix equations have been actively studied as well. Roth’s solvability criterias for some Sylvester-type matrix equations were extended over the quaternion skew field with a fixed involutive automorphism in [15]. S¸ims¸ek et al. [16] established the precise solutions to the minimum residual and matrix nearness problems of the quaternion matrix equation (AXB, DXE) = (C, F) for centrohermitian and skewcentrohermitian matrices. Explicit solutions to some Sylvester-type quaternion matrix equations (with j-conjugation) were established by means of Kronecker map and complex representation of a quaternion matrix in [17, 18]. The expressions of the least squares solutions to some Sylvester-type matrix equations over a non-split quaternion algebra [19] and hermitian solutions over a split quaternion algebra [20] were derived. Many authors have paid attention also to the Sylvester-type matrix equation involving ∗-Hermicity AXA∗ + BYB∗ = C. (1.2) Chang and Wang [21] derived expressions for the general symmetric solution and the general minimum-2-norm symmetric solution to Eq. (1.2) within the real settings. Xu et al. [22] given a representation of the least-squares Hermitian (skew-Hermitian) solution to Eq. (1.2). Zhang [23] obtained a representation of the general Hermitian nonnegative-definite (positive-definite) solution to Eq. (1.2) within the complex settings. Yuan et al. [24] derived the expression of Hermitian solution for the matrix nearness problem associated with the quaternion matrix equation (1.2). Wang et al. [25] gave a necessary and sufficient condition

48

Ivan I. Kyrchei

for the existence and an expression for the Re-nonnegative definite solution to Eq. (1.2) over H by using the decomposition of pairwise matrices. Wang et al. [26] established the extreme ranks for the general (skew-)Hermitian solution to Eq. (1.2) over H. Recently, important tools in researching of linear modeling and convergence analysis in statistical signal processing have been η-Hermitian matrices (see [27, 28]). Definition 1.1. [28–30] A matrix A ∈ Hn×n is known to be η-Hermitian and ηskew-Hermitian if A = Aη∗ = −ηA∗η and A = −Aη∗ = ηA∗ η, respectively, where η ∈ {i, j, k}. On broader aspects, a map φ : Hm×n −→ Hn×m , where φ(A) = Aη∗ , is an example of a nonstandard involution. The definition of the nonstandard involution φ on the quaternion skew field H were first presented by Rodman as follows. Definition 1.2. (Involution) [31] A map φ : H −→ H is called an involution if φ(xy) = φ(y)φ(x), φ(x + y) = φ(x) + φ(y), φ2 (x) = x for all x, y ∈ H. Moreover, φ has matrix representation as a real 4 × 4-matrix with respect to the basis {1, i, j, k}:   1 0 φ= , 0 T

where either T = −I3 , (I3 stands to the 3 × 3 unit matrix), then an involution be standard, or T is a 3 × 3 real orthogonal symmetric matrix with eigenvalues 1, 1, −1, then an involution be nonstandard. For A ∈ Hm×n , Aφ stands [31] for the n×m matrix obtained by applying φ entrywise to the transposed matrix AT . Some new properties of the nonstandard involution have been recently studied in [32–34]. By themselves, η-Hermitian matrices cause close attention of researchers. The singular value decomposition of the η-Hermitian matrix was examined in [29]. Very recently, Liu [35] determined η-skew-Hermitian solutions to some classical matrix equations and, among them, the generalized Sylvester-type matrix equation, AXAη∗ + BYBη∗ = C.

(1.3)

Motivated by the vast application of quaternion matrices and the latest interest of Sylvester-type quaternion matrix equations, the main goals of the

Cramer’s Rules for Sylvester-Type Matrix Equations

49

chapter are to derive explicit determinantal representation formulas (analogs of Cramer’s rule) of the general solution to Eq. (1.1), of the general and (skew-) Hermitian solutions to Eq. (1.2), and of the general and η-(skew-)Hermitian solutions to Eq. (1.3) based on determinantal representations of generalized inverses. The determinantal representation of the usual inverse is the matrix with cofactors in entries that suggests a direct method of finding of inverse and makes it applicable through Cramer’s rule to systems of linear equations. The same is desirable for the generalized inverses. But there is not so unambiguous even for generalized inverses with complex or real settings. Therefore, there are various determinantal representations of generalized inverses because of looking for their more applicable explicit expressions (for the Moore-Penrose inverse, see, e.g., [36–38]). By virtue of noncommutativity of quaternions, the problem for determinantal representation of generalized quaternion inverses is even more complicated, and only now it can be solved due to the theory of column-row determinants introduced in [39, 40]. Within the framework of the theory of row-column determinants, determinantal representations of various generalized quaternion inverses, namely, the Moore-Penrose inverse [41], the Drazin inverse [42], the W-weighted Drazin inverse [43], and the weighted Moore-Penrose inverse [44] have been derived by the author. These determinantal representations were used to obtain determinantal explicit representation formulas for the minimum norm least squares solutions in [45] and weighted Moore-Penrose inverse solutions in [46, 47] to some quaternion matrix equations, and for both Drazin and W-weighted Drazin inverse solutions to some restricted quaternion matrix equations and quaternion differential matrix equations in [48, 49]. Analogs of Cramer’s rule for the two-sided quaternion generalized Sylvester matrix equation (1.1) and quaternion Sylvester-type matrix equations, (1.2) and (1.3), having ∗- and η-hermicity, respectively, have been derived by the author in [50–52]. An interested reader can find determinantal representations of solutions to some systems of quaternion Sylvester-type matrix equations obtained by the author in the framework of the theory of row-column determinants as well in [53–57]. Other researchers also used the row-column determinants in their developments. In particular, Song derived determinantal representations of the generalized inverse A2T ,S [58] and the Bott-Duffin inverse [59]. Song et al. obtained the Cramer rules for the solutions of restricted matrix equations [60], and so forth. Moreover, Song et al. [61] have just recently considered determinantal repre-

50

Ivan I. Kyrchei

sentations of the general solution to the generalized Sylvester matrix equation (1.1) over the quaternion skew field H using row-column determinants as well. Their approach differs from our proposed because for determinantal representations of solutions to some quaternion matrix equation we use only coefficient matrices of the equation, while, in [61], supplementary matrices have been used but that have not be obtained always easy. This chapter is organized as follows. In Section 2, we start with some remarkable results which have significant role during the construction of the main results of the chapter. Elements of the theory of row-column determinants and features of the η-Hermitian and η-skew-Hermitian matrices are given in Subsection 1. Determinantal representations of the Moore-Penrose inverse of a quaternion matrix, its conjugate transpose and its η-conjugate transpose are considered in Subsection 2. In this subsection, we give Cramer’s rules to the quaternion matrix equation AXB = C and its special cases as well. In Section 3, the explicit determinantal representations of the general solution to Eq. (1.1) are obtained within the framework of the theory of row-column determinants. Also, we give an analog of Cramer’s rule of Eq. (1.1) with only the complex coefficient matrices that is the new representation of the general solution to the complex Sylvester equation (1.1) by using usual determinants. Algorithms of Cramer’s rules for Eq. (1.1) in both quaternion and complex cases are represented for better understanding of their applications. Analogs of Cramer’s rule for the general, Hermitian, skew-Hermitian solutions to Eq. (1.2) are derived in Section 4, and for the general, η-Hermitian, η-skew-Hermitian solutions to Eq. (1.3) are obtained in Section 5. In Section 6, a numerical example to illustrate the main results is considered.

2.

P RELIMINARIES

We commence with the following preliminaries which have crucial function in the construction of the chief outcomes of the following sections.

2.1.

Elements of the Theory of Row-Column Determinants

Due to noncommutativity of quaternions, a problem of defining a determinant of matrices with noncommutative entries (that is also defined as the noncommutative determinant) has been unsolved for a long time. There are several versions of the definition of the noncommutative determinant, in particular the determi-

Cramer’s Rules for Sylvester-Type Matrix Equations

51

nants of Dieudonn´e [62], Study [63], Moore [64], Chen [65], the quasideterminants of Gelfand-Retakh [66], etc. But any of the previous noncommutative determinants have not fully retained those properties which it has owned for matrices with commutative entries [67, 68]. Moreover, if functional properties of noncommutative determinant over a ring are satisfied, then it takes on a value in its commutative subset. This dilemma can be avoided thanks to the theory of row-column determinants. For A ∈ Hn×n we define n row determinants and n column determinants as follows. Suppose Sn is the symmetric group on the set In = {1, . . ., n}. Definition 2.1. [39] The i-th row determinant of A = (aij ) ∈ Hn×n is defined for all i = 1, . . . , n by putting rdeti A =

X

(−1)

n−r

(ai ik1 aik1 ik1 +1 . . . aik1 +l1 i ) . . . (aikr ikr +1 . . . aikr +lr ikr ),

σ∈Sn

σ = (i ik1 ik1 +1 . . . ik1 +l1 ) (ik2 ik2 +1 . . . ik2 +l2 ) . . . (ikr ikr +1 . . . ikr +lr ) ,

where σ is the left-ordered permutation. It means that its first cycle from the left starts with i, other cycles start from the left with the minimal of all the integers which are contained in it, ikt < ikt +s for all t = 2, . . ., r, s = 1, . . ., lt, and disjoint cycles (except for the first one) are ordered so that their first elements ik2 < ik3 < · · · < ikr strictly increase from left to right. Definition 2.2. [39] The j-th column determinant of A = (aij ) ∈ Hn×n is defined for all j = 1, . . . , n by putting cdetj A = X = (−1)n−r (ajkr jkr +lr . . . ajkr +1 jkr ) . . . (aj jk1 +l1 . . . ajk1 +1 jk1 ajk1 j ), τ ∈Sn

τ = (jkr +lr . . . jkr +1 jkr ) . . . (jk2 +l2 . . . jk2 +1 jk2 ) (jk1 +l1 . . . jk1 +1 jk1 j) ,

where τ is the right-ordered permutation. It means that its first cycle from the right starts with j, other cycles start from the right with the minimal of all the integers which are contained in it, jkt < jkt +s for all t = 2, . . ., r, s = 1, . . ., lt,

52

Ivan I. Kyrchei

and disjoint cycles (except for the first one) are ordered so that their first elements jk2 < jk3 < · · · < jkr strictly increase from right to left. The row and column determinants have the following linear properties. Lemma 2.3. [40] If the i-th row of A ∈ Hn×n is a left linear combination of other row vectors, i.e. ai. = α1 b1 + · · · + αk bk , where αl ∈ H and bl ∈ H1×n for all l = 1, . . ., k and i = 1, . . ., n, then X rdeti Ai. (α1 b1 + · · · + αk bk ) = αl rdeti Ai. (bl ) . l

Lemma 2.4. [40] If the j-th column of A ∈ Hn×n is a right linear combination of other column vectors, i.e. a.j = b1 α1 + · · · + bk αk , where αl ∈ H and bl ∈ Hn×1 for all l = 1, . . ., k and j = 1, . . . , n, then X cdetj A.j (b1 α1 + · · · + bk αk ) = cdetj A.j (bl ) αl . l

Lemma 2.5. [40]Let A ∈ Hn×n . Then cdeti A∗ = rdeti A, rdeti A∗ = cdeti A for all i = 1, . . ., n. Since by Definition 2.1 and 2.2 for A ∈ Hn×n rdeti Aη =rdeti (−ηAη) = −η(rdeti A)η, η

cdeti A =cdeti (−ηAη) = −η(cdeti A)η,

(2.1) (2.2)

η

n−1

η(rdeti A)η,

(2.3)

η

n−1

η(cdeti A)η

(2.4)

rdeti (−A ) =rdeti (ηAη) = (−1)

cdeti (−A ) =cdeti (ηAη) = (−1)

for all i = 1, . . . , n, then, due to Lemma 2.5, the next lemma follows immediately. Lemma 2.6. Let A ∈ Hn×n . Then rdeti Aη∗ = −η(cdeti A)η, cdeti Aη∗ = −η(rdeti A)η, rdeti (−Aη∗ ) = (−1)n−1 η(cdeti A)η, cdeti (−Aη∗ ) = (−1)n−1 η(rdeti A)η, for all i = 1, . . . , n

Cramer’s Rules for Sylvester-Type Matrix Equations

53

Remark 2.7. For any ηl ∈ {i, j, k}, where l = 1, 2, 3, and for an arbitrary quaternion, q = a0 + a1 η1 + a2 η2 + a3 η3 ∈ H, it’s conjugate can be expressed as q = a0 − a1 η1 − a2 η2 − a3 η3 , and q η1 := − η1 qη1 = a0 + a1 η1 − a2 η2 − a3 η3 , q −η1 :=η1 qη1 = −a0 − a1 η1 + a2 η2 + a3 η3 . It is well-known that for a Hermitian matrix, A = A∗ , the element in the i-th row and j-th column is equal to the conjugate of the element in the j-th row and i-th column, for all indices i and j. So, the elements of its principal diagonal are real.   For an η1 -Hermitian matrix A = Aη1 ∗ = aηij1 ∗ , the elements of the principal diagonal are expressed as aηii1 ∗ = a0 + a2 η2 + a3 η3 , and a pair of elements which are symmetric with respect to the principal diagonal can be represented as aηij1 ∗ = a0 + a1 η1 + a2 η2 + a3 η3 , aηji1 ∗ = a0 − a1 η1 + a2 η2 + a3 η3 . Similarly, elements  of the  principal diagonal of an η1 -skew-Hermitian matrix −η1 ∗ η1 ∗ A = −A = aij are 1∗ a−η = a1 η1 , ii

and a pair of elements which are symmetric with respect to the principal diagonal can be represented as 1∗ a−η = a0 + a1 η1 + a2 η2 + a3 η3 , ij 1∗ a−η = −a0 + a1 η1 − a2 η2 − a3 η3 . ji

where al ∈ R for all l = 0, . . . , 3. So, Lemma 2.8 gives the following features of η-Hermitian and η-skewHermitian matrices.

54

Ivan I. Kyrchei

Lemma 2.8. If A ∈ Hn×n is η-Hermitian, then rdeti A = −η(cdeti A)η, and cdeti A = −η(rdeti A)η. If A ∈ Hn×n is η-skew-Hermitian, then rdeti A = (−1)n−1 η(cdeti A)η, and cdeti A = (−1)n−1 η(rdeti A)η for all i = 1, . . . , n. Remark 2.9. Since [40] for Hermitian A we have rdet1 A = · · · = rdetn A = cdet1 A = · · · = cdetn A ∈ R, the determinant of a Hermitian matrix can be defined by putting det A := rdeti A = cdeti A for all i = 1, . . . , n. Properties of the determinant of a Hermitian matrix are similar to the properties of an usual (commutative) determinant and they have been completely explored by row-column determinants in [40]. In particular, it is proved [40] that det(A∗ A) = 0 iff some column of A ∈ Hm×n is a right linear combination of others, or some row of A∗ is a left linear combination of others. From this follows the definition of the determinantal rank of a quaternion matrix A as the largest possible size of a nonzero principal minor of its corresponding Hermitian matrix A∗ A. It is shown that the row rank of a quaternion matrix A ∈ Hm×n (that is a number of its left-linearly independent rows), the column rank (that is a number of its right-linearly independent columns) and its determinantal rank are equivalent to rank (A∗ A) = rank (AA∗).

2.2.

Determinantal Representations of the Moore-Penrose Inverses with Applications to the Twos-Sided Matrix Equation

The following notation will be used for determinantal representations of the Moore-Penrose inverse. Let α := {α1 , . . . , αk } ⊆ {1, . . . , m} and β := {β1 , . . . , βk } ⊆ {1, . . . , n} be subsets with 1 ≤ k ≤ min {m, n}. By Aαβ denote a submatrix of A ∈ Hm×n whose rows and columns are indexed by α and β, respectively. Similarly, suppose Aαα is a principal submatrix of A whose

Cramer’s Rules for Sylvester-Type Matrix Equations

55

rows and columns are indexed by α. If A is Hermitian, then |A|αα denotes the corresponding principal minor of det A. The collection of strictly increasing sequences of k integers chosen from {1, . . ., n} is denoted by

Lk,n := {α : α = (α1 , . . . , αk ) , 1 ≤ α1 < · · · < αk ≤ n} for 1 ≤ k ≤ n. For fixed i ∈ α and j ∈ β, let Ir,m {i} := {α : α ∈ Lr,m , i ∈ α}, Jr,n {j} := {β : β ∈ Lr,n , j ∈ β}. Let a.j and a∗.j be the j-th columns, ai. and a∗i. be the i-th rows of the matrices A and A∗ , respectively. Suppose A.j (b) denotes the matrix obtained from A by replacing its j-th column with the column b, and Ai. (b) denotes the matrix obtained from A by replacing its i-th row with the row b. Theorem 2.10. [41] If A ∈ Hrm×n , then the Moore-Penrose inverse A† =   † aij ∈ Hn×m have the following determinantal representations a†ij =

=

P

β∈Jr,n {i}

P

  β cdeti (A∗ A).i a∗.j β

P

β∈Jr,n

α∈Ir,m {j}

|A∗ A|ββ

=

(2.5)

rdetj ((AA∗)j. (a∗i. ))αα P

α∈Ir,m

.

|AA∗|αα

(2.6)

Remark 2.11. Note that for an arbitrary full-rank matrix A ∈ Hrm×n , a rowvector b ∈ H1×m , and a column-vector c ∈ Hn×1 we put, respectively, • if r = m, then for all i = 1, . . ., m X rdeti ((AA∗ )i. (b)) =

rdeti ((AA∗)i. (b))αα ,

α∈Im,m {i}



det (AA ) =

X

α∈Im,m

|AA∗|αα ;

56

Ivan I. Kyrchei • if r = n, then for all j = 1, . . . , n   cdetj (A∗ A).j (c) = det (A∗ A) =

X

β∈Jn,n {j}

X

 β cdetj (A∗ A).j (c) , β

β

|A∗ A|β .

β∈Jn,n

Corollary 2.1. If A ∈ Hrm×n , then the Moore-Penrose inverse (Aη )† =   aη† ∈ Hn×m have the following determinantal representations, ij aη† ij = −η

P

β∈Jr,n {i}

  β cdeti (A∗ A).i a∗.j β

P

β∈Jr,n

|A∗ A|ββ

η

(2.7)

η.

(2.8)

and aη† ij = −η

P

α∈Ir,m {j}

rdetj ((AA∗)j. (a∗i. ))αα P

α∈Ir,m

|AA∗ |αα

Proof. First, we claim that rank A = rank Aη by determinantal rank as well. Really, since ((Aη )∗ Aη ) is Hermitian, and due to (2.2), we have det ((Aη )∗ Aη ) = det (−ηA∗ η(−η)Aη) = det (−ηA∗Aη) = cdeti (−ηA∗ Aη) = −ηcdeti (A∗ A) η = −η det (A∗ A) η = det (A∗ A) .

(2.9)

So, the (determinantal) ranks of ((Aη )∗ Aη ) and (A∗ A) are the same. Then by (2.2) and (2.9), for the Moore-Penrose inverse (Aη )† , we obtain

Cramer’s Rules for Sylvester-Type Matrix Equations

aη† ij =

P

β∈Jr,n {i}

  β cdeti (Aη∗ Aη ).i aη∗ .j β

P

β∈Jr,n

=

P

β∈Jr,n {i}

|Aη∗ Aη∗ |ββ

=

  β cdeti (−ηA∗ Aη).i −η(a∗.j )η β

P

β∈Jr,n

=−η

57

P

β∈Jr,n{i}

|A∗ A|ββ

  β cdeti (A∗ A).i a∗.j

=

β

P

|A∗ A|ββ

β∈Jr,n

η,

So, we prove (2.7). The determinantal representation (2.8) can be proven similarly. Remark 2.12. First note that (A∗ )† = (A†)∗ . Because of symbol equivalence, we shall use the denotation A†,∗ := (A∗)† as well. So, by Lemma 2.5, for the Hermitian adjoint matrix A∗ ∈ Hrn×m  determinantal representations of its ∗ † ∗ † Moore-Penrose inverse (A ) = (aij ) ∈ Hm×n are (a∗ij )† = (aji )† =

=

P

α∈Ir,n {j}

P

 α rdetj (A∗ A)j. (ai . ) α

P

α∈Ir,n

β∈Jr,m {i}

|A∗ A|αα

=

(2.10)

cdeti ((AA∗ ).i(a.j ))ββ P

β∈Jr,m

|AA∗ |ββ

.

(2.11)

Remark 2.13. Suppose A ∈ Hrn×n . Denote the (ij)-th elements of Aη∗ and −η∗ −Aη∗ by aη∗ ij and aij , respectively. By Lemma 2.6 and Remark 2.12, for the η-Hermitian adjoint matrix Aη∗ ∈ Hn×m and η-skew-Hermitian adjoint matrix −Aη∗ ∈ Hn×m determinantal representations of their Moore-Penrose inverses

58

Ivan I. Kyrchei     † ∈ Hm×n and (−Aη∗ )† = (a−η∗ )† are respectively (Aη∗ )† = (aη∗ ) ij ij η∗

(aij )† = −η(aji )†η =

= −η

= −η

P

α∈Ir,n {j}

 α rdetj (A∗ A)j. (ai . ) α

P

β∈Ir,n

P

P

β∈Jr,m



P

α∈Ir,n {j}

P

η

(2.12)

η,

(2.13)

cdeti ((AA∗).i (a.j ))ββ

β∈Jr,m {i}

−η∗ † (aij ) = η(aji)† η = η

|A∗ A|αα

 α rdetj (A∗ A)j. (ai . ) α

P

|A∗ A|αα

P

|AA∗ |ββ

β∈Ir,n

β∈Jr,m {i}

β

|AA∗ |β

η=

cdeti ((AA∗).i (a.j ))ββ

β∈Jr,m

η.

Since the projection matrices A†A =: QA = (qij ) and AA† =: PA = (pij ) are Hermitian, then qij = qji and pij = pji for all i 6= j. So, due to Theorem 2.10 and Remark 2.12 we have evidently the following corollaries. Corollary 2.2. If A ∈ Hrm×n , then the projection matrix QA = (qij )n×n have the determinantal representations  α P P cdeti ((A∗ A).i (a˙ .j ))ββ rdetj (A∗ A)j. (a˙ i. ) qij =

β∈Jr,n {i}

P

β∈Jr,n

|A∗ A|ββ

=

α∈Ir,n {j}

α

P

α∈Ir,n

|A∗ A|αα

,

(2.14) where a˙ .j and a˙ i. are the j-th column and i-th row of A∗ A ∈ Hn×n , respectively. Corollary 2.3. If A ∈ Hrm×n , then the projection matrix AA† =: PA = (pij )m×m has the determinantal representation

59

Cramer’s Rules for Sylvester-Type Matrix Equations

pij =

P

α∈Ir,m {j}

rdetj ((AA∗ )j.(¨ ai. ))αα P

α∈Ir,m

=

|AA∗ |αα

P

β

β∈Jr,m {i}

cdeti ((AA∗).i (¨ a.j ))β P

α∈Jr,m

,

|AA∗ |ββ ∗

m×m

where ¨ ai. and a ¨j. are the i-th row and the j-th column of AA ∈ H

(2.15) .

Determinantal representations of orthogonal projectors LA := I−A† A and RA := I − AA† induced from A can be derived similarly. Theorem 2.14. [2] Let A ∈ Hm×n , B ∈ Hr×s , C ∈ Hm×s be known and X ∈ Hn×r be unknown. Then the matrix equation AXB = C

(2.16)

is consistent if and only if AA†CB† B = C. In this case, its general solution can be expressed as X = A† CB† + LA V + WRB , where V, W are arbitrary matrices over H with allowable dimensions. . Then the partial solution , B ∈ Hrr×s Theorem 2.15. [42] Let A ∈ Hrm×n 1 2 † † n×r X = A CB = (xij ) ∈ H to (2.16 ) have determinantal representations,

xij =

P

β∈Jr1 ,n {i}

P

β∈Jr1 ,n

where



 dB .j =

cdeti (A∗ A).i dB .j

β |A∗ A|β

X

α∈Ir2 ,r {j}



 dA i. =

X

β∈Jr1 ,n {i}

P

α∈Ir2 ,r

β

α |BB∗ |α

β

=

P

α∈Ir2 ,r {j}

P

β∈Jr1 ,n

 α rdetj (BB∗ )j. dA i.

β |A∗ A|β

α

P

α∈Ir2 ,r

α |BB∗ |α

,

  α rdetj (BB∗ )j. (˜ck. )  ∈ Hn×1 , k = 1, . . ., n, α



cdeti ((A∗ A).i (˜ c.l ))ββ  ∈ H1×r , l = 1, . . . , r,

are the column vector and the row vector, respectively. ˜ci. and ˜c.j are the i-th e = A∗ CB∗ . row and the j-th column of C

60

Ivan I. Kyrchei

Corollary 2.4. Let A ∈ Hkm×n , C ∈ Hm×s be known and X ∈ Hn×s be unknown. Then the matrix equation AX = C is consistent if and only if AA†C = C. In this case, its general solution can be expressed as X = A†C + LA V, where V is an arbitrary matrix over H with an allowable dimension. The partial solution X = A†C has the following determinantal representation, P cdeti ((A∗ A).i (ˆ c.j ))ββ xij =

β∈Jk,n {i}

P

β∈Jk,n

,

|A∗ A|ββ

ˆ = A∗ C. where ˆc.j is the j-th column of C Corollary 2.5. Let B ∈ Hkr×s , C ∈ Hn×s be given, and X ∈ Hn×r be unknown. Then the equation XB = C is solvable if and only if C = CB† B and its general solution is X = CB† + WRB , where W is a any matrix with an allowable dimension. Moreover, its partial solution X = CB† has the determinantal representation,  α P ∗ rdetj (BB )j. (ˆ ci. ) xij =

α∈Ik,r {j}

α

P

α∈Ik,r

|BB∗ |αα

,

ˆ = CB∗ . where ˆci. is the i-th row of C

3.

D ETERMINANTAL R EPRESENTATIONS OF THE G ENERAL SOLUTION TO THE QUATERNION SYLVESTER MATRIX E QUATION (1.1)

Lemma 3.1. [2] Let A ∈ Hm×n , B ∈ Hr×s , C ∈ Hm×p , D ∈ Hq×s , E ∈ Hm×s . Put M = RA C, N = DLB , S = CLM . Then the following results are equivalent. (i) Eq. (1.1) has a pair solution (X, Y), where X ∈ Hn×r , Y ∈ Hp×q . (ii) RM RA E = 0, RA ELD = 0, ELD LN = 0, RC ELB = 0. (iii) PM RA EQD = RA E, PC ELB QN = ELB .

Cramer’s Rules for Sylvester-Type Matrix Equations

61

(iv) rank [A C E] = rank [A C], rank [B∗ D∗ E∗ ] = rank [B∗ D∗ ],         A E A 0 C E C 0 rank = rank , rank = rank . 0 D 0 D 0 B 0 B In that case, the general solution to (1.1) can be expressed as X =A† EB† − A†CM†RA EB† − A†SC†ELB N† DB† − A† SVRN DB† + LA U + ZRB , †







(3.1) †





Y =M RA ED + LM S SC ELB N + LM (V − S SVNN ) + WRD , (3.2) where U, V, Z and W are arbitrary matrices over H obeying agreeable dimensions. Lemma 3.2. [69] If A ∈ Hn×n is Hermitian and idempotent, then for any matrix B ∈ Hm×n the following equations hold A(BA)† = (BA)†, (AB)†A = (AB)†. By Lemma 3.2, we have M†RA =(RA C)†RA = M†, LB N† = LB (DLB )† = N† , LM S† =LM (CLM )† = S† .

(3.3)

Using (3.3), the simplifications of (3.1) and (3.2) follow as X =A†EB† − A†CM†EB† − A† SC†EN† DB† − A†SVRN DB† + LA U + ZRB , Y =M†ED† + QS C†EN† + LM (V − QS VPN ) + WRD . By putting U, V, Z, and W as zero-matrices, we obtain the partial solution to (1.1), X = A†EB† − A†CM†EB† − A†SC† EN† DB† , †







Y = M ED + QS C EN . The following theorem gives determinantal representations of (3.4)-(3.5).

(3.4) (3.5)

62

Ivan I. Kyrchei

Theorem 3.3. Let A ∈ Hrm×n , B ∈ Hrr×s , C ∈ Hrm×p , D ∈ Hrq×s 3 4 , rank M = 1 2 r5 , rank N = r6 , rank S = r7 . Then the pair solution (3.4)-(3.5), X = (xij ) ∈ Hn×r , Y = (ygf ) ∈ Hp×q , to Eq. (1.1) by the components (1)

(2)

(3)

(1)

(2)

xij = xij − xij − xij , ygf = ygf + ygf , have the following determinantal representations. (i)

(1)

xij =

P

β∈Jr1 ,n {i}

P

=

α∈Ir2 ,r {j}

P

β∈Jr1 ,n

β

|A∗ A|ββ

β∈Jr1 ,n

P

  β cdeti (A∗ A).i dB .j P

α∈Ir2 ,r

|BB∗ |αα

 α rdetj (BB∗ )j. dA i.

=

(3.6)

α

|A∗ A|ββ

P

α∈Ir2 ,r

|BB∗ |αα

,

(3.7)

where 

X

 dB .j =

α∈Ir2 ,r {j}



 dA i. =

X

   α (1)  ∈ Hn×1 , k = 1, . . ., n, (3.8) rdetj (BB∗ )j. ek. α

cdeti

β∈Jr1 ,n {i}



  β (1)  ∈ H1×r , l = 1, . . . , r, (3.9) (A∗ A).i e. l β

(1)

(1)

are the column vector and the row vector, respectively. ek. and e.l are the k-th row and the l-th column of E1 := A∗ EB∗ . (ii)  α P rdetj (BB∗ )j. (φei. (2)

xij =

P

β∈Jr1 ,n

α∈Ir2 ,r {j}

|A∗ A|ββ

α

P

β∈Ir5 ,m

|MM∗ |αα

P

α∈Ir2 ,r

|BB∗ |αα

,

(3.10)

63

Cramer’s Rules for Sylvester-Type Matrix Equations

ei. is i-th row of Φ e := ΦEB∗ . The matrix Φ = (φik ) is such that where φ X β φik = cdeti (A∗ A). i ϕM (3.11) .k β = β∈Jr1 ,n {i}

X

=

rdetk (MM∗ )k. ϕA i.

α∈Ir5 ,m {k}

where



 ϕM .k = 

 ϕA i. = (1)

X

α∈Ir5 ,m {k}

X

β∈Jr1 ,n {i}

α α

,

(3.12)

   α (1)  ∈ Hn×1 , t = 1, . . ., n, rdetk (MM∗)k. ct . α

(3.13)



  β (1)  ∈ H1×m , l = 1, . . ., m. cdeti (A∗ A).i c. l β

(3.14)

(1)

Here ct . and c. l are t-th row and l-th column of C1 := A∗ CM∗ , respectively. (iii) P cdeti ((A∗ A).i (e υ.j ))ββ (3)

xij =

P

β∈Jr1 ,n {i}

β∈Jr1 ,n

|A∗ A|ββ

P

β∈Jr3 ,p

|C∗ C|ββ

P

α∈Jr6 ,s

|N∗ N|ββ

P

α∈Ir2 ,r

|BB∗ |αα

,

(3.15)

e = A∗ SΥ. The matrix Υ = (υtj ) ∈ Hp×n is such where υe.j is j-th column of Υ that X β υtj = cdett ((C∗ C).t (e e.j ))β , (3.16) β∈Jr3 ,p {t}

e = C∗ EΨ, and the matrix Ψ = (ψvj ) ∈ Hs×n is where e e.j is j-th column of E such that ψvj =

X

α∈Ir2 ,r {j}

=

X

β∈Jr6 ,s {f }

 α N rdetj (BB∗ )j. ζv. =

(3.17)

cdetv (N∗ N).v ζ.jB

(3.18)

α

β β

,

64

Ivan I. Kyrchei

where 

X

N ζv. =

β∈Jr6 ,s {v}



X

ζ.jB = 

α∈Ir2 ,r {j}

(1)

(1)

   β (1)  ∈ H1×r , k = 1, . . ., r, cdetv (N∗ N).v d.k β



(3.19)

  α (1)  ∈ Hs×1 , l = 1, . . . , s. (3.20) rdetj (BB∗ )j. dl. α

Here d.k and dl. are k-th column and l-th row of the matrix D1 = N∗ DB∗ , respectively. (iv)

(1)

ygf =

P

β∈Jr5 ,p {g}

P

β∈Jr5 ,p

=

P

α∈Ir4 ,q {f }

P

β∈Jr5 ,p

  β cdetg (M∗ M).g dD .f β

|M∗ M|ββ

P

α∈Ir4 ,q

|DD∗ |αα

 α rdetf (DD∗ )f. dM g.

|M∗ M|ββ

=

(3.21)

α

P

α∈Ir4 ,q

|DD∗ |αα

,

(3.22)

where 

 dD .f = 

 dM g. =

X

   α (4)  ∈ Hp×1 , k = 1, . . . , p, rdetf (DD∗ )f. ek.

X

   β (4)  ∈ H1×q , l = 1, . . ., q. cdetg (M∗ M).g e.l

α∈Ir4 ,q {f }

β∈Jr5 ,p {g}

(4)

(4)

α

(3.23)

β

Here ek. and e.l are k-th row and l-th column of E4 := M∗ ED∗ .

(3.24)

65

Cramer’s Rules for Sylvester-Type Matrix Equations (v) (2)

ygf =

P

P

β∈Jr7 ,p {g}

β∈Jr7 ,p

|S∗ S|ββ

 β cdetg (S∗ S).g (e ω.f ) β

P

β∈Jr3 ,p

|C∗ C|ββ

P

α∈Ir6 ,q

|NN∗ |αα

,

(3.25)

e := S∗ SΩ. The matrix Ω = (ωtf ) is such that where ω e.f is f -th row of Ω  X α ωtf = rdetf (NN∗ )f. ξt.C = (3.26) α

α∈Ir6 ,q {t}

X

=

N cdett (C∗ C).t ξ.f

β∈Jr3 ,p {t}

where



ξt.C = 

X

β∈Jr3 ,p {t}



N ξ.f =

X

α∈Ir6 ,q {j}

(5)

(5)

β β

,

(3.27)

   β (5)  ∈ H1×q , k = 1, . . ., q, (3.28) cdett (C∗ C).t e.k β

   α (5)  ∈ Hs×1 , l = 1, . . . , p. (3.29) rdetf (NN∗ )f. el. α

Here e.k and el. are k-th column of l-th row of E5 = C∗ EN∗ , respectively.

Proof. (i) It’s evident that the equations (3.6)-(3.7) follow from Theorem  2.15.  (2) † † † (ii) Consider the second term of (3.4), A CM EB := X2 = xij . Using Theorem 2.15 for A†CM† and Corollary 2.5 for EB† , we obtain   α Pp P (2) ∗) φ rdet (BB e q. iq j j. q=1 (2)

xij =

β∈Jr1 ,n (2)

α

α∈Ir2 ,r {j}

P

|A∗ A|ββ

P

β∈Ir5 ,m

|MM∗ |αα

P

α∈Ir2 ,r

|BB∗ |αα

where eq. is q-th row of E2 := EB∗ ; and by Theorem 2.15 X β φiq := cdeti (A∗ A). i ϕM = .q β β∈Jr1 ,n {i}

=

X

α∈Ir5 ,m {q}

 α rdetq (MM∗ )q. ϕA , i. α

,

66

Ivan I. Kyrchei

where 

 ϕM .q =

X

α∈Ir5 ,m {q}



 ϕA i. =

X

β∈Jr1 ,n {i}

(1)

(1)

   α (1)  ∈ Hn×1 , f = 1, . . ., n, rdetq (MM∗ )q. cf. α



  β  ∈ H1×m , s = 1, . . ., m. cdeti (A∗ A).i c(1) .s β

Here cf. and c.s are f -th row and s-th column of C1 := A∗ CM∗ . e = ΦEB∗ . Taking into Now, construct the matrix Φ = (φiq ) and find Φ P (2) account q φiq eq. = φei. , it follows (3.10).   (3) (iii) For the third term of (3.4), A†SC† EN† DB† := X3 = xij , we first use Theorem 2.15 for N†DB† =: Ψ. So, ψf j :=

X

α∈Ir2 ,r {t}

=

X

 α N = rdetj (BB∗ )j. ζf.

β∈Jr6 ,s {f }

α

 β cdetf (N∗ N).f ζ.jB , β

(3.30) (3.31)

where 

X

N ζf. =

β∈Jr6 ,s {f }

ζ.jB = 

α∈Ir2 ,r {j}



(1)

X

(1)

   β (1)  ∈ H1×r , k = 1, . . ., r, cdetf (N∗ N).f d.k β



  α (1)  ∈ Hs×1 , l = 1, . . . , s. rdetj (BB∗ )j. dl. α

Here d.k and dl. are k-th column and l-th row of D1 = N∗ DB∗ , respectively. Using (3.30) or (3.31), we construct the matrix Ψ = (ψf j ) ∈ Hs×n and find E3 = EΨ. For determinantal representation of the Moore-Penrose C†, it can

67

Cramer’s Rules for Sylvester-Type Matrix Equations be used (2.5). Then υtj :=

p X

X

(3)

cdett ((C∗ C).t (c.l ))ββ · elj =

l=1 β∈Jr3 ,p {t}

X

=

cdett ((C∗ C).t (e e.j ))ββ ,

(3.32)

β∈Jr3 ,p {t} (3) e = where elj is (lj)-th entry of the matrix E3 and e e.j is the j-th column of E ∗ p×n C EΨ. Using (3.32), we obtain the matrix Υ = (υtj ) ∈ H and find S1 = † SΥ. For determinantal representation of A , it can be used (2.5). Then

Pm

f =1

(3) xij

= P

β∈Jr1 ,n

P

β∈Jr1 ,n {i}

|A∗ A|ββ

P

β∈Jr3 ,p

  β (1) cdeti (A∗ A).i a∗.f · sf j β

|C∗ C|ββ

P

α∈Jr6 ,s

|N∗ N|ββ

P

α∈Ir2 ,r

|BB∗ |αα

.

P (1) ∗ e = Taking into account m e.j , where υe.j is j-th column of Υ f =1 a.f · sf j = υ ∗ A SΥ, it follows (3.15). Now, we consider each term of (3.5). (iv) Determinantal representations (3.21) and (3.22) for the first term Y1 = M†ED† evidently follow from Theorem 2.15. (v) Consider the second term Y2 = QS C† EN† of (3.4). Applying Theorem 2.15 for C† EN† , we have  X X  α N β = cdett (C∗ C).t ξ.f , ωtf = rdetf (NN∗ )f. ξt.C β α

α∈Ir6 ,q {t}

β∈Jr3 ,p {t}

where 

ξt.C =  

N ξ.f =

X

β∈Jr3 ,p {t}

X

α∈Ir6 ,q {j}

   β (5)  ∈ H1×q , k = 1, . . . , q, cdett (C∗ C).t e.k β

   α (5)  ∈ Hs×1 , l = 1, . . ., p, rdetf (NN∗ )f. el. α

68

Ivan I. Kyrchei

(5)

(5)

e.k and el. are k-th column and l-th row of E5 = C∗ EN∗ . Construct the matrix Ω = (ωtf ). By (2.14), we use the determinantal representations of the projection matrix QS , then Pp

t=1

(2) ygf

=

P

β∈Jr7 ,p

From the equality (3.25).

Pp

P

β∈Jr7 ,p {g}

|S∗ S|ββ

˙ .t ωtf t=1 s

 β cdetg (S∗ S).g (˙s.t) · ωtf β

P

β∈Jr3 ,p

|C∗ C|ββ

P

α∈Ir6 ,q

|NN∗ |αα

.

e := S∗ SΩ, it follows =ω e.f that is f -th row of Ω

Algorithm 1. We give algorithms obtaining determinantal representations for each components. (1)

(i) For the components xij . ∗ EB∗ , A∗ A, BB∗ , and the values 1. Compute the matrices P P E1 =∗ A β α ∗ |A A|β , |BB |α . α∈Ir2 ,r

β∈Jr1 ,n

2. Compute the column-vectors dB . j by (3.8) for all j = 1, . . . , r, or the A row-vectors di . by (3.9) for all i = 1, . . ., n. (1)

3. Finally, find xij by (3.6) or (3.7) according to the vectors from the above point. (2)

(ii) For the components xij . 1. Compute the matrices C1 = A∗ CM∗ , A∗ A, MM∗ , BB∗ , and the P P P values |A∗ A|ββ , |MM∗ |αα , |BB∗ |αα . β∈Jr1 ,n

2.

β∈Ir5 ,m

α∈Ir2 ,r

Compute the column-vectors ϕM .k by (3.13) for all A the row-vectors ϕi. by (3.14) for all i = 1, . . ., n.

k = 1, . . . , m, or

3. Compute the matrix Φ = (φik ) by (3.11) when the column-vectors ϕM .k are found in the above point or by (3.12) when the row-vectors ϕA i. are found. e = ΦEB∗ . 4. Compute the matrix Φ (2)

5. Finally, find xij by (3.10).

Cramer’s Rules for Sylvester-Type Matrix Equations

69

(3)

(iii) For the components xij . 1. Compute the matrices D1 = N∗ DB∗ , A∗ A, C∗ C, N∗ N, BB∗ , P P P and the values |A∗ A|ββ , |C∗ C|ββ , |N∗ N|ββ , α∈Jr6 ,s β∈Jr1 ,n β∈Jr3 ,p P |BB∗ |αα. α∈Ir2 ,r

N 2. Compute the row-vectors ζv. by (3.19) for all v = 1, . . ., s, or the B column-vectors ζ.j by (3.20) for all j = 1, . . ., r.

N 3. Compute the matrix Ψ = (ψf j ) by (3.17) when the row-vectors ζf. are found in the above point or by (3.18) when the column-vectors ζ.jB are found.

e = C∗ EΦ. 4. Compute the matrix E

e = A∗ SΥ. 5. By (3.16), compute the matrix Υ = (υtj ), then Υ (3)

6. Find xij by (3.15). (1)

(iv) For the components ygf . 1. Compute the matrices E4 = M∗ ED∗ , M∗ M, DD∗ , and the values P P |DD∗ |αα . |M∗ M|ββ , α∈Ir4 ,q

β∈Jr5 ,p

2. Compute the column-vectors dD . f by (3.23) for all f = 1, . . . , q, or M the row-vectors dg . by (3.24) for all g = 1, . . ., p. (1)

3. Finally, find ygf by (3.21) or (3.22), respectively to the vectors from the above point. (2)

(v) For the components ygf . ∗ ∗ ∗ ∗ ∗ , and the 1. ComputeP the matrices EP 5 = C EN , S PS, C C,∗ NN β β α ∗ ∗ |C C|β , |NN |α . |S S|β , values β∈Jr7 ,p

β∈Jr3 ,p

α∈Ir6 ,q

ξt.C

2. Compute the row-vectors by (3.28) for all t = 1, . . ., p, or the N column-vectors ξ.f by (3.29) for all f = 1, . . ., q.

3. Compute the matrix Ω = (ωtf ) by (3.26) when the row-vectors ξt.C are found in the above point or by (3.27) when the column-vectors N are found. ξ.f

70

Ivan I. Kyrchei e = S∗ SΩ. 4. Compute the matrix Ω (2)

5. Finally, find ygf by (3.25).

Remark 3.4. If in the generalized Sylvester matrix equation (1.1) all coefficient matrices are complex, then determinantal representations of its general solution can be evidently obtain from Theorem 3.3 by changing all row and column determinants into usual determinants. But for a better understanding, we give an analog of Cramer’s rule for the complex Sylvester matrix equation in the following theorem without proof. Theorem 3.5. Let A ∈ Crm×n , B ∈ Crr×s , C ∈ Crm×p , D ∈ Crq×s 3 4 , rank M = 1 2 r5 , rank N = r6 , rank S = r7 . Then the pair solution (3.4)-(3.5), X = (xij ) ∈ Cn×r , Y = (ygf ) ∈ Cp×q , to Eq. (1.1) by the components (1)

(2)

(3)

(1)

(2)

xij = xij − xij − xij , ygf = ygf + ygf , have the following determinantal representations. (i)   β P ∗ (A A).i dB .j (1)

β∈Jr1 ,n

|A∗ A|ββ

P

α∈Ir2 ,r {j}

= P

β∈Jr1 ,n

where 

X

 dB .j =

α∈Ir2 ,r {j}

 dA i. =

β∈Jr1 ,n {i}



X

β

β∈Jr1 ,n {i}

xij = P

P

α∈Ir2 ,r

=

|BB∗ |αα

 α (BB∗ )j. dA i.

(3.33)

α

|A∗ A|ββ

P

α∈Ir2 ,r

|BB∗ |αα

,

(3.34)

   α (1) (BB∗ )j. ek.  ∈ Cn×1 , k = 1, . . ., n, α

   β ∗ (1) (A A).i e. l  ∈ C1×r , l = 1, . . . , r, β

(1)

(1)

(3.35)

(3.36)

are the column vector and the row vector, respectively. ek. and e.l are the k-th row and the l-th column of E1 := A∗ EB∗ .

71

Cramer’s Rules for Sylvester-Type Matrix Equations (ii)

(2)

xij =

P

α∈Ir2 ,r {j}

P

β∈Jr1 ,n

|A∗ A|ββ

P

α (BB∗ )j. (φei. α

β∈Ir5 ,m

|MM∗ |αα

P

α∈Ir2 ,r

|BB∗ |αα

,

(3.37)

ei. is i-th row of Φ e := ΦEB∗ . The matrix Φ = (φik ) is such that where φ φik =

X

β∈Jr1 ,n {i}

=

X

∗  (A A) ϕM β = .k β .i

α∈Ir5 ,m {k}

where 

 ϕM .k =

X

α∈Ir5 ,m {k}



 ϕA i. =

X

β∈Jr1 ,n {i}

(1)

(1)

(MM∗ )

k.

(3.38)

 α ϕA i. α ,

(3.39)

   α (1) (MM∗ )k. ct .  ∈ Cn×1 , t = 1, . . ., n, α

   β ∗ (1) (A A).i c. l  ∈ C1×m , l = 1, . . . , m. β

(3.40)

(3.41)

Here ct . and c. l are t-th row and l-th column of C1 := A∗ CM∗ . (iii)

(3)

xij =

P

β∈Jr1 ,n

P

β∈Jr1 ,n {i}

|A∗ A|ββ

P

β∈Jr3 ,p

|(A∗ A).i (e υ.j )|ββ

|C∗ C|ββ

P

α∈Jr6 ,s

|N∗ N|ββ

P

α∈Ir2 ,r

|BB∗ |αα

,

(3.42)

e = A∗ SΥ. The matrix Υ = (υtj ) ∈ Cp×n is such where υ e.j is j-th column of Υ that X υtj = |(C∗ C).t (e e.j )|ββ , (3.43) β∈Jr3 ,p {t}

72

Ivan I. Kyrchei

e = C∗ EΨ, and the matrix Ψ = (ψvj ) ∈ Cs×n is where e e.j is j-th column of E such that X  α N ψvj = (3.44) (BB∗ )j. ζv. = α

α∈Ir2 ,r {j}

=

X

β∈Jr6 ,s {f }

where 

X

N ζv. =

β∈Jr6 ,s {v}



X

ζ.jB = 

α∈Ir2 ,r {j}

(1)

(1)

∗  (N N) ζ B β , .j β .v

(3.45)

   β ∗ (1) (N N).v d.k  ∈ C1×r , k = 1, . . . , r, β

   α (1) (BB∗ )j. dl.  ∈ Cs×1 , l = 1, . . . , s. α

(3.46)

(3.47)

Here d.k and dl. are k-th column and l-th row of the matrix D1 = N∗ DB∗ , respectively. (iv)   β P ∗ (M M).g dD .f (1)

β∈Jr5 ,p

P

β∈Jr5 ,p



 dD .f =

X

α∈Ir4 ,q {f }



 dM g. =

X

β∈Jr5 ,p {g}

P

|M∗ M|ββ

α∈Ir4 ,q {f }

= P

where

β

β∈Jr5 ,p {g}

ygf = P

α∈Ir4 ,q

=

|DD∗ |αα

 α (DD∗ )f. dM g.

(3.48)

α

P

|M∗ M|ββ

α∈Ir4 ,q

|DD∗ |αα

,

   α (4) (DD∗ )f. ek.  ∈ Cp×1 , k = 1, . . . , p, α

   β ∗ (4) (M M).g e.l  ∈ C1×q , l = 1, . . . , q. β

(3.49)

(3.50)

(3.51)

Cramer’s Rules for Sylvester-Type Matrix Equations (4)

(4)

Here ek. and e.l are k-th row and l-th column of E4 := M∗ ED∗ . (v) β P ∗ ω.f ) (S S).g (e (2)

ygf =

P

β

β∈Jr7 ,p {g}

β∈Jr7 ,p

|S∗ S|ββ

P

β∈Jr3 ,p

|C∗ C|ββ

P

α∈Ir6 ,q

|NN∗ |αα

,

e := S∗ SΩ. The matrix Ω = (ωtf ) is such that where ω e.f is f -th row of Ω ωtf =

X

α∈Ir6 ,q {t}

=

X

β∈Jr3 ,p {t}

where 

ξt.C = 

X

β∈Jr3 ,p {t}



N ξ.f =

X

α∈Ir6 ,q {j}

(5)

73

(5)

 α (NN∗ )f. ξt.C = α

∗  N β (C C) ξ.f , .t β

   β ∗ (5) (C C).t e.k  ∈ C1×q , k = 1, . . . , q, β

   α (5) (NN∗ )f. el.  ∈ Cs×1 , l = 1, . . . , p. α

(3.52)

(3.53) (3.54)

(3.55)

(3.56)

Here e.k and el. are k-th column of l-th row of E5 = C∗ EN∗ , respectively. So, Cramer’s rule for the complex Sylvester matrix equation (1.1) has the following algorithm. Algorithm 2. We also give algorithms obtaining determinantal representations for each components. (1)

(i) For the components xij . 1. Compute the matrices E1 = A∗ EB∗ , A∗ A, BB∗ , and the values P P |A∗ A|ββ , |BB∗ |αα . β∈Jr1 ,n

α∈Ir2 ,r

74

Ivan I. Kyrchei 2. Compute the column-vectors dB . j by (3.35) for all j = 1, . . ., r, or A the row-vectors di . by (3.36) for all i = 1, . . . , n. (1)

3. Finally, find xij by (3.33) or (3.34), respectively to the vectors from the above point. (2)

(ii) For the components xij . ∗ ∗ 1. Compute P the matrices C1 P = A∗ CM∗ , A∗ A, , and the PMM , BB β α α ∗ ∗ ∗ values |A A|β , |MM |α , |BB |α . β∈Jr1 ,n

2.

β∈Ir5 ,m

α∈Ir2 ,r

Compute the column-vectors ϕM .k by (3.40) for all A the row-vectors ϕi. by (3.41) for all i = 1, . . ., n.

k = 1, . . . , m, or

3. Compute the matrix Φ = (φik ) by (3.38) when the column-vectors ϕM .k are found in the above point or by (3.39) when the row-vectors ϕA i. are found. e = ΦEB∗ . 4. Compute the matrix Φ (2)

5. Finally, find xij by (3.37). (3)

(iii) For the components xij . ∗ ∗ ∗ 1. Compute the matrices DB C∗ C,P N∗ N, BB∗ , P , A A, P D1 ∗= N β β |N∗ N|ββ , |C∗ C|β , |A A|β , and the values α∈Jr6 ,s β∈Jr3 ,p β∈Jr1 ,n P ∗ α |BB |α. α∈Ir2 ,r

N 2. Compute the row-vectors ζv. by (3.46) for all v = 1, . . ., s, or the B column-vectors ζ.j by (3.47) for all j = 1, . . ., r.

N 3. Compute the matrix Ψ = (ψf j ) by (3.44) when the row-vectors ζf. are found in the above point or by (3.45) when the column-vectors ζ.jB are found.

e = C∗ EΦ. 4. Compute the matrix E

e = A∗ SΥ. 5. By (3.43), compute the matrix Υ = (υtj ), then Υ (3)

6. Finally, find xij by (3.42). (1)

(iv) For the components ygf .

Cramer’s Rules for Sylvester-Type Matrix Equations

75

∗ 1. Compute the matrices ED∗ , M∗ M, DD∗ , and the values P PE4 = M β α ∗ ∗ |M M|β , |DD |α . β∈Jr5 ,p

α∈Ir4 ,q

2. Compute the column-vectors dD . f by (3.50) for all f = 1, . . . , q, or M the row-vectors dg . by (3.51) for all g = 1, . . ., p. (1)

3. Finally, find ygf by (3.48) or (3.49), respectively to the vectors from the above point. (2)

(v) For the components ygf . 1. Compute the matrices E5 = C∗ EN∗ , S∗ S, C∗ C, NN∗ , and the P P P values |NN∗ |αα . |S∗ S|ββ , |C∗ C|ββ , β∈Jr7 ,p

β∈Jr3 ,p

α∈Ir6 ,q

2. Compute the row-vectors ξt.C by (3.55) for all t = 1, . . ., p, or the N by (3.56) for all f = 1, . . ., q. column-vectors ξ.f

3. Compute the matrix Ω = (ωtf ) by (3.53) when the row-vectors ξt.C are found in the above point or by (3.54) when the column-vectors N ξ.f are found. e = S∗ SΩ. 4. Compute the matrix Ω (2)

5. Finally, find ygf by (3.52).

4.

D ETERMINANTAL R EPRESENTATIONS OF THE G ENERAL AND (SKEW-)H ERMITIAN SOLUTIONS TO (1.2)

Now, consider the equation (1.2). Since for an arbitrary matrix A it is evidently ∗ that QA∗ = (A∗ )† A∗ = AA† = PA , so PA∗ = QA , LA∗ = I − QA∗ = I − PA = RA , and RA∗ = LA . Due to above, M = RA B and N = B∗ LA∗ = B∗ RA = (RA B)∗ = M∗ , and we obtain the following analog of Lemma 3.1. Lemma 4.1. Let A ∈ Hm×n , B ∈ Hm×k , C ∈ Hm×m . Put M = RA B, S = BLM . Then the following results are equivalent. (i) Eq. (1.2) has a pair solution (X, Y), where X ∈ Hn×n , Y ∈ Hk×k . (ii) RM RA C = 0, RA CRB = 0, CRB RM = 0, RB CRA = 0.

76

Ivan I. Kyrchei

(iii) PM RA CPB = RA C, PB CRA PM = CRA .         A C A 0 (iv) rank A B C = rank A B , rank = rank , 0 B∗ 0 B∗     B C B 0 rank = rank . 0 A∗ 0 A∗

In that case, the general solution to (1.2) can be expressed as follows

X =A†CA∗,† − A†BM†CA∗,† − A† SB∗,†CM∗,†B∗ A∗,†− A†SVLM B∗ A∗,† + LA U + ZLA , Y =M† CB∗,† + QS B† CM∗,† + LM (V − QS VQM ) + WLB . where U, V, Z and W are arbitrary matrices over H with allowable dimensions. By putting U, V, Z, and W as zero-matrices with compatible dimensions, we obtain the following partial solution to (1.2), X =A† CA∗,† − A†BM†CA∗,† − A† SB† CM∗,†B∗ A∗,†, †

∗,†

Y =M CB



∗,†

+ QS B CM .

(4.1) (4.2)

The following theorem gives determinantal representations of (4.1)-(4.2). Theorem 4.2. Let A ∈ Hrm×n , B ∈ Hrm×k , rank M = r3 , rank S = r4 . 1 2 Then the partial pair solution (4.1)-(4.2) to Eq.(1.2), X = (xij ) ∈ Hn×n , Y = (ypg ) ∈ Hk×k , by the components (1)

(2)

(3)

(1) (2) xij = xij − xij − xij , ypg = ypg + ypg ,

possess the following determinantal representations, (i)  α P rdetj (A∗ A)j. (vi. ) (1)

xij =

=

α

α∈Ir1 ,n {j}

P

P

|A∗ A|αα

P

|A∗ A|ββ

α∈Ir1 ,n

β∈Jr1 ,n {i}

!2

=

(4.3)

cdeti ((A∗ A).i (v.j ))ββ

β∈Jr1 ,n

!2

,

(4.4)

77

Cramer’s Rules for Sylvester-Type Matrix Equations where



X

vi. = 

β∈Jr1 ,n {i}



X

v.j = 

α∈Ir1 ,n {j}

   β  ∈ H1×n , s = 1, . . ., n, (4.5) cdeti (A∗ A).i c(1) .s β

  α (1) rdetj (A∗ A)j.(cf. )  ∈ Hn×1 , f = 1, . . ., n α

(1)

(4.6)

(1)

are the row vector and the column vector, respectively; c.s and cf. are the s-th column and the f -th row of C1 = A∗ CA. (ii)  α P rdetj (A∗ A)j.(φ˜i. ) α

α∈Ir1 ,n {j}

(2)

xij =

(

P

β∈Jr1 ,n

|A∗ A|ββ )2

P

α∈Ir3 ,m

|MM∗ |αα

,

(4.7)

˜ := ΦCA, and Φ = (φiq ) ∈ Hn×m is such that where φ˜i. is the ith row of Φ X

φiq =

cdeti (A∗ A). i η.Mq

β∈Jr1 ,n {i}

and



M η.q =

X

rdetq

α∈Ir3 ,m {q}



ηi.A = 

X

β∈Jr1 ,n {i}



β β

=

X

α∈Ir3 ,m {q}

 α A rdetq (MM∗ )q. ηi. , α

(4.8)

  α (1)  ∈ Hn×1 , f = 1, . . ., n, (MM∗ )q. bf. α

(4.9)



  β  ∈ H1×m , s = 1, . . ., m, cdeti (A∗ A).i b(1) .s β

(4.10)

(1)

(1)

are the column vector and the row vector, respectively. bf. and b.s are the f -th row and the s-th column of B1 = A∗ BM∗ . (iii) P cdeti ((A∗ A).i (e υ.j ))ββ (3)

xij =

(

P

β∈Jr1 ,n

β∈Jr1 ,n {i}

|A∗ A|ββ )2

P

β∈Jr2 ,k

|B∗ B|ββ

P

β∈Jr3 ,m

|MM∗ |ββ

,

(4.11)

78

Ivan I. Kyrchei

e = A∗ SΥ, the matrix Υ = (υpj ) ∈ Hk×n where υ e.j is the j-th column of Υ such that  β X υpj = cdetp (B∗ B).p (e c.j ) . (4.12) β

β∈Jr2 ,k {p}

e = B∗ CΦ∗ , and Φ∗ is Hermitian adjoint of Here e c.j is the j-th column of C Φ = (φiq ) from (4.8). (iv)  β P cdetp (M∗ M).p dB .g (1) ypg =

P

β

|M∗ M|β

β∈Jr3 ,k

=

β

β∈Jr3 ,k {p}

P

α∈Ir2 ,k {g}

P

β∈Jr3 ,k

P

=

|B∗ B|αα

α∈Ir2 ,k

 α rdetg (B∗ B)g. dM p.

(4.13)

α

|M∗ M|ββ

P

α∈Ir2 ,k

|B∗ B|αα

,

(4.14)

where 

 dB .g =

X

α∈Ir2 ,k {g}

dM p.



=

X

β∈Jr3 ,k {p}

   α  ∈ Hk×1 , q = 1, . . . , k, rdetg (B∗ B)g. c(4) q. α

(4.15)



β   (4) ∗  ∈ H1×k , l = 1, . . . , k, cdetp (M M).p c.l β

(4.16)

(4)

(4)

are the column vector and the row vector, respectively. cq. and c.l are the q-th row and the l-th column of C4 := M∗ CB. (v)  β P cdetp (S∗ S).p (e ω.g ) (2) ypg =

P

β∈Jr4 ,k {p}

β∈Jr4 ,k

|S∗ S|ββ

β

P

β∈Jr2 ,k

|B∗ B|ββ

P

α∈Ir3 ,k

|M∗ M|αα

,

(4.17)

79

Cramer’s Rules for Sylvester-Type Matrix Equations ˜ = S∗ SΩ and Ω = (ωtg ) such that where Ω X β ωtg = cdett (B∗ B).t dM = .g β β∈Jr2 ,k {t}

=

X

α∈Ir3 ,k {g}

Here



 dM .g =

X

α∈Ir3 ,k {g}



 dB t. =

X

β∈Jr3 ,k {t}

(4.18)

 α rdetg (M∗ M)g. dB . t.

(4.19)

α

   α (4,∗)  ∈ Hk×1 , q = 1, . . ., k, rdetg (M∗ M)g. cq. α

(4.20)



  β (4,∗)  ∈ H1×k , l = 1, . . ., k, cdett (B∗ B).t c.l β

(4.21)

(4,∗)

are the column vector and the row vector, respectively. cq. q-th row and the l-th column of C∗4 := M∗ CB.

(4,∗)

and c.l

are the

Proof. The proof can be obtained from the proof of Theorem 3.3 by substitution corresponding matrices. Since the equation (1.2) has its own peculiarities, more completely proof will be made in some points, and a few comments will be done in others. (1) (i) For the first term of (4.1), X1 = A† C (A∗ )† = (xij ), we have (1) xij

=

m X m X

a†il clta∗,† tj .

l=1 t=1

By using determinantal representations (2.5) and (2.11) of the Moore-Penrose inverses A† and (A∗ )† , respectively, we obtain (1)

xij = m P m P

P

l=1 t=1 β∈Jr1 ,n {i}

cdeti ((A∗ A). i (a∗.l ))ββ clt P

α∈Ir1 ,n

|A∗ A|αα

P

P

α∈Ir1 ,n {j}

β∈Jr1 ,n

rdetj ((A∗ A)j.(at. ))αα

|A∗ A|ββ

.

80

Ivan I. Kyrchei

Suppose el. and e.l are the unit row-vector and the unit column-vector, respectively, such that all their components are 0, except the l-th components, which m P m P (1) are 1. Denote C1 := A∗ CA. Since a∗f l clt ats = cf s , then l=1 t=1

(1)

xij = n P n P

P

P

β (1)

f=1 s=1 β∈Jr1 ,n{i}

cdeti ((A∗ A). i (e.f ))β cfs P

α∈Ir1 ,n

If we denote by vis :=

n X

X

α |A∗ A|α

P

α

α∈Ir1 ,n {j}

β∈Jr1 ,n

rdetj ((A∗ A)j. (es. ))α

β

|A∗ A|β

(1)

cdeti ((A∗ A).i (e.f ))ββ cf s =

f =1 β∈Jr1 ,n {i}

X

=

β∈Jr1 ,n {i}

  β cdeti (A∗ A).i c(1) .s β

the s-th component of a row-vector vi. = [vi1 , . . ., vin ], then m X

vis

s=1

X

X

rdetj ((A∗ A)j. (es.))αα =

α∈Ir1 ,n {j}

rdetj ((A∗A)j. (vi.))αα .

α∈Ir1 ,n {j}

P

Farther, it’s evident that

β∈Jr1 ,n

|A∗ A|ββ =

P

α∈Ir1 ,n

|A∗ A|αα, so the first term of

(4.1) has the determinantal representation (4.3), where vi. is (4.5). If we denote by (2)

vf j :=

n X

(1)

cf s

s=1

=

X

X

rdetj ((A∗ A)j.(es. ))αα =

α∈Ir1 ,n {j}

α∈Ir1 ,n {j}

  (1) α rdetj (A∗ A)j.(cf. ) α

the f -th component of a column-vector v.j = [v1j , . . ., vnj ], then n X

X

f=1 β∈Jr1 ,n {i}

β

cdeti ((A∗ A). i (e.f ))β vfj =

X

β∈Jr1 ,n {i}

β

cdeti ((A∗ A). i (v.j ))β .

81

Cramer’s Rules for Sylvester-Type Matrix Equations

So, another determinantal representation of the first term of (4.1) is (4.4), where v.j is (4.6).   (2)

(ii) For the second term A† BM†CA∗,† := X2 = xij (2) xij

=

m X k X m X m X

of (4.1), we have

a†il blpm†pq cqta∗,† tj .

l=1 p=1 q=1 t=1

Using determinantal representations (2.5) for the Moore-Penrose inverse A† , (2.6) for M† = (m†pq ), and (2.11) for (A∗ )†, respectively, we obtain (2)

xij =

` ´β cdeti (A∗ A). i (a∗.l ) β blp

P

m X k X m X m X

β∈Jr1 ,n {i}

l=1 p=1 q=1 t=1

P

cqt

β∈Jr1 ,n ∗

rdetj ((A

α∈Ir1 ,n {j}

×

P

P

α∈Ir1 ,n

A)j. (at. )α α

|A∗ A|ββ |

P

α∈Ir3 ,m {q}

P

α∈Ir3 ,m

` ´α rdetq (MM∗)q. (m∗p. ) α

|MM∗ |α α

×

.

|A∗ A|α α

Further, thinking as above in the point (i), we obtain φiq := m X k X

X

β

cdeti ((A∗ A). i (a∗.l ))β blp

l=1 p=1 β∈Jr1 ,n {i}

=

X

α∈Ir3,m {q}

 M β

cdeti (A∗ A). i η. q

β∈Jr1 ,n {i}

where



M η.q =

ηi.A = 

α∈Ir3 ,m {q}

 α A rdetq (MM∗ )q. ηi. , α

(4.22)

  α (1)  ∈ Hn×1 , f = 1, . . ., n, (MM∗ )q. bf.

rdetq

X

  β  ∈ H1×m , s = 1, . . ., m. cdeti (A∗ A).i b(1) .s

β∈Jr1 ,n {i}

(1)



β

=

X

α rdetq (MM∗ )q. (m∗p. ) α =

X

α∈Ir3 ,m {q}



X

(1)

α

β



Here bf. and b.s are the f -th row and the s-th column of B1 = A∗ BM∗ . Construct the matrix Φ = (φiq ) ∈ Hn×m such that φiq can be obtained using

82

Ivan I. Kyrchei

˜ := ΦCA. Since one of the two cases in (4.22), and denote Φ m X m X

φiq cqt

q=1 t=1

X

rdetj ((A∗A)j. (at.)αα =

α∈Ir1 ,n {j}

=

X

α∈Ir1 ,n {j}

 α rdetj (A∗ A)j.(φ˜i. ) , α

˜ then we have (4.7). where φ˜i. is the i-th row of Φ,   (3) (iii) For the third term A† SB† CM∗,†B∗ A∗,† := X3 = xij of (4.1), we use the determinantal representation (2.5) to the both Moore-Penrose matrices A† and B† . Then by Corollary 2.4 and taking into account that M∗,†B∗ A∗,† = (A†BM† )∗ , we have (3)

xij =

=

k P m P

P

p=1 t=1 β∈Jr1 ,n {i}

(

“ “ ””β (1) cdeti (A∗ A).i s.p P

β∈Jr1 ,n

(1)

|A∗ A|ββ )2

“ “ ””β (3) φ∗tj cdetp (B∗ B).p c.t

P

β β∈J r2 ,k {p} P ∗ |B B|ββ β∈Jr2 ,k β∈Jr3 ,m

P

β

|MM∗|ββ

(3)

,

where s.p is the p-th column of S1 := A∗ S, c.t is the t-th column of C3 := B∗ C, φ∗tj is the (tj)-th entry of Φ∗ that is Hermitian adjoint to Φ = (φiq ) from e Then, (4.8). Denote C3 Φ∗ = B∗ CΦ∗ = C. m X

X

t=1 β∈Jr2 ,k {p}

=

  β (3) cdetp (B∗ B).p c.t φ∗tj = β

X

β∈Jr2 ,k {p}

 β cdetp (B∗ B).p (e c.j ) . β

Construct the matrix Υ = (υpj ) ∈ Hk×n such that  β X υpj = cdetp (B∗ B).p (e c.j ) . β

β∈Jr2 ,k {p}

e = (e Denote S1 Υ = A∗ SΥ =: Υ υij ) ∈ Hn×n . Since   β X X cdeti (A∗ A).i s(1) υpj = cdeti ((A∗ A).i (e υ.j ))ββ , .p β∈Jr1 ,n {i}

β

β∈Jr1 ,n {i}

83

Cramer’s Rules for Sylvester-Type Matrix Equations

it follows (4.11). (iv) Taking into account Theorem 2.15 and similarly as above for the first (1) term Y1 = M†CB∗,† = (ypg ) of (4.2), we have the determinantal representations (4.13) and (4.14).   (2) (v) Finally, for the second term Y2 = QS B† CM∗,† = ypg of (4.2) using (2.14) for a determinantal representation of QS , and due to Theorem 2.15 for B† CM∗,†, we obtain k P

P

t=1 β∈Jr

(2) ypg =

P

β∈Jr4 ,k

4 ,k

{p}

|S∗ S|ββ

where ωtg are as follows ωtg =

X

β∈Jr2 ,k {t}

=

X

α∈Ir3 ,k {g}

 β cdetp (S∗ S).p (¨s.t) ωtg β

P

β∈Jr2 ,k

|B∗ B|ββ

P

α∈Ir3 ,k

cdett (B∗ B).t dM .g

|M∗ M|αα

β β

,

=

 α rdetg (M∗ M)g. dB , t. α

(4.23)

and 

 dM .g =

X

α∈Ir3 ,k {g}



 dB t. =

X

β∈Jr3 ,k {t}

(4,∗)

   α (4,∗)  ∈ Hk×1 , q = 1, . . ., k, rdetg (M∗ M)g. cq. α

   β (4,∗)  ∈ H1×k , l = 1, . . ., k. cdett (B∗ B).t c.l β

(4,∗)

Here cq . and c. l are the q-th row and the l-th column of C∗4 := M∗ CB. Construct the matrix Ω = (ωtg ) ∈ Hk×k such that ωtg are obtained by (4.23), ˜ := S∗ SΩ. Since and denote Ω k X

X

t=1 β∈Jr ,k {p} 4

 β cdetp (S∗ S).p (¨s.t) ωtg =

it follows (4.17).

β

X

β∈Jr4 ,k {p}

 β cdetp (S∗ S).p (e ω.g ) , β

84

Ivan I. Kyrchei

So, Cramer’s rule for the Sylvester-type matrix equation (1.2) has the following algorithm. Algorithm 3.

(1)

(i) For the components xij .

1. Compute the matrices C1 = A∗ CA, A∗ A, and the values P ∗ |A A|ββ . β∈Jr1 ,n

2. Compute the row-vectors vi. by (4.5) for all i = 1, . . . , n, or the column-vectors v.j by (4.6) for all j = 1, . . . , n. (1)

3. Finally, find xij by (4.3) or (4.4), respectively to the vectors from the above point. (2)

(ii) For the components xij . 1. Compute the matrices B1 = A∗ BM∗ , A∗ A, MM∗ , and the values P P |A∗ A|ββ , |MM∗ |αα . β∈Jr1 ,n

β∈Ir3 ,m

M by (4.9) for all q = 1, . . . , m, or 2. Compute the column-vectors η.q the row-vectors ηi.A by (4.10) for all i = 1, . . . , n.

3. Compute the matrix Φ = (φik ) by one from the two cases in Eq.(4.8) M accordingly to whether the column-vectors η.q or the row-vectors A ηi. are obtained in the above point. ˜ = ΦCA. 4. Compute the matrix Φ (2)

5. Finally, find xij by (4.7). (3)

(iii) For the components xij . 1. Compute the matrices Φ∗ that is Hermitian adjoint to Φ = (φiq ) e = B∗ CΦ∗ , Υ by using (4.12), and Υ e = A∗ SΥ. from (4.8), C (3)

2. Finally, find xij by (4.11). (1)

(iv) For the components ygf . 1. Compute the matrix C4 = M∗ CB. 2. Compute the column-vectors dB . g by (4.15) for all g = 1, . . ., k, or the row-vectors dM by (4.16) for all p = 1, . . . , k. p.

Cramer’s Rules for Sylvester-Type Matrix Equations

85

(1)

3. Finally, find ygf by (4.13) or (4.14) that correspond to the vectors from the above point. (2)

(v) For the components ygf . 1. Compute the matrix C∗4 that is the Hermitian adjoint of C4 obtained above. 2. Compute the row-vectors dB t. by (4.21) for all t = 1, . . ., k, or the M column-vectors d. g by (4.20) for all g = 1, . . ., k. 3. Compute the matrix Ω = (ωtg ) by (4.18) when the column-vectors dM . g are found in the above point, or by (4.19) when the row-vectors B dt. are found. e = S∗ SΩ. 4. Compute the matrix Ω (2)

5. Finally, find ygf by (4.17).

Due to [22], the following lemma can be generalized to H. Lemma 4.3. Suppose that matrices A ∈ Hm×n and B ∈ Hm×m and C ∈ Hm×m are given with C = C∗ = (−C∗ ) in Eq. (1.2). If Eq. (1.2) is solvable, then it must have Hermitian (skew-Hermitian) solutions. b = The general Hermitian solution to (1.2) can be expressed as X 1 ∗ ∗ b = (Y + Y ), where (X, Y) is an arbitrary solution to (1.2). (X + X ), Y 2 Since by Lemma 4.3, the existence of Hermitian solutions to (1.2) needs that C be Hermitian, then 1 2

X∗ = A†CA∗,† − A†CM∗,†B∗ A∗,† − A†BM† CB∗,†S∗ A∗,†, Y ∗ = B† CM∗,† + M†CB∗,†QS . By Lemma 4.3, if C = −C∗ and Eq. (1.2) have solutions, then it has skewHermitian solutions. Taking into account C = −C∗ for X∗ that is conjugate transpose to X from (4.1) and for Y ∗ that is conjugate transpose to Y from (4.2), we have X∗ = − A†CA∗,† + A†CM∗,†B∗ A∗,† + A†BM† CB∗,†S∗ A∗,†, Y ∗ = − B† CM∗,† − M†CB∗,†QS . 1 2

e = The general skew-Hermitian solution (1.2) can be expressed as X 1 ∗ ∗ e = (Y − Y ), where (X, Y) are solutions of (1.2). (X − X ), Y 2

86

Ivan I. Kyrchei

It is evident that the determinantal representations of the Hermitian solution b = (b b = (b e = (b e = (e X xij ), Y yij ) and the skew-Hermitian solution X xij , Y yij ) are the same, and they can be expressed as     1 1 1 (1) (2) (2) (3) (3) xij + xji − xij + xji , x bij = x eij = (xij + xji ) = xij − 2 2 2     1 1 1 (1) (2) (1) (2) ypg + ygp + y + ygp ybpg = yepg = (ypg + ygp ) = 2 2 2 pg for all i, j = 1, . . ., n and p, g = 1, . . ., k, where xij and ypg are obtained in Theorem 4.2.

5.

D ETERMINANTAL R EPRESENTATIONS OF η-H ERMITIAN AND η-SKEW-H ERMITIAN SOLUTIONS TO SYLVESTER -T YPE QUATERNION MATRIX E QUATION

Now, consider the Sylvester-type quaternion matrix equation with η-Hermicity, AXAη∗ + BYBη∗ = C.

(5.1)

Since N = Bη∗ LAη∗ = (RA B)η∗ = Mη∗ , RN Bη∗ = (BLM )η∗ = Sη∗ , then, due to [2], we get the following lemma on the general solution to (5.1). Lemma 5.1. Let A ∈ Hm×n , B ∈ Hm×k , C ∈ Hm×m be given. Set M = RA B, S = BLM . Then the following statements are equivalent: (1) The matrix equation (5.1) has a pair solution (X, Y). (2) RM RA C = 0, RA C (RB )η∗ = 0.   A C (3) rank = rank(A) + rank(B), rank [A B C] = rank [A B]. 0 Bη∗ In this case, the general solution to (5.1) can be expressed as  η∗  η∗ X =A†C(A†)η∗ − A†BM† C A† − A†SB† C A† BM†  η∗ −A†SW2 A†S + LA U + ZLη∗ A,  η∗  η∗ η η Y =M†C B† + QS B† C M† + LM W2 LM + VLB + LM LS W1 ,

Cramer’s Rules for Sylvester-Type Matrix Equations

87

where U, V, W1 , W2 , and Z are arbitrary matrices over H with appropriate sizes. By putting Z, U, V, W1 , and W2 as zero-matrices with compatible dimensions, we obtain the following partial solution to (5.1),  η∗  η∗ X =A†C(A†)η∗ − A†BM†C A† − A† SB† C A†BM† , (5.2)  η∗  η∗ Y =M†C B† + QS B† C M† . (5.3)

The following theorem gives determinantal representations of (5.2)-(5.3).

, B ∈ Hrm×k , rank M = r3 , rank S = r4 . Theorem 5.2. Let A ∈ Hrm×n 1 2 Then the partial pair solution (5.2)-(5.3) to Eq.(5.1), X = (xij ) ∈ Hn×n , Y = (ypg ) ∈ Hk×k , by the components (1)

(2)

(3)

(1) (2) xij =xij − xij − xij , ypg = ypg + ypg ,

possess the following determinantal representations, (i) P

−η

α∈Ir1 ,n {j}

(1)

xij =

=

P

 α rdetj (A∗ A)j. (vi.η ) η α

P

β∈Jr1 ,n

β∈Jr1 ,n {i}



vi.η = −η 

u.j = −η

X

β∈Jr1,n {i}

X

α∈Ir1,n {j}

(5.4)

cdeti ((A∗ A). i (u.j ))ββ P

β∈Jr1 ,n

where

|A∗ A|ββ

!2

|A∗ A|ββ

!2

,

(5.5)



cdeti ((A∗ A).i (b a.s ))ββ η  ∈ H1×n , s = 1, . . ., n, 

(5.6)

α rdetj (A∗ A)j.(b aηl. ) α η  ∈ Hn×1 , l = 1, . . . , n.

(5.7)

88

(ii)

Ivan I. Kyrchei b = A∗ CAη and a bηl. is the l-th row of Here b a.s is the s-th column of A b η = Aη∗ Cη A. A −η

α∈Ir1 ,n {j}

(2)

xij =

P

(

P

β∈Jr1 ,n

 α rdetj (A∗ A)j.(φei. ) η α

|A∗ A|ββ )2

P

α∈Ir3 ,m

|MM∗ |αα

,

(5.8)

e := Φη C∗ A and Φ = (φiq ) is such that where φei. is the i-th row of Φ φiq = X

cdeti (A∗ A). i ϕM .q

β∈Jr1 ,n {i}

and



 ϕM .q = 

 ϕA i. =

X

α∈Ir3 ,m {q}

β β

=

X

α∈Ir3 ,m {q}

 α rdetq (MM∗ )q. ϕA , i. α

(5.9)

   α (1)  ∈ Hn×1 , f = 1, . . . , n, rdetq (MM∗ )q. bf. α

(5.10)



  β  ∈ H1×m , s = 1, . . . , m. cdeti (A∗ A).i b(1) .s

X

β

β∈Jr1 ,n {i}

(5.11)

(1)

(1)

Here bf. and b.s are the f -th row and the s-th column of B1 = A∗ BM∗ . (iii)

(3)

xij =

(

P

β∈Jr1 ,n {j}

β∈Jr1 ,n

−η

=

(

P

P

β∈Jr1 ,n

  β (1) cdeti (A∗ A).i ω.j β

|A∗ A|ββ )2 P

α∈Ir1 ,n {j}

P

α∈Ir2 ,m

|BB∗ |αα

P

β∈Jr3 ,m

  α (2) rdetj (A∗ A)j. ψi. η,

|A∗ A|ββ )2

(5.12)

|MM∗ |ββ

α

P

α∈Ir2 ,m

|BB∗ |αα

P

β∈Jr3 ,m

|MM∗ |ββ

,

(5.13)

Cramer’s Rules for Sylvester-Type Matrix Equations (1)

89

(2)

where ω.j is the j-th column of Ω1 = ΩΨ1 and ψi. is the i-th row of (1)

Ψ2 := Ωη2 Ψ. The matrices Ω = (ωit ) ∈ Hn×m , Ψ1 := (ψtj ) ∈ Hm×n , (2)

Ψ := (ψqj ) ∈ Hm×n , Ω2 = (ωiq ) are such that   α X (1) ωit = rdett (BB∗ )t. si. ,

(5.14)

α

α∈Ir2 ,m {t}

(1)

where si. is the i-th row of of S1 = A∗ SB∗ ; (1)

ψtj = −η

X

  α (1) rdetj (A∗ A)j. ct. η,

X

  β (1,∗) cdetq (MM∗ ).q b.f ,

(5.15)

α

α∈Ir1,n {j}

where C1 := Cη Ψ;

ψqj =

β∈Jr3 ,m {q} (1,∗)

where b.f

β

(5.16)

is f -th column of B∗1 = MB∗ A; (2) ωiq

X

=

β∈Jr1 ,n {i}

  β ∗ (2) cdeti (A A).i c.q ,

(5.17)

β

(2)

where c.q is the q-th column of C2 := ΩC. (iv) P

−η (1) ypg =

α∈Ir2 ,k {g}

P

β∈Jr3 ,k

=

P

β∈Jr3 ,k {p}

P

β∈Jr3 ,k

 α η rdetg (B∗ B)g. (vp. ) η α

|M∗ M|ββ

P

α∈Ir2 ,k

|B∗ B|αα

 β ∗ cdetp (M M).p (u.g )

|M∗ M|ββ

,

(5.18)

β

P

α∈Ir2 ,k

|B∗ B|αα

,

(5.19)

90

Ivan I. Kyrchei where 

vp . = 



X

u. g = −η

(v)

cdetp

β∈Jr3 ,k {p}

X



 β (M∗ M).p (b c.l )  ∈ H1×k , l = 1, . . . , k, (5.20) β



rdetg (B∗ B)g.

α∈Ir2 ,k {g}

 α  b cηq. η  ∈ Hk×1 , q = 1, . . . , k. α

(5.21)

b := M∗ CBη and b Here b c.l is the l-th column of C cηq. is the q-th row of b η. C (2) ypg =

P

P

β∈Jr4 ,k {p}

β∈Jr4 ,k

|S∗ S|ββ

 β cdetp (S∗ S).p (e υ.g ) β

P

β∈Ir2 ,k

|B∗ B|ββ

P

α∈Ir3 ,k

|M∗ M|αα

,

(5.22)

e := S∗ SΥ, Υ = (υtg ) is such that where υe.g is the g-th column of Υ υtg = X

β

cdett ((B∗ B).t (λ.g ))β = −η

β∈Jr2 ,k {t}

X

α∈Ir3 ,k {g}

 α rdetg (M∗ M)g. (µt. ) η, α

(5.23)

and 

λ.g = −η 

µt. = 

X

α∈Ir3 ,k {g}

X

β∈Jr2 ,k {t}

  α rdetg (M∗ M)g. (ˇ cl. ) η  ∈ Hk×1 , l = 1, . . . , k,

ˇη.q cdett (B∗ B).t c

α

β β



 ∈ H1×k , q = 1, . . . , k.

(5.24)

(5.25)

ˇ := B∗ CMη Here ˇcl. and cˇη.q are the l-th row and the q-th column of C ˇ η , respectively. and C Proof. (i) It’s evident that the equations (5.4)-(5.5) follow from Theorem 2.15.

91   (2) (ii) Consider the second term (A†BM†)(C(Aη∗)† ) := X2 = xij of (5.2). Taking  into account (2.12) for the determinantal representation of η∗,† η∗ † (A ) = atj , we have for the second multiplier C(Aη∗ )† Cramer’s Rules for Sylvester-Type Matrix Equations

m X

η∗,† cqtatj

t=1



 α  ∗ A) (a ) rdet (A j t. j. m  α∈Ir ,n {j} α  X   1 P = cqt · −η η = α ∗ |A A|α   P

t=1

β∈Ir1 ,n



 α  ∗ A) (a ) rdet (A j t. j. m X α α∈Ir1 ,n {j}   η P =−η cqt · η α ∗ |A A|α   t=1 P

β∈Ir1 ,n

−η

=

P

α∈Ir1 ,n {j}

 α rdetj (A∗ A)j. (e cq . ) η α

P

β∈Ir1 ,n

|A∗ A|αα

e := Cη A. By applying the determinantal reprewhere e cq . is the q-th row of C sentations (2.5) and (2.6) for the Moore-Penrose inverses A† and M†, respectively, and due to Theorem 2.15 for the first multiplier A†BM† , we obtain the matrix Φ = (φiq ) such that X β φiq = cdeti (A∗ A). i ϕM .q β β∈Jr1 ,n {i}

=

X

α∈Ir3 ,m {q}

and



 ϕM .q =

X

α∈Ir3 ,m {q}



 ϕA i. =

X

β∈Jr1 ,n {i}

 α ∗ A rdetq (MM )q. ϕi. , α

   α (1)  ∈ Hn×1 , f = 1, . . ., n, rdetq (MM∗ )q. bf. α



  β  ∈ H1×m , s = 1, . . . , m. cdeti (A∗ A).i b(1) .s β

92

Ivan I. Kyrchei (1)

(1)

Here bf. and b.s are the f -th row and the s-th column of B1 = A∗ BM∗ . So, we have ! m P P α φiq −η rdetj ((A∗ A)j.(e cq . ))α η (2)

xij =

q=1

(

P

α∈Ir1 ,n {j}

β∈Jr1 ,n

|A∗ A|ββ )2

P

α∈Ir3 ,m

|MM∗ |αα

.

(5.26)

e := Φη C∗ A. From this denotation and the equation (5.26), it follows Denote Φ (5.8).   (3) (iii) For the third term (A†SB† )C((M†)η∗ Bη∗ (A†)η∗ ) := X3 = xij of (5.2), we have

(3) xij

=

m P m P

q=1 t=1

P

(

β∈Jr1,n

|A∗ A|ββ )2

P

ω eit ctq ψeqj

α∈Ir2 ,m

|BB∗ |αα

P

β∈Jr3 ,m

|MM∗ |ββ

,

(5.27)

where ψeqj = −ηφjq η = −η

and 

ψq. = 

X

β∈Jr3 ,m {q}

(5.28)

ω.t = 

X

is f -th column of B∗1 = MB∗ A;

cdeti ((A∗ A).i (ω.t ))ββ ,

β∈Jr1 ,n {j}



α

β

(1,∗)

X

α∈Ir1,n {j}

 α rdetj (A∗ A)j. (ψq.) η,

   β (1,∗)  ∈ H1×n , f = 1, . . ., n, cdetq (MM∗ ).q b.f

is the row vector, b.f ω eit =

X

α∈Ir2 ,m {t}

   α (1)  ∈ Hn×1 , f = 1, . . ., n, rdett (BB∗ )t. sf. α

(5.29)

93

Cramer’s Rules for Sylvester-Type Matrix Equations (1)

where sf. is the f -th row of S1 = A∗ SB∗ . Construct the matrices Ψ = (ψqf ) ∈ Hm×n and Ω = (ωf t) ∈ Hn×m determined by (5.28) and (5.29), respectively. Denote C1 := Cη Ψ, Ψ1 := (1) (ψtj ), where (1) ψtj

m X

=

q=1

ctq ψeqj = −η

X

α∈Ir1 ,n {j}

  α (1) rdetj (A∗ A)j. ct. η. α

(1)

Here ct. is the t-th row of C1 , and Ω1 := ΩΨ1 . From these denotations and Eq. (5.27), it follows (5.12). (3) Another determinantal representation of xij can obtained by putting C2 := (2)

ΩC, Ω2 := (ωtj ), where (2)

ωiq =

m X q=1

(2)

ω eit ctq =

X

β∈Jr1 ,n {i}

  β cdeti (A∗ A).i c(2) . .q β

Here c.q is the q-th column of C2 , and Ψ2 := Ωη2 Ψ. From these denotations and Eq. (5.27), it follows (5.13).   η∗ (1) (iv) Now consider the first item M†C B† := Y1 = ypg of (5.3). η∗ Using the determinantal representations (2.5) for M†, and (2.12) for B† , we have (1) ypg =

m m X X t=1 l=1

PP t

P

` ´† m†pt ctl bη∗ = lg

l β∈Jr ,k {p} 3

“ ”β cdetp (M∗ M).p (m∗.t ) ctl

−η

β

P

β∈Jr3 ,k

|M∗ M|ββ

P

P

α∈Ir2 ,k {g}

β∈Ir2 ,k

|B∗ B| α α

“ ”α rdetg (B∗ B)j. (bl. ) η α

b = (b Denote M∗ CBη =: C cij ). Then, thinking as above, we have (1) ypg =

PP t

P

m β∈Jr ,k {p} 3





cdetp (M M).p (e.t ) P

β∈Jr3 ,k

”β β

ctl b

|M∗ M|ββ

−η P

P

α∈Ir2 ,k {g}

β∈Ir2 ,k

|B∗ B| α α





rdetg (B B)g. (el. )

”α α

η

!

!

,

,

94

Ivan I. Kyrchei

where el. and e. l are respectively the unit row and column vectors. If we denote by vpl :=

m X

X

t=1 β∈Jr ,k {p} 3

=

X

β∈Jr3 ,k {i}

β  cdetp (M∗ M).p (e.t ) b ctl β

 β cdetp (M∗ M).p (b c. l ) β

the l-th component of a row-vector vp. = [vp1 , . . . , vpm], then   m X X vpl −η rdetg ((B∗ B)g.(el. ))αα η  = l=1

α∈Ir2 ,k {g}



= −η 

X

α∈Ir2 ,k {g}





η rdetg (B∗ B)g.(vp. ) α η,

h i (1) η η η where vp. = vp1 , . . . , vpm . So, ypg has the determinantal representation (5.18), where vp. is (5.20). If we denote by   m X X utg := b ctl −η rdetg ((B∗ B)g. (el. ))αα η  l=1

X

=−η

α∈Ir2 ,k {g}

α

rdetg ((B∗ B)g. (b cηt.))αη

α∈Ir2 ,k {g}

the t-th component of a column-vector u.g = [u1g , . . . , umg ], then m X

X

t=1 β∈Jr3 ,k {p}

=

 β cdetp (M∗ M). p (e.t ) utg =

X

β∈Jr3 ,k {p}

β

 β cdetp (M∗ M). p (u.g ) . β

(1)

So, another determinantal representation of ypg is (5.19) with u.g determined by (5.21).

Cramer’s Rules for Sylvester-Type Matrix Equations 95   η∗ (2) (v) For the second term QS B† C M† = Y4 = ypg of (5.3) using (2.14) for a determinantal representation of QS , and similarly as in the point (v)  † † η∗ for B C M , we have k P

P

t=1 β∈Jr

(2) ypg =

P

β∈Jr4 ,k

4 ,k

{p}

β

|S∗ S|β

 β cdetp (S∗ S).p (˙s.t ) υtg β

P

P

β

β∈Jr2 ,k

|B∗ B|β

α∈Ir3 ,k

where s˙ .t is the tth column of S∗ S, υtg = =

X

β

cdett ((B∗ B).t (λ.g ))β = −η

β∈Jr2 ,k {t}

and



λ.g = −η 

µt. = 

X

α∈Ir3 ,k {g}

X

α∈Ir3 ,k {g}

X

|M∗ M|αα

,

(5.30)

 α rdetg (M∗ M)g. (µt. ) η, α

(5.31)

  α rdetg (M∗ M)g. (ˇcl. ) η  ∈ Hk×1 , l = 1, . . ., k, α

cdett (B∗ B).t

β∈Jr2 ,k {t}

  β cˇη.q β  ∈ H1×k , q = 1, . . . , k.

ˇ := B∗ CMη and cˇη.q is the q-th column of C ˇ η. Here ˇcl. is the l-th row of C Construct the matrix Υ = (υtg ), where υtg is determined by (5.31). Denote the e := S∗ SΥ. Using of this denotation in (5.30) yields to (5.22). matrix Υ So, Cramer’s rule for the Sylvester-type matrix equation (5.1) has the following algorithm. Algorithm 4.

(1)

(i) For the components xij .

b = A∗ CAη and A b η = Aη∗ Cη A. 1. Compute the matrices A

2. Compute the row vectors vi.η by (5.6) for all i = 1, . . . , n, or the column vectors v.j by (5.7) for all j = 1, . . . , n. (1)

3. Finally, find xij by (5.4) or (5.5), respectively to the vectors from the above point.

96

Ivan I. Kyrchei (2)

(ii) For the components xij . 1. Compute the matrices B1 = A∗ BM∗ , A∗ A, MM∗ . 2. Compute the column vectors ϕM .q by (5.10) for all q = 1, . . ., m, or A the row vectors ϕi. by (5.11) for all i = 1, . . ., n. 3. Compute the matrix Φ = (φik ) by one from the two cases in Eq.(5.9) accordingly to whether the column vectors ϕM .q , or the row vectors A ϕi. are obtained in the above point. ˜ = Φη C∗ A. 4. Compute the matrix Φ (2)

5. Finally, find xij by (5.8). (3)

(iii) For the components xij . 1. Compute the matrices S1 = A∗ SB∗ and B∗1 = MB∗ A. 2. Compute the matrix Ω by using (5.14). 3. Compute the matrix Ψ by (5.16). 4. Then, we have two different options: (a) Compute the matrix C1 = Cη Ψ. (b) Compute the matrix Ψ1 by (5.15) (c) Compute the matrix Ω1 = ΩΨ1 . (3)

(d) Then, find xij by (5.12). or (a) Compute the matrix C2 = ΩC. (b) Compute the matrix Ω2 by (5.17) (c) Compute the matrix Ψ2 = Ω2 Ψ. (3)

(d) Then, find xij by (5.13). (1)

(iv) For the components ygf . b = M∗ CBη and C b η. 1. Compute the matrices C

2. Compute the column vectors u .g by (5.21) for all g = 1, . . ., k, or the row vectors vp . by (5.20) for all p = 1, . . ., k. (1)

3. Finally, find ygf by (5.18) or (5.19) that correspond to the vectors from the above point.

Cramer’s Rules for Sylvester-Type Matrix Equations

97

(2)

(v) For the components ygf . ˇ := B∗ CMη and and C ˇ η. 1. Compute the matrices C 2. Compute the row vectors µt. by (5.25) for all t = 1, . . ., k, or the column vectors λ.g by (5.24) for all g = 1, . . ., k. 3. Compute the matrix Υ = (υtg ) by one from the two cases in Eq. (5.23) accordingly to whether the column vectors λ.g , or the row vectors µt. are obtained in the above point. e = S∗ SΥ. 4. Compute the matrix Υ (2)

5. Finally, find ygf by (5.22).

Now consider Eq. (5.1) with the restriction C = Cη∗ . Note that Eq. (5.1) has an η-Hermitian pair solution X, Y if and only if the matrix system ( b η∗ + BYB b η∗ = C, AXA (5.32) b η∗Aη∗ + BY b η∗ Bη∗ = C, AX

b Y, b where X, b Y b may not be η-Hermitian matrices. If the has a pair solution X, system (5.32) is consistent, then 1 b b η∗ ), Y = 1 (Y b +Y b η∗ ). +X X = (X 2 2

It’s evident that the system (5.32 ) is equivalent to Eq. (5.1) with the restriction b η∗ + BYB b η∗ = C, C = Cη∗ . AXA

(5.33)

b Y, b then Eq. (5.1) is consistent for X, Y. So, if (5.33) is consistent for X, By Lemma 5.1,  η∗  η∗ b =A†C(A†)η∗ − A†BM† C A† X − A†SB† C A† BM†  η∗ −A†SW2 A†S + LA U + ZLη∗ A,  η∗  η∗ η η b =M†C B† + QS B† C M† + LM W2 LM + VLB + LM LS W1 , Y

98

Ivan I. Kyrchei

So, 1 b b η∗ ) = A†C(A†)η∗ X = (X +X 2  η∗  η∗ i 1h † − A BM†C A† + A† C A†BM† 2  η∗  η∗ i 1h † − A BM†C A†SB† + A† SB† C A†BM† 2  η∗ − A†SW2 A†S + LA U + (LA U)η∗ , h  η∗  η∗ i 1 b b η∗ ) = 1 M†C B† +Y + B† C M† Y = (Y 2 2  η∗ i 1 h †  † η∗ η M C B QS + QS B† C M† + 2 + LM W2 LηM + VLηB + LB V η∗ + LM LS W1 + W1η∗ LηS LηM ,

(5.34)

(5.35)

where W1 , U, V, and W2 = W2η∗ are arbitrary matrices over H with an appropriate sizes. So, we proved the following proposition that first has been obtained in [70]. Corollary 5.1. [70] Let A ∈ Hm×n , B ∈ Hm×k , C = Cη∗ be given. Set M = RA B, S = BLM . Then the following statements are equivalent: (1) The matrix equation (5.1) has a pair of η-Hermitian solutions X = Xη∗ and Y = Y η∗ . (2) RM RA C = 0, RA CRη∗ B = 0. (3) rank

(5.36)

  A C = rank(A) + rank(B), rank [A B C] = rank [A B]. 0 Bη∗

In this case, the η-Hermitian solution Xη∗ = X, Y = Y η∗ to (5.1) can be expressed as (5.34)-(5.35). Similarly, an η-skew-Hermitian pair solution (X, Y) to Eq. (5.1) with the restriction C = −Cη∗ can be expressed as 1 b b η∗ ), Y = 1 (Y b −Y b η∗ ), X = (X −X 2 2

b Y) b is a pair solution to the system (5.32) that may not be η-skewwhere (X, Hermitian matrices. We have the following corollary.

Cramer’s Rules for Sylvester-Type Matrix Equations

99

Corollary 5.2. Let A ∈ Hm×n , B ∈ Hm×k , C = −Cη∗ be given. Set M = RA B, S = BLM . Then the following statements are equivalent: (1) The matrix equation (5.1) has a pair of η-skew-Hermitian solutions X = −Xη∗ and Y = −Y η∗ . (2) RM RA C = 0, RA CRη∗ B = 0.   A C (3) rank = rank(A) + rank(B), rank [A B C] = rank [A B]. 0 Bη∗ In this case, the η-skew-Hermitian solution X, Y to Eq. (5.1) can be expressed as  η∗  η∗ i 1h † A BM† C A† + A†C A† BM† X = −Xη∗ = A†C(A†)η∗ − 2  η∗  η∗ i 1h † † † † A BM C A SB + A†SB† C A† BM† − 2  η∗ − A†SW2 A†S − LA U + (LA U)η∗ , (5.37) h     i η∗ η∗ 1 Y = −Y η∗ = M† C B† + B† C M† 2  η∗ i 1 h †  †η∗ η MC B QS + QS B† C M† + 2 + LM W2 LηM + VLηB − LB V η∗ + LM LS W1 − W1η∗ LηS LηM , (5.38) By putting W1 , U, V, and W2 as zero-matrices with compatible dimensions in (5.34)-(5.35) and (5.37)-(5.38), we obtain the following partial η-Hermitian and η-skew-Hermitian solution to Eq. (5.1) with the restrictions C = Cη∗ and C = −Cη∗ , respectively,  η∗  η∗ i 1h † X =A†C(A†)η∗ − A BM†C A† + A†C A†BM† 2  η∗  η∗ i 1h † − A SB† C A†BM† + A† BM†C A†SB† , (5.39) 2 h     i η∗ η∗ 1 Y = M† C B† + B†C M† 2  η∗ i 1 h †  † η∗ η + M C B PS + PS B† C M† . (5.40) 2 The determinantal representations of (5.39)-(5.40) can be obtained by components as follows

100

Ivan I. Kyrchei

• for the η-Hermitian solution,     1 (3) 1 (2) (1) (2) (3) xij =xij − xij − ηxji η − xij − ηxji η , 2 2     1 (1) 1 (2) (1) (2) y − ηygp η , ypg = ypg − ηygp η + 2 2 pg

(5.41) (5.42)

• for the η-skew-Hermitian solution,     1 (2) 1 (3) (1) (2) (3) x + ηxji η − x + ηxji η , xij =xij − 2 ij 2 ij     1 (2) 1 (1) (1) (2) y + ηygp η , ypg = ypg + ηygp η + 2 2 pg for all i, j = 1, . . . , n and p, g = 1, . . . , k, where xij and ypg are determined by Theorem 5.2.

6.

A N E XAMPLE

Given the matrices:       i k k j 1 + 2j i + 2k A= , B= , C= . j −1 −j k −i + 2k −1

(6.1)

Since C = Cη∗ with η = i, we shall find the i-Hermitian solution to Eq. (5.1) with the given matrices (6.1). By Theorem 2.10, one can find,       1 −i −j 1 1 k 1 −k j † † A = , RA = , B = , 4 −k −1 2 −k 1 4 −j −k     1 1 i 1 i + k −1 + j , M= , RB = 2 −i 1 2 1−j i+k     1 −i − k 1 + j 1 −1 −k † M = , RM = . 1 4 −1 − j −i − k 2 k It is easy to check that the consistency conditions (5.36) of Eq. (5.1) are fulfilled by given matrices. So, Eq. (5.1) has η-Hermitian solutions. We compute the

Cramer’s Rules for Sylvester-Type Matrix Equations

101

partial solution (5.2)-(5.3) by Cramer’s rule from Theorem 5.2. So,       −i −j i −k 2 2j ∗ η ∗ A = , A = , A A= , −k −1 −j −1 −2j 2   b = A∗ CAη = 6 − 4j −4 − 6j . A −4 − 6j −6 + 4j

Since rank(A∗ A) = 1, then by (5.6),   η   η v1. = 6 + 4j −4 + 6j , v2. = −4 + 6j −6 − 4j .

Further, by (5.4), we obtain

−i(6 + 4j)i = 0.375 − 0.25j, 16 −i(−4 + 6j)i (1) x12 = = −0.25 − 0.375j, 16 −i(−4 + 6j)i (1) x21 = = −0.25 − 0.375j, 16 −i(−6 − 4j)i (1) x22 = = −0.375 + 0.25j. 16   2k −2 (2) ∗ ∗ Now, we find xij by (5.8). Since B1 = A BM = , then −2i 2j (1)

x11 =

A ϕA 1. = [2k, − 2] , ϕ2. = [−2i, 2j] .

Taking into account (5.9), we get Φ =   4 4j . Finally, we have 4j −4

(2)

 2k −2 e = Φη C∗ A = and Φ −2i 2j

−i(4)i −i(4j)i) (2) = 0.125, x12 = = −0.125j, 32 32 −i(4j)i) −i(−4)i (2) = = −0.125j, x22 = = −0.125. 32 32 (2)

x11 = x21



102

Ivan I. Kyrchei (3)

Since S = 0 and, ergo, xij = 0 for all i, j = 1, 2, then by (5.41) 1 (2) (2) (1) x11 =x11 − (x11 − ix11 i) = 0.25 − 0.25j, 2 1 (2) (1) (2) x12 =x12 − (x12 − ix21 i) = −0.25 − 0.25j, 2 1 (2) (1) (2) x21 =x21 − (x21 − ix12 i) = −0.25 − 0.25j, 2 1 (2) (1) (2) x22 =x22 − (x22 − ix22 i) = −0.25 + 0.25j. 2 Further, we find ypg by (5.42) for all p, g = 1, 2. Since       2j −2k 1 i 2 2i ∗ η ∗ ∗ ˆ C = M CB = , M M= , B B= −2k −2j −i 1 −2i 2 and by (5.20) v1. = [2j, − 2k] , v2. = [−2k, − 2j] , then by (5.18) we have (1)

2j = 0.25j, 8 −2k = = −0.25k, 8 −2j = = −0.25j. 8

y11 = (1)

(1)

y21 = y12

(1)

y22 (1)

(1)

(2)

Taking into account yij = −iyji i and yij = 0 for all i, j = 1, 2, it follows that     0.25 − 0.25j −0.25 − 0.25j 0.25j −0.25k X= , Y= −0.25 − 0.25j −0.25 + 0.25j −0.25k −0.25j is the partial i-Hermitian solution to Eq.(5.1) with the given matrices (6.1).

Cramer’s Rules for Sylvester-Type Matrix Equations

103

R EFERENCES [1] Baksalary, J.K., Kala, R. (1980). “The matrix equation AXB − CY D = E”. Linear Algebra and its Applications, 30: 141-147. [2] Wang, Q.W. (2004). “A system of matrix equations and a linear matrix equation over arbitrary regular rings with identity”. Linear Algebra and its Applications, 384: 43-54. [3] Wang, Q.W.,Van der Woude, J.W. and Chang, H.X. (2009). “A system of real quaternion matrix equations with applications”. Linear Algebra and its Applications, 431(1): 2291-2303. [4] Tian, Y. (2006). “Ranks and independence of solutions of the matrix equation AXB + CY D = M ”. Acta Mathematica Universitatis Comenianae, 1: 75-84. [5] Wang, Q.W., Zhang, H.S. and Yu, S.W. (2008). “On solutions to the quaternion matrix equation AXB + CY D = E”. Electronic Journal of Linear Algebra, 17: 343-358. [6] Liao, A.P., Bai, Z.Z. and Lei, Y. (2006). “Best approximate solution of matrix equation AXB + CY D = E.” SIAM Journal on Matrix Analysis and Applications, 27(3): 675-688. [7] Lin, Y. and Wang, Q.W. (2013). “Iterative solution to a system of matrix equations”. Abstract and Applied Analysis, ID 124979, 7 p. [8] Yin, F. and Huang, G.-X. (2012). “An iterative algorithm for the least squares generalized reflexive solutions of the matrix equations AXB = E, CXD = F ”. Abstract and Applied Analysis, ID 857284, 18 p. [9] Huang, L. (1996). “The matrix equation AXB − GXD = E over the quaternion field”. Linear Algebra and its Applications, 234: 197-208. [10] He, H.Z., Cai, G.-B. and Han, X.-J. (2014). “Optimal pole assignment of linear systems by the Sylvester matrix equations”. Abstract and Applied Analysis, ID 301375, 7 p. [11] He, H.Z., Wang, Q.W. and Zhang, Y. (2018). “A system of quaternary coupled Sylvester-type real quaternion matrix equations”. Automatica, 87: 25-31.

104

Ivan I. Kyrchei

[12] He, H.Z. and Wang, Q.W. (2017). “A System of periodic discrete-time coupled Sylvester quaternion matrix equations”. Algebra Colloquium, 24(1): 169-180. [13] Wang, Q.W., Rehman, A., He, H.Z. and Zhang, Y. (2016). “Constraint generalized Sylvester matrix equations”. Automatica, 69: 60-64. [14] Rehman, A., Wang, Q.W., Ali, I., Akram, M. and Ahmad, M.O. (2017). “A constraint system of generalized Sylvester quaternion matrix equations”. Advances in Applied Clifford Algebras, 27(4): 3183-3196. [15] Futorny, V., Klymchuk, T. and Sergeichuk, V.V. (2016). “Roth’s solvability b = C and X −AXB b = C over criteria for the matrix equations AX − XB the skew field of quaternions with an involutive automorphism q → qb”. Linear Algebra and its Applications, 510: 246-258.

[16] S¸ims¸ek, S., Sarduvan, M. and `‘Ozdemir, H. (2017). “Centrohermitian and skew-centrohermitian solutions to the minimum residual and matrix nearness problems of the quaternion matrix equation (AXB, DXE) = (C, F )”. Advances in Applied Clifford Algebras, 27(3): 2201-2214.

[17] Song, C., Chen, G. (2011). “On solutions of matrix equation XF − AX = ˜ = C over quaternion field”. Journal of Applied MatheC and XF − AX matics and Computing, 37: 57-68. [18] Song, C., Chen, G., Liu, Q. (2012). “Explicit solutions to the quaternion ¯ = C”. International matrix equations X − AXF = C and X − AXF Journal of Computer Mathematics, 89: 890-900. [19] Yuan, S.F. and Liao A.P. (2011). “Least squares solution of the quater¯ = C with the least norm”. Linear and nion matrix equation X − AXB Multilinear Algebra, 59: 985-998. [20] Yuan, S.F., Wang, Q.W., Yu, Y.B. and Tian, Y. (2017). “On hermitian solutions of the split quaternion matrix equation AXB + CXD = E”. Advances in Applied Clifford Algebras, 27(4): 3235-3252. [21] Chang, X.W. and Wang, J.S. (1993). “The symmetric solution of the matrix equations AX + XA = C, AXAT + BY B T = C, and (AT XA, B T XB) = (C, D).” Linear Algebra and its Applications, 179: 171-189.

Cramer’s Rules for Sylvester-Type Matrix Equations

105

[22] Xu, G.P., Wei, M.S. and Zheng, D.S. (1998). “On solutions of matrix equation AXB + CY D = F .” Linear Algebra and its Applications, 279: 93109. [23] Zhang, X. (2004). “The general Hermitian nonnegative-deffinite and positive-definite solutions to the matrix equation GXG∗ + HY H ∗ = C”. Journal of Applied Mathematics and Computing, 14: 51-67. [24] Yuan, S., Liao, A. and Yao, G. (2011). “The matrix nearness problem assocated with the quaternion matrix equation AXAH + BY B H = C”. Journal of Applied Mathematics and Computing, 37: 133-144. [25] Wang, Q.W. and Zhang, F. (2008). “The reflexive re-nonnegative definite solution to a quaternion matrix equation”. Electronic Journal of Linear Algebra, 17: 88-101. [26] Wang, Q.W. and Jiang, J. (2010). “Extreme ranks of (skew-)hermitian solutions to a quaternion matrix equation”. Electronic Journal of Linear Algebra, 20: 552-573. [27] Took, C.C. and Mandic, D.P. (2011). “Augmented second-order statistics of quaternion random signals.” Signal Processing., 91: 214-224. [28] Took, C.C., Mandic, D.P. and Zhang, F. (2011). “On the unitary diagonalisation of a special class of quaternion matrices.” Applied Mathematics Letters, 24: 1806-1809. [29] Horn, R.A. and Zhang, F. (2012). “A generalization of the complex Autonne-Takagi factorization to quaternion matrices.” Linear and Multilinear Algebra, 60(11-12): 1239–1244. [30] Yuan, S.F. and Wang, Q.W. (2012). “Two special kinds of least squares solutions for the quaternion matrix equation AXB + CXD = E”. Electronic Journal of Linear Algebra, 23: 257–274 [31] Rodman, L. (2014). “Topics in Quaternion Linear Algebra”. Princeton University Press, Princeton. [32] Aghamollaei, G., Rahjoo, M. (2018). “On quaternionic numerical ranges with respect to nonstandard involutions”. Linear Algebra and its Applications, 540: 11-25.

106

Ivan I. Kyrchei

[33] He, Z.H. (2019). “Structure, properties and applications of some simultaneous decompositions for quaternion matrices involving φ-skewHermicity”. Advances in Applied Clifford Algebras, 29:6. [34] He, Z.H., Liu, J.and Tam, T.Y. (2017). “The general φ-Hermitian solution to mixed pairs of quaternion matrix Sylvester equations”. Electronic Journal of Linear Algebra, 32: 475-499. [35] Liu, X. (2018). “The η-anti-Hermitian solution to some classic matrix equations.” Applied Mathematics and Computation, 320: 264-270. [36] Bapat, R.B., Bhaskara, K.P.S. and Prasad, K.M. (1990). “Generalized inverses over integral domains”. Linear Algebra and its Applications, 140: 181-196. [37] Stanimirovic, P.S. (1996). “General determinantal representation of pseudoinverses of matrices”. Matemati˘cki Vesnik, 48: 1-9. [38] Kyrchei, I.I. (2015). “Cramer’s rule for generalized inverse solutions”. In: Kyrchei, I.I. (Ed.), Advances in Linear Algebra Research, pp. 79-132. Nova Science Publ., New York. [39] Kyrchei, I.I. (2007). “Cramer’s rule for quaternion systems of linear equations”. Fundamentalnaya i Prikladnaya Matematika, 13(4): 67-94. [40] Kyrchei, I.I. (2012). “The theory of the column and row determinants in a quaternion linear algebra”. In: Baswell, A.R. (Ed.), Advances in Mathematics Research 15, pp. 301-359. Nova Sci. Publ., New York. [41] Kyrchei, I.I. (2012). “Determinantal representations of the Moore-Penrose inverse over the quaternion skew field and corresponding Cramer’s rules”. Journal of Mathematical Sciences, 180(1): 23-33. [42] Kyrchei, I.I. (2014). “Determinantal representations of the Drazin inverse over the quaternion skew field with applications to some matrix equations”. Applied Mathematics and Computation, 238: 193-207. [43] Kyrchei, I.I. (2014). “Determinantal representations of the W-weighted Drazin inverse over the quaternion skew field”. Applied Mathematics and Computation, 264: 453-465.

Cramer’s Rules for Sylvester-Type Matrix Equations

107

[44] Kyrchei, I.I. (2017). “Weighted singular value decomposition and determinantal representations of the quaternion weighted Moore-Penrose inverse”. Applied Mathematics and Computation, 309: 1-16. [45] Kyrchei, I.I. (2017). “Explicit representation formulas for the minimum norm least squares solutions of some quaternion matrix equations”. Linear Algebra and its Applications, 438(1): 136-152. [46] Kyrchei, I.I. (2017). “Determinantal representations of the quaternion weighted Moore-Penrose inverse and its applications”. In: Baswell, A.R. (Ed.), Advances in Mathematics Research 23, pp.35–96. Nova Sci. Publ., New York. [47] Kyrchei, I.I. (2018). “Explicit determinantal representation formulas for the solution of the two-sided restricted quaternionic matrix equation”. Journal of Applied Mathematics and Computing, 58(1-2): 335-365. [48] Kyrchei, I.I. (2016). “Explicit determinantal representation formulas of W-weighted Drazin inverse solutions of some matrix equations over the quaternion skew field”. Mathematical Problems in Engineering, ID 8673809, 13 p. [49] Kyrchei, I.I. (2017). “Determinantal representations of the Drazin and W-weighted Drazin inverses over the quaternion skew field with applications”. In: Griffin, S. (Ed.), Quaternions: Theory and Applications, pp.201-275. Nova Sci. Publ., New York. [50] Kyrchei, I.I. (2018). “Cramer’s rules for Sylvester quaternion matrix equation and its special cases.” Advances in Applied Clifford Algebras, 28(5):90. [51] Kyrchei, I.I. (2019). “Determinantal representations of general and (skew)Hermitian solutions to the generalized Sylvester-type quaternion matrix equation”. Abstract and Applied Analysis, ID 5926832, 14 p. [52] Kyrchei, I.I. (2019). “Cramer’s Rules of η-(skew-) Hermitian solutions to the quaternion Sylvester-type matrix equations”. Advances in Applied Clifford Algebras, 29(3):56.

108

Ivan I. Kyrchei

[53] Kyrchei, I.I. (2018). “Determinantal representations of solutions to systems of quaternion matrix equations”. Advances in Applied Clifford Algebras, 28(1):23. [54] Kyrchei, I.I. (2018). “Determinantal representations of solutions and Hermitian solutions to some system of two-sided quaternion matrix equations”. Journal of Mathematics, ID 6294672, 12 p. [55] Kyrchei, I.I. (2019). “Determinantal representations of solutions to systems of two-sided quaternion matrix equations.” Linear and Multilinear Algebra, 1–25. Doi: 10.1080/03081087.2019.1614517 [56] Kyrchei, I.I. (2018). “Cramer’s rules for the system of two-sided matrix equations and of its special cases”. In: Yasser, H.A. (Ed.), Matrix Theory – Applications and Theorems, pp. 3-20. IntechOpen, London. [57] Kyrchei, I.I. (2019). “Cramer’s rules for the system of quaternion matrix equations with η-Hermicity”. 4open, 2:24. [58] Song, G.J. (2012). “Determinantal representation of the generalized inverses over the quaternion skew field with applications”. Applied Mathematics and Computation, 219(2): 656-667. [59] Song, G.J. (2013). “Bott-Duffin inverse over the quaternion skew field with applications”. Journal of Applied Mathematics and Computing, 41(1-2): 377-392. [60] Song, G.J., Wang, Q.W. and Chang, H.X. (2011). “Cramer rule for the unique solution of restricted matrix equations over the quaternion skew field”. Computers and Mathematics with Applications, 61: 1576-1589. [61] Song, G.J., Wang, Q.W. and Yu, S.W. (2018). “Cramer’s rule for a system of quaternion matrix equations with applications”. Applied Mathematics and Computation, 336: 490-499. [62] Dieudonne, J. (1943). “Les determinantes sur un corps non-commutatif”. Bulletin de la Soci´et´e Math´ematique de France, 71: 27-45. [63] Study, E. (1920). “Zur Theorie der linearen Gleichungen”. Acta Mathematica, 42: 1-61.

Cramer’s Rules for Sylvester-Type Matrix Equations

109

[64] Dyson, F.J. (1972). “Quaternion determinants”. Helvetica Physica Acta, 45: 289-302. [65] Chen, L. (1991). “Definition of determinant and Cramer solutions over quaternion field”. Acta Mathematica Sinica (N.S.), 7: 171-180. [66] Gelfand, I., Gelfand, S., Retakh, V. and Wilson, R.L. (2005). “Quasideterminants”. Advances in Mathematics, 193: 56-141. [67] Aslaksen, H. (1996). “Quaternionic determinants”. Mathematical Intelligencer, 18(3): 57-65. [68] Cohen, N. and De Leo, S. (2000). “The quaternionic determinant”. Electronic Journal of Linear Algebra, 7: 100-111. [69] Maciejewski, A.A. and Klein, C.A. (1985). “Obstacle avoidance for kinematically redundant manipulators in dynamically varying environments“. The International Journal of Robotics Research, 4(3): 109-117. [70] He, Z.H. and Wang, Q.W. (2013). “A real quaternion matrix equation with applications”. Linear and Multilinear Algebra, 61(6): 725-740.

In: Hot Topics in Linear Algebra Editor: Ivan I. Kyrchei

ISBN: 978-1-53617-770-1 c 2020 Nova Science Publishers, Inc.

Chapter 3

B I CR A LGORITHM FOR C OMPUTING G ENERALIZED B ISYMMETRIC S OLUTIONS OF G ENERAL C OUPLED M ATRIX E QUATIONS Masoud Hajarian∗ Department of Mathematics, Faculty of Mathematical Sciences, Shahid Beheshti University, General Campus, Evin, Tehran, Iran

Abstract The linear matrix equations have a wide range of application as, e.g., in the model of the vibrating structures, the system and control theory. In this chapter, we deal with the generalized bisymmetric solution pair (X, Y ) of the general coupled matrix equations  Pf Pgi=1 (Ai XBi + Ci Y Di ) = M, j=1 (Ej XFj + Gj Y Hj ) = N.

We develop the Hestenes-Stiefel (HS) version of biconjugate residual (BiCR) algorithm to establish a matrix algorithm for finding the generalized bisymmetric solution pair (X, Y ) of the general coupled matrix equations in a finite number of iterations in the absence of round-off errors. Finally, numerical comparisons are made with the CGNR and CGNE algorithms, which verify the effectiveness and applicability of the established algorithm. ∗

Corresponding Author’s Email: m [email protected].

112

Masoud Hajarian

Keywords: HS version of BiCR algorithm, generalized bisymmetric solution, general coupled matrix equations, CGNR algorithm, CGNE algorithm AMS Subject Classification: 15A24; 39B42; 65F10; 65F30

1.

INTRODUCTION

Notation. Throughout this chapter, the symbols tr(A), ||A||, κ(A) and R(A) stand for the trace, the Frobenius norm, the condition number and the column space of a matrix A ∈ Rm×n , respectively. For a matrix A ∈ Rm×n , the socalled stretching function vec(A) is defined by vec(A) = (aT1 , aT2 , ..., aTn )T , where ak is the k-th column of A. The notation A ⊗ B stands for the Kronecker product of matrices A and B. A matrix P ∈ Rn×n is called a symmetric orthogonal matrix if P = P T and P 2 = I. Throughout, we always assume that P ∈ Rn×n and Q ∈ Rm×m are given symmetric orthogonal matrices. If A = AT = P AP then A ∈ Rn×n is called a generalized bisymmetric matrix with respect to P . The symbol BSRPn×n denotes the set of n × n generalized bisymmetric matrices with respect to P . It is obvious that every symmetric matrix A ∈ Rn×n is also a generalized bisymmetric matrix with respect to I. The linear matrix equations appear frequently in a wide variety of engineering applications and are of growing interest [1, 2, 3]. For instance, when finite element techniques are applied to model of the vibrating structures such as highways, bridges, buildings and automobiles, we face to (coupled) Sylvester matrix equations over symmetric matrices, for more details see [4, 5, 6] and references therein. To solve the problem of updating damping and stiffness matrices using symmetric eigenstructure assignment we need to find the symmetric solutions of the Sylvester matrix equations [5]. The generalized Sylvester matrix equation AV − V F = BW [7] is closely related with pole assignment by state feedback [8], eigenstructure assignment design [9], Luenbergertype observer design [10] and output regulation and constrained control [11]. In recent 10 years, much attention has been given in the literature to the development of direct and iterative methods for solving linear matrix equations [12, 13, 14, 15, 16, 17]. Li et al. introduced the successive projection iterative method to solve the linear matrix equation AX = B [18]. Deng et al. proposed an orthogonal direction method for computing the Hermitian minimum norm solution or the linear matrix equation AXB = C [19]. In [20], the matrix form of the LSQR algorithm was presented for computing the minimum

BiCR Algorithm for Computing Generalized Bisymmetric Solutions ... 113 norm least-squares solution of AXB + CY D = E. Li and Huang established the LSQR iterative method for solving the generalized coupled Sylvester matrix equations [21]. In [22, 23, 24, 25, 26, 17], the generalized forms of the CGNE and CGNR algorithms were presented for solving several constrained matrix equations. In [27, 28, 29, 30, 31, 32, 33, 34], several gradient-based iterative (GI) algorithms were introduced for computing the solutions of various linear matrix equations. The Krylov subspace methods are one of the widely used and successful classes of numerical methods to find the solutions of linear matrix equations [35, 36, 37, 38, 39, 40, 41, 42]. The goal of this chapter is to generalize the HS version of BiCR algorithm for computing the generalized bisymmetric solutions X ∈ BSRPn×n and Y ∈ m×m of the general coupled matrix equations BSRQ ( P f (A XBi + Ci Y Di) = M, Pgi=1 i j=1 (Ej XFj + Gj Y Hj ) = N,

(1)

where Ai ∈ Rp×n , Bi ∈ Rn×q , Ci ∈ Rp×m , Di ∈ Rm×q , Ej ∈ Rs×n , Fj ∈ Rn×t , Gi ∈ Rs×m , Hj ∈ Rm×t , M ∈ Rp×q and N ∈ Rs×t for i = 1, 2, ..., f and j = 1, 2, ..., g. A brief synopsis is as follows. In the next section, first we introduce the generalized HS version of BiCR algorithm to solve the general coupled matrix equations (1) over the generalized bisymmetric matrices. Then we prove that the introduced algorithm converges to the generalized bisymmetric solutions of (1) in a finite number of iterations in the absence of round-off errors. To assess the ability and accuracy of the introduced algorithm, we present two numerical examples in Section 3. Section 3 gives conclusion.

2.

MAIN R ESULTS

We start with recalling the HS version of BiCR algorithm for solving nonsymmetric linear system Ax = b. There exist several ideas to compute the solution of nonsymmetric linear system Ax = b. Vespucci and Broyden in [43] proposed different versions of BiCR algorithm without any convergence study. In [43], numerical examples were only used to test the efficiency of proposed algorithms. One of the proposed BiCR algorithms for solving Ax = b is the HS version of BiCR algorithm that can be summarized as follows:

114

Masoud Hajarian

HS version of BiCR algorithm Initial values: x(1) and s(1) arbitrary, r(1) = Ax(1) − b, u(1) = s(1), v(1) = r(1), w(1) = Au(1), and z(1) = AT v(1). Recursions: T As(k) , x(k + 1) = x(k) − α(k)u(k), α(k) = r(k) w(k)T w(k) r(k + 1) = r(k) − α(k)w(k), T As(k) β(k) = r(k) , s(k + 1) = s(k) − β(k)z(k), z(k)T z(k) r(k+1)T As(k+1)

γ(k) = r(k)T As(k) , u(k + 1) = s(k + 1) + γ(k)u(k), v(k + 1) = r(k + 1) + η(k)v(k), w(k + 1) = Au(k + 1), z(k + 1) = AT v(k + 1). If by applying the Kronecker product and stretching function, the problem of finding the generalized bisymmetric solutions of (1) is transformed into the following system   P Pf f   BiT ⊗ Ai DiT ⊗ Ci vec(M ) i=1 i=1 P P   f f  vec(M T ) A ⊗ BiT C ⊗ DiT      Pf i=1 T i Pf i=1 T i  vec(M )   i=1 Bi P ⊗ Ai P Di Q ⊗ Ci Q  i=1    Pf    Pf T T  vec(X) vec(M T )  Q P C Q ⊗ D A P ⊗ B i i   P   i i i=1 i=1 P g g  vec(Y ) =  vec(N ) ,  T ⊗G T ⊗E H F j j    Pj=1 j j Pj=1 {z }    vec(N T )  g g T | G ⊗ H  P j=1 Ej ⊗ FjT   j x j Pg j=1 T   g  vec(N )  H Q ⊗ G Q   j=1 FjT P ⊗ Ej P j Pgj=1 j Pg T T vec(N T ) G Q ⊗ H P E P ⊗ F j j j Q j j=1 j=1 | {z } | {z } A

b

(2) then we can use the HS version of BiCR algorithm for solving the system (2). But obviously the size of the coefficient matrix of the system (2) is large compared to the size of the coefficient matrices of the general coupled matrix equations (1). Therefore this approach is not advisable. In order to prevent the enlargement of the size of the coefficient matrices, we substitute the vector parameters of the HS version of BiCR algorithm with new matrix parameters. Indeed, we present the matrix form of the HS version of BiCR algorithm for solving (1). By substituting the system (2) into the HS version of BiCR algorithm and defining the matrix parameters, we obtain the following algorithm for solving (1). Algorithm 1. Given the arbitrary matrices X(1) ∈ BSRPn×n , Y (1) ∈

BiCR Algorithm for Computing Generalized Bisymmetric Solutions ... 115 m×m m×m BSRQ and S1 (1) ∈ BSRPn×n , S2 (1) ∈ BSRQ ; Compute

R1 (1) =

f X (Ai X(1)Bi + Ci Y (1)Di ) − M,

R2 (1) =

i=1

g X (Ej X(1)Fj + Gj Y (1)Hj ) − N, j=1

U1 (1) = S1 (1), U2 (1) = S2 (1), V1 (1) = R1 (1), V2 (1) = R2 (1), f X (Ai U1 (1)Bi + Ci U2 (1)Di ),

W1 (1) =

i=1

Z1 (1) = +

W2 (1) =

g X

(Ej U1 (1)Fj + Gj U2 (1)Hj ),

j=1

f 1 X T [ (Ai V1 (1)BiT + Bi V1 (1)T Ai + P ATi V1 (1)BiT Q + P Bi V1 (1)T Ai P ) 4 i=1

g X (EjT V2 (1)FjT + Fj V2 (1)T Ej + P EjT V2 (1)FjT P + P Fj V2 (1)T Ej P )], j=1

Z2 (1) = +

g X

f 1 X T [ (Ci V1 (1)DiT + Di V1 (1)T Ci + QCiT V1 (1)DiT Q + QDi V1 (1)T Ci Q) 4 i=1

(GTj V2 (1)HjT + Hj V2 (1)T Gj + QGTj V2 (1)HjT Q + QHj V2 (1)T Gj Q)];

j=1

For k = 1, 2, ..., do: α(k) =

tr(R1 (k)T (

Pf

T Pg i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(R2 (k) ( j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj ))) , tr(W1 (k)T W1 (k)) + tr(W2 (k)T W2 (k))

X(k + 1) = X(k) − α(k)U1 (k),

Y (k + 1) = Y (k) − α(k)U2 (k),

R1 (k + 1) = R1 (k) − α(k)W1 (k), R2 (k + 1) = R2 (k) − α(k)W2 (k), β(k) =

tr(R1 (k)T (

Pf

i=1

Pg (Ej S1 (k)Fj + Gj S2 (k)Hj ))) j=1 , T T tr(Z1 (k) Z1 (k)) + tr(Z2 (k) Z2 (k))

(Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(R2 (k)T (

S1 (k + 1) = S1 (k) − β(k)Z1 (k),

S2 (k + 1) = S2 (k) − β(k)Z2 (k),

Pf tr(R1 (k + 1)T ( i=1 (Ai S1 (k + 1)Bi + Ci S2 (k + 1)Di ))) γ(k) = Pf Pg T tr(R1 (k) ( i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(R2 (k)T ( j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj ))) +

P tr(R2 (k + 1)T ( g (Ej S1 (k + 1)Fj + Gj S2 (k + 1)Hj ))) j=1 P Pg f tr(R1 (k)T ( i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(R2 (k)T ( j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj ))) U1 (k + 1) = S1 (k + 1) + γ(k)U1 (k),

U2 (k + 1) = S2 (k + 1) + γ(k)U2 (k),

V1 (k + 1) = R1 (k + 1) + γ(k)V1 (k), V2 (k + 1) = R2 (k + 1) + γ(k)V2 (k), W1 (k + 1) =

f X

i=1

Z1 (k + 1) =

(Ai U1 (k + 1)Bi + Ci U2 (k + 1)Di ), W2 (k + 1) =

g X

(Ej U1 (k + 1)Fj + Gj U2 (k + 1)Hj ),

j=1

f 1 X T T T T T T [ (Ai V1 (k + 1)Bi + Bi V1 (k + 1) Ai + P Ai V1 (k + 1)Bi P + P Bi V1 (k + 1) Ai P ) 4 i=1

116

Masoud Hajarian +

g X

T

T

T

T

T

T

(Ej V2 (k + 1)Fj + Fj V2 (k + 1) Ej + P Ej V2 (k + 1)Fj P + P Fj V2 (k + 1) Ej P )],

j=1

Z2 (k + 1) =

+

g X

f 1 X [ (CiT V1 (k + 1)DiT + Di V1 (k + 1)T Ci + QCiT V1 (k + 1)DiT Q + QDi V1 (k + 1)T Ci Q) 4 i=1 T

T

T

T

T

T

(Gj V2 (k + 1)Hj + Hj V2 (k + 1) Gj + QGj V2 (k + 1)Hj Q + QHj V2 (k + 1) Gj Q)].

j=1

Stopping criterion. To check convergence, we use the stopping criterion p ||R1(k)||2 + ||R2(k)||2 ≤ tol,

where tol is a chosen fixed threshold.

Remark 1. If in the above algorithm, we set P = Q = I then this algorithm compute the symmetric solution pair (X, Y ) of the general coupled matrix equations (1). Remark 2. Notice that we have by Algorithm 1: γ(k) =

+

=

P tr(R1 (k + 1)T ( f i=1 (Ai S1 (k + 1)Bi + Ci S2 (k + 1)Di )))+ P Pg f tr(R1 (k)T ( i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(R2 (k)T ( j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj )))

P tr(R2 (k + 1)T ( g (Ej S1 (k + 1)Fj + Gj S2 (k + 1)Hj ))) j=1 P Pg f tr(R1 (k)T ( i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(R2 (k)T ( j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj ))) tr(R1 (k)T ( tr(R1 (k)T (

Pf

T i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(R2 (k) (

Pf

T i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(R2 (k) (

Pg

j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj )))

Pg

j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj )))

P T Pg tr(R1 (k)T ( f i=1 (Ai Z1 (k)Bi + Ci Z2 (k)Di ))) + tr(R2 (k) ( j=1 (Ej Z1 (k)Fj + Gj Z2 (k)Hj ))) −β(k) P Pg f tr(R1 (k)T ( i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(R2 (k)T ( j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj ))) −α(k)

T Pg i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(W2 (k) ( j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj ))) P Pg f tr(R1 (k)T ( i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(R2 (k)T ( j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj )))

tr(W1 (k)T (

+α(k)β(k)

= 1−



tr(W1 (k)T (

Pf

tr(R1 (k)T (

tr(R1 (k)T (

tr(W1 (k)T (

tr(W1 (k)T (

T i=1 (Ai Z1 (k)Bi + Ci Z2 (k)Di ))) + tr(W2 (k) (

Pf

T i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(R2 (k) (

Pg

j=1 (Ej Z1 (k)Fj + Gj Z2 (k)Hj )))

Pg

j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj )))

Pf

T Pg i=1 (Ai Z1 (k)Bi + Ci Z2 (k)Di ))) + tr(R2 (k) ( j=1 (Ej Z1 (k)Fj + Gj Z2 (k)Hj ))) T tr(Z1 (k) Z1 (k)) + tr(Z2 (k)T Z2 (k))

Pf

h tr(R1 (k)T ( +

×

Pf

T Pg i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(W2 (k) ( j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj ))) tr(W1 (k)T W1 (k)) + tr(W2 (k)T W2 (k))

Pf

T Pg i=1 (Ai S1 (k)Bi + Ci S2 (k)Di ))) + tr(R2 (k) ( j=1 (Ej S1 (k)Fj + Gj S2 (k)Hj ))) tr(Z1 (k)T Z1 (k)) + tr(Z2 (k)T Z2 (k))

Pf

T Pg i i=1 (Ai Z1 (k)Bi + Ci Z2 (k)Di ))) + tr(W2 (k) ( j=1 (Ej Z1 (k)Fj + Gj Z2 (k)Hj ))) , T tr(W1 (k) W1 (k)) + tr(W2 (k)T W2 (k))

BiCR Algorithm for Computing Generalized Bisymmetric Solutions ... 117

W1 (k + 1) =

f X

(AiS1 (k + 1)Bi + Ci S2 (k + 1)Di ) + γ(k)W1(k),

i=1

W2 (k + 1) =

g X (Ej S1 (k + 1)Fj + Gj S2 (k + 1)Hj ) + γ(k)W2(k), j=1

Z1 (k + 1) =

+

g X

f X

1 T T T T T T [ (Ai R1 (k + 1)Bi + Bi R1 (k + 1) Ai + P Ai R1 (k + 1)Bi P + P Bi R1 (k + 1) Ai P ) 4 i=1

(EjT R2 (k + 1)FjT + Fj R2 (k + 1)T Ej + P EjT R2 (k + 1)FjT P + P Fj R2 (k + 1)T Ej P )] + γ(k)Z1 (k),

j=1

Z2 (k + 1) =

+

g X

f 1 X T T T T T T [ (Ci R1 (k + 1)Di + Di R1 (k + 1) Ci + QCi R1 (k + 1)Di Q + QDi R1 (k + 1) Ci Q) 4 i=1

T T T T T (GT j R2 (k + 1)Hj + Hj R2 (k + 1) Gj + QGj R2 (k + 1)Hj Q + QHj R2 (k + 1) Gj Q)] + γ(k)Z2 (k).

j=1

Now some important properties of Algorithm 1 are given in the following lemma. Lemma 1. Suppose that the sequences {R1 (k)}, {R2 (k)}, {W1 (k)}, {W2 (k)}, {S1 (k)}, {S2 (k)}, {Z1 (k)} and {Z2 (k)} are generated by Algorithm 1. If there exists a positive integer number r such that W (k) 6= 0 and R(k) 6= 0 for all k = 1, 2, ..., r then tr(R1 (v)T W1 (u)) + tr(R2 (v)T W2 (u)) = 0, for u, v = 1, 2, ..., r,

u < v, (3)

tr(S1 (v)T Z1 (u)) + tr(S2 (v)T Z2 (u)) = 0, for u, v = 1, 2, ..., r,

u < v,

(4)

tr(Z1 (v)T Z1 (u)) + tr(Z2 (v)T Z2 (u)) = 0, for u, v = 1, 2, ..., r,

u 6= v,

(5)

T

T

tr(W1 (v) W1 (u)) + tr(W2 (v) W2 (u)) = 0, for u, v = 1, 2, ..., r,

u 6= v. (6)

Proof. Let us prove the lemma proceeding by induction on u and v. For v = 2 and u = 1, we can state tr(R1 (2)T W1 (1)) + tr(R2 (2)T W2 (1)) = tr((R1 (1) − α(1)W1(1))T W1 (1)) + tr((R2(1) − α(1)W2(1))T W2 (1)) = 0, tr(S1 (2)T Z1 (1)) + tr(S2 (2)T Z2 (1)) = tr((S1 (1) − β(1)Z1(1))T Z1 (1)) + tr((S2 (1) − β(1)Z2 (1))T Z2 (1)) = 0, tr(Z1 (2)T Z1 (1)) + tr(Z2 (2)T Z2 (1))

118

Masoud Hajarian f

1 X = tr(( [ (ATi R1 (2)BiT + Bi R1 (2)T Ai + P ATi R1 (2)BiT P + P Bi R1 (2)T Ai P ) 4 i=1 +

g X

(EjT R2 (2)FjT + Fj R2 (2)T Ej + P EjT R2 (2)FjT P + P Fj R2 (2)T Ej P )] + γ(1)Z1 (1))T Z1 (1))

j=1

+tr((

+

g X

f 1 X [ (CiT R1 (2)DiT + Di R1 (2)T Ci + QCiT R1 (2)DiT Q + QDi R1 (2)T Ci Q) 4 i=1

T T T T T T (GT j R2 (2)Hj + Hj R2 (2) Gj + QGj R2 (2)Hj Q + QHj R2 (2) Gj Q)] + γ(1)Z2 (1)) Z2 (1))

j=1

= tr((

f 1 X T T T [ (Ai (R1 (1) − α(1)W1 (1))Bi + Bi (R1 (1) − α(1)W1 (1)) Ai 4 i=1 T

T

T

+P Ai (R1 (1) − α(1)W1 (1))Bi P + P Bi (R1 (1) − α(1)W1 (1)) Ai P ) g X

+

T

T

T

(Ej (R2 (1) − α(1)W2 (1))Fj + Fj (R2 (1) − α(1)W2 (1)) Ej

j=1 T

T

T

T

+P Ej (R2 (1) − α(1)W2 (1))Fj P + P Fj (R2 (1) − α(1)W2 (1)) Ej P )]) Z1 (1))

+tr((

f 1 X [ (CiT (R1 (1) − α(1)W1 (1))DiT + Di (R1 (1) − α(1)W1 (1))T Ci 4 i=1

+QCiT (R1 (1) − α(1)W1 (1))DiT Q + QDi (R1 (1) − α(1)W1 (1))T Ci Q) +

g X

T T (GT j (R2 (1) − α(1)W2 (1))Hj + Hj (R2 (1) − α(1)W2 (1)) Gj

j=1 T T T +QGT j (R2 (1) − α(1)W2 (1))Hj Q + QHj (R2 (1) − α(1)W2 (1)) Gj Q)]) Z2 (1)) 2

2

+γ(1)[||Z1 (1)|| + ||Z2 (1)|| ] 2

2

= ||Z1 (1)|| + ||Z2 (1)|| −

tr(R1 (1)T (

Pf

T Pg i=1 (Ai S1 (1)Bi + Ci S2 (1)Di ))) + tr(R2 (1) ( j=1 (Ej S1 (1)Fj + Gj S2 (1)Hj ))) tr(W1 (1)T W1 (1)) + tr(W2 (1)T W2 (1))

×[tr((

f X

T

T

(Ai W1 (1)Bi ) +

i=1

+tr((

f X

i=1

g X

T

T

T

T

T

(Ej W2 (1)Fj )) Z1 (1))

j=1

T

T

(Ci W1 (1)Di ) +

g X

T

(Gj W2 (1)Hj )) Z2 (1))]

j=1

+[||Z1 (1)||2 + ||Z2 (1)||2 ] h × 1−



Pf Pg tr(R1 (1)T ( i=1 (Ai Z1 (1)Bi + Ci Z2 (1)Di ))) + tr(R2 (1)T ( j=1 (Ej Z1 (1)Fj + Gj Z2 (1)Hj ))) tr(Z1 (1)T Z1 (k)) + tr(Z2 (1)T Z2 (1))

Pf Pg tr(W1 (1)T ( i=1 (Ai S1 (1)Bi + Ci S2 (1)Di ))) + tr(W2 (1)T ( j=1 (Ej S1 (1)Fj + Gj S2 (1)Hj ))) tr(W1 (1)T W1 (1)) + tr(W2 (1)T W2 (1))

P P h tr(R1 (1)T ( f (Ai S1 (1)Bi + Ci S2 (1)Di ))) + tr(R2 (1)T ( g (Ej S1 (1)Fj + Gj S2 (1)Hj ))) i=1 j=1 + T T tr(Z1 (1) Z1 (1)) + tr(Z2 (1) Z2 (1))

BiCR Algorithm for Computing Generalized Bisymmetric Solutions ... 119 ×

tr(W1 (1)T (

Pf

i=1 (Ai Z1 (1)Bi

+ Ci Z2 (1)Di ))) + tr(W2 (1)T (

tr(W1

(1)T

W1 (1)) + tr(W2

Pg

j=1 (Ej Z1 (1)Fj

(1)T W

2 (1))

+ Gj Z2 (1)Hj ))) ii

= 0,

tr(W1 (2)T W1 (1)) + tr(W2 (2)T W2 (1)) = tr((

f X

T

(Ai S1 (2)Bi + Ci S2 (2)Di ) + γ(1)W1 (1)) W1 (1))

i=1

+tr((

g X

T

(Ej S1 (2)Fj + Gj S2 (2)Hj ) + γ(1)W2 (1)) W2 (1))

j=1

= tr((

f X

(Ai (S1 (1) − β(1)Z1 (1))Bi + Ci (S2 (1) − β(1)Z2 (1))Di ))T W1 (1))

i=1

+tr((

g X

T

(Ej (S1 (1) − β(1)Z1 (1))Fj + Gj (S2 (1) − β(1)Z2 (1))Hj )) W2 (1))

j=1

+γ(1)[||W1 (1)||2 + ||W2 (1)||2 ] 2

2

= ||W1 (1)|| + ||W2 (1)|| − β(1)[tr((

f X

T

(Ai Z1 (1)Bi + Ci Z2 (1)Di )) W1 (1))

i=1

+tr((

g X

T

(Ej Z1 (1)Fj + Gj Z2 (1)Hj )) W2 (1))

j=1 2

2

+[||W1 (1)|| + ||W2 (1)|| ] h tr(R1 (1)T ( × 1−



tr(W1 (1)T (

tr(W1 (1)T (

i=1 (Ai Z1 (1)Bi

i=1 (Ai S1 (1)Bi

Pf

i=1

+ Ci Z2 (1)Di ))) + tr(R2 (1)T (

Pg

j=1 (Ej Z1 (1)Fj

+ Gj Z2 (1)Hj )))

tr(Z1 (1)T Z1 (k)) + tr(Z2 (1)T Z2 (1))

Pf

h tr(R1 (1)T ( +

×

Pf

+ Ci S2 (1)Di ))) + tr(W2 (1)T (

Pg

+ Gj S2 (1)Hj )))

Pg

(Ej S1 (1)Fj + Gj S2 (1)Hj )))

j=1 (Ej S1 (1)Fj

tr(W1 (1)T W1 (1)) + tr(W2 (1)T W2 (1)) (Ai S1 (1)Bi + Ci S2 (1)Di ))) + tr(R2 (1)T ( tr(Z1

Pf

i=1 (Ai Z1 (1)Bi

(1)T

Z1 (1)) + tr(Z2

(1)T Z

+ Ci Z2 (1)Di ))) + tr(W2 (1)T (

j=1

2 (1))

Pg

j=1 (Ej Z1 (1)Fj

tr(W1 (1)T W1 (1)) + tr(W2 (1)T W2 (1))

+ Gj Z2 (1)Hj ))) ii

= 0.

It follows from the above, the relations (3)-(6) hold for v = 2 and u = 1. Now for u < w < r, we assume that tr(R1 (w)T W1 (u)) + tr(R2 (w)T W2 (u)) = 0, tr(S1 (w)T Z1 (u)) + tr(S2 (w)T Z2 (u)) = 0,

(7)

tr(Z1 (w)T Z1 (u)) + tr(Z2 (w)T Z2 (u)) = 0, tr(W1 (w)T W1 (u)) + tr(W2 (w)T W2 (u)) = 0. (8)

It can be seen that tr(R1 (w + 1)T W1 (u)) + tr(R2 (w + 1)T W2 (u)) = tr((R1 (w) − α(w)W1(w))T W1 (u)) + tr((R2 (w) − α(w)W2(w))T W2 (u)) = 0, tr(S1 (w + 1)T Z1 (u)) + tr(S2 (w + 1)T Z2 (u)) = tr((S1 (w) − β(w)Z1(w))T Z1 (u)) + tr((S2 (w) − β(w)Z2(w))T Z2 (u)) = 0,

120

Masoud Hajarian tr(Z1 (w + 1)T Z1 (u)) + tr(Z2 (w + 1)T Z2 (u))

= tr((

+

f 1 X T T T T T T (Ai R1 (w + 1)Bi + Bi R1 (w + 1) Ai + P Ai R1 (w + 1)Bi P + P Bi R1 (w + 1) Ai P ) [ 4 i=1

g X

T

T

T

T

T

T

(Ej R2 (w + 1)Fj + Fj R2 (w + 1) Ej + P Ej R2 (w + 1)Fj P + P Fj R2 (w + 1) Ej P )]

j=1 T

+γ(w)Z1 (w)) Z1 (u)) +tr((

+

f 1 X T T T T T T (Ci R1 (w + 1)Di + Di R1 (w + 1) Ci + QCi R1 (w + 1)Di Q + QDi R1 (w + 1) Ci Q) [ 4 i=1

g X

T

T

T

T

T

T

(Gj R2 (w + 1)Hj + Hj R2 (w + 1) Gj + QGj R2 (w + 1)Hj Q + QHj R2 (w + 1) Gj Q)]

j=1 T

+γ(w)Z2 (w)) Z2 (u)) f X

= tr((

+tr((

T (AT i R1 (w + 1)Bi ) +

1

j=1

f X

g X

(CiT R1 (w + 1)DiT ) +

T T (GT j R2 (w + 1)Hj ) Z2 (u))

j=1

T

β(u)

(EjT R2 (w + 1)FjT )T Z1 (u))

i=1

i=1

=

g X

[tr(R1 (w + 1) (

f X

(Ai (−S1 (u + 1) + S1 (u))Bi + Ci (−S1 (u + 1) + S1 (u))Di )))

i=1 T

+tr(R2 (w + 1) (

g X

(Ej (−S2 (u + 1) + S2 (u))Fj + Gj (−S2 (u + 1) + S2 (u))Hj )))]

j=1

=

1 β(u)

[tr(R1 (w + 1)T (−W1 (u + 1) + γ(u)W1 (u) + W1 (u) − γ(u − 1)W1 (u − 1))) T

+tr(R2 (w + 1) (−W2 (u + 1) + γ(u)W2 (u) + W2 (u) − γ(u − 1)W2 (u − 1)))] =−

1

T

β(u)

[tr(R1 (w + 1) W1 (u + 1)) + tr(R2 (w + 1)

T

W2 (u + 1))],

tr(W1 (w + 1)T W1 (u)) + tr(W2 (w + 1)T W2 (u)) = tr((

f X

(Ai S1 (w + 1)Bi + Ci S2 (w + 1)Di ) + γ(w)W1 (w))T W1 (u))

i=1

+tr((

g X

(Ej S1 (w + 1)Fj + Gj S2 (w + 1)Hj ) + γ(w)W2 (w))T W2 (u))

j=1

= tr((

f X

T

(Ai S1 (w + 1)Bi + Ci S2 (w + 1)Di )) W1 (u))

i=1

+tr((

g X

(Ej S1 (w + 1)Fj + Gj S2 (w + 1)Hj ))T W2 (u))

j=1

=

1 α(u)

T

[tr(S1 (w + 1) (

f 1 X T T T [ (Ai (R1 (u) − R1 (u + 1))Bi + Bi (R1 (u) − R1 (u + 1)) Ai 4 i=1

(9)

BiCR Algorithm for Computing Generalized Bisymmetric Solutions ... 121 T

T

T

+P Ai (R1 (u) − R1 (u + 1))Bi P + P Bi (R1 (u) − R1 (u + 1)) Ai P ) +

g X

T

T

(Ej (R2 (u) − R2 (u + 1))Fj

j=1

+Fj (R2 (u) − R2 (u + 1))T Ej + P EjT (R2 (u) − R2 (u + 1))FjT P + P Fj (R2 (u) − R2 (u + 1))T Ej P )])) T

+tr(S2 (w + 1) (

f 1 X T T T (Ci (R1 (u) − R1 (u + 1))Di + Di (R1 (u) − R1 (u + 1)) Ci [ 4 i=1 T

T

+QCi (R1 (u) − R1 (u + 1))Di Q T

+QDi (R1 (u) − R1 (u + 1)) Ci Q) +

g X

T

T

(Gj (R2 (u) − R2 (u + 1))Hj

j=1 T T +Hj (R2 (u) − R2 (u + 1))T Gj + QGT j (R2 (u) − R2 (u + 1))Hj Q + QHj (R2 (u) − R2 (u + 1)) Gj Q)]))]

1

=

α(u)

[tr((S1 (w + 1))T (Z1 (u) − γ(u − 1)Z1 (u − 1) − Z1 (u + 1) + γ(u)Z1 (u)))) T

+tr((S2 (w + 1)) (Z2 (u) − γ(u − 1)Z2 (u − 1) − Z2 (u + 1) + γ(u)Z2 (u))))] =−

1 α(u)

T

T

[tr((S1 (w + 1) Z1 (u + 1)) + tr((S2 (w + 1) Z2 (u + 1))].

(10)

For u = w, it can also be shown T

T

tr(R1 (w + 1) W1 (w)) + tr(R2 (w + 1) W2 (w)) = tr((R1 (w) − α(w)W1 (w))T W1 (w)) + tr((R2 (w) − α(w)W2 (w))T W2 (w)) = tr(R1 (w)T W1 (w)) + tr(R2 (w)T W2 (w)) T

−tr(R1 (w) (

f X

T

(Ai S1 (w)Bi + Ci S2 (w)Di ))) − tr(R2 (w) (

i=1

g X

(Ej S1 (w)Fj + Gj S2 (w)Hj )))

j=1

= tr(R1 (w)T W1 (w)) + tr(R2 (w)T W2 (w)) −tr(R1 (w)T (W1 (w) − γ(w − 1)W1 (w − 1))) − tr(R2 (w)T (W2 (w) − γ(w − 1)W2 (w − 1))) = 0, T

T

tr(S1 (w + 1) Z1 (w)) + tr(S2 (w + 1) Z2 (w)) T

T

= tr((S1 (w) − β(w)Z1 (w)) Z1 (w)) + tr((S2 (w) − β(w)Z2 (w)) Z2 (w)) T

T

= tr(S1 (w) Z1 (w)) + tr(S2 (w) Z2 (w)) −tr(R1 (w)T (

f X

(Ai S1 (w)Bi + Ci S2 (w)Di ))) − tr(R2 (w)T (

i=1

g X

(Ej S1 (w)Fj + Gj S2 (w)Hj )))

j=1 T

T

= tr(S1 (w) Z1 (w)) + tr(S2 (w) Z2 (w)) −tr((

+

g X

f 1 X T T T T T [ (AT i R1 (w)Bi + Bi R1 (w) Ai + P Ai R1 (w)Bi P + P Bi R1 (w) Ai P ) 4 i=1

(EjT R2 (w)FjT + Fj R2 (w)T Ej + P EjT R2 (w)FjT P + P Fj R2 (w)T Ej P )])T S1 (w))

j=1

+

−tr((

f 1 X T T T T T T [ (Ci R1 (w)Di + Di R1 (w) Ci + QCi R1 (w)Di Q + QDi R1 (w) Ci Q) 4 i=1

g X

T

T

T

T

T

T

T

(Gj R2 (w)Hj + Hj R2 (w) Gj + QGj R2 (w)Hj Q + QHj R2 (w) Gj Q)]) S2 (w))

j=1

122

Masoud Hajarian T

T

= tr(S1 (w) Z1 (w)) + tr(S2 (w) Z2 (w)) −tr((Z1 (w) − γ(w − 1)Z1 (w − 1)))T S1 (w)) − tr((Z2 (w) − γ(w − 1)Z2 (w − 1)))T S2 (w)) = 0, T

T

tr(Z1 (w + 1) Z1 (w)) + tr(Z2 (w + 1) Z2 (w)) = tr((

+

f 1 X T T T T T [ (AT i R1 (w + 1)Bi + Bi R1 (w + 1) Ai + P Ai R1 (w + 1)Bi P + P Bi R1 (w + 1) Ai P ) 4 i=1

g X

T

T

T

T

T

T

(Ej R2 (w + 1)Fj + Fj R2 (w + 1) Ej + P Ej R2 (w + 1)Fj P + P Fj R2 (w + 1) Ej P )]

j=1

+γ(w)Z1 (w))T Z1 (w)) +tr((

+

f 1 X T T T T T T [ (Ci R1 (w + 1)Di + Di R1 (w + 1) Ci + QCi R1 (w + 1)Di Q + QDi R1 (w + 1) Ci Q) 4 i=1

g X

T T T T T (GT j R2 (w + 1)Hj + Hj R2 (w + 1) Gj + QGj R2 (w + 1)Hj Q + QHj R2 (w + 1) Gj Q)]

j=1 T

+γ(w)Z2 (w)) Z2 (w)) = tr((

f X

T (AT i R1 (w + 1)Bi ) +

i=1

+tr((

f X

(EjT R2 (w + 1)FjT ))T Z1 (w))

j=1

(CiT R1 (w + 1)DiT +

i=1

g X

T T 2 2 (GT j R2 (w + 1)Hj )) Z2 (w)) + γ(w)[||Z1 (w)|| + ||Z2 (w)|| ]

j=1

= tr((

f X

(Ai (R1 (w) − α(w)W1 (w))Bi ) +

f X

(Ci (R1 (w) − α(w)W1 (w))Di +

T

T

i=1

+tr((

g X

g X

T

T

T

(Ej (R2 (w) − α(w)W2 (w))Fj )) Z1 (w))

j=1

T

g X

T

i=1

T

T

T

(Gj (R2 (w) − α(w)W2 (w))Hj )) Z2 (w))

j=1

+γ(w)[||Z1 (w)||2 + ||Z2 (w)||2 ] T

T

= tr((Z1 (w) − γ(w − 1)Z1 (w − 1)) Z1 (w)) + tr((Z2 (w) − γ(w − 1)Z2 (w − 1)) Z2 (w)) −

Pf Pg tr(R1 (w)T ( i=1 (Ai S1 (w)Bi + Ci S2 (w)Di ))) + tr(R2 (w)T ( j=1 (Ej S1 (w)Fj + Gj S2 (w)Hj ))) tr(W1 (w)T W1 (w)) + tr(W2 (w)T W2 (w))

×tr((

f X

(Ai W1 (w)Bi ) +

f X

(CiT W1 (w)DiT +

T

T

i=1

+tr((

i=1

g X

T

T

T

(Ej W2 (w)Fj )) Z1 (w))

j=1 g X

T T (GT j W2 (w)Hj )) Z2 (w))

j=1

+[||Z1 (w)||2 + ||Z2 (w)||2 ] ×[1 −

Pf Pg tr(R1 (w)T ( i=1 (Ai Z1 (w)Bi + Ci Z2 (w)Di ))) + tr(R2 (w)T ( j=1 (Ej Z1 (w)Fj + Gj Z2 (w)Hj ))) tr(Z1 (w)T Z1 (w)) + tr(Z2 (w)T Z2 (w))

Pf Pg tr(W1 (w)T ( i=1 (Ai S1 (w)Bi + Ci S2 (w)Di ))) + tr(W2 (w)T ( j=1 (Ej S1 (w)Fj + Gj S2 (w)Hj ))) − T T tr(W1 (w) W1 (w)) + tr(W2 (w) W2 (w)) h tr(R1 (w)T ( +

Pf

T Pg i=1 (Ai S1 (w)Bi + Ci S2 (w)Di ))) + tr(R2 (w) ( j=1 (Ej S1 (w)Fj + Gj S2 (w)Hj ))) tr(Z1 (w)T Z1 (w)) + tr(Z2 (w)T Z2 (w))

BiCR Algorithm for Computing Generalized Bisymmetric Solutions ... 123 ×

tr(W1 (w)T (

Pf

T Pg i i=1 (Ai Z1 (w)Bi + Ci Z2 (w)Di ))) + tr(W2 (w) ( j=1 (Ej Z1 (w)Fj + Gj Z2 (w)Hj ))) T tr(W1 (w) W1 (w)) + tr(W2 (w)T W2 (w)) 2

2

2

2

= ||Z1 (w)|| + ||Z2 (w)|| + [||Z1 (w)|| + ||Z2 (w)|| ] ×[1 −

Pf Pg tr(R1 (w)T ( i=1 (Ai Z1 (w)Bi + Ci Z2 (w)Di ))) + tr(R2 (w)T ( j=1 (Ej Z1 (w)Fj + Gj Z2 (w)Hj ))) tr(Z1 (w)T Z1 (w)) + tr(Z2 (w)T Z2 (w))

P T Pg tr(W1 (w)T ( f i=1 (Ai S1 (w)Bi + Ci S2 (w)Di ))) + tr(W2 (w) ( j=1 (Ej S1 (w)Fj + Gj S2 (w)Hj ))) − ] = 0, T tr(W1 (w) W1 (w)) + tr(W2 (w)T W2 (w)) T

T

tr(W1 (w + 1) W1 (w)) + tr(W2 (w + 1) W2 (w)) = tr((

f X

T

(Ai S1 (w + 1)Bi + Ci S2 (w + 1)Di ) + γ(w)W1 (w)) W1 (w))

i=1

+tr((

g X

T

(Ej S1 (w + 1)Fj + Gj S2 (w + 1)Hj ) + γ(w)W2 (w)) W2 (w))

j=1

= tr((

f X

T

(Ai (S1 (w) − β(w)Z1 (w))Bi + Ci (S2 (w) − β(w)Z2 (w))Di )) W1 (w))

i=1

+tr((

g X

T

(Ej (S1 (w) − β(w)Z1 (w))Fj + Gj (S2 (w) − β(w)Z2 (w))Hj )) W2 (w))

j=1

+γ(w)(||W1 (w)||2 + ||W2 (w)||2 ) = tr((W1 (w) − γ(w − 1)W1 (w − 1))T W1 (w)) + tr((W2 (w) − γ(w − 1)W2 (w − 1))T W2 (w)) −

tr(R1 (w)T (

×[tr((

f X

Pf

T Pg i=1 (Ai S1 (w)Bi + Ci S2 (w)Di ))) + tr(R2 (w) ( j=1 (Ej S1 (w)Fj + Gj S2 (w)Hj ))) T tr(Z1 (w) Z1 (w)) + tr(Z2 (w)T Z2 (w)) T

(Ai Z1 (w)Bi + Ci Z2 (w)Di )) W1 (w)) + tr((

i=1

g X

T

(Ej Z1 (w)Fj + Gj Z2 (w)Hj )) W2 (w))]

j=1

+(||W1 (w)||2 + ||W2 (w)||2 ) h tr(R1 (w)T ( × 1−



tr(W1 (w)T (

tr(W1 (w)T (

T Pg i=1 (Ai Z1 (w)Bi + Ci Z2 (w)Di ))) + tr(R2 (w) ( j=1 (Ej Z1 (w)Fj + Gj Z2 (w)Hj ))) T tr(Z1 (w) Z1 (w)) + tr(Z2 (w)T Z2 (w))

Pf

h tr(R1 (w)T ( +

×

Pf

T Pg i=1 (Ai S1 (w)Bi + Ci S2 (w)Di ))) + tr(W2 (w) ( j=1 (Ej S1 (w)Fj + Gj S2 (w)Hj ))) tr(W1 (w)T W1 (w)) + tr(W2 (w)T W2 (w))

Pf

T Pg i=1 (Ai S1 (w)Bi + Ci S2 (w)Di ))) + tr(R2 (w) ( j=1 (Ej S1 (w)Fj + Gj S2 (w)Hj ))) T tr(Z1 (w) Z1 (w)) + tr(Z2 (w)T Z2 (w))

Pf

T Pg ii i=1 (Ai Z1 (w)Bi + Ci Z2 (w)Di ))) + tr(W2 (w) ( j=1 (Ej Z1 (w)Fj + Gj Z2 (w)Hj ))) tr(W1 (w)T W1 (w)) + tr(W2 (w)T W2 (w)) 2

2

2

2

= (||W1 (w)|| + ||W2 (w)|| ) + (||W1 (w)|| + ||W2 (w)|| ) h tr(R1 (w)T ( × 1−



tr(W1 (w)T (

Pf

Pf

T Pg i=1 (Ai Z1 (w)Bi + Ci Z2 (w)Di ))) + tr(R2 (w) ( j=1 (Ej Z1 (w)Fj + Gj Z2 (w)Hj ))) T tr(Z1 (w) Z1 (w)) + tr(Z2 (w)T Z2 (w))

T Pg i i=1 (Ai S1 (w)Bi + Ci S2 (w)Di ))) + tr(W2 (w) ( j=1 (Ej S1 (w)Fj + Gj S2 (w)Hj ))) = 0. tr(W1 (w)T W1 (w)) + tr(W2 (w)T W2 (w))

124

Masoud Hajarian

From tr(Z1 (w)T Z1 (u)) + tr(Z2 (w)T Z2 (u)) = 0, tr(R1 (w + 1)T (W1 (w)) + tr(R2 (w + 1)T (W2 (w)) = 0 and (9) we conclude that tr(Z1 (w + 1)T Z1 (u)) + tr(Z2 (w + 1)T Z2 (u)) = 0. Also it follows from tr(W1 (w)T W1 (u))+tr(W2 (w)T W2 (u)) = 0, tr((S1 (w + 1)T Z1 (w)) + tr((S2(w + 1)T Z2 (w)) = 0 and (10) that tr(W1 (w + 1)T W1 (u)) + tr(W2 (w + 1)T W2 (u)) = 0. By combining the above relations, the proof is now complete by induction. Our next theorem shows that Algorithm 1 can compute the generalized bisymmetric solution pair of the general coupled matrix equations (1) within a finite number of iterations in the absence of round-off errors. Theorem 1. Let the general coupled matrix equations (1) over the generalized bisymmetric solution pair (X, Y ) be consistent. Algorithm 1 can obtain the generalized bisymmetric solution pair of (1) within a finite number of iterations in the absence of round-off errors. Proof. By using the relations (3) and (6) it is not difficult to see Algorithm 1 converges to the generalized bisymmetric solution pair of (1) within a finite number of iterations in the absence of round-off errors. In what follows, we show that the least norm generalized bisymmetric solution pair of (1) can be obtained by Algorithm 1 when the appropriate initial generalized bisymmetric matrices are chosen. Theorem 2. Let the general coupled matrix equations (1) over the generalized bisymmetric solution pair (X, Y ) be consistent. If we take the initial generalized bisymmetric matrices f X

X(1) =

T T T T T (AT i J1 (1)Bi + Bi J1 (1) Ai + P Ai J1 (1)Bi P + P Bi J1 (1) Ai P )

i=1

+

g X

(EjT J2 (1)FjT + Fj J2 (1)T Ej + P EjT J2 (1)FjT P + P Fj J2 (1)T Ej P ),

j=1

Y (1) =

f X i=1

(CiT J1 (1)DiT + Di J1 (1)T Ci + QCiT J1 (1)DiT Q + QDi J1 (1)T Ci Q)

(11)

BiCR Algorithm for Computing Generalized Bisymmetric Solutions ... 125 +

g X

T T T T T (GT j J2 (1)Hj + Hj J2 (1) Gj + QGj J2 (1)Hj Q + QHj J2 (1) Gj Q),

(12)

j=1

and matrices f X

S1 (1) =

T T T T T (AT i K1 (1)Bi + Bi K1 (1) Ai + P Ai K1 (1)Bi P + P Bi K1 (1) Ai P )

i=1

+

g X

(EjT K2 (1)FjT + Fj K2 (1)T Ej + P EjT K2 (1)FjT P + P Fj K2 (1)T Ej P ),

(13)

j=1

S2 (1) =

f X

(CiT K1 (1)DiT + Di K1 (1)T Ci + QCiT K1 (1)DiT Q + QDi K1 (1)T Ci Q)

i=1

+

g X

T T T T T (GT j K2 (1)Hj + Hj K2 (1) Gj + QGj K2 (1)Hj Q + QHj K2 (1) Gj Q),

(14)

j=1

where J1 (1), K1(1) ∈ Rp×q and J2 (1), K2(1) ∈ Rs×t are arbitrary, then the generalized bisymmetric solution pair (X ∗, Y ∗ ) generated by Algorithm 1 is the least Frobenius norm generalized bisymmetric solution pair of (1). Proof. By Algorithm 1 and (11)-(14), it can be shown that there exist the matrices J1 (k), K1(k) ∈ Rp×q and J2 (k), K2(k) ∈ Rs×t such that f X (ATi J1 (k)BiT + Bi J1 (k)T Ai + P ATi J1 (k)BiT P + P Bi J1 (k)T Ai P )

X(k) =

i=1

+

g X (EjT J2 (k)FjT + Fj J2 (k)T Ej + P EjT J2 (k)FjT P + P Fj J2 (k)T Ej P ),

(15)

j=1

Y (k) =

f X (CiT J1 (k)DiT + Di J1 (k)T Ci + QCiT J1 (k)DiT Q + QDi J1 (k)T Ci Q) i=1

+

g X

(GTj J2 (k)HjT + Hj J2 (k)T Gj + QGTj J2 (k)HjT Q + QHj J2 (k)T Gj Q),

(16)

j=1

S1 (k) =

f X

(ATi K1 (k)BiT + Bi K1 (k)T Ai + P ATi K1 (k)BiT P + P Bi K1 (k)T Ai P )

i=1

g X + (EjT K2 (k)FjT + Fj K2(k)T Ej + P EjT K2 (k)FjT P + P Fj K2 (k)T Ej P ),

(17)

j=1

S2 (k) =

f X i=1

(CiT K1 (k)DiT + DiK1 (k)T Ci + QCiT K1 (k)DiT Q + QDi K1 (k)T Ci Q)

126

Masoud Hajarian

+

g X

(GTj K2 (k)HjT + Hj K2 (k)T Gj + QGTj K2 (k)HjT Q + QHj K2 (k)T Gj Q).

(18)

j=1

Now we have using (2), (15) and (16) that 0 Pf BiT ⊗ Ai B Pi=1 f A ⊗ BiT B B Pf i=1 T i B i=1 Bi P ⊗ Ai P „ « B Pf T B vec(X(k)) i=1 Ai P ⊗ Bi P =B B Pg F T ⊗ Ej vec(Y (k)) B Pj=1 j g B BP j=1 Ej ⊗ FjT B g @ j=1 FjT P ⊗ Ej P Pg T j=1 Ej P ⊗ Fj P 00 P f BiT ⊗ Ai BB Pi=1 f BB A ⊗ BiT BB Pf i=1 T i BB B BB Pi=1 i P ⊗ Ai P BB f A P ⊗ B T P i B i=1 i ∈ RB g BB P T BB Pj=1 Fj ⊗ Ej BB g E ⊗ FjT BB BBPg j=1 T j @@ j=1 Fj P ⊗ Ej P Pg T j=1 Ej P ⊗ Fj P

1T Pf DiT ⊗ Ci Pi=1 C f C ⊗ DiT C Pf i=1 T i C D Q ⊗ Ci Q C C Pfi=1 i T C i=1 Ci Q ⊗ Di Q C P g T Hj ⊗ Gj C C j=1 Pg Gj ⊗ HjT C C j=1 Pg C HjT Q ⊗ Gj QA Pj=1 g T j=1 Gj Q ⊗ Hj Q

0

1 vec(J1 (k)) Bvec(J1 (k)T )C B C B vec(J1 (k)) C B C Bvec(J1 (k)T )C B C B vec(J2 (k)) C B C T Bvec(J2 (k) )C B C @ vec(J2 (k)) A T vec(J2 (k) )

1T 1 Pf DiT ⊗ Ci Pi=1 C C f C ⊗ DiT C C Pf i=1 T i C C C D Q ⊗ Ci Q C C C Pfi=1 i T C C C Q ⊗ D Q i C C = R(AT ). i=1 i P g C HjT ⊗ Gj C C C j=1 Pg T C C G ⊗ H C j C j Pg j=1 T C C Hj Q ⊗ Gj QA A j=1 Pg T j=1 Gj Q ⊗ Hj Q

This implies that the generalized bisymmetric solution pair generated by Algorithm 1 with (11)-(14) is the least Frobenius norm generalized bisymmetric solution pair of (1).

3.

N UMERICAL E XPERIMENTS

In this section, the efficiency and suitability of Algorithm 1 are shown in comparison with other algorithms via two numerical examples. Example 1. We consider the Sylvester matrix equation AXB + CXD = E over the symmetric matrix X with the following parameters 0

12 B 12 B B −6 B A=B B 3 B−2700 B @ 300 6

120 12 −60 6 −27 −6 9

12 0 3 −6 15 −3 27

21 −6 15 −6 15 15 1800

27 −6 18 15 6 12 9

3 6 −3 3 90 15 15

1 21 9 C C −15C C 6 C C, 30 C C 9 A 270

BiCR Algorithm for Computing Generalized Bisymmetric Solutions ... 127 1 20 −1200 12 28 44 −4 24 B 16 −20 4 −48 20 20 24 C B C B −8 20000 4 20 240 −8 48 C B C B=B 20 −20 −8 20 −12 100C B 4 C, C B3964 −32 8 8 12 20 28 B C @ 8 −20 −40 20 12 80 44 A 12 24 8 160 8 8 320 0 1 −56 840 −48 −91 −125 −5 −87 B −52 −16 −4 66 −2 −38 −51 C B C B 26 −19820 −13 −65 −294 17 −3 C B C C =B −38 38 26 −65 3 −118 C B −13 C, B 4136 113 −53 −53 −30 −290 −118 C B C @−908 38 49 −65 −48 −125 −71 A −30 −51 −89 −5560 −35 −53 −1130 0 1 −116 8640 −60 −154 −254 34 −126 B −88 164 −28 324 −152 −128 −150 C B C B 44 −140120 −22 −110 −1644 50 −366 C B C D=B −128 128 44 −110 90 −688 C B −22 C, B−33148 170 −26 −26 −72 40 −136 C B C @ 544 128 274 −110 −60 −530 −290 A −72 −150 −2 2480 −38 −26 −1700 1 0 −0.0132 −0.0488 0.0004 0.0016 −0.0008 −0.0006 −0.0020 B 0.0047 0.0133 −0.0000 −0.0001 0.0003 0.0001 0.0003 C C B B 0.6455 2.3324 −0.0112 −0.0607 0.0419 0.0192 0.0828 C C B 9B 0.0090 −0.0001 −0.0004 0.0003 0.0001 0.0006 C E = 10 B 0.0068 C. C B−0.1506 −0.3419 0.0010 0.0087 −0.0082 −0.0019 −0.0153 C B @ 0.0472 0.1143 −0.0003 −0.0025 0.0025 0.0006 0.0040 A 0.2355 0.8906 −0.0003 −0.0182 0.0156 0.0021 0.0216 0

We can easily see that „„ T «« B ⊗ A + DT ⊗ C κ = 3.5475 × 104 . A ⊗ B T + C ⊗ DT This implies that the equivalent system with the Sylvester matrix equation over the symmetric matrix is ill-conditioned. Applying Algorithm 1, CGNR and CGNE algorithms [4, 22, 23, 24, 25, 26, 17, 44] with the initial matrix X(1) = 0, we compute the approximate solution {X(k)}. After 100 iterations, Algorithm 1 can obtain the symmetric solution of the Sylvester matrix equation as follows 1 0 1.9003 0.2496 0.7831 0.8389 1.1635 1.1807 1.1377 B0.2496 1.6428 0.8504 1.4286 0.9908 1.7680 1.1177C C B B0.7831 0.8504 1.8709 0.9268 0.4255 1.4188 0.8897C C B C X(100) = B B0.8389 1.4286 0.9268 0.2778 0.9496 0.4014 1.1066C . B1.1635 0.9908 0.4255 0.9496 0.8902 1.6040 1.1755C C B @1.1807 1.7680 1.4188 0.4014 1.6040 1.6762 0.4485A 1.1377 1.1177 0.8897 1.1066 1.1755 0.4485 0.6092

128

Masoud Hajarian

Obviously the logarithm of the norm of the residuals is a measure of the quality of the approximate solutions. In Figure 1, we show the logarithm of the norm of the residuals as r(k) = log10 ||E − AX(k)B − CX(k)D||. The results show the acceptable accuracy and efficiency of Algorithm 1 for finding the symmetric solution of the Sylvester matrix equation. 







)

*

+

,

6

7

8

9

-

:

.

/

;




2

3

?

@

4

5

A

B



C

D

E

F

G

H

I

J

K

L

(

'



&

%











































!





"



!







#











$

Figure 1. The results for Example 1.

Example 2. In this example, we study the Sylvester matrix equation AXB + CY D = E over the generalized bisymmetric matrices X ∈ BSRP7×7 and Y ∈ BSRP7×7 where   1200 120 12 21 27 3 21  12 12 0 −6 −6 6 9     −6 −60 3 15 18 −3 −15    A= 6 −6 −6 15 3 6   3 ,  −27 −27 15 15 6 90 30     3 −6 −3 15 12 15 9  6 9 27 18 9 15 2700

BiCR Algorithm for Computing Generalized Bisymmetric Solutions ... 129 e

f

d

c

b

a

w

v

u

t

`

_

\

]

Z

^

x

y

z

{

…

†

‡

ˆ

|

‰

}

~

Š

‹



€

Œ





Ž

‚



ƒ



„

‘

[

’

X

“

”

•

–

—

˜

™

š

›

Y

M

N

O

O

P

g

h

Q

i

Q

R

j

k

l

m

n

o

p

k

l

q

S

p

o

S

m

T

r

U

U

V

W

W

s

Figure 2. The results for Example 2.



1000 −6 6 14  8 −10 2 −24   −4 10 2 10  B= 2 10 −10 −4   −16 −16 4 4   4 −10 −20 10 6 12 4 800  −4600 −354 −42 −77  −44 −26 −2 42   22 170 −11 −55  C= −28 28 22  −11  97 97 −49 −49   −13 28 29 −55 −24 −39 −85 −854

 22 −2 12 10 10 12   120 −4 24   10 −6 50  , 6 10 14   6 40 22  4 4 1600

 −103 −7 −75 8 −28 −39   −174 13 21   −55 −3 −68  , −24 −280 −104   −42 −85 −49  −31 −49 −9700

130

Masoud Hajarian 

 −4600 282 −18 −56 −100 20 −42  −32 94 −14 156 −82 −58 −66     16  −190 −8 −40 −804 22 −198    D =  −8 −58 58 16 −40 48 −338  ,  58  58 2 2 −30 110 −38    −22 58 134 −40 −18 −250 −136  −30 −66 26 −5564 −10 2 −5800 0

0.3686 B 0.0013 B B 0.0072 B E = 108 B B−0.0012 B 0.0255 B @ 0.0101 1.3769

0.0368 −0.0002 0.0003 −0.0008 −0.0032 −0.0009 0.0191



−1  0   0  P =  0  0   0 0

−0.0139 −0.0004 0.0004 −0.0007 −0.0013 −0.0008 −0.0220

0.7176 0.0066 0.0045 0.0059 0.0200 0.0159 2.0337

0.2051 0.0017 0.0009 0.0006 0.0046 0.0035 0.2935

−0.0086 0.0006 −0.0017 0.0009 0.0017 0.0006 −0.0281

 0 0 0 0 0 0 1 0 0 0 0 0   0 −1 0 0 0 0   0 0 −1 0 0 0  . 0 0 0 1 0 0   0 0 0 0 1 0  0 0 0 0 0 −1

1 0.8784 0.0089C C 0.0042C C 0.0079C C, 0.0275C C 0.0205A 2.3128

By using the mentioned algorithms with the initial matrices X(1) = Y (1) = 0, we obtain the approximate solutions {X(k)} and {Y (k)}. The numerical results are depicted in Figure 2 where r(k) = log10 ||E − AX(k)B − CY (k)D||. Numerical results reveal that Algorithm 1 gives good results and is more efficient and effective than other algorithms.

C ONCLUSION This chapter deals with the generalized bisymmetric solutions of the general coupled matrix equations (1). The HS version of BiCR algorithm is generalized for solving the general coupled matrix equations (1) over the generalized bisymmetric matrices. We prove that the generalized HS version of BiCR algorithm can compute the generalized bisymmetric solutions in a finite number of

BiCR Algorithm for Computing Generalized Bisymmetric Solutions ... 131 iterations in the absence of round-off errors. In the end, numerical results are given which confirm the proposed theoretical results and illustrate the efficiency of Algorithm 1.

R EFERENCES [1] Wang, Y. and Ding, F. (2017). Iterative estimation for a non-linear IIR filter with moving average noise by means of the data filtering technique. IMA Journal of Mathematical Control and Information, 34: 745-764. [2] Calvetti, D. and Reichel, L. (1996). Application of ADI iterative methods to the restoration of noisy images. SIAM Journal on Matrix Analysis and Applications, 17: 165-186. [3] Hu, G.D. and Hu, G.D. (2000). A relation between the weighted logarithmic norm of a matrix and the Lyapunov equation. BIT, 40: 606-610. [4] Yuan, Y. and Liu, H. (2012). An iterative updating method for undamped structural systems. Meccanica, 47: 699-706. [5] Yuan, Y. and Liu, H. (2014). An iterative updating method for damped structural systems using symmetric eigenstructure assignment. Journal of Computational and Applied Mathematics, 256: 268-277. [6] Jiang, J. and Yuan, Y. (2009). A new model updating method for damped structural systems. Mathematical Methods in the Applied Sciences, 32: 2138-2147. [7] Zhou, B. and Yan, Z.B. (2008). Solutions to right coprime factorizations and generalized Sylvester matrix equations. Transactions of the Institute of Measurement and Control, 30: 397-426. [8] Keel, L.H., Fleming, J.A. and Bhattacharyya, S.P. (1985). Minimum norm pole assignment via Sylvester s equation. In Linear algebra and its role in systems theory, Vol. 47, AMS Contemporary Mathematics, pp. 265-72. [9] Duan, G.R. (2003). Parametric eigenstructure assignment via output feedback based on singular value decompositions. IEE Proceedings, Control Theory and Applications, 150: 93-100.

132

Masoud Hajarian

[10] Duan, G.R., Liu, W.Q. and Liu, G.P. (2001). Robust model reference control for multivariable linear systems: a parametric approach. Journal of Systems and Control Engineering, 215: 599-610. [11] Saberi, A., Stoorvogel, A.A. and Sannuti, P. (2000). Control of Linear Systems with Regulation and Input Constraints. Springer Verlag. [12] Kyrchei, I.I. (2010). Cramer’s rule for some quaternion matrix equations. Applied Mathematics and Computation, 217: 2024-2030. [13] Kyrchei, I. (2012). Analogs of Cramer’s rule for the minimum norm least squares solutions of some matrix equations. Applied Mathematics and Computation, 218: 6375-6384. [14] Kyrchei, I. (2013). Explicit representation formulas for the minimum norm least squares solutions of some quaternion matrix equations. Linear Algebra and Its Applications, 438: 136-152. [15] Larin, V.B. (2009). On solution of Sylvester equation. Journal of Automation and Information Sciences, 41: 1-7. [16] Larin, V.B. (2009). Solutions of matrix equations in problems of mechanics and control. International Applied Mechanics, 45: 847-872. [17] Hajarian, M. (2015). Recent developments in iterative algorithms for solving linear matrix equations, in: I. Kyrchei (Ed.), Advances in Linear Algebra Research, Nova Sci. Publ., New York, pp. 239-286. [18] Li, F.L., Gong, L.S., Hu, X.Y. and Zhang, L. (2010). Successive projection iterative method for solving matrix equation AX = B. Journal of Computational and Applied Mathematics, 234: 2405-2410. [19] Deng, Y., Bai, Z.Z. and Gao, Y. (2006). Iterative orthogonal direction methods for Hermitian minimum norm solutions of two consistent matrix equations. Numerical Linear Algebra with Applications, 13: 801-823. [20] Wang, M., We, M. and Feng, Y. (2010). An iterative algorithm for a least squares solution of a matrix equation. International Journal of Computer Mathematics, 87: 1289-1298.

BiCR Algorithm for Computing Generalized Bisymmetric Solutions ... 133 [21] Li, S.K. and Huang, T.Z. (2012). LSQR iterative method for generalized coupled Sylvester matrix equations. Applied Mathematical Modelling, 36 3545-3554. [22] Dehghan, M. and Hajarian, M. (2012). On the generalized reflexive and anti-reflexive solutions to a system of matrix equations. Linear Algebra and its Applications, 437: 2793-2812. [23] Dehghan, M. and Hajarian, M. (2010). The general coupled matrix equations over generalized bisymmetric matrices. Linear Algebra and its Applications, 432: 1531-1552. [24] Dehghan, M. and Hajarian, M. (2011). Analysis of an iterative algorithm to solve the generalized coupled Sylvester matrix equations. Applied Mathematical Modelling, 35: 3285-3300. [25] Hajarian, M. (2017). Convergence analysis of the MCGNR algorithm for the least squares solution group of discrete-time periodic coupled matrix. Transactions of the Institute of Measurement and Control, 39: 29-42. [26] Hajarian, M. (2016). Extending the CGLS algorithm for least squares solutions of the generalized Sylvester-transpose matrix equations. Journal of the Franklin Institute, 353: 1168-1185. [27] Ding, F. and Chen, T. (2005). Iterative least squares solutions of coupled Sylvester matrix equations. Systems & Control Letters, 54: 95-107. [28] Ding, F. and Chen, T. (2006). On iterative solutions of general coupled matrix equations. SIAM Journal on Control and Optimization, 44: 22692284. [29] Ding, F. and Chen, T. (2005). Gradient based iterative algorithms for solving a class of matrix equations. IEEE Transactions on Automatic Control, 50: 1216-1221. [30] Ding, F. and Zhang, H. (2014). Gradient-based iterative algorithm for a class of the coupled matrix equations related to control systems. IET Control Theory and Applications, 8: 1588-1595. [31] Zhou, B., Duan, G.R. and Li, Z.Y. (2009). Gradient based iterative algorithm for solving coupled matrix equations. Systems & Control Letters, 58: 327-333.

134

Masoud Hajarian

[32] Zhou, B., Li, Z.Y., Duan, G.R. and Wang, Y. (2009). Weighted least squares solutions to general coupled Sylvester matrix equations. Journal of Computational and Applied Mathematics, 224: 759-776. [33] Niu, Q., Wang, X. and Lu, L.Z. (2011). A relaxed gradient based algorithm for solving Sylvester equations. Asian Journal of Control, 13: 461-464. [34] Hajarian, M. (2014). A gradient-based iterative algorithm for generalized coupled Sylvester matrix equations over generalized centro-symmetric matrices. Transactions of the Institute of Measurement and Control, 36: 252-259. [35] Xie, Y., Huang, N. and Ma, C. (2014). Iterative method to solve the generalized coupled Sylvester-transpose linear matrix equations over reflexive or anti-reflexive matrix. Computers & Mathematics with Applications, 67: 2071-2084. [36] Huang, N. and Ma, C. (2014). The modified conjugate gradient methods for solving a class of generalized coupled Sylvester-transpose matrix equations. Computers & Mathematics with Applications, 67: 1545-1558. [37] Hajarian, M. (2014). Matrix form of the CGS method for solving general coupled matrix equations. Applied Mathematics Letters, 34: 37-42. [38] Hajarian, M. (2016). Symmetric solutions of the coupled generalized Sylvester matrix equations via BCR algorithm. Journal of the Franklin Institute, 353: 3233-3248. [39] Hajarian, M. (2016). Generalized conjugate direction algorithm for solving the general coupled matrix equations over symmetric matrices. Numerical Algorithms, 73: 591-609. [40] Hajarian, M. (2013). Matrix iterative methods for solving the Sylvestertranspose and periodic Sylvester matrix equations. Journal of the Franklin Institute, 350: 3328-3341. [41] Hajarian, M. (2015). Developing BiCOR and CORS methods for coupled Sylvester-transpose and periodic Sylvester matrix equations. Applied Mathematical Modelling, 39: 6073-6084.

BiCR Algorithm for Computing Generalized Bisymmetric Solutions ... 135 [42] Hajarian, M. (2015). Matrix GPBiCG algorithms for solving the general coupled matrix equations. IET Control Theory & Applications, 9: 74-81. [43] Vespucci, M.T. and Broyden, C.G. (2001). Implementation of different computational variations of biconjugate residual methods. Computers & Mathematics with Applications, 42: 1239-1253. [44] Peng, Z. (2013). The reflexive least squares solutions of the matrix equation A1 X1 B1 + A2 X2 B2 + ... + Al Xl Bl = C with a submatrix constraint. Numerical Algorithms, 64: 455-480.

In: Hot Topics in Linear Algebra Editor: Ivan I. Kyrchei

ISBN: 978-1-53617-770-1 c 2020 Nova Science Publishers, Inc.

Chapter 4

S YSTEM OF M IXED G ENERALIZED S YLVESTER -T YPE Q UATERNION M ATRIX E QUATIONS Abdur Rehman1,∗, Ivan I. Kyrchei2,†, Muhammad Akram1,‡, Ilyas Ali1,§ and Abdul Shakoor3,¶ 1 University of Engineering & Technology, Lahore, Pakistan 2 Pidstryhach Institute for Applied Problems of Mechanics and Mathematics, NAS of Ukraine, Lviv, Ukraine 3 Khwaja Fareed University of Engineering & Information Technology, Rahim Yar Khan, Pakistan

Abstract Some necessary and sufficient conditions of constraint mixed type generalized Sylvester quaternion matrix equations to have a solution are derived in this article. When they obey some solvable conditions, the general solution is given in terms of generalized inverses, and explicit solution formulas are simplified due to conditions projectors inducted by ∗

Corresponding Author’s Email: Corresponding Author’s Email: ‡ Corresponding Author’s Email: § Corresponding Author’s Email: ¶ Corresponding Author’s Email: †

armath [email protected]. [email protected]. [email protected]. [email protected]. [email protected].

138

Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram et al. coefficient matrices. To understand the constructed notions, a numerical example is established where determinantal representations of MoorePenrose inverse are used.

1.

INTRODUCTION

Throughout, R and H stand for the real number field and quaternion skew field defined by H ={b0 + b1 i + b2 j + b3 k | i2 = j2 = k2 = ijk = −1, b0, b1 , b2, b3 ∈ R}, respectively. Hm×n represents the set of all matrices of size m × n over H. The symbol I represents an identity matrix of appropriate shape. The notations J ∗ , r(J) and R(J) denote the conjugate transpose, the rank and the space generated by the columns of a given matrix J, respectively. In addition, J † denotes the Moore-Penorse inverse of J and is defined as JJ † J = J,

J † JJ † = J † ,

(JJ † )∗ = JJ † , (J † J)∗ = J † J. Furthermore, PJ = J † J, QJ = JJ † , LJ = I − J † J and RJ = I − JJ † are the associated projectors of J, respectively. It is clear that LJ = (LJ )∗ = (LJ )2 = L†J , RJ = (RJ )2 = (RJ )∗ = R†J . Sir William Rowan Hamilton introduced quaternions in [1]. Obviously, H is noncommutative and associative division algebra. The robust applications of quaternion matrices in different disciplines like mechanics, computer graphics, control theory, quantum physics, altitude control and signal processing can be seen in [2–8]. Quaternion matrices also have remarkable role in mathematics fields like geometry, algebra, computation and analysis [9–12]. Zhang explored quaternions in [13]. The Sylvester matrix equation has far reaching applications in singular system control [14–16], Hα -optimal control [17], system design [18], robust control [19–22], feedback [23, 24], perturbation theory [25], sensitivity analysis [26], linear descriptor systems [27], neural networks [28], optimal pole assignment [29], control theory and theory of orbits [30–42].

System of Sylvester-Type Quaternion Matrix Equations

139

The general solution to A1 X − Y B1 = C1 , A2 Z − Y B2 = C2 , and

(1.1)

A1 X − Y B1 = C1 , A2 Y − ZB2 = C2 ,

was established in [43] and [44] respectively, when these systems have a solution. Some consistent conditions to (1.1) were discussed in [45]. Some necessary and sufficient conditions for (1.1) to have a solution were presented in [46]. The authors in [46] showed that (1.1) is solvable if and only if     A1 C1 A1 0 P1 = P2 , 0 B1 0 B1     A2 C2 A2 0 Q1 = Q2 , 0 B2 0 B2 where P1 , P2 , Q1 and Q2 are invertible matrices over a field F. Some recent work on the general solution of some matrix equations can be found in [47–49]. Recently, the authors in [50] computed the general solution to A3 X = C3 , Y B3 = C4 , A4 Z = C5 , A5 ZB5 = C6 , A1 X − Y B1 = C1 , A2 Z − Y B2 = C2 . when it is consistent. They also gave a numerical example to comprehend their finding. Some recent work on generalized Sylvester matrix equations can be observed in [51–60]. The conversion of several problems in the area of engineering like linear system [61], eigenstructure assignment [62], Luenberger-type observer design [63, 64] and control theory [65] into (1.1) inspires us to evaluate the general solution to A3 X = C3 , Y B3 = C4 , ZB4 = C5 , A4 ZB5 = C6 , A1 X − Y B1 = C1 , A2 X − ZB2 = C2 ,

(1.2)

140

Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram et al.

when it is consistent. When A3 , C3 , B3 , C4 , B4 , C5 , A4 , B5 , C6 are all null matrices and Y = Z, then (1.2) reduces to A1 X − Y B1 = C1 , A2 X − Y B2 = C2 , which was examined by Wimmer in [40]. The primary goal of this paper is to construct the general solution to (1.2) when it has a solution. Since we express an explicit solution to (1.2) in terms of the Moore-Penrose inverse, we mean a direct explicit method of construct of the Moore-Penrose inverse, namely, its determinantal representations [66, 67]. This paper is organized as follows. In Section 2, we consider some necessary elementary provisions concerning to quaternion matrix equations and determinantal representations of the Moore-Penrose inverse obtained within in framework of the theory of row-column quaternion determinants. Some necessary and sufficient conditions for (1.2) to have a solution are established in Section 3 with its exclusive explicit solution. Due to conditions of projectors, formulas of the obtained general solution are simplified in Section 4. An algorithm and a numerical example is presented in Section 5. Conclusions are given in Section 5.

2.

P RELIMINARIES

Some necessary elementary lemmas concerning to quaternion matrix equations are quoted below. Lemma 2.1. [68]. Known E, F and G matrices over H of adequate dimensions, EX − Y F = G has a solution if and only if RE GLF = 0. In this condition, its explicit solution is X = E † G + W 1 F + LE W 2 , Y = −RE GF † + EW1 + W3 RF , where W1 , W2 , and W3 are arbitrary matrices of adequate dimensions over H.

System of Sylvester-Type Quaternion Matrix Equations

141

Lemma 2.2. [69]. Let K ∈ Hm×n , P ∈ Hm×t , Q ∈ Hl×n . Then   K r − r(QLK ) = r(K), Q   r K P − r(RP K) = r(P ),   K P r − r(RP KLQ ) − r(Q) = r(P ). Q 0 Lemma 2.3. [70,71]. Let A1 , B1 , C3 , D3 , C4 , D4 and E1 be known matrices of adequate shapes. Assign A = RA1 C3 , B = D3 LB1 , C = RA1 C4 , D = D4 LB1 , E = RA1 E1 LB1 , F = RA C, G = DLB , H = CLF . Then the statements mentioned below are equivalent: (1) A1 U + V B1 + C3 W D3 + C4 ZD4 = E1

(2.1)

has a solution. (2) RF RA E = 0, ELB LG = 0, RA ELD = 0, RC ELB = 0. Under these conditions, the general solution to (2.1) is U =A†1 (E1 − C3 W D3 − C4 ZD4 ) − A†1 S7 B1 + LA1 S6 , V =RA1 (E1 − C3 W D3 − C4 ZD4 )B1† + A1 A†1 S7 + S8 RB1 , W =A† EB † − A† CF † EB † − A† HC † EG†DB † − A† HS2 RG DB † + LA S4 + S5 RB , Z =F † ED † + H †HC † EG† + LF LH S1 + LF S2 RG + S3 RD , where S1 − S8 are arbitrary matrices of conformable shapes over H. Lemma 2.3 is crucial to obtain the main result of this paper. The following Lemma is given over C but it can be extended to H.

142

Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram et al.

Lemma 2.4. [72]. Let A1 and C1 be given matrices with suitable shapes. Then A1 X = C1 has a solution if and only if C1 = A1 A†1 C1 . Under this condition, its general solution is †

X = A1 C1 + LA1 T, where T is arbitrary matrix with adequate shape. Lemma 2.5. [72]. Let B1 and D1 be known matrices with conformable shapes. Then Y B1 = D1 has a solution if and only if D1 = D1 B1† B1 . Under this condition, its general solution is Y = D1 B1† + SRB1 , where S is arbitrary matrix with conformable shape. Lemma 2.6. [73]. Let A4 , B4 , B5 and C6 be given matrices over H with able shapes. Use A5 = RB4 B5 . Then the following conditions are equivalent: (1) The system of matrix equations ZB4 = C5 , A4 ZB5 = C6 , is consistent. (2) C5 LB4 = 0, (C6 − A4 C5 B4† B5 )LA5 = 0, RA4 C6 = 0. (3) r



C5 B4



= r(B4 ), r



C6 B5

A4 C5 B4



=r



B5

B4



,

r[A4 C6 ] = r(A4 ).

With these conditions, its general solution is Z = C5 B4† + A†4 (C6 − A4 C5 B4† B5 )A†5 RB4 + LA4 Q1 RB4 + Q2 RA5 RB4 , where Q1 and Q2 are free matrices with apt sizes.

System of Sylvester-Type Quaternion Matrix Equations

143

Since the Moore-Penrose inverse of a coefficient matrix and its inducted projectors are crucial to expressions of solutions, there is a problem of their construct. The inverse matrix is determined by the adjugate matrix that gives a direct method of its finding by using minors of an initial matrix. Due to minors, this method can be called the determinantal representation of an inverse. The same is desirable for generalized inverses. However, determinantal representations are not so unambiguous even for complex or real generalized inverses. Through looking for their more applicable explicit expressions, there are various determinantal representations of generalized inverses, in particular for the Moore-Penrose inverse (see, e.g. [74–78]). By virtue of noncommutativity of quaternions, the problem for determinantal representation of generalized quaternion inverses is even more complicated, and only now it can be solved due to the theory of column-row determinants introduced in [79, 80]. For A ∈ Hn×n , we define n row determinants and n column determinants. Suppose Sn is the symmetric group on the set In = {1, . . ., n}. Definition 2.7. [79]. The ith row determinant of A = (aij ) ∈ Hn×n is defined for all i = 1, . . . , n by putting X (−1)n−r (ai ik1 aik1 ik1 +1 . . . aik1 +l1 i ) . . .(aikr ikr +1 . . . aikr +lr ikr ), rdeti A = σ∈Sn

σ = (i ik1 ik1 +1 . . . ik1 +l1 ) (ik2 ik2 +1 . . . ik2 +l2 ) . . . (ikr ikr +1 . . . ikr +lr ) , where ik2 < ik3 < · · · < ikr and ikt < ikt +s for all t = 2, . . . , r and s = 1, . . . , lt. Definition 2.8. [79]. The jth column determinant of A = (aij ) ∈ Hn×n is defined for all j = 1, . . . , n by putting cdetj A =

X

(−1)n−r (ajkr jkr +lr . . . ajkr +1 jkr ) . . . (ajjk1+l1 . . . ajk1 +1 jk1 ajk1 j ),

τ∈Sn

τ = (jkr +lr . . . jkr +1 jkr ) . . . (jk2 +l2 . . . jk2 +1 jk2 ) (jk1 +l1 . . . jk1 +1 jk1 j) ,

where jk2 < jk3 < · · · < jkr and jkt < jkt +s for t = 2, . . ., r and s = 1, . . . , lt. Since [80] for Hermitian A we have rdet1 A = · · · = rdetn A = cdet1 A = · · · = cdetn A ∈ R,

144

Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram et al.

the determinant of a Hermitian matrix is defined by putting det A := rdeti A = cdeti A for all i = 1, . . ., n. Its properties are similar to the properties of an usual (commutative) determinant and they have been completely explored in [80] by using row and column determinants that are so defined only by construction. Further, we give determinantal representations of the Moore-Penrose inverse over H. We shall use the following notations. Let α := {α1 , . . . , αk } ⊆ {1, . . ., m} and β := {β1 , . . . , βk } ⊆ {1, . . . , n} be subsets of the order 1 ≤ k ≤ min {m, n}. Let Aαβ be a submatrix of A ∈ Hm×n whose rows are indexed by α and the columns indexed by β. Similarly, let Aαα be a principal submatrix of A whose rows and columns indexed by α. If A ∈ Hn×n is Hermitian, then |A|αα is the corresponding principal minor of det A. For 1 ≤ k ≤ n, the collection of strictly increasing sequences of k integers chosen from {1, . . . , n} is denoted by Lk,n := { α : α = (α1 , . . . , αk ) , 1 ≤ α1 ≤ . . . ≤ αk ≤ n}. For fixed i ∈ α and j ∈ β, let Ir, m {i} := { α : α ∈ Lr,m , i ∈ α}, Jr, n {j} := { β : β ∈ Lr,n , j ∈ β}. Let a.j be the jth column and ai. be the ith row of A, respectively. Suppose A.j (b) denotes the matrix obtained from A by replacing its jth column with the column-vector b ∈ Hm×1 , and Ai. (b) denotes the matrix obtained from A by replacing its ith row with the row-vector b ∈ H1×n . We denote the ith row and the jth column of A∗ by a∗i. and a∗.j , respectively. Theorem 2.9. [67]. If A ∈ Hrm×n , then the Moore-Penrose inverse A† =   a†ij ∈ Hn×m have the following determinantal representations, a†ij =

P

β∈Jr,n {i}

  β cdeti (A∗ A).i a∗.j β

P

β∈Jr,n

and a†ij =

P

α∈Ir,m {j}

|A∗ A|ββ

,

(2.2)

rdetj ((AA∗ )j. (a∗i.))αα P

α∈Ir,m

|AA∗ |αα

.

(2.3)

System of Sylvester-Type Quaternion Matrix Equations

145

Remark 2.10. For an arbitrary full-rank matrix A ∈ Hrm×n , we put X β cdeti ((A∗ A).i (d.j )) = cdeti ((A∗ A).i (d.j ))β , β∈Jn,n {i}

X

det (A∗ A) =

|A∗ A|ββ when r = n,

β∈Jn,n

X

rdetj ((AA∗ )j. (di.)) =

rdetj ((AA∗ )j. (di. ))αα ,

α∈Im,m {j}

X

det (AA∗ ) =

|AA∗ |αα when r = m,

α∈Im,m

where a column-vector d.j and a row-vector di. have appropriate sizes. Corollary 2.11. If A ∈ Hrm×n , then the projection matrix A† A =: QA = (qij )n×n has the determinantal representation

qij =

P

β∈Jr,n {i}

cdeti ((A∗ A).i (a˙ .j ))ββ P

β∈Jr,n

|A∗ A|ββ

,

where a˙ .j is the jth column of A∗ A ∈ Hn×n . Corollary 2.12. If A ∈ Hrm×n , then the projection matrix AA† =: PA = (pij )m×m has the determinantal representation P rdetj ((AA∗ )j. (¨ ai. ))αα pij =

α∈Ir,m {j}

P

α∈Ir,m

|AA∗ |αα

,

(2.4)

where ¨ ai. is the ith row of AA∗ ∈ Hm×m . Determinantal representations of the orthogonal projectors LA := I − A† A and RA := I − AA† can be easy obtained from (2.11) and (2.4), respectively.

146

3.

Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram et al.

MAIN R ESULT

Now we are able to give the main theorem of this article. For this purpose, denote A5 = RB4 B5 , A6 = A1 LA3 , B6 = RB3 B1 , C7 = C1 − A1 A†3 C3 + C4 B3† B1 , A7 = A2 LA3 LA6 , B7 = −RA5 RB4 B2 , A8 = A2 LA3 , B8 = B6 , C8 = −LA4 , D8 = RB4 B2 , C9 = C2 − A2 [A†3 C3 + LA3 A†6 C7 ] + Z0 B2 , Z0 = C5 B4† + A†4 (C6 − A4 C5 B4† B5 )A†5 RB4 , A9 = RA7 A8 , B9 = B8 LB7 , A10 = RA7 C8 , B10 = D8 LB7 , C10 = RA7 C9 LB7 , F = RA9 A10 ,

(3.1) (3.2) (3.3)

G = B10 LB9 , H = A10 LF .

(3.4)

Theorem 3.1. Given A1 , A2 , A3 , A4 , B1 , B2 , B3 , B4 , B5 , C1 , C2 , C3 , C4 , C5 and C6 of conformable shapes over H. Then the following conditions are equivalent: (1) The system (1.2) is consistent. (2) A3 A†3 C3 = C3 , C4 B3† B3 = C4 , C5 LB4 = 0, (C6 − A4 C5 B4† B5 )LA5 = 0, RA4 C6 = 0, RA6 C7 LB6 = 0, RF RA9 C10 = 0, C10 LB9 LG = 0,

(3.5)

RA9 C10 LB10 = 0, RA10 C10 LB9 = 0. (3) r r r









A3 C6 B5 C5 B4

C1 r  B1 C3





C4 B3



= r(A3 ), r = r(B3 ),    A4 C5 = r B5 B4 , r[A4 C6 ] = r(A4 ), B4  = r(B4 ),    A1 −C4   A1 0 B3  = r + r B1 B3 , A3 A3 0 C3

(3.6) (3.7) (3.8) (3.9)

147

System of Sylvester-Type Quaternion Matrix Equations 

B2 r  A4 C2 C3 

  r   

C2 B2 B1 C1 C3

0 A4 A2 A3

A2 0 0 A1 A3

B5 −C6 0

−C5 B4 0 0 0



  B4  A4 A2 −A4 C5  = r + r B2 A3 0

0 0 B3 −C4 0 



    A1   = r  A2  + r B2  B1  A3

B4 0

B5



B4

,

(3.10)

0 B3



,

(3.11)

  C2 A2 −C5   A2 B4  = r B2 B4 + r , r  B2 0 A3 C3 A3 0   B1 0 B3 0 0  B2  0 0 B5 B4    C A −C 0 0 r 1 1 4     C3 A3 0 0 0 A4 C2 A4 A2 0 −C6 −A4 C5     A1 B1 B3 0 0 =r  A3  + r . B2 0 B5 B4 A4 A2

(3.12)

(3.13)

Under these conditions, the general solution to (1.2) can be expressed as X = A†3 C3 + LA3 (A†6 C7 + W3 B6 + LA6 W1 ), Y = Z=

C4 B3† C5 B4†

− RA6 C7 B6† RB3 + A6 W3 RB3 + W6 RB6 RB3 , + A†4 (C6 − A4 C5 B4† B5 )A†5 RB4 + LA4 W4 RB4

(3.14) (3.15) + W2 RA5 RB4 , (3.16)

with W1 = A†7 (C9 − A8 W3 B8 − C8 W4 D8 ) − A†7 T1 B7 + LA7 T2 ,

(3.17)

† † W2 = RA7 (C9 − A8 W3 B8 − C8 W4 D8 )B7 + A7 A7 T1 + T3 RB7 , W3 = A†9 C10 B9† − A†9 A10 F † C10 B9† − A†9 HA†10 C10 G† B10 B9† − A†9 HZ2 RG B10 B9† + LA9 Z3 + Z4 RB9 ,

(3.18)

(3.19)

† W4 = F † C10 B10 + H †HA†10 C10 G† + LF LH Z5 + LF Z2 RG + Z6 RB10 , (3.20)

148

Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram et al.

where W6 , T1 , T2 , T3 , Z2 , · · · , Z6 are free matrices of allowable sizes over H. Proof. By Lemma 2.4 and Lemma 2.5, the general solution to A3 X = C3 and Y B3 = C4 are X = A†3 C3 + LA3 U1 ,

(3.21)

Y = C4 B3† + U2 RB3 ,

(3.22)

and

where U1 and U2 are arbitrary matrices of adequate shapes over H. By replacing (3.21) and (3.22) in A1 X − Y B1 = C1 and by the support of Lemma 2.1, we get U1 = A†6 C7 + W3 B6 + LA6 W1 ,

(3.23)

and U2 = −RA6 C7 B6† + A6 W3 + W6 RB6 , when RA6 C7 LB6 =0. Substituting (3.23) in (3.21), we have X = A†3 C3 + LA3 [A†6 C7 + W3 B6 + LA6 W1 ].

(3.24)

By Lemma 2.6, the exclusive solution to ZB4 = C5 , A5 ZB5 = C6 is Z =C5 B4† + A†4 (C6 − A4 C5 B4† B5 )A†5 RB4 + LA4 W4 RB4 + W2 RA5 RB4 ,

(3.25)

where W2 and W4 are arbitrary matrices of adequate shapes. By substituting (3.24) and (3.25) in A2 X − ZB2 = C2 , we get A7 W1 + W2 B7 + A8 W3 B8 + C8 W4 D8 = C9 . By Lemma 2.3, Eq. (3.26) has a solution if and only if RF RA9 C10 = 0, C10 LB9 LG = 0, RA9 C10 LB10 = 0, RA10 C10 LB9 = 0.

(3.26)

System of Sylvester-Type Quaternion Matrix Equations

149

With these conditions W1 , W2 , W3 and W4 can be represented by (3.17)-(3.20), respectively. Now we show that RF RA9 C10 = 0 is equivalent to (3.11). By applying Lemma 2.2 to the left hand side of r(RF RA9 C10 ) = 0, we have   r(RF RA9 C10 ) = r RA9 C10 F − r(F )   =r RA9 C10 RA9 A10 − r(RA9 A10 )     =r C10 A10 A9 − r A10 A9     =r RA7 C9 LB7 RA7 C8 RA7 A8 − r RA7 C8 RA7 A8     C9 C8 A8 A7 =r − r A7 A8 C8 − r(B7 ) B7 0 0 0   C2 − A2 X0 + Z0 B2 −LA4 A2 LA3 A2 LA3 LA6 =r −RA5 RB4 B2 0 0 0     −r −LA4 A2 LA3 A2 LA3 LA6 − r −RA5 RB4 B2   C2 − A2 X0 + Z0 B2 −LA4 A2 LA3 =r −RA5 RB4 B2 0 0     −r −LA4 A2 LA3 − r RA5 RB4 B2   C2 − A2 X0 + Z0 B2 I A2 0  RB4 B2 0 0 A5   =r   0 A4 0 0  0 0 A3 0   I A2   −r  A4 0  − r RB4 B2 A5 0 A3   C2 − A2 X0 + Z0 B2 I A2 0  RB4 B2 0 0 RB4 B5   =r    0 A4 0 0 0 0 A3 0   I A2   −r  A4 0  − r RB4 B2 RB4 B5 0 A3   B2 0 B5 B4 =r  A4 C2 A4 A2 −C6 −A4 C5  C3 A3 0 0     A4 A2 −r − r B2 B4 B5 . A3

Hence r(RF RA E) = 0 is equivalent to (3.10). In the same way, we can

150

Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram et al.

prove that the remaining equalities are also satisfied and hence the theorem is finished.

4.

SOME SIMPLIFICATIONS OF THE R ESULT F ORMULAS

Due to [81], if A is Hermitian and idempotent, then the following equations hold for any matrix B, A(BA)† = (BA)† , (AB)† A = (AB)† .

(4.1)

Using (4.1) and to facilitate calculations, we can simplify some of the notations as follows, A7 = A2 LA3 (I − (A1 LA3 )†(A1 LA3 )) = A2 (LA3 − QA6 ), B7 = (PB6 − RB4 )B2 , C9 = C2 − A2 [A†3 C3 + A†6 C7 ] + Z0 B2 , Z0 = C5 B4† + A†4 (C6 − A4 C5 B4† B5 )A†5 , Y0 = C4 B3† − RA6 C7 B6† , A9 = (I − A8 A†7 )A8 , B9 = B8 (I − B7† D8 ), A10 = (I − A8 A†7 )C8 ,

(4.2)

B10 = D8 (I − B7† D8 ), C10 = (I − A8 A†7 )C9 (I − B7† D8 ), F = (RA7 − PA9 )C8 , G = D8 (LB7 − QB9 ), H = (I − A10 F † )A10 . Moreover, the general solution can be simplified as well. We shall divide the result matrices into strictly-defined and conditionally-free parts. So, let W4 = W41 + W42 , where † † W41 = F † C10 B10 + H † HA†10 C10 G† = F † RA7 C9 B10 + H †LF A†10 C9 G†,

W42 = LF LH Z5 + LF Z2 RG + Z6 RB10 , and Z2 , Z5 , Z6 are free matrices of allowable sizes over H. Similarly, by denoting W3 = W31 +W32 , using (4.1) and due to RA9 RA7 = RA7 RA9 and LA4 LF = LF LA4 , we obtain W31 =A†9 C10 B9† − A†9 A10 F † C10 B9† − A†9 HA†10 C10 G†B10 B9† =A†9 C9 B9† − A†9 F † C9 B9† − A†9 LF A†10 C9 G†B2 B9† , W32 = − A†9 HZ2 RG D8 B9† + LA9 Z3 + Z4 RB9 ,

System of Sylvester-Type Quaternion Matrix Equations

151

where Z3 and Z4 are free matrices of allowable sizes as well. Since A8 W31 B8 = A2 W31 B1 and C8 W41 D8 = −W41 B2 , then for W2 = W21 + W22 and W1 = W11 + W12 we have W21 = RA7 (C9 − A2 W31 B1 + W41 B2 )B7† , †



W22 = −RA7 (A8 W32 B8 + C8 W42 D8 )B7 + A7 A7 T1 + RB7 T3 , W11 = A†7 (C9 − A2 W31 B1 + W41 B2 ), W12 = −A†7 (A8 W32 B8 + C8 W42 D8 ) + A†7 T1 B7 + LA7 T2 , where the conditionally-free matrices W12 and W22 possess free matrices T1 , T2 , and T3 , respectively. Finally, the general solution is X =A†3 C3 + A†6 C7 + W31 B1 + W11 + LA3 W32 B6 + LA3 LA6 W12 , Y Z

=C4 B3† =C5 B4†

− RA6 C7 B6† + A1 W31 + A6 W32 RB3 + W6 RB6 RB3 , + A†4 (C6 − A4 C5 B4† B5 )A†5 + W41 + W21 +

LA4 W42 RB4 + W22 RA5 RB4 .

(4.3) (4.4) (4.5) (4.6)

Now we present a numerical example to illustrate our method.

5.

A LGORITHM

WITH A

N UMERICAL E XAMPLE

First we present an algorithm with the support of Theorem 3.1.

Algorithm. (1) First, provide the values of Ai , Bj , Ck , i = 1, . . ., 4, j = 1, . . . , 5, k = 1, . . ., 6. (2) Calculate A9 , B9 , A10 , B10 , C10 , F, G and H by (3.1)-(3.4) and using their simplifications (4.2). (3) Confirm whether (3.6)-(3.13) are true or not. If one of these does not true, ” there is no solution” END. (4) Calculate X, Y and Z from (4.3)-(4.6).

152

Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram et al.

An Example. For given matrices over H A1 B1 B4 C1

C2

C4



     2 1 −i 1   1 −i =  i −j , A2 =  j −i , A3 = , A4 = i j k , j k k i k j       1 i i −2 1 −i 2k = , B2 = , B3 = , −2 −j j −2k −2 j 0     −j k i i 1 = , B5 = , 1 i −k k −j     80 + 14i + 20j + 30k 9 + 8i + 57j + 18k 1 2 2k 1 = 25 − 16i + 65j + 34k −89 − 22i − 2j − 8k , C3 = , 9 9 2j 2i −4 − 11i − 7j − 52k 5 − 25i − 91j − k   −6 − 8i − 16j − 5k 19 − 12i − 6j + 26k   1 = −6 + 5i − 6j − 9k −8 + 8i − 16j + 24k , C6 = 12i −12k , 5 −9 + 5i + 4j − 2k −14 − 14i − 2j − 8k     21 −8j 4 + 2k 2 2i −2k =  8j 9 −4i − 8j , C5 =  8j 9 −4i − 8j . 4 − 2k 4i + 8j 20 2k 2j 2

The matrices Ci , i = 1, . . . , 6, has been chosen that they satisfy all the conditions mentioned in (3.5). By Theorem 2.9, Corollaries 2.11 and 2.12, and using the simplifications (4.2), we have       i 1 1 −j 1 1 −i −k A†3 = , A†4 = −  j  , A†5 = , j 4 i k 3 4 1 k     1 − 2k −10 − k 1 2 + i −i + k 1 − k 1  A†6 = , B3† = 4i − 2j 2i − 5j  , 9 1 − 2i −1 + j −i − j 25 −10k −2 − 4k     j 1 1 −k j 1 1 †  B4 = −k −i , LF = k 1 −i  , 6 3 −i k −j i −2   91 − 23i − 20j − 10k −9 − 8i + 33j − 9k 1 C7 = −25 + 39i + 7j − 43k 8 + 13i + 2j + 17k  . 9 57 + 11i + 7j + 25k −5 − 20i + 10j + k Since RB3 = 0, then B9 = B8 = B6 = 0 and LB9 = I. Similarly, H = 0, A7 = 0, B7 = 0, and RA7 = I, LB7 = I. So, we have Wi1 = 0 for all

System of Sylvester-Type Quaternion Matrix Equations

153

i = 1, . . ., 4. Finally, by putting free matrices W6 , T1 , T2 , T3 , Z2 , . . . , Z6 as zero-matrices, we obtain the following partial solution,   1 50 + 9i − 2j − 16k −2 − 5i + 15j + 12k , X= + = 9 9 − 32i − 16j + 2k −5 + 2i − 6j − 15k   1 − 2k −10 − k Y = C4 B3† = 4i − 2j 2i − 5j  , −10k −2 − 4k   −i 2−k Z = C5 B4† + A†4 C6 A†5 =  i − j + k 1 + i + k . −1 − i − k i − j + k A†3 C3

A†6 C7

C ONCLUSION Some necessary and sufficient conditions have been computed for (1.2) to have a solution in this article. The explicit expression of (1.2) is also constructed when it is consistent by the virtue of generalized inverses and rank equalities. Explicit solution formulas are simplified due to conditions projectors inducted by coefficient matrices. A numerical example is also established when to obtain the explicit expression of the general solution determinantal representations of Moore-Penrose inverses are used.

R EFERENCES [1] Hamilton, W.R. (1844). “On quaternions, or on a new system of imaginaries in algebra”, Philosophical Magazine, 25(3): 489-495. [2] Adler, S.L. (1995). Quaternionic Quantum Mechanics and Quantum Fields. Oxford University Press, New York. [3] Kuipers, J.B. (2002). Quaternions and Rotation Sequences: A Primer with Applications to Orbits, Aerospace, and Virtual Reality, Princeton University, Press, Princeton. [4] Leo, S.D. and Scolarici, G. (2000). “Right eigenvalue equation in quaternionic quantum mechanics”, Journal of Physics A: Mathematical and General, 33: 2971-2995.

154

Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram et al.

[5] Took, C.C. and Mandic, D.P. (2009). “The quaternion LMS algorithm for adaptive filtering of hypercomplex real world processes”, IEEE Transactions on Signal Processing, 57: 1316-1327. [6] Took, C.C. and Mandic, D.P. (2010). “A quaternion widely linear adaptive filter”, IEEE Transactions on Signal Processing, 58: 4427-4431. [7] Took C.C. and Mandic, D.P. (2011). “Augmented second-order statistics of quaternion random signals”, Signal Processing 91: 214-224. [8] Took, C.C. Mandic, D.P. and Zhang, F. (2011). “On the unitary diagonalisation of a special class of quaternion matrices”, Applied Mathematics Letters, 24: 1806-1809. [9] Conway, J.H. and Smith, D.A. (2002). “On Quaternions and Octonions: Their Geometry, Arithmetic, and Symmetry”, A K Peters, Natick. [10] Kamberov, G. Norman, P. Pedit, F. and Pinkall, U. (2002). “Quaternions Spinors and Surfaces, Contemporary Mathematics”, 299, American Mathematical Society, 154. [11] Nebe, G. (1998). “Finite quaternionic matrix groups”, Representation Theory, 2: 106-223. [12] Ward, J.P. (1997). “Quaternions and Cayley Numbers”, Mathematics and Its Applications, vol. 403, Kluwer Academic Publishers, Dordrecht, The Netherlands. [13] Zhang, F. (1997). “Quaternions and matrices of quaternions”, Linear Algebra and its Applications, 251: 21-57. [14] Gavin, K.R. and Bhattacharyya, S.P. (1982). “Robust and well-conditioned eigenstructure assignment via Sylvester’s equation”, In Proceedings of American Control Conference, Arlington, Virginia. [15] Kwon, B.H. and Youn, M.J. (1987). “Eigenvalue-generalized eigenvector assignment by output feedback”, IEEE Transactions on Automatic Control, 32(5): 417-421. [16] Shahzad, A. Jones, B.L. Kerrigan, E.C. and Constantinides, G.A. (2011). “An efficient algorithm for the solution of a coupled Sylvester equation appearing in descriptor systems”, Automatica, 47: 244-248.

System of Sylvester-Type Quaternion Matrix Equations

155

[17] Saberi, A. Stoorvogel, A.A. and Sannuti, P. (2003). Control of linear systems with regulation and input constraints, Berlin, Springer Verlag. [18] Syrmos, V.L. and Lewis, F.L. (1994). “Coupled and constrained Sylvester equations in system design”, Circuits, Systems, and Signal Processing, 13(6): 663-694. [19] Cavinlii, R.K. and Bhattacharyya, S.P. (1983). “Robust and wellconditioned eigenstructure assignment via Sylvester’s equation”, Optimal Control Applications and Methods 4(3): 205-212. [20] Chen, J. Patton, R. and Zhang, H. (1996). “Design unknown input observers and robust fault detection filters”, International Journal of Control 63: 85-105. [21] Park, J. and Rizzoni, G. (1994). “An eigenstructure assignment algorithm for the design of fault detection filters”, IEEE Transactions on Automatic Control, 39: 1521-1524. [22] Varga, A. (2000). “Robust pole assignment via Sylvester equation based state feedback parametrization: Computer-Aided Control System Design (CACSD)”, IEEE International Symposium, 57: 13-18. [23] Syrmos, V.L. and Lewis, F.L. (1993). “Output feedback eigenstructure assignment using two Sylvester equations”, IEEE Transactions on Automatic Control, 38: 495-499. [24] Villareal, E.R.L. Vargas, J.A.R. and Hemerly, E.M. (2009). “Static output feedback stabilization using invariant subspaces and Sylvester equations”, Trends in Applied and Computational Mathematics, 10(1): 99-110. [25] Li, R.C. (1999). “A bound on the solution to a structured Sylvester equation with an application to relative perturbation theory”, SIAM Journal on Matrix Analysis and Applications 21(2): 440-445. [26] Barraud, A. Lesecq, S. and Christov, N. (2001). “From sensitivity analysis to random floating point arithmetics-application to Sylvester Equations”, Numerical Analysis and Applications, Lecture Notes in Computer Science 1988: 35-41.

156

Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram et al.

[27] Darouach, M. (2006). “Solution to Sylvester equation associated to linear descriptor systems”, Systems and Control Letters, 55: 835-838. [28] Zhang, Y.N. Jiang, D.C. and Wang, J. (2002). “A recurrent neural network for solving Sylvester equation with time-varying coefficients”, IEEE Transactions on Neural Networks, 13(5): 1053-1063. [29] He, H.F. Cai, G.B. and Han, X.J. (2014). “Optimal pole assignment of linear systems by the Sylvester matrix equations”, Abstract and Applied Analysis, 7 pages. [30] Dehghan, M. and Hajarian, M. (2011). “Analysis of an iterative algorithm to solve the generalized coupled Sylvester matrix equations”, Applied Mathematical Modelling, 35: 3285-3300. [31] Ding, F. and Chen, T. (2005). “Gradient based iterative algorithms for solving a class of matrix equations”, IEEE Transactions on Automatic Control, 50(8): 1216-1221. [32] Ding, F. Liu, P. and Ding, J. (2008). “Iterative solutions of the generalized Sylvester matrix equations by using the hierarchical identification principle”, Applied Mathematics and Computation, 197(1): 41-50. [33] Duan, G.R. and Zhou, Z. (2006). “Solution to the second-order Sylvester matrix equation M V F 2 + DV F + KV = BW ”, IEEE Transactions on Automatic Control, 51(5): 805-809. [34] K˚agstr¨om, B. and Westin, L. (1989). “Generalized Schur methods with condition estimators for solving the generalized Sylvester equation”, IEEE Transactions on Automatic Control, 34: 745-751. [35] Song, C.Q. and Chen, G.L. (2011). “On solutions of matrix equations ˜ = C over quaternion field”, Journal XF − AX = C and XF − AX of Applied Mathematics and Computing, 37: 57-88. [36] Song, C.Q. Chen, G.L. and Zhao, L.L. (2011). “Iterative solutions to coupled Sylvester-transpose matrix equations”, Applied Mathematical Modelling , 35: 4675-4683. [37] Ter´an, F.D. and Dopico, F.M. (2011). “The solution of the equation XA + AX T = 0 and its application to the theory of orbits”, Linear Algebra and its Applications, 434: 44-67.

System of Sylvester-Type Quaternion Matrix Equations

157

[38] Ter´an, F.D. and Dopico, F.M. (2011). “The equation XA + AX ∗ = 0 and the dimension of ∗-congruence orbits”, Electronic Journal of Linear Algebra, 22: 448-465. [39] Tsui, C.C. (1987). “A complete analytical solution to the equation T A − F T = LC and its applications”, IEEE Transactions on Automatic Control, 8: 742-744. [40] Wimmer, H.K. (1994). “Consistency of a pair of generalized Sylvester equations”, IEEE Transactions on Automatic Control, 39: 1014-1016. [41] Wu, A.G. Duan, D.R.and Zhou, B. (2008). “Solution to generalized Sylvester matrix equations”, IEEE Transactions on Automatic Control, 53(3): 811-815. [42] Zhou, B. and Duan, G.R. (2006). “A new solution to the generalized Sylvester matrix equation AV − EV F = BW ”, Systems & Control Letters, 55: 193-198. [43] Wang, Q.W. and He, Z.H. (2013). “Solvability conditions and general solution for the mixed Sylvester equations”, Automatica, 49: 2713-2719. [44] He, Z.H. and Wang, Q.W. (2014). “A pair of mixed generalized Sylvester matrix equations”, Journal of Shanghai Normal University (Natural Sciences), 20(2): 138-156. [45] Liu, Y.H. (2006). “Ranks of solutions of the linear matrix equation AX + Y B = C”, Computers & Mathematics with Applications, 52: 861-872. [46] Lee, S.G. and Vu, Q.P. (2012). “Simultaneous solutions of matrix equations and simultaneous equivalence of matrices”, Linear Algebra and its Applications, 437: 2325-2339. [47] Rehman, A. and Wang, Q.W. (2015). “A system of matrix equations with five variables”, Applied Mathematics and Computation, 271: 805-819. [48] Rehman, A. Wang, Q.W. and He, Z.H. (2015). “Solution to a system of real quaternion matrix equations encompassing η-Hermicity”, Applied Mathematics and Computation, 265: 945-957.

158

Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram et al.

[49] Rehman, A. Wang, Q.W. Ali, I. Akram, M. and Ahmad, M.O. (2017). “A constraint system of generalized Sylvester quaternion matrix equations”, Advances in Applied Clifford Algebras, 27(4): 3183-3196. [50] Wang, Q.W. Rehman, A. He, Z.H. and Zhang, Y. (2016). “Constraint generalized Sylvester matrix equations”, Automatica, 69: 60-64. [51] Beik, F.P.A. (2014). “Theoretical results on the global GMRES method for solving generalized Sylvester matrix equations”, Bulletin of the Iranian Mathematical Society 40(5): 1097-1117. [52] Dehghan, M. and Hajarian, M. (2014). “Solving the system of generalized Sylvester matrix equations over the generalized centro-symmetric matrices”, Journal of Vibration and Control, 20(6): 838-846. [53] Dmytryshyn, A. and K˚agstr¨om, B. (2015). “Coupled sylvester-type matrix equations and block diagonalization”, SIAM Journal on Matrix Analysis and Applications, 36(2): 580-593. [54] Futorny, V. Klymchuk, T. and Sergeichuk, V.V. (2016). “Roth’s solvability b = C and X −AXB b = C over criteria for the matrix equations AX − XB the skew field of quaternions with an involutive automorphism q → qb”, Linear Algebra and Applications, 510: 246-258. [55] Kaabi, A. (2014). “On the numerical solution of generalized Sylvester matrix equations”, Bulletin of the Iranian Mathematical Society, 40(1): 101113. [56] Kyrchei, I. (2013). “Explicit representation formulas for the minimum norm least squares solutions of some quaternion matrix equations”, Linear Algebra and its Applications, 438(1): 136-152. [57] Kyrchei, I. (2017). “Explicit determinantal representation formulas for the solution of the two-sided restricted quaternionic matrix equation”, Journal of Applied Mathematics and Computing, doi:10.1007/s12190-017-1148-6. [58] Kyrchei, I. (2018). “Determinantal representations of solutions to systems of quaternion matrix equations”, Advances in Applied Clifford Algebras, 28: 23 pages.

System of Sylvester-Type Quaternion Matrix Equations

159

[59] Madiseh, M.D. and Dehghan, M. (2014). “GeneralizedPsolution sets p of the interval generalized Sylvester matrix equation i=1 Ai Xi + Pq j=1 Yj Bj = C and some approaches for inner and outer estimations”, Computers & Mathematics with Applications, 68: 1758-1774. [60] Ramadan, M.A. El-Danaf, T.S. and Bayoumi, A.M.E. (2015). “Two iterative algorithms for the reflexive and Hermitian reflexive solutions of the generalized Sylvester matrix equation”, Journal of Vibration and Control, 21(3): 483-492. [61] Benzaouia,A. Rami, M.A. and Faiz, S.E. (2004). “Stabilization of linear systems with saturation: a Sylvester equation approach”, IMA Journal of Mathematical Control and Information 21(3): 247-259. [62] Yang, C. Liu, J. and Liu, Y. (2012). “Solutions of the generalized Sylvester matrix equation and the application in eigenstructure assignment”, Asian Journal of Control, 14(6): 1669-1675. [63] Duan, G.R. Liu, G.P. and Thompson, S. (2001). “Eigenstructure assignment design for proportional-integral observers: continuous-time case”, IEE Proceedings - Control Theory and Applications, 148(3): 263-267. [64] Tsui, C.C. (1988). “New approach to robust observer design”, International Journal of Control, 47(3): 745-751. [65] Dehghan, M. and Hajarian, M. (2009). “An efficient iterative method for solving the second-order Sylvester matrix equation EV F 2 − AV F − CV = BW ”, IET Control Theory & Applications, 3: 1401-1408. [66] Kyrchei, I. (2011). “Determinantal representations of the Moore-Penrose inverse over the quaternion skew field and corresponding Cramer’s rules”, Linear and Multilinear Algebra, 59(4): 413-431. [67] Kyrchei, I. (2012). “Determinantal representation of the Moore-Penrose inverse matrix over the quaternion skew field”, Journal of Mathematical Sciences, 180(1): 23-33. [68] Baksalary, J.K. and Kala, P. (1979). “The matrix equation AX + Y B = C”. Linear Algebra and its Applications, 25: 41-43.

160

Abdur Rehman, Ivan I. Kyrchei, Muhammad Akram et al.

[69] Marsaglia, G. and Styan, G.P.H. (1974). “Equalities and inequalities for ranks of matrices”, Linear and Multilinear Algebra, 2: 269-292. [70] He, Z.H. and Wang, Q.W. (2015). “The general solutions to some systems of matrix equations”, Linear & Multilinear Algebra, 63 (10): 2017-2032. [71] Wang, Q.W. and He, Z.H. (2012). “Some matrix equations with applications”, Linear and Multilinear Algebra, 60: 1327-1353. [72] Buxton, J.N. Churchouse, R.F. and Tayler, A.B. (1990). “Matrices Methods and Applications”. Clarendon Press, Oxford. [73] Wang, Q.W. (2005). “A system of four matrix equations over von neumann regular rings and its applications”, Acta Mathematica Sinica, English Series, 21(2): 323-334. [74] Bapat, R.B. Bhaskara, K.P.S. and Prasad, K.M. (1990). “Generalized inverses over integral domains”, Linear Algebra and its Applications, 140: 181-196. [75] Gabriel, R. (1969). “Das verallgemeinerte inverse einer matrix deren elemente u¨ ber einem beliebigen K¨orper angeh¨oren”, Journal of reine angewandte Mathematics, 234: 107-122. [76] Kyrchei, I. (2008). “Analogs of the adjoint matrix for generalized inverses and corresponding Cramer’s rules”, Linear and Multilinear Algebra, 56(9): 453-469. [77] Kyrchei, I. (2015). “Cramer’s rule for generalized inverse solutions”. In: I. Kyrchei (Ed.), Advances in Linear Algebra Research, Nova Science Publ., New York,79-132. [78] Stanimirovi´c, P.S. (1996). “General determinantal representation of pseudoinverses of matrices”, Mat. Vesn, 48: 1-9. [79] Kyrchei, I. (2007). “Cramer’s rule for quaternion systems of linear equations”, Fundamentalnaya i Prikladnaya Matematika, 13(4): 67-94. [80] Kyrchei, I. (2012). “The theory of the column and row determinants in a quaternion linear algebra”, In: A.R. Baswell (Ed.), Advances in Mathematics Research 15, Nova Science Publ., New York, pp. 301-359.

System of Sylvester-Type Quaternion Matrix Equations

161

[81] Maciejewski, A.A. and Klein, C.A. (1985). “Obstacle avoidance for kinematically redundant manipulators in dynamically varying environments”, The International Journal of Robotics Research, 4(3): 109-117.

In: Hot Topics in Linear Algebra Editor: Ivan I. Kyrchei

ISBN: 978-1-53617-770-1 c 2020 Nova Science Publishers, Inc.

Chapter 5

H ESSENBERG M ATRICES : P ROPERTIES AND S OME A PPLICATIONS Taras Goy∗and Roman Zatorsky† Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, Ukraine

Abstract We consider a new approach to studying the properties of Hessenberg matrices and propose an effective algorithm for calculating the determinants and permanents of such matrices. Also, we present the applications of Hessenberg matrices to the problems associated with linear recurrence relations and partition polynomials.

1.

INTRODUCTION

Over the years, linear algebra has been shown as the most fundamental component in mathematics as it presents powerful tools in wide varieties of areas from theoretical science to engineering, including computer science. Since the time of Leibniz, who considered tables of numbers (matrices) as a mathematical object for the first time in Europe, various matrices and their functions permeate all discrete and continuous mathematics. Although the permanent is rarely used in ∗ †

Corresponding Author’s Email: [email protected]. Corresponding Author’s Email: [email protected].

164

Taras Goy and Roman Zatorsky

linear algebra, it is often used in discrete mathematics, especially in graph theory and combinatorics. Determinants have a long history in mathematics and arise in numerous scientific and engineering applications, mostly as tools for solving linear systems of equations, matrix inversion and eigenvalue problems. Hessenberg matrices play a considerable role among square matrices. An (upper) Hessenberg matrix is a square matrix having zero entries below the first subdiagonal. Such matrices were first studied by German engineer and mathematician Karl Hessenberg whose dissertation investigated the computation of eigenvalues and eigenvectors of linear operators [1]. Hessenberg matrices prove their usefulness especially in computational linear algebra and computer programming because they are “almost upper triangular” and are obtained in intermediate steps in many algorithms for the eigenvalue problem [2]. Hessenberg matrices arise everywhere where linear recurrence relations occur, as well as where problems related to polynomials of partitions appear. Also, the Hessenberg matrices, in a certain sense, are the limiting case of square matrices for which there are Polya transformations [3]. It is important that normalized Hessenberg matrices and their functions (determinants and permanents) have a bijective relation with parapermanents of corresponding triangular tables [4, 5]. Since there is no systematic study of the Hessenberg matrices in contemporary literature, in this work we investigate general properties of the normalized Hessenberg matrices functions and their functions and their connections with linear recurrent relations.

2.

P RELIMINARY D EFINITIONS

2.1.

Hessenberg Matrices

AND

R ESULTS

Definition 2.1. A lower Hessenberg matrix Mn = (mij ) is an n × n matrix whose entries above the superdiagonal are all zero, i.e.,   m11 m12 0 ··· 0 0  m21  m22 m23 ··· 0 0     .. .. .. . . . . . .   . . . . . . Mn =  (1) .  mn−2,1 mn−2,2 mn−2,3 · · · mn−2,n−1  0    mn−1,1 mn−1,2 mn−1,3 · · · mn−1,n−1 mn−1,n  mn1 mn2 mn3 ··· mn,n−1 mnn

165

Hessenberg Matrices

Hessenberg matrices play an important role in both computational and applied mathematics (see, for example, [6–8] and references therein). Some authors computed determinants and permanents of various tridiagonal matrices which are in fact Hessenberg matrices [9, 10]. Cahill et al. [11] gave a recurrent relation for the determinant of the matrix Mn as follows det(Mn ) = mnn det(Mn−1 ) +

n−1 X

(−1)n−k mnk det(Mk−1 )

k=1

n−1 Y

mi,i+1 , (2)

i=k

where det(M0 ) = 1, by definition. In [12], Tamm give the concept of a normalized Hessenberg matrix (the Hessenberg matrix in a normalized form), i.e., mi+1,i = 1 for i = 1, . . . , n: 

h11 h21 .. .

1 h22 .. .

0 1 .. .

··· ··· .. .

    Hn =   hn−2,1 hn−2,2 hn−2,3 · · ·   hn−1,1 hn−1,2 hn−1,3 · · · hn1 hn2 hn3 ···

0 0 .. .

0 0 .. .

1 hn−1,n−1 hn,n−1



    . 0   1  hnn

(3)

For the special choice mij = ti−j+1 for all i and j, i.e., on each diagonal all the elements are the same, we get the Toeplitz-Hessenberg matrix   t1 t0 0 ··· 0 0  t2 t1 t0 ··· 0 0     .. .. .. ..  .. ..  . . . . .  . Tn = Tn (t0 ; t1 , . . . , tn ) =   , (4)  tn−2 tn−3 tn−4 · · · t0 0     tn−1 tn−2 tn−3 · · · t1 t0  tn tn−1 tn−2 · · · t2 t1

where t0 6= 0 is assumed. Then, from (2), we obtain

n X det(Tn) = (−t0 )k−1 tk det(Tn−k ), k=1

with det(T0 ) = 1.

n ≥ 1,

(5)

166

Taras Goy and Roman Zatorsky

The following result is known as Trudi’s formula (the case t0 = 1 of this formula is known as Brioschi’s formula [13]. It gives the multinomial extension of det(Tn ) as follows   X s1 + · · · + sn s1 det(Tn ) = (−t0 )n−s1 −···−sn t1 · · · tsnn , (6) s1 , . . ., sn s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

or, equivalently, det(Tn) =

X (−t0 )n−k k=1

2.2.

X

ti1 ti2 · · · tik .

i1 ,...,ik ≥1 i1 +i2 +···+ik =n

Permanents and Determinants

Basic combinatorial notions, which give rise to the notions of the determinant and the permanent of a square matrix are the notions of permutation and transversal (recall that the transversal of a square matrix is a tuple of elements taken by one at a time from each row and each column of the matrix). More precisely, the determinant and the permanent of a square matrix are defined as follows. Let   p11 p12 · · · p1n  p21 p22 · · · p2n    (7) Pn =  . .. ..  . ..  .. . . .  pn1 pn2 · · ·

pnn

Definition 2.2. The permanent and the determinant of the square matrix (7) are, respectively, the numbers X pi1 1 pi2 2 · · · pin n , (8) perm(Pn ) = ϕ∈Sn

det(Pn ) =

X

signϕ · pi1 1 pi2 2 · · · pin n ,

(9)

ϕ∈Sn

where the summation is over all permutations of n elements (i.e., over the symmetric group Sn ) and sgn ϕ is the sign of the permutation ϕ.

Hessenberg Matrices

167

It is well known that the permanent can be defined for any rectangular matrix. In order to compare the definitions of determinant and permanent, however, we consider only the permanent of a square matrix. In algebra it is proved that each permutation decomposes uniquely into a product of independent cycles. The number n − r, where n is the order of the permutation, and r is the number of cycles, is called the decrement of the permutation. The set of permutations of order n is divided into even and odd permutations (the evenness of the permutation coincides with the evenness of its decrement). In linear algebra, the computation of the permanent of a matrix is a problem that is thought to be more difficult than the computation of the determinant of a matrix despite the slight difference of the definitions. The development of both exact and approximate algorithms for computing the permanent of a matrix is an active area of researches.

2.3.

Ordered Partitions of a Positive Integer

An m-partition of a nonnegative integer n into nonnegative integer summands is the set (n1 , n2 , . . ., nm ) of the positive integers, the elements of which satisfy n1 + n2 + · · · + nm = n. Denote by Cm (n, +) the set of all ordered m-partitions of positive integer n, and by C(n, +) the set of all ordered partitions of positive n into nonnegative integer summands. For example, C3 (4, +) = {(2, 1, 1), (1, 2, 1), (1, 1, 2)}, C(4, +) = {(4), (3, 1), (1, 3), (2, 2), (2, 1, 1), (1, 2, 1), (1, 1, 2), (1, 1, 1, 1)}. If |Cm (n, +)| = c(n, m) and |C(n, +)| = c(n), then c(n) =

n X

c(n, m),

(10)

m=1

where c(n, m) = 0 for n < m. To calculate the number c(n, m) we need the following statement. Proposition 2.1. The number of nonnegative integer solutions of the equation x1 + x2 + · · · + xm = n equals

 n+m−1 n

.

(11)

168

Taras Goy and Roman Zatorsky

Proof. We use a bijective proof. Every solution (x1 , x2 , . . . , xm) of (11) we assign (n + m − 1)-tuple which consist of n units and (m − 1) zeroes separating units, i.e, (x1 , x2 , . . ., xm ) ←→ (1, . . ., 1, 0, 1, . . ., 1, 0, . . . , 0, 1, . . ., 1). | {z } | {z } | {z } x1

x2

Now it is obvious that the desired number is

n+m−1 n

Proposition 2.2. The following equality holds   n−1 c(n, m) = . m−1

xm

 .

(12)

Proof. Since m ≤ n, then we subtract m from both sides of (11). We get the equation (x1 − 1) + (x2 − 1) + · · · + (xm − 1) = n − m. Consequently, the bijection is established between positive integer solutions of (11) and nonnegative integer solutions of the equation y1 + y2 + · · · + ym = n − m, where yi = xi − 1, i = 1, . . ., m. By Proposition 2.1, the number of solutions n−1  of the last equation equals m−1 . Proposition 2.3. The following holds

c(n) = 2n−1 .

(13)

Proof. The result follows directly from (10) and (12): c(n) =

n X

m=1

c(n, m) =

 n  X n−1

m=1

m−1

= 2n−1 .

Hessenberg Matrices

2.4.

169

Sets Ξ(n)

Definition 2.3. The set Ξ(n) is the set of all ordered n-tuples  ξ = ξ(1), ξ(2), . . ., ξ(n)

from the multiset with the primary specification {11 , 22 , . . . , nn }, the elements of which satisfy the following conditions: 1) an integer ξ(j) > 0 satisfies j ≤ ξ(j) ≤ n for j ∈ {1, 2, . . ., n}; 2) for each j ∈ {1, 2, . . ., n} the following equalities hold ξ(j) = ξ(j + 1) = . . . = ξ(ξ(j)). For example, Ξ(3) = {(1, 2, 3), (2, 2, 3), (1, 3, 3), (3, 3, 3)}. Note that for j = n the inequality 1) in Definition 2.3 implies ξ(n) = n. The sets Ξ(n) first appeared while finding the number of shortest paths in a Ferrer graph [14], connecting the limiting southeast point with the limiting northwest point of this graph. Proposition 2.4. There is a one-to-one correspondence between the elements of the set Ξ(n) and the elements of the set C(n, +) of ordered partitions of a positive integer n. Proof. We prove this in several steps; see [4]. 1. If an element ξ ∈ Ξ(n) has primary specification [1α(1), 2α(2), . . . , nα(n)], then α(1) + α(2) + · · · + α(n) = n. Therefore, the exponents α(1), α(2), . . . , α(n) of the first specification of an element ξ form some ordered partitions of a number n. Thus, we obtain reflection ϕ : ξ 7→ (α(1), α(2), . . ., α(n)) of the set Ξ(n) in the set C(n, +). 2. Let us prove a injectiveness of the reflection ϕ. Let ξ1 = (ξ1 (1), ξ1(2), . . ., ξ1 (n)),

ξ2 = (ξ2 (1), ξ2(2), . . ., ξ2 (n))

be two different elements of the set Ξ(n) with primary specifications [1α1(1), 2α1(2) , . . ., nα1 (n) ] and [1α2(1), 2α2(2), . . . , nα2(n) ].

170

Taras Goy and Roman Zatorsky

Let i be also the least index when ξ1 (i) 6= ξ2 (i). We assume that ξ1 (i) < ξ2 (i). Then from the condition 2) in Definition 2.3 follows the inequality α1 (ξ2 (i)) < α2 (ξ2 (i)), i.e., different ordered partitions of the set C(n, +) correspond to the elements ξ1 and ξ2 . 3. We construct back reflection Ξ(n) in C(n, +) by the following algorithm. Let p = (p(1), p(2), . . ., p(s)) ∈ C(n, +). 1◦ beginning; 2◦ i := 1; p := p(i); j := 1; 3◦ ξ(j) = . . . = ξ(p) = p; 4◦ j := p + 1; i := i + 1; 5◦ if i ≤ s, then p := p + p(i); go to 3◦ ; 6◦ end. Since p(i) ≥ 1, after meeting 4◦ and 5◦ of this algorithm, the following inequality holds j ≤ p, which together with the equalities of 3◦ satisfy both conditions in Definition 2.3. Let ξ ∈ Ξ(n) and r be a number of different components of an element ξ. The number n − r is a decrement ξ of an element ξ, and a number ε(ξ) = (−1)n−r is its sign. Note that the bijection ϕ from proof of Proposition 2.4 maintains the signs as the number of different components of an element ξ equals the number of non-zero components of the partition ϕ(ξ). From Propositions 2.3 and 2.4 it follows directly that |Ξ(n)| = 2n−1 .

(14)

The set Ξ(n) can also be constructed by the following recurrent algorithm [4]: 1◦ Ξ(1) = {(1)}; 2◦ If the set Ξ(k) is constructed already, then the elements of the set Ξ(k+1) can be obtained by forming two elements of the set Ξ(k + 1) with the help of each element ξ = (ξ(1), . . ., ξ(k)) of the set Ξ(k). The first is ascribed to the (k + 1) place of a number k + 1, and the second is formed with replacement of all the components equal to k by k + 1 and is ascribed to the (k + 1) place of the component k + 1. Proof. By Definition 2.3, (k + 1) place in each ordered multiset of the set Ξ(k + 1) is taken by a number k + 1, therefore, ascribing the number k + 1

171

Hessenberg Matrices

to the elements of the set Ξ(k) to the (k + 1) place, we obtain 2k−1 different elements of the set Ξ(k+1). The set of these elements is denoted by Ξ(k; k+1). Replacement of a number k by a number k + 1 in each multiset of the set Ξ(k) as well as ascription of k+1 to the (k+1) place does not violate conditions of the definition of the set Ξ(n), for all the elements ξ(i), less than k, satisfy these conditions, and a number k + 1, which appeared in the j place, also takes all the consecutive places up to (k + 1) inclusive. This procedure gives 2k−1 more different elements of the set Ξ(k + 1). Let us denote the set of these elements by Ξ(k + 1; k + 1). Multiplicity of occurrence of a number k + 1 in each element of the set Ξ(k; k + 1) equals 1, and multiplicity of occurrence of the same number in each element of the set Ξ(k + 1; k + 1) is more than 1. Therefore, all the elements of these two sets are different and belong to set Ξ(k + 1). Since Ξ(k; k + 1) ∪ Ξ(k + 1; k + 1) ⊆ Ξ(k + 1) and |Ξ(k; k + 1) ∪ Ξ(k + 1; k + 1)| = 2k , from (14) follows Ξ(k; k + 1) ∪ Ξ(k + 1; k + 1) = Ξ(k + 1).

2.5.

Definition of Triangular Tables and Their Functions

In this section, we give the notion of functions of triangular tables, which are called the paradeterminant and the parapermanent of triangular tables. For more details and complete the bibliography see, for example, [4, 5, 15–17]. Definition 2.4. A triangular table

An = (aij )1≤j≤i≤n



a11  a21  = .  ..

a22 .. .

an1 an2



..

. ···

ann

   

(15)

of the numbers from some number field is called a triangular table of order n. To every element aij of the table (15), we correspond (i − j + 1) elements aik with k ∈ {j, . . ., i}, which are called derivative elements of a triangular table, generated by a key element aij . A key element of a triangular table is concurrently its derivative element.

172

Taras Goy and Roman Zatorsky

The product of all derivative elements generated by a key element aij is denoted by {aij } and called a factorial product of the key element aij , i.e., Yi aik . {aij } = k=j

Let us show schematically elements of the matrix (15) by means of circles, key and derivative elements by means of filled circles and asterisks, respectively. Fig. 2.1 presents the triangular table A5 , where a42 is a key element, and elements a42 , a43 , a44 are derivative elements, which it generates.   ◦ ◦ ◦    ◦ ◦ ◦    ◦ • ∗ ∗  ◦ ◦ ◦ ◦ ◦ Fig. 2.1

A tuple of key elements of the table (15) is called a normal tuple of this table, if they generate a monotransversal, i.e., a set of derivative elements of cardinality n, no two of which belong to the same column of this table. For instance, in order to add the key element of the table shown in Fig. 2.1 to a normal tuple of key elements, it is necessary to add two more key elements a11 and a55 to it. It turns out that there is a one-to-one correspondence between elements of the set C(n, +) and normal tuples of key elements of the table (15) of order n. Let us consider some ordered r-partition p = (p1 , . . ., pr ). To each component ps , s ∈ {1, . . . , r}, of this partition, we assign a key element aij of the table (15) by the following algorithm: 1◦ beginning; 2◦ j := 1; s := 0; i := 0; 3◦ s := s + 1; i := i + ps ; key element (s) := aij ; 4◦ if s < r then j := j + ps ; go to 3◦ ; 5◦ end. Thereby, we obtain a normal tuple of key elements generated by the partition p. It is also easy to establish a backward correspondence. There is a one-to-one correspondence between ordered r-partitions and normal tuple of key elements, namely (n1 , n2 , . . . , nr ) ∈ Cr (n, +) ⇔ (aσ1 ,σ0 +1 , aσ2,σ1 +1 , . . . , aσr ,σr −1 ),

(16)

173

Hessenberg Matrices

where σ0 = 0, σs = ni + ni+1 + · · · + ns with s = 1, 2, . . ., r. This algorithm describes one more geometric image of ordered partitions of a positive integer n into positive integer summands. Let us have a triangular table of order n. By the partition p = (p1 , . . . , pr ) of a positive integer n, we construct a normal tuple of elements of this table, which form its monotransversal. To the first component p1 of this partition, we correspond a row of elements of a triangular table, where there are exactly p1 elements of this table. This row is the p1 th row of the table. Then the first p1 columns are ignored and a new triangular table of order n − p1 is considered. To the second component p2 of the partition, we correspond a row of a new table, which consists of p2 elements, etc. For example, let us show a one-to-one correspondence between ordered partitions of the number 4 and normal tuples of key elements of a triangular table A4 according to the following schemes: ◦ ◦ ◦ ◦ ◦ ◦ • ∗ ∗ ∗ (4) ◦ • ∗ ◦ ◦ • ◦ ◦ ◦ • (2,1,1)

◦ ◦ ◦ • ∗ ∗ ◦ ◦ ◦ •

• ◦ ◦ ◦ ◦ ◦ ◦ • ∗ ∗

◦ • ∗ ◦ ◦ ◦ ◦ ◦ • ∗

(1,3)

(2,2)

• ◦ • ◦ ◦ ◦ ◦ ◦ • ∗

• ◦ • ◦ ◦ • ◦ ◦ ◦ •

(3,1) • ◦ ◦ ◦ • ∗ ◦ ◦ ◦ • (1,2,1)

(1,1,2)

(1,1,1,1)

Fig. 2.2 To every normal tuple a of key elements, we assign the sign (−1)ε(a), where ε(a) is the sum of all subscripts of the key elements from this tuple. Definition 2.5. The paradeterminant of the triangular table (15) is the number ddet(An ) =

X

(α1,...,αr )∈C(n,+)

(−1)

ε(a)

r Y

{ai(s),j(s)},

(17)

s=1

where ai(s),j(s) is the key element corresponding to the sth component of the partition α = (α1 , α2 , . . . , αr ), and (−1)ε(a) is the sign of the normal tuple a of key elements.

174

Taras Goy and Roman Zatorsky

The parapermanent of a triangular table defined similarly to its paradeterminant. Definition 2.6. The parapermanent of the triangular table (15) is the number r Y

X

pper(An ) =

{ai(s),j(s)},

(18)

(α1,...,αr )∈C(n,+) s=1

where ai(s),j(s) is a key element corresponding to the sth component of the partition α = (α1 , α2 , . . ., αr ) ∈ C(n, +). For example, using Definition 2.5, find the value of the paradeterminant of the triangular table A4 (see Fig. 2.2):   a11 a21 a22   ddet(A4 ) = ddet  a31 a32 a33  a41 a42 a43 a44 = −a41 a42 a43 a44 + a31 a32 a33 a44 + a11 a42 a43 a44 + a21 a22 a43 a44

− a21 a22 a33 a44 − a11 a32 a33 a44 − a11 a22 a43 a44 + a11 a22 a33 a44 . Note that the parapermanent of a triangular table can be defined as a sum of products of elements of all monotransversals of this table. In view of Proposition 2.3, both the paradeterminant and the parapermanent of order n consist of 2n−1 terms. In the sequel, where the paradeterminant and the parapermanent of a triangular table are presented simultaneously, we will use the term parafunction of a triangular table. We will prove the theorem which could be the definition of the parafunctions of the triangular table An . This theorem is based on the bijection (16) (see [18]). Theorem 2.1. If An is the triangular table (15), then the following equalities hold: ddet(An ) = pper(An ) =

n X

X

(−1)n−r

r=1 p1 +···+pr =n r n X X Y

r=1 p1 +···+pr =n s=1

r Y

{ap1 +···+ps , p1 +···+ps−1 +1 },

s=1

{ap1+···+ps , p1 +···+ps−1 +1 },

(19)

Hessenberg Matrices

175

where the summation is over the set of positive integer solutions of the equation p1 + · · · + pr = n. Next, we will give one more theorem which could be the definition of parafunctions of a triangular table. It is based on the notion of the set Ξ(n) [4]. Theorem 2.2. If An is the triangular table (15), then the following equalities hold X ddet(An ) = (−1)n−r aξ(1),1aξ(2),2 · · · aξ(n),n , ξ∈Ξ(n)

pper(An ) =

X

aξ(1),1 aξ(2),2 · · · aξ(n),n ,

ξ∈Ξ(n)

where r is the number of elements in the basis of the multiset ξ or the number of elements belonging to this basis.

2.6.

On Connection of Parafunctions of Triangular Tables

An important fact of calculating triangular tables is the ability to establish a bijective relationship between parafunctions of a triangular table. Based on this fact, we established a bijective relationship between the permanent and determinant of Hessenberg matrix. The next theorem gives a relation of the parapermanent to the paradeterminant [18]. Theorem 2.3. If An is the triangular table (15), then the following holds  pper(An ) = ddet (−1)δij +1 aij 1≤j≤i≤n , (20)

where δij is the Kronecker symbol.

Proof. By the definition of the paradeterminant of a triangular table, the sign of its each summand depends on evenness of the sum of indices of all key elements. It is obvious that the sign of the factorial product of the key element aij of the  table (−1)δij +1 aij 1≤j≤i≤n coincides with the sign of the expression (−1)2i. Consequently, all the summands of the paradeterminant of the right-hand side of (20) have the plus sign.

176

Taras Goy and Roman Zatorsky

Corollary 2.1. For every triangular table Bn = (bij )1≤j≤i≤n ,  ddet(Bn ) = pper (−1)δij +1 bij 1≤j≤i≤n ,

(21)

where δij is the Kronecker symbol.

Proof. The result immediately follows from (20) if aij = (−1)δij +1 bij .

3.

R ELATIONSHIP OF THE D ETERMINANT OF H ESSENBERG MATRIX AND PARADETERMINANT

The analogy of properties of determinants and paradeterminants can be explained to a great extent by close relation between them. It turns out that in a number of cases, determinants can be replaced with paradeterminants. Since n(n−1) to find the values of the latter ones, it is enough to perform only 2 multiplications and the same number of additions, in many cases, the replacement of the determinant by the paradeterminant equal to it may considerably simplify computing. Let us consider the normalized Hessenberg matrix Hn in the form (3). Theorem 3.1. [4] For any triangular table (15), the following holds ddet(An ) = det(Hn ), where hij = {aij } =

Yi

k=j

aik ,

1 ≤ j ≤ i ≤ n.

(22) (23)

Proof. We will prove that the following algorithm establishes a one-to-one correspondence between the set of normal tuples of key elements of a triangular table and the set of transversals with non-zero elements of the Hessenberg matrix Hn : i) if aij is a key element of a triangular table, then the element hij of a square matrix belongs to the transversal; ii) but if aik , k ∈ {j +1, . . ., i}, is any derivative element of the key element aij , then to the transversal belongs unity, which is in the (k − 1)th row and the kth column of the square matrix. 1. Let us consider two factorial products of key elements ai1 ,j1 and ai2 ,j2 , which belong to one normal tuple. By the definition of a normal tuple of key elements and their factorial product, the sets of column numbers of all the elements of these factorial products satisfy {j1 , j1 + 1, . . ., i1 } ∩ {j2, j2 + 1, . . ., i2 } = ∅.

Hessenberg Matrices

177

Therefore, the given above the algorithm to each normal tuple of key elements of the triangular table (15) corresponds the transversal of non-zero elements of the Hessenberg matrix (3). 2. In consideration of point i) of the above algorithm, different normal tuples of key elements of a triangular table correspond to different transversals with non-zero entries of the Hessenberg matrix. 3. The number of transversals with non-zero elements of a matrix B2 equals two. Decompose the determinant of a normalized Hessenberg matrix of order k by the elements of the first row. At that we obtain two Hessenberg determinants of order k − 1. Consequently, by induction, the number of transversals with non-zero elements of the matrix (3), as well as the number of all normal tuples of the triangular table (15), equals 2n−1 . 4. To prove the theorem, it remains to show that the sign of the respective summands of the paradeterminant and the determinant are the same. Let ai1 j1 , ai2 j2 , . . . , aik jk be some normal tuple of key elements of a triangular taPk ble, to which the following sign corresponds (−1) s=1 (is +js ) , and the following equalities hold i1 < i2 < . . . < ik . By the given above algorithm, to a key element aij and its derivative elements corresponds an element bij and i − j elements, which belong to the rows with the numbers less than i. Thus, the total number of transpositions P of permutation of the first indices, respective to the given normal tuple, equals ks=1 (is − js ) and has the same evenness as the P value of the expression ks=1 (is + js ) which defines the sign of the respective summand of the paradeterminant. Note that this demonstration is valuable not because of its simplicity, but because of construction of a one-to-one correspondence between the normal tuples of key elements of a triangular table and transversals of non-zero elements of a normalized Hessenberg matrix. Corollary 3.1. For every Hessenberg matrix (3), the following equality holds   h11  h21 h22   h22  det(Hn ) = ddet  . (24) . . . . . .  .  . . hn1 hn2 hn2 hn3 · · · hnn

Note that the equality (24) holds even when some elements of a Hessenberg matrix equal 0, because when finding the value of the respective paradeterminant, zeros are canceled and uncertainty disappears.

178

4.

Taras Goy and Roman Zatorsky

N ORMALIZATION OF A G ENERAL H ESSENBERG MATRIX

A normalized Hessenberg matrix (3) are more convenient than the general Hessenberg matrix (1) because it has only n(n+1) elements that affect its value. In 2 addition, a normalized Hessenberg matrix, as noted above, can be represented as a triangular table (hij )1≤j≤i≤n . In our opinion, the main advantages of a normalized Hessenberg matrices over a general Hessenberg matrix is its simpler connection with corresponding linear recurrence relations, and also a natural bijective connection with calculus of triangular tables. Now we consider one statement by which we can normalize a general Hessenberg matrix (1). Proposition 4.1. For n ≥ 1, the following formula hold  h11 a1 0 ··· 0 0  h21 h a · · · 0 0 22 2   .. .. .. . .. . .. ..  . . . . det   hn−2,1 hn−2,2 hn−2,3 · · · a 0 n−2   hn−1,1 hn−1,2 hn−1,3 · · · hn−1,n−1 an−1 hn1 hn2 hn3 ··· hn,n−1 hnn 

1 h22 .. .

h11 a1 h21 .. .

     = det  n−2 Q  ai · hn−1,1   i=1  n−1 Q ai · hn1 i=1

n−2 Q

ai · hn−1,2

i=2 n−1 Q i=2

ai · hn2

... ··· .. .

0 0 .. .

...

hn−1,n−1

. . . an−1 hn,n−1

         0 0 .. .



      . (25) 1     hnn

Proof. The result obviously follows after the term-by-term multiplication of the ith (2 ≤ i ≤ n) row from the determinant on the left side of the equality Qi−1 by Qi−1 k=1 ak and the term-by-term division of the jth row (2 ≤ j ≤ n) by k=1 ak .

Proposition 4.1 with ai = −1, i ∈ {1, 2, . . ., n − 1}, yields the following result (see [18]).

179

Hessenberg Matrices Proposition 4.2. The following equality hold  h11 1 ··· 0 0  −h h · · · 0 0 21 22   .. .. .. .. ..  . . . . . det   (−1)n−3 hn−2,1 (−1)n−4 hn−2,2 · · · 1 0   (−1)n−2 hn−1,1 (−1)n−3 hn−1,2 · · · hn−1,n−1 1 (−1)n−1 hn,1 (−1)n−2 hn,2 · · · −hn,n−1 hn,n   h11 −1 0 ··· 0 0  h21 h22 −1 ··· 0 0     .. .. .. .. ..  ..  . . . . . .  = det  .  hn−2,1 hn−2,2 hn−2,3 · · · −1 0     hn−1,1 hn−1,2 hn−1,3 · · · hn−1,n−1 −1  hn,1 hn,2 hn,3 · · · hn,n−1 hn,n

5.

        

P OLYA T RANSFORMATION AND H ESSENBERG MATRICES

Permanents are an important objects of linear algebra and find applications in various areas of mathematics, especially in combinatorial analysis, as they enumerate perfect matchings in bipartite graphs, in physics as they compute certain integrals and is computer science as they occupy a special place in the computational complexity hierarchy [19, 20]. In spite of the apparent simplicity of permanents, a natural algorithm for calculating them by simplifying matrices, similar to Gauss’ algorithm for calculating determinants, has not been discovered so far. In this context, Polya posed the problem of finding a transformation taking permanents to determinants and proved the impossibility of assigning + and − to the elements of a matrix of order n ≥ 3 so that the permanent is transformed into its determinant [3]. After fundamental generalization of Polya’s result given by Marcus and Minc [21], the hope to find even linear transformations, which would transform the permanent of a matrix of order n ≥ 3 into the determinant of the same order, disappeared. In this connection, in [20], the following problem was essentially posed: “Among all matrices of order n, determine a class of matrices for which a linear transformation is defined so that the permanent of every matrix coincides with the determinant of an initial matrix”.

180

Taras Goy and Roman Zatorsky

Definition 5.1. Let M be a square matrix of order n. The Polya transformation of this matrix is an assignment of the signs + and − to its elements which turns the permanent of the matrix into its determinant. We denote such transformation by P (M ). We will determine a class of matrices for which a Polya transformation exists. According to Theorem 3.1, for any triangular table (15) and HessenQ berg matrix (3), ddet(An ) = det(Hn ), where hij = {aij } = ik=j aik with 1 ≤ j ≤ i ≤ n. Corollary 3.1 establishes an identity between the determinant and the paradeterminant of a triangular tables. The first part of the proof of Theorem 3.1 consists in constructing a bijection between the summands of the paradeterminant pper(An ) and those of the determinant det(Hn ). The second part of the proof is devote to the signs of these terms. Since the permanents of a square matrices differs from the determinants of this matrix only in signs, it follows from Corollary 3.1 that perm(Hn ) = pper(An ),

(26)

where Hn and An are Hesssenberg matrix (3) and the triangular table (15), hij . respectively, and aij = hi,j+1 From (21) we find pper(An ) = ddet ((−1)δij +1 aij )1≤j≤i≤n , where δij is the Kronecker symbol. Thus, from (26) and (20) we obtain     hij δij +1 hij perm(Hn ) = pper = ddet (−1) . hi,j+1 1≤j≤i≤n hi,j+1 1≤j≤i≤n Since

Qi

hik k=j hi,k+1

= hij , it follows that

perm(Hn ) = det (−1)i−j hij

which implies the following result [18].



i,j=1,2,...,n

,

Theorem 5.1. For any Hessenberg matrix (3), there exist a Polya transformation, which is defined by P (Hn ) = (−1)i−j hij , i, j ∈ {1, 2, . . . , n}, in other words, perm(Hn ) = det(P (Hn )).

181

Hessenberg Matrices

Interestingly, for the Hessenberg matrix (3) to which at least one nonzero element hij , with i − j > −1 is added, then exist no Polya transformation. Indeed, let Hn∗ be a square matrix obtained from the Hessenberg matrix (3) by adding a nonzero element hij with j − i ≥ 2. Since the permanent of any matrix does not change under transpositions of columns or rows, we can pass to the new matrix in which the ith and jth columns are transposed. Then we obtain a matrix on which principal diagonal the following blocks occur consecutively: [h11 ], [h22 ], . . ., [hi−1,i−1 ], [hij ], [hi+1,i+1 ], . . ., [hj−2,j−2 ],   hj−1,j−2 hj−1,j−1 hj−1,i  hj,j−2 hj,j−1 hji  , [hj+2,j+2 ], . . ., [hnn ]. hj+1,j−2 hj+1,j−1 hj+1,i

Thus, in the initial permanent, six transversals are to be constructed with the help of the elements on the principal diagonal and the six transversals of the block   bj−1,j−2 bj−1,j−1 bj−1,i  bj,j−2 bj,j−1 bji  . bj+1,j−2 bj+1,j−1 bj+1,i

Here three of these transversals (those corresponding to even permutations) must be positive, and the remaining three transversals must be negative. In [3], Polya proved that such an arrangement of signs in three-order martix is impossible. This proves the desired assertion. Thus, a Hessenberg matrix contain the maximum number of nonzero elements among all matrices for which a Polya 2 elements, transformation exist. Since a Hessenberg matrix consists of n +3n−2 2 Theorem 5.1 agrees with Gibson’s assertion that if (0, 1)− matrix A of order n has a positive permanent and it is possible to transform the permanent of A into determinant by assigning the signs ± to the elements of a matrix A, then the 2 number of ones in A is at most n +3n−2 ; see [22]. 2 Now we prove one more proposition, which gives yet another Polya transformation for a Hessenberg matrix [18]. Proposition 5.1. For n ≥ 1,  a11 1  −a a 21 22   .. .. det  . .   (−1)n−2 an−1,1 (−1)n−3 an−1,2 (−1)n−1 an,1 (−1)n−2 an,2

··· ··· .. . ··· ···

0 0 .. .

0 0 .. .

an−1,n−1 1 −an,n−1 an,n

      

182

Taras Goy and Roman Zatorsky 

−1 a22 .. .

a11 a21 .. .

0 −1 .. .

··· ··· .. .

   = det    an−1,1 an−1,2 an−1,3 · · · an,1 an,2 an,3 · · ·

0 0 .. . an−1,n−1 an,n−1

0 0 .. .



   .  −1  an,n

(27)

Proof. For n = 1 and n = 2, the equality is obvious. Suppose that (27) hold for n = 1, 2, . . ., k − 1. Let us prove it for n = k. To this end, it suffices to decompose the determinant on the left-hand side of (27) over the last column into two determinants of order (k−1), multiply the second determinant and each element of its last row by (−1), apply (27) with n = k − 1 to both (k − 1)thorder determinants, and note that the sum of the two obtained terms is the result of decomposing the right-hand side of (27) over the last column. Thus, we obtain the following result. Theorem 5.2. [18] For a Hessenberg matrix (3), a Polya transformation exists and can be defined by   hij , 1 ≤ j ≤ i ≤ n, P (Hn ) = −1, j − i = 1,   0, j − i ≥ 2.

6.

A LGORITHMS FOR C ALCULATING D ETERMINANTS AND P ERMANENTS OF THE H ESSENBERG M ATRIX

In this Section, we present effective algorithms for calculating the functions of Hessenberg matrices using only the elements under a main diagonal.

6.1.

Calculating Functions of Hessenberg Matrix Using Ordered Partitions and Sets Ξ(n)

Theorem 6.1. For n ≥ 1, the following formula hold det(Hn ) =

n X

X

(−1)n−r

r=1 p1 +···+pr =n

r Y

s=1

hp1 +···+ps , p1 +···+ps−1 +1 .

(28)

183

Hessenberg Matrices

Proof. To prove the result, we will use Theorem 2.1 and Theorem 3.1 on connection of the paradeterminant and determinant of Hessenberg matrix (3). According to (23), we can replace the factorial product {ap1 +···+ps ,p1 +···+ps−1 +1 } on (19) by the element hp1 +···+ps ,p1 +···+ps−1 +1 . As a result, we get the desired equality (28). A similar theorem holds for the permanent of Hessenberg matrix. Theorem 6.2. The following formula hold perm(Hn ) =

n X

r Y

X

hp1 +...+ps ,p1+...+ps−1 +1 .

(29)

r=1 p1 +···+pr =n s=1

The next theorems give effective algorithms for calculating functions of Hessenberg matrix using sets Ξ(n). Theorem 6.3. For n ≥ 1, det(Hn ) =

X

(−1)n−r

(ξ1 ,...,ξn )∈Ξ(n)

n Y

hξi −ξi−1 +i−1,i ,

(30)

i=1

where ξ0 = 0 and hi,j = 1 with i < j. Proof. Let us illustrate the proof of this theorem for n = 5. One of the elements of the set Ξ(5) is (ξ1 , ξ2, ξ3 , ξ4 , ξ5 ) = (1, 3, 3, 5, 5). After a decomposition of the determinant in the sum of summands, by Theorem 2.2, this set will be correspond to the summand (−1)5−3 a11 a31 a33 a54 a55 = (−1)2 {a11 }{a31 }{a54 }. Calculating the corresponding determinant of the Hessenberg matrix, we, according to Theorem 3.1, have to replace each factorial product to corresponding element of the Hessenberg matrix. As a result, we get the summand (−1)5−3 h11 h31 h54 . Thus, we multiply only those elements of the Hessenberg matrix that correspond to the key elements of the triangular table, and replace all derive elements by 1’s. This can be achieved in the product hξ1 −ξ0 +0,1 hξ2 −ξ1 +1,2 hξ3 −ξ2 +2,3 hξ4 −ξ3 +3,4 hξ5 −ξ4 +4,5 = h11 h32 h23 h54 h45 , if ξ0 = 0 and hi,j = 1, i < j. A similar theorem holds for the permanent of the Hessenberg matrix Hn .

184

Taras Goy and Roman Zatorsky

Theorem 6.4. For n ≥ 1, the following identity hold: det(Hn ) =

X

n Y

hξi −ξi−1 +i−1,i ,

(31)

(ξ1 ,...,ξn)∈Ξ(n) i=1

where ξ0 = 0 and hi,j = 1 with i < j.

6.2.

Reducing the Order of Hessenberg Matrices

Functions of a Hessenberg matrix can be calculated effectively by reducing their order. Expanding the function of nth order Hessenberg matrix, we obtain two (n − 1)th order functions of Hessenberg matrices. For example, let Vn+1 (x1 ; z1 ) 

z1 x1 z2 x1 x2 z3 .. .

1 z2 x2 z3 .. .

0 1 z3 .. .

··· ··· ··· .. .

    = det     x1 · · · xn−1 zn x2 · · · xn−1 zn x3 · · · xn−1 zn · · · x1 · · · xn x2 · · · xn x3 · · · xn ···

0 0 0 .. . zn xn

Expanding the determinant along the first row, we obtain  z2 1 ··· 0  x z z · ·· 0 2 3 3   . .. . . .. .. .. Vn+1 (x1 ; z1 ) = z1 det  .   x2 · · · xn−1 zn x3 · · · xn−1 zn · · · zn x2 · · · xn x3 · · · xn · · · xn  x1 z2 1 ··· 0 0  x1 x2 z3 z3 ··· 0 0   . . .. .. .. .. .. − det  . . .   x1 · · · xn−1 zn x3 · · · xn−1 zn · · · zn 1 x1 · · · xn x3 · · · xn · · · xn 1

0 0 0 .. .



0 0 .. .



    .   1  1

     1  1       

= z1 Vn (x2 ; z2 ) − x1 Vn (x2 ; z2 ) = (z1 − x1 )Vn(x2 ; z2 ).

185

Hessenberg Matrices Therefore, for every positive integer number n,  z1 1 0  x z z 1 1 2 2   x x z x z3 z3 1 2 3 2  det  .. .. ..  . . .   x1 · · · xn−1 zn x2 · · · xn−1 zn x3 · · · xn−1 zn x1 · · · xn x2 · · · xn x3 · · · xn n Y = (zi − xi ).

··· ··· ··· .. .

0 0 0 .. .

··· ···

zn xn

0 0 0 .. .



       1  1 (32)

i=1

The similar result holds for permanent of the normalized Hessenberg matrix, i.e., 

z1 x1 z2 .. .

1 z2 .. .

0 1 .. .

··· ··· .. .

   perm    x1 · · · xn−1 zn x2 · · · xn−1 zn x3 · · · xn−1 zn · · · x1 · · · xn x2 · · · xn x3 · · · xn ··· n Y = (zi + xi ).

0 0 .. . zn xn

0 0 .. .



     1  1 (33)

i=1

If z1 = . . . = zn−1 = 1 and xi = i, i = 1, 2, . . . , n − 1, then we can rewrite (33) as follows   0! 1 0 ··· 0 0 0! 1!  1! 1 ··· 0 0  1!  n−1  0!  Y  .. .. .. .. .. .. =  . . . . . . (1 + i) = n!. perm   (n−2)! (n−2)! (n−2)! (n−2)!   i=1 · · · (n−2)! 1   0! 1! 2! (n−1)! (n−1)! (n−1)! (n−1)! (n−1)! · · · (n−2)! (n−1)! 0! 1! 2! The next theorem gives an efficient polynomial algorithm for calculating of Hessenberg matrices.

186

Taras Goy and Roman Zatorsky

Theorem 6.5. For generalized Hessenberg matrix (3), the following is true   h11 h22 − h21 1 0 ··· 0  h11 h32 − h31 h33 1 ··· 0     . . . .. , (34) . .. .. .. .. det(Hn ) = det .     h11 hn−1,2 − hn−1,1 hn−1,3 hn−1,4 · · · 1  h11 hn2 − hn1 hn3 hn4 · · · hnn

where n ≥ 2.

Proof. Expanding the determinant of the Hessenberg matrix along the first raw, we obtain   h22 1 ··· 0 0  h32 h33 ··· 0 0      . . . . . . . . . . h11 · det . . . . .     hn−1,2 hn−1,3 · · · hn−1,n−1 1  hn2 hn3 ··· hn,n−1 hnn   h21 1 ··· 0 0  h31 h33 ··· 0 0     ..  .. .. .. .. − det . .  . . .    hn−1,1 hn−1,3 · · · hn−1,n−1 1  hn1 hn3 ··· hn,n−1 hnn   h11 h22 1 ··· 0 0  h11 h32 h33 ··· 0 0     ..  . . . . .. .. .. .. = det .     h11 hn−1,2 hn−1,3 · · · hn−1,n−1 1  h11 hn2 hn3 ··· hn,n−1 hnn   h21 1 ··· 0 0  h31 h33 ··· 0 0      . . . . . . . . . . − det  . . . .  .    hn−1,1 hn−1,3 · · · hn−1,n−1 1  hn1 hn3 ··· hn,n−1 hnn

187

Hessenberg Matrices 

h11 h22 − h21 h11 h32 − h31 .. .

1 h33 .. .

0 1 .. .

··· ··· .. .

   = det   h11 hn−1,2 − hn−1,1 hn−1,3 hn−1,4 · · · h11 hn2 − hn1 hn3 hn4 ···

0 0 .. .



   .  1  hnn

It follows from Theorem 6.5 that the determinant of the Hessenberg matrix of order n we can reduce to the determinant of order (n − 1) using only (n − 1) the multiplication operation. So, its value we can calculate using only n(n−1) 2 multiplication operation. Therefore, Theorem 6.5 makes it possible to construct an efficient polynomial algorithm. Moreover, this algorithm can not improve elements, each significantly because the the Hessenberg matrix contains n(n+1) 2 of which affects the value of the determinant. Next, we formulate a similar result for the permanent of Hessenberg matrices. Theorem 6.6. For n ≥ 2,  h11 h22 + h21 1 0  h h + h h 1 11 32 31 33   .. .. .. perm(Hn ) = perm . . .   h11 hn−1,2 + hn−1,1 hn−1,3 hn−1,4 h11 hn2 + hn1 hn3 hn4

··· ··· .. . ··· ···

0 0 .. .



   . (35)  1  hnn

The proof of the result is similar to the proof of Theorem 6.5. Let us consider Theorems 6.5 and 6.6 in more detail. According to these theorems, the parafunction of triangular table An of order n is replaced by the (1) parafunction of the triangular table An−1 of order (n−1), the elements of which can be found in two steps: (1) 1. We find the value of the expression hi1 = (h11 − hi1 )hi2 with i ∈ {2, 3, . . ., n} in the case of finding the paradeterminant and the value of the (1) expression hi1 = (h11 + hi1 )hi2 , i = 2, 3, . . ., n, in the case of of finding the parapermanent. We place the greatest common divisor (if any) outside the sign of the respective parafunction. We replace i by i+1 in the obtained expression. The obtained

188

Taras Goy and Roman Zatorsky (1)

expression is equal to the value of the elements hi1 with i ∈ {1, 2, . . ., n − 1}, of the first column of the new matrix of order (n − 1). (1) 2. We obtain the rest of the elements of the new matrix Hn−1 by replacing the elements hij with i, j ∈ {3, 4, . . ., n} by the respective elements (1)

hij = hi+1,j+1 ,

i, j = 2, 3, . . ., n − 1.

Thus, with the help of (n − 1) iterations (k)

(k−1)

ai1 = (h11 (k)

(k−1)

(k−1)

(k−1)

− hi+1,1 )hi+1,2 , k = 1, . . . , n − 1,

hij = hi+1,j+1 ,

i = 1, . . . , n − k,

i, j = 2, . . ., n − k,

(36)

we get (n−1)

ddet(Hn ) = h11

,

(0)

where we assume that hij = hij . When calculating the permanent of Hessenberg matrix, these iterations will be similar as follows

(k)

(k−1) (k−1) (k−1) hi+1,2 + hi+1,1 , k = 1, . . ., n − 1, (k−1) hi+1,j+1 , i, j = 2, . . . , n − k.

hi1 = h11 (k)

hij =

i = 1, . . ., n − k,

For example, we find the determinant of the Hessenberg matrix   ia + jb + c , det i−j +1 1≤j≤i≤n where



ia + jb + c i−j +1



=

i Y ia + kb + c . i−k+1

k=j

(37)

189

Hessenberg Matrices The first iteration is as follows a11 = a + b + c, a(i + 1) + b + c , i = 1, 2, . . ., n − 1, ai+1,1 = i+1 a(i + 1) + 2b + c ai+1,2 = , i = 1, 2, . . ., n − 1, i (1) ai1 = (a11 − ai+1,1 )ai+1,2   a(i + 1) + b + c a(i + 1) + 2b + c = a+b+c− i+1 i a(i + 1) + 2b + c = (b + c) , i = 1, 2, . . ., n − 1. i+1

We place the common multiplier (b + c) outside the sign of the paradeterminant. The we obtain (1)

a(i + 1) + b(j + 1) + c (i + 1) − (j + 1) + 1 ai + bj + a + b + c = , i, j = 2, 3, . . . , n − 1. i−j+1

aij = ai+1,j+1 =

The second iteration is as follows (1)

2a + 2b + c , 2 a(i + 1) + a + 2b + c = , i = 1, 2, . . ., n − 1, i+2 a(i + 1) + a + 3b + c = , i = 1, 2, . . ., n − 1, i (1) (1) (1) = (a11 − ai+1,1 )ai+1,2   2a + 2b + c a(i + 1) + a + 2b + c a(i + 1) + a + 3b + c − = 2 i+2 i 2b + c a(i + 1) + a + 3b + c = · , 2 i+2

a11 = (1)

ai+1,1 (1)

ai+1,2 (2)

ai1

where i = 1, 2, . . ., n − 2. We take out the common multiplier (2)

aij =

ai + bj + 2a + 2b + c , i−j +1

2b+c 2 .

i = 2, 3, . . . , n − 2.

190

Taras Goy and Roman Zatorsky The third iteration is (2)

3a + 3b + c , 3 a(i + 1) + 2a + 3b + c , i = 1, 2, . . ., n − 2, = i+3 a(i + 1) + 2a + 4b + c = , i = 1, 2, . . ., n − 2, i (2) (2) (2) = (a11 − ai+1,1 )ai+1,2   3a + 3b + c a(i + 1) + 2a + 3b + c a(i + 1) + 2a + 4b + c − · = 3 i+3 i 3b + c a(i + 1) + 2a + 4b + c = · , i = 1, 2, . . ., n − 3. 3 i+3

a11 = (2)

ai+1,1 (2)

ai+1,2 (3)

ai1

We take out the common multiplier (3)

aij =

3b+c 3

ai + bi + 3a + 3b + c , i−j+1

as follows i, j = 2, 3, . . . , n − 3.

After the three iterations, we see some regularities. For example, the first columns are ai + a + 2b + c , i+1

ai + 2a + 3b + c , i+2

ai + 3a + 4b + c . i+3

2b+c 3b+c Common multipliers of the first columns are b+c 1 , 2 , 3 . So, it is possible that after the kth iteration, the first column and the first ai+ka+(k+1)b+c kb+c , k . Indeed, according to the multiplier will be written as i+k assumption, the respective equalities of the kth iteration will be of the form:   (k + i − 1)a + kb + c ka + kb + c (k + i − 1)a + (k + 1)b + c − i+k−1 k i−1 (i − 1)(kb + c) ai + (k − 1)a + (k + 1)b + c = · k(i + k − 1) i−1 kb + c ai + (k − 1)a + (k + 1)b + c = · , k i+k−1

The common multiplier is equal to (k)

ai1 =

kb+c k .

ai + ka + (k + 1)b + c , i+k

Then i = 1, 2, . . ., n − k.

191

Hessenberg Matrices

Our assumption is true. Thus, after (n − 1) iterations, outside the sign of (n−1)b+c the paradeterminant, we place the common divisor n−1 . At that the only element of the paradeterminant of the first order is equal to ai + (n − 1)a + nb + c , i+n−1 i.e.,

i = 1,

na+nb+c . n

It follows that for any real numbers a, b, c, the following identity holds 

ai + bj + c det i−j +1



=

1≤j≤i≤n

n−1 na + nb + c Y (ib + c), n! i=1

Q where 0i=1 (ib + c) = 1. Let us consider some partial cases of the Hessenberg determinant (38): • a = 1, b = −1, c = m:

det

"(

)# 1 mi−j+1{1} · i − j + 1 mi−j{1} 1≤j≤i≤n   i−j +m mn{−1} = det = ; i−j +1 n! 1≤j≤i≤n

• a = −1, b = 1, c = m:

det



 1 mi−j+1 · i − j + 1 mi−j{−1} 1≤j≤i≤n   mn{1} −i + j + m = det = ; i−j+1 n! 1≤j≤i≤n

• a = 0, b = 1, c = 0: det



j i−j +1



1≤j≤i≤n

= 1;

(38)

192

Taras Goy and Roman Zatorsky

• a = 0, b = 0, c = 1: 

det



=

1 ; n!



=

1n{2} ; 2n n!

1 i−j+1

1≤j≤i≤n

• a = −1, b = 1, c = 12 : det



2j − 2i + 1 2i − 2j + 2

1≤j≤i≤n

• a = 2, b = 1, c = 0: det



2i + j i−j +1



= 3,

1≤j≤i≤n

where xn{k} = x(x+k)(x+2k) · · · (x+(n−1)k) with x0{k} = 1, by definition.

6.3.

Decomposition of the Hessenberg Matrices by Elements of a Last Row

Denote the determinant (38) by Dn . Using (5) expand it along the elements of the last row as follows Dn =

n X

(−1)n−s Ds−1

s=1

n Y na + kb + c , n − (k − 1)

k=s

Q where D0 = 1 and qi=p (·) = 1 with g < p. Therefore, the last identity we can rewrite as Dn = (−1)n−1

n n n Y Y na + kb + c X na + kb + c + (−1)n−s Ds−1 . n−k+1 n−k+1 s=2

k=1

Since Ds−1 =

k=s

s−2 (s − 1)a + (s − 1)b + c Y (ib + c), (s − 1)! i=1

we have the following result.

193

Hessenberg Matrices Theorem 6.7. For arbitrary real numbers a, b, c, n−1 n Y na + kb + c na + nb + c Y (ib + c) = (−1)n−1 n! n−k+1 i=1

k=1

n s−2 n X Y (s − 1)a + (s − 1)b + c Y na + kb + c + (−1)n−s (ib + c) . (39) (s − 1)! n−k+1 s=2

i=1

k=s

Let us consider some partial cases of (39): • a = 1, b = −1, c = m: n

ms−1{−1} mn−s+1{1} mn{−1} X = (−1)n+s · ; n! (s − 1)! (n − s + 1)! s=1

• a = −1, b = 1, c = m, n

mn{1} X ms−1{1} mn−s+1{−1} = (−1)n+s · ; n! (s − 1)! (n − s + 1)! s=1

• a = 0, b = 1, c = 0: n X

(−1)n+s

s=1

sn−s+1{1} = 1; (n − s + 1)!

• a = 0, b = 0, c = 1: n X

(−1)n+s

s=1

1 1 = ; (s − 1)!(n − s + 1)! n!

• a = −1, b = 1, c = 12 : n X s=1

1s−1{2} 1n−s+1{−2} 1n{2} · = ; 2s−1 (s − 1)! 2n−s+1 (n − s + 1)! 2n n!

• a = 2, b = 1, c = 0: (−1) ,

n−1 (2n + 1)

n!

n{1}

+3·

n X i=2

(−1)n−i

(2n + i)n−i+1{−1} = 3. (n − i + 1)!

where xn{k} = x(x + k)(x + 2k) · · · (x + (n − 1)k) with x0{k} = 1.

194

7.

Taras Goy and Roman Zatorsky

A PPLICATIONS

OF

H ESSENBERG MATRICES

In this section, we consider some applications of the Hessenberg matrices to recurrent relations. For more details and complete the bibliography see, for example, [23, 24].

7.1.

Some Recurrent Relations and Hessenberg Matrices

Proposition 7.1. Let a0 = b0 =Qx0 = 1 and xs = 0 for s < 0, and {aij } is the factorial product, i.e., {aij } = ik=j aik . Systems of equations xi = {aii }b1 xi−1 − {ai,i−1 }b2 xi−2 + · · · + (−1)i−1 {ai1 }bixi−i ,

(40)

xi = −({aii }b1 xi−1 + {ai,i−1 }b2 xi−2 + · · · + {ai1 }bixi−i ),

(41)

xi = −({aii }b1 xi−1 − {ai,i−1 }b2 xi−2 + · · · + (−1)i−1 {ai1 }bixi−i ),

(42)

xi = {aii }b1 xi−1 + {ai,i−1 }b2 xi−2 + · · · + {ai1 }bixi−i ,

(43)



1 x2 a1 .. .

x1 a1 x2 a2 .. .

0 1 .. .

   perm    xn−1 an−1 xn−1 an−2 xn−1 an−3 xn an xn an−1 xn an−2

··· ... .. .

0 0 .. .

0 0 .. .

. . . xn−1 a1 1 . . . xn a2 xn a1



    = bi ,  

(44)

where i = 1, 2, . . ., have, respectively, the following solutions xi = det ({asr }bs−r+1 )1≤r≤s≤i , i

(45)

xi = (−1) det ({asr }bs−r+1 )1≤r≤s≤i ,

(46)

xi = (−1)iperm ({asr }bs−r+1 )1≤r≤s≤i ,

(47)

xi = perm ({asr }bs−r+1 )1≤r≤s≤i , bi xi = . ai + ai−1 b1 + ai−2 b2 + · · · + a2 bi−2 + a1 bi−1

(48)

Proof. The result for system (40) clearly holds, when i = 1. Suppose it is true for i = 2, 3, . . ., k − 1 and prove it for i = k. Since the right-hand side of (40) for i = k is a expansion of the paradeterminant (45) along he elements of its last row, this statement for system (40) is true for every positive integer i.

Hessenberg Matrices

195

To prove the result for systems (41)–(43), we write them, respectively, as xi = aii (−b1 )xi−1 − aii ai,i−1 b2 xi−2 + · · · + (−1)i−1 aii · · · ai1 (−1)i bi , xi = aii (−b1 )xi−1 − aii ai,i−1 (−b2 )xi−2 + · · · + (−1)i−1 aii · · · ai1 (−bi ), xi = aii b1 xi−1 − aii ai,i−1 (−b2 )xi−2 + · · · + (−1)i−1 aii · · · ai1 (−1)i−1 bi . Apply to these systems the result for system (40). Taking out (−1) from each column of the parapermanent which is the solution of systems (41) and (42) and applying Theorem 2.3 for systems (42) and (43), we obtain, respectively, solutions (46)–(48). The proof of the statement for the system (44) follows from the expansion of the parapermanent on the left-hand side of (44) along the last row.

7.2.

Some Combinatorial Identities Using Hessenberg-Toeplitz Matrices

In [25], Horadam presented a history of number sequence nno {On }n≥1 = , 2n n≥1

(49)

attributed to famous French naturalist and philosopher Nicole Oresme. The Oresme numbers On also can be defined by the recurrence 1 On = On−1 − On−2 , 4

n ≥ 2,

(50)

with initial values O0 = 0, O1 = 12 . We study some families of determinants of Toeplitz-Hessenberg matrices of the form (4) the entries of which are Oresme numbers. Our first result provides a relation between the Oresme numbers and the Fibonacci and Pell sequences. Recall that the Fibonacci sequence {Fn }n≥0 is defined by the initial values F0 = 0, F1 = 1 and the recurrence Fn = Fn−1 + Fn−2 , where n ≥ 2. The Pell sequence {Pn }n≥0 is defined by the recurrence Pn = 2Pn−1 + Pn−2 , where P0 = 0, P1 = 1.

196

Taras Goy and Roman Zatorsky

Theorem 7.1. [26] For n ≥ 1,  F2n det Tn (−1; O1, O2 , . . ., On ) = n , 2  F3n−1 det Tn (−1; O1, O3 , . . . , O2n−1 ) = 2n−1 , 2  Pn−1 det Tn (−1; O0, O1 , . . ., On−1 ) = n−1 . 2

(51)

Proof. We will prove formula (51) using induction on n. The other proofs follow similarly, so we omit them. Let  Dn = det Tn (−1; O1 , O2, . . . , On) .

Clearly, formula (51) holds, when n = 1 and n = 2. Suppose it is true for all positive integers k ≤ n − 1, where n ≥ 2. Using recurrences (5) and (50), we get Dn =

n X

Oi Dn−i

i=1

 n  X 1 = O1 Dn−1 + Oi−1 − Oi−2 Dn−i 4 =

1 Dn−2 + 2

i=2 n−1 X

Oi Dn−i−1 −

i=1

1 1 = Dn−1 + Dn−1 − Dn−2 2 4 1 3 = Dn−1 − Dn−2 . 2 4

n−2

1X Oi Dn−i−2 4 i=0

Using the induction hypothesis, we obtain 3 F2n−2 1 F2n−4 · n−1 − · n−2 2 2 4 2 1 F2n = n (3F2n−2 − F2n−4 ) = n . 2 2

Dn =

Consequently, the formula (51) is true for n. Therefore, by induction, the formula holds for all positive integers.

Hessenberg Matrices

197

The next theorem gives connection between Fibonacci numbers and Pell numbers using Toeplitz-Hessenberg determinants [27]. Theorem 7.2. For all n ≥ 1, the following formulas hold:  Fn =(−1)n−1 det Tn (1; P1, P2 , . . . , Pn ) ,  F2n+3 = det Tn (1; P3, P4 , . . . , Pn+2 ) .

(52) (53)

Proof. We will prove formula (52) using the principle of mathematical induction on n. The proof of (53) follow similarly, so we omit it for brevity. Let  Dn = det Tn (1; P1 , P2 , . . . , Pn) .

The formula (52) clearly holds, when n = 1 and n = 2. Suppose it is true for all k ≤ n − 1, where n ≥ 3. Using recurrence (5) and Pell recurrence, we have Dn =

n X (−1)k−1 Pk Dn−k k=1

= P1 Dn−1 +

n X

(−1)k−1 (2Pk−1 + Pk−2 ) Dn−k

k=2

n−1 n−2 X X k = Dn−1 + 2 (−1) Pk Dn−k−1 + (−1)k+1 Pk Dn−k−2 k=1

k=0

= Dn−1 − 2Dn−1 + Dn−2 = −Dn−1 + Dn−2 .

Using the induction hypothesis and the Fibonacci recurrence, we obtain Dn = −(−1)n−2 Fn−1 + (−1)n−3 Fn−2 = (−1)n−1 Fn . Consequently, formula (52) is true in the n case and thus, by induction, it holds for all positive integers. Now we focus on multinomial extension of Theorem 7.1, 7.2, using Trudi’s formula (6).

198

Taras Goy and Roman Zatorsky

Corollary 7.1. For n ≥ 1, the following formulas hold X F2n = 2n · mn (s)O1s1 O2s2 · · · Onsn ,

(54)

s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

X

F3n−1 = 22n−1 ·

sn , mn (s)O1s1 O3s2 · · · O2n−1

(55)

s

(56)

s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

X

Pn−1 = 2n−1 ·

n−1 mn−1 (s)O1s1 O2s2 · · · On−1 ,

s1 ,...,sn−1 ≥0 2s1 +3s2 +···+nsn−1 =n

where mn (s) =

(s1 + · · · + sn )! s1 ! · · · sn !

is the multinomial coefficient. From (54)–(56) after simple manipulations we obtain the following results. Corollary 7.2. For n ≥ 1, the following formulas hold: X F2n = mn (s) 1s1 2s2 · · · nsn , s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

F3n−1 = 2

n−1

·

X

 s1  s2   1 3 2n − 1 sn mn (s) ··· , 1 2 2n−1

X

 s1  s2   1 2 n − 1 sn−1 mn−1 (s) ··· . 2 4 2n−1

s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

Pn−1 = 2

n−1

·

s1 ,...,sn−1 ≥0 2s1 +···+nsn−1 =n

Trudi’s formula (6), coupled with Theorems 7.2 yield the following Pell identities with multinomial coefficients [27]. Corollary 7.3. For n ≥ 1, X Fn =

(−1)s1 +···+sn −1 mn (t)P1t1 P2t2 · · · Pntn ,

s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

F2n+3 =

X

s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

tn (−1)n+s1 +···+sn mn (t)P3t1 P4t2 · · · Pn+2 .

199

Hessenberg Matrices

Many combinatorial identities for different polynomials involving sums over integer partitions can be generated in this way. Some of these identities for Catalan, Fibonacci, Horadam, Lucas, Jacobsthal, and Narayana numbers (polynomials) presented in [29–36].

7.3.

Some Fibonacci-Lucas Identities Using Generalized Brioschi’s Formula

Consider the n × n Hessenberg matrix having the form  k1 a1 1 0  k a a 1 2 2 1   .. .. .. Hn (a1 , a2 , . . . , an) =  . . .   kn−1 an−1 an−2 an−3 kn an an−1 an−2 In [37], Zatorsky and Stefluk proved that

det(Hn ) =

X

s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

(−1)n−|sn | |sn |

n X

si ki

i=1

!

··· ··· .. .

0 0 .. .

··· ···

a1 a2

0 0 .. .



   .  1  a1

mn (s)as11 · · · asnn , (57)

(s +···+s )!

where |sn | = s1 + · · · + sn , mn (s) = 1s1 !···sn !n , and the summation is over all n-tuples (s1 , . . . , sn ) of integers si ≥ 0 satisfying the Diophantine equation s1 + 2s2 + · · · + nsn = n. In the case k1 = . . . = kn = 1 we have well-known Brioschi’s formula [13]. A proof comparable to the one given for Theorem 7.2 yields the following relations between Fibonacci and Lucas numbers. Recall that the Lucas sequence {Ln }n≥0 is defined by the initial values L0 = 2, L1 = 1 and the recurrence Ln = Ln−1 + Ln−2 ,

n ≥ 2.

Theorem 7.3. For n ≥ 1, the following formulas hold:  det Tn (1; F0 , F1 , . . . , Fn−1 = (−1)n−1 (Ln − 1)),  det Tn (1; −F0, −F1 , . . . , −Fn−1 ) = 2n + (−1)n − Ln ,  det Tn (1; F1, F2 , . . . , Fn ) = (−1)n−1 (Ln − 1 − (−1)n ) ,  det Tn (1; F2, F3 , . . . , Fn+1 ) = (−1)n−1 Ln ,  det Tn (1; F3, F4 , . . . , Fn+2 ) = (−1)n−1 Ln + 1.

200

Taras Goy and Roman Zatorsky

Next, we focus on multinomial extensions of Theorem 7.3. Formula (6), together with Theorem 7.3 above, yields the following combinatorial identities expressing the Lucas numbers in terms of Fibonacci numbers. Corollary 7.4. For n ≥ 1, the following formulas hold: Ln = 1 − n ·

X

s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

(−1)|sn | sn mn (s)F0s1 F1s2 · · · Fn−1 , |sn | X

Ln = 2n + (−1)n − n ·

s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

X

Ln = 1 + (−1)n − n ·

s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

Ln = − n ·

X

s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

Ln = (−1)n − n ·

X

1 sn mn (s)F0s1 F1s2 · · · Fn−1 , |sn |

(−1)|sn | mn (s)F1s1 F2s2 · · · Fnsn , |sn |

(−1)|sn | sn mn (s)F2s1 F3s2 · · · Fn+1 , |sn |

s1 ,...,sn ≥0 s1 +2s2 +···+nsn =n

(−1)|sn | sn mn (s)F3s1 F4s2 · · · Fn+2 , |sn | (s +···+s )!

where |sn | = s1 + · · · + sn , mn (s) = 1s1 !···sn n! , and the summation is over integers si ≥ 0 satisfying equation s1 + 2s2 + · · · + nsn = n.

R EFERENCES [1] Hessenberg, K. 1942. Thesis. Darmstadt, Germany: Hochschule.

Technische

[2] Golub, G. H., Van Loan, C. F. 2012. Matrix Computations. Baltimore: Johns Hopkins University Press. [3] Polya, G. 1913. “Aufgabe 424.” Archiv der Mathemntik und Physik, 20(3):271.

Hessenberg Matrices

201

[4] Zatorsky, R. 2015. “Introduction of the theory of triangular matrices (tables).” In Advances in Linear Algebra Research, edited by I. Kyrchei, 185– 237. New York: Nova Science Publishers. [5] Zatorsky, R. A. 2007. “Theory of paradeterminants and its applications.” Algebra and Discrete Mathematics, 1:108-137. [6] Chen, Y. H., Yu, C. Y. 2011. “A new algorithm for computing the inverse and the determinant of a Hessenberg matrix.” Applied Mathematics and Computation, 218:4433-4436. doi: 10.1016/j.amc.2011.10.022 [7] Kilic¸, E. and Arıcan, T. 2017. “Evaluation of Hessenberg determinants via generating function approach.” Filomat, 31(15):4945-4962. doi: 10.2298/FIL1715945K [8] Maroulas, J. 2016. “Factorization of Hessenberg matrices.”, Linear Algebra and its Applications, 506:226-243. doi: 10.1016/j.laa.2016.05.026 [9] Kilic¸, E. 2009. “On the second order linear recurrences by tridiagonal matrices.” Ars Combinatoria, 91:11-18. [10] Li, H.-C. 2012. “On Fibonacci-Hessenberg matrices and the Pell and Perrin numbers.” Applied Mathematics and Computation, 218(17):83538358. doi: 10.1016/j.amc.2012.01.062 [11] Cahill, N. D., D’Errico, J. R., Narayan, D. A. and Narayan, J. Y. 2002. “Fibonacci determinants.” College Mathematical Journal, 3(3):221-225. [12] Tamm, U. 2009. “The determinant of a Hessenberg matrix and some applications in discrete mathematics”, preprint. https:// www.math.unibielefeld.ed/ahlswede/pub/tamm/hessen.ps. [13] Muir, T. 1960. The Theory of Determinants in the Historical Order of Development. Vol. 3. New York: Dover Publications. [14] Zatorsky, R. A. 2002. “Determinants of triangular matrices and trajectories on Ferrer diagrams.” Mathematical Notes, 72(6): 834-852. doi: 10.1023/A:1021433728200 [15] Zatorsky, R. A. 2010. Calculus of Triangular Matrices and its Applications. Ivano-Frankivsk: Simyk.

202

Taras Goy and Roman Zatorsky

[16] Zatorsky, R. A. 2008. “Paradeterminants and polynomials of partitions.”, Ukrainian Mathematical Journal, 60(11):1702-1715. doi: 10.1007/s11253-009-0164-6 [17] Zatorskii, R. A. and Malyarchuk, A. R. 2009. “Triangular matrices and combinatorial inversion formulas.” Mathematical Notes, 85(1):12-21. doi: 10.1134/S0001434609010027 [18] Tarakanov, V. E. and Zatorsky, R. A. 2009. “A relationship between determinants and permanents.” Mathematical Notes, 85(2):267-273. doi: 10.1134/S0001434609010301 [19] Barvinok, A. 2017. Combinatorics and Complexity of Partition Functions. New York: Springer. [20] Mink, H. 1978. Permanents. Encyclopedia of Mathematics and its Applications. Vol. 6. Reading: Addison-Wesley. [21] Marcus, M. and Minc, H. 1961. “On the relation between the determinant and the permanent.” Illinois Journal of Mathematics, 5:376-381. [22] Gibson, P. M. 1971. “Conversion of the permanent into the determinant.” Proceedings of the American Mathematical Society, 27(3):471-476. [23] Goy, T. and Zatorsky, R. 2017. “Infinite linear recurrence relation and superposition of linear recurrence equations.” Journal of Integer Sequences, 20: Article 17.5.3. [24] Zatorsky, R. and Goy, T. 2016. “Parapermanents of triangular matrices and some general theorems on number sequences.”, Journal of Integer Sequences, 19: Article 16.2.2. [25] Horadam, A. F. 1974. “Oresme numbers.”, Fibonacci Quarterly, 12(3):267-271. [26] Goy, T. and Zatorsky, R. 2019. “On Oresme numbers and their connection with Fibonacci and Pell numbers.” Fibonacci Quarterly, 57(3):238-245.

Hessenberg Matrices

203

[27] Goy, T. 2019. “Pell numbers identities from Toeplitz-Hessenberg determinants.” Novi Sad Journal of Mathematics, 49(2):87-94. doi: 10.30755/NSJOM.08406 [28] Goy, T. 2018. “On combinatorial identities for Jacobsthal polynomials.” In “Voronoi’s Impact on Modern Science”. Proceedings of the Sixth International Conference on Analitic Number Theory and Spartial Tessellations, edited by J¨orn Steuding and Mykola Pratsiovytyi. Kyiv, Ukraine, September 24-28, 2018. Vol. 1, 41-47. Kyiv: Natl Pedagog. Dragomanov Univ. Publ. [29] Goy, T. 2018. “On determinants and permanents of some ToeplitzHessenberg matrices whose entries are Jacobsthal numbers.” Eurasian Mathematical Journal, 9(4):61-67. doi: 10.32523/2077-9879-2018-9-461-67 [30] Goy, T. 2018. “On new identities for Mersenne numbers.” Applied Mathematics E-Notes, 18: 100-105. [31] Goy, T. 2018. “On identities with multinomial coefficients for FibonacciNarayana sequence.” Annales Mathematicae et Informaticae, 49:75-84. doi: 10.33039/ami.2018.09.001 [32] Goy, T. 2018. “On some fibinomial identities.” Chebyshevski Sbornik, 19(2):56-66. (in Russian). doi: 10.22405/2226-8383-2018-19-2-56-66 [33] Goy, T. 2018. “Some families of identities for Padovan numbers.” Proceedings of the Jangjeon Mathematical Society, 21(3):413-419. doi: 10.17777/pjms2018.21.3.413 [34] Goy, T. and Shattuck, M. 2019. “Determinant formulas of some ToeplitzHessenberg matrices with Catalan entries.” Proceedings – Mathematical Sciences, 129: Article 46. doi: 10.1007/s12044-019-0513-9 [35] Goy, T. and Shattuck, M. 2019. “Determinants of Toeplitz-Hessenberg matrices with generalized Fibonacci entries.” Notes on Number Theory and Discrete Mathematics, 25 (4):83-95. doi: 10.7546/nntdm.2019.25.4.83-95

204

Taras Goy and Roman Zatorsky

[36] Goy, T. and Shattuck, M. 2019. “Fibonacci and Lucas identities using Toeplitz-Hessenberg matrices.“ Applications and Applied Mathematics, 14(2):699-715. [37] Zatorsky, R. and Stefluk, S. 2013. “On one class of partition polynomials.” Algebra and Discrete Mathematics, 16(1):127-133.

In: Hot Topics in Linear Algebra Editor: Ivan I. Kyrchei

ISBN: 978-1-53617-770-1 c 2020 Nova Science Publishers, Inc.

Chapter 6

E QUIVALENCE OF P OLYNOMIAL M ATRICES OVER A F IELD Volodymyr M. Prokip∗ Department of Algebra Pidstryhach Institute for Applied Problems of Mechanics and Mathematics NAS of Ukraine, Lviv, Ukraine

Abstract Polynomial n × n matrices A(λ) and B(λ) over a field F are called semi-scalar equivalent if there exist a nonsingular n × n matrix P over the field F and an invertible n × n matrix Q(λ) over the ring F[λ] such that A(λ) = P B(λ)Q(λ). Dias da Silva and Laffey studied polynomial matrices up to PS-equivalence: n × n matrices A(λ) and B(λ) over F are PS-equivalent if A(λ) = P (λ)B(λ)Q for some invertible n × n matrix P (λ) over F[λ] and a nonsingular n × n matrix Q over F. It is evident that matrices A(λ) and B(λ) are PS-equivalent if and only if the transpose matrices AT (λ) and B T (λ) are semi-scalar equivalent. It is clear that two semi-scalar or PS-equivalent polynomial matrices are always equivalent. The converse of the above statement is not always true. The semi-scalar equivalence and PS-equivalence of matrices over a field F contain the problem of similarity between two families of matrices. Therefore, these equivalences of matrices can be considered a difficult problem in linear algebra. The problem of semi-scalar equivalence of ∗

Corresponding Author’s Email: [email protected].

206

Volodymyr M. Prokip matrices includes the following two problems: (1) the determination of a complete system of invariants and (2) the construction of a canonical form for a matrix with respect to semi-scalar equivalence. But these problems have satisfactory solutions only in isolated cases. The aim of the present paper is to study polynomial matrices over a field with respect to semi-scalar equivalence. The paper consists of three chapters. Section 1 contains some terminologies and notations. The purpose of the second Section is to present the necessary and sufficient conditions of semi-scalar equivalence of nonsingular matrices A(λ) and B(λ) over a field F of characteristic zero in terms of solutions of a homogenous system of linear equations. We also establish similarity of monic polynomial matrices A(λ) and B(λ) over a field. We obtain canonical forms with respect to semi-scalar equivalence for an n × n matrix pencil A(λ) = A0 λ + A1 over an arbitrary field F with nonsingular matrix A0 in Section 3. We also describe the structure of nonsingular polynomial matrices over a field, which can be reduced to the established forms by the transformations of semiscalar equivalence.

MSC 2000: 15A21; 15A24 Keywords: semi-scalar equivalence, PS-equivalence, similarity of matrices

1.

INTRODUCTION

We give now some terminologies and notations here. Let F be a field. Denote by Mm,n (F) the set of m × n matrices over F and by Mm,n (F[λ]) the set of m × n matrices over the polynomial ring F[λ]. A polynomial a(λ) = a0 λk + a1 λk−1 + · · · + ak ∈ F(λ) is said to be normalized if the first non-zero term a0 is equal to 1. Definition 1.1. Let A(λ), B(λ) ∈ Mn,n (F[λ]). The matrices A(λ) and B(λ) are said to be equivalent if A(λ) = U (λ)B(λ)V (λ) for some U (λ), V (λ) ∈ GL(n, F[λ]). Let A(λ) ∈ Mn,n (F[λ]) be nonzero matrix and rank A(λ) = r. Then A(λ) is equivalent to a diagonal matrix, i.e., there exist matrices P (λ), Q(λ) ∈ GL(n, F[λ]) such that  P (λ)A(λ)Q(λ) = SA (λ) = diag a1 (λ), a2 (λ), . . . , ar (λ), 0, . . . , 0 ,

where aj (λ) ∈ F[λ] are normalized polynomials for all j = 1, 2, . . ., r and a1 (λ)|a2(λ)| . . . |ar (λ) (divides) are the invariant factors of A(λ). The diagonal

Equivalence of Polynomial Matrices over a Field

207

matrix SA (λ) is called the Smith normal form of A(λ). It is well known that matrices A(λ), B(λ) ∈ Mn,n (F[λ]) are equivalent if and only if A(λ) and B(λ) have the same rank and the same invariant factors (see [1], [2]). Definition 1.2. Matrices A(λ), B(λ) ∈ Mn,n (F[λ]) are said to be semi-scalar equivalent (see [3], Chapter 4) if there exist matrices P ∈ GL(n, F) and Q(λ) ∈ GL(n, F[λ]) such that A(λ) = P B(λ)Q(λ). Let A(λ) ∈ Mn,n (F[λ]) be nonsingular matrix over an infinite field F. Then A(λ) is semi-scalar equivalent to the lower triangular matrix (see [3])   s11 (λ) 0 ... ... 0  s21 (λ) s22 (λ)  0 ... 0  Sl (λ) =   ... ... ... ... ...  sn1 (λ) sn2 (λ) . . . sn,n−1 (λ) snn (λ)

with the following properties:

(a) sii (λ) = si (λ), i = 1, 2, . . . , n, where s1 (λ)|s2 (λ)| · · ·|sn (λ) (divides) are the invariant factors of A(λ); (b) sii (λ) divides sji (λ) for all i, j with 1 ≤ i < j ≤ n. Later, the same upper triangular form was obtained in [4]. Let F = Q be the field of rational numbers. Consider the following examples.   λ λ Example 1.1. For singular matrix A(λ) = ∈ M2,2 (Q[λ]) λ2 + 1 λ2 + 1 there do not exist invertible matrices P ∈ M2,2 (Q) and Q(λ) ∈ M2,2 (Q[λ]) such that   1 0 P A(λ)Q(λ) = Sl (λ) = . ∗ 0 Thus, for a singular matrix A(λ), the matrix Sl (λ) does not always exist. Example 1.2. For nonsingular matrix   1 0 A(λ) = 2 ∈ M2,2 (Q[λ]) λ − λ (λ − 1)4

208

Volodymyr M. Prokip

there exist invertible matrices    2  1 2 2λ − 6λ + 5 2(λ − 1)4 P = and Q(λ) = −2 −5 −2 −2λ2 + 2λ − 1 such that



 1 0 P A(λ)Q(λ) = B(λ) = 2 . λ − 3λ (λ − 1)4

Hence, matrices A(λ) and B(λ) are semi-scalar equivalent. It is evident that A(λ) and B(λ) have conditions (a) and (b) for semi-scalar equivalence. Thus, the matrix Sl (λ) is defined not uniquely with respect to the semi-scalar equivalence for nonsingular matrix A(λ). Dias da Silva and Laffey studied polynomial matrices up to PS-equivalence. Definition 1.3. Matrices A(λ) ∈ Mn,n (F[λ]) and B(λ) ∈ Mn,n (F[λ]) are PS-equivalent if A(λ) = P (λ)B(λ)Q for some P (λ) ∈ GL(n, F[λ]) and Q ∈ GL(n, F) (see [5]). Let F be an infinite field. A matrix A(λ) ∈ Mn,n (F[λ]) with det A(λ) 6= 0 is PS-equivalent to the upper triangular matrix (see [5], Proposition 2)   s11 (λ) s12 (λ) . . . s1n (λ)  0 s22 (λ) . . . s2n (λ)   Su (λ) =   ... ... ... ...  0 ... 0 snn (λ)

with the following properties:

(a) sii (λ) = si (λ), i = 1, 2, . . . , n, where s1 (λ)|s2 (λ)| · · ·|sn (λ) (divides) are the invariant factors of A(λ); (b) sii (λ) divides sij (λ) for all integers i, j with 1 ≤ i < j ≤ n; (c) if i 6= j and sij (λ) 6= 0, then sij (λ) is a normalized polynomial and deg sii (λ) < deg sij (λ) < deg sjj (λ).

Equivalence of Polynomial Matrices over a Field

209

The matrix Su (λ) is called a near canonical form of the matrix A(λ) with respect to PS-equivalence. We note that conditions (a) and (b) for semi-scalar equivalence were proved in [3]. It is evident that matrices A(λ), B(λ) ∈ Mn,n (F[λ]) are PS-equivalent if and only if the transpose matrices AT (λ) and B T (λ) are semi-scalar equivalent. It is easy to make sure that the matrix Su (λ) is not uniquely determined for the nonsingular matrix A(λ) with respect to PSequivalence (see Example 1.1). It is clear that semi-scalar equivalence and PS-equivalence represent an equivalence relation on Mn,n (F[λ]). The semi-scalar equivalence and PSequivalence of matrices over a field F contain the problem of similarity between two families of matrices (see, for example [2, 3, 5–11]). In most cases, these problems are involved with the classic unsolvable problem of a canonical form of a pair of matrices over a field with respect to simultaneous similarity. At present, such problems are called wild ( [7–9]). Thus, these equivalences of matrices can be considered a difficult problem in linear algebra. On the basis of the semi-scalar equivalence of polynomial matrices in [3] algebraic methods for factorization of matrix polynomials were developed. We note that these equivalences were used in the study of the controllability of linear systems [12]. The problem of semi-scalar equivalence of matrices includes the following two problems: (1) the determination of a complete system of invariants and (2) the construction of a canonical form for a matrix with respect to semi-scalar equivalence. But these problems have satisfactory solutions only in isolated cases. The canonical and normal forms with respect to semi-scalar equivalence for a matrix pencil A0 λ + A1 ∈ Mn,n (F[λ]), where A0 is nonsingular, were investigated in [13] and [14]. More detail about semi-scalar equivalence and many references to the original literature can be found in [13–17].

2. 2.1.

SEMI -S CALAR E QUIVALENCE OF N ONSIGULAR MATRICES Preparatory Notations and Results

To prove the main result, we need the following notations and propositions. Let F be a field of characteristic zero. In the polynomial ring F[λ] we consider the operation of differentiation D.

210

Volodymyr M. Prokip Let a(λ) = a0 λl + a1 λl−1 + . . . + al−1 x + al ∈ F[λ]. Put D (a(λ)) = la0 λl−1 + (l − 1)a1 λl−2 + . . . + al−1 = a(1)(λ)

and Dk (a(λ)) = D(a(k−1) (λ)) = a(k) (λ) for every natural k ≥ 2. The differentiation of a matrix A(λ) = Mm,n (F[λ]) is understood as its elementwise differentiation, i.e.,



aij (λ)





(1)

A(1)(λ) = D(A(λ)) = [D(aij (λ))] = [aij (λ)] and A(k) (λ) = D(A(k−1)(λ)) is the k-th derivative of A(λ) for every natural k ≥ 2. Let b(λ) = (λ − β1 )k1 (λ − β2 )k2 · · · (λ − βr )kr ∈ F[λ], deg b(λ) = k = k1 + k2 + · · · + kr , and A(λ) ∈ Mm,n (F[λ]). For the monic polynomial b(λ) and the matrix A(λ) we define the matrix   N1    N2   M [A, b] =   ..  ∈ Mmk,n (F),  .  Nr 

  where Nj =   

A(βj ) A(1)(βj ) .. . A(kj −1) (βj )



   ∈ Mmk ,n (F), j = 1, 2, . . ., r. j  

Proposition 2.1. Let b(λ) = (λ − β1 )k1 (λ − β2)k2 · · · (λ − βr)kr ∈ F[λ], where βi ∈ F for all i = 1, 2, . . ., r, and A(λ) ∈ Mm,n (F[λ]) be a nonzero matrix. Then A(λ) admits the representation A(λ) = b(λ)C(λ),

(2.1)

if and only if M [A, b] = 0. Proof. Suppose that (2.1) holds. It is evident that b(βj ) = b(1)(βj ) = . . . = b(kj −1) (βj ) = 0 for all j = 1, 2, . . . , r and A(βj ) = 0. Differentiating equality

Equivalence of Polynomial Matrices over a Field (2.1) (kj − 1) times and substituting each time obtained equalities, we finally obtain    A(βj )    A(1)(βj )       A(2)(β )  =  j       . ..    A(kj −1) (βj )

211

λ = βj into both sides of the

0 0 0 .. . 0



   .  

Thus, Nj = 0. Since 1 ≤ j ≤ r, we have M [A, b] = 0. Conversely, let M [A, b] = 0. Dividing the matrix A(λ) by In b(λ) with residue (see, for instance, Theorem 7.2.1 in the classical book by Lancaster and Tismenetski [18]), we have A(λ) = b(λ)C(λ) + R(λ), where C(λ), R(λ) ∈ Mm,n (F[λ]) and deg R(λ) < deg b(λ). Thus, M [A, b] = M [R, b] = 0. Since M [R, b] = 0, then R(λ) = (λ − βi )ki Ri (λ) for all i = 1, 2, . . ., r, i. e. R(λ) = b(λ)R0(λ). On the other hand, deg R(λ) < deg b(λ). Thus, R(λ) ≡ 0. This completes the proof. Corollary 1. Let A(λ) ∈ Mn,n (F[λ]) be a matrix of rank A(λ) ≥ n − 1 with the Smith normal form S(λ) = diag (s1 (λ), . . . , sn−1 (λ), sn(λ)). If sn−1 (λ) = (λ − α1 )k1 (λ − α2 )k2 · · · (λ − αr )kr , where αi ∈ F for all i = 1, 2, . . ., r; then M [A∗ , sn−1 ] = 0. Proof. Write the matrix A(λ) as A(λ) = U (λ)S(λ)V (λ), where U (λ), V (λ) ∈ GL(n, F[λ]). Then A∗ (λ) = V ∗ (λ)S ∗(λ)U ∗ (λ). Put d(λ) = s1 (λ)s2 (λ) · · ·sn−1 (λ). Since rank A(λ) ≥ n − 1, we have A∗ (λ) 6= 0. It is clear that   sn (λ) sn (λ) S ∗ (λ) = diag ,··· , , 1 d(λ). s1 (λ) sn−1 (λ) Hence, A∗ (λ) admits the representation A∗ (λ) = sn−1 (λ)B(λ), where B(λ) ∈ Mn,n (F[λ]). By virtue of Proposition 2.1, M [A∗ , sn−1 ] = 0. This completes the proof.

212

Volodymyr M. Prokip The Kronecker product of matrices A = [aij ] (n × m) and B is denoted by   a11 B . . . a1m B   .. A ⊗ B =  ... . . an1 B . . .

anm B

Let nonsingular matrices A(λ), B(λ) ∈ Mn,n (F[λ]) be equivalent and S(λ) = diag (s1 (λ), . . . , sn−1 (λ), sn(λ))

be their Smith normal form. For A(λ) and B(λ) we define the matrix  −1  B ∗ (λ) ⊗ AT (λ) ∈ Mn2 ,n2 (F[λ]). D(λ) = s1 (λ)s2 (λ) · · ·sn−1 (λ)

It may be noted if S(λ) = diag (1, . . . , 1, s(λ)) is the Smith normal form of the matrices A(λ) and B(λ), then D(λ) = B ∗ (λ) ⊗ AT (λ).

2.2.

Main Results

It is clear that two semi-scalar or PS-equivalent matrices are always equivalent. The converse of the above statement is not always true. The main result of this chapter is the following theorem. Theorem 2.1. Let nonsingular matrices A(λ), B(λ) ∈ Mn,n (F[λ]) be equivalent and S(λ) = diag (s1 (λ), . . . , sn−1 (λ), sn (λ)) be their Smith normal form. Further, let sn (λ) = (λ − α1 )k1 (λ − α2 )k2 · · · (λ − αr )kr , where αi ∈ F for all i = 1, 2, . . . , r. Then A(λ) and B(λ) are semi-scalar equivalent if and only if rank M [D, sn] < n2 and the homogeneous system of equations M [D, sn ]x = ¯0 has a solution x = [v1 , v2 , . . . , vn2 ]T over F such that the matrix   v1 v2 ... vn  vn+1 vn+2 . . . v2n   V =  ... ... ... ...  vn2 −n+1 vn2 −n+2 . . . vn2 is nonsingular. If det V 6= 0, then V A(λ) = B(λ)Q(λ), where Q(λ) ∈ GL(n, F[λ]).

213

Equivalence of Polynomial Matrices over a Field

Proof. Let nonsingular matrices A(λ) and B(λ) in Mn,n (F[λ]) be semi-scalar equivalent, i.e., A(λ) = P B(λ)Q(λ), where P ∈ GL(n, F) and Q(λ) ∈ GL(n, F[λ]). From the last equality we have B ∗ (λ)P −1 A(λ) = Q(λ) det B(λ).

(2.2)

Write B ∗ (λ) in the form B ∗ (λ) = d(λ)C(λ) (see the proof of Corollary 1) and det B(λ) = b0 d(λ)sn (λ), where d(λ) = s1 (λ)s2 (λ) · · ·sn−1 (λ), C(λ) ∈ Mn,n (F[λ]) and b0 is a nonzero element in F. Now rewrite equality (2.2) as d(λ)C(λ)P −1 A(λ) = Q(λ)d(λ)sn(λ)b0. This implies that C(λ)P −1 A(λ) = Q(λ)sn (λ)b0. Put

and



 P −1 =  

v1

v2

vn+1 ...

vn+2 ...

vn2 −n+1 vn2 −n+2



(2.3)

 . . . vn . . . v2n   ... ...  . . . vn2

w1 (λ) w2 (λ)  wn+1 (λ) wn+2 (λ) Q(λ)b0 = W (λ) =   ... ... wn2 −n+1 (λ) wn2 −n+2 (λ)

 . . . wn (λ) . . . w2n(λ)  , ... ...  . . . wn2 (λ)

where vj ∈ F and wj (λ) ∈ F[λ] for all j = 1, 2, . . ., n2 . Then we can write equality (2.3) in the form (see [18], Chapter 12)   T C(λ) ⊗ AT (λ) · v1 , v2 , . . . , vn2 =  T sn (λ) w1 (λ), w2 (λ), . . . , wn2 (λ) . (2.4)

Note that C(λ) ⊗ AT (λ) = D(λ). In view of equality (2.4) and Proposition 2.1, we have  T M [D, sn ] v1 , v2 , . . . , vn2 = ¯0.

This implies that rank M [D, sn ] < n2 .

214

Volodymyr M. Prokip

Conversely, let rank M [D, sn ] < n2 and for matrix M [D, sn ] there exists a vector  T x0 = v1 , v2 , . . . , vn2 , where vj ∈ F for all j = 1, 2, . . ., n2 , such that M [D, sn]x0 = ¯0 and the matrix 

 V = 

v1

v2

vn+1 ...

vn+2 ...

vn2 −n+1 vn2 −n+2

··· ... ... ...

 vn v2n   ...  vn2

is nonsingular. Dividing the product C(λ)V A(λ) by In sn (λ) with residue, we have C(λ)V A(λ) = sn (λ)Q(λ) + R(λ), where Q(λ), R(λ) = [rij (λ)] ∈ Mn,n (F[λ]) and deg R(λ) < deg sn (λ). From the last equality we obtain M [D, sn]x0 = M [Col R, sn ] = ¯0, where Col R(λ) =



r11 (λ) . . .

r1n (λ) . . . rn,n−1 (λ) . . .

rnn (λ)

In accordance with Proposition 2.1 Col R(λ) ≡ ¯0. Thus, R(λ) ≡ 0 and C(λ)V A(λ) = sn (λ)Q(λ).

T

.

(2.5)

Note that det B(λ) = b0 d(λ)sn(λ), where b0 is a nonzero element in F. Multiplying both sides of equality (2.5) by b0 d(λ), we have b0 d(λ)C(λ)V A(λ) = B ∗ (λ)V A(λ) = b0 d(λ)sn(λ)Q(λ) = Q(λ) det B(λ).

(2.6)

From equality (2.6) it follows V A(λ) = B(λ)Q(λ). Passing to the determinants on both sides of this equality, we obtain det Q(λ) = const 6= 0. Since Q(λ) ∈ GL(n, F[λ]), we conclude that matrices A(λ) and B(λ) are semi-scalar equivalent. This completes the proof.

Equivalence of Polynomial Matrices over a Field

215

It may be noted that nonsingular matrices A(λ), B(λ) ∈ Mn,n (F[λ]) are semi-scalar equivalent if and only if A(λ)T and B(λ)T are PS-equivalent. Thus, Theorem 2.1 gives the answer to the question: When are nonsingular matrices A(λ) and B(λ) PS-equivalent? In the future F = C is the field of complex numbers. Corollary 2. Let nonsingular matrices A(λ), B(λ) ∈ Mn,n (C[λ]) be equivalent and S(λ) = diag (s1 (λ), . . . , sn−1 (λ), sn(λ)) be their Smith normal form. Then A(λ) and B(λ) are semi-scalar equivalent if and only if rank M [D, sn] < n2 and the homogeneous system of equations M [D, sn ]x = ¯0 has a solution x = [v1 , v2 , . . . , vn2 ]T over C such that the matrix   v1 v2 ... vn  vn+1 vn+2 . . . v2n   V =  ... ... ... ...  vn2 −n+1 vn2 −n+2 . . . vn2

is nonsingular.

Definition 2.1. Two families of n × n matrices over a field F A = {A1 , A2 , . . . , Ar }

and B = {B1 , B2 , . . ., Br }

are said to be strictly equivalent if there exist matrices U, V ∈ GL(n, F) such that Ai = U Bi V for all i = 1, 2, . . ., r. The families A and B are said to be similar if there exists a matrix T ∈ GL(n, F) such that Ai = T Bi T −1

for all

i = 1, 2, . . . , r.

The families A and B we associate with monic matrix polynomials A(λ) = In λr + A1 λr−1 + A2 λr−2 + · · · + Ar

216

Volodymyr M. Prokip

and B(λ) = In λr + B1 λr−1 + B2 λr−2 + · · · + Br over C of degree r respectively. The families A and B are similar over C if and only if the matrices A(λ) and B(λ) are semi-scalar equivalent (PS-equivalent) (see and [3] and [5]). From Theorem 2.1 and Corollary 2 we obtain the following corollary. Corollary 3. Let n × n monic matrix polynomials of degree r r

A(λ) = In λ +

r X

Ai λ

r−i

r

and B(λ) = In λ +

i=1

r X

Bi λr−i

i=1

over the field of complex numbers C be equivalent, and let S(λ) = diag (s1 (λ), . . . , sn−1 (λ), sn(λ)) be their Smith normal form. The families A = {A1 , A2 , . . . , Ar } and B = {B1 , B2 , . . ., Br } are similar over C if and only if rank M [D, sn] < n2 and the homogeneous system of equations M [D, sn ]x = ¯0 has a solution x = [v1 , v2, . . . , vn2 ]T over C such that the matrix   v1 v2 ... vn  vn+1 vn+2 . . . v2n   V =  ... ... ... ...  vn2 −n+1 vn2 −n+2 . . . vn2

is nonsingular. If det V 6= 0, then Ai = V −1 Bi V for all i = 1, 2, . . . , r.

Let families A = {A1 , A2 , . . . , Ar } and B = {B1 , B2 , . . . , Br } be similar or strictly equivalent over a field F. Let us take note of the fact that rank Ai = rank Bi for all i = 1, 2, . . ., r. The following statement describes a relationship between strict equivalence and similarity of families of matrices. Proposition 2.2. We assume that for some 1 ≤ k ≤ r in families of matrices A = {A1 , A2 , . . . , Ar }

and B = {B1 , B2 , . . . , Br }

over F the following equality rank Ak = rank Bk = n is valid. The following are equivalent:

Equivalence of Polynomial Matrices over a Field

217

1) families A and B are strictly equivalent; n o −1 −1 −1 2) families Ar = A1 A−1 , . . . , A A , A A , . . . , A A k−1 k+1 r k k k k n o and Br = B1 Bk−1 , . . . , Bk−1 Bk−1 , Bk+1 Bk−1 , . . . , Br Bk−1 are similar; n o −1 −1 −1 3) families Ar = A1 A−1 , . . . , A A , A A , . . . , A A k−1 k+1 r k k k k n o and Bl = Bk−1 B1 , . . ., Bk−1 Bk−1 , Bk−1 Bk+1 , . . ., Bk−1 Br are similar; o n −1 −1 −1 A A , . . . , A A , A A , . . . , A 4) families Al = A−1 k+1 r k−1 1 k k k k n o and Bl = Bk−1 B1 , . . ., Bk−1 Bk−1 , Bk−1 Bk+1 , . . ., Bk−1 Br are similar. Proof. We can assume without any restriction of generality that rank A1 = rank B1 = n. 1) → 2). Let families of matrices A = and B be strictly equivalent, i.e., U Ai = Bi V for all i = 1, 2, . . ., r, where U, V ∈ GL(n, C). Thus, polyP P nomial matrices A(λ) = ri=1 Ai λr−i and B(λ) = ri=1 Bi λr−i are strictly equivalent, i.e., U A(λ) = B(λ)V. From this it follows that r   X r −1 r−i U A(λ)A−1 = U I λ + A A λ = n i 1 1 i=2

r   X r −1 r−i B(λ)V A−1 = I λ + B B λ B1 V A−1 n i 1 1 1 . i=2

This implies that the families Ar and Br are similar. 2) → 3). Let rank A1 = rank B1 = n and the families Ar and Br be similar, i.e., T Aj A−1 = Bj B1−1 T , where T ∈ GL(n, C) and 2 ≤ j ≤ r. 1 From this it follows that W Aj A−1 = B1−1 Bj W for all j = 2, 3, . . ., r and 1 W = B1−1 T ∈ GL(n, F). Thus, families Ar and Bl are similar. −1 3) → 4). Let families Ar and Bl be similar, i.e., W Aj A−1 1 = B1 Bj W for all j = 2, 3, . . ., r, where W ∈ GL(n, F). From these equalities we have −1 −1 (W A1 )A−1 1 Aj = B1 Bj (W A1 ) for all j = 2, 3, . . ., r. It is obvious that families Al and Bl are similar.

218

Volodymyr M. Prokip 4) → 1). We assume that families Al and Bl are similar, i.e., SA−1 1 Aj = for all j = 2, 3, . . ., r, where S ∈ GL(n, F). It is easily seen that

B1−1 Bj S

r r     X X r−i r −1 r−i S In λr + A−1 A λ = I λ + B B λ S. i n i 1 1 i=2

i=2

From the last equality we obtain r r     X X r r−i r r−i B1 SA−1 A λ + A λ = B λ + B λ S. 1 i 1 i 1 i=2

i=2

Thus, families A and B are strictly equivalent.

2.3.

The Illustrative Example

Let F = Q be the field of rational numbers. To illustrate Theorem 2.1 consider the following example. Example 2.1. Matrices   1 0 A(λ) = λ2 + aλ λ4

and

B(λ) =



1 0 λ2 + bλ λ4



with entries from Q[λ] are equivalent for all a, b ∈ Q and S(λ) = diag (1, λ4) is their Smith normal form. In what follows a 6= b. Construct the matrix D(λ) = B ∗ (λ) ⊗ AT (λ) =  λ4 λ6 + aλ5 0 0  0 λ8 0 0    − (λ2 + bλ) −(λ4 + (a + b)λ3 + abλ2 ) 1 λ2 + aλ  0

−(λ6 + bλ5 )

0

λ4

and solve the system of equations M [D, s2]x = ¯0. From this it follows      0 0 1 0 v1 0      −b 0 0 a   v2   0  .  =  −2 −2ab 0 2   v3   0  v4 0 0 −6(a + b) 0 0

     

Equivalence of Polynomial Matrices over a Field

219

From this we have, if a + b 6= 0, then A(λ) and B(λ) are not semi-scalar equivalent. If a + b = 0, then b = −a and system of equations M [D, s2 ]x = ¯0 T  is a solution of M [D, s2 ]x = ¯0 is solvable. The vector 1, 0, a22 , −1 " # 1 a22 for arbitrary a 6= 0. Thus, the matrix V = is nonsingular. 0 −1 So, if a 6= 0 and b = −a, then matrices A(λ) and B(λ) are semi-scalar equivalent, i.e., A(λ) = P B(λ)Q(λ), where " # 2 1 2 a P = V −1 = 0 −1 and Q(λ) =

"

2λ2 a2

2λ a − a22

+

+1

2λ4 a2 2λ2 − a2 + 2λ a

−1

#

∈ GL(2, Q[λ]).

3. N ORMAL F ORMS OF A MATRIX P ENCIL WITH R ESPECT TO SEMI -S CALAR E QUIVALENCE In this section, F denotes an arbitrary field. Let a(λ) = λk +a1 λk−1 +· · ·+ak ∈ F[λ] be normalized polynomial. Let us denote by   0 1 0 ... ... 0  0 0 1 0 ··· 0    La =  ... ... ... ... ...  ...  ∈ Mk,k (F)  0 ... ... ... 0 1  −ak −ak−1 . . . . . . −a2 −a1 the companion matrix (Frobenius cell) of a(λ) and by   1 0 ... ... 0 0 1 0 ... 0     ..  ∈ M (F[λ]). .. Ha (λ) =  ... . k,k .    0 . . . 0 1 0  λ λ2 . . . λk−1 a(λ)

220

Volodymyr M. Prokip

It is obvious that Ik λ − La and Ha(λ) are equivalent matrices with the Smith  normal form Sa (λ) = diag 1, . . . , 1 , a(λ) .

3.1.

Auxiliary Statements

Below, we describe relationship between the notions of equivalence and semiscalar equivalence of linear matrix pencils. Proposition 3.1. Matrix pencils A(λ) = A0 λ − A1 ∈ Mn,n (F[λ]) and B(λ) = B0 λ − B1 ∈ Mn,n (F[λ]) with nonsingular matrices A0 and B0 are semi-scalar equivalent if and only if they are equivalent. Proof. Let matrix pencils A(λ) and B(λ) with nonsingular matrices A0 and B0 be semi-scalar equivalent, i. e., A(λ) = P B(λ)Q(λ), where P ∈ GL(n, F) and Qλ ∈ GL(n, F[λ]). It is obvious that pencils A(λ) and B(λ) are equivalent. Conversely, we assume that matrix pencils A(λ) = A0 λ − A1 ∈ Mn,n F[λ]) and B(λ) = B0 λ − B1 ∈ Mn,n F[λ]) with nonsingular matrices A0 and B0 are equivalent, i.e., B(λ) = U (λ)A(λ)V (λ), where U (λ), V (λ) ∈ GL(n, F[λ]). From this it follows W (λ)B(λ) = A(λ)V (λ), (3.1) where U −1 (λ) = W (λ). We write matrices W (λ) and U (λ) in the following forms: W (λ) = A(λ)S(λ) + R, (3.2) U (λ) = B(λ)M (λ) + T

(3.3)

where R, T ∈ Mn,n F). By (3.2) equality (3.1) we rewrite as (A(λ)S(λ) + R) B(λ) = A(λ)V (λ). From this it follows RB(λ) = A(λ)Q(λ),

(3.4)

where Q(λ) = V (λ) − S(λ)B(λ). Taking into account (3.2), (3.3) and (3.4) we rewrite the equality In = W (λ)U (λ) as In = W (λ)U (λ) = (A(λ)S(λ) + R)U (λ) = A(λ)S(λ)U (λ) + RU (λ) = A(λ)S(λ)U (λ) + RB(λ)M (λ) + RT = A(λ)N (λ) + RT = A0 N (λ)λ − A1 N (λ) + RT,

(3.5)

Equivalence of Polynomial Matrices over a Field

221

where N (λ) = S(λ)U (λ) + Q(λ)M (λ). Since det A0 6= 0, from equality (3.5) we obtain N (λ) = 0n,n and RT = In . Thus, det R 6= 0. It follows from equality (3.4) that matrix pencils A(λ) and B(λ) with nonsingular matrices A0 and B0 are semi-scalar equivalent. The Proposition is proved. Proposition 3.2. Matrix pencils A(λ) = A0 λ − A1 ∈ Mn,n (F[λ]) and B(λ) = B0 λ − B1 ∈ Mn,n (F[λ]) with nonsingular matrices A0 and B0 are semi-scalar equivalent if and only if they are strictly equivalent. Proof. It is clear that if linear pencils A(λ) and B(λ) with nonsingular matrices A0 and B0 are strictly equivalent, then they are semi-scalar equivalent. Conversely, if matrix pencils A(λ) = A0 λ − A1 ∈ Mn,n (F[λ]) and B(λ) = B0 λ − B1 ∈ Mn,n (F[λ]) with nonsingular matrices A0 and B0 are semi-scalar equivalent, then B(λ) = P A(λ)Q(λ), where P ∈ GL(n, F), Q(λ) ∈ GL(n, F[λ]) and deg Q(λ) ≥ 1. From the last equality we have  deg B(λ) = deg P A(λ) + deg Q(λ). (3.6) Since A0 , B0 and P are nonsingular matrices, using equality (3.6) we obtain deg Q(λ) = 0. Thus, Q(λ) ∈ GL(n, F) and the matrix pencils A(λ) and B(λ) are strictly equivalent. The Proposition is proved. The different version of Proposition 3.1 and Proposition 3.2 we present here. Proposition 3.3. Let A(λ) = A0 λ − A1 ∈ Mn,n (F[λ]) and B(λ) = B0 λ − B1 ∈ Mn,n (F[λ]) be matrix pencils with nonsingular matrices A0 and B0 . The following statements are equivalent: 1) the matrix pencils A(λ) and B(λ) are equivalent; 2) the matrix pencils A(λ) and B(λ) are semi-scalar equivalent; 3) the matrix pencils A(λ) and B(λ) are strictly equivalent. This Proposition can be proved in much the same way as Propositions 3.1 and 3.2. Lemma 3.1. Let a(λ) = λn + a1 λn−1 + · · · + an ∈ F(λ) be normalized polynomial of degree n ≥ 2. Then


1) the matrices L_a(λ) = I_nλ − L_a and H_a(λ) are semi-scalar equivalent;

2) the matrix H_a(λ) can be reduced by elementary transformations of columns to a unital matrix pencil C(λ) = I_nλ − C ∈ M_{n,n}(F[λ]), i.e., H_a(λ)W(λ) = I_nλ − C, where W(λ) ∈ GL(n, F[λ]).

Proof. Consider the matrices

$$Q_1(\lambda) = \begin{pmatrix} 0 & \dots & \dots & 0 & 1 \\ 0 & \dots & 0 & 1 & \lambda \\ \vdots & & ⋰ & & \vdots \\ 0 & 1 & \dots & \lambda^{n-3} & \lambda^{n-2} \\ 1 & \lambda & \dots & \lambda^{n-2} & \lambda^{n-1} \end{pmatrix} \in \mathrm{GL}(n, F[\lambda])$$

and

$$P_1 = \begin{pmatrix} 0 & \dots & \dots & 0 & -1 & 0 \\ 0 & \dots & 0 & -1 & 0 & 0 \\ \vdots & & ⋰ & & & \vdots \\ -1 & 0 & \dots & \dots & 0 & 0 \\ a_{n-1} & a_{n-2} & \dots & \dots & a_1 & 1 \end{pmatrix} \in \mathrm{GL}(n, F).$$

Then

$$P_1 L_a(\lambda) Q_1(\lambda) = H_1(\lambda) = \begin{pmatrix} 1 & 0 & \dots & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \dots & 0 & 1 & 0 \\ \lambda & h_{12}(\lambda) & \dots & h_{1,n-1}(\lambda) & a(\lambda) \end{pmatrix},$$

where h_{1k}(λ) = λ^k + a_1λ^{k−1} + · · · + a_{k−1}λ ∈ F[λ] for all k = 2, 3, …, n − 1. Put

$$P_2 = \begin{pmatrix} 1 & -a_1 & -a_2 & \dots & -a_{n-2} & 0 \\ 0 & 1 & 0 & \dots & \dots & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & \dots & \dots & 0 & 1 & 0 \\ 0 & \dots & \dots & \dots & 0 & 1 \end{pmatrix} \in \mathrm{GL}(n, F).$$

It is easy to verify that

$$P_2^{-1} H_1(\lambda) P_2 = H_2(\lambda) = \begin{pmatrix} 1 & 0 & 0 & \dots & \dots & 0 \\ 0 & 1 & 0 & \dots & \dots & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & \dots & \dots & 0 & 1 & 0 \\ \lambda & \lambda^2 & h_{23}(\lambda) & \dots & h_{2,n-1}(\lambda) & a(\lambda) \end{pmatrix},$$

where h_{2k}(λ) = h_{1k}(λ) − a_kλ ∈ F[λ] for all k = 3, 4, …, n − 1. We rewrite the matrix H_2(λ) in the form

$$H_2(\lambda) = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & & & \\ \vdots & & H_3(\lambda) & \\ 0 & & & \\ \lambda & & & \end{pmatrix},$$

where

$$H_3(\lambda) = \begin{pmatrix} 1 & 0 & \dots & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \dots & 0 & 1 & 0 \\ \lambda^2 & h_{23}(\lambda) & \dots & h_{2,n-1}(\lambda) & a(\lambda) \end{pmatrix} \in M_{n-1,n-1}(F[\lambda]).$$

Performing operations on the matrix H_3(λ) similar to those performed on the matrix H_1(λ), we have

$$P_3^{-1} H_3(\lambda) P_3 = H_4(\lambda) = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & & & \\ \vdots & & H_5(\lambda) & \\ 0 & & & \\ \lambda^2 & & & \end{pmatrix},$$

where

$$P_3 = \begin{pmatrix} 1 & -a_1 & -a_2 & \dots & -a_{n-3} & 0 \\ 0 & 1 & 0 & \dots & \dots & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & \dots & \dots & 0 & 1 & 0 \\ 0 & \dots & \dots & \dots & 0 & 1 \end{pmatrix} \in \mathrm{GL}(n-1, F),$$

$$H_5(\lambda) = \begin{pmatrix} 1 & 0 & \dots & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \dots & 0 & 1 & 0 \\ \lambda^3 & h_{34}(\lambda) & \dots & h_{3,n-1}(\lambda) & a(\lambda) \end{pmatrix} \in M_{n-2,n-2}(F[\lambda])$$

and h_{3k}(λ) = h_{2k}(λ) − a_kλ^2 for all k = 4, 5, …, n − 1.

Continuing this procedure, after a finite number of steps we obtain that for L_a(λ) = I_nλ − L_a there exist matrices P ∈ GL(n, F) and Q(λ) ∈ GL(n, F[λ]) such that

P L_a(λ) Q(λ) = H_a(λ).   (3.7)

It is obvious that H_a(λ) is uniquely determined for L_a(λ) = I_nλ − L_a with respect to semi-scalar equivalence, i.e., H_a(λ) is the canonical form for L_a(λ). From equality (3.7) we find that P L_a(λ) = H_a(λ)Q^{−1}(λ). From this it follows that

P L_a(λ) P^{−1} = H_a(λ)W(λ) = I_nλ − C,

where C = P L_a P^{−1} ∈ M_{n,n}(F) and W(λ) = Q^{−1}(λ)P^{−1} ∈ GL(n, F[λ]). This completes the proof of Lemma 3.1.

3.2. The First Normal Form of a Matrix Pencil with Respect to Semi-Scalar Equivalence

In this subsection we give a normal form of a matrix pencil A(λ) = A_0λ − A_1 ∈ M_{n,n}(F[λ]) with a nonsingular matrix A_0 with respect to semi-scalar equivalence.

Theorem 3.1. Let normalized polynomials a_1(λ), a_2(λ), …, a_k(λ) ∈ F[λ], deg a_i(λ) = m_i ≥ 1, be the invariant polynomials of a matrix pencil A_0λ − A_1 ∈ M_{n,n}(F[λ]). If A_0 is a nonsingular matrix, then A(λ) = A_0λ − A_1 is semi-scalar equivalent to the block-diagonal (n × n) matrix

$$H_A(\lambda) = \bigoplus_{i=1}^{k} \begin{pmatrix} 1 & 0 & \dots & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \dots & 0 & 1 & 0 \\ \lambda & \lambda^2 & \dots & \lambda^{m_i-1} & a_i(\lambda) \end{pmatrix} = \bigoplus_{i=1}^{k} H_{a_i}(\lambda).$$

The matrix H_A(λ) for the matrix pencil A_0λ − A_1 with nonsingular A_0 is unique up to permutations of the diagonal blocks H_{a_i}(λ).

Proof. Since A_0 is nonsingular, we write A_0λ − A_1 in the form A_0λ − A_1 = (I_nλ − A)A_0, where A = A_1A_0^{−1} ∈ M_{n,n}(F). It is clear that a_1(λ), a_2(λ), …, a_k(λ) are the invariant polynomials of the matrix pencil I_nλ − A. For A there exists a matrix T ∈ M_{n,n}(F) such that (see [18], Chapter 7, p. 264)

T A T^{−1} = L_A = diag(L_{a_1}, L_{a_2}, …, L_{a_k}),

where L_{a_i} denotes the companion matrix associated with the invariant polynomial a_i(λ) for all i = 1, 2, …, k. It is well known that L_A is uniquely determined up to permutation of the blocks L_{a_i}. So, we have that A_0λ − A_1 is strictly equivalent to the block-diagonal matrix

$$U(A_0\lambda - A_1)V = I_n\lambda - L_A = \begin{pmatrix} I_{m_1}\lambda - L_{a_1} & 0 & \dots & 0 \\ 0 & I_{m_2}\lambda - L_{a_2} & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \dots & 0 & I_{m_k}\lambda - L_{a_k} \end{pmatrix},  (3.8)$$

where U, V ∈ GL(n, F). By Proposition 3.2 the matrix pencils A_0λ − A_1 and I_nλ − L_A are semi-scalar equivalent. According to Lemma 3.1 the matrices I_{m_i}λ − L_{a_i} and H_{a_i}(λ) are semi-scalar equivalent, i.e., P_i L_{a_i}(λ)Q_i(λ) = H_{a_i}(λ), where P_i ∈ GL(m_i, F), Q_i(λ) ∈ GL(m_i, F[λ]) and i = 1, 2, …, k.

For the matrices

P = diag(P_1, P_2, …, P_k)U ∈ GL(n, F)

and

Q(λ) = V diag(Q_1(λ), Q_2(λ), …, Q_k(λ)) ∈ GL(n, F[λ])

we have

$$P(A_0\lambda - A_1)Q(\lambda) = H_A(\lambda) = \bigoplus_{i=1}^{k} \begin{pmatrix} 1 & 0 & \dots & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \dots & 0 & 1 & 0 \\ \lambda & \lambda^2 & \dots & \lambda^{m_i-1} & a_i(\lambda) \end{pmatrix}.$$

Since the matrix H_{a_i}(λ) is uniquely defined for the matrix I_{m_i}λ − L_{a_i}, the matrix H_A(λ) is uniquely determined up to permutation of the blocks H_{a_i}(λ). The Theorem is proved.

We now describe the structure of the nonsingular matrices from the ring M_{n,n}(F[λ]) which can be reduced to the established normal form with the help of such transformations.

Theorem 3.2. Let normalized polynomials a_1(λ), a_2(λ), …, a_r(λ) ∈ F[λ], deg a_i(λ) = m_i ≥ 1, be the invariant polynomials of a nonsingular matrix A(λ) ∈ M_{n,n}(F[λ]). If deg(det A(λ)) = k < n, then A(λ) is semi-scalar equivalent to the block-diagonal (n × n) matrix

$$D(\lambda) = I_{n-k} \oplus \bigoplus_{i=1}^{r} \begin{pmatrix} 1 & 0 & \dots & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \dots & 0 & 1 & 0 \\ \lambda & \lambda^2 & \dots & \lambda^{m_i-1} & a_i(\lambda) \end{pmatrix}$$

if and only if for A(λ) there exists the representation

$$A(\lambda) = U \begin{pmatrix} I_{n-k} & 0_{n-k,k} \\ 0_{k,n-k} & I_k\lambda - B \end{pmatrix} V(\lambda),  (3.9)$$

where U ∈ GL(n, F), B ∈ M_{k,k}(F) and V(λ) ∈ GL(n, F[λ]).


Proof. We assume that a nonsingular matrix A(λ) ∈ M_{n,n}(F[λ]) admits a representation in the form of the product (3.9). Since U and V(λ) are invertible matrices, a_1(λ), a_2(λ), …, a_r(λ) are the invariant polynomials of the matrix B(λ) = I_kλ − B ∈ M_{k,k}(F[λ]). By Theorem 3.1 the matrix pencil I_kλ − B is semi-scalar equivalent to the matrix ⊕_{i=1}^{r} H_{a_i}(λ), i.e.,

P_1(I_kλ − B)Q_1(λ) = ⊕_{i=1}^{r} H_{a_i}(λ),

where P_1 ∈ GL(k, F) and Q_1(λ) ∈ GL(k, F[λ]). Taking into account equality (3.9), we conclude that the matrix A(λ) can be reduced to the block-diagonal (n × n) matrix D(λ) = I_{n−k} ⊕ ⊕_{i=1}^{r} H_{a_i}(λ).

Conversely, suppose that for a nonsingular matrix A(λ) ∈ M_{n,n}(F[λ]) the relation deg(det A(λ)) = k < n holds, and let a_i(λ) ∈ F[λ], deg a_i(λ) = m_i ≥ 1, i = 1, 2, …, r, be the invariant polynomials of A(λ). Let A(λ) be semi-scalar equivalent to the block-diagonal matrix

D(λ) = I_{n−k} ⊕ ⊕_{i=1}^{r} H_{a_i}(λ) = P A(λ)Q(λ),   (3.10)

where P ∈ GL(n, F) and Q(λ) ∈ GL(n, F[λ]). By Theorem 3.1 the matrix

H(λ) = ⊕_{i=1}^{r} H_{a_i}(λ) ∈ M_{k,k}(F[λ])

is semi-scalar equivalent to a matrix pencil B(λ) = I_kλ − B ∈ M_{k,k}(F[λ]) with invariant polynomials a_1(λ), a_2(λ), …, a_r(λ), i.e., P_1H(λ)Q_1(λ) = B(λ), where P_1 ∈ GL(k, F) and Q_1(λ) ∈ GL(k, F[λ]). It may be noted that the matrix B is unique up to similarity. It is easy to make sure that

$$D(\lambda) = \begin{pmatrix} I_{n-k} & 0_{n-k,k} \\ 0_{k,n-k} & P_1^{-1} \end{pmatrix} \begin{pmatrix} I_{n-k} & 0_{n-k,k} \\ 0_{k,n-k} & I_k\lambda - B \end{pmatrix} \begin{pmatrix} I_{n-k} & 0_{n-k,k} \\ 0_{k,n-k} & Q_1^{-1}(\lambda) \end{pmatrix}.  (3.11)$$

Taking into account equalities (3.10) and (3.11), we obtain that for A(λ) there exists a representation in the form of the product (3.9). The Theorem is proved.

One important special case of Theorem 3.2 is the following corollary.

Corollary 4. Let normalized polynomials a_1(λ), a_2(λ), …, a_k(λ) ∈ F[λ], deg a_i(λ) = m_i ≥ 1, be the invariant polynomials of a nonsingular matrix A(λ) ∈ M_{n,n}(F[λ]). If deg(det A(λ)) = n, then A(λ) is semi-scalar equivalent to the block-diagonal matrix

$$H_A(\lambda) = \bigoplus_{i=1}^{k} H_{a_i}(\lambda) = \bigoplus_{i=1}^{k} \begin{pmatrix} 1 & 0 & \dots & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \dots & 0 & 1 & 0 \\ \lambda & \lambda^2 & \dots & \lambda^{m_i-1} & a_i(\lambda) \end{pmatrix} \in M_{n,n}(F[\lambda])$$

if and only if A(λ) is right equivalent to a matrix pencil I_nλ − B ∈ M_{n,n}(F[λ]), i.e., A(λ)W(λ) = I_nλ − B, where W(λ) ∈ GL(n, F[λ]). The matrix H_A(λ) is uniquely determined for A(λ) up to permutation of its direct summands H_{a_i}(λ).

Corollary 5. Let nonsingular matrices A_1(λ), A_2(λ) ∈ M_{n,n}(F[λ]) be right equivalent to monic matrix pencils, i.e., A_i(λ)W_i(λ) = I_nλ − B_i, where W_i(λ) ∈ GL(n, F[λ]), i = 1, 2. The matrices A_1(λ) and A_2(λ) are semi-scalar equivalent if and only if A_1(λ) and A_2(λ) are equivalent.

3.3. The Second Normal Form of a Matrix Pencil with Respect to Semi-Scalar Equivalence

In this subsection we describe the second normal form for nonsingular polynomial matrices.

Theorem 3.3. Let normalized polynomials h_1(λ), h_2(λ), …, h_r(λ) ∈ F[λ], deg h_i(λ) = m_i ≥ 1, be the elementary divisors of a matrix pencil A_0λ − A_1 ∈ M_{n,n}(F[λ]). If A_0 is a nonsingular matrix, then A(λ) = A_0λ − A_1 is semi-scalar equivalent to the block-diagonal matrix

$$G(\lambda) = \bigoplus_{i=1}^{r} H_{h_i}(\lambda) = \bigoplus_{i=1}^{r} \begin{pmatrix} 1 & 0 & \dots & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \dots & 0 & 1 & 0 \\ \lambda & \lambda^2 & \dots & \lambda^{m_i-1} & h_i(\lambda) \end{pmatrix}.$$

The matrix G(λ) is uniquely determined for A(λ) = A_0λ − A_1 up to permutation of its direct summands H_{h_i}(λ).


Proof. A matrix pencil A(λ) = A_0λ − A_1 with nonsingular matrix A_0 is strictly equivalent to a matrix pencil I_nλ − B, where B = ⊕_{i=1}^{r} L_{h_i} and the matrix B is determined up to similarity (see [18], Chapter 7, p. 269). In view of Proposition 3.2, we note that the direct sum I_nλ − ⊕_{i=1}^{r} L_{h_i} is the canonical form of A(λ) = A_0λ − A_1 with nonsingular matrix A_0 with respect to both strict equivalence and semi-scalar equivalence.

According to Lemma 3.1, the matrices I_{m_i}λ − L_{h_i} and H_{h_i}(λ) are semi-scalar equivalent, i.e., there exist matrices P_i ∈ GL(m_i, F) and Q_i(λ) ∈ GL(m_i, F[λ]) such that

P_i(I_{m_i}λ − L_{h_i})Q_i(λ) = H_{h_i}(λ),   i = 1, 2, …, r.

We can now easily show that for the matrices P = diag(P_1, P_2, …, P_r)U ∈ GL(n, F) and Q(λ) = V diag(Q_1(λ), Q_2(λ), …, Q_r(λ)) ∈ GL(n, F[λ]), where U(A_0λ − A_1)V = I_nλ − B, we have

$$P A(\lambda) Q(\lambda) = G(\lambda) = \bigoplus_{i=1}^{r} H_{h_i}(\lambda) = \bigoplus_{i=1}^{r} \begin{pmatrix} 1 & 0 & \dots & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \dots & 0 & 1 & 0 \\ \lambda & \lambda^2 & \dots & \lambda^{m_i-1} & h_i(\lambda) \end{pmatrix}.$$

It is obvious that the matrix G(λ) = ⊕_{i=1}^{r} H_{h_i}(λ) is uniquely defined up to rearrangement of the diagonal blocks. This completes the proof of Theorem 3.3.

Theorem 3.4. Let normalized polynomials h_1(λ), h_2(λ), …, h_s(λ) ∈ F[λ], deg h_i(λ) = m_i ≥ 1, be the elementary divisors of a nonsingular matrix A(λ) ∈ M_{n,n}(F[λ]). If deg(det A(λ)) = k < n, then A(λ) is semi-scalar equivalent to the block-diagonal (n × n) matrix

$$D(\lambda) = I_{n-k} \oplus \bigoplus_{i=1}^{s} \begin{pmatrix} 1 & 0 & \dots & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \dots & 0 & 1 & 0 \\ \lambda & \lambda^2 & \dots & \lambda^{m_i-1} & h_i(\lambda) \end{pmatrix}$$


if and only if for A(λ) there exists the representation

$$A(\lambda) = U \begin{pmatrix} I_{n-k} & 0_{n-k,k} \\ 0_{k,n-k} & I_k\lambda - B \end{pmatrix} V(\lambda),$$

where B ∈ M_{k,k}(F), U ∈ GL(n, F) and V(λ) ∈ GL(n, F[λ]).

The proof of Theorem 3.4 is the same as the proof of Theorem 3.2, and we leave the details to the reader.
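Theorem 3.3 can be illustrated on a small concrete pencil. The following sketch (ours; it assumes sympy, and the example matrices are chosen for illustration) takes B = L_{h_1} ⊕ L_{h_2} with elementary divisors h_1 = λ − 1 and h_2 = λ², assembles P = diag(1, P_1) and Q(λ) = diag(1, Q_1(λ)) from the blocks of Lemma 3.1, and checks that P(I_3λ − B)Q(λ) equals G(λ) = H_{h_1}(λ) ⊕ H_{h_2}(λ).

import sympy as sp

lam = sp.symbols('lambda')

# Elementary divisors h1 = lambda - 1 and h2 = lambda**2, B = L_{h1} + L_{h2}
B = sp.Matrix([[1, 0, 0],
               [0, 0, 1],
               [0, 0, 0]])
pencil = lam*sp.eye(3) - B

# Block-diagonal transforming matrices assembled from Lemma 3.1
P = sp.Matrix([[1, 0, 0],
               [0, -1, 0],
               [0, 0, 1]])
Q = sp.Matrix([[1, 0, 0],
               [0, 0, 1],
               [0, 1, lam]])

G = sp.expand(P * pencil * Q)
# Expected: G(lambda) = (lambda - 1) direct sum [[1, 0], [lambda, lambda**2]]
expected = sp.Matrix([[lam - 1, 0, 0],
                      [0, 1, 0],
                      [0, lam, lam**2]])
assert sp.simplify(G - expected) == sp.zeros(3, 3)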

REFERENCES

[1] Gantmakher, F. R. The Theory of Matrices. Chelsea, New York, 1959.

[2] Friedland, S. Matrices: Algebra, Analysis and Applications. World Scientific, 2015.

[3] Kazimirs'kyi, P. S. Decomposition of Matrix Polynomials into Factors. Naukova Dumka, Kyiv, 1981 (in Ukrainian).

[4] Baratchart, L. (1982). Un théorème de factorisation et son application à la représentation des systèmes cycliques causaux. C. R. Acad. Sci. Paris, Sér. 1: Mathematics, 295: 223-226.

[5] Dias da Silva, J. A., Laffey, T. J. (1999). On simultaneous similarity of matrices and related questions. Linear Algebra and Its Applications, 291: 167-184.

[6] Kazimirs'kyi, P. S., Zelisko, V. R. and Petrychkovych, V. M. (1976). To the question of the similarity of matrix polynomials. Dokl. Akad. Nauk Ukr. SSR, Ser. A, 10: 867-878 (in Ukrainian).

[7] Drozd, Yu. A. (1977). On tame and wild matrix problems. In: Yu. A. Mitropol'skii (Ed.), Matrix Problems. Inst. Mat. Akad. Nauk Ukrain. SSR, Kiev: 104-114 (in Russian).

[8] Drozd, Yu. A. (1979). Tame and wild matrix problems. In: Yu. A. Mitropol'skii (Ed.), Representations and Quadratic Forms. Inst. Mat. Akad. Nauk Ukrain. SSR, Kiev: 39-74 (in Russian).

[9] Drozd, Yu. A. (1980). Tame and wild matrix problems. Lecture Notes in Math., 832: 242-258.

[10] Friedland, S. (1983). Simultaneous similarity of matrices. Advances in Mathematics, 50: 189-265.

[11] Sergeichuk, V. V. (2000). Canonical matrices for linear matrix problems. Linear Algebra and Its Applications, 317: 53-102.

[12] Dodig, M. (2008). Eigenvalues of partially prescribed matrices. Electron. J. Linear Algebra, 17: 316-332.

[13] Prokip, V. M. (2012). Canonical form with respect to semi-scalar equivalence for a matrix pencil with nonsingular first matrix. Ukrainian Math. J., 63: 1314-1320.

[14] Prokip, V. M. (2013). On the normal form with respect to the semi-scalar equivalence of polynomial matrices over the field. J. Math. Sciences, 194: 149-155.

[15] Kazimirskii, P. S., Bilonoga, D. M. (1990). Semi-scalar equivalence of polynomial matrices with pairwise coprime elementary divisors. Dokl. Akad. Nauk Ukr. SSR, Ser. A, 4: 8-9 (in Ukrainian).

[16] Mel'nyk, O. M. (1993). Construction of unital matrix polynomials with mutually distinct characteristic roots. Ukrainian Math. J., 45: 76-84.

[17] Shavarovs'kyi, B. Z. (2018). On some invariants of polynomial matrices with respect to semi-scalar equivalence. Appl. Problems of Mechanics and Math., 16: 14-18 (in Ukrainian).

[18] Lancaster, P., Tismenetsky, M. The Theory of Matrices, 2nd edition with applications. Academic Press, New York, 1985.

In: Hot Topics in Linear Algebra Editor: Ivan I. Kyrchei

ISBN: 978-1-53617-770-1 © 2020 Nova Science Publishers, Inc.

Chapter 7

MATRICES IN CHEMICAL PROBLEMS MODELED USING DIRECTED GRAPHS AND MULTIGRAPHS

Victor Martinez-Luaces
Electrochemistry Multidisciplinary Research Group, Faculty of Engineering, UdelaR, Montevideo, Uruguay

ABSTRACT

It is well known that chemistry is an interesting source of mathematical modeling problems. In particular, mixing problems (MPs) provide challenging situations, which were partially analyzed in previous works. These problems lead to linear ordinary differential equation (ODE) systems, whose associated matrices (the so-called MP-matrices) have different structures depending on the internal geometry of the system. Graph theory is an important branch of mathematics which provides useful tools to characterize the geometrical properties of MPs. In particular, directed graphs and multigraphs are widely utilized in this chapter for that purpose. The main objective of this work is to analyze MP-matrices, focusing on their algebraic properties, which involve eigenvectors, eigenvalues and their algebraic and geometric multiplicities. Finally, the findings allow analyzing different qualitative behaviors of the ODE system solutions, establishing interesting connections between algebra and other branches of mathematics like graph theory and differential equations.

Corresponding Author's Email: [email protected].

1. INTRODUCTION

There exists a strong relationship between chemistry and/or chemical engineering and mathematics, usually observed in areas like differential equations, integral transforms, and numerical methods, among others [1, 2, 3, 4]. This relationship was previously explored in several book chapters released by NOVA Science Publishers [5, 6], and it obviously includes linear algebra [7, 8, 9].

A particular case of this interaction corresponds to mixing problems (MPs), also known as "compartment analysis" [10]. In chemistry, MPs involve creating a mixture of two or more substances and then determining some quantity (usually a concentration) of the resulting mixture. For instance, a typical MP deals with the amount of salt in a mixing tank. Salt and water enter the tank at a certain rate, they are mixed with what is already in the tank, and the mixture leaves at a certain rate. This process is modeled by an ordinary differential equation (ODE) and, as Groestch affirms: "The direct problem for one-compartment mixing models is treated in almost all elementary differential equations texts" [11]. If instead of only one tank there is a group of them then, as stated by Groestch: "The multicompartment model is more challenging and requires the use of techniques of linear algebra" [11].

Here, different MPs are modeled using graphs and multigraphs. It is well known that graph theory [12] is an important branch of mathematics which provides useful tools for different purposes, including chemical ones [13]. In particular, directed graphs and multigraphs are widely utilized in this chapter to characterize the geometrical properties of MPs. Additionally, each MP can be modeled by an ODE system, whose associated matrix deserves to be studied since it determines the qualitative behavior of the solutions [14]. Moreover, several qualitative properties (like stability and asymptotic stability) depend on the eigenvalues and eigenvectors – and their algebraic and geometric multiplicities – of the associated matrices, the so-called MP-matrices.


Finally, the findings of this work allow establishing interesting connections between linear algebra and other branches of mathematics like graph theory and differential equations.

2. MODELING MPS USING DIRECTED GRAPHS AND MULTIGRAPHS

In order to explain how directed graphs and multigraphs can be used for MP modeling, let us consider the tank system represented in Figure 1, where the notation used is the following:

• The symbol φ_{ik} represents the flux that goes from the i-th tank to the k-th one (for instance, φ_{34} is the flux that goes from tank 3 to tank 4).
• C_n represents the concentration of a given substance in the n-th tank, which can be supposed homogeneous in the whole tank.
• V_n represents the volume of the n-th tank.
• The Greek letter α is used for all the system incoming fluxes; for instance, α_1 represents the incoming flux that goes into the first tank.
• Similarly, the Greek letter β is used for all the system outgoing fluxes; for instance, β_4 represents the outgoing flux that leaves the fourth tank.

Besides, the nomenclature used in this chapter is the following:

• An input tank is a tank with one or more incoming fluxes (for instance, tanks 1 and 2 are input tanks).
• An output tank is one with one or more outgoing fluxes (like tank 4).
• An internal tank is one without incoming and/or outgoing fluxes to or from outside the tank system (e.g., tank 3 is an internal tank).


Figure 1. A tank system with incoming and outgoing fluxes.

This tank system can be easily represented by a directed multigraph; fluxes and volumes are not included in Figure 2, in order to simplify the diagram.

Figure 2. A tank system modeled by a directed multigraph.


The same situation happens when a tank is divided into several compartments, as shown in Figure 3.

Figure 3. A tank divided into four compartments.

This system can be easily represented by a directed graph, as follows:

Figure 4. A tank divided into compartments modeled by a directed graph.

The previous examples can be generalized: any MP can be modeled by directed graphs or multigraphs. It can be noted that each tank or compartment has a corresponding linear ODE, which can be obtained by performing a mass balance; an MP with n tanks or compartments thus gives a linear ODE system. In the following sections, several examples are presented and the ODE system associated matrices – the so-called MP-matrices – are analyzed from the viewpoint of their algebraic properties. A generic sketch of how such a matrix can be assembled from the fluxes and volumes is given below.
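The following is a minimal sketch of that assembly rule, assuming numpy; the function name mp_matrix and the dictionary encoding of the fluxes are ours, not the chapter's. Note that the incoming fluxes α_i do not appear in the matrix (they belong to the independent term B of the model dC/dt = AC + B).

import numpy as np

def mp_matrix(n, phi, beta, V):
    """Assemble the MP-matrix of a mixing problem (sketch).

    n    -- number of tanks/compartments
    phi  -- dict {(i, k): flux phi_ik from tank i to tank k} (1-based)
    beta -- dict {i: outgoing flux beta_i leaving the system from tank i}
    V    -- list of volumes; V[i-1] is the volume of tank i
    """
    A = np.zeros((n, n))
    for (i, k), f in phi.items():
        A[k - 1, i - 1] += f / V[k - 1]   # inflow from tank i feeds tank k
        A[i - 1, i - 1] -= f / V[i - 1]   # the same flux drains tank i
    for i, b in beta.items():
        A[i - 1, i - 1] -= b / V[i - 1]   # outflow to the exterior
    return A

# Example: the system of Figures 1-2 with unit volumes reproduces matrix (6)
A = mp_matrix(4, {(1, 2): 1.0, (1, 3): 2.0, (2, 1): 2.0, (3, 4): 2.0},
              beta={4: 2.0}, V=[1.0, 1.0, 1.0, 1.0])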


3. TWO DIFFERENT EXAMPLES OF MP-MATRICES

Let us first consider the tank system schematized in Figure 1, or the corresponding directed multigraph shown in Figure 2. It can be observed that for the first tank there are two incoming fluxes, α_1 and φ_{21}, and two outgoing fluxes, φ_{12} and φ_{13}, so the associated mass balance can be written as

$$V_1 \frac{dC_1}{dt} = \alpha_1 C_{\alpha_1} + \phi_{21}C_2 - (\phi_{12} + \phi_{13})C_1 \qquad (1)$$

It is important to mention that in Eq. (1) it is supposed that the incoming flux α_1 has a salt concentration C_{α_1}, which may or may not be the same as C_{α_2}. So, the concentrations C_{α_i}, i = 1, 2, corresponding to the system incoming fluxes are not necessarily the same. The corresponding mass balances for the other tanks can be written as:

$$V_2 \frac{dC_2}{dt} = \alpha_2 C_{\alpha_2} + \phi_{12}C_1 - \phi_{21}C_2 \qquad (2)$$

$$V_3 \frac{dC_3}{dt} = \phi_{13}C_1 - \phi_{34}C_3 \qquad (3)$$

$$V_4 \frac{dC_4}{dt} = \phi_{34}C_3 - \beta_4 C_4 \qquad (4)$$

If all these equations are put together, the following ODE system is obtained:

$$\begin{cases} V_1 \dfrac{dC_1}{dt} = \alpha_1 C_{\alpha_1} + \phi_{21}C_2 - (\phi_{12} + \phi_{13})C_1 \\[4pt] V_2 \dfrac{dC_2}{dt} = \alpha_2 C_{\alpha_2} + \phi_{12}C_1 - \phi_{21}C_2 \\[4pt] V_3 \dfrac{dC_3}{dt} = \phi_{13}C_1 - \phi_{34}C_3 \\[4pt] V_4 \dfrac{dC_4}{dt} = \phi_{34}C_3 - \beta_4 C_4 \end{cases} \qquad (5)$$

After some algebraic manipulations, the corresponding mathematical model can be written as dC/dt = AC + B, where

$$C = \begin{pmatrix} C_1 \\ C_2 \\ C_3 \\ C_4 \end{pmatrix}, \qquad B = \begin{pmatrix} \alpha_1 C_{\alpha_1}/V_1 \\ \alpha_2 C_{\alpha_2}/V_2 \\ 0 \\ 0 \end{pmatrix}$$

and the system associated matrix (i.e., the MP-matrix) is the following:

  12  13  V1  12 V2  A 13 V3   0 

 21 V1

0

  21 V2

0

0

  34 V3

0

 34 V4

  0   0    4 V4  0

(6)
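To make the structure of (6) concrete, here is a small numerical experiment (a sketch assuming numpy; the flux values are hypothetical and chosen to satisfy the flux balances used later in section 6).

import numpy as np

# Hypothetical, mass-balanced fluxes for the system of Figures 1-2:
# alpha1 = alpha2 = 1, phi12 = 1, phi21 = 2, phi13 = 2, phi34 = 2, beta4 = 2
phi12, phi13, phi21, phi34, beta4 = 1.0, 2.0, 2.0, 2.0, 2.0
V1, V2, V3, V4 = 2.0, 1.0, 3.0, 1.0

A = np.array([
    [-(phi12 + phi13)/V1,  phi21/V1,     0.0,          0.0],
    [  phi12/V2,          -phi21/V2,     0.0,          0.0],
    [  phi13/V3,           0.0,         -phi34/V3,     0.0],
    [  0.0,                0.0,          phi34/V4,    -beta4/V4],
])

print(np.linalg.eigvals(A))   # for these values, all real parts are negative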

If the tank system corresponding to Figure 3 and Figure 4 is considered, the ODE system can be written as:

$$\begin{cases} V_1 \dfrac{dC_1}{dt} = \alpha_1 C_{\alpha_1} - (\phi_{12} + \phi_{13})C_1 \\[4pt] V_2 \dfrac{dC_2}{dt} = \phi_{12}C_1 - \phi_{24}C_2 \\[4pt] V_3 \dfrac{dC_3}{dt} = \phi_{13}C_1 - \phi_{34}C_3 \\[4pt] V_4 \dfrac{dC_4}{dt} = \phi_{24}C_2 + \phi_{34}C_3 - \beta_4 C_4 \end{cases} \qquad (7)$$

Once again, the ODE system can be written as dC/dt = AC + B, and the associated MP-matrix is the following:

  12  13  V1  12 V2  A 13 V3   0 

0

0

  24 V2

0

0

  34 V3

 24 V4

 34 V4

  0   0    4 V4  0

(8)

It is interesting to observe that the last MP-matrix (8) is lower triangular, whereas the previous one (6) has a different structure. Moreover, the eigenvalues of matrix (8) are its diagonal entries, and all of them are negative numbers. The previous observation has an important consequence from the point of view of the qualitative behavior of the solutions, since all of them show asymptotic stability. This is a particular case of general theorems to be proved in section 5. Before presenting the theoretical results, some typical examples are considered.
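The triangular observation is easy to check numerically. The following sketch (ours, assuming numpy; the fluxes are hypothetical and mass-balanced) builds matrix (8) and confirms that its spectrum is exactly its diagonal.

import numpy as np

phi12, phi13, phi24, phi34, beta4 = 1.0, 2.0, 1.0, 2.0, 3.0
V = np.array([1.0, 2.0, 1.0, 2.0])

A = np.array([
    [-(phi12 + phi13)/V[0], 0.0,          0.0,          0.0],
    [  phi12/V[1],         -phi24/V[1],   0.0,          0.0],
    [  phi13/V[2],          0.0,         -phi34/V[2],   0.0],
    [  0.0,                 phi24/V[3],   phi34/V[3],  -beta4/V[3]],
])

# For a lower triangular matrix the spectrum is just the diagonal
assert np.allclose(np.sort(np.linalg.eigvals(A).real), np.sort(np.diag(A)))
print(np.diag(A))   # four negative eigenvalues -> asymptotic stability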


4. SPECIAL GRAPH STRUCTURES AND THEIR CORRESPONDING MP-MATRICES

In this section, some special graph structures are considered as possible diagrams corresponding to different MPs. The special cases of undirected graph structures chosen for this study are the following: complete graphs (K_n), cycles (C_n), wheels (W_n), n-cubes (Q_n), complete bipartite graphs (K_{m,n}) and a full binary tree (T). Examples of all these graphs will be posed and the corresponding MP-matrices will be obtained in order to analyze their structure.

4.1. Complete Graphs

Let us consider an example (K_4) of a complete graph, which can be drawn as a planar graph. The MP schematized using the planar graph K_4 is shown in Figure 5.

Figure 5. An MP modeled by the planar graph K_4.

For the previous structure, the corresponding ODE system can be written as:

$$\begin{cases} V_1 \dfrac{dC_1}{dt} = \alpha_1 C_{\alpha_1} - (\phi_{12} + \phi_{13} + \phi_{14})C_1 \\[4pt] V_2 \dfrac{dC_2}{dt} = \phi_{12}C_1 - (\phi_{23} + \phi_{24})C_2 \\[4pt] V_3 \dfrac{dC_3}{dt} = \phi_{13}C_1 + \phi_{23}C_2 - \phi_{34}C_3 \\[4pt] V_4 \dfrac{dC_4}{dt} = \phi_{14}C_1 + \phi_{24}C_2 + \phi_{34}C_3 - \beta_4 C_4 \end{cases} \qquad (9)$$

If this ODE system is written as dC/dt = AC + B, then the associated

MP-matrix is the following:

$$A = \begin{pmatrix} -(\phi_{12}+\phi_{13}+\phi_{14})/V_1 & 0 & 0 & 0 \\ \phi_{12}/V_2 & -(\phi_{23}+\phi_{24})/V_2 & 0 & 0 \\ \phi_{13}/V_3 & \phi_{23}/V_3 & -\phi_{34}/V_3 & 0 \\ \phi_{14}/V_4 & \phi_{24}/V_4 & \phi_{34}/V_4 & -\beta_4/V_4 \end{pmatrix} \qquad (10)$$

It is interesting to observe that once again this is a lower triangular matrix, and its eigenvalues are the diagonal entries:

$$\lambda_i = -\frac{1}{V_i}\sum_{j} \phi_{ij} < 0, \qquad (11)$$

where the sum runs over all fluxes leaving the i-th tank (including the outgoing flux β_i, when present).

4.2. Cycles

An example (C_4) of a cycle has already been considered in Figure 4, and its associated MP-matrix is given by (8). As in the previous example, this is a lower triangular matrix and the corresponding eigenvalues are the diagonal entries:

$$\lambda_i = -\frac{1}{V_i}\sum_{j} \phi_{ij} < 0. \qquad (12)$$

4.3. Wheels

Wheels can be exemplified by W_5, as in Figure 6.

Figure 6. An MP modeled by the wheel W_5.

In this case, the corresponding ODE system can be written as:

$$\begin{cases} V_1 \dfrac{dC_1}{dt} = \alpha_1 C_{\alpha_1} - (\phi_{12} + \phi_{13} + \phi_{14})C_1 \\[4pt] V_2 \dfrac{dC_2}{dt} = \phi_{12}C_1 + \phi_{32}C_3 - \phi_{25}C_2 \\[4pt] V_3 \dfrac{dC_3}{dt} = \phi_{13}C_1 - (\phi_{32} + \phi_{34} + \phi_{35})C_3 \\[4pt] V_4 \dfrac{dC_4}{dt} = \phi_{14}C_1 + \phi_{34}C_3 - \phi_{45}C_4 \\[4pt] V_5 \dfrac{dC_5}{dt} = \phi_{25}C_2 + \phi_{35}C_3 + \phi_{45}C_4 - \beta_5 C_5 \end{cases} \qquad (13)$$

Consequently, the corresponding MP-matrix is the following:

$$A = \begin{pmatrix} -(\phi_{12}+\phi_{13}+\phi_{14})/V_1 & 0 & 0 & 0 & 0 \\ \phi_{12}/V_2 & -\phi_{25}/V_2 & \phi_{32}/V_2 & 0 & 0 \\ \phi_{13}/V_3 & 0 & -(\phi_{32}+\phi_{34}+\phi_{35})/V_3 & 0 & 0 \\ \phi_{14}/V_4 & 0 & \phi_{34}/V_4 & -\phi_{45}/V_4 & 0 \\ 0 & \phi_{25}/V_5 & \phi_{35}/V_5 & \phi_{45}/V_5 & -\beta_5/V_5 \end{pmatrix} \qquad (14)$$

 32

This MP-matrix is not lower due to the entry V2 located at the position

2,3 . However, if the flux that goes from the third to the second compartment

is inverted (i.e., the arrow points in the opposite direction), then the MP-matrix will be a lower one, like in matrices (8) and (10). This fact will be deeply analyzed in section 5.

4.4. n-Cubes

The three-dimensional cube (Q_3) is an interesting example that can be drawn as a planar graph. An MP, schematized using the planar graph Q_3, is shown in Figure 7.

Figure 7. An MP modeled by a tridimensional cube.

In this case, the notation Ċ_i(t) will be used instead of dC_i/dt in order to simplify the corresponding ODE system:

$$\begin{cases} V_1 \dot{C}_1(t) = \alpha_1 C_{\alpha_1} - (\phi_{12} + \phi_{13} + \phi_{15})C_1 \\[2pt] V_2 \dot{C}_2(t) = \phi_{12}C_1 - \phi_{24}C_2 + \phi_{62}C_6 \\[2pt] V_3 \dot{C}_3(t) = \phi_{13}C_1 - (\phi_{34} + \phi_{37})C_3 \\[2pt] V_4 \dot{C}_4(t) = \phi_{24}C_2 + \phi_{34}C_3 - \beta_4 C_4 + \phi_{84}C_8 \\[2pt] V_5 \dot{C}_5(t) = \phi_{15}C_1 - (\phi_{56} + \phi_{57})C_5 \\[2pt] V_6 \dot{C}_6(t) = \phi_{56}C_5 - (\phi_{62} + \phi_{68})C_6 \\[2pt] V_7 \dot{C}_7(t) = \phi_{37}C_3 + \phi_{57}C_5 - \phi_{78}C_7 \\[2pt] V_8 \dot{C}_8(t) = \phi_{68}C_6 + \phi_{78}C_7 - \phi_{84}C_8 \end{cases} \qquad (15)$$

In order to write the associated MP-matrix, the following notation will be used:

$$A = -\frac{\phi_{12}+\phi_{13}+\phi_{15}}{V_1}, \quad B = -\frac{\phi_{34}+\phi_{37}}{V_3}, \quad C = -\frac{\phi_{56}+\phi_{57}}{V_5}, \quad D = -\frac{\phi_{62}+\phi_{68}}{V_6} \qquad (16)$$

Once more, it can be observed that the MP-matrix (Equation (17)) is not lower triangular. This fact is due to the entries φ_{62}/V_2 and φ_{84}/V_4, which are located at the positions (2,6) and (4,8). From the directed graph viewpoint, this situation takes place because of the direction of the arrows that go from node 6 to node 2 and from node 8 to node 4.

$$\begin{pmatrix} A & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \phi_{12}/V_2 & -\phi_{24}/V_2 & 0 & 0 & 0 & \phi_{62}/V_2 & 0 & 0 \\ \phi_{13}/V_3 & 0 & B & 0 & 0 & 0 & 0 & 0 \\ 0 & \phi_{24}/V_4 & \phi_{34}/V_4 & -\beta_4/V_4 & 0 & 0 & 0 & \phi_{84}/V_4 \\ \phi_{15}/V_5 & 0 & 0 & 0 & C & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \phi_{56}/V_6 & D & 0 & 0 \\ 0 & 0 & \phi_{37}/V_7 & 0 & \phi_{57}/V_7 & 0 & -\phi_{78}/V_7 & 0 \\ 0 & 0 & 0 & 0 & 0 & \phi_{68}/V_8 & \phi_{78}/V_8 & -\phi_{84}/V_8 \end{pmatrix} \qquad (17)$$

It can be conjectured that if all arrows go from node i to node j with i < j, then the MP-matrix will be lower triangular. This fact is proved in section 5.

4.5. Complete Bipartite Graphs

The complete bipartite graph K_{2,3} is another example that can be drawn as a planar graph. The following MP can be schematized using this graph:

Figure 8. An MP modeled by the graph K_{2,3}.

The mathematical model corresponding to this MP can be written as an ODE linear system, as follows:

$$\begin{cases} V_1 \dfrac{dC_1}{dt} = \alpha_1 C_{\alpha_1} - (\phi_{12} + \phi_{13} + \phi_{14})C_1 \\[4pt] V_2 \dfrac{dC_2}{dt} = \phi_{12}C_1 - \phi_{25}C_2 \\[4pt] V_3 \dfrac{dC_3}{dt} = \phi_{13}C_1 - \phi_{35}C_3 \\[4pt] V_4 \dfrac{dC_4}{dt} = \phi_{14}C_1 - \phi_{45}C_4 \\[4pt] V_5 \dfrac{dC_5}{dt} = \phi_{25}C_2 + \phi_{35}C_3 + \phi_{45}C_4 - \beta_5 C_5 \end{cases} \qquad (18)$$

And the associated MP-matrix is the following:

$$A = \begin{pmatrix} -(\phi_{12}+\phi_{13}+\phi_{14})/V_1 & 0 & 0 & 0 & 0 \\ \phi_{12}/V_2 & -\phi_{25}/V_2 & 0 & 0 & 0 \\ \phi_{13}/V_3 & 0 & -\phi_{35}/V_3 & 0 & 0 \\ \phi_{14}/V_4 & 0 & 0 & -\phi_{45}/V_4 & 0 \\ 0 & \phi_{25}/V_5 & \phi_{35}/V_5 & \phi_{45}/V_5 & -\beta_5/V_5 \end{pmatrix} \qquad (19)$$

It is interesting to observe that this MP-matrix can be obtained from equation (14) by simply putting φ_{32} = 0 and φ_{34} = 0. This fact is consistent with the analysis of the graphs, since the wheel W_5 becomes the graph K_{2,3} if the arrows that go from node 3 to node 2 and from node 3 to node 4 are eliminated, as the following sketch illustrates.
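This reduction can be checked numerically. The sketch below (ours, assuming numpy; the function and flux values are hypothetical) encodes matrix (14) of the wheel W_5 and verifies that setting φ_{32} = φ_{34} = 0 yields the lower triangular K_{2,3} matrix (19).

import numpy as np

def wheel_matrix(phi, beta5, V):
    """MP-matrix (14) of the wheel W5 (sketch; the flux encoding is ours)."""
    p = lambda i, k: phi.get((i, k), 0.0)
    A = np.zeros((5, 5))
    A[0, 0] = -(p(1, 2) + p(1, 3) + p(1, 4)) / V[0]
    A[1, 0], A[1, 1], A[1, 2] = p(1, 2)/V[1], -p(2, 5)/V[1], p(3, 2)/V[1]
    A[2, 0], A[2, 2] = p(1, 3)/V[2], -(p(3, 2) + p(3, 4) + p(3, 5))/V[2]
    A[3, 0], A[3, 2], A[3, 3] = p(1, 4)/V[3], p(3, 4)/V[3], -p(4, 5)/V[3]
    A[4, 1], A[4, 2], A[4, 3] = p(2, 5)/V[4], p(3, 5)/V[4], p(4, 5)/V[4]
    A[4, 4] = -beta5 / V[4]
    return A

# With phi32 = phi34 = 0, (14) collapses to the K_{2,3} matrix (19)
A = wheel_matrix({(1, 2): 1.0, (1, 3): 1.0, (1, 4): 1.0, (2, 5): 1.0,
                  (3, 5): 1.0, (4, 5): 1.0}, beta5=3.0, V=[1.0]*5)
assert np.allclose(A, np.tril(A))   # (19) is lower triangular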

4.6. Full Binary Tree

A full binary tree T is the last example to be considered. The following MP can be schematized using this graph (Figure 9):

Figure 9. An MP modeled by a full binary tree T.

Once again, the notation Ċ_i(t) is used instead of dC_i/dt in order to simplify the corresponding ODE system:

V C  (t )   C     C 1 1 12 13 1  1 1 V C  (t )   C   C 12 1 2 2  2 2 V C  (t )   C     C 13 1 34 35 3  3 3   V4C4 (t )   34C3   46   47 C4   V5C5 (t )   35C3   5 C5   V6C6 (t )   46C4   6 C6   V7C7 (t )   47C4   7 C7 

(20)

The corresponding MP-matrix is lower triangular. This result may be expected, since all arrows in Figure 9 go from node i to node j with i < j. This fact can be observed in matrix (21).

Matrices in Chemical Problems Modeled Using Directed Graphs …    12  13    0 0 0 0 0 0    V 1           2  12   0 0 0 0 0     V2  V2          34 35  13   0  0 0 0 0     V3 V3          46   47   34 A 0 0  0 0 0   V V   4 4      5  35  0 0 0   0 0    V5  V5        6   46 0 0 0 0  0    V6  V6         47  7   0 0 0 0 0      V7  V7   

249

(21)

Thus, taking into account all these examples, several theorems, corollaries and conjectures are presented in the next section.

5. THEOREMS AND COROLLARIES ABOUT MP-MATRICES

As it was observed in section 4, in MPs where all arrows go from node i to node j with i < j – like those studied in subsections 4.1, 4.2, 4.5 and 4.6 – the corresponding MP-matrix is lower triangular. This fact can be easily observed in matrices (8), (10), (19) and (21), and it can be considered a direct consequence of the form of the linear ODE corresponding to the i-th node. The following theorem formalizes this property.

Theorem 1. In an MP modeled by a directed graph or multigraph, such that there is no arrow going from node j to node i with i < j, the corresponding MP-matrix is lower triangular.

Proof. If there is no arrow going from node j to node i with i < j, then all arrows whose head is in node j have their tails in nodes i_1, i_2, …, i_k, where {i_1, i_2, …, i_k} is a subset of {1, 2, …, j − 1}. The only possible exception to this rule is that the directed graph or multigraph can include one more arrow whose tail is in the external source node α and whose head is in node j. Similarly, all arrows whose tail is in node j have their heads in nodes n_1, n_2, …, n_p, where {n_1, n_2, …, n_p} is a subset of {j + 1, j + 2, …, m}, m being the number of nodes of the directed graph or multigraph. The only possible exception to this rule is that the directed graph or multigraph can include one more arrow having its tail in node j and its head in the external sink node ω. This situation is illustrated in Figure 10.

Figure 10. Arrows having their heads or tails in node j.

As a consequence, the ODE corresponding to node j can be written as:

$$V_j \frac{dC_j}{dt} = \phi_{\alpha j} C_{\alpha_j} + \phi_{i_1,j} C_{i_1} + \dots + \phi_{i_k,j} C_{i_k} - (\phi_{j,n_1} + \dots + \phi_{j,n_p} + \phi_{j,\omega}) C_j \qquad (22)$$

In this equation, φ_{αj}, φ_{i_1,j}, …, φ_{i_k,j} are the incoming fluxes, where the flux φ_{αj} (the incoming flux α_j of section 2) can be zero or not, and φ_{j,n_1}, …, φ_{j,n_p}, φ_{j,ω} are the outgoing fluxes, where the flux φ_{j,ω} (the outgoing flux β_j of section 2) can be zero or not. Another option consists in writing this equation as:

$$V_j \frac{dC_j}{dt} = \phi_{\alpha j} C_{\alpha_j} + \phi_{1,j} C_1 + \dots + \phi_{j-1,j} C_{j-1} - (\phi_{j,j+1} + \dots + \phi_{j,m} + \phi_{j,\omega}) C_j \qquad (23)$$

where a given incoming flux φ_{x,j} or outgoing flux φ_{j,x} can be zero if there is no arrow that links the corresponding node with node j. As a consequence of equation (23), the j-th row of the MP-matrix has entries φ_{x,j}/V_j for x < j, −(φ_{j,j+1} + ⋯ + φ_{j,m} + φ_{j,ω})/V_j for the diagonal entry, and 0 otherwise. It should be noted that φ_{αj}C_{α_j}/V_j is part of the ODE system independent term, and then it is not part of the MP-matrix. Taking into account the previous results, the j-th row of the MP-matrix is:

$$A^{(j)} = \begin{pmatrix} \dfrac{\phi_{1,j}}{V_j} & \dots & \dfrac{\phi_{j-1,j}}{V_j} & -\dfrac{\phi_{j,j+1} + \dots + \phi_{j,m} + \phi_{j,\omega}}{V_j} & 0 & \dots & 0 \end{pmatrix} \qquad (24)$$

Then, the MP-matrix is:

$$A = \begin{pmatrix} -\dfrac{\phi_{12} + \dots + \phi_{1m} + \phi_{1,\omega}}{V_1} & 0 & \dots & 0 \\ \dfrac{\phi_{12}}{V_2} & -\dfrac{\phi_{23} + \dots + \phi_{2m} + \phi_{2,\omega}}{V_2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\phi_{1m}}{V_m} & \dfrac{\phi_{2m}}{V_m} & \dots & -\dfrac{\phi_{m,\omega}}{V_m} \end{pmatrix} \qquad (25)$$

And the theorem is proved.

Theorem 2. Let us consider an MP modeled with a directed graph or multigraph, such that there is no arrow going from node j to node i with i < j. In this case, all the MP-matrix eigenvalues are negative numbers.

Proof. Since the MP-matrix (25) is lower triangular, its eigenvalues are:

$$\lambda_1 = -\frac{\phi_{12} + \dots + \phi_{1m} + \phi_{1,\omega}}{V_1}, \quad \lambda_2 = -\frac{\phi_{23} + \dots + \phi_{2m} + \phi_{2,\omega}}{V_2}, \quad \dots, \quad \lambda_m = -\frac{\phi_{m,\omega}}{V_m}, \qquad (26)$$

and so λ_i ≤ 0 for all i = 1, 2, …, m.

A stronger result can be obtained by performing a flux balance at a given node j (Figure 10), which gives:

$$\phi_{\alpha j} + \phi_{i_1,j} + \dots + \phi_{i_k,j} = \phi_{j,n_1} + \dots + \phi_{j,n_p} + \phi_{j,\omega} \qquad (27)$$

As a consequence, if node j is not isolated, both sides of equation (27) must be positive, and so λ_j < 0. The same situation happens for each node of the system, and the theorem is proved.

The previous theorem was proved by performing a flux balance. The same procedure can be used to obtain other results, where the hypothesis that no arrow goes from node j to node i with i < j is removed. It is important to mention that this procedure (flux balance) may change in problems with variable volumes [15], which are not considered in this chapter.

Theorem 3. Let us consider an MP such that there exists an arrow whose tail is in node α and whose head is in node j. Then, the diagonal entry a_jj of the MP-matrix is a negative number and |a_jj| > R_j = Σ_{i≠j} a_ji, where R_j is the sum of the non-diagonal entries of the j-th row.

Proof. Since the hypothesis that no arrow goes from node j to node i with i < j was removed, the situation can be modeled again by a diagram like the one in Figure 10, although in this case {i_1, i_2, …, i_k} and {n_1, n_2, …, n_p} are both subsets of {1, 2, …, m}, without any restriction. Even more, it is possible that

ix  ny for some x, y  1,2 ,, m, like in Figure 2 where there is an arrow which head is node 1 and its tail is node 2 and there is another one that goes in the opposite direction. As a consequence, the ODE corresponding to node j can be written as follows:

Vj

dC j dt

  a, j Ca, j  1, j C1    j 1, j C j 1   j 1, j C j 1     m, j Cm    j ,1     j , j 1   j , j 1     j ,m   j , C j

(28)

where a certain incoming flux   j or outgoing flux  j can be zero if there is no arrow that links the corresponding node with node j . Then, the j -th row of the MP-matrix is:

  j ,    j ,k  1, j  j 1, j k j A    Vj Vj  Vj  ( j)

 j 1, j Vj

  m, j    Vj  

(29)

It follows straightforwardly that the diagonal entry can be expressed as

$$a_{j,j} = -\frac{\phi_{j,\omega} + \sum_{k \neq j}\phi_{j,k}}{V_j} \qquad (30)$$

which is a non-positive number. If a flux balance is performed at node j, the following equation holds:

$$\phi_{\alpha j} + \phi_{1,j} + \dots + \phi_{j-1,j} + \phi_{j+1,j} + \dots + \phi_{m,j} = \phi_{j,1} + \dots + \phi_{j,j-1} + \phi_{j,j+1} + \dots + \phi_{j,m} + \phi_{j,\omega} \qquad (31)$$

A simpler version of equation (31) can be written as follows:

$$\phi_{\alpha j} + \sum_{k \neq j}\phi_{k,j} = \phi_{j,\omega} + \sum_{k \neq j}\phi_{j,k} \qquad (32)$$

which, divided by V_j, gives:

$$\frac{\phi_{\alpha j}}{V_j} + \sum_{k \neq j}\frac{\phi_{k,j}}{V_j} = \frac{\phi_{j,\omega}}{V_j} + \sum_{k \neq j}\frac{\phi_{j,k}}{V_j} \qquad (33)$$

The term Σ_{k≠j} φ_{k,j}/V_j represents the sum of the non-diagonal entries of the j-th row of the MP-matrix (see equation (29)), so it corresponds to R_j = Σ_{i≠j} a_ji.

It should also be noted that (φ_{j,ω} + Σ_{k≠j} φ_{j,k})/V_j is the opposite of the diagonal entry, so |a_jj| = (φ_{j,ω} + Σ_{k≠j} φ_{j,k})/V_j, and then equation (33) can be rewritten as

$$\frac{\phi_{\alpha j}}{V_j} + R_j = |a_{jj}| \qquad (34)$$

Moreover, the hypothesis says that there exists an arrow whose tail is in node α and whose head is in node j, so the incoming flux is φ_{αj} > 0, and the first term of (34), φ_{αj}/V_j, is a positive number. Then, from equation (34) it follows that |a_jj| must be positive and greater than R_j, and the theorem is proved.

The previous theorem has many consequences. Some of them are analyzed in the following corollaries.

Corollary 1. If an MP is modeled with a directed graph or multigraph, such that for every node j there exists an arrow whose tail is in node α and whose head is in node j, then the corresponding MP-matrix is a strictly diagonally dominant matrix.

Proof. The proof is obvious, since in each row the diagonal entry a_jj satisfies the inequality |a_jj| > R_j = Σ_{i≠j} |a_ji|, with a_jj and R_j as in Theorem 3.

It should be noted that the hypothesis of Corollary 1 is equivalent to saying that every element of the MP is an input tank. In order to prove the next corollary and other results, the Gershgorin circle theorem will be used. The first version of it was published by S. A. Gershgorin in 1931 [16, 17]. This theorem is particularly useful to bound the spectrum of a complex n × n matrix, and its statement is the following:

Theorem (Gershgorin). If A is an n × n matrix with entries a_ij, i, j ∈ {1, …, n}, and R_i = Σ_{j≠i} |a_ij| is the sum of the moduli of the non-diagonal entries in the i-th row, then every eigenvalue of A lies within at least one of the closed discs D(a_ii, R_i), called Gershgorin discs.
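The Gershgorin data of a numeric MP-matrix are easy to compute directly. The following sketch (ours, assuming numpy; the entries come from hypothetical, mass-balanced fluxes in a matrix of type (8)) checks that every disc, and hence the spectrum, lies in the closed left half-plane.

import numpy as np

A = np.array([[-3.0, 0.0, 0.0, 0.0],
              [ 0.5, -0.5, 0.0, 0.0],
              [ 2.0, 0.0, -2.0, 0.0],
              [ 0.0, 0.5, 1.0, -1.5]])

centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)   # R_i: off-diagonal row moduli

# Every disc D(a_ii, R_i) lies in Re(z) <= 0, hence so does the spectrum
assert np.all(centers + radii <= 0)
assert np.all(np.linalg.eigvals(A).real <= 0)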

Corollary 2. Under the same conditions of Corollary 1 – for every node j there exists an arrow whose tail is in node α and whose head is in node j or, equivalently, every element of the MP is an input tank – all the MP-matrix eigenvalues satisfy the condition Re(λ_i) < 0.

Proof. Using the same notation as in Theorem 3, it follows that in each row the diagonal entry a_jj is negative and satisfies the inequality |a_jj| > R_j = Σ_{i≠j} |a_ji|. Then, the Gershgorin closed disc D(a_ii, R_i) corresponding to the i-th row is like the disc D_1 schematized in Figure 11.

Figure 11. Examples of Gershgorin circles.

Since D_1 ⊂ C^(−), where C^(−) = {z ∈ ℂ : Re(z) < 0} is the subset of complex numbers with negative real part, all the eigenvalues must satisfy the condition Re(λ_i) < 0, which proves the corollary.

In the following theorem, the hypothesis about a node j such that there exists an arrow whose tail is in node α and whose head is in node j will be removed.

Theorem 4. Let us consider a node j such that there is no arrow whose tail is in node

α and whose head is in node j. Then, the diagonal entry a_jj of the MP-matrix is a non-positive number and |a_jj| = R_j = Σ_{i≠j} a_ji, where R_j is the sum of the non-diagonal entries of the j-th row.

Proof. As in the previous cases, the ODE corresponding to node j can be written as:

$$V_j \frac{dC_j}{dt} = \phi_{1,j} C_1 + \dots + \phi_{j-1,j} C_{j-1} + \phi_{j+1,j} C_{j+1} + \dots + \phi_{m,j} C_m - (\phi_{j,1} + \dots + \phi_{j,j-1} + \phi_{j,j+1} + \dots + \phi_{j,m} + \phi_{j,\omega}) C_j \qquad (35)$$


Equation (35) is a particular case of equation (28), with φ_{αj}C_{α_j} = 0, since there is no incoming flux φ_{αj} as a consequence of the new hypothesis (no arrow that links node α and node j). This fact does not change the j-th row of the MP-matrix, which can be written as:

$$A^{(j)} = \begin{pmatrix} \dfrac{\phi_{1,j}}{V_j} & \dots & \dfrac{\phi_{j-1,j}}{V_j} & -\dfrac{\phi_{j,\omega} + \sum_{k \neq j}\phi_{j,k}}{V_j} & \dfrac{\phi_{j+1,j}}{V_j} & \dots & \dfrac{\phi_{m,j}}{V_j} \end{pmatrix} \qquad (29)$$

Once again, the diagonal entry is

$$a_{j,j} = -\frac{\phi_{j,\omega} + \sum_{k \neq j}\phi_{j,k}}{V_j} \qquad (30)$$

which is a non-positive number. The situation changes when a flux balance is performed at node j, since the following equation holds:

$$\phi_{1,j} + \dots + \phi_{j-1,j} + \phi_{j+1,j} + \dots + \phi_{m,j} = \phi_{j,1} + \dots + \phi_{j,j-1} + \phi_{j,j+1} + \dots + \phi_{j,m} + \phi_{j,\omega} \qquad (36)$$

A simpler version of equation (36) can be written as follows:

$$\sum_{k \neq j}\phi_{k,j} = \phi_{j,\omega} + \sum_{k \neq j}\phi_{j,k} \qquad (37)$$

which, divided by V_j, gives:

$$\sum_{k \neq j}\frac{\phi_{k,j}}{V_j} = \frac{\phi_{j,\omega}}{V_j} + \sum_{k \neq j}\frac{\phi_{j,k}}{V_j} \qquad (38)$$

It should be noted that equations (36), (37) and (38) are particular cases of equations (31), (32) and (33) when φ_{αj} = 0. As in Theorem 3, the term Σ_{k≠j} φ_{k,j}/V_j represents R_j = Σ_{i≠j} a_ji, and here again |a_jj| = (φ_{j,ω} + Σ_{k≠j} φ_{j,k})/V_j; then equation (38) can be rewritten as R_j = |a_jj|, and the theorem is proved.

The following corollary combines the previous theorem and the Gershgorin circle theorem, with the aim of bounding the spectrum of the corresponding MP-matrix.

Corollary 3. Under the same conditions of Theorem 4, i.e., a node j is considered such that there is no arrow whose tail is in node

α and whose head is in node j. Then, all the MP-matrix eigenvalues satisfy the condition Re(λ_i) ≤ 0.

Proof. Theorem 4 states that the diagonal entry a_jj of the MP-matrix is a non-positive number with |a_jj| = R_j, where R_j is the sum of the non-diagonal entries of the j-th row. As a consequence, the Gershgorin circle corresponding to the j-th row looks like the discs D_2 and D_3 schematized in Figure 11. Both discs D_2 and D_3 are included in the closure of the subset C^(−) = {z ∈ ℂ : Re(z) < 0} of Corollary 2. This closure is the subset of complex numbers with non-positive real part, so all the eigenvalues must satisfy the condition Re(λ_i) ≤ 0, which is the thesis of the corollary.

In a general situation, i.e., an MP such that there is an arrow whose tail is in node α and whose head is in node j and, at the same time, there is no arrow whose tail is in node α and whose head is in node k, the Gershgorin discs will look like those of Figure 11. Taking into account the previous results, one more corollary can be proved.


Corollary 4. In a given MP, all the eigenvalues must satisfy the condition Re(λ_i) ≤ 0, and Re(λ_i) = 0 if and only if λ_i = 0.

Proof. The proof is a direct consequence of the position of the Gershgorin discs in the complex plane.

6. EXAMPLES, OTHER RESULTS AND CONJECTURES ABOUT MP-MATRICES

In the first example considered in this chapter (Figures 1 and 2), the following MP-matrix was obtained:

$$A = \begin{pmatrix} -(\phi_{12}+\phi_{13})/V_1 & \phi_{21}/V_1 & 0 & 0 \\ \phi_{12}/V_2 & -\phi_{21}/V_2 & 0 & 0 \\ \phi_{13}/V_3 & 0 & -\phi_{34}/V_3 & 0 \\ 0 & 0 & \phi_{34}/V_4 & -\beta_4/V_4 \end{pmatrix} \qquad (6)$$

The flux balances in nodes 1, 2, 3 and 4 give:

$$\begin{cases} \alpha_1 + \phi_{21} = \phi_{12} + \phi_{13} \\ \alpha_2 + \phi_{12} = \phi_{21} \\ \phi_{13} = \phi_{34} \\ \phi_{34} = \beta_4 \end{cases} \qquad (39)$$

Then, as the fluxes are positive numbers, it is easy to observe that φ_{12} + φ_{13} > φ_{21}, which is a consequence of the first equation in (39). Due to this fact, the first Gershgorin disc is like D_1, schematized in Figure 11. Similarly, since α_2 > 0, it follows that φ_{21} > φ_{12}, and the second Gershgorin disc is also like D_1, represented in Figure 11.

The third and fourth equations in (39) show that the other two Gershgorin discs will be tangent to the imaginary axis, i.e., like the circles D_2 and D_3 (Figure 11). As can be expected, the main difference, i.e., being tangent to the imaginary axis, depends on considering an input or a non-input tank, that is, a node j such that there is an arrow whose tail is in node α and whose head is in node j. Then, an interesting situation that deserves to be analyzed is a closed system, i.e., an MP with no arrows whose tail is in node α. Let us consider the following MP, modeled by a directed graph:

Figure 12. A closed MP modeled by a cycle C_3.

In this case, due to the flux balances in nodes 1, 2 and 3, it must be φ_{12} = φ_{23} = φ_{31} = φ, and if the volumes are V_1 = V, V_2 = ½V and V_3 = ⅓V, the following ODE system is obtained:

$$\begin{cases} V \dfrac{dC_1}{dt} = \phi(C_3 - C_1) \\[4pt] \dfrac{1}{2}V \dfrac{dC_2}{dt} = \phi(C_1 - C_2) \\[4pt] \dfrac{1}{3}V \dfrac{dC_3}{dt} = \phi(C_2 - C_3) \end{cases} \qquad (40)$$

In order to simplify the problem, let us suppose that φ = V, each one in its corresponding units. In this special case, the MP-matrix is the following:

$$A = \begin{pmatrix} -1 & 0 & 1 \\ 2 & -2 & 0 \\ 0 & 3 & -3 \end{pmatrix} \qquad (41)$$
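A quick numerical cross-check of the spectrum of (41) can be done as follows (a sketch assuming numpy); the output can be compared with the values stated next.

import numpy as np

A = np.array([[-1.0, 0.0, 1.0],
              [ 2.0, -2.0, 0.0],
              [ 0.0, 3.0, -3.0]])

print(np.linalg.eigvals(A))
# approximately 0, -3 + 1.414j and -3 - 1.414j, i.e., 0 and -3 +/- sqrt(2) i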

The eigenvalues of this MP-matrix can be easily obtained, and they are λ_1 = 0, λ_2 = −3 + √2 i and λ_3 = −3 − √2 i. This example shows that null and complex eigenvalues take place, at least in closed systems like the MP in Figure 12.

From the point of view of the stability of the ODE system solutions, the study of the MP-matrix eigenvalues is crucial. First of all, and independently of the previous results, it is easy to observe that all the solutions corresponding to the eigenvalues with Re(λ_i) < 0 tend to vanish when t → ∞. For this purpose, when analyzing eigenvalues with Re(λ_i) < 0, there are two cases to be considered: λ_i ∈ ℝ and λ_i ∉ ℝ. In the first case, the corresponding ODE solutions are a linear combination of the following exponential and exponential-polynomial functions:

$$\exp(\lambda_i t), \; t\exp(\lambda_i t), \; t^2\exp(\lambda_i t), \; \dots, \; t^q\exp(\lambda_i t).$$

In this set of functions,

t n exp  i t  t  0 ,  n  0, 1,, q.  In the second case – which really happens, as it was observed in the example in Figure 12 – we have i  a  bi  (with a  0 , b  0 ). The ODE solutions

are

a

linear

combination

functions: exp  at cosbt , exp  at sinbt ,, t exp  at cosbt , t exp  at sinbt , q

of

the

q

where the number q depends on AM i  and GM i  as in the first case. It is

easy to prove that t^n exp(at)cos(bt) → 0 and t^n exp(at)sin(bt) → 0 as t → ∞, for all n ∈ {0, 1, …, q}, since a < 0.

The previous analysis confirms that all the solutions corresponding to the eigenvalues with Re(λ_i) < 0 tend to vanish when t → ∞. In order to complete this study, the case Re(λ_i) = 0 must be analyzed. For this purpose it is important to observe that if an eigenvalue λ_i satisfies

Rei   0 , then it must be i  0 , as it was mentioned in Corollary 4.

In this case the ODE solutions are a linear combination of the following functions: exp(0t), t exp(0t), t² exp(0t), …, t^q exp(0t), that is, 1, t, t², …, t^q, where the number q again depends on AM(0) and GM(0). In other words, the corresponding solutions are polynomial, and so they will neither tend to vanish nor remain bounded when t → ∞, unless AM(0) = GM(0), in which case the polynomial becomes a constant.

Considering all these results, it is obvious that the stability of the ODE system solutions will depend exclusively on AM(0) and GM(0). As it was mentioned, if AM(0) > GM(0) the polynomial solutions will be unbounded, and this situation becomes chemically nonsensical, since one or more concentrations would tend to infinity, which is not possible. As a consequence, from the chemical point of view, the main conjecture is that in any MP-matrix the eigenvalue λ = 0 can only occur if AM(0) = GM(0). It is important to mention that in a previous NOVA Science book chapter [8] it was proved that in a given MP with three or less compartments, all the corresponding ODE system solutions are asymptotically stable, so necessarily AM(0) = GM(0) in those cases.

There are other important questions and conjectures that deserve to be analyzed, and the following ones are just a few examples:

Conjecture 1. If λ = 0 is an eigenvalue of a given MP-matrix, then AM(0) = GM(0) and the ODE system solutions are asymptotically stable.
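The λ = 0, AM(0) = GM(0) behavior can be observed numerically on the closed C_3 system of Figure 12. The following sketch (ours; it assumes scipy) integrates the system with matrix (41) and shows that the concentrations settle to a common constant value instead of vanishing or blowing up.

import numpy as np
from scipy.integrate import solve_ivp

# Closed C3 system of Figure 12 with phi = V, i.e., matrix (41)
A = np.array([[-1.0, 0.0, 1.0],
              [ 2.0, -2.0, 0.0],
              [ 0.0, 3.0, -3.0]])

sol = solve_ivp(lambda t, c: A @ c, (0.0, 20.0), y0=[1.0, 0.0, 0.0])
print(sol.y[:, -1])   # all three concentrations approach the same constant
                      # (6/11 for this initial condition), consistent with
                      # lambda = 0 having AM(0) = GM(0) = 1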


Question 1. Is it possible to find an MP-matrix with an eigenvalue λ_i = 0 such that AM(0) > 1? For instance, is it possible to find an MP-matrix with a double eigenvalue λ_i = 0 with GM(0) = 2 or less?

Question 2. Is it possible to find an MP-matrix with complex eigenvalues in an open system, i.e., one having arrows that link the nodes 1, 2, …, m with nodes α and ω?

Question 3. Is it possible to find an MP-matrix with complex eigenvalues λ_i such that AM(λ_i) > 1?

Question 4. Is it possible to find an MP-matrix with complex eigenvalues λ_i such that AM(λ_i) > GM(λ_i)?

These questions and conjectures are interesting since they may determine the form of the ODE system solutions and their qualitative behavior.

CONCLUSION

Chemical mixing problems are interesting sources for applied research in mathematical modeling, ODE linear systems, integral transforms and linear algebra, among others. At the same time, it is possible to connect them with graph theory, since MPs can be modeled by using directed graphs and multigraphs.

One of the most important connections concerns the stability theory of differential equations. In fact, the algebraic properties of the MP-matrices are deeply related to the qualitative behavior of the ODE system solutions, which can show asymptotic stability, weak stability or instability, depending on the eigenvalues and their multiplicities.

In all the cases considered here – and in previous works – flux balances played an important role, particularly when applying the Gershgorin circle theorem. However, it should be mentioned that this situation may change if the sum of


incoming fluxes is not equal to the sum of outgoing fluxes. This represents a different type of problem, usually called mixing problems with variable volumes, which involves other procedures not considered in this chapter.

The questions and conjectures presented in this work, as well as the situations not considered here, will provide challenging problems for further research in this area. At the same time, this makes it possible to establish interesting connections between linear algebra and other branches of mathematics like graph theory and differential equations.

ACKNOWLEDGMENTS The author wishes to thank Marjorie Chaves for her valuable contribution to this work.

REFERENCES

[1] Mickley, H. S., Sherwood, T. S. and Reed, C. E. (1975). Applied Mathematics in Chemical Engineering, 2nd Ed. New Delhi: Tata McGraw-Hill Publ. Co. Ltd.

[2] Rice, R. G. and Do, D. D. (2012). Applied Mathematics and Modeling for Chemical Engineers, 2nd Ed. New Jersey: John Wiley & Sons.

[3] Martinez-Luaces, V. (2015). Stability of O.D.E. solutions corresponding to chemical mechanisms based-on unimolecular first order reactions. Mathematical Sciences and Applications E-notes, Vol. 3, No. 2, pp. 58-63.

[4] Martinez-Luaces, V. (2016). Stability of O.D.E. systems associated to first order chemical kinetics mechanisms with and without final products. Konuralp Journal of Mathematics, Vol. 4, No. 1, pp. 80-87.

[5] Martinez-Luaces, V. (2017). Laplace Transform in Chemistry Degrees Mathematics Courses. In: Focus on Mathematics Education Research, Chapter 4. New York: Nova Science Publishers.

[6] Martinez-Luaces, V. (2017). Qualitative Behavior of Concentration Curves in First Order Chemical Kinetics Mechanisms. In: Advances in Chemistry Research, Volume 34, Chapter 7. New York: Nova Science Publishers.

[7] Martinez-Luaces, V. (2015). First Order Chemical Kinetics Matrices and Stability of O.D.E. Systems. In: Advances in Linear Algebra Research, Chapter 10, pp. 325-343. New York: Nova Science Publishers.

[8] Martinez-Luaces, V. (2017). Matrices in Chemical Problems: Characterization, Properties and Consequences about the Stability of ODE Systems. In: Advances in Mathematics Research, Chapter 1, pp. 1-33. New York: Nova Science Publishers.

[9] Martinez-Luaces, V. (2018). Square Matrices Associated to Mixing Problems. In: Matrix Theory: Applications and Theorems, Chapter 3, pp. 41-58. London, UK: IntechOpen Science.

[10] Braun, M. (2013). Differential Equations and Their Applications, 3rd Edition. New York: Springer.

[11] Groestch, C. (1999). Inverse Problems: Activities for Undergraduates. Washington D.C.: Mathematical Association of America.

[12] Gross, J. L.; Yellen, J., eds. (2004). Handbook of Graph Theory. CRC Press.

[13] Trinajstic, N. (2018). Chemical Graph Theory. Routledge.

[14] Bellman, R. (2008). Stability Theory of Differential Equations. Courier Corporation.

[15] Campbell, S. L. and Haberman, R. (2011). Introduction to Differential Equations with Dynamical Systems. Princeton University Press.

[16] Varga, R. S. (2004). Geršgorin and His Circles. Berlin, Germany: Springer-Verlag.

[17] Shores, T. S. (2007). Applied Linear Algebra and Matrix Analysis. Springer Science & Business Media.

In: Hot Topics in Linear Algebra Editor: Ivan I. Kyrchei

ISBN: 978-1-53617-770-1 © 2020 Nova Science Publishers, Inc.

Chapter 8

ENGAGING STUDENTS IN THE LEARNING OF LINEAR ALGEBRA Marta G. Caligaris*, Georgina B. Rodríguez and Lorena F. Laugero Grupo Ingeniería and Educación, Facultad Regional San Nicolás, Universidad Tecnológica Nacional, San Nicolás, Buenos Aires, Argentina

ABSTRACT Learning Linear Algebra is not an easy task for students, as it is not related to any other concepts acquired before. Most of the them are presented as formal definitions of objects whose existence, in most cases, neither has connection with previous knowledge nor geometric or physical arguments. Generally, Linear Algebra is taught from an approach that emphasizes the formalism and the axiomatic structure that characterizes it. This way of teaching usually causes in students confusion with the concepts, called by many authors “the obstacle of formalism”: the tendency of students to manipulate representations mechanically without understanding their meaning, without perceiving relationships between them. With the appropriate teacher’s guide, visual tools can act as facilitators in the learning process of maths, as they enable the use of different semiotic registers *

Corresponding Author’s Email: [email protected] (Corresponding author).

268

Marta G. Caligaris, Georgina B. Rodríguez and Lorena F. Laugero about the same concept. The use of this kind of resources in class promotes an ambient where students can experiment, analyze, discuss, and conjecture about what they are learning. The main objective of this chapter is to present some visual apps, designed in Scilab, for the learning of different topics of Linear Algebra. These apps are farreaching resources that give didactical and pedagogical possibilities.

Keywords: linear algebra, formalism obstacle, semiotic registers, mathematical skills, visual apps

INTRODUCTION

The learning of Linear Algebra (LA) is generally considered a rather difficult task. This results from the fact that most concepts involved are presented as formal definitions of objects that are neither connected with previous knowledge nor linked in any way to geometrical or physical arguments that motivate the given definition [1]. That is the reason why students consider LA a "catalogue of very abstract notions where many are not represented, besides that notions are under a cascade of new words, new symbols and new theorems" [2]. This situation generates in students what many authors, such as Dorier or Sierpinska, call the Obstacle of Formalism. This is the students' tendency to behave as if the formal symbolic representations of the objects of LA were the objects themselves; therefore they manipulate the representations mechanically, without understanding their meaning and seeing no relation between them [3]. According to Sierpinska, the obstacle of formalism is a didactical problem and should be avoided. The task of designing situations for this purpose is not easy, given the complexity of this obstacle [4].

Some authors agree that it is important for students to deal with the different concepts of LA from different semiotic registers in order to achieve their comprehension [5]. Considering that each representation is only partial with regard to what it represents, the interaction between different representations of a mathematical object is absolutely necessary so that students can learn each concept adequately. Instead, other authors consider that the establishment of a system of mathematical skills is what helps students to understand mathematical concepts [6]. Therefore, the drafting of definitions from the analysis of examples or the deduction of

Engaging Students in the Learning of Linear Algebra

269

properties from an inductive reasoning lets students make images of the different concepts. A student gains a concept when they construct the image of that concept, as it is considered that knowing the definition does not guarantee the comprehension of that concept [7]. This is based on the fact that students do not necessarily use the definition when they have to decide if a mathematical object is an example of a concept. Instead, they decide from the image they have made of it in their minds, in other words, the set of all the mental images and properties and processes associated with that concept. In order to help in the learning of LA, the potential given by technological resources was exploited as instruments that let students represent different semiotic registers of the same concept, and the formation of mathematical skills. Some visual applications were developed for the teaching of many concepts. The main purpose of this chapter is to present the visual apps developed for different issues of LA, together with some activities that make use of those applications.

DIFFICULTIES WHEN LEARNING LINEAR ALGEBRA

Regardless of the teaching method used, students face the same difficulties when learning LA. Dorier says that the two principal causes of these difficulties are the nature of LA itself (conceptual difficulties) and the kind of thinking required for the comprehension of LA (cognitive difficulties) [8].

These difficulties, together with the way LA concepts are usually presented, cause students to face the "formalism obstacle." Students develop defense mechanisms: they try to reproduce a formally written speech, similar to the one given in the textbook or in class, but without understanding the symbols and the terminology [8]. In other words, the formalism obstacle appears when students handle formal symbolic representations without understanding them. For example, many students calculate the characteristic polynomial when they only have to check whether a given vector is an eigenvector of a linear transformation. Dorier also says that the difficulties students present in LA are caused not only by formal handling, but also by a lack of knowledge of logic and elementary set theory.


Ways of Thinking in Linear Algebra

Students' thinking modes are a way to typify the reasoning they activate when they have to comprehend certain mathematical objects. Sierpinska developed them specifically for LA concepts [9]. This author distinguishes the coexistence of three types of languages in LA: geometrical language (used, for example, to illustrate the representations and properties of vectors in R2 and R3), arithmetic language (used, for example, to describe operations between matrices or solutions of equations) and algebraic language (used to formalize and symbolize objects like vector spaces or linear transformations). Each one of these languages develops, correspondingly, a way of thinking:

• Synthetic – geometrical
• Analytic – arithmetic
• Analytic – structural

In the analytic-arithmetic way of thinking, mathematical objects are given by relations, operations and procedures with numbers and variables. In the synthetic-geometrical way, mathematical objects are defined by geometrical characteristics of functions, coordinates, vectors, etc., and they are represented by images. In the analytic-structural way, mathematical objects are explained from properties, characteristics, axioms or more general definitions. A way to foster the comprehension of a mathematical object or concept is to promote transitions and interactions between these three types of thinking. In contrast with other theoretical approaches, these ways of thinking are not different stages of the development of algebraic thinking; they are equally necessary when understanding a concept [10].

SEMIOTIC REPRESENTATION REGISTERS

The traditional teaching of mathematical topics is generally characterized by an emphasis on mechanical procedures and the memorization of concepts, definitions and techniques. As Artigue says:


… studies also clearly demonstrate that, in response to the difficulties encountered, traditional teaching, in particular at the university, even when it has other ambitions, tends to center on an algorithmic and algebraic practice… and to essentially assess the skills acquired in this domain. This phenomenon becomes a vicious circle: to achieve acceptable degrees of success, what students do best is evaluated, which in turn is considered by students as the essential, as it is what is being assessed… ([11], p. 97)

Duval, in his Theory of Semiotic Representation Registers [12], says that students should interact with different representation registers for a better apprehension of a mathematical concept. The coordination of different semiotic representation registers turns out to be essential for the conceptual apprehension of an object: it is necessary that the object is not confused with its representations and that it is recognized in each of them; under these conditions, a representation gives access to the represented mathematical object [12]. He also says that semiotic representations are not only a way of expressing mental representations for the purpose of communication; they are essential for the cognitive activity of thinking.

Although it is often accepted that when a person understands or knows an object they can represent it with a symbol or a graphism, this author affirms that there is no noesis (conceptual apprehension of an object) without semiosis (apprehension or production of a semiotic representation) [12]. He also distinguishes three types of cognitive activities linked to semiosis: the formation of an identifiable representation in a register; the processing, i.e., the transformation of one representation into another inside the register where it was created; and the conversion, which implies the transformation of a representation given in one register into another representation in a different register [13].

Dealing with the understanding of mathematical concepts, Duval states that, since each representation is partial with respect to what it represents, the interaction between at least two representation registers of a mathematical object must be considered necessary to achieve the understanding of the concept [12].

CONCEPT OF SKILLS

The term "skill" is defined as the knowledge and ability of a person to do something well and easily.


Many authors working on this issue have established different definitions for this word. Petrovsky [14], for example, considers a skill as the mastery of a complex system of practical and psychic actions needed for the rational regulation of an activity, with the help of a person's knowledge and habits. A skill is the relation between the individual and the object, and its guiding element is the reason, in other words, the need to do something. Brito Fernández [15] states that a skill is the mastery of the techniques of an activity, both practical and cognitive. From the psychological point of view, Álvarez [16] defines a skill as a system of actions and operations to elaborate the information included in the knowledge, leading to the achievement of a specific purpose. Bravo [17], referring to skills training, states that skills are one of the key objectives of the learning and teaching process, and that they are what enables people to perform a particular task, depending on the success of the related abilities. These skills allow students to interact with the concept being dealt with, and help with its understanding and assimilation. Regardless of the definitions given for the term "skill," all authors consider skills as a system of actions that allows the performance of a certain activity based on acquired knowledge and habits [18].

The Development of Mathematical Skills

Skills are developed and enhanced by continuous training, and they are the basis of the lines of action taken when solving practical and theoretical problems. In particular, many authors recognize mathematical skills as those developed during the performance of actions and operations immersed in a mathematical activity. As García Bello, Hernández Gallo and Pérez Delgado say [19], mathematical skills are the way a student performs activities related to a certain mathematical activity, which lets them search for and use concepts, properties, relationships and procedures, make use of strategies, and develop the reasonings and judgements necessary for solving mathematical problems. This concept implies that it is not enough to think about preparing a student to demonstrate a theorem or solve an equation; it also refers to their ability to explain the way of selecting or using a method or procedure, to estimate the characteristics of the desired result in order to check whether what they obtained


satisfies the objective, and to express it in proper language, using different ways of representation.

The development of mathematical skills is a planned process guided by the professor. So that students reach a conscious command of a certain action, the teaching process must be planned and organized considering that the objective of the execution of that action is the development of a certain mathematical skill. The process of developing mathematical skills can be divided into several stages, each one involving different types of cognitive activities in the students [20]. These stages are:

• Guidance and motivation of the implementation: it is necessary to make students aware that the knowledge they have is not sufficient, to make them see the difference between what they know and what they should know. This can be achieved by confronting students with situations they are not able to solve, or can solve only with difficulty. Besides, students need proper guidance, so as to know what to do and how.
• Uptake of the skill: faculty should give students tasks or situations where it is necessary to apply certain skills.
• Mastery of the skill: the main goal of this stage is that students reach a certain level of mastery of the action. To this end, the professor will guide the performance of certain kinds of tasks in order to meet the goals. These tasks should be performed periodically, in many knowledge systems and at different complexity levels. At this moment, students develop their independence, doing the tasks by themselves and using the knowledge they have about why they should do them.
• Systematization of the skill: this stage has as its objective the extension of the action to new situations; it is the moment when students have to be able to relate new contents with others previously acquired.
• Test of the skill: at this stage, the students' achievement of the objectives is assessed. Therefore, faculty should ask students to carry out a task which involves the use of the skill concerned.


It is important to remark that it is essential that students fulfill the tasks in each one of the stages of the process of development of a skill. This reduces the professor's lecturing and centers the process on the student, who plays the main role under the professor's guidance. The classification of the mathematical skills considered in this work is the one given by the revised Bloom's Taxonomy [21]. It helps to understand how students learn, and it lays the foundations at each learning level for the purpose of ensuring meaningful learning and the acquisition of skills that enable the use of the knowledge built.

Bloom's Taxonomy

The learning objective taxonomy, known as Bloom's Taxonomy of Objectives, is a classification of the objectives and skills that students should achieve in the learning and teaching processes [22]. This taxonomy has been revised and enhanced by many authors over the past decades, Churches being the most recent reviewer [21].

Figure 1. Levels in Bloom's Revised Taxonomy.


Table 1. Verbs associated with each level of Bloom's Taxonomy

Skill: Indicator Verbs
Remember: quote, define, search, enumerate, write, memorize, say, indicate, mention.
Understand: determine, conclude, estimate, associate, compare, generalize, distinguish, relate, explain, interpret.
Apply: calculate, complete, execute, use, show, modify, choose, operate, solve, tabulate, do.
Analyze: study, debate, criticize, deduce, differentiate, integrate, classify, associate.
Evaluate: estimate, test, formulate, choose, justify, argue, assess, discuss.
Create: generate, design, elaborate, devise, formulate, construct, invent, modify, develop, produce.

The revised Bloom's Taxonomy distinguishes six levels that students have to climb so that a real learning process takes place. These levels are: remember, understand, apply, analyze, evaluate and create. The first three correspond to lower-order skills, and the others to higher-order skills. Figure 1 describes what is expected at each of these levels [23]. Table 1 presents some indicator verbs of the cognitive process developed by students at each level of the revised Bloom's Taxonomy. Assessing the action or activity performed in terms of these verbs will determine the taxonomy levels fulfilled by students.

VISUAL TOOLS

The use of programs such as Mathematica, Maple, Scilab or Matlab, among others, as didactic tools in the teaching of mathematics requires the handling of code, which draws attention away from the object of study. This particularly happens when studying LA, where both algebraic operations and plots may be required from this kind of software. Even though many interactive tools exist where students do not have to deal with the syntax of programming languages, sometimes these apps do not satisfy the requirements when a teaching sequence is designed. For this reason, the Engineering and Education


Group (GIE) has been working for many years on the design of visual applications for different topics in mathematics [24, 25]. Different options were tested, taking advantage of the possibility of creating customized graphical interfaces that use the computing and graphic power of the software; SCILAB was finally the software chosen for this purpose. The fact that it is free software decisively influenced this choice, as it is available to students on the Internet at no cost.

SCILAB is a high-level programming language for scientific computation, interactive, free and available on multiple operating systems (Unix, GNU/Linux, Windows, Solaris, Alpha), developed by INRIA (Institut National de Recherche en Informatique et Automatique) and the ENPC (École Nationale des Ponts et Chaussées). Both the program and its documentation can be downloaded from www.scilab.org. Even though SCILAB was created for numerical operations, some symbolic calculations can be performed. It includes hundreds of mathematical functions and the possibility of integrating programs written in other well-known languages, like FORTRAN, Java, C and C++.
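As a rough illustration of how such apps can be assembled, the following is a minimal sketch, not the actual code of the GIE apps, of a Scilab window with an input field and a button whose callback computes eigenvalues; the names matrixBox and computeEigs are hypothetical:

    // Minimal sketch of a Scilab visual app (hypothetical names, not the GIE code).
    f = figure("figure_name", "Eigenvalues demo");
    e = uicontrol(f, "style", "edit", "string", "[3 2; 0 1]", ..
                  "position", [20 80 160 25], "tag", "matrixBox");
    function computeEigs()
        h = findobj("tag", "matrixBox");   // retrieve the input field
        A = evstr(h.string);               // parse the matrix typed by the user
        disp(spec(A));                     // display its eigenvalues
    endfunction
    b = uicontrol(f, "style", "pushbutton", "string", "Eigenvalues", ..
                  "position", [20 40 160 25], "callback", "computeEigs()");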

CUSTOMIZED VISUAL TOOLS FOR SELECTED TOPICS

When teaching LA, many topics are difficult to understand because of their abstraction. The use of customized visual apps in the teaching process makes it possible to build activities where not only can different representations of the mathematical object under study be used, but the development of certain mathematical skills can also be achieved. In this sense, it is expected to reduce the formalism obstacle by striking the right balance between the concrete and abstract approaches in the teaching of LA. Some applications related to selected topics in LA were developed by GIE. In particular, tools for the calculation and visualization of eigenvalues and eigenvectors of 2 x 2 matrices, change of basis in ℝ², and linear transformations are presented below. The visual tools presented here are easy to interpret and manipulate. This keeps students concentrated on the concepts and properties that faculty wants to emphasize, without having to deal with code.


Calculating and Visualizing Eigenvalues and Eigenvectors

Consider a square matrix A of size n, a nonzero vector x in ℂⁿ, and a scalar λ in ℂ. Then x is an eigenvector of A with eigenvalue λ if A x = λ x. This equation has nonzero solutions x exactly when det(A − λI) = 0, and det(A − λI) is the characteristic polynomial, whose roots are the eigenvalues of A [26].
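A minimal Scilab sketch of the computations behind this kind of tool (a sanity check of the definition, not the app's actual code):

    // Eigenvalues, eigenvectors and characteristic polynomial of a 2 x 2 matrix.
    A = [3 2; 0 1];
    [V, D] = spec(A);     // columns of V: eigenvectors; diagonal of D: eigenvalues
    p = poly(A, "x");     // characteristic polynomial det(x*I - A)
    disp(roots(p));       // its roots coincide with the eigenvalues in D
    disp(A*V(:,1) - D(1,1)*V(:,1));   // A x = lambda x: result is the zero vector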

Figure 2. Customized visual app "Eigenvalues and eigenvectors of a matrix."

Figure 2 shows the tool designed for calculating the eigenvalues and eigenvectors associated to a real 2 x 2 matrix. The input data of this tool are the coefficients of the matrix whose eigenvalues and eigenvectors will be calculated. Users have to press the Eigenvalues or Eigenvectors buttons to obtain the desired values. Besides, if the


user wants to get the graphic representation of the eigenvectors obtained, the Graph of the Eigenvectors button should be pressed. The Characteristic Polynomial button gives what its name implies: the polynomial whose roots are the eigenvalues of the entered matrix, and its graph. Thus, the user can visually check that the eigenvalues obtained are equal to the roots of the characteristic polynomial. It may happen that the given matrix, even with real coefficients, has eigenvectors with complex components. In this case, as they cannot be plotted on two real axes, the tool shows a message in the corresponding area, as Figure 3 shows. In a previous paper [27], another tool for working on this topic was discussed.

Figure 3. Message indicating that the eigenvectors are complex.

Calculating and Visualizing Coordinates of Points in Two 2D Bases

Consider a vector space V. A subset S ⊆ V is a basis of V if it is linearly independent and spans V [26]. The coordinates of a vector v ∈ V in a basis B are the coefficients required in the linear combination of the elements of B to obtain v. Obviously, if the basis changes, the coordinates change. Figure 4 shows a customized tool designed to calculate the coordinates of a point, given in the canonic basis, with respect to another basis, in this case A = {(1; 1); (−1; 1)}. This new basis is fixed; it cannot be changed.

Figure 4. Customized Tool “Change of basis in the plane.”

The input data in this tool are the coordinates of a point of ℝ² in the canonic basis. With the idea of giving clear, easy-to-visualize graphics, the coordinates of the given points must belong to the interval [-3, 3]. The New coordinates button should be pressed to obtain the new coordinates.
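A minimal sketch of the computation behind such a button (our reconstruction, assuming the fixed basis A above): the new coordinates c of a point v solve M c = v, where the columns of M are the basis vectors.

    // Coordinates of a canonic-basis point in the basis A = {(1,1), (-1,1)}.
    M = [1 -1; 1 1];   // basis vectors as columns
    v = [2; 1];        // point given in the canonic basis
    c = M \ v;         // coordinates in the new basis: c = [1.5; -0.5]
    disp(c);
    disp(M*c);         // reconstructs v: same point, different coordinates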


Visualizing Linear Transformations in the Plane

Given two vector spaces U and V, a linear transformation T: U → V is a function that carries elements of U (called the domain) to V (called the codomain), with the properties of linearity: T(u1 + u2) = T(u1) + T(u2) for all u1, u2 ∈ U, and T(α u) = α T(u) for all u ∈ U and all α ∈ ℂ [26].
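These two properties can be checked numerically for any matrix-defined transformation of the plane; a small Scilab sketch with an arbitrarily chosen rotation matrix:

    // Linearity of T(x,y) = (-y, x), written through its matrix.
    T = [0 -1; 1 0];
    u1 = [1; 2]; u2 = [-3; 0.5]; a = 4;
    disp(T*(u1 + u2) - (T*u1 + T*u2));   // additivity: the zero vector
    disp(T*(a*u1) - a*(T*u1));           // homogeneity: the zero vector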

Figure 5. Visual tool “Linear Transformations on the plane.”

Figure 5 shows the tool designed for interpreting linear transformations on the plane. In this case, the input data are the law of the transformation or the associated matrix (after selecting the corresponding option button), and the geometrical object to be transformed. Different symbols and colors in the geometrical objects let users appreciate the action of the transformation.


Suppose T: U → V is a linear transformation. Then the kernel of T is the subset of the domain U given by Nu(T) = {u ∈ U | T(u) = 0} [26]. That is to say, the kernel of a linear transformation contains all the preimages of the null vector of the codomain. The kernel of a linear transformation is a subspace of the domain. If the line being transformed is the kernel, its image will be a point, as Figure 6 shows. In a previous work [28], other tools to analyze linear transformations were discussed.
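A sketch of how the kernel can be computed numerically in Scilab (the built-in kernel() returns a basis of the null space of a matrix; the matrices are illustrative):

    // Kernel of two plane transformations given by their matrices.
    A = [1 2; 2 4];    // rank 1: the kernel is a line through the origin
    disp(kernel(A));   // one basis vector of that line
    B = [1 0; 2 1];    // invertible: the kernel reduces to the null vector
    disp(kernel(B));   // empty matrix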

Figure 6. Visual tool “Linear Transformations on the plane.” Image of a line.

USE OF THE DESIGNED TOOLS IN THE CLASSROOM

Usually, students learn LA concepts by applying procedures routinely, without articulating their different representations, with disappointing learning outcomes. Considering the potential of the designed apps as tools to ease visualization, exploration, manipulation and calculation, some activities were


designed to be performed by students using these resources, related to the topics of eigenvalues and eigenvectors, change of basis, and linear transformations. These activities were conceived to integrate the new knowledge into the students' cognitive structures in a meaningful way and to develop mathematical skills. In these activities, by applying an inductive approach and considering the questions stated in each situation, students will be able to deduce some concepts and properties. But it is worth remarking that showing examples that fulfill a property is not enough to demonstrate it; closure must be given with a formal proof.

Activity 1

Consider the following matrices:

A1 = (3 2; 0 1)    B1 = (3 0; 2 1)
A2 = (−1 0; 5 2)    B2 = (−1 5; 0 2)

a) Calculate the eigenvalues and eigenvectors of the given matrices. How are the eigenvalues and eigenvectors of the pairs of matrices A1, B1 and A2, B2 related?
b) What is the relation between the matrices of each pair?
c) What kind of matrices are the given ones?
d) Can you draw any conclusion considering the previous answers?
e) How can you demonstrate them?

After performing this activity, students should realize that the eigenvalues of a matrix and its transpose are the same, but this does not happen with the eigenvectors, and that the eigenvalues of a triangular matrix are the elements of its diagonal.
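The expected conclusions can be checked numerically; a quick Scilab sketch with the first pair:

    // Activity 1: a matrix and its transpose share eigenvalues, not eigenvectors.
    A1 = [3 2; 0 1];  B1 = A1';
    disp(spec(A1));   // the diagonal entries 3 and 1 of the triangular matrix A1
    disp(spec(B1));   // the same eigenvalues
    [VA, DA] = spec(A1); [VB, DB] = spec(B1);
    disp(VA); disp(VB);   // the eigenvectors, in general, differ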

Engaging Students in the Learning of Linear Algebra

283

Activity 2

Given the following matrices:

A1 = (1 4; 2 3)    B1 = (−0.6 0.8; 0.4 −0.2)    C1 = (0.5 2; 1 1.5)
A2 = (−1 0.5; 6 1)    B2 = (−0.25 0.125; 1.5 0.25)    C2 = (2 −1; −12 −2)

a) What is the result of A1 · B1 and of A2 · B2?
b) Can you find any relation between A1 and C1? And between A2 and C2?
c) Calculate the eigenvalues and eigenvectors of each of the given matrices.
d) Can you draw any conclusion considering the previous answers?

After performing this activity, students should deduce the following properties:

• If the eigenvalues of the matrix A are λ1, λ2, …, λn, then the eigenvalues of the matrix α·A are α·λ1, α·λ2, …, α·λn.
• If the eigenvalues of the matrix A are λ1, λ2, …, λn, then the eigenvalues of the matrix A⁻¹ are 1/λ1, 1/λ2, …, 1/λn.
• The eigenvectors of A and A⁻¹ are equal.
• The eigenvectors of A and α·A are equal.
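These properties can be verified numerically; a short Scilab sketch with the first matrix:

    // Activity 2: eigenvalues under scaling and inversion.
    A = [1 4; 2 3];
    disp(spec(A));        // eigenvalues of A: -1 and 5
    disp(spec(0.5*A));    // scaled by 0.5: -0.5 and 2.5
    disp(spec(inv(A)));   // reciprocals: -1 and 0.2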

Activity 3

Given the following matrices:

A1 = (−2 −1; −4 1)    B1 = (2 −1; 1 4)

a) Find the eigenvalues and eigenvectors of each matrix.
b) How many eigenvalues does each matrix have? And how many eigenvectors?
c) Study the linear dependence of the eigenvectors of each matrix.
d) Are you able to state a property based on the results obtained?

Considering the obtained results, students should realize that eigenvectors of a matrix associated with different eigenvalues are linearly independent.
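A Scilab sketch of the contrast the activity aims at (distinct versus repeated eigenvalues):

    // Activity 3: independence of eigenvectors.
    A1 = [-2 -1; -4 1];  B1 = [2 -1; 1 4];
    [VA, DA] = spec(A1);
    disp(diag(DA)); disp(VA);   // distinct eigenvalues -3 and 2:
                                // two linearly independent eigenvectors
    [VB, DB] = spec(B1);
    disp(diag(DB)); disp(VB);   // double eigenvalue 3: the columns of VB
                                // span essentially one direction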

Activity 4

Given the points P(2, 1) and Q(−1, −2) in the canonical basis, find their coordinates with respect to the basis A = {(1, 1), (−1, 1)}. Observing the plot in the application, analyze the points' position in the plane after the change of basis.

After the analysis, students should realize that the points do not change their position on the plane. A change of basis implies a change of the reference system; therefore the coordinates are different, but not the position.

Activity 5

Consider the following linear transformations:

T1: ℝ² → ℝ² / T1(x, y) = (−y, x)
T2: ℝ² → ℝ² / T2(x, y) = (2x, 2y)
T3: ℝ² → ℝ² / T3(x, y) = (2 0; 0 2)(x; y)
T4: ℝ² → ℝ² / T4(x, y) = (0 −1; 1 0)(x; y)

a) Apply each one of these transformations to the unit circumference centered at (0, 0). What can you say about the geometry of the image in each case?
b) Can you find any relation between these linear transformations?
c) How can you write a linear transformation in its matrix form if the associated matrix is expressed in the canonical basis?


While performing this activity, students will discover that T1 and T4 have the same effect on the selected points (a 90° counter-clockwise rotation). The same happens with T2 and T3, as both produce an expansion with coefficient 2. Therefore, students should conclude that a linear transformation can be expressed by different representations. Although calculating the matrix of a linear transformation in the canonical basis is one of the simplest examples, this is a favorable situation to explain when a linear transformation may be expressed in this way. It will be demonstrated in class that if V and W are vector spaces of dimensions n and m respectively (dim V = n and dim W = m), the linear transformation T: V → W admits a matrix representation, as V is isomorphic to ℝⁿ and W is isomorphic to ℝᵐ.
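The coincidence of the two representations can be checked numerically; a Scilab sketch sampling the unit circumference:

    // Activity 5: the law T1(x,y) = (-y, x) and the matrix of T4 coincide.
    t = linspace(0, 2*%pi, 8);
    P = [cos(t); sin(t)];            // sample points of the unit circumference
    T4 = [0 -1; 1 0];
    Q_law = [-P(2,:); P(1,:)];       // images under the law of T1
    Q_mat = T4 * P;                  // images under the matrix of T4
    disp(max(abs(Q_law - Q_mat)));   // 0: both representations act identically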

Activity 6

Consider the following linear transformations:

T1: ℝ² → ℝ² / T1(x, y) = ((x + y)/2, (x + y)/2)
T2: ℝ² → ℝ² / T2(x, y) = (1 0; 0 −1)(x; y)

a) Apply the given linear transformations to a line passing through the origin, given by y = kx, where k ∈ ℝ. What can you say about the image obtained in each case?
b) Observe the graphic results given by the tool, and answer:
• Which is the image of the null vector in each case?
• If v is a vector of the domain, what is the relation between T(−v) and −T(v) in each case?
c) Can you state any conclusion?

After analyzing the graphics obtained in each example, students should say that T1 projects each point of the plane onto the line that bisects the first and third quadrants, while T2 produces a reflection with respect to the x-axis. They can also say that for every linear transformation T: V → W:

• The image of the null vector of the domain V is the null vector of the codomain W. In symbols: T(0V) = 0W.
• The image of the opposite of a vector in V is the opposite of the image of that vector. In symbols: T(−v) = −T(v).

Activity 7

Given the following linear transformations:

T1: ℝ² → ℝ² / T1(x, y) = (x + 2y, 2x + 4y)
T2: ℝ² → ℝ² / T2(x, y) = (1 0; 2 1)(x; y)

a) Apply the given linear transformations to a line passing through the origin, given by y = kx, where k ∈ ℝ. What can you say, by watching the graph given by the tool, about the elements of the domain whose image is the null vector of the codomain?
b) Check your intuition against the results obtained by applying the formulae, and indicate the graphical interpretation.
c) Are the sets obtained as solutions subspaces of the domain of each transformation?

When doing this activity, students will obtain the kernel of each linear transformation without knowing it yet; this is a proper situation to introduce the concept. Recalling both the graphic and algebraic registers, they should complete a table like Table 2. Considering the obtained results, students will realize that in both cases the kernel is a subspace of the domain of the linear transformation. This does not happen only in these particular cases: it is a property of every linear transformation on any vector space.


Table 2. Kernel of a linear transformation and its graphical interpretation

Linear Transformation: T1(x, y) = (x + 2y; 2x + 4y)
Kernel: Nu(T1) = {(x; y) ∈ ℝ² / y = −0.5x}
Graphical interpretation: a line that passes through O(0, 0)

Linear Transformation: T2(x, y) = (1 0; 2 1)(x; y)
Kernel: Nu(T2) = {(x; y) ∈ ℝ² / x = y = 0}
Graphical interpretation: the origin of coordinates

CONCLUSION

The use of visual tools like the ones presented here during the teaching and learning of LA makes students compare, estimate, experiment, analyze, explain, test, justify… all of them actions of the revised Bloom's Taxonomy. Taking into account that the development of a system of mathematical skills helps students to understand mathematical concepts, these visual apps are a proper tool to achieve this objective. By using them, students save the time spent on manual calculations and can use it for activities that call on mathematical skills of higher order according to Bloom's Taxonomy, which are not usually developed in the traditional teaching of mathematics.

Moreover, as Williamson and Kaput say [29], an important consequence of the use of technological resources in mathematics education is that they help students develop an inductive way of thinking. The students' interaction with technology produces a favourable space where they can discover mathematical relationships or properties from the observation of the repetition of certain results. It also helps students to understand different concepts, as they can deal with different semiotic representations in a coordinated manner, without contradictions.

Nevertheless, the mere presence of visual tools does not by itself ease the comprehension of the concepts involved in LA or reduce the formalism obstacle. Professors, through their timely intervention and the design of situations, are the ones who foster the meeting between students and the resource so that knowledge arises [30]. Therefore, the use of visual tools like the ones designed


should contribute to developing certain mathematical skills and to constructing different semiotic representations of the objects, since the obstacle of formalism is merely didactical.

REFERENCES

[1] Costa, V., & Vacchino, M. (2007). La enseñanza y aprendizaje del Álgebra Lineal en la Facultad de Ingeniería, UNLP. Actas del XXI Congreso Chileno de Educación en Ingeniería. Universidad de Chile. [Teaching and learning of Linear Algebra in the Faculty of Engineering, UNLP. Proceedings of the XXI Chilean Congress of Engineering Education. University of Chile.]
[2] Dorier, J. L., Robert, A., Robinet, J., & Rogalski, M. (1997). L'Algèbre Linéaire: L'Obstacle du Formalisme à Travers Diverses Recherches de 1987 à 1995. In J.-L. Dorier (Ed.), L'enseignement de l'algèbre linéaire en question (pp. 105-147). Grenoble, France: La Pensée Sauvage Éditions. [Linear algebra: the obstacle of formalism through various researches from 1987 to 1995. In J.-L. Dorier (Ed.), The teaching of linear algebra in question (pp. 105-147).]
[3] Uicab, R., & Oktaç, A. (2006). Transformaciones lineales en un ambiente de geometría dinámica. Revista Latinoamericana de Investigación en Matemática Educativa, 9(3), 459-490. [Linear transformations in a dynamic geometry environment. Latin American Journal of Research in Educational Mathematics, 9(3), 459-490.]
[4] Sierpinska, A., & Dreyfus, T. (1999). Evaluation of a Teaching Design in Linear Algebra: The Case of Linear Transformations. Recherches en Didactique des Mathématiques, 19(1), 7-40. [Research in Mathematical Didactics, 19(1), 7-40.]
[5] Rubio, B. (2013). La enseñanza de la visualización en álgebra lineal: el caso de los espacios vectoriales cociente (doctoral thesis). Universidad Complutense de Madrid, Madrid, España. [The teaching of visualization in linear algebra: the case of quotient vector spaces. Complutense University of Madrid, Madrid, Spain.]
[6] Caligaris, M., Rodríguez, G., Favieri, A., & Laugero, L. (2017). Uso de objetos de aprendizaje para el desarrollo de habilidades matemáticas. Actas del XX Encuentro Nacional, XII Internacional de Educación Matemática en Carreras de Ingeniería, pp. 623-631. [Use of learning objects for the development of mathematical skills. Proceedings of the XX National, XII International, Meeting on Mathematics Education in Engineering Careers, pp. 623-631.]
[7] Vinner, S. (1991). The role of definitions in the teaching and learning of mathematics. In D. Tall (Ed.), Advanced Mathematical Thinking (pp. 65-81). Dordrecht: Kluwer Academic Publishers.
[8] Hurman, L. (2007). El papel de las aplicaciones en el proceso de enseñanza-aprendizaje del Álgebra Lineal. Enseñanza del Álgebra. Colección Digital Eudoxus, Nº 3. [The role of applications in the teaching-learning process of Linear Algebra. Algebra Teaching. Eudoxus Digital Collection, Nº 3.]
[9] Sierpinska, A. (2000). On Some Aspects of Students' Thinking in Linear Algebra. In J. L. Dorier (Ed.), On the Teaching of Linear Algebra (pp. 209-246). Dordrecht: Springer.
[10] Parraguez, M., & Bozt, J. (2012). Conexiones entre los conceptos de dependencia e independencia lineal de vectores y el de solución de sistemas de ecuaciones lineales en R² y R³ desde el punto de vista de los modos de pensamiento. Revista electrónica de investigación en educación en ciencias, 7(1), 49-72. [Connections between the concepts of linear dependence and independence of vectors and the solution of systems of linear equations in R² and R³ from the point of view of modes of thought. Electronic Journal of Research in Science Education, 7(1), 49-72.]
[11] Artigue, M. (1995). La enseñanza de los principios del cálculo: problemas epistemológicos, cognitivos y didácticos. In P. Gómez (Ed.), Ingeniería didáctica en educación matemática (pp. 97-140). México: Grupo Editorial Iberoamérica. [Teaching the principles of calculus: epistemological, cognitive and didactic problems. In P. Gómez (Ed.), Didactic Engineering in Mathematics Education (pp. 97-140). Mexico: Iberoamerica Publishing Group.]
[12] Duval, R. (1998). Registros de representación semiótica y funcionamiento cognitivo del pensamiento. In F. Hitt (Ed.), Investigaciones en Matemática Educativa II (pp. 173-201). México: Cinvestav. [Registers of semiotic representation and cognitive functioning of thought. In F. Hitt (Ed.), Research in Educational Mathematics II (pp. 173-201). Mexico: Cinvestav.]
[13] Duval, R. (2006). Un tema crucial en la educación matemática: La habilidad para cambiar el registro de representación. La Gaceta de la Real Sociedad Matemática Española, 9(1), 143-168. [A crucial issue in mathematics education: the ability to change the register of representation. The Gazette of the Royal Spanish Mathematical Society, 9(1), 143-168.]
[14] Petrovsky, A. (1985). Psicología General. Moscú: Editorial Progreso. [General Psychology. Moscow: Progress Publishers.]
[15] Brito Fernández, H. (1987). Psicología general para los ISP. La Habana, Cuba: Pueblo y Educación. [General Psychology for Higher Pedagogical Institutes. Havana, Cuba: People and Education.]
[16] Álvarez de Zayas, C. (1999). La escuela en la vida. La Habana, Cuba: Pueblo y Educación. [The school in life. Havana, Cuba: People and Education.]
[17] Bravo Estévez, M. (2002). Una estrategia didáctica para la enseñanza de las demostraciones geométricas (doctoral thesis). Universidad de Oviedo, Oviedo, España. [A didactic strategy for the teaching of geometric proofs. University of Oviedo, Oviedo, Spain.]
[18] Rodríguez Rebustillo, M., & Bermúdez Sarguera, R. (1993). Algunas consideraciones acerca del estudio de las habilidades. Revista cubana de Psicología, 10(1), 27-32. [Some considerations about the study of skills. Cuban Journal of Psychology, 10(1), 27-32.]
[19] García Bello, B., Hernández Gallo, T., & Pérez Delgado, E. (2010). The process of formation of mathematical skills. Retrieved from https://es.scribd.com/document/360870457/Proceso-Formacion-Habilidades-Matematicas.
[20] Machado Ramírez, E., & Montes de Oca Recio, N. (2009). El desarrollo de habilidades investigativas en la educación superior: un acercamiento para su desarrollo. Revista Humanidades Médicas, 9(1). [The development of research skills in higher education: an approach to its development. Medical Humanities Magazine, 9(1).]
[21] Churches, A. (2008). Bloom's taxonomy for the digital age. Eduteka. http://eduteka.icesi.edu.co/articulos/TaxonomiaBloomDigital.
[22] Bloom, B., Engelhart, M., Furst, E., Hill, W., & Krathwohl, D. (1956). Taxonomy of Educational Objectives. The Classification of Educational Goals. Handbook 1: Cognitive Domain. New York, United States of America: Longmans.
[23] Anderson, L. W., & Krathwohl, D. R. (2001). A Taxonomy for Learning, Teaching, and Assessing, Abridged Edition. Boston, MA, United States of America: Allyn and Bacon.
[24] Caligaris, M., Rodríguez, G., & Laugero, L. (2013). Learning Objects for Numerical Analysis Courses. Procedia - Social and Behavioral Sciences, 106, 1778-1785.
[25] Caligaris, M., Rodríguez, G., & Laugero, L. (2014). A Numerical Analysis Lab: Solving Systems of Linear Equations. Procedia - Social and Behavioral Sciences, 131, 160-165.
[26] Beezer, R. (2010). A First Course in Linear Algebra. Department of Mathematics and Computer Science, University of Puget Sound, Washington. Retrieved from http://linear.ups.edu. Last visited: July 2019.
[27] Caligaris, M., Rodríguez, G., & Laugero, L. (2011). Laboratorio de Álgebra Lineal. Autovalores y autovectores. Actas del XVI Encuentro Nacional, VIII Internacional de Educación Matemática en Carreras de Ingeniería. [Linear Algebra Lab. Eigenvalues and Eigenvectors. Proceedings of the XVI National, VIII International, Meeting on Mathematics Education in Engineering Careers.]
[28] Caligaris, M., Rodríguez, G., & Laugero, L. (2009). El papel de los registros semióticos en el aprendizaje de las transformaciones lineales. Actas del VI Congreso Internacional de Enseñanza de la Matemática Asistida por Computadora. [The role of semiotic registers in learning linear transformations. Proceedings of the VI International Congress of Computer Aided Mathematics Teaching.]
[29] Williamson, S., & Kaput, J. (1999). Mathematics and virtual culture: an evolutionary perspective on technology and mathematics education. Journal of Mathematical Behavior, 17(2), 265-281.
[30] Sadovsky, P. (2005). La Teoría de Situaciones Didácticas: un marco para pensar y actuar la enseñanza de la matemática. In H. Alagia, A. Bressan, & P. Sadovsky (Eds.), Reflexiones teóricas para la educación matemática (pp. 13-65). Buenos Aires, Argentina: Libros del Zorzal. [The Theory of Didactic Situations: a framework for thinking and enacting the teaching of mathematics. In H. Alagia, A. Bressan, & P. Sadovsky (Eds.), Theoretical Reflections for Mathematics Education (pp. 13-65). Buenos Aires, Argentina: Zorzal Books.]

ABOUT THE EDITOR

Ivan I. Kyrchei, PhD
Senior Research Fellow, Associate Professor
Research Fellow of Pidstryhach Institute for Applied Problems of Mechanics and Mathematics of National Academy of Sciences of Ukraine, Ukraine
Email: [email protected]; [email protected]

Ivan Kyrchei was born in 1964 in Lviv region, Ukraine. He received his PhD (Candidate of Science) degree in 2008 from Taras Shevchenko National University of Kyiv. He is now working as Senior Researcher of Pidstryhach Institute for Applied Problems of Mechanics and Mathematics of National Academy of Sciences of Ukraine. He was awarded the title of Senior Research Fellow (Algebra and the Theory of Numbers) by the Ministry of Education and Science of Ukraine, equivalent to Associate Professor. His research interests are mostly in Algebra, Linear Algebra and their applications. His papers have been published in well-known professional journals and edited books. He also serves as an Editorial Board Member and reviewer for several journals.

INDEX

A
activation state variables matrix, 10, 15, 16
associated MP-matrix, 240, 245, 247
asymptotic stability, 234, 264

B
behaviors, 234

C
C++, 276
CGNE algorithm, 111, 112, 127
CGNR algorithm, 112, 113
chemical kinetics, 264
cognitive activity, 271
cognitive function, 290
cognitive process, 275
column determinant, 45, 49, 50, 51, 52, 54, 143, 144
column rank, 5, 12, 13
compartment analysis, 234
complete bipartite graph, 241, 246
complete graph, 241
complex eigenvalues, 261, 263
complex numbers, 259
complexity, 268, 273
computer algebra, 1, 2, 4, 21, 29, 37, 38, 40, 42
conjugate transpose, 2, 46, 50, 85, 138
cycle, 51, 241, 242, 260

D
defense mechanisms, 269
determinantal rank, 54, 56
determinantal representation(s), ix, 45, 49, 50, 54, 55, 56, 57, 58, 59, 60, 61, 62, 66, 67, 68, 70, 73, 76, 79, 80, 81, 82, 83, 86, 87, 91, 93, 94, 95, 99, 106, 107, 138, 140, 143, 144, 145, 153, 158, 160
differential equations, 25, 26, 27, 37, 38, 233, 234, 235, 264
directed graph, ix, 233, 234, 235, 237, 239, 241, 243, 245, 247, 249, 250, 251, 252, 253, 255, 257, 259, 260, 261, 263, 264
directed multigraph, 236, 238
discs, 255, 258, 259, 260
Drazin inverse, 2, 3, 4, 5, 19, 32, 39, 41, 43, 44, 49, 106, 107
dynamic system, 18
dynamical systems, viii, 1, 4, 12, 21, 22, 26, 37, 38, 40, 44, 265

E
eigenvalues of the matrix, 18, 278, 283
equilibrium state, 9, 10, 11

F
facilitators, 267
flux balance, 252, 254, 257, 259, 261, 264
formalism obstacle, 268, 269

formation, 269, 271, 290
Frobenius norm, 6, 15, 16, 47, 112, 125, 126
full binary tree, 241, 247, 248

G
gain parameter, 12
general coupled matrix equations, 111, 112, 113, 124, 130, 134
generalized bisymmetric solution, vii, 111, 112, 113, 114, 115, 117, 119, 121, 123, 124, 125, 126, 127, 129, 130, 131, 133, 135
generalized inverses, 1, 2, 3, 4, 5, 7, 9, 11, 12, 13, 15, 17, 19, 21, 23, 25, 26, 27, 29, 31, 33, 35, 37, 38, 39, 40, 41, 43, 46, 49, 137, 143, 153
Gershgorin closed disc, 256
Gradient Neural Networks (GNN), vii, 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 23, 24, 25, 26, 27, 30, 32, 37, 43
graphs theory, 234, 235, 264
group inverse, 2, 3, 36

H
Hermitian matrix, 48, 53, 54, 144
Hermitian solution, 46, 47, 48, 49, 50, 85, 86, 98, 99, 100, 102, 106, 107
HS version of BiCR algorithm, 112, 113, 114, 130

I
idempotent, 61, 150
input tank, 235, 255, 256, 260
integral representation, 13, 14
internal tank, 235
involution, 48

J
Java, 276

L
left inverse, 3, 13
left linear combination, 52, 54
linear algebra, vii, viii, 1, 39, 40, 41, 42, 45, 103, 104, 105, 106, 109, 111, 132, 133, 137, 156, 157, 158, 159, 160, 163, 164, 167, 179, 201, 205, 209, 231, 233, 234, 235, 264, 265, 267, 268, 269, 270, 271, 273, 275, 277, 279, 281, 283, 285, 287, 288, 289, 291, 294
linear dependence, 284
linear ODE system, 237
linear systems, 264
lower matrix, 242
Lyapunov function, 8
Lyapunov stability, 9

M
mass balance, 237, 238
Mathematica, 1, 2, 26, 27, 28, 29, 31, 32, 37, 44, 103, 109, 160, 275
mathematical skills, 268, 269, 272, 273, 274, 282, 287, 288, 289, 290
Matlab program, 1, 25, 26, 37
Matlab Simulink, 1, 25, 26, 37
matrix, vii, viii, ix, 234, 239, 240, 242, 243, 244, 245, 246, 247, 248, 249, 251, 252, 253, 254, 255, 256, 257, 258, 259, 261, 263, 277, 278, 280, 282, 283, 284, 285
matrix algebra, vii, viii
matrix inverse, 5
mixing problems, 233, 234, 264
models, vii, viii, 234
Moore-Penrose inverse, ix
MP-matrix, 239, 240, 242, 243, 244, 245, 246, 247, 248, 249, 251, 252, 253, 254, 255, 256, 257, 258, 259, 261, 263
MP-matrix eigenvalues, 252, 256, 258, 261

N
noncommutative determinant, 45, 50, 51
nonstandard involution, 48, 105
numerical algorithms, 3, 8, 135

O
open system, 263
operations, 270, 272, 275, 276
ordinary differential equations, 233
orthogonal projectors, 59, 145
Outer generalized inverse, 3
outer inverse, 2, 5, 14, 15, 19, 22, 26, 27, 28, 29, 34, 37, 39, 40, 44, 46
output tank, 235

P
planar graph, 241, 244, 246
principal minor, 54, 55, 144
principal submatrix, 54
programming, 275, 276
projection matrix, 58, 68, 145

Q
quaternion skew field, 46, 47, 48, 50, 106, 107, 108, 138, 159

R
rank, 2, 5, 13, 14, 16, 24, 35, 36, 39, 42, 44, 46, 54, 55, 56, 61, 62, 70, 76, 86, 87, 98, 99, 138, 145, 153, 206, 207, 211, 212, 213, 214, 215, 216, 217
reflexive inverse, 46
right inverse, 3
right linear combination, 52
row determinant, viii, 45, 49, 51, 106, 143, 160
row rank, 12, 13, 54

S
semiotic registers, 267, 268, 269, 292
semi-scalar equivalence, ix
set theory, 269
similarity of matrices, 206, 231
Simulink, 15, 16, 17, 18, 26, 29, 33
software, 275, 276
solutions qualitative behavior, 240
stability, 13, 17, 18, 43, 234, 240, 261, 262, 264
state matrix, 8, 10, 12, 18, 20, 22, 24
strictly diagonally dominant, 255
structure, 240, 241, 267
Sylvester matrix equations, viii, 47, 103, 104, 112, 113, 131, 133, 134, 139, 156, 157, 158

T
tanks, 235, 237, 238
taxonomy, 274, 275, 291
technology, 287, 292
theoretical approaches, 270
transformations, 270, 276, 280, 281, 282, 284, 285, 286, 288, 292
tridimensional cube, 244

V
vector, vii, 269, 270, 277, 278, 280, 281, 285, 286, 287, 289
visual apps, ix, 268, 269, 276, 287

Z
Zhang Neural Networks (ZNN), 5, 7, 38, 43