Algebraic and Differential Methods for Nonlinear Control Theory: Elements of Commutative Algebra and Algebraic Geometry [1st ed.] 978-3-030-12024-5, 978-3-030-12025-2

This book is a short primer in engineering mathematics with a view on applications in nonlinear control theory. In parti

460 41 3MB

English Pages XIV, 196 [201] Year 2019

Report DMCA / Copyright

DOWNLOAD FILE

Polecaj historie

Algebraic and Differential Methods for Nonlinear Control Theory: Elements of Commutative Algebra and Algebraic Geometry [1st ed.]
 978-3-030-12024-5, 978-3-030-12025-2

Table of contents :
Front Matter ....Pages i-xiv
Mathematical Background (Rafael Martínez-Guerra, Oscar Martínez-Fuentes, Juan Javier Montesinos-García)....Pages 1-17
Group Theory (Rafael Martínez-Guerra, Oscar Martínez-Fuentes, Juan Javier Montesinos-García)....Pages 19-30
Rings (Rafael Martínez-Guerra, Oscar Martínez-Fuentes, Juan Javier Montesinos-García)....Pages 31-47
Matrices and Linear Equations Systems (Rafael Martínez-Guerra, Oscar Martínez-Fuentes, Juan Javier Montesinos-García)....Pages 49-64
Permutations and Determinants (Rafael Martínez-Guerra, Oscar Martínez-Fuentes, Juan Javier Montesinos-García)....Pages 65-73
Vector and Euclidean Spaces (Rafael Martínez-Guerra, Oscar Martínez-Fuentes, Juan Javier Montesinos-García)....Pages 75-91
Linear Transformations (Rafael Martínez-Guerra, Oscar Martínez-Fuentes, Juan Javier Montesinos-García)....Pages 93-107
Matrix Diagonalization and Jordan Canonical Form (Rafael Martínez-Guerra, Oscar Martínez-Fuentes, Juan Javier Montesinos-García)....Pages 109-123
Differential Equations (Rafael Martínez-Guerra, Oscar Martínez-Fuentes, Juan Javier Montesinos-García)....Pages 125-161
Differential Algebra for Nonlinear Control Theory (Rafael Martínez-Guerra, Oscar Martínez-Fuentes, Juan Javier Montesinos-García)....Pages 163-182
Back Matter ....Pages 183-196

Citation preview

Mathematical and Analytical Techniques with Applications to Engineering

Rafael Martínez-Guerra Oscar Martínez-Fuentes Juan Javier Montesinos-García

Algebraic and Differential Methods for Nonlinear Control Theory Elements of Commutative Algebra and Algebraic Geometry

Mathematical and Analytical Techniques with Applications to Engineering

The importance of mathematics in the study of problems arising from the real world, and the increasing success with which it has been used to model situations ranging from the purely deterministic to the stochastic, in all areas of today’s Physical Sciences and Engineering, is well established. The progress in applicable mathematics has been brought about by the extension and development of many important analytical approaches and techniques, in areas both old and new, frequently aided by the use of computers without which the solution of realistic problems in modern Physical Sciences and Engineering would otherwise have been impossible. The purpose of the series is to make available authoritative, up to date, and self-contained accounts of some of the most important and useful of these analytical approaches and techniques. Each volume in the series will provide a detailed introduction to a specific subject area of current importance, and then will go beyond this by reviewing recent contributions, thereby serving as a valuable reference source.

More information about this series at http://www.springer.com/series/7311

Rafael Martínez-Guerra Oscar Martínez-Fuentes Juan Javier Montesinos-García •



Algebraic and Differential Methods for Nonlinear Control Theory Elements of Commutative Algebra and Algebraic Geometry

123

Rafael Martínez-Guerra Departamento de Control Automático CINVESTAV-IPN Mexico City, Distrito Federal, Mexico

Oscar Martínez-Fuentes Departamento de Control Automático CINVESTAV-IPN Mexico City, Distrito Federal, Mexico

Juan Javier Montesinos-García Departamento de Control Automático CINVESTAV-IPN Mexico City, Distrito Federal, Mexico

ISSN 1559-7458 ISSN 1559-7466 (electronic) Mathematical and Analytical Techniques with Applications to Engineering ISBN 978-3-030-12024-5 ISBN 978-3-030-12025-2 (eBook) https://doi.org/10.1007/978-3-030-12025-2 Library of Congress Control Number: 2018968389 © Springer Nature Switzerland AG 2019 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

This book is written so the reader with little knowledge on mathematics and without experience in control theory can start from the beginning and go directly through the logical evolution of the topics. If the reader has advanced knowledge on mathematics, is fluent on abstract algebra, linear algebra, etc., will be able to read this book a little faster, even so, the reader must study the mathematics sections of the book, to make sure he is familiar with any special notation that we have presented. If the reader only knows about mathematics but has no experience in natural sciences or engineering, he must not be discouraged for the occasional use of examples and vocabulary of other fields (physical models or others). If the reader knows a little about control theory, maybe from previous courses of engineering or physics, it is possible that he will have to work slightly more at the beginning to get used to the ideas and notations in modern mathematics. When the reader is in a section that contains some mathematics that are new to him, it is required of the reader to study the examples, to do many exercises and to learn the new subject as if he has taken a course of it. At the end of the book, the reader will know not only control theory but also a little about useful mathematics. Whatever the previous experience the reader has, he will find in the book some topics that are needed to be learned for this particular area. As well, the book shows in a simple way some elementary concepts of mathematics such as basic algebraic structures (sets, groups, rings, fields, differential rings, differential fields, differential field extensions, etc., that is to say elements of commutative algebra and algebraic geometry in general) for the introduction to the theory of nonlinear control seen from a differential and algebraic point of view. Enough material has been included, rigorous proofs and illustrative exercises of each subject, as well as some examples and demonstrations are left to the reader for the good understanding of the chapters of the book. In the selection of material for this book, the authors intended to expound basic ideas and methods applicable to the study of differential equations in theory of control. The exposition is developed in such a way that the reader can omit passage that turn out to be difficult for him. The content of this book constitutes a tool that can be read by not only mathematicians, engineers and physicists but also all users of the theory of differential equations and theory of control. Control theory has become into a v

vi

Preface

scientific discipline and of engineering that seems destined to have an impact in all aspects of modern society. First studied by mathematicians and engineers now it is increasingly present on areas of physics, economy, biology, sociology, psychology, etc. What is control theory and why is so important for a wide variety of specialities? Control theory is a set of concepts and techniques that are used to analyze and design various kinds of systems, independently from their physical nature and special functions. The most important aspect of control theory is the development of a quantitative model that describes the relation and interaction of a cause and effect between the variables of a system. This means that the language of control theory is mathematical and any serious attempt to learn control theory must be accompanied of mathematical precision and its comprehension. We have attempted, at writing this book, to make a pedagogic introduction to control theory that is accessible to readers with little background beyond linear and abstract elemental algebra and matrix theory while begin motivating for those mathematician readers with more sophisticated theoretical knowledge. As result, the book is more autonomous and less specialized. This book begins with the study of elementary set and map theory being a little bit abstract. Chapters 2 and 3 on groups theory and rings respectively are included because of their important relation to the linear algebra, the group of invertible linear maps (or matrices) and the ring of linear maps of a vector space being perhaps the most amazing examples of groups and rings. We deal with homomorphisms and Ideals as well as in Chap. 3 we mention some properties of integers. Also it is well known that group theory at its inception was deeply connected to algebraic equations. The success of the theory of groups in algebraic equations led to the hope that similar group-theoretic methods could be a powerful arsenal to attack the problems of differential equations. Still more, Ritt and Kolchin tackled the problem form the differential and algebraic point of view (Chap. 10 of control theory from point view of differential algebra). Chapter 4 is devoted to the matrices theory and the linear equations systems. Chapter 5 gives some definitions of permutations, determinants and inverse of a matrix. In Chap. 6 is tackled in notion of vector in real Euclidian space and in general vector space over a field, bases and dimension of a vector space as well as sums and direct sums. In Chap. 7 we treat with linear maps or linear transformations, Kernel and image of a linear map, dimensions of kernel and image, in addition, the application in linear control theory of some abstract theorems such as the concept of kernel, image and dimension of space are illustrated. Chapter 8 we shall consider the diagonalization of a matrix and the canonical form of Jordan. In Chap. 9 we attack elementary methods of solving differential equations and finally in Chap. 10 we treat the nonlinear control theory from the point of view of differential algebra. Mexico City, Mexico

Rafael Martínez-Guerra Oscar Martínez-Fuentes Juan Javier Montesinos-García

Acknowledgements

To the memory of my father who was the source of my teaching Carlos Martínez Rosales. To my wife and Sons Marilen, Rafael and Juan Carlos. To my mother and brothers Virginia, Victor, Arturo, Carlos, Javier and Marisela. Je tiens à exprimer ma profonde gratitude au Dr. Michel Fliess qui m’a initié au domaine du contrôle non linéaire à l’aide, d’outils tels que l’algèbre différentielle. Merci également à mon ami Dr. Sette Diop avec qui j’ai eu le plaisir de collaborer à certaines publications. à eux ma plus sincère gratitude. Rafael Martínez-Guerra To my parents Teresa and Leopoldo. In memory of my grandfather, Facundo. Oscar Martínez-Fuentes To my parents Irma and Javier, whose support and love allowed me to reach many goals. To my aunt Rosario and uncle Clemente, for all their teachings and support. Juan Javier Montesinos-García

vii

Contents

1

Mathematical Background . . . . . . . . . . . . . . . . . . . . . . . 1.1 Introduction to Set Theory . . . . . . . . . . . . . . . . . . . . 1.1.1 Set Operations and Other Properties . . . . . . 1.2 Equivalence Relations . . . . . . . . . . . . . . . . . . . . . . . 1.3 Functions or Maps . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.1 Classification of Functions or Maps . . . . . . . 1.4 Well-Ordering Principle and Mathematical Induction References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

1 1 2 5 8 9 13 16

2

Group Theory . . . . . . . . . . . . . . . 2.1 Basic Definitions . . . . . . . . 2.2 Subgroups . . . . . . . . . . . . . 2.3 Homomorphisms . . . . . . . . . 2.4 The Isomorphism Theorems References . . . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

19 19 21 26 29 30

3

Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Basic Definitions . . . . . . . . . . . . . 3.2 Ideals, Homomorphisms and Rings 3.3 Isomorphism Theorems in Rings . . 3.4 Some Properties of Integers . . . . . . 3.4.1 Divisibility . . . . . . . . . . . . 3.4.2 Division Algorithm . . . . . . 3.4.3 Greatest Common Divisor . 3.4.4 Least Common Multiple . . 3.5 Polynomials Rings . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

31 31 33 35 36 36 38 38 42 44 47

. . . . . .

. . . . . .

. . . . . .

. . . . . .

ix

x

Contents

4

Matrices and Linear Equations Systems . . . . . . . . . . . . . . . 4.1 Properties of Algebraic Operations with Real Numbers . 4.2 The Set Rn and Linear Operations . . . . . . . . . . . . . . . . 4.2.1 Linear Operations in Rn . . . . . . . . . . . . . . . . . 4.3 Background of Matrix Operations . . . . . . . . . . . . . . . . 4.4 Gauss-Jordan Method . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

49 49 50 51 53 59 63 64

5

Permutations and Determinants 5.1 Permutations Group . . . . . 5.2 Determinants . . . . . . . . . . References . . . . . . . . . . . . . . . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

65 65 67 73

6

Vector and Euclidean Spaces . . . . . . . . . . 6.1 Vector Spaces and Subspaces . . . . . . 6.2 Generated Subspace . . . . . . . . . . . . . 6.3 Linear Dependence and Independence 6.4 Bases and Dimension . . . . . . . . . . . . 6.5 Quotient Space . . . . . . . . . . . . . . . . . 6.6 Cayley-Hamilton Theorem . . . . . . . . . 6.7 Euclidean Spaces . . . . . . . . . . . . . . . 6.8 GramSchmidt Process . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

75 75 79 80 81 85 87 88 90 91

7

Linear Transformations 7.1 Background . . . . . 7.2 Kernel and Image . 7.3 Linear Operators . . 7.4 Associate Matrix . . References . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. 93 . 93 . 94 . 100 . 102 . 106

8

Matrix Diagonalization and Jordan Canonical Form 8.1 Matrix Diagonalization . . . . . . . . . . . . . . . . . . . 8.2 Jordan Canonical Form . . . . . . . . . . . . . . . . . . . 8.2.1 Generalized Eigenvectors . . . . . . . . . . . 8.2.2 Dot Diagram Method . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

109 109 116 118 120 123

9

Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.1 Motivation: Some Physical Origins of Differential Equations . 9.1.1 Free Fall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.1.2 Simple Pendulum Problem . . . . . . . . . . . . . . . . . . . 9.1.3 Friction Problem . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . .

. . . . .

125 125 125 126 127

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . .

. . . . . .

. . . .

. . . . . .

. . . .

. . . . . .

. . . .

. . . . . .

. . . .

. . . . . .

. . . .

. . . . . .

. . . .

. . . . . .

. . . . . .

Contents

xi

9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9 9.10 9.11

Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Separable Differential Equations . . . . . . . . . . . . . . . . . . . Homogeneous Equations . . . . . . . . . . . . . . . . . . . . . . . . . Exact Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Linear Differential Equations . . . . . . . . . . . . . . . . . . . . . . Homogeneous Second Order Linear Differential Equations Variation of Parameters Method . . . . . . . . . . . . . . . . . . . . Initial Value Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . Indeterminate Coefficients . . . . . . . . . . . . . . . . . . . . . . . . Solution of Differential Equations by Means of Power Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.11.1 Some Criterions of Convergence of Series . . . . . . 9.11.2 Solution of First and Second Order Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.12 Picard’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.13 Convergence of Picard’s Iterations . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

10 Differential Algebra for Nonlinear Control Theory . . . . . . . . 10.1 Algebraic and Transcendental Field Extensions . . . . . . . . 10.2 Basic Notions of Differential Algebra . . . . . . . . . . . . . . . 10.3 Definitions and Properties of Single-Input Single-Output Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.4 Identifying the Input and Output of Dynamic Systems . . 10.5 Invertible Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.6 Realization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.7 Generalized Controller and Observer Canonical Forms . . 10.8 Linearization by Dynamical Feedback of Nonlinear Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

128 129 130 132 134 137 139 141 145

. . . . 146 . . . . 146 . . . .

. . . .

. . . .

. . . .

149 154 156 161

. . . . . 163 . . . . . 163 . . . . . 165 . . . . .

. . . . .

. . . . .

. . . . .

. . . . .

167 168 168 172 173

. . . . . 174 . . . . . 182

Appendix: Important Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

Notations and Abbreviations

N Z Q R C A; B; . . . x; y; . . . AB x2A A[B A\B Ac AnB ; () 8  a  b mod n A=R  detðAÞ jAj trðAÞ A⊤ f   g ðaij Þi; j rankðAÞ A 1 y_ ¼ dy dt Ek

The Set of natural numbers The Set of integers numbers The Set of rational numbers The Set of real numbers The Set of complex numbers Capital letters represent arbitrary sets Lowercase letters represent elements of a set A is subset of B x is element of A The union of two sets The intersection of two sets Complement set of A Difference of sets A and B Empty set Necessary and sufficient condition For all Equivalence relation a is congruent with b module n Quotient set Binary operation for groups The determinant of a square matrix A 2 Rnn The determinant of a square matrix A 2 Rnn The trace of a matrix A The transpose of a matrix A Set m  n matrix with entries aij ; 1  i  m; 1  j  n Rank of a matrix A Inverse of A First derivative of y with respect to t Eigenspace of E corresponding to eigenvalue k

xiii

xiv

dimðVÞ L ðV; WÞ Ker u diff trd° y \ð [ Þ ð Þ 8 PðnÞ ffi G=N Khui K=k

Notations and Abbreviations

Dimension of V Space of linear transformations of V in W Kernel of u Differential transcendence degree Designation of the end of a proof Less (greater) than Less (greater) than or equal to For all A statement Isomorphism Quotient group Differential field generated by the field K; u and its time derivatives Differential field extension k  K

Chapter 1

Mathematical Background

Abstract This chapter focuses on the basic concepts and algebra of sets as well as a brief introduction to the theory of functions and the well known principle of mathematical induction. All this background will be needed as a tool to understand the theory of linear algebra and differential equations as set forth in the following chapters.

1.1 Introduction to Set Theory Definition 1.1 A set A is any collection of objects called elements of A that is well defined (same nature) [1–4]: A = {a1 , a2 , a3 , . . . , an−1 , an } • ai ∈ A denotes ai is an element of set A. • x∈ / A denotes x is not an element of set A. Hereinafter, capital letters, S, T, . . . denote sets and each of the elements of a set is denoted by a lowercase letters, x, y, . . . , etcetera. Example 1.1 Let N = {1, 2, . . .} the set of natural numbers (positive integers),   Z= {. . . , −3, −2, −1, 0, 1, 2, . . .} the set of integers, Q = ab | a, b ∈ Z, b = 0 the set of rational numbers and R the set of real numbers. It is clear that −5 ∈ Z, 35 ∈ √ √ / N, 23 ∈ / Z. Q, 23 ∈ R, π ∈ According to the number of elements of each set, the set can be finite or infinite. For example, if we consider the set of grains of sand in the world, although the amount is very large and possibly we could never finish counting, this set is finite. The numbers set are infinite. The set with no elements is called empty set [5–8]. This set is unique and it is represented by ∅. On the other hand, a finite set consisting of one element is called singleton. Remark 1.1 ∅ and {∅} are totally different. The first one is a set without elements, whereas the another set is a singleton with the unique element that is ∅. © Springer Nature Switzerland AG 2019 R. Martínez-Guerra et al., Algebraic and Differential Methods for Nonlinear Control Theory, Mathematical and Analytical Techniques with Applications to Engineering, https://doi.org/10.1007/978-3-030-12025-2_1

1

2

1 Mathematical Background

Definition 1.2 Let A, B two set. It is said that A is a subset of B, A ⊂ B (or B ⊃ A ), if and only if every element of A is element of B, i.e., a ∈ A ⇒ a ∈ B, ∀a ∈ A. (If A = B we call A a proper subset of B). According to this new concept, note that if a ∈ A then we can to write an equivalent notation for this fact, i.e. {a} ⊂ A. Other examples of subsets are given by the following chain of inclusions: N ⊂ Z ⊂ Q ⊂ R ⊂ C, the larger field, C is called an extension field of R, similarly R is an extension field of Q, this concept will have relevance in Chap. 10. Remark 1.2 Let A an arbitrary set, then the empty set satisfies ∅ ⊂ A for all set A. To prove this fact, let us suppose that the statement is false. It is false only if ∅ has an element that does not belong to A. Since ∅ has not elements, then is not true and therefore ∅ ⊂ A for every A. Remark 1.3 Let A, B, C any set, according to the Definition 1.2, • A ⊂ A. This property states that the inclusion is reflexive (every set is a subset of itself). • The inclusion is antisymmetric. This property is written as follows: A ⊂ B and B ⊂ A then A = B. • If A ⊂ B and B ⊂ C then A ⊂ C, i.e., the inclusion is transitive. The condition of equality between two sets is given from the following axiom. Property 1.1 (Axiom of extension): Two sets are equal if and only if they have the same elements. The axiom of extension states that if A and B are set such that A ⊂ B and B ⊂ A then A and B have the same elements and therefore A = B. This is the more general form to prove the equality between two set and this is the necessary and sufficient condition. Note that equality is symmetric, i.e., if A = B then B = A. In particular, when A = A it said that equality is reflexive. Finally, the relationship of equality is transitive. This property is analogous to the property of transitivity for subsets, i.e., A = B and B = C then A = C [10–12].

1.1.1 Set Operations and Other Properties Sometimes, when we have two set, A and B that are not necessarily equals, it is necessary to define a third set whose elements are given by the above sets. So that, it is necessary to define an operation in the sets with that feature. Before the definition is established, an important axiom that determines these new operations: Property 1.2 (Axiom of unions): For every collection of sets there exists a set that contains all the elements that belong to at least one set of the given collection.

1.1 Introduction to Set Theory

3

Although is complicated to define a set namely universe, that contains every set, in the next definition we will suppose that there exists a set (non empty) that contains a finite number of sets. Definition 1.3 Let A, B two subsets of a set S. The union of two sets A and B is the set A ∪ B = {s ∈ S | s ∈ A or s ∈ B} where, we say s ∈ A ∪ B if and only if s ∈ A, s ∈ B or s is in both sets. The other important operation is the intersection between two sets, defined as follows. Definition 1.4 Let A, B two subsets of a set S. The intersection of two sets A and B is the set A ∩ B = {s ∈ S | s ∈ A and s ∈ B} where, we say s ∈ A ∩ B if and only if s ∈ A and s ∈ B. Remark 1.4 These operations can be generalized for more sets. Let {Aα }α∈I a collection of subsets of S, then:  Aα = {s ∈ S | s ∈ Aα for some α ∈ I, I is an index set} • α∈I  • Aα = {s ∈ S | s ∈ Aα for all α ∈ I, I is an index set} α∈I

In any application of set theory, all sets that intervene can be considered as subsets of a fixed set. This set is called universal set (or simply universe) and it is denoted by U . Definition 1.5 Let the universe U and an arbitrary subset A of U , the complement of set A respect to U , to the set of all elements that do not belong to A but that belong to U [11, 13]. In analogous manner the union an intersection operations have some particular properties. We establish the following assertion. Note that each assertion should be demonstrated. 1. Associativity • A ∪ (B ∪ C) = (A ∪ B) ∪ C • A ∩ (B ∩ C) = (A ∩ B) ∩ C 2. Commutativity • A∪B = B∪ A • A∩B = B∩ A 3. Distributivity • A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) • A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

4

1 Mathematical Background

4. Idempotency • A ∪ A = A for all set A. • A ∩ A = A for all set A. 5. De Morgan’s laws • (A ∪ B)c = Ac ∩ B c • (A ∩ B)c = Ac ∪ B c To prove each one of the claims, it is necessary to establish the equality between both sets. That is to say, we prove the two inclusions: A = B ⇔ A ⊂ B and B ⊂ A For example, for the case of associativity of union, we have the following proof. Let x ∈ A ∪ (B ∪ C), then x ∈ A ∪ (B ∪ C) ⇔ x ∈ A or x ∈ (B ∪ C) ⇔ x ∈ A or x ∈ B or x ∈ C ⇔ x ∈ (A ∪ B) or x ∈ C ⇔ x ∈ (A ∪ B) ∪ C Thus A ∪ (B ∪ C) = (A ∪ B) ∪ C. At the end of this section, we will establish a proposition with more properties. Now, we establish the following definition. Definition 1.6 Let A, B subsets of a set S. A and B are said to be disjoint if they have not elements in common, that is to say: A ∩ B = ∅. Note, that this definition can be extended to any family of sets: A family of sets is pairwise disjoint or mutually disjoint if every two different sets in the family are disjoint ( Ai , A j ∈ A , i, j ∈ I, Ai ∩ A j = ∅). Another important operation is the difference between two sets, defined as follows. Definition 1.7 Let A, B subsets of a set S. The difference of the sets A and B is the set of all elements of set A that are not belong to the set B, i.e., A \ B = A − B = {s ∈ S | s ∈ A , s ∈ / B} = A ∩ B c where B c denotes the complement of B. Example 1.2 Let K = {1, 2, 3, 4, 5} , L = {1, 2, 3, 4, 5, 6, 7, 8}, and M = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. Note that K ⊂ L ⊂ M. According to the difference of sets, we have that

1.1 Introduction to Set Theory

5

L − K = {6, 7, 8} M − L = {9, 10, 11, 12} M − K = {6, 7, 8, 9, 10, 11, 12} Note as these sets are related by the union between them, and by their number of elements denominated cardinality of the set, i.e.: M − K = (M − L) ∪ (L − K ) and 7 = car d(M − K ) = car d(M − L) + car d(L − K ) = 4 + 3. This idea will be generalized and stated in Chap. 10 as a theorem for a tower of fields and their transcendence degree. Remark 1.5 According to the above definition, the complement of a set A can be written as follows: / A} = U \ A Ac = {x | x ∈ U and x ∈ that is, the difference of the universe with the set A. The properties of difference of set are the following: 1. 2. 3. 4.

For every set A, A \ A = ∅. For every set A, A \ ∅ = A. For every set A, ∅ \ A = ∅. Let A and B are nonempty sets. If A − B = B − A then A = B The latter property can be demonstrated as follows. By definition, we have A − B = {x | x ∈ A and x ∈ / B} B − A = {x | x ∈ B and x ∈ / A}

If A − B = B − A then every element of A \ B is an element of B \ A, but in the set A \ B there are only elements of A. In the same manner, all element of B \ A is an element of A \ B but in the set B \ A there are only elements of B, therefore, A = B.

1.2 Equivalence Relations Definition 1.8 Let A a nonempty set. A relation in A is a subset R of A × A. Remark 1.6 We write a ∼ b if (a, b) ∈ R.

6

1 Mathematical Background

Definition 1.9 A relation R in A it is said to be: 1. Reflexive, if a ∼ a ∀ a ∈ A 2. Symmetric, if a ∼ b ⇒ b ∼ a ∀ a, b ∈ A 3. Transitive, if a ∼ b and b ∼ c ⇒ a ∼ c ∀ a, b, c ∈ A Definition 1.10 A relation R in A it is said to be equivalence relation if R is reflexive, symmetric and transitive [14, 15]. Example 1.3 Let A = Z. Let a, b ∈ A. It said to be a and b have the same parity, if a and b are both even or are both odd. We say that a and b are related, a ∼ b, if a and b have the same parity. Indeed, this is a equivalence relation: • a ∼ a ⇔ (−1)a = (−1)a , a − a = 0 is even. • a ∼ b ⇔ (−1)a = (−1)b ⇔ (−1)b = (−1)a ⇔ b ∼ a. • a ∼ b ⇔ (−1)a = (−1)b and b ∼ c ⇔ (−1)b = (−1)c . By virtue of the properties of equality, we have (−1)a = (−1)b = (−1)c . Therefore a ∼ c. Definition 1.11 Let m, n ∈ Z. It is said that m divides n if there exists q ∈ Z such that n = mq. In other words, m is a factor of n or n it is a multiple of m. Remark 1.7 The notation for divisibility of numbers is the following: • If m divides n, we write m | n. • In other case, i.e., if m not divides n, we write m  n. Definition 1.12 Let n ∈ N, a, b ∈ Z. It said to be a is congruent with b module n if n | (b − a). We write a ≡ b mod n. The congruence module n is a equivalence relation. Definition 1.13 Inclusion is not an equivalence relation. Note that N ⊂ Z but Z ⊂ N (⊂ is not symmetric). Definition 1.14 Let ∼ an equivalence relation in A. The equivalence class of a ∈ A is denoted by the following set: C = [a] = {x ∈ A | x ∼ a} The elements of [a] are the elements of A that are related with a. If C is an equivalence class, then any element of C is a representative of C . Example 1.4 In the case of the parity we have: • [0] = {. . . , −4, −2, 0, 2, 4, . . .} = {2m | m ∈ Z} • [1] = {. . . , −5, −3, −1, 1, 3, 5, . . .} = {2m + 1 | m ∈ Z} Example 1.5 According to a ≡ b mod n, if n = 5, we have: • [0] = {. . . , −10, −5, 0, 5, 10, . . .} = {5m | m ∈ Z} • [1] = {. . . , −9, −4, 1, 6, 11, . . .} = {5m + 1 | m ∈ Z}

1.2 Equivalence Relations

7

• [2] = {. . . , −8, −3, 2, 7, 12, . . .} = {5m + 2 | m ∈ Z} • [3] = {. . . , −7, −2, 3, 8, 13, . . .} = {5m + 3 | m ∈ Z} • [4] = {. . . , −6, −1, 4, 9, 14, . . .} = {5m + 4 | m ∈ Z} Theorem 1.1 If ∼ is an equivalence relation in A, then A=



[a]

a∈A

In addition, if [a] = [b] ⇒ [a] ∩ [b] = ∅. Proof First, we prove the double contention of subsets.   [a]. This implies that A ⊂ [a]. 1. Let a ∈ A. Then a ∈ [a] and therefore a ∈ a∈A  a∈A 2. Given that [a] ⊂ A ∀a ∈ A, therefore [a] ⊂ A. a∈A

Hence A=



[a]

a∈A

Finally we will use the contrapositive form to prove the last statement. Suppose that [a] ∩ [b] = ∅. Let c ∈ [a] ∩ [b], then we have c ∼ a and c ∼ b , therefore a ∼ b. Now, if x ∈ [a], then x ∼ a and x ∼ b (since a ∼ b) and hence x ∈ [b]. This implies [a] ⊂ [b] and in a similar manner, [b] ⊂ [a]. Therefore [a] = [b].  Definition 1.15 Let A a nonempty set. A partition of A, is a collection {Aα }α∈I of subsets of A such that:  Aα A= α∈I

In addition, if α = β then Aα ∩ Aβ = ∅. Aα are so called elements of the partition. Remark 1.8 Let A a nonempty set. An equivalence relation in A gives rise to a partition of A. The elements of this partition, are the equivalence classes. Reciprocally, a partition of A generates an equivalence relation. In conclusion, let a, b ∈ A. a and b are said to be related if and only if they are in the same element of the partition. Definition 1.16 Let A a nonempty set and R (or ∼) an equivalence relation in A. A is said to be a quotient set denoted by: A/R = A/ ∼= {[a] | a ∈ A} Example 1.6 Let n ∈ N. Consider the congruence module n in Z (that is an equivalence relation). The quotient set of Z is given by Z/nZ = {[0], . . . , [n − 1]} Sometimes, Z/nZ is denoted by Z/n or Zn . As particular cases, we have:

8

1 Mathematical Background

1. Z/2Z = {[0], [1]} 2. Z/5Z = {[0], [1], [2], [3], [4]}

1.3 Functions or Maps Definition 1.17 Let A, B nonempty sets. A function or map is a relation given by a set A and a set B with the following property: each element a ∈ A is related to exactly one and only one element b ∈ B. In other words, a function or map is a rule that assigns to each element a of A [16, 17], one and only one element b of B. If f : A → B is a function, the set A is called domain of the function and the set B is the codomain or range of f . A function f is commonly denoted by: f :A→B a −→ b The notation f (a) = b is equivalent to say a −→ b. The image of a ∈ A is f (a) ∈ B. The image of the function f is: Im f = { f (a) | a ∈ A} ⊂ B Example 1.7 1. Let f defined by: f :R → R a −→ a 2 It is not hard to see that some elements are given by: f (2) = 4, f (−2) = 4, f (5) = 25, f (−5) = 25, f (0) = 0. 2. Let g defined by: g:Z → Z n −→ 2n + 1 Some elements using the correspondence rule, are the following: g(2) = 5, g(−2) = −3, g(5) = 11, g(−5) = −9, g(0) = 1. 3. Let A and B nonempty sets. The functions π1 : A × B → A (a, b) −→ a

1.3 Functions or Maps

9

and π2 : A × B → B (a, b) −→ b are called projections. 4. Let A and B nonempty sets. Let b0 ∈ B. The constant function is defined by f :A → B a −→ b0 In other words, f (a) = b0 ∀a ∈ A. In particular, when B = A, we have the identity function: id A : A → A a −→ a Definition 1.18 The functions f and g are equal, if they have the same domain and range, i.e., f = g if and only if f : A → B and g : B → A. In addition, f (a) = g(a) ∀a ∈ A Example 1.8 Let f and g the functions defined by f :Z→Z a −→ a − 1 2

,

g:Z→Z a −→ (a + 1)(a − 1)

Note that domain and codomain of both functions are equal, i.e., Z. Now, we choose an arbitrary element z ∈ Z, then f (z) = z 2 − 1 = (z + 1)(z − 1) = g(z) ∀z ∈ Z. Therefore f = g.

1.3.1 Classification of Functions or Maps Definition 1.19 The function f : A → B is called injective function if a1 = a2 then f (a1 ) = f (a2 ) ∀a1 , a2 ∈ A. Equivalently if f (a1 ) = f (a2 ) then a1 = a2 ∀a1 , a2 ∈ A. Example 1.9 Let g : R \ {−2} → R given by g(x) =

2x − 1 x +2

(1.1)

10

1 Mathematical Background

Let x1 , x2 ∈ Dg . Suppose that g(x1 ) = g(x2 ) then 2x1 − 1 2x2 − 1 = x1 + 2 x2 + 2 if and only if (2x1 − 1)(x2 + 2) = (2x2 − 1)(x1 + 2) this yields to x1 = x2 . Hence g is injective. Definition 1.20 A surjective function or onto is a function whose image is equal to its codomain. Equivalently, a function f : A → B with domain A and codomain B is surjective if for every b ∈ B there exists at least one a ∈ A such that f (a) = b. In other words, f : A → B is onto if Im f = B [18, 19]. Example 1.10 Let g : R \ {−2} → R given by g(x) =

2x − 1 x +2

This function is surjective since for all y ∈ R \ {−2} exists x =

1+2y 2−y

such that

  −1 2 1+2y 2−y  =y g(x) =  1+2y + 2 2−y Definition 1.21 The function f : A → B is called bijective function or bijection if f is an injective and surjective function. Example 1.11 The function g defined above is injective and surjective, thus is bijective. Definition 1.22 Let f : A → B a bijective function. In this case, the inverse function of f is: f −1 : B → A The inverse function is given by f −1 (b) = a if and only if f (a) = b. Example 1.12 The function g defined by (1.1) is a bijective function then the inverse function g −1 exists given by 2x + 1 g −1 (x) = 2−x Proposition 1.1 Let f : A → B. If f is bijective, then its inverse function f −1 :

−1 B → A is also bijective and f −1 = f.

1.3 Functions or Maps

11

Proof The proof is left to the reader.



Definition 1.23 Let g : A → B and f : B → C functions. The composition of f and g, denoted by f ◦ g is the function: f ◦g: A→C and is given by ( f ◦ g) (a) = f (g(a)). Example 1.13 f : R → R, g : R → R defined by: f (x) = x 2 − 1, g(x) = sin(x), hence ( f ◦ g) (x) = − cos2 (x). Example 1.14 In general f ◦ g = g ◦ f . Let f and g the functions defined by f :R→R a −→ a

g:R→R a −→ (a + 1)

,

2

From the definition, we have: f ◦g:R→R a −→ (a + 1)

2

,

g◦ f :R→R a −→ a 2 + 1

Therefore, f ◦ g = g ◦ f . Proposition 1.2 Let h : A → B, g : B → C, f : C → D functions. Then: ( f ◦ g) ◦ h = f ◦ (g ◦ h) Proof We have ( f ◦ g) ◦ h : A → D and f ◦ (g ◦ h) : A → D, so the domain and the codomain are the same for both functions. Let a ∈ A, then we have the following equalities: (( f ◦ g) ◦ h)(a) = ( f ◦ g)(h(a)) = f (g(h(a))) = f ((g ◦ h)(a)) = ( f ◦ (g ◦ h))(a)

This yields to (( f ◦ g) ◦ h)(a) = ( f ◦ (g ◦ h))(a) ∀a ∈ A Finally, we conclude that ( f ◦ g) ◦ h = f ◦ (g ◦ h). Proposition 1.3 Let g : A → B, f : B → C functions. Then: 1. If f, g are injective functions, then f ◦ g is injective. 2. If f, g are surjective functions, then f ◦ g is surjective. 3. If f, g are bijective functions, then f ◦ g is bijective.



12

1 Mathematical Background

Proof The proof is left to the reader.



Proposition 1.4 Let g : A → B, f : B → C functions. Then: 1. 2. 3. 4. 5. 6.

If f ◦ g is injective then g is injective. If f ◦ g is injective then f is not necessarily injective. If f ◦ g is surjective then f is surjective. If f ◦ g is surjective then g is not necessarily surjective. If f ◦ g is bijective then g is injective and f is surjective. If f ◦ g is bijective then f and g are not necessarily injective and surjective respectively.

Proof The proof is left to the reader.



Proposition 1.5 Let f : A → B a bijective function and f −1 : B → A the inverse of f , then: 1. f ◦ f −1 = id B 2. f −1 ◦ f = id A Proof First, note that f ◦ f −1 : B → B, id B : B → B, f −1 ◦ f : A → A and id A : A → A. Now, we consider the first case. Let b ∈ B. As the function f is bijective, then there exists a unique a ∈ A such that f (a) = b. According to the definition of inverse function, f −1 (b) = a, then we have: ( f ◦ f −1 )(b) = f ( f −1 (b)) = f (a) = b = id B (b) Therefore f ◦ f −1 = id B . For the second case, let a ∈ A and b = f (a). Since the function f is bijective, then f −1 (b) = a. Therefore: ( f −1 ◦ f )(a) = f −1 ( f (a)) = f −1 (b) = a = id A (a) Thus f −1 ◦ f = id A .



Proposition 1.6 If f : A → B is function and there exists a function g : B → A such that g ◦ f = id A and f ◦ g = id B , then f is bijective and f −1 = g. Proof It is trivial considering the Proposition 1.4. According to this result, we refer to the bijective function as an invertible function.  Proposition 1.7 Let f : A → B a function. Then 1. f ◦ id A = f 2. id B ◦ f = f Proof Note that f ◦ id A : A → B and id B ◦ f : A → B. Let a ∈ A then ( f ◦ id A ) (a) = f (id A (a)) = f (a). In the same manner (id B ◦ f ) (a) = id B ( f (a)) = f (a).  Therefore f ◦ id A = f and id B ◦ f = f .

1.3 Functions or Maps

13

Proposition 1.8 Let g : A → B and f : B → C bijective functions. Then ( f ◦ g)−1 = g −1 ◦ f −1 Proof From the Proposition 1.3, f ◦ g : A → C is bijective and ( f ◦ g)−1 : C → A. Since f and g are bijective functions then g −1 : B → A and f −1 : C → B. Then g −1 ◦ f −1 : C → A, therefore ( f ◦ g)−1 and g −1 ◦ f −1 have the same domain and codomain. Let c ∈ C, then there exists a ∈ A and b ∈ B such that g(a) = b and f (b) = c. Then we have ( f ◦ g)(a) = f (g(a)) = f (b) = c, hence ( f ◦ g)−1 (c) = a. On the other hand, f −1 (c) = b and g −1 (b) = a, then (g −1 ◦ f −1 )(c) =  g −1 ( f −1 (c)) = g −1 (b) = a. Therefore ( f ◦ g)−1 = g −1 ◦ f −1 .

1.4 Well-Ordering Principle and Mathematical Induction Consider the well-ordering principle axiom that is stated as follows. Well-ordering principle: If A is subset nonempty of N, then there exists m ∈ A such that m ≤ a ∀a ∈ A, i.e., m is the least element of A. Theorem 1.2 (Mathematical induction principle) Let P(n) a statement about natural numbers N such that: 1. P(1) is valid. 2. Suppose that statement is true for n = k, i.e., P(k) is valid, then P(k + 1) is valid ∀k ∈ N. then P(n) is true for all n ∈ N. Proof By contradiction. Suppose that the theorem is false, then there exists a statement P(n) such that satisfies (1) and (2) but P(n) is not valid for all n ∈ N. By using the well-ordering principle, there exists a minimum natural number m such that P(m) is not valid. Since (2) is true, then m > 1. By using the minimality of m we have P(m − 1) is valid. According to (2) then P(m) is valid, but this contradicts the fact that P(m) is not valid. Hence P(n) is valid ∀ n ∈ N.  Remark 1.9 Some comments about the mathematical induction principle, then we can do some the following modifications. • Change 1 for any n 0 ∈ Z+ . Conclude that P(n) is valid ∀n ≥ n 0 . • Suppose that P(1) is valid and P( j) is valid too ∀ j < k, then P(k) is valid ∀k ∈ N. Conclude that P(n) is valid ∀n ∈ N. • Suppose that P(n 0 ) is valid ∀ j < k then P(k) is valid ∀k ≥ n 0 . Conclude that P(n) is valid ∀n ≥ n 0 . Example 1.15 Show that the sum of the first n natural numbers is: 1 + 2 + 3 + ··· + n =

n(n + 1) ∀n ∈ N 2

(1.2)

14

1 Mathematical Background

Proof Using induction over n. Let Sn the set of positive integers n such that satisfy (1.2), i.e, Sn = 1 + 2 + 3 + · · · + n First we verify that 1 satisfies Sn : S1 =

1(1 + 1) =1 2

Let us prove the induction step. Suppose that k satisfies Sn (induction hypothesis), i.e. k(k + 1) (1.3) Sk = 1 + 2 + 3 + · · · + k = 2 and prove that Sk+1 =

(k + 1)(k + 2) 2

(1.4)

Indeed, Sk+1 = Sk + (k + 1) = 1 + 2 + 3 + · · · + k + (k + 1) =

k(k + 1) (k + 1)(k + 2) + (k + 1) = 2 2

Therefore: 1 + 2 + 3 + ··· + n =

n(n + 1) ∀n ∈ N 2



Remark 1.10 In another way, let S1 (n) = 1 + 2 + 3 + · · · + n. We write the sum S1 (n) twice with the second one written in opposite order: S1 (n) = 1 + 2 + 3 + · · · + n S1 (n) = n + (n − 1) + (n − 2) + · · · + 2 + 1 Then we add the two expressions: 2S1 (n) = (n + 1) + (n + 1) + (n + 1) + (n + 1) + · · · + (n + 1) + (n + 1) That is, 2S1 (n) = n(n + 1) and therefore S1 (n) = 1 + 2 + 3 + · · · + n =

n(n + 1) ∀n ∈ N 2

Example 1.16 Prove the following formula by induction. 12 + 22 + · · · + n 2 =

n(n + 1)(2n + 1) 6

(1.5)

1.4 Well-Ordering Principle and Mathematical Induction

15

Proof Denote by Sn the sum: Sn = 12 + 22 + · · · + n 2 Let A the set of positive integers such that satisfies (1.5). Firstly is verified that 1 ∈ A: 12 = 1 =

(1)(2)(3) (1)(1 + 1) [2(1) + 1] 6 = = 6 6 6

Now the induction step is tested. Suppose that k ∈ A (induction hypothesis) then Sk =

k(k + 1)(2k + 1) 6

(1.6)

Let us show that Sk+1 =

(k + 1)[(k + 1) + 1][2(k + 1) + 1] 6

Indeed, adding (k + 1)2 on both sides of (1.6) and making algebraic manipulations: Sk+1 = Sk + (k + 1)2 = = = = = =

k(k + 1)(2k + 1) + (k + 1)2 6 k(k + 1)(2k + 1) + 6(k + 1)2 6 (k + 1) [k(2k + 1) + 6(k + 1)] 6 2 (k + 1) 2k + 7k + 6) 6 (k + 1)(k + 2)(2k + 3) 6 (k + 1) [(k + 1) + 1] [2(k + 1) + 1] 6

We have shown that k + 1 ∈ A. Given that 1 ∈ A and for every positive integer k belonging to the set A, the number k + 1 also belongs to the set A, by using the mathematical induction principle A = N and (1.5) is valid for every natural number n.  Example 1.17 13 + 23 + · · · + n 3 = (1 + · · · + n)2

(1.7)

16

1 Mathematical Background

Proof We proceed by using mathematical induction on n. Let B the set of positive integers such that satisfies (1.7). Firstly we have 13 = 1 = 12 then 1 ∈ B. Suppose that (1.7) is valid for n = k: 13 + 23 + · · · + k 3 = (1 + · · · + k)2

(1.8)

Equation above represents the induction hypothesis. Let us demonstrate that the hypothesis is valid for n = k + 1, that is 13 + 23 + · · · + k 3 + (k + 1)3 = [1 + · · · + k + (k + 1)]2 Then [1 + · · · + k + (k + 1)]2 = (1 + · · · + k)2 + 2 (1 + · · · + k) (k + 1) + (k + 1)2 

k(k + 1) (k + 1) + (k + 1)2 = 13 + 23 + · · · + k 3 + 2 2 = 13 + 23 + · · · + k 3 + k(k + 1)2 + (k + 1)2 = 13 + 23 + · · · + k 3 + (k + 1)2 (k + 1) = 13 + 23 + · · · + k 3 + (k + 1)3 therefore k + 1 ∈ B which means that B = N. Based in the mathematical induction principle, (1.7) is valid for every natural number n. 

References 1. Adamson, I. T.: A Set Theory Workbook. Springer Science & Business Median, New York (2012) 2. Hammack, R.: Book of proof. Virginia Commonwealth University Mathematics (2013) 3. Lang, S.: Algebra. Springer, New York (2002) 4. Lipschutz, S.: Schaum’s outline of theory and problems of set theory and related topics (1964) 5. Bourbaki, N.: Theory of Sets. Springer, Heidelberg (2004) 6. Halmos, P.R.: Naive Set Theory. Dover Publications, New York (2017) 7. Henkin, L.: On mathematical induction. Am. Math. Monthly 67(4), 323–338 (1960) 8. Hrbacek, K., Jech, T.: Introduction to Set Theory, Revised and Expanded. CRC Press, Boca Raton (1999) 9. Sominskii, I.S.: The Method of Mathematical Induction (Popular Lectures in Mathematics). Pergamon Press, Oxford (1961) 10. Bourbaki, N. Commutative Algebra (Vol. 8), Hermann, Paris (1972) 11. Kuratowski, K.: Introduction to Set Theory and Topology. Elsevier, Amsterdam (2014) 12. Smith, D., Eggen, M., Andre, R.S.: A Transition to Advanced Mathematics. Nelson Education (2014)

References

17

13. Jech, T. J.: The axiom of choice. Courier Corporation (2008) 14. Kunz, E.: Introduction to Commutative Algebra and Algebraic Geometry. Birkhuser, Boston (1985) 15. Zariski, O., Samuel, P.: Commutative Algebra, vol. 1. Springer Science & Business Media, New York (1958) 16. Birkhoff, G., Mac Lane, S.: A Survey of Modern Algebra. Universities Press (1965) 17. Sontag, E.D.: Linear systems over commutative rings. A survey. Ricerche di Automatica 7(1), 1–34 (1976) 18. Samuel, P., Serge, B.: Mthodes d’algbre abstraite en gomtrie algbrique. Springer, Heidelberg (1967) 19. Zariski, O., Samuel, P.: Commutative Algebra, vol. 2. Springer Science & Business Media, New York (1960)

Chapter 2

Group Theory

Abstract This chapter provides an introduction to group theory, the chapter begins with basic definitions about groups, then continues to introduce subgroups and its characteristics, followed by homomorphisms, the chapter concludes with various theorems about isomorphisms, along with the concepts, various examples are provided to facilitate the understanding of the theory. This material ca ben found in any introductory book on abstract algebra (also called modern algebra).

2.1 Basic Definitions Definition 2.1 A group is a nonempty set G (G = ∅) together with an operation ∗ (called the group law of G) that combines any two elements a and b to form another element, denoted a ∗ b or ab or explicitly [1–5]: ∗ :G × G → G (a, b) → a ∗ b To qualify as a group, the set and operation, (G, ∗), must to satisfy four requirements known as the group axioms [6–8]: 1. a ∗ b ∈ G ∀a, b ∈ G (Closure). 2. a ∗ (b ∗ c) = (a ∗ b) ∗ c ∀a, b, c ∈ G (Associativity). 3. There exists e ∈ G such that a ∗ e = e ∗ a = a ∀a ∈ G (Existence of identical element. e is the identity of G). 4. ∀a ∈ G there exists b ∈ G such that a ∗ b = b ∗ a = e (In this case, b = a −1 is called the inverse of a). Remark 2.1 If a (G, ∗) satisfies 1 and 2, is called semigroup. The semigroup together with 3 is called monoid. Example 2.1 1. The set of integer numbers Z with the operation sum +, denoted by (Z, +), form a group. © Springer Nature Switzerland AG 2019 R. Martínez-Guerra et al., Algebraic and Differential Methods for Nonlinear Control Theory, Mathematical and Analytical Techniques with Applications to Engineering, https://doi.org/10.1007/978-3-030-12025-2_2

19

20

2 Group Theory

2. The set of rational numbers Q with the operation sum +, denoted by (Q, +), form a group. 3. Let us consider the set of rational numbers Q without the zero, denoted by Q∗ = Q \ {0} with the operation multiplication ·, form a group (Q∗ , ·). 4. Let G = {e} and the operation ∗ defined as e ∗ e = e ∈ G. The pair (G, ∗), form the most basic group. 5. Let G = {e, a} and the operation ∗ defined as ∗ ea e ea aae 6. The set of matrix G(n, R) (matrices of n × n with coefficients in R) is a group non-commutative, i.e., AB = B A. 7. The set of matrices defined as   cos θ sin θ sin θ cos θ is a group commutative with the matrix product. Definition 2.2 Let G a group. It is said that G is an Abelian group [7, 9] if a ∗ b = b ∗ a ∀a, b ∈ G Definition 2.3 Let G a group. The order of a group, denoted by | G |, consists in the number of elements of G. Note that if G is a countable set of elements, then the order of G is finite. In other case, G is called a infinite group, for example |{e}| = 1 and |R| = ∞. Example 2.2 1. The set Z with the operation · usual product is not a group because there is not a multiplicative inverse. 2. Let Z∗ = Z ∪ {0} with the operation · (usual product), is Z∗ a group? The proof is left to the reader as an exercise. 3. Let G = R \ {0}. If we define the operation a ∗ b = a 2 b, is G a group? The proof is left to the reader as an exercise. 4. The equivalence class Z/nZ with the operation [a] + [b] = [a + b] is a group, where [a] = {x ∈ Z | a ∼ x, a ∼ = x mod n, n | (a − x)}. The proof is left to the reader as an exercise. Proposition 2.1 Let G a group, then: 1. 2. 3. 4.

The identity element is unique. ∀a ∈ G a −1 is unique.  −1 ∀a ∈ G, a −1 = a. −1 ∀a, b ∈ G, (ab) = b−1 a −1 .

2.1 Basic Definitions

21

−1 5. In general, (a1 a2 · · · an−1 an )−1 = an−1 an−1 · · · a2−1 a1−1

Proof 1. Let e1 , e2 identity elements of G, with e1 = e2 . If e1 is an identity element, then e1 e2 = e2 e1 = e2 . If e2 is an identity element, then e2 e1 = e1 e2 = e1 , so we conclude that e1 = e2 i.e., the identity element e is unique. 2. Let b, c inverse elements of a in G, with b = c. Note that ab = ba = e and ac = ca = e then b = be = b(ac) = (ba)c = ec = c, therefore b = c, i.e., a −1 is unique. 3. Note that aa −1 = a −1 a = e, then a is the inverse element of a −1 . Therefore −1 = a. a −1 4. First applying the invertibility on the left, i.e., (b−1 a −1 )(ab) = b−1 (a −1 a) b = b−1 (eb) = b−1 b = e. Now, applying the invertibility on the right: (ab) (b−1 a −1 ) = a(bb−1 )a −1 = a(ea −1 ) = aa −1 = e. Hence (ab)−1 = b−1 a −1 . 5. By induction. Let n = 2 then (a1 a2 )−1 = a2−1 a1−1 , so the result is valid for n = −1 · · · a2−1 a1−1 holds. It 2. Assume, for n = k that (a1 a2 · · · ak−1 ak )−1 = ak−1 ak−1 −1 −1 −1 −1 −1 easy to verify that ak+1 (a1 a2 · · · ak−1 ak ) = ak+1 ak ak−1 · · · a2−1 a1−1 , hence, we −1 −1 ak · · · a2−1 a1−1 .  have (a1 a2 · · · ak ak+1 )−1 = ak+1 Proposition 2.2 (Cancelation laws) Let G a group. Then, 1. If ab = ac then b = c ∀a, b, c ∈ G. 2. If ba = ca then b = c ∀a, b, c ∈ G. Proof We have: 1. b = eb = (a −1 a)b = a −1 (ab) = a −1 (ac) = (a −1 a)c = ec = c 2. b = be = b(aa −1 ) = (ba)a −1 = (ca)a −1 = c(aa −1 ) = ce = c Hence, 1. and 2. have been proved.



2.2 Subgroups Definition 2.4 Let G a group and H (a nonempty set) such that H ⊂ G. It said that H is a subgroup of G if H is a group with respect to the operation of G. If H is a subgroup of G, it is denoted by H < G [10–13]. Remark 2.2 Every group G has two trivial subgroups: G and {e}. Proposition 2.3 A nonempty subset H ⊂ G is a subgroup of G if and only if H is closed with the operation of G and if a ∈ H then a −1 ∈ H . Proof • ⇒ (Sufficiency). Let H a subgroup of G by the definition of subgroup. In particular, H is a group, H is closed and there exists the inverse. • ⇐ (Necessity). H is closed, is nonempty and a −1 ∈ H (closure of H ). In addition, let a, b, c ∈ H , then a(bc) = (ab)c because H ⊂ G. Therefore H is a subgroup. 

22

2 Group Theory

Exercise 2.1 1. Let G = Z considering the usual sum. Let H the set of even numbers, i.e. H = {n | n = 2k, k ∈ Z}. Is H a subgroup of G? 2. Let X a nonempty set. We consider G = S X . Let a ∈ X then H (a) = { f ∈ S X | f (a) = a}. Prove that H < G through function composition. 3. Let G = C∗ = C \ {0} and the usual product. Let U = {z ∈ C∗ : | z |= 1}. Prove that U < G. 4. Let G a group and a ∈ G. Let C(a) = {g ∈ G | ga = ag}. Prove that C(a) is a subgroup of G. The proof of the exercises 1–3 are left to the reader. The proof of exercise 4 is as follows. C(a) is a group and in particular is a subgroup of G. Indeed. Firstly, C(a) = ∅ since a = ea = ae then e ∈ C(a). On the other hand, let g1 , g2 ∈ C(a), we have to verify that g1 g2 ∈ C(a) (closure). Since g1 , g2 ∈ C(a), g1 a = ag1 , g2 a = ag2 then (g1 g2 )a = g1 (g2 a) = g1 (ag2 ) = (g1 a)g2 = (ag1 )g2 = a(g1 g2 ). Hence g1 g2 ∈ C(a). Also if g ∈ C(a) then g −1 ∈ C(a), i.e., g −1 a = ag −1 . Since a = ae = a(gg −1 ) = (ag)g −1 = (ga)g −1 then g −1 (a) = g −1 ((ga)g −1 ) = g −1 gag −1 = eag −1 = ag −1 then g −1 ∈ C(a). Thus C(a) < G. Definition 2.5 Let G a group and a ∈ G. The set   A = a = a i | i ∈ Z is a subgroup of G (A is nonempty since a 0 = e ∈ A. The usual product for a i , a j ∈ A is given by a i a j = a i+ j , i, j ∈ Z, i + j ∈ Z, a i+ j ∈ A. On the other hand a i ∈ A then a −i = (a i )−1 = (a −1 )i ∈ A). The set A = a is called cyclic subgroup of G generated by a [14, 15]. Definition 2.6 Let G a group. It is said that G is cyclic if G = a for some a ∈ G. Exercise 2.2 1. Let (G 1 , ∗), (G 2 , ◦) and (G 3 , ) groups, where G 1 = {e}, G 2 = {e, a}, G 3 = {e, a, b} and each group operation is defined by the following tables: eab ◦ea ∗e e eab eea ee a abe aae b bea What is the generator element (or elements) in each group? 2. What is the generator element in the group defined by Z/2Z = Z2 = {[0], [1]} with the usual sum? 3. Let G a group, such that x 2 = e ∀x ∈ G. Prove that G is an abelian group. Hint: Verify that x = x −1 ∀x ∈ G. Definition 2.7 Let G a group, H a subgroup of G. For a, b ∈ G, it is said that a and b are congruent modulo H (a ∼ = b mod H ), if ab−1 ∈ H .

2.2 Subgroups

23

Exercise 2.3 According to above definition, prove that ∼ = is a equivalence relation. Definition 2.8 Let H a subgroup of G, a ∈ G. The set H a = {ha | h ∈ H } is called right lateral class of H in G. Lemma 2.1 Let G a group, and a ∈ G, then H a = {x ∈ G | a ∼ =x

mod H }

Proof Let a set defined by [a] = {x ∈ G | a ∼ =x

mod H }

• (Sufficiency). Let h ∈ H and ha ∈ H a. We take an arbitrary element in H a and we show that this element is in [a], i.e., if a(ha)−1 ∈ H then a ∼ = ha mod H and therefore ha ∈ [a]. Note that a(ha)−1 = a(a −1 h −1 ) = (aa −1 )h −1 = eh −1 = h −1 ∈ H , then ha ∈ [a]. So, we conclude that H a ⊂ [a]. • (Necessity). Let x ∈ [a] then a ∼ = x mod H , i.e., ax −1 ∈ H . In particular −1 −1 −1 (ax ) = xa ∈ H . Furthermore, let h = xa −1 ∈ H , then ha = (xa −1 )a = x(a −1 a) = xe = x ∈ H a, then [a] ⊂ H a. Since H a ⊂ [a] and [a] ⊂ H a, H a = [a] = {x ∈ G | a ∼  = x mod H }. Theorem 2.1 (Lagrange’s theorem) For any finite group G, the order of every subgroup H of G divides the order of G. Proof Considering that [a] = H a, the equivalence classes to form a partition of G, i.e., [a1 ] ∪ [a2 ] ∪ . . . ∪ [ak ] = G where [ai ] ∩ [a j ] = ∅. The right lateral classes form a partition given by G = H a1 ∪ H a2 ∪ . . . ∪ H ai ∪ . . . ∪ H ak ,

1≤i ≤k

Now, we establish a bijective map: H ai → H hai −→ H that is to say | H ai |=| H |

∀ 1 ≤ i ≤ k. Then

| G |=| H a1 | + | H a2 | + . . . + | H ak |=| H | + | H | + . . . + | H |= k | H |, k ∈ Z

This means that the order of every subgroup H of G divides the order of G, i.e.,

24

2 Group Theory

|H |/|G|



Definition 2.9 If G is a finite group and H is a subgroup of G, the index of H in G is |G| i G (H ) = |H| Definition 2.10 Let G a finite group and a ∈ G. The order of a is the minimum positive integer n such that a n = e. The order of a is denoted by O(a), then a O(a) = e [14, 16]. Proposition 2.4 Let G a finite group and a ∈ G, then O(a)/ | G | Proof Let suppose H = a, then O(a) = H . By Lagrange’s theorem, | H | / | G |, hence O(a)/ | G |.  Corollary 2.1 If G is a finite group of order n, then a n = e ∀a ∈ G. Proof According to above proposition, O(a)/ | G |. By hypothesis, the order of G is n, then O(a)/n, so there exists k ∈ Z such that n = k O(a). Note that a n = a k O(a) =  O(a) k a = ek = e.  Definition 2.11 A group N of G is said to be a normal subgroup of G, denoted by N  G, if for all g in G and for all n ∈ N , gng −1 ∈ N . Lemma 2.2 N is a normal subgroup of G if and only if g N g −1 = N for all g in G. Proof 1. (Necessity). If g N g −1 = N for all g in G, in particular, g N g −1 ⊂ N , then gng −1 ∈ N for all n ∈ N . Therefore N is a normal subgroup of G. 2. (Sufficiency). If N is a normal subgroup of G, then gng −1 ∈ N . Let g ∈ G, then −1 −1 −1 −1 hand, ) ⊂ N, for all n in N , g N g −1 ⊂ N . On the other   −1 g N g−1= g N (g −1 −1 −1 N gg = g g N g g ⊂ g N g . Therefore besides N = eN e = gg  g N g −1 = N . Lemma 2.3 Let N a subgroup of G. N is a normal subgroup of G if and only if all left lateral class of N in G is a right lateral class of N in G. Proof Let a H = {ah | h ∈ H } a left lateral class. 1. (Sufficiency). If N is a normal subgroup of G for all g ∈ G,  n ∈ N , then  from the above result, g N g −1 = N . Further g N = g N e = g N g −1 g = g N g −1 g = N g. Therefore, all left lateral class coincides with the right lateral class. −1 −1 2. (Necessity).  let suppose N g = g N , then g N g = (g N ) g = (N g)  Now, g −1 = N gg −1 = N e = N .

2.2 Subgroups

25

Hence that N  G.



Definition 2.12 Let H a subgroup of G. Let H H = {h 1 h 2 | h 1 , h 2 ∈ H } ⊂ H The proof of this assertion is immediate. Remark 2.3 H H = H . It is clear that H H ⊂ H . On the other hand, chose h 1 h 2 = h ∈ H since H < G as well as H ⊂ H H . Therefore H H = H . Corollary 2.2 Let N a normal subgroup of G, a, b ∈ G, then N a N b = N (N a) b = N N ab = N ab, ab ∈ G i.e., the product of right lateral classes is a right lateral class.



Definition 2.13 Let G/N the collection of right lateral classes of N in G: G/N = {N a | a ∈ G} Theorem 2.2 Let G a group, N is a normal subgroup of G, then G/N is a group called quotient group. Proof Let x, y, z ∈ G/N , then x = N a, y = N b and z = N c for a, b, c ∈ G. 1. (Closure) Let a, b ∈ G ⇒ N ab ∈ G/N . 2. (Associativity) Note that (x y)z = (x ∗ y) ∗ z = (N a N b) N c = (N ab) N c = N abc = N a (N bc) = N a (N bN c) = x ∗ (y ∗ z) = x(yz). 3. (Existence of identity element). Let N = N e then x N = N a N e = N N (ae) = N N a = N a = x N x = N eN a = N N (ea) = N N a = N a = x 4. (Existence of inverse element) Let N a −1 ∈ G/N . To show that x −1 = N a −1 is the inverse element of x = N a. Indeed, as x ∈ G/N then x = N a for any a in G. Then x x −1 = N a N a −1 = N N aa −1 = N N e = N e = N x −1 x = N a −1 N a = N N a −1 a = N N e = N e = N hence x −1 = N a −1 and G/N is a group.



26

2 Group Theory

2.3 Homomorphisms ¯ Let (G, ◦) and (G, ¯ ) groups with ◦ and  operDefinition 2.14 Let ϕ : G → G. ations defined G and G¯ respectively. It is said that ϕ is a homomorphism if for all a, b ∈ G: ϕ(a ◦ b) = ϕ(a)ϕ(b) (2.1) Example 2.3 Let G = R+ with usual multiplication operation in R and G¯ = R+ with usual addition operation in R. Let ϕ : R+ → R+ with ϕ(r ) = ln r . Let r1 , r2 ∈ R+ , then: ϕ(r1 · r2 ) = ln(r1 · r2 ) = ln r1 + ln r2 = ϕ(r1 ) + ϕ(r2 ) thus ϕ is a homomorphism. Lemma 2.4 Let G a group and N a normal subgroup of G. Let ϕ : G → G/N such that ϕ(x) = N x then ϕ is a homomorphism. Proof Let x, y ∈ G then ϕ(x) = N x and ϕ(y) = N y. Note that ϕ(x y) = N x y = N N x y = N x N y = ϕ(x)ϕ(y) Hence ϕ is a homomorphism.



¯ The kernel of ϕ, denoted by K ϕ Definition 2.15 If ϕ is a homomorphism of G in G. (or Ker ϕ) is given by: ¯ ⊂G K ϕ = {x ∈ G | ϕ(x) = e}

(2.2)

¯ where e¯ is the identity of G. ¯ then: Lemma 2.5 Let ϕ a homomorphism of G in G, 1. ϕ(e) = e¯ 2. ϕ(x −1 ) = [ϕ(x)]−1 ∀ x ∈ G Proof 1. Let x ∈ G. By the identical element axiom, x = xe then ϕ(x) = ϕ(xe) = ϕ(x)ϕ(e). On the other hand ϕ(x) = ϕ(x)e. ¯ Finally by the left cancellation rule we have ϕ(e) = e. ¯ 2. Note that e¯ = ϕ(e) = ϕ(x x −1 ) = ϕ(x)ϕ(x −1 ) and e¯ = ϕ(e) = ϕ(x −1 x) = ϕ(x −1 )ϕ(x) therefore ϕ(x −1 ) is the inverse element of ϕ(x), this is ϕ(x −1 ) =  [ϕ(x)]−1 ∀ x ∈ G. Example 2.4 1. Let G an Abelian group and ϕ : G → G such that ϕ(a) = a 2 . Let a1 , a2 ∈ G such that ϕ(a1 ) = a12 and ϕ(a2 ) = a22 . Furthermore as G is Abelian, a1 a2 = a2 a1 then it follows that (a1 a1 )(a2 a2 ) = (a1 a2 )(a1 a2 ). Finally ϕ(a1 a2 ) = (a1 a2 )2 = (a1 a2 )(a1 a2 ) = (a1 a1 )(a2 a2 ) = a12 a22 = ϕ(a1 )ϕ(a2 ). Therefore ϕ is a homomorphism.

2.3 Homomorphisms

27

2. Let (G, ∗), (G  , ) groups and e the identity element of G  . The function ϕ : G → G  defined by ϕ(g) = e for all g ∈ G is a homomorphism. To verify this statement, let g1 , g2 ∈ G. It is not hard to see that ϕ(g1 ∗ g2 ) = e = e e = ϕ(g1 )ϕ(g2 ). 3. Let (G, ∗) a group. The identity function IdG : G → G is a homomorphism. Indeed, let g1 , g2 ∈ G then IdG (g1 ) = g1 and IdG (g2 ) = g2 . Now, let g1 ∗ g2 ∈ G then IdG (g1 ∗ g2 ) = g1 ∗ g2 = IdG (g1 ) ∗ IdG (g2 ). This prove that ϕ is a homomorphism. 4. Let (Z, +) and (G  , ·) groups, where G  = {1, −1} and · is the usual product between integers. Let define ϕ : Z → G  in the following form:

ϕ(n) =

1, if n is even −1, if n is odd

Is ϕ(n) a homomorphism? To answer this question, it is necessary to consider three cases: a. If x1 , x2 are even numbers then these numbers can be represented as x1 = 2k1 and x2 = 2k2 , where k1 , k2 ∈ Z. According to ϕ(n), ϕ(x1 ) = 1, ϕ(x2 ) = 1. Further, x1 + x2 = 2k1 + 2k2 = 2(k1 + k2 ) is even then ϕ(x1 + x2 ) = 1 = 1 · 1 = ϕ(x1 ) · ϕ(x2 ). b. If x1 , x2 are odd numbers then they can be represented in the form x1 = 2k1 + 1 and x2 = 2k2 + 1, where k1 , k2 ∈ Z. Further, x1 + x2 = (2k1 + 1) + (2k2 + 1) = 2(k1 + k2 + 1) is even, ϕ(x1 ) = −1 and ϕ(x2 ) = −1, then ϕ(x1 + x2 ) = 1 = (−1) · (−1) = ϕ(x1 ) · ϕ(x2 ). c. If x1 is even and x2 is a odd number, the sum x1 + x2 = 2k1 + 2k2 + 1 = 2(k1 + k2 ) + 1 is a odd number, ϕ(x1 ) = 1, ϕ(x2 ) = −1. On the other hand ϕ(x1 + x2 ) = −1 = (1) · (−1) = ϕ(x1 ) · ϕ(x2 ). Hence ϕ(n) is a homomorphism. 5. Let n ∈ N, G = Z and G  = Z/nZ = Zn . Let define ϕ : Z → Zn by ϕ(a) = [a]. This function is a homomorphism because ϕ(a + b) = [a + b] = [a] + [b] = ϕ(a) + ϕ(b). 6. Let C∗ = C \ {0} and the groups (C∗ , ∗) and (R+ , ·). Operations ∗ and · represent the usual product of complex numbers and real numbers respectively. Let ϕ : C∗ → R+ the function given by ϕ(z) = |z|. Show that ϕ is a homomorphism. The proof is left to the reader. Definition 2.16 A homomorphism ϕ : G → G  is said that is • Monomorphism if it is an injective function (one to one or 1–1). • Epimorphism if it is surjective (onto). • Isomorphism if it is a bijection (injective and surjective). Definition 2.17 An automorphism of a group one means an isomorphism of a group with itself.

28

2 Group Theory

Definition 2.18 If ϕ : G → G  is a isomorphism then G and G  are isomorphic. This characteristic is denoted by G ∼ = G. Proposition 2.5 If ϕ : G → G  is a homomorphism then Im ϕ < G  , where Im ϕ =  y ∈ G  | ϕ(x) = y, x ∈ G ⊂ G  . Proof Let y1 , y2 ∈ Im ϕ then y1 = ϕ(x1 ), y2 = ϕ(x2 ) ∈ G  , x1 , x2 ∈ G. In addition y1 y2 = ϕ(x1 )ϕ(x2 ) = ϕ(x1 x2 ) because ϕ is a homomorphism, then y1 y2 ∈ Im ϕ. Now, let y = ϕ(x) ∈ Im ϕ. By hypotheses ϕ is a homomorphism then ϕ(x)ϕ(x −1 ) = ϕ(x x −1 ) = ϕ(e) = e ∈ G  . In the same manner ϕ(x −1 )ϕ(x) = ϕ(x −1 x) = ϕ(e) = e ∈ G  . Therefore [ϕ(x)]−1 = ϕ(x −1 ) and Im ϕ < G  . This proves the assertion.  Proposition 2.6 Let ϕ : G → G  a homomorphism then ϕ is a monomorphism if and only if Ker ϕ = {0}. Proof Let x1 , x2 such that ϕ(x1 ) = ϕ(x2 ) then ϕ(x1 ) − ϕ(x2 ) = 0. From last equality we have ϕ(x1 − x2 ) = 0 because ϕ is a homomorphism. According to the hypothesis Ker ϕ = {0} then x1 − x2 ∈ Ker ϕ = {0}. Therefore x1 − x2 = 0 and this implies that x1 = x2 . This proves the sufficiency of the proposition. The necessity is left to the reader.  Theorem 2.3 If ϕ : G → G  a homomorphism then 1. Ker ϕ < G 2. For all a ∈ G, a −1 (Ker ϕ) a ⊂ Ker ϕ Proof Ker ϕ = ∅ since exists e ∈ G such that ϕ(e) = e . To prove the first assertion, let x, y ∈ Ker ϕ then ϕ(x y) = ϕ(x)ϕ(y) = e e = e . This implies that x y ∈ Ker ϕ and therefore Ker ϕ is closed. On the other hand, let x ∈ Ker ϕ then ϕ(x) = e . Note that ϕ(x −1 ) = e ϕ(x −1 ) = ϕ(x)ϕ(x −1 ) = ϕ(x x −1 ) = ϕ(e) = e . Therefore x −1 ∈ Ker ϕ and the first assertion is proved. For the second one, let a ∈ G, g ∈ Ker ϕ then ϕ(a −1 ga) = ϕ(a −1 )ϕ(g)ϕ(a) = ϕ(a −1 )e ϕ(a) = ϕ(a −1 )ϕ(a) = ϕ(a −1 a) = ϕ(e) = e . This implies that a −1 ga ∈ Ker ϕ for all g ∈ G. Therefore a −1 (Ker ϕ) a ⊂ Ker ϕ. In addition, Ker ϕ  G.  Example 2.5 Let ϕ : Z → Z given by ϕ(a) = 2a. Is ϕ a homomorphism? If ϕ is homomorphism, is it a monomorphism? To answer the first question, let a, b ∈ Z then ϕ(a) = 2a, ϕ(b) = 2b. On the other hand, ϕ(a + b) = 2(a + b) = 2a + 2b = ϕ(a) + ϕ(b), then ϕ is a homomorphism under the sum in Z. Is left as an exercise to verify that ϕ is not homomorphism under the product in Z. Note that Ker ϕ = {a ∈ Z | ϕ(a) = 2a = 0} then Ker ϕ = {0}. Thus ϕ is a monomorphism. Exercise 2.4 Check if the following functions are homomorphism with the sum and product. In addition to check if they are also monomorphisms. 1. ϕ : Z → G  where G  = {1, −1} and

ϕ(n) =

1, if n is even −1, if n is odd

2.3 Homomorphisms

29

2. ϕ : C∗ → C∗ the function given by ϕ(z) = |z| where C∗ = C \ {0}. Proposition 2.7 Let G a group and N a normal subgroup of G. There exist a epimorphism ϕ : G → G/N with Ker ϕ = N . Proof Let define ϕ : G → G/N given by ϕ(a) = [a]. Let a, b ∈ G then ϕ(ab) = [ab] = [a][b] = ϕ(a)ϕ(b) so that ϕ is a homomorphism. On the other hand ϕ is onto (by construction) then ϕ is a epimorphism because if a ∈ Ker ϕ then ϕ(a) = e then a∼ = e mod N . This implies that ae−1 ∈ N hence a ∈ N . The inverse demonstration is followed immediately and therefore Ker ϕ = N . 

2.4 The Isomorphism Theorems Theorem 2.4 (First theorem of isomorphisms) Let ϕ : G → G  , g ∈ G, ϕ(g) ∈ G  an epimorphism with kernel K then G/K ∼ = G. ¯ = ϕ(g). By construction ϕ¯ is onto Proof Let the function ϕ¯ : G/K → G  with ϕ(kg)  ¯ = ϕ(g) where kg ∈ G/K . then for all ϕ(g) ∈ G there exists g ∈ G such that ϕ(kg) ¯ 1 ) = ϕ(kg ¯ 2) To prove the injectivity of ϕ, we consider g1 , g2 ∈ G such that ϕ(kg ¯ 1 ) = ϕ(kg ¯ 2) then we will verify that kg1 = kg2 . According to ϕ¯ function, if ϕ(kg then ϕ(g1 ) = ϕ(g2 ) so that ϕ(g1 )ϕ(g2−1 ) = e and therefore ϕ(g1 g2−1 ) = e . Hence g1 g2−1 ∈ K and g2 g1−1 ∈ K . From these assertions it follows that g1 ∼ = g2 mod K and g2 ∼ = g1 mod K . Therefore g1 ∼ g2 , g2 ∼ g1 and [g1 ] = [g2 ] then kg1 = kg2 , this yields to ϕ¯ is injective. The last part of the proof is to check that ϕ¯ is a homomor¯ ¯ 1 g2 ) = ϕ(g1 g2 ) = phism. To check this, note that ϕ(kg ¯ 1 kg2 ) = ϕ(kkg 1 g2 ) = ϕ(kg ¯ 1 )ϕ(kg ¯ 2 ). Hence ϕ¯ is a isomorphism and G/K ∼  ϕ(g1 )ϕ(g2 ) = ϕ(kg = G. Theorem 2.5 (Second theorem of isomorphisms) Let G a group, H < G and N  G then H N < G, H ∩ N  H , N  H N and H N /N ∼ = H/(H ∩ N ). Theorem 2.6 (Third theorem of isomorphisms) Let G a group, N  G and K < N with K  G then G/K /N /K ∼ = G/N . Definition 2.19 Let G 1 , G 2 , . . . , G n groups. The direct product (or exterior product) denoted by G 1 × G 2 × · · · × G n is the set (a1 , a2 , . . . , an ) where each ai ∈ G for all i ∈ N and the operation in the direct product is defined component by component: (2.3) (a1 , a2 , . . . , an )(b1 , b2 , . . . , bn ) = (a1 b1 , a2 b2 , . . . , an bn ) Remark 2.4 G = G1 × G2 × · · · × Gn is a group whose identity element is (e1 , e2 , . . . , en ). The inverse element of (a1 , a2 , . . . , an ) is given by (a1−1 , a2−1 , . . . , an−1 ).

30

2 Group Theory

References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16.

Birkhoff, G., Mac Lane, S.: A Survey of Modern Algebra. Universities Press (1965) Dummit, D.S., Foote, R.M.: Abstract Algebra (Vol. 3). Hoboken: Wiley (2004) Fraleigh, J.B.: A First Course in Abstract Algebra. Addison-Wesley (2003) Hungerford, T.W.: Abstract Algebra: an Introduction. Cengage Learning (2012) Sontag, E.D.: Linear systems over commutative rings. A Survey. Ricerche di Automatica 7(1), 1–34 (1976) Bourbaki, N.: Theory of Sets. Springer, Berlin, Heidelberg (2004) Herstein, I.N.: Topics in Algebra. Wiley (2006) Lang, S.: Algebra. Springer (2002) Herstein, I.N.: Abstract Algebra. Prentice Hall (1996) Saracino, D.: Abstract Algebra: a First Course. Waveland Press (2008) Jacobson, N.: Lectures in Abstract Algebra, vol. 1. Van Nostrand, Basic Concepts (1951) Zariski, O., Samuel, P.: Commutative Algebra Vol. 1. Springer Science & Business Media (1958) Zariski, O., Samuel, P.: Commutative Algebra Vol. 2. Springer Science & Business Media (1960) Kunz, E.: Introduction to Commutative Algebra and Algebraic Geometry. Birkhuser (1985) Borel, A.: Linear Algebraic Groups (Vol. 126). Springer Science & Business Media (2012) Samuel, P., Serge, B.: Mthodes d’algbre abstraite en gomtrie algbrique. Springer (1967)

Chapter 3

Rings

Abstract This chapter contains introductory concepts to ring theory, beginning with basic definitions, followed by the definitions of ideals, homomorphisms and rings, then isomorphism theorems in rings are given, next properties about integer rings are introduced and finally polynomial rings are explained.

3.1 Basic Definitions Definition 3.1 A nonempty set R together with two binary operators + and ∗ is called ring if satisfies the following axioms [1–5]: 1. 2. 3. 4. 5. 6. 7. 8.

The operation a + b ∈ R for all a, b ∈ R. a + (b + c) = (a + b) + c for all a, b, c ∈ R. a + b = b + a for all a, b ∈ R. There exists 0 ∈ R such that a + 0 = a for all a ∈ R. For all a ∈ R there exists b ∈ R such that a + b = 0. In this case b = −a. Given a, b ∈ R, a ∗ b ∈ R for all a, b ∈ R. a ∗ (b ∗ c) = (a ∗ b) ∗ c for all a, b, c ∈ R. a ∗ (b + c) = a ∗ b + a ∗ c and (b + c) ∗ a = b ∗ a + c ∗ a for all a, b, c ∈ R.

Remark 3.1 The axioms 1–5 listed above correspond to an additive Abelian group under addition. Axioms 6 and 7 mean that a ring is a semigroup (closed and associative). Finally the last two axioms correspond to the distributivity property. Definition 3.2 It is said that a ring R is a ring with identity if there exists 1 ∈ R, 1 = 0 such that a ∗ 1 = 1 ∗ a = a for all a ∈ R. Definition 3.3 A ring R is a commutative ring if a ∗ b = b ∗ a for all a, b ∈ R.

© Springer Nature Switzerland AG 2019 R. Martínez-Guerra et al., Algebraic and Differential Methods for Nonlinear Control Theory, Mathematical and Analytical Techniques with Applications to Engineering, https://doi.org/10.1007/978-3-030-12025-2_3

31

32

3 Rings

Definition 3.4 Let R a ring, a ∈ R, a = 0. It said that a is a divisor of zero if there exists b ∈ R, b = 0 such that a ∗ b = 0 (right division) or there exists c ∈ R, c = 0 such that c ∗ a = 0 (left division). Example 3.1 In Z6 we have [3][2] = [0] then [3] and [2] are divisors of zero. Definition 3.5 Let R a ring with identity. It said that R is a ring with division if for all a ∈ R, a = 0 there exists b ∈ R such that ab = ba = 1. In this case b = a −1 . Example 3.2 In Z5 , [2][3] = 1 then [2] = [3]−1 . Analogously [3][2] = [1] then [3] = [2]−1 . Definition 3.6 A field is a ring with division and it is also commutative. Remark 3.2 A field is an Abelian group with the sum and the multiplication. Example 3.3 Z is not a field under the multiplication and sum. The proof is left to the reader as an exercise. Definition 3.7 A commutative ring with identity is an integral domain if ab = 0 then a = 0 or b = 0. In other words, an integral domain is a nonzero commutative ring with no nonzero zero divisors. Example 3.4 1. In Z6 , [2][3] = [6] = [0]. Note that [2] = [0] and [3] = [0] therefore is not an integral domain. 2. Z5 is a commutative ring with identity and every element different from [0] has inverse, for example [1]−1 = [1] or [2]−1 = [3]. In addition, Z5 is an integral domain. Is it a field? Remark 3.3 If p is prime number then Z p is a field. Exercise 3.1 Let M = M2×2 (R) =

    a b  a, b, c, d ∈ R cd 

with usual sum and multiplication of matrices in R. Let (I2 )i j = δi j the identity matrix, where  1, if i = j δi j = 0, otherwise. Verify that M is a ring with identity, not commutative and that is not an integral domain. Lemma 3.1 (Cancellation rules) Let a, b, c ∈ R if b + a = c + a then b = c.

3.1 Basic Definitions

33

Proof Applying directly the axioms of a ring: b = b + 0 = b + (a + (−a)) = (b + a) + (−a) = (c + a) + (−a) = c + (a + (−a)) = c + 0 = c therefore b = c.



Proposition 3.1 Let R be a ring, a, b ∈ R then [6, 7]: 1. 2. 3. 4.

a·0=0·a =0 a(−b) = (−a)(b) = −(ab) (−a)(−b) = (−(−a))b = ab If 1 ∈ R then (−1)a = −a

Proof Applying the axioms of a ring, mainly considering the additive inverse we have: 1. a · 0 + 0 = a · 0 = a · (0 + 0) = a · 0 + a · 0 then by cancellation rule a · 0 = 0. 2. a(−b) + a · b = a(−b + b) = a · 0 = 0, then a(−b) = −ab. 3. (−a)(−b) + (−a)b = (−a)(−b + b) = −a · 0 = 0, then (−a)(−b) = −(−a)b. 4. (−1)a + a = a(−1 + 1) = a · 0 = 0 then (−1)a = −a. 

3.2 Ideals, Homomorphisms and Rings Definition 3.8 A function ϕ : R → R  is a homomorphism if [1, 3–5, 8] 1. ϕ(a) + ϕ(b) = ϕ(a + b) 2. ϕ(a)ϕ(b) = ϕ(a · b) Definition 3.9 Let ϕ : R → R  a homomorphism of rings [9, 10], then 1. ϕ is monomorphism if it is injective (one-to-one or 1 − 1). 2. ϕ is epimorphism if it is surjective (onto). 3. ϕ is isomorphism if it is bijection (1 − 1 and onto). Definition 3.10 The kernel of ϕ is Ker ϕ = {x ∈ R | ϕ(x) = 0} ⊂ R. Proposition 3.2 Let ϕ : R → R  a homomorphism of rings, then [8, 11] 1. Ker ϕ is an additive subgroup. 2. If k ∈ Ker ϕ and r ∈ R then r k, kr ∈ Ker ϕ. 3. ϕ is 1 − 1 if and only if Ker ϕ = {0}. Proof We only prove the second assertion. Let k ∈ Ker ϕ and r ∈ R then ϕ(r k) = ϕ(r ) · ϕ(k) = ϕ(r ) · 0 = 0. In the same manner ϕ(kr ) = ϕ(k) · ϕ(r ) = 0 · ϕ(r ) = 0, thus r k, kr ∈ Ker ϕ. The proof of 1 and 3 are left to the reader. 

34

3 Rings

Definition 3.11 Let R a ring. A subset I of R is said to be an ideal de R if 1. I is an additive subgroup of R. 2. Let r ∈ R, a ∈ I then ra ∈ I and ar ∈ I (absorption property). Corollary 3.1 If ϕ : R → R  is an homomorphism, then Ker ϕ is an ideal. Proof It is immediate from Proposition 3.2.



Definition 3.12 Let R a ring [12–14], I an ideal of R then R/I (quotient ring) is a group with the sum of equivalence classes: (a + I ) + (b + I ) = (a + b) + I, a, b ∈ R and we define the product as: (a + I )(b + I ) = ab + I Remark 3.4 Let R a ring then {0} and R are rings of R called the trivial ideals. Definition 3.13 If I is an ideal of R and I = R, it is said that I is a proper ideal. Remark 3.5 Let R a ring with identity 1, I an ideal of R such that 1 ∈ I , then I = R. Since I is an ideal of R then I ⊂ R we should verify that R ⊂ I . Let a ∈ R, since a · 1 = a by definition a = a · 1 ∈ I (1 ∈ I ) then R ⊂ I . Therefore R = I . Example 3.5 Let R be a commutative ring with identity. Let a ∈ R and let define (a) = {ax | x ∈ R}. Prove that (a) is an ideal in R. It is left to the reader as an exercise. Definition 3.14 Let R a commutative ring with identity. A principal ideal is an ideal of the form (a) for some a ∈ R. Definition 3.15 Let R an integral domain. It is said that R is a principal ideals domain if every ideal of R is principal. Exercise 3.2 Let I, J ideals of a ring R. Let define I + J = {a + b | a ∈ I, b ∈ J }. Show that I + J and I ∩ J are ideals. Remark 3.6 The quotient ring R/I is a ring with the addition and product respectively: • (a + I ) + (b + I ) := (a + b) + I • (a + I )(b + I ) := ab + I We have the following assumption: Lemma 3.2 The function

is an epimorphism.

ϕ : R → R/I a → a + I

3.2 Ideals, Homomorphisms and Rings

35

Proof Note that ϕ is a homomorphism. Indeed, ϕ(a + b) = (a + b) + I = (a + I ) + (b + I ) = ϕ(a) + ϕ(b). On the other hand, ϕ(ab) = ab + I = (a + I )(b + I ) = ϕ(a)ϕ(b). By construction, function ϕ is onto. Thus ϕ is an epimorphism.  Exercise 3.3 Show that kernel of ϕ given by Ker ϕ = {a ∈ R | ϕ(a) = 0 + I } coincides with the ideal I .

3.3 Isomorphism Theorems in Rings Now we present three theorems about isomorphisms in ring theory. Theorem 3.1 (First theorem of isomorphisms) Let ϕ : R → R  an epimorphism of rings and K = Ker ϕ then R/K is isomorphic to R  , i.e., R/K ∼ = R. Theorem 3.2 (Second theorem of isomorphisms) Let R a ring, A subset of R (A is a sub-ring)and B and ideal of R then A + B is a sub-ring of R and A ∩ B is an ideal of A. In addition (A + B)/B ∼ = A/(A ∩ B) Theorem 3.3 (Third theorem of isomorphisms) Let I, J ideals of ring R with I ⊂ J then J/I is an ideal of R/I . In addition (R/I )/(J/I ) ∼ = R/J Remark 3.7 Let F a field (commutative ring with division), {0} is an ideal of F. Example 3.6 Let I an ideal of F, I = {0} then F = I . Proof Let a ∈ I, a = 0 then 1 = a −1 a ∈ I . On the other hand, let r ∈ F then r = 1 · r ∈ I hence F ⊂ I thus F = I . Example 3.7 Let n ∈ N, n > 1 and ϕ : Z → Z/nZ a → [a] ϕ is an epimorphism. Since Ker ϕ = {a ∈ Z | [a] = [0]} = {a ∈ Z | a ∼ = 0 mod n} = {a ∈ Z | a = nz, z ∈ Z} = nZ. Example 3.8 Let R = { f : [0, 1] → R | f is continuous}. We define: ( f + g)(x) = f (x) + g(x) ( f · g)(x) = f (x) · g(x) R is a commutative ring with identity. The proof is left to the reader.

36

3 Rings

 Example 3.9 Let R a group, I = f ∈ R | f ( 21 ) = 0 . I is an ideal. Show that R/I ∼ = R. Define ϕ:R→R f → f ( 21 ) ϕ is an homomorphism. Let f, g ∈ R then ϕ( f + g) = ( f + g)( 21 ) = f ( 21 ) + g( 21 ) = ϕ( f ) + ϕ(g) ϕ( f · g) = ( f · g)( 21 ) = f ( 21 ) · g( 21 ) = ϕ( f ) · ϕ(g) it is clear that ϕ is an epimorphism (by construction). Hence by using Theorem 3.1 R/I ∼ = R. Remark 3.8 In Chap. 10, we will focus on algebraic structures with a derivation. In the case of rings, any commutative ring can be viewed as a differential ring.

3.4 Some Properties of Integers 3.4.1 Divisibility Definition 3.16 Let a, b ∈ Z. It is said that b divides a if there exists q ∈ Z such that a = bq. In other words, b is a factor of a or a it is a multiple of b. Remark 3.9 The notation for divisibility of numbers is the following: • If b divides a, we write b | a. • In other case, i.e., if b not divides a, we write b  a. Theorem 3.4 Let a, b, c ∈ Z. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.

b |b∀b ∈Z b | 0 ∀ b ∈ Z \ {0} 1 | a and −1 | a ∀ a ∈ Z 0 | a if and only if a = 0 If b | 1 then b = ±1 If b | a and a | b then a = ±b If b | a and a | c then b | c If b | a and a | c then b | (a + c) and b | (a − c) b | a then b | (ac) ∀ c ∈ Z b | a and b | c then b | (as + ct) ∀ s, t ∈ Z

3.4 Some Properties of Integers

37

11. b | a ⇐⇒ b | −a ⇐⇒ −b | a ⇐⇒ −b | −a 12. b | a ⇐⇒ b | |a| ⇐⇒ |b| | a ⇐⇒ |b| | |a| Proof 1. According to definition of divisibility, we have to find a number q ∈ Z such that b = b · q. Since b = b · 1, it is obvious that q = 1 in the definition. 2. Since 0 = b · 0, in the definition of divisibility q = 0 for this case. 3. The results are immediately: a = 1 · a and a = (−1)(a). 4. If 0 divides a then there exists q ∈ Z such that a = 0 · q. It follows that a = 0. The rest of the proof is left to the reader. 5. By hypothesis, b | 1 then there exists q ∈ Z such that 1 = b · q then b = 0, q = 0 and 1 = |bq|. Since b, q ∈ Z and both numbers are not zero, then |b| ≥ 1 and |q| ≥ 1. If |b| > 1 then |q| < |b||q| = 1. This is a contradiction, then b = |1|, so that b = 1 or b = −1. 6. By hypothesis, there exist q1 , q2 ∈ Z such that a = bq1 and b = aq2 . Combining these expressions a = (aq2 )q1 . If a = 0 it is clear that b = 0 so that we consider a = 0 then 1 = q1 q2 . According to definition of divisibility, it follows that q1 | 1 and by 5 we have that q1 = ±1. Therefore a = b(1) = b and a = b(−1) = −b. 7. If b | a and a | c then there exist q1 , q2 ∈ Z such that a = bq1 and c = aq2 . Combining these expressions c = (bq1 )q2 = b(q1 q2 ). Let k = q1 q2 ∈ Z then it follows that b | c. 8. Similar as above, let q1 , q2 ∈ Z such that a = bq1 and c = aq2 (by hypothesis) then a + c = bq1 + bq2 = b(q1 + q2 ). Since q1 + q2 ∈ Z then b | (a + c). The rest of the proof is left to the reader. 9. By hypothesis there exists q ∈ Z such that a = bq. Let c ∈ Z then ac = (bq)c = b(qc). Since qc ∈ Z it follows that b | (ac). 10. If b | a and b | c then there exist q1 , q2 ∈ Z such that a = bq1 and c = bq2 . Let s, t ∈ Z then as + ct = (bq1 )s + (bq2 )t = b(q1 s + q2 t). Since q1 s + q2 t ∈ Z we conclude that b | (as + ct). The proof of the statements 11 and 12 are left to the reader.



Example 3.10 Let n ∈ Z such that (−10) | n then 5 | n. To prove this, according to the definition of divisibility, there exists k ∈ Z such that n = (−10)k. Since that −10 = (5)(−2), then n = 5(−2k) where −2k ∈ Z. We have an integer number m = −2k such that n = 5m. Therefore 5 | n. Example 3.11 Let x ∈ Z such that (−12) | x and x = 0 then |x| ≥ 12. To prove this, according to the definition of divisibility, there exists m ∈ Z such that x = (−12)m with m = 0. If m = 0 then x = 0 that contradicts the hypothesis x = 0. Since m ∈ Z and m = 0, it follows that |m| ∈ Z and |m| > 0 so that |m| ≥ 1. On the other hand |x| = |−12m| = 12|m| and considering that |m| ≥ 1 then |x| = 12|m| ≥ 12. Therefore |x| ≥ 12. The above example can be generalized as follows. Theorem 3.5 Let a, b ∈ Z such that b = 0 and a | b then |a| ≤ |b|.

38

3 Rings

Proof By definition, there exists k ∈ Z such that b = ak. If k = 0 then b = 0 that contradicts the hypothesis so that k = 0. Since k ∈ Z and k = 0 it follows that |k| ∈ Z and |k| > 0 then |k| ≥ 1. Taking the absolute value of b and considering that |k| ≥ 1 we have |b| = |ak| = |a||k| ≥ 1 · |a| = |a|. The proof is completed. 

3.4.2 Division Algorithm Theorem 3.6 If a, b ∈ Z and b = 0, then there exist q, r ∈ Z (unique), such that: a = bq + r,

0 ≤ r < |b|

Proof We define a set S = {a − bx | x ∈ Z}. Firstly we will verify that S ⊂ Z+ ∪ {0}. (a) b < 0. If b is negative then r b ≤ −1 this yields to b|a| ≤ −|a| ≤ a then a − b|a| ≥ 0. (b) b > 0. If b is positive then b ≥ 1. We have that b (−|a|) ≤ −|a| ≤ a then a − b(−|a|) ≥ 0. Hence S ⊂ Z+ ∪ {0}. On the other hand we state that r < |b|. Suppose that r is the smaller of the non-negative integers and r ∈ S, as well as we suppose that r ≥ |b| then r − |b| ≥ 0, since r ∈ S, there exists x = q ∈ Z such that r = a − bq then r − |b| = a − bq − |b|. We have the following  a − b(q − 1), if b > 0 r − |b| = a − b(q + 1), if b < 0 but r − |b| ∈ S, so • If b < 0 then a − b(q − 1) = a − bq + b = r + b < r . • If b > 0 then a − b(q + 1) = a − bq − b = r − b < r . Thus is not true that r ≥ |b| then r < |b|. The proof of uniqueness of q and r is left to the reader.  Remark 3.10 In the above algorithm, a is namely dividend, b is the divisor and the numbers q and r are quotient and residue, respectively.

3.4.3 Greatest Common Divisor Definition 3.17 Let a, b ∈ Z, not both zero. It is said that d ∈ Z, d > 0, is the greatest common divisor (gcd) of a and b if:

3.4 Some Properties of Integers

39

1. d | a and d | b 2. If c ∈ Z is such that c | a and c | b, then c | d Remark 3.11 If d is the greatest common divisor of a and b, it is very common to write d = (a, b) or d = gcd {a, b}. Lemma 3.3 If d = (a, b) then d is unique. Proof Suppose d  (a, b), then from the Definition 3.17, d | d  and d  | d. Finally,  from 6 in Theorem 3.4, d  = d. Lemma 3.4 Let a, b ∈ Z, b = 0. If b | a, then |b| = (a, b) Proof Consider statements 1. and 12. of Theorem 3.4. We have that b | b and by hypothesis b | a then |b| | b and |b| | a respectively. On the other hand, let c ∈ Z such that c | a and c | b then by the same Theorem, c | |b|. Therefore |b| = (a, b).  Lemma 3.5 Let a, b ∈ Z, not both zero. If a = bq + r for some q, r ∈ Z, then d = (a, b) if and only if d = (b, r ). Proof If b  a then

a = bq + r, 0 ≤ r ≤ |b|

(3.1)

¯ b = cq  . 1. Let c such that c | a and c | b then there exist ¯ q  ∈ Z such that a = cq,

q, From (3.1) r = a − bq = cq¯ − cq  q = c q¯ − q  q then c | r . 2. Let l such that l | b and l | r then there exist m, m¯ ∈ Z such that b = lm, r = l m. ¯ From (3.1) r = a − bq = a − lmq = l m¯ then a = l m¯ + lmq = l(m¯ + mq), i.e., l | a. Thus (a, b) = (b, r ).



Theorem 3.7 (The Euclidean Algorithm) Let a, b ∈ Z, not both zero, there exists d = (a, b). In addition, d is the minimum positive integer such that, there exist m, n ∈ Z such that d = am + bn (Bézout’s identity). Proof The methodology to find the greatest common divisor is given by the following manner. If b  a then a = bq + r, 0 ≤ r < |b| (3.2) If r | b then b = qr and r = (a, b). The algorithm is ended. On the other hand, if r  b then (3.3) b = q1 r + r1 , 0 ≤ r1 < |r | Suppose that r1 | r then r1 = (b, r ) and the algorithm is finished. If r1  r then

40

3 Rings

r = r1 q2 + r2 , 0 ≤ r2 < |r1 |

(3.4)

Continuing the same process, in step k + 1 if rk−1  rk−2 then rk−2 = rk−1 qk + rk , 0 ≤ rk ≤ |rk−1 |. Suppose that if rk | rk−1 then rk−1 = rk qk+1 , therefore rk = (rk−2 , rk−1 ) = . . . = (r, r1 ) = (b, r ) = (a, b) = d. On the other hand, from (3.2), r = a − bq = a + (−q)b = m 1 a + n 1 b where m 1 = 1, n 1 = −q ∈ Z. From (3.3) making algebraic manipulations: r1 = b − rq1 = b − (m 1 a + n 1 b)q1 = −m 1 q1 a + (1 − n 1 q1 )b = m 2 a + n 2 b, where m 2 = −m 1 q1 , n 2 = (1 − n 1 q1 ) ∈ Z. In the same manner, from (3.4): r2 = m 3 a + n 3 b, m 3 = m 1 − m 2 q2 , n 3 = n 1 − n 2 q2 ∈ Z In general, we obtain rk = m k+1 a + n k+1 b = ma + nb where m = m k+1 ,  n = n k+1 ∈ Z, hence d = ma + nb. Example 3.12 Calculate the greatest common divisor d of following pairs of numbers x, y. Also, find numbers u, v (Bézout coefficients) such that ux + vy = d. 1. d1 = (141, 96). Applying the Euclidean algorithm: 141 = 96(1) + 45 96 = 45(2) + 6 45 = 6(7) + 3 6 = 3(2) + 0 Hence d1 = (141, 96) = 3. Indeed, 191 = 3 · 47 and 96 = 3 · 32. On the other hand: 3 = 45 − 7(6) = 45 − 7 [96 − 45(2)] = 15(45) − 7(96) = 15 [141 − 96(1)] − 7(96) = 141(15) + 96(−22) So that the Bézout coefficients are u = 15, v = −22 and 141(15) + 96(−22) = 2115 − 2112 = 3. 2. d2 = (−2805, 119) = 17. Indeed, applying the Euclidean algorithm:

3.4 Some Properties of Integers

41

−2805 = 119(−24) + 51 119 = 51(2) + 17 51 = 17(3) + 0 In addition, it is not hard to obtain 17 = 119(−47) − 2805(−2). 3. d3 = (726, 275) = 11 and 11 = 726(11) + 275(−29). Arithmetic operations are left to the reader as an exercise. Proposition 3.3 If a | c, b | c and (a, b) = d then ab | cd. Proof By hypothesis, there are integers m, n ∈ Z such that c = am and c = bn. By the Bezout’s identity d = pa + qb where p, q ∈ Z, then cd = c ( pa + qb) = cpa + cqb = bnpa + amqb = ab (np + mq) , np + mq ∈ Z Hence ab | cd.



Definition 3.18 Let a, b ∈ Z no both zero. It is said that a and b are relatively prime (mutually prime or coprime) if the only positive integer that evenly divides both of them is 1, i.e., (a, b) = 1. Theorem 3.8 Let a, b ∈ Z. 1. If 1 = (a, b) and a | bc then a | c. 2. If 1 = (a, b), then 1 = (a, bn ) ∀ n ∈ N Proof 1. Since (a, b) = 1 then there exist m, n ∈ Z such that am + bn = 1, so that c = a(cm) + bc(n). On the other hand, since a | a and a | bc then a | (a(cm) + bc(n)). Therefore a | c. 2. Let A be the set of positive integers that satisfy the statement to prove. If 1 = (a, b), then 1 = (a, b1 ) so that 1 ∈ A. Suppose that n = k ∈ A, i.e., if 1 = (a, b), then 1 = (a, bk ), hence there exist m, n ∈ Z such that 1 = am + bk n. On the other hand, since (a, b) = 1 then there exist x, y ∈ Z such that 1 = ax + by, so that



1 = 1 · 1 = (ax + by) am + bk n = a axm + bym + xbk n + bk+1 (ny) Therefore 1 = (a, bk+1 ), i.e., k + 1 ∈ A and A = N. Based in the mathematical  induction principle, if 1 = (a, b), then 1 = (a, bn ) ∀ n ∈ N. Definition 3.19 It is said that p ∈ Z is a prime number, if p > 1 and the only positive divisors of p are 1 and itself. Theorem 3.9 Let a, b ∈ Z and let p a prime number. Then 1. p | a or 1 = (a, p). 2. Euclid’s lemma. If p | ab then p | a or p | b.

42

3 Rings

Proof 1. Let d = (a, p) then d | a and d | p. Since p is prime then d = p or d = 1. If d = p then p | a. On the other hand if d = 1 it follows that 1 | a and 1 | p then 1 = (a, p). 2. Suppose p  a then 1 = ( p, a). By hypotheses p | ab then by Theorem 3.8 p | b. In the same manner, it is proved that p | a.  Lemma 3.6 If a ∈ Z and a > 1, then the smallest integer greater than 1 and that divides a, is a prime number. Proof Let B = {m ∈ N | m > 1 and m | a}. Since a > 1 and a | a then a ∈ B, so that B = ∅. By the well-ordering principle, there exists p ∈ B such that p ≤ m ∀ m ∈ B. If p is not prime then there exists q ∈ Z with 1 < q < p such that q | p. Since p | a then q | a so that q ∈ B which is a contradiction. Therefore p is prime.  Theorem 3.10 (Fundamental theorem of arithmetic) If a ∈ Z and a > 1, then a is prime or there exist p1 , p2 , . . . , pk prime numbers such that: a = p1 · p2 · . . . · pk Furthermore, if q1 , q2 , . . . , qm are prime numbers such that a = q1 · q2 · . . . · qm then m = k and qi = p j for some i, j = 1, 2, . . . , k. 

Proof The proof is left to the reader as an exercise.

Example 3.13 Let a = 2241756, b = 8566074. The unique factorization of both numbers is: 2241756 = 22 · 34 · 11 · 17 · 37 8566074 = 2 · 34 · 112 · 19 · 23 In addition, (a, b) = 2 · 34 · 11.

3.4.4 Least Common Multiple All the integers numbers a1 , a2 , . . . , an different from zero have a common multiple m if ai | b ∀1 ≤ i ≤ n. A common multiple of a1 , a2 , . . . , an is m = a1 · a2 · · · an =

n i=1

ai

3.4 Some Properties of Integers

43

since m = a1 (a2 · · · an ) m = a2 (a1 · · · an ) .. . m = an (a1 · a2 · · · an−1 ) It should be mentioned that there may be several common multiples, but definitely, there is one minimum multiple. Definition 3.20 Let A = {a1 , a2 , a3 , . . . , an } a set of integers. The least common multiple (mc) is the smallest positive integer that is divisible by all the numbers ai ∈ A, 1 ≤ i ≤ n. This number is denoted by  mc = [a1 , a2 , a3 , . . . , an ] Theorem 3.11 If b is any common multiple of a1 , a2 , a3 , . . . , an , then  mc | b. Proof Since b is any multiple of a1 , a2 , a3 , . . . , an then ai | b ∀1 ≤ i ≤ n. Let h = [a1 , a2 , a3 , . . . , an ]. Applying the division algorithm to h and b: b = h · q + r, 0 ≤ r < |h|. Let suppose that r = 0 then r > 0 or r < 0. On the other hand, we know that ai | b ∀1 ≤ i ≤ n and ai | h then ai | r , where r = b + h(−q), so that r is a multiple of ai < n. This is a contradiction to the division algorithm and the fact that h is the least common multiple. Therefore r = 0, b = h · q and h | b.  Theorem 3.12 Let m > 0, then 1. [ma, mb] = m[a, b] 2. [a, b](a, b) = |a · b| Proof 1. Let h = [ma, mb] then ma | h and mb | h so that there are integers q1 , q2 such that h = (ma)q1 = m(aq1 ) = a(mq1 ) and h = (mb)q2 = m(bq2 ) = b(mq2 ). It is clear that m | h, a | h and b | h then h = m · h 1 . Let h 2 = [a, b] then a | h 2 and b | h 2 so that ma | mh 2 and mb | mh 2 where mh 2 is another common multiple of ma and mb. This implies that h | mh 2 and mh | mh 2 so that h 1 | h 2 . On the other hand ma | mh 1 and mb | mh 1 then a | h 1 and b | h 2 , hence h 2 | h 1 . Therefore h 1 = h 2 and [ma, mb] = m[a, b]. 2. The proof of this item is left to reader as an exercise.  Example 3.14 According to a previous example, (141, 96) = 3. From the previous theorem, the least common multiple of these numbers is: [141, 96] =

|141 · 96| 25 · 32 · 47 = 4512 = 3 (141, 96)

44

3 Rings

3.5 Polynomials Rings Definition 3.21 The polynomial ring, F[x] in x over a field F is defined as the set of polynomials f (x), i.e., F[x] = { f (x) = a0 x0 + a1 x1 + · · · + an xn | n ∈ Z, n ≥ 0, ai ∈ F} where a0 , a1 , . . . , an are the coefficients of p(x). Definition 3.22 Let f (x) = a0 + a1 x + · · · + an x n ∈ F[x], an = 0. The degree of a polynomial f (x), written d◦ f (x) is the largest n such that the coefficient of x n is not zero. In this case the coefficient an is called the leading coefficient. If an = 1, f (x) is called monic. Example 3.15 Consider F = R, let f (x) = x 6 − 7x 4 + 8x 2 + 2x − 7, g(x)= 2x 3 + 5 then f (x), g(x) ∈ R[x] with d◦ f (x) = 6 and d◦ g(x) = 3. Definition 3.23 Let f (x), g(x) ∈ F[x], where f (x) = a0 + a1 x + · · · + an x n and g(x) = b0 + b1 x + · · · + bn x n . It is said that f (x) equals g(x) if ai = bi ∀i = 0, 1, 2, . . . , n. Definition 3.24 Let f (x), g(x) ∈ F[x], where f (x) = a0 + a1 x + · · · + an x n and g(x) = b0 + b1 x + · · · + bm x m • The sum of polynomials f (x) and g(x) is the polynomial given by: f (x) + g(x) = (a0 + b0 ) + (a1 + b1 ) x + · · · + (as + bs ) x s where s = max {n, m}. • The product of polynomials f (x) and g(x) is the polynomial given by: f (x) · g(x) = c0 + c1 x + · · · + cl x l where l = m + n, ck =

ai b j = a0 bk + a1 bk−1 + · · · + ak b0 , k = 0, 1, . . . ,

i+ j=k

n + m, i = 0, 1, . . . , n and j = 0, 1, . . . , m. Exercise 3.4 F[x] is a commutative ring with identity. The proof is left to the reader as an exercise. Example 3.16 Let p(x) = ax 2 + bx + c. Calculate the values of a, b, c so that the next equality holds:

3.5 Polynomials Rings

45

x 3 − 17x 2 + 54x − 8 = (x − 4) p(x) According to the product of polynomials defined above, we have that (x − 4) p(x) = ax 3 + (b − 4a) x 2 + (c − 4b) x − 4c then, by Definition 3.23: a=1 b − 4a = −17 c − 4b = 54 −4c = −8 Finally, we find a = 1, b = 13, c = 2 and p(x) = x 2 − 13x + 2. After performing operations between polynomials, it is common to ask what happens to the degree of the resulting polynomial. The following proposition establishes what is the degree of this polynomial. Proposition 3.4 Let f (x), g(x) ∈ F[x] with f (x) = 0 and g(x) = 0, then: 1. d◦ ( f (x) + g(x)) ≤ max {d◦ f (x), d◦ g(x)} 2. d◦ ( f (x) · g(x)) = d◦ f (x) + d◦ g(x) Proof Let f (x) = a0 + a1 x + · · · + an x n and g(x) = b0 + b1 x + · · · + bm x m . Since f (x) = 0 and g(x) = 0, then an = 0 and bm = 0, so that d◦ f (x) = n and d◦ g(x) = m. 1. Consider that m > n or m < n. Suppose that f (x) + g(x) = 0 then f (x) + g(x) = (a0 + b0 ) + (a1 + b1 ) x + · · · + (as + bs ) x s where s = max {n, m}, hence d◦ ( f (x) + g(x)) = s = max {d◦ f (x), d◦ g(x)} On the other hand, if m = n then f (x) + g(x) = (a0 + b0 ) + (a1 + b1 ) x + · · · + (as + bs ) x s where a j + b j = 0 for some j = 0, 1, . . . , s. Let k ∈ {0, 1, . . . , s} the max such that ak + bk = 0 then d◦ ( f (x) + g(x)) = k ≤ s = max {d◦ f (x), d◦ g(x)}

46

3 Rings

2. By definition, f (x) · g(x) = c0 + c1 x + · · · + cl x l , where ck =

ai b j and

i+ j=k

l = m + n. It is clear that cl = 0 so that f (x) · g(x) = 0, then d◦ ( f (x) · g(x)) =  l = n + m = d◦ f (x) + d◦ g(x). Exercise 3.5 Let f (x), g(x) ∈ F[x] with f (x) = 0 and g(x) = 0, c ∈ F, c = 0. Prove that: 1. d◦ f (x) ≤ d◦ ( f (x) · g(x)). 2. d◦ c f (x) = d◦ f (x). 3. d◦ ( f (x) + c) = d◦ f (x). Taking as a starting point the division algorithm for integers, the following theorem states that given f (x), g(x) ∈ F[x], g(x) = 0, there are only q(x), r (x) ∈ F[x] such that f (x) = q(x)g(x) + r (x) where d◦ r (x) < d◦ g(x) or r (x) = 0. Theorem 3.13 (Division Algorithm for Polynomials) If f (x), g(x) ∈ F[x], g(x) = 0, there are only q(x), r (x) ∈ F[x] such that f (x) = g(x)q(x) + r (x) where r (x) = 0 or d◦ r (x) < d◦ g(x). The polynomials q(x) and r (x) are called quotient and residue respectively. Example 3.17 If the polynomial x 5 − 5x 4 + 10x 3 − 10x 2 + 5x + 9 is divided by x − 1, to find the quotient q(x) and residue r (x), we consider the following arrangement: x 4 − 4x 3 + 6x 2 − 4x + 1 x −1 x 5 − 5x 4 + 10x 3 − 10x 2 + 5x + 9 − x5 + x4 − 4x 4 + 10x 3 4x 4 − 4x 3 6x 3 − 10x 2 − 6x 3 + 6x 2 − 4x 2 + 5x 4x 2 − 4x x +9 −x +1 10 Therefore q(x) = x 4 − 4x 3 + 6x 2 − 4x + 1 and r (x) = 10. According to the division algorithm for polynomials:

x 5 − 5x 4 + 10x 3 − 10x 2 + 5x + 9 = (x − 1) x 4 − 4x 3 + 6x 2 − 4x + 1 + 10

3.5 Polynomials Rings

47

Similar to the division algorithm for polynomials, the greatest common divisor for polynomials can be defined and the Euclidean algorithm can be used. Example 3.18 Let f (x) = 16x 3 + 36x 2 − 12x − 18 and g(x) = 8x 2 − 2x − 3. The greatest common divisor of f (x) and g(x) is obtained in the same manner as for integer numbers: 16x 3 + 36x 2 − 12x − 18 = (8x 2 − 2x − 3)(2x + 5) + 4x − 3 8x 2 − 2x − 3 = (4x − 3)(2x + 1) + 0 hence gcd( f (x), g(x)) = 4x − 3, where: 8x − 2x − 3 2

2x + 5 , 16x + 36x − 12x − 18 − 16x 3 + 4x 2 + 6x 3

2

4x − 3

2x + 1 8x − 2x − 3 − 8x 2 + 6x 2

40x 2 − 6x − 18 − 40x 2 + 10x + 15 4x − 3

4x − 3 − 4x + 3 0

On the other hand, it is left as an exercise to the reader, verify that:

gcd 2x 3 − 11x 2 + 10x + 8, 2x 3 + x 2 − 8x − 4 = 2x 2 − 3x − 2.

References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13.

Birkhoff, G., Mac Lane, S.: A Survey of Modern Algebra. Universities Press (1965) Dummit, D.S., Foote, R.M.: Abstract Algebra, vol. 3. Hoboken, Wiley (2004) Fraleigh, J.B.: A First Course in Abstract Algebra. Addison-Wesley (2003) Lang, S.: Algebra. Springer (2002) Saracino, D.: Abstract Algebra: A First Course. Waveland Press (2008) Herstein, I.N.: Abstract Algebra. Prentice Hall (1996) Hungerford, T.W.: Abstract Algebra: An Introduction. Cengage Learning (2012) Jacobson, N.: Lectures in Abstract Algebra, vol. 1. Van Nostrand, Basic concepts (1951) Kunz, E.: Introduction to Commutative Algebra and Algebraic Geometry. Birkhuser (1985) Samuel, P., Serge, B.: Mthodes d’algbre abstraite en gomtrie algbrique. Springer (1967) Herstein, I.N.: Topics in Algebra. Wiley (2006) Bourbaki, N.: Commutative Algebra, vol. 8. Hermann, Paris (1972) Zariski, O., Samuel, P.: Commutative Algebra, vol. 1. Springer Science & Business Media (1958) 14. Zariski, O., Samuel, P.: Commutative Algebra, vol. 2. Springer Science & Business Media (1960)

Chapter 4

Matrices and Linear Equations Systems

Abstract In this chapter basic concepts about matrices and linear algebra is introduced, in order to present the information in a easy to understand manner, the first subject is algebraic operations with real numbers, then linear operation with real number vectors, followed by the definition of a matrix and its basic operations, next the change of basis for a matrix is given, finally Gauss Jordan method is presented.

4.1 Properties of Algebraic Operations with Real Numbers The following list describes the properties of the addition of real numbers. 1. Associative property. For every real numbers α, β, γ , (α + β) + γ = α + (β + γ ). 2. Commutative property. For every real numbers α, β, α + β = β + α. 3. Existence of an additive identity element. There exist a real number γ such that for every α ∈ R, α + γ = α. This number is unique and it is called the zero number (0). 4. Existence of an additive inverse. For every real number α there exists β ∈ R such that α + β = 0. It is easy to show that for every number α ∈ R, the number β with the above property is unique. This number is called additive inverse of α and β = −α. For example, if a + c = b + c then a = b. To demonstrate this assertion, consider the above properties. According to additive identity property 3, a = a + 0. On the other hand, a + 0 = a + (c + (−c)) considering the existence of additive inverse property 4. Continuing this argumentation we have the following: = a + (c + (−c))  = (a + c) + (−c) a  = a + 0  3

4

1

= 

(b + c) + (−c)

By hypothesis

(b + c) + (−c)  = b + (c + (−c))  = b + 0  = b 1

4

3

© Springer Nature Switzerland AG 2019 R. Martínez-Guerra et al., Algebraic and Differential Methods for Nonlinear Control Theory, Mathematical and Analytical Techniques with Applications to Engineering, https://doi.org/10.1007/978-3-030-12025-2_4

49

50

4 Matrices and Linear Equations Systems

The properties of the multiplication of real numbers are listed below. 5. Associative property. For every real numbers α, β, γ , (αβ) γ = α (βγ ). 6. Commutative property. For every real numbers α, β, αβ = βα. 7. Existence of a multiplicative identity element. There exist a real number δ such that for every α ∈ R, αδ = α. This number is unique and it is called the one number (1). 8. Existence of a multiplicative inverse. For every real number α = 0 there exists β ∈ R such that αβ = 1. It is easy to show that for every number α ∈ R, the number β with the above property is unique. This number is called multiplicative inverse of α and β = α −1 . Finally the distributive property of multiplication respect to addition that describes the multiplication of a sum by a factor, where each summand is multiplied by this factor and the resulting products are added: 9. Distributive property. For every real numbers α, β, γ , α (β + γ ) = αβ + αγ (left) and (β + γ ) α = βα + γ α (right). In addition, 1 = 0. According to these properties, the set of real numbers R is called a field. A field F is a set together with two operations, usually called addition and multiplication, and denoted by + and ·, respectively, such that the above properties (axioms) hold. The rational numbers Q and complex numbers C are important examples of fields.

4.2 The Set Rn and Linear Operations A list of length n is an ordered collection of n objects. The of n objects can be numbers or abstract entities, for example: (x1 , . . . , xn ) In particular, when n = 2, the list of length 2 is called an ordered pair or if n = 3, the list is called ordered triple [1–3]. Definition 4.1 A list of n real numbers is called n-tuple of real numbers. If x is a n-tuple of real numbers then its entries are x1 , x2 , . . . , xn . This n-tuple can be write as follows:  n x = x j j=1 If j ∈ {1, . . . , n} then x j is the j-th component of x. Considering the above definition, we can establish the condition for the tuples equality. Let x a n-tuple and y a m-tuple, then x and y are equal if and only if m = n and x j = y j ∀ j ∈ {1, . . . , n}. Definition 4.2 The set of every n-tuples of real numbers is called the set Rn .

4.2 The Set Rn and Linear Operations

51

4.2.1 Linear Operations in Rn The addition of n-tuples and multiplication of a number by a n-tuples are defined by components, that is, element by element. Let x, y ∈ Rn , and α ∈ R. The addition of 2 n-tuples is given by n  n   n x + y = x j j=1 + y j j=1 = x j + y j j=1 where x + y is an element of Rn whose j-th component is (x + y) j = x j + y j for j ∈ {1, . . . , n}. On the other hand, the multiplication of a n-tuple by one number is defined as follows: n   n αx = α x j j=1 = αx j j=1 where αx is an element of Rn whose j-th component is (αx) j = αx j for j ∈ {1, . . . , n}. Example 4.1 Consider the case n = 4. Let x, y ∈ R4 given by ⎡ ⎤ π √ ⎢ 5 2⎥ ⎥ x= ⎢ ⎣ e ⎦, 2

⎡√ ⎤ π ⎢ 8 ⎥ ⎥ y= ⎢ ⎣ e5 ⎦ 7π

and α = 2, then ⎡√ ⎤ ⎡ √ ⎤ ⎡ ⎤ ⎡√ ⎤ ⎡ √ ⎤ π + √π π π π 2 π √ ⎢ 8 ⎥ ⎢ 16 ⎥ ⎢ 5 2⎥ ⎢ 8 ⎥ ⎢ 8 + 5 2 ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ x+y=⎢ ⎣ e ⎦ + ⎣ e5 ⎦ = ⎣ e + e5 ⎦ , αy = 2 ⎣ e5 ⎦ = ⎣ 2e5 ⎦ 7π 7π 14π 2 2 + 7π Remark 4.1 The n-tuples of real numbers that satisfy the above operations, are called vectors and the numbers that are multiply by these vectors usually are called scalars. The properties of real numbers as associativity and commutativity of addition, they are also satisfied by the tuples. Similarly, a zero tuple and an opposite tuple (seen as the additive inverse) are defined. Then, for the addition in Rn : 1. Associative property of Rn For every x, y, z ∈ Rn (x + y) + z = x + (y + z). 2. Commutative property of Rn For every x, y ∈ Rn x + y = y + x. It is easy to check the properties 1 and 2. For the first one,

52

4 Matrices and Linear Equations Systems

(x + y) + z

   n   n n x j j=1 + y j j=1 + z j j=1

= 



By notation

= 

x j + yj

Definition of sum in Rn



Definition of sum in Rn



=  = 

Associative property 1 for R

= 

n j=1

 n + z j j=1

 n x j + y j + z j j=1

n  x j + y j + z j j=1

 n n  x j j=1 + y j + z j j=1

= 

   n  n  n x j j=1 + y j j=1 + z j j=1

= 

x + (y + z)

Definition of sum in Rn Definition of sum in Rn By notation

On the other hand, to verify the property 2, first one we check that the tuples x + y and y + x have the same length. Indeed, for the definition of addition in Rn we conclude that both tuples have length n. Let an index k ∈ {1, . . . , n} and show that the k − th component of x + y is equal to the k − th component of y + x: (x + y)k

= 

xk + yk

= 

yk + xk

= 

(y + x)k

Definition of sum in Rn Commutative property 2 for R Definition of sum in Rn

We have shown that the corresponding components of the tuples x + y and y + x are equal. The n-tuple [0]nk=1 , denoted by 0n is called the nulln-tuple. This tuple has length n and every component of this tuple is zero, that is, 0n ∈ Rn ∀ k ∈ {1, . . . , n}. Equivalently (0n )k = 0. This tuple is the neutral element (identity element) of the sum in Rn , that is, x + 0n = 0n + x = x ∀ x ∈ Rn . To verify this assertion, it is sufficient to verify the first (or the second) equality because the sum is commutative in Rn . Let x = [xk ]nk=1 then by notation and definition of the null n-tuple, we have: x + 0n = [xk ]nk=1 + [0]nk=1 . Now, according to definition of sum in Rn , [xk ]nk=1 + [0]nk=1 = [xk + 0]nk=1 and finally by the additive identity element 3 and definition of x, [xk + 0]nk=1 = [xk ]nk=1 = x. On the other hand, the element in Rn there exist an element −x ∈ Rn such that x + (−x) = 0n . This element is unique and is called the opposite tuple or additive inverse in Rn of x. It is easy to check that x + (−x) = (−x) + x = 0n for every x ∈ Rn . Finally, it is left as an exercise to demonstrate the following properties: 1. Distributive property of multiplication by scalars in Rn respect to addition of scalars. For every α, β ∈ R and x ∈ Rn , (α + β) x = αx + βx.

4.2 The Set Rn and Linear Operations

53

2. Distributive property of multiplication by scalars in Rn respect to addition of vectors. For every α ∈ R and x, y ∈ Rn , α (x + y) = αx + αy. 3. For every x ∈ Rn , 1 ∈ R, 1x = x. 4. For every α, β ∈ R and x ∈ Rn , α (βx) = (αβ) x.

4.3 Background of Matrix Operations Letx f : R4 → R2 defined by ⎡ ⎤ x1   ⎢x2 ⎥ ⎥ −→ y1 f :⎢ ⎣x3 ⎦ y2 x4 where y1 = 3x2 + 5x3 , y2 = 3x1 − 4x2 + 5x3 . Later, it will define what is a matrix and a linear transformation and the relationship between them. For now, we will say that a matrix is used to describe a linear transformation. According to function f [4, 5]. Definition 4.3 Let F a field. Let m, n ∈ N. A matrix m × n with entries in F is a rectangular array with m rows and n columns with elements of F: ⎛

a11 a12 ⎜ a21 a22 ⎜ A=⎜ . . ⎝ .. .. am1 am2

··· ··· .. .

⎞ a1n a2n ⎟ ⎟ .. ⎟ . ⎠

· · · amn

More strictly, an m × n matrix over the field F is a function A from the set of pairs of integers (i, j), 1 ≤ i ≤ m, 1 ≤ j ≤ n, into the field F. The entriesof the  matrix A are the scalars ai, j . Therefore, a matrix can be represented by A = ai j i, j . Note that in later chapters the entries of matrices can be different than elements of fields, so they can be elements of rings or even other matrices, which are more general sets and with specific features.     Definition 4.4 Let A = ai j i, j , B = bi j i, j ∈ Mm×n (F). Let c ∈ F. The sum (or addition) of matrices is given by [1, 2, 6, 7]:   A + B = ai j + bi j i, j The multiplication of one matrix by a scalar c is given by:   c · A = c · ai j i, j

54

4 Matrices and Linear Equations Systems

    Definition 4.5 Let A = ai j i, j ∈ Mm×n (F), B = b jk j,k ∈ Mn× p (F). The product AB is the matrix C = (cik )i,k ∈ Mm× p (F) such that [7–9]: cik =

n 

ai j b jk

j=1

    Proposition 4.1 Let A = ai j i, j ∈ Mm×n (F), B = b jk j,k ∈ Mn× p (F), C = (ckl )k,l ∈ M p×q (F). Then: A(BC) = (AB)C A(B + C) = AB + AC (A + B)C = AC + BC k(AB) = (k A)B     Proof 1.  Let A = ai j ∈ Mm×n , B = b jk ∈ Mn× pand C = [ckl ] ∈ M p×q , p defining α jl = BC = k=1 b jk ckl and [βik ] = AB = nj=1 ai j b jk then: 1. 2. 3. 4.

γil = A (BC) = Aα jl  p   =A b jk ckl k=1

=

n 

ai j

 p 

j=1

=

 b jk ckl

k=1

p n  

  ai j b jk ckl

j=1 k=1

Also δil = (AB) C = βik C ⎛ ⎞ n  =⎝ ai j b jk ⎠ C j=1

⎛ ⎞ p n   ⎝ = ai j b jk ⎠ ckl k=1

j=1

n  p

=

k=1 j=1

  ai j b jk ckl

4.3 Background of Matrix Operations

55

   p n   n  p Since j=1 j=1 ai j b jk ckl it is concluded that k=1 ai j b jk ckl = k=1 A(BC) = (AB) C.      2. Let A = ai j ∈ Mm×n , B = b jk ∈ Mn× p and C = c jk ∈ Mn× p , by defining D = d jk = B + C = b jk + c jk , it is easy to see that the matrix A (B + C) equals AD =

n 

ai j d jk

j=1

=

n 

  ai j b jk + c jk

j=1

And AB + AC is AB + AC =

n 

ai j b jk +

j=1

=

n 

n 

ai j c jk

j=1

  ai j b jk + c jk

j=1

Thus AD = AB + AC showing that A (B  + C) = AB + BC is true. 3. Having A = ai j ∈ Mm×n , B = bi j ∈ Mm×n and C = c jk ∈ Mn× p , and making D = di j = A + B = ai j + bi j causes that: (A + B) C = DC n  di j c jk = j=1

=

n    ai j + bi j c jk j=1 n    ai j c jk + bi j c jk j=1

Also AC + BC cause: AC + BC =

n  j=1

=

n 

bi j c jk

j=1

n    ai j c jk + bi j c jk j=1

Hence (A + B) C = AC + BC

ai j c jk +

56

4 Matrices and Linear Equations Systems

    4. With A = ai j ∈ Mm×n , B = b jl ∈ Mn× p and k ∈ F it can be seen that: ⎛ ⎞ n  k (AB) = k ⎝ ai j b jl ⎠ j=1

=

n 

kai j b jl

j=1

The other side of the equation yields: n    kai j b jl

(k A) B =

j=1 n 

=

kai j b jl

j=1

It follows that k(AB) = (k A) B.



Definition 4.6 The identity matrix In ∈ Mn×n (F) is given by [4–7]: ⎛

In = (δi j )i, j

1 ⎜0 ⎜ ⎜ .. ⎜. ⎜ ⎜0 =⎜ ⎜0 ⎜ ⎜0 ⎜ ⎜. ⎝ ..

0 1 .. . 0 0 0 .. .

··· ··· .. . ··· ··· ··· .. .

0 0 .. .

0 0 .. .

0 0 .. .

1 0 0 .. .

0 1 0 .. .

0 0 1 .. .

··· ··· .. . ··· ··· ··· .. .

⎞ 0 0⎟ ⎟ .. ⎟ .⎟ ⎟ 0⎟ ⎟ 0⎟ ⎟ 0⎟ ⎟ .. ⎟ .⎠

0 0 ··· 0 0 0 ··· 1

with δi j the Kronecker delta:  δi j =

1, if i = j 0, if i = j

Proposition 4.2 If A ∈ Mm×n (F), B ∈ Mn× p (F), then AIn = A, In B = B Proof Let ci j be an element of AIn with: ci j = ai1 δ1 j + ai2 δ2 j + . . . + ai j δ j j + . . . + ain δn j

(4.1)

4.3 Background of Matrix Operations

57

According to the definition of In ci j = ai j proving that AIn = A.



Definition 4.7 A matrix A ∈ Mn×n (F) is invertible [10, 11] if there exist a matrix B ∈ Mn×n (F) such that AB = B A = In In the above definition, if the matrix B exists, then this matrix is unique. This matrix is called the inverse of A and is denoted by A−1 . Proposition 4.3 If A ∈ Mn×n (F) is an invertible matrix, then A−1 is invertible and (A−1 )−1 = A. Proof The expression implies that A is invertible so A−1 A = In and A A−1 = In it  follows that A−1 is also invertible whose inverse is A. Proposition 4.4 If A, B ∈ Mn×n (F) are invertible matrices, then AB is invertible and (AB)−1 = B −1 A−1 Proof The expression implies that A and B are invertible: A A−1 = A−1 A = In B B −1 = B −1 B = In With:      (AB) B −1 A−1 = A B B −1 A−1 = (AIn ) A−1 = A A−1 = In And: 

     B −1 A−1 (AB) = B −1 A−1 A B   = B −1 In B = B −1 B = In

It is possible to conclude that (AB)−1 = B −1 A−1 .



Proposition 4.5 Let A ∈ Mn×n (F) is an invertible matrix and c ∈ F, c = 0. Then c A is invertible and (c A)−1 = c−1 A−1

58

4 Matrices and Linear Equations Systems

Proof Since A is invertible  −1 −1  c A = c−1 A−1 c A c A = cc−1 A−1 A = 1In = In The other side of the equality is: (c A)−1 (c A) = In Therefore (c A)−1 = c−1 A.



Definition 4.8 If A ∈ Mm×n (F), the transpose of A, denoted by A is the matrix n × m defined by   A = a ji i, j Proposition 4.6 Let A, B matrices and c ∈ F, when the next operations are well defined, we have: (A + B) = A + B  (c A) = c A (AB) = B  A   If A is invertible, then A is invertible and (A )−1 = A−1       Proof 1. Let A = ai j ∈ Mm×n , B = bi j ∈ Mm×n , A T = a ji ∈ Mn×m and B T = b ji ∈ Mn×m , then: 1. 2. 3. 4.

T  (A + B)T = ai j + bi j = a ji + b ji And A T + B T = a ji + b ji T Allows to conclude = A T + B T   that (A + B) T 2. Consider A = ai j ∈ Mm×n , A = a ji ∈ Mn×m and c = c T ∈ F making the T T T T T T operations  (cA) = c A = c A proves that (c A) = c A 3. Let A = ai j ∈ Mm×n , B = b jk ∈ Mn× p , Defining

C = (AB)T n  c ji = aik bk j k=1

4.3 Background of Matrix Operations

59

And D = B T AT n  d ji = bk j aik k=1

 n T T T Shows that nk=1  aik bk j = k=1 bk j aik so (AB) = B A 4. Consider A = ai j ∈ Mn×n is invertible, it follows:  T  T A T A−1 = A−1 A = InT =I  −1  −1 T = A . Then A T has the inverse A T



Definition 4.9 Let V a finite-dimensional vector space over a field F, with dim V = n. Let {v1 , . . . , vn } and v1 , . . . , vn basis of V . The change of basis matrix.

4.4 Gauss-Jordan Method   Definition 4.10 Let A = ai j i, j ∈ Mm×n (F) and c ∈ F. An elementary row operation is function e which associated with each A ∈ Mm×n (F) an e(A) ∈ Mm×n (F), i.e. 1. Multiplication of one row of A by a non-zero scalar c: e(A)i j = Ai j , if i = r , e(A)r j = c Ar j . 2. Replacement of the r − th row of A by row r plus c times row s. c is any scalar and r = s: e(A)i j = Ai j , if i = r , e(A)r j = Ar j + c As j . 3. Interchange of two rows of A: e(A)i j = Ai j , if i = r, i = s, e(A)r j = As j , e(A)s j = Ar j . Definition 4.11 It said that an elementary matrix is a matrix which is obtained after operating the identity matrix with one of the three elementary row operations. There are three types of elementary matrices, which correspond to three types of row operations: 1. Row switching: A row i within the matrix In can be switched with another row j (elemental matrix of first type).

60

4 Matrices and Linear Equations Systems

⎞ 0 0⎟ ⎟ .. ⎟ .⎟ ⎟ · · · 0⎟ ⎟ · · · 0⎟ ⎟ · · · 0⎟ ⎟ . . .. ⎟ . .⎠ 0 0 ··· 0 0 0 ··· 1



0 ··· 0 0 0 1 ··· 0 0 0 .. . . .. .. .. . . . . . 0 ··· 0 1 0 0 ··· 1 0 0 0 ··· 0 0 1 .. . . .. .. .. . . . . .

1 ⎜0 ⎜ ⎜ .. ⎜. ⎜ ⎜0 Ii j = ⎜ ⎜0 ⎜ ⎜0 ⎜ ⎜. ⎝ ..

··· ··· .. .

(4.2)

2. Row multiplication: Each element in a row i of In can be multiplied by a non-zero constant c ∈ F (elemental matrix of second type). ⎞ ··· 0 · · · 0⎟ ⎟ . . .. ⎟ . .⎟ ⎟ · · · 0⎟ ⎟ · · · 0⎟ ⎟ · · · 0⎟ ⎟ . . .. ⎟ . .⎠ 0 0 ··· 0 0 0 ··· 1



1 ⎜0 ⎜ ⎜ .. ⎜. ⎜ ⎜0 Ii (c) = ⎜ ⎜0 ⎜ ⎜0 ⎜ ⎜ .. ⎝.

0 ··· 0 0 0 1 ··· 0 0 0 .. . . .. .. .. . . . . . 0 ··· 1 0 0 0 ··· 0 c 0 0 ··· 0 0 1 .. . . .. .. .. . . . . .

(4.3)

3. Row addition: A row i of In can be replaced by the sum of that row i and a multiple of another row j (elemental matrix of third type). ⎛

1 ⎜0 ⎜ ⎜ .. ⎜. ⎜ ⎜0 Ii j (c) = ⎜ ⎜0 ⎜ ⎜0 ⎜ ⎜ .. ⎝.

0 1 .. . 0 0 0 .. .

··· ··· .. .

··· ··· ··· .. .

0 0 .. .

0 0 .. .

0 0 .. .

1 0 0 .. .

c 1 0 .. .

0 0 1 .. .

··· ··· .. .

··· ··· ··· .. .

0 0 ··· 0 0 0 ···

⎞ 0 0⎟ ⎟ .. ⎟ .⎟ ⎟ 0⎟ ⎟ 0⎟ ⎟ 0⎟ ⎟ .. ⎟ .⎠ 1

Lemma 4.1 The elementary matrices are invertible, i.e.:  −1 1. Ii j = Ii j −1 2. (Ii (c)) = Ii (c−1 )  −1 3. Ii j (c) = Ii j (−c)

(4.4)

4.4 Gauss-Jordan Method

61

Proof 1. Consider the matrix Ii j defined by the row swapping ri = r j , then the next expression is true: Ii j Ii j = I 2. Consider the elementary matrix Ii (c) formed by the row operation ri = cri and the matrix Ii−1 (c) defined by the row operation ri = 1c ri , then: Ii (c) Ii−1 (c) = Ii−1 (c) Ii (c) =I 3. The elementary matrix Ii j (c) is defined by the row operation ri = ri + ar j and another matrix Ii−1 j (c) defined by ri = ri − ar j , so that: −1 Ii j (c) Ii−1 j (c) = Ii j (c) Ii j (c) =I



  Proposition 4.7 Let A = ai j i, j ∈ Mm×n (F) and X an elemental matrix. The product X A is equivalent to realize an elemental row operation to matrix A. Proof Let r be an elementary row operation and X = r (I ) the resulting elementary matrix of such operation. The proof will consider the three previously stated elementary matrix operations, in the first case if the rows α = β in the operation α = β make the next elementary matrix:  δik i = α Xi j = δβk i = α Thus XA =

m 

 X ik Ak j =

k=1

Aik i = α Aβ j i = α

The second case has the operation α = cβ with α = β having the elementary matrix be:  Xi j =

δik i = α cδβk i = α

62

4 Matrices and Linear Equations Systems

And the operation XA =

m 

 X ik Ak j =

k=1

Aik i = α c Aβ j i = α

For the final case consider the rows α = β in the operation α = α + cβ, then the elementary matrix is:  X ik =

δik i = α δαk + cδβk i = α

Making XA =

m  k=1

 X ik Ak j =

Aik i = α Aα j + c Aβ j i = α

showing that X A = r (A).    Definition 4.12 Let A = ai j i, j ∈ Mm×n (F). The matrix A is called row-reduced if: 1. The first non-zero entry in each non-zero row of A is equal to 1. 2. Each column of A which contains the leading non-zero entry of some row has all its other entries 0. The Gauss-Jordan elimination method consists in to apply successively elementary row operations in order to bring the matrix A to a row-reduced matrix. Finally, we consider the algorithm to determine if an matrix isinvertible and if it is  possible, to calculate the inverse of a square matrix. Let A = ai j i, j ∈ Mn×n (F). We place the identity matrix In to the right of the matrix A and we obtain a matrix n × 2n. This matrix (augmented matrix) is defined by M A = [A | In ] Now, it is necessary to realize elementary row operations to the augmented matrix M A in a consecutive way until to obtain a row-reduced matrix. If A is invertible, this last obtained matrix is equal to In = A−1 . Therefore, if in the first block of the last matrix, after the process of elementary operations, to appear the identity matrix, then the matrix A is invertible and A−1 is the matrix in the second block. If in the first block, the matrix is not the identity, then the matrix A is not an invertible matrix [10, 11].

4.5 Definitions

63

4.5 Definitions Definition 4.13 A linear equation over F is an expression of the form: a1 x 1 + a2 x 2 + . . . + an x n = b

(4.5)

with a1 , a2 , . . . , an , b ∈ F and x1 , x2 , . . . , xn are independent variables. A solution of (4.5) is (α1 , α1 , . . . , αn ) such that: a1 α1 + a2 α2 + . . . + an αn = b Now, we consider a system of m linear equations with n unknowns: a11 x1 + a12 x2 + . . . + a1n xn = b1 a21 x1 + a22 x2 + . . . + a2n xn = b2 .. . am1 x1 + am2 x2 + . . . + amn xn = bm

(4.6)

where ai j ∈ F, 1 ≤ i ≤ m, 1 ≤ j ≤ n and x1 , x2 , . . . , xn are independent variables. If b1 = b2 = . . . = bm = 0 the system (4.6) is said to be homogeneous. A solution of (4.6) is (α1 , α1 , . . . , αn ) such that: a11 α1 + a12 α2 + . . . + a1n αn = b1 a21 α1 + a22 α2 + . . . + a2n αn = b2 .. . am1 α1 + am2 α2 + . . . + amn αn = bm Finally, the solution or general solution of (4.6) consists in all the possible solutions. If the system of linear equations is homogeneous, then we have two possibilities: 1. The unique solution is the trivial solution. 2. There are more solutions. If the system of linear equations is not homogeneous, there are two possibilities: 1. The system is inconsistent (there are not solutions). 2. The system is consistent (there are solutions). In this case, there are two situations: the solution is unique or there exists more solutions. Theorem 4.1 Let a homogeneous system of m linear equations with n unknowns. If n > m then the system has only one (nontrivial) solution in F. Proof The proof is left to the reader as an exercise.



64

4 Matrices and Linear Equations Systems

References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12.

Hoffman, K., Kunze, R.: Linear Algebra. Prentice-Hall, Englewood Cliffs (1971) Axler, S.: Linear Algebra Done-Right. Springer, New York (1997) Grossman, S.: Elementary Linear Algebra, Wadsworth Publishing, Belmont (1987) Lipschutz, S.: Linear Algebra. McGraw Hill, New York (2009) Nef, W.: Linear Algebra. Dover Publications, New York (1967) Lang, S.: Linear Algebra. Springer, New York (2004) Anton, H.: Elementary Linear Algebra. Wiley, New York (2010) Whitelaw, T.: Introduction to Linear Algebra. Chapman & Hall/CRC, Boca Raton (1992) Kolman, B.: Introductory Linear Algebra an Applied First Course. Pearson Education (2008) Gantmacher, F.R.: The Theory of Matrices, vol. 1. Chelsea, New York (1959) Gantmacher, F.R.: The Theory of Matrices, vol. 2. Chelsea, New York (1964) Nering, E.: Linear Algebra and Matrix Theory. Wiley, New York (1970)

Chapter 5

Permutations and Determinants

Abstract This chapter introduces basic concepts about permutations and determinants of matrices, the first part contains permutation groups explained through various definitions and theorems, the second part is about determinants of matrices, a method to obtain the determinant of a matrix is given along various other useful results.

5.1 Permutations Group Definition 5.1 Let X a nonempty set. The permutations group in X , denoted by S X is the set of bijective functions of X itself. The elements of S X are called permutations (of the elements of X ) [1–4]. Proposition 5.1 Let X a nonempty set. 1. 2. 3. 4.

σ, τ ∈ S X then σ ◦ τ ∈ S X . ∀ σ, τ, ρ ∈ S X , (σ ◦ τ ) ◦ ρ = σ ◦ (τ ◦ ρ). There exists id X ∈ S X such that σ ◦ id X = id X ◦ σ = σ ∀ σ ∈ S X . Let σ ∈ S X there exists τ = σ −1 ∈ S X such that σ ◦ τ = τ ◦ σ = id X .

Proof 1. The proof 1, 2 and 4 are left as an exercise to the reader. 2. 3. Having σ ∈ S X and x ∈ X , then (σ ◦ id X ) (x) = σ (id X (x)) = σ (x) and (id X ◦ σ ) (x) = id X (σ (x)) = σ (x) then it is possible to conclude σ ◦ id X = id X ◦ σ = σ 4.  Corollary 5.1 Let σ, τ, ρ ∈ S X . 1. If σ ◦ τ = σ ◦ ρ, then τ = ρ. 2. If σ ◦ ρ = τ ◦ ρ, then σ = τ .

© Springer Nature Switzerland AG 2019 R. Martínez-Guerra et al., Algebraic and Differential Methods for Nonlinear Control Theory, Mathematical and Analytical Techniques with Applications to Engineering, https://doi.org/10.1007/978-3-030-12025-2_5

65

66

5 Permutations and Determinants

 −1  −1 −1 Proof  −11. σ ◦ τ = σ ◦ ρ ⇒ σ ◦ (σ ◦ τ ) = σ ◦ (σ ◦ ρ) ⇒ σ ◦ σ ◦ τ = σ ◦ σ ◦ ρ ⇒ id X ◦ τ = id X ◦ ρ ⇒ τ = ρ 2. The procedure is similar to 1 and it is left as an exercise to the reader.  Proposition 5.2 If X is a finite set with n elements, then S X has n! = 1 · 2 · . . . · n elements. In this case S X = Sn . 

Proof It is left as an exercise to the reader.

Remark 5.1 Let suppose n > 1. If X = {1, 2, . . . , n} and ϕ ∈ Sn , then ϕ is a bijection of X in itself: ϕ : X −→ X 1 −→ i 1 2 −→ i 2 .. . n −→ i n We write:



1 2 ··· n ϕ= i1 i2 · · · in



Let n, k ∈ N, with k ≤ n. Let i 1 , i 2 , . . . , i k different integers in X = {1, 2, . . . , n}. We write   i1 i2 · · · ik to represent the permutation σ ∈ Sn given by σ (i 1 ) = i 2 , σ (i 2 ) = i 3 , . . . , σ (i k−1 ) = i k , σ (i k ) = i 1 and σ (x) = x ∀x ∈ X, x ∈ / {i 1 , i 2 , . . . , i k }.   Definition 5.2 It said that a permutation i 1 i 2 · · · i k is called cycle of order k or k-cycle. If k = 2 , the 2-cycle is called transposition. Proposition 5.3 If σ is a k-cycle then σ k = σ ◦ · · · ◦ σ = id Proof It is left to the reader as an exercise.



Definition 5.3 It is said that two cycles are disjoint if they have elements in common. Lemma 5.1 Let σ, τ ∈ Sn . If σ and τ are disjoint cycles then σ ◦ τ = τ ◦ σ . In this case we say that σ and τ commute.

5.1 Permutations Group

67

Proof It is left to the reader as an exercise.



Theorem 5.1 Every permutation in Sn is a product of disjoint cycles pairs. This is uniquely except the order and 1-cycle [2, 5–7]. Proof It is left to the reader as an exercise.



Theorem 5.2 Every permutation in Sn is a product of transpositions. Proof It is left to the reader as an exercise.



Theorem 5.3 Any permutation in Sn is the product of an even number of transpositions or product of an odd number of transpositions but not both. Proof Consider the permutations: (ab) (ac1 · · · ch bd1 · · · dk ) = (bd1 · · · dk ) (ac1 · · · ch ) Making h or k be 0 causes that c or d does not occur, using in both sides i ∈ {1, 2, . . . , n} shows that the equation holds. Since (ab)−1 = ab multiplying both sides of the equation by (ab) yields: (ab) (bd1 · · · dk ) (ac1 · · · ch ) = (ac1 · · · ch bd1 · · · dk ) The value of N is N (ac1 · · · ch bd1 · · · dk ) = h + k + 1 and N ((bd1 · · · dk ) (ac1 · · · ch )) = h + k, it follows that N (ab) (α) = N (α) − 1 if a and b occur in the same operation in the decomposition of α, otherwise N (ab) (α) = N (α) + 1 if a and b are present in different m operations, hence if α is a product of m transpositions εi where εi = ±1, changing an εi = −1 for 1 equals then, N (1) = 0, N (α) = i=1 adding 2 thus, it does not affect the parity, if the same is applied to εi = 1 the final sum will be m that has the same parity as N (α).  Definition 5.4 The parity of a permutation σ ∈ Sn is: 1. An even permutation if the product of an even number of transpositions. 2. An odd permutation if the product of an odd number of transpositions.

5.2 Determinants Let σ ∈ Sn the permutations group of {1, . . . , n}. The sign of σ is given by the following rule:  1, if σ is even. σ (−1) = −1, if σ is odd.

68

5 Permutations and Determinants

  Definition 5.5 Let A = ai j i, j ∈ Mn×n (F). The determinant of A is: det(A) =



(−1)σ a1σ (1) a2σ (2) · · · anσ (n)

σ ∈Sn

Remark 5.2 The determinat of A is denoted by | A | or by [7–10] a11 a12 · · · a1n a21 a22 · · · a2n .. .. . . an1 an2 · · · ann   Definition 5.6 Let A = ai j i, j ∈ Mn×n (F).   Theorem 5.4 Let A = ai j i, j ∈ Mn×n (F). The determinant of A is equal to the sum of the obtained products through the multiplication of elements of any row or column by their respectively cofactors.

Proof Let A = ai j and Ai j denote the cofactor of ai j , then for any i or j |A| = ai1 Ai1 + · · · + ain Ain = a1 j A 1 j + · · · + an j A n j Having| A| = A T leads to ∗ ∗ |A| = ai1 Ai1 + · · · + ain Ain

where Ai∗j is the sum of terms involving no entry of the ith row of A. Consider i = n and j = n makes the sum of terms in |A| containing ann is ann A∗nn = ann



sign (σ ) a1σ (1) a2σ (2) · · · an−1σ (n 1 )

σ

The sum of all permutations σ ∈ Sn for σ (n) = n which equals to adding over the permutations {1, . . . , n − 1} so A∗nn = |Mnn | = (−1)n+n |Mnn |. Considering every other i and j the ith row is exchanged with each succeeding row and the jth column is exchanged with each succeeding column, this operation does not change the determinant Mi j , but the sign of |A| and Ai∗j does change by n − 1 and n − j accordingly, so Ai∗j = (−1)n−i+n− j Mi j = (−1)i+ j Mi j Showing that the theorem is true.



5.2 Determinants

69

  Proposition 5.4 Let A = ai j i, j ∈ Mn×n (F). det(A ) = det(A)



Proof Let A = ai j and A T = bi j with a ji = bi j , then,  T  A = sign (σ ) b1σ (1) · · · bnσ (n) = sign (σ ) aσ (1)1 · · · aσ (n)n σ ∈Sn

σ ∈Sn

With τ = σ −1 then sign (τ ) = sign (σ ) and aσ (1)1 · · · aσ (n)n = a1τ (1) · · · anτ (n) then T  A = sign (τ ) a1τ (1) · · · anτ (n) σ ∈Sn

But σ spans across all Sn,τ = σ −1 that also spans through all elements of Sn thus T |A| = A .    Definition 5.7 Let A = ai j i, j ∈ Mn×n (F). 1. 2. 3. 4.

A is said to be superior triangular if ai j = 0 and i > j. A is said to be inferior triangular if ai j = 0 and i < j. A is said to be triangular if it is superior triangular or inferior triangular. A is said to be diagonal if ai j = 0 and i = j.

Proposition 5.5 The determinant of a triangular matrix is the product of its entries in the diagonal [1, 10–12]. Proof It is left to the reader as an exercise.



Corollary 5.2 The determinant of a diagonal matrix is the product of its entries in the diagonal. Proof It is left to the reader as an exercise.   Proposition 5.6 Let A = ai j i, j ∈ Mn×n (F) [5, 9, 12].



1. If B is obtained through the interchange between two rows (or columns) of A, then det B = − det A. 2. If B is obtained through the multiplication between a row (or a column) of A and a scalar c, then det B = c det A. 3. If B is obtained through the sum of a multiple of row (or a column) of A and then det B = det A.

70

5 Permutations and Determinants

Proof 1. Let τ be a transposition that interchanges the numbers corresponding to

the columns of A that are exchanged, if A = ai j and B = bi j then bi j = aiτ ( j) , for a permutation σ b1σ (1) · · · bnσ (n) = a1(τ ◦σ )(1) · · · an(τ ◦σ )(n) So |B| =



sign (σ ) b1σ (1) · · · bnσ (n) =

σ ∈Sn



sign (σ ) a1(τ ◦σ )(1) · · · an(τ ◦σ )(n)

σ ∈Sn

Being τ an odd permutation makes sign (τ ◦ σ ) = sign (τ ) sign (σ ) = −sign (σ ) and sign (σ ) = −sign (τ ◦ σ ), hence |B| =



sign (τ ◦ σ ) a1(τ ◦σ )(1) · · · an(τ ◦σ )(n)

σ ∈Sn

Since σ spans through Sn , τ ◦ σ which also spans across Sn , thus |B| = − |A| 2. Multiplying the jth row by k, then every element of |A| is multiplied by k so |B| = k |A| so |B| =



sign (σ ) a1i1 · · · ka jin · · · anin

σ

=k



sign (σ ) a1i1 · · · anin

σ

= k |A| 3. The k-th row is added c times to the j-th row of A, using ˆ to denote the j-th position of a term in the determinant makes: |B| =



sign (σ ) a1i1 · · · caˆ kik + a ji j · · · anin

σ

=c



sign (σ ) a1i1 · · · akiˆ k · · · anin +

σ



sign (σ ) a1i1 · · · a ji j · · · anin

σ

With this result it is possible to conclude that |B| = c · 0 + |A| = |A|. Theorem 5.5 Let det : Mn×n (F) −→ F with A −→ det A. Then

det (AB) = det (A) det (B)



5.2 Determinants

71

Proof If A is singular then AB is also singular, so |AB| = 0 = |A| |B|, if A is nonsingular A = E n · · · E 1 then |AB| = |E n · · · E 1 B| = |E n | · · · |E 1 | |B| = |A| |B|



Corollary 5.3 Let A ∈ Mn×n (F). If A is an invertible matrix then det A = 0 and   det A−1 = (det A)−1 Proof A is invertible A A−1 = In , then |A| A−1 = |In | = 1 so |A| = 0 and −1 Since−1 A = |A| .  Corollary 5.4 Let A, P ∈ Mn×n (F). If P is an invertible matrix then   det P −1 A P = det A Proof The previous statement leads to: −1 P A P = P −1 |A| |P| = |P|−1 |A| |P| = |P|−1 |P| |A| = |In | |A| = 1 |A| = |A| 

Showing that the corollary is true.

Remark 5.3 Note that if T : V −→ V is a linear operator, where V is a finitedimensional vector space in F then, it is possible to define the determinant of T : det T = det A where A is the associated matrix to T respect any basis of V .   Definition 5.8 Let A = ai j i, j ∈ Mn×n (F). The adjoint matrix of A is the transposed matrix of the matrix cofactors, i.e.: ⎛

A11 ⎜ A21 ⎜ Adj (A) = ⎜ . ⎝ ..

A21 A22 .. .

A1n A2n where Ai j is the cofactor of ai j .

⎞ · · · An1 · · · An2 ⎟ ⎟ . . .. ⎟ . . ⎠ · · · Anm

72

5 Permutations and Determinants

Theorem 5.6 Let A ∈ Mn×n (F), then A · Adj (A) = det (A) · In



the ith row of A be (ai1 , . . . , ain ) Proof Having A = ai j and Aadj   (A) = bi j makes and the jth column of adj (A) is A j1 , . . . , A jn , the bi j element of Aadj (A) is given by bi j = ai1 A j1 + ai2 A j2 + · · · + ain A jn which can also be represented as  bi j =

|A| i f i = j 0 i f i = j

The matrix Aadj (A) is a diagonal matrix with every diagonal element of |A| showing that A (adj (A)) = |A| I and (adj (A) A) = |A| I .  Corollary 5.5 Let A ∈ Mn×n (F), then A is invertible if and only if det (A) = 0 and A−1 =

1 · Adj (A) det (A) 

Proof It is left to the reader as an exercise.

Finally, we consider a system of linear equations with n equations and n variables: a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 .. . an1 x1 + an2 x2 + · · · + ann xn = bn ⎛

Let   A = ai j i, j

a11 ⎜a21 ⎜ =⎜ . ⎝ ..

a21 a22 .. .

··· ··· .. .

⎫ ⎪ ⎪ ⎪ ⎪ ⎬ ⎪ ⎪ ⎪ ⎪ ⎭

(5.1)

⎞ a1n a1n ⎟ ⎟ .. ⎟ . ⎠

a1n a2n · · · ann

where A is called the coefficients matrix of the system (5.1). Let Δ = det (A) for i ∈ {1, . . . , n}: a11 · · · a1i−1 b1 a1i+1 · · · a1n a21 · · · a2i−1 b2 a2i+1 · · · a2n Δi = . . . .. .. . . .. .. . . .. . . . . an1 · · · ani−1 bn ani+1 · · · ann

5.2 Determinants

73

In linear algebra, Cramer’s rule is an explicit formula for the solution of a system of linear equations with as many equations as unknowns, valid whenever the system has a unique solution. It expresses the solution in terms of the determinants of the (square) coefficient matrix and of matrices obtained from it by replacing one column by the vector of right hand sides of the equations. The following theorem explains the general method for Cramer’s rule. Theorem 5.7 If Δ = 0, then the system of linear equations (5.1) has a unique solution given by: Δi , 1≤i ≤n xi = Δ Proof The system AX = B has a unique solution if and only if A is invertible which in turn is only invertible if Δ = |A| = 0, suppose that Δ = 0, this causes A−1 = (1/Δ) adj (A). Multiplying AX = B by A−1 yields: X = A−1 AX = (1/Δ) adj (A) B The ith row of (1/Δ) adj (A) B is (1/Δ) (A1i , . . . , Ani ), if B = (b1 , . . . , bn )T , then: xi = (1/Δ) (b1 A1i + · · · + bn Ani ) And since Δi = b1 A1i + · · · + bn Ani yields xi =

Δi Δ

.



References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12.

Hoffman, K., Kunze, R.: Linear Algebra. Prentice-Hall (1971) Axler, S.: Linear Algebra Done-Right. Springer (1997) Nef, W.: Linear Algebra. Dover Publications (1967) kolman, B.: Introductory Linear Algebra an Applied First Course. Pearson Education (2008) Lipschutz, S.: Linear Algebra. McGraw Hill (2009) Gantmacher, F.R.: The Theory of Matrices, vol. 2. Chelsea, New York (1964) Halmos, P.R.: Finite-Dimensional Vector Spaces. Courier Dover Publications (2017) Whitelaw, T.: Introduction to Linear Algebra. Chapman & Hall/CRC (1992) Nering, E.: Linear Algebra and Matrix Theory. Wiley (1970) Anton, H.: Elementary Linear Algebra. Wiley (2010) Grossman, S.: Elementary Linear Algebra. Wadsworth Publishing (1987) Gantmacher, F.R.: The Theory of Matrices, vol. 1. Chelsea, New York (1959)

Chapter 6

Vector and Euclidean Spaces

Abstract This chapter is about vector spaces and Euclidean spaces, the first part presents definitions and concepts about vector spaces and subspaces, followed by generated subspace, linear dependence and independence, bases and dimension, quotient space, Euclidean spaces and finally the GramSchidt process is given, every section includes several proposed and solved exercises.

6.1 Vector Spaces and Subspaces Definition 6.1 It said that a field [1–5] is a set F with two operations: +:F×F→F ·:F×F→F such that: 1. Addition is commutative: x + y = y + x ∀ x, y ∈ F 2. Addition is associative: x + (y + z) = (x + y) + z ∀ x, y, z ∈ F 3. There is a unique element 0 ∈ F called identity element (or neutral element) for addition, such that: x +0=0+x = x ∀x ∈F

© Springer Nature Switzerland AG 2019 R. Martínez-Guerra et al., Algebraic and Differential Methods for Nonlinear Control Theory, Mathematical and Analytical Techniques with Applications to Engineering, https://doi.org/10.1007/978-3-030-12025-2_6

75

76

6 Vector and Euclidean Spaces

4. For all x ∈ F there exists a unique element (−x) ∈ F such that: x + (−x) = (−x) + x = 0 5. Multiplication is commutative: x · y = y · x ∀ x, y ∈ F 6. Multiplication is associative: x · (y · z) = (x · y) · z ∀ x, y, z ∈ F 7. There exists a unique non-zero element 1 ∈ F called identity element (or neutral element) for multiplication, such that: x ·1=1·x = x ∀x ∈F 8. To each non-zero x ∈ F there corresponds a unique element x −1 (equivalently 1/x) such that: x x −1 = x −1 x = 1 9. Multiplication distributes over addition: x · (y + z) = x · y + x · z ∀ x, y, z ∈ F Remark 6.1 Note that in C, 0 = 0 + 0i and 1 = 1 + 0i. Example 6.1 Q, R and C are examples of fields of very common use in mathematics. The set   √ Q√2 = p + q 2 | p, q ∈ Q is a field. Definition 6.2 A vector space over a field F (or linear space) is a set V together with two operations [6–9], +:V ×V →V ·:F×V → V such that 1. Addition is commutative: α + β = β + α ∀ α, β ∈ V

6.1 Vector Spaces and Subspaces

77

2. Addition is associative: α + (β + γ ) = (α + β) + γ ∀ α, β, γ ∈ V 3. There is a unique vector 0 ∈ V , called the zero vector (neutral additive), such that α+0=0+α =α ∀α ∈ V 4. For each vector v ∈ V there is a unique vector −v ∈ V called inverse additive, such that v + (−v) = (−v) + v = 0 5. (x · y) · α = x · (y · α) ∀ α ∈ V, ∀ x, y ∈ F (associativity) 6. (x + y) · α = x · α + y · α ∀ α ∈ V, ∀ x, y ∈ F (distributivity of scalar multiplication with respect to field addition). 7. x · (α + β) = x · α + x · β ∀ α, β ∈ V, ∀ x ∈ F (distributivity of scalar multiplication with respect to vector addition). 8. Identity element of scalar multiplication: 1 · α = α ∀ α ∈ V Remark 6.2 The objects of the set V are called vectors and the elements of F are called scalars. The subtraction of two vectors and division by a (non-zero) scalar can be defined as α − β = α + (−β), ∀ α, β ∈ V α/x = (1/x) · α ∀ α ∈ V, ∀ x = 0, x ∈ F Proposition 6.1 Let V a vector space over a field F, then: (I) (II) (III) (IV) (V)

If β + α = γ + α for some α, β, γ ∈ V then β = γ . x · 0 = 0 ∀ x ∈ F. 0·α =0 ∀α ∈ V If x · α = 0 for some x ∈ F and α ∈ V then x = 0 or α = 0. −1 · α = −α ∀ α ∈ V

Proof According to axioms of a vector space: 1. β = β + 0 = β + (α + (−α)) = (β + α) + (−α) = (γ + α) + (−α) = γ + (α + (−α)) = γ + 0 = γ . 2. Note that x · 0 = x · 0 + 0 and x · 0 = x · (0 + 0) = x · 0 + x · 0 then x · 0 + x · 0 = x · 0 + 0. By I, x · 0 = 0. 3. In the same manner, 0 + 0 · α = 0 · α = (0 + 0)α = 0 · α + 0 · α. By I, 0 · α = 0.

78

6 Vector and Euclidean Spaces

4. By hypothesis, x · α = 0. if x the proof is finished. Let suppose that x = 0 then exists x −1 such that x −1 x = 1 then α = 1 · α = (x −1 x)α = x −1 (xα) = x −1 · 0 = 0 from II. Hence α = 0. 5. Note that α + (−1)α = 1 · α + (−1)α = (1 + (−1))α = 0 · α = 0 = α + (−α). Then from I it follows that −1 · α = −α.  Definition 6.3 Let V a vector space over a field F. A subspace of V is a subset nonempty W of V which is itself a vector space over F with the operations of vector addition and scalar multiplication on V [10–13]. The subspace W is a vector space with the addition and scalar multiplication of F, inherited from V , that is to say, W is closed under the addition and scalar multiplication originally defined in V . To prove that a set W of V is a subspace of V , it suffices to prove: 1. W = ∅. 2. If w1 , w2 ∈ W and x, y ∈ F then xw1 + yw2 ∈ W . Theorem 6.1 A nonempty subset W of V is a subset of V if and only if for each pair of vectors w1 , w2 ∈ W and each scalar x ∈ F, the vector xw1 + w2 is again in W . 

Proof The proof is left to the reader.

Definition 6.4 If W1 , W2 , . . . , Wk are subspaces of V , then the sum of subspaces is given by W1 + W2 + · · · + Wk = {w1 + w2 + · · · + wk | w1 ∈ W1 , w2 ∈ W2 , . . . , wk ∈ Wk } Proposition 6.2 If W1 , W2 are subspaces of V , then 1. W1 ∩ W2 is a subspace of V . 2. W1 + W2 is a subspace of V . Proof 1. By hypothesis, W1 , W2 are subspaces of V then 0 ∈ W1 , 0 ∈ W2 so that 0 ∈ W1 ∩ W 2. Hence W1 ∩ W 2 = ∅. Let x, y ∈ F, α, β ∈ W1 ∩ W 2, since W1 , W2 are subspaces of V then xα + yβ ∈ W1 and xα + yβ ∈ W2 so that xα + yβ ∈ W1 ∩ W2 . Therefore W1 ∩ W2 is a subspace of V .  2. Since 0 ∈ W1 , 0 ∈ W2 then 0 = 0 + 0 ∈ W1 + W2 so that W1 + W2 = ∅. Definition 6.5 Let V an vectorial space and W1 , W2 subspaces of V . It said to be the sum of subspaces is direct sum and we write W1 ⊕ W2 to denote the sum W1 + W2 if

W1 ∩ W2 = {0}

6.1 Vector Spaces and Subspaces

79

Proposition 6.3 Let W1 and W2 be subspaces of V . The sum of W1 and W2 is direct sum, if and only if every vector v of W1 + W2 can be written in one and only one way v = w1 + w2 , w1 ∈ W1 , w2 ∈ W2 Proof Suppose that the sum is direct, this is W1 ∩ W2 = {0}, let v ∈ W1 + w2 , then v = w1 + w2 = w1, + w2, with w1 , w1, ∈ W1 and w2 , w2, ∈ W2 , consider w1 + w2 = w1, + w2, and w1 − w1, = w2 − w2, ∈ W1 ∩ W2 = {0}, therefore w1 − w1, = w2 − w2, = 0 leads to w1 = w1, and w2 = w2, . Any element v of W1 + W2 can be written as v = w1 + w2 , knowing that {0} ⊆ W1 ∩ W2 and using w ∈ W1 ∩ W2 then w = w + 0 = 0 + w is an element of W1 + W2 , but unicity indicates that w = 0, hence  W1 + W2 ⊆ {0} makes possible to conclude that W1 ∩ W2 = {0} .

6.2 Generated Subspace Definition 6.6 Let V a vector space over a field F and S ⊂ V . The subspace of V generated by S denoted by L(S), is the minimum subspace of V containing S, i.e., 

L(S) =

W

W subspace of V,S⊂W

Proposition 6.4 Let S = ∅. If S ⊂ V then L(S) = {x1 α1 + x2 α2 + · · · + xn αn | n ∈ N, xi ∈ F, αi ∈ S, i ∈ {1, 2, . . . , n}} Proof Consider U := {α1 v1 + · · · + αn vn | n ∈ N, α1 ∈ K , v1 ∈ S} for i ∈ {1, . . . , n} then v = α1 v1 + · · · + αn vn ∈ U and W is a subspace of V such that S ⊆ W with v1 , . . . , vn ∈ S makes v1 , . . . , vn ∈ W , then v = α1 v1 + · · · + αn vn ∈ W . So  W = L (S) v∈ W subspace of V,S⊂W

Leads to U  L (S). Let s ∈ S then s = 1s ∈ U so that S ⊆ U , knowing that S = ∅ and U = ∅, makes α1 v1 + · · · +αn vn , α1, v1, + · · · +αn, vn, ∈ U and α, β ∈ K , Then α (α1 v1 + · · · +αn vn ) + β α1, v1, + · · · + αn, vn, = αα1 v1 + · · · + ααn vn + βα1, v1, + · · · + βαn, vn, ∈ U , Thus U is a subspace of V , even more, S ⊆ U makes L (S) =



W ⊆U

W subspace of V,S⊂W

Which allows to conclude that U = L (S).



80

6 Vector and Euclidean Spaces

Definition 6.7 Let V a vector space over a field F. A vector β ∈ V is said to be linear combination of the vectors α1 , α2 . . . , αn ∈ V , it there exist scalars x1 , x2 . . . , xn ∈ F such that: n  xi αi β = x1 α1 + x2 α2 + · · · + xn αn = i=1

Definition 6.8 Let V a vector space over a field F. It is said that vectors α1 , α2 . . . , αn generate V (or equivalently that {α1 , α2 . . . , αn } generates V ) if for any α ∈ V there exist scalars x1 , x2 . . . , xn ∈ F such that: α = x1 α1 + x2 α2 + · · · + xn αn =

n 

xi αi

i=1

Note that the vectors α1 , α2 . . . , αn generate V if and only if V = L ({α1 , α2 . . . , αn })

6.3 Linear Dependence and Independence Definition 6.9 Let V a vector space over a field F. It is said that vectors α1 , α2 . . . , αn are linearly independent over F or that {α1 , α2 . . . , αn } is linearly independent over F if for x1 , x2 . . . , xn ∈ F, the equation: x1 α1 + x2 α2 + · · · + xn αn =

n 

xi αi = 0

i=1

can only be satisfied by xi = 0 for i = 1, 2, . . . , 2. This implies that no vector in the set can be represented as a linear combination of the remaining vectors in the set. In other words, a set of vectors is linearly independent if the only representations of 0 as a linear combination of its vectors is the trivial representation in which all the scalars xi are zero. It is said that vectors α1 , α2 . . . , αn are linearly dependent over F or that {α1 , α2 . . . , αn } is linearly dependent over F if there exists scalars x1 , x2 . . . , xn ∈ F not all zero, such that x1 α1 + x2 α2 + · · · + xn αn =

n  i=1

xi αi = 0

6.3 Linear Dependence and Independence

81

Notice that if not all of the scalars are zero, then at least one is nonzero, say x1 , in which case this equation can be written in the form α1 =

n −x2 −xn 1  α2 + · · · + αn = − xi αi x1 x1 x1 i=2

Thus, α1 is shown to be a linear combination of the remaining vectors [3, 8, 9]. Remark 6.3 Finally, note that 1. 2. 3. 4.

If α ∈ V , α = 0 then {α} is linearly independent. {0} is linearly dependent. ∅ is linearly independent. If 0 ∈ {α1 , α2 . . . , αn } then {α1 , α2 . . . , αn } is linearly dependent.

6.4 Bases and Dimension Definition 6.10 Let V a vector space over a field F. It is said that set of vectors {α1 , α2 . . . , αn } of V is a basis of V if the set is linearly independent and generates V .  2 3 4 Example that a0 + a1 x + a2 x + a3 x + a4 x | ai ∈ R . It 2is clear  6.2 2 Let3 V 4= B = 1, x, x , x , x is a basis and dim V = 5. Generally, the list 1, x, x , x 3 , . . . , x n is a basis of a0 + a1 x + a2 x 2 + a3 x 3 + · · · + an x n and dim V = n + 1. The dimension of V is the cardinality of a basis B of V . In Chap. 10, this concept will be denoted transcendence degree of an extension field (Definition 10.12). Lemma 6.1 The vectors {α1 , α2 . . . , αn } nonzero are linearly dependent if and only if on of them is a linear combination of the vectors before him. Proof Suppose that αi is a linear combination of the other vectors: αi = a1 α1 + · · · + ai−1 αi−1 + ai+1 αi+1 + · · · + am αm Adding −αi to both sides: a1 α1 + · · · + ai−1 αi−1 − αi + ai+1 αi+1 + · · · + am αm = 0 With i = 0, then the vectors are linearly dependent. Suppose that the vectors are linearly dependent: b1 α1 + · · · + b j α j + · · · + bm αm = 0

82

6 Vector and Euclidean Spaces

Having b j = 0 leads to the solution: −1 −1 −1 v j = b−1 j b1 α1 − · · · − b j b j−1 α j−1 − b j b j+1 α j+1 − · · · − b j bm αm

So α j is a linear combination of the other vectors.



Proposition 6.5 Suppose {α1 , α2 . . . , αn } generates V and {β1 , β2 . . . , βm } is linearly independent, then m ≤ n. Proof Since {αi } spans V then {β1 , α1 , . . . , αn } is linearly dependent and spans V , one of the vectors of {β1 , α1 , . . . , αn } is a linear combination of the preceding vectors, but it cannot be β1 hence it must α j , thus  be any of the other αi denoted the if α j is removed from the set it is left: β1 , α1 , . . . , α j−1 , α j+1 , . . . , αn , doing same operation to a vector β2 makes β1 , β2 , α1 , . . . , α j−1 , α j+1 , . . . , αn which is linearly dependent and spans V. One of the vectors in this last set is a linear it from the previous combination  of the preceding vectors denoted αk , and removing set yields β1 , β2 , α1 , . . . , α j−1 , α j+1 , αk−1 , αk+1 , . . . , αn , if this is done again with a new vector β3 and so on, and if m ≤ n leads to the spanning set of the required form {β1 , . . . , βm , . . . , αi1 , . . . , αin−m }, Consider that m > n is not possible, if it where after n operations it will produce {β1 , . . . , βm } which implies that βn+1 is a linear combination of {β1 , . . . , βm } contradicting the hypothesis that {β1 , . . . , βm } is linearly independent.  Corollary 6.1 Let {α1 , α2 . . . , αn } and {β1 , β2 . . . , βm } two basis of V , then m = n. Proof Suppose {α1 , . . . , αn } is a basis of V and {β1 , β2 , . . .} is another basis of V, since {αi } spans V, the basis {β1 , β2 , . . .} must contain n or less vector, else it is linearly dependent, if the basis {β1 , β2 , . . .} contains less than n elements, then {α1 , . . . , αn } is linearly dependent, thus both basis have the same number of elements, hence m = n.  Corollary 6.2 Let V a vector space that can be generated by a finite number of vectors. Then 1. Given any set of generators {α1 , α2 . . . , αn } of V , it is possible to obtain a basis. 2. All linearly independent set {β1 , β2 . . . , βm } of V can be completed to a basis. Proof 1. Remove from {α1 , . . . , αn } de 0 vectors and successively from left to right the vectors that are linearly dependent from the previous vectors, after this process only a basis remains. 2. let {α1 , . . . , αn } be a set of generators fo V, consider the set {β1 , . . . , βm , α1 , . . . , αn } which generates V, removing the 0 vectors and subsequently, from left to right, remove the linearly dependent vectors, after this a basis for V that contains {β1 , . . . , βm } is left.  Definition 6.11 If the vector space V can be generated by a finite number of vectors, we say V is finite-dimensional. The number of elements of V is called the dimension of V and it is denoted by dimV or dimF V [2, 6, 7].

6.4 Bases and Dimension

83

Proposition 6.6 Let {α1 , α2 . . . , αn } a basis of V . If α ∈ V then there exist unique x1 , x2 . . . , xn ∈ F such that α = x1 α1 + x2 α2 + · · · + xn αn =

n 

xi αi

i=1

Proof The xi elements exist since {α1 , . . . , αn } generate V, suppose that v = x1 α1 + · · · + xn αn = y1 α1 + · · · + yn αn For some x1 , . . . , xn , y1 , . . . , yn ∈ F there is (x1 − y1 ) α1 + · · · + (xn − yn ) αn = 0, since {α1 , . . . , αn } is linearly independent, then x1 − y1 = · · · = xn − yn = 0 so xn = yn , then the values xi are unique.  Definition 6.12 Let V a vector space over a field F, {α1 , α2 . . . , αn } a basis of V and α ∈ V . According to Proposition 6.6, there exist unique x1 , x2 . . . , xn ∈ F such n  that α = xi αi . The coordinate vector of α relative to the base {α1 , α2 . . . , αn } i=1

is

⎛ ⎞ x1 ⎜ .. ⎟ [α]αi = ⎝ . ⎠ xn

Proposition 6.7 Let V a finite-dimensional vector space and W a subspace of V . Then W is finite-dimensional and dim W ≤ dim V In addition, dim V = dim W if and only if V = W . Proof 1. Suppose that dimV = n, if W = {0}, then W has finite dimension and W = 0 ≤ n = dimV . (i) Suppose that W = {0}, let w1 ∈ W, _w1 = 0, {w1 } is linearly independent in V, then 1 ≤ n. If L ({w1 }) = W so dimW = 1 ≤ n = dimV. / L ({w1 }), then {w1 , w2 } is linearly inde(ii) If L ({w1 })  W , let w2 ∈ W, w2 ∈ pendent in V, so 2 ≤ n. Hence L ({w1 , w2 }) = W , then dimW = 2 ≤ n = dimV . / L ({w1 , w2 }), then {w1 , w2 , w3 } is (iii) If L ({w1 , w2 })  W , let w3 ∈ W, w2 ∈ linearly independent in V, so 3 ≤ n. Hence L ({w1 , w2 , w3 }) = W , then dimW = 3 ≤ n = dimV . This process cannot continue indefinitely, it must stop at n steps, for the r step it is: / L ({w1 , . . . , wr −1 }), then (r) Suppose L ({w1 , . . . , wr −1 })  W , let wr ∈ W, wr ∈ {w1 , . . . , wr } is linearly independent in V, so r ≤ n. So L ({w1 , . . . , wr }) = W , then dimW = r ≤ n = dimV .

84

6 Vector and Euclidean Spaces

2. Clearly V = W =⇒ dimV = dimW , Suppose that W ⊆ V and dimW = dimV = n. Let {w1 , . . . , wn } be a base for W, since {w1 , . . . , wn } is a subset linearly independent of V and dimV = n, then V = L ({w1 , . . . , wn }) = W .  Theorem 6.2 Let V a finite-dimensional vector space. Let W1 , W2 subspaces of V . Then dim (W1 + W2 ) = dim W1 + dim W2 − dim (W1 ∩ W2 ) Proof Let r = dimW1 ∩ W2 , n = dimW1 and m = dimW2 , let {w1 , . . . , wr } be a base for W1 ∩ W2 , then {w1 , . . . , wr } is a linearly independent subset of W1 . Complete {w1 , . . . , wr } making it {w1 , . . . , wr , u r +1 , . . . , u n } to be a basis for W1 , it is also completed to {w1 , . . . , wr , vr +1 , . . . , vm } so it is a basis for W2 . 1. Suppose that α1 w1 + · · · + αr wr + βr +1 u r +1 + · · · + βn u n + γr +1 vr +1 + · · · + γm vm = 0 for α1 + · · · + αr + βr +1 + · · · + βn + γr +1 + · · · + γm ∈ F it is α1 w1 + · · · + αr wr + βr +1 u r +1 + · · · + βn u n = −γr +1 vr +1 − · · · − γm vm ∈ W1 ∩ W2

Then −γr +1 vr +1 − · · · − γm vm = δ1 w1 + · · · + δr wr for some δ1 , . . . , δr ∈ F. Being {w1 , . . . , wr , vr +1 , . . . , vm } a basis for W2 it is linearly independent, then γr +1 = · · · = γm = δ1 = cldots = δr = 0 and α1 w1 + · · · + αr wr + βr +1 u r +1 + · · · + βn u n = 0. Since {w1 , . . . , wr , u r +1 , . . . , u n } is a basis for W1 it is linearly independent, then α1 = · · · = αr = βr +1 = · · · = βn = 0 makes possible to say that {w1 , . . . , wr , vr +1 , . . . , vm } is linearly independent. 2. Let u + v ∈ W1 + W2 , with u ∈ W1 and v ∈ W2 , then there is α1 , . . . , αr , α1, , . . . , αr, , βr , . . . , βn , γr +1 , . . . , γm ∈ F such that u = α1 w1 + · · · + αr wr + βr +1 u r +1 + · · · + βn u n v = α1, w1 + · · · + αr, wr + γr +1 vr +1 + · · · + γm vm Leading to     u + v = α1 + α1, w1 + · · · + αr + αr, wr + βr +1 u r +1 + · · · + βn u n +γr +1 vr +1 + · · · + γm vm Therefore {w1 , . . . , wr , u r +1 , . . . , u n vr +1 , . . . , vm } generates W1 + W2 . Points 1 and 2 make possible to conclude that the previous set is a basis for W1 + W2 , Hence dim (W1 + W2 ) = n + m − r = dimW1 + dimW2 − dim (W1 ∩ W2 )



Exercise 6.1 In R3 proof that the sum of two subspaces of dimension 2 has dimension different than zero. Corollary 6.3 If V is a finite-dimensional vector space and W1 ∩ W2 = {0} then dim (W1 ⊕ W2 ) = dim W1 + dim W2

6.4 Bases and Dimension

85

Proof Since W1 ∩ W2 = {0} the direct sum and the dimension yields dim (W1 ∩ W2 )  = 0, from the theorem it follows that dim (W1 ⊕ W2 ) = dimW1 + dimW2 .

6.5 Quotient Space Let V a vector space over a field F and W a subspace of V . Let α, β ∈ V . It said to be α and β are related or related module W and we write α ∼ β, if α − β ∈ W . This relation is an equivalence relation. Indeed: 1. Let α ∈ V , then α − α = 0 ∈ W , therefore α ∼ α. 2. Supposing α ∼ β, then α − β ∈ W . Finally β − α = −(α − β) ∈ W . Therefore β ∼ α. 3. Supposing α ∼ β and β ∼ γ , then α − β ∈ W and β − γ ∈ W . Now, α − γ = α − β + β − γ ∈ W . Hence α ∼ γ . According to Definition 1.16, the quotient set is denoted by V /W . The elements of V /W are the equivalence classes of the elements of V . Let α ∈ V , then: [α] = {β ∈ V = {β ∈ V = {β ∈ V = {α + γ

| β ∼ α} | β − α ∈ W} | β − α = γ ∈ W} | γ ∈ W}

=α+W These classes form a partition of V . It defined in V /W an addition and scalar multiplication. These operations are well defined: 1. (α + W ) + (β + W ) = (α + β) + W, ∀ α, β ∈ V 2. x (α + W ) = (xα) + W, ∀ α ∈ V, ∀ x ∈ F To verify that operations are well defined, note the following operations. Supposing that α + W = α1 + W and β + W = β1 + W , then α − α1 ∈ W , β − β1 ∈ W and (α + β) − (α1 + β1 ) = (α − α1 ) + (β − β1 ) ∈ W . Therefore (α + β) + W = (α1 + β1 ) + W . Similarly for the multiplication, supposing α + W = α1 + W then α − α1 ∈ W . Let x ∈ F, then xα − xα1 = x(α − α1 ) ∈ W . Therefore xα + W = xα1 + W . We can conclude that V /W is a vector space over F with operations defined above. This vector space is called the quotient space. Remark 6.4 The form of the quotient ring associated with its ideal is important to define the differential range of output given in Chap. 10. The elements of quotient space have the form α + W, α ∈ W . The neutral for addition is the class of 0, i.e., 0 + W . The inverse for addition (the inverse of α + W ) is (−α) + W .

86

6 Vector and Euclidean Spaces

Lemma 6.2 If V is a finite-dimensional vector space and W is a subspace of V then dim V /W is finite. In addition: dim V /W ≤ dim V Proof Having the finite set {v1 , . . . , vn } that generates V, the set {v1 + W, . . . , vn +W } generates V /W which has finite dimension, if {v1 , . . . , vn } is a basis for V, then {v1 + W, . . . , vn + W } generates V /W , then V /W ≤ dimV.  Proposition 6.8 Let V a finite-dimensional vector space over a field F and W a subspace of V . Then dim V = dim W + dim V /W Proof Let n = dimV , W has a finite dimension r = dimW ≤ n, the lemma states V /W has finite dimension s = dimV /W ≤ n, having {w1 , . . . , wr } be a base for W and {u 1 + W, . . . , u s + W } is a base for V /W , then {w1 , . . . , wr , u 1 , . . . , u n } is a base for V. (1) Consider v ∈ V , since v + W ∈ V /W and {u 1 + W, . . . , u s + W } is a basis for V /W , there is α1 , . . . , αs ∈ F such that v + W = α1 (u 1 + W ) + · · · + αs (u s + W ), then v + W = (α1 u 1 + · · · + αs u s ) + W so v − (α1 u 1 + · · ·+ αs u s ) = w ∈ W. Since {w1 , . . . , wr } is a basis for W, there are β1 , . . . , βr ∈ F such that w = β1 w1 + · · · + βr wr , therefore v = w + α1 u 1 + · · · + αs u s = β1 w1 + · · · + βr wr + α1 u 1 + · · · + αs u s The set {w1 , . . . , wr , u 1 , . . . u n } generates V. (2) Suppose that δ1 w1 + · · · + δr wr + γ1 u 1 + · · · + γs u s = 0 for some δ1 , . . . , δr , γ1 , . . . , γs ∈ F, then δ1 (w1 + W ) + · · · + δr (wr + W ) + γ1 (u 1 + W ) + · · · + γ1 (u 1 + W ) = 0 + W Makes γ1 (u 1 + W ) + · · · + γ1 (u 1 + W ) = 0 + W , since {u 1 + W, . . . , u s + W } is a basis for V /W it is linearly independent in F, so γ1 = . . . = γs = 0, then δ1 w1 + · · · + δr wr = 0, since {w1 , . . . , wr } is a basis for W and is linearly independent causes δ1 = . . . = δr = 0 so {w1 , . . . , wr , u 1 , . . . , u s } is also linearly independent. From 1 and 2 it is possible to say that {w1 , . . . , wr , u 1 , . . . , u s } is a basis for V, then dimV = r + s = dimW + dimV /W. 

6.6 Cayley-Hamilton Theorem

87

6.6 Cayley-Hamilton Theorem Theorem 6.3 Every matrix is a zero of its characteristic polynomial. To illustrate this theorem, we have the following example: 

12 A= 32



The characteristic polynomial of the matrix is: |λI − A| = |A − λI |   1−λ 2 = 3 2−λ = (1 − λ) (2 − λ) − 6 = λ2 − 3λ − 4 = P (λ) A is a zero for P (λ)  P (A) =

12 32

2

 −3

   12 10 −4 32 01

P (A) = 0 In what follows we give the proof of Theorem 6.3. Proof Let A be a squared matrix of size n × n and P (λ) is a characteristic polynomial: P (A) = |A − λI | = λn + an−1 λn−1 + · · · + a1 λ + a0 Suppose that B (λ) represents the adjoint matrix of λI − A. The elements of B (λ) are the cofactors of the matrix λI − A hence they are polynomials in λ of order no larger than n − 1: B (λ) = Bn.1 λn−1 + Bn−2 λn−2 + · · · + B1 λ + B0 Where Bi , 0 ≤ i ≤ n − 1 are the squared matrices of size n × n over the field K . Using the fundamental property of the adjoint: (λI − A) B (λ) = |λI − A| I B (λ) = (λI − A)−1 |λI − A|

88

6 Vector and Euclidean Spaces

This yields to:     (λI − A) Bn−1 λn−1 + · · · + B1 λ + B0 = λn + an−1 λn−1 + · · · + a1 λ + a0 I Then:   Bn−1 = I λn

  Bn−2 − ABn−1 = an−1 I λn−1   Bn−3 − ABn−2 = an−2 I λn−2 .. .. .. . . . B0 − AB1 = a1 I (λ) −AB0 = ao I Multiplying by An , An−1 , . . . .A, I : An Bn−1 = An An−1 Bn−2 − An Bn−1 = an−1 An−1 An−2 Bn−3 − An−1 Bn−2 .. .

= an−n An−2 .. .. . .

AB0 − A2 B1 = a1 A −AB0 = ao I Adding the matrix equations to the right side: P (A) = An + an−1 An−1 + an−2 An−2 + · · · + a1 A + a0 I = 0 Hence, the Cayley-Hamilton theorem is true.



6.7 Euclidean Spaces Definition 6.13 Let V a vector space over R. An inner product in V is a function ·, · : V × V −→ R such that: 1. x1 α1 + x2 α2 , β = x1 α1 , β + x2 α2 , β, ∀ α1 , α2 , β ∈ V, ∀ x1 , x2 ∈ R. 2. α, β = β, α, ∀ α, β ∈ V . 3. α, α ≥ 0, ∀ α ∈ V and α, α = 0 if and only if α = 0.

6.7 Euclidean Spaces

89

Definition 6.14 Let V a vector space over C. An inner product in V is a function ·, · : V × V −→ C such that: 1. x1 α1 + x2 α2 , β = x1 α1 , β + x2 α2 , β, ∀ α1 , α2 , β ∈ V, ∀ x1 , x2 ∈ C. 2. α, β = β, α, ∀ α, β ∈ V . 3. α, α ≥ 0, ∀ α ∈ V and α, α = 0 if and only if α = 0. Remark 6.5 An inner product space is a vector space V over a field F with inner product. The inner product space is also known as a real inner product space or Euclidean space if F = R. If F = C the inner product space is called complex inner product space or unitarian space. Definition 6.15 Let V an Euclidean space (or unitarian space) and α ∈ V . The norm of u is given by:    α  = α, α   Definition 6.16 A vector α ∈ V is said to be unitarian if α  = 1. Let α ∈ V, α = 0. To normalize α, we consider the unitarian vector 1 α αN =   =   · α α  α  Indeed:  2 α N  = α N , α N  =   1  · α,  1  · α α  α  =

 1  α,  1      α  α 

=

1  2 α, α   α 

=

1  2   α 

· α

 2 α 

 2 α N  = 1

Theorem 6.4 (Cauchy-Schwarz inequality) Let V an Euclidian (or unitarian) space. |α, β| ≤|| α || · || β || ∀α, β ∈ V



Let V an Euclidian space. According to Cauchy-Schwarz inequality, in R2 , for α, β = 0: α, β (6.1) cos θ = || α || · || β ||

90

6 Vector and Euclidean Spaces

Definition 6.17 Let V an Euclidian space and α, β ∈ V . It is said that vectors α and β are orthogonal if α, β = 0 Note that 0, β = 0 · 0, β = 00, β = 0 ∀β ∈ V Definition 6.18 Let V an Euclidian space and W a subspace of V . The orthogonal complement of W is: W  = {α ∈ V | α, β = 0 ∀β ∈ W } Proposition 6.9 Let V an Euclidian space and W a subspace of V . W  is a subspace of V . Proof It is left to the reader as an exercise.



Definition 6.19 Let V an Euclidian space [1, 6, 13]. 1. A set of vectors {α1 , . . . , αr } of V is said to be orthogonal if its elements are orthogonal pairs, i.e.: αi , α j  = 0 if i = j 2. A set of vectors {α1 , . . . , αr } of V is said to be orthonormal if its elements are orthonormal pairs, i.e.: αi , α j  = δi j

6.8 GramSchmidt Process The GramSchmidt process is a method for the orthonormalization of a set of vectors in an inner product space, most commonly the Euclidean space Rn . Lemma 6.3 Let V an Euclidian space. Let {α1 , . . . , αn } an orthogonal set of vectors of V, with αi = 0 ∀1 ≤ i ≤ n. Then 1. {α1 , . . . , αn } is linearly independent. 2. If β ∈ V, then γ , the vector given by γ =β−

β, α1  β, α2  β, αn  α1 − α2 − . . . − αn || α1 ||2 || α2 ||2 || αn ||2

is an orthogonal vector to αi , ∀1 ≤ i ≤ n. Proof It is left to the reader as an exercise.



6.8 GramSchmidt Process

91

The GramSchmidt process described below, is an algorithm such that, given a basis {α1 , . . . , αn } of an euclidian space V and for the Lemma 6.3, it produces an orthogonal basis {β1 , . . . , βn } of V . In addition, the change of basis matrix is superior triangular: Theorem 6.5 Let V a finite dimensional euclidian space and W a subspace of V , then V = W ⊕ W Proof It is left to the reader as an exercise.



References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13.

Hoffman, K., Kunze, R.: Linear Algebra. Prentice-Hall (1971) Grossman, S.: Elementary Linear Algebra, Wadsworth Publishing (1987) Nef, W.: Linear Algebra. Dover Publications (1967) Nering, E.: Linear Algebra and Matrix Theory. Wiley (1970) Gantmacher, F.R.: The Theory of Matrices, vol. 1. Chelsea, New York (1959) Axler, S.: Linear Algebra Done-Right. Springer (1997) Lipschutz, S.: Linear Algebra. McGraw Hill (2009) Whitelaw, T.: Introduction to Linear Algebra, Chapman & Hall/CRC (1992) Gantmacher, F.R.: The Theory of Matrices, vol. 2. Chelsea, New York (1964) Lang, S.: Linear Algebra. Springer (2004) Anton, H.: Elementary Linear Algebra. Wiley (2010) kolman, B.: Introductory Linear Algebra an Applied First Course. Pearson Education (2008) Halmos, P.R.: Finite-dimensional Vector Spaces. Courier Dover Publications (2017)

Chapter 7

Linear Transformations

Abstract This chapter is about linear transformations, it begins with an elementary introduction, followed by kernel and image concepts, then linear operators and finally associate matrix.

7.1 Background Definition 7.1 Let V and W vector spaces over a field F. A linear transformation [1–5] between two vector spaces V and W is a map: T : V −→ W such that the following hold: 1. T (α1 + α2 ) = T (α1 ) + T (α2 ) ∀α1 , α2 ∈ V 2. T (x α1 ) = x T (α1 ) ∀x ∈ F. Remark 7.1 The above conditions, can be substituted by the following condition: T (x1 α1 + x2 α2 ) = x1 T (α1 ) + x2 T (α2 ) ∀α1 , α2 ∈ V, ∀x1 , x2 ∈ F Proposition 7.1 Let V, W, U vector spaces over a field F. Let T1 : V −→ W , T2 : W −→ U linear transformations. The composition T2 ◦ T1 : V −→ U is a linear transformation. Proof Let α1 , α2 ∈ V and x1 , x2 ∈ F. Then, we have:

© Springer Nature Switzerland AG 2019 R. Martínez-Guerra et al., Algebraic and Differential Methods for Nonlinear Control Theory, Mathematical and Analytical Techniques with Applications to Engineering, https://doi.org/10.1007/978-3-030-12025-2_7

93

94

7 Linear Transformations

(T2 ◦ T1 ) (x1 α1 + x2 α2 ) = = = =

T2 (T1 (x1 α1 + x2 α2 )) T2 (x1 T1 (α1 ) + x2 T1 (α2 )) x1 T2 (T1 (α1 )) + x2 T2 (T1 (α2 )) x1 (T2 ◦ T1 ) (α1 ) + x2 (T2 ◦ T1 ) (α2 )

Therefore T2 ◦ T1 : V −→ U is a linear transformation.



Remark 7.2 A linear transformation may or may not be injective or surjective [6–8]. If V and W have the same dimension, then it is possible for T to be invertible, i.e., there exists a T −1 such that T T −1 = I .

7.2 Kernel and Image Proposition 7.2 Let T : V −→ W a linear transformation, then T (0) = 0. Proof Since 0 + T (0) = T (0) = T (0 + 0) = T (0) + T (0), then T (0) = 0.



Definition 7.2 Let T : V −→ W a linear transformation. The kernel of T is given by the following set [9–12]: kerT = {α ∈ V | T (α) = 0} Proposition 7.3 Let T : V −→ W a linear transformation. The kernel of T is a subspace of V . Proof 1. Knowing that T (0) = 0, then 0 ∈ ker T and ker T = 0 2. Let v1 , v2 ∈ V and α1 , α2 ∈ F, make T (α1 v1 + α2 v2 ) = α1 T (v1 ) + α2 T (v2 ) = α1 · 0 + α2 · 0 = 0,  then α1 v1 + α2 v2 ∈ ker T suggests that the kernel of T is a subspace of V. Proposition 7.4 Let T : V −→ W a linear transformation. T is injective if and only if kerT = {0}. Proof 1. Suppose T is injective, then {0} ⊆ ker T and v ∈ ker T , so T (v) = 0 = T (0), since T is injective, v = 0, where ker T ⊆ {0}, hence ker T = {0}. 2. Suppose ker T = {0} and v1 , v2 ∈ V such that T (v1 ) = T (v2 ) then T (v1 ) = T (v2 ) =⇒ T (v1 ) − T (v2 ) = 0 =⇒ T (v1 + v2 ) = 0 =⇒ v1 − v2 ∈ 1 ∈ ker T  =⇒ v1 − v2 = 0 =⇒ v1 = v2 then T is injective.

7.2 Kernel and Image

95

Proposition 7.5 Let T : V −→ W a linear transformation. The image of T defined by: ImT = {T (α) | α ∈ V } is a subspace of W . Proof 1. Knowing that 0 = T (0) ∈ I mT and 0 ∈ I mT so I mT = ∅ 2. Let T (α1 ) , T (α2 ) ∈ I mT and v1 , v2 ∈ F, having v1 T (α1 ) + v2 T (α2 ) = T (α1 v1 + α2 v2 ) ∈ I mT makes possible to conclude that the image of T is a subspace of W.  Proposition 7.6 Let V a vector space over a field F and W a subspace of V . The function π : V −→ V /W , defined by π(α) = α + W , with α ∈ V is a linear transformation. In addition, π is surjective. Proof 1. Let α1 , α2 ∈ V and v1 , v2 ∈ F, having π (v1 α1 + v2 α2 ) = (v1 α1 + v2 α2 ) + W = (v1 α1 + W ) + (v2 α2 + W ) = v1 (α1 + W ) + v2 (α2 + W ) = v1 π (α1 ) + v2 π (α2 ), then π is a linear transformation. 2. Given that α + W ∈ V /W , then α ∈ V equals π (α) = α + W , then π is surjective.  Theorem 7.1 Let V a finite-dimensional vector space over a field F and T : V −→ U a linear transformation. Then dimV = dim kerT + dim ImT Proof Being n = dim V , since ker T is a subspace of V, ker T has a finite dimension r := dim ker T ≤ n. Let {w1 , . . . , wr } be a base for ker T and a subset of V linearly independent over F. Completing {w1 , . . . , wr } to be a base of V : {w1 , . . . , wr , v1 , . . . , vn−r }: 1. Let T (v) ∈ I mT , with v ∈ V , there is α1 , . . . , αr , β1 , . . . , βn−r ∈ F such that v = α1 w1 + · · · + αr wr + β1 v1 + · · · + βn−r vn−r , then T (v) = β1 T (v1 ) + · · · + βn−r T (vn−r ) and {T (v1 ) , . . . , T (vn−r )} generates I mT . 2. Consider {T (v1 ) , . . . , T (vn−r )} then, γ1 T (v1 ) + · · · + γn−r T (vn−r ) = 0 for some γ1 , . . . , γn−r ∈ F. Having T being linearly independent T (γ1 v1 + · · · +γn−r vn−r ) = 0, then γ1 v1 + · · · + γn−r vn−r ∈ ker T , in consequence there exist δ1 , . . . , δr ∈ F such that γ1 v1 + · · · + γn−r vn−r = δ1 w1 + · · · + δr wr , so −δ1 w1 − · · · − δr wr + γ1 v1 + · · · + γn−r vn−r = 0, knowing that {w1 , . . . , wr , v1 , . . . , vn−r } is a basis for V, it is linearly independent, making −δ1 = · · · = −δr = γ1 = · · · = γn−r = 0 so {T (v1 ) , . . . , T (vn−r )} is linearly independent. Therefore {T (v1 ) , . . . , T (vn−r )} is a basis for I mT and dim V = n = r + n − r = dim ker T + dim I mT .  In what follows we give a direct application of Theorem 7.1 to linear control theory, in particular of the concept of observability. The next figure shows a object with mass m that is displaced without friction, to which a force F is applied producing an acceleration a (Fig. 7.1).

96

7 Linear Transformations

Fig. 7.1 System diagram F =u

m

y

Example 7.1 The mentioned system diagram is as follows: From Newton’s second law we have: F = ma d2 y =m dt 2

Here, dy = y = y˙ = y (1) , ddty = y = y¨ = y (2) are commonly used to represent dt the first and second derivatives with respect to the time respectively. Now, considering m = 1 F= And the input F =

d2 y dt

d2 y dt

= u.

Now, we consider the following change of coordinates: x1 = y x2 = y˙ Giving the following state representation: x˙1 = x2 x˙2 = u With  x=

x1 x2

 ∈ R2

And  x˙ =

x˙1 x˙2

 ∈ R2

7.2 Kernel and Image

97

The system can be expressed as: 

x˙1 x˙2





    01 x1 0 = + u x2 00 1     x1 y= 01 x2

That is to say x˙ = Ax + Bu y = Cx x (t0 ) = xo For x ∈ Rn , u ∈ Rm , the matrices A, B, C are of appropriate size. It  is well known that a linear system is controlable and observable if the rank of B AB = 2 that is to say the system is fully controllable, as well as the rank  T C C A = 2 that is to say that the system is fully observable (see [13]). Definition 7.3 (Distinguishability) Two initial states x0 and x1 are distinguishable for an input u if the produced outputs by both initial states are different, and if they are not distinguishable they are indistinguishable [14]. Exercise 7.1 Show that the indistinguishability is a equivalence relation (Chap. 1, Definition 1.9). Definition 7.4 A system is said to be observable if all of its states are distinguishable. Remark 7.3 It is clear that the injectivity of the output function is linked to the observability concept. Theorem 7.2 ([15]) ⎡A system⎤is fully observable if and only if the rank of the observC ⎢ CA ⎥ ⎢ ⎥ ability matrix V = ⎢ . ⎥ is equal to the dimension of the space state. ⎣ .. ⎦ C An−1

Proof (Sufficiency) By contradiction, suppose that the system is not fully observable. If the system is not observable then, there exists states x0 and x1 with x0 = x1 ∀u, ∀t > 0 such that y (x0 , t, u) = y (x1 , t, u). Let x˙ = Ax + Bu y = Cx

98

7 Linear Transformations

Then x˙ − Ax = Bu ⇒ e−At (x˙ − Ax) = e−At (Bu)

, ⇒ e−At x = e−At Bu  t ⇒ e−At x (τ ) |t0 = e−Aδ Bu (δ) dδ 0

Hence 

t

x = e At

e−Aδ Bu (δ) dδ + x (0) e At ,

0

Since y = C x, we have 

t

y (x0 ) = Ce x0 + At



e A(t−δ) C Bu (δ) dδ, x (0) = x0

0

y (x1 ) = Ce At x1 +

t

e A(t−δ) C Bu (δ) dδ, x1 (0) = x1

0

Since y (x0 ) = y (x1 ) This yields to Ce At (x1 − x0 ) = 0, with x1 = x0 i.e. ⎧ ⎪ ⎪ ⎪ ⎨

Ce At (x1 − x0 ) C Ae At (x1 − x0 ) 0= .. ⎪ . ⎪ ⎪ ⎩ C An−1 e At (x1 − x0 ) We define x = x1 − x0 = 0 That is to say ⎡

⎤ Cx ⎢ C Ax ⎥ At ⎢ ⎥ e = 0, ⎣ ⎦ C An−1 x

f or e At = 0

7.2 Kernel and Image

Then

99

⎤ ⎡ ⎤ Cx Cx ⎢ ⎥ ⎢ C Ax ⎥ ⎥ =⇒ dim ker ⎢ C Ax ⎥ = 0 x ∈ ker ⎢ ⎦ ⎣ ⎦ ⎣ C An−1 x C An−1 x ⎡

From Theorem 7.1 dim V = dim I m (V ) + dim ker (V ) since dim V = n, then dim I m (V ) < n and, in consequence dim I m (V ) = rank(V ) < n, this ends the sufficiency. The necessity is left to the reader as an exercise.  Definition 7.5 Let V and W vector spaces over the field F. An isomorphism T : V −→ W is a bijective linear transformation. Proposition 7.7 Let T : V −→ W an isomorphism, then T −1 : W −→ V is an isomorphism. Proof Since that T −1 : W → V is a bijective function and T −1 (w) = v if and only if T (v) = w, let w1 , w2 ∈ W , α1 , α2 ∈ F , and v1 , v2 ∈ V such that v1 = T (w1 ) and v2 = T (w2 ), since T is linear, there is T (α1 v1 + α2 v2 ) = α1 T (v1 ) + α2 T (v2 ), then T −1 (α1 w1 + α2 w2 ) = α1 v1 + α2 v2 = α1 T −1 (w1 ) + α2 T −1 (w2 ) allows to conclude  that T −1 is a linear transformation and it is also an isomorphism. Definition 7.6 Let V and W vector spaces over the field F. It is said that V and W are isomorphic if there exists an isomorphism of V in W . We write V ∼ = W , and read V is isomorphic to W , when such a map exists. Theorem 7.3 Let V and W finite-dimensional vector spaces over a field F. Then V ∼ = W if and only if dimV = dimW . Proof 1. Suppose that V and W are isomorphisms, there exists the isomorphism T : V → W , let {v1 , . . . , vn } be a base for V, then it is linearly independent and generates V. Being T injective and the sets {v1 , . . . , vn }, {T (v1 ) , . . . , T (vn )} are linearly independent. Since T is surjective and {v1 , . . . , vn } generates V, {T (v1 ) , . . . , T (vn )} generates W, then dim W = n = dim V . 2. Suppose that dim V = dim W = n, let {v1 , . . . , vn } and {w1 , . . . , wn } be bases for V and W , defining T : V → W for T (vi ) = wi for 1 ≤ i ≤ n, T is extended by linearity, this is: (a) Let v ∈ ker T , sincet there are unique α1 , . . . , αn ∈ F such that v = α1 v1 + · · · + αn vn and T (v) = 0, then 0 = T (v) = T (α1 v1 + · · · + αn vn ) = α1 T (v1 ) + · · · + αn T (vn ) = α1 w1 + · · · + αn wn . Since {w1 , . . . , wn } is a basis for W it is linearly independent, then α1 = · · · = αn = 0 so v = 0v1 + · · · + 0vn = 0, so ker T ⊆ {0} and {0} ⊆ ker T it is possible to say ker T = {0}, then T is injective. (b) Let w ∈ W and {w1 , . . . , wn } is a basis for W, there are unique β1 , . . . , βn ∈ F such that w = β1 w1 + · · · + βn wn , then w = β1 T (v1 ) + · · · + βn T (vn ) = T (β1 v1 + · · · + βn vn ) = T (v), wherev := β1 v1 + · · · + βn vn ∈ V , then T is surjective. Hence, it is possible to conclude that T is an isomorphism and V ∼  = W.

100

7 Linear Transformations

7.3 Linear Operators Definition 7.7 Let V a vector space over a field F. An linear operator in V is a linear transformation T : V −→ V . Proposition 7.8 Let V a finite-dimensional vector space over a field F and T : V −→ V a linear operator. The following statements are equivalent: 1. T is injective. 2. T is surjective. 3. T is bijective. Proof 1. Suppose that T is injective, then ker T = {0} and dim ker T = 0, also V has finite dimension and dim ImT = dim ker T + dim ImT = dim V . Then ImT = V so T is surjective. 2. Suppose that T is surjective, so ImT = V , since V is of finite dimension, dim ker T = dim V − dim ImT = 0, then ker T = {0} and T is injective and T is bijective. 3. Clearly, if T is bijective, then T is injective.  Corollary 7.1 Let V and W vector spaces over the field F, with dimV = dimW . Let T : V −→ W a linear transformation. The following statements are equivalent: 1. T is injective. 2. T is surjective. 3. T is bijective. Proof The proof is similar to the former one, hence it is left as an exercise to the reader.  Definition 7.8 Let V and W vector spaces over the field F. The space of linear transformations of V in W is given by the following set [5, 12]: L (V, W ) = {T : V −→ W | T is a linear transformation} For T1 , T2 , T ∈ L (V, W ), x ∈ F: 1. T1 + T2 : V −→ W such that (T1 + T2 ) (α) = T1 (α) + T2 (α). 2. x T : V −→ W such that (x T ) (α) = x T (α). Definition 7.9 Let V a vector space over a field F. The space of linear operators of V is given by the following set [3, 4]: L (V ) = L (V, V ) = {T : V −→ V | T is a linear operator} Proposition 7.9 Let V and W vector spaces over the field F. If T1 , T2 ∈ L (V, W ) and x1 , x2 ∈ F then x1 T1 + x2 T2 ∈ L (V, W ).

7.3 Linear Operators

101

Proof Having x1 T1 + x2 T2 : V → W and v1 , v2 ∈ V , a1 , a2 ∈ F makes: (x1 T1 + x2 T2 ) (a1 v1 + a2 v2 ) = x1 T1 (a1 v1 + a2 v2 ) + x2 T2 (a1 v1 + a2 v2 ) = x1 (a1 T1 (v1 ) + a2 T1 (v2 )) + x2 (a1 T2 (v1 ) + a2 T2 (v2 )) = x1 a1 T1 (v1 ) + x1 a2 T1 (v2 ) + x2 a1 T2 (v1 ) + x2 a2 T2 (v2 ) = a1 [x1 T1 (v1 ) + x2 T2 (v1 )] + a2 [x1 T1 (v2 ) + x2 T2 (v2 )] = a1 (x1 T1 + x2 T2 ) (v1 ) + a2 (x1 T1 + x2 T2 ) (v2 )

Therefore x1 T1 + x2 T2 ∈ L (V, W )



Proposition 7.10 Let V and W vector spaces over the field F then L (V, W ) is a vector space over F. Proof To proof that L (V, W ) is a vector space the conditions on scalar multiplication and vector addition must be verified, then it is left to the reader as an exercise.  Corollary 7.2 Let V a vector space over F, then L (V ) is a vector space over F. Proof It follows from Proposition 7.10 making W=V.



Proposition 7.11 Let V and W vector spaces over the field F . If dimV = n and dimW = m then dimL (V, W ) = mn. Proof Let A = {α1 , . . . , αn } and B = {β1 , . . . , βm } be bases for V and W respectively. for every pair of integers ( p, q) with 1 ≤ p ≤ m and 1 ≤ q ≤ n, defining  L

pq

(α1 ) =

0 i f i = q βp i f i = q

Let T be a linear transformation from V to W,for 1 ≤ j ≤ n, with the elements Ai j , . . . , Am j of T α j of the basis , this is T α j = mp=1 A pj β p∗ and it must be shown   that T = mp qn A pq L pq Let Uαj = = =

 p

q

p

q

 m 

A pq L pq α j A pq δ jq β p

A pj β p

p

= T αj So U = T and L pq spans L (V, ), since they are linearly independent and  W if the transformation U = p q A pq L pq is the zero transformation,  then U α j = 0 hence mp=1 A pj β p = 0 and β p being linearly independent makes that A pj = 0. 

102

7 Linear Transformations

Proposition 7.12 Let {α1 , α2 , . . . , αn }, {β1 , β2 , . . . , βm } basis of V and W respectively. A basis for L (V, W ) is 

Ti j | 1 ≤ i ≤ m, 1 ≤ j ≤ n



where Ti j : V −→ W is given by Ti j (αk ) = δ jk αi . Proof It follows from Proposition 7.11 by making V = W .



Theorem 7.4 Let V a vector space over F with dimV = n, then for all T ∈ L (V ) there exist a polynomial nonzero q(x) ∈ F[x] of degree at most n 2 such that q(T ) = 0.   2 Proof Since dim L (V ) = n 2 , the set I dV , T, . . . , T n is linearly dependent, then there are α0 , α1 , . . . , αn 2 ∈ F, not al zero such that a0 idv + a1 T + · · · + an 2 T = 0, 2  the polynomial q (x))a0 + ax x + · · · + an 2 x n fulfills the requirement.

7.4 Associate Matrix Let V and W vector spaces over the field F. Let T : V −→ W a linear transformation. Let {α1 , α2 , . . . , αn }, {β1 , β2 , . . . , βm } basis of V and W respectively. It is associated with T and bases given, a matrix as follows [2, 8, 11, 12]. There exist xi j ∈ F for 1 ≤ i ≤ m, 1 ≤ j ≤ n such that m  T αj = xi j βi for 1 ≤ j ≤ n i=1

The matrix



a11 a12 . . . ⎢ a ⎢ 21 a22 . . . A = ai j i j = ⎢ . ⎣ ..

⎤ a1n a2n ⎥ ⎥ .. ⎥ . ⎦

am1 am2 . . . amn

is the associate matrix to (or matrix representation of) the linear transformation T respect to the basis {α1 , α2 , . . . , αn }, {β1 , β2 , . . . , βm }. To denote this matrix, we write: {β } [T ]{αij} Remark 7.4 If T is a linear operator in V and {α1 , α2 , . . . , αn } is a base of V , {α } the matrix [T ]{αii } is denoted by [T ]{αi } . If α ∈ V with α = x1 α1 + · · · + xn αn , the coordinate vector of α respect to the base {α1 , α2 , . . . , αn } is

7.4 Associate Matrix

103

[α]{αi }

⎡ ⎤ x1 ⎢ .. ⎥ =⎣.⎦ xn

Remark 7.5 If T ∈ L (V, W ), α ∈ V , then {β }

[T ]{αij} [α]{αi } = [T (α)]{βi } If T ∈ L (V ), then

[T ]{αi } [α]{αi } = [T (α)]{αi }

Proposition 7.13 Let V, W, U finite-dimensional vector spaces over a field F. T, T1 , T2 : V −→W and S : W −→  U linear transformations. Let {α1 , α2 , . . . , αn }, {β1 , β2 , . . . , βm }, γ1 , γ2 , . . . , γ p basis of V, W and U respectively and x ∈ F, then {β }

{β }

{β }

[T1 + T2 ]{αii } = [T1 ]{αii } + [T2 ]{αii } . {β } {β } [x T ]{αii } = x [T ]{αii } . {γ } {γ } {β } [S ◦ T ]{αii } = [S]{βii } [T ]{αii } . {β } If n = m then T is isomorphism if and only if [T ]{αii } is invertible. In addition  −1 {αi } {β } −1 T {βi } = [T ]{αii } . 5. If V =  W , then T is isomorphism if and only if [T ]{αi } is invertible. In addition T −1 {αi } = [T ]{αi } −1 . 1. 2. 3. 4.

Proof Points 3 and 4 will be proven, the remaining are left to the reader as an exercise. {u } {u } 1. S ◦ T : V → U , then [S ◦ T ]{vii} is a matrix of size p × n and [S]{wii } is a matrix {w } {u } {w } p × m and [T ]{vi}i is of m × n, having [S]{wii } [T ]{vi}i be a matrix of size p × n. If p m T (vk ) = j=1 b jk w j and S w j = i=1 ai j u i then: (S ◦ T ) (vk ) = S (T (vk )) ⎛ ⎞ m  = S⎝ b jk w j ⎠ j=1

=

m 

b jk S w j

j=1

=

m  j=1

b jk

p  i=1

ai j u i

⎛ ⎞ p m   ⎝ = ai j b jk ⎠ u i i=1

j=1

104

7 Linear Transformations {ui}

{w }

The i j entry of S{wi} is ai j , the entry jk of [T ]{vi}i is bi j and the entry ik of   {u } {w } {u } [S]{wii } [T ]{vi}i is mj=1 ai j b jk , also the value ik of [S ◦ T ]{vii} is mj=1 ai j b jk , hence {u } {u } {w } [S ◦ T ]{vii} = [S]{wii } [T ]{vi}i {w } 2. Suppose that [T ]{vi}i is invertible, there is a matrix B = b jk such that n {w } bi j vi with the base [T ]{vi}i B = In , Making S : W → V for S w j = i=1 {u i } {w1 , . . . , wm }, then S is a linear transformation and [S]{wi } = B, let w ∈ W , then {w }

{ui}

T (S (w)){wi} = (T ◦ S){wi} [w]{wi} = [T ]{vi}i S{wi} [w]{wi} = In [w]{wi} = [w]{wi} , then T (S (w)) = w allows to conclude that T is surjective, and being V and W of the dame finite dimension and T a linear transformation, it is bijective and a isomorphism.  Theorem 7.5 Let V, W finite-dimensional vector spaces over a field F and {α1 , α2 , . . . , αn }, {β1 , β2 , . . . , βm } basis of V, W respectively. The function Φ : L (V, W ) −→ Mm×n (F) {β }

with T −→ [T ]{αii } is an isomorphism of vector spaces. Proof The function is linear and a one on one map L (V ; W ) into the set of matrices m × n.  Corollary 7.3 Let V a finite-dimensional vector space over a field F and {α1 , α2 , . . . , αn } a basis of V . The function Φ : L (V ) −→ Mn×n (F) with T −→ [T ]{αi } is an isomorphism of vector spaces. Proof It follows from the theorem by making V = W .



Remark 7.6 The isomorphism in Corollary 7.3 is called an isomorphism of algebras becaus L (V ) and Mn×n (F) have the following properties: 1. L (V ) has the composition of linear operators. 2. Mn×n (F) has the product of square matrix. These properties are inherited at φ function, i.e.: Φ(S ◦ T ) = [S ◦ T ]{αi } = [S]{αi } · [T ]{αi } = Φ(S) · Φ(T ) Theorem 7.6 Let V, W finite-dimensional vector spaces over a field F. Let  {α } and α basis of V . Respectively, let {β1 , β2 , . . . , βm }, , α , . . . , α , α , . . . , α n  n 1 2  1 2 β1 , β2 , . . . , βm basis of W and T ∈ L (V, W ).Let P the change of basis matrix of basis {α1 , α2 , . . . , αn } to basis α1 , α2 , . . . , αn of V and  Q the change of basis matrix of basis {β1 , β2 , . . . , βm } to basis β1 , β2 , . . . , βm of W . We have

7.4 Associate Matrix

105 {β }

{β }

[T ]{αi } = Q −1 [T ]{αii } P i



Proof It is left to the reader as an exercise.

Corollary 7.4 Let V a finite-dimensional vector space over a field F. Let  {α1 , α2 , . . . , αn } and α1 , α2 , . . . , αn basis of V and T ∈ L (V ). Let P the change of basis matrix of basis {α1 , α2 , . . . , αn } to basis α1 , α2 , . . . , αn of V , then: [T ]{αi } = P −1 [T ]{αi } P Proof Consider v ∈ V and knowing that P [v]{αi, } = [v]{αi } makes P −1 [T ]{αi } P [v]{αi, } = P −1 [T ]{αi } [v]{αi } = P −1 [T (v)]{αi } = [T (v)]{αi, } but [T ]{αi, } [v]{αi, } = [T (v)]{αi, } , then P −1 [T ]{αi } P [v]{αi, } = [T ]{αi, } [v]{αi, } . Since v → [v]{αi, } is inFn , it leads to P −1 [T ]{αi } P X = [T ]{αi, } X for any X ∈ Fn , therefore P −1 [T ]{αi } P = [T ]{αi, }  Lemma 7.1 Let P = (chi )h,i ∈ Mn×n , an invertible matrix and {v1 , v2 , . . . , vn } a linearly independent set. For 1 ≤ i ≤ n, let vi

=

n 

chi vh

h=1

  Then v1 , v2 , . . . , vn is linearly independent. Proof Consider x1 v1, + · · · + xn vn, = 0 for some x1 , . . . , xn ∈ R, having: 0=

n 

xi vi,

i=1

= =

n 

n 

h=1

i=1

xi chi vh i=1 h=1  n  n  

xi chi vh

The set {v1 , . . . , vn } is linearly independent and make the system:

n

c11 x1 + · · · + x1n xn = 0 c21 x1 + · · · + x2n xn = 0 .. . 0 cn1 x1 + · · · + xnn xn = 0

i=1

xi chi = 0 for 1 ≤ h ≤ n

106

7 Linear Transformations

The associated matrix for the systems is P, making the proper elementary operations P can be made into the identity, and the systems will be: x1 = 0 x2 = 0 .. . xn = 0   Then vi, , . . . , vn, is linearly independent.



Definition 7.10 Let V a vector space over F, with dim V = n. Let T : V −→ V a linear operator and A ∈ Mn×n (F). It said to be matrix A represents to T if there exists a basis {α1 , α2 , . . . , αn } of V such that [9, 10, 16]: [T ]{vi } = A Theorem 7.7 Let A, B ∈ Mn×n (F) and V a vector space over F, with dim V = n. The matrices A and B represent the same linear operator T in V if and only if A and B are similar.   Proof There are T ∈ L (V ) and {v1 , . . . , vn }, v1, , . . . , vn, are bases for V, then A = [T ]{vi } and B = [T ]{vi, } . Let P be a matrix for basis change from {vi , . . . , vn } to  ,  vi , . . . , vn, , then B = P −1 A P implies that A and B are similar.

Similarly, Suppose that A = ai j and B are similar, then, there is P ∈ Mm×n (F) . , vn } be a basis for V . Making T : V → invertible such that B = P −1 A P, let {v i , . . n V in the basis {v1 , . . . , vn } for T v j = i=1 ai j vi for 1 ≤ j ≤ n and completing for linearity, then T ∈ L (V ) and A represent the operator T corresponding n , to the basis  , {vi , . ,.. , vn }. Also if P = (chi )hi for 1 ≤ i ≤ n , vi = h=1 chi vh , the set v1 , . . . , vn is linearly independent, then it is also a basis for V having B = P −1 A P = P −1 [T ]{vi } P = [T ]{vi, } , hence B also represents T. 

References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.

Hoffman, K., Kunze, R.: Linear Algebra. Prentice-Hall (1971) Grossman, S.: Elementary Linear Algebra. Wadsworth Publishing (1987) Whitelaw, T.: Introduction to Linear Algebra. Chapman & Hall/CRC (1992) Anton, H.: Elementary Linear Algebra. Wiley (2010) Gantmacher, F.R.: The Theory of Matrices, vol. 2. Chelsea, New York (1964) Axler, S.: Linear Algebra Done-Right. Springer (1997) Lipschutz, S.: Linear Algebra. McGraw Hill (2009) Nef, W.: Linear Algebra. Dover Publications (1967) Lang, S.: Linear Algebra. Springer (2004) Nering, E.: Linear Algebra and Matrix Theory. Wiley (1970)

References 11. 12. 13. 14.

107

kolman, B.: Introductory Linear Algebra an Applied First Course. Pearson Education (2008) Gantmacher, F.R.: The Theory of Matrices, vol. 1. Chelsea, New York (1959) Kailath, T.: Linear Systems. Prentice-Hall, Englewood Cliffs, NJ (1980) Hermann, R., Krener, A.: Nonlinear controllability and observability. IEEE Trans. Autom. Control 22(5), 728–740 (1977) 15. Aguilar-Lpez, R., Mata-Machuca, J.L., Martnez-Guerra, R.: Observability and Observers for Nonlinear Dynamical Systems: Nonlinear Systems Analysis. Lambert Academic Publishing (2011) 16. Halmos, P. R.: Finite-Dimensional Vector Spaces. Courier Dover Publications (2017)

Chapter 8

Matrix Diagonalization and Jordan Canonical Form

Abstract This chapter focuses on the basic theory of Matrix Diagonalization and Jordan Canonical Form.

8.1 Matrix Diagonalization Let V a finite dimensional vector space, dim V = n and T : V → V a linear operator, then T can be represented by a matrix A ∈ Mn×n (R). For this reason, some occasions we are going to refer to eigenvalues and eigenvectors of n × n. Theorem 8.1 Let A ∈ Mn×n (R), then λ is a eigenvalue of A if and only if p(λ) = det (A − λI ) = 0 Proof The number λ is an eigenvalue of A if and only if there is a non zero vector v that fulfills: Av = λv λI v − Av = 0 (λI − A) v = 0 If not the matrix (λI − A) is singular. The value λ is a root of P (λ) = |λI − A|, then v is an Eigenvector only if the statement is true, hence it is a solution of the characteristic polynomial  We know that p(λ) can be written as: p(λ) = λn + an−1 λn−1 + · · · + a1 λ + a0 = 0

© Springer Nature Switzerland AG 2019 R. Martínez-Guerra et al., Algebraic and Differential Methods for Nonlinear Control Theory, Mathematical and Analytical Techniques with Applications to Engineering, https://doi.org/10.1007/978-3-030-12025-2_8

109

110

8 Matrix Diagonalization and Jordan Canonical Form

This equation has n roots, which can be repeated. If λ1 , λ2 , . . . , λk are the different roots of p(λ) with multiplicities r1 , r2 , . . . , rk respectively, then p(λ) can be factorized as: p(λ) = (λ − λ1 )r1 (λ − λ2 )r2 · · · (λ − λk )rk = 0.

Definition 8.1 It said that the numbers r1 , r2 , . . . , rk are called algebraic multiplicities of eigenvalues λ1 , λ2 , . . . , λk respectively [1–4]. Definition 8.2 Let λ an eigenvalue of T and E λ = {v | T v = λv} = {v | (T − λI )v = 0} The subspace E λ (since is the kernel of linear transformation T − λI ) is called Eigenspace of T corresponding to eigenvalue λ. Since E λ is a subspace then 0 ∈ E λ , but the dimension dim E λ > 0 since by definition if λ is an eigenvalue then there exists an eigenvalue nonzero corresponding to λ. Theorem 8.2 Let V a finite dimensional vector space, dim V = n and T : V → V a linear operator. Let λ1 , λ2 , . . . , λk different eigenvalues of T and their corresponding eigenvectors v1 , v2 , . . . , vn , then v1 , v2 , . . . , vn are linearly independent. Proof Suppose that v1 , . . . , vn are the minimal set of vectors for which the theorem is not true, having s > 1 and v1 = 0, by the minimality condition v2 , . . . , vs are linearly independent, then v1 is a linear combination of v2 , . . . , vs : v1 = a2 v2 + · · · + as vs Applying T makes: T (v1 ) = T (a2 v2 + · · · + as vs ) = T (a2 v2 ) + · · · + T (as vs ) the eigenvector v j corresponding to the eigenvalue λ j and applying T  Consider T v j = λ j v j , then: λ1 v1 = a2 λ2 v2 + · · · + as λs vs Multiplying by λ1 : λ1 v1 = a2 λ1 v2 + · · · + as λ1 vs

8.1 Matrix Diagonalization

111

a2 (λ1 − λ2 ) v2 + · · · + as (λ1 − λs ) vs = 0 Being v2 , . . . , vs linearly independent causes that a2 (λ1 − λ2 ) = 0, . . . , as (λ1 − λs ) = 0, but the values of λ are different, then λ1 = λ j , j > 1, hence  a2 = 0, . . . , as = 0 contradicting ak = 0. Remark 8.1 Let A a matrix representation of T . Let suppose A ∈ M3×3 (R) with eigenvalues λ1 , λ2 and λ3 and v1 , v2 and v3 the eigenvectors associated with each λ [5–8]: (i) If the algebraic multiplicity of each λ is 1, by Theorem 8.1, v1 , v2 and v3 are linearly independent. (ii) If the algebraic multiplicity of λ1 and λ2 is 1 and 2 respectively, then dim E λ2 ≤ 2. Otherwise we could have at least 4 linearly independent vectors in a space of three dimensions, that is, for λ2 there can be 1 or 2 linearly independent eigenvectors. (iii) If the algebraic multiplicity of λ1 is 3 then dim E λ ≤ 3. This is, there can be 1, 2 or 3 linearly independent eigenvectors. Theorem 8.3 Let V a finite dimensional vector space, dim V = n and T : V → V a linear operator. Let λ an eigenvalue of T , then: 1 ≤ dim E λ ≤ algebraic multiplicity of λ. Proof It is left to the reader as an exercise. Example 8.1 Let

⎡ ⎤ −3 3 −2 A = ⎣−7 6 −3⎦ 1 −1 2

The eigenvalues of A corresponding to the roots of p(λ), this is: p(λ) = det (A − λI ) = (λ − 2)2 (λ − 1) = 0 hence, the eigenvalues are: • λ1 = 2 with algebraic multiplicity 2. • λ2 = 1 with algebraic multiplicity 1. On the other hand:



112

8 Matrix Diagonalization and Jordan Canonical Form

E λ1 = ker (A − λ1 I )

E λ1

= ker (A − 2I ) ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ −5 3 −2 x1 0 ⎬ ⎨ x1 = ⎣x2 ⎦ : ⎣−7 4 −3⎦ ⎣x2 ⎦ = ⎣0⎦ ⎩ ⎭ 1 −1 0 0 x3 x3

The solution of above system of linear equations provides the unique linearly independent eigenvector given by: ⎡ ⎤ 1 v1 = ⎣ 1 ⎦ −1 then E λ1 = L {v1 }, hence dim E λ1 = 1. Similarly for λ2 : E λ2 = ker (A − λ2 I ) = ker (A − I ) ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ −4 3 −2 x1 0 ⎬ ⎨ x1 E λ2 = ⎣x2 ⎦ : ⎣−7 5 −3⎦ ⎣x2 ⎦ = ⎣0⎦ ⎩ ⎭ 1 −1 1 0 x3 x3 Solving the above system of linear equations, the linearly independent eigenvector associated to λ2 is ⎡ ⎤ 1 v2 = ⎣2⎦ 1 then E λ2 = L {v2 }, hence dim E λ2 = 1. Definition 8.3 Let A, B ∈ Mn×n (R). It said A and B are equivalent matrices if there exists an invertible matrix Q ∈ Mn×n (R) such that: B = Q −1 AQ In addition, Q is called transformation matrix. Theorem 8.4 If A, B ∈ Mn×n (R) are equivalent matrices then A and B have the same characteristic polynomial and therefore they have the same eigenvalues. Proof Having the invertible matrix P that fulfills B = P −1 A P , which also makes λI = P −1 λI P such that:

8.1 Matrix Diagonalization

113

  |λI − B| = λI − P −1 A P    =  P −1 λI P − P −1 A P    =  P −1 (λI − A) P    =  P −1  |λI − A| |P|   Knowing that  P −1  |P| = 1, then |λI − B| = |λI − A| .



Definition 8.4 Let A ∈ Mn×n (R). It said that A is a diagonalizable matrix if there exist a diagonal matrix D such that A and D are equivalent, that is to say: D = Q −1 AQ where Q ∈ Mn×n (R) is a invertible matrix. Remark 8.2 • If D is a diagonal matrix, the eigenvalues of D are the components of the principal diagonal. • If A is equivalent to D, by Theorem 8.4, the eigenvalues of A are the same that D. According to above comments, if A is diagonalizable then A is equivalent to a diagonal matrix with components in the principal diagonal are the eigenvalues of A. In addition, given a linear transformation T : V → V in a vector space V with dim V = n, it is worth asking: 1. Is there a base β for V such that the matrix representation A of T is a diagnonalizable matrix? 2. If that base exists, how can it be found? The following theorem has the answer. Theorem 8.5 Let A ∈ Mn×n (R). A is diagonalizable if and only if A has n linearly independent eigenvectors. In this case, the diagonal matrix D equivalent to A is given by: ⎡ ⎤ λ1 0 0 · · · 0 ⎢ 0 λ2 0 · · · 0 ⎥ ⎢ ⎥ ⎢ ⎥ D = ⎢ 0 0 λ3 · · · 0 ⎥ ⎢ .. .. .. . . .. ⎥ ⎣. . . . .⎦ 0 0 0 . . . λn

where λ1 , λ2 , . . . , λn are the eigenvalues of A. In addition if Q is a matrix whose columns are linearly independent vectors of A, then: D = Q −1 AQ. Proof The matrix A has eigenvalues λ1 , . . . , λn that make linearly independent eigenvectors v1 , . . . , vn , let the Q matrix have in its columns v1 , . . . , vn , so it is non singular. The columns of AQ are Av1 , . . . , Avn and Avk = λvk , then its columns

114

8 Matrix Diagonalization and Jordan Canonical Form

can be written as λ1 v1 , . . . , λn vn , the matrix D = diag (λ1 , . . . , λn ) implies that Q D has columns λk vk , then AQ = Q D And D = Q −1 AQ Suppose there is a non singular matrix Q such that Q −1 AQ = diag (λ1 , . . . , λn ) = D and v1 , . . . , vn are the columns of Q, then the columns of AQ = Avk and the columns of Q D = λk vk then Av1 = λ1 v1 , · · · , Avn = λn vn , even more, Q is nonsingular and v1 , . . . , vn are non zero, eigenvectors of A, diagonal elements of D and linearly independent, hence the theorem is true.  Corollary 8.1 If A ∈ Mn×n (R) has n different eigenvalues then A is diagonalizable. Proof It is left to the reader as an exercise. Example 8.2 Let





⎤ 1 −1 4 A = ⎣3 2 −1⎦ 2 1 −1

with eigenvalues given by p(λ) = det (A − λI ) = − (λ − 1) (λ + 2) (λ − 3) = 0, hence λ1 = 1, λ2 = −2, λ3 = 3. Since the eigenvalues are different, by Theorem 8.2 there are three linearly independent eigenvectors that can be: ⎡

⎤ ⎡ ⎤ ⎡ ⎤ −1 1 1 v1 = ⎣ 4 ⎦ , v2 = ⎣−1⎦ , v3 = ⎣2⎦ 1 −1 1 Hence, by Theorem 8.5 A is diagonalizable where: ⎡

⎤ −1 1 1 Q = ⎣ 4 −1 2⎦ 1 −1 1 Finally: ⎡ ⎤⎡ ⎤⎡ ⎤ ⎡ ⎤ 1 −2 3 1 −1 4 −1 1 1 1 0 0 1 D = Q −1 AQ = − ⎣−2 −2 6 ⎦ ⎣3 2 −1⎦ ⎣ 4 −1 2⎦ = ⎣0 −2 0⎦ 6 −3 0 −3 2 1 −1 1 −1 1 0 0 3

8.1 Matrix Diagonalization

115

Example 8.3 Although the eigenvalues of a matrix are not different, it is possible to diagonalize the given matrix. Let ⎡

⎤ 324 A = ⎣2 0 2 ⎦ 423 with eigenvalues given by p(λ) = det (A − λI ) = − (λ + 1)2 (λ − 8) = 0, hence λ1 = −1 with algebraic multiplicity 2 and λ2 = 8. For λ2 = 8 the linearly independent eigenvector is: ⎡ ⎤ 2 v1 = ⎣1⎦ 2 On the other hand for λ1 = −1 the corresponding linearly independent eigenvectors are: ⎡ ⎤ ⎡ ⎤ 1 0 v2 = ⎣−2⎦ , v3 = ⎣−2⎦ 0 1 By Theorem 8.5 the matrix A is diagonalizable with ⎡

⎤ 2 1 0 Q = ⎣1 −2 −2⎦ 2 0 1 ⎡ ⎤⎡ ⎤⎡ ⎤ ⎡ ⎤ −2 −1 −2 324 2 1 0 8 0 0 1 D = Q −1 AQ = − ⎣−5 2 4 ⎦ ⎣2 0 2⎦ ⎣1 −2 −2⎦ = ⎣0 −1 0 ⎦ 9 4 2 −5 423 2 0 1 0 0 −1 Example 8.4 In Example 8.1, the matrix A has only two linearly independent eigenvectors, then it is not possible to find the transformation matrix Q. Hence by Theorem 8.5, A is not diagonalizable. The matrices M ∈ Mn×n (R) with n linearly independent eigenvectors can be expressed in a diagonal matrix through a equivalence transformation. Most polynomials have different roots, then most of the matrices will have different eigenvalues and therefore they will be diagonalizable [3–5, 8]. Matrices that are not diagonalizable (matrices that do not have n linearly independent eigenvectors) have some applications. In this case, it still possible to prove that matrix is equivalent to other matrix most simple but the new matrix is not diagonal and the transformation matrix Q is more difficult to obtain.

116

8 Matrix Diagonalization and Jordan Canonical Form

8.2 Jordan Canonical Form Although not every linear operator T is diagonalizable, it is possible to find a base β for the vector space V with dim V = n such that the matrix representation A ∈ Mn×n (R) of T is equivalent to: ⎡

J1 ⎢0 ⎢ ⎢ J = ⎢0 ⎢ .. ⎣.

0 J2 0 .. .

0 0 J3 .. .

··· ··· ··· .. .

⎤ 0 0⎥ ⎥ 0⎥ ⎥ .. ⎥ .⎦

0 0 0 . . . Jn

where Ji is a matrix of the form (λi ) or the form: ⎡ λi ⎢0 ⎢ ⎢0 ⎢ ⎢ .. ⎢. ⎢ ⎣0 0

⎤ 0 0⎥ ⎥ 0⎥ ⎥ .. ⎥ .⎥ ⎥ 0 0 . . . λi 1 ⎦ 0 0 . . . 0 λi

1 λi 0 .. .

0 1 λi .. .

0 0 1 .. .

··· ··· ··· .. .

for some eigenvalue λi of A. In the above description: • • • •

Ji is called Jordan block corresponding to λi . βi is the corresponding base to the block Ji . J is called Jordan canonical form of A. β is the Jordan canonical base.

Remark 8.3 Each Jordan block Ji is almost a diagonal matrix. J is a diagonal matrix if and only if each Ji is of the form (λi ) [9–11]. Example 8.5

⎡ ⎢4 ⎢· · · ⎢ ⎢ ⎢0 ⎢ ⎢ ⎢0 ⎢ ⎢ ⎢0 ⎢ ⎢· · · ⎣

.. . ··· .. . .. . .. .

··· . 0 ..

⎤ 0⎥ · · ·⎥ ⎥ ⎥ 0⎥ ⎥ ⎥ 0⎥ ⎥ ⎥ 0⎥ ⎥ · · · · · · · · · · · · · · ·⎥ ⎦ .. 0 0 0 . 7

. 0 0 0 .. ··· ··· ··· ··· . −3 1 0 .. . 0 −3 1 .. . 0 0 −3 ..

Jordan blocks are marked with dotted lines. The algebraic multiplicity of each eigenvalue λi is the number of times that the eigenvalue λi appears on the diagonal of Ji .

8.2 Jordan Canonical Form

117

Proposition 8.1 Let λi an eigenvalue of A such that dim E λi = si , then the number of ones above the diagonal of Jordan canonical form is given by: n−

k 

si

(8.1)

i=1

where k is the number of eigenvalues. 

Proof It is left to the reader as an exercise. Example 8.6 If the characteristic polynomial of a matrix A ∈ M4×4 (R) is p(λ) = (λ − 2)3 (λ + 3)

then λ1 = 2 with algebraic multiplicity 3 and dim E λ1 ≤ 3. For λ2 = −3 with algebraic multiplicity 1 and dim E λ2 = 1. Since dim E λ1 ≤ 3 then there can be 1, 2 or 3 linearly independent eigenvectors for λ1 . For λ2 there is always a linearly independent eigenvector, hence the possible Jordan canonical forms of A are: 1. If dim E λ1 = 1, by Eq. (8.1) the number of ones above the diagonal of Jordan canonical form is 3 − 1 = 2: ⎡ ⎤ 210 0 ⎢0 2 1 0 ⎥ ⎥ J1 = ⎢ ⎣0 0 2 0 ⎦ 0 0 0 −3 2. If dim E λ1 = 2, the number of ones above the diagonal of Jordan canonical form is 3 − 2 = 1: ⎡ ⎤ 210 0 ⎢0 2 0 0 ⎥ ⎥ J2 = ⎢ ⎣0 0 2 0 ⎦ 0 0 0 −3 3. If dim E λ1 = 3, in this case the number of canonical form is 3 − 3 = 0: ⎡ 200 ⎢0 2 0 J3 = ⎢ ⎣0 0 2 000

ones above the diagonal of Jordan ⎤ 0 0⎥ ⎥ 0⎦ −3

In the columns where appear the ones above the eigenvalue indicate that there are vectors that are not eigenvectors. In J1 the columns 2 and 3 indicate that for λ1 there are two vectors that are not eigenvectors. On the other hand for J3 all vectors are eigenvectors.

118

8 Matrix Diagonalization and Jordan Canonical Form

Theorem 8.6 Let A ∈ Mn×n (R), then there exists an invertible matrix Q ∈ Mn×n (R) such that: J = Q −1 AQ where J is a Jordan matrix whose diagonal elements are the eigenvalues of A. In addition, J is unique except by the order of the Jordan blocks. 

Proof It is left to the reader as an exercise. Example 8.7 From Example 8.6, if A is equivalent to J1 then A is equivalent to: ⎡ −3 ⎢0 ⎢ ⎣0 0

0 2 0 0

0 1 2 0

⎤ 0 0⎥ ⎥ 1⎦ 2

this is, the Jordan blocks are the same but the order can be different. Example 8.8 According to the methodology to find the matrix representation A of T , let P2 (R) the set of polynomials and β = {x1 , x2 , x3 } = a linear 2transformation  1, x, x a base for P2 (R). Let define T : P2 (R) → P2 (R) through T ( f ) = f  . The matrix representation A can be calculated as follows: T (1) = 0 = 0 · 1 + 0 · x + 0 · x 2 T (x) = 1 = 1 · 1 + 0 · x + 0 · x 2 T (x 2 ) = 2x = 0 · 1 + 2 · x + 0 · x 2 then

⎡ ⎤ 010 A = ⎣0 0 2⎦ . 000

8.2.1 Generalized Eigenvectors Definition 8.5 Let T a linear operator in a space vector V with dim V = n. An element nonzero v ∈ V is called generalized eigenvector of T if there exists an scalar λ such that (T − λI ) p (v) = 0 for any p ∈ Z+ . Definition 8.6 It said to be v is a generalized eigenvector corresponding to λ. Example 8.9 Let

8.2 Jordan Canonical Form

119



2 ⎢0 ⎢ ⎢0 ⎢ ⎢0 J =⎢ ⎢0 ⎢ ⎢0 ⎢ ⎣0 0

1 2 0 0 0 0 0 0

0 1 2 0 0 0 0 0

0 0 0 2 0 0 0 0

0 0 0 0 3 0 0 0

0 0 0 0 1 3 0 0

0 0 0 0 0 0 0 0

⎤ 0 0⎥ ⎥ 0⎥ ⎥ 0⎥ ⎥ 0⎥ ⎥ 0⎥ ⎥ 1⎦ 0

The characteristic polynomial is p(λ) = det (J − λI ) = (λ − 2)4 (λ − 3)2 λ2 where λ1 = 2 with algebraic multiplicity 4, λ2 = 3 with algebraic multiplicity 2 and λ3 = 0 with algebraic multiplicity 2. From the associated basic vectors x1 , x2 , . . . , x8 to λ1 , λ2 and λ3 only x1 , x4 , x5 and x7 are eigenvectors. Since x1 , x4 are corresponding eigenvectors to λ1 then: (T − 2I ) (x1 ) = (T − 2I ) (x4 ) = 0 but x2 and x3 are not eigenvectors. Hence: (T − 2I ) (x2 ) = 0 , (T − 2I ) (x3 ) = 0 On the other hand since J is the equivalent matrix to matrix representation of T then T (x2 ) = x1 + 2x2 + 0x3 + · · · + 0x8 T (x3 ) = 0x1 + x2 + 2x3 + 0x4 + · · · + 0x8 then (T − 2I )2 (x2 ) = (T − 2I ) (T (x2 ) − 2x2 ) = (T − 2I ) (x1 ) = 0 and (T − 2I )3 therefore, although (x3 ) = (T − 2I )2 (T (x3 ) − 2x3 ) = (T − 2I )2 (x2 ) = 0, (T − 2I ) (x2 ) = 0 and (T − 2I ) (x3 ) = 0 it follows that (T − 2I ) p (x2 ) = (T − 2I ) p (x3 ) = 0 if p ≥ 2 If p is the smallest positive integer such that (T − λI ) p (x) = 0 then (T − λI ) p−1 (x) = y is an eigenvector corresponding to eigenvalue λ. Definition 8.7 Let T a linear operator in a space vector V . A subspace W of V is called T − cyclic subspace if there exists an element v ∈ W such that W is equal to the generated subspace by {v, T (v), T 2 (v), . . .}. In this case is, it is said that W is generated by v. Example 8.10 Let T : R3 → R3 defined by T (a, b, c) = (−b + c, a + c, 3c). The T − cyclic subspace generated by e1 = (1, 0, 0) is determined as follows: T (e1 ) = T (1, 0, 0) = (0, 1, 0) = e2 T 2 (e1 ) = T (T (e1 )) = T (e2 ) = (−1, 0, 0) = −e1

120

8 Matrix Diagonalization and Jordan Canonical Form

then

  L e1 , T (e1 ), T 2 (e1 ), . . . = L {e1 , e2 } .

Definition 8.8 Let T and V as above definition. Let v a generalized eigenvector of T corresponding to eigenvalue λ. If p is the smallest positive integer such that  (T − λI ) p (v) = 0 then the set (T − λI ) p−1 (v), (T − λI ) p−2 (v), . . . , (T − λI ) (v), v is called a cycle of generalized eigenvectors of T corresponding to eigenvalue λ. Theorem 8.7 Let T a linear operator in V , γ a cycle of generalized eigenvectors of T corresponding to eigenvalue λ then: 1. The initial vector of γ is a eigenvector of T corresponding to eigenvalue λ and no other element of γ is a eigenvector of T . 2. γ is linearly independent. 3. Let β a base of V . Then β is a Jordan canonical base for V if and only if β is a disjoint union of cycles of generalized eigenvectors of T . Proof It is left to the reader as an exercise.



Definition 8.9 Let λ a eigenvalue of a linear operator T in V . The set given by:   K λ = v ∈ V : (T − λI ) p (v) = 0, p ∈ Z+ is called generalized eigenspace of T corresponding to λ such that E λ ⊆ K λ . If the algebraic multiplicity of λ is m then dim K λ = m, hence dim E λ ≤ dim K λ = m.

8.2.2 Dot Diagram Method As an aid to calculate Ji and the base βi , we introduce an arrangement of points called dot diagram. Let suppose that βi is a disjoint union of cycles c1 , c2 , . . . , cki with lengths p1 ≥ p2 ≥ · · · ≥ pki respectively. The diagram is constructed as follows. 1. The arrangement consists in ki columns. One column for each cycle. 2. From left to right, the column j has p j points that correspond to elements of c j in the following form. If x j is the vector terminal of c j the: • The top point corresponds to (T − λi I ) p j −1 (x j ). • The second point corresponds to (T − λi I ) p j −2 (x j ). • The last point (point of the bottom) of the column corresponds to x j . Finally, the associated diagram to βi can be written as:

8.2 Jordan Canonical Form

121

• (T − λi I ) p1 −1 (x1 ) • (T − λi I ) p2 −1 (x2 ) · · · • (T − λi I ) p1 −2 (x1 ) • (T − λi I ) p2 −2 (x2 ) · · · .. .. . . .. .. . . • (T − λi I ) (x2 ) • (T − λi I ) (x1 ) • x2 • x1

• (T − λi I ) pki −1 (xki ) • (T − λi I ) pki −2 (xki ) .. . • •

(T − λi I ) (xki ) x ki

where vector x j always is a eigenvector. Note that dot diagram is for each λi . Besides since p1 ≥ p2 ≥ · · · ≥ pki , the columns of the dot diagram become shorter (or at least no longer) when we move from left to right. How many points are there in the first row of the diagram? The following theorem has the answer to this question. Theorem 8.8 Let r j the number of points in the row j of a dot diagram for the base βi , then r1 = dim V − rank (T − λi I ) r j = rank (T − λi I ) j−1 − rank (T − λi I ) j for j > 1. 

Proof It is left to the reader as an exercise. Example 8.11 In Example 8.4, we saw that matrix A given by ⎡ ⎤ −3 3 −2 A = ⎣−7 6 −3⎦ 1 −1 2

is not diagonalizable. For this matrix, the eigenvalues are λ1 = 2 with algebraic multiplicity 2 and dim E λ1 = 1, λ2 = 1 with algebraic multiplicity 1 and dim E λ2 = 1. Besides dim E λ1 < dim K λ1 = 2. Let β1 a base for K λ1 , by Theorem 8.8, r1 = 3 − rank (A − 2I ) = 3 − 2 = 1 Hence in the firs row there is one point. Since dim K λ1 = 2 there are only two linearly independent vectors, then in the second row there is one point too. Therefore the associated dot diagram with βi is: • (A − 2I ) (x1 ) • x1 where x1 is a linearly independent eigenvector, then 

21 J1 = 02



122

8 Matrix Diagonalization and Jordan Canonical Form

After some calculations, the linearly independent eigenvector is obtained: ⎡

⎤ 1 x1 = ⎣ 1 ⎦ −1 To find the generalized eigenvector v, the easiest method for practice cases is to calculate from the equation (A − 2I ) v = x1 , i.e., ⎡ ⎤⎡ ⎤ ⎡ ⎤ −5 3 −2 v1 1 ⎣−7 4 −3⎦ ⎣v2 ⎦ = ⎣ 1 ⎦ 1 −1 0 −1 v3 ⎡ ⎤ 1 v = ⎣2 ⎦ 0

where it follows that:

hence

⎧⎡ ⎤ ⎡ ⎤⎫ 1 ⎬ ⎨ 1 β1 = ⎣ 1 ⎦ , ⎣2⎦ ⎩ ⎭ −1 0

then K λ1

⎧⎡ ⎤ ⎡ ⎤⎫ 1 ⎬ ⎨ 1 = L ⎣ 1 ⎦ , ⎣2 ⎦ ⎩ ⎭ −1 0

On the other hand, dim E λ2 = dim K λ2 = 1. Let β2 a base for K λ2 , since dim K λ2 = 1 there is only one linearly independent vector and the dot diagram is: • x1 then J2 = (λ2 ) = (1). Solving (A − I ) v = 0 it is obtained ⎡ ⎤ 1 v = ⎣2⎦ 1 then

then

⎧ ⎡ ⎤⎫ ⎨ 1 ⎬ β2 = ⎣2⎦ ⎩ ⎭ 1

8.2 Jordan Canonical Form

123

K λ2

⎧⎡ ⎤⎫ ⎨ 1 ⎬ = L ⎣2 ⎦ ⎩ ⎭ 1

If β is the Jordan canonical base, by Theorem 8.7, ⎧ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 1 1 ⎬ ⎨ 1 β = β1 ∪ β2 = ⎣ 1 ⎦ , ⎣2⎦ , ⎣2⎦ ⎩ ⎭ −1 0 1 and the Jordan canonical form is ⎡

⎤ 0⎥ ⎥ 0⎥ ⎥ · · · · · ·⎥ ⎦ . 0 0 .. 1

⎢2 1 ⎢ ⎢ J =⎢0 2 ⎢· · · · · · ⎣

.. . .. .

Then the transformation matrix Q is: ⎡

⎤ 1 11 Q = ⎣ 1 2 2⎦ −1 0 1 Hence J = Q −1 AQ. Exercise 8.1 If the characteristic polynomial of a matrix B is (λ + 3)2 (λ − 2)2 (λ − 4), find all possible Jordan canonical forms.

References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.

Hoffman, K., Kunze, R.: Linear Algebra. Prentice-Hall (1971) Axler, S.: Linear Algebra Done-Right. Springer-Verlag (1997) Lipschutz, S.: Linear Algebra. McGraw Hill (2009) Anton, H.: Elementary Linear Algebra. Wiley (2010) Grossman, S.: Elementary Linear Algebra. Wadsworth Publishing (1987) Whitelaw, T.: Introduction to Linear Algebra. Chapman & Hall/CRC (1992) kolman, B.: Introductory Linear Algebra an Applied First Course. Pearson Education (2008) Halmos, P.R.: Finite-Dimensional Vector Spaces. Courier Dover Publications (2017) Nef, W.: Linear Algebra. Dover Publications (1967) Lang, S.: Linear Algebra. Springer (2004) Nering, E.: Linear Algebra and Matrix Theory. Wiley (1970)

Chapter 9

Differential Equations

Abstract The purpose of this chapter is to present several methods to solve differential equations, this chapter begins with a motivation based on physical phenomena that can be represented with differential equations, after that we introduce definitions and methods for first order and second order linear differential equations, various exercises are given to the reader to better illustrate the contents of this chapter.

9.1 Motivation: Some Physical Origins of Differential Equations 9.1.1 Free Fall Free fall is any motion of a body where gravity is the only force acting upon it. Consider the free fall of a body with mass m and y(0) = 0, y  (0) = 0 (Fig. 9.1). By second law of Newton we have:  +↓ Fy = ma mg = ma d2 y g=a= 2 dt   d2 y dv then dv = gdt hence: since 2 = dt dt v = gt + k1 Integrating the above equation, and taking into account that v =  y=

 vdt =

(gt + k1 )dt =

dy : dt

1 2 gt + k1 t + k2 2

© Springer Nature Switzerland AG 2019 R. Martínez-Guerra et al., Algebraic and Differential Methods for Nonlinear Control Theory, Mathematical and Analytical Techniques with Applications to Engineering, https://doi.org/10.1007/978-3-030-12025-2_9

125

126

9 Differential Equations

Fig. 9.1 Free fall

mg

Finally, by initial conditions of the problem: k1 = y  (0) − g(0) = 0 1 k2 = y(0) − g(0)2 + k1 (0) = 0 2 therefore y=

1 2 gt 2

9.1.2 Simple Pendulum Problem Now, according to Fig. 9.2, it is not difficult to obtain the following analysis based in kinetic and potential energy for the simple pendulum that consists in a mass m attached to a rope with length L. From points 1 and 2 in the figure: 1 2 mv + mgL (1 − cos θ ) = mgL (1 − cos α) 2 1 2 v + gL (1 − cos θ ) = gL (1 − cos α) 2 v2 + 2gL (1 − cos θ ) = 2gL (1 − cos α) since s = θ L then v =

ds dθ = L , we have: dt dt  L

dθ dt

2 + 2g (cos α − cos θ ) = 0

9.1 Motivation: Some Physical Origins of Differential Equations

127

Fig. 9.2 Simple pendulum

9.1.3 Friction Problem According to Fig. 9.3, we have: mg − F f = m

d2 y dt 2

then d2 y dt 2 d2 y dy =m 2 mg − K¯ dt dt mg − μK v = m

where K¯ = μK . Finally we have: d2 y K¯ dy −g=0 + 2 dt m dt

Fig. 9.3 Friction problem Ff

mg

128

9 Differential Equations

9.2 Definitions Definition 9.1 A differential equation of order n is a relationship of the form:   F = y (n) , y (n−1) , . . . , y (1) , y, x = 0 Remark 9.1 Note that y (1) =

dy = y dx

y (2) =

d2 y = y  dx 2

and

(9.1)

y  = y˙ = y (1) , y  = y¨ = y (2) are commonly used to represent the first and second derivatives with respect to the time respectively. Definition 9.2 It is said to be f : (a, b) → R, f ∈ C n , is a solution of (9.1), if for all x ∈ (a, b):   F = f (n) (x), f (n−1) (x), . . . , f  (x), f (x), x = 0

(9.2)

Example 9.1 1. The relation F = y  − y = 0 is a differential equation of second order. Note that f 1 (x) = ex f 2 (x) = k1 ex , k1 ∈ R f 3 (x) = ex + e−x f 4 (x) = k2 ex + k3 e−x , k2 , k3 ∈ R are solutions of F. 2. The solutions of the differential equation of second order y  − 2y  + y = 0 are given by y = f (x) = xex and y = ex . 3. y  = cos x is a nonlinear differential equation of first order. 4. y  + k 2 y= 0 is a differential equation of second order. 5. x 2 + y 2 dx + 2x ydy = 0 is a nonlinear differential equation of first order.  3 3 6. ddxw2 − x y dw + w = 0 is a differential equation of second order and third dx degree.

9.3 Separable Differential Equations

129

9.3 Separable Differential Equations Definition 9.3 The equation of first order F(y  , y, x) = 0 is called normal, if the first derivative can be obtained, i.e. [1–3]: dy = y  = f (x, y) dx

(9.3)

Definition 9.4 The Eq. (9.3) is said to be separable differential equation if we can write it in the following form: N (y)dy = M(x)dx

(9.4)

Solving separable differential equations is easy. We first rewrite the differential equation as the follows N (y)dy = M(x)dx Then, integrating in both sides, we have: 

 N (y)dy =

M(x)dx

Example 9.2 1. x dx − y dy = 0. In this case, the differential equation can be write in the following form x dx = y dy, then √ integrating in both sides we have 2 2 + c = y , where it follows that y = ± x 2 + c. that x     2 2 2. x 1 + y dx − y 1 + x dy = 0. For this DE we have:     x 1 + y 2 dx − y 1 + x 2 dy   y 1 + x 2 dy  x dx 1 + x2 ln(1 + y 2 )

=0

  = x 1 + y 2 dx  y dy = 1 + y2 = ln(1 + x 2 ) + C

y 2 = C1 (1 + x 2 ) − 1

y = ± C1 (1 + x 2 ) − 1 3.

dy dx

+ y 2 ex = y 2 . This differential equation is separable, then:

130

9 Differential Equations

dy + y 2 ex = y 2 dx   dy = (1 − e x )d x y2 −y −1 = x − e x + C 1 y= x e −x +C dy = cos2 (y) sin(x). dx 1+y dy = 5. dx 1−x 6. ay ln(x) = dy − ay, a ∈ R dx The solution for differential equations 4, 5 and 6 are left to the reader as an exercise. 4.

9.4 Homogeneous Equations Definition 9.5 A function f (x, y) is said to be homogeneous of degree n (n = 0, 1, . . .) if for all t > 0 [4, 5]: f (t x, t y) = t n f (x, y) Example 9.3 1. If f (x, y) = x 3 + x y 2 then f (t x, t y) = (t x)3 + (t x)(t y)2 = t 3 (x 3 + x y 2 ) = t 3 f  (x,  y). Hence f (t, x) is homogeneous of degree 3. x is homogeneous of degree 0: 2. f (x, y) = 1 + sin y 

tx f (t x, t y) = 1 + sin ty



  x = 1 + sin = t 0 f (x, y) y

3.

In the same manner as above, it is not difficult to prove that f (x, y) = x + x 2 + y 2 is homogeneous of degree 1. The proof is left to the reader. Definition 9.6 An equation geneous of zero degree. Example 9.4

dy dx

= f (x, y) is said to be homogeneous if f is homo-

dy x+y = dx 3x − 2y

is homogeneous because f (x, y) =

x+y 3x−2y

is a homogenous function of degree zero:

9.4 Homogeneous Equations

131

f (t x, t y) =

t (x + y) tx + ty = = t 0 f (x, y) 3t x − 2t y t (3x − 2y)

Proposition 9.1 Let M(x, y) and N (x, y) homogeneous functions of the same degree (say n) then the differential equation M(x, y)d x + N (x, y)dy = 0 is homogeneous. Proof The proof is immediate and it is left to the reader.



Now, let consider f (x, y) a homogeneous equation of zero degree, i.e., f (t x, t y) = t 0 f (x, y) = f (x, y) if

y 1 = f (x, y), t = , x = 0 f 1, x x

then for a differential equation

dy dx

= f (1, xy ), let define z = xy , then y = xz

Deriving (9.5), we have d dx dz dy dz = +z =x + z = f (1, z) (x z) = x dx dx dx dx dx then

dz dx = f (1, z) − z x

The problem is solved as a separable differential equation problem. Example 9.5 Given y =

2x + y x

We define z = xy . Then x

dz +z = 2+z dx dz =2 x dx   dx dz = 2 x z = 2 ln x + k, k ∈ R

(9.5)

132

9 Differential Equations

Returning to the original variable

y x

= 2 ln x + k. Finally:

y = ln x 2x + kx Exercise 9.1 Solve the following differential equations. 1. 2. 3. 4. 5.

x+y dy = dx x−y

2 + y 2 dx = y dy x   2 x + y 2 dx x y dy = 0 1+y dy = dx 1−x ay ln(x) = dy − ay, a ∈ R dx

9.5 Exact Equations Let F(x, y), then dF =

∂F ∂F dx + dy. If F is a constant function, then ∂x ∂y

M(x, y)dx + N (x, y)dy =

∂F ∂F dx + dy = 0 ∂x ∂y

(9.6)

∂F ∂F = M(x, y), = N (x, y). The differential equation (9.6) is said to be ∂x ∂y exact if there exists F(x, y) such that: hence

M(x, y) =

∂ F(x, y) ∂ F(x, y) and N (x, y) = ∂x ∂y

In this case, the Eq. (9.6) can be written as: dF = M(x, y)dx + N (x, y)dy = 0 and its solutions are given by F(x, y) = constant. Example 9.6 The differential equation   yex dx + ex + 2y dy = 0 is exact. Let F(x, y) = yex + y 2 = c with c a constant, then ∂ F(x, y) ∂ F(x, y) = yex and = ex + 2y ∂x ∂y

9.5 Exact Equations

133

Now let us check the following problem. If M(x, y)dx + N (x, y)dy is exact, then ∂N ∂M = ∂y ∂x if and only if dx and dy are continuous. Reciprocally if M(x, y)dx + N (x, y)dy is a differential equation such that: ∂N ∂M = ∂y ∂x then the equation is exact. To solve this problem, let us suppose the equation ∂N , then there exists F(x, y) such that ∂x M=

F(x, y) = ∂F ∂y

∂F ∂x

M(x, y) dx + g(y)

(9.7)

= N (x, y), i.e. ∂ ∂y

then

=



therefore

but

∂M ∂y





M(x, y) dx + g  (y) = N (x, y)

∂ g (y) = N (x, y) − ∂y 

Finally

 N (x, y) −

g(y) =

∂ ∂y

 M(x, y) dx 

 M(x, y) dx dy

Now, the substitution in (9.7) of g(y) is useful to find the function F(x, y). Example 9.7 In the differential equation   yex dx + ex + 2y dy = 0 let M = yex and N = ex + 2y, so that ∂∂My = ex and differential equation is exact. From (9.7):

∂N ∂x

= ex . Therefore the given

 F(x, y) =

yex dx + g(y) = yex + g(y)

134

9 Differential Equations

In addition N=

∂ F(x, y) = ex + g  (y) = ex + 2y ∂y

so that g  (y) = 2y and g(y) = y 2 + k. Finally F(x, y) = yex + y 2 = constant Exercise 9.2 Solve the following differential equations.   2 dy + ydx = 0 1. x + y    2. y − x 3 dx + x + y 3 dy = 0 3. (y + y cos(x y)) dx + (x + x cos(x y)) dy = 0 4. (sin x sin y − xe y ) dy = (e y + cos x cos y) dx 5. e y dx + (xe y + 2y) dy = 0

9.6 Linear Differential Equations Definition 9.7 A differential equation of order n is said to be linear if has the following form: an (x)y (n) + an−1 (x)y (n−1) + · · · + a1 (x)y (1) + a0 (x)y = f (x)

(9.8)

As well as, the differential equation (9.8) is said to be homogeneous [6, 7] if f (x) = 0 ∀ x and the differential equation (9.8) is said to be with constants coefficients if an (x), an−1 (x), . . . , a1 (x), a0 (x) are constants functions. Example 9.8 1. 2x y  − x y  = 2 is a differential equation of second order, linear, with variables coefficients and non homogeneous. 2. y (3) − y (2) + x y = 0 is a differential equation of third order, linear , with variables coefficients and homogeneous. 3. y (2) + λ2 y = cos(ωx) is a linear differential equation of second order with constants coefficients and non homogeneous. Now we consider three cases. The first case is given by the following differential equation ay  + by = 0 a, b are constants, a = 0 (9.9) Solving (9.9) we have:

9.6 Linear Differential Equations

135

dy = −by dx dy b = − dx y a   dy b =− dx y a b ln y = − x + C1 a a

b

y = C 2 e− a x with C1 , C2 = eC1 constants. The second case is given by the following differential equation ay  + by = f (x) a, b are constants, a = 0 Rewriting (9.10) we have: where λ = ab , h(x) =

1 a

y  + λy = h(x)

(9.10)

(9.11)

f (x), a = 0. On the other hand, we have

   λx  e y = eλx y  + eλx λy = eλx y  + λy From (9.11) multiplying by eλx :   eλx y  + λy = eλx h(x) so that we have

 λx  e y = eλx h(x)

Finally, eλx y =



eλx h(x) dx + C

and the solution of (9.10) is given by: y=e

−λx



eλx h(x) dx + e−λx C

Example 9.9 Given the differential equation y  + 2y = e5x Applying the methodology previously explained, we have

(9.12)

136

9 Differential Equations

  e2x y  + 2y = e7x  2x  e y = e7x 1 e2x y = e7x + K 7 1 y(x) = e5x + K e−2x 7 Note that a = 1, b = 2, f (x) = e5x , so that λ = ab = 2 and h(x) = and from (9.12), the solution of the differential equation is: y(x) =

1 a

f (x) = e5x

1 5x e + K e−2x 7

and K is the integration constant. Finally the third case is given by the following differential equation: y  + b(x)y = f (x) Notice that

(9.13)

   B(x)  y = e B(x) y  + B  (x)y e

so that it is proposed B  (x) = b(x). From (9.13) multiplying by e B(x), we obtain:     e B(x) y  + b(x)y = e B(x) y = e B(x) f (x) 

therefore e B(x) y =

e B(x) f (x) dx + C

and the solution of (9.13) is:   y = e−B(x) e B(x) f (x) dx + Ce−B(x) , with B(x) = b(x)

(9.14)

Example 9.10 To solve the following differential equation y  + y sin x = sin x note that B  (x) = b(x) = sin x, then B(x) = sin x. Then from (9.14), we have:



sin x = − cos x + k1 and f (x) =

9.6 Linear Differential Equations

137

 y = ecos x−k1 =e

−k1 cos x



e− cos x+k1 sin x dx + Cecos x−k1

e 

ek1 e− cos x sin x dx + Cecos x−k1

e− cos x sin x dx + Cecos x−k1   = ecos x e− cos x + k2 + Cecos x−k1 = ecos x

= 1 + k2 ecos x + Cecos x−k1 y = 1 + k3 ecos x where k1 , k2 are integration constants and k3 = k2 + β, β = Ce−k1 .

9.7 Homogeneous Second Order Linear Differential Equations In the homogeneous second order linear differential equation with constants coefficients a0 , a1 : (9.15) y  + a1 y  + a0 y = 0 suppose that y = eλx is a solution. The derivatives of the proposed solution are given by y  = λeλx , y  = λ2 eλx Substituting in (9.15), we have:   λ2 eλx + a1 λeλx + a0 eλx = eλx λ2 + a1 λ + a0 = 0 If λ is a root of the equation λ2 + a1 λ + a0 = 0 then the function eλx is a solution of (9.15). The polynomial p(λ) = λ2 + a1 λ + a0 is called the characteristic polynomial and the equation λ2 + a1 λ + a0 = 0 is the characteristic equation of (9.15) [2, 5, 7]. The roots of the characteristic equation are called characteristic roots. Theorem 9.1 Given the differential equation y  + a1 y  + a0 y = 0

(9.16)

We consider the roots of characteristic equation: λ2 + a 1 λ + a 0 = 0 1. If r1 = r2 , let define ϕ1 (x) = er1 x , ϕ2 (x) = er2 x . 2. If r1 = r2 = r , let define ϕ1 (x) = er x , ϕ2 (x) = xer x .

(9.17)

138

9 Differential Equations

then in any case, the functions ϕ1 (x) and ϕ2 (x) are solutions. In addition, any solution ϕ(x) of differential equation is of the form: y(x) = c1 ϕ1 (x) + c2 ϕ2 (x) where c1 , c2 are constants. Proof We suppose that the roots of equation λ2 + a1 λ + a0 = 0 are equals (r1 = r2 = r ). ϕ2 (x) = xer x is a solution of (9.16), i.e., ϕ2 (x) = (r x + 1)er x , ϕ2 (x) = rx rx rx rx r (r x + 2)e  2. Substituting in (9.16) wehave r (r x + 2)e + a1 (r x + 1)e + a0 xe rx then e x(r + a1r + a0 ) + 2r + a1 . Since r is a root, this yields to er x [2r + a1 ]

(9.18)

1 Since r = r1 = r2 , r = − a1 . Substituting in (9.18), we have 2

   1 er x 2 − a1 + a1 = 0 2 Hence ϕ2 is a solution of (9.16). If ϕ1 and ϕ2 are solutions of (9.16) then ϕ = c1 ϕ1 + c2 ϕ2 is a solution of (9.16), since      c1 ϕ1 + c2 ϕ2 + a1 c1 ϕ1 + c2 ϕ2 + a0 (c1 ϕ1 + c2 ϕ2 )     = c1 ϕ1 + a1 ϕ1 + a0 ϕ1 + c2 ϕ2 + a1 ϕ2 + a0 ϕ2 = 0 Thus the linear combination ϕ = c1 ϕ1 + c2 ϕ2 is a solution of (9.16). Example 9.11 1. Let

y  − 4y  + 4y = 0

 (9.19)

Its characteristic equation is given by λ2 − 4λ + 4 = (λ − 2)2 = 0 and its roots are given by λ = 2 with multiplicity m = 2. Therefore the solutions (according to Theorem 9.1) are given by ϕ1 (x) = e2x , ϕ2 (x) = xe2x and the complete solution for (9.19) is y(x) = c1 ϕ1 (x) + c2 ϕ2 (x) = c1 e2x + c2 xe2x

9.7 Homogeneous Second Order Linear Differential Equations

139

2. The differential equation given by y  − 3y  + 2y = 0 has a characteristic equation λ2 − 3λ + 2 = (λ − 1)(λ − 2) = 0 with roots λ1 = 1 and λ2 = 2. Thus the functions y1 = e x and y2 = e2x are solutions of the linear homogeneous equation. Exercise 9.3 Solve the following differential equations. 1. 2. 3. 4.

y  − 4y  + 5y = 0 y  + 8y = c, c ∈ R 2y  + 2y  + 3y = 0 The derivative of any solution of y  + a1 y  + a0 y = 0 is a solution of the equation as well.

9.8 Variation of Parameters Method Given the non-homogeneous differential equation of second order with constants coefficients: (9.20) y  + a1 y  + a0 y = b(x) if b(x) = 0, then (9.20) is a homogeneous equation. The solution of (9.20) is given in the following result. Theorem 9.2 If ϕ1 (x) and ϕ2 (x) are solutions of the homogeneous equation associate to (9.20) and ϕ p (x) is any solution of (9.20), i.e., a particular solution, then the general solution of (9.20) is: y(x) = c1 ϕ1 (x) + c2 ϕ2 (x) + ϕ p (x) where c1 , c2 are constants. The methodology to solve (9.20) is described as follows. The constants c1 and c2 are replaced by unknown functions μ1 (x) and μ2 (x) such that y = μ1 ϕ1 + μ2 ϕ2

(9.21)

is a solution of (9.20). Therefore, the main objective is to find μ1 (x) and μ2 (x) such that (9.21) is a solution of the non-homogeneous equation (9.20). From (9.21), we have y  = μ1 ϕ1 + μ1 ϕ1 + μ2 ϕ2 + μ2 ϕ2 y  = 2μ1 ϕ1 + 2μ2 ϕ2 + μ1 ϕ1 + μ1 ϕ1 + μ2 ϕ2 + μ2 ϕ2 Substituting the above equations in (9.20), we have μ1 ϕ1 + a1 μ1 ϕ1 + a0 μ1 ϕ1 = 0 μ2 ϕ2 + a1 μ2 ϕ2 + a0 μ2 ϕ2 = 0

140

9 Differential Equations

since ϕ1 and ϕ2 are solutions of the homogeneous equation, then: 2μ1 ϕ1 + 2μ2 ϕ2 + μ1 ϕ1 + ϕ2 μ2 + a1 μ1 ϕ1 + a1 μ2 ϕ2 = b(x)

(9.22)

Furthermore, if ϕ1 μ1 + ϕ2 μ2 = 0, then its derivative is ϕ1 μ1 + ϕ1 μ1 + ϕ2 μ2 + = 0, finally from (9.22) we have

ϕ2 μ2

μ1 ϕ1 + μ2 ϕ2 = b(x) Note that we have to solve a system of equations given by: 

ϕ1 μ1 + ϕ2 μ2 = 0

(9.23)

ϕ1 μ1 + ϕ2 μ2 = b(x) where μ1 , μ2 are the variables to determine.

Definition 9.8 Let the system (9.23), the Wronskian (Wronsky determinant) is defined by:   ϕ ϕ  (9.24) W (ϕ1 , ϕ2 ) =  1 2  = ϕ1 ϕ2 − ϕ2 ϕ1 = 0 ϕ1 ϕ2 Finally, the solution of (9.23) is:    0 ϕ2    b(x) ϕ   −b(x)ϕ2 2  = μ1 = W (ϕ1 , ϕ2 ) W (ϕ1 , ϕ2 )   ϕ1 0     ϕ b(x) b(x)ϕ1 1  μ2 = = W (ϕ1 , ϕ2 ) W (ϕ1 , ϕ2 )

(9.25)

(9.26)

and the functions μ1 , μ2 are given by the integration from (9.25) and (9.26)  μ1 = −

b(x)ϕ2 , μ2 = W (ϕ1 , ϕ2 )



b(x)ϕ1 W (ϕ1 , ϕ2 )

(9.27) 

Example 9.12 Let

y  − 2y  + y = 2x

Note that the characteristic equation associate to the homogeneous differential equation is λ2 − 2λ + 1 = 0. The root of this equation is λ = 1 with multiplicity two, then ϕ2 (x) = xex ϕ1 (x) = ex  x ϕ1 (x) = e ϕ2 (x) = xex + ex

9.8 Variation of Parameters Method

141

and the Wronskian for this problem is given by   x e   xe x  = e x xe x + e x − xe x e x = xe2x + e2x − xe2x = e2x  = 0 W (ϕ1 , ϕ2 ) =  x x x  e xe + e

According to the methodology described above, we have 

  2  2x xex 2 −x −x x dx = −2 x e dx = 2e + 2x + 2 e2x   2x ex μ2 = dx = 2 xe−x dx = −2e−x (x + 1) e2x

μ1 = −

Finally ϕ(x) = μ1 ϕ1 + μ2 ϕ2 = 2x + 4 Exercise 9.4 Solve the following differential equations. 1. 2. 3. 4.

y  + 4y = tan 2x y  + 2y  + y = e−x ln x y  − 2y  − 3y = xe−x y  + 2y  + 5y = e−x sec 2x

9.9 Initial Value Problem Let x0 ∈ R, α, β ∈ R. To find a solution ϕ of the Eq. (9.15) such that ϕ(x0 ) = α , ϕ  (x0 ) = β any solution that satisfies the conditions, can be written as y = c1 ϕ1 + c2 ϕ2 Theorem 9.3 Let x0 , α, β ∈ R, there exist unique constants c1 , c2 such that the function ϕ(x) = c1 ϕ1 (x) + c2 ϕ2 (x), ϕ(x0 ) = α, ϕ  (x0 ) = β where ϕ1 , ϕ2 are given by 1. ϕ1 (x) = er1 x , ϕ2 (x) = er2 x if r1 = r2 . 2. ϕ1 (x) = er x , ϕ2 (x) = xer x if r1 = r2 = r . Proof Let c1 and c2 such that the following system

142

9 Differential Equations

c1 ϕ1 (x0 ) + c2 ϕ2 (x0 ) = α = ϕ(x0 )



c1 ϕ1 (x0 ) + c2 ϕ2 (x0 ) = β = ϕ  (x0 )

(9.28)

has a unique solution for cases 1 and 2. This condition is satisfied by   ϕ (x ) ϕ (x ) W (ϕ1 , ϕ2 ) =  1 0 2 0  = ϕ1 (x0 )ϕ2 (x0 ) − ϕ2 (x0 )ϕ1 (x0 ) = 0 ϕ1 (x0 ) ϕ2 (x0 ) For the case 1, the Wroskian is given by   rx  e1 er2 x   = er1 x r2 er2 x − er2 x r1 er1 x = (r2 − r1 ) e(r1 +r2 )x = 0 W (ϕ1 , ϕ2 ) =  r1 x r1 e r2 er2 x  and for the case 2, the Wroskian is given by   rx  e xer x  = e2r x = 0  W (ϕ1 , ϕ2 ) =  r x r e (1 + r x)er x  therefore the solution is unique and there exist constants c1 , c2 such that ϕ = c1 ϕ1 (x) + c2 ϕ2 (x)



Proposition 9.2 Abel’s Differential Equation Identity. If ϕ1 (x) and ϕ2 (x) are solutions of: (9.29) y  + a1 y  + a0 y = 0 then there exist A such that

W (ϕ1 , ϕ2 ) = Ae−a1 x

(9.30)

Proof By hypothesis ϕ1 (x) and ϕ2 (x) are solutions of (9.29), i.e. ϕ1 + a1 ϕ1 + a0 ϕ1 = 0 , ϕ2 + a1 ϕ2 + a0 ϕ2 = 0 so that

then

    ϕ1 ϕ2 + a1 ϕ2 + a0 ϕ2 − ϕ2 ϕ1 + a1 ϕ1 + a0 ϕ1 = 0     ϕ1 ϕ2 − ϕ2 ϕ1 + a1 ϕ1 ϕ2 − ϕ1 ϕ2 = 0

On the other hand, we have   ϕ1 ϕ2   W (ϕ1 , ϕ2 ) =     = ϕ1 ϕ2 − ϕ2 ϕ1 ϕ1 ϕ2 and its derivative is given by

(9.31)

9.9 Initial Value Problem

143

d W (ϕ1 , ϕ2 ) = ϕ1 ϕ2 − ϕ2 ϕ1 dx Note that (9.31) can be written as d W (ϕ1 , ϕ2 ) + a1 W (ϕ1 , ϕ2 ) = 0 dx

(9.32)

or equivalently, we have 

dW (ϕ1 , ϕ2 ) = −a1 W (ϕ1 , ϕ2 )

 dx

this yields to ln W = −a1 x + c with c the integration constant. Finally, the solution of (9.32) is W (ϕ1 , ϕ2 ) = Ae−a1 x where A = ec .



Remark 9.2 If the roots of characteristic equation λ2 + a1 λ + a0 = 0 of (9.29) r1 , r2 ∈ C given by r1 = a + bi , r2 = a − bi, a, b ∈ R, b = 0 then the general solution of differential equation (9.29) is ϕ( x) = c1 er1 x + c2 er2 x = c1 e(a+bi)x + c2 e(a−bi)x   = eax c1 eibx + c2 e−ibx = eax [c1 (cos bx + i sin bx) + c2 (cos bx − i sin bx)] = eax [(c1 + c2 ) cos bx + (c1 − c2 )i sin bx] = Aeax cos bx + Beax sin bx, A = (c1 + c2 ), B = i(c1 − c2 ) In particular we have the following solutions ϕ1 (x) = eax cos bx , ϕ2 (x) = eax sin bx and W (ϕ1 , ϕ2 ) = be2ax . Example 9.13 Given the differential equation y  + k 2 y = cos ωx

144

9 Differential Equations

we have λ2 + k 2 = 0, whose roots are λ1,2 = ±ki. According to the above remark, ϕ1 (x) = ekxi = cos kx + i sin kx ϕ2 (x) = e−kxi = cos kx − i sin kx then ϕ(x) = A cos kx + B sin kx,

A = c1 + c2 , B = i(c1 − c2 )

We define ϕ¯1 = cos kx, ϕ¯2 = sin kx then W (ϕ¯1 , ϕ¯2 ) = k = 0, that implies that μ1 = 1 1 − cos (ωx) sin(kx), μ2 = cos (ωx) cos(kx). Finally the solution is given by ϕ = k k μ1 Aϕ¯1 + μ2 B ϕ¯2 . It is left as an exercise to obtain μ1 , μ2 . Definition 9.9 Let ϕ1 (x), ϕ2 (x) two solutions of the homogeneous equation y  + a1 y  + a0 y = 0 The solutions are said to be linearly independents (or independents) if c1 ϕ1 (x) + c2 ϕ2 (x) = 0 then c1 = c2 = 0. Proposition 9.3 Two solutions ϕ1 (x) and ϕ2 (x) are linearly independents if and only if the Wronskian is not zero, i.e. W (ϕ1 , ϕ2 ) = 0. Proof (Sufficiency) If W (ϕ1 , ϕ2 ) = 0, i.e.,   ϕ1 ϕ2   W (ϕ1 , ϕ2 ) =     = ϕ1 ϕ2 − ϕ2 ϕ1 = 0 ϕ1 ϕ2 then     0 ϕ2  ϕ1 0      0 ϕ  ϕ 0 0 0 2 1 = = 0, c2 = = =0 c1 = W (ϕ1 , ϕ2 ) W (ϕ1 , ϕ2 ) W (ϕ1 , ϕ2 ) W (ϕ1 , ϕ2 ) Hence we have the trivial solution c1 = c2 = 0, thus ϕ1 and ϕ2 are linearly independents. (Necessity) Suppose ϕ1 and ϕ2 are linearly independents. If W (ϕ1 , ϕ2 ) = 0, we have hat the determinant is equal zero thus ϕ1 and ϕ2 are linearly dependents which is a contradiction. Hence W (ϕ1 , ϕ2 ) = 0. 

9.10 Indeterminate Coefficients

145

9.10 Indeterminate Coefficients The method of indeterminate coefficients is used to find particularly solutions of the non-homogeneous linear equation of second order with constants coefficients. Before reviewing the methodology, some examples of motivation are presented. Example 9.14 1. The function y = 5 is a particular solution of the differential equation y  + 2y  + y = 5. 2. For the differential equation y  + y  − y = 3x, the function y = −3x − 3 is a particular solution since y  = −3 and y  = 0. 3. If ϕ(x) = ax + b satisfies the differential equation y  + 5y  + 2y = 2x, then ϕ  (x) = a, ϕ  (x) = 0 so that 5a + 2(ax + b) = 2x. It is not hard to see that 5 a = 1, b = − . Therefore ϕ(x) = x − 5/2. 2 4. Let y  + 6y = 2x 2 + x − 3 a given differential equation (DE). The function ϕ(x) = ax 2 + bx + c is proposed as the solution for the DE. The problem consists in determine the values of a, b and c of ϕ(x). Differentiating ϕ(x) we have that ϕ  (x) = 2ax + b, ϕ  (x) = 2a then substituting in the DE: 2a + 6(ax 2 + bx + c) = 2x 2 + x − 3 or equivalently 6ax 2 + 6bx + (2a + 6c) = 2x 2 + x − 3 It is said that two polynomials of degree n are equal if the coefficients of the monomials of equal degree are equal, then 6a = 2, 6b = 1 and 2a + 6c = −3, therefore 1 1 11 ϕ(x) = − x 2 + x + 18 6 3 Exercise 9.5 For the differential equation y  + 2y  + y = sin x, the general form of the solution is given by ϕ(x) = A sin x + B cos x. According to the methodology of the examples, what are the values of A and B for the proposed solution ϕ(x)? The examples provide information that leads to the general method to solve the following differential equation: y  + a1 y  + a0 y = b(x)

(9.33)

If b(x) = an x n + an−1 x n−1 + · · · + a1 x + a0 , then a polynomial of the same degree is proposed as the solution and the coefficients must be determined: ϕ p (x) = αn x n + αn−1 x n−1 + · · · + α1 x + α0 On the other hand, if b(x) = A sin ωx + B cos ωx, the proposed solution is given by

146

9 Differential Equations

ϕ p (x) = A sin ωx + B cos ωx even if any of the coefficients A or B are zero, i.e, ϕ p (x) = A sin ωx or ϕ p (x) = B cos ωx. In the same manner, if b(x) = aekx , then the proposed solution is given by ϕ p (x) = kx Ae . To determine the value of A, note that ϕ  (x) = Akekx , ϕ  (x) = Ak 2 ekx then substituting in the DE: Ak 2 ekx + a1 Akekx + a0 Aekx = aekx so that: A=

k2

a + a1 k + a0

Remark 9.3 Consider the Eq. (9.33). To apply the method of indeterminate coefficients, b(x) must satisfy the following. There exist b1 , b2 , . . . , bn such that any derivative b(k) (x) must be able to be written as a linear combination of them. To verify this fact, it is necessary to derive b(x) a sufficiently large number of times and observing that after a certain order of the derivative, all the terms that appear are repeated. Exercise 9.6 Find a solution to the following differential equations. 1. y  + 3y = e2x sin x. Hint: ϕ(x) = Ae2x sin x + Be2x cos x. 2. y  + 3y  = xex + x 2 cos x

9.11 Solution of Differential Equations by Means of Power Series 9.11.1 Some Criterions of Convergence of Series ∞ Let a sequence {an }∞ n=1 , the serie with general term an is the sequence {sn }n=1 defined by

s1 = a 1 s2 = a 1 + a 2 s3 = a 1 + a 2 + a 3 .. . sn = a 1 + a 2 + · · · + a n n  sn = ak k=1

9.11 Solution of Differential Equations by Means of Power Series



i.e. {sn }∞ n=1

=

n 

147

∞ ak

k=1

Remark 9.4 This sequence will be denoted by

n=1



an . If the index of an begins in

n 0 , i.e. {an }∞ n=n 0 , then the sequence will be denoted by 

an

n=n 0

 It is said that an converges if the sequence s1 = a1 , s2 = a1 + a2 , s3 = a1 + a2 + a3 , . . . , (s1 , s2 , s3 , . . . are so called partial sums) converges. In this case, when a series an converges, the limit is denoted by ∞ 

an or

∞ 

an

n=n 0

n=1

Definition 9.10 A sequence (sn ) is said to be a Cauchy sequence if for every positive real number ε, there exists a positive integer N such that for all natural numbers m, n > N |sn − sm | < ε This condition is usually written as limm,n→∞ |sm − sn | = 0. Theorem 9.4 A sequence (Sn ) converges if and only if Sn is a Cauchy sequence.  Another form of the previous result is the following: A sequence (Sn ) converges if and only if for every ε > 0 there exists a n 0 such that n ≥ n 0 and p ≥ 1: |sn+ p − sn | ≤ ε In particular if (Sn ) is a



an then the convergence criterion of a series is:



an

converges if and only if for every ε > 0 there exists a n 0 such that n ≥ n 0 and p ≥ 1 then:    p    a n+k  < ε  k=1

Next the criterion for the convergence of a series is explained:  Theorem 9.5 D’Alembert criterion. Let suppose a series an , with an = 0 ∀ n such that exists the limit

148

9 Differential Equations

lim

n→∞

an+1 =α an

then 1. If α < 1, the series converges. 2. If α > 1, the series diverges. 3. None of the conclusions can be extended to the equality case α = 1 (the series may converge or diverge).  Example 9.15 1. Let the series

∞  1 , then n n=1

1 n+1 n→∞ 1 n

α = lim

n =1 n→∞ n + 1

= lim

1 with N = N0 , P = 2N0 (see However the series does not converge for ε = 10 Cauchy criterion): 2N0  1 1 N0 1 1 1 1 > = > = + + ··· + 10 n=N +1 n N0 + 1 N0 + 2 N0 + N0 2N0 2 0

which is a contradiction, hence the series diverges. ∞  1 , then 2. Let the series n(n + 1) n=1 α = lim

n→∞

n =1 n+2

On the other hand note that:  n   1 1 1 = lim − n→∞ n(n + 1) k k+1 n=1 k=1       

 1 1 1 1 1 1 1 + − + − + ··· + − 1− = lim n→∞ 2 2 3 3 4 n n+1   1 = lim 1 − n→∞ n+1 =1 ∞ 

Hence, the series converges.

9.11 Solution of Differential Equations by Means of Power Series

149

Theorem 9.6 Let {an } a sequence, x0 ∈ R ∀ x ∈ R. Let the power  series  n an an (x − x0 ) , then there exists R, 0 ≤ R < ∞ such that the series (x − x0 )n : 1. 2. 3. 4.

Converges if | x − x0 |< R (x0 − R < x < x0 + R). Diverges if | x − x0 |> R (x0 − R > x > x0 + R). If R = 0, the series converges only in x = x0 . If R = ∞, the series converges for all x.



Remark 9.5 Using the Cauchy criterion, we have        a  a  (x−x0 )n+1  α = lim  an+1  = lim  n+1  | x − x0 |=| x − x0 | lim  n+1  =| x − x0 | K n a a n n a (x−x ) n 0 n→∞ n→∞ n→∞    K

The series converges if | x − x0 | K < 1, that is, if | x − x0 |< words, the series converges if | x − x0 |< R, where R=

1 K

= R. In other

   n   an+1  = lim  aan+1   n→∞ limn→∞  1

an

Example 9.16 1. The series



1 n x n!

converges for all x. Indeed:

1 n x n! 1 n→∞ x n+1 (n+1)!

R = lim

2. The series



= lim

n→∞

xn = 1 + x + x2 + · · · + xn =

since

1 (n + 1) = ∞ x 1 − x n+1 converges for all x, 1 − xn

   n     an  R = lim  an+1  = lim  xxn+1  = lim  x1  = ∞ if and only if | x |< 1 n→∞

n→∞

n→∞

9.11.2 Solution of First and Second Order Differential Equations Example 9.17 Solve the following differential equations using power series. 1. Let y  − y = 0, then according to the main result of this section y(x) =

∞  n=0

that converges for |x| < R, R > 0 and

an x n

150

9 Differential Equations

y  (x) =

∞ 

an nx n−1

n=1

On the other hand, substituting y and y  in the differential equation: y − y =

∞ 

an nx n−1 −

n=1

∞ 

an x n =

n=0

∞    an n − an−1 x n−1 = 0

(9.34)

n=1

Since (9.34) is equal to zero, then the coefficient of each power of x is equal to zero, that is to say, an n − an−1 = 0, n = 1, 2, 3, . . . (9.35) The above recurrence relation determines an and since n = 0: an =

an−1 , n = 1, 2, 3, . . . n

then: a1 = a0 a1 = a2 = 2 a2 = a3 = 3 .. . 1 an = a0 n!

1 a0 2! 1 a0 3!

Finally, substituting the obtained coefficients in the proposed solution: y=

∞ ∞   1 xn a0 x n = a0 = a0 e x n! n! n=0 n=0

2. For the differential equation (1 + x)y  = P y, y(0) = 1, let y(x) =

∞  n=0

that converges for |x| < R, R > 0 and

an x n

(9.36)

9.11 Solution of Differential Equations by Means of Power Series

y  (x) =

∞ 

151

an nx n−1

n=1

Substituting in the differential equation we have:

(1 + x)

(1 + x)y  = P y ∞  n−1 an nx =P an x n

∞  n=1

∞ 

an n (1 + x) x n−1 =

n=1 ∞ 

  an n x n−1 + x n =

∞ 

Pan x n

n=0

an nx n−1 +

n=1

Pan x n

n=0

n=1 ∞ 

n=0 ∞ 

∞  n=1

an nx n = Pa0 +

∞ 

Pan x n

n=1

∞ ∞     n an+1 (n + 1) + an n x = Pa0 + a1 + Pan x n n=1

n=1

then a1 = a0 P (coefficients of x 0 in both sides) and an+1 (n + 1) + an n = Pan for n ∈ N. Since y(0) = 1, according to the proposed solution, a1 = P. The other coefficients are determined with the recurrence relation: an+1 =

an (P − n) , n = 1, 2, 3, . . . n+1

(9.37)

For some positive integers, we have: (P − 1)P a1 (P − 1) = , n=1 2 2 P − 2 (P − 1)P a2 (P − 2) a3 = = · , n=2 3 3 2 P − 3 P − 2 (P − 1)P a3 (P − 3) a4 = = · · , n=3 4 4 3 2 . . . P − k P − (k − 1) P − (k − 2) P(P − 1)(P − 2) · · · (P − k) ak (P − k) = · · ··· P = ak+1 = k+1 k+1 k k−1 (k + 1)! a2 =

then

152

9 Differential Equations

(n + 1) an+1 P −n (n + 1) P(P − 1)(P − 2) · · · (P − n) = P −n (n + 1)! P(P − 1)(P − 2) · · · (P − n + 1) = n! P! = n!(P − n)!   P an = n an =

hence y=

∞    P n=0

y  − 2x y  − 2y = 0,

n

xn

y(0) = a0 ,

Let y(x) = a0 + a1 x + a2 x 2 + · · · =

y  (0) = a1 ∞ 

(9.38)

an x n

n=0

and its derivatives sequential given by y  (x) = a1 + 2a2 x + 3a3 x 2 + · · · + =

∞ 

nan x n−1

n=1

y  (x) = 2a2 + 6a3 x + · · · + =

∞ 

n(n − 1)an x n−2

n=2

Substituting in (9.38): ∞  n=2

n(n − 1)an x n−2 − 2x

∞ 

nan x n−1 − 2

n=1

∞ 

an x n

n=0

Exercise 9.7 Solve the following differential equations: 1. y  = 2x y 2. y  + y = 1 Example 9.18 Consider the differential equation of second order: y  − 2x y  − 2y = 0, y(0) = a0 , y  (0) = a1 Let

(9.39)

9.11 Solution of Differential Equations by Means of Power Series

y(x) = a0 + a1 x + a2 x 2 + · · · =

153

∞ 

an x n

n=0

whose first and second derivatives are given by y  (x) = a1 + 2a2 x + 3a3 x 2 + · · · =

∞ 

nan x n−1

n=1

y  (x) = 2a2 + 6a3 x + · · · =

∞ 

n(n − 1)an x n−2

n=2

Substituting the previous expressions in the Eq. (9.39): ∞ 

n(n − 1)an x n−2 − 2x

n=2

since

∞ 

nan x n−1 − 2

n=1

∞ 

n(n − 1)an x n−2 =

n=2

∞ 

∞ 

an x n = 0

n=0

(n + 2)(n + 1)an+2 x n

n=0

this yields to ∞ 

(n + 2)(n + 1)an+2 x − 2 n

n=0

∞ 

nan x − 2

n=0

n

∞ 

an x n = 0.

n=0

We obtain (n + 2)(n + 1)an+2 − 2nan − 2an = 0, so that an+2 =

2an . n+2

To find two solutions of Eq. (9.39), two different values for a0 and a1 are chosen. For example, a0 = 1, a1 = 0 or a0 = 0, a1 = 1. For the first case, the odd coefficients a1 , a3 , a5 are zero because a3 =

2a1 2a3 = 0, a5 = = 0, . . . 1+2 5

On the other hand, the even coefficients are determined by a2 = a0 = 1, a4 = then one solution is:

1 1 2a2 2a4 1 = , a6 = = , . . . , a2n = 4 2 6 2·3 n!

154

9 Differential Equations

y1 (x) = 1 + x + x 2 +

x6 x4 2 + + · · · = ex . 2! 3!

For the second case, a0 = 0, a1 = 1, the even coefficients are zero and the odd coefficients are given by: a3 =

2a1 2a3 2a 2n 22 222 , a5 = = , a7 = 5 = , . . . , a2n+1 = 3 5 53 7 753 3 · 5 · 7 · · · (2n + 1)

then, other solution is ∞

 2n x 2n+1 2 22 5 y2 (x) = x + x 3 + x + ··· = 3 3·5 3 · 5 · 7 · · · (2n + 1) n=0 Exercise 9.8 Solve the following differential equations: 1. y  + y = 0 2. y  = x − y, y(0) = 0

9.12 Picard’s Method Let the initial value problem

y  = f (t, y)

(9.40)

then, from (9.40) can be written in a special form, integrating both sides of equation respect to t. Namely, if y(t) satisfies (9.40), we have:  t0

t



dy(s) = ds

f (s, y(s))

t0



then

t

y(t) = y0 +

t

f (s, y(s)) ds

(9.41)

t0

Reciprocally, if y(t) is a continuous function and satisfies (9.41), then dy = dt f (t, y(t)). Therefore, y(t) is a solution of (9.40), if and only if, it is a continuous solution of (9.41). This suggests the following method to obtain a succession of approximate solutions yn (t) from (9.41). The Eq. (9.41) is an integral equation and it is in the special form:  t

y0 +

f (s, y(s)) ds

t0

The method begins by proposing a tentative solution y0 (t) of (9.41). The easiest choice is y0 (t) = y0 . To check if y0 (t) is a solution of (9.41), is calculated

9.12 Picard’s Method

155

Fig. 9.4 Rectangular domain



t

y1 (t) = y0 +

f (s, y0 (s)) ds

t0

Now, the y1 (t) as the next option is used. If y1 (t) = y0 , then y(t) = y0 . Otherwise, y1 as the following approximation is used. To verify that y1 (t) is a solution of (9.41), it is calculated  t y2 (t) = y0 +

f (s, y1 (s)) ds

t0

In this form, we can define a sequence of functions y1 , y2 , . . ., where  yn+1 (t) = y0 +

t

f (s, yn (s)) ds

t0

The functions yn (t) are called successive approximations or Picard’s iterations (Fig. 9.4). Example 9.19 To calculate the Picard’s iterations for the following initial value problem and verify that the sequences converge. Let y  = y = f (t, y(t)) , y(0) = 1 The integral equation is  y1 (t) = y0 +

t

 f (s, y0 (s)) ds = 1 +

t0

t

1 ds = 1 + t

0

so that  y2 (t) = y0 + 0

t



t

f (s, y1 (s)) ds = 1 + 0

(1 + s) ds = 1 + t +

t2 2

156

9 Differential Equations

In general, we have  yn (t) = y0 +

t

f (s, yn−1 (s)) ds = 1 +

0

= 1+t + =

∞ n  t n=0 t

 t 0

s n−1 1 + s + ··· + (n − 1)!

 ds

t2 tn + ··· + 2! n!

n!

yn (t) = e

Example 9.20 For the following initial value problem, y = 1 + y3,

y0 (1) = 1

the corresponding integral equations is: 

t

y(t) = 1 +

(1 + y 3 (s))ds

1

For y0 (t) = 1, we have f (1, y0 (1)) = 1 + 13 = 2, then:  y1 (t) = 1 +

t

2ds = 1 + 2(t − 1)

1

and for y2 (t), in the same manner: 

t

y2 (t) = 1 + 

  1 + 2(s − 1)3 ds

1

= 1 + 2s + 3(s − 1)2 + 4(s − 1)3 + 2(s − 1)4

t 1

= 1 + 2(t − 1) + 3(t − 1)2 + 4(t − 1)3 ∞  y2 (t) = K (t − 1) K −1 k=1

9.13 Convergence of Picard’s Iterations Lemma 9.1 Let a, b positive numbers, let the rectangle R: t0 ≤ t ≤  t0 + a, |y − y0 | ≤ b. Let M = max| f (t, y)| where (t, y) ∈ R and α = min a, Mb then |yn (t) − y0 | ≤ M(t − t0 ) for t0 ≤ t ≤ t0 + α. Proof (From Lemma 9.1). It is fulfilled for n = 0, y0 (t) = y0 . For n = j:

9.13 Convergence of Picard’s Iterations

  |y j+1 (t) − y0 | = 

t

157

 f |s, y j (s)|ds ≤

t0

t0

t

  | f (s, y j )|ds  ≤ M(t − t0 )

where M = max | f (t, y)| (t,y)∈R

Now it is possible to prove that Picard’s iterations yn (t) converge for all t in the interval t0 ≤ t ≤ t0 + α. If ∂∂ yf exists and it is continuous:   yn (t) = y0 (t) + [y1 (t) − y0 ] + [y2 − y1 ] + [y3 − y2 ] + · · · + yn − yn−1 It is clear that sequence yn (t) converges if and only if the series converge: |yn − y0 | = |y1 − y0 (t) − . . . − yn (t) − yn−1 (t)| that is:

∞  |yn (t) − yn−1 (t)| < ∞ n=1

Note that  t   t   |yn (t) − yn−1 (t)| =  | f (s, yn−1 (s)) − f (s, yn−2 (s))|ds ( f (s, yn−1 (s)) − f (s, yn−2 (s))) ds  ≤ t0 t0  t (s)) = | ∂ f (s,ζ ||yn−1 (s) − yn−2 (s)|ds ∂y t0

where ζ (s) is between yn−1 (s) and yn−2 (s). Therefore 

t

|yn (t) − yn−1 (t)| ≤ L

|yn−1 (s) − yn−2 (s)|ds

t0

for t0 ≤ t ≤ t0 + α and | L = max | ∂ f ∂(t,y) y



(t,y)∈R

Remark 9.6 If y ≤ y0 + b then y0 + M(t − t0 ) ≤ y0 + b. We obtain a ≤

b . M

Definition 9.11 A function f(x,y) defined in a domain D it is said to satisfy the Lipschitz condition with respect to y if for every x, y1 , y2 such that (x, y1 ), (x, y2 ) ∈ D: | f (x, y1 ) − f (x, y2 )| ≤ K |y1 − y2 | where K is called the Lipschitz constant.

158

9 Differential Equations

Lemma 9.2 Let f (x, y) with partial derivative ∂∂ yf (x, y) bounded for every (x, y) ∈ D, D is a convex set, then f (x, y) satisfies the Lipschitz condition with constant K such that    ∂ f (x, y)     ∂ y  ≤ K , (x, y) ∈ D Proof Using the Mean Value Theorem:      ∂ f (x, ξ )   ∂ f (x, ξ )  |y2 − y1 | ≤ K |y2 − y1 | | f (x, y2 ) − f (x, y1 )| =  (y2 − y1 ) ≤  ∂y ∂y  where    ∂ f (x, ξ )    K = max  (x,ξ )∈D ∂y 



Theorem 9.7 (Theorem of approximated solutions) Let (x0 , y0 ) ∈ R, f (x, y) continuous function that satisfies the Lipschitz conditions with constant K . Let y1 (x), y2 (x) approximated solutions of the differential equation y  = f (x, y) defined for |x − x0 | ≤ h with errors ε1 , ε2 . Define P(x) = y1 (x) − y2 (x) and ε = ε1 + ε2 , then: |P(x)| ≤ e K |x−x0 | |P(x0 )| +

 ε  K |x−x0 | e −1 K

Proof The inequality |x − x0 | ≤ h defines the interval [x0 − h, x0 + h] (Without lost of generality, the proof is only for the interval x0 ≤ x ≤ x0 + h). Since the solutions y1 (x) and y2 (x) satisfy: |

dy1 dy2 − f (x, y1 )| ≤ ε1 , | − f (x, y2 )| ≤ ε2 dx dx

then    dy1 dy1 dy2  dy2   d x − d x  = | d x − d x + f (x, y1 ) − f (x, y1 ) + f (x, y2 ) − f (x, y2 )|      dy1  dy2 =  − f (x, y1 ) − + f (x, y2 ) + [ f (x, y1 ) − f (x, y2 )]  dx dx ≤ ε1 + ε2 + | f (x, y1 ) − f (x, y2 )| ≤ ε1 + ε2 + K |y1 − y2 | = ε + K |y1 − y2 | On the other hand, since P(x) = y1 (x) − y2 (x) then P  (x) = |P  (x)| ≤ ε + K |P(x)|

dy1 dx

+

dy2 , dx

so that

9.13 Convergence of Picard’s Iterations

159

Now consider the following three cases: 1. P(x) = 0, x0 ≤ x ≤ x0 + h. For P(x) > 0:   d P(x)  d P(x)  ≤ K P(x) + ε ≤ dx dx  P  (x) P (x) − K P(x)    −K x P (x) − K P(x) e  −K x  e P(x) 

≤ K P(x) + ε ≤ε ≤ e−K x ε ≤ e−K x ε

Integrating both sides: 

x x0

 −K x  e P(x) d x ≤



x

e−K x εd x

x0

x  ε  −K x0 e e−K x P(x)x0 ≤ − e−K x K  ε  −K x0 −K x −K x0 e e P(x) − e P(x0 ) ≤ − e−K x K  ε  K (x−x0 ) e P(x) ≤ e K (x−x0 ) P(x0 ) + −1 K 2. For P(x) = 0. In this case |P(x)| = 0. This value is bounded for every positive constant. 3. P(x) < 0. The proof is left to the reader as an exercise.  If D is a connected1 region in R, we have the following existence theorem for solutions of linear differential equations. Theorem 9.8 Let D ⊂ R be simply connected and B and C holomorphic2 on D. Let a homogeneous linear differential equation: d2x dx + C(z)x = 0 + B(z) 2 dz dz

(9.42)

with initial conditions at z = z 0 ∈ D given by x(z 0 ) = θ dx (z 0 ) = ε dz 1 Roughly

(9.43)

speaking, a region D is connected if for an arbitrary pair of points P and Q, we can always draw a curve in D. 2 A function F defined on D (open, connected subset) is holomorphic if F(·) is differentiable and F  (·) is continuous.

160

9 Differential Equations

has a unique holomorphic solution x on D satisfying both (9.42) and (9.43). The proof can be found in [8]. Since (9.42) is a homogeneous equation, we state that the set V of functions that satisfy the above differential equation is a vector space n  over C. Let x1 , . . . , xn ∈ V and λ1 , . . . , λn ∈ C then we have λi xi ∈ V . To see i=1

this, note that xi ∈ V if and only if

d 2 xi d xi + C(z)xi = 0 + B(z) 2 dz dz

Multiplying by λi and add, then d2 dz 2

Thus,

n 

 n  i=1

 λi xi

d + B(z) dz

 n 

 λi xi

+ C(z)

i=1

 n 

 λi xi

=0

i=1

λi xi is also a solution of (9.42). Since the differential equation (9.42) has

i=1

order 2, the dimension of V over C is 2, i.e., dim V = 2. This can be see as follows. We define a map: ϕ : V −→ C2   x(z 0 ) = θ x → ϕ(x) = d x (z ) = ε dz 0 Clearly ϕ is a linear map from V to C2 . By the existence theorem above, if (θ, ε) ∈ C2 is an arbitrary vector, there is a unique solution x ∈ V such that 

   x(z 0 ) θ = dx (z ) ε dz 0

Therefore, ϕ is bijective. Thus V ∼ = C2 (V is isomorphic to C2 ) and dim V = 2.  Example 9.21 An interesting problem in control theory, especially in linear time invariant systems the state-model takes the following form (see Chap. 7, Theorem 7.2): x˙ = Ax(t) + Bu(t) (9.44) that represents an nonhomogeneous differential equation. We obtain the matrix solution of (9.44) applying the integrating factor method. The integrating factor is chosen as e−At . Multiplying both sides of (9.44) by the integrating factor and rearranging terms:

9.13 Convergence of Picard’s Iterations

161

 −At    e x˙ = e−At (Ax(t) + Bu(t)) xe ˙ −At Note that

d dt



xe ˙ −At = Ax(t)e−At + Bu(t)e−At − Ax(t)e−At = Bu(t)e−At

 e−At x(t) = xe ˙ −At − Ax(t)e−At , then:  d  −At e x(t) = e−At Bu(t) dt

(9.45)

Integrating both sides of above equation:  t  t   d e−Aτ x(τ ) = e−Aτ Bu(τ )dτ 0 0 t  t  −Aτ e x(τ ) = e−Aτ Bu(τ )dτ 0

e−At x(t) − e0 x(0) =



0

t

e−Aτ Bu(τ )dτ  t e−Aτ Bu(τ )dτ e−At x(t) = x(0) + 0

0

since e0 = I . Finally, multiplying both sides by the inverse of e−At , i.e. e At the solution of (9.44) is given by:  x(t) = x0 e At +

t

e A(t−τ ) Bu(τ )dτ

(9.46)

0

where x0 = x(0).

References 1. Boyce, W. E., DiPrima, R. C.: Differential Equations Elementary and Boundary Value Problems. Willey (1977) 2. Walter, W.: Ordinary Differential Equations. Springer, New York (1998) 3. Murphy, G.M.: Ordinary Differential Equations and their Solutions. Dover Publications, New York (2011) 4. Simmons, G.F.: Differential Equations With Applications and Historical Notes. CRC Press (2016) 5. Hartman, P.: Ordinary Differential Equations. Willey (1964) 6. Hale, J.: Ordinary Differential Equations. Dover Publications, New York (1969) 7. Tricomi, F.G.: Differential Equations. Dover Publications, New York (2012) 8. Birkhoff, G., Rota, G.C.: Ordinary Differential Equations. Wiley, New York (1989)

Chapter 10

Differential Algebra for Nonlinear Control Theory

Abstract This chapter focus is to show the application of the commutative algebra, algebraic geometry and differential algebra concepts to nonlinear control theory, it begins with necessary information of differential algebra, it continues with definitions of single-input single-output systems, invertible systems, realization and canonical forms, finally we present methods for stabilization of nonlinear systems throughout linearization by dynamical feedback and some examples of such processes.

10.1 Algebraic and Transcendental Field Extensions The notion of field was introduced in the late nineteenth century in Germany, with the purpose of treating equations while avoiding the need of complex calculations, in a way similar to nonlinear equations. Definition 10.1 Let L , K fields such that K ⊂ L. The larger field L is then said to be an extension field of K denoted by L/K [1–4]. On the other hand, consider a ∈ L. 1. a is algebraic over K if there exists a polynomial P(x) with coefficients in K such that P(a) = 0. 2. a is transcendent over K if it is not algebraic, that is, if there is no polynomial P(x) with coefficients in K such that P(a) = 0. √ For example, if K = Q, 2 ∈ R is algebraic over Q since there exists a polyno√ mial, namely p(x) = x 2 − 2, such that p( 2) = 0. If L = R, π, e are transcendental elements over Q. According to the above concepts, the extension L/K is algebraic if and only if every element of L is algebraic over K , it is said to be transcendental if and only if there exist at least one element of L such that it is transcendental over K . © Springer Nature Switzerland AG 2019 R. Martínez-Guerra et al., Algebraic and Differential Methods for Nonlinear Control Theory, Mathematical and Analytical Techniques with Applications to Engineering, https://doi.org/10.1007/978-3-030-12025-2_10

163

164

10 Differential Algebra for Nonlinear Control Theory

Now, consider the algebraic dependence. This idea is similar to linear independence of vectors in vector spaces. Definition 10.2 The elements {ai | i ∈ I } of L are said to be K -algebraically independent if there are no polynomials P(x1 , . . . , xγ ) in any finite number of variables with coefficients in K such that P(a1 , . . . , aγ ) = 0. Elements no K -algebraically independent are said to be K -algebraically dependent [5–7]. It is possible to show the existence of maximal families of elements of L that are K -algebraically independent. Two maximal families have the same cardinality (the same number of elements). Definition 10.3 Let S be a subset of L a maximal family. S is said to be a transcendence basis of L/K if it is algebraically independent over K and if furthermore L is an algebraic extension of the field K(S) (the field obtained by adjoining the elements of S to K). The cardinality of this basis is the transcendence degree of L/K denoted by tr d◦ L/K [8–10]. Example 10.1 1. The extension L/K is algebraic if and only if tr d◦ L/K = 0. 2. Let K = R. We consider the field of rational functions in the variable s, denoted by R(s): a0 + a1 s + · · · + an s n , ai , bi ∈ R f (s) = b0 + b1 s + · · · + bm s m Hence s is transcendental over K = R, then tr d◦ L/K = 1. 3. Let K = R. The field of rational fractions in a finite number α of variables s1 , . . . , s α : P(s1 , . . . , sα ) f (s1 , . . . , sα ) = Q(s1 , . . . , sα ) P, Q are polynomials with real coefficients in s1 , . . . , sα . This field is denoted by R(s1 , . . . , sα ). In this case, tr d◦ R(s1 , . . . , sα )/R = α. s1 , . . . , sα are Ralgebraically independent (they constitute a transcendence basis). These extensions are called pure transcendental extensions. Theorem 10.1 ([4]) Let K , L , M be fields such that K ⊂ L ⊂ M to be a tower of fields. Then tr d◦ M/K = tr d◦ M/L + tr d◦ L/K



Example 10.2 As an application of the above Theorem, let L/K be a field extension of transcendence degree n, i.e., tr d◦ L/K = n. Let (x1 , . . . , xn ) a transcendence basis of L/K . Considering the field K (x1 , . . . , xn ) we have K ⊂ K (x1 , . . . , xn ) ⊂ L. By the Theorem: n = tr d◦ L/K = tr d◦ L/K (x1 , . . . , xn ) + tr d◦ K (x1 , . . . , xn ) /K

10.1 Algebraic and Transcendental Field Extensions

165

According to the last part of the above example, tr d◦ K (x1 , . . . , xn ) /K = n, then tr d◦ L/K (x1 , . . . , xn ) = 0. That means that algebraic extension L/K (x1 , . . . , xn ) is algebraic.

10.2 Basic Notions of Differential Algebra The differential algebra was introduced by the American mathematician Joseph Ritt, It was introduced to provide the theory of differential equations tools similar to those given by commutative algebra for algebraic equations. Definition 10.4 Let A a commutative ring with unity. A derivation of A is an application δ of A in A such that δ(a + b) = δ(a) + δ(b) δ(ab) = aδ(b) + δ(a)b for all a, b ∈ A. Definition 10.5 A differential ring A is a commutative ring with unity provided with a finite set Δ of derivations such that for all δ1 , δ2 ∈ Δ, for all a ∈ A, δ1 δ2 a = δ2 δ1 a. Definition 10.6 A differential ring is said to be ordinary or partial if the finite set Δ contains one or more derivations, respectively. Definition 10.7 A differential field is a differential ring. Definition 10.8 A constant of a differential ring is an element a such that δ(a) = 0 for all δ ∈ Δ. Lemma 10.1 The set of constants of a differential ring (respectively a differential field) is a differential subring (respectively a differential subfield). Proof Let A be a differential ring. Let C the set of constants. C contains at least 0 and 1. 1. For all a ∈ A, a + 0 = a, then δ(a + 0) = δ(a) + δ(0) = δ(a) so that δ(0) = 0. 2. In the same manner, δ(a) = δ(a · 1) = 1 · δ(a) + a · δ(1) = δ(a) + aδ(1) then δ(1) = 0. 3. On the other hand, let c1 , c2 ∈ C, then δ(c1 + c2 ) = δ(c1 ) + δ(c2 ) = 0 and δ(c1 c2 ) = c1 δ(c2 ) + δ(c1 )c2 = 0. 4. Finally, let c a nonzero constant, then 1c is a constant. Besides for all δ ∈ Δ,   = 0.  δ 1c = − δ(c) c2

166

10 Differential Algebra for Nonlinear Control Theory

Definition 10.9 A differential field extension L/K is given by two differential fields K and L, such that: 1. K is subfield of L, (K ⊂ L). 2. The derivation of K is the restriction to K of the derivation of L. Example 10.3 R, Q, C are trivial fields of constants. Let L/K a differential field extension. There are two possible cases: 1. An element a ∈ L is said to be differentially algebraic over K if and only if there exists a differential algebraic extension with coefficients in K , i.e., there exist a polynomial P(x, x, ˙ . . . , x (n) ) with coefficients in K in a finite number of derivations of x such that P(a, a, ˙ . . . , a (n) ) = 0. The extension L/K is said to be differentially algebraic if and only if every element of L is differentially algebraic over K . (It is also called Hyperalgebraic). 2. An element a ∈ L is said to be differentially transcendental over K if and only it is not differentially algebraic. The extension L/K is said to be differentially transcendental if and only if there exist at least an element of L differentially transcendental over K . Example 10.4 Ret /R is a differential field extension R ⊂ Ret  where et is a solution of P(x) = x˙ − x = 0 and it is differentially algebraic. Definition 10.10 A set {ξi | i ∈ I } of elements in L is said to be differentially K algebraically dependent if and only if the set of derivatives of any order {ξi(νi ) | i ∈ I, νi = 0, 1, 2, . . .} is K -algebraically dependent. Definition 10.11 If a set is not differentially K -algebraically dependent is said to be differentially K -algebraically independent [1, 3, 5]. Definition 10.12 Every family differentially K -algebraically independent such that is maximal respect to the inclusion is said to be a differential transcendence base of L/K . The cardinality of a a base is called the differential transcendence degree of L/K denoted by diff tr d◦ L/K . Example 10.5 L/K is differentially algebraic if and only diff tr d◦ L/K = 0. Theorem 10.2 Let K , L , M fields, such that K ⊂ L ⊂ M is a tower of differential field extensions. Then diff tr d◦ M/K = diff tr d◦ M/L + diff tr d◦ L/K 

10.3 Definitions and Properties of Single-Input Single-Output Systems

167

10.3 Definitions and Properties of Single-Input Single-Output Systems One of the most well known topics on automatic control theory is stationary linear systems that posses a single-input and a single-output (SISO), usually, these system are described by a transfer function, further developments have made possible to include optimal control and optimal statistical filtering which led to the notion of state space representation for linear systems. these kind of systems can be better described by using algebra concepts as follows. Let K be a differential field of a given basis. If the coefficients of the equations are constants then we take K = R. On the other hand if the coefficients are variables, we take K as a field of meromorphic functions of single or several variables. Let u = (u 1 , . . . , u m ) a finite family of differential quantities. K u denotes the differential field of K and the components of u. Example 10.6



u (28) 1

45

u˙ 22 − 1  6 u¨ 1 + 3 u (8) 2

Let u = (u 1 , . . . , u m ) and y = (y1 , . . . , yn ) two finite families of differential quantities. We denote by K u, y the differential field by K and the components of u and y. Definition 10.13 A system withinput u and output y consists in a differential algebraic extension K u, y/K u, i.e., the components of y1 , . . . , yn of y are differentially algebraic over K u [3, 6, 7]. Remark 10.1 In the definition of differential algebraic extensions, the differential algebraic equations are used. In the case of a simple pendulum, y¨ + sin y = 0, note the following situation. sin y satisfies a polynomial differential equation: d sin y = y˙ cos y dt besides

 y˙ (sin y) + 2

2

d sin y dt

2 = y˙ 2

so that dtd ( y¨ + sin y) = 0 can be expressed as y˙ 2 y¨ 2 + (y (3) )2 = y˙ 2 . We have proved that y satisfies a differential algebraic equation. This elimination can be done in every differential equation where the coefficient satisfy a differential algebraic equation.

168

10 Differential Algebra for Nonlinear Control Theory

Definition 10.14 An input-output system defined as above is said to be a system of distributed parameters if and only if the differential fields are not ordinariesonce,doce. Definition 10.15 If u is a transcendence base of K u, y/K then the input is said to be independent [4–6]. In other words, if u ⊂ K u ⊂ K u, y, then diff tr d◦ K u, y/K = diff tr d◦ K u, y/K u + diff tr d◦ K u/K therefore diff tr d◦ K u, y/K u = 0.

10.4 Identifying the Input and Output of Dynamic Systems For simple dynamic systems is easy to identify the input and the output, although this situation does not always occur, for example in complex electrical circuits, is necessary to decide which voltage and current variables are the inputs and outputs of the system. Definition 10.16 Let K a differential field. A systemic field is a differential extension K /k (finitely generated) if there exists a finite set w = (w1 , . . . , ws ) of differential quantities such that K = kw. Example 10.7 Let w˙ = w and Rw = Ret  the set of rational functions of et . Definition 10.17 Let m = diff tr d◦ K /k, where 0 ≤ m ≤ s. There exists a nonunique partition of w in w = (u, y) where u = (u 1 , . . . , u m ) is a differential transcendence basis of K /k. y = (y1 , . . . , y p ), such that m + p = s is formed of the remaining components. Since K u, y/K u is differentially algebraic, it is said that u is the input and y the output. Example 10.8 Let the integrator system y˙ = u. The inverse system (a perfect derivative) is also an input-output system.

10.5 Invertible Systems One of the classical problems in automatic control is trajectory tracking of an output, there are two problems to solve. The first one is to know if the trajectory is reproducible. If so, the other problem is to calculate the inputs. One solution is through constant linear transfer matrices, in such way the problem is equivalent to left (or right) invertibility of matrices.

10.5 Invertible Systems

169

For linear systems with state-space representation, there is a fundamental algorithm called structure algorithm (of Silverman) that provides the response elements. In the nonlinear case, the answer is easy with SISO systems. For MIMO systems there is not an clear answer for the problem. Definition 10.18 The differential rank of output ρ of system K u, y/u is defined [10–12] by ρ = diff tr d◦ K y/k Two important properties of differential rank are listed below. 1. ρ ≤ inf (m, p) where m is the number of inputs and p is the number of outputs. It is clear that a. ρ ≤ p b. ρ ≤ m since k ⊂ ky ⊂ ku, y. Furthermore, diff tr d◦ ku, y/k = diff tr d◦ ku, y/ky + diff tr d◦ ky/k 2. In a linear system, the differential rank of output is defined by a transfer matrix and is equal to the rank of this matrix. Remark 10.2 The differential rank of output ρ is also the maximum number of outputs that are not related by any polynomial differential equation with coefficients in a field K (independent of x and u). In practice, in certain simples cases, it is determined by looking for all the relationships of the form:   h¯ y1 , . . . , y p = 0,

h¯ ∈ K y1 , . . . , y p 

considering a nonlinear system represented by  Σ:

x˙ = f (x) + g(x)u y = h(x)

(10.1)

  with u = (u 1 , . . . , u m ) ∈ Rm , x = (x1 , . . . , xn ) ∈ Rn , y = y1 , . . . , y p ∈ R p If we find r polynomials that are independent, then the differential output rank of the system (10.1) is (10.2) ρ = diff tr d◦ K y/K = p − r

170

10 Differential Algebra for Nonlinear Control Theory

Example 10.9 1. Let us define Σ1 , with two inputs and two outputs: ⎧ x˙1 ⎪ ⎪ ⎪ ⎪ ⎪ x ⎪ ⎨ ˙2 Σ1 : x˙3 ⎪ ⎪ ⎪ y1 ⎪ ⎪ ⎪ ⎩ y2

= u1 = u1 = u2 = x1 = x2

it is possible to immediately observe that the only relationship of the type p(y) = 0 is a differential polynomial with coefficients in K given by: p(y) = y˙1 − y˙2 = 0 a single-output is differentially independent and the differential rank of output ρ is ρ = p−r = 2−1 = 1 2. Let us define Σ2 , with two inputs and two outputs: ⎧ x˙1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ x˙2 Σ2 : x˙3 ⎪ ⎪ ⎪ y1 ⎪ ⎪ ⎪ ⎩ y2

= u1 = u2 = u2 = x1 = x2 + x3

there is no polynomial differential equation that relates the two outputs that are differentially algebraically independent so that the differential rank of output of the system is ρ= p=2 For a nonlinear system it is possible to associate two differential sets, namely polynomial ring and an ideal, i.e., given the following nonlinear system  NLS :

x˙ = f (x) + g(x)u y = h(x)

(10.3)

we have that the polynomial ring and the ideal associated with the previous system is: K {u, x, f (x), g(x), h(x), y} = K {Σ} I = {x˙ − f (x) − g(x)u, y − h(x)} ⊂ K {Σ}

10.5 Invertible Systems

171

If we form the quotient ring with K {Σ} and I , then we have the quotient ring K {Σ} /I where the ideal I can be prime or not. If I is prime, we form the class quotient field of K {Σ} /I , i.e., O (K {Σ} /I ). Then we extract a subfield K y, with K y a differential extension of K , achieving this form the so called differential rank of output ρ given by (10.4) ρ = diff tr d◦ K y/K If I is not prime the system is not well defined and f is ill-defined. We can show an example where ideal is not prime. Example 10.10 Let us define the system Σ¯ given by a nonlinear “implicit” system:  Σ¯ :

x˙ 2 = u 2 y=x

¯ y) = y − x, then the associated differential We have f¯ (x, x, ˙ u) = x˙ 2 − u 2 and g(x, sets are:

 K {x, u, y} = K Σ¯



 I = f¯, g¯ ⊂ K Σ¯ Hence, f¯ is factorizable, f¯ = x˙ 2 − u 2 = (x˙ + u) (x˙ − u), that is to say f¯ is not / I and x˙ + u ∈ / I , thus the ideal I is irreducible. Besides, x˙ 2 − u 2 ∈ I but x˙ − u ∈ not prime. Definition 10.19 Let K u, y/u a system with independent inputs. 1. The system is left invertible if and only if ρ = m. 2. The system is right invertible if and only if ρ = p. Theorem 10.3 The input-output system K u, y/ku is left invertible if and only if diff tr d◦ K u, y/ky = 0  In other words, an input-output system is left invertible if and only if we can recover the input from the output by means of a finite number of differential equations. Example 10.11 System y = u˙ is left invertible because 

t

u(t) =

y (σ ) dσ + c

0

On the other hand, y = u 1 + u 2 is not left invertible. Definition 10.20 If the system K u, y/u is square (m = p), left and right invertibility are equivalents. It is said that system is invertible.

172

10 Differential Algebra for Nonlinear Control Theory

10.6 Realization One of the fundamental problems of automatic control is the realization or state space representation. Let L/K a finitely generated differential extension, i.e., there exists w = (w1 , . . . , ws ) such that L = K w. The following conditions are equivalents: 1. The extension L/K is differentially algebraic. 2. Transcendence degree (non differential) of L/K is finite. In other words, the non differential transcendence degree is the number of initial conditions necessary to calculate the solutions of differential algebraic equations. Example 10.12 ( y¨ )2 − ( y˙ )3 + y y˙ = 0 In this case, nondiff tr d◦ = 2. Definition 10.21 Let u be a differential scalar indeterminate and let K be a differential field, with derivation denoted by dtd . A dynamics is a finitely generated differential algebraic extension G/K u, where K u denotes the differential field by K and the elements of a set u = (u 1 , u 2 , . . . , u m ) of differential quantities. Remark 10.3 In the above definition, G = K u, ξ  where ξ ∈ G. Any element of G satisfies a differential algebraic equation with coefficients that are rational functions over K in the components of u and a finite number of their time derivatives. Example 10.13 The input-output system y¨ − uy = 0 that is equivalent to the bilinear system: ⎧ ⎪ ⎨ x˙1 = x2 Σ=

⎪ ⎩

x˙2 = ux1 y = x1

(10.5)

can be seen like a dynamics of the form Ru, y/Ru, where G = Ru, y, y ∈ G, K = R. Definition 10.22 The input u is independent if and only if u is a differential transcendence base of K /k. Since K /ku is a differential algebraic extension, the transcendence degree (non differential) of K /ku is finite, for example n. Let’s take a finite set ξ = (ξ1 , . . . , ξγ ), with γ ≥ n of elements in K , that contains a transcendence basis of K /ku. The derivatives ξ˙1 , . . . , ξ˙γ are K u-algebraically dependent over ξ :   ˙ . . . , u (α) = 0 A1 ξ˙1 , . . . , ξ˙γ , u, u, .. .   (α) ˙ ˙ Aγ ξ1 , . . . , ξγ , u, u, =0 ˙ ...,u

10.6 Realization

173

where A1 , . . . , Aγ are polynomials with coefficients in ku. These equations are implicit due to implicit function theorem we obtain locally explicit differential equations:   ξ˙1 = a1 ξ, u, u, ˙ . . . , u (α) .. .   ξ˙γ = aγ ξ, u, u, ˙ . . . , u (α) Definition 10.23 ξ is said to be a generalized state of the dynamics K /ku and its dimension γ . A nominal state is a state of minimal dimension such that is a transcendence base of K /ku with dimension n. This state is characterized by the ku-algebraic independence of its components.

10.7 Generalized Controller and Observer Canonical Forms Let L/K be a finitely generated differential algebraic extension. The differential primitive element theorem states the existence of one element δ ∈ L such that L = K δ. On the other hand, let K /ku a nonlinear dynamic. Let suppose that ku is not a field of constants, for example, when the set of controllers u is not empty. Let δ ∈ K a differential primitive element of K /ku. Consider the sequence of  ˙ . . . , δ (γ ) , . . . . derivatives of δ: δ, δ,   ˙ . . . , δ (γ ) , . . . is ku-algebraically dependent Lemma 10.2 The set δ, δ, (respectively independent) if and only if γ ≤ n − 1 (respectively γ ≥ n). n = diff tr d◦ K /ku.   ˙ . . . , δ (γ ),... Proof By contradiction. We suppose that γ ≤ n − 1 then the set δ, δ, is ku-algebraically dependent, this implies that diff tr d◦ K /ku ≤ n − 1. In the same way for ku-algebraically independent and γ ≥ n, diff tr d◦ K /ku ≥ n. In both cases there is a contradiction.    ˙ . . . , δ (γ ) , . . . is a transcendence base of K /ku, then we write δ, δ,   ˙ δ, u, u, C δ (n) , . . . , δ, ˙ . . . , u (n) = 0 where C is a polynomial with coefficients in K . According to the last, x1 = δ, x2 = ˙ . . . , xn = δ (n−1) is a minimal state of the dynamics K /ku. This leads to: δ,

174

10 Differential Algebra for Nonlinear Control Theory

x˙1 = x2 x˙2 = x3 .. . x˙n−1 = xn   ¯ ˙ . . . , u (δ) = 0 C x˙n , xn , . . . , x1 , u, u, that represents global generalized controller canonical form. In particular, the local representation is: x˙1 = x2 x˙2 = x3 .. . x˙n−1 = xn   x˙n = C x1 , . . . , xn , u, u, ˙ . . . , u (δ) y = x1 = δ

10.8 Linearization by Dynamical Feedback of Nonlinear Dynamics The problem of exact linearization by a static state feedback [13, 14] of a nonlinear dynamic has been proposed by Brockett, Jakubczyk and Respodek, also, the contributions by Hunt, Su and Meyer consist in Lie brackets and vector fields. Recently, Charlet, Levine and Marino gave sufficient conditions for linearization by a dynamic state feedback of nonlinear systems using geometric techniques. Taking the local generalized controller canonical form and considering the equality between C and a homogeneous polynomial l with coefficients in K , in the variables x = (x1 , . . . , xn ), v = (v1 , . . . , vm ) that consist in a new input and a finite number of derivatives of v: 



˙ . . . , u (δ) = l(x, v, v˙ , . . . , v(β) ) = C x, u, u,

n  j=1

fjxj +

β 

gi v (i) , ( f j < 0, f j , gi ∈ K )

i=0

Theorem 10.4 u contains a nonempty set of independent control variables then the dynamic K /ku can be locally linearized by a dynamic state feedback. 

10.8 Linearization by Dynamical Feedback of Nonlinear Dynamics

175

Example 10.14 Let K = R and consider the dynamic with a single-input: η˙ 1 = η2 + u η˙ 2 = η2 η3 η˙ 3 = uη1 where Ru, η/Ru. It is not difficult to check that η1 is a differential primitive element. Let x1 = η1 , x2 = η˙ 1 , x3 = η¨ 1 . It follows that: x˙1 = x2 x˙2 = x3 x˙3 = ux1 x2 − u 2 x1 +

(x3 − u˙ 1 )2 + u¨ = f 1 x1 + f 2 x2 + f 3 x3 + g1 v + g2 v˙ + g3 v¨ ( f i < 0) x2 − u

Making v = v˙ = v¨ = 0, we have the following system x˙1 = x2 x˙2 = x3  (x3 − u˙ 1 )2 + u¨ = f i xi , f i < 0 x2 − u i=1 3

x˙3 = ux1 x2 − u 2 x1 +

With the following dynamical control law u¨ =

3 

f i xi − ux1 x2 + u 2 x1 −

i=1

(x3 − u˙ 1 )2 x2 − u

redefining u1 = u u 2 = u˙ we have the following system u˙ 1 = u 2 3  (x3 − u 2 )2 u˙ 2 = f i xi − u 1 x1 x2 + u 21 x1 − x2 − u 1 i=1 Finally, the system is extended with a stabilizing dynamical controller as above is mentioned, i.e.:

176

10 Differential Algebra for Nonlinear Control Theory

x˙1 = x2 x˙2 = x3 x˙3 = u 1 x1 x2 − u 21 x1 +

(x3 − u 2 )2 + u˙ 2 x2 − u 1

u˙ 1 = u 2 3  (x3 − u 2 )2 u˙ 2 = f i xi − u 1 x1 x2 + u 21 x1 − x2 − u 1 i=1 To exemplify the performance of the stabilizing dynamical controller, a simulation is conducted, for this purpose the previous system has initial conditions x1 (0) = 3, and u 2 (0) = 0 (note that this set of values make x2 (0) = 1, x3 (0) = −3, u 1 (0) = 0  3 f i xi = f 1 x1 + f 2 x2 + f 3 x3 determine the the system unstable), the expression i=1 placement of the system’s poles which have to be chosen so the system is asymptotically stable. A simple method to obtain the values of f i is to solve f i in the equation (s + p1 )(s + p2 )(s + p3 ) = s 3 − f 3 s 2 − f 2 s − f 1 where pi is the desired location of the poles that make the system stable, the solution of the equality gives the next relation of f i and pi : f 1 = − p1 p2 p3 f 2 = − ( p1 p2 + p1 p3 + p2 p3 ) f 3 = − ( p1 + p2 + p3 ) For this example the values f 1 = −24, f 2 = −26 and f 3 = −9 lead to the characteristic polynomial (s + 2)(s + 3)(s + 4) that causes a stable behavior shown in the following image (Fig. 10.1). The original system’s states are reproduced with the inverse transformation: η1 = x 1 η2 = x 2 − u x3 − u˙ η3 = x2 − u Causing the system to behave as follows (Fig. 10.2). The states of the system converge to zero within a few seconds, now, to see the action of the dynamical controller it is turned off, making the system have the next dynamic (Fig. 10.3). Clearly the system without the aid of the dynamical controller is unstable proving that the control law can successfully make the system asymptotically stable.

10.8 Linearization by Dynamical Feedback of Nonlinear Dynamics

177

4 x1 x2

2

x3

States

0

−2

−4

−6

−8 0

1

2

3

4

5

6

7

8

9

10

Time(s)

Fig. 10.1 Stabilized transformed system

1 0.5 0

States

−0.5 −1 −1.5 η1

−2

η2 −2.5

η3

−3 0

1

2

3

4

5

Time (s)

Fig. 10.2 Stabilized original system

6

7

8

9

10

178

10 Differential Algebra for Nonlinear Control Theory x 10

4

x1

2

x2 x3

States

0

−2

−4

−6

−8 0

0.005

0.01

0.015

0.02

0.025

0.03

0.035

Time (s)

Fig. 10.3 Unstable transformed system

Example 10.15 Let the following system given in [15] x˙ = x + y 3 y˙ = u The system was treated in [15] with a discontinuous dynamical control law that was based on a smooth feedback for stabilization, yet, we give an alternative solution for the stabilization using a continuous dynamical control law. A differential primitive element is selected: x1 = x − u x˙1 = x2 A few simple operation with the differential primitive element follow: x2 = x + y 3 − u˙ = x1 + u + y 3 − u˙ y = (x2 − x1 − u + u) ˙ 1/3

10.8 Linearization by Dynamical Feedback of Nonlinear Dynamics

179

On the other hand x˙ = x˙1 + u˙ = x2 + u˙ Then x˙2 = x˙ + 3y 2 y˙ − u¨ = (x2 + u) ˙ + 3u (x2 − x1 − u + u) ˙ 2/3 − u¨ Making the transformed system be x˙1 = x2 x˙2 = (x2 + u) ˙ + 3u (x2 − x1 − u + u) ˙ 2/3 − u¨ The intention is to make the system have a linear dynamic given by: 

x˙1 x˙2





01 = ab



x1 x2

 , a, b < 0

Choosing as dynamic controller u¨ = (x2 + u) ˙ + 3u (x2 − x1 − u + u) ˙ 2/3 − ax1 − bx2 The system has the desired linear dynamic, the poles are obtained with the method presented in the previous example, a simulation is conducted using the parameters: a = −3, b = −5, x1 (0) = 5, x2 (0) = −4 So the system has the stable dynamic found in the image below (Fig. 10.4). The original systems states are obtained by the inverse transformation: x = x1 − u y = (x2 − x1 − u + u) ˙ 1/3 Which also have a stable dynamic (Fig. 10.5). Another alternative to achieve the linearization is to select the transformation as: x1 = x x˙1 = x2 x˙2 = x¨ x˙2 = x˙ + 3y 2 y˙ = x + y 3 + 3y 2 y˙

180

10 Differential Algebra for Nonlinear Control Theory 6 x1

5

x2

4 3

States

2 1 0 −1 −2 −3 −4 −5

0

5

10

15

Time (s)

Fig. 10.4 Stabilized transformed system 6 x

4 y

2

States

0 −2 −4 −6 −8 −10 10

5

0

15

Time (s)

Fig. 10.5 Stabilized original system

Notice that this transformation does not yield a dynamic control law, but it does produce a smooth feedback. Since y = (x˙ − x)1/3 = (x2 − x1 )1/3 x˙2 = x1 + (x2 − x1 )3/3 + 3u (x˙ − x)1/3

10.8 Linearization by Dynamical Feedback of Nonlinear Dynamics

181

The transformed system is: x˙1 = x2 x˙2 = x2 + 3u (x2 − x1 )1/3 Making the smooth control law be: u=

ax1 + bx1 − x2 3(x2 − x1 )1/3

The system has the same linear dynamic, and employing the same parameters a = −3, b = −5, x1 (0) = 5, x2 (0) = −4 the next results are achieved (Fig. 10.6). Again the original systems states are obtained by the inverse transformation: x = x1 y = (x2 − x1 )1/3 That also produces a stable dynamic (Fig. 10.7). This alternative choice of primitive element does not yield a dynamic feedback control law requiring various integrations, and yet it produces results on pair with the dynamic control, still it presents a singularity, so the implementation must be carefully made, specially when selecting initial conditions. To avoid infinite or undefined values that makes the original system or the transformed system unstable.

6 x1

5

x2

4 3

States

2 1 0 −1 −2 −3 −4 −5

0

2

4

6

8

10

Time (s)

Fig. 10.6 Stabilized transformed system

12

14

16

18

20

182

10 Differential Algebra for Nonlinear Control Theory 6 x

5

y

4

States

3 2 1 0 −1 −2 −3 0

2

4

6

8

10

12

14

16

18

20

Time (s)

Fig. 10.7 Stabilized original system

References 1. Byrnes, C.I., Falb, P.L.: Applications of algebraic geometry in system theory. Am. J. Math. 101(2), 337–363 (1979) 2. Dieudonn, J. (1974). Cours de gomtrie algbrique (Vol. 2). Presses universitaires de France 3. Samuel, P., Serge, B.: Mthodes d’algbre abstraite en gomtrie algbrique. Springer (1967) 4. Kolchin, E. R.: Differential algebra & algebraic groups. Academic Press (1973) 5. Mumford, D.: Introduction to algebraic geometry. Department of Mathematics, Harvard University (1976) 6. Hartshorne, R.: Algebraic Geometry. Springer Science & Business Media (1977) 7. Borel, A.: Linear algebraic groups, vol. 126. Springer Science & Business Media (2012) 8. Fliess, M.: Nonlinear control theory and differential algebra. In: Modelling and Adaptive Control, pp. 134–145. Springer, Berlin (1988) 9. Simmons, G.F.: Differential Equations with Applications and Historical Notes. CRC Press (2016) 10. Boyce, W.E., DiPrima, R.C.: Differential Equations Elementary and Boundary Value Problems. Willey & Sons (1977) 11. Falb, P.: Methods of Algebraic Geometry in Control Theory: Part I. Scalar Linear Systems and Affine Algebraic Geometry, Birkhuser (1990) 12. Diop, S., Fliess, M.: Nonlinear observability, identifiability, and persistent trajectories. In: Proceedings of the 30th IEEE Conference on Decision and Control, pp. 714–719. IEEE (1991) 13. Kawski, M.: Stabilization of nonlinear systems in the plane. Syst. Control Lett. 12(2), 169–175 (1989) 14. Aeyels, D.: Stabilization of a class of nonlinear systems by a smooth feedback control. Syst. Control Lett. 5(5), 289–294 (1985) 15. Fliess, M., Messager, F.: Vers une stabilisation non linaire discontinue. In: Analysis and Optimization of System, pp. 778–787. Springer, Berlin (1990)

Appendix

Important Tools

In this appendix, we collect some tools that are necessary in the development of some algebraic properties.

A.1

Summation

The addition of elements a1 + a2 + a3 + a4 + a5 + · · · + an can be represented in a short form using the symbol Σ, that is: n 

ai = am + am+1 + am+2 + · · · + an−1 + an

i=m

where i is the variable of summation (or index), m is the lower bound of summation, n the upper bound of summation and i = m means that the index i starts out equal to m. This index is incremented by 1 for each successive term and stopping when i = n. The summation is the addition of a sequence of numbers that can be integers, rational, real or complex numbers. Note the sum does  not depend of this index  that n n   ai = al . As examples of sums we and can be replaced for another index i=1

l=1

have: 10 

x j = x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10

j=1 4 

2 j · j 2 = 21 · 12 + 22 · 22 + 23 · 32 + 24 · 42 = 2 + 16 + 72 + 256 = 346

j=1

© Springer Nature Switzerland AG 2019 R. Martínez-Guerra et al., Algebraic and Differential Methods for Nonlinear Control Theory, Mathematical and Analytical Techniques with Applications to Engineering, https://doi.org/10.1007/978-3-030-12025-2

183

184

Summation that

Appendix: Important Tools

n i=1

ai can be established by mathematical induction , considering 0 

xk = 0

(A.1)

k=1 n+1 

ai =

i=1

n 

ai + an+1

(A.2)

i=1

Some important properties of summation are the following: 1. Partition of summation. If m ∈ {1, . . . , n} then n 

ai =

i=1

m 

ai +

i=1

n 

ai

(A.3)

i=m+1

2. Additive property of summation. n 

(ai + bi ) =

i=1

n 

ai +

n 

bi

(A.4)

ai ∀λ ∈ R

(A.5)

i=1

i=1

3. Homogeneous property. n 

(λai ) = λ

i=1

n  i=1

4. Exchange of finite summations. m  n 

ai, j =

i=1 j=1

m  n 

ai, j

(A.6)

j=1 i=1

5. Finite summations of constants. n 

λ = λn ∀λ ∈ R

(A.7)

i=1

6. Linearity of the summation. n n n      λai + λ¯ bi = λ ai + λ¯ bi ∀λ, λ¯ ∈ R i=1

i=1

i=1

(A.8)

Appendix: Important Tools

185

7. Telescoping summation. n 

(ai+1 − ai ) = an+1 − a1 ,

i=1

n 

(ai − ai+1 ) = a1 − an+1

(A.9)

i=1

Example A.1 The use of the properties mentioned above is shown in this exercise. Let consider: ⎧ 2n+1 3n−2 4 ·2 ⎪ if 1 ≤ n ≤ 27 ⎨√ 8n−1 √ an = 2n + 2 − 2n if 28 ≤ n ≤ 56 ⎪  ⎩  if 57 ≤ n 7 n + n2 and calculate the following sum: 59 

ak

k=1

According to (A.3): 59 

ak =

k=1

27 

ak +

k=1

56  k=28

ak +

59 

ak

k=57

For the first addend, making algebraic reductions, taking account the homogeneous property (A.5) and the sum of the first n terms of a geometric series we have: 27  k=1

27 27 2k+1 27 2k 27    4 · 2−2  42k · 23k 4 · 23k−2 4 · 4 · 23k · 2−2 = = = 8 16k 8k−1 8k · 8−1 8−1 8k k=1 k=1 k=1 k=1     16 1 − 1627 128 1 − 1627 =8· =− 1 − 16 15

ak =

On the other hand, for the second addend using the telescoping summation (A.9): 56 

ak =

k=28

56  √

2k + 2 −

56  √ √  √ √ 2k = 2(k + 1) − 2k = 114 − 2 14

k=28

k=28

and finally for the third addend: 59 

ak =

k=57

59 

59

 7 k + k2 = 7 k (k + 1) = 7 [57(58) + 58(59) + 59(60)] = 71876

k=57

k=57

Therefore,   √ √ 128 1 − 1627 + 114 − 2 14 + 71876 ak = − 15 k=1

59 

186

Appendix: Important Tools

Sometimes it is necessary the change of variable in a summation, for example, let consider the following summation: 5 

ak = a2 + a3 + a4 + a5

k=2

In the above expression, let l = k + 2 then k = l − 2. If k = 2 and k = 5 then l = 4 and l = 7 respectively, then we have 7 

al−2 = a2 + a3 + a4 + a5 =

l=4

5 

ak

k=2

In addition as example, note that 6  j=3

p j+2 = p 5 + p 6 + p 7 + p 8 =

8 

pm

m=5

The first summation, can be expressed as the second summation taking account the change of variable m = j + 2, then j = m − 2 and the lower and upper bound of the summation are m = 5 and m = 8 respectively. To conclude this section, see what happens with the cancellation of addends as special case of partition of a summation (A.3). Let the summation m+n  j=m

aj +

l+k 

  a j = am + am+1 + am+2 + · · · + am+n − al + al+1 + al+2 + · · · + al+k

j=l

where l > m, l + k > m + n. Let define the sets of index Aˆ := {m, m + 1, m + 2, . . . , m + n}, Bˆ = {l, l + 1, l + 2, . . . , l + k}. Suppose that from ˆ some i and even a i: am+i + am+i+1 + am+i+2 + · · · + am+iˆ = al+i + al+i+1 + al+i+2 + · · · + al+iˆ then, the summation is reduced and these addends are eliminated (because each one is the additive inverse of the other). To determine which addends are in the final result, note that Cˆ := Aˆ ∩ Bˆ represents a new set whose elements are the index of the addends to be canceled in the final result, that is,     Aˆ ∩ Bˆ = m + i, m + i + 1, m + i + 2, . . . , m + iˆ = l + i, l + i + 1, l + i + 2, . . . , l + iˆ

Appendix: Important Tools

187

Finally the index of addends of the final result are Aˆ \ Cˆ for the summation

m+n 

aj

j=m

and Bˆ \ Cˆ for the summation

l+k 

aj.

j=l

Example A.2 Consider the following summation S =

7  j=0

aj −

11 

aj

j=1

According to theory discussed above Aˆ = {0, 1, 2, 3, 4, 5, 6, 7} Bˆ = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11} then the set of indices of the addends to be canceled in the final result is Cˆ = Aˆ ∩ Bˆ = {1, 2, 3, 4, 5, 6, 7} and Aˆ \ Cˆ = {0}, Bˆ \ Cˆ = {8, 9, 10, 11} represent the sets of indices of the addends in the final result for the first and second addends respectively. Finally: S =

7  j=0

A.2

aj −

11 

a j = a0 − a8 − a9 − a10 − a11 = a0 −

j=1

11 

aj.

j=8

Kronecker Delta

The Kronecker delta is a function of two variables, usually just positive integers. This function δ : Z × Z → {0, 1} is a piecewise function of variables i and j given by:  1, if i = j, (A.10) δi, j = 0, otherwise for example δ1,1 = 1, δ10,5 = 0, δ1,2 = 0. In addition, for every i, j ∈ Z, δi, j = δ j,i . Some properties of Kronecker delta are listed below. These properties are useful in the simplification of summations with Kronecker delta. 1. 2.

n  i=1 n  i=1

δi, j ai = a j for j ∈ {1, . . . , n}. δi, j ai = 0 for j ∈ / {1, . . . , n}.

188

3.

Appendix: Important Tools n 

δi,k δk, j = δi, j .

k=1

Example A.3 Considering the following summations 5 

δi,2 ai = δ1,2 a1 + δ2,2 a2 + δ3,2 a3 + δ4,2 a4 + δ5,2 a5

i=1

1 0 0 0 0 >  >a1 + δ2,2  >a3 + δ4,2  >  >a5 = a2  = δ1,2 δ3,2 δ5,2  a4 +   a2 +  In this case, according to property 1, j = 2 and j = 2 ∈ {1, 2, 3, 4, 5}, then we can use this property and avoid to make all the summation. In the following summation, j = 6. This value not belongs to set of indices, therefore the property 2 is used. 5 

δi,6 ai = δ1,6 a1 + δ2,6 a2 + δ3,6 a3 + δ4,6 a4 + δ5,6 a5

i=1

0 0 0 0 0 >a2 + δ3,6  >a4 + δ5,6  >a1 + δ2,6  >a3 + δ4,6  >a5 = 0  = δ1,6     Finally, 5 

δi,k δk, j = δi,1 δ1, j + δi,2 δ2, j + δi,3 δ3, j + δi,4 δ4, j + δi,5 δ5, j

k=1

This summation is 1 if i = j and belongs to set of indices or 0 if i = j or i = j but not belongs to the set of indices. In other words, this summation is the Kronecker delta δi, j . Kronecker delta is commonly used in matrix theory to define the identity matrix. This section conclude with other example where the entries of two matrices are written considering an explicit formula taking account the Kronecker delta. Example A.4 ⎡ ⎤ ⎡ ⎤ 6δ1,3 6δ1,3 6δ1,3 6δ1,3 0000 4  A = 6δi,3 i, j=1 = ⎣6δ2,3 6δ2,3 6δ2,3 6δ2,3 ⎦ = ⎣0 0 0 0⎦ 6666 6δ3,3 6δ3,3 6δ3,3 6δ3,3 On the other hand, if B ∈ M5 (R) and Bi, j = δi,4 + δ j,2 then ⎡

δ1,4 + δ1,2 ⎢δ2,4 + δ1,2 ⎢ B=⎢ ⎢δ3,4 + δ1,2 ⎣δ4,4 + δ1,2 δ5,4 + δ1,2

δ1,4 + δ2,2 δ2,4 + δ2,2 δ3,4 + δ2,2 δ4,4 + δ2,2 δ5,4 + δ2,2

δ1,4 + δ3,2 δ2,4 + δ3,2 δ3,4 + δ3,2 δ4,4 + δ3,2 δ5,4 + δ3,2

δ1,4 + δ4,2 δ2,4 + δ4,2 δ3,4 + δ4,2 δ4,4 + δ4,2 δ5,4 + δ4,2

⎤ ⎡ 0 δ1,4 + δ5,2 ⎢0 δ2,4 + δ5,2 ⎥ ⎥ ⎢ ⎢ δ3,4 + δ5,2 ⎥ ⎥ = ⎢0 δ4,4 + δ5,2 ⎦ ⎣1 0 δ5,4 + δ5,2

1 1 1 2 1

0 0 0 1 0

0 0 0 1 0

⎤ 0 0⎥ ⎥ 0⎥ ⎥ 1⎦ 0

Appendix: Important Tools

A.3

189

Matrices

In this part we consider important a resume of properties of matrices and determinants. Consider the matrices A, B, C ∈ Mm×n (F) and α, β ∈ F. In addition 0 = 0m×n represents the zero matrix. 1. Sum of matrices and multiplication of one scalar by a matrix. a. b. c. d. e. f. g.

A+B = B+ A A + B + (C) = (A + B) + C α(A + B) = α A + α B (α + β)A = α A + β A α(β A) = (αβ)A A+0= A A + (−A) = 0

2. Multiplication of matrices. a. b. c. d. e. f. g h i

A(B + C) = AB + AC (A + B)C = AC + BC A(BC) = (AB)C α(AB) = (α A)B = A(α B) A0 = 0 A = 0 BI = I B = B In general, AB = AB. AB = 0 does not necessarily mean that A = 0 or B = 0. AB = AC does not necessarily mean that B = C.

3. Trace properties. The trace of a square matrix A ∈ Mn×n (F) is defined to be the sum of the elements on the main diagonal of A: tr(A) = a11 + a22 + a33 + · · · + ann =

n 

aii

(A.11)

i=1

a. b. c. d.

tr(A + B) = tr(A) + tr(B). tr(α A) = α tr(A). tr(AB) = tr(B A). tr(A ) = tr(A).

e. tr(A B) = tr(AB  ) = tr(B  A) = tr(B A ) =

 i, j

Ai j Bi j .

f. tr(ABC D) = tr(BC D A) = tr(C D AB) = tr(D ABC). Trace is invariant under cyclic permutations. g. In general tr(ABC) = tr(AC B). h. If A, B, C are symmetric matrices then tr(ABC) = tr(A B  C  ) = tr(A (C B) ) = tr((C B) A ) = tr((AC B) ) = tr(AC B).

190

Appendix: Important Tools

i. The trace is similarity-invariant: tr(P −1 A P) = tr(P −1 (A P)) = tr((A P) P −1 ) = tr(A(P P −1 )) = tr(AI ) = tr(A). j If A is a symmetric matrix and B is antisymmetric, then tr(AB) = 0. k If A is an idempotent matrix then tr(A) = rank(A). l. If A is a nilpotent matrix then tr(A) = 0. m. tr(In ) = n. n. If f (x) = (x − λ1 )d1 (x − λ2 )d2 · · · (x − λk )dk is the characteristic polynomial of A, then tr(A) = d1 λ1 + d2 λ2 + · · · + dk λk . n  o. If A ∈ Mn×n (F) and λ1 , . . . , λn are the eigenvalues of A, tr(A) = λi . p. tr(Ak ) =

n 

i=1

λik , where λi is an eigenvalue of A.

i=1

q. Let A∗ the conjugate transpose of A, then tr(A A∗ ) ≥ 0. In addition tr(A A∗ ) = 0 if and only if A = 0. r. tr(B ∗ A) = A, B defines an inner product on the space of matrices Mm×n (F). s. 0 ≤ tr(AB)2 ≤ tr(A2 ) tr(B 2 ) ≤ tr(A)2 tr(B)2 . 4. Diagonal matrices properties. If A, B are diagonal matrices then A + B = diag (a11 + b11 , a22 + b22 , . . . , ann + bnn ). AB = diag (a11 b11 , a22 b22 , . . . , ann bnn ). α A = diag (αa11 , αa22 , . . . , αann ). A is invertible  if and only if the entries a1 , . . . , an are all non-zero, then A−1 = diag a1−1 , . . . , an−1 . e. The adjoint of a diagonal matrix is again diagonal. f. A square matrix is diagonal if and only if it is triangular and normal. n  aii . g. det(A) = a. b. c. d.

i=1

5. Properties of the inverse matrix. a. b. c. d. e. f. g. h. i.

A−1 is unique.  −1 −1 = A. A −1 (AB) = B −1 A−1 (α A)−1 = α −1 A−1 ∀α = 0. n (An )−1 = A−1 .    (A )−1 = A−1 . 1 −1 A = det(A) adj A.   det A−1 = (det(A))−1 . A matrix that is its own inverse, i.e. A = A1 and A2 = I , is called an involution.

Appendix: Important Tools

191

6. Transpose matrix properties. a. b. c. d. e. f. g. h. i. j. k.

(A ) = A. (A + B) = A + B  . (AB) = B  A . (α A) = α A . det(A ) = det(A). Let A ∈ Mn×n (F) then σ (A) = σ (A ), where σ denotes the spectrum of a matrix. Let A ∈ Mn×n (R). A is called a symmetric matrix if A = A. Let A ∈ Mn×n (R). A is called a skew-symmetric matrix if A = −A. Let A ∈ Mn×n (R). A is called a orthogonal matrix if A = A−1 . In addition, A A = A A = I . Let A ∈ Mn×n (C). A is called a Hermitian matrix if A = A∗ . Let A ∈ Mn×n (C). A is called a skew-Hermitian matrix if A = −A∗ .

7. Symmetric and skew-symmetric matrices properties. Let A ∈ Mn×n (F). a. A + A is a symmetric matrix. b. A A is a symmetric matrix. c. A − A is a skew-symmetric matrix. If A, B are symmetric (skew-symmetric) matrices: a. A + B is a symmetric (skew-symmetric) matrix. b. α A is a symmetric (skew-symmetric) matrix. c. AB is not necessarily a symmetric (skew-symmetric) matrix. 8. Conjugate transpose properties.  Let A∗ = A¯ = A¯ the conjugate transpose or Hermitian transpose, where A denotes the transpose matrix and A¯ is matrix with complex conjugated entries. Then

Index

A Abelian group, 20, 22 Abel’s differential equation identity, 142 Additive subgroup, 33, 34 Adjoint matrix, 71 Algebraic multiplicity, 110, 111, 115–117, 119, 121 Antisymmetric, 2 Approximated solutions, 158 Associate matrix, 93, 102 Associativity, 4, 19, 25 Automorphism, 27 Axiom of unions, 2

B Basis of, 81–83 Bijective, 10–13, 99, 100, 104

C Cancellation rule, 32, 33 Cauchy-Schwarz inequality, 89 Cayley-Hamilton theorem, 87, 88 Change of basis, 49, 59 Characteristic equation, 137–140, 143 Closure, 22 Codomain, 8–11, 13 Coefficients matrix, 72 Cofactor, 68, 71 Commutative ring, 31, 32, 34–36, 44 Commutativity, 3

Commute, 66 Composition, 93, 104 Congruent modulo, 22 Controllable, 97 Convergence of series, 146 Coordinate vector, 83 Coprime, 41 Corresponding base, 116 Cramer’s rule, 73 Cycle of generalized eigenvectors, 120 Cycle of order k, 66 Cyclic subgroup, 22

D D’Alembert criterion, 147 Decomposition, 67 De Morgan’s laws, 4 Determinant, 65, 68–71, 73 Diagonal, 69, 72 Difference of the sets, 5 Differential equation of second order, 128, 134, 139, 152 Differential field extension, 166 Differential rank of output, 169–171 Dimension of, The, 81, 82 Direct product, 29 Direct sum, 78, 79, 85 Disjoint, 66, 67 Disjoint cycle, 66, 67 Distributive property, 50, 52 Distributivity, 3

© Springer Nature Switzerland AG 2019 R. Martínez-Guerra et al., Algebraic and Differential Methods for Nonlinear Control Theory, Mathematical and Analytical Techniques with Applications to Engineering, https://doi.org/10.1007/978-3-030-12025-2

193

194 Divisibility, 36, 37 Division algorithm for polynomials, 46, 47 Divisor of zero, 32 Domain, 8–11, 13 Dot diagram, 120–122 Dynamical controller, 175, 176

E Eigenspace, 110, 120 Eigenvalue, 109–121 Eigenvector, 109–115, 117–122 Elementary matrix, 59, 61, 62 Elementary row operation, 59, 61, 62 Epimorphism, 27, 29 Equivalent matrices, 112 Euclid’s lemma, 41 Euclidean algorithm, 39, 40, 47 Euclidean spaces, 75, 88 Even permutation, 67 Exact equation, 132 Existence of an additive identity element, 49 Existence of an additive inverse, 49

F Field, 32, 35, 44, 76–81, 83, 85–87, 89, 93, 95, 99–105 Finite-dimensional, 82–84, 86 Finite-dimensional vector space, 95, 99, 100, 103–105 Finite set, 66 Free fall, 125 Friction, 127 Fundamental theorem of arithmetic, 42

G Gauss-Jordan elimination method, 62 Gauss-Jordan method, 59 Generalized controller, 174 Generalized eigenspace, 120 Generalized eigenvectors, 120 General solution, 63 Generated subspace, 75, 79 Greatest Common Divisor (GCD), 38–40, 47 Group, 19–27, 29

H Homogeneous, 63 Homogeneous equation, 131, 139, 140, 144, 160 Homogeneous function, 131

Index Homogeneous second order linear differential equation, 137 Homomorphism, 19, 26–29 I Ideal, 31, 34–36 Idempotency, 4 Identical element, 19, 26 Identity element, 75, 77 Image, 93, 95 Indeterminate coefficients, 145, 146 Inferior triangular, 69 Infinite group, 20 Initial value problem, 141, 154–156 Injective, 94, 99, 100 Inner product, 88–90 Inner product space, 89, 90 Input-output system, 168, 171, 172 Integral domain, 32, 34 Intersection, 3 Inverse additive, 77 Inverse element, 21, 25, 26, 29 Inverse of a square matrix, 62 Invertible, 12, 57–60, 62 Invertible systems, 163, 168 Isomorphism, 19, 27–29, 99, 103, 104 Isomorphism of algebras, 104 J Jordan block, 116, 118 Jordan canonical base, 120, 123 Jordan canonical form, 109, 117, 123 K K -algebraically independent, 164, 166 k-cycle, 66 Kernel, 26, 29, 93, 94 Kinetic, 126 Kronecker delta, 56 L Lagrange’s theorem, 23, 24 Leading coefficient, 44 Least common multiple, 42, 43 Linear combination, 80–82 Linear differential equation, 125, 134, 137, 159 Linear dynamic, 179, 181 Linear equation, 63 Linearization by dynamical feedback, 163 Linearly dependent, 102

Index Linearly independent, 80–84, 86, 90, 95, 99, 101, 105, 106 Linear operations, 50, 51 Linear operator, 93, 100, 102, 104, 106, 109– 111, 116, 118–120 Linear transformation, 93–95, 99–104 Lipschitz condition, 157, 158

M Map, 8 Mathematical induction, 1, 13, 15, 16 Matrix, 49, 53–55, 57–62 Matrix diagonalization, 109 Matrix representation, 102 Maximal family, 164 Modulo, 22 Monoid, 19 Monomorphism, 27, 28 Mutually disjoint, 4 Mutually prime, 41

N Nonempty, 19, 21, 22 Nonempty set, 19, 31, 65 Nonlinear differential equation, 128 Normalize, 89 Normal subgroup, 24, 25, 29 n-tuple, 50–52 Null n-tuple, 52

O Observable, 97 Observer canonical form, 173 Odd permutation, 67, 70 Order of a group, 20 Orthogonal, 90, 91 Orthogonal complement, 90 Orthonormal, 90

P Pairwise disjoin, 4 Parity, 67 Pendulum, 126 Permutation, 65–68, 70 Permutations group, 65, 67 Picard’s iterations, 155, 157 Picard’s method, 154 Polynomial ring, 31, 44 Potential energy, 126 Prime number, 32, 41, 42

195 Primitive element, 173, 175, 178, 181 Principal ideal, 34 Principal ideals domain, 34 Proper ideal, 34

Q Quotient group, 25 Quotient ring, 34 Quotient space, 75, 85

R Realization, 163, 172 Real numbers, 49 Reflexive, 2, 6 Related module, 85 Relatively prime, 41 Right lateral class, 23–25 Ring, 31–36, 44 Ring with division, 32 Ring with identity, 31, 32, 34, 35, 44 Row addition, 60 Row multiplication, 60 Row-reduced, 62 Row switching, 59

S Scalar multiplication, 101 Scalars, 51–53 Second law of Newton, 125 Separable differential equation, 129, 131 Sets, 1–5, 8, 9 Set without elements, 1 Sign of, 67, 68 Single-input single-output systems, 163 Singleton, 1 Smooth control law, 181 Space of linear transformations, 100 Stabilization, 163, 178 Subgroup, 19, 21–26 Subset, 2–4, 7, 13 Subspace, 75, 78, 79, 83–86, 90, 91, 94, 95 Sum of subspaces, 78 Superior triangular, 69, 91 Surjective, 94, 95, 99, 100, 104 System of linear equations, 72, 73

T Trajectory tracking, 168 Transcendent, 163 Transcendental field extensions, 163

196 Transformation matrix, 115, 123 Transitive, 2, 6 Transpose, 58 Transposition, 66, 67, 70 Triangular, 69 Trivial solution, 63

U Union, 3–5 Unitarian, 89 Universal set, 3 Universe, 3, 5

Index V Variation of parameters, 139 Vectors, 49, 51 Vector space, 75, 77–86, 88, 89

W Well defined, 1 Well-ordering principle, 13

Z Zero vector, 77