Block-oriented Nonlinear System Identification (Lecture Notes in Control and Information Sciences, 404) [2010 ed.] 1849965129, 9781849965125


Table of contents:
Title
Preface
Contents
Part I Block-oriented Nonlinear Models
Introduction to Block-oriented Nonlinear Systems
Block-oriented Nonlinear Systems
About This Book
Book Topics
Who Can Use This Book
References
Nonlinear System Modelling and Analysis from the Volterra and Wiener Perspective
Introduction
General System Considerations
The Volterra Series
Applications of the Volterra Series
The Wiener Theory
The Wiener G-functionals
System Modelling with the G-functionals
General Wiener Model
The Gate Function Model
An Optimum System Calculator
References
Part II Iterative and Overparameterization Methods
An Optimal Two-stage Identification Algorithm for Hammerstein–Wiener Nonlinear Systems
Introduction
Optimal Two-stage Algorithm
Concluding Remarks
References
Compound Operator Decomposition and Its Application to Hammerstein and Wiener Systems
Introduction
Decomposition
Serial Application
Parallel Application
Decomposition of Block-oriented Nonlinear Systems
Hammerstein System
Wiener System
Identification of Hammerstein–Wiener Systems
Hammerstein–Wiener Systems
Piecewise-linear Characteristics
Algorithm
Example
Conclusions
References
Iterative Identification of Hammerstein Systems
Introduction
Hammerstein System with IIR Linear Part
Non-smooth Nonlinearities
Examples
Conclusion
References
Part III Stochastic Methods
Recursive Identification for Stochastic Hammerstein Systems
Introduction
Nonparametric f(·)
Identification of A(z)
Identification of B(z)
Identification of f(u)
Piecewise Linear f(·)
Parameterized Nonlinearity
Concluding Remarks
Appendix
References
Wiener System Identification Using the Maximum Likelihood Method
Introduction
An Output-error Approach
The Maximum Likelihood Method
Likelihood Function for White Disturbances
Likelihood Function for Coloured Process Noise
Maximum Likelihood Algorithms
Direct Gradient-based Search Approach
Expectation Maximisation Approach
Simulation Examples
Example 1: White Process and Measurement Noise
Example 2: Coloured Process Noise
Example 3: Blind Estimation
Conclusion
References
Parametric Versus Nonparametric Approach to Wiener Systems Identification
Introduction to Wiener Systems
Nonlinear Least Squares Method
Nonparametric Identification Tools
Inverse Regression Approach
Cross-correlation Analysis
A Censored Sample Mean Approach
Combined Parametric-nonparametric Approach
Kernel Method with the Correlation-based Internal Signal Estimation
Identification of IIR Wiener Systems with Non-Gaussian Input
Recent Ideas
Conclusion
References
Identification of Block-oriented Systems: Nonparametric and Semiparametric Inference
Introduction
Nonparametric and Semiparametric Inference
Semiparametric Block-oriented Systems
Semiparametric Hammerstein Systems
Semiparametric Parallel Systems
Concluding Remarks
References
Identification of Block-oriented Systems Using the Invariance Property
Introduction
Preliminaries
The Invariance Property and Separable Processes
Block-oriented Systems
Discussion
References
Part IV Frequency Methods
Frequency Domain Identification of Hammerstein Models
Introduction
Problem Statement and Point Estimation
Continuous Time Frequency Response
Point Estimation of G(jω) Based on Y_T and U_T
Implementation Using Sampled Data
Identification of G(s)
Finite-order Rational Transfer Function G(s)
Non-parametric G(s)
Identification of the Nonlinear Part f(u)
Unknown Nonlinearity Structure
Polynomial Nonlinearities
Simulation
Concluding Remarks
References
Frequency Identification of Nonparametric Wiener Systems
Introduction
Identification Problem Statement
Frequency Behaviour Geometric Interpretations
Characterisation of the Loci (x_n(t), w(t)) and (x_n^−(t), w(t))
Estimation of the Loci C_ψ(U, ω)
Wiener System Identification Method
Phase Estimation (PE)
Nonlinearity Estimation (NLE)
Frequency Gain Modulus Estimation
Simulation Results
Further Expressions
Geometric Area
Signal Spread
Conclusion
References
Identification of Wiener–Hammerstein Systems Using the Best Linear Approximation
Introduction
Block-oriented versus Black-box Models
Identification Issues
The Best Linear Approximation
Definition
Class of Excitations
Nonlinear Block Structure Selection Method
Two-stage Nonparametric Approach
Some Nonlinear Block Structures
Theoretical Results
Experimental Results
Concluding Remarks
Initial Estimates for Wiener–Hammerstein Models
Set-up
Initialisation Procedure
Experimental Results
Concluding Remarks
Conclusions
References
Part V SVM, Subspace and Separable Least-squares
Subspace Identification of Hammerstein–Wiener Systems Operating in Closed-loop
Introduction
Problem Formulation
Problem Formulation
Concept of Basis Functions
Assumptions and Notation
Hammerstein–Wiener Predictor-based Subspace Identification
Predictors
Extended Observability Times Controllability Matrix
Estimation of the Wiener Nonlinearity
Recovery of the System Matrices
Estimation of the Hammerstein Nonlinearity
Example
Conclusions
References
NARX Identification of Hammerstein Systems Using Least-Squares Support Vector Machines
Introduction
Hammerstein Identification Using an Overparametrisation Approach
Implementation of Overparametrisation
Potential Problems in Overparametrisation
Function Approximation Using Least Squares Support Vector Machines
NARX Hammerstein Identification as a Componentwise LS-SVM
SISO Systems
Identification of Hammerstein MIMO Systems
Example
Extensions
Outlook
References
Identification of Linear Systems with Hard Input Nonlinearities of Known Structure
Problem Statement
Deterministic Approach
Identification Algorithm
Consistency Analysis and Computational Issues
Correlation Analysis Method
Concluding Remarks
References
Part VI Blind Methods
Blind Maximum-likelihood Identification of Wiener and Hammerstein Nonlinear Block Structures
Introduction: Blind Nonlinear Modelling
Nonlinear Sensor Calibration
Outline
Introduction of Models and Related Assumptions
Class of Discrete-time Wiener and Hammerstein Systems Considered
Parametrisation
Stochastic Framework
Identifiability
The Gaussian Maximum-likelihood Estimator (MLE)
The Negative Log-likelihood (NLL) Function
The Simplified MLE Cost Function
Asymptotic Properties
Loss of Consistency in the Case of a Non-Gaussian Input
Non-white Gaussian Inputs
Generation of Initial Estimates
Subproblem 1: Monotonically Increasing Static Nonlinearity Driven by Gaussian Noise
Subproblem 2: LTI Driven by White Input Noise
Minimisation of the Cost Function
The Cramér-Rao Lower Bound
Impact of Output Noise
Simulation Results
Setup: Presentation of the Example
Graphical Presentation of the Results
Monte Carlo Analysis Showing the Impact of Output Noise
Laboratory Experiment
Conclusion
References
A Blind Approach to Identification of Hammerstein Systems
Introduction
Problem Description
Estimation of n_a, τ and A(q)
Estimation of x(t)
Numerical Examples
Experimental Example
Hammerstein Model of MR Dampers
Experiment Setup
Experiment Result
Conclusion
References
A Blind Approach to the Hammerstein-Wiener Model Identification
Introduction
Problem Statement and Preliminaries
Identification of the Hammerstein-Wiener Model
Output Nonlinearity Estimation
Linear Transfer Function Estimation
Input Nonlinearity Estimation
Algorithm and Simulations
Discussions
Concluding Remarks
References
Part VII Decoupling Inputs and Bounded Error Methods
Decoupling the Linear and Nonlinear Parts in Hammerstein Model Identification
Problem Statement
Nonlinearity with the PRBS Inputs
Linear Part Identification
Non-parametric Identification
Parametric Identification
Nonlinear Part Identification
Concluding Remarks
References
Hammerstein System Identification in Presence of Hard Memory Nonlinearities
Introduction
Identification Problem Formulation
Linear Subsystem Identification
Model Reforming
Model Centring and Linear Subsystem Parameter Estimation
A Class of Exciting Input Signal
Consistency of Linear Subsystem Parameter Estimates
Simulation
Nonlinear Element Estimation
Estimation of m_1
Estimation of (h_1, h_2)
Simulation
Conclusion
References
Bounded Error Identification of Hammerstein Systems with Backlash
Introduction
Problem Formulation
Assessment of Tight Bounds on the Nonlinear Static Block Parameters
Definitions and Preliminary Results
Exact Description of D_γ^l
Tight Orthotope Description of D_γ^l
Bounding the Parameters of the Linear Dynamic Model
A Simulated Example
Conclusion
References
Part VIII Application of Block-oriented Models
Block Structured Modelling in the Study of the Stretch Reflex
Introduction
Preliminaries
Initial Applications
Hard Nonlinearities
The Parallel Cascade Stiffness Model
Iterative, Correlation based Approach
Separable Least Squares Optimisation
Conclusions
References
Application of Block-oriented System Identification to Modelling Paralysed Muscle Under Electrical Stimulation
Introduction
Problem Statement
The Wiener–Hammerstein Fatigue Model
Identification of the Wiener–Hammerstein System
Identification of the Wiener–Hammerstein Non-fatigue Model (Single Train Stimulation Model)
Identification of the Wiener–Hammerstein Fatigue Model
Collection of SCI Patient Data
Results
Discussion and Conclusions
References
Index


Lecture Notes in Control and Information Sciences 404 Editors: M. Thoma, F. Allgöwer, M. Morari

Fouad Giri and Er-Wei Bai (Eds.)

Block-oriented Nonlinear System Identification


Series Advisory Board P. Fleming, P. Kokotovic, A.B. Kurzhanski, H. Kwakernaak, A. Rantzer, J.N. Tsitsiklis

Editors:
Prof. Fouad Giri, Université de Caen, GREYC Lab, CNRS UMR 6072, 14032 Caen, France
Dr. Er-Wei Bai, University of Iowa, Dept. of Electrical & Computer Engineering, Iowa Advanced Technology Laboratories, Iowa City, IA 52244, USA

ISBN 978-1-84996-512-5

e-ISBN 978-1-84996-513-2

DOI 10.1007/978-1-84996-513-2

Lecture Notes in Control and Information Sciences, ISSN 0170-8643

Library of Congress Control Number: 2010932194

MATLAB and Simulink are registered trademarks of The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098, U.S.A.

© 2010 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India. Printed on acid-free paper.

springer.com


Preface

Identification of block-oriented nonlinear systems has been an active research area for over five decades. In particular, over the last fifteen years, there has been a resurgence of research interest. On the one hand, a large number of works are reported each year in a diverse range of journals and conferences. On the other, new and unsolved problems keep arising from practical applications. A number of very successful invited sessions on the topic have been organised at recent IFAC SYSID symposia, IEEE CDC, ACC and other meetings, representing significant advances from theory to design and applications.

The idea of writing this book finds its origin in an invited session organised by the editors at SYSID 2009 in Saint-Malo (France). We thought it timely and highly beneficial to put together a monograph reflecting the wide variety of problems, methods, concepts and results in the field of block-oriented nonlinear systems. The present book summarises the state of the art in designing, analysing and implementing identification algorithms for this class of systems. It reports on the most recent and significant developments in the area and presents some new research directions. It is intended for a broad spectrum of readers from academia and industry, including researchers, graduate students, engineers and practitioners from various fields. The book provides a thorough theoretical grounding while also addressing the needs of practitioners.

We are grateful to all our colleagues from around the world who brilliantly contributed to this work, bringing together considerable knowledge from as wide a range of aspects of the area as possible. We would also like to thank Mr. Oliver Jackson and Ms. Charlotte Cross from Springer, UK, for their encouragement and help in editing the book. Special and warm thanks go to Vincent Van Assche for his valuable help in the preparation of the camera-ready copy.

Caen, France
Iowa City, IA, USA
June 2010

Fouad Giri
Er-Wei Bai

Contents

Part I: Block-oriented Nonlinear Models

1 Introduction to Block-oriented Nonlinear Systems . . . . . . 3
Er-Wei Bai, Fouad Giri

2 Nonlinear System Modelling and Analysis from the Volterra and Wiener Perspective . . . . . . 13
Martin Schetzen

Part II: Iterative and Overparameterization Methods

3 An Optimal Two-stage Identification Algorithm for Hammerstein–Wiener Nonlinear Systems . . . . . . 27
Er-Wei Bai

4 Compound Operator Decomposition and Its Application to Hammerstein and Wiener Systems . . . . . . 35
Jozef Vörös

5 Iterative Identification of Hammerstein Systems . . . . . . 53
Yun Liu, Er-Wei Bai

Part III: Stochastic Methods

6 Recursive Identification for Stochastic Hammerstein Systems . . . . . . 69
Han-Fu Chen

7 Wiener System Identification Using the Maximum Likelihood Method . . . . . . 89
Adrian Wills, Lennart Ljung

8 Parametric Versus Nonparametric Approach to Wiener Systems Identification . . . . . . 111
Grzegorz Mzyk

9 Identification of Block-oriented Systems: Nonparametric and Semiparametric Inference . . . . . . 127
M. Pawlak

10 Identification of Block-oriented Systems Using the Invariance Property . . . . . . 147
Martin Enqvist

Part IV: Frequency Methods

11 Frequency Domain Identification of Hammerstein Models . . . . . . 161
Er-Wei Bai

12 Frequency Identification of Nonparametric Wiener Systems . . . . . . 181
Fouad Giri, Youssef Rochdi, Jean-Baptiste Gning, Fatima-Zahra Chaoui

13 Identification of Wiener–Hammerstein Systems Using the Best Linear Approximation . . . . . . 209
Lieve Lauwers, Johan Schoukens

Part V: SVM, Subspace and Separable Least-squares

14 Subspace Identification of Hammerstein–Wiener Systems Operating in Closed-loop . . . . . . 229
Jan-Willem van Wingerden, Michel Verhaegen

15 NARX Identification of Hammerstein Systems Using Least-Squares Support Vector Machines . . . . . . 241
Ivan Goethals, Kristiaan Pelckmans, Tillmann Falck, Johan A.K. Suykens, Bart De Moor

16 Identification of Linear Systems with Hard Input Nonlinearities of Known Structure . . . . . . 259
Er-Wei Bai

Part VI: Blind Methods

17 Blind Maximum-likelihood Identification of Wiener and Hammerstein Nonlinear Block Structures . . . . . . 273
Laurent Vanbeylen, Rik Pintelon

18 A Blind Approach to Identification of Hammerstein Systems . . . . . . 293
Jiandong Wang, Akira Sano, Tongwen Chen, Biao Huang

19 A Blind Approach to the Hammerstein-Wiener Model Identification . . . . . . 313
Er-Wei Bai

Part VII: Decoupling Inputs and Bounded Error Methods

20 Decoupling the Linear and Nonlinear Parts in Hammerstein Model Identification . . . . . . 335
Er-Wei Bai

21 Hammerstein System Identification in Presence of Hard Memory Nonlinearities . . . . . . 347
Youssef Rochdi, Vincent Van Assche, Fatima-Zahra Chaoui, Fouad Giri

22 Bounded Error Identification of Hammerstein Systems with Backlash . . . . . . 367
Vito Cerone, Dario Piga, Diego Regruto

Part VIII: Application of Block-oriented Models

23 Block Structured Modelling in the Study of the Stretch Reflex . . . . . . 385
David T. Westwick

24 Application of Block-oriented System Identification to Modelling Paralysed Muscle Under Electrical Stimulation . . . . . . 403
Zhijun Cai, Er-Wei Bai, Richard K. Shield

Index . . . . . . 421

List of Contributors

E.-W. Bai, University of Iowa, Iowa City, IA, USA
Z. Cai, University of Iowa, Iowa City, IA, USA
V. Cerone, Politecnico di Torino, Torino, Italy
F.Z. Chaoui, University of Caen Basse-Normandie, Caen, France
H.F. Chen, Chinese Academy of Sciences, Beijing, China
T. Chen, University of Alberta, Edmonton, AB, Canada
B. De Moor, Katholieke Universiteit Leuven, Leuven, Belgium
M. Enqvist, Linköpings universitet, Linköping, Sweden
T. Falck, Katholieke Universiteit Leuven, Leuven, Belgium
F. Giri, University of Caen Basse-Normandie, Caen, France
J.B. Gning, Crouzet Automatismes, Valence, France
I. Goethals, ING Life Belgium, Etterbeek, Belgium
B. Huang, University of Alberta, Edmonton, AB, Canada
L. Lauwers, Vrije Universiteit Brussel, Brussels, Belgium
Y. Liu, University of Iowa, Iowa City, IA, USA
L. Ljung, Linköpings universitet, Linköping, Sweden
G. Mzyk, Wroclaw University of Technology, Wroclaw, Poland
M. Pawlak, University of Manitoba, Winnipeg, Canada
K. Pelckmans, Uppsala University, Uppsala, Sweden
D. Piga, Politecnico di Torino, Torino, Italy
R. Pintelon, Vrije Universiteit Brussel, Brussels, Belgium
D. Regruto, Politecnico di Torino, Torino, Italy
Y. Rochdi, University of Cadi Ayyad, Marrakech, Morocco
A. Sano, Keio University, Yokohama, Japan
M. Schetzen, Northeastern University, Boston, MA, USA
J. Schoukens, Vrije Universiteit Brussel, Brussels, Belgium
R.K. Shield, University of Iowa, Iowa City, IA, USA
J.A.K. Suykens, Katholieke Universiteit Leuven, Leuven, Belgium
V. Van Assche, University of Caen Basse-Normandie, Caen, France
J.W. van Wingerden, Delft University of Technology, Delft, The Netherlands
L. Vanbeylen, Vrije Universiteit Brussel, Brussels, Belgium
M. Verhaegen, Delft University of Technology, Delft, The Netherlands
J. Vörös, Slovak University of Technology, Bratislava, Slovakia
J. Wang, Peking University, Beijing, China
D.T. Westwick, University of Calgary, AB, Canada
A. Wills, University of Newcastle, Callaghan, Australia

Part I

Block-oriented Nonlinear Models

Chapter 1

Introduction to Block-oriented Nonlinear Systems
Er-Wei Bai and Fouad Giri

1.1 Block-oriented Nonlinear Systems

System identification refers to the experimental approach that consists of determining system models by fitting experimental data to a suitable model structure [14] in some optimal way. Linear model structures can be relied upon when the physical system remains in the vicinity of a nominal operating point, so that the linearity assumption is satisfied. When a wide range of operating modes is involved, the linearity assumption may not be valid and a nonlinear model structure becomes necessary to capture the system's (nonlinear) behaviour. In relatively simple cases, suitable nonlinear model structures are obtained using the mathematical modelling approach, which consists of describing the system phenomena using basic laws of physics, chemistry, etc. System identification methods may then be resorted to in order to assign suitable numerical values to the (unknown) model parameters. When the mathematical modelling approach is insufficient, system identification must rely on 'universal' black-box or grey-box nonlinear model structures. These include NARMAX models [9], multi-model representations [15], neuro-fuzzy models [3], Volterra series [19], nonparametric models [14] and others.

In the present monograph, the focus is on block-oriented nonlinear (BONL) models, which consist of interconnections of linear time-invariant (LTI) dynamic subsystems and static nonlinear elements (Figure 1.1). The linear subsystems may be parametric (transfer functions, state-space representations, FIR, IIR, ...) or nonparametric (impulse response, frequency response, ...). The nonlinear elements may in turn be parametric or nonparametric, memory or memoryless. Finally, the system components may be interconnected in different ways (series, parallel, feedback).

Fig. 1.1: Example of a BONL system consisting of LTI subsystems (L) and static nonlinearities (N)

This high flexibility provides these models with a remarkable ability to capture a large class of complex nonlinear systems, and motivates the great deal of interest paid to BONL models over the past fifteen years. It is worth noting that the model (linear and nonlinear) blocks may, and generally do, not correspond to physical components. Consequently, the connection points between blocks are generally artificial, i.e. they cannot be supposed to be accessible to measurement. The inaccessibility of such measurements, together with the system nonlinearities, makes BONL system identification a quite complex problem. Therefore, most currently available solutions concern relatively simple structures. The simplest and best known BONL structures are composed of just two blocks connected in series (Figures 1.2 and 1.3). The first one, the Hammerstein system, introduced in 1930 by the German mathematician A. Hammerstein [6], involves an input static nonlinear element in series with a dynamic LTI subsystem. The nonlinear element may account for actuator nonlinearities and other nonlinear effects that can be brought to the system input. Despite their simplicity, Hammerstein models have proved able to accurately describe a wide variety of nonlinear systems, e.g. chemical processes [5], electrically stimulated muscles [7], power amplifiers [12], electrical drives [1], thermal microsystems [21], physiological systems [4], sticky control valves [20], solid oxide fuel cells [10], and magneto-rheological dampers [22].
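To fix ideas, the following minimal simulation sketch (our own illustration; the nonlinearity f, the filter coefficients and all variable names are assumptions, not taken from the book) shows how a Hammerstein model generates its output: the input u first passes through a static nonlinearity f, and the resulting inner signal, which is not measurable in practice, then drives an LTI filter.

```python
# Minimal sketch of a Hammerstein model y(t) = G(q) f(u(t)):
# a static polynomial nonlinearity followed by a second-order IIR block.
import numpy as np
from scipy.signal import lfilter

def f(u):
    # example static nonlinearity (polynomial, chosen arbitrarily)
    return u + 0.5 * u**2 - 0.2 * u**3

# linear block G(q) = (q^-1 + 0.5 q^-2) / (1 - 1.5 q^-1 + 0.7 q^-2)
b = [0.0, 1.0, 0.5]       # numerator coefficients
a = [1.0, -1.5, 0.7]      # denominator coefficients (stable poles)

rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=1000)  # excitation signal
x = f(u)                               # inner signal: not accessible to measurement
y = lfilter(b, a, x)                   # measured (noise-free) output
```

Identification must recover both f and G from (u, y) alone, since the inner signal x is not observed; this is precisely what makes the problem nontrivial.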

Fig. 1.2: Hammerstein model (static nonlinearity NL followed by LTI block L)

Fig. 1.3: Wiener model (LTI block L followed by static nonlinearity NL)

The permutation of the linear and nonlinear elements in the Hammerstein model leads to what is commonly referred to as the Wiener model (Figure 1.3), as a model of this type was first studied in 1958 by N. Wiener [23]. In this model, the output nonlinear element may represent sensor nonlinearities as well as any nonlinear effects that can be brought to the system output. For instance, limit switch devices in mechanical systems and overflow valves can be modelled by output saturating nonlinearities. Moreover, the ability of Wiener models to capture complex nonlinear phenomena has been formally established. In this regard, it was shown that almost any nonlinear system can be approximated by a Wiener model with arbitrarily high accuracy [2]. This theoretical fact has been experimentally checked through several practical applications including e.g. chemical processes [24, 11], biological systems [8] and others. A series combination of a Hammerstein and a Wiener model immediately yields a new model structure called the Hammerstein–Wiener system (Figure 1.4). The inverse combination leads to what is referred to as the Wiener–Hammerstein structure (Figure 1.5). These new structures offer higher modelling capabilities. Clearly, the Hammerstein–Wiener model is more convenient when both actuator and sensor nonlinearities are present. It has also been successfully applied to modelling several physical processes, e.g. polymerase reactors [13], ionospheric processes [16], pH processes [18], and magnetospheric dynamics. The Wiener–Hammerstein model (Figure 1.5) also finds applications. Closed-loop model structures (e.g. Figure 1.6) are resorted to when feedback phenomena are involved. For instance, note that most industrial actuator valves are equipped with local feedback actions (commonly called positioners) to compensate Coulomb friction and other nonlinear effects. A model structure like that of Figure 1.6, where the nonlinearity is of the saturation type, may then be quite suitable.
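In compact operator notation (ours, not the book's; q denotes the shift operator and G(q) the transfer operator of an LTI block), the four series structures of Figures 1.2 to 1.5 read:

```latex
\begin{align*}
\text{Hammerstein (Fig.~1.2):}         \quad & y(t) = G(q)\, f(u(t)) \\
\text{Wiener (Fig.~1.3):}              \quad & y(t) = f\bigl(G(q)\, u(t)\bigr) \\
\text{Hammerstein--Wiener (Fig.~1.4):} \quad & y(t) = f_2\bigl(G(q)\, f_1(u(t))\bigr) \\
\text{Wiener--Hammerstein (Fig.~1.5):} \quad & y(t) = G_2(q)\, f\bigl(G_1(q)\, u(t)\bigr)
\end{align*}
```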

Fig. 1.4: Hammerstein–Wiener model (u → NL1 → L → NL2 → y)

Fig. 1.5: Wiener–Hammerstein model (u → L1 → NL → L2 → y)


Fig. 1.6: Example of feedback block-oriented system

Fig. 1.7: LFT representation of block-oriented systems with feedback actions

Fig. 1.8: Example of multichannel block-oriented nonlinear system

Interestingly, any block-oriented nonlinear system involving feedback elements may be given the so-called linear fractional transformation (LFT) representation of Figure 1.7, where the block L (resp. N) accounts for all linear (resp. nonlinear) operators. Parallel block-oriented models (Figure 1.8) are suitable for modelling systems with a multichannel topology, e.g. electric power distribution, communication networks, multicell parallel power converters, etc.


1.2 About This Book

1.2.1 Book Topics

The previous section has emphasised the large diversity of BONL systems. The present book is not intended to be an encyclopedic monograph surveying all relevant identification methods; it rather aims at pointing out the most recent and significant approaches, i.e. those backed by formal (convergence, consistency) analyses and/or practical applications. As a matter of fact, the most complete and successful solutions concern Hammerstein and Wiener systems or their simple series and parallel combinations. The present book makes a rigorous and illustrative presentation of these solutions. It consists of 24 chapters organised in eight homogeneous parts. Each chapter includes an introduction, a conclusion and its own reference list. In addition to the present chapter, Part I contains a second chapter discussing some theoretical considerations about nonlinear system modelling. In particular, Wiener theory, which is an orthogonalisation of the Volterra series when the system input is a Gaussian random process, is discussed. It allows the determination of optimum systems from a desired system response. The limitation of the Wiener theory, that the input must be a Gaussian random process and the error criterion must be the mean-square, is obviated by use of the gate functions.

Parts II to VII together make a survey of the most recent and significant identification methods for various BONL systems. First, iterative and over-parametrisation methods are presented in Part II. In Chapter 3, Hammerstein–Wiener system identification is dealt with using an optimal two-stage identification method combining a least squares parameter estimation and a singular value decomposition of two matrices whose dimensions are fixed and do not increase as the number of data points increases. The algorithm is shown to be convergent in the absence of noise and convergent with probability one in the presence of white noise. In Chapter 4, the technique of compound mapping decomposition, which reduces the complexity of system description, is applied to general Hammerstein and Wiener types. In the case of Hammerstein–Wiener systems with piecewise-affine nonlinearities, several different decompositions can be performed to simplify the system description. Chapter 5 presents a normalised iterative identification algorithm for Hammerstein systems containing common nonsmooth piecewise-affine nonlinearities with saturation and preload characteristics. It is shown that the algorithms converge in one iteration step when the number of sample data is large.

Stochastic identification methods are considered in Part III. Hammerstein systems are dealt with in Chapter 6, considering ARX and ARMAX linear subsystems and parametric, nonparametric and piecewise-affine nonlinearities. Recursive parameter estimation algorithms are designed to identify the unknown parameters and points of the (nonparametric) nonlinearity. All recursive estimators are shown to converge to the true values with probability one. Stochastic Wiener system


identification is dealt with in Chapter 7 using output error and maximum likelihood algorithms. The disturbances are allowed to be coloured, making possible blind estimation of Wiener models. This case is accommodated by using the Expectation-Maximisation algorithm in combination with particle methods. Alternative two-stage, combined parametric-nonparametric algorithms are presented in Chapter 8. These involve estimation of the interconnecting signal and decomposition of the identification task into two subroutines. Chapter 9 presents a statistical framework for identification of parametric and nonparametric block-oriented systems, possibly including modelling errors. The developed identification approach makes classical nonparametric estimates amenable to incorporating constraints and able to overcome the celebrated curse of dimensionality and system complexity. The proposed methodology is illustrated by examining semiparametric Hammerstein and parallel systems. Chapter 10 gives an introduction to the invariance property and explains why it is so useful for identification of block-oriented systems. For example, Bussgang's classic theorem shows that this property holds for a static nonlinearity with a Gaussian input signal.

Frequency domain methods are presented in Part IV. Hammerstein models with nonparametric nonlinearity are dealt with in Chapter 11. The identification method is based on the fundamental frequency and, therefore, no a priori information about the nonlinearity structure is required. Wiener systems with nonparametric nonlinearities are dealt with in Chapter 12. The identification method relies on the investigation of generalised Lissajous curves. In Chapter 13, parametric Wiener–Hammerstein system identification is addressed using the concept of the best linear approximation. The identification method makes use of the frequency response function of the nonlinear system as a function of the root-mean-square (rms) value and the colouring of the input signal.

Identification methods based on Support Vector Machines (SVM), subspace and least-squares techniques are presented in Part V. Parameter identification of multiple-input multiple-output parametric Hammerstein and Wiener systems is dealt with in Chapter 14. The identification method combines the predictor-based subspace method with ideas from linear algebra and estimation theory. An alternative solution based on Least Squares Support Vector Machines is presented in Chapter 15. The method is essentially based on Bai's overparametrisation technique of Chapter 3, and combines this with a regularisation framework and a suitable model description which fits nicely within the LS-SVM framework with primal and dual model representations. Hammerstein systems with known-structure input nonlinearities are studied in Chapter 16. The case of input nonlinearities parametrised by one parameter is dealt with using a deterministic identification method based on the idea of separable least squares. The identification problem is shown to be equivalent to a one-dimensional minimisation problem. For a general input nonlinearity, a correlation-analysis-based identification algorithm is used.

Blind identification methods are described in Part VI. In Chapter 17, Wiener and Hammerstein models are dealt with assuming that the unobserved input is white


Gaussian noise, that the linear block is minimum phase, that the static nonlinearity is invertible and that the output measurements are noise-free. A Gaussian maximum-likelihood estimator is first constructed. Then, a Gauss–Newton iterative scheme is used to minimise the derived maximum-likelihood cost function, and asymptotic uncertainty bounds are computed. Hammerstein systems, with nonparametric static and backlash nonlinearities, are dealt with in Chapter 18. The (blind) identification method involves piecewise constant inputs and subspace direct equalisation. Hammerstein–Wiener model identification is studied in Chapter 19. The proposed (blind) identification approach requires no prior structural knowledge of the input nonlinearity and no white noise assumption on the input.

Decoupling input methods are presented in Part VII. The main result of Chapter 20 is to show that the linear subsystem can be decoupled from the nonlinear element in Hammerstein model identification. Therefore, identification of the linear part of a Hammerstein model becomes a linear problem and accordingly enjoys the same convergence and consistency results as if the unknown nonlinearity were absent. A similar result is presented in Chapter 21 for Hammerstein systems with backlash nonlinearity. Bounded error identification approaches are illustrated in Chapter 22 for Hammerstein systems with backlash nonlinearities. The identification method is a two-stage procedure using suitable tools like pseudo-random binary input sequences (PRBS), relaxation techniques and linear matrix inequalities.

The last part of the book illustrates the modelling capability of BONL models through biomedical and physiological applications. Chapter 23 deals with modelling and identification of physiological joint dynamics using BONL models. The focus is on the stretch reflex, i.e. the relationship between the angular velocity of a joint and the resulting muscle activity. In this context, the Hammerstein structure and the parallel block structure turn out to be quite appropriate. Different identification techniques are used, i.e. iterative, correlation and separable least-squares methods. In Chapter 24, the Wiener–Hammerstein structure is used to model a paralysed muscle under electrical stimulation. An effective identification method is developed specifically for this system.

1.2.2 Who Can Use This Book

This book will be of interest to all those who are concerned with nonlinear system identification, e.g. senior undergraduate and graduate students, practitioners and theorists in industry and academic institutions in areas related to system identification, control and signal processing. The book can be used by active researchers and newcomers to the area of nonlinear system identification. It is also useful for industrial practitioners and control technology designers. For students and newcomers, the main prerequisite is to have previously followed a postgraduate course on linear system identification or equivalent courses.


References

1. Balestrino, A., Landi, A., Ould-Zmirli, M., Sani, L.: Automatic nonlinear auto-tuning method for Hammerstein modeling of electrical drives. IEEE Transactions on Industrial Electronics 48, 645–655 (2001)
2. Boyd, S., Chua, L.O.: Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems 32, 1150–1161 (1985)
3. Brown, M., Harris, H.C.: Neurofuzzy adaptive modelling and control. Prentice Hall, New Jersey (1994)
4. Dempsey, E., Westwick, D.: Identification of Hammerstein models with cubic spline nonlinearities. IEEE Transactions on Biomedical Engineering 51, 237–245 (2004)
5. Eskinat, E., Johnson, S., Luyben, W.L.: Use of Hammerstein models in identification of nonlinear systems. AIChE Journal 37, 255–268 (1991)
6. Hammerstein, A.: Nichtlineare Integralgleichungen nebst Anwendungen. Acta Mathematica 54, 117–176 (1930)
7. Hunt, K.J., Munih, M., Donaldson, N.D., Barr, F.M.D.: Investigation of the Hammerstein hypothesis in the modeling of electrically stimulated muscle. IEEE Transactions on Biomedical Engineering 45, 998–1009 (1998)
8. Hunter, I.W., Korenberg, M.J.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics 55, 135–144 (1986)
9. Johansen, T.A., Foss, B.A.: Constructing NARMAX models using ARMAX models. International Journal of Control 58, 1125–1153 (1993)
10. Jurado, F.: A method for the identification of solid oxide fuel cells using a Hammerstein model. Journal of Power Sources 154, 145–152 (2006)
11. Kalafatis, A., Wang, L., Cluett, W.R.: Identification of time-varying pH processes using sinusoidal signals. Automatica 41, 685–691 (2005)
12. Kim, J., Konstantinou, K.: Digital predistortion of wideband signals based on power amplifier model with memory. IEE Electronics Letters 37, 1417–1418 (2001)
13. Lee, Y.J., Sung, S.W., Park, S., Park, S.: Input test signal design and parameter estimation method for the Hammerstein–Wiener processes. Industrial and Engineering Chemistry Research 43, 7521–7530 (2004)
14. Ljung, L.: System identification: Theory for the user. Prentice-Hall, Englewood Cliffs (1999)
15. Murray-Smith, R., Johansen, T.A.: Multiple model approaches to modelling and control. Taylor & Francis, London (1997)
16. Palanthandalam-Madapusi, H.J., Ridley, A.J., Bernstein, D.S.: Identification and prediction of ionospheric dynamics using a Hammerstein–Wiener model with radial basis functions. In: American Control Conference, Portland, OR, USA, pp. 5052–5057 (2005)
17. Palanthandalam-Madapusi, H.J., Bernstein, D.S., Ridley, A.J.: Space weather forecasting: Identifying periodically switching block-structured models for predicting magnetic-field fluctuations. IEEE Control Systems Magazine 27, 109–123 (2007)
18. Park, H.C., Sung, S.W., Lee, J.: Modeling of Hammerstein–Wiener processes with special input test signals. Industrial and Engineering Chemistry Research 45, 1029–1038 (2006)
19. Schetzen, M.: The Volterra and Wiener theories of non-linear systems. Krieger Publishing Co. (1980); reprint edition with additional material, Malabar, Fla
20. Srinivasan, R., Rengaswamy, R., Narasimhan, S., Miller, R.: Control loop performance assessment - Hammerstein model approach for stiction diagnosis. Industrial and Engineering Chemistry Research 44, 6719–6728 (2005)


21. Sung, S.: System identification method for Hammerstein processes. Industrial and Engineering Chemistry Research 41, 4295–4302 (2002)
22. Wang, J., Sano, A., Chen, T., Huang, B.: Identification of Hammerstein systems without explicit parameterization of nonlinearity. International Journal of Control 82, 937–952 (2009)
23. Wiener, N.: Nonlinear problems in random theory. Wiley, New York (1958)
24. Zhu, Y.: Distillation column identification for control using Wiener model. In: American Control Conference, San Diego, California, vol. 5, pp. 3462–3466 (1999)

Chapter 2

Nonlinear System Modelling and Analysis from the Volterra and Wiener Perspective

Martin Schetzen

2.1 Introduction

A fundamental activity of modern science is the development and analysis of models. A scientific model is just a representation of a phenomenon. It is desired because it can be used to obtain new insights concerning a phenomenon and also to predict the outcome of experiments involving the phenomenon [5]. System theory, which is the theory of models, is thus fundamental to science. An important problem in system theory is the development of nonlinear models of phenomena. The Wiener theory of nonlinear systems is surveyed in this chapter. This survey will begin with a basic discussion of the Volterra theory, which lies at the base of the Wiener theory. Some of the basic theoretical extensions to the Volterra theory will be outlined, since they illustrate the broad usefulness of the Volterra series in the modelling of nonlinear phenomena and have been the basis of a number of practical applications of the Volterra theory.

2.2 General System Considerations

Systems are classified as being either autonomous or nonautonomous. An autonomous system, such as the solar system, is one in which there is no external input. A nonautonomous system is one in which there is an external input which results in a system output. The discussion in this chapter will be limited to nonautonomous systems. Further, we limit our discussion to nonexplosive systems, which are systems for which bounded inputs result only in bounded outputs. This condition is often referred to as the BIBO stability criterion. Note that a linear time-invariant (LTI) system is unstable in accordance with this criterion if there exists just one bounded input for which the output is unbounded. For example, a causal LTI system with the impulse response $\frac{1}{1+t}\,u(t)$, in which $u(t)$ is the unit step function, is BIBO


unstable since its response to a step is unbounded. We note from this example that an LTI system cannot have infinite memory if it is to be nonexplosive, since, as the age of a past input increases, its contribution to the present value of the system output must decrease. This is not necessarily so for systems which are not LTI. In fact, many physical nonexplosive systems do have infinite memory. Generally, systems with more than one state of stable equilibrium, and which can be switched from one stable state to another by the input, possess infinite memory. A simple example of a system in this class is the fuse. Our discussion will not apply to the class of systems with infinite memory, and we consider systems with only one stable state. For convenience, our discussion will centre on time-invariant systems in which the input and output are functions of time. For physicality, we thus will require the systems to be causal although, with a slight change of viewpoint, the theory also can be applied to noncausal systems such as occur with mechanical systems in which an input about $x = 0$ can have effects on the system over both positive and negative values of $x$. The Volterra and Wiener class of nonlinear systems which we shall discuss thus is restricted to non-infinite memory nonautonomous systems which are not explosive.
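To make the instability in the example above concrete (a one-line check that follows immediately from the definitions, rather than from the original text), note that the response of the system with impulse response $\frac{1}{1+t}\,u(t)$ to the unit step is

$$y(t) = \int_0^t \frac{1}{1+\tau}\, d\tau = \ln(1+t) \longrightarrow \infty \quad \text{as } t \to \infty,$$

so a bounded input produces an unbounded output even though the impulse response itself decays to zero.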

2.3 The Volterra Series

LTI system theory is the basis for modelling many areas such as circuits, dynamics, and electromagnetics, and it has been very successful in enabling the development of important practical applications as well as deep insights into these areas [4]. Since many practical nonlinear systems appear to be well approximated as a linear system for inputs with sufficiently small amplitude, Wiener considered representing the output, $y(t)$, of a nonlinear system as a functional series expansion of its input, $x(t)$. The type of expansion he chose was the Volterra series:

$$y(t) = h_0 + \sum_{n=1}^{\infty} H_n[x(t)] \qquad (2.1)$$

in which

$$H_n[x(t)] = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} h_n(\tau_1, \ldots, \tau_n)\, x(t - \tau_1) \cdots x(t - \tau_n)\, d\tau_1 \cdots d\tau_n. \qquad (2.2)$$

In these equations, $H_n$ is called a Volterra operator and, in the $n$th-order integral, $h_n(\tau_1, \ldots, \tau_n)$ is called the Volterra kernel of the Volterra operator. This type of expansion is named in recognition of Volterra's contributions [10]. The Volterra series can be viewed as a power series with memory. This can be seen by changing the input by a gain factor, $c$, so that the new input is $cx(t)$. The new series for the system output is then seen to be

$$y(t) = h_0 + \sum_{n=1}^{\infty} H_n[cx(t)] = h_0 + \sum_{n=1}^{\infty} c^n H_n[x(t)] \qquad (2.3)$$


which is a power series in the amplitude factor, $c$. It is a series with memory since the functionals, $H_n[x(t)]$, are convolutions.
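To illustrate (2.1)-(2.3) numerically, here is a minimal discrete-time sketch (not from the original text; the kernels `h1`, `h2` and the memory length are toy assumptions) that evaluates a Volterra series truncated at second order and checks the power-series-with-memory property of (2.3):

```python
import numpy as np

def volterra2(x, h0, h1, h2):
    """Evaluate a discrete-time Volterra series truncated at second order:
    y[k] = h0 + sum_i h1[i] x[k-i] + sum_{i,j} h2[i,j] x[k-i] x[k-j]."""
    M = len(h1)                                  # kernel memory length
    y = np.full(len(x), h0, dtype=float)
    for k in range(M - 1, len(x)):
        past = x[k - M + 1:k + 1][::-1]          # x[k], x[k-1], ..., x[k-M+1]
        y[k] += h1 @ past + past @ h2 @ past
    return y

# Toy kernels with exponentially decaying memory (illustrative only).
M = 8
h1 = 0.5 ** np.arange(M)
h2 = 0.1 * np.outer(h1, h1)
x = np.sin(0.3 * np.arange(200))
y = volterra2(x, h0=0.0, h1=h1, h2=h2)

# Per (2.3), scaling the input by c scales the nth-order term by c**n,
# so the response is not homogeneous once h2 is nonzero.
c = 2.0
assert not np.allclose(volterra2(c * x, 0.0, h1, h2), c * y)
```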

2.4 Applications of the Volterra Series

Many important nonlinear problems have been successfully analysed by use of the Volterra series. The first application of the Volterra series was by Wiener, in which he analysed a nonlinear RLC circuit. This was accomplished by making use of its power series property to expand the solution of the differential equation describing the circuit [11]. Methods were then further developed for the measurement and synthesis of the Volterra series [6]. Also, by making use of its power series property, the Volterra series of nonlinear feedback systems and $p$th-order inverses of nonlinear systems were determined [6]. A $p$th-order inverse is one for which the tandem connection of a nonlinear system and its $p$th-order inverse is a system in which the second- through $p$th-order nonlinearities are zero. The determination of the $p$th-order inverse of a given system requires only the first $p$ Volterra kernels of the given system. This is important since, in practice, only the first $p$ Volterra kernels of a given system are known. One physical application of the $p$th-order inverse concerns the undesirable distortion in an audio system due to various nonlinearities, such as those of the loudspeaker and of the air through which the sound wave travels; the distortion can be reduced by placing a $p$th-order inverse before the loudspeaker. One can reduce the distortion introduced by audio or video tape recorders in a similar manner. Also, the suppression of the effect of slight transceiver nonlinearities in broadcast satellites can be accomplished, without disturbing the satellite, by installing the $p$th-order inverse of the nonlinearity at the output of the ground station transmitter or at the input of the ground station receiver. These are examples of predistorting a signal to combat the distortion introduced by the system through which the signal will pass. In effect, the predistortion of a signal using a $p$th-order inverse as a pre-inverse is similar to the coding of a signal for transmission through a corrupting medium, while the use of a $p$th-order inverse as a post-inverse is similar to the decoding of a signal which has been corrupted in its transmission through a medium. Another important application of the Volterra series is the analysis of nonlinear networks. The power series property of the Volterra series was used to develop a multilinear analysis technique for the analysis of networks which are slightly nonlinear [6]. Instead of expanding the network differential equations in a Volterra series, a general multilinear model of the network is derived directly from the network itself, from which the effect of the network nonlinearities on each branch current and node voltage is made evident. Initial conditions are incorporated by replacing them with equivalent sources. In this model, the network nonlinearities are equivalent to known voltage and current sources embedded in a linear network. In consequence, all the techniques and insights obtained from linear circuit theory can be used. An important aspect of this procedure, as opposed to the usual methods, is that multilinear network theory maintains the network topology so that the effect of a given


nonlinear element on the network can be studied. It is from this that a theory for the synthesis of nonlinear networks can be developed. For example, one type of synthesis problem which occurs in electronics is the linearisation of a given nonlinear network. From knowledge of the specific effect of a given nonlinear element on the network, one can determine the placement of other nonlinear elements in the network required to cancel, or at least reduce, the undesirable effects of the given nonlinear element. An illustration of this linearisation technique is the classic push-pull circuit. Push-pull circuits are often used in electronic power amplifiers, in which a large amount of distortion is introduced by operating a transistor in class B. The distortion is significantly reduced by introducing a second similar transistor and connecting it in a push-pull arrangement. In this manner, the power amplifier distortion is reduced significantly by the introduction of a second nonlinear element. Since multilinear analysis only requires the sequential analysis of the same linear network, the analysis can be done on a computer by using any of a number of software packages available for the analysis of linear networks. A computer program in which the software package is called for each sequential term would facilitate the analysis and enable one to obtain a rather high-order Volterra representation of a nonlinear network. It also turns out that the response of a given network to a particular input can be expressed as the linear combination of a particular set of waveforms which are invariant waveforms of the network response. The network nonlinearities and source amplitudes affect only the specific linear combination of these waveforms but not the waveforms themselves. These invariant waveforms thus are the building blocks of the nonlinear network response. With them, it becomes evident why the convergence of the Volterra series often is not a uniform convergence. Also, the effect of the network nonlinearities and the source amplitudes upon the network response can be studied in terms of these invariant waveforms. In this manner, multilinear theory, which is based on the Volterra series, allows a major step forward to be taken in the theory of nonlinear networks and opens the possibility of greater advances to be made. The ability to obtain practical insights into a system by its Volterra series analysis is well illustrated by the analysis of the laser diode. Useful insight into its operation was obtained from its Volterra model [8, 13]. From this, an analysis of the diode intermodulation distortion (IMD) and its sources was obtained, from which methods for reducing the distortion were derived [9]. In addition, an understanding of methods for optimising the diode operation was obtained from the Volterra model [7]. The structure of a system's block diagram determines the form of its Volterra kernels, which can be exploited to determine general properties of the system. A good example is the class of systems composed of a linear system followed by a nonlinear no-memory system, followed by a linear system (LNL systems), which appear to be a good approximation of many biological systems. Although I have discussed systems in which the input and output are functions of time, this need not be so. For example, in applications to mechanical systems such as the static bending of a beam due to loading, the input and output would be functions only of position. Note that for such applications, the system Volterra


kernels are not constrained by the requirements of causality as they are for systems in which the input and output are functions of time.
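As a small numerical sketch of the predistortion idea described above (the nonlinearity, its coefficient and the signal are toy assumptions, and the system is memoryless for brevity; a genuine $p$th-order inverse would involve kernels with memory), a second-order pre-inverse cancels the second-order distortion term and leaves only third- and higher-order residuals:

```python
import numpy as np

# Toy memoryless nonlinearity standing in for, e.g., a loudspeaker:
# y = x + a*x**2 (first two Volterra terms, no memory).
a = 0.2
system = lambda x: x + a * x**2

# Its 2nd-order pre-inverse: w = x - a*x**2. Cascading gives
# system(pre_inverse(x)) = x - 2*a**2*x**3 + a**3*x**4, so the
# 2nd-order distortion is cancelled; only 3rd order and higher remain.
pre_inverse = lambda x: x - a * x**2

x = 0.1 * np.sin(np.linspace(0.0, 20.0, 1000))   # small-amplitude signal
direct = system(x)                               # distorted output
predistorted = system(pre_inverse(x))            # predistorted chain

print(np.max(np.abs(direct - x)))        # ~ a*|x|^2 : 2nd-order error
print(np.max(np.abs(predistorted - x)))  # ~ 2*a^2*|x|^3 : much smaller
```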

2.5 The Wiener Theory

As a consequence of its power series character, there are some limitations associated with the application of the Volterra series to nonlinear problems. One major problem is the convergence of the Volterra series. In function theory, analytic functions are those that are the sums of a Taylor series. Similarly, systems which can be represented by a convergent Volterra series are called analytic systems. The important successes of the Volterra series in practical applications make the question of the class of nonlinear systems that can be represented by a set of Volterra functionals an important one. That is, what is the class of nonlinear systems for which the Volterra operators are complete? Since the Volterra series is a functional power series, we first look to the approximation of functions. From the Weierstrass approximation theorem, we know that the functions $1, x, x^2, x^3, \ldots$ form a complete set of functions in every closed interval. As a result, any function which is continuous in a closed interval can be approximated uniformly by polynomials in the interval. Fréchet extended this result concerning functions to functionals [1]. He proved that any continuous functional can be represented by a Volterra series whose convergence is uniform in all compact sets of continuous functions. A functional, $y(t) = H[x(t)]$, is said to be continuous if the values $y_1(t) = H[x_1(t)]$ and $y_2(t) = H[x_2(t)]$ are close whenever the corresponding input functions, $x_1(t)$ and $x_2(t)$, are close. Analogously, then, the Fréchet theorem is a statement that the set of Volterra functionals, $H_n[x(t)]$, is complete. However, since many systems are not analytic, the Volterra series expansion of a system response may not converge. In the theory of functions, the convergence problem associated with a power series representation of a function can be circumvented by using a different convergence criterion. One common method of accomplishing this is to use an $L_2$-norm as the measure of the error and to expand the function in an orthogonal series. One of Wiener's basic innovations was to extend this notion of orthogonality of functions to functionals. In this manner, he circumvented the convergence problem in system modelling associated with the Volterra series representation of a nonlinear system. First, Wiener chose Brownian motion as the input for the characterisation of a nonlinear system [12]. To understand why he made this choice, first consider the characterisation of a linear system. It is only as a consequence of the superposition property of a linear system that any one waveform of a wide class of waveforms can be used as an input probe to characterise its input-output mapping. Since superposition does not hold for nonlinear systems, Wiener realised that the input probe required to characterise a nonlinear system would have to be a waveform which, over a time interval, eventually approximates any given waveform. That is, since the response of a nonlinear system to some arbitrary input cannot be determined by superposition of its responses to some restricted class of input waveforms, the required input probe should, as time evolves, approximate all possible input waveforms. The


waveform representation of a Brownian particle's wanderings was just such a waveform. To represent a nonlinear operator with a Brownian input, Wiener used a Stieltjes form of the Volterra functional, which is equivalent to our familiar Riemann form of the Volterra functionals with the input, $x(t)$, being a white Gaussian time function.¹ With these, he formed an orthogonal set of functionals for the analysis of nonlinear systems. He called the orthogonal set he formed G-functionals. A time function, $x(t)$, is said to be white if its power density spectrum is a constant, $\phi_{xx}(j\omega) = A$, for which its autocorrelation function is $\phi_{xx}(\tau) = \overline{x(t)x(t+\tau)} = A\delta(\tau)$, in which the overbar indicates the time average and $\delta(\tau)$ is the unit impulse. A white time function is not a physical waveform since its total average power is infinite. However, it is an accurate idealisation of a physical waveform for which the power density spectrum is flat over a band of frequencies considerably wider than the bandwidth of the system to which it is being applied as an input. This idealisation results in great analytical simplification.

¹ Wiener expressed the Volterra operators as Stieltjes integrals with the input being a member of an ensemble of Brownian waveforms. Since Brownian motion is differentiable almost nowhere, the Riemann form of these integrals does not formally exist. However, the statistical properties of the Volterra operators expressed in the Riemann form with a white Gaussian input are identical with those of the Volterra operators expressed in Stieltjes form with the input being a Brownian waveform. For convenience, we use the Riemann form with a white Gaussian input.

2.6 The Wiener G-functionals

As I stated above, Wiener called his orthogonal set of functionals G-functionals since they are orthogonal with respect to a white Gaussian input. That is, he formed the G-functionals $G_p[k_p; x(t)]$, in which $k_p$ is the kernel and $x(t)$ is the input of the $p$th-order G-functional, to satisfy the condition

$$\overline{G_p[k_p; x(t)]\, G_q[k_q; x(t)]} = 0, \quad p \neq q, \qquad (2.4)$$

when $x(t)$ is from a white Gaussian process with the power density $\phi_{xx}(j\omega) = A$. A $p$th-order G-functional, $G_p[k_p; x(t)]$, is the sum of Volterra functionals of orders less than or equal to $p$. The form of a $p$th-degree G-functional is

$$G_p[k_p; x(t)] = \sum_{n=0}^{p} K_{n(p)}[x(t)]. \qquad (2.5)$$

All kernels of order less than $p$ are determined by the leading kernel, $k_p$, to satisfy the orthogonality condition. A $p$th-degree G-functional, $G_p[k_p; x(t)]$, thus is the sum of Volterra functionals $K_{n(p)}$ of orders less than or equal to $p$ which are specified only by the leading Volterra kernel, $k_p$. The first few G-functionals are:

$$G_0[k_0; x(t)] = k_0, \qquad (2.6)$$

$$G_1[k_1; x(t)] = \int_{-\infty}^{+\infty} k_1(\tau_1)\, x(t - \tau_1)\, d\tau_1, \qquad (2.7)$$

$$G_2[k_2; x(t)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} k_2(\tau_1, \tau_2)\, x(t - \tau_1)\, x(t - \tau_2)\, d\tau_1\, d\tau_2 + k_{0(2)}, \qquad (2.8)$$

in which

$$k_{0(2)} = -A \int_{-\infty}^{+\infty} k_2(\tau_1, \tau_1)\, d\tau_1, \qquad (2.9)$$

and

$$G_3[k_3; x(t)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} k_3(\tau_1, \tau_2, \tau_3)\, x(t - \tau_1)\, x(t - \tau_2)\, x(t - \tau_3)\, d\tau_1\, d\tau_2\, d\tau_3 + \int_{-\infty}^{+\infty} k_{1(3)}(\tau_1)\, x(t - \tau_1)\, d\tau_1, \qquad (2.10)$$

in which

$$k_{1(3)}(\tau_1) = -3A \int_{-\infty}^{+\infty} k_3(\tau_1, \tau_2, \tau_2)\, d\tau_2. \qquad (2.11)$$
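The orthogonality condition (2.4) can be checked numerically. The following is a minimal discrete-time surrogate (sums in place of the integrals of (2.7)-(2.9); the kernels `k1`, `k2` and the memory length are toy assumptions) showing that the correction term $k_{0(2)}$ makes $G_2$ zero-mean and that $G_1$ and $G_2$ are orthogonal under a white Gaussian input:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, A = 200_000, 6, 1.0            # samples, kernel memory, noise power
x = rng.normal(0.0, np.sqrt(A), N)   # discrete white Gaussian input

k1 = 0.6 ** np.arange(M)             # toy leading kernels
k2 = 0.3 * np.outer(k1, k1)

# Rows of X hold the input past: x[k], x[k-1], ..., x[k-M+1].
X = np.lib.stride_tricks.sliding_window_view(x, M)[:, ::-1]
G1 = X @ k1                                                   # (2.7)
G2 = np.einsum('ki,ij,kj->k', X, k2, X) - A * np.trace(k2)    # (2.8)-(2.9)

print(np.mean(G2))       # ~0: the k0(2) term removes the mean (G0, G2 orthogonal)
print(np.mean(G1 * G2))  # ~0: odd Gaussian moments vanish (G1, G2 orthogonal)
```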

2.7 System Modelling with the G-functionals

In terms of the G-functionals, the $p$th-order model of a nonlinear system is

$$y_p(t) = \sum_{n=0}^{p} G_n[k_n; x(t)]. \qquad (2.12)$$

The kernels, $k_n$, are called the Wiener kernels of the nonlinear system. A practical method was developed [2] by which the Wiener kernels can be determined by crosscorrelating the system response, $y(t)$, with the white Gaussian input, $x(t)$, as

$$k_n(\tau_1, \tau_2, \ldots, \tau_n) = \begin{cases} \dfrac{1}{n!\,A^n}\, \overline{y(t)\, x(t - \tau_1)\, x(t - \tau_2) \cdots x(t - \tau_n)}, & \tau_i \geq 0,\ i = 1, 2, \ldots, n,\ \ n = 0, 1, 2, \ldots, p, \\ 0, & \text{otherwise}. \end{cases} \qquad (2.13)$$

In the $L_2$-norm, this representation results in the optimum $p$th-order model of the given system. Note that it is optimum only for the white Gaussian input. Once the optimum $p$th-order model of a system has been determined, the corresponding $p$th-order Volterra series can be obtained simply by summing the Volterra kernels with the same order of each orthogonal functional. Wiener originally showed that his set of G-functionals is complete in the class of nonexplosive nonlinear systems whose memory is not infinite, which I call the Wiener class of systems [6]. By nonexplosive in this context I mean systems for which the output has a finite mean-square value when the input is from a white Gaussian process. Although infinite memory linear systems are not physical, there is


a large class of infinite memory nonlinear systems which are physical, such as a fuse, which never forgets whether the current through it ever exceeded its rating. Systems which switch their states are generalisations of the fuse and so are infinite memory systems which are not members of the Wiener class of systems. Note that the model obtained is the optimum system, so that the mean-square error obtained with a white Gaussian input with $\phi_{xx}(j\omega) = A$ will not be smaller for any other system of the Wiener class. The corresponding $p$th-order Volterra series can be obtained simply by summing the Volterra kernels with the same order of each orthogonal functional. This is the optimum $p$th-order Volterra model of the actual system for the given white Gaussian input, even though the system may not be representable by a convergent Volterra series. In practical applications, it is desirable to model a given system with an input with the same spectrum as that used in normal operation. For this, I have extended the G-functional theory to non-white Gaussian inputs whose spectra are factorable, and also to non-Gaussian inputs for which there exists a one-to-one map to a Gaussian process. These extensions, as well as other concepts, are detailed in [6]. With these extensions, optimum Volterra models of a given system can also be determined for these classes of inputs. Optimum models for any input will be discussed below.
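As a numerical sketch of the crosscorrelation formula (2.13) (discrete time, with time averages over a finite record; the system being probed is a hypothetical toy cascade, and only the zeroth- and first-order kernels are computed):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, A = 500_000, 8, 1.0
x = rng.normal(0.0, np.sqrt(A), N)     # white Gaussian probe, power A

# System to be identified -- a toy cascade (linear filter, then squarer),
# assumed here purely for illustration.
g = 0.7 ** np.arange(M)
v = np.convolve(x, g)[:N]
y = v + 0.5 * v**2

# Discrete-time form of (2.13): crosscorrelate response and probe.
k0 = y.mean()                                            # n = 0
k1 = np.array([np.mean(y[M:] * x[M - t:N - t])           # n = 1
               for t in range(M)]) / A

print(np.round(k1, 3))   # approaches g: the first-order Wiener kernel
# Higher-order kernels follow the same pattern (divide by n! * A**n),
# with care needed at diagonal points tau_i = tau_j.
```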

2.8 General Wiener Model

From a study of the Wiener theory, it can be shown that any system of the Wiener class can be represented by the model shown in Figure 2.1.

Fig. 2.1: Wiener model

Section A of the model is a single-input multi-output linear system. Section B is a multi-input multi-output no-memory system composed of just multipliers and adders. The single output of Section C is just a linear combination of its inputs. Sections A and B are the same for all systems of the Wiener class. The specific system being modelled is determined only by the amplifier gains in Section C. The outputs of Section A are the coefficients of a representation of the past of the system input. The nonlinear operations on the past of the input are performed in Section B. In a practical model, Sections A and B are constructed with only a finite number of outputs. The outputs of Section A then are the coefficients of what the model can represent of the past of the system input, and the outputs of Section B then are coordinates of the class of polynomial nonlinear operations that the model can represent. For an input from a Gaussian process, the outputs of Section A are jointly Gaussian and statistically independent, so that the outputs of Section B can be made

linearly independent, which enables the determination of the amplifier gains in Section C individually from certain simple averages. However, if the input, $x(t)$, is not Gaussian, then the outputs of Section A are not jointly Gaussian and their joint distribution is extremely difficult to determine. The determination of a structure for Section B for which its outputs are linearly independent is then not possible. In such cases, one technique to determine the optimum amplifier gains in Section C is a surface search technique. If a surface search technique is used, the optimum system for any convex error criterion can be determined. However, another procedure, for inputs of any class, is to use the Gate function model, with which optimum systems even for certain non-convex error criteria can be determined.

2.9 The Gate Function Model

If the input, $x(t)$, is not Gaussian or the desired error criterion is not the mean-square criterion, a practical approach is obtained by using Gate functions. In a Gate function model, Section B is composed of Gate functions, for which a given output of Section B is non-zero only if the output amplitudes of Section A are in a given multidimensional cell. As I've shown [6], a great advantage of the Gate function model is that the optimum amplifier gains can be determined individually for any weighted function of any error criterion, even many non-convex ones. However, the Gate function model usually will require more amplifier gains in Section C than required for the Wiener model with the same modelling error. This is the price that must be paid for the added generality and flexibility which is obtained with the Gate function model. But the number of amplifier gains can be substantially reduced by a judicious choice of the set of linear operators used in Section A. Some techniques for accomplishing this are detailed in [3, 6]. For a given input of a system, the output error is the difference between its response and some desired response. The measure of the size of the error used is called the error criterion. For a given error criterion, an optimum system is that system for which the size of the error is the smallest possible. Thus, different error criteria generally will result in different optimum systems. That is, there is no such thing as the optimum system; an optimum system is only optimum relative to some given error criterion. Classically, the mean-square error criterion has been used to determine optimum linear systems because of its analytic tractability and because the only data required are the autocorrelation function of the input and the crosscorrelation function of the input and desired output. However, other error criteria are often desired. For example, consider attempting to hit the bull's-eye of a target by using a controller to aim the gun. For such a system, small errors are well tolerated; it is just the large errors which it is desired to reduce significantly. The error criterion should then be one for which the measure of the size of the error is small for misses of the target centre less than the bull's-eye radius but increases rapidly for misses larger than the bull's-eye radius. Note that in all our discussion above, there is no requirement that the output, $y(t)$, of the system being modelled be the output of a physical system. The output thus can be just some desired output, $z(t)$, of some unknown system. The optimum system is


called an optimum predictor if the desired output is a prediction of the system input as, for example, in predictive control, in which it is desired to lead a target. The determination of an optimum predictor assumes that the future is not affected by the prediction. Thus, the prediction of the future of your speech waveform is valid only if you are not informed of the prediction, which would allow you to purposely modify its future. The only waveforms which can be predicted with zero error are quasianalytic ones, which are not physical waveforms, so that, in any practical case, there always will be some prediction error. The optimum system is called an optimum filter if the system input is a corrupted version of a signal and the desired system output is the uncorrupted signal. Except for the determination of an optimum LTI system using the mean-square error criterion, the analytic determination of an optimum system is generally not possible. The reason is that all the required data normally are not available and, even if known, the resulting equations are difficult, if not impossible, to solve analytically. In consequence, an approach that can be taken is to consider a system model that can represent any system of a representative class of systems by a choice of a set of coefficients of the model. The desired optimum system is assumed to be representable by a member of this class of systems. The set of coefficients of the model which represents the optimum system is then determined. This determination can be done experimentally. If the optimum system is not a member of the class of systems that can be represented, then the experimental procedure results in that class member with the smallest error. This approach obviates the difficulties involved with analytical techniques.
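To make the Gate-function idea concrete, here is a minimal sketch (the Section A operators, the cell grid and the "unknown" desired response are all toy assumptions): Section B outputs are indicators of amplitude cells of the Section A outputs, and for the mean-square criterion each Section C gain is determined individually as a simple average over its cell:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
x = rng.normal(0.0, 1.0, N)

# Section A (assumed): two simple linear operators on the input past.
s1 = np.convolve(x, 0.8 ** np.arange(10))[:N]   # exponential averager
s2 = np.convolve(x, [1.0, -1.0])[:N]            # first difference

z = np.tanh(s1) + 0.2 * s2**2    # desired response of some unknown system

# Section B: gate functions -- indicators of cells of the (s1, s2) plane.
edges = np.linspace(-3.0, 3.0, 13)
c1 = np.clip(np.digitize(s1, edges), 1, 12) - 1
c2 = np.clip(np.digitize(s2, edges), 1, 12) - 1
cell = c1 * 12 + c2              # one gate per 2-D cell, 144 in all

# Section C: one gain per gate. For the mean-square criterion, the
# optimum gain is determined individually as the average of z in the cell.
gains = np.zeros(144)
for c in range(144):
    mask = cell == c
    if mask.any():
        gains[c] = z[mask].mean()

model_output = gains[cell]
print(np.mean((z - model_output) ** 2))   # residual mean-square error
```

Other error criteria simply change the per-cell statistic (for example, the cell median for an absolute-error criterion); the gains remain individually determinable, which is the flexibility discussed above.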

2.10 An Optimum System Calculator

Of the various system models, the gate function model has the great advantage over the other models discussed in that it can be used to determine optimum systems for any input and any desired proper error criterion. However, the gate model has the disadvantage of requiring the determination of many more coefficients than the Wiener model. This disadvantage, though, can be overcome by using the gate function model only as an optimum system calculator. Advances over the last few years in computer technology have made such a calculator practical. The gate model so determined is not very practical to use as an optimum system since the number of its coefficients is large. However, the gate model only needs to be constructed once and used as a computer to determine optimum systems. For practical applications, one then could make a much simpler model of the gate model by constructing a Volterra model of it. This can be accomplished, for example, by first determining a Wiener model of it as described above. Note that this determination also can be done digitally by constructing a digital model and using a sample of a Gaussian random process as the input. This determination would not require much time since the number of coefficients of the Wiener model is so much smaller. The Wiener model then can be realised as a Volterra series in digital or analog form by summing Volterra terms of the Wiener model of the same order.


If it appears that the error obtained with this Volterra model could be reduced significantly, one could determine a second optimum gate model using, for example, a different set of linear systems and/or gates. The desired output used to determine the second optimum gate model would be the original desired output minus the output of the Volterra model that has been determined. The Volterra model of the second gate model could then be added in parallel to the original Volterra model to form a new Volterra model with a smaller error. The realisation of the new Volterra system is simplified by first summing terms of the same order. Note that the gate function model computer, which has not yet been constructed, is used only as a calculator to determine desired optimum systems, from which optimum systems with a simpler realisation are obtained. The philosophy of this procedure, in which an optimum system for some desired error criterion is determined experimentally from a representative sample of a system input and some desired system response, replaces the procedure of determining the properties of the input and desired system response approximately and then using these data to determine the desired system approximately by analytical means. This is the basic philosophic approach of the Gate function calculator for optimum system determination. Also note that, as opposed to analytical methods, the direct experimental determination of an optimum system with the Gate function calculator, for any desired error criterion with any input and desired response, results in a system whose closeness to the optimum system is controlled only by the number of coefficients used. Also, as discussed above, the error of the resulting system can easily be made smaller if desired. This chapter has been only a survey illustrating the importance of the Volterra series and Wiener theory in system theory and their usefulness in system applications. Full details of the developments discussed, and possible future developments, are contained in [6].

References

1. Fréchet, M.: Sur les fonctionnelles continues. Annales Scientifiques de l'École Normale Supérieure, 3rd Ser. 27, 193–216 (1910)
2. Lee, Y.W., Schetzen, M.: Measurement of the Wiener kernels of a nonlinear system by crosscorrelation. International Journal of Control 2(3), 237–254 (1965)
3. Schetzen, M.: Asymptotic optimum Laguerre series. IEEE Transactions on Circuit Theory CT-18(5), 493–500 (1971)
4. Schetzen, M.: Linear Time-Invariant Systems. John Wiley & Sons, New York (2003)
5. Schetzen, M.: Airborne Doppler Radar: Applications, Theory, and Philosophy, vol. 215. American Institute of Aeronautics and Astronautics, Reston (2006)
6. Schetzen, M.: The Volterra and Wiener Theories of Nonlinear Systems. Reprint edition with additional material. Krieger Publishing Co., Malabar, Fla (2006)
7. Schetzen, M.: Analysis of the single-mode laser-diode linear model. Optics Communication 282, 2901–2905 (2009)
8. Schetzen, M., Yildirim, R.: System theory of the single-mode laser-diode. Optics Communication 219, 341–350 (2003)


9. Schetzen, M., Yildirim, R., Çelebi, F.V.: Intermodulation distortion of the single-mode laser-diode. Applied Physics B 93, 837–847 (2008)
10. Volterra, V.: Theory of Functionals and of Integral and Integro-Differential Equations. Dover Publications, Inc., New York (1959)
11. Wiener, N.: Response of a Nonlinear Device to Noise. Report 129, Radiation Laboratory, MIT, Cambridge, MA (April 1942). Also published as U.S. Department of Commerce Publication PB-58087
12. Wiener, N.: Nonlinear Problems in Random Theory. Technology Press, MIT & Wiley, New York (1958)
13. Yildirim, R., Schetzen, M.: Applications of the single-mode laser-diode. Optics Communication 219, 351–355 (2003)

Part II

Iterative and Overparameterization Methods

Chapter 3

An Optimal Two-stage Identification Algorithm for Hammerstein–Wiener Nonlinear Systems

Er-Wei Bai

3.1 Introduction

Consider a scalar stable discrete-time nonlinear dynamic system represented by

$$y(k) = \sum_{i=1}^{p} a_i \Big\{ \sum_{l=1}^{q} d_l\, g_l[y(k-i)] \Big\} + \sum_{j=1}^{n} b_j \Big\{ \sum_{t=1}^{m} c_t\, f_t[u(k-j)] \Big\} + \eta(k) \qquad (3.1)$$

where $y(k)$, $u(k)$ and $\eta(k)$ are the system output, input and disturbance at time $k$, respectively. The $g_l(\cdot)$'s and $f_t(\cdot)$'s are nonlinear functions and

$$a = (a_1, \ldots, a_p)^T, \quad b = (b_1, \ldots, b_n)^T, \quad c = (c_1, \ldots, c_m)^T, \quad d = (d_1, \ldots, d_q)^T \qquad (3.2)$$

denote the system parameter vectors. The model (3.1) may be considered as a system where two static nonlinear elements $N_1$ and $N_2$ surround a linear block. It is different from the well-known Wiener–Hammerstein model [2], where two linear blocks surround a static nonlinear element, and also different from the Hammerstein model discussed in [3, 4, 7, 8, 9], composed of a static nonlinear element followed by a linear block. The purpose of identification is to estimate the unknown parameter vectors $a$, $b$, $c$ and $d$ from the observed input-output measurements. Throughout the chapter, the $f_i$'s ($i = 1, 2, \ldots, m$) and $g_j$'s ($j = 1, 2, \ldots, q$) are assumed to be a priori known smooth nonlinear functions, and the orders $q$, $n$, $p$ and $m$ are assumed to be known as well. Although applied to modelling some physical systems (e.g., see [10]), the system (3.1) has received little attention as far as identification is concerned. On the contrary, there exists a large body of work on identification of the Wiener–Hammerstein model and the Hammerstein model. Unfortunately, schemes developed for either the Wiener–Hammerstein model or the Hammerstein model do not apply to the system (3.1) directly. The widely suggested algorithm for the Wiener–Hammerstein


model is the correlation technique [2], which relies on a separability assumption. Besides computational complexity, this assumption is restrictive and does not appear to work for the system (3.1) because of the two nonlinear elements instead of one. The most popular identification schemes for the Hammerstein model are the Narendra–Gallman (NG) algorithm [7] and its variations [8]. The original NG algorithm may be divergent [9], but with normalisation of the estimates at each iteration [8], it is globally asymptotically convergent provided that the linear block in the Hammerstein model is FIR and the input $u(k)$ to the system is white. The NG algorithm does not, however, apply to the system (3.1) because of a non-FIR linear block and also because of the presence of the output nonlinear element. In fact, it is verified that the NG algorithm is not convergent for the system (3.1) even when normalised at each iteration. The purpose of this chapter is to propose an algorithm for identification of the system (3.1). It is an optimal two-stage approach, also referred to as the over-parametrisation method. First, the least squares estimate for an over-sized parameter vector is obtained. Then, the optimal estimates $\hat{a}$, $\hat{b}$, $\hat{c}$ and $\hat{d}$ are calculated based on this least squares estimate. The estimates $\hat{a}$, $\hat{b}$, $\hat{c}$ and $\hat{d}$ provided by the proposed algorithm converge to the true $a$, $b$, $c$ and $d$ if the noise sequence is absent, and converge to $a$, $b$, $c$ and $d$ with probability one if the noise sequence is white. The proposed algorithm is based on the works in [3, 4, 5] suggested for identification of the Hammerstein model, but it extends them in some important ways:

• The proposed algorithm applies to the system (3.1) and thus is more general than those in [4, 5]. Note that the model discussed in [4, 5] contains only one nonlinear block, a subset of the system (3.1).

• The estimates given by the proposed algorithm in this chapter are globally optimal, while those presented in [4, 5] are locally optimal, and no optimality can be claimed for the estimate obtained in [3]. In particular, if proper pre-filtering on the observed data is allowed, the proposed algorithm is globally optimal for arbitrary noises.

• One of the key assumptions in [3] is that the parameter vector of a bilinear Hammerstein system can be modelled with the first part being linear, which implies that the system can only contain one nonlinear block and, moreover, that the exact delay of the unknown system has to be known a priori. Our proposed algorithm does not require this assumption.

Finally, it is important to point out that the computation involved in the proposed algorithm is very simple. The first stage is the least squares estimate, which can be calculated recursively as the number of samples increases. The second stage is a global search which is elegantly transformed into the computation of the singular value decompositions (SVD) of two matrices with fixed dimensions $n \times m$ and $p \times q$, independent of the number of data points.

3.2 Optimal Two-stage Algorithm

Notice that the parametrisation of the system (3.1) is not unique. For instance, any parameter vectors $\bar{b} = \alpha b$ and $\bar{c} = \alpha^{-1} c$, or $\bar{a} = \beta a$ and $\bar{d} = \beta^{-1} d$, for some


non-zero constant $\alpha$ or $\beta$ provides an identical system to the one in (3.1). In other words, no identification experiment can distinguish between the parameter vector sets $(b, c)$ and $(\alpha b, \alpha^{-1} c)$, or the sets $(a, d)$ and $(\beta a, \beta^{-1} d)$. To obtain a unique parametrisation, the first elements of the vectors $b$ and $a$ may be fixed to 1, a technique often used in the literature. The problem with this technique is that it indirectly presumes the delay of the system to be 1, which may not be the case. To avoid this problem, a different approach will be used:

Assumption 3.1 (Uniqueness). Consider the system (3.1). Assume that $ad^T$ and $bc^T$ are not both zero. Moreover, assume that $\|a\|_2 = 1$ and $\|b\|_2 = 1$ ($\|\cdot\|_2$ stands for the 2-norm) and that the signs of the first non-zero elements of $a$ and $b$ are positive.

Under the Uniqueness Assumption, it can be easily verified that the parametrisation of the system (3.1) is unique. Non-zeroness of $bc^T$ and $ad^T$ is assumed to avoid trivialising the identification problem. Define

$$\theta = (b_1 c_1, \ldots, b_1 c_m, b_2 c_1, \ldots, b_2 c_m, \ldots, b_n c_1, \ldots, b_n c_m, a_1 d_1, \ldots, a_1 d_q, \ldots, a_p d_1, \ldots, a_p d_q)^T = (\theta_1, \ldots, \theta_{nm}, \theta_{nm+1}, \ldots, \theta_{nm+pq})^T,$$

$$\Theta_{bc} = bc^T = \begin{pmatrix} b_1 c_1 & b_1 c_2 & \cdots & b_1 c_m \\ b_2 c_1 & b_2 c_2 & \cdots & b_2 c_m \\ \vdots & \vdots & \ddots & \vdots \\ b_n c_1 & b_n c_2 & \cdots & b_n c_m \end{pmatrix}, \quad \Theta_{ad} = ad^T = \begin{pmatrix} a_1 d_1 & a_1 d_2 & \cdots & a_1 d_q \\ a_2 d_1 & a_2 d_2 & \cdots & a_2 d_q \\ \vdots & \vdots & \ddots & \vdots \\ a_p d_1 & a_p d_2 & \cdots & a_p d_q \end{pmatrix},$$

and

$$\phi(k) = \big( f_1[u(k-1)], \ldots, f_m[u(k-1)], \ldots, f_1[u(k-n)], \ldots, f_m[u(k-n)], g_1[y(k-1)], \ldots, g_q[y(k-1)], \ldots, g_1[y(k-p)], \ldots, g_q[y(k-p)] \big)^T.$$

The system (3.1) can now be written as $y(k) = \phi^T(k)\theta + \eta(k)$. For a given data set $\{u(k), y(k)\}_{k=1}^{N}$, let $Y_N = (y(1), \ldots, y(N))^T$, $\eta_N = (\eta(1), \ldots, \eta(N))^T$ and $\Phi_N = (\phi(1), \ldots, \phi(N))^T$. Then

$$Y_N = \Phi_N \theta + \eta_N.$$

Further, let

$$\hat{\theta}(N) = (\hat{\theta}_1, \ldots, \hat{\theta}_{nm}, \hat{\theta}_{nm+1}, \ldots, \hat{\theta}_{nm+pq})^T, \quad \hat{b}(N) = (\hat{b}_1, \hat{b}_2, \ldots, \hat{b}_n)^T, \quad \hat{a}(N) = (\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_p)^T,$$
$$\hat{c}(N) = (\hat{c}_1, \hat{c}_2, \ldots, \hat{c}_m)^T, \quad \text{and} \quad \hat{d}(N) = (\hat{d}_1, \hat{d}_2, \ldots, \hat{d}_q)^T \qquad (3.3)$$

denote the estimates of $\theta$, $a$, $b$, $c$ and $d$ respectively, using the data set $\{u(k), y(k)\}_{k=1}^{N}$, and let

$$\hat{\Theta}_{bc}(N) = \begin{pmatrix} \hat{\theta}_1 & \cdots & \hat{\theta}_m \\ \hat{\theta}_{m+1} & \cdots & \hat{\theta}_{2m} \\ \vdots & \ddots & \vdots \\ \hat{\theta}_{(n-1)m+1} & \cdots & \hat{\theta}_{nm} \end{pmatrix} \quad \text{and} \quad \hat{\Theta}_{ad}(N) = \begin{pmatrix} \hat{\theta}_{nm+1} & \cdots & \hat{\theta}_{nm+q} \\ \hat{\theta}_{nm+q+1} & \cdots & \hat{\theta}_{nm+2q} \\ \vdots & \ddots & \vdots \\ \hat{\theta}_{nm+(p-1)q+1} & \cdots & \hat{\theta}_{nm+pq} \end{pmatrix} \qquad (3.4)$$

denote the estimates of $\Theta_{bc}$ and $\Theta_{ad}$ respectively. Now, the two-stage identification algorithm for a given data set $\{u(k), y(k)\}_{k=1}^{N}$ can be summarised in the following.

Two-stage Identification Algorithm: Consider the system (3.1) under the Uniqueness Assumption. For a given data set $\{u(k), y(k)\}_{k=1}^{N}$:

Step 1: Calculate the least squares estimate $\hat{\theta}(N) = \hat{\theta}_{ls}(N) = (\Phi_N^T \Phi_N)^{-1} \Phi_N^T Y_N$.

Step 2: Construct $\hat{\Theta}_{bc}(N)$ and $\hat{\Theta}_{ad}(N)$ from $\hat{\theta}(N) = \hat{\theta}_{ls}(N)$ as in (3.4) and let

$$\hat{\Theta}_{bc}(N) = \sum_{i=1}^{\min(n,m)} \sigma_i \mu_i \nu_i^T \quad \text{and} \quad \hat{\Theta}_{ad}(N) = \sum_{i=1}^{\min(p,q)} \delta_i \xi_i \zeta_i^T$$

be their singular value decompositions (SVD), where the $\mu_i$'s ($i = 1, 2, \ldots, n$), $\nu_i$'s ($i = 1, 2, \ldots, m$), $\xi_i$'s ($i = 1, 2, \ldots, p$) and $\zeta_i$'s ($i = 1, 2, \ldots, q$) are $n$-, $m$-, $p$- and $q$-dimensional orthonormal vectors respectively.

Step 3: Let $s_\mu$ denote the sign of the first non-zero element of $\mu_1$ and $s_\xi$ denote the sign of the first non-zero element of $\xi_1$. Define the estimates

$$\hat{b}(N) = s_\mu \mu_1, \quad \hat{c}(N) = s_\mu \sigma_1 \nu_1, \quad \hat{a}(N) = s_\xi \xi_1, \quad \hat{d}(N) = s_\xi \delta_1 \zeta_1.$$
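As a compact numerical sketch of Steps 1-3 (the NumPy realisation and variable names are ours, not the author's; the system and parameters are those of the simulation example given at the end of this section):

```python
import numpy as np

# Simulation example of this chapter: p = 1, q = 2, n = 2, m = 2.
a, d = np.array([1.0]), np.array([0.5, 0.25])
b, c = np.array([1.0, -2.0]) / np.sqrt(5), np.array([1.0, 4.0])
f = [lambda u: u, lambda u: u**2]          # f_t, t = 1, 2
g = [lambda y: y, lambda y: np.sin(y)]     # g_l, l = 1, 2

rng = np.random.default_rng(0)
N = 1000
u, y = np.zeros(N + 2), np.zeros(N + 2)
for k in range(2, N + 2):
    u[k] = (2 * np.sin(2 * k) + 2 * np.sin(4 * k) + 0.15 * np.sin(6 * k)
            + 0.15 * np.sin(8 * k) + 0.1 * np.sin(10 * k))
    y[k] = (a[0] * (d[0] * y[k - 1] + d[1] * np.sin(y[k - 1]))
            + b[0] * (c[0] * u[k - 1] + c[1] * u[k - 1] ** 2)
            + b[1] * (c[0] * u[k - 2] + c[1] * u[k - 2] ** 2)
            + rng.uniform(-0.5, 0.5))

# Regressor phi(k): f_t[u(k-j)] blocks first, then g_l[y(k-i)] blocks.
Phi = np.array([[ft(u[k - j]) for j in (1, 2) for ft in f]
                + [gl(y[k - 1]) for gl in g] for k in range(2, N + 2)])
Y = y[2:]

# Step 1: least squares estimate of the product parameter vector theta.
theta = np.linalg.lstsq(Phi, Y, rcond=None)[0]

# Step 2: reshape theta into Theta_bc (n x m), Theta_ad (p x q); SVDs.
Ub, sb, Vbt = np.linalg.svd(theta[:4].reshape(2, 2))
Ua, sa, Vat = np.linalg.svd(theta[4:].reshape(1, 2))

# Step 3: best rank-one factors, signs fixed by the Uniqueness Assumption.
s_mu = np.sign(Ub[np.flatnonzero(Ub[:, 0])[0], 0])
s_xi = np.sign(Ua[np.flatnonzero(Ua[:, 0])[0], 0])
b_hat, c_hat = s_mu * Ub[:, 0], s_mu * sb[0] * Vbt[0]
a_hat, d_hat = s_xi * Ua[:, 0], s_xi * sa[0] * Vat[0]
print(b_hat, c_hat)   # compare with b, c above
print(a_hat, d_hat)   # compare with a, d above
```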

The idea of the above algorithm is simple. First, the least squares estimate $\hat{\theta}_{ls}(N)$, or equivalently $\hat{\Theta}_{bc}(N)$ and $\hat{\Theta}_{ad}(N)$, is sought to minimise the prediction error. Since $\hat{\theta}_{ls}(N)$ is of high dimension, usually higher than $p + n + m + q = \dim(a) + \dim(b) + \dim(c) + \dim(d)$, a lower $(q + p + n + m)$-dimensional vector $(\hat{a}^T(N), \hat{b}^T(N), \hat{c}^T(N), \hat{d}^T(N))^T$ representing the estimates of $a$, $b$, $c$ and $d$ is to be determined. Note that

$$\theta = (b_1 c^T, b_2 c^T, \ldots, b_n c^T, a_1 d^T, \ldots, a_p d^T)^T$$

and

$$\hat{\theta}(N) = \hat{\theta}_{ls}(N) = (\widehat{b_1 c^T}, \widehat{b_2 c^T}, \ldots, \widehat{b_n c^T}, \widehat{a_1 d^T}, \ldots, \widehat{a_p d^T})^T$$

have a special structure in terms of $a$, $b$, $c$ and $d$. Let $\mathrm{vec}(H)$ indicate the column vector formed by stacking the columns of $H$, i.e., $\mathrm{vec}(H) = (h_1^T, h_2^T, \ldots, h_l^T)^T$ if $H = (h_1, h_2, \ldots, h_l)$. Then,

$$\left\| \big( \hat{b}_1(N)\hat{c}^T(N), \ldots, \hat{b}_n(N)\hat{c}^T(N), \hat{a}_1(N)\hat{d}^T(N), \ldots, \hat{a}_p(N)\hat{d}^T(N) \big)^T - \hat{\theta}(N) \right\|_2^2 = \left\| \begin{pmatrix} \mathrm{vec}(\hat{b}(N)\hat{c}^T(N)) \\ \mathrm{vec}(\hat{a}(N)\hat{d}^T(N)) \end{pmatrix} - \hat{\theta}(N) \right\|_2^2 = \| \hat{b}(N)\hat{c}^T(N) - \hat{\Theta}_{bc}(N) \|_F^2 + \| \hat{a}(N)\hat{d}^T(N) - \hat{\Theta}_{ad}(N) \|_F^2,$$

where $\|\cdot\|_F$ stands for the matrix Frobenius norm. Thus, the closest $\hat{a}(N)$, $\hat{b}(N)$, $\hat{c}(N)$ and $\hat{d}(N)$ to $\hat{\theta}(N)$ in the 2-norm sense are given by

$$(\hat{a}(N), \hat{d}(N)) = \arg \min_{x \in \mathbb{R}^p,\, w \in \mathbb{R}^q} \| \hat{\Theta}_{ad}(N) - x w^T \|_F^2 \quad \text{and} \quad (\hat{b}(N), \hat{c}(N)) = \arg \min_{z \in \mathbb{R}^n,\, v \in \mathbb{R}^m} \| \hat{\Theta}_{bc}(N) - z v^T \|_F^2.$$

The solutions of b(N), c (N), a (N) and d(N) are provided by SVD decomposi

tions of Θbc (N) and Θad (N) as given in Step 2 of the proposed Identification Algorithm. The key difference between the above Two-stage Identification Algorithm

and the existing results along this line, e.g. in [4, 5], is that the estimates a (N), d(N),

b(N), c (N) here are obtained through a search over the entire parameter space while in [4, 5], they were searched only over a small subset. Now, the following results can be derived [1]. Theorem 3.1. Consider the system (3.1) and the Two-stage Identification Algorithm under the Uniqueness Assumption. Then, 1. For any N > 0, if ΦN is full column rank and the disturbance η (k) ≡ 0, then

a (N) = a, b(N) = b, c (N) = c, d(N) = d.

(3.5)

2. Let the disturbance η (k) be white with zero mean and finite variance and independent of u(k). Suppose the input u(k) is bounded and the regressor φ (k) is persistently exciting (PE), i.e.,

α2 I ≥

k0 +l0



φ (k)φ T (k) ≥ α1 I > 0

k=k0

for any k0 ≥ 0 and some l0 > 0. Then, with probability 1 as N −→ ∞,

a (N) → a, b(N) → b, c (N) → c, d(N) → d.

(3.6)

Theorem 3.1 is also valid even if the disturbance η (k) is not zero mean by slightly modifying the system equation. To this end, let e = Eη (k) = 0 be the mean value of η (k). Then, the system (3.1) can be re-written as p

q

n

m

i=1

l=1

j=1

t=1

y(k) = ∑ ai { ∑ dl gl [y(k − i)]} + ∑ b j { ∑ ct ft [u(k − j)]} + e + v(k) where v(k) = η (k) − e is white and zero mean. Re-name

32

E.-W. Bai

θ = (b1 c1 , ..., b1 cm , ..., bn c1 , ..., bn cm , a1 d1 , ..., a1 dq , ..., a p d1 , ..., a p dq , e)T φ (k) = ( f1 [u(k − 1)], ..., fm [u(k − 1)], ..., f1 [u(k − n)], ..., fm [u(k − n)] , and g1 [y(k − 1)], ..., gq [y(k − 1)], ..., g1[y(k − p)], ..., gq [y(k − p)], 1)T . The above system is given by y(k) = φ T (k)θ + v(k) which is exactly the same as one with zero mean disturbance v(k). Thus, by applying the proposed Identification Algorithm with the re-named φ (k) and θ (N), all the results of Theorem 3.1 follow. To illustrate the method, a simulation example is provided. Consider a system y(k) = a1 (d1 y(k − 1) + d2sin(y(k − 1)) + b1(c1 u(k − 1) + c2u2 (k − 1))+ b2 (c1 u(k − 2) + c2u2 (k − 2)) + η (k) where a = (a1 ) = 1, d = (d1 , d2 )T = (0.5, 0.25)T , √ √ b = (b1 , b2 )T = (1/ 5, −2/ 5)T = (0.4472, −0.8944)T , and c = (c1 , c2 )T = (1, 4)T . For simulation, the input is u(k) = 2 ∗ sin(2 ∗ k) + 2 ∗ sin(4 ∗ k) + 0.15 ∗ sin(6 ∗ k) + 0.15 ∗ sin(8 ∗ k) + 0.1 ∗ sin(10 ∗ k) and η (k) are i.i.d. random variables uniformly in [−0.5, 0.5]. For N = 100, the proposed two-stage algorithm gives the following estimates

a (100) = 1, d(100) = (0.5006, 0.2404)T ,

b(100) = (0.4463, −0.8949)T , and c (100) = (1.0054, 4.0091)T . They are very close to the true but unknown a, b, c and d even for a small N = 100. In Theorem 3.1, convergence of the estimates to the true but unknown parameter vectors is pursued. Clearly, the convergence to the true values can be obtained only for some disturbances, e.g., white noises. In general, convergence can not be guaranteed for arbitrary noises. Thus, a more interesting question is to find estimates

a (N), b(N), c (N), d(N) that minimises

ˆ c, ˆ 2T , ˆ b, ˆ d) ( a(N), b(N), c (N), d(N)) = arg min YN − Y N (a, A A ˆ c, a, ˆ b, ˆ dˆ

(3.7)

ˆ c, ˆ = ( ˆ c, ˆ y (2, a, ˆ c, ˆ . . . , y (N, a, ˆ c, ˆ T, ˆ b, ˆ d) y(1, a, ˆ b, ˆ d), ˆ b, ˆ d), ˆ b, ˆ d)) where Y N (a,     q p n m ˆ c, ˆ = ∑ aˆi ∑ dˆl gl [y(k − i)] + ∑ bˆ j ∑ cˆt ft [u(k − j)] . with y (k, a, ˆ b, ˆ d) i=1

l=1

j=1

t=1

Here X2AT A = X T AT AX, with some weighting matrix A. This is a difficult problem in general requiring nonlinear global search. However, for some matrix A, the proposed two-stage algorithm in fact provides an solution for this problem as shown below [1].

3

Two-stage Algorithm

33

Theorem 3.2. Consider the system (3.1) with some weighting matrix A as in (3.7). Let the least squares estimate in the first step of the Two-stage Identification Algorithm be re-defined as θ ls (N) = (Φ¯ NT Φ¯ N )−1 Φ¯ NT Y¯N with Φ¯ N = AΦN and Y¯N = AYN . Then, the estimates derived form the proposed Two-stage Identification Algorithm are the solution of

ˆ c, ˆ 2T ( a(N), b(N), c (N), d(N)) = arg min YN − Y N (a, ˆ b, ˆ d) A A ˆ c, a, ˆ b, ˆ dˆ

(3.8)

for any A such that all the singular values of AΦN are the same and non-zero. The weighting matrix A in Theorem 3.2 may be considered as pre-filtering on the observed raw data YN and ΦN . An important observation in identification is that the regressor vector φi should span all the directions in the parameter space, preferably with equal energy in each direction. This is equivalent to say that the maximum eigenvalue and the minimum eigenvalue of the matrix ΦNT ΦN should not differ by too much, a well-known concept referred to as the condition number of a matrix. In fact, if the condition number goes to infinity, parameter convergence can not be guaranteed even for arbitrarily small noises. The meaning of the weighting matrix A allowable in Theorem 3.2 is to pre-treat the data so that the condition number of the new treated data (AΦN )T AΦN is 1.

3.3 Concluding Remarks It was shown that the proposed two-stage identification algorithm converges to the true but unknown parameters with no noises or white noises. In the presence of arbitrary noises, it still converges to the global optimal values for certain weighting matrices. The problem of finding optimal estimates for arbitrary weighting matrices remains open. However, the following observation is useful: For any non-singular weighting matrix A, let Φ¯ N = AΦN , Y¯N = AYN and U Λ V T = Φ¯ N be the SVD decomposition of Φ¯ N . Then, YN − ΦN θ 2AT A = Y¯N − Φ¯ N θ ls (N)22 + Φ¯ N (θ ls (N) − θ )22 = Y¯N − Φ¯ N θ ls (N)22 + Λ V T (θ ls (N) − θ )22 . Therefore, for any non-singular A,

θ = arg min YN − ΦN θ 2AT A ⇔ θ = arg min Λ V T (θ ls (N) − θ )22 . θ

θ

A special case is that all the singular values of Φ¯ N = AΦN are the same and nonzero. In this case, θ = arg minθ YN − ΦN θ 2AT A ⇔ θ = arg minθ θ ls (N) − θ 22 and the solution is provided by Theorem 3.2. The chapter is based on [1] with permission from Automatica/Elsevier.

34

E.-W. Bai

References 1. Bai, E.W.: An optimal two-stage identification algorithm for a class of nonlinear systems. Automatica 34, 333–338 (1998) 2. Billings, S.A., Fakhouri, S.Y.: Identification of a class of nonlinear systems using correlation analysis. Proc. of IEE 125, 691–697 (1978) 3. Boutayeb, M., Rafaralahy, H., Darouach, M.: A robust and recursive identification method for Hammerstein model. In: IFAC World Congress, San Francisco, pp. 447–452 (1996) 4. Chang, F., Luus, R.: A non-iterative method for identification using Hammerstein model. IEEE Trans. on Auto. Contr. 16, 464–468 (1971) 5. Hsia, T.: A multi-stage least squares method for identifying Hammerstein model nonlinear systems. In: Proc. of CDC, Clearwater Florida, pp. 934–938 (1976) 6. Ljung, L.: Consistency of the least squares identification method. IEEE Trans. on Auto. Contr. 21, 779–781 (1976) 7. Narendra, K.S., Gallman, P.G.: An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Trans. on Auto. Contr. 11, 546–550 (1966) 8. Rangan, S., Wolodkin, G., Poolla, K.: Identification methods for Hammerstein systems. In: Proc. of CDC, New Orleans, pp. 697–702 (1995) 9. Stoica, P.: On the convergence of an iterative algorithm used for Hammerstein system identification. IEEE Trans. on Auto. Contr. 26, 967–969 (1981) 10. Zhang, Y.K., Bai, E.W.: Simulation of spring discharge from a limestone aquifer in Iowa. Hydrogeology Journal 4, 41–54 (1996)

Chapter 4

Compound Operator Decomposition and Its Application to Hammerstein and Wiener Systems Jozef V¨or¨os

4.1 Introduction One of the main difficulties in the automatic control is caused by the fact that real systems are generally nonlinear. For better mathematical tractability of control system descriptions we are usually forced to make proper assumptions and to use approximations and/or simplifications, hoping that any possible side effects will not be too severe. Therefore, the approaches dealing with nonlinear dynamic systems are generally restrictive in assumptions and applicable to special cases only. In the case of block-oriented nonlinear systems we are confronted with system descriptions given by composition of mappings or operators f = f1 ◦ f2 ◦ · · · ◦ fn defined on nonempty sets Xi , where i = 1, 2, . . ., and fi : Xi → Xi+1 . This description seldom can be used in an analytic form (if it even exists) because the corresponding mathematical expression for input-output relation is often too intricate. Appropriate decompositions of the compound mapping f may provide suitable mathematical models, which simplify the identification of block-oriented nonlinear systems based on the input and output variables, even though one or more generally unmeasurable internal variables will have to be considered. In the following the technique for compound mapping decomposition is presented that can reduce the complexity of system description. Then its application to the block-oriented nonlinear dynamic systems of Hammerstein and Wiener types is shown. Finally, the identification of Hammerstein–Wiener systems with piecewise Jozef V¨or¨os Slovak University of Technology Faculty of Electrical Engineering and Information Technology Institute of Control and Industrial Informatics Ilkovicova 3, 812 19 Bratislava, Slovakia e-mail: [email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 35–51. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com 

36

J. V¨or¨os

linear characteristics is presented where more and different ways of decomposition of the system operator are performed to simplify the system description.

4.2 Decomposition Let f , g and h be (one-to-one) mappings defined on nonempty sets U, X, and Y as follows: f :U →X

(4.1)

g:X →Y

(4.2)

h = g◦ f :U →Y

(4.3)

then where the mapping g ◦ f is called the composition of f and g. The notation g ◦ f reminds us that f (inner mapping) acts before g (outer mapping). The mapping composed from mappings f and g assigns just one element y ∈ Y to u ∈ U, or the corresponding x ∈ X, according to: y = g(x) = g [ f (u)] = h(u)

(4.4)

x = f (u) .

(4.5)

where

U

X

f

g

Y

h=g○f Fig. 4.1: Compound mapping

Let us assume the mapping g can be decomposed (split) and uniquely expressed by two mappings α : X →Y (4.6)

β : X →Y .

(4.7)

Then the mapping g can be defined on the following Cartesian product of two identical copies of set X (4.8) g = α ⊕β : X ⊗X →Y .

4

Compound Operator Decomposition

37

Replacement of the set forming the domain of the mapping by the Cartesian product of two copies of the same set is correct; it does not change the domain topology and does not require any assumptions or restrictions. Choice of an appropriate form of the decomposition (4.8) may simplify the mathematical description of the relations in (4.4) in some cases, mainly by splitting the mapping into additive or multiplicative forms given by a sum or a product of two mappings. This means that two mappings α and β with the domain X will exist for the mapping g such that in the additive case: g(x) = α (x) + β (x) ,

(4.9)

while in the multiplicative case: g(x) = α (x).β (x) ,

(4.10)

for every x ∈ X. Note that in the case of analytic mappings such decomposition forms always exist. The above mentioned way of decomposition is the basis for the so-called key term separation principle consisting of the following steps. Assume the outer mapping of (4.4) can be decomposed into the additive form (4.9). Then we substitute (4.5) only for x in the first term of (4.9), i.e. in the so-called key term as follows: g(x) = α [ f (u)] + β (x) .

(4.11)

After this half-substitution the domain of the compound mapping has been changed to the Cartesian product of inner and outer mapping domains g :U ⊗X →Y

(4.12)

and the couple of mappings given by (4.5) and (4.11) uniquely represents the original compound mapping (4.4). As no restrictions were imposed on the mappings α and β , it will be advantageous to choose the mapping α as the identity mapping. Then in the additive case: g(x) = x + β (x) = f (u) + β (x)

(4.13)

while in the multiplicative case: g(x) = x.β (x) = f (u).β (x) .

(4.14)

In some cases, the application of the key term separation principle may significantly simplify the mathematical description of the relations in (4.4).

4.2.1 Serial Application The key term separation principle can be applied more times within a compound mapping, namely in series, i.e. sequentially and repeatedly. Let Xi , i = 0, 1, . . . , n, be an ensemble of nonempty sets and let fi be mappings defined as follows:

38

J. V¨or¨os

fi : Xi−1 → Xi ,

i = 1, 2, . . . , n .

(4.15)

Let the composition of these mappings h = fn ◦ fn−1 ◦ · · · ◦ f1 : X0 → Xn

(4.16)

be given by the following relations x1 = f1 (x0 ) x2 = f2 (x1 ) = f2 ( f1 (x0 )) .. . xn = fn (xn−1 ) = fn ( fn−1 (· · · f2 ( f1 (x0 )) · · · ) = h(x0 )

(4.17)

where xi ∈ Xi . Assume that the mapping fn can be decomposed in the abovedescribed way, i.e. it can be defined on the Cartesian product of sets fn : Xn−1 ⊗ Xn−1 → Xn .

(4.18)

By analogy with (4.12) the domain of this mapping can be rewritten as follows: fn : Xn−2 ⊗ Xn−1 → Xn .

(4.19)

Further assuming that the mappings (4.15) for i = 1, 2, 3, . . . , n − 1, can be decomposed in the same way, then the compound mapping (4.16) can be replaced by the following equivalent mapping h : X0 ⊗ X1 ⊗ X2 ⊗ · · · ⊗ Xn−2 ⊗ Xn−1 → Xn

(4.20)

supplemented with the ensemble of mappings (4.15). If we choose the additive form of decompositions (4.13) for (4.17) and half-substitute only for the separated key term x1 : x2 = x1 + β2(x1 ) = f1 (x0 ) + β2 (x1 ) x3 = x2 + β3(x2 ) .. . xn = xn−1 + βn (xn−1 ) ,

(4.21)

the compound mapping (4.16) can be described by the equation n

xn = f1 (x0 ) + ∑ βi (xi−1 ) ,

(4.22)

i=2

i.e. by a sum of mappings. From the mathematical point of view, the equations (4.21) and (4.22) may be simpler than the original mapping (4.16).

4

Compound Operator Decomposition

39

4.2.2 Parallel Application The key term separation principle can be applied more times within a compound mapping also in parallel way — simultaneously. For example, let ϕ1 , . . . , ϕn be mappings defined on sets U1 ,U2 , . . . ,Un , X1 , X2 , . . . , Xn as

ϕi = Ui → Xi

i = 1, 2, . . . , n

(4.23)

and let γ be a mapping defined on sets X1 , X2 , . . . , Xn and Y as

γ = X1 ⊗ X2 ⊗ · · · ⊗ Xn → Y ,

(4.24)

y = γ (x1 , x2 , . . . , xn ) i = 1, 2, . . . , n xi = ϕi (ui )

(4.25) (4.26)

then

where ui ∈ Ui , xi ∈ Xi and y ∈ Y . Now we can perform the same procedure as above, i.e. we separate all xi , i = 1, 2, . . . , n, in (4.25) as the key terms and then half-substitute (4.26) only for the separated xi n

y = ∑ ϕi (ui ) + Γ (x1 , x2 , . . . , xn )

(4.27)

i=1

where Γ (.) represents the “remainder” of the original mapping γ (.) after separation of all xi s. Finally note that the decomposition of compound mappings can be performed by more and different combinations of the above-described approaches.

4.3 Decomposition of Block-oriented Nonlinear Systems The compound mapping decomposition can be directly applied to operators, i.e. mappings which operate on mathematical entities of higher complexity than real numbers, such as vectors, deterministic or stochastic variables, or mathematical expressions. In general, if either the domain or the co-domain (or both) of a mapping contains elements significantly more complex than real numbers, that mapping is referred to as an operator. In the control theory an operator can perform a mapping on any number of operands (inputs) to produce corresponding outputs. We shall use the word operator for mappings the domains and co-domains of which contain time dependent variables (e.g. u(t), x(t), y(t)).

4.3.1 Hammerstein System The key term separation principle can be very simply applied to the decomposition of Hammerstein system, which is a cascade connection of a nonlinear static subsystem

40

J. V¨or¨os

x(t) = f [u(t)]

(4.28)

where u(t) are the inputs and x(t) are the outputs, followed by a linear dynamic subsystem   q−d B q−1 x(t) (4.29) y(t) = 1 + A (q−1 ) where   A q−1 = a1 q−1 + . . . + am q−m   B q−1 = b1 q−1 + . . . + bnq−n

(4.30) (4.31)

and x(t) and y(t) are the inputs and outputs, respectively. Then the Hammerstein system output equation can be written as     y(t) = B q−1 x(t − d) − A q−1 y(t) . (4.32) Application of the key term separation principle is as follows. First, we separate a key term       (4.33) y(t) = b1 x(t − d) + B q−1 − b1 x(t − d) − A q−1 y(t) i.e. the term containing variable x(t − d), and then we half-substitute (4.28) only for the key term       y(t) = f [u(t − d)] + B q−1 − 1 x(t − d) − A q−1 y(t) , (4.34) assuming b1 = 1. Namely, if the nonlinear static function is multiplied by a nonzero real constant and if the linear dynamic part is divided by the same constant, the resulting model has the same input-output behaviour. After choosing an appropriate parametrisation for f (.), which is linear in parameters, the Hammerstein system can be described by the output equation, where all the parameters are separated and the equation is linear in parameters, but nonlinear in variables. This approach was applied to modelling and identification of Hammerstein systems with different types of nonlinearities, i.e. polynomial [17, 18, 5, 10],

u(t)

x(t) f (·)

y(t) q

−d

  B q −1 − 1 1 + A (q −1 ) Fig. 4.2: Hammerstein system

4

Compound Operator Decomposition

41

discontinuous [19] two-segment polynomial [20], multisegment piecewise linear [22]. The parameter estimation was solved as quasi-linear problem by iterative methods with internal variable estimation. Also recursive estimation methods were proposed for on line identification of decomposed Hammerstein systems [24, 26, 27, 15].

4.3.2 Wiener System Similar approach can be chosen for the decomposition of Wiener system, which is a cascade connection of a linear dynamic subsystem     (4.35) x(t) = B q−1 u(t − d) − A q−1 x(t) with the inputs u(t) and the outputs x(t), where A(q−1 ) and B(q−1 ) are given by (4.30) and (4.31), followed by a nonlinear static subsystem y(t) = g [x(t)]

(4.36)

where x(t) and y(t) are the inputs and outputs, respectively. First, we separate a key term y(t) = g1 x(t) + G[x(t)] . (4.37) Then, assuming g1 = 1, we half-substitute (4.35) only for the key term     y(t) = B q−1 u(t − d) − A q−1 x(t) + G [x(t)] .

(4.38)

After choosing an appropriate parametrisation for G(.) being linear in parameters, the Wiener system can be described by the output equation, which is linear in all the separated parameters, but nonlinear in variables. The Wiener system description given by (4.38) was applied to modelling and identification of systems with different types of output nonlinearities, i.e. polynomial [17, 18, 5, 10], discontinuous [21], two-segment polynomial [23], multisegment piecewise linear [28], where iterative methods with internal variable estimation

u(t)

  q −d B q −1 1 + A (q −1 )

x(t)

G(·)

Fig. 4.3: Wiener system

y(t)

42

J. V¨or¨os

were used. Also recursive estimation methods were proposed for on line identification of decomposed Wiener systems [30, 31, 34, 15].

4.4 Identification of Hammerstein–Wiener Systems Hammerstein and Wiener systems are the simplest types of block-oriented nonlinear dynamic systems and many methods have been proposed for their identification. However, there exist only few works reported in the literature on the so-called Hammerstein–Wiener system. Very little has been done in the identification of nonlinear systems using parametric Hammerstein–Wiener models [1, 2, 3, 6, 12, 13, 14, 35, 36] where more restrictions on nonlinear blocks are assumed, special sets of input-output data are used and more estimation steps and cost functions are considered. A significant disadvantage of these approaches is that the dimension of parameter vector is usually very high because of over-parametrisation. The following approach to parameter identification of Hammerstein–Wiener systems with piecewise linear characteristics is illustrating the multiple application of the key term separation principle in both serial and parallel ways. The resulting mathematical model for this type of block-oriented systems contains explicit information on all the blocks of given system without cross-multiplication of parameters but also more internal variables, which are generally unmeasurable. Application of an iterative algorithm enables, on the basis of one set of measured input/output data, the estimation of all internal variables and all model parameters, i.e.: 1. the coefficients determining the subdomains of the input static block function and the slopes of corresponding linear segments; 2. the parameters of linear block transfer function; and 3. the coefficients determining the subdomains of the output static block function and the slopes of corresponding linear segments.

4.4.1 Hammerstein–Wiener Systems The Hammerstein–Wiener system is given by the cascade connection of an input static nonlinearity block (N1) followed by a linear dynamic system (LS) which is followed by an output static nonlinearity block (N2) (Figure 4.4). The first nonlinear static block N1 can be described as: v(t) = C [u(t)]

(4.39)

where u(t) and v(t) are the inputs and outputs, respectively. The difference equation model of the linear dynamic block is:     (4.40) x(t) = B q−1 v(t) − A q−1 x(t)     where v(t) and x(t) are the inputs and outputs, respectively, A q−1 and B q−1 are scalar polynomials in the unit delay operator q−1

4

Compound Operator Decomposition

43

  A q−1 = a1 q−1 + . . . + am q−m ,   B q−1 = b1 q−1 + . . . + bn q−n .

(4.41) (4.42)

The second nonlinear static block N2 can be described as y(t) = D [x(t)]

(4.43)

with inputs x(t) and outputs y(t). The Hammerstein–Wiener system inputs u(t) and outputs y(t) are measurable, while the internal variables v(t) and x(t) are not.

u(t)

v(t)

LS

x(t)

y(t)

Fig. 4.4: Hammerstein–Wiener system

The input-output description of the Hammerstein–Wiener system resulting from direct substitutions of the corresponding variables from (4.39) into (4.40) and then into (4.43) would be strongly nonlinear both in the variables and in the parameters, hence not very suitable for the parameter estimation. Therefore, the serial decomposition will be applied with the aim to derive a simpler form of the system description. The second nonlinear block can be decomposed and written as follows: y(t) = d1 x(t) + D [x(t)]

(4.44)

where the internal variable x(t) is separated. The linear dynamic block equation can be written as       x(t) = b1 v(t − 1) + B q−1 − b1 v(t − 1) − A q−1 x(t) (4.45) where the internal variable v(t − 1) is separated. Now, to complete the serial decomposition, the corresponding half-substitutions can be performed, i.e.: (i) from (4.39) into (4.45) only for v(t − 1) in the first term, and (ii) from (4.45) into (4.44) again only for x(t) in the first term. The resulting output equation of the Hammerstein– Wiener system will be         y(t) = d1 b1C [u(t − 1)] + B q−1 − b1 v(t − 1) − A q−1 x(t) + D [x(t)] . (4.46) Appropriate parametrisations of two nonlinear block descriptions can significantly simplify the system output equation and possibly lead to linearity in parameters. However, the system contains two internal variables, which are generally unmeasurable. As the Hammerstein–Wiener system consists in the cascade connection of three subsystems, the parametrisation of (4.46) is not unique, as many combinations of parameters can be found. Therefore, one parameter in at least two blocks has to be fixed in (4.46) to make the mathematical description unique. Evidently, the choices

44

J. V¨or¨os

d1 = 1 and b1 = 1 (more precisely bi = 1, where bi is the first nonzero parameter considered) in (4.46) will simplify the Hammerstein–Wiener system description.

4.4.2 Piecewise-linear Characteristics The general form (4.46) of the Hammerstein–Wiener system description can be used for the parameter identification of systems with different types of static nonlinearities in both N1 and N2 blocks [25]. Some technical systems possess the structure where the input and output nonlinearities are or can be modelled by the piecewise linear characteristics [11]. Although some approaches to identification of blockoriented systems with piecewise linear characteristics were proposed [4, 7, 16], the following application of key term separation principle simplifies the description of this type of systems. Let the output v(t) of the first nonlinear static block N1 be described by the following equations:  if 0 ≤ u(t) ≤ dR1 mR1 u(t) (4.47) v(t) = mR2 [u(t) − dR1] + mR1dR1 if u(t) > dR1  if dL1 ≤ u(t) < 0 mL1 u(t) (4.48) v(t) = if u(t) < dL1 mL2 [u(t) − dL1] + mL1dL1 where |mR1 | < ∞, |mR2 | < ∞ are the corresponding segment slopes and 0 ≤ dR1 < ∞ is the constant for the positive inputs of N1, |mL1 | < ∞, |mL2 | < ∞ are the corresponding segment slopes and −∞ < dL1 < 0 is the constant for the negative inputs of N1. Let us introduce two internal variables f1 (t) = f1 [u(t)] = (mR2 − mR1 ) h [dR1 − u(t)] and

(4.49)

f2 (t) = f2 [u(t)] = (mL2 − mL1) h [u(t) − dL1]

(4.50)

where the switching function h(.) is defined as follows:  0 if s ≥ 0 h[s] = 1 if s < 0

(4.51)

and switches between two sets of variable s ∈ (−∞, ∞). Then the piecewise-linear mapping given by two equations (4.47) and (4.48) can be rewritten into the following input/output form [22]: v(t) = mR1 h [−u(t)] u(t) + [u(t) − dR1] f1 (t) +mL1 h [u(t)]u(t) + [u(t) − dL1] f2 (t) .

(4.52)

Now we can apply the parallel decomposition, i.e. separate two key terms, namely u(t) f1 (t) and u(t) f2 (t) and then half-substitute (4.49) and (4.50). So we obtain the following decomposed output equation for the block N1

4

Compound Operator Decomposition

45

v(t) = mR1 h [−u(t)] u(t) + (mR2 − mR1 ) h [dR1 − u(t)]u(t) − dR1 f1 (t) +mL1 h [u(t)] u(t) + (mL2 − mL1 ) h [u(t) − dL1] u(t) − dL1 f2 (t) (4.53) which is linear in parameters. Let the output y(t) of the second nonlinear static block N2 be described by the following equations:  MR1 x(t) if 0 ≤ x(t) ≤ DR1 y(t) = (4.54) MR2 [x(t) − DR1 ] + MR1 DR1 if x(t) > DR1  ML1 x(t) if DL1 ≤ x(t) < 0 y(t) = (4.55) if x(t) < DL1 ML2 [x(t) − DL1] + ML1 DL1 where |MR1 | < ∞, |MR2 | < ∞ are the corresponding segment slopes and 0 ≤ DR1 < ∞ is the constant for the positive inputs of N2, |ML1 | < ∞, |ML2 | < ∞ are the corresponding segment slopes and −∞ < DL1 < 0 is the constant for the negative inputs of N2. In this case, the form proposed in [28] for the description of the first segment on the right-hand side and the left-hand side of the origin can be used leading to the following form: y(t) = MR1 x(t) + (ML1 − MR1 ) h [x(t)] x(t) + [x(t) − DR1] F1 (t) + [x(t) − DL1] F2 (t)

(4.56)

where the internal variables are defined as F1 (t) = F1 [x(t)] = (MR2 − MR1 ) h [DR1 − x(t)] and F2 (t) = F2 [x(t)] = (ML2 − ML1) h [x(t) − DL1 ] .

(4.57) (4.58)

We can again apply the parallel decomposition, i.e. separate two key terms, namely x(t)F1 (t) and x(t)F2 (t) and then half-substitute (4.57) and (4.58). We obtain the following decomposed equation for the block N2 y(t) = MR1 x(t) + (ML1 − MR1 ) h [x(t)] x(t) + (MR2 − MR1 ) h [DR1 − x(t)]x(t) − DR1 F1 (t) + (ML2 − ML1) h [x(t) − DL1] x(t) − DL1F2 (t)

(4.59)

and the equation is linear in parameters of nonlinear block N2. The above descriptions of the nonlinear blocks N1 and N2, i.e. (4.53) and (4.59), can be incorporated into (4.46). Choosing MR1 = 1, the half-substitution of (4.45) into (4.59) for the first term only gives m

n

i=2

j=1

y(t) = b1 v(t − 1) + ∑ bi v(t − i) + ∑ a j x(t − j) + (ML1 − 1)h [x(t)] x(t) + (MR2 − 1)h [DR1 − x(t)]x(t) −DR1 F1 (t) + (ML2 − ML1) h [x(t) − DL1] x(t) − DL1F2 (t)

(4.60)

46

J. V¨or¨os

and then choosing b1 = 1 the half-substitution of (4.53) into (4.60) for the first term only gives the resulting Hammerstein–Wiener system output equation y(t) = mR1 h [−u(t − 1)]u(t − 1) + (mR2 − mR1 ) h [dR1 − u(t − 1)]u(t − 1) − dR1 f1 (t − 1) +mL1h [u(t − 1)]u(t − 1) + (mL2 − mL1 ) h [u(t − 1) − dL1] u(t − 1) m

n

i=2

j=1

−dL1 f2 (t − 1) + ∑ bi v(t − i) − ∑ a j x(t − j) + (ML1 − 1)h [x(t)] x(t) + (MR2 − 1)h [DR1 − x(t)] x(t) −DR1 F1 (t) + (ML2 − ML1 ) h [x(t) − DL1] x(t) − DL1F2 (t) .

(4.61)

The Hammerstein–Wiener system is described by (4.49), (4.50) and (4.53) defining the internal variables f1 (t), f2 (t), and v(t) connected with N1; by (4.45) defining the internal variable x(t) being the output of LS; by (4.57), (4.58) defining the internal variables F1 (t), F2 (t) connected with N2; and by the output equation (4.61). All the parameters to be estimated are separated in (4.61) hence the proposed form of the Hammerstein–Wiener system description contains the least possible number of parameters to be estimated. Now the above description can be used as a mathematical model for the identification of Hammerstein–Wiener systems with piecewise linear characteristics. Defining the data vector

ϕ T (t) = {h [−u(t − 1)]u(t − 1), h [dR1 − u(t − 1)]u(t − 1), − f1 (t − 1), h [u(t − 1)]u(t − 1), h [u(t − 1) − dL1] u(t − 1), − f2 (t − 1), v(t − 2), . . . , v(t − m), −x(t − 1), . . . , −x(t − n), h [x(t)] x(t), h [DR1 − x(t)]x(t), −F1 (t), h [x(t) − DL1] x(t), −F2 (t)} (4.62) and the vector of parameters

θ T = [mR1 , mR2 − mR1 , dR1 , mL1 , mL2 − mL1, dL1 , b2 , . . . , bm , a1 , . . . , an , ML1 − 1, MR2 − 1, DR1 , ML2 − ML1, DL1 ]

(4.63)

the Hammerstein–Wiener model with piecewise-linear nonlinearities given by (4.61) can be put into a concise form y(t) = ϕ T (t) · θ + e(t)

(4.64)

where e(t) is an additive noise and the problem of model parameters estimation can be solved as a pseudo-linear estimation problem.

4

Compound Operator Decomposition

47

4.4.3 Algorithm As the data vector (4.62) contains unmeasurable variables f1 (t), f2 (t), v(t), x(t), F1 (t), F2 (t) and depends on the unknown parameters, no one-shot algorithm can be used for the estimation of parameter vector (4.63). However, the parameter estimation can be performed using an iterative method with internal variables estimation similarly as in [22, 28]. This means that an error criterion is repeatedly minimised using the estimates of internal variables resulting from the previous estimates of parameters included. In the case of the mean squares error criterion the following functional 2 1 N  s J = ∑ y(t) − s−1ϕ T (t) · s θ (4.65) N t=1 is repeatedly minimised for the parameter vector estimate s θ , where N is the number of samples and s−1 ϕ (t) is the data vector with the (s − 1)-estimates of internal variables. The steps in the iterative procedure may be stated as follows: 1. Minimising (4.65) the estimates of parameters s θ are obtained using the data vectors s−1 ϕ (t) with the previous estimates of internal variables. 2. The estimates of internal variables s f1 (t), s f2 (t) are evaluated using s

f1 (t) = (s mR2 − s mR1 ) h [s dR1 − u(t)]

(4.66)

s

f2 (t) = ( mL2 − mL1 ) h [u(t) − dL1 ] .

(4.67)

s

s

s

3. The estimates of internal variable s v(t) are evaluated using s

v(t) = s mR1 h [−u(t)] u(t) + (s mR2 − s mR1 ) h [s dR1 − u(t)]u(t) −s dR1 s f1 (t) + s mL1 h [u(t)] u(t) + (s mL2 − s mL1 ) h [u(t) − sdL1 ] u(t) − s dL1 s f2 (t) .

(4.68)

4. The estimates of internal variable s x(t) are evaluated using s

m

n

i=2

j=1

x(t) = s v(t − 1) + ∑ s bi s v(t − i) + ∑ s a j s x(t − j) .

(4.69)

5. The estimates of internal variables s F1 (t), s F2 (t) are evaluated using F1 (t) = (s MR2 − s MR1 ) h [s DR1 − s x(t)] , s F2 (t) = (s ML2 − s ML1 ) h [s x(t) − s DL1 ] . s

(4.70) (4.71)

6. If the value of s J is less than a prescribed value, the procedure ends; else it continues with repeating steps 1–5. In the first iteration, only the parameters of N1 and LS are estimated and nonzero (small) initial values of 1 dR1 and 1 dL1 are used to start up the iterative algorithm. The initial values of the linear system parameters can be chosen as zero. Then nonzero initial values of 1 DR1 and 1 DL1 have to be used for N2. In the early steps of the

48

J. V¨or¨os

iterative procedure, the possible high values of s dR1 and s DR1 estimates and low values of s dL1 and s DL1 estimates may cause insufficient excitation for the estimation algorithm. Namely they may null the estimates of corresponding internal variables s f (t) and s f (t), as well as s F (t) and s F (t). However, proper limits for the values 1 2 1 2 of these parameters during the estimation process can overcome this problem.

4.4.4 Example To illustrate the feasibility of the proposed identification technique, the following example shows the process of parameters and internal variables estimation for the Hammerstein–Wiener system, where the nonlinear static block N1 (piecewise linear function with dead zone) and N2 (piecewise linear function with saturation) are given by the parameters in Table 4.1. Table 4.1: Parameters of the nonlinear blocks BLOCK N1 p11 p12 p13 p14 p15 p16

= = = = = =

mR1 mR2 mR2 − mR1 mL2 − mL1 dR1 dL1

= = = = = =

BLOCK N2 0.0 0.0 1.0 0.8 0.35 −0.3

p21 p22 p23 p24 p25 p26

= = = = = =

MR1 ML1 − MR1 MR2 − MR1 ML2 − ML1 DR1 DL1

= = = = = =

1.0 0.1 −1.0 −1.1 0.7 −0.8

The linear dynamic system is given by the difference equation x(t) = v(t − 1) + 0.15v(t − 2) + 0.2x(t − 1) − 0.35x(t − 2) . The initial values of the parameters were chosen as zero except 1 dR1 = 1 DR1 = 0.45 and 1 dL1 = 1 DL1 = −0.45 for the first estimates of corresponding internal variables. The identification was carried out with 2000 samples, using uniformly distributed random inputs and simulated outputs. Normally distributed random noise with zero mean and signal-to-noise ratio SNR = 50 was added to the simulated outputs (SNR − the square root of the ratio of output and noise variances). The parameter

Fig. 4.5 Parameter estimates for N1

4

Compound Operator Decomposition

49

Fig. 4.6 Parameter estimates for LS

Fig. 4.7 Parameter estimates for N2

estimates for the nonlinear block N1 are graphically shown in Figure 4.5 (the topdown order of parameters is p13 , p14 , p15 , p11 = p12 , p16 ), for the linear dynamic subsystem in Figure 4.6 (the top-down order of parameters is a2 , b2 , a1 ), and for the nonlinear block N2 in Figure 4.7 (the top-down order of parameters is p21 , p25 , p22 , p26 , p23 , p24 ). The parameter estimates are almost equal to the correct values after about 15 iterations and the iterative process shows good convergence. There is no general proof of convergence for iterative algorithms with internal variables estimation. However, the numerical accuracy and convergence of the proposed approach to parameter identification using Hammerstein–Wiener model with piecewise-linear nonlinearities is good despite the fact that 6 internal variables are estimated.

4.4.5 Conclusions From the mathematical point of view the presented form of compound mapping decomposition may appear as trivial or even inconvenient due to a certain redundancy. However, in the praxis, it may help to master problems with the description of systems, behaviour of which is characterised (indeed or expectedly) by compound mappings or operators. In control system analysis, it may be convenient to choose an unmeasurable internal variable as the key term of given compound operator and then to apply the presented principle. Compound mapping decomposition using the key term separation principle can be simply implemented and it does not change the domains of mappings. Note that the key term separation principle can be also used in decomposition of Wiener–Hammerstein systems [29, 15]. Moreover, it can be applied to descriptions of some dynamic nonlinearities as hysteresis [32] or backlash [33]. Consequently, it can be extended also to block-oriented systems with dynamic nonlinearities, e.g., to cascade systems with input backlash [33, 9] or output backlash [8].

50

J. V¨or¨os

References 1. Bai, E.W.: An optimal two-stage identification algorithm for Hammerstein–Wiener nonlinear systems. Automatica 34, 333–338 (1998) 2. Bai, E.W.: A blind approach to the Hammerstein–Wiener model identification. Automatica 38, 967–979 (2002) 3. Bauer, D., Ninness, B.: Asymptotic properties of least-squares estimates of Hammerstein–Wiener models. Int. J. Control 75, 34–51 (2002) 4. Chen, H.F.: Recursive identification for Wiener model with discontinuous piece-wise linear function. IEEE Trans. Automatic Control 51, 390–400 (2006) 5. Chen, H.T., Hwang, S.H., Chang, C.T.: Iterative identification of continuous-time Hammerstein and Wiener systems using a two-stage estimation algorithm. Industrial & Engineering Chemistry Research 48, 1495–1510 (2009) 6. Crama, P., Schoukens, J.: Hammerstein–Wiener system estimator initialization. Automatica 40, 1543–1550 (2004) 7. Dolanc, G., Strmcnik, S.: Identification of nonlinear systems using a piecewise-linear Hammerstein model. Systems and Control Letters 54, 145–158 (2005) 8. Dong, R., Tan, Q., Tan, Y.: Recursive identification algorithm for dynamic systems with output backlash and its convergence. Int. J. Appl. Math. Comput. Sci. 19, 631–638 (2009) 9. Dong, R., Tan, Y., Chen, H.: Recursive identification for dynamic systems with backlash. Asian Journal of Control 12, 26–38 (2010) 10. Guo, F.: A new identification method for Wiener and Hammerstein systems. PhD Dissertation, University Karlsruhe, Germany (2004) ˇ 11. Kalaˇs, V., Juriˇsica, L., Zalman, M., et al.: Nonlinear and Numerical Servosystems. Alfa/SNTL, Bratislava (in Slovak) (1985) 12. Lee, Y.J., Sung, S.W., Park, S., Park, S.: Input test signal design and parameter estimation method for the Hammerstein–Wiener processes. Industrial & Engineering Chemistry Research 43, 7521–7530 (2004) 13. Park, H.C., Sung, S.W., Lee, J.T.: Modeling of Hammerstein–Wiener processes with special input test signals. Industrial & Engineering Chemistry Research 45, 1029–1038 (2006) 14. Pupeikis, R.: On the identification of Hammerstein–Wiener systems. Lietuvos matem. rink 45, 509–514 (2005) 15. Tan, Y., Dong, R., Li, R.: Recursive identification of sandwich systems with dead zone and application. IEEE Trans. Control Systems Technology 17, 945–951 (2009) 16. van Pelt, T.H., Bernstein, D.S.: Non-linear system identification using Hammerstein and non-linear feedback models with piecewise linear static maps. Int. J. Control 74, 1807– 1823 (2001) 17. V¨or¨os, J.: Nonlinear system identification with internal variable estimation. In: Barker, H.A., Young, P.C. (eds.) Preprints 7th IFAC Symposium on Identification and System Parameter Estimation, York, pp. 439–443 (1985) 18. V¨or¨os, J.: Identification of nonlinear dynamic systems using extended Hammerstein and Wiener models. Control-Theory and Advanced Technology 10, 1203–1212 (1995) 19. V¨or¨os, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33, 1141–1146 (1997) 20. V¨or¨os, J.: Iterative algorithm for parameter identification of Hammerstein systems with two-segment nonlinearities. IEEE Trans. Automatic Control 44, 2145–2149 (1999) 21. V¨or¨os, J.: Parameter identification of Wiener systems with discontinuous nonlinearities. Systems and Control Letters 44, 363–372 (2001)

4

Compound Operator Decomposition

51

22. V¨or¨os, J.: Modeling and parameter identification of systems with multisegment piecewise-linear characteristics. IEEE Trans. Automatic Control 47, 184–188 (2002) 23. V¨or¨os, J.: Modeling and identification of Wiener systems with two-segment nonlinearities. IEEE Trans. Control Systems Technology 11, 253–257 (2003a) 24. V¨or¨os, J.: Recursive identification of Hammerstein systems with discontinuous nonlinearities containing dead-zones. IEEE Trans. Automatic Control 48, 2203–2206 (2003b) 25. V¨or¨os, J.: An iterative method for Hammerstein–Wiener systems parameter identification. J. Electrical Engineering 55, 328–331 (2004) 26. V¨or¨os, J.: Identification of Hammerstein systems with time-varying piecewise-linear characteristics. IEEE Trans. Circuits and Systems - II: Express Briefs 52, 865–869 (2005) 27. V¨or¨os, J.: Recursive identification of Hammerstein systems with polynomial nonlinearities. J. Electrical Engineering 57, 42–46 (2006) 28. V¨or¨os, J.: Parameter identification of Wiener systems with multisegment piecewiselinear nonlinearities. Systems and Control Letters 56, 99–105 (2007) 29. V¨or¨os, J.: An iterative method for Wiener–Hammerstein systems parameter identification. J. Electrical Engineering 58, 114–117 (2007) 30. V¨or¨os, J.: Recursive identification of Wiener systems with two-segment polynomial nonlinearities. J. Electrical Engineering 59, 40–44 (2008) 31. V¨or¨os, J.: Recursive identification of time-varying Wiener systems with polynomial nonlinearities. Int. J. Automation and Control 2, 90–98 (2008) 32. V¨or¨os, J.: Modeling and identification of hysteresis using special forms of the ColemanHodgdon model. J. Electrical Engineering 60, 100–105 (2009) 33. V¨or¨os, J.: Modeling and identification of systems with backlash. Automatica 46, 369– 374 (2010) 34. V¨or¨os, J.: Recursive identification of systems with noninvertible output nonlinearities. Informatica 21, 139–148 (2010) 35. Wang, D., Ding, F.: Extended stochastic gradient identification algorithms for Hammerstein–Wiener ARMAX systems. Computers & Mathematics with Applications 56, 3157–3164 (2008) 36. Zhu, Y.: Estimation of an N-L-N Hammerstein–Wiener model. Automatica 38, 1607– 1614 (2002)

Chapter 5

Iterative Identification of Hammerstein Systems Yun Liu and Er-Wei Bai

5.1 Introduction The iterative method was first proposed to estimate Hammerstein system in [9]. It is a very simple and efficient algorithm. In general, the convergence of iterative algorithm can be a problem [12]. It was recently shown that iterative algorithms with normalisation possess some global convergence properties in identification of Hammerstein system with smooth nonlinearity and finite impulse response (FIR) linear part [3, 4, 11]. The convergence for an infinite impulse response (IIR) system was however not solved. In this chapter, the results are extended to Hammerstein systems with an IIR linear part. The global convergence for an IIR system is established for the odd nonlinearities. The chapter also extends results to Hammerstein systems with non-smooth nonlinearities and FIR linear block [2]. One way to deal with such nonlinearities is to use nonparametric approach. A problem with nonparametric approach is the slow convergence [6, 7], if it converges. To this end, iterative algorithms were proposed and developed to identify Hammerstein systems with piecewise nonlinearity [13, 14]. But the convergence property of the algorithms is unknown. This chapter presents a normalised iterative identification algorithm for two common non-smooth piecewise-linear nonlinear structures, i.e., nonlinearities with saturation and preload characteristics. It is shown that the algorithms converge in one iteration step when the number of sample data is large. The chapter is based on [1] with permission from Automatica/Elsevier.

5.2 Hammerstein System with IIR Linear Part We study the scalar system in the block diagram Figure 5.1. The input signal is ut , xt is the internal signal, vt and yt are the noise and output, respectively. In this chapter, Yun Liu and Er-Wei Bai Dept. of Electrical and Computer Engineering, University of Iowa e-mail: [email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 53–65. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com 

54

Y. Liu and E.-W. Bai

vt

?

1 D(q−1 )

ut

-

xt

f (·)

-

+ - ?

B(q−1 ) D(q−1 )

+

-yt

Fig. 5.1: Hammerstein system

the noise vt is assumed to be i.i.d. random sequences with zero mean, Evt2 = σv2 < ∞, and E|vt |4+δ < ∞ for some δ > 0, where E denotes expectation of a random variable. The Hammerstein system with an infinite impulse response (IIR) linear subsystem has the following representation. The linear part is an unknown stable system which can be described by: yt = d1 yt−1 + d2 yt−2 + · · · + dnyt−n + b1xt−1 + b2xt−2 + · · · + bm xt−m + vt . The nonlinear part is an unknown odd symmetric function that can be expressed as xt = f (ut ) = a1 g1 (ut ) + a2g2 (ut ) + · · · + al gl (ut ).

(5.1)

Here gi (u), i = 1, ..., l are known odd functions of u. We can write the Hammerstein system into the following matrix form, n

m

l

i=1

j=1

i=1

yt = ∑ di yt−i + ∑ b j [ ∑ ai gi (ut− j )] + vt = dT φt (y) + bT Gt (u)a + vt where

(5.2)



⎞ g1 (ut−1 ) · · · gl (ut−1 ) ⎜ ⎟ .. .. .. φt (y) = (yt−1 , yt−2 , · · · , yt−n )T , Gt (u) = ⎝ ⎠. . . . g1 (ut−m ) · · · gl (ut−m )

The unknown parameters needed to be estimated are: a = (a1 , a2 , ..., al )T , d = (d1 , d2 , ..., dn )T , b = (b1 , b2 , ..., bm )T . The purpose of identification is to estimate the unknown parameters based on the observed input and output data {ut , yt },t = 1, 2..., N, for large enough N. ˆ bˆ are sought to For Hammerstein systems linear part (5.2), the estimates aˆ , d, minimise the least squares cost function, ¯ b) ¯ = argmina¯ ,d, ˆ b} ˆ = argmina¯ ,d, a, d, {ˆa, d, ¯ b¯ JN (¯ ¯ b¯ = argmina¯ ,d, ¯ b¯

1 N ∑ (yˆt − yt )2 N t=1

1 N ¯T ∑ (d φt (y) + b¯ T Gt (u)¯a − yt )2 . N t=1

(5.3)

5

Iterative Identification of Hammerstein Systems

55

Note that the model (5.2) is not well defined for identification purpose. Any pair λ a and b/λ for some non-zero constant λ provides the same input-output data. To have identifiability, we adopt the normalisation constraint on a for model (5.2). Assume that ||a|| = 1, and the first non-zero entry of a is positive. We propose the following normalised iterative algorithms to estimate a and d, b for system (5.2): Given the initial estimate bˆ 0 = (1, 0, ..., 0)T , and an arbitrary n-dimension vector dˆ 0 , aˆ k = arg mina¯ JN (¯a, dˆ k−1 , bˆ k−1 ); Normalise aˆ k to have positive first non-zero entry ; (5.4) ¯ b); ¯ {dˆ k , bˆ k } = arg mind, ak , d, ¯ b¯ JN (ˆ Replace k by k + 1 and the process is repeated.

Convergence Analysis To analyse the performance of the algorithm (5.4), we write the cost function ˆ b) ˆ in (5.3) as follows, JN (ˆa, d, ˆ b) ˆ = JN (ˆa, d,

l 1 N ˆ [(d − d)T φt (y) + ∑ (gi (ut−1 ), · · · , gi (ut−m ))(aˆi bˆ − ai b) + vt ]2 . ∑ N t=1 i=1

Let N → ∞, and define ⎛

⎞ yt−1 1 ⎜ ⎟ C = lim ∑ ⎝ ... ⎠ (yt−1 , ..., yt−n ), N→∞ N t=1 yt−n N



⎞ yt−1 1 ⎜ ⎟ A(i) = lim ∑ ⎝ ... ⎠ (gi (ut−1 ), ..., gi (ut−m )), N→∞ N t=1 yt−n N

where i = 1, ..., l, ⎛

⎞ gi (ut−1 ) 1 ⎜ ⎟ B(i, j) = lim ∑ ⎝ ... ⎠ · (g j (ut−1 ), ..., g j (ut−m )) N→∞ N t=1 gi (ut−m ) N

where i, j = 1, 2, ..., l. When the noise vt is i.i.d. random sequence with zero mean and finite variance σv2 , the cost function (5.5) becomes l

J = (dˆ − d)T C(dˆ − d) + ∑ (dˆ − d)T A(i)(aˆi bˆ − ai b) i=1

(5.5)

56

Y. Liu and E.-W. Bai l

l

+ ∑ ∑ (aˆi bˆ − ai b)T B(i, j)(aˆ j bˆ − a j b) + σv2 .

(5.6)

i=1 j=1

If the input signal is i.i.d. with symmetric distribution, the following convergence result can be derived for the IIR Hammerstein system (5.2). Theorem 5.1. Consider the Hammerstein system (5.2) and the cost function (5.6). Suppose the first element of b is nonzero, i.e., b1 = 0, and the input nonlinearity is odd, i.e., −gi (u) = gi (−u), i = 1, ..., l. Further, assume that the input ut is i.i.d. which has a symmetric probability density function with positive supports at no less than l points, say at r1 , ..., rl , so that gi ’s are linear independent with respect to these l points, i.e., ⎞ ⎛ g1 (r1 ) . . . g1 (rl ) ⎟ ⎜ rank ⎝ ... . . . ... ⎠ = l . gl (r1 ) . . . gl (rl ) Then, the normalised iterative algorithm (5.4) converges to the true parameters a, d, and b in one iteration step provided N → ∞.

5.3 Non-smooth Nonlinearities In this section, we apply the iterative algorithm to two types of non-smooth Hammerstein system with finite impulse response (FIR) linear subsystem. The linear part has the following representation: yt = d1 xt−1 + d2 xt−2 + · · · + dn xt−n .

(5.7)

The saturation and preload nonlinearities that are encountered often in practical applications are considered in this chapter as shown in Figures 5.2 and 5.3 . Let us express the saturation nonlinear part in Figure 5.2 as ⎧ ut > c; ⎨ a2 , (5.8) xt = f (ut ) = a1 u, − c ≤ ut ≤ c; ⎩ −a2 , ut < −c. Define a switching function h(u) as follows:  1, u > 0; h(u) = 0, u ≤ 0. The signal xt now can be expressed in an additive form: xt = a1 g1 (u, c) + a2g2 (u, c) where g1 (u, c) = uh(|c| − |u|), g2 (u, c) = h(u − c) − h(−c − u).

(5.9)

5

Iterative Identification of Hammerstein Systems

57

f (u) a2 = a1 c a1 u −c .

−a2

.. ... .. ... .... .. ...

.. ... .. ... .. ... .. ... ...

c

u

Fig. 5.2: Nonlinear part with saturation

Note that both g1 (u, c) and g2 (u, c) are odd functions of u and here xt is continuous piecewise-linear function of ut . Similarly the preload nonlinear part in Figure 5.3 can be expressed by xt = a1 g1 (u, c) + a2g2 (u, c)

(5.10)

where g1 (u, c) = u[h(u − c) + h(−c − u)], g2(u, c) = h(u − c) − h(−c − u). and both g1 (·) and g2 (·) are odd functions. xt is a discontinuous function of ut here. Now we can write both types of Hammerstein systems into the following compact form, yt = where

n

2

j=1

i=1

∑ d j [ ∑ ai gi(ut− j , c)] + vt = dT G(ut , c)a + vt ⎞ g1 (ut−1 , c) g2 (ut−1 , c) ⎟ ⎜ .. .. G(ut , c) = ⎝ ⎠. . . ⎛

g1 (ut−n , c) g2 (ut−n , c) The unknown parameters needed to be estimated are: a = (a1 , a2 )T , d = (d1 , d2 , ..., dn )T and the discontinuity point c in the Hammerstein system with preload nonlinear part. To identify the Hammerstein systems with non-smooth nonlinearities (5.11), the ˆ cˆ are sought to minimise the least squares cost function estimates aˆ , d,

58

Y. Liu and E.-W. Bai

ˆ c} ¯ c) {ˆa, d, ˆ = argmina¯ ,d, a, d, ¯ = argmina¯ ,d, ¯ c¯ JN (¯ ¯ c¯ = argmina¯ ,d, ¯ c¯

1 N ∑ (yt − yˆt )2 N t=1

1 N ¯ a)2 . ∑ (yt − d¯ T G(ut , c)¯ N t=1

(5.11)

Without loss of generality, we adopt the normalisation constraint on d for model (5.11). Assume that ||d|| = 1, and the first non-zero entry of d is positive. The following normalised iterative algorithm is proposed to estimate a and d in Hammerstein system with saturation nonlinearity in Figure 5.2. Given the initial estimate aˆ 0 = 0 and an estimate of an upper bound of the threshold parameter cˆ1 > c, ¯ cˆ1 ); dˆ k = arg mind¯ JN (ˆak−1 , d, ˆ Normalise dk to have positive first non-zero entry ; aˆ k = arg mina¯ JN (¯a, dˆ k , cˆ1 ); {aˆ2 }k = the second element of aˆ k , Given an estimate of an lower bound of the threshold parameter cˆ2 < c, aˆ k = arg mina¯ JN (¯ak−1 , dˆ k , cˆ2 ); {aˆ1 }k = the first element of aˆ k , Replace k by k + 1 and the process is repeated.

(5.12)

Similarly, we propose the following normalised iterative algorithms to estimate a and d in the Hammerstein system with preload nonlinearity in Figure 5.3. Given the initial estimate aˆ 0 = 0 and an estimate of an upper bound of the threshold parameter cˆ > c, define ¯ c); ˆ dˆ k = arg mind¯ JN (ˆak−1 , d, Normalise dˆ k to have positive first non-zero entry; ˆ aˆ k = arg mina¯ JN (¯ak−1 , dˆ k , c); Replace k by k + 1 and the process is repeated.

(5.13)

In fact, the algorithm can also be started with dˆ 0 : Given the initial estimate dˆ 0 = 0, define ˆ aˆ k = arg mina¯ JN (¯a, dˆ k−1 , c); ¯ c); dˆ k = arg mind¯ JN (ˆak , d, ˆ Normalise dˆ k to have positive first non-zero entry; update aˆ k accordingly with the normalisation factor; Replace k by k + 1 and the process is repeated.

(5.14)

To analyse the convergence properties of the above algorithms (5.12), (5.13), and ˆ c) (5.14), we write the cost function JN (ˆa, d, ˆ in (5.11) as follows,     1 N n g1 (ut−i , c) g1 (ut−i , c) ˆ ˆ ˆ ˆ = ∑ [ ∑ (di (a1 , a2 ) − di (aˆ1 , aˆ2 ) + vt ]2 . JN (ˆa, d, c) ˆ g2 (ut−i , c) g2 (ut−i , c) N t=1 i=1 We are interested in the behaviour of the algorithm when N is large enough. Let N → ∞, and define

5

Iterative Identification of Hammerstein Systems

  ˆ ˆ j) = E[ g1 (ut−i , c) A(i, ˆ g2 (ut− j , c))], ˆ · (g1 (ut− j , c), g2 (ut−i , c) ˆ   g (u , c) A(i, j) = E[ 1 t−i · (g1 (ut− j , c), g2 (ut− j , c))], g2 (ut−i , c)   g (u , c) A¯ 1 (i, j) = E[ 1 t−i ˆ g2 (ut− j , c))], ˆ · (g1 (ut− j , c), g2 (ut−i , c)   g (u , c) ˆ A¯ 2 (i, j) = A¯ T1 ( j, i) = E[ 1 t−i · (g1 (ut− j , c), g2 (ut− j , c))] g2 (ut−i , c) ˆ

59

(5.15)

where i, j = 1, 2, ..., n. When the noise vt is i.i.d. random sequence with zero mean and finite variance σv2 , the cost function will become n

n

ˆ c) ˆ c) ˆ j)ˆadˆj ˆ = ∑ ∑ [di aT A(i, j)ad j + dˆi aˆ T A(i, J(ˆa, d, ˆ = lim JN (ˆa, d, N→∞

i=1 j=1

−di a A¯ 1 (i, j)ˆaT dˆj − dˆi aˆ T A¯ 2 (i, j)ad j ] + σv2 . (5.16) T

Similarly, define ⎞ gi (ut−1 , c) ⎜ gi (ut−2 , c) ⎟ ⎟ ⎜ B(i, j) = E[⎜ ⎟ · (g j (ut−1 , c), g j (ut−2 , c), · · · g j (ut−n , c))], .. ⎠ ⎝ . ⎛



gi (ut−n , c)

⎞ gi (ut−1 , c) ˆ ⎜ gi (ut−2 , c) ˆ ⎟ ⎟ ˆ j) = E[⎜ B(i, ˆ g j (ut−2 , c), ˆ · · · g j (ut−n , c))], ˆ ⎜ ⎟ · (g j (ut−1 , c), .. ⎝ ⎠ . ˆ gi (ut−n , c)

⎞ gi (ut−1 , c) ⎜ gi (ut−2 , c) ⎟ ⎟ ⎜ ˆ g j (ut−2 , c), ˆ · · · g j (ut−n , c))], ˆ B¯ 1 (i, j) = E[⎜ ⎟ · (g j (ut−1 , c), .. ⎠ ⎝ . ⎛

B¯ 2 (i, j) =

gi (ut−n , c) T B¯ 1 ( j, i)

(5.17)

where i, j = 1, 2. The cost function in (5.16) can be written in the following equivalent form, 2

2

ˆ c)) ˆ c)) ˆ j)dˆ aˆ j (5.18) J(ˆa, d, ˆ = lim JN (ˆa, d, ˆ = ∑ ∑ [ai dT B(i, j)da j + aˆidˆ T B(i, N→∞

i=1 j=1

−ai dT B¯ 1 (i, j)dˆ aˆ j − aˆi dˆ T B¯ 2 (i, j)da j ] + σv2 (5.19) The convergence property of algorithm (5.12) for Hammerstein system with saturation nonlinear part is shown in the following theorem.

60

Y. Liu and E.-W. Bai

f (u)

−c

.. .... .. .... ... ... .... .

... ... .. ... ... .. ... .. ...

a1 u + a2

c

ut

a1 u − a2

Fig. 5.3: Nonlinear part with preload

Theorem 5.2. Consider the Hammerstein system with saturation nonlinearity (5.11). Assume an upper bound cˆ = cˆ1 > c, and lower bound cˆ = cˆ2 < c for the threshold value c are available. In addition, assume the input ut is i.i.d. with symmetric distribution that have positive support at points both larger and smaller than cˆ1 , cˆ2 such that Eg2i (u, cˆ j ) > 0, i, j = 1, 2, and the initial estimate dˆ T0 d = 0. Then the normalised iterative algorithm (5.12) converges to the true parameters a and d in one step provided that N → ∞. The similar convergence results can be established for the pre-load nonlinearity. However, the proof is tedious. For simplicity, we only present the result for a uniform distribution input. Theorem 5.3. Consider the Hammerstein system with preload nonlinearity (5.11), ˆ i) > 0, i = 1, ..., n. Assume the initial the cost function (5.16) or (5.19) with A(i, estimate dˆ T0 d = 0 and the input ut is i.i.d. uniformly distributed on [−U,U], where U ≥ c is a known upper bound of c. Then, the normalised iterative algorithm (5.13) or (5.14) converges to the true parameters a and d in one step provided that N → ∞. Next we need to obtain an estimate for the discontinuity point c in the Hammerstein system with preload. When the input ut be i.i.d. uniformly distributed in [−U,U], 1 the probability density function of ut is q(u) = 2U . Consider the correlation between yt and ut−1 . We have U 3 − c3 U 2 − c2 + a2 d1 3U 2U a1 d1 3 a2 d1 2 a1 d1U 2 a2 d1U c − c + + − E(yt ut−1 ) = 0 ⇒− 3U 2U 3 2

E(yt ut−1 ) = a1 d1 E(ug1 (u)) + a2d1 E(ug2 (u)) = a1 d1

We can solve last equation (5.20) above to get an estimate c. ˆ The correlation can be estimated by the sample mean,

5

Iterative Identification of Hammerstein Systems

yt ut−1 =

61

1 N ∑ yt ut−1 . N − 1 t=2

The parameters a1 , a2 , d1 in (5.20) are replaced by the estimates from the iterative algorithm. It can be proved that cˆ → c as N → ∞. Thus all the parameters in the systems with preload nonlinear part are estimated.

5.4 Examples The first example shows the convergence of the iterative algorithm (5.4). The true parameters in the Hammerstein system with IIR nonlinear part are a = (0.9545, 0.2983)T , b = (3, −2)T , d = (0.3, 0.2, 0.1)T . The simulation conditions are as follows. ut is uniformly distributed in [−1, 1], N = 2500, bˆ 0 = (1, 0)T , dˆ 0 = (0.1, 0.4, 0.4)T . The nonlinear functions are g1 (u) = u, g2 (u) = u3 , the noise vt is white Gaussian with zero mean and standard deviation 0.1. Using the normalised iterative algorithm, after one step we get aˆ = (0.9537, 0.3008)T , bˆ = (2.9793, −1.9850)T , dˆ = (0.2997, 0.2000, 0.1000)T . The estimates are very close to the true values. To show how the estimates beˆ b) ˆ − have with different values of N, we calculated the squared error eT e, e = (ˆa, d, (a, d, b) for N = 100 to 10000. For each N, the squared error is the average of 100 Monte Carlo simulation. The error is plotted in Figure 5.4 showing that the squared error goes to zero as N → ∞ in probability. The convergence of algorithm (5.4) depends on the odd symmetry of the nonlinear block. When the nonlinear block is not perfectly odd, we give the following example to show the method is not very sensitive on the odd assumption. Let the nonlinear functions be g1 (u) = u3 , g2 (u) = u2 . The true parameters are a = (0.9998, 0.0200)T , b = (3, −2)T , d = (0.4, 0.2, 0.1)T . The nonlinear block has a small even component and is not perfectly odd here. The simulation conditions are as follows. ut is uniformly distributed in [−1, 1], N = 5000, bˆ 0 = (1, 0)T , dˆ 0 = (0.05, 0.05, 0.05)T . The noise vt is white Gaussian with zero mean and standard deviation 0.1. After one iterative step, we get the estimates aˆ = (0.9998, 0.0188)T , bˆ = (2.9999, −1.9993)T , dˆ = (0.3998, 0.2000, 0.1000)T . The estimates are still close to the true values in this example of non-odd case. We then provide a numerical example to show the efficiency of the iterative algorithm (5.12) in Hammerstein system with saturation nonlinear part. The true parameters are a = (1, 3.5)T , d = (0.9058, 0.3397, −0.2265, −0.1132)T , c = 3.5.

62

Y. Liu and E.-W. Bai

0.16 0.14 0.12

Squared Error

0.1 0.08 0.06 0.04 0.02 0

0

2000

4000

6000

8000

10000

N

Fig. 5.4: Estimation error for Hammerstein models with IIR linear block

In simulation, the input ut is uniformly distributed in [−10, 10], N = 4000, aˆ 0 = (0.5, 4.9)T . Noise vt is white Gaussian with zero mean and standard deviation 0.1. Using the normalised iterative algorithm, after one step we get dˆ = (0.9052, 0.3399, −0.2274, −0.1161)T . Using an upper bound cˆ1 = 9.8 and lower bound cˆ2 = 1 of true parameter, we get aˆ = (0.9878, 3.3960)T , It then can be used to estimate the parameter c, cˆ = 3.4380. The mean squared errors for different value of N are plotted in Figure 5.5. Thus the algorithm (5.12) gives very close estimates for the true parameters. The last example is a Hammerstein system with preload piecewise-linear nonlinear part. We will use it to verify the algorithm (5.13) and (5.14). The true parameters are a = (6, 2)T , d = (0.7947, 0.2649, −0.5298, −0.1325)T , and c = 2. The simulation conditions are as follows. ut is uniformly distributed in [−10, 10], N = 5000, aˆ 0 = (4, −3)T , cˆ = 4 is an upper bound of the true value c = 2. Noise vt is white Gaussian with zero mean and standard deviation 0.1. Using the normalised iterative algorithm, after one step we get aˆ = (5.9495, 2.0648)T , dˆ = (0.7973, 0.2685, −0.5249, −0.1292)T .

5

Iterative Identification of Hammerstein Systems

63

0.1 0.09 0.08

Squared Error

0.07 0.06 0.05 0.04 0.03 0.02 0.01 0

0

2000

4000

6000

8000

10000

N

Fig. 5.5: Estimation error for Hammerstein models with saturation nonlinear block

7

6

Squared Error

5

4

3

2

1

0 0.2

0.4

0.6

0.8

1

1.2 N

1.4

1.6

1.8

2 x 10

4

Fig. 5.6: The true (solid) and the estimated (dash-dot) nonlinearities

To estimate c more accurately, we use the input ut uniformly distributed in [−4, 4], and choose the real solution of the equation (5.20). The estimate is cˆ = 2.0140. Figure 5.6 demonstrates the average squared errors with different value of N. This shows that the algorithm is effective.

64

Y. Liu and E.-W. Bai

One point that need to be emphasised is that the estimates in the examples are ˆ b), ˆ not J(ˆa, d, ˆ b). ˆ But given the assumptions on the obtained by minimising JN (ˆa, d, noise, the following consistency result is available [8, 10], lim θˆN = θo w.p. 1, θo = argminθ ∈Rm lim EJN (θ ).

N→∞

N→∞

ˆ b}, ˆ θo = {a, d, b}. Here θˆN = {ˆa, d,

5.5 Conclusion Iterative algorithm is a simple and efficient approach in parameter estimation. The convergence of the iterative method on IIR Hammerstein system is not known in current literature. A normalised iterative algorithm with some given initial conditions is proposed for the system in this chapter. The convergence property is analysed and illustrated with an example. Hammerstein systems with non-smooth piecewise-linear nonlinear part are often encountered in real processes operating differently in different input intervals. In this chapter, normalised iterative algorithms are applied to identification of Hammerstein systems with two kinds of odd piecewise-linear nonlinearities, the saturation and the preload nonlinear structures. By using a random input with symmetric distribution, the iterative algorithm is shown to be convergent in one step. The results are supported by examples of both systems. The convergence results of the iterative algorithms in such systems are new. The algorithms converge fast and is easy to compute. Although some prior knowledge of the nonlinear part structure and lower and upper bound of the critical points is needed, the method is still useful and provides a simple way to get convergent estimates. The numerical examples show that the suggested approach could be a viable method to try in such systems.

References 1. Bai, E.W.: Iterative identification of Hammerstein systems. Automatica 43, 346–354 (2006) 2. Bai, E.W.: Identification of linear system with hard input nonlinearities of known structure. Automatica 38, 853–860 (2002) 3. Bai, E.W., Li, D.: Convergence of the iterative Hammerstein system identification algorithm. IEEE Trans. on Auto. Contr. 49, 1929–1940 (2004) 4. Bai, E.W., Liu, Y.: Least squares solutions of bilinear equations. System & Control Letters 55, 466–472 (2006) 5. Cerone, V., Regruto, D.: Parameter bounds for discrete-time Hammerstein models with bounded output errors. IEEE Trans. on Auto. Contr. 48, 1855–1860 (2003) 6. Chen, H.F.: Pathwise convergence of recursive identification algorithms for Hammerstein systems. IEEE Trans. on Auto. Contr. 49, 1641–1649 (2004) 7. Greblicki, W.: Continuous time Hammerstein system identification. IEEE Trans. on Auto. Contr. 45, 1232–1236 (2000)

5

Iterative Identification of Hammerstein Systems

65

8. Ljung, L.: System Identification: Theory for the User. Prentice-Hall, New Jersey (1999) 9. Narendra, K.S., Gallman, P.G.: Continuous time Hammerstein system identification. IEEE Trans. on Auto. Contr. 11, 546–550 (1966) 10. Ninness, B., Gibson, S.: Quantifying the accuracy of Hammerstein model estimation. Automatica 38, 2037–2051 (2002) 11. Wolodkin, G., Rangan, S., Poolla, K.: New results for Hammerstein system identification. In: Proc. of CDC, pp. 697–702 (1995) 12. Stoica, P.: On the convergence of an iterative algorithm used for Hammerstein systems. IEEE Trans. on Auto. Contr. 26, 967–969 (1981) 13. V¨or¨os, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33, 1141–1146 (1997) 14. V¨or¨os, J.: Recursive identification of Hammerstein systems with discontinuous nonlinearities containing dead-zones. IEEE Trans. on Auto. Contr. 48, 2203–2206 (2003)

Part III

Stochastic Methods

Chapter 6

Recursive Identification for Stochastic Hammerstein Systems Han-Fu Chen

6.1 Introduction By the Hammerstein system we mean a cascading system composed of a nonlinear memoryless block f (·) followed by a linear subsystem. In a deterministic setting, the linear part of the system is characterised by a rational transfer function (RTF) and the system output yk is exactly observed. However, in practice the system itself may be random and the observations may be corrupted by noises. So, it is of practical importance to consider stochastic Hammerstein systems as shown in Figure 6.1.

𝜉𝑘 𝑢𝑘

 𝑓(⋅)

𝑣𝑘



Linear Subsystem

𝑦𝑘

 C



𝑧𝑘

Fig. 6.1: Hammerstein system

Identification of Hammerstein systems has been an active research area for many years. The topic of the chapter is to recursively identify the SISO stochastic Hammerstein system. Let us specify systems to be identified in the chapter. The linear subsystem is described by an ARMAX system A(z)yk+1 = B(z) f (uk ) + C(z)wk+1 , k ≥ 0

(6.1)

or by an ARX system Han-Fu Chen Key Laboratory of Systems and Control, Institute of Systems Science, AMSS, Chinese Academy of Sciences, Beijing 100190, P.R. China e-mail: [email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 69–87. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com 

70

H.-F. Chen

A(z)yk+1 = B(z) f (uk ) + wk+1 ,

(6.2)

where {wk } is the system noise and z is the backward-shift operator: zyk = yk−1 . So, here the stochastic Hammerstein system is concerned. The system input is {uk }, but the input of the linear subsystem is Δ

vk = f (uk ), where f (·) is an unknown static nonlinearity. The output of the system is {yk }, but the observation zk is with additive noise ξk zk = yk + ξk .

(6.3)

By assuming the orders (p, q, r) be known, identification of the linear subsystem is to estimate the unknown coefficients in the polynomials: Δ

A(z) = 1 + a1z + · · · + a pz p , p ≥ 0, a p = 0, Δ

B(z) = 1 + b1z · · · + bqzq , q ≥ 0, bq = 0, Δ

C(z) = 1 + c1z · · · + cr zr , r ≥ 0, cr = 0.

(6.4)

It is worth noting that it is not a restriction that all polynomials are monic. First, dividing by a constant we can obtain a monic A(z), then by changing the variance of wk we can make C(z) to be monic. Finally, any constant factor b0 of B(z) may be considered as a multiple of f (·), and instead of f (·) a new unknown function Δ

g(·) = b0 f (·) may serve as the system nonlinearity. As the nonlinearity concerns, both nonparametric and parametric f (·) are considered in the chapter. For the nonparametric f (·), the value f (u) is estimated for any fixed u. In the parametric case, f (·) either is expressed by a linear combination of known basis functions with unknown coefficients, or is a piecewise linear function with unknown joints and slopes, and hence identification of the nonlinear block in this case is equivalent to estimating unknown parameters. Various parameter estimation methods such as the extended least squares (ELS), instrumental variables, iterative optimisation and many others are used to identify Hammerstein systems with parametrised nonlinearities, see, e.g., [1-3, 7, 10, 15-19] among others. When the nonlinearity is not parametrised, it is usually estimated with the help of kernel functions, see, e.g., [11, 12, 21, 22] among others. For the case where the data set is with fixed size, an optimisation approach to estimating unknown parameters may be appropriate, but in the case where the data size is increasing or the number of data is extremely large though the data size is fixed, the recursive estimation algorithms is preferable to apply, because in this case the estimates are convenient to be updated when a new data is taken into account. In the chapter we only consider the recursive algorithms to estimate not only the nonlinearity of the system but also the unknown parameters contained in the linear

6

Recursive Identification for Hammerstein Systems

71

part of the system. All estimates are proved to converge a.s. to the corresponding true values. The rest of the chapter is arranged as follows. In Section 6.2 the problem is solved for the Hammerstein system with nonparametric f (·). In Section 6.3 the systems with the nonlinearities being piecewise linear functions are identified, while in Section 6.4 it is dealt with Hammerstein system identification with nonlinearity expanded to a series of basis functions with unknown coefficients. Some concluding remarks are given in Section 6.5. In the Appendix a general convergence theorem (GCT) is presented, which is used for convergence analysis of the proposed recursive identification algorithms.

6.2 Nonparametric f (·) In this section we consider identification of the Hammerstein system consisting of (6.2) and (6.3) with f (·) non-parametrised [6,22]. Let us fix an arbitrary u and estimate f (u). As the system input, let us take {uk } to be a sequence of independent and identically distributed (iid) random variables with density p(·) such that Euk = 0, |uk | ≤ u∗ where u∗ > 0 is a constant, u∗ > |u|, and p(·) is continuous at u with p(u) > 0. Assume uk = 0 for k < 0. Let Δ μ  E f (uk ) , R1  E( f (uk ) − μ )2 , Euk f (uk ) = ρ = 0. We need the following assumptions: A1. All roots of A(z) = 0 are outside the closed unit disk; A2. 1 + b1 + · · · + bq = 0; A3. Both {ξk } and {wk } are sequences of iid random variables with Ewk = E ξk = 0, R2  Ew2k , R3  E ξk2 , R2 < ∞, R3 < ∞, and ξk = wk = 0 for k < 0, and the sequences {uk }, {ξk }, and {wk } are mutually independent. A4. The rational functions B(z)B(z−1 )R1 + R2 and A(z)A(z−1 ) have no common zero; A5. The function f (·) is measurable, locally bounded and continuous at u, where f (u) is estimated.

6.2.1 Identification of A(z) It is clear that the process {zk } with selected {uk } is asymptotically stationary if A1, A3 hold. By stability of A(z) the influence of initial values exponentially decays. So, without loss of generality, for simplicity of writing we may directly assume {zk } is stationary: b Δ μ ∗ = Ezk (= μ ), and a Δ

γ (τ ) = E(zk+τ − μ ∗ )(zk − μ ∗ ), τ ≥ 0, where

(6.5)

72

H.-F. Chen Δ

p

Δ

b

Δ

Δ

a = ∑ ai , b = ∑ bi with a0 = 1, b0 = 1. i=0

(6.6)

i=0

From (6.2) and (6.3) it follows that A(z)zk+1 = B(z) f (uk ) + A(z)ξk+1 + wk+1 , which can be rewritten as b A(z)(zk+1 − μ ) = B(z)( f (uk ) − μ ) + A(z)ξk+1 + wk+1 . a

(6.7)

Multiplying both sides of (6.7) by (zk−s − ba μ ), s ≥ p ∨ q and taking expectation by A3 lead to

γ (s + 1) + a1γ (s) + · · · + a pγ (s + 1 − p) = 0, s ≥ p ∨ q.

(6.8)

Setting s = p ∨ q, p ∨ q + 1, · · · , p ∨ q + p − 1 in (6.8), we derive the following linear algebraic equation: ⎤ ⎡ ⎡ ⎤ a1 γ (p ∨ q + 1) ⎢ γ (p ∨ q + 2)⎥ ⎢ a2 ⎥ ⎥ ⎢ ⎢ ⎥ (6.9) T ⎢ . ⎥ = −⎢ ⎥, .. ⎦ ⎣ ⎣ .. ⎦ . ap where



γ (p ∨ q) γ (p ∨ q + 1) .. .

γ (p ∨ q + p)

⎤ · · · γ (p ∨ q + 1 − p) · · · γ (p ∨ q + 2 − p)⎥ ⎥ ⎥. .. .. ⎦ . .

⎢ ⎢ T ⎢ ⎣ γ (p ∨ q + p − 1) · · ·

(6.10)

γ (p ∨ q)

The coefficients of A(z) are estimated as follows. By ergodicity of {zk }, we recursively estimate μ ∗ and γ (τ ) by 1 ∗ 1 μk∗ = (1 − )μk−1 + zk k k and

(6.11)

1 1 ∗ ∗ γk (τ ) = (1 − )γk−1 (τ ) + (zk − μk− (6.12) τ )(zk−τ − μk−τ ). k k Replacing γ (i) in (6.9) with γk (i) obtained by (6.11) and (6.12), i = p ∨ q + 1 − p, · · · , p ∨ q + p, we derive the following equation for defining estimates ak ( j) for a j , j = 1, · · · , p : ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ak (1) γk (p ∨ q) · · · γk (p ∨ q + 1 − p) γk (p ∨ q + 1) ⎢ γk (p ∨ q + 1) · · · γk (p ∨ q + 2 − p)⎥ ⎢ ak (2) ⎥ ⎢ γk (p ∨ q + 2)⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ · ⎢ .. ⎥ = − ⎢ ⎥ . (6.13) .. .. .. .. ⎣ ⎦ ⎣ . ⎦ ⎣ ⎦ . . . . γk (p ∨ q + p − 1) · · · γk (p ∨ q) γk (p ∨ q + p) ak (p)

6

Recursive Identification for Hammerstein Systems

73

Theorem 6.1. Assume A1, A3, A4, and A5 hold. Then ak (i) −−−→ ai a.s. for i = 1, · · · , p. k→∞

Proof. By ergodicity of {zk } we have

μk∗ −−−→ μ ∗ a.s. and γk (τ ) −−−→ γ (τ ) a.s. ∀τ ≥ 0. k→∞

k→∞

This means that the key point is to prove that the matrix T defined by (6.10) is nonsingular. The proof is similar to that given in [20] for the case R2 = 0. By A1 there is an R > 1 such that the spectral function g(z) of {zk } is analytic in B  {z : R1 < |z| < R}, where g(z) =

∞ R1 B(z)B(z−1 ) + R2 + R3 A(z)A(z−1 ) = ∑ γ (i)zi . −1 A(z)A(z ) i=−∞

(6.14)

Assume the converse: T is singular. Then there exists a vector h = [h1 , · · · , h p ]T = 0 such that T h = 0, i.e., p

∑ h j γ (p ∨ q + i − j) = 0, i = 1, · · · , p,

(6.15)

j=1

which by (6.9) implies p

p

∑ h j γ (p ∨ q + p + 1 − j)

p

= − ∑ h j ∑ al γ (p ∨ q + p + 1 − j − l) j=1

j=1

l=1

p

p

l=1

j=1

= − ∑ al ∑ h j γ (p ∨ q + p + 1 − j − l) = 0, and hence (6.15) is valid for all i ≥ 1. p

Δ

Consider h(z) = g(z)z p∨q−p ∑ h j z p− j . By (6.14)(6.15) we have j=1

h(z) =

p



j=1

l=0

∑ h j ∑ γ (l + j − p ∨ q)zl .

(6.16)

Thus, h(z) is bounded in |z| < 1. On the other hand, by A4 there is no pole-zero cancelation in   R1 B(z)B(z−1 ) + R2 + R3 A(z)A(z−1 ) z p∨q p∨q−p g(z)z = . A(z)A(z−1 )z p By A1, A(z)A(z−1 )z p has p roots in the open disk with radius |z| = R1 . Therefore, h(z) has at least one pole in the open disk with radius |z| = R1 . This contradicts

74

H.-F. Chen

with the boundedness of h(z) in |z| < 1. The obtained contradiction shows that T is   nonsingular. Then, (6.13) gives consistent estimates for ai , i = 1 · · · , p.

6.2.2 Identification of B(z) In what follows βk (0) is the estimate for ρ  Eu1 f (u1 ) and βk (i) is for ρ bi . So, βk (i) d ∗ β (0) is the estimate for bi , i = 1, · · · , q, whenever βk (0) = 0. Since μ = c μ , k

p

1 + ∑ ak (i)

μk  μk∗

i=1

1+

1 βk (0)

(6.17)

q

∑ βk (i)

i=1

serves as the estimate for μ (= E f (u1 )), if ρ = 0 and A2 holds. We apply the stochastic approximation algorithms with expanding truncations (SAAWET) [5] to estimate bi , i = 0, 1, · · · , q. For convenience of reading, the GCT of SAAWET is attached at the end of the chapter. Let {Mk } be a sequence of positive numbers increasingly diverging to infinity: Mk > 0, Mk+1 > Mk ∀ k, and Mk −−−→ ∞. k→∞

Setting σ0 (i) = 0, i = 0, 1, · · · , q, with an arbitrary initial value β0 (i), recursively define

βk+1 (i) = [βk (i) − 1k (βk (i) − uk (zk+1+i + ak−p (1)zk+i + · · · + ak−p(p)zk+1+i−p ))] (6.18) ·I[|β (i)− 1 (β (i)−u (z ], k k k k+1+i +ak−p (1)zk+i +···+ak−p (p)zk+1+i−p ))|≤M σk (i)

k

σk (i)

k−1

= ∑ I[|β j (i)− 1 (β j (i)−u j (z j+1+i +a j−p (1)z j+i +···+a j−p (p)z j+1+i−p))|>M σ j=1

j

j (i)

].

Theorem 6.2. Assume A1, A3, A4, and A5 hold. Then the recursive estimates βk (i), i = 0, 1, · · · , q, are strongly consistent: Δ

βk (i) −−−→ ρ bi , and bk (i) = k→∞

βk (i) −−−→ bi , a.s., i = 0, 1, · · · , q. βk (0) k→∞

Proof. We rewrite (6.18) as 1 1 βk+1 (i) = (βk (i) − (βk (i) − ρ bi) − εk+1 (i)) k k · I[|β (i)− 1 (β (i)−ρ bi )− 1 ε (i)|≤M ] k k k+1

(6.19)

εk+1 (i) = ρ bi − uk (zk+1+i + ak−p(1)zk+i + · · · + ak−p(p)zk+1+i−p ), i = 0, 1, · · · , q.

(6.20)

k

k

σk (i)

where

6

Recursive Identification for Hammerstein Systems

Define

75

m

1 ≤ t} i=n i

m(n,t)  max{m : n ≤ m, ∑ By GCT it suffices to show, for i = 0, 1, · · · , q, lim lim sup

T →∞ n→∞

1 m(n,t) 1 | ∑ ε j+1 (i)| = 0 a.s., ∀ t ∈ (0, T ). T j=n j

(6.21)

By (6.2), we rewrite (6.20) as (1)

(2)

(3)

εk+1 (i) = εk+1 (i) + εk+1 (i) + εk+1 (i), where

  Δ (1) εk+1 (i) = ρ bi − uk f (uk+i ) + b1 f (uk+i−1 ) + · · · + bq f (uk+i−q ) + ξk+1+i   Δ (2) εk+1 (i) = uk (a1 − ak−p(1))yk+i + · · · + (a p − ak−p(p))yk+1+i−p   Δ (3) εk+1 (i) = −uk wk+1+i + ak−p (1)ξk+i + · · · + ak−p(p)ξk+1+i−p .

By the convergence theorem for the sum of martingale difference sequences [4, 9], we have for i = 0, 1, · · · , q ∞

1

∑ k εk+1 (i) < ∞ (1)



a.s.

k=1

1

∑ k εk+1 (i) < ∞ (3)

a.s.

k=1

(1)

(3)

Therefore, (6.21) is satisfied with ε j+1 (i) replaced by εk+1 (i) and εk+1 (i). By A1, A3, A5, and the boundedness of {uk }, applying Theorem 6.1 leads to ∞

1

∑ k εk+1 (i) < ∞ (2)

a.s.

k=1

Thus, (6.21) has been verified, and the proof is completed.

 

Remark 6.1. In the proof of Theorems 6.1 and 6.2 the condition A5 is only partly used. As a matter of fact, the requirement “ f (·) is continuous at u, where f (u) is estimated” is not needed.

6.2.3 Identification of f (u) For estimating f (u), we will use the kernel function defined below:

χk  ! ∗ u

−u∗

where τk =

1 kδ

e e

u −u −( kτ )2 k

2 −( x−u τ ) k

, p(x)dx

, with δ fixed, δ ∈ (0, 12 ). Similarly to [6], it can be shown that

76

H.-F. Chen

√ E χk = 1, E χk f (uk ) −−−→ f (u), and sup E( τk χk )2 < ∞. k→∞

(6.22)

k

Setting λ0 (u) = 0, with an arbitrary initial value η0 (u), we define the estimate fk (u) for f (u). Let ηk (u) be recursively calculated according to the following algorithm:

ηk+1 (u) 1 =[ηk (u) − (ηk (u) − (χk − 1)(zk+1 + ak−p(1)zk + · · · + ak−p(p)zk+1−p ))] k · I[|η (u)− 1 (η (u)−(χ −1)(z +a (1)z +···+a (p)z (6.23) ))|≤M ], k

λk (u) =

k

k

k

k+1

k−p

k

k−p

λk (u)

k+1−p

k−1

∑ I[|η j (u)− 1j (η j (u)−(χ j −1)(z j+1+a j−p (1)z j +···+a j−p(p)z j+1−p))|>Mλ j (u) ] .

j=1

Then fk (u)  ηk (u) + μk serves as the estimate for f (u). Theorem 6.3. Assume A1-A5 hold and ρ = 0. Then

ηk (u) −−−→ f (u) − E f (u1 ) a.s. and ηk (u) + μk −−−→ f (u) a.s., k→∞

k→∞

where {ηk (u)} is generated by (6.23), and { μk } is given by (6.17). Proof. Similar to the proof of Theorem 6.2, by GCT given in Appendix, we only need to show lim lim sup

T →∞ n→∞

1 m(n,t) 1 | ∑ e j+1 | = 0 a.s., ∀ t ∈ (0, T ), T j=n j

(6.24)

where ek+1 = ( f (u) − E f (u1 )) − (χk − 1)(zk+1 + ak−p(1)zk + · · · + ak−p(p)zk+1−p ).

(6.25)

By (6.2), ek+1 is expressed as ek+1 = ( f (u) − E f (u1 )) − (χk − 1)( f (uk ) + b1 f (uk−1 ) + · · · + bq f (uk−q ) + ξk+1 + (ak−p(1) − a1)yk + · · · + (ak−p (p) − a p)yk+1−p + εk+1 + ak−p(1)εk + · · · + ak−p(p)εk+1−p ). By (6.22), A1, A3, and Theorem 6.1, using the boundedness of { f (uk )} and convergence theorem for the sum of martingale difference sequences [4, 9], similar to the proof in Theorem 6.2, we have

6

Recursive Identification for Hammerstein Systems ∞

77

1

∑ k (χk − 1)(b1 f (uk−1 ) + · · · + bq f (uk−q ) + ξk+1) < ∞

a.s.,

k=1 ∞

1

∑ k (χk − 1)(εk+1 + ak−p(1)εk + · · · + ak−p(p)εk+1−p ) < ∞

a.s.,

k=1



1

∑ k ( f (uk ) − E f (u1)) < ∞

a.s.,

k=1

and





k=1

1 k (χk − 1)(ak−p (1) − a1 )yk

< ∞ a.s.,

.. .



∑ 1k (χk − 1)(ak−p(p) − a p)yk+1−p < ∞ a.s.

k=1

So, for (6.24) it remains to show lim lim sup

T →0 n→∞

1 m(n,t) 1 | ∑ ( f (u) − χ j f (u j ))| = 0 a.s., ∀ t ∈ (0, T ). T j=n j

(6.26)

This is verified by writing f (u) − χ j f (u j ) = f (u) − E χ j f (u j ) + E χ j f (u j ) − χ j f (u j ), and noticing



1

∑ k (χk f (uk ) − E χk f (uk )) < ∞

a.s.

k=1

and m(n,t)

|



j=n

1 ( f (u) − E χ j f (u j ))| = o(1) · O(T ) = o(T ), j

which follows from (6.22). Thus, ηk (u) −−−→ f (u)− E f (u1 ), and the last conclusion k→∞

 

of the theorem follows by (6.17) and Theorems 6.1 and 6.2.

Remark 6.2. If E ξk4 < ∞ and E εk4 < ∞, more accurate results can be obtained. In 1 2

fact, it can be shown that μk∗ − μ ∗ = O( (log log1 k) ) a.s., γk (τ ) − γ (τ ) = O( log1 k ) a.s. k2

k2

for τ ≥ 0, and ck (i) − ci = O( log1 k ) a.s., where “ = O” means “ ≤ α ”, where α is a k2

constant which is independent of k but may depend on sample path.

6.3 Piecewise Linear f (·) We continue to consider the Hammerstein system consisting of (6.2) and (6.3) with f (·) being a piecewise linear function (possibly discontinuous) [7] containing six unknown parameters c+ , c− , b+ , b− , d + , d − expressed as

78

H.-F. Chen

⎧ + + + + ⎪ ⎨c (u − d ) + b , u > d f (u) = 0, −d − ≤ u ≤ d + ⎪ ⎩ − − − c (u + d ) − b , u < −d − .

(6.27)

We keep Assumptions A2-A4 made in Section 6.2 unchanged, but strengthen A1 to A1’ and replace A5 with A5’. A1’. All roots of A(z) = 0 and B(z) = 0 are outside the closed unit disk. A5’. The unknown nonlinearity is expressed by (6.27), and an upper bound U for d + and d − is available 0 ≤ d + < U and 0 ≤ d − < U. Let us take {uk } to be a sequence of iid random variables with uniform distribution over [−2U, 2U] and independent of {ξk } and {wk }. Paying attention to Remark 1, we see that under A1’, A2-A4, and A5’ Theorems 6.1 and 6.2 remain valid for the Hammerstein system with nonlinearity (6.27). Hence, as before we have the strongly consistent estimates ak (i) and bk ( j) defined in Section 6.2 for ai and b j , respectively, i = 1, · · · , p, j = 1, · · · , q. Δ

Define the q−dimensional vector H T = [1, · · · , 0] and ⎤ ⎡ ⎡ −bk (1) 1 −b1 1 0 · · · 0 ⎢−bk (2) 0 ⎢−b2 0 1 · · · 0⎥ ⎥ ⎢ ⎢ B⎢ . ⎥ and Bk  ⎢ .. .. ⎦ ⎣ . ⎣ .. . 0 1 0 −bq

0

0···0

⎤ 0 0⎥ ⎥ ⎥. 1⎦ −bk (q) 0 0 · · · 0 0

0

0··· 1··· .. .

(6.28)

From (6.2) it follows that A(z)yk+1 = B(z)vk + wk+1 , vk = H T xk , where xk = Bxk−1 + H(A(z)yk+1 − wk+1 ) is recursively defined with some initial value x0 . In what follows by Ak (z) we mean the polynomial obtained from A(z) with coefficients ai replaced by its estimate ak (i), i = 1, · · · , p, given in Section 6.2. Therefore, Δ

Ak (z)yk+1 = yk+1 + ak (1)yk + · · · + ak (p)yk−p+1 , and Ak (z)ξk+1 and Ak (z)zk+1 are similarly defined. Then, the estimate vˆk for vk is naturally defined as Δ

vˆk = H T xˆk , xˆk = Bk xˆk−1 + HAk (z)zk+1 with an arbitrarily initial value xˆ0 . It is clear that f (u) = c+ u − h+ for u ≥ U or vk = μ +T φk+ for uk ≥ U, where Δ

μ + = [c+ , h+ ]T ,

Δ

φk+ = [uk , −1]T I[uk ≥U] .

(6.29)

6

Recursive Identification for Hammerstein Systems

79

Define Δ

m+ k = vˆk I[uk ≥U] .

(6.30)

Then μ + can be estimated by the following least squares (LS) algorithm [4, 14]: + + + + +T + μk+ = μk−1 + a+ k Pk φk (mk − φk μk−1 ),

(6.31)

+T + + −1 a+ k = (1 + φk Pk φk ) .

+ + + +T + Pk+1 = Pk+ − a+ k Pk φk φk Pk ,

(6.32)

Δ

The estimate for μ − = [c− , h− ]T is calculated in a similar way. Defining Δ

Δ

m− k = vˆk I[uk ≤−U] ,

φk− = [uk ,

1]T I[uk ≤−U] ,

(6.33)

we estimate μ − by the recursive LS algorithm: − − − − −T − μk− = μk−1 + a− k Pk φk (mk − φk μk−1 ),

(6.34)

− − − −T − Pk+1 = Pk− − a− k Pk φk φk Pk ,

(6.35)

−T − − −1 a− k = (1 + φk Pk φk ) .

Write μk+ and μk− in the component form: + − − − μk+ = [c+ k , hk ] and μk = [ck , hk ].

We now define estimates for d + , b+ , d − , and b− . It is clear that 

2U 1 (c+ u − h+)du 4U d + c+ h + + c+ U h + = − d +2 + d + − , 8U 4U 2 2

Evk I[uk ≥0] =

(6.36)

and from here it follows that d+ = ⎧ 1 ⎨ 1 [h+ − sign(h+ )h+2 + 4c+U(c+U − h+ − 2Ev I 2 k [uk ≥0] ) ], c+ + ⎩ 4U ( h + Ev I k [uk ≥0] ), h+ 2

if |c+ | > 0 if c+ = 0,

where “ − sign(h+ )” is taken to make d + to be continuous with respect to c+ as c+ → 0 for a fixed Evk I[uk ≥0] . From here it is natural to define the estimates for d + and b+ as follows dk+

 +2 1 + + + + 2 ¯+ h+ k − sign(hk ) hk + 4c¯k U(c¯k U − hk − 2m k) = , c¯+ k Δ

+ + + b+ k = ck d k − h k ,

where c¯+ k is a modification of ck to avoid possible dividing by zero:

(6.37)

80

H.-F. Chen

 Δ c¯+ k =

c+ k , 1 (sign c+ k )k,

if if

1 |c+ k | ≥ k, 1 |c+ k | < k,

(6.38)

and m¯ + k is the time average of {vˆk I[uk ≥0] } recursively defined by m¯ + k =

vˆk I[uk ≥0] k−1 + m¯ k−1 + with m¯ + 0 = 0. k k

(6.39)

− Similarly, modify c− k to c¯k :

 Δ c¯− k =

c− k, 1 (sign c− k )k,

1 if |c− k | ≥ k, − if |ck | < 1k ,

(6.40)

− − and define estimates dk− and b− k for d and b respectively:

dk− =

" #1 2 − − 2 − − − − h− − sign(h ) (h ) + 4 c ¯ U( c ¯ U − h + 2 m ¯ ) k k k k k k k c¯− k

,

(6.41)

and Δ

− − − b− k = ck d k − h k ,

(6.42)

where m¯ − k is the time average of {vˆk I[uk ≤0] } : m¯ − k =

vˆk I[uk ≤0] k−1 − m¯ k−1 + with m¯ − 0 = 0. k k

(6.43)

Theorem 6.4. Assume that A1’, A2-A4, and A5’ hold and uk is uniformly distributed over [−2U, 2U]. Then with probability one it takes place that

μk− −−−→ [c− , h− ]T ,

μk+ −−−→ [c+ , h+ ]T , k→∞

dk+ −−−→ k→∞

b+ −−→ b+ , k − k→∞

+

d ,

k→∞ dk− −−−→ d − , k→∞

and b− −−→ b− . k − k→∞

Proof. The LS estimate μk+ given by (6.31)(6.32) equals

μk+ =

"

k

∑ φi+ φi+T

i=1

#−1

k

∑ φi+ m+i ,

(6.44)

i=1

k

whenever the matrix

∑ φi+ φi+T is nonsingular [4, 14].

i=1

Since {uk } is iid with uniform distribution over [−2U, 2U], by the strong law of large numbers [9] we have

6

Recursive Identification for Hammerstein Systems

 2  1 k + +T 1 k ui −ui ∑ φi φi = k ∑ −ui 1 I[ui ≥U] k i=1 i=1 7 2  U − 38 U −−−→ 12 3 a.s., 1 − 8U k→∞ 4

81

(6.45)

which is nondegenerate. Let νk be the estimation error for vk

νk = vˆk − vk . Noticing that vk I[uk ≥U] = μ +T φk+ and +T + m+ φk + νk I[uk ≥U] , k =μ

from (6.44) we have " # k k μk+ = ( ∑ φi+ φi+T )−1 ∑ φi+ φi+T μ + + νi I[ui ≥U] . i=1

i=1

By noticing (6.45), for μk+ −−−→ μ + a.s. it suffices to to show k→∞

1 n −−→ 0 ∑ νk I[uk ∈B] −n→∞ n k=1

a.s. for any Borel set B.

(6.46)

Define xˆ¯k = Bk xˆ¯k−1 + HAk (z)yy+1 . Then, we have xˆk − xˆ¯k = Bk (xˆk−1 − xˆ¯k−1 ) + HAk (z)ξk+1

(6.47)

xˆ¯k − xk = Bk (xˆ¯k−1 − xk−1 ) + (Bk − B)xk−1 + Hwk+1 .

(6.48)

and

By (6.47)–(6.48) it follows that

νk I[uk ∈B] = H T xˆk I[uk ∈B] − H T xk I[uk ∈B] = H T (xˆk − xˆ¯k )I[uk ∈B] + H T (xˆ¯k − xk )I[uk ∈B] = H T Bk1 (xˆ0 − xˆ¯0 )I[uk ∈B] + H T

k

∑ Bk,i+1 HAi (z)ξi+1 I[uk ∈B]

i=1 k

+ H T Bk1 (xˆ¯0 − x0 )I[uk ∈B] + H T ∑ Bk,i+1 (Bi − B)xi−1I[uk ∈B] i=1

k

+ H T ∑ Bk,i+1 Hwi+1 I[uk ∈B] , i=1

(6.49)

82

H.-F. Chen

where Δ

Bni = Bn Bn−1 · · · Bi

Δ

for n ≥ i,

B ji = I

for

j < i.

By Theorems 6.1 and 6.2 Ak (z) −−−→ A(z) and Bk (z) −−−→ B(z). Then by stability n→∞

n→∞

of B(z) there exist constants c > 0 and λ ∈ (0, 1) such that Bk j  ≤ cλ k− j ,

and Bk− j  ≤ cλ k− j

∀k ≥ j.

(6.50)

Since {uk }, {wk }, and {ξk } are mutually independent and {uk } is bounded, by (6.50) it is shown [7] that the time average of each term at the right-hand side of (6.49) tends to zero. Hence (6.46) is verified. −−→ Evk I[ uk ≥ 0]. Comparing (6.37) with Notice that (6.46) also implies that m¯ + k − k→∞

+ + −−→ d + . The strong d + , by strong consistency of c+ k and hk we conclude that dk − k→∞

− consistency of μk− , m¯ − k , and dk are established in a similar way.

 

6.4 Parameterized Nonlinearity We now consider identification of the system (6.1) with f (·) expressed as a sum of linear combination of basis functions with unknown coefficients [21]. To be precise, let {g1 (x), · · · , gs (x)}, x ∈ R be a set of basis functions and let s

f (x) =

∑ d j g j (x)

with unknown d1 , · · · , ds .

j=1



Assume the system output {yk } is observed without noise, i.e., ξk = 0 in (6.3). Then the system (6.1) is expressed as s

s

A(z)yk+1 =

∑ d j g j (uk ) + b1 ∑ d j g j (uk−1) + · · ·

j=1

j=1

s

+ bq ∑ d j g j (uk−q ) + C(z)wk+1 .

(6.51)

j=1

By setting . . . . . Δ θ T = [−a1 · · · − a p..d1 · · · ds ..b1 d1 · · · b1 ds .. · · · ..bq d1 · · · bq ds ..c1 · · · cr ], (6.52) . . ϕk0T = [yk · · · yk+1−p ..g1 (uk ) · · · gs (uk ) · · · g1 (uk−q ) · · · gs (uk−q )..wk · · · wk+1−r ], (6.53) the system (6.51) is then written as yk+1 = θ T ϕk0 + wk+1 .

(6.54)

6

Recursive Identification for Hammerstein Systems

83

To identify θ in this model we may use various estimation algorithms [4], e.g., ELS, the weighted least squares (WLS), the stochastic gradient (SG), and other modifications. Let us apply the ELS with an arbitrary θ0 and P0 = α0 I for some α0 > 0:

θk+1 = θk + ρk Pk ϕk (yk+1 − θkT ϕk )

(6.55)

Pk+1 = Pk − ρk Pk ϕk ϕkT Pk

(6.56)

1 T wˆ k+1 = yk+1 − θk+1 ϕk , ρk = 1 + ϕkT Pk ϕk

(6.57)

ϕkT = [yk · · · yk+1−p g1 (uk ) · · · gs (uk ) · · · g1 (uk−q ) · · · gs (uk−q )wˆ k · · · wˆ k+1−r ]. (6.58) Remark 6.3. Let us write θk component-wisely: Δ

θkT = [−a1,k , · · · , −a p,k , d1,k , · · · , ds,k , (b1 d1 )k , · · · , (b1 ds )k , · · · , (bq d1 )k , · · · , (bq ds )k , c1,k , · · · , cr,k ]. (bi d j )k d j,k

It is clear that 0, j = 1, · · · , s.

may serve as an estimate for bi , i = 1, · · · , q, whenever d j,k =

To establish θk −−−→ θ we need the the following assumptions. k→∞

is strictly positive real (SPR), i.e., C−1 (eiλ ) + C−1 (e−iλ ) − 1 > 0, ∀λ ∈ [0, 2π ]; B2. {wn , Fn } is a martingale difference sequence and supn≥0 E[|wn+1 |β |Fn ] < ∞ a.s. for some β ≥ 2, where {Fn } is a sequence of nondecreasing σ −algebras and un is Fn −measurable. B3. A(z) is stable, i.e., A(z) = 0 ∀|z| ≤ 1; . . B4. A(z), B(z), and C(z) have no common factor and [a ..b ..c ] = 0; B1.

C−1 (z) − 12

p

q r



B5. The set of functions {gi , i = 0, · · · , s} with g0 = 1 is linearly independent over some interval [δ1 , δ2 ]; {uk } is a sequence of iid random variables with density p(x), which is positive and continuous over [δ1 , δ2 ], and 0 < Eg2i (uk ) < ∞, i = 1 · · · , s; The sequences {uk } and {wk } are mutually independent; B6. limn→∞ n1 ∑ni=1 w2i = Rw > 0 a.s.; B7. ∑si=1 di2 = 0. Theorem 6.5. If B1-B7 hold, then as n tends to infinity θn+1 − θ  = O 2

" log n log log nδ (β −2) # n

a.s.,

(6.59)

where δ (0) = c > 1 and δ (x) = 0 for x = 0. 0 (n) and λ 0 (n) the maximal and minimal eigenvalues of Proof. Denote by λmax min n 1 0 0T ∑i=0 ϕi ϕi + α0 I, respectively. By Theorem 4.2 in [4] it follows that

84

H.-F. Chen

" log λ 0 (n) log log λ 0 (n)δ (β −2) # max max a.s. as n → ∞, θn+1 − θ  = O 0 (n) λmin 2

(6.60)

 δ (β −2)  0  0 0 if log λmax (n) log log λmax (n) = o λmin (n) a.s. The conclusion (6.59) will follow from (6.60) if we can show that 1 0 (n) > 0 a.s. lim inf λmin n→∞ n The proof is similar to that given in Theorem 6.2 of [4] for (6.79). Here we only outline the key points of the proof. For (6.60) it suffices to show n 1 lim inf λmin { ∑ fi fiT } > 0 a.s., n→∞ n i=0

Δ

where fi = A(z)ϕi0 and λmin {A} denotes the minimal eigenvalue of a matrix A. For a fixed ω (sample) the converse assumption leads to the existence of a subsequence Δ

(0)

(p−1)

ηnTk = [αnk , · · · , αnk

(01)

(0s)

(q1)

(qs)

(0)

(r−1)

, βnk , · · · , βnk , · · · , βnk , · · · , βnk , γnk , · · · , γnk

]

with ηnk 2 = 1 such that 1 nk T 2 ∑ (ηnk fi ) = 0, k→∞ nk i=0

(6.61)

lim

where

ηnTk fi = μ

∑ h nk



μ

∑ h nk

j=0

(i j) j

z =

p−1 j=0

μ

p−1

j=0

(0 j) j

z =

j=0

j=0

q

∑ αnk B(z)z j+1 di + ∑ βnk

j=0

∑ h nk

μ μ (s j) (0 j)  z , · · · , ∑ hnk z j , ∑ hnk z j · [g1 (ui ), · · · , gs (ui ), wi ]T ,

(l j) j

( j)

( ji)

A(z)z j , i = 1, · · · , s,

j=0

r−1

∑ αnk C(z)z j + ∑ γnk A(z)z j .

j=0

( j)

( j)

j=0

By boundedness of {ηnk } without loss of generality we may assume that ηnk −−−→ η k→∞

with η  = 1. By a treatment similar to that used in Theorem 6.4 of [4] it is shown that (6.61) implies that η = 0. But this is impossible since 1 ≡ ηnk  −−−→ η . The obtained k→∞

contradiction proves the theorem. For details we refer to [21].

 

6

Recursive Identification for Hammerstein Systems

85

6.5 Concluding Remarks We have proposed recursive algorithms to identify the Hammerstein systems with nonlinearities i) f (·) non-parameterized; ii) f (·) being a piecewise linear function; iii) f (·) being presented as a sum of basis functions with unknown coefficients. In all three cases the estimates are proved to converge to the true values with probability one. In Sections 6.2 and 6.3 the results are obtained for (6.2), where the system noise {wk } is uncorrelated. However, the similar results may also be obtained for (6.1) with correlated system noise C(z)wk with the help of technique developed in [8] without imposing conditions like SPR on C(z). In Section 6.3 the availability of the upper bound U for d + and d − may be removed by the treatment used in [13], where a recursive algorithm is proposed to generate an upper bound for d + and d − . It would be of interest to consider the case where an internal noise exists, i.e., the input of the linear subsystem includes not only vk but also an additive noise. Besides, it might also be of interest to consider the colored observation noise ξk .

6.6 Appendix General Convergence Theorem (GCT) for SA Δ Let f (·) be an unknown function f (·) : Rl → Rl , and denote its root set by J = {x ∈ Rl : f (x) = 0}. With any initial value x0 the problem is to recursively construct {xk } to approach J on the basis of noisy observations {yk }, where yk+1 = f (xk ) + εk+1 denotes the observation at time k + 1 and εk+1 the observation noise, which may depend on xk . Take a sequence {Mk } of positive numbers increasingly diverging to infinity, and a fixed point x∗ ∈ Rl . Define {xk } by SAAWET as follows: xk+1 = (xk + ak yk+1 )I[xk +ak yk+1 ≤Mσ

k

]



+ x I[xk +ak yk+1 >Mσ ] ,

(6.62)

k

σk =

k−1

∑ I[xi +aiyi+1 >Mσi ] ,

σ0 = 0.

(6.63)

i=1

The following assumptions are to be used. S1. ak > 0, ak −−−→ 0 and ∑∞ k=1 ak = ∞. k→∞

S2. There is a continuously differentiable function (not necessarily being nonnegative) v(·) : Rl → R such that sup

δ ≤d(x,J)≤Δ

f T (x)vx (x) < 0

86

H.-F. Chen Δ

for any Δ > δ > 0, where d(x, J) = infy {x − y : y ∈ J} and vx (·) denotes the Δ

gradient of v(·). Further, v(J) = {v(x) : x ∈ J} is nowhere dense, and x∗ used in (6.62) is such that v(x∗ ) < infx=c0 v(x) with x∗  < c0 for some c0 > 0 . S3. f (·) is measurable and locally bounded. For introducing condition on noise let us denote by (Ω , F , P) the probability space. Let εk+1 (·, ·) : (Rl × Ω , B l × F ) → (Rl × B l ) be a measurable function defined on the product space. Let the noise εk+1 be given by

εk+1 = εk+1 (xk , ω ),

ω ∈ Ω.

S4. For the fixed sample path ω under consideration lim lim sup

T →0 k→∞

1 m(nk ,Tk )  ∑ ai εi+1 (xi (ω ), ω ) = 0, T i=n k

∀Tk ∈ [0, T ]

(6.64)

along the subscripts {nk } of any convergent subsequences xnk (ω ), where $ % m Δ m(k, T ) = max m : ∑ ai ≤ T .

(6.65)

i=k

The algorithm (6.62)-(6.63) is considered for a fixed ω , but ω in xi (ω ) is often suppressed. Theorem 6.6 (GCT). Let {xk } be given by (6.62)(6.63) for a given initial value x0 . Assume S1-S3 hold. Then, d(xk , J) −−−→ 0 for any sample paths (ω ) for which S4 k→∞

holds. For the proof we refer to Theorem 2.2.1 in [5]. It is worth noting that Condition S4 is imposed on convergent subsequences, so the condition is of local feature. In contrast to this, a condition would be of global feature if it is required to verify along the whole sequence {xn }. Acknowledgements. The work is supported by NSFC Grants No. 60821091 and 60874001 and by a grant from the National Laboratory of Space Intelligent Control.

References 1. Bai, E.W.: Convergence of the iterative Hammerstein system identification algorithm. IEEE Trans. Automat. Contr. 49(11), 1929–1940 (2004) 2. Bai, E.W., Li, D.: Convergence of the iterative Hammerstein system identification algorithm. IEEE Transactions on Automatic Control 49, 1929–1940 (2004) 3. Boutayeb, M., Rafaralahy, H., Drouach, M.: A robust and recursive identification method for Hammerstein model. In: Proc. 13th. IFAC World Congr., San Fransisco, CA, vol. I, pp. 447–452 (1996) 4. Chen, H.F., Guo, L.: Identification and Stochastic Adaptive Control. Birkh¨auser, Boston (1991)

6

Recursive Identification for Hammerstein Systems

87

5. Chen, H.F.: Stochastic Approximation and Its Applications. Kluwer, Dordrecht (2002) 6. Chen, H.F.: Pathwise convergence of recursive identification algorithms for Hammerstein systems. IEEE Trans. Automat. Contr. 49(10), 1641–1649 (2004) 7. Chen, H.F.: Strong consistency of recursive identification for Hammerstein systems with discontinuous piece-wise linear memoryless block. IEEE Trans. Automat. Contr. 50(10), 1612–1617 (2005) 8. Chen, H.F.: New Approach to Identification for ARMAX Systems. IEEE Trans. Autom. Control 55, 868–879 (2010) 9. Chow, Y.S., Teicher, H.: Probability Theory. Spring, New York (1978) 10. Eskinat, E., Johnson, S., Luyben, W.L.: Use of Hammerstein models in identification of nonlinear systems. AIChE. J. 37(2), 255–268 (1991) 11. Greblicki, W.: Stochastic approximation in nonparametric identification of Hammerstein systems. IEEE Trans. Automat. Contr. 47(11), 1800–1810 (2002) 12. Greblicki, W., Pawlak, M.: Nonparametric identification of Hammerstein systems. IEEE Trans. Inform. Theory 35(3), 409–418 (1989) 13. Huang, Y.Q., Chen, H.F., Fang, H.T.: Identification of Wiener Systems with Nonlinearity Being Piecewise-linear Function. Science in China, Series F 51(1), 1–12 (2008) 14. Ljung, L.: System Identification. Prentice Hall, Upper Saddle River (1987) 15. Narendra, K.S., Gallman, P.G.: An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Trans. Automat. Contr. 11(7), 546–550 (1966) 16. Pawlak, M.: On the series expansion approach to the identification of Hammerstein system. IEEE Trans. Automat. Contr. 36, 459–476 (1982) 17. Stoica, P.: On the convergence of an iterative algorithm used for Hammerstein system identification. IEEE Trans. Automat. Contr. 26(4), 967–969 (1981) 18. Verhaegen, M., Westwick, D.: Identifying MIMO Hammerstein systems in the context of subspace model identificaiton methods. Int. J. Control 63(2), 331–349 (1996) 19. V¨or¨os, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33(6), 1141–1146 (1997) 20. Walker, A.M.: Large-sample estimation of parameters for autoregressive process with moving-average residuals. Biometrika 49(1,2) (1962) 21. Zhao, W.X.: Parametric Identification of Hammerstein Systems with Consistency Results Using Stochastic Inputs. IEEE Trans. Automat. Contr. 55, 474–480 (2010) 22. Zhao, W.X., Chen, H.F.: Recursive identification for Hammerstein system with ARX subsystem. IEEE Trans. Automat. Contr. 51(12), 1966–1974 (2006)

Chapter 7

Wiener System Identification Using the Maximum Likelihood Method Adrian Wills and Lennart Ljung

Dedicated To Anna Hagenblad (1971 - 2009) Much of the research presented in this chapter was initiated and pursued by Anna as part of her work towards a Ph.D thesis, which she sadly never had the opportunity to finish. Her interest in this research area spanned nearly ten years and her contributions were significant. She will be missed. We dedicate this work to the memory of Anna.

7.1 Introduction Within the class of nonlinear system models, the so-called block-oriented models have gained wide recognition and attention by the system identification and automatic control community. Typically, these models are constructed by joining linear dynamic system blocks with static nonlinear mappings in various forms of interconnection. The Wiener model depicted in Figure 7.1 is one such block-oriented model, see, e.g. [2], [18] or [9]. It is typically comprised of two blocks, where the first one is linear and dynamic and the second is nonlinear and static. From one perspective, these models are reasonable since they often reflect the physical realities of a system. Some examples of this include distillation columns [24], pH control processes [11], and biological examples [10]. More generally, they Adrian Wills School of Electrical Engineering and Computer Science, University of Newcastle, Callaghan, NSW, 2308, Australia e-mail: [email protected] Lennart Ljung Division of Automatic Control, Link¨opings universitet, SE-581 80 Link¨oping, Sweden e-mail: [email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 89–110. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com 

90

A. Wills and L. Ljung

w(t)

u(t)

G(q, ϑ)

x0 (t)

e(t)

x(t)

f (·, η)

y(t)

Fig. 7.1: The Wiener model. The input u(t) and the output y(t) are measurable, but not the intermediate signal x(t). w(t) and e(t) are noise sources. x0 (t) denotes the output of the linear dynamic system G. f is nonlinear and static (memoryless)

accurately model situations where the output of a linear system is obtained using a nonlinear measurement device. From another perspective, if the blocks of a Wiener model are multi-variable, then it can be shown [3] that almost any nonlinear system can be approximated arbitrarily well using them. However, this is not the focus of the current chapter, where single input - single output systems are considered. With this as motivation, in this chapter we are concerned with estimating Wiener models based on input and/or output measurements. To make these ideas more precise, we will adopt the notation used in Figure 7.1 here and throughout the remainder of this chapter. In particular, the input signal is denoted by u(t), the output signal by y(t) and x(t) denotes the intermediate unmeasured signal. The disturbance term w(t) is henceforth called the process noise and e(t) is called the measurement noise as usual. These noise terms are assumed to be mutually independent. Using this notation, the Wiener system can be described by the following equations: x0 (t) = G(q, ϑ )u(t) . x(t) = x0 (t) + w(t) .   y(t) = f x(t), η + e(t) .

(7.1)

Throughout this chapter it is assumed that f and G each belong to a parametrised model class. Typical classes for the nonlinear term f include basis function expansions such as polynomials, splines, or neural networks. The nonlinearity f may also be a piecewise linear function, such as a dead-zone or saturation function. Typical classes for the linear term G include rational transfer functions and linear state space models. It is important to note that if the process noise w and the intermediate signal x are unknown, then the parametrisation of the Wiener model is not unique. For example, scaling the linear block G via κ G and scaling the nonlinear block f via f ( κ1 ·) will result in identical input–output behaviour. (It may necessary to scale the process noise variance with a factor κ .)

7

Wiener System Identification Using the Maximum Likelihood Method

91

Based on the above description, the problem addressed in this chapter is to estimate the parameters ϑ within the model class for G and η within the model class for f that best match the measured output data from the system. For convenience, we define a joint parameter vector θ as

θ = [ϑ T , η T ]T

(7.2)

which will be used throughout this chapter.

7.2

An Output-error Approach

While there are several methods for identifying Wiener models proposed in the literature, the most dominant of these is to parametrise the linear and the nonlinear blocks, and then estimate the parameters from data by minimising an output-error criterion (this has been used in [1], [21] and [22] for example). In particular, if the process noise w(t) in Figure 7.1 is ignored, then a natural criterion is to minimise VN (θ ) =

 #2 1 N" y(t) − f G(q, ϑ )u(t), η . ∑ N t=1

(7.3)

This approach is standardly used in software packages such as [23] and [12]. If it is true that the process noise w(t) is zero, then (7.3) becomes the predictionerror criterion. Furthermore, if measurement noise is white and Gaussian, (7.3) is also the Maximum Likelihood criterion and the estimate is therefore consistent [13]. Even for the case where there is process noise, it may still seem reasonable to use  an output-error criterion like (7.3) to obtain an estimate. However, f G(q, ϑ )u(t), η is not the true predictor in this case and it has been shown in [8] that this can result in biased estimates. A further difficulty with this approach is that is cannot directly handle the case of blind Wiener model estimation where the process noise is assumed to be zero, but the input u(t) is not measured. Related criteria to (7.3) have been derived for this case, but they assume that the nonlinearity is invertible and/or that the measurement noise is not present [20, 19]. By way of motivating the main tool in this chapter, namely Maximum Likelihood estimation, the next section provides conditions for the estimates of (7.3) to be consistent. It is shown by example that using the output-error criterion can produce biased estimates. These results appeared in [8].

Consistent Estimates Consider a Wiener system in the form of Figure 7.1 and Equation (7.1) and assume we have measurements of the input and output according to some “true” parameters (ϑ0 , η0 ), i.e.

92

A. Wills and L. Ljung

  y(t) = f G(q, ϑ0 )u(t) + w(t), η0 + e(t) .

(7.4)

Based on the measured inputs and outputs, we would like to find an estimate of

) say, that are close to the true parameters. A more these parameter values, (ϑ , η precise way of describing this is to say that an estimate is consistent if the parameters converge to their true values as the number of data, N tends to infinity. In order to make this idea concrete for the output-error criterion in (7.3) we write the true system (7.4) as   ˜ + e(t) (7.5) y(t) = f G(q, ϑ0 )u(t), η0 + w(t) where

    w(t) ˜ = f G(q, ϑ0 )u(t) + w(t), η0 − f G(q, ϑ0 )u(t), η0 .

(7.6)

The new disturbance term w(t) ˜ may be regarded as a (input-dependent) transformation of the process noise w(t) to the output. This transformation will most likely distort the stochastic properties of w(t), such as mean and variance, compared with w(t). ˜ By inserting the equation for y in (7.5) into the criterion (7.3), we receive the following expression. #2 1 N" f − f + w(t) ˜ + e(t) (7.7) ∑ 0 N t=1 #2 1 N  #  2 2 N " 1 N" = ∑ f0 − f + ∑ w(t) ˜ + e(t) ˜ + e(t) + ∑ f0 − f w(t) N t=1 N t=1 N t=1

VN (θ ) =

where

  f0  f G(q, ϑ0 )u(t), η0 ,

  f  f G(q, ϑ )u(t), η .

(7.8)

Further, assume that all noise terms are ergodic, so that time averages tend to their mathematical expectations as N tends to infinity. Assume also that u is a (quasi)stationary sequence [13], so that is also has well defined sample averages. Let, E denote both mathematical expectation and averaging over time signals (cf. E¯ in [13]). Using the fact that the measurement noise e is zero mean, and independent of the input u and the process noise w means that several cross terms will disappear. The criterion then tends to " #2 " # ˜ . (7.9) V¯ (θ ) = E f0 − f + E w˜ 2 (t) + Ee2(t) + 2E f0 − f w(t) The last term in this expression cannot necessarily be removed since the transformed process noise w˜ need not be independent of u. The criterion (7.9) has a quadratic form, and the true values (ϑ0 , η0 ) will minimise the criterion if and essentially only if "    # E f G(q, ϑ0 )u(t), η0 − f G(q, ϑ )u(t), η w(t) ˜ =0. (7.10)

7

Wiener System Identification Using the Maximum Likelihood Method

93

Typically, this will not hold due to the possible dependence between u and w. ˜ The parameter estimates will therefore be biased in general. To illustrate this, we provide an example below. Example 7.1. Consider the following Wiener system, with linear dynamic part described by x0 (t) + 0.5x0(t − 1) = u(t − 1) , (7.11) x(t) = x0 (t) + w(t) , followed by a static nonlinearity described as a second order polynomial,   f x(t) = c0 + c1x2 (t) ,   y(t) = f x(t) + e(t) .

(7.12)

The goal is to estimate the nonlinearity parameters denoted here by c 0 and c 1 . In this case it is possible to provide expressions for the analytical minimum of criterion (7.3). Recall that in this case the process noise w(t) is assumed to be zero. Therefore, the predicted output can be expressed as y (t) = f (G(q, ϑ )u(t), η ) = f (x0 (t), η ) = c 0 + c 1 x20 (t) .

(7.13)

Assume that all signals, noises as well as inputs, are Gaussian, zero mean and ergodic. Let λx denote the variance of x0 , λw denote the variance of w, and λe denote the variance of e. As N tends to infinity, the criterion (7.3) tends to the limit (7.9)  2 V¯ = E(y − y )2 = E c0 + c1 (x0 + w)2 + e − c 0 − c 1 x20  2 = E (c1 − c 1 )x20 + c0 − c 0 + 2c1 x0 w + c1 w2 + e . All the cross terms will be zero since the signals are Gaussian, independent and zero mean. The fourth order moments are Ex4 = 3λx2 and Ew4 = 3λw2 . This leaves V¯ =3(c1 − c 1 )2 λx2 + (c0 − c 0 )2 + 4c1λx λw + 3c21λw2 + λe + 2(c0 − c 0 ) × (c1 − c 1 )λx + 2c1(c1 − c 1 )λx λw + 2c1(c0 − c 0 )λw . From this expression it is possible to compute the gradient with respect to each c i and therefore find the minimum by solving (c0 − c 0) + (c1 − c 1) + c1 λw = 0 3(c1 − c 1 )λx2 + (c0 − c 0 )λx + 3c1λx λw = 0 with the solution c 0 = c0 + c2 λw , Therefore, the estimate of c0 is clearly biased.

c 1 = c1 .

94

A. Wills and L. Ljung

Motivated by the above example, the next section investigates the use of the Maximum-Likelihood criterion to estimate the system parameters, which is known to produce consistent estimates under the assumptions of this chapter [13].

7.3 The Maximum Likelihood Method The maximum likelihood method provides estimates of the parameter values θ based on an observed data set YN = {y(1), y(2), . . . , y(N)} by maximising a likelihood function. In order to use this method it is therefore necessary to first derive an expression for the likelihood function itself. The likelihood function is the probability density function (PDF) of the outputs that is parametrised by θ . We shall assume for the moment that the input sequence UN = {u(1), u(2), . . . , u(N)} is a given, deterministic sequence (the case of blind Wiener estimation where the input is assumed to be stochastic is subsumed by the coloured process noise case in Section 7.3.2). This likelihood will be denoted here by pθ (YN ) and the Maximum-Likelihood (ML) estimate is obtained via

θ = arg max pθ (YN ) . θ

(7.14)

This approach enjoys a long and fruitful history within the system identification community because of its statistical efficiency in producing consistent estimates (see e.g. [13]). In the following sections we will provide expressions of the likelihood function for various Wiener models. In particular, we firstly consider the system depicted in Figure 7.1 and then consider a related one whereby the process noise is allowed to be coloured. Finally, we consider the case where the input signal is unknown (the is the so-called blind estimation problem). Based on these expressions, Section 7.4 provides algorithms for computing the ML estimate. This includes the direct gradient-based approach for models in the form of Figure 7.1, which was presented in [8]. In addition, the ExpectationMaximisation approach is presented for the case of coloured process noise.

7.3.1 Likelihood Function for White Disturbances For the Wiener model in Figure 7.1 we assume that the disturbance sequences e(t) and w(t) are each white noise. This means that for given input sequence UN , y(t) will also be a sequence of independent variables. This in turn implies that the PDF of YN will be the product of the PDF’s of y(t),t = 1, . . . , N. Therefore, it is sufficient to derive the PDF of y(t). To simplify notation we shall use y(t) = y, x(t) = x. As a means to expressing this PDF, we firstly introduce an intermediate signal x (see Figure 7.1) as a nuisance parameter. The benefit of introducing this term is that  the  PDF of y given x is basically a reflection of the PDF of e since y(t) = f x(t) + e(t) hence

7

Wiener System Identification Using the Maximum Likelihood Method

  py (y|x) = pe y − f (x, η )

95

(7.15)

where pe is the PDF of e. In a similar manner, the PDF of x given UN can be obtained by noting that (7.16) x(t) = G(q, ϑ )u(t) + w(t) = x0 (t, ϑ ) + w(t) . So that for a given UN and ϑ , x0 is a known, deterministic variable, and hence     px (x) = pw x − x0 (ϑ ) = pw x − G(q, ϑ )u(t) (7.17) where pw is the PDF of w. Since x(t) is not measured, then we must integrate over all x ∈ R in order to eliminate it from the expressions to receive 

py (y) =

x∈R

= x∈R

= x∈R

px,y (x, y)dx py (y|x) px (x)dx

(7.18)

    pe y − f (x, η ) pw x − G(q, ϑ )u(t) dx .

In order to proceed further, it is necessary to assume a PDF for e and w. Therefore, we assume that the process noise w(t) and the measurement noise e(t) are Gaussian, with zero means and variances λw and λe respectively, i.e.   1 − 1 ε2 pe ε = √ e 2λe 2πλe

and

  1 − 1 v2 pw v = √ e 2λw . 2πλw

(7.19)

The joint likelihood can be expressed as the product over all time instants since the noise is white, so that  N N  ∞ 1 1 √ pθ (YN ) = (7.20) ∏ −∞ e− 2 ε (t,θ ) dx(t) 2π λe λw t=1 where

ε (t, θ ) =

 #2 2 1" 1  y(t) − f x(t), η x(t) − G(q, ϑ )u(t) . + λe λw

(7.21)

Therefore, when provided with the observed data UN and YN , we can calculate pθ (YN ) and its gradients for each θ . This means that the ML criterion (7.14) can be maximised numerically. This approach is detailed in Section 7.4.1. The derivation of the Likelihood function appeared in [7] and [8].

7.3.2 Likelihood Function for Coloured Process Noise If the process noise is coloured, we may represent the Wiener system as in Figure 7.2. In this case, equations for the output are given by

96

A. Wills and L. Ljung

x(t) = G(q, ϑ )u(t) + H(q, ϑ )w(t) ,   y(t) = f x(t), η + e(t) .

(7.22)

By using the predictor form, see [13], we may write this as x(t) = x (t|Xt−1 ,Ut , ϑ )  y(t) =

x (t|Xt−1 ,Ut , ϑ ) + w(t) , (7.23)   H −1 (q, ϑ )G(q, ϑ )u(t) + 1 − H −1(q, ϑ ) x(t) , (7.24)   f x(t), η + e(t) . (7.25)

In the above, Xt−1 denotes the sequence Xt−1 = {x(1), . . . , x(t − 1)} and similarly for Ut . The only stochastic parts are e and w, hence for a given sequence XN , the joint PDF of YN is obtained in the standard way N

pYN (YN |XN ) = ∏ pe (y(t) − f (x(t), η )) .

(7.26)

t=1

On the other hand, the joint PDF for X_N is given by (cf. Equation (5.74), Lemma 5.1, in [13])

\[ p_{X_N}(X_N) = \prod_{t=1}^{N} p_w\big(x(t) - \hat{x}(t \mid X_{t-1}, U_t, \vartheta)\big). \tag{7.27} \]

The likelihood function for Y_N is thus obtained from (7.26) by integrating out the nuisance parameter X_N using its PDF (7.27):

\[ p_\theta(Y_N) = \int \prod_{t=1}^{N} p_w\big(H^{-1}(q,\vartheta)[x(t) - G(q,\vartheta)u(t)]\big)\, p_e\big(y(t) - f(x(t),\eta)\big)\, dX_N. \tag{7.28} \]

Unfortunately, in this case filtered versions of x(t) enter the integral, which means that the integration is a true multidimensional integral over the entire sequence X_N.

Fig. 7.2: Wiener model with coloured process noise. Both w(t) and e(t) are white noise sources, but w(t) is filtered through H(q, ϑ)


This is likely to be intractable using direct integration methods in practice, unless the inverse noise filters are short FIR filters. Motivated by this, here we adopt another approach whereby the noise filter H is described in state-space form as

\[ H(q,\vartheta) = C(\vartheta)\big(qI - A(\vartheta)\big)^{-1}B(\vartheta), \tag{7.29} \]

where A, B, C are state-space matrices, and the state update is described via

\[ \xi(t+1) = A(\vartheta)\,\xi(t) + B(\vartheta)\,w(t). \tag{7.30} \]

Therefore, according to Figure 7.2, the output can be expressed as

\[ y(t) = f\big(C(\vartheta)\xi(t) + G(q,\vartheta)u(t),\,\eta\big) + e(t). \tag{7.31} \]

Equations (7.30) and (7.31) are in the form of a nonlinear state-space model, which has recently been considered in [17]. In that paper the authors use the Expectation Maximisation algorithm in conjunction with particle methods to compute the ML estimate. We also adopt this technique here, which is detailed in Section 7.4.2.

Blind estimation. Note that if the linear term G were zero, then the above system would become a blind Wiener model, so that (7.31) becomes

\[ y(t) = f\big(C(\vartheta)\xi(t),\,\eta\big) + e(t) \tag{7.32} \]

and the parameters in H and f must be estimated via the output measurements only. This case is profiled via a simulation example in Section 7.5.3.
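As a concrete illustration of the model class (7.29)-(7.32), the following minimal Python sketch simulates the scalar case with A = h_1 and B = C = 1 (the first-order filter used later in the examples). The function name, the callable G for the linear input path, and the Gaussian noise generators are assumptions of this sketch, not part of the chapter's method.

```python
import numpy as np

def simulate_ss_wiener(u, h1, f, lam_w, lam_e, G=None, rng=None):
    """Simulate y(t) = f(xi(t) + x0(t)) + e(t) with the state update
    xi(t+1) = h1*xi(t) + w(t), i.e. (7.30)-(7.31) for the scalar filter
    H(q) = q^{-1}/(1 - h1 q^{-1}). G is a callable mapping the input
    record to x0(t); G = None gives the blind case G = 0 of (7.32)."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(u)
    x0 = np.zeros(N) if G is None else G(u)
    xi = 0.0
    y = np.zeros(N)
    for t in range(N):
        y[t] = f(xi + x0[t]) + np.sqrt(lam_e) * rng.standard_normal()
        xi = h1 * xi + np.sqrt(lam_w) * rng.standard_normal()  # state update (7.30)
    return y
```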

7.4 Maximum Likelihood Algorithms

For the case of white Gaussian process and measurement noise described in Section 7.3.1, it was mentioned that numerical methods could be used to evaluate the likelihood integral in Equation (7.20). At the same time, these methods can be used to compute the gradient for use in a gradient-based search procedure to find the maximum likelihood estimate. This is the approach outlined in Section 7.4.1 below and profiled in Section 7.5 by way of simulation examples. While this method is very useful and practical, it does not handle the case of estimating the parameters of a colouring filter for the case discussed in Section 7.3.2. Further, it does not handle the blind estimation case discussed in Section 7.3.2. Therefore, we present an alternative method based on the Expectation Maximisation (EM) approach in Section 7.4.2 below. A key point to note is that this method requires a nonlinear smoothing operation, and this is achieved via particle methods. Again, the resulting algorithm is profiled in Section 7.5 by way of simulation studies.


7.4.1 Direct Gradient-based Search Approach

In this section we are concerned with maximising the likelihood function described in (7.20) and (7.21) via gradient-based search. In order to avoid numerical conditioning issues, we consider the equivalent problem of maximising the log-likelihood function provided below:

\[ \hat\theta = \arg\max_{\theta}\, L(\theta) \tag{7.33} \]

where

\[ L(\theta) \triangleq \log p_\theta(Y_N) \tag{7.34} \]
\[ = -N\log(2\pi) - \frac{N}{2}\log(\lambda_w\lambda_e) + \sum_{t=1}^{N}\log\left[\int_{-\infty}^{\infty} e^{-\frac{1}{2}\varepsilon(t,\theta)}\,dx\right] \tag{7.35} \]

and ε(t, θ) is given by Equation (7.21). To solve (7.33) we here employ an iterative gradient-based approach. Typically, this approach proceeds by computing a "search direction", and then the function L is increased along the search direction to obtain a new parameter estimate. This search direction is usually determined so that it forms an acute angle with the gradient, since under these conditions it can be shown to increase the cost when added to the current estimate. To be more precise, at iteration k, L(θ_k) is modelled locally as

\[ L(\theta_k + p) \approx L(\theta_k) + g_k^T p + \frac{1}{2}p^T H_k^{-1} p, \tag{7.36} \]

where g_k is the derivative of L with respect to θ evaluated at θ_k, and H_k^{-1} is a symmetric matrix. If a Newton direction is desired, then H_k would be the inverse of the Hessian matrix, but the Hessian matrix itself may be quite expensive to compute. However, the structure in (7.34) is directly amenable to Gauss-Newton gradient-based search [4], which provides a good approximation to the Hessian. Here, however, we employ a quasi-Newton method where H_k is updated at each iteration based on local gradient information so that it resembles the Hessian matrix in the limit. In particular, we use the well-known BFGS update strategy [15, Section 6.1], which can guarantee that H_k is negative definite and symmetric so that

\[ p_k = -H_k g_k \tag{7.37} \]

maximises (7.36). The new parameter estimate θ_{k+1} is then obtained by updating the previous one via

\[ \theta_{k+1} = \theta_k + \alpha_k p_k, \tag{7.38} \]

where α_k is selected such that

\[ L(\theta_k + \alpha_k p_k) > L(\theta_k). \tag{7.39} \]
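As a structural illustration of this search, the short sketch below delegates the quasi-Newton bookkeeping of (7.36)-(7.39) to a library BFGS routine applied to the negative log-likelihood. Here neg_loglik is a hypothetical user-supplied routine returning −L(θ) together with its gradient, e.g. computed numerically as in Algorithm 7.1 below.

```python
from scipy.optimize import minimize

def ml_dgbs(neg_loglik, theta0):
    """Direct gradient-based search for the ML estimate (7.33): BFGS on
    the negative log-likelihood. neg_loglik(theta) -> (-L, -grad) is
    assumed to be supplied by the user, e.g. via Algorithm 7.1."""
    res = minimize(neg_loglik, theta0, jac=True, method="BFGS")
    return res.x  # the (local) maximiser of L
```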


Evaluating the cost L(θ_k) and its derivative g_k is essential to the success of the above approach. For the case of computing the cost, we see from (7.34) that this requires the evaluation of an integral. Similarly, note that the i'th element of the gradient vector g_k, denoted g_k(i), is given by

\[ g_k(i) = \left[\frac{N}{2}\frac{\partial\log(\lambda_w\lambda_e)}{\partial\theta(i)} + \frac{1}{2}\sum_{t=1}^{N}\frac{\int_{-\infty}^{\infty}\frac{\partial\varepsilon(t,\theta)}{\partial\theta(i)}\, e^{-\frac{1}{2}\varepsilon(t,\theta)}\,dx}{\int_{-\infty}^{\infty} e^{-\frac{1}{2}\varepsilon(t,\theta)}\,dx}\right]_{\theta=\theta_k} \tag{7.40} \]

so that computing the gradient vector also requires the evaluation of an integral. Evaluating the integrals in (7.34) and (7.40) will be achieved numerically in this chapter. In particular, we employ a fixed-interval grid over x and use the composite Simpson's rule to obtain the approximation [16, Chapter 4]. The reason for employing a fixed grid (it need not be of fixed interval as used here) is that it allows straightforward computation of L(θ_k) and its derivative g_k at the same grid points. This is detailed in Algorithm 7.1 below and used in the simulations in Section 7.5.

7.4.2 Expectation Maximisation Approach

In this section we address the coloured process noise case introduced in Section 7.3.2. As mentioned in that section, the likelihood function as expressed in (7.28) involves the evaluation of a high-dimensional integral, which is not tractable on desktop computers. To tackle this problem, the output y(t) was expressed as a nonlinear state-space model via (7.31), (7.29) and (7.30). In this form, the problem is directly amenable to the recently developed Expectation Maximisation (EM) algorithm described in [17]. This section will detail the EM approach as applied to the coloured process noise case. It is also directly applicable to the blind estimation case discussed in Section 7.3.2. In keeping with the notation already defined in Section 7.4.1 above, the EM algorithm is a method for computing θ̂ in (7.33) that is very general and addresses a wide range of applications. Key to both its implementation and theoretical underpinnings is the consideration of a joint log-likelihood function of both the measurements Y_N and the so-called "missing data" Z:

\[ L_{Z,Y_N}(\theta) = \log p_\theta(Z, Y_N). \tag{7.41} \]

In some cases, the missing data is quite literally measurements that are absent for some reason. More generally though, the missing data Z consists of "measurements" that, while not available, would be useful to the estimation problem. As such, the choice of Z is a design variable in the deployment of the EM algorithm. For the case in Section 7.3.2, this choice is naturally the missing state sequence

\[ Z = \{\xi_1, \ldots, \xi_N\}, \tag{7.42} \]


Algorithm 7.1 Numerical computation of likelihood and derivatives

Given an odd number of grid points M, the parameter vector θ and the data U_N and Y_N, perform the following steps. (Note that after the algorithm has terminated, the cost L ≈ L̄ and gradient g ≈ ḡ.)

1. Simulate the system x_0(t) = G(ϑ, q)u(t).
2. Specify the grid vector Δ ∈ R^M as M equidistant points between the limits [a, b], so that Δ(1) = a and Δ(i + 1) = Δ(i) + (b − a)/M for all i = 1, ..., M − 1.
3. Set L̄ = N log(2π) + (N/2) log(λ_w λ_e), and ḡ(i) = 0 for i = 1, ..., n_θ.
4. for t = 1:N,
   a. for j = 1:M, compute
      \[ x = x_0(t) + \Delta(j), \quad \alpha = x - x_0(t), \quad \beta = y(t) - f(x, \eta), \]
      \[ \gamma_j = e^{-\frac{1}{2}(\alpha^2/\lambda_w + \beta^2/\lambda_e)}, \qquad \delta_j(i) = \gamma_j\,\frac{\partial\varepsilon(t,\theta)}{\partial\theta(i)}, \quad i = 1, \ldots, n_\theta, \]
      end
   b. Compute
      \[ \kappa = \frac{(b-a)}{3M}\Bigg(\gamma_1 + 4\sum_{j=1}^{\frac{M-1}{2}}\gamma_{2j} + 2\sum_{j=1}^{\frac{M-3}{2}}\gamma_{2j+1} + \gamma_M\Bigg), \]
      \[ \pi(i) = \frac{(b-a)}{3M}\Bigg(\delta_1(i) + 4\sum_{j=1}^{\frac{M-1}{2}}\delta_{2j}(i) + 2\sum_{j=1}^{\frac{M-3}{2}}\delta_{2j+1}(i) + \delta_M(i)\Bigg), \quad i = 1, \ldots, n_\theta, \]
      \[ \bar{L} = \bar{L} - \log(\kappa), \qquad \bar{g}(i) = \bar{g}(i) + \frac{1}{2}\left(\frac{\partial\log(\lambda_w\lambda_e)}{\partial\theta(i)} + \frac{\pi(i)}{\kappa}\right), \quad i = 1, \ldots, n_\theta, \]
      end
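To make the mechanics of Algorithm 7.1 concrete, here is a minimal Python sketch that evaluates the log-likelihood (7.34)-(7.35) by the composite Simpson rule. The gradient computation is omitted for brevity, the standard Simpson factor (b − a)/(3(M − 1)) for M points is used, and all names are illustrative.

```python
import numpy as np

def wiener_loglik_white(y, x0, f, lam_w, lam_e, a=-6.0, b=6.0, M=1001):
    """Approximate log-likelihood (7.34)-(7.35) of a Wiener model with
    white Gaussian process/measurement noise via composite Simpson rule.
    y : outputs, shape (N,); x0 : noise-free linear output G(q)u(t);
    f : static nonlinearity, callable on arrays."""
    assert M % 2 == 1, "Simpson's rule needs an odd number of grid points"
    N = len(y)
    delta = np.linspace(a, b, M)            # grid over the process noise
    # Simpson weights 1,4,2,4,...,2,4,1 scaled by (b-a)/(3(M-1))
    w = np.ones(M); w[1:-1:2] = 4.0; w[2:-1:2] = 2.0
    w *= (b - a) / (3.0 * (M - 1))
    L = -N * np.log(2 * np.pi) - 0.5 * N * np.log(lam_w * lam_e)
    for t in range(N):
        x = x0[t] + delta                   # candidate internal signals
        eps = (x - x0[t])**2 / lam_w + (y[t] - f(x))**2 / lam_e  # (7.21)
        L += np.log(np.sum(w * np.exp(-0.5 * eps)))
    return L
```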

since if it were known or measured, then the problem would reduce to one in the form of (7.3), which is more readily soluble. It is of vital importance to understand the connection between the joint likelihood in (7.41) and the likelihood (7.34) that we are trying to optimise. Accordingly, note that by the definition of conditional probability, the likelihood (7.34) and the joint likelihood (7.41) are related by

\[ \log p_\theta(Y_N) = \log p_\theta(Z, Y_N) - \log p_\theta(Z \mid Y_N). \tag{7.43} \]

Let θ_k denote an estimate of the likelihood maximiser θ̂ in (7.33). Further, denote by p_{θ_k}(Z | Y_N) the conditional density of the missing data Z, given observations of the available data Y_N and depending on the choice θ_k. These definitions allow the


following expression, which is obtained by taking conditional expectations of both sides of (7.43) relative to p_{θ_k}(Z | Y_N):

\[ \log p_\theta(Y_N) = \int \log p_\theta(Z, Y_N)\,p_{\theta_k}(Z \mid Y_N)\,dZ - \int \log p_\theta(Z \mid Y_N)\,p_{\theta_k}(Z \mid Y_N)\,dZ \]
\[ = \underbrace{E_{\theta_k}\{\log p_\theta(Z, Y_N) \mid Y_N\}}_{Q(\theta,\theta_k)} - \underbrace{E_{\theta_k}\{\log p_\theta(Z \mid Y_N) \mid Y_N\}}_{V(\theta,\theta_k)}. \tag{7.44} \]

Employing these newly defined Q and V functions, we can express the difference between the likelihood L(θ_k) at the estimate θ_k and the likelihood L(θ) at an arbitrary value of θ as

\[ L(\theta) - L(\theta_k) = \big(Q(\theta,\theta_k) - Q(\theta_k,\theta_k)\big) + \underbrace{\big(V(\theta_k,\theta_k) - V(\theta,\theta_k)\big)}_{\geq 0}. \tag{7.45} \]

The positivity of the last term in the above equation can be established by noting that it is the Kullback–Leibler divergence between two densities [5]. As a consequence, if we obtain a new estimate θ_{k+1} such that Q(θ_{k+1}, θ_k) > Q(θ_k, θ_k), then it follows that L(θ_{k+1}) > L(θ_k). Thus, by increasing the Q function we are also increasing the likelihood (7.34). This leads to the EM algorithm, which iterates between forming Q(θ, θ_k) and then maximising it with respect to θ to obtain a better estimate θ_{k+1} (for further information regarding the EM algorithm, the text [14] is an excellent reference).

Algorithm 7.2 Expectation Maximisation Algorithm

1. Set k = 0 and initialise θ_0 such that L(θ_0) is finite.
2. Expectation (E) step: Compute
\[ Q(\theta, \theta_k) = E_{\theta_k}\{\log p_\theta(Z, Y_N) \mid Y_N\}. \tag{7.46} \]
3. Maximisation (M) step: Compute
\[ \theta_{k+1} = \arg\max_{\theta}\, Q(\theta, \theta_k). \tag{7.47} \]
4. If not converged, update k := k + 1 and return to step 2.
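A bare-bones sketch of the iteration in Algorithm 7.2 is given below; e_step and m_step are placeholders for the particle-based approximation of (7.46) and the gradient-based maximisation of (7.47) described in the following subsections, and the stopping rule is an assumption of this sketch.

```python
import numpy as np

def em_identify(theta0, e_step, m_step, max_iter=100, tol=1e-6):
    """Generic EM iteration of Algorithm 7.2 (illustrative skeleton).
    e_step(theta_k) should return a callable Q(theta) approximating (7.46),
    e.g. built from the particle smoother of Algorithm 7.3;
    m_step(Q, theta_k) should return argmax_theta Q(theta), cf. (7.47)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        Q = e_step(theta)                 # E step: form Q(., theta_k)
        theta_new = m_step(Q, theta)      # M step: maximise Q
        if np.linalg.norm(theta_new - theta) < tol * (1 + np.linalg.norm(theta)):
            return theta_new
        theta = theta_new
    return theta
```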

The Expectation and Maximisation steps are treated separately in Sections 7.4.2.1 and 7.4.2.2 below.

7.4.2.1 Expectation Step

The first challenge in implementing the EM algorithm is the computation of Q(θ , θk ) according to (7.44). To address this, note that via Bayes’ rule and the Markov property associated with the model in (7.30) and (7.31) and with the choice (7.42) for Z


\[ L_\theta(Z, Y_N) = \log p_\theta(Y_N \mid Z) + \log p_\theta(Z) = \sum_{t=1}^{N-1}\log p_\theta(\xi_{t+1} \mid \xi_t) + \sum_{t=1}^{N}\log p_\theta(y_t \mid \xi_t). \tag{7.48} \]

Applying the conditional expectation operator E_{θ_k}{· | Y_N} to both sides of (7.48) yields

\[ Q(\theta, \theta_k) = I_1(\theta, \theta_k) + I_2(\theta, \theta_k), \tag{7.49} \]

where

\[ I_1(\theta,\theta_k) = \sum_{t=1}^{N-1}\iint \log p_\theta(\xi_{t+1} \mid \xi_t)\, p_{\theta_k}(\xi_{t+1}, \xi_t \mid Y_N)\, d\xi_t\, d\xi_{t+1}, \tag{7.50a} \]
\[ I_2(\theta,\theta_k) = \sum_{t=1}^{N}\int \log p_\theta(y_t \mid \xi_t)\, p_{\theta_k}(\xi_t \mid Y_N)\, d\xi_t. \tag{7.50b} \]

Hence, computing Q(θ, θ_k) requires knowledge of densities such as p_{θ_k}(ξ_t | Y_N) and p_{θ_k}(ξ_{t+1}, ξ_t | Y_N) associated with a nonlinear smoothing problem. Unfortunately, due to the nonlinear nature of the Wiener model, these densities are unlikely to have analytical expressions. This chapter therefore takes a numerical approach of evaluating (7.50a)-(7.50b) via the use of particle methods, more formally known as sequential importance resampling (SIR) methods [6]. This will result in an approximation Q̂ of Q via

\[ \widehat{Q}(\theta, \theta_k) = \widehat{I}_1(\theta, \theta_k) + \widehat{I}_2(\theta, \theta_k) \tag{7.51} \]

where Î_1 and Î_2 are approximations to (7.50a) and (7.50b). These approximations are provided by the particle smoothing Algorithm 7.3 below (see [17] for background and a more detailed explanation). To use this algorithm, we require the ability to draw new samples from the distribution p_{θ_k}(ξ̃_t | ξ^i_{t−1}), but this is straightforward since ξ_t is given by a linear state-space equation in (7.30) with white Gaussian disturbance w(t). Therefore, according to (7.30), for each ξ^i_{t−1} we can draw ξ̃^i_t via

\[ \tilde\xi_t^i = A\xi_{t-1}^i + B\omega^i \tag{7.60} \]

where ω^i is a realisation from the appropriate Gaussian distribution for w(t). In addition, we require the ability to evaluate the probabilities p_{θ_k}(y_t | ξ̃_t^i) and p_{θ_k}(ξ̃_{t+1}^k | ξ̃_t^i). Again, this is straightforward in the Wiener model case described by (7.29)-(7.31) since

\[ p_{\theta_k}(y_t \mid \tilde\xi_t^i) = p_e\big(y_t - f(C\tilde\xi_t^i + G(q)u_t)\big), \tag{7.61} \]
\[ p_{\theta_k}(\tilde\xi_{t+1}^k \mid \tilde\xi_t^i) = p_w\big(B^{\dagger}[\tilde\xi_{t+1}^k - A\tilde\xi_t^i]\big) \tag{7.62} \]

where B^† is the Moore–Penrose pseudo-inverse of B.


Algorithm 7.3 Particle Smoother

Given the current estimate θ_k, choose the number of particles M and complete the following steps.

1. Initialise the particles, {ξ_0^i}_{i=1}^M ∼ p_{θ_k}(ξ_0), and set t = 1.
2. Predict the particles by drawing M i.i.d. samples according to
\[ \tilde\xi_t^i \sim p_{\theta_k}(\tilde\xi_t \mid \xi_{t-1}^i), \quad i = 1, \ldots, M. \tag{7.52} \]
3. Compute the importance weights {w_t^i}_{i=1}^M,
\[ w_t^i \triangleq w(\tilde\xi_t^i) = \frac{p_{\theta_k}(y_t \mid \tilde\xi_t^i)}{\sum_{j=1}^{M} p_{\theta_k}(y_t \mid \tilde\xi_t^j)}, \quad i = 1, \ldots, M. \tag{7.53} \]
4. For each j = 1, ..., M draw a new particle ξ_t^j with replacement (resample) according to
\[ P(\xi_t^j = \tilde\xi_t^i) = w_t^i, \quad i = 1, \ldots, M. \tag{7.54} \]
5. If t < N increment t → t + 1 and return to step 2, otherwise proceed to step 6.
6. Initialise the smoothed weights to be the terminal filtered weights {w_t^i} at time t = N,
\[ w_{N|N}^i = w_N^i, \quad i = 1, \ldots, M, \tag{7.55} \]
and set t = N − 1.
7. Compute the following smoothed weights
\[ w_{t|N}^i = w_t^i \sum_{k=1}^{M} w_{t+1|N}^k\, \frac{p_{\theta_k}(\tilde\xi_{t+1}^k \mid \tilde\xi_t^i)}{v_t^k}, \tag{7.56} \]
\[ v_t^k \triangleq \sum_{i=1}^{M} w_t^i\, p_{\theta_k}(\tilde\xi_{t+1}^k \mid \tilde\xi_t^i), \tag{7.57} \]
\[ w_{t|N}^{ij} \triangleq \frac{w_t^i\, w_{t+1|N}^j\, p_{\theta_k}(\tilde\xi_{t+1}^j \mid \tilde\xi_t^i)}{\sum_{l=1}^{M} w_t^l\, p_{\theta_k}(\tilde\xi_{t+1}^j \mid \tilde\xi_t^l)}. \tag{7.58} \]
8. Update t → t − 1. If t > 0 return to step 7, otherwise proceed to step 9.
9. Compute the approximations
\[ \widehat{I}_1(\theta,\theta_k) \approx \sum_{t=1}^{N-1}\sum_{i=1}^{M}\sum_{j=1}^{M} w_{t|N}^{ij}\, \log p_\theta(\tilde\xi_{t+1}^j \mid \tilde\xi_t^i), \tag{7.59a} \]
\[ \widehat{I}_2(\theta,\theta_k) \approx \sum_{t=1}^{N}\sum_{i=1}^{M} w_{t|N}^i\, \log p_\theta(y_t \mid \tilde\xi_t^i). \tag{7.59b} \]
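The forward pass of Algorithm 7.3 (steps 1 to 5) can be sketched compactly in Python for the blind case G = 0 of (7.32). The Gaussian initialisation of the particles and all names are assumptions of this sketch, and the backward smoothing pass (steps 6 to 9) is omitted.

```python
import numpy as np

def sir_filter(y, A, B, C, f, lam_w, lam_e, M=200, rng=None):
    """Bootstrap SIR particle filter for the state-space Wiener model
    (7.30)-(7.31) with G = 0: an illustrative sketch of steps 1-5 of
    Algorithm 7.3. A (n,n), B (n,1), C (1,n) are the filter matrices;
    f must be vectorised. Returns particles and filtered weights, the
    inputs needed by the smoothing steps 6-9."""
    rng = np.random.default_rng() if rng is None else rng
    N, n = len(y), A.shape[0]
    xi = rng.standard_normal((M, n))          # step 1 (assumed Gaussian init)
    particles, weights = [], []
    for t in range(N):
        # step 2: predict through the linear noise filter, cf. (7.60)
        w = np.sqrt(lam_w) * rng.standard_normal((M, 1))
        xi = xi @ A.T + w @ B.T
        # step 3: weights from p(y_t | xi) = p_e(y_t - f(C xi)), cf. (7.61)
        resid = y[t] - f(xi @ C.T).ravel()
        wt = np.exp(-0.5 * resid**2 / lam_e)
        wt /= wt.sum()
        particles.append(xi.copy()); weights.append(wt.copy())
        # step 4: resample with replacement
        idx = rng.choice(M, size=M, p=wt)
        xi = xi[idx]
    return particles, weights
```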

7.4.2.2 Maximisation Step

With an approximation Q̂(θ, θ_k) of the function Q(θ, θ_k) made available, attention now turns to the maximisation step (7.47). This requires that the approximation


Q̂(θ, θ_k) is maximised with respect to θ in order to compute a new iterate θ_{k+1} of the maximum likelihood estimate.

In general, a closed-form maximiser of Q̂ will not be available. As such, this section again employs a gradient-based search technique as already utilised in Section 7.4.1. For this purpose, note that via (7.51) and (7.59) the gradient of Q̂(θ, θ_k) with respect to θ is simply computable via

\[ \frac{\partial}{\partial\theta}\widehat{Q}(\theta,\theta_k) = \frac{\partial \widehat{I}_1(\theta,\theta_k)}{\partial\theta} + \frac{\partial \widehat{I}_2(\theta,\theta_k)}{\partial\theta}, \tag{7.63a} \]
\[ \frac{\partial \widehat{I}_1(\theta,\theta_k)}{\partial\theta} = \sum_{t=1}^{N-1}\sum_{i=1}^{M}\sum_{j=1}^{M} w_{t|N}^{ij}\, \frac{\partial \log p_\theta(\tilde\xi_{t+1}^j \mid \tilde\xi_t^i)}{\partial\theta}, \tag{7.63b} \]
\[ \frac{\partial \widehat{I}_2(\theta,\theta_k)}{\partial\theta} = \sum_{t=1}^{N}\sum_{i=1}^{M} w_{t|N}^i\, \frac{\partial \log p_\theta(y_t \mid \tilde\xi_t^i)}{\partial\theta}. \tag{7.63c} \]

In the above, we require partial derivatives of p_θ(y_t | ξ̃_t^i) and p_θ(ξ̃_{t+1}^j | ξ̃_t^i) with respect to θ. To that end, we may obtain these derivatives via simple calculus on the expressions provided in (7.61) and (7.62). Note that for a given θ_k, the particle smoother algorithm will provide the particles ξ̃_t^i and all the weights required to calculate the above gradients (and indeed Q̂ itself). Importantly, these particles and weights remain valid as long as θ_k remains the same (which it does throughout the Maximisation step). With this gradient available, we can employ the same strategy that was presented in Section 7.4.1 for maximising L to the case of maximising Q̂. Indeed, this was used in the simulations in Section 7.5.

7.5 Simulation Examples

In this section we profile three different algorithms on various simulation examples. To streamline the presentation it is helpful to give each algorithm an abbreviation. Therefore, the output error approach outlined in Section 7.2 is denoted by OE. Secondly, the direct gradient-based search method of Section 7.4.1 is denoted by ML-DGBS. Thirdly, the expectation maximisation method of Section 7.4.2 is labelled ML-EM. For the implementation of ML-DGBS we chose the limits for the integration [a, b] (see Algorithm 7.1) as ±6√λ_w, where λ_w is the variance of the process noise w(t). This corresponds to a confidence interval of 99.9999% for the signal x(t) if the process noise is indeed Gaussian and white. The number of grid points was chosen to be 1001.

7.5.1 Example 1: White Process and Measurement Noise

The first example is a second order system with complex poles for the linear part G(ϑ, q), followed by a deadzone function for the nonlinear part f(·, η). The input


u and process noise w are Gaussian, each with zero mean and variance 1, while the measurement noise e is Gaussian with zero mean and variance 0.1. The system is given by

\[ x_0(t) + a_1 x_0(t-1) + a_2 x_0(t-2) = u(t) + b_1 u(t-1) + b_2 u(t-2), \]
\[ x(t) = x_0(t) + w(t), \]
\[ f\big(x(t)\big) = \begin{cases} x(t) - c_1 & \text{for } x(t) < c_1, \\ 0 & \text{for } c_1 \leq x(t) \leq c_2, \\ x(t) - c_2 & \text{for } x(t) > c_2, \end{cases} \tag{7.64} \]
\[ y(t) = f\big(x(t)\big) + e(t). \]
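A minimal data generator for this system, useful for reproducing experiments of this kind, might look as follows (an illustrative sketch with the true values of Table 7.1 as defaults, not the authors' code):

```python
import numpy as np

def simulate_example1(N=1000, a=(0.6, -0.6), b=(-0.6, 0.6),
                      c=(-0.3, 0.5), lam_w=1.0, lam_e=0.1, rng=None):
    """Simulate the deadzone Wiener system (7.64)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(N)
    w = np.sqrt(lam_w) * rng.standard_normal(N)
    e = np.sqrt(lam_e) * rng.standard_normal(N)
    x0 = np.zeros(N)
    for t in range(N):
        ar = -a[0] * (x0[t-1] if t >= 1 else 0.0) - a[1] * (x0[t-2] if t >= 2 else 0.0)
        ma = u[t] + b[0] * (u[t-1] if t >= 1 else 0.0) + b[1] * (u[t-2] if t >= 2 else 0.0)
        x0[t] = ar + ma                       # second order linear part
    x = x0 + w                                # white process noise
    # deadzone nonlinearity of (7.64)
    fx = np.where(x < c[0], x - c[0], np.where(x > c[1], x - c[1], 0.0))
    return u, fx + e
```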

Here, we estimate the parameters a_1, a_2, b_1, b_2, c_1, c_2. A Monte-Carlo simulation with 1000 data sets was generated, each using N = 1000 samples. The true values of the parameters, and the results of the OE approach (see Section 7.2) and the ML-DGBS method (see Section 7.3.1), are summarised in Table 7.1. The estimates of the deadzone function f(x(t)) from Equation (7.64) are plotted in Figure 7.3.

Table 7.1: Parameter estimates with standard deviations for Example 1, using the OE and ML-DGBS methods. The mean and standard deviations are computed over 1000 runs. The notation n.e. stands for "not estimated", as the noise variances are not estimated with the output error method

Par | True    | OE               | ML-DGBS
a1  | 0.6000  | 0.5486 ± 0.0463  | 0.6017 ± 0.0444
a2  | -0.6000 | -0.5482 ± 0.0492 | -0.6015 ± 0.0480
b1  | -0.6000 | -0.6002 ± 0.0146 | -0.6002 ± 0.0141
b2  | 0.6000  | 0.6006 ± 0.0130  | 0.6007 ± 0.0126
c1  | -0.3000 | -0.1600 ± 0.0632 | -0.3064 ± 0.0610
c2  | 0.5000  | 0.3500 ± 0.0652  | 0.5061 ± 0.0641
λw  | 1.0000  | n.e.             | 0.9909 ± 0.0634
λe  | 0.1000  | n.e.             | 0.1033 ± 0.0273

This simulation confirms that the output error approach provides biased estimates. On the other hand, the Maximum Likelihood method provides consistent estimates, including noise variances.

7.5.2 Example 2: Coloured Process Noise

The second example considers the Wiener model in Figure 7.2. It is similar to the first example in that the linear part G is a second order system with complex poles, but different in that we have replaced the deadzone function with a saturation function for the nonlinear part f(·, η), and in that the process noise is coloured by

Fig. 7.3: Example 1: The true deadzone function as a thick black line and the 1000 estimated deadzones, appearing in grey. Above: OE. Below: ML-DGBS

\[ H(q) = \frac{q^{-1}}{1 - h_1 q^{-1}} \tag{7.65} \]

which corresponds to the state-space system

\[ \xi(t+1) = h_1\,\xi(t) + w(t). \tag{7.66} \]


Therefore, the overall Wiener system is given by

\[ x_0(t) + a_1 x_0(t-1) + a_2 x_0(t-2) = u(t) + b_1 u(t-1) + b_2 u(t-2), \]
\[ f(x) = \begin{cases} c_1 & \text{for } x \leq c_1, \\ x & \text{for } c_1 < x \leq c_2, \\ c_2 & \text{for } c_2 < x, \end{cases} \tag{7.67} \]
\[ y(t) = f\big(\xi(t) + x_0(t)\big) + e(t). \]

The goal is to estimate the parameters a_1, a_2, b_1, b_2, c_1, c_2, h_1 based on input and output measurements. In this case, three different algorithms were employed, namely the OE method from Section 7.2, the ML-DGBS approach from Section 7.4.1, and the ML-EM particle based method from Section 7.4.2. It should be mentioned that the former two algorithms do not cater for estimating the filter parameter h_1. It is interesting nonetheless to observe their performance based on the wrong assumptions that each makes about the process noise, i.e. that it does not exist in the first case, and that it is white in the second. As before, we ran a Monte-Carlo simulation with 1000 runs and in each we generated a new data set with N = 1000 points. The signals u(t), w(t) and e(t) were generated in the same way as for Example 1. For the EM approach, we used M = 200 particles in approximating Q̂ (see (7.51)). The results are summarised in Table 7.2. It can be observed that the output error approach again provides biased estimates of the nonlinearity parameters. The direct gradient-based search procedure seems to produce reasonable results, but the expectation maximisation approach produces slightly more accurate results (this is perhaps surprising given that only M = 200 particles were used). It is worth asking whether the consistency of the ML-DGBS approach for coloured process noise is surprising or not. It is well known from linear identification that the output error approach gives consistent estimates even when the output error disturbance is coloured, and thus an erroneous likelihood criterion is used [13]. The Wiener model resembles the output error model in that, in essence, it is a static model, i.e. for a given input u, noise is added to the deterministic variable

Table 7.2: Parameter estimates with standard deviations for Example 2 with coloured noise, using the OE, ML-DGBS and ML-EM methods

Par | True    | OE               | ML-DGBS          | ML-EM
a1  | 0.6000  | 0.5683 ± 0.2424  | 0.6163 ± 0.1798  | 0.5874 ± 0.1376
a2  | -0.6000 | -0.5677 ± 0.2718 | -0.6258 ± 0.2570 | -0.5820 ± 0.1649
b1  | -0.6000 | -0.5995 ± 0.0642 | -0.5989 ± 0.0510 | -0.5980 ± 0.0392
b2  | 0.6000  | 0.6027 ± 0.0545  | 0.6022 ± 0.0403  | 0.6017 ± 0.0333
c1  | -0.5000 | -0.3032 ± 0.0385 | -0.4974 ± 0.0278 | -0.5000 ± 0.0184
c2  | 0.3000  | 0.1108 ± 0.0397  | 0.2991 ± 0.0250  | 0.3003 ± 0.0173
h1  | 0.9000  | n.e.             | n.e.             | 0.8986 ± 0.0227
λw  | 1.0000  | n.e.             | 5.4671 ± 1.8681  | 0.9765 ± 0.2410
λe  | 0.1000  | n.e.             | 0.1000 ± 0.0069  | 0.1000 ± 0.0054


β(t) = G(q)u(t), as β(t) + e(t) (linear output error) or as f(β(t) + w(t)) + e(t) (Wiener model). The spectrum or time correlation of the noise does not seem essential. However, a formal proof of this does not appear to be straightforward in the Wiener case. Therefore, given the relative simplicity of implementing the ML-DGBS method compared with the EM approach, and given that the estimates for both approaches are comparable, it is worth asking whether or not the noise model really needs to be estimated. On the other hand, if it is essential that the noise model be identified, then the output error and ML-DGBS methods are not really suitable since they do not handle this case. In line with this, the next section discusses the blind estimation problem, where identifying the noise filter is essential.

7.5.3 Example 3: Blind Estimation

In the third simulation, we again consider the Wiener model depicted in Figure 7.2 but with G = 0. This can be interpreted as a blind Wiener model estimation problem, where the unknown input signal w(t) is first passed through a filter H(q) and then mapped through a static nonlinearity f. Finally, the measurements are corrupted by the disturbance e(t) to provide y(t). In particular, we assume as in Example 2 that the process noise is coloured by

\[ H(\vartheta, q) = \frac{q^{-1}}{1 - h_1 q^{-1}} \tag{7.68} \]

and the resulting signal is then mapped through a saturation nonlinearity, so that the overall Wiener system is given by

\[ y(t) = f\big(\xi(t)\big) + e(t), \qquad \xi(t+1) = h_1\xi(t) + w(t), \]
\[ f\big(\xi(t)\big) = \begin{cases} c_1 & \text{for } \xi(t) \leq c_1, \\ \xi(t) & \text{for } c_1 < \xi(t) \leq c_2, \\ c_2 & \text{for } c_2 < \xi(t). \end{cases} \tag{7.69} \]

Here we are trying to estimate the parameters h_1, c_1, c_2 and the variance parameters λ_w, λ_e of the process noise w(t) and the measurement noise e(t), respectively. This is to be done based on the output measurements alone. The EM method described in Section 7.4.2 is directly applicable to this case, and was employed here. As usual, we ran a Monte-Carlo simulation with 1000 runs and in each we generated a new data set with N = 1000 points. The signals w(t) and e(t) were generated as Gaussian random numbers with variance 1 and 0.1, respectively. In this case, we used only M = 50 particles in approximating Q̂. The results are summarised in Table 7.3. Even with a modest number of particles, M = 50, the estimates are consistent and appear to be accurate.


Table 7.3: Parameter estimates with standard deviations for Example 3, using the EM method

Par | True    | ML-EM
h1  | 0.9000  | 0.8995 ± 0.0237
c1  | -0.5000 | -0.4967 ± 0.0204
c2  | 0.3000  | 0.2968 ± 0.0193
λw  | 1.0000  | 1.0293 ± 0.1744
λe  | 0.1000  | 0.1019 ± 0.0063

7.6 Conclusion

The dominant approach for estimating Wiener models is to parametrise the linear and nonlinear parts and then minimise, with respect to these parameters, the squared error between the measured output and a simulated one from the Wiener model. This approach implicitly assumes that no process noise is present. It was confirmed that this leads to biased estimates if the assumption is wrong. To overcome this problem, the chapter presents two algorithms for providing maximum likelihood estimates of Wiener models that include both process and measurement noise. The first is based on the assumption that the process noise is white, and the second assumes that the process noise has been coloured by a linear filter. In the latter case, the likelihood function involves the evaluation of a high-dimensional integral, which is not tractable using traditional numerical integration techniques. Motivated by this, the chapter casts the Wiener model in the form of a nonlinear state-space model, which is directly amenable to a recently developed Expectation Maximisation algorithm. Of vital importance is that the expectation step can be approximated using sequential importance resampling (or particle) methods, which are easily implemented using standard desktop computing. This approach was profiled for the case of coloured process noise with very promising results. Finally, the case of blind Wiener model estimation can be directly handled using the expectation maximisation method presented here. The efficacy of this method was demonstrated via a simulation example.

References

1. Bai, E.-W.: Frequency domain identification of Wiener models. Automatica 39(9), 1521–1530 (2003)
2. Billings, S.A.: Identification of non-linear systems: a survey. IEE Proc. D 127, 272–285 (1980)
3. Boyd, S., Chua, L.O.: Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems CAS-32(11), 1150–1161 (1985)
4. Dennis Jr., J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs (1983)
5. Gibson, S., Ninness, B.: Robust maximum-likelihood estimation of multivariable dynamic systems. Automatica 41(10), 1667–1682 (2005)
6. Gordon, N.J., Salmond, D.J., Smith, A.F.M.: A novel approach to nonlinear/non-Gaussian Bayesian state estimation. In: IEE Proceedings on Radar and Signal Processing, vol. 140, pp. 107–113 (1993)
7. Hagenblad, A., Ljung, L.: Maximum likelihood estimation of Wiener models. In: Proc. 39th IEEE Conf. on Decision and Control, Sydney, Australia, pp. 2417–2418 (2000)
8. Hagenblad, A., Ljung, L., Wills, A.: Maximum likelihood identification of Wiener models. Automatica 44(11), 2697–2705 (2008)
9. Hsu, K., Vincent, T., Poolla, K.: A kernel based approach to structured nonlinear system identification, Part I: algorithms, Part II: convergence and consistency. In: Proc. IFAC Symposium on System Identification, Newcastle (2006)
10. Hunter, I.W., Korenberg, M.J.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics 55, 135–144 (1986)
11. Kalafatis, A., Arifin, N., Wang, L., Cluett, W.R.: A new approach to the identification of pH processes based on the Wiener model. Chemical Engineering Science 50(23), 3693–3701 (1995)
12. Ljung, L., Singh, R., Zhang, Q., Lindskog, P., Juditski, A.: Developments in Mathworks system identification toolbox. In: Proc. 15th IFAC Symposium on System Identification, Saint-Malo, France (2009)
13. Ljung, L.: System Identification: Theory for the User, 2nd edn. Prentice-Hall, Englewood Cliffs (1999)
14. McLachlan, G., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. John Wiley & Sons, Chichester (2008)
15. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
16. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C: The Art of Scientific Computing, 2nd edn. Cambridge University Press, Cambridge (1992)
17. Schön, T., Wills, A., Ninness, B.: System identification of nonlinear state-space models. Automatica (provisionally accepted) (2009)
18. Schoukens, J., Nemeth, J.G., Crama, P., Rolain, Y., Pintelon, R.: Fast approximate identification of nonlinear systems. Automatica 39(7), 1267–1274 (2003)
19. Vanbeylen, L., Pintelon, R., de Groen, P.: Blind maximum likelihood identification of Wiener systems with measurement noise. In: Proc. 15th IFAC Symposium on System Identification, Saint-Malo, France, pp. 1686–1691 (2009)
20. Vanbeylen, L., Pintelon, R., Schoukens, J.: Blind maximum-likelihood identification of Wiener systems. IEEE Transactions on Signal Processing 57(8), 3017–3029 (2009)
21. Westwick, D., Verhaegen, M.: Identifying MIMO Wiener systems using subspace model identification methods. Signal Processing 52, 235–258 (1996)
22. Wigren, T.: Recursive prediction error identification using the nonlinear Wiener model. Automatica 29(4), 1011–1025 (1993)
23. Wills, A.G., Mills, A.J., Ninness, B.: A MATLAB software environment for system identification. In: Proc. 15th IFAC Symposium on System Identification, Saint-Malo, France (2009)
24. Zhu, Y.: Distillation column identification for control using Wiener model. In: 1999 American Control Conference, San Diego, California, USA (1999)

Chapter 8

Parametric Versus Nonparametric Approach to Wiener Systems Identification

Grzegorz Mzyk

8.1 Introduction to Wiener Systems

The problem of nonlinear dynamic systems modelling by means of block-oriented models has been strongly elaborated for the last four decades, due to the vast variety of applications. The concept of block-oriented models assumes that the real plant, as a whole, can be treated as a system of interconnected blocks, static nonlinearities (N) and linear dynamics (L), where the interaction signals cannot be measured. The most popular in this class are two-element cascade structures, i.e., Hammerstein-type (N-L), Wiener-type (L-N), and sandwich-type (L-N-L) representations. In particular, since in the Wiener system (Figure 8.1) the nonlinear block is preceded by the linear dynamics and the nonlinearity input is correlated, its identification is much more difficult in comparison with the Hammerstein system. However, the Wiener model allows for better approximation of many real processes. Such difficulties in theoretical analysis forced the authors to consider special cases, and to take somewhat restrictive assumptions on the input signal, the impulse response of the linear dynamic block and the shape of the nonlinear characteristic. In particular, for Gaussian input the problem of Wiener system identification becomes much easier. Since the internal signal {x_k} is then also Gaussian, the linear block can be simply identified by the cross-correlation approach, and the static characteristic can be recovered e.g. by the nonparametric inverse regression approach ([14]-[16]). Non-Gaussian random input is very rarely met in the literature. It is allowed e.g. in [38], but the algorithm presented there requires prior knowledge of the parametric representation of the linear subsystem. Most recent methods for Wiener system identification assume FIR linear dynamics, invertible nonlinearity, or require the use of specially designed input excitations ([2], [12]).

Grzegorz Mzyk
Institute of Computer Engineering, Control and Robotics, Wroclaw University of Technology, Poland, e-mail: [email protected]

Fig. 8.1: Wiener system

In this chapter we compare and combine two kinds of methods, parametric ([1], [2], [5]-[12], [27]-[32], [39]-[43], [45], [47], [48]) and nonparametric ([14]-[26], [33]-[38], [44]). A method is called 'parametric' if both linear and nonlinear subsystems are described with the use of a finite number of unknown parameters, e.g. when an FIR linear dynamic model and a polynomial characteristic with known orders are assumed. The popular parametric methods elaborated for Wiener system identification are not free of the approximation error and do not allow full decentralisation of the identification task of a complex system. Moreover, the theoretical analysis of identifiability and convergence of parametric estimates remains relatively difficult. On the other hand, the nonparametric approach offers simple algorithms, which are asymptotically free of approximation error, i.e. they converge to the true system characteristics. However, purely nonparametric methods are not commonly exploited in practice for the following reasons: (i) they depend on various tuning parameters and functions; in particular, proper selection of the kernel and the bandwidth parameter, or of the orthonormal basis and the scale factor, is critical for the obtained results; (ii) the prior knowledge of subsystems is completely neglected; the estimates are based on measurements only, and the resulting model may not be satisfactory when the number of measurements is small; and (iii) a bulk number of estimates must be computed when the model complexity grows large. In Section 8.2 we recollect the traditional parametric least-squares method for Wiener system identification and discuss its weak points. Next, in Section 8.3, we present several purely nonparametric methods, i.e., a correlation-based estimate of the linear dynamics, a kernel estimate of the inverse regression, and a censored sample mean approach to nonlinearity recovering. Finally, selected parametric and nonparametric methods are combined and the properties of the proposed two-stage procedures are discussed in Section 8.4.

8.2 Nonlinear Least Squares Method

The Wiener system, i.e. the discrete-time linear dynamics with the impulse response {λ_j}_{j=0}^∞, connected in cascade with the static nonlinear block characterised by μ(), is described by the following equation

\[ y_k = \mu\Big(\sum_{j=0}^{\infty}\lambda_j u_{k-j}\Big) + z_k, \]


where u_k, y_k, and z_k are the input, output and the random disturbance, respectively. The goal of identification is to recover both elements, i.e. {λ_j}_{j=0}^∞ and μ(x) for each x ∈ R, using the set of input-output measurements {(u_k, y_k)}_{k=1}^N. In the traditional (parametric) approach we also assume finite-dimensional descriptions, e.g. the ARMA-type dynamic block

\[ x_k + a_1^* x_{k-1} + \ldots + a_r^* x_{k-r} = b_0^* u_k + b_1^* u_{k-1} + \ldots + b_s^* u_{k-s}, \quad\text{i.e.}\quad x_k = \phi_k^T\theta^*, \]
\[ \phi_k = (-x_{k-1}, \ldots, -x_{k-r}, u_k, u_{k-1}, \ldots, u_{k-s})^T, \tag{8.1} \]
\[ \theta^* = (a_1^*, \ldots, a_r^*, b_0^*, b_1^*, \ldots, b_s^*)^T, \]

and a given formula μ(x, c*) = μ(x) including a finite number of unknown true parameters c* = (c_1^*, c_2^*, ..., c_m^*)^T. The respective Wiener model is thus represented by r + (s + 1) + m parameters, i.e.,

\[ \bar{x}_k \triangleq \bar\phi_k^T\theta, \quad \text{and } \bar{x}_k = 0 \text{ for } k \leq 0, \]
\[ \text{where}\quad \bar\phi_k = (-\bar{x}_{k-1}, \ldots, -\bar{x}_{k-r}, u_k, u_{k-1}, \ldots, u_{k-s})^T, \tag{8.2} \]
\[ \theta = (a_1, \ldots, a_r, b_0, b_1, \ldots, b_s)^T, \]

and ȳ(x, c) = μ(x, c), where c = (c_1, c_2, ..., c_m)^T. If the assumed model (8.2) agrees with the true system description (8.1), then the results of identification can be significantly improved in comparison with the nonparametric approach, particularly if the number of measurements is small. On the other hand, the risk of bad parametrisation and the existence of a systematic approximation error must be taken into account, together with the question of convergence of the parameter estimates. If x_k had been accessible for measurements, then the true system parameters could have been estimated by the following minimisations:

\[ \hat\theta = \arg\min_{\theta}\sum_{k=1}^{N}\big(x_k - \bar{x}_k(\theta)\big)^2, \qquad \hat{c} = \arg\min_{c}\sum_{k=1}^{N}\big(y_k - \bar{y}(x_k, c)\big)^2. \tag{8.3} \]

Here we assume that only the input-output measurements (u_k, y_k) of the whole Wiener system are accessible, and the internal signal x_k is hidden. This observation leads to the following nonlinear least squares problem

\[ \{\hat\theta, \hat{c}\} = \arg\min_{\theta,c}\sum_{k=1}^{N}\big[y_k - \bar{y}\big(\bar{x}_k(\theta), c\big)\big]^2, \tag{8.4} \]

which is usually very complicated numerically. Moreover, uniqueness of the solution in (8.4) cannot be guaranteed in general, as it depends on both input distribution, types of models, and values of parameters. Recent publications concerning


application of neural networks or soft computing methods to the Wiener system identification problem do not include any theoretical analysis of convergence.
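For orientation, a minimal sketch of the criterion (8.4) for the simpler FIR-plus-polynomial parametrisation is given below. The model orders, the solver and all names are assumptions of this sketch, and, as noted above, the criterion is non-convex so only a local minimiser can be expected.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_wiener_nls(u, y, s=2, m=3):
    """Illustrative nonlinear least squares fit of (8.4) for a Wiener
    model with an FIR linear block of order s and a polynomial
    nonlinearity of order m."""
    def residuals(p):
        lam, c = p[:s + 1], p[s + 1:]
        x = np.convolve(u, lam)[:len(u)]   # FIR internal signal x_k
        return y - np.polyval(c, x)        # model output mu(x_k, c)
    p0 = np.concatenate([np.ones(s + 1) / (s + 1), np.zeros(m + 1)])
    p0[-2] = 1.0                           # start near mu(x) = x
    sol = least_squares(residuals, p0)     # local minimiser of (8.4)
    return sol.x[:s + 1], sol.x[s + 1:]
```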

8.3 Nonparametric Identification Tools

The nonparametric approach to block-oriented system identification was introduced in the eighties by Greblicki and Pawlak ([20], [37]). For the system of Hammerstein structure, i.e. the reverse connection described by the equation y_k^H = Σ_{j=0}^∞ γ_j m(u_{k−j}) + z_k, it was noticed that the input-output regression function is equivalent to the nonlinear static characteristic, up to some scale and offset factors:

\[ R(u) \triangleq E\{y_{k+l}^H \mid u_k = u\} = \gamma_l m(u) + \sum_{j\neq l}\gamma_j\, E\,m(u_1). \]

Since then, two kinds of nonparametric methods have been examined: first, those based on kernel regression estimation, and second, those employing the orthogonal series expansion of the nonlinearity. Also, the cross-correlation method was proposed for the estimation of the impulse response of the linear dynamic block in the Hammerstein system. Analogous ideas were also applied by Greblicki to a class of Wiener systems with Gaussian input and locally invertible characteristics [16]. The respective algorithms are briefly recalled in Sections 8.3.1 and 8.3.2. In Section 8.3.3 we introduce and analyse a new kind of nonlinearity estimate in the Wiener system, which works under the least possible prior knowledge, i.e. under non-Gaussian input, IIR linear dynamics and any continuous, but not necessarily invertible, static characteristic.

8.3.1 Inverse Regression Approach

Assume that the input u_k and the noise ε_k are white Gaussian, mutually independent processes with finite variances σ_u², σ_ε² < ∞, the noise ε_k is zero-mean, i.e. Eε_k = 0, and the output measurement noise z_k is not present, i.e. the noise-free output is observed. The nonparametric estimation of the inverted regression relies on the following lemma.

Lemma 8.1. [21] If μ() is invertible, then for any y ∈ μ(R) it holds that

\[ E\{u_k \mid y_{k+p} = y\} = \alpha_p\,\mu^{-1}(y) \tag{8.5} \]

where α_p = λ_p σ_u²/σ_v².

Since for any time lag p the μ^{−1}(y) can be identified only up to some multiplicative constant α_p, let us denote, for convenience, v(y) = α_p μ^{−1}(y). The nonparametric estimate of v(y) has the form

\[ \widehat{v}(y) = \frac{\sum_{k=1}^{N} u_k\, K\big(\frac{y - y_{k+p}}{h(N)}\big)}{\sum_{k=1}^{N} K\big(\frac{y - y_{k+p}}{h(N)}\big)}, \tag{8.6} \]


where K() and h(N) are a kernel function and a bandwidth parameter, respectively. The following theorem holds.

Theorem 8.1. [21] If μ() is invertible, K() is Lipschitz and such that c_1 H(|y|) ≤ K(y) ≤ c_2 H(|y|) for some c_1 and c_2, where H() is a nonnegative and non-increasing function, defined on [0, ∞), continuous and positive at t = 0, and such that tH(t) → 0 as t → ∞, then for h(N) → 0 and Nh²(N) → ∞ as N → ∞ it holds that

\[ \widehat{v}(y) \to v(y) \;\text{ in probability as } N \to \infty, \tag{8.7} \]

at every point y in which the probability density f(y) is positive and continuous. The rate of convergence in (8.7) depends on the smoothness of the identified characteristic and is provided by the following lemma.

Lemma 8.2. [21] Let us define g(y) ≜ v(y)f(y), and denote v(y) = g(y)/f(y). If μ(), f() and g() have q bounded derivatives in a neighbourhood of y, then

\[ |\widehat{v}(y) - v(y)| = O\big(N^{-\frac{1}{2}+\frac{1}{2q+2}}\big) \;\text{ in probability,} \]

e.g. O(N^{−1/4}) for q = 1, O(N^{−1/3}) for q = 2, and O(N^{−1/2}) for large q. In [16] and [19], the estimate (8.6) was generalised to a larger class of Wiener systems, admitting "locally invertible" nonlinear static blocks and correlated excitation. The strongest limitation of the inverse regression approach is thus the assumption of Gaussianity of the input signal.
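A direct transcription of the estimate (8.6) is straightforward; in the sketch below the Gaussian kernel is our own choice (any kernel satisfying Theorem 8.1 would do), and all names are illustrative.

```python
import numpy as np

def inverse_regression(u, y, p, ygrid, h):
    """Kernel estimate (8.6) of v(y) = alpha_p * mu^{-1}(y).
    u, y : input/output records; p : time lag; h : bandwidth."""
    N = len(u) - p
    vhat = np.empty(len(ygrid))
    for i, yy in enumerate(ygrid):
        K = np.exp(-0.5 * ((yy - y[p:p + N]) / h) ** 2)  # K((y - y_{k+p})/h)
        vhat[i] = np.sum(u[:N] * K) / np.sum(K)
    return vhat
```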

8.3.2 Cross-correlation Analysis

The nonparametric identification of the linear dynamic block is based on the following property.

Lemma 8.3. [21] If E|v_k μ(v_k)| < ∞ then

\[ E\{u_k y_{k+p}\} = \beta\lambda_p, \quad \text{where } \beta = \frac{\sigma_u^2}{\sigma_v^2}E\{v_k\,\mu(v_k)\}. \]

Since λ_p can be identified only up to some multiplicative constant β, let us denote, for convenience, κ_p ≜ βλ_p, and consider its natural estimate of the form

\[ \widehat\kappa_p = \frac{1}{N}\sum_{k=1}^{N} u_k y_{k+p}. \tag{8.8} \]

Theorem 8.2. [21] If μ() is a Lipschitz function, then

\[ \lim_{N\to\infty} E\big(\widehat\kappa_p - \kappa_p\big)^2 = O\left(\frac{1}{N}\right). \tag{8.9} \]


Consequently, when the stable IIR linear subsystem is modelled by the filter with the impulse response κ̂_0, κ̂_1, ..., κ̂_{n(N)}, it is free of the asymptotic approximation error if n(N) → ∞ and n(N)/N → 0 as N → ∞.
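The estimate (8.8) is essentially a one-liner in practice; the following sketch computes κ̂_0, ..., κ̂_n (names are ours):

```python
import numpy as np

def cross_correlation_ir(u, y, n):
    """Estimate the scaled impulse response kappa_p = beta*lambda_p,
    p = 0,...,n, via the sample cross-correlation (8.8)."""
    N = len(u)
    return np.array([np.sum(u[:N - p] * y[p:]) / N for p in range(n + 1)])
```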

8.3.3 A Censored Sample Mean Approach

In this section we assume that the input {u_k} is an i.i.d., bounded (|u_k| < u_max; unknown u_max < ∞), but not necessarily Gaussian random process. There exists a probability density of the input, ϑ_u(u_k) say, which is a continuous and strictly positive function around the estimation point x, i.e., ϑ_u(x) ≥ ε > 0. The unknown impulse response {λ_j}_{j=0}^∞ of the linear IIR filter is exponentially upper bounded, that is

\[ |\lambda_j| \leq c_1\lambda^j, \quad \text{for some unknown } 0 < c_1 < \infty, \tag{8.10} \]

where 0 < λ < 1 is an a priori known constant. The nonlinearity μ(x) is an arbitrary function, continuous almost everywhere on x ∈ (−u_max, u_max) (in the sense of Lebesgue measure). The output noise {z_k} is a zero-mean stationary and ergodic process, which is independent of the input {u_k}. For simplicity of presentation we also let L ≜ Σ_{j=0}^∞ λ_j = 1 and u_max = 1/2. We note that the members of the family of Wiener systems composed of the series connection of linear filters with the impulse responses {λ_j/c_2}_{j=0}^∞ and the nonlinearities μ(c_2 x) are, for c_2 ≠ 0, indistinguishable from the input-output point of view. In consequence, the characteristic μ() can in general be recovered only up to some domain scaling factor c_2, independently of the applied identification method. Observe that, in particular, for FIR linear dynamics, the condition (8.10) is fulfilled for an arbitrarily small constant λ > 0. Moreover, it holds that |x_k| < x_max < ∞, where x_max ≜ u_max Σ_{j=0}^∞ |λ_j|. Since Σ_{j=0}^∞ |λ_j| ≥ L and L = 1, the support of the random variables x_k, i.e. (−x_max, x_max), is generally wider than the estimation interval x ∈ (−u_max, u_max). We introduce and analyse a nonparametric estimate of the part of the characteristic μ(x) for x ∈ (−u_max, u_max), and next we extend the obtained results to x ∈ (−x_max, x_max), when the parametric knowledge of μ() is provided.

Let x be a chosen estimation point of μ(·). For a given x let us define a "weighted distance" between the measurements u_k, u_{k−1}, u_{k−2}, ..., u_1 and x as

\[ \delta_k(x) \triangleq \sum_{j=0}^{k-1}|u_{k-j} - x|\,\lambda^j = |u_k - x|\lambda^0 + |u_{k-1} - x|\lambda^1 + \ldots + |u_1 - x|\lambda^{k-1}, \tag{8.11} \]

i.e. δ_1(x) = |u_1 − x|, δ_2(x) = |u_2 − x| + |u_1 − x|λ, δ_3(x) = |u_3 − x| + |u_2 − x|λ + |u_1 − x|λ², etc., which can be computed recursively as follows:

\[ \delta_k(x) = \lambda\,\delta_{k-1}(x) + |u_k - x|. \tag{8.12} \]


Under the above assumptions we obtain

\[ |x_k - x| = \Big|\sum_{j=0}^{\infty}\lambda_j u_{k-j} - \sum_{j=0}^{\infty}\lambda_j x\Big| = \Big|\sum_{j=0}^{\infty}\lambda_j\big(u_{k-j} - x\big)\Big| = \Big|\sum_{j=0}^{k-1}\lambda_j\big(u_{k-j} - x\big) + \sum_{j=k}^{\infty}\lambda_j\big(u_{k-j} - x\big)\Big| \]
\[ \leq \sum_{j=0}^{k-1}|\lambda_j|\,|u_{k-j} - x| + 2u_{max}\sum_{j=k}^{\infty}|\lambda_j| \leq \delta_k(x) + \frac{\lambda^k}{1-\lambda} \triangleq \Delta_k(x). \tag{8.13} \]

Observe that if, in turn,

\[ \Delta_k(x) \leq h(N), \tag{8.14} \]

then the true (but unknown) interaction input x_k is located close to x, provided that h(N) (further, a calibration parameter) is small. The distance given in (8.13) may easily be computed, as the point x and the data u_k, u_{k−1}, u_{k−2}, ..., u_1 are each time at one's disposal. In turn, the condition (8.14) selects the k's for which the input sequences {u_k, u_{k−1}, u_{k−2}, ..., u_1} are such that the true nonlinearity inputs {x_k} surely belong to the neighbourhood of the estimation point x with radius h(N). Let us also notice that asymptotically, as k → ∞, it holds that

\[ \delta_k(x) = \Delta_k(x), \tag{8.15} \]

with probability 1.

Proposition 8.1. If, for each j = 0, 1, ..., ∞ and some d > 0, it holds that

\[ |u_{k-j} - x| \leq \frac{d}{\lambda^j}, \tag{8.16} \]

then

\[ |x_k - x| \leq d\log_\lambda d + d\,\frac{\lambda}{1-\lambda}. \tag{8.17} \]

Proof. The condition (8.16) is fulfilled with probability 1 for each j > j_0, where j_0 = log_λ d is the solution of the following inequality

\[ \frac{d}{\lambda^j} \geq 2u_{max} = 1. \]

Analogously as in (8.13), we obtain

\[ |x_k - x| \leq \sum_{j=0}^{j_0}\lambda^j\,\frac{d}{\lambda^j} + \frac{\lambda^{j_0+1}}{1-\lambda}, \]

which yields (8.17). □

118

G. Mzyk

# " δk (x) ∑Nk=1 yk · K h(N)

N (x) = # , " μ δk (x) ∑Nk=1 K h(N) where K() is the window kernel function of the form  1, as |v|  1 K(v) = . 0, elsewhere

(8.18)

(8.19)

Since the estimate (8.18) is of the ratio form we treat the case 0/0 as 0. Theorem 8.3. If h(N) = d(N) logλ d(N), where d(N) = N −γ (N) , and " #−w   γ (N) = log1/λ N , then for each w ∈ 12 , 1 the estimate (8.18) is consistent in the mean square sense, i.e., it holds that

N (x) − μ (x)) = 0. lim E (μ 2

N→∞

(8.20)

Proof. Let us denote the probability of selection as p(N)  P (Δk (x)  h(N)). To prove (8.20) it suffices to show that (see (19) and (22) in [33]) h(N) → 0, N p(N) → ∞,

(8.21) (8.22)

N (x), respectively. as N → ∞. They assure vanishing of the bias and variance of μ Since under assumptions of Theorem 8.3 d(N) → 0 ⇒ h(N) → 0,

(8.23)

in view of (8.17), the bias-condition (8.21) is obvious. For the variance-condition (8.22) we have ⎧ ⎫   ⎨min(k, - j0 ) ( ( d(N) ⎬ (uk− j − x( < p(N)  P  ⎩ j=0 λj ⎭ ⎫ ⎧     ⎨min(k, j0 - j0 ) ( ( d(N) ⎬ ( ( d(N) ( ( (uk− j − x( < u < = P P − x  k− j ⎩ j=0 λj ⎭ ∏ λj j=0 d(N) d(N) d(N) (ε d(N)) j0 +1 · ε · ... · ε = = j0 ( j0 +1) λ0 λ1 λ j0 λ 2 & ' j0 +1 " 1 # j0 +1 1 1 ε d(N) = = ε d(N) = ε · d(N) 2 logλ d(N)+logλ ε + 2 . j0 λ2 ε

(8.24)


By inserting d(N) = N^{−γ(N)} into (8.24) we obtain

\[ N \cdot p(N) \geq \varepsilon \cdot N^{\,1-\gamma(N)\left(\frac{1}{2}\gamma(N)\log_{1/\lambda}N + \log_\lambda\varepsilon + \frac{1}{2}\right)}. \tag{8.25} \]

For γ(N) = (log_{1/λ} N)^{−w} and w ∈ (1/2, 1), from (8.25) we simply conclude (8.22) and consequently (8.20). □

To establish the asymptotic rate of convergence we additionally assume that the nonlinear characteristic μ(x) is a Lipschitz function, i.e., there exists a positive constant l < ∞ such that for each x_a, x_b ∈ R it holds that |μ(x_a) − μ(x_b)| ≤ l|x_a − x_b|. For the window kernel (8.19) we can rewrite (8.18) as μ̂_N(x) = (1/S_0)Σ_{i=1}^{S_0} y_{[i]}, where the [i]'s are the indexes for which K(δ_{[i]}(x)/h(N)) = 1, and S_0 is the random number of selected output measurements. For each y_{[i]}, i = 1, 2, ..., S_0, the respective x_{[i]} is such that |x_{[i]} − x| ≤ h(N), and consequently |μ(x_{[i]}) − μ(x)| ≤ lh(N), which for Ez_k = 0 leads to

\[ |\mathrm{bias}\,\widehat\mu_N(x)| = \big|E y_{[i]} - \mu(x)\big| = \big|E\mu(x_{[i]}) - \mu(x)\big| \leq l\,h(N), \]
\[ \mathrm{bias}^2\widehat\mu_N(x) = O\big(h^2(N)\big). \tag{8.26} \]

For the variance we have

\[ \mathrm{var}\,\widehat\mu_N(x) = \sum_{n=0}^{N} P(S_0 = n)\,\mathrm{var}\big(\widehat\mu_N(x) \mid S_0 = n\big) = \sum_{n=1}^{N} P(S_0 = n)\,\mathrm{var}\Big(\frac{1}{n}\sum_{i=1}^{n} y_{[i]}\Big). \]

Since, under the strong law of large numbers and the Chebyshev inequality, it holds that lim_{N→∞} P(S_0 > αES_0) = 1 for each 0 < α < 1 (see [33]), we obtain asymptotically

\[ \mathrm{var}\,\widehat\mu_N(x) = \sum_{n>\alpha ES_0} P(S_0 = n)\,\mathrm{var}\Big(\frac{1}{n}\sum_{i=1}^{n} y_{[i]}\Big) \tag{8.27} \]

with probability 1. Taking into account that y_{[i]} = ȳ_{[i]} + z_{[i]}, where ȳ_{[i]} and z_{[i]} are independent random variables, we obtain

\[ \mathrm{var}\Big(\frac{1}{n}\sum_{i=1}^{n} y_{[i]}\Big) = \mathrm{var}\Big(\frac{1}{n}\sum_{i=1}^{n} \bar{y}_{[i]}\Big) + \mathrm{var}\Big(\frac{1}{n}\sum_{i=1}^{n} z_{[i]}\Big). \tag{8.28} \]

Since the process {z_{[i]}} is ergodic, under the strong law of large numbers it holds that

\[ \mathrm{var}\Big(\frac{1}{n}\sum_{i=1}^{n} z_{[i]}\Big) = O\Big(\frac{1}{N p(N)}\Big) = O\Big(\frac{1}{N}\Big). \tag{8.29} \]


The process {ȳ_{[i]}} is in general not ergodic, but in consequence of (8.14) it has compact support [μ(x) − lh(N), μ(x) + lh(N)] and the following inequality holds

\[ \mathrm{var}\Big(\frac{1}{n}\sum_{i=1}^{n}\bar{y}_{[i]}\Big) \leq \mathrm{var}\,\bar{y}_{[i]} \leq \big(2lh(N)\big)^2. \tag{8.30} \]

From (8.27), (8.28), (8.29) and (8.30) we conclude that

\[ \mathrm{var}\,\widehat\mu_N(x) = O\big(h^2(N)\big), \tag{8.31} \]

which in view of (8.26) leads to

\[ |\widehat\mu_N(x) - \mu(x)| = O\big(h^2(N)\big) \tag{8.32} \]

in the mean square sense. A relatively slow rate of convergence, guaranteed in the general case for h(N) as in Theorem 8.3, is a consequence of the small amount of a priori information. We emphasise that for, e.g., piecewise constant functions μ(x), often met in applications, there exists N̄ < ∞ such that bias²μ̂_N(x) = 0 and var((1/n)Σ_{i=1}^n ȳ_{[i]}) = 0 for N > N̄, and consequently |μ̂_N(x) − μ(x)| = O(1/N) as N → ∞ (see (8.29)).

8.4 Combined Parametric-nonparametric Approach

The idea of the combined parametric-nonparametric approach to system identification was introduced by Hasiewicz and Mzyk in [24], and continued in [25], [33], [35], and [38]. The algorithms decompose the complex system identification task into independent identification problems for each component. The decomposition is based on the estimation of the interaction inputs x_k. Next, using the resulting pairs (u_k, x̂_k) and (x̂_k, y_k), both the linear dynamic and the static nonlinear subsystems are identified separately. In contrast to the Hammerstein system, where x_k^H = m(u_k) may be estimated directly by any regression estimation method, for the Wiener system the situation is more complicated, as x_k = Σ_{j=0}^∞ λ_j u_{k−j}, and the impulse response of the linear dynamics must be estimated first, to provide indirect estimates of x_k.

8.4.1 Kernel Method with the Correlation-based Internal Signal Estimation

Here we assume that the input u_k is white and Gaussian, the nonlinear characteristic μ() is bounded by a polynomial of any finite order, cov(u_1, y_1) ≠ 0, and the linear dynamics is FIR with known order s, i.e. x_k = Σ_{j=0}^s λ_j u_{k−j}. Observe that E{y_k | x_k = x} = μ(x). Since the internal signal x_k cannot be measured, the following kernel regression estimate is proposed (see [44])

\[ \widehat\mu(x) = \frac{\sum_{k=1}^{N} y_k\, K\big(\frac{x - \widehat{x}_k}{h(N)}\big)}{\sum_{k=1}^{N} K\big(\frac{x - \widehat{x}_k}{h(N)}\big)}, \tag{8.33} \]

where x̂_k is an indirect estimate of βx_k (i.e. scaled x_k),

\[ \widehat{x}_k = \sum_{j=0}^{s}\widehat\kappa_j u_{k-j}, \]

based on the input-output sample correlation (see (8.8)). The following theorem holds.

Theorem 8.4. [44] If K() is Lipschitz, then μ̂(x) → μ(x/β) in probability as N → ∞. Moreover, if both μ() and K() are twice differentiable, then it holds that

\[ |\widehat\mu(x) - \mu(x/\beta)| = O\big(N^{-\frac{2}{5}+\varepsilon}\big) \tag{8.34} \]

for any small ε > 0, provided that h(N) ∼ N^{−1/5}. In practice, due to the assumed Gaussianity of excitations, the algorithm (8.33) is rather recommended for tasks in which the input process can be freely generated.

8.4.2 Identification of IIR Wiener Systems with Non-Gaussian Input

The presented kernel-type algorithm (8.18) is applied in this section to support the estimation of parameters when our prior knowledge about the system is large and, in particular, the parametric model of the characteristic is known. Assume that we are given the class μ(x, c), such that μ(x) ⊂ μ(x, c), where c = (c_1, c_2, ..., c_m)^T, i.e. for the vector of true parameters c* = (c_1*, c_2*, ..., c_m*)^T it holds that μ(x, c*) = μ(x). Let moreover the function μ(x, c) be by assumption differentiable with respect to c, and the gradient ∇_c μ(x, c) be bounded in some convex neighbourhood of c* for each x. We assume that c* is identifiable, i.e., there exists such a sequence x^{(1)}, x^{(2)}, ..., x^{(N_0)} of estimation points that

\[ \mu(x^{(i)}, c) = \mu(x^{(i)}),\; i = 1, 2, \ldots, N_0 \;\Longrightarrow\; c = c^*. \]

The proposed estimate has two steps.

Step 1. For the sequence x^{(1)}, x^{(2)}, ..., x^{(N_0)} compute the N_0 pairs

\[ \Big\{\big(x^{(i)},\, \widehat\mu_N(x^{(i)})\big)\Big\}_{i=1}^{N_0}, \]

using the estimate (8.18).


Step 2. Perform the minimisation of the cost function

\[ Q_{N_0,N}(c) = \sum_{i=1}^{N_0}\Big(\widehat\mu_N(x^{(i)}) - \mu(x^{(i)}, c)\Big)^2 \]

with respect to the variable vector c, and take

\[ \widehat{c}_{N_0,N} = \arg\min_{c}\, Q_{N_0,N}(c) \tag{8.35} \]

as the estimate of c*.

Theorem 8.5. Since in Step 1 (nonparametric) for the estimate (8.18) it holds that μ̂_N(x^{(i)}) → μ(x^{(i)}) in probability as N → ∞ for each i = 1, 2, ..., N_0, thus

\[ \widehat{c}_{N_0,N} \to c^* \;\text{ in probability, as } N \to \infty. \]

Proof. The proof is analogous to that of Theorem 1 in [25]. □
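The two steps above can be sketched compactly; in the code below, censored_sample_mean refers to the sketch of (8.18) given in Section 8.3.3, mu_model(x, c) is the assumed parametric class (vectorised in x), and the unconstrained solver is our own choice:

```python
import numpy as np
from scipy.optimize import minimize

def two_stage_fit(u, y, x_points, lam, h, mu_model, c0):
    """Two-stage combined estimator of Section 8.4.2: Step 1 evaluates
    the nonparametric estimate (8.18) on a grid of points; Step 2 fits
    the parametric model mu_model(x, c) to those pairs via (8.35)."""
    mu_hat = np.array([censored_sample_mean(u, y, x, lam, h) for x in x_points])
    cost = lambda c: np.sum((mu_hat - mu_model(np.asarray(x_points), c))**2)  # Q_{N0,N}(c)
    return minimize(cost, c0).x
```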

8.4.3 Recent Ideas

An interesting new attempt at the impulse response estimation of the linear block in the Wiener system is presented in [44]. It is assumed that the input probability density f(u) has compact support, both μ() and f() have continuous derivatives, and the linear dynamics is FIR with known order s. We emphasise that, similarly as in the correlation-based algorithm (see Section 8.3.2), the characteristic μ() need not be invertible and, moreover, the input density is not assumed to be Gaussian. The idea follows from the chain rule. Introducing the vectors u_k = (u_k, u_{k−1}, ..., u_{k−s})^T and λ = (λ_0, λ_1, ..., λ_s)^T, one can describe the Wiener system by the following formula

\[ y_k = F(\mathbf{u}_k) + z_k, \quad \text{where } F(\mathbf{u}_k) = \mu(\lambda^T\mathbf{u}_k). \]

Let D_F(u) be the gradient of F(). It holds that D_F(u_k) = μ'(λ^T u_k)λ and consequently

\[ E\{D_F(\mathbf{u}_k)\} = c_0\lambda, \quad \text{where } c_0 = E\{\mu'(\lambda^T\mathbf{u}_k)\}. \tag{8.36} \]

It leads to the idea of estimating the scaled vector λ, including the true elements of the impulse response, by gradient averaging. Since for a given u_k, μ'(λ^T u_k) is unknown, D_F(u_k) cannot be computed directly. Introducing f_u(), the joint probability density of u_k, the property (8.36) can be transformed to the more applicable form ([44])

\[ E\{y_k D_f(\mathbf{u}_k)\} = c_1\lambda, \quad \text{where } c_1 = \tfrac{1}{2}\,E\{f_u(\mathbf{u}_k)\,\mu'(\lambda^T\mathbf{u}_k)\}, \]


and D_f(u) is the gradient of f_u(). Since for white u_k we have f_u(u_k) = ∏_{j=0}^{s} f(u_{k−j}), it leads to the following scalar estimates of the impulse response

\[ \widehat\lambda_j = \frac{1}{N}\sum_{k=1}^{N} y_k\, d_{f,j}(\mathbf{u}_k), \quad \text{where } d_{f,j}(\mathbf{u}_k) = D_f(\mathbf{u}_k)[j] = f'(u_{k-j})\prod_{i=0,\,i\neq j}^{s} f(u_{k-i}). \tag{8.37} \]

If the input probability density function f(u) is unknown, it can be simply estimated, e.g., by the kernel method. The open question is the generalisation of the approach to IIR linear subsystems and correlated input cases.
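A direct transcription of (8.37) for a known input density might look as follows (f and fprime stand for the density and its derivative, assumed vectorised; names are ours, and the sum starts at k = s so that a full input window is available):

```python
import numpy as np

def gradient_average_ir(u, y, s, f, fprime):
    """Average-derivative estimate (8.37) of the scaled impulse response
    of an FIR Wiener system with known input density f and derivative
    fprime."""
    N = len(u)
    lam_hat = np.zeros(s + 1)
    for k in range(s, N):
        uk = u[k - s:k + 1][::-1]          # (u_k, u_{k-1}, ..., u_{k-s})
        dens = f(uk)
        for j in range(s + 1):
            d_fj = fprime(uk[j]) * np.prod(np.delete(dens, j))  # d_{f,j}
            lam_hat[j] += y[k] * d_fj
    return lam_hat / N
```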

8.5 Conclusion

The principal question in the Wiener system identification problem is the selection of an adequate method. The scope of application of each estimate is limited by a specific set of associated assumptions. Most of them require an a priori known parametric type of model, Gaussian input, FIR dynamics or an invertible characteristic. In fact, the authors address particular cases, and the problems they solve are quite different (see the references below). Since the general Wiener system identification problem includes many difficult aspects, the existence of one universal algorithm cannot be expected. In the light of this, the nonparametric approach seems to be a good tool, which allows for combining selected methods, depending on the specificity of the particular task. Moreover, pure nonparametric estimates are the only possible choice when the prior knowledge of the system is poor.

References

1. Bai, E.W.: A blind approach to Hammerstein–Wiener model identification. Automatica 38, 969–979 (2002)
2. Bai, E.W.: Frequency domain identification of Wiener models. Automatica 39, 1521–1530 (2003)
3. Bai, E.W., Reyland, J.: Towards identification of Wiener systems with the least amount of a priori information on the nonlinearity. Automatica 44, 910–919 (2008)
4. Bai, E.W., Reyland, J.: Towards identification of Wiener systems with the least amount of a priori information: IIR cases. Automatica 45(4), 956–964 (2009)
5. Bershad, N.J., Bouchired, S., Castanie, F.: Stochastic analysis of adaptive gradient identification of Wiener–Hammerstein systems for Gaussian inputs. IEEE Transactions on Signal Processing 48(2), 557–560 (2000)
6. Bershad, N.J., Celka, P., Vesin, J.M.: Analysis of stochastic gradient tracking of time-varying polynomial Wiener systems. IEEE Transactions on Signal Processing 48(6), 1676–1686 (2000)
7. Billings, S.A., Fakhouri, S.Y.: Identification of nonlinear systems using the Wiener model. Automatica 13(17), 502–504 (1977)
8. Billings, S.A., Fakhouri, S.Y.: Identification of systems containing linear dynamic and static nonlinear elements. Automatica 18(1), 15–26 (1982)
9. Celka, P., Bershad, N.J., Vesin, J.M.: Fluctuation analysis of stochastic gradient identification of polynomial Wiener systems. IEEE Transactions on Signal Processing 48(6), 1820–1825 (2000)
10. Celka, P., Bershad, N.J., Vesin, J.M.: Stochastic gradient identification of polynomial Wiener systems: analysis and application. IEEE Transactions on Signal Processing 49(2), 301–313 (2001)
11. Chen, H.F.: Recursive identification for Wiener model with discontinuous piece-wise linear function. IEEE Transactions on Automatic Control 51(3), 390–400 (2006)
12. Giri, F., Rochdi, Y., Chaoui, F.Z.: An analytic geometry approach to Wiener system frequency identification. IEEE Transactions on Automatic Control 54(4), 683–696 (2009)
13. Giannakis, G.B., Serpedin, E.: A bibliography on nonlinear system identification. Signal Processing 81, 533–580 (2001)
14. Greblicki, W.: Nonparametric identification of Wiener systems. IEEE Transactions on Information Theory 38, 1487–1493 (1992)
15. Greblicki, W.: Nonparametric identification of Wiener systems by orthogonal series. IEEE Transactions on Automatic Control 39(10), 2077–2086 (1994)
16. Greblicki, W.: Nonparametric approach to Wiener system identification. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 44(6), 538–545 (1997)
17. Greblicki, W.: Continuous-time Wiener system identification. IEEE Transactions on Automatic Control 43(10), 1488–1493 (1998)
18. Greblicki, W.: Recursive identification of Wiener systems. International Journal of Applied Mathematics and Computer Science 11(4), 977–991 (2001)
19. Greblicki, W.: Nonlinearity recovering in Wiener system driven with correlated signal. IEEE Transactions on Automatic Control 49(10), 1805–1810 (2004)
20. Greblicki, W., Pawlak, M.: Identification of discrete Hammerstein systems using kernel regression estimates. IEEE Transactions on Automatic Control 31, 74–77 (1986)
21. Greblicki, W., Pawlak, M.: Nonparametric System Identification. Cambridge University Press, Cambridge (2008)
22. Greblicki, W., Mzyk, G.: Semiparametric approach to Hammerstein system identification. In: Proceedings of the 15th IFAC Symposium on System Identification, Saint-Malo, France, July 6-8, pp. 1680–1685 (2009)
23. Hasiewicz, Z.: Identification of a linear system observed through zero-memory nonlinearity. International Journal of Systems Science 18, 1595–1607 (1987)
24. Hasiewicz, Z., Mzyk, G.: Combined parametric-nonparametric identification of Hammerstein systems. IEEE Transactions on Automatic Control 49, 1370–1376 (2004)
25. Hasiewicz, Z., Mzyk, G.: Hammerstein system identification by non-parametric instrumental variables. International Journal of Control 82(3), 440–455 (2009)
26. Hasiewicz, Z., Śliwiński, P., Mzyk, G.: Nonlinear system identification under various prior knowledge. In: Proceedings of the 17th World Congress of the IFAC, Seoul, Korea, pp. 7849–7858 (2008)
27. Hagenblad, A., Ljung, L., Wills, A.: Maximum likelihood identification of Wiener models. Automatica 44, 2697–2705 (2008)
28. Hu, X.L., Chen, H.F.: Strong consistence of recursive identification of Wiener systems. Automatica 41, 1905–1916 (2005)
29. Hughes, M.C., Westwick, D.T.: Identification of IIR Wiener systems with spline nonlinearities that have variable knots. IEEE Transactions on Automatic Control 50(10), 1617–1622 (2005)
30. Hunter, I.W., Korenberg, M.J.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics 55, 135–144 (1986)

8

Parametric Versus Nonparametric Approach to Wiener Systems Identification

125

31. Korenberg, M.J., Hunter, I.W.: The identification of nonlinear biological systems: LNL cascade models. Biological Cybernetics 55, 125–134 (1986) 32. Lacy, S.L., Bernstein, D.S.: Identification of FIR Wiener systems with unknown, noninvertible, polynomial non-linearities. International Journal of Control 76(15), 1500– 1507 (2003) 33. Mzyk, G.: A censored sample mean approach to nonparametric identification of nonlinearities in Wiener systems. IEEE Transactions on Circuits and Systems – II: Express Briefs 54(10), 897–901 (2007) 34. Mzyk, G.: Generalized kernel regression estimate for the identification of Hammerstein systems. International Journal of Applied Mathematics and Computer Science 17(2), 101–109 (2007) 35. Mzyk, G.: Nonlinearity recovering in Hammerstein system from short measurement sequence. IEEE Signal Processing Letters 16(9), 762–765 (2009) 36. Mzyk, G.: Kernel-type identification of IIR Wiener systems with non-gaussian input. IEEE Transactions on Signal Processing (2010) 37. Nadaraya, E.A.: On estimating regression. Theory of Probability and its Applications 9, 141–142 (1964) 38. Pawlak, M., Hasiewicz, Z., Wachel, P.: On nonparametric identification of Wiener systems. IEEE Transactions on Signal Processing 55(2), 482–492 (2007) 39. Pupeikis, R.: On the identification of Wiener systems having saturation-like functions with positive slopes. Informatica 16(1), 131–144 (2005) 40. Rafajłowicz, E.: Non-parametric identification with errors in independent variables. International Journal of Systems Science 25(9), 1513–1520 (1994) 41. Vanbeylen, L., Pintelon, R., Schoukens, J.: Blind maximum-likelihood identification of Wiener systems. IEEE Transactions on Signal Processing 57(8), 3017–3029 (2009) 42. Vandersteen, G., Schoukens, J.: Measurement and identification of nonlinear systems consisting of linear dynamic blocks and one static nonlinearity. IEEE Transactions on Automatic Control 44(6), 1266–1271 (1999) 43. V¨or¨os, J.: Parameter identification of Wiener systems with multisegment piecewiselinear nonlinearities. Systems and Control Letters 56, 99–105 (2007) 44. Wachel, P.: Parametric-Nonparametric Identification of Wiener Systems, PhD Thesis (in Polish) Wrocław University of Technology, Poland (2008), http://diuna.iiar.pwr.wroc.pl/wachel/rozprawa.pdf 45. Westwick, D., Verhaegen, M.: Identifying MIMO Wiener systems using subspace model identification methods. Signal Processing 52, 235–258 (1996) 46. Wiener, N.: Nonlinear Problems in Random Theory. Wiley, New York (1958) 47. Wigren, T.: Convergence analysis of recursive identification algorithms based on the nonlinear Wiener model. IEEE Transactions on Automatic Control 39, 2191–2206 (1994) 48. Zhao, Y., Wang, L.Y., Yin, G.G., Zhang, J.F.: Identification of Wiener systems with binary-valued output observations. Automatica 43, 1752–1765 (2007)

Chapter 9

Identification of Block-oriented Systems: Nonparametric and Semiparametric Inference M. Pawlak

9.1 Introduction Block-oriented nonlinear systems are represented by a certain composition of linear dynamical and nonlinear static models. Hence, a block-oriented system is defined by the pair (λ , m(•)), where λ defines infinite-dimensional parameter representing impulse response sequences of linear dynamical subsystems, whereas m(•) is a vector of nonparametric multidimensional functions describing nonlinear elements. In the parametric identification approach to block-oriented systems one assumes that both λ and m(•) are known up to unknown finite dimensional parameters, i.e., λ = λ (ϑ ) and m(•) = m(•; ζ ) for ϑ , ζ being finite dimensional unknown parameters. There are numerous identification algorithms for estimating ϑ , ζ representing specific block-oriented systems, see [5] for an overview of the subject. Although such methods are quite accurate, it is well known, however, that parametric models carry a risk of incorrect model specification. On the other hand, in the nonparametric setting λ and m(•) are completely unspecified and therefore the corresponding nonparametric block-oriented system does not suffer from risk of misspecification. Nevertheless, since nonparametric estimation algorithms make virtually no assumptions about the form of (λ , m(•)) they tend to be slower to converge to the true characteristics of a block-oriented system than correctly specified parametric algorithms. Moreover, the convergence rate of nonparametric methods is inversely proportional to the dimensionality of input and interconnecting signals. This is commonly referred to as the “curse of dimensionality”. Nonparametric methods have attracted a great deal of attention in the statistical science, see [14] for a recent overview and a list of a large number of texts written by statisticians. The number of texts on nonparametric algorithms tailored to the needs of engineering and system identification in particular is much smaller, see [6] and [10] for recent contributions. M. Pawlak Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, Manitoba, R3T5V6 Canada e-mail: [email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 127–146. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com 

128

M. Pawlak

In practice, we can accept intermediate models which lie between parametric and fully nonparametric cases. For this so called semiparametric models we specify a parametric form for some part of the model but we do not require the parametric assumption for the remaining parts of the model. Hence, the nonparametric description (λ , m(•)) of the system is now replaced by (θ , g(•)), where θ is a finite dimensional vector and g(•) is a set of nonparametric nonlinearities being typically univariate functions. The parameter θ represents characteristics of all linear dynamical subsystems and low-dimensional approximations of multivariate nonlinearities. The fundamental issue is to characterise the accuracy of approximation of (λ , m(•)) by the selected semiparametric model (θ , g(•)). This challenging problem will be addressed in this paper in some specific cases. Once such characterisation is resolved, we cam make use of this low complexity semiparametric representation to design practical identification algorithms that share the efficiency of parametric modelling while preserving the high flexibility of the nonparametric case. In fact, in many situations we are able to identify linear and nonlinear parts of a block-oriented system under much weaker conditions on the system characteristics and underlying probability distributions. A semiparametric inference is based on the concept of blending together parametric and nonparametric estimation methods. The basic idea is to first analyse the parametric part of the block-oriented structure as if all nonparametric parts were known. To eliminate the dependence of a parametric fitting criterion on the characteristics of the nonparametric parts, we form pilot nonparametric estimates of the characteristics being indexed by a finite-dimensional vector of the admissible value of the parameter. Then this is used to establish a parametric fitting criterion (such as least squares) with random functions representing all estimated nonparametric characteristics. On the other hand, nonparametric estimates of the characteristics use estimates of the parametric part. As a result of this interchange, we need some data resampling schemes in order to achieve some statistical independence between the estimators of parametric and nonparametric parts of the system. This improves the efficiency of the estimates and facilitates the mathematical analysis immensely. In Section 2 we examine sufficient conditions for the convergence of our identification algorithms for a general class of semiparametric block-oriented systems. This general theory is illustrated (Section 3) in the case of semiparametric versions of multivariate Hammerstein system and parallel systems. We show in this context that the semiparametric strategy leads to consistent estimates of (θ , g(•)) with rates which are independent of the signal dimensionality. These results are also verified in simulation studies. An overview of the theory and applications of semiparametric inference in statistics and economics can be found in [8], [15].

9.2 Nonparametric and Semiparametric Inference The modern nonparametric inference provides a plethora of estimation methods allowing us to recover system characteristics with the minimum knowledge about

9

Nonparametric and Semiparametric Inference

129

their functional forms. This includes classical methods like k−nearest neighbours, kernel and series estimators. On the other hand, sparse basis functions, regularisation techniques, support vector machines, and boosting methods define modern alternatives [14], [9]. For a given set of training data DN = {(X1 ,Y1 ), . . . , (XN ,YN )} taken at the input and output of a certain system, a typical nonparametric estimate of a regression function m(x) = E{Yt |Xt = x} can be written as N

mˆ N (x) =

∑ αt Yt K(x, Xt ),

(9.1)

t=1

where K(x, v) is a kernel function and {αt } is a normalising factor. For instance, in the classical kernel estimate αt = (∑Ni=1 K(x, Xi ))−1 for each t , where K(x, v) is the compactly supported kernel of the convolution type, i.e., K(x, v) = K(x − v). On the other hand, in support vector kernel machines, αt is selected by the optimisation algorithm defined by the maximal-margin separation principle and the kernel function is of the inner product type (Mercer’s kernels) K(x, v) = ∑l φl (x)φl (v). In order to achieve the desired consistency property, i.e., that mˆ N (x) tends to m(x) as N → ∞, the kernel function must be tuned locally. This can be achieved by introducing the concept of smoothing parameters that control the size of local information that is employed in the estimation process. The consistency is the unavoidable property and is met by most classical nonparametric techniques. Some modern techniques like support vector machines are not local since they use the entire training data in the design process. This can be partially overcome by using the proper so-called universal kernels. A serious limitation in the use of nonparametric estimators is the fact that they are prune to the dimensionality of observed signals as well as the smoothness of estimated characteristics [14], [6]. To illustrate this point, let us consider the following multiple-input, single-output (MISO), nonlinear autoregressive model of order p : Yn = m(Yn−1 ,Yn−2 , . . . ,Yn−p , Un ) + Zn ,

(9.2)

where Un ∈ Rd is the input signal, Zn is noise process, and m(•, •) is a (p + d) dimensional function. It is clear that m(•, •) is a regression function of Yn on the past outputs Yn−1 ,Yn−2 , . . . ,Yn−p and the current input Un . Thus, it is a straightforward task to form a multivariate nonparametric regression estimate such as the one in (9.1), where the signal Xt ∈ R p+d is defined as Xt = (Yt−1 ,Yt−2 , . . . ,Yt−p , Ut ). The convergence analysis, see [6], of such an estimate will strongly depend on the stability conditions of the nonlinear recursive difference equation: yn = m(yn−1 , yn−2 , . . . , yn−p , un ). With this respect, a fading-memory type assumption along with the Lipschitz continuity of m(•, •) seem to be sufficient for the consistency of nonparametric regression estimates. Hence, for m(•, •) being a Lipschitz continuous function the best possible rate can be

130

M. Pawlak

" # − 1 OP N 2+p+d , where OP (•) denotes the rate in probability. For the second order system p = 2 and double-input (d = 2) this gives a very slow rate of OP (N −1/6 ). This apparent curse of dimensionality also exists in the case of the MISO Hammerstein system which will be examined in the next section. To overcome this problem one can consider to approximate the regression function m(x) = E{Yt |Xt = x}, x ∈ Rq , by some low-dimensional structures. We are looking for a parsimonious semiparametric alternative which can be represented by a finite-dimensional parameter and a set of single-variable nonlinearities. The following is a simple semiparametric model:

μ (x) = g(θ T x),

(9.3)

where the function g : R → R and the parameter θ ∈ Rq are unknown and must be estimated. We note that g(•) is a single variable function and thus the curse of dimensionality for the model μ (x) is removed. The model μ (x) is not uniquely defined. In fact if g(•) is linear then we cannot identify θ . Moreover, the scaling of the vector θ does not influence the values of g(•). Hence, we need to normalise θ either by setting one of the coefficients of θ to one, e.g., θ1 = 1 or by putting the restriction ||θ || = 1. We will call the set of all such normalised values of θ as Θ . Often the vector of covariates Xt can be decomposed as Xt = (Ut , Vt ), where Ut ∈ Rd has the interpretation of the input signal and Vt ∈ R p defines the past values of the output signal. Then we can propose a few alternatives to the model in (9.3), e.g.,

μ (u, v) = ρ T v + g(γ T u),

(9.4)

where ρ ∈ R p and γ ∈ Rd are unknown parameters. This semiparametric model applied in (9.2) would result in a partially linear nonlinear system of the form Yn = ρ1Yn−1 + ρ2Yn−2 + · · · + ρ pYn−p + g(γ T Un ) + Zn .

(9.5)

The statistical inference for the model in (9.3), i.e., estimation of (θ , g(•)) requires the characterisation of the “true” characteristics (θ ∗ , g∗ (•)). This can be done by finding the optimal L2 projection of the original system onto the model defined in (9.3). Hence, we wish to minimise Q(θ , g(•)) = E{(Yt − g(θ T Xt ))2 }

(9.6)

with respect to θ ∈ Θ and g(•) such that E{g2 (θ T Xt ))} < ∞. The minimiser of Q(θ , g(•)) will be denoted as (θ ∗ , g∗ (•)). Since the minimisation of Q(θ , g(•)) is equivalent to the minimisation of E{(Yt − g(θ T Xt ))2 |Xt }. This is the L2 projection and for a given θ ∈ Θ the solution is g(w; θ ) = E{Yt |θ T Xt = w}. This is just a regression function of Yt on θ T Xt , i.e., the best predictor of the output signal Yt by the projection of Xt onto the direction defined by the vector θ . Plugging this choice

9

Nonparametric and Semiparametric Inference

131

into Q(θ , g(•)), i.e., calculating Q(θ , g(•; θ )) we can readily obtain the following error function Q(θ ) = E{(Yt − g(θ T Xt ; θ )2 } = E{(var(Yt |θ T Xt )}.

(9.7)

The minimiser of Q(θ ) with respect to θ ∈ Θ defines the optimal θ ∗ and consequently the corresponding optimal nonlinearity g∗ (w) = g(w; θ ∗ ). In practice, it is difficult to find an explicit formula for g(w; θ ) and to characterise the choice of θ ∗ . It is clear that the smoothness and shape of g(w; θ ) is controlled by the smoothness of m(x) and the conditional distribution of Xt on θ T Xt . To shed some light on this issue let us consider a simple example. Example 9.1. Let Yt = m(X1t , X2t ) + Zt be the bivariate regression model with X1t = Ut , X2t = Ut−1 and m(x1 , x2 ) = x1 x2 . Assume that {Ut } is zero mean unit variance stationary Gaussian process with the correlation E{Ut+τ Ut } = ρ (τ ). Let us also denote ρ = ρ (1). The noise process {Zt } is assumed to be a stationary process with zero mean and unit variance being, moreover, independent of {Ut }. Hence, we wish to determine the best L2 approximation of the system Yt = Ut Ut−1 + εt by a model of the form g(Ut + θ Ut−1 ). The aforementioned discussion reveals that first we have to determine the regression function g(w; θ ) = E{Yt |Ut + θ Ut−1 = w} and next to find the optimal θ ∗ by minimising Q(θ ) = E{(var(Yt |Ut + θ Ut−1 )}. To do so, let us first note that the random vector (Ut−1 ,Ut + θ Ut−1 ) has the bivariate Gaussian distribution with zero mean and covariance matrix   ρ +θ 1 . ρ + θ 1 + θ 2 + 2θ ρ This fact and some algebra yield g(w; θ ) = a(θ )w2 + b(θ ),

(9.8)

where a(θ ) = (ρ + θ (1 − θ ))/(1 + θ 2 + 2θ ρ ) and b(θ ) = −θ (1 − ρ 2))/(1 + θ 2 + 2θ ρ ). Further algebra leads also to an explicit formula for the projection error Q(θ ) for a given θ . In Figure 9.1 we depict Q(θ ) as a function of θ for the value of the correlation coefficient ρ equal to 0, 0.4, 0.8. The dependence of Q(θ ) on negative values of ρ is just a mirror reflection of the curves shown in Figure 9.1. Interestingly, we observe that√in the case of ρ = 0 we have two values of θ minimising Q(θ ), i.e., θ ∗ = ±1/ 3. When |ρ | is increasing, the optimal θ is unique and is slowly decreasing from θ ∗ = 0.577 for ρ = 0 to θ ∗ = 0.505 for ρ = 0.9. On the hand, the value of the minimal error Q(θ ∗ ) is decreasing fast from Q(θ ∗ ) = 0.75 for ρ = 0 to Q(θ ∗ ) = 0.067 for ρ = 0.9. Figure 9.2 shows the optimal nonlinearities g∗ (w) = g(w; θ ∗ ) corresponding to the values ρ = 0, 0.4, 0.8. It is worth nothing that g∗ (w) for ρ = 0 is smaller than g∗ (w) for any ρ > 0. Similar relationships hold for ρ < 0. Thus, we can conclude that the best approximation (for ρ = 0) of the system Yt = Ut Ut−1 + Zt by the class of models {g(Ut + θ Ut−1 ) : θ ∈ Θ } is the √ model √ √ 3−1 2 ∗ ∗ ∗ ∗ Yt = g (Ut + θ Ut−1 ) + Zt , where θ = ±1/ 3 and g (w) = ± 4 w ∓ 43 . In the case when, e.g., ρ = 0.5 we obtain θ ∗ = 0.532 and g∗ (w) = 0.412w2 − 0.219.

132

M. Pawlak

2.5 2.0

0

ρ=

1.5

ρ=

1.0

0.4

ρ = 0.8

0.5 0.0 2

1

0

1

2

Fig. 9.1: The projection error Q(θ ) versus θ for the values of the correlation coefficient ρ = 0, 0.4, 0.8

12 10

ρ = 0.8

8

0. 4

6

ρ

=

4 2

ρ=

0 4

2

0

2

0

4

Fig. 9.2: The optimal nonlinearity g (w) for the values of the correlation coefficient ρ = 0, 0.4, 0.8

We should also observe that our model represents the Wiener cascade system with the impulse response (1, θ ∗ ) and the nonlinearity g∗ (w). The fact that the correlation reduces the value of the projection error Q(θ ) can be interpreted as follows. With an increasing correlation between input variables the bivariate function m(Ut ,Ut−1 ) = Ut Ut−1 behaves like a function of a single variable. In fact, from (9.8) we have that b(θ ) → 0 as |ρ | → 1. Thus far, we have discussed the preliminary aspects of the semiparametric inference concerning the characterisation of the optimal characteristics (θ ∗ , g∗ (•)) of the model in (9.3). Next, we wish to set up estimators of θ ∗ and g∗ (w). If the regression function g(w; θ ) = E{Yt |θ T Xt = w}, θ ∈ Θ was known, then, due to (9.7), an obvious estimator of θ ∗ would be a minimiser of the following empirical counterpart of Q(θ ):

9

Nonparametric and Semiparametric Inference

133

N

QN (θ ) = N −1 ∑ (Yt − g(θ T Xt ; θ ))2 .

(9.9)

t=1

Since g(w; θ ) is unknown, this is not a feasible estimator. We can, however, estimate the regression function g(w; θ ) by some standard nonparametric methods like kernel algorithms, see (9.1). Let g(w; ˆ θ ) denote a nonparametric estimate of g(w; θ ). As a concrete example we can use the classical kernel estimate N

N

t=1

l=1

g(w; ˆ θ ) = ∑ Yt K((w − θ T Xt )/b)/ ∑ K((w − θ T Xl )/b),

(9.10)

where b is the bandwidth parameter that controls the accuracy of the estimate. In the limit any reasonable nonparametric estimate g(w; ˆ θ ) is expected to tend to g(w; θ ) which, in turn, satisfies the restriction g(w; θ ∗ ) = g∗ (w). Hence, substituting g(w; θ ) in (9.9) by g(w; ˆ θ ) we can obtain the following criterion depending solely on θ : N

ˆ θ T Xt ; θ ))2 . Qˆ N (θ ) = N −1 ∑ (Yt − g(

(9.11)

t=1

It is now natural to define an estimate θˆ of θ ∗ as the minimiser of Qˆ N (θ ), i.e.,

θˆ = arg min Qˆ N (θ ). θ ∈Θ

(9.12)

This approach may lead to an effective estimator of θ ∗ subject some limitations. First, as we have already noted, we should be able to estimate the projection g(w; θ ) for a given θ . In the context of block-oriented systems, the difficulty of this step depends on the complexity of the studied nonlinear system, i.e., whether nonlinear components can be easily estimated as if the parametric part of the system were known. In the next section we will demonstrate that this is the case for MISO Hammerstein and parallel systems. Second, we must minimise the criterion Qˆ N (θ ) which may be an expensive task mostly if θ is highly dimensional and if the gradient vector of Qˆ N (θ ) is difficult to evaluate. To partially overcome these computational difficulties we can use the following simplified iterative method: ˆ θˆ (0) ). Step 1: Select an initial θˆ (0) and set g(w; Step 2: Minimise the criterion N

Q˜ N (θ ) = N −1 ∑ (Yt − g( ˆ θ T Xt ; θ (0) ))2 t=1

with respect to θ and use the obtained value θˆ (1) to update g(w; ˆ θ ), i.e., go to Step 1 to get g(w; ˆ θ (1) ). Step 3: Iterate between the above two steps until a certain stopping rule is satisfied.

134

M. Pawlak

Note that in the above algorithm the criterion Q˜ N (θ ) has a weaker dependence ˆ θ ) is on θ than the original criterion Qˆ N (θ ). In fact, in Q˜ N (θ ) the nonlinearity g(w; already specified. Concerning the recovery of the model optimal nonlinearity g∗ (•) we can use the ˆ θ ) to obtain g(w; ˆ θˆ ). This estimate θˆ and plug it back into our pilot estimate g(w; ∗ can define a nonparametric estimate of g (•). Nevertheless, one can use any other nonparametric estimate g(•; ˜ θ ) with θ replaced by θˆ . Yet another important issue is the problem of selecting a smoothing parameter, like the bandwidth b in the kernel ˆ The forestimate in (9.10), which tunes nonparametric estimates g(•; ˆ θ ) and g(•). mer estimate is used as a preliminary estimator of the projection g(•; θ ) so that θ ∗ can be estimated in the process of minimising Qˆ N (θ ) in (9.11). On the other hand, the latter estimate is used as a final estimate of g∗ (•). Hence, we may be forced to select two separate smoothing parameters. The one for g(•; ˆ θ ), and the other for g(•). ˆ This can be done by augmenting the definition of Qˆ N (θ ) in (9.11) by adding the smoothing parameter as a variable in Qˆ N (θ ). Hence, we define Qˆ N (θ , b) and then minimise Qˆ N (θ , b) simultaneously with respect to θ and b. The bandwidth obtained in this process is by no means good for selecting the estimate θˆ . Whether this is the proper choice for the accurate estimation of g(•) ˆ is not quite clear, see [7] for the affirmative answer to this controversy in the context of the classical regression problem from i.i.d. data. In order to establish consistency properties of the aforementioned estimates we first need to establish that the criterion Qˆ N (θ ) in (9.11) tends ((P)) as N → ∞ to the limit criterion Q(θ ) in (9.7) for a given θ ∈ Θ . This holds under fairly general conditions due to the law of large numbers. Furthermore, as we have already argued we identify the optimal θ ∗ with the minimum of Q(θ ). Note, however, that Qˆ N (θ ) is not a convex function of θ and therefore need not achieve a unique minimum. This, however, is of no serious importance for the consistency since we may weaken our requirement on the minimiser θˆ of Qˆ N (θ ) and define θˆ as any estimator that nearly minimises Qˆ N (θ ), i.e., Qˆ N (θˆ ) ≤ inf Qˆ N (θ ) + δN , (9.13) θ ∈Θ

for any random sequence δN , such that δN → 0(P). It is clear that (9.13) implies that Qˆ N (θˆ ) ≤ Qˆ N (θ ∗ ) + δN and this is sufficient for the convergence of θˆ defined in (9.13) to θ ∗ . Since θˆ depends on the whole mapping θ → Qˆ N (θ ), the convergence of θˆ to θ ∗ requires uniform consistency of the corresponding criterion function, i.e., we ( ( need supθ ∈Θ (Qˆ N (θ ) − Q(θ )( → 0(P). This uniform convergence is the essential step in proving the convergence of θˆ to θ ∗ . This can be established by using formal techniques for verifying whether the following class of functions {(y − g(w; ˆ θ ))2 : θ ∈ Θ } satisfies a uniform law of large numbers that is often referred to as the GlivienkoCantelli property, see [13]. This along with the assumption that the limit criterion

9

Nonparametric and Semiparametric Inference

135

Q(θ ) is a continuous function on the compact set Θ ⊂ Rq , such that θ ∗ ∈ Θ , imply that for any sequence of estimators θˆ that satisfy (9.13) we have

θˆ → θ ∗

as N → ∞(P).

A related issue of interest pertaining to a given estimate is the rate of convergence, i.e., how fast the estimate tends to the true characteristic. Under the differentiability condition of the mapping θ → (• − g(•; ˆ θ ))2 we can consider the problem of the convergence rate. Hence, if Q(θ ) admits the second-order Taylor expansion at θ = θ ∗ then for θˆ defined in (9.13) with N δN → 0(P), we have

θˆ = θ ∗ + OP(N −1/2 ).

(9.14)

This is the usual parametric rate of convergence. Since θˆ → θ ∗ then it is reasonable to expect that the estimate g(•; θˆ ) converges to g(•; θ ∗ ) = g∗ (•). The following decomposition will facilitate this claim g(•; θˆ ) − g∗(•) =

{g(•; ˆ θˆ ) − g(•; ˆ θ ∗ )} ∗ + {g(•; ˆ θ ) − g∗(•)}.

(9.15)

The convergence (P) of the second term to zero in the above decomposition represents a classical problem in nonparametric estimation. The difficulty of establishing this convergence depends on the nature of the dependence between random signals within the underlying system. Concerning the first term in (9.15) we can apply the linearisation technique, i.e., g(•; ˆ θˆ ) − g(•; ˆ θ ∗) =



∂ g(•; ˆ θ )|θ =θ ∗ ∂θ + oP(θˆ − θ ∗ ).

2T

(θˆ − θ ∗)

To show the convergence (P) of the first term to zero it suffices to prove that the derivative has a finite limit (P). This fact can be directly verified for a specific estimate g(•; ˆ θ ) of g(•; θ ). Hence the statistical accuracy of g(•; ˆ θˆ ) is determined by the second term of the decomposition in (9.15). Since for typical nonparametric estimates we have g(•; ˆ θ ∗ ) = g∗ (•) + OP (N −β ), where 1/3 ≤ β < 1, therefore we obtain (9.16) g(•; ˆ θˆ ) = g∗ (•) + OP(N −β ). For instance, if g∗ (•) is the Lipschitz continuous function and g(•; ˆ θˆ ) is the kernel nonparametric estimate then (9.16) holds with β = 1/3. For twice differentiable nonlinearities, this takes place with β = 2/5. The criterion Qˆ N (θ ) in (9.11) utilises the same data to form the pilot nonparametric estimate g(w; ˆ θ ) and to define Qˆ N (θ ). This is not generally a good strategy and some form of resampling scheme should be applied in order to separate measurements into the testing data (used to form Qˆ N (θ )) and training sequence (used for forming the estimate g(w; ˆ θ )). Such separation will facilitate not only the

136

M. Pawlak

mathematical analysis of the estimation algorithms but also gives a desirable separation of parametric and nonparametric estimation problems, which allows one to evaluate parametric and nonparametric estimates more precisely. One such a strategy would be the leave-one-out method which modifies Qˆ N (θ ) as follows N

Q¯ N (θ ) = N −1 ∑ (Yt − gˆt (θ T Xt ; θ ))2 ,

(9.17)

t=1

ˆ θ ) with the training data pair where gˆt (w; θ ) is the version of the estimate g(w; (Xt ,Yt ) omitted from calculation. For instance, in the case of the kernel estimate in (9.10) this takes the form N

N

i=t

l=t

gˆt (w; θ ) = ∑ Yi K((w − θ T Xi )/b)/ ∑ K((w − θ T Xl )/b) . Yet another efficient resampling scheme is based on the partition strategy which reorganises a set of training data DN into two non overlapping subsets that are dependent as weakly as possible. Hence, we define two non overlapping subsets DN,1 , DN,2 of the training set DN such that DN,1 is used to estimate the regression function g(w; θ ) whereas DN,2 is used as a testing sequence to form the least-squares criterion Qˆ N (θ ) in (9.11). There are various strategies to split the data for the efficient estimation of θ ∗ and g∗ (•). The machine learning principle says the testing sequence DN,2 should consists (if it is feasible) of independent observations, whereas the training sequence DN,1 can be arbitrary.

9.3 Semiparametric Block-oriented Systems In this section, we will illustrate the nonparametric/semiparametric methodology developed in Section 2 by examining a few important cases of block-oriented systems. This includes MISO Hammerstein and parallel systems.

9.3.1 Semiparametric Hammerstein Systems Let us begin with the multiple-input, single-output (MISO) Hammerstein system which is depicted in in Figure 3. The fully nonparametric Hammerstein system is given by the following inputoutput relationship: Yn

= Λ (z)Vn + H(z)en ,

Vn

=

m(Un ),

(9.18)

−i −1 where Λ (z) is a causal transfer function defined as Λ (z) = ∑∞ being i=0 λi z , with z the backward-shift operator.

9

Nonparametric and Semiparametric Inference

137 en H(z)

Un

Vn

m(•)

Zn Λ(z)

+

Yn

Fig. 9.3: MISO Hammerstein system

Moreover, H(z) is the stable and inversely stable noise model driven by a white process {en }. The system is excited by the d-dimensional input Un , which throughout the paper is assumed to be a sequence of i.i.d. random vectors. The output of the linear dynamic subsystem Gn is corrupted by an additive noise Zn being independent of {Un }. The system nonlinearity m(•) is a nonparametric function defined on Rd . It is known, see [6], that if Λ (∞) = 1 and E{m(Un )} = 0 then m(u) = E{Yn |Un = u}. This fact holds for any correlated noise process. This allows us to recover m(•) by applying nonparametric regression estimates such those defined in (9.1). Let mˆ N (•) be such an estimate based on the training data DN = {(U1 ,Y1 ), . . . , (UN ,YN )}. It can be demonstrated (under common smoothing conditions on m(u)) that for a large class of nonparametric regression estimates, see [6], we have " # (9.19) m(u) ˆ = m(u) + OP N −2/(d+4) . Hence, the estimates suffer the curse of dimensionality since the rate of convergence gets slower as d increases. It is also worth noting that the linear part of the system Λ (z) can be recovered via the correlation method independently on the form of the system nonlinearity and the noise structure, see [6]. This defines a fully nonparametric identification strategy for the MISO Hammerstein system. The statistical accuracy, however, of such estimation algorithms is rather low due to the generality of the problem. In many practical situations and due to the inherent complexity of the nonparametric Hammerstein system it is sufficient if we resort to the following semiparametric alternative of (9.18) (see Figure 9.4) Yn

= Λ (z; λ )Vn + H(z; λ )en ,

Vn

=

g(γ T Un ),

(9.20)

where Λ (z; λ ) and H(z; λ ) are parametrised rational transfer functions. The function g(•) and the d−dimensional parameter γ define the one-dimensional semiparametric approximation of m(•) which was already introduced in Section 2, see (9.3). Note the class of dynamical systems represented by the rational transfer functions covers a wide range of linear autoregressive and moving average processes. Hence, the semiparametric model in (9.20) is characterised by the pair (θ , g(•)), where θ = (λ , γ ). Since the identifiability of the model requires that Λ (∞; λ ) = 1

138

M. Pawlak en H(z; λ) Un

Wn

γ

g(•)

Vn

Zn Λ(z; λ)

+

Yn

Fig. 9.4: Semiparametric MISO Hammerstein model

and γ1 = 1, therefore we can define the parameter space as Θ = {(λ , γ ) : Λ (∞; λ ) = 1, γ1 = 1}, such that Θ is a compact subset of R p+d , where p is the dimensionality of λ . In order to develop constructive identification algorithms let us define the concept of the true Hammerstein system corresponding to (9.20). We may assume without loss of generality that the true system is in the form as in (9.20) and this will be denoted by the asterisk sign, i.e., the true system is defined as Yn Vn

= Λ (z; λ ∗ )Vn + H(z; λ ∗ )en , = g∗ (γ ∗T Un ),

(9.21)

where it is natural to expect that θ ∗ ∈ Θ . Since the system is linear between the signal Vn and the output Yn then we can recall, see [11], that a one-step ahead prediction error for a given θ ∈ Θ is given by

εn (θ ) = H −1 (z; λ ) [Yn − Λ (z; λ )Vn (γ )] ,

(9.22)

where Vn (γ ) is the counterpart of the true signal Vn corresponding to the value γ . Under our normalisation we note that for a given γ T Un the best L2 predictor of Vn (γ ) is the regression E{Vn (γ )|γ T Un } = E{Yn |γ T Un }. Hence, let g(w; γ ) = E{Yn |γ T Un = w} be the regression function predicting the unobserved signal Vn (γ ). It is worth nothing that g(w; γ ∗ ) = g∗ (w). All these considerations lead to the following form of (9.22)   εn (θ ) = H −1 (z; λ ) Yn − Λ (z; λ )g(γ T Un ; γ ) . (9.23) Reasoning now as in Section 2 we can readily form selection criterion for estimating θ ∗ QN (θ ) = N −1

N

∑ εn2 (θ ).

n=1

This is a direct counterpart of the criterion defined in (9.9). As we have already noted the regression g(w; γ ) is unknown but can be directly estimated by nonparametric regression estimates. For example, we use the kernel method, see (9.10), g(w; ˆ γ) =

N

N

n=1

l=1

∑ Yt K((w − γ T Un )/b)/ ∑ K((w − γ T Ul )/b).

(9.24)

9

Nonparametric and Semiparametric Inference

139

Using this or any other nonparametric regression estimate in QN (θ ) we can form our final selection criterion for estimating θ ∗ Qˆ N (θ ) = N −1

N

∑ εˆn2 (θ ),

(9.25)

n=1

ˆ γ T Un ; γ ). The where εˆn (θ ) is the version of (9.23) with g(γ T Un ; γ ) replaced by g( ∗ minimiser of Qˆ N (θ ) = Qˆ N (λ , γ ) defines an estimate (λˆ , γˆ) of (λ , γ ∗ ). Following the reasoning from Section 2 we can show that (λˆ , γˆ) tends (P) to (λ ∗ , γ ∗ ). Furthermore, under additional mild conditions we can find that (λˆ , γˆ) is converging with the optimal OP (N −1/2 ) rate. It is worth noting that if the linear subsystem of the Hammerstein structure is ∗ p λi z−i then we of the moving average type of order p, i.e., Λ (z; λ ∗ ) = 1 + ∑i=1 can estimate Λ (z; λ ∗ ) (independently of g∗ (•) and γ ∗ ) via the correlation method, see [6]. In fact, for a given function η : Rd → R such that E η (Un ) = 0 and E{η (Un )g∗ (Wn )} = 0 we have the following estimate of λ ∗ N −1 ∑N−t i=1 Yt+i η (Ui ) λ˜ t = , −1 N ∑Ni=1 Yi η (Ui )

t = 1, . . . , p.

This applied in (9.25) gives the simplified least squares criterion for selecting γ Qˆ N (γ ) = N

−1

N



i=p+1

&

  Yi − ∑ λ˜ t gˆ γ T Ui−t ; γ p

'2 .

(9.26)

t=0

Once the parametric part of the Hammerstein system is obtained one can define the following nonaprametric estimate for the system nonlinearity g(w) ˆ = g(w; ˆ γˆ), where g(w; ˆ γ ) is any nonparametric consistent estimate of g(w; γ ) and γˆ is the minimiser of Qˆ N (λ , γ ). Recalling the arguments given in Section 2 we can conclude that if g∗ (w) is twice differentiable and if we select the bandwidth as b = cN −1/5 then we have g(w) ˆ = g∗ (w) + OP (N −2/5 ). This rate is independent of the dimensionality of the input signal and it is known to be optimal [14]. This should be contrasted with the nonparametric Hammerstein system identification accuracy, see (9.19). The bandwidth choice is critical for the precision of the kernel estimate. The choice b = cN −1/5 is only asymptotically optimal and in practice one would like to specify b depending on the data at hand. One possibility as we have already pointed out, would be to extend the criterion Qˆ N (θ ) in (9.25) and include b into the minimisation process. Hence, we would have the modified criterion Qˆ N (θ , b).

140

M. Pawlak

It is worth noting that in the above two-step scheme the estimate g(w; ˆ γ ) in (9.25) and the criterion function Qˆ N (θ ) share the same training data. This is usually not the recommended strategy since it may lead to estimates with unacceptably large variance. Indeed, some resampling schemes would be useful here which would partition the training data into the testing and training sequences. The former should be used to form the criterion Qˆ N (λ , γ ), whereas the latter to obtain the nonparametric estimate g(w; ˆ γ ). The aforementioned concepts are illustrated in the following simulation example, see also [12] for further details. Example 9.2. In our simulation example, the d-dimensional input signal Un is generated according to uncorrelated Gaussian distribution Nd (0, σ 2 I). We assume that the actual system can be exactly represented by the semiparametric model, with  √ √ T the characteristics γ = cos(θ ), sin(θ )/ d − 1, · · · , sin(θ )/ d − 1 and g(w) = 0.7 arctan(β w). Note that with this parametrisation ||γ || = 1. The true value of γ corresponds to θ ∗ = π /4. The slope parameter β defining g(w) is changed in some experiments. Note that the large β defines the nonlinearity with a very rapid change at w = 0. The FIR(3) linear subsystem is used with the transfer function Λ ∗ (z) = 1 + 0.8z−1 − 0.6z−2 + 0.4z−3 . The noise Zt is N(0, 0.1). In our simulation examples we generate L different independent training sets and determine our estimates γˆ and g(·) ˆ described in this section. The local linear kernel estimate with the kernel function K(w) = (1 − w2 )2 , |w| ≤ 1 was employed. In implementing the kernel estimate, the window length b was selected simultaneously with the choice of γ . Furthermore, tin the partition resampling strategy the size of the training subset is set to 55% of the complete training data of the size n = 150. It is also worth noting that the optimal b needed for estimating a preliminary regression estimate g(w; ˆ γ ), has been observed to be different than that required for the final estimate g(w). ˆ Figure 9.5 shows the mean squared error (MSE) of γˆ versus the parameter β . Figure 9.6 represents the identical dependence for the mean integrated squared error (MISE) of g(·). ˆ In both figures we have d = 2. We observe a little influence of the complexity of the nonlinearity g(w) on the accuracy of the estimate g(w; ˆ γ ). This is not the case for estimating g(w). Clearly, a faster changing function is harder to estimate than the one that changes slowly. Figures 7 and 8 show the influence of ˆ The slope parameter was set the input dimensionality on the accuracy of γˆ and g(·). to β = 2. As d varies from d = 2 to d = 10 we observe a very little change in the error values. This supports the observation that the semi-parametric approach may behave favourable in high dimensions.

9.3.2 Semiparametric Parallel Systems In this section we make use of the semiparametric methodology in the context of the parallel system with a single (without loss of generality) input and a finite memory linear subsystem. Hence, the system shown in Figure 9.9 is assumed to be the true system with the following input-output description:

9

Nonparametric and Semiparametric Inference

141

Fig. 9.5: MSE(γˆ) versus the slope parameter β ; n = 150, d = 2

Fig. 9.6: MISE(g) ˆ versus the slope parameter β ; n = 150, d = 2 p

Yn = m∗ (Un ) + ∑ λ j∗Un− j + Zn .

(9.27)

j=0

The identifiability condition for this system is that λ0∗ = 1. Hence, let Λ = {λ ∈ R p+1 : λ0 = 1} be a set of all admissible parameters that is assumed to be the compact subset of R p+1 .

142

M. Pawlak

Fig. 9.7: MSE(γˆ) versus the input dimensionality d; n = 150, β = 2

Fig. 9.8: MISE(g) ˆ versus the input dimensionality d; n = 150, β = 2

As we have already discussed, the semiparametric least squares strategy begins with the elimination of the nonlinear characteristic from the optimisation process. To this end let, Wn (λ ) =

p

∑ λ jUn− j ,

(9.28)

j=0

be the output of the linear subsystem for a given λ ∈ Λ . Clearly Wn (λ ∗ ) = Wn . Next, let (9.29) m(u; λ ) = E{Yn − Wn (λ )|Un = u}

9

Nonparametric and Semiparametric Inference

{λ∗i , 0 ≤ i ≤ p}

143

Wn

Un

Zn Yn

m∗ (•) Fig. 9.9: Semiparametric nonlinear parallel model

be the best model (regression function) of m∗ (u) for a given λ ∈ Λ . Indeed, the signal Yn − Wn (λ ) − Zn is the output of the nonlinear subsystem for λ ∈ Λ . Noting that, p

m(u; λ ) = m∗ (u) + ∑ (λ j∗ − λ j ) E{Un− j |Un = u}, j=0

m(u; λ ∗ )

we can conclude that = m∗ (u). For a given training set DN = {(U1 ,Y1 ), . . . , (UN ,YN )} we can easily form a nonparametric estimate of the regression function m(u; λ ). Hence let,   N t (Yt − Wt (λ ))K u−U ∑t=p+1 b   , (9.30) m(u; ˆ λ) = N t K u−U ∑t=1 b be the kernel regression estimate of m(u; λ ). The mean-squared criterion for estimating λ ∗ can now be defined as follows: Qˆ N (λ ) = N −1

N



(Yt − m(U ˆ t ; λ ) − Wt (λ ))2 .

(9.31)

t=p+1

The minimiser of the prediction error Qˆ N (λ ) defines an estimate λˆ of λ ∗ . As soon as λˆ is determined we can estimate m∗ (u) by the two-stage process, i.e., we have, m(u) ˆ = m(u; ˆ λˆ ).

(9.32)

Thus far we have used the same data for estimating the pilot regression estimate m(u; ˆ λ ) and the criterion function Qˆ N (λ ). This may lead to consistent estimates but the mathematical analysis of such algorithms is lengthy. In Section 2 we suggested the partition resampling scheme which gives a desirable separation of the training and testing data sets and reduces the mathematical complications. This strategy can be easily applied here, i.e., we can use the subset DN,1 of DN to derive the kernel estimate in (9.30) and then utilise the remaining part of DN for computing the criterion function Qˆ N (λ ). For estimates of λˆ and m(u) ˆ obtained as outlined above, we can follow the arguments given in Section 2 and show that λˆ → λ ∗ (P) and consequently m(u; ˆ λˆ ) → ∗ ∗ m(u; λ ) = m (u)(P).

144

M. Pawlak

The minimisation procedure required to obtain λˆ can be involved due to the highly nonlinear nature of Qˆ N (λ ). A reduced complexity algorithm can be developed based on the general iterative scheme described in Section 2. Hence, for a ˆ λˆ (old) ). Then we form the modified criterion, given λˆ (old) , set m(u; Q˜ N (λ ) = N −1

N



" #2 Yt − m(U ˆ t ; λˆ (old) ) − Wt (λ ) ,

(9.33)

t=p+1

and find

λˆ (new) = arg min Q˜ N (λ ). λ ∈Λ

ˆ λˆ (new) ) and iterate the above process until the Next, we use λˆ (new) to get m(u; ˜ criterion QN (λ ) does not change significantly. It is worth noting that Wt (λ ) in (9.33) is a linear function of λ and therefore we can explicitly find λˆ (new) that minimises Q˜ N (λ ). Indeed, this is the classical linear least squares problem with the following solution

λˆ (new) = (UT U)−1 UT O,

(9.34)

where O is the (N − p) × 1 vector with the t-th coordinate being equal to Yt − m(U ˆ t ; λˆ (old) ), t = p + 1, . . ., N. U is a (N − p) × (p + 1) matrix, U = (UTp+1 , . . . , UTN )T , where Ut = (Ut , . . . ,Ut−p )T . We should note that the above algorithm can work with the dependent input process {Un }. However, if {Un } is a sequence of i.i.d. random variables, then the correlation method provides the following explicit solution for recovering λ ∗ . In fact, we have cov(Yn ,Un− j ) ; j = 1, . . . , p. λ j∗ = var(U0 ) Note also that

m∗ (u) = E{Yn |Un = u} − u.

which allows us to recover m∗ (u). Empirical counterparts of cov(Yn ,Un− j ), var(U0 ), and the regression function E{Yn |Un = u} define the estimates of the system characteristics. Although these are explicit estimates, they are often difficult to generalise in more complex cases. On the other hand, the semiparametric approach can easily be extended to a large class of interconnected complex systems.

9.4 Concluding Remarks In this paper we have compared the nonparametric and semiparametric approaches to the problem of recovering characteristics of block-oriented systems. We have argued that the semiparametric inference can offer an attractive strategy for

9

Nonparametric and Semiparametric Inference

145

identification of large scale composite systems where one faces an inherent problem of dimensionality and model complexity. In fact, the semiparametric paradigm allows us to project the original system onto some parsimonious alternative. The semiparametric version of the least squares method employed in this paper determines such a projection via an optimisation procedure. We have examined a very simple class of semiparametric models, see (9.3), characterised by a single function of one variable and the projection parameter of the dimensionality equal to the number of the input-output signals used in the identification problem. The following is the natural generalisation of the approximation in (9.3)

μ (x) =

L

∑ gl (θlT x),

(9.35)

l=1

where now we wish to specify the univariate functions {gl (•), 1 ≤ l ≤ L} and the parameters {θl , 1 ≤ l ≤ L}. Often one also needs to estimate the degree L of this approximation network. The approximation properties of (9.35) has been examined in [1]. It is worth nothing the nonlinear characteristic in Example 9.1, i.e., m(x1 , x1 ) = x1 x2 , can be exactly reproduced by the network in (9.35). In fact, we have 1 1 x1 x2 = (x1 + x2 )2 − (x1 − x2 )2 . 4 4 This corresponds to (9.35) with g1 (w) = 14 w2 , g2 (w) = − 14 w2 and θ1 = (1, 1)T , θ2 = (1, −1)T . Semiparametric models have been extensively examined in the econometric literature, see [8], [15]. There, they have been introduced as more flexible extension of the standard linear regression model and popular models include partial linear and multiple-index models. These are static models and this paper can be viewed as the generalisation of these models to dynamic nonlinear block-oriented systems. In fact, the partially linear models fall into the category of parallel models, whereas multiple-index models correspond to Hammerstein/Wiener connections. Semiparametric models have recently been introduced in the nonlinear time series literature [3], [4]. Some empirical results on the identification of the partially linear model have been reported in [2]. Comprehensive studies of semiparametric Hammerstein/Wiener models have been given in [6]. There are number of issues worth further studies. First of all, one can consider a more robust version the leastsquare criterion with a general loss function. This would lead to the semiparametric alternative of M- estimation [13]. As a result, we could examine semiparametric counterparts of maximum-likelihood estimation and some penalised M-estimators. The latter would allow us to incorporate some shape constraints like convexity and monotonicity of underlying characteristics. Acknowledgements. The author wishes to thank Mount First and Jiaqing Lv for assistance.

146

M. Pawlak

References 1. Diaconis, P., Shahshahani, M.: On nonlinear functions of linear combinations. SIAM Journal on Scientific Computing 5(1), 175–191 (1984) 2. Espinozo, M., Suyken, J.A.K., De Moor, B.: Kernel based partially linear models and nonlinear identification. IEEE Trans. on Automatic Control 50, 1602–1606 (2005) 3. Fan, J., Yao, Q.: Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York (2003) 4. Gao, J., Tong, H.: Semiparametric non-linear time series model selection. Journal of Royal Statistical Society B 66, 321–336 (2004) 5. Giannakis, G.B., Serpendin, E.: A bibliography on nonlinear system identification. Signal Processing 81, 533–580 (2001) 6. Greblicki, W., Pawlak, M.: Nonparametric System Identification. Cambridge University Press, Cambridge (2008) 7. H¨ardle, W., Hall, P., Ichimura, H.: Optimal smoothing in single-index models. The Annals of Statistics 21, 157–178 (1993) 8. H¨ardle, W., M¨uller, M., Sperlich, S., Werwatz, A.: Nonparametric and Semiparametric Models. Springer, Heidelberg (2004) 9. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, Heidelberg (2009) 10. Kvam, P.H., Vidakovic, B.: Nonparametric Statistics with Applications to Science and Engineering. Wiley, Chichester (2007) 11. Ljung, L.: System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs (1999) 12. Pawlak, M., Lv, J.: On nonparametric identification of MISO Hammerstein systems (Submitted, 2010) 13. Van Der Vaart, A.W.: Asymptotic Statistics. Cambridge University Press, Cambridge (1998) 14. Wasserman, L.: All of Nonparametric Statistics. Springer, Heidelberg (2006) 15. Yatchev, A.: Semiparametric Regression for the Applied Econometrician. Cambridge University Press, Cambridge (2003)

Chapter 10

Identification of Block-oriented Systems Using the Invariance Property Martin Enqvist

10.1 Introduction Identification of systems that can be written as interconnected linear time-invariant (LTI) dynamical subsystems and static nonlinearities has been an active research area for several decades. These systems are often referred to as block-oriented systems since their structures can be characterised using linear dynamical and static nonlinear blocks. In particular, block-oriented systems where the blocks are connected in series have received special attention. For example, Wiener and Hammerstein systems are common examples of series connected block-oriented systems. A Wiener system consists of an LTI subsystem followed by a static nonlinearity, and a Hammerstein system has the same subsystems but in the opposite order. Wiener and Hammerstein systems can both be viewed as special cases of the more general Wiener–Hammerstein systems, which have one LTI subsystem before the static nonlinearity and one after. Many results about identification of Wiener, Hammerstein or Wiener–Hammerstein systems are directly or indirectly related to a particular invariance property which, in some cases, can be shown using Bussgang’s theorem [9] or the theory of separable processes [29, 30]. The reason for the usefulness of this invariance property is that it makes it possible to identify the linear parts of a Wiener, Hammerstein or Wiener–Hammerstein system without compensating for or estimating the static nonlinearity in the system. Hence, the identification problem can be divided into two easier problems. The purpose of this chapter is to give an introduction to the invariance property and its use for identification of block-oriented systems and also to describe some other related results. Martin Enqvist Division of Automatic Control, Department of Electrical Engineering, Link¨oping University, SE-58183 Link¨oping, Sweden e-mail: [email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 147–158. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com 

148

M. Enqvist

10.2 Preliminaries One of the most popular methods for system identification is the prediction-error method [25], which has proved to produce accurate models in a large number of estimation settings. The results that will be presented later in this chapter can be motivated using this method, but are useful also for other approaches to identification of block-oriented systems. The prediction-error method can be applied if a finite sequence N Z N = (u(t), y(t))t=1

of simultaneous measurements of the input signal u(t) and output signal y(t) from the studied system is available and a parametrised model y(t, θ ) = G(q, θ )u(t) + H(q, θ )e(t),

(10.1)

has been selected. Here, θ is a d-dimensional vector of parameters and q denotes the shift operator qu(t) = u(t + 1). Under the assumption that e(t) is white noise (i.e., a sequence of independent random variables) and that the model (10.1) is an exact description of the true system, the mean-square error optimal predictor y(t, ˆ θ) of y(t) is (10.2) y(t, ˆ θ ) = H −1 (q, θ )G(q, θ )u(t) + (1 − H −1(q, θ ))y(t). The basic idea in the prediction-error method is to compare how well the predictor (10.2) can predict the measured output y(t) for different θ values and to select a parameter estimate θˆN by minimising some criterion VN (θ , Z N ). For example, this criterion can be chosen to be quadratic such that 1 N θˆN = arg min VN (θ , Z N ) = arg min ∑ (y(t) − y(t, ˆ θ ))2 . θ ∈DM θ ∈DM N t=1 Here, θ is restricted to some pre-specified set DM ⊂ Rd . Usually, DM is the set of parameters that make the predictor (10.2) stable. In many cases, the minimisation of VN (θ , Z N ) has to be performed using some kind of numerical method, e.g., a Gauss-Newton or a damped Gauss-Newton method. The convergence properties of θˆN have been analysed and in [24] it is shown that under rather weak conditions on the true system and on the input and output signals, it holds that

θˆN → θ ∗ = arg min E((y(t) − y(t, ˆ θ ))2 ), θ ∈DM

w.p.1 as N → ∞,

(10.3)

where E denotes the expected value. With some abuse of notation, y(t) and y(t, ˆ θ) here denote the stochastic signals while they previously in this section have denoted realisations of these signals. The convergence result (10.3) shows that many asymptotic properties of the prediction-error method can be derived by analysing the asymptotic cost function E((y(t) − y(t, ˆ θ ))2 ). This is the approach that will be

10

Identification of Block-oriented Systems Using the Invariance Property

149

used here and for simplicity, only output error models, with H(q, θ ) = 1 in (10.1), will be discussed. However, it is straightforward to modify all results to the general case with an arbitrary noise model. More specifically, the identification problem for block-oriented systems that will be described here is formulated in a stochastic framework where all signals are stationary stochastic processes. For simplicity, all signals are assumed to have zero mean, i.e., E(v(t)) = 0 for any signal v(t) and all t ∈ Z. Furthermore, both the covariance function Rv (τ ) = E(v(t)v(t − τ )) and its z-transform

Φv (z) =





τ =−∞

Rv (τ )z−τ

are assumed to be well-defined. The function Φv (z) is called the z-spectrum of the signal and it is assumed that its region of convergence contains the unit circle. Similarly, when two signals u(t) and y(t) are considered, it is assumed that they are jointly stationary and that the cross-covariance function Ryu (τ ) = E(y(t)u(t − τ )) exists. Furthermore, it will be assumed that also this function has a z-transform Φyu (z) whose region of convergence contains the unit circle. Since system identification deals with the input and output signals of a system, any assumptions on these signals can be viewed as implicit assumptions on the system itself. This is a convenient alternative to making explicit assumptions about the system in order to guarantee certain properties of the input and output signals, in particular in the nonlinear setting studied here. Hence, only nonlinear systems with input and output signals with the following properties will be studied in this chapter. Assumption 10.1. Assume that (a)The input u(t) is a real-valued stationary stochastic process with E(u(t)) = 0,

∀t ∈ Z.

(b)There exist K > 0 and α , 0 < α < 1, such that the second order moment Ru (τ ) = E(u(t)u(t − τ )) satisfies |Ru (τ )| < K α |τ | ,

∀τ ∈ Z.

(c)The z-spectrum Φu (z) has a unique canonical spectral factorisation

Φu (z) = L(z)ru L(z−1 ),

(10.4)

where L(z) and 1/L(z) are causal transfer functions that are analytic in the set {z ∈ C | |z| ≥ 1}, L(∞)  lim|z|→∞ L(z) = 1 and ru is a positive constant.

150

M. Enqvist

Assumption 10.2. Assume that (a)The output y(t) is a real-valued stationary stochastic process with E(y(t)) = 0,

∀t ∈ Z.

(b)There exist K > 0 and α , 0 < α < 1, such that the second order moments Ryu (τ ) = E(y(t)u(t − τ )) and Ry (τ ) = E(y(t)y(t − τ )) satisfy |Ryu (τ )| < K α |τ | , |τ |

|Ry (τ )| < K α ,

∀τ ∈ Z, ∀τ ∈ Z.

Assumptions 10.1 and 10.2 are satisfied for a wide range of nonlinear systems, including many block-oriented systems. Under these assumptions, and using only output error models, the optimal LTI approximation of a particular nonlinear system can be defined as the stable and causal LTI model G0,OE that minimises the meansquare error E((y(t) − G(q)u(t))2 ). This model is often called the Wiener filter for prediction of y(t) from (u(t − k))∞ k=0 [33, 32, 16]. However, in order to avoid any ambiguities concerning which type of Wiener filter is referred to and in order to emphasise the approximation properties of G0,OE , we will not use the term Wiener filter here, but instead call G0,OE the Output Error LTI Second Order Equivalent (OE-LTI-SOE) of the nonlinear system. This terminology has been used previously in, for example, [26], [15] and [13]. Definition 10.1. Consider a nonlinear system with input u(t) and output y(t) such that Assumptions 10.1 and 10.2 are fulfilled. The Output Error LTI Second Order Equivalent (OE-LTI-SOE) of this system is the stable and causal LTI model G0,OE (q) that minimises the mean-square error E((y(t) − G(q)u(t))2 ), i.e., G0,OE (q) = arg min E((y(t) − G(q)u(t))2 ), G∈G

where G denotes the set of all stable and causal LTI models. Here, stability means bounded input bounded output stability [19]. It should be noted that the OE-LTI-SOE of a nonlinear system is input dependent, i.e., that different input signals in general result in different OE-LTI-SOEs for one particular nonlinear system. Using classic Wiener filter theory, a closed-form expression for the OE-LTI-SOE can be obtained. Theorem 10.1 (OE-LTI-SOEs). Consider a nonlinear system with input u(t) and output y(t) such that Assumptions 10.1 and 10.2 are fulfilled. Then the OE-LTI-SOE G0,OE of this system is 3 4 Φyu (z) 1 G0,OE (z) = , (10.5) ru L(z) L(z−1 ) causal where [. . .]causal denotes taking the causal part, and where L(z) is the canonical spectral factor of Φu (z) from (10.4).

10

Identification of Block-oriented Systems Using the Invariance Property

151

Proof. See [19] or any other textbook on the theory of Wiener filters.

 

A simple, but useful, corollary to Theorem 10.1 shows that the expression (10.5) for the OE-LTI-SOE sometimes can be simplified. Corollary 10.1. Consider a nonlinear system with input u(t) and output y(t) such that Assumptions 10.1 and 10.2 are fulfilled, and assume that the ratio Φyu (z)/Φu (z) defines a stable and causal LTI system. Then G0,OE (z) =

Φyu (z) . Φu (z)

(10.6)

Proof. This is a direct consequence of Wiener filter theory. A short proof can be found in [15].   The OE-LTI-SOE of a system will be called regular if (10.6) holds. Hence, we have the following definition. Definition 10.2. An OE-LTI-SOE G0,OE (z) is regular if it can be written G0,OE (z) =

Φyu (z) . Φu (z)

In most applications, the order of the OE-LTI-SOE is unknown. However, if a parametrised model, possibly of lower order than the OE-LTI-SOE, is estimated from data, this model will approximate the OE-LTI-SOE according to the following theorem, which is a special case of Theorem 4.1 in [26]. Theorem 10.2. Consider a nonlinear system with input u(t) and output y(t) such that Assumptions 10.1 and 10.2 are fulfilled. Let G0,OE be the corresponding OELTI-SOE according to Theorem 10.1. Suppose that a parametrised stable and causal output error model G(q, θ ) is fitted to the signals u and y according to

θˆ = arg min E(η (t, θ )2 ), θ

where

η (t, θ ) = y(t) − G(q, θ )u(t).

(10.7)

Then it follows that

θˆ = arg min θ

Proof. See [15].

 π −π

|G0,OE (eiω ) − G(eiω , θ )|2 Φu (eiω ) d ω .  

Theorem 10.2 shows that a low-order model will approximate the OE-LTI-SOE in the same way as a low-order model approximates the true system in a linear identification problem (cf. Section 8.5 and Problem 8G.5 in [25]).

152

M. Enqvist

In the remaining sections, the theory of OE-LTI-SOEs will be used to analyse linear approximations of block-oriented systems. Depending on the type of input signal used, such an approximation can be useful when estimating a complete model of the system.

10.3 The Invariance Property and Separable Processes A key feature of block-oriented systems like Wiener, Hammerstein and Wiener– Hammerstein systems is that the nonlinear characteristics of the system is contained in a single static nonlinearity. Hence, it is natural that there is a strong connection between identification results for such systems and results about static nonlinearities. Because of their simplicity and frequent occurrences in many applications, static nonlinearities is a well-studied topic in the control and identification literature. For example, the properties of static nonlinearities in closed-loop systems is a classic topic and many useful results exist, e.g., the describing function framework for analysis of oscillations [2] and the circle criterion for stability analysis [20]. One of the most useful results about stochastic signals is the particular invariance property that holds for some signals. Definition 10.3. Consider a stationary stochastic process u(t) with E(u(t)) = 0 and Ru (τ ) < ∞ for all τ ∈ Z and a static nonlinearity y(t) = f (u(t)) such that E(y(t)) = 0 and Ryu (τ ) < ∞ for all τ ∈ Z. The invariance property holds if Ryu (τ ) = b0 Ru (τ ),

∀τ ∈ Z,

(10.8)

for some constant b0 . An early result about the invariance property is Bussgang’s theorem [9], which says that the invariance property holds, with b0 = E( f  (u(t))), for a large class of static nonlinearities when u(t) is Gaussian. For such a Gaussian signal, the constant b0 is called the equivalent gain in [6] and can be viewed as a describing function for a random input signal. Just like ordinary describing functions, it can be used to analyse nonlinear closed-loop systems [2]. Bussgang’s theorem has been generalised to functions of several Gaussian variables [1, 31, 27] and to other classes of signals than Gaussian in [3], [8], [29] and [30]. Nuttall’s generalisation [29, 30] uses the concept of separable processes and is particularly interesting. Definition 10.4 (Separability). A stationary stochastic process u(t) with E(u(t)) = 0 is separable (in Nuttall’s sense) if E(u(t − τ )|u(t)) = a(τ )u(t)

(10.9)

for some function a(τ ). A number of separable signals are listed in [29] and [30], e.g., Gaussian signals, random binary signals and several types of modulated signals. In addition, it has

10

Identification of Block-oriented Systems Using the Invariance Property

153

been shown that signals with elliptically symmetric distributions are separable [28] as well as random multisine signals with flat amplitude spectra [14]. It is easy to show that the function a(τ ) in (10.9) can be expressed using the covariance function of u(t). Lemma 10.1. Consider a separable stationary stochastic process u(t) with E(u(t)) = 0. The function a(τ ) from (10.9) can then be written a(τ ) =

Ru (τ ) . Ru (0)

(10.10)

Proof. The result is shown in [29] and [30] but since the proof is quite short and a good example of how the separability property can be used, it has been included here. Actually, the result follows immediately from the fact that    Ru (τ ) = E(u(t)u(t − τ )) = E u(t)E u(t − τ )|u(t) = a(τ )E(u(t)2 ) = a(τ )Ru (0) if u(t) is separable. Here, we have used the facts that E(Y ) = E(E(Y |X)), E(g(X)Y |X) = g(X)E(Y |X) for two random variables X and Y [17].

(10.11a) (10.11b)  

Furthermore, it is easy to show that the separability of u(t) is a sufficient condition for the invariance property (10.8) to hold. Consider a separable process u(t) with zero mean and a static nonlinearity such that y(t) has zero mean too. Then it follows that      Ryu (τ ) = E f (u(t))u(t − τ ) = E E f (u(t))u(t − τ )|u(t)      = E f (u(t))E u(t − τ )|u(t) = a(τ )E f (u(t))u(t) = b0 Ru (τ ), (10.12)   where b0 = E f (u(t))u(t) /Ru (0) and where (10.10) has been used in the last equality. This result can be found in [29] and [30] together with the converse result, which says that separability, in a certain sense, is also a necessary condition for (10.8) to hold. Consider an arbitrary stationary stochastic process u(t) with zero mean and let Du be a class of Lebesgue integrable functions such that     Du = { f : R → R | E f (u(t)) = 0, E f (u(t))2 < ∞,   Ryu (τ ) = E f (u(t))u(t − τ ) exists ∀τ ∈ Z}. (10.13) The following result shows a certain equivalence between the invariance property and separability of the input signal. Theorem 10.3. Consider a stationary stochastic process u(t) with E(u(t)) = 0 and Ru (τ ) < ∞ for all τ ∈ Z. The invariance property (10.8) holds for all f ∈ Du if and only if u(t) is separable.

154

M. Enqvist

 

Proof. See [29] or [30].

Theorem 10.3 shows that it is impossible to find a more general signal class for which the invariance property holds for any function in Du . In particular, it explains exactly which feature of Gaussian signals that is crucial for Bussgang’s theorem to hold. Theorem 10.3 has been generalised to nonlinear finite impulse response systems using a separability concept where the conditioning in (10.9) is with respect to several signal components [15]. Using z-spectra, the invariance property can be written

Φyu (z) = b0 Φu (z). Hence, Corollary 10.1 gives that the OE-LTI-SOE of a static nonlinearity is static when the invariance property holds. It is easy to show that this result does not hold for all input signals. As will be shown in the next section, the invariance property has turned out to be quite useful for identification of some classes of block-oriented systems.

10.4 Block-oriented Systems The invariance property is particularly useful for block-oriented systems that contain one static nonlinearity, such as Wiener–Hammerstein systems. In this case, the OELTI-SOE turns out to be equal to the product of the transfer functions of the two linear subsystems and some constant. Theorem 10.4. Consider a Wiener-Hammerstein system y(t) = G2 (q)v(t) + w(t) where v(t) = f (n(t)) and n(t) = G1 (q)u(t) and where G1 (q) and G2 (q) are stable and causal LTI systems. Assume that u(t) and y(t) fulfil Assumptions 10.1 and 10.2 and that w(t) is uncorrelated with u(t − τ ) for all τ ∈ Z. Assume also that n(t) and v(t) fulfil Assumptions 10.1 and 10.2 and that the invariance property holds for n(t) and v(t) = f (n(t)) such that Φvn (z) = b0 Φn (z). Then the OE-LTI-SOE of this system is (10.14) G0,OE (z) = b0 G2 (z)G1 (z). Proof. We have

Φyu (z) = G2 (z)Φvu (z),

(10.15a)

−1

Φvn (z) = Φvu (z)G1 (z ),

(10.15b) −1

Φn (z) = G1 (z)Φu (z)G1 (z ).

(10.15c)

In addition, the invariance property (10.8) gives that

Φvn (z) = b0 Φn (z).

(10.16)

Inserting (10.15b) and (10.15c) in (10.16) gives

Φvu (z) = b0 G1 (z)Φu (z),

(10.17)

10

Identification of Block-oriented Systems Using the Invariance Property

155

and inserting (10.17) in (10.15a) gives

Φyu (z) = G2 (z)b0 G1 (z)Φu (z). Hence, (10.14) follows from Corollary 10.1.

 

Theorem 10.4 shows that the OE-LTI-SOE of a Wiener–Hammerstein system will be b0 G2 (z)G1 (z) when the invariance property holds for the static nonlinearity v(t) = f (n(t)). Hence, an estimated output error model will approach this model when the number of measurements tends to infinity and it is possible to obtain accurate information about the linear subsystems without compensating for or estimating the static nonlinearity in the system. This information is particularly useful if either G1 or G2 is equal to one, i.e., if we have either a Hammerstein or a Wiener system. In these cases, the OE-LTI-SOE will simply be a scaled version of the LTI subsystem and it is straightforward to estimate the static nonlinearity in a second step. However, for a Wiener–Hammerstein system, there is also the problem of factorising the OE-LTI-SOE into G1 (z) and G2 (z). It is easy to see that Theorem 10.4 can be applied for all Hammerstein systems with separable input signals, but the Wiener and Wiener–Hammerstein cases are harder to analyse. The main problem is that separability is not preserved under linear transformations. This means that there is no guarantee that a separable input signal u(t) will imply that the input n(t) = G1 (q)u(t) to the static nonlinearity is also separable. However, since a Gaussian input signal to an LTI system produces a Gaussian output signal, the separability of the signal is preserved in this case. Hence, it is natural that the special case of a Gaussian input signal has been studied in detail. Some results about identification of Wiener, Hammerstein and Wiener-Hammerstein systems using Gaussian input signals can, for example, be found in [7], [4], [5], [21] and [18]. All these papers cover complete Wiener–Hammerstein systems and use third order moments to factorise the OE-LTI-SOE in G1 (z) and G2 (z). Some related results for random multisine signals exist too [10, 11, 12]. It should be mentioned also that the approach to first identify the linear subsystems in a Wiener, Hammerstein or Wiener–Hammerstein system using the invariance property is based on asymptotic properties of the model estimates. Hence, it is useful mainly for relatively large datasets. The use of the invariance property is illustrated in the following simple example. Example 10.1. Consider the Hammerstein system y(t) = G(q) f (u(t)) + w(t), where 1 + 0.4q−1 , 1 − q−1 + 0.24q−2 f (x) = arctan(4x),

G(q) =

156

M. Enqvist

and w(t) is Gaussian white noise with E(w(t)) = 0 and E(w(t)2 ) = 1. The input to the system is 0.3 e(t), u(t) = 1 + 0.4q−1 where e(t) is Gaussian white noise with E(e(t)) = 0 and E(e(t)2 ) = 1. The signals w(t) and e(t) are independent. The behaviour of this system has been simulated for a particular realisation of the input and noise signals and a dataset with 20 000 input and output measurements has been collected. An output error model has been estimated from this dataset using the System Identification Toolbox in MATLABTM and the result was ˆ G(q) =

1 + 0.415q−1 . 1 − 0.973q−1 + 0.220q−2

Here, the numerator coefficients have been rescaled such that the first coefficient is one in order to make the comparison with the true linear subsystem easier. As can be ˆ seen, G(q) is a rather accurate approximation of G(q). This result is obtained since the invariance property holds for a Gaussian input signal.

10.5 Discussion The invariance property has been used for identification of block-oriented systems in numerous publications and is a well-established tool in the system identification community. However, the underlying theoretical results have also given rise to a large number of results in the statistical literature. Many of these results seem to have been developed independently without a lot of interaction with the system identification community. It seems likely that some methods that have been designed for particular problems in, for example, regression analysis should be useful also for identification of block-oriented dynamical systems. In [22], a nonlinear regression model y = f (ST x, e) is studied. Here, y is a scalar response variable, S is a p × q-dimensional matrix, x is a p-dimensional vector of explanatory variables and e is some unknown disturbance which is independent of x. The main result in [22] shows that a basis for the column space of the matrix S can be estimated using a method called sliced inverse regression, provided that the distribution of the x vector satisfies a condition about linear conditional expectations which is similar to the definition of separability. The main idea in sliced inverse regression is to estimate E(x|y) for different values of y and to extract information about the column space of S from these estimates using principal component analysis. It is mentioned in [22] that the method is applicable when, for example, x has an elliptical distribution. The method is particularly useful when q is much smaller than p. In this case, knowledge about S, or at least

10

Identification of Block-oriented Systems Using the Invariance Property

157

its column space, can be used to transform a high-dimensional nonlinear regression problem into a more low-dimensional and tractable one. A related approach to the same problem is presented in [23], where a method based on principal Hessian directions is used instead of sliced inverse regression. Numerous modifications and improvements of these original methods can also be found in literature. Investigation of the available methods for dimension reduction in the context of block-oriented systems, possibly with some modifications to handle also the infinite impulse response case, seems like an interesting topic for future research.

References 1. Atalik, T.S., Utku, S.: Stochastic linearization of multi-degree-of-freedom non-linear systems. Earthquake Engineering and Structural Dynamics 4, 411–420 (1976) 2. Atherton, D.P.: Nonlinear Control Engineering, student edn. Van Nostrand Reinhold, New York (1982) 3. Barrett, J.F., Lampard, D.G.: An expansion for some second-order probability distributions and its application to noise problems. IRE Transactions on Information Theory 1(1), 10–15 (1955) 4. Billings, S.A., Fakhouri, S.Y.: Theory of separable processes with applications to the identification of nonlinear systems. Proceedings of the IEE 125(9), 1051–1058 (1978) 5. Billings, S.A., Fakhouri, S.Y.: Identification of systems containing linear dynamic and static nonlinear elements. Automatica 18(1), 15–26 (1982) 6. Booton Jr., R.C.: Nonlinear control systems with random inputs. IRE Transactions on Circuit Theory 1(1), 9–18 (1954) 7. Brillinger, D.R.: The identification of a particular nonlinear time series system. Biometrika 64(3), 509–515 (1977) 8. Brown Jr., J.L.: On a cross-correlation property for stationary random processes. IRE Transactions on Information Theory 3(1), 28–31 (1957) 9. Bussgang, J.J.: Crosscorrelation functions of amplitude-distorted Gaussian signals. Technical Report 216, MIT Research Laboratory of Electronics, Cambridge, Massachusetts (1952) 10. Crama, P., Schoukens, J.: Initial estimates of Wiener and Hammerstein systems using multisine excitation. IEEE Transactions on Instrumentation and Measurement 50(6), 1791–1795 (2001) 11. Crama, P., Schoukens, J.: Hammerstein-Wiener system estimator initialization. Automatica 40(9), 1543–1550 (2004) 12. Crama, P., Schoukens, J.: Computing an initial estimate of a Wiener-Hammerstein system with a random phase multisine excitation. IEEE Transactions on Instrumentation and Measurement 54(1), 117–122 (2005) 13. Enqvist, M.: Linear Models of Nonlinear Systems. PhD thesis, Link¨oping University, Link¨oping, Sweden (2005) 14. Enqvist, M.: Identification of Hammerstein systems using separable random multisines. In: Proceedings of the 14th IFAC Symposium on System Identification, Newcastle, Australia, pp. 768–773 (2006) 15. Enqvist, M., Ljung, L.: Linear approximations of nonlinear FIR systems for separable input processes. Automatica 41(3), 459–473 (2005)

158

M. Enqvist

16. Gardner, W.A.: Introduction to Random Processes. Macmillan Publishing Company, New York (1986) 17. Gut, A.: An Intermediate Course in Probability. Springer, New York (1995) 18. Hunter, I.W., Korenberg, M.J.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics 55, 135–144 (1986) 19. Kailath, T., Sayed, A.H., Hassibi, B.: Linear Estimation. Prentice Hall, Upper Saddle River (2000) 20. Khalil, H.K.: Nonlinear Systems, 3rd edn. Prentice Hall, Upper Saddle River (2002) 21. Korenberg, M.J.: Identifying noisy cascades of linear and static nonlinear systems. In: Proceedings of the 7th IFAC Symposium on Identification and System Parameter Estimation, York, UK, pp. 421–426 (1985) 22. Li, K.-C.: Sliced inverse regression for dimension reduction. Journal of the American Statistical Association 86(414), 316–327 (1991) 23. Li, K.-C.: On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. Journal of the American Statistical Association 87(420), 1025–1039 (1992) 24. Ljung, L.: Convergence analysis of parametric identification methods. IEEE Transactions on Automatic Control 23(5), 770–783 (1978) 25. Ljung, L.: System Identification: Theory for the User, 2nd edn. Prentice Hall, Upper Saddle River (1999) 26. Ljung, L.: Estimating linear time-invariant models of nonlinear time-varying systems. European Journal of Control 7(2-3), 203–219 (2001) 27. Lutes, L.D., Sarkani, S.: Stochastic Analysis of Structural and Mechanical Vibrations. Prentice Hall, Upper Saddle River (1997) 28. McGraw, D.K., Wagner, J.F.: Elliptically symmetric distributions. IEEE Transactions on Information Theory 14(1), 110–120 (1968) 29. Nuttall, A.H.: Theory and application of the separable class of random processes. Technical Report 343, MIT Research Laboratory of Electronics, Cambridge, Massachusetts (1958) 30. Nuttall, A.H.: Theory and Application of the Separable Class of Random Processes. PhD thesis, MIT, Cambridge, Massachusetts (1958) 31. Scarano, G., Caggiati, D., Jacovitti, G.: Cumulant series expansion of hybrid nonlinear moments of n variates. IEEE Transactions on Signal Processing 41(1), 486–489 (1993) 32. Schetzen, M.: The Volterra & Wiener Theories of Nonlinear Systems. John Wiley & Sons, Chichester (1980) 33. Wiener, N.: Extrapolation, Interpolation and Smoothing of Stationary Time Series. Technology Press & Wiley, New York (1949)

Part IV

Frequency Methods

Chapter 11

Frequency Domain Identification of Hammerstein Models Er-Wei Bai

11.1 Introduction In this chapter, we discuss a frequency approach for Hammerstein model identification. The method is based on the fundamental frequency and therefore, no a priori information on the structure of the nonlinearity is required. Moreover, the method is not limited to Hammerstein models whose linear part is a finite-order rational transfer functions, but applies to Hammerstein models with a non-parametric linear part. The method can be easily extended to Wiener models with minor modifications. The chapter is based on an article [1], IEEE Trans. on Automatic Control, Vol. 48, pp.530-542, 2003 with permission from IEEE Intellectual Property Rights Office. All the proofs can be found in [1]. Use of sinusoidal inputs in identification of Hammerstein models has certain advantages. The periodicity of the input signals implies that all the signals inside the system consist of frequencies that are integer multiples of the input frequencies. Subharmonics or chaos can never happen. This makes identification simple. Another important observation in our approach is that with sinusoidal inputs, the output of the nonlinearity permits a Fourier series representation. Moreover, the Fourier coefficients are invariant with respect to the input frequencies. We remark that the idea of frequency domain identification to Hammerstein models is not new and appeared in the study of identification for Hammerstein models [6, 11, 17]. Though there were several approaches in the literature, they are more or less the same ideas as in [6, 9, 10]. In [10], the nonlinearity is assumed to be a polynomial with a known order. The reason is that once the order is known, the highest harmonic has a known frequency and behaves in a linear manner [10]. Thus, linear techniques based on the highest harmonic can be applied to identify the linear part. The problem is that the nonlinearity may not be a polynomial with a known order. Even it is a polynomial with known order, the coefficient of the highest order is usually very small. For Er-Wei Bai Dept. of Electrical and Computer Engineering, University of Iowa e-mail: [email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 161–180. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com 

162

E.-W. Bai

instance, in a practical situation, the unknown nonlinearity may be approximated by a polynomial. To have a reasonably good approximation, the order has to be high. Moreover, the coefficient with the highest order is usually small. This implies that the very signal used in identification with the highest harmonic has a small amplitude. This has a significant impact on the signal to noise ratio in identification and makes the method sensitive. The frequency approach of [6] also assumes the exact order knowledge of the nonlinearity. With a known order, the output is a known nonlinear function of input magnitude and frequency with some unknown parameters. By repeatedly applying different magnitudes and frequencies, these unknown parameters can be uniquely solved. Without the exact order information, uniqueness is however lost. The other frequency approach [9] also needs a parametrisation of the unknown nonlinearity.

11.2 Problem Statement and Point Estimation Consider the Hammerstein model shown in Figure 11.1, where u(t), v(t) y(t) and y f (t) are the system input, noise, output and filtered output respectively. x(t) denotes the unavailable internal signal. These are continuous time signals. u(iTs ) and y f (iTs ) denote the sampled input and sampled filtered output signals respectively with the sampling interval Ts that will be specified later. The filter is a lowpass filter at designer’s disposal. The goal of the frequency domain identification is to apply inputs of the form u(t) = Acos(ωk t), ωk = 0, t ∈ [0, T ] ˆ and then, to determine a pair of the estimates fˆ(·) and G(s) based on finite sampled ˆ ˆ → G(s) in inputs and filtered outputs u(iTs ) and y f (iTs ) so that f (·) → f (·), G(s) some sense. Note that the continuous time model G(s), not its discretised model, ˆ is our interests. The exact forms of fˆ(·) and G(s) will be given later. In fact, the ˆ ˆ forms of f (·) and G(s) depend on whether they are parametric or not. Just like the frequency identification approaches for linear systems, the proposed method may also have to be repeated for a number of frequencies. Assumption 11.1. Throughout the chapter, we assume 1. The nonlinearity x = f (u) is a static function that is continuous and piecewise smooth for u ∈ [−A, A], where [−A, A] is the input range of interest. 2. The linear part G(s) is an exponentially stable continuous time system which can be represented by a rational transfer function or simply non-parametric. 3. Noise v(t) is a continuous time random signal that is the output of an unknown stable and proper linear system driven by a white noise source with zero mean and finite variance. No rationality on the transfer function G(s) and no a priori knowledge on the structure of the unknown nonlinearity f (·) are assumed. The standard polynomial nonlinearities as well as many hard input nonlinearities, e.g., deadzone and saturation, belong to the class specified by Assumption 11.1.

11

Frequency Domain Identification of Hammerstein Models

163

v(t) u(t)

f(u)

-

x(t)

-

G(s)

-

y(t)

+

?

-

+ -

?

filter

u(iTs)

-

yf( iTs )

Fig. 11.1: The Hammerstein model

11.2.1 Continuous Time Frequency Response Note that the nonlinearity x = f (u) is continuous and piecewise smooth. If the input u(t) = Acos(ωk t) which is an even and periodic function with the period 2ωπk , then, x(t) is also an even and periodic function that is continuous and piecewise smooth, and consequently it permits a Fourier series representation ∞

x(t) = ∑ ri cos(iωk t)

(11.1)

i=0

where the Fourier coefficients are given by r0 =

ωk 2π



2π ωk

0

f (Acos(ωk t))dt, ri =

ωk π



2π ωk

0

f (Acos(ωk t))cos(iωk t)dt, i = 1, 2, ...

Moreover, since f (u)|t=0 = f (u)|t=2π /ωk and x(t) is continuous and piecewise ¯ smooth, it follows from [15] that |x(t) − ∑ii=0 ri cos(iωk t)| → 0 uniformly in t as i¯ → ∞. Lemma 11.1. Let u(t) = Acos(ωk t) and x(t) be represented by the Fourier series representation (11.1). Then, 1. The Fourier coefficients ri ’s are independent of the input frequency ωk . In other words, the Fourier series expression (11.1) is valid for any non-zero input frequency with the identical Fourier coefficients ri ’s. 2. If the nonlinearity is odd, i.e., f (−u) = − f (u), then r2i = 0, i = 0, 1, 2, .... 3. If the nonlinearity is even, i.e., f (−u) = f (u), then r2i+1 = 0, i = 0, 1, 2, .... The lemma shows that ri ’s are independent of the input frequency ωk . This observation is the key that makes the frequency domain identification of Hammerstein models possible. We now define the finite time Fourier transform. Given the input frequency ωk and with the observation interval

164

E.-W. Bai

T =L

2π ωk

(11.2)

for some integer L > 0, let VT (ω ) YT (ω )

!

= 0T v(t)e− jω t dt, UT (ω ) = ! = 0T y(t)e− jω t dt, Y f ,T (ω ) =

!T u(t)e− jω t dt, !0T − jω t 0

y f (t)e

(11.3)

dt

denote the finite time Fourier transforms of v(t), u(t), y(t) and y f (t) respectively, where u(t) = Acos(ωk t) is the input, and v(t), y(t) and y f (t) are the noise, output and filtered output respectively. When u(t) = Acos(ωk t), since x(t) = ∑∞ i=0 ri cos(iωk t) and the linear part is an unknown transfer function G(s), it follows that y(t) = ∑∞ i=0 ri |G( j ωk i)|cos(iωk t + ∠G( jωk i)).

11.2.2 Point Estimation of G( jω ) Based on YT and UT In this section, we develop theoretical framework in continuous time domain based on the continuous time model G(s) and continuous time signals u(t), x(t) and y(t). Digital implementation using only the sampled u(iTs ) and y f (iTs ) will be discussed in the next section. ¯ jωk ) of G( jω ) at Given the input u(t) = Acos(ωk t), define the point estimate G( ω = ωk as 1 Y (ω ) ¯ j ωk ) = A T T k G( (11.4) 1 T UT (ωk ) where YT (ωk ) and UT (ωk ) are the finite time Fourier transforms as defined in (11.3). It is a straightforward calculation to show that T1 UT (ωk ) = A2 and because ∞ ∑i=0 ri |G( jωk i)|cos(iωk t + ∠G( jωk i)) is absolutely integrable [15], we have 1 1 YT (ωk ) = T T

 T ∞

1

∑ ri |G( jωk i)|cos(iωk t + ∠G( jωk i))e− jωk t dt + T VT (ωk )

0 i=0

r1 1 G( jωk ) + VT (ωk ) . 2 T It is important to note at this point that in the characterisation of the Hammerstein model, the gains of f (u) and G(s) are actually not unique because of the product. Any pair (α f (u), G(s)/α ), α = 0, would produce identical input and output measurements. There are several ways to make the representation unique, e.g., either the gain of f (u) or G(s) can be fixed to be unit. In this chapter we take a different approach and assume =

Assumption 11.2. The coefficient r1 in (11.1) is normalised to be one, i.e., r1 = 1.

11

Frequency Domain Identification of Hammerstein Models

165

Normalisation of r1 = 1 is arbitrary. In theory, r1 may be normalised to be any fixed number or we can also normalise any ri , i = 1, to be unity in the case that r1 = 0. From the above assumption, it follows that ¯ j ωk ) = A G(

1 T YT (ωk ) 1 T UT (ωk )

= G( jωk ) +

2 VT (ωk ) . T

(11.5)

¯ jωk ) using YT (ωk ) and UT (ωk ). The following theorem gives the quality of G( Theorem 11.1. Consider the Hammerstein model under Assumptions 11.1 and ¯ jωk ) in (11.4) by frequency domain identifi11.2. Consider the point estimate G( cation. Then, uniformly in k, ¯ jωk ) → G( jωk ) G( in probability as T → ∞. ¯ jωk ) of G( jωk ) can be accurately Similar to the linear case, the point estimate G( obtained in the presence of the unknown nonlinearity f (·) provided that the continuous time data u(t) and y(t) are available. However, in most applications, only sampled values are available and thus, we discuss implementation of this point estimation algorithm by using the sampled data u(iTs ) and y f (iTs ). We will show that the identification results remain almost identical if the lowpass filter and the sampling interval Ts are properly chosen as suggested in [14].

11.2.3 Implementation Using Sampled Data ¯ jωk ) is the calculations of 1 UT (ωk ) and 1 YT (ωk ) that involves A key to find G( T T continuous time integrations. We show in this section that these quantities are computable by applying DFTs (discrete Fourier transform) on the sampled input and filtered output u(iTs ) and y f (iTs ). There are three steps involved: the choice of lowpass filter cutoff frequency ω¯ , the determination of the sampling interval Ts and the calculation of DFTs. Filter choice: When u(t) = Acos(ωk t), the output is given by ∞

y(t) = ∑ ri |G( jωk i)|cos(iωk t + θi ) + v(t) i=0

with θi = ∠G( jωk i) . Recall that the purpose of the point estimation is to estimate G( jωk ). In the absence of any structural prior information on the unknown nonlinearity, we will see later however that the pair (u(iTs ), x(iT ˆ s )) plays an impor¯ tant role, where x(t) ˆ = ∑ii=0 rˆi cos(iωk t) is the estimate of unknown internal variable x(t) = ∑∞ i=0 ri cos(iωk t) and rˆi is the estimate of ri . To estimate ri based on the sampled data u(iTs ) and y f (iTs ), y f (t) must contain frequencies up to i¯ωk . To this end, let the cutoff frequency ω¯ of the lowpass filter be

166

E.-W. Bai

i¯ωk < ω¯ < (i¯ + 1)ωk

(11.6)

for some integer i¯ ≥ 1. Then, the output y f (t) of the lowpass filter is in the form of i¯

y f (t) = ∑ ri |G( jωk i)|cos(iωk t + θi ) + v f (t)

(11.7)

i=0

where v f (t) is the filtered noise. Here, we assume that the higher order terms i > i¯ are negligible. How to deal with these small errors will be provided later in discussions of Sections 3.1 and 3.2. Determination of the sampling interval Ts . Since the highest frequency remaining in y f (t) due to the input is i¯ωk , we define the sampling interval by 2π 1 , M>2. (11.8) Ts = ¯ iωk M The choice of the integer M > 2 is to make sure that the sampling frequency is always higher than the Nyquist frequency i¯ωk /π . Obviously, from (11.8), we have T =L

2π ¯ s , Ts /T = 1 , ωk Ts = 2π . = LiMT i¯M ωk Li¯M

We comment that a large number of simulation seems to suggest that in many cases identification results are similar with or without the lowpass filter. One ex¯ planation is that because | ∑∞ i=i¯+1 ri cos(iωk t)| → 0 as i gets larger, when rˆi → ri and ∞ | ∑i=i¯+1 ri cos(iωk t)| is already small, the use of the lowpass does not make too much difference. In the absence of the lowpass filter, the choice of the sampling interval (11.8) remains valid. DFT implementation: With the sampled input and filtered output data u(iTs ) and y f (iTs ), i = 0, 1, ..., Li¯M − 1, we now define the DFTs of u(iTs ) and y f (iTs ). ¯

1 LiM−1 Y f ,DFT (pωk ) = ¯ ∑ y f (lTs )e− j pωk lTs LiM l=0 =

1 T

Li¯M−1



(11.9)

y f (lTs )e− j pωk lTs Ts , p = 0, 1, ..., i¯

l=0

and ¯

1 LiM−1 1 UDFT (pωk ) = ¯ ∑ u(lTs )e− j pωk lTs = T LiM l=0

Li¯M−1



u(lTs )e− j pωk lTs Ts .

(11.10)

l=0

These DFTs Y f ,DFT (ωk ) and UDFT (ωk ) have a very clear interpretation with respect to the continuous time integrations T1 UT (ωk ) and T1 Y f ,T (ωk ). In fact,

11

Frequency Domain Identification of Hammerstein Models

Y f ,DFT (ωk ) and UDFT (ωk ) are numerical integrations of T1 UT (ωk ) and by Li¯M rectangular of equal width Ts . For UDFT (ωk ), recall again that u(t) = Acos(ωk t). This implies

167 1 T Y f ,T (ωk )

u(iTs ) = Acos(ωk iTs ) and

¯

1 LiM−1 A jωk lTs UDFT (ωk ) = ¯ + e− jωk lTs )e− jωk lTs ∑ 2 (e LiM l=0 ¯

1 LiM−1 A A = ¯ ∑ 2 (1 + e−2 jωklTs ) = 2 . LiM l=0 This implies that UDFT (ωk ) = A2 = T1 UT (ωk ). In other words, the continuous time integration T1 UT (ωk ) can be obtained exactly by UDFT (ωk ) which is computable ¯ − 1. From (11.7), we now calusing only the sampled input u(iTs ), i = 0, 1, ..., LiM culate ¯

¯

1 LiM−1 i Y f ,DFT (pωk ) = ¯ ∑ ∑ ri |G( jωk i)|cos(iωk lTs + θi)e− j pωk lTs LiM l=0 i=0 ¯

1 LiM−1 + ¯ ∑ v f (lTs )e− j pωk lTs . LiM l=0 The second term is exactly the DFT V f ,DFT (pωk ) of v f (lTs ) and the first term can be rewritten as ¯

¯

1 i 1 LiM−1 j(i−p)ωk lTs jθi r |G( j ω i)| e + e− j(i+p)ωklTs e− jθi ) i k ∑ ∑ (e Li¯M i=0 2 l=0  =

rp 2 G( j ωk p) r0 G( j0)

p = 1, 2, ..., i¯ p = 0.

In particular, when p = 1, Y f ,DFT (ωk ) =

r1 1 G( jωk ) + V f ,DFT (ωk ) = G( jωk ) + V f ,DFT (ωk ) . 2 2

We comment that the calculation of UDFT (ωk ) and Y f ,DFT (ωk ) is well known in the literature [7]. We now define the point estimate G¯ d ( jωk ) using only the sampled u(iTs ) and y f (iTs ) by Y f ,DFT (ωk ) . (11.11) G¯ d ( jωk ) = A UDFT (ωk )

168

E.-W. Bai

From the calculation of Y f ,DFT (ωk ) and UDFT (ωk ), it follows that G¯ d ( jωk ) = G( jωk ) + 2V f ,DFT (ωk ) where V f ,DFT (ω ) is the DFT of v f (t), and the estimation error is G¯ d ( jωk ) − G( jωk ) = 2V f ,DFT (ωk ) . We now summarise the algorithm for estimating G( jωk ) using only the sampled u(iTs ) and y f (iTs ). Identification algorithm for estimating G( jωk ) using only the sampled data. Given u(t) = Acos(ωk t), let i¯ωk < ω¯ < (i¯+ 1)ωk for some integer i¯ ≥ 1, Ts = i¯2ωπ M1 k

for some integer M > 2 and T = L 2ωπ for some integer L > 0. k Step 1: Collect u(iTs ) and y f (iTs ), i = 0, 1, ..., Li¯M − 1. Step 2: Calculate Y f ,DFT (ωk ) and UDFT (ωk ). Y (ω ) Step 3: Define the estimate G¯ d ( jωk ) = A Uf ,DFT(ω k) . DFT k The estimate G¯ d ( jωk ) is computable using only the sampled data. Moreover, as its continuous counterpart, G¯ d ( jωk ) → G( jωk ) as T → ∞ as shown in the following theorem. Theorem 11.2. Consider the point estimate G¯ d ( jωk ) of (11.11) with T and Ts defined in (11.2) and (11.8) respectively. Then, uniformly in k, G¯ d ( jωk ) = G( jωk ) + 2V f ,DFT (ωk ) → G( jωk ) in probability as T → ∞.

11.3 Identification of G(s) Given the point estimates G¯ d ( jωk )’s, to find a G( jω ) is a curve fitting problem. Whether a particular method is effective for identification of G( jω ) depends on the assumptions of G( jω ). If G( jω ) is non-parametric, it is expected that the method is complicated and tedious. On the other hand, the identification is much easier if the unknown G(ω ) is known to be an nth order rational transfer function.

11.3.1 Finite-order Rational Transfer Function G(s) In this section, we will discuss a simple case when the unknown G(s) is characterised by an nth order stable rational transfer function G(s) =

b1 sn−1 + b2 sn−2 + ...... + bn . sn + a1sn−1 + a2sn−2 + ...... + an

(11.12)

11

Frequency Domain Identification of Hammerstein Models

169

The unknown coefficient vector θ and its estimate θˆ are denoted by

θ = (b1 , ......, bn , a1 , ....., an ) , θˆ = (bˆ 1 , ......, bˆ n , aˆ1 , ....., aˆn ) . The simplest way to find θˆ is to solve the least squares minimisation [12]. Let e(θˆ , ωk ) = (( jωk )n−1 bˆ 1 + ... + bˆ n) − G¯ d ( jωk )(( jωk )n + ( jωk )n−1 aˆ1 + ... + aˆ n) . Then, the estimate θˆ is obtained by N

θˆ = arg min ∑ e(θˆ , ωk )2

(11.13)

k=1

for some N ≥ n. Clearly, if G¯ d ( jωk ) = G( jωk ), e(θ , ωk ) = 0 and θˆ = θ . Now, we ˆ define the estimate G(s) as ˆ = G(s)

bˆ 1 sn−1 + ... + bˆ n n s + aˆ1sn−1 + ... + aˆ n

(11.14)

The following theorem can be easily derived that gives the estimation error analysis. Theorem 11.3. Let G( jω ) = 0, ∀ω and let the parameter vector estimate θˆ and ˆ the transfer function estimate G(s) be defined by (11.13) and (11.14) with N ≥ n. Suppose G¯ d ( jωk ) → G( jωk ) in probability as T → ∞. Then, ˆ jω ) − G( jω )| → 0 θˆ → θ , sup |G( ω

in probability as T → ∞. ˆ jω ) of (11.13) and (11.14) are consisWe remark that the least squares solutions G( ¯ tent in theory because Gd (ωi ) → G( jω ) as T → ∞. In some applications, however, the least squares may not perform well due to various reasons. For instance, (1) When T is finite which is always the case in reality, G¯ d ( jωi ) = G( jωi ) and this ˆ jω ), (2) The lowpass filter is not introduces errors on the least squares estimate G( ideal or the noise may not be completely captured by the assumptions. This again causes errors on the point estimate G¯ d ( jωi ) and consequently the least squares esˆ jω ) and (3) A large range of input frequencies can over-emphasise high timate G( frequency errors and results in a poor low frequency fit. To overcome these difficulties, the iterative least squares can be used. Let θˆ (l) be the estimate obtained at the lth iteration, the iterative least squares solution of θˆ (l+1) consists of minimising N

θˆ (l+1) = arg min ∑

e(θˆ (l+1) , ωk )2

(l) (l) k=1 ( j ωk )n + ( j ωk )n−1 aˆ1 + ... + aˆ n 2

.

170

E.-W. Bai

Alternatively, the nonlinear least squares estimate can be defined N

θˆ = arg min ∑ G¯ d (ωk ) − k=1

bˆ 1 ( jωk )n−1 + ... + bˆ n 2 ( jωk )n + aˆ1 ( jωk )n−1 + ... + aˆ n

and solved using numerical methods, e.g., the Newton-Gauss scheme. For both the iterative least squares and the nonlinear least squares, the linear least squares solutions provided by (11.13) and (11.14) can be used as an initial estimate to begin with. It was shown in [14] that, if convergent, the iterative least squares and the nonlinear least squares tend to give a smaller estimation error. For details, see Sections 7.8 and 7.9 of [14].

11.3.2 Non-parametric G(s) Given the point estimates, how to find the transfer function is a classical problem. There exist some methods in the literature that could be modified and used here. For instance, the well known spectral analysis method [13] aims to determine the transfer function based on spectral estimation and smoothing. Here, we adopt an approach based on interpolation technique which is used in H∞ identification setting. To this end, consider the standard bilinear transformation s=

1 λ −1 1 + sγ or λ = γ λ +1 1 − sγ

for some γ > 0. In this section, we set the numerical value γ = 1. Now define H(λ ) = G(s)|s= λ −1 = λ +1



∑ hk λ −k .

k=0

Since G(s) is unknown but exponentially stable and the bilinear transformation preserves the stability, the unknown H(λ ) satisfies |hk | ≤ M1 ρ k for some constants M1 > 0 and 0 < ρ < 1. Further, let s = jω and λ = e jΩ , we have jω =

e jΩ − 1 Ω Ω = jtan or ω = tan . j Ω e +1 2 2

Our idea of identification is to use the point estimate G¯ d ( jωk ) of (11.11) at ωk = ˆ λ) = tan kNπ or Ωk = 2kNπ , k = 0, 1, ..., N − 1 for some N > 0. Then, we construct H( N−1 ˆ −k ∑k=0 hk λ such that ˆ jΩk ) = H(e

N−1

¯ jωk ), k = 0, 1, ..., N − 1 . ∑ hˆ l e− jΩk l = G(

(11.15)

l=0

ˆ Finally, we define the estimate G(s) of G(s) as ˆ = H( ˆ λ )| 1+s . G(s) λ= 1−s

(11.16)

11

Frequency Domain Identification of Hammerstein Models

171

Theorem 11.4. Let ωk = tan kNπ , Tk = Lk 2ωπk , k = 0, 1, ..., N − 1 and T = mink Tk . ˆ Consider the estimate G(s) of (11.16). Suppose G¯ d (ωk ) → G(ωk ) in probability, and N → ∞ and N/T → 0 as T → ∞. Then, uniformly in ω , ˆ jω ) − G( jω ))| ≤ O(ρ N ), E|G( ˆ jω ) − G( jω )|2 ≤ O(ρ 2N ) + O( N ) → 0 |E(G( T ˆ jω ) converges to G( jω ) uniformly in probability. as T → ∞. In other words, G( The idea of the above estimate is the polynomial interpolation of (11.15). Under Assumptions 11.1 and 11.2, G¯ d ( jωk ) → G( jωk ) in probability and consequently ˆ jω ) → G( jω ). However, if G¯ d ( jωk ) → G( jωk ) for various reasons, e.g., T is G( finite, then Gˆ d ( jωk ) = G( jωk ) and, say only |G¯ d ( jωk ) − G( jωk )| ≤ ε can be guaranteed for some small but non-zero ε > 0. Then, the polynomial interpolation of (11.15) tends to show some overshooting for very large N. In fact, the overshooting is in the order of ε ln N as N → ∞. To avoid this problem, several methods can be used, e.g., interpolations using splines. We discuss the following two robustness modifications: • If only a finite frequency range is interested, we can apply Fejer interpolation which matches the given data but also limits the magnitude of the derivatives. ˆ jω ) − G( jω )| = O(ε ) as N → ∞ ˆ jω ) satisfies |G( Then, the obtained estimate G( in the frequency range of interest. This algorithm is linear and see [4] for details. • If the whole frequency range (−∞, ∞) is interested, it is well known that there does not exist any linear algorithm which ensures the robustness in the presence of small but non-vanishing errors in the point estimation of G¯ d ( jωk ). Here, linear algorithm means that the algorithm is linear from the given data to the estimate. In this case, a two stage nonlinear algorithm [8] can be applied. The first stage of the algorithm is to find a non-causal system and the second stage is to apply the Nehari approximation to find a best fit which is stable and causal. The error between the estimate and the true system is in the order of O(ε ) as N → ∞ [8]. Since the above two modifications are available in the literature, we only provide a brief discussion and interested readers can find more from [6, 8]. We also comment that the approach adopted here is based on the interpolation technique. Other techniques, e.g., the well known spectral analysis method can also be modified and used to determine the transfer function. The key is the reliable point estimation provided in the previous sections.

11.4 Identification of the Nonlinear Part f (u) ˆ Once the linear part G(s) is identified, we can estimate the nonlinear part x = fˆ(u). Two cases are discussed: (1) There is no a priori knowledge on the structure of the unknown f (u) and (2) f (u) is represented by a polynomial with a known order. In both cases, we need to estimate the ri ’s.

172

E.-W. Bai

11.4.1 Unknown Nonlinearity Structure Although the structure of the nonlinearity is assumed to be unknown, the nonlinearity is static and can be determined by the graph information using pairs (u(iTs ), x(iTs )). The input u(iTs ) is available and therefore, recovery of x(t) and, consequently x(iTs ), becomes a key in determining the nonlinearity. The input is in the form of u(t) = Acos(ωk t) and the output of the nonlinearity ¯ is x(t) = ∑∞ ˆ = ∑ii=0 rˆi cos(iωk t) for i=0 ri cos(iωk t). Define the estimate of x(t) as x(t) some integer i¯ > 0. Here rˆi denotes the estimate of ri . Clearly the estimation error is given by i¯

|x(t) − x(t)| ˆ = | ∑ (ri − rˆi )cos(iωk t) + i=0



≤ ∑ |ri − rˆi | + | i=0









ri cos(iωk t)|

i=i¯+1

ri cos(iωk t)| .

(11.17)

i=i¯+1

By the continuity and piecewise smoothness condition on f (·), the second term converges to zero uniformly [15] as i¯ → ∞. We need now to find the estimates rˆi ’s so that the first term also converges to zero. Recall when u(t) = Acos(ωk t), UDFT (ωk ) = A/2 and  ri ¯ 2 G( j ωk i) + V f ,DFT (iωk ) i = 1, 2, ..., i , Y f ,DFT (iωk ) = r0 G( j0) + V f ,DFT (0) i=0. Define the estimate rˆi ’s by rˆ0 =

rˆi =

G( j0) V f ,DFT (0) A Y f ,DFT (0) = r0 + ˆ ˆ j0) ˆ j0) 2G( j0) UDFT (ωk ) G( G(

(11.18)

Y f ,DFT (iωk ) G( jiωk ) 2V f ,DFT (iωk ) A = ri + , i = 1, 2, ..., i¯ . (11.19) ˆ ˆ jiωk ) ˆ jiωk ) U ( ω ) G( jiωk ) DFT k G( G(

ˆ jω ) − G( jω ), this implies ˜ jω ) = G( With G( |ˆr0 − r0 | = | −

˜ j0) V f ,DFT (0) G( ˜ j0)|| r0 | + | V f ,DFT (0) | r + | ≤ |G( ˆ j0) 0 ˆ j0) ˆ j0) ˆ j0) G( G( G( G(

and for i = 1, 2, ..., i¯, |ˆri − ri | = | −

˜ jiωk ) 2V f ,DFT (iωk ) 2V f ,DFT (iωk ) G( ˜ jiωk )|| ri ri + | ≤ |G( |+| |. ˆ ˆ ˆ ˆ jiωk ) G( jiωk ) G( jiωk ) G( jiω0 ) G(

˜ jiω0 ) → 0 in probability as T → ∞, we have Clearly, if G( jω ) = 0, ∀ω and G( rˆi → ri , i = 0, 1, ..., i¯ and this results in the following theorem.

11

Frequency Domain Identification of Hammerstein Models

173

ˆ jωk i) → Theorem 11.5. Assume that G( jω ) = 0 ∀ω , and uniformly in ωk i G( G( jωk i) in probability. Let rˆi ’s be given by (11.18) and (11.19). Suppose i¯ → ∞, ¯2 N → ∞, i¯2 ρ 2N → 0, i TN → 0 as T → ∞, where T is defined in Theorem 11.4. Then, in probability |x(t) − x(t)| ˆ →0 ˆ s )| → 0. uniformly as T → ∞ and consequently, |x(iTs ) − x(iT We comment that for each input frequency ωk , we could derive a set of estimates rˆi (ωk ), i = 0, 1, ..., i¯, where ωk emphasises that the estimate is derived when input frequency is at ωk . In applications, an average is recommended rˆi =

rˆi (ω1 ) + rˆi (ω2 ) + ... + rˆi(ωN ) , i = 0, 1, ..., i¯ . N

In this section, the structure of the input nonlinearity is assumed to be unknown and thus estimation relies on the graph given by the pairs (u, x). ˆ Once the graph is obtained, its structure can be determined. The next step is to parametrise this nonlinearity by using appropriate base functions, e.g., x = f (u) = ∑ fl (u, αl ) for some known nonlinear functions fl ’s and unknown coefficients αl ’s. The choice of fl ’s of course depends on the structure shown in the graph. Then, the optimal αˆ can be calculated αˆ = arg min ∑(∑ fl (u(iTs ), αl ) − x(iT ˆ s ))2 . (11.20) α

i

l

In the case that the nonlinearity f (·) is even or odd, then from Lemma 11.1, the number of the Fourier coefficients ri ’s which have to be estimated can be cut into half.

11.4.2 Polynomial Nonlinearities In this section, we discuss a simple case when the unknown nonlinearity is parametrised by a polynomial x = ∑li=0 βi ui . The exact order of the polynomial is not necessarily known. However, an upper bound l is assumed to be available. Note that the identification of such a nonlinearity can be carried out without using the structure as discussed in the previous section section. However, if such information is available, these information should be taken into consideration in identification. Denote l l l1 =  , l2 =   − rem(l + 1, 2) 2 2

(11.21)

where  2l  rounds l/2 to the nearest integer towards zero and rem(l + 1, 2) is the remainder after division (l + 1)/2. When u(t) = Acos(ωk t), it follows that l

x(t) = ∑ βi ui (t) = i=0

l



i=0,even

βi Ai cosi (ωk t) +

l



i=0,odd

βi Ai cosi (ωk t)

(11.22)

174

E.-W. Bai

= β 0 A0 +

l1

m−1

1

∑ β2m A2m 22m−1 [ ∑ ci2m cos((2m − 2i)ωkt) +

m=1

i=0

l2

+

m

l

i=0

i=0

1

cm 2m ] 2

∑ β2m+1A2m+1 22m ∑ ci2m+1cos((2m − 2i + 1)ωkt) = ∑ ri cos(iωkt)

m=0

where l1

r0 = β0 +

A2m

∑ β2m 22m cm2m

(11.23)

m=1 l2

r2k+1 =

∑ β2m+1

m=k l1

r2k =

A2m+1 m−k c , k = 0, 1, ......, l2 22m 2m+1

A2m

∑ β2m 22m−1 cm−k 2m ,

(11.24)

k = 1, 2, ......, l1

(11.25)

m=k

with

m(m − 1)...(m − k + 1) . (11.26) 1 · 2... · ... · k From equations (11.23), (11.24) and (11.25), we see that βi ’s and rk ’s satisfy the following equations. ckm =



1

⎜ ⎜0 ⎜ ⎜0 ⎜ ⎜. ⎜. ⎝. 0 ) ⎛A ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ )

20

c01

0 0 .. . 0

A2 1 c 22 2 A2 0 c 21 2

0 .. . 0

A4 2 c 24 4 A4 1 c 23 4 A4 0 c 23 4

.. . 0

... ... ... .. . *+

⎞ A2l1 l1 c ⎛ 22l1 2l1 A2l1 l1 −1 ⎟ ⎟ c 22l1 −1 2l1 ⎟ ⎜ A2l1 l1 −2 ⎟ ⎜ c 22l1 −1 2l1 ⎟ ⎜ .. .

A2l1 0 c 22l1 −1 2l1

...

Σ0

A3 1 c 22 3 A3 0 c 22 3

0 .. . 0

A5 2 c 24 5 A5 1 c 24 5 A5 0 c 24 5

.. . 0

*+ Σ1

... ... ... .. . ...

β0 β2 . ⎟ ⎝ .. ⎟ ⎠ β2l

2



r0 ⎟ ⎜ r2 ⎟ ⎜ ⎟ = ⎜ .. ⎠ ⎝ .

⎞ ⎟ ⎟ ⎟, ⎠

(11.27)

r2l1

1

,

⎞ A2l2 +1 l2 c 2l 2l +1 ⎛ 2 2 2 A2l2 +1 l2 −1 ⎟ ⎟ c 22l2 2l2 +1 ⎟ ⎜ A2l2 +1 l2 −2 ⎟ ⎜ c ⎜ 22l2 2l2 +1 ⎟ ⎟⎝ .. . A2l2 +1 0 2l2 c2l2 +1



β1 β3 .. .





r1 r3 .. .



⎟ ⎜ ⎟ ⎟ ⎜ ⎟ ⎟=⎜ ⎟, ⎠ ⎝ ⎠ ⎟ ⎠ β2l +1 r2l2 +1 2

(11.28)

,

where l1 and l2 are defined in (11.21). The matrices Σ0 and Σ1 are independent of unknown βi ’s and rk ’s. Since there is one-to-one map between βi ’s and rk ’s, the estimates βˆi ’s and fˆ = l ∑i=0 βˆi ui can be easily obtained based on the estimates of rˆi ’s,

11

Frequency Domain Identification of Hammerstein Models

⎛ ˆ ⎞ ⎛ ⎞ β0 rˆ0 ⎜ βˆ2 ⎟ ⎜ rˆ2 ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ . ⎟ = Σ0−1 ⎜ .. ⎟ , ⎝ .. ⎠ ⎝ . ⎠ ˆ rˆ2l1 β2l1

⎛ ˆ ⎞ ⎛ ⎞ β1 rˆ1 ⎜ βˆ3 ⎟ ⎜ rˆ3 ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ . ⎟ = Σ1−1 ⎜ .. ⎟ , ⎝ .. ⎠ ⎝ . ⎠ rˆ2l2 +1 βˆ2l2 +1

and

175

(11.29)

l

fˆ(u) = ∑ βˆi ui (t) .

(11.30)

i=0

Clearly, if rˆi → ri in probability as T → ∞, then βˆi → βi and supu∈[−A,A] | fˆ(u) − f (u)| → 0 in probability as T → ∞. We now summarise the above discussion into the following theorem. Theorem 11.6. Let rˆi ’s be given as in (11.18) and (11.19). Consider the estimates βˆi ’s and fˆ(u) = ∑li=0 βˆi ui derived from (11.29) and (11.30). Then, under the conditions of Theorem 11.5, in probability as T → ∞

βˆi → βi ,

sup | fˆ(u) − f (u)| → 0 . u∈[−A,A]

We comment that the inverses Σ0−1 and Σ1−1 are involved in calculating the estimates. In theory, the matrices Σ0 and Σ1 are always nonsingular. However, these matrices become ill-conditioned very soon, see Table 11.1 for condition numbers. For a low dimension polynomial, the method of (11.29) and (11.30) are fairly effective. For a higher dimensional polynomial, caution has to be exercised because of large condition numbers. For a really high order polynomial, a two step method preˆ s ) to determine sented in the previous section can be used, i.e., using u(iTs ) and x(iT the nonlinearity and then using the optimisation (11.20) to find the polynomial coefficients. Alternately, an orthogonal polynomial approach may be used to overcome this difficulty. Table 11.1: Condition numbers of Σ0 and Σ1 dimension 1 2

3

4

5

6

cond(Σ0 ) 1 2.6 14.4 82.4 447 2780 cond(Σ1 ) 1 6.3 37.6 220 1294 7570

11.5 Simulation In this section, we consider two numerical examples. Example 1: The unknown nonlinear and linear parts are given, respectively, by 3

f (u) = u + u2 = ∑ βi ui , G(s) = i=0

s+1 . s2 + 5s + 6

176

E.-W. Bai

The nonlinearity is known to be a polynomial with the maximum order 3 and the linear part is a second order transfer function. The noise v(t) is a random signal uniformly distributed in [−0.25, 0.25] and the input is u(t) = Acos(ωit), A = 1, i = 1, 2, 3 with ω1 = 0.5 ω2 = 1, ω3 = 5 and Ti = 100 2ωπi . For input frequency ωi , the sampling interval is set to be 100 · 2ωπi /10000 = 50πωi . No lowpass filter is used in simulation, i.e., y f (t) = y(t). Because the linear part is parametric, we use the estimate of (11.14). The identified linear and nonlinear coefficients are shown in the tables 11.2, 11.3, 11.4. Table 11.2: The true values and the estimates of ri ’s

true values estimates, ω1 = 0.5 estimates, ω2 = 1 estimates, ω3 = 5 average

r0

r1

r2

r3

0.5 0.5073 0.5020 0.4947 0.5013

1 1.0027 0.9975 1.0001 1.0001

0.5 0.4947 0.4977 0.5108 0.5011

0 0.0055 0.0097 0.0052 0.0068

Table 11.3: The true values and the estimates of βi ’s

β0 true values 0

β1

β2

β3

1

1

0

estimates 0.003 1.0052 1.0022 -0.0068 Table 11.4: The true values and the estimates of θ = (θ1 , θ2 , θ3 , θ4 )

θ1 true values 1

θ2

θ3

θ4

1

5

6

estimates 0.9861 1.0128 4.8555 6.0745

ˆ Thus, the estimates of fˆ(·) and G(s) are given by ˆ = fˆ(u) = 0.003 + 1.0052u + 1.0022u2 − 0.0068u3, G(s)

0.9861s + 1.0128 s2 + 4.8555s + 6.0745

which are very close to the true but unknown f (u) and G(s). The true (solid line) and the estimated (dash-dot) nonlinearities are shown in Figure 11.2, and the Bode plots of the true (solid line) and the estimated transfer functions are shown in Figure 11.3. They are basically indistinguishable.

11

Frequency Domain Identification of Hammerstein Models

177

Fig. 11.2: The true (solid) and the estimated (dash-dot) nonlinearities

Example 2: The linear part is the same as in Example 1. However, the nonlinear part is a saturation nonlinearity as shown in Figure 11.4 in solid line, ⎧ u ≥ 0.808 , ⎨ 0.808 u −0.808 < u < 0.808 , x= ⎩ −0.808 u ≤ −0.808 . The structure is unknown in simulation. The noise v(t) is a random signal uniformly distributed in [−0.1, 0.1] and the input is u(t) = 2 ∗ cos(ωit), i = 1, 2 with ω1 = 0.5, ω2 = 2 and Ti = 100 2ωπi . The estimate of the transfer function is given by ˆ = 0.9951s + 0.9879 G(s) s2 + 4.9907s + 5.9404 which is very close to the true but unknown G(s). For the nonlinearity, because its structure is unknown, we estimate the Fourier coefficients rˆi ’s first, which are shown in Table 11.5.

178

E.-W. Bai

Fig. 11.3: The true (solid) and the estimated (dash-dot) Bode plots Table 11.5: The true values and the estimates ri ’s (saturation) i

0

12

3

4

5

6

7

8

9

true ri rˆi , ω = 0.5 rˆi , ω = 2 average

0 .0032 -.0001 .0015

1 1 1 1

-.2625 -.2642 -.2673 -.2658

0 .0012 -.0007 .0003

.0889 .0859 .0827 .0843

0 .0004 -.0117 -.0057

-.0141 -.0151 -.0195 -.0173

0 0 -.0141 -.0070

-.0153 -.0092 -.0136 -.0114

0 .0014 .0002 .0008

The estimated nonlinearity (circle) is shown in Figure 11.4 by using the pairs 9

u(mTs ) = 2 ∗ cos(ωk mTs ), x(mT ˆ s ) = ∑ rˆi cos(iωk mTs ) i=0

Either input frequency ω1 or ω2 can be used. In our simulation, they really do not make any difference. Although the structure of the nonlinearity is not known a priori, the graph using the pair (u, x) ˆ gives a good estimate of the unknown nonlinearity. Further, if the form of f (u) is unknown, but it is known that f (u) is odd. Then, from Lemma 11.1, all the even coefficients r2i ’s are zero. In this case, we only have to identify the odd coefficients 9

x(t) ˆ =



i=1,odd

rˆi cos(iω t) .

11

Frequency Domain Identification of Hammerstein Models

179

Fig. 11.4: The true (solid) and the estimated (circle) nonlinearities

11.6 Concluding Remarks In this chapter, we have proposed a frequency domain identification approach for Hammerstein models. By exploring the fundamental frequency, the linear part and the nonlinear part can be identified. No information on the form of the nonlinearity is assumed. The method is simple. Note that in the absence of prior information on the structure of the nonlinearity, the estimation is based on the Fourier series and thus, the rate of the convergence of the Fourier series becomes important. For those with rapidly decreasing coefficients, the first a few terms suffice to give a quite accurate approximation. This leads naturally to the question of how to speed up the convergence. There are some interesting ideas along this direction in [15]. It will certainly be a very interesting topic to pursue this further in the context of identification for Hammerstein models as discussed.

References 1. Bai, E.W.: A frequency domain approach Hammerstein model identification. IEEE Trans. on Auto. Contr. 48, 530–542 (2003) 2. Bai, E.W.: Identification of linear systems with hard input nonlinearities of known structure. Automatica 38, 853–860 (2002)

180

E.-W. Bai

3. Bai, E.W.: A blind approach to the Hammerstein–Wiener model identification. Automatica 38, 967–979 (2002) 4. Bai, E.W., Raman, S.: A linear interpolatory algorithm for robust system identification with corrupted measurement data. IEEE Trans. on Auto. Contr. 38, 1236–1241 (1993) 5. Bauer, D., Ninness, B.: Asymptotic properties of Hammerstein model estimates. In: Proc. of IEEE CDC, Sydney, Australia, pp. 2855–2860 (2000) 6. Baumgartner, S., Rugh, W.: Complete identification of a class of nonlinear systems from steady state frequency response. IEEE Trans. on Circuits and Systems 22, 753–759 (1975) 7. Brigham, E.O.: The fast Fourier transform. Prentice-Hall, Englewood Cliffs (1974) 8. Chen, J., Gu, G.: Control Oriented System Identification: An H∞ Approach. John Wiley & Sons, New York (2000) 9. Crama, P., Schoukens, J.: First estimates of Wiener and Hammerstein systems using multisine excitation. In: Proc. of IEEE Instrumentation and Measurement Conf., Budapest, Hungary, pp. 1365–1369 (2001) 10. Gardiner, A.: Frequency domain identification of nonlinear systems. In: 3rd IFAC Symp. on Identification and System Parameter Estimation, Hague, Netherlands, pp. 831–834 (1973) 11. Krzyzak, A.: On nonparametric estimation of nonlinear dynamic systems by the Fourier series estimate. Signal Processing 52, 299–321 (1996) 12. Levi, E.C.: Complex curve fitting. IEEE Trans. on Auto. Contr. 4, 37–43 (1959) 13. Ljung, L.: System Identification: Theory for the users, 2nd edn. Prentice-Hall, Upper Saddle River (1999) 14. Pintelon, R., Schoukens, J.: System Identification: A Frequency Domain Approach. IEEE Press, Piscataway (2001) 15. Tolstov, G.: Fourier Series. Prentice-Hall, Englewood Cliffs (1962) 16. Vandersteen, G., Rolain, Y., Schoukens, J.: Nonparametric estimation of the frequencyresponse function of the linear blocks of a Wiener–Hammerstein models. Automatica 33, 1351–1355 (1997) 17. Zadeh, L.: On the identification problem. IRE Trans. on Circuit Theory 3, 277–281 (1956)

Chapter 12

Frequency Identification of Nonparametric Wiener Systems Fouad Giri, Youssef Rochdi, Jean-Baptiste Gning, and Fatima-Zahra Chaoui

12.1 Introduction A great deal of interest has recently been paid to Wiener system identification Figure 12.1. However, most proposed solutions have been developed in the case of parametric systems, see e.g. [13, 14, 15, 17, 19]. As the internal signal x(t) is not accessible for measurement, and may even be of no physical meaning, the system output then turns out to be a bilinear (but fully known) function of the unknown parameters (those of the nonlinearity, on one hand, and those of the linear subsystem, on the other hand). Such bilinearity feature has been carried out following different approaches. One of them is the iterative optimisation method that consists in computing alternatively the parameters of the linear subsystem and those of the nonlinear subsystem. When optimisation is performed with respect to one set of parameters, the other set is fixed. Such an iterative procedure is shown to be efficient provided that it converges, e.g. [17]. But, convergence cannot be guaranteed except under restrictive conditions, e.g. [20]. In [2] a separable nonlinear optimisation solution is suggested. It consists in expressing a part of the parameters (namely, those of the linear subsystem) in function of the others, using first order optimality conditions. The dimension of the parameter space is thus reduced making easier the optimisation problem. Frequency-type solutions have also been proposed, see e.g. [5]. The idea is to apply repeatedly a sine input with different amplitudes and frequencies. Then, exploiting the polynomial nature of the nonlinearity, the input-output equation can Fouad Giri GREYC Lab,University of Caen, France Youssef Rochdi University of Cadi Ayyad, Marrakech, Morocco Jean-Baptiste Gning Crouzet Automatismes, Valence, France Fatima-Zahra Chaoui ENSET, Rabat, Morocco F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 181–207. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com 

182

F. Giri et al.

be uniquely solved with respect to the unknown parameters. Further details on the separable least squares approach can be found in Chapter 16. Nonparametric nonlinearities have also been dealt with using different approaches. In [7, 8, 9, 10] the identification problem is dealt with using stochastic tools. But, the nonlinearity is supposed to be invertible and smooth. Another stochastic approach has been proposed in [12] where the linear subsystem coefficients are first estimated using a least-squares type algorithm and the obtained estimates are used to recover the nonlinearity N(x) at any fixed x. Consistency is established supposing that the linear subsystem is MA with known and nonzero leading coefficient and, in addition, the nonlinearity is continuous with growth not faster than a polynomial. In [1] a frequency method is proposed for noninvertible nonlinearities. It consists in applying repeatedly sine input signals and operating a discrete-Fourier transformation of the obtained (steady-state) periodic output signals to estimate the frequency gain (at different frequencies) and the nonlinearity. In [4], a recursive identification scheme is proposed for Wiener systems including a dead-zone preload nonlinearity. There too, consistency is only achieved in presence of MA subsystems with known and nonzero leading coefficient. More recently, a semi-parametric identification approach has been presented in [16]. The impulse response function of the linear subsystem is identified via the nonlinear least squares approach with the system nonlinearity estimated by a pilot nonparametric kernel regression estimate. The linear subsystem estimate is then used to form a nonparametric kernel estimate of the nonlinearity. The price paid is that the impulse response function is a finite order which actually amounts to suppose the linear subsystem to be MA. The maximum likelihood method recently proposed in [11] constitutes an interesting alternative. Then, consistency of parameter estimates is guaranteed provided all inputs (control and noises) are Gaussian processes. In the present monograph, the above stochastic approaches are dealt with in Chapters 6 to 10. In the present chapter, a frequency-domain identification scheme is presented for Wiener systems involving possibly noninvertible and nonsmooth nonlinearities. The identification purpose is to estimate the system nonlinearity f (.) and the system phase and gain (∠G ( jωk ) , |G ( jωk ) |), for a set of a priori chosen frequencies ωk (k = 1 . . . m). As a matter of fact, the complexity of the identification problem partly depends on the assumptions made on the system. Presently, no assumption is made on the linear part G(s), except stability, while the nonlinear element is allowed to be noninvertible and nonsmooth. The problem complexity also lies in the fact that the (unavailable) internal signal (x (t)) cannot be uniquely reconstituted from the input-output signals. Consequently, the system cannot be uniquely modelled i.e. any couple of the form (G(s)/K, f (Kx)) (K = 0) is a possible model. The present frequency-domain identification method relies on analytic geometry tools first introduced in [6]. It particularly involves Lissajous like curves and their geometrical characteristics e.g. area and spread. These tools and other portions of [6] are presently reused with permission from the IEEE Intellectual Property Rights Office.

12

Frequency Identification of Nonparametric Wiener Systems

183

Fig. 12.1: Wiener model structure

12.2 Identification Problem Statement We are considering nonlinear systems that can be described by the Wiener model of Fig12.1 where G(s) denotes the transfer function of the linear subsystem and f (.) is a memoryless nonlinear function. Analytically, the Wiener model is described by the equations x(t) = g(t) ∗ u(t) , y(t) = w(t) + v(t)

with w(t) = f (x(t)) ,

(12.1) (12.2)

where the symbol ∗ refers to the convolution operation, g(t) = L−1 (G(s)) is the inverse Laplace transform of G(s) (g(t) is also the impulse response of the linear subsystem). Only the input u(t) and the output y(t) are accessible to measurement, the internal signals x(t) and w(t) are not. The equation error v(t) is a random signal that accounts for output noise. The linear subsystem is supposed to be BIBO stable (which is normal when system identification is carried out in open loop). Also, {v(t)} is assumed to be a stationary ergodic sequence of zero-mean independent random variables. Ergodicity makes possible the substitution of arithmetic averages to probabilistic means, simplifying forthcoming developments. As this is usually the case, the frequency-domain identification method we are developing necessitates the application of sine signals, u(t) = U cos(ω t), for a set of a priori chosen amplitudes and frequencies (U, ω ) ∈ {(Uk , ωk ); k = 1, . . . , m}. Thanks to linear subsystem stability, the steady-state internal signal turns out to be de f

of the form x(t) = U|G( jω )| cos(ω t − ϕ (ω )) with ϕ (ω ) = −∠G( jω ). With these notations, it is supposed that f (.) is defined on all intervals −Uk |G( jωk )| ≤ x ≤ Uk |G( jωk )| and falls in the following class of functions: Assumption 12.1 a. If f (.) is even then, f (0) should be known and f −1 ( f (0)) = {0} b. If f (.) is not even then, there should exist , −1 ≤ σ0 ≤ σ1 ≤ 1 such that f (.) is locally invertible on the subsets σ0Uk |G( jωk )| σ1Uk |G( jωk )| (k ∈ {1, . . . , m}) , i.e. one has, for all x ∈ σ0Uk |G( jωk )| σ1Uk |G( jωk )| and all z ∈ [−Uk |G( jωk )|, Uk |G( jωk )|]; f (x) = f (z) ⇒ x = z.

184

F. Giri et al.

Without loss of generality we suppose that |G( jωk )| = 0, for all k. Indeed, if |G( jωk )| were zero (for some k) then x(t) would be null and the output y(t) would in turn be null (up to noise). This case can easily be recognised in practice and discarded, observing the output. Except for the above assumption, the system is arbitrary. In particular, G(s) and f (.) are not necessarily parametric and the latter is allowed to be noninvertible and nonsmooth. On the other hand, note that Part b of Assumption 12.1 is not too restrictive because the sizes of the subintervals (σ0Uk |G( jωk )| σ1Uk |G( jωk )|) are unknown and may be arbitrarily small (i.e. σ1 may be too close to σ0 ); this feature guarantees a wide application field of the proposed identification approach. The identification scheme must estimates of the sys# " be able to provide accurate

jωk ), f (x) (k = 1, . . . , m).

jωk )|, ∠G( tem frequency characteristics |G( As previously mentioned, any couple of the form (G(s)/K, f (Kx)) (K = 0) is a possible model for the considered system. This naturally leads to the question: what particular model should the identification method focuses on? This question will be answered later in Subsection 12.4.2 . At this point, let us just emphasise that, as long as the system phase is concerned, all models are similar to either (G, f ) or (G− , f − ), de f

de f

depending on the sign of K, where G− (s) = −G(s), f − (x) = f (−x). Therefore, we begin the identification process designing a phase estimator. The main design ingredients are developed in the next section.

12.3 Frequency Behaviour Geometric Interpretations All along this section, the Wiener system is excited by a sine input signal u(t) = U cos(ω t) where (U, ω ) is any fixed couple belonging to the set {(Uk , ωk ) ; k = 1, . . . , m} Using the models (G, f ) and (G− , f − ) the resulting steady-state internal signals are respectively defined by the equations x(t) = U|G( jω )| cos (ω t − ϕ (ω )) and w(t) = f (x(t)) and (12.3) x− (t) = U|G( jω )| cos (ω t − ϕ (ω ) − π ) and w(t) = f − (x− (t)) .(12.4) Let us also define the corresponding known normalised signals: xn (t) = cos (ω t − ϕ (ω )) , x− n (t) = cos (ω t − ϕ (ω ) − π ) .

(12.5)

12.3.1 Characterisation of the Loci (xn (t), w(t)) and (x− n (t), w(t)) The aim of this subsection is to establish key properties that characterise the parametrised curves (xn (t), w(t)) and (x− n (t), w(t)). First, notice that the signals xn (t) and x− n (t) are particular elements of the more general class of signals

χψ (t) = cos(ω t − ψ ), ψ ∈ R .

(12.6)

12

Frequency Identification of Nonparametric Wiener Systems

185

Indeed, it is readily seen that: xn (t) = cos (ω t − ϕ (ω )) = χϕ (ω ) (t), x− n (t) = − cos (ω t − ϕ (ω )) = χϕ (ω )+π (t) .

(12.7)

Now, let Cψ (U, ω ) be the parametrised locus constituted of all points of coordinates χψ (t), w(t) (t ≥ 0), i.e.   Cψ (U, ω ) = { χψ (t), w(t) ; t ≥ 0} . (12.8) As χψ (t) is periodic (with period T = 2π /ω ) and, in steady state, w(t) is in turn periodic with period nT (for some n ∈ N∗ ) the curve Cψ (U, ω ) turns out to be an oriented closed-locus. In general, Cψ (U, ω ) is constituted of one or several loops, (see e.g. Figure 12.2). In the particular case where w(t) is a sine signal, Cψ (U, ω ) is a Lissajous curve 1 and may present different shapes depending on the value of ψ , e.g. an ellipse, a circle or a line [18]. As, presently, w(t) is not necessarily a sine signal, the curve Cψ (U, ω ) is referred to Lissajous like. Furthermore, Cψ (U, ω ) and Cψ +π (U, ω ) are symmetric with respect to the waxis, a property that will prove to be useful in the forthcoming sections. For now, the only characteristic of Cψ (U, ω ) that is of interest is its geometric area, denoted A(ψ ). The geometric area of a single loop curve ignores the curve orientation and so is positive. This is to be distinguished from the algebraic area that may be positive or negative, depending on the curve orientation sense. In the case of a multi-loops locus, the global geometric area equals the sum of the geometric areas of the different single loops. Figure 12.2 shows an oriented closed-locus composed of two loops. Now, we are ready to introduce the next important definition. Definition 12.1. The closed-locus Cψ (U, ω ) is said to be static if its geometrical area is null (A(ψ ) = 0). Then Cψ (U, ω ) looks like a standard curve (non-closed locus). Inversely, Cψ (U, ω ) is said non-static (or memory) when (A(ψ ) = 0). Proposition 12.1. Consider the Wiener system described by equations (1)-(2), subject to Assumption 12.1, excited by sine inputs u(t) = U cos(ω t), with (U, ω ) ∈ {(Uk , ωk ) ; k = 1, . . . , m}. Then, the following facts hold: 1. For any h and any ω , the Curve Cϕ (ω )+2hπ (U, ω ) is symmetric to Cϕ (ω )+(2h+1)π (U, ω ) with respect to the ordinate axis (w-axis). 2. The curves Cϕ (ω )+hπ (U, ω )(h = 0, ±1, ±2, . . .; ω = ω1 , . . . , ωm ) are all static. Proof 1. From (12.2) (12.3) (12.6) and (12.7) it follows that:     w(t) = f U|G( jω )|χϕ (ω ) (t) = f − U|G( jω )|χϕ (ω )+π (t) . 1

(12.9)

Lissajous curves are the family of parametric curves (x(t), y(t)) with x(t) = A cos(ωx t + δx ) and y(t) = B cos(ωyt + δy ). They were studied in 1857 by the French physicist JulesAntoine Lissajous [18].

186

F. Giri et al.

Fig. 12.2: An example of closed-locus with null algebraic area while its geometric area is the sum of areas of the four loops

On the other hand, one has:  χϕ (ω ) (t) if h is even, χϕ (ω )+hπ (t) = χϕ (ω )+π (t) = −χϕ (ω ) (t) if h is odd.

(12.10)

Then, it follows from (12.8) that Cϕ (ω )+2hπ (U, ω ) and Cϕ (ω )+(2h+1)π (U, ω ) coincide respectively with Cϕ (ω ) (U, ω ) and Cϕ (ω )+π (U, ω ). Recall that Cϕ (ω )+hπ (U, ω ) is the set of all points (cos(ω t − ϕ (ω )), w(t)) and Cϕ (ω )+(2h+1)π (U, ω ) is the set of (− cos(ω t − ϕ (ω )), w(t)). Therefore Cϕ (ω )+2hπ (U, ω ) and Cϕ (ω )+(2h+1)π (U, ω ) are symmetric with respect to ordinate axis (0, w). 2. Since f (.) and f − (.) are functions (in the standard sense), it follows from (12.8) that Cϕ (ω ) (U, ω ) and Cϕ (ω )+π (U, ω ) are static curves.   Proposition 12.2. Consider the problem statement of Proposition 12.1. if Cψ (U, ω ) is static for some ψ , then one has: 1. f (U|G( jω )| cos(θ )) = f (U|G( jω )| cos(θ − 2(ψ − ϕ ))), for all θ . 2. f − (U|G( jω )| cos(θ )) = f − (U|G( jω )| cos(θ − 2(ψ − ϕ ))), for all θ . 3. If the function f (.) is even, then: ψ − ϕ (ω ) = hπ or π2 + hπ , for some h = 0, ±1, ±2 . . . Proof. From (12.3)-(12.4) one has, for all t: w(t) = f (U|G( jω )| cos(ω t − ϕ (ω ))) = f − (−U|G( jω )| cos(ω t − ϕ (ω ))) .

12

Frequency Identification of Nonparametric Wiener Systems

187

On the other hand, if Cψ is static then there exists a function g(.) such that: w(t) = g(cos(ω t − ψ )) .

(12.11)

f (U|G( jω )| cos(ω t − ϕ (ω ))) = g(cos(ω t − ψ )) ,

(12.12)

f − (−U|G( jω )| cos(ω t − ϕ (ω ))) = g(cos(ω t − ψ )) .

(12.13)

Then, one has for all t:

On the other hand, it can be easily checked that: # " " # " " ψ# ψ# − ψ = cos ω T − t + − ψ (for all t) , cos ω t + ω ω

(12.14)

where T = 2π /ω . Then, ones gets that: " " " ψ# ## " " " ## ψ# g cos ω t + − ψ = g cos ω T − t + − ψ (for all t), ω ω which, together with equations (12.12) and (12.13), yields (for all t): f (U|G( jω )| =

f − (−U|G( jω )| =

" " ψ# ## − ϕ (ω ) cos ω t + , ω " " " ## ψ# − ϕ (ω ) f U|G( jω )| cos ω T − t + . ω " " ψ# ## − ϕ (ω ) cos ω t + , ω " " ## " ψ# f − −U|G( jω )| cos ω T − t + . − ϕ (ω ) ω

These immediately imply, respectively: f (U|G( jω )| = = f − (−U|G( jω )| = =

cos (ω t + ψ − ϕ (ω ))) f (U|G( jω )| cos (ω T − ω t + ψ − ϕ (ω ))) f (U|G( jω )| cos (ω t − ψ + ϕ (ω ))) , cos (ω t + ψ − ϕ (ω ))) f − (−U|G( jω )| cos (ω T − ω t + ψ − ϕ (ω ))) f − (−U|G( jω )| cos (ω t − ψ + ϕ (ω ))) .

Parts 1 and 2 of the current proposition follows from these expressions letting θ = ω t + ψ − ϕ (ω ). To prove Parts 3 and 4, let us introduce the following notations:

188

F. Giri et al.

where:

2(ψ − ϕ (ω )) = δ (ω ) + 2hπ

(12.15)

0 ≤ δ (ω ) < 2π and h = 0, ±1, ±2, . . .

(12.16)

Then, using Parts 1 and 2, it follows that, for all θ ∈ [0, 2π ) : f (U|G( jω )| cos(θ )) = f (U|G( jω )| cos(θ − δ ) .

(12.17)

Case 1: f (.) is not even. It follows from (12.10)-(12.11), using Assumption 12.1 (Part b), that the function f is invertible in the subinterval (σ0U|G( jω )| σ1U|G( jω )|) . Then, from (12.17) it follows that for any θ ∈ (0, 2π ) such that cos(θ ) ∈ (σ0 σ1 ), one has: (12.18) cos(θ ) = cos(θ − δ (ω )) . Now, it can be easily checked that if δ (ω ) = 0, then for all θ ∈ (0, 2π ) − δ (ω ) δ (ω ) { 2 ,π + 2 } : (12.19) cos(θ ) = cos(θ − δ (ω )) . But this clearly contradicts (12.18); proving thus Part 3 of the current proposition. Case 2: f (.) is even. Letting θ = π /2 in (12.17), one gets " "π ## − δ (ω ) f (0) = f U|G( jω )| cos . 2 "  Then it follows from Assumption 12.1 (part a) that cos π2 −δ (ω ) = 0 i.e.

δ (ω ) = −2iπ or δ (ω ) = π − 2iπ for some integer i = 0, ±1, ±2, . . .) (12.20) This, together with (12.15) implies that, either ψ − ϕ = (h − i)π or ψ − ϕ = π + (h − i)π which proves Part 4 and completes the proof of Proposition 12.2. 2   Proposition 12.3. Consider the problem statement of Proposition 12.1, If Cψ (U, ω ) is static then there exists a unique function g(.), such that: g(cos(ω t − ψ )) = w(t), ∀t and g(0) = f (0). More specifically, for all z ∈ [−1 + 1]:  g(z) =

f (U|G( jω )|z) f − (U|G( jω )|z)

i f ψ − ϕ = 2hπ (for some integer h), i f ψ − ϕ = π + 2hπ .

Proof. The fact that Cψ (U, ω ) is static guarantees the existence of a g(.) such that: g(cos(ω t − ψ )) = w(t), ∀t .

(12.21)

12

Frequency Identification of Nonparametric Wiener Systems

189

The uniqueness of g(0) is proved separating the two cases referred to in Assumption 12.1. Case 1: the function f (0) is not even. Using Proposition 12.2 (Part 3) it follows that ψ − ϕ (ω ) = hπ , for some integer h. Then (12.21), implies: " # g (−1)h cos(ω t − ϕ (ω )) = w(t), ∀t (12.22) Comparing (12.22) with (12.3) yields: " # g (−1)h cos(ω t − ϕ (ω )) = f (U|G( jω )| cos(ω t − ϕ (ω ))) which implies that, for all z ∈ [−1 + 1]: g(z) = g(z) =

f (U|G( jω )|z) f − (U|G( jω )|z)

if h is even , if h is odd .

(12.23) (12.24)

Hence, Proposition 12.3 holds in Case 1. Case 2: the function f (0) is even. Using Proposition 12.2 (Part 4) it follows that

ψ − ϕ (ω ) = kπ or ψ − ϕ (ω ) =

π + kπ (for some h = 0, ±1, ±2, . . .) . (12.25) 2

Let us show, by contradiction, that the second solution in (12.25) can not hold. To this end, assume that, for some integer k:

ψ − ϕ (ω ) =

π + kπ . 2

It follows from (12.21) that, for all t, " # g (−1)k sin(ω t − ϕ (ω )) = w(t) .

(12.26)

(12.27)

Comparing (12.27) with (12.3) yields: " # g (−1)k sin(ω t − ϕ (ω )) = f (U|G( jω )| cos(ω t − ϕ (ω ))) . This can be given a more compact form, letting θ = ω t − ϕ (ω ): # " g (−1)k sin(θ ) = f (U|G( jω )| cos(θ )) , ∀θ .

(12.28)

Substituting θ + π to θ in (12.28) implies that, for all θ : " # g −(−1)k sin(θ ) = f (−U|G( jω )| cos(θ )) .

(12.29)

As f is even, it follows, comparing (12.28)-(12.29), that g is in turn even. Then, one gets from (12.28) that, for all θ :

190

F. Giri et al.

f (U|G( jω )| cos(θ )) = g

5  1 − cos2 (θ ) .

(12.30)

1 Using the variable change z = 1 − cos2 (θ ), it follows from (12.30) that, ∀z ∈ [−1 + 1]: # " 1 (12.31) g(z) = f U|G( jω )| 1 − z2 . Let us check that such a solution g(.) is not admissible in the sense of Assumption 12.1 (Part a). Indeed, one readily gets from (12.31) that: g(0) = f (U|G( jω )|) .

(12.32)

On the other hand, it follows from Assumption 12.1 (Part a) that: f (x) = f (0) =⇒ x = 0 .

(12.33)

Since |G( jω )| = 0 ( for all ω ∈ {ω1 , . . . , ωm }) it follows from (12.32)-(12.33) that g(0) = f (0). This clearly shows that g(.) does not satisfy Assumption 12.1 (Part a) and, so, is not admissible. Hence, the solution (12.26) should be discarded. Then, in view of (12.25), one necessarily has ψ − ϕ = kπ . The rest of the proof is similar to Case 1. The proof of Proposition 12.3 is completed.  

12.3.2 Estimation of the Loci Cψ (U, ω ) Proposition 12.3 is quite interesting as it shows that ϕ (ω ) = −∠G( jω ) can be recovered (modulo π ) by just tuning the parameter ψ until the closed-locus Cψ (U, ω ) becomes static. Furthermore, this proposition also says that the obtained static curve is precisely the graphical plot of either f (U|G( jω )|x) or f (−U|G( jω )|x). However, these results cannot be directly used because the plotting of loci Cψ (U, ω ) necessitates the availability of the signal w(t) which is not accessible for measurement. This is presently coped with making full use of the information at hand, namely the periodicity (with period 2π /ω ) of both χψ (t) and w(t) and the ergodicity of the noise v(t). Bearing these in mind, the relation y(t) = w(t) + v(t) suggests the following estimator: de f

w(t,

N) =

1 N ∑ y(t + kT ); t ∈ [0, T ) , N k=1

(12.34)

where T = 2π /ω and N is a sufficiently large integer. Specifically, for a fixed time instant t, the quantity w(t,

N) is the mean value of the (measured) sequence {y(t + kT ); k = 0, 1, 2, . . .}. Then, an estimate C ψ ,N (U, ω ) of Cψ (U, ω ) is simply obtained

substituting w(t,

N) to w(t) when constructing Cψ (U, ω ). Accordingly,  Cψ ,N (U, ω)

N) turns out to be the parametrised locus including all points χψ (t), w(t, (t ≥ 0), i.e.

12

Frequency Identification of Nonparametric Wiener Systems

  C ψ ,N (U, ω ) = { χψ (t), w(t)

; t ≥ 0} .

191

(12.35)

The above remarks lead to the following proposition: Proposition 12.4. Consider the problem statement of Proposition 12.1. Then, one has: 1. w(t,

N) converges in probability to w(t) (as N → ∞). 2. C ψ ,N (U, ω ) converges in probability to Cψ (U, ω ) (as N → ∞). i.e. one has for all t ≥ 0:    

N) = χψ (t), w(t) (w.p.1) . (12.36) lim χψ (t), w(t, N→∞

3. Consequently, if ψ = ϕ (ω ) + hπ then lim C ψ ,N (U, ω ) is static (w.p.1). N→∞

4. Inversely, if lim C ψ ,N (U, ω ) is static for some ψ , then one of the following stateN→∞

ments hold w.p.1: a. ψ = ϕ (ω ) + 2hπ (for some h = 0, ±1, ±2, ±3 . . .) and the mapping cos(ω t − ψ ) → w(t,

N) coincides with the curve of the function f (U|G( jω )|x) with x ∈ [−1 + 1]. b. ψ = ϕ (ω ) + π + 2hπ for some h = 0, ±1, ±2, ±3 . . .) and the mapping cos(ω t − ψ ) → w(t,

N) coincides with the curve of the function f − (U|G( jω )|x) with x ∈ [−1 + 1]. Proof. From (12.2) one gets that, for all t ∈ [0 T ): y(t) = w(t) + v(t) .

(12.37)

Then, using the fact that w(t) is periodic of period T , it follows from (12.37) that, for all t ∈ [0 T ) and all integers k : y(t + kT ) = w(t) + v(t + kT ), which in turn implies that: w(t,

N) =

1 N 1 N y(t + kT ) = w(t) + ∑ v(t + kT ) . ∑ N k=1 N k=1

(12.38)

Since {v(tk )} where tk = t + kT , is zero mean and ergodic, the last term on the right side vanishes w.p.1 as N → ∞ . This proves Part 1 of the proposition. To prove χψ (t) and Part 2, notice  that, using the fact that  both   w(t) are periodic (with period T ) one has xψ (t + kT ), y(t + kT ) = xψ (t), w(t) + (0, v(t + kT )). Averaging both sides (with respect to k) gives: ' & ' &   1 N 1 N (12.39) xψ (t), ∑ y(t + kT ) = xψ (t), w(t) + 0, ∑ v(t + kT ) . N k=1 N k=1

192

F. Giri et al.

Again, the last term on the right side vanishes w.p.1 as N → ∞ , for the same reasons as above. This proves Part 2 of the proposition. Part 3 is a consequence of Part 2 and Proposition 12.1. Part 4 follows from Part 2 and Proposition 12.3.   Remark 12.1. In the light of Proposition 12.4 (Part 4) it is seen that, for sufficiently large N, all curves belonging to the family {C ϕ (ωk ),N (U, ω ); k = 1, . . . , m} are static, concentric and any one of them is a, more or less, spread version of all the others. The same remark applies to the family {C ϕ (ωk )+pi,N (U, ω ); k = 1, . . . , m}. The notion of spread is illustrated by Figure 12.3 and analytically defined in Section 12.5.

12.4 Wiener System Identification Method 12.4.1 Phase Estimation (PE) Proposition 12.4 and Remark 12.1 suggest the following phase estimator: PE1 PE2

For a fixed (U, ω ) ∈ {(U1 , ω1 ), . . . , (Um , ωm )}, apply the input signal u(t) = U cos(ω t) to the nonlinear system of interest. Get a recording of the output y(t) over a sufficiently large interval of time 0 ≤ t ≤ NT . Generate the continuous signals w(t,

N − 1) and w(t,

N) using (12.34). Compute the L2 [0, T ] norms, i.e.:

and

"! #1/2 T w(N)

N)2 dt 2= 0 w(t, "! #1/2 w(N)

− w(N

− 1)2 = 0T (w(t,

N) − w(t,

N − 1))2 dt .

Choose a threshold 0 < ε (n + 1) suffices [4]. Note that the key of the blind approach is to cancel the unknown signals u from equation (19.6) to form equation (19.7). This is possible for any l ≥ (n + 1) because if h = T /(n¯ + 1) or T = h(n¯ + 1) for any n¯ ≥ n, we have ¯ l = 1, 2, ... u[(l − 1)T ] = u[(l − 1)T + h] = .... = u[(l − 1)T + nh],

Fig. 19.5: Output error versus order, Example 19.3

330

E.-W. Bai

Hence, equation (19.7) follows. Of course, details of the algorithm including equations are not exactly same when l > (n + 1) instead of l = (n + 1). However, the idea remains the same and all modifications are minor. In theory, as long as l ≥ (n + 1), blind identification is possible. In practice, however, there is a limit on how large l can be. For a very large l or equivalently a very small h, two consecutive samples will likely have similar values and this makes the blind identification numerically ill-conditioned. A good choice of h is T 1 ≤h≤ 2 fy (n + 1) where fy is the bandwidth of the output y. Clearly from the sampling theorem, h = 21fy implies that y(t) can be completely determined from y[kh] and further increasing l or equivalently reducing h will not provide any additional information and will only make the algorithm ill-conditioned. The over-sampling approach is to fix the input sampling interval at T and over-sample the output at h so that l = T /h = (n+1) ¯ ≥ (n+1). Another avenue to make l = (n¯ + 1) ≥ (n + 1) is the under-sampling approach. By letting the output sampling interval h = T and keeping the input constant between k(n¯ + 1)T and (k + 1)(n¯ + 1)T for each k, we have l = (n¯ + 1). This under-sampling approach would avoid the numerical instability problem at a price that (1) the utilisation of time is less efficient prolonging the identification process and (2) the system may be excited only at low frequency ranges. It is conceivable that an “optimal” way to achieve l ≥ (n + 1) in some cases could be the one that combines both the over-sampling and under-sampling approaches. 6. Relation with the step response identification. In a way, the blind technique presented in this chapter may be considered as repeatedly applying piece-wise constant inputs. Conceptually, a number of step responses could be used to give information first of the output nonlinearity and the linear part, and then of the input nonlinearity. However, the blind technique works fundamentally different from the traditional step response identification method [17]. The traditional step response method relies heavily on the steadystate value y(t), t → ∞, of the step response and would suffer from large noises at the end of transient. This is specially true in the setting of parametric identification and therefore, it is suggested to apply the step response identification method several times to average out the effect of noises [17]. Clearly, with only the output measurements y[kT ], blind identification is not possible if both the input and output sampling intervals are fixed at T . This is because two different sampled systems combined with properly chosen inputs could provide the identical output y[kT ]. With additional intermediate values y[kh] (h ≤ T /(n + 1)) between kT and (k + 1)T , however, the choice of the system becomes unique. This is the basic observation that tells us why and how the blind technique works. Now, with the output observations over each T , an equation (19.7) related to the unknown parameters is derived. Obviously from (19.7),

19

A Blind Approach to the Hammerstein-Wiener Model Identification

331

the blind technique does not rely heavily on any particular value of the output observation but depends on each one y[kh] equally. 7. Identifiability With the PE inputs and Assumptions 19.1, 19.2, and 19.3, the identifiability of the Hammerstein-Wiener model shown in Figure 19.1 can be easily established. Identifiability here means that the representation of the system is unique in the absence of noise and model uncertainties. This can be seen easily. With the PE regressors, the solutions of (19.7) and (19.13) are unique. Moreover, the true but unknown system parameters are solutions in the absence of noises and model uncertainties. This establishes the identifiability.

19.4 Concluding Remarks In this chapter, we proposed a blind identification approach to sampled Hammerstein-Wiener models with the structure of the input nonlinearity unknown. The idea is to recover the internal signals u[kT ] and x[kT ] solely based on the output measurements. This is essential because the input nonlinearity has an unknown structure. The purpose of the chapter is to present the main idea and to illustrate the effectiveness of the proposed approach. Some important topics were not discussed in the chapter, e.g., how exactly the noise would influence the estimates in blind identification? This is an interesting and difficult question, and it gets more difficult to separate the effects of the noise from the under-parametrisation in inverting the output nonlinearity. We expect that the findings will be quite different from the existing results on the (non-blind) system identification. Another important issue is the application of the proposed method to a real world problem.

References 1. Bai, E.W.: A blind approach to Hammerstein-Wiener model identification. Automatica 38, 967–979 (2002) 2. Bai, E.W.: Identification of systems with hard input nonlinearities. In: Moheimani, R. (ed.) Perspectives in Control. Springer, Heidelberg (2001) 3. Bai, E.W.: An optimal two-stage identification algorithm for Hammerstein-Wiener nonlinear systems. Automatica 34(3), 333–338 (1998) 4. Bai, E.W., Fu, M.: Blind system identification and channel equalization of IIR systems without statistical information. IEEE Trans. on Signal Processing 47(7), 1910–1921 (1999) 5. Bai, E.W., Fu, M.: Hammerstein model identification: a blind approach, Tech Report, Dept of Elec. and Comp., Univ. of Iowa (2001) 6. Bilings, S.A., Fakhouri, S.Y.: Identification of a class of nonlinear systems using correlation analysis. Proc. of IEE 125(7), 691–697 (1978) 7. Boutayeb, M., Rafaralahy, H., Darouach, M.: A robust and recursive identification method for Hammerstein model. In: IFAC World Congress 1996, San Francisco, pp. 447–452 (1996) 8. Chang, F., Luus, R.: A non-iterative method for identification using Hammerstein model. IEEE Trans. on Auto. Contr. 16, 464–468 (1971)

332

E.-W. Bai

9. Greblicki, W.: Nonparametric identification of Wiener system. IEEE Trans. on Information Theory 38, 1487–1493 (1992) 10. Hsia, T.: A multi-stage least squares method for identifying Hammerstein model nonlinear systems. In: Proc. of CDC, Clearwater Florida, pp. 934–938 (1976) 11. Haber, R., Unbehauen, H.: Structure identification of nonlinear dynamic systems-a survey of input/output approaches. Automatica 26, 651–677 (1990) 12. Kalafatis, A.D., Wang, L., Cluett, W.R.: Identification of Wiener-type nonlinear systems in a noisy environment. Int. J. Contr. 66, 923–941 (1997) 13. Ljung, L.: System Identification: Theory for the User. Prentice-Hall, Englewod Cliffs (1987) 14. Narendra, K.S., Gallman, P.G.: An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Trans. on Auto. Contr. 11, 546–550 (1966) 15. Pajunen, G.: Adaptive control of Wiener type nonlinear systems. Automatica 28, 781– 785 (1992) 16. Pawlak, M.: On the series expansion approach to the identification of Hammerstein systems. IEEE Trans. on Auto Contr. 36, 763–767 (1991) 17. Rake, H.: Step response and frequency response methods. Automatica 16, 519–526 (1980) 18. Soderstrom, T., Stoica, P.: System Identification. Prentice-Hall, Englewood Cliffs (1989) 19. Stoica, P.: On the convergence of an iterative algorithm used for Hammerstein system identification. IEEE Trans. on Auto. Contr. 26, 967–969 (1981) 20. Sun, L., Liu, W., Sano, A.: Identification of dynamical system with input nonlinearity. IEE Proc. Control Theory Appl. 146(1), 41–51 (1998) 21. Wigren, T.: Convergence analysis of recursive identification algorithms based on the nonlinear Wiener model. IEEE Trans. on Auto Contr. 39, 2191–2206 (1994)

Part VII

Decoupling Inputs and Bounded Error Methods

Chapter 20

Decoupling the Linear and Nonlinear Parts in Hammerstein Model Identification Er-Wei Bai

20.1 Problem Statement Consider a discrete time Hammerstein model [2, 8, 9, 10] with y(k), u(k) and v(k) being system output, input and noise respectively. The internal signal x(k) is not measurable. The linear system is assumed to be exponentially stable represented by a nonparametric transfer function ∞

G(z) = ∑ θi z−i

(20.1)

i=1

for some |θi | ≤ M λ i , 0 < M < ∞ and 0 ≤ λ < 1. The nonlinear block represents a static nonlinearity ∞

x = f (u, η ) = ∑ ηi pi (u) i=0

{pi (u)}∞ 0,

e.g., the power series or some orthonormal sysfor some base functions tems. These base functions can be unknown in identification. The purpose of identification is to find a pair of estimates Gˆ and fˆ based on the input-output date set {u(k), y(k)}N1 . In the non-parametric case, the transfer function estimate is represented by a FIR system n

ˆ θˆ n ) = ∑ θˆn,i z−i , θˆ n = (θˆn,1 , . . . , θˆn,n ) . G(z,

(20.2)

i=1

To estimate the coefficients θˆn,i , define the output prediction ˆ θˆ n ) f (u(k), ηˆ ) , y(k) ˆ = G(z, where ηˆ is the estimate of η . Now, consider the quadratic error criterion for each n Er-Wei Bai Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA 52242 USA e-mail: [email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 335–345. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com 

336

E.-W. Bai

1 N 2 JN (θˆ n , ηˆ ) = ∑ (y(k) − y(k)) ˆ N k=1

(20.3)

and the associated estimates (θˆ n , ηˆ ) = arg min JN (θˆ n , ηˆ ) . The quadratic error (20.3) is one of the most common cost functions in the identification literature and is certainly the case for Hammerstein model identifications. Many well known Hammerstein model identification methods [2, 3, 5, 8] belong to this class and the differences lie only in the detail techniques for solving (20.3). It is well known [6], provided that the noise is white, zero mean and independent of the input, that with probability one

θˆ n → θ∗ , ηˆ → η∗ , as N → ∞ for some θ∗ and η∗ satisfying (θ∗ , η∗ ) = arg min lim EJN (θˆ n , ηˆ ) N→∞

where E denotes the expectation operator over the probability space upon which the random variable is defined. Thus, the parameter and the transfer function estimates are convergent. This is similar to the linear case. The convergence rate is however, a different story. Recall that in a linear setting, i.e., in the absence of the nonlinearity ˆ jω , θˆ n ) is asymptotically given by [6, 7] f , the variance of the estimate G(e ˆ jω , θˆ n )} ≈ Var{G(e

n Φv (ω ) N Φu (ω )

(20.4)

where Φv (ω ) and Φu (ω ) are the spectral densities of the noise v(·) and the input u(·) respectively. Naturally, in the presence of the nonlinearity, one would expect ˆ jω , θˆ n )} ≈ n Φv (ω ) , where Φx (ω ) is the spectral density of the internal sigVar{G(e N Φx (ω ) nal x = f (u, η ) which is the input to the linear system. This is however not the case. As shown in [9], the asymptotic variance for the Hammerstein model identification is given by ˆ jω , θˆ n )} ≈ Var{G(e n Φv (ω ) 1 d d ˆ jω , θˆ n )|2 + ( log Φx (ω )) Pη ( log Φx (ω ))|G(e N Φx (ω ) 4N d η dη

(20.5)

where Pη /N is the variance matrix of the estimate ηˆ . The first term in (20.5) is as expected and the second term is from the coupling between the linear part and the nonlinear part. This coupling is one of the major difficulties in Hammerstein model identifications that not only makes identification of the linear part nonlinear,

20

Decoupling the Linear and Nonlinear Parts

337

but also adds an unavoidable degradation to the accuracy of the estimates. Therefore, it is very desirable, if possible, to reduce the effect of the coupling. This seems to be a non-trivial problem. Note that the asymptotic variance (20.5) is based on the quadratic criterion (20.3) and is independent of the technique which solves the minimisation. In other words, the asymptotic variance is independent of the identification algorithm. On the other hand, the asymptotic variance does rely on the spectral density function Φx (ω ) that in turn depends on the structure of the nonlinearity. If the structure of the unknown nonlinearity were available, one could adjust the input spectral density so that the second term is reduced. The problem is clearly that the structure of the nonlinearity is not assumed to be known and any method which tries to reduce the coupling effect should be independent of the structure of the unknown nonlinearity. In this chapter, we propose a two step identification algorithm. In the first step, the linear part is identified using the Pseudo-Random Binary Sequences (PRBS) input. With the help of the PRBS input, we show that identification of the linear part is decoupled from the nonlinear part and the resultant asymptotic variance is give by (20.4) exactly the same as if the unknown nonlinearity is absent. In other words, the decoupling is achieved and the effect of the coupling is eliminated. Moreover, this decoupling is independent of the nonlinearity which could be discontinuous and unknown.

20.2 Nonlinearity with the PRBS Inputs The PRBS input is a signal that shifts between two levels ±c for some c = 0. For simplicity, we assume in this chapter that c = 1. Because the PRBS input is easy to generate and has desired properties similar to a white noise, it is widely and successfully used in linear system identifications [6, 11]. It is also well known, however, that the PRBS signal is inappropriate for nonlinear system identifications in general [2, 9] because it assumes only two values that may not sufficiently excite the unknown nonlinearity. Despite of this well known fact, we show in this chapter that use of the PRBS input is actually very beneficial for identification of Hammerstein models. In particular, because of this very binary nature, any static nonlinearity can be exactly characterised by a linear (affine) function under the PRBS input. Therefore, the effect of nonlinearity is eliminated. To be precise, let x = f (u) be any static nonlinear function, e.g., as illustrated in Figure 20.1. Since, u(k) = ±1 for any k, x(k) = f (±1) for all k. Thus, x = f (u) can be completely characterised by a linear (affine) function under the PRBS input x = f (u) = η0 + η1 · u . The coefficients η0 and η1 can be determined by solving        η0 + η1 · 1 η0 1 1 f (1) = = η0 − η1 · 1 1 −1 η1 f (−1)

(20.1)

(20.2)

338

E.-W. Bai



   1 1 η0 = η1 2 1

    1 f (1) + f (−1) 1 f (1) = . −1 f (−1) 2 f (1) − f (−1)

Though η0 and η1 depend on the unknown f (·), they are well defined and unique. In this chapter, we assume f (1) = f (−1) that implies η1 = 0. Clearly, if f (1) = f (−1), x(k) becomes a constant and identification may not be possible. At this point, observe that i the gains of f (u) and G(z) are actually not unique. Any pair (α f (u), G(z)/α ), α = 0, would produce identical input and output measurements. Therefore, to be uniquely identifiable, one of the gains have to be fixed. There are several ways to normalise the gains, e.g., either the gain of f (u) or G(z) can be fixed to be unit. In this chapter, we assume Assumption 20.1. The coefficient η1 in (20.1) is normalised to be one, i.e., η1 = 1. With this normalisation, the Hammerstein model can be re-written as y(k) = G(z)x(k) + v(k) = G(z)(η0 + η1 u(k)) + v(k) ∞



= ∑ θi (η0 + u(k − i)) + v(k) = θ0 + ∑ θi u(k − i) + v(k)

with

i=1 ∞ θ0 = η0 ∑i=1 θi .

i=1

Fig. 20.1: The functions x = f (u) and x = η0 + η1 · u

(20.3)

20

Decoupling the Linear and Nonlinear Parts

339

Except the bias term θ0 which can be easily estimated and corrected, the equation (20.3) that relates the unknown G(z) to the input output data is linear (affine). Thus, identification of the linear part of the Hammerstein model is virtually linear under the PRBS input. It is important to point out that we obtain this linear (affine) equation with no knowledge of the nonlinearity.

20.3 Linear Part Identification Since the equation (20.3) is linear, any linear identification method can be applied. The key is that identification of the linear part is decoupled from the nonlinear part under the PRBS input.

20.3.1 Non-parametric Identification For each n, define two vectors Wn (ω ) = (e− jω , e−2 jω , . . . , e−n jω ) .

φn (k) = (u(k − 1), u(k − 2), . . ., u(k − n)) . We now re-write the output prediction y(k) ˆ and the quadratic error using equation (20.3): n

y(k) ˆ = θˆn,0 + ∑ θˆn,i u(k − i) , i=1

n 1 N 1 N 2 (θˆn,0 , θˆ n ) = arg min ∑ (y(k) − y(k)) ˆ = ∑ [y(k) − (θˆn,0 + ∑ θˆn,i u(k − i))]2 , N k=1 N k=1 i=1

and the associated transfer function estimate n

ˆ jω , θˆ n ) = ∑ θˆn,i e−i jω = Wn (ω )θˆ n . G(e i=1

Clearly, the parameter estimates θˆn,0 and θˆ n based on the quadratic error (20.3) are the well known least squares solutions ⎛ˆ ⎞ θn,0    ⎜θˆ ⎟   N  ˆ 1 N θn,0 1 1 ⎜ n,1 ⎟  −1 1 = { φ (k))} [ (1, = ⎟ ⎜ . ∑ φn (k) ∑ φn (k) y(k)] . n θˆ n ⎝ .. ⎠ N k=1 N k=1 θˆn,n ˆ we model the PRBS input as an i.i.d. To analyse the consistency of the estimate G, process with binomial density distribution  0.5 u(k) = 1 , prob{u(k)} = 0.5 u(k) = −1 .

340

E.-W. Bai

It is easily verified that 1 N 1 N 2 u(k) = 0, Eu2 (k) = lim ∑ ∑ u (k) = 1 , N→∞ N N→∞ N k=1 k=1

Eu(k) = lim

1 N ∑ u(k)u(k − τ ) = δ (τ ), Φu (ω ) = 1 , N→∞ N k=1

R(τ ) = Eu(k)u(k − τ ) = lim

1 N ∑ u(k)v(t) = 0 , N→∞ N k=1

Eu(k)v(t) = lim

provided that the noise is independent of the input. Intuitively, from the above equations and the assumption that |θi | ≤ M λ i , we expect as N → ∞,   1 N 1 ∑ φn (k) (1, φn (k)) = N k=1 ⎛

1 u(k − 1) ... 2 N ⎜u(k − 1) u (k − 1) . .. 1 ⎜ ⎜ .. .. ∑ . .. N k=1 ⎝ . . u(k − n) u(k − n)u(k − 1) . . .

⎞ ⎛ u(k − n) 1 0 ⎜0 1 u(k − 1)u(k − n)⎟ ⎟ ⎜ ⎟ → ⎜ .. .. .. ⎠ ⎝. . . 2 0 0 u (k − n)

... ... .. . ...

⎞ 0 0⎟ ⎟ .. ⎟ .⎠ 1

∞ 1 N 1 N φn (k)y(k) = ∑ φn (k)(θ0 + ∑ θi u(k − i) + v(k)) ∑ N k=1 N k=1 i=1

⎛ ⎞ θ1 ⎟ ⎜ N ∞ 1 ⎜θ2 ⎟ → ∑ φn (k) ∑ θn u(k − i) → ⎜ . ⎟ . N k=1 ⎝ .. ⎠ i=1 θn

ˆ jω , θˆ n ) → G(e jω ) as Roughly speaking, this implies θˆ n → (θ1 , . . . , θn ) and G(e n, N → ∞. In fact, by mimicking the proofs and technical assumptions on the order n and the noise v(·) as in the linear case [7], the above observation can be made precise. ˆ θˆ n ) derived Theorem 20.1. Consider the Hammerstein model, the estimate G(z, from the quadratic criterion under the PRBS input. Suppose that the noise v(·) is the output of some unknown exponentially stable linear system driven by an i.i.d. sequence with zero mean and independent of the input. Further, let the order n = n(N) satisfy n2 (N)/N → 0 and n(N) → ∞, as N → ∞,



∑[n(N 2 )/N]2 < ∞, 1



∑[n3 (N)/N]q < ∞ . 1

20

Decoupling the Linear and Nonlinear Parts

341

for some q > 0. Then, with probability one as N → ∞, ; N ˆ jω ˆ n Φv (ω ) j ω n j ω ˆ [G(e , θ ) − G(e jω )] ≈ normal (0, ) |G(e , θˆ ) − G(e )| → 0, n Φu (ω ) and this implies ˆ jω , θˆ n )} ≈ Var{G(e

n Φv (ω ) . N Φu (ω )

This is exactly the same result one would have in a linear identification setting. In other words, the second term in (20.5) which is the effect of coupling in Hammerstein model identifications is completely eliminated by using the PRBS input. To show the improved performance by decoupling, we consider a numerical example. The nonlinearity is a pre-load shown in Figure 20.1, x = f (u, η ) = η1 · u + η0 · sign(u) with η1 = η0 = 0.5. The linear part is a second order stable system G(z) =

z − 0.75 . z2 − 1.5z + 0.56

In the simulation, N = 3000 and the noise is assumed to be i.i.d. uniformly in [−1, 1]. Figure 20.2 shows the estimation errors (in *) 1 2π



ˆ jω , θˆ n )|2 d ω |G(e jω ) − G(e

as a function of n by using the proposed method. To compare the results with existing methods of non-PRBS inputs, we also show the estimation errors by the iterative method [8] √ (in√o). In the simulation of the iterative method, an uniform random input in [− 3, 3] is applied to keep the same input energy and η1 is normalised to 0.5. As expected, the proposed method outperforms the iterative method because of decoupling. Though only the comparison with the iterative method is provided, the comparison is representative. For example, because of iterative nature and that the pre-load can be written as a linear function of u(·) and sign(u(·)), the performance of the half-substitution approach of [12] is similar. More importantly, as discussed before, the asymptotical performance (20.5) is independent of the method as long as the quadratic criterion is considered.

20.3.2 Parametric Identification In the above discussion, G(z) is assumed to be non-parametric. The idea can be easily extended to parametric identifications. Suppose

342

E.-W. Bai

G(z) =

β1 zm−1 + β2zm−2 + ... + βm , zm + α1 zm−1 + ... + αm

(20.1)

for some m, βi ’s and αi ’s. Re-define

θ = (α1 , . . . , αm , β1 , . . . , βm , θ0 ) , φ (k) = (−y(k − 1), . . ., −y(k − m), u(k − 1), . . ., u(k − m), 1) . The output y(k) can be written in time domain as m

m

i=1

i=1

y(k) = − ∑ αi y(k − i) + ∑ βi u(k − i) + θ0 + v0 (k) = θ  φ (k) + v0 (k)

(20.2)

for some v0 (·) and θ0 = η0 ∑m i=1 βi . This equation is again linear (affine) as if the unknown nonlinearity is absent and thus, any linear method may apply. Since there is a huge volume of work in the literature on linear system identification, we only discuss three cases here relevant to the bias term θ0 . The quadratic error criterion: If the quadratic error of (20.3) is considered, the estimates θ = (αˆ 1 , . . . , βˆm , θˆ0 ) is again the least squares solution θ = (Q(N) Q(N))−1 Q(N)Y (N) → θ as N → ∞

Fig. 20.2: Estimation errors

20

Decoupling the Linear and Nonlinear Parts

343

provided that the noise v0 (·) is i.i.d. with zero mean and independent of the input, where ⎞ ⎛ ⎞ ⎛ φ (1) y(1) ⎟ ⎜ ⎟ ⎜ Y (N) = ⎝ ... ⎠ , Q(N) = ⎝ ... ⎠ .

φ (N)

y(N)

Consequently, αˆ i → αi and βˆ j → β j asymptotically, and ˆ = G(z)

βˆ1 zm−1 + . . . + βˆm → G(z) . zm + αˆ 1 zm−1 + . . . + αˆ m

Non-zero mean i.i.d. noises: If the noise v0 (·) is i.i.d. with non-zero mean, this non-zero mean and the bias term θ0 can be easily taken care of by the filter (1 − z−1). By defining y f (k) = (1 − z−1)y(k), u f (k) = (1 − z−1)u(k), v f (k) = (1 − z−1)v0 (k) and applying the filter to both sides of (20.2), we obtain m

m

i=1

i=1

y f (k) = − ∑ αi y f (k − i) + ∑ βi u f (k − i) + v f (k) . Clearly, the bias term θ0 is eliminated by the filtering and the new noise sequence v f (k) is i.i.d. with zero mean. Then, the least squares method can be applied again. Non-i.i.d. noises: If the noise v0 (·) is not i.i.d. but a stationary process with rational spectral density, then the instrumental variable method can be used. It is a standard result that the estimate derived by the instrumental variable method converges asymptotically provided that the instrumental variable is properly chosen [11].

20.4 Nonlinear Part Identification Once the linear part is obtained, we can identify the nonlinear part. At this point, disadvantages of PRBS inputs become apparent because the PRBS assumes only two values that do not excite the nonlinearity sufficiently. In order to excite the nonlinearity, the input has to be rich enough. In other words, to identify the nonlinear part, a new input output data set has to be generated. The primary difficulty in identifying the unknown nonlinearity f (·) is that no a priori structural knowledge on f (·) is assumed and thus estimation of f (·) is no longer a parameter estimation problem. We can deal with this problem in at least two ways. The first approach is to write f (u) = ∑∞ i=0 ηi pi (u) for some base functions {pi }∞ and then, to estimate the coefficients η ’s. This approach has been discussed in i 0 details in [10] along its convergence and consistency. The advantage of this approach is that only ηi ’s are estimated and no additional steps are needed. The disadvantage is that usually a large number of ηi ’s is required to have a reasonable representation

344

E.-W. Bai

of the unknown f (·). We focus the second approach in this chapter which is based on the following observation: though f (·) is unknown, it is static and if u(k) and x(k) are available, the picture of f (·) can be graphed and this graphical picture provides structural information on the unknown f (·). The input u(k) is available and the key to determine the structure of f (·) is to recover x(k). We divide the discussion in two cases. ˆ The estimated linear part G(z) is minimum phase: ˆ In this case, the internal signal x(k) can be estimated by inverting G(z), ˆ −1 y(k) . x(k) ˆ = G(z) In the case of parametric identification, we may write x(k) ˆ in time domain as x(k) ˆ =

1 ˆ my(k − m + 1)] . [−βˆ2 x(k − 1) − ... − βˆ mx(k − m + 1) + y(k + 1) + ... + α ˆ β1

Note all y(k)’s are available and the causality is not an issue. ˆ The estimated linear part G(z) is non-minimum phase: ˆ In this case, inversion becomes problematic. However, as long as G(z) does not have zeros on the unit circle, the following result can be easily derived, e.g., by a similar proof as in [4]. ˆ is stable and does not have zeros on the unit circle. Lemma 20.1. Assume that G(z) Then, for any arbitrarily small ε > 0, there always exists a stable transfer function ˆ H(z) and a positive integer l such that  z1l − H(z)G(z) 2 ≤ ε. Now, define the estimate x(k) ˆ by x(k ˆ − l) = H(z)y(k). From the lemma, the error ε can be made arbitrarily small. Once x(k) ˆ is recovered, the structural information can be identified from the graph determined by the pairs (u(k), x(k)). ˆ If necessary, we can then parametrise the nonlinearity by using appropriate base functions xˆ = f (u) = ∑ pi (u, ηi ) for some known base functions pi ’s. The coefficients ηi ’s can estimated from

ηˆ i = arg min ηˆ i

1 N ˆ − ∑ pi (u(k), ηˆ i ))2 . ∑ (x(k) N k=1

A weighted criterion assigning smaller weights of earlier estimates of the internal signal might provide better results. Such earlier estimates are indeed influenced by the transient behaviour. Convergence analysis of this type of estimates has been carried out in the literature and we refer interested readers to [3] for details.

20

Decoupling the Linear and Nonlinear Parts

345

20.5 Concluding Remarks In Hammerstein model identifications, a major difficulty is the coupling between the linear and the nonlinear parts. Under the PRBS input, however, we show that they can be decoupled and thus, identification of the linear part is essentially a linear problem. This greatly reduces the complexity and improves the efficiency of the identification algorithm. The chapter is based on [1] with permission from Automatica/Elsevier.

References 1. Bai, E.W.: Decoupling of linear and nonlinear parts in Hammerstein model identification. Automatica 40, 1651–1659 (2004) 2. Bai, E.W.: Identification of linear systems with hard input nonlinearities of known structure. Automatica 38, 853–860 (2002) 3. Bai, E.W.: A blind approach to the Hammerstein–Wiener model identification. Automatica 38, 967–979 (2002) 4. Bai, E.W., Dasgupta, S.: A minimum k-step delay controller for robust tracking of nonminimum phase systems. Systems & Control Letters 28, 197–203 (1996) 5. Boutayeb, M., Rafaralahy, H., Darouach, M.: A robust and recursive identification method for Hammerstein model. In: IFAC World Congress 1996, San Francisco, pp. 447–452 (1996) 6. Ljung, L.: System Identification: Theory for the User, 2nd edn. Prentice-Hall, Upper Saddle River (1999) 7. Ljung, L., Yuan, Z.-D.: Asymptotic properties of black box identification of transfer functions. IEEE Trans. on Auto Contr. 30, 514–530 (1985) 8. Narendra, K.S., Gallman, P.G.: An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Trans. on Auto. Contr. 11, 546–550 (1966) 9. Ninness, B., Gibson, S.: Quantifying the accuracy of Hammerstein model estimation. Automatica 38, 2037–2051 (2002) 10. Pawlak, M.: On the series expansion approach to the identification of Hammerstein systems. IEEE Trans. on Auto Contr. 36, 763–767 (1991) 11. Soderstrom, T., Stoica, P.: System Identification. Prentice-Hall, New York (1989) 12. Voros, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33, 1141–1146 (1997)

Chapter 21

Hammerstein System Identification in Presence of Hard Memory Nonlinearities Youssef Rochdi, Vincent Van Assche, Fatima-Zahra Chaoui, and Fouad Giri

21.1 Introduction Most works on Hammerstein system identification have focused on the case of memoryless nonlinearities characterised by a static relation u = F(v, θ ) where θ is a vector of unknown parameters (Figure 21.1). For a long time, the function F(v, θ ) has been assumed to be of known structure (e.g. polynomial), continuous in v and linear in θ (see e.g. Chapters 3 and 5 and reference list therein). Hard nonlinearities of known type have been considered in [1, 4, 6, 8]. Then, F(v, θ ) may be nonlinear in θ and discontinuous in v. The case of memoryless nonlinearities F(v) with no prior knowledge has been dealt with in [5]. Hammerstein system identification in presence of memory hard nonlinearities is a more challenging problem. The case of backlash-relay and backlash (Figures 21.2a and 21.3a) has been coped with in [3] using a separable nonlinear least squares method. However, the proposed solution has only been applied to symmetric nonlinearities with a single unknown parameter (specifically, h1 = h2 = a and M1 = M2 = 1 where a denotes the unknown parameter). A thorough description of this approach is given in Chapter 16. In [6] a two-stage procedure is developed for estimating parameter bounds for Hammerstein systems involving not-necessarily symmetric backlash element. The quality of the estimated bounds depends on the output noise amplitude: the smaller the noise Youssef Rochdi FST, University Caddi Ayad Marrakech Vincent Van Assche GREYC, University of Caen, Caen, France Fatima-Zahra Chaoui ENSET, University of Rabat, Morocco Fouad Giri GREYC, University of Caen, Caen, France e-mail: [email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 347–365. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com 

348

Y. Rochdi et al.

amplitude, the tighter the parameter bounds (Fig 21.2a). A detailed presentation of this approach is made in Chapter 22. In the present chapter, a quite different solution is described for identifying Hammerstein systems containing not-necessarily symmetric nonlinearities of the backlash, backlash-relay, switch and switch-relay types (Figures 21.2a to 21.3b). The linear subsystem parameters are first estimated using a least squares estimator. The estimation process relies on a key design feature: an appropriate input signal is designed to ensure persistent excitation and to make possible the measurement of the internal signal u(t). The linear subsystem identification is then decoupled from the nonlinear element. The parameters of the nonlinear element are then estimated using estimators of the least-squares type, based on adequate system parametrisation. All involved estimators are shown to have quite interesting consistency properties. These consistency results have first been established in [7]. The chapter is organised as follows: the identification problem is stated in Section 21.2; Section 21.3 is devoted to the linear subsystem identification while the nonlinear element identification is carried out in Section 21.4.

ξ(t) v(t)

u(t) F(.)

Memory nonlinearity

y(t) B(q-1)

1/A(q-1)

Linear subsystem Fig. 21.1: Hammerstein System

21.2 Identification Problem Formulation We are interested in systems that can be described by the Hammerstein model (Figure 21.1):     A q−1 y(t) = B q−1 u(t) + ξ (t) and u(t) = F(v(t)) (21.1) with     A q−1 = 1 + a1q−1 + . . . + an q−n , B q−1 = b1 q−1 + . . . + bn q−n

(21.2)

where the internal signal u(t) is not measurable and the equation error ξ (t) is a bounded zero-mean stationary and ergodic sequence of stochastically independent variables. The linear subsystem is supposed to be (asymptotically) stable, controllable and of known order n. Controllability is required for persistent excitation purpose ([5]). The function F(.) is allowed to be a backlash, backlash-relay, switch

21

Hammerstein System Identification

349

or switch-relay. These are analytically defined in Table 21.1 and graphically illustrated by Figures 21.2a to 21.3b. The backlash-relay and switch-relay elements are characterised by the (unknown) parameters (M1 , M2 , h1 , h2 ); they will be denoted R(M1 ,M2 ,h1 ,h2 ) . The switch and backlash elements are characterised by (S, h1 , h2 ); they will be denoted Sw(S,h1 ,h2 ) and Ba(S,h1 ,h2 ) respectively. The meaning of (h1 , h2 ) is different for the different elements. For backlash- and switch-relay, h1 is the smallest number such that, for all t and whatever the value of v(t − 1): v(t) ≥ h1 ⇒ F(v(t)) = M1 , h2 is the largest number such that, for all t and whatever the value of v(t − 1): v(t) ≤ h2 ⇒ F(v(t)) = M2 . For backlash operators, [h2 , h1 ] is the widest interval such that one may have F(v) = 0, for all v ∈ [h2 , h1 ]. For switch elements, F(v) = 0 ⇒ v = h2 or v = h1 .  

F(v)



M1 S

  

-hm

 

h2

 

hm v

h1



M2

Fig. 21.2a Backlash operator

Fig. 21.2b Switch operator

F(v)

F(v)

M

M1

v

-hm h2

h1

M2

Fig. 21.3a Backlash-relay

hm

-hm h2

h1

M2

Fig. 21.3b Switch-relay

hm

v

350

Y. Rochdi et al. Table 21.1: Analytic definitions of the considered nonlinearities

Backlash ⎧ ⎨ S (v(t) − h2 ) u(t) = S (v(t) − h1 ) ⎩ u(t − 1) u(t−1) where hL = S + h2

Switch ⎧ if v(t) ≤ hL ⎨ S (v(t) − h1 ) if v(t) ≥ hR u(t − 1) if hL < v(t) < hR u(t) = ⎩ S (v(t) − h2 ) u(t−1) and hR = S + h1

Backlash-relay ⎧ M1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ u(t) =

M2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

if v(t)  > h1 or h2 ≤ v(t) ≤ h1  and u(t − 1) = M1 if v(t)  < h2 or h2 ≤ v(t) ≤ h1  and u(t − 1) = M2

Switch-relay ⎧ M1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ u(t − 1) u(t) = ⎪ ⎪ M ⎪ 2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

if v(t) > v(t − 1) if v(t) = v(t − 1) if v(t) < v(t − 1)

if v(t)  > h1 or h2 ≤ v(t) ≤ h1  and v(t) > v(t − 1) if v(t) = v(t − 1) if v(t)  < h2 or h2 ≤ v(t) ≤ h1  and v(t) < v(t − 1)

Assumption 21.1. The following assumptions are supposed to be satisfied by the identified system: 1. B(1) = 0 i.e. the static gain of the linear subsystem is nonzero. 2. There is a known real hm such that: hm > max (|h1 |, |h2 |). Our purpose is to design an identification scheme that provides consistent  estimates   of all unknown parameters i.e. those of the linear subsystem B q−1 /A q−1 and those of the nonlinear element F(.). Remark 2.1. i) Part 2 of Assumption 21.1 is not too restrictive because hm may be arbitrarily large. A similar assumption is needed in Section 15.7 where symmetric nonlinearities (of backlash type) were considered. There, a graphical search method has been proposed to find the value of a. Then, the search should be initialised in an interval including the unknown parameter a to prevent local convergence. ii) It can be easily checked that the backlash-relay element (resp. switch-relay) can be seen as a cascade of a simple backlash (resp. switch) element in series with a static-relay.

21.3 Linear Subsystem Identification The linear subsystem identification is dealt with in three steps. First, an adequate system rescaling is introduced, in Subsection 21.3.1, to reduce the number of uncertain parameters. The obtained system representation is further transformed in Subsection 21.3.2 to cope with the unavailability of the internal signal u(t). The transformed representation involves linearly the linear subsystem parameters and, therefore, is used to estimate them. Finally, a persistently exciting input is proposed in Subsection 21.3.3 and shown to ensure estimate consistency.

21

Hammerstein System Identification

351

21.3.1 Model Reforming The considered identification problem  solution  up to a multiplicative  has a unique real constant. Actually, if the triplet A q−1 , B q−1 , F(.) is a solution then so is      −1  def A q , μ B q−1 , Fμ (.) for any μ = 0 with Fμ (v) = F(v) μ .   −1  ∗  −1  ∗  Let A q ,B q , F (.) denotes the particular model corresponding to def

μ ∗ = M2 − M1 , i.e.   def def B∗ = μ ∗ b1 q−1 + . . . + bn q−n = b∗1 q−1 + b∗2 q−2 + . . . + b∗n q−n , F∗ =

def

F(.) . μ∗

(21.3) (21.4)

It is readily seen from Figures 21.2a to 21.3b that: F = R(M1 ,M2 ,h1 ,h2 ) ⇒ F ∗ = R(m1 ,m2 ,h1 ,h2 ) ∗

(21.5) ∗

F = SwS,h1 ,h2 or F = Ba(S,h1 ,h2 ) ⇒ F = Sw(s,h1 ,h2 ) or F = Ba(s,h1 ,h2 ) with: m1 =

M1 M2 S , m2 = , s= . M2 − M1 M2 − M1 M2 − M1

(21.6)

(21.7)

The following relations are also of interest: m2 − m1 = 1, S =

M1 −M2 m1 −m2 = , s= = . hm − h1 hm + h2 hm − h1 hm + h2

(21.8)

Using (21.3)-(21.4), the system (21.1) can be rewritten as follows:     def A q−1 y(t) = B∗ q−1 u∗ (t) + ξ (t), u∗ (t) = F ∗ (v(t)) .

(21.9)

Note that for Hammerstein systems with backlash-relay or switch-relay, the element R(m1 ,m2 ,h1 ,h2 ) involves three uncertain parameters (due to (21.8)) rather than four in the initial element R(M1 ,M2 ,h1 ,h2 ) . However, the new internal signal u∗ (t) in (21.9) is still unavailable. Therefore, the system representation (21.9) needs further transformations.

21.3.2 Model Centring and Linear Subsystem Parameter Estimation Let {y1 (t)} denote the response of (21.9) when the following input signal is applied:  −hm if t = 0 , def v(t) = v1 (t) = . (21.10) hm if t ≥ 1 .

352

Y. Rochdi et al.

Then, using the definitions of Table 21.1 (or simply Figures 21.2a to 21.3b), it follows that, for all t ≥ 1:     def A q−1 y1 (t) = B∗ q−1 u∗1 (t) + ξ1 (t), u∗1 (t) = F ∗ (hm ) = m1 ,

(21.11)

where ξ1 (t) denotes the realisation of ξ (t) during the present experiment. Time averaging the first equality in (21.11), over the interval 1 ≤ t ≤ L, yields1 : A(1)y¯1 (L) = B∗ (1)m1 + ξ¯1 (L) .

(21.12)

The ergodicity of {ξ1 (t)} implies that ξ¯1 (L) → E(ξ (t)) = 0 as L → 0 (w.p. 1). Also, let y¯1 denotes the limit of y(L) ¯ when L → ∞. It follows from (21.12) that such a limit exists and satisfies: (21.13) A(1)y¯1 = B∗ (1)m1 . Practically, y¯1 can be computed from a sufficiently large sample {y(t);t = 1 . . . L}. Now, subtracting (21.13) from the first equality in (21.9) gives:     A q−1 (y(t) − y¯1) = B∗ q−1 (u∗ (t) − m1) + ξ (t) . (21.14) For convenience, let us introduce the following centred signals: y(t) ˜ = y(t) − y¯1 , u(t) ˜ = u∗ (t) − m1 . def

def

(21.15)

Using (21.14) and (21.15), it follows that the identified system (21.9) can be given the compact form:     ˜ = B∗ q−1 u(t) ˜ + ξ (t) . A q−1 y(t) (21.16) On the other hand, using (21.15), (21.4) and the definitions of F(.) (Table 21.1 or Figures 21.2a to 21.3b), it follows that, for all t:  0 if v(t) = hm , (21.17) u(t) ˜ = 1 if v(t) = −hm . That is, the internal sequence {u(t)} ˜ in (21.16) turns out to be measurable as long as the input sequence {v(t)} takes its   values in the set {−hm , hm }. Therefore, the   coefficients of A q−1 and B∗ q−1 can be estimated based upon the equation error (21.16). To this end, the latter is given the following regressive form: y(t) ˜ = φ˜ (t)T θ ∗ + ξ (t) ,

(21.18)

with:

φ˜ (t) = [−y(t ˜ − 1) · · · − y(t ˜ − n) u(t ˜ − 1) · · · u(t ˜ − n)]T , 1

(21.19)

Throughout the chapter, x(N) ¯ denotes the arithmetic mean of {x(t)} i.e. x(N) ¯ = x(i)/N. If {x(t)} is an ergodic stochastic process, then x(N) ¯ → E(x(t)) (w.p. 1) as ∑N i=1 N → ∞ where E(x(t)) denotes the ensemble mean.

21

Hammerstein System Identification

353

θ ∗ = [a1 . . . an b∗1 · · · b∗n ]T .

(21.20)

The unknown parameter vector can then be estimated using the standard least squares estimator: 8

1 N ˜ ˜ T θˆ (N) = ∑ φ (i)φ (i) N i=1

9−1 8

1 N ˜ ˜ T ∑ φ (i)y(i) N i=1

9 .

(21.21)

˜ be It is understood that {v(t)} takes its values only in the set {−hm , hm } so that u(t) measurable.

21.3.3 A Class of Exciting Input Signal The input signal {v(t)} should meet the two requirements: (i) it must take its values in the set {−hm , hm } so that {u(t)} ˜ is measurable; (ii) the resulting regression vectors φ˜ (t) should satisfy the persistent excitation (PE) property (Proposition 21.1). Bearing these in mind, we propose the following periodic signal, with period T = 4n, where k is any integer, tk = kT , tk ≤ kT < tk + 1:  −hm for t = tk + 2n , def v(t) = v2 (t) = (21.22) hm otherwise. Then, in view (21.17), the internal signal {u(t)} ˜ turns out to be the following:  1 for t = tk + 2n , def u(t) ˜ = u˜2 (t) = (21.23) 0 otherwise.

21.3.4 Consistency of Linear Subsystem Parameter Estimates Let z˜(t) denotes the undisturbed output defined as follows:     A q−1 z˜(t) = B∗ q−1 u˜2 (t) .

(21.24)

Introduce the undisturbed state vector ˜ def = [−˜z(t − 1) · · · − z˜(t − n) u(t ˜ − 1) · · · u(t ˜ − n)]T . Z(t) Proposition 21.1. Let the system (21.1) be excited by the signal (21.22) so that it can be represented by the equation error (21.16) or its regression form (21.18). Then, with I denoting the identity matrix, one has: ˜ is PE i.e. there exists a real λ > 0 such that, for all k: 1. Z(t) 4n−1



i=0

Z˜ (tk + i) Z˜ (tk + i)T ≥ λ I .

354

Y. Rochdi et al.

2. φ˜ (t) is PE in the mean i.e. there exists a real β > 0, such that: 1 N ˜ ˜ T ∑ φ (i)φ (i) > β I (w.p.1). N→∞ N i=1 lim

3. The estimator (21.21) is consistent i.e. θˆ (N) → θ ∗ as N → ∞ (w.p. 1). Proof. The PE property of Part 1 can be obtained applying a Technical Lemma in [5] to the system (21.24), using the fact that the input u˜2 (t) has the form required in [5], due to (21.23). Part 2 follows from Part 1, based on the relation    ˜ + 1/A q−1 [ξ (t − 1) . . . ξ (t − n) 0 . . . 0]T , φ˜ (t) = Z(t) ˜ using the ergodicity of ξ (t) and its independence with Z(t). Part 3 is in turn a consequence of Part 2.  

21.3.5 Simulation Consider two Hammerstein systems with a linear subsystem characterised by:     (21.25) A q−1 = 1 − 1.3q−1; B q−1 = q−1 − 0.5q−1 .

ξ (t) is a zero-mean i.i.d random sequence in [−0.5 0.5]. The nonlinear elements are a backlash and switch-relay, respectively defined as follows: for Ba(S,h1 ,h2 ) : h1 = 1, h2 = −0.5, S = 1 ,

(21.26)

for R(M1 ,M2 ,h1 ,h2 ) : h1 = 2, h2 = −1, M1 = 1, M2 = −2 .

(21.27)

Notice that these nonlinearities are asymmetric. The real hm in Assumption 2 is chosen as follows:  2 for F = Ba(S,h1 ,h2 ) , (21.28) hm = 3 for F = R(M1 ,M2 ,h1 ,h2 ) . For Ba(S,h1 ,h2 ) , it follows from (21.26) and (21.28) that M1 = F (hm ) = 1; M2 = F(−hm ) = 1.5. Then, (21.3) gives:   B∗ q−1 = −2.5q−1 + 1.25q−2 for F = Ba(S,h1 ,h2 ) , (21.29)   B∗ q−1 = −3q−1 + 0.5q−2 for F = R(M1 ,M2 ,h1 ,h2 ) . (21.30) Also, one gets from (21.4)-(21.7) that for F = Ba(S,h1 ,h2 ) : 2 2 F ∗ = Ba(s,h1 ,h2 ) with h1 = 1, h2 = −0.5, s = − , m1 = − , 5 5

(21.31)

21

Hammerstein System Identification

355

and for F = R(m1 ,m2 ,h1 ,h2 ) : F ∗ = R(m1 ,m2 ,h1 ,h2 ) with m1 = −

−1 , h1 = 2, h2 = −1 , 3

(21.32)

where m2 = 1 + m1 . The resulting parameter vector θ ∗ is given in Table 21.2 (line 3). First, the system is excited by the step signal v1 (t) defined by (21.10). Then, time averaging the first equality in (21.11), over the interval 1 ≤ t ≤ L = 1000, yields y¯1 . Then, the system is excited by the periodic signal defined by (21.22). The obtained data are used in algorithm (21.21) (with N = 1000). Doing so, one gets the estimates of Table 21.2 (line 4) which are clearly quite close to their true values. Table 21.2: The estimate obtained by (21.22)

θ∗ θˆ (N)

Case where F = Ba(S,h1 ,h2 )

Case where F = R(M1 ,M2 ,h1 ,h2 )

[−1.3 0.42 − 2.5 1.25] [−1.297 0.419 − 2.493 1.245]

[−1.3 0.42 − 3 1.5]T [−1.297 0.418 − 2.993 1.493]T

21.4 Nonlinear Element Estimation ∗ Let θˆ (N) bethe estimate  −1  −1of  θ obtained by (21.21)  (using  a sufficiently   large N). ˆ ˆ and BN q be the estimates of A q−1 and B∗ q−1 induced by Let AN q θˆ (N). These estimates will now be used to determine the parameters of   F ∗ (.) ∈ R(m1 ,m2 ,h1 ,h2 ) , Ba(s,h1 ,h2 ) , Sw(s,h1 ,h2 ) .

21.4.1 Estimation of m1 Equation (21.13) suggests for m1 the following estimate: mˆ 1 (L, N) =

Aˆ N (1) y¯1 (L) , for a sufficiently large L. Bˆ N (1)

(21.33)

Using Proposition 21.1 (Part 3), one gets comparing (21.13) and (21.33) the following result: Proposition 21.2. Consider the system described as well by (21.1) or (21.9), where   F ∗ ∈ R(m1 ,m2 ,h1 ,h2 ) , Ba(s,h1 ,h2 ) , Sw(s,h1 ,h2 ) . Let it be excited by the signal (21.10) so that it can be described by (21.13). Then, the estimator (21.33) is consistent i.e. mˆ 1 (L, N) converges (w.p. 1) to m1 , as L, N → ∞ .

356

Y. Rochdi et al.

21.4.2 Estimation of (h1 , h2 ) 21.4.2.1

Input Signal Design

The (21.9) and (21.16) do share the same linear part, i.e.  parametrisations     system A q−1 , B∗ q−1 . But, they involve different nonlinearities, namely F ∗ (v) and ˜ F(v) respectively. On the other hand, using Figures 21.2a to 21.3b, it follows from the relation u∗ (t) = F ∗ (v(t)) that: ˜ u(t) ˜ = F(v(t))

(21.34)

with def F˜ = R(0,1,h1 ,h2 ) when F ∗ = R(m1 ,m2 ,h1 ,h2 ) ,

(21.35)

F˜ = Ba(s,h1 ,h2 ) when F ∗ = Ba(s,h1 ,h2 ) , and

(21.36)

F˜ = Sw(s,h1 ,h2 ) when F ∗ = Sw(s,h1 ,h2 ) .

(21.37)

def def

The system parametrisation (21.16), which proved to be useful for the identification of the linear subsystem, turns out to be also appropriate for the identification of the nonlinearity. Indeed, in the light of (21.35)-(21.36), it is seen that (21.16) involves less uncertain parameters than (21.9). In order to estimate the uncertain ˜ parameters of F(.), the system is now excited by a periodic input, denoted v3 (α ,t), defined by (Figure 21.4): v3 (α ,t) = hm − t Δ1 for t = 0, 1, . . . , α T and

(21.38)

v3 (α ,t) = −hm + (t − α T )Δ2 for t = α T + 1, . . ., T ,

(21.39)

def

def

2hm m where T > 0, 0 < α < 1, Δ1 = 2h α T , and Δ 2 = T −α T . The period T and the ratio α are arbitrary, but α T should be an integer. In presence of F = R(M1 ,M2 ,h1 ,h2 ) (switch- and backlash-relay types), we will focus on the two integers t1 , t2 ∈ (0 T ) where v3 (α ,t) is as close as possible to h2 and h1 , respectively (Fig 21.4). Let us introduce the following notations:

t1 = β T and t2 = γ T def

def

where (β , γ ) satisfy 0 < β < α < γ < 1. It follows from (21.38)-(21.39) that: hm − β T Δ1 = h2 − ε (α , T ) with 0 ≤ ε1 (α , T ) < Δ1 , −hm + (γ − α )T Δ2 = h1 + ε2 (α , T ) with 0 ≤ ε2 (α , T ) < Δ2 .

(21.40)

Similarly, in presence of F = Ba(S,h1 ,h2 ) , the focus will be made on t1 , t2 ∈ (0 T ) where v(t) = v3 (α ,t) is as close as possible to the particular values (hm − (h1 − h2)) and (−hm + (h1 − h2)) (see Fig 21.4). Let us introduce the following notations: t1 = μ T and t2 = η T def

def

21

Hammerstein System Identification

357

where μ and η verify 0 < μ < α < η < 1. It follows from (21.38)-(21.39) that: hm − μ T Δ1 = hm − (h1 − h2) − ε1 (α , T )

(21.41)

−hm + (η − α )T Δ2 = −hm + (h1 − h2) + ε2 (α , T ) with 0 ≤ ε1 (α , T ) < Δ1 and 0 ≤ ε2 (α , T ) < Δ2 . Equations (21.40)-(21.41) show that, for a given α , the residuals εi (α , T ), εi (α , T ) (i = 1, 2) vanish as T → ∞. On the other hand, it follows from (21.15) and Figures 21.2a to 21.3b that, for t > T , u(t) ˜ is in turn periodic, with period T . The transient behaviour, during the first period [0 T ), may be ignored letting T be the origin of time. Then, u(t) ˜ turns out to be defined in a period [0 T ) as follows (Figures 21.5 to 21.7):  ⎧ 1 if β T ≤ t ≤ γ T def ⎪ ⎪ u(t) ˜ = u˜R (α ,t) = ⎪ ⎪ 0 otherwise ⎪ ⎪ ⎪ ∗ ⎪ for ⎪ ⎧ F = R(m1 ,m2 ,h1 ,h2 ) , ⎪ ⎪ ⎪ 0 if 0 < t ≤ μ T ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ t− μ T ⎨ ⎪ ⎪ if μ T < t ≤ α T def ⎪ α T −μ T ⎨ u(t) ˜ = u˜Ba (α ,t) = 1 if α T < t < η T ⎪ ⎪ ⎪ ⎩ − t−η T + 1 if η T < t ≤ T ⎪ ⎪ ⎪ T −η T ⎪ ⎪ ∗ = Ba ⎪ for F ⎪ (s,h1 ,h2 ) , ⎪  ⎪ ⎪ 1−s(h 1 −h2 ) ⎪ ⎪ def t + s (h1 − h2) if 0 ≤ t < α T ⎪ αT u(t) ˜ = u˜Sw (α ,t) = ⎪ ⎪ ⎪ 2shm Tt−−ααTT − 2shm if α T ≤ t < T ⎪ ⎪ ⎩ for F ∗ = Sw(s,h1 ,h2 ) . Let u¯˜R (α ), u¯˜Ba (α ), and u¯˜Sw (α ) be the averages of u˜R (α ,t), u˜Ba (α ,t), and u˜Sw (α ,t) over [0 T ). One gets from (21.42): (21.42) u˜¯R (α ) = (γ − β ) + ε (α , T ) for F ∗ = R(m1 ,m2 ,h1 ,h2 ) , ∗ u˜¯Ba (α ) = 0.5(α − μ ) + (η − α ) + 0.5(1 − η ) + ε (α , T) for F = Ba(s,h1 ,h2 ) , (21.43) ¯u˜Sw (α ) = 0.5α + 0.5α s (h1 − h2) + (1 − α )shm for F ∗ = Sw(s,h ,h ) . (21.44) 1

2

In (21.42)-(21.44), and in subsequent equations, ε (α , T ) denotes a generic error resulting from ε1 (α , T ), ε1 (α , T ), ε2 (α , T ), ε2 (α , T ), such that for any fixed 0 < α < 1: lim ε (α , T ) = 0 . (21.45) T →∞

Using (21.40)-(21.41) and the expressions of Δ1 and Δ2 , it follows from (21.42)(21.43) that: h1 1 (1 − α ) + ε (α , T ) for F ∗ = R(m1 ,m2 ,h1 ,h2 ) , u¯˜R (α ) = + 2 2hm 1 h1 − h2 (1 − 2α ) + ε (α , T ) for F ∗ = Ba(s,h1 ,h2 ) . u˜¯Ba (α ) = + 2 4hm

(21.46) (21.47)

358

Y. Rochdi et al.

Fig. 21.4: Shape of the signal v(t) = v3 (α ,t), defined by (21.38)-(21.39), over one period T

Furthermore, due to (21.8), (21.47) can be rewritten as follows: 1 (2hm + 1/s) u¯˜Ba (α ) = + (1 − 2α ) + ε (α , T ) . 2 4hm

(21.48)

In the sequel, estimates of u¯˜R (α ), u¯˜Ba (α ), u¯˜Sw (α ) will be needed. To determine these, let us perform averaging of both sides of (21.16) over [0 MT ), when v(t) = v3 (α ,t). Doing so, one gets: ¯˜ ¯˜ A(1)y(MT ) = B∗ (1)u(MT ) + ξ¯ (MT ) .

(21.49)

¯˜ In fact, u(MT ) coincides with the average value of u(t) ˜ on the period [0 T ) because, in the present case, u(t) ˜ is periodic with period T . Specifically, one has: ⎧ ⎨ u¯˜R for F ∗ = R(m1 ,m2 ,h1 ,h2 ) , ¯˜ u¯˜Ba for F ∗ = Ba(s,h1 ,h2 ) , u(MT )= (21.50) ⎩ ¯ u˜Sw for F ∗ = Sw(s,h1 ,h2 ) . Using (21.50) and the ergodicity of {ξ (t)}, it follows from (21.49) that: ¯˜ )] = 0 for F ∗ = R(m1 ,m2 ,h1 ,h2 ) (w.p. 1) , lim [B∗ (1)u¯˜R (α ) − A(1)y(MT

(21.51)

¯˜ lim [B∗ (1)u¯˜Ba (α ) − A(1)y(MT )] = 0 for F ∗ = Ba(s,h1 ,h2 ) (w.p. 1) ,

(21.52)

¯˜ lim [B∗ (1)u¯˜Sw (α ) − A(1)y(MT )] = 0 for F ∗ = Sw(s,h1 ,h2 ) (w.p. 1) .

(21.53)

M→∞ M→∞ M→∞

21

Hammerstein System Identification

359

Fig. 21.5 Shapes of the sequences obtained over a period T for backlash-relay and switch-relay

  

      



    



    

















 



 

    







21.4.2.2

Estimation of (h1 , h2 ) for Backlash- and Switch-Relay

To determine (h1 , h2 ), one needs two equations like (21.46). We then proceed with two experiments involving two inputs v3 (t, α1 ) and v3 (t, α2 ) with the same period T but different ratios α1 and α2 . The resulting internal signals are denoted u˜R1 and u˜R2 . In view of (21.46), these have the average values: h1 1 h2 (1 − α1) + α1 + ε (α1 , T ) , u¯˜R1 (α1 ) = + 2 2hm 2hm h1 1 h2 (1 − α2) + α2 + ε (α2 , T ) . u¯˜R2 (α2 ) = + 2 2hm 2hm

(21.54) (21.55)

Solving (21.54)-(21.55) for (h1 , h2 ) yields: 3

h1 h2

4

3 = 2hm

(1 − α1) α1 (1 − α2) α2

4−1 3

u˜¯R1 (α1 ) − 0.5 − ε (α1, T ) u¯˜R2 (α1 ) − 0.5 − ε (α2, T )

4 .

(21.56)

In order to use (21.56), one needs u¯˜R1 (α1 ), u¯˜R2 (α2 ). Using (21.50) and (21.51), one gets the estimates: Aˆ N (1) ¯ Aˆ N (1) ¯ uˆ¯˜R1 (M, N) = y˜31 (MT ) , u¯ˆ˜R2 (M, N) = y˜32 (MT ) , ˆ BN (1) Bˆ N (1)

(21.57)

360

Y. Rochdi et al.

Fig. 21.6 Shapes of the sequences obtained over a period T for backlash operator

     



 

    





 

       





 





    





















where y¯˜31 (t), y¯˜32 (t) are the responses of (21.15)-(21.16) to the inputs v3 (t, α1 ) and v3 (t, α2 ), respectively. This, together with (21.56), suggests the following estimation algorithm for (h1 , h2 ): 9 3 3 4−1 8 ˆ 4 u¯˜R1 (M, N) − 0.5 hˆ 1 (M, N) (1 − α1) α1 . (21.58) = 2hm (1 − α2) α2 hˆ 2 (M, N) uˆ¯˜R2 (M, N) − 0.5 Proposition 21.3. Consider the Hammerstein system described as well by (21.1) and (21.16), (21.34) with F˜ = R(0,1,h1 ,h2 ) . Let it be excited by two inputs v3 (α ,t) (with different values of α ) and let the Algorithm (21.57)-(21.58) be used to get the estimates hˆ 1 (M, N), hˆ 2 (M, N). Then, one has: ( ( ( ( lim sup (ˆh1 (M, N) − h1 ( ≤ e(T ) , lim sup (ˆh2 (M, N) − h2 ( ≤ e(T ) (w.p.1) M→∞ N→∞

M→∞ N→∞

with e(T ) an asymptotically vanishing generic function. That is, the larger T the better the estimates. Proof. Subtracting (21.56) from (21.58) gives:

21

Hammerstein System Identification

361

Fig. 21.7 Shapes of the sequences obtained over a period T for a switch operator



    











 

      









   











    





9 4−1 8 ˆ u¯˜R1 (M, N) − u¯˜R1 (α1 ) + ε (α1 , T ) . uˆ¯˜R2 (M, N) − u¯˜R2 (α1 ) + ε (α2 , T ) (21.59) On the other hand, applying (21.51) to the couples

3 4 3 hˆ 1 (M, N) − h1 (1 − α1) = 2h m (1 − α2) hˆ 2 (M, N) − h2

α1 α2

(u¯˜R1 (α1 ), y¯˜1 (MT )) and (u¯˜R2 (α2 ), y¯˜2 (MT )) yields: lim [B∗ (1)u¯˜R1 (α1 ) − A(1)y¯˜1(MT )]

M→∞

= lim [B∗ (1)u¯˜R2 (α2 ) − A(1)y¯˜2 (MT )] = 0 (w.p.1) . (21.60) M→∞

Using Proposition 21.1 (Part 3), it follows from (21.34) that:   lim Bˆ N (1)u¯˜R1 (α1 ) − Aˆ N (1)y¯˜1 (MT )

M→∞ N→∞

362

Y. Rochdi et al.

  = lim Bˆ N (1)u¯˜R2 (α2 ) − Aˆ N (1)y¯˜2 (MT ) = 0 (w.p.1) . (21.61) M→∞ N→∞

But, from (21.57) one has: Bˆ N (1)u¯ˆ˜R1 (M, N) − Aˆ N (1)y¯˜1 (MT ) = Bˆ N (1)u¯ˆ˜R2 (M, N) − Aˆ N (1)y¯˜2 (MT ) = 0 . (21.62) Comparing (21.61) and (21.62), it follows (using Part 3 of Proposition 21.1) that: " " # # lim uˆ¯˜R1 (M, N) − u¯˜R1 (α1 ) = lim uˆ¯˜R2 (M, N) − u¯˜R2 (α2 ) = 0 . (21.63) M→∞ N→∞

M→∞ N→∞

Proposition 21.3 follows from (21.45), (21.59), (21.63), using the fact that ε (α , T ) vanishes as T → ∞.   21.4.2.3

Estimation of (s, h1 , h2 ) for Backlash Operators

We now focus on the case of F ∗ = Ba(s,h1 ,h2 ) . Equations (21.48), (21.49) and (21.50) suggest for the parameter s the estimate: 3   4−1  1 1 s(M, ˆ N) = 4hm uˆ¯˜Ba (M, N) − − 2hm 2 1 − 2α ˆ def AN (1) ¯˜ uˆ¯˜Ba = y(MT ) Bˆ N (1)

(21.64) (21.65)

where y(t) ˜ is the response of (21.15)-(21.16) to the input v(t) = v3 (t, α ) and 0 < α < 1 is arbitrary but α = 0.5. Consequently, only one experiment is needed to estimate the parameter s. On the other hand, it is easily seen from (21.8) and Figure 21.4 1 that: h1 = hm − ms1 and h2 = −hm − 1+m s . This suggests for (h1 , h2 ) the following estimates: mˆ 1 (L, N) mˆ 1 (L, N) and hˆ 2 (L, M, N) = −hm − . hˆ 1 (L, M, N) = hm − s(M, ˆ N) s(M, ˆ N)

(21.66)

Proposition 21.4. Consider the Hammerstein system with backlash nonlinearity described by (21.1) or by (21.16), (21.38) with F˜ = Ba(s,h1 ,h2 ) . Let it be excited by v3 (α ,t) (with α = 0.5) and use Algorithm (21.64)-(21.66) to get the estimates s(M, ˆ N), hˆ 1 (L, M, N), hˆ 2 (L, M, N). Then, one has w.p.1: ˆ N) − s| ≤ e(T ) , lim sup |s(M,

M→∞ N→∞

( ( lim sup (ˆh1 (L, M, N) − h1 ( ≤ e(T ) ,

L→∞ M→∞ N→∞

21

Hammerstein System Identification

363

( ( lim sup (ˆh2 (L, M, N) − h2 ( ≤ e(T ) ,

L→∞ M→∞ N→∞

with e(T ) a generic error that vanishes as T → ∞. That is, the larger is T the better the estimates. Proof. From (21.64) and (21.48) one gets, respectively:   1 −1 ¯ s = 4hm (u˜Ba − 0.5 − ε (α , T)) − 2hm , 1 − 2α    1 1 (s(M, ˆ N))−1 = 4hm uˆ¯˜Ba (M, N) − − 2hm . 2 1 − 2α

(21.67) (21.68)

Subtracting each side of (21.68) from the corresponding side of (21.67), yields: " # 1  s−1 − (s(M, ˆ N))−1 = 4hm u¯˜Ba (α ) − uˆ¯˜Ba (M, N) − ε (α , T ) . (21.69) 1 − 2α ˜ = On the other hand, applying (21.52) to the case where v(t) = v3 (α ,t) and u(t) u˜B (α ,t), one gets: ¯˜ lim [B∗ (1)u¯˜Ba (α ) − A(1)y(MT )] = 0 (w.p. 1) ,

M→∞

(21.70)

where y(t) ˜ denotes the response of (21.15)-(21.16) to the input v(t) = v3 (α ,t). Using Proposition 21.1 (Part 3), it follows from (21.70) that:   ¯˜ lim Bˆ N (1)u¯˜Ba (α ) − Aˆ N (1)y(MT ) = 0 (w.p. 1) . (21.71) M,N→∞

¯˜ ) = 0. This, together with But, from (21.65) one has, Bˆ N (1)uˆ¯˜Ba (M, N) − Aˆ N (1)y(MT (21.71), implies: 6 7 lim u¯˜Ba (α ) − uˆ¯˜Ba (M, N) = 0 , (21.72) M→∞ N→∞

which together with (21.69) and (21.45) establishes Proposition 21.4 for s (using the fact that ε (α , T ) vanishes as T → ∞). One can similarly establishes the convergence results concerning h1 , h2 .   21.4.2.4

Estimation of (s, h1 , h2 ) for Switch Elements

We now focus on the case of F ∗ = Sw(s,h1 ,h2 ) . An estimation algorithm can be designed following the approach of Subsection 21.4.2.2. This consists in performing two experiments involving two input signals v3 (α1 ,t) and v3 (α2 ,t), defined by (21.38)-(21.39), with the same period T (but α1 = α2 ). The resulting internal signals are denoted u˜Sw1 (t), u˜Sw2 (t) and, in view of (21.44), have the average values:

364

Y. Rochdi et al.

α1 h1 − h2 + α1 s + (1 − α1)shm , u¯˜Sw1 (α1 ) = 2 2 h1 − h2 α2 + α2 s + (1 − α2)shm . u¯˜Sw1 (α2 ) = 2 2

(21.73) (21.74)

These constitute the main ingredients to get estimates for s, h1 , h2 , just as we did in Subsection 21.4.2.2.

21.4.3 Simulation Consider the Hammerstein system with switch-relay of Subsection 21.3.5. Applying algorithm (21.33) (with N = 1000) yields mˆ 1 = 0.336 (compare with the true value m1 = −1/3). Algorithm (21.57)-(21.58) is then applied with the parameters T = 120, L = 1000, M = 1000, α1 = 0.5 and α2 = 0.7. The estimates obtained for (h1 , h2 ) are: hˆ 1 = 2.03, hˆ 2 = −1.006. Compare these with the corresponding true values, namely h1 = 2, h2 = −1. Consider the Hammerstein system with backlash nonlinearity defined in Subsection 21.3.5. Applying algorithm (21.33) (with N = 1000) yields mˆ 1 = −0.404 (compare with mˆ 1 = −2/5). Then, algorithm (21.64)-(21.66) is applied with the parameters T = 120, L = 1000, M = 100, α = 0.7. The estimates thus obtained for (s, h1 , h2 ) are: sˆ = −0.385, hˆ 1 = 0.953, hˆ 2 = −0.455. Compare these with the true values, namely s = −2/5, h1 = 1, h2 = −0.5. The above simulation results have been confirmed by running the proposed identification approach several times with different realisations of the noise ξ (t).

21.5 Conclusion Hammerstein system identification has been addressed in presence of backlash, switch, backlash-relay and switch-relay nonlinearities. The estimation of the linear subsystem and the determination of the nonlinear element are performed separately. One key step in the identification scheme design is the construction of the system parametrisation (21.16) that presents three crucial properties: (i) the unknown linear parameters come in linearly; (ii) the involved internal signal (u(t)) ˜ is known as ˜ involves less unlong as v(t) ∈ {hm , +hm }; (iii) the corresponding nonlinearity F(.) certain parameters than the initial element F(.) in (21.1). The second main feature of the identification scheme is the design of persistently exciting input signals that made it possible to achieve formal consistency results for the involved estimators.

References 1. Bai, E.W.: Identification of linear systems with hard input nonlinearities of known structure. Automatica 38, 853–860 (2002) 2. Cerone, V., Regruto, D.: Bounding the parameters of linear systems with input backlash. IEEE Transactions on Automatic Control 52, 531–536 (2007)

21

Hammerstein System Identification

365

3. Chaoui, F.Z., Giri, F., Rochdi, Y., Haloua, M., Naitali: A System identification based on Hammerstein model. International Journal of Control 78(6), 430–442 (2005) 4. Giri, F., Chaoui, F.Z., Rochdi, Y.: Parameter identification of a class of Hammerstein plants. Automatica 37, 749–756 (2001) 5. Giri, F., Chaoui, F.Z., Rochdi, Y.: Interval excitation through impulse sequences. A technical lemma. Automatica 38, 457–465 (2002) 6. Giri, F., Chaoui, F.Z., Rochdi, Y.: Recursive identification of systems with hard input nonlinearities of known structure. In: IEEE American Control Conference, Boston, Massachusetts, USA, pp. 4764–4769 (2004) 7. Giri, F., Rochdi, Y., Chaoui, F.Z.: Identification of Hammerstein systems in presence of Hysteresis-Backlash and Hysteresis-Relay nonlinearities. Automatica 44, 767–775 (2008) 8. V¨or¨os, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33, 1141–1146 (1997)

Chapter 22

Bounded Error Identification of Hammerstein Systems with Backlash Vito Cerone, Dario Piga, and Diego Regruto

22.1 Introduction Actuators and sensors commonly used in control systems may exhibit a variety of nonlinear behaviours that may be responsible for undesirable phenomena such as delays and oscillations, which may severely limit both the static and the dynamic performance of the system under control (see, e.g., [22]). In particular, one of the most relevant nonlinearities affecting the performance of industrial machines is the backlash (see Figure 22.1), which commonly occurs in mechanical, hydraulic and magnetic components like bearings, gears and impact dampers (see, e.g., [17]). This nonlinearity, which can be classified as dynamic (i.e., with memory) and hard (i.e. non-differentiable), may arise from unavoidable manufacturing tolerances or sometimes may be deliberately incorporated into the system in order to describe lubrication and thermal expansion effects [3]. The interested reader is referred to [22] for real-life examples of systems with either input or output backlash nonlinearities. In order to cope with the limitations caused by the presence of backlash, either robust or adaptive control techniques can be successfully employed (see, e.g., [7], and [21] respectively), which, on the other hand, require the characterisation of the nonlinear dynamic block. Few contributions can be found in literature on the idenVito Cerone Dipartimento di Automatica e Informatica, Politecnico di Torino, corso Duca degli Abruzzi 24, 10129 Torino, Italy e-mail: [email protected] Dario Piga Dipartimento di Automatica e Informatica, Politecnico di Torino, corso Duca degli Abruzzi 24, 10129 Torino, Italy e-mail: [email protected] Diego Regruto Dipartimento di Automatica e Informatica, Politecnico di Torino, corso Duca degli Abruzzi 24, 10129 Torino, Italy e-mail: [email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 367–382. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com 

368

V. Cerone, D. Piga, and D. Regruto

tification of systems with input backlash. An input-holding scheme is exploited in [20] to compute the system parameters estimates by solving least squares problems, while a separable least squares approach is discussed in [1]. In [8] a consistent estimator, based on careful selection of the system parametrisation and the input signal, is presented, whereas an iterative algorithm relying on a newly proposed model for the backlash nonlinearity is discussed in [23]. Although the most common assumption in system identification is that measurement errors are statistically described, a worthwhile alternative is the bounded-errors or set-membership characterisation, where uncertainties are assumed to belong to a given set. In this context all parameters consistent with measurements, error bounds and the assumed model structure are feasible solutions of the identification problem. The interested reader can refer to survey papers [15, 24] and book [14] for a thorough presentation of the main theoretical basis. In this chapter, the procedure for the identification of linear systems with input backlash presented in [6] is reviewed and improved. More specifically, the problem of bounding the parameters of a stable, single-input single-output (SISO) discrete time linear system with unknown input backlash (see Figure 22.2) in the presence of bounded output measurement error is considered, under the common assumption that the inner signal x(t) is not supposed to be measurable. The chapter is organised as follows. Section 22.2 is devoted to the problem formulation. In Section 22.3, parameters of the nonlinear block are tightly bounded using input-output data from the steady-state response of the system to a collection of square wave inputs. Then, in Section 22.4, through a dynamic experiment, for all ut belonging to a suitable pseudo random binary signal (PRBS) sequence {ut }, we compute tight bounds on the inner signal, which are used to bound the parameters of the linear part together with noisy output measurements. Recently proposed relaxation techniques based on linear matrix inequalities (LMIs) are exploited in the identification of the linear block parameters, providing a significant improvement over the algorithm proposed in [6]. A simulated example is reported in Section 22.5.

22.2 Problem Formulation Let us consider the Hammerstein system depicted in Figure 22.2, where the nonlinear block that transforms the input signal ut into the unmeasurable inner variable xt is a backlash described by (see, e.g., [22]) ⎧ ⎪ ⎨ml (ut + cl ) for ut ≤ zl , xt = mr (ut − cr ) for ut ≥ zr , (22.1) ⎪ ⎩ xt−1 for zl < ut < zr , where ml > 0, mr > 0, cl > 0, cr > 0 are constant parameters characterising the backlash and . xt−1 . xt−1 zl = − cl , zr = + cr , (22.2) ml mr

22

Bounded Error Identification of Hammerstein Systems with Backlash xt

6

mr

ml



−cl

369







cr

ut



 

Fig. 22.1: Backlash characteristic

are the u-axis values of the intersections between the two lines with slopes ml and mr and the horizontal inner segment containing xt−1 . The backlash characteristic is depicted in Figure 22.1. The block that maps xt into the noise-free output wt is a discrete-time linear dynamic SISO system defined by wt = G (q−1 )xt =

B(q−1 ) xt , A (q−1 )

(22.3)

where A (q−1 ) = 1 + a1 q−1 + . . . + ana q−na and B(q−1 ) = b0 + b1 q−1 + . . . + bnb q−nb are polynomials in the backward shift operator q−1 , (q−1wt = wt−1 ). Furthermore, the following common assumptions are made: A1 A2 A3

the linear system is asymptotically stable (see, e.g., [10, 11, 12, 19, 20]); the steady-state gain of the linear block is not equal to zero (see, e.g., [11, 12, 20]); a rough upper bound of the settling time is available (see, e.g., [9]).

As ordinarily assumed in block-oriented nonlinear system identification, the inner signal xt is not supposed to be measurable. Therefore, identification of the Hammerstein system described by (22.1) - (22.3) relies only on input-output data. Here, we assume that the input signal ut is exactly known, while measurements yt of output wt are corrupted by bounded additive noise according to yt = wt + ηt ,

(22.4)

| ηt |≤ Δ ηt .

(22.5)

where

370

V. Cerone, D. Piga, and D. Regruto

Let γ ∈ R4 and θ ∈ R p be the unknown parameter vectors to be estimated, defined as

γT θT

. = . =

[ γ1 γ2 γ3 γ4 ] = [ ml cl mr cr ] ,

(22.6)

[ a1 . . . ana b0 b1 . . . bnb ] ,

(22.7)

where na + nb + 1 = p. It is worth noting that the parametrisation of the structure depicted in Figure 22.2 is not unique. In fact, given the pair of subsystems G˜(q−1 ), N˜ (ut , γ˜), any Hammerstein system (see Figure 22.2) with G (q−1 ) = α −1 G˜(q−1 ) and N (wt , γ ) = N˜ (wt , α γ˜) provides the same input-output behaviour for any nonzero and finite constant α ∈ R. In order to get a unique parametrisation, in this work we assume that the steady-state gain g of the linear block G (q−1 ) be one, that is: nb

∑ bj

g=

j=0 na

1 + ∑ ai

= 1.

(22.8)

i=1

In the next sections we describe a two-stage procedure for deriving lower and upper bounds of parameters γ and θ , consistently with the assumed model structure, given measurements and uncertainty bounds. ηt ut

-

N (·)

xt

−1

- B(q )

A (q−1 )

+ yt wt + ? -

Fig. 22.2: Hammerstein system with backlash

22.3 Assessment of Tight Bounds on the Nonlinear Static Block Parameters In this section we describe the first step of the proposed identification procedure where steady-state operating conditions are exploited to bound the parameters of the backlash. We apply to the system a set of square wave inputs with M different amplitudes and collect 2M steady-state values of the noisy output. More precisely, for each value of the input square wave amplitude, one steady-state output sample is collected on the positive half-wave of the input and one steady-state output measurement is collected on the negative half-wave. Because the backlash deadzone is unknown, the input amplitude must be chosen as large as to guarantee that the output shows any nonzero response. By combining Eqs. (22.1), (22.3), (22.4) and (22.8) under assumptions A1, A2 and A3 stated in Section 22.2, we obtain the following input-output description of the system in Figure 22.2 in steady-state operating conditions:

22

Bounded Error Identification of Hammerstein Systems with Backlash

w¯ i−1 + cr , mr i = 1, . . . , M;

(22.9)

w¯ j−1 − cl , ml j = 1, . . . , M;

(22.10)

w¯ i = mr (u¯i − cr ) for u¯i ≥ y¯i = w¯ i + η¯ i ,

w¯ j = ml (u¯ j + cl ) for u¯ j ≤ y¯ j = w¯ j + η¯ j ,

371

where the triplets {u¯i , y¯i , η¯ i } and {u¯ j , y¯ j , η¯ j } are collections of steady-state values of the known input signal, output observation and measurement error taken during the positive and the negative square wave respectively. As can be noted, Eqs. (22.9) and (22.10) depend only on the backlash parameters, thus the identification of γ can be carried out leaving aside the dynamics of the linear block. A block diagram description of Equation (22.10) is depicted in Figure 22.3; an analogous schematic representation also hold for Equation (22.9). Because (22.9) depends only on the right side backlash parameters mr and cr , while (22.10) involves only ml and cl , the overall feasible parameter region of the backlash can be written as the Cartesian product of two sets, that is . (22.11) Dγ = Dγr × Dγl , where  .  Dγr = (mr , cr ) ∈ R2+ : y¯i = mr (u¯i − cr ) + η¯ i , | η¯ i |≤ Δ η¯ i ; i = 1, . . . , M , (22.12)  .  Dγl = (ml , cl ) ∈ R2+ : y¯ j = ml (u¯ j + cl ) + η¯ j , | η¯ j |≤ Δ η¯ j ; j = 1, . . . , M , (22.13) {Δ η¯ i } and {Δ η¯ j } are the sequences of bounds on measurements uncertainty.

u¯ j

-

N (·)

η¯ j + y¯ j w¯ j + ? -

Fig. 22.3: Hammerstein system with backlash in steady-state operating conditions

Remark 22.1. Dγr and Dγl are 2-dimensional disjoint sets lying on the (mr , cr )−plane and the (ml , cl )−plane respectively, which means that they can be handled separately. It is worth noting that Dγr and Dγl are 2-dimensional sets enjoying the same topological features and the same mathematical properties. Therefore, the results derived in the rest of the paper for one of the two sets, say Dγl , also hold for the other set (Dγr ). Remark 22.2. Note that Dγr and Dγl are bounded sets as far as at least two measurements with different input amplitude are collected.

372

V. Cerone, D. Piga, and D. Regruto

An exact description of the feasible parameter set Dγl in terms of edges and vertices is presented below together with an orthotopic outer-bounding set providing tight parameter uncertainty intervals. Introductory definitions and preliminary results are first given.

22.3.1 Definitions and Preliminary Results − Definition 22.1. Let h+ l (u j ) and hl (u j ) be the constraints boundaries defining the FPS Dγl corresponding to the j-th sets of data:

. h+ l (u j ) = . h− l (u j ) =

  (ml , cl ) ∈ R2+ : y j + Δ η j = ml (u j − cl ) ,   (ml , cl ) ∈ R2+ : y j − Δ ηs = ml (u j − cl ) .

(22.14) (22.15)

. Definition 22.2. Boundary of Dγl = ∂ Dγl . − Definition 22.3. The constraints boundaries h+ l (u j ) and hl (u j ) are said to be active if their intersections with ∂ Dγl is not the empty set:

h+ l (u j ) h− l (u j )

-

∂ Dγl = 0/ ⇐⇒ h+ l (u j ) is active.

(22.16)

∂ Dγl = 0/ ⇐⇒ h− l (u j ) is active.

(22.17)

− Remark 22.3. The constraints boundaries h+ l (u j ) and hl (u j ) may either intersect l l ∂ Dγ or be external to Dγ .

Definition 22.4 (Edges of Dγl ). $ % . + h˜ + Dγl = ml , cl ∈ Dγl : y j + Δ η j = ml (u j − cl ) , l (u j ) = hl (u j )

(22.18)

$ % . − l l h˜ − (u ) = h (u ) D = m , c ∈ D : y − Δ η = m (u − c ) . j j j j l l l l s γ γ l l

(22.19)

Definition 22.5 (Constraints intersections). The set of all pairs (ml , cl ) ∈ R2+ where intersections among the constraints occur is $ . − − Iγl = (ml , cl ) ∈ R2+ : {h+ {h+ / r (uρ ), hr (uρ )} r (uσ ), hr (uσ )} = 0; % (22.20) ρ , σ = 1, . . . , M; ρ = σ . Definition 22.6 (Vertices of Dγl ). The set of all vertices of Dγl is defined as the set of all intersection couples belonging to the feasible parameter set Dγl : . V (Dγl ) = Iγl Dγl .

(22.21)

22

Bounded Error Identification of Hammerstein Systems with Backlash

373

22.3.2 Exact Description of Dγl An exact description of Dγl can be given in terms of edges, each one being described, from a practical point of view, as a subset of an active constraint lying between two vertices. An effective procedure for deriving active constraints, vertices and edges of Dγl is reported in the Appendix.

22.3.3 Tight Orthotope Description of Dγl Unfortunately, the exact description of Dγl provided by edges could be not so easy to handle. A somewhat more practical, although approximate, description can be obtained by computing the following tight orthotope outer-bounding set Pγl containing Dγl :  .  Pγl = γ ∈ R2+ : γ j = γ cj + δ γ j , | δ γ j |≤ Δ γ j , j = 1, 2 , where

min max . γj + γj , γ jc = 2

max min . (γ j − γ j ) , Δγj = 2

. . = min γ j , γ max = max γ j . γ min j j γ ∈Dγl

γ ∈Dγl

(22.22)

(22.23) (22.24)

Because the constraints defining Dγl are nonconvex in ml and cl , standard nonlinear optimisation tools (gradient method, Newton method, etc.) cannot be used to solve problems (22.24) since they can trap in local minima, which may result arbitrary far from the global one. Thus, parameter uncertainty intervals obtained using these tools are not guaranteed to contain the true unknown parameters, which is a key requirement of any bounded-error identification method. Global optimal solutions to problems (22.24) can be computed thanks to the result reported below. Proposition 22.1. The global optimal solutions to problems (22.24) occur on the vertices of Dγl . Proof. First (i) we notice that each level curve of functionals (22.24) — parallel lines to ml -axis and cl -axis respectively — intersect the constraint boundaries (22.14) and (22.15) only once. Next, (ii) objective functions in (22.24) are monotone in Dγl , which implies that the optimal solution lies on the boundary of Dγl . Thanks to (i) the optimal value cannot lie on one edge between two vertices: if that was true, it would mean that there is a suboptimal value where the functional intersect the edge twice: that would contradict (i). Then the global optimal solutions of   problems (22.24) can only occur on the vertices of Dγl .

374

V. Cerone, D. Piga, and D. Regruto

Remark 22.4. Given the set of vertices V (Dγl ) computed via Algorithm 22.1 reported in the Appendix, evaluation of (22.24) is an easy task because it only requires the computation of (a) the objective functions on a set of at most 4M points and (b) the maximum over a set of real-valued elements.

22.4 Bounding the Parameters of the Linear Dynamic Model In the second stage of the presented procedure, parameters bounds of the linear dynamic part are computed, using a PRBS input {ut } taking values ±u , with u > 0. Thanks to its properties, this kind of input sequence has been successfully used to identify linear dynamic systems (see, e.g., [13, 18]) while, in general, it is inappropriate for the identification of nonlinear systems (see, e.g., [1, 16]). However, as shown in [2], a PRBS input can be effectively employed to decouple the linear and the nonlinear parts in the identification of Hammerstein models with a static nonlinearity. In this chapter we show that the use of a PRBS sequence is profitable for the identification of linear system with input backlash. The key idea underlying the choice of the input sequence {ut } is based on the following result. Result 1. Let us consider a PRBS input {ut } whose levels are ±u . If u > cr and −u < −cl , then the output sequence {xt } of the backlash described by (22.1) is still a PRBS with levels x¯ = mr (u − cr ), x = ml (u − cl ). Proof. The proof of Result 1 follows from the backlash mathematical model (22.1)   assuming u = u with u > cr and −u < −cl . From Result 1, it can be noted that the choice of suitable PRBS input levels ±u depends on the unknown parameters cr and cl , which are bounded in the first stage of the presented procedure. Therefore, in order to satisfy hypotheses of Result 1, and −u < −cmax values of u , such that u > cmax r l , are chosen. Given the exact description of Dγr and Dγl , tight bounds on the amplitudes x¯ and x of unmeasurable inner signal xt can be defined as . x¯min = min r mr (u − cr ), for u ≥ cmax r , mr ,cr ∈Dγ

max



. = max r mr (u − cr ), for u ≥ cmax r ,

(22.25)

mr ,cr ∈Dγ

. xmin = min ml (u + cl ), for − u ≤ −cmax l , ml ,cl ∈Dγl

x

max

. = max m( u + cl ), for − u ≤ −cmax l .

(22.26)

ml ,cl ∈Dγl

Computation of bounds in (22.25) and (22.26) requires, at least in principle, the solution to two nonconvex optimisation problems with two variables and 4M nonlinear inequality constraints. Thanks to Proposition 22.2 reported below, the global optimal solution is guaranteed to be achieved.

22

Bounded Error Identification of Hammerstein Systems with Backlash

375

Definition 22.7. Let us define the x-level curve of the objective function of problem (22.25) as  .  (22.27) gr (u , x) = (mr , cr ) ∈ R2+ : x = mr (u − cr ) , and the x-level curve of the objective function of problem (22.26) as  .  gl (u , x) = (ml , cl ) ∈ R2+ : x = ml (−u + cl ) .

(22.28)

Proposition 22.2. The global optimal solutions to problems (22.25) and (22.26) occur on the vertices of Dγr and Dγl , respectively. Proof. First (i) we notice that each x-level curve gl (u , x) intersect each constraint boundary in (22.14) and (22.15) only once. Next, (ii) the objective function x = ml (−u + cl ) is a monotone function in Dγl , which implies that the optimal solution lies on the boundary of Dγl . Thanks to (i) the optimal value cannot lie on an edge between two vertices: if that was true, it would mean that there is a suboptimal value where the functional intersect the edge twice: that would contradict (i). Then the global optimal solutions to problems (22.26) can only occur on the vertices of   Dγl . Similar considerations apply to the right side of the backlash. By defining the central estimate x¯tc of x¯t and the uncertainty bound Δ x¯t as min + x¯max . x¯ x¯tc = , 2

max − x¯min . x¯ , Δ x¯t = 2

(22.29)

as well as the central estimate xtc and the uncertainty bound Δ xt of xt as min + xmax . x xtc = , 2

max − xmin . x , Δ xt = 2

(22.30)

the following relation can be established between the unknown inner signals xt and the corresponding central value xtc : xtc = xt + δ xt , | δ xt |≤ Δ xt ,

(22.31)

where xtc

= x¯tc , Δ xt

xtc

= xtc ,

Δ xt

= Δ x¯t if ut = u ,

(22.32)

= Δ xt if ut = −u . 

(22.33)

Given the uncertain inner sequence {xtc } and the noise-corrupted output sequence {yt }, the problem of parameters bounds evaluation of the linear system can be formulated in the framework of bounded errors-in-variables (EIV) as shown in Figure 22.4, i.e. the identification of linear dynamic models where both the input and the output are affected by bounded uncertainties. The exact description of the feasible parameter region Dθ for the linear system, i.e. the set of all linear model parameters θ consistent with the assumed model structure, input and output signals xt and yt and error bounds Δ xt and Δ ηt , is

376

V. Cerone, D. Piga, and D. Regruto

$ . Dθ = θ ∈ R p : A (q−1 )(yt − ηt ) = B(q−1 )(xtc − δ xt ); % g = 1; | ηt |≤ Δ ηt ; | δ xt |≤ Δ xt ;t = 1, . . . , N ,

(22.34)

where N is the length of data sequence and g = 1 accounts for condition (22.8) on the steady-state gain. The parameter uncertainty intervals defined as  .  (22.35) PUI j = θ j , θ j , where

θj

. =

θj

. =

min θ j ,

(22.36)

max θ j ,

(22.37)

θ ∈Dθ θ ∈Dθ

can be computed finding the global optimal solution to the constrained optimisation problems (22.36) and (22.37). Because Dθ is a nonconvex set defined by nonlinear inequalities in the variables θ , ηt and δ xt , numerical optimisation tools cannot be employed to solve problems (22.36) and (22.37) because they can trap in local minima/maxima, which may prevent the computed uncertainty intervals from containing the true parameter θ j . One possible solution to overcome this problem is to relax (22.36) and (22.37) to convex problems to obtain a lower (upper) bound of θ j (θ j ). In paper [6] the technique presented in [4], which provides a polytopic outer approximation of the FPS Dθ , is used to derive relaxed parameter uncertainty intervals through the solution of linear programming problems. In this Section, we exploit the algorithm for the computation of the PUIs (22.35) presented in [5], which is based on the approximation of the original optimisation problems (22.36) and (22.37) by a hierarchy of convex LMI relaxations. Relaxed parameter uncertainty intervals obtained through the application of such a technique are guaranteed to be less conservative than those computed in [6] and to contain the true unknown parameter θ j . Besides, the computed relaxed bounds are guaranteed to converge monotonically to the tight ones defined in (22.36) and (22.37) as the number of successive LMI relaxations, the relaxation order δ , increases (see [5] for details).

−1 - B(q )

xt

−1

A (q

+? δ xt + xtc

?

wt

-

)

+ ηt ?  +

yt

?

Fig. 22.4: Errors-in-variables basic setup for linear dynamic system

22

Bounded Error Identification of Hammerstein Systems with Backlash

377

22.5 A Simulated Example In this section we illustrate the presented parameter bounding procedure through a numerical example. The simulated system is characterised by a linear block with A (q−1 ) = (1 − 0.76q−1 + 0.82q−2), B(q−1 ) = (2.15q−1 − 1.09q−2) and a nonsymmetric backlash with ml = 0.25, mr = 0.26, cl = 0.0628, cr = 0.0489. Thus, the true parameters vectors are γ = [ml cl mr cr ]T = [0.25 0.0628 0.26 0.0489]T and θ = [a1 a2 b1 b2 ]T = [−0.76 0.82 2.15 − 1.09]T . It must be pointed out that the backlash parameters are realistically chosen. In fact, we consider the parameters of a real world precision gearbox which features a gear ratio equal to 0.25 and a deadzone as large as 0.0524rad (≈ 3o ) and simulate a possible fictitious nonsymmetric backlash with gear ratio ml = 0.25, mr = 0.26 and deadzone cl = 0.0628 (≈ 3.6o ), cr = 0.0489 (≈ 2.8o ). Bounded absolute output errors are considered when simulating the collection of both steady state data {u¯s , y¯s }, and transient sequence {ut , yt }. Uncertainties η¯ s and ηt are random sequences belonging to the uniform Table 22.1: Nonlinear block parameters evaluation: central estimates (γ cj ), parameter bounds max (γ min j , γ j ) and parameter uncertainty bounds Δ γ j

Δη 0.005

SNR (db) 54

0.02

42

0.05

34

0.15

25

0.2

22

0.3

18

γj ml cl mr cr ml cl mr cr ml cl mr cr ml cl mr cr ml cl mr cr ml cl mr cr

True Value 0.2500 0.0628 0.2600 0.0489 0.25000 0.06280 0.26000 0.04890 0.2500 0.0628 0.2600 0.0489 0.2500 0.06280 0.2600 0.04890 0.25000 0.06280 0.26000 0.04890 0.25000 0.06280 0.26000 0.04890

γ min j

γ cj

γ max j

Δγj

0.2500 0.0624 0.2600 0.0484 0.2496 0.0512 0.2596 0.0377 0.2493 0.0576 0.2593 0.0438 0.2488 0.0504 0.2588 0.0369 0.2493 0.0444 0.2593 0.0311 0.2490 0.0352 0.2590 0.0223

0.2500 0.0628 0.2600 0.0489 0.2499 0.0602 0.2599 0.0464 0.2501 0.0649 0.2601 0.0509 0.2495 0.0661 0.2595 0.0520 0.2503 0.0626 0.2603 0.0487 0.2504 0.0625 0.2604 0.0486

0.2500 0.0633 0.2600 0.0493 0.2503 0.0693 0.2603 0.0550 0.2509 0.0722 0.2609 0.0579 0.2503 0.0818 0.2603 0.0671 0.2512 0.0809 0.2612 0.0662 0.2518 0.0898 0.2618 0.0749

0.0000 0.0005 0.0000 0.0004 0.0003 0.0090 0.0003 0.0087 0.0008 0.0073 0.0008 0.0070 0.0007 0.0157 0.0007 0.0151 0.0009 0.0182 0.0009 0.0175 0.0014 0.0273 0.0014 0.0263

378

V. Cerone, D. Piga, and D. Regruto

Table 22.2: Linear block parameters evaluation: central estimates (θ cj ), parameter bounds max (θ min j , θ j ) and parameter uncertainty bounds Δ θ j for N = 100

Δη 0.005

SNR (db) 50

0.02

37

0.05

29

0.15

19

0.2

17

0.3

14

θj a1 a2 b1 b2 a1 a2 b1 b2 a1 a2 b1 b2 a1 a2 b1 b2 a1 a2 b1 b2 a1 a2 b1 b2

True Value -0.7600 0.8200 2.1500 -1.0900 -0.7600 0.8200 2.1500 -1.0900 -0.7600 0.8200 2.1500 -1.0900 -0.7600 0.8200 2.1500 -1.0900 -0.7600 0.8200 2.1500 -1.0900 -0.7600 0.8200 2.1500 -1.0900

θ min j

θ cj

θ max j

Δθj

-0.7605 0.8196 2.1470 -1.0943 -0.7666 0.8152 2.1187 -1.1283 -0.7664 0.8142 2.1201 -1.1461 -0.7703 0.8050 2.0725 -1.1953 -0.7814 0.8015 2.0324 -1.2712 -0.7933 0.7914 1.9652 -1.3676

-0.7600 0.8201 2.1503 -1.0906 -0.7594 0.8205 2.1506 -1.0931 -0.7601 0.8191 2.1581 -1.1005 -0.760 0.8206 2.1582 -1.0985 -0.7610 0.8224 2.1680 -1.1176 -0.7606 0.8240 2.1679 -1.1346

-0.7596 0.8205 2.1535 -1.0860 -0.7522 0.8259 2.1825 -1.0580 -0.7539 0.8240 2.1960 -1.0550 -0.7497 0.8353 2.2439 -1.0017 -0.7405 0.8433 2.3036 -0.9639 -0.7278 0.8567 2.3707 -0.9016

0.0004 0.0004 0.0033 0.0037 0.0072 0.0054 0.0319 0.0352 0.0063 0.0049 0.0380 0.0455 0.0103 0.0147 0.0857 0.0968 0.0204 0.0209 0.1356 0.1537 0.0328 0.0327 0.2027 0.2330

distributions U[−Δ η¯ s , +Δ η¯ s ] and U[−Δ ηt , +Δ ηt ], respectively. Bounds on steadystate and transient output measurement errors are supposed to have the same value, . i.e., Δ η¯ s = Δ ηt = Δ η . The numerical example is performed for six different values of Δ η . From the simulated steady-state data {w¯ s , η¯ s } and the transient sequence {wt , ηt }, the signal to noise rations SNR and SNR are evaluated, respectively, as  <  M M . 2 2 SNR = 10 log ∑ w¯ s (22.38) ∑ η¯ s ,  SNR

. = 10 log

s=1 N



t=1

< wt2

s=1 N





ηt2

.

(22.39)

t=1

For a given Δ η , the length of steady-state and the transient data are M = 50 and N = [100, 300] respectively. Parameters bounds of the linear system are computed

22

Bounded Error Identification of Hammerstein Systems with Backlash

379

Table 22.3: Linear block parameters evaluation: central estimates (θ cj ), parameter bounds max (θ min j , θ j ) and parameter uncertainty bounds Δ θ j for N = 300

Δ η SNR θ j (db) 0.005 50 a1 a2 b1 b2 0.02 38 a1 a2 b1 b2 0.05 30 a1 a2 b1 b2 0.15 20 a1 a2 b1 b2 0.2 18 a1 a2 b1 b2 0.3 14 a1 a2 b1 b2

True Value -0.7600 0.8200 2.1500 -1.0900 -0.7600 0.8200 2.1500 -1.0900 -0.7600 0.8200 2.1500 -1.0900 -0.7600 0.8200 2.1500 -1.0900 -0.7600 0.8200 2.1500 -1.0900 -0.7600 0.8200 2.1500 -1.0900

θ min j

θ cj

θ max j

Δθj

-0.7603 0.8196 2.1464 -1.0924 -0.7643 0.8158 2.1135 -1.1145 -0.7641 0.8153 2.1266 -1.1273 -0.7708 0.8110 2.0897 -1.1782 -0.7721 0.8034 1.9965 -1.2069 -0.7794 0.7942 1.9154 -1.2735

-0.7600 0.8200 2.1494 -1.0892 -0.7598 0.8202 2.1435 -1.0832 -0.7597 0.8200 2.1559 -1.0964 -0.7600 0.8213 2.1599 -1.0993 -0.7605 0.8216 2.1302 -1.0651 -0.7605 0.8226 2.1234 -1.0540

-0.7598 0.8204 2.1525 -1.0861 -0.7552 0.8247 2.1734 -1.0519 -0.7552 0.8246 2.1853 -1.0655 -0.7493 0.8315 2.2300 -1.0203 -0.7488 0.8398 2.2638 -0.9233 -0.7416 0.8510 2.3314 -0.8345

0.0002 0.0004 0.0030 0.0031 0.0045 0.0044 0.0300 0.0313 0.0045 0.0047 0.0294 0.0309 0.0107 0.0102 0.0701 0.0790 0.0116 0.0182 0.1336 0.1418 0.0189 0.0284 0.2080 0.2195

Results on the backlash parameters evaluation are reported in Table 22.1, while Table 22.2 and Table 22.3 show results on the linear block parameters estimation for a transient-data sequence length N equal to 100 and 300, respectively.

From Tables 22.1, 22.2 and 22.3 it can be noted that the true parameters γ_j and θ_j belong to the computed intervals [γ_j^min, γ_j^max] and [θ_j^min, θ_j^max], respectively, for j = 1, ..., 4. It must be pointed out that, although in principle the estimation algorithm is not guaranteed to provide tight bounds on the parameters of the linear block for a finite value of the relaxation order δ, in practice satisfactory bounds on the parameters θ are obtained also for low signal-to-noise ratios (SNR < 20 dB) and for a small number of experimental measurements (N = 100).


22.6 Conclusion

A two-stage procedure for bounding the parameters of a single-input single-output Hammerstein system, where the nonlinear block is a backlash and the output measurements are corrupted by bounded noise, is presented in this chapter. The proposed approach is based on the selection of the input signal in order to decouple the nonlinear and the linear block parameters.

In the first stage, a set of square-wave input signals with different amplitudes is applied and the corresponding steady-state output samples are collected, from which a characterisation of the backlash feasible parameter set is derived, thanks to the fact that in steady-state operating conditions the input-output mapping does not depend on the linear block. On the basis of the derived backlash feasible parameter set, parameter uncertainty intervals are evaluated by computing the globally optimal solutions of nonconvex optimisation problems.

In the second stage, a method for computing bounds on the unmeasurable inner signal is presented for the case where a pseudo-random binary signal is applied to the system. The obtained inner-signal bounds, together with the noisy output measurements, are used to estimate the linear block parameters through the solution of a suitable errors-in-variables identification scheme. Recent results on relaxation techniques based on linear matrix inequalities are profitably used to compute the linear block parameter uncertainty intervals, which are guaranteed to converge monotonically to the tight ones as the relaxation order increases.

The effectiveness of the procedure discussed in this chapter is shown through a simulated example, where satisfactory parameter bounds are obtained also for a small number of experimental data and in the presence of low signal-to-noise ratios.

Appendix

In this appendix a procedure for the computation of the vertices and active constraints defining the feasible parameter set Dγl is presented. The following additional symbols and quantities are introduced: HL is a list of active constraint boundaries, that is, each element HL(k) of the list is an active constraint boundary; the expression X ← {z} means that the element z is included in the set or list X; Dγl(s) is the set of backlash parameters that are consistent with the first s measurements, the error bound and the assumed backlash model structure. A formal description of Dγl(s) is:

D_{\gamma l}(s) = \left\{ (m_l, c_l) \in \mathbb{R}^2_+ : \bar{y}_j = m_l(\bar{u}_j - c_l) + \bar{\eta}_j,\ |\bar{\eta}_j| \le \Delta\bar{\eta}_j;\ j = 1, \ldots, s \right\}.   (22.40)

The proposed procedure, Algorithm 22.1 below, works in four stages. First, the active constraint boundaries and the vertices of the set Dγl are characterised exploiting Definitions 1, 3, 5 and 6. Then, for each new measurement u_s, the intersections among the constraint boundaries h+(u_s) and h−(u_s) and the active constraint boundaries contained in the list HL are computed; these intersections are temporarily included in the set V(Dγl); the constraint boundaries h+(u_s) and h−(u_s) are included in the list HL. Further, vertices of Dγl(s) are obtained by rejecting the constraint


Algorithm 22.1: Computation of vertices and active constraints of Dγl
1.  begin
2.  V(Dγl) ← {h+(u1) ∩ h+(u2)}.
3.  V(Dγl) ← {h+(u1) ∩ h−(u2)}.
4.  V(Dγl) ← {h−(u1) ∩ h+(u2)}.
5.  V(Dγl) ← {h−(u1) ∩ h−(u2)}.
6.  HL ← {h+(u1), h+(u2), h−(u1), h−(u2)}.
7.  for s = 3 : 1 : M
8.    L = length(HL);
9.    q = 0;
10.   for z = 1 : 1 : L
11.     V(Dγl) ← {h+(u_s) ∩ HL(z)}.
12.     if h+(u_s) ∉ HL then
13.       HL ← {h+(u_s)}.
14.     end if
15.     V(Dγl) ← {h−(u_s) ∩ HL(z)}.
16.     if h−(u_s) ∉ HL then
17.       HL ← {h−(u_s)}.
18.     end if
19.   end for
20.   V(Dγl) = V(Dγl) ∩ Dγl(s).
21.   for k = 1 : 1 : length(HL)
22.     if ∃ j ≠ k : {HL(k) ∩ HL(j)} ∈ V(Dγl) then
23.       Haux(q) = HL(k).
24.       q = q + 1.
25.     end if
26.   end for
27.   HL = Haux.
28. end for
29. return HL.
30. return V(Dγl).
31. end

boundary intersections that do not satisfy all the constraints generated by the first s measurements, which implicitly define Dγl(s). Finally, HL is updated by retaining only the constraint boundaries whose pairwise intersections are vertices of Dγl(s).
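To make the geometry concrete, the following Python sketch enumerates the vertices of Dγl by brute force: every pair of constraint boundaries h±(u_j), taken as the curves ȳ_j = m_l(ū_j − c_l) ± Δη̄ in the (m_l, c_l) plane consistent with (22.40), is intersected in closed form, and the intersections that satisfy all measurement constraints are kept. This illustrates the underlying computation only; it is not the incremental list update of Algorithm 22.1, a common error bound Δη̄ is assumed, and all names are our own.

    import itertools
    import numpy as np

    def boundary_intersection(u_i, y_i, s_i, u_j, y_j, s_j, d_eta):
        # Solve y_i = m*(u_i - c) + s_i*d_eta and y_j = m*(u_j - c) + s_j*d_eta
        # for the backlash parameters (m, c); s = +1/-1 selects h+ / h-.
        if np.isclose(u_i, u_j):
            return None                    # boundaries from the same input level
        m = ((y_i - s_i * d_eta) - (y_j - s_j * d_eta)) / (u_i - u_j)
        if m <= 0.0:
            return None                    # the backlash slope must be positive
        c = u_i - (y_i - s_i * d_eta) / m
        return m, c

    def feasible(m, c, u, y, d_eta, tol=1e-9):
        # (m, c) is a vertex candidate only if every measurement constraint
        # |y_j - m*(u_j - c)| <= d_eta from (22.40) holds.
        return bool(np.all(np.abs(y - m * (u - c)) <= d_eta + tol))

    def vertices_and_active_boundaries(u, y, d_eta):
        # Intersect every pair of constraint boundaries and keep the feasible
        # intersections; the boundaries they lie on are the active ones.
        labels = [(j, s) for j in range(len(u)) for s in (+1, -1)]
        verts, active = [], set()
        for (i, si), (j, sj) in itertools.combinations(labels, 2):
            p = boundary_intersection(u[i], y[i], si, u[j], y[j], sj, d_eta)
            if p is not None and feasible(*p, u, y, d_eta):
                verts.append(p)
                active.update([(i, si), (j, sj)])
        return verts, active

The incremental form in Algorithm 22.1 exists precisely to avoid this quadratic growth in the number of boundary pairs as measurements accumulate.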


Part VIII

Application of Block-oriented Models

Chapter 23

Block Structured Modelling in the Study of the Stretch Reflex

David T. Westwick

23.1 Introduction

Nonlinear system identification has a long history in several disciplines related to biomedical engineering. Much of this can be credited to the seminal textbook written by Marmarelis and Marmarelis [25], which popularised the use of the cross-correlation method for estimating the Wiener kernels of a system driven by a white Gaussian noise input [24]. This, together with the rapidly evolving capabilities of the digital computers of the day, gave a large number of researchers the tools to investigate a wide variety of nonlinear dynamical systems, particularly in the area of sensory physiology. More recent developments have been summarised in a series of multiple-author research volumes [26, 27, 28] and two recent textbooks [29, 43].

The balance of this chapter will be organised as follows. Section 23.2 presents background information on the relationship between the Volterra/Wiener series and several commonly used block structured models. Initial applications of block structured modelling in the study of sensory systems are reviewed in Section 23.3. This will be followed by a discussion of the identification of systems containing high-degree nonlinearities in Section 23.4. Section 23.5 discusses a model developed specifically for the investigation of joint dynamics. Finally, Section 23.6 will summarise the chapter and suggest some open problems.

23.2 Preliminaries

A wide variety of time-invariant nonlinear systems can be represented by a Volterra series. The system is described by a series of Volterra kernels of degrees ranging from 0 to some maximum value L. For a discrete-time system, the output of its ℓth-degree Volterra kernel can be written as:

y_\ell(t) = \sum_{k_1=0}^{T} \sum_{k_2=0}^{T} \cdots \sum_{k_\ell=0}^{T} h_\ell(k_1, k_2, \ldots, k_\ell)\, u(t-k_1)\, u(t-k_2) \cdots u(t-k_\ell)   (23.1)

where u(t) and y_\ell(t) are the input and output, respectively, T is the memory length of the system, and h_\ell(k_1, k_2, \ldots, k_\ell) is the ℓth-degree Volterra kernel. The output of the system is then written as the sum of the outputs of the individual kernels. Thus:

y(t) = \sum_{\ell=0}^{L} y_\ell(t).   (23.2)

Boyd and Chua [4] proved that any time-invariant system could be approximated, to within an arbitrary precision, by a Volterra series model, provided the system had what they defined as a fading memory. While this result established the theoretical validity of the Volterra series model, it says nothing about its practical applicability. It can be shown [25] that the number of model parameters, that is, the number of independent kernel values, in a Volterra series model is given by

N_{par} = \frac{(T + L + 1)!}{(T + 1)!\, L!}.   (23.3)

It is clear in (23.1) that the Volterra series model is linear in the parameters, and can therefore be identified using an ordinary least squares regression. However, the number of parameters given in (23.3) makes it clear that this approach will only be practical for systems with relatively mild nonlinearities (low L), short memories (low T) or both.

A more attractive alternative involves working with block structured models: interconnections of dynamic linear elements and memoryless nonlinearities. Any number of representations can be used for the linear and nonlinear elements. However, to emphasise the relationship to the Volterra series, we will use finite impulse responses (FIR) to represent the dynamic linear elements, and polynomials to represent the nonlinearities.

Consider a Wiener system, a dynamic linear element followed by a memoryless nonlinearity,

y(t) = \sum_{\ell=0}^{L} c(\ell) \left( \sum_{k=0}^{T} h(k)\, u(t-k) \right)^{\ell}   (23.4)

where h(k) are the elements of the FIR filter, and c(\ell) are the polynomial coefficients of the nonlinearity. Comparing (23.4) with (23.1), it is evident that the Volterra kernels of a Wiener system are given by:

h_\ell(k_1, k_2, \ldots, k_\ell) = c(\ell)\, h(k_1)\, h(k_2) \cdots h(k_\ell).   (23.5)

This leads to the observation that the second-order Volterra kernel will be a rank-1 matrix. Furthermore, if one sums a Volterra kernel over all but one of its indices, the result will be proportional to the impulse response of the linear element.
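As a quick numerical check of (23.5) and these two observations, the short Python sketch below (with an illustrative filter and polynomial coefficient of our own choosing) builds the second-degree kernel of a Wiener system and confirms the rank-1 and marginal-sum properties:

    import numpy as np

    h = np.array([1.0, 0.6, 0.25, 0.05])    # illustrative FIR filter
    c2 = 0.8                                 # illustrative 2nd-degree coefficient

    # Second-degree Wiener kernel from (23.5): the outer product c(2)*h*h^T.
    H2 = c2 * np.outer(h, h)

    print(np.linalg.matrix_rank(H2))         # 1: the kernel is a rank-1 matrix
    print(H2.sum(axis=0) / (c2 * h.sum()))   # summing over one index recovers h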


Reversing the order of the linear and nonlinear elements produces a Hammerstein system:

y(t) = \sum_{k=0}^{T} h(k) \sum_{\ell=0}^{L} c(\ell)\, u^{\ell}(t-k),   (23.6)

which has Volterra kernels given by

h_\ell(k_1, k_2, \ldots, k_\ell) = c(\ell)\, h(k_1)\, \delta(k_2 - k_1)\, \delta(k_3 - k_1) \cdots \delta(k_\ell - k_1),   (23.7)

where δ(i − j) is interpreted as a Kronecker delta. Thus, the Volterra kernels of a Hammerstein system will only be non-zero on their main diagonals, which will all be proportional to the impulse response of the linear element.

The Wiener–Hammerstein model often turns up in physiological applications, where it is more commonly referred to either as an LNL model or a sandwich model [22]. The output of a Wiener–Hammerstein model can be written as:

y(t) = \sum_{j=0}^{T-1} g(j) \sum_{\ell=0}^{L} c(\ell) \left( \sum_{k=0}^{T-1} h(k)\, u(t-j-k) \right)^{\ell},   (23.8)

where h(k) and g(j) are the impulse responses of the first and second linear elements, respectively. By comparing (23.8) with (23.1), it is possible to compute the Volterra kernels of the Wiener–Hammerstein model:

h_\ell(k_1, \ldots, k_\ell) = c(\ell) \sum_{j=0}^{T-1} g(j)\, h(k_1 - j)\, h(k_2 - j) \cdots h(k_\ell - j).   (23.9)
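Formula (23.9) is also easy to realise numerically. The sketch below (element values illustrative, names our own) assembles the second-degree LNL kernel as a sum of shifted outer products, which makes its low-rank structure explicit:

    import numpy as np

    def lnl_kernel2(h, g, c2):
        # Second-degree Volterra kernel of an LNL cascade, per (23.9):
        # h2(k1, k2) = c(2) * sum_j g(j) * h(k1 - j) * h(k2 - j).
        T = len(h) + len(g) - 1              # memory of the combined kernel
        hp = np.zeros(T)
        hp[:len(h)] = h                      # zero-padded copy of h
        H2 = np.zeros((T, T))
        for j, gj in enumerate(g):
            hs = np.roll(hp, j)
            hs[:j] = 0.0                     # h delayed by j samples
            H2 += gj * np.outer(hs, hs)
        return c2 * H2

    H2 = lnl_kernel2(h=np.array([1.0, 0.5, 0.2]), g=np.array([0.7, 0.3]), c2=1.0)
    print(np.linalg.matrix_rank(H2))         # at most len(g): a sum of outer products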

23.3 Initial Applications

Block structured models have found widespread application in the study of physiological systems because they can often be used to model highly nonlinear systems using relatively few parameters. Furthermore, it is sometimes possible to map the individual elements to different parts of the system under study. The nervous system encodes physical signals in the firing rates of neurons. This leads to saturation, at both the upper and lower extremes of the input, and perhaps some sort of compression of the signal's dynamic range. Since this process is generally much faster than the rest of the dynamics, it appears to be memoryless when working at time scales that are relevant to the rest of the system. For example, the auditory system is often modelled using a Wiener–Hammerstein cascade [9], two linear filters separated by a static nonlinearity, where the nonlinearity can be associated with the process of encoding the response of the hair cells onto an auditory nerve fibre. Similarly, the stretch reflex, the relationship between the angular velocity of a joint and the electrical activity evoked in its muscles, can be modelled as a Hammerstein cascade [19]. The nonlinearity is believed to model the muscle spindles, which are organs that respond to a stretching of the muscles. In both cases, using a block structured


model provides insight regarding the processes that are happening on either side of the nonlinearities.

Some of the first physiological applications of block structured modelling [37, 38, 39] involved analysing the first- and second-order Wiener kernels from the responses of various cells in the bullfrog retina to both light inputs and electrical stimulation. In models of Type-N amacrine cells, Sakuranaga noted that the second-order kernels could be generated by a Wiener cascade consisting of the system's first-order Wiener kernel followed by a square-law device. This was not true for the Type-C amacrine cells. Later, Naka et al. [33] showed that these second-order kernels could be generated (approximately) by convolving the first-order Wiener kernel of the electrical response between two peripheral cells with the second-order kernel of a Type-N amacrine cell, thus suggesting that the response of the Type-C cells could be explained by a Wiener–Hammerstein model. While the generation of the block structured models helped explain the nature of the transformations occurring in the retina, the accuracy of the models was limited by that of the initial Wiener kernel estimates.

Korenberg [21] developed a highly efficient method, the fast orthogonal algorithm (FOA), for estimating the zero-, first- and second-order terms in a truncated Volterra series model. This development prompted a series of studies of various sensory systems, including the visual system [10, 11] and the stretch receptor in the cockroach hindleg [12, 13]. In these studies, the elements of a Wiener–Hammerstein model were extracted from estimates of the first- and second-order Volterra kernels obtained using the FOA. The first significantly non-zero row in the estimated second-order Volterra kernel was used as an estimate of the initial linear element. This was then deconvolved from the estimated first-order kernel to yield an estimate of the second linear element in the cascade. The nonlinearity could then be recovered via least squares regression.

Several iterative, correlation-based techniques have been proposed for the identification of Hammerstein [17], Wiener [17, 35, 23] and Wiener–Hammerstein [22] models. These techniques all require Gaussian inputs, as they use Bussgang's famous theorem [5] to obtain initial estimates of the linear elements from measurements of the input/output cross-correlation. The methods iterate between fitting the nonlinear elements (via some sort of least squares computation) and the linear elements (using correlation-based calculations). Although these procedures generally produce acceptable results, none of them has been proved to converge, much less to any sort of optimal solution.

Kearney and Hunter [18] used system identification techniques to fit a nonlinear model between the angular velocity of the ankle joint and the electrical activity that it evoked over the Gastrocnemius and Soleus (GS) muscles. In this study, they compared linear impulse response models with Hammerstein models obtained by fitting linear models between half-rectified velocity and the resulting myoelectric activity. The choice of a half-wave rectifier nonlinearity was motivated by knowledge of the underlying physiology. In particular, there was experimental evidence to suggest that the muscle spindle was expected to respond to stretching of the muscle, and to be velocity sensitive [7]. In a later study, Kearney and Hunter [19] estimated the first and


second-order Wiener kernels between the ankle angular velocity and GS-EMG, and found that the second-order kernel was dominated by its diagonal values, suggesting a Hammerstein structure. They then applied the iterative algorithm suggested by [17] to estimate the linear and nonlinear elements in a Hammerstein cascade. The predictive power of the resulting model was found to be similar to that of the ad-hoc Hammerstein model identified in the earlier study [18].

23.4 Hard Nonlinearities

Traditionally, the nonlinearities in block structured models have been represented by polynomials [34, 6]. This underscores the connection to the kernels in the Volterra series. However, polynomial estimation can be problematic, as the resulting linear regressions can become very poorly conditioned. As a result, Hunter and Korenberg [17] suggested using orthogonal polynomials instead. In particular, they recommended using Chebyshev polynomials, since these tend to produce well conditioned estimation problems for a wide variety of input distributions [2]. Figure 23.1 shows the first 6 elements of three polynomial bases: a power series, Hermite polynomials, and Chebyshev polynomials.

Suppose that we wish to fit a polynomial to input-output data from a static nonlinearity. Consider two inputs, each 1000 points long: one with a standard normal distribution, and one that is uniformly distributed. Table 23.1 shows the condition numbers of the regressions that would fit polynomials of degrees 0 through 10 using these two inputs and each of the three polynomial bases shown in Figure 23.1. Notice that the regressions for the Chebyshev polynomials remain well conditioned for all polynomial degrees tested, whereas the other two bases quickly become severely ill-conditioned. Furthermore, note that the Chebyshev basis remains better conditioned with a uniform input than with a normally distributed input.

Table 23.1: Condition numbers for the example polynomial regression obtained using power series, Hermite, and Chebyshev basis functions

Poly.   Normal distribution                     Uniform distribution
deg.    Power series  Hermite   Chebyshev       Power series  Hermite   Chebyshev
0       1.00          1.00      1.00            1.00          1.00      1.00
1       1.05          1.05      2.85            1.00          1.00      1.74
2       2.07          1.14      4.93            2.91          1.02      2.02
3       3.37          1.33      7.63            5.89          3.36      2.52
4       6.02          1.94      9.11            10.3          9.51      2.67
5       13.6          3.87      10.5            29.3          32.1      3.05
6       27.7          8.70      11.0            55.5          147.      3.20
7       62.4          23.2      11.4            149.          539.      3.48
8       140.          68.2      11.5            305.          2,789.    3.57
9       335.          208.      11.6            804.          11,944.   3.77
10      773.          730.      11.6            1,707.        63,793.   3.81
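This conditioning experiment is straightforward to reproduce in outline. The Python sketch below uses numpy's polynomial bases; the sample size and seed are arbitrary, and since the exact conditioning convention behind Table 23.1 is not stated, the printed magnitudes should be read qualitatively rather than compared entry by entry:

    import numpy as np
    from numpy.polynomial import polynomial as P, chebyshev as C, hermite_e as He

    rng = np.random.default_rng(0)
    x_norm = rng.standard_normal(1000)       # standard normal input
    x_unif = rng.uniform(-1.0, 1.0, 1000)    # uniformly distributed input

    def cheb_mapped(x, deg):
        # Map the input onto [-1, 1] before evaluating the Chebyshev basis,
        # as in the fitting procedure described for Figure 23.1.
        z = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
        return C.chebvander(z, deg)

    for deg in (3, 7, 10):
        for name, x in (("normal", x_norm), ("uniform", x_unif)):
            conds = [np.linalg.cond(V) for V in
                     (P.polyvander(x, deg), He.hermevander(x, deg), cheb_mapped(x, deg))]
            print(f"deg {deg:2d} {name:7s} power {conds[0]:10.3g} "
                  f"hermite {conds[1]:10.3g} chebyshev {conds[2]:10.3g}")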


Fig. 23.1: Comparison of the power series, Hermite and Chebyshev polynomial basis functions. The power series (left) is shown over an arbitrarily chosen domain. The Hermite polynomials (centre) are shown over the range [−3, 3], since they are orthogonal for standard normal inputs. The Chebyshev polynomials are shown over the domain [−1, 1], since the input is mapped to this range as part of the fitting procedure

Figure 23.2 shows polynomials of degrees 2, 3, 5 and 7 fit to a hard nonlinearity, the composition of an arc-tangent with a half-wave rectifier, using the inputs described above. This nonlinearity includes a linear segment, a sharp corner, and a gentle saturation. Results obtained with a normally distributed input are shown on the left. Since the data are more heavily distributed close to the origin, the resulting approximations are relatively poor near the extremities.


Fig. 23.2: Polynomial approximations to a hard nonlinearity obtained with a normally distributed (left) and a uniformly distributed input (right)

The plots on the right were obtained using a uniformly distributed input, which causes the fitting error to be more uniformly distributed across the input domain. Fitting hard nonlinearities will require high-degree polynomials, which may in turn lead to ill-conditioned estimation problems. To a certain degree, conditioning problems can be reduced if a Chebyshev polynomial basis is used.

A separable least squares (SLS) based algorithm has been developed that identifies Hammerstein models consisting of a polynomial nonlinearity followed by an FIR filter [42]. The Hammerstein model was parametrised by the vector:

\theta = \begin{bmatrix} \theta_n \\ \theta_l \end{bmatrix}   (23.10)

where θ_l is a vector that includes all the parameters that appear linearly in the output, and θ_n contains all the remaining parameters. Clearly, the polynomial Hammerstein model (23.6) is bilinear in the polynomial coefficients and impulse response weights. Thus, the output can be regarded as being linear in either the filter weights or the polynomial coefficients, provided the other parameters are held fixed. Since the number of filter weights will generally be much larger than the number of polynomial coefficients, they were treated as the "linear" parameters, whereas the polynomial coefficients were placed in θ_n, the vector of nonlinear parameters.

Given any set of nonlinear parameters θ_n, the optimal linear parameters corresponding to those particular nonlinear parameters may be obtained by solving a linear regression. Thus, the linear parameters, and hence the model output and resulting mean squared prediction error, can be viewed as functions of the nonlinear parameters. Thus, one need only perform an optimisation with respect to the relatively small number of nonlinear parameters. These separable nonlinear least squares problems were studied in [14]. Ruhe and Wedin [36] proposed several algorithms for solving them. In the system
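A minimal Python sketch of the separable least squares idea for a polynomial-input Hammerstein model follows; it is an illustration under our own assumptions, not the algorithm of [42] itself. For each trial set of polynomial coefficients, the FIR weights are obtained by an inner linear regression, and an outer optimiser searches over the polynomial coefficients only. The synthetic system and all names are ours, and the first-degree coefficient is pinned to 1 to remove the usual gain ambiguity between the blocks:

    import numpy as np
    from scipy.optimize import minimize

    def sls_cost(theta_n, u, y, T):
        # Inner step: for fixed polynomial coefficients, the FIR weights enter
        # (23.6) linearly, so they are eliminated by ordinary least squares;
        # the outer optimiser only ever sees the polynomial parameters.
        c0, c2, c3 = theta_n                  # c1 fixed at 1 (gain ambiguity)
        w = c0 + u + c2 * u**2 + c3 * u**3    # nonlinearity output
        X = np.column_stack([np.concatenate([np.zeros(k), w[:len(u) - k]])
                             for k in range(T + 1)])
        h, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ h
        return r @ r

    # Synthetic Hammerstein data: cubic nonlinearity then a 3-tap FIR filter.
    rng = np.random.default_rng(1)
    u = rng.standard_normal(2000)
    y = np.convolve(u + 0.4 * u**3, [1.0, 0.5, 0.25])[:len(u)]
    y += 0.01 * rng.standard_normal(len(u))

    res = minimize(sls_cost, x0=np.zeros(3), args=(u, y, 2), method="BFGS")
    print(res.x)                              # should approach [0, 0, 0.4]

The design pay-off is that the iterative search runs over three parameters rather than three plus all the filter taps, which is exactly why the filter weights are treated as the "linear" parameters.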


identification literature, interest in these algorithms appears to have been initiated in [40], which applied them to neural network training. The SLS based algorithm [42] has been used to identify a Hammerstein model of the stretch reflex EMG. Details of the experiment can be found in [15, 19]. Briefly, a human subject lay supine, with his left foot inserted into a custom fitted fibre-glass cast. The cast was attached to a rotary electro-hydraulic actuator, and aligned the axis of rotation of the ankle with that of the actuator. The actuator was configured as a position servo, which was used to apply the experimental perturbation to the joint. The measurements consisted of the ankle angle, measured by a potentiometer attached to the actuator, the torque, measured by a load cell placed between the cast and actuator, and the electrical activity (EMG) measured over the Tibialis Anterior and over the Gastrocnemius and Soleus muscles. Data were sampled at 1000Hz, and then decimated by a factor of 5. Figure 23.3 shows 10 seconds of data from this experiment. Thus, models were fit between the angular velocity of the ankle joint, and the electrical activity recorded over the GS muscles. Results were compared with those obtained using the iterative technique proposed in [17]. Figure 23.4 shows a typical result from this study. Both models comprised a 7th-degree Chebyshev polynomial

Fig. 23.3: 10 seconds of experimental joint dynamics data. The top panel shows the measured ankle angle, while the torque is shown in the middle. The lower panel shows both EMGs. The activity of the Tibialis Anterior is shown in grey, while the activity of the Gastrocnemius/Soleus is shown in black, with the sign reversed


Fig. 23.4: Comparison of stretch reflex EMG models identified using the Hunter-Korenberg iterative method (dashed), and a separable least squares algorithm (solid). The separable least squares model provided substantially better predictions than did the model identified using the Hunter-Korenberg algorithm (5.13% vs 8.83% normalised mean squared error in separate validation data)

However, the model identified using SLS produced considerably better output predictions than did the iterative model, resulting in a 42% reduction in the normalised mean squared error (NMSE) when evaluated using separate validation data. This is not surprising, since the SLS approach explicitly minimises the mean squared error, and was initialised using the result of the iterative approach.

Even with their robust numerical properties, Chebyshev polynomials are not well suited to modelling "hard" nonlinearities such as rectification, saturation, thresholding and deadzones, since all of the basis functions have support over the whole real line and eventually tend to ±∞. One approach to dealing with these hard nonlinearities is to adopt specialised models. For example, Bai [1] developed single-parameter models of a number of hard nonlinearities. Since the linear element in the Hammerstein cascade could be identified using linear regression, it was possible to express the mean squared prediction error as a function of the single parameter that represented the nonlinearity. Thus, the optimal nonlinearity could be fitted by numerically solving a one-dimensional optimisation problem. While this is a promising approach, the resulting nonlinearity models were all piecewise linear. While the hard


nonlinearities often encountered in physiological systems are sometimes described in terms of these piecewise linear transformations, the nonlinearities themselves are somewhat more complicated, and more likely contain smooth curves rather than linear segments.

Use of Spline Functions

One of the benefits of using a separable least squares approach to identify the elements of a Hammerstein model is that there is no reason to use a nonlinearity model that is linear in its parameters, since the nonlinearity parameters will be fitted using an iterative optimisation. Thus, in [8] the polynomial nonlinearity used in [42] was replaced with a cubic spline [3], a piecewise cubic function defined by a series of knot points. Between each pair of successive knots, the spline is defined by a cubic polynomial, chosen such that the spline is continuous and twice differentiable. There are a variety of different spline representations, with slight variations in their properties. In [8], the spline was represented by the coordinates of its knot points. Using this representation, the output is nonlinear with respect to all of the parameters. The benefit derived from using this representation is that each parameter only influences the shape of the nonlinearity in a local region, determined by the positions of the adjacent knot points. This, in turn, leads to a relatively well conditioned estimation problem.

Cubic splines are well suited to the approximation of hard nonlinearities. For example, Figure 23.5 shows cubic spline approximations to the same hard nonlinearity used to generate Figure 23.2. A cubic spline with 5 knots has 10 free parameters, the same number as a degree-9 polynomial. However, the fits shown in Figure 23.5 for 5-knot splines are almost perfect. This suggests that cubic splines could be well suited to representing hard nonlinearities in block-structured models.
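The approximation advantage of splines is easy to see numerically. In the simplified Python sketch below, the knot abscissae are held fixed so that the spline ordinates can be fitted by linear least squares; the free-knot representation actually used in [8] treats both coordinates as parameters and requires an iterative optimisation. The test nonlinearity mimics the one in Figures 23.2 and 23.5, and all names are ours:

    import numpy as np
    from scipy.interpolate import CubicSpline

    f = lambda x: np.arctan(np.maximum(x, 0.0))   # the hard test nonlinearity
    rng = np.random.default_rng(2)
    x = np.sort(rng.uniform(-2.0, 2.0, 1000))
    y = f(x)

    # 5-knot cubic spline with fixed knot abscissae: the fit is then linear in
    # the ordinates, evaluated through a cardinal-spline basis.
    kx = np.linspace(-2.0, 2.0, 5)
    basis = np.column_stack([CubicSpline(kx, e)(x) for e in np.eye(5)])
    ky, *_ = np.linalg.lstsq(basis, y, rcond=None)
    spline_rmse = np.sqrt(np.mean((basis @ ky - y) ** 2))

    # Degree-9 polynomial for comparison (10 free parameters).
    coef = np.polynomial.polynomial.polyfit(x, y, 9)
    poly_rmse = np.sqrt(np.mean((np.polynomial.polynomial.polyval(x, coef) - y) ** 2))

    print(f"5-knot spline RMSE: {spline_rmse:.4g}, "
          f"degree-9 polynomial RMSE: {poly_rmse:.4g}")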

Fig. 23.5: Cubic spline approximations to a hard nonlinearity obtained with a normally distributed (left) and a uniformly distributed input (right)


The algorithm proposed in [8] was used to fit a Hammerstein model to the data shown in Figure 23.4. Results are summarised in Figure 23.6, which compares identified Hammerstein models with Chebyshev polynomial and cubic spline nonlinearities. The two models appear to be virtually identical; however, in cross-validation testing, the spline-based model generated simulation errors that were 22% smaller than those generated by the polynomial-based model.

Fig. 23.6: Comparison of Hammerstein models of the stretch reflex EMG using either a Chebyshev polynomial (dashed) or a cubic spline (solid) to represent the static nonlinearity. Although the two models appear quite similar, the spline-based model produced more accurate predictions than the polynomial-based model (4.01% vs 5.13% NMSE in separate validation data)

Although splines (cubic or otherwise) possess powerful approximation abilities, their outputs are highly nonlinear functions of their knot positions. This leads to non-convex optimisation problems, whose solutions can be sensitive to the choice of initial estimate. Although it is possible to obtain excellent results using block structured models that incorporate spline nonlinearities, the resulting algorithms can be difficult to use, due to the need for an appropriate initialisation. Until this difficulty can be addressed, it will limit the usefulness of spline-based block-structured models.


23.5 The Parallel Cascade Stiffness Model

While the electrical activity recorded in the EMG provides useful information regarding the timing of muscle contractions, it is just a byproduct of the muscle activation system. On the other hand, it is the force (or torque) generated by the muscles that is of functional significance.


Fig. 23.7: Block structured model of joint stiffness. The upper pathway represents the intrinsic stiffness, due to the properties of active muscle, tendons, ligaments, etc., in the absence of reflexive activity. It is represented as a second-order linear mass-spring-damper system, Is² + Bs + K. The lower pathway represents the effects of reflexes on the joint stiffness. It is a Wiener–Hammerstein system comprising a differentiator, a memoryless nonlinearity that likely includes half-wave rectification, and a second linear element, which is taken to represent the activation dynamics of the muscle, but which also includes the propagation delay in the entire reflex arc

Kearney et al. [20] proposed a model, shown in Figure 23.7, that could be fit between the angular position of a joint and the resulting torque generated around that joint, together with an iterative, correlation-based identification scheme for this model. This was followed by several studies in which this model was used to examine the effects of spinal lesions, strokes and degenerative neuromuscular diseases on the significance of the reflex contributions to overall joint stiffness [30, 31, 32]. Zhang and Rymer [46] used a similar model to study the effects of reflexes on the mechanics of the human elbow joint. While this model structure has the potential to provide very detailed information regarding the physiology of the system, its identification presents several challenges:

1. The intrinsic pathway is an improper system, as it has two zeros and no poles. Discretisation of this system results in an acausal system [44], which can be modelled using a two-sided impulse response [18]. This approach has been widely used in the identification of linearised models of joint stiffness [16].
2. Some sort of constraint must be used to separate the velocity term in the intrinsic response from that due to the first-degree term in the polynomial representation of the nonlinearity. In the physiological system, the reflex loop includes a propagation delay, which can be used to separate these two contributions.


23.5.1 Iterative, Correlation-based Approach

The correlation-based algorithm for identifying the elements of the parallel cascade stiffness model, proposed in [20], used the approximately 40 ms propagation delay in the reflex loop to separate the contributions of the intrinsic and reflex pathways, and hence restore identifiability. They first fit a two-sided impulse response between the (input) position and (output) torque, but limited the memory length to ±40 ms. A Hammerstein cascade, whose linear element included a 40 ms delay, was fitted between the joint's angular velocity and the residuals remaining after the output of the intrinsic pathway had been subtracted from the measured torque. This iteration then continued, but the restrictions on the memory length of the intrinsic pathway were removed.

Figure 23.8 shows 10 seconds (out of a total of 65) of experimental ankle position/torque data. The experiment that produced these data was essentially the same as that described above. However, the input perturbation had been redesigned to have a slightly higher bandwidth and a more uniform velocity distribution, both of which facilitate the identification of the nonlinear model. Nevertheless, the bandwidth of the input was restricted enough not to significantly suppress the stretch reflex. The first 50 seconds were used for identification, so that the last 15 seconds could be used for model validation.

Fig. 23.8: Data from an experimental study of ankle joint mechanics. The data consisted of approximately 65 seconds of ankle position (top) and torque (bottom) measurements, sampled at 100Hz


Fig. 23.9: Parallel Cascade Stiffness models identified using the iterative correlation based technique introduced by [20]. Note that the impulse responses in the reflex pathway (lower right) were smoothed using a 3-point, zero-phase smoother, for presentation purposes

Figure 23.9 shows the elements of two models identified from these data using variants of the iterative correlation-based approach proposed in [20]. In one case (dashed lines), the static nonlinearity was assumed to be a half-wave rectifier. In the other case (solid lines), the nonlinearity was represented by a 7th-degree Chebyshev polynomial. Cross-validation resulted in simulation errors of 24.74% and 22.30% NMSE for the rectifier- and polynomial-based models, respectively. While it is much simpler to fit the rectifier-based model, including the nonlinearity in the fitting procedure improved the simulation accuracy of the model, both in the identification and validation data. Furthermore, the impulse responses were also visibly less noisy in the polynomial-based model than was the case in the rectifier-based model.

23.5.2 Separable Least Squares Optimisation

Let the memoryless nonlinearity n_RS(·) in Figure 23.7 be parametrised by the vector θ_n. Then the output of the parallel cascade stiffness model can be written:

T(t) = \sum_{\tau=-M_1}^{M_1} h_{IS}(\tau)\,\theta(t-\tau) + \sum_{\tau=D_R}^{M_2} h_{RS}(\tau)\, n_{RS}(\dot{\theta}(t-\tau), \theta_n)   (23.11)

where M_1 is the memory (and anticipation) length of the two-sided intrinsic stiffness model, D_R and M_2 are the delay and memory length of the reflex model, and the remaining variables are defined in Figure 23.7. Regardless of how the nonlinearity is parametrised, the model output is linear in the tap weights of the two impulse responses. Thus, this model structure appears to be an ideal candidate for a separable least squares approach.
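This linearity suggests the inner step of a separable scheme: with n_RS and θ_n held fixed, stacking the lagged position samples and the delayed nonlinearity outputs gives one linear regression for both impulse responses at once. A Python sketch under our own assumptions (hypothetical data; a half-wave rectifier stands in for n_RS, and the lag choices are illustrative):

    import numpy as np

    def stiffness_regression(theta, theta_dot, n_rs, M1, M2, DR):
        # Regression matrix for (23.11): with n_RS and theta_n held fixed, the
        # torque is linear in the two-sided intrinsic weights h_IS(-M1..M1)
        # and in the delayed reflex weights h_RS(DR..M2).
        N = len(theta)
        w = n_rs(theta_dot)                      # static nonlinearity output
        t = np.arange(max(M1, M2), N - M1)       # samples where all lags exist
        intrinsic = np.column_stack([theta[t - tau] for tau in range(-M1, M1 + 1)])
        reflex = np.column_stack([w[t - tau] for tau in range(DR, M2 + 1)])
        return np.column_stack([intrinsic, reflex]), t

    rng = np.random.default_rng(3)
    pos = rng.standard_normal(5000)              # hypothetical position record
    vel = np.gradient(pos)
    X, t = stiffness_regression(pos, vel, lambda v: np.maximum(v, 0.0),
                                M1=4, M2=40, DR=10)
    # Given a measured torque Tq, both impulse responses follow from one call:
    # h_hat, *_ = np.linalg.lstsq(X, Tq[t], rcond=None)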


Fig. 23.10: Parallel Cascade Stiffness models identified using the separable least squares based approach introduced in [41], using both polynomial (dashed) and cubic spline (solid) models for the static nonlinearity

Fig. 23.11: Torque predictions produced by the cubic-spline based parallel cascade stiffness model

A separable least squares algorithm for a parallel cascade stiffness model with a polynomial nonlinearity was developed in [41]. Clearly, this approach could be extended to a variety of nonlinearity representations. Figure 23.10 shows the elements of two models that were fit to the data in Figure 23.8 using SLS. The first (dashed lines) represented the nonlinearity as a 7th-degree Chebyshev polynomial. The second model (solid lines) used a 5-knot cubic spline instead.

The separable least squares models produced slightly more accurate predictions than did the models identified using the iterative correlation-based algorithms. The polynomial- and spline-based models resulted in 20.03% and 19.28% NMSE, respectively, in the validation segment. In addition to modest improvements in the prediction accuracy, the linear impulse responses in the models identified using SLS techniques were quite visibly less noisy than those identified using the correlation approach. However, it should be noted that the SLS method required significantly


more computation time than either of the iterative schemes. Furthermore, the nonlinearities in Figure 23.10 appear to contain a constant gain, as compared to those in Figure 23.9. This may be due to differences in the velocity term assigned to the intrinsic pathway. Additional constraints may be required to remove this ambiguity.

23.6 Conclusions

Block structured models can provide functional insight into certain physiological systems. They are particularly well suited to the study of the effects of reflexes on joint mechanics. There are several challenges in applying these methods, including the need to model high-degree nonlinearities, and the need to use appropriately chosen constraints to maintain identifiability.

Acknowledgements. The author would like to thank Dr. Robert E. Kearney from the Department of Biomedical Engineering at McGill University for supplying the experimental data.

References

1. Bai, E.: Identification of linear systems with hard input nonlinearities of known structure. Automatica 38, 853–860 (2002)
2. Beckmann, P.: Orthogonal Polynomials for Engineers and Physicists. The Golem Press, Boulder (1973)
3. de Boor, C.D.: A practical guide to splines. Applied Mathematical Sciences, vol. 27. Springer, New York (1978)
4. Boyd, S., Chua, L.: Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Trans. Circuits Syst. CAS 32(11), 1150–1161 (1985)
5. Bussgang, J.: Crosscorrelation functions of amplitude-distorted Gaussian signals. Tech. Rep. 216, MIT Electrical Research Lab (1952)
6. Chang, F., Luus, R.: A noniterative method for identification using Hammerstein model. IEEE Trans. Autom. Control 16, 464–468 (1971)
7. Chen, W., Poppele, R.: Static fusimotor effect on the sensitivity of mammalian muscle spindles. Brain Res. 57, 244–247 (1973)
8. Dempsey, E., Westwick, D.: Identification of Hammerstein models with cubic spline nonlinearities. IEEE Trans. Biomed. Eng. 51, 237–245 (2004)
9. Eggermont, J.: Wiener and Volterra analyses applied to the auditory system. Hearing Res. 66, 177–201 (1993)
10. Emerson, R., Korenberg, M., Citron, M.: Identification of intensive nonlinearities in cascade models of visual cortex and its relation to cell classification. In: [27], pp. 97–111 (1989)
11. Emerson, R., Korenberg, M., Citron, M.: Identification of complex-cell intensive nonlinearities in a cascade model of cat visual cortex. Biol. Cybern. 66, 291–300 (1992)
12. French, A., Korenberg, M.: A nonlinear cascade model for action potential encoding in an insect sensory neuron. Biophys. J. 55, 655–661 (1989)
13. French, A., Korenberg, M.: Dissection of a nonlinear cascade model for sensory encoding. Ann. Biomed. Eng. 19, 473–484 (1991)


14. Golub, G., Pereyra, V.: The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate. SIAM J. Numer. Anal. 10(2), 413–432 (1973)
15. Hunter, I., Kearney, R.: Two-sided linear filter identification. Med. Biol. Eng. Comput. 21, 203–209 (1983)
16. Hunter, I., Kearney, R.: Quasi-linear, time-varying, and nonlinear approaches to the identification of muscle and joint mechanics. In: [26], pp. 128–147 (1987)
17. Hunter, I., Korenberg, M.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biol. Cybern. 55, 135–144 (1986)
18. Kearney, R., Hunter, I.: System identification of human triceps surae stretch reflex dynamics. Exp. Brain Res. 51, 117–127 (1983)
19. Kearney, R., Hunter, I.: Nonlinear identification of stretch reflex dynamics. Ann. Biomed. Eng. 16, 79–94 (1988)
20. Kearney, R., Stein, R., Parameswaran, L.: Identification of intrinsic and reflex contributions to human ankle stiffness dynamics. IEEE Trans. Biomed. Eng. 44(6), 493–504 (1997)
21. Korenberg, M.: Identifying nonlinear difference equation and functional expansion representations: The fast orthogonal algorithm. Ann. Biomed. Eng. 16, 123–142 (1988)
22. Korenberg, M., Hunter, I.: The identification of nonlinear biological systems: LNL cascade models. Biol. Cybern. 55, 125–134 (1986)
23. Korenberg, M., Hunter, I.: Two methods for identifying Wiener cascades having noninvertible static nonlinearities. Ann. Biomed. Eng. 27(6), 793–804 (1999)
24. Lee, Y., Schetzen, M.: Measurement of the Wiener kernels of a non-linear system by cross-correlation. Int. J. Control 2, 237–254 (1965)
25. Marmarelis, P., Marmarelis, V.: Analysis of Physiological Systems. Plenum Press, New York (1978)
26. Marmarelis, V. (ed.): Advanced Methods of Physiological System Modelling, vol. 1. Biomedical Simulations Resource, USC-LA, Los Angeles (1987)
27. Marmarelis, V. (ed.): Advanced Methods of Physiological System Modeling, vol. 2. Plenum Press, New York (1989)
28. Marmarelis, V. (ed.): Advanced Methods of Physiological System Modeling, vol. 3. Plenum Press, New York (1994)
29. Marmarelis, V.: Nonlinear Dynamic Modeling of Physiological Systems. IEEE Press, Piscataway (2004)
30. Mirbagheri, M., Barbeau, H., Kearney, R.: Intrinsic and reflex contributions to human ankle stiffness: variation with activation level and position. Exp. Brain Res. 135, 423–436 (2000)
31. Mirbagheri, M., Barbeau, H., Ladouceur, M., Kearney, R.: Intrinsic and reflex stiffness in normal and spastic, spinal cord injured subjects. Exp. Brain Res. 141, 446–459 (2001)
32. Mirbagheri, M., Alibiglou, L., Thajchayapong, M., Rymer, W.: Muscle and reflex changes with varying joint angle in hemiparetic stroke. J. NeuroEng. Rehabil. 5(1), 6 (2008)
33. Naka, K.I., Sakai, H., Naohiro, I.: Generation and transformation of second-order nonlinearity in catfish retina. Ann. Biomed. Eng. 16, 53–64 (1988)
34. Narendra, K., Gallman, P.: An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Trans. Autom. Control AC-11, 546–550 (1966)
35. Paulin, M.: A method for constructing data-based models of spiking neurons using a dynamic linear-static nonlinear cascade. Biol. Cybern. 69, 67–76 (1993)
36. Ruhe, A., Wedin, P.: Algorithms for separable nonlinear least squares problems. SIAM Rev. 22(3), 318–337 (1980)


37. Sakuranaga, M., Ando, Y.I., Naka, K.I.: Signal transmission in the catfish retina. I. Transmission in the outer retina. J. Neurophysiology 53, 373–389 (1985)
38. Sakuranaga, M., Ando, Y.I., Naka, K.I.: Signal transmission in the catfish retina. II. Transmission to type N cell. J. Neurophysiology 53, 390–410 (1985)
39. Sakuranaga, M., Ando, Y.I., Naka, K.I.: Signal transmission in the catfish retina. III. Transmission to type C cell. J. Neurophysiology 53, 411–428 (1985)
40. Sjöberg, J., Viberg, M.: Separable non-linear least squares minimization: possible improvements for neural net fitting. In: IEEE Workshop on Neural Networks for Signal Processing, vol. 7, pp. 345–354 (1997)
41. Westwick, D., Kearney, R.: Separable least squares identification of a parallel cascade model of human ankle stiffness. In: Proc. IEEE EMBS Conf., Istanbul, Turkey, vol. 23, pp. 1282–1285 (2001)
42. Westwick, D., Kearney, R.: Separable least squares identification of nonlinear Hammerstein models: Application to stretch reflex dynamics. Ann. Biomed. Eng. 29(8), 707–718 (2001)
43. Westwick, D., Kearney, R.: Identification of Nonlinear Physiological Systems. IEEE Press Series in Biomedical Engineering. IEEE Press/Wiley, Piscataway (2003)
44. Westwick, D., Perreault, E.: Identification of apparently a-causal stiffness models. In: Proc. IEEE EMBS Conf., vol. 27, pp. 5611–5614 (2005)
45. Wiener, N.: Nonlinear Problems in Random Theory. Technology Press Research Monographs. Wiley, New York (1958)
46. Zhang, L., Rymer, W.: Simultaneous and nonlinear identification of mechanical and reflex properties of human elbow joint muscles. IEEE Trans. Biomed. Eng. 44(12), 1192–1209 (1997)

Chapter 24

Application of Block-oriented System Identification to Modelling Paralysed Muscle Under Electrical Stimulation

Zhijun Cai, Er-Wei Bai, and Richard K. Shield

24.1 Introduction

Most biomedical and biological systems have nonlinear dynamics [25, 27], which brings difficulties to modelling and identification. These systems are usually represented as a series of blocks, each of which stands for a linear or nonlinear subsystem. Among such block-oriented models, the Wiener model, the Hammerstein model and the Wiener–Hammerstein model are very popular. The Wiener model consists of a linear dynamic block followed by a static nonlinearity, the Hammerstein model is composed in the reversed order, and the Wiener–Hammerstein model is constructed as a static nonlinearity surrounded by two linear dynamic blocks (see Figure 24.1). All three types of models have been extensively researched, and many identification methods have been proposed (e.g. [2, 3, 4, 6, 11, 23, 31, 39, 40]). In this chapter, we use a Wiener–Hammerstein system to model paralysed muscle under electrical stimulation, and propose an effective identification method specifically for the proposed model.

Spinal cord injury (SCI) may cause the loss of volitional muscle activity and, in consequence, trigger a range of deleterious adaptations. Muscle cross-sectional area declines by as much as 45% in the first six weeks after injury, with further atrophy occurring for at least six months [10]. Muscle atrophy impairs weight distribution over bony prominences, predisposing individuals with SCI to pressure ulcers, a potentially life-threatening secondary complication [32]. The diminution of muscular loading through the skeleton precipitates severe osteoporosis in paralysed limbs. The lifetime fracture risk for individuals with SCI is twice the risk


Fig. 24.1: (a) Wiener model, (b) Hammerstein model and (c) Wiener–Hammerstein model. L and NL in the figure represent the linear dynamic and nonlinear static subsystems, respectively

experienced by the non-SCI population [38]. Rehabilitation interventions to prevent post-SCI muscle atrophy and its sequelae are an urgent need.

Functional electrical stimulation (FES) after SCI is an effective method to induce muscle hypertrophy [20, 30], fibre type and metabolic enzyme adaptations [1, 12], and improvements in torque output and fatigue resistance [33, 34, 35]. New evidence suggests that an appropriate longitudinal dose of muscular load can be an effective anti-osteoporosis countermeasure [21, 37, 33]. FES also has potential utility for restoration of function in tasks such as standing, reaching, and ambulating. The numerous applications for electrical stimulation after SCI have created a demand for control systems that adjust stimulus parameters in real time to accommodate muscle output changes (potentiation, fatigue) or inter-individual force production differences. To facilitate the refinement of control system algorithms, mathematical models of paralysed muscle under electrical stimulation are continuously being developed. To most successfully adapt stimulus parameters to real-time muscle output changes, an accurate and easy-to-implement model is essential.

Over the last decades, researchers have developed a number of muscle models aimed at predicting muscle force output [7, 14, 15, 17, 18, 16, 19]. The Hill-Huxley-type model [15] is the most advanced and accurate model put forward to date [8, 22]. Compared to others, the Hill-Huxley-type model represents muscle dynamics well. However, its complexity undermines its usefulness for real-time implementation. Identification of a Hill-Huxley-type model is non-trivial because it is time-varying, high dimensional and nonlinear. Local minimum versus global minimum is always a difficult issue, and a user must tune identification algorithm parameters patiently (including the initial estimates) in order to have a good result. FES fatigue models are also developed in [14, 15, 17, 18] based on the Hill-Huxley-type model for individual stimulation trains. Due to the structural complexity and convergence issues, the fatigue model has to be identified off-line, which is not suitable for applications that require real-time adaptations.

Our goal has been to develop fatigue and non-fatigue models that can be used in real time to predict the force output for a large class of stimulation patterns. The model proposed should be at least comparable to the Hill-Huxley-type model, but at a much reduced complexity. We propose to use a Wiener–Hammerstein system which resembles the Hill-Huxley-type structure but has the added advantage of greater simplicity. This approach was previously suggested by Hunt and his colleagues [26] for non-fatigue force but was deemed inadequate for muscle modelling. By examining the experimental data sets and the system, we noted two problems.


First, a linear block prior to the nonlinear block was missing and, second, the static nonlinearity seemed suboptimal. The Wiener–Hammerstein fatigue model is developed based on a non-fatigue model proposed in our previous work [5]. By assuming that fatigue is a slow-varying process, which is a reasonable assumption, the fatigue effects are described by slowly varying model parameters. We test the proposed model on actual soleus force data under 15 Hz stimulation from 14 subjects with SCI. It is demonstrated that the advantages of the proposed model over previous ones are theoretically justified and numerically verified.

24.2 Problem Statement

A typical electrical stimulation and the corresponding muscle force responses are shown in Figure 24.2. The electrical stimulation pattern is composed of a number of trains (124 trains in Figure 24.2), denoted by the pulses, and the corresponding force output is represented by the thick curves. We observe that the peak force of each train decreases as the number of trains increases, even though the input stimulation pattern remains the same, a phenomenon referred to as fatigue. We aim to find a mathematical model that describes the force output under the given stimulation pattern for each individual train and also captures the fatigue phenomenon.

The Hill-Huxley-type Fatigue Model

Researchers have developed many mathematical muscle models. Some of them are based on the physiology of the muscle [9, 24, 28, 29], and some of them are not [7, 26]. However, there are few models that just model the stimulated muscle force

Fig. 24.2: Force responses (thick curve) to the stimulation pulses (thin pulses) for subject 1. The muscle was stimulated by a sequence of trains with a duration of 2 seconds each, for a total of 124 trains. Each train is composed of 10 pulses at 15 Hz followed by a resting period of 1337 ms. The muscle fatigue effect is clearly shown through the reduced peak force response of the 62nd and 124th trains


output under electrical stimulation, among which the Hill-Huxley-type fatigue model [8, 15, 22] is the most accurate model so far in the literature. It describes the force output during non-fatigue and fatigue conditions and is given by the set of equations (24.1)-(24.5):

\frac{dC_N}{dt} = \frac{1}{\tau_c} \sum_{i=1}^{n} R_i \exp\left(-\frac{t - t_i}{\tau_c}\right) - \frac{C_N}{\tau_c},   (24.1)

where R_i = 1 + (R_0 - 1)\exp\left(-\frac{t_i - t_{i-1}}{\tau_c}\right),

\frac{dy}{dt} = A\,\frac{C_N}{1 + C_N} - \frac{y}{\tau_1 + \tau_2\,\frac{C_N}{1 + C_N}},   (24.2)

\frac{dA}{dt} = -\frac{A - A_{rest}}{\tau_{fat}} + \alpha_A\, y,   (24.3)

\frac{dR_0}{dt} = -\frac{R_0 - R_{0,rest}}{\tau_{fat}} + \alpha_{R_0}\, y,   (24.4)

\frac{d\tau_c}{dt} = -\frac{\tau_c - \tau_{c,rest}}{\tau_{fat}} + \alpha_{\tau_c}\, y.   (24.5)

Equations (24.1) and (24.2) describe the stimulated muscle behaviour for each individual stimulation train, and Equations (24.3)-(24.5) govern the fatigue effect by varying the parameters A, R_0 and τ_c. In Equations (24.1) and (24.2), t_i is the time of the ith stimulation input and C_N is the (internal) state variable, while A and R_0 are gains, y(t) is the force output, and τ_1, τ_2 and τ_c are the time constants. Note that no actual input amplitude is directly used; only the input time sequence is used. The effect of the input amplitude is automatically adjusted by the parameters R_i and τ_c. In Equations (24.3)-(24.5), α_A, α_{R_0} and α_{τ_c} are decay parameters and τ_{fat} is the recovery rate. These four parameters control the fatigue model. A_{rest}, R_{0,rest} and τ_{c,rest} are the parameter values at the initial (non-fatigue) stage, which have to be identified from the non-fatigue model (24.1) and (24.2). Note that the output force y is involved as feedback in Equations (24.3)-(24.5).

To identify the Hill-Huxley-type model, an offline multi-step method is used [15]. First, it fixes τ_c = 0.02 s to obtain the other two time constants τ_1 and τ_2, which are fixed at the obtained values in the next step; then the parameters A, R_0 and τ_c are identified for each individual stimulation train. The last step is to derive the fatigue parameters α_A, α_{R_0}, α_{τ_c} and τ_{fat} by using the actual force output. This procedure is time-consuming and has to be done off-line due to the complexity of the model and algorithm, which causes local-minimum problems.
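For orientation, the non-fatigue part (24.1)-(24.2) can be simulated directly for a single train. The Python sketch below uses scipy's ODE solver with parameter values chosen purely for illustration (the true values must be identified per subject), and assumes the convention R_1 = 1 for the first pulse:

    import numpy as np
    from scipy.integrate import solve_ivp

    tau_c, tau1, tau2 = 0.020, 0.05, 0.10   # time constants (s); illustrative
    A, R0 = 2.0, 1.5                        # gains; illustrative
    t_pulse = np.arange(10) / 15.0          # 10 pulses at 15 Hz, as in Fig. 24.2

    # Scaling factors from (24.1); R_1 = 1 is assumed for the first pulse.
    R = np.ones_like(t_pulse)
    R[1:] = 1.0 + (R0 - 1.0) * np.exp(-np.diff(t_pulse) / tau_c)

    def rhs(t, x):
        CN, y = x
        drive = sum(Ri * np.exp(-(t - ti) / tau_c)
                    for Ri, ti in zip(R, t_pulse) if t >= ti)
        sat = CN / (1.0 + CN)
        return [drive / tau_c - CN / tau_c,               # (24.1)
                A * sat - y / (tau1 + tau2 * sat)]        # (24.2)

    sol = solve_ivp(rhs, (0.0, 2.0), [0.0, 0.0], max_step=1e-3)
    print(sol.y[1].max())                   # peak force of the simulated train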

24.3 The Wiener–Hammerstein Fatigue Model

The proposed Wiener–Hammerstein (block-oriented) model for a single stimulation train is shown in Figure 24.3 (a), with the nonlinearity w = f(v_0) given by

f(v_0) = \frac{A v_0}{1 + B v_0},

24

Application of Block-oriented System Identification


Fig. 24.3: (a): Wiener–Hammerstein muscle model. (b): The middle nonlinear block can be decomposed into three parts. (c): The simplified Wiener–Hammerstein muscle model

where B and A are unknown parameters that are adjusted for each individual subject. The system is in the discrete-time domain for easy use on digital computers. The input is the electrical stimulus (in volts) at time kT, where T (0.2 ms in this paper) is the sampling interval, and the output y(k) = y(kT) is the muscle force at time kT. The internal signals v(k) = v(kT) and w(k) = w(kT) are unavailable. The linear blocks before and after the nonlinearity are first-order dynamic systems. The Wiener–Hammerstein model resembles the structure of the Hill-Huxley-type model but at a much reduced complexity. By decomposing the middle block into three blocks (Figure 24.3 (b)) and combining the gains B and A/B with a0 and b0, respectively, we obtain the system in Figure 24.3 (c), where a2 = a0 B and b2 = b0 A/B. This greatly simplifies the identification problem, reducing the number of unknown parameters from six to four. Also, no additional sequence of ti's is needed, which is not the case for the Hill-Huxley-type model. It is important to note that the system in Figure 24.3 (c) is identical to the system in Figure 24.3 (a) from the input-output point of view, though the complexity is greatly reduced. The identification algorithm and convergence analysis of the Wiener–Hammerstein model for a single train (non-fatigue) are described in [5]. Here we concentrate on the Wiener–Hammerstein fatigue model, which is built upon the Wiener–Hammerstein model for a single stimulation train. From Figure 24.3, the fatigue model is described by the following set of Equations (24.6)-(24.9):

$$y_p(kT) = \frac{b_2(p)}{z - b_1(p)}\, f\!\left(\frac{1}{z - a_1(p)}\, u_p(kT)\right), \tag{24.6}$$

$$a_1(p+1) = \alpha_{a_1}(p)\,a_1(p) + \beta_{a_1}(p)\,a_1(p-1), \tag{24.7}$$

$$b_1(p+1) = \alpha_{b_1}(p)\,b_1(p) + \beta_{b_1}(p)\,b_1(p-1), \tag{24.8}$$

$$b_2(p+1) = \alpha_{b_2}(p)\,b_2(p) + \beta_{b_2}(p)\,b_2(p-1). \tag{24.9}$$
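Since Equation (24.6) consists of two first-order recursions around a static saturation, the single-train response can be simulated in a few lines. Below is a hedged Python sketch of the state-space form implied by Figure 24.3 (c), with a2(p) = 1 as in the fatigue model; the parameter values and stimulus amplitude are illustrative only.

```python
# Hedged sketch of the single-train Wiener-Hammerstein model, Eq. (24.6).
import numpy as np

def wh_train_response(u, a1, b1, b2):
    """Response y(kT) of Fig. 24.3 (c):
    v(k+1) = a1*v(k) + u(k)        first linear block, 1/(z - a1)
    w(k)   = v(k)/(1 + v(k))       saturation f(x) = x/(1 + x), a2 = 1
    y(k+1) = b1*y(k) + b2*w(k)     second linear block, b2/(z - b1)"""
    N = len(u)
    v = np.zeros(N + 1)
    y = np.zeros(N + 1)
    for k in range(N):
        v[k + 1] = a1 * v[k] + u[k]
        w = v[k] / (1.0 + v[k])
        y[k + 1] = b1 * y[k] + b2 * w
    return y[1:]

# stimulus: 10 pulses at 15 Hz sampled at T = 0.2 ms, unit amplitude (illustrative)
T = 2e-4
u = np.zeros(int(2.0 / T))
u[(np.arange(10) / 15.0 / T).astype(int)] = 1.0
y = wh_train_response(u, a1=0.99, b1=0.999, b2=0.05)
```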



Fig. 24.4: The cost function J(â1(p), h(â1)) vs. â1. The right plot is a zoom of the left plot, with â1 ∈ [0.9, 1]

In Equation (24.6), b1(p), b2(p) and a1(p) are the parameters of the pth (p > 2) train in the stimulation pattern, f(x) = x/(1 + x) is the saturation function, and y_p(kT) and u_p(kT) are the force output and stimulus input of the pth stimulation train, respectively. Note that in the fatigue model we set a2(p) = 1 for all stimulation trains, because the fitting performance is not sensitive to a2; this also reduces the number of parameters to three for each individual train. For stability reasons, we set 0 ≤ b1(p) ≤ 1 and 0 ≤ a1(p) ≤ 1. The time-varying nature of the parameters b1(p), b2(p) and a1(p) reflects the muscle fatigue effect (Equations (24.7)-(24.9)). It is reasonable to assume that the parameters vary slowly from train to train, so each can be predicted by extrapolating its two previous values. In Equations (24.7)-(24.9), αa1, αb1, αb2, βa1, βb1 and βb2 are coefficients obtained by least squares:

$$\begin{bmatrix}\alpha_{a_1}(p)\\ \beta_{a_1}(p)\end{bmatrix} = \big(A_1^T A_1\big)^{-1} A_1^T \begin{bmatrix}a_1(3)\\ \vdots\\ a_1(p)\end{bmatrix}, \tag{24.10}$$

$$\begin{bmatrix}\alpha_{b_1}(p)\\ \beta_{b_1}(p)\end{bmatrix} = \big(B_1^T B_1\big)^{-1} B_1^T \begin{bmatrix}b_1(3)\\ \vdots\\ b_1(p)\end{bmatrix}, \tag{24.11}$$

$$\begin{bmatrix}\alpha_{b_2}(p)\\ \beta_{b_2}(p)\end{bmatrix} = \big(B_2^T B_2\big)^{-1} B_2^T \begin{bmatrix}b_2(3)\\ \vdots\\ b_2(p)\end{bmatrix}, \tag{24.12}$$



where

$$A_1 = \begin{bmatrix} a_1(2) & a_1(1)\\ a_1(3) & a_1(2)\\ \vdots & \vdots\\ a_1(p-1) & a_1(p-2) \end{bmatrix}, \quad B_1 = \begin{bmatrix} b_1(2) & b_1(1)\\ b_1(3) & b_1(2)\\ \vdots & \vdots\\ b_1(p-1) & b_1(p-2) \end{bmatrix}, \quad B_2 = \begin{bmatrix} b_2(2) & b_2(1)\\ b_2(3) & b_2(2)\\ \vdots & \vdots\\ b_2(p-1) & b_2(p-2) \end{bmatrix}.$$

It should be noted that only the parameters identified from trains up to p are needed to calculate αa1(p), βa1(p), αb1(p), βb1(p), αb2(p) and βb2(p). This causal property is important: it makes identification in real time feasible.
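The least squares fits (24.10)-(24.12) are ordinary linear regressions of each parameter on its two predecessors, applied separately to a1, b1 and b2. A minimal sketch, assuming the identified parameter history is available as a plain array:

```python
# Hedged sketch of the extrapolation step, Eqs. (24.7)-(24.12): fit
# theta(j+1) ~ alpha*theta(j) + beta*theta(j-1) by least squares, then
# predict theta(p+1) from theta(p) and theta(p-1).
import numpy as np

def extrapolate(theta_hist):
    """theta_hist: identified values theta(1), ..., theta(p) for one parameter."""
    th = np.asarray(theta_hist, dtype=float)
    A = np.column_stack([th[1:-1], th[:-2]])   # rows [theta(j), theta(j-1)]
    rhs = th[2:]                               # targets theta(3), ..., theta(p)
    (alpha, beta), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return alpha * th[-1] + beta * th[-2]      # predicted theta(p+1)

a1_next = extrapolate([0.990, 0.988, 0.987, 0.985, 0.984])  # illustrative history
```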

24.4 Identification of the Wiener–Hammerstein System

24.4.1 Identification of the Wiener–Hammerstein Non-fatigue Model (Single Train Stimulation Model)

Before presenting the identification algorithm for the Wiener–Hammerstein fatigue model, we describe the method for identifying the Wiener–Hammerstein non-fatigue model from the input and output data of the pth single stimulation train. Let θ(p) = [a1(p), b1(p), b2(p)] and θ̂(p) = [â1(p), b̂1(p), b̂2(p)] denote the unknown system parameters and their estimates, respectively. Let ŷ_p(kT) be the predicted output calculated from the estimates

$$\hat{y}_p(kT) = \frac{\hat{b}_2(p)}{z - \hat{b}_1(p)}\, f\!\left(\frac{1}{z - \hat{a}_1(p)}\, u_p(kT)\right). \tag{24.13}$$

We want to find the best parameter set θ* = [a1*(p), b1*(p), b2*(p)] that minimises the sum of squared errors between the actual output y_p(kT) and ŷ_p(kT):

$$\theta^* = \arg\min_{\hat{\theta}} \sum_k \big(y_p(kT) - \hat{y}_p(kT)\big)^2. \tag{24.14}$$

Obviously, (24.14) is a nonlinear optimisation problem because of the nonlinear function f, and in general local minima are a serious issue. However, we show that this is not a problem for (24.14). Suppose a value of â1(p) is given; then the internal signal ŵ_p(kT) = f( u_p(kT)/(z − â1(p)) )

can be calculated. Based on this internal signal and the model ŷ_p((k+1)T) = b̂1(p)ŷ_p(kT) + b̂2(p)ŵ_p(kT, â1(p)), the optimal values of b̂1(p) and b̂2(p) for the given â1(p) are the solution of



$$[\hat{b}_1(p), \hat{b}_2(p)]^* = \arg\min_{[\hat{b}_1(p),\,\hat{b}_2(p)]} \sum_k \big(y_p(kT) - \hat{y}_p(kT)\big)^2 = \arg\min_{[\hat{b}_1(p),\,\hat{b}_2(p)]} \sum_k \Big(y_p(kT) - \sum_{i=1}^{k-1} \hat{b}_1^{\,k-1-i}(p)\,\hat{b}_2(p)\,\hat{w}_p(iT, \hat{a}_1(p))\Big)^2. \tag{24.15}$$

Taking the derivative of the above cost function with respect to b̂2(p) and setting it to zero yields

$$\hat{b}_2(p) = \frac{\sum_k y_p(kT) \sum_{i=1}^{k-1} \hat{b}_1^{\,k-1-i}(p)\, \hat{w}_p(iT, \hat{a}_1(p))}{\sum_k \Big(\sum_{i=1}^{k-1} \hat{b}_1^{\,k-1-i}(p)\, \hat{w}_p(iT, \hat{a}_1(p))\Big)^2}, \tag{24.16}$$

which is well defined provided $\sum_k \big(\sum_{i=1}^{k-1} \hat{b}_1^{\,k-1-i}(p)\, \hat{w}_p(iT, \hat{a}_1(p))\big)^2 \neq 0$. Replacing b̂2(p) in (24.15) by its expression (24.16) in terms of b̂1(p), the optimisation (24.15) becomes one-dimensional. By visualising the cost function versus b̂1 ∈ [0, 1] in Figure 24.5, it is easily seen that there is only one (global) minimum in that range, so any nonlinear optimisation algorithm can be used to find it. Finally, the optimal b̂2(p) is obtained from b̂1(p) as in (24.16). This process guarantees a unique optimal solution b̂1(p) and b̂2(p) for a given â1(p), written as

$$[\hat{b}_1(p), \hat{b}_2(p)]^T = h(\hat{a}_1(p)), \tag{24.17}$$

and the minimisation problem (24.14) in three parameters becomes a minimisation problem in one parameter:

$$\min \; J(\hat{a}_1(p), h(\hat{a}_1(p))). \tag{24.18}$$
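A minimal sketch of this dimension reduction: for a fixed â1(p), the helper below computes the internal signal, eliminates b̂2(p) through the closed form (24.16), and returns the remaining cost as a function of b̂1(p) alone. The variable names are our own.

```python
# Hedged sketch of Eqs. (24.15)-(24.16): with a1_hat fixed, b2_hat is closed
# form, so the search over (b1_hat, b2_hat) collapses to one dimension.
import numpy as np

def w_hat(u, a1_hat):
    """Internal signal w(kT) = f(v(kT)), f(x) = x/(1+x), for a given a1_hat."""
    v = np.zeros(len(u) + 1)
    for k in range(len(u)):
        v[k + 1] = a1_hat * v[k] + u[k]
    return v[:-1] / (1.0 + v[:-1])

def cost_given_a1_b1(y, u, a1_hat, b1_hat):
    """Sum of squared errors with b2_hat eliminated via Eq. (24.16)."""
    w = w_hat(u, a1_hat)
    # s(k) = sum_{i<k} b1^(k-1-i) w(i), built by s(k+1) = b1*s(k) + w(k)
    s = np.zeros(len(y))
    for k in range(len(y) - 1):
        s[k + 1] = b1_hat * s[k] + w[k]
    denom = np.sum(s ** 2)
    b2_hat = np.sum(y * s) / denom if denom > 0 else 0.0   # Eq. (24.16)
    return np.sum((y - b2_hat * s) ** 2), b2_hat
```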

Although the minimisation problem has been simplified from (24.14) to (24.18), (24.18) is still nonlinear. But it is one-dimensional, and the cost function J versus â1 can now easily be plotted and visualised, as in Figure 24.4. It is clearly seen that there is one (global) minimum for â1 ∈ [0, 1], located in the range â1 ∈ [0.98, 1]. In fact, for both one-dimensional minimisation problems, (24.15) with b̂2(p) eliminated via (24.16), and (24.18), we have found that for all subjects tested there is only one (local and global) minimum, so any nonlinear optimisation algorithm can be used to find the global minimum. We now summarise the identification algorithm for the single train response (non-fatigue model).

Single train (non-fatigue) model identification algorithm: Given the data set u_p(k_pT) and y_p(k_pT) for the pth stimulation train, k_p = 1, 2, ..., N_p:
1. For each â1(p), use any nonlinear optimisation algorithm to find the optimal b̂1(p) and b̂2(p) of (24.15).
2. Apply any nonlinear optimisation algorithm to find the optimal â1 ∈ [0, 1], and compute the corresponding b̂1(p) and b̂2(p).

In the simulations, we use the modified MATLAB™ program "fminsearchbnd" to solve the nonlinear optimisation problems in both Step 1 and Step 2.




Fig. 24.5: The cost function vs. bˆ 1 with aˆ1 = 0.9

The MATLAB™ program "fminsearchbnd" is based on the Nelder–Mead simplex approach and is able to handle simple upper-bound and lower-bound constraints.
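For readers without fminsearchbnd, the same two nested one-dimensional searches can be sketched with SciPy's bounded scalar minimiser. This reuses the cost_given_a1_b1 helper sketched above and is only an illustration, not the authors' code.

```python
# Hedged sketch of the single-train identification algorithm (Steps 1 and 2).
from scipy.optimize import minimize_scalar

def identify_train(y, u):
    """Return (a1_hat, b1_hat, b2_hat) for one stimulation train."""
    def best_b1(a1_hat):
        # Step 1: optimal b1_hat in [0, 1] for this a1_hat (b2_hat closed form)
        res = minimize_scalar(lambda b1: cost_given_a1_b1(y, u, a1_hat, b1)[0],
                              bounds=(0.0, 1.0), method='bounded')
        return res.x, res.fun
    # Step 2: optimal a1_hat in [0, 1]
    res = minimize_scalar(lambda a1: best_b1(a1)[1],
                          bounds=(0.0, 1.0), method='bounded')
    a1_hat = res.x
    b1_hat, _ = best_b1(a1_hat)
    _, b2_hat = cost_given_a1_b1(y, u, a1_hat, b1_hat)
    return a1_hat, b1_hat, b2_hat
```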

24.4.2 Identification of the Wiener–Hammerstein Fatigue Model

We are now ready to present the identification algorithm for the fatigue model (a sketch of the resulting loop is given after the key points below).

Online fatigue model identification algorithm: Given the data set u_p(k_pT) and y_p(k_pT), where p = 1, 2, ..., N is the stimulation train number, k_p ∈ {1, 2, ..., N_p}, N_p is the total number of data samples for the pth stimulation train, and N is the total number of stimulation trains:
1. Identify the single train (non-fatigue) models for the first ks stimulation trains, which are assumed to be fatigue-free. Let p = ks.
2. Arrange the parameters â1(p), b̂1(p) and b̂2(p) into A1, B1 and B2 as in Equations (24.10)-(24.12). Apply Equations (24.10)-(24.12) to obtain the coefficients αa1, αb1, αb2, βa1, βb1 and βb2.
3. Substituting these coefficients into (24.7)-(24.9) gives the predicted values of a1(p+1), b1(p+1) and b2(p+1) for the next stimulation train.
4. Predict the force response ŷ_{p+1}(k_{p+1}T) of the next stimulation train by substituting a1(p+1), b1(p+1) and b2(p+1) into Equation (24.6).
5. Collect the actual input and force output of the (p+1)th stimulation train.
6. If p = N − 1, stop. Otherwise set p = p + 1 and go back to Step 2.

Key Points:
• Because of the simple structure of the proposed model and the global-optimum properties of the least squares problems (24.10)-(24.12), (24.16) and (24.18), this identification procedure is suitable for online identification: the model is identified



for each stimulation train and those parameters are used to predict the force response of the next stimulation train.
• In Equations (24.10)-(24.12), the identified parameters of the current and previous trains are used to obtain the model parameters of the next stimulation train. The number of previous trains used in the extrapolation can be increased, which makes the prediction more accurate but increases complexity.
• The first ks trains are used to establish the initial fatigue model, and then the parameters a1(p+1), b1(p+1) and b2(p+1) are updated in real time. The exact value of ks can be chosen by some criterion, for example, as the train index at which the fatigue phenomenon is first observed.
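Putting the pieces together, here is a hedged sketch of the online loop (Steps 1-6 above), reusing the identify_train, extrapolate and wh_train_response helpers sketched earlier; ks = 3 is an arbitrary illustrative choice.

```python
# Hedged sketch of the online fatigue-model identification loop.
def online_fatigue_id(trains, ks=3):
    """trains: list of (u_p, y_p) array pairs, one pair per stimulation train."""
    hist = {'a1': [], 'b1': [], 'b2': []}
    predictions = []
    for p, (u, y) in enumerate(trains, start=1):
        if p > ks:
            # Steps 2-4: extrapolate the parameters and predict this response
            a1 = extrapolate(hist['a1'])
            b1 = extrapolate(hist['b1'])
            b2 = extrapolate(hist['b2'])
            predictions.append(wh_train_response(u, a1, b1, b2))
        # Steps 1 and 5: identify this train from its measured data
        a1_p, b1_p, b2_p = identify_train(y, u)
        hist['a1'].append(a1_p)
        hist['b1'].append(b1_p)
        hist['b2'].append(b2_p)
    return predictions
```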

24.5 Collection of SCI Patient Data

Fourteen subjects with chronic SCI (> 2 years) provided written informed consent, as approved by the University of Iowa Human Subjects Institutional Review Board. A detailed description of the stimulation and force-transducing systems has been previously reported [33, 34, 35] (Figure 24.6). In brief, the subject sat in a wheelchair with the knee and ankle positioned at ninety degrees. The foot rested upon a rigid metal plate, and the ankle was secured with a soft cuff and turnbuckle connectors. Padded straps over the knee and forefoot ensured isometric conditions. The tibial nerve was supramaximally stimulated in the popliteal fossa using a nerve probe and a custom computer-controlled constant-current stimulator. Stimulation was controlled by digital pulses from a data-acquisition board (Metrabyte DAS 16F, Keithley Instruments Inc., Cleveland, OH) housed in a microcomputer under custom

Fig. 24.6: Schematic representation of the limb fixation and force measurement system




software control. The stimulator was programmed to deliver a 10-pulse train (15 Hz; train duration: 667 ms) every 2 seconds, for a total of 124 trains. In this paper we consider only stimulation at 15 Hz. This is because muscular overload (60% of maximal torque) can be generated via 15 Hz supramaximal stimulation [35], and eliciting muscle contractions with a 1-on:2-off work-rest cycle (Burke-like protocol) at 15 Hz induces significant low-frequency fatigue without compromising neuromuscular transmission [13, 36]. The ensuing soleus isometric plantar flexion torques were measured via a load cell (Genisco AWU-250) positioned under the first metatarsal head. Torque was amplified 500 times (FPM 67, Therapeutics Unlimited) and fed into a 12-bit resolution analog-to-digital converter at a sampling rate of 5000 samples per second. The digitised signals were analysed with Datapac 2K2 software (RUN Technologies, Mission Viejo, CA).

24.6 Results

We applied the Wiener–Hammerstein model to 14 subjects. To show the advantages of the proposed model, we compared its performance against the Hill-Huxley-type model. We attempted to fit the Hill-Huxley-type fatigue model using the algorithm presented in [15], which unfortunately did not perform well. Therefore, we identified the Hill-Huxley model train by train using Equations (24.1) and (24.2) (with the optimal τ1, τ2 identified in advance), and denoted the predicted force of each train by

$$\frac{d\hat{C}_N}{dt} = \frac{1}{\hat{\tau}_c(p)} \sum_{i=1}^{n} \hat{R}_i \exp\!\Big(-\frac{t - t_i}{\hat{\tau}_c(p)}\Big) - \frac{\hat{C}_N}{\hat{\tau}_c(p)}, \tag{24.19}$$

where $\hat{R}_i = 1 + (\hat{R}_0 - 1)\exp\!\big(-\frac{t_i - t_{i-1}}{\hat{\tau}_c(p)}\big)$, and

$$\frac{d\hat{y}_p}{dt} = \hat{A}(p)\, \frac{\hat{C}_N}{1 + \hat{C}_N} - \frac{\hat{y}_p}{\tau_1 + \tau_2\, \frac{\hat{C}_N}{1 + \hat{C}_N}}. \tag{24.20}$$

The actual and predicted peak forces for the Hill-Huxley-type model are defined as

$$F_{pk}(p) = \max_{t\in[t_p,\, t_{p+1}]} y_p(t), \qquad \hat{F}_{pk}(p) = \max_{t\in[t_p,\, t_{p+1}]} \hat{y}_p(t), \tag{24.21}$$

where t_p is the starting time of the pth train. The performance of the proposed Wiener–Hammerstein model is compared against the performance of (24.19)-(24.21). It is important to note that the performance of (24.19)-(24.21) is better than that of the actual Hill-Huxley fatigue model (24.3)-(24.5); hence the comparison is a fair one. Similarly, the actual and predicted peak forces for the proposed model are defined as



$$F_{pk}(p) = \max_{k_p\in\{1,2,\dots,N_p\}} y_p(k_pT), \qquad \hat{F}_{pk}(p) = \max_{k_p\in\{1,2,\dots,N_p\}} \hat{y}_p(k_pT). \tag{24.22}$$

There are 124 trains in total, and each train is composed of 10 pulses at 15 Hz followed by a resting period of 1337 ms (1/3 on : 2/3 rest). Figures 24.7 and 24.8 show the fits of the proposed model for the first and the last three trains, respectively. Both figures demonstrate that the model fits the actual force response very well. The predicted parameters a1(p+1), b1(p+1) and b2(p+1) of the fatigue model based on (24.7)-(24.9) are shown in Figure 24.9 (crosses), along with the individually identified parameters (dots) based on the data of the individual trains. This illustrates that the proposed model is capable of capturing the properties of the individual trains as well as the fatigue phenomenon. Figure 24.10 shows the predicted (star) and actual (dot) peak force for subject 1; they match well in both shape and magnitude. For each subject, the goodness-of-fit (gof) is calculated for each train, and the average over trains is used for comparison (Equation (24.23)):

$$\mathrm{gof}_p = 1 - \sqrt{\frac{\sum_{k_p=1}^{N_p} \big(y_p(k_pT) - \hat{y}_p(k_pT)\big)^2}{\sum_{k_p=1}^{N_p} \big(y_p(k_pT) - \bar{y}_p\big)^2}}, \qquad \mathrm{gof}_{ave} = \frac{1}{N} \sum_{p=1}^{N} \mathrm{gof}_p, \quad N = 124, \tag{24.23}$$

where $\bar{y}_p$ denotes the mean of the actual output over the pth train.
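For reference, the fit measures of (24.23) amount to a few lines; the sketch below assumes, as in our reconstruction of (24.23), that the denominator is taken about the mean of the actual output.

```python
# Hedged sketch of the goodness-of-fit measures of Eq. (24.23).
import numpy as np

def gof(y, y_hat):
    """Per-train gof: 1 - ||y - y_hat|| / ||y - mean(y)||."""
    num = np.sqrt(np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2))
    den = np.sqrt(np.sum((np.asarray(y) - np.mean(y)) ** 2))
    return 1.0 - num / den

def gof_ave(actual_trains, fitted_trains):
    """Average gof over all trains (N = 124 in the experiments)."""
    return np.mean([gof(y, yh) for y, yh in zip(actual_trains, fitted_trains)])
```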

The average goodness-of-fit over all 124 stimulation trains for each of the 14 subjects is given in Table 24.1. The proposed model substantially outperforms the Hill-Huxley-type model in gof_ave (0.9102 vs. 0.7805). Keep in mind that the proposed model results are predictions, whereas the Hill-Huxley-type model results reflect only the off-line identification.

Fig. 24.7: The force response of the first three stimulation trains for actual output (solid) and the predicted output (dashed) for subject 1




Fig. 24.8: The force response of the last three stimulation trains for actual output (solid) and the predicted output (dashed) for subject 1. The total number of trains is 124

Fig. 24.9: Predicted (x) and identified (dot) model parameters a1, b1 and b2 for subject 1

As for the prediction performance on the peak force, we use the correlation coefficient between the actual and predicted peak forces, as used in [15], defined as

$$R = \frac{\sum_{p=1}^{N} \big(F_{pk}(p) - \bar{F}_{pk}\big)\big(\hat{F}_{pk}(p) - \bar{\hat{F}}_{pk}\big)}{\sqrt{\sum_{p=1}^{N} \big(F_{pk}(p) - \bar{F}_{pk}\big)^2 \; \sum_{p=1}^{N} \big(\hat{F}_{pk}(p) - \bar{\hat{F}}_{pk}\big)^2}}, \tag{24.24}$$



where F̄_pk and F̄̂_pk are the averages of the actual and predicted peak forces over all stimulation trains, respectively. This measure tells us how well the predicted peak force correlates with the actual peak force of each individual train. A problem with correlation coefficients is that they cannot detect a discrepancy between the actual and predicted peak forces due to a DC shift or scaling; that is, correlation is a measure of association, not of agreement. For instance, the correlation coefficient can be close to one even though the actual and predicted peak forces differ substantially. Consequently, a very high correlation

Fig. 24.10: Actual (dot) and predicted (star) peak force of each stimulation train for subject 1

Table 24.1: Average goodness-of-fit (gof_ave) over the stimulation trains, and the correlation coefficient (R) and goodness-of-fit (gof_pk) between the actual and predicted peak forces, for 14 subjects, using the proposed fatigue model and the Hill-Huxley-type model

Subject   Wiener–Hammerstein model         Hill-Huxley-type model
          gof_ave   R        gof_pk        gof_ave   R        gof_pk
1         0.9363    0.9993   0.9532        0.8077    0.9978   0.7721
2         0.8938    0.9993   0.9331        0.7692    0.9989   0.8443
3         0.8889    0.9967   0.8375        0.8437    0.9971   0.519
4         0.8958    0.9976   0.8924        0.7826    0.9978   0.7481
5         0.9286    0.9997   0.9643        0.841     0.9993   0.8428
6         0.9212    0.9983   0.9212        0.8154    0.9994   0.8833
7         0.9187    0.999    0.9337        0.8673    0.9982   0.9039
8         0.9223    0.9995   0.891         0.781     0.9952   0.5571
9         0.9217    0.9992   0.917         0.7311    0.9969   0.7216
10        0.9265    0.9994   0.9374        0.7734    0.9988   0.7558
11        0.911     0.9982   0.8557        0.802     0.9924   0.22
12        0.8687    0.9975   0.7262        0.7889    0.9971   0.8284
13        0.9025    0.9993   0.8886        0.7249    0.9936   0.4065
14        0.9072    0.9996   0.9351        0.5992    0.9821   0.4535
Ave       0.9102    0.9988   0.8990        0.7805    0.9961   0.6755




coefficient does not always imply a small error in predicting the actual force. A better criterion is the goodness-of-fit, widely used in the identification literature. Since we are primarily concerned with the peak force of each stimulation train during FES, we want to know how well the predicted peak force fits the actual peak force of each individual train, as measured by

$$\mathrm{gof}_{pk} = 1 - \sqrt{\frac{\sum_{p=1}^{N} \big(F_{pk}(p) - \hat{F}_{pk}(p)\big)^2}{\sum_{p=1}^{N} \big(F_{pk}(p) - \bar{F}_{pk}\big)^2}}. \tag{24.25}$$

Table 24.1 also shows the peak-force correlation coefficient and goodness-of-fit between the actual and predicted peak forces for the proposed Wiener–Hammerstein fatigue model and the Hill-Huxley-type model. The proposed fatigue model has a correlation coefficient similar to that of the train-by-train Hill-Huxley-type model (0.9988 vs. 0.9961 in R). In terms of the gof, the proposed model outperforms the Hill-Huxley model substantially (0.8990 vs. 0.6755 in gof_pk).
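Both peak-force criteria, (24.24) and (24.25), are short computations over the per-train peak forces. A sketch, with the denominator of (24.25) taken about the mean of the actual peaks as in our reconstruction:

```python
# Hedged sketch of the peak-force criteria, Eqs. (24.24)-(24.25).
import numpy as np

def peak_force_R(F, F_hat):
    """Correlation coefficient R of Eq. (24.24)."""
    dF = np.asarray(F) - np.mean(F)
    dFh = np.asarray(F_hat) - np.mean(F_hat)
    return np.sum(dF * dFh) / np.sqrt(np.sum(dF ** 2) * np.sum(dFh ** 2))

def peak_force_gof(F, F_hat):
    """Peak-force goodness-of-fit, Eq. (24.25)."""
    F, F_hat = np.asarray(F), np.asarray(F_hat)
    return 1.0 - np.sqrt(np.sum((F - F_hat) ** 2)
                         / np.sum((F - np.mean(F)) ** 2))
```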

24.7 Discussion and Conclusions

We have developed a Wiener–Hammerstein model for paralysed muscle under electrical stimulation and presented an effective identification method. By taking advantage of the block-oriented structure, the proposed model is much less complicated than the most accurate model in the literature while still capturing the characteristics of the stimulated muscle; the fitting and prediction performance is not diminished. The proposed model predicts well not only the peak-force tendency (high gof_pk) but also the force output profile (high gof_ave) of each individual stimulation train. The proposed model is suitable for online implementation thanks to its single global optimum, which allows the model to be identified quickly and accurately. Patients' muscles fatigue easily during stimulation training because of the synchronous recruitment mechanism of FES, so accurate and fast online prediction is critical, especially over relatively long stimulation sessions. Further, this model has strong potential to be incorporated into a feedback-controlled FES system. In this paper, the fatigue model has only been tested with 15 Hz stimulation, and not with other stimulation frequencies or hybrid stimulation frequencies; these will be addressed in subsequent experiments. Our future goals are to analyse other stimulation protocols and to implement the algorithm in a real-time feedback FES system.

References 1. Andersen, J.L., Mohr, T., et al.: Myosin heavy chain isoform transformation in single fibres from m. vastus lateralis in spinal cord injured individuals: effects of long-term functional electrical stimulation (FES). Pflugers Archiv - European Journal of Physiology 431, 513–518 (1996)



2. Bai, E.W.: An optimal two-stage identification algorithm for Hammerstein–Wiener nonlinear systems. Automatica 34, 333–338 (1998)
3. Bai, E.W.: Frequency domain identification of Hammerstein models. IEEE Trans. Autom. Control 48, 530–542 (2003)
4. Bai, E.W.: Frequency domain identification of Wiener models. Automatica 39, 1521–1530 (2003)
5. Bai, E.W., Cai, Z.J.: Identification of a modified Wiener–Hammerstein system and its application in electrically stimulated paralyzed skeletal muscle modeling. Automatica 45, 736–743 (2009)
6. Bai, E.W., Fu, M.: A blind approach to Hammerstein model identification. IEEE Trans. Signal Processing 50, 1610–1619 (2002)
7. Bobet, J., Stein, R.B.: A simple model of force generation by skeletal muscle during dynamic isometric contractions. IEEE Trans. Biomed. Eng. 45, 1010–1016 (1998)
8. Bobet, J., Stein, R.B.: A comparison of models of force production during stimulated isometric ankle dorsiflexion in humans. IEEE Trans. Neural Syst. Rehabil. Eng. 13, 444–451 (2005)
9. Brown, I.E., Scott, S.H., Loeb, G.E.: Mechanics of feline soleus: II. Design and validation of a mathematical model. J. Muscle Res. Cell. Motil. 17, 219–232 (1996)
10. Castro, M.J., Apple, D.F., et al.: Influence of complete spinal cord injury on skeletal muscle cross-sectional area within the first 6 months of injury. European Journal of Applied Physiology & Occupational Physiology 80, 373–378 (1999)
11. Crama, P., Schoukens, J.: Initial estimates of Wiener and Hammerstein systems using multisine excitation. IEEE Transactions on Instrumentation and Measurement 50, 1971–1975 (2001)
12. Crameri, R.M., Weston, A., et al.: Effects of electrical stimulation-induced leg training on skeletal muscle adaptability in spinal cord injury. Scandinavian Journal of Medicine & Science in Sports 80, 316–322 (2002)
13. De Luca, C.: Myoelectrical manifestations of localized muscular fatigue in humans. CRC Crit. Rev. Biomed. Eng. 11, 251–279 (1984)
14. Ding, J., Binder-Macleod, S.A., et al.: Two-step, predictive, isometric force model tested on data from human and rat muscles. J. Appl. Physiol. 85(6), 2176–2189 (1998)
15. Ding, J., Wexler, A.S., et al.: A predictive model of fatigue in human skeletal muscles. J. Appl. Physiol. 89, 1322–1332 (2000)
16. Ding, J., Wexler, A.S., et al.: A mathematical model that predicts the force-frequency relationship of human skeletal muscle. Muscle Nerve 26, 477–485 (2002)
17. Ding, J., Wexler, A.S., et al.: A predictive fatigue model–I: predicting the effect of stimulation frequency and pattern on fatigue. IEEE Trans. Neural Syst. Rehabil. Eng. 10(1), 48–58 (2002)
18. Ding, J., Wexler, A.S., et al.: A predictive fatigue model–II: predicting the effect of resting times on fatigue. IEEE Trans. Neural Syst. Rehabil. Eng. 10(1), 59–67 (2002)
19. Ding, J., Wexler, A.S., et al.: Mathematical models for fatigue minimization during functional electrical stimulation. J. Electromyogr. Kinesiol. 13, 575–588 (2003)
20. Dudley, G.A., Castro, M.J., et al.: A simple means of increasing muscle size after spinal cord injury: a pilot study. European Journal of Applied Physiology & Occupational Physiology 80, 394–396 (1999)
21. Dudley-Javoroski, S., Shields, R.K.: Case report: dose estimation and surveillance of mechanical load interventions for bone loss after spinal cord injury. Physical Therapy (2008) (in press)
22. Frey Law, L.A., Shields, R.K.: Mathematical models of human paralyzed muscle after long-term training. J. Biomech. (2007)




23. Greblicki, W.: Continuous time Hammerstein system identification. IEEE Trans. Automat. Contr. 45, 1232–1236 (2000)
24. Gribble, P.L., Ostry, D.J.: Origins of the power law relation between movement velocity and curvature: modeling the effects of muscle mechanics and limb dynamics. J. Neurophysiol. 76, 2853–2860 (1996)
25. Haber, R., Unbehauen, H.: Structure identification of nonlinear dynamic systems: a survey on input/output approaches. Automatica 26, 651–677 (1990)
26. Hunt, K.J., Munih, M., et al.: Investigation of the Hammerstein hypothesis in the modeling of electrically stimulated muscle. IEEE Trans. Biomed. Eng. 45, 998–1009 (1998)
27. Hunter, I.W., Korenberg, M.J.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biol. Cybern. 55, 135–144 (1986)
28. Krylow, A.M., Rymer, W.Z.: Role of intrinsic muscle properties in producing smooth movements. IEEE Trans. Biomed. Eng. 44, 165–176 (1997)
29. Loeb, G.E., Brown, I.E., Cheng, E.J.: A hierarchical foundation for models of sensorimotor control. Exp. Brain Res. 126, 1–18 (1999)
30. Mahoney, E.T., Bickel, C.S., et al.: Changes in skeletal muscle size and glucose tolerance with electrically stimulated resistance training in subjects with chronic spinal cord injury. Arch. Phys. Med. Rehabil. 86, 1502–1504 (2005)
31. Narendra, K.S., Gallman, P.G.: An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Transactions on Automatic Control 11, 546–550 (1966)
32. Shields, R.K., Dudley-Javoroski, S.: Musculoskeletal deterioration and hemicorporectomy after spinal cord injury. Physical Therapy 83, 263–275 (2003)
33. Shields, R.K., Dudley-Javoroski, S.: Musculoskeletal plasticity after acute spinal cord injury: effects of long-term neuromuscular electrical stimulation training. Journal of Neurophysiology 95, 2380–2390 (2006)
34. Shields, R.K., Dudley-Javoroski, S.: Musculoskeletal adaptation in chronic spinal cord injury: effects of long-term soleus electrical stimulation training. Journal of Neurorehabilitation and Neural Repair 21, 169–179 (2006)
35. Shields, R.K., Dudley-Javoroski, S., et al.: Post-fatigue potentiation of paralyzed soleus muscle: evidence for adaptation with long-term electrical stimulation training. Journal of Applied Physiology 101, 556–565 (2006)
36. Shields, R.K., Frey Law, L.A.: Effects of electrically induced fatigue on the twitch and tetanus of paralyzed soleus muscle in humans. J. Appl. Physiol. 82(5), 1499–1507 (1997)
37. Shields, R.K., Schlechte, J., et al.: Bone mineral density after spinal cord injury: a reliable method for knee measurement. Arch. Phys. Med. Rehabil. 86, 1969–1973 (2005)
38. Vestergaard, P., Krogh, K., et al.: Fracture rates and risk factors for fractures in patients with spinal cord injury. Spinal Cord 36, 790–796 (1998)
39. Vörös, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33, 1141–1146 (1997)
40. Westwick, D., Verhaegen, M.: Identifying MIMO Wiener systems using subspace model identification methods. Signal Processing 52, 235–258 (1998)

